Management routinely conveys previously non-public material information to the market during earnings calls, and these disclosures move stock prices. Although communication in this setting is inherently multi-modal, market participants overwhelmingly focus on the words spoken, treating transcripts as sufficient representations of managerial intent and conviction. This paper demonstrates that the voice itself carries a distinct and economically meaningful information channel. Using paralinguistic acoustic features extracted from CEO speech during the Q&A segments of earnings calls, we show that vocal delivery encodes managerial confidence, stress, and uncertainty in ways that are orthogonal to textual content. Employing an event-study framework, we find that these acoustic signals predict short- and medium-horizon excess returns, indicating that markets do not fully incorporate the information conveyed by voice at the time of disclosure. The results establish the voice channel as a material, under-recognized component of corporate communication and a source of alpha for systematic investors, independent of language-based analysis.
Paralinguistic features, such as assertiveness, arousal, and nervousness, contain significant economic information. Our findings demonstrate that even when language is devoid of strong text sentiment, the acoustic properties of executive speech can predict post-earnings announcement drift.
How Vocal Delivery Improves Earnings-Call NLP Sentiment
This research brief examines whether vocal delivery improves the economic interpretation of earnings-call NLP sentiment. Text sentiment models can identify whether management language is positive, neutral, or negative, but they do not observe whether that language is delivered with vocal confirmation, control, or strain. Using 41,395 Russell 3000 earnings-call observations from July 15, 2020 through September 30, 2025, covering 2,862 unique tickers, we sort text sentiment and proprietary management-calibrated vocal measures into quintiles and evaluate event-sample-relative excess returns over 10-, 20-, and 30-trading-day horizons. The results show that voice improves NLP sentiment on both sides of the sentiment distribution. In the highest NLP sentiment quintile, adding high vocal Valence improves the positive-text signal: over the next 10 trading days, positive text alone produces an event-sample-relative excess return of +0.07%, while positive text paired with high vocal Valence produces +0.46%, an incremental +39 basis points with an event-date clustered t-statistic of 2.03. The hit rate increases from 49.0% to 51.7%. The stronger evidence appears on the downside. In the lowest NLP sentiment quintile, high Vocal Strain observations produce a 20-trading-day event-sample-relative excess return of -0.67%, compared with +0.36% for low-strain negative-text observations, a spread of approximately -103 basis points with an event-date clustered t-statistic of -3.13. Low Balanced Delivery produces similar downside separation. The findings suggest that vocal delivery is best understood not as a replacement for NLP sentiment, but as a conditioning layer: voice helps investors distinguish positive language delivered with confirming vocal tone from routine positive language, and negative language delivered under strain from negative language delivered with control.
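The conditioning logic described in this brief can be sketched in a few lines of Python. This is an illustrative sketch only, not the brief's actual methodology: the function names (`quintile_labels`, `sample_relative`, `conditional_spread`) are hypothetical, the quintile sorts are assumed to be independent rank-based sorts across the full event sample, and "event-sample-relative" is assumed to mean the raw return minus the mean return across all events in the sample. The event-date clustered t-statistics reported in the brief are not reproduced here.

```python
from statistics import mean


def quintile_labels(values):
    """Assign each value a quintile from 1 (lowest) to 5 (highest) by rank.

    Ties are broken by original position (an assumption for this sketch).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = rank * 5 // n + 1
    return labels


def sample_relative(returns):
    """Subtract the cross-event mean return from each event's return.

    Assumed definition of 'event-sample-relative excess return'.
    """
    avg = mean(returns)
    return [r - avg for r in returns]


def conditional_spread(text_scores, voice_scores, returns, text_q=5, voice_q=5):
    """Mean excess return of events in both the chosen text and voice
    quintiles, minus the mean excess return of all events in the text quintile.

    Returns None if either group is empty.
    """
    text_lab = quintile_labels(text_scores)
    voice_lab = quintile_labels(voice_scores)
    excess = sample_relative(returns)
    base = [e for e, t in zip(excess, text_lab) if t == text_q]
    paired = [
        e
        for e, t, v in zip(excess, text_lab, voice_lab)
        if t == text_q and v == voice_q
    ]
    if not base or not paired:
        return None
    return mean(paired) - mean(base)
```

Under these assumptions, calling `conditional_spread` with `text_q=5, voice_q=5` mirrors the "positive text paired with high vocal Valence versus positive text alone" comparison. The downside comparison in the brief is slightly different: it contrasts high-strain with low-strain observations within the lowest text quintile, which would require differencing two voice-quintile groups rather than comparing one group against the whole text quintile.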
Asset managers increasingly explore the idea of extracting confidence, stress, or tone from earnings-call audio. Many already license the raw inputs (audio recordings and transcripts) and assume that, with modern ML tools, building proprietary behavioral signals is a straightforward extension. It is not.
Access to audio and text is necessary, but it is nowhere near sufficient.
Across firms, internal build attempts repeatedly fail for the same fundamental reasons.
Copyright © 2026 Speech Craft Analytics Inc. - All Rights Reserved.