The Information Content of Financial Textual Data: Creating News Measures for Volatility Modeling and for the Analysis of Price Jumps

Poli, Francesco

We retrieve news stories and earnings announcements for the S&P 100 constituents from two professional news providers, along with ten macroeconomic indicators. We also gather Google Trends data on these firms' stocks as an index of retail investors' attention. The result is an extensive and innovative database with precise information for analyzing the link between news and asset price dynamics. We detect the sentiment of news stories using a dictionary of sentiment-related words and negations. We also propose a set of more than five thousand information-based variables that serve as natural proxies for the information used by heterogeneous market players. We first shed light on the impact of these information measures on daily realized volatility and select them by penalized regression. We then use the selected measures to forecast volatility and obtain better forecasts than models that omit them.

Next, we detect intraday price jumps in the stocks of the S&P 100 constituents. We build high-frequency news indicators from news stories released by two professional news providers, from earnings announcements, and from twenty-three US macroeconomic indicators. We investigate the extent to which statistically significant intraday jumps are associated with the news indicators, and we select these indicators by penalized logistic regression. Comparing the economic significance of jumps, we find effects on returns and volatility at both the high-frequency and the daily level. These effects vary with the type of news with which the jumps are associated. We also find that future quarterly and yearly returns appear to be exposed to jump risk measures built from jumps related to macroeconomic announcements.

A common method for detecting the sentiment of a text is the so-called bag-of-words approach. Finally, we extend this method in three directions, using:
1) an extended list of negations made of single words, two-word sequences, and three-word sequences;
2) lists of sentiment-related expressions;
3) lists of sentiment-related word combinations.
The aim is to create a general method suitable for detecting the sentiment of any type of financial text.
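The extended bag-of-words idea can be illustrated with a minimal sketch. It is not the thesis's implementation. All word lists below (POSITIVE_WORDS, NEGATIVE_WORDS, EXPRESSIONS, COMBINATIONS, NEGATIONS) are hypothetical toy lexicons. The rules are also assumptions: a negation flips the sign of a sentiment term that starts within three tokens after it, word combinations count when both words appear within four tokens, and the score is (pos - neg) / (pos + neg). The sketch only shows how multi-word negations, sentiment expressions, and word combinations can be layered on top of single-word counting.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy lexicons (NOT the thesis's dictionaries).
POSITIVE_WORDS = {"gain", "growth", "strong", "beat", "upgrade"}
NEGATIVE_WORDS = {"loss", "weak", "decline", "downgrade", "lawsuit"}

# Hypothetical multi-word sentiment expressions, matched before single words.
EXPRESSIONS: Dict[Tuple[str, ...], int] = {
    ("profit", "warning"): -1,
    ("record", "high"): 1,
}
# Longest expressions are tried first so longer matches win.
_EXPRS_LONGEST_FIRST = sorted(EXPRESSIONS.items(), key=lambda kv: -len(kv[0]))

# Hypothetical word combinations: the second word must follow the first
# within WINDOW tokens.
COMBINATIONS: Dict[Tuple[str, str], int] = {
    ("cut", "jobs"): -1,
    ("raise", "guidance"): 1,
}

# Hypothetical negations of one, two and three tokens.
NEGATIONS = {("not",), ("no",), ("never",),
             ("far", "from"), ("fails", "to"), ("has", "yet", "to")}

NEG_SCOPE = 3  # assumed: tokens after a negation that it can flip
WINDOW = 4     # assumed: max distance between the two words of a combination


def tokenize(text: str) -> List[str]:
    """Lowercase and keep alphabetic tokens only (digits and punctuation dropped)."""
    return re.findall(r"[a-z]+", text.lower())


def negation_ends(tokens: List[str]) -> List[int]:
    """Return the (exclusive) end index of every negation sequence found."""
    ends = []
    for start in range(len(tokens)):
        for neg in NEGATIONS:
            if tuple(tokens[start:start + len(neg)]) == neg:
                ends.append(start + len(neg))
    return ends


def sentiment(text: str) -> Tuple[int, int, float]:
    """Return (positive count, negative count, net score in [-1, 1])."""
    tokens = tokenize(text)
    ends = negation_ends(tokens)
    pos = neg = 0

    def add(polarity: int, start: int) -> None:
        nonlocal pos, neg
        # Flip polarity if the term starts inside a negation's scope.
        if any(e <= start < e + NEG_SCOPE for e in ends):
            polarity = -polarity
        if polarity > 0:
            pos += 1
        else:
            neg += 1

    i = 0
    while i < len(tokens):
        for expr, pol in _EXPRS_LONGEST_FIRST:
            if tuple(tokens[i:i + len(expr)]) == expr:
                add(pol, i)
                i += len(expr)  # consume the whole expression
                break
        else:
            if tokens[i] in POSITIVE_WORDS:
                add(1, i)
            elif tokens[i] in NEGATIVE_WORDS:
                add(-1, i)
            i += 1

    # Word combinations are counted on top of single words and expressions.
    for (first, second), pol in COMBINATIONS.items():
        for j, tok in enumerate(tokens):
            if tok == first and second in tokens[j + 1:j + 1 + WINDOW]:
                add(pol, j)

    total = pos + neg
    return pos, neg, (pos - neg) / total if total else 0.0
```

For example, `sentiment("Results were far from weak.")` treats the two-word negation "far from" as flipping "weak", so the text is scored as positive. A plain single-word negation list would miss this case.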


The Information Content of Financial Textual Data: Creating News Measures for Volatility Modeling and for the Analysis of Price Jumps / Poli, Francesco. - (2017 Aug 01).