News impact analysis in algorithmic trading


Student thesis: Doctoral Thesis



  • Xiaodong LI

Related Research Unit(s)


Awarding Institution
Award date: 3 Oct 2014


The stock market is one of the most important financial markets. Investors gather and process market information to improve their trading decisions. Among all forms of information, market news, which reports the latest market status, is one of the most important sources and is widely believed to affect stock prices. With the advancement of algorithmic trading, news agencies such as Bloomberg have greatly improved the speed and volume of their news reporting. However, the news is not delivered in a machine-readable format, and the sheer volume of the news stream makes manual processing increasingly difficult. How to model and automatically process market news, and how to analyze its market impact, have therefore become challenging problems in both academic study and industrial practice. In this thesis, news impact is modelled and analyzed from three perspectives. For each perspective, one chapter describes the proposed approach and discusses the experimental setup and results.
Firstly, we study how news sentiment can help stock price prediction. The bag-of-words approach analyzes the latent relationship between statistical patterns of words and stock price movements. In contrast, sentiment analysis treats news sentiment as an important link in the chain that maps word patterns to price movements, and analyzes the news impact in a sentiment space. We first implement a generic stock price prediction framework that can use different external signals to predict stock prices. We then use the Harvard psychological dictionary and the Loughran-McDonald financial sentiment dictionary to construct the sentiment space. News articles are then quantitatively measured and projected onto the sentiment space. Predictions generated by the bag-of-words approach and by sentiment analysis are evaluated and compared at different market classification levels.
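
As a rough illustration of projecting a news article onto a sentiment space, the sketch below counts dictionary hits per sentiment category and normalizes by article length. It is a minimal example under stated assumptions, not the thesis implementation: the lexicon entries, the category names, and the function `project_onto_sentiment_space` are hypothetical placeholders and are not taken from the Harvard or Loughran-McDonald dictionaries.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> sentiment category). These entries are
# illustrative placeholders, NOT real dictionary content.
TOY_LEXICON = {
    "gain": "positive",
    "profit": "positive",
    "loss": "negative",
    "decline": "negative",
    "may": "uncertainty",
    "lawsuit": "litigious",
}
# Assumed category axes of the sentiment space (also hypothetical).
CATEGORIES = ["positive", "negative", "uncertainty", "litigious"]


def project_onto_sentiment_space(text, lexicon=TOY_LEXICON, categories=CATEGORIES):
    """Return one coordinate per category: lexicon hits / total tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return [0.0] * len(categories)
    # Count how many tokens fall into each sentiment category.
    counts = Counter(lexicon[t] for t in tokens if t in lexicon)
    return [counts[c] / len(tokens) for c in categories]


# Six tokens, one hit in each of the four categories.
vector = project_onto_sentiment_space("Profit may decline after the lawsuit")
```

The resulting vector could then serve as the external signal fed to a prediction model in place of a bag-of-words vector; how the thesis weights or normalizes these coordinates is not stated in the abstract.
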
Experiments are conducted on five years of daily Hong Kong Stock Exchange prices and news articles. Results show that: (1) at the individual stock, sector index and market index levels, the models with sentiment analysis outperform the bag-of-words model on both the validation set and the independent testing set; (2) the models that use sentiment polarity cannot provide useful predictions; (3) there is only a minor difference between the models using the two sentiment dictionaries.
Secondly, we study how news summarization can help stock price prediction. A multi-document summarization algorithm is proposed to summarize the daily news articles. Compared with conventional summarization methods, the proposed algorithm constructs and preserves sentence relevance structures during the recursive calculation of sentence significance values. Potentially important sentences "present" themselves gradually by gaining higher significance values, and the summary paragraph is then generated by selecting the top-k scored sentences. Convergence of the algorithm is proved, and experiments on two standard data sets (DUC 2006 and DUC 2007) show that the proposed model gives convincing results. In the second step, we reuse the stock price prediction framework implemented for the sentiment analysis. The summarization model generates summaries from news articles, which are then evaluated according to whether they improve the prediction of stocks' daily returns. Experiments are conducted on five years of daily Hong Kong Stock Exchange data, with the news reported by FINET. Evaluations are done at the individual stock, sector index and market index levels. Results show that predictions based on news article summaries outperform predictions based on full-length articles on both the validation and independent testing sets.
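
The idea of recursively updating sentence significance values and then selecting the top-k sentences can be sketched with a generic graph-based ranking. The example below is a PageRank-style stand-in and not the algorithm proposed in the thesis: the Jaccard word-overlap similarity, the damping factor, the iteration count, and all function names are assumptions made for illustration.

```python
import re


def tokenize(sentence):
    """Lowercased set of alphabetic words in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def similarity(a, b):
    """Jaccard overlap between two word sets (assumed relevance measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Iteratively propagate significance along sentence-similarity edges."""
    n = len(sentences)
    if n == 0:
        return []
    bags = [tokenize(s) for s in sentences]
    weights = [[similarity(bags[i], bags[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    row_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Significance flowing into sentence i from related sentences.
            incoming = sum(scores[j] * weights[j][i] / row_sums[j]
                           for j in range(n) if row_sums[j] > 0)
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def summarize(sentences, k=2):
    """Pick the k highest-scoring sentences, kept in original order."""
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


news = [
    "The bank reported higher profit this quarter.",
    "Higher profit lifted the bank shares.",
    "Analysts expect the bank profit to grow.",
    "Rain is forecast for tomorrow.",
]
summary = summarize(news, k=2)
```

In this toy input the weather sentence shares no words with the others, so it receives only the baseline score and is left out of the summary. The thesis algorithm additionally preserves sentence relevance structures during the recursion, which this sketch does not attempt to reproduce.
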
Finally, we study whether integrating information from news and short-term historical prices can help stock price prediction. Previous work focuses either on market news purely as an exogenous factor that tends to lead the price process, or on how past stock price processes affect future stock returns. Taking one step further, we quantitatively integrate information from both market news and stock prices to improve the accuracy of predicting future stock price returns in an intra-day trading context. We present the design and architecture of our approach to market information fusion. By means of multiple kernel learning, the hidden information behind the two sources is effectively extracted and, more importantly, seamlessly integrated rather than simply combined by a single-kernel approach. Comprehensive experiments comparing our approach with three baseline methods (which use only one type of information, or naively combine the two sources) have been undertaken on intra-day tick-by-tick data from the Hong Kong Stock Exchange and market news archives of the same period. For both cross-validation and independent testing, our approach achieves the best results.
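
To make the contrast between a single kernel and a weighted combination of kernels concrete, the sketch below builds one kernel per information source and mixes them with data-dependent weights. It is an illustration under stated assumptions rather than the thesis method: true multiple kernel learning optimizes the weights jointly with the classifier, whereas this example uses a simple kernel-target alignment heuristic. The feature values, labels, RBF kernel choice, and function names are all hypothetical.

```python
import math


def rbf_gram(rows, gamma=1.0):
    """Gram matrix of an RBF kernel over a list of feature vectors."""
    def k(x, y):
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
    return [[k(x, y) for y in rows] for x in rows]


def alignment(gram, labels):
    """Kernel-target alignment <K, yy^T> / (||K|| * ||yy^T||), labels in {-1, +1}."""
    n = len(labels)
    num = sum(gram[i][j] * labels[i] * labels[j]
              for i in range(n) for j in range(n))
    norm_k = math.sqrt(sum(v * v for row in gram for v in row))
    return num / (norm_k * n)  # ||yy^T||_F equals n for +/-1 labels


def combine_kernels(grams, labels):
    """Weight each kernel by its (non-negative) alignment and sum them."""
    scores = [max(alignment(g, labels), 0.0) for g in grams]
    total = sum(scores)
    if total == 0:
        weights = [1.0 / len(grams)] * len(grams)
    else:
        weights = [s / total for s in scores]
    n = len(labels)
    combined = [[sum(w * g[i][j] for w, g in zip(weights, grams))
                 for j in range(n)] for i in range(n)]
    return weights, combined


# Made-up features for four time windows: one view from news, one from prices.
news_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
price_features = [[0.2], [0.8], [0.3], [0.7]]
labels = [1, 1, -1, -1]  # hypothetical up/down return labels

weights, combined = combine_kernels(
    [rbf_gram(news_features), rbf_gram(price_features)], labels)
```

The combined Gram matrix could then be passed to any kernel classifier that accepts a precomputed kernel. In this toy data the news view separates the labels far better than the price view, so it receives most of the weight.
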

    Research areas

  • Journalism, Commercial, Program trading (Securities), Stock price forecasting