Learning decisions with unlabeled data for business intelligence

  • Sijie REN

Student thesis: Doctoral Thesis

Abstract

The goal of modern business intelligence (BI) is to support better business decision making. The decision-making literature shows that the process of making a decision can be described in three activities. First, decision makers monitor the environment and formulate the essence of the decision problem. Second, decision makers design possible courses of action for the decision. Third, decision makers choose among the alternative actions. With the rapid development of the Web and information technology (IT), it is not uncommon for contemporary organizations to constantly generate huge amounts of data. This makes the business and operating environment that decision makers in these organizations need to monitor more and more complex. As a consequence, many novel scenarios arise in which non-programmed rather than programmed decisions need to be made. For instance, the volume of financial reports and reviews available online is far beyond any financial analyst's reading capacity. Therefore, when one tries to evaluate the market without going through all the information available, there is no routine or programmed procedure to follow. Though one can use judgment or intuition to make a decision, the quality of such a decision is usually poor simply because the decision maker is blind to the essence of the decision problem. In this study we are interested in finding an effective approach for BI to monitor the complex data-driven environment of contemporary organizations so that the design activity in the decision-making process can be well addressed and the quality of decision making can be improved. While the conventional methods for making non-programmed decisions in organizations are human judgment and intuition, in the light of statistical learning theory we argue that machine learning (ML) is a well-suited tool for contemporary BI in effectively supporting non-programmed decisions in the complex data-driven environment.
We also put forward two propositions that lead to the research gaps in applying machine learning methods in BI. First, because labeled data are scarce and expensive, the use of unlabeled data would be highly desirable in many BI scenarios. However, an important issue, the "performance degradation" issue, is not well addressed in the literature. Second, as the environment of organizations grows increasingly complex, conventional hand-crafted features prove inadequate for solving novel and dynamic decision problems. We used the following approach in this study. To address the first research gap, we propose to adopt the semi-supervised learning paradigm and draw on statistical learning theory to approach the performance degradation issue. To address the second research gap, we propose to adopt the emerging unsupervised feature learning and deep learning paradigm to enable a feature mapping function to be learned from unlabeled data. We conducted extensive experiments in two representative scenarios for BI, namely financial decision support and transportation decision support, with two representative data types, Web-based text data and machine-generated time-series data. Our results show that the essence of the performance degradation issue can be understood from a novel perspective: if the bias-variance trade-off is well balanced in the model, semi-supervised learning leads to better performance. We also show that a high-quality feature mapping function can be created by unsupervised feature learning, which opens another door to performance improvement through unlabeled data. We also carried out a case analysis to show that the improved design activity does create new insights into the essence of the decision problem and would be likely to positively influence the choice activity in the decision-making process. Theoretical and practical implications of this study are also discussed in the thesis.
Date of Award: 2 Oct 2013
Original language: English
Awarding Institution
  • City University of Hong Kong
Supervisor: Shaoyi Stephen LIAO

Keywords

  • Business
  • Machine learning
  • Information technology
  • Business intelligence
  • Decision making
