Semantically Enhanced Tensor Factorization for Medical Information Retrieval and Recommendation Systems


Student thesis: Doctoral Thesis




Awarding Institution
Award date: 30 Jul 2018


With the dramatic growth of digital publications and the explosion of information on the Internet, medical Information Retrieval (IR) and recommendation systems play an increasingly important role in helping physicians and patients gain access to knowledge and information. Integrating domain knowledge can supply additional information and reveal new patterns in such data mining tasks. The Semantic Web represents domain knowledge using explicit semantics in a machine-processable manner to support knowledge-based systems and enable semantic reasoning. Medical data mining typically suffers from high dimensionality, sparsity, and incompleteness; consequently, traditional approaches prove insufficient. Previous studies indicate that tensors have the potential to address these shortcomings. In addition, tensors provide a natural and seamless representation of the Semantic Web with its different semantic links. Therefore, semantically enhanced tensor-based methods are proposed to cope with multi-aspect data in healthcare data analytics.

First, to bridge the semantic gap and better exploit the knowledge from the Semantic Web, we present semantic networks extracted from the Semantic Web for text analysis. Query-based semantic expansion networks with rescaled centrality and semantic enrichment strategies are proposed to improve the performance of medical text classification. Second, this dissertation explores tensor-based methods for healthcare data mining. We develop tensor factorization methods for chronic disease prediction and location-based fall prediction, and demonstrate the superior performance of the tensor-based methods compared with other widely used machine-learning methods. Next, we develop a semantically enhanced medical IR system with a two-stage query expansion strategy. A tensor factorization method is used to estimate the importance of semantic association triples, with a minor contribution from the incremental pseudo-relevance feedback method. We also present a recommendation system for medical community Question Answering (cQA) platforms, which models the ternary relationship using a third-order tensor and adapts tensor factorization to predict high-quality answers. This dissertation focuses on the semantically enhanced tensor-based framework, which tackles several key aspects of multidimensional and heterogeneous data analytics, and extends our knowledge of its applications to medical data analytics across several healthcare problems.
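As a rough illustration of what factorizing a third-order tensor involves, the sketch below fits a generic CP (CANDECOMP/PARAFAC) decomposition by alternating least squares in NumPy. It is not the factorization method developed in the thesis, which the abstract does not specify. The function names, the rank, the tensor sizes, and the fully observed synthetic tensor are all assumptions made for the example; real recommendation data would be sparse and would need a method that handles missing entries.

```python
import numpy as np


def khatri_rao(B, C):
    """Column-wise Kronecker product; row index is (j, k) with k varying fastest."""
    rank = B.shape[1]
    return (B[:, None, :] * C[None, :, :]).reshape(-1, rank)


def cp_als(X, rank, n_iter=200, seed=0):
    """Fit X[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r] by alternating least squares.

    Illustrative only: assumes a dense, fully observed tensor.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))

    # Mode-wise unfoldings whose column order matches khatri_rao above.
    X0 = X.reshape(I, J * K)                      # rows i, columns (j, k)
    X1 = np.moveaxis(X, 1, 0).reshape(J, I * K)   # rows j, columns (i, k)
    X2 = np.moveaxis(X, 2, 0).reshape(K, I * J)   # rows k, columns (i, j)

    for _ in range(n_iter):
        # Each update solves a linear least-squares problem for one factor
        # while the other two are held fixed.
        A = X0 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X1 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X2 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C


def reconstruct(A, B, C):
    """Rebuild the tensor from its three factor matrices."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


# Synthetic rank-2 tensor with hypothetical axes (sizes chosen arbitrarily).
rng = np.random.default_rng(42)
true_A = rng.standard_normal((6, 2))
true_B = rng.standard_normal((5, 2))
true_C = rng.standard_normal((4, 2))
X = reconstruct(true_A, true_B, true_C)

A, B, C = cp_als(X, rank=2)
X_hat = reconstruct(A, B, C)
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

In a recommendation setting, entries of the reconstructed tensor would serve as predicted scores for combinations that were not observed; here `rel_err` only measures how well the factors reproduce the synthetic input.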

    Research areas

  • Tensor Factorization, Semantic Web, Medical Information Retrieval, Healthcare Data Mining, Recommendation Systems