Abstract
Recent advancements in natural language processing (NLP) have led to significant improvements in the performance of language models across a range of tasks, from translation and summarization to question answering and dialogue systems. These models, powered largely by deep learning techniques and massive amounts of data, have achieved impressive fluency and syntactic accuracy, often mimicking human-like text generation. However, despite these advancements, current language models frequently struggle with tasks that require more than linguistic proficiency. They often lack the ability to dynamically incorporate external knowledge and to apply the reasoning skills essential for handling complex text generation tasks. For example, generating text that involves cause-and-effect relationships, hypothetical scenarios, or logical deductions remains a challenge. A fundamental limitation is that language models, trained primarily on static datasets, generate responses based on the probability distributions learned during training. They do not actively seek or integrate new information during the generation process, which limits their ability to produce contextually relevant and factually accurate content on topics poorly represented in their training data. Moreover, while some models can mimic reasoning based on patterns seen during training, they do not truly understand the underlying logic or facts of the situations they describe. This often leads to generation errors when models are presented with questions or tasks that require genuine reasoning abilities, such as solving math problems, engaging in complex problem-solving, or generating reliable and informative content on specialized topics. This thesis addresses these limitations by proposing innovative frameworks that enhance language models through the integration of structured knowledge bases and sophisticated reasoning mechanisms.
For the problem of accurately retrieving knowledge triples from structured knowledge, we propose a novel hierarchical structured knowledge retriever. We directly leverage the two-tier architecture of structured knowledge, consisting of high-level entities and low-level knowledge triples, to design a task-agnostic structured knowledge hunter. Specifically, we employ a local-global interaction scheme for structured knowledge representation learning and a hierarchical transformer-based pointer network as the backbone for selecting relevant entities and knowledge triples. By combining the strong generative ability of language models with the high faithfulness of the knowledge hunter, our model achieves high interpretability, enabling users to understand how its output is generated. We empirically demonstrate the effectiveness of our model on both internal knowledge-enhanced table-to-text generation and external knowledge-enhanced dialogue response generation. Our task-agnostic model outperforms state-of-the-art methods and the corresponding language models, setting new standards on both benchmarks.
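To make the two-tier selection concrete, the following is a minimal sketch of pointer-style selection over structured knowledge, assuming PyTorch and plain dot-product scoring: an entity is chosen first, then a triple from under that entity. The local-global interaction scheme and the hierarchical transformer encoder are abstracted into precomputed embeddings; all names here are illustrative, not the thesis implementation.

```python
# Hypothetical sketch: two-tier pointer-style selection (entity, then triple).
import torch
import torch.nn.functional as F

def hierarchical_select(query, entity_embs, triple_embs_per_entity):
    """query: (d,); entity_embs: (E, d); triple_embs_per_entity: list of (T_i, d)."""
    # Tier 1: pointer over high-level entities (dot-product attention scores).
    entity_scores = entity_embs @ query                # (E,)
    entity_probs = F.softmax(entity_scores, dim=-1)
    best_entity = int(entity_probs.argmax())

    # Tier 2: pointer over the low-level triples of the selected entity.
    triples = triple_embs_per_entity[best_entity]      # (T, d)
    triple_scores = triples @ query                    # (T,)
    triple_probs = F.softmax(triple_scores, dim=-1)
    best_triple = int(triple_probs.argmax())
    return best_entity, best_triple, entity_probs, triple_probs

# Toy usage: 3 entities, each with a few candidate triples, d = 8.
d = 8
query = torch.randn(d)
entities = torch.randn(3, d)
triples = [torch.randn(t, d) for t in (4, 2, 5)]
print(hierarchical_select(query, entities, triples)[:2])
```

In a full system, the two softmax distributions would presumably be trained jointly with the generator, so that the selected entities and triples remain faithful to the generated text; the decomposition into two pointer steps is what makes the selection path inspectable.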
Beyond enhancing language models with knowledge alone, we also explore the integration of both reasoning outcomes and commonsense knowledge. We propose a reasoning-aware commonsense knowledge retriever that equips the language model with enhanced commonsense knowledge, which not only includes factual information but also reflects the underlying commonsense reasoning. To this end, we adopt a coarse-to-fine approach for retrieving knowledge that aligns with both the context and the reasoning outcomes. We first locate a specific sub-region of the knowledge base, ensuring that all sentences within it are relevant to the context. We then narrow the search within this sub-region to extract knowledge relevant to the reasoning outcomes. In both phases, we employ Monte Carlo Tree Search to exploit the complex connections between sentences, improving our exploration of the knowledge base. Experiments on two multi-turn dialogue datasets demonstrate that our retrieval approach not only aligns more closely with the underlying reasoning in human conversations but also significantly enhances the diversity of the retrieved knowledge, resulting in more informative and creative responses.
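The sketch below illustrates the coarse-to-fine idea under stated assumptions: cosine similarity stands in for the learned relevance scorers, the simulation step of MCTS is collapsed to direct scoring of a sentence chain, and all thresholds and hyperparameters are hypothetical. It is a simplified UCT-style search, not the thesis retriever.

```python
# Hypothetical sketch: coarse context filter + fine UCT search over sentence chains.
import math, random
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def coarse_filter(kb_embs, context_emb, threshold=0.3):
    # Stage 1: keep only the sub-region of the KB relevant to the context.
    return [i for i, e in enumerate(kb_embs) if cos(e, context_emb) > threshold]

def mcts_fine_search(sub_ids, kb_embs, reasoning_emb, iters=200, depth=3, c=1.4):
    # Stage 2: UCT search over chains of sentences, rewarded by how well the
    # chain matches the reasoning outcome (mean similarity of its members).
    stats = {}  # chain (tuple of ids) -> [visits, total_reward]

    def reward(chain):
        return sum(cos(kb_embs[i], reasoning_emb) for i in chain) / len(chain)

    for _ in range(iters):
        chain = ()
        while len(chain) < depth:
            options = [chain + (i,) for i in sub_ids if i not in chain]
            if not options:
                break
            unseen = [o for o in options if o not in stats]
            if unseen:                        # expansion: try an unvisited child
                chain = random.choice(unseen)
                break
            n_parent = sum(stats[o][0] for o in options)
            chain = max(options, key=lambda o: stats[o][1] / stats[o][0]
                        + c * math.sqrt(math.log(n_parent) / stats[o][0]))
        r = reward(chain)                     # simulation collapsed to direct scoring
        for k in range(1, len(chain) + 1):    # backpropagation along the path
            v = stats.setdefault(chain[:k], [0, 0.0])
            v[0] += 1; v[1] += r
    full = [(k, v) for k, v in stats.items() if len(k) == depth] or list(stats.items())
    return max(full, key=lambda kv: kv[1][1] / kv[1][0])[0]

# Toy usage with random embeddings standing in for sentence encodings.
rng = np.random.default_rng(0)
kb = [rng.standard_normal(16) for _ in range(12)]
ctx, outcome = rng.standard_normal(16), rng.standard_normal(16)
sub = coarse_filter(kb, ctx, threshold=-1.0)  # keep everything for the toy run
print(mcts_fine_search(sub, kb, outcome))
```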
Lastly, to address the low prediction accuracy and efficiency of Large Language Models (LLMs) on complex reasoning tasks, we propose Bi-Chainer, a bidirectional chaining method that assists language models in such tasks. Bi-Chainer dynamically switches to depth-first reasoning in the opposite reasoning direction when it encounters multiple branching options in the current direction, so that intermediate reasoning results can serve as guidance to facilitate the reasoning process. We show that Bi-Chainer achieves sizable accuracy boosts over unidirectional chaining frameworks on challenging logical reasoning datasets. Moreover, Bi-Chainer improves the accuracy of intermediate proof steps and reduces the average number of inference calls, resulting in more efficient and accurate reasoning.
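The control flow can be illustrated on symbolic Horn-style rules, with the caveat that Bi-Chainer itself drives an LLM rather than a rule engine: the sketch below chains forward from the facts and, when the number of applicable rules exceeds a branching threshold, switches to depth-first backward chaining from the goal, reusing the facts already derived. The rule format and threshold are illustrative assumptions.

```python
# Hypothetical sketch: direction-switching control flow over Horn-style rules.
def forward_step(facts, rules):
    # Rules applicable now: all premises already proven, conclusion still new.
    return [r for r in rules if set(r["premises"]) <= facts
            and r["conclusion"] not in facts]

def backward_prove(goal, facts, rules, depth=8):
    # Depth-first backward chaining: reduce the goal to already-proven facts.
    if goal in facts:
        return True
    if depth == 0:
        return False
    return any(r["conclusion"] == goal and
               all(backward_prove(p, facts, rules, depth - 1) for p in r["premises"])
               for r in rules)

def bi_chain(facts, rules, goal, branch_limit=2):
    facts = set(facts)
    while True:
        applicable = forward_step(facts, rules)
        if not applicable:
            return goal in facts
        if len(applicable) > branch_limit:
            # Too many forward branches: switch direction and work back from
            # the goal, using the intermediate facts derived so far as guidance.
            return backward_prove(goal, facts, rules)
        for r in applicable:
            facts.add(r["conclusion"])
        if goal in facts:
            return True

rules = [
    {"premises": ["a"], "conclusion": "b"},
    {"premises": ["b"], "conclusion": "c"},
    {"premises": ["c"], "conclusion": "goal"},
]
print(bi_chain({"a"}, rules, "goal"))  # True
```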
To summarize, in this thesis we propose a series of innovative frameworks designed to significantly enhance the text generation capabilities of language models by integrating structured knowledge bases and intrinsic reasoning mechanisms. Specifically, we develop a hierarchical structured knowledge retriever, then propose a reasoning-aware knowledge retriever that couples knowledge retrieval with reasoning outcomes, and finally propose a bidirectional chaining method to assist language models in complex reasoning tasks. Together, these contributions address and overcome the limitations of language models that rely primarily on static datasets and probability-based outputs.
| Date of Award | 23 Jan 2025 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Linqi SONG (Supervisor) |