
Information Retrieval in the Era of Large Language Models: Understanding, Generalization, and Trustworthiness

  • ZHAO, Xiangyu (Principal Investigator / Project Coordinator)
  • Li, Qing (Co-Investigator)
  • Xu, Jianliang (Co-Investigator)
  • YIN, Dawei (Co-Investigator)

Project: Research

Project Details

Description

In the digital era, information retrieval (IR) systems are essential for navigating vast amounts of information, powering applications like search engines and e-commerce. Over 90% of online experiences begin with search engines like Google, Microsoft, and Baidu, driving billions of daily interactions. Additionally, government programs, including US-FAI, UK-RI and EU-Horizon, invest significantly in IR advancements, underscoring their technical and economic importance.

Despite their importance, current IR systems face several major challenges: 1) Insufficient Understanding in Query Rewriting: Traditional methods struggle to accurately understand and refine user queries, leading to inconsistent and suboptimal retrieval outcomes. 2) Inadequate Generalization in Retrieval and Ranking: Existing designs with separate retrievers and rankers limit their generalization ability across tasks, hampering system flexibility and scalability. 3) Limited Trustworthiness in Answer Generation: Conventional solutions grapple with limited access to external knowledge, reducing answer accuracy and trustworthiness.

To address these challenges, this proposal aims to revolutionize IR systems with Large Language Models (LLMs) to enhance retrieval accuracy, consistency, flexibility, and trustworthiness. Leveraging LLMs' advanced capabilities in text understanding, generalization, knowledge integration, and generation, the project proposes four key research tasks: 1) Query Rewriter: Design an LLM-powered query rewriting framework to improve user intent understanding and align downstream retrieval modules, ensuring retrieval effectiveness and consistency. 2) Retriever and Ranker: Integrate retrieval and ranking tasks via LLMs, bridging semantic gaps and simplifying system design and maintenance. 3) Answer Generator: Develop an active retrieval-augmented generation framework to integrate external knowledge iteratively, ensuring accurate, trustworthy answers. 4) Module Unification: Unify the above IR modules into a cohesive LLM-driven framework, boosting overall system performance and efficiency.

Project Significance and Feasibility:

This proposal's key intellectual merit lies in offering the first comprehensive investigation of LLM-based IR systems, bridging traditional IR's limitations with advanced LLMs for improved search quality. To achieve this, we will systematically investigate bottlenecks in key IR modules, leading to four innovative research directions (Tasks 1-4, with validated feasibility), and commercial validation (Task 5). Success will yield 1) next-generation LLM-powered IR systems, and 2) open-source prototypes for broader use. This project's impact extends globally, transforming IR paradigms and enhancing user satisfaction, platform profitability, and technological advancement, ultimately benefiting academia, industry, and society.

Our interdisciplinary team, with over 3,000 publications, 70,000 citations, and 20 best paper awards, combines expertise in IR, LLMs, and trustworthy AI. Partnerships with Baidu, Tencent, Ant Group, and Amazon, who plan to deploy our outputs, further strengthen the project's practical impact and feasibility.
Project number: 9043858
Grant type: GRF
Status: Active
Effective start/end date: 1/01/26 → …
