Abstract
Assessing the factuality of text generated by large language models (LLMs) is an emerging yet crucial research area, aimed at alerting users to potential errors and guiding the development of more reliable LLMs. Nonetheless, the evaluators that assess factuality themselves need suitable evaluation to gauge progress and foster advancements. This direction remains under-explored, which substantially impedes the progress of factuality evaluators. To mitigate this issue, we introduce a benchmark for Factuality Evaluation of large Language Models, referred to as FELM. In this benchmark, we collect responses generated by LLMs and annotate factuality labels in a fine-grained manner. In contrast to previous studies that primarily concentrate on the factuality of world knowledge (e.g., information from Wikipedia), FELM focuses on factuality across diverse domains, spanning from world knowledge to math and reasoning. Our annotation is based on text segments, which can help pinpoint specific factual errors. The factuality annotations are further supplemented by predefined error types and reference links that either support or contradict the statement. In our experiments, we investigate the performance of several LLM-based factuality evaluators on FELM, including both vanilla LLMs and those augmented with retrieval mechanisms and chain-of-thought processes. Our findings reveal that while retrieval aids factuality evaluation, current LLMs are still far from satisfactory at faithfully detecting factual errors.
| Original language | English |
|---|---|
| Title of host publication | 37th Conference on Neural Information Processing Systems (NeurIPS 2023) |
| Editors | A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, S. Levine |
| Pages | 44502-44523 |
| ISBN (Electronic) | 9781713899921 |
| Publication status | Published - Dec 2023 |
| Event | 37th Conference on Neural Information Processing Systems (NeurIPS 2023), New Orleans Ernest N. Morial Convention Center, New Orleans, United States. Duration: 10 Dec 2023 → 16 Dec 2023. https://papers.nips.cc/paper_files/paper/2023, https://nips.cc/Conferences/2023 |
Publication series
| Name | Advances in Neural Information Processing Systems |
|---|---|
| Volume | 36 |
| ISSN (Print) | 1049-5258 |
Conference
| Conference | 37th Conference on Neural Information Processing Systems (NeurIPS 2023) |
|---|---|
| Abbreviated title | NIPS '23 |
| Place | United States |
| City | New Orleans |
| Period | 10 Dec 2023 → 16 Dec 2023 |
Bibliographical note
Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).