Building a construction law knowledge repository to enhance general-purpose large language model performance on domain question-answering: a case of China

  • Shenghua Zhou
  • Hongyu Wang
  • S. Thomas Ng
  • Dezhi Li*
  • Shenming Xie
  • Kaiwen Chen
  • Wentao Wang

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

Abstract

Purpose – Smart question-answering (QA) for construction laws (CLs) holds significant promise for helping domain professionals with legal inquiries. Existing studies of construction law question-answering (CLQA) rely on learning-based models, which require extensive training data and cover only a narrow QA scope. Meanwhile, general-purpose large language models (GPLLMs) have great potential for CLQA but lack domain-specific knowledge. This study proposes a data-driven and expertise-based approach to developing a construction law knowledge repository (CLKR) and validates its effectiveness in enhancing the CLQA performance of GPLLMs.
Design/methodology/approach – The methodology comprises (1) recognizing 702 candidate CL documents from 374,992 official judgments, (2) building a CLKR with 387 filtered documents covering eight CL knowledge areas, (3) integrating the CLKR with seven representative GPLLMs and (4) constructing a 2,140-question CLQA dataset from the 2014–2023 Professional Construction Engineer Qualification Examinations (PCEQEs) to compare the CLQA performance of seven pairs of GPLLMs with and without the CLKR.
Findings – The CLKR significantly enhances the CLQA performance of all seven GPLLMs, yielding an average accuracy increase of 21.1%, with individual improvements ranging from 9.9% to 44.9%. The CLKR also raises accuracy on single-answer questions by 14.9% and on multiple-answer questions by 38.3%. Across the eight CL knowledge areas, the accuracy gains range from 14.5% to 28.2%.
Originality/value – This study proposes an approach for developing the CLKR as an external knowledge base to empower GPLLMs, significantly expanding the scope of CLQA while bypassing the complex training required by traditional learning-based models. Moreover, this study confirms the effectiveness of the CLKR in augmenting GPLLM performance and offers a reusable CLQA test dataset as a benchmark.
© 2025 Shenghua Zhou, Hongyu Wang, S. Thomas Ng, Dezhi Li, Shenming Xie, Kaiwen Chen and Wentao Wang. Published by Emerald Publishing Limited.
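
The with/without-CLKR comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the `Question` class, the function names, and the toy data are all hypothetical, and the exact-match scoring rule for multiple-answer questions is an assumption, since the abstract does not state how such questions were scored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """A multiple-choice question (hypothetical structure for illustration)."""
    qid: str
    correct: frozenset  # correct option letters, e.g. {"A"} or {"B", "D"}


def is_correct(predicted: str, question: Question) -> bool:
    # Assumed rule: an answer counts only if the chosen options match the
    # correct set exactly, for single- and multiple-answer questions alike.
    return frozenset(predicted) == question.correct


def accuracy(questions: list, predictions: dict) -> float:
    """Share of questions answered correctly; unanswered ones count as wrong."""
    if not questions:
        return 0.0
    hits = sum(is_correct(predictions.get(q.qid, ""), q) for q in questions)
    return hits / len(questions)


def accuracy_gain(questions: list, baseline: dict, augmented: dict) -> float:
    """Accuracy with the knowledge repository minus accuracy without it."""
    return accuracy(questions, augmented) - accuracy(questions, baseline)


# Toy data, invented for this sketch (not taken from the paper's dataset).
questions = [
    Question("q1", frozenset("A")),    # single-answer
    Question("q2", frozenset("C")),    # single-answer
    Question("q3", frozenset("BD")),   # multiple-answer
    Question("q4", frozenset("ACD")),  # multiple-answer
]
baseline = {"q1": "A", "q2": "B", "q3": "B", "q4": "AC"}    # model alone
augmented = {"q1": "A", "q2": "C", "q3": "BD", "q4": "AD"}  # model + repository

print(f"{accuracy_gain(questions, baseline, augmented):.2f}")  # prints 0.50
```

Under this rule a partially correct multiple-answer response (such as `q4` above) scores zero, which is one reason multiple-answer accuracy can be more sensitive to missing domain knowledge than single-answer accuracy.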
Original language: English
Pages (from-to): 518–546
Number of pages: 29
Journal: Engineering, Construction and Architectural Management
Volume: 32
Issue number: 13
Online published: 1 May 2025
Publication status: Published - 15 Dec 2025

Funding

This study was financially supported by the National Natural Science Foundation of China (No. 72201057) and the Social Science Foundation of Jiangsu Province (No. 23GLC020).

Research Keywords

  • Construction laws
  • Knowledge repository
  • Large language models
  • Question-answering
