
Tailored Vision-Language Solutions for Comprehensive Hazard Identification on Construction Sites

Qihua Chen, Xianfei Yin*

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

Abstract

The identification of hazards is critical for mitigating accident risks on construction sites. Although current automatic recognition methods based on computer vision have demonstrated significant success, their performance can be limited by inadequate generalization capabilities and by difficulty in handling dynamic, complex scenes, leading to incomplete detection of the various types of construction hazards. This study introduces a novel approach that leverages a tailored vision-language model (VLM) to enhance comprehensive hazard identification on construction sites. By capitalizing on the cross-modal understanding and robust generalization abilities of VLMs, this method aims to improve both the accuracy and comprehensiveness of construction hazard identification. To validate this approach, we constructed a specialized dataset comprising 1,139 images across 31 fine-grained categories of construction hazards for fine-tuning and evaluation. The experimental results revealed that the fine-tuned Qwen2-VL-7B model achieved a precision of 0.856, a substantial improvement over the 0.495 obtained without fine-tuning. Moreover, fine-tuning significantly reduced prediction errors and enhanced the model’s capability to understand and identify construction hazards. In addition, advanced VLMs demonstrated greater application potential than traditional models such as CLIP. This study advances the accuracy and comprehensiveness of hazard identification on construction sites, offering a new technological perspective for integrated automatic hazard detection in construction environments. © 2025 International Association for Automation and Robotics in Construction.
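To make the reported figures concrete, the following is a minimal sketch of how a precision score over fine-grained hazard categories could be computed, together with the gain implied by the two numbers in the abstract. The abstract does not state which averaging scheme was used, so macro-averaging is an assumption here, and the function name and the example labels (`no_helmet`, `open_edge`) are hypothetical and not taken from the dataset.

```python
from collections import defaultdict


def macro_precision(y_true, y_pred):
    """Macro-averaged precision: mean over predicted classes of TP / (TP + FP).

    Assumption: the paper's averaging scheme is not given in the abstract;
    macro-averaging is used here purely for illustration.
    """
    true_positives = defaultdict(int)
    predicted_counts = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        predicted_counts[pred] += 1
        if truth == pred:
            true_positives[pred] += 1
    # Average only over classes that were predicted at least once.
    per_class = [true_positives[c] / n for c, n in predicted_counts.items()]
    return sum(per_class) / len(per_class) if per_class else 0.0


# Hypothetical labels, for illustration only.
y_true = ["no_helmet", "no_helmet", "open_edge", "open_edge"]
y_pred = ["no_helmet", "open_edge", "open_edge", "open_edge"]
score = macro_precision(y_true, y_pred)

# Precision values reported in the abstract.
baseline, fine_tuned = 0.495, 0.856
absolute_gain = fine_tuned - baseline    # about 0.361
relative_gain = absolute_gain / baseline  # about 0.729
```

On the reported values, fine-tuning corresponds to an absolute precision gain of about 0.361, or roughly 73% relative to the non-fine-tuned baseline.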
Original language: English
Title of host publication: Proceedings of the 42nd International Symposium on Automation and Robotics in Construction
Editors: Jiansong Zhang, Qian Chen, Gaang Lee, Vicente A. Gonzalez, Vineet R. Kamat
Publisher: International Association for Automation and Robotics in Construction (IAARC)
Pages: 548-555
Number of pages: 8
ISBN (Print): 9780645832228
Publication status: Published - Jul 2025
Event: 42nd International Symposium on Automation and Robotics in Construction (ISARC 2025) - Concordia University, Montreal, Canada
Duration: 28 Jul 2025 to 31 Jul 2025
https://www.iaarc.org/isarc-2025

Publication series

Name: Proceedings of the International Symposium on Automation and Robotics in Construction
ISSN (Electronic): 2413-5844

Conference

Conference: 42nd International Symposium on Automation and Robotics in Construction (ISARC 2025)
Abbreviated title: ISARC
Place: Canada
City: Montreal
Period: 28/07/25 to 31/07/25

Funding

The authors gratefully acknowledge the financial support provided by the National Natural Science Foundation of China (Grant No. 72404233), the Guangdong Basic and Applied Basic Research Foundation (Grant No. 2025A1515010190) and the New Faculty Start-up Grant from the City University of Hong Kong (Project No. 9610701).

Research Keywords

  • Construction management
  • Fine-tuning
  • Hazard identification
  • Vision-language model
