Abstract
Identifying hazards is critical for mitigating accident risks on construction sites. Although current automatic recognition methods based on computer vision have demonstrated significant success, their performance can be limited by inadequate generalization and by difficulty handling dynamic, complex scenes, which leads to incomplete detection of the various types of construction hazards. This study introduces a novel approach that leverages a tailored vision-language model (VLM) for comprehensive hazard identification on construction sites. By capitalizing on the cross-modal understanding and robust generalization abilities of VLMs, the method aims to improve both the accuracy and the comprehensiveness of construction hazard identification. To validate the approach, we constructed a specialized dataset of 1,139 images across 31 fine-grained categories of construction hazards for fine-tuning and evaluation. In the experiments, the fine-tuned Qwen2-VL-7B model achieved a precision of 0.856, a substantial improvement over the 0.495 precision of the same model without fine-tuning. Moreover, fine-tuning significantly reduced prediction errors and enhanced the model’s capability to understand and identify construction hazards. In addition, advanced VLMs demonstrated greater application potential than traditional models such as CLIP. This study advances the accuracy and comprehensiveness of hazard identification on construction sites, offering a new technological perspective for integrated automatic hazard detection in construction environments. © 2025 International Association for Automation and Robotics in Construction.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 42nd International Symposium on Automation and Robotics in Construction |
| Editors | Jiansong Zhang, Qian Chen, Gaang Lee, Vicente A. Gonzalez, Vineet R. Kamat |
| Publisher | International Association for Automation and Robotics in Construction (IAARC) |
| Pages | 548-555 |
| Number of pages | 8 |
| ISBN (Print) | 9780645832228 |
| Publication status | Published - Jul 2025 |
| Event | 42nd International Symposium on Automation and Robotics in Construction (ISARC 2025), Concordia University, Montreal, Canada. Duration: 28 Jul 2025 → 31 Jul 2025. https://www.iaarc.org/isarc-2025 |
Publication series
| Name | Proceedings of the International Symposium on Automation and Robotics in Construction |
|---|---|
| ISSN (Electronic) | 2413-5844 |
Conference
| Conference | 42nd International Symposium on Automation and Robotics in Construction (ISARC 2025) |
|---|---|
| Abbreviated title | ISARC |
| Place | Canada |
| City | Montreal |
| Period | 28/07/25 → 31/07/25 |
Funding
The authors gratefully acknowledge the financial support provided by the National Natural Science Foundation of China (Grant No. 72404233), the Guangdong Basic and Applied Basic Research Foundation (Grant No. 2025A1515010190) and the New Faculty Start-up Grant from the City University of Hong Kong (Project No. 9610701).
Research Keywords
- Construction management
- Fine-tuning
- Hazard identification
- Vision-language model
Fingerprint
Dive into the research topics of 'Tailored Vision-Language Solutions for Comprehensive Hazard Identification on Construction Sites'. Together they form a unique fingerprint.