Abstract
Traffic crash data is crucial for understanding crashes and developing effective crash prevention measures. Local police officers collect this data to address the five “W” questions of each crash (i.e., When?, Where?, What?, Who?, and Why?). The quality of safety research outcomes depends heavily on the accuracy and completeness of crash data. While under-reporting of crashes is a known issue, misreporting and incompleteness present even more serious challenges in crash databases. These issues can lead to inaccurate analysis and hinder the development of effective road safety measures. Therefore, it is essential to improve the quality of crash data to ensure reliable research outcomes and enhance road safety efforts.
To obtain more reliable crash data, researchers have proposed conducting in-depth crash investigations to gather additional information about crash scenarios. Crash narratives recorded by police officers in the crash database can serve as a valuable source of data. These narratives provide detailed textual descriptions of the crash process and contain information that can help address the incompleteness of crash data. Compared with commonly used tabular variables, crash narratives offer unique insights into the crash process. Furthermore, crash narratives are relatively easy to access, and advances in Natural Language Processing (NLP) and Artificial Intelligence (AI) have made analyzing and utilizing them more efficient and accessible.
The thesis focuses on enhancing the analysis of tabular crash data by incorporating text mining techniques and crash narratives. The aim is to address various aspects such as predicting missing information, identifying hidden crash causes, discovering new risk factors, and integrating potential correlations between crashes in injury severity analysis.
In the first study, the author focused on the issue of missing information in crash-related variables and the valuable information contained in crash narratives. The author recognized that previous studies often overlooked the detailed contents and writing structures within crash narratives. To address this, the author developed a deep-learning-based framework that utilizes a sentence-level text-mining technique to obtain missing information. The author’s framework incorporates the Bidirectional Encoder Representation from Transformers (BERT) model and the Graph Attention Network (GAT) model. The input features consist of sentence-level text features and their relative locations within the crash narratives. By integrating these models, the author’s framework achieves a high level of accuracy in variable imputation.
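As a rough illustration of the input described above (sentence-level text plus each sentence's relative location in the narrative), the following minimal Python sketch prepares such features. It is not the author's implementation: the splitting rule, the function names `sentence_features` and `chain_edges`, and the idea of linking adjacent sentences as graph edges are all assumptions made for illustration, and the BERT encoding and GAT model themselves are omitted.

```python
import re
from typing import Dict, List, Tuple


def sentence_features(narrative: str) -> List[Dict]:
    """Split a narrative into sentences and attach each one's relative location."""
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", narrative) if s.strip()]
    n = len(sentences)
    features = []
    for i, text in enumerate(sentences):
        # Relative location in [0, 1]: 0.0 is the first sentence, 1.0 the last.
        rel = i / (n - 1) if n > 1 else 0.0
        features.append({"index": i, "text": text, "relative_location": rel})
    return features


def chain_edges(n: int) -> List[Tuple[int, int]]:
    """Hypothetical graph structure: connect each sentence to the next one."""
    return [(i, i + 1) for i in range(n - 1)]


if __name__ == "__main__":
    example = (
        "Vehicle 1 was travelling north. "
        "Vehicle 2 ran the red light. "
        "Vehicle 2 struck Vehicle 1."
    )
    rows = sentence_features(example)
    for row in rows:
        print(row)
    print(chain_edges(len(rows)))
```

In a full pipeline, each sentence's text would be replaced by an embedding from a pretrained encoder, and the sentence nodes and edges would be passed to a graph model; those steps depend on choices the abstract does not specify.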
In the second study, the author focused on the limitations of the accident reporting form (ARF) in accurately recording crash data, particularly for state-related crashes. In China, the ARF allows only one crash cause to be reported, selected from a list of pre-specified crash cause codes. This limitation can lead to inaccuracies, especially for state-related crashes where multiple causes may be involved. To address this issue, the author employed natural language processing (NLP) and deep learning models to analyze 1,624 state-related crash narratives. These narratives are detailed written accounts provided by responding officers, containing valuable free-form information associated with the crash occurrence. Among the models tested, the text-CNN model classified the narratives most accurately. These findings highlight the potential of crash narratives for identifying the actual causative factors behind inaccurately designated crash causes.
In the third study, the author acknowledged the limitations of using mainstream police-reported crash data for injury analysis, as it may lack detailed information on crash occurrences. To bridge this gap, the author collected crash narrative data from the Crash Investigation Sampling System (CISS) and utilized it for injury analysis. The findings of this study underscore the potential of crash narrative analysis as a complementary or alternative approach to conventional injury studies. By incorporating crash narrative data, researchers can gain a more comprehensive understanding of motor vehicle crashes, surpassing the insights derived solely from tabulated police-reported crash data.
The fourth study aimed to investigate injuries sustained by bus passengers as a result of non-collision incidents. The lack of complete incident circumstances poses a challenge for studying non-collision injuries, as police records are primarily designed for collision-related data. To address this challenge, the study utilized a dataset of 12,823 narratives recorded by the police over a ten-year period in Hong Kong. The results of the study demonstrated the potential of topic modeling in uncovering the characteristics of non-collision injuries sustained by bus passengers. Additionally, it highlighted the dynamic nature of topic prevalence and the interplay between different topics. The study suggests that future research should consider topic modeling as a promising tool for standardizing the extraction of narrative texts in injury surveillance.
In the fifth study, the author focused on the valuable information contained in crash narratives and the potential of text mining techniques in road safety research. The study proposed a novel hybrid approach that incorporates unstructured crash narrative data through text mining in injury severity analysis. The two-stage approach begins by applying the structural topic modeling method to identify thematic concepts and their correlations based on crash narrative data. This study highlights the importance of incorporating unstructured crash narrative data in injury severity analysis and showcases the effectiveness of the proposed hybrid approach in capturing meaningful insights from both structured and unstructured data sources. Overall, this research contributes to the growing body of knowledge on leveraging text mining techniques for extracting valuable information from crash narratives and their application in road safety research.
The framework proposed in each study contributes to the existing literature by providing new approaches for crash narrative data mining and information extraction. For instance, one study addressed the issue of missing information in crash reports, while another uncovered previously undisclosed crash causes. These findings not only enhance the quality of crash data but also offer fresh insights into risk factors and improve the accuracy of crash severity analyses. Overall, the research broadens our understanding of crash narratives and their potential for improving road safety research and interventions.
Date of Award | 19 Jul 2023 |
---|---|
Original language | English |
Awarding Institution | |
Supervisor | Qingpeng ZHANG (Supervisor) & Helai HUANG (External Supervisor) |
Keywords
- Road crash
- Text mining
- Factors identification
- Characteristics of crashes