Application of explainable machine learning for real-time safety analysis toward a connected vehicle environment

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

38 Scopus Citations

Author(s)

  • Chen Yuan
  • Ye Li
  • Helai Huang
  • Honggang Wang

Detail(s)

Original language: English
Article number: 106681
Journal / Publication: Accident Analysis and Prevention
Volume: 171
Online published: 22 Apr 2022
Publication status: Published - Jun 2022

Abstract

Because traffic flow data and traffic conflicts are difficult to obtain simultaneously, conflict-based analysis using macroscopic traffic features has received much less attention. This research aims to analyze real-time safety through a disaggregate study and to explore the benefits of connected vehicles (CVs) for real-time safety evaluation. To avoid the endogeneity problem between conflicts and traffic features in regression models, machine learning is employed to obtain a reliable and practical real-time safety model. The results show that Random Forest outperforms the eXtreme Gradient Boosting, Support Vector Machine and Adaptive Boosting models, achieving the best performance with the highest AUC of 0.827. To gain a deeper understanding of conflict mechanisms, the explainable machine learning method SHAP (SHapley Additive exPlanation) is introduced to improve model interpretability and provide insights into the impacts of traffic flow features. The difference in average speed between lanes is found to have the most significant impact on real-time safety. Speed variation, the proportion of trucks and traffic volume are also associated with conflict occurrence. Further analysis highlights that the impacts of traffic features are heterogeneous and that specific patterns of paired features may affect real-time safety. Encouragingly, SHAP appears able to complement traditional models with random components in revealing heterogeneity. Explainable machine learning can also provide a solid basis for discretizing continuous variables, whereas previous studies have performed discretization mainly based on prior knowledge and experience. The experiment on CV Market Penetration Rate (CV-MPR) demonstrates that model performance improves gradually as the penetration rate increases. The initial stage of the CV market (20% and 40% CV-MPR) yields the most significant gains in real-time safety evaluation. These findings can benefit active traffic management.
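SHAP attributes a model's prediction to its input features using Shapley values. The sketch below is a minimal illustration of that idea. It is not the authors' model and not the `shap` package. It computes exact Shapley values by brute force for a hypothetical risk-scoring function. The feature names, coefficients, input values and the all-zero baseline are assumptions chosen only for illustration. "Absent" features are set to a single baseline value, which is one simple convention. Practical SHAP explainers instead estimate these values against background data or, for tree ensembles such as Random Forest, compute them from the tree structure.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names loosely echoing the abstract; not the paper's variables.
FEATURES = ["speed_diff_lanes", "speed_std", "truck_prop", "volume"]


def toy_conflict_score(x):
    """Hypothetical risk score standing in for a trained classifier's output.

    The product term models a paired-feature interaction, so attributions
    are not simply the individual linear terms.
    """
    return (0.5 * x["speed_diff_lanes"]
            + 0.3 * x["speed_std"]
            + 0.2 * x["truck_prop"]
            + 0.1 * x["volume"]
            + 0.4 * x["speed_diff_lanes"] * x["truck_prop"])


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every coalition of the other features.

    Features outside a coalition take their baseline value. Cost grows as 2**n,
    so this is only practical for a handful of features.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for i in names:
        others = [f for f in names if f != i]
        total = 0.0
        for k in range(n):  # k = size of the coalition that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                with_i = {f: x[f] if (f in coalition or f == i) else baseline[f] for f in names}
                without_i = {f: x[f] if f in coalition else baseline[f] for f in names}
                total += weight * (model(with_i) - model(without_i))
        phi[i] = total
    return phi


# Hypothetical standardized feature values for one observation interval.
x = {"speed_diff_lanes": 1.2, "speed_std": 0.8, "truck_prop": 0.5, "volume": 1.0}
baseline = {f: 0.0 for f in FEATURES}

phi = shapley_values(toy_conflict_score, x, baseline)
for name, value in phi.items():
    print(f"{name}: {value:.3f}")
# speed_diff_lanes: 0.720
# speed_std: 0.240
# truck_prop: 0.220
# volume: 0.100
```

In this toy case the interaction term contributes 0.24 in total, and Shapley values split it evenly between the two features involved. The attributions also sum to the gap between the prediction and the baseline score, which is the additivity property that SHAP relies on.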

Research Area(s)

  • Connected vehicles, Machine learning, Market penetration rate, Real-time safety, SHAP