Examining nonlinearity in population inflow estimation using big data: An empirical comparison of explainable machine learning models

  • Songhua Hu
  • Chenfeng Xiong
  • Peng Chen
  • Paul Schonfeld*

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

Abstract

Mobile device location data (MDLD) contain population-representative, fine-grained travel demand information, creating opportunities to validate established relations between travel demand and underlying factors from a big data perspective. Using the nationwide census block group (CBG)-level population inflow derived from MDLD as a proxy for travel demand, this study examines its relations with various factors including socioeconomics, demographics, land use, and CBG attributes. A host of tree-based machine learning (ML) models and interpretation techniques (feature importance, partial dependence plot (PDP), accumulated local effect (ALE), and SHapley Additive exPlanations (SHAP)) are extensively compared to determine the best model architecture and justify interpretation robustness. Empirical results show that: 1) Boosting trees perform the best among all models, followed by bagging trees, single trees, and linear regressions. 2) Feature importance holds consistently among different tree-based models but is influenced by measures of importance and hyperparameter settings. 3) Pronounced nonlinearities, threshold effects, and interaction effects are observed in relations between population inflow and most of its determinants. 4) Compared with PDP, ALE and SHAP plots are more reliable in the presence of outliers, feature dependency, and local heterogeneity. Taken together, techniques introduced in this study can either be integrated into customary travel demand models to enhance model accuracy or serve as interpretation tools that offer a comprehensive understanding of intricate relations. © 2023 Elsevier Ltd. All rights reserved.
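
To illustrate why PDP and ALE can disagree when features are dependent, the following is a minimal sketch in plain Python. It is not the authors' code or data: `toy_model`, the synthetic two-feature sample, and the fixed-width bins are all assumptions made for illustration, and the ALE curve is left uncentered for brevity (common implementations use quantile bins and center the curve).

```python
import random


def partial_dependence(predict, rows, feature, grid):
    """PDP: average prediction with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        preds = [predict(r[:feature] + [value] + r[feature + 1:]) for r in rows]
        curve.append(sum(preds) / len(preds))
    return curve


def accumulated_local_effects(predict, rows, feature, edges):
    """Uncentered first-order ALE evaluated at each bin edge."""
    curve = [0.0]
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Only rows whose feature value actually falls in this bin are used.
        in_bin = [r for r in rows if lo <= r[feature] < hi]
        if in_bin:
            diffs = [
                predict(r[:feature] + [hi] + r[feature + 1:])
                - predict(r[:feature] + [lo] + r[feature + 1:])
                for r in in_bin
            ]
            effect = sum(diffs) / len(diffs)
        else:
            effect = 0.0
        curve.append(curve[-1] + effect)
    return curve


def toy_model(row):
    # Hypothetical stand-in for a fitted model: a pure interaction term.
    return row[0] * row[1]


# Synthetic sample (assumption): feature 1 closely tracks feature 0.
rng = random.Random(0)
rows = []
for _ in range(2000):
    x0 = rng.random()
    x1 = min(1.0, max(0.0, x0 + rng.gauss(0, 0.05)))
    rows.append([x0, x1])

edges = [i / 10 for i in range(11)]
pdp = partial_dependence(toy_model, rows, 0, edges)
ale = accumulated_local_effects(toy_model, rows, 0, edges)

for edge, p, a in zip(edges, pdp, ale):
    print(f"x0={edge:.1f}  PDP={p:.3f}  ALE={a:.3f}")
```

In this toy setting the PDP is a straight line, because it pairs every forced value of feature 0 with the full, unconditional spread of feature 1, including combinations that never occur in the sample. The ALE curve bends upward, because each bin only uses rows that actually lie in it.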
Original language: English
Article number: 103743
Journal: Transportation Research Part A: Policy and Practice
Volume: 174
Online published: 21 Jun 2023
DOIs
Publication status: Published - Aug 2023
Externally published: Yes

Funding

The authors extend their gratitude to Dr. Xinyu (Jason) Cao from the University of Minnesota and Dr. Yiqun Xie from the University of Maryland for their valuable suggestions. The authors also express their appreciation to the anonymous reviewers for their insightful comments and suggestions, which have been instrumental in improving the quality of this manuscript.

Research Keywords

  • Explainable machine learning
  • Interpretability
  • Mobile device location data
  • Nonlinearity
  • Travel demand
