Enhancing flood susceptibility predictions by using certainty factor in non-flood selection: a case study of Guangdong Province with four tree-based machine learning models

Jian Yang, Sixiao Chen, Zhongdong Duan*, Yanan Tang, Ping Lu

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

2 Citations (Scopus)

Abstract

Flood susceptibility assessment is a critical component of effective flood risk management. Traditional machine learning (ML) models for flood susceptibility often rely on randomly selected non-flood samples, which can compromise the accuracy of the assessment results. In this study, we introduced the Certainty Factor (CF) method to improve the selection of non-flood samples, thereby improving the precision and reliability of flood inventory datasets. To evaluate the effectiveness of this approach, we created two flood inventory datasets for Guangdong Province: Dataset A, containing 1427 flood samples and 1427 CF-based non-flood samples, and Dataset B, containing 1427 flood samples and 1427 randomly selected non-flood samples. Four tree-based ML models were trained and tested on these two datasets, and the trained models were then used to produce flood susceptibility maps for Guangdong Province. Our comparative analysis demonstrated that all tree-based ML models trained on Dataset A significantly outperformed those trained on Dataset B, raising Kappa by 34.35%, Accuracy by 13.36%, and area under the ROC curve (AUC) by 7.97%. The generated flood susceptibility maps reveal that approximately 27% of Guangdong Province is at moderate to very high flood risk, with high-risk areas concentrated in the Pearl River Basin. We also identified the 10 counties at the highest flood risk, providing guidance for targeted flood risk mitigation efforts. Overall, this research underscores the importance of refining flood inventory data for more reliable flood susceptibility predictions and offers a methodological framework that can be applied to other regions facing similar challenges.
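
The abstract does not give the CF formulation, so the following is only a minimal sketch of how CF-based non-flood selection might look, assuming the form of the certainty factor commonly used in susceptibility studies: for one class of a conditioning factor, `pp_a` is the share of cells in that class that flooded and `pp_s` is the share of flooded cells in the whole study area. The function names, the single-factor setup, the threshold of 0, and the cell counts are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def certainty_factor(pp_a: float, pp_s: float) -> float:
    """Certainty factor in [-1, 1] for one class of a conditioning factor.

    pp_a: conditional probability of flooding within the class.
    pp_s: prior probability of flooding over the whole study area.
    Assumed (commonly used) form; the paper's exact definition may differ.
    """
    if not (0.0 <= pp_a <= 1.0) or not (0.0 < pp_s < 1.0):
        raise ValueError("pp_a must be in [0, 1] and pp_s in (0, 1)")
    if pp_a >= pp_s:
        # Class floods at least as often as the area average: CF >= 0.
        return (pp_a - pp_s) / (pp_a * (1.0 - pp_s))
    # Class floods less often than the area average: CF < 0.
    return (pp_a - pp_s) / (pp_s * (1.0 - pp_a))


def class_certainty_factors(
    flood_cells: Dict[str, int], total_cells: Dict[str, int]
) -> Dict[str, float]:
    """Compute CF for every class from flooded-cell and total-cell counts."""
    pp_s = sum(flood_cells.values()) / sum(total_cells.values())
    return {
        name: certainty_factor(flood_cells[name] / total_cells[name], pp_s)
        for name in total_cells
    }


def non_flood_candidate_classes(
    cf_by_class: Dict[str, float], threshold: float = 0.0
) -> List[str]:
    """Classes with CF below the threshold: candidate areas for non-flood samples."""
    return [name for name, cf in cf_by_class.items() if cf < threshold]


# Hypothetical counts for three elevation classes (illustration only).
total_cells = {"low": 1000, "mid": 2000, "high": 2000}
flood_cells = {"low": 60, "mid": 30, "high": 10}

cf_by_class = class_certainty_factors(flood_cells, total_cells)
candidates = non_flood_candidate_classes(cf_by_class)
```

In this sketch, non-flood points would be drawn only from classes whose CF is negative, rather than uniformly at random from all unflooded cells. How CF values are combined across several conditioning factors, and which threshold is applied, would have to follow the full paper.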

© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2025
Original language: English
Pages (from-to): 3123–3146
Number of pages: 24
Journal: Stochastic Environmental Research and Risk Assessment
Volume: 39
Online published: 2 Jun 2025
Publication status: Published - Jul 2025

Funding

Financial support from the National Key R&D Program of China (Grant Number 2023YFC3805203), the National Natural Science Foundation of China (Grant Number 52239008), the Shenzhen Science and Technology Program (Grant Number KQTD20210811090112003), and the State Funded Postdoctoral Fellowships Program (Grant Number GZB20230966) is gratefully acknowledged.

Research Keywords

  • Flood susceptibility
  • Certainty factor
  • Non-flood samples
  • Tree-based machine learning
