Abstract
Cervical cancer (CC) is a major cause of mortality in women, and its survival rates have stagnated, highlighting the need for improved prognostic models. This study develops and compares machine learning models for predicting five-year cause-specific survival (CSS) in CC patients and evaluates their performance against traditional methods such as the Cox proportional hazards model. Using data from the Surveillance, Epidemiology, and End Results (SEER) program, we applied the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance and used stepwise forward selection, feature importance, and permutation importance for feature selection. The Gradient Boosting Survival Analysis (GBSA) model outperformed the other models, with an inverse probability of censoring weighted (IPCW) concordance index of 0.835 and an integrated Brier score of 0.120. SHAP value analysis identified tumor stage and surgical resection as key factors. These findings address a critical gap in CSS prediction for CC patients and offer insights for clinical decision-making and personalized treatment. The GBSA model provides more accurate survival predictions, aiding clinicians in tailoring treatment strategies to improve patient outcomes. However, the retrospective study design, potential SEER data entry errors, and the lack of genetic markers and detailed treatment protocols should be considered when interpreting the results. © The Author(s) 2025.
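For readers unfamiliar with the concordance index reported above, the following is a minimal sketch of the plain (Harrell) concordance index for right-censored data. It is not the authors' code and it is not the IPCW variant used in the study, which additionally weights each comparable pair by the inverse of the estimated censoring survival probability. The function name `concordance_index` and the toy data are illustrative assumptions.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs that the risk score ranks correctly.

    times  -- observed follow-up time per patient
    events -- True if the event (death from the cancer) was observed, False if censored
    risks  -- predicted risk score; higher means shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the
        # earlier patient actually experienced the event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # earlier event had the higher predicted risk
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the second one censored.
times = [5, 10, 15, 20]
events = [True, False, True, True]
risks = [0.9, 0.4, 0.1, 0.6]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the reported value of 0.835.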
| Original language | English |
|---|---|
| Article number | 22465 |
| Number of pages | 13 |
| Journal | Scientific Reports |
| Volume | 15 |
| Online published | 2 Jul 2025 |
| Publication status | Published - 2025 |
Funding
This work was supported by the City University of Hong Kong’s New Research Initiatives/Infrastructure Support from Central (APRC), grant number 9610401.
Publisher's Copyright Statement
- This full text is made available under CC-BY 4.0. https://creativecommons.org/licenses/by/4.0/