TY - JOUR
T1 - Learn to explain the smile
T2 - An interpretable hybrid machine learning model to understand the implied volatility of CSI 300 options
AU - Li, Pengshi
AU - Huang, Jinbo
AU - Lin, Yan
PY - 2026/2
Y1 - 2026/2
AB - We propose an interpretable hybrid machine learning framework for forecasting and explaining implied volatility surface dynamics of CSI 300 index options. Our methodology leverages machine learning to correct a theory-based baseline model. Initial predictions are derived from an analytical model, while the second stage involves a machine learning model trained on the residuals of the first stage. We construct three variants of hybrid models using XGBoost: a baseline three-feature model, a VIX-augmented four-feature model, and a five-feature model incorporating a newly developed options-implied ambiguity index. Empirical results using 2019–2025 CSI 300 options data show that the five-feature model significantly outperforms both the analytical benchmark and the VIX-only model. Performance improvements are especially pronounced in market rallies and high-ambiguity regimes, where ambiguity attenuates implied volatility compression and amplifies perceptions of downside risk. We further use SHAP value analysis to demonstrate that feature effects are economically coherent and state-dependent. Our findings confirm that ambiguity is a distinct and quantitatively meaningful risk factor for explaining implied volatility dynamics in an emerging market.
KW - Ambiguity
KW - CSI 300 options
KW - Implied volatility surface
KW - Interpretable machine learning
UR - http://www.scopus.com/inward/record.url?scp=105029718999&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-105029718999&origin=recordpage
U2 - 10.1016/j.pacfin.2025.103038
DO - 10.1016/j.pacfin.2025.103038
M3 - RGC 21 - Publication in refereed journal
SN - 0927-538X
VL - 96
JO - Pacific-Basin Finance Journal
JF - Pacific-Basin Finance Journal
M1 - 103038
ER -