TY - JOUR
T1 - Assessing the Nonlinear Effect of Atmospheric Variables on Primary and Oxygenated Organic Aerosol Concentration Using Machine Learning
AU - Qin, Yiming
AU - Ye, Jianhuai
AU - Ohno, Paul
AU - Liu, Pengfei
AU - Wang, Junfeng
AU - Fu, Pingqing
AU - Zhou, Liyuan
AU - Li, Yong Jie
AU - Martin, Scot T.
AU - Chan, Chak K.
PY - 2022/4/21
Y1 - 2022/4/21
AB - Organic aerosol (OA) accounts for a significant fraction of atmospheric particulate matter. The OA concentration in the atmosphere is highly variable and depends on factors such as emission, the atmospheric oxidation process, meteorology, and transport. Due to the complex interactions among these numerous factors, accurately estimating the effects of target variables on OA concentration is often challenging. Herein, a random forest machine learning algorithm successfully predicted the concentrations of primary and oxygenated organic aerosol (POA and OOA) at urban and rural sites in Hong Kong. The random forest model explained more than 80% of the variance in observed traffic-POA, cooking-POA, and OOA. In contrast, a multiple linear regression model explained only 30-50% of these OA concentrations. In the random forest model training process, NOx was the most important variable for both traffic-POA and cooking-POA. For OOA, multiple parameters were equally crucial in the model prediction, including NOx, O3, and relative humidity (RH). The dependence of OA concentrations on atmospheric conditions (e.g., various NOx and O3 concentrations and meteorological conditions) was calculated via the partial dependence algorithm. The results suggested that the dependence of OA concentrations on atmospheric conditions was nonlinear and varied across condition regimes. The partial dependence algorithm provides insights into POA sources and OOA formation mechanisms in a complex environment.
KW - atmospheric variables
KW - machine learning
KW - nonlinear effect
KW - organic aerosol
KW - partial dependence
UR - http://www.scopus.com/inward/record.url?scp=85126598054&partnerID=8YFLogxK
UR - https://www.scopus.com/record/pubmetrics.uri?eid=2-s2.0-85126598054&origin=recordpage
U2 - 10.1021/acsearthspacechem.1c00443
DO - 10.1021/acsearthspacechem.1c00443
M3 - RGC 21 - Publication in refereed journal
SN - 2472-3452
VL - 6
SP - 1059
EP - 1066
JO - ACS Earth and Space Chemistry
JF - ACS Earth and Space Chemistry
IS - 4
ER -