Transfer learning for long-interval consecutive missing values imputation without external features in air pollution time series

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) · 21_Publication in refereed journal · Peer-review

18 Scopus Citations

Author(s)

  • Jun Ma
  • Jack C.P. Cheng
  • Yuexiong Ding
  • Changqing Lin
  • Feifeng Jiang
  • Mingzhu Wang
  • Chong Zhai

Detail(s)

Original language: English
Article number: 101092
Journal / Publication: Advanced Engineering Informatics
Volume: 44
Online published: 3 Apr 2020
Publication status: Published - Apr 2020

Abstract

Air pollution has become one of the world's largest health and environmental problems, and studies on air quality prediction, influential factor analysis, and control policy evaluation are increasing. Such studies require valid, high-quality air pollution data to produce reasonable results, so missing data, which frequently occur in collected raw data, have become a significant barrier. Existing methods for handling missing data either cannot effectively capture the temporal and spatial mechanisms of air pollution or focus on sequences with low missing rates and randomly positioned gaps. To address this problem, this paper proposes a new imputation methodology, transferred long short-term memory-based iterative estimation (TLSTM-IE), to impute consecutive missing values with large missing rates. A case study is conducted in New York City to verify the effectiveness and superiority of the proposed methodology, in which long-interval consecutive missing PM2.5 concentration data are filled. Experimental results show that the proposed model can effectively learn long-term dependencies and transfer the learned knowledge. The imputation accuracy of the TLSTM-IE model is 25–50% higher than that of other commonly used methods. The novelty of this study lies in two aspects. First, it targets long-interval consecutive missing data, which existing studies in atmospheric research have not addressed. Second, it applies transfer learning to missing value imputation; to the best of our knowledge, no previous air quality research has applied this technique to this problem.
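The abstract names iterative estimation as the mechanism for filling long consecutive gaps. Assuming that this means recursive one-step-ahead prediction (each estimate is fed back as input for the next step), the loop can be sketched as below. This is not the authors' TLSTM-IE implementation: the abstract does not specify the network, window length, or transfer procedure, so `iterative_impute` and `window_mean_predictor` are hypothetical names, and the window-mean predictor is only a placeholder where a trained (and, in the paper, transferred) LSTM would go.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical one-step predictor type: maps the last `window` values to the next value.
Predictor = Callable[[Sequence[float]], float]


def window_mean_predictor(history: Sequence[float]) -> float:
    """Placeholder for a trained model (e.g. an LSTM); here simply the window mean."""
    return sum(history) / len(history)


def iterative_impute(series: List[Optional[float]], window: int,
                     predict: Predictor) -> List[float]:
    """Fill missing entries (None) left to right, one step at a time.

    Each imputed value is appended to the history and reused as input for
    the next prediction, so a long consecutive gap is filled recursively.
    """
    filled: List[float] = []
    for i, value in enumerate(series):
        if value is not None:
            # Observed value: keep it unchanged.
            filled.append(float(value))
            continue
        if len(filled) < window:
            raise ValueError(f"need {window} values before index {i} to predict")
        # Predict from the most recent `window` values, which may include earlier estimates.
        filled.append(predict(filled[-window:]))
    return filled
```

For example, with `series = [10.0, 12.0, 14.0, None, None, None, 20.0]` and `window=3`, the first missing value is the mean of 10, 12 and 14 (12.0), while the next two are computed partly from earlier estimates. This feedback is a known drawback of recursive multi-step estimation: errors can accumulate across a long gap, so the quality of the one-step predictor matters.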

Research Area(s)

  • Air quality, Deep learning, Long short-term memory (LSTM), Long-interval consecutive missing values, Neural network, Transfer learning

Citation Format(s)

Transfer learning for long-interval consecutive missing values imputation without external features in air pollution time series. / Ma, Jun; Cheng, Jack C.P.; Ding, Yuexiong; Lin, Changqing; Jiang, Feifeng; Wang, Mingzhu; Zhai, Chong.

In: Advanced Engineering Informatics, Vol. 44, 101092, 04.2020.
