Using internet search data to predict new HIV diagnoses in China: a modelling study

Qingpeng Zhang*, Yi Chai, Xiaoming Li, Sean D Young, Jiaqi Zhou

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

21 Citations (Scopus)
70 Downloads (CityUHK Scholars)

Abstract

Objectives: Internet data are an important and abundant source of information on HIV epidemics and risk factors. A number of case studies have found an association between internet searches and outbreaks of infectious diseases, including HIV. In this research, we examined the feasibility of using search query data to predict the number of new HIV diagnoses in China.

Design: We identified a set of search queries that are associated with new HIV diagnoses in China. We developed statistical models (a negative binomial generalised linear model and its Bayesian variants) to estimate the number of new HIV diagnoses from Baidu search query data and official statistics (for the entire country and for Guangdong province) covering 7 years (2010 to 2016).

Results: Search query data were positively associated with the number of new HIV diagnoses in China and in Guangdong province. Experiments demonstrated that incorporating search query data could improve prediction performance in nowcasting and forecasting tasks.

Conclusions: Baidu data can be used to predict the number of new HIV diagnoses in China down to the province level. This study demonstrates the feasibility of using search query data to predict new HIV diagnoses. Results could potentially facilitate timely evidence-based decision making and complement conventional programmes for HIV prevention.
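
The negative binomial generalised linear model named under Design can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single predictor (a scaled search index), a log link, and a fixed dispersion parameter `alpha`, and it fits the model by iteratively reweighted least squares. The function name `fit_nb_glm`, the variable names, and all numbers are hypothetical; the counts are synthetic values generated inside the script, not Baidu or surveillance data.

```python
import math


def fit_nb_glm(x, y, alpha=0.1, iterations=50, tol=1e-10):
    """Fit log(mu) = b0 + b1 * x by IRLS for a negative binomial GLM.

    The dispersion alpha is held fixed (an assumption of this sketch),
    so Var(y) = mu + alpha * mu**2.
    """
    mean_y = sum(y) / len(y)
    # Start from a smoothed version of the observed counts.
    eta = [math.log((yi + mean_y) / 2.0) for yi in y]
    b0 = b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(e) for e in eta]
        # IRLS weights for the log link under the variance function above.
        w = [m / (1.0 + alpha * m) for m in mu]
        # Working response: linearises the log link around the current fit.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # Weighted least squares via the 2x2 normal equations.
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        eta = [b0 + b1 * xi for xi in x]
        if converged:
            break
    return b0, b1


# Synthetic illustration only: 12 monthly values of a scaled search index
# and counts generated from a known log-linear relationship.
search_index = [i / 11 for i in range(12)]
diagnoses = [round(math.exp(5.0 + 1.5 * s)) for s in search_index]

b0, b1 = fit_nb_glm(search_index, diagnoses)
# Estimated count for a hypothetical month with search index 0.5.
predicted = math.exp(b0 + b1 * 0.5)
print(f"intercept={b0:.3f} slope={b1:.3f} predicted={predicted:.1f}")
```

A positive fitted slope corresponds to the kind of positive association reported under Results. In practice the dispersion would be estimated from the data rather than fixed, and a statistics library would normally replace the hand-written fitting loop.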

Original language: English
Article number: e018335
Journal: BMJ Open
Volume: 8
Issue number: 10
Online published: 17 Oct 2018
DOIs
Publication status: Published - Oct 2018

Research Keywords

  • health informatics
  • internet
  • predictive model
  • search query
  • surveillance

Publisher's Copyright Statement

  • This full text is made available under CC-BY-NC 4.0. https://creativecommons.org/licenses/by-nc/4.0/
