Modeling methods for residential energy consumption and user behaviors of online-targeted display advertising
Student thesis: Doctoral Thesis
This dissertation presents modeling methods for two research problems: residential energy consumption (REC) and user click-through behaviors towards online-targeted display advertising (OTDA). Understanding REC by modeling household-level survey data is important for governments, energy corporations, and home appliance manufacturers. National residential energy consumption surveys (RECS) collect household-level data with stratified random sampling schemes. RECS data consequently have a natural and explicit multilevel structure, reflected in the geographical clustering of households. To handle this multilevel structure, we introduce a multilevel regression model. This approach divides overall REC variation into two sources, area-level and household-level variation, and explains them with environmental and household-level variables, respectively. With an explained variance proportion (EVP) of 53% (82% of area-level variation and 47% of household-level variation), the proposed multilevel regression model outperforms previous models, e.g., traditional linear regression models. Furthermore, the multilevel regression model largely reduces the impact of area-level variation on the estimated effects of the factors influencing REC.
We also introduce regularized linear models (RLMs) with the elastic net penalty to model REC. The purpose is to identify significant factors among all usable variables in the RECS micro-datasets. This approach imposes no prior theory on variable selection. The elastic net penalty is a compromise between the ridge-regression and LASSO penalties. It makes it possible to handle more than 500 RECS variables with complicated multicollinearity in one model. With the U.S. 2009 RECS dataset, we develop a series of RLMs with the elastic net penalty. All constructed RLMs simultaneously assign non-zero effects to 98 selected variables. The best-fit RLM, which explains 65% of the total variability, outperforms most previous models in the literature.
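As an illustration of how the elastic net penalty sits between the ridge-regression and LASSO penalties, the following minimal sketch may help. It assumes the widely used glmnet-style parameterization, with a strength `lam` and a mixing weight `alpha`; the abstract does not state which parameterization the dissertation uses, and all function and parameter names here are hypothetical.

```python
def elastic_net_penalty(coefs, lam, alpha):
    """Elastic net penalty under an ASSUMED glmnet-style parameterization:
    lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2).
    alpha = 1 gives the LASSO penalty, alpha = 0 the ridge penalty."""
    l1 = sum(abs(b) for b in coefs)
    l2_squared = sum(b * b for b in coefs)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)


def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def coordinate_update(rho, lam, alpha):
    """One coordinate-descent update for a single coefficient.

    Assumes the predictor is standardized (mean 0, mean square 1) and that
    rho is the average of predictor values times the partial residuals.
    The L1 part can set the coefficient exactly to zero (variable selection);
    the L2 part shrinks whatever survives.
    """
    return soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
```

Under these assumptions, a weakly related variable (small `rho`) receives a coefficient of exactly zero, which is how a penalty of this kind can reduce several hundred candidate variables to a much smaller selected set.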
OTDA, as a new mode of online display advertising, has developed rapidly due to its capability to target potential customers. This dissertation addresses OTDA from the perspective of publishers. As many management problems inherently involve optimization and statistical modeling, we develop a two-step method to forecast user click-through behaviors towards OTDA, so as to control uncertainties in formulating allocation plans for OTDA resources. As groundwork for this method, we introduce a finite mixture regression model, namely an arbitrary-points-inflated (API) Poisson regression model. With an offset in the Poisson component, this model can handle count data with an arbitrary number of inflated points and link clicks with page views. We develop algorithms for estimating parameters, for adaptively choosing the best-fit API Poisson regression model according to the Bayesian information criterion (BIC), and for obtaining the corresponding Hessian matrix. The two-step forecasting method involves a modeling step and a predicting step. It can forecast user clicks for matches between advertising requests and candidate allocation plans, based on data observed in the current period. The modeling step segments the current-period data into sub-samples of adequate size and constructs sub-models using an adaptive API Poisson regression algorithm. The predicting step provides two predicting schemes and selects the one with the higher per-campaign prediction accuracy as the final scheme to forecast user clicks in the next period. Moreover, the proposed method computes quickly and yields robust parameter estimates. We apply this two-step forecasting method to forecast user clicks towards OTDA for a social network site. The empirical results show that our approaches achieve higher accuracy than previous methods such as logistic regression and truncated logistic regression.
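The API Poisson regression model described above can be illustrated with a minimal sketch. The abstract does not give the model's exact form, so the mixture below is an assumption: point masses at a chosen set of inflated counts, mixed with a Poisson component whose mean uses log(page views) as an offset under a log link. All function and parameter names are hypothetical, and the BIC uses the standard definition.

```python
import math


def poisson_pmf(y, mu):
    """Poisson probability of count y with mean mu > 0 (computed in log space)."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


def click_mean(page_views, x, beta):
    """ASSUMED mean of the Poisson component: a log link with offset
    log(page_views), i.e. mu = page_views * exp(x . beta)."""
    linear_predictor = sum(xi * bi for xi, bi in zip(x, beta))
    return page_views * math.exp(linear_predictor)


def api_poisson_pmf(y, mu, inflated_points, weights):
    """ASSUMED mixture form: weight w_k of extra mass on each inflated point,
    and the remaining weight 1 - sum(w_k) on the Poisson component."""
    poisson_weight = 1.0 - sum(weights)
    prob = poisson_weight * poisson_pmf(y, mu)
    for point, weight in zip(inflated_points, weights):
        if y == point:
            prob += weight
    return prob


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate a better trade-off
    between fit and model size."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)
```

In a sketch of this kind, candidate models with different sets of inflated points (for example only 0, or both 0 and 1) could each be fitted and then compared by `bic`, which mirrors the adaptive model choice the abstract describes without reproducing the dissertation's actual algorithm.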
The ensemble-predicting scheme achieves higher accuracy in forecasting non-zero clicks than the campaign-to-campaign predicting scheme. The model involving page views has the smallest prediction error among all the alternative models considered. Finally, we present a brief discussion on forecasting page views and suggest a further extension of the API Poisson regression to model count data that follow distributions other than the Poisson.
- Methodology, Internet advertising, Households, Household surveys, Linear models (Statistics), Energy consumption