Building Recommendation Systems using Mobile Data Analytics

推薦系統中移動數據分析

Student thesis: Doctoral Thesis

Author(s)

  • Yan LYU

Related Research Unit(s)

Detail(s)

Awarding Institution
Supervisors/Advisors
Award date: 8 Sep 2016

Abstract

The increasing use of location tracking services, such as GPS, has led to the accumulation of a large volume of mobile data. These data provide comprehensive information for understanding citywide human mobility, including the unique travel behavior of individuals and the collective mobility behavior of the population. They can be used to facilitate various applications, especially recommendation systems. In this thesis, we investigate three typical types of mobile data, namely check-in data of social media users, trajectories of taxi passengers, and cell tower data dumps, and study how to utilize them for recommendation systems. The major work is outlined as follows.
Firstly, we investigate the check-in data (i.e., point of interest (POI) visiting histories) of social media users and propose a personalized POI recommendation framework called iMCRec. In iMCRec, we first develop three preference models to estimate a user’s preferences regarding the geographical positions, categories, and attributes of POIs, respectively. To combine the geographical, category, and attribute criteria with personalized weights for different users, we propose an iterative process based on multi-criteria decision making (MCDM). To improve system efficiency, an alternative-filtering process is designed to prune candidate POIs. We also propose a learning strategy to learn a user’s personal weight on each criterion. Extensive experiments are conducted on two real-world data sets collected from Yelp. Experimental results show that iMCRec not only outperforms state-of-the-art POI recommendation techniques but also provides a more effective and flexible trade-off mechanism than other multi-criteria-based techniques.
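As a rough illustration of combining several criteria with personalized weights, the sketch below filters candidate POIs and ranks the rest by a plain weighted sum. A weighted sum is one of the simplest MCDM schemes and is used here only as a stand-in; it is not the iterative process of iMCRec, whose details the abstract does not give. The function names, the pruning rule, the threshold, and all scores and weights are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical per-criterion preference scores in [0, 1] for one candidate POI.
Scores = Dict[str, float]


def filter_candidates(candidates: Dict[str, Scores],
                      min_score: float = 0.1) -> Dict[str, Scores]:
    """Prune POIs that score below min_score on any criterion (assumed rule)."""
    return {poi: s for poi, s in candidates.items()
            if min(s.values()) >= min_score}


def rank_pois(candidates: Dict[str, Scores],
              weights: Dict[str, float],
              top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank POIs by the weighted sum of their criterion scores."""
    total = sum(weights.values())  # normalize so weights need not sum to 1
    scored = [
        (poi, sum(weights[c] * s[c] for c in weights) / total)
        for poi, s in candidates.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Made-up example data: three candidate POIs and one user's weights.
candidates = {
    "cafe":   {"geo": 0.9, "category": 0.6,  "attribute": 0.5},
    "museum": {"geo": 0.4, "category": 0.9,  "attribute": 0.8},
    "bar":    {"geo": 0.8, "category": 0.05, "attribute": 0.9},
}
user_weights = {"geo": 0.5, "category": 0.3, "attribute": 0.2}

ranking = rank_pois(filter_candidates(candidates), user_weights)
```

With these made-up numbers, "bar" is pruned because of its low category score, and a user who weights geography most heavily sees "cafe" ranked ahead of "museum".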
Secondly, we explore the travel patterns of passengers from taxi trajectory data and recommend bus lines for customized bus (CB) systems with a bus line planning framework called T2CBS. A customized bus system is an emerging public transportation mode that aims to provide direct and efficient transit services for groups of commuters with similar travel demand. T2CBS first discovers similar travel demand from taxi trajectories with a clustering algorithm and then applies a bus stop deployment algorithm to place CB stops at the pick-up and drop-off locations of the trajectory clusters. A probability model is proposed to estimate the probability that a taxi passenger switches to a CB. With this probability model, CB lines that achieve the maximum total daily profit are generated by a routing algorithm, a timetabling algorithm, and a merging scheme. Extensive experiments are conducted on one month of taxi trajectory data from Nanjing, China. Experimental results demonstrate that T2CBS generates CB lines with higher profit than baseline methods. The study of travel experience shows that the CB lines generated by T2CBS can provide efficient transit services with short walking distances and small departure time adjustments. The moderate increase in travel time is substantially outweighed by the fare savings.
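To make the notion of "similar travel demand" concrete, the following sketch groups taxi trips whose pick-up cell, drop-off cell, and departure time slot coincide. This grid-based grouping is a deliberately simple substitute for the clustering algorithm of T2CBS, which the abstract does not specify. The trip format, cell size, slot length, minimum group size, and coordinates are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed trip format:
# (pickup_lat, pickup_lon, dropoff_lat, dropoff_lon, departure_minute_of_day)
Trip = Tuple[float, float, float, float, int]
Key = Tuple[int, int, int, int, int]


def demand_key(trip: Trip, cell_deg: float = 0.01, slot_min: int = 30) -> Key:
    """Map a trip to its (origin cell, destination cell, time slot) key."""
    o_lat, o_lon, d_lat, d_lon, depart = trip
    return (
        math.floor(o_lat / cell_deg), math.floor(o_lon / cell_deg),
        math.floor(d_lat / cell_deg), math.floor(d_lon / cell_deg),
        depart // slot_min,
    )


def find_demand_clusters(trips: List[Trip], min_trips: int = 2) -> List[List[Trip]]:
    """Group trips with the same key; keep groups with at least min_trips."""
    groups: Dict[Key, List[Trip]] = defaultdict(list)
    for trip in trips:
        groups[demand_key(trip)].append(trip)
    return [group for group in groups.values() if len(group) >= min_trips]


# Made-up trips: the first two share origin cell, destination cell, and slot.
trips = [
    (32.0512, 118.7803, 32.0921, 118.8011, 485),
    (32.0518, 118.7808, 32.0926, 118.8015, 490),
    (32.0312, 118.7603, 32.0621, 118.8211, 600),
]
clusters = find_demand_clusters(trips)
```

Each surviving group marks a candidate origin-destination pair and departure window where several passengers could share one vehicle. A fixed grid splits trips that fall just across a cell boundary, which is one reason a real system would use a proper clustering algorithm.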
Finally, we investigate the problem of finding the correlation between the collective behavior of mobile users and the distribution of points of interest (POIs) in a city. Specifically, we use large-scale data dumps collected from cell towers and POIs extracted from a popular social network service, Weibo. This part of the thesis is dedicated to predicting the POI densities of different regions in the covered area using the two data sources. A prediction result can be used as a recommendation for opening a new store or branch. The crux of our contribution is the method of representing the collective behavior of mobile users in each region as a histogram of connection counts over a period of time. This representation enables us to apply supervised learning to train a POI prediction model, using the POI data set as the ground truth. We study 12 state-of-the-art classification and regression algorithms; experimental results demonstrate the feasibility and effectiveness of the proposed method.
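The histogram representation can be sketched as follows: connection records are binned per region by hour of day, and each region's histogram is then treated as a feature vector for a supervised learner. The record format, the 24 hourly bins, and the density labels are assumptions. The one-nearest-neighbour predictor is only a toy stand-in for the 12 algorithms studied in the thesis, which the abstract does not name.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed record format: (region_id, hour_of_day) for one cell tower connection.
Record = Tuple[str, int]


def connection_histograms(records: Iterable[Record],
                          n_bins: int = 24) -> Dict[str, List[int]]:
    """Count connections per region in each hourly bin."""
    hist: Dict[str, List[int]] = {}
    for region, hour in records:
        hist.setdefault(region, [0] * n_bins)[hour % n_bins] += 1
    return hist


def predict_density(hist: Dict[str, List[int]],
                    labelled: Dict[str, str],
                    query: str) -> str:
    """Return the label of the labelled region with the closest histogram."""
    def sq_dist(a: List[int], b: List[int]) -> int:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = min(labelled, key=lambda r: sq_dist(hist[r], hist[query]))
    return labelled[nearest]


# Made-up records: regions A and C are busy at 9:00 and 18:00, B is quiet.
records = (
    [("A", 9)] * 3 + [("A", 18)] * 2
    + [("B", 2)]
    + [("C", 9)] * 3 + [("C", 18)]
)
hist = connection_histograms(records)
# Hypothetical ground-truth POI density classes for the labelled regions.
labelled = {"A": "high", "B": "low"}
prediction = predict_density(hist, labelled, "C")
```

In this toy data, region C's connection pattern is closest to that of region A, so C inherits A's density label.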