Analyzing driving factors of land values in urban scale based on big data and non-linear machine learning techniques

Research output: Journal Publications and Reviews (RGC: 21, 22, 62) › 21_Publication in refereed journal › peer-review

37 Scopus Citations



Original language: English
Article number: 104537
Journal / Publication: Land Use Policy
Online published: 27 Feb 2020
Publication status: Published - May 2020


Land value plays a vital role in the real estate market and is a critical reference for urban planners when reallocating land resources and introducing policies. Studying the factors that influence land value helps explain its spatial-temporal variation and supports the design of effective control policies. Many scholars have therefore studied the spatial and temporal relationships between land value and its possible influential factors from both macro and micro perspectives. However, most existing studies rely on linear assumptions and suffer from multicollinearity in their models. A limited set of features and the lack of a feature selection procedure are two further common limitations. To address these gaps, this paper adopts non-linear machine learning (ML) methods to investigate the factors influencing land value per square foot, based on “big data” for New York City. More than one thousand potential factors are considered, covering land attributes, points of interest, demographics, housing, and economic, education, and social characteristics. These are then filtered using a feature selection method, Recursive Feature Elimination (RFE). Six ML algorithms are evaluated and compared: Random Forest (RF), Gradient Boosting Decision Tree (GBDT), Multiple Linear Regression (MLR), Linear Support Vector Regression (SVR), Multilayer Perceptron (MLP) Regression, and K-Nearest Neighbor (KNN) Regression. The best-performing model, with an R-squared value of 0.933, is then used to calculate feature importance. Several important features are identified, including the number of newsstands and the vacant housing percentage.
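
The workflow summarized in the abstract (RFE-based feature selection, comparison of six regressors by R-squared, then feature importance from the best model) might be sketched as follows. This is a minimal illustration that assumes scikit-learn and uses synthetic data in place of the New York City data set. The sample sizes, hyperparameters, number of retained features, and the use of permutation importance are all assumptions made for the example and are not taken from the paper.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

# Synthetic stand-in for the land-value table (hypothetical sizes).
X, y = make_regression(n_samples=300, n_features=40, n_informative=8,
                       noise=10.0, random_state=0)

# Step 1: Recursive Feature Elimination. A random forest ranks features,
# and the 5 weakest are dropped per round until 10 remain (assumed values).
selector = RFE(RandomForestRegressor(n_estimators=30, random_state=0),
               n_features_to_select=10, step=5)
X_sel = selector.fit_transform(X, y)

# Step 2: compare the six regressor families named in the abstract.
# Scale-sensitive models are wrapped in a standardizing pipeline.
models = {
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "GBDT": GradientBoostingRegressor(random_state=0),
    "MLR": LinearRegression(),
    "LinearSVR": make_pipeline(StandardScaler(),
                               LinearSVR(max_iter=10000, random_state=0)),
    "MLP": make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(32,), max_iter=500,
                                      random_state=0)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
}
scores = {name: cross_val_score(model, X_sel, y, cv=3, scoring="r2").mean()
          for name, model in models.items()}
best_name = max(scores, key=scores.get)

# Step 3: feature importance from the best model. Permutation importance is
# used here because it works for any fitted estimator (an assumption; the
# paper does not state which importance measure it uses).
best_model = models[best_name].fit(X_sel, y)
result = permutation_importance(best_model, X_sel, y, n_repeats=5,
                                random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]  # most important first

print(f"best model: {best_name}, mean CV R^2 = {scores[best_name]:.3f}")
print("selected-feature indices by importance:", ranking.tolist())
```

On real data the ranking would be mapped back to column names through `selector.support_`. With more than one thousand candidate features, a larger `step` keeps the elimination rounds tractable.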

Research Area(s)

  • Big data, Land values per square foot, Machine learning, Place of interest, Recursive feature elimination