A Mixed Convex Penalized Machine Learning Approach for High-Dimensional Financial Data with its Application on Vast Portfolio Selections

Project: Research



A substantial amount of research has been devoted to covariance estimators, which play an indispensable role in economic, financial and statistical studies (Newey and West, 1987; Fan, Liu, Wang, et al., 2018). They are also crucial for practical applications such as risk management, portfolio allocation and regression analysis (Jagannathan and Ma, 2003; Fan, Zhang, and Yu, 2012). However, high-dimensional and high-frequency data, which are contaminated by microstructure noise and asynchronous trading, pose new challenges to covariance estimation. Kim, Wang, and Zou (2016) further document an economically significant trade-off among positive definiteness, unbiasedness, consistency and rate of convergence in currently available approaches, which limits their applicability in areas such as vast portfolio selection and evaluation.

This proposal intends to push current theoretical boundaries by proposing a novel covariance estimator via a machine learning (ML) approach. Assuming that the large covariance matrix can be decomposed into a low-rank matrix and a sparse one, our proposed method aims to provide unbiased, consistent and positive definite estimates of these two components in a one-step procedure. Our new method is fully nonparametric: rather than imposing linear factor models, we let the data directly determine the low-rank component. This avoids potential misspecification errors in constructing the data-generating process of the low-rank matrix, as well as possible errors in selecting the number of factors. Moreover, current approaches require the stock constituents of a large portfolio to be well diversified across industries, with prior knowledge of their industry classifications. Our method will overcome this limitation by allowing valid investigation of vast portfolios whose constituents have unobserved connections and come from concentrated industries.

Traditional ML methodologies are widely applied to i.i.d. data; this proposal will therefore bridge ML technologies and serially dependent financial data by developing a novel, efficient ML algorithm. We will also design the algorithm so that the initial bias-adjusted volatility matrix is not required to be positive semi-definite, which will enable the optimal rate of statistical convergence. In addition, we will derive a data-driven procedure for selecting the optimal tuning parameters, for ease of practical use.

As an application, we plan to apply our new ML approach to a large panel of asynchronously traded assets. We aim to consistently identify and estimate the systematic and idiosyncratic risk components of vast portfolios and to conduct out-of-sample portfolio evaluations. The success of our novel ML approach will help justify the role that machine learning techniques play in modern financial practice.
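The low-rank plus sparse assumption can be illustrated with a simple, generic baseline. This is not the proposed one-step ML estimator, whose details are not given here. The minimal Python sketch below assumes NumPy and simulated factor-driven returns. It takes the leading `rank` eigencomponents of the sample covariance as the low-rank part, soft-thresholds the off-diagonal entries of the residual to obtain the sparse part, and then forms global minimum-variance portfolio weights. The function names `low_rank_plus_sparse` and `min_variance_weights`, and the hand-picked `rank` and `threshold` values, are hypothetical. The proposal instead aims to let the data determine the low-rank component and to select tuning parameters in a data-driven way.

```python
import numpy as np


def low_rank_plus_sparse(returns: np.ndarray, rank: int, threshold: float):
    """Hypothetical helper (not the proposal's estimator).

    Splits the sample covariance of a T x p return panel into a rank-`rank`
    principal-component part plus a soft-thresholded residual.
    """
    sample_cov = np.cov(returns, rowvar=False)         # p x p sample covariance
    eigvals, eigvecs = np.linalg.eigh(sample_cov)      # eigenvalues in ascending order
    top = eigvecs[:, -rank:]                           # leading `rank` eigenvectors
    low_rank = top @ np.diag(eigvals[-rank:]) @ top.T  # low-rank component
    residual = sample_cov - low_rank
    # Soft-threshold every residual entry, then restore the untouched diagonal,
    # so only off-diagonal entries are shrunk toward zero (the sparse part).
    sparse = np.sign(residual) * np.maximum(np.abs(residual) - threshold, 0.0)
    np.fill_diagonal(sparse, np.diag(residual))
    return low_rank, sparse


def min_variance_weights(cov: np.ndarray) -> np.ndarray:
    """Global minimum-variance weights w = cov^{-1} 1 / (1' cov^{-1} 1)."""
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)
    return x / x.sum()


# Simulated, synchronously observed returns from a 3-factor model plus noise
# (an assumption: no microstructure noise, asynchronicity or serial dependence).
rng = np.random.default_rng(0)
T, p, k = 250, 50, 3
factors = rng.standard_normal((T, k))
loadings = rng.standard_normal((p, k))
returns = factors @ loadings.T + 0.5 * rng.standard_normal((T, p))

# `rank` and `threshold` are fixed by hand here, not chosen from the data.
low_rank, sparse = low_rank_plus_sparse(returns, rank=k, threshold=0.05)
cov_hat = low_rank + sparse

# Thresholding does not guarantee positive definiteness, so check before inverting.
min_eig = np.linalg.eigvalsh(cov_hat).min()
print("smallest eigenvalue of the estimate:", min_eig)
if min_eig > 0:
    w = min_variance_weights(cov_hat)
    print("sum of minimum-variance weights:", w.sum())
else:
    print("estimate is not positive definite; weights not computed")
```

The eigenvalue check matters because shrinking off-diagonal entries of a positive semi-definite matrix can produce an indefinite one. This is one concrete instance of the trade-off between positive definiteness and other estimator properties discussed above.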


Project number: 9042838
Grant type: GRF
Effective start/end date: 1/01/20 to 13/06/24