A Spatio-Temporal Diffusion Model for Missing and Real-Time Financial Data Inference

Yupeng Fang, Ruirui Liu, Huichou Huang*, Peilin Zhao, Qingyao Wu*

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works > RGC 32 - Refereed conference paper (with host publication) > peer-review

Abstract

Missing values and unreleased figures are common but highly important for backtesting and real-time analysis in the financial industry, yet underexploited in the existing literature. In this paper, we focus on the issue of empirical asset pricing, where the cross-section of future asset returns is a function of lagged firm characteristics that vary in time frequencies and missing ratios. Most of the existing imputation methods cannot fully capture the complex and evolving spatio-temporal relations among firm-level characteristics. In particular, these methods fail to explicitly consider the spatial relations and feature structure in the stock network where we have to process granular data of thousands of stocks and hundreds of characteristics for each stock. To address these challenges, we propose a spatio-temporal diffusion model (STDM) that gradually recovers the masked financial data conditioning on high-dimensional stock-and-characteristics historical data. We propose characteristic-specific projection to construct characteristic-level features at both ends of the STDM, meanwhile maintaining firm-level features in the middle of the STDM to largely reduce the computational memory. Moreover, along with the temporal attention, we design a spatial graph convolutional network, making it computationally efficient and effective to learn time-varying spatio-temporal interdependence across firms. We further employ an implicit sampler that greatly accelerates the inference procedure so that the STDM is able to produce high-quality point and density estimates of missing and real-time firm characteristics within a few steps. We evaluate our model on the most comprehensive open-source dataset 'OSAP' and generate state-of-the-art performance in extensive experiments. © 2024 ACM.
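The abstract does not say which implicit sampler the STDM uses. The sketch below assumes a DDIM-style deterministic update over a strided subset of timesteps, one common way to sample in a few steps. All names (`make_alpha_bar`, `implicit_impute`, `oracle_eps`), the noise schedule, and the toy data are hypothetical, and `oracle_eps` is only a stand-in for the trained network so that the mechanics can be checked: observed entries are held fixed as conditioning, and only the masked entries are denoised.

```python
import numpy as np


def make_alpha_bar(num_steps=50, beta_start=1e-4, beta_end=0.5):
    """Cumulative signal level of a linear beta schedule (illustrative values)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def implicit_impute(x_obs, obs_mask, eps_model, alpha_bar, num_sample_steps=5, seed=0):
    """Fill masked entries with a DDIM-style deterministic sampler (assumed form).

    x_obs:     array with observed values; masked entries hold any placeholder.
    obs_mask:  boolean array, True where a value is observed.
    eps_model: callable (x_t, x_obs, obs_mask, t) -> predicted noise.
    """
    rng = np.random.default_rng(seed)
    target = ~obs_mask
    # Start the masked entries from Gaussian noise; keep observed entries as given.
    x = np.where(target, rng.standard_normal(x_obs.shape), x_obs)
    # Strided timesteps, from the noisiest level down to the cleanest.
    steps = np.linspace(len(alpha_bar) - 1, 0, num_sample_steps).round().astype(int)
    for i, t in enumerate(steps):
        a_t = alpha_bar[t]
        a_prev = alpha_bar[steps[i + 1]] if i + 1 < len(steps) else 1.0
        eps = eps_model(x, x_obs, obs_mask, t)
        # Predict the clean signal, then move deterministically to the next level.
        x0_hat = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        x_next = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps
        # Only masked entries are updated; observed entries stay as conditioning.
        x = np.where(target, x_next, x_obs)
    return x


# Toy panel: 4 firms x 6 characteristics, roughly 30% of entries masked.
rng = np.random.default_rng(42)
x_true = rng.standard_normal((4, 6))
obs_mask = rng.random((4, 6)) > 0.3
x_obs = np.where(obs_mask, x_true, 0.0)  # 0.0 is only a placeholder
alpha_bar = make_alpha_bar()


def oracle_eps(x_t, x_obs, obs_mask, t):
    """Stand-in for a trained network: returns the exact noise given the truth."""
    a_t = alpha_bar[t]
    return (x_t - np.sqrt(a_t) * x_true) / np.sqrt(1.0 - a_t)


x_hat = implicit_impute(x_obs, obs_mask, oracle_eps, alpha_bar, num_sample_steps=5)
```

With a learned noise predictor in place of the oracle, calling `implicit_impute` with different `seed` values would give several draws per masked entry, from which a point estimate (for example the median) and an empirical density could be formed.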
Original language: English
Title of host publication: CIKM '24 - Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
Place of Publication: New York, NY
Publisher: Association for Computing Machinery
Pages: 602-611
ISBN (Print): 9798400704369
Publication status: Published - Oct 2024
Externally published: Yes
Event: 33rd ACM International Conference on Information and Knowledge Management (CIKM 2024) - Boise Centre, Boise, United States
Duration: 21 Oct 2024 to 25 Oct 2024
https://cikm2024.org/

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings
ISSN (Print): 2155-0751

Conference

Conference: 33rd ACM International Conference on Information and Knowledge Management (CIKM 2024)
Abbreviated title: CIKM '24
Place: United States
City: Boise
Period: 21/10/24 to 25/10/24

Research Keywords

  • diffusion model
  • financial data processing
  • missing value imputation
  • real-time nowcasting
