Selling Data To a Machine Learner: Pricing via Costly Signaling

Junjie Chen*, Minming Li*, Haifeng Xu*

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

16 Citations (Scopus)

Abstract

We consider a new problem of selling data to a machine learner who looks to purchase data to train his machine learning model. A key challenge in this setup is that neither the seller nor the machine learner knows the true quality of data. When designing a revenue-maximizing mechanism, a data seller faces the tradeoff between the cost and precision of data quality estimation. To address this challenge, we study a natural class of mechanisms that price data via costly signaling. Motivated by the assumption of i.i.d. data points as in classic machine learning models, we first consider selling homogeneous data and derive an optimal selling mechanism. We then turn to the sale of heterogeneous data, motivated by the sale of multiple data sets, and show that 1) on the negative side, it is NP-hard to approximate the optimal mechanism within a constant ratio e/(e+1) + o(1); while 2) on the positive side, there is a 1/k-approximate algorithm, where k is the number of the machine learner’s private types. © 2022 by the author(s).
Original language: English
Title of host publication: Proceedings of the 39th International Conference on Machine Learning
Editors: Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, Sivan Sabato
Publisher: ML Research Press
Pages: 3336-3359
Publication status: Published - Jul 2022
Event: 39th International Conference on Machine Learning (ICML 2022) - Hybrid, Baltimore, United States
Duration: 17 Jul 2022 - 23 Jul 2022
https://icml.cc/virtual/2022/index.html
https://icml.cc/Conferences/2022
https://proceedings.mlr.press/v162/

Publication series

Name: Proceedings of Machine Learning Research
Volume: 162
ISSN (Electronic): 2640-3498

Conference

Conference: 39th International Conference on Machine Learning (ICML 2022)
Place: United States
City: Baltimore
Period: 17/07/22 - 23/07/22
