Pareto Optimization for Active Learning under Out-of-Distribution Data Scenarios
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Detail(s)
Original language | English |
---|---|
Number of pages | 33 |
Journal / Publication | Transactions on Machine Learning Research |
Volume | 2023 |
Publication status | Published - Jun 2023 |
Link(s)
Link to Scopus | https://www.scopus.com/record/display.uri?eid=2-s2.0-86000050333&origin=recordpage |
---|---|
Permanent Link | https://scholars.cityu.edu.hk/en/publications/publication(e244dbae-9b71-4446-994a-d90702a4227c).html |
Abstract
Pool-based Active Learning (AL) has proven successful in minimizing labeling costs by sequentially selecting the most informative unlabeled data from a large pool and querying their labels from an oracle or annotators. However, existing AL sampling schemes may not perform well in out-of-distribution (OOD) data scenarios, where the unlabeled data pool contains samples that do not belong to the pre-defined categories of the target task. Achieving strong AL performance under OOD data scenarios presents a challenge due to the inherent conflict between AL sampling strategies and OOD data detection. For instance, both more informative in-distribution (ID) data and OOD data in an unlabeled data pool would be assigned high informativeness scores (e.g., high entropy) during AL processes. To address this dilemma, we propose a Monte-Carlo Pareto Optimization for Active Learning (POAL) sampling scheme, which selects an optimal subset of unlabeled samples with a fixed batch size from the unlabeled data pool. We formulate the AL sampling task as a multi-objective optimization problem and employ Pareto optimization based on two conflicting objectives: (1) the conventional AL sampling scheme (e.g., maximum entropy) and (2) the confidence of excluding OOD data samples. Experimental results demonstrate the effectiveness of our POAL approach on classical Machine Learning (ML) and Deep Learning (DL) tasks. © 2023, Transactions on Machine Learning Research. All rights reserved.
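The two-objective selection idea from the abstract can be illustrated with a toy sketch. This is not the paper's actual POAL algorithm (which performs Monte-Carlo Pareto optimization over fixed-size subsets); the function names, the front-peeling loop, and the random tie-breaking below are illustrative assumptions. Each candidate is scored on (1) informativeness (e.g., entropy) and (2) confidence that it is in-distribution, and non-dominated candidates are selected first:

```python
import numpy as np

def pareto_front(scores):
    """Return indices of non-dominated rows.

    scores: (n, 2) array; higher is better on both objectives.
    A row i is dominated if some row is >= on both objectives
    and strictly > on at least one.
    """
    n = scores.shape[0]
    dominated = np.zeros(n, dtype=bool)
    for i in range(n):
        better_eq = np.all(scores >= scores[i], axis=1)
        strictly = np.any(scores > scores[i], axis=1)
        dominated[i] = np.any(better_eq & strictly)
    return np.where(~dominated)[0]

def two_objective_select(entropy, id_confidence, batch_size, rng=None):
    """Toy batch selection: peel successive Pareto fronts over
    (informativeness, ID-confidence) until batch_size samples are chosen,
    breaking ties on the final front at random."""
    rng = np.random.default_rng(rng)
    scores = np.column_stack([entropy, id_confidence]).astype(float)
    remaining = np.arange(len(scores))
    chosen = []
    while len(chosen) < batch_size and len(remaining) > 0:
        front = pareto_front(scores[remaining])
        picks = remaining[front]
        if len(chosen) + len(picks) > batch_size:
            picks = rng.choice(picks, batch_size - len(chosen), replace=False)
        chosen.extend(picks.tolist())
        remaining = np.setdiff1d(remaining, picks)
    return sorted(chosen)
```

In this sketch, a high-entropy sample that is also confidently in-distribution dominates a high-entropy sample that looks OOD, capturing the conflict the abstract describes: raw informativeness alone would rank both equally.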
Bibliographic Note
Research Unit(s) information for this publication is provided by the author(s) concerned.
Citation Format(s)
Pareto Optimization for Active Learning under Out-of-Distribution Data Scenarios. / Zhan, Xueying (Co-first Author); Dai, Zeyu (Co-first Author); Wang, Qingzhong et al.
In: Transactions on Machine Learning Research, Vol. 2023, 06.2023.