Data-efficient Active Learning for Structured Prediction with Partial Annotation and Self-Training

Zhisong Zhang, Emma Strubell, Eduard Hovy

Research output: Chapters, Conference Papers, Creative and Literary Works; RGC 32 - Refereed conference paper (with host publication); peer-review

Abstract

In this work, we propose a pragmatic method that reduces the annotation cost for structured label spaces using active learning. Our approach leverages partial annotation, which reduces labeling costs for structured outputs by selecting only the most informative substructures for annotation. We also utilize self-training to incorporate the current model's automatic predictions as pseudo-labels for unannotated substructures. A key challenge in effectively combining partial annotation with self-training to reduce annotation cost is determining which substructures to select for labeling. To address this challenge, we adopt an error estimator to adaptively decide the partial selection ratio according to the current model's capability. In evaluations spanning four structured prediction tasks, we show that our combination of partial annotation and self-training using an adaptive selection ratio reduces annotation cost over strong full annotation baselines under a fair comparison scheme that takes reading time into consideration. © 2023 Association for Computational Linguistics.
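
As a rough illustration of the selection step the abstract describes, the sketch below splits a batch of sub-structures into those sent to an annotator and those kept with the model's own predictions as pseudo-labels. This page does not describe the paper's actual uncertainty measure or error estimator, so the least-confidence ranking and the rule "selection ratio = estimated error rate" are assumptions made for this example only. The names `Token` and `split_for_annotation` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    sent_id: int
    position: int
    predicted: str     # the model's current prediction for this sub-structure
    confidence: float  # the model's probability for that prediction, in [0, 1]


def split_for_annotation(tokens, estimated_error_rate):
    """Split sub-structures into (to_annotate, pseudo_labeled).

    Assumption (not taken from the paper): the selection ratio is set equal
    to the estimated error rate, so a weaker model sends a larger share of
    sub-structures to the annotator.
    """
    # Clamp the ratio to [0, 1] and convert it to a count.
    ratio = min(max(estimated_error_rate, 0.0), 1.0)
    n_select = round(ratio * len(tokens))
    # Least confident first: treated here as the most informative.
    ranked = sorted(tokens, key=lambda t: t.confidence)
    to_annotate = ranked[:n_select]
    # The remainder keeps the model's prediction as a pseudo-label.
    pseudo_labeled = [(t, t.predicted) for t in ranked[n_select:]]
    return to_annotate, pseudo_labeled


tokens = [
    Token(0, 0, "B-PER", 0.98),
    Token(0, 1, "O", 0.55),
    Token(0, 2, "O", 0.91),
    Token(1, 0, "B-LOC", 0.40),
    Token(1, 1, "O", 0.99),
]
to_annotate, pseudo = split_for_annotation(tokens, estimated_error_rate=0.4)
```

In an active learning loop, a split like this would be recomputed each round, so the annotated share shrinks as the estimated error rate falls.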
Original language: English
Title of host publication: Findings of the Association for Computational Linguistics
Subtitle of host publication: EMNLP 2023
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Publisher: Association for Computational Linguistics
Pages: 12991-13008
Number of pages: 18
ISBN (Print): 9798891760615
DOIs
Publication status: Published - Dec 2023
Externally published: Yes
Event: 2023 Findings of the Association for Computational Linguistics: EMNLP 2023 - Hybrid, Singapore
Duration: 6 Dec 2023 to 10 Dec 2023

Conference

Conference: 2023 Findings of the Association for Computational Linguistics: EMNLP 2023
Place: Singapore
City: Hybrid
Period: 6/12/23 to 10/12/23

Publisher's Copyright Statement

  • This full text is made available under CC-BY 4.0. https://creativecommons.org/licenses/by/4.0/
