Abstract
There exists a discrepancy between the token-level objective during training and the overall sequence-level quality that is expected from the model. This discrepancy leads to issues like exposure bias. To align the model with human expectations, sequence-level objectives are often used to fine-tune pre-trained models. In this paper, we introduce a contrastive preference model that enhances the traditional Plackett-Luce model by incorporating an indicator function. Building upon this novel preference model, we propose Contrastive Preference Learning (CPL), which uses offline samples with list-wise preferences to fine-tune a pre-trained model in Neural Machine Translation. Our experiments, conducted on three language pairs, demonstrate that CPL outperforms not only the vanilla Transformer model but also other token-level and sequence-level baselines. Furthermore, the ablation study highlights the essential role of the proposed indicator function in achieving this improvement. © 2024 Association for Computational Linguistics.
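
For readers unfamiliar with the Plackett-Luce model that the abstract builds on, the following is a minimal sketch of the standard (unmodified) list-wise log-likelihood. It is not the paper's CPL objective: the abstract does not specify the form of the indicator function, so it is left out here. The function name `plackett_luce_log_likelihood` and the example scores are hypothetical, and the scores are assumed to be arbitrary real-valued model scores (for example, sequence log-probabilities) listed from most to least preferred.

```python
import math

def plackett_luce_log_likelihood(scores_best_to_worst):
    """Log-probability of a ranking under the standard Plackett-Luce model.

    scores_best_to_worst: real-valued scores of the candidates, ordered from
    the most preferred candidate to the least preferred one (an assumption of
    this sketch, not a detail taken from the paper).
    """
    total = 0.0
    for k in range(len(scores_best_to_worst)):
        # Candidates that have not been ranked yet at step k.
        remaining = scores_best_to_worst[k:]
        # Numerically stable log-sum-exp over the remaining candidates.
        m = max(remaining)
        log_norm = m + math.log(sum(math.exp(s - m) for s in remaining))
        # Log-probability of picking candidate k first among the remaining ones.
        total += scores_best_to_worst[k] - log_norm
    return total

# Hypothetical scores for three candidate translations, best first.
agree = plackett_luce_log_likelihood([2.0, 1.0, 0.0])     # scores match the ranking
disagree = plackett_luce_log_likelihood([0.0, 1.0, 2.0])  # scores contradict it
uniform = plackett_luce_log_likelihood([0.0, 0.0, 0.0])   # all 3! orderings equally likely
```

A ranking that agrees with the scores receives a higher log-likelihood than one that contradicts them, and equal scores give every ordering of three candidates probability 1/6. Fine-tuning on list-wise preferences would, under this sketch's assumptions, maximize this quantity over the model parameters that produce the scores.
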
| Original language | English |
|---|---|
| Title of host publication | Findings of the Association for Computational Linguistics: NAACL 2024 - Findings |
| Subtitle of host publication | NAACL 2024 |
| Editors | Kevin Duh, Helena Gomez, Steven Bethard |
| Publisher | Association for Computational Linguistics |
| Pages | 2724-2735 |
| ISBN (Print) | 9798891761193 |
| Publication status | Published - Jun 2024 |
| Event | 2024 Annual Conference of the North American Association for Computational Linguistics (NAACL 2024) - Hybrid, Mexico City, Mexico; Duration: 16 Jun 2024 → 21 Jun 2024; https://aclanthology.org/volumes/2024.findings-naacl/ |
Publication series
| Name | Findings of the Association for Computational Linguistics: NAACL - Findings |
|---|---|
Conference
| Conference | 2024 Annual Conference of the North American Association for Computational Linguistics (NAACL 2024) |
|---|---|
| Place | Mexico |
| City | Mexico City |
| Period | 16/06/24 → 21/06/24 |
Bibliographical note
Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).
Publisher's Copyright Statement
- This full text is made available under CC-BY 4.0. https://creativecommons.org/licenses/by/4.0/