Abstract
In Neural Machine Translation, models are often trained with teacher forcing and suffer from exposure bias due to the discrepancy between training and inference. Current token-level solutions, such as scheduled sampling, aim to maximize the model's ability to recover from errors. Their loss functions have a side effect: a sequence containing errors may receive a higher probability than the ground truth, so the generated sequences may deviate from the ground truth. This side effect is verified in our experiments. To address this issue, we propose token-level contrastive learning to coordinate three training objectives: the usual MLE objective, an objective for recovering from errors, and a new objective that explicitly constrains the recovery to a scope that does not affect the ground truth. Our empirical analysis shows that this method effectively achieves these objectives during training and reduces how often the third objective is violated. Experiments on three language pairs (German-English, Russian-English, and English-Russian) show that our method outperforms the vanilla Transformer and other methods that address exposure bias. © 2024 The authors, © 2024 European Association for Machine Translation.
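The following is a minimal sketch of how three such objectives could be combined at the token level, written in PyTorch. It is an illustrative assumption rather than the paper's exact formulation: the function name `three_objective_loss`, the inputs `logits_tf`, `logits_ss`, `sampled`, `target`, and the `margin` hyperparameter are hypothetical, and the contrastive term is a simple margin that keeps the reference token's log-probability above that of the model's own (possibly erroneous) prediction under the scheduled-sampling prefix.

```python
import torch.nn.functional as F


def three_objective_loss(logits_tf, logits_ss, sampled, target, pad_id, margin=1.0):
    """Hypothetical sketch (not the paper's exact loss).

    logits_tf: decoder logits under the ground-truth (teacher-forced) prefix
    logits_ss: decoder logits under the scheduled-sampling prefix
    sampled:   tokens the model fed back to itself during scheduled sampling
    target:    reference tokens
    Shapes: logits are (batch, length, vocab); sampled/target are (batch, length).
    """
    mask = (target != pad_id).float()
    denom = mask.sum().clamp(min=1.0)
    logp_tf = F.log_softmax(logits_tf, dim=-1)
    logp_ss = F.log_softmax(logits_ss, dim=-1)

    # Objective 1: the usual MLE loss with teacher forcing.
    lp_gold_tf = logp_tf.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    mle = -(lp_gold_tf * mask).sum() / denom

    # Objective 2: recovery -- still predict the reference after an erroneous prefix.
    lp_gold_ss = logp_ss.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    recovery = -(lp_gold_ss * mask).sum() / denom

    # Objective 3: token-level contrast -- the reference token should not score
    # below the token the model actually produced, so recovery training does not
    # push erroneous sequences above the ground truth.
    lp_err_ss = logp_ss.gather(-1, sampled.unsqueeze(-1)).squeeze(-1)
    contrast = (F.relu(margin - (lp_gold_ss - lp_err_ss)) * mask).sum() / denom

    return mle + recovery + contrast
```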
Original language | English |
---|---|
Title of host publication | Proceedings of the 25th Annual Conference of the European Association for Machine Translation |
Editors | Carolina Scarton, Charlotte Prescott, Chris Bayliss, Chris Oakley, Joanna Wright, Stuart Wrigley, Xingyi Song, Edward Gow-Smith, Rachel Bawden, Víctor M Sánchez-Cartagena, Patrick Cadwell, Ekaterina Lapshinova-Koltunski, Vera Cabarrão, Konstantinos Chatzitheodorou, Mary Nurminen, Diptesh Kanojia, Helena Moniz |
Publisher | European Association for Machine Translation |
Pages | 68-79 |
Volume | 1: Research And Implementations & Case Studies |
ISBN (Print) | 978-1-0686907-0-9 |
Publication status | Published - Jun 2024 |
Event | 25th Annual Conference of the European Association for Machine Translation (EAMT 2024), University of Sheffield, Sheffield, United Kingdom. Duration: 24 Jun 2024 → 27 Jun 2024. https://eamt2024.sheffield.ac.uk/ |
Publication series
Name | Proceedings of the Annual Conference of the European Association for Machine Translation, EAMT |
---|---|
Conference
Conference | 25th Annual Conference of the European Association for Machine Translation (EAMT 2024) |
---|---|
Country/Territory | United Kingdom |
City | Sheffield |
Period | 24/06/24 → 27/06/24 |
Internet address | https://eamt2024.sheffield.ac.uk/ |
Bibliographical note
Research Unit(s) information for this publication is provided by the author(s) concerned.

Publisher's Copyright Statement
- This full text is made available under CC-BY-ND 4.0. https://creativecommons.org/licenses/by-nd/4.0/