Abstract
Machine translation (MT) is not only a practical application but also a fruitful task for machine learning research. For example, the Transformer, a deep neural network (DNN) architecture now widely used in large language models (LLMs), was originally invented for this task. Since its emergence, neural machine translation (NMT) has become the new state of the art with superior performance. However, it still has well-recognized open issues, such as the beam search curse and exposure bias. Mitigating these issues will not only improve the performance of MT but also deepen our understanding of DNNs.

This thesis analyzes the error modes behind these issues in NMT and investigates the calibration of an NMT model with Contrastive Learning. Calibration here refers to finetuning a pre-trained model so that sequences with higher probabilities achieve better translation quality.

First, we analyze search errors and model errors under different conditions. We find that gold references often have a lower probability than predictions from beam search, while an exact search free of search errors produces worse translations. These results demonstrate that further reducing search errors is not promising; instead, we should calibrate the model's probabilities in decoding and align them with the quality of the generated sequences.

Second, we develop two methods to calibrate the model using Contrastive Learning, working at the token level and the sequence level, respectively. The token-level method starts from the insight that current solutions, such as scheduled sampling, may recover too much and deviate from the ground truth. Accordingly, our method introduces a new objective, realized with Contrastive Learning, to constrain the recovery. The sequence-level method develops a contrastive preference model based on the traditional Plackett-Luce model and applies it to compute a list-wise ranking loss.

Third, we explore how to use LLMs to calibrate an NMT model.
We develop a method that uses LLMs to enhance the fluency of NMT output by integrating a language model on the target side, and we use Contrastive Learning to constrain the NMT model's fluency so that it does not exceed that of the LLM. Extensive theoretical analysis and experimental results demonstrate that these calibration methods are effective.
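The sequence-level method is said to build on the traditional Plackett-Luce model for list-wise ranking. As a minimal sketch of that classical ingredient only (the thesis's contrastive variant is not specified in this abstract, and the function name is illustrative), the Plackett-Luce negative log-likelihood of a candidate list ranked from best to worst could look like:

```python
import math

def plackett_luce_nll(scores):
    """Negative log-likelihood of a ranking under the Plackett-Luce model.

    `scores` are model log-scores (e.g. sequence log-probabilities) for
    candidate translations listed from best to worst by some quality
    metric. At each step k, the model "draws" the k-th item from the
    remaining pool with softmax probability over that pool.
    """
    nll = 0.0
    for k in range(len(scores)):
        pool = scores[k:]
        m = max(pool)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in pool))
        nll -= scores[k] - log_z  # minus log P(pick item k from the pool)
    return nll
```

Minimizing this loss pushes the model to assign higher probability to better-ranked candidates, which is exactly the calibration goal stated above: aligning sequence probabilities with translation quality.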
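The abstract does not spell out how the target-side language model is integrated. One common mechanism consistent with the description is shallow fusion, where NMT and LM token scores are log-linearly interpolated at decoding time; the sketch below assumes that setup (the function name and `lam` weight are illustrative, not from the thesis):

```python
import math

def fused_log_prob(nmt_log_probs, lm_log_probs, lam=0.3):
    """Shallow-fusion-style combination of NMT and LM token scores.

    Both inputs map candidate next tokens to log-probabilities over the
    same vocabulary. The scores are combined log-linearly, weighted by
    `lam`, and renormalized; lam=0 recovers the pure NMT distribution.
    """
    fused = {tok: nmt_log_probs[tok] + lam * lm_log_probs[tok]
             for tok in nmt_log_probs}
    log_z = math.log(sum(math.exp(v) for v in fused.values()))
    return {tok: v - log_z for tok, v in fused.items()}
```

Under this scheme, a larger `lam` shifts probability mass toward tokens the external language model considers fluent, which is the kind of fluency pressure the contrastive constraint above would then keep in check.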
| Date of Award | 9 May 2025 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | Xiaohua JIA (Supervisor) |
Keywords
- machine translation
- deep neural networks
- contrastive learning