Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

AI-generated keywords: Large language models, Machine translation, Contrastive Preference Optimization, Performance gap, Supervised fine-tuning

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Study explores the performance of moderate-sized large language models (LLMs) in machine translation (MT)
  • Aims to bridge the performance gap between LLMs with 7B or 13B parameters and state-of-the-art conventional encoder-decoder models or larger-scale LLMs
  • Proposes a novel approach called Contrastive Preference Optimization (CPO) to address the shortcomings of supervised fine-tuning (SFT) for LLMs in the MT task
  • Whereas SFT trains a model to mimic reference translations, CPO trains it to avoid generating translations that are adequate but not perfect
  • Applies CPO to ALMA models using only 22K parallel sentences and 12M parameters, yielding significant improvements
  • ALMA-R model demonstrates performance on par with or surpassing WMT competition winners and GPT-4 on WMT'21, WMT'22, and WMT'23 test datasets
  • Findings indicate that CPO lets moderate-sized LLMs match the translation quality of larger-scale systems such as GPT-4

Authors: Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton Murray, Young Jin Kim

Abstract: Moderate-sized large language models (LLMs), those with 7B or 13B parameters, exhibit promising machine translation (MT) performance. However, even the top-performing 13B LLM-based translation models, like ALMA, do not match the performance of state-of-the-art conventional encoder-decoder translation models or larger-scale LLMs such as GPT-4. In this study, we bridge this performance gap. We first assess the shortcomings of supervised fine-tuning (SFT) for LLMs in the MT task, emphasizing the quality issues present in the reference data even though it is human-generated. Then, in contrast to SFT, which mimics reference translations, we introduce Contrastive Preference Optimization (CPO), a novel approach that trains models to avoid generating adequate but not perfect translations. Applying CPO to ALMA models with only 22K parallel sentences and 12M parameters yields significant improvements. The resulting model, called ALMA-R, can match or exceed the performance of the WMT competition winners and GPT-4 on the WMT'21, WMT'22, and WMT'23 test datasets.

Submitted to arXiv on 16 Jan. 2024

Results of the summarizing process for the arXiv paper: 2401.08417v1

The paper's license does not allow us to build upon its content, so this summary was generated from the paper's metadata rather than the full article.

In "Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation," Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton Murray, and Young Jin Kim study the machine translation (MT) performance of moderate-sized large language models (LLMs), those with 7B or 13B parameters. Even the strongest 13B LLM-based translation models, such as ALMA, fall short of state-of-the-art conventional encoder-decoder models and larger-scale LLMs such as GPT-4, and the authors set out to close this gap. They first examine the shortcomings of supervised fine-tuning (SFT) for MT, pointing to quality issues in the reference data even though it is human-generated. They then propose Contrastive Preference Optimization (CPO): whereas SFT trains a model to mimic reference translations, CPO trains it to avoid generating translations that are adequate but not perfect. Applying CPO to ALMA models with only 22K parallel sentences and 12M parameters yields significant improvements. The resulting model, ALMA-R, matches or exceeds the WMT competition winners and GPT-4 on the WMT'21, WMT'22, and WMT'23 test datasets.
Created on 25 Jan. 2024
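
The contrast between SFT and CPO described in the summary can be illustrated with a small sketch. The metadata reproduced here does not state the training objective, so everything below is an assumption: the sketch supposes a reference-free, DPO-style preference term on a (preferred, dispreferred) translation pair plus a likelihood term on the preferred translation. The function names, the `beta` value, and the log-probabilities are hypothetical and chosen only for illustration.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def sft_loss(logp_reference: float) -> float:
    """SFT-style loss: negative log-likelihood of the reference translation.

    The model is simply pushed to reproduce the reference, whatever its quality.
    """
    return -logp_reference


def contrastive_preference_loss(
    logp_preferred: float,
    logp_dispreferred: float,
    beta: float = 0.1,  # hypothetical scaling factor, not taken from the source
) -> float:
    """Assumed CPO-style loss for one (preferred, dispreferred) pair.

    preference_term rewards scoring the preferred translation above the
    dispreferred (adequate but imperfect) one; likelihood_term keeps the
    model close to the preferred translation itself.
    """
    margin = beta * (logp_preferred - logp_dispreferred)
    preference_term = -log_sigmoid(margin)
    likelihood_term = -logp_preferred
    return preference_term + likelihood_term


# Hypothetical sequence log-probabilities under the model being trained.
good_ranking = contrastive_preference_loss(-12.0, -15.0)  # preferred scored higher
bad_ranking = contrastive_preference_loss(-12.0, -11.0)   # dispreferred scored higher
print(good_ranking < bad_ranking)
```

Under these assumptions, the loss falls as the model assigns more probability to the preferred translation relative to the dispreferred one, whereas the SFT loss looks at a single reference in isolation and carries no signal about which of two plausible translations is better.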
