A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models

AI-generated keywords: Generative Large Language Models (LLMs); Natural Language Processing (NLP); translation task; fine-tuning approach; Advanced Language Model-based trAnslator (ALMA)

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • The significant advancements of generative LLMs in various NLP tasks have not fully carried over to the translation task, especially for moderate-sized models (7B or 13B parameters)
  • Previous attempts to improve the translation capabilities of moderate-sized large language models (LLMs) have had limited success
  • The authors propose a novel fine-tuning approach specifically designed for the translation task, eliminating the need for abundant parallel data
  • The proposed approach consists of two stages of fine-tuning: initial fine-tuning on monolingual data and subsequent fine-tuning on a small set of high-quality parallel data
  • Experimental results show that ALMA, an LLM built on LLaMA-2, improves by more than 12 BLEU and 12 COMET on average over its zero-shot performance across ten translation directions from the WMT'21 and WMT'22 test datasets
  • ALMA outperforms previous work and even surpasses models with more parameters, such as NLLB-54B and GPT-3.5-text-davinci-003
  • This study demonstrates that fine-tuning approaches can enhance the translation capabilities of moderate-sized LLMs without relying heavily on parallel data
  • The proposed method opens up new possibilities for improving machine translation performance using large language models

Authors: Haoran Xu, Young Jin Kim, Amr Sharaf, Hany Hassan Awadalla

Abstract: Generative Large Language Models (LLMs) have achieved remarkable advancements in various NLP tasks. However, these advances have not been reflected in the translation task, especially those with moderate model sizes (i.e., 7B or 13B parameters), which still lag behind conventional supervised encoder-decoder translation models. Previous studies have attempted to improve the translation capabilities of these moderate LLMs, but their gains have been limited. In this study, we propose a novel fine-tuning approach for LLMs that is specifically designed for the translation task, eliminating the need for the abundant parallel data that traditional translation models usually depend on. Our approach consists of two fine-tuning stages: initial fine-tuning on monolingual data followed by subsequent fine-tuning on a small set of high-quality parallel data. We introduce the LLM developed through this strategy as Advanced Language Model-based trAnslator (ALMA). Based on LLaMA-2 as our underlying model, our results show that the model can achieve an average improvement of more than 12 BLEU and 12 COMET over its zero-shot performance across 10 translation directions from the WMT'21 (2 directions) and WMT'22 (8 directions) test datasets. The performance is significantly better than all prior work and even superior to the NLLB-54B model and GPT-3.5-text-davinci-003, with only 7B or 13B parameters. This method establishes the foundation for a novel training paradigm in machine translation.
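The two-stage recipe described in the abstract can be illustrated with a minimal data-preparation sketch. This is not the authors' implementation: the prompt template, the `Example` class, the function names, and the choice to compute the loss only on the target side in stage 2 are all assumptions made for illustration, since the metadata does not specify them. No model is trained here; the sketch only shows how the two kinds of training examples might differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Example:
    text: str        # full training string fed to the model
    loss_start: int  # character offset from which the loss would be computed


def stage1_examples(monolingual: List[str]) -> List[Example]:
    """Stage 1: plain monolingual text, trained on as ordinary language modeling."""
    return [Example(text=s, loss_start=0) for s in monolingual]


# Hypothetical prompt template (an assumption, not taken from the paper metadata).
PROMPT = "Translate this from {src} to {tgt}:\n{src}: {source}\n{tgt}: "


def stage2_examples(parallel: List[Tuple[str, str, str, str]]) -> List[Example]:
    """Stage 2: a small set of high-quality (src_lang, tgt_lang, source, target) pairs."""
    examples = []
    for src_lang, tgt_lang, source, target in parallel:
        prompt = PROMPT.format(src=src_lang, tgt=tgt_lang, source=source)
        # Assumption: only the target translation contributes to the loss.
        examples.append(Example(text=prompt + target, loss_start=len(prompt)))
    return examples


def two_stage_schedule(monolingual, parallel):
    """Return the stages in the order they would be run: monolingual first, parallel second."""
    return [
        ("stage1_monolingual", stage1_examples(monolingual)),
        ("stage2_parallel", stage2_examples(parallel)),
    ]
```

In a real setup each stage's examples would be tokenized and passed to a fine-tuning loop, with the stage 2 run starting from the stage 1 checkpoint.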

Submitted to arXiv on 20 Sep. 2023

Results of the summarizing process for the arXiv paper: 2309.11674v1

This paper's license does not allow us to build upon its content, so this summary was generated from the paper's metadata rather than the full article.

In recent years, generative Large Language Models (LLMs) have made significant advancements in various NLP tasks. However, these advancements have not been fully reflected in the translation task, particularly for models of moderate size (i.e., 7B or 13B parameters). These moderate LLMs still lag behind conventional supervised encoder-decoder translation models. Previous studies have attempted to improve the translation capabilities of these moderate LLMs, but with limited success.

To address this issue, the authors propose a novel fine-tuning approach specifically designed for the translation task. This approach eliminates the need for the abundant parallel data that traditional translation models typically rely on. It consists of two stages of fine-tuning: initial fine-tuning on monolingual data and subsequent fine-tuning on a small set of high-quality parallel data. The authors call the resulting LLM Advanced Language Model-based trAnslator (ALMA); it uses LLaMA-2 as the underlying model. Experimental results demonstrate that ALMA achieves an average improvement of more than 12 BLEU and 12 COMET over its zero-shot performance across ten translation directions from the WMT'21 and WMT'22 test datasets. Notably, ALMA's performance surpasses all previous work and even outperforms models like NLLB-54B and GPT-3.5-text-davinci-003, despite having only 7B or 13B parameters. This study establishes a foundation for a novel training paradigm in machine translation by demonstrating that fine-tuning approaches can significantly enhance the translation capabilities of moderate-sized LLMs without relying heavily on parallel data. The proposed method opens up new possibilities for improving machine translation performance using large language models.
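The "average improvement over zero-shot performance" quoted above is a per-direction difference averaged over all translation directions. A minimal sketch of that arithmetic follows; the function name and the direction labels are hypothetical, and the scores in the usage example are placeholders, not results from the paper.

```python
from statistics import mean
from typing import Dict


def average_improvement(zero_shot: Dict[str, float], fine_tuned: Dict[str, float]) -> float:
    """Mean of (fine-tuned score - zero-shot score) over all translation directions."""
    if zero_shot.keys() != fine_tuned.keys():
        raise ValueError("both score tables must cover the same translation directions")
    return mean(fine_tuned[d] - zero_shot[d] for d in zero_shot)


# Placeholder scores for two made-up directions (NOT values reported in the paper).
placeholder_zero_shot = {"dir_1": 20.0, "dir_2": 30.0}
placeholder_fine_tuned = {"dir_1": 32.0, "dir_2": 44.0}
gain = average_improvement(placeholder_zero_shot, placeholder_fine_tuned)
```

The same function would apply to either metric (BLEU or COMET), as long as both tables use the same metric and the same set of directions.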
Created on 16 Jan. 2024
