Document-Level Machine Translation with Large Language Models

AI-generated keywords: Large Language Models, Document-Level Machine Translation, Discourse-Aware Prompts, ChatGPT, GPT-4

AI-generated Key Points

  • Wang et al. evaluate Large Language Models (LLMs) such as ChatGPT and GPT-4 on document-level machine translation
  • The evaluation covers three aspects: the impact of discourse-aware prompts, a comparison of translation models, and an analysis of discourse modeling abilities
  • By leveraging its long-text modeling capabilities, ChatGPT outperforms commercial MT systems in human evaluation
  • GPT-4 demonstrates a strong ability to explain discourse knowledge, although it occasionally selects incorrect translation candidates in contrastive testing
  • ChatGPT and GPT-4 both show superior performance and the potential to become a promising new paradigm for document-level translation
  • The study highlights the challenges and opportunities of discourse modeling for LLMs
  • The authors suggest exploring more challenging discourse-aware NLP tasks in future work, such as the GuoFeng Benchmark (Wang et al., 2023)
  • The research offers insight into how LLMs handle complex document-level language tasks and points to directions for further progress in natural language processing

Authors: Longyue Wang, Chenyang Lyu, Tianbo Ji, Zhirui Zhang, Dian Yu, Shuming Shi, Zhaopeng Tu

License: CC BY 4.0

Abstract: Large language models (LLMs) such as ChatGPT can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks. Taking document-level machine translation (MT) as a testbed, this paper provides an in-depth evaluation of LLMs' ability on discourse modeling. The study focuses on three aspects: 1) Effects of Discourse-Aware Prompts, where we investigate the impact of different prompts on document-level translation quality and discourse phenomena; 2) Comparison of Translation Models, where we compare the translation performance of ChatGPT with commercial MT systems and advanced document-level MT methods; 3) Analysis of Discourse Modeling Abilities, where we further probe discourse knowledge encoded in LLMs and examine the impact of training techniques on discourse modeling. By evaluating a number of benchmarks, we surprisingly find that 1) leveraging their powerful long-text modeling capabilities, ChatGPT outperforms commercial MT systems in terms of human evaluation; 2) GPT-4 demonstrates a strong ability to explain discourse knowledge, even though it may select incorrect translation candidates in contrastive testing; 3) ChatGPT and GPT-4 have demonstrated superior performance and show potential to become a new and promising paradigm for document-level translation. This work highlights the challenges and opportunities of discourse modeling for LLMs, which we hope can inspire the future design and evaluation of LLMs.

Submitted to arXiv on 05 Apr. 2023

Summary of arXiv paper 2304.02210v1

In their study, Wang et al. evaluate the performance of Large Language Models (LLMs) such as ChatGPT and GPT-4 on document-level machine translation. The evaluation covers three main aspects: the impact of discourse-aware prompts, a comparison of translation models, and an analysis of discourse modeling abilities. They find that, by leveraging its long-text modeling capabilities, ChatGPT outperforms commercial MT systems in human evaluation. In addition, GPT-4 demonstrates a strong ability to explain discourse knowledge, although it occasionally selects incorrect translation candidates in contrastive testing. Overall, ChatGPT and GPT-4 both show superior performance and the potential to become a promising paradigm for document-level translation. The study highlights the challenges and opportunities of discourse modeling for LLMs and suggests exploring more challenging discourse-aware NLP tasks in future work, such as the GuoFeng Benchmark (Wang et al., 2023). The research offers insight into how LLMs handle complex document-level language tasks and points to directions for further progress in natural language processing.
Created on 20 Feb. 2024
