Comparing Formulaic Language in Human and Machine Translation: Insight from a Parliamentary Corpus

AI-generated keywords: Neural Machine Translation, Human Translations, Parliamentary Corpus, Text Genres, Collocational Bigrams

AI-generated Key Points

  • Study aims to replicate previous research comparing neural machine translations to human translations
  • Previous study found that, compared to human translations, neural machine translations contain more strongly associated formulaic sequences made of high-frequency words, but fewer made of rare words
  • Researchers used a parliamentary corpus to replicate the findings
  • Corpus was translated from French to English using DeepL, Google Translate, and Microsoft Translator
  • Results confirmed previous observations but with less pronounced differences
  • Suggests that text genres that usually result in more literal translations (e.g., parliamentary corpora) may be preferable for comparing human and machine translations
  • Google translations had fewer highly collocational bigrams than DeepL and Microsoft translations
  • Findings provide insights into differences between neural machine translation systems and emphasize the importance of considering text genre when evaluating translation quality

Authors: Yves Bestgen

Presented at ParlaCLARIN III: Workshop on Creating, Enriching and Using Parliamentary Corpora
License: CC BY 4.0

Abstract: A recent study has shown that, compared to human translations, neural machine translations contain more strongly associated formulaic sequences made of relatively high-frequency words, but far fewer strongly associated formulaic sequences made of relatively rare words. These results were obtained on the basis of translations of quality newspaper articles, for which human translations can be thought to be not very literal. The present study attempts to replicate this research using a parliamentary corpus. The texts were translated from French to English by three well-known neural machine translation systems: DeepL, Google Translate and Microsoft Translator. The results confirm the observations on the news corpus, but the differences are less strong. They suggest that the use of text genres that usually result in more literal translations, such as parliamentary corpora, might be preferable when comparing human and machine translations. Regarding the differences between the three neural machine translation systems, it appears that Google translations contain fewer highly collocational bigrams, identified by the CollGram technique, than DeepL and Microsoft translations.

Submitted to arXiv on 22 Jun. 2022

Results of the summarizing process for the arXiv paper: 2206.10919v1

This study aims to replicate previous research that compared neural machine translations to human translations. The previous study found that neural machine translations contain more strongly-associated formulaic sequences made of high-frequency words, but fewer strongly-associated formulaic sequences made of rare words, compared to human translations. In this study, the researchers used a parliamentary corpus to see if the findings could be replicated. The text in the corpus was translated from French to English using three well-known neural machine translation systems: DeepL, Google Translate, and Microsoft Translator. The results confirmed the observations from the news corpus but with less pronounced differences. This suggests that using text genres that typically result in more literal translations, such as parliamentary corpora, may be preferable when comparing human and machine translations. Additionally, the study found that Google translations contained fewer highly collocational bigrams than DeepL and Microsoft translations. These findings provide insights into the differences between neural machine translation systems and highlight the importance of considering text genre when evaluating translation quality.
Created on 25 Jul. 2023
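
The summary above refers to "highly collocational bigrams" without showing how such bigrams might be scored. The following is a minimal sketch, not the paper's implementation. It assumes that association strength is measured with pointwise mutual information (MI, which tends to favor pairs of rare words) and the t-score (which tends to favor pairs of frequent words), both estimated from a reference corpus. The function names, the toy data, and the default thresholds are illustrative assumptions and are not taken from this page.

```python
import math
from collections import Counter


def bigram_association(reference_tokens):
    """Map each bigram of a reference corpus to (MI, t-score).

    Assumption: MI = log2(observed / expected) and
    t = (observed - expected) / sqrt(observed), where expected is the
    count predicted if the two words were independent.
    """
    n = len(reference_tokens)
    unigrams = Counter(reference_tokens)
    bigrams = Counter(zip(reference_tokens, reference_tokens[1:]))
    scores = {}
    for (w1, w2), observed in bigrams.items():
        expected = unigrams[w1] * unigrams[w2] / n
        mi = math.log2(observed / expected)
        t = (observed - expected) / math.sqrt(observed)
        scores[(w1, w2)] = (mi, t)
    return scores


def collocational_shares(text_tokens, scores, mi_min=5.0, t_min=6.0):
    """Share of bigram tokens in a text at or above each threshold.

    The default thresholds are illustrative assumptions. Bigrams missing
    from the reference scores count as not collocational.
    """
    text_bigrams = list(zip(text_tokens, text_tokens[1:]))
    if not text_bigrams:
        return 0.0, 0.0
    high_mi = sum(1 for b in text_bigrams if b in scores and scores[b][0] >= mi_min)
    high_t = sum(1 for b in text_bigrams if b in scores and scores[b][1] >= t_min)
    return high_mi / len(text_bigrams), high_t / len(text_bigrams)


# Toy usage: score one translation against a tiny made-up reference corpus.
reference = "the honourable member thanks the honourable member".split()
translation = "the honourable member thanks the house".split()
ref_scores = bigram_association(reference)
mi_share, t_share = collocational_shares(translation, ref_scores, mi_min=1.0, t_min=0.5)
print(f"share of high-MI bigrams: {mi_share:.2f}; share of high-t bigrams: {t_share:.2f}")
```

Under these assumptions, a comparison of human and machine translations would compute the two shares for each translation of the same source text, with the scores estimated from a large reference corpus rather than from the toy data used here.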
