Comparing Formulaic Language in Human and Machine Translation: Insight from a Parliamentary Corpus
AI-generated Key Points
- Study aims to replicate previous research comparing neural machine translations to human translations
- Previous study found that neural machine translations have more formulaic sequences with high-frequency words, but fewer with rare words compared to human translations
- Researchers used a parliamentary corpus to replicate the findings
- Corpus was translated from French to English using DeepL, Google Translate, and Microsoft Translator
- Results confirmed previous observations but with less pronounced differences
- Suggests that using text genres resulting in more literal translations (e.g., parliamentary corpora) is preferable for comparing human and machine translations
- Google translations had fewer highly collocational bigrams than DeepL and Microsoft translations
- Findings provide insights into differences between neural machine translation systems and emphasize the importance of considering text genre when evaluating translation quality.
Authors: Yves Bestgen
Abstract: A recent study has shown that, compared to human translations, neural machine translations contain more strongly-associated formulaic sequences made of relatively high-frequency words, but far less strongly-associated formulaic sequences made of relatively rare words. These results were obtained on the basis of translations of quality newspaper articles in which human translations can be thought to be not very literal. The present study attempts to replicate this research using a parliamentary corpus. The text were translated from French to English by three well-known neural machine translation systems: DeepL, Google Translate and Microsoft Translator. The results confirm the observations on the news corpus, but the differences are less strong. They suggest that the use of text genres that usually result in more literal translations, such as parliamentary corpora, might be preferable when comparing human and machine translations. Regarding the differences between the three neural machine systems, it appears that Google translations contain fewer highly collocational bigrams, identified by the CollGram technique, than Deepl and Microsoft translations.
Ask questions about this paper to our AI assistant
You can also chat with multiple papers at once here.
Welcome to our AI assistant! Here are some important things to keep in mind:
- The assistant will only answer questions related to this specific paper.
- Please note that this is not a bot for casual chatting.
- If you want the answer in a language other than the language you chose for navigating the website, simply add "TRANSLATE IN LANGUAGE L" at the end of your query (replace "LANGUAGE L" with the language of your choice).
- For example, you could ask "Can you extract the most important aspect of the paper? TRANSLATE IN SPANISH".
- If you want to keep the history of your questions/answers you should create an account.
Assess the quality of the AI-generated content by voting
Why do we need votes?
Votes are used to determine whether we need to re-run our summarizing tools. If the count reaches -10, our tools can be restarted.
Similar papers summarized with our AI tools
Navigate through even more similar papers through atree representation
Look for similar papers (in beta version)
By clicking on the button above, our algorithm will scan all papers in our database to find the closest based on the contents of the full papers and not just on metadata. Please note that it only works for papers that we have generated summaries for and you can rerun it from time to time to get a more accurate result while our database grows.
Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.