Better to Ask in English: Evaluation of Large Language Models on English, Low-resource and Cross-Lingual Settings

AI-generated keywords: Large Language Models, Low-resource languages, GPT-4, Llama 2, Gemini

AI-generated Key Points

  • Large Language Models (LLMs) have shown remarkable performance in natural language processing tasks
  • LLMs are predominantly developed and evaluated in English, leading to a gap in understanding their effectiveness in low-resource languages like Bangla, Hindi, and Urdu
  • This study evaluates the performance of LLMs such as GPT-4, Llama 2, and Gemini across English and low-resource languages
  • Traditional machine learning models and transformer-based approaches have been used for analyzing low-resource languages, but multi-lingual LLMs offer new opportunities
  • Computational resources for Bangla, Hindi, and Urdu are limited despite being widely spoken globally
  • The study focuses on evaluating the effectiveness of LLMs specifically in Bangla, Hindi, and Urdu compared to English
  • Promising results with LLMs in these languages have been observed but more comprehensive studies are needed to determine their full potential
  • Using zero-shot prompting and five prompt settings, the study finds that GPT-4 outperforms Llama 2 and Gemini across all prompt settings and languages
  • While all three models perform better with English prompts, there is room for improvement with low-resource language prompts
  • The study contributes to enhancing LLM capabilities in addressing challenges posed by low-resource languages like Bangla, Hindi, and Urdu

Authors: Krishno Dey, Prerona Tarannum, Md. Arid Hasan, Imran Razzak, Usman Naseem

License: CC BY 4.0

Abstract: Large Language Models (LLMs) are trained on massive amounts of data, enabling their application across diverse domains and tasks. Despite their remarkable performance, most LLMs are developed and evaluated primarily in English. Recently, a few multi-lingual LLMs have emerged, but their performance in low-resource languages, especially the most spoken languages in South Asia, is less explored. To address this gap, in this study, we evaluate LLMs such as GPT-4, Llama 2, and Gemini to analyze their effectiveness in English compared to other low-resource languages from South Asia (e.g., Bangla, Hindi, and Urdu). Specifically, we utilized zero-shot prompting and five different prompt settings to extensively investigate the effectiveness of the LLMs in cross-lingual translated prompts. The findings of the study suggest that GPT-4 outperformed Llama 2 and Gemini in all five prompt settings and across all languages. Moreover, all three LLMs performed better for English language prompts than other low-resource language prompts. This study extensively investigates LLMs in low-resource language contexts to highlight the improvements required in LLMs and language-specific resources to develop more generally purposed NLP applications.

Submitted to arXiv on 17 Oct. 2024


Results of the summarizing process for the arXiv paper: 2410.13153v1

In recent years, Large Language Models (LLMs) have gained significant attention for their performance on a range of natural language processing tasks. However, most LLMs are developed and evaluated primarily in English, which leaves their effectiveness in low-resource languages poorly understood, particularly for languages widely spoken in South Asia such as Bangla, Hindi, and Urdu. To address this gap, the study evaluates GPT-4, Llama 2, and Gemini across English and these three languages.

Previous research has laid the groundwork for applying LLMs to downstream tasks in low-resource languages. Traditional machine learning models and transformer-based approaches have commonly been used to analyze these languages, and the emergence of multi-lingual LLMs presents new opportunities. Although Bangla, Hindi, and Urdu are among the most spoken languages globally, computational resources for them remain limited. Existing literature reports promising results with LLMs in these languages but highlights the need for more comprehensive studies to determine their full potential and identify areas for improvement.

Using zero-shot prompting and five different prompt settings, the study investigates how the models handle cross-lingual translated prompts. The findings suggest that GPT-4 outperformed Llama 2 and Gemini in all five prompt settings and across all languages. All three models also performed better with English prompts than with low-resource language prompts, which leaves room for improvement in the latter.

Overall, this study contributes to ongoing efforts to improve how LLMs handle the challenges posed by low-resource languages like Bangla, Hindi, and Urdu. By shedding light on the strengths and limitations of current models in cross-lingual contexts, it points toward the improvements needed in both LLMs and language-specific resources to build more inclusive NLP applications.
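
To make the zero-shot, cross-lingual setup described above more concrete, here is a minimal sketch of how such an evaluation grid could be organized. It is not the authors' code. The summary does not list the five prompt settings, the tasks, or the metric, so the instruction texts, the binary labels, the `accuracy` metric, and the `model` callable are all hypothetical placeholders chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical instruction texts keyed by language code. The non-English
# entry is a placeholder; a real translated instruction would go here.
INSTRUCTIONS: Dict[str, str] = {
    "en": "Label the text as 'positive' or 'negative'.",
    "bn": "<Bangla translation of the instruction>",
}

Example = Tuple[str, str]  # (input text, gold label)


def build_zero_shot_prompt(instruction_lang: str, text: str) -> str:
    """Return an instruction plus the input, with no labelled demonstrations."""
    return f"{INSTRUCTIONS[instruction_lang]}\nText: {text}\nLabel:"


def accuracy(predictions: List[str], gold: List[str]) -> float:
    """Fraction of predictions that match the gold labels (assumed metric)."""
    if not gold:
        return 0.0
    correct = sum(p.strip().lower() == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def evaluate(
    model: Callable[[str], str],
    data: Dict[str, List[Example]],
) -> Dict[Tuple[str, str], float]:
    """Score every (instruction language, input language) pairing."""
    scores: Dict[Tuple[str, str], float] = {}
    for instr_lang in INSTRUCTIONS:
        for data_lang, examples in data.items():
            preds = [
                model(build_zero_shot_prompt(instr_lang, text))
                for text, _ in examples
            ]
            gold = [label for _, label in examples]
            scores[(instr_lang, data_lang)] = accuracy(preds, gold)
    return scores
```

In use, `model` would wrap a call to whichever LLM is being tested, and `data` would map each language code to its labelled examples. Comparing the resulting scores across instruction languages is one way to check whether English prompts yield better results than translated ones.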
Created on 14 Mar. 2025
