Data Bias According to Bipol: Men are Naturally Right and It is the Role of Women to Follow Their Lead

AI-generated keywords: bias, labeled datasets, sensitive terms, multilingual pretrained models, toxic comments

AI-generated Key Points

  • Introduction of new large labeled datasets on bias in Italian, Dutch, and German languages
  • Bias detected in all 10 evaluated datasets across five languages, including benchmark datasets on the English GLUE/SuperGLUE leaderboards
  • Utilization of state-of-the-art multilingual pretrained models like mT5 and mBERT for benchmarking
  • Motivation from recent events highlighting social bias in AI and large language models (LLMs)
  • Comparison of various bias evaluation methods, including the bipol metric for explainability
  • Confirmation of bias in toxic comments by annotating 200 randomly sampled comments (confidence level of 95%, error margin of 7%)
  • Identification of male bias along with other forms of bias in many datasets
  • Emphasis on gender biases as well as biases related to origin and age in previous research efforts
  • Focus on binary gender bias in Dutch language studies
  • Benchmark experiments comparing the macro F1 performance of the SotA pretrained multilingual models mT5-small and mBERT-base
  • Methodology involving rigorous annotation techniques with multiple annotators and gold samples for high-quality results
  • Contribution of new labeled datasets, lexica of sensitive terms, models, and codes for detecting bias in multiple languages

Authors: Irene Pagliai, Goya van Boven, Tosin Adewumi, Lama Alkhaled, Namrata Gurung, Isabella Södergren, Elisa Barney

Presented at ICNLSP
License: CC BY 4.0

Abstract: We introduce new large labeled datasets on bias in 3 languages and show in experiments that bias exists in all 10 datasets of 5 languages evaluated, including benchmark datasets on the English GLUE/SuperGLUE leaderboards. The 3 new languages give a total of almost 6 million labeled samples and we benchmark on these datasets using SotA multilingual pretrained models: mT5 and mBERT. The challenge of social bias, based on prejudice, is ubiquitous, as recent events with AI and large language models (LLMs) have shown. Motivated by this challenge, we set out to estimate bias in multiple datasets. We compare some recent bias metrics and use bipol, which has explainability in the metric. We also confirm the unverified assumption that bias exists in toxic comments by randomly sampling 200 samples from a toxic dataset population using the confidence level of 95% and error margin of 7%. Thirty gold samples were randomly distributed in the 200 samples to secure the quality of the annotation. Our findings confirm that many of the datasets have male bias (prejudice against women), besides other types of bias. We publicly release our new datasets, lexica, models, and codes.
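
The sample size of 200 quoted in the abstract is consistent with a standard sample-size calculation. The sketch below assumes Cochran's formula for a large population with maximum variance (p = 0.5); the source does not state which formula was used, so the function name and the choice of p are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def cochran_sample_size(confidence: float, margin: float, p: float = 0.5) -> int:
    """Minimum sample size for estimating a proportion in a large population.

    Assumption: Cochran's formula n = z^2 * p * (1 - p) / e^2, with p = 0.5
    as the most conservative choice. The source does not name its formula.
    """
    # Two-sided z-score for the requested confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z * z * p * (1 - p) / margin ** 2)


if __name__ == "__main__":
    print(cochran_sample_size(0.95, 0.07))  # prints 196
```

Under these assumptions the minimum is 196 samples, so annotating 200 meets the stated confidence level and error margin.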

Submitted to arXiv on 07 Apr. 2024


Results of the summarizing process for the arXiv paper: 2404.04838v2

In this study, we introduce new large labeled datasets on bias in three languages: Italian, Dutch, and German. Together they contain almost 6 million labeled samples, and they are accompanied by lexica of sensitive terms for bias detection in the respective languages. Our experiments reveal that bias exists in all ten datasets evaluated across five languages, including benchmark datasets on the English GLUE/SuperGLUE leaderboards. We benchmark on the new datasets using the state-of-the-art (SotA) multilingual pretrained models mT5-small and mBERT-base, comparing their macro F1 performance.

The motivation behind our research stems from recent events highlighting the prevalence of social bias in AI and large language models (LLMs). We estimate bias in multiple datasets, compare several recent bias metrics, and use bipol, which builds explainability into the metric.

We also confirm the previously unverified assumption that toxic comments contain bias. To do so, we randomly sample 200 comments from a toxic dataset population, a sample size based on a confidence level of 95% and an error margin of 7%, and have multiple annotators label them. To safeguard annotation quality, we include 30 gold samples with unanimous agreement from the original data, randomly distributed among the 200. Our findings indicate that many datasets exhibit male bias (prejudice against women) along with other forms of bias.

The literature review covers previous efforts to measure and mitigate bias in languages other than English, which emphasize gender bias as well as biases related to origin and age. Dutch-language studies in particular commonly focus on binary gender bias.

Overall, this study contributes new labeled datasets, lexica of sensitive terms, models, and code for detecting bias in multiple languages. By documenting bias across datasets and languages, we aim to help address the challenge of social bias in AI systems.
Created on 24 Nov. 2024
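
The summary compares models by macro F1, which averages the per-class F1 scores without weighting by class frequency. A minimal sketch of that computation follows; the label names and the toy predictions are invented for illustration and are not results from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all labels seen."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical labels and predictions, for illustration only.
    gold = ["biased", "biased", "unbiased", "unbiased", "unbiased"]
    pred = ["biased", "unbiased", "unbiased", "unbiased", "biased"]
    print(round(macro_f1(gold, pred), 4))  # prints 0.5833
```

Because each class contributes equally, macro F1 penalizes a model that does well on the majority class but poorly on the minority class, which matters when biased and unbiased samples are unevenly distributed.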

