Contrastive Language-Vision AI Models Pretrained on Web-Scraped Multimodal Data Exhibit Sexual Objectification Bias

AI-generated keywords: AI bias, sexual objectification, language-vision models, CLIP objective, web scrapes

AI-generated Key Points

  • Study by Wolfe et al. found evidence of sexual objectification bias in nine language-vision models trained on web scrapes with the CLIP objective
  • Replicating three psychology experiments shows that the bias persists in AI systems
  • Human characteristics are disassociated from images of objectified women: the model's recognition of emotion depends on whether the subject is fully or partially clothed
  • Antarctic Captions included emotion words less than 50% as often for partially clothed women as for fully clothed women
  • Images of female professionals were more likely than images of male professionals to be associated with sexual descriptions
  • The prompt "a [age] year old girl" generated sexualized images up to 73% of the time for VQGAN-CLIP and Stable Diffusion; the corresponding rate for boys never surpassed 9%

Authors: Robert Wolfe, Yiwei Yang, Bill Howe, Aylin Caliskan

ACM FAccT 2023
12 pages, 4 figures, 2 tables
License: CC BY-NC-SA 4.0

Abstract: Nine language-vision AI models trained on web scrapes with the Contrastive Language-Image Pretraining (CLIP) objective are evaluated for evidence of a bias studied by psychologists: the sexual objectification of girls and women, which occurs when a person's human characteristics, such as emotions, are disregarded and the person is treated as a body. We replicate three experiments in psychology quantifying sexual objectification and show that the phenomena persist in AI. A first experiment uses standardized images of women from the Sexual OBjectification and EMotion Database, and finds that human characteristics are disassociated from images of objectified women: the model's recognition of emotional state is mediated by whether the subject is fully or partially clothed. Embedding association tests (EATs) return significant effect sizes for both anger (d >0.80) and sadness (d >0.50), associating images of fully clothed subjects with emotions. GRAD-CAM saliency maps highlight that CLIP gets distracted from emotional expressions in objectified images. A second experiment measures the effect in a representative application: an automatic image captioner (Antarctic Captions) includes words denoting emotion less than 50% as often for images of partially clothed women than for images of fully clothed women. A third experiment finds that images of female professionals (scientists, doctors, executives) are likely to be associated with sexual descriptions relative to images of male professionals. A fourth experiment shows that a prompt of "a [age] year old girl" generates sexualized images (as determined by an NSFW classifier) up to 73% of the time for VQGAN-CLIP and Stable Diffusion; the corresponding rate for boys never surpasses 9%. The evidence indicates that language-vision AI models trained on web scrapes learn biases of sexual objectification, which propagate to downstream applications.
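
The embedding association test (EAT) effect sizes quoted in the abstract can be illustrated with a small sketch. The version below uses the WEAT-style effect size: the difference in mean association between two target sets, divided by the standard deviation of associations over all targets. The paper's exact variant may differ, and the function names and toy 2-D vectors are assumptions standing in for real CLIP image and text embeddings.

```python
import math
import statistics


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """Mean similarity of w to attribute set A minus mean similarity to B."""
    sim_a = [cosine(w, a) for a in attrs_a]
    sim_b = [cosine(w, b) for b in attrs_b]
    return statistics.mean(sim_a) - statistics.mean(sim_b)


def eat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """WEAT-style effect size d for target sets X, Y and attribute sets A, B."""
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    # Sample standard deviation over the associations of all targets.
    spread = statistics.stdev(s_x + s_y)
    return (statistics.mean(s_x) - statistics.mean(s_y)) / spread


# Hypothetical toy embeddings (NOT data from the paper).
fully_clothed = [[1.0, 0.1], [0.9, 0.2]]      # target set X (image embeddings)
partially_clothed = [[0.1, 1.0], [0.2, 0.9]]  # target set Y (image embeddings)
emotion_words = [[1.0, 0.0], [0.9, 0.1]]      # attribute set A (text embeddings)
neutral_words = [[0.0, 1.0], [0.1, 0.9]]      # attribute set B (text embeddings)

d = eat_effect_size(fully_clothed, partially_clothed, emotion_words, neutral_words)
print(f"effect size d = {d:.2f}")
```

A positive d here means the first target set (fully clothed, in this toy setup) is more strongly associated with the emotion attributes than the second. By the usual Cohen's d convention, values above 0.50 are read as medium and above 0.80 as large, which matches the thresholds reported in the abstract.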

Submitted to arXiv on 21 Dec. 2022

Results of the summarizing process for the arXiv paper: 2212.11261v2

Wolfe et al. evaluated nine language-vision models trained on web scrapes with the Contrastive Language-Image Pretraining (CLIP) objective and found evidence of sexual objectification bias. The researchers replicated three psychology experiments and found that the bias persists in AI systems. In the first experiment, human characteristics were disassociated from images of objectified women: the model's recognition of emotional state depended on whether the subject was fully or partially clothed. In the second, an automatic image captioner, Antarctic Captions, included words denoting emotion less than 50% as often for partially clothed women as for fully clothed women. In the third, images of female professionals were more likely than images of male professionals to be associated with sexual descriptions. A fourth experiment showed that the prompt "a [age] year old girl" generated sexualized images, as determined by an NSFW classifier, up to 73% of the time for VQGAN-CLIP and Stable Diffusion, while the corresponding rate for boys never exceeded 9%. The authors conclude that models trained on web scrapes learn sexual objectification biases, which propagate to downstream applications.
Created on 24 Nov. 2024

