Change is Hard: A Closer Look at Subpopulation Shift

AI-generated keywords: Subpopulation Shift, Machine Learning, Algorithm Generalization, Worst-Class Accuracy, Tradeoff

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Machine learning models often perform poorly on subgroups that are underrepresented in the training data
  • The authors propose a unified framework that dissects and explains common shifts in subgroups
  • The benchmark evaluates 20 state-of-the-art algorithms on 12 real-world datasets in the vision, language, and healthcare domains
  • Existing algorithms improve subgroup robustness only over certain types of shift, not others
  • A simple selection criterion based on worst-class accuracy is surprisingly effective for model selection, even without group information
  • There is a fundamental tradeoff between worst-group accuracy (WGA) and other important metrics
  • Testing metrics therefore need to be chosen carefully when evaluating algorithm performance
  • Code and data related to the research available at https://github.com/YyzHarry/SubpopBench
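
The two metrics named in the key points can be written down compactly. The notation below is an assumption made for illustration, since only the paper metadata is available here: $f$ is a trained classifier, $\mathcal{D}$ is an evaluation set of pairs $(x, y)$, $\mathcal{D}_g$ is the subset of examples belonging to group $g \in \mathcal{G}$, and $\mathcal{D}_c$ is the subset whose true label is class $c \in \mathcal{Y}$.

```latex
\[
\mathrm{Acc}(\mathcal{S}) = \frac{1}{|\mathcal{S}|} \sum_{(x, y) \in \mathcal{S}} \mathbb{1}\left[ f(x) = y \right]
\]
\[
\mathrm{WGA} = \min_{g \in \mathcal{G}} \mathrm{Acc}(\mathcal{D}_g),
\qquad
\text{worst-class accuracy} = \min_{c \in \mathcal{Y}} \mathrm{Acc}(\mathcal{D}_c)
\]
```

Under these definitions, worst-class accuracy needs only the class labels, whereas WGA also needs a group annotation for every evaluation example.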

Authors: Yuzhe Yang, Haoran Zhang, Dina Katabi, Marzyeh Ghassemi

Abstract: Machine learning models often perform poorly on subgroups that are underrepresented in the training data. Yet, little is understood on the variation in mechanisms that cause subpopulation shifts, and how algorithms generalize across such diverse shifts at scale. In this work, we provide a fine-grained analysis of subpopulation shift. We first propose a unified framework that dissects and explains common shifts in subgroups. We then establish a comprehensive benchmark of 20 state-of-the-art algorithms evaluated on 12 real-world datasets in vision, language, and healthcare domains. With results obtained from training over 10,000 models, we reveal intriguing observations for future progress in this space. First, existing algorithms only improve subgroup robustness over certain types of shifts but not others. Moreover, while current algorithms rely on group-annotated validation data for model selection, we find that a simple selection criterion based on worst-class accuracy is surprisingly effective even without any group information. Finally, unlike existing works that solely aim to improve worst-group accuracy (WGA), we demonstrate the fundamental tradeoff between WGA and other important metrics, highlighting the need to carefully choose testing metrics. Code and data are available at: https://github.com/YyzHarry/SubpopBench.
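
As a rough illustration of model selection by worst-class accuracy, the sketch below scores candidate checkpoints on a validation set using class labels only. It is a minimal example under stated assumptions, not the authors' implementation: the function names, the checkpoint names, and the toy data are all hypothetical, and worst-class accuracy is taken to mean the minimum per-class accuracy.

```python
import numpy as np


def worst_subset_accuracy(y_true, y_pred, subset_ids):
    """Return the lowest accuracy over the subsets identified by subset_ids."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    subset_ids = np.asarray(subset_ids)
    correct = y_true == y_pred
    return min(correct[subset_ids == s].mean() for s in np.unique(subset_ids))


def worst_class_accuracy(y_true, y_pred):
    """Worst per-class accuracy: subsets are defined by the true label."""
    return worst_subset_accuracy(y_true, y_pred, y_true)


def worst_group_accuracy(y_true, y_pred, groups):
    """Worst-group accuracy: subsets are defined by a group annotation."""
    return worst_subset_accuracy(y_true, y_pred, groups)


def select_checkpoint(y_val, preds_by_checkpoint):
    """Pick the checkpoint whose validation predictions maximize
    worst-class accuracy (no group annotations required)."""
    return max(
        preds_by_checkpoint,
        key=lambda name: worst_class_accuracy(y_val, preds_by_checkpoint[name]),
    )


# Hypothetical toy data: both checkpoints have the same average accuracy
# (4 of 6 correct), but ckpt_a never predicts class 1.
y_val = [0, 0, 0, 0, 1, 1]
preds = {
    "ckpt_a": [0, 0, 0, 0, 0, 0],
    "ckpt_b": [0, 0, 0, 1, 1, 0],
}
best = select_checkpoint(y_val, preds)
```

In this toy case average accuracy cannot separate the two checkpoints, while worst-class accuracy prefers `ckpt_b`, which is the kind of distinction the selection criterion is meant to capture.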

Submitted to arXiv on 23 Feb. 2023

Results of the summarizing process for the arXiv paper: 2302.12254v1

In "Change is Hard: A Closer Look at Subpopulation Shift," Yuzhe Yang, Haoran Zhang, Dina Katabi, and Marzyeh Ghassemi study why machine learning models perform poorly on subgroups that are underrepresented in the training data. They note that little is understood about the different mechanisms that cause subpopulation shift, or about how algorithms generalize across such shifts at scale. The authors first propose a unified framework that dissects and explains common shifts in subgroups. They then establish a benchmark of 20 state-of-the-art algorithms evaluated on 12 real-world datasets in the vision, language, and healthcare domains. From training over 10,000 models, they report three observations. First, existing algorithms improve subgroup robustness only over certain types of shift and not others. Second, although current algorithms rely on group-annotated validation data for model selection, a simple selection criterion based on worst-class accuracy is surprisingly effective even without any group information. Third, unlike previous work that aims solely to improve worst-group accuracy (WGA), the study demonstrates a fundamental tradeoff between WGA and other important metrics, which highlights the need to choose testing metrics carefully. Code and data are available at https://github.com/YyzHarry/SubpopBench.
Created on 16 Aug. 2023

