Active learning for data streams: a survey

AI-generated keywords: Online active learning

AI-generated Key Points

  • Online active learning with data streams aims to minimize labeling costs by selecting the most informative data points as they arrive in real time.
  • Obtaining annotated data remains a challenge for training complex prediction and decision-making models, hindering AI integration into real-world applications like healthcare or autonomous driving.
  • Current strategies in this field include uncertainty sampling, diversity sampling, query by committee, and reinforcement learning for online classification, regression, and semi-supervised learning.
  • Further research is needed for online active linear regression models and advanced methods applicable to nonlinear models beyond linear bandits.
  • Future directions include exploring model-agnostic approaches for regression models and developing single-pass online sampling strategies for dynamic data streams.
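Diversity sampling, one of the strategy families listed above, can be illustrated with a very simple single-pass rule. The sketch below is a minimal, hypothetical example and not a method taken from the survey. It assumes observations are numeric vectors, uses Euclidean distance, and requests a label only when a new point is farther than a chosen `radius` from every point labeled so far. The function name and the `radius` and `budget` parameters are illustrative assumptions.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def diversity_stream_sampling(
    stream: Iterable[Sequence[float]],
    radius: float,
    budget: int,
) -> List[Tuple[float, ...]]:
    """Hypothetical diversity rule: query a point only if it is farther than
    `radius` from every point queried so far (each point is seen once)."""
    selected: List[Tuple[float, ...]] = []
    for x in stream:
        if len(selected) >= budget:
            break  # labeling budget exhausted: stop querying
        # A point counts as "novel" if no previously labeled point lies within radius.
        if all(math.dist(x, s) > radius for s in selected):
            selected.append(tuple(x))  # here the point would go to the annotator
    return selected


toy_stream = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.05, 1.0), (2.0, 0.0)]
print(diversity_stream_sampling(toy_stream, radius=0.5, budget=10))
```

Near-duplicates such as `(0.1, 0.0)` are skipped because a similar point has already been labeled, so the labeling budget is spread across different regions of the input space.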

Authors: Davide Cacciarelli, Murat Kulahci

Published in Machine Learning (2023): 1-55
License: CC BY 4.0

Abstract: Online active learning is a paradigm in machine learning that aims to select the most informative data points to label from a data stream. The problem of minimizing the cost associated with collecting labeled observations has gained a lot of attention in recent years, particularly in real-world applications where data is only available in an unlabeled form. Annotating each observation can be time-consuming and costly, making it difficult to obtain large amounts of labeled data. To overcome this issue, many active learning strategies have been proposed in the last decades, aiming to select the most informative observations for labeling in order to improve the performance of machine learning models. These approaches can be broadly divided into two categories: static pool-based and stream-based active learning. Pool-based active learning involves selecting a subset of observations from a closed pool of unlabeled data, and it has been the focus of many surveys and literature reviews. However, the growing availability of data streams has led to an increase in the number of approaches that focus on online active learning, which involves continuously selecting and labeling observations as they arrive in a stream. This work aims to provide an overview of the most recently proposed approaches for selecting the most informative observations from data streams in real time. We review the various techniques that have been proposed and discuss their strengths and limitations, as well as the challenges and opportunities that exist in this area of research.
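The stream-based setting described in the abstract requires an immediate query-or-discard decision for each arriving observation. This differs from pool-based selection over a closed set. The following is a minimal sketch of stream-based uncertainty sampling under stated assumptions: a binary online logistic regression model, a margin rule that queries when the predicted probability is within `threshold` of 0.5, and a fixed labeling `budget`. All names and parameter values are hypothetical and are not taken from the survey.

```python
import math
import random
from typing import Callable, Iterable, List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w: List[float], x: Sequence[float]) -> float:
    # w[0] is the bias term; w[1:] pairs with the features of x.
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def stream_uncertainty_sampling(
    stream: Iterable[Sequence[float]],
    oracle: Callable[[Sequence[float]], int],
    n_features: int,
    threshold: float = 0.2,
    budget: int = 50,
    lr: float = 0.1,
) -> Tuple[List[float], int]:
    """Single pass over the stream: request a label only when the model is uncertain."""
    w = [0.0] * (n_features + 1)
    queries = 0
    for x in stream:
        p = predict_proba(w, x)
        # Uncertainty = closeness of P(y=1|x) to 0.5; query inside the margin.
        if queries < budget and abs(p - 0.5) < threshold:
            y = oracle(x)  # costly annotation step
            queries += 1
            # One SGD step on the logistic (log) loss for the newly labeled point.
            g = p - y
            w[0] -= lr * g
            for j, xj in enumerate(x):
                w[j + 1] -= lr * g * xj
        # Points that are not queried are discarded: each one is seen only once.
    return w, queries


def oracle(x: Sequence[float]) -> int:
    # Hypothetical annotator for a synthetic, linearly separable problem.
    return int(x[0] + x[1] > 0)


rng = random.Random(0)
stream = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(2000)]
w, used = stream_uncertainty_sampling(stream, oracle, n_features=2, budget=100)
print(f"queried {used} of {len(stream)} points, weights = {w}")
```

As the model becomes more confident, fewer points fall inside the margin, so later queries concentrate near the current decision boundary. The fixed threshold is a simplification. Many stream-based methods instead adapt the threshold over time to control the querying rate.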

Submitted to arXiv on 17 Feb. 2023

Results of the summarizing process for the arXiv paper: 2302.08893v4

Online active learning with data streams is a rapidly evolving field in machine learning that focuses on selecting the most informative data points in real time to minimize the cost associated with collecting labeled observations. The increasing volume of data generated by modern applications has made it crucial to develop effective methods for learning continuously from data streams. However, obtaining annotated data to train complex prediction and decision-making models remains a challenge, hindering the integration of artificial intelligence into real-world applications such as healthcare, autonomous driving, and industrial production.
This comprehensive survey provides an overview of the current state-of-the-art strategies for online active learning with data streams. Techniques based on uncertainty sampling, diversity sampling, query by committee, and reinforcement learning have been explored in contexts such as online classification, regression, and semi-supervised learning. The analysis highlights the need for further research into online active linear regression models and into more advanced methods applicable to nonlinear models beyond linear bandits. Future directions include investigating model-agnostic approaches for regression models and developing single-pass online sampling strategies for dynamic data streams. While ensemble models and batch-based approaches have been dominant in online classification, there is growing interest in methods that can handle continuous streams of data without requiring batch processing. Research efforts are also directed towards leveraging Bayesian optimization for active learning in nonlinear regression problems to enhance model performance.
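Query by committee, mentioned in the summary above, queries a label when several models disagree about an incoming point. The sketch below is a minimal, hypothetical stream version and not the procedure of any specific paper covered by the survey. It assumes a committee of online perceptrons with random initial weights, measures disagreement as vote entropy (in bits), and requests a label when that entropy reaches `min_entropy`, up to a fixed `budget`. All class, function, and parameter names are illustrative assumptions.

```python
import math
import random
from collections import Counter
from typing import Callable, Iterable, List, Sequence, Tuple


def vote_entropy(votes: Sequence[int]) -> float:
    """Entropy (in bits) of the committee's votes; 0.0 means unanimous agreement."""
    n = len(votes)
    return -sum((c / n) * math.log2(c / n) for c in Counter(votes).values())


class Perceptron:
    """Minimal online perceptron used as one committee member (illustrative only)."""

    def __init__(self, n_features: int, rng: random.Random) -> None:
        # Random initial weights (w[0] is the bias) so members start out disagreeing.
        self.w = [rng.gauss(0.0, 1.0) for _ in range(n_features + 1)]

    def predict(self, x: Sequence[float]) -> int:
        s = self.w[0] + sum(wi * xi for wi, xi in zip(self.w[1:], x))
        return 1 if s > 0 else 0

    def update(self, x: Sequence[float], y: int) -> None:
        # Classic perceptron rule: change weights only when the prediction is wrong.
        err = y - self.predict(x)
        self.w[0] += err
        for j, xj in enumerate(x):
            self.w[j + 1] += err * xj


def qbc_stream(
    stream: Iterable[Sequence[float]],
    oracle: Callable[[Sequence[float]], int],
    n_features: int,
    n_members: int = 5,
    min_entropy: float = 0.5,
    budget: int = 50,
    seed: int = 0,
) -> Tuple[List[Perceptron], int]:
    """Single pass: query a label only when the committee's vote entropy is high enough."""
    rng = random.Random(seed)
    committee = [Perceptron(n_features, rng) for _ in range(n_members)]
    queries = 0
    for x in stream:
        votes = [m.predict(x) for m in committee]
        if queries < budget and vote_entropy(votes) >= min_entropy:
            y = oracle(x)  # costly annotation step
            queries += 1
            # Every member learns from the newly labeled point.
            for m in committee:
                m.update(x, y)
    return committee, queries


data_rng = random.Random(1)
stream = [(data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)) for _ in range(1000)]
committee, used = qbc_stream(stream, lambda x: int(x[0] + x[1] > 0), n_features=2, budget=40)
print(f"queried {used} of {len(stream)} points")
```

With five members, any split vote has an entropy of at least about 0.72 bits, so `min_entropy=0.5` queries whenever the committee is not unanimous. Raising the threshold restricts queries to closer splits. Vote entropy is only one possible disagreement measure.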
Created on 14 Mar. 2026

