SAM 2: Segment Anything in Images and Videos

AI-generated keywords: SAM 2, Segment Anything Model, visual segmentation, video processing, computer vision

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • The paper introduces Segment Anything Model 2 (SAM 2), a foundation model for promptable visual segmentation in images and videos.
  • The authors build a data engine that improves both the model and the data through user interaction, and use it to collect the largest video segmentation dataset to date.
  • The model is a simple transformer architecture with streaming memory for real-time video processing; trained on the collected data, it performs strongly across a wide range of tasks.
  • In video segmentation, SAM 2 achieves better accuracy while using 3x fewer interactions than prior approaches.
  • In image segmentation, SAM 2 is more accurate and 6x faster than its predecessor, the Segment Anything Model (SAM).
  • The authors believe that their data, model, and insights will serve as a significant milestone for video segmentation and related perception tasks.
  • A version of the model along with the dataset and an interactive demo are being released at https://ai.meta.com/sam2.

Authors: Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, Christoph Feichtenhofer

Website: https://ai.meta.com/sam2

Abstract: We present Segment Anything Model 2 (SAM 2), a foundation model towards solving promptable visual segmentation in images and videos. We build a data engine, which improves model and data via user interaction, to collect the largest video segmentation dataset to date. Our model is a simple transformer architecture with streaming memory for real-time video processing. SAM 2 trained on our data provides strong performance across a wide range of tasks. In video segmentation, we observe better accuracy, using 3x fewer interactions than prior approaches. In image segmentation, our model is more accurate and 6x faster than the Segment Anything Model (SAM). We believe that our data, model, and insights will serve as a significant milestone for video segmentation and related perception tasks. We are releasing a version of our model, the dataset and an interactive demo.
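The abstract mentions "streaming memory" without giving details. The following is only a toy illustration of the general idea: frames are consumed one at a time, and a bounded buffer of summaries from earlier frames conditions each new output. Everything in it is an assumption made for illustration and is not taken from the paper: the class name `StreamingMemory`, the `capacity` parameter, the `process_stream` helper, and the simple averaging step that stands in for whatever learned fusion a real model would use.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List


class StreamingMemory:
    """Bounded FIFO buffer of per-frame feature vectors (hypothetical, not SAM 2's design)."""

    def __init__(self, capacity: int = 4) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        # deque(maxlen=...) drops the oldest entry automatically once full.
        self._frames: Deque[List[float]] = deque(maxlen=capacity)

    def __len__(self) -> int:
        return len(self._frames)

    def context(self, dim: int) -> List[float]:
        """Return the mean of the stored features, or zeros if nothing is stored yet."""
        if not self._frames:
            return [0.0] * dim
        n = len(self._frames)
        return [sum(f[i] for f in self._frames) / n for i in range(dim)]

    def add(self, features: List[float]) -> None:
        self._frames.append(list(features))


def process_stream(
    frames: Iterable[List[float]], capacity: int = 4
) -> Iterator[List[float]]:
    """Yield one output per frame, conditioned on a memory of earlier frames."""
    memory = StreamingMemory(capacity)
    for features in frames:
        ctx = memory.context(len(features))
        # Stand-in for a learned fusion step: blend current features with memory context.
        fused = [0.5 * x + 0.5 * c for x, c in zip(features, ctx)]
        memory.add(features)
        yield fused


if __name__ == "__main__":
    for out in process_stream([[2.0], [4.0], [6.0]], capacity=2):
        print(out)
```

Because the buffer is bounded, the work per frame in this sketch does not grow with video length, which is the property that makes a streaming design suitable for processing frames as they arrive.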

Submitted to arXiv on 01 Aug. 2024


Results of the summarizing process for the arXiv paper: 2408.00714v1

This paper's license does not allow us to build upon its content, so the summary below was generated from the paper's metadata rather than the full article.

The paper "SAM 2: Segment Anything in Images and Videos" introduces Segment Anything Model 2 (SAM 2), a foundation model for promptable visual segmentation in both images and videos. It was developed by Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, and Christoph Feichtenhofer. The authors build a data engine that uses user interaction to improve both the model and the data, resulting in the largest video segmentation dataset to date. Built on a simple transformer architecture with streaming memory for real-time video processing, SAM 2 performs strongly across a wide range of tasks when trained on this dataset. In video segmentation, it achieves better accuracy while using 3x fewer interactions than prior approaches. In image segmentation, SAM 2 is more accurate and 6x faster than its predecessor, the Segment Anything Model (SAM). The authors believe that their data, model, and insights will serve as a significant milestone for video segmentation and related perception tasks. To facilitate further research and application of SAM 2, a version of the model, the dataset, and an interactive demo are being released at https://ai.meta.com/sam2.
Created on 23 Oct. 2025

