Olmo 3

AI-generated keywords: Olmo 3, language models, long-context reasoning, document packing, synthetic data augmentation

AI-generated Key Points

  • Olmo 3 is a family of state-of-the-art and fully-open language models available at the 7B and 32B parameter scales.
  • Olmo 3 model construction targets long-context reasoning, function calling, coding, instruction following, general chat, and knowledge recall.
  • The flagship model, Olmo 3 Think 32B, is described as the strongest fully-open thinking model released to date.
  • A long-context recipe was developed using documents sourced from the olmOCR science PDFs pool to extend the context capabilities of Olmo 3.
  • The Dolma 3 Longmino Pool combines 34% long-context data with 66% high-quality short-context data sampled from Dolma 3 Dolmino Mix for training purposes.
  • The long-context extension applies YaRN in full attention layers, together with document packing and inter-document masking.
  • The context-extended models were evaluated on two popular long-context benchmarks: RULER and HELMET.
  • Long-context data came primarily from scientific PDFs processed by olmOCR and filtered by gzip compressibility.
  • Further experiments determined which components of the recipe improve long-context performance.
  • Applying YaRN to full attention layers gave the best results, and olmOCR science PDFs proved more effective than the alternatives considered.
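
The gzip compressibility filter mentioned in the key points can be illustrated with a short sketch. The summary does not state the thresholds or which end of the distribution was discarded, so the `low` and `high` bounds and the function names below are hypothetical, chosen only to show the mechanics of scoring a document by its compression ratio.

```python
import gzip


def gzip_ratio(text: str) -> float:
    """Compressed size divided by raw size in bytes; lower means more repetitive."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return len(gzip.compress(raw)) / len(raw)


def keep_document(text: str, low: float = 0.1, high: float = 0.9) -> bool:
    """Keep documents whose ratio lies inside a band.

    The [low, high] band is an assumption for illustration, not a value
    taken from the paper.
    """
    return low <= gzip_ratio(text) <= high


# A highly repetitive document compresses very well and falls below `low`.
repetitive = "table of contents " * 500
print(round(gzip_ratio(repetitive), 4), keep_document(repetitive))
```

A ratio near zero usually signals boilerplate or repeated text, while a ratio near one suggests content with little redundancy, such as noisy extraction output. Where to place the cutoffs is an empirical choice.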

Authors: Team Olmo: Allyson Ettinger, Amanda Bertsch, Bailey Kuehl, David Graham, David Heineman, Dirk Groeneveld, Faeze Brahman, Finbarr Timbers, Hamish Ivison, Jacob Morrison, Jake Poznanski, Kyle Lo, Luca Soldaini, Matt Jordan, Mayee Chen, Michael Noukhovitch, Nathan Lambert, Pete Walsh, Pradeep Dasigi, Robert Berry, Saumya Malik, Saurabh Shah, Scott Geng, Shane Arora, Shashank Gupta, Taira Anderson, Teng Xiao, Tyler Murray, Tyler Romero, Victoria Graf, Akari Asai, Akshita Bhagia, Alexander Wettig, Alisa Liu, Aman Rangapur, Chloe Anastasiades, Costa Huang, Dustin Schwenk, Harsh Trivedi, Ian Magnusson, Jaron Lochner, Jiacheng Liu, Lester James V. Miranda, Maarten Sap, Malia Morgan, Michael Schmitz, Michal Guerquin, Michael Wilson, Regan Huff, Ronan Le Bras, Rui Xin, Rulin Shao, Sam Skjonsberg, Shannon Zejiang Shen, Shuyue Stella Li, Tucker Wilde, Valentina Pyatkin, Will Merrill, Yapei Chang, Yuling Gu, Zhiyuan Zeng, Ashish Sabharwal, Luke Zettlemoyer, Pang Wei Koh, Ali Farhadi, Noah A. Smith, Hannaneh Hajishirzi

License: CC BY 4.0

Abstract: We introduce Olmo 3, a family of state-of-the-art, fully-open language models at the 7B and 32B parameter scales. Olmo 3 model construction targets long-context reasoning, function calling, coding, instruction following, general chat, and knowledge recall. This release includes the entire model flow, i.e., the full lifecycle of the family of models, including every stage, checkpoint, data point, and dependency used to build it. Our flagship model, Olmo 3 Think 32B, is the strongest fully-open thinking model released to date.

Submitted to arXiv on 15 Dec. 2025

Results of the summarizing process for the arXiv paper: 2512.13961v1

Olmo 3 is a family of state-of-the-art, fully-open language models at the 7B and 32B parameter scales. Model construction targets long-context reasoning, function calling, coding, instruction following, general chat, and knowledge recall. The flagship model, Olmo 3 Think 32B, is described as the strongest fully-open thinking model released to date.
To extend the context length of Olmo 3, a long-context recipe was developed around documents from the olmOCR science PDFs pool. The resulting collection, the Dolma 3 Longmino Pool, combines 34% long-context data with 66% high-quality short-context data sampled from Dolma 3 Dolmino Mix. Both the 7B and 32B models received additional training on this mix. The extension process applies YaRN in full attention layers, together with document packing and inter-document masking.
The context-extended models were evaluated on two popular long-context benchmarks, RULER and HELMET. RULER consists of synthetic long-context tasks, such as Needle-in-a-Haystack variations and aggregation tasks, and served as the primary metric guiding recipe development. HELMET offers a more diverse set of tasks covering retrieval, in-context learning, and summarization, and is used to assess more general capabilities. Long-context data came primarily from scientific PDFs processed by olmOCR and filtered by gzip compressibility. LongPpl, which identifies key tokens that require long-range dependencies, was also considered as a filter. Further experiments determined which components of the recipe improve long-context performance.
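
Document packing and inter-document masking can be made concrete with a small sketch. The summary does not describe the packing algorithm actually used, so the greedy strategy, the truncation of over-long documents, and the names `pack_documents` and `inter_document_mask` are all assumptions made for illustration.

```python
from typing import List, Tuple


def pack_documents(
    docs: List[List[int]], seq_len: int
) -> List[Tuple[List[int], List[int]]]:
    """Greedily concatenate tokenized documents into sequences of at most seq_len tokens.

    Returns (tokens, doc_ids) pairs, where doc_ids records the source
    document of every token. Greedy packing is a simplification.
    """
    packed = []
    tokens: List[int] = []
    doc_ids: List[int] = []
    for doc_index, doc in enumerate(docs):
        doc = doc[:seq_len]  # simplification: truncate documents longer than seq_len
        if len(tokens) + len(doc) > seq_len:
            # Current sequence cannot hold this document; start a new one.
            packed.append((tokens, doc_ids))
            tokens, doc_ids = [], []
        tokens.extend(doc)
        doc_ids.extend([doc_index] * len(doc))
    if tokens:
        packed.append((tokens, doc_ids))
    return packed


def inter_document_mask(doc_ids: List[int]) -> List[List[bool]]:
    """Causal mask: query q may attend to key k only if k <= q and both share a document."""
    n = len(doc_ids)
    return [[k <= q and doc_ids[k] == doc_ids[q] for k in range(n)] for q in range(n)]


docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
packed = pack_documents(docs, seq_len=6)
tokens, doc_ids = packed[0]
mask = inter_document_mask(doc_ids)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

The printed mask is block-diagonal and lower-triangular: tokens from the second document never attend to tokens from the first, even though both share one training sequence.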
Overall, applying YaRN to full attention layers gave the best results, and olmOCR science PDFs proved more effective than the alternatives considered. Synthetic data augmentation improved performance further over natural documents alone, and document packing was identified as crucial for performance at longer contexts. Building on the analysis of architectural design decisions in Bertsch et al., Olmo 3 extends its capabilities to complex long-context tasks, with results reported on RULER and HELMET.
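
For readers unfamiliar with YaRN, the sketch below shows the per-dimension frequency interpolation and the attention scaling factor as defined in the YaRN paper. It is a minimal illustration, not the Olmo 3 implementation: the head dimension, RoPE base, original context length, and scale factor in the example call are hypothetical values, and `alpha` and `beta` default to the values the YaRN authors suggest for Llama-style models.

```python
import math
from typing import List


def yarn_inv_freq(
    head_dim: int,
    base: float,
    orig_len: int,
    scale: float,
    alpha: float = 1.0,
    beta: float = 32.0,
) -> List[float]:
    """Rescale RoPE inverse frequencies following the YaRN "NTK-by-parts" rule."""
    inv_freq = []
    for i in range(0, head_dim, 2):
        theta = base ** (-i / head_dim)
        # Number of full rotations this dimension completes within the original context.
        rotations = orig_len * theta / (2 * math.pi)
        # ramp = 1 keeps the frequency unchanged; ramp = 0 interpolates it by 1/scale.
        ramp = min(1.0, max(0.0, (rotations - alpha) / (beta - alpha)))
        inv_freq.append((1 - ramp) * theta / scale + ramp * theta)
    return inv_freq


def yarn_attention_scale(scale: float) -> float:
    """Factor applied to the rotary embeddings of queries and keys in YaRN."""
    return 0.1 * math.log(scale) + 1.0


# Hypothetical settings: 128-dim heads, base 10000, 8192 -> 65536 tokens (scale 8).
freqs = yarn_inv_freq(head_dim=128, base=10000.0, orig_len=8192, scale=8.0)
print(freqs[0], freqs[-1], yarn_attention_scale(8.0))
```

In a model that mixes attention types, frequencies like these would be used only in the full attention layers, matching the configuration the summary reports as best. Any other layers would keep their original rotary frequencies.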
Created on 23 Feb. 2026
