Olmo 3
AI-generated keywords:
Olmo 3
language models
long-context reasoning
document packing
synthetic data augmentation
- Olmo 3 is a family of state-of-the-art and fully-open language models available at the 7B and 32B parameter scales.
- The construction of Olmo 3 models focuses on enhancing long-context reasoning, function calling, coding, instruction following, general chat, and knowledge recall capabilities.
- The flagship model of the family, Olmo 3 Think 32B, is the strongest fully-open thinking model released to date.
- A long-context recipe was developed using documents sourced from the olmOCR science PDFs pool to extend the context capabilities of Olmo 3.
- The Dolma 3 Longmino Pool combines 34% long-context data with 66% high-quality short-context data sampled from Dolma 3 Dolmino Mix for training purposes.
- The long-context extension applies YaRN in the full attention layers and uses document packing with inter-document masking.
- Performance evaluation of context-extended models was conducted on two popular long-context benchmarks: RULER and HELMET.
- Long-context data came primarily from scientific PDFs processed by olmOCR and filtered on gzip compressibility.
- Further experiments identified which components of the recipe matter most for long-context performance.
- Applying YaRN to the full attention layers gave the best results, and olmOCR science PDFs proved more effective than the alternatives tested.
Authors:
Team Olmo,
Allyson Ettinger,
Amanda Bertsch,
Bailey Kuehl,
David Graham,
David Heineman,
Dirk Groeneveld,
Faeze Brahman,
Finbarr Timbers,
Hamish Ivison,
Jacob Morrison,
Jake Poznanski,
Kyle Lo,
Luca Soldaini,
Matt Jordan,
Mayee Chen,
Michael Noukhovitch,
Nathan Lambert,
Pete Walsh,
Pradeep Dasigi,
Robert Berry,
Saumya Malik,
Saurabh Shah,
Scott Geng,
Shane Arora,
Shashank Gupta,
Taira Anderson,
Teng Xiao,
Tyler Murray,
Tyler Romero,
Victoria Graf,
Akari Asai,
Akshita Bhagia,
Alexander Wettig,
Alisa Liu,
Aman Rangapur,
Chloe Anastasiades,
Costa Huang,
Dustin Schwenk,
Harsh Trivedi,
Ian Magnusson,
Jaron Lochner,
Jiacheng Liu,
Lester James V. Miranda,
Maarten Sap,
Malia Morgan,
Michael Schmitz,
Michal Guerquin,
Michael Wilson,
Regan Huff,
Ronan Le Bras,
Rui Xin,
Rulin Shao,
Sam Skjonsberg,
Shannon Zejiang Shen,
Shuyue Stella Li,
Tucker Wilde,
Valentina Pyatkin,
Will Merrill,
Yapei Chang,
Yuling Gu,
Zhiyuan Zeng,
Ashish Sabharwal,
Luke Zettlemoyer,
Pang Wei Koh,
Ali Farhadi,
Noah A. Smith,
Hannaneh Hajishirzi
Abstract: We introduce Olmo 3, a family of state-of-the-art, fully-open language models at the 7B and 32B parameter scales. Olmo 3 model construction targets long-context reasoning, function calling, coding, instruction following, general chat, and knowledge recall. This release includes the entire model flow, i.e., the full lifecycle of the family of models, including every stage, checkpoint, data point, and dependency used to build it. Our flagship model, Olmo 3 Think 32B, is the strongest fully-open thinking model released to-date.
Submitted to arXiv on 15 Dec. 2025
Olmo 3 is a family of state-of-the-art, fully-open language models at the 7B and 32B parameter scales. Model construction targets long-context reasoning, function calling, coding, instruction following, general chat, and knowledge recall. The flagship model, Olmo 3 Think 32B, is the strongest fully-open thinking model released to date.
To extend the context capabilities of Olmo 3, a long-context recipe was developed using documents sourced from the olmOCR science PDFs pool. The resulting collection, known as the Dolma 3 Longmino Pool, combines 34% long-context data with 66% high-quality short-context data sampled from the Dolma 3 Dolmino Mix. Both the 7B and 32B models were trained further on this mix. The extension recipe applies YaRN in the full attention layers and uses document packing with inter-document masking.
The context-extended models were evaluated on two popular long-context benchmarks, RULER and HELMET. RULER consists of synthetic long-context tasks, such as Needle-in-a-Haystack variations and aggregation tasks, and served as the primary metric for guiding recipe development. HELMET offers a more diverse set of long-context tasks covering retrieval, in-context learning, and summarization, and was used to assess more general capabilities.
Long-context data came primarily from scientific PDFs processed by olmOCR and filtered on gzip compressibility. LongPpl, which identifies key tokens that require long-range dependencies, was also considered as a filter. Further experiments isolated which components of the recipe contribute most to long-context performance.
The results indicate that applying YaRN to the full attention layers gave the best outcomes, and that olmOCR science PDFs were more effective than the alternatives tested. Synthetic data augmentation improved performance over natural documents alone, and document packing was identified as a crucial factor for longer contexts. In conclusion, by analyzing and testing the architectural design decisions outlined in Bertsch et al., Olmo 3 extended its ability to handle complex long-context tasks, as measured on RULER and HELMET.
Summary
- Olmo 3 is a family of advanced language models available in different sizes.
- These models are designed to improve understanding and use of language in various tasks like coding, chatting, and recalling information.
- The main model, Olmo 3 Think 32B, is known for its strong thinking abilities.
- New data sources were used to make the models better at understanding long pieces of text.
- Different techniques were tested to see which ones worked best for improving the models' performance.
Definitions
- Language Models: Programs that can understand and generate human language.
- Parameters: The numerical values a model learns during training; "7B" means roughly 7 billion of them.
- Context: The text a model can take into account at one time when producing its output.
- Document: A written or digital record containing information.
- Training: Teaching a model by giving it examples to learn from.
The Olmo 3 language models, developed by Team Olmo, are a family of state-of-the-art, fully-open models released at the 7B and 32B parameter scales. They are designed to strengthen long-context reasoning, function calling, coding, instruction following, general chat, and knowledge recall. The flagship model of the family, Olmo 3 Think 32B, is the strongest fully-open thinking model released to date.
To extend the context capabilities of Olmo 3, a long-context recipe was developed using documents sourced from the olmOCR science PDFs pool. The resulting collection, known as the Dolma 3 Longmino Pool, combines long-context data (34%) with high-quality short-context data (66%) sampled from the Dolma 3 Dolmino Mix. Both the 7B and 32B models were trained further on this mix.
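As a rough illustration of the 34%/66% split described above, the sketch below draws documents from two pools in those proportions. The pool contents, the function name `sample_mix`, and the seed are hypothetical, and sampling per document is an assumption; the actual Olmo 3 pipeline may well mix by token count instead.

```python
import random

def sample_mix(long_docs, short_docs, n_samples, long_fraction=0.34, seed=0):
    """Draw n_samples documents, choosing the long-context pool with
    probability long_fraction and the short-context pool otherwise."""
    rng = random.Random(seed)
    mix = []
    for _ in range(n_samples):
        pool = long_docs if rng.random() < long_fraction else short_docs
        mix.append(rng.choice(pool))
    return mix

# Hypothetical toy pools standing in for the two data sources.
long_pool = [("long", i) for i in range(100)]
short_pool = [("short", i) for i in range(100)]

mix = sample_mix(long_pool, short_pool, n_samples=10_000)
long_share = sum(1 for source, _ in mix if source == "long") / len(mix)
print(f"long-context share: {long_share:.3f}")
```

With enough samples the observed share settles near the requested fraction, which is all this sketch is meant to show.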
Several techniques were applied during the long-context extension. One was YaRN, applied in the full attention layers, which proved highly effective. Document packing and inter-document masking were also used to improve performance further.
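To make packing and inter-document masking concrete, here is a minimal sketch. It uses naive concatenate-and-chunk packing, which is an assumption and may differ from the packing strategy actually used for Olmo 3; the function names and toy token IDs are hypothetical.

```python
def pack_documents(docs, seq_len):
    """Concatenate tokenized documents into fixed-length sequences.

    Returns a list of (tokens, doc_ids) pairs, where doc_ids records
    which document each position came from. A document that does not
    fit is split across sequences, and a trailing partial sequence is
    dropped.
    """
    sequences, tokens, doc_ids = [], [], []
    for doc_id, doc in enumerate(docs):
        for tok in doc:
            tokens.append(tok)
            doc_ids.append(doc_id)
            if len(tokens) == seq_len:
                sequences.append((tokens, doc_ids))
                tokens, doc_ids = [], []
    return sequences

def inter_document_mask(doc_ids):
    """Causal attention mask restricted to the same document:
    position i may attend to position j only if j <= i and both
    positions belong to the same document."""
    n = len(doc_ids)
    return [[j <= i and doc_ids[i] == doc_ids[j] for j in range(n)]
            for i in range(n)]

# Three toy "documents" of token IDs, packed into sequences of length 4.
docs = [[1, 2, 3], [4, 5], [6, 7, 8]]
packed = pack_documents(docs, seq_len=4)
tokens, doc_ids = packed[0]
mask = inter_document_mask(doc_ids)

for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

In the first packed sequence the last position belongs to a new document, so its mask row allows attention only to itself; without the mask it would also attend to the three tokens of the preceding, unrelated document.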
To evaluate the context-extended models on long-context tasks, two popular benchmarks were used: RULER and HELMET. RULER consists of synthetic long-context tasks, such as Needle-in-a-Haystack variations and aggregation tasks, and served as the primary metric for guiding recipe development. HELMET offers a more diverse set of long-context tasks covering retrieval, in-context learning, and summarization, and was used to assess more general capabilities.
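For readers unfamiliar with Needle-in-a-Haystack tasks, the sketch below builds one toy example: a fact (the needle) is inserted at a chosen depth in filler text, and the model is asked to retrieve it. This is only an illustration of the task shape, not RULER's actual generator; the filler sentences, needle, and function names are all hypothetical.

```python
import random

def make_needle_example(filler_sentences, needle, n_sentences, depth, seed=0):
    """Build a haystack of n_sentences filler sentences and insert the
    needle at a relative depth in [0, 1] (0 = start, 1 = end)."""
    rng = random.Random(seed)
    haystack = [rng.choice(filler_sentences) for _ in range(n_sentences)]
    position = round(depth * n_sentences)
    haystack.insert(position, needle)
    return " ".join(haystack), position

def contains_answer(model_answer, expected):
    """Score 1.0 if the expected string appears in the answer, else 0.0."""
    return 1.0 if expected in model_answer else 0.0

filler = ["The sky was clear that morning.", "Traffic moved slowly downtown."]
needle = "The secret number is 7481."

context, position = make_needle_example(filler, needle, n_sentences=200, depth=0.5)
prompt = context + "\n\nWhat is the secret number?"

# A model's reply would be scored against the expected value.
print(contains_answer("I believe it is 7481.", "7481"))
```

Varying `n_sentences` and `depth` gives a grid of context lengths and needle positions, which is the usual way such retrieval results are reported.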
Long-context data came primarily from scientific PDFs processed by olmOCR, with additional filtering based on gzip compressibility. LongPpl was also used during refinement to identify key tokens that require long-range dependencies. Further experiments determined which components contribute most to long-context performance.
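A gzip-based compressibility filter can be sketched in a few lines. The idea is that extremely compressible text tends to be repetitive, while barely compressible text tends to be noise. The thresholds, helper names, and toy documents below are assumptions for illustration; the cutoffs used for Olmo 3 are not stated here.

```python
import gzip
import random
import string

def gzip_ratio(text):
    """Compressed size divided by raw size, both in bytes. Lower values
    mean the text is more compressible (more repetitive)."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("cannot score an empty document")
    return len(gzip.compress(raw)) / len(raw)

def filter_by_compressibility(docs, low, high):
    """Keep documents whose gzip ratio lies in [low, high]."""
    return [doc for doc in docs if low <= gzip_ratio(doc) <= high]

rng = random.Random(0)
vocab = ["model", "context", "token", "paper", "result", "layer", "data",
         "figure", "method", "score", "train", "long", "short", "table",
         "sample", "filter", "weight", "window", "length", "task"]
alphabet = string.ascii_letters + string.digits + string.punctuation

repetitive = "the cat sat. " * 2000                                # boilerplate-like
varied = " ".join(rng.choice(vocab) for _ in range(2000))          # word-level text
noisy = "".join(rng.choice(alphabet) for _ in range(5000))         # OCR-garbage-like

# Hypothetical thresholds chosen only to separate the three toy documents.
kept = filter_by_compressibility([repetitive, varied, noisy], low=0.05, high=0.7)
print(len(kept))
```

Only the middle document survives: the repeated sentence compresses to almost nothing, and the random characters barely compress at all.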
These experiments indicated that applying YaRN to the full attention layers gave better results than the other configurations tested. Using olmOCR science PDFs proved more effective than the alternatives, and synthetic data augmentation improved performance over natural documents alone. Document packing was also identified as a crucial factor in boosting performance at longer contexts.
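The sketch below shows how YaRN rescales rotary position embedding (RoPE) frequencies, following the commonly used formulation: high-frequency dimensions are left unchanged, low-frequency dimensions are interpolated by the scaling factor, and a linear ramp blends the two in between. The head dimension, RoPE base, original context length, scaling factor, and the layer pattern are all hypothetical values chosen for illustration, not Olmo 3's configuration.

```python
import math

def yarn_inv_freq(head_dim, base, factor, original_max_pos,
                  beta_fast=32.0, beta_slow=1.0):
    """Return (inv_freq, attention_factor) for YaRN-scaled RoPE."""
    def correction_dim(num_rotations):
        # Dimension index whose frequency completes num_rotations full
        # rotations over the original context window.
        return (head_dim
                * math.log(original_max_pos / (num_rotations * 2 * math.pi))
                / (2 * math.log(base)))

    low = max(math.floor(correction_dim(beta_fast)), 0)
    high = min(math.ceil(correction_dim(beta_slow)), head_dim - 1)
    if low == high:
        high += 0.001  # avoid division by zero in the ramp

    inv_freq = []
    for i in range(head_dim // 2):
        extrapolated = 1.0 / (base ** (2 * i / head_dim))  # unmodified RoPE
        interpolated = extrapolated / factor               # position interpolation
        # ramp = 0 keeps the original frequency, ramp = 1 fully interpolates
        ramp = min(max((i - low) / (high - low), 0.0), 1.0)
        inv_freq.append(extrapolated * (1.0 - ramp) + interpolated * ramp)

    # Scale applied to attention logits to compensate for longer contexts.
    attention_factor = 0.1 * math.log(factor) + 1.0
    return inv_freq, attention_factor

# Hypothetical settings: 8x extension of an 8192-token context window.
scaled, attn_scale = yarn_inv_freq(head_dim=128, base=10000.0,
                                   factor=8.0, original_max_pos=8192)
plain, _ = yarn_inv_freq(head_dim=128, base=10000.0,
                         factor=1.0, original_max_pos=8192)

# Apply YaRN only where a layer uses full attention (assumed pattern).
layer_types = ["sliding", "sliding", "sliding", "full"]
per_layer_inv_freq = [scaled if kind == "full" else plain
                      for kind in layer_types]
```

Calling the function with `factor=1.0` reproduces unscaled RoPE, which makes it easy to keep the original frequencies in layers where YaRN is not applied.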
In conclusion, by analyzing and testing the architectural design decisions outlined in Bertsch et al., Olmo 3 extended its ability to handle complex long-context tasks, as measured on RULER and HELMET. The Dolma 3 Longmino Pool was a significant step toward improving the overall capabilities of the Olmo 3 models.