Sakana Fugu Technical Report

AI-generated keywords: Sakana Fugu

AI-generated Key Points

Sakana Fugu Technical Report highlights advancements in Large Language Models (LLMs) and specialization of providers in distinct domains
Introduces Sakana Fugu orchestrator models designed to enhance LLM agent teams' capabilities
Fugu models understand user queries and create adaptive scaffolds for problem-solving
Surpasses individual LLM agents on challenging tasks like SWE-Bench Pro, Terminal Bench, LiveCodeBench, GPQA-Diamond, Humanity's Last Exam, and CharXiv Reasoning
Unveils two models: Fugu for everyday use with balanced performance and latency, and Fugu-Ultra prioritizing answer quality on difficult problems
Training paradigm includes large-scale fine-tuning, evolutionary algorithms, and reinforcement learning approaches
Discusses infrastructure and core design principles transforming methods into a production system


Authors: Yujin Tang, Edoardo Cetin, Jinglue Xu, Qi Sun, Stefan Nielsen, Vincent Richard, Haruto Goda, Iaroslav Tymchenko, Nhan Nguyen, Hyunin Lee, Mari Ashiga, Shashank Kotyan, So Kuroki, Tarin Clanuwat

arXiv: 2606.21228v2 (cs.LG)

License: CC BY 4.0

Abstract: The capabilities of frontier Large Language Models (LLMs) continue to advance, with different providers increasingly specializing in distinct domains. This raises a natural next objective: how to combine the individual specializations of various LLMs into a collectively intelligent system. To this end, we report the development of Sakana Fugu, a family of orchestrator models that harness and amplify the capabilities of an LLM agent team. Fugu models are themselves language models trained to understand user queries and dynamically devise agentic scaffolds to solve them. Through these adaptive scaffolds, Fugu accesses performance beyond any individual LLM agent, achieving state-of-the-art results compared to other publicly accessible models across a range of challenging tasks, including SWE-Bench Pro, Terminal Bench, LiveCodeBench, GPQA-Diamond, Humanity's Last Exam, and CharXiv Reasoning. We release two models: Fugu, which balances performance with latency for everyday use, and Fugu-Ultra, which prioritizes answer quality on the hardest problems. We describe our training paradigm, which encompasses large-scale fine-tuning, evolutionary algorithms, and reinforcement learning approaches, along with the infrastructure and core design principles that turn these methods into a production system. We hope this report encourages further research into multi-agent systems and dynamic, query-adaptive agentic scaffolds as a path toward the next frontier of AI capabilities, accessed through collective intelligence.

Submitted to arXiv on 19 Jun. 2026


Results of the summarizing process for the arXiv paper: 2606.21228v2

Comprehensive Summary
Key points
Layman's Summary
Blog article

The Sakana Fugu Technical Report observes that the capabilities of frontier Large Language Models (LLMs) continue to advance, with different providers increasingly specializing in distinct domains. It introduces Sakana Fugu, a family of orchestrator models designed to harness and amplify the capabilities of an LLM agent team. The Fugu models are themselves language models, trained to understand user queries and dynamically devise agentic scaffolds to solve them. Through these adaptive scaffolds, Fugu surpasses individual LLM agents, achieving state-of-the-art results compared to other publicly accessible models on challenging tasks such as SWE-Bench Pro, Terminal Bench, LiveCodeBench, GPQA-Diamond, Humanity's Last Exam, and CharXiv Reasoning. The report releases two models: Fugu, which balances performance with latency for everyday use, and Fugu-Ultra, which prioritizes answer quality on the hardest problems. The training paradigm involves large-scale fine-tuning, evolutionary algorithms, and reinforcement learning approaches. The report also describes the infrastructure and core design principles that turn these methods into a production system. The report is authored by Yujin Tang, Edoardo Cetin, Jinglue Xu, Qi Sun, Stefan Nielsen, Vincent Richard, Haruto Goda, Iaroslav Tymchenko, Nhan Nguyen, Hyunin Lee, Mari Ashiga, Shashank Kotyan, So Kuroki, and Tarin Clanuwat. It aims to encourage further research into multi-agent systems and dynamic, query-adaptive agentic scaffolds as a path toward the next frontier of AI capabilities, accessed through collective intelligence.


Summary
- The Sakana Fugu Technical Report describes how computer programs that understand language keep improving, and how the programs from different providers are becoming specialized in different areas.
- The authors created special models, called Sakana Fugu orchestrator models, to make teams of these computer programs work better together.
- These Fugu models can understand what people ask them and help solve problems by building a helpful structure for each question.
- The Fugu models do very well on hard tests and problem-solving challenges, even better than any individual computer program on its own.
- There are two types of Fugu models: one for everyday use that balances answer quality with speed, and another that focuses on giving the best answers to tough problems.

Definitions
- Advancements: Improvements or progress made in something.
- Large Language Models (LLMs): Computer programs that can understand and generate human language on a large scale.
- Specialization: Becoming an expert or focusing on a specific area or topic.
- Orchestrator: A system or model that coordinates the actions of other parts to achieve a goal.
- Adaptive: Able to change or adjust based on different situations.

Introduction

The field of Artificial Intelligence (AI) has seen remarkable advancements in recent years, with the emergence of Large Language Models (LLMs) being one of the most significant breakthroughs. LLMs are trained on large amounts of text data and can generate human-like responses across a wide range of tasks. As frontier models continue to advance, different providers are increasingly specializing in distinct domains, which raises a natural question: how can the individual specializations of various LLMs be combined into one collectively intelligent system? To this end, the authors report the development of Sakana Fugu, a family of orchestrator models that harness and amplify the capabilities of an LLM agent team. In this technical report, they describe how Fugu outperforms individual LLM agents on various challenging tasks through its adaptive scaffolding approach.

The Need for Multi-Agent Systems

While single-agent systems have shown impressive results on AI tasks, a team of agents can in principle do better: each agent brings its own strengths to a problem, so the combined system can surpass what any single agent achieves alone. However, coordinating multiple agents poses its own challenges, such as communication between agents and task allocation among them. To tackle these issues, the report proposes an orchestrator model, Fugu, that dynamically devises agentic scaffolds based on user queries.

Fugu Models: Everyday Use vs Ultra Performance

The Sakana Fugu Technical Report introduces two models: Fugu for everyday use and Fugu-Ultra for the hardest problems. Both are trained using large-scale fine-tuning along with evolutionary algorithms and reinforcement learning approaches. Fugu balances performance with latency, whereas Fugu-Ultra prioritizes answer quality on difficult problems. According to the report, the Fugu models achieve state-of-the-art results compared to other publicly accessible models on benchmarks including SWE-Bench Pro, Terminal Bench, LiveCodeBench, GPQA-Diamond, Humanity's Last Exam, and CharXiv Reasoning.
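The trade-off between the two models can be pictured as a simple selection rule. The sketch below is only an illustration under stated assumptions: the identifiers `"fugu"` and `"fugu-ultra"`, the `choose_model` helper, and the 60-second threshold are all hypothetical and do not come from the report.

```python
def choose_model(latency_budget_s: float, is_hard: bool,
                 ultra_min_budget_s: float = 60.0) -> str:
    """Pick a model tier from a latency budget and a difficulty flag.

    Assumption: the threshold and the model identifiers are made up
    for illustration; the report does not specify either.
    """
    # Use the quality-focused tier only when the problem is hard
    # and the caller can afford to wait.
    if is_hard and latency_budget_s >= ultra_min_budget_s:
        return "fugu-ultra"
    # Otherwise fall back to the balanced everyday tier.
    return "fugu"
```

A caller with a tight latency budget always gets the balanced tier, which mirrors the everyday-use positioning described above.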

Adaptive Scaffolding: The Key to Fugu's Success

The core concept behind Sakana Fugu is its adaptive scaffolding approach. Rather than relying on one fixed workflow, the orchestrator reads each user query and dynamically devises an agentic scaffold, tailored to that query, that guides a team of LLM agents toward a solution. Through this approach, Fugu outperforms individual LLM agents by drawing on their collective strengths and expertise.
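To make the idea of a query-adaptive scaffold concrete, here is a minimal sketch. It is not the authors' implementation: in the report the orchestrator is a trained language model, whereas the keyword rule in `plan_scaffold`, the `Step` type, and the stub agents below are hypothetical stand-ins chosen so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# An "agent" is modeled as a function from a prompt to a text answer.
Agent = Callable[[str], str]


@dataclass
class Step:
    agent_name: str   # which agent handles this step
    instruction: str  # what that agent is asked to do


def plan_scaffold(query: str) -> List[Step]:
    """Hypothetical planner: choose steps from keywords in the query.

    A real orchestrator would be a trained model; this rule is a stand-in.
    """
    lowered = query.lower()
    if "bug" in lowered or "code" in lowered:
        return [Step("coder", "Propose a fix"),
                Step("reviewer", "Check the fix")]
    return [Step("generalist", "Answer directly")]


def run_scaffold(query: str, agents: Dict[str, Agent]) -> str:
    """Run each planned step in order, passing accumulated context along."""
    context = query
    for step in plan_scaffold(query):
        output = agents[step.agent_name](f"{step.instruction}\n\n{context}")
        context = f"{context}\n[{step.agent_name}] {output}"
    return context


# Stub agents that return fixed strings, so no model API is needed.
AGENTS: Dict[str, Agent] = {
    "coder": lambda prompt: "patch: guard against None",
    "reviewer": lambda prompt: "looks correct",
    "generalist": lambda prompt: "direct answer",
}
```

Calling `run_scaffold("Fix the bug in parser", AGENTS)` routes the query through the coder and then the reviewer, while a general question is answered in a single step. The point of the sketch is only that the sequence of agents depends on the query.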

Training Paradigm

To achieve optimal performance, the authors used a combination of large-scale fine-tuning techniques with evolutionary algorithms and reinforcement learning approaches. They trained Fugu models on a diverse set of tasks from different domains to ensure their versatility and adaptability. Moreover, they also incorporated human feedback into the training process through interactive learning methods. This not only improves model performance but also ensures that it aligns with human preferences and values.
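The summary does not say how evolutionary algorithms are applied, so the following is a generic illustration of the technique rather than the report's method. It is a simple elitist mutate-and-select loop (a hill-climbing style evolutionary search) over a vector of weights. The `evolve` function, the `toy_fitness` objective, and the idea of tuning three per-agent weights are all assumptions made for the example.

```python
import random
from typing import Callable, List


def evolve(fitness: Callable[[List[float]], float], dim: int,
           generations: int = 50, population: int = 20,
           sigma: float = 0.1, seed: int = 0) -> List[float]:
    """Elitist mutation-and-selection search that maximizes `fitness`."""
    rng = random.Random(seed)
    # Start from a random candidate in [0, 1]^dim.
    best = [rng.uniform(0, 1) for _ in range(dim)]
    best_score = fitness(best)
    for _ in range(generations):
        for _ in range(population):
            # Mutate the current best with Gaussian noise.
            child = [w + rng.gauss(0.0, sigma) for w in best]
            score = fitness(child)
            # Keep the child only if it improves on the best so far.
            if score > best_score:
                best, best_score = child, score
    return best


def toy_fitness(weights: List[float]) -> float:
    """Hypothetical objective: negative squared distance to a made-up target."""
    target = [0.7, 0.2, 0.1]
    return -sum((w - t) ** 2 for w, t in zip(weights, target))
```

Because a candidate is replaced only when its score improves, the returned vector never scores worse than the random starting point. In a real system the fitness would come from evaluating the scaffold on tasks, which is far more expensive than this toy objective.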

Infrastructure Design

In addition to discussing the training paradigm, the report also delves into the infrastructure design required for deploying Sakana Fugu in production systems. This includes details about data processing pipelines, model serving architecture, distributed training setup, and monitoring systems. The authors emphasize that an effective infrastructure design is crucial for scaling up multi-agent systems like Fugu in real-world applications.

Conclusion

The Sakana Fugu Technical Report presents research on multi-agent systems built around adaptive scaffolding. The authors report that orchestrator models like Fugu can surpass the capabilities of individual LLM agents on a range of challenging tasks. Their work points toward new possibilities for collective intelligence and dynamic, query-adaptive agentic scaffolds. The authors express the hope that the report encourages further research into these directions as a path toward the next frontier of AI capabilities.

Created on 24 Jun. 2026


