Sakana Fugu Technical Report

AI-generated keywords: Sakana Fugu

AI-generated Key Points

  • Observes that frontier Large Language Models (LLMs) continue to advance, with different providers increasingly specializing in distinct domains
  • Introduces Sakana Fugu orchestrator models designed to enhance LLM agent teams' capabilities
  • Fugu models understand user queries and create adaptive scaffolds for problem-solving
  • Reports performance beyond any individual LLM agent, with state-of-the-art results compared to other publicly accessible models on SWE-Bench Pro, Terminal Bench, LiveCodeBench, GPQA-Diamond, Humanity's Last Exam, and CharXiv Reasoning
  • Unveils two models: Fugu for everyday use with balanced performance and latency, and Fugu-Ultra prioritizing answer quality on difficult problems
  • Training paradigm includes large-scale fine-tuning, evolutionary algorithms, and reinforcement learning approaches
  • Discusses infrastructure and core design principles transforming methods into a production system

Authors: Yujin Tang, Edoardo Cetin, Jinglue Xu, Qi Sun, Stefan Nielsen, Vincent Richard, Haruto Goda, Iaroslav Tymchenko, Nhan Nguyen, Hyunin Lee, Mari Ashiga, Shashank Kotyan, So Kuroki, Tarin Clanuwat

License: CC BY 4.0

Abstract: The capabilities of frontier Large Language Models (LLMs) continue to advance, with different providers increasingly specializing in distinct domains. This raises a natural next objective: how to combine the individual specializations of various LLMs into a collectively intelligent system. To this end, we report the development of Sakana Fugu, a family of orchestrator models that harness and amplify the capabilities of an LLM agent team. Fugu models are themselves language models trained to understand user queries and dynamically devise agentic scaffolds to solve them. Through these adaptive scaffolds, Fugu accesses performance beyond any individual LLM agent, achieving state-of-the-art results compared to other publicly accessible models across a range of challenging tasks, including SWE-Bench Pro, Terminal Bench, LiveCodeBench, GPQA-Diamond, Humanity's Last Exam, and CharXiv Reasoning. We release two models: Fugu, which balances performance with latency for everyday use, and Fugu-Ultra, which prioritizes answer quality on the hardest problems. We describe our training paradigm, which encompasses large-scale fine-tuning, evolutionary algorithms, and reinforcement learning approaches, along with the infrastructure and core design principles that turn these methods into a production system. We hope this report encourages further research into multi-agent systems and dynamic, query-adaptive agentic scaffolds as a path toward the next frontier of AI capabilities, accessed through collective intelligence.
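
To make the idea of a query-adaptive scaffold concrete, here is a minimal sketch in Python. It is an illustration only and does not reflect how Fugu is implemented: the abstract does not specify a scaffold format, so the names `Step`, `plan_scaffold`, `run_scaffold`, and the agent names `coder`, `reviewer`, and `generalist` are all hypothetical. In the report the planning role is played by a trained language model; here a simple keyword rule stands in for it, and plain functions stand in for the LLM agents.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM agent: a function from prompt text to reply text.
Agent = Callable[[str], str]


@dataclass
class Step:
    agent: str        # key into the agent team
    instruction: str  # how the input is framed for this agent


def plan_scaffold(query: str) -> List[Step]:
    """Toy planner: choose a sequence of agents based on the query.

    Assumption: a keyword rule replaces the trained orchestrator model.
    """
    lowered = query.lower()
    if "bug" in lowered or "code" in lowered:
        return [Step("coder", "Draft a fix"), Step("reviewer", "Check the draft")]
    return [Step("generalist", "Answer directly")]


def run_scaffold(query: str, team: Dict[str, Agent]) -> str:
    """Run the planned steps in order, feeding each output into the next step."""
    context = query
    for step in plan_scaffold(query):
        context = team[step.agent](f"{step.instruction}: {context}")
    return context


# Toy agent team: each "agent" just tags the prompt it received.
team: Dict[str, Agent] = {
    "coder": lambda prompt: f"[coder] {prompt}",
    "reviewer": lambda prompt: f"[reviewer] {prompt}",
    "generalist": lambda prompt: f"[generalist] {prompt}",
}

if __name__ == "__main__":
    print(run_scaffold("Fix this bug", team))
    print(run_scaffold("What is the capital of France?", team))
```

The point of the sketch is the contrast with a fixed pipeline: the sequence of agents is chosen per query rather than hard-coded, so a coding question passes through two agents while a general question goes to one.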

Submitted to arXiv on 19 Jun. 2026

Results of the summarizing process for the arXiv paper: 2606.21228v2

The Sakana Fugu Technical Report starts from the observation that frontier Large Language Models (LLMs) continue to advance while different providers increasingly specialize in distinct domains. It introduces Sakana Fugu, a family of orchestrator models designed to harness and amplify the capabilities of an LLM agent team. Fugu models are themselves language models, trained to understand user queries and dynamically devise agentic scaffolds to solve them. Through these adaptive scaffolds, Fugu reaches performance beyond any individual LLM agent and achieves state-of-the-art results compared to other publicly accessible models on challenging tasks including SWE-Bench Pro, Terminal Bench, LiveCodeBench, GPQA-Diamond, Humanity's Last Exam, and CharXiv Reasoning. The report releases two models: Fugu, which balances performance with latency for everyday use, and Fugu-Ultra, which prioritizes answer quality on the hardest problems. The training paradigm combines large-scale fine-tuning, evolutionary algorithms, and reinforcement learning approaches, and the report also describes the infrastructure and core design principles that turn these methods into a production system. The authors, Yujin Tang, Edoardo Cetin, Jinglue Xu, Qi Sun, Stefan Nielsen, Vincent Richard, Haruto Goda, Iaroslav Tymchenko, Nhan Nguyen, Hyunin Lee, Mari Ashiga, Shashank Kotyan, So Kuroki, and Tarin Clanuwat, hope the report encourages further research into multi-agent systems and dynamic, query-adaptive agentic scaffolds as a path toward the next frontier of AI capabilities, accessed through collective intelligence.
Created on 24 Jun. 2026
