Latent Collaboration in Multi-Agent Systems

AI-generated keywords: Multi-agent systems, Large language models, LatentMAS, Collaboration framework, Efficiency improvements

AI-generated Key Points

Significant progress in extending large language models (LLMs) to system-level intelligence through coordination in multi-agent systems (MAS)
Introduction of LatentMAS framework enabling direct collaboration among models within a continuous latent space
End-to-end training-free framework allowing LLM agents to engage in pure latent collaboration, enhancing seamless teamwork
Each agent in LatentMAS begins by generating auto-regressive latent thoughts from its last-layer hidden embeddings
Storage and sharing of latent representations in a common working memory for lossless information exchange between agents
Theoretical analyses showing higher expressiveness and information preservation with lower complexity compared to traditional text-based MAS approaches
Empirical evaluations across 9 benchmarks showing that LatentMAS outperforms single-model and text-based MAS baselines, with up to 14.6% higher accuracy and 70.8%-83.7% lower output token usage
Efficiency gains with 4x-4.3x faster end-to-end inference without additional training requirements
Availability of code and data for LatentMAS at https://github.com/Gen-Verse/LatentMAS for further exploration and development
Illustration in Figure 2 highlighting how LatentMAS enhances multi-agent information transfer at the final transformer layer compared to sequential and hierarchical MAS structures


Authors: Jiaru Zou, Xiyuan Yang, Ruizhong Qiu, Gaotang Li, Katherine Tieu, Pan Lu, Ke Shen, Hanghang Tong, Yejin Choi, Jingrui He, James Zou, Mengdi Wang, Ling Yang

arXiv: 2511.20639v1 (cs.CL)

Project: https://github.com/Gen-Verse/LatentMAS

License: CC BY 4.0

Abstract: Multi-agent systems (MAS) extend large language models (LLMs) from independent single-model reasoning to coordinative system-level intelligence. While existing LLM agents depend on text-based mediation for reasoning and communication, we take a step forward by enabling models to collaborate directly within the continuous latent space. We introduce LatentMAS, an end-to-end training-free framework that enables pure latent collaboration among LLM agents. In LatentMAS, each agent first performs auto-regressive latent thoughts generation through last-layer hidden embeddings. A shared latent working memory then preserves and transfers each agent's internal representations, ensuring lossless information exchange. We provide theoretical analyses establishing that LatentMAS attains higher expressiveness and lossless information preservation with substantially lower complexity than vanilla text-based MAS. In addition, empirical evaluations across 9 comprehensive benchmarks spanning math and science reasoning, commonsense understanding, and code generation show that LatentMAS consistently outperforms strong single-model and text-based MAS baselines, achieving up to 14.6% higher accuracy, reducing output token usage by 70.8%-83.7%, and providing 4x-4.3x faster end-to-end inference. These results demonstrate that our new latent collaboration framework enhances system-level reasoning quality while offering substantial efficiency gains without any additional training. Code and data are fully open-sourced at https://github.com/Gen-Verse/LatentMAS.
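To make the efficiency figures in the abstract concrete, the short calculation below applies the reported ranges to a made-up baseline. The baseline token count and latency are assumptions chosen only for illustration, not values from the paper, and pairing the ends of the two ranges is a simplification.

```python
# Hypothetical baseline for a text-based MAS (assumed values, not from the paper).
baseline_tokens = 10_000    # output tokens per task
baseline_seconds = 120.0    # end-to-end latency per task

# Ranges reported in the abstract.
token_reduction = (0.708, 0.837)   # 70.8%-83.7% fewer output tokens
speedup = (4.0, 4.3)               # 4x-4.3x faster end-to-end inference


def remaining_tokens(baseline: int, reduction: float) -> int:
    """Output tokens left after cutting `reduction` (a fraction) of the baseline."""
    return round(baseline * (1.0 - reduction))


def latency_after_speedup(baseline: float, factor: float) -> float:
    """Latency after an inference speedup of `factor` times."""
    return baseline / factor


tokens_range = [remaining_tokens(baseline_tokens, r) for r in token_reduction]
latency_range = [latency_after_speedup(baseline_seconds, s) for s in speedup]

print(tokens_range)                           # [2920, 1630]
print([round(t, 1) for t in latency_range])   # [30.0, 27.9]
```

Under these assumed baselines, the reported ranges would correspond to roughly 1,630 to 2,920 output tokens and about 28 to 30 seconds per task.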

Submitted to arXiv on 25 Nov. 2025


Results of the summarizing process for the arXiv paper: 2511.20639v1

Comprehensive Summary
Key points
Layman's Summary
Blog article

Multi-agent systems (MAS) extend large language models (LLMs) from single-model reasoning to coordinated, system-level intelligence. Existing LLM agents rely on text both to reason and to communicate with each other. LatentMAS takes a different approach: it lets models collaborate directly within a continuous latent space. The framework is end-to-end and training-free, so LLM agents can engage in pure latent collaboration without any additional training.

In LatentMAS, each agent first generates auto-regressive latent thoughts from its last-layer hidden embeddings. These latent representations are then stored in a shared latent working memory and passed between agents, which allows lossless information exchange. Theoretical analyses establish that LatentMAS attains higher expressiveness and lossless information preservation with substantially lower complexity than vanilla text-based MAS.

Empirical evaluations across 9 benchmarks covering math and science reasoning, commonsense understanding, and code generation show that LatentMAS consistently outperforms strong single-model and text-based MAS baselines. It achieves up to 14.6% higher accuracy, reduces output token usage by 70.8%-83.7%, and provides 4x-4.3x faster end-to-end inference. The latent collaboration framework therefore improves system-level reasoning quality and efficiency at the same time. The code and data for LatentMAS are openly available at https://github.com/Gen-Verse/LatentMAS.

Figure 2 illustrates the difference between sequential and hierarchical MAS structures and shows how LatentMAS handles multi-agent information transfer at the final transformer layer. For further inquiries or collaborations related to this research, contact Ling Yang at ly1988@pri.
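The two mechanisms described above, auto-regressive latent thoughts and a shared latent working memory, can be sketched with a toy model. This is a minimal illustration under stated assumptions, not the LatentMAS implementation: `ToyAgent`, `forward`, `latent_thoughts`, and `run_pipeline` are hypothetical names, the "model" is a small random matrix instead of a transformer, and the working memory is a plain list of vectors.

```python
import math
import random
from typing import List

Vector = List[float]
DIM = 4  # toy hidden size (assumption)


class ToyAgent:
    """Stand-in for an LLM agent; `forward` mimics a last-layer hidden state."""

    def __init__(self, seed: int) -> None:
        rng = random.Random(seed)
        # Fixed random DIM x DIM weights play the role of a frozen model.
        self.weights = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]

    def forward(self, context: List[Vector]) -> Vector:
        # Average the context vectors, then apply a fixed nonlinear map.
        mean = [sum(v[i] for v in context) / len(context) for i in range(DIM)]
        return [math.tanh(sum(w * x for w, x in zip(row, mean))) for row in self.weights]

    def latent_thoughts(self, context: List[Vector], steps: int) -> List[Vector]:
        """Auto-regressive latent generation: each hidden state is the next input."""
        context = list(context)  # copy so the caller's list is not mutated
        thoughts: List[Vector] = []
        for _ in range(steps):
            hidden = self.forward(context)
            thoughts.append(hidden)
            context.append(hidden)  # fed back as a vector, never decoded to text
        return thoughts


def run_pipeline(question: Vector, agents: List[ToyAgent], steps: int) -> List[Vector]:
    """Agents run in sequence, reading and extending one shared latent memory."""
    memory: List[Vector] = [question]
    for agent in agents:
        memory.extend(agent.latent_thoughts(memory, steps))
    return memory


memory = run_pipeline([0.5, -0.2, 0.1, 0.9], [ToyAgent(0), ToyAgent(1)], steps=3)
print(len(memory))  # 7: one question vector plus 2 agents * 3 latent steps
```

In this sketch the second agent sees every vector the first agent produced, unchanged, which is the sense in which the exchange is lossless. No token is sampled at any point, so nothing counts toward output token usage until a final answer is decoded, a step this toy omits.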


Summary

Researchers have made big progress in making smart computer systems that can work together. They created a new method called LatentMAS that lets models collaborate directly in a shared space. This method allows the models to work together without needing extra training. The models start by thinking on their own and then share their thoughts with each other in a special memory space. LatentMAS is shown to be more accurate and faster than older text-based methods, and its code is available for others to use.

Definitions

- Large Language Models (LLMs): Advanced computer programs that understand and generate human language.
- System-level intelligence: Smart computer systems capable of performing complex tasks.
- Multi-agent systems (MAS): Computer systems where multiple agents or entities work together towards a common goal.
- Collaboration: Working together towards a shared objective.
- Latent space: A mathematical space where data points are represented as vectors, often used in machine learning for modeling relationships between variables.
- End-to-end training-free framework: A system that works from start to finish without any additional training.
- Auto-regressive latent thoughts: Reasoning steps that a model produces one after another as internal vectors rather than words, each step building on the previous ones.
- Working memory: Temporary storage area where information is held briefly during processing tasks.
- Expressiveness: How much information or meaning a representation can convey.
- Information preservation: Maintaining the original content and meaning of data throughout processing.

Multi-agent systems (MAS) have been a popular area of research in artificial intelligence, with applications in domains such as robotics, economics, and the social sciences. These systems consist of multiple agents that interact with each other to achieve a common goal. With recent advances in large language models (LLMs), there has been significant progress in extending individual reasoning to system-level intelligence through coordination.

Traditional LLM agents rely on text-based communication for reasoning, which can be limiting in terms of expressiveness and efficiency. A new framework called LatentMAS instead enables direct collaboration among models within a continuous latent space. This end-to-end training-free framework allows LLM agents to engage in pure latent collaboration. So what exactly is LatentMAS, and how does it differ from traditional MAS approaches? Let's dive into the details.

The Framework

LatentMAS enables multi-agent systems to collaborate using latent representations instead of text-based communication. It consists of two main components: auto-regressive latent thoughts generation and a shared working memory. Each agent begins by generating auto-regressive latent thoughts from its last-layer hidden embeddings. These latent representations are then stored and shared in a common working memory, which allows lossless information exchange between agents.

This approach offers several advantages over traditional text-based MAS approaches:

1. Higher Expressiveness: Theoretical analyses show that, by communicating through continuous latent representations instead of discrete tokens, LatentMAS achieves higher expressiveness and information preservation with significantly lower complexity.
2. Efficiency Gains: The use of continuous latent representations also leads to substantial efficiency gains without the need for additional training. Empirical evaluations show that LatentMAS offers 4x-4.3x faster end-to-end inference than conventional text-based MAS approaches.
3. Better Performance: Empirical evaluations across 9 benchmarks covering math and science reasoning, commonsense understanding, and code generation show that LatentMAS consistently outperforms strong single-model and text-based MAS baselines. It achieves up to 14.6% higher accuracy while reducing output token usage by 70.8%-83.7%.

Figure 2: Sequential vs Hierarchical MAS structures (Source: https://github.com/Gen-Verse/LatentMAS)

Figure 2 illustrates the difference between sequential and hierarchical MAS structures and shows how LatentMAS handles multi-agent information transfer at the final transformer layer.

Availability

The code and data for LatentMAS are openly available at https://github.com/Gen-Verse/LatentMAS, allowing for further exploration and development.

Conclusion

LatentMAS enables direct collaboration among models within a continuous latent space. By combining auto-regressive latent thoughts generation with a shared working memory, it allows LLM agents to engage in pure latent collaboration without relying on text-based communication. Theoretical analyses indicate higher expressiveness, and empirical evaluations show efficiency gains and better accuracy compared to traditional text-based MAS approaches. Because the code and data are open source, others can build on this work. For further inquiries or collaborations related to this research, contact Ling Yang at ly1988@pri.
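The contrast between discrete tokens and continuous latent representations can be illustrated with a toy example. The sketch below is an assumption-laden illustration rather than anything from the LatentMAS code: the three-word vocabulary, its 2-D embeddings, and the function names are all hypothetical. It compares what a receiving agent gets when a hidden state is snapped to the nearest token versus when the vector itself is passed along.

```python
import math
from typing import Dict, List

Vector = List[float]

# Hypothetical 2-D "token embeddings"; a real vocabulary is far larger.
VOCAB: Dict[str, Vector] = {
    "yes": [1.0, 0.0],
    "no": [-1.0, 0.0],
    "maybe": [0.0, 1.0],
}


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def send_as_text(hidden: Vector) -> Vector:
    """Text channel: snap to the nearest token; the receiver sees only its embedding."""
    token = min(VOCAB, key=lambda t: distance(hidden, VOCAB[t]))
    return VOCAB[token]


def send_as_latent(hidden: Vector) -> Vector:
    """Latent channel: the receiver gets a copy of the vector itself."""
    return list(hidden)


hidden_state = [0.6, 0.5]
text_error = distance(hidden_state, send_as_text(hidden_state))
latent_error = distance(hidden_state, send_as_latent(hidden_state))

print(round(text_error, 3))  # 0.64: the nearest token is "yes", so the 0.5 component is lost
print(latent_error)          # 0.0: the vector arrives unchanged
```

The text channel collapses the hidden state onto one of three points, so the receiver cannot tell how close the sender was to "maybe". The latent channel avoids that rounding step entirely, which is the intuition behind the expressiveness and information-preservation claims.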

Created on 07 Dec. 2025



