Do LLMs Benefit From Their Own Words?

AI-generated keywords: Language Models Multi-turn Interactions Contextual Dependencies Dialogue Systems Response Quality

AI-generated Key Points

⚠The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

Authors: Jenny Y. Huang, Leshem Choshen, Ramon Astudillo, Tamara Broderick, Jacob Andreas
Study compares traditional full-context prompting vs user-turn-only prompting in multi-turn interactions
Removing prior assistant responses does not significantly affect response quality on many turns
Reduction in cumulative context lengths by up to 10 times observed
36.4% of multi-turn conversations contain self-contained prompts
User-turn-only prompting can outperform full context due to context pollution causing errors, hallucinations, or stylistic inconsistencies
Proposal of a context-filtering approach to enhance response quality and decrease memory consumption

Also access our AI generated: Comprehensive summary, Lay summary, Blog-like article; or ask questions about this paper to our AI assistant.

Authors: Jenny Y. Huang, Leshem Choshen, Ramon Astudillo, Tamara Broderick, Jacob Andreas

arXiv: 2602.24287v1 - DOI (cs.CL)

License: NONEXCLUSIVE-DISTRIB 1.0

Abstract: Multi-turn interactions with large language models typically retain the assistant's own past responses in the conversation history. In this work, we revisit this design choice by asking whether large language models benefit from conditioning on their own prior responses. Using in-the-wild, multi-turn conversations, we compare standard (full-context) prompting with a user-turn-only prompting approach that omits all previous assistant responses, across three open reasoning models and one state-of-the-art model. To our surprise, we find that removing prior assistant responses does not affect response quality on a large fraction of turns. Omitting assistant-side history can reduce cumulative context lengths by up to 10x. To explain this result, we find that multi-turn conversations consist of a substantial proportion (36.4%) of self-contained prompts, and that many follow-up prompts provide sufficient instruction to be answered using only the current user turn and prior user turns. When analyzing cases where user-turn-only prompting substantially outperforms full context, we identify instances of context pollution, in which models over-condition on their previous responses, introducing errors, hallucinations, or stylistic artifacts that propagate across turns. Motivated by these findings, we design a context-filtering approach that selectively omits assistant-side context. Our findings suggest that selectively omitting assistant history can improve response quality while reducing memory consumption.

Submitted to arXiv on 27 Feb. 2026

Ask questions about this paper to our AI assistant

You can also chat with multiple papers at once here.

⚠The license of the paper does not allow us to build upon its content and the AI assistant only knows about the paper metadata rather than the full article.

AI assistant instructions?

Results of the summarizing process for the arXiv paper: 2602.24287v1

⚠This paper's license doesn't allow us to build upon its content and the summarizing process is here made with the paper's metadata rather than the article.

Comprehensive Summary
Key points
Layman's Summary
Blog article

The impact of conditioning large language models on their own prior responses in multi-turn interactions is investigated by authors Jenny Y. Huang, Leshem Choshen, Ramon Astudillo, Tamara Broderick, and Jacob Andreas. The study compares the traditional full-context prompting approach with a user-turn-only prompting method that excludes previous assistant responses across various open reasoning models and a state-of-the-art model. Interestingly, the results show that removing prior assistant responses does not significantly affect response quality on many turns. This leads to a reduction in cumulative context lengths by up to 10 times. The analysis reveals that a considerable portion (36.4%) of multi-turn conversations contain self-contained prompts and many follow-up prompts can be answered using only the current user turn and previous user turns. In cases where the user-turn-only prompting outperforms full context, it is attributed to instances of context pollution where models overly rely on their past responses resulting in errors, hallucinations or stylistic inconsistencies propagating through subsequent turns. Based on these findings, the researchers propose a context-filtering approach that selectively omits assistant-side history to enhance response quality while decreasing memory consumption. This study highlights how omitting assistant history can improve performance in large language models during multi-turn interactions and emphasizes the importance of considering contextual dependencies in dialogue systems for more effective communication outcomes.

- Authors: Jenny Y. Huang, Leshem Choshen, Ramon Astudillo, Tamara Broderick, Jacob Andreas
- Study compares traditional full-context prompting vs user-turn-only prompting in multi-turn interactions
- Removing prior assistant responses does not significantly affect response quality on many turns
- Reduction in cumulative context lengths by up to 10 times observed
- 36.4% of multi-turn conversations contain self-contained prompts
- User-turn-only prompting can outperform full context due to context pollution causing errors, hallucinations, or stylistic inconsistencies
- Proposal of a context-filtering approach to enhance response quality and decrease memory consumption

Summary- The study looked at different ways of helping people talk to computers. - They found that sometimes it's better for the computer to only listen to what the person says, instead of remembering everything from before. - Taking away old computer responses didn't make a big difference in how good the new responses were most of the time. - Sometimes, making the computer remember less information can make it work faster and better. - They suggest a new way for computers to understand and respond better in conversations. Definitions- Authors: People who write books or do research. - Prompting: Giving hints or suggestions to help someone do something. - Interactions: When two things affect each other or work together.

In recent years, there has been a surge in the development of large language models (LLMs) that have shown impressive performance on various natural language processing tasks. These models are trained on massive amounts of text data and can generate human-like responses to prompts given by users. However, one major challenge in using LLMs for dialogue systems is their ability to maintain coherence and consistency across multiple turns of conversation. To address this issue, a team of researchers from Stanford University and Google Brain conducted a study titled "The impact of conditioning large language models on their own prior responses in multi-turn interactions". The paper was authored by Jenny Y. Huang, Leshem Choshen, Ramon Astudillo, Tamara Broderick, and Jacob Andreas. In this research paper, the authors investigate how conditioning LLMs on their own previous responses affects response quality in multi-turn interactions. Traditionally, LLMs are conditioned on full-context prompts which include both the user's current turn as well as all previous assistant responses. However, the researchers propose an alternative approach called user-turn-only prompting where only the current user turn is used as input to the model while excluding previous assistant responses. This method aims to reduce context length and memory consumption while still maintaining high-quality responses. To compare these two approaches, the researchers evaluated them across various open reasoning models and a state-of-the-art model called GPT-3. Surprisingly, they found that removing prior assistant responses did not significantly affect response quality on many turns. In fact, it led to a reduction in cumulative context lengths by up to 10 times. Upon further analysis of multi-turn conversations between humans and machines, the researchers discovered that a considerable portion (36.4%) contained self-contained prompts where no information from previous assistant responses was needed to generate an appropriate response. They also found that many follow-up prompts could be answered using only the current user turn and previous user turns. This suggests that LLMs have the ability to understand and maintain contextual dependencies without relying on previous assistant responses. In cases where the user-turn-only prompting outperformed full context, it was attributed to instances of "context pollution" where models overly rely on their past responses resulting in errors, hallucinations, or stylistic inconsistencies propagating through subsequent turns. This highlights the importance of considering contextual dependencies in dialogue systems for more effective communication outcomes. Based on these findings, the researchers propose a context-filtering approach that selectively omits assistant-side history to enhance response quality while decreasing memory consumption. This method aims to strike a balance between using relevant context information and avoiding potential errors caused by over-reliance on previous assistant responses. Overall, this study sheds light on how omitting assistant history can improve performance in large language models during multi-turn interactions. It also emphasizes the importance of considering contextual dependencies in dialogue systems for more effective communication outcomes. With further research and development, this approach could lead to more human-like and coherent conversations with virtual assistants and chatbots.

Created on 03 Mar. 2026

Assess the quality of the AI-generated content by voting

Score: 0

Similar papers summarized with our AI tools

65.2%

Artificial Impressions: Evaluating Large Language Model Behavior Through the Le…

cs.CL

62.5%

Learning When to Retrieve, What to Rewrite, and How to Respond in Conversatio…

cs.CL

61.8%

Lost in the Middle: How Language Models Use Long Contexts

cs.CL

61.8%

Self-Rewarding Language Models

cs.CL

61.8%

Are LLMs All You Need for Task-Oriented Dialogue?

cs.CL

61.6%

LLMs Get Lost In Multi-Turn Conversation

cs.CL

61.6%

Time Series Forecasting with LLMs: Understanding and Enhancing Model Capabiliti…

cs.CL

Navigate through even more similar papers through a

tree representation

Look for similar papers (in beta version)

By clicking on the button above, our algorithm will scan all papers in our database to find the closest based on the contents of the full papers and not just on metadata. Please note that it only works for papers that we have generated summaries for and you can rerun it from time to time to get a more accurate result while our database grows.

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.