Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?

AI-generated keywords: Context Files, Software Development, Coding Agents, AGENTBENCH, Task Performance

AI-generated Key Points

Context files are commonly used in software development to tailor coding agents to repositories either manually or automatically.
The researchers evaluated coding agents in two settings: established SWE-bench tasks with LLM-generated context files, and a novel collection of issues from repositories with developer-committed context files.
The study found that context files tended to decrease task success rates and increase inference cost by over 20% compared to providing no repository context.
Both LLM-generated and developer-provided context files encouraged broader exploration but ultimately made tasks harder due to unnecessary requirements.
The study introduced AGENTBENCH, a benchmark for assessing the impact of context files on coding agents' performance in real-world tasks.
Evaluation on AGENTBENCH and SWE-BENCH LITE revealed that LLM-generated context files generally decreased agent performance, while developer-written context files showed a slight improvement.
Context files led to more thorough testing and exploration by coding agents according to an in-depth analysis of agent traces.
Human-written context files should include only minimal requirements, so that tasks do not become unnecessarily hard for coding agents.


Authors: Thibaud Gloaguen, Niels Mündler, Mark Müller, Veselin Raychev, Martin Vechev

arXiv: 2602.11988v1 - DOI (cs.SE)

License: CC BY 4.0

Abstract: A widespread practice in software development is to tailor coding agents to repositories using context files, such as AGENTS.md, by either manually or automatically generating them. Although this practice is strongly encouraged by agent developers, there is currently no rigorous investigation into whether such context files are actually effective for real-world tasks. In this work, we study this question and evaluate coding agents' task completion performance in two complementary settings: established SWE-bench tasks from popular repositories, with LLM-generated context files following agent-developer recommendations, and a novel collection of issues from repositories containing developer-committed context files. Across multiple coding agents and LLMs, we find that context files tend to reduce task success rates compared to providing no repository context, while also increasing inference cost by over 20%. Behaviorally, both LLM-generated and developer-provided context files encourage broader exploration (e.g., more thorough testing and file traversal), and coding agents tend to respect their instructions. Ultimately, we conclude that unnecessary requirements from context files make tasks harder, and human-written context files should describe only minimal requirements.

Submitted to arXiv on 12 Feb. 2026


Results of the summarizing process for the arXiv paper: 2602.11988v1

Comprehensive Summary

Thibaud Gloaguen, Niels Mündler, Mark Müller, Veselin Raychev, and Martin Vechev investigate whether repository-level context files, such as AGENTS.md, actually help coding agents. Such files are widely used to tailor coding agents to a repository and are either written manually or generated automatically. Although agent developers strongly encourage the practice, its impact on real-world tasks had not been rigorously investigated.

The researchers evaluated agents in two settings: established SWE-bench tasks from popular repositories, paired with LLM-generated context files that follow agent-developer recommendations, and a novel collection of issues from repositories with developer-committed context files. The study found that context files tended to decrease task success rates compared to providing no repository context, while also increasing inference cost by over 20%. Both LLM-generated and developer-provided context files encouraged broader exploration by coding agents but ultimately made tasks harder by imposing unnecessary requirements.

The key contributions of the study include AGENTBENCH, a curated benchmark for assessing the impact of context files on coding agents' performance in real-world tasks. The evaluation covered different coding agents and models on AGENTBENCH and SWE-BENCH LITE. LLM-generated context files generally decreased agent performance, regardless of the model or prompt used to generate them, whereas developer-written context files showed a slight improvement. An in-depth analysis of agent traces showed that context files led to more thorough testing and exploration by coding agents.

The findings suggest that human-written context files should include only minimal requirements, so that tasks do not become unnecessarily challenging for coding agents. Overall, the research shows that the role of repository-level context files needs to be weighed carefully when optimizing coding agent performance in software development tasks.
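The comparison described above (task success rate and inference cost, with and without a context file) can be sketched in a few lines of Python. This is only an illustrative sketch: the `RunResult` record, its fields, and the sample numbers are assumptions made for the example and are not taken from the paper or its data.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """One agent run on one task (hypothetical record layout)."""
    task_id: str
    resolved: bool  # True if the agent's patch passed the task's tests
    cost: float     # assumed inference cost of the run, e.g. in USD


def success_rate(runs):
    """Fraction of runs in which the task was resolved."""
    return sum(r.resolved for r in runs) / len(runs)


def mean_cost(runs):
    """Average inference cost per run."""
    return sum(r.cost for r in runs) / len(runs)


def compare(baseline, with_context):
    """Compare runs with a context file against runs without one."""
    return {
        # Absolute change in success rate (negative means worse).
        "success_delta": success_rate(with_context) - success_rate(baseline),
        # Relative change in mean cost (0.25 means 25% more expensive).
        "cost_increase": mean_cost(with_context) / mean_cost(baseline) - 1.0,
    }


# Made-up illustrative data, NOT results from the paper.
baseline = [
    RunResult("t1", True, 1.0),
    RunResult("t2", True, 1.0),
    RunResult("t3", False, 1.0),
    RunResult("t4", True, 1.0),
]
with_context = [
    RunResult("t1", True, 1.2),
    RunResult("t2", False, 1.3),
    RunResult("t3", False, 1.2),
    RunResult("t4", True, 1.3),
]

summary = compare(baseline, with_context)
print(summary)
```

Reporting the cost change as a ratio rather than an absolute difference matches how the summary states it ("over 20%"), and keeps the figure comparable across models with different pricing.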


Layman's Summary

Summary: Context files are like special instructions for coding helpers in software development. Researchers tested different types of context files to see how they affect the coding helpers' performance. They found that context files can make tasks harder for the coding helpers by adding extra requirements. A new benchmark called AGENTBENCH was created to test how context files impact coding helpers in real tasks. It's important to keep context files simple so that coding helpers don't get confused.

Definitions:
- Context files: Special instructions used in software development to guide coding agents.
- Repository: A place where code and other project-related files are stored.
- Inference cost: The computing resources a model uses while working on a task.
- Benchmark: A standard or reference point used for comparison or evaluation.
- Coding agents: Programs or tools that assist with writing code.

Blog article

Introduction: Software development is a complex process that involves writing, testing, and debugging code to create functional and efficient programs. With the increasing demand for software in various industries, developers are constantly looking for ways to improve their productivity. In recent years, there has been growing interest in using coding agents to assist developers in their tasks. One practice that has received significant attention is the use of context files, such as AGENTS.md. Context files contain repository-specific information and instructions that are meant to tailor a coding agent's behavior to a particular project. While agent developers strongly encourage them, there has been a lack of rigorous investigation into their actual impact on real-world tasks. In the paper "Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?", Thibaud Gloaguen et al. investigate the effectiveness of context files in software development tasks. The authors conducted an evaluation in two settings: established SWE-bench tasks from popular repositories with LLM-generated context files based on agent-developer recommendations, and a novel collection of issues from repositories with developer-committed context files.

AGENTBENCH - A Curated Benchmark: To assess the impact of context files on coding agents' performance in real-world tasks, the authors introduced AGENTBENCH, a curated benchmark designed for this purpose. It is used alongside SWE-BENCH LITE, which supplies the established SWE-bench tasks from popular repositories; for those tasks, the context files were generated by an LLM.

Evaluation Methodology: To evaluate the impact of context files, the authors ran multiple coding agents and LLMs on both AGENTBENCH and SWE-BENCH LITE and compared the results against runs without any repository context. The results revealed that LLM-generated context files generally decreased agent performance, regardless of the model or prompt used to generate them. This was surprising, as these context files were based on recommendations from agent developers themselves. Developer-written context files, on the other hand, showed a slight improvement in agent performance.

Impact of Context Files: An in-depth analysis of agent traces demonstrated that context files led to more thorough testing and exploration by coding agents, and that agents tended to respect the instructions they were given. However, this also added unnecessary requirements to the task at hand, making it harder for the agents to complete it successfully and increasing inference cost by over 20%. Overall, providing no repository context tended to yield better task success rates than providing a context file. This suggests that human-written context files should include only minimal requirements to avoid making tasks unnecessarily challenging for coding agents.

Conclusion: This research shows that the role of repository-level context files deserves careful consideration when optimizing coding agent performance in software development tasks. While context files may encourage broader exploration by coding agents, they can also make tasks harder due to unnecessary requirements. AGENTBENCH provides a curated benchmark for future studies of the impact of context files on coding agents' performance. The findings call for further investigation into methods for generating relevant, minimal context files tailored to specific projects, and for evaluating context files carefully to ensure that they are not hindering the agents they are meant to help.
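The trace analysis mentioned in the blog article (more testing and more exploration when a context file is present) could be approximated by counting action types in an agent's command log. The sketch below is a rough illustration only: the trace format (a list of shell command strings), the tool lists, and the category names are all assumptions for this example, not the paper's methodology.

```python
from collections import Counter

# Hypothetical tool lists; a real analysis would need a richer taxonomy.
TEST_TOOLS = {"pytest", "unittest", "tox"}
EXPLORE_TOOLS = {"ls", "cat", "grep", "find"}


def categorize(command):
    """Assign a shell command to a coarse action category."""
    tokens = set(command.split())
    if tokens & TEST_TOOLS:
        return "test"
    if tokens & EXPLORE_TOOLS:
        return "explore"
    return "other"


def count_actions(trace):
    """Count how often each action category occurs in a trace."""
    return Counter(categorize(cmd) for cmd in trace)


# Made-up example trace, not taken from the paper.
trace = [
    "ls src",
    "grep -rn parse src",
    "python -m pytest tests",
    "sed -i s/a/b/ src/x.py",
]
counts = count_actions(trace)
print(counts)
```

Comparing such counts between runs with and without a context file would give a first indication of whether the file shifts the agent toward more testing or more file traversal.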

Created on 20 Feb. 2026


