Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?

AI-generated keywords: Context Files, Software Development, Coding Agents, AGENTBENCH, Task Performance

AI-generated Key Points

  • Context files such as AGENTS.md, written manually or generated automatically, are commonly used in software development to tailor coding agents to specific repositories.
  • The researchers conducted a comprehensive evaluation in two settings: established SWE-bench tasks paired with LLM-generated context files, and a novel collection of issues from repositories with developer-committed context files.
  • Compared to providing no repository context, context files tended to reduce task success rates and increased inference cost by over 20%.
  • Both LLM-generated and developer-provided context files encouraged broader exploration, but their unnecessary requirements ultimately made tasks harder.
  • The study introduced AGENTBENCH, a benchmark for assessing how context files affect coding agents' performance on real-world tasks.
  • Evaluation on AGENTBENCH and SWE-BENCH LITE showed that LLM-generated context files generally decreased agent performance, while developer-written context files yielded only a slight improvement.
  • An in-depth analysis of agent traces showed that context files led coding agents to test and explore more thoroughly.
  • Human-written context files should include only minimal requirements, so that they do not make tasks unnecessarily challenging for coding agents.

Authors: Thibaud Gloaguen, Niels Mündler, Mark Müller, Veselin Raychev, Martin Vechev

License: CC BY 4.0

Abstract: A widespread practice in software development is to tailor coding agents to repositories using context files, such as AGENTS.md, by either manually or automatically generating them. Although this practice is strongly encouraged by agent developers, there is currently no rigorous investigation into whether such context files are actually effective for real-world tasks. In this work, we study this question and evaluate coding agents' task completion performance in two complementary settings: established SWE-bench tasks from popular repositories, with LLM-generated context files following agent-developer recommendations, and a novel collection of issues from repositories containing developer-committed context files. Across multiple coding agents and LLMs, we find that context files tend to reduce task success rates compared to providing no repository context, while also increasing inference cost by over 20%. Behaviorally, both LLM-generated and developer-provided context files encourage broader exploration (e.g., more thorough testing and file traversal), and coding agents tend to respect their instructions. Ultimately, we conclude that unnecessary requirements from context files make tasks harder, and human-written context files should describe only minimal requirements.

Submitted to arXiv on 12 Feb. 2026
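
The comparison described in the abstract (task success rate and inference cost with context files, measured against a no-context baseline) can be illustrated with a minimal Python sketch. It assumes each agent run is recorded with the setting it ran under, whether it resolved the task, and its inference cost. The `Run` record, the setting labels, and the helper functions are hypothetical and are not the authors' evaluation code; the toy numbers are made up for illustration and are not results from the paper.

```python
from dataclasses import dataclass
from statistics import mean


# Hypothetical record of one agent run on one task under one context setting.
@dataclass
class Run:
    task_id: str
    setting: str     # hypothetical labels: "none", "llm_generated", "developer"
    resolved: bool   # whether the agent's patch resolved the task
    cost_usd: float  # inference cost of the run


def success_rate(runs, setting):
    """Fraction of runs under `setting` that resolved their task."""
    selected = [r for r in runs if r.setting == setting]
    return sum(r.resolved for r in selected) / len(selected)


def relative_cost_increase(runs, setting, baseline="none"):
    """Mean cost under `setting` relative to the baseline; 0.2 means +20%."""
    base = mean(r.cost_usd for r in runs if r.setting == baseline)
    other = mean(r.cost_usd for r in runs if r.setting == setting)
    return (other - base) / base


def compare(runs, settings=("llm_generated", "developer"), baseline="none"):
    """Success-rate delta and relative cost increase of each setting vs. the baseline."""
    base_sr = success_rate(runs, baseline)
    return {
        s: {
            "success_delta": success_rate(runs, s) - base_sr,
            "cost_increase": relative_cost_increase(runs, s, baseline),
        }
        for s in settings
    }


if __name__ == "__main__":
    # Made-up toy data: two tasks, each run without context and with an LLM-generated file.
    toy_runs = [
        Run("t1", "none", True, 1.00),
        Run("t2", "none", False, 1.00),
        Run("t1", "llm_generated", False, 1.25),
        Run("t2", "llm_generated", False, 1.25),
    ]
    # success_delta -0.5, cost_increase 0.25 for the toy data
    print(compare(toy_runs, settings=("llm_generated",)))
```

Comparing every setting against the same no-context baseline on the same task set keeps the success and cost deltas directly comparable across context-file sources.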

AI-generated summary of arXiv paper 2602.11988v1

In this paper, Thibaud Gloaguen, Niels Mündler, Mark Müller, Veselin Raychev, and Martin Vechev investigate whether context files actually help coding agents in software development. Context files are commonly written manually or generated automatically to tailor coding agents to a repository. Although agent developers encourage this practice, its impact on real-world tasks had not been rigorously investigated.

The researchers evaluated coding agents in two settings: established SWE-bench tasks from popular repositories, paired with LLM-generated context files that follow agent-developer recommendations, and a novel collection of issues from repositories that contain developer-committed context files. Surprisingly, context files tended to reduce task success rates compared to providing no repository context, while increasing inference cost by over 20%. Both LLM-generated and developer-provided context files encouraged broader exploration by coding agents, but their unnecessary requirements ultimately made tasks harder.

A key contribution of the study is AGENTBENCH, a curated benchmark for assessing how context files affect coding agents' performance on real-world tasks. Evaluating several coding agents and models on AGENTBENCH and SWE-BENCH LITE showed that LLM-generated context files generally decreased agent performance across the various models and prompts used to generate them, whereas developer-written context files yielded a slight improvement. An in-depth analysis of agent traces further showed that context files led coding agents to test and explore more thoroughly. The findings suggest that human-written context files should include only minimal requirements, so that they do not make tasks unnecessarily challenging for coding agents.

Overall, the research indicates that repository-level context files do not automatically improve coding agent performance and should be written with care, limiting their instructions to what tasks genuinely require.
Created on 20 Feb. 2026