SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

AI-generated keywords: Software Engineering

AI-generated Key Points

  • Introduction of SWE-agent as an autonomous system utilizing a language model for computer interaction in software engineering tasks
  • Implementation of a custom-built agent-computer interface (ACI) enhancing capabilities such as code file creation, modification, repository navigation, and program execution
  • Resolution of 12.5% of issues on SWE-bench, compared with the previous best of 3.8% achieved with retrieval-augmented generation (RAG)
  • Impact of ACI design on agent behavior and performance, providing insights on effective design strategies
  • Evolution of software engineering benchmarks to evaluate language model performance through diverse challenges like translating problems into different languages, incorporating libraries, and enhancing test coverage
  • Use of software engineering as an evaluation domain for language models by integrating real-world SE subtasks like automated program repair, bug localization, and testing
  • Leveraging the SWE-bench dataset for evaluation with rigorous automatic execution-based methods showcasing the efficacy of the approach
  • Potential of language model agents such as SWE-agent in addressing complex software engineering challenges and the importance of thoughtful ACI design for optimizing agent performance

Authors: John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, Ofir Press

First two authors contributed equally. Code and demo at https://swe-agent.com
License: CC BY 4.0

Abstract: Software engineering is a challenging task requiring proficiency in both code generation and interacting with computers. In this paper, we introduce SWE-agent, an autonomous system that uses a language model to interact with a computer to solve software engineering tasks. We show that a custom-built agent-computer interface (ACI) greatly enhances the ability of an agent to create and edit code files, navigate entire repositories and execute programs. On SWE-bench, SWE-agent is able to solve 12.5% of issues, compared to the previous best of 3.8% achieved with retrieval-augmented generation (RAG). We explore how ACI design impacts an agent's behavior and performance, and provide insights on effective design.

Submitted to arXiv on 06 May 2024


Results of the summarizing process for the arXiv paper: 2405.15793v1

Software engineering is a challenging task that requires proficiency in both code generation and interaction with computers. In this paper, we introduce SWE-agent, an autonomous system that uses a language model to interact with a computer in order to solve software engineering tasks. We show that a custom-built agent-computer interface (ACI) greatly enhances the agent's ability to create and edit code files, navigate entire repositories, and execute programs. On SWE-bench, SWE-agent resolves 12.5% of issues, compared with the previous best of 3.8% achieved with retrieval-augmented generation (RAG). We examine how ACI design affects the agent's behavior and performance, and offer insights on effective design.
We also review related work on software engineering benchmarks that evaluate language models on code generation. These benchmarks have evolved to cover diverse challenges such as translating problems into different programming languages, incorporating third-party libraries, and improving test coverage. Recent efforts have also used software engineering as an evaluation testbed for language models by combining real-world SE subtasks such as automated program repair, bug localization, and testing within a unified task formulation. SWE-bench comprises task instances drawn from various GitHub repositories and is scored with automatic, execution-based evaluation.
We run experiments on both the full SWE-bench test set and SWE-bench Lite, a subset focused on functional bug fixes. In conclusion, our work points to the potential of language models acting as agents on complex software engineering challenges and emphasizes the importance of thoughtful ACI design for agent performance, particularly in repository-level code editing tasks.
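To make the idea of an agent-computer interface more concrete, the following is a minimal sketch of what such an interface could look like: a small set of commands for creating files, editing line ranges, searching a repository, and running a program. The class name ToyACI, its method names, and the restriction to .py files are all hypothetical choices made for illustration; they are not SWE-agent's actual commands, which the summary above does not specify.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


class ToyACI:
    """Hypothetical toy interface; NOT the command set used by SWE-agent."""

    def __init__(self, repo_root):
        self.root = Path(repo_root)

    def create(self, rel_path, text=""):
        """Create a file (and any missing parent directories)."""
        path = self.root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)
        return f"created {rel_path}"

    def edit(self, rel_path, start, end, replacement):
        """Replace 1-indexed, inclusive lines start..end with replacement."""
        path = self.root / rel_path
        lines = path.read_text().splitlines()
        lines[start - 1:end] = replacement.splitlines()
        path.write_text("\n".join(lines) + "\n")
        return f"edited {rel_path} lines {start}-{end}"

    def search(self, needle):
        """Return 'file:line' locations of needle in the repo's .py files."""
        hits = []
        for path in sorted(self.root.rglob("*.py")):
            for n, line in enumerate(path.read_text().splitlines(), start=1):
                if needle in line:
                    hits.append(f"{path.relative_to(self.root)}:{n}")
        return hits

    def run(self, rel_path):
        """Run a Python file inside the repo and return its combined output."""
        result = subprocess.run(
            [sys.executable, rel_path],
            cwd=self.root,
            capture_output=True,
            text=True,
            timeout=30,
        )
        return (result.stdout + result.stderr).strip()


def demo():
    """Simulate one locate-edit-run cycle on a throwaway repository."""
    with tempfile.TemporaryDirectory() as tmp:
        aci = ToyACI(tmp)
        aci.create("app.py", "print('helo')\n")
        hits = aci.search("helo")            # locate the typo
        aci.edit("app.py", 1, 1, "print('hello')")
        return hits, aci.run("app.py")       # re-run to check the fix
```

Each command returns a short text observation, which is the kind of feedback a language model could read before choosing its next action. A real interface would also need error handling and guardrails, which this sketch omits.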
Created on 04 Sep. 2024
