SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills

AI-generated keywords: Self-improvement, Autonomous web agents, Skill-centric framework, Procedural knowledge abstraction, Transferable skills

AI-generated Key Points

  • Humans have evolved self-improvement mechanisms through environment exploration, hierarchical abstraction of experiences, and collaborative construction of skill repertoires
  • Autonomous web agents struggle with procedural knowledge abstraction, refining skills, and skill composition
  • SkillWeaver framework enables agents to synthesize reusable skills as APIs through three stages:
      • Skill Proposal: Identifying novel skills based on observations and available APIs
      • Skill Synthesis: Generating successful trajectories from proposed skills to synthesize APIs
      • Skill Honing: Testing synthesized APIs with automatically generated test cases for robustness
  • Experiments show significant success rate improvements on WebArena and real-world websites using SkillWeaver
  • Stronger agents can enhance weaker ones through transferable skills synthesized by SkillWeaver
  • SkillWeaver enables autonomous self-improvement in navigating complex online environments by building conceptual maps, accumulating procedural knowledge as reusable skills, composing simple skills into complex routines, and enhancing decision-making processes without extensive training data or external supervision
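The idea of storing skills as reusable APIs and composing simple skills into complex routines can be illustrated with a minimal sketch. Everything here is hypothetical and chosen for illustration only: the `FakeSite` class stands in for a real browser session, and the skill names (`search_product`, `add_first_result_to_cart`, `buy`) are not taken from the paper.

```python
from typing import Callable, Dict, List


class FakeSite:
    """Hypothetical stand-in for a website session.

    It records actions in a log instead of driving a real browser.
    """

    def __init__(self) -> None:
        self.log: List[str] = []

    def act(self, description: str) -> None:
        self.log.append(description)


# Hypothetical skill library: maps a skill name to a callable API.
SKILLS: Dict[str, Callable[..., None]] = {}


def skill(fn: Callable[..., None]) -> Callable[..., None]:
    """Register a function in the skill library under its own name."""
    SKILLS[fn.__name__] = fn
    return fn


@skill
def search_product(site: FakeSite, query: str) -> None:
    """Simple skill: run a product search."""
    site.act(f"type '{query}' into search box")
    site.act("click search button")


@skill
def add_first_result_to_cart(site: FakeSite) -> None:
    """Simple skill: put the first search result in the cart."""
    site.act("click first result")
    site.act("click add-to-cart")


@skill
def buy(site: FakeSite, query: str) -> None:
    """Composite skill built by calling two simpler skills from the library."""
    SKILLS["search_product"](site, query)
    SKILLS["add_first_result_to_cart"](site)
```

Calling `SKILLS["buy"](FakeSite(), "lamp")` replays the four recorded actions of the two simpler skills in order. Because the library is just a mapping from names to callables, it could in principle be handed to a different agent unchanged, which is the sense of "transferable" used above.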

Authors: Boyuan Zheng, Michael Y. Fatemi, Xiaolong Jin, Zora Zhiruo Wang, Apurva Gandhi, Yueqi Song, Yu Gu, Jayanth Srinivasa, Gaowen Liu, Graham Neubig, Yu Su

License: CC BY 4.0

Abstract: To survive and thrive in complex environments, humans have evolved sophisticated self-improvement mechanisms through environment exploration, hierarchical abstraction of experiences into reusable skills, and collaborative construction of an ever-growing skill repertoire. Despite recent advancements, autonomous web agents still lack crucial self-improvement capabilities, struggling with procedural knowledge abstraction, refining skills, and skill composition. In this work, we introduce SkillWeaver, a skill-centric framework enabling agents to self-improve by autonomously synthesizing reusable skills as APIs. Given a new website, the agent autonomously discovers skills, executes them for practice, and distills practice experiences into robust APIs. Iterative exploration continually expands a library of lightweight, plug-and-play APIs, significantly enhancing the agent's capabilities. Experiments on WebArena and real-world websites demonstrate the efficacy of SkillWeaver, achieving relative success rate improvements of 31.8% and 39.8%, respectively. Additionally, APIs synthesized by strong agents substantially enhance weaker agents through transferable skills, yielding improvements of up to 54.3% on WebArena. These results demonstrate the effectiveness of honing diverse website interactions into APIs, which can be seamlessly shared among various web agents.
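The figures in the abstract are relative improvements, meaning the gain is expressed as a fraction of the baseline success rate rather than as a difference in percentage points. A minimal sketch of that arithmetic follows; the function name and the input numbers are hypothetical and are not results from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the relative gain of `improved` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline success rate must be positive")
    return (improved - baseline) / baseline * 100.0


# Hypothetical success rates (in percent), chosen only to show the arithmetic:
# going from 20.0 to 25.0 is a gain of 5 percentage points,
# but a relative improvement of 25 percent.
example = relative_improvement(20.0, 25.0)
```

The distinction matters when reading the reported numbers: the same relative improvement corresponds to a smaller absolute gain when the baseline success rate is low.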

Submitted to arXiv on 09 Apr. 2025


Results of the summarizing process for the arXiv paper: 2504.07079v1

In complex environments, humans have evolved sophisticated self-improvement mechanisms through environment exploration, hierarchical abstraction of experiences into reusable skills, and collaborative construction of a skill repertoire. However, autonomous web agents still struggle with procedural knowledge abstraction, refining skills, and skill composition. To address this challenge, SkillWeaver is introduced as a skill-centric framework enabling agents to autonomously synthesize reusable skills as APIs.

The framework consists of three stages:

1. Skill Proposal: The agent identifies novel skills based on observations and available APIs in the skill library.
2. Skill Synthesis: Successful trajectories generated by executing proposed skills are used to synthesize APIs.
3. Skill Honing: Synthesized APIs undergo testing with automatically generated test cases for robustness.

Through iterative exploration and practice on new websites, the agent builds a library of lightweight, plug-and-play APIs that enhance its capabilities significantly. Experiments on WebArena and real-world websites demonstrate the efficacy of SkillWeaver, with relative success rate improvements of 31.8% and 39.8%, respectively. Stronger agents can also enhance weaker ones through transferable skills synthesized by SkillWeaver, resulting in improvements of up to 54.3% on WebArena.

By honing diverse website interactions into APIs that can be seamlessly shared among web agents, SkillWeaver showcases the effectiveness of autonomous self-improvement in navigating complex online environments. This framework enables agents to build conceptual maps of website environments, accumulate procedural knowledge as reusable skills, compose simple skills into complex routines, and enhance decision-making processes through learned skills without the need for extensive training data or external supervision.
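The propose, synthesize, and hone stages described in this summary can be sketched as a simple loop. This is an illustrative sketch under stated assumptions, not the authors' implementation: every function name is hypothetical, the "practice" step is a stub that always succeeds, and the generated test cases are fixed strings rather than model-generated inputs.

```python
from typing import Callable, Dict, List, Optional

Trajectory = List[str]
Api = Callable[[str], str]


def propose_skill(observation: List[str], library: Dict[str, Api]) -> Optional[str]:
    """Stage 1 (assumed form): pick an observed skill not yet in the library."""
    for candidate in observation:
        if candidate not in library:
            return candidate
    return None


def practice(skill_name: str) -> Optional[Trajectory]:
    """Attempt the skill and return its trajectory. Stub: always succeeds."""
    return [f"open page for {skill_name}", f"perform {skill_name}"]


def synthesize_api(skill_name: str, trajectory: Trajectory) -> Api:
    """Stage 2 (assumed form): wrap a successful trajectory into a callable."""
    def api(arg: str) -> str:
        if not arg:
            raise ValueError("empty argument")
        return f"{skill_name}({arg}) via {len(trajectory)} steps"
    return api


def hone(api: Api, test_args: List[str]) -> bool:
    """Stage 3 (assumed form): keep the API only if every test case runs."""
    for arg in test_args:
        try:
            api(arg)
        except Exception:
            return False
    return True


def explore(observation: List[str], iterations: int) -> Dict[str, Api]:
    """Run the three stages repeatedly, growing the skill library."""
    library: Dict[str, Api] = {}
    for _ in range(iterations):
        name = propose_skill(observation, library)
        if name is None:
            break  # nothing new left to propose
        trajectory = practice(name)
        if trajectory is None:
            continue  # practice failed; no API is synthesized
        api = synthesize_api(name, trajectory)
        if hone(api, ["example-a", "example-b"]):
            library[name] = api
    return library
```

For instance, `explore(["search", "login", "checkout"], 5)` returns a library with one callable per observed skill. The design point the sketch tries to capture is that only skills that both succeed in practice and pass the honing tests are added to the library.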
Created on 10 Apr. 2025
