Reflexion: an autonomous agent with dynamic memory and self-reflection

AI-generated keywords: Reflexion, LLM, Self-reflection, Decision-Making, HotPotQA

AI-generated Key Points

  • Recent advancements in large language model (LLM) agents have shown remarkable performance across various benchmarks.
  • A team of researchers proposed Reflexion to address the limitation of LLM agents lacking certain qualities inherent to human decision-making processes, such as the ability to learn from mistakes through self-reflection.
  • Reflexion endows an agent with dynamic memory and self-reflection capabilities to enhance its existing reasoning trace and task-specific action choice abilities.
  • To achieve full automation, the team introduced a heuristic that enables the agent to pinpoint hallucination instances, avoid repetition in action sequences, and, in some environments, construct an internal memory map of the given environment.
  • The team evaluated their approach by assessing the agent's ability to complete decision-making tasks in AlfWorld environments and knowledge-intensive search-based question-and-answer tasks in HotPotQA environments. They observed success rates of 97% and 51%, respectively.
  • In AlfWorld environments, the ReAct-based agent solved 130 of the 134 given tasks (97%) within 12 trials, failing only four.
  • In HotPotQA environments equipped with a Wikipedia search engine, the agent had to perform relevant searches across multiple documents before giving an answer from the retrieved context; answers were scored by exact match (EM).
  • Reflexion demonstrates the emergent property of self-reflection in an agent's decision-making process, which could lead to more efficient problem solving through trial and error.
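
The heuristic mentioned in the key points can be pictured with a small Python sketch. This is an illustration only, not the authors' implementation: the function name `should_reflect`, the two thresholds, and the rule itself (treat a run of identical action/observation pairs as a stuck or hallucinating agent, and cap the number of steps per trial) are assumptions made here for the example.

```python
from typing import List, Tuple

# Hypothetical thresholds, chosen only for illustration.
MAX_REPEATS = 3   # identical (action, observation) pairs in a row
MAX_STEPS = 30    # cap on the number of actions in one trial


def should_reflect(
    history: List[Tuple[str, str]],
    max_repeats: int = MAX_REPEATS,
    max_steps: int = MAX_STEPS,
) -> bool:
    """Return True if the trajectory looks stuck or has run too long.

    `history` is the list of (action, observation) pairs seen so far
    in the current trial, oldest first.
    """
    # Rule 1 (assumed): the trial has used up its step budget.
    if len(history) >= max_steps:
        return True
    # Rule 2 (assumed): the last `max_repeats` steps are all identical,
    # i.e. the agent keeps issuing an action that changes nothing.
    if len(history) >= max_repeats:
        tail = history[-max_repeats:]
        if all(step == tail[0] for step in tail):
            return True
    return False
```

A caller would check `should_reflect(history)` after every step and, when it returns True, end the trial and ask the model to reflect on what went wrong.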

Authors: Noah Shinn, Beck Labash, Ashwin Gopinath

License: CC BY 4.0

Abstract: Recent advancements in decision-making large language model (LLM) agents have demonstrated impressive performance across various benchmarks. However, these state-of-the-art approaches typically necessitate internal model fine-tuning, external model fine-tuning, or policy optimization over a defined state space. Implementing these methods can prove challenging due to the scarcity of high-quality training data or the lack of well-defined state space. Moreover, these agents do not possess certain qualities inherent to human decision-making processes, specifically the ability to learn from mistakes. Self-reflection allows humans to efficiently solve novel problems through a process of trial and error. Building on recent research, we propose Reflexion, an approach that endows an agent with dynamic memory and self-reflection capabilities to enhance its existing reasoning trace and task-specific action choice abilities. To achieve full automation, we introduce a straightforward yet effective heuristic that enables the agent to pinpoint hallucination instances, avoid repetition in action sequences, and, in some environments, construct an internal memory map of the given environment. To assess our approach, we evaluate the agent's ability to complete decision-making tasks in AlfWorld environments and knowledge-intensive, search-based question-and-answer tasks in HotPotQA environments. We observe success rates of 97% and 51%, respectively, and provide a discussion on the emergent property of self-reflection.

Submitted to arXiv on 20 Mar. 2023


Results of the summarizing process for the arXiv paper: 2303.11366v1

Recent advancements in large language model (LLM) agents have shown remarkable performance across various benchmarks. However, these agents lack certain qualities inherent to human decision-making, such as the ability to learn from mistakes through self-reflection. To address this limitation, a team of researchers proposed Reflexion, an approach that endows an agent with dynamic memory and self-reflection capabilities to enhance its existing reasoning trace and task-specific action choice abilities. To achieve full automation, the team introduced a heuristic that enables the agent to pinpoint hallucination instances, avoid repetition in action sequences, and, in some environments, construct an internal memory map of the given environment.

The team evaluated their approach by assessing the agent's ability to complete decision-making tasks in AlfWorld environments and knowledge-intensive, search-based question-and-answer tasks in HotPotQA environments. They observed success rates of 97% and 51%, respectively. Notably, they attributed the improved accuracy to learning from self-reflection rather than to repeated retries alone. In AlfWorld environments, the ReAct-based agent solved 130 of the 134 given tasks (97%) within 12 trials, failing only four. In HotPotQA environments equipped with a Wikipedia search engine, the agent had to perform relevant searches across multiple documents before giving an answer from the retrieved context; answers were scored by exact match (EM).

Recent works that aim to give natural language agents reflective-like qualities have shown impressive performance: they can explain mistakes in sub-tasks within a trial, process next decisions in closed-loop feedback environments using inner monologue, or use self-generated solutions for LLM fine-tuning. However, they rely on immediate failure detection for sub-tasks and cannot explain mistakes that may have developed over a long range of actions and sub-tasks.

Overall, Reflexion demonstrates the emergent property of self-reflection in an agent's decision-making process, which could lead to more efficient problem solving through trial and error.
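
The trial-and-error loop described in this summary can be sketched roughly as follows. It is a minimal illustration, not the paper's code: `run_with_reflection`, `attempt`, `reflect`, and `memory_size` are hypothetical names, the toy task stands in for a real environment and LLM, and only the 12-trial budget comes from the summary.

```python
from typing import Callable, List, Tuple


def run_with_reflection(
    attempt: Callable[[List[str]], Tuple[bool, str]],
    reflect: Callable[[str], str],
    max_trials: int = 12,
    memory_size: int = 3,  # assumed cap on stored reflections
) -> Tuple[bool, int, List[str]]:
    """Retry a task, carrying self-reflections from failed trials forward.

    `attempt(memory)` runs one trial and returns (success, trajectory).
    `reflect(trajectory)` turns a failed trajectory into a short lesson.
    Returns (solved, trials_used, final_memory).
    """
    memory: List[str] = []
    for trial in range(1, max_trials + 1):
        success, trajectory = attempt(memory)
        if success:
            return True, trial, memory
        # Store the lesson from this failure and keep only the newest ones.
        memory.append(reflect(trajectory))
        memory = memory[-memory_size:]
    return False, max_trials, memory


# Toy stand-ins for the environment and the LLM (purely illustrative).
def toy_attempt(memory: List[str]) -> Tuple[bool, str]:
    if any("drawer" in note for note in memory):
        return True, "opened the drawer and found the key"
    return False, "searched the desk and found nothing"


def toy_reflect(trajectory: str) -> str:
    return "The desk was empty; try the drawer next."


result = run_with_reflection(toy_attempt, toy_reflect)
```

In this toy run the first trial fails, the stored reflection mentions the drawer, and the second trial succeeds because the attempt can read that note. The difference from a plain retry is that each new trial is conditioned on what earlier failures taught.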
Created on 27 Mar. 2023

