Gorilla: Large Language Model Connected with Massive APIs

AI-generated keywords: Large Language Models (LLMs), Gorilla, APIBench, HuggingFace, TorchHub

AI-generated Key Points

  • Large Language Models (LLMs) have advanced significantly on tasks such as mathematical reasoning and program synthesis
  • LLMs struggle with generating accurate input arguments and often hallucinate incorrect API call usage
  • Gorilla is a finetuned LLaMA-based model that outperforms GPT-4 in writing API calls
  • Gorilla demonstrates strong adaptability to test-time document changes when combined with a document retriever
  • Evaluation of Gorilla's performance is done using the APIBench dataset consisting of HuggingFace, TorchHub, and TensorHub APIs
  • Successful integration of the retrieval system with Gorilla improves the reliability and applicability of LLM outputs
  • Availability of Gorilla's code, model, data, and demo at https://gorilla.cs.berkeley.edu facilitates its adoption
  • Empowering LLMs to use tools through APIs enhances their capabilities and interaction with tools in various domains
  • The proposed pipeline for finetuning LLMs to call APIs surpasses GPT-4's performance on the APIBench dataset

Authors: Shishir G. Patil, Tianjun Zhang, Xin Wang, Joseph E. Gonzalez

License: CC BY 4.0

Abstract: Large Language Models (LLMs) have seen an impressive wave of advances recently, with models now excelling in a variety of tasks, such as mathematical reasoning and program synthesis. However, their potential to effectively use tools via API calls remains unfulfilled. This is a challenging task even for today's state-of-the-art LLMs such as GPT-4, largely due to their inability to generate accurate input arguments and their tendency to hallucinate the wrong usage of an API call. We release Gorilla, a finetuned LLaMA-based model that surpasses the performance of GPT-4 on writing API calls. When combined with a document retriever, Gorilla demonstrates a strong capability to adapt to test-time document changes, enabling flexible user updates or version changes. It also substantially mitigates the issue of hallucination, commonly encountered when prompting LLMs directly. To evaluate the model's ability, we introduce APIBench, a comprehensive dataset consisting of HuggingFace, TorchHub, and TensorHub APIs. The successful integration of the retrieval system with Gorilla demonstrates the potential for LLMs to use tools more accurately, keep up with frequently updated documentation, and consequently increase the reliability and applicability of their outputs. Gorilla's code, model, data, and demo are available at https://gorilla.cs.berkeley.edu

Submitted to arXiv on 24 May 2023

Results of the summarizing process for the arXiv paper: 2305.15334v1

Large Language Models (LLMs) have advanced significantly on tasks such as mathematical reasoning and program synthesis. However, using tools effectively via API calls remains a challenge: even state-of-the-art LLMs like GPT-4 struggle to generate accurate input arguments and often hallucinate incorrect API usage. To address this, the researchers introduce Gorilla, a finetuned LLaMA-based model that outperforms GPT-4 at writing API calls.

When combined with a document retriever, Gorilla adapts well to test-time document changes. This enables flexible user updates or version changes and substantially mitigates the hallucination commonly encountered when prompting LLMs directly. To evaluate Gorilla's performance, the researchers introduce APIBench, a comprehensive dataset consisting of HuggingFace, TorchHub, and TensorHub APIs. The successful integration of the retrieval system with Gorilla shows the potential for LLMs to use tools more accurately, keep up with frequently updated documentation, and improve the reliability and applicability of their outputs. Gorilla's code, model, data, and demo are available at https://gorilla.cs.berkeley.edu, which makes the work easier to adopt.

The paper also highlights the importance of enabling LLMs to use tools, both to access larger and changing knowledge bases and to accomplish complex computational tasks. With plugins that let LLMs invoke external tools through APIs, they can become primary interfaces to computing infrastructure and the web. Techniques like Gorilla improve an LLM's ability to identify the appropriate API for a given task, and correct API usage in turn improves how the LLM interacts with tools across domains. The proposed pipeline for finetuning LLMs to call APIs surpasses GPT-4's performance on APIBench, the API dataset assembled by the researchers.
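
To make the idea of a hallucinated API call concrete, one simple check is to parse the generated code and test whether any function it calls appears in a list of known APIs. The sketch below does this with Python's standard `ast` module. The `KNOWN_APIS` set and the helper names `dotted_name`, `called_names`, and `is_hallucinated` are hypothetical, chosen for illustration only, and this is not necessarily the evaluation procedure used in the paper.

```python
import ast

# Hypothetical reference set of valid API entry points (illustrative only).
KNOWN_APIS = {"torch.hub.load", "transformers.pipeline"}


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def called_names(code):
    """Collect the dotted name of every function call found in `code`."""
    names = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name is not None:
                names.append(name)
    return names


def is_hallucinated(code):
    """Flag generated code that contains no call to a known API."""
    return not any(name in KNOWN_APIS for name in called_names(code))


good = "model = torch.hub.load('pytorch/vision', 'resnet18')"
bad = "model = torch.hub.fetch_model('resnet18')"  # made-up function name
print(is_hallucinated(good), is_hallucinated(bad))
```

A check like this only verifies that the called function exists in the reference set; judging whether the arguments are accurate would need a comparison against the documented signature as well.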
Overall, the work suggests that finetuning combined with retrieval can make LLMs more dependable interfaces to large, frequently updated collections of APIs, with outputs that stay reliable across documentation updates and version changes.
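
The pairing of a model with a document retriever can be sketched as follows: look up the API documentation most relevant to the user's request and append it to the prompt before the model sees it. This is a minimal sketch under stated assumptions. The `API_DOCS` entries, the word-overlap scoring, the prompt wording, and the names `tokens`, `retrieve`, and `build_prompt` are all hypothetical stand-ins, not the retriever or prompt format confirmed by this summary.

```python
import re

# Hypothetical mini documentation store; entries are illustrative, not taken from APIBench.
API_DOCS = {
    "torch.hub.load": "Load a pretrained model from a GitHub repository, "
                      "e.g. an image classification network.",
    "transformers.pipeline": "Build a pipeline for a task such as text "
                             "translation or sentiment analysis.",
}


def tokens(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, docs):
    """Return the (api_name, doc) pair whose doc shares the most words with the query."""
    query_tokens = tokens(query)
    return max(docs.items(), key=lambda item: len(query_tokens & tokens(item[1])))


def build_prompt(query, docs):
    """Append the best-matching documentation entry to the user's request."""
    api_name, doc = retrieve(query, docs)
    return f"{query}\nUse this API documentation for reference: {api_name}: {doc}"


print(build_prompt("I want to classify an image with a pretrained model", API_DOCS))
```

Because the documentation is supplied at query time, editing an entry in `API_DOCS` changes what the model is shown without any retraining, which is one way to read the "test-time document changes" described above. A real system would likely replace the word-overlap score with a stronger retriever.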
Created on 27 Dec. 2023
