WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences

AI-generated keywords: WebGLM GLM Open QA WebGPT Efficiency

AI-generated Key Points

WebGLM is a web-enhanced question-answering system that aims to augment pre-trained large language models (LLMs) with web search and retrieval capabilities.
The system is based on the General Language Model (GLM) and is designed with strategies for the LLM-augmented retriever, bootstrapped generator, and human preference-aware scorer.
The goal of WebGLM is to provide an effective and cost-effective solution for constructing long-formed QA datasets with open-world references, which are difficult to create due to the need for expert-level annotations.
Most Open QA datasets and models are limited to answering short answer phrases, while people usually prefer more informative long-formed answers with references.
WebGLM was developed with accuracy, efficiency, and cost-effectiveness advantages over existing systems like WebGPT.
Results suggest that WebGLM designs outperform existing systems in terms of accuracy.
Despite the bottleneck in fetching each page from different sources during retrieval, WebGLM's retrieval efficiency was found to be far better than that of WebGPT.
Overall, WebGLM (10B) performed better than the similar-sized WebGPT (13B) and comparably to WebGPT (175B) in human evaluation, while being more efficient and cost-effective for real-world deployments.

Authors: Xiao Liu, Hanyu Lai, Hao Yu, Yifan Xu, Aohan Zeng, Zhengxiao Du, Peng Zhang, Yuxiao Dong, Jie Tang

arXiv: 2306.07906v1 - DOI (cs.CL)

Accepted to KDD 2023

License: CC BY 4.0

Abstract: We present WebGLM, a web-enhanced question-answering system based on the General Language Model (GLM). Its goal is to augment a pre-trained large language model (LLM) with web search and retrieval capabilities while being efficient for real-world deployments. To achieve this, we develop WebGLM with strategies for the LLM-augmented retriever, bootstrapped generator, and human preference-aware scorer. Specifically, we identify and address the limitations of WebGPT (OpenAI), through which WebGLM is enabled with accuracy, efficiency, and cost-effectiveness advantages. In addition, we propose systematic criteria for evaluating web-enhanced QA systems. We conduct multi-dimensional human evaluation and quantitative ablation studies, which suggest the outperformance of the proposed WebGLM designs over existing systems. WebGLM with the 10-billion-parameter GLM (10B) is shown to perform better than the similar-sized WebGPT (13B) and even comparably to WebGPT (175B) in human evaluation. The code, demo, and data are at https://github.com/THUDM/WebGLM.

Submitted to arXiv on 13 Jun. 2023

Results of the summarizing process for the arXiv paper: 2306.07906v1

Comprehensive Summary
Key points
Layman's Summary
Blog article

WebGLM is a web-enhanced question-answering system that augments pre-trained large language models (LLMs) with web search and retrieval capabilities while remaining efficient for real-world deployments. The system is based on the General Language Model (GLM) and combines three components: an LLM-augmented retriever, a bootstrapped generator, and a human preference-aware scorer.

Most Open QA datasets and models are limited to answering short answer phrases, while people usually prefer more informative long-formed answers with references. Constructing long-formed QA datasets with open-world references is difficult because it requires expert-level annotations, and WebGLM aims to provide an effective and cost-effective solution to this challenge. Recent attempts at creating such datasets include ELI5 and WebGPT. However, these approaches require considerable expense, time, and training, as they rely on abundant expert-level annotations of browsing trajectories, well-written answers, and answer preference labeling. In addition, the behavior cloning method used in WebGPT requires its base model, GPT-3, to emulate human experts by interacting with a web browser.

To address these limitations, WebGLM was developed with accuracy, efficiency, and cost-effectiveness advantages over existing systems like WebGPT. The authors also propose systematic criteria for evaluating web-enhanced QA systems and apply them in a multi-dimensional human evaluation alongside quantitative ablation studies. The results suggest that the WebGLM designs outperform existing systems in terms of accuracy. A speed analysis of WebGLM's LLM-augmented retriever showed that retrieval, and in particular fetching each page from a different source, is the most time-consuming part of a web-scale QA system. Despite this bottleneck, WebGLM's retrieval efficiency was found to be far better than that of WebGPT.

Overall, WebGLM (10B) performed better than the similar-sized WebGPT (13B) and comparably to WebGPT (175B) in human evaluation, while being more efficient and cost-effective for real-world deployments.

WebGLM is a computer program that helps answer questions by searching the internet. It is better than other programs because it can find longer and more informative answers. WebGLM was made to be accurate, fast, and not too expensive. It works by using different strategies to find the best answer. People tested it and found that it works well and is a good choice for answering questions on the internet.

Definitions:
- Web-enhanced: something that uses the internet to make something better
- Question-answering system: a computer program that helps people find answers to their questions
- Language models: computer programs that understand language
- Retrieval capabilities: the ability to search for information
- Dataset: a collection of information used for research or analysis

WebGLM: A Web-Enhanced Question Answering System

Developing effective question answering (QA) systems is challenging because the system must understand natural language queries and provide accurate answers. QA systems built on pre-trained large language models (LLMs), such as the General Language Model (GLM), address part of this challenge. However, most Open QA datasets and models are limited to short answers, whereas people usually prefer long-formed answers with references from open-world sources. In response to this limitation, Liu et al. proposed WebGLM, a web-enhanced QA system that augments LLMs with web search and retrieval capabilities while being efficient for real-world deployments. This article discusses the features of WebGLM and its performance compared with existing systems such as WebGPT.

Overview of WebGLM

WebGLM is designed around three components: an LLM-augmented retriever, a bootstrapped generator, and a human preference-aware scorer. The goal of the system is to provide an effective solution for constructing long-formed QA datasets with open-world references, which are difficult to create due to the need for expert-level annotations. The LLM-augmented retriever retrieves relevant documents from multiple sources, including web pages. The bootstrapped generator then produces candidate answers from the retrieved documents using GLM. Finally, the human preference-aware scorer evaluates each candidate answer against human preferences before the final output is returned.
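The retrieve, generate, and score flow described above can be illustrated with a small Python sketch. This is not the WebGLM implementation: the `Reference` class, the function names, the example URLs, and the word-overlap heuristics are all hypothetical stand-ins chosen only to show how the three stages connect. In the real system each stage is a learned model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reference:
    """A retrieved passage and where it came from (hypothetical structure)."""
    url: str
    text: str


def _words(text: str) -> set:
    """Lowercased whitespace tokens; a crude stand-in for real text matching."""
    return set(text.lower().split())


def retrieve(question: str, corpus: List[Reference], top_k: int = 2) -> List[Reference]:
    """Toy retriever: rank references by word overlap with the question."""
    q = _words(question)
    ranked = sorted(corpus, key=lambda ref: len(q & _words(ref.text)), reverse=True)
    return ranked[:top_k]


def generate_candidates(question: str, refs: List[Reference]) -> List[str]:
    """Toy generator: one candidate per reference, with a numbered citation."""
    return [f"{ref.text} [{i}]" for i, ref in enumerate(refs, start=1)]


def score(question: str, answer: str) -> int:
    """Toy scorer standing in for a learned human-preference model."""
    return len(_words(question) & _words(answer))


def answer_question(question: str, corpus: List[Reference]) -> str:
    """Run the three stages in order and return the best-scoring candidate."""
    refs = retrieve(question, corpus)
    candidates = generate_candidates(question, refs)
    return max(candidates, key=lambda a: score(question, a))


# Made-up example data, for illustration only.
EXAMPLE_CORPUS = [
    Reference("https://example.com/a", "The sky is blue because air scatters blue light"),
    Reference("https://example.com/b", "Bananas are yellow when ripe"),
    Reference("https://example.com/c", "Blue light has a short wavelength"),
]

print(answer_question("Why is the sky blue", EXAMPLE_CORPUS))
```

Keeping the scorer separate from the generator mirrors the description above: several candidates are produced first, and the preference-based choice is made afterwards.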

Evaluation Criteria

To evaluate WebGLM's performance against existing systems such as WebGPT, a multi-dimensional human evaluation was conducted along with quantitative ablation studies. The results suggest that the WebGLM designs outperform existing systems in terms of accuracy, while also being more efficient and cost-effective for real-world deployments. A speed analysis of the LLM-augmented retriever showed that retrieval is the most time-consuming part of a web-scale QA system. Despite this bottleneck in fetching each page from different sources, WebGLM's retrieval efficiency was found to be far better than that of WebGPT.
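A speed analysis of this kind amounts to timing each pipeline stage separately and comparing the totals. The sketch below shows one way to do that in Python. The `StageTimer` class and the stage names are hypothetical, and the `time.sleep` calls are placeholders that simulate work. They are not measurements from the paper.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        """Time the enclosed block and add the elapsed seconds to `name`."""
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed

    def slowest(self) -> str:
        """Return the stage with the largest accumulated time."""
        return max(self.totals, key=self.totals.get)


timer = StageTimer()

# Simulated stages; the sleep durations are arbitrary placeholders.
with timer.stage("search"):
    time.sleep(0.01)
with timer.stage("fetch"):
    time.sleep(0.05)
with timer.stage("extract"):
    time.sleep(0.005)
with timer.stage("generate"):
    time.sleep(0.02)

for name, seconds in timer.totals.items():
    print(f"{name}: {seconds:.3f}s")
print("slowest stage:", timer.slowest())
```

Wrapping each stage in its own context manager keeps the timing code out of the stage logic, so the same pipeline can run with or without instrumentation.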

Conclusion

Overall, the results suggest that WebGLM (10B) performs better than the similar-sized WebGPT (13B) and comparably to WebGPT (175B) in human evaluation, while being more efficient and cost-effective for real-world deployments. As such, it can be concluded that this method provides an effective solution for constructing long-formed QA datasets with open-world references without requiring expensive expert-level annotations or long training times.

Created on 25 Jun. 2023

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.