WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences

AI-generated keywords: WebGLM GLM Open QA WebGPT Efficiency

AI-generated Key Points

  • WebGLM is a web-enhanced question-answering system that aims to augment pre-trained large language models (LLMs) with web search and retrieval capabilities.
  • The system is based on the General Language Model (GLM) and combines three components: an LLM-augmented retriever, a bootstrapped generator, and a human preference-aware scorer.
  • The goal of WebGLM is to provide an effective, cost-effective solution to constructing long-form QA datasets with open-world references, which are difficult to create because they require expert-level annotations.
  • Most Open QA datasets and models are limited to short-phrase answers, whereas people usually prefer more informative long-form answers with references.
  • WebGLM was designed to improve on existing systems such as WebGPT in accuracy, efficiency, and cost-effectiveness.
  • Multi-dimensional human evaluation and quantitative ablation studies suggest that the WebGLM designs outperform existing systems in accuracy.
  • Retrieval is the main bottleneck, because each page must be fetched from a different source; even so, WebGLM's retrieval was found to be considerably more efficient than WebGPT's.
  • Overall, in human evaluation WebGLM with the 10B GLM performs better than the similar-sized WebGPT (13B) and comparably to WebGPT (175B), while being more efficient and cost-effective for real-world deployment.
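To make the three-stage design concrete, here is a minimal Python sketch of a retrieve, generate, and score loop. Everything in it is a toy assumption, not WebGLM's implementation: the retriever ranks passages by word overlap (WebGLM uses an LLM-augmented retriever), the generator simply concatenates cited passages (WebGLM uses a bootstrapped GLM generator), and the scorer is a hand-written heuristic standing in for a learned human preference-aware scorer. The names `Reference`, `retrieve`, `generate`, `score`, and `web_qa`, and the example.com URLs, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reference:
    url: str
    text: str


def _words(s: str) -> List[str]:
    return s.lower().split()


def retrieve(question: str, corpus: List[Reference], k: int = 2) -> List[Reference]:
    """Toy retriever (hypothetical): rank passages by word overlap with the question, keep the top k."""
    q = set(_words(question))
    ranked = sorted(corpus, key=lambda r: len(q & set(_words(r.text))), reverse=True)
    return ranked[:k]


def generate(refs: List[Reference], n: int) -> List[str]:
    """Toy generator (hypothetical): candidate i cites the first i references as [1], [2], ..."""
    candidates = []
    for i in range(1, min(n, len(refs)) + 1):
        parts = [f"{r.text} [{j}]" for j, r in enumerate(refs[:i], start=1)]
        candidates.append(" ".join(parts))
    return candidates


def score(question: str, answer: str) -> float:
    """Toy scorer (hypothetical): share of answer tokens that also appear in the question."""
    q = set(_words(question))
    tokens = _words(answer)
    return sum(t in q for t in tokens) / max(len(tokens), 1)


def web_qa(question: str, corpus: List[Reference], n: int = 2) -> str:
    refs = retrieve(question, corpus)  # stage 1: retrieve references
    candidates = generate(refs, n)  # stage 2: draft cited candidate answers
    return max(candidates, key=lambda a: score(question, a))  # stage 3: keep the best-scored one


corpus = [
    Reference("https://example.com/a", "GLM is a general language model"),
    Reference("https://example.com/b", "WebGPT uses a text browser"),
    Reference("https://example.com/c", "Bananas are yellow"),
]
question = "What is the GLM language model"
print(web_qa(question, corpus))  # → GLM is a general language model [1]
```

Because the scorer only reranks finished candidates, it can be replaced by a learned model without touching the retriever or the generator.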

Authors: Xiao Liu, Hanyu Lai, Hao Yu, Yifan Xu, Aohan Zeng, Zhengxiao Du, Peng Zhang, Yuxiao Dong, Jie Tang

Accepted to KDD 2023
License: CC BY 4.0

Abstract: We present WebGLM, a web-enhanced question-answering system based on the General Language Model (GLM). Its goal is to augment a pre-trained large language model (LLM) with web search and retrieval capabilities while being efficient for real-world deployments. To achieve this, we develop WebGLM with strategies for the LLM-augmented retriever, bootstrapped generator, and human preference-aware scorer. Specifically, we identify and address the limitations of WebGPT (OpenAI), through which WebGLM is enabled with accuracy, efficiency, and cost-effectiveness advantages. In addition, we propose systematic criteria for evaluating web-enhanced QA systems. We conduct multi-dimensional human evaluation and quantitative ablation studies, which suggest the outperformance of the proposed WebGLM designs over existing systems. WebGLM with the 10-billion-parameter GLM (10B) is shown to perform better than the similar-sized WebGPT (13B) and even comparably to WebGPT (175B) in human evaluation. The code, demo, and data are at https://github.com/THUDM/WebGLM.
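The abstract does not say how the human preference-aware scorer is trained. A common approach for preference-based scorers, assumed here rather than taken from the paper, is a pairwise Bradley-Terry loss, -log sigmoid(r(preferred) - r(other)), which is small when the preferred answer already scores higher. The sketch below fits a linear scorer r(x) = w . x by stochastic gradient descent on hypothetical answer features (citation count and fraction of unsupported sentences); the features, learning rate, and function names are all illustrative, and the real scorer is a neural model whose details may differ.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pairwise_loss(r_pref: float, r_other: float) -> float:
    """Bradley-Terry loss -log sigmoid(r_pref - r_other), computed without overflow."""
    d = r_pref - r_other
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_scorer(pairs: List[Tuple[Vector, Vector]], dim: int,
                 lr: float = 0.5, epochs: int = 200) -> List[float]:
    """Fit a linear scorer on (preferred, other) feature pairs by SGD."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x_pref, x_other in pairs:
            d = dot(w, x_pref) - dot(w, x_other)
            # dL/dw = -sigmoid(-d) * (x_pref - x_other); step against the gradient
            g = sigmoid(-d)
            w = [wi + lr * g * (a - b) for wi, a, b in zip(w, x_pref, x_other)]
    return w


# Hypothetical features per answer: [citation_count, fraction_of_unsupported_sentences]
pairs = [
    ([2.0, 0.0], [0.0, 0.5]),
    ([1.0, 0.1], [1.0, 0.6]),
    ([3.0, 0.2], [1.0, 0.2]),
]
w = train_scorer(pairs, dim=2)
print(w)
```

After training, the weight on citations is positive and the weight on unsupported sentences is negative, so every preferred answer in the toy data outscores its counterpart.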

Submitted to arXiv on 13 Jun. 2023


Results of the summarizing process for the arXiv paper: 2306.07906v1

WebGLM is a web-enhanced question-answering system that augments pre-trained large language models (LLMs) with web search and retrieval capabilities while remaining efficient enough for real-world deployment. It is based on the General Language Model (GLM) and combines three components: an LLM-augmented retriever, a bootstrapped generator, and a human preference-aware scorer.

The goal of WebGLM is to provide an effective, cost-effective solution to the challenge of constructing long-form QA datasets with open-world references, which are difficult to create because they require expert-level annotations. Most Open QA datasets and models are limited to short-phrase answers, whereas people usually prefer more informative long-form answers with references. Earlier attempts at creating such datasets include ELI5 and WebGPT. However, these approaches are expensive and time-consuming, because they rely on abundant expert-level annotations of browsing trajectories, well-written answers, and answer preferences. In addition, the behavior cloning used in WebGPT requires its base model, GPT-3, to emulate human experts by being instructed to interact with a web browser.

To address these limitations, WebGLM was designed to improve on WebGPT in accuracy, efficiency, and cost-effectiveness. The authors also propose systematic criteria for evaluating web-enhanced QA systems and evaluate WebGLM through multi-dimensional human evaluation and quantitative ablation studies; the results suggest that the WebGLM designs outperform existing systems in accuracy. A speed analysis of WebGLM's LLM-augmented retriever shows that retrieval, and in particular fetching each page from a different source, is the most time-consuming stage of the pipeline. Despite this bottleneck, WebGLM's retrieval was found to be considerably more efficient than WebGPT's.

Overall, in human evaluation WebGLM with the 10B GLM performs better than the similar-sized WebGPT (13B) and comparably to WebGPT (175B), while being more efficient and cost-effective for real-world deployment.
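Since retrieval time is dominated by fetching pages from many different sources, overlapping those network waits is the obvious lever. The following Python sketch, an assumption rather than WebGLM's actual crawler, fetches pages concurrently with `asyncio.gather` and drops any page that exceeds a per-page timeout, so total latency is bounded by roughly the slowest kept page (or the timeout) instead of the sum of all fetch times. `fetch_page` and `fetch_all` are hypothetical; `fetch_page` simulates network latency with `asyncio.sleep`, where a real system would use an HTTP client.

```python
import asyncio
import time
from typing import Dict, List, Optional


async def fetch_page(url: str, delay: float) -> str:
    # Hypothetical fetcher: asyncio.sleep stands in for network latency
    await asyncio.sleep(delay)
    return f"<contents of {url}>"


async def fetch_all(delays: Dict[str, float], timeout: float) -> List[Optional[str]]:
    """Fetch all URLs concurrently; a page slower than `timeout` becomes None."""

    async def fetch_or_none(url: str, delay: float) -> Optional[str]:
        try:
            return await asyncio.wait_for(fetch_page(url, delay), timeout)
        except asyncio.TimeoutError:
            return None  # drop slow sources instead of blocking the whole batch

    # gather runs the coroutines concurrently and returns results in input order
    return list(await asyncio.gather(*(fetch_or_none(u, d) for u, d in delays.items())))


delays = {
    "https://example.com/a": 0.2,
    "https://example.com/b": 0.2,
    "https://example.com/c": 0.2,
    "https://example.com/slow": 5.0,
}
start = time.perf_counter()
pages = asyncio.run(fetch_all(delays, timeout=0.5))
elapsed = time.perf_counter() - start
print(pages[-1], round(elapsed, 1))
```

Fetched one after another, the same four pages would take about 0.6 s plus the 0.5 s timeout on the slow page; fetched concurrently, the whole batch finishes in roughly 0.5 s.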
Created on 25 Jun. 2023

