WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences
AI-generated Key Points
- WebGLM is a web-enhanced question-answering system that aims to augment pre-trained large language models (LLMs) with web search and retrieval capabilities.
- The system is based on the General Language Model (GLM) and uses three design strategies: an LLM-augmented retriever, a bootstrapped generator, and a human preference-aware scorer.
- WebGLM also aims to offer an effective and cost-effective way to construct long-form QA datasets with open-world references, which are otherwise difficult to create because they require expert-level annotation.
- Most Open QA datasets and models are limited to short-phrase answers, whereas people usually prefer more informative long-form answers with references.
- WebGLM is designed to address the limitations of WebGPT, which gives it advantages in accuracy, efficiency, and cost-effectiveness.
- Multi-dimensional human evaluation and quantitative ablation studies suggest that the WebGLM designs outperform existing systems.
- Although fetching each page from a different source is a bottleneck during retrieval, WebGLM's retrieval was found to be far more efficient than WebGPT's.
- Overall, in human evaluation WebGLM with the 10B GLM outperforms the similar-sized WebGPT (13B) and performs comparably to WebGPT (175B), while being more efficient and cost-effective for real-world deployment.
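The key points mention a page-fetching bottleneck but do not say how pages are fetched. As a generic illustration only, not WebGLM's documented implementation, the Python sketch below shows why fetching pages concurrently helps with this kind of bottleneck: total time is bounded roughly by the slowest page rather than by the sum of all pages. `fake_fetch`, `FAKE_LATENCY`, and the example.com URLs are hypothetical stand-ins for real HTTP requests.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-URL latencies in seconds; stand-ins for real HTTP round trips.
FAKE_LATENCY = {
    "https://example.com/a": 0.3,
    "https://example.com/b": 0.2,
    "https://example.com/c": 0.1,
}


def fake_fetch(url: str) -> str:
    """Simulate downloading one page (hypothetical; no network access)."""
    time.sleep(FAKE_LATENCY[url])
    return f"<html>content of {url}</html>"


def fetch_sequential(urls):
    # Pages are fetched one after another, so total time is roughly
    # the sum of all latencies.
    return [fake_fetch(u) for u in urls]


def fetch_concurrent(urls, max_workers=8):
    # Pages are fetched in parallel threads, so total time is roughly the
    # largest single latency; executor.map returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_fetch, urls))


if __name__ == "__main__":
    urls = list(FAKE_LATENCY)
    for fn in (fetch_sequential, fetch_concurrent):
        start = time.perf_counter()
        pages = fn(urls)
        print(f"{fn.__name__}: {len(pages)} pages in {time.perf_counter() - start:.2f}s")
```

Both functions return the same pages in the same order; only the wall-clock time differs.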
Authors: Xiao Liu, Hanyu Lai, Hao Yu, Yifan Xu, Aohan Zeng, Zhengxiao Du, Peng Zhang, Yuxiao Dong, Jie Tang
Abstract: We present WebGLM, a web-enhanced question-answering system based on the General Language Model (GLM). Its goal is to augment a pre-trained large language model (LLM) with web search and retrieval capabilities while being efficient for real-world deployments. To achieve this, we develop WebGLM with strategies for the LLM-augmented retriever, bootstrapped generator, and human preference-aware scorer. Specifically, we identify and address the limitations of WebGPT (OpenAI), through which WebGLM is enabled with accuracy, efficiency, and cost-effectiveness advantages. In addition, we propose systematic criteria for evaluating web-enhanced QA systems. We conduct multi-dimensional human evaluation and quantitative ablation studies, which suggest the outperformance of the proposed WebGLM designs over existing systems. WebGLM with the 10-billion-parameter GLM (10B) is shown to perform better than the similar-sized WebGPT (13B) and even comparably to WebGPT (175B) in human evaluation. The code, demo, and data are at https://github.com/THUDM/WebGLM.
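To make the three components named in the abstract concrete, here is a toy Python sketch of a retrieve, generate, then score pipeline. Every function is a hypothetical stand-in: word overlap replaces the LLM-augmented retriever, string concatenation replaces the bootstrapped generator, and a citation-count heuristic replaces the human preference-aware scorer. It only shows how the stages pass data to one another, not how WebGLM implements them.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Passage:
    url: str
    text: str


def retrieve(question: str, corpus: List[Passage], k: int = 2) -> List[Passage]:
    # Stage 1 (hypothetical stand-in for the LLM-augmented retriever):
    # rank passages by word overlap with the question and keep the top k.
    q_words = set(question.lower().split())

    def overlap(p: Passage) -> int:
        return len(q_words & set(p.text.lower().split()))

    return sorted(corpus, key=overlap, reverse=True)[:k]


def generate(question: str, refs: List[Passage]) -> List[str]:
    # Stage 2 (hypothetical stand-in for the generator): candidate i cites the
    # first i+1 references as [1], [2], ...; the question is unused by this toy.
    candidates = []
    for i in range(len(refs)):
        cited = " ".join(f"{r.text} [{j + 1}]" for j, r in enumerate(refs[: i + 1]))
        candidates.append(cited)
    return candidates


def citation_count_scorer(answer: str) -> float:
    # Stage 3 (hypothetical stand-in for the human preference-aware scorer):
    # a toy heuristic that prefers answers citing more references.
    return float(answer.count("["))


def answer_question(
    question: str,
    corpus: List[Passage],
    scorer: Callable[[str], float] = citation_count_scorer,
) -> str:
    refs = retrieve(question, corpus)
    candidates = generate(question, refs)
    # Return the candidate the scorer rates highest.
    return max(candidates, key=scorer)


if __name__ == "__main__":
    corpus = [
        Passage("https://example.com/glm", "GLM is a general language model"),
        Passage("https://example.com/webgpt", "WebGPT answers questions with web search"),
        Passage("https://example.com/cooking", "how to bake bread at home"),
    ]
    print(answer_question("what is the general language model GLM", corpus))
```

Passing a different `scorer` callable (for example, a learned reward model) changes which candidate is selected without touching the retrieval or generation stages.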