Do LLMs Understand User Preferences? Evaluating LLMs On User Rating Prediction

AI-generated keywords: LLMs CF Rating Prediction Fine-Tuning Data Efficiency

AI-generated Key Points

  • Large Language Models (LLMs) are effective in text generation, translation, and summarization
  • LLMs require less data than Collaborative Filtering (CF) for user preference comprehension
  • A study was conducted on CF and LLMs for user rating prediction
  • Zero-shot LLMs perform worse than traditional recommender models with user interaction data
  • Fine-tuning LLMs with limited training data can achieve comparable or better performance than traditional models
  • LLMs have access to real-world information that can be used for answering questions and creative writing
  • Previous studies explored using BERT and GPT-2 for recommendation problems but did not match well-tuned baselines like GRU4Rec
  • The study highlights the potential of fine-tuning LLMs on limited training data to build data-efficient recommendation systems

Authors: Wang-Cheng Kang, Jianmo Ni, Nikhil Mehta, Maheswaran Sathiamoorthy, Lichan Hong, Ed Chi, Derek Zhiyuan Cheng

License: CC BY 4.0

Abstract: Large Language Models (LLMs) have demonstrated exceptional capabilities in generalizing to new tasks in a zero-shot or few-shot manner. However, the extent to which LLMs can comprehend user preferences based on their previous behavior remains an emerging and still unclear research question. Traditionally, Collaborative Filtering (CF) has been the most effective method for these tasks, predominantly relying on the extensive volume of rating data. In contrast, LLMs typically demand considerably less data while maintaining an exhaustive world knowledge about each item, such as movies or products. In this paper, we conduct a thorough examination of both CF and LLMs within the classic task of user rating prediction, which involves predicting a user's rating for a candidate item based on their past ratings. We investigate various LLMs in different sizes, ranging from 250M to 540B parameters and evaluate their performance in zero-shot, few-shot, and fine-tuning scenarios. We conduct comprehensive analysis to compare between LLMs and strong CF methods, and find that zero-shot LLMs lag behind traditional recommender models that have the access to user interaction data, indicating the importance of user interaction data. However, through fine-tuning, LLMs achieve comparable or even better performance with only a small fraction of the training data, demonstrating their potential through data efficiency.
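
The zero-shot and few-shot settings described in the abstract come down to turning a user's past ratings and a candidate item into text, then reading a number back from the model's reply. The following is a minimal sketch of that idea. The prompt wording, the function names `build_rating_prompt` and `parse_rating`, and the 1-5 scale are assumptions made for illustration; they are not the prompts or code used by the authors, and no real LLM API is called.

```python
import re
from typing import List, Optional, Tuple


def build_rating_prompt(
    history: List[Tuple[str, float]],
    candidate: str,
    scale: Tuple[int, int] = (1, 5),
) -> str:
    """Format past (title, rating) pairs and a candidate item as a text prompt.

    The wording is hypothetical; it only illustrates the task format.
    """
    lo, hi = scale
    lines = [f"Here is a user's rating history (scale {lo}-{hi}):"]
    for title, rating in history:
        lines.append(f"- {title}: {rating:g}")
    lines.append(f"Predict the user's rating for: {candidate}")
    lines.append("Answer with a single number.")
    return "\n".join(lines)


def parse_rating(response: str, scale: Tuple[int, int] = (1, 5)) -> Optional[float]:
    """Extract the first number from a model reply and clamp it to the scale.

    Returns None when the reply contains no number.
    """
    match = re.search(r"\d+(?:\.\d+)?", response)
    if match is None:
        return None
    lo, hi = scale
    return float(min(max(float(match.group()), lo), hi))


if __name__ == "__main__":
    prompt = build_rating_prompt(
        [("The Matrix", 5.0), ("Titanic", 2.0)], "Inception"
    )
    print(prompt)
    # A reply string stands in for a real model call here.
    print(parse_rating("I would guess 4.5 stars."))
```

In a few-shot variant, a handful of solved examples in the same format would be prepended to the prompt; in the fine-tuning setting, the same text pairs could serve as training inputs and targets.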

Submitted to arXiv on 10 May. 2023

Results of the summarizing process for the arXiv paper: 2305.06474v1

Large Language Models (LLMs) have proven highly effective at a wide range of tasks such as text generation, translation, and summarization. However, their ability to comprehend user preferences from previous behavior remains an emerging research question. Collaborative Filtering (CF) has traditionally been the most effective method for these tasks, relying heavily on extensive rating data. In contrast, LLMs require considerably less data while maintaining exhaustive world knowledge about each item, such as movies or products.

In this paper submitted to ACM, the authors conduct a thorough examination of both CF and LLMs within the classic task of user rating prediction: predicting a user's rating for a candidate item based on their past ratings. They investigate LLMs of different sizes, ranging from 250M to 540B parameters, and evaluate them in zero-shot, few-shot, and fine-tuning scenarios. The study finds that zero-shot LLMs lag behind traditional recommender models that have access to user interaction data, which indicates the importance of such data. However, when fine-tuned on only a small fraction of the training data, LLMs achieve comparable or even better performance than traditional models, demonstrating their data efficiency.

The authors also note that LLMs are trained on enormous text datasets, which gives them access to real-world information that can be converted into knowledge for answering questions and for creative writing such as poems and articles. Previous studies have formulated recommendation problems as natural language tasks using BERT and GPT-2, but did not achieve results as good as well-tuned baselines like GRU4Rec.

Overall, this study contributes valuable insights into how well LLMs understand user preferences compared with traditional recommender models that use user interaction data. It highlights the potential of fine-tuning LLMs on limited training data to build data-efficient recommendation systems.
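
To make the rating prediction task concrete, here is a minimal sketch of a very simple baseline learned from user interaction data, with an RMSE evaluation. It is a regularized bias-only model (global mean plus user and item offsets), chosen for brevity; it is not one of the CF methods evaluated in the paper, and the names `fit_bias_baseline`, `predict`, `rmse`, and the `reg` parameter are assumptions for illustration. RMSE is used here as a common metric for this task.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Rating = Tuple[str, str, float]  # (user, item, rating)
Model = Tuple[float, Dict[str, float], Dict[str, float]]


def fit_bias_baseline(train: List[Rating], reg: float = 5.0) -> Model:
    """Fit global mean, item offsets, then user offsets, each shrunk by `reg`."""
    mu = sum(r for _, _, r in train) / len(train)

    # Item offset: average deviation from the global mean.
    item_sum: Dict[str, float] = defaultdict(float)
    item_cnt: Dict[str, int] = defaultdict(int)
    for _, item, r in train:
        item_sum[item] += r - mu
        item_cnt[item] += 1
    b_i = {i: item_sum[i] / (reg + item_cnt[i]) for i in item_sum}

    # User offset: average residual after removing mean and item offset.
    user_sum: Dict[str, float] = defaultdict(float)
    user_cnt: Dict[str, int] = defaultdict(int)
    for user, item, r in train:
        user_sum[user] += r - mu - b_i[item]
        user_cnt[user] += 1
    b_u = {u: user_sum[u] / (reg + user_cnt[u]) for u in user_sum}
    return mu, b_u, b_i


def predict(model: Model, user: str, item: str) -> float:
    """Unseen users or items fall back to an offset of zero."""
    mu, b_u, b_i = model
    return mu + b_u.get(user, 0.0) + b_i.get(item, 0.0)


def rmse(model: Model, test: List[Rating]) -> float:
    """Root mean squared error of predictions on held-out ratings."""
    se = sum((predict(model, u, i) - r) ** 2 for u, i, r in test)
    return math.sqrt(se / len(test))


if __name__ == "__main__":
    train = [("a", "x", 5.0), ("b", "x", 3.0), ("a", "y", 4.0)]
    model = fit_bias_baseline(train)
    print(rmse(model, [("b", "y", 3.0)]))
```

A model like this has no knowledge of what the items are; it relies entirely on interaction data, which is the contrast with LLMs that the summary draws.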
Created on 14 Jun. 2023
