Leveraging Large Language Models for Mental Health Prediction via Online Text Data

AI-generated keywords: Large Language Models, Mental Health Prediction, Prompt Design Enhancements, Instruction Finetuning, Online Text Data

AI-generated Key Points

  • Recent surge of Large Language Models (LLMs) including GPT-3.5/4, PaLM, FLAN-T5, and Alpaca showing promising potential for various applications
  • Lack of research focusing on understanding and enhancing LLMs' capabilities in the mental health domain
  • Comprehensive evaluation of multiple LLMs specifically for mental health prediction tasks using online text data
  • Experiment results show that LLMs are promising with zero-shot and few-shot prompting, but their performance in those settings is not yet comparable to task-specific NLP models
  • Prompt design enhancement strategies effective for critical action prediction tasks like suicide prediction
  • Instruction finetuning significantly improves the performance of LLMs across all mental health prediction tasks simultaneously
  • Best-finetuned model developed in the study - Mental-Alpaca - outperforms larger GPT-3.5 model by 16.7% on balanced accuracy and performs comparably to state-of-the-art task-specific models
  • Effectiveness of prompt design enhancements for mental health tasks and potential for further improvement through instruction finetuning highlighted
  • Utilization of four diverse mental health datasets to define six different mental health prediction tasks, ranging from binary stress prediction to five-level suicide risk prediction, at both the post level and the user level

Authors: Xuhai Xu, Bingshen Yao, Yuanzhe Dong, Hong Yu, James Hendler, Anind K. Dey, Dakuo Wang

License: CC BY 4.0

Abstract: The recent technology boost of large language models (LLMs) has empowered a variety of applications. However, there is very little research on understanding and improving LLMs' capability for the mental health domain. In this work, we present the first comprehensive evaluation of multiple LLMs, including Alpaca, Alpaca-LoRA, and GPT-3.5, on various mental health prediction tasks via online text data. We conduct a wide range of experiments, covering zero-shot prompting, few-shot prompting, and instruction finetuning. The results indicate the promising yet limited performance of LLMs with zero-shot and few-shot prompt designs for mental health tasks. More importantly, our experiments show that instruction finetuning can significantly boost the performance of LLMs for all tasks simultaneously. Our best-finetuned model, Mental-Alpaca, outperforms GPT-3.5 (25 times bigger) by 16.7% on balanced accuracy and performs on par with the state-of-the-art task-specific model. We summarize our findings into a set of action guidelines for future researchers, engineers, and practitioners on how to empower LLMs with better mental health domain knowledge and become an expert in mental health prediction tasks.
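
The abstract reports its headline result in balanced accuracy. As a minimal sketch of that metric (the standard definition: the mean of per-class recall), the snippet below uses made-up labels, not data from the paper, to show why it differs from plain accuracy on imbalanced classes. The function name and the example labels are illustrative assumptions.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    total = defaultdict(int)    # number of true examples per class
    correct = defaultdict(int)  # correctly predicted examples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical labels: 8 "not stressed" (0) posts and 2 "stressed" (1) posts.
y_true = [0] * 8 + [1] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The majority-class predictor looks strong on plain accuracy but scores only 0.5 on balanced accuracy, because its recall on the minority class is zero.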

Submitted to arXiv on 26 Jul. 2023


Results of the summarizing process for the arXiv paper: 2307.14385v1

The recent surge of Large Language Models (LLMs), including GPT-3.5/4, PaLM, FLAN-T5, and Alpaca, has shown promising potential for various applications. However, little research has focused on understanding and enhancing LLMs' capabilities in the mental health domain. This study presents a comprehensive evaluation of multiple LLMs on mental health prediction tasks using online text data. The experiments cover zero-shot prompting, few-shot prompting, and instruction finetuning. The results show that LLMs are promising with zero-shot and few-shot prompting, but their performance in those settings is not yet comparable to task-specific NLP models.

The experiments found that prompt design enhancement strategies are effective for critical action prediction tasks such as suicide prediction. Furthermore, instruction finetuning significantly improves the performance of LLMs across all mental health prediction tasks simultaneously. The best finetuned model developed in this study, Mental-Alpaca, outperforms the larger GPT-3.5 model by 16.7% on balanced accuracy and performs comparably to the state-of-the-art task-specific model.

The key takeaways are the effectiveness of prompt design enhancements for mental health tasks and the potential for further improvement through instruction finetuning. The study also provides actionable guidelines for future researchers, engineers, and practitioners looking to give LLMs better mental health domain knowledge and improve their performance on mental health prediction tasks.

The study used four diverse mental health datasets (Dreaddit, DepSeverity, SDCNL, and CSSRS-Suicide) to define six mental health prediction tasks, ranging from binary stress prediction to five-level suicide risk prediction, at both the post level and the user level.

Overall, this research contributes valuable insights into leveraging large language models for mental health prediction using online text data and highlights areas for further exploration and improvement in this important domain.
Created on 08 Apr. 2024
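
To make the three settings named in the summary concrete, here is a minimal sketch of how a zero-shot prompt, a few-shot prompt, and an instruction-finetuning record might be assembled for a binary stress prediction task. The prompt wording, function names, and example posts are hypothetical and are not taken from the paper. The `instruction`/`input`/`output` keys follow the record layout used by the public Alpaca dataset.

```python
def zero_shot_prompt(post, question):
    """Ask the question about one post, with no solved examples."""
    return f'Post: "{post}"\n{question}\nAnswer:'


def few_shot_prompt(examples, post, question):
    """Prepend a few solved (post, answer) pairs before the target post."""
    blocks = [f'Post: "{p}"\n{question}\nAnswer: {a}' for p, a in examples]
    blocks.append(zero_shot_prompt(post, question))
    return "\n\n".join(blocks)


def finetuning_record(post, question, answer):
    """One supervised training example in Alpaca-style instruction format."""
    return {"instruction": question, "input": post, "output": answer}


# Hypothetical question and posts, for illustration only.
QUESTION = "Is the poster of this post stressed? Answer yes or no."
solved = [
    ("I can't sleep, exams are crushing me.", "yes"),
    ("Had a relaxing walk by the lake today.", "no"),
]
target = "Deadlines keep piling up and I feel overwhelmed."

print(zero_shot_prompt(target, QUESTION))
print(few_shot_prompt(solved, target, QUESTION))
print(finetuning_record(solved[0][0], QUESTION, solved[0][1]))
```

In the prompting settings the model weights stay fixed and only the input text changes. In instruction finetuning, records like the last one are used as training data to update the model.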

