PMC-LLaMA: Further Finetuning LLaMA on Medical Papers

AI-generated keywords: PMC-LLaMA, Language Model, Fine-Tuning, Biomedical QA Datasets, Medical Domain

AI-generated Key Points

  • Large Language Models (LLMs) have shown remarkable capabilities in natural language understanding across various domains
  • In areas that require precision, such as medical applications, these models often exhibit unsatisfactory performance due to a lack of domain-specific knowledge
  • PMC-LLaMA is an open-source language model that is fine-tuned on 4.8 million biomedical academic papers to inject medical knowledge and enhance its capability in the medical domain
  • The authors conducted a preliminary investigation by fine-tuning the existing LLaMA model with the aforementioned dataset and demonstrated that PMC-LLaMA is more suitable for medical tasks compared to LLaMA
  • The evaluation was conducted on three biomedical QA datasets: PubMedQA, MedMCQA, and USMLE, showing better understanding of biomedical domain-specific concepts and achieving high performance on QA benchmarks after fine-tuning
  • The authors outline their fine-tuning procedure on the S2ORC dataset, with the following training details: a maximum context length of 512, a batch size of 128, the AdamW optimizer with a learning rate of 2e-5, the Fully Sharded Data Parallel (FSDP) acceleration strategy, and the bf16 (Brain Floating Point) data format.
  • The model is trained for five epochs with eight A100 GPUs in around seven days, and in each epoch they randomly sample 512 continuous tokens per paper for training.
  • The authors also provide a detailed description of the evaluation benchmark, which includes three QA datasets: PubMedQA, MedMCQA, and USMLE.
  • Overall, PMC-LLaMA offers an open-source language model that enhances LLaMA's capability in the medical domain by injecting domain-specific knowledge through fine-tuning on biomedical academic papers.
  • The model and codes are publicly available along with an online demo for further exploration.
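
The per-epoch sampling step in the key points above (512 continuous tokens drawn at random from each paper) can be sketched as follows. This is a minimal illustration, not the authors' code: the helper name `sample_window`, the choice to return papers shorter than the window whole, and the use of Python's `random` module are all assumptions.

```python
import random
from typing import List, Optional, Sequence


def sample_window(token_ids: Sequence[int], window: int = 512,
                  rng: Optional[random.Random] = None) -> List[int]:
    """Return one random contiguous span of up to `window` token ids.

    Hypothetical helper: the summary only states that 512 continuous
    tokens are sampled per paper in each epoch. Returning short papers
    unchanged is an assumption made for this sketch.
    """
    if rng is None:
        rng = random.Random()
    if len(token_ids) <= window:
        return list(token_ids)
    # Valid start positions run from 0 to len(token_ids) - window inclusive.
    start = rng.randrange(len(token_ids) - window + 1)
    return list(token_ids[start:start + window])


# Toy "paper" of 2,000 token ids; one fresh window is drawn per epoch.
paper = list(range(2000))
rng = random.Random(0)
epoch_windows = [sample_window(paper, rng=rng) for _ in range(5)]
```

Under this reading, a paper contributes a different 512-token excerpt in each of the five epochs, so the model sees only part of a long paper over the whole run.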

Authors: Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

License: CC BY 4.0

Abstract: Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding in various domains. These models usually behave well in daily dialog or question answering scenarios; however, in areas that value precision, for example, in medical applications, they often exhibit unsatisfactory performance due to a lack of domain-specific knowledge. In this report, we introduce PMC-LLaMA, an open-source language model that is acquired by fine-tuning an open-source language model on a total of 4.8 million biomedical academic papers for further injecting medical knowledge, enhancing its capability in the medical domain. Our preliminary evaluations are conducted on three biomedical QA datasets, including PubMedQA, MedMCQA, and USMLE, showing that our model after fine-tuning, i.e., PMC-LLaMA, demonstrates better understanding of biomedical domain-specific concepts, thus achieving high performance on QA benchmarks. The model and codes, along with an online demo, are publicly available.

Submitted to arXiv on 27 Apr. 2023

Results of the summarizing process for the arXiv paper: 2304.14454v1

Large Language Models (LLMs) have shown remarkable capabilities in natural language understanding across various domains. However, in areas that require precision, such as medical applications, these models often exhibit unsatisfactory performance due to a lack of domain-specific knowledge. In this report, the authors introduce PMC-LLaMA, an open-source language model that is fine-tuned on 4.8 million biomedical academic papers to inject medical knowledge and enhance its capability in the medical domain. The authors conduct a preliminary investigation by fine-tuning the existing LLaMA model with the aforementioned dataset and demonstrate that PMC-LLaMA is more suitable for medical tasks than LLaMA. The evaluation is conducted on three biomedical QA datasets: PubMedQA, MedMCQA, and USMLE. The results show that after fine-tuning, PMC-LLaMA demonstrates better understanding of biomedical domain-specific concepts and achieves high performance on QA benchmarks.

In terms of experiment details, the authors start from the S2ORC dataset of 81.1M English-language academic papers and filter it by PubMed Central (PMC) id, which leaves around 4.9M highly related papers totaling over 75B tokens. They fine-tune the LLaMA-7B model on these open-access PMC papers using the autoregressive generation objective introduced in GPT-2. The training details are: a maximum context length of 512, a batch size of 128, the AdamW optimizer with a learning rate of 2e-5, the Fully Sharded Data Parallel (FSDP) acceleration strategy, and the bf16 (Brain Floating Point) data format. The model is trained for five epochs on eight A100 GPUs in around seven days, and in each epoch 512 continuous tokens are randomly sampled from each paper for training.

The authors also provide a detailed description of the evaluation benchmark, which includes three QA datasets: PubMedQA, MedMCQA, and USMLE. PubMedQA contains questions on biomedical research; the model is provided with paper abstracts from PubMed and required to complete multiple-choice questions. MedMCQA is a dataset of multiple-choice questions sourced from mock exams and past exams of two Indian medical school entrance exams, AIIMS and NEET PG, with four choices per question.

Overall, PMC-LLaMA offers an open-source language model that enhances LLaMA's capability in the medical domain by injecting domain-specific knowledge through fine-tuning on biomedical academic papers. The model and codes are publicly available along with an online demo for further exploration.
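
As a rough sanity check on the training figures quoted in the summary above, the arithmetic below estimates how many tokens and optimizer steps the run would involve. It is a back-of-the-envelope sketch under stated assumptions, not a number reported by the authors: it assumes exactly one 512-token window per paper per epoch, a batch size counted in sequences rather than tokens, and the 4.8 million paper count from the abstract.

```python
# Figures quoted in the summary; the variable names are illustrative.
NUM_PAPERS = 4_800_000           # "4.8 million biomedical academic papers"
WINDOW_TOKENS = 512              # tokens sampled per paper per epoch
BATCH_SIZE = 128                 # assumed to count sequences, not tokens
EPOCHS = 5
CORPUS_TOKENS = 75_000_000_000   # "over 75B tokens", used as a lower bound

# Assumption: every paper contributes exactly one window in each epoch.
tokens_per_epoch = NUM_PAPERS * WINDOW_TOKENS
total_tokens = tokens_per_epoch * EPOCHS
steps_per_epoch = NUM_PAPERS // BATCH_SIZE
total_steps = steps_per_epoch * EPOCHS

# Because the corpus holds more than 75B tokens, this is an upper bound
# on the share of the corpus seen in a single epoch.
max_corpus_share_per_epoch = tokens_per_epoch / CORPUS_TOKENS

print(f"tokens per epoch:   {tokens_per_epoch:,}")
print(f"tokens in total:    {total_tokens:,}")
print(f"steps per epoch:    {steps_per_epoch:,}")
print(f"steps in total:     {total_steps:,}")
print(f"corpus share/epoch: at most {max_corpus_share_per_epoch:.1%}")
```

Under these assumptions the model sees roughly 2.5B tokens per epoch, about 12.3B over five epochs, which is a small fraction of the 75B-token corpus.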
Created on 11 May. 2023
