Orca: Progressive Learning from Complex Explanation Traces of GPT-4
AI-generated Key Points
- Recent research has focused on improving smaller language models through imitation learning using outputs from large foundation models (LFMs).
- Challenges include limited imitation signals from shallow LFM outputs, small-scale homogeneous training data, and a lack of rigorous evaluation, which leads to overestimating small models that imitate the style but not the reasoning of LFMs.
- Orca is a 13-billion parameter model that learns to imitate the reasoning process of LFMs using rich signals from GPT-4 and teacher assistance from ChatGPT.
- Orca surpasses conventional state-of-the-art instruction-tuned models such as Vicuna-13B by more than 100% on complex zero-shot reasoning benchmarks like Big-Bench Hard (BBH) and by 42% on AGIEval.
- Orca reaches parity with ChatGPT on the BBH benchmark and is competitive (a 4-point gap with an optimized system message) on professional and academic examinations such as the SAT, LSAT, GRE, and GMAT, in zero-shot settings without chain-of-thought (CoT) prompting, while still trailing GPT-4.
- Learning from step-by-step explanations is a promising direction to improve model capabilities regardless of their size.
- Data size and coverage are crucial for aligning smaller models to their more powerful counterparts like GPT-4.
- Explanation Tuning is an effective method for aligning smaller models to GPT-4.
- Further development is needed on robust evaluation methods, alignment and post-training techniques, and more effective use of powerful teachers like GPT-4.
Authors: Subhabrata Mukherjee, Arindam Mitra, Ganesh Jawahar, Sahaj Agarwal, Hamid Palangi, Ahmed Awadallah
Abstract: Recent research has focused on enhancing the capability of smaller models through imitation learning, drawing on the outputs generated by large foundation models (LFMs). A number of issues impact the quality of these models, ranging from limited imitation signals from shallow LFM outputs; small scale homogeneous training data; and most notably a lack of rigorous evaluation resulting in overestimating the small model's capability as they tend to learn to imitate the style, but not the reasoning process of LFMs. To address these challenges, we develop Orca, a 13-billion parameter model that learns to imitate the reasoning process of LFMs. (We are working with our legal team to publicly release a diff of the model weights in accordance with LLaMA's release policy to be published at https://aka.ms/orca-lm.) Orca learns from rich signals from GPT-4 including explanation traces; step-by-step thought processes; and other complex instructions, guided by teacher assistance from ChatGPT. To promote this progressive learning, we tap into large-scale and diverse imitation data with judicious sampling and selection. Orca surpasses conventional state-of-the-art instruction-tuned models such as Vicuna-13B by more than 100% in complex zero-shot reasoning benchmarks like Big-Bench Hard (BBH) and 42% on AGIEval. Moreover, Orca reaches parity with ChatGPT on the BBH benchmark and shows competitive performance (4 pts gap with optimized system message) in professional and academic examinations like the SAT, LSAT, GRE, and GMAT, both in zero-shot settings without CoT; while trailing behind GPT-4. Our research indicates that learning from step-by-step explanations, whether these are generated by humans or more advanced AI models, is a promising direction to improve model capabilities and skills.
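The abstract describes progressive learning from explanation traces, with ChatGPT acting as a teacher assistant before GPT-4. A minimal sketch of how such a curriculum could be organized follows. The `ExplanationExample` record, the `progressive_curriculum` and `to_training_text` helpers, the teacher labels, and the prompt template are all hypothetical names chosen for illustration; they are not taken from the paper's code or data.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Sequence


@dataclass
class ExplanationExample:
    """One imitation example (hypothetical layout, assumed for illustration)."""
    system_message: str    # instruction asking the teacher to explain its reasoning
    user_query: str        # the task or question posed to the teacher
    teacher_response: str  # the teacher's step-by-step explanation and answer
    teacher: str           # label of the teacher that produced the response


def progressive_curriculum(
    examples: Iterable[ExplanationExample],
    teacher_order: Sequence[str] = ("chatgpt", "gpt-4"),
) -> Iterator[ExplanationExample]:
    """Yield examples stage by stage: intermediate teacher first, strongest last.

    Within a stage, the original order of the examples is preserved.
    """
    examples = list(examples)
    for teacher in teacher_order:
        for example in examples:
            if example.teacher == teacher:
                yield example


def to_training_text(example: ExplanationExample) -> str:
    """Flatten an example into one training string (template is an assumption)."""
    return (
        f"### System:\n{example.system_message}\n"
        f"### User:\n{example.user_query}\n"
        f"### Response:\n{example.teacher_response}"
    )


if __name__ == "__main__":
    data = [
        ExplanationExample(
            "Explain your reasoning step by step.",
            "Is 91 a prime number?",
            "91 = 7 * 13, so it has divisors other than 1 and itself. No.",
            "gpt-4",
        ),
        ExplanationExample(
            "Explain your reasoning step by step.",
            "What is 12 + 30?",
            "Add the tens: 10 + 30 = 40. Add the ones: 40 + 2 = 42.",
            "chatgpt",
        ),
    ]
    for example in progressive_curriculum(data):
        print(example.teacher, "|", example.user_query)
```

The student would be fine-tuned on the strings from `to_training_text` in the order the curriculum yields them, so that responses from the intermediate teacher are seen before those from the stronger one. How the actual data was sampled and selected is not specified in this summary.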