Our team, consisting of Kunlong Chen, Junjun Wang, Zhaoqun Chen, Kunjin Chen, and Yitian Chen, secured 3rd place in the KDD CUP 2024 paper source tracing competition. The challenge was to identify the reference sources (ref-sources) of academic papers. While many teams fine-tuned pre-trained neural language models such as BERT or ChatGLM, we took a different route and used closed-source large language models (LLMs), which have shown strong capabilities on complex reasoning tasks in zero-shot and few-shot settings. Rather than training models on GPUs, we used closed-source LLMs to generate predicted reference sources directly from the provided papers, then refined those predictions with ensemble learning. Among the award-winning approaches, ours was the only one that did not require GPUs for model training. The work is described in the paper "LLM-Powered Ensemble Learning for Paper Source Tracing: A GPU-Free Approach," by the same five authors, and the code is available at https://github.com/Cklwanfifa/KDDCUP2024-PST. Our result suggests that closed-source LLMs combined with ensemble learning are a practical option for academic paper source tracing.
- Team consisting of Kunlong Chen, Junjun Wang, Zhaoqun Chen, Kunjin Chen, and Yitian Chen secured 3rd place in the KDD CUP 2024 paper source tracing competition
- Approach uses closed-source large language models (LLMs) to identify the reference sources of academic papers
- Methodology does not depend on GPUs for model training; LLMs generate predicted reference sources directly from papers
- Ensemble learning techniques improve prediction accuracy
- Research detailed in the paper titled "LLM-Powered Ensemble Learning for Paper Source Tracing: A GPU-Free Approach"
- Code available at https://github.com/Cklwanfifa/KDDCUP2024-PST
Summary
A group of five people got 3rd place in a competition about identifying the reference sources of academic papers. They used big language models to figure out those sources without needing special computer chips to train models of their own. They combined several predictions to make the final guesses more accurate. Their work is explained in a paper called "LLM-Powered Ensemble Learning for Paper Source Tracing: A GPU-Free Approach." You can find their computer code online.
Definitions
- Team: A group of people working together.
- Academic papers: Documents written by scholars and researchers on specific topics.
- Large language models (LLMs): Computer programs trained on large amounts of text that can understand and generate human language.
- GPUs: Graphics Processing Units, specialized computer chips used for fast parallel processing, such as training neural networks.
- Ensemble learning: A technique where multiple models are combined to improve accuracy.
- Code: Instructions written for computers to perform specific tasks.
Introduction
The KDD CUP 2024 paper source tracing competition challenged participants to identify the reference sources (ref-sources) of academic papers, a task that is difficult because of the complex reasoning involved. Our team, consisting of Kunlong Chen, Junjun Wang, Zhaoqun Chen, Kunjin Chen, and Yitian Chen, took part with an approach based on closed-source LLMs and ensemble learning and secured 3rd place. In this blog article, we walk through our methodology and the aspects that set it apart from other winning strategies.
The Challenge
The task in the KDD CUP 2024 paper source tracing competition was to accurately identify the reference sources of academic papers. This is challenging because it requires understanding not only the content of a paper but also its references and how they relate to it. Many teams fine-tuned pre-trained neural language models such as BERT or ChatGLM for this task. Our team took a different route and used closed-source large language models (LLMs).
Closed-Source LLMs: A Game-Changer
Closed-source LLMs have shown remarkable capabilities in handling complex reasoning tasks in zero-shot or few-shot scenarios. These models are trained on massive amounts of data and can perform well on tasks they were not specifically fine-tuned for. This makes them a good fit for tasks like paper source tracing, where limited annotated data is available.
One notable aspect of our methodology is that it does not depend on GPUs for model training. Approaches based on fine-tuning rely heavily on GPUs, which can be expensive and time-consuming. Instead, we used closed-source LLMs to directly generate predicted reference sources from the provided papers.
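To make the idea of generating predictions directly from a paper more concrete, here is a minimal zero-shot sketch. It is not our exact pipeline: the prompt wording, the function names, and the `call_llm` stand-in (which returns a canned reply instead of calling a real closed-source API) are all assumptions made for illustration.

```python
import json
import re


def build_prompt(title, abstract, references):
    """Build a zero-shot prompt asking an LLM to pick a paper's ref-sources.

    The wording is illustrative only, not the prompt used in the competition.
    """
    numbered = "\n".join(f"[{i}] {ref}" for i, ref in enumerate(references, start=1))
    return (
        "You are given a paper and its numbered reference list.\n"
        "Return a JSON list with the numbers of the references that are "
        "the reference sources (ref-sources) of the paper.\n\n"
        f"Title: {title}\n"
        f"Abstract: {abstract}\n"
        f"References:\n{numbered}\n"
        "Answer:"
    )


def parse_response(response_text, num_references):
    """Extract valid reference numbers from the model's free-text reply."""
    match = re.search(r"\[[^\[\]]*\]", response_text)
    if match is None:
        return []
    try:
        candidates = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    # Keep in-range integers only, de-duplicated, in the order returned.
    valid = []
    for c in candidates:
        is_int = isinstance(c, int) and not isinstance(c, bool)
        if is_int and 1 <= c <= num_references and c not in valid:
            valid.append(c)
    return valid


def call_llm(prompt):
    """Hypothetical stand-in for a closed-source LLM API call."""
    return "The most likely ref-sources are [2, 3, 7]."


references = ["Paper A", "Paper B", "Paper C"]
prompt = build_prompt("An example title", "An example abstract.", references)
predicted = parse_response(call_llm(prompt), len(references))
print(predicted)
```

The parser drops anything that is not a valid reference number (here, the out-of-range `7`), since free-text replies from an LLM cannot be assumed to be well-formed.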
Ensemble Learning: Refining Predictions
To improve the accuracy of our predictions, we used ensemble learning, which combines the outputs of multiple models into a final prediction. This approach often performs better than any single model on its own.
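The simplest form of this idea is to average the scores that several models assign to each candidate reference. The sketch below assumes each model returns a dictionary from reference id to a score in [0, 1]; the data layout and the function name are illustrative assumptions, not taken from our code.

```python
from statistics import mean


def average_ensemble(predictions):
    """Average per-reference scores across models.

    predictions: list of dicts, one per model, mapping reference id -> score.
    A reference missing from a model's output counts as 0.0 for that model.
    """
    all_refs = set()
    for model_scores in predictions:
        all_refs.update(model_scores)
    return {
        ref: mean(model_scores.get(ref, 0.0) for model_scores in predictions)
        for ref in all_refs
    }


# Scores from three hypothetical models for two candidate references.
scores = average_ensemble([
    {"r1": 1.0, "r2": 0.0},
    {"r1": 0.5},
    {"r1": 0.0, "r2": 1.0},
])
print(sorted(scores.items()))
```

Averaging treats every model as equally reliable, which is a reasonable baseline before moving to a learned combination.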
In our case, we collected predictions from multiple closed-source LLMs, which we queried rather than trained, and combined their outputs using an ensemble method called stacking. This helped us refine our predictions and achieve higher accuracy than a single model.
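As a rough illustration of stacking, the sketch below treats the scores from two hypothetical LLMs as input features and fits a small logistic-regression meta-learner on labeled examples. The choice of logistic regression, the toy data, and all names are assumptions made for this example; this article does not specify the actual features or meta-learner. The meta-learner is tiny and trains on a CPU.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stacker(features, labels, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-learner with batch gradient descent.

    features: list of rows, each holding one score per base model.
    labels:   1 if the reference is a true ref-source, else 0.
    """
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def stacked_score(weights, bias, x):
    """Combine the base models' scores into one probability."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Toy training set: [score from model A, score from model B] per candidate.
# In this made-up data, model A tracks the label and model B does not.
X = [[0.9, 0.4], [0.8, 0.6], [0.7, 0.5], [0.2, 0.6], [0.1, 0.4], [0.3, 0.5]]
y = [1, 1, 1, 0, 0, 0]

weights, bias = fit_stacker(X, y)
print(stacked_score(weights, bias, [0.85, 0.5]))
print(stacked_score(weights, bias, [0.15, 0.5]))
```

Unlike plain averaging, the meta-learner learns from labeled data how much to trust each base model, so an uninformative model receives little weight.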
Our Winning Methodology
Our approach stood out among the award-winning strategies as the only one that did not require GPU usage during model training. This was made possible by leveraging closed-source LLMs and implementing ensemble learning techniques.
We detailed our methodology in the paper titled "LLM-Powered Ensemble Learning for Paper Source Tracing: A GPU-Free Approach," authored by Kunlong Chen, Junjun Wang, Zhaoqun Chen, Kunjin Chen, and Yitian Chen. The paper showcases how advancements in LLM technology can revolutionize source tracing tasks and highlights the effectiveness of ensemble learning in refining predictions.
Exploring Our Methodology Further
For those interested in exploring our methodology further, we have made our code available on GitHub at https://github.com/Cklwanfifa/KDDCUP2024-PST. We believe that sharing our code will not only help others understand our approach but also encourage collaboration and further research in this area.
Conclusion
In conclusion, securing 3rd place in the KDD CUP 2024 paper source tracing competition was a proud moment for our team. Our success underscores the potential of closed-source LLMs and novel approaches like ensemble learning in addressing challenging tasks within the realm of academic paper source tracing. We hope that this blog article has provided insights into our winning methodology and sparked interest among readers to explore this topic further.