BERT2DNN: BERT Distillation with Massive Unlabeled Data for Online E-Commerce Search

AI-generated keywords: BERT Distillation, Relevance Prediction, Unlabeled Data, Embedding Analysis, Source Code

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Importance of relevance in user experience and business profit for e-commerce search platforms
  • Proposal of a data-driven framework for search relevance prediction using BERT distillation
  • Distillation process resulting in a student model that recovers more than 97% of the teacher models' test accuracy on new queries
  • Serving cost several orders of magnitude lower, with latency 150x lower than BERT-Base and 15x lower than TinyBERT
  • Introduction of techniques like temperature rescaling and teacher model stacking to enhance accuracy without increasing complexity
  • Evaluation on in-house e-commerce search relevance data and public dataset on sentiment analysis from GLUE benchmark
  • Embedding analysis and case study demonstrating the strength of the resulting model
  • Public availability of data processing and model training source code to reduce energy consumption and promote accessibility for small organizations

Authors: Yunjiang Jiang, Yue Shang, Ziyang Liu, Hongwei Shen, Yun Xiao, Wei Xiong, Sulong Xu, Weipeng Yan, Di Jin

10 pages, 7 figures, to appear in ICDM 2020

Abstract: Relevance has a significant impact on user experience and business profit for e-commerce search platforms. In this work, we propose a data-driven framework for search relevance prediction, by distilling knowledge from BERT and related multi-layer Transformer teacher models into simple feed-forward networks with a large amount of unlabeled data. The distillation process produces a student model that recovers more than 97% of the test accuracy of the teacher models on new queries, at a serving cost that is several orders of magnitude lower (latency 150x lower than BERT-Base and 15x lower than the most efficient BERT variant, TinyBERT). Applying temperature rescaling and teacher model stacking further boosts model accuracy without increasing the student model's complexity. We present experimental results on both in-house e-commerce search relevance data and a public data set on sentiment analysis from the GLUE benchmark. The latter takes advantage of another related public data set of much larger scale, while disregarding its potentially noisy labels. Embedding analysis and a case study on the in-house data further highlight the strength of the resulting model. By making the data processing and model training source code public, we hope the techniques presented here can help reduce the energy consumption of state-of-the-art Transformer models and also level the playing field for small organizations lacking access to cutting-edge machine learning hardware.
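The abstract mentions temperature rescaling without giving a formula. As a rough illustration only, the sketch below shows one common form of the idea for a binary relevance task: the teacher's logit is divided by a temperature before the sigmoid, and the student is trained with cross-entropy against the resulting soft label. The function names, the sigmoid output, and the exact loss are assumptions made for this example and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def soft_label(teacher_logit: float, temperature: float = 1.0) -> float:
    """Hypothetical helper: turn a teacher logit into a soft target.

    A temperature above 1 pulls the target toward 0.5, so the student
    sees a smoother signal than the teacher's raw probability.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return sigmoid(teacher_logit / temperature)


def soft_cross_entropy(student_logit: float, target: float,
                       eps: float = 1e-12) -> float:
    """Binary cross-entropy of the student's prediction against a soft target."""
    p = min(max(sigmoid(student_logit), eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


if __name__ == "__main__":
    # Illustrative values only: a confident teacher logit at two temperatures.
    for t in (1.0, 4.0):
        print(f"T={t}: soft label = {soft_label(4.0, t):.3f}")
```

With a soft target, the loss is smallest when the student's predicted probability matches the target itself, which is why rescaling the target changes what the student is asked to learn without changing the student's architecture.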

Submitted to arXiv on 20 Oct. 2020


Results of the summarizing process for the arXiv paper: 2010.10442v1


The paper titled "BERT2DNN: BERT Distillation with Massive Unlabeled Data for Online E-Commerce Search" addresses the importance of relevance to user experience and business profit for e-commerce search platforms. The authors propose a data-driven framework for search relevance prediction that distills knowledge from BERT and related multi-layer Transformer teacher models into simple feed-forward networks, using a large amount of unlabeled data. The resulting student model recovers more than 97% of the teacher models' test accuracy on new queries while greatly reducing serving cost (latency 150x lower than BERT-Base and 15x lower than TinyBERT). The authors also apply temperature rescaling and teacher model stacking to further improve accuracy without increasing the student model's complexity.

The experimental results cover both in-house e-commerce search relevance data and a public sentiment analysis dataset from the GLUE benchmark. For the latter, the authors draw on a related, much larger public dataset and disregard its potentially noisy labels. They also perform embedding analysis and present a case study on the in-house data to demonstrate the strength of the resulting model.

To help reduce the energy consumption of state-of-the-art Transformer models and level the playing field for small organizations lacking access to cutting-edge machine learning hardware, the authors make their data processing and model training source code publicly available. Overall, the paper shows how BERT distillation with massive unlabeled data can improve search relevance prediction, supported by experiments and analyses.
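The summary describes labeling unlabeled data with teacher models and combining several teachers. As a loose sketch of that workflow, and not the authors' implementation, the example below averages the relevance scores of several teachers over unlabeled query-item pairs to produce soft training targets for a student. Averaging is only one possible way to combine teachers; the type aliases, function names, and toy teachers are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A (query, item title) pair with no human label attached.
Pair = Tuple[str, str]
# Assumption: each teacher maps a pair to a relevance probability in [0, 1].
Teacher = Callable[[Pair], float]


def combine_teachers(teachers: Sequence[Teacher], pair: Pair) -> float:
    """Average the teachers' scores for one pair (a simple combination rule)."""
    if not teachers:
        raise ValueError("at least one teacher is required")
    return sum(teacher(pair) for teacher in teachers) / len(teachers)


def pseudo_label(teachers: Sequence[Teacher],
                 unlabeled_pairs: Sequence[Pair]) -> List[Tuple[Pair, float]]:
    """Attach a soft relevance target to every unlabeled pair."""
    return [(pair, combine_teachers(teachers, pair)) for pair in unlabeled_pairs]


def toy_overlap_teacher(pair: Pair) -> float:
    """Stand-in teacher: fraction of query tokens that appear in the title."""
    query_tokens = pair[0].lower().split()
    title_tokens = set(pair[1].lower().split())
    if not query_tokens:
        return 0.0
    return sum(tok in title_tokens for tok in query_tokens) / len(query_tokens)


def toy_constant_teacher(pair: Pair) -> float:
    """Stand-in teacher that is always mildly positive."""
    return 0.5


if __name__ == "__main__":
    pairs = [("red running shoes", "Red Running Shoes for Men"),
             ("red running shoes", "Stainless Steel Kettle")]
    for pair, target in pseudo_label([toy_overlap_teacher, toy_constant_teacher], pairs):
        print(pair, round(target, 3))
```

In a real setting the teachers would be fine-tuned Transformer models and the student a feed-forward network trained on the soft targets; the toy teachers here exist only to make the example runnable.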
Created on 24 Nov. 2023
