Multilingual E5 Text Embeddings: A Technical Report

Authors: Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei

6 pages

Abstract: This technical report presents the training methodology and evaluation results of the open-source multilingual E5 text embedding models, released in mid-2023. Three embedding models of different sizes (small / base / large) are provided, offering a balance between inference efficiency and embedding quality. The training procedure follows the English E5 model recipe: contrastive pre-training on 1 billion multilingual text pairs, followed by fine-tuning on a combination of labeled datasets. Additionally, we introduce a new instruction-tuned embedding model, whose performance is on par with state-of-the-art English-only models of similar sizes. Information regarding the model release can be found at https://github.com/microsoft/unilm/tree/master/e5 .

Submitted to arXiv on 08 Feb. 2024
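
The abstract names contrastive pre-training on text pairs but does not spell out the objective. As a rough illustration only, the sketch below computes an InfoNCE-style loss with in-batch negatives, a common choice for this kind of training. The function names, the toy vectors, and the temperature value of 0.05 are assumptions made for this example and are not taken from the report.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce_loss(queries, passages, temperature=0.05):
    """Mean InfoNCE loss over a batch.

    queries[i] is paired with passages[i]; every other passage in the
    batch acts as a negative for query i. The temperature is an
    illustrative assumption, not a value from the report.
    """
    q = [l2_normalize(v) for v in queries]
    p = [l2_normalize(v) for v in passages]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, pj) / temperature for pj in p]
        # Numerically stable log-sum-exp over all passages in the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of the matching passage.
        total += log_sum_exp - logits[i]
    return total / len(q)


# Toy 2-dimensional "embeddings" (hypothetical data).
toy_queries = [[1.0, 0.0], [0.0, 1.0]]
toy_passages = [[0.9, 0.1], [0.1, 0.9]]
shuffled_passages = [toy_passages[1], toy_passages[0]]

loss_aligned = info_nce_loss(toy_queries, toy_passages)
loss_shuffled = info_nce_loss(toy_queries, shuffled_passages)
```

With correctly paired inputs the loss is close to zero, and it grows when the pairs are shuffled. That gradient signal is what pulls matching texts together during training.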

arXiv identifier: 2402.05672v1
