An Overview of the Data-Loader Landscape: Comparative Performance Analysis

Authors: Iason Ofeidis, Diego Kiedanski, Leandros Tassiulas

17 pages, 28 figures
License: CC BY 4.0

Abstract: Dataloaders, in charge of moving data from storage into GPUs while training machine learning models, might hold the key to drastically improving the performance of training jobs. Recent advances have shown promise not only by considerably decreasing training time but also by offering new features such as loading data from remote storage like S3. In this paper, we are the first to distinguish the dataloader as a separate component in the Deep Learning (DL) workflow and to outline its structure and features. Finally, we offer a comprehensive comparison of the different dataloading libraries available, their trade-offs in terms of functionality, usability, and performance, and the insights derived from them.

Submitted to arXiv on 27 Sep. 2022
