Pathways: Asynchronous Distributed Dataflow for ML
AI-generated Key Points
- Pathways is a large-scale orchestration layer for accelerators, designed to enable exploration of new systems and machine learning research ideas while retaining state-of-the-art performance for current models.
- The system represents computations as a sharded dataflow graph of asynchronous operators that consume and produce futures. It gang-schedules heterogeneous parallel computations on thousands of accelerators and coordinates data transfers over their dedicated interconnects.
- Pathways features an innovative asynchronous distributed dataflow design that allows the control plane to execute in parallel despite dependencies in the data plane, simplifying the expression of complex new parallelism patterns.
- Pathways reaches performance parity with state-of-the-art systems, achieving close to 100% accelerator utilization when running Single Program Multiple Data (SPMD) computations over 2048 Tensor Processing Units (TPUs).
- Pathways delivers throughput comparable to the SPMD case for Transformer models that are pipelined across 16 stages or sharded across two islands of accelerators connected over a data center network.
- Together, these results indicate that Pathways can keep accelerators highly utilized across SPMD, pipelined, and multi-island workloads, which supports its goal of enabling research on new ML systems and models.
Authors: Paul Barham, Aakanksha Chowdhery, Jeff Dean, Sanjay Ghemawat, Steven Hand, Dan Hurt, Michael Isard, Hyeontaek Lim, Ruoming Pang, Sudip Roy, Brennan Saeta, Parker Schuh, Ryan Sepassi, Laurent El Shafey, Chandramohan A. Thekkath, Yonghui Wu
Abstract: We present the design of a new large scale orchestration layer for accelerators. Our system, Pathways, is explicitly designed to enable exploration of new systems and ML research ideas, while retaining state of the art performance for current models. Pathways uses a sharded dataflow graph of asynchronous operators that consume and produce futures, and efficiently gang-schedules heterogeneous parallel computations on thousands of accelerators while coordinating data transfers over their dedicated interconnects. Pathways makes use of a novel asynchronous distributed dataflow design that lets the control plane execute in parallel despite dependencies in the data plane. This design, with careful engineering, allows Pathways to adopt a single-controller model that makes it easier to express complex new parallelism patterns. We demonstrate that Pathways can achieve performance parity (~100% accelerator utilization) with state-of-the-art systems when running SPMD computations over 2048 TPUs, while also delivering throughput comparable to the SPMD case for Transformer models that are pipelined across 16 stages, or sharded across two islands of accelerators connected over a data center network.
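The idea of operators that "consume and produce futures" can be illustrated with a small single-process sketch. This is not the Pathways API. The names `dispatch`, `run_op`, and `main` are hypothetical, and Python's `asyncio` stands in for a real distributed runtime. The sketch only shows that a single controller can enqueue every operator of a dependent chain up front (control plane), while each operator computes once its input futures resolve (data plane).

```python
import asyncio


async def run_op(name, fn, inputs, log):
    # Data plane (toy): wait for every input future, then compute.
    args = [await fut for fut in inputs]
    log.append(f"run {name}")
    return fn(*args)


async def main():
    log = []

    def dispatch(name, fn, *inputs):
        # Control plane (toy): enqueue the operator and return a future
        # immediately, without waiting for its inputs to be computed.
        log.append(f"dispatch {name}")
        return asyncio.create_task(run_op(name, fn, inputs, log))

    # A small dataflow graph: b depends on a, and c depends on a and b.
    a = dispatch("a", lambda: 2)
    b = dispatch("b", lambda x: x + 1, a)
    c = dispatch("c", lambda x, y: x * y, a, b)

    result = await c
    return result, log


if __name__ == "__main__":
    result, log = asyncio.run(main())
    print(result)
    print(log)
```

In this toy version all three operators are dispatched before any of them runs, even though `b` and `c` depend on earlier results. A real system would additionally have to handle sharding, gang scheduling, and device-to-device transfers, none of which is modeled here.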