Live Session
Chamber of Commerce
Poster
Thursday, 17 Oct, 8:00 am – 5:10 pm CEST (Europe/Rome)
Thursday Posters
Thursday Posters is taking place on the RecSys Hub: https://recsyshub.org
Research

Embedding Optimization for Training Large-scale Deep Learning Recommendation Systems with EMBark


Shijie Liu (NVIDIA), Nan Zheng (NVIDIA), Hui Kang (NVIDIA), Xavier Simmons (NVIDIA), Junjie Zhang (NVIDIA), Matthias Langer (NVIDIA), Wenjing Zhu (NVIDIA), Minseok Lee (NVIDIA) and Zehuan Wang (NVIDIA)

Abstract

Training large-scale deep learning recommendation models (DLRMs) with embedding tables stretching across multiple GPUs in a cluster presents a unique challenge, demanding the efficient scaling of embedding operations that require substantial memory and network bandwidth within a hierarchical network of GPUs. To tackle this bottleneck, we introduce EMBark, a comprehensive solution aimed at enhancing embedding performance and overall DLRM training throughput at scale. EMBark empowers users to create and customize sharding strategies, and features a highly automated sharding planner to accelerate diverse model architectures on different cluster configurations. EMBark groups embedding tables according to their preferred communication compression method to reduce communication overheads effectively. It combines efficient data-parallel category distribution with topology-aware hierarchical communication and pipelining support to maximize DLRM training throughput. Across four representative DLRM variants (DLRM-DCNv2, T180, T200, and T510), EMBark achieves an average end-to-end training throughput speedup of 1.5x, and up to 1.77x, over traditional table-row-wise sharding approaches.
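The abstract describes grouping embedding tables by their preferred communication compression method and then planning shard placement automatically. As a rough intuition for that idea, here is a minimal toy planner, a sketch under stated assumptions: the `EmbeddingTable` type, the `plan_shards` function, and the greedy least-loaded placement heuristic are all illustrative inventions, not EMBark's actual API or algorithm.

```python
from dataclasses import dataclass

@dataclass
class EmbeddingTable:
    # Hypothetical representation of an embedding table for illustration only.
    name: str
    num_rows: int          # proxy for memory/bandwidth cost
    compression: str       # preferred communication compression, e.g. "fp16" or "none"

def plan_shards(tables, num_gpus):
    """Toy sharding planner: group tables by preferred compression method
    (so each group can use one collective with a uniform compression scheme),
    then greedily place each table on the least-loaded GPU."""
    groups = {}
    for t in tables:
        groups.setdefault(t.compression, []).append(t)

    load = [0] * num_gpus          # rows currently assigned to each GPU
    placement = {}                  # table name -> (gpu index, compression group)
    for comp in sorted(groups):
        # Place larger tables first so the greedy balance works better.
        for t in sorted(groups[comp], key=lambda t: -t.num_rows):
            gpu = min(range(num_gpus), key=lambda g: load[g])
            placement[t.name] = (gpu, comp)
            load[gpu] += t.num_rows
    return placement, load
```

A real planner would additionally model the GPU/node network topology and pipeline stages, as the abstract notes; this sketch only conveys the grouping-plus-balancing structure of the problem.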

Join the Conversation

Head to Slido and select the paper's assigned session to join the live discussion.
