AI/ML workloads in data centers generate a distinct traffic pattern known as “elephant flows”: a small number of very large, long-lived flows, typically remote direct memory access (RDMA) traffic produced by the graphics processing units (GPUs) in AI servers. Keeping fabric bandwidth utilization efficient is especially challenging with such low-entropy workloads. Juniper’s Arun Gandhi, Mahesh Subramaniam, and Himanshu Tambakuwala discuss load balancing techniques for the AI data center fabric and weigh their pros and cons.
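To see why low-entropy elephant flows stress conventional per-flow load balancing, consider a minimal Python sketch of static hash-based ECMP (the flow tuples, hash choice, and link count here are illustrative assumptions, not Juniper's implementation): with only a handful of flows, the hash can easily map several elephants onto the same uplink while other links sit idle.

```python
import hashlib
from collections import Counter

def ecmp_link(five_tuple, num_links):
    # Static per-flow ECMP: hash the five-tuple, pick an uplink.
    # Every packet of a flow sticks to one link, so a few huge
    # RDMA flows can pile onto the same uplink.
    digest = hashlib.sha256(repr(five_tuple).encode()).hexdigest()
    return int(digest, 16) % num_links

# Hypothetical low-entropy AI workload: 8 long-lived RDMA flows
# (UDP destination port 4791 is the standard RoCEv2 port).
flows = [("10.0.0.1", f"10.0.1.{i}", 49152 + i, 4791, "UDP")
         for i in range(8)]

# Distribute the flows over 4 equal-cost uplinks.
links = Counter(ecmp_link(f, 4) for f in flows)
print(dict(links))  # with so few flows, the spread is often uneven
```

With thousands of short flows the hash averages out, but eight elephants over four links frequently land three or four flows on one link, which is the imbalance the episode's load balancing techniques aim to avoid.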
Managing the Elephant in the Room for AI Data Centers:
RDMA Over Converged Ethernet Version 2 for AI Data Centers:
https://www.juniper.net/us/en/the-feed/topics/ai-and-machine-learning/rdma-over-converged-ethernet-version-2-for-ai-data-centers.html
AI Data Center Networking:
https://www.juniper.net/us/en/solutions/data-center/ai-infrastructure.html