Online

Session 2: Overview of Underlying Networking Protocols for AI Deployments

Why AI workloads break traditional networks—and how modern protocols fix it

From Ethernet Limits to Lossless High-Performance Fabrics

1. Introduction: Your GPUs are only as fast as your network
Growth of distributed training
East-west traffic explosion
GPU starvation problem

2. Anatomy of an AI Training Job – Network perspective and requirements
Data parallel training
Gradient synchronization
Collective Communication patterns
Reduce
AllReduce (main)
ReduceScatter
AllGather
Broadcast
Point-to-Point Communication pattern
Send / Recv
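The collective patterns listed above are easiest to see in code. Below is a minimal pure-Python simulation of ring AllReduce — a ReduceScatter phase followed by an AllGather phase — assuming in-order, lossless delivery between ring neighbors. It illustrates the data movement only, not NCCL's actual implementation.

```python
# Minimal simulation of ring AllReduce over n "ranks".
# After the call, every rank holds the elementwise sum of all gradients.
# Illustrative sketch only; real systems overlap these steps with compute.

def ring_allreduce(grads):
    """grads: list of per-rank gradient lists, all the same length."""
    n = len(grads)
    data = [list(g) for g in grads]          # each rank's working buffer
    size = len(data[0])
    # Split each buffer into n contiguous chunks.
    bounds = [(i * size // n, (i + 1) * size // n) for i in range(n)]

    # Phase 1: ReduceScatter. After n-1 steps, rank r owns the fully
    # reduced chunk (r+1) % n.
    for step in range(n - 1):
        # Snapshot outgoing chunks so all "sends" happen simultaneously.
        out = []
        for r in range(n):
            c = (r - step) % n               # chunk rank r forwards now
            lo, hi = bounds[c]
            out.append((c, data[r][lo:hi]))
        for r in range(n):
            c, payload = out[r]
            dst = (r + 1) % n                # ring neighbor
            lo, hi = bounds[c]
            for i, v in enumerate(payload):
                data[dst][lo + i] += v       # reduce on arrival

    # Phase 2: AllGather. Circulate the reduced chunks so every rank
    # ends up with the complete summed vector.
    for step in range(n - 1):
        out = []
        for r in range(n):
            c = (r + 1 - step) % n
            lo, hi = bounds[c]
            out.append((c, data[r][lo:hi]))
        for r in range(n):
            c, payload = out[r]
            dst = (r + 1) % n
            lo, hi = bounds[c]
            data[dst][lo:hi] = payload       # overwrite, no reduction
    return data
```

Note that each byte crosses the wire roughly 2(n-1)/n times per rank, which is why AllReduce dominates east-west traffic during gradient synchronization.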

3. RDMA Fundamentals – Why the CPU becomes the bottleneck without RDMA
Zero-copy → memory efficiency
Kernel bypass → latency
Queue pairs → parallelism
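The queue-pair model can be sketched with a toy simulation: work requests are posted asynchronously and completions are polled later, so the CPU never touches payload bytes in flight. The class and method names below are invented for illustration; real code would go through libibverbs, not Python.

```python
from collections import deque

# Toy model of RDMA verbs-style semantics: post work requests to a send
# queue, let the "NIC" make progress independently, then poll a completion
# queue. Illustrative only; not the actual verbs API.

class QueuePair:
    def __init__(self, remote_memory):
        self.send_queue = deque()        # posted, not-yet-done work requests
        self.completion_queue = deque()  # finished work-request ids
        self.remote_memory = remote_memory  # stands in for a registered MR

    def post_rdma_write(self, wr_id, offset, payload):
        # Posting returns immediately: kernel bypass, CPU keeps working.
        self.send_queue.append((wr_id, offset, payload))

    def nic_progress(self):
        # Models the NIC draining the send queue: the payload lands
        # directly in remote memory (zero-copy, no kernel involved).
        while self.send_queue:
            wr_id, offset, payload = self.send_queue.popleft()
            self.remote_memory[offset:offset + len(payload)] = payload
            self.completion_queue.append(wr_id)

    def poll_cq(self):
        # Completion polling is how software learns a transfer finished.
        return list(self.completion_queue)
```

Multiple queue pairs can progress independently, which is where the parallelism in the outline above comes from.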

4. RDMA over Fabrics
InfiniBand
Native RDMA – Designed for HPC
Ethernet trying to behave like InfiniBand
RoCEv1 / RoCEv2
Layer 2 vs Layer 3 implications

5. The Hard Problem: Lossless Ethernet
Problems: why it breaks
Incast problem
Many-to-many patterns
Microbursts
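The incast problem above comes down to arithmetic on a shared egress buffer: when many senders burst into one port at once, arrivals exceed what the port can drain plus buffer, and the excess is dropped. The numbers below are made up for illustration.

```python
# Toy incast model: n senders burst simultaneously toward one switch
# egress port with a fixed buffer. Anything beyond what the port drains
# during the burst plus what the buffer absorbs is dropped.
# All sizes in KB; values are illustrative, not from any real switch.

def incast_drops(n_senders, burst_kb, buffer_kb, drained_kb):
    arriving = n_senders * burst_kb
    overflow = arriving - drained_kb - buffer_kb
    return max(0, overflow)
```

For example, 32 senders bursting 64 KB each (2048 KB total) into a port that drains 256 KB during the burst and buffers 1024 KB drops 768 KB — and on a lossy fabric every dropped packet stalls a collective.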

Failure modes
Head-of-line blocking
Congestion spreading
Deadlocks

Solutions
PFC (Priority Flow Control) → prevents drops
ECN (Explicit Congestion Notification) → signals congestion
DCQCN (Data Center Quantized Congestion Notification) → controls rate
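The DCQCN control loop can be sketched as a per-flow rate controller: cut the rate multiplicatively when a congestion notification (a CNP, triggered by ECN marks) arrives, and recover toward the pre-cut rate otherwise. This is a simplification — real DCQCN also maintains an alpha estimate and timer- and byte-counter-driven increase stages — and the constants are illustrative.

```python
# Simplified DCQCN-style sender rate control for one flow.
# Rates in Gbps; LINE_RATE and alpha are assumed example values.

LINE_RATE = 100.0  # assumed link speed

def dcqcn_step(rate, target, cnp_received, alpha=0.5):
    """One control-loop iteration; returns (new_rate, new_target)."""
    if cnp_received:
        target = rate                  # remember the rate before cutting
        rate = rate * (1 - alpha / 2)  # multiplicative decrease
    else:
        rate = (rate + target) / 2     # fast recovery toward target
    return min(rate, LINE_RATE), target
```

Starting at line rate, one CNP cuts the flow to 75 Gbps; three quiet iterations then recover it through 87.5, 93.75, and 96.875 Gbps — congestion is signaled (ECN) and acted on (DCQCN) before PFC pauses, which stay as the drop-preventing backstop.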

6. NVIDIA AI Networking Stack – Tying everything together
Inside the node
NVLink / NVSwitch → avoid the network whenever possible

Across nodes
NCCL (NVIDIA Collective Communications Library) → optimizes collectives
NVIDIA Spectrum-X – ASIC-level architecture → optimized Ethernet for AI
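One way to see why the fabric matters as much as the library: a ring AllReduce sends each byte about 2(n-1)/n times per rank, so the bandwidth the network must sustain ("bus bandwidth") exceeds what the application observes ("algorithm bandwidth"). The function below mirrors the algbw/busbw convention used by nccl-tests; the sample numbers are illustrative.

```python
# Algorithm vs. bus bandwidth for a ring AllReduce, in bytes/second.
# algbw = what the training job experiences; busbw = what the fabric
# (and thus the switch ASIC) must actually carry per rank.

def allreduce_bandwidths(message_bytes, elapsed_s, n_ranks):
    algbw = message_bytes / elapsed_s                # app-visible rate
    busbw = algbw * 2 * (n_ranks - 1) / n_ranks      # on-the-wire rate
    return algbw, busbw
```

For 8 ranks the wire carries 1.75x the app-visible rate, which is why lossless, congestion-managed Ethernet (or InfiniBand) is a prerequisite rather than an optimization.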

7. Closing: Business Impact
In AI infrastructure, network misconfiguration is not a minor issue; it translates directly into lost GPU ROI
Poor congestion control → GPU idle time
Packet loss → training retries
Latency spikes → longer training cycles

Registration

📅 May 7th 2026
10:00 AM – 11:30 AM CST
📍 Virtual Room

Speaker


Agustin Ciciliani

Engineer at BVS One
