InfiniBand: Powering High-Performance Data Centers

Driven by the booming development of cloud computing and big data, InfiniBand has become a key technology and plays a vital role at the core of the data center. But what exactly is InfiniBand technology? What attributes contribute to its widespread adoption? The following guide will answer your questions.

What is InfiniBand?

InfiniBand is an open industrial standard that defines a high-speed network for interconnecting servers, storage devices, and more. It leverages point-to-point bidirectional links to enable seamless communication between processors located on different servers. It is compatible with various operating systems such as Linux, Windows, and ESXi.

InfiniBand Network Fabric

InfiniBand, built on a channel-based fabric, comprises key components like HCA (Host Channel Adapter), TCA (Target Channel Adapter), InfiniBand links (connecting channels, ranging from cables to fibers, and even on-board links), and InfiniBand switches and routers (integral for networking). Channel adapters, particularly HCA and TCA, are pivotal in forming InfiniBand channels, ensuring security and adherence to Quality of Service (QoS) levels for transmissions.

InfiniBand vs Ethernet

InfiniBand was developed to address data transmission bottlenecks in high-performance computing clusters. The primary differences with Ethernet lie in bandwidth, latency, network reliability, and more.

High Bandwidth and Low Latency

InfiniBand provides higher bandwidth and lower latency, meeting the performance demands of large-scale data transfer and real-time communication applications.

RDMA Support

InfiniBand supports Remote Direct Memory Access (RDMA), enabling direct data transfer between node memories. This reduces CPU overhead and improves transfer efficiency.

Scalability

InfiniBand Fabric allows for easy scalability by connecting a large number of nodes and supporting high-density server layouts. Additional InfiniBand switches and cables can expand network scale and bandwidth capacity.

High Reliability

InfiniBand Fabric incorporates redundant designs and fault isolation mechanisms, enhancing network availability and fault tolerance. Alternate paths maintain network connectivity in case of node or connection failures.

Conclusion

The InfiniBand network has undergone rapid iterations, progressing from SDR 10Gbps, DDR 20Gbps, QDR 40Gbps, FDR56Gbps, EDR 100Gbps, and now to HDR 200Gbps and NDR 400Gbps/800Gbps InfiniBand. For those considering the implementation of InfiniBand products in their high-performance data centers, further details are available from FS.com.

Mastering the Basics of GPU Computing

It’s known that training large models is done on clusters of machines with preferably many GPUs per server. This article will introduce the professional terminology and common network architecture of GPU computing.

Exploring Key Components in GPU Computing

PCIe Switch Chip

In the domain of high-performance GPU computing, vital elements such as CPUs, memory modules, NVMe storage, GPUs, and network cards establish fluid connections via the PCIe (Peripheral Component Interconnect Express) bus or specialized PCIe switch chips.

NVLink

NVLink is a wire-based serial multi-lane near-range communications link developed by Nvidia. Unlike PCI Express, a device can consist of muıltiple NVLinks, and devices use mesh networking to communicate instead of a central hub. The protocol was first announced in March 2014 and uses proprietary high-speed signaling interconnect (NVHS).

The technology supports full mesh interconnection between GPUs on the same node. And the development from NVLink 1.0, NVLink 2.0, NVLink 3.0 to NVLink 4.0 has significantly enhanced the two-way bandwidth and improved the performance of GPU computing applications.

NVSwitch

NVSwitch is a switching chip developed by NVIDIA, designed specifically for high-performance computing and artificial intelligence applications. Its primary function is to provide high-speed, low-latency communication between multiple GPUs within the same host.

NVLink Switch

Unlike the NVSwitch, which is integrated into GPU modules within a single host, the NVLink Switch serves as a standalone switch specifically engineered for linking GPUs in a distributed computing environment.

HBM

Several GPU manufacturers have taken innovative ways to address the speed bottleneck by stacking multiple DDR chips to form so-called high-bandwidth memory (HBM) and integrating them with the GPU. This design removes the need for each GPU to traverse the PCIe switch chip when engaging its dedicated memory. As a result, this strategy significantly increases data transfer speeds, potentially achieving significant orders of magnitude improvements.

Bandwidth Unit

In large-scale GPU computing training, performance is directly tied to data transfer speeds, involving pathways such as PCIe, memory, NVLink, HBM, and network bandwidth. Different bandwidth units are used to measure these data rates.

Storage Network Card

The storage network card in GPU architecture connects to the CPU via PCIe, enabling communication with distributed storage systems. It plays a crucial role in efficient data reading and writing for deep learning model training. Additionally, the storage network card handles node management tasks, including SSH (Secure Shell) remote login, system performance monitoring, and collecting related data. These tasks help monitor and maintain the running status of the GPU cluster.

For the above in-depth exploration of various professional terms, you can refer to this article Unveiling the Foundations of GPU Computing-1 from FS community.

High-Performance GPU Fabric

NVSwitch Fabric

In a full mesh network topology, each node is connected directly to all the other nodes. Usually, 8 GPUs are connected in a full-mesh configuration through six NVSwitch chips, also referred to as NVSwitch fabric.

This fabric optimizes data transfer with a bidirectional bandwidth, providing efficient communication between GPUs and supporting parallel computing tasks. The bandwidth per line depends on the NVLink technology utilized, such as NVLink3, enhancing the overall performance in large-scale GPU clusters.

IDC GPU Fabric

The fabric mainly includes computing network and storage network. The computing network is mainly used to connect GPU nodes and support the collaboration of parallel computing tasks. This involves transferring data between multiple GPUs, sharing calculation results, and coordinating the execution of massively parallel computing tasks. The storage network mainly connects GPU nodes and storage systems to support large-scale data read and write operations. This includes loading data from the storage system into GPU memory and writing calculation results back to the storage system.

Want to know more about CPU fabric? Please check this article Unveiling the Foundations of GPU Computing-2 from FS community.