HPC and Future Networks: Architectures, Technologies, and Innovations

High-Performance Computing (HPC) has become an indispensable tool for solving complex problems, pushing the boundaries of scientific research, and driving innovation in fields such as meteorology, finance, and healthcare. Operating HPC systems efficiently, however, requires specialized infrastructure and support.

Understanding how data centers support HPC starts with the three fundamental components that constitute every high-performance computing system: compute, storage, and networking.

Facilities in High-Performance Computing

Intensive computations in HPC environments generate substantial heat, necessitating advanced cooling solutions. Efficient cooling prevents overheating, ensuring system stability and prolonging hardware lifespan. To support HPC, data centers employ cutting-edge cooling facilities, including liquid cooling systems and precision air conditioning. Data center architects also explore innovative cooling technologies such as immersion cooling, which submerges servers in dielectric liquids for effective heat dissipation.

Success in HPC data centers relies on a range of specialized equipment tailored to meet the unique demands of high-performance computing. Key components include data center switches, server network cards, high-speed optical modules, DAC and AOC cables, and power supplies.

The Growing Demand for Network Infrastructure in High-Performance Computing

With revolutionary technologies like 5G, big data, and the Internet of Things (IoT) permeating various aspects of society, the trajectory towards an intelligent, digitized society over the next two to three decades is inevitable. Data center computing power has become a powerful driving force, shifting focus from resource scale to computational scale.

To meet the ever-growing demand for computing power, high-performance computing (HPC) has become a top priority, especially as computational clusters scale from the petascale to the exascale. This shift imposes increasingly higher demands on interconnect network performance, marking a clear trend of deep integration between computation and networking. HPC introduces different network performance requirements in three typical scenarios: loosely coupled computing, tightly coupled computing, and data-intensive computing.

In summary, high-performance computing (HPC) imposes stringent requirements on network throughput and latency. To meet these demands, the industry widely adopts Remote Direct Memory Access (RDMA) as an alternative to the TCP protocol to reduce latency and maximize CPU utilization on servers. Despite its advantages, the sensitivity of RDMA to network packet loss highlights the importance of lossless networks.

The Evolution of High-Performance Computing Networks

Traditional data center networks have historically adopted a multi-hop symmetric architecture based on Ethernet technology, relying on the TCP/IP protocol stack for transmission. However, despite more than 30 years of TCP/IP development, Remote Direct Memory Access (RDMA) technology has gradually replaced it, becoming the preferred protocol for HPC networks. Additionally, the choice of RDMA network layer protocols has evolved from expensive lossless networks based on the InfiniBand (IB) protocol to intelligent lossless networks based on Ethernet.

From TCP to RDMA

In traditional data centers, Ethernet technology and the TCP/IP protocol stack have been the norm for building multi-hop symmetric network architectures. However, due to two main limitations—latency and CPU utilization—TCP/IP networking is no longer sufficient to meet the demands of high-performance computing. To address these challenges, RDMA functionality has been introduced on the server side. RDMA is a direct memory access technology that transfers data directly between computer memories without involving the operating system, bypassing time-consuming processor operations. This approach achieves high bandwidth, low latency, and low CPU overhead.
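For readers curious what RDMA looks like from the application side, the following is a minimal, illustrative sketch of user-space resource setup with the standard libibverbs API: open a device, allocate a protection domain, register a memory region, and create a completion queue plus a reliable-connection queue pair. It is not taken from the article, omits error handling, connection establishment, and the actual RDMA write, and all sizes and names are arbitrary.

```c
/* Minimal libibverbs setup sketch (illustrative only).
 * Compile with: gcc rdma_sketch.c -libverbs */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no RDMA devices\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    /* Register memory so the NIC can DMA into/out of it without CPU copies. */
    size_t len = 4096;
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE |
                                   IBV_ACCESS_REMOTE_READ);

    /* Completion queue and RC queue pair: the transport objects RDMA
     * applications use instead of kernel sockets. */
    struct ibv_cq *cq = ibv_create_cq(ctx, 16, NULL, NULL, 0);
    struct ibv_qp_init_attr attr = {
        .send_cq = cq, .recv_cq = cq,
        .cap = { .max_send_wr = 16, .max_recv_wr = 16,
                 .max_send_sge = 1, .max_recv_sge = 1 },
        .qp_type = IBV_QPT_RC,
    };
    struct ibv_qp *qp = ibv_create_qp(pd, &attr);

    printf("MR lkey=0x%x rkey=0x%x, QP num=0x%x\n",
           mr->lkey, mr->rkey, qp->qp_num);

    ibv_destroy_qp(qp); ibv_destroy_cq(cq); ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd); ibv_close_device(ctx); ibv_free_device_list(devs);
    free(buf);
    return 0;
}
```

The key call is ibv_reg_mr: once a buffer is registered, the network card can move data to or from it directly, which is where the zero-copy, kernel-bypass behavior described above comes from.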

From IB to RoCE

RDMA enables direct data read and write between applications and network cards. RDMA’s zero-copy mechanism allows the receiving end to read data directly from the sending end’s memory, significantly reducing CPU burden and improving CPU efficiency. Currently, there are three choices for RDMA network layer protocols: InfiniBand, iWARP (Internet Wide Area RDMA Protocol), and RoCE (RDMA over Converged Ethernet). Although RoCE offers many advantages, its sensitivity to packet loss requires support from lossless Ethernet. This evolution of HPC networks reflects a continuous pursuit of enhanced performance, efficiency, and interoperability.

Enterprise Innovative Solution: Designing High-Performance Data Center Networks

The architecture of data center networks has evolved from the traditional core-aggregation-access model to the modern Spine-Leaf design. This approach fully utilizes network interconnection bandwidth, lowers oversubscription (convergence) ratios between layers, and is easy to scale. When traffic bottlenecks occur, capacity can be expanded horizontally by adding uplinks and reducing the convergence ratio, minimizing disruption while expanding bandwidth. Overlay networks use EVPN-VXLAN technology to achieve flexible network deployment and resource allocation.
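As a quick illustration of the convergence (oversubscription) ratio mentioned above, the sketch below computes it for a hypothetical leaf switch. The port counts and speeds are invented for the example and are not taken from the FS design.

```c
/* Hypothetical leaf-switch oversubscription (convergence-ratio) check.
 * All figures are illustrative. */
#include <stdio.h>

int main(void)
{
    double downlink_ports = 48, downlink_gbps = 25;   /* server-facing side */
    double uplink_ports   = 6,  uplink_gbps   = 100;  /* spine-facing side  */

    double south = downlink_ports * downlink_gbps;    /* 1200 Gb/s */
    double north = uplink_ports   * uplink_gbps;      /*  600 Gb/s */

    printf("Oversubscription ratio: %.1f:1\n", south / north);  /* 2.0:1 */
    printf("Add 2 uplinks -> %.1f:1\n",
           south / ((uplink_ports + 2) * uplink_gbps));          /* 1.5:1 */
    return 0;
}
```

Adding uplinks is exactly the "horizontal expansion" described above: it lowers the ratio without touching the server-facing side of the fabric.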

This solution draws on the design experience of internet data center networks, adopting the Spine-Leaf architecture and EVPN-VXLAN technology to provide a versatile and scalable network infrastructure for upper-layer services. Production and office networks are isolated by domain firewalls and connected to office buildings, labs, and regional center exits. The core switches of the production network provide up to 1.6 Tb/s of inter-POD communication bandwidth and 160G of high-speed network egress capacity, with each POD's internal horizontal network capacity reaching 24 Tb/s, ensuring minimal packet loss. The building wiring is planned around the Spine-Leaf architecture, with each POD's switches interconnected over 100G links and deployed in ToR (top-of-rack) mode. The resulting network structure is more streamlined, improving cable deployment and management efficiency.

Future-Oriented Equipment Selection

When envisioning and building data center networks, careful consideration of technological advancements, industry trends, and operational costs over the next five years is crucial. The choice of network switches plays a vital role in the overall design. Traditional large-scale network designs often opt for chassis-based equipment to increase the overall capacity of the network system, but its scalability is limited.

Therefore, for the network equipment in this design, NVIDIA advocates adopting a modular switch network architecture. This approach allows maintenance teams to become familiar with the equipment quickly, and it provides operational flexibility for future network architecture adjustments, equipment reuse, and maintenance replacements.

In response to the ongoing trend of business transformation and the surge in demand for big data, most data center network designs adopt the mature Spine-Leaf architecture, coupled with EVPN-VXLAN technology to achieve efficient network virtualization. This architectural approach ensures convenient high-bandwidth, low-latency network traffic, laying the foundation for scalability and flexibility.

How FS Can Help

FS is a professional provider of communication and high-speed network system solutions for network, data center, and telecommunications customers. Leveraging NVIDIA® InfiniBand switches, 100G/200G/400G/800G InfiniBand transceivers, and NVIDIA® InfiniBand adapters, FS offers customers a comprehensive set of solutions based on InfiniBand and lossless Ethernet (RoCE). These solutions meet diverse application requirements, enabling users to accelerate their businesses and enhance performance. For more information, please visit FS.COM.

Enhancing Data Center Networks with InfiniBand Solutions

With the rapid growth of data centres driven by large models, cloud computing, and big data analytics, there is an increasing demand for high-speed data transfer and low-latency communication. In this complex network ecosystem, InfiniBand (IB) technology has become a market leader, playing a vital role in addressing the challenges posed by the training and deployment of large models. Constructing high-speed networks within data centres requires essential components such as high-rate network cards, optical modules, switches, and advanced network interconnect technologies.

NVIDIA Quantum™-2 InfiniBand Switch

When selecting switches, NVIDIA’s QM9700 and QM9790 series stand out as the most advanced devices. Built on NVIDIA Quantum-2 architecture, they offer 64 NDR 400Gb/s InfiniBand ports within a standard 1U chassis. This breakthrough translates to an individual switch providing a total bidirectional bandwidth of 51.2 terabits per second (Tb/s), along with an unprecedented handling capacity exceeding 66.5 billion packets per second (BPPS).
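The headline bandwidth figure quoted above follows directly from the port count and speed; the quick check below simply multiplies the numbers out. (The packet-rate figure is a vendor specification and cannot be derived this way.)

```c
/* Sanity check of the switch bandwidth figure quoted above. */
#include <stdio.h>

int main(void)
{
    int ports = 64;            /* NDR 400Gb/s ports per QM9700/QM9790 */
    double port_gbps = 400.0;
    double bidir_tbps = ports * port_gbps * 2 / 1000.0;  /* both directions */
    printf("Aggregate bidirectional bandwidth: %.1f Tb/s\n", bidir_tbps); /* 51.2 */
    return 0;
}
```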

The NVIDIA Quantum-2 InfiniBand switches extend beyond NDR high-speed data transfer, incorporating extensive throughput, on-chip compute processing, advanced intelligent acceleration features, adaptability, and sturdy construction. These attributes make them an ideal choice for high-performance computing (HPC) and large-scale cloud infrastructure.

Additionally, the integration of NDR switches helps minimise overall cost and complexity, thereby promoting the development of data centre network technology.

Also Check- Revolutionizing Data Center Networks: 800G Optical Modules and NDR Switches | FS Community

ConnectX®-7 InfiniBand Card

The NVIDIA ConnectX®-7 InfiniBand network card (HCA) ASIC delivers a data throughput of 400Gb/s and supports a 16-lane PCIe 5.0 or PCIe 4.0 host interface. Using advanced SerDes technology at 100Gb/s per lane, 400Gb/s InfiniBand is delivered through OSFP connectors on both the switch and HCA ports. The OSFP connector on the switch side supports two 400Gb/s or 200Gb/s InfiniBand ports, while the HCA features one 400Gb/s InfiniBand port. The product range includes active and passive copper cables, transceivers, and MPO fibre cables. Notably, although both sides use OSFP packaging, their physical dimensions differ: the switch-side OSFP module is equipped with heat fins for cooling.

OSFP 800G Optical Transceiver

The OSFP-800G SR8 module is specifically designed for 800Gb/s 2xNDR InfiniBand systems, supporting link lengths of up to 30m over OM3 or 50m over OM4 multimode fibre (MMF) at an 850nm wavelength through dual MTP/MPO-12 connectors. Its dual-port configuration represents a significant advancement, incorporating two internal transceiver engines that fully unlock the switch's potential.

This design allows the 32 physical interfaces to support up to 64 400G NDR interfaces. With its high-density and high-bandwidth design, this module enables data centres to seamlessly meet the escalating network demands of applications such as high-performance computing and cloud infrastructure.

Furthermore, the FS OSFP-800G SR8 Module provides outstanding performance and reliability, delivering robust optical interconnection choices for data centres. This module enables data centres to utilise the complete performance potential of the QM9700/9790 series switch, facilitating high-bandwidth and low-latency data transmission.

NDR Optical Connection Solution

Addressing the NDR optical connection challenge, the NDR switch ports utilize OSFP with eight channels per interface, each employing 100Gb/s SerDes. This allows for three mainstream connection speed options: 800G to 800G, 800G to 2X400G, and 800G to 4X200G. Additionally, each channel supports a downgrade from 100Gb/s to 50Gb/s, facilitating interoperability with previous-generation HDR devices.
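The port-count arithmetic behind these breakout options is simple. The sketch below assumes 32 OSFP cages per switch with eight 100Gb/s lanes each (the layout described above); the 200G port count is an inference from the lane math rather than a quoted specification.

```c
/* Port-count arithmetic for the twin-port OSFP breakout options above.
 * Assumes 32 OSFP cages per 1U switch, 8 x 100Gb/s lanes per cage. */
#include <stdio.h>

int main(void)
{
    int cages = 32, lanes_per_cage = 8, lane_gbps = 100;
    int total_gbps = cages * lanes_per_cage * lane_gbps;   /* 25600 Gb/s */

    printf("800G ports (8 lanes): %d\n", total_gbps / 800); /*  32 */
    printf("400G ports (4 lanes): %d\n", total_gbps / 400); /*  64 */
    printf("200G ports (2 lanes): %d\n", total_gbps / 200); /* 128 */
    return 0;
}
```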

The 400G NDR series cables and transceivers offer diverse product choices for configuring network switch and adapter systems, focusing on data centre lengths of up to 500 meters to accelerate HPC computing systems. The various connector types, including passive copper cables (DAC), active optical cables (AOC), and optical modules with jumpers, cater to different transmission distances and bandwidth requirements, ensuring low latency and an extremely low bit error rate for high-bandwidth HPC and accelerated computing applications.

For deployment details, please see the article InfiniBand NDR OSFP Solution from the FS community.

Conclusion

In summary, InfiniBand (IB) technology offers unparalleled throughput, intelligent acceleration, and robust performance for HPC and cloud infrastructures. The FS OSFP-800G SR8 module and NDR optical connection solution further enhance data centre capabilities, enabling the high-bandwidth, low-latency connectivity essential for modern computing applications.

Explore the full range of advanced networking solutions at FS.com and revolutionize your data centre network today!

RoCE Technology for Data Transmission in HPC Networks

RDMA (Remote Direct Memory Access) enables direct data transfer between devices in a network, and RoCE (RDMA over Converged Ethernet) is a leading implementation of this technology. RoCE improves data transmission with high speed and low latency, making it ideal for high-performance computing and cloud environments.

Definition

As a type of RDMA, RoCE is a network protocol defined in the InfiniBand Trade Association (IBTA) standard, allowing RDMA over a converged Ethernet network. In short, it can be regarded as the application of RDMA technology in hyper-converged data centers, cloud, storage, and virtualized environments. It possesses all the benefits of RDMA technology along with the familiarity of Ethernet. If you want to understand RoCE in depth, you can read this article: RDMA over Converged Ethernet Guide | FS Community.

Types

Generally, there are two RDMA over Converged Ethernet versions: RoCE v1 and RoCE v2. Which one is used depends on the network adapter or card.

RoCE v1

Retaining the interface, transport layer, and network layer of InfiniBand (IB), the RoCE v1 protocol substitutes the link layer and physical layer of IB with the link layer and physical layer of Ethernet. In the link-layer data frame of a RoCE packet, the Ethertype field value is specified as 0x8915, unmistakably identifying it as a RoCE packet. However, because RoCE v1 does not adopt the Ethernet network layer, RoCE v1 packets lack an IP field. Consequently, routing at the network layer is not possible for RoCE v1 packets, restricting their transmission to a Layer 2 network.

RoCE v2

Introducing substantial enhancements, the RoCE v2 protocol builds on the foundation of RoCE v1. RoCE v2 replaces the InfiniBand (IB) network layer used by RoCE v1 with an Ethernet (IP) network layer and a transport layer based on UDP. It uses the DSCP and ECN fields in the IP header of the Ethernet network layer to implement congestion control. This enables RoCE v2 packets to be routed, ensuring superior scalability. Because RoCE v2 fully supersedes the original RoCE protocol, references to "RoCE" generally denote RoCE v2 unless the first generation of RoCE is explicitly specified.
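To make the layering concrete, here is a simplified, illustrative view of what a RoCE v2 packet carries on the wire. The struct is a teaching aid with simplified field widths, not a production parser; only the UDP destination port 4791 and the 12-byte Base Transport Header size come from the RoCE v2 specification.

```c
/* Simplified view of RoCE v2 encapsulation: ordinary Ethernet/IP/UDP on the
 * wire, with the InfiniBand transport headers carried as UDP payload. */
#include <stdint.h>
#include <stdio.h>

#define ROCEV2_UDP_DPORT 4791   /* IANA-assigned UDP port for RoCE v2 */

/* 12-byte InfiniBand Base Transport Header (field widths simplified). */
struct ib_bth {
    uint8_t  opcode;    /* e.g. RDMA WRITE, SEND, READ request */
    uint8_t  flags;     /* solicited event, migration, pad count, version */
    uint16_t pkey;      /* partition key */
    uint32_t dest_qp;   /* 8 reserved bits + 24-bit destination QP number */
    uint32_t psn;       /* ack-request bit + 24-bit packet sequence number */
};

int main(void)
{
    /* Header stack, outermost first: routable because of the IP layer. */
    printf("Ethernet -> IP (DSCP/ECN for congestion control) -> "
           "UDP dport %d -> BTH (%zu bytes) -> payload -> ICRC\n",
           ROCEV2_UDP_DPORT, sizeof(struct ib_bth));
    return 0;
}
```

Because everything up to the UDP layer is ordinary Ethernet/IP, standard routers can forward RoCE v2 traffic, which is exactly the routability advantage described above.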

Also Check- An In-Depth Guide to RoCE v2 Network | FS Community

InfiniBand vs. RoCE

In comparison to InfiniBand, RoCE presents the advantages of increased versatility and relatively lower costs. It not only serves to construct high-performance RDMA networks but also finds utility in traditional Ethernet networks. However, configuring parameters such as Headroom, PFC (Priority-based Flow Control), and ECN (Explicit Congestion Notification) on switches can pose complexity. In extensive deployments, especially those featuring numerous network cards, the overall throughput performance of RoCE networks may exhibit a slight decrease compared to InfiniBand networks.

In actual business scenarios, there are major differences between the two in terms of performance, scale, and operations and maintenance (O&M).

Benefits

RDMA over Converged Ethernet ensures low-latency and high-performance data transmission by providing direct memory access through the network interface. This technology minimizes CPU involvement, optimizing bandwidth and scalability as it enables access to remote switch or server memory without consuming CPU cycles. The zero-copy feature facilitates efficient data transfer to and from remote buffers, contributing to improved latency and throughput with RoCE. Notably, RoCE eliminates the need for new equipment or Ethernet infrastructure replacement, leading to substantial cost savings for companies dealing with massive data volumes.

How FS Can Help

In the fast-evolving landscape of HPC data center networks, selecting the right solution is paramount. Drawing on a skilled technical team and vast experience in diverse application scenarios, FS utilizes RoCE to tackle the formidable challenges encountered in High-Performance Computing (HPC). FS offers a range of products, including NVIDIA® InfiniBand Switches, 100G/200G/400G/800G InfiniBand transceivers and NVIDIA® InfiniBand Adapters, establishing itself as a professional provider of communication and high-speed network system solutions for networks, data centers, and telecom clients. Take action now – register for more information and experience our products through a Free Product Trial.

InfiniBand: Powering High-Performance Data Centers

Driven by the booming development of cloud computing and big data, InfiniBand has become a key technology and plays a vital role at the core of the data center. But what exactly is InfiniBand technology? What attributes contribute to its widespread adoption? The following guide will answer your questions.

What is InfiniBand?

InfiniBand is an open industrial standard that defines a high-speed network for interconnecting servers, storage devices, and more. It leverages point-to-point bidirectional links to enable seamless communication between processors located on different servers. It is compatible with various operating systems such as Linux, Windows, and ESXi.

InfiniBand Network Fabric

InfiniBand, built on a channel-based fabric, comprises key components like HCA (Host Channel Adapter), TCA (Target Channel Adapter), InfiniBand links (connecting channels, ranging from cables to fibers, and even on-board links), and InfiniBand switches and routers (integral for networking). Channel adapters, particularly HCA and TCA, are pivotal in forming InfiniBand channels, ensuring security and adherence to Quality of Service (QoS) levels for transmissions.

InfiniBand vs Ethernet

InfiniBand was developed to address data transmission bottlenecks in high-performance computing clusters. The primary differences from Ethernet lie in bandwidth, latency, network reliability, and more.

High Bandwidth and Low Latency

InfiniBand provides higher bandwidth and lower latency, meeting the performance demands of large-scale data transfer and real-time communication applications.

RDMA Support

InfiniBand supports Remote Direct Memory Access (RDMA), enabling direct data transfer between node memories. This reduces CPU overhead and improves transfer efficiency.

Scalability

InfiniBand Fabric allows for easy scalability by connecting a large number of nodes and supporting high-density server layouts. Additional InfiniBand switches and cables can expand network scale and bandwidth capacity.

High Reliability

InfiniBand Fabric incorporates redundant designs and fault isolation mechanisms, enhancing network availability and fault tolerance. Alternate paths maintain network connectivity in case of node or connection failures.

Conclusion

The InfiniBand network has undergone rapid iterations, progressing from SDR 10Gbps, DDR 20Gbps, QDR 40Gbps, FDR 56Gbps, and EDR 100Gbps to today's HDR 200Gbps and NDR 400Gbps/800Gbps InfiniBand. For those considering the implementation of InfiniBand products in their high-performance data centers, further details are available from FS.com.

Mastering the Basics of GPU Computing

Training large models is typically done on clusters of machines, preferably with many GPUs per server. This article introduces the professional terminology and common network architecture of GPU computing.

Exploring Key Components in GPU Computing

PCIe Switch Chip

In the domain of high-performance GPU computing, vital elements such as CPUs, memory modules, NVMe storage, GPUs, and network cards establish fluid connections via the PCIe (Peripheral Component Interconnect Express) bus or specialized PCIe switch chips.
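Since PCIe is the common denominator connecting CPUs, GPUs, NVMe storage, and NICs, its per-direction bandwidth is worth a rough estimate. The sketch below multiplies the raw transfer rate by the 128b/130b encoding efficiency for a x16 link; it ignores packet and protocol overhead, so real-world figures come in somewhat lower.

```c
/* Rough per-direction PCIe bandwidth estimate (line rate x encoding
 * efficiency, ignoring packet/protocol overhead). Illustrative only. */
#include <stdio.h>

int main(void)
{
    struct { const char *gen; double gtps; } gens[] = {
        { "PCIe 4.0", 16.0 }, { "PCIe 5.0", 32.0 },
    };
    double encoding = 128.0 / 130.0;   /* 128b/130b line coding */
    int lanes = 16;

    for (int i = 0; i < 2; i++) {
        double gbps_per_lane = gens[i].gtps * encoding;   /* Gb/s per lane */
        double gbytes = gbps_per_lane * lanes / 8.0;      /* GB/s, x16     */
        printf("%s x%d: ~%.1f GB/s per direction\n", gens[i].gen, lanes, gbytes);
    }
    return 0;   /* prints ~31.5 GB/s for Gen4, ~63.0 GB/s for Gen5 */
}
```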

NVLink

NVLink is a wire-based serial multi-lane near-range communications link developed by NVIDIA. Unlike PCI Express, a device can consist of multiple NVLinks, and devices use mesh networking to communicate instead of a central hub. The protocol was first announced in March 2014 and uses a proprietary high-speed signaling interconnect (NVHS).

The technology supports full mesh interconnection between GPUs on the same node. The progression from NVLink 1.0 through NVLink 2.0 and 3.0 to NVLink 4.0 has significantly increased bidirectional bandwidth and improved the performance of GPU computing applications.

NVSwitch

NVSwitch is a switching chip developed by NVIDIA, designed specifically for high-performance computing and artificial intelligence applications. Its primary function is to provide high-speed, low-latency communication between multiple GPUs within the same host.

NVLink Switch

Unlike the NVSwitch, which is integrated into GPU modules within a single host, the NVLink Switch serves as a standalone switch specifically engineered for linking GPUs in a distributed computing environment.

HBM

Several GPU manufacturers have taken innovative approaches to the memory-speed bottleneck by stacking multiple DDR dies to form so-called high-bandwidth memory (HBM) and integrating it with the GPU. This design removes the need for each GPU to traverse the PCIe switch chip when accessing its dedicated memory. As a result, data transfer speeds increase dramatically, potentially by an order of magnitude or more.
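The sketch below shows why stacking helps, using illustrative HBM2e-class numbers (a 1024-bit interface per stack at 3.2 Gb/s per pin, five stacks). These are generic figures for the memory class, not the specification of any particular GPU.

```c
/* Why stacked HBM beats going through PCIe: a very wide on-package bus.
 * Illustrative HBM2e-class numbers, not any specific GPU's datasheet. */
#include <stdio.h>

int main(void)
{
    double bus_bits = 1024, gbps_per_pin = 3.2;
    double per_stack_gbytes = bus_bits * gbps_per_pin / 8.0;   /* ~410 GB/s */
    int stacks = 5;

    printf("Per stack: ~%.0f GB/s, %d stacks: ~%.0f GB/s\n",
           per_stack_gbytes, stacks, stacks * per_stack_gbytes); /* ~2048 GB/s */
    printf("Compare: PCIe 5.0 x16 is ~63 GB/s per direction\n");
    return 0;
}
```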

Bandwidth Unit

In large-scale GPU computing training, performance is directly tied to data transfer speeds, involving pathways such as PCIe, memory, NVLink, HBM, and network bandwidth. Different bandwidth units are used to measure these data rates.
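One practical pitfall here is units: network links are usually quoted in bits per second, while memory, HBM, and PCIe bandwidth are usually quoted in bytes per second. The small helper below converts between the two; the example values are illustrative, not measurements.

```c
/* Gb/s (network) vs GB/s (memory/PCIe): divide by 8 to compare like with like.
 * Example values are illustrative only. */
#include <stdio.h>

static double gbps_to_gbytes_per_s(double gbps) { return gbps / 8.0; }

int main(void)
{
    printf("One 400 Gb/s NDR port ~ %.0f GB/s per direction\n",
           gbps_to_gbytes_per_s(400.0));              /*  50 GB/s */
    printf("8 such NICs per node  ~ %.0f GB/s injection bandwidth\n",
           8 * gbps_to_gbytes_per_s(400.0));          /* 400 GB/s */
    return 0;
}
```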

Storage Network Card

The storage network card in GPU architecture connects to the CPU via PCIe, enabling communication with distributed storage systems. It plays a crucial role in efficient data reading and writing for deep learning model training. Additionally, the storage network card handles node management tasks, including SSH (Secure Shell) remote login, system performance monitoring, and collecting related data. These tasks help monitor and maintain the running status of the GPU cluster.

For a more in-depth exploration of these terms, refer to the article Unveiling the Foundations of GPU Computing-1 from the FS community.

High-Performance GPU Fabric

NVSwitch Fabric

In a full mesh network topology, each node is connected directly to all the other nodes. Usually, 8 GPUs are connected in a full-mesh configuration through six NVSwitch chips, also referred to as NVSwitch fabric.

This fabric optimizes data transfer with a bidirectional bandwidth, providing efficient communication between GPUs and supporting parallel computing tasks. The bandwidth per line depends on the NVLink technology utilized, such as NVLink3, enhancing the overall performance in large-scale GPU clusters.

IDC GPU Fabric

The fabric mainly includes a computing network and a storage network. The computing network connects GPU nodes and supports the collaboration of parallel computing tasks. This involves transferring data between multiple GPUs, sharing calculation results, and coordinating the execution of massively parallel computing tasks. The storage network connects GPU nodes to storage systems to support large-scale data read and write operations, including loading data from the storage system into GPU memory and writing calculation results back to the storage system.

Want to know more about GPU fabrics? Please check the article Unveiling the Foundations of GPU Computing-2 from the FS community.

400G Ethernet Manufacturers and Vendors

New data-intensive applications have led to a dramatic increase in network traffic, raising the demand for higher processing speeds, lower latency, and greater storage capacity. These require higher network bandwidth, up to 400G or beyond, and the 400G market is therefore growing rapidly. Many organizations joined the ranks of 400G equipment vendors early and are already reaping the benefits. This article walks through 400G Ethernet market trends and some global 400G equipment vendors.

The 400G Era

The emergence of new services such as 4K VR, the Internet of Things (IoT), and cloud computing has increased the number of connected devices and internet users. According to an IEEE report, "device connections will grow from 18 billion in 2017 to 28.5 billion devices by 2022," and the number of internet users will soar "from 3.4 billion in 2017 to 4.8 billion in 2022." Hence, network traffic is exploding, with its average annual growth rate remaining at a high level of 26%.

Annual Growth of Network Traffic

Facing the rapid growth of network traffic, 100GE/200GE ports are unable to meet the demand for network connectivity from a large number of customers. Many organizations and enterprises, especially hyperscale data centers and cloud operators, are aggressively adopting next-generation 400G network infrastructure to handle their workloads. 400G provides an ideal way for operators to meet high-capacity network requirements, reduce operational costs, and achieve sustainability goals. Given the strong prospects of the 400G market, many IT infrastructure providers are scrambling to enter the competition, launching a variety of 400G products. Dell'Oro Group indicates that "the ecosystem of 400G technologies, from silicon to optics, is ramping," with large-scale deployments contributing meaningful market revenue starting in 2021. The firm forecasts that 400G shipments will exceed 15 million ports by 2023 and that 400G will be widely deployed in all of the largest core networks in the world. In addition, according to GLOBE NEWSWIRE, the global 400G transceiver market is expected to reach $22.6 billion in 2023. 400G Ethernet is about to be deployed at scale, heralding the arrival of the 400G era.

400G Growth

Companies Offering 400G Networking Equipment

Many top companies seized the good opportunity of the fast-growing 400G market, and launched various 400G equipment. Many well-known IT infrastructure providers, which laid out 400G products early on, have become the key players in the 400G market after years of development, such as Cisco, Arista, Juniper, etc.

400G Equipment Vendors

Cisco

Cisco foresaw the need for the Internet and its infrastructure at a very early stage and, as a result, has put a stake in the ground that no other company has been able to eclipse. Over the years, Cisco has become a top provider of software and solutions and a dominant player in the highly competitive 25/50/100Gb space. Cisco entered the 400G space with its latest networking hardware and optics, announced on October 31, 2018. Its Nexus switches are Cisco's most important 400G products. Cisco primarily expects to help customers migrate to 400G Ethernet with solutions including Cisco's ACI (Application Centric Infrastructure) for streamlining operations, Cisco Nexus data center switches, and the Cisco Network Assurance Engine (NAE), among others. Cisco has seized the market opportunity and continues to grow sales of its 400G products. Cisco reported second-quarter revenue of $12.7 billion, up 6% year over year, demonstrating the good prospects of the 400G Ethernet market.

Arista Networks

Arista Networks, founded in 2008, provides software-driven cloud networking solutions for large data center storage and computing environments. Arista is smaller than rival Cisco, but it has made significant gains in market share and product development over the last several years. Arista announced its 400G platforms and optics on October 23, 2018, marking its entry into the 400G Ethernet market. Today, Arista focuses on comprehensive 400G platforms that include various switch series and 400G optical modules for large-scale cloud, leaf-spine, routing transformation, and hyperscale I/O-intensive applications. The launch of Arista's diverse 400G switches has also driven significant sales and market share growth. According to IDC, Arista Networks saw a 27.7 percent full-year Ethernet switch revenue rise in 2021. Arista has put legitimate market share pressure on leader Cisco over the past five years.

Juniper Networks

Juniper is a leading provider of networking products. With the arrival of the 400G era, Juniper offers comprehensive 400G routing and switching platforms: packet transport routers, universal routing platforms, universal metro routers, and switches. Recently, it also introduced 400G coherent pluggable optics to further address 400G data communication needs. Juniper believes that 400G will become the new data rate currency for future builds and is fully prepared for the 400G market competition. And now, Juniper has become the key player in the 400G market.

Huawei Technologies

Huawei, a massive Chinese tech company, is gaining momentum in its data center networking business. Huawei is already in the "challenger" category to the industry leaders mentioned above, getting closer to the "leader" area. At OFC 2018, Huawei officially released its 400G optical network solution for commercial use, joining the ranks of 400G product vendors, and this has translated into clear economic growth. Huawei accounted for 28.7% of the global communication equipment market last year, an increase of 7% year on year. As Huawei's 400G platforms continue to roll out, related sales are expected to rise further. The broad Chinese market will also further strengthen Huawei's position in the global 400G space.

FS

Founded in 2009, FS is a global high-tech company providing high-speed communication network solutions and services to several industries. Through continuous technology upgrades, a professional end-to-end supply chain, and brand partnerships with top vendors, FS serves customers across 200 countries with one of the industry's most comprehensive and innovative solution portfolios. FS is one of the earliest 400G vendors in the world, with a diverse portfolio of 400G products including 400G switches, optical transceivers, and cables. FS sees 400G Ethernet as an inevitable trend in the current networking market and has seized this opportunity to gain a large number of loyal customers. Going forward, FS will continue to provide customers with high-quality, reliable 400G products for the migration to 400G Ethernet.

Getting Started with 400G Ethernet

400G is the next generation of cloud infrastructure, driving next-generation data center networks. Many organizations and enterprises are planning to migrate to 400G. The companies mentioned above have provided 400G solutions for several years, making them a good choice for enterprises. There are also lots of other organizations trying to enter the ranks of 400G manufacturers and vendors, driving the growing prosperity of the 400G market. Remember to take into account your business needs and then choose the right 400G product manufacturer and vendor for your investment or purchase.

Data Center Layout

Data center layout design is a challenging task requiring expertise, time, and effort. However, if done properly, the data center can accommodate in-house servers and much other IT equipment for years. When designing such a facility for your company or for cloud-service providers, doing everything correctly is crucial.

As such, data center designers should develop a thorough data center layout. A data center layout comes in handy during construction as it outlines the best possible placement of physical hardware and other resources in the center.

What Is Included in a Data Center Floor Plan?

The floor plan is an important part of the data center layout. A well-designed floor plan boosts the data center's cooling performance, simplifies installation, and reduces energy needs. Unfortunately, most data center floor plans are designed through incremental deployment that doesn't follow a central plan. A data center floor plan influences the following:

  • The power density of the data center
  • The complexity of power and cooling distribution networks
  • Achievable power density
  • Electrical power usage of the data center

Below are a few tips to consider when designing a data center floor plan:

Balance Density with Capacity

“The more, the better” isn’t an applicable phrase when designing a data center. You should remember the tradeoff between space and power in data centers and consider your options keenly. If you are thinking of a dense server, ensure that you have enough budget. Note that a dense server requires more power and advanced cooling infrastructure. Designing a good floor plan allows you to figure this out beforehand.

Consider Unique Layouts

There is no specific rule that you should use old floor layouts. Your floor design should be based on specific organizational needs. If your company is growing exponentially, your data center needs will keep changing too. As such, old layouts may not be applicable. Browse through multiple layouts and find one that perfectly suits your facility.

Think About the Future

A data center design should be based on specific organizational needs. Therefore, while you may not need to install or replace some equipment yet, you might have to do so after a few years due to changing facility needs. Simply put, your data center should accommodate company needs several years in the future. This will ease expansion.

Floor Planning Sequence

A floor or system planning sequence outlines the flow of activity that transforms the initial idea into an installation plan. The floor planning sequence involves the following five tasks:

Determining IT Parameters

The floor plan begins with a general idea that prompts the company to change or increase its IT capabilities. From the idea, the data center’s capacity, growth plan, and criticality are then determined. Note that these three factors are characteristics of the IT function component of the data center and not the physical infrastructure supporting it. Since the infrastructure is the ultimate outcome of the planning sequence, these parameters guide the development and dictate the data centers’ physical infrastructure requirements.

Developing System Concept

This step uses the IT parameters as a foundation to formulate the general concept of data center physical infrastructure. The main goal is to develop a reference design that embodies the desired capacity, criticality, and scalability that supports future growth plans. However, with the diverse nature of these parameters, more than a thousand physical infrastructure systems can be drawn. Designers should pick a few “good” designs from this library.

Determining User Requirements

User requirements should include organizational needs that are specific to the project. This phase should collect and evaluate organizational needs to determine if they are valid or need some adjustments to avoid problems and reduce costs. User requirements can include key features, prevailing IT constraints, logistical constraints, target capacity, etc.

Generating Specifications

This step takes user requirements and translates them into detailed data center design. Specifications provide a baseline for rules that should be followed in the last step, creating a detailed design. Specifications can be:

  • Standard specifications – these don’t vary from one project to another. They include regulatory compliance, workmanship, best practices, safety, etc.
  • User specifications – define user-specific details of the project.

Generating a Detailed Design

This is the last step of the floor planning sequence that highlights:

  • A detailed list of the components
  • Exact floor plan with racks, including power and cooling systems
  • Clear installation instructions
  • Project schedule

If the complete specifications are clear enough and robust, a detailed design can be automatically drawn. However, this requires input from professional engineers.

Principles of Equipment Layout

Data center infrastructure is the core of the entire IT architecture. Unfortunately, despite this importance, more than 70% of network downtime stems from physical layer problems, particularly cabling. Planning an effective data center infrastructure is crucial to the data center's performance, scalability, and resiliency.

Nonetheless, keep the following principles in mind when designing equipment layout.

Control Airflow Using Hot-aisle/Cold-aisle Rack Layout

The principle of controlling airflow using a hot-aisle/cold-aisle rack layout is well defined in various documents, including the ASHRAE TC9.9 Mission Critical Facilities guidance. This principle aims to maximize the separation of IT equipment exhaust air and fresh intake air by placing cold aisles where intakes are present and hot aisles where exhaust air is released. This reduces the amount of hot air drawn through the equipment's air intakes and allows data centers to achieve close to 100% of their rated power density.

Provide Safe and Convenient Access Ways

Besides being a legal requirement, providing safe and convenient access ways around data center equipment is common sense. The effectiveness of a data center depends on how row layouts can double up as aisles and access ways. Therefore, designers should factor in the impact of column locations. A column can take up three or more rack locations if it falls within the row of racks. This can obstruct the aisle and lead to the complete elimination of the row.

Align Equipment With Floor and Ceiling Tile Systems

Floor and ceiling tiling systems also play a role in air distribution systems. The floor grille should align with racks, especially in data centers with raised floor plans. Misaligning floor grids and racks can compromise airflow significantly.

You should also align the ceiling tile grid to the floor grid. As such, you shouldn’t design or install the floor until the equipment layout has been established.


Plan the Layout in Advance

The first stages of deploying data center equipment heavily determine subsequent stages and final equipment installation. Therefore, it is better to plan the entire data center floor layout beforehand.

How to Plan a Server Rack Installation

Server racks should be designed to allow easy and secure access to IT servers and networking devices. Whether you are installing new server racks or thinking of expanding, consider the following:

Rack Location

When choosing a rack for your data center, you should consider its location in the room. It should also leave enough space in the sides, front, rear, and top for easy access and airflow. As a rule of thumb, a server rack should occupy at least six standard floor tiles. Don’t install server racks and cabinets below or close to air conditioners to protect them from water damage in case of leakage.

Rack Layout

Rack density should be considered when determining the rack layout. More free space within server racks allows for more airflow, so leave enough vertical space between servers and IT devices to boost cooling. Since hot air rises, place heat-sensitive devices, such as UPS batteries, at the bottom of server racks; heavy devices should also be placed at the bottom.

Cable Layout

A well-planned rack layout is only part of the picture. An excellent cable layout should also leverage cable labeling and management techniques to ease the identification of power and network cables. Cables should be marked at both ends for easy identification; avoid marking them in the middle. Your cable management system should also have provisions for future additions or removals.

Conclusion

Designing a data center layout is challenging for both small and established IT facilities. Building or upgrading data centers is often perceived to be intimidating and difficult. However, developing a detailed data center layout can ease everything. Remember that small changes in the plan during installation lead to costly consequences downstream.

Article Source: Data Center Layout

Related Articles:

How to Build a Data Center?

The Most Common Data Center Design Missteps

Data Center Containment: Types, Benefits & Challenges

Over the past decade, data center containment has experienced a high rate of implementation by many data centers. It can greatly improve the predictability and efficiency of traditional data center cooling systems. This article will elaborate on what data center containment is, common types of it, and their benefits and challenges.

What Is Data Center Containment?

Data center containment is the separation of cold supply air from the hot exhaust air from IT equipment so as to reduce operating cost, optimize power usage effectiveness, and increase cooling capacity. Containment systems enable uniform and stable supply air temperature to the intake of IT equipment and a warmer, drier return air to cooling infrastructure.

Types of Data Center Containment

There are mainly two types of data center containment, hot aisle containment and cold aisle containment.

Hot aisle containment encloses warm exhaust air from IT equipment in data center racks and returns it back to cooling infrastructure. The air from the enclosed hot aisle is returned to cooling equipment via a ceiling plenum or duct work, and then the conditioned air enters the data center via raised floor, computer room air conditioning (CRAC) units, or duct work.

Hot aisle containment

Cold aisle containment encloses cold aisles where cold supply air is delivered to cool IT equipment. So the rest of the data center becomes a hot-air return plenum where the temperature can be high. Physical barriers such as solid metal panels, plastic curtains, or glass are used to allow for proper airflow through cold aisles.

Cold aisle containment

Hot Aisle vs. Cold Aisle

There are mixed views on whether it’s better to contain the hot aisle or the cold aisle. Both containment strategies have their own benefits as well as challenges.

Hot aisle containment benefits

  • The open areas of the data center are cool, so that visitors to the room will not think the IT equipment is not being cooled sufficiently. In addition, it allows for some low density areas to be un-contained if desired.
  • It is generally considered to be more effective. Any leakages that come from raised floor openings in the larger part of the room go into the cold space.
  • With hot aisle containment, low-density network racks and stand-alone equipment like storage cabinets can be situated outside the containment system, and they will not get too hot, because they are able to stay in the lower temperature open areas of the data center.
  • Hot aisle containment typically adjoins the ceiling where fire suppression is installed. With a well-designed space, it will not affect normal operation of a standard grid fire suppression system.

Hot aisle containment challenges

  • It is generally more expensive. A contained path is needed for air to flow from the hot aisle all the way to the cooling units; often a drop ceiling is used as a return air plenum.
  • High temperatures in the hot aisle can be undesirable for data center technicians. When they need to access IT equipment and infrastructure, a contained hot aisle can be a very uncomfortable place to work. But this problem can be mitigated using temporary local cooling.

Cold aisle containment benefits

  • It is easy to implement without the need for additional architecture to contain and return exhaust air such as a drop ceiling or air plenum.
  • Cold aisle containment is less expensive to install, as it only requires doors at the ends of aisles and baffles or a roof over the aisle.
  • Cold aisle containment is typically easier to retrofit in an existing data center. This is particularly true for data centers that have overhead obstructions such as existing duct work, lighting and power, and network distribution.

Cold aisle containment challenges

  • When utilizing a cold aisle system, the rest of the data center becomes hot, resulting in high return air temperatures. It also may create operational issues if any non-contained equipment such as low-density storage is installed in the general data center space.
  • Conditioned air that leaks from openings under equipment such as PDUs and from the raised floor tends to enter air paths that return to the cooling units. This reduces the efficiency of the system.
  • In many cases, cold aisles have intermediate ceilings over the aisle. This may affect the overall fire protection and lighting design, especially when added to an existing data center.

How to Choose the Best Containment Option?

Every data center is unique. To find the most suitable option, you have to take into account a number of aspects. The first thing is to evaluate your site and calculate the Cooling Capacity Factor (CCF) of the computer room. Then observe the unique layout and architecture of each computer room to discover conditions that make hot aisle or cold aisle containment preferable. With adequate information and careful consideration, you will be able to choose the best containment option for your data center.
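The CCF calculation itself is straightforward. The sketch below uses one commonly cited formulation (associated with Upsite Technologies): the total rated capacity of the running cooling units divided by 110% of the IT critical load, where the extra 10% allows for lights, people, and envelope heat gains. All figures are invented for illustration.

```c
/* Cooling Capacity Factor (CCF) sketch, using one common formulation:
 * running rated cooling capacity / (IT critical load x 1.10).
 * Values below are illustrative only. */
#include <stdio.h>

int main(void)
{
    double running_cooling_kw = 700.0;  /* sum of rated capacity of running units */
    double it_load_kw         = 400.0;  /* UPS output / critical IT load */

    double ccf = running_cooling_kw / (it_load_kw * 1.10);
    printf("CCF = %.2f\n", ccf);   /* ~1.59, i.e. ~59% more cooling than required */
    return 0;
}
```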

Article Source: Data Center Containment: Types, Benefits & Challenges

Related Articles:

What Is a Containerized Data Center: Pros and Cons

The Most Common Data Center Design Missteps

The Chip Shortage: Current Challenges, Predictions, and Potential Solutions

The COVID-19 pandemic caused several companies to shut down, and the implications were reduced production and altered supply chains. In the tech world, where silicon microchips are the heart of everything electronic, raw material shortage became a barrier to new product creation and development.

During the lockdown periods, some essential workers were required to stay home, which meant chip manufacturing was unavailable for several months. By the time lockdown was lifted and the world embraced the new normal, the rising demand for consumer and business electronics was enough to ripple up the supply chain.

Below, we’ve discussed the challenges associated with the current chip shortage, what to expect moving forward, and the possible interventions necessary to overcome the supply chain constraints.

Challenges Caused by the Current Chip Shortage

As technology and rapid innovation sweeps across industries, semiconductor chips have become an essential part of manufacturing – from devices like switches, wireless routers, computers, and automobiles to basic home appliances.


To understand and quantify the impact this chip shortage has caused spanning the industry, we’ll need to look at some of the most affected sectors. Here’s a quick breakdown of how things have unfolded over the last eighteen months.

Automobile Industry

Automakers in North America and Europe have slowed or stopped production due to a lack of computer chips. Major automakers like Tesla, Ford, BMW, and General Motors have all been affected. The major implication is that the global automobile industry will manufacture 4 million fewer cars by the end of 2021 than earlier planned and will forfeit an estimated $110 billion in revenue.

Consumer Electronics

Consumer electronics such as desktop PCs and smartphones rose in demand throughout the pandemic, thanks to the shift to virtual learning among students and the rise in remote working. At the start of the pandemic, several automakers slashed their vehicle production forecasts before abandoning open semiconductor chip orders. And while the consumer electronics industry stepped in and scooped most of those microchips, the supply couldn’t catch up with the demand.

Data Centers

Most chip fabrication companies like Samsung Foundries, Global Foundries, and TSMC prioritized high-margin orders from PC and data center customers during the pandemic. And while this has given data centers a competitive edge, it isn’t to say that data centers haven’t been affected by the global chip shortage.


Some of the components data centers have struggled to source include those needed to put together their data center switching systems. These include BMC chips, capacitors, resistors, circuit boards, etc. Another challenge is the extended lead times due to wafer and substrate shortages, as well as reduced assembly capacity.

LED Lighting

LED backlights, common in most display screens, are powered by hard-to-find semiconductor chips. Gadgets with LED lighting features are now priced higher due to the shortage of raw materials and increased market demand, and this is expected to continue into the beginning of 2022.

Renewable Energy- Solar and Turbines

Renewable energy systems, particularly solar and turbines, rely on semiconductors and sensors to operate. The global supply chain constraints have hurt the industry and even forced some energy solutions manufacturers like Enphase Energy to

Semiconductor Trends: What to Expect Moving Forward

In response to the global chip shortage, several component manufacturers have ramped up production to help mitigate the shortages. However, top electronics and semiconductor manufacturers say the crunch will only worsen before it gets better. Most of these industry leaders speculate that the semiconductor shortage could persist into 2023.

Based on the ongoing disruption and supply chain volatility, various analysts in a recent CNBC article and Bloomberg interview echoed their views, and many are convinced that the coming year will be challenging. Here are some of the key takeaways:

Pat Gelsinger, CEO of Intel Corp., noted in April 2021 that the chip shortage would recover after a couple of years.

DigiTimes Report found that Intel and AMD server ICs and data centers have seen their lead times extend to 45 to 66 weeks.

The world’s third-largest EMS and OEM provider, Flex Ltd., expects the global semiconductor shortage to proceed into 2023.

In May 2021, Global Foundries, the fourth-largest contract semiconductor manufacturer, signed a $1.6 billion, 3-year silicon supply deal with AMD, and in late June it launched its new $4 billion, 300mm-wafer facility in Singapore. Yet the company says the added capacity will not meaningfully increase component production until 2023 at the earliest.

TSMC, one of the leading pure-play foundries in the industry, says it won't meaningfully increase component output until 2023. However, it is optimistic that it can ramp up the fabrication of automotive micro-controllers by 60% by the end of 2021.

From the industry insights above, it’s evident that despite the many efforts that major players put into resolving the global chip shortage, the bottlenecks will probably persist throughout 2022.

Additionally, some industry observers believe that the move by big tech companies such as Amazon, Microsoft, and Google to design their own chips for cloud and data center business could worsen the chip shortage crisis and other problems facing the semiconductor industry.

In a recent article, the authors hint that the entry of Microsoft, Amazon, and Google into the chip design market will be a turning point in the industry. These tech giants have the resources to design superior and cost-effective chips of their own, resources that most chip designers, such as Intel, have only in limited measure.

Since these tech giants will become independent, each will be looking to create component stockpiles to endure long waits and meet production demands between inventory refreshes. Again, this will further worsen the existing chip shortage.

Possible Solutions

To stay ahead of the game, major industry players such as chip designers and manufacturers and the many affected industries have taken several steps to mitigate the impacts of the chip shortage.

For many chip makers, expanding their production capacity has been an obvious response. Other suppliers in certain regions decided to stockpile and limit exports to better respond to market volatility and political pressures.

Similarly, improving the yields or increasing the number of chips manufactured from a silicon wafer is an area that many manufacturers have invested in to boost chip supply by some given margin.


Here are the other possible solutions that companies have had to adopt:

Embracing flexibility to accommodate older chip technologies that may not be “state of the art” but are still better than nothing.

Leveraging software solutions such as smart compression and compilation to build efficient AI models to help unlock hardware capabilities.

Conclusion

The latest global chip shortage has led to severe shocks in the semiconductor supply chain, affecting several industries from automobile, consumer electronics, data centers, LED, and renewables.

Industry thought leaders believe that shortages will persist into 2023 despite the current build-up in mitigation measures. And while full recovery will not be witnessed any time soon, some chip makers are optimistic that they will ramp up fabrication to contain the demand among their automotive customers.

That said, staying ahead of the game is an all-time struggle considering this is an issue affecting every industry player, regardless of size or market position. Expanding production capacity, accommodating older chip technologies, and leveraging software solutions to unlock hardware capabilities are some of the promising solutions.

Added

This article is being updated continuously. If you want to share any comments on FS switches, or if you are inclined to test and review our switches, please email us via media@fs.com or inform us on social media platforms. We cannot wait to hear more about your ideas on FS switches.

Article Source: The Chip Shortage: Current Challenges, Predictions, and Potential Solutions

Related Articles:

Impact of Chip Shortage on Datacenter Industry

Infographic – What Is a Data Center?

The Most Common Data Center Design Missteps

Introduction

The goal of data center design is to provide IT equipment with a high-quality, standard, safe, and reliable operating environment that fully meets the environmental requirements for stable, reliable operation of IT devices and prolongs the service life of computer systems. Design is the most important part of data center construction and directly determines the success or failure of long-term planning, so it should be professional, advanced, integral, flexible, safe, reliable, and practical.

9 Missteps in Data Center Design

Data center design is one of the effective solutions to overcrowded or outdated data centers, while inappropriate design results in obstacles for growing enterprises. Poor planning can lead to a waste of valuable funds and more issues, increasing operating expenses. Here are 9 mistakes to be aware of when designing a data center.

Miscalculation of Total Cost

Data center operating expense is made up of two key components: maintenance costs and operating costs. Maintenance costs are the costs associated with maintaining all critical facility support infrastructure, such as OEM equipment maintenance contracts and data center cleaning fees. Operating costs are the costs associated with day-to-day operations and field personnel, such as the creation of site-specific operational documentation, capacity management, and QA/QC policies and procedures. If you plan to build or expand a business-critical data center, the best approach is to focus on three basic parameters: capital expenditures, operating and maintenance expenses, and energy costs. If any of these is taken out of the equation, the resulting model may not properly align with the organization's risk profile and business spending profile.

Unspecified Planning and Infrastructure Assessment

Infrastructure assessment and clear planning are essential processes for data center construction. For example, every construction project needs a chain of command that clearly defines areas of responsibility, including who is responsible for each aspect of data center design. Those involved need to evaluate the potential applications of the data center infrastructure and the types of connectivity they require. In general, planning involves a rack-by-rack blueprint covering network connectivity and mobile devices, power requirements, system topology, cooling facilities, virtual local and on-premises networks, third-party applications, and operational systems. Given the importance of data center design, you should have a thorough understanding of the required functionality before construction begins; otherwise, the facility will fall short and cost more to maintain.


Inappropriate Design Criteria

Two missteps can send enterprises into an overspending death spiral. First, everyone has different design ideas, but not everyone is right. Second, the actual business may be mismatched with the desired vision and not support the chosen kilowatts per square foot or per rack. Overplanning in design is a waste of capital, and higher-tier facilities also bring higher operational and energy costs. A data center designer establishes the proper design criteria and performance characteristics and then builds capital expenditure and operating expenses around them.

Unsuitable Data Center Site

Enterprises often need to find the right building location when designing a data center, and missing site-critical information leads to problems. Large users understand data centers well and have concerns about power availability and cost, fiber connectivity, and force majeure factors. Baseline users often work from a business-model shell in their core business areas that decides whether they need to build or refurbish. Premature site selection or an unsuitable geographic location will therefore fail to meet the design requirements.

Pre-design Space Planning

It is also very important to plan the space capacity inside the data center. The ratio of raised-floor space to support space can be as high as 1 to 1, and the mechanical and electrical equipment needs enough room to be accommodated. In addition, office areas and IT equipment storage areas need to be considered. It is therefore critical to estimate and plan space capacity during data center design: estimation errors can make the design unsuitable for the site, which means suspending the project for re-evaluation and possibly repurchasing components.

Mismatched Business Goals

Enterprises need to clearly understand their business goals when debugging a data center so that they can complete the data center design. After meeting the business goals, something should be considered, such as which specific applications the data center supports, additional computing power, and later business expansion. Additionally, enterprises need to communicate these goals to data center architects, engineers, and builders to ensure that the overall design meets business needs.

Design Limitations

The importance of modular design is well publicized in the data center industry. Although the modular approach—adding infrastructure incrementally as it is needed—preserves capital, it doesn't guarantee success on its own. Modular and flexible design is the key to long-term stable operation and to meeting your data center plans. On the power system, make sure UPS (Uninterruptible Power Supply) capacity can be added to existing modules without system disruption. Input and output distribution system design shouldn't be overlooked either; it allows the data center to adapt to future changes in the underlying construction standards.

Improper Data Center Power Equipment

To design a data center that maximizes equipment uptime and reduces power consumption, you must choose the right power equipment based on the projected capacity. A common mistake is to size power for roughly triple the predicted server usage to ensure adequate headroom, which is wasteful. Long-term power consumption trends are what you need to consider. Install automatic power-on generators and backup power sources, and choose equipment that can provide enough power to support the data center without waste.
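As a rough illustration of sizing from projected capacity rather than a blanket 3x margin, the sketch below estimates the critical IT load from rack count and target density, adds a modest growth allowance, and sizes N+1 UPS modules. All figures are invented for the example.

```c
/* Back-of-envelope power sizing sketch with made-up numbers.
 * Compile with: gcc ups_sizing.c -lm */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double racks = 40, kw_per_rack = 8.0;        /* projected, not worst case */
    double it_load_kw = racks * kw_per_rack;     /* 320 kW critical load */
    double growth_headroom = 1.25;               /* planned growth, not 3x */
    double design_load_kw = it_load_kw * growth_headroom;

    double ups_module_kw = 100.0;
    int needed = (int)ceil(design_load_kw / ups_module_kw);   /* 4 modules */
    printf("Design load: %.0f kW -> %d UPS modules (+1 redundant = %d)\n",
           design_load_kw, needed, needed + 1);
    return 0;
}
```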

Over-complicated Design

In many cases, redundancy targets introduce complexity, and if multiple paths are added to build a modular system, things can quickly get complicated. An over-complicated data center design means more equipment and components, and these components are potential sources of failure, which can cause problems such as:

  • Human error. Mistakes in data and statistics make system data vulnerable and increase operational risk.
  • Cost. In addition to the extra equipment and components, repairing failed components incurs further charges.
  • Design concept. If maintainability is not considered in the design, normal system operation and even personnel safety can be affected when the IT team needs to operate or service the equipment.

Conclusion

Avoid the nine missteps above to find the right design for your data center IT infrastructure and build a data center that suits your business. Design missteps affect enterprises in areas such as business expansion, infrastructure maintenance, and security. Hence, all infrastructure facilities and data center standards must be rigorously evaluated during design to ensure long-term stable operation within a reasonable budget.

Article Source: The Most Common Data Center Design Missteps

Related Articles:

How to Utilize Data Center Space More Effectively?

Data Center White Space and Gray Space