GIGABYTE and NVIDIA have long been in partnership to develop NVIDIA-Certified Systems for GPU computing use cases such as Artificial Intelligence (AI), High Performance Computing (HPC), Virtual Desktop Infrastructure (VDI), Edge Computing, 5G, Render Farms, Professional Graphics processing and more. To address the multitude of use cases, GIGABYTE offers the largest portfolio of GPU computing server solutions in the market, with modular system design and configurability in mind.
The solutions come with optimised air cooling and are ready for direct liquid cooling (DLC) and immersion cooling (in partnership with Asperitas, CoolIT, GRC, Submer and many others). The portfolio continues to expand as next-generation computing technologies from major CPU and GPU manufacturers enter the market, all aiming for the highest computing density, performance, and energy efficiency.
Among the various certified systems, the following models are of particular interest for this article: G292-Z20, R282-Z96, G492-ZD2, and immersion-cooled systems.
G292-Z20 – the Densest GPU Computing Platform
Based on the latest AMD EPYC 7002 / 7003 CPU architecture, the G292-Z20 has a single CPU socket and relies on the high AMD EPYC core count (up to 64 cores) to drive up to 8 NVIDIA GPU cards (PCIe form factor, double-slot or single-slot). Because there is only one socket, the CPU, system memory, GPUs, and network devices can share a single NUMA domain, which avoids cross-socket hops and keeps data-movement latency low. Whether deployed on bare metal or under virtualisation, the G292-Z20 is designed for an even distribution of computing resources.
G292-Z20 comes with 8x PCIe Gen4 slots for NVIDIA GPUs, 1x CPU socket for AMD EPYC, 8x DDR4 3200MHz DIMM slots, and 8x 2.5” hot-swap drive bays (2 bays support NVMe PCIe Gen3 and 6 bays support SATA/SAS). It also provides 2x PCIe Gen4 expansion slots for add-on devices such as FC HBA or storage cards and NVIDIA SmartNICs, which accelerate data transfer across nodes and clusters and enable GPUDirect RDMA. This compact, GPU-centric design is of particular interest to HPC users who work with Artificial Intelligence, molecular simulations, genomics sequencing, weather prediction, and similar workloads.
The G292-Z20 is also immersion-cooling ready; this topic is covered at the end of the article.
R282-Z96 – a Versatile, All-Purpose GPU Computing Platform
R282-Z96 comes with dual CPU sockets for AMD EPYC 7002 / 7003 processors (up to 64 cores each socket), support for up to 3 NVIDIA GPU cards (PCIe form factor, double-slot or single-slot dimensions), and extensive options for PCIe add-on card configuration.
The 32 onboard DIMM slots support up to 4TB of DDR4 ECC memory (or up to 8TB using 3DS LRDIMM modules). For local storage, R282-Z96 has an M.2 storage slot and 12x 3.5″/2.5″ SATA/SAS hot-swap HDD/SSD drive bays. An optional NVMe kit is also available for integrating U.2 NVMe PCIe Gen4 drives.
Most importantly, the R282-Z96 system design provides a NUMA-balanced layout across the two CPU domains: the system memory, the local storage, and the PCIe slots are evenly distributed between the two sockets, which reduces cross-socket traffic and performance bottlenecks in demanding workloads.
R282-Z96 is therefore an ideal solution for VDI and HPC. For instance, two NVIDIA GPU cards such as the A16 and A40 can be used for low-, mid- and high-end virtual desktops and virtual applications. The NVIDIA A30 and A100 can be used for containerisation in AI development and for molecular analysis, particle simulation, genomics sequencing, weather prediction and other HPC workloads that require balanced CPU and GPU resources.
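As an illustration of what a NUMA-balanced layout means in practice, the following minimal sketch groups display-class PCI devices (GPUs) by the NUMA node that Linux reports for them in sysfs. It is not a GIGABYTE tool. It assumes a Linux host that exposes the `class` and `numa_node` files under `/sys/bus/pci/devices`, and the function names and the "differ by at most one device" balance rule are hypothetical choices made for this example. The number of NUMA nodes also depends on BIOS settings, so `expected_nodes=2` is an assumption for a dual-socket system.

```python
from collections import defaultdict
from pathlib import Path


def pci_devices_by_numa_node(root="/sys/bus/pci/devices", class_prefix="0x03"):
    """Group PCI devices whose class code starts with class_prefix by NUMA node.

    The default prefix 0x03 is the PCI "display controller" base class,
    which covers GPUs. A node value of -1 means the kernel has no
    affinity information for that device.
    """
    root = Path(root)
    nodes = defaultdict(list)
    if not root.is_dir():
        return {}
    for dev in sorted(root.iterdir()):
        try:
            dev_class = (dev / "class").read_text().strip()
            node = int((dev / "numa_node").read_text().strip())
        except (OSError, ValueError):
            continue  # skip entries without readable attributes
        if dev_class.startswith(class_prefix):
            nodes[node].append(dev.name)
    return dict(nodes)


def is_balanced(nodes, expected_nodes=2):
    """Return True if device counts per node differ by at most one.

    Only nodes 0..expected_nodes-1 are considered, so devices with
    unknown affinity (-1) do not count towards the balance.
    """
    counts = [len(nodes.get(n, [])) for n in range(expected_nodes)]
    return sum(counts) > 0 and max(counts) - min(counts) <= 1


if __name__ == "__main__":
    layout = pci_devices_by_numa_node()
    for node, devices in sorted(layout.items()):
        print(f"NUMA node {node}: {devices}")
    print("balanced:", is_balanced(layout))
```

On a system with three GPUs, a split of two on one socket and one on the other would count as balanced under this rule, whereas all three on one socket would not.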
G492-ZD2 – the Most Powerful GPU System with NVIDIA A100 SXM4 & NVLink
G492-ZD2 is among the best-selling models at GIGABYTE: the system is based on 8x NVIDIA A100 SXM4 GPUs and 2x AMD EPYC CPU sockets, and allows up to 10x NVIDIA SmartNICs to be installed to accelerate data transfer across nodes and clusters and to enable GPUDirect RDMA. Certified for RHEL and VMware, G492-ZD2 is also well suited to providing the maximum number of Multi-Instance GPU (MIG) sessions for AI developers who run workloads in different containerised environments and need custom algorithms, libraries, and datasets to be executed in isolated user spaces.
The system employs a novel cooling solution that dedicates a cooling chamber to the NVIDIA GPUs and to the SmartNICs in the PCIe expansion slots, ensuring the highest possible airflow over the high-performance components. The system consists of two separate parts: a 3U GPU sled that sits above a 1U server housing the CPUs, system memory, storage bays, and front-facing PCIe slots. The 3U GPU sled can be swapped out as a unit for maintenance, which matters given the intricate onboard interconnects that link the GPU modules and the 1U server together.
The choice of NVIDIA A100 SXM4 modules in the G492-ZD2 is important because the NVIDIA Magnum IO GPUDirect technologies increase throughput while offloading work from the CPU. G492-ZD2 supports NVIDIA GPUDirect RDMA for direct data exchange between GPUs and third-party devices such as NICs or storage adapters. It also supports GPUDirect Storage, which provides a direct data path from storage to GPU memory that bypasses the CPU, resulting in higher bandwidth and lower latency.
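For a rough sense of scale, the arithmetic behind "maximum MIG sessions" can be sketched as follows. The sketch relies on the fact that an NVIDIA A100 can be partitioned into at most seven MIG instances, and it assumes every GPU is split into the smallest profile; larger profiles yield fewer instances per GPU. The function and constant names are hypothetical and chosen for this example only.

```python
# An NVIDIA A100 supports at most seven MIG instances per GPU.
MAX_MIG_INSTANCES_PER_A100 = 7


def max_mig_instances(gpu_count, instances_per_gpu=MAX_MIG_INSTANCES_PER_A100):
    """Upper bound on isolated MIG instances for a system.

    Assumes every GPU is partitioned into the smallest MIG profile.
    """
    if gpu_count < 0 or instances_per_gpu < 0:
        raise ValueError("counts must be non-negative")
    return gpu_count * instances_per_gpu


def can_host(requested_sessions, gpu_count=8):
    """Check whether a number of one-instance sessions fits on the system."""
    return requested_sessions <= max_mig_instances(gpu_count)


if __name__ == "__main__":
    # Eight A100 SXM4 GPUs, as in the G492-ZD2.
    print(max_mig_instances(8))
    print(can_host(60))
```

Under these assumptions, a fully populated eight-GPU system tops out at 56 isolated instances, so a request for 60 single-instance sessions would not fit on one system.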
Next-Generation HPC Readiness: Liquid-Cooling and Immersion Cooling Servers
At GIGABYTE, we are witnessing a drastic increase in the demand for Direct Liquid Cooling (DLC) and Immersion Cooling (mainly single-phase) compared to the pre-COVID era. The demand arises mainly from data centre operators and Cloud Service Providers (CSPs) who are concerned about steadily rising computing power and the resulting heat output of computing components (especially CPUs and GPUs).
We support data centres and customers in analysing their projects, energy consumption, heat dissipation, space optimisation, power usage effectiveness (PUE) and water usage effectiveness (WUE), among many other technical topics, at each step of solution design.
Taking a step further, GIGABYTE also offers installation and deployment services in cooperation with data centre infrastructure companies, to ensure that customers receive smooth project delivery and a short turn-around time to operational readiness. Most importantly, GIGABYTE strongly advises its customers to use its Proof-of-Concept (PoC) resources to validate every solution design and set of project parameters before making a decision, as many environmental factors can alter the expected performance and system stability. GIGABYTE has PoC units (in both single-phase and two-phase immersion cooling) for testing and validation of immersion-cooled servers. The server model options come in 1U/2U/4U form factors and can be modified on request to suit different use cases and workloads. GIGABYTE works with all major liquid-cooling and immersion-cooling technology partners in the market, so that customers can count on design-in compatibility of GIGABYTE total solutions with their infrastructure.
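The two efficiency metrics mentioned above follow simple ratios: PUE is total facility energy divided by IT equipment energy (dimensionless, with 1.0 as the ideal), and WUE is site water usage in litres divided by IT equipment energy in kWh. The sketch below shows the arithmetic only. The function names and the sample figures are hypothetical and do not describe any GIGABYTE deployment.

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """Power usage effectiveness: total facility energy / IT energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def wue(site_water_litres, it_equipment_kwh):
    """Water usage effectiveness in litres per kWh of IT energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return site_water_litres / it_equipment_kwh


if __name__ == "__main__":
    # Hypothetical annual figures for illustration only.
    it_kwh = 1_000_000.0
    print("PUE:", pue(1_500_000.0, it_kwh))
    print("WUE (L/kWh):", wue(500_000.0, it_kwh))
```

Both metrics should be computed over the same period (typically a year) so that the IT energy in the denominator matches the facility energy or water usage in the numerator.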
Conclusion
Looking beyond current HPC technology to Q4 2022, 2023 and later, GIGABYTE is ready to launch next-generation GPU computing solutions in partnership with NVIDIA. GIGABYTE will continue to address diverse use cases by adapting system design to real-world workflows and data centre architectures.