Lenovo Weather and Climate Platform Guide

Planning / Implementation

Published
23 May 2025
Form Number
LP2220
PDF size
21 pages, 2.1 MB

Abstract

The Lenovo EveryScale Weather & Climate HPC Reference Architecture is designed to enhance weather forecasting and climate modeling capabilities. This architecture leverages Lenovo ThinkSystem SC750 V4 Neptune™ servers, featuring Intel Xeon 6900-series processors, and integrates advanced water-cooling technology for optimal performance and energy efficiency. The solution includes high-speed InfiniBand networking and MRDIMM memory technology, providing the computational power needed for accurate weather predictions and climate simulations.

This guide is intended for sales architects, customers, and partners who seek to deploy a validated HPC infrastructure solution for weather and climate research.

Introduction

Weather forecasting is vitally important to society, helping people prepare for and respond to continuously shifting environmental conditions. Accurate weather predictions affect agriculture, transportation, disaster management, and everyday activities, helping to protect lives and property.

Climate modeling, on the other hand, helps us understand long-term changes and trends, enabling policymakers and scientists to develop strategies to combat the climate crisis and take actions to mitigate its consequences.

To gain insights into weather and climate, numerical simulations are essential. These simulations use advanced computing technologies to provide detailed analysis, modeling, and projections. High-performance computing systems, such as those used by Environment and Climate Change Canada (ECCC), allow researchers to run complex simulations that predict weather and climate phenomena more accurately.

This paper describes the supercomputer design at ECCC and serves as a platform guide and reference architecture for weather and climate environments. Beginning with the Intel Xeon 6 CPU and its distinct memory subsystem, and extending to the server, compute infrastructure, rack layout, and network topology, the reference architecture aims to provide the foundation of performance and reliability required in the Weather & Climate vertical.

Case Study - Environment & Climate Change Canada

Canada uses the Global Environmental Multi-Scale Model (GEM), an advanced weather forecasting and data assimilation system developed by the Recherche en Prévision Numérique (RPN), Meteorological Research Branch (MRB), and the Canadian Meteorological Centre (CMC). The GEM Global Forecast System and the Global Deterministic Prediction System (GDPS) are being used for global data assimilation cycles and medium-range forecasting, as well as regional data assimilation spin-up cycles and short-range forecasting.

In addition to the operational GEM model, several experimental versions are being utilized to offer higher resolution insights and serve as test environments for new parameterizations, thereby continuously enhancing operations.

As is typical for mission-critical national weather forecasting, Canada deploys two supercomputer environments. These are generally the two most powerful supercomputers in Canada and are represented on the Top500 ranking of the most powerful commercially available high-performance computing systems in the world.

Since 2021, these systems have utilized Lenovo Neptune technology. Lenovo Neptune supports energy-efficient supercomputing by improving operational and cooling efficiency. Consequently, the systems ranked among the top three general-purpose environments on the Green500 list of the most energy-efficient commercially available high-performance computing systems.

In addition to the supercomputers that execute large-scale numerical simulations, two pre-processing and post-processing environments are employed for data preparation and visualization of results. These environments facilitate the dissemination of products such as weather forecasts, alerts and warnings, and weather and climate projections to various stakeholders.

Capacity computing refers to systems optimized to solve numerous small problems simultaneously. These systems maximize throughput, handling a high volume of tasks efficiently. Capacity computing is suitable for applications where many independent or loosely coupled tasks need to be processed in parallel, achieving high productivity by completing a large number of small-scale computations.

Capability computing, on the other hand, is designed to tackle large, complex problems that require substantial computational power. These systems focus on the ability to solve individual, computationally intensive tasks quickly. Capability computing is essential for applications where the priority is to execute major simulations or analyses that demand significant resources. The cost of delivering computational performance as capability is typically higher due to the need for advanced hardware and substantial processing power to handle demanding algorithms.

In high-performance computing (HPC), both capacity and capability computing play crucial roles. For organizations like Environment and Climate Change Canada (ECCC), capability computing is particularly important because it allows the rapid processing of large-scale weather and climate models, which are vital for accurate forecasts and risk assessments. Capacity computing complements this by efficiently generating the numerous products that depict the forecasted environmental state for various users.

The Lenovo ThinkSystem SC750 V4 Neptune, featuring Intel Xeon 6 processors, is engineered to provide the optimal balance between capacity and capability computing in numerical simulation for weather forecasting and climate research.

Overview

The Lenovo EveryScale Weather & Climate reference architecture is built on a Scalable Unit (SU) structure with 192 servers and 384 CPUs and scales up to 10 Scalable Units with 2,048 servers and 4,096 CPUs within a standard Fat Tree network topology. With other scalable network topologies such as Dragonfly+, the reference architecture can grow considerably further.

A Scalable Unit consists of four compute racks and a central network rack that houses both high-speed InfiniBand leaf and spine switches, as well as Ethernet connectivity. The Lenovo Heavy Duty Rack Cabinets offer sufficient cable routing channels to efficiently direct InfiniBand connections to the adjacent network rack, while also accommodating all necessary Ethernet connections. These five-rack SUs are designed to scale conveniently, providing growth on demand for forecasting models of any size.

Lenovo EveryScale Weather & Climate Scalable Unit
Figure 1. Lenovo EveryScale Weather & Climate Scalable Unit
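
As a quick cross-check of these figures, the following minimal Python sketch reproduces the per-SU server and CPU counts from the rack composition described in the Components section (three N1380 enclosures per rack, eight SC750 V4 trays per enclosure, two nodes per tray, and two processors per node).

  # Worked arithmetic for one Scalable Unit (SU), using the figures given
  # elsewhere in this guide: 4 compute racks per SU, 3 N1380 enclosures per
  # rack, 8 SC750 V4 trays per enclosure, 2 nodes per tray, 2 CPUs per node.

  racks_per_su        = 4   # plus one central network rack
  enclosures_per_rack = 3
  trays_per_enclosure = 8
  nodes_per_tray      = 2
  cpus_per_node       = 2

  trays_per_su = racks_per_su * enclosures_per_rack * trays_per_enclosure
  nodes_per_su = trays_per_su * nodes_per_tray
  cpus_per_su  = nodes_per_su * cpus_per_node

  print(f"Trays per SU: {trays_per_su}")   # 96
  print(f"Nodes per SU: {nodes_per_su}")   # 192
  print(f"CPUs per SU:  {cpus_per_su}")    # 384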

Components

The main hardware components of the Lenovo Weather and Climate reference architecture are the compute nodes and the networking infrastructure. As an integrated solution, they come together in a Lenovo EveryScale Rack (Machine Type 1410).

Compute Infrastructure

The Compute Infrastructure is built on the latest generation of Lenovo Neptune Supercomputing systems.

Lenovo ThinkSystem N1380

The ThinkSystem N1380 Neptune chassis is the core building block, built to enable exascale-level performance while maintaining a standard 19-inch rack footprint. It uses liquid cooling to remove heat and increase performance and is engineered for the next decade of computational technology.

Lenovo ThinkSystem N1380 Enclosure
Figure 2. Lenovo ThinkSystem N1380 Enclosure

The N1380 features an integrated manifold with a patented blind-mate mechanism and aerospace-grade dripless connectors to the compute trays, ensuring safe and seamless operation. The unique design of the N1380 eliminates the need for internal airflow and power-consuming fans. As a result, it reduces typical data center power consumption by up to 40% compared to similar air-cooled systems.

This newly developed enclosure incorporates up to four ThinkSystem 15kW Titanium Power Conversion Stations (PCS). These stations are directly fed with high current three-phase power and supply power to an internal 48V busbar, which in turn powers the compute trays. The PCS design is a game-changer, merging power conversion, rectification, and distribution into a single package. This is a significant transformation from traditional setups that require separate rack PDUs, additional cables and server power supplies, resulting in best-in-class efficiency.

Each 13U Lenovo ThinkSystem N1380 Neptune enclosure houses eight Lenovo ThinkSystem SC-series Neptune trays. Up to three N1380 enclosures fit into a standard 19" rack cabinet, packing 24 trays into just two 60 x 60 cm data center floor tiles.
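
The rough power arithmetic below, a sketch based only on the nominal ratings quoted in this guide (four 15 kW PCS units per enclosure, 500 W per Xeon 6980P, eight trays per enclosure with four processors each), illustrates the headroom of the enclosure design. Usable capacity depends on the N+N over-subscription policy, and memory, adapters, and drives are not included, so treat these as ballpark figures.

  # Nominal enclosure power capacity vs. CPU TDP load, plus rack density.
  # Figures are taken from the enclosure and tray descriptions in this guide.

  pcs_per_enclosure   = 4
  pcs_rating_kw       = 15
  trays_per_enclosure = 8
  cpus_per_tray       = 4
  cpu_tdp_w           = 500
  enclosures_per_rack = 3

  nominal_capacity_kw = pcs_per_enclosure * pcs_rating_kw                   # 60 kW
  cpu_load_kw = trays_per_enclosure * cpus_per_tray * cpu_tdp_w / 1000      # 16 kW

  print(f"Nominal PCS capacity per enclosure: {nominal_capacity_kw} kW")
  print(f"CPU TDP load per enclosure:         {cpu_load_kw} kW")
  print(f"Trays per rack (2 floor tiles):     {enclosures_per_rack * trays_per_enclosure}")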

The following table lists the configuration of the N1380 Enclosures.

Table 1. Configuration of the N1380 Enclosure
Part number Description Quantity
7DDHCTOLWW Lenovo ThinkSystem N1380 Neptune Enclosure 1
BYKR ThinkSystem N1380 Neptune Enclosure Midplate Assembly 1
C4KW 0.95M, 63A 240-415V, 3-Phase Y-Splitter Floor Power Cable 2
BYKV 2.8M, 63A 240-415V, 3-Phase WYE IEC to Y-Splitter Rack Power Cable 2
BE0E N+N Redundancy With Over-Subscription 1
BYKK ThinkSystem N1380 Neptune EPDM Hose Connection 1
BYKJ ThinkSystem N1380 Neptune System Management Module V3 1
BYJZ ThinkSystem N1380 Neptune Enclosure 1
BYKH ThinkSystem N1380 Neptune 15kW 3-Phase 200-480V Titanium Power Conversion Station 4
5WS7C20194 5Yr Premier 24x7 4Hr Resp N1380 Neptune Enclosure 1

Lenovo ThinkSystem SC750 V4

The ThinkSystem SC750 V4 Neptune node is the next-generation high-performance server based on the sixth generation Lenovo Neptune direct water cooling platform.

The Lenovo ThinkSystem SC750 V4 server tray with two distinct two-socket nodes
Figure 3. Lenovo ThinkSystem SC750 V4 Neptune Server Tray

Supporting the Intel Xeon 6900P-series, the ThinkSystem SC750 V4 Neptune stands as a powerhouse for demanding HPC workloads. Its industry-leading direct water-cooling system ensures steady heat dissipation, allowing CPUs to maintain accelerated operation and achieve up to a 10% performance enhancement.

With 12 channels of high-speed DDR5 RDIMMs per processor, or high-bandwidth MRDIMMs running at 8800 MT/s, it excels in memory bandwidth-bound workloads, positioning it as a preferred choice for meteorology and engineering applications like WRF, ICON, OpenFOAM, and Fluent.
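
The short Python sketch below estimates the theoretical peak memory bandwidth behind this claim, assuming the usual 64-bit (8-byte) data path per DDR5 channel; applications such as WRF or NEMO will sustain only a fraction of this peak.

  # Theoretical peak memory bandwidth per socket with 12 channels of
  # 8800 MT/s MRDIMMs, assuming a 64-bit (8-byte) data path per channel.

  channels_per_socket = 12
  transfers_per_sec   = 8800e6   # 8800 MT/s
  bytes_per_transfer  = 8        # 64-bit channel data width

  peak_bw_gbps = channels_per_socket * transfers_per_sec * bytes_per_transfer / 1e9
  print(f"Peak memory bandwidth per socket: {peak_bw_gbps:.1f} GB/s")  # ~844.8 GB/s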

Completing the package with support for high-performance NVMe storage and high-speed, low-latency networking with the latest InfiniBand, Omni-Path, and Ethernet choices, the SC750 V4 is your all-in-one solution for HPC workloads.

At its core, Lenovo Neptune applies 100% direct warm-water cooling, maximizing performance and energy efficiency without sacrificing accessibility or serviceability. The SC750 V4 is installed into the ThinkSystem N1380 Neptune enclosure which itself integrates seamlessly into a standard 19" rack cabinet. Featuring a patented blind-mate stainless steel dripless quick connection, SC750 V4 node trays can be added “hot” or removed for service without impacting other node trays in the enclosure.

This modular design ensures easy serviceability and extreme performance density, making the SC750 V4 the go-to choice for compute clusters of all sizes - from departmental/workgroup levels to the world’s most powerful supercomputers – from Exascale to Everyscale.

Intel Xeon 6900-Series processors with P-cores

Intel Xeon 6 processors with P-cores are optimized for high performance per core. With more cores, double the memory bandwidth, and AI acceleration in every core, Intel Xeon 6 processors with P-cores provide twice the performance for the widest range of workloads, including HPC and AI.

The Intel Xeon 6900P-series processors, boasting up to 128 Performance-cores, offer superior performance that makes them ideal for vectorized workloads in fields like biology and chemistry, including applications such as NAMD, GROMACS, LAMMPS, CP2K, and Quantum ESPRESSO, and they also boost performance for machine learning and deep learning workloads.

With AI acceleration in every core, Intel Advanced Matrix Extensions (Intel AMX) speed up inferencing for INT8 and BF16 and support FP16-trained models, delivering up to 2,048 INT8 operations per cycle per core and 1,024 BF16/FP16 operations per cycle per core. This can result in up to 2x higher generation-over-generation performance on Llama-13B (BF16).
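
As a back-of-the-envelope illustration, the sketch below combines the per-core AMX figures above with the 128 cores and 2.0 GHz base clock listed for the Xeon 6980P in the parts table. Sustained clocks under full AMX load differ, so these are upper bounds rather than measured results.

  # Peak AMX throughput estimate for one Xeon 6980P socket.

  cores               = 128
  base_clock_ghz      = 2.0
  int8_ops_per_cycle  = 2048
  bf16_ops_per_cycle  = 1024

  int8_tops   = cores * base_clock_ghz * int8_ops_per_cycle / 1000
  bf16_tflops = cores * base_clock_ghz * bf16_ops_per_cycle / 1000

  print(f"Peak INT8: ~{int8_tops:.0f} TOPS per socket")      # ~524
  print(f"Peak BF16: ~{bf16_tflops:.0f} TFLOPS per socket")  # ~262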

In the field of weather and climate prediction, performance improvements are also observed. Consider NEMO in ocean modelling: among its many parameters, the most important are ocean temperature and velocity. Using Intel Xeon 6 processors, ocean simulations run up to 2.35x faster than on the previous generation of Xeon processors, comparing a timed step process from start to finish. For climate prediction codes such as GEM, we also see better than a 2x improvement in performance due to the increased core count, additional memory channels, and enhanced I/O capability compared to the previous generation of processors.

MRDIMM technology

MRDIMM technology, or multiplexed rank DIMMs, represents a significant advancement in memory performance, particularly for memory-intensive workloads such as climate forecasting models like MOM6, NEMO, WRF, and GEM. Companies like Micron have been at the forefront of developing high-speed memory solutions, contributing to the reliability and efficiency of MRDIMMs.

MRDIMM operation
Figure 4. MRDIMM Multiplex Functionality

Operating at speeds of up to 8800 MT/s, these DIMMs drastically reduce the time required to access memory; in combination with Xeon 6, this leads to up to a 200% performance improvement over the previous Xeon generation. This enhancement allows for lower power consumption, or increased performance with a limited increase in energy usage, making MRDIMMs an important choice for modern high-performance computing systems, such as those used for weather and climate research.

Configuration

Each node is equipped with two Intel Xeon 6980P CPUs, each comprising 128 Xeon 6 P-cores. This configuration provides 256 Xeon 6 P-cores and 768 GB of MRDIMM RAM per node (1.5 TB per two-node tray), making the SC750 V4 highly suited for core- and RAM-intensive tasks. Utilizing high-speed, low-latency network adapters at 200, 400, or 800 Gbps, the SC750 V4, when paired with Intel Xeon 6, offers exceptional scalability for the most demanding parallel MPI jobs.
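
The memory figures follow directly from the parts table below (48x 32 GB MRDIMMs per tray, two nodes per tray, two sockets per node, 12 memory channels per socket at one DIMM per channel), as the short sketch shows.

  # Per-node and per-tray memory arithmetic from the SC750 V4 tray configuration.

  dimm_size_gb     = 32
  dimms_per_tray   = 48
  nodes_per_tray   = 2
  channels_per_cpu = 12
  cpus_per_node    = 2

  dimms_per_node  = dimms_per_tray // nodes_per_tray    # 24
  memory_per_node = dimms_per_node * dimm_size_gb       # 768 GB
  memory_per_tray = dimms_per_tray * dimm_size_gb       # 1536 GB = 1.5 TB

  assert dimms_per_node == channels_per_cpu * cpus_per_node  # one DIMM per channel
  print(f"Memory per node: {memory_per_node} GB, per tray: {memory_per_tray} GB")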

The following table lists the configuration of the SC750 V4 Trays.

Table 2. Configuration of the SC750 V4 trays
Part number Description Quantity
7DDJCTOLWW Lenovo ThinkSystem SC750 V4 Neptune Tray 1
BZ7E ThinkSystem SC750 V4 ConnectX-7 Auxiliary Cable 1
5977 Select Storage devices - no configured RAID required 1
BPKR TPM 2.0 1
C2WQ Intel Xeon 6980P 128C 500W 2.0GHz Processor 4
BKSP ThinkSystem NVIDIA ConnectX-7 NDR OSFP400 1-port PCIe Gen5 x16 InfiniBand Adapter (SharedIO) DWC 1
C0TY ThinkSystem 32GB TruDDR5 8800MHz (2Rx8) MRDIMM 48
BYK4 ThinkSystem SC750 V4 Neptune Tray 1
B7Y0 Enable IPMI-over-LAN 1
5WS7C20199 5Yr Premier 24x7 4Hr Resp SC750 V4 Neptune Tray 1

Network Infrastructure

The Network Infrastructure is built on NVIDIA networking technology for both InfiniBand and Ethernet.

High Performance Network

The scalable unit consists of four compute racks, each housing 48 nodes in 24 Lenovo ThinkSystem SC750 V4 compute trays (two nodes per tray). Each compute tray is equipped with a high-speed InfiniBand adapter that supports up to 400 Gbps. The SC750 V4's InfiniBand adapters are Lenovo SharedIO capable and feature an internal high-speed PCIe link between compute nodes. This configuration allows both nodes within a single compute tray to access a single 400 Gbps InfiniBand connection, thereby reducing the cabling and switching requirements by up to 50% in the overall design.

This streamlined approach offers a cost-effective solution for scaling CPU and memory-intensive workloads. By maintaining a balanced design, customers can accurately scale their workload when CPU and memory tasks heavily outweigh inter-node communication requirements. This optimized configuration results in significant cost savings while delivering optimal price and performance.
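
The sketch below quantifies the SharedIO saving for one Scalable Unit, using the figures above: 96 trays per SU, one 400 Gbps adapter per tray shared by the tray's two nodes, versus one adapter and cable per node without SharedIO.

  # Cabling impact of SharedIO for one Scalable Unit.

  trays_per_su = 96
  nodes_per_su = 192

  links_with_sharedio    = trays_per_su   # one InfiniBand link per tray
  links_without_sharedio = nodes_per_su   # one InfiniBand link per node

  savings = 1 - links_with_sharedio / links_without_sharedio
  print(f"InfiniBand links per SU: {links_with_sharedio} vs {links_without_sharedio}")
  print(f"Cable/port reduction:    {savings:.0%}")  # 50%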

The following table lists the configuration of the NVIDIA QM9790 NDR InfiniBand Leaf Switch.

Table 3. Configuration of the NVIDIA QM9790 NDR InfiniBand Leaf Switch
Part number Description Quantity per system
0724HED NVIDIA QM9790 64-Port Unmanaged Quantum NDR InfiniBand Switch (oPSE) 1
BP64 NVIDIA QM9790 64-Port Unmanaged Quantum NDR InfiniBand Switch (oPSE) 1
BRQ6 2.8m, 10A/100-250V, C15 to C14 Jumper Cord 2
BQJD ThinkSystem NDRx2 OSFP800 IB Multi Mode Twin-Transceiver 16
BQJN Lenovo 3M NVIDIA NDR Multi Mode Optical Cable 4
BQJR Lenovo 10M NVIDIA NDR Multi Mode Optical Cable 14
BQJS Lenovo 20M NVIDIA NDR Multi Mode Optical Cable 14
BQJX Lenovo 2M NVIDIA NDRx2 OSFP800 to 2x NDR OSFP400 Passive Copper Splitter Cable 8
BQJY Lenovo 3M NVIDIA NDRx2 OSFP800 to 2x NDR OSFP400 Passive Copper Splitter Cable 8
BP66 NVIDIA QM97xx Enterprise Rack Mount Kit 1
5WS7B96633 5Yr Premier 24x7 4Hr Resp NVID QM9790 oPSE 1

Management Network

Cluster management is usually done over Ethernet, and the SC750 V4 offers multiple options. It comes with 25GbE SFP28 Ethernet ports, a Gigabit Ethernet port, and a dedicated XClarity Controller (XCC) port. These can be customized based on cluster management and workload needs.

For stable environments with infrequent OS changes, such as weather and climate systems, the single Gigabit Ethernet port suffices. A single CAT5e or CAT6 cable per node can carry both cluster management and remote out-of-band management traffic over one wire using the Network Controller Sideband Interface (NC-SI). For higher bandwidth needs or frequent updates, the 25Gb Ethernet interfaces offer additional capacity and also support sideband communication to the XCC.

Front view of the tray with two ThinkSystem SC750 V4 nodes
Figure 5. Front view of the SC750 V4 with management ports

The SC750 V4 integrates the XCC through the Data Center Secure Control Module (DC-SCM) I/O board. This module also includes a Root of Trust module (NIST SP800-193 compliant), USB 3.2 ports, a VGA port, and MicroSD card capability for additional storage with the XCC, offering firmware storage options up to 4GB, including N-1 firmware history.

The N1380 enclosure features a System Management Module 3 (SMM) at the rear, managing both the enclosure and individual servers through a web browser or Redfish/IPMI 2.0 commands. The SMM provides remote connectivity to XCC controllers, node-level reporting, power control, enclosure power management, thermal management, and inventory tracking.
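
As a minimal illustration of this Redfish access, the Python sketch below polls an SMM for basic chassis status. It assumes only the standard DMTF Redfish service root; the host address and credentials are placeholders, and the exact chassis and manager resources exposed by the SMM are deployment-specific.

  # Minimal sketch: query an N1380 SMM over Redfish and print chassis status.
  # SMM_HOST and AUTH are placeholders; resource layout may vary by firmware.

  import requests

  SMM_HOST = "10.0.0.10"             # placeholder SMM address
  AUTH     = ("USERID", "PASSW0RD")  # placeholder credentials

  base = f"https://{SMM_HOST}/redfish/v1"

  # Walk the standard Chassis collection and report whatever each member advertises.
  chassis = requests.get(f"{base}/Chassis", auth=AUTH, verify=False).json()
  for member in chassis.get("Members", []):
      detail = requests.get(f"https://{SMM_HOST}{member['@odata.id']}",
                            auth=AUTH, verify=False).json()
      print(detail.get("Id"),
            detail.get("PowerState"),
            detail.get("Status", {}).get("Health"))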

In the context of this Weather & Climate reference architecture, the gigabit Ethernet interfaces of each SC750 V4 compute tray are linked to an NVIDIA SN2201 Management Leaf switch, ensuring connectivity for the compute nodes' cluster management, out-of-band management, and N1380 enclosure systems management modules (SMM).

The following table lists the configuration of the NVIDIA SN2201 1GbE Management Leaf Switches.

Table 4. Configuration of the NVIDIA SN2201 1GbE Management Leaf Switches
Part number Description Quantity per system
7D5FCTOGWW NVIDIA SN2201 1GbE Managed Switch with Cumulus (oPSE) 1
BPC8 NVIDIA SN2201 1GbE Managed Switch with Cumulus (oPSE) 1
6201 1.5m, 10A/100-250V, C13 to IEC 320-C14 Rack Power Cable 2
3793 3m Yellow Cat5e Cable 1
B306 Mellanox QSA 100G to 25G Cable Adapter 2
BFH2 Lenovo 25Gb SR SFP28 Ethernet Transceiver 2
BSNA NVIDIA SN2201 Enterprise Rack Mount Kit for Recessed Mounting 1
5WS7B98278 5Yr Premier 24x7 4Hr Resp NVID SN2201 PSE 1

Lenovo EveryScale Solution


Figure 6. Lenovo EveryScale Heavy Duty Enterprise Rack Cabinet

The server and networking components and the operating system can come together as a Lenovo EveryScale Solution, a framework for designing, manufacturing, integrating, and delivering data center solutions, with a focus on High Performance Computing (HPC), Technical Computing, and Artificial Intelligence (AI) environments.

Lenovo EveryScale provides Best Recipe guides to ensure interoperability of hardware, software, and firmware among a variety of Lenovo and third-party components.

Addressing specific needs in the data center, while also optimizing the solution design for application performance, requires a significant level of effort and expertise. Customers need to choose the right hardware and software components, solve interoperability challenges across multiple vendors, and determine optimal firmware levels across the entire solution to ensure operational excellence, maximize performance, and drive best total cost of ownership.

Lenovo EveryScale reduces this burden on the customer by pre-testing and validating a large selection of Lenovo and third-party components, to create a “Best Recipe” of components and firmware levels that work seamlessly together as a solution. From this testing, customers can be confident that such a best practice solution will run optimally for their workloads, tailored to the client’s needs.

In addition to interoperability testing, Lenovo EveryScale hardware is pre-integrated, pre-cabled, pre-loaded with the best recipe and optionally an OS-image and tested at the rack level in manufacturing, to ensure a reliable delivery and minimize installation time in the customer data center.

Scalability

As outlined in the component selection, the Lenovo reference design for weather and climate applications employs a high-speed network utilizing NDR InfiniBand. This fifth-generation InfiniBand fabric is implemented in a Fat Tree topology, enabling scalability up to 2,048 compute nodes through Lenovo SharedIO and 32 QM9790 NDR InfiniBand leaf switches. Each leaf switch connects to the spine network via 32 NDR uplinks. The spine layer consists of 16 QM9790 NDR InfiniBand spine switches.

Scale Out Network Topology
Figure 7. Scale Out Network Topology
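
The sketch below reproduces this two-level Fat Tree sizing from the figures above: 64-port NDR switches split evenly between 32 node-facing ports and 32 uplinks per leaf, with SharedIO placing two nodes behind each 400 Gbps link.

  # Two-level Fat Tree sizing with 64-port NDR switches and SharedIO.

  switch_ports   = 64
  leaf_downlinks = switch_ports // 2   # 32 links to compute trays
  leaf_uplinks   = switch_ports // 2   # 32 links to the spine layer
  leaves         = 32
  nodes_per_link = 2                   # SharedIO: one link serves one two-node tray

  spines    = leaves * leaf_uplinks // switch_ports    # 16
  max_nodes = leaves * leaf_downlinks * nodes_per_link

  print(f"Spine switches needed: {spines}")      # 16
  print(f"Maximum compute nodes: {max_nodes}")   # 2,048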

For larger weather and climate solutions requiring extensive scalability, the Dragonfly+ topology is a highly efficient option. This topology offers a lower network diameter while maintaining high bandwidth and minimizing latency. Dragonfly+ is particularly beneficial for large node counts needing all-to-all or many-to-many communication, crucial for global weather and climate models. Adaptive routing within the Dragonfly+ fabric balances traffic across the global links between groups, optimizing performance. For smaller, regional tasks, jobs can be scheduled within a single group of the Dragonfly+ topology, providing an optimal structure for localized or tree-friendly traffic. This design emphasizes balanced communication and robustness, making it suitable for high-complexity environments.

Four Group Dragonfly+ Network Topology
Figure 8. Four Group Dragonfly+ Network Topology

The Dragonfly+ group itself is structured in a Fat Tree topology. Fat Tree topologies are ideal for smaller clusters where the workload is more localized or up/down friendly, and the complexities of managing multiple groups of a Dragonfly+ are not required. They ensure a flat communication hierarchy with balanced I/O across the whole network. The simplicity of the Fat Tree topology makes it easier to configure and maintain, providing a cost-effective solution for smaller-scale deployments. Additionally, Fat Tree networks offer predictable performance, which is crucial for applications with consistent and high throughput requirements.

Both topologies provide a solid network interconnect foundation for weather and climate solutions, addressing the needs for scalability, high bandwidth, and low latency, ensuring robust performance for complex simulations and models.

Performance

The performance characteristics of weather and climate workloads are generally balanced between compute-intensive and memory-intensive demands. Compute-intensive refers to workloads that benefit from a high number of processor cores, elevated CPU frequencies and increased instructions per cycle (IPC) capabilities. In contrast, memory-intensive workloads rely on high memory bandwidth and are particularly sensitive to the rate at which data is read from and written to memory. A large cache can also enhance performance for memory-bound workloads.

The degree to which a workload leans toward compute or memory intensity depends on factors such as the specific application, model resolution, and the physics being modeled. Higher-resolution models tend to be more memory-intensive, requiring increased memory bandwidth to run simulations efficiently. Conversely, lower-resolution models are often more compute-intensive and benefit from CPUs with higher clock speeds that accelerate scalar and vector operations.
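 
To make the compute-bound versus memory-bound distinction concrete, the sketch below estimates the machine balance of a single Xeon 6980P socket. The FP64 peak assumes two 512-bit FMA units per core (32 FP64 FLOP per cycle), which is an assumption about the microarchitecture rather than a figure from this guide; the bandwidth figure reuses the 12-channel 8800 MT/s calculation from the MRDIMM section.

  # Illustrative machine-balance estimate for one Xeon 6980P socket.

  cores          = 128
  clock_ghz      = 2.0
  flop_per_cycle = 32                 # assumed: 2x 512-bit FMA units, FP64
  peak_tflops    = cores * clock_ghz * flop_per_cycle / 1000   # ~8.2 TFLOPS

  peak_bw_gbs = 12 * 8.8 * 8          # 12 channels x 8800 MT/s x 8 B ~ 844.8 GB/s

  balance = peak_tflops * 1000 / peak_bw_gbs   # FLOP per byte of memory traffic
  print(f"Peak FP64: ~{peak_tflops:.1f} TFLOPS, peak BW: ~{peak_bw_gbs:.0f} GB/s")
  print(f"Machine balance: ~{balance:.1f} FLOP/byte")
  # Kernels with lower arithmetic intensity than this balance are memory-bound.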

Among the applications we tested, MOM6 and NEMO are ocean models, while WRF and GEM are mesoscale weather models. From a compute perspective, these four workloads presented in the plot below exhibit broadly similar behavior. However, NEMO is somewhat more memory-intensive, which contributes to its relatively stronger performance in memory bandwidth-constrained environments.

The figure below with the benchmark results highlights these distinctions.

Weather and Climate Application Performance Comparison
Figure 9. Weather and Climate Application Performance Comparison

The results presented above are derived from benchmark runs by the Lenovo HPC Innovation Center using the Lenovo ThinkSystem SC750 V4 with Intel Xeon 6 CPUs and Micron MRDIMM memory.

Lenovo TruScale

Lenovo TruScale XaaS is your set of flexible IT services that makes everything easier. Streamline IT procurement, simplify infrastructure and device management, and pay only for what you use – so your business is free to grow and go anywhere.

Lenovo TruScale is the unified solution that gives you simplified access to:

  • The industry’s broadest portfolio – from pocket to cloud – all delivered as a service
  • A single-contract framework for full visibility and accountability
  • The global scale to rapidly and securely build teams from anywhere
  • Flexible fixed and metered pay-as-you-go models with minimal upfront cost
  • The growth-driving combination of hardware, software, infrastructure, and solutions – all from one single provider with one point of accountability.

For information about Lenovo TruScale offerings that are available in your region, contact your local Lenovo sales representative or business partner.

Lenovo Financial Services

Why wait to obtain the technology you need now? No payments for 90 days and predictable, low monthly payments make it easy to budget for your Lenovo solution.

  • Flexible

    Our in-depth knowledge of the products, services and various market segments allows us to offer greater flexibility in structures, documentation and end of lease options.

  • 100% Solution Financing

    Financing your entire solution including hardware, software, and services, ensures more predictability in your project planning with fixed, manageable payments and low monthly payments.

  • Device as a Service (DaaS)

    Leverage latest technology to advance your business. Customized solutions aligned to your needs. Flexibility to add equipment to support growth. Protect your technology with Lenovo's Premier Support service.

  • 24/7 Asset management

    Manage your financed solutions with electronic access to your lease documents, payment histories, invoices and asset information.

  • Fair Market Value (FMV) and $1 Purchase Option Leases

    Maximize your purchasing power with our lowest cost option. An FMV lease offers lower monthly payments than loans or lease-to-own financing. Think of an FMV lease as a rental. You have the flexibility at the end of the lease term to return the equipment, continue leasing it, or purchase it for the fair market value. In a $1 Out Purchase Option lease, you own the equipment. It is a good option when you are confident you will use the equipment for an extended period beyond the finance term. Both lease types have merits depending on your needs. We can help you determine which option will best meet your technological and budgetary goals.

Ask your Lenovo Financial Services representative about this promotion and how to submit a credit application. For the majority of credit applicants, we have enough information to deliver an instant decision and send a notification within minutes.

Bill of materials – First Scalable Unit

This section provides an example Bill of Materials (BoM) of one Scalable Unit (SU) deployment. This example BoM includes:

  • 5x Lenovo Heavy Duty 48U Rack Cabinets
  • 12x Lenovo ThinkSystem N1380 Neptune Enclosures
  • 96x Lenovo ThinkSystem SC750 V4 Compute Dual Node Trays
  • 3x QM9790 Quantum NDR InfiniBand Leaf Switches
  • 16x QM9790 Quantum NDR InfiniBand Spine Switches
  • 4x SN2201 Spectrum Gigabit Ethernet Leaf Switches

Note: Storage is optional and not included in this BoM.

Tables in this section:

  • Table 5. Lenovo ThinkSystem N1380 Neptune Enclosure
  • Table 6. ThinkSystem SC750 V4 Neptune Tray
  • Table 7. NVIDIA QM9790 Quantum NDR InfiniBand Leaf Switches
  • Table 8. NVIDIA QM9790 Quantum NDR InfiniBand Spine Switches
  • Table 9. NVIDIA SN2201 Gigabit Ethernet Leaf Switches
  • Table 10. Lenovo Heavy Duty 48U Rack Cabinet

Lenovo ThinkSystem N1380 Neptune Enclosure

Table 5. Lenovo ThinkSystem N1380 Neptune Enclosure
Part number Description Quantity per system Total quantity
7DDHCTOLWW Lenovo ThinkSystem N1380 Neptune Enclosure 1 12
BYKR ThinkSystem N1380 Neptune Enclosure Midplate Assembly 1 12
C4KW 0.95M, 63A 240-415V, 3-Phase Y-Splitter Floor Power Cable 2 24
BYKV 2.8M, 63A 240-415V, 3-Phase WYE IEC to Y-Splitter Rack Power Cable 2 24
BE0E N+N Redundancy With Over-Subscription 1 12
BYKK ThinkSystem N1380 Neptune EPDM Hose Connection 1 12
BYKJ ThinkSystem N1380 Neptune System Management Module V3 1 12
BYJZ ThinkSystem N1380 Neptune Enclosure 1 12
BYKH ThinkSystem N1380 Neptune 15kW 3-Phase 200-480V Titanium Power Conversion Station 4 48
5WS7C20194 5Yr Premier 24x7 4Hr Resp N1380 Neptune Enclosure 1 12

ThinkSystem SC750 V4 Neptune Tray

Table 6. ThinkSystem SC750 V4 Neptune Tray
Part number Description Quantity per system Total quantity
7DDJCTOLWW Lenovo ThinkSystem SC750 V4 Neptune Tray 1 96
BZ7E ThinkSystem SC750 V4 ConnectX-7 Auxiliary Cable 1 96
5977 Select Storage devices - no configured RAID required 1 96
BPKR TPM 2.0 1 96
C2WQ Intel Xeon 6980P 128C 500W 2.0GHz Processor 4 384
BKSP ThinkSystem NVIDIA ConnectX-7 NDR OSFP400 1-port PCIe Gen5 x16 InfiniBand Adapter (SharedIO) DWC 1 96
C0TY ThinkSystem 32GB TruDDR5 8800MHz (2Rx8) MRDIMM 48 4608
BYK4 ThinkSystem SC750 V4 Neptune Tray 1 96
B7Y0 Enable IPMI-over-LAN 1 96
5WS7C20199 5Yr Premier 24x7 4Hr Resp SC750 V4 Neptune Tray 1 96

NVIDIA QM9790 Quantum NDR InfiniBand Leaf Switches

Table 7. NVIDIA QM9790 Quantum NDR InfiniBand Leaf Switches
Part number Description Quantity per system Total quantity
0724HED NVIDIA QM9790 64-Port Unmanaged Quantum NDR InfiniBand Switch (oPSE) 1 3
BP64 NVIDIA QM9790 64-Port Unmanaged Quantum NDR InfiniBand Switch (oPSE) 1 3
BRQ6 2.8m, 10A/100-250V, C15 to C14 Jumper Cord 2 6
BQJD ThinkSystem NDRx2 OSFP800 IB Multi Mode Twin-Transceiver 16 48
BQJN Lenovo 3M NVIDIA NDR Multi Mode Optical Cable 4 12
BQJR Lenovo 10M NVIDIA NDR Multi Mode Optical Cable 14 42
BQJS Lenovo 20M NVIDIA NDR Multi Mode Optical Cable 14 42
BQJX Lenovo 2M NVIDIA NDRx2 OSFP800 to 2x NDR OSFP400 Passive Copper Splitter Cable 8 24
BQJY Lenovo 3M NVIDIA NDRx2 OSFP800 to 2x NDR OSFP400 Passive Copper Splitter Cable 8 24
BP66 NVIDIA QM97xx Enterprise Rack Mount Kit 1 3
5WS7B96633 5Yr Premier 24x7 4Hr Resp NVID QM9790 oPSE 1 3

NVIDIA QM9790 Quantum NDR InfiniBand Spine Switches

Table 8. NVIDIA QM9790 Quantum NDR InfiniBand Spine Switches
Part number Description Quantity per system Total quantity
0724HED NVIDIA QM9790 64-Port Unmanaged Quantum NDR InfiniBand Switch (oPSE) 1 16
BP64 NVIDIA QM9790 64-Port Unmanaged Quantum NDR InfiniBand Switch (oPSE) 1 16
BRQ6 2.8m, 10A/100-250V, C15 to C14 Jumper Cord 2 32
BQJD ThinkSystem NDRx2 OSFP800 IB Multi Mode Twin-Transceiver 32 512
BP66 NVIDIA QM97xx Enterprise Rack Mount Kit 1 16
5WS7B96633 5Yr Premier 24x7 4Hr Resp NVID QM9790 oPSE 1 16

NVIDIA SN2201 Gigabit Ethernet Leaf Switches

Table 9. NVIDIA SN2201 Gigabit Ethernet Leaf Switches
Part number Description Quantity per system Total quantity
7D5FCTOGWW NVIDIA SN2201 1GbE Managed Switch with Cumulus (oPSE) 1 4
BPC8 NVIDIA SN2201 1GbE Managed Switch with Cumulus (oPSE) 1 4
6201 1.5m, 10A/100-250V, C13 to IEC 320-C14 Rack Power Cable 2 8
3793 3m Yellow Cat5e Cable 1 4
B306 Mellanox QSA 100G to 25G Cable Adapter 2 8
BFH2 Lenovo 25Gb SR SFP28 Ethernet Transceiver 2 8
BSNA NVIDIA SN2201 Enterprise Rack Mount Kit for Recessed Mounting 1 4
5WS7B98278 5Yr Premier 24x7 4Hr Resp NVID SN2201 PSE 1 4

Lenovo Heavy Duty 48U Rack Cabinet

Table 10. Lenovo Heavy Duty 48U Rack Cabinet
Part number Description Quantity per system Total quantity
1410P48 Lenovo EveryScale 48U Pearl Heavy Duty Rack Cabinet 1 5
BJ64 Lenovo EveryScale 48U Pearl Heavy Duty Rack Cabinet 1 5
BJPC Side Panel Right Installation 1 5
BJ2N Front Installation of 180mm Extension Kit 1 5
BJPB Side Panel Left Installation 1 5
C0D6 0U 18 C13/C15 and 18 C13/C15/C19/C21 Switched and Monitored 32A 3 Phase WYE PDU 2 2
BJ2M Rear Installation of 180mm Extension Kit 1 5
BJ68 ThinkSystem 48U Pearl Heavy Duty Rack Side Panel 2 10
BJ6A ThinkSystem 48U Pearl Heavy Duty Rack Extension 2 10
BJ67 ThinkSystem 48U Pearl Heavy Duty Rack Rear Door 1 5
2304 Integration Prep 1 5
2310 Solution Specific Test 1 5
AU8K LeROM Validation 1 5
B1EQ Network Verification 1 5
5WS7B96703 5Yr Premier 24x7 4Hr Resp EveryScale 48U Rack 1 5

Authors

Martin W Hiegl is the Executive Director of Advanced Solutions at Lenovo, responsible for the global High-Performance Computing (HPC) and Enterprise Artificial Intelligence (EAI) solution business. He oversees the global EAI and HPC functions, including Sales, Product, Development, Service, and Support, and leads a team of subject matter specialists in Solution Management, Solution Architecture, and Solution Engineering. This team applies their extensive expertise in associated technologies, Supercomputer solution design, Neptune water-cooling infrastructure, Data Science and application performance to support Lenovo’s role as the most trusted partner in Enterprise AI and HPC Infrastructure Solutions. Martin holds a Diplom (DH) from DHBW, Stuttgart, and a Bachelor of Arts (Hons) from Open University, London, both in Business Informatics. Additionally, he holds a United States patent pertaining to serial computer expansion bus connection.

David DeCastro is the Director of HPC Solutions Architecture at Lenovo leading a team of subject matter experts and application performance engineers of various disciplines across the HPC portfolio including Weather & Climate. His role consists of developing HPC solution architecture, planning roadmaps, and optimizing go to market strategies for the Lenovo HPC community. David has over 15 years of HPC experience across a variety of industries including, Higher Education, Government, Media & Entertainment, Oil & Gas and Aerospace. He has been heavily engaged optimizing HPC solutions to provide customers the best value for their research investments in a manner that delivers on time and within budget.

Kevin Dean is the HPC Performance and CAE Segment Architect on the HPC Worldwide Customer Engagement Team within the Infrastructure Solutions Group at Lenovo. The role consists of leading the HPC performance engineering process and strategy as well as providing CAE application performance support and leading the customer support for the manufacturing vertical. Kevin has 8 years of experience in HPC and AI application performance support at Lenovo plus 12 years of aerodynamic design and computational fluid dynamics experience in the US defense and automotive racing industries. Kevin holds an MS degree in Aerospace Engineering from the University of Florida and a BS degree in Aerospace Engineering from Virginia Polytechnic Institute and State University.


Trademarks

Lenovo and the Lenovo logo are trademarks or registered trademarks of Lenovo in the United States, other countries, or both. A current list of Lenovo trademarks is available on the Web at https://www.lenovo.com/us/en/legal/copytrade/.

The following terms are trademarks of Lenovo in the United States, other countries, or both:
Lenovo®
from Exascale to Everyscale®
Neptune®
ThinkSystem®
XClarity®

The following terms are trademarks of other companies:

Intel® and Xeon® are trademarks of Intel Corporation or its subsidiaries.

GDPS® is a trademark of IBM in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.