Published: 13 Sep 2024
Form Number: LP2018
PDF size: 27 pages, 1.3 MB

Abstract
Lenovo, a leading provider of data center solutions, has announced the publication of new liquid cooling standards designed to promote wider adoption of this efficient and sustainable technology. Drawing on over a decade of experience in developing and deploying liquid cooling solutions, Lenovo's standards aim to address key challenges and provide a framework for industry-wide collaboration.
Key features of the new standards include:
- Tiered approach: A classification system categorizes liquid cooling solutions into three tiers based on performance, efficiency, and environmental impact.
- Comprehensive guidelines: Detailed recommendations covering fluid temperatures, cooling infrastructure, safety measures, and compliance requirements.
- Reference designs: Practical examples of data center topologies and solution configurations to assist in implementation.
- Best practices: Guidance on water quality, chemical treatment, and maintenance to ensure optimal system performance and longevity.
By providing a clear and standardized framework, Lenovo's liquid cooling standards aim to accelerate the transition to more sustainable and energy-efficient data center operations.
These standards are intended to benefit data center managers, IT professionals, and engineers involved in designing and operating high-performance computing (HPC) systems, artificial intelligence (AI) training, and machine learning (ML) infrastructures. In this paper, we assume readers have a basic understanding of data center operations, cooling systems, and the challenges associated with managing high-density computing environments. Familiarity with industry standards like ASHRAE and OCP would be beneficial but not essential.
Introduction
Liquid cooling of data center servers delivers on the promise of performance, energy efficiency and reliability for high performance computing (HPC) systems, Artificial Intelligence (AI) training and machine learning inference systems, and other dense computing designs. We have experienced rapid growth in processing power and packaging density for both central processing units and graphical processing units (CPUs and GPUs). Processing power for the current processor generation has increased by a factor of three, up to 500W for CPUs and 1000W+ for GPUs. In turn, overall rack power has increased from about 15kW to 100kW+, while required device temperatures have been reduced from the range of 8x°C to about 5x°C. To reliably and efficiently cool platforms designed around the silicon vendor roadmaps, liquid cooling is now necessary.
Direct water cooling is achieved by circulating the cooling water directly through cold plates that contact, for example, the CPU thermal case, DIMMs, and other high-heat-producing components in the node. Lenovo has led the industry in delivering liquid cooling solutions since 2012 and has demonstrated reliable solutions under the Neptune® brand of liquid cooled platforms. To ensure that this deep technical know-how in liquid cooling is deployed worldwide reliably, we are documenting solution standards and reference designs so that customers have a solution matrix from which they can select and incorporate solutions into their data center environments.
Fortunately, we do not have to create data center environment standards alone. The industry standards organization ASHRAE (American Society of Heating, Refrigerating and Air-Conditioning Engineers) was founded in 1959 with a focus on providing heating and air-conditioning guidelines for engineers who work in the field of HVAC. The organization has since expanded the standards it defines and maintains to cover computer data centers. In 2021, the TC9.9 committee, responsible for all aspects of mission critical facilities, data centers, technology spaces, and electronic equipment/systems, put forth recommendations and guidelines on liquid cooling. The recommendations include definitions of classes of inlet fluid temperatures for the Facility Water System (FWS) from the primary facility. These values (such as W17 or W45) define the upper bound of the inlet fluid temperature to the compute infrastructure in degrees Celsius. The standards also document guidelines for selection of the wetted materials list, the fluid chemistry requirements, and recommendations on the approach used to maintain the integrity of the liquid loop.
In 2023, the Open Compute Project (OCP) organization began to leverage the guidelines documented by ASHRAE and extend them further to include design collateral for server reference designs. The expanded collateral produced by OCP includes items such as liquid loop designs, cold plate designs and quick connect designs.
The OCP has also focused on categorizing the facility water loop based on delivered fluid temperatures into three groups:
- Group 1: High (40-45°C)
- Group 2: Medium (30-37°C)
- Group 3: Low (15-25°C)
Lenovo recognizes that industry standards bodies and collaboration entities help the community of data center owners and hardware customers use a common language and a common set of standards, enabling them to procure, install, and maintain a diverse set of solutions from various hardware suppliers. While the approach has merit, liquid cooling is not yet standardized enough to be democratized and more readily deployed. In this document Lenovo describes the solution stack we have optimized over the past 12 years to deliver efficient cooling, categorized in a 3x3 matrix, with the overall goal of contributing our decade-plus of expertise back to the community.
The 3x3 matrix will define the rack and solution topology for Hot/Warm/Cold inlet FWS and Technology Cooling System (TCS) fluid temperatures, and can be applied to most liquid cooled AI/HPC/Enterprise solution topologies.
Solution Tiers for performance & efficiency
Lenovo has adopted OCP’s method of describing the cooling tiers to bring consistency to the different methods of efficiencies a data center can achieve. The following table describes Lenovo’s ranking for each tier using de-ionized water and Lenovo’s open-loop cold plate and water loop designs.
The design of the liquid cooling loop infrastructure that can provide the various tiers described in the table is visualized in the figure below. This layout is referenced in the ASHRAE TC 9.9 standard and is typical of how the Facility Water System and Technology Cooling System are designed as separate loops. This approach keeps facility liquid separate from technology liquid, thus allowing different liquids to be used in each loop.
Figure 1. Example liquid cooling loop that illustrates how the facility water loops and secondary supply loops to the IT equipment are designed
The tiers are described in the following subsections:
Lenovo Gold Tier Standard (Best)
The facility level FWS (labeled as Facility Water/Primary Loop in the figure above) and the compute infrastructure loops (labeled as Solution Water/Secondary Loop) are shown with adiabatic or dry coolers transporting the heat from the data center to the outside ambient. The diagram also includes an optional chiller as an intermediate heat extraction entity tied to a coolant distribution unit (CDU) that feeds the liquid cooling system racks.
Using the ASHRAE TC 9.9 standard as the starting reference design, the following two figures show the Lenovo Gold solution data center topologies and example designs.
Figure 2. Design 1: Lenovo Neptune Gold, ASHRAE W40 with 95%+ Heat Capture
Figure 3. Design 2: Lenovo Neptune Gold, ASHRAE W40 FWS and W17 TCS with 100% Heat Capture using DLC and Passive Rear Door Heat Exchangers (RHDX)
The examples above can be used for most liquid cooled AI/HPC/Enterprise Solutions. These data center fluid topology examples illustrate possible layouts of the liquid loop in a data center.
Two examples that illustrate how the Lenovo Gold solutions referenced above could be implemented are given below:
- Solution 1 is a design targeting a Generative AI model using a dense NVIDIA GPU accelerator solution for a Lenovo ThinkSystem SR780a V3.
- Solution 2 is a HPC solution using Lenovo ThinkSystem SD650-N V3 servers that also contain NVIDIA accelerators.
Solution 1: Generative AI Solution using SR780a V3 for H200: 100% Heat Extraction
Technical specifications of the solution:
- SR780a V3 Chassis with 8x NVIDIA H200 GPUs
- 9.5 kW Power/ Chassis: 7.1 kW water, 2.4 kW air
- 48U Rack w/ 8 Chassis SR780a V3:
- ~103 kW Rack Power: 77 kW water; 26 kW air
- ~100 liters per minute (lpm) Water @ 45°C to DLC; 50 lpm @ 17°C to RHDX
- Vertiv FS1350 CDU connected in a 16-rack topology
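As a rough sanity check on these figures, the water/air split and the coolant temperature rise can be estimated from the transport relation Q = m·cp·ΔT. The sketch below is illustrative only; the density and specific heat of water are assumed textbook constants, not values from this document.

```python
# Illustrative sanity check of the Solution 1 thermal figures.
# Assumptions: water density ~1.0 kg/L, specific heat ~4186 J/(kg*K).

RHO = 1.0    # kg per liter of water (assumed)
CP = 4186.0  # J/(kg*K), specific heat of water (assumed)

def coolant_delta_t(heat_kw, flow_lpm):
    """Coolant temperature rise (K) for a given heat load and flow rate."""
    mass_flow = flow_lpm / 60.0 * RHO           # kg/s
    return heat_kw * 1000.0 / (mass_flow * CP)  # K

rack_power_kw = 103.0  # from the specification above
water_kw = 77.0        # heat removed by the DLC loop
capture = water_kw / rack_power_kw

print(f"DLC heat capture: {capture:.0%}")  # ~75%
print(f"Coolant rise at 100 lpm: {coolant_delta_t(77.0, 100.0):.1f} K")
```

The remaining ~26 kW of air-side heat is what the passive RDHX absorbs, which is how the solution reaches 100% heat extraction at the room level.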
The following figure shows the solution topology.
Figure 4. Solution 1: Generative AI Solution using SR780a V3 for H200: 100% Heat Extraction
Solution 2: HPC Solution using SD650-N V3: 95%+ Heat Extraction
Technical specifications of the solution:
- 36x Lenovo ThinkSystem SD650-N V3 systems per rack
- ~105 kW Rack
- Each Rack of 36x Lenovo Systems and Liquid Cooled Power Supplies
- ~135 lpm Water @ 45°C to DLC
- FS1350 CDU: 8-rack topology
The following figure shows the solution topology.
Figure 5. Solution 2: HPC Solution using SD650-N V3: 95%+ Heat Extraction
Lenovo Silver Tier Standard (Moderate)
The Lenovo Silver Tier will typically apply to Enterprise and AI/HPC solutions that are designed to extract up to 80% of the system's power consumption into the cold plates, with facility water delivered at a W32 temperature. The racks that comprise such systems can range in power consumption from 50 kW to 120 kW. The residual heat flux will have to be managed by air-cooling solutions such as in-row coolers or CRAH units.
The solution example below shows six racks of HPC/AI solutions at about 100 kW power consumption per rack, with the six racks mapped to an FS600 CDU. The residual heat of the racks can be extracted by two in-row coolers.
Each rack would use about 70 lpm @ 36°C, for a total fluid requirement of about 420 lpm for the cluster. The residual heat of about 25 kW per rack will have to be managed by the air handlers.
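The cluster-level numbers follow directly from the per-rack figures; a minimal sketch of the sizing arithmetic, using only values from the example above:

```python
# Silver-tier cluster sizing sketch, using the example figures above.

racks = 6
rack_power_kw = 100.0        # approximate power per rack
flow_per_rack_lpm = 70.0     # secondary-loop flow per rack
residual_per_rack_kw = 25.0  # heat left for the air handlers, per rack

total_flow_lpm = racks * flow_per_rack_lpm            # cluster fluid requirement
cold_plate_kw = rack_power_kw - residual_per_rack_kw  # heat into the cold plates
capture = cold_plate_kw / rack_power_kw               # fraction captured by liquid
in_row_load_kw = racks * residual_per_rack_kw         # total load on in-row coolers

print(total_flow_lpm, capture, in_row_load_kw)  # 420.0 0.75 150.0
```

At 75% capture this sits within the Silver Tier's "up to 80%" envelope, and the 150 kW residual is the combined load the two in-row coolers must handle.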
Lenovo Bronze Tier Standard (Acceptable)
The Lenovo Bronze Tier Standard will typically apply to Enterprise and AI/HPC solutions that are designed to extract up to 60% of the system's power consumption into the cold plates, with facility water delivered at a W17 temperature. Given that the facility provides chilled water, it is also possible to use a passive RDHX to supplement the cooling and provide 100% heat extraction. The racks that comprise such systems can range in power consumption from 30 kW to 100 kW. The residual heat flux will have to be managed by air-cooling solutions such as an RDHX or in-row/CRAH coolers. With traditional chilled water, the systems will deliver performance enhancements at reduced system power, as fan power drops from 20-30% of system power to 10% or less.
The solution topology for the W17-based cooling circuit can be varied: servers can be cooled with cold plate cooling alone, or with cold plate cooling plus a passive or active RDHX. A solution with a mix of cold plate cooling and an RDHX is shown in the Bronze Tier example. The solution can also be mapped to an in-rack CDU instead of an aisle-level CDU; the example solution below is shown only with an aisle-level CDU.
Lenovo Neptune Compliant Liquid Cooling
As mentioned previously, the ASHRAE standards group aims to deliver base-level liquid cooling standards for the data center. Lenovo Neptune, however, describes capabilities and adherence to the best of several standards related to data center liquid cooling, and ensures that each base standard is met or exceeded. Being Neptune Direct Water Cooling (DWC) compliant means providing best-in-class quality to users while enabling a broad set of environmental capabilities such as water and ambient air temperatures. Additionally, solutions designed around Lenovo Neptune systems can often contribute to higher levels of sustainability by reducing the power a solution consumes compared to a comparable air-cooled solution, in part through the reduction or elimination of fan power.
Some advantages that moving to a DWC solution can provide are:
- Reduction of the reliance on Computer Room Air Handling (CRAH) and Computer Room Air Conditioning (CRAC) systems. These CRAH/CRAC units can contribute to higher power consumption in the data center.
- Minimized need for high cubic feet per minute (CFM) air flow, which would otherwise require higher fan power per system
- Use of warmer inlet temperature liquid which reduces the need to consume power to generate cold liquid temperatures
- High heat rejection to liquid resulting in high heat capture from components generating heat.
- Ability to move to a dry cooling system and other economizer systems resulting in lower data center water consumption.
The largest consumer of power in any data center is the IT equipment itself. The second largest consumer is usually the air handling/air conditioning systems in the facility. By reducing dependence on the facility air handlers, the total energy savings can be dramatic when Direct Water Cooling is deployed.
The advantages of the use of water cooling over air cooling result from water’s higher specific heat capacity, density, and thermal conductivity. These features allow water to transmit heat over greater distances with much less volumetric flow and reduced temperature difference as compared to air.
For cooling IT equipment, this heat transfer capability is its primary advantage. Water has a tremendously increased ability to transport heat away from its source to a secondary cooling surface, which allows for large, more optimally designed radiators or heat exchangers rather than small, inefficient fins that are mounted on or near a heat source, such as a CPU.
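This advantage can be quantified with the same transport relation Q = ρ·cp·V·ΔT. The sketch below compares the volumetric flow of water and of air needed to move the same heat at the same temperature rise; the property values are textbook approximations at roughly room temperature, not figures from this document.

```python
# Volumetric flow needed to transport heat: water vs. air.
# Property values are textbook approximations (assumed, not measured).

def flow_m3s(heat_w, rho, cp, delta_t):
    """Volumetric flow (m^3/s) to carry heat_w watts at a delta_t rise."""
    return heat_w / (rho * cp * delta_t)

Q = 10_000.0  # 10 kW of heat to remove
DT = 10.0     # allowed coolant temperature rise, K

water = flow_m3s(Q, rho=998.0, cp=4186.0, delta_t=DT)  # liquid water
air = flow_m3s(Q, rho=1.2, cp=1005.0, delta_t=DT)      # air at ~20 degC

print(f"water: {water * 60_000:.1f} lpm")                # ~14.4 lpm
print(f"air:   {air * 2118.9:.0f} CFM")                  # m^3/s -> CFM
print(f"air needs ~{air / water:.0f}x the volume flow")  # roughly 3500x
```

The three-orders-of-magnitude gap in volumetric flow is why water can move heat over long distances through narrow manifolds, where air would need enormous ducts and fan power.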
The two most common fluids used in secondary loops are de-ionized water and PG25. It’s important to note that PG25 is completely acceptable to use in Lenovo Neptune open loop systems. However, due to PG25’s reduced heat transfer efficiency compared to pure water, the environmental considerations needed for disposal and toxicity, higher viscosity and the lack of need to ship Lenovo Neptune systems pre-filled, Lenovo prefers to use deionized water whenever applicable. Water allows for higher energy efficiency, lower pressure drops, higher thermal capacity to push higher TCS inlet temperatures and better sustainability metrics overall. If PG25 is preferred, then standard best practices should be adhered to as well as proper fluid quality maintenance.
Direct Water Cooling uses the benefits of water by distributing it directly to the highest heat generating node subsystem components. By doing so, the offering realizes 7% - 10% direct energy savings when compared to an air-cooled equivalent. That energy savings results from the removal of the system fans and the lower operating temperature of the direct water-cooled system components.
Water is delivered to IT equipment from a coolant distribution unit (CDU) via the water manifold. As shown in the following figure, each manifold section attaches to an enclosure and connects directly to the water inlet and outlet connectors for each compute node to deliver water safely and reliably to and from each server tray.
All aspects of water cooling are heavily intertwined. The quality of the water, for example, determines its specific heat capacity, and therefore how effective it is as a coolant. That in turn impacts both the water temperature that can be used and the pressure needed within the cooling loop. It also impacts the choice of materials, to ensure non-reactiveness, and, through changes in viscosity, the design of components.
Direct Water-Cooling Quality
The four most common quality issues within direct water-cooling environments are scaling, fouling, corrosion, and microbiological growth. They can stem from suboptimal design, implementation, or maintenance of the water-cooling environment.
- Scaling
Scaling refers to the deposition of dense, adherent material on the cooling loop surfaces. Scaling occurs when the solubilities of salts in water are exceeded because of high concentrations or because of increased temperatures.
- Fouling
Fouling of cooling loops is the deposition of non-scale-forming substances such as corrosion products and organics. Fungi, such as Fusarium, have been known to grow, foul and plug filters and fine finned heat sinks. They generally grow at the water line in cooling tower basins or sumps.
- Corrosion
Corrosion can take many forms. The common forms of corrosion relevant to the cooling loop are uniform corrosion, pitting corrosion, and galvanic corrosion:
- Uniform corrosion, also referred to as general corrosion, is the spatially uniform removal of metal from the surface. It is the usually expected mode of corrosion.
- Pitting corrosion is localized attack of a metal surface that in the case of copper tubes can lead to water leaks with a typical mean time to failure of about 2 years. This localized attack creates “pits” in the tubing material which leads to erosion at the attack point and usually can be seen as “pinhole” type leaks.
- Galvanic corrosion arises when two metals that are wide apart in the galvanic series are in electrical contact and immersed in the same water environment. The potential difference that arises between the two metals in contact forces electrons to flow from the less noble to the more noble metal. On the less noble metal surface, corrosion occurs, giving off electrons that are consumed on the more noble metal surface by a reduction reaction that can take many chemical forms. Examples are the reduction of metal ions or the consumption of oxygen and water to form hydroxyl ions. Even when not in electrical contact, aluminum can be galvanically attacked by copper: dissolved copper ions in very low concentrations can deposit on the aluminum surface, forming the galvanic corrosion couple.
- Microbiological growth
Similar to Fusarium growth, microbiological growth in cooling water systems can lead to deposition, fouling, and corrosion within the cooling loop. Prevention of microbiological growth involves (a) making sure that the cooling loop hardware is assembled from components that are free of biological organisms and (b) ongoing fluid treatment with biocides to control the bacteria population. To avoid biological growth, the water-cooling loops should be shipped and stored dry. Every effort must be made to blow out the water and dry the water-cooling loop as much as possible before shipping and storage.
Quality Best Practices
The following best practices are part of Lenovo Neptune Direct Water-Cooling standard to prevent those problems:
- Design clean: Restrict the water-wetted metallurgies to copper alloys and stainless steels. Avoid use of plain-carbon steel hardware which can rust and foul the water-cooling loop.
- Build clean: Ensure that the cooling loop components are clean and free of bacteria and fungi. The cooling loop assembly should be free of soldering and/or brazing fluxes. Clean water should be used in the assembly operations. Any residual water should be blown out of the assembly. The finished assembly should be clean and dry.
- Ship clean: Any residual water from the assembly and/or testing operations should be blown out from the cooling loop before shipping to avoid corrosion and microbiological growth. As a final step, use nitrogen gas to dry the system. Plug ends and ship the system with the cooling loop pressurized with nitrogen gas.
- Install clean: The cooling loop should be kept clean during the installation step. Brazing is preferred over soldering. The problem with soldering is porous joints that keep leaching out flux residue. All flux residues should be cleaned off. Fill the system with clean water and, if possible, include a secondary step to de-ionize the water in the cooling loop before the addition of biocide and corrosion inhibitors.
- Maintain clean: Monitor and maintain the pH, water conductivity, bacteria count, and the corrosion inhibitor concentration.
Material Quality
The material choice is extremely important not just in terms of the design capabilities, but also both in context of reliability of the system and ability to maintain a high-quality operation.
Material selection and installation is a complex issue often governed by building codes and other local requirements. The appropriate authorities having jurisdiction (such as building inspectors, fire departments, insurance providers, and code compliance officers) should be consulted before planning and installing cooling distribution systems.
For chemical compatibility purposes, however, all piping must be composed of the specified materials to prevent scaling and to remain compatible with the chemistry of the water within the system.
Hard piping and tubes
It is crucial that any liquid-carrying hardware, such as tubes and piping, consists of corrosion-resistant alloys such as copper alloys and stainless steels.
Copper alloys must be lead-free and contain less than 15% zinc.
Stainless steel should be low carbon and needs to have been treated to improve its resistance to corrosion. Passivation is desirable as well, as long as there is low possibility of acid entrapment in crevices. The steel must be welded without sensitization and should not be brazed.
Not allowed are aluminum and aluminum alloys, as well as brass with more than 15% zinc or brass containing lead. Also forbidden are steels that are not stainless, and stainless steels that have not been properly solution treated.
Piping must also be large enough, as dictated by industry best practices, to avoid excessive water velocity as well as undue pressure drop which can cause issues with external equipment such as pumps in the CDU.
Soft piping and hoses
For softer design choices like hoses, EPDM rubber (ethylene propylene diene terpolymer) must form the inner lining.
While PVC can be acceptable it should be avoided where extreme heat could cause flammability, for example within electrical components. It could be used at a facility level, subject to local authorities’ oversight and regulation.
EPDM has proven to be the best material. The flammability rating must be CSA or UL VW-1. Peroxide cured hoses are preferred because they do not absorb triazoles.
Not allowed is FEP (fluorinated ethylene propylene), commonly found in cheap, consumer-grade IT internal plumbing systems.
To join copper, brazed joints must be used.
For stainless steel, however, brazed joints are not allowed; TIG and MIG welding must be used. Sensitization must be avoided. The welded assembly should be cleaned and, if possible, passivated, as long as there is low possibility of acid entrapment in crevices.
Solder joints that come in contact with water are not allowed because solder joints are porous; they leach flux residue into the cooling loop. Solder joints may pass inspection and pressure test as manufactured but may still be unreliable during operation.
Threaded joints and fittings must use a thread sealant. Teflon tape is not allowed, as particles from the tape may enter the water stream and create clogs.
Water Quality
The water used within the system side cooling loop must be reasonably clean, bacteria-free water (<100 CFU/ml) such as de-mineralized water, reverse osmosis water, de-ionized water, or distilled water.
The details of the water treatment depend on whether the local municipality allows the disposal of water that contains some cleaning chemicals down a sanitary drain. If the local municipality does not allow the disposal of contaminated water down a sanitary drain, a de-ionizing bypass can be included in the water-cooling loop to allow the cleaning of the water to purity levels corresponding to resistivity > 0.1 Mohm-cm (conductivity < 10 µS/cm) before pouring the water down the drain.
The water must be filtered with an in-line 50-micron filter (approximately 288 mesh) and must be treated with anti-biological and anti-corrosion measures.
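Resistivity and conductivity are reciprocals, which makes the thresholds above easy to cross-check; a small illustrative helper:

```python
# Resistivity <-> conductivity for cooling-loop water thresholds.
# 1 Mohm-cm of resistivity corresponds to 1 microS/cm of conductivity.

def conductivity_us_cm(resistivity_mohm_cm):
    """Conductivity in microS/cm for a resistivity in Mohm-cm."""
    return 1.0 / resistivity_mohm_cm

# 0.1 Mohm-cm (the disposal threshold) corresponds to 10 microS/cm:
print(conductivity_us_cm(0.1))  # 10.0
# 1 Mohm-cm (the post-de-ionization target) corresponds to 1 microS/cm:
print(conductivity_us_cm(1.0))  # 1.0
```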
When clean water is not readily available, the following method should be employed. This approach is particularly beneficial for large cooling systems, as the water undergoes de-ionization before it reaches any of the racks in the loop. Make sure that the water is purified prior to adding any chemicals. This purification can be achieved by using de-ionizing cartridges installed within the cooling system.
Even if de-ionized water was used to fill the system, a de-ionizing step is prudent for two reasons:
- To ensure that the starting water is de-ionized
- To remove any ions that may have leached off the walls of the cooling loop.
One effective approach to de-ionize the water loop is as follows:
- When the water needs to be de-ionized, the valves V2 and V3 can be opened and valve V1 partially closed to bypass some of the water through the de-ionizing canister.
- During this de-ionizing step, the cooling loop and the computers can keep operating normally.
- When the de-ionization is complete, the V2 and V3 valves should be closed and V1 fully opened.
- The de-ionization step should raise the resistivity of the water to greater than 1 Mohm-cm.
- Under normal operation, the V2 and V3 valves are closed and V1 valve is fully open.
Figure 8. De-ionizing the water using de-ionizing cartridges installed in the cooling loop
Water Treatment
The largest contributor to failure within a liquid-cooled system is failure to maintain the fluid's integrity. Biological growth, corrosion, fouling, and similar problems can all be traced back to a lack of regular monitoring and maintenance of the fluid quality within the TCS loop.
In this section, we present high-level guidelines that should be followed to ensure high quality liquid in the TCS loop.
De-ionizing equipment
De-ionizing equipment is optional. It is recommended for use in large cooling loops.
When the water needs to be de-ionized (see the Figure 8 above), some of the water can be bypassed to flow through the de-ionizing cartridge.
Dosing equipment
The following equipment is used to dose the cooling loop:
- Recommend utilizing a stainless steel or fiberglass chemical shot feeder.
- System volumes less than 100 gallons utilize a 0.1-gallon size feeder
- System volumes less than 1000 gallons utilize a one (1) gallon size feeder
- System volumes greater than 1000 gallons utilize a 2.5-gallon feeder.
- Chemical pump as per Nalco or another water treatment contractor specification.
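The feeder sizing rules above can be written as a simple selector. The guidelines leave the exact boundary cases open (for example, a system of exactly 100 gallons), so the cutoffs below are an assumption:

```python
# Chemical shot feeder sizing per the dosing guidelines above.
# Boundary handling (exactly 100 or 1000 gallons) is assumed, not specified.

def feeder_size_gal(system_volume_gal):
    """Recommended shot feeder size (gallons) for a given system volume."""
    if system_volume_gal < 100:
        return 0.1
    if system_volume_gal <= 1000:
        return 1.0
    return 2.5

print(feeder_size_gal(50), feeder_size_gal(500), feeder_size_gal(2000))
# 0.1 1.0 2.5
```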
Monitoring equipment
Liquid quality monitoring spans a range of tests that should be performed on the TCS fluid itself. The first line of defense is to deploy a physical monitoring system that informs the user in real time. However, it is also recommended that test kits that can detect anomalies in the water chemistry at a more granular level be used quarterly.
Examples of suitable test kits include the following:
- 3D TRASAR® Controller (#060-TR5500.88) for systems larger than 250 gallons to enable precise and continuous monitoring of system water chemistries: conductivity, pH, corrosion rate and turbidity.
- Azole test kit,
- Nalco PN 460-P3119.88 – Triazole Reagent Set, 25 mL
- Nalco PN 500-P2553.88 – UV Lamp with power supply, 115 VAC
- Nalco PN 400-P0890.88 – Nalco DR/890 Colorimeter
- Nalco PN 500-P1204.88 – 25 mL graduated cylinder
- Nalco bacteria test kit
- Nalco PN 500-P3054.88 – Bacteria dip slides
- Water resistivity monitor with 0-10 Mohm-cm range,
- Nalco PN 400-C006P.88
Enablement quality
Besides the quality of the materials and the water used, the best practices described require meticulous management of the manufacturing, build, shipment and installation of the direct water-cooled environment.
Manufacturing and shipment
Any unit that becomes part of the system, whether an IT component or a cooling infrastructure component, that has contained liquid needs to be fully drained and cleaned to eliminate any risk of introducing contaminants.
Any plumbing unit that has had water passed through it prior to shipping, whether the water was treated or not, should involve the following to eliminate the potential for microbiological fouling, general fouling, and/or corrosion during the shipping/storage phase.
Utilize nitrogen to purge any residual water out of the system and provide a nitrogen blanket for shipping. Avoid use of compressed air for shipment, as compressed air can lead to oxygen-based corrosion issues. If the supply of nitrogen is limited, the residual water may be blown out first by using compressed air followed by a thorough purging with nitrogen gas.
Site Startup and Initial Treatment
The following items must be available to properly and safely complete the initial system start up.
- De-ionizing cartridges of appropriate capacity: Optional
- Treatment chemicals in appropriate quantities. Nalco and Veolia/SUEZ are leading providers of the inhibitor products and preferred by Lenovo. Deviations from preferred chemicals require written approval from Lenovo.
- System with 20 gallons or less coolant:
Lenovo suggests you use a prepackaged cleaner and inhibitor solution: Nalco 460-CCL2567 or Nalco CCL2567 and Nalco 460-CCL100 or Nalco CCL100. If exposure to bacteria is suspected or a concern, biocides such as Nalco H-550 or Nalco 73500 may be used. If fungi are suspected or a concern, Nalco 77352 may be used. Veolia/SUEZ SV5520 is a prepackage inhibitor and suitable alternate. Veolia/SUEZ NX1100 is the recommended biocide.
- System with greater than 20 gallons of coolant:
Lenovo suggests the use of concentrated chemicals. The cleaner in concentrated form is Nalco 2567. The inhibitor in concentrated form is Nalco 3DT-199. If exposure to bacteria is suspected or a concern, biocides such as Nalco H-550 or Nalco 73500 may be used. If fungi are suspected or a concern, Nalco 77352 may be used. Veolia/SUEZ SV5520 is a prepackage inhibitor and suitable alternate which can be ordered in 55 us gallon containers. Veolia/SUEZ NX1100 is the recommended biocide.
- A method to add chemicals: Utilize installed system chemical shot feeder and/or an appropriately sized chemical feed pump.
- Source of demineralized water, reverse osmosis water, de-ionized water, or distilled water.
- Proper personal protective equipment.
- Approved drainage to drain pre-cleaning waters, i.e., sanitary sewer. The customer is responsible for the drainage process per local regulations.
- Appropriate test kits to monitor Nalco 3DT-199 or authorized equivalent residual and bacteria count after Nalco H-550, Nalco 73500, Nalco 77352, or Veolia/SUEZ NX1100 addition:
- Water resistivity monitor with 0-10 Mohm-cm range.
Systems 20 gallons or less
The cleaning procedure is as follows:
- This cleaning procedure should be performed on the cooling loop before any computer racks are hooked up to the system.
- System should be empty. If not, drain the system completely.
- Remove all the filters from filter housings.
- Ensure bypass hoses are connected between the supply and return portions of the cooling loop to ensure the cleaning of all sections of the system.
- One of the two following cleansing procedures may be used:
- Chemical cleaning – This is the most effective manner of cleaning the plumbing loop
- Fill the system with the cleaning solution. The recommended cleaning solutions are Nalco 460-CCL2567, Nalco CCL2567, or authorized equivalent.
- Circulate the cleaning solution for a minimum of 30 minutes (longer if time permits) to ensure cleaner reaches all sections of the system.
- Drain the system completely, disposing of the cleaner per local regulations.
- Refill with demineralized water, reverse osmosis water, de-ionized water, or distilled water.
- Circulate the water for 15 minutes.
- Drain the system completely, disposing of the rinse water per local regulations.
- Immediately proceed to filling the system with water containing pre-mixed inhibitor and preservative.
- De-ionized water cleaning – An alternative method using de-ionized water only
- De-ionize the water by bypassing some of the water flow through the de-ionizing cartridge(s) and circulate the water normally through the complete system until the resistivity of the water rises above 1 Mohm-cm.
- Proceed to the inhibitor dosing procedure.
The inhibitor dosing procedure is as follows:
If the system was cleaned using cleaning solution Nalco 460-CCL2567 or Nalco CCL2567 or authorized equivalent and if at the end of the cleaning step the system was empty with no water in it, proceed as follows:
- Install new or cleaned 50-micron filters in the filter housings.
- Fill the coolant reservoir with Nalco 460PCCL100 / Nalco CCL100.
- Add 180 ppm of Nalco 3DT-199 to raise the azole concentration to 100 ppm.
- If bacteria or fungi are suspected or are a serious concern, inject one of the following biocides:
The choice of biocide depends on the expected microbiological material in the cooling loop. Glutaraldehyde biocide is more effective against anaerobic bacteria. Isothiazolone is more effective against aerobic bacteria, fungi and algae. When in doubt, use the isothiazolone biocide.
- 100 ppm of Nalco H-550 (glutaraldehyde), or
- 200 ppm of Nalco 73500 (glutaraldehyde), or
- 100 ppm of Nalco 77352 (isothiazolone), or
- 100 ppm of Veolia/SUEZ NX1100 (isothiazolone)
- Confirm azole residual using Nalco azole test kit or authorized equivalent.
- The system is now ready for use.
If the system was cleaned using de-ionized water only and the system is full of de-ionized water, dose as follows:
- Install new or cleaned 50-micron filters in the filter housings.
- Inject one of the following biocides:
The choice of biocide depends on the expected microbiological contamination in the cooling loop. Glutaraldehyde biocide is more effective against anaerobic bacteria. Isothiazolone is more effective against aerobic bacteria, fungi and algae. When in doubt, use the isothiazolone biocide.
- 100 ppm of Nalco H-550 (glutaraldehyde), or
- 200 ppm of Nalco 73500 (glutaraldehyde), or
- 100 ppm of Nalco 77352 (isothiazolone), or
- 100 ppm of Veolia/SUEZ NX1100 (isothiazolone)
- Inject 300 ppm of Nalco 3DT-199 or authorized equivalent to achieve 100 ppm azole concentration.
- Confirm azole residual using Nalco azole test kit or authorized equivalent.
- The system is now ready for use.
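As a practical aid for the dosing steps above, the sketch below converts a ppm-by-volume target into an approximate chemical volume for a known loop size. The function name and the volume-based approximation are illustrative assumptions; actual dosing must follow the chemical manufacturer's instructions and local regulations.

```python
def dose_volume_ml(system_volume_gal: float, dose_ppm: float) -> float:
    """Approximate chemical volume (mL) needed to reach a ppm-by-volume
    dose in a loop holding system_volume_gal US gallons of coolant."""
    GAL_TO_ML = 3785.41  # 1 US gallon in millilitres
    system_ml = system_volume_gal * GAL_TO_ML
    return system_ml * dose_ppm / 1_000_000

# Example: a 300 ppm inhibitor dose in a 20-gallon loop
print(round(dose_volume_ml(20, 300), 1))  # -> 22.7 (mL)
```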
Systems more than 20 gallons
The cleaning procedure is as follows:
- This cleaning procedure should be performed on the cooling loop before any computer racks are hooked up to the system.
- System should be empty. If not, drain the system completely.
- Remove all filters from filter housings.
- Ensure bypass hoses are connected between the supply and the return manifolds of the cooling loop to ensure the cleaning of all the surfaces of the cooling loop.
- One of the two following cleansing procedures may be used:
- Chemical cleaning – This is the most effective manner of cleaning the cooling loop
- Fill the system with demineralized water, reverse osmosis water, de-ionized water, or distilled water.
- Add the required volume of cleaning solution Nalco 2567 or authorized equivalent, per the manufacturer's recommendation.
- Circulate the cleaning solution for a minimum of 4 hours.
- Drain the system completely, utilizing all available drain ports, and dispose of the cleaning solution per local regulations.
- Refill with demineralized water, reverse osmosis water, de-ionized water, or distilled water.
- Circulate the water for 1 hour.
- Drain the system completely, utilizing all available drain ports, and dispose of the rinse water per local regulations.
- Refill with demineralized water, reverse osmosis water, de-ionized water, or distilled water.
- Circulate for 15 minutes
- De-ionized water cleaning – An alternative method using de-ionized water only
- Completely fill the system with demineralized water, reverse osmosis water, de-ionized water, or distilled water.
- De-ionize the water by bypassing some of the water flow through the de-ionizing cartridge(s) and circulate the water normally through the complete system until the resistivity of the water rises above 1 Mohm-cm.
- Proceed to the inhibitor dosing procedure.
The inhibitor dosing procedure is as follows:
- Install new or cleaned 50-micron filters in the filter housings.
- The dosing procedure for systems larger than 20 gallons is the same regardless of the cleaning technique.
- Add one of the following biocides:
- 100 ppm of Nalco H-550 (glutaraldehyde), or
- 200 ppm of Nalco 73500 (glutaraldehyde), or
- 100 ppm of Nalco 77352 (isothiazolone), or
- 100 ppm of Veolia/SUEZ NX1100 (isothiazolone)
The choice of biocide depends on the expected microbiological material in the cooling loop. Glutaraldehyde biocide is more effective against anaerobic bacteria. Isothiazolone is more effective against aerobic bacteria, fungi, and algae. When in doubt, use the isothiazolone biocide.
- Circulate for 30 minutes.
- Add 300 ppm of Nalco 3DT-199 or authorized equivalent to achieve 100 ppm of azole concentration.
- Circulate for 30 minutes.
- Confirm azole residual using Nalco azole test kit or authorized equivalent.
- The system is now ready for use.
If needed, the water in the system can be drained one of two ways:
- The water can be de-ionized to a purity corresponding to a resistivity greater than 0.1 Mohm-cm and poured down any municipal drain.
- The water can be poured down a sanitary drain with the local municipality's permission.
Monitoring
It is important to conduct a bacteria test on a quarterly basis and add 100 ppm of Nalco H-550 or 200 ppm of Nalco 73500 biocide if the bacteria count is greater than 100 CFU/ml. Nalco 77352 fungicide may be added if fungi have been a concern in the past. Veolia/SUEZ NX1100 is a suitable alternative to the Nalco products.
Fungi may not be detected in the water even though they may be growing and blocking the cooling channels in the cold plates used to cool the computer processors. A reduced coolant flow rate through the cold plates may be an indication of channels blocked by fungal growth.
On large systems with more than 250 gallons of water, a monitoring solution such as the Nalco 3D TRASAR® controller should be installed on the system cooling loop to enable precise and continuous monitoring of system water chemistry, conductivity, pH, corrosion rate, and turbidity.
It is important to conduct an azole test on a quarterly basis and add Nalco 3DT-199 or authorized equivalent to restore the azole residual concentration to the desired minimum of 100 ppm (or another target level).
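The quarterly checks described in this section can be summarized in a small decision helper. The function below simply encodes the rules stated above (the 100 CFU/ml and 100 ppm thresholds come from this section); it is a sketch, not a substitute for the chemical supplier's guidance.

```python
def quarterly_actions(bacteria_cfu_ml: float, azole_ppm: float,
                      fungi_concern: bool = False) -> list:
    """Return the maintenance actions implied by quarterly test results."""
    actions = []
    if bacteria_cfu_ml > 100:  # CFU/ml threshold from this section
        actions.append("Add 100 ppm Nalco H-550 or 200 ppm Nalco 73500")
    if fungi_concern:
        actions.append("Add Nalco 77352 fungicide")
    if azole_ppm < 100:        # minimum desired azole residual
        actions.append("Add Nalco 3DT-199 to restore 100 ppm azole")
    return actions

# Example: elevated bacteria count and depleted inhibitor
print(quarterly_actions(150, 80))
```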
Extending systems
There are many cases where new systems will need to be added to existing systems on the same TCS liquid loop. In the case where systems are being added to an existing loop, the following procedure should be followed:
- Racks should arrive from Lenovo ready for installation.
- Install rack(s) and open flow from existing system.
- Be sure the automated water make-up on the chiller coolant reservoir is activated. If there is no automated water make-up feature, top off the system-side reservoir.
- Within two hours of installing the new racks, add one of the following biocides:
- 100 ppm of Nalco H-550 (glutaraldehyde), or
- 200 ppm of Nalco 73500 (glutaraldehyde), or
- 100 ppm of Nalco 77352 (isothiazolone), or
- 100 ppm of Veolia/SUEZ NX1100 (isothiazolone)
- The choice of biocide depends on the expected microbiological material in the cooling loop. Glutaraldehyde biocide is more effective against anaerobic bacteria. Isothiazolone is more effective against aerobic bacteria, fungi and algae. When in doubt, use the isothiazolone biocide.
- Add 300 ppm of Nalco 3DT-199 or authorized equivalent to achieve 100 ppm of azole concentration. The amount of inhibitor dosage is calculated based on the volume of the makeup water.
- Confirm azole residual using Nalco azole test kit or authorized equivalent.
Water temperatures
Direct water cooling operates across a broad spectrum of inlet water temperatures. The term “inlet water temperature” typically refers to the temperature of the water as it enters the IT device. This means that any temperature increases from components such as the CDU must be factored in on top of the primary loop datacenter water temperature.
The water temperature plays a pivotal role in determining the Power Usage Effectiveness (PUE), the ratio of total facility power to IT equipment power: the less power spent on cooling and other overhead, the closer the PUE is to the ideal value of 1.0.
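As a minimal illustration of the metric, using the standard definition of PUE as total facility power divided by IT power (the example load figures are arbitrary):

```python
def pue(total_facility_kw: float, it_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT power.
    An ideal facility with zero cooling/overhead power has PUE = 1.0."""
    return total_facility_kw / it_kw

# Example: 1000 kW of IT load plus 200 kW of cooling and other overhead
print(pue(1200, 1000))  # -> 1.2
```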
- Cold Water
Water with a temperature below 27°C (81°F) is classified as “cold” water. The primary sources are naturally cold supplies such as groundwater or lake-bottom water, or alternatively water that is actively chilled before being used for datacenter cooling, analogous to conditioned air in air cooling.
However, chilling the cooling water adversely impacts the PUE, as the heated return water requires not only pump power for transport but also chiller power to cool it back down.
- Warm Water
The term “warm water” applies when the inlet temperature is around 30-40°C (86-104°F). This range allows free cooling to be used to reject waste heat, especially in colder geographies. Free cooling typically rejects heat to the outside ambient air through cooling towers located on the building’s roof.
In warmer environments, it might be necessary to operate the cooling towers wet to leverage not just the ambient temperature but also the cooling effect of evaporation. This process requires a small amount of power but uses a significant amount of water, which can be a concern in hot and dry areas prone to droughts.
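The water consumption of wet (evaporative) operation can be bounded from first principles: every kilowatt of heat rejected purely by evaporation vaporizes a fixed mass of water. The sketch below uses the latent heat of vaporization of water and ignores drift and blowdown losses, so real towers consume somewhat more than this lower bound.

```python
def evaporative_water_use_lph(heat_kw: float) -> float:
    """Lower-bound litres of water evaporated per hour to reject heat_kw
    entirely by evaporation (latent heat of water ~2257 kJ/kg, ~1 kg/L)."""
    LATENT_KJ_PER_KG = 2257.0
    return heat_kw * 3600.0 / LATENT_KJ_PER_KG

# Example: a 100 kW heat load rejected evaporatively
print(round(evaporative_water_use_lph(100)))  # -> 160 (L/h)
```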
- Hot Water
When the inlet temperature exceeds 40°C (104°F), the cooling water is considered “hot”. In addition to extending the benefits of warm water to more regions of the world by allowing dry cooling tower operation in free cooling, this temperature enables effective reuse of the energy in the return water.
The most common reuse is for heating buildings, but there are also other innovative reuses in production today, such as serving as a heat source for an adsorption chilling cycle.
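The energy available for reuse in the return water follows directly from flow rate and temperature drop. The sketch below assumes plain water properties (density about 1 kg/L, specific heat 4.186 kJ/(kg·K)); the example flow and temperatures are illustrative.

```python
def recoverable_heat_kw(flow_lpm: float, t_return_c: float,
                        t_supply_c: float) -> float:
    """Heat carried by the return water: m_dot * cp * dT (plain water)."""
    CP = 4.186                       # specific heat of water, kJ/(kg*K)
    mass_flow_kg_s = flow_lpm / 60.0 # ~1 kg per litre of water
    return mass_flow_kg_s * CP * (t_return_c - t_supply_c)

# Example: 100 L/min returning at 45 degC against a 35 degC supply
print(round(recoverable_heat_kw(100, 45, 35), 1))  # -> 69.8 (kW)
```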
Water pressure
One crucial factor when moving fluid between two points in a system is the pressure the system must maintain to provide proper flow. Too high a pressure can stress joints and O-rings, while too low a pressure can result in flow rates below what is required to effectively reject heat from the IT components into the liquid.
Pressure drop is the differential pressure that a fluid must overcome to flow through a system. It is caused by factors such as the friction the liquid encounters in the loop, tube diameter, bends in the loop, and gravity itself. Most CDU vendors provide pressure drop curves that allow the CDU to be set to deliver adequate flow rates to the components.
Because Lenovo uses de-ionized water rather than liquids such as the industry-standard PG25 (a 25% propylene glycol mixture), the lower viscosity results in lower pressure drops compared to similar systems that use PG25.
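For intuition about where pressure drop comes from, the straight-pipe contribution can be estimated with the Darcy-Weisbach equation, here using the Swamee-Jain explicit friction-factor approximation for turbulent flow. This is a sketch for plain water in a straight tube only; real loops add fittings, quick-disconnects, and cold-plate losses, so the vendor's pressure drop curves remain authoritative.

```python
import math

def pipe_pressure_drop_pa(flow_lpm: float, pipe_id_m: float, length_m: float,
                          roughness_m: float = 1.5e-6) -> float:
    """Darcy-Weisbach pressure drop (Pa) for water in a straight pipe."""
    RHO, MU = 998.0, 1.0e-3            # water density (kg/m^3), viscosity (Pa*s)
    q = flow_lpm / 1000.0 / 60.0       # volumetric flow (m^3/s)
    area = math.pi * pipe_id_m ** 2 / 4.0
    v = q / area                       # mean velocity (m/s)
    re = RHO * v * pipe_id_m / MU      # Reynolds number
    if re < 2300:
        f = 64.0 / re                  # laminar friction factor
    else:                              # Swamee-Jain turbulent approximation
        f = 0.25 / math.log10(roughness_m / (3.7 * pipe_id_m)
                              + 5.74 / re ** 0.9) ** 2
    return f * (length_m / pipe_id_m) * RHO * v ** 2 / 2.0

# Example: 60 L/min through 10 m of 25 mm ID tube
print(round(pipe_pressure_drop_pa(60, 0.025, 10)))
```

Note how the drop grows faster than linearly with flow, which is why undersized tube diameters are the most common cause of excessive secondary-loop pressure.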
The design of the FWS should be optimized to reduce the overall pressure drop in the solution. It is common for mechanical systems connected to the FWS to have a maximum working pressure of 10 Bar (145 psi). It is critical that the FWS operating pressure is understood, and protections are put in place to not exceed the maximum working pressure of the mechanical equipment connected directly to it, such as Coolant Distribution Units (CDUs).
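A trivial guard expressing this protection requirement is sketched below; the 10 bar default follows the common limit mentioned above, while the 10% safety margin is an illustrative assumption, and the real limit must come from the rating of the weakest connected equipment.

```python
def fws_pressure_ok(operating_bar: float, max_working_bar: float = 10.0,
                    safety_margin: float = 0.9) -> bool:
    """True if the FWS operating pressure stays within a margin of the
    maximum working pressure of the connected mechanical equipment."""
    return operating_bar <= max_working_bar * safety_margin

print(fws_pressure_ok(8.0))   # -> True
print(fws_pressure_ok(9.5))   # -> False
```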
Neptune compliant data center design
The figure below depicts the fundamental components of a Neptune compliant datacenter design, closely aligned to the ASHRAE Liquid Cooling Guidelines for Datacom Equipment Centers, 2nd Edition (ASHRAE Datacom Series, Book 4).
Figure 9: Example Lenovo Neptune compliant loop design based on the ASHRAE Liquid Cooling Guidelines for Datacom Equipment Centers
The building water system (also called condenser water system (CWS)) exchanges heat between an economizer like a cooling tower and the datacenter infrastructure.
On the datacenter side, depending on the cooling approach, that heat exchange could involve a chiller producing chilled water or cold air, or the water could go directly to a Coolant Distribution Unit (CDU) for liquid cooling.
The environment after the optional chiller but before the CDU is called Primary Loop or Facilities Water System (FWS). The Secondary Loop or Datacom Equipment Cooling System (DECS) is the environment in which the water comes in contact with components to be cooled.
That Secondary Loop can be provided by in-rack CDUs for single-rack cooling up to about 100kW, by external off-the-shelf CDUs for multi-rack (also called row- or pod-level) cooling up to about 600kW, or by datacenter-level cooling, which ranges from existing 1.4MW units to custom-built multi-megawatt CDUs.
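The capacity bands above can be expressed as a rough selection helper. The thresholds are taken from this paragraph; real sizing also depends on redundancy requirements, water temperatures, and growth plans, so treat this as an illustrative sketch.

```python
def cdu_scope(total_load_kw: float) -> str:
    """Map a cooling load to the approximate CDU deployment level."""
    if total_load_kw <= 100:
        return "in-rack CDU (single-rack cooling)"
    if total_load_kw <= 600:
        return "row/pod-level CDU (multi-rack cooling)"
    return "datacenter-level CDU (1.4 MW existing to custom multi-megawatt)"

print(cdu_scope(80))    # -> in-rack CDU (single-rack cooling)
print(cdu_scope(400))   # -> row/pod-level CDU (multi-rack cooling)
```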
Cooling Distribution Units
CDU solutions supporting Neptune products should be equipped with multiple levels of redundancy across all critical components and should be concurrently maintainable with service access from any two sides. The features Lenovo recommends are listed below.
CDU features:
- Configurable voltage to suit regional power inputs
- Ability to select top or bottom exit outlets for primary and secondary connections
- Integrated filtration on primary (≤500 µm) and secondary (≤50 µm) circuits
- As a minimum, redundant components for the following:
- Secondary distribution pumps with auto change-over feature and/or ability to run concurrently
- Inlet and outlet temperature sensors
- Inlet and outlet pressure transducers
- Expansion vessels
- Inverters
- Automatically operated fill pump with suitable reservoir
- Primary and Secondary flow meters
- Appropriate isolation and check valves to enable concurrent maintenance
- Ability to operate in N+N or N+1 configurations with other CDUs on a common secondary fluid network (SFN)
- Pressure relief valves
- Integrated leak detection monitoring
- Support for redundant power feeds
- Ability to regulate pump speed by either flow or differential pressure control
- Automatic air vents
- Able to communicate via common protocols for systems management (SNMP/Modbus/BACnet/Redfish)
- Hold necessary agency approvals (CE/UL/ISO)
Lenovo recognizes that each installation can be unique to a specific data center and can provide design consultation as part of the server configuration, manufacturing, and delivery process. CDU selection and architecture guidance can be provided by Lenovo following a review of the operating environment and server configuration. This offering includes considerations such as, but not limited to, the following:
- Best practice CDU selection with reference to the above features
- Data center operating limits (water temperature and available flow rates)
- Redundancy requirements
- Secondary fluid network design
- Rack-level vs. row-level CDU cooling
- Pod cooling and growth capabilities per pod
- Bespoke manifold strategies
Questions on any topic listed above or related can be referred to CDUsupport@lenovo.com.
Conclusion
Implementing room standards for liquid cooling solutions in data centers is crucial for ensuring the efficient and reliable operation of IT infrastructure. These standards dictate the environmental conditions, cooling infrastructure, power efficiency measures, and safety protocols necessary to support liquid-cooled systems effectively. By establishing clear guidelines for different tiers of liquid cooling, data center operators can optimize performance, minimize energy consumption, and mitigate risks associated with cooling technology.
Proper room standards enable data centers to adapt to varying cooling requirements, from high-performance computing environments capable of running on warm water to more conventional setups utilizing cooler water temperatures. Ultimately, adherence to these standards not only enhances operational efficiency but also promotes sustainability and safety in data center operations.
Authors
Matthew Ziegler is the Director of Lenovo Neptune and Sustainability at Lenovo's Infrastructure Solutions Group. He leads efforts in liquid-cooling and sustainability for data centers. Matthew began his career in life science research, spending a decade in the field before shifting his focus to the design and architecture of x86-based supercomputers (HPC) for various industries, including life sciences, energy, digital media, and atmospheric sciences. He joined IBM in 2003, where he broadened his HPC expertise before transitioning to Lenovo in 2014 following IBM’s acquisition. At Lenovo, he continued to drive HPC innovations and now dedicates his work to liquid-cooling solutions. Matthew holds a BA in Molecular, Cellular, and Developmental Biology from the University of Colorado, Boulder.
Vinod Kamath is a Lenovo Distinguished Engineer specializing in thermal engineering within the Infrastructure Solutions Group, where he focuses on advancing thermal management and energy efficiency in data centers. Vinod began his career at IBM in 1991, where he spent over two decades contributing to significant advancements in mechanical and thermal engineering. He transitioned to Lenovo in 2014, taking on a key role in high-performance computing and data center technologies. In 2023, Vinod was promoted to Distinguished Engineer, reflecting his leadership in the field and his pivotal role in liquid-cooled technology innovation. He earned his B. Tech in Mechanical Engineering from IIT Delhi and a Ph.D. from The Johns Hopkins University.
Jerrod Buterbaugh serves as a Senior Principal Technical Consultant in Lenovo’s Software and Solutions Group, where he specializes in data center power and cooling services, with a focus on high-density computing systems. Jerrod began his career in the United States Air Force as an Avionics and Electrical Technician before moving into the tech industry. He spent over a decade at IBM, developing expertise in server power systems and data center power engineering. Since joining Lenovo in 2014, Jerrod has been instrumental in advancing cooling technologies, particularly direct contact liquid cooling, and optimizing energy efficiency in data centers. A recognized leader in his field, he is a Lenovo Master Inventor and an active member of ASHRAE TC9.9, contributing to industry standards. Jerrod holds a B.S. in Electrical Engineering from the University of Florida.
Stuart Smith is the High-Performance Computing Principal Consultant for Power and Cooling in Lenovo’s Software and Solutions Group, where he leads the implementation of advanced cooling solutions to optimize data center performance. Stuart began his career in the British Army as a Combat Engineer, where he gained foundational skills in mechanical and electrical engineering during operational tours in Iraq and Afghanistan. After transitioning to the HVAC industry, he gained extensive experience in project management and design for data center and telecommunications cooling systems. Stuart held key roles at Eaton Williams Group Ltd and Nortek Global HVAC before becoming the Director of CSH HVAC Services Ltd in 2021. In 2022, he joined Lenovo, where he focuses on innovative cooling strategies for high-performance computing environments. Stuart’s educational background includes an HNC in Building Services Engineering from Stockport College.
Trademarks
Lenovo and the Lenovo logo are trademarks or registered trademarks of Lenovo in the United States, other countries, or both. A current list of Lenovo trademarks is available on the Web at https://www.lenovo.com/us/en/legal/copytrade/.
The following terms are trademarks of Lenovo in the United States, other countries, or both:
Lenovo®
Neptune®
ThinkSystem®
Other company, product, or service names may be trademarks or service marks of others.