Artificial Intelligence is Accelerating the Need for Liquid Cooling

Artificial intelligence (AI) is revolutionizing the workflow of many companies. It allows for more innovation in almost every field by processing and interpreting a huge amount of data in real time, improving decision-making and problem-solving and leading to more accurate predictive analytics to forecast trends and outcomes. All of this computational and accelerator-based innovation requires greater power consumption and presents challenges for data center cooling.

Over the past ten years, significant innovations in CPU design have raised core counts and clock frequencies. As a result, CPU Thermal Design Power (TDP) has nearly doubled in just a few processor generations and is expected to continue to increase over time. Power-hungry, high-performance general-purpose GPUs have emerged to provide the processing capability that workloads such as AI and machine learning (ML) demand. However, the heat they produce is becoming a challenge for rack and data center deployments. As with CPUs, GPU power consumption has increased rapidly. For example, while an NVIDIA A100 GPU in 2021 drew 300W, the latest NVIDIA H100 GPUs draw up to 700W. Further enhancements could see GPU power consumption topping 1000W within the next three years.

Chart showing CPU processors power consumption from 2008 to 2023 and projections for 2024-25.
Figure 1: CPU power consumption history.

The cooling challenges presented by these powerful processors are being met by innovation beyond the silicon. Cooling components, such as fans and heat sinks, are getting more efficient with each generation. Dell Technologies’ intelligent system management, iDRAC, ensures adequate cooling with minimal fan usage by constantly monitoring sensors throughout the server and learning from its environment. These and other features are part of Dell’s Smart Cooling technology, which allows the fraction of server power spent on cooling to decrease even as total power demands increase.

A key aspect of Dell’s Smart Cooling technology is Direct Liquid Cooling (DLC), where a liquid coolant is pumped to hot components within each server. Dell is on its third generation of DLC server platforms. This journey started in the HPC space in 2018, and we now offer 12 DLC-enabled platforms with our 16th generation servers because DLC is not just for HPC anymore. Customers choose DLC-enabled servers to lower their cooling costs, save space and use more of their limited data center power for compute rather than cooling.

Liquid Cooling Basics Explained

Liquid cooling is a thermal extraction method that uses liquid coolant to remove heat from some or all of the components inside a server. Dell’s solution uses Direct Liquid Cooling, often abbreviated to DLC. In Dell’s DLC3000 and DLC7000 solutions, a coolant distribution unit (CDU) circulates liquid around a coolant loop to collect and convey heat away from the server. Then, via a heat exchanger, facility-chilled water transports the heat out of the data center. PowerEdge servers use specially designed liquid-cooled cold plates, which are in direct contact with the servers’ CPUs and GPUs.
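
To give a feel for the coolant loop described above, the following minimal sketch estimates how much coolant flow is needed to carry a given heat load, using the standard relation heat = mass flow × specific heat × temperature rise. It assumes plain water as the coolant (real DLC coolants may be water-glycol mixtures with different properties), and the function name, the 10 K temperature rise and the 700 W load are illustrative assumptions, not Dell specifications.

```python
# Assumed properties of plain water; real DLC coolants may differ.
WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0
WATER_DENSITY_KG_PER_L = 1.0


def coolant_flow_l_per_min(heat_w: float, delta_t_k: float,
                           specific_heat: float = WATER_SPECIFIC_HEAT_J_PER_KG_K,
                           density: float = WATER_DENSITY_KG_PER_L) -> float:
    """Estimate the coolant flow (litres/minute) needed to absorb heat_w watts
    while the coolant warms by delta_t_k kelvin across the cold plate."""
    if heat_w < 0 or delta_t_k <= 0:
        raise ValueError("heat_w must be >= 0 and delta_t_k must be > 0")
    # Q = m_dot * c_p * delta_T, rearranged for the mass flow rate in kg/s.
    mass_flow_kg_per_s = heat_w / (specific_heat * delta_t_k)
    # Convert kg/s to litres/minute using the coolant density.
    return mass_flow_kg_per_s / density * 60.0


# Hypothetical example: one 700 W device with a 10 K coolant temperature rise.
print(f"{coolant_flow_l_per_min(700.0, 10.0):.2f} L/min")
```

Under these assumptions, a single 700 W device needs roughly 1 L/min of water, which illustrates why a modest liquid loop can remove heat that would otherwise require a large volume of moving air.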

Diagram depicting the components of a typical Direct Liquid Cooling (DLC) solution.
Figure 2: The components of a typical DLC solution.

Six Key Benefits of Direct Liquid Cooling

Liquid cooling is much more efficient at collecting and moving heat than air cooling: liquid holds four times more heat than air. As a result, DLC offers numerous advantages over traditional air-cooling methods, making it an attractive option for modern data centers.

    1. Greater computational density. DLC allows for higher server density in data centers because there is no longer the need to design space for the required airflow. For example, Dell DLC allows customers to deploy 58% more CPU cores per rack using PowerEdge C6620 than with air-cooled C6620 servers.[1]
    2. Uniform cooling. Liquid cooling eliminates hot spots and ensures even distribution of cooling across servers.
    3. Improved server performance. Maintaining servers at supported temperatures through liquid cooling can lead to improved performance and even lower failure rates. Overheating can lead to the CPU temporarily applying thermal throttling, which reduces server performance.
    4. Energy savings. By reducing the need for energy-intensive air conditioning systems and high-speed fans, direct liquid cooling can lead to energy savings, reduced operational costs and a lower power usage effectiveness (PUE) ratio.
    5. Increased sustainability. Lower power can mean a reduced carbon footprint.
    6. Noise reduction. As a by-product, Direct Liquid Cooling systems are generally quieter than air cooling systems because they require server fans to run at much lower speeds, and the data center air moving infrastructure has far less work.
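
The energy-savings point can be made concrete with a little arithmetic. PUE is the ratio of total facility power to IT equipment power, so a lower PUE means less overhead for the same compute. The sketch below is a hedged illustration only: the function names, the 1,000 kW IT load and the PUE values of 1.6 and 1.3 are illustrative inputs, not measurements from any particular site.

```python
def facility_power_kw(it_power_kw: float, pue: float) -> float:
    """Total facility power implied by an IT load and a PUE value."""
    if it_power_kw < 0 or pue < 1.0:
        raise ValueError("it_power_kw must be >= 0 and pue must be >= 1.0")
    return it_power_kw * pue


def it_capacity_kw(facility_budget_kw: float, pue: float) -> float:
    """IT load that fits inside a fixed facility power budget at a given PUE."""
    if facility_budget_kw < 0 or pue < 1.0:
        raise ValueError("facility_budget_kw must be >= 0 and pue must be >= 1.0")
    return facility_budget_kw / pue


it_load_kw = 1000.0  # hypothetical IT load
before = facility_power_kw(it_load_kw, 1.6)  # illustrative "before" PUE
after = facility_power_kw(it_load_kw, 1.3)   # illustrative "after" PUE

print(f"Facility power: {before:.0f} kW -> {after:.0f} kW")
print(f"Overhead saved: {before - after:.0f} kW")
print(f"IT capacity in the original budget at PUE 1.3: "
      f"{it_capacity_kw(before, 1.3):.0f} kW")
```

With these inputs, the same IT load needs about 300 kW less facility power, or, viewed the other way, the original power budget can host roughly 230 kW more compute.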

Diagram depicting the hardware components of Dell's Direct Liquid Cooling (DLC) solution.
Figure 3: Hardware components of Dell DLC solution.

Dell customers can now benefit from a new pre-integrated DLC3000 or DLC7000 rack solution for PowerEdge servers that eliminates the complexity and risk associated with correctly selecting and installing liquid cooling. The DLC3000 rack solution is ideal for customers looking to deploy up to five racks or to pilot their first DLC solution. It includes a rack, a rack manifold to distribute coolant to servers, and an in-rack CDU, and it is ready to accept factory-built Dell DLC-enabled rack or modular servers. The rack with the integrated DLC3000 cooling solution is built, tested and then delivered to the customer’s data center floor, where the Dell professional services team connects the rack to the facility-chilled water supply and ensures full operation. Finally, Dell ProSupport maintenance and warranty coverage backs everything in the rack to make the whole experience as simple as possible.

Customers can monitor and manage server power and thermal data with Dell OpenManage Enterprise Power Manager. Power Manager collects information supplied by each server’s iDRAC and can report it for an individual server, a rack, a row or the entire data center. Organizations can use this data to review server power efficiency and locate thermal anomalies such as hotspots. Power Manager also offers additional features, including power capping and carbon emission calculation, and has built-in automation to respond to DLC leaks and thermal events.
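
As a rough illustration of what locating a thermal anomaly can mean in practice, the sketch below flags servers whose temperature reading sits well above the median of their peers. This is not Power Manager's API or its actual algorithm: the function name, the sample readings and the 5 °C margin are all hypothetical, chosen only to show the idea.

```python
from statistics import median


def find_hotspots(temps_c: dict[str, float], margin_c: float = 5.0) -> list[str]:
    """Return the names of servers whose temperature exceeds the group
    median by more than margin_c degrees Celsius (hypothetical rule)."""
    if not temps_c:
        return []
    baseline = median(temps_c.values())
    return sorted(name for name, temp in temps_c.items()
                  if temp - baseline > margin_c)


# Hypothetical inlet temperature readings for four servers in one rack.
readings = {
    "rack1-u01": 24.0,
    "rack1-u02": 25.5,
    "rack1-u03": 33.0,
    "rack1-u04": 24.5,
}
print(find_hotspots(readings))
```

Comparing each server against the group median rather than a fixed limit is one simple design choice; it highlights a server that runs hot relative to its neighbours even when every reading is still within its supported range.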

As CPU and GPU power continues to grow to support the most demanding workloads, liquid cooling will play an increasingly important role in data centers. While Direct Liquid Cooling offers many benefits, it is not without its challenges. Implementing liquid cooling requires planning and additional installation work. We have helped many customers along this journey to reduce their data center PUE. PhonePe, for example, saw its PUE drop from 1.6 to 1.3. Dell Technologies can support your DLC strategy, wherever you are in your journey. Contact us to learn more.