How to prepare your data centres for AI workloads

Wojtek Piorko, Managing Director Africa, Vertiv

The investments required to upgrade data centre infrastructure to power and cool AI hardware are substantial. For African enterprises, this transition will not happen quickly, and data centre administrators must look for ways to make power and cooling future-ready, explain Wojtek Piorko and Jonathan Duncan of Vertiv.

AI is already transforming people’s everyday lives, with local use of technologies like ChatGPT, virtual assistants, navigation applications and chatbots on the upswing. And just as it is transforming every single industry, it is also beginning to fundamentally change data centre infrastructure, driving significant changes in how high-performance computing is powered and cooled.

To put this into perspective, consider that a typical IT rack used to run workloads of five to 10 kilowatts (kW), and racks running loads higher than 20 kW were considered high-density. AI chips, however, can require around five times as much power, and five times as much cooling capacity, in the same space as a traditional server. So, we are now seeing rack densities of 40 kW per rack, and even more than 100 kW in some instances.
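To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python using the figures above; the multiplier and rack loads are the article's round numbers, not measured values:

    # Back-of-the-envelope rack density, using the article's round figures
    traditional_rack_kw = (5, 10)   # typical legacy IT rack load range, in kW
    ai_multiplier = 5               # AI chips need ~5x the power in the same space

    low, high = (kw * ai_multiplier for kw in traditional_rack_kw)
    print(f"Estimated AI rack load: {low}-{high} kW")
    # -> 25-50 kW, in line with the 40 kW (and higher) densities now being seen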

This will require extensive capacity increases across the entire power train, from the grid to the chips in each rack. It also means that, because traditional air-cooling methods cannot handle the heat generated by GPUs running AI calculations, the introduction of liquid-cooling technologies into the data centre white space, and eventually the enterprise server room, will be a requirement for most deployments.

Investments to upgrade the infrastructure needed to power and cool AI hardware are substantial, and navigating these new design challenges is critical. The transition will not happen quickly: data centre and server room designers must look for ways to make power and cooling infrastructure future-ready, allowing for the future growth of their workloads.

To absorb the massive amount of heat generated by hardware running AI workloads, two liquid cooling technologies are emerging as primary options:

Direct-to-chip liquid cooling

Cold plates sit atop the heat-generating components, usually chips such as CPUs and GPUs, to draw off heat. Pumped single-phase or two-phase fluid draws heat from the cold plates and carries it out of the data centre, exchanging heat but not fluid with the chip. This can remove 70% to 75% of the heat generated by equipment in the rack, leaving 25% to 30% to be removed by air-cooling systems.
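As a rough illustration of what that split means in practice, the short Python sketch below applies the 70% capture figure quoted above to an assumed 40 kW rack; the rack load is an assumption for illustration only:

    # Heat split for direct-to-chip liquid cooling (70% capture, per the article)
    rack_load_kw = 40.0        # assumed rack density
    liquid_capture = 0.70      # lower bound of the quoted 70-75% range

    liquid_kw = rack_load_kw * liquid_capture
    air_kw = rack_load_kw - liquid_kw
    print(f"Liquid loop removes ~{liquid_kw:.0f} kW; air systems still handle ~{air_kw:.0f} kW")
    # -> ~28 kW to liquid, ~12 kW left for air cooling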

Rear-door heat exchangers

Passive or active heat exchangers replace the rear door of the IT rack with heat-exchanging coils, through which fluid absorbs heat produced in the rack. These systems are often combined with other cooling systems, either as a strategy to maintain room neutrality or as part of a transitional design that begins the journey into liquid cooling.

While direct-to-chip liquid cooling offers significantly higher density cooling capacity than air, it is important to note that there is still excess heat that the cold plates cannot capture. This heat will be rejected into the data room unless it is contained and removed through other means such as rear-door heat exchangers or room air cooling.

Because power and cooling are becoming such integral parts of IT solution design in the data room, we are seeing a blurring of the borders between IT and facilities teams, something that can add complexity when it comes to design, deployment and operation. Thus, partnerships and full-solution expertise rank as top requirements for smooth transitions to higher densities.

Jonathan Duncan, Technical Director Africa, Vertiv

Challenges for Africa

One of the main challenges when considering cooling solutions for data centres is that servers must be kept within certain temperature and humidity limits to function optimally. At the same time, water availability and usage are concerns that must be constantly addressed across the continent's data centre arena: Africa is the second-driest continent, behind only Australia, with two thirds of it classed as arid or semi-arid.

Liquid cooling solutions offer one way to avert these issues when they use closed-loop technology, in which the system's pipes are filled with water once and no water is wasted thereafter. Moreover, a closed water loop can capture heat and reuse it to warm nearby offices, homes or farms, supporting the circular economy.
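As a hedged, order-of-magnitude sketch of that reuse potential, the Python below multiplies an assumed rack load and capture fraction by the hours in a year; every figure is an assumption chosen for illustration, not a Vertiv specification:

    # Rough annual heat-reuse potential of a closed liquid loop (all figures assumed)
    rack_load_kw = 40.0       # assumed average AI rack load
    capture_fraction = 0.70   # assumed share of heat captured by the liquid loop
    hours_per_year = 8760

    recoverable_kwh = rack_load_kw * capture_fraction * hours_per_year
    print(f"~{recoverable_kwh:,.0f} kWh of heat per rack per year could be redirected")
    # -> ~245,280 kWh per rack per year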

How to do it? A roadmap for kicking off this type of cooling strategy should include the following steps:

Current and future requirements

IT and facility teams must decide how much space to allocate to new AI and HPC workloads to support current demand and growth over the next few years. Some will convert a few racks at a time, while others could allocate entire rooms to these workloads and support the addition of liquid cooling systems.
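A simple way to frame that decision is to project rack counts under an assumed growth rate, as in the Python sketch below; the starting count and growth rate are illustrative assumptions only:

    # Sketch of an AI rack capacity projection (starting point and growth assumed)
    racks_now = 4           # AI racks needed today (assumption)
    annual_growth = 0.5     # assumed 50% yearly growth in AI workload

    for year in range(1, 4):
        projected = racks_now * (1 + annual_growth) ** year
        print(f"Year {year}: plan for ~{projected:.0f} AI racks")
    # -> 6, 9 and 14 racks over the next three years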

Site audit

Before developing a business case, teams need to know if retrofitting a facility with liquid cooling systems is technically and economically feasible. The IT and facility team should work with partners to conduct a thorough site audit, including the following steps:

  • Perform a computational fluid dynamics study of existing airflows in the facility.
  • Analyse existing air-cooling equipment to see if it provides enough capacity to be leveraged in the new hybrid cooling infrastructure and if current piping can be reused.
  • Perform a flow network modelling analysis to evaluate the ability of the liquid cooling system to support server liquid cooling requirements.
  • Execute water usage effectiveness (WUE) and power usage effectiveness (PUE) analyses to determine how efficiently the facility uses water and power resources (see the metric definitions sketched after this list).
  • Carry out a total cost of ownership study to optimise operations by replacing old or inefficient equipment to lower operational costs.
  • Examine infrastructure to see if it can be adapted for use with more power-intensive workloads, such as AI.
  • Review physical space to see if raised floors can support the combined weight of new power and hybrid cooling systems, and determine access routes for piping.
  • Check the facility for any outstanding maintenance required on existing infrastructure.
  • Review the on-site water supply, to determine if it is suitable for use in planned liquid cooling systems.
  • Address any safety regulation compliance concerns.
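For reference, the two effectiveness metrics named above have standard definitions (per The Green Grid), sketched here in Python; the example figures are assumed annual values for a hypothetical facility:

    # Standard data centre efficiency metrics (definitions per The Green Grid)
    def pue(total_facility_kwh: float, it_kwh: float) -> float:
        """Power Usage Effectiveness: total facility energy / IT energy (ideal = 1.0)."""
        return total_facility_kwh / it_kwh

    def wue(annual_water_litres: float, it_kwh: float) -> float:
        """Water Usage Effectiveness: litres of site water per kWh of IT energy."""
        return annual_water_litres / it_kwh

    # Assumed annual figures for a hypothetical facility:
    print(f"PUE: {pue(5_000_000, 3_600_000):.2f}")        # -> 1.39
    print(f"WUE: {wue(1_200_000, 3_600_000):.2f} L/kWh")  # -> 0.33 L/kWh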

Modelling the desired space

With this data and the support of a specialised partner, IT and facility teams can model the desired hybrid cooling infrastructure in the data centre and identify obstacles to overcome, such as weight restrictions, a lack of on-site water, the need to install new piping, access route concerns, and other issues. Once all issues have been addressed, it is a good idea to contract with a vendor to construct a digital twin replica of the new design to explore new systems and processes in 3D.

Impact on operations

The audit and modelling exercise gives the IT and facility team insight into how extensive the liquid cooling deployment will be, enabling them to develop a business case for executive consideration. The team will also want to consider how on-site construction will disrupt current operations, and what impact adding extra heat loads on site will have on current workloads and service-level agreements.

Efficiency and sustainability

Since liquid cooling removes heat at the source, it can be more efficient than air cooling alone and lowers a facility's PUE. Using water or another fluid to cool systems also allows teams to recapture and reuse heat, reducing wasted energy and supporting the circular economy. These gains can reduce enterprises' indirect, energy-related emissions. As a result, liquid cooling can be an essential part of enterprises' sustainability programmes.
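The PUE effect can be illustrated with a simple before-and-after comparison; the overhead figures below are assumptions chosen for illustration, not benchmark data:

    # Illustrative PUE impact of shifting heat removal to liquid (assumed shares)
    it_kw = 1000.0
    cooling_air_kw = 400.0      # assumed overhead of air-only cooling
    cooling_hybrid_kw = 250.0   # assumed overhead of hybrid liquid/air cooling
    other_overhead_kw = 100.0   # power distribution, lighting, etc. (assumed)

    pue_air = (it_kw + cooling_air_kw + other_overhead_kw) / it_kw
    pue_hybrid = (it_kw + cooling_hybrid_kw + other_overhead_kw) / it_kw
    print(f"PUE air-only: {pue_air:.2f}, hybrid: {pue_hybrid:.2f}")  # 1.50 vs 1.35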

RFP and RFQ

With this information, a new solution customised for site requirements can be designed. Teams can then finalise the design with a bill of materials and required services, issue requests for quotation, and select the manufacturers to build and integrate the liquid cooling system.
