
TALKING BUSINESS

The need to adapt physical infrastructure design for data centres in the era of AI disruption is important. Can you elaborate on the specific challenges that data centres face when accommodating AI-driven workloads?

Even though AI is a hot topic, very few people are talking about the physical infrastructure aspect of it. AI presents a different type of workload and technology compared to the traditional, more common x86 two-socket server. New AI workloads run on GPU accelerators that operate in parallel, like one giant computer, which is quite different from x86 servers that process workloads and then return to idle mode. These GPU clusters are capable of processing and training data at very high speeds and capacities.
Schneider Electric's white paper outlines key considerations related to power, cooling, racks and software tools in the context of AI. Can you provide insight into some of the most critical considerations and their impact on data centre design?
The servers are different – they are larger, heavier, deeper and have more connections. Frequently, they are now liquid-cooled, or will be in the future. There are tremendous changes to power, cooling and racks, which must be beefier to support the weight. In addition to being larger and heavier, the servers also use more power. The new training clusters run at capacity, so if you design for 100kW per rack, it will run at 100kW per rack.
It's important to be cognizant of running at capacity and to use software tools to manage your environment, as you are at the critical edge with a marginal buffer.
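To make the "marginal buffer" point concrete, the check such a software tool performs can be reduced to a few lines. The sketch below is a minimal illustration in Python, not a reference to any particular DCIM product; the rack names, capacities, readings and the 10% buffer are assumptions made for the example.

# Minimal sketch of the kind of headroom check a monitoring/DCIM tool runs.
# Rack capacities, readings and the 10% buffer are illustrative assumptions.

RACKS = {
    "ai-rack-01": {"capacity_kw": 100.0, "measured_kw": 97.5},
    "ai-rack-02": {"capacity_kw": 100.0, "measured_kw": 88.0},
    "x86-rack-07": {"capacity_kw": 10.0, "measured_kw": 3.2},
}

BUFFER = 0.10  # alert when less than 10% headroom remains


def headroom_report(racks: dict, buffer: float) -> list[str]:
    """Return alert messages for racks running too close to capacity."""
    alerts = []
    for name, r in racks.items():
        utilisation = r["measured_kw"] / r["capacity_kw"]
        if utilisation >= 1.0 - buffer:
            alerts.append(
                f"{name}: {r['measured_kw']:.1f}kW of {r['capacity_kw']:.0f}kW "
                f"({utilisation:.0%}) - marginal buffer"
            )
    return alerts


if __name__ == "__main__":
    for line in headroom_report(RACKS, BUFFER):
        print(line)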
AI workloads are expected to grow significantly. What strategies and innovations can organisations implement to address the increasing power demand in existing and new data centres, optimising them for AI?
There are two approaches to consider. Starting from scratch would be the preferred option, as it allows for optimisation of the power train with fewer voltage step-downs and transformers. For existing environments with sufficient power capacity, technologies like rear-door heat exchangers can be fitted to current racks, enabling higher densities of 40 to 70kW per rack. Depending on the available power, retrofitting the current site can work. However, if power is limited, a very dense deployment may leave excess floor space vacant.
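As a rough illustration of that retrofit trade-off, the short sketch below shows how a fixed site power budget translates into powered racks and vacant floor space at different densities. The 2MW budget, 200 rack positions and 2.5m² per position are assumed figures, not numbers from the interview.

# Illustrative arithmetic for the power-limited retrofit trade-off.
# The power budget, rack positions and per-rack footprint are assumed values.

SITE_POWER_KW = 2000.0      # usable IT power budget (assumed)
RACK_POSITIONS = 200        # rack positions on the floor (assumed)
FOOTPRINT_M2 = 2.5          # floor space per rack position, incl. aisle share (assumed)

for density_kw in (10, 40, 70, 100):
    racks_powered = int(SITE_POWER_KW // density_kw)
    racks_used = min(racks_powered, RACK_POSITIONS)
    vacant = RACK_POSITIONS - racks_used
    print(
        f"{density_kw:>3}kW/rack: {racks_used:>3} racks powered, "
        f"{vacant * FOOTPRINT_M2:>6.1f} m2 of floor space left vacant"
    )

With the same 2MW, 10kW racks fill the whole floor, while 100kW racks draw all the available power into 20 positions and leave the rest of the room empty.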
Steven Carlini , Vice President of Innovation and Data Center , Schneider Electric
Another consideration is that the density of these racks could range from 30 to 50 to as much as 100kW per rack. This denser orientation dramatically changes everything and presents challenges, as the power must be delivered in a smaller area and distributed at higher amperage.
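The amperage point follows from the standard three-phase power relation, P = √3 × V × I × PF. Assuming 400V line-to-line distribution and a 0.95 power factor (both assumptions, not figures given here), the sketch below compares the current that must be delivered to a rack at 10kW versus 100kW.

# Current draw per rack from the three-phase relation P = sqrt(3) * V * I * PF.
# The 400V line-to-line voltage and 0.95 power factor are assumed values.
import math

VOLTAGE_LL = 400.0   # line-to-line voltage in volts (assumed)
POWER_FACTOR = 0.95  # assumed power factor


def rack_current_amps(power_kw: float) -> float:
    """Phase current (A) needed to deliver power_kw to a rack."""
    return power_kw * 1000.0 / (math.sqrt(3) * VOLTAGE_LL * POWER_FACTOR)


for kw in (10, 30, 50, 100):
    print(f"{kw:>3}kW per rack -> {rack_current_amps(kw):6.1f} A per phase")

At 100kW the per-phase current is ten times that of a legacy 10kW rack, which is why busways, breakers and connectors all have to be resized.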
Finally, you have the cooling concentration and the piping coming in and out of the server, which leads to manifolds and cooling distribution units – all of which are new factors.
You cannot spread out the load, because these servers and individual GPUs run in parallel and are connected through fibre. The fabric, or InfiniBand, runs at high speed, which makes it extremely expensive. By trying to spread the load apart, you would spend a lot of money deploying this fibre network for all of these processors. In the servers, each GPU has a network connection and each one has a fibre connection. This presents a large cost in addition to the real estate costs.
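A back-of-the-envelope way to see why spreading the cluster out is costly: every GPU keeps its own fibre connection regardless of layout, so spreading the same GPUs across more racks mainly adds cable length and longer-reach optics. The model below is a toy, and the GPU count, costs and rack spacing in it are hypothetical placeholders, not figures from the interview.

# Toy model of fabric cabling cost versus how far the cluster is spread out.
# GPU count, per-link optics cost, cable cost and rack spacing are hypothetical.

GPUS = 256                      # GPUs in the training cluster (hypothetical)
LINKS_PER_GPU = 1               # each GPU has its own fibre connection
OPTICS_COST_PER_LINK = 1500.0   # transceivers per link, USD (hypothetical)
CABLE_COST_PER_M = 20.0         # installed fibre cost per metre (hypothetical)
RACK_PITCH_M = 1.2              # spacing between adjacent rack positions (hypothetical)


def fabric_cost(gpus_per_rack: int) -> float:
    """Rough fabric cost when the cluster is packed at gpus_per_rack."""
    racks = -(-GPUS // gpus_per_rack)        # ceiling division
    avg_run_m = racks * RACK_PITCH_M / 2.0   # crude average cable run to a central switch
    links = GPUS * LINKS_PER_GPU
    return links * (OPTICS_COST_PER_LINK + avg_run_m * CABLE_COST_PER_M)


for gpus_per_rack in (32, 8, 2):             # dense versus increasingly spread out
    print(f"{gpus_per_rack:>2} GPUs/rack -> fabric cost ~${fabric_cost(gpus_per_rack):,.0f}")

The link count never changes; only the distance does, and in this toy model the cabling bill nearly doubles once the same cluster is spread across 128 racks instead of 8.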
Due to these high costs, we are seeing a strong desire to deploy these servers, high-density racks and clusters in as small a footprint as possible. By design, they operate very close to capacity and maximum thresholds. Previously, when you had 10kW per rack, you were usually running at 3kW per rack, and it spiked up occasionally to 10.
Recommended strategies involve modelling everything with Digital Twins, from power systems to IT rooms. This allows organisations to better visualise the implications before deploying in the physical world.
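At its simplest, a digital twin is a model of the room that can be queried before anything is deployed physically. The sketch below is a deliberately tiny stand-in for that idea – it only checks aggregate power and cooling budgets as AI racks are added – and the limits and loads are assumed values, not Schneider Electric figures or tooling.

# A deliberately tiny "digital twin" stand-in: a model of the IT room that can be
# queried before deployment. Room limits and rack loads are assumed values.
from dataclasses import dataclass, field


@dataclass
class RoomTwin:
    power_budget_kw: float
    cooling_budget_kw: float
    racks: list[float] = field(default_factory=list)  # per-rack IT load in kW

    def can_add(self, rack_kw: float) -> bool:
        """Would adding this rack stay within the power and cooling limits?"""
        projected = sum(self.racks) + rack_kw
        return projected <= self.power_budget_kw and projected <= self.cooling_budget_kw

    def add(self, rack_kw: float) -> None:
        if not self.can_add(rack_kw):
            raise ValueError(f"adding {rack_kw}kW would exceed the room's budget")
        self.racks.append(rack_kw)


if __name__ == "__main__":
    room = RoomTwin(power_budget_kw=600.0, cooling_budget_kw=550.0)  # assumed limits
    for _ in range(8):
        if room.can_add(100.0):
            room.add(100.0)
    print(f"{len(room.racks)} x 100kW AI racks fit; projected load {sum(room.racks):.0f}kW")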
With AI applications placing strain on power and cooling infrastructure, how can data centres balance energy efficiency and environmental responsibility with the demands of AI-driven applications?
Currently, a permit for the construction of a data centre requires a demonstration that the facility can operate at a very high efficiency level, or a very low PUE. In many cases, to obtain a permit, a data centre must show that it will be powered by a certain amount of renewable sources, or use PPAs or renewable energy credits. It's a given that these centres must be designed to be highly efficient.
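For reference, PUE is total facility power divided by the power delivered to the IT equipment, so a value close to 1.0 means almost all of the energy reaches the IT load. The figures in the example below are assumptions used only to show the calculation.

# PUE = total facility power / IT equipment power. Input figures are assumed.

def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power Usage Effectiveness: total power consumed per kW of IT load."""
    return total_facility_kw / it_load_kw


# Hypothetical example: 1,000kW of IT load in two differently designed facilities.
print(f"Legacy air-cooled site:   PUE = {pue(1600.0, 1000.0):.2f}")
print(f"Optimised liquid-cooled:  PUE = {pue(1150.0, 1000.0):.2f}")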
Liquid cooling contributes significantly to making the cooling more efficient. The types of economisers and heat rejection used outside to further cool the liquid are also important considerations. Designing data centres to be as efficient as possible and using the highest percentage of renewable power is the approach that leading companies are taking.