There are various ways to get the best out of your data centre. Zac Potts, Associate Director (Data Centre Design) – Sudlows, discusses what this means and why it is critical to invest in definition and design in order to lay strong foundations for reliable performance.
What does it mean to you to be squeezing every drop of performance out of a data centre? The answer will undoubtedly change depending on the angle you’re coming from, but should it?
Designers of the cooling or power systems may read this as delivering the highest capacity or the most efficient cooling or UPS system. IT teams will have different ideas of ‘performance’ depending on what they do: compute power, bandwidth, or storage capacity, perhaps.
Finance, of course, will likely look at the bottom line – a high-performance data centre is one that generates a lot of money, or costs very little to support a business which generates a lot of money.
Fundamentally, performance is about output, so it’s critical that we understand what performance means for the data centre in question. Unfortunately, very few data centres have a single function – Bitcoin-mining farms, maybe – and the vast majority are inherently complex, with numerous functions whose proportion and distribution may even change over time.
Getting the most out of any single facility can mean different things – is it more racks, more power, or higher density? Is it high-performance compute, GPU arrays, or other specialised hardware?
‘High performance’ can mean many things to different people.
Performance is different to efficiency but often the two become closely linked – make it more efficient and then use the spare capacity you’ve created to deliver more.
Whatever the intended meaning, there are two critical factors in delivering high performance: definition and operation.
The definition stage outlines what is needed and within what limitations. Depending on the application, it could be simple or fairly complex, but to be able to squeeze every bit out of a facility or design, it is important to ensure the definition is sound, free from ambiguity, and applies the right constraints in the right places. It is not the supply air temperature to the data centre that’s important, for example, but the temperature of the equipment being supported. Defining the wrong parameter often results in great effort and expense being invested in meeting a specific requirement which is later found to be either outdated or arbitrarily set.
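To make the point concrete, the short sketch below is purely illustrative – the class, the field names and the temperature figures are assumptions for the sake of the example, not a Sudlows specification – and contrasts a definition that constrains the mechanism with one that constrains the outcome that actually matters:

```python
from dataclasses import dataclass

# Hypothetical way of recording a cooling requirement. The point is *what* is
# constrained: the IT equipment inlet temperature (the outcome that matters),
# not the supply air temperature (merely one means of achieving it).

@dataclass
class CoolingConstraint:
    parameter: str   # the measured quantity the requirement applies to
    limit_c: float   # upper limit in degrees Celsius
    rationale: str   # why the limit exists, so it can be revisited later

# Weak definition: constrains the mechanism, for reasons nobody can recall.
supply_air_constraint = CoolingConstraint(
    parameter="supply_air_temperature",
    limit_c=22.0,
    rationale="historical setpoint, origin unclear",
)

# Stronger definition: constrains what the supported equipment actually needs.
equipment_inlet_constraint = CoolingConstraint(
    parameter="it_equipment_inlet_temperature",
    limit_c=27.0,
    rationale="vendor / ASHRAE recommended inlet range for installed hardware",
)
```

Written this way, the constraint carries its own justification, so it can be challenged later rather than defended simply because it has always been there.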
Whether for a new facility or an existing one, investment in the definition stage will always pay off: once the facility is fully defined, we can set about maximising its performance, and, because the definition is sound, we will know exactly what performance means.
For many years, and still today, many high-performing data centres have consisted of a number of carefully tuned systems, each looking after part of the facility and reacting to changes in demand, be it at the IT or facility level. These control systems work to balance the performance targets and constraints set out in the definition stage, so the importance of getting that right is clear. Design tools such as CFD and load placement algorithms offer a way to refine operation, but they are still based on the same definition and only offer information based on a snapshot in time.
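As a simple illustration of what that snapshot-based approach looks like, the sketch below places a new IT load using only the current power and cooling headroom of each rack. The rack figures and the greedy rule are assumptions for illustration, not a description of any particular product; a real tool would draw its headroom values from live monitoring and a CFD-derived cooling model.

```python
# Snapshot-based load placement: given the *current* headroom of each rack,
# pick a home for a new IT load. Figures are illustrative only.

racks = [
    # name, power headroom (kW), cooling headroom (kW)
    ("A01", 3.2, 2.8),
    ("A02", 6.5, 4.1),
    ("B01", 1.0, 5.0),
]

def place_load(racks, load_kw):
    """Return the rack that can host the load with the most margin left over."""
    candidates = [
        (min(power, cooling) - load_kw, name)
        for name, power, cooling in racks
        if power >= load_kw and cooling >= load_kw
    ]
    if not candidates:
        return None  # nothing can take it under the current snapshot
    margin, name = max(candidates)
    return name

print(place_load(racks, 2.5))  # -> "A02" with this particular snapshot
```

The limitation is exactly the one noted above: the decision is only as good as the snapshot and the definition behind it, and it says nothing about how conditions will look an hour later.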
A data centre with a solid definition, well designed and with a modern deployment of sensors and controls, would still be a good performer, delivering good figures against any number of KPIs chosen for reporting. That said, momentum is growing behind the adoption of more complex systems with a wider scope and some level of Machine Learning.
The capabilities of Machine Learning can in some cases be overstated. At this time, the level of adoption is limited and, among active deployments, results range from success to failure.
The proven potential of Machine Learning systems should not be underestimated though, especially when it comes to the final incremental improvements in efficiency and performance. It is in this area that Machine Learning offers an impartial, multi-skilled, constantly working and constantly watching team member, one that is aware of the goals and can predict how best to achieve them.
The same tools which feed into leading design processes are being integrated into the decision-making of the ML models. At Sudlows, for instance, our Simulation and Modelling team are integrating CFD and hydraulic system models so that algorithms can work with both observed historical data and continually recalculated simulations of scenarios which, hopefully, we will never experience: unnerving combinations of poor load placement, system failures, peak days and grid power interruptions.
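A minimal sketch of that idea, assuming a generic regression model and entirely made-up telemetry and scenario figures, is to pool observed history with simulated failure cases into a single training set, so the model can advise on conditions the site has never actually seen:

```python
# Illustrative only: blend observed history with simulated "what if" scenarios
# (e.g. CFD runs of failure cases) so a model learns about conditions the
# facility has never experienced. Features, figures and the model choice are
# assumptions for the sake of the example.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Observed telemetry: [total IT load kW, outdoor temp C, chillers running]
observed_X = np.array([[450, 18, 2], [480, 24, 2], [510, 29, 3]])
observed_y = np.array([24.1, 25.0, 25.6])   # worst-case inlet temperature (C)

# Simulated scenarios the site has (hopefully) never seen: a peak day with a
# chiller lost, produced by CFD / hydraulic models rather than sensors.
simulated_X = np.array([[520, 35, 1], [500, 38, 2]])
simulated_y = np.array([31.2, 29.4])

X = np.vstack([observed_X, simulated_X])
y = np.concatenate([observed_y, simulated_y])

model = GradientBoostingRegressor().fit(X, y)

# Ask the model about a condition close to a simulated failure case.
print(model.predict([[515, 36, 1]]))
```

In practice the volume of data, the feature set and the modelling approach would all be far richer, but the principle is the same: the simulations extend the model's experience beyond what the sensors have ever recorded.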
If limited to improving the performance of the M&E alone, even a well-developed ML system would soon run out of challenges, but fortunately the scope is much greater. Systems have expanded to consider long- and short-term reliability, to offer predictive advice on imminent faults and issues and, perhaps most importantly, to bridge disciplines, advising both the IT and M&E systems based on the calculated impact of each on the other.
There is a huge gap between the majority of the industry and the small few that are implementing such systems at scale. Given how slowly even basics such as aisle containment have been adopted, it might well be a long time before we see such systems in the majority of spaces, but we will get there eventually.
The key to squeezing every last drop of performance out of a facility might one day be a highly refined Machine Learning system, but first and foremost, in my opinion, it is the project definition. A poor definition of what is required, and of the constraints within which to operate, will hinder the future deployment of Machine Learning in much the same way as it hinders the initial design and manual refinement.
For today, although such advanced controls will always offer an edge, a well-designed facility with a ‘standard’ control system can still be optimised to deliver a good level of performance gain through good initial design and constant review, using many of the same tools which feed into an advanced ML platform.
In many ways, a modern facility with its extensive data collection and dynamic operation is ‘ML ready’ when the time comes, but in the meantime, it is critical to build up from the basics and invest in the definition and design, or the additional layers of the future will have a very poor foundation upon which to build.