Are you at risk of a cooling failure? Seven checks you can do today

Jo-Anne Garvie, Director, Secure I.T. Environments

Jo-Anne Garvie, Director at Secure I.T. Environments, offers seven tips for things you can do today to evaluate whether you might be at heightened risk over the summer.

In early June 2023, it was reported that El Niño, the climate pattern that warms the planet, had begun in the Pacific and would push up temperatures across the globe over the coming two years. Recently, we have seen the terrible impact of wildfires in places such as Canada and Hawaii. These events may feel a long way away, but UK temperatures have risen by an average of 1 degree since 1990 and summers have set new records in recent years.

For data centre owners, prolonged periods of high temperature are of particular concern because of the strain they place on critical infrastructure and the risk of failure that follows. Poorly maintained cooling has been a major contributor to some serious data centre failures in recent years, and with supply chains limiting the availability of new equipment, it is critical to stay on top of the needs of your data centre assets.

Keeping a cooling system in the best possible shape will reduce the strain it is under, make it more efficient, cut costs and prolong its life. There are also things you can check today that could have an immediate impact on the performance of your cooling infrastructure and give you early warning of problems to come.

  • Clean those coils – When was the last time the coils in your refrigerant-based AC system were cleaned? You should have a log of this, and if it was two or more months ago, your next clean should be booked in. Clean coils offer the optimum heat rejection for your system, so getting into a cycle where they receive a clean just before the hotter summer months is important too. Every system should receive three to four cleans a year.
  • Refrigerant gas charges – When was the last time your system was charged? Gas losses inevitably take place during the normal operation of your AC system, and if the charge is not kept within its normal range, the performance of the system is compromised. Your system may warn you if the charge becomes seriously low, but you may not notice drops in performance that have already occurred. Have your system charged annually to keep it operating at its optimum design capacity, and consider adding a Pressure Leak Test to any servicing package you have.
  • Are you out of spec? – In recent decades, design parameters for ambient temperature have risen from the mid-30s to the mid-40s centigrade. Depending on its age, your installation may not be designed to deal with these new norms, putting you at an elevated risk of failure because your system is working harder to maintain target temperatures and humidity. Check the design figures for your installation so that you can assess whether you need to accelerate your upgrade plans.
  • Don’t forget your CRAC units – Once again, it’s all down to organised maintenance. The hardworking CRAC (Computer Room Air Conditioning) units can often be overlooked, with teams feeling the job is done once they have cleaned the coils. Check when the filters were last cleaned or replaced, as these units will be working their hardest right now, and perhaps harder than they would otherwise need to.
  • Thermostat review – What are your cooling and humidity targets, and are they still valid? Blocked thermostats, or ‘tweaks’ made to temperatures over time, can lead to inefficient cooling. So, take the time today to check that there is clear airflow to all thermostats, that they are well placed, and that settings are correct and locked. This can make a huge difference to how efficiently your system works, and so to your energy use and costs.
  • What’s moved around? – Team members may have moved servers or whole cabinets around in the data centre. We’ve seen examples of empty cabinets being cooled, or a row of cabinets containing a single piece of hardware each. Whilst there are sometimes valid reasons for dividing equipment up in this way, it should only be done when necessary. Undertake a cabinet-by-cabinet review of how your servers and cooling are being distributed. Adding blanking panels to cabinets can also maximise the efficiency of air flow.
  • Strategy drift – When was the last time the cooling component of your data centre or IT strategy was reviewed? And how has the business or its sustainability goals changed since then? It is important to make sure your cooling strategy and your business goals are working in parallel towards a common goal and have not drifted apart. Whilst you can’t fix this in a day, you will be able to check when the strategy was last reviewed and consider whether to make this a more regular feature. Even if the business is still well aligned, equipment evolves year on year in terms of efficiency and output, and modern systems are sized to cope with the ‘new’ extreme summer conditions that we face today. It’s important to keep your finger on the pulse, and a data centre audit can be a great way to assess where you are.
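
For teams that keep their maintenance log in a spreadsheet or script, the date-based checks above could be automated along the following lines. This is only a minimal sketch in Python: the function names, task names and log layout are hypothetical, "two or more months" is approximated as 60 days, and "annually" as 365 days. The ambient comparison assumes you know both the design ambient temperature of your installation and the peak temperature you expect on site.

```python
from datetime import date

# Intervals approximated from the guidance above (assumptions, not standards):
# a coil clean is due once the last one was "two or more months ago" (~60 days),
# and refrigerant should be charged annually (~365 days).
MAX_AGE_DAYS = {
    "coil_clean": 60,
    "refrigerant_charge": 365,
}


def overdue_checks(log, today):
    """Return the tasks that are due, in the order listed in MAX_AGE_DAYS.

    `log` maps a task name to the date it was last carried out.
    A task with no entry in the log is treated as due.
    """
    flagged = []
    for task, max_age in MAX_AGE_DAYS.items():
        last_done = log.get(task)
        if last_done is None or (today - last_done).days >= max_age:
            flagged.append(task)
    return flagged


def ambient_out_of_spec(design_ambient_c, expected_peak_c):
    """True if the expected peak ambient temperature exceeds the design figure."""
    return expected_peak_c > design_ambient_c


if __name__ == "__main__":
    # Hypothetical log entries, for illustration only.
    example_log = {
        "coil_clean": date(2023, 5, 1),
        "refrigerant_charge": date(2023, 1, 15),
    }
    print(overdue_checks(example_log, today=date(2023, 8, 1)))  # ['coil_clean']
    print(ambient_out_of_spec(design_ambient_c=35.0, expected_peak_c=40.0))  # True
```

A script like this does not replace an engineer's inspection, but it can turn the log into a simple early warning that a clean or a charge is due.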

As reliance on our data centre infrastructure increases, so do the risks associated with any failure. Keeping the environmental conditions of that estate under control is critical, and with good planning, controls and maintenance regimes in place, those risks can be mitigated and cooling equipment kept working well across its lifetime. Supply chains for cooling equipment remain stretched, with long lead times for components, so it is important to keep cooling front of mind and avoid any surprises.
