The path to reliable data centre networks

Roland Mestric, Head of Marketing for IP Network Automation, Nokia, tells us how achieving reliable data centre networks demands a new approach to building and managing networks.

The phone rings in the middle of the night, announcing another outage.

You join the bridge and start digging through device logs, looking for the source of the chaos. It feels like hunting for another needle in this haystack of hell. Then you hear the chime of someone new joining the bridge. A junior engineer, trembling at the gates of judgment, squeaks:

“I applied a change, thinking it wouldn’t have any impact…”

It is the beginning of a long, stressful night of troubleshooting and firefighting.

This scenario is all too familiar for many data centre network engineers.

Why do data centre networks break so often?

According to the Uptime Institute, human error is the root cause of up to two-thirds of data centre infrastructure downtime, highlighting a critical vulnerability that must be mitigated in an era where cloud-based applications and services are the backbone of the global economy.

These human errors can include misconfigurations, improper policy updates, insufficient pre-production testing by network operators and accidental input errors as a result of misunderstanding or ‘fat fingers.’

The growing complexity of modern networks adds to the challenge. In data centres, this complexity stems from factors like hybrid cloud and on-premises environments, in addition to the integration of legacy hardware.

Ensuring consistent connectivity, enforcing security policies and maintaining smooth data flow between cloud providers and on-premises systems are increasingly difficult. Legacy hardware complicates networks further, as outdated protocols and proprietary configurations often conflict with modern API-driven and software-defined networking, leading to operational inefficiencies and potential failures.

These problems need a remedy

The consequences of downtime can be severe, resulting in lost revenue, damaged reputation and compromised customer trust. In an era where businesses are increasingly reliant on data centre infrastructures, the need for resilient systems has become more crucial than ever.

Data centre teams expect their networks to be simple, reliable and easy to scale. They also expect everything will keep working even while they’re upgrading hardware or software. And they crave products that are resilient, secure and perform well.

Overall, data centre networks need to be reliable and simple to manage, but achieving this requires moving past outdated practices.

Automation to the rescue

Reducing human error to zero is the goal for network operations.

Automation is a key tool that can help reduce the dependency on manual intervention and streamline routine tasks.

However, existing solutions have not realised the promise of automation because they focus on the wrong things and fail to make data centre network operations more reliable and predictable. Automation can amplify both positive and negative effects, making it essential to implement it carefully. Turning on automation at scale can uncover previously unknown weaknesses that expose the fragility of the network.

Lack of network visibility is a key issue. Without real-time insights into device performance, traffic patterns and network health, IT teams are forced to operate blindly and react only after problems impact customers. Troubleshooting becomes slow and inefficient, as engineers manually piece together data from fragmented tools, which drives up resolution times and prolongs disruptions.

Predictable operations

We need to make network automation more trustworthy and easier to use while making network operations predictable.

If we can ensure that the proposed changes will succeed, we’re able to move faster and with greater confidence. If our tools emphasise correctness and repeatability, we’ll reap the benefits we’ve been after all along, and we can avoid those calls at night and those regretful comments: “I thought it wouldn’t have any impact.”

With features like revision control, digital twin modelling, pre- and post-change checks, streaming telemetry and network-wide transactions, operations teams can trust that the network performs exactly as intended. Every change is rigorously validated and verified before it goes live, ensuring seamless, reliable operations. This predictable automation guarantees correct outcomes and emphasises repeatability.

Just as pilots use precise telemetry and pre-flight checks to ensure safe travel, predictable operations in data centre networks can ensure that changes are implemented safely and efficiently, without disrupting the entire system.
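
To make the idea of pre- and post-change checks and network-wide transactions more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular product: the network is modelled as plain dictionaries, a deep copy stands in for a digital twin, and the names apply_transaction, ChangeResult and mtu_consistent are hypothetical.

```python
import copy
from dataclasses import dataclass
from typing import Callable, Dict, List

Config = Dict[str, str]        # one device's settings, e.g. {"mtu": "9000"}
Network = Dict[str, Config]    # device name -> its settings
Check = Callable[[Network], bool]


@dataclass
class ChangeResult:
    committed: bool
    reason: str


def apply_transaction(
    network: Network,
    change: Dict[str, Config],
    pre_checks: List[Check],
    post_checks: List[Check],
) -> ChangeResult:
    """Apply `change` to every listed device, or to none of them."""
    # Assumption: a deep copy plays the role of a digital twin.
    candidate = copy.deepcopy(network)
    for device, updates in change.items():
        if device not in candidate:
            return ChangeResult(False, f"unknown device: {device}")
        candidate[device].update(updates)

    # Pre-change checks run against the copy, so the live network is untouched.
    for check in pre_checks:
        if not check(candidate):
            return ChangeResult(False, f"pre-change check failed: {check.__name__}")

    # Keep a snapshot (a simple form of revision control) before going live.
    snapshot = copy.deepcopy(network)
    for device, updates in change.items():
        network[device].update(updates)

    # Post-change checks run against the live state; any failure rolls back.
    for check in post_checks:
        if not check(network):
            network.clear()
            network.update(snapshot)
            return ChangeResult(False, f"post-change check failed, rolled back: {check.__name__}")

    return ChangeResult(True, "committed")


def mtu_consistent(net: Network) -> bool:
    """Hypothetical check: every device must use the same MTU."""
    return len({cfg.get("mtu") for cfg in net.values()}) == 1


fabric = {"leaf1": {"mtu": "9000"}, "leaf2": {"mtu": "9000"}}

# Changing one leaf only would leave the fabric inconsistent, so it is rejected.
partial = apply_transaction(fabric, {"leaf1": {"mtu": "1500"}}, [mtu_consistent], [])

# Changing both leaves together passes the checks and is committed.
full = apply_transaction(
    fabric,
    {"leaf1": {"mtu": "1500"}, "leaf2": {"mtu": "1500"}},
    [mtu_consistent],
    [mtu_consistent],
)
```

In this sketch the partial change never reaches the live network, which is the outcome the junior engineer in the opening scenario would have wanted. A real system would run its checks against streaming telemetry rather than an in-memory dictionary.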

But automation is not enough

Product quality plays a critical role in network reliability. Defective hardware or buggy software can trigger unexpected failures, disrupt operations and force organisations into reactive mode, scrambling to resolve issues they didn’t anticipate.

To manage accelerating demand with all the freedom and control they need, data centre teams should prioritise solutions that adopt a quality-first approach and bring new levels of reliability, simplicity and adaptability.

The future of data centres

To mitigate risk, organisations need proactive automation, comprehensive observability, top-of-the-line product quality and a fail-safe validation solution.

Achieving reliable data centre networks demands a new approach to building and managing networks. By addressing human error, poor visibility and product quality, and by embracing emerging technologies such as AI-driven automation and real-time observability, organisations can achieve a new level of operational resilience, ensuring persistent business performance, especially as the complexity of data centre infrastructures grows.

Data centre networking is evolving toward self-healing systems that proactively resolve issues to prevent downtime, reduce human error with event-driven automation and deploy advanced hardware and software that set higher standards for performance and reliability.
