The road to recovery: Core concepts of DR planning and mistakes to avoid

If there are no effective and preventive recovery measures in place, disasters of all types can result in downtime that can cause significant damage to an organisation’s bottom line. Sergei Serdyuk, VP of Product Management, NAKIVO, offers insight into the difference between Disaster Recovery and Business Continuity as well as some of the major pitfalls to be avoided when it comes to DR planning.

Ransomware attacks, outages, floods, fires and human error are all disaster scenarios that could strike any organisation at any moment.

How disasters affect organisations and data

The cost of downtime for a business can be divided into tangible and intangible costs. Tangible costs result directly from the interruption of business operations and cause the business to lose revenue and productivity. Intangible costs refer to lost prospects resulting from damage to the business’s reputation and brand image. Additional costs can be associated with repairing damage in the aftermath of downtime, such as legal fees incurred to comply with data protection regulations.

The cost of downtime explains why high availability is now a major focus for businesses of all sizes and across industries. According to the Uptime Institute’s Annual Outages Analysis 2023: ‘When outages do occur, they are becoming more expensive, a trend that is likely to continue as dependency on digital services increases. With more than two-thirds of all outages costing more than US$100,000, the business case for investing more in resiliency — and training — is becoming stronger’.

According to the report, the most common causes of human error-related outages are data centre staff failing to follow procedures (47%) and incorrect staff processes and procedures (40%). Other contributors include in-service issues such as inadequate maintenance. Given these figures, it seems inevitable that a disaster scenario will impact any organisation at some point.

Disaster Recovery vs. Business Continuity

A strong Disaster Recovery (DR) plan enables rapid recovery from disasters without severe data loss, downtime or financial damage. In a ransomware attack, for example, recovery options are limited: the organisation either pays the ransom, reinstalls everything from scratch and loses its data, or restores from a backup. Data backup is a core component of Disaster Recovery, but it is far from the only part.

It is important to highlight a common error here: the DR plan is often confused with the Business Continuity (BC) plan. The two are distinct and serve different functions. While some Disaster Recovery concepts and metrics overlap and can be confusing, there are important distinctions to be made.

Business Continuity refers to a business’s ability to maintain core operations during and after a disruptive incident, and it involves creating a framework that helps an organisation deal with potential threats. DR, by contrast, refers to restoring access to the IT systems and data needed to carry out critical business functions, and it can be considered a key component of BC planning. In other words, BC depends on the DR processes and measures put in place to regain access to critical systems. While both DR and BC focus on getting operations up and running in minimum time, DR is a more focused process that aims to limit the damage and restore access to the data and systems that are crucial for the business to function properly.

System availability: RTO and RPO

One cannot explore DR planning without considering system availability. System availability refers to a system being operational and accessible to users at any given moment, taking planned downtime for maintenance into account. Availability is measured as a percentage of uptime and is usually expressed in nines, such as five nines (99.999%, or roughly five minutes of downtime per year).
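
To make the nines concrete, here is a small illustrative calculation (the availability targets are generic examples, not figures from the article) showing the downtime budget each level implies:

```python
# Illustrative only: downtime budget implied by an availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def downtime_budget_minutes(availability_pct: float) -> float:
    """Maximum minutes of downtime per year allowed at a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for target in (99.0, 99.9, 99.99, 99.999):  # two to five nines
    print(f"{target}% availability -> {downtime_budget_minutes(target):7.1f} min/year")
# 99.999% ('five nines') works out to roughly 5.3 minutes of downtime per year.
```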

The most critical metrics with regard to system availability are RTO and RPO. 

RTO (recovery time objective) specifies the target amount of time that systems can be unavailable before full recovery. The lower the RTO, the better. If this time is exceeded during a disruption, the organisation starts to suffer losses.

RPO (recovery point objective) refers to the amount of data the organisation can afford to lose during a disruption. RPO plays a vital role in determining backup frequency and retention policy. In terms of backups, the RPO effectively points to the last usable backup version: if the RPO is one hour, then an hour’s worth of data loss or less is tolerable; if the RPO is four hours, then up to four hours of data can be lost, and so on.
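
As a minimal sketch (the intervals below are assumptions, not recommendations), the relationship between backup frequency and RPO boils down to a simple check: the gap between backups is the worst-case data loss, so it must not exceed the RPO.

```python
from datetime import timedelta

def rpo_is_met(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst-case data loss equals the time between backups,
    so the backup interval must not exceed the RPO."""
    return backup_interval <= rpo

one_hour_rpo = timedelta(hours=1)
print(rpo_is_met(timedelta(minutes=30), one_hour_rpo))  # True: half-hourly backups fit a 1-hour RPO
print(rpo_is_met(timedelta(hours=4), one_hour_rpo))     # False: 4-hourly backups cannot meet it
```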

Backup, recovery and replication 

Perhaps the most well-known component of Disaster Recovery is backup. Simply put, a backup is a copy of data stored for the purpose of recovery after disruption. Recovery from backup is the most reliable way to restore operations after a disruptive event. 

Replication refers to creating geographically distributed copies of the data to improve data accessibility and availability. When done properly, replication can ensure some of the shortest RTOs right after a disruptive event.

But what is the difference between replication and backup? Backups can be used for archiving and long-term storage. Backups also facilitate operational recoveries and can be helpful for regulatory compliance. Replication, on the other hand, requires a remote site dedicated to replicated workloads. These replicas are intended for instant recovery of workloads for business continuity and availability, and they are usually stored in the original format to speed up Disaster Recovery and reduce downtime.
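
A back-of-the-envelope sketch helps show why replicas typically deliver shorter recovery times than backup restores; every figure here (data size, restore throughput, failover time) is an illustrative assumption rather than a vendor benchmark.

```python
def restore_from_backup_minutes(data_gb: float, throughput_gb_per_min: float,
                                verification_min: float = 15.0) -> float:
    """Backup data must be copied back into production and verified before
    workloads can start, so restore time grows with data volume."""
    return data_gb / throughput_gb_per_min + verification_min

def replica_failover_minutes(boot_and_redirect_min: float = 5.0) -> float:
    """A replica is already stored in its original format at the remote site,
    so recovery is mostly powering it on and redirecting users."""
    return boot_and_redirect_min

print(f"Restore 500 GB from backup: ~{restore_from_backup_minutes(500, 5):.0f} min")
print(f"Fail over to a replica:     ~{replica_failover_minutes():.0f} min")
```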

The most common mistakes made in DR planning 

DR planning cannot be fully explored without considering some of the common mistakes that businesses unintentionally make during this process. Here are six major pitfalls to be avoided:

1. Not protecting DR systems

The most common mistake is failing to properly protect the Disaster Recovery systems themselves in the production environment. When a disaster hits, these systems can be wiped out along with everything else, and with them all the work invested in Disaster Recovery planning.

2. Not allocating the necessary resources to the DR plan 

Another misstep is failing to allocate the necessary resources to the DR plan. For example, recovery entails transferring and uploading significant volumes of data, so if the required bandwidth and network infrastructure are lacking, the DR process cannot run effectively, as the rough estimate below illustrates.
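
All figures in this estimate (data volume, link speeds and an assumed 80% usable bandwidth) are illustrative assumptions, but they show how quickly an under-provisioned link becomes the bottleneck in a large restore.

```python
def transfer_hours(data_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours needed to move `data_tb` terabytes over a `link_mbps` link,
    assuming only `efficiency` of the nominal bandwidth is usable."""
    data_megabits = data_tb * 1024 * 1024 * 8  # TB -> MB -> megabits
    return data_megabits / (link_mbps * efficiency) / 3600

print(f"10 TB over 1 Gbps:   ~{transfer_hours(10, 1000):.0f} hours")   # roughly a day
print(f"10 TB over 100 Mbps: ~{transfer_hours(10, 100):.0f} hours")    # well over a week
```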

3. Assuming offsite data is protected

Another common mistake is the assumption that offsite data is automatically protected. For example, while cloud service providers are typically responsible for maintaining infrastructure uptime, they do not usually provide sufficient protection for their customers’ data. 

4. Unnecessary backups: The impact of non-essential data

It is important to ask whether every bit of data in your infrastructure is critical for Business Continuity; it is highly unlikely. Regularly backing up non-essential data wastes storage space and raises costs by a significant margin. Unnecessary backups also take valuable time away from high-priority tasks.

5. Using high availability as your DR plan

Using high availability as your DR plan is another common pitfall. While high availability can do wonders for Business Continuity, treating it as a replacement for a DR plan is a recipe for failure: high availability protects against individual component failures, but it does not guard against data corruption, ransomware or site-wide disasters. High availability should be reserved only for critical systems.

6. Leaving DR plans untested

Finally, leaving your plan untested deprives the organisation of valuable insight into the reliability of its recovery workflows. Regular testing allows you to identify weaknesses in the workflows and allocate resources to fill the gaps, eventually building a more robust and resilient DR plan.
