Disaster recovery (DR) readiness is a necessity for today’s highly digitalised companies and a key item on any board agenda. CIOs must be able to prove, with full documentary evidence, that the business is operationally resilient from an IT perspective. Providing that proof can be extremely difficult when DR tests are only partially automated or run manually, says Ryan Lawlor, Sales Specialist, CASA.
“Given the extent to which a company’s IT systems are integrated into every aspect of the business, disaster recovery tests of these systems are a major component of any company’s operational resilience programme and reporting, and thus a pressing concern for the board,” Lawlor notes. “The benefits of an effective resilience programme extend well beyond the DR success rate. They carry across to better stakeholder returns and the company’s long-term success, which makes getting your DR right even more important.”
Research by McKinsey shows that companies are increasingly pursuing resilience. Resilient companies generated 50% greater total shareholder return than their less-resilient peers during the 2020-21 economic recovery, and generally outperformed their peers in all economic conditions. Yet only 16% of respondents said their business was very well prepared to anticipate and react to external shocks and disruptions.
The King IV Report on Corporate Governance for South Africa recommends that boards ensure business resilience is properly provided for (Principle 12, Recommended Practice 13(c)). Boards therefore need a high level of confidence that the company’s IT systems can recover from a disaster within the agreed timeframes.
Lawlor says that when it comes to DR readiness, the challenge is that most companies’ IT infrastructure platforms have become extremely complex, spanning a dynamic mix of on-premises, multi-cloud and private-cloud environments. Ensuring these platforms are recoverable depends on regular, thorough testing, which is time-consuming, disruptive, resource-intensive and expensive.
In these complex, dynamic IT environments, the steps in each runbook must be carried out in exactly the sequence specified and interpreted exactly as the author intended. Otherwise failover is compromised, often causing delays. This makes manual runbook execution fertile ground for human error.
In response, Broadcom developed the Automic Enterprise Disaster Recovery Automation and Monitoring solution to fully automate the monitoring, reporting, testing and workflow processes across the entire DR life cycle. Automating these steps removes the scope for human error in manual runbook execution and means that testing can be undertaken with up to 80% fewer human resources, in a quarter of the time. This can save hundreds of person-hours over weekends and allows for significantly shorter recovery time objectives (RTOs) and recovery point objectives (RPOs).
Just as important, Lawlor says, the DR Automation and Monitoring solution provides a single view for managing the full recovery process across physical, virtual and private-cloud infrastructure. It gives real-time views of RTO and RPO times, and produces full documentary evidence of operational resilience that can be submitted to regulators. This also gives the board confidence in the company’s ability to withstand any disaster.
For example, using this solution, ING Bank can now fail over in 25% of the time it previously took, with 90% of its processes automated.
“Disaster recovery preparation and testing is a highly complex and intensive subject area. Automating and monitoring the runbooks, environment and processes makes it more reliable, less expensive and faster, with fewer resources, while providing full documentary evidence of the tests. This gives the CIO, the board and the regulator the audit reports they need. It also gives the company the opportunity to test more often, giving everyone an even higher level of assurance that the business will survive whatever is thrown at it, now and in the future, instead of thinking ‘We have done a test; I hope we can come back online as soon as possible’,” Lawlor concludes.