Defining RTOs – Help Needed

Jun 10, 2022

This week, Charlie needs your help and advice when it comes to RTOs! Read on to learn more about his questions and thoughts regarding the changing world of business continuity.

This week, I will be sharing some of my thoughts regarding RTOs as I am slightly struggling, and I hope some readers of the bulletin may be able to share their advice. I am an FBCI who has contributed to several versions of the Good Practice Guidelines, and I have written a book on business continuity, so I should know my stuff! However, I have a few conundrums which I would like to share with you.

I am presently on an island in the Caribbean working with a power and water company. The company supplies water and power to the island, which has a population of about 130,000. The island has no standing water, so all the water comes from a desalination plant. A few years ago, PlanB Consulting won the contract to help them improve their business continuity. Over the last few years, we have written loss of power plant plans, water desalination plans, cyber incident response plans and a crisis plan, improved their hurricane plans, and we are presently developing their business continuity provision.

What is the RTO of power production?

One of the first BIA interviews that I conducted was with the head of power production. They have a large number of engines of different sorts and fuel types, as well as a solar farm and wind turbines which produce a sixth of their power. There are some single points of failure in terms of power distribution, and the generators are all situated close to the shoreline, making them vulnerable to a storm surge. We have looked at their MBCO, which comes into the bracket of 0-24 hours. One of the criteria for setting the MBCO is an event that would lead to loss of power across the whole of the island. I am now struggling to set the RTO. Below I have listed my three choices:

  1. Setting the RTO at 0 as a total power loss to the island is unacceptable, and we don’t want to say that a power loss can be tolerated.
  2. Setting the RTO at 2 years, which is roughly the amount of time it would take to rebuild and recover to what they have at the moment if the generator sets and infrastructure were lost to a storm surge.
  3. They have set themselves a target of no more than 24 hours of power loss across the island in a year, so should the RTO be 24 hours?

I go round in circles on this one, but I am going to set the RTO at 0, knowing fine well that in certain circumstances the RTO cannot be met.

I am also looking at the MBCO and what the minimum acceptable level of recovery is at a 0 RTO. Is it at least part of the island on power, or the whole island? The problem for a business continuity consultant in these circumstances is twofold. There are so many permutations of loss of engines and combinations of circumstances that you cannot really put a recovery strategy in place. They are also the experts in running their power plant, so how am I, an amateur in power production, going to tell them anything they don’t know already?

To meet the requirement for implementing business continuity, I feel that I should give a key part of the company an RTO and the business continuity treatment. However, I feel that I am giving a meaningless RTO which I don’t really like doing.

Call centre RTO and recovery

Last week, I did the BIA interview for the call centre. This is much more fertile ground for a business continuity professional. I feel as though I have the skills to get a good RTO for the call centre, and then suggest a strategy for its recovery. This was the bread and butter of the business continuity profession pre-Covid. Oh, if only it were that simple in this case. Their call centre is important, but unlike a traditional call centre, where call takers take all the incoming calls, they have multiple channels for customers to contact them: self-service through their website, email, WhatsApp, face-to-face, and telephone. In fact, the telephone is not that widely used, as it takes up people’s mobile minutes, and so all the other methods of contact are just as important. Loss of the building is traditionally where work area recovery (WAR) came in, and organisations spent millions buying WAR seats. Their solution to the loss of the building is to work from home, using softphones on their laptops or logging on through any other computer and making use of a virtual machine. They are not bothered in the least about losing their building. Even the laptops left in the building are no longer watched, as they can use their home PC, or even a tablet, to log into the company systems. We have set an 8-hour RTO for them, but as they have so many different ways for the customer to contact them, they are unlikely to have all means of contact down at once.

The one with the long lead time – HR

When teaching business continuity, I say the essence of business continuity is the priority of recovery: recovering the things which are most critical to the organisation first, and the things which have the least impact last. The department which I always use as an example of a long RTO is HR. For this organisation, I have HR with an RTO of 2 weeks. This fits the business continuity model, with their MTPD at greater than 1 month. This is all fine in theory, but I will have to explain why they have an RTO of 2 weeks when during any disaster they can work from home immediately, and aren’t going to wait for 2 weeks before doing anything. Yes, there could be a situation where there are limited resources, and they have to go to the back of the queue as others need those resources, but you get into quite a convoluted argument trying to find a situation where a 2-week RTO recovery would actually happen. It does make the point that they are at the back of the recovery queue, but why stop their recovery and wait for 2 weeks just to be in line with their set business continuity requirements, if they can operate immediately?

In all of these cases, I believe I have applied the business continuity theory correctly, but I am coming up with real-world issues which don’t really seem to fit the model. Then you try and explain this to the recipient of business continuity, and they, like you, are not really convinced. Business continuity needs to add value to an organisation, tell them something they didn’t know and improve their resilience. With the advent of work from home and greater IT resiliency, I feel sometimes we are trying to justify a process which is not really taking us anywhere, or improving the resiliency of the organisation. I am not sure yet what the answer is, but I am going to continue to ponder this issue.

Any thoughts or help will be gratefully received.
