As the use of ‘playbooks’ is becoming increasingly popular, Charlie provides his thoughts on how they can be effectively used within Business Continuity.
On social media and in business continuity circles, I am seeing more and more references to ‘playbooks’, so this week I thought I would write down my thoughts on them and leave it up to you to decide if you need one!
The best place to start is to find a good definition. Playbook is a noun from North America meaning “a book containing a sports team’s strategies and plays, especially in American football”. You mainly see them being talked about in the areas of IT and cyber response, and sometimes I hear people calling their crisis management plans playbooks.
Plans, for me, typically consist of two parts. The first is the incident management part, which is the framework within which the incident will be managed. It states who is in the incident management team, who is on call, their roles and responsibilities, locations, call-out arrangements and how they will interact with other teams. This first part could also contain communications guidance: who the team is responsible for communicating with, how and when they will communicate, and what each audience’s communications requirements are.
The second part of the plan consists of the recovery strategies that the incident management part will manage. These could cover loss of premises, loss of people, loss of IT and telephony, or loss of a supplier. There could also be plans to deal with specific events such as kidnap, dawn raids, cyber-attacks or strikes.
Most of the recovery plans in business continuity plans are linear, in that we have a set strategy and, if an incident occurs, we follow it. For example, if we lose our building, our short-term plan is to send non-essential staff home and essential staff to a work area recovery location. The medium-term plan is to rent additional offices, and the long-term strategy is to rebuild the lost office. What you are going to do is decided, and the major strategic decisions are made, in advance. The only decisions left to make on the day are tactical and operational ones, such as who you will communicate with and which particular staff will go to the recovery location.
What is different in a playbook is that there are a number of options for recovery, and the strategic decision of which one to choose will be made on the day and will depend on the particular circumstances of the incident. This type of response is relevant to a number of different incidents, especially in the IT and cyber domains.
Typical examples could be:
- Do you switch to your alternative data centre or do you wait until the main comes up?
- Do you pay a cyber ransom?
- When do you need to inform those on your databases that their information has possibly been hacked?
- If you suffer a denial-of-service attack, do you disconnect your system from the internet?
In all of these examples there is no obvious response; it will very much depend on the circumstances and will be a judgement call on the day. Often this judgement call might need to be made without the full facts being available, and the consequences of whatever decision is made may be substantial.
As with the more conventional plans, you still need the incident management and communications framework, but it is the response bit which is different.
So what might we want to record against each of the scenarios in our playbook, using ‘whether we switch to the alternative data centre’ as an example?
- What is the scenario the ‘play’ covers? Whether to switch to our alternative data centre.
- Options available – do we switch to the alternative data centre or wait to bring up the main data centre?
- In what circumstances is there a clear decision? If the main data centre is totally destroyed, we switch to the alternative; if the main is likely to be down for less than 4 hours, we don’t switch.
- What are the issues to be taken into account in making the decision? Once the switch has taken place, how do we get the data created on the alternative system synchronised back into the main system?
- How long does it take to implement the decision? The switch will take 2 hours.
- Who can make the decision? CIO, CTO.
- What needs to be done to implement the decision, and who needs to do it? IT staff carry out the switch; you may want to cross-reference any existing plans for implementing it.
- What is the downside of operating the alternative? There is reduced connectivity so the users will notice a slowdown in system performance.
- Who needs to be told that a decision has been made? All users, call centre, senior managers.
- What subsequent actions need to be taken and how will the recovery be carried out?
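For readers who keep their plans in a structured form, the headings above could be captured as a simple record. The following is only a minimal sketch in Python: the `Play` class, its field names and the `clear_cut_decision` helper are hypothetical names chosen for illustration, and the values are the example figures from the list above (4-hour threshold, 2-hour switch, CIO or CTO as decision makers).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Play:
    """One 'play': a decision made on the day, not in advance (illustrative only)."""
    scenario: str
    options: list[str]
    decision_makers: list[str]
    time_to_implement_hours: float
    implemented_by: list[str]
    issues_to_consider: list[str] = field(default_factory=list)
    downsides: list[str] = field(default_factory=list)
    inform: list[str] = field(default_factory=list)


def clear_cut_decision(main_destroyed: bool,
                       expected_outage_hours: Optional[float]) -> Optional[str]:
    """Return the pre-agreed answer where one exists, otherwise None.

    None means there is no clear decision and the named decision
    makers have to make a judgement call on the day.
    """
    if main_destroyed:
        return "switch"  # total destruction of the main data centre
    if expected_outage_hours is not None and expected_outage_hours < 4:
        return "wait"  # main likely to be back in under 4 hours
    return None


# The data centre example from the list above, recorded as a play.
data_centre_play = Play(
    scenario="Whether to switch to our alternative data centre",
    options=["switch to the alternative", "wait for the main to come up"],
    decision_makers=["CIO", "CTO"],
    time_to_implement_hours=2,
    implemented_by=["IT staff"],
    issues_to_consider=["Synchronising data created on the alternative back into the main system"],
    downsides=["Reduced connectivity, so users will notice slower performance"],
    inform=["All users", "Call centre", "Senior managers"],
)
```

Calling `clear_cut_decision(False, 12)` returns `None`, which is the point of a play: outside the pre-agreed cases, the record tells you who decides and what they should weigh up, not what the answer is.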
I don’t use playbooks very much, but I think there is lots of scope to incorporate plays within my plans. I encourage you to consider using them as well!