A Guide to Writing Contingency Plans/Playbooks

Aug 20, 2021

This week, Charlie goes into depth about different business continuity plans, how to know which plan suits which incident, and how to create a framework that works for you!

I have spoken about the requirement for writing generic response plans in a previous bulletin. A generic plan is a framework that covers the response to any type of incident. The concept behind these frameworks is that the “next incident is the one we haven’t thought of”. In an age of unprecedented incidents and black swans, we cannot write contingency plans for incidents we don’t know could exist. Once we have the generic plans in place, we then need to turn our attention to possible scenarios that could impact the organisation and decide whether each one is worth writing a specific response for.

When I talk about contingency plans, I am not thinking about the ‘old school’ methodology where you write a plan for every eventuality. By that I mean that when you create a plan for a particular building, you write a response for every possible threat, e.g. flood, fire, transport blockage, radiation leak at the local nuclear station, etc. All the plans are basically the same but have a different heading. This led to very thick plans and would often confuse the user: if the exact scenario didn’t manifest itself, it became difficult to know which plan they should use. This style of plan was once very popular in the US, and I still use it occasionally. The present way of writing business continuity plans is based on the concept that I don’t care how or why the building is unavailable; the plan will cover how to continue to deliver essential activities if a key asset is lost.

Identifying the Contingency Plan Required:

As part of the analysis stage of the business continuity life cycle, the first task is to look at the organisation’s risks and threats. For example, identify whether the building is in a flood zone, whether only one member of staff knows how to carry out a particular task, or whether there is a single point of failure, such as a particular application which is critical to operating the organisation. Additionally, are there suppliers whose failure would have a significant impact on the organisation?

Once the risks and threats are identified, we move on to the design stage of the life cycle and look at the possible solutions to mitigate the identified threat or risk. For certain risks you will need to put a mitigation plan in place, e.g. make sure that the person who is a single point of failure shares their knowledge with others, buy from two suppliers rather than one, or purchase additional disaster recovery services to make sure that the critical application is less susceptible to downtime. To mitigate the risk you might also put in a capability that will help you recover faster after an incident. For example, you could have a call centre where people are not allowed to work from home for security reasons. This call centre is critical to operations and has a very short RTO (recovery time objective); if there was a breach or the systems went down, this would have a huge impact on the business. You could purchase a work area recovery (WAR) solution or build an alternative in-house solution that would allow you to get the call centre up and running quickly. Only by implementing this solution will the short RTO be met. Other similar capabilities could be a product recall plan, a disaster recovery server rebuild plan or a contract to bring on an alternative supplier if the main supplier fails.
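The reasoning behind choosing a capability can be sketched as a simple comparison between how long each option takes to recover and the RTO. This is only a minimal illustration in Python: the function name `meets_rto`, the option names and all of the durations are assumptions made up for the example, not figures from a real plan.

```python
from datetime import timedelta


def meets_rto(recovery_time: timedelta, rto: timedelta) -> bool:
    """Return True when the estimated recovery time is within the RTO."""
    return recovery_time <= rto


# Illustrative values only; none of these figures come from a real plan.
call_centre_rto = timedelta(hours=4)
options = {
    "no capability in place": timedelta(days=3),
    "work area recovery contract": timedelta(hours=4),
}

# Keep only the options that would recover the call centre in time.
viable = [name for name, t in options.items() if meets_rto(t, call_centre_rto)]
print(viable)
```

If no option appears in `viable`, either the capability needs to be improved or the RTO needs to be challenged.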

Writing a Task-Orientated Contingency Plan:

A procedure or process with a beginning, middle and end lends itself to a task-orientated contingency plan or playbook. To rebuild a service there is a set number of steps and a sequence to follow. Often there is only one way to do this and the process can be very detailed, for example:

  1. Load tape 1
  2. Type in “//now run 456tv”
  3. This will take approximately 30 minutes to run
  4. At the end, you should see a message saying “G237pq tape run installed successfully”

In the same way, a product recall or even a WAR activation has a series of consecutive steps to carry out. For WAR these could be:

Once the decision is made to activate work area recovery, carry out the following tasks:

  1. The BCM will ring SunGard on 0141 XXXXXXX and say “This is organisation X and we would like to activate 100 seats under contract number 123456”.
  2. Ring IT and tell them you are invoking the WAR contract and ask them to go to the recovery centre as per the plan.
  3. Ring the Call Centre Manager and get them to identify which staff will go to the WAR location.
  4. Order 2 buses from McGill’s (number in plan) quoting contract 123786 and ask for them to be at the HQ car park in 3 hours.

Where there is a capability, it lends itself to a series of tasks or even a flowchart, as there is a logical flow from one task to the next.
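If you keep your playbooks electronically, a task-orientated plan maps naturally onto an ordered list that is worked through from top to bottom. The following is a minimal sketch in Python, not a recommended tool: the class names `Step` and `TaskPlaybook` and their methods are hypothetical, and the step wording is shortened from the WAR example above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    action: str          # what the responder has to do
    done: bool = False   # ticked off once the step is complete


@dataclass
class TaskPlaybook:
    name: str
    steps: list[Step] = field(default_factory=list)

    def next_step(self) -> Optional[Step]:
        """Return the first step not yet completed, or None when finished."""
        for step in self.steps:
            if not step.done:
                return step
        return None

    def complete_next(self) -> Optional[Step]:
        """Mark the next outstanding step as done and return it."""
        step = self.next_step()
        if step is not None:
            step.done = True
        return step


# Steps shortened from the WAR activation example; order matters.
war = TaskPlaybook(
    name="Work area recovery activation",
    steps=[
        Step("Ring the WAR provider and activate the seats"),
        Step("Ring IT and ask them to go to the recovery centre"),
        Step("Ring the Call Centre Manager to identify staff"),
        Step("Order buses to the HQ car park"),
    ],
)

war.complete_next()
print(war.next_step().action)
```

The point of the structure is that the responder never has to decide what comes next; the plan only ever offers the first outstanding step.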

Writing Consideration-Style Contingency Plans or Playbooks:

Where a scenario has no set sequential list of tasks, but the plan needs to give the responder guidance on how to deal with a specific type of incident, a judgement call will need to be made on how to respond, and a considerations checklist should be used. I think the cyber response is particularly suitable for a consideration-style playbook. There is no single way to respond to a cyber-attack and decisions may have to be made according to the circumstances, so it doesn’t lend itself to a list of tasks. A consideration contingency plan allows a more flexible approach, as scenarios and their impacts often don’t manifest themselves as you might have imagined. Therefore, a certain amount of the response will have to be made up on the day to suit the particular circumstances.

The elements which should go into a consideration contingency plan are as follows:

  1. The tasks that need to be done (a list of known tasks and responses which need to be carried out; these should be standalone tasks, unlike the task-orientated contingency plan, which covers a sequence of tasks)
  2. Which decisions need to be made, by whom and when
  3. Possible risks
  4. Possible impacts
  5. List of stakeholders for the particular incident
  6. Any changes in the RTO for this particular incident
  7. Supporting documentation, e.g. a set of predefined crisis communications templates for an incident
  8. Questions you might want to ask to understand the situation and where information may be sought from
  9. Guidance on communications strategies and ‘lines to take’
  10. External support required or contracts in place
  11. Worst case scenarios
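The list above can also be treated as a template, which makes it easy to see which sections of a draft playbook are still empty. The following is a minimal sketch in Python under my own assumptions: the class name `ConsiderationsPlaybook`, the field names and the `missing_sections` helper are all hypothetical, and the sample entries are placeholders rather than recommended content.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ConsiderationsPlaybook:
    scenario: str
    # One field per element of the list above, in the same order.
    standalone_tasks: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    risks: list[str] = field(default_factory=list)
    impacts: list[str] = field(default_factory=list)
    stakeholders: list[str] = field(default_factory=list)
    rto_changes: list[str] = field(default_factory=list)
    supporting_documents: list[str] = field(default_factory=list)
    questions_to_ask: list[str] = field(default_factory=list)
    communications_guidance: list[str] = field(default_factory=list)
    external_support: list[str] = field(default_factory=list)
    worst_case_scenarios: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Names of the sections that have not been filled in yet."""
        return [
            f.name
            for f in fields(self)
            if f.name != "scenario" and not getattr(self, f.name)
        ]


# Placeholder entries for illustration only.
draft = ConsiderationsPlaybook(
    scenario="Cyber-attack",
    decisions=["Placeholder: decision, who makes it and by when"],
    questions_to_ask=["Placeholder: which systems are affected?"],
)

print(draft.missing_sections())
```

Unlike the task-orientated plan, nothing here is ordered: the responder reads across the sections and uses their judgement on which apply.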

I think this type of list lends itself to a specific scenario. The person writing the plan can draw on their experience of managing incidents, while giving the person who will use the plan flexibility in how to respond as the incident manifests itself. A plan doesn’t give you all the answers, as there is no set answer to most incidents. But it does give you a framework to follow: when you are under pressure and your vision narrows, the framework lets you read down the list and consider the incident in its entirety.

Look at Your Incident Management Guidance Again:

I think you should look again at your plans and ask yourself:

  1. Do I have a good generic framework that can manage any incident?
  2. Have I identified and then implemented capabilities I need to recover?
  3. To implement the capabilities, have I written a task list that is sufficiently detailed for those responsible for implementing them?
  4. For the risks and threats identified, have I developed a list of considerations that gives those responding guidance while being flexible enough for them to use their judgement and experience?
