Business Continuity and
Disaster Recovery Planning
More than 20% of all small-to-medium-sized businesses suffer a major disaster every 5 years.
Almost all businesses that lose their data for 10 days or more file for bankruptcy within a year.
www.Palindrome.com
Project initiation steps
Recovery and continuity planning
requirements
Business impact analysis
Selecting, developing, and implementing
disaster and continuity plans
Backup and offsite facilities
Types of drills and tests
Any disruptive event (natural or man-made) that interrupts normal system operations so significantly that a considerable, coordinated effort is required to achieve recovery.
Geological: earthquakes, volcanoes,
lahars, tsunamis, landslides, and
sinkholes
Meteorological: hurricanes, tornadoes,
wind storms, hail, ice storms, snow
storms, rainstorms, and lightning
Other: avalanches, fires, floods,
meteors and meteorites, and solar
storms
Health: widespread illnesses,
quarantines, and pandemics
(remember the anthrax letters? What will you do
if anthrax is found in the mailroom?)
Labor: strikes, walkouts, and slowdowns that disrupt services and supplies
Social-political: war, terrorism,
sabotage, vandalism, civil unrest,
protests, demonstrations, cyber
attacks, and blockades
Materials: fires and hazardous materials spills
Utilities: power failures,
communications outages, water
supply shortages, fuel shortages, and
radioactive fallout from power plant
accidents
Damage to facilities and equipment
Utility outages
Communication outages
Transportation/delivery delays
Personnel unavailable (or unable to
travel) to work
Remember CIA?
Which of these security services
(security pillars) does business
continuity and disaster recovery
planning support?
Disasters are a fact of life
Personnel need to be trained and prepared
for their occurrence
Plan Type | Description
Business Resumption Plan | Focuses on restoring necessary business processes instead of IT procedures
Continuity of Operations Plan (COOP) | Establishes management and headquarters after a disaster; outlines roles and authorities, orders of succession, and individual role tasks
IT Contingency Plan (ITCP) | Plan for restoring systems, networks, and major applications after a disruption at the original facility
Crisis Communications Plan | Provides procedures for disseminating internal and external communications; a means to provide critical status information and control rumors
Cyber Incident Response Plan | Provides procedures for mitigating and correcting a cyber attack; addresses mitigation and isolation of affected systems, cleanup, and loss minimization
Disaster Recovery Plan (DRP) | How to recover IT mechanisms after a disaster; focuses on disasters that require IT processing to take place at another facility
BCP and DRP are two distinct, but related,
plans
◦ Business Continuity Plan (BCP) - ensures
that the business will continue to operate
before (includes a focus on prevention),
during, and after an event. A strategic
(long-term) plan.
Identifies alternate personnel,
equipment, and facilities
◦ Disaster Recovery Plan (DRP) – Tactical,
shorter-term plan that focuses on the
immediate response and recovery of
critical IT systems during a disruption.
Contains procedures for emergency
response (assessment, salvage, repair, and
eventual restoration of damaged facilities
and systems)
NIST SP 800-34: Contingency Planning Guide for Information Technology Systems. Defines a seven-step process for BCP and DRP projects.
ISO 17799: Code of Practice for Information
Security Management. Section 14 addresses
business continuity management.
BS25999: Code of Practice for Business
Continuity Management.
NFPA 1600: Standard on Disaster /
Emergency Management and Business
Continuity Programs.
NFPA 1620: The Recommended Practice for
Pre-Incident Planning.
HIPAA: Requires a documented and tested
disaster recovery plan.
Cheaper cyber insurance (reduced risk from long-term outages)
Market advantage
Process improvements
Improved organizational maturity
(ISC)2
Project initiation
Business Impact Analysis
Recovery strategy
Plan design and development
Implementation
Testing
Continual maintenance
Pre-planning Activities/Policy
• Integrate law and regulations
• Define the scope, goals, and roles
• Choose project team members
• Develop project plan and project charter
• Management approval
BIA
• Identify critical functions (criticality analysis and impact statements) and resources
• Calculate MTD (Maximum Tolerable Downtime) and other key metrics (RTO, RPO)
• Identify threats
• Calculate risks
Identify Preventive Controls
• Identify backup solutions
• Implement controls
• Mitigate risk
Develop Recovery Strategies
• Business process
• Facility
• Supply and technology
• User and user environment
• Data
Develop BCP
• Document procedures, recovery solutions, roles and tasks, and emergency response
Exercise/Test/Drill
• Test plan
• Improve plan
• Train employees
Maintain BCP
• Integrate into change control process
• Assign responsibility
• Update plan
• Distribute after updating
Identify a business continuity coordinator to
lead BCP team
Develop team:
◦ Business units, senior management, IT department, security department, communications department, and legal department
Develop a project plan
Gain management approval
Formal method for determining how a
disruption to the organization’s IT systems
will impact the mission.
Consists of 2 processes:
◦ Identification of critical assets
◦ Comprehensive risk assessment
Step | Description
Identify critical assets | IT assets that are mission-essential and must be recovered first; identify interdependencies
Conduct BCP/DRP-focused risk assessment | Identify risks to each asset; conduct vulnerability analysis; statements of impact
Determine Maximum Tolerable Downtime (MTD), the maximum time each business process can be inoperative before significant damage occurs or long-term viability is threatened | Consists of two metrics, MTD = RTO + WRT: Recovery Time Objective (RTO) is the maximum time allowed to recover business or IT systems (from disaster onset to resumption of business processes); Work Recovery Time (WRT) is the time required to configure a recovered system
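The MTD relationship can be sketched in Python as a simple planning check. This is a minimal illustration, not part of any standard: the `BusinessProcess` class, the process names, and the hour values are hypothetical, and it assumes that processes with the smallest MTD are the ones recovered first.

```python
from dataclasses import dataclass


@dataclass
class BusinessProcess:
    # Hypothetical record for one business process; all times are in hours.
    name: str
    mtd_hours: float  # Maximum Tolerable Downtime
    rto_hours: float  # Recovery Time Objective
    wrt_hours: float  # Work Recovery Time

    def meets_mtd(self) -> bool:
        # Since MTD = RTO + WRT, the planned RTO plus WRT must fit inside the MTD.
        return self.rto_hours + self.wrt_hours <= self.mtd_hours


def recovery_order(processes):
    # Smallest MTD first: the least tolerant processes are recovered first.
    return sorted(processes, key=lambda p: p.mtd_hours)


# Hypothetical figures for illustration only.
payments = BusinessProcess("Process payments", mtd_hours=8, rto_hours=6, wrt_hours=2)
invoicing = BusinessProcess("Produce invoices", mtd_hours=72, rto_hours=60, wrt_hours=16)
support = BusinessProcess("Access customer data", mtd_hours=24, rto_hours=12, wrt_hours=4)

for p in recovery_order([invoicing, payments, support]):
    print(f"{p.name}: MTD {p.mtd_hours} h, fits = {p.meets_mtd()}")
```

With these made-up numbers, invoicing fails the check because 60 + 16 = 76 hours exceeds its 72-hour MTD, so either the RTO or the WRT would have to shrink.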
Term | Definition
Recovery Point Objective (RPO) | Level of data/work loss or system inaccessibility (measured in time) resulting from a disaster that an organization can withstand; counted backwards from the onset of the disaster
Mean Time Between Failures (MTBF) | Average amount of time a system or device runs before it fails
Mean Time to Repair (MTTR) | Average length of time to recover a failed device or system
Minimum Operating Requirements | Minimum environmental and connectivity requirements needed to operate
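MTBF and MTTR are averages, so they can be computed from an outage log. The sketch below assumes a hypothetical log for one server over a six-month observation window and treats MTBF as total operating time divided by the number of failures. The `inherent_availability` helper uses the standard reliability relation MTBF / (MTBF + MTTR), which is not stated on the slide.

```python
from datetime import datetime, timedelta

# Hypothetical outage log for one server: (time of failure, time service was restored).
outages = [
    (datetime(2024, 1, 10, 8, 0), datetime(2024, 1, 10, 12, 0)),   # 4 h repair
    (datetime(2024, 3, 1, 0, 0), datetime(2024, 3, 1, 2, 0)),      # 2 h repair
    (datetime(2024, 5, 20, 9, 0), datetime(2024, 5, 20, 15, 0)),   # 6 h repair
]
window_start = datetime(2024, 1, 1)
window_end = datetime(2024, 7, 1)


def total_downtime(log):
    # Sum the repair intervals; timedelta() is the zero starting value for sum().
    return sum((restored - failed for failed, restored in log), timedelta())


def mttr_hours(log):
    # Mean Time to Repair: average repair duration per failure.
    return total_downtime(log).total_seconds() / 3600 / len(log)


def mtbf_hours(log, start, end):
    # Mean Time Between Failures: operating (up) time divided by the number of failures.
    uptime = (end - start) - total_downtime(log)
    return uptime.total_seconds() / 3600 / len(log)


def inherent_availability(mtbf, mttr):
    # Fraction of time the system is expected to be up.
    return mtbf / (mtbf + mttr)


mttr = mttr_hours(outages)
mtbf = mtbf_hours(outages, window_start, window_end)
print(f"MTTR {mttr:.1f} h, MTBF {mtbf:.1f} h, availability {inherent_availability(mtbf, mttr):.4%}")
```

With these figures MTTR is 4 hours and MTBF is 1,452 hours (4,368 hours in the window minus 12 hours of downtime, divided by three failures).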
RPO | Technologies
8–14 days | New equipment, data recovery from backup
4–7 days | Cold systems, data recovery from backup
2–3 days | Warm systems, data recovery from backup
12–24 hours | Warm systems, recovery from high speed backup media
6–12 hours | Hot systems, recovery from high speed backup media
3–6 hours | Hot systems, data replication
1–3 hours | Clustering, data replication
<1 hour | Clustering, near real time data replication
Adapted from CISSP Guide to Security Essentials
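The table can be read as a lookup from a recovery objective to a candidate technology tier. This sketch assumes each row's upper bound is the worst case for that tier and returns the least demanding tier whose worst case still fits inside the objective; objectives tighter than every row fall back to the most demanding tier. The wording of the 12–24 hour row ("high speed backup media") is an assumption based on the row below it.

```python
# Rows adapted from the table: (worst-case hours for the tier, technologies),
# ordered from the most demanding tier to the least demanding one.
TIERS = [
    (1, "Clustering, near real time data replication"),
    (3, "Clustering, data replication"),
    (6, "Hot systems, data replication"),
    (12, "Hot systems, recovery from high speed backup media"),
    (24, "Warm systems, recovery from high speed backup media"),  # row wording assumed
    (3 * 24, "Warm systems, data recovery from backup"),
    (7 * 24, "Cold systems, data recovery from backup"),
    (14 * 24, "New equipment, data recovery from backup"),
]


def technology_for(objective_hours: float) -> str:
    # Walk from the least demanding tier toward the most demanding one and return
    # the first tier whose worst case does not exceed the objective.
    for worst_case, technologies in reversed(TIERS):
        if worst_case <= objective_hours:
            return technologies
    # Objective is tighter than every row: fall back to the most demanding tier.
    return TIERS[0][1]


for hours in (0.5, 4, 36, 72, 400):
    print(f"{hours} h -> {technology_for(hours)}")
```

Reading each row by its worst case is a conservative choice: an objective of 36 hours maps to the 12–24 hour tier rather than the 2–3 day tier, because a 2–3 day recovery could overshoot it.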
For each process, describe the impact
on the rest of the organization if the
process is incapacitated
Examples
Inability to process payments
Inability to produce invoices
Inability to access customer data for
support purposes
Fortification of facility
Redundancy (clustered servers, drives, etc.)
Power lines
Fire suppression/detection
Redundant vendor support
Insurance
UPS/generators
Data backup technologies
Media protection safeguards
Inventory
5 Steps that we’ll discuss:
1. Business process recovery
2. Facility recovery
3. Supply and technology recovery
4. User environment recovery
5. Data recovery
Define critical steps of a company’s
processes
Required roles
Required resources
Input and output mechanisms
Workflow steps
Time for completion
Interfaces with other processes
3 types of disruptions:
◦ Nondisasters – disruption in service due to a
device malfunction or failure
◦ Disasters – an event that causes the loss of the entire facility for a day or longer
◦ Catastrophes – major disruption that destroys the
facility, requiring moving operations to offsite
facility
Type of offsite facility | Advantages | Disadvantages
Hot Site – fully configured with equipment and lines; data retrieved and loaded from backup site | High availability – can be ready immediately or within a matter of hours | Expensive!!!
Cold Site – supplies basic environment (electrical, AC, plumbing) but no systems; can also just be a reciprocal agreement | Least expensive | Lowest availability – longest restoration time
Warm Site – anywhere in between | Less expensive | Not immediately available (requires some setup and restoration); operational testing not available
Note: For CISSP exam purposes – a hot site here is a subscription service
– not owned by the company!!!
Redundant Sites:
Redundant site: site is equipped and configured exactly like the production site; data can be streamed live
Rolling hot site: Large truck or trailer is
turned into a work area
Multiple processing centers – Distributed
through multiple locations
Recovery team must be able to recreate the
environment
◦ Hardware? Software?
◦ Configuration manuals?
◦ Where are your recovery plans stored?
◦ How long will it take for new equipment to arrive? Many organizations require delivery within 24 hours (do you have a contract with your vendor that provides for this?)
◦ Backups – do you have apps and O/Ss to support your restored
data (remember that we covered types of backups last week)?
◦ Ensure that there are at least two copies of the company’s operating system software and critical apps, one onsite and one offsite, and test these to ensure you can restore!!!!!
Employee Notification – develop a Crisis
Communications Plan
◦ Call Tree – used to rapidly communicate information throughout an organization by assigning the responsibility for contacting employees to other employees (e.g., Margaret calls Bob and 9 other people, Bob then calls 10 people, who each call 10 people, and so on)
◦ Identify users who need to return to work and how they need to
work
◦ Can you return to paper processes? Can you automate
processes?
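The call-tree idea can be sketched as a breadth-first assignment: the first person calls up to ten people, each of them calls up to ten more, and so on. The fan-out of 10 follows the slide's example; the staff names other than Margaret and Bob, and the function names, are hypothetical.

```python
from collections import deque


def build_call_tree(employees, fan_out=10):
    """Assign each employee up to `fan_out` people to call, level by level.

    employees[0] starts the tree; returns {caller: [people they call]}.
    """
    tree = {name: [] for name in employees}
    callers = deque([employees[0]])  # people who may still have free call slots, in order
    for name in employees[1:]:
        # Skip callers whose call list is already full.
        while len(tree[callers[0]]) == fan_out:
            callers.popleft()
        tree[callers[0]].append(name)
        callers.append(name)  # once notified, this person becomes a caller too
    return tree


def rounds_needed(headcount, fan_out=10):
    """Calling rounds until `headcount` people (including the starter) are reached."""
    reached, level_size, rounds = 1, 1, 0
    while reached < headcount:
        level_size *= fan_out  # everyone reached last round calls fan_out more people
        reached += level_size
        rounds += 1
    return rounds


staff = ["Margaret", "Bob"] + [f"Employee {i}" for i in range(2, 25)]
tree = build_call_tree(staff)
print(tree["Margaret"])     # Margaret's ten calls, starting with Bob
print(rounds_needed(1000))  # 3
```

Because the number of people reached grows tenfold each round, even a 1,000-person organization is fully notified after three rounds of calls, provided every caller gets through.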
Covered last week (it all comes down to how the archive bit is handled, remember?)
◦ Full Backup – every file is backed up and the archive bit is cleared
◦ Differential Backup – only files with the archive bit set are backed up, but the archive bit is left on (so the backup is cumulative until the next full backup runs and clears the bits; restoring requires the last full backup and the last differential)
◦ Incremental Backup – only files with the archive bit set are backed up, and the archive bit is cleared (restoring requires layering the incremental tapes, in order, over the full backup)
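The archive-bit rules can be simulated in a few lines. This sketch models each file's archive bit as a boolean in a dictionary (a simplification of the real file-system attribute) over a hypothetical three-file week, and shows why a differential restore needs only the last full and last differential, while an incremental restore needs every tape.

```python
def run_backup(files, kind):
    """Back up `files` ({name: archive_bit}) and return the names copied.

    full:         copy every file, then clear the archive bit
    differential: copy files whose bit is set, leave the bit set
    incremental:  copy files whose bit is set, then clear the bit
    """
    if kind == "full":
        copied = sorted(files)
    else:
        copied = sorted(name for name, bit in files.items() if bit)
    if kind in ("full", "incremental"):
        for name in copied:
            files[name] = False
    return copied


def simulate_week(kind):
    # Hypothetical week: full backup Sunday, a.txt edited Monday, b.txt edited Tuesday.
    files = {"a.txt": True, "b.txt": True, "c.txt": True}
    tapes = [("full", run_backup(files, "full"))]
    for edited in ("a.txt", "b.txt"):
        files[edited] = True  # modifying a file sets its archive bit
        tapes.append((kind, run_backup(files, kind)))
    return tapes


def restore_set(tapes):
    # Differential: last full + last differential. Incremental: full + every incremental, in order.
    if tapes[-1][0] == "differential":
        return [tapes[0], tapes[-1]]
    return tapes


for kind in ("differential", "incremental"):
    tapes = simulate_week(kind)
    print(kind, tapes, "restore:", restore_set(tapes))
```

The Tuesday differential tape holds both a.txt and b.txt because Monday's differential left a.txt's bit set, whereas the Tuesday incremental holds only b.txt.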
Disk shadowing – online backup storage (disk mirroring is a one-to-one relationship; disk shadowing uses multiple drives to create shadow sets)
Electronic vaulting – makes copies of files as
they are modified and periodically transmits
them to offsite backup storage (common in
banks)
Remote journaling – moves only the “deltas” (the changes) that have taken place
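Remote journaling can be sketched as a primary store that appends every change to a journal and ships only the entries not yet sent offsite. The `JournaledStore` class, the account keys, and the batch interface are hypothetical; production systems typically ship database transaction logs rather than a Python list.

```python
class JournaledStore:
    """Hypothetical primary data store that journals every change."""

    def __init__(self):
        self.data = {}
        self.journal = []  # ordered list of (operation, key, value)
        self.shipped = 0   # index of the first journal entry not yet sent offsite

    def set(self, key, value):
        self.data[key] = value
        self.journal.append(("set", key, value))

    def delete(self, key):
        self.data.pop(key, None)
        self.journal.append(("delete", key, None))

    def ship_deltas(self):
        # Send only the changes made since the last shipment.
        batch = self.journal[self.shipped:]
        self.shipped = len(self.journal)
        return batch


def replay(replica, batch):
    # Apply journal entries, in order, to the offsite copy.
    for operation, key, value in batch:
        if operation == "set":
            replica[key] = value
        else:
            replica.pop(key, None)
    return replica


primary = JournaledStore()
primary.set("acct:100", 500)
primary.set("acct:200", 75)
offsite = replay({}, primary.ship_deltas())

primary.set("acct:100", 450)
primary.delete("acct:200")
batch = primary.ship_deltas()  # two change records, not a full copy of the data
replay(offsite, batch)
print(offsite == primary.data, batch)
```

The second shipment carries just two change records, yet replaying them leaves the offsite copy identical to the primary.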
Close enough to the production facility, or is there another provision for accessing the media?
Far enough away to withstand regional
disaster?
Closed on weekends or holidays?
Commensurate security controls to
production facility?
Availability of a bonded transport system (e.g., Iron Mountain)?
Does data need to be encrypted if leaving the
production facility?
Method of transferring risk
Cyberinsurance – new type of insurance that
covers DoS, malware, privacy-related
lawsuits, downstream liability, etc.
Business interruption insurance – covers lost revenue when a disruptive event interrupts operations
BCP coordinator needs to define teams:
◦ Damage assessment team – Determines the cause of the
disaster, potential for further damage, and whether or not
to activate the BCP
◦ Restoration team – responsible for getting the alternate site
into a working and functioning environment
◦ Salvage Team – responsible for starting the recovery of the
original site
◦ Media relations team
◦ Security team
◦ Telecommunications team
Reconstitution phase - when a company
moves back to its original site or new site
Test Type | Purpose
DRP Review | Most basic – the team that developed the DRP reads it from start to finish to ensure that it is complete
Checklist (consistency) | Often performed concurrently with a structured walkthrough or tabletop test; lists all necessary components required for recovery
Structured Walkthrough/Tabletop | Group walks through the process on paper
Simulation Test/Walkthrough Drill | Teams actually carry out the recovery process (the disaster is simulated); the scope of the simulation can vary
Parallel Processing | Recovery of crucial processing components at an alternate computing facility, then restoration from a previous backup, without disrupting production
Partial and Complete Business Interruption | Risky! Processing is stopped at the primary location and transitioned to the alternate location
At least annually!!
Identify test objectives and scope
Identify Lessons Learned
Revise the plan after testing (I look for
“lessons learned” as an audit item)
Note: BCPs are updated whenever there are
significant changes to the organization
Determine how frequently to train (at least annually)
Good idea to train different roles more
regularly
Train so that everyone knows the initial steps
and where to find the plans
First aid and CPR
Starting emergency power
Call tree
(Call tree diagram source: http://www.bcmpedia.org/w/images/thumb/1/19/Call_Tree.png/400px-Call_Tree.png)
Plans updated whenever there is a change to
the environment
Plans reviewed for updates at least annually if
no changes
Track and document all planned changes and
implement a formal approval process for all
substantial changes
Changes must be auditable!
NIST SP 800-34 (now Rev. 1)
ISO/IEC-27031 – draft – part of the ISO 27000 series; addresses continuity of Information and Communications Technology (ICT) as part of the Information Security Management System (ISMS)
BS-25999 (2 parts) – British business
continuity standard
BCI (Business Continuity Institute) – six-step Good Practice Guidelines
Lack of management support
No coordination with vendors
Lack of testing
Lack of prioritization
Lack of training and awareness
Cloud environments complicate Disaster
Recovery
◦ Cloud environments can be a part of an
organization’s DR process
◦ Must plan on how personnel will access the cloud
Which of the following is the number one
priority of all BCP and DRPs?
◦ A. The elimination of potential outages
◦ B. The reduction of potential outages
◦ C. Protection and welfare of employees
◦ D. The minimization of potential outages
Maximum Tolerable Downtime (MTD)
comprises which two metrics?
◦ A. Recovery Point Objective (RPO) and Work Recovery Time (WRT)
◦ B. Recovery Point Objective (RPO) and Mean Time to Repair (MTTR)
◦ C. Recovery Time Objective (RTO) and Mean Time to Repair (MTTR)
◦ D. Recovery Time Objective (RTO) and Work Recovery Time (WRT)
An example of risk transference is:
A. Offsite storage
B. Insurance
C. Maintaining spare equipment offsite
D. Fire suppression
What is one of the first steps in developing a BCP?
A. Identify backup solution
B. Decide whether the company needs to perform a
walk-through, parallel, or simulation test
C. Perform a business impact analysis
D. Develop a business resumption plan.
Which plan details the steps required to
restore normal business operations/mission
after recovery from a disruptive event?
◦ A. Business Continuity Plan (BCP)
◦ B. Business Resumption Plan (BRP)
◦ C. Continuity of Operations Plan (COOP)
◦ D. Occupant Emergency Plan (OEP)
Which draft Business Continuity guideline
ensures continuity of Information and
Communications Technology (ICT) as a part
of the organization's Information Security
Management System (ISMS)?
◦ A. BCI
◦ B. BS-7799
◦ C. ISO/IEC-27031
◦ D. NIST SP 800-34
Which of the following best describes the difference
between an Information Systems Contingency Plan and
Disaster Recovery Plan?
A. Information Systems Contingency Plan procedures
are developed for recovery of the system regardless of
site or location after a non-disaster
B. Disaster Recovery Plan procedures are developed for
recovery of the system regardless of site or location
C. Disaster Recovery Plan can be activated at the
system's current location or at an alternate site
D. Information Systems Contingency Plan is developed
for disasters that require restoration of IT systems at an
alternate site.
What is the primary objective of a disaster recovery
plan?
a. To recover critical processes in a timely manner
b. Manage public relations after a crisis
c. To minimize financial loss during normal operations
outage
d. Re-design the security infrastructure of the
organization after an emergency
A critical company asset would most likely have which of
the following MTD values?
A. Minutes to hours
B. Days
C. Weeks
D. Months