Designing for safety
Eric Marsden
System safety
The application of engineering and management principles,
criteria, and techniques to optimize all aspects of safety within
the constraints of operational effectiveness, time, and cost
throughout all phases of the system life cycle
A planned, disciplined and systematic approach to preventing or
reducing accidents throughout the lifecycle of a system
Primary concern is the management of risks:
risk identification, evaluation, elimination & control
through analysis, design & management
A clever person is one who finds a way out of an unpleasant situation
into which a wise person would never have got themselves.
Quote from Memoirs of a Fortunate Jew, D. A. Segre, Grafton Books, 1988.
2 / 60
History of system safety
Arose in the 1950s after dissatisfaction with the fly-fix-fly approach to
safety
early development in US Air Force
led to MIL-STD-882 Standard Practice for System Safety (v1 1960s)
Rather than assigning a safety engineer to demonstrate that a design is
safe, integrate safety considerations from the design phase
3 / 60
Founding principles
Safety should be designed in
Critical reviews of the system design identify hazards that can be controlled by modifying
the design
Modifications are most readily accepted during the early stages of design, development,
and test
Previous design deficiencies can be corrected to prevent their recurrence
Inherent safety requires both engineering and management techniques to
control the hazards of a system
A safety program must be planned and implemented such that safety analyses are
integrated with other factors that impact management decisions
4 / 60
Founding principles
Safety requirements must be consistent with other program or design
requirements
The evolution of a system design is a series of tradeoffs among competing
disciplines to optimize relative contributions
Safety competes with other disciplines; it does not override them
5 / 60
Safe design: main principles
Safe design draws on:
inherent safety
safety factors
negative feedback
multiple independent safety barriers
6 / 60
Inherently safe design
Inherent: belonging to the very nature of the person/thing (inseparable)
Recommended first step in safety engineering
Change the process to eliminate hazards, rather than accepting the
hazards and developing add-on features to control them
unlike engineered features, inherent safety cannot be compromised
Minimize inherent dangers as far as possible
potential hazards are excluded rather than just enclosed or managed
replace dangerous substances or reactions by less dangerous ones (instead of
encapsulating the process)
use fireproof materials instead of flammable ones (better than using flammable
materials but keeping temperatures low)
perform reactions at low temperatures & pressures instead of building resistant
vessels
7 / 60
Inherently safe design
What you don't have, can't leak. -- Trevor Kletz
Image source: http://xkcd.com/1626/
8 / 60
Inherently safe design
Four main methods:
1. Minimize: reducing the amount of hazardous material present
at any one time
2. Substitute: replacing one material with a less hazardous one
Example: cleaning with water and detergent rather than a
flammable solvent
3. Moderate: reducing the strength of an effect
Example: having a cold liquid instead of a gas at high pressure
Example: using material in a dilute rather than concentrated form
4. Simplify: designing out problems rather than adding
additional equipment or features to deal with them
9 / 60
Inherently safe design
Two further principles are sometimes cited:
error tolerance: equipment and processes can be designed to be capable
of withstanding possible faults or deviations from design
example: making piping and joints capable of withstanding the maximum
possible pressure if outlets are closed
limit effects: designing and locating equipment so that the worst
possible condition gives less danger
example: bungalows located away from process areas
example: gravity will take a leak to a safe place
example: bunds contain leakage
10 / 60
Related CSB safety video
US CSB safety video Inherently Safer: The Future of Risk Reduction, July 2012
Watch the video: https://youtu.be/h4ZgvD4FjJ8
11 / 60
Example of minimizing safety-critical surface: storage tank
A storage tank feeds liquid to a chemical process
Process requires liquid to be supplied at variable pressure
achieved by controlling height of liquid within the tank
A depth sensor measures height of liquid and control
system tells pump to move the liquid into tank
Hazard is spillage of the liquid, which is toxic
With the given arrangement the whole controller is
safety-critical
if something fails then the pump could be activated in
such a way that it overfills the tank, resulting in a hazard
How could we reduce the safety-critical area?
Figure: tank of toxic liquid with depth gauge, pump and control system
12 / 60
Example of minimizing safety-critical surface: storage tank
Use a non-programmable element to provide
additional safety
What is achieved:
even if the controller by mistake sends a
safety-violating command to the pump, the
shut-off valve ensures that it will be ignored
safety-critical area is reduced to float switch
and shut-off valve
Figure: tank of toxic liquid with depth gauge, pump, control system and
added shut-off valve
13 / 60
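The effect of the shut-off valve can be illustrated with a minimal sketch. In the real design the float switch and valve are non-programmable hardware; the Python below only models their logic, and the names Tank, float_switch_tripped and pump_runs, as well as the numeric levels, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tank:
    level: float            # current liquid height (arbitrary units)
    float_switch_at: float  # height at which the float switch trips


def float_switch_tripped(tank: Tank) -> bool:
    """Model of the hard-wired float switch: true once the liquid is too high."""
    return tank.level >= tank.float_switch_at


def pump_runs(controller_wants_pump: bool, tank: Tank) -> bool:
    """The shut-off valve overrides the controller once the float switch trips.

    Whatever the (possibly faulty) controller requests, no liquid is moved
    into the tank while the float switch is tripped.
    """
    return controller_wants_pump and not float_switch_tripped(tank)


# A faulty controller keeps asking for the pump although the tank is nearly full.
nearly_full = Tank(level=9.5, float_switch_at=9.0)
half_full = Tank(level=5.0, float_switch_at=9.0)
print(pump_runs(True, nearly_full))  # False: command is ignored
print(pump_runs(True, half_full))    # True: normal operation is unaffected
```

Only float_switch_tripped and the final "and not" stand in for safety-critical elements; the controller's own logic no longer needs to be trusted to prevent overfilling.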
Minimize: the safety kernel concept
A safety kernel is a simple arrangement (e.g.
combination of hardware and software) that
implements a critical set of operations
Kernel is small and simple so more effort can be
applied to verify its trustworthiness
is sometimes protected by special hardware techniques
decoupled from complexity in other parts of the system
Similar concept for security: the trusted computing
base
14 / 60
Related CSB safety video
US CSB safety video Fire From Ice, July 2008
Watch the video: https://youtu.be/3QKpVnTqngc
15 / 60
Examples of substitution
Use bleach in the process (where possible) instead of chlorine gas
Use simple hardware devices instead of a software-intensive computer
system
Electronic temperature measurement instead of thermometers based on
level of mercury
Reduce dust hazard by using less fine particles, or by treating product in a
slurry instead of a powder
Use an inert gas such as nitrogen instead of an air mixture, to reduce
explosion hazards
The substitution principle is part of the EC's REACH regulation and of
the Biocidal Products Regulation
substitution of harmful chemicals with safer alternatives
16 / 60
Examples of moderation
Reduce mass flowrates to lessen pressure on piping
Reduce quantities of hazardous materials stored on site
and amounts requiring transport by road or rail
Miniaturize process reactors
Use proven technology and processes
introducing new technology introduces new unknowns, as well as unknown
unknowns
17 / 60
Simplification: principles
A simple design minimizes
number of parts
functional modes
number and complexity of interfaces
A simple system has a small number of unknowns in the
interactions within the system and with its environment
System accidents occur when systems become intellectually
unmanageable
A system is intellectually unmanageable when the level of
interactions reaches a point where they cannot be thoroughly
planned, understood, anticipated, guarded against
I conclude that there are two ways of constructing a software design:
One way is to make it so simple that there are obviously no
deficiencies, and the other way is to make it so complicated that
there are no obvious deficiencies. -- C. A. R. Hoare
18 / 60
Counter-examples of simplification
19 / 60
Counter-examples of simplification
20 / 60
Principle: tolerate errors
Figure: time evolution of some process parameter (temperature, pressure)
around its setpoint, with successively wider bands:
normal operating limits
safe operating limits
instrumentation range
equipment containment limits
Wider operating limits, more opportunity for recovery before accident:
inherently safer
21 / 60
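The nested limits can be read as a simple classification of a measured value. The sketch below is only an illustration: the Band type, the classify function and all numeric limits (a setpoint of 100 with bands around it) are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    name: str
    low: float
    high: float


# Hypothetical limits around a setpoint of 100, listed from narrowest to widest.
BANDS = [
    Band("normal operating limits", 95.0, 105.0),
    Band("safe operating limits", 85.0, 115.0),
    Band("instrumentation range", 70.0, 130.0),
    Band("equipment containment limits", 50.0, 150.0),
]


def classify(value: float, bands=BANDS) -> str:
    """Return the name of the narrowest band that still contains the value."""
    for band in bands:  # bands are ordered narrowest first
        if band.low <= value <= band.high:
            return band.name
    return "outside equipment containment limits"


print(classify(102.0))  # normal operating limits
print(classify(110.0))  # safe operating limits
print(classify(160.0))  # outside equipment containment limits
```

The wider the gap between two successive bands, the more room a deviation has to be noticed and corrected before the next limit is reached.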
Illustration: overfill alarms in fuel tanks
Overfill level (maximum capacity)
Tank rated capacity
The tank rated capacity is a theoretical tank level, far enough below the overfill level to allow time to
respond to the final warning (e.g. the LAHH) and still prevent loss of containment/damage.
It may also include an allowance for thermal expansion of the contents after filling is complete.
LAHH (trip)
The LAHH is an independent alarm driven by a separate level sensor etc. It will warn of a failure
of some element of a primary (process) control system. It should be set at or below the
tank rated capacity to allow adequate time to terminate the transfer by alternative means
before loss of containment/damage occurs.
Ideally, and where necessary to achieve the required safety integrity, it should have a trip action to
automatically terminate the filling operation.
LAH (alarm)
The LAH is an alarm derived from the ATG (part of the process control system). This alarm is the first
stage overfilling protection, and should be set to warn when the normal fill level has been exceeded;
it should NOT be used to control filling.
Factors influencing the alarm set point are: providing a prompt warning of overfilling and maximising
the time available for corrective action while minimising spurious alarms, e.g. due to transient level
fluctuations or thermal expansion.
Normal fill level (normal capacity)
Defined as the maximum level to which the tank will be intentionally filled under routine
process control.
Notification (optional)
Provision of an operator configurable notification also driven from the ATG may assist
with transfers though it offers minimal if any increase in safety integrity.
The figure also marks Response Time 1, Response Time 2 and Response Time 3 between
successive levels, from the lowest upwards.
Source: UK HSE report Safety and environmental standards for fuel storage sites, 2009
22 / 60
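The spacing between levels can be sketched as a small calculation: each level must sit far enough below the next one for the tank not to reach it during the corresponding response time. The code below is a rough sketch under stated assumptions, not the HSE method: it assumes a constant fill rate, assumes Response Time 3 separates the LAHH from the overfill level, Response Time 2 separates LAH from LAHH and Response Time 1 separates the normal fill level from LAH, and ignores thermal expansion. The function name and all numbers are hypothetical.

```python
def alarm_setpoints(overfill_level, fill_rate, rt1, rt2, rt3):
    """Work downwards from the overfill level.

    overfill_level: level at which containment is lost (m)
    fill_rate:      assumed constant rate of level rise (m/min)
    rt1, rt2, rt3:  response times in minutes, assumed to separate
                    normal fill -> LAH, LAH -> LAHH, LAHH -> overfill
    """
    lahh = overfill_level - fill_rate * rt3  # trip must act before overfill
    lah = lahh - fill_rate * rt2             # alarm must precede the trip
    normal_fill = lah - fill_rate * rt1      # routine filling stops below LAH
    return {"LAHH": lahh, "LAH": lah, "normal_fill": normal_fill}


# Hypothetical tank: overfill at 20 m, level rising at 0.02 m/min.
levels = alarm_setpoints(20.0, 0.02, rt1=10, rt2=15, rt3=20)
for name, level in levels.items():
    print(f"{name}: {level:.2f} m")
```

Longer response times or faster filling push every set point down, which reduces usable capacity: the tradeoff the HSE text describes between warning time and spurious alarms.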
Illustration of inherent safety principles at Bhopal
Elimination: MIC (methyl isocyanate) would not have been produced if
an alternative process route had been used to produce the same end product
Minimization: such a large storage of MIC was unnecessary
a different reactor design would have cut the inventory of MIC to a few
kilograms in the reactor, with no intermediate storage of many tonnes required
Substitution: an alternative route involving phosgene as an
intermediate could have been used
Attenuation: MIC could have been stored under refrigerated conditions
Simplification: a simpler piping system would have alerted the
maintenance crew to the necessary action
23 / 60
Safe design precedence
In order of precedence (start here!):
Hazard elimination
substitution
simplification
decoupling
elimination of human errors
reduction of hazardous materials or conditions
Hazard reduction
design for observability and controllability
barriers (lockins, lockouts, interlocks)
failure minimization
safety factors and margins
redundancy
Hazard control
reducing exposure
isolation and containment
fail-safe design
Damage reduction
protective barriers
Figure annotation: scale running from inherently safe systems to
probabilistically safe systems
24 / 60
Inherent safety: difficulties
A knife cuts
25 / 60
Inherent safety: difficulties
Most medicines
are toxic
26 / 60
Inherent safety: difficulties
Gasoline is able to store
large quantities of energy in
a compact form (= very
hazardous)
Sometimes the properties for which an object is built are those that
make it very hazardous
27 / 60
Inherent safety: tradeoffs
CFCs have low toxicity and are not flammable, but have environmental impacts
are the alternatives, propane (flammable) or ammonia (flammable & toxic),
inherently safer?
Increasing the burst-pressure to working-pressure ratio of a tank
increases reliability
reduces safety (new hazards: tank explosion, new chemical reactions possible
at higher pressures)
28 / 60
Passive vs. active protection
Passive safeguards maintain safety by their presence and fail into safe
states
Active safeguards require hazard or condition to be detected and corrected
Tradeoffs:
passive methods rely on physical principles
active methods depend on less reliable detection and recovery mechanisms
passive methods tend to be more restrictive in terms of design freedom
not always feasible to implement
29 / 60
Passive protection: examples
Permanent grounding and bonding via continuous metal equipment and
pipe rather than with removable cables
Designing high pressure equipment to contain overpressure hazards such
as internal deflagration
Containing hazardous inventories with a dike that has a bottom sloped to
a remote impounding area, which is designed to minimize surface area
Pebble-bed nuclear reactors use pebbles of uranium encased in graphite
to moderate the reaction: the more heat produced, the more the pebbles
expand, causing the reaction to slow down
30 / 60
Passive protection example: filling a tank
Hazard: ignition of flammable liquid during
filling, due to static electricity
Figure: tote of ethyl acetate on a weigh scale, filled by a pump through a
fill nozzle; vapour, a spark area and ground connections are marked
Solution: nozzle/dip pipe bonded to tote and pump, with nozzle, dip pipe,
weigh scale and pump all grounded
Non-splash filling solution eliminates the
hazard
Source: CCPS Process Beacon, January 2009
31 / 60
Active protection mechanisms
Active design solutions require devices to monitor a process variable and
function to mitigate a hazard
Active solutions generally involve a considerable maintenance and
procedural component and are therefore typically less reliable than
inherently safer or passive solutions
To achieve necessary reliability, redundancy is often used to eliminate
conflict between production and safety requirements (such as having to
shut down a unit to maintain a relief valve)
Active solutions are sometimes referred to as engineering controls
32 / 60
Active protection example: safety valve
Safety valve prevents overpressure in
a vessel or pipe
Depicted: standard steam boiler safety
valve (DN25)
Image source: SV1XV, Wikimedia Commons, CC BY-SA licence
33 / 60
Active protection example: rupture disk
Rupture disk prevents overpressure in
a vessel or pipe
34 / 60
Active protection example: interlock
Interlocking device to
prevent incompatible
positions of various
switches
Similarly, household
microwave ovens have an
interlock that disables
magnetron if door is open
Image source: Wikimedia Commons, author Audriusa, CC BY-SA licence
35 / 60
Active protection example: lockout mechanisms
Lockout-tagout or lock-and-tag mechanisms ensure equipment
cannot be started while maintenance is underway
Each worker places a lock, plus a tag with their name, on the power
switch for the equipment before intervening on it
If another worker arrives to work on the same equipment, they also put
their lock and tag on the same switch
Power can only be re-established when all workers have
reclaimed their locks
Essential safety procedure for variety of electrical, mechanical,
pneumatic equipment
36 / 60
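The lockout-tagout rule described on this slide amounts to a simple invariant: the switch can be re-energized only when no lock remains on it. A minimal sketch follows; the class name LockoutSwitch, its methods and the worker names are hypothetical, and a real lockout relies on physical padlocks rather than software.

```python
class LockoutSwitch:
    """Model of a power switch that accepts one lock (and tag) per worker."""

    def __init__(self):
        self._locks = set()  # names on the tags currently attached

    def attach_lock(self, worker: str) -> None:
        """A worker locks and tags the switch before starting work."""
        self._locks.add(worker)

    def remove_lock(self, worker: str) -> None:
        """A worker reclaims their own lock once their work is finished."""
        self._locks.discard(worker)

    def can_energize(self) -> bool:
        """Power may be restored only when every lock has been reclaimed."""
        return not self._locks


switch = LockoutSwitch()
switch.attach_lock("Alice")
switch.attach_lock("Bob")
switch.remove_lock("Alice")
print(switch.can_energize())  # False: Bob is still working on the equipment
switch.remove_lock("Bob")
print(switch.can_energize())  # True
```

Each worker controls only their own lock, so nobody can restore power on behalf of a colleague who is still exposed.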
Lockout-tagout video by Napo
Watch video: https://youtu.be/G2ERlrWAmAE
The Napo safety video series, https://www.napofilm.net/en/ (EU-OSHA)
37 / 60
Lockout-tagout video by SafeQuarry
Watch video: https://youtu.be/wnFDQSC36Q4
38 / 60
Fail-safe principle
A system is fail-safe if it remains or moves into a safe state in case of
failure
Examples:
train brakes require energy to be released
control rods in a nuclear reactor are suspended by electromagnets; power
failure leads to scramming
traffic light controllers use a conflict monitor unit to detect faults or conflicting
signals and switch an intersection to a flashing error signal, rather than
displaying potentially dangerous conflicting signals
39 / 60
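The traffic light example can be sketched as a monitor that sits between the controller and the lamps and falls back to a known safe output whenever the commanded state is invalid or conflicting. This is a simplified illustration, not how any particular conflict monitor unit is built: the two-approach intersection, the state names and the rule "both approaches not red means conflict" are assumptions.

```python
VALID_STATES = {"red", "yellow", "green"}

# Safe fallback state for the whole intersection (assumed: flashing red everywhere).
FLASHING = {"north_south": "flashing_red", "east_west": "flashing_red"}


def monitored_output(commanded: dict) -> dict:
    """Pass the controller's command through only if it is valid and conflict-free."""
    ns = commanded.get("north_south")
    ew = commanded.get("east_west")

    # Missing or unknown signal states are treated as a fault.
    if ns not in VALID_STATES or ew not in VALID_STATES:
        return dict(FLASHING)

    # Both approaches allowed to move at once: conflicting signals.
    if ns != "red" and ew != "red":
        return dict(FLASHING)

    return {"north_south": ns, "east_west": ew}


print(monitored_output({"north_south": "green", "east_west": "red"}))
print(monitored_output({"north_south": "green", "east_west": "green"}))
print(monitored_output({"north_south": "green"}))
```

The monitor never tries to repair the controller's command; on any doubt it moves to the predefined safe state, which is the essence of fail-safe behaviour.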
Illustration: railroad semaphores
Figure: semaphore arms in the stop and go positions
Railroad semaphores are designed so that the
horizontal position indicates stop/danger
If the controlling mechanism fails, gravity
pulls the arm down to the stop position
40 / 60
Illustration: elevator brakes
Source: Elisha Otis's elevator patent drawing, 1861 (via Wikipedia), public domain
41 / 60
Illustration: elevator brakes
The safety elevator, patented by Elisha Otis in 1861.
At the top of the elevator car is a braking mechanism
made of spring-loaded arms and pivots. If the main cable
breaks, the springs push out two sturdy bars called
pawls so they lock into vertical racks of
upward-pointing teeth on either side. This ratchet-like
device clamps the elevator in place.
Modern elevators generally use a safety governor
which is activated when the elevator moves too quickly.
If centrifugal force exerts a greater force on hooked
flyweights than a spring holding them in place, they lock
into ratchets and stop the elevator.
42 / 60
Illustration: nuclear control rods
Control rods in a nuclear
reactor are suspended by
electromagnets. When
placed in the reactor vessel,
they absorb neutrons and
slow down the nuclear
reaction.
Power failure leads to
scramming: gravity makes
the rods drop into the
reactor vessel and
progressively shut down the
nuclear reaction.
43 / 60
Fail-silent principle
Property of a subsystem to remain in or to move to a state in which it
does not affect the other subsystems in case of a failure
Mostly applicable to computer/network systems
Hypothesis: silence is a safe state of the subsystem
When associated with watchdog mechanisms, allows fault detection
44 / 60
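A fail-silent subsystem says nothing when it fails, so a watchdog can interpret prolonged silence as a fault. The sketch below illustrates that idea only; the Watchdog class, its method names and the timeout value are hypothetical, and timestamps are passed in explicitly so the example stays deterministic.

```python
class Watchdog:
    """Declares a subsystem failed if no heartbeat arrives within the timeout."""

    def __init__(self, timeout: float, start: float = 0.0):
        self.timeout = timeout
        self.last_heartbeat = start  # time of the most recent sign of life

    def heartbeat(self, now: float) -> None:
        """Called whenever the monitored subsystem sends a message."""
        self.last_heartbeat = now

    def suspects_failure(self, now: float) -> bool:
        """Silence longer than the timeout is interpreted as a failure."""
        return now - self.last_heartbeat > self.timeout


dog = Watchdog(timeout=2.0)
dog.heartbeat(now=1.0)
print(dog.suspects_failure(now=2.5))  # False: last heartbeat was 1.5 s ago
print(dog.suspects_failure(now=4.0))  # True: 3.0 s of silence exceeds the timeout
```

This only works under the hypothesis stated on the slide: a failed subsystem must stay silent rather than keep emitting plausible but wrong messages.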
Decoupling
A tightly coupled system is one that is highly interdependent
each part is linked to many other parts
failure or unplanned behaviour in one part may rapidly affect status of others
processes are time-dependent and cannot wait: little slack in the system
sequences are invariant
only one way to reach the objective
System accidents are caused by unplanned interactions
Coupling creates increased number of interfaces and potential
interactions
45 / 60
Principle: design for controllability
Objective: make system easier to control, for humans & for computers
Use incremental control
perform critical steps incrementally rather than in one step
provide feedback, to test validity of assumptions and models upon which
decisions are made; to allow taking corrective action before significant damage
is done
provide various types of fallback or intermediate states
Use negative feedback mechanisms to achieve automatic shutdown
when the operator loses control
example: safety valve that lets out steam when pressure becomes too high in a
steam boiler
example: dead man's handle that stops train when driver falls asleep
Decrease time pressures
Provide decision aids and monitoring mechanisms
46 / 60
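Incremental control with feedback can be sketched as a loop that changes a setting in small steps, checks a reading after each step, and falls back to the last setting that looked acceptable. The function name ramp_up, its parameters and the toy process used in the example are all hypothetical.

```python
def ramp_up(target, step, apply_setting, reading_ok, start=0.0):
    """Raise a setting towards target in increments, checking feedback each time.

    apply_setting(value) applies the setting and returns a process reading.
    reading_ok(reading) says whether the reading matches expectations.
    Returns the setting that is finally in effect.
    """
    current = start
    while current < target:
        candidate = min(current + step, target)
        reading = apply_setting(candidate)
        if not reading_ok(reading):
            apply_setting(current)  # fall back to the last acceptable setting
            return current
        current = candidate
    return current


# Toy process: the reading is twice the setting, and readings above 70 are unexpected.
final = ramp_up(
    target=50,
    step=10,
    apply_setting=lambda setting: setting * 2,
    reading_ok=lambda reading: reading <= 70,
)
print(final)  # 30.0: the step to 40 produced a reading of 80, so the ramp stopped
```

Had the change been made in one step, the first feedback would have arrived only after the setting was already at 50.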
Procedural design solutions
Procedural design solutions require a person to perform an action to
avoid a hazard
example: following a standard operating procedure
example: responding to an indication of a problem such as an alarm, an
instrument reading, a noise, a leak
Since an individual is involved in performing the corrective action,
consideration needs to be given to human factors issues
example: over-alarming
example: improper allocation of tasks between machine and person
Because of the human factors involved, procedural solutions are generally
the least reliable of the four categories
Procedural solutions are sometimes referred to as administrative controls
47 / 60
Examples of procedural design solutions
Following standard operating procedures to keep process operations
within established equipment mechanical design limits
Manually closing a feed isolation valve in response to a high level alarm
to avoid tank overfilling
Executing preventive maintenance procedures to prevent equipment
failures
Manually attaching bonding and grounding systems
48 / 60
Risk treatment: barrier types
49 / 60
Design principle: defence in depth
Multiple, independent safety barriers organized in chains
independence: if one barrier fails, the next is still intact
both functional and structural independence
Use large design margins to overcome epistemic uncertainty
(conservative design)
Use quality assurance techniques during design and manufacturing
Operate within predetermined safe design limits
Continuous testing and inspections to ensure original design margins are
maintained
Complementary principles:
high degree of single element integrity
no single failure of any active component will disable any barrier
50 / 60
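The emphasis on independence has a simple quantitative reading: if barriers fail independently, the chance that all of them fail on the same demand is the product of their individual failure probabilities. The sketch below shows only that arithmetic; the function name and the probabilities are hypothetical, and the result does not hold when barriers share a common cause of failure.

```python
import math


def prob_all_barriers_fail(failure_probs):
    """Probability that every barrier fails, ASSUMING independent failures."""
    return math.prod(failure_probs)


# Three hypothetical barriers, failing on demand 10%, 5% and 1% of the time.
barriers = [0.1, 0.05, 0.01]
print(f"{prob_all_barriers_fail(barriers):.1e}")  # about 5.0e-05
```

If a single event can defeat several barriers at once, the true figure can be far closer to the weakest barrier's failure probability than to this product, which is why both functional and structural independence are required.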
Design principle: defence in depth
Level 1
Objective: Prevention of abnormal operation and failures
Essential means: Conservative design and high quality in construction and operation
Level 2
Objective: Control of abnormal operation and detection of failures
Essential means: Control, limiting and protection systems and other surveillance features
Level 3
Objective: Control of accidents within the design basis
Essential means: Engineered safety features and accident procedures
Level 4
Objective: Control of severe plant conditions, including prevention of accident
progression and mitigation of the consequences of severe accidents
Essential means: Complementary measures and accident management
Level 5
Objective: Mitigation of radiological consequences of significant releases of
radioactive materials
Essential means: Off-site emergency response
Source: INSAG-10 report Defence in depth in nuclear safety, 1996, IAEA
51 / 60
Design principle: defence in depth
Hierarchy of safety barriers:
first preventive barriers (avoid occurrence of unwanted event)
then protective barriers (limit consequences of accident)
lesson from the Titanic disaster: improvement of preventive barriers (hull
divided into watertight compartments) is not a reason for reducing protective
barriers (lifeboats)
Further principles:
controls closest to the hazard are preferred since they may provide
protection to the largest population of potential receptors, including workers
and the public
controls that are effective for multiple hazards are preferred since they can
be resource effective
52 / 60
Hierarchy of controls
Control selection strategy should follow this order of preference
at all stages of design:
minimization of hazardous materials is the first priority
safety structures/systems/components are preferred over administrative
controls
passive structures/systems/components are preferred over active
structures/systems/components
preventive controls are preferred over mitigative controls
facility safety structures/systems/components are preferred over personal
protective equipment (PPE)
(This wording from DOE-STD-1189-2008)
53 / 60
Barrier types
Physical, material
obstructions, hindrances
Functional
mechanical (interlocks)
logical, spatial, temporal
Symbolic
signs & signals
procedures
interface design
Immaterial
rules, laws, procedures
54 / 60
Barrier types on the road
Figure: road scene annotated with barrier types
Symbolic barriers (three are marked): require interpretation
Physical barrier: works even when not seen
55 / 60
Barrier criteria
Effectiveness: how effective the barrier is expected to be in achieving its
purpose
Latency: how long it takes for the barrier to become effective, once
triggered
Robustness: how resistant the barrier is w.r.t. variability of the
environment (working practices, degraded information, unexpected
events, etc.)
Resources required: costs in building and maintaining the barrier
Evaluation: how easy it is to verify that the barrier works
56 / 60
Important design principle: conservatism
Ensure a margin between the anticipated operating
and accident conditions (covering normal operation as
well as postulated incidents and accidents) and
equipment failure conditions
Prefer incremental to wholesale change
Prefer proven in use components over novel
technologies and implementations
where applications are unique or first-of-a-kind,
additional efforts (testing, increased safety margins)
should be taken
Heavy use of standards and good practices
57 / 60
Image credits
Beakers on slide 15: https://flic.kr/p/23BSz, CC BY-NC-SA licence
Tracks on slide 19: https://flic.kr/p/ac7oLB, CC BY-ND licence
Wires on slide 20: https://flic.kr/p/cFM3cd, CC BY licence
Knife on slide 25: https://flic.kr/p/4A3oRE, CC BY-NC licence
Pills on slide 26: https://flic.kr/p/8wbqMi, CC BY-NC-ND licence
Petrol cans on slide 27: https://flic.kr/p/6BWn2d, CC BY licence
Railroad semaphores on slide 40: https://flic.kr/p/nP4JbD, CC BY-NC-SA licence
Nuclear power plant on slide 43: Online textbook Principles of General Chemistry, CC BY-NC-SA licence
Valve on slide 48: https://flic.kr/p/4yixsL, CC BY-NC-ND licence
Castle on slide 50: https://flic.kr/p/9cKAvr, CC BY licence
Books at Trinity College library on slide 57 by Wendy, via https://flic.kr/p/fVs7BZ, CC
BY-NC-ND licence
For more free course materials on risk engineering,
visit https://risk-engineering.org/
58 / 60
Further reading
Book Engineering a Safer World: Systems Thinking Applied to Safety by
Nancy Leveson (MIT Press, 2012), ISBN: 978-0262016629
can be purchased in hardcover or downloaded in PDF format for free
UK HSE research report Improving inherent safety (OTH 96 521) from 1996
INSAG-10 report Defence in Depth in Nuclear Safety, from IAEA
US Department of Energy Nonreactor nuclear safety design guide
(DOE G 420.1-1A 12-4-2012) provides useful generic guidance on
designing for safety
The International System Safety Society website at system-safety.org
59 / 60
Feedback welcome!
This presentation is distributed under the terms of
the Creative Commons Attribution Share Alike
licence
Was some of the content unclear? Which parts of the lecture
were most useful to you? Your comments to
feedback@risk-engineering.org (email) or
@LearnRiskEng (Twitter) will help us to improve these course
materials. Thanks!
60 / 60