Wang 2018
PII: S0951-8320(18)30135-2
DOI: 10.1016/j.ress.2018.07.021
Reference: RESS 6220
Please cite this article as: Likun Wang, Zaili Yang, Bayesian network modelling and analysis of
accident severity in waterborne transportation: a case study in China, Reliability Engineering and
System Safety (2018), doi: 10.1016/j.ress.2018.07.021
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service
to our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and
all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
Highlights
Abstract
The rapid development of the shipping industry requires the use of large vessels carrying high-volume cargoes. Accidents involving these vessels can lead to heavy loss of life and damage to the environment and property. As a leading country in international trade, China has developed its waterway transport systems, including inland waterways and coastal shipping, over the past decades, and a few catastrophic shipping accidents have occurred during this period. This paper aims to develop a new risk analysis approach based on Bayesian networks (BNs) to enable the analysis of accident severity in waterborne transportation. Although the risk data are derived from accidents that occurred in China's waters, the risk factors influencing accident severity and the risk modelling methodology are generic and capable of generating useful insights for waterway risk analysis in a broad sense.
To develop the BN-based risk model, waterway accident data are first collected from all accident investigation reports by China's Maritime Safety Administration (MSA) from 1979 to 2015. Based on the derived quantitative data, we identify the factors related to the severity of waterway accidents and use them as nodes of the risk model. Second, based on a receiver operating characteristic (ROC) curve, an augmented naïve BN (ABN) model is selected through a comparative study with a naïve BN (NBN) model to analyse the key risk factors influencing waterway accident severity. The results show that the key factors influencing waterway safety include the type and location of the accident and the type and age of the ship. Moreover, a novel
1. Introduction
Waterborne transportation is vital for sustaining national economic development given its capability of providing cheaper and greener solutions compared with other transport modes. For instance, over the past several years, China's waterway (including both coastal and inland) shipping has developed rapidly. By 2016, its waterway freight volume reached 6.382 billion tonnes, a 480% increase from 2001, and its water transport ship load capacity reached 266.22 million tonnes, 211.73 million tonnes more than in 2001. As shipping demand grows, waterway traffic density increases and the navigational environment becomes more complex, leading to a high level of risk. It has been reported that in 2016 alone, 196 accidents occurred and 203 people died in China's waterways (Statistical Bulletin of Transportation Industry Development, 2017). Maritime accidents cause casualties, economic loss, environmental degradation and waterway congestion (Zhang et al., 2013). Compared with ships in ocean transportation, ships used in inland and coastal waterways are smaller and less able to cope with emergencies. Hence, the probability and severity of accidents involving these ships are likely to be higher. It has been found that larger Danish-flagged cargo ships suffer fewer accidents (Danish Maritime Authorities, 2010). Hansen, Jepsen, and Hermansen (2012) concluded that the crews of vessels smaller than 3000 GT are at greater risk of being involved in a maritime accident and have to abandon ship more frequently than the crews of large vessels. Statistics from the Maritime Accident Investigation Branch (MAIB) in 2010 disclosed that the risk of the total loss of a small ship is much higher than that of a large vessel. This paper therefore aims to analyse the characteristics of shipping accidents involving small vessels in inland and coastal waterways and, based on a data-driven Bayesian network (BN), to identify the important risk factors influencing accident severity for risk prediction and accident prevention. The accident data used in this study are obtained from the accident investigation reports by China's Maritime Safety Administration (MSA) covering more than 30 years (1979 to 2015). Accidents that occurred in China's inland and coastal waterways are selected, and those associated with deep-sea transportation are excluded. The term "accident" in this paper refers to both accidents and casualties, as defined by the International Maritime Organization (IMO).
This paper is organised as follows. Section 2 reviews the current literature on maritime accidents, with a focus on maritime risk assessment using BNs. Section 3 presents the method and results of accident data mining. Section 4 presents the methodology for developing a BN-based risk model for the analysis of waterway accidents. In contrast to previous relevant studies that rely, to a greater or lesser extent, on expert judgements to elicit subjective probabilities, the novelty of the methodology lies in the use of data-driven approaches to identify key risk factors and quantify their interdependencies. No study in the literature has analysed all accident investigation reports of a particular country/region (e.g., China) over a long time span (e.g., 30 years). In Section 5, the BN model is verified by comparing two models based on different BN learning algorithms. Section 6 describes the scenario analysis for drawing useful findings in terms of risk prediction and accident prevention. Conclusions and future work are discussed in Section 7.
2. Literature review
Guedes Soares, 2008). The integration of established and novel techniques to assess risks is a current goal of many maritime organisations. Applying probabilistic methods to model some of these high risks is current practice because it can support decision making and allow regulatory changes to be proposed (Guedes Soares and Teixeira, 2001).
Some papers treat natural weather conditions as one of several variables influencing ship accidents (Zhang et al., 2013; Mullai and Paulsson, 2011; Balmat et al., 2009), whereas Knapp et al. (2011) focus on oceanographic conditions. Their study uses econometric models to measure the effect of significant wave height and wind strength on the probability of vessel casualty, and the results show that this probability is influenced by seasonality, wind strength and wave height.
It is commonly stated that 80% of all accidents are associated with human factors (Antao and Guedes Soares, 2008). Several human factor analysis models have been introduced and widely used, such as the Human Factors Analysis and Classification System (HFACS), the Technique for Retrospective and Predictive Analysis of Cognitive Errors (TRACEr), the Cognitive Reliability and Error Analysis Method (CREAM) and Accident Mapping (AcciMap). Chen et al. (2013) established a maritime incident analysis framework using HFACS-MA. Akyuz (2015) assessed human factors in ship grounding accidents with AcciMap. Sotiralis et al. (2016) calculated the probability of collision accidents due to human error with TRACEr and a BN. Yang et al. (2013), Wu et al. (2017) and Xi et al. (2017) proposed different modified CREAM methods based on evidential reasoning to estimate the human error probability in maritime accidents.
Reviewing these studies reveals that although diverse causes of accidents are reported for different routes/locations, common risk factors influencing the occurrence probability or consequence severity of accidents exist; these are presented in Table 1. The analysis of such causes and factors informs the selection of the initial set of risk variables in this study.
Table 1
Variables from the relevant literature
| Variable | References |
|---|---|
| Ship type | Weng and Yang (2015), Heij and Knapp (2012), Cariou, Mejia, and Wolff (2008), Balmat et al. (2011), Li, Yin, and Fan (2014), Knapp et al. (2011) |
| Ship age | Knapp et al. (2011), Balmat et al. (2009), Zhang et al. (2013), Li, Yin, and Fan (2014), Wu et al. (2015) |
| Ship flag or registry | Knapp et al. (2011), Balmat et al. (2009), Li, Yin, and Fan (2014) |
| Gross tonnage | Zhang et al. (2013), Knapp et al. (2011), Balmat et al. (2009), Hansen, Jepsen, and Hermansen (2012), Li, Yin, and Fan (2014), Knapp, Bijwaard, and Heij (2011) |
| Ship speed | Balmat et al. (2011), Talley, Yip, and Jin (2012) |
| Ship defects | Hänninen and Kujala (2014), Knapp, Bijwaard, and Heij (2011) |
| Loading | Guedes Soares and Teixeira (2001), Akyuz and Celik (2014) |
| Crew | Akhtar and Utne (2014), Mullai and Paulsson (2011), Yang et al. (2013), Hänninen and Kujala (2012), Weng and Yang (2015), Prabhu Gaonkar, Xie, and Fu (2013) |
| Location | Weng and Yang (2015), Mullai and Paulsson (2011), Sun et al. (2013) |
| Rain, fog | Mullai and Paulsson (2011), Balmat et al. (2009), Weng and Yang (2015) |
| Season | Knapp et al. (2011), Zhang et al. (2013), Li, Yin, and Fan (2014) |
| Human factors | Antao and Guedes Soares (2008), Chen et al. (2013), Akyuz (2015), Sotiralis et al. (2016), Wu et al. (2017) |
Maritime accident risk models often involve quantitative analysis. The IMO proposed the formal safety assessment (FSA) method for risk management in maritime accident analysis. The FSA method is a systematic approach to ship accident analysis that considers ship condition, organisational management, human operation and hardware (Guedes Soares and Teixeira, 2001). To further quantify the causal relationships among these factors, several quantitative risk assessment techniques have been applied in maritime accident research. For example, fault-tree analysis (FTA) has been used to analyse the causes of maritime accidents (Ronza et al., 2003; Kum and Sahin, 2015). Antao and Guedes Soares (2006) used the FSA method to identify basic events that could lead to a Ro-Ro vessel accident, and they built an FTA model to analyse the relations between the relevant events. More recently, Zhang et al. (2013) used the FSA method to analyse ship accident consequences in the Yangtze River and then used a BN for quantitative analysis. In addition, Fabiano et al. (2010) proposed summary statistics for evaluating accident frequency over time or at certain risk control levels. Balmat et al. (2011) developed a maritime risk assessment based on a fuzzy-logic approach.
Maritime accident data are available from established datasets and accident investigation reports. Among the most often used historical datasets are Lloyd's Register Fairplay, Lloyd's Maritime Intelligence Unit and the IMO. These datasets contain ship names, ship registries, accident dates and times, types of casualties, consequences, locations, ship types, gross tonnages, classification societies, deadweights, and numbers of injured or dead people. Lloyd's data normally cover ships larger than 100 gross registered tons and thus omit a large percentage of fishing vessels (Guedes Soares and Teixeira, 2001).
Heij et al. (2013) used the accident statistics of Lloyd's Register Fairplay, Lloyd's Maritime Intelligence Unit and the IMO. Wu et al. (2015) used accident statistics from the Yangtze River MSA, and Weng and Yang (2015) used shipping accident statistics managed by Lloyd's List Intelligence Company.
Accident investigation reports are a useful source of more complete accident data. Investigation reports of maritime accidents are often available from maritime authorities, such as the MAIB of the UK, the MSA of China, and the Transportation Safety Board of Canada. These reports provide much more detailed information than existing databases, including what occurred, the subsequent actions taken and recommendations. In the present literature, few studies use accident investigation reports to conduct accident analysis, and even fewer use them to conduct quantitative risk analysis, largely because of the heavy workload required to aggregate the data from individual reports into a dataset of meaningful critical mass. For instance, from cases of high-speed craft accidents, Antao and Guedes Soares (2008) identified the chain of events that led to the accidents and their associated contributory factors and causes. Using the taxonomy of the TRACEr method, Graziano et al. (2016) coded and analysed grounding and collision accident investigation reports to identify human and organisational factors (HOFs). Chauvin et al. (2013), Akhtar and Utne (2014), and Chen et al. (2013) analysed accident reports of selected cases to investigate maritime accident HOFs.
The data collection and analysis method based on accident reports, though time-consuming, can bring new findings and rich information that cannot easily be obtained from existing databases, and it facilitates the use of primary data in maritime accident analysis. Its novelty is further highlighted by feeding such information into an advanced BN model to enable risk prediction and accident prevention, instead of discussing the importance of individual factors on the basis of basic statistical analysis. It therefore helps the authors generate new findings for this study region, in contrast to previous relevant studies that rely on data derived from the same or similar sources.
studies (Yang et al., 2013a). BNs have been used in waterway transport accident research because of their advantages, including their usefulness in conducting backward risk diagnosis and forward risk prediction and in accommodating new evidence to update an analysis without the need to
literature review, or a combination of the above (Zhang, 2016). Trucco et al. (2008) presented a BN of maritime accidents with human and organisational factors, which extended the fault tree. With the aid of experts in the maritime and petroleum industries, Bouejla et al. (2014) obtained a BN to predict the risk of maritime piracy against offshore oil fields. Based on the data available from the Maritime Authority (DGAM), expert knowledge was consulted for the construction and validation of a BBN model (Antao and Guedes Soares, 2008). Pristrom et al. (2016) consulted six selected experts to identify the major factors influencing the likelihood of a successful ship hijacking and, on that basis, developed a BN model to estimate the likelihood of a ship being hijacked in the Western Indian or Eastern African regions. With a combination of statistics and expert knowledge, Zhang et al. (2016) built a Bayesian belief network to express the dependencies between the indicator variables and the consequences of the Tianjin port accident.
When BN structures are developed from data using a machine learning algorithm, the generated causal relationships may be unreasonable or ambiguous. Previous studies have used expert knowledge or a taxonomy model to optimise such structures. Zhang et al. (2013) estimated navigational risks on the Yangtze River using a BN technique; the preliminary structure of the BN was obtained from data via the necessary path condition algorithm, and additional domain knowledge was used to further consolidate the structure. Ma et al. (2016) presented a BN-based target-extraction method to extract moving vessels from numerous blips captured in frame-by-frame radar images; an initial BN structure was first established based on expert judgement and then improved with the help of a K2 scoring algorithm. Akhtar and Utne (2014) developed a Bayesian causal network to analyse maritime accidents, using the qualitative HFACS model and its taxonomy to structure fatigue-related factors into levels, which reduced the number of links (correlations) required.
To avoid the subjectivity associated with expert input in BN modelling, two simple machine learning algorithms, the naïve BN (NBN) and the augmented NBN (ABN) (Friedman et al., 1997), are applied in this study because of their demonstrated efficiency and capability. Because they are built around classification, the NBN and ABN models simplify BN structures without sacrificing the accuracy of the model.
Previous studies in which a BN was applied to maritime risk tended to focus on the probabilities of shipping accidents rather than their severity, and accident data were frequently obtained directly from existing databases rather than compiled from investigation reports. The novelty of this study lies in constructing a BN from primary data directly derived from accident investigation reports, which contain rich information that fits the specific requirements of this study.
Furthermore, we extract the influencing factors and the nature of waterway transportation accidents from accident investigation reports using text mining techniques. Text mining has a wealth of applications in other disciplines, such as enterprise management and sociology (Glaser, 1992). However, the use of this text coding approach for maritime risk data elicitation, as in Mullai and Paulsson (2011), is scarce. To develop a rational BN structure, we use and compare the NBN and ABN algorithms to select the best-fit BN structure with specific evaluation indicators.
3. Data mining
We collected 229 accident investigation reports from Chinese coastal waterways and inland rivers from China's MSA and its fourteen subordinate branches. As many as 350 vessels were involved in these reported accidents from 1979 to 2015. Each report includes a description of the ship(s), crew, ship companies, accident location, navigational environment, accident process, losses, and an
China, 2002.
3.2 Coding
The grounded theory (GT) method is a systematic methodology involving the discovery of theory through the analysis of data (Glaser and Strauss, 1967). Selective reduction is the core of GT. Using the GT method, we process the 229 text cases collected from the MSA stepwise through coding, conceptual formulation, categorisation and repeated comparison to extract the influencing risk factors and their effects on waterway transport accidents. The specific process is as follows.
(1) Coding: we read the original case reports word by word and sentence by sentence to encode the data. We mark the sentences in the original text that involve the consequences of each
higher levels of abstraction. Then, we retrieve the ultimate accident-related categories, which are presented in Table 3. In this process, the attributes and their categorisation in previous studies (Table 1) are used as a reference.
(4) Attribution: according to the categories in Table 3, we obtain the related attribute values of each case. Finally, we obtain a database of 350 records with 21 columns.
(5) Relationship recognition: when contrasting explanations (i.e., causal relationships) appear, one solution is to use domain expert evaluations with reference to the mainstream explanations in the literature; if the experts' explanations contradict the mainstream ones, extra justification is needed. The other solution is to mark the contrasting explanations so that their sensitivity can be tested in the model validation process.
Fig. 1 presents the ships involved in the accidents. It shows that accidents involving bulk cargo ships occur most frequently, as also found by Zhang et al. (2016), followed by container ships, and that more than 50% of the catastrophic accidents involve bulk cargo ships. The frequencies of the other variables are given as percentages attached to each of their states in Table 3.
Table 3
Categories
| Category | States | Code | Frequency (%) |
|---|---|---|---|
| … | … | … | … |
| Length (metres) | 100 or less; more than 100 | 1, 2 | 63.3, 36.7 |
| Gross tonnage (GT) | 300 or less; 300 to 10000; greater than 10000 | 1, 2, 3 | 15.4, 62, 22.6 |
| Ship speed (knots) | 5 or less; 5 to 10; greater than 10 | 1, 2, 3 | 42.5, 36.1, 21.4 |
| Ship defects | No defect, defect was unrelated to the accident, or … sailing; relevant defect, or no recent PSC check … certificate | 1, 2 | 77.2, 22.8 |
| … | … | … | … |
| Location | Quay; port channel; anchorage; inland waterway; coastal waterway | 1, 2, 3, 4, 5 | 13.1, 8.5, 10.6, 24.9, 42.9 |
| … | … | … | … |
| Wind (Beaufort scale) | 4 or less; 5 to 7; greater than 7 | 1, 2, 3 | 30.8, 58.6, 10.6 |
| … | … | … | … |
| Accident severity | Minor; major; critical; catastrophic | 1, 2, 3, 4 | 20.6, 14.1, 20.6, 44.7 |
* The flag of convenience (FOC) refers to vessel registry in the following locations: Panama, Limassol, Kingston, Valletta, Belize, Majuro, Cyprus, Phnom Penh, Cambodia, and Willemstad.
** The maritime accidents were divided into eight types, i.e., collision, contact, stranding, grounding, fire/explosion, sinking, wind strike and other, according to the Regulation of Water Transportation Accidents Statistics issued by the MoT (2002).
In the subsequent quantitative analysis, accident severity is defined as the dependent variable, the other categories in Table 3 are treated as influencing variables, and the attributes correspond to the variable states.¹
[Figure: bar chart of accident severity (Minor, Major, Critical, Catastrophic; y-axis 0% to 60%) by ship type: Container, Dry bulk, Fishing, Tanker, Barge/tug, Ro/Ro ship, Passenger, Other.]
Fig. 1. Percentage of each ship type involved in minor, major, critical, and catastrophic accidents.
<G, P>. G is a directed acyclic graph in which the nodes correspond to the variables and the directed arcs represent causal relationships between variables: an arc from node X to node Y indicates that X has a direct causal effect on Y, and the conditional probability P(Y|X)
P(Y) is the prior probability of the hypothesis, i.e., the likelihood that Y is in a certain state prior to consideration of any other relevant information (evidence), which is X. P(X|Y) is the conditional probability (the likelihood of the evidence given the hypothesis to be tested), and P(Y|X) is the posterior probability of the hypothesis (the likelihood of Y being in a certain state, conditional on the evidence provided) (Akhtar and Utne, 2014).
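The three probabilities defined above are related by Bayes' theorem. For reference, a standard statement in this paragraph's notation, with the evidence term P(X) expanded by the law of total probability over the states y of Y, is:

```latex
P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)},
\qquad
P(X) = \sum_{y} P(X \mid Y = y)\,P(Y = y)
```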
The development of a BN model includes the following steps: BN structure learning, BN monitoring and analysis and model validation, sensitivity analysis, estimation and evaluation
¹ The entire process was conducted in Chinese given that the accident investigation reports were all in this language. The identified categories were later translated into English for this paper.
possible structures for a given problem increases super-exponentially with the number of variables in the problem domain (Yang et al., 2018).
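To make the super-exponential growth concrete, the sketch below counts labelled directed acyclic graphs with Robinson's recurrence, a standard combinatorial result that is not part of the original study. The function name `count_dags` is illustrative; evaluating it at 21 nodes (the 20 risk variables plus 'accident severity') shows why an exhaustive search over structures is infeasible.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labelled directed acyclic graphs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    total = 0
    for k in range(1, n + 1):
        # Inclusion-exclusion over a set of k source nodes (no incoming arcs);
        # each source may or may not point to each of the remaining n - k nodes.
        total += (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
    return total


if __name__ == "__main__":
    for n in (3, 5, 10, 21):
        print(n, count_dags(n))
```

Already at five nodes there are 29,281 possible structures, which is why fixing part of the structure in advance, as the NBN and ABN do, matters.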
The NBN and ABN reduce this complexity because part of the network structure is fixed in advance. An NBN is a simple structure in which a single node (the target) is the parent of all the other nodes, and no other connections are allowed. However, this requires a strong assumption, namely that the child nodes are conditionally independent of each other given the target node. To make the model more realistic, we adopted an ABN model, whose architecture enriches the naïve architecture with links between the child nodes, conditioned on the value of the target node. Since the ABN is based on the NBN, the latter is introduced first.
4.1 NBN learning
An NBN is a network structure in which the target node is directly connected to all other nodes and each child node is conditionally independent of the other child nodes given the target node. The NBN structure is generated by
specification. The NBN model is most commonly applied to classification problems (Friedman et al., 1997).
a) Let 'accident severity' be the class variable (S), with one state for each possible severity level, and let $R = \{R_{ST}, R_{HT}, \ldots, R_{AT}\}$ be the set of risk variables ($R_k$) (i.e., ship type, hull type, ship age, ship flag, length, gross tonnage, ship speed, ship defects, loading, crew, location, rain, fog, visibility, wind, season, time of day, navigational environment, human factors, and accident type, respectively), where each variable represents a property that we observe and include in our model.
b) Given the simplicity and the strong assumption of pairwise independence of the attributes, two types of structure can be obtained to describe the relationships between S and $R_k$.
Fig. 2(a) shows the first structure, in which each $R_k$ is a parent node of 'accident severity', 'accident severity' is the only child of each risk factor node, and there is no other structure. In our study, S ('accident severity') has four states and 20 influencing variables, each of which can take more than one state, as shown in Table 3. For any set of observations $\{R_{ST}, R_{HT}, \ldots, R_{AT}\}$, the complexity of computing the conditional probability distribution $P(S \mid R_{ST}, R_{HT}, \ldots, R_{AT})$ is non-linear, and more than $2 \times 10^{9}$ conditional probability distributions may need to be computed (the size of the conditional probability table increases exponentially with the number of parents).
Let 'accident severity' have no parents, and let it be the only parent of each feature variable. Fig. 2(b) shows this second structure, in which S is the only parent of each child node. The structure consists of the prior distribution $P(S)$ and 65 conditional probability distributions $P(R_k \mid S)$. This classifier is much simpler and can still express the relationships between the variables. In this paper, we adopt the structure in Fig. 2(b) as the NBN structure.
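The difference between the two structures in Fig. 2 can be illustrated by counting conditional probability table entries. The sketch below rests on a simplifying assumption: it uses only the six risk variables whose state counts survive in Table 3 (plus the four severity states), so the totals are far smaller than for the full 20-variable model. The function names are hypothetical.

```python
from math import prod

# State counts from Table 3 for the six risk variables whose rows are shown there.
SEVERITY_STATES = 4  # minor, major, critical, catastrophic
RISK_STATES = {
    "Length": 2,
    "Gross tonnage": 3,
    "Ship speed": 3,
    "Ship defects": 2,
    "Location": 5,
    "Wind": 3,
}


def converging_cpt_size(n_class, parent_states):
    """Entries in P(S | R_1, ..., R_n) when every risk variable is a parent of S (Fig. 2(a))."""
    return n_class * prod(parent_states.values())


def diverging_cpt_size(n_class, child_states):
    """Entries in P(S) plus all P(R_k | S) when S is the only parent (Fig. 2(b))."""
    return n_class + sum(n_class * r for r in child_states.values())


if __name__ == "__main__":
    print(converging_cpt_size(SEVERITY_STATES, RISK_STATES))  # 4 * 540 = 2160
    print(diverging_cpt_size(SEVERITY_STATES, RISK_STATES))   # 4 + 4 * 18 = 76
```

Adding one more risk variable multiplies the first count by that variable's number of states, but only adds a term to the second.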
Fig. 2(a). BN converging structure with 'accident severity' as a child node.
Fig. 2(b). BN diverging structure with 'accident severity' as the parent node.
c) Estimate the conditional probability distribution as follows:
$$P(S \mid R_{ST}, R_{HT}, \ldots, R_{AT}) = P(S \mid R_k) = \frac{P(S)\prod_{k=1}^{n} P(R_k \mid S)}{\prod_{k=1}^{n} P(R_k)} \qquad (2)$$
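A minimal count-based sketch of step c) follows, under stated assumptions: probabilities are estimated from records with Laplace smoothing (the `laplace` parameter is our addition), and the numerator $P(S)\prod_k P(R_k \mid S)$ of Eq. (2) is normalised over the states of S instead of being divided by $\prod_k P(R_k)$, so that the posterior sums to one. The records, state labels and function names are hypothetical.

```python
from collections import Counter, defaultdict


def fit_nbn(records, target):
    """Collect the counts needed for P(S) and P(R_k | S) from a list of dict records."""
    class_counts = Counter(r[target] for r in records)
    cond_counts = defaultdict(Counter)  # (variable, class state) -> counts of variable states
    states = defaultdict(set)           # variable -> set of observed states
    for r in records:
        for var, val in r.items():
            if var != target:
                cond_counts[(var, r[target])][val] += 1
                states[var].add(val)
    return class_counts, cond_counts, states


def posterior(model, evidence, laplace=1.0):
    """P(S | evidence) proportional to P(S) * prod_k P(R_k | S), normalised over S."""
    class_counts, cond_counts, states = model
    n = sum(class_counts.values())
    scores = {}
    for s, n_s in class_counts.items():
        score = (n_s + laplace) / (n + laplace * len(class_counts))  # smoothed P(S = s)
        for var, val in evidence.items():
            count = cond_counts[(var, s)][val]
            score *= (count + laplace) / (n_s + laplace * len(states[var]))  # P(R_k | S = s)
        scores[s] = score
    z = sum(scores.values())
    return {s: v / z for s, v in scores.items()}


# Hypothetical records with two risk variables and accident severity as the target.
RECORDS = [
    {"Location": "coastal", "Wind": "strong", "Severity": "catastrophic"},
    {"Location": "coastal", "Wind": "strong", "Severity": "catastrophic"},
    {"Location": "quay", "Wind": "calm", "Severity": "minor"},
    {"Location": "quay", "Wind": "calm", "Severity": "minor"},
    {"Location": "coastal", "Wind": "calm", "Severity": "minor"},
]

if __name__ == "__main__":
    model = fit_nbn(RECORDS, "Severity")
    print(posterior(model, {"Location": "coastal", "Wind": "strong"}))
```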
4.2 ABN learning
The ABN consists of an NBN enriched by relationships between the child nodes, conditioned on the value of the target node (the common parent). The ABN modelling technique is implemented as follows.
a) Generate a Naïve Bayes structure with the target node (e.g., 'accident severity') directly
Given different values of α, allow relationships between the child nodes ($R_k$) to exist while fixing the links from the target node (S) to the children ($R_k$). With the aim of minimising the minimum description length (MDL) score (Lam and Bacchus, 1993), a greedy search among the children is used to obtain the augmented part of the ABN for each value of α:
$$MDL(B, D) = \alpha\, DL(B) + DL(D \mid B) \qquad (3)$$
where $B$ represents the ABN, and $D$ represents the dataset given to ABN $B$.
c) Evaluate the structure/data ratios of different ABN structures, where the structure/data ratio is $DL(B)/DL(D \mid B)$, $DL(B)$ is the description length of the ABN, and $DL(D \mid B)$ is the description length of the data given the ABN.
The structure/data ratio allows us to consider both the structural complexity and the predictive performance of the network. The lower the value of α, the higher the value of $DL(B)/DL(D \mid B)$. For example, the ABN structure with α = 0.1 in Fig. 3(a) is more complex than that with α = 1 in Fig. 3(b), and the target predictive precision of ABN (α = 0.1) is higher than that of ABN (α = 1). After comparing the structure/data ratios under different values of α, an ABN structure ($B$) is selected that offers a satisfactory trade-off between predictive performance and network complexity.
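The sketch below computes Eq. (3) and the structure/data ratio for a discrete network given as a mapping from each node to its parents. The encoding is an assumption: it uses one common textbook MDL form, with $DL(B) = \tfrac{1}{2}\log_2 N$ times the number of free CPT parameters and $DL(D \mid B)$ equal to the negative log-likelihood in bits under maximum-likelihood estimates. The software used in the study may define the description lengths differently, and the greedy arc search itself is not shown. The toy data and the names `description_lengths` and `mdl_score` are hypothetical.

```python
import math
from collections import Counter


def description_lengths(data, parents):
    """Return (DL(B), DL(D|B)) in bits for a discrete BN {node: tuple_of_parents}.

    Assumed encoding: DL(B) = 0.5 * log2(N) * (number of free CPT parameters);
    DL(D|B) = -sum over records and nodes of log2 P_hat(x | parent configuration),
    with maximum-likelihood estimates P_hat taken from the same data.
    """
    n = len(data)
    states = {v: {row[v] for row in data} for v in parents}
    n_params = 0
    dl_data = 0.0
    for node, pa in parents.items():
        q = math.prod(len(states[p]) for p in pa)  # number of parent configurations
        n_params += (len(states[node]) - 1) * q
        joint = Counter((tuple(row[p] for p in pa), row[node]) for row in data)
        cfg = Counter(tuple(row[p] for p in pa) for row in data)
        for (c, _), count in joint.items():
            dl_data -= count * math.log2(count / cfg[c])
    return 0.5 * math.log2(n) * n_params, dl_data


def mdl_score(data, parents, alpha):
    """Eq. (3): MDL(B, D) = alpha * DL(B) + DL(D|B); lower is better."""
    dl_model, dl_data = description_lengths(data, parents)
    return alpha * dl_model + dl_data


# Hypothetical toy data: S is the target, A and B are two child nodes.
DATA = [
    {"S": 1, "A": 0, "B": 0}, {"S": 1, "A": 0, "B": 0},
    {"S": 1, "A": 1, "B": 1}, {"S": 2, "A": 1, "B": 1},
    {"S": 2, "A": 1, "B": 1}, {"S": 2, "A": 0, "B": 1},
    {"S": 1, "A": 1, "B": 0}, {"S": 2, "A": 0, "B": 0},
]
NBN = {"S": (), "A": ("S",), "B": ("S",)}
ABN = {"S": (), "A": ("S",), "B": ("S", "A")}  # augmenting arc A -> B

if __name__ == "__main__":
    for name, structure in (("NBN", NBN), ("ABN", ABN)):
        dl_b, dl_d = description_lengths(DATA, structure)
        scores = [round(mdl_score(DATA, structure, a), 2) for a in (0.1, 0.3, 1.0)]
        print(name, "ratio:", round(dl_b / dl_d, 3), "MDL:", scores)
```

With these toy numbers the augmented structure has the lower score at α = 0.1 and the higher score at α = 1, mirroring the role of α described above.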
Fig. 3(a). ABN structure with α = 0.1
Fig. 3(b). ABN structure with α = 1
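d) Estimate the conditional probability distribution as follows:
$$P(S \mid R_{ST}, R_{HT}, \ldots, R_{AT}) \propto P(S)\prod_{k=1}^{n} P\big(R_k \mid \pi(R_k)\big) \qquad (4)$$
where $\pi(R_k)$ denotes the parent set of $R_k$ in the ABN (the target node S and, where present, the linked child nodes).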
The types of construct validity tests for BN models include nomological, face, content, concurrent and convergent validity, qualitative features and the sensitivity test (Pitchforth and Mengersen, 2013; Mazaheri, 2016; Sotiralis et al., 2016). In this paper, the NBN structure is fixed by design, so its structure and parameters need not be checked. In contrast, the content of the model is verified during ABN structure learning: in Section 5.2, three domain experts were interviewed to verify the parameters and their relationships in the ABN model. The sensitivity test is described in detail in Sections 4.4 and 6.2.
In addition, this paper uses the receiver operating characteristic (ROC) curve to verify the model from a statistical point of view. The ROC curve plots the true positive rate (Y-axis) against the false positive rate (X-axis), and the ROC index is the area under the ROC curve divided by the total area of the plot. We use the ROC index to evaluate the fitness of the NBN and ABN models.
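The ROC index can be computed as sketched below. We assume a one-versus-rest treatment of the multi-state target: for one severity state, `labels` marks whether each record is in that state and `scores` holds the model's predicted probability of that state. The function names are illustrative. Tied scores are processed together, which produces a diagonal segment, and the trapezoidal area over the 1 × 1 plot already equals the normalised ROC index. Both classes must be present in `labels`.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points obtained by lowering a threshold through the predicted scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])  # highest score first
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Records with the same score cross the threshold together.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def roc_index(scores, labels):
    """Area under the ROC curve by the trapezoidal rule (the plot area is 1 x 1)."""
    pts = roc_points(scores, labels)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


if __name__ == "__main__":
    # Hypothetical predicted probabilities of 'catastrophic' and true indicators.
    print(roc_index([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

An index of 1 means every record in the target state is ranked above every other record; 0.5 corresponds to random ranking.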
categorical variables. It measures how much (on average) the observation of a random variable y tells us about the uncertainty of x, i.e., by how much the entropy of x is reduced if we have information on y. Mutual information is never negative: the larger I(Y, X), the stronger the association between y and x; if I(Y, X) ≈ 0, the association is weak, and y and x occur together only by chance. (Negative values can arise only for the pointwise mutual information of individual pairs of states, where they indicate that the two states occur together less often than expected by chance.) The mutual information between 'accident severity' and the other risk variables can be defined as
$$I(S, R_k) = H(S) - H(S \mid R_k) \qquad (5)$$
where $H(S) = -\sum_{s} P(s)\log P(s)$, $s$ represents each state of 'accident severity', and $H(S \mid R_k) = -\sum_{r}\sum_{s} P(s, r)\log P(s \mid r)$, with $r$ ranging over the states of $R_k$.
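A minimal estimator of Eq. (5) from paired categorical observations is sketched below. It uses the equivalent form $I = \sum_{s,r} P(s,r)\log_2 \frac{P(s,r)}{P(s)P(r)}$ with empirical frequencies. Base-2 logarithms (giving bits) are our assumption, and the function name is illustrative.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits from paired categorical samples."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # P(x, y) * log2[P(x, y) / (P(x) P(y))], written with raw counts.
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


if __name__ == "__main__":
    severity = [4, 4, 1, 1, 2, 3, 4, 1]  # hypothetical severity codes
    location = [5, 5, 1, 1, 4, 5, 5, 2]  # hypothetical location codes
    print(round(mutual_information(severity, location), 3))
```

Ranking the risk variables by this value against 'accident severity' is one way to shortlist the variables examined in the sensitivity analysis.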
analysis to determine how each risk variable affects 'accident severity' is performed as follows.
The value of the target node (e.g., 'accident severity') is computed when the state of one child node (e.g., a risk variable) is assigned different values and the states of the other child nodes are locked. In other words, for a specific k for which $R_k$ has a strong relationship with S, we set $R_k$ to each state i in turn and compute the joint probability $P(S_j, R_k = i)$ and the mean value $E(R_k = i)$:
$$P(S_j, R_k = i) = P(S_j) \times P(R_k = i \mid S_j) \qquad (6)$$
$$E(R_k = i) = \sum_j S_j \times P(S_j, R_k = i) \qquad (7)$$
where $S_j$ denotes the j-th state of 'accident severity' (coded 1 to 4 as in Table 3).
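Eqs. (6) and (7) can be sketched as follows for a single risk variable. The prior uses the accident severity frequencies from Table 3, whereas the conditional probabilities P(R_k = i | S_j) for a two-state variable are hypothetical placeholders. Because the joint probabilities in Eq. (7) sum to P(R_k = i) rather than to one, the sketch also reports, as an optional addition not in the original equations, the value divided by P(R_k = i), which is the conditional mean severity. The function name is illustrative.

```python
# Severity codes j = 1..4 (minor, major, critical, catastrophic); frequencies from Table 3.
PRIOR = {1: 0.206, 2: 0.141, 3: 0.206, 4: 0.447}

# Hypothetical P(R_k = i | S = j) for a two-state risk variable (i = 1, 2).
LIKELIHOOD = {
    1: {1: 0.90, 2: 0.10},
    2: {1: 0.80, 2: 0.20},
    3: {1: 0.75, 2: 0.25},
    4: {1: 0.70, 2: 0.30},
}


def severity_sensitivity(prior, likelihood):
    """For each state i of R_k, return (joint, mean, conditional_mean).

    joint[j]         = P(S_j) * P(R_k = i | S_j), with S_j coded as j    (Eq. 6)
    mean             = sum_j j * joint[j]                                 (Eq. 7)
    conditional_mean = mean / P(R_k = i), i.e. E[S | R_k = i]             (optional)
    """
    states_i = next(iter(likelihood.values())).keys()
    result = {}
    for i in states_i:
        joint = {j: prior[j] * likelihood[j][i] for j in prior}
        mean = sum(j * p for j, p in joint.items())
        result[i] = (joint, mean, mean / sum(joint.values()))
    return result


if __name__ == "__main__":
    for i, (joint, mean, cond) in severity_sensitivity(PRIOR, LIKELIHOOD).items():
        print(i, round(mean, 4), round(cond, 3))
```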
5. BN structure learning
Assuming that all the child nodes are conditionally independent given 'accident severity', we can construct an NBN as shown in Fig. 4.
Fig. 4. NBN structure
[Figure: line plot of the structure/data ratio (y-axis, 0 to 0.8) against the structural coefficient α (x-axis, 0.1 to 1.0).]
Fig. 5. Structure/data ratio for different values of the structural coefficient (α).
The complexity of the structure increases as the structural coefficient decreases. This complexity becomes problematic when it increases more rapidly than the predictive precision. Visual inspection of Fig. 5 suggests a trade-off between predictive performance and network complexity at the sharp bend of the curve, where α is equal to 0.3.
(2) Content validity
In addition, when learning the ABN structure, common sense, the accident reports and expert knowledge should be used to ensure rational causal relationships between the BN nodes. Hänninen and Kujala (2014) used prior knowledge to forbid some arcs while using a hill-climbing algorithm to learn a BN structure for ship accidents. Following their work, when learning the ABN structure in this study, we use the following prior knowledge to rule out causal relationships that are not in harmony with the current understanding in the literature.
‘Ship type’ should have no relationships with ‘Ship age’ and ‘Ship flag’; ‘Ship age’ should have no relationships with ‘Ship flag’, ‘Length’ and ‘Gross tonnage’; ‘Ship flag’ should have no relationships with ‘Length’ and ‘Gross tonnage’;
Ship static characteristics (e.g., ‘Ship type’, ‘Hull type’, ‘Ship age’, ‘Ship flag’, ‘Length’, ‘Gross tonnage’) should have no relationships with accident environment states (e.g., ‘Rain’, ‘Fog’, ‘Visibility’, ‘Wind’), the ship loading state (e.g., ‘Loading’), crew seaworthiness (e.g., ‘Crew’), or accident time and location (e.g., ‘Location’, ‘Season’, ‘Time of day’);
‘Ship defect’ or ‘Crew’ should have no relationships with ‘Rain’, ‘Fog’, ‘Visibility’, ‘Wind’, ‘Season’, ‘Time of day’, ‘Ship speed’ and ‘Navigational environment’.
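The prior knowledge above can be encoded as a blacklist of directed arcs that a score-based search (for example, hill climbing) skips when proposing moves. This sketch assumes that ‘no relationship’ forbids arcs in both directions and uses the node names as they appear in the text; the helper names (forbid_between, is_allowed) are hypothetical and do not belong to any BN library.

```python
from itertools import product
from typing import Iterable, Set, Tuple

Arc = Tuple[str, str]


def forbid_between(group_a: Iterable[str], group_b: Iterable[str]) -> Set[Arc]:
    """Assumption: 'no relationship' forbids arcs in both directions."""
    arcs: Set[Arc] = set()
    for a, b in product(group_a, group_b):
        if a != b:
            arcs.add((a, b))
            arcs.add((b, a))
    return arcs


STATIC = ["Ship type", "Hull type", "Ship age", "Ship flag", "Length", "Gross tonnage"]
CONTEXT = ["Rain", "Fog", "Visibility", "Wind", "Loading", "Crew",
           "Location", "Season", "Time of day"]

blacklist: Set[Arc] = set()
blacklist |= forbid_between(["Ship type"], ["Ship age", "Ship flag"])
blacklist |= forbid_between(["Ship age"], ["Ship flag", "Length", "Gross tonnage"])
blacklist |= forbid_between(["Ship flag"], ["Length", "Gross tonnage"])
blacklist |= forbid_between(STATIC, CONTEXT)
blacklist |= forbid_between(["Ship defect", "Crew"],
                            ["Rain", "Fog", "Visibility", "Wind", "Season",
                             "Time of day", "Ship speed", "Navigational environment"])


def is_allowed(arc: Arc) -> bool:
    """A structure-learning search would skip any blacklisted arc."""
    return arc not in blacklist


print(len(blacklist), is_allowed(("Ship type", "Rain")), is_allowed(("Wind", "Ship speed")))
```

Additional forbidden arcs found during inspection can simply be added to the same set before relearning.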
Using the above prior knowledge to prohibit the relevant arcs, we obtain a BN structure from data with the ABN algorithm (α = 0.3), as shown in Fig. 6(a). The arc orientation is the opposite of the causal relation, which keeps computational complexity low (4.1). Read in the causal direction, the arcs ‘Ship speed → Time of day’, ‘Loading → Navigational environment’ and ‘Length → Human factors’ are not consistent with the actual situation: accident time does not influence ship speed, the navigational environment has little influence on the ship's loading state, and human factors should have little influence on ship length.
Fig. 6(a). ABN model structure with initial forbidden arcs.
Fig. 6(b). ABN model structure with additional forbidden arcs ‘Ship speed → Time of day’, ‘Loading → Navigational environment’, ‘Length → Human factors’.
Therefore, we adjust the ABN structure with the additional forbidden arcs ‘Ship speed → Time of day’, ‘Loading → Navigational environment’ and ‘Length → Human factors’. The revised structure is shown in Fig. 6(b). Because the arc ‘Accident type → Time of day’ in Fig. 6(b) is also inconsistent with the actual situation, we forbid this arc and adjust the ABN structure again; the updated structure is shown in Fig. 6(c).
Fig. 6(c). Final ABN structure with the additional forbidden arc ‘Accident type → Time of day’.
5.3 ROC curve
We use the ROC value, i.e., the area under the receiver operating characteristic (ROC) curve, to evaluate the goodness of fit of the different BN models. The results are shown in Table 4.
Table 4
Degrees of fit of the models
The ABN model has the higher ROC value, indicating a better fit. Thus, we select the ABN model (Fig. 6(c)) for the subsequent data analysis.
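As an illustration of the fit measure, the sketch below computes the area under the ROC curve for a binary, one-vs-rest target by counting how often a positive case is scored above a negative one (the Mann-Whitney formulation). It assumes that the ROC value in Table 4 is such an area; the labels and model scores are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve for a binary target (1 = positive class):
    the probability that a random positive outranks a random negative,
    with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical one-vs-rest example: is the accident 'catastrophic'? (illustrative only)
y_true = [1, 0, 1, 0, 0, 1]
p_model_a = [0.9, 0.3, 0.7, 0.4, 0.2, 0.6]
p_model_b = [0.6, 0.5, 0.4, 0.7, 0.3, 0.8]
print(roc_auc(y_true, p_model_a), roc_auc(y_true, p_model_b))
```

The pairwise count is quadratic in the number of cases, which is fine for a few thousand accident records; a sort-based version scales better.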
6. Model results
Fig. 7 shows the initial prior probability distributions of the factors involved in the adjusted
ABN.
the type of accident with the highest probability: 60.27%. Dry bulk cargo vessels accounted for the largest percentage (51.42%) of ships involved in accidents. Ships younger than 5 years were involved in the largest percentage (44.63%) of accidents. The majority of vessels involved in accidents (63.70%) were less than 100 m long. Ships with gross tonnages in the range of 300-10000 accounted for 61.46% of the ships involved in accidents.
With respect to the safety of the ships, 22.70% of the vessels involved in accidents had deficiencies or failed to conduct safety inspections, 7.92% were overloaded, and 20.87% had an insufficient number of crew members or a crew member with an incomplete or invalid certificate.
In terms of navigational environment, rain was present in 21.69% of the accidents, fog in 19.96%, and poor visibility in 17.30%. In addition, 42.14% of the accidents occurred between November and the following March, 64.38% occurred at night, and 74.74% occurred in waterways with shipping congestion or other poor navigational environments.
severity’; these variables are ‘accident type’, ‘location’, ‘ship type’ and ‘ship age’. Additional variables with mutual information I(S; X_k) greater than 0.02 but less than 0.05, namely ‘ship flag’, ‘ship speed’, ‘time (of day)’, ‘visibility’, ‘gross tonnage’, ‘environment’, ‘crew’, ‘season’ and ‘wind’, also had a significant effect on ‘accident severity’.
The variables ‘ship defects’, ‘loading’, ‘fog’, ‘hull type’, ‘human factors’, ‘rain’ and ‘length’ had a relatively weak effect on ‘accident severity’.
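The grouping by mutual information (MI) can be reproduced from a joint probability table of the target S and a risk variable. This sketch assumes MI measured in bits (log base 2), which the text does not state, and uses hypothetical joint probabilities; the thresholds 0.05 and 0.02 follow the text, while the treatment of values exactly at a threshold is an assumption.

```python
from math import log2
from typing import Dict, Tuple

Joint = Dict[Tuple[str, str], float]  # P(S = s, X = x)


def mutual_information(joint: Joint) -> float:
    """I(S; X) = sum over (s, x) of P(s, x) * log2(P(s, x) / (P(s) P(x))), in bits."""
    p_s: Dict[str, float] = {}
    p_x: Dict[str, float] = {}
    for (s, x), p in joint.items():
        p_s[s] = p_s.get(s, 0.0) + p  # marginal of S
        p_x[x] = p_x.get(x, 0.0) + p  # marginal of X
    return sum(p * log2(p / (p_s[s] * p_x[x])) for (s, x), p in joint.items() if p > 0)


def mi_group(mi: float) -> str:
    """Groups I-III by the thresholds in the text (boundary handling assumed)."""
    if mi > 0.05:
        return "I"
    if mi > 0.02:
        return "II"
    return "III"


# Hypothetical joint tables (illustrative only).
dependent = {("minor", "yes"): 0.4, ("minor", "no"): 0.1,
             ("severe", "yes"): 0.1, ("severe", "no"): 0.4}
independent = {("minor", "yes"): 0.25, ("minor", "no"): 0.25,
               ("severe", "yes"): 0.25, ("severe", "no"): 0.25}
for name, table in (("dependent", dependent), ("independent", independent)):
    mi = mutual_information(table)
    print(name, round(mi, 4), mi_group(mi))
```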
Fig. 8. Mutual information shared with the target node (node size is proportional to the MI value).
Fig. 9(a). Mean values of ‘accident severity’ against different accident types, and the posterior probability of each state of ‘accident severity’ with respect to different accident types.
Fig. 9(a) shows that when the accident is a sinking, ‘accident severity’ tends to be the highest, with a mean value of 3.237. When a grounding occurs, the probability of ‘minor accident’ is the highest and the mean value of ‘accident severity’ is the lowest, which means that the severity of grounding accidents tends to be minor.
Fig. 9(b). Mean values of ‘accident severity’ in different locations, and the posterior probability of each state of ‘accident severity’ with respect to different locations.
Fig. 9(c). Mean values of ‘accident severity’ against different ship types, and the posterior probability of each state of ‘accident severity’ with respect to different ship types.
The severity of waterway accidents involving fishing vessels tends to be the highest, with a mean value of 3.456. For passenger ships, dry bulk carriers and barges/tugs, the mean values of ‘accident severity’ are 2.801, 3.007 and 3.310, respectively. For tankers, container ships and ro/ro vessels, the mean values of ‘accident severity’ are lower: 2.705, 2.657 and 2.501, respectively.
Fig. 9(d). Mean values of ‘accident severity’ against different ship ages, and the posterior probability of each state of ‘accident severity’ with respect to different ship ages.
With increasing ship age, accident severity generally rises. However, ships aged 6-10 years tend to be safer than those aged 0-5 years, probably because a new ship goes through a run-in period. Ships aged 16-20 years tend to be slightly less safe than those more than 20 years old.
The model enables analysis of the severity of waterway accidents under various scenarios involving different natural and navigational environments and vessel managerial conditions. Two scenarios, focusing on the environment and on vessel management, are analysed to demonstrate the possible research implications of the BN model.
6.3.1 Scenario one: Hypotheses of natural and navigational environment aspects
In scenario one, waterway risk under specific environmental conditions is estimated. Here, the environmental factors ‘season’, ‘wind’, ‘rain’, ‘fog’, ‘time of day (TD)’ and ‘navigational environment (NE)’ are chosen and assigned the following states: ‘season’ = ‘winter’, ‘wind’ = ‘greater than 7 on the Beaufort scale’, ‘rain’ = ‘rain’, ‘fog’ = ‘fog’, ‘TD’ = ‘night time’, and ‘NE’ = ‘high traffic density or other poor navigational environment’. Assuming that one or more of these natural and navigational conditions occur, and considering that strong wind and heavy fog, or heavy rain and heavy fog, do not usually occur simultaneously, we obtain 40 combinations of environmental conditions. If we hold the other variables in the ABN structure constant (i.e., lock the evidence), we can compute the percentage increase in the posterior probability of ‘catastrophic’
noted that rain alone does not lead to severe accidents, but when rain occurs together with wind, accident severity increases significantly. Meanwhile, the highest marginal contribution to ‘AS = 4’ comes from the combination ‘wind = greater than 7, time of day = night, navigational environment = poor’.
These results suggest that all stakeholders should pay close attention to poor navigational environments, especially severe weather conditions, given their significant effect on accident severity.
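The combinations in scenario one can be enumerated by brute force, treating each of the six factors as either in its adverse state or not, and excluding fog together with strong wind or with rain. Under this reading, the enumeration yields 40 combinations when the baseline with no adverse condition is counted and 39 when at least one condition must hold; the 0/1 factor encoding is an assumption for illustration.

```python
from itertools import product

# Each factor is 1 if its adverse state (as listed in the text) is present, else 0.
FACTORS = ["season", "wind", "rain", "fog", "TD", "NE"]


def allowed(combo: dict) -> bool:
    """Exclude fog together with strong wind, and fog together with rain."""
    return not (combo["fog"] and (combo["wind"] or combo["rain"]))


combos = [dict(zip(FACTORS, bits)) for bits in product((0, 1), repeat=len(FACTORS))]
valid = [c for c in combos if allowed(c)]
non_empty = [c for c in valid if any(c.values())]
print(len(valid), len(non_empty))  # 40 39
```

Each valid combination would then be entered as evidence in the BN while the remaining nodes are held fixed.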
[Figure: bar chart of the increased percent of the posterior probability of catastrophic accident (y-axis, 20% to 60%) for combinations of environment conditions (x-axis: Season, Wind, Rain, Fog, TD, NE)]
Fig. 10. Combinations of environment conditions in scenario one for which the posterior probability of a catastrophic accident increases by more than 30%.
‘smaller than 5 knots’. The percentage decreases in the mean value of the node ‘accident severity’ in scenario two are shown in Fig. 11. When the crew is sufficiently staffed, the ship has no defects and the ship speed is slow, the mean value of accident severity decreases by 10% compared with the initial state, which indicates a significant decrease in risk. The results indicate that reducing hidden managerial hazards can significantly reduce accident severity.
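The reported increases (Fig. 10) and decreases (Fig. 11) are relative changes against the initial state. A minimal sketch, assuming the change is computed as (updated - baseline) / baseline; the paper does not spell out the formula, and the numbers below are hypothetical.

```python
def percent_change(baseline: float, updated: float) -> float:
    """Relative change (in %) of a probability or mean value against its baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (updated - baseline) / baseline


# Hypothetical values (illustrative only).
print(round(percent_change(2.50, 2.25), 1))  # mean severity falls: -10.0
print(round(percent_change(0.10, 0.15), 1))  # P(catastrophic) rises: 50.0
```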
[Figure: bar chart of the decreased percent of the mean value of ‘AS’ (y-axis, 0% to -12%) for combinations of managerial conditions (x-axis: Crew, Ship defects, Ship speed)]
Fig. 11. Decreased percent of the mean value of ‘accident severity’ in scenario two, arranged in decreasing order.
7. Conclusions
In this study, we extracted useful data from maritime accident investigation reports for risk analysis using the grounded theory (GT) method and BNs. We analysed the waterway accident reports held by the MSA from 1979 to 2015 and then identified and analysed the causal factors influencing waterway transport accidents. A novel BN model was constructed using ABN modelling to analyse waterway risks.
Based on the mutual information in the ABN model, the risk variables are grouped and ranked according to their closeness to the accident severity node as follows: Group I (mutual information higher than 0.05) comprises accident type, location, ship type and ship age; Group II (mutual information greater than 0.02 but less than 0.05) includes ship flag (registry), ship speed, time of day, visibility, gross tonnage, environment, crew, season and wind; and Group III (mutual information less than 0.02) includes ship defects, loading, fog, hull type, human factors, rain and length.
From the analysis, the following useful insights are obtained:
(i) When the accident is a sinking, its severity is the highest; when it is a grounding, its severity is the lowest.
(ii) When the accident occurs at a quay, the risk of serious consequences is the lowest. When the location is an inland or coastal waterway, the average severity is the highest and the severity of the accident tends to be catastrophic.
(iii) When the ship is a fishing vessel, accident severity is the highest among all vessel types.
(iv) With increasing ship age, accident severity generally increases.
The ABN model and the scenario analysis help investigate whether oceanographic conditions influence risk and whether these effects change over time. The findings provide useful guidance to stakeholders, including ship operators, who can take better safety control options (with respect to the most influential risk factors) to eliminate or reduce accident consequences, and policymakers (e.g., classification societies), who can set new safety standards (e.g., design for safety with
severity. At night, strong wind creating a poor navigational environment causes a significant increase in the probability of a catastrophic accident. When the crew is sufficiently staffed, the ship has no defects and the ship speed is slow, the mean value of accident severity decreases by 10%. Such results suggest appropriate ways of developing countermeasures for accident prevention.
Despite the above contributions and findings, the paper has some limitations, the most significant of which are:
1) The completeness of the data mined from the text cases is debatable. More sources should be used to compensate for the missing data.
Acknowledgements
This research is sponsored by the Shanghai Pujiang Program (Grant No. 5PJC060), the National Science Foundation of China (Grant Nos. 71573172 and 71402093), and the EU H2020
References
Akhtar, M. J., & Utne, I. B. (2014). Human fatigue’s effect on the risk of maritime groundings – A Bayesian Network modeling approach. Safety Science, 62, 427–440.
Akyuz, E., & Celik, M. (2014). A hybrid decision-making approach to measure effectiveness of
safety management system implementations on-board ships. Safety Science, 68, 169–179.
Aydogdu, Y. V. (2014). A comparison of maritime risk perception and accident statistics in the Istanbul Strait. The Journal of Navigation, 67(1), 129–144.
Akyuz, E. (2015). A hybrid accident analysis method to assess potential navigational
Antao, P. & Soares, G. (2008). Causal factors in accidents of high speed craft and conventional
ocean going vessels. Reliability Engineering & System Safety, 93(9), 1292-1304.
Antao, P., Grande, O., Trucco, P., & Soares, G. (2009). Analysis of maritime accident data with BBN models. European Safety and Reliability Annual Conference. September 7-10, Prague, Czech Republic.
Balmat, J.-F., Lafont, F., Maifret, R., & Pessel, N. (2009). Maritime risk assessment (MARISA), a fuzzy approach to define an individual ship risk factor. Ocean Engineering, 36, 1278–1286.
Balmat, J.-F., Lafont, F., Maifret, R., & Pessel, N. (2011). A decision-making system to maritime
systems: a case study for oil spill from tankers in a ship–ship collision. Safety Science, 76, 42-66.
Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory. New York: Aldine.
Glaser, B. G. (1992). Emergence vs Forcing: Basics of Grounded Theory Analysis. Mill Valley, CA: Sociology Press.
Hänninen, M., & Kujala, P. (2012). Influences of variables on ship collision probability in a
Bayesian belief network model. Reliability Engineering & System Safety, 102, 27–40.
Hänninen, M., & Kujala, P. (2014). Bayesian network modeling of Port State Control inspection findings and ship accident involvement. Expert Systems with Applications, 41(4), 1632–1646.
Hansen, H. L., Jepsen, J. R., & Hermansen, K. (2012). Factors influencing survival in case of
shipwreck and other maritime disasters in the Danish merchant fleet since 1970. Safety
Science, 50(7), 1589–1593.
Heij, C., & Knapp, S. (2012). Evaluation of safety and environmental risk at individual ship and
company level. Transportation Research Part D: Transport and Environment, 17(3),
228–236.
Heij, C., Knapp, S., Henderson, R., & Kleverlaan, E. (2013). Ship incident risk around the
heritage areas of Tubbataha and Banc d‘Arguin. Transportation Research Part D:
Transport and Environment, 25, 77–83.
Knapp, S., Bijwaard, G., & Heij, C. (2011). Estimated incident cost savings in shipping due to
inspections. Accident Analysis & Prevention, 43(4), 1532–1539.
Knapp, S., Kumar, S., Sakurada, Y., & Shen, J. (2011). Econometric analysis of the changing
effects in wind strength and significant wave height on the probability of casualty in
shipping. Accident Analysis & Prevention, 43(3), 1252–1266.
Kum, S., & Sahin, B. (2015). A root cause analysis for Arctic Marine accidents from 1993 to 2011.
Safety Science, 74, 206–220.
Li, K. X., Yin, J., & Fan, L. (2014). Ship safety index. Transportation Research Part A: Policy
and Practice, 66, 75–87.
Lam, W., & Bacchus, F. (1993). Using causal information and local measures to learn Bayesian
networks. Uncertainty in Artificial Intelligence, 243-250.
Ma, F., Chen, Y., Yan, X., Chu, X., & Wang, J. (2016). A novel marine radar targets extraction
approach based on sequential images and Bayesian Network. Ocean Engineering, 120,
64–77.
Mazaheri, A., Montewka, J., & Kujala, P. (2014). Modeling the risk of ship grounding—a literature review from a risk management perspective. WMU Journal of Maritime Affairs, 13(2), 269–297.
Mazaheri, A., Montewka, J., & Kujala, P. (2016). Towards an evidence-based probabilistic risk model for ship-grounding accidents. Safety Science, 86, 195-210.
Mullai, A., & Paulsson, U. (2011). A grounded theory model for analysis of marine accidents.
Accident Analysis & Prevention, 43(4), 1590–1603.
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. San Mateo, CA: Morgan Kaufmann.
Graziano, A., Teixeira, A., & Soares, G. (2016). Classification of human errors in grounding and collision accidents using the TRACEr taxonomy. Safety Science, 86, 245-257.
Prabhu Gaonkar, R. S., Xie, M., & Fu, X. (2013). Reliability estimation of maritime transportation: A study of two fuzzy reliability models. Ocean Engineering, 72, 1–10.
Pristrom, S., Yang, Z., Wang, J., & Yan, X. (2016). A novel flexible model for piracy and robbery
assessment of merchant ship operations. Reliability Engineering & System Safety, 155,
196–211.
Ronza, A., Félez, S., Darbra, R. M., Carol, S., Vílchez, J. A., & Casal, J. (2003). Predicting the frequency of accidents in port areas by developing event trees from historical analysis. Journal of Loss Prevention in the Process Industries, 16(6), 551–560.
Soares, G. & Teixeira, A. (2001). Risk Assessment in Maritime Transportation. Reliability
Engineering & System Safety, 74, 299-309.
Sotiralis, P., Ventikos, N., Hamann, R., Golyshev, P., & Teixeira, A. (2016). Incorporation of human factors into ship collision risk models focusing on human centred design aspects. Reliability Engineering & System Safety, 156, 210-227.
Sun, X., Yan, X., Wu, B., & Song, X. (2013). Analysis of the operational energy efficiency for inland river ships. Transportation Research Part D: Transport and Environment, 22, 34–39.
Talley, W. K., Yip, T. L., & Jin, D. (2012). Determinants of vessel-accident bunker spills.
Yang, Z., Wang, J., & Li, K. (2013a). Maritime safety analysis in retrospect. Maritime Policy and Management, 40, 261-277.
Yang, Z., Bonsall, S., Wall, A., Wang, J., & Usman, M. (2013). A modified CREAM to human reliability quantification in marine engineering. Ocean Engineering, 58, 293–303.
Yang, Z., Yang, Z., & Yin, J. (2018). Realising advanced risk-based port state control inspection using data-driven Bayesian networks. Transportation Research Part A: Policy and Practice, 110, 38-56.
Zhang, J., Teixeira, A., Soares, G., Yan, X., & Liu, K. (2016). Maritime transportation risk
assessment of Tianjin Port with Bayesian Belief Networks. Risk Analysis, 36(6), 1171-1187.
Zhang, D., Yan, X., Yang, Z., Wall, A., & Wang, J. (2013). Incorporation of formal safety assessment and Bayesian network in navigational risk estimation of the Yangtze River. Reliability Engineering & System Safety, 118, 93–105.
Zhang, D., Yan, X., Yang, Z., & Wang, J. (2014). An accident data-based approach for congestion risk assessment of inland waterways: A Yangtze River case. Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability, 228(2), 176–188.