1 Introduction
Collaboration of human operators
and automation in decision-making is growing in all areas of life. Examples include advanced warning and driver assistance systems in cars [
6,
49,
64], diagnostic procedures in healthcare [
24,
54,
62], systems that provide information and recommendations to pilots during flights [
2,
61,
80], and security analysis systems used by security administrators in
information technology (IT) environments [
65]. Automation could also provide support in multi-criteria decision-making or with linguistic models [
21,
42,
43,
82], and could enhance team performance [
12,
14]. Human operators can work with automation in various ways at the strategic, tactical, or operational levels [
23]. This paper focuses on operational-level cooperation, which in many situations can take one of two forms. In some scenarios, the automation serves as a
decision support system (DSS), intended to assist and advise the human operator’s decision-making to improve the overall task performance [
60], operating at levels 2-4 in the Sheridan and Verplank model [
73]. In other scenarios, the human takes a supervisory role, as may be required by regulation or policies, to ensure that the automation performs its function correctly [
3,
11], operating at Sheridan and Verplank level 6 [
73] in what is referred to in this paper as
Automated Decision Making (ADM) [
20,
48]. In either case, the automation collects information from the environment, processes it, and makes recommendations based on its inputs and analyses. At the same time, the operators also observe the environment independently, collect information, process it, may also consider recommendations from the automation, and establish their own assessment of the situation. In a DSS, the human decides on the required action. In an ADM, by contrast, the automation makes the decision, and the operator can intervene to change the automation’s decision if it seems incorrect.
Such automation is becoming more sophisticated and incorporates not only advanced sensors and detection devices but also
artificial intelligence (AI) with
machine learning (ML) and
deep learning (DL) algorithms that analyze the available data. As automation becomes more “intelligent,” it becomes essential to define the human operators’ and the automation’s influence on the process and their respective responsibility for its outcomes [
55,
66,
78]. Are human operators in control when they are added to supervise an automated process? Are humans still fully responsible for outcomes if they receive advice from a DSS that analyzes the situation much better than they can? These are not theoretical questions but rather issues that need to be considered when an investment in DSS or ADM is made, and their implications need to be understood. Understanding the contribution of automation and human operators to the operation and the potential adverse outcomes of automated vehicles or
unmanned aerial vehicles (UAVs) is critical for establishing proper guidelines for their operation, which will allow widespread deployment and use of such technologies. Such analyses are also relevant to regulators who may require human involvement in critical processes, demanding that humans retain ultimate control without knowing how much they genuinely contribute to the situation. Establishing such operational guidelines and regulations requires a forward-looking analysis of automation and human involvement. Unfortunately, most research on this topic has focused on retrospective analyses of specific events after the outcome was available. Such results are not easily translatable into guidelines for future systems and devices.
A first attempt to
a priori quantify the human causal responsibility when a human operator collaborates with intelligent automation was made by Douer and Meyer for single-decision events [
15,
16,
17]. Their
Responsibility Quantification (ResQu) model used information theory to quantify the contribution of the human and the machine to the final decision in a single-decision event and, from this, to derive the human’s causal responsibility for the resulting action. While this model serves certain situations, most real-life events are dynamic and involve changes in the environment, the decision maker, the available information, or the values of the outcomes. As explained below, the ResQu model cannot quantify human involvement in such dynamic events, and a new approach is required.
We present here a method for quantifying the Level of Influence (LoI) of the automation and the human on the decision-making and the resulting action in dynamic processes. The LoI can be used to understand the potential benefit of adding a human or automation to a process. It can be used to inform and advise system designers, process managers, and policymakers about the relative importance of humans and automation in a system.
The rest of this paper is organized as follows: We first review the related work in this field and highlight the research questions we address in this paper. A model for human-automation dynamic collaboration is described in Section
2, and an application of the model to a binary dynamic decision-making event with normally distributed noise is described in Section
3. Section
4 discusses the insights we gained from the results and identifies directions for future work. Finally, Section
5 presents the conclusions.
1.1 Related Work
There has been considerable research on the interaction between human operators and automated systems (for instance, in [
58,
73]). Multiple models have been proposed to define this interaction, such as shared-control [
72], co-active design [
36,
37], and a layered model of collaboration [
57], to name a few. These models identify which functions are to be performed by the automation and which are left to the human operator. In some cases, humans and automation perform similar activities, such as information gathering and analysis, but eventually, one makes the decision that leads to action.
DSSs are added to improve the outcomes of processes, e.g., to reduce traffic accidents, improve medical diagnoses, or prevent attacks on information systems. Analyses of DSSs have often focused on the retrospective examination of decisions and did not provide forward-looking, prospective analyses of the value that can be obtained from such DSSs (for reviews of such studies, see [
70] and [
60,
75]). ADMs, by contrast, are meant to ensure that humans remain involved in decision-making to avoid situations in which critical decisions, especially those that can affect people’s lives and welfare, are entirely made by machines without human oversight. Humans may be included in the process due to insufficient trust in the automation’s design and capabilities or because they can add considerations of ethical or political implications, human mercy, or other aspects that may be outside the scope of the automation’s sensors and logic. Examples include the requirement to have alert drivers even when cars can supposedly drive themselves, e.g., in advanced Traffic-Aware Cruise Control mode [
69], the demand to have “meaningful human involvement” in the operation of highly
automated weapon systems (AWS) [
3], the requirement to have human officers involved when deciding on a person’s statutory status [
22], and so on. However, simply adding a human to a process supported by a DSS or ADM does not mean that the human significantly influences the process or its outcomes. Given the system’s computational power, the complexity of the situation, and human cognitive limitations, the human may practically always be expected to accept the automation’s decision or recommendation. Therefore, the question of whether humans significantly influence such decisions is essential when designing such systems and determining regulations and policies.
1.2 Influence and Responsibility
As humans and automation collaborate to perform a task, the level of influence of each of them becomes less clear the more intelligent the automation becomes. This is particularly challenging at intermediate levels of automation, when a process is partly autonomous with some human involvement. Humans are considered to have complete control if they wield a sword against an enemy, but to have no control if an AWS has all the information and the human merely follows its instructions (a paraphrase of the example given by Horowitz and Scharre [
34]). However, what would be the level of influence of a human operator on an intelligent system that performs situation analyses and recommends actions? In such scenarios, the operators must remain vigilant and control the system. They may, however, also trust the technology to alert them when needed and possibly even abort a mission autonomously, should the situation change and, for instance, civilians enter an advanced weapon’s impact perimeter. What is the operator’s level of influence in this situation?
Furthermore, what would happen when the operator’s ability to identify the situation deteriorates while the automation still functions at its original level? Would this change the operator’s influence on the decision and the outcome (e.g., injury of innocent civilians), since they have become less capable? Some work has been done to define quantitative methods to attribute causation and responsibility in such situations involving multiple agents [
25,
44]. However, this work focused on the retrospective analysis of specific events for the attribution of actual, token-level causation (vs. prospective general causation) and the agents’ related responsibility for the outcome. It also assumed that all agents have the same capabilities, so it did not analyze the impact of possible differences between agents, which typically exist when humans and automation are concerned.
Unlike causality and control, responsibility is a multifaceted topic, including role, causal, legal, and moral responsibility [
30,
31,
71,
79]. A human operator using a DSS is still the one performing (or avoiding) the action, and an operator of an ADM still has the ability and responsibility to become involved and correct any perceived errors made by the automation. The person has the
authority to act, and as a result, carries the
role responsibility for the action. That said, attribution of responsibility is, in many ways, a subjective matter. People may hold operators responsible for the outcomes even if they had no way to control them, such as concerns about being held liable for accidents when riding in a fully automated car [
1,
13] and the assignment of moral responsibility to humans who deploy AWS [
76]. Furthermore, operators may be held liable indirectly, not necessarily due to their performance in a specific event. For instance, liability could extend to activities before an event, the operator’s intent, and their awareness of the implications of the outcome.
To quantify human responsibility in human-machine collaboration, it is necessary to identify a quantitative measure independent of any subjective factors. We followed Lagnado et al. [
44] and Douer and Meyer [
16] and aimed to quantify the
causal responsibility, which is related to the human’s direct contribution to the outcome, irrespective of any legal, moral, or ethical aspects. This allowed us to identify a
Human Responsibility Indicator (HRI), which provides a possible measure of human responsibility for the outcome. However, it should not be confused with other aspects of responsibility.
Douer and Meyer [
16] defined human causal responsibility as the ratio between the remaining uncertainty about the action taken, conditioned on all the automation’s information-processing functions, and the overall uncertainty about the resulting action. This definition is based on entropy concepts from information theory. These can be used when the random variables that represent the human and DSS classifications are stationary and ergodic. Entropy can be calculated when the probability function is known and changes deterministically over time [
38]. However, this is not true for dynamic events in which earlier decisions, based on random events, may change the probability distributions in non-deterministic ways [
47]. As a result, the evaluation of human responsibility in dynamic decision-making processes cannot be based on computations of entropy reduction, and a different method is needed.
1.3 Dynamic Decision Making
A model must incorporate the temporal aspects of events when quantifying the impact of automation on dynamic decision-making processes that develop over time and may be influenced by various variables [
5,
18,
52,
59]. We focus here on processes in which: (a) the situation continuously changes over time, (b) the decision maker needs to make consecutive decisions, taking their past decisions into account and considering how these changed the situation, and (c) incremental information could be gathered at each stage [
40]. Typically, such a dynamic event is finite in time, lasting for a period T or N stages, for a continuous or discrete event, respectively. If the operator refrains from deciding at any time (or stage), the event will end after time T (or N stages).
Such dynamic decision-making processes have several characteristics [
4,
18]:
(a)
A series of decisions is required to reach the goal.
(b)
The decisions are not independent, i.e., previous decisions may constrain future decisions.
(c)
The state of the environment may change, either independently or as a consequence of the decision maker’s previous actions.
(d)
The decisions have to be made in real-time.
Moreover, the dynamic environment consists of one or more components that can change over time and influence the event development:
(i)
The environment can change by itself or due to an operator’s actions (or lack thereof).
(ii)
The measurement sensitivity and setting can change due to internal or external factors (e.g., fatigue or weather conditions, respectively).
(iii)
The
value matrix for the different decisions can change over time, e.g., when a physician diagnoses a disease, the effectiveness of a treatment may decrease the later the correct diagnosis is made [
39,
41,
56,
67].
(iv)
The decision vector, which includes all possible decisions at every stage, can change in time as the available options change.
(v)
The
decision-making logic may change over time as a function of other parameters of the environment or the value matrix. Moreover, human operators are likely to deviate from the normative behavior, especially under pressure [
59], and their adjusted behavior may change the decision-making process.
(vi)
The
decision arbitration, which is the process of choosing which decision to implement (the human’s or the automation’s), can change over time. For instance, a car collision avoidance system may respect the driver’s decision to speed up even when approaching an obstacle. Still, it may override this speeding-up decision and autonomously brake when the car gets too close to the obstacle [
9].
We developed a model of decisions with such temporal changes. We used it as the basis for quantifying the human and automation influence on the decision and the resulting action.
2 An Analytic Model for Quantifying Influence in Human-machine Collaboration
The analysis of the decision-making process and the involvement of the human or the automation in the decision or the outcome is associated with Probabilistic Causation [
19,
33]. Prospective,
a priori estimation of the control of the process is related to the general, type-level causation of the outcome. An agent’s prospective general causation is the agent’s average causal contribution across all possible distributions of future events and resulting decisions in human-automation collaboration in a probabilistic world. An effect
E is said to be caused by a cause C iff C raises the probability that E occurs, compared to the situation in which C does not occur, i.e.,
\(\mathbb {P}(E|C)\gt \mathbb {P}(E|\sim C)\) [ibid]. A numerical index for the strength of causation of such a cause
C on the outcome
E is provided by the difference between those probabilities:
\(\Delta P = \mathbb {P}(E|C)-\mathbb {P}(E|\sim C)\) [
19,
35,
53]. Therefore, if the effect
E is the outcome of a decision-making process, and the cause
C is the existence of a DSS that consults the human, then we define the
level of influence (LoI) of an added agent (the human for ADM or the automation for a DSS, respectively) by determining “how far” the added agent moves the prospective probability distribution of the outcome from where it was before the agent was added, towards a different outcome probability distribution. Thereby, the model characterizes human involvement in decision-making processes in general; it cannot be used for analyzing individual decisions retrospectively.
The human-automation decision-making process was modeled following the stages of information processing: information acquisition, information analysis, decision and action selection, and action implementation [
58], as represented in Figure
1, which is an adaptation to dynamic events of the model presented in [
16]. The system is modeled as a continuous process (with
t representing time) or a discrete process (with
k representing the stage;
\(k=0\) is the first stage). A discrete model is presented in Figure
1, with the current stage number denoted as
k. The operator and automation both monitor the environment state
\(E(k)\) and collect their observations from the environment,
\(e^A(k)\) and
\(e^H(k)\) for automation and human observed data, respectively. If this is not the first stage, they also consider the automation and human classifications in the previous stage,
\(Y^C(k-1)\) and
\(X^C(k-1)\) respectively, and their decisions at the previous stage,
\(Y(k-1)\) and
\(X(k-1)\). They analyze them and generate their current state classifications of the inputs,
\(Y^C(k)\) and
\(X^C(k)\), considering their bias, which depends on the payoff values they assign at this stage to the possible outcomes (
Value or
Payoff matrix,
\(V(k)\)). Human operators also consider the automation classification output when making their classifications.
Both automation and human then select their chosen decision,
\(Y(k)\) and
\(X(k)\), from the set of all available decision alternatives at that stage, represented by the vector
\(D(k)\). An
Arbiter then selects which decision to implement, the human’s or the automation’s, resulting in an action
\(Z(k)\) that influences the environment and generates a new environment state,
\(E(k+1)\).
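To make the flow of a single stage concrete, the following minimal Python sketch outlines the model structure. All object and function names (observe, classify, decide, arbiter, update) are hypothetical placeholders for the components and functions defined below; this is an illustration, not the implementation used in the analysis.

    # Illustrative sketch of one stage k of the human-automation model.
    # All names below are hypothetical placeholders for the model components.
    def run_stage(k, env_state, T, V, D, automation, human, arbiter, environment):
        """Observe, classify, decide, arbitrate, and update the environment at stage k."""
        e_A = automation.observe(env_state)            # automation observation e^A(k)
        e_H = human.observe(env_state)                 # human observation e^H(k)
        Y_C = automation.classify(e_A, T, V)           # automation classification Y^C(k)
        X_C = human.classify(e_H, Y_C, T, V)           # human classification X^C(k), considers Y^C(k)
        Y = automation.decide(Y_C, T, V, D)            # automation decision Y(k)
        X = human.decide(X_C, T, V, D)                 # human decision X(k)
        Z = arbiter(X, Y)                              # implemented action Z(k)
        next_state = environment.update(env_state, Z)  # new environment state E(k+1)
        T.append((Y_C, X_C, Y, X))                     # extend the decision-making state for stage k+1
        return Z, next_state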
For stages after the first, the information about the classifications and decisions made at previous stages is reflected in the decision-making state matrix,
\(T(k)\), \(k\ge 1\), which is of dimension
\(4\times k\) and includes four vectors with values from stage 0 to stage
\(k-1\) of (i) the classifications of the automation
\(\vec{Y}_{k-1}^C\), (ii) the classifications of the operator
\(\vec{X}_{k-1}^C\), (iii) the decisions of the automation
\(\vec{Y}_{k-1}\) and (iv) the decisions of the operator
\(\vec{X}_{k-1}\), for
\(k=1 \dots N-1\).
Classifications in stage
k are based on the decision-making state from the previous stage,
\(T(k)\), the environmental observations in stage
k, and the payoff matrix at the current stage
\(V(k)\), which determines the classification bias. The classification process can be described by functions that represent the logic behind it:
The functions
\(f(\cdot)\) and
\(g(\cdot)\) depend on the specific logic used for making a classification. For example, in a binary detection system, one could apply
signal detection theory (SDT) principles [
50] to build those functions using optimal criteria. The DSS and the operator can also use different types of environmental information, such as numerical (continuous or discrete), linguistic terms, visual or audible. For example, a driver assistance system uses video streams from in-car cameras together with numerical data from a
light detection and ranging (Lidar) system to classify the objects surrounding the car and identify whether there are obstacles in the car’s path. In another example, a border control DSS would process a person’s responses in an immigration form, which could be in natural language or on discrete Likert scales [
46], to classify if a person should be admitted into the country. The processing of these different types of inputs is reflected in the classification functions
\(f(\cdot)\) and
\(g(\cdot)\), which are defined such that they can map all those different inputs to possible classifications
\(X^C\) and
\(Y^C\). In other systems, the automation can use
reinforcement learning (RL) to provide better predictions. Moreover, the operator’s function
\(g(\cdot)\) can incorporate Bayesian calculations that use the automation’s classification to modify the operator’s prior “belief” and optimize the resulting classification. Once the classifications are completed, the automation and the operator make their decisions, using the decision functions
\(H_Y\) and
\(H_X\), respectively:
The decision functions
\(H_Y\) and
\(H_X\) depend on the logic used to make the decision. For example, consider a simple random-walk decision process [
10] in which the human operator follows the logic: “(a) start with counter equals zero, (b) for automation binary classification in stage
k:
\(Y^C(k)=+1\) or
\(Y^C(k)=-1\), add or subtract one from the counter, respectively, and (c) decide to ‘Engage’ if the counter reaches threshold
\(\lambda\), ‘Abort’ if the counter reaches
\(-\lambda\) or otherwise, continue to collect more information.” For this process, the human decision function
\(H_X\) can be formalized as:
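For illustration, this counter-and-threshold rule can be sketched in Python as follows (the function name and decision labels are illustrative; the counter accumulates the ±1 automation classifications up to the current stage):

    def random_walk_decision(automation_classifications, lam):
        """Random-walk rule for H_X: accumulate the +/-1 classifications Y^C(k) and
        decide 'Engage' when the counter reaches +lam, 'Abort' when it reaches -lam."""
        counter = 0
        for y_c in automation_classifications:  # y_c is Y^C(k), either +1 or -1
            counter += y_c
            if counter >= lam:
                return "Engage"
            if counter <= -lam:
                return "Abort"
        return "Continue"  # keep collecting information

    # Example: with a threshold of 3, this sequence does not yet trigger a decision.
    print(random_walk_decision([+1, -1, +1, +1], lam=3))  # -> 'Continue'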
In the DSS case, the human operator makes the decisions, so the action \(Z\) equals the operator’s decision \(X\). The observations of the environmental state are stochastic; hence, the operator’s decision, which is a function of the observations, is a random variable. The list of an operator’s decisions through all
N stages of the event can be represented, as described above, by a random vector
\(\vec{X}\) with different values for each stage:
All possible values of
\(\vec{X}_{N-1}\) define its
sample space \(\Omega\). When the operator uses a DSS, the probability of each sample decision vector of dimension N, \(\vec{\chi } = (\chi _0, \chi _1,\dots , \chi _{N-1})\), can be calculated as follows:
The collection of probabilities of all possible samples of \(\vec{\chi }\) defines the probability distribution of all possible decision combinations across the N stages of the event when a human operator is using a DSS. It determines the probability of the operator performing each specific series of decisions during the event.
To determine the influence of the DSS on the outcome, we quantify its influence on the probability distribution of the human operator’s decisions. If the DSS did not change this distribution, then its influence on the outcome is defined as 0, since the probability of the outcome has not changed. In contrast, if the DSS significantly changes the probability distribution of the human’s decisions, its influence on the outcome is significant. Such an analysis can be done by examining a similar reference event in which the operator decides without a DSS, as illustrated in Figure
1 without the shaded blocks. In such an event, the human operator uses the current environment state and their decision from the previous stage to determine the classification,
\(\breve{X}^C(k)\), which will drive the decision,
\(\breve{X}(k)\) that is used as the final decision,
\(\breve{Z}(k)\). The probability for the human decisions without using a DSS in such an event is calculated for every sample of the random vector
\(\vec{\breve{\chi }}_{N-1}\) in the sample space, similarly to (
8):
The level of influence of the DSS on the operator’s decision-making is defined using covariation assessment for causation strength [
35,
53], extended to a multi-dimensional probability space, as the distance between the probability distributions of the operator’s decisions throughout the event stages with and without the DSS. This distance is measured using the Hellinger distance [
32],
H, which quantifies the similarity between two probability distributions. The Hellinger distance has been used successfully in machine learning systems [8, 26] and across different domains, from ecological systems [7, 45] to security [77]. It is a symmetric metric, bounded within the range [0, 1], and defined for all possible probability distributions. This measure indicates how much the DSS influenced the probability distribution of the operator’s decisions.
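For the discrete probability distributions used here, the Hellinger distance can be computed directly from its standard definition, \(H(P,Q)=\sqrt{1-\sum _i \sqrt{p_i q_i}}\); a minimal Python sketch is:

    import numpy as np

    def hellinger(p, q):
        """Hellinger distance between two discrete probability distributions.
        p, q: non-negative arrays over the same sample space, each summing to 1.
        Returns a value in [0, 1]; 0 means the distributions are identical."""
        p = np.asarray(p, dtype=float)
        q = np.asarray(q, dtype=float)
        return np.sqrt(max(0.0, 1.0 - np.sum(np.sqrt(p * q))))

    # Examples: identical distributions give 0; disjoint supports give the maximum of 1.
    print(hellinger([0.2, 0.3, 0.5], [0.2, 0.3, 0.5]))  # 0.0
    print(hellinger([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 1.0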
The human responsibility indicator for this dynamic event can be estimated as the complement of the normalized DSS influence. Since the Hellinger distance is within the range [0,1], the human temporal responsibility, T-RESP, can be defined as
The normalization of
\(DSS_{inf}\) is performed independently for each event length, since the maximum value of the influence depends on many factors that vary with the specific situation and with how the environment and decision factors change over time, which can fundamentally change the operator’s behavior.
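As a rough computational sketch (assuming, per the description above, that T-RESP is the complement of the DSS influence normalized by its maximal value for the given event length; the exact expression is given in (11)):

    def t_resp(dss_influence, max_influence):
        """Human temporal responsibility as the complement of the normalized DSS influence.
        dss_influence: Hellinger distance between the with-DSS and without-DSS decision
        distributions; max_influence: the maximal influence for this event length."""
        if max_influence <= 0:
            return 1.0  # the DSS cannot influence the decisions; responsibility stays with the human
        return 1.0 - dss_influence / max_influence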
Based on the above analysis, one can
a priori estimate the benefit (in terms of the increase in
expected value, EV) from using a DSS. The EV for the human decisions with DSS support is denoted by
\(EV_X\), while the expected value of the human-only process is denoted by
\(EV_{\breve{X}}\). The benefit of using the DSS is therefore defined as:
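Assuming the benefit is taken as the difference between these two expected values, a minimal sketch of the calculation (with hypothetical data structures in which each decision vector \(\vec{\chi }\) is mapped to its probability and its expected payoff) is:

    def dss_benefit(probs_with_dss, probs_without_dss, expected_payoff):
        """Expected-value benefit of adding a DSS (sketch; assumes the benefit is EV_X - EV_X_breve).
        probs_*: dicts mapping each decision vector chi to its probability;
        expected_payoff: dict mapping each decision vector chi to its expected payoff."""
        ev_with = sum(p * expected_payoff[chi] for chi, p in probs_with_dss.items())
        ev_without = sum(p * expected_payoff[chi] for chi, p in probs_without_dss.items())
        return ev_with - ev_without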
The above model is generic and serves as the basis for calculating the automation’s or the human’s influence and their expected benefit for various decision-making processes with different probability distributions. For example, an ADM case can be analyzed similarly by taking the automation-only probabilities as the reference and comparing them to the probabilities of the combined human-automation system. The following sections demonstrate how such a calculation can be applied to a specific scenario and what conclusions can be drawn from it.
3 Application of the Model
3.1 Binary Decision Making with Normally Distributed Noise
The above model can be applied to many situations in which decision-making is required throughout an evolving dynamic event. To illustrate how this model can be used, we analyzed the schematic case of a manufacturing facility’s quality assurance (QA) department. The factory manufactures devices per customers’ orders. The manufacturing of each order is a single dynamic event that starts with the setup of the machine to the customer’s specifications (done only once, at the beginning of the order manufacturing) and continues with the manufacturing of N batches of M devices each (for a total of \(M\cdot N\) units in each order), with every batch considered to be a stage in the event. Machine setup can be done accurately or inaccurately, with a certain probability, resulting in intact or faulty devices, respectively, in all batches of that order. Before manufacturing each batch, a human QA inspector (the “operator”) checks the system setup to detect an inaccurate machine setup. This inspection is not always correct.
Similarly, a QA system (the “automation”) can perform the same inspection with its own probability of success. Before manufacturing each batch, the automation informs the operator whether it determined that the machine was set accurately or not. The operator then performs their inspection, considers the automation’s advice, and decides whether to continue manufacturing that order. While the human could have completed this inspection alone, automation was added as a DSS to help improve the overall performance of this task. In terms of decision-making, we define an inaccurate setup as a “Positive” reading, which optimally should trigger an “Alarm” (i.e., aborting the manufacturing of the order). Values are associated with each decision, depending on whether the machine was set up accurately and the decision was correct. The inspector can decide to stop manufacturing at a specific batch k. The event is then terminated, and either a True Positive (\(V_{TP}(k)\)) or a False Positive (\(V_{FP}(k)\)) value is associated with the decision for an inaccurate or accurate setup, respectively. If the inspector does not stop the manufacturing at any point and completes the order, the event ends after the N stages. The inspector’s series of N decisions results in a value of True Negative (\(V_{TN}\)) or False Negative (\(V_{FN}\)), depending on whether the setup was accurate or not, respectively.
Our analysis of this application assumes a theoretical “ideal human operator” who uses normative decision-making methods to maximize the expected value. Such a human remembers their past classifications and decisions, leverages them for Bayesian inference, and employs dynamic programming for optimal decision making [
63] (see a detailed analysis of the dynamic programming for this scenario in the Supplemental Material). This is not a realistic model of actual human decision-making, but it estimates the optimal performance level that can be reached in the task.
The insights we gain from analyzing results for an ideal human reveal characteristics of the system and can serve as the basis for the analysis of real-life scenarios involving the system. It should be noted that the human operator is “ideal” only in their ability to employ optimal decision-making processes. They do not necessarily have optimal, or even good, detection sensitivity. The automation is assumed to be a simple, memory-less sensor with a fixed unbiased threshold.
This case demonstrates how dynamic events can become non-stationary, and their analysis requires a model that can accommodate such situations. The operators’ past decisions change their response tendencies (biases) for the subsequent decisions and the likelihood of them stopping the order manufacturing. Therefore, the probability of the human decision to act is not constant over k, and the stochastic variable representing it is not stationary.
3.2 Event Forward-Looking Analysis
The forward-looking analysis is performed by calculating the probabilities at the first stage (\(k=0\)) and then calculating forward the probabilities for stage k, for \(k=1 \dots N-1\), based on the values from stage \(k-1\). The operator uses straightforward Bayesian inference to calculate the updated probabilities based on their previous probabilities and, where relevant, the automation’s current probabilities. Details of the calculation are included in the Supplemental Material.
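To illustrate the Bayesian step, the following minimal sketch updates the probability of an inaccurate setup after a single observation, assuming the equal-variance Gaussian observation model used in the numerical example below (the full recursion, which also incorporates the automation’s classification, is given in the Supplemental Material):

    from scipy.stats import norm

    def bayes_update_signal_prob(prior_signal, observation, d_prime):
        """Posterior P(signal) after one observation q, assuming q ~ N(d', 1) under
        Signal (inaccurate setup) and q ~ N(0, 1) under Noise (accurate setup)."""
        like_signal = norm.pdf(observation, loc=d_prime, scale=1.0)
        like_noise = norm.pdf(observation, loc=0.0, scale=1.0)
        posterior = prior_signal * like_signal
        return posterior / (posterior + (1.0 - prior_signal) * like_noise)

    # Example: a prior of 0.2 and an observation near the Signal mean raise P(signal) to about 0.55.
    print(bayes_update_signal_prob(0.2, observation=1.8, d_prime=2.0))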
3.3 The Distance between Probability Distributions and Automation Influence
It is possible to compute the probability distributions of the series of decisions until the operator stops the order manufacturing (at stage \(k, 0\le k \le N-1\), or never), with and without the automation recommendation. Our measurement of the influence of the automation on human behavior is based on the distance between these distributions. If they are close, the automation has little effect on the human’s decisions; if the distributions are very different, the automation strongly affects the decisions.
The sample space of possible decisions in this scenario consists of the following vectors (with
\(Abort_k\) and
\(Continue_k\) representing decisions to stop or continue the order manufacturing in stage
k):
Based on (
9), the probabilities for the operator’s decision without automation assistance, represented as
\(\breve{X}\), are (note that the conditional probabilities below are defined in the Supplemental Material):
and
According to (
8), the probabilities for the human operator’s decision with automation support, represented as
\(\hat{X}\), are:
and
The Level of Influence (LoI) of the automation on the outcome is estimated by the Hellinger distance (
10) between these two distributions: the larger the distance, the stronger the automation’s influence. The benefit of the automation in this application can then be determined, as explained above.
3.4 Numerical Example
To demonstrate how the model results can be used and interpreted, we present an example case for the above QA department scenario, using SDT concepts [
27,
51,
81]. The manufacturing machine setup accuracy is measured by a single observable parameter which transforms the data into a scale value. Such a parameter is represented by a random variable
Q that can take any real numerical value
\(q \in \mathbb {R}\). An incorrect setup of the manufacturing machine (a “positive” reading) can be represented as the presence of a
signal (denoted as
S), with a prior probability of occurrence
\(P_S\).
In contrast, a correct setup is referred to as
noise (denoted as
N), with a prior probability
\(P_N = 1-P_S\). Therefore, the measured setup accuracy Q can be associated with one of two probability distributions,
\(E_S\) or
\(E_N\), depending on whether a signal is present (Figure
2). In the equal-variance SDT model, the two probability distributions are typically assumed to have the same variance but different means. The difference between the means (measured in standard deviations) is the detection sensitivity
\(d^{\prime }\) of the detector (the human or automation in our case). The higher
\(d^{\prime }\), the smaller the overlap between the two distributions, and the easier it is to determine whether there is a signal. In actual applications, the prior probability of an inaccurate setup (a Signal) would be available from historical data for an existing system, from the testing of a new system, or from theoretical analyses or simulations for a system that has not yet been built. The human detection sensitivities could be calculated from success rates in detecting such situations without a DSS. For example, assuming a normal probability distribution of the machine accuracy measure Q, the sensitivity,
\(d^{\prime }\), can be calculated as
\(d^{\prime }=Z(P_{TP})-Z(P_{FP})\), with
Z being the inverse of the cumulative normal distribution and
\(P_{TP}\) and
\(P_{FP}\) being the True Positive and False Positive probabilities of detection. For this numerical example we used the signal prior probability
\(P_S=0.2\) and a normal, equal variance distribution of Q with
\(E_N \sim \mathcal {N}(\mu _N,\sigma _N^2)=\mathcal {N}(0,1)\) or
\(E_S \sim \mathcal {N}(\mu _S,\sigma _S^2)=\mathcal {N}(d^{\prime },1)\) in case of Noise or Signal, respectively. Varying values of the detection sensitivity
\(d^{\prime }\) were used for the automation and the operator in the range
\([0.6,\dots ,3.0]\) with steps of
\(\Delta d^{\prime }=0.1\). This range represents agent sensitivity values from very poor (
\(d^{\prime }=0.6\)) to extremely high (
\(d^{\prime }=3.0\)). The event’s discrete dynamic programming numerical calculation, described in the Supplemental Material, was performed by taking observation steps
\(\delta =0.01\), and the probability range was divided into 100 buckets (
\(\epsilon =0.01\)). The event lengths were
\(N = 1\), 4, and 8 stages. We assumed that
\(M=10\) units would be manufactured in every batch, with unit values of +1 or -1 for intact or faulty units, respectively. This resulted in outcome values of
\(V_{TP}(k)=-10k\) for the cost of wasted material for faulty devices manufactured until this stage,
\(V_{FP}(k)=+10k\) (
\(k=0 \dots N-1\)) for the profit from intact devices manufactured until this stage, and
\(V_{TN}(N)=+10N\) for the value of a complete manufactured order of intact devices. The
\(V_{FN}(k)\) for each stage was calculated from the base value of maximum “waste” when the whole order is manufactured under inaccurate setup,
\(V_{FN}(N)=-10N\), going backward using dynamic programming as defined in the Supplemental Material.
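For reference, the sensitivity formula and the payoff values above can be expressed as follows (a minimal sketch; the backward dynamic-programming recursion for \(V_{FN}(k)\) is omitted here and described in the Supplemental Material):

    from scipy.stats import norm

    def d_prime(p_tp, p_fp):
        """Detection sensitivity d' = Z(P_TP) - Z(P_FP), with Z the inverse of the
        cumulative normal distribution (norm.ppf)."""
        return norm.ppf(p_tp) - norm.ppf(p_fp)

    # Payoff values of the numerical example (M = 10 units per batch).
    def v_tp(k):   # stopping correctly at stage k: cost of the faulty units made so far
        return -10 * k

    def v_fp(k):   # stopping unnecessarily at stage k: profit from the intact units made so far
        return +10 * k

    def v_tn(n):   # completing an accurately set-up order of n batches
        return +10 * n

    print(d_prime(0.84, 0.16))  # roughly 2.0 for these symmetric hit and false-alarm rates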
A numerical analysis of the above example event provides us with a measure of the automation LoI on the operator’s decision in the case of a DSS when the automation is added to aid the human (Figure
3). Note that the results were not normalized, so they show the actual distance between the distributions.
The results show growth in the automation influence when the automation sensitivity (\(d_A^{\prime }\)) increases, while an increase in the human sensitivity (\(d_H^{\prime }\)) reduces the automation influence. Maximum automation influence is achieved when the automation has maximal sensitivity while the human has minimal sensitivity. When the automation sensitivity is low, the automation recommendation does not significantly influence human decisions, so the automation influence is reduced to a minimum. Although, in theory, the Hellinger distance can take any value in the range [0,1], it is unlikely that the distance values in such scenarios would be close to 1, since that requires that the two probability distributions are never both positive for any sample. In reality, even with the lowest detection sensitivity, there will be some positive probability for almost all scenarios in the probability sample space, and the upper limit of 1 would not be reached. This can be demonstrated by comparing two extreme scenarios: an All-Knowing Agent (\(d^{\prime }=\infty\)), who would have the series of probabilities \([P_S, 0, 0, 0, \dots ]\) for an “Abort” decision at stage k, and a random decision (“Pure Chance”) at each stage based on a coin toss (\(d^{\prime }=0\)), which would result in “Abort” probabilities of \([0.5, 0.5^2, 0.5^3, \dots ]\). The Hellinger distance between these extreme cases is 0.23, 0.68, and 0.79 for N = 1, 4, and 8, respectively. Therefore, we refer to the maximal distance, obtained when the automation sensitivity is maximal and the human sensitivity is minimal, as the upper limit used to normalize the influence.
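The comparison of these two extreme cases can be reproduced with the Hellinger-distance sketch from Section 2 (the last entry of each distribution below is the probability of never aborting; this is an illustrative check, not part of the reported analysis):

    import numpy as np

    def hellinger(p, q):
        """Hellinger distance between two discrete distributions (as sketched above)."""
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        return np.sqrt(max(0.0, 1.0 - np.sum(np.sqrt(p * q))))

    P_S = 0.2
    for N in (1, 4, 8):
        # All-Knowing Agent: abort at stage 0 with probability P_S, otherwise never abort.
        all_knowing = [P_S] + [0.0] * (N - 1) + [1.0 - P_S]
        # Pure Chance: abort at stage k with probability 0.5**(k + 1); last entry: never abort.
        pure_chance = [0.5 ** (k + 1) for k in range(N)] + [0.5 ** N]
        print(N, round(hellinger(all_knowing, pure_chance), 2))  # 0.23, 0.68, 0.79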
The DSS influence depends not only on the relation between
\(d^{\prime }_A\) and
\(d^{\prime }_H\) but also on their absolute values, as shown by the solid black lines in panels (a)-(c) of Figure 3. When the ratio of the two detection sensitivities \(R=d^{\prime }_A/d^{\prime }_H\) is kept constant, the DSS influence still changes, depending on the actual values of these parameters.
The DSS influence changes as a function of the human and automation detection sensitivities, and it is not the same for all event lengths. In this example, the DSS influence is stronger when the event is longer, as can be seen in Figure
3, and in the growth of the average DSS influence values across all 625 sample points in each panel, as shown in Table
1.
Once the automation influence is established, the human responsibility can be calculated using (
11). The results for multiple event lengths are presented in Figure
4. Note that these charts show the data in a rotated axis view, compared to Figure
3, to better show the changes in human responsibility as the sensitivity levels change. As shown in Figure
4, human responsibility decreases with the increase in automation sensitivity and the decrease in human sensitivity. By definition, T-RESP (
11) is zero when the automation influence is maximal, which occurs when the automation sensitivity is maximal (
\(d_A^{\prime }=3.0\)), and human sensitivity is minimal (
\(d_H^{\prime }=0.6\)). Similarly, when the human’s sensitivity is high (
\(d_H^{\prime }=3.0\)) and automation’s sensitivity is low (
\(d_A^{\prime }=0.6\)), the human is not affected by the automation’s advice and human responsibility is maximal (close to 1).
The results also show that the human responsibility indicator drops significantly in the analyzed scenario once the human detection sensitivity is below a certain threshold, between \(d^{\prime }_H\) values of 1.2 and 1.6, depending on the event length. This means that, at certain detection sensitivity values, the human causal responsibility may drop quickly even when the detection sensitivity decreases only slightly.
This numerical example can demonstrate how the Hellinger distance represents the change in the decision probability. Since the probability distribution space is of dimension N, which is hard to illustrate for larger N values, we use a single decision (
\(N=1\)) to show the Hellinger distance on a linear, one-dimensional space. Using the above use case, with normal distributions and
\(d^{\prime }_A=d^{\prime }_H=2\), assuming signal probability
\(P_S=0.2\) and the above value matrix for
\(N=1\) (
\(V_{TN}=10\),
\(V_{FN}=-10\),
\(V_{TP}=V_{FP}=0\)), we calculate the probability of taking action (i.e., assuming “Signal”) for an All-Knowing Agent, Pure Chance (both defined above), a human alone, and a human assisted by automation (DSS). The Hellinger distances between those probability distribution combinations are shown in the top part of Figure
5(a) on a straight line. The Hellinger distance metric is additive on this one-dimensional space: when the human acts normatively and adopts the automation’s recommendation, the human distribution “moves” from its original (non-assisted) location towards the optimal All-Knowing Agent’s distribution by a distance equal to the DSS influence on the human. As expected, the EV improves when the automation assists the human (Figure
5(a), bottom part). It should be noted that the distributions are not necessarily ordered on the line according to their EV; their order depends on the values of the outcomes and the detection sensitivity. For example, for a different outcome value ratio of 0.25, the human and human-assisted distributions would be placed to the left of the All-Knowing Agent distribution. However, the additive property of the distance would remain (results not shown here).
This analysis also shows that we must assume that the human aims to improve task performance and does not act randomly or against that goal. As can be seen in Figure
5(b), if the human deliberately decides to do the opposite of what a normative human would do (i.e., always choosing “Signal” when a normative human would choose “Noise,” and vice versa), this counter-normative behavior will drive the probability distribution of the automation-assisted human further away from the optimal All-Knowing Agent and result in the lowest EV. Our model would then show a considerable Hellinger distance, i.e., a significant automation influence and, therefore, supposedly, minimal human responsibility. In this case, it is hard to claim that the human is not responsible for the outcomes, as they deliberately did not act to improve the task performance. Thus, a prerequisite for the model is the assumption that the human aims to improve performance.