
Deep Learning Based Electrocardiogram Delineation

A dissertation submitted to the Graduate School of the University of Cincinnati in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY (Ph.D.)

in the Department of Electrical Engineering and Computer Science

University of Cincinnati
Cincinnati, OH 45221, USA, July 2019

by

Hedayat Abrishami

M.S.C.S Islamic Azad University, Iran, 2012

B.S.C.S Islamic Azad University, Iran, 2008

Committee Chair: Xuefu Zhou, Ph.D.

Anca Ralescu, Ph.D.

Richard Czosek, M.D.

William Wee, Ph.D.

Chia Han, Ph.D.


Abstract

Cardiac waves in the Electrocardiogram (ECG) provide important information about heart conditions and the effects of cardiac medications. Segmentation and analysis of the ECG and its constituent cardiac waves are of high significance in cardiology diagnosis and pharmaceutical studies. Traditionally, ECG analysis has been performed by well-trained clinicians and cardiologists. However, employing clinicians in large-scale ECG screening, such as drug trial phases or population-based screening programs, is simply not economically feasible. Thus, an automated ECG segmentation approach that can segment cardiac waves accurately is of high importance. This dissertation studies Deep Learning (DL) based automated ECG segmentation and delineation methods.

Due to the various shapes and abnormalities present in ECG signals, traditional simple feature filters (e.g., correlators or derivative-based filters) fail to extract the variety of cardiac wave formations. A Convolutional Neural Network (ConvNet) applies multilayer feature filters to the input to extract complex features from the signal. Thus, ConvNets can be utilized to extract hierarchical features from ECG signals. Two ConvNet architectures are studied and used for localization of cardiac waves. Their performances are compared to each other and to other studies in the literature. The results show that ConvNets are capable of extracting spatial features of cardiac waves with good performance.

Furthermore, numerous long-term as well as short-term temporal patterns exist in ECG signals due to arrhythmia and other heart conditions. Generally speaking, short-term memory alone may not be able to capture the temporal features in ECG signals; therefore, a Long Short-Term Memory (LSTM) network is designed and employed to capture long-term and short-term data dependencies in ECG sequences. This method has improved the identification of temporal features, particularly in P-wave identification.

To address the raw ECG signal segmentation problem, a hybrid DL model consisting of convolutional autoencoders and LSTM networks is proposed and studied. Results show that it performs generally well for raw ECG signal segmentation.
Acknowledgements

First and foremost, I wish to express my sincere gratitude to my advisor, Professor Xuefu Zhou, for his guidance throughout the research and writing of this dissertation. He has shown the attitude and substance of a true scholar, and during the most difficult times of writing this dissertation, he gave me the moral support and the freedom I needed to move on. Without his supervision and constant help, this dissertation would not have been possible. I am also very grateful to Professor Chia Han for his aspiring guidance, insightful comments, and encouragement. A special thanks also goes to Dr. Matthew Campbell and Professor Richard Czosek, with whom I worked as a Graduate Assistant for a year; their insights in cardiology motivated me to choose this interdisciplinary topic.

I would like to thank the rest of my committee members, Professor William Wee and Professor Anca Ralescu, for their priceless time, insightful comments, and challenging questions.

My sincere thanks also go to Mr. Fred Murrell, who provided me the opportunity to join the University of Cincinnati Simulation Center team during my Ph.D. studies. I also thank my mentors at the Simulation Center, Dr. Jue Wang, Dr. Taoran Dong, and Dr. Matthew Barker, for many great discussions and helpful advice.

Last, my deepest gratitude goes to my family: my parents, Mahmoud Abrishami and Mitra Nafarieh, for their love and support throughout life, and my wife, Anna Abrishami, whose patience and kindness made this road enjoyable to ride. Thank you all for your countless self-sacrifices and for the chances you provided for me to pursue higher education and chase my dreams.
Contents

1 Introduction
  1.1 Significance of Research
  1.2 Research Scope and Contributions
  1.3 Thesis Overview and Outline

2 Related Works
  2.1 Deep Learning Fundamentals
    2.1.1 Artificial Neural Networks
    2.1.2 Convolutional Neural Networks (ConvNets)
    2.1.3 Long Short-Term Memory Neural Networks
  2.2 ECG Analysis Research Review
    2.2.1 ECG Feature Extraction Methods
    2.2.2 ECG Feature Classification Methods
    2.2.3 DL-based Methods for ECG Analysis
  2.3 Data
  2.4 Evaluation Metrics
  2.5 Perspective of This Work

3 ECG Cardiac Wave Localization Using Deep Convolutional Neural Network
  3.1 Data Preparation
  3.2 Data Sets
  3.3 Neural Network Models
  3.4 Evaluation and Test
  3.5 Results
  3.6 Conclusion

4 Supervised Cardiac Waves Segmentation Using Long Short-Term Memory Neural Network
  4.1 Data Preparation and Feature Extraction
  4.2 Data Sets
  4.3 Bidirectional Long Short-Term Memory Recurrent Neural Network Review
    4.3.1 Traditional RNNs and Bidirectional RNNs
    4.3.2 LSTM RNNs and Bidirectional LSTM RNNs
  4.4 ECG-SegNet Architecture Model
  4.5 Training Experiment
  4.6 Results
  4.7 Conclusion

5 Cardiac Wave Segmentation Using Autoencoders and LSTM Layers
  5.1 Data Sets
  5.2 Autoencoder to Extract Features
  5.3 Hybrid-ECG-SegNet Architecture Model
  5.4 Training Experiment
  5.5 Results

6 Conclusions And Future Work Plans
  6.1 Directions for Future Research

Bibliography
List of Figures

1.1 ECG signal recording [2].
1.2 Position of electrodes in the six chest leads and angle of Louis position [3].
1.3 Typical cardiac complex with annotated cardiac waves, segments, intervals and peaks.
1.4 Various typical QRS cardiac waves [5].
1.5 Various atypical QRS cardiac waves [5].
1.6 Cardiac waves' various temporal relationships.
2.1 Artificial intelligence approaches [27].
2.2 Single neuron model.
2.3 Activation functions.
2.4 Multilayer perceptron model example.
2.5 ConvNet model example.
2.6 Max-pooling, stride and receptive field example.
2.7 RNN scheme and unrolled RNN scheme.
2.8 Categorization states.
3.1 ConvNet cardiac wave localization.
3.2 (a) Normalized ECG segment, (b) local area under the magnified second-order derivative signal.
3.3 The proposed baseline MLP model architecture.
3.4 The proposed ConvNet architecture.
3.5 The proposed ConvNet model with dropout layers architecture.
3.6 ECGNet model result on a test set instance.
3.7 Training and validation error curves for the best result, ConvNet without dropout layer and learning rate of 10⁻³.
4.1 Overall process for cardiac wave segmentation research using an LSTM RNN model.
4.2 Traditional deep RNN architecture.
4.3 Bidirectional RNN layer.
4.4 LSTM cell [86, 88].
4.5 The proposed BLSTM RNN architecture.
4.6 Accuracy curve.
4.7 Loss curve.
4.8 Sample result.
5.1 Hybrid-ECG-SegNet components.
5.2 Overall process of the Hybrid-ECG-SegNet.
5.3 Autoencoder paradigm process.
5.4 The proposed autoencoder feature extractor.
5.5 The proposed hybrid architecture.
5.6 Accuracy curve.
5.7 Loss curve.
5.8 Two sample results.
5.9 Noisy sample result.
List of Tables

2.1 Heart conditions included in QTDB from various databases.
2.2 Categorization output states.
3.1 Dataset distribution.
3.2 Baseline MLP model description.
3.3 The proposed ConvNet description.
3.4 The proposed ConvNet with dropout layers description.
3.5 Results for every architecture and their related learning rates.
3.6 ECGNet with vicinity tolerance.
3.7 Cardiac wave identification performance.
3.8 Cardiac wave detection ratio comparison.
4.1 Dataset distribution.
4.2 Deep BLSTM RNN for ECG segmentation.
4.3 ECG segmentation results.
4.4 Segmentation accuracy comparison.
4.5 Cardiac wave identification results.
5.1 Dataset distribution.
5.2 Hybrid-ECG-SegNet architecture.
5.3 ECG segmentation results.
5.4 Segmentation accuracy comparison.
5.5 Cardiac wave identification results.
Acronyms

AdaGrad Adaptive Gradient algorithm.
AF Atrial Fibrillation.
AI Artificial Intelligence.
ANN Artificial Neural Network.
BIDMC Beth Israel Deaconess Medical Center.
BLSTM Bidirectional Long Short-Term Memory.
BO Bayesian Optimization.
BRNN Bidirectional Recurrent Neural Network.
ConvNet Convolutional Neural Network.
DFT Discrete Fourier Transform.
DL Deep Learning.
DWT Discrete Wavelet Transform.
E2E End-to-End.
ECG Electrocardiogram.
ESC European Society of Cardiology.
FN False Negative.
FNN Feedforward Neural Network.
FP False Positive.
FT Fourier Transform.
HCM Hypertrophic Cardiomyopathy.
HMM Hidden Markov Model.
HSMM Hidden Semi-Markov Model.
ICH International Conference on Harmonisation.
KNN K-Nearest Neighbors.
LSTM Long Short-Term Memory.
MIT-BIH MIT-BIH Arrhythmia.
ML Machine Learning.
MLP Multilayer Perceptron.
mV millivolts.
NN Neural Network.
PAF Paroxysmal Atrial Fibrillation.
QTDB QT Database.
ReLU Rectified Linear Unit.
RMSE Root-Mean-Square Error.
RMSProp Root-Mean-Square Propagation.
RNN Recurrent Neural Network.
SCD Sudden Cardiac Death.
SRCNN Super-Resolution Convolutional Neural Network.
SVM Support Vector Machine.
TN True Negative.
TP True Positive.
List of Symbols

: Slicing operation
α Learning rate
C̄ Cell state
Φ Cardiac complexes location set
σ Element-wise sigmoid function
θ Weighted connection vector
ζ1 A smoothing kernel
ζ1′ A smoothing kernel
ζ2′ A derivative kernel
∆(., .) Dissimilarity function
A A hypothetical category
B A hypothetical category
C(1) A convolutional filter example
C(2) A convolutional filter example
R Recurrent layer block representation
a Layer activation vector
D_X×Y Input-label distribution
E(γ) The local area under the second-order derivative of the ECG signal
E(norm) Normalized ECG
E(R) Raw ECG signal
E(SM)′ Smoothed ECG signal
E(SM) Smoothed ECG signal
E(WF) Normalized ECG signal
E(FD)′ First-order derivative of normalized ECG
E(SD)′ Second-order derivative of normalized ECG
F Forget gate
G Element-wise activation function
H Hidden layer neuron vector
I Input gate
O Model output
o Output vector activation
Q Output gate
Tanh Element-wise hyperbolic tangent function
X Input instance
x Input vector
Y Label for an instance
y Label vector for a sample
C Encoded vector
A A member of set A
A A set of functions
B A member of set B
B A set of functions
Fn Space with n dimensions
Gm Space with m dimensions
L(., .) Loss function
X Input space
Y Label space
⊗ Convolution operation
τ Time index
a Neuron activation
ACC An accuracy metric
Avg(.) Average function
b Bias connection
c Center of receptive field position
FCL Fully-connected layer block representation
freq Frequency
g(.) Activation function
h Current hidden layer's neuron index
h′ Previous hidden layer's neuron index
i General indexing
Idx(.) Indexing function
j General indexing
K Number of categories
k Output neuron index
l Hidden layer index
m Vector dimension
MAX Max-pooling block representation
MI A misclassification metric
N Number of instances in a set
n Vector dimension
o Output neuron activation
S Stride
SE A sensitivity metric
Std(.) Standard deviation function
T Sequence length
t Time index
v Index of convolutional filter in a convolutional layer
w Receptive field size
x Input sample
y Label for a sample
z Weighted summation input
ζ2 A derivative kernel
Chapter 1

Introduction

Controlled by electrical currents through the heart muscle, the heart contracts rhythmically and pumps blood throughout the body. An electrocardiogram (ECG, or EKG) is a recording of the heart's electrical activity used to obtain critical information about the heart's condition. The electrical activity of the heart varies through time. In other words, an ECG is a graphical representation of the heart's electrical activity in millivolts (mV) on the vertical axis against time on the horizontal axis [1], as shown in Fig. 1.1.

Fig. 1.1: ECG signal recording [2].

As an essential part of the assessment of cardiac patients, an ECG can be utilized to diagnose various cardiovascular diseases. For example, by measuring time intervals on the ECG, a cardiologist can determine how long an electrical wave takes to pass through the heart. Measuring the time a cardiac wave takes to travel from one part of the heart to the next shows whether the electrical activity is normal or abnormal, fast or slow. Also, by measuring the amount of electrical activity passing through the heart muscle, a cardiologist can diagnose whether parts of the heart muscle are too large or overworked [1].

The conventional 12-lead ECG is the current worldwide standard for diagnostic electrocardiography. Each ECG lead monitors the electrical activity from a particular position on the body with respect to the heart muscle [3]. Generally, these twelve leads are divided into two groups: the first group consists of six limb leads, and the second group includes six chest leads. The chest electrodes are positioned at angles relative to a slight horizontal ridge on the chest called the angle of Louis, which is located where the manubrium joins the body of the sternum [3]. The electrodes of the limb leads are placed on the left arm, right arm and left leg, respectively. While the six limb leads are labeled lead I, lead II, lead III, aVR, aVL and aVF, the six chest leads are denoted V1, V2, V3, V4, V5 and V6. Fig. 1.2 shows the placement of the six chest leads and the position of the angle of Louis.

Fig. 1.2: Position of electrodes in the six chest leads and angle of Louis position [3].

A normal cycle of heart activity is referred to as a cardiac cycle or a cardiac complex, commonly known as a heartbeat. A normal cardiac complex is composed of several cardiac waves. Among those waves, three are of high significance, namely the P-wave, the QRS-complex and the T-wave. The P-wave represents atrial depolarization; a typical P-wave duration is less than or equal to 0.11 seconds and its amplitude is less than 0.25 mV. The QRS-complex represents ventricular depolarization; a typical QRS-complex duration is less than or equal to 0.12 seconds and its amplitude is over 0.25 mV [3]. Further, the T-wave represents the return of the ventricular muscle to the resting state (ventricular repolarization); a normal T-wave duration is from 0.10 to 0.25 seconds and its amplitude is less than 0.25 mV [3]. Other intervals and segments in the ECG signal are derived from these key waves, including the PR interval (from the beginning of the P-wave to the beginning of the QRS-complex), the PR segment (from the end of the P-wave to the beginning of the QRS-complex), the ST segment (from the end of the S-wave to the beginning of the T-wave) and the QT interval (from the beginning of the QRS-complex to the end of the T-wave). Each interval and segment conveys information for cardiac diagnoses [4]. However, locating the three cardiac waves (P-wave, QRS-complex and T-wave) is essential to finding the rest of the intervals and segments. Fig. 1.3 shows a typical lead II ECG cardiac complex with cardiac intervals, segments and peak annotations.

Fig. 1.3: Typical cardiac complex with annotated cardiac waves, segments, intervals and peaks.

Further, a cardiac complex consists of an electrical cardiac activation phase called depolarization and a releasing phase called repolarization. During the depolarization phase, the P-wave and QRS-complex appear in the cardiac complex; during the repolarization phase, the T-wave emerges [3].
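Once a delineation method produces the wave boundaries (fiducial points), the interval and segment definitions above reduce to simple arithmetic. The following sketch illustrates this; the fiducial names and example values are hypothetical, and the "typical" duration checks merely restate the normal ranges quoted above:

```python
# Hypothetical fiducial points (in seconds) for one cardiac complex,
# e.g. as produced by a delineation algorithm: onset/offset of each wave.
fiducials = {
    "P_on": 0.10, "P_off": 0.20,      # P-wave boundaries
    "QRS_on": 0.26, "QRS_off": 0.36,  # QRS-complex boundaries
    "T_on": 0.46, "T_off": 0.64,      # T-wave boundaries
}

def derived_measurements(f):
    """Compute the intervals and segments defined in the text from fiducials."""
    return {
        "PR_interval": f["QRS_on"] - f["P_on"],    # start of P to start of QRS
        "PR_segment": f["QRS_on"] - f["P_off"],    # end of P to start of QRS
        "ST_segment": f["T_on"] - f["QRS_off"],    # end of S to start of T
        "QT_interval": f["T_off"] - f["QRS_on"],   # start of QRS to end of T
        "P_duration": f["P_off"] - f["P_on"],
        "QRS_duration": f["QRS_off"] - f["QRS_on"],
        "T_duration": f["T_off"] - f["T_on"],
    }

m = derived_measurements(fiducials)

# Typical-duration checks from the text: P <= 0.11 s, QRS <= 0.12 s,
# T between 0.10 and 0.25 s.
typical = (m["P_duration"] <= 0.11
           and m["QRS_duration"] <= 0.12
           and 0.10 <= m["T_duration"] <= 0.25)
```

This is why accurate onset/offset localization is the central requirement: every derived measurement inherits its error directly from the fiducial points.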

While Fig. 1.3 represents one typical and normal cardiac cycle, various abnormalities may be present in ECG signals due to arrhythmia and many other heart conditions, making them challenging to interpret. Fig. 1.4 shows five normal cardiac cycles, and four abnormal cardiac cycles are shown in Fig. 1.5.

Fig. 1.4: Various typical QRS cardiac waves [5].

Fig. 1.5: Various atypical QRS cardiac waves [5].

As shown in Fig. 1.4, a capital letter indicates a peak with high amplitude and a small letter indicates a peak with low amplitude. The five typical QRS-complex types are the QRS type (including Q-peak, R-peak and S-peak), the QR type (including Q-peak and R-peak), the Q type (including only Q-peak), the RS type (including R-peak and S-peak) and the R type (including only R-peak). Fig. 1.5 shows four atypical QRS cardiac wave formations: the QRSR′S′ type, the QRSR′ type, the RSR′S′ type and the RSR′ type. In these abnormal waveforms, R′ is the second positive (upward) wave and S′ is the second negative (downward) wave. These abnormalities are caused by abnormal ventricular depolarization and have important clinical implications. For example, abnormal QRS complexes such as RSR′, RSS′ or RSR′S′ complexes are elongated QRS waveforms and are signs of a condition called bundle branch block [5]. Bundle branch block is a delay in the contraction of the ventricles (chambers of the heart), and it reduces the pumping efficiency of the heart muscle [6]. Moreover, the presence of RSR′ waveforms can be an indicator of myocardial infarction (heart attack) [5]. The primed waveforms in atypical formations represent electrical activity in addition to the normal waveforms. Therefore, one of the main challenges of ECG delineation is the variety of cardiac wave formations and ECG abnormality patterns [5].
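The naming convention above (capital letters for high-amplitude peaks, small letters for low-amplitude ones, primes for repeated deflections) can be mimicked in a few lines. This sketch assumes the peaks have already been detected and measured; the 0.25 mV threshold is borrowed from the typical-amplitude figures earlier in the text and is illustrative only:

```python
def qrs_type(peaks, threshold=0.25):
    """Build a QRS type string such as 'QRS' or "RSR'S'" from detected peaks.

    peaks: list of (label, amplitude_mV) tuples in temporal order, where
    label is 'Q', 'R' or 'S'; a repeated label denotes a primed deflection.
    """
    seen = set()
    name = []
    for label, amp in peaks:
        # Capital letter for a high-amplitude peak, small letter otherwise.
        letter = label.upper() if abs(amp) >= threshold else label.lower()
        # A second occurrence of the same label gets a prime (R' or S').
        if label in seen:
            letter += "'"
        seen.add(label)
        name.append(letter)
    return "".join(name)

qrs_type([("Q", -0.3), ("R", 1.1), ("S", -0.6)])              # a QRS type
qrs_type([("R", 0.9), ("S", -0.5), ("R", 0.4), ("S", -0.3)])  # an RSR'S' type
```

The point of the sketch is that the type label is trivial once the peaks are known; the hard problem, addressed in later chapters, is detecting those peaks reliably across abnormal wave formations.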

In addition to the variety of shapes in cardiac wave formations, cardiac waves can have a variety of long-term or short-term temporal relations to each other. Fig. 1.6 shows different temporal relationships between cardiac waves. While Fig. 1.6 (a) shows a QRS-complex connected to the T-wave through an elevated ST segment, Fig. 1.6 (b) shows a QRS-complex connected to the T-wave through a depressed ST segment. Both are abnormal ECG signals and are related to Sudden Cardiac Death (SCD) and sudden cardiac arrest. Furthermore, Fig. 1.6 (c) shows T-wave inversion, where the QRS-complex is connected to an inverted T-wave. Here, unlike in Fig. 1.6 (a), the T-wave has an abnormal negative amplitude instead of a normal positive one. T-wave inversion is one of the symptoms associated with SCD [7]. These are a few examples of the different temporal relationships between cardiac waves. A collection of abnormalities in cardiac wave formations and temporal relationships that lead to SCD is reported in [4].

Fig. 1.6: Cardiac waves' various temporal relationships.

The idea of computer-aided ECG analysis dates back to 1950 [8]. The objectives of automated ECG analysis are to improve the correctness of ECG interpretation, reduce the need for expert clinician resources and reduce the cost of healthcare [8].

ECG delineation is the task of localizing the onset, offset and peak of each ECG cardiac wave, and of identifying and segmenting the waves. Additionally, one of the primary goals of ECG delineation is to identify abnormalities in ECG cardiac waves. Over the years, many digital signal processing algorithms have been developed, as ECG analysis and localization of cardiac waves are crucial to both cardiovascular research and clinical practice [8]. Based on accurate ECG signal delineation, diagnoses of patients with potential heart diseases can be made, or the effects of newly developed medications can be studied. Campbell et al. reported the performance of an automated ECG abnormality detection method for SCD [7, 9].

Due to the variety of abnormalities, it is a challenging task to develop a reliable automated ECG delineation system. A promising automated ECG segmentation approach (a subset of the ECG delineation topic) should accurately perform ECG cardiac wave segmentation regardless of cardiac disease symptoms, cardiac wave formations and cardiac wave temporal relations. Thus, an ECG segmentation approach that accurately identifies the subtleties of cardiac waves and delivers results acceptable to cardiologists is a well-known and yet challenging problem.

Because of the variety of cardiac wave formations and the temporal combinations of cardiac waves, the task of delineating an ECG signal and its constituent cardiac waves requires spatial pattern recognition methods to identify the diverse patterns of cardiac waves. Furthermore, it demands temporal pattern recognition methods to identify the temporal relations between the cardiac waves. This dissertation studies novel ECG delineation methods (i.e., ECG segmentation, cardiac wave localization and cardiac wave identification) based on Deep Learning (DL). Results show that DL-based approaches for extracting spatial and temporal features perform better than traditional simple-feature-filter-based approaches in ECG delineation.

In order to measure the capability of DL-based approaches in extracting the hierarchical structure of cardiac waves, as well as in capturing the long-term and short-term temporal relation patterns of cardiac waves, several experiments have been conducted. Each study presents feasible DL approaches to address spatial and temporal cardiac wave pattern recognition. Based on these experiments, a novel End-to-End (E2E) DL model for ECG segmentation is proposed. An E2E model takes raw ECG signals as its inputs and maps them to the desired output by learning the patterns in the intermediary process [10].

The goal of this dissertation is to introduce Artificial Neural Network (ANN) models for ECG analysis learned from expert-annotated ECG data. The proposed methods have three advantages. First, the technical difficulty of hand-crafting pattern recognition features can be reduced by utilizing self-adapting feature extraction methods. Second, by using the extracted complex features, discrimination between cardiac waves can be maximized. Finally, both long-term and short-term temporal dependencies in ECG data, rather than only short-term dependencies, can be captured by employing models capable of modeling both.

This chapter is organized as follows. Section 1.1 describes the significance of this research and potential applications based on automated ECG delineation. Section 1.2 presents the scope of computer-aided ECG analysis and the contributions of this dissertation. Finally, Section 1.3 outlines the organization of the dissertation.

1.1 Significance of Research

As discussed above, numerous diagnoses and findings can be made based upon ECG signals. Currently, the interpretation of a patient's ECG signals requires a well-trained electrophysiologist or cardiologist. Due to the significant workload and economic cost involved, it is generally infeasible to perform general population screening for heart diseases, including the leading cause of SCD, Hypertrophic Cardiomyopathy (HCM). Therefore, it is of high significance and value to conduct automated screening based on automated ECG delineation. Over the years, ECG delineation has generally been addressed by two different approaches. One approach is to localize the important peaks of the ECG cardiac waves and determine how the other samples relate to the identified peaks. The other approach is to classify every ECG sample into one of the cardiac wave categories, i.e., ECG segmentation [11]. The performance of these two approaches depends on the feature filters used in the ECG analysis, the kernels designed for identification of a particular wave and the accuracy of the categorization methods [12–15].
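The difference between the two approaches can be made concrete: the first produces a handful of peak locations, while the second assigns a class to every sample. The two views are interconvertible, as the following toy sketch shows; the label names ("bg", "P", "QRS", "T") are illustrative and not taken from any particular dataset's annotation scheme:

```python
# Per-sample segmentation output: one class per ECG sample, as produced
# by the second approach described above. Label names are illustrative.
labels = ["bg", "bg", "P", "P", "P", "bg", "QRS", "QRS", "bg", "T", "T", "bg"]

def wave_boundaries(labels):
    """Collapse a per-sample label sequence into (wave, onset, offset) runs."""
    waves = []
    start = None
    for i, lab in enumerate(labels + ["bg"]):  # sentinel flushes the last run
        prev = labels[i - 1] if i > 0 else "bg"
        if lab != prev:
            if prev != "bg":
                waves.append((prev, start, i - 1))  # close the previous run
            start = i
    return waves

wave_boundaries(labels)  # [('P', 2, 4), ('QRS', 6, 7), ('T', 9, 10)]
```

In the segmentation formulation, onsets and offsets fall out of the per-sample labels for free; in the peak-localization formulation, they must be inferred separately around each detected peak.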

Applications of ECG delineation research include, but are not limited to, the following areas.

• The first application is automated population screening for heart diseases and assisting cardiologists [16]. Universal or national ECG screening programs can only be conducted on top of reliable automated ECG screening systems, due to their economic cost [7]. However, current ECG monitoring systems are cited for their high false-positive rates and are thus unable to meet this need [17]. In addition, an automated ECG analysis system can assist physicians in pre-screening patients, which can improve diagnostic efficiency significantly and make a positive impact on overall health care. A reliable automated ECG delineation method should provide a solid foundation for automated heart disease screening.

• The second application is mobile cardiac telemetry. Remote medical diagnostic systems can provide medical diagnoses to patients remotely, which is particularly helpful for patients in isolated or remote areas who lack quality medical services [18]. Besides, the increasing demand for predictive and personalized healthcare requires remote patient monitoring as well [18]. Thus, an ECG delineation system is a critical part of cardiac telemetry.

• The third application is screening heart activity during drug trials. ECG screening is one of the essential phases of drug development. Sometimes cardiac events are caused by reactions to drugs, and this has led to a variety of regulatory responses. In 1997, the FDA and the International Conference on Harmonisation (ICH) issued a guidance for the pharmaceutical industry titled S6 Preclinical Safety Evaluation of Biotechnology-Derived Pharmaceuticals. It is mandatory that a cardiovascular safety test be performed during any drug development [19]. In addition, ECG monitoring should be conducted in order to determine drug dosage as well [20]. Furthermore, in the drug development phase, all types of effects on the heart activity of subjects under the medication should be determined. This requires very comprehensive ECG signal delineation.

While computer-assisted medical diagnosis offers several benefits, there are

limitations and concerns as well. Inaccurate diagnosis may cause unnecessary

treatments or harm to the patients. In addition, patients may not receive necessary and critical treatments due to failure to identify specific symptoms [21]. Therefore, verification and validation are required for any computer-assisted diagnosis.

Given these limitations and benefits of automated ECG screening systems, increasing the accuracy and comprehensiveness of such systems will lead to more reliable ECG screening methods.

While working on a multidisciplinary project of developing an automated

HCM screening system based on the Seattle Criteria [4], problems related to the limitations of current ECG signal analysis approaches were identified. HCM symptoms are a subset of the SCD symptoms that are common among athletes, and this subset is introduced in the Seattle Criteria [4]. In our research, the first step of the experiment was implementing cardiac wave localization. Preliminary research on cardiac wave localization began and three different approaches were implemented. Even though the results of the implemented studies were satisfactory on a public arrhythmia dataset [22], the localization did not perform to expectations when SCD symptoms occurred. By studying these methods, it became clear that the feature acquisition proposed in these works is not adapted to HCM symptoms. For example, a low R-peak amplitude in the QRS-complex could mislead the algorithm. As a result, an ensemble of ECG analysis approaches was adopted [7, 9].

Meanwhile, there were multiple developments on feature filter tuning and

DL methods in the image processing field. Zeiler et al. developed a novel image classification method using a DL model that tunes feature filters capable of extracting essential spatial features for a variety of image classes from raw pixel data [23]. At that time, a method that could extract the essential features from various cardiac wave formations and heart-condition symptoms was missing from ECG signal processing studies. From that point, work started on defining the problem statement and studying an E2E ECG delineation system without any predefined feature filters.

1.2 Research Scope and Contributions

As mentioned earlier, this dissertation focuses on ECG delineation including

ECG segmentation, cardiac wave localization and cardiac wave identification.

ECG segmentation is the classification of ECG samples into the corresponding

cardiac wave categories. Once ECG segmentation is successfully performed,

cardiac wave localization can be studied as well. By localizing or segmenting

cardiac waves, the cardiac wave identification rate is reported. Furthermore, using

DL methods, spatial patterns, long-term and short-term dependencies existing

in cardiac waves are investigated.

Computer-aided ECG analysis has two broad scopes. The first scope is defin-

ing standards, definitions and diagnostic criteria. Cardiology community ex-

perts have attempted to provide such standards by conducting conferences and

publishing their results. Thus, the automation algorithms can be developed with

respect to the published criteria [4, 8, 24]. Willems et al. defined cardiac waves

and their intervals and reviewed the approaches to visual determination

of the onsets and offsets of the P-wave, the QRS-complex and the T-wave [24].

Drezner et al. have defined several criteria for cardiac waves to diagnose SCD

[4]. The second scope of computer-aided ECG analysis is utilizing algorithmic

approaches for ECG signal processing. Multidisciplinary research teams study

the feasibility and accuracy of these definitions and criteria using signal process-

ing methods and algorithmic approaches [7–9].

Different aspects of the algorithmic computer-aided ECG analysis are discussed further. The first aspect is the acquisition of the ECG signal, converting the ECG signal from analog to digital and removing the baseline and artifacts of the ECG signal [25, 26]. The ECG signals used in this work come from a public dataset [22], so this work does not deal with acquiring ECG signals. However, a baseline wander removal methodology is applied to the provided ECG signals. The second aspect

is cardiac wave onset, offset, peak localization, cardiac waves recognition and

cardiac wave segmentation. The third aspect is the interpretation of ECG car-

diac waves using diagnostics [8]. This aspect of ECG analysis delves into the

computer-assisted diagnoses. The fourth aspect is data compression and energy

efficient methods for wearable devices with limited power consumption [8, 25].

This research does not focus on the third and fourth aspects of ECG signal pro-

cessing.

The contributions of this dissertation are as follows:

• The first contribution is to improve the ability to extract complex features

for P-wave, QRS-complex and T-wave from ECG signals by utilizing a DL-

based network called Convolutional Neural Network (ConvNet) which is

capable of extracting complex features by utilizing multilayer feature filters. Chapter 3 presents the DL approach adopted for this task.

• The second contribution is to explore long-term data dependency in addi-

tion to the traditional short-term data dependency in ECG signals to im-

prove the ECG temporal pattern recognition task. A new DL model archi-

tecture based on Long Short-Term Memory (LSTM) networks is proposed

to identify the temporal relationship between cardiac waves. Unlike the

traditional Recurrent Neural Network (RNN) and Hidden Markov Models

(HMM), this new model is capable of capturing both long-term and short-

term temporal dependencies. This feature makes it particularly suitable for

ECG segmentation. Chapter 4 details the approach for this contribution.

• The third contribution is to develop an E2E ECG segmentation model us-

ing a hybrid DL architecture. Traditional methods fail to provide a com-

petitive ECG segmentation method using raw ECG signals. In Chapter 5,

an E2E ECG segmentation model using raw ECG signal as its input based

on convolutional layers, autoencoders and LSTM layers is proposed.

1.3 Thesis Overview and Outline

This dissertation is organized as follows:

• Chapter 2 provides a background on the related works and presents an

overview of different spatial and temporal feature extraction procedures

in ECG analysis field.

• Chapter 3 introduces two Neural Network (NN) models using DL ap-

proaches for localizing and identifying ECG cardiac waves.

• Chapter 4 introduces research on the capability of DL methods in temporal

pattern recognition of ECG cardiac waves. Given appropriate ECG feature

vector as inputs to a deep RNN, we have demonstrated that this type of DL

model is capable of ECG segmentation with competitive results, especially

in P-wave segmentation.

• Chapter 5 presents a hybrid DL method capable of extracting various car-

diac wave formations and capturing ECG temporal relationships from the

raw ECG signal.

• Chapter 6 concludes the dissertation and discusses potential future works

in this field of research.

Chapter 2

Related Works

Since programmable computers were invented, people have wondered whether these machines can have intelligence. In the early days of the invention, programmable

computer intelligence, known as Artificial Intelligence (AI), tackled the prob-

lems that were too difficult for human beings to solve but rather straightforward

for computers, i.e., problems that required formal mathematics or intense mem-

ory bookkeeping [27]. One of the most important AI successes in the early stages

was IBM’s Deep Blue defeating Garry Kasparov in the game of chess [27, 28].

On the other hand, the shortcoming of AI proved to be providing solutions for

the tasks that are hard-to-describe but easy-to-perform for human beings such

as understanding spoken words, recognizing objects in a scene, understanding

actions and understanding human behaviors. The reason for this shortcoming

is that computers perform extraordinarily well when the set of rules is known and

can be formulated; however, if the task is more intuitive, abstract and less de-

scribable, hard-coding the knowledge into programs becomes very challenging

[27].

The difficulties related to comprehensive hard-coded knowledge for an AI

system motivated scientists to devise AI systems with the ability to acquire

their own knowledge by extracting patterns from the data. These approaches

are known as machine learning (ML) methods. The performance of an ML al-

gorithm heavily depends on the data presented to it. Data presentation is the

way to present meaningful interpretable information, known as features, to the

ML algorithm, then an ML algorithm can find the correlation between the fea-

tures and perform the designated tasks (i.e., decision making, regression, etc.).

Generally speaking, there are two types of features including ML-obtained fea-

tures and hand-designed features. While hand-designed features are designed

by experts to extract meaningful information from the raw data (i.e., designing

a first-order derivative filter to extract information about the edges in an image),

ML-obtained features are generated by an ML algorithm as a representation of

the input [27].

As mentioned earlier, one of the most successful approaches in ML is DL

which is a graph-based hierarchical approach consisting of multiple layers cas-

caded on top of each other. Due to its hierarchical nature, DL is capable of ex-

tracting more complicated and abstract features at the top of the hierarchies by

utilizing the simple and shallow concepts obtained from the data in the lower

hierarchies [27].

Fig. 2.1: Artificial intelligence approaches [27]

Fig. 2.1 summarizes different classes of approaches in AI. Here, ML modules are indicated by the grey blocks. As shown in Fig. 2.1, rule-based methods do not have any ML module. Furthermore, classic ML methods rely on hand-designed features extracted by another algorithm. In contrast to rule-based systems and classic ML methods, deep learning relies on ML approaches to obtain their features.

Computer-aided ECG analysis is one of the many applications that AI has

tried to solve. Not only does specific pattern identification in ECG signals require in-depth expert medical knowledge, but it is also very challenging to transfer this knowledge into a hard-coded set of rules for programmable computers.

There are a plethora of ECG analysis studies using rule-based methods and clas-

sic ML methods [29]. These studies are referred to as traditional ECG signal

processing approaches. On the other hand, there is still a lack of study in ECG

analysis using DL approaches. Thus, more advanced approaches of AI such as

DL can be a great contribution to the ECG analysis field. The novelty of this

dissertation research is to bring DL solutions to ECG analysis and evaluate their

performances.

This chapter describes related works on traditional ECG signal processing

and DL-based methods and is organized as follows. Section 2.1 introduces DL

fundamentals and their applications. Section 2.2 reviews the traditional feature

extraction methods, classification methods and DL methods for ECG analysis.

Section 2.3 introduces the dataset used in this dissertation research. Section 2.4

introduces performance metrics adopted to evaluate various segmentation and

classification algorithms. By adopting these metrics, the performance of algo-

rithms has been investigated and compared to related works in the literature.

Finally, Section 2.5 discusses the perspective of the dissertation research.

2.1 Deep Learning Fundamentals

Traditional ML methods are statistical learning methods where a set of features

or attributes describe each instance in a dataset. In contrast to traditional ML

methods, DL methods are statistical learning methods that extract features or

attributes from the raw data through an optimization procedure called training

[27]. However, the definition of raw data is open to interpretation, e.g., in handwriting recognition, local features of a pixel are considered to be raw data [30]

or in image classification, the RGB values of the pixels are considered to be raw

data [23].

Both traditional ML methods and DL methods are subsets of ML methods.

There are three types of learning in ML including supervised learning, unsu-

pervised learning and reinforcement learning. In supervised learning, a set of

input-label pairs, called labeled data, is provided to the algorithm for training.

A supervised learning algorithm analyzes the training data and produces an

inferred function, which can be used for mapping new examples. Unlike super-

vised learning, unsupervised learning performs without labeled data, i.e., the

input data does not have any label associated with it. The goal of unsupervised

learning is to find the underlying structure of the data, learn relationships be-

tween elements in the data and identify common characteristics among them.

The typical applications for unsupervised learning are clustering problems or

association problems within the data. Furthermore, in reinforcement learning, an agent acts randomly or heuristically in an environment to achieve a defined goal and learns from a scalar reward or punishment when the goal is achieved or missed [27, 30]. This dissertation utilizes supervised learning

for ECG analysis.

For supervised learning, there are three sets of input-label data pairs, the

training set, validation set and test set. For every input-label pair (X, Y), X is

an element from the input space X and Y is an element of the label space Y . The

instances in the training set and test set are mutually exclusive. However, the

validation set can be a subset of the training set or mutually exclusive from the

training set and test set. The goal of the training set is to minimize a task-specific

error measure defined on the test set. For example, in a regression task, the

error measure is usually the Euclidean distance between the algorithm outputs

and the provided labels. The purpose of the validation set is to validate the

performance of learning during the training phase. In particular, the validation

set is used to determine when to stop training and avoid over-fitting. The test

set is used to measure the performance of learning on the unseen input-label

pairs after training [30].

2.1.1 Artificial Neural Networks

DL methods are based on ANNs and have been used in supervised learning,

unsupervised learning and reinforcement learning. ANN models attempt to

mimic the neurons in the biological brain and are based on a collection of weighted connected units called artificial neurons [31]. Each weighted connection, like

the synapses in biological brain, can transmit a signal from one artificial neuron

to another. Even though neurons in ANNs bear only a small resemblance to biological neurons, ANN’s ability in pattern recognition keeps it as a common

approach in the ML field [30]. Generally, there are two types of ANNs including

acyclical ANNs and cyclical ANNs. In acyclical ANNs, there’s no weighted con-

nection between any two neurons that can create a loop between them. These

types of ANNs are referred to as Feedforward Neural Network (FNN). In con-

trast, cyclical ANNs are RNNs and have extra weighted connections that create

loops in the network graph to maintain the temporal interstate of the network

[30, 31]. A graph-based illustration of a neuron is shown in Fig. 2.2.

Mathematical representation of a neuron is expressed as

z = X_{1×n} θ_{n×1} + b = ∑_{i=1}^{n} x_i θ_i + b ,    (2.1)

and

a = g(z)    (2.2)

Fig. 2.2: Single neuron model.

where X_{1×n} is the input to the neuron, θ_{n×1} is the weighted connection vector and b is the bias connection. The bias unit, or intercept, is necessary for data fitting, and every neuron except the input neurons has one independent bias unit connected to it through the bias connection. A bias unit is not connected to any previous-layer neurons and is set to a numerical value of one. z is the weighted

summation input to the neuron and g(·) is called the activation function. The activation function can be a linear function or a nonlinear function such as the sigmoid function, Tanh function, Rectified Linear Unit (ReLU) function and piecewise linear function, as shown in Fig. 2.3.
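As a concrete illustration of Eqs. 2.1 and 2.2, the sketch below computes a single neuron's activation in NumPy; the input values, weights and bias are arbitrary numbers chosen for illustration, not values from this work.

```python
import numpy as np

def neuron(x, theta, b, g):
    """Weighted summation z = x·theta + b (Eq. 2.1), then activation a = g(z) (Eq. 2.2)."""
    z = x @ theta + b
    return g(z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])      # input X (arbitrary example values)
theta = np.array([0.1, 0.4, -0.2])  # weighted connections (arbitrary example values)
b = 0.05                            # bias connection
a = neuron(x, theta, b, sigmoid)    # activation; with a sigmoid it lies in (0, 1)
```

Because the activation here is the sigmoid, the output a is a value in (0, 1) regardless of the weighted sum.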

Fig. 2.3: Activation functions.

The sigmoid function, given by

g(z) = 1 / (1 + e^{−z})    (2.3)

is one of the most commonly-used nonlinear activation functions for neurons,

especially for output neurons. Since its output is in the range of (0, 1), it is

suitable for modeling probability. Due to its nonlinearity, ANNs with the sigmoid activation function perform well in nonlinear classification and nonlinear equation modeling. Therefore, it is more suitable for nonlinear problems than linear

ANN models which are capable of modeling only linear equations. Moreover,

any combination of linear operators is still a linear operator, i.e., any ANN with

multiple linear layers is exactly equivalent to another ANN with a single linear

layer [30]. Linear networks are in contrast to nonlinear networks which can gain

considerable nonlinear complexity by using successive layers to re-represent the

input data [30, 32]. Another key property of the sigmoid function is its differentiability, which allows the network to be trained with gradient descent [30].

The Tanh function, described as

Tanh(z) = (e^{z} − e^{−z}) / (e^{z} + e^{−z})    (2.4)

is another nonlinear sigmoidal activation function with the range of (−1, 1).
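The activation functions above can be sketched directly from their definitions; the sample input range below is an arbitrary choice for inspection.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))                             # Eq. 2.3, range (0, 1)

def tanh(z):
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))  # Eq. 2.4, range (-1, 1)

def relu(z):
    return np.maximum(0.0, z)   # zero for negative inputs, identity otherwise

z = np.linspace(-5.0, 5.0, 101)                                 # arbitrary sample inputs
outputs = {f.__name__: f(z) for f in (sigmoid, tanh, relu)}
```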

Multilayer Perceptron

The neurons in a Multilayer Perceptron (MLP) model are arranged in three different interconnected layers including one input layer, one or more hidden layers and one output layer. Fig. 2.4 shows an MLP example with one input layer consisting of four input neurons activated by input vector X, two hidden layers each consisting of three neurons with outputs a_h^{(l)} (where h is the index of the hidden neuron and l is the hidden layer index), and finally one output layer consisting of two output neurons with outputs o_k, where k is the index of the output neuron.

Fig. 2.4: Multilayer perceptron model example.

All MLP layers except the input layer are fully-connected layers (dense lay-

ers), i.e., each neuron receives input from every neuron of the previous layer.

This allows a particular layer in MLP to learn patterns from all the combinations

of the features provided in the previous layer. One common application of MLP

models is classification. By providing a feature set for an MLP, this type of model

can classify the input into various categories.

It has been shown that an MLP with a single hidden layer containing a suf-

ficient number of neurons with nonlinear activation function can approximate

any continuous function on a compact domain to arbitrary precision [30]. However, since the output of an MLP depends only on the current input (i.e., it is memoryless) and does not depend on any past or future inputs, MLPs are more suitable for pattern classification than for sequence learning [30, 33].

Forward Pass (Feedforward)

In the feedforward process, the input is passed through the intermediate layers

in one direction (i.e., no feedback), activates the neurons of the hidden layers

and finally activates the output layer.

In an MLP with an input X, each neuron in the first hidden layer calculates

a weighted summation of the input neurons. For hidden neuron h in the first

hidden layer, the weighted sum of the input is expressed as

z_h^{(1)} = ∑_{i=1}^{n} θ_{i,h}^{(1)} x_i + b_h^{(1)}    (2.5)

where n is the number of input neurons, θ_{i,h}^{(1)} denotes the weighted connection from input neuron i to hidden neuron h and b_h^{(1)} is the bias for hidden neuron h. Afterward, the activation function g_h^{(1)}(·) for neuron h is applied to the weighted sum z_h^{(1)} and the final activation a_h^{(1)} is obtained as

a_h^{(1)} = g_h^{(1)}(z_h^{(1)}) .    (2.6)

By calculating the activation for each neuron in the first hidden layer, weighted

summation and activation for the rest of the hidden layers are repeated, e.g., for

neuron h in the l-th hidden layer H_l, the summation and activation are given by [30]

z_h^{(l)} = ∑_{h′∈H_{l−1}} θ_{h′,h}^{(l)} a_{h′}^{(l−1)} + b_h^{(l)} ,    (2.7)

and

a_h^{(l)} = g_h^{(l)}(z_h^{(l)})    (2.8)

where H_{l−1} is the hidden neuron set in the (l−1)-th hidden layer, θ_{h′,h}^{(l)} denotes the weight connection from neuron h′ in the (l−1)-th hidden layer to neuron h in the l-th hidden layer and b_h^{(l)} is the bias for hidden neuron h in the l-th hidden layer.

For the MLP example shown in Fig. 2.4, by activating the input layer using

the input vector X, the forward pass for the first and second hidden layers cal-

culates the output of the first and second hidden layers, respectively. Utilizing

the activation of the second hidden layer, the output layer activation O, which

is the output of the model, can be obtained.
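The forward pass of Eqs. 2.5–2.8 can be sketched in NumPy for the Fig. 2.4 topology (four inputs, two hidden layers of three neurons, two outputs); the random weights and the input vector are illustrative assumptions, not values from this work.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(a_prev, theta, b, g):
    """One fully-connected layer: z = a_prev·theta + b, then a = g(z) (Eqs. 2.7-2.8)."""
    return g(a_prev @ theta + b)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Layer sizes for the Fig. 2.4 example: 4 inputs -> 3 -> 3 -> 2 outputs
sizes = [4, 3, 3, 2]
params = [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
          for m, n in zip(sizes, sizes[1:])]

x = np.array([1.0, 0.5, -0.5, 2.0])  # input vector X (arbitrary example values)
a = x
for theta, b in params:              # activate each layer in turn
    a = dense(a, theta, b, sigmoid)
o = a                                # output layer activation O
```

Each iteration of the loop implements one application of Eqs. 2.7 and 2.8; the final value o is the model output described above.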

Output Layer

In an NN model with K output neurons, the output vector is given by the activation function of the neurons in the output layer. The weighted summation, z_k, for the k-th output neuron is calculated by

z_k = ∑_{h∈H_L} θ_{h,k} a_h^{(L)} + b_k    (2.9)

where L is the index of the last hidden layer, θ_{h,k} denotes the weighted connection between hidden neuron h and output neuron k, a_h^{(L)} is the activation of neuron h in the last hidden layer, b_k represents the bias connection for output neuron k, and k ∈ {1, ..., K}.

For a classification problem with K classes where K > 2, the convention is to

have K output neurons and normalize the output activations with the softmax

function to obtain the class probabilities as

p(c_k | X) = o_k = e^{z_k} / ∑_{k′=1}^{K} e^{z_{k′}}    (2.10)

where o_k is the activation of the k-th output neuron. Label data in a classification task follows a method called 1-of-K coding. This method represents the label class as a binary vector with all elements equal to zero except for element k, corresponding to the correct class c_k, which equals one. For example, if K = 5 and the correct class is c_2, the label data Y is represented by (0, 1, 0, 0, 0) [30]. Using this coding obtains the following convenient form for the label probabilities:

p(Y | X) = ∏_{k=1}^{K} o_k^{y_k}    (2.11)

which implies the probability of finding the correct class by the model. If all the

output neuron activations are close to the label data, the probability is closer to

the numerical value of one (i.e., the model finds the correct class). Otherwise,

if any of output neuron activations are not close to its corresponding label, the

probability will be closer to zero (i.e., the model is incapable of finding the cor-

rect class).

Given the above definitions for pattern classification in MLPs, the input vector

activates the model and the most activated output neuron corresponds to the

predicted class label [30].
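The softmax normalization of Eq. 2.10, the 1-of-K coding and the label probability of Eq. 2.11 can be sketched as follows; the weighted sums z are arbitrary example values.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # shift the inputs for numerical stability
    return e / e.sum()        # Eq. 2.10: class probabilities summing to one

K = 5
z = np.array([0.2, 1.5, -0.3, 0.0, 0.8])  # output-layer weighted sums (arbitrary)
o = softmax(z)

y = np.zeros(K)
y[1] = 1.0                     # 1-of-K coding: correct class is c_2
p_label = np.prod(o ** y)      # Eq. 2.11; here it reduces to o[1]
predicted = int(np.argmax(o))  # the most activated output neuron is the prediction
```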

Loss Functions

The loss function for multiclass classification problems is simply obtained from

[30]

L(X, Y) = − ∑_{k=1}^{K} y_k ln o_k    (2.12)

where L(X, Y) is the loss function; the equation is differentiable, which makes it suitable for computing the error gradients. This loss value is small (close to zero) if the output neuron activations and the labels for those neurons are close to each other. For example, if the k-th output neuron activation is close to zero and the label data for that output neuron is also zero (the instance does not belong to the k-th class), the loss value related to the k-th class will be approximately zero because the model has classified it correctly. On the other hand, the loss

value will be large if a label element and its output neuron activation are far

from each other. For example, if the k-th output neuron activation is close to zero but the label data for that output neuron is one (the instance belongs to the k-th class), the loss value will be very large because the model has classified the k-th class incorrectly. Given the above definition, the range of the loss function for a classification problem is (0, +∞).
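A small sketch of the loss of Eq. 2.12 on made-up activations, confirming that a confident correct prediction yields a small positive loss while a confident wrong one yields a large loss.

```python
import numpy as np

def cross_entropy(y, o):
    """Multiclass loss of Eq. 2.12: L(X, Y) = -sum_k y_k ln o_k."""
    return -np.sum(y * np.log(o))

y = np.array([0.0, 1.0, 0.0])        # 1-of-K label: correct class is the second
good = np.array([0.05, 0.90, 0.05])  # confident, correct output activations
bad = np.array([0.90, 0.05, 0.05])   # confident, wrong output activations

loss_good = cross_entropy(y, good)   # -ln(0.90), approximately 0.105
loss_bad = cross_entropy(y, bad)     # -ln(0.05), approximately 3.0
```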

Backward Pass (Backpropagation)

By using the differentiable loss function detailed in Eq. 2.12, MLP can be trained

to minimize the loss function utilizing gradient descent methods. The most pop-

ular approach to calculate the gradient descent is known as backpropagation

[30, 34]. Backpropagation is also referred to as the backward pass of the net-

work. The purpose of backpropagation in an ANN is finding the appropriate amount of weight modification for each weighted connection to minimize the loss function. Backpropagation is a repeated application of the chain rule for partial

derivatives. The first step is to calculate the derivatives of the loss function with

respect to the output neurons [30]. For a multiclass network, by differentiating

the loss function, Eq. 2.12, we obtain [30]

∂L(X, Y)/∂o_k = −y_k / o_k .    (2.13)

Since the activation of each neuron in a softmax layer depends on the network input to every neuron in the layer, the chain rule gives [30]

∂L(X, Y)/∂z_k = ∑_{k′=1}^{K} (∂L(X, Y)/∂o_{k′}) (∂o_{k′}/∂z_k) .    (2.14)

By differentiating Eq. 2.10, we obtain [30]

∂o_{k′}/∂z_k = o_k δ_{k,k′} − o_k o_{k′}    (2.15)

where ∑_{k=1}^{K} o_k = 1 because the sum runs over the activations of all the neurons in a softmax layer, and

δ_{k,k′} = 1 if k = k′, and δ_{k,k′} = 0 if k ≠ k′ .    (2.16)

By substituting Eq. 2.13 and Eq. 2.15 into Eq. 2.14, we obtain

∂L(X, Y)/∂z_k = o_k − y_k .    (2.17)

Backpropagation is calculated the same way for all the hidden layers. It is helpful to introduce the differentiation of the loss function with respect to the neuron input as follows [30]:

δ_j := ∂L(X, Y)/∂z_j    (2.18)

where j is any neuron in the network and z_j is the weighted summation for any

neuron except input neurons [30]. For the neurons in the last hidden layer, we

have [30]
δ_h = (∂L(X, Y)/∂a_h)(∂a_h/∂z_h) = (∂a_h/∂z_h) ∑_{k=1}^{K} (∂L(X, Y)/∂z_k)(∂z_k/∂a_h)    (2.19)

where L(X, Y) depends only on each hidden neuron h through its influence on

the output neurons. By differentiating Eq. 2.6 and Eq. 2.9 and then substituting

their derivatives into Eq. 2.19, we obtain [30]

δ_h = g′(z_h) ∑_{k=1}^{K} δ_k θ_{h,k}    (2.20)

where g′(·) is the first-order derivative of the activation function with respect to its input z. In addition, the δ terms for each hidden layer H_l before the

last hidden layer can be calculated recursively by [30]:


δ_h^{(l)} = g′^{(l)}(z_h^{(l)}) ∑_{h′∈H_{l+1}} δ_{h′}^{(l+1)} θ_{h,h′}^{(l+1)}    (2.21)

where h is the index of the hidden neuron (h ∈ H_l) and h′ is the index of the neuron in the hidden layer above (h′ ∈ H_{l+1}). Once the δ terms are obtained

for all the hidden neurons, utilizing Eq. 2.5, the derivatives with respect to each

of the network weights can be obtained by [30]

∂L(X, Y)/∂θ_{i,j} = (∂L(X, Y)/∂z_j)(∂z_j/∂θ_{i,j}) = δ_j a_i    (2.22)

where θ_{i,j} is the weighted connection between neuron i and neuron j, z_j is the weighted summation of neuron j and a_i is the output of neuron i. Finally,

weighted connections are updated by moving to the negative direction of the

derivative of loss with respect to the weighted connection, as given by

θ_{i,j} ← θ_{i,j} − α ∂L(X, Y)/∂θ_{i,j}    (2.23)

where α ∈ [0, 1] is the learning rate. This process is repeated until a stopping criterion is met (i.e., the loss function stops decreasing when repeating the process, or a certain number of repetitions is completed [30]). More information on DL fundamentals can be found in [30].
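The combined softmax and cross-entropy gradient of Eq. 2.17 can be verified numerically with central finite differences; the weighted sums and label below are arbitrary example values.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss(z, y):
    """Cross-entropy (Eq. 2.12) applied to softmax outputs (Eq. 2.10)."""
    return -np.sum(y * np.log(softmax(z)))

z = np.array([0.3, -1.2, 0.7])   # output-layer weighted sums (arbitrary)
y = np.array([0.0, 0.0, 1.0])    # 1-of-K label

analytic = softmax(z) - y        # Eq. 2.17: dL/dz_k = o_k - y_k
numeric = np.zeros_like(z)
eps = 1e-6
for k in range(len(z)):          # central finite differences in each coordinate
    zp, zm = z.copy(), z.copy()
    zp[k] += eps
    zm[k] -= eps
    numeric[k] = (loss(zp, y) - loss(zm, y)) / (2.0 * eps)
```

The two gradient estimates agree to within the finite-difference error, which is the standard sanity check for a hand-derived backpropagation formula.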

Batch Learning, Mini-Batch Learning and Stochastic Learning

Batch learning, mini-batch learning and stochastic learning are weight update

policies. Batch learning updates the network weights by calculating the average

of the gradients with respect to the loss function over the entire set of training instances.

Since the average of gradients with respect to the loss function for the entire

training set is known, batch learning has the advantage of fast convergence to a minimum, but it has the risk of falling into a local minimum [30, 35]. Unlike batch

learning, in stochastic learning, the network weights are updated after calcu-

lating the gradients with respect to the loss function for each training instance.

Stochastic learning has the advantage of escaping from local minima since ev-

ery training instance has a different loss value, but it has the risk of missing the

global minimum as well [35]. Due to the nature of stochastic learning, this policy

is referred to as online learning as well [30]. In contrast to batch-learning and

stochastic learning, mini-batch learning divides the training set into multiple

subsets and updates the weights after calculating the average of gradients with

respect to the loss function for each subset. Therefore, it has the advantage of

escaping local minima and fast convergence; however, the mini-batch size is an important factor in mini-batch learning [35, 36].
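The three policies differ only in how many training instances contribute to each gradient average. A minimal sketch of splitting a toy training set into mini-batches follows; the data and batch size are arbitrary, and batch_size=1 corresponds to stochastic learning while batch_size=len(X) corresponds to batch learning.

```python
import numpy as np

def minibatches(X, Y, batch_size, rng):
    """Yield shuffled (inputs, labels) subsets of the training set.
    batch_size=1 reproduces stochastic learning; batch_size=len(X)
    reproduces batch learning."""
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        part = idx[start:start + batch_size]
        yield X[part], Y[part]

rng = np.random.default_rng(0)
X = np.arange(20.0).reshape(10, 2)   # ten toy training instances
Y = np.arange(10)                    # toy labels
batches = list(minibatches(X, Y, batch_size=4, rng=rng))  # subsets of 4, 4, 2
```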

Weight Initialization

Gradient descent algorithms for NNs require small, random initial values for the weights, which are commonly obtained by sampling from a Gaussian distribution with mean zero and a standard deviation of 0.1 [30].

Another approach for weight initialization is called Xavier weight initialization

[37]. This approach adjusts the standard deviation of the zero-mean distribution used for random weight initialization of a layer depending on the number of weights connected to the layer. The idea behind this approach is to keep the variance of a layer's output the same as the variance of the layer's input. It is shown that

for deep models with a large number of hidden layers, this approach can in-

crease the convergence speed [37]. Another approach similar to Xavier weight

initialization is called He normal initialization [38]. This approach follows the

Xavier solution, but the weight range depends on the number of neurons in the previous layer. It is shown that the He normal initialization approach is more

efficient on models with ReLU activation function [38].
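The three initialization schemes can be sketched as follows; the layer sizes are arbitrary, and the scale factors sqrt(2/(fan_in+fan_out)) for Xavier and sqrt(2/fan_in) for He are the commonly cited forms rather than expressions taken from this dissertation.

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 256, 128   # arbitrary layer sizes

# Plain small-Gaussian initialization: fixed standard deviation of 0.1
w_gauss = rng.normal(0.0, 0.1, (fan_in, fan_out))

# Xavier: std scaled by the number of weights connected to the layer
w_xavier = rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), (fan_in, fan_out))

# He: std scaled by the number of neurons in the previous layer (suits ReLU)
w_he = rng.normal(0.0, np.sqrt(2.0 / fan_in), (fan_in, fan_out))
```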

Neural Network Hyperparameters

Hyperparameters in NNs refer to the variables that construct or train NN mod-

els such as the number of hidden layers, number of neurons in each hidden

layer, learning rate and size of mini-batch in mini-batch learning. It is a chal-

lenging and critical task to select the values of these variables as they determine

the performance of the model.

Generally, there are three approaches to choosing hyperparameters. The first
approach is grid search: all hyperparameters are kept constant except the one
targeted for the search. For example, to identify the best learning rate, the NN
model is trained over a range of learning rates while the rest of the hyperparameters
remain constant. The second approach is relying on similar research studies and
selecting comparable hyperparameters. The third approach is optimization-based
methods such as Bayesian Optimization (BO) [39]. BO builds a probabilistic
surrogate model of the model's performance as a function of the hyperparameters;
by evaluating candidate hyperparameter sets, measuring the performance of the
model for each set and updating the surrogate, it iteratively searches for the
best combination of hyperparameters. The most common approach is relying on
similar research studies and evaluating the performance and the convergence of
the new NN model against them.
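The first approach can be sketched as a simple loop. Here `train_and_evaluate` is a hypothetical stand-in for training the model with the given learning rate (all other hyperparameters fixed) and returning a validation loss; the synthetic loss curve is ours, for illustration only:

```python
def train_and_evaluate(learning_rate):
    # Hypothetical placeholder: train the NN with this learning rate
    # while all other hyperparameters stay constant, then return the
    # validation loss. Here, a synthetic curve with a minimum at 0.01.
    return (learning_rate - 0.01) ** 2

def grid_search(candidates):
    # Evaluate every candidate and keep the one with the lowest loss.
    results = {lr: train_and_evaluate(lr) for lr in candidates}
    best = min(results, key=results.get)
    return best, results

best_lr, losses = grid_search([0.0001, 0.001, 0.01, 0.1])
```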

Additionally, the learning-rate annealing schedule introduces further hyperparameters.
In a common learning-rate-annealing method called step decay, the learning rate
starts from a high value at the beginning of the training procedure and gradually
decreases with every epoch of training [40]. Annealing the learning rate through
the training process leads to faster learning at the beginning of training, while
decreasing the learning rate through the epochs leads to smaller (slower) but
more accurate convergence steps [41].
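A minimal sketch of step decay, assuming an initial rate that is halved every ten epochs (the constants are illustrative, not taken from [40, 41]):

```python
import math

def step_decay(epoch, initial_lr=0.1, drop=0.5, epochs_per_drop=10):
    # The learning rate starts high and is multiplied by `drop`
    # every `epochs_per_drop` epochs.
    return initial_lr * drop ** math.floor(epoch / epochs_per_drop)

schedule = [step_decay(e) for e in range(0, 30, 10)]  # epochs 0, 10, 20
```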

2.1.2 Convolutional Neural Networks (ConvNets)

ConvNet models were introduced by LeCun [42], and they have become increasingly
popular since Krizhevsky's groundbreaking results in classifying the ImageNet
dataset [43, 44] in 2012 using a large ConvNet model. ConvNets have been
particularly useful in image/video synthesis, object classification and localization.
For instance, Longpre et al. used ConvNets to detect facial features (eyes,
lips, eyebrows, etc.) with key point locations [45]. Zeiler et al. used a ConvNet
to classify and localize objects in the ImageNet dataset [23, 44]. Moreover,
ConvNets have been used for image segmentation and creating super-resolution
images [46, 47].

Unlike MLPs, where one fully-connected hidden layer is connected to another
fully-connected hidden layer, ConvNets have small repetitive blocks of neurons,
called feature filters, that are applied to adjacent local neurons of the input
data, called receptive fields. Consequently, ConvNets are capable of extracting
local features from adjacent regions of the input data [42]. The output of a
feature filter applied to its receptive fields is called a feature map [42, 43].

Fig. 2.5 shows a ConvNet model activated by an example input vector X1×9
with nine samples. The feature filter C(1) is activated in different regions
of X. In this example, it takes two adjacent neighborhoods of the input X as
its receptive field to produce a feature map. Thus, the feature filter extracts
local features from various regions of the input (various receptive fields).
Generally, a ConvNet that can extract a variety of features requires multiple
feature filters to generate multiple feature maps. As a result, the combination
of multiple feature filters in a layer forms a convolutional layer.

Fig. 2.5: ConvNet model example.
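The feature-map computation described above can be sketched in a few lines: a single feature filter of width two slides across a nine-sample input and produces one output per receptive field. This is a minimal illustration (no padding, bias, or nonlinearity); the filter values are made up:

```python
def feature_map(x, filt):
    # Slide the filter over every adjacent receptive field of the
    # input and compute a dot product for each position.
    width = len(filt)
    return [sum(f * v for f, v in zip(filt, x[i:i + width]))
            for i in range(len(x) - width + 1)]

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # input X (1x9), nine samples
c1 = [1, -1]                      # one feature filter of width two
fmap = feature_map(x, c1)         # one local feature per receptive field
```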

In addition to convolutional layers, max-pooling layers are commonly used
in ConvNet models. A max-pooling layer cascaded on top of another layer
downsamples that layer's outputs by taking the maximum value in non-overlapping
windows (regions). The max-pooling layer is also known as the zoom-out layer
because it allows the model to look at a larger patch of data and choose the
maximum value of its receptive field as its output. Thus, this type of layer
provides computational efficiency, since it reduces the size of its receptive
field and passes the size-reduced output to the next layer, and it also helps
avoid over-fitting. In Fig. 2.5, the layer with the MAX blocks represents a
max-pooling layer.

Stride, defined as the distance between the centers of two adjacent receptive
fields, refers to the relative offset applied to feature filters (or any other
NN operation) [48]. A stride of S specifies by how many samples the receptive
field is translated horizontally and vertically between applications. The stride
operation reduces the overlap of receptive fields and the spatial dimensions [48].
It has been argued that the reduction to non-overlapping receptive fields prevents
over-fitting [49].

Fig. 2.6 illustrates an example of a max-pooling layer, stride and receptive
fields. In this example, the max-pooling function is displayed with empty circles,
and stride and receptive fields are shown on an input X1×12. The stride is
S = (1 × 3) and the size of the receptive field is also (1 × 3). Thus, the
centers of the receptive fields are (2, 5, 8, 11), where adjacent centers are
three samples apart horizontally (two samples lie between them). As shown, the
max-pooling operation outputs the maximum value over every adjacent receptive
field of size (1 × 3).

Fig. 2.6: Max-pooling, stride and receptive field example.
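The same configuration as the Fig. 2.6 example — window (1 × 3), stride (1 × 3), input X1×12 — can be sketched directly; the input values here are made up for illustration:

```python
def max_pool(x, window=3, stride=3):
    # Take the maximum over each receptive field; with stride equal
    # to the window size, the receptive fields do not overlap.
    return [max(x[i:i + window])
            for i in range(0, len(x) - window + 1, stride)]

x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]   # input X (1x12)
pooled = max_pool(x)                        # one output per receptive field
```

The four receptive fields are centered (1-indexed) at samples 2, 5, 8 and 11, matching the stride of three.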

Another important feature of ConvNet models is that they can be cascaded, as
shown in Fig. 2.5, where feature filter C(2) is cascaded on top of the max-pooling
layer of the prior convolutional layer. This feature allows the ConvNet model
to extract more complex features [23]. Generally, for decision-making purposes,
the last layers in ConvNet models are fully-connected layers, shown as the FCL
block in Fig. 2.5.

Feature filter sizes, the number of feature filters in a convolutional layer,
receptive field sizes and strides are additional hyperparameters used to construct
a ConvNet model.

2.1.3 Long Short-Term Memory Neural Networks

Temporal dependency commonly exists in speech recognition, natural language
processing, handwriting recognition, etc. Due to their ability to keep an internal
state by utilizing looped connections, RNNs provide a solution for capturing the
temporal dependency in the data [30, 31]. Fig. 2.7 shows an RNN scheme and an
unrolled RNN scheme [50]. The RNN scheme shows the weighted loop connection in
an RNN model, and the unrolled RNN scheme shows that every time step passes its
output to the next time step as well.

Fig. 2.7: RNN scheme and unrolled RNN scheme.

A task such as classification or regression is considered to have a long-term
dependency if the output at time t depends on the input presented at time τ
where τ << t, as opposed to a short-term dependency [51]. The time-step difference
between τ and t varies across applications; however, 100 time steps can be
considered a long-term gap in the data [52, 53]. Traditional RNNs are able to
capture only short-term dependencies and are unable to capture long-term
dependencies successfully, since these networks encounter the problem of
vanishing-exploding gradients [51], i.e., the derivative of the loss function
with respect to the weights approaches zero or infinity after a short period of
time. This issue makes it impossible to train such networks on long-term
dependencies. To address this issue, the LSTM RNN was developed by Hochreiter
and Schmidhuber [53]. The LSTM RNN uses trainable memory cells, called LSTM
cells, instead of simple neurons. These memory cells have three trainable gates:
input, output and forget gates. The cells remember values over long periods,
and the gates regulate the flow of information into and out of the cells. A
large number of applications have outperformed their competitors by using LSTM
networks [54]. Details of the LSTM cells will be discussed in Chapter 4.
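To preview the mechanism before the detailed treatment in Chapter 4, a single scalar LSTM step can be sketched as follows. Each gate is a sigmoid of a weighted combination of the current input and the previous hidden state; the weights here are illustrative placeholders, not trained values:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, W):
    # W holds one (w_x, w_h, b) triple per gate: input (i), forget (f),
    # output (o), and the candidate cell value (g).
    i = sigmoid(W['i'][0] * x + W['i'][1] * h_prev + W['i'][2])    # input gate
    f = sigmoid(W['f'][0] * x + W['f'][1] * h_prev + W['f'][2])    # forget gate
    o = sigmoid(W['o'][0] * x + W['o'][1] * h_prev + W['o'][2])    # output gate
    g = math.tanh(W['g'][0] * x + W['g'][1] * h_prev + W['g'][2])  # candidate
    c = f * c_prev + i * g    # cell state: keep part of the past,
    h = o * math.tanh(c)      # add part of the new candidate
    return h, c

W = {k: (0.5, 0.5, 0.0) for k in ('i', 'f', 'o', 'g')}  # placeholder weights
h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:    # run the cell over a short sequence
    h, c = lstm_step(x, h, c, W)
```

Because the forget gate multiplies the previous cell state rather than repeatedly squashing it, the cell can carry information across many time steps.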

As examples of successful LSTM applications, Graves et al. introduced an
LSTM-based long-term and short-term sequence learner for handwriting recognition.
Their approach considered the long-term and short-term dependencies between the
handwriting pixels [55]. Later on, Graves et al. used a similar approach for
audio signals and speech recognition [54].

2.2 ECG Analysis Research Review

Generally speaking, there are two technical research fields in ECG delineation.
One focuses on the extraction of features that can best represent the structure
of cardiac waves [12, 13, 56, 57]; the other classifies the obtained features
into a specific set of classes, i.e., cardiac wave classes or heart-condition
symptom classes [14, 15, 58]. An extensive literature review on ECG signal
delineation can be found in [29]. Recently, researchers have applied DL methods
to ECG processing as well [59, 60]. In this section, a background review of ECG
feature extraction methods, ECG feature classification methods and DL-based
methods for ECG analysis is provided.

2.2.1 ECG Feature Extraction Methods

There are three commonly-used approaches to extracting features in ECG signal
analysis. The first approach is to apply filters such as smoothing filters [12],
first-order derivative filters [61, 62], second-order derivative filters [13],
etc. The second approach is based on transformation methods, including the
Discrete Wavelet Transform (DWT) [57, 63–65], the Fourier Transform (FT) [56],
etc. The third approach comprises amplitude-based methods [5], which consider
the amplitude of the ECG signal as the main feature for classification purposes.

Laguna et al. used a low-pass differentiator to find the QRS-complex onset and
T-wave offset. Low-pass differentiators are first-order derivative filters with
a cut-off frequency [12]. In their work, the cut-off frequency is defined to
minimize the impact of noise in ECG signals. Based on the first-order derivative
of the ECG signal and defined rule-based methods, the QRS-complex onset and
T-wave offset are detected [12].
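As a simple stand-in for such a filter (not the exact low-pass differentiator of [12]), a first difference followed by a short moving average approximates the derivative while suppressing the high-frequency noise that differentiation amplifies:

```python
def smoothed_derivative(x, win=3):
    # First-order difference approximates the slope of the signal.
    diff = [b - a for a, b in zip(x, x[1:])]
    # A short moving average acts as a crude low-pass stage,
    # reducing the noise amplified by differentiation.
    return [sum(diff[i:i + win]) / win for i in range(len(diff) - win + 1)]

ramp = [0.1 * n for n in range(10)]   # a noiseless linear segment
slope = smoothed_derivative(ramp)     # roughly constant slope of 0.1
```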

Pan and Tompkins developed one of the best-known derivative-based methods for
finding QRS-complexes. Derivative-based methods analyze the slopes of the ECG
signal and then identify ECG segments with higher variations in the slope of
the signal. However, assigning a cardiac wave to the highest slope variation is
not always possible, especially for abnormal ECGs. For abnormal ECG signals, it
is not uncommon that either the P-wave or the T-wave shows a higher change in
slope than the QRS-complex. Therefore, these algorithms may result in
unacceptably high error rates [13].

Schreier et al. used a second-order derivative approach to segment the QT
interval. In their work, they analyzed the change in the second-order derivative
of the ECG signal and the amplitude of the ECG signal within a small window to
find the Q-wave onset and T-wave offset [66].

Transformation methods such as FT and DWT have been used to analyze

ECG frequency information and find frequency patterns revealed by cardiac

waves [56, 63, 64]. These methods can be comprehensive if various basis func-

tions and frequency patterns are defined for a variety of cardiac wave forma-

tions.

Murthy et al. used the Discrete Fourier Transform (DFT) to obtain a filter bank
of low-pass filters for the delineation of the P-wave, QRS-complex and T-wave.
In their work, a set of cardiac wave instances is transformed into a filter bank
of distinct frequency bands. The filter characteristics are determined from the
time signal. Multiplication of the transformed signal with a complex sinusoidal
function allows the use of a bank of low-pass filters for the delineation of
cardiac waves [56].

Tafreshi et al. introduced an amplitude-based method to identify multiple normal
and abnormal QRS-complex types. This type of algorithm is sensitive to the
threshold defined for the cardiac waves: if there is a minor variation in cardiac
wave formations, the threshold must be adjusted. However, if there is a major
change between heartbeats, the algorithm does not perform well on the heartbeats
with major variations and instead relies on the information obtained from other
ECG leads [5].

2.2.2 ECG Feature Classification Methods

Over the years, various rule-based and ML methods have been developed for ECG
cardiac wave classification. In the ML category, these methods encompass NNs,
Random Forest, Support Vector Machine (SVM), Naive Bayes, HMM, linear
discriminants, logistic regression, etc. Generally, these methods focus on
localizing the peak point of the cardiac waves. However, there is research on
ECG signal segmentation as well [11, 67].

Ouyang et al. constructed an NN with three fully-connected layers, trained with
ECG information to diagnose HCM. The inputs to the NN are information on the
P-wave, QRS-complex, T-wave, etc., and the outputs are four diagnostic categories
related to HCM [68].

Kheidorov et al. used an NN to classify ECG segments into cardiac wave categories.
In their work, ECG features are obtained in two different ways: first, from local
properties of the ECG data, and second, using the wavelet transform. With the
obtained ECG features as input, an HMM is utilized to model the temporal relations
of the ECG cardiac waves, and an NN is used to connect the feature vector to the
HMM states [69].

Rahman et al. used SVM and Random Forest to classify ECG signals into two
categories, HCM and non-HCM ECGs. In their work, 36 temporal features and six
spatial features are extracted from each of the 12 ECG leads; thus, 504 features
in total are extracted for the classification task [70]. Random Forest is an
ensemble decision-tree algorithm mainly used for classification problems
[69, 70]. SVM finds the optimal hyperplane between the categories of data to
classify new instances [70].

Warner et al. used logistic regression for a heart condition called left
ventricular hypertrophy. Logistic regression is a categorization algorithm that
outputs the probability of belonging to a particular class by optimizing the
logistic function [71]. De Chazal et al. used linear discriminants to find
cardiac complex intervals. A linear discriminant finds the mean and variance of
each class of the data and classifies new examples according to the obtained
per-class means and variances [72].

Kaiser et al. used a rule-based method. This method detects heart conditions
such as myocardial infarction and left ventricular hypertrophy by extracting
parameter values and classifying them with rule sets aligned with the symptoms.
Therefore, it generates a set of rules for every heart disease symptom [14].

Dumont et al. used an evolutionary algorithm to tune the parameters of a

wavelet-based algorithm proposed in [65]. The objective of their tuning is to

adjust the parameters of the detection algorithm so that the detected peaks and

wave boundaries are nearly the same as those manually annotated by cardiolo-

gists [67].

HMM is in the family of Bayesian classifiers, with the assumption of dependency
between the features [11]. Hughes et al. segmented ECG cardiac waves by utilizing
HMMs. In their research, three approaches were adopted for ECG segmentation.
First, an HMM was trained on the raw ECG signal; this experiment had the lowest
performance of the three approaches. Second, the results were improved by
training the HMM on wavelet-encoded ECG. In the third approach, the results were
improved further by utilizing a special type of HMM called a Hidden Semi-Markov
Model (HSMM). Unlike an HMM, an HSMM governs the duration of staying in one
hidden state [11]. From this work, two conclusions can be drawn. First, the
automated ECG delineation field is still struggling to provide an E2E ECG
delineation method that uses the raw ECG signal: the first experiment, utilizing
raw ECG signals, had the lowest performance compared to the wavelet-encoded ECG
experiments. Second, the HSMM outperforms the HMM. This observation shows that
an HMM requires a policy for staying in a particular hidden state for the
duration of a cardiac wave. Additionally, HMMs are limited in learning
combinations of various patterns, as typically shown in ECG signals [30]. On
the other hand, a deep LSTM RNN is a sequence learner capable of capturing
long-term and short-term temporal patterns and their combinations. Therefore,
there is room to improve the capture of cardiac waves' temporal relations.

2.2.3 DL-based Methods for ECG Analysis

With the recent emergence of DL methods in various aspects of signal processing
and data analysis, DL methods have been applied to ECG pattern recognition as
well. In the field of ECG signal analysis, Kiranyaz et al. developed the first
DL-based method for analyzing ECG signals [59]. The DL method was used to
classify ECG signals into normal and abnormal ECG. Their work used a 1-D ConvNet
to detect ventricular ectopic beats and supraventricular ectopic beats. Rajpurkar
et al. used residual connections in a deep 34-layer ConvNet to identify cardiac
condition symptoms, and they exceeded the average cardiologist's performance in
both recall and precision [60]. Acharya et al. used DL methods to detect
arrhythmias [73]. In these works, the focus was on extracting hierarchical
features of ECG signals in order to classify heart disease symptoms.

Pourbabaee et al. used a ConvNet architecture to investigate the feature
extraction capability of these models for Paroxysmal Atrial Fibrillation (PAF),
which is a life-threatening cardiac arrhythmia. Their work shows that ConvNets
are good feature extractors. However, their proposed E2E ConvNet model
underperforms another proposed model that uses the ConvNet as its feature
extractor and K-Nearest Neighbors (KNN) as its classifier [74].

Rubin et al. used a combination of time-frequency transformation and ConvNet
models to identify Atrial Fibrillation (AF), which is a heart arrhythmia. In
their work, the ECG signal is converted to its time-frequency domain
representation, which is used as the input to a ConvNet model that classifies
it into AF, normal ECG and other categories [75].

Warrick et al. used a combination of convolutional layers and LSTM layers

to identify various types of heart arrhythmia patterns. In their work, one convo-

lutional layer and three LSTM layers have been cascaded on top of each other to

classify segments of ECG into one of the heart arrhythmia patterns [76].

Mostayed et al. used a specific type of LSTM model to classify 12-lead ECG
signals into nine classes of heart conditions. In their work, they used two
hidden RNN layers with a fully-connected layer cascaded on top of the second
hidden RNN layer to address the classification task [77].

In summary, most of the previous DL studies focus on classifying ECG signals
into one of the categories of heart disease symptoms. However, there is another
aspect of ECG signal analysis, which is the synthesis of the cardiac wave
structure and the ECG signal. While this topic is of high importance, only a
few recent DL-based studies in the area of identifying cardiac waves have been
published [78–80]. An expert clinician is trained to synthesize ECG cardiac
waves under any heart condition symptom, such as identifying the T-wave under
T-wave inversion, identifying the ST-segment while it is elevated, or identifying
the QRS-complex while it is elongated. Based on the structure of the cardiac
waves, a cardiologist can relate the reading to a symptom of a heart condition [7].

2.3 Data

PhysioNet was established in 1999 as a research resource for complex physiologic
signals, a cooperative project initiated by a group of computer scientists,
physicists, mathematicians, biomedical researchers, clinicians and educators at
Boston's Beth Israel Deaconess Medical Center (BIDMC)/Harvard Medical School,
Boston University and McGill University, all working together with the MIT group.
In the 1970s, the PhysioNet team realized the usefulness of establishing public
databases of well-characterized ECG recordings as a basis for evaluation,
iterative improvement and objective comparison of algorithms for automated
arrhythmia analysis [22]. After five years of research, the team published the
MIT-BIH Arrhythmia Database in 1980, which soon became the standard reference
collection of its type, used by over 500 academic, hospital and industry
researchers and developers worldwide during the 1980s and 1990s. PhysioNet has
become one of the main resources for research and education, offering the public
free access to large collections of physiological data and related open-source
software. PhysioNet also hosts an annual series of challenges, focusing research
on unsolved problems in cardiovascular research [22]. PhysioNet is managed by
members of the MIT Laboratory for Computational Physiology [22].

Additionally, PhysioNet offers a large collection of recorded physiologic signals
sampled at 250 Hz, including the QT Database (QTDB) for research in the detection
and segmentation of ECG signals. This database includes 105 fifteen-minute
two-channel ECG recordings, covering a broad variety of normal heartbeat cycles,
ST-segment morphologies, QRS-complex morphologies, T-wave morphologies, etc. An
automated system annotated the cardiac waveforms, and experts made corrections
when the automated system failed to perform the annotation [22]. This dataset
provides researchers with one of the best resources for a variety of research in
ECG signal processing. The QTDB ECG records are chosen from multiple public
databases, including the MIT-BIH Arrhythmia (MIT-BIH) Database, the European
Society of Cardiology (ESC) ST-T Database and additional ECG recordings collected
at Boston's BIDMC. The additional records were added to the QTDB to represent
extremes of cardiac physiology and SCD. This collection of ECG recordings is an
excellent source of varied and well-characterized data, to which reference
annotations marking the locations of waveform boundaries have been added [22].
Table 2.1 shows the included heart-conditions from the various database sources.
This dissertation has adopted QTDB mainly for three reasons. First, it has a
large amount of ECG data. Second, it includes a variety of normal cardiac
waveforms, abnormal cardiac waveforms and various types of heart-conditions.
Third, it is widely used, so it is easier to compare this work with other research.

TABLE 2.1: Heart-conditions included in QTDB from various databases.

Database        Heart-condition
MIT-BIH         Arrhythmia
                ST-segment abnormalities
                Supraventricular ectopic beats
                Long-term QT-segment
                Normal sinus rhythm
ESC             ST-segment abnormalities
                T-wave abnormalities
BIDMC records   SCD

2.4 Evaluation Metrics

In order to evaluate the performance of the algorithms and compare them with
those in the literature, a few metrics are defined as follows. The output of a
categorization problem can have four distinct states. Considering a hypothetical
categorization algorithm that classifies instances into two categories, A and B,
each state of the categorization problem is defined below:

• True Positive (TP): These outputs are the correct positive predictions, i.e.,
if the actual class of an instance is A, the algorithm also classifies the
instance into A; e.g., the correct category is P-wave and the algorithm's
predicted category for that instance is P-wave.

• True Negative (TN): These outputs are the correct negative predictions, i.e.,
if the actual class of an instance is not A, then the algorithm does not classify
it into A; e.g., the actual class is not P-wave and the algorithm's prediction
does not place the instance in the P-wave category.

• False Positive (FP): These outputs are the incorrect positive predictions,
i.e., if an instance is not in A, the algorithm nevertheless predicts A for that
instance; e.g., the actual class of an instance is not P-wave but the algorithm
predicts the category of the instance as P-wave.

• False Negative (FN): These outputs are the incorrect negative predictions,
i.e., the instance belongs to A but the algorithm does not categorize it into A;
e.g., the instance belongs to the P-wave category but the categorization does
not predict the P-wave category for the instance.

Fig. 2.8 shows a graphical representation of the four categorization states.
The red and green circles are the instances in the dataset. On the left side,
the green circles represent hypothetical P-wave instances; on the right side,
the red circles represent hypothetical non-P-wave instances. The data instances
within the inner circle are those that the categorization classifies as P-wave,
while the data instances outside the inner circle are those that it does not
classify as P-wave. Based on this explanation, TP, TN, FP and FN are depicted
in the figure.

Fig. 2.8: Categorization states.

Table 2.2 shows the categorization states. A good categorization achieves a high
number of TP and TN; on the other hand, it should avoid FP and FN.

TABLE 2.2: Categorization output states.

                                  Predicted Class
                           Class = Yes       Class = No
Actual Class  Class = Yes  True Positive     False Negative
              Class = No   False Positive    True Negative
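Given per-instance actual and predicted labels, the four counts can be tallied directly; `'P'` stands for the positive class (e.g. P-wave) in this hypothetical example:

```python
def confusion_counts(actual, predicted, positive='P'):
    # Tally the four categorization states for one positive class.
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return tp, tn, fp, fn

actual    = ['P', 'P', 'N', 'N', 'P', 'N']
predicted = ['P', 'N', 'N', 'P', 'P', 'N']
counts = confusion_counts(actual, predicted)   # (2, 2, 1, 1)
```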

Considering TP, TN, FP and FN, measuring the performance of ML algorithms relies
on four metrics that combine the categorization states: accuracy, precision,
recall and f1-score. These metrics are used to measure the performance of the
segmentation algorithm as well. The following defines the metrics and discusses
their importance.

• Accuracy: Accuracy is the most commonly used metric for measuring the
performance of ML and classification methods. This metric measures the ratio of
correct observations over the total observations. The accuracy formula is
expressed as

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}. \tag{2.24}$$

Accuracy is a good metric when the algorithm produces almost symmetric FP and
FN; otherwise, this measure is not able to provide a meaningful performance
indication. This is the reason other performance metrics should be considered
alongside the accuracy metric.

• Precision: Precision is the ratio of correct positive observations to the
total classified positive observations. Precision is expressed as

$$\mathrm{Precision} = \frac{TP}{TP + FP}. \tag{2.25}$$

High precision translates to few FP predictions, e.g., of all the predicted
P-waves, how many are actually in the P-wave category.

• Recall (Sensitivity): Recall is the ratio of the correct positive observations
to all observations in the actual class. Recall is expressed as

$$\mathrm{Recall} = \frac{TP}{TP + FN}. \tag{2.26}$$

High recall translates to few FN predictions, e.g., of all the instances in the
P-wave category, how many did the algorithm find?

• F1-score: The f1-score is the harmonic mean of precision and recall; therefore,
it takes false positives and false negatives into account simultaneously. The
f1-score metric is expressed as

$$\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}. \tag{2.27}$$
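Equations (2.24)-(2.27) translate directly into code; this sketch computes all four metrics from the categorization counts (the counts used below are made-up example values):

```python
def metrics(tp, tn, fp, fn):
    accuracy  = (tp + tn) / (tp + fp + fn + tn)          # Eq. (2.24)
    precision = tp / (tp + fp)                           # Eq. (2.25)
    recall    = tp / (tp + fn)                           # Eq. (2.26)
    f1 = 2 * recall * precision / (recall + precision)   # Eq. (2.27)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = metrics(tp=8, tn=85, fp=2, fn=5)
```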

These four criteria are used in Chapter 4 and Chapter 5. Chapter 3 uses different
metrics because its output is not a categorization; the metrics that measure the
performance of the localization approach will be introduced in Chapter 3.

2.5 Perspective of This Work

This dissertation research utilizes the concepts of each of the previous sections.
Our major objective is to construct an E2E ECG segmentation approach based on DL
methods. To the best of our knowledge, there is little research on DL-based ECG
segmentation in the literature. Our first DL-based research in ECG analysis
localizes the P-wave, QRS-complex and T-wave in a single cardiac complex using
ConvNets [78]. This approach relies on a secondary algorithm to extract cardiac
complexes. Our second DL-based research aims to understand the temporal
attributes of ECG signals using local derivative-based features and LSTM networks
for ECG segmentation [79]; however, this approach relies on local derivative-based
features. Chapter 5 presents a comprehensive hybrid DL method capable of
extracting cardiac wave formations and capturing long-term dependency in addition
to short-term dependency in one DL model for E2E ECG segmentation.

Chapter 3

ECG Cardiac Wave Localization

Using Deep Convolutional Neural

Network

As discussed earlier in Chapter 1, due to the various normal and abnormal spatial
patterns, traditional feature extraction methods such as derivative-based methods,
amplitude-based methods, etc. fail to extract the variety of cardiac wave spatial
formations. In addition, the majority of the traditional methods require in-depth
medical knowledge about cardiac wave spatial formations. For example, Tafreshi
et al. utilized predefined QRS-complex spatial formations to identify
QRS-complexes [5]; thus, only specific predefined spatial patterns could be
detected. In contrast, ConvNets use multilayer feature filters to extract more
complex features from ECG signals. As a result, they can identify a variety of
cardiac wave formations. Furthermore, with labeled data and training, ConvNets
are capable of tuning the feature filter parameters to capture the spatial
patterns that exist in the training set. Therefore, by tuning the feature filter
parameters, they can extract patterns that represent the labeled data well and
capture the cardiac wave patterns.

The objective of this research is to investigate ConvNets' capability of
capturing the hierarchical structure of ECG cardiac waves by localizing cardiac
waves, including the P-wave, QRS-complex and T-wave. We propose a ConvNet model
that takes one cardiac complex as input and predicts the locations of the cardiac
waves.

In order to train the various NN models, the QTDB ECG recordings should be
preprocessed to remove ECG wander drift baselines and to normalize the data. In
addition, since the input to the proposed NN models is one complete cardiac
complex, the ECG recordings should be divided into separate individual cardiac
complexes. Section 3.1 discusses the importance and methods of this preprocessing
and describes the derivative-based approach adopted for cardiac complex extraction
from ECG recordings. Unlike other derivative-based ECG analysis approaches, no
cardiac wave location is assigned; only the slope variation of the signal is
analyzed to extract cardiac complexes. Furthermore, Section 3.2 explains the
cardiac complex dataset distribution for the training, validation and test sets.
The NN architectures and training procedure are presented in Section 3.3. A test
loss function measuring the performance of the models on unseen data is introduced
in Section 3.4. Section 3.5 presents the performance analysis and comparisons
with other research. Finally, Section 3.6 concludes the contribution of the
research. Fig. 3.1 shows the overall process of the proposed cardiac wave
localization approach.

Fig. 3.1: ConvNet cardiac wave localization.

3.1 Data Preparation

Respiration, variation in electrode impedance and body movement introduce ECG
wander drift baseline noise, which compromises feature extraction methods [81].
Therefore, removal of the wander drift baseline is a critical preprocessing step
for general ECG signal analysis. Since the model's weight gradients depend on
the input values, unnormalized input data causes the gradients over some input
samples to become significantly larger than over other samples. This phenomenon
requires more training iterations to converge to the global minimum of the loss
function [82]. Thus, following the wander drift removal, the ECG signals are
normalized, since this makes NN models converge faster. However, normalization
does not change the locations of the cardiac waves.

One of the successful approaches to remove wander drift baseline from the

ECG signal is median filtering approach [26], described as

E(WF) = E( R) − M( M(E( R) )) (3.1)

where E(WF) is wander free ECG, E( R) is the raw ECG from QTDB and M(.) op-

eration is the result of applying a median filter to a signal. The size of the median

filter equals half of the sampling frequency, f req. Since ECGs from QTDB ave

been sampled at 250Hz ( f req = 250Hz), the window size for the median filter is

125 samples. Afterward, the ECG amplitudes are normalized by

E(WF) − Avg(E(WF) )
E(norm) = (3.2)
Std(E(WF) )

where Avg(.) is the average function, Std(.) is the standard deviation function

and E(norm) is the normalized ECG. Therefore, even though the majority of the
normalized ECG samples will be in the range of [−1, 1], some samples will
exceed this range. However, because these samples are rare, they do not affect
the training process. The regions with spikes have been excluded from the
dataset.
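As an illustration, these two preprocessing steps can be sketched in a few lines (a minimal NumPy sketch; the function names and the edge padding at the signal borders are our own assumptions, not details fixed by the text):

```python
import numpy as np

def remove_baseline_wander(ecg, freq=250):
    """Two-pass median filtering (Eq. 3.1): E_WF = E_R - M(M(E_R)).
    The median window is half the sampling frequency (125 samples at 250 Hz)."""
    win = freq // 2

    def median_filter(x, w):
        half = w // 2
        padded = np.pad(x, half, mode="edge")  # assumption: edge padding at borders
        return np.array([np.median(padded[i:i + w]) for i in range(len(x))])

    baseline = median_filter(median_filter(ecg, win), win)
    return ecg - baseline

def normalize(ecg_wf):
    """Amplitude normalization (Eq. 3.2): subtract the mean, divide by the std."""
    return (ecg_wf - ecg_wf.mean()) / ecg_wf.std()
```

Note that Eq. 3.1 estimates the baseline with the double median filter and subtracts it, so the cardiac waves themselves are left in place.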

After wander drift baseline removal and normalization, the ECG signals are

ready for cardiac complex extraction. Following the methods introduced in [13],

[61] and [62], a derivative-based method has been developed to extract cardiac

complexes. In contrast to other derivative-based methods, this research detects

complete cardiac complexes instead of cardiac waves, i.e., while other research

tries to find characteristics of slope variations in P-wave, QRS-complex and T-

wave to detect the cardiac waves, this research utilizes slope variations to iden-

tify the regions with electrical impulse activity. Thus, this method finds cardiac

complexes by analyzing the high and low slope variations of the ECG signal as

shown in Fig. 3.2. The extracted cardiac complexes and their associated cardiac

wave location annotations from QTDB annotations will be the input-label pair

for training, validation and test sets.

Fig. 3.2: (a) normalized ECG segment, (b) local area under the mag-
nified second-order derivative signal.

In cardiac complex extraction, locations with high-variation in the slope of

the signal (change in the direction of the signal) imply a heart-muscle activity

such as a QRS-complex [62]. The lowest slope-variation activity between two
high slope-variation locations represents the end of one cardiac complex and
the beginning of another [26, 62]. ECG slope-variation activity can be analyzed
by utilizing a representation of the local area under the normalized ECG
second-order derivative curve, E(γ).

This algorithm relies on finding at least two slope variation maxima asso-

ciated with cardiac complexes. Therefore, the input to the algorithm should

include more than one cardiac complex. Since the typical duration of a cardiac
complex is about 0.83 seconds, eight seconds of ECG recording (i.e., 2,000
samples sampled at 250 Hz) is given as the input to the algorithm. As a result,
there will be more than one cardiac complex (approximately nine to ten) in an
eight-second ECG segment.

The followings are performed on the normalized ECG signal, E(norm) , to ob-

tain E(γ) .

E(SM) = |E(norm)| ⊗ ζ1    (3.3)

where ⊗ and |.| are the convolution operation and the absolute value operation
respectively. ζ1 = [1, 1, 1] is a smoothing kernel [61]; thus, E(SM) is the
smoothed absolute normalized ECG. The absolute amplitudes of the ECG signal are
used since they capture crossings of the baseline in the ECG signal [61]. The

first-order derivative of the E(SM) is expressed as [61]

E(FD) = E(SM) ⊗ ζ2    (3.4)

where ζ2 = [−1, −1, −1, 0, 1, 1, 1] is a derivative kernel. By applying the
derivative kernel again, the second-order derivative of the absolute ECG signal
can be

expressed as

E(SD) = E(FD) ⊗ ζ2.    (3.5)

Thus, the local area under the magnified second-order derivative curve is
obtained by [61]

E(γ)^i = ∑_{j=i−4}^{i+4} (E(SD)^j)^6    (3.6)

where E(SD)^j is the jth sample of E(SD) and E(γ)^i is the ith sample of E(γ),
the local area under the magnified (power of six) second-order derivative of
the E(norm) curve. The second-order derivative is magnified to accentuate the
higher absolute amplitudes [26], i.e., E(γ) sums the values of the magnified
second-order derivative over windows of size nine (four samples before, the
center sample and four samples after), which spans 36 milliseconds of ECG
signal [61]. The local maxima in E(γ) indicate high slope variations in the
ECG signal.

A set called Φ contains all the high slope-variation locations which indi-

cate the location of cardiac complexes activities. Initially, Φ has only one el-

ement and that element is the location of the maximum value in E(γ) , thus,

Φ = { Idx ( Max (E(γ) ))} where Max (.) and Idx (.) are the maximum function

and indexing function respectively. After the initialization, other local
maximum sample locations can be added to this set in descending order of local
maximum values. Since the Φ set is initialized with the location of the maximum
value of E(γ) and the rest of the local maximum locations are added in
descending order of their values, a potential local maximum that has not yet
been considered for the set has a lower value than the ones already in the Φ
set. Inspired by the R-peak detection in [61], a local maximum above a threshold
of 0.4 × Avg(E(γ)^Φ), where E(γ)^Φ denotes the E(γ) values at the locations in
Φ, is considered to be a potential cardiac complex. This threshold is low enough
not to miss any potential cardiac complex activity. Moreover, any of the cardiac

waves in a cardiac complex can have high variations in slope; thus, a cardiac
complex can produce more than one local maximum close to another. As a result,
to avoid mistaking nearby local maxima for two different cardiac complexes, any
potential candidate for the Φ set should have a margin of 300 milliseconds,
which is approximately the length of a QT-interval, from the other Φ elements.
This margin is chosen because there should be at least the length of a
QT-interval between two cardiac complexes. In summary, if a sample is

a local maximum in E(γ) and it has this margin from the Φ elements, it will be added to

the Φ set. Each sample in Φ represents the location of a high slope variation in a

local neighborhood of the ECG which indicates that a cardiac complex activity is

within that region. Fig. 3.2 (a) shows a segment of E(norm) and Fig. 3.2 (b) shows

E(γ) applied to the segment of E(norm) . The Φ set is marked with red-filled cir-

cles in Fig. 3.2 and red-filled stars are the beginning and the end of the cardiac

complex segment. Thus, it is shown that finding high-slope variation translates

to finding the cardiac complexes and finding low-slope variation translates to

finding the beginning and end of cardiac complexes.

The following step after identifying the cardiac complex activity is finding

the starting and ending points of cardiac complexes. The lowest value of E(γ)

between every two consecutive Φ elements has the lowest slope variation
between these two maxima and is considered the end of one cardiac complex,

marked with red-filled stars in Fig. 3.2 (b). A cardiac complex occurs between

two low slope-variation points with a high slope-variation point in Φ elements

in between. Further, the surroundings of an element in Φ still have high-slope

variation and these points are part of a heart-muscle activity. A regular heart

muscle resting period is 200 milliseconds and to make sure that the beginning

and the ending of a cardiac complex are chosen at the end of the heart-muscle

resting period, a small margin of 20 milliseconds, which is less than the regular

resting period, between the low slope-variation points and high-slope variation

samples is introduced. By utilizing this approach, cardiac complexes can be

identified.
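The greedy construction of the Φ set described above can be sketched as follows (a NumPy sketch; the function name, the strict/non-strict inequalities in the local-maximum test, and stopping once a candidate falls below the threshold are our own assumptions):

```python
import numpy as np

def build_phi(e_gamma, freq=250):
    """Select high slope-variation locations: local maxima of E_gamma, taken in
    descending order of value, each kept only if it lies more than 300 ms (one
    QT-interval) from every element already in Phi and is above the threshold
    0.4 x Avg of the E_gamma values at the current Phi elements."""
    margin = int(0.3 * freq)                       # 300 ms = 75 samples at 250 Hz
    inner = e_gamma[1:-1]
    idx = np.where((inner > e_gamma[:-2]) & (inner >= e_gamma[2:]))[0] + 1
    order = idx[np.argsort(e_gamma[idx])[::-1]]    # descending local-maximum value
    phi = []
    for i in order:
        if phi and e_gamma[i] < 0.4 * np.mean(e_gamma[phi]):
            break                                  # below threshold; later candidates are smaller still
        if all(abs(int(i) - p) > margin for p in phi):
            phi.append(int(i))
    return sorted(phi)
```

The complex boundaries would then be taken at the minimum of E_gamma between consecutive Φ elements, with the 20-millisecond margin described above.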

3.2 Data Sets

The cardiac complex extraction described above is capable of extracting cardiac

complexes from QTDB ECG recordings. Cardiac complexes are extracted from

two leads of the fifteen-minute ECGs, except for two instances where no sampling
data were present (the signals were blank) and for regions with abnormal spikes.

Since QTDB consists of different datasets with different annotations, the car-

diac wave location annotations in some cases include the beginning and end of

the cardiac waves instead of the regular one point per cardiac wave. Therefore,
in all cases, the mid-point of the wave is taken to represent the location of
the cardiac wave interval, or the maximum absolute amplitude point if the wave
duration was not available.

Since the extracted cardiac complexes have different lengths, and fixed-length

vectors are needed to feed to an NN model, the extracted cardiac complexes

from E(norm) are padded to the length of 300 samples which represents 1.2 sec-

onds in time. Cardiac complexes shorter than 300 samples are padded with
repetitions of the signal's last value. If the network finds

cardiac waves in particular regions of the cardiac complexes, it will develop a

preconception to always output the particular regions as the location of cardiac

waves, i.e., it becomes overfitted for those regions. To prevent over-fitting, the

padding locations are randomly selected at the beginning, ending or both sides

of the cardiac complexes. Thus, the network’s output will not have any precon-

ception for any particular region.
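The random-side padding can be sketched as follows (NumPy sketch; returning the left offset so that the wave-location labels can be shifted accordingly is our own addition, not stated in the text):

```python
import numpy as np

def pad_complex(sig, target_len=300, rng=None):
    """Pad a cardiac complex to a fixed length with repetitions of the signal's
    last value; the split between left and right padding is chosen at random so
    the network develops no positional preconception. Returns the padded signal
    and the left offset (needed to shift the cardiac wave location labels)."""
    rng = rng if rng is not None else np.random.default_rng()
    pad = target_len - len(sig)
    if pad <= 0:
        return np.asarray(sig[:target_len], dtype=float), 0
    left = int(rng.integers(0, pad + 1))            # 0 .. pad samples on the left
    fill = np.full(pad, sig[-1], dtype=float)
    return np.concatenate([fill[:left], sig, fill[left:]]), left
```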

Upon the completion of preprocessing, training, validation and test sets are

ready to be fed to the neural networks. Table 3.1 shows the distribution of ex-

tracted cardiac complexes in the training, validation and test sets. The cardiac

complexes in these three sets are mutually exclusive, indicating there is no iden-

tical cardiac complex from one recording to another. Three locations, P-wave,

QRS-complex and T-wave peaks are the network’s output target.

TABLE 3.1: Dataset distribution.

Set Number of Cardiac Complexes Percentage of the Total


Training Set 100,380 60 %
Validation Set 16,730 10 %
Test Set 50,189 30 %
Total 167,301 100 %

3.3 Neural Network Models

Three different DL-based NN models including an MLP network and two Con-

vNets have been developed for cardiac wave localization. Their performances

have been investigated and compared to similar research. In the proposed NNs,
the networks' weights are initialized with samples drawn from a Gaussian
distribution with mean zero and a standard deviation of 0.1, and the ReLU
function has been adopted as the neuron activation function. The ReLU function
is described as

a = Max (0, z) (3.7)

where z is the input to the ReLU and a is the output. ReLU function outputs

zero when the input is less than zero and it outputs the value of the input when

the input is not negative, thus, a ∈ [0, +∞).

In this research, Root-Mean-Square Error (RMSE), R, has been used as the

loss function of the proposed NN models and is defined as

R = √( (1/K) ∑_{k=1}^{K} (y_k − o_k)² )    (3.8)

where K is the number of outputs, ok is the predicted cardiac wave location

of the kth neuron and yk is the kth element in the label vector Y. For example,

Y = (70, 120, 160) means the P-wave, QRS-complex and T-wave are at loca-

tions 70, 120 and 160 of the input respectively. Thus, RMSE measures the dis-

tance between the network predictions and labels in terms of sample location

differences.
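For instance, with the label Y = (70, 120, 160) above, a prediction of O = (73, 121, 158) gives an RMSE of about 2.2 samples (a minimal NumPy sketch; the function name is our own):

```python
import numpy as np

def rmse(y, o):
    """RMSE between labelled and predicted wave locations (Eq. 3.8), in samples."""
    y, o = np.asarray(y, dtype=float), np.asarray(o, dtype=float)
    return float(np.sqrt(np.mean((y - o) ** 2)))
```

Here rmse([70, 120, 160], [73, 121, 158]) = √((9 + 1 + 4)/3) ≈ 2.16 samples, i.e., about 8.6 milliseconds at 250 Hz.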

In our research, the Adam optimization algorithm has been adopted for training
[83], since it offers advantages of both the Adaptive Gradient algorithm (Ada-
Grad) and Root-Mean-Square Propagation (RMSProp) [83]. Adam keeps a learn-

ing rate for each adaptive parameter in the network separately, thus it combines

the average of the first moments of the gradients and the average second-order

moments of the gradients to achieve a better learning rate for each parameter

[83].

In our experiments, three different learning rates, 10−3, 10−4 and 10−5, are
annealed to 10−7 over 500 epochs using step decay annealing, and the best
result through the 500 epochs is reported. These three learning rates enable a
grid search for the best learning rate. A batch size of 5,000 instances is
chosen based on monitoring of the convergence progression.

The first proposed NN model is an MLP model. This MLP model establishes

a benchmark of the convergence for our networks and examines if ConvNets

can perform better than the fully-connected NNs and if the hierarchical feature

extractions do possess advantages over the MLP model. The MLP model archi-

tecture is as follows. The first layer is the input layer with 300 input neurons for

an ECG segment with 300 samples. Inspired by the fully-connected layers in [42]

and monitoring the model’s convergence, the next layer is a fully-connected hid-

den layer with 150 neurons utilizing a ReLU activation function. Fig. 3.3 shows

the proposed MLP model architecture and Table 3.2 describes the architecture of
the baseline network.

Fig. 3.3: The proposed baseline MLP model architecture.


TABLE 3.2: Baseline MLP model description.

Layer Description Size


0 Input layer 300
1 Dense layer 150
2 Output dense layer 3

Next, two ConvNets are developed, with alternating convolutional layers

and max-pooling layers, followed by fully-connected hidden layers. One of the

ConvNet architectures has a dropout layer [84]. In order to avoid over-fitting,

during training, a dropout layer randomly selects some of the weights and pre-

vents them from getting updated by the learning algorithm. This causes the

model to update the weights independent of the effect of the other weights, i.e.,

it breaks the symmetry in learning algorithms [84]. The hyperparameters chosen

for the ConvNet models are inspired by [23] except for the learning rates and the

size of the fully-connected layers. The learning rate has been selected by grid
search and the size of the fully-connected layer is inspired by the

convergence of the baseline MLP model.

Table 3.3 describes the proposed ConvNet with no dropout layer.

TABLE 3.3: The proposed ConvNet description.

Layer Description Size Receptive Field Size Stride

0 Input (1×300) - -
1 First convolutional layer (16×100) (1×5) (1×3)
2 First max-pooling layer (16×50) (1×2) (1×2)
3 Second convolutional layer (32×18) (1×5) (1×3)
4 Second max-pooling layer (32×9) (1×2) (1×2)
none Reshape (288) - -
5 First dense layer (150) - -
6 Output dense layer (3) - -

This model is six layers deep with alternating convolutional and max-pooling layers. The

input to the network is a 1-D cardiac complex signal with 300 samples. All the

convolutional layers have (1 × 5) feature filters with stride of (1 × 3) and all the

max-pooling layers have non-overlapping (1 × 2) receptive fields with stride

of (1 × 2). The first and second convolutional layers include 16 and 32 feature

filters respectively. This is followed by two fully-connected layers. The first

fully-connected layer has 150 neurons. The second fully-connected layer, the

output layer of size three indicates three cardiac wave positions within a cardiac

complex. Fig. 3.4 shows the ConvNet model architecture.

Fig. 3.4: The proposed ConvNet architecture.

The output of neuron j in feature filter v at convolution layer l is expressed
as

a^(l)_{j,v,c} = g( ∑_{i=0}^{w} a^(l−1)_{c+i−⌊w/2⌋} × θ^(l)_{i,j,v} + b^(l)_{j,v} )    (3.9)

where c is the center position of the receptive field, w is the length of the
receptive field, θ^(l)_{i,j,v} is the weighted connection between receptive-field
location i and neuron j for feature filter v, a^(l−1)_{c+i−⌊w/2⌋} is the input from
layer (l − 1) at location c + i − ⌊w/2⌋ and b^(l)_{j,v} is the bias for neuron j.
Here, w = 5, c ∈ {1, ..., 300} for the first layer with stride (1 × 3) and
v ∈ {1, 2, ..., 16} since there are 16 feature filters.

The convolutional layer is then followed by a max-pooling layer. A max-pooling
layer with a size of (1 × 2) takes a (1 × 2) region from the convolutional
layer's output and selects the maximum value. The max-pooling function is
described as

a^(l)_c = max( a^(l−1)_{c−⌊w/2⌋ : c+⌊w/2⌋} )    (3.10)

where : is the slicing operator (which selects a range from a vector) and
a^(l)_c is the result of the max-pooling operation on the receptive field
centered at position c in layer l − 1.
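Eqs. 3.9 and 3.10 can be sketched directly in NumPy (the function names, the edge padding, and the 'same'-style center grid are our own assumptions about details the text does not fix):

```python
import numpy as np

def conv1d_layer(a_prev, theta, b, stride=3):
    """One 1-D convolutional layer (Eq. 3.9) with ReLU (Eq. 3.7) as g.
    a_prev: length-L input; theta: (V, w) array of V feature filters of
    receptive-field length w; b: (V,) biases."""
    V, w = theta.shape
    half = w // 2
    padded = np.pad(a_prev, half, mode="edge")      # assumption: edge padding
    centers = np.arange(0, len(a_prev), stride)
    out = np.empty((V, len(centers)))
    for v in range(V):
        for j, c in enumerate(centers):
            window = padded[c:c + w]                # a_prev[c-w//2 .. c+w//2]
            out[v, j] = max(0.0, float(window @ theta[v] + b[v]))
    return out

def max_pool(a_prev, size=2):
    """Non-overlapping (1 x size) max-pooling (Eq. 3.10) over the last axis."""
    L = (a_prev.shape[-1] // size) * size
    return a_prev[..., :L].reshape(a_prev.shape[:-1] + (-1, size)).max(axis=-1)
```

On a 300-sample input with 16 filters and stride (1×3) this yields a (16×100) feature map, and pooling halves it to (16×50), matching the first two rows of Table 3.3.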

The same operation is performed for the second convolutional layer and the

second max-pooling layer. As a result of applying the convolutional filters and
max-pooling, a feature map of size (32 × 9) is generated. Then, a reshape is
applied to this two-dimensional feature map to convert it to one dimension,
turning it into a vector of 288 feature points. The 288-point feature vector is
followed by a fully-connected layer

with 150 neurons. The next layer is the output layer, which is the last fully-

connected layer.

The next ConvNet architecture for cardiac complexes is the same as the pre-

vious one except that a dropout layer with dropout rate hyperparameter of 0.5

is added after each max-pooling, as illustrated in Table 3.4. The dropout rate of

0.5 is a commonly used value for the dropout rate [84].

TABLE 3.4: The proposed ConvNet with dropout layers description.

Layer Description Size Receptive Field Size Stride


0 Input (1×300) - -
1 First convolutional layer (16×100) (1×5) (1×3)
2 First max-pooling layer (16×50) (1×2) (1×2)
3 First dropout layer (16×50) - -
4 Second convolutional layer (32×18) (1×5) (1×3)
5 Second max-pooling layer (32×9) (1×2) (1×2)
6 Second dropout layer (32×9) - -
none Reshape (288) - -
7 First dense layer (150) - -
8 Output dense layer (3) - -

Fig. 3.5 shows the proposed ConvNet with dropout layers model architec-

ture.

3.4 Evaluation and Test

In this chapter, the focus is not to find cardiac fiducial points such as the
P-peak, QRS-on, QRS-off and T-peak points, but to identify cardiac waves.
Therefore, the annotations are not fiducial cardiac points; the middle point of
each cardiac wave is annotated in the data preparation step. However, finding
fiducial points can be one of the applications of identifying the cardiac waves.

Fig. 3.5: The proposed ConvNet model with dropout layers architecture.

To evaluate whether the algorithm identified a cardiac wave, β^j_k is expressed
as

β^j_k = { 1  if |o^j_k − y^j_k| ≤ 120 milliseconds
        { 0  otherwise,
k ∈ {1, 2, 3} and j ∈ {1, ..., N}    (3.11)

where y^j_k is the label for the kth cardiac wave of the jth test instance,
o^j_k is the network output for the kth cardiac wave of the jth test instance
and N is the number of instances in the test set. If β^j_k is equal to one, the
algorithm identified the cardiac wave, and if it is zero, the algorithm missed
the cardiac wave. In other words, the distance between the model output and the
annotation should be less than 120 milliseconds for the cardiac wave to be
considered identified. 120 milliseconds is a commonly used distance for cardiac

wave identification [29]. In order to measure the cardiac wave identification

performance, two criteria, Sensitivity (SE) of finding the cardiac waves and the

prediction error (MI), are defined as

SE_k = ( ∑_{j=1}^{N} β^j_k / N ) × 100    (3.12)

and

MIk = 100 − SEk . (3.13)

An algorithm identifies a cardiac wave if any sample within the cardiac wave

duration is identified as the cardiac wave. As a result of identifying a sample

within the cardiac wave duration, the test loss function should be able to tolerate

a degree of displacement from the label.
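Eqs. 3.11 through 3.13 amount to a thresholded comparison over the test set, which can be sketched as follows (NumPy sketch under our own naming):

```python
import numpy as np

def sensitivity(o, y, freq=250, tol_ms=120):
    """Per-wave sensitivity SE_k and miss rate MI_k (Eqs. 3.11-3.13).
    o, y: (N, 3) arrays of predicted and labelled wave locations in samples."""
    tol = tol_ms * freq // 1000                          # 120 ms -> 30 samples at 250 Hz
    beta = np.abs(np.asarray(o) - np.asarray(y)) <= tol  # Eq. 3.11
    se = beta.mean(axis=0) * 100.0                       # Eq. 3.12
    return se, 100.0 - se                                # Eq. 3.13
```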

Another aspect of the performance is the prediction error relative to the la-

bel. The RMSE of the predicted values by the network is measured by Eq. 3.8.

Furthermore, for cardiac wave intervals, another interpretation of RMSE is
required. A new output prediction, o′_k, introduces a displacement tolerance
based on a constant e and is expressed as

d_k = o_k − y_k,  k ∈ {1, 2, 3},

d′_k = { e    if d_k ≥ e
       { −e   if d_k ≤ −e
       { d_k  otherwise,

o′_k = o_k − d′_k    (3.14)

where o_k is the NN prediction for the kth cardiac wave location, y_k is the
annotated mark in the QTDB and o′_k is the network's new predicted position. e
is a constant indicating the vicinity that the loss function can tolerate for
every wave. The vicinity tolerance range, e, has been tried for 0, 5 and 10
samples, which correspond to 0, 20 and 40 milliseconds respectively at the
250 Hz sampling rate. These values measure the RMSE between the network
prediction O and its label Y with a tolerance of e. For example, e = 5 (20
milliseconds) measures the RMSE between the network prediction O and its label
Y with 20 milliseconds of displacement tolerance.

The interpretation of Eq. 3.14 is that if the predicted value, o_k, is within e
of the label value, y_k, the new prediction is assigned the label value, and
consequently the error is zero. If the predicted value, o_k, is farther than e
from the target, y_k, the output prediction is brought closer to the target by
the amount e.
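Under this reading, Eq. 3.14 is simply an error-clipping step (NumPy sketch; the `np.clip` formulation and function name are our own):

```python
import numpy as np

def tolerant_prediction(o, y, eps):
    """Displacement-tolerant prediction o' of Eq. 3.14: the error d = o - y is
    clipped to [-eps, eps] and subtracted, so a prediction within eps of its
    label is moved onto the label (zero error) while a prediction farther away
    is moved eps samples closer to it."""
    o = np.asarray(o, dtype=float)
    d = o - np.asarray(y, dtype=float)
    return o - np.clip(d, -eps, eps)
```

For example, tolerant_prediction([73, 140], [70, 120], eps=5) moves the first prediction onto its label (zero error) and pulls the second one 5 samples toward its label.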

3.5 Results

A total of 167, 301 complexes are extracted by our cardiac wave extraction algo-

rithm from QTDB and they are divided into three different sets with no sample

repetition. While 60% of them are used for training, 10% and 30% are used for

validation and test respectively. Table 3.5 shows the results of the proposed NN

models and their related learning rate.

TABLE 3.5: Results for every architecture and their related learning rates.

Architecture               Learning Rate   Training Set Cost   Validation Set Cost   Test Set Cost

ConvNet with dropout       10−3            9.06                9.19                  8.92
                           10−4            10.26               10.35                 10.15
                           10−5            15.89               16.09                 15.77
ConvNet without dropout    10−3            5.76                7.04                  6.86
                           10−4            6.73                6.85                  7.00
                           10−5            15.89               16.09                 15.77
Fully-connected MLP        10−3            12.16               12.92                 12.50
                           10−4            17.01               17.64                 17.13
                           10−5            25.46               25.62                 25.34

The best result for the baseline MLP model is achieved by the learning rate

10−3 with the error rate of 12.50 samples (i.e., 50 milliseconds) on the test set.

This error rate indicates that on average the predicted output for every cardiac

wave is 50 milliseconds off from the label.

The best result for the ConvNet model using dropout layers is achieved by

the learning rate 10−3 with the error rate of 8.92 samples (i.e., 35 milliseconds)

on the test set. This error rate indicates the predicted output for every cardiac

wave is 35 milliseconds off from the label.

The ConvNet without dropout layers with the learning rate of 10−3 achieves
the best result in our experiment. As shown in Table 3.5, the values of

RMSE for training, validation and testing are 5.76, 7.04 and 6.86, respectively.

Test set error rate 6.86 (27 milliseconds) means on average the predicted output

for every cardiac wave is 27 milliseconds off from the label. We named this NN

model ECGNet. Fig. 3.6 shows the result of ECGNet cardiac wave identification

on a test set instance.

Fig. 3.6: ECGNet model result on a test set instance.

ECGNet RMSE curve through 500 epochs for training and validation sets are

shown in Fig. 3.7.

Fig. 3.7: Training and validation error curves for the best result,
ConvNet without dropout layer and learning rate of 10−3 .

Measuring the RMSE in terms of Accuracy (ACC) percentage points is possi-

ble too. Considering the input has 300 samples, if locations of the cardiac waves

were predicted randomly, then the average RMSE would be 150 samples. There-

fore, the accuracy percentage can be obtained by

ACC = ( (150 − Avg(R_{1:N})) / 150 ) × 100    (3.15)

where Avg(R_{1:N}) is the average RMSE of the cardiac wave predictions over all
test set instances.

Table 3.6 shows the result of the ECGNet with three different vicinity toler-

ance ranges, (0, 5 and 10), using the test loss function introduced in Eq. 3.14.

When e equals zero, this computation has no effect and the output is the
network's original prediction; the ECGNet ACC is 95.43%. With a vicinity
tolerance of 10 samples, the model's accuracy is 99.00%.

TABLE 3.6: ECGNet with vicinity tolerance.

Test set results for ECGNet


e    RMSE (samples)   ACC (%)
0    6.86             95.43
5    1.92             98.72
10   1.51             99.00

Table 3.7 shows the performance of the ECGNet for both individual and all

cardiac waves. As shown in Table 3.7, ECGNet identifies 99.87% of the
QRS-complexes and 99.19% of all cardiac waves in the test set. The second row
illustrates the percentage of the undetected waves. The

highest missed-detection belongs to the T-wave and it shows that it is more dif-

ficult for the ECGNet to find T-waves compared to the other waves. The third

row is the mean error of the prediction output relative to the annotation in mil-

liseconds. The fourth row is the standard deviation of error. The fifth row is the

RMSE in milliseconds for the detected cardiac waves.


TABLE 3.7: Cardiac wave identification performance.

P-wave QRS-complex T-wave Overall


SE(%) 99.63 99.87 98.06 99.19
MI(%) 0.37 0.13 0.19 0.81
µ (ms) 17.03 7.68 16.05 13.59
σ (ms) 17.69 8.35 18.02 14.69
RMSE (ms) 24.56 11.35 24.13 20.01

To compare the ECGNet results with other similar research, it is worth noting
that other research focused on finding fiducial cardiac points such as the
cardiac wave peaks and the beginnings and endings of waves. All methods, like
ours, adopt a displacement tolerance for these points. Therefore,

it is logical to pick the maximum detection rate on a particular cardiac wave to

compare it to the ECGNet result. For example, if a study reports detection
rates for T-on, T-peak and T-off, the highest detection rate is picked, because
if a method found a particular point in a wave, it is sound to claim that the
method found the cardiac wave.

Table 3.8 compares our cardiac wave detection rate (sensitivity) to the other

methods. A more detailed report can be found in [29]. ECGNet has competitive

results with other methods in this field.


TABLE 3.8: Cardiac wave detection ratio comparison.

P-wave QRS-complex T-wave Overall


ECGNet 99.63 % 99.87 % 98.06 % 99.19 %
Laguna [12] 98.04 % 99.94 % 95.61 % 97.86 %
Martinez [65] 98.87 % 99.97 % 99.77 % 99.53 %
Hughes [11] 96.50 % 100 % 95.29 % 97.26 %
Di Marco [85] 98.15 % 100 % 99.77 % 99.30 %

3.6 Conclusion

This chapter demonstrated that cardiac wave localization using DL methods
is promising. Our approach was to investigate the structure of ECG cardiac waves

and explore a novel localization method. The results, as shown in Table 3.8,

demonstrated that the proposed ConvNet models are capable of tuning their

feature filter parameters to extract features that are the best representative of

various cardiac wave patterns. Therefore, DL methods can be used to extract

spatial complex features necessary for detecting the position of various ECG

cardiac waves.

On the other hand, this method relies on a derivative-based approach to ex-

tract cardiac complexes. Therefore, this DL method does not provide an E2E
ECG cardiac wave localization. Since FNNs cannot keep the internal state of
the network, the temporal relationships between the cardiac waves and cardiac
complexes have not been addressed and the DL model does not focus on capturing
the relationships between the cardiac waves. This issue creates a dependence on
a secondary algorithm to identify the temporal attributes of the ECG signal
(such as finding cardiac complexes). This is one of the areas that could be
improved. Convolutional layers perform well in spatial feature extraction and
hierarchical structure extraction; however, they are not capable of capturing the temporal

relationship between the spatial features.

Chapter 4

Supervised Cardiac Waves Segmentation Using Long Short-Term Memory Neural Network

Cardiologists identify cardiac waves in relation to each other. For example, a
T-wave probably follows a QRS-complex. Thus, knowledge of a wave's prior
location is essential to predicting the subsequent waves. Likewise,

each individual wave formation also affects other waves. Due to coherent tem-

poral dependency in ECG signals, NN models such as RNN can be used to cap-

ture this temporal dependency in ECG signals. Basically, an RNN can pass the

information from one timestamp to another. This feature allows it to capture

time series attributes [86].

The short-term temporal dependency alone may not capture all the temporal

attributes of the ECG signal. Unlike the traditional RNN, LSTM RNN is capa-

ble of capturing both long-term and short-term dependencies in the data. This

makes it suitable for ECG segmentation [55]. In this chapter, an LSTM RNN

model is developed to segment ECG cardiac waves. The LSTM RNN model has

improved the identification of temporal features, particularly in P-wave identi-

fication. The novelty of this work is to use LSTM RNNs to classify each sample

of ECG signal into one of the four categories, namely, the P-wave, the QRS-

complex, the T-wave and neutral. Thus, as a result of the classification of every

ECG sample, ECG segmentation can be achieved. Results show that given ECG

local feature vectors, LSTM RNN is capable of performing ECG segmentation.

This chapter is organized as follows. Section 4.1 describes the local features

extracted from ECG signals to train the LSTM RNN model. Section 4.2 intro-

duces the data sets including training, validation and test sets. Section 4.3 in-

troduces the Bidirectional Long Short-Term Memory (BLSTM) RNN. Section 4.4

proposes the novel architecture of ECG-SegNet and its application for ECG seg-

mentation. Section 4.5 presents training experiment and the convergence analy-

sis of ECG-SegNet. The results are demonstrated in Section 4.6, and they show

the strength of ECG-SegNet compared to the other sequence learners such as

HMM for the same task of ECG cardiac wave segmentation. Finally, Section 4.7

concludes the contribution of the research. Fig. 4.1 shows the overall process for

the proposed cardiac wave segmentation approach.

Fig. 4.1: Overall process for cardiac wave segmentation research


using an LSTM RNN model.

4.1 Data Preparation and Feature Extraction

Similar to the previous research, wander drift baseline removal and data
normalization have been applied to the ECG recordings. Unlike the data prepared
for the previous

research, the ECG recordings are divided into 500-sample segments. With a
sampling frequency of 250 Hz, every 2-second ECG segment includes more than one

cardiac complex (i.e., approximately 2 to 3 cardiac complexes). Thus, most of

the inputs will contain at least one complete cardiac complex and a partial car-

diac complex. As a result, the model trains on complete cardiac complexes and

partial cardiac complexes, thus, the model can find temporal relationships by

observing complete cardiac complexes and simultaneously, partial cardiac com-

plexes can be analyzed as well. In total, 93,490 500-sample segments have
been extracted from QTDB.

In this research, in addition to the E(norm) samples, three local features are
extracted for each sample by using different filtering kernels, including the
local average and the first- and second-order derivatives. The smoothed
E(norm), E′(SM), is obtained by

E′(SM) = E(norm) ⊗ ζ′1    (4.1)

where ζ′1 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1] [61]. In addition, the first-order
derivative of the signal is obtained by

E′(FD) = E(norm) ⊗ ζ′2    (4.2)

where ζ′2 = [−1, −1, −1, 0, 1, 1, 1] and E′(FD) is the first-order derivative of
E(norm) [61]. Moreover, the second-order derivative of the signal is obtained by

E′(SD) = E′(FD) ⊗ ζ′2    (4.3)

where E′(SD) is the second-order derivative of E(norm) [61]. Therefore, every
input timestamp at time t is x_t = [E(norm),t, E′(SM),t, E′(FD),t, E′(SD),t].
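Stacking Eqs. 4.1 through 4.3 into the per-timestamp feature vectors can be sketched as follows (NumPy sketch; the `mode="same"` boundary handling is an assumption):

```python
import numpy as np

def lstm_features(e_norm):
    """Build the (4, T) input matrix X of Eqs. 4.1-4.3: the normalized ECG, its
    local average, and its first- and second-order derivatives."""
    zeta1p = np.ones(10)                                # smoothing kernel zeta'_1
    zeta2p = np.array([-1, -1, -1, 0, 1, 1, 1])         # derivative kernel zeta'_2
    e_sm = np.convolve(e_norm, zeta1p, mode="same")     # Eq. 4.1
    e_fd = np.convolve(e_norm, zeta2p, mode="same")     # Eq. 4.2
    e_sd = np.convolve(e_fd, zeta2p, mode="same")       # Eq. 4.3
    return np.stack([e_norm, e_sm, e_fd, e_sd])
```

Column t of the returned matrix is the feature vector x_t fed to the LSTM RNN at timestamp t.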

Since every input timestamp has four features and the ECG segment sequence
length is T = 500, the input to the LSTM RNN model is the matrix
X_{4×T} = (x_1, x_2, ..., x_T). The label matrix for the ECG segment is
Y_T = (y_1, y_2, ..., y_T), where y_t is the label for timestamp t corresponding
to the input timestamp x_t. Every x_t belongs to one of four categories: neutral
as category one, P-wave as category two, QRS-complex as category three and
T-wave as category four. Considering that 1-of-K label coding is adopted, if the
correct class of x_t is the P-wave (second category), the label vector is
expressed as y_t = (0, 1, 0, 0).

4.2 Data Sets

Similar to the previous research, three different sets, including training, validation and test sets, have been prepared from the extracted ECG segments. These three sets are mutually exclusive, i.e., no segment appears in more than one set. In addition, the sets are patient-independent, i.e., there are no ECG segments from one patient in two different sets. Table 4.1 lists the training, validation and test data.

TABLE 4.1: Dataset distribution.

Dataset Number of segments Percentage

Training set 51,419 55%

Validation set 9,350 10%

Test set 32,721 35%

Total 93,490 100%
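A patient-independent split like the one in Table 4.1 can be produced by assigning whole patients, rather than individual segments, to the sets; a minimal sketch with assumed names and the 55/10/35 fractions:

```python
import random

def patient_independent_split(patient_ids, fractions=(0.55, 0.10, 0.35), seed=0):
    """Assign whole patients to train/val/test sets.

    All segments of a patient then inherit that patient's set, so no
    patient (and hence no segment) appears in two sets.
    """
    ids = sorted(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(fractions[0] * len(ids))
    n_val = int(fractions[1] * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

train, val, test = patient_independent_split(range(100))
print(len(train), len(val), len(test))  # 55 10 35
```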

4.3 Bidirectional Long Short-Term Memory Recurrent

Neural Network Review

4.3.1 Traditional RNNs and Bidirectional RNNs

In traditional RNNs, every sample x_t at time t is a vector containing the features that describe the raw input at that timestamp. Thus, in a conventional RNN with one hidden layer, given the input matrix X = (x_1, x_2, ..., x_T), the hidden layer activation sequence a = (a_1, a_2, ..., a_T) and the output sequence O = (o_1, o_2, ..., o_T), the output o_t for timestamp t can be obtained by [54]

a_t = G(θ_{x,h} x_t + θ_{h,h} a_{t−1} + b_h)    (4.4)

and

o_t = θ_{h,o} a_t + b_o    (4.5)

where θ_{x,h} denotes the weight matrix between the input vector and the hidden neurons, θ_{h,h} denotes the recurrent weight matrix between the hidden neurons, b_h denotes the bias vector of the hidden neurons, a_t is the hidden layer activation, b_o denotes the bias vector of the output neurons and G denotes an element-wise activation function. By cascading RNN hidden layers on top of each other and treating the output of one hidden layer as the input of the cascaded hidden layer, deep RNNs can be achieved [54]. The same operations, defined in Eq. 4.4 and Eq. 4.5, can be performed for the cascaded hidden layers as well. Fig. 4.2 shows an example of a traditional deep RNN model. The first layer is the input layer X and it is followed by two hidden layers, a^(1) and a^(2), and an output layer O. As shown, each hidden layer in the traditional RNN receives the output of the same hidden layer from the previous timestamp and passes its output both to the layer above and to its own next timestamp.

Fig. 4.2: Traditional deep RNN architecture.
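The recurrence of Eqs. 4.4 and 4.5 maps directly onto a loop over timestamps; a minimal NumPy forward pass, assuming G = tanh and a zero initial activation (neither is fixed by the text):

```python
import numpy as np

def rnn_forward(X, theta_xh, theta_hh, theta_ho, b_h, b_o):
    """Forward pass of Eqs. 4.4 and 4.5.

    X: (T, n_in) input sequence. G is taken to be tanh and the initial
    activation a_0 to be zero; both are assumptions, not fixed by the text.
    Returns the (T, n_out) output sequence (o_1, ..., o_T).
    """
    a = np.zeros(theta_hh.shape[0])
    outputs = []
    for x_t in X:
        a = np.tanh(theta_xh @ x_t + theta_hh @ a + b_h)  # Eq. 4.4
        outputs.append(theta_ho @ a + b_o)                # Eq. 4.5
    return np.array(outputs)

rng = np.random.default_rng(0)
o = rnn_forward(rng.normal(size=(500, 4)),
                rng.normal(size=(8, 4)), rng.normal(size=(8, 8)),
                rng.normal(size=(4, 8)), np.zeros(8), np.zeros(4))
print(o.shape)  # (500, 4)
```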

Another aspect of conventional RNNs is that only prior data is used. However, in many cases, future data is available and can be used as an additional source of information. Schuster et al. introduced the Bidirectional Recurrent Neural Network (BRNN) [87], which uses both directions of the data, prior and future samples, in two separate hidden layers whose activations are passed to the layer above (another hidden layer or an output layer) [54]. Using the input x_t at time t, the forward hidden layer activation vector, a⃗_t, and the backward hidden layer activation vector, a⃖_t, are described as [54]

 
a⃗_t = G(θ_{x,h⃗} x_t + θ_{h⃗,h⃗} a⃗_{t−1} + b_{h⃗})    (4.6)

and

a⃖_t = G(θ_{x,h⃖} x_t + θ_{h⃖,h⃖} a⃖_{t+1} + b_{h⃖})    (4.7)

where h⃗ refers to the forward layer neurons and h⃖ refers to the backward layer neurons. Correspondingly, the following equation computes the output in a BRNN layer [54]:

o_t = θ_{h⃗,k} a⃗_t + θ_{h⃖,k} a⃖_t + b_k    (4.8)

where k refers to the output neurons. Fig. 4.3 shows an example of a BRNN with one hidden layer and its backward and forward hidden layers.

4.3.2 LSTM RNNs and Bidirectional LSTM RNNs

One of the extensions of traditional RNNs is the LSTM RNN. In contrast to traditional RNNs, LSTM RNNs are capable of storing long-term and short-term information and are resistant to noise (input fluctuations that are random and irrelevant to the predictions) [51]. Thus, this extension of the RNN is ideal for learning

Fig. 4.3: Bidirectional RNN layer.

long-term and short-term data dependencies of ECG recordings. The LSTM RNN uses trainable memory cells called LSTM cells. As a reminder, these memory cells have three trainable gates, including the input, output and forget gates, and the gates have the ability to add, remove and control the flow of information [30]. The gates, the cell state (the stored state of the LSTM cell), the cell activation and their output computations are given in the following equations [88]:

F_t = σ(θ_{x,F} x_t + θ_{h,F} a_{t−1} + b_F)    (4.9)

where σ is the element-wise sigmoid function, θ_{x,F} is the weight vector between the forget gate and the input layer, θ_{h,F} is the weight vector between the LSTM cell activation and the forget gate, a_{t−1} is the activation vector of the LSTM cell at the previous timestamp and b_F is the forget gate bias vector. Briefly, the forget gate takes the current input and the activation vector of the previous timestamp and decides whether the previous timestamp's cell state should be remembered (considered) or forgotten. Further, the input gate is expressed as [88]

I_t = σ(θ_{x,I} x_t + θ_{h,I} a_{t−1} + b_I)    (4.10)

where θ_{h,I} is the weight vector between the input gate and the LSTM cell activation, θ_{x,I} is the weight vector between the input gate and the input layer and b_I is the input gate bias vector. Briefly, the input gate takes the input and decides which parts of the new input are good candidates to be stored in the LSTM cell state. The LSTM cell state is expressed as [88]

C̄_t = F_t C̄_{t−1} + I_t Tanh(θ_{x,C̄} x_t + θ_{h,C̄} a_{t−1} + b_{C̄})    (4.11)

where Tanh is the element-wise hyperbolic tangent function, θ_{x,C̄} is the weight vector between the LSTM cell state and the input layer, θ_{h,C̄} is the weight vector between the LSTM cell activation and the LSTM cell state and b_{C̄} is the cell state bias vector. Briefly, the candidate information obtained from the input gate and the previous LSTM cell state passed through the forget gate decide the new information stored in the LSTM cell state. The output gate and the LSTM cell activation are expressed as [88]

Q_t = σ(θ_{x,Q} x_t + θ_{h,Q} a_{t−1} + θ_{C̄,Q} C̄_{t−1} + b_Q)    (4.12)

and

a_t = Q_t Tanh(C̄_t)    (4.13)

where θ_{x,Q}, θ_{h,Q}, θ_{C̄,I}, θ_{C̄,F} and θ_{C̄,Q} are the corresponding connection weight vectors and b_Q is the output gate bias vector. Briefly, the output gate controls the flow of information for the LSTM cell activation vector. Finally, a is the activation vector of the LSTM cell. Fig. 4.4 shows an LSTM cell.

Fig. 4.4: LSTM cell [86, 88].
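Eqs. 4.9–4.13 map directly onto a single forward step; a NumPy sketch (the parameter names are assumptions, the peephole connection appears only on the output gate as in Eq. 4.12, and the gate products are taken element-wise):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, a_prev, c_prev, p):
    """One LSTM forward step following Eqs. 4.9-4.13.

    p is a dict of weights/biases with assumed names.
    """
    f = sigmoid(p["Wxf"] @ x_t + p["Whf"] @ a_prev + p["bf"])           # Eq. 4.9
    i = sigmoid(p["Wxi"] @ x_t + p["Whi"] @ a_prev + p["bi"])           # Eq. 4.10
    c = f * c_prev + i * np.tanh(p["Wxc"] @ x_t + p["Whc"] @ a_prev + p["bc"])      # Eq. 4.11
    q = sigmoid(p["Wxq"] @ x_t + p["Whq"] @ a_prev + p["Wcq"] * c_prev + p["bq"])   # Eq. 4.12
    a = q * np.tanh(c)                                                  # Eq. 4.13
    return a, c

n_in, n_h = 4, 8
rng = np.random.default_rng(1)
p = {k: rng.normal(size=(n_h, n_in)) for k in ("Wxf", "Wxi", "Wxc", "Wxq")}
p.update({k: rng.normal(size=(n_h, n_h)) for k in ("Whf", "Whi", "Whc", "Whq")})
p.update({k: np.zeros(n_h) for k in ("bf", "bi", "bc", "bq")})
p["Wcq"] = rng.normal(size=n_h)
a, c = lstm_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h), p)
print(a.shape, c.shape)  # (8,) (8,)
```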

Similar to traditional RNNs, LSTM RNNs have access to only one direction

of the information (prior information). Dyer et al. have introduced Bidirectional

Long Short-Term Memory (BLSTM) as a solution similar to BRNNs for the LSTM

RNNs to benefit from both directions of the information [89]. Similar to BRNNs, the input sequence is presented to the model in a forward and a backward direction to two separate forward and backward hidden layers. The activations of these two hidden layers are concatenated to form the final output [90]. However, in contrast to the BRNN, the BLSTM utilizes LSTM cells instead of simple neurons for the backward and forward hidden layers. Similar to the BRNN, the backward hidden layer can capture future information and the forward hidden layer can capture past information [90].
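The bidirectional wiring is independent of the cell type; a minimal sketch that runs any sequence layer in both directions and concatenates the activations (the wrapper name and the stateless toy layer are assumptions):

```python
import numpy as np

def bidirectional(run_layer, X):
    """Run a sequence layer over X in both time directions and
    concatenate the activations, as in a BLSTM/BRNN layer.

    run_layer: any function mapping a (T, n) sequence to (T, h)
    activations (in a real model this would be an RNN or LSTM
    forward pass).
    """
    fwd = run_layer(X)              # past-to-future activations
    bwd = run_layer(X[::-1])[::-1]  # future-to-past, re-aligned to time t
    return np.concatenate([fwd, bwd], axis=-1)

X = np.random.default_rng(0).normal(size=(500, 4))
out = bidirectional(lambda s: np.tanh(s @ np.ones((4, 8))), X)
print(out.shape)  # (500, 16)
```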

4.4 ECG-SegNet Architecture Model

Utilizing a BLSTM in ECG segmentation helps to classify a sample based on prior and future samples, e.g., finding QRS-complex samples using prior samples, such as P-wave samples, and future samples, such as T-wave samples. Thus, the BLSTM is a viable approach to be explored for the ECG segmentation task. Based on the aforementioned rationale for a suitable network using BLSTM to classify ECG samples, a new architecture, ECG-SegNet, is proposed, as shown in Fig. 4.5, where every block in the BLSTM layers is an LSTM layer.

This model contains two BLSTM layers and a fully-connected time-distributed output layer with the softmax function as its activation. A fully-connected time-distributed layer applies a fully-connected layer to every timestamp. Hyperparameters such as the number of hidden layers and the number of LSTM cells are chosen by monitoring the convergence of the BLSTM RNN model and are inspired by [54]. Table 4.2 describes the details of each layer of ECG-SegNet.

Fig. 4.5: The proposed BLSTM RNN architecture.

The first layer is the input layer, which takes the time series X_{4×500} as explained in Section 4.1. The next layer is a hidden BLSTM layer. In this layer, the backward hidden layer and the forward hidden layer each have 250 LSTM cells, which sug-
TABLE 4.2: Deep BLSTM RNN for ECG segmentation.

Layer     Description of the layer            Size

Input     ECG raw signal and its features     (4 × 500)
Layer 1   BLSTM with 500 LSTM cells           (2 × 250)
Layer 2   BLSTM with 250 LSTM cells           (2 × 125)
Output    Time-distributed dense layer        (4 × 500)

gests that the layer has a total of 500 hidden LSTM cells. This is followed by another BLSTM layer of size 250 and then by the output layer, which is a fully-connected time-distributed layer of size K that categorizes every sample into one of the K categories.

Calculating the loss for sequential models is similar to non-sequential models, except that the loss in sequential models is summed over all timestamps and then backpropagated to the weights. For the ECG segment X of length T, the network produces a length-T output sequence O, where each o_t ∈ O defines a probability distribution over the K possible categories, k ∈ {1, 2, 3, 4}; o_t^k (the kth element of o_t) is the network's estimate of the probability of observing category k at time t, given x_t. The network is trained to minimize the negative log-probability of the label sequence using a softmax output layer for multi-class classification, introduced in Eq. 2.12, using backpropagation through time to determine the weight gradients [54, 86].
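The per-sequence loss described above, summed over timestamps, can be sketched as follows (the function name is an assumption):

```python
import numpy as np

def sequence_nll(probs, labels):
    """Negative log-probability of a label sequence, summed over timestamps.

    probs: (T, K) softmax outputs (rows sum to 1); labels: (T,) integer
    categories in [0, K). This is the quantity minimized during training.
    """
    return -np.sum(np.log(probs[np.arange(labels.shape[0]), labels]))

# Two timestamps, K = 4; the model assigns 0.5 to each correct class.
probs = np.array([[0.5, 0.3, 0.1, 0.1],
                  [0.2, 0.5, 0.2, 0.1]])
loss = sequence_nll(probs, np.array([0, 1]))
print(round(loss, 4))  # 1.3863, i.e., -2 log(0.5)
```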

4.5 Training Experiment

Like the previous research, weight initialization has been done using a Gaussian distribution with mean zero and a standard deviation of 0.1. ECG-SegNet is trained with the Adam optimizer [91] through 68 epochs using the mini-batch procedure with a batch size of 250, chosen based on the convergence progression. The training stopped after 68 epochs because the training set error continued to decrease while the validation set error began to increase; this divergence is a sign of over-fitting. After training, the results showed 94.6% accuracy for the training set, 93.8% for the validation set and 93.7% for the test set. Fig. 4.6 shows the accuracy rates and Fig. 4.7 shows the loss rates through the 68 epochs for the validation and training sets.

Fig. 4.6: Accuracy curve.

Fig. 4.7: Loss curve.

4.6 Results

The precision, recall and f1-score for each cardiac wave category and the neutral samples are reported in Table 4.3. The highest precision among the cardiac waves belongs to the QRS-complex at 0.94 and the lowest belongs to the T-wave at 0.90. The highest recall belongs to the QRS-complex at 0.95 and the lowest belongs to the P-wave at 0.90. Further, the highest f1-score belongs to the QRS-complex at 0.94 and the lowest is tied between the P-wave and T-wave categories at 0.91. The average f1-score of ECG-SegNet is 0.94.

TABLE 4.3: ECG segmentation results.

Precision Recall F1-Score


Neutral 0.95 0.95 0.95
P-wave 0.92 0.90 0.91
QRS-complex 0.94 0.95 0.94
T-wave 0.90 0.92 0.91
Average 0.94 0.94 0.94

For comparison purposes, two other approaches are used. Hughes et al. used HMMs to solve ECG segmentation with two approaches [11]. The first approach used raw ECG signals and the second used wavelet-encoded ECG. A comparison of ECG-SegNet and both HMM approaches is provided in Table 4.4. ECG-SegNet performs better in terms of overall accuracy. The HMM with wavelet encoding gives better accuracy in segmenting the QRS-complex and the T-wave; however, in all other cases, and in the overall results, ECG-SegNet outperforms the HMM approaches.

TABLE 4.4: Segmentation accuracy comparison.

Method                           P-wave (%)   QRS-complex (%)   T-wave (%)   Overall (%)

ECG-SegNet                       92.0         94.0              90.0         92.0
HMM on raw ECG [11]              5.5          79.0              83.6         56.03
HMM on wavelet-encoded ECG [11]  74.2         94.4              96.1         88.23

The majority of other research focuses on finding the fiducial points of the cardiac complex rather than segmenting every single ECG sample independently. Even though the ECG-SegNet task is different from cardiac wave identification, it provides competitive accuracy in identifying cardiac waves. Table 4.5 shows the accuracy of identifying waves, regardless of segmentation, using ECG-SegNet. More comparisons of cardiac wave identification with other approaches will be discussed in the next chapter.

TABLE 4.5: Cardiac wave identification results.

Cardiac wave Accuracy


P-wave 95 %
QRS-complex 98 %
T-wave 97 %

Fig. 4.8 shows a sample from the test set and its related result, demonstrating the accuracy of ECG-SegNet. The P-wave, QRS-complex and T-wave areas are represented by red, blue and green regions, respectively.

Fig. 4.8: Sample result.

4.7 Conclusion

To the best of our knowledge, there has not been a DL-based method for ECG

signal segmentation before this research. Our work demonstrated that ECG-

SegNet is a powerful network capable of capturing the temporal attribute of the

ECG signal using only a few local features to yield very competitive results.

This research provides knowledge for the second main component of the E2E

hybrid DL method for ECG segmentation in Chapter 5. In the next chapter, the

inputs to the LSTM network are more comprehensive feature vector sequences.

Chapter 5

Cardiac Wave Segmentation Using

Autoencoders and LSTM Layers

The previous studies provide the two main components of an E2E ECG segmentation method: a complex feature extractor component and a sequence learner component with the capability of capturing long-term dependencies in ECG signals.

This chapter proposes a novel hybrid DL method to combine these two compo-

nents together and introduces a single hybrid model for E2E ECG segmentation.

Results show that this hybrid DL method achieves better ECG segmentation per-

formance than the ECG-SegNet introduced in Chapter 4.

This research introduces a cardiac wave segmentation method using deep

convolutional autoencoders and LSTM networks. A convolutional autoencoder

has the potential to extract the complex hierarchical structure of a signal and has

been proven effective in segmentation applications [46, 47, 92]. Autoencoders have the ability to extract or compress the essential information from the input and reconstruct the desired output based on the extracted information. Furthermore,

LSTM networks have the capability of capturing long-term dependency of se-

quences [55]. By employing both deep convolutional autoencoders and LSTM

networks, our method is capable of learning a variety of cardiac wave forma-

tions and their temporal relations to each other.

As shown in Fig. 5.1, there are four components in the proposed new model.

Fig. 5.1: Hybrid-ECG-SegNet components.

The first component is the input layer for the normalized ECG signals. The second component is a deep convolutional autoencoder. Upon the completion of training, the convolutional autoencoder is capable of generating feature vectors.

The third component is a sequence learner consisting of two BLSTM layers. The feature vectors generated by the convolutional autoencoder are the input to the sequence learner. The fourth and last component forms the output layer. In the output layer, every timestamp sample is classified into one of the categories, i.e., neutral, P-wave, QRS-complex and T-wave. Fig. 5.2 shows the overall process of the proposed approach.

Fig. 5.2: Overall process of the Hybrid-ECG-SegNet.

As shown, the flow starts by dividing the normalized ECG data into smaller segments. The ECG segments and their related annotations are distributed into training, validation and test sets. Following the data set preparation, the autoencoder and the sequence learner are trained for the ECG segmentation task.

This chapter is organized as follows. Section 5.1 introduces the data sets

including training, validation and test sets. Section 5.2 explains the idea be-

hind the autoencoder paradigm. The proposed autoencoder and Hybrid-ECG-

SegNet are introduced in Section 5.3. Finally, Section 5.4 describes experiments

and the convergence of Hybrid-ECG-SegNet. The results presented in Section 5.5 demonstrate the strength of Hybrid-ECG-SegNet compared to other sequence learners, such as HMMs, for the same task of ECG cardiac wave segmentation using only the normalized ECG signals.

5.1 Data Sets

Unlike the data prepared for the previous research, the ECG recordings are divided into 1,000-sample segments, i.e., four seconds of E_(norm), which contain approximately four to five cardiac complexes (one or two of them can be partial cardiac complexes). Having multiple cardiac complexes can help the model find the relationship between the cardiac waves. These ECG segments are the inputs to the Hybrid-ECG-SegNet model and, correspondingly, the related annotations are the labels. In total, 46,690 segments are extracted from the QTDB to be used as input-label pairs. In this research, no traditional feature extraction method is used; instead, an ML approach is responsible for extracting spatial features from the ECG segments.

In our experiment, three different sets, including training, validation and test sets, have been extracted from QTDB. Similar to the previous research, these sets are mutually exclusive, i.e., no segment appears in more than one set, and subject-independent. Table 5.1 lists the training, validation and test sets.


TABLE 5.1: Dataset distribution.

Dataset Number of segments Percentage


Training set 28,014 60%
Validation set 4,669 10%
Test set 14,007 30%
Total 46,690 100%

5.2 Autoencoder to Extract Features

An autoencoder is a bottleneck architecture based on an encoder-decoder paradigm: the encoder compresses data into a lower-dimensional representation and the decoder is tuned either to reconstruct the initial input from the compressed data with minimum loss or to produce a desired representation of the input, e.g., a higher-resolution version of a low-resolution signal. Autoencoders that reconstruct the initial inputs are called auto-associative autoencoders, and autoencoders that transform the input into a representation of the input are called non-auto-associative autoencoders [93]. In this research, the desired output is a feature vector for every timestamp of the input data; thus, a non-auto-associative autoencoder should be developed. One of the main applications of bottleneck architectures such as autoencoders is segmentation. Long et al. used a fully convolutional network with a bottleneck architecture for an E2E image segmentation task [46]. Also, Meng et al. used autoencoders to extract viable features for image segmentation [47]. Fig. 5.3 illustrates the operation of an encoder-decoder paradigm for a non-auto-associative encoder, where Ci denotes the encoded data.

Fig. 5.3: Autoencoder paradigm process.

The general framework of non-auto-associative autoencoders includes: a set F^n with n dimensions; a set G^m with m dimensions, where 0 < m < n; a training set X = (x_1, x_2, ..., x_N), where N is the number of instances and x_i ∈ F^n; a label set Y = (y_1, y_2, ..., y_N), where y_i ∈ F^n; a dissimilarity function ∆(·, ·), which measures the dissimilarity between two members of F^n; a class of functions A, which map F^n ⇒ G^m; and a class of functions B, which map G^m ⇒ F^n. The objective of a non-auto-associative autoencoder is to find A ∈ A and B ∈ B such that

min_{A,B} L(A, B) = min_{A,B} Σ_{i=1}^{N} L(x_i, y_i) = min_{A,B} Σ_{i=1}^{N} ∆(B ∘ A(x_i), y_i)    (5.1)

where ∘ is the application function and L(·, ·) is the loss function [93]. Therefore, an autoencoder finds a function A that lowers the dimension of the input from n to m, and a function B that expands the lowered dimensions back to the original higher dimension n with minimum loss. This process causes A to extract the essential information from the input and B to construct the desired output from the essential features extracted by A.

5.3 Hybrid-ECG-SegNet Architecture Model

Based on the aforementioned rationale for a suitable network using convolu-

tional autoencoders and BLSTM networks to classify ECG cardiac wave sam-

ples, a new architecture called Hybrid-ECG-SegNet is proposed.

The proposed autoencoder for feature extraction has three different types of

NN layers including convolutional layers for extracting spatial features, max-

pooling layers for reducing dimensions and upsampling layers for increasing

dimensions. While convolutional and max-pooling layers are used for encod-

ing, convolutional and upsampling layers are used for decoding. The decoder follows the architecture of the Super-Resolution Convolutional Neural Network (SRCNN). SRCNN upsamples a low-resolution signal to a higher resolution without any obvious artifacts for images [92]. In this research, to restore the dimensions reduced by the max-pooling function, the upsampling function repeats the rows and columns of the data. The hyperparameters adopted for this research are based on our previous research and on monitoring the convergence progression. Fig. 5.4 shows the details of the proposed deep convolutional autoencoder. Since the first layer has 1,000 timestamps, the sequence length is T = 1,000. The autoencoder model architecture is as follows. The first convo-

lutional layer has 16 convolutional filters; thus, it transforms the input to 16 × T dimension sizes. Then an autoencoder consisting of an encoder and a decoder is applied. The encoder consists of two convolutional layers with alternating max-pooling layers. By utilizing max-pooling layers with a receptive field of size (1 × 2) and stride (1 × 2), the encoder compresses the output of the first layer to 32 × T/4 dimension sizes through its bottleneck architecture, i.e., each max-pooling layer halves the size of its input, which compresses it. A decoder consisting of alternating convolutional and upsampling layers is positioned after the encoder. Upsampling layers repeat the input's rows and columns (1 × 2), i.e., they double the size of their input. By utilizing upsampling, the decoder decompresses the encoded data back to 16 × T dimension sizes. By applying two max-pooling layers of size (1 × 2) and two upsampling layers of size (1 × 2), the same size as the first convolutional layer feature maps is obtained.
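The dimension bookkeeping above (two halvings along the time axis at the bottleneck, then two doublings) can be checked with a few lines of NumPy; the helper names are assumptions:

```python
import numpy as np

def max_pool_1d(x, size=2):
    """Max-pooling with pool size and stride (1 x 2): halves the time axis."""
    T = x.shape[-1] - x.shape[-1] % size
    return x[..., :T].reshape(*x.shape[:-1], T // size, size).max(axis=-1)

def upsample_1d(x, size=2):
    """Upsampling by repeating columns (1 x 2): doubles the time axis."""
    return np.repeat(x, size, axis=-1)

x = np.random.default_rng(0).normal(size=(16, 1000))
enc = max_pool_1d(max_pool_1d(x))    # two halvings: 16 x 250 at the bottleneck
dec = upsample_1d(upsample_1d(enc))  # two doublings: back to 16 x 1000
print(enc.shape, dec.shape)  # (16, 250) (16, 1000)
```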

Fig. 5.4: The proposed autoencoder feature extractor.

The process of encoding and decoding with embedded convolutional layers allows extracting the hierarchical structure of the ECG signal, which represents the essential information of the signal. As a result of applying a convolutional layer with 16 convolutional filters, the autoencoder generates a feature vector of size 16 for every timestamp, and these feature vectors are the input to the sequence learner.

The feature vectors generated by the autoencoder are the input timestamps to the sequence learner architecture through a time-distributed layer. The sequence learner contains two BLSTM layers of size 2 × 150 and 2 × 75 LSTM cells, respectively. The sequence learner is followed by a time-distributed fully-connected softmax output layer. The output layer is the matrix O_{K×T}, which produces a probability over the K categories for each timestamp sample t. By utilizing the loss function explained in Eq. 2.12 and using backpropagation through time, the weight gradients are computed and the weights are updated [54, 86]. After training, ECG segmentation is achieved. The proposed Hybrid-ECG-SegNet is illustrated in Fig. 5.5.

The category of the layers, the description of the layers, size of each layer and

the architecture of Hybrid-ECG-SegNet are detailed in Table 5.2.

5.4 Training Experiment

Fig. 5.5: The proposed hybrid architecture.

Weight initialization has been done similarly to the previous research. The Hybrid-ECG-SegNet is trained with the RMSProp optimizer [94] through 47 epochs using the mini-batch procedure with a batch size of 75, chosen based on monitoring the convergence progression. The training stopped after 47 epochs because

TABLE 5.2: Hybrid-ECG-SegNet architecture.

Description Receptive
Layer Category Size Stride
of the layer Field Size
Layer 0 Input ECG signal (1 x 1000) - -
Convolutional
Layer 1 (16 x 1000) (1×5) (1×1)
layer with 16 filters
Max-Pooling
Layer 2 (16 x 500) (1×2) (1×2)
of Size (1 x 2)
Convolutional
Layer 3 (32 x 500) (1×5) (1×1)
layer with 32 filters
Max-Pooling
Layer 4 (32 x 250) (1×2) (1×2)
of Size (1 x 2)
Convolutional
Layer 5 (32 x 250) (1×5) (1×1)
layer with 32 filters
Upsampling
Layer 6 Auto- (32 x 500) (1 x 1) (1 x 1)
of Size (1 x 2)
encoder
Convolutional layer
Layer 7 (32 x 500) (1×5) (1×1)
with 16 filters
Upsampling
Layer 8 (16 x 1000) (1 x 1) (1 x 1)
of Size (1 x 2)
Time Distributed
Layer 9 (16 x 1000) - -
Dense Layer
Layer 10 Sequence BLSTM Layer (2 ×150) - -
Layer 11 Learner BLSTM Layer (2 ×75) - -
Time Distributed
Layer 12 Output (4 x 1000) - -
Dense Layer

the validation set error did not improve for 10 epochs, which is the early stopping policy. RMSProp maintains a per-parameter learning rate based on the average of the recent magnitudes of the gradients for that parameter (weight). This approach is recommended for non-stationary signals [94].
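The early stopping policy above can be sketched as a patience loop; a minimal version with assumed names (`step` stands in for one epoch of training plus validation):

```python
def train_with_early_stopping(step, patience=10, max_epochs=1000):
    """Stop when the validation error has not improved for `patience` epochs.

    step: epoch -> validation error (stands in for one epoch of training
    plus validation). Returns (best_epoch, best_error).
    """
    best_err, best_epoch, waited = float("inf"), 0, 0
    for epoch in range(1, max_epochs + 1):
        err = step(epoch)
        if err < best_err:
            best_err, best_epoch, waited = err, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best_err

# Toy validation curve: improves until it plateaus at 0.064.
print(train_with_early_stopping(lambda e: max(0.064, 1.0 / e)))  # (16, 0.064)
```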

After training, the results showed 94.73% accuracy for the training set, 93.58% for the validation set and 93.99% for the test set. Fig. 5.6 shows the accuracy rates and Fig. 5.7 shows the loss rates through the 47 epochs for the validation and training sets.

Fig. 5.6: Accuracy curve.

Fig. 5.7: Loss curve.

5.5 Results

The detailed results of this segmentation are reported in Table 5.3. The highest precision among the cardiac waves belongs to the QRS-complex at 0.94 and the lowest is tied between the P-wave and the T-wave at 0.92. The highest recall belongs to the QRS-complex and the T-wave at 0.93 and the lowest belongs to the P-wave at 0.91. The highest f1-score belongs to the QRS-complex at 0.94 and the lowest belongs to the P-wave at 0.91. The average f1-score of Hybrid-ECG-SegNet is 0.94.

TABLE 5.3: ECG segmentation results.

Precision Recall F1-Score


Neutral 0.95 0.95 0.95
P-wave 0.92 0.91 0.91
QRS-complex 0.94 0.93 0.94
T-wave 0.92 0.93 0.93
Average 0.94 0.94 0.94

For comparison purposes, two other approaches and ECG-SegNet are used. Hughes et al. used HMMs to solve ECG segmentation with two approaches [11]. The first approach used the raw ECG signal and the second used wavelet-encoded ECG. A comparison of Hybrid-ECG-SegNet and both HMM approaches is provided in Table 5.4. Hybrid-ECG-SegNet performs better in terms of overall accuracy. The HMM with wavelet encoding gave better accuracy in segmenting the QRS-complex and the T-wave; however, in all other cases, and in the overall results, Hybrid-ECG-SegNet outperforms the HMM approaches.

TABLE 5.4: Segmentation accuracy comparison.

Method                           P-wave (%)   QRS-complex (%)   T-wave (%)   Overall (%)

Hybrid-ECG-SegNet                92.0         94.0              92.0         92.6
ECG-SegNet                       92.0         94.0              90.0         92.0
HMM on raw ECG [11]              5.5          79.0              83.6         56.03
HMM on wavelet-encoded ECG [11]  74.2         94.4              96.1         88.23

Below, a discussion of the differences between Hybrid-ECG-SegNet and the compared methods, and the reasons it outperforms them, is provided.

• Compared to the ECG-SegNet approach: Hybrid-ECG-SegNet relies on a convolutional autoencoder and ML approaches to automate the feature extraction process. On the other hand, ECG-SegNet uses derivative filters and smoothing features to extract features. The convolutional autoencoder has the flexibility to extract simple and complex features by cascading convolutional layers. Therefore, the advantage of Hybrid-ECG-SegNet over ECG-SegNet comes from the ML-based feature extraction process. This result also affirms the claim in Chapter 3. From this comparison, we conclude that convolutional layers are capable of extracting the hierarchical structure of the ECG signal.

• Compared to HMM-based ECG segmentation using raw ECG signals: The results of HMM-based ECG segmentation using raw ECG signals demonstrate that merely using a sequence learner is not enough for ECG segmentation. Even a simple feature extractor component, such as the traditional feature extractors introduced in Chapter 4, can outperform ECG segmentation methods that use raw ECG signals and a sequence learner alone. Although the input to Hybrid-ECG-SegNet is a normalized raw ECG signal, a feature extraction component, namely the proposed convolutional autoencoder, is built into the model. This is the reason that Hybrid-ECG-SegNet outperforms the HMM model using raw ECG signals.

• Compared to HMM-based ECG segmentation using the wavelet-encoded ECG signal: Wavelet-encoded ECG provides a reliable ECG representation for the HMM sequence learner. However, considering the P-wave categorization result, we see that the HMM model under-performs by a large margin. The reason is that an HMM is not capable of remembering long-term dependencies, while the LSTM model is well suited for them.

Even though the Hybrid-ECG-SegNet task is different from identifying ECG cardiac waves, it provides competitive accuracy in this field as well. Table 5.5 shows the accuracy of identifying cardiac waves, regardless of segmentation, using Hybrid-ECG-SegNet. In addition, Hybrid-ECG-SegNet outperforms

TABLE 5.5: Cardiac wave identification results.

Cardiac wave   ECG-SegNet accuracy   Hybrid-ECG-SegNet accuracy

P-wave         95%                   96%
QRS-complex    98%                   99%
T-wave         97%                   98%

ECG-SegNet by 1% in all cardiac wave categories. The conclusion from this observation is that the hybrid model provides synergy for the ECG segmentation task. Fig. 5.8 shows two samples from the test set and their related results, demonstrating the accuracy of Hybrid-ECG-SegNet. The P-wave, QRS-complex and T-wave areas are represented by red, blue and green regions, respectively. Fig. 5.9 shows a noisy sample from the test set and its related result, demonstrating that Hybrid-ECG-SegNet is robust to noise as well.

Fig. 5.8: Two sample results.

Fig. 5.9: Noisy sample result.

Chapter 6

Conclusions And Future Work Plans

The hybrid DL method proposed in Chapter 5 utilizes a combination of a ConvNet for spatial feature extraction and a BLSTM NN for capturing the temporal attributes of ECG signals, and introduces an E2E ECG segmentation method. The shortcoming of the method proposed in Chapter 3 is that the ConvNet is incapable of learning the temporal features of the ECG signal. Furthermore, the shortcoming of the method proposed in Chapter 4 is that simple feature filters, such as derivative-based filters, were used to extract features from the ECG signal.

E2E ECG segmentation is a crucial yet insufficiently addressed area in automated cardiac diagnostics. In this dissertation, progress was made by proposing a hybrid DL methodology for an E2E ECG segmentation approach. ConvNets and autoencoders were adopted to extract complex spatial features and the hierarchical structure of ECG cardiac waves. The reason for utilizing such feature extractors is to address the various cardiac wave formations. The proposed convolutional autoencoder is able to extract the hierarchical ECG structure from a dataset with multiple cardiovascular disease symptoms, various abnormal T-wave formations and miscellaneous ST formations. Furthermore, a sequence learner capable of capturing long-term and short-term dependencies is adopted to capture the relationships between cardiac waves. Our results show that the LSTM sequence learner outperforms other types of sequence learners based on HMMs.
based on HMMs.
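The division of labor between the two stages can be sketched with a toy forward pass: a bank of 1-D convolutions produces spatial feature maps, and an LSTM layer then encodes their temporal dependencies. This is a minimal NumPy illustration with random, untrained weights; the filter width, hidden size, and single unidirectional LSTM layer are illustrative assumptions, not the configuration of the proposed network:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1d_features(signal, kernels):
    """Spatial stage: each kernel yields one feature map (same-length convolution)."""
    return np.stack([np.convolve(signal, k, mode="same") for k in kernels], axis=-1)

def lstm_forward(x_seq, W, U, b, hidden):
    """Temporal stage: one LSTM layer unrolled over the feature sequence.
    W: (input_dim, 4*hidden), U: (hidden, 4*hidden), b: (4*hidden,)."""
    h, c, outputs = np.zeros(hidden), np.zeros(hidden), []
    for x_t in x_seq:
        z = x_t @ W + h @ U + b
        i, f, g, o = np.split(z, 4)          # input, forget, candidate, output gates
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)           # cell state update
        h = o * np.tanh(c)                   # hidden state
        outputs.append(h)
    return np.array(outputs)

rng = np.random.default_rng(0)
ecg = np.sin(np.linspace(0, 8 * np.pi, 200))   # stand-in for one ECG window
kernels = rng.standard_normal((4, 7))          # 4 untrained conv filters, width 7
feats = conv1d_features(ecg, kernels)          # (200, 4) spatial feature sequence
hidden = 8
W = rng.standard_normal((4, 4 * hidden)) * 0.1
U = rng.standard_normal((hidden, 4 * hidden)) * 0.1
b = np.zeros(4 * hidden)
h_seq = lstm_forward(feats, W, U, b, hidden)   # (200, 8) temporal encoding
print(feats.shape, h_seq.shape)
```

In the actual model the convolutional weights are learned (via the autoencoder), the recurrence runs in both directions (BLSTM), and a per-sample classification layer sits on top of the temporal encoding.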

The ability to detect, identify, and segment cardiac waves increases the potential for impact on future research in cardiology. By combining the vital waveform information with other symptom-recognition methods, heart-related diseases can be diagnosed more accurately, and high-throughput automated ECG diagnostic systems can be developed to serve the needs of large-scale population screening for disease prevention [7].

6.1 Directions for Future Research

The materials in this dissertation research can provide a baseline for future stud-

ies in ECG analysis using DL methodologies. In Chapters 3 and 4, different types

of DL architectures have tackled various aspects of ECG analysis. In Chapter 5,

a new DL architecture has been introduced for successful E2E ECG segmentation. It remains an open question whether DL can further improve ECG analysis and diagnostic accuracy. Our recommendations are as follows.

First, our dissertation research is limited to the cardiac wave formations in QTDB; training on additional datasets could achieve better generalization, especially datasets containing symptoms that are absent from QTDB.

Our second recommendation is to utilize the E2E ECG segmentation approach for diagnostic purposes in ECG analysis. All the reported results are based on finding the cardiac waves and segmenting ECG samples; however, the primary utility of the approach is to arrive at a diagnostic conclusion. For example, it is of high importance to measure the abnormality detection performance on the Seattle Criteria using the proposed method.

Third, other forms of DL methodologies should be explored. In this research, we explored DL components appropriate for the problem statement, such as convolutional layers, autoencoders, BLSTM cells, dense layers, and dropout layers. However, as DL research continues, other models and topologies may become good candidates for the ECG delineation problem. Attention models [95] and capsule networks [96] are excellent candidates to explore for ECG delineation.

Fourth, the E2E ECG delineation approach should be applied to potential applications. Potential applications for an automated ECG delineation system are mentioned in Section 1.1; using the proposed model for those applications could be a future direction.

Fifth, the energy efficiency of DL methods should be measured, especially for wearable devices. One of the concerns over portable medical devices is their energy consumption. A reliable, wearable medical

device should function on limited energy sources such as batteries. Therefore,

algorithms implemented in wearable medical devices should be energy efficient.

This concern has not been addressed in this dissertation and should be investi-

gated further.

Bibliography

[1] Electrocardiogram (ECG or EKG). https://www.heart.org/en/health-topics/heart-attack/diagnosing-a-heart-attack/electrocardiogram-ecg-or-ekg. Accessed: 2019-02-17.

[2] Belt with textile electrodes. http://www.e-projects.ubi.pt/smart-clothing/achievements4.html. Accessed: 2019-02-17.

[3] A. Goldberger, Z. Goldberger, and A. Shvilkin. Goldberger’s Clinical Electro-

cardiography: A Simplified Approach: Ninth Edition. May 2017, pp. 1–275.

[4] J. A. Drezner, M. J. Ackerman, J. Anderson, E. Ashley, C. A. Asplund, A. L.

Baggish, M. Börjesson, B. C. Cannon, D. Corrado, J. P. DiFiori, P. Fischbach,

V. Froelicher, K. G. Harmon, H. Heidbuchel, J. Marek, D. S. Owens, S. Paul,

A. Pelliccia, J. M. Prutkin, J. C. Salerno, C. M. Schmied, S. Sharma, R. Stein,

V. L. Vetter, and M. G. Wilson. “Electrocardiographic interpretation in ath-

letes: the ‘Seattle Criteria’”. In: British Journal of Sports Medicine 47.3 (2013),

pp. 122–124. ISSN: 0306-3674. DOI: 10.1136/bjsports-2012-092067. URL:

https://bjsm.bmj.com/content/47/3/122.

[5] R. Tafreshi, A. Jaleel, J. Lim, and L. Tafreshi. “Automated analysis of ECG

waveforms with atypical QRS complex morphologies”. In: Biomedical Sig-

nal Processing and Control 10 (2014), pp. 41–49. ISSN: 17468094. DOI: 10.1016/j.bspc.2013.12.007. URL: http://linkinghub.elsevier.com/retrieve/pii/S1746809413001821.

[6] M. Yuce, V. Davutoglu, B. Ozbala, S. Ercan, N. Kizilkan, I. Sari, M. Akcay, C.

Akkoyun, A. Dogan, M. H. Alici, and F. Yavuz. “Fragmented QRS Is Pre-

dictive of Myocardial Dysfunction, Pulmonary Hypertension and Sever-

ity in Mitral Stenosis”. In: The Tohoku Journal of Experimental Medicine 220.4

(2010), pp. 279–283. DOI: 10.1620/tjem.220.279.

[7] M. J. Campbell, X. Zhou, C. Han, H. Abrishami, G. Webster, C. Y. Miyake, C. T.

Sower, J. B. Anderson, T. K. Knilans, and R. J. Czosek. “Pilot study analyz-

ing automated ECG screening of hypertrophic cardiomyopathy”. English

(US). In: Heart Rhythm 14.6 (June 2017), pp. 848–852. ISSN: 1547-5271. DOI:

10.1016/j.hrthm.2017.02.011.

[8] J. Schläpfer and H. J. Wellens. “Computer-Interpreted Electrocardiograms”.

In: Journal of the American College of Cardiology 70.9 (2017), pp. 1183–1192.

ISSN: 0735-1097. DOI : 10.1016/j.jacc.2017.07.723. URL : http://www.

onlinejacc.org/content/70/9/1183.

[9] M. J. Campbell, X. Zhou, C. Han, H. Abrishami, G. Webster, C. Y. Miyake,

T. K. Knilans, and R. J. Czosek. “Electrocardiography Screening for Hy-

pertrophic Cardiomyopathy”. In: Pacing and Clinical Electrophysiology 39.9

(), pp. 944–950. DOI: 10.1111/pace.12913. URL: https://onlinelibrary.

wiley.com/doi/abs/10.1111/pace.12913.

[10] T. Glasmachers. “Limits of End-to-End Learning”. In: CoRR abs/1704.08305

(2017). arXiv: 1704.08305. URL: http://arxiv.org/abs/1704.08305.

[11] N. P. Hughes, L. Tarassenko, and S. J. Roberts. “Markov Models for Auto-

mated ECG Interval Analysis”. In: Proceedings of the 16th International Con-

ference on Neural Information Processing Systems. NIPS’03. Whistler, British

Columbia, Canada: MIT Press, 2003, pp. 611–618. URL : http://dl.acm.

org/citation.cfm?id=2981345.2981422.

[12] P. Laguna, N. V. Thakor, P. Caminal, R. Jané, H. Yoon, A. Bayés L., V. Marti,

and J. Guindo. “New algorithm for QT interval analysis in 24-hour Holter

ECG: performance and applications”. In: Medical and Biological Engineer-

ing and Computing 28.1 (1990), pp. 67–73. ISSN: 1741-0444. DOI : 10.1007/

BF02441680. URL: https://doi.org/10.1007/BF02441680.

[13] J. Pan and W. J. Tompkins. “A Real-Time QRS Detection Algorithm”. In:

IEEE Transactions on Biomedical Engineering BME-32.3 (1985), pp. 230–236.

ISSN: 0018-9294. DOI: 10.1109/TBME.1985.325532.

[14] W. Kaiser, T. S. Faber, and M. Findeis. “Automatic learning of rules. A

practical example of using artificial intelligence to improve computer-based

detection of myocardial infarction and left ventricular hypertrophy in the

12-lead ECG.” In: Journal of electrocardiology 29 Suppl (1996), pp. 17–20.

ISSN: 0022-0736.

[15] R. V. Andreao, B. Dorizzi, and J. Boudy. “ECG signal analysis through

hidden Markov models”. In: IEEE Transactions on Biomedical Engineering

53.8 (2006), pp. 1541–1549. ISSN : 0018-9294. DOI : 10 . 1109 / TBME . 2006 .

877103.

[16] C. Schmehil, D. Malhotra, and D. R. Patel. “Electrocardiography Screening

for Hypertrophic Cardiomyopathy”. In: Transl Pediatr 6.3 (), pp. 199–206.

[17] A. Alattar and N. Maffulli. “The Validity of Adding ECG to the Prepartic-

ipation Screening of Athletes An Evidence Based Literature Review”. In:

Translational medicine @ UniSa 11 (2014). 25674543[pmid], pp. 2–13. ISSN :

2239-9747. URL: https://www.ncbi.nlm.nih.gov/pubmed/25674543.

[18] Advances in Cardiovascular Diagnostic Devices. https://aabme.asme.org/posts/advances-in-cardiovascular-diagnostic-devices. Accessed: 2018-12-05.

[19] Institute of Medicine. Accelerating the Development of Biomarkers for Drug

Safety: Workshop Summary. Ed. by Steve Olson, Sally Robinson, and Robert

Giffin. ISBN: 978-0-309-13124-7. DOI: 10.17226/12587.

[20] A. Naing, H. Veasey-Rodrigues, D. S. Hong, S. Fu, G. S. Falchook, J. J.

Wheler, A. M. Tsimberidou, S. Wen, S. N. Fessahaye, E. C. Golden, J. Aaron,

M. S. Ewer, and R. Kurzrock. “Electrocardiograms (ECGs) in phase I an-

ticancer drug development: the MD Anderson Cancer Center experience

with 8518 ECGs”. In: Annals of Oncology 23.11 (2012), pp. 2960–2963. DOI :

10.1093/annonc/mds130. URL : http://dx.doi.org/10.1093/annonc/

mds130.

[21] P. Martínez-Losas, J. Higueras, and J. C. Gómez-Polo. “The computerized interpretation of the electrocardiogram: A double-edged sword?” In: Enfermería Clínica (English Edition) 27.2 (2017), pp. 136–137. ISSN: 2445-1479. DOI: 10.1016/j.enfcle.2016.10.002. URL: http://www.sciencedirect.com/science/article/pii/S2445147916300224.

[22] A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. Ch. Ivanov,

R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley. “Phys-

ioBank, PhysioToolkit, and PhysioNet: Components of a New Research

Resource for Complex Physiologic Signals”. In: Circulation 101.23 (2000),

e215–e220.

[23] M. D. Zeiler and R. Fergus. “Visualizing and Understanding Convolu-

tional Networks”. In: CoRR abs/1311.2901 (2013). arXiv: 1311.2901. URL :

http://arxiv.org/abs/1311.2901.

[24] J. L. Willems, P. Arnaud, J. H. van Bemmel, P. J. Bourdillon, C. R Brohet,

S. D. Volta, J. D. Andersen, R. Degani, B. Denis, and M. Demeester. “As-

sessment of the performance of electrocardiographic computer programs

with the use of a reference data base.” In: Circulation 71 3 (1985), pp. 523–

34.

[25] P. Kligfield, L. S. Gettes, J. J. Bailey, R. Childers, B. J. Deal, E. W. Han-

cock, G. v. Herpen, J. A. Kors, P. Macfarlane, D. M. Mirvis, O. Pahlm, P.

Rautaharju, and G. S. Wagner. “Recommendations for the Standardization

and Interpretation of the Electrocardiogram”. In: Circulation 115.10 (2007),

pp. 1306–1324. DOI: 10.1161/CIRCULATIONAHA.106.180200.

[26] A. K. Dohare, V. Kumar, and R. Kumar. “An efficient new method for

the detection of QRS in electrocardiogram”. In: Computers and Electrical

Engineering 40.5 (2014), pp. 1717–1730. ISSN: 00457906. DOI: 10.1016/j.compeleceng.2013.11.004. URL: http://dx.doi.org/10.1016/j.compeleceng.2013.11.004.

[27] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. http://www.

deeplearningbook.org. MIT Press, 2016.

[28] M. Campbell, A. Joseph Hoane Jr., and F. Hsu. “Deep Blue”. In: Artif. In-

tell. 134.1-2 (Jan. 2002), pp. 57–83. ISSN: 0004-3702. DOI : 10.1016/S0004-

3702(01)00129-1. URL : http://dx.doi.org/10.1016/S0004-3702(01)

00129-1.

[29] I. Beraza and I. Romero. “Comparative study of algorithms for ECG seg-

mentation”. In: Biomed. Signal Proc. and Control 34 (2017), pp. 166–173.

[30] A. Graves. Supervised Sequence Labelling with Recurrent Neural Networks.

Studies in Computational Intelligence. Berlin: Springer, 2012. URL: https:

//cds.cern.ch/record/1503877.

[31] M. Rosenblatt. Proceedings of a symposium on time series analysis. Wiley, 1963.

[32] G. E. Hinton, S. Osindero, and Y. Teh. “A Fast Learning Algorithm for

Deep Belief Nets”. In: Neural Comput. 18.7 (July 2006), pp. 1527–1554. ISSN:

0899-7667. DOI : 10.1162/neco.2006.18.7.1527. URL : http://dx.doi.

org/10.1162/neco.2006.18.7.1527.

[33] K. Hornik, M. Stinchcombe, and H. White. “Multilayer feedforward net-

works are universal approximators”. In: Neural Networks 2.5 (1989), pp. 359

–366. ISSN: 0893-6080. DOI : https://doi.org/10.1016/0893- 6080(89)

90020-8. URL : http://www.sciencedirect.com/science/article/pii/

0893608089900208.

[34] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. “Parallel Distributed

Processing: Explorations in the Microstructure of Cognition, Vol. 1”. In:

ed. by David E. Rumelhart, James L. McClelland, and CORPORATE PDP

Research Group. Cambridge, MA, USA: MIT Press, 1986. Chap. Learning

Internal Representations by Error Propagation, pp. 318–362. ISBN: 0-262-

68053-X. URL: http://dl.acm.org/citation.cfm?id=104279.104293.

[35] D. R. Wilson and T. R. Martinez. “The general inefficiency of batch training for gradient descent learning”. In: Neural Networks 16.10 (2003), pp. 1429–1451. ISSN: 0893-6080. DOI: 10.1016/S0893-6080(03)00138-2. URL: http://www.sciencedirect.com/science/article/pii/S0893608003001382.

[36] M. Li, T. Zhang, Y. Chen, and A. J. Smola. “Efficient Mini-batch Training

for Stochastic Optimization”. In: Proceedings of the 20th ACM SIGKDD In-

ternational Conference on Knowledge Discovery and Data Mining. KDD ’14.

New York, New York, USA: ACM, 2014, pp. 661–670. ISBN: 978-1-4503-

2956-9. DOI : 10.1145/2623330.2623612. URL : http://doi.acm.org/10.

1145/2623330.2623612.

[37] X. Glorot and Y. Bengio. “Understanding the difficulty of training deep

feedforward neural networks”. In: In Proceedings of the International Confer-

ence on Artificial Intelligence and Statistics (AISTATS’10). Society for Artificial

Intelligence and Statistics. 2010.

[38] K. He, X. Zhang, S. Ren, and J. Sun. “Delving Deep into Rectifiers: Sur-

passing Human-Level Performance on ImageNet Classification”. In: The

IEEE International Conference on Computer Vision (ICCV). 2015.

[39] J. Snoek, H. Larochelle, and R. P. Adams. “Practical Bayesian Optimization

of Machine Learning Algorithms”. In: Proceedings of the 25th International

Conference on Neural Information Processing Systems - Volume 2. NIPS’12.

Lake Tahoe, Nevada: Curran Associates Inc., 2012, pp. 2951–2959. URL :

http://dl.acm.org/citation.cfm?id=2999325.2999464.

[40] B. Y. Hsueh, W. Li, and I-Chen Wu. “Stochastic Gradient Descent with

Hyperbolic-Tangent Decay”. In: CoRR abs/1806.01593 (2018). arXiv: 1806.

01593. URL: http://arxiv.org/abs/1806.01593.

[41] S. J. Hanson and L. Pratt. “Advances in Neural Information Processing

Systems 1”. In: ed. by David S. Touretzky. San Francisco, CA, USA: Mor-

gan Kaufmann Publishers Inc., 1989. Chap. Comparing Biases for Minimal

Network Construction with Back-propagation, pp. 177–185. ISBN: 1-558-

60015-9. URL: http://dl.acm.org/citation.cfm?id=89851.89872.

[42] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. “Gradient-based learning

applied to document recognition”. In: Proceedings of the IEEE 86.11 (1998),

pp. 2278–2324. ISSN: 0018-9219. DOI: 10.1109/5.726791.

[43] A. Krizhevsky, I. Sutskever, and G. E Hinton. “Imagenet classification

with deep convolutional neural networks”. In: Advances in Neural Infor-

mation Processing Systems. 2012, p. 2012.

[44] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A.

Karpathy, A. Khosla, M. S. Bernstein, A. C. Berg, and F. F. Li. “ImageNet

Large Scale Visual Recognition Challenge”. In: Int. J. Comput. Vision 115.3

(2015), pp. 211–252. ISSN: 0920-5691. DOI : 10.1007/s11263-015-0816-y.

URL : http://dx.doi.org/10.1007/s11263-015-0816-y.

[45] S. Longpre and A. Sohmshetty. “Facial Keypoint Detection”. In: Stanford

University (2016).

[46] J. Long, E. Shelhamer, and T. Darrell. “Fully Convolutional Networks for

Semantic Segmentation”. In: CoRR abs/1411.4038 (2014). arXiv: 1411.4038.

URL : http://arxiv.org/abs/1411.4038.

[47] Q. Meng, D. Catchpoole, D. Skillicom, and P. J. Kennedy. “Relational au-

toencoder for feature extraction”. In: 2017 International Joint Conference on

Neural Networks (IJCNN). 2017, pp. 364–371. DOI : 10.1109/IJCNN.2017.

7965877.

[48] C. Kong and S. Lucey. “Take it in your stride: Do we need striding in

CNNs?” In: CoRR abs/1712.02502 (2017). arXiv: 1712.02502. URL : http:

//arxiv.org/abs/1712.02502.

[49] H. Wu and X. Gu. “Towards dropout training for convolutional neural

networks”. In: Neural Networks 71 (2015), pp. 1 –10. ISSN: 0893-6080. DOI :

https://doi.org/10.1016/j.neunet.2015.07.007. URL : http://www.

sciencedirect.com/science/article/pii/S0893608015001446.

[50] F. A. Gers, J. Schmidhuber, and F. Cummins. “Learning to forget: continual

prediction with LSTM”. In: 1999 Ninth International Conference on Artificial

Neural Networks ICANN 99. (Conf. Publ. No. 470). Vol. 2. 1999, 850–855 vol.2.

DOI : 10.1049/cp:19991218.

[51] Y. Bengio, P. Simard, and P. Frasconi. “Learning long-term dependencies

with gradient descent is difficult”. In: IEEE Transactions on Neural Networks

5.2 (1994), pp. 157–166. ISSN: 1045-9227. DOI: 10.1109/72.279181.

[52] M. B. Ring. “Learning Sequential Tasks by Incrementally Adding Higher

Orders”. In: Advances in Neural Information Processing Systems 5. Ed. by C.

L. Giles, S. J. Hanson, and J. D. Cowan. San Mateo, California: Morgan

Kaufmann Publishers, 1993, pp. 115–122.

[53] S. Hochreiter and J. Schmidhuber. “Long Short-Term Memory”. In: Neural

Comput. 9.8 (Nov. 1997), pp. 1735–1780. ISSN: 0899-7667. DOI : 10 . 1162 /

neco.1997.9.8.1735. URL : http://dx.doi.org/10.1162/neco.1997.9.

8.1735.

[54] A. Graves, N. Jaitly, and A. R. Mohamed. “Hybrid speech recognition with

Deep Bidirectional LSTM”. In: 2013 IEEE Workshop on Automatic Speech

Recognition and Understanding. 2013, pp. 273–278. DOI : 10 . 1109 / ASRU .

2013.6707742.

[55] A. Graves and J. Schmidhuber. “Framewise phoneme classification with

bidirectional LSTM networks”. In: Proceedings. 2005 IEEE International Joint

Conference on Neural Networks, 2005. Vol. 4. 2005, 2047–2052 vol. 4. DOI :

10.1109/IJCNN.2005.1556215.

[56] I. S. N. Murthy and U. C. Niranjan. “Component wave delineation of ECG

by filtering in the Fourier domain”. In: Medical and Biological Engineering

and Computing 30.2 (1992), pp. 169–176. ISSN: 1741-0444. DOI : 10 . 1007 /

BF02446127. URL: https://doi.org/10.1007/BF02446127.

[57] C. Li, C. Zheng, and C. Tai. “Detection of ECG characteristic points using

wavelet transforms”. In: IEEE Transactions on Biomedical Engineering 42.1

(1995), pp. 21–28. ISSN: 0018-9294. DOI: 10.1109/10.362922.

[58] O. Sayadi and M. B. Shamsollahi. “A model-based Bayesian framework

for ECG beat segmentation”. In: Physiological Measurement 30.3 (2009), p. 335.

URL : http://stacks.iop.org/0967-3334/30/i=3/a=008.

[59] S. Kiranyaz, T. Ince, and M. Gabbouj. “Real-Time Patient-Specific ECG

Classification by 1-D Convolutional Neural Networks”. In: IEEE Transac-

tions on Biomedical Engineering 63.3 (2016), pp. 664–675. ISSN: 0018-9294.

DOI : 10.1109/TBME.2015.2468589.

[60] P. Rajpurkar, A. Y. Hannun, M. Haghpanahi, C. Bourn, and A. Y. Ng.

“Cardiologist-Level Arrhythmia Detection with Convolutional Neural Net-

works”. In: CoRR abs/1707.01836 (2017). arXiv: 1707.01836.

[61] A. Karimipour and M. R. Homaeinezhad. “Real-time electrocardiogram P-

QRS-T detection-delineation algorithm based on quality-supported anal-

ysis of characteristic templates”. In: Computers in Biology and Medicine 52

(2014), pp. 153–165. ISSN : 18790534. DOI : 10.1016/j.compbiomed.2014.

07.002. URL: http://dx.doi.org/10.1016/j.compbiomed.2014.07.002.

[62] M. R. Homaeinezhad, S. Atyabi, E. Daneshvar, A. Ghaffari, and M. Tah-

masebi. “Discrete wavelet-aided delineation of PCG signal events via anal-

ysis of an area curve length-based decision statistic”. In: Cardiovascular en-

gineering (Dordrecht, Netherlands) 10.4 (2010), pp. 218–234. ISSN: 1573-6806.

DOI : 10.1007/s10558-010-9110-3. URL: http://www.ncbi.nlm.nih.gov/

pubmed/21181267.

[63] B. Frénay, G. de Lannoy, and M. Verleysen. “Emission Modelling for Su-

pervised ECG Segmentation using Finite Differences”. In: 4th European

Conference of the International Federation for Medical and Biological Engineer-

ing: ECIFMBE 2008 23–27 November 2008 Antwerp, Belgium. Ed. by S. J. Van-

der, P. Verdonck, M. Nyssen, and J. Haueisen. Berlin, Heidelberg: Springer

Berlin Heidelberg, 2009, pp. 1212–1216. ISBN: 978-3-540-89208-3. DOI : 10.

1007/978-3-540-89208-3_290.

[64] S. K. Mukhopadhyay, M. Mitra, and S. Mitra. “Time plane ECG feature

extraction using Hilbert transform, variable threshold and slope reversal

approach”. In: (2011), pp. 1–4. DOI: 10.1109/ICCIndA.2011.6146675.

[65] J. P. Martinez, R. Almeida, S. Olmos, A. P. Rocha, and P. Laguna. “A Wavelet-

Based ECG Delineator Evaluation on Standard Databases”. In: IEEE Trans-

actions on Biomedical Engineering 51.4 (2004), pp. 570–581. ISSN: 00189294.

DOI : 10.1109/TBME.2003.821031.

[66] G. Schreier, D. Hayn, and S. Lobodzinski. “Development of a new QT al-

gorithm with heterogenous ECG databases”. In: Journal of Electrocardiology

36 (2003), pp. 145 –150. ISSN: 0022-0736. DOI: https://doi.org/10.1016/

j.jelectrocard.2003.09.039. URL : http://www.sciencedirect.com/

science/article/pii/S0022073603001274.

[67] J. Dumont, A. I. Hernandez, and G. Carrault. “Parameter optimization of

awavelet-based electrocardiogram delineator with an evolutionary algo-

rithm”. In: Computers in Cardiology, 2005. 2005, pp. 707–710. DOI: 10.1109/

CIC.2005.1588202.

[68] N. Ouyang, K. Yamauchi, and M. Ikeda. “Training a NN with ECG to di-

agnose the hypertrophic portions of HCM”. In: 1 (1998), 306–309 vol.1.

[69] W. Shi and I. Kheidorov. “Hybrid hidden Markov models for ECG seg-

mentation”. In: 2010 Sixth International Conference on Natural Computation.

Vol. 6. 2010, pp. 3323–3328. DOI: 10.1109/ICNC.2010.5583618.

[70] Q. A. Rahman, L. G. Tereshchenko, M. Kongkatong, T. Abraham, M. R.

Abraham, and H. Shatkay. “Identifying hypertrophic cardiomyopathy pa-

tients by classifying individual heartbeats from 12-lead ECG signals”. In:

2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).

2014, pp. 224–229. DOI: 10.1109/BIBM.2014.6999159.

[71] R. Warner, Y. Ariel, M. Gasperina, and P. Okin. “Improved electrocardio-

graphic detection of left ventricular hypertrophy.” In: Journal of electrocar-

diology 35 Suppl (2002), pp. 111–5. ISSN: 0022-0736. DOI : 10.1054/jelc.

2002.37163. URL: http://www.ncbi.nlm.nih.gov/pubmed/12539107.

[72] P. De Chazal and R. B. Reilly. “A patient-adapting heartbeat classifier us-

ing ECG morphology and heartbeat interval features”. In: IEEE Transac-

tions on Biomedical Engineering 53.12 (2006), pp. 2535–2543. ISSN: 00189294.

DOI : 10.1109/TBME.2006.883802.

[73] U. R. Acharya, H. Fujita, O. S. Lih, Y. Hagiwara, J. H. Tan, and M. Adam.

“Automated detection of arrhythmias using different intervals of tachy-

cardia ECG segments with convolutional neural network”. In: Information

Sciences 405 (2017), pp. 81 –90. ISSN : 0020-0255. DOI : https://doi.org/

10.1016/j.ins.2017.04.012. URL : http://www.sciencedirect.com/

science/article/pii/S0020025517306539.

[74] B. Pourbabaee, M. J. Roshtkhari, and K. Khorasani. “Feature leaning with

deep Convolutional Neural Networks for screening patients with parox-

ysmal atrial fibrillation”. In: 2016 International Joint Conference on Neural

Networks (IJCNN). 2016, pp. 5057–5064. DOI: 10.1109/IJCNN.2016.7727866.

[75] J. Rubin, S. Parvaneh, A. Rahman, B. Conroy, and S. Babaeizadeh. “Densely

connected convolutional networks for detection of atrial fibrillation from

short single-lead ECG recordings”. In: Journal of Electrocardiology 51.6, Sup-

plement (2018), pp. S18–S21. ISSN: 0022-0736. DOI: 10.1016/j.jelectrocard.2018.08.008. URL: http://www.sciencedirect.com/science/article/pii/S0022073618303315.

[76] P. Warrick and M. N. Homsi. “Cardiac arrhythmia detection from ECG

combining convolutional and long short-term memory networks”. In: 2017

Computing in Cardiology (CinC). 2017, pp. 1–4. DOI: 10.22489/CinC.2017.

161-460.

[77] A. Mostayed, J. Luo, X. Shu, and W. Wee. “Classification of 12-Lead ECG

Signals with Bi-directional LSTM”. In: ().

[78] H. Abrishami, M. Campbell, C. Han, R. Czosek, and X. Zhou. “P-QRS-T

localization in ECG using deep learning”. In: 2018 IEEE EMBS International

Conference on Biomedical Health Informatics (BHI). 2018, pp. 210–213. DOI :

10.1109/BHI.2018.8333406.

[79] H. Abrishami, M. Campbell, C. Han, R. Czosek, and X. Zhou. “Supervised

ECG Interval Segmentation Using LSTM Neural Network”. In: 2018 Int’l

Conf. Bioinformatics and Computational Biology (BIOCOMP’18). 2018, pp. 71–

77.

[80] Y. Xiang, Z. Lin, and J. Meng. “Automatic QRS complex detection using two-level convolutional neural network”. In: Biomed Eng Online (2018). DOI: 10.1186/s12938-018-0441-4. URL: http://arxiv.org/abs/1705.08520.

[81] Y. Luo, R. H. Hargraves, A. Belle, O. Bai, X. Qi, K. R. Ward, M. P. Pfaffen-

berger, and K. Najarian. “A Hierarchical Method for Removal of Baseline

Drift from Biomedical Signals: Application in ECG Analysis”. In: TheSci-

entificWorldJournal. 2013.

[82] S. Santurkar, D. Tsipras, A. Ilyas, and A. Madry. “How Does Batch Nor-

malization Help Optimization?” In: arXiv e-prints, arXiv:1805.11604 (2018),

arXiv:1805.11604. arXiv: 1805.11604 [stat.ML].

[83] Normal ECG. https://meds.queensu.ca/central/assets/modules/ECG/normal_ecg.html. Accessed: 2019-07-04.

[84] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdi-

nov. “Dropout: A Simple Way to Prevent Neural Networks from Overfit-

ting”. In: J. Mach. Learn. Res. 15.1 (Jan. 2014), pp. 1929–1958. ISSN : 1532-

4435. URL: http://dl.acm.org/citation.cfm?id=2627435.2670313.

[85] L. Y. Di Marco and L. Chiari. “A wavelet-based ECG delineation algo-

rithm for 32-bit integer online processing”. In: BioMedical Engineering On-

Line 10.1 (2011), p. 23. ISSN: 1475-925X. DOI : 10.1186/1475-925X-10-23.

URL : https://doi.org/10.1186/1475-925X-10-23.

[86] X. Li and X. Wu. “Long Short-Term Memory based Convolutional Re-

current Neural Networks for Large Vocabulary Speech Recognition”. In:

CoRR abs/1610.03165 (2016). arXiv: 1610.03165.

[87] M. Schuster and K. K. Paliwal. “Bidirectional recurrent neural networks”.

In: IEEE Transactions on Signal Processing 45.11 (1997), pp. 2673–2681. ISSN:

1053-587X. DOI: 10.1109/78.650093.

[88] Understanding LSTM Networks. http://colah.github.io/posts/2015-

08-Understanding-LSTMs/. Accessed: 2019-07-04.

[89] C. Dyer, M. Ballesteros, W. Ling, A. Matthews, and N. A. Smith. “Transition-Based Dependency Parsing with Stack Long Short-Term Memory”. In:

CoRR abs/1505.08075 (2015). arXiv: 1505 . 08075. URL : http : / / arxiv .

org/abs/1505.08075.

[90] X. Ma and E. H. Hovy. “End-to-end Sequence Labeling via Bi-directional

LSTM-CNNs-CRF”. In: CoRR abs/1603.01354 (2016). arXiv: 1603.01354.

URL : http://arxiv.org/abs/1603.01354.

[91] D. P. Kingma and J. Ba. “Adam: A Method for Stochastic Optimization”. In: CoRR abs/1412.6980 (2014). Published as a conference paper at the 3rd International Conference on Learning Representations, San Diego, 2015. URL: http://arxiv.org/abs/1412.6980.

[92] C. Dong, C. C. Loy, K. He, and X. Tang. “Image Super-Resolution Using

Deep Convolutional Networks”. In: IEEE Transactions on Pattern Analysis

and Machine Intelligence 38.2 (2016), pp. 295–307. ISSN: 0162-8828. DOI: 10.

1109/TPAMI.2015.2439281.

[93] P. Baldi. “Autoencoders, Unsupervised Learning and Deep Architectures”.

In: Proceedings of the 2011 International Conference on Unsupervised and Trans-

fer Learning Workshop - Volume 27. UTLW’11. Washington, USA: JMLR.org,

2011, pp. 37–50. URL : http://dl.acm.org/citation.cfm?id=3045796.

3045801.

[94] S. Ruder. “An overview of gradient descent optimization algorithms”.

In: CoRR abs/1609.04747 (2016). arXiv: 1609.04747. URL : http://arxiv.

org/abs/1609.04747.

[95] M. Luong, H. Pham, and C. D. Manning. “Effective Approaches to Attention-

based Neural Machine Translation”. In: CoRR abs/1508.04025 (2015). arXiv:

1508.04025. URL: http://arxiv.org/abs/1508.04025.

[96] S. Sabour, N. Frosst, and G. E. Hinton. “Dynamic Routing Between Cap-

sules”. In: CoRR abs/1710.09829 (2017). arXiv: 1710 . 09829. URL : http :

//arxiv.org/abs/1710.09829.
