Miniproject Report
A PROJECT REPORT
Submitted by
G. POOJITHA
B. VENKATA RAVINDRA
Dr. M. PRADEEPA
of
         BACHELOR OF TECHNOLOGY
                              in
                    NOVEMBER 2018
                                BONAFIDE CERTIFICATE
Certified that this minor project report titled “FEATURE EXTRACTION OF ECG
SIGNALS BY PRINCIPAL COMPONENT ANALYSIS” is the bonafide work of
“G. POOJITHA (15UEEC0043), B. VENKATA RAVINDRA (15UEEC0016)” who
carried out the project work under my supervision.
Submitted for the minor project work viva-voce examination held on ........................
                              ACKNOWLEDGEMENT
We express our deepest gratitude to our respected Founder President and Chancellor
Col. Prof. Dr. R. RANGARAJAN, Foundress President Dr. R. SAGUNTHALA
RANGARAJAN, and our Chairperson, Managing Trustee and Vice President.
We are very grateful to our beloved Vice Chancellor Dr. V. RAMACHANDRAN
for providing us with an environment to complete our Minor project successfully.
We are thankful to our esteemed Director (Academics), Dr. ANNE KOTESWARA RAO, for
providing a wonderful environment to complete our Minor project successfully.
We are extremely thankful and express our gratitude to our Dean, Dr. JAYASANKAR, for his
valuable guidance and support in the completion of this Minor project.
                                      ABSTRACT
                          TABLE OF CONTENTS
CHAPTER NO       TITLE
                 ABSTRACT
                 LIST OF TABLES
                 LIST OF FIGURES
  1              INTRODUCTION
          1.1      Introduction
          1.2      Basics of ECG
          1.2.1    Importance of ECG
          1.2.2    Problems identified by ECG
          1.2.3    ECG waveform
          1.2.4    Noise present in ECG signal
          1.2.5    Arrhythmia
          1.2.6    Types of Arrhythmia
  2              LITERATURE REVIEW
          2.1      Development history of feature extraction
  3              SOFTWARE AND DATABASE DESCRIPTION
          3.1      MATLAB
          3.1.1    Math Graphics Programming
          3.1.2    Scale Integrate Deploy
          3.2      MIT-BIH Arrhythmia database
  4              PRINCIPAL COMPONENT ANALYSIS
          4.1      Principal Component Analysis
          4.1.1    Introduction
          4.1.2    Objectives of principal component analysis
          4.1.3    Assumptions of PCA
          4.2      Significance of PCA
          4.3      Applications of PCA
          4.4      Advantages and Disadvantages of PCA
          4.4.1    Advantages
          4.4.2    Disadvantages
  5              METHODOLOGY
          5.1      Major steps in the analysis of the ECG signals
          5.1.1    ECG Preprocessing
          5.2      QRS Detection
          5.2.1    QRS Detection Algorithm
          5.2.2    Noise elimination from ECG signals
          5.2.2.1  Band Pass Integer Filter
          5.2.3    Derivative
          5.2.4    Squaring
          5.2.5    Moving Window Integral
          5.3      Algorithm of PCA
  6              RESULTS AND DISCUSSION
  7              CONCLUSION
                 REFERENCES
                              LIST OF TABLES
Table No   Title
  1.1      Intervals of normal heart beat
  6.1      Performance of Pan-Tompkins method on MIT-BIH arrhythmia database
                             LIST OF FIGURES
                                     CHAPTER 1
INTRODUCTION
1.1          INTRODUCTION
             The electrocardiogram (ECG) is a vital tool that records the electrical activity of
the heart. Every heart contraction produces an electrical impulse that is detected by
electrodes placed on the skin. Each heartbeat produces a series of waves with time-varying
morphology. These waves are caused by voltage variations of the cardiac cells. Digital
signal processing techniques are used to extract useful information from the signals
recorded from the body.
             The ECG provides information about the state of the heart. Each heartbeat consists
of an atrial depolarization (P wave), a ventricular depolarization (QRS complex) and a
ventricular repolarization (T wave). These three stages repeat continuously.
1.2          Basics of ECG
A normal ECG signal contains waves, intervals, segments and one complex, defined
below.
                   4. Complex (QRS): The combination of multiple waves grouped
                     together.
1. P-wave
1. A deflection is referred to as a wave only if it passes the baseline.
2. If the first wave is negative, it is referred to as a Q-wave. If the first
   wave is not negative, the QRS complex does not possess a Q-wave,
   regardless of the appearance of the QRS complex.
3. All positive waves are referred to as R-waves. The first positive wave
   is simply an “R-wave” (R). The second positive wave is called the
   “R-prime wave” (R’). If a third positive wave occurs (rare), it is
   referred to as the “R-bis wave” (R’’).
4. Any negative wave occurring after a positive wave is an S-wave.
5. Large waves are referred to by capital letters (Q, R, S), and
   small waves by lower-case letters (q, r, s).
                          Figure 1.2 Types of QRS complex
3. T-wave
              The T-wave should be concordant with the QRS complex, meaning that a
net positive QRS complex should be followed by a positive T-wave, and vice versa.
Otherwise, there is discordance, which may be due to pathology. A negative T-wave is
also called an inverted T-wave.
    Interval          Duration (s)        Represents
    QT interval       0.40 to 0.43        Ventricular depolarization and repolarization
    ST interval       0.32 ...            Ventricular repolarization
1. Baseline drift
              Respiration and body movement can produce a low-frequency drift of the signal
away from its desired baseline. This drift is usually low in frequency (typically below
5 Hz), low in amplitude and of ongoing duration. It can be removed with a high-pass
filter with a cut-off frequency of about 5 Hz.
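As an illustration of the high-pass filtering described above, the following Python sketch (an assumed translation, not the project's MATLAB code) applies a zero-phase Butterworth high-pass filter with SciPy. The function name `remove_baseline_drift`, the filter order of 2 and the use of forward-backward filtering are assumptions; the 5 Hz default cut-off follows the text. A cut-off this high suits QRS detection, whereas diagnostic ECG processing normally uses much lower cut-offs (about 0.5 Hz or less) so that the ST segment is not distorted.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def remove_baseline_drift(ecg, fs, cutoff_hz=5.0, order=2):
    """Suppress low-frequency baseline drift with a zero-phase high-pass filter.

    ecg       : 1-D array of ECG samples
    fs        : sampling frequency in Hz (e.g. 360 for MIT-BIH records)
    cutoff_hz : high-pass cut-off; 5 Hz as suggested in the text
    order     : Butterworth order (assumed value, not taken from the report)
    """
    b, a = butter(order, cutoff_hz, btype="highpass", fs=fs)
    # filtfilt runs the filter forwards and backwards, so the output has no phase shift
    return filtfilt(b, a, np.asarray(ecg, dtype=float))
```

For example, `remove_baseline_drift(raw_ecg, fs=360)` would filter a record sampled at 360 Hz, where `raw_ecg` is a hypothetical input array.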
2. Power line interference
              Because the output level of the ECG signal is low, it can be affected by power
line interference. Induction from neighboring equipment or faulty grounding of equipment
can cause the ECG leads to pick up 50 Hz interference from nearby power line currents.
Removing potentially interfering equipment and power cables is an effective way to reduce
or remove this noise. Power line interference typically has a fixed frequency, a very
large amplitude and an ongoing duration.
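Besides removing interfering equipment, a common software remedy is a narrow notch filter at the mains frequency. The Python sketch below is an illustrative assumption rather than part of the report: it uses SciPy's `iirnotch` with an arbitrarily chosen quality factor of 30, and `mains_hz` should be set to 60 Hz wherever the supply frequency is 60 Hz.

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt


def remove_powerline(ecg, fs, mains_hz=50.0, quality=30.0):
    """Attenuate power line interference with a zero-phase IIR notch filter.

    mains_hz : interference frequency to remove (50 Hz in the text)
    quality  : notch quality factor; the -3 dB bandwidth is about mains_hz / quality
    """
    b, a = iirnotch(mains_hz, quality, fs=fs)
    # forward-backward filtering keeps the QRS complexes free of phase distortion
    return filtfilt(b, a, np.asarray(ecg, dtype=float))
```

A narrow notch removes the fixed-frequency interference while leaving neighbouring ECG frequencies almost untouched.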
   4.        Flat line / missing lead
  Figure 1.3 Noise of ECG signals: (a) baseline drift, (b) power line interference,
             (c) EMG noise
1.2.5         Arrhythmia
              Normally, the sinoatrial (SA) node generates the initial electrical impulse
and begins the cascade of events that results in a heartbeat. For a normal healthy person,
the ECG appears as a nearly periodic signal, with depolarization followed by
repolarization at equal intervals. However, this rhythm sometimes becomes irregular.
                 1. Sinus rhythm
                     This is the normal rhythm of the heart and results from proper
                     activation of the entire heart in the correct sequence. Any variation
                     from sinus rhythm is termed an arrhythmia.
                 2. Ventricular fibrillation
                     This is a life-threatening arrhythmia characterized by rapid,
                     irregular activation of the ventricles, which prevents an
                     effective mechanical contraction. During ventricular fibrillation,
                     the ECG has no distinctive QRS complexes but instead consists of
                     an undulating baseline of variable amplitude. Although the sinus
                     node continues to function properly, P waves cannot be discerned
                     in the waveform.
3. Ventricular tachycardia
   This rhythm is characterized by wide, bizarre QRS complexes and
   frequent premature ventricular contractions in a row. It may be
   paroxysmal or chronic and often signifies underlying myocardial
   disease.
4. Atrial flutter
   This is an AV node-independent intra-atrial macro-reentry rhythm, in
   which the atrial anatomy sustains a loop of continuous
   depolarization, often around the tricuspid valve annulus in the right
   atrium. Atrial flutter can be paroxysmal or chronic and may be
   associated with extremely rapid ventricular response rates.
5. Atrial fibrillation
   This is characterized by rapid, irregular activation of the atria. Its
   causes can include reentry and abnormal automaticity. It can be
   paroxysmal or chronic.
                                        CHAPTER 2
LITERATURE REVIEW
              Sudden cardiac death (SCD) is death from cardiac causes within one hour of the
onset of symptoms. According to the Heart Disease and Stroke Statistics report of the
American Heart Association (AHA), the annual number of deaths from cardiovascular disease
worldwide is expected to rise to more than 23.6 million by 2030. The number of deaths
increases every year, and in most cases SCD is the result of ventricular tachycardia (VT)
or ventricular fibrillation (VF). The implantable cardioverter-defibrillator has been
considered the best protection against sudden death from ventricular arrhythmias in
high-risk individuals. However, most sudden deaths occur in individuals who do not have
high-risk profiles. Feature extraction determines the amplitudes and intervals of the
P-QRS-T waves in order to classify heartbeat activity as normal or abnormal. These
amplitudes and intervals characterize the performance of the heart of every person.
Extraction of P-QRS-T features from the ECG has been studied for a long time, and many
techniques and transformations have been proposed for accurate ECG analysis and feature
extraction.
               Tien-En Chen et al. [2] discussed recognition of the first (S1) and second (S2)
heart sounds based only on acoustic characteristics. The usual assumptions about the
individual durations of S1 and S2 and the S1-S2 and S2-S1 time intervals are not involved
in the detection process. The technique uses a deep neural network (DNN) to recognize the
S1 and S2 heart sounds. The heart sound signals are first converted into a sequence of
mel-frequency cepstral coefficients (MFCCs), and the K-means algorithm is then applied to
cluster the features into groups to refine their representation and discriminative
capability. The refined features are then fed to the DNN classifier to identify S1 and
S2. The DNN-based method achieves high precision, with an accuracy greater than 91%.
               Farzad et al. [3] focused on detecting specific fiducial points of the
seismocardiogram (SCG) signal, with or without the electrocardiogram (ECG) R-wave as the
reference point. The identified fiducial points were used to find the cardiac time
intervals. Because of the sensitivity and complexity of the SCG signal, the algorithm was
designed to robustly reject low-quality cardiac cycles, which are those that include
unrecognizable fiducial points. When the algorithm was applied to simultaneously recorded
ECG and SCG signals, the desired fiducial points of the SCG signal were estimated
effectively with a high detection rate.
               Saxena et al. [4] discussed an approach for efficient feature extraction from
ECG signals, using a technique developed for signal retrieval, data compression and
feature extraction of ECG signals. After the signal is retrieved from the compressed ECG
data, it was found that the network not only compresses the ECG data but also improves
the quality of the recovered ECG signal by eliminating high-frequency noise present in
the original ECG signal.
               The algorithm proposed by Mallat and Hwang was first applied to QRS
detection. R-peaks are found by scanning for simultaneous modulus maxima in the
relevant scales of the wavelet transform (WT). For a valid R-peak, the estimated
regularity must be greater than zero, i.e., α > 0.
                                     CHAPTER 3
SOFTWARE AND DATABASE DESCRIPTION
3.1 MATLAB
              MATLAB helps take ideas beyond the desktop. Analyses can be run on
larger data sets and scaled up to clusters and clouds. MATLAB code can be integrated
with other languages, enabling algorithms and applications to be deployed within web,
enterprise and production systems.
3.1.2.1       Key Features
Heart Association (AHA) database, it played an interesting role in stimulating
manufacturers of arrhythmia analyzers to compete on the basis of objectively measurable
performance, and much of the current appreciation of the value of common databases,
both for basic research and for medical device development and evaluation, can be
attributed to it.
                                     CHAPTER 4
PRINCIPAL COMPONENT ANALYSIS
4.1.1 Introduction
4.1.2        Objectives of principal component analysis
Figure 4.2 Plot of data showing regression line and orthogonal line
4.2           Significance of PCA
              Patterns can be hard to find in data of very high dimension, and that is where
PCA comes into the picture. PCA is a powerful analysis tool. Its goal is to extract the
important information from a table of observations, represent it as a set of new
orthogonal variables called principal components, and display the pattern of similarity
or dissimilarity of the observations and of the variables. Being a simple, non-parametric
method of extracting relevant information from confusing data sets, PCA is used widely
in all forms of analysis. PCA provides a way to reduce a complex data set to a lower
dimension, revealing the sometimes hidden, simplified structure that often underlies it.
              One major advantage of PCA is that once these patterns have been found in the
data, the data can be compressed by reducing the number of dimensions without much loss
of information. This technique can be used in image compression, data compression,
variable reduction and many other applications [3]. Its goal is to reduce the number of
dimensions of a numerical measurement of several variables; with this dimensional
reduction, it simplifies a statistical problem with minimal loss of information. In
signal processing, PCA is also used to decorrelate linear mixtures of signals, for
example as a preprocessing step when separating signals generated by statistically
independent sources. This is performed by representing the data in a new coordinate
system.
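To make the compression idea concrete, the following NumPy sketch (an illustrative assumption, not code from this project) computes principal components with a singular value decomposition, keeps the top `k` of them, and reconstructs the data from those `k` components. When the data lie close to a low-dimensional subspace, a small `k` retains almost all of the variance.

```python
import numpy as np


def pca_compress(X, k):
    """Compress X (n observations x d variables) to k principal components.

    Returns the n x k scores, the reconstruction from those k components,
    and the fraction of the total variance explained by the k components.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                                    # centre each variable
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                              # k orthonormal directions (rows)
    scores = Xc @ components.T                       # coordinates in the new system
    X_hat = scores @ components + mean               # back-projection to d dimensions
    explained = np.sum(S[:k] ** 2) / np.sum(S ** 2)  # variance ~ squared singular values
    return scores, X_hat, explained
```

For three correlated variables generated by two underlying factors, `pca_compress(X, 2)` stores two numbers per observation instead of three while keeping nearly all of the variance.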
4.4     Advantages and Disadvantages of PCA
4.4.1 Advantages
                                     CHAPTER 5
METHODOLOGY
The noise filter removes or reduces noise components from various sources in the ECG.
Cardiac cycle detection involves detecting the QRS complex peak corresponding to each
beat. QRS complex detection is implemented using the Pan-Tompkins QRS detection
algorithm.
Feature Extraction
Feature extraction includes the formulation and selection of characteristic features that
are significantly related to the abnormalities. Additional features are extracted by
performing complexity analysis on the signal.
5.1.1         ECG Preprocessing
              ECG preprocessing is the treatment of the ECG signal before its useful
information is extracted. Different filter-design methods are used to remove unwanted
parts of the signal or the different types of noise present, such as power line
interference, which is the most common ECG noise. Filtering is also used to isolate
desired parts of the signal, such as the components lying within a certain frequency range.
              The main task of ECG preprocessing is the accurate detection of the QRS
complex. The QRS complex is marked by the R-peak, which is the easiest point to detect
because it has the largest amplitude. The R-peak is a key indicator of the heart rate and
heart rate variability and can reveal irregularities such as abnormally slow or
accelerated heart rates. The time interval between two successive R-peaks is known as the
R-R interval. It is used to calculate the heart rate as
                       \text{bpm} = \frac{f_s \times 60}{R_{peak2} - R_{peak1}}
where f_s is the sampling frequency in Hz, bpm is the number of heartbeats per minute, and
R_{peak1} and R_{peak2} are the sample positions of two consecutive R-peaks. The normal
resting heart rate is 60 to 100 bpm, so the recorded ECG can be used to determine whether
the heart activity is normal or abnormal.
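The heart-rate formula can be checked with a few lines of Python. This sketch assumes the R-peaks are given as sample indices (for example, the output of a QRS detector) and flags rates outside the 60 to 100 bpm range quoted above; the function names are hypothetical.

```python
import numpy as np


def heart_rate_bpm(r_peaks, fs):
    """Beat-to-beat heart rate in beats per minute from R-peak sample indices."""
    rr = np.diff(np.asarray(r_peaks, dtype=float))  # R-R intervals in samples
    return fs * 60.0 / rr                           # bpm = fs * 60 / (Rpeak2 - Rpeak1)


def is_normal_rate(bpm, low=60.0, high=100.0):
    """True where the rate lies in the normal 60-100 bpm range."""
    bpm = np.asarray(bpm, dtype=float)
    return (bpm >= low) & (bpm <= high)
```

At 360 Hz, R-peaks 360 samples apart give 60 bpm and R-peaks 270 samples apart give 80 bpm.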
              A QRS detection algorithm is used to extract the R points of the ECG, which are
needed to obtain the R-R intervals. In this work, the Pan-Tompkins algorithm was used
for QRS detection. This algorithm was chosen because it has a reported sensitivity of
99.69% and a positive predictivity of 99.77% when tested on the MIT-BIH database. The
QRS detection algorithm developed by Pan and Tompkins recognizes QRS complexes based on
analysis of their slope, amplitude and width.
5.2.1         QRS Detection Algorithm
Figure 5.2 shows the various processes involved in the analysis of the ECG signal. To
isolate the portion of the wave where QRS energy is predominant, the signal is passed
through a band pass filter composed of cascaded high-pass and low-pass integer filters.
The signal is then differentiated, squared and time-averaged, and finally the peak is
detected by applying a threshold. The band pass filter is designed from a special class of
digital filters that require only integer coefficients. Since it was not possible to design
the desired band pass filter directly with this special approach, the design consists of
cascaded low-pass and high-pass filter sections. The next processing step is
differentiation, a standard technique for finding the high slopes that normally distinguish
the QRS complexes from other ECG waves. Up to this point in the algorithm, all the
processes are accomplished by linear digital filters. The differentiated waveform is then
subjected to a nonlinear transformation: point-by-point squaring of the signal samples.
This transformation makes all the data positive prior to subsequent integration and
accentuates the higher frequencies in the signal obtained from the differentiation process.
These higher frequencies are normally characteristic of the QRS complex.
The squared waveform passes through a moving window integrator. This integrator sums
the area under the squared waveform over a 150 ms interval, advances one sample
interval and integrates the new 150 ms window. The width of the window was chosen
to be long enough to include the duration of extended abnormal QRS complexes,
but short enough that it does not overlap both the QRS complex and the T wave.
Adaptive amplitude thresholds applied to the band pass filtered waveform and to the
moving integration waveform are based on continuously updated estimates of the peak
signal level and the peak noise level. After preliminary detection by the adaptive
thresholds, decision processes make the final determination as to whether a detected
event was a QRS complex. A measurement algorithm calculates the QRS duration after each
QRS complex is detected. Thus, two waveform features are available for subsequent
analysis: the RR interval and the QRS duration.
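The adaptive threshold on the integrated waveform can be sketched as follows. The update rules for the running signal peak (SPKI), running noise peak (NPKI) and threshold follow those published by Pan and Tompkins, as does the 200 ms refractory period. The initial estimates and the simple local-maximum peak picker are hypothetical simplifications. The search-back and T-wave checks of the full algorithm are omitted, so this is a sketch rather than a complete detector.

```python
import numpy as np


def adaptive_threshold_peaks(integrated, fs, refractory_s=0.200):
    """Pick QRS peaks on the moving-window-integrated signal with an adaptive threshold.

    Simplified sketch: no search-back, no T-wave check, crude initial estimates.
    """
    x = np.asarray(integrated, dtype=float)
    # candidate peaks: larger than the left neighbour and not smaller than the right one
    candidates = np.flatnonzero((x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:])) + 1
    spki = x.max() / 3.0          # initial signal-peak estimate (hypothetical choice)
    npki = x.mean() / 2.0         # initial noise-peak estimate (hypothetical choice)
    refractory = int(refractory_s * fs)
    qrs = []
    for i in candidates:
        if qrs and i - qrs[-1] < refractory:
            continue              # no new QRS within the refractory period
        threshold = npki + 0.25 * (spki - npki)
        if x[i] > threshold:
            qrs.append(int(i))
            spki = 0.125 * x[i] + 0.875 * spki   # running estimate of signal peaks
        else:
            npki = 0.125 * x[i] + 0.875 * npki   # running estimate of noise peaks
    return np.array(qrs, dtype=int)
```

Because both estimates are exponentially weighted averages, the threshold follows slow changes in QRS amplitude and noise level. Smaller peaks such as T waves then fall below the threshold and are counted as noise.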
The band pass filter of the QRS (heart rate) detection algorithm reduces noise in the
ECG signal by matching the spectrum of the average QRS complex. Thus, it attenuates
T wave interference as well as noise. The pass band that maximizes the QRS energy is
approximately 5–15 Hz. The filter implemented in this algorithm is a recursive integer
filter in which poles are located to cancel the zeros on the unit circle of the z plane.
A low pass filter and a high pass filter are cascaded to form the band pass filter.
The low pass filter has the transfer function
                 H(z) = \frac{(1 - z^{-6})^2}{(1 - z^{-1})^2}
The high pass filter is implemented by subtracting a first-order low pass filter from an
all-pass filter with delay. This low pass filter is an integer-coefficient filter with the
transfer function
                 H_{lp}(z) = \frac{Y(z)}{X(z)} = \frac{1 - z^{-32}}{1 - z^{-1}}
The high pass filter is obtained by dividing the output of this low pass filter by its DC
gain (32) and subtracting the result from the original signal delayed by 16 samples:
                 H_{hp}(z) = \frac{P(z)}{X(z)} = z^{-16} - \frac{1}{32} \cdot \frac{1 - z^{-32}}{1 - z^{-1}}
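The two integer filter sections can be written as difference equations and run with `scipy.signal.lfilter`. This Python sketch is an illustration under stated assumptions. It uses the coefficients exactly as given by the transfer functions, which Pan and Tompkins designed for a 200 Hz sampling rate. At other rates, such as the 360 Hz used for MIT-BIH, the pass band shifts in proportion to the sampling frequency unless the signal is resampled first.

```python
import numpy as np
from scipy.signal import lfilter


def pt_lowpass(x):
    """H(z) = (1 - z^-6)^2 / (1 - z^-1)^2, DC gain 36."""
    b = np.zeros(13)
    b[0], b[6], b[12] = 1.0, -2.0, 1.0   # (1 - z^-6)^2 = 1 - 2z^-6 + z^-12
    a = [1.0, -2.0, 1.0]                 # (1 - z^-1)^2 = 1 - 2z^-1 + z^-2
    return lfilter(b, a, x)


def pt_highpass(x):
    """Hhp(z) = z^-16 - (1/32)(1 - z^-32)/(1 - z^-1), zero gain at DC."""
    b = np.zeros(33)
    b[0] = -1.0 / 32.0                   # numerator over the common denominator (1 - z^-1):
    b[16] = 1.0                          # -1/32 + z^-16 - z^-17 + (1/32) z^-32
    b[17] = -1.0
    b[32] = 1.0 / 32.0
    a = [1.0, -1.0]
    return lfilter(b, a, x)


def pt_bandpass(x):
    """Cascade of the low pass and high pass sections."""
    return pt_highpass(pt_lowpass(np.asarray(x, dtype=float)))
```

A constant input is a quick check of both sections: the low pass output settles at 36 times the input, and the high pass output settles at zero.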
5.2.3 Derivative
After the signal has been filtered it is then differentiated to provide information about
the slope of the QRS complex.
                 H(z) = 0.1\,(2 + z^{-1} - z^{-3} - 2z^{-4})
In an integer implementation, the gain of 0.1 is approximated by the fraction 1/8. This
derivative approximates the ideal derivative over the range from DC to 30 Hz.
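The derivative stage is a five-point FIR filter. In the Python sketch below (an assumed illustration), `gain=0.1` reproduces H(z) as written, and `gain=1/8` gives the integer-friendly approximation mentioned in the text. Once the filter is full, a unit-slope ramp produces exactly 1.0 per sample with the 0.1 gain. This gives a quick way to confirm the coefficients.

```python
import numpy as np
from scipy.signal import lfilter


def pt_derivative(x, gain=0.1):
    """Five-point derivative H(z) = gain * (2 + z^-1 - z^-3 - 2z^-4)."""
    b = gain * np.array([2.0, 1.0, 0.0, -1.0, -2.0])   # FIR taps for z^0 ... z^-4
    return lfilter(b, [1.0], np.asarray(x, dtype=float))
```

The taps are antisymmetric, so the filter delays the slope estimate by two samples. That delay must be taken into account when peak positions are mapped back to the original ECG.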
5.2.4         Squaring Function
The squaring function that the signal passes through is a nonlinear operation.
The equation that implements this operation is
                 y(nT) = [x(nT)]^2
This operation makes all data points in the processed signal positive, and it amplifies the
output of the derivative process nonlinearly. It emphasizes the higher frequencies in the
signal, which are mainly due to the QRS complex.
5.2.5         Moving Window Integral
The slope of the R wave alone is not a guaranteed way to detect a QRS event. Many
abnormal QRS complexes that have large amplitude and long duration (not very steep
slopes) might not be detected using information about the R wave only. Thus, more
information must be extracted from the signal to detect a QRS event. Moving window
integration extracts features in addition to the slope of the R wave. It is implemented
with the following difference equation:
                 y(nT) = \frac{1}{N}\left[x(nT - (N-1)T) + x(nT - (N-2)T) + \cdots + x(nT)\right]
where N is the number of samples in the width of the moving window. The width of the
window should be approximately the same as the widest possible QRS complex. If the
window is too large, the integration waveform will merge the QRS and T complexes
together. On the other hand, if the window is too small, a QRS complex could produce
several peaks at the output of the stage. The width of the window should be chosen
experimentally.
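Squaring and moving-window integration can be sketched together in Python. The window length N is derived from the 150 ms width given earlier, which is 54 samples at 360 Hz and 30 samples at 200 Hz. As the text notes, it should be tuned experimentally. The causal moving average through `lfilter` is one of several equivalent implementations and is an assumption here.

```python
import numpy as np
from scipy.signal import lfilter


def square_and_integrate(derivative, fs, window_s=0.150):
    """Square the derivative output, then apply the moving-window integrator.

    y(nT) = (1/N) * [x(nT-(N-1)T) + ... + x(nT)], with x the squared signal.
    Returns the integrated signal and the window length N in samples.
    """
    squared = np.asarray(derivative, dtype=float) ** 2   # point-by-point squaring
    N = max(1, int(round(window_s * fs)))                 # window width in samples
    kernel = np.full(N, 1.0 / N)                          # equal weights summing to 1
    return lfilter(kernel, [1.0], squared), N
```

A single spike in the derivative becomes a flat pulse N samples wide, which shows how the window spreads each QRS event over roughly its expected duration.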
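5.3           Algorithm of PCA
Each heartbeat is represented as a column vector of samples:
                 y(k) = [y(1), y(2), \ldots, y(M)]^T            (1)
where M is the number of samples in each heartbeat. Thus, the heartbeats y_1, y_2, ..., y_N
are the N observations, and the entire ensemble of heartbeats is represented by the M × N
matrix Y = [y_1, y_2, ..., y_N] [8].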
      1. Calculate the mean vector: the mean heartbeat vector is calculated as in (3):
                 \bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i            (3)
      2. Subtract the mean from each heartbeat:
                 y_{adj,i} = y_i - \bar{y}            (4)
      3. Compute the covariance matrix, as shown in (6):
                 C = \frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{y})(y_i - \bar{y})^T            (6)
      4. Compute the eigenvectors e_i and eigenvalues λ_i of the covariance matrix:
                 C \, e_i = \lambda_i \, e_i            (7)
      5. Choose components and form a feature vector. The eigenvector with the
         highest eigenvalue is the principal component. The eigenvectors are then ordered
         by eigenvalue from highest to lowest, which gives the components in order of
         significance.
      6. The dimensionality is then reduced by selecting the K principal components
         that retain the physiological information. The percentage of variance r_K
         retained by the first K eigenvalues is obtained by applying (8):
                 r_K = \frac{\sum_{i=1}^{K} \lambda_i}{\sum_{i=1}^{M} \lambda_i} \times 100            (8)
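      8. Derive the new data set: the final data set is obtained by projecting the
         mean-adjusted heartbeats Y_{adj} = [y_{adj,1}, \ldots, y_{adj,N}] onto the feature
         vector F_K = [e_1, e_2, \ldots, e_K] formed from the first K eigenvectors, as in (10):
                 Y_{pca} = F_K^T \, Y_{adj}            (10)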
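The PCA steps can be sketched in NumPy as follows. This is an illustration under assumptions, not the project's MATLAB implementation. The heartbeats are stored as the N columns of an M × N matrix `Y`, and `numpy.linalg.eigh` is used because the covariance matrix is symmetric. K is chosen as the smallest number of components whose cumulative variance reaches a target, 99% by default, the level discussed in Chapter 6.

```python
import numpy as np


def pca_heartbeats(Y, var_target=0.99):
    """PCA of an M x N heartbeat matrix Y (one heartbeat per column).

    Returns (Y_pca, eigenvalues, cumulative variance fraction, K, C).
    """
    Y = np.asarray(Y, dtype=float)
    M, N = Y.shape
    y_mean = Y.mean(axis=1, keepdims=True)         # step 1: mean heartbeat
    Y_adj = Y - y_mean                             # step 2: mean-adjusted heartbeats
    C = (Y_adj @ Y_adj.T) / (N - 1)                # step 3: M x M covariance matrix
    lam, E = np.linalg.eigh(C)                     # step 4: C e_i = lambda_i e_i
    order = np.argsort(lam)[::-1]                  # step 5: largest eigenvalue first
    lam, E = lam[order], E[:, order]
    lam = np.clip(lam, 0.0, None)                  # drop tiny negative round-off values
    r = np.cumsum(lam) / np.sum(lam)               # step 6: cumulative variance fraction
    K = min(int(np.searchsorted(r, var_target)) + 1, M)  # smallest K with r_K >= target
    F_K = E[:, :K]                                 # feature vector: first K eigenvectors
    Y_pca = F_K.T @ Y_adj                          # step 8: K x N projected heartbeats
    return Y_pca, lam, r, K, C
```

The covariance returned here equals `np.cov(Y)`, because `np.cov` also treats rows as variables and columns as observations with an N - 1 normalisation.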
                                     CHAPTER 6
RESULTS AND DISCUSSION
The automatic QRS detection and extraction methods have been validated using the
MIT-BIH arrhythmia database [8][9]. The MIT-BIH arrhythmia database contains
records sampled at 360 Hz with 11-bit resolution over a ±5 mV range. Each record is
about 30 minutes long. For QRS detection, only the first channel of each record has
been considered. A total of 11 records have been considered. These records contain
QRS complexes with inverted polarity and low amplitude, ventricular ectopic beats with
low SNR, premature ventricular beats and premature atrial beats. The performance of the
algorithm has been evaluated by two parameters: sensitivity (Se), given by (1), and
positive predictivity (+P), given by (2).
                         Se(\%) = \frac{TP}{TP + FN} \times 100            (1)
                         +P(\%) = \frac{TP}{TP + FP} \times 100            (2)
where TP (true positives) is the number of heartbeats properly detected (i.e., QRS
complexes properly detected), FN (false negatives) is the number of heartbeats that were
not detected by the method (i.e., QRS complexes that were missed), and FP (false
positives) is the number of false detections (i.e., QRS complexes detected by the method
when no QRS complex is present).
The sensitivity (Se) indicates the percentage of heartbeats that were correctly
detected by the algorithm. The positive predictivity (+P) indicates the percentage of
detections that were true heartbeats. Table 6.1 presents the results of the method
applied to records from the MIT-BIH arrhythmia database.
Table 6.1 Performance of Pan-Tompkins method on MIT-BIH arrhythmia database
  Record   No. of beats      TP      FP     FN     Se (%)     +P (%)
    100        2273         2273      0      0     100.00     100.00
    101        1865         1863      9      1      99.94      99.51
    103        2084         2082      0      1      99.95     100.00
    105        2412         2359      5     28      96.13      99.93
    111        1774         1494      1      0     100.00      99.84
    112        2539         2539      5      4      98.82      99.78
    118        1535         1535      1      0     100.00      99.73
    203        2136         2094      1     21      99.00      99.80
    223        2605         2552      0     26      98.99      99.47
    232        1573         1569      1      2      99.87      99.93
   Total      20796        20360     23     83      90.25      90.72
The results show that our method detects QRS complexes precisely: over 20796 heartbeats,
it scored Se = 90.25% and +P = 90.72%.
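Equations (1) and (2) translate directly into code. The short Python sketch below applies them to the TP, FP and FN column totals printed in the table; computed this way, the totals give Se ≈ 99.59% and +P ≈ 99.89%, not the 90.25% and 90.72% printed in the Total row.

```python
def sensitivity(tp, fn):
    """Se (%) = TP / (TP + FN) * 100, equation (1)."""
    return 100.0 * tp / (tp + fn)


def positive_predictivity(tp, fp):
    """+P (%) = TP / (TP + FP) * 100, equation (2)."""
    return 100.0 * tp / (tp + fp)


# Column totals from the Total row of the results table
TP, FP, FN = 20360, 23, 83
print(f"Se = {sensitivity(TP, FN):.2f}%  +P = {positive_predictivity(TP, FP):.2f}%")
# prints: Se = 99.59%  +P = 99.89%
```

The same two functions can be applied row by row to reproduce the per-record columns from their TP, FP and FN counts.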
After QRS detection, 90 samples to the left of each R-peak and 90 samples after the R-peak
were selected. The PCA technique was then applied to select useful features that can be
used for further ECG recognition.
PCA provides a linear dimensionality reduction of the input Y(k) by projecting Y(k) onto
the directions of highest variance. Figure 6.2 shows the fraction of the total variance in
the data explained by each principal component. As can be seen, the first four principal
components account for around 99% of the variance.
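The beat-extraction step (90 samples on either side of each R-peak) that produces the heartbeat matrix for PCA can be sketched in NumPy as follows. The function name is hypothetical. The sketch assumes that the R-peak sample itself is included, giving 181 samples per beat, and it skips beats too close to the start or end of the record.

```python
import numpy as np


def extract_beats(ecg, r_peaks, half_width=90):
    """Stack fixed-length windows centred on R-peaks as columns of an M x N matrix."""
    ecg = np.asarray(ecg, dtype=float)
    M = 2 * half_width + 1                       # 90 before + R-peak + 90 after
    beats = [ecg[r - half_width:r + half_width + 1]
             for r in r_peaks
             if r - half_width >= 0 and r + half_width + 1 <= len(ecg)]
    if not beats:
        return np.empty((M, 0))                  # no complete beat in the record
    return np.column_stack(beats)                # one heartbeat per column
```

The resulting M × N matrix has the layout assumed by the PCA steps in Section 5.3, with one heartbeat per column.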
                                     CHAPTER 7
CONCLUSION
This project has presented an approach for QRS detection in electrocardiogram
signals using the Pan-Tompkins technique. In addition, principal component
analysis is implemented for feature extraction from the QRS complex.
                               REFERENCES