
Advances in Intelligent Systems and Computing 1429

Ewa Pietka · Pawel Badura · Jacek Kawa · Wojciech Wieclawek
Editors

Information Technology in Biomedicine
9th International Conference, ITIB 2022
Kamień Śląski, Poland, June 20–22, 2022
Proceedings
Advances in Intelligent Systems and Computing

Volume 1429

Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland

Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing,
Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering,
University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University,
Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas
at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao
Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology,
University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute
of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro,
Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management,
Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering,
The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications
on theory, applications, and design methods of Intelligent Systems and Intelligent
Computing. Virtually all disciplines such as engineering, natural sciences, computer
and information science, ICT, economics, business, e-commerce, environment,
healthcare, life science are covered. The list of topics spans all the areas of modern
intelligent systems and computing such as: computational intelligence, soft comput-
ing including neural networks, fuzzy systems, evolutionary computing and the fusion
of these paradigms, social intelligence, ambient intelligence, computational neuro-
science, artificial life, virtual worlds and society, cognitive science and systems,
Perception and Vision, DNA and immune based systems, self-organizing and
adaptive systems, e-Learning and teaching, human-centered and human-centric
computing, recommender systems, intelligent control, robotics and mechatronics
including human-machine teaming, knowledge-based paradigms, learning para-
digms, machine ethics, intelligent data analysis, knowledge management, intelligent
agents, intelligent decision making and support, intelligent network security, trust
management, interactive entertainment, Web intelligence and multimedia.
The publications within “Advances in Intelligent Systems and Computing” are
primarily proceedings of important conferences, symposia and congresses. They
cover significant recent developments in the field, both of a foundational and
applicable character. An important characteristic feature of the series is the short
publication time and world-wide distribution. This permits a rapid and broad
dissemination of research results.
Indexed by DBLP, INSPEC, WTI Frankfurt eG, zbMATH, Japanese Science and
Technology Agency (JST).
All books published in the series are submitted for consideration in Web of Science.
For proposals from Asia please contact Aninda Bose (aninda.bose@springer.com).

More information about this series at https://link.springer.com/bookseries/11156


Ewa Pietka · Pawel Badura · Jacek Kawa · Wojciech Wieclawek
Editors

Information Technology
in Biomedicine
9th International Conference, ITIB 2022
Kamień Śląski, Poland, June 20–22, 2022
Proceedings

Springer
Editors

Ewa Pietka, Pawel Badura, Jacek Kawa, and Wojciech Wieclawek
Faculty of Biomedical Engineering, Silesian University of Technology,
Gliwice, Poland

Advances in Intelligent Systems and Computing
ISSN 2194-5357   ISSN 2194-5365 (electronic)
ISBN 978-3-031-09134-6   ISBN 978-3-031-09135-3 (eBook)
https://doi.org/10.1007/978-3-031-09135-3
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

The continuous growth of the amount of medical information and the variety of
multimodal content creates a demand for fast and reliable technology able
to process data and deliver results in a user-friendly manner at the time and place
the information is needed. Multimodal acquisition systems, AI-powered applica-
tions, and computational methods for understanding human complexity, sound and
motion in physiotherapy, and prevention give new meaning to optimization of the
functional requirements of the healthcare system for the benefit of the patients. We
present to the readers this book, which includes chapters written by members of the
academic community. The scientific scope of the particular sections covers the aspects
listed below.
New explainable approaches to Computational Methods for Understanding
Human Complexity become a challenging artificial intelligence frontier able to
discover and investigate the relationship between internal biomedical processes
of the human body and the massive amount of heterogeneous data that can be
collected from humans (image data, sensor signals, phenotype, genotype, micro-
biome, clinical history, etc.). Approaches discovering the complex character of the
human body and mind employ data-intensive computational methods, e.g., pattern
recognition, machine learning, data science, and pervasive computing. Cognitive
disorders in Parkinson’s patients, human activity recognition, behavior changes due
to a hospital stay, multimodal emotion recognition systems, auditory processing
disorder in children, natural language features in patients with anorexia, and sleep
quality in population study are just examples of applications discussed in the first
part.
The Image Analysis section presents original studies reporting on scientific
approaches to support the CT and MR brain image analysis, laryngeal image
processing from high-speed video endoscopy, monitoring changes in corneal
structure, and skin layer segmentation. Computational pathology and cell studies
have been carried out on breast, cervical, and ileum images for analysis and clas-
sification. A study on chest X-rays in patients with COVID-19 is a response to the
need to face the pandemic. This section also covers fundamental studies on

comparing various approaches to analyzing microtomographic images, MRI images
with different matrix sizes, dense microorganism counting, and cephalometric
images.
The modern approach to Physiotherapy poses numerous tasks related to assessing
and monitoring the quality of human health, both in the functional and physio-
logical fields. They are carried out in hospitals, outpatient clinics, laboratories, and
everyday life. Physiotherapists more and more often indicate the need to search for
modern technological solutions whose implementation is possible only in close
collaboration with the engineering community. An interesting approach to neu-
rodevelopmental assessment in newborns and infants based on Prechtl’s general
movement estimation method is introduced by employing deep learning neural
networks to track the location of specific points during regular writhing movements.
Physiotherapy is very often related to severe pain. Its assessment is crucial for a safe
and efficient course of physiotherapy. Biomedical signals and video data of a head
pose may indicate the level of perceived pain. Their analysis in patients undergoing
fascial therapy serves as a warning against severe pain, displayed as a specific tissue
guard that protects it from damage. Feedback during rehabilitation or physiotherapy
is often required and should be understandable to the patient. Muscle training may
serve as an example for a simple presentation of the electromyographic signal.
The Sound and Motion section introduces innovative support in physiotherapy
by incorporating diagnosis before rehabilitation, real-time monitoring, and
patient-related therapeutic activity. Metro-rhythmic stimulations and spontaneous
activity become vital elements. A novel approach to correct static posture in daily
activity is introduced, based on analyzing physiological parameters acquired during
exercises that teach the patient proper body posture. Analyses of data acquired from
employees may predict the impact of professional training on the well-being and
health of individuals.
The Signal Processing section indicates several exciting approaches to biosignal
acquisition and analysis. Human activity recognition from data acquired by motion
sensors with high accuracy is reported. Real-time heart rate measurement by an
embedded accelerometer in smartphones may assist the active population’s
everyday lives. Contactless photoplethysmography has been employed to calculate
pulse from a face video recording. A preliminary study on monitoring uterine
contractions through electrohysterography is also discussed.
The Modeling and Simulation section presents studies that provide valuable
solutions by giving clear insights into various complex systems. One of them shows
a concept of a drift correction method to account for dynamic changes in meta-
bolomics data for better prediction outcomes of phenotypes in a biomedical setting.
The other derives a numerical finding of the function representing the
time-dependent virus transmission intensity coefficient in an exemplary infectious
disease model. Blood glucose level control in diabetic patients has been analyzed
by simulating the insulin-glucose system for a cohort of virtual patients with
sampled model parameters. Genes of various groups of patients with different
stages of bladder cancer were subjected to analysis, which indicated their
suitability for cancer grade and stage diagnostics.

The editors would like to express their gratitude to the authors who have sub-
mitted their original research papers and all the reviewers for their valuable com-
ments. Your effort has contributed to the high quality of the book we proudly pass
on to the readers.

June 2022 Ewa Pietka


Contents

Computational Methods for Understanding Human Complexity


Classifying Changes in Motion Behaviour Due to a Hospital Stay
Using Floor Sensor Data – A Single Case Study . . . . . . . . . . . . . . . . . . 3
Laura Liebenow, Jasmin Walter, Raoul Hoffmann, Axel Steinhage,
and Marcin Grzegorzek
Cloud-Based System for Vital Data Recording at Patients’ Home . . . . . 15
Alexander Keil, Kai Hahn, Rainer Brück, Nick Brombach, Nabeel Farhan,
and Olaf Gaus
Tele-BRAIN Diagnostics Support System for Cognitive Disorders
in Parkinson’s Patients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Andrzej W. Mitas, Agnieszka A. Gorzkowska,
Katarzyna Zawiślak-Fornagiel, Andrzej S. Małecki, Monika N. Bugdol,
Marcin Bugdol, Marta Danch-Wierzchowska, Julia M. Mitas,
and Robert Czarlewski
EEG Signal and Deep Learning Approach in Evaluation of Cognitive
Declines in Parkinson’s Disease . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Marcin Bugdol, Daniel Ledwoń, Monika N. Bugdol,
Katarzyna Zawiślak-Fornagiel, Marta Danch-Wierzchowska,
and Andrzej W. Mitas
The Role of Two-Dimensional Entropies in IRT-Based Pregnancy
Determination Evaluated on the Equine Model . . . . . . . . . . . . . . . . . . . 54
Marta Borowska, Małgorzata Maśko, Tomasz Jasiński,
and Małgorzata Domino
Eye-Tracking as a Component of Multimodal Emotion
Recognition Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Weronika Celniak and Piotr Augustyniak

Sleep Quality in Population Studies – Relationship of BMI and Sleep
Quality in Men . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Agnieszka Witek, Marcin Bugdol, and Anna Lipowicz
Exploring Developmental Factors Related to Auditory Processing
Disorder in Children . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Michał Kręcichwost, Natalia Moćko, Magdalena Ławecka,
Zuzanna Miodońska, Agata Sage, and Paweł Badura
Morphological Language Features of Anorexia Patients Based
on Natural Language Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Stella Maćkowska, Klaudia Barańska, Agnieszka Różańska,
Katarzyna Rojewska, and Dominik Spinczyk

Image Analysis
Comparison of Analytical and Iterative Algorithms
for Reconstruction of Microtomographic Phantom Images and Rat
Mandibular Scans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Paweł Lipowicz, Agnieszka Dardzińska-Głębocka, Marta Borowska,
and Ander Biguri
Comparison of Interpolation Methods for MRI Images Acquired
with Different Matrix Sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
Adam Cieślak, Adam Piórkowski, and Rafał Obuchowicz
Preprocessing of Laryngeal Images from
High-Speed Videoendoscopy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Justyna Kałuża, Paweł Strumiłło, Ewa Niebudek-Bogusz,
and Wioletta Pietruszewska
Construction of a Cephalometric Image Based on Magnetic
Resonance Imaging Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
Piotr Cenda, Rafał Obuchowicz, and Adam Piórkowski
Analysis of Changes in Corneal Structure During Intraocular
Pressure Measurement by Air-Puff Method . . . . . . . . . . . . . . . . . . . . . . 155
Magdalena Jędzierowska, Robert Koprowski, and Sławomir Wilczyński
Discrimination Between Stroke and Brain Tumour in CT Images
Based on the Texture Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
Monika Kobus, Karolina Sobczak, Mariusz Jangas, Adrian Świątek,
and Michał Strzelecki
The Influence of Textural Features on the Differentiation
of Coronary Vessel Wall Lesions Visualized on IVUS Images . . . . . . . . 181
Weronika Małek, Tomasz Roleder, and Elżbieta Pociask

Computer Aided Analysis of Clock Drawing Test Samples via
PACS Plugin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
Jacek Kawa, Maria Bieńkowska, Adam Bednorz, Michał Smoliński,
and Emilia J. Sitek
Study on the Impact of Neural Network Architecture and Region
of Interest Selection on the Result of Skin Layer Segmentation
in High-Frequency Ultrasound Images . . . . . . . . . . . . . . . . . . . . . . . . . . 208
Dżesika Szymańska, Joanna Czajkowska, Szymon Korzekwa,
and Anna Płatkowska-Szczerek
Skin Lesion Matching Algorithm for Application in Full Body
Imaging Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
Maria Strąkowska and Marcin Kociołek
Artifact Detection on X-ray of Lung with COVID-19 Symptoms . . . . . . 234
Alicja Moskal, Magdalena Jasionowska-Skop, Grzegorz Ostrek,
and Artur Przelaskowski
Semantic Segmentation of Abnormal Lung Areas on Chest X-rays
to Detect COVID-19 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
Artur Przelaskowski, Magdalena Jasionowska-Skop, and Grzegorz Ostrek
Verification of Selected Segmentation Methods in Relation
to the Structures of the Knee Joint . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
Weronika Żak and Piotr Zarychta
Rigid and Elastic Registrations Benchmark on Re-stained Histologic
Human Ileum Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
Paweł Cyprys, Natalia Wyleżoł, Adrianna Jagodzińska, Julia Uzdowska,
Bartłomiej Pyciński, and Arkadiusz Gertych
DVT: Application of Deep Visual Transformer in Cervical Cell
Image Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
Wanli Liu, Chen Li, Hongzan Sun, Weiming Hu, Haoyuan Chen,
and Marcin Grzegorzek
Image Classification in Breast Histopathology Using Transfer
and Ensemble Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
Yuchao Zheng, Chen Li, Xiaomin Zhou, Haoyuan Chen, Haiqing Zhang,
Yixin Li, Hongzan Sun, and Marcin Grzegorzek
PIS-Net: A Novel Pixel Interval Sampling Network for Dense
Microorganism Counting in Microscopic Images . . . . . . . . . . . . . . . . . . 307
Jiawei Zhang, Chen Li, Hongzan Sun, and Marcin Grzegorzek

Biomedical Engineering and Physiotherapy, Joint Activities,
Common Goals

Analysis of Expert Agreement on Determining the Duration
of Writhing Movements in Infants to Develop an Algorithm
in OSESEC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
Dominika Latos, Daniel Ledwoń, Marta Danch-Wierzchowska,
Iwona Doroniewicz, Alicja Affanasowicz, Katarzyna Kieszczyńska,
Małgorzata Matyja, and Andrzej Myśliwiec
Comparative Analysis of Selected Methods of Identifying
the Newborn’s Skeletal Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
Adam Mrozek, Marta Danch-Wierzchowska, Daniel Ledwoń,
Dariusz Badura, Iwona Doroniewicz, Monika N. Bugdol,
Małgorzata Matyja, and Andrzej Myśliwiec
Head Pose and Biomedical Signals Analysis in Pain
Level Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
Maria Bieńkowska, Aleksandra Badura, Andrzej Myśliwiec,
and Ewa Pietka
Electromyograph as a Tool for Patient Feedback in the Field
of Rehabilitation and Targeted Muscle Training . . . . . . . . . . . . . . . . . . 356
Michal Prochazka and Vladimir Kasik
Touchless Pulse Diagnostics Methods and Devices: A Review . . . . . . . . 367
Anna Pająk and Piotr Augustyniak
Methods of Functional Assessment of the Temporomandibular
Joints – Systematic Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
Damian Kania, Patrycja Romaniszyn-Kania, Marcin Bugdol,
Anna Lipowicz, Krzysztof Dowgierd, Małgorzata Kulesa-Mrowiecka,
Zofia Polewczyk, Łukasz Krakowczyk, and Andrzej Myśliwiec

Sound and Motion


The Effect of Therapeutic Commands on the Teaching
of Maintaining Correct Static Posture . . . . . . . . . . . . . . . . . . . . . . . . . . 393
Damian Kania, Tomasz Szurmik, Karol Bibrowicz,
Patrycja Romaniszyn-Kania, Mirosław Czak, Anna Mańka, Maria Rosiak,
Bruce Turner, Anita Pollak, and Andrzej W. Mitas
Improving the Process of Verifying Employee Potential During
Preventive Work Examinations – A Case Study . . . . . . . . . . . . . . . . . . 406
Marcin Bugdol, Anita Pollak, Patrycja Romaniszyn-Kania,
Monika N. Bugdol, Magdalena Jesionek, Aleksandra Badura,
Paulina Krasnodębska, Agata Szkiełkowska, and Andrzej W. Mitas

The Microphone Type and Voice Acoustic Parameters
Values – A Comparative Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
Łukasz Pawelec, Anna Lipowicz, Mirosław Czak, and Andrzej W. Mitas

Signal Processing
Activities Classification Based on IMU Signals . . . . . . . . . . . . . . . . . . . . 435
Monika N. Bugdol, Marta Danch-Wierzchowska, Marcin Bugdol,
and Dariusz Badura
Heart Rate Measurement Based on Embedded Accelerometer
in a Smartphone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443
Mirella Urzeniczok, Szymon Sieciński, and Paweł Kostka
Non-invasive Measurement of Human Pulse Based on Photographic
Images of the Face . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455
Jakub Gumulski, Marta Jankowska, and Dominik Spinczyk
The Validation Concept for Automatic Electroencephalogram
Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
Julia M. Mitas and Katarzyna Zawiślak-Fornagiel
Do Contractions of Abdominal Muscles Bias Parameters Describing
Contractile Activities of a Uterus? A Preliminary Study . . . . . . . . . . . . 474
Dariusz Radomski

Simulation and Modelling


Finding the Time-Dependent Virus Transmission Intensity
via Gradient Method and Adjoint Sensitivity Analysis . . . . . . . . . . . . . . 487
Krzysztof Łakomiec, Agata Wilk, Krzysztof Psiuk-Maksymowicz,
and Krzysztof Fujarewicz
A Revealed Imperfection in Concept Drift Correction
in Metabolomics Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
Jana Schwarzerova, Ales Kostoval, Adam Bajger, Lucia Jakubikova,
Iro Pierides, Lubos Popelinsky, Karel Sedlar, and Wolfram Weckwerth
Two-Dimensional vs. Scalar Control of Blood Glucose Level
in Diabetic Patients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510
Jarosław Śmieja and Artur Wyciślok
Gene Expression Analysis of the Bladder Cancer Patients Managed
by Radical Cystectomy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 522
Anna Tamulewicz and Alicja Mazur

The Influence of Low-Intensity Pulsed Ultrasound (LIPUS)
on the Properties of PLGA Biodegradable Polymer Coatings
on Ti6Al7Nb Substrate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534
Karolina Goldsztajn, Janusz Szewczenko, Joanna Jaworska,
Katarzyna Jelonek, Katarzyna Nowińska, Wojciech Kajzer,
and Marcin Basiaga

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545


Computational Methods
for Understanding Human Complexity
Classifying Changes in Motion Behaviour
Due to a Hospital Stay Using Floor Sensor
Data – A Single Case Study

Laura Liebenow1(B), Jasmin Walter1, Raoul Hoffmann2, Axel Steinhage2,
and Marcin Grzegorzek1,3
1 Institute of Medical Informatics, Universität zu Lübeck, Ratzeburger Allee 160,
23538 Lübeck, Germany
{laura.liebenow,j.walter}@student.uni-luebeck.de
2 SensProtect GmbH, Altlaufstraße 35, 85635 Höhenkirchen-Siegertsbrunn, Germany
{raoul.hoffmann,axel.steinhage}@sensprotect.com
3 University of Economics in Katowice, ul. Bogucicka 3, 40287 Katowice, Poland
marcin.grzegorzek@uni-luebeck.de

Abstract. In this paper we discuss methods to classify behavioural dif-
ferences of individuals before and after a hospital stay, which can be
detected using only data recorded with a sensor floor. The used sensor
floor offers the possibility of recording the movement behaviour of indi-
viduals as inconspicuously as possible because it is laid under the normal
floor covering. The aspect of unobtrusive monitoring promises a versatile
use, especially in nursing, which benefits both medical staff and patients.
A Multi-Layer Perceptron (MLP), a Support Vector Machine (SVM),
a Gaussian Naive Bayes (GNB) and a Random Decision Forest (RDF)
were used to classify the data. The results of the methods are represented
with the metrics Accuracy, F1-Score, Precision, Recall/Sensitivity and
Specificity. For MLP, SVM and RDF the results are very good and show
that behavioural changes can be detected using only data recorded by a
sensor floor.

Keywords: Sensor floor · Behaviour analysis · Machine learning

1 Introduction
For this project, we investigated the topic of behavioural analysis through motion
data recorded with floor sensors. Specifically, we look at the differences in the
behaviour of the same person before and after a hospitalisation.
The floor sensor used for this work is a relatively new type of sensor which,
due to its robustness, is able to cover large areas by being installed inconspicu-
ously under the normal floor covering. This makes versatile use conceivable for
support in many areas, such as care. Due to under-staffing in nursing homes,
hospitals and similar institutions, it is not possible to care for and medically
monitor all patients to the extent that would be desirable. For people in need of
care, it is necessary to take enough time to assess the short-term and long-term
consequences of their impairment in order to be able to adapt the degree of
treatment individually.
With our work we want to make a further step towards support with floor
sensors especially in the field of care. We set out to show that it is possible to
detect changes in a person’s behaviour using data from a sensor floor alone. For
this purpose, we have made the assumption that a person behaves differently
before and after a hospitalisation due to the physical and psychological stress of
that hospital stay. Our data set is an anonymised recording of floor sensor data
from a single nursing home resident whose entire room, including the bathroom,
is equipped with floor sensors that record data 24 h a day. There is also informa-
tion about a two-week hospital stay of the said resident. To analyse and classify
the data, we use different classification methods, such as a Multi-Layer Percep-
tron, Gaussian Naive Bayes classifier, a Support Vector Machine and Random
Decision Forests. We compare the results and analyse their classification quality
and reliability. In summary, our scientific contribution is to classify the data
measured by a new variant of sensors, a large area floor sensor, using known
methods. Our long-term goal is to use the ability to detect changes in people’s
behaviour only by floor sensor data to enable individualised analyses that pro-
vide predictive and decision-making support for treating doctors and nurses.

2 Related Work

Declining health in old age can result in behavioural changes and anoma-
lies when compared to a previous healthy period. These behaviours can unob-
trusively be measured by means of technical sensors in smart home environ-
ments, which was done using motion and environmental sensors in several stud-
ies for behavioural long-term monitoring and detection of behavioural changes
[3,18,19]. The cause of sudden changes in the behaviour and activity of elderly
people is often due to the appearance of diseases of all kinds and following hos-
pital stays for treatment. Hospital stays and associated treatments can have
two divergent effects: First, they are inherently stressful events that come with
physical and psychological side-effects, which can lead to a prolonged recovery
period after discharge. This can be visible in the daily behaviour as a reduction in
activity when compared to the baseline activity and behaviour that was observed
some time before the emergence of the disease that led to the admission to a
hospital. Second, as the hospital stays have the goal to improve the physical
condition of the patient, measurements of personal functionality can already be
improved at discharge, and improve further afterwards, when compared with
observations taken directly before the admission, when the effects of the disease
already showed as declined activity resulting from limitations due to the disease
that later needed treatment in a hospital. These patterns were also found in a
large study which examined functional changes related to hospital stays [11]. The recovery process
is typically not fully finished at the time of discharge; for example, a steady
increase in step counts per day can be measured in the days after a hospital stay
that involved major surgery [5]. The present work adds to this field of
research by contributing an end-to-end behavioural classification method which
uses floor sensor data and machine learning classifiers to distinguish between
behavioural profiles that were recorded on days directly before admission and
after discharge from the hospital. The method is evaluated in a single-case pilot
study on sensor floor data recorded in a senior residence from an inhabitant who
had a hospital stay in the recording period.

3 Material and Methods

This section introduces the methods that were used within this paper for the
data recording and analysis. In particular, it describes the hardware that was
used and the algorithms that were applied and developed for the classification
task of recognising behavioural changes after hospitalisation.

3.1 SensFloor

The sensor floor used for the data recordings in this study is the model
SensFloor® by the company Future-Shape GmbH [1]. This sensor floor is a contact-
less system as it detects changes in the electric capacitances on a triangular grid
of sensor fields. This measurement principle stands in contrast to sensor floors
that work by measuring force or pressure, where a direct contact to the sensors
is necessary [7]. The contact-less sensor measures through a flooring layer on top
of it. Until now it is commonly used in elderly care facilities for fall detection [15]
and for ambient assisted living at home [8], as it is easy to install and integrates
well with the environment as it is hidden under the normal flooring.
Our goal is to find out if it is possible to detect changes in a person’s behaviour
just from the data provided by a sensor floor. Around-the-clock observation that
has no influence on a person’s behaviour but still collects high-resolution data
would be best suited for this. The fact that the SensFloor® is able to collect the
most intensively researched parameters in gait analysis, like cadence, step width,
step length and timing, is promising for the outcome of our experiments. The
focus of the present work is on movement behaviour as it is found in trajectory
patterns and the gait analysis results show that a lot of information is contained
in the SensFloor® data [7].
The SensFloor® consists of a three-layer composite [7]. The layer at the bot-
tom is a thin aluminium foil for electrical shielding. In the middle, a polyester
fleece of 3 mm is placed. On top is an additional thin polyester fleece that
is coated in metal, which makes it electrically conductive. The whole system is
organised as a grid consisting of independent modules. Each module has a micro-
controller board in its centre. The microcontroller is connected to eight triangular
sensor shapes and to the power supply lines. The triangular-shaped sensor fields
are meant to measure the electric capacities. These modules are placed next to
each other up to a covering of the whole surface as seen in Fig. 1. The power
Fig. 1. The SensFloor® by Future-Shape GmbH measures electric capacitance changes
with microcontrollers connected to eight triangular-shaped sensors each [1]. It can be
installed under normal floorings. This photo is reprinted from [2]

supply lines of modules that lay next to each other are electrically connected
with textile stripes of the same fabric as the top layer. If the room has a special
shape it is also possible to cut the modules in the right shape so that they fit
in the corners, as long as the microcontroller circuit remains undamaged. The
whole installation is powered by a single power supply unit of 12 V that can be
placed at any position at the edge of the sensor floor.
There are three standard types of SensFloor®: low resolution (1 m × 0.5 m
modules, 16 sensors/m²), high resolution (0.5 m × 0.5 m, 32 sensors/m²) and
gait resolution (0.38 m × 0.38 m, 5 sensors/m²).


Besides these rectangular types, the SensFloor® sensors can also be cut into
any other shape that is needed. For the recordings for the present work, the low
resolution variant was used. The rooms in the elderly care home have a size of
20.5 m² with a bathroom with a size of 4.7 m². There are only three modules
in the bathroom as there are cutouts for the bathtub or shower.
The measurements of the electric capacitance are done by the microcontroller
that is placed in the middle of the modules. It measures the electric capacitances
of the eight sensor fields connected to the module. A capacitor system keeps
a measurable amount of electric charge between two plates. The second plate to
the sensor fields is for example the foot of a person in this case. The capacitance
measuring is done at a sample rate of 10 Hz. The reporting of the
measurements is event-based. This means there is no fixed report rate. Every two
consecutive measurements of the capacitance of a sensor field are compared to
each other and only if they differ by a certain amount for at least one sensor field
of a module, a sensor message is generated. This leads to less excess information.
These sensor messages are sent out by the module over radio on the Industrial,
Scientific and Medical (ISM) bands on 868 MHz or 920 MHz, depending on the
region and radio regulations. The wireless sensor messages now can be collected
by a central transceiver in a connectionless mode. A message contains the module
ID where it comes from and the current capacitance values of the eight fields.
They remain valid as long as no new message from the same module ID arrives.
The process of the constant updating of sensor states by messages is a time series
data stream. SensFloor® data is stored in two different types: module capacity
information and trajectory information.
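
The event-based reporting described above can be mirrored in software by keeping the last received capacitance values per module. The following Python sketch illustrates this idea; the message fields are assumptions based on the description above, not the actual SensFloor® radio protocol:

```python
from dataclasses import dataclass

@dataclass
class SensorMessage:
    """One radio message: the module ID and the current capacitance
    values of the module's eight sensor fields (assumed format)."""
    timestamp: float           # receiver-side time in seconds
    module_id: int
    capacitances: list[float]  # eight values, one per triangular field

class FloorState:
    """Last known capacitance values per module. A value stays valid
    until a newer message from the same module arrives, mirroring the
    event-based reporting with no fixed report rate."""

    def __init__(self) -> None:
        self.modules: dict[int, list[float]] = {}

    def update(self, msg: SensorMessage) -> None:
        self.modules[msg.module_id] = msg.capacitances
```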

3.2 Classification Methods


This work is about detecting behavioural changes after a person’s hospitalisation
in a nursing home. For this application, a classifier is needed [16]. Classifiers find
similarities in data points that belong to the same class and assign them to
the corresponding class. This requires several steps. First, the classes Ωi have to
be defined. In our case, the days before, Ω0 , and after, Ω1 , the hospitalisation
each represent a class, Ωi , i ∈ {0, 1}. This means that it is a two-class problem,
therefore a binary classifier is needed. Assigning classes to labelled data puts our
approach in the domain of supervised machine learning. As a next step, specific
features need to be extracted from the data. The features and how we extracted
them is described in Subsect. 3.4. In the final step of evaluating the classifiers, we
used k-fold cross validation. The entire data set is split into k parts (also called
folds); k − 1 parts are used for training and 1 part for testing,
changing the folds iteratively such that every combination of training and test
sets is evaluated once. The features of each data point serve as input and the
output should be the best fitting class Ωi for the data point. To fit the classifier,
the distances between the estimated class of the classifier and the true class,
that is known based on the label of the data point, are measured. This distance
must be minimised. We used several classifiers and compared their results. The
Multi-Layer Perceptron (MLP) is a variant of an artificial neural network (ANN)
that is widely used on various classification and regression problems [10]. The
main advantages of ANNs are the ability to efficiently process large amounts of
data and the ability to generalise results [14]. In our approach, we work with
a relatively small data set. But there is the potential to use larger data sets as
the recording of the SensFloor® data is ongoing.
Another method we used is the Gaussian Naive Bayes (GNB) classifier. Naive
Bayes methods are supervised learning algorithms based on the application of the
Bayes theorem with the naive assumption of conditional independence between
each pair of features at a value i of the class variable Ωi [9,20]. The observed
reliability and robustness of this classifier make it a good choice for our classifi-
cation problem, as it is fast, efficient and effective and works well on small data
sets [13]. To run the Naive Bayes classifier, an event model is needed. We decided
to use the Gaussian event model. In Gaussian Naive Bayes, the probabilities of
the features are assumed to follow a normal or Gaussian distribution [9,20].
We also decided to use a Support Vector Machine (SVM) for our approach.
SVMs are a set of related supervised learning methods that work by constructing
a hyperplane which linearly separates classes in the feature space [17]. They
seemed like a good choice to try out for us because several recent studies have
reported that SVMs are generally able to provide higher classification accuracy
than other data classification algorithms, especially for linear decision problems
[6].
Since Random Decision Forests achieve very good classification accuracy,
as reported in many papers [4], and are robust to noise due to the randomness
of the features and the large number of uncorrelated decision trees, we decided
to use them in our approach as well. Random Forests have the advantage that
a large ensemble of individual predictors is able to outperform any individual
model.
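
As a concrete illustration, the four classifiers can be evaluated with k-fold cross-validation in scikit-learn roughly as follows. This is a minimal sketch, assuming X is the matrix of daily feature vectors and y the before/after labels; hyperparameters are left at their defaults here (the configuration actually used is given in Sect. 4):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

def evaluate_classifiers(X: np.ndarray, y: np.ndarray, k: int = 5) -> dict:
    """Mean cross-validated accuracy of the four compared classifiers
    (MLP, GNB, SVM, RDF)."""
    classifiers = {
        "MLP": MLPClassifier(),
        "GNB": GaussianNB(),
        "SVM": SVC(),
        "RDF": RandomForestClassifier(),
    }
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    return {
        name: cross_val_score(clf, X, y, cv=cv, scoring="accuracy").mean()
        for name, clf in classifiers.items()
    }
```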

3.3 Preprocessing

We used the trajectory information files to create a set of features needed for
the classifiers. The data is collected in a separate file for each day so that the
data samples start and end at midnight of each day.
One complete trajectory T is given by a list of data points vi with vari-
able length n, where each data point contains the information timestamp ti ,
x-coordinate xi and y-coordinate yi . A trajectory can thus be described as
follows:
$$T = \langle v_1, \ldots, v_n \rangle, \qquad v_i = (t_i, x_i, y_i),\; i = 1, \ldots, n. \tag{1}$$
We deleted all trajectories that were not completed at the end of the day to
ignore unfinished paths. Since it was not only the resident who entered and left
the room, we also tried to filter out data from possible visitors and caregivers.
For this, the following assumptions were made:

– The resident is alone in his room at midnight.
– If the resident is in the room and a trajectory starts on the module at the
door, this trajectory belongs to a visitor or carer.
– If a visitor or carer is in the room and a trajectory ends on the module at the
door, the visitor or carer leaves the room.
– If the resident is alone in the room and a trajectory ends on the module at
the door, the resident leaves the room.
– If the resident is not in the room, the first trajectory that starts on the module
at the door belongs to the resident.

In this way, we were able to exclude, in the best possible way, many trajectories
that did not belong to the resident (Fig. 2); the heuristic is sketched below.
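
A minimal Python sketch of the door-module heuristic defined by the assumptions above; `starts_at_door` and `ends_at_door` are assumed predicates on a trajectory, and trajectories are assumed to be sorted by start time:

```python
def filter_resident_trajectories(trajectories, starts_at_door, ends_at_door):
    """Keep only trajectories attributed to the resident, following the
    assumptions listed above (a sketch, not the exact implementation)."""
    resident_present = True  # assumption: the resident is alone at midnight
    visitors = 0
    kept = []
    for t in trajectories:
        if resident_present and starts_at_door(t):
            visitors += 1                 # a visitor or carer enters
        elif visitors > 0 and ends_at_door(t):
            visitors -= 1                 # a visitor or carer leaves
        elif resident_present and visitors == 0 and ends_at_door(t):
            resident_present = False      # the resident leaves the room
            kept.append(t)
        elif not resident_present and starts_at_door(t):
            resident_present = True      # the resident returns
            kept.append(t)
        elif resident_present and visitors == 0:
            kept.append(t)               # movement inside the room
        # trajectories while visitors are present are ambiguous and dropped
    return kept
```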

3.4 Feature Extraction

For the classifier input features, we decided not to use the trajectory data directly, but
to extract additional information from it. For each trajectory we calculated the
following parameters:
Sensor Data Analysis 9

Fig. 2. This is a simulation of the resident’s room plan. The big blue shape represents
the living room and the blue rectangle represents the bathroom. The numbers on the
x- and y-scales symbolise the meters for the room width and length. The dark blue
dotted line is a sample trajectory

– f_d: the distance travelled, or length of the trajectory, given in meters
– f_Δt: the duration from start to end of the trajectory, given in milliseconds
– f_r: the radius of the circle containing all trajectory points, given in meters
The distance can be calculated given the x- and y-coordinates of each point of
a trajectory. For the calculation we used the Euclidean distance to calculate the
distance between each two points pi of a trajectory, which are then added:

$$f_d = \sum_{i=1}^{n-1} \sqrt{(p_{i+1} - p_i)^2}, \qquad p_i = \begin{pmatrix} x_i \\ y_i \end{pmatrix}, \text{ where } x_i, y_i \text{ are from } v_i. \tag{2}$$

The duration was calculated using the Unix timestamps ti stored for each data
point of the trajectories. For the total duration of a trajectory, the time difference
between the start time t1 and the end time tn of the trajectory was calculated:
$$f_{\Delta t} = t_n - t_1, \text{ where } t_i \text{ is from } v_i. \tag{3}$$
For the radius, we decided to calculate the maximum radius fr that each tra-
jectory describes in the room. To do this, we first calculated the mean point p̄
for each trajectory. We then calculated the distances between the mean point
p̄ and each point p_i of the trajectory, and stored the maximum distance d_max
from the mean point as the maximum radius of the trajectory:

$$f_r = d_{\max}(p_i, \bar{p}) = \max_{i \in [1, \ldots, n]} \sqrt{(\bar{p} - p_i)^2}. \tag{4}$$

With these parameters, we filtered out and deleted trajectories that are most
likely noise from objects in the resident’s room. Therefore, we deleted all trajec-
tories with a radius or distance less than 0.5 m.
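
A short sketch of Eqs. (2)–(4) and the 0.5 m noise filter in Python, assuming a trajectory is given as an array of timestamps and an array of coordinates:

```python
import numpy as np

def trajectory_features(t: np.ndarray, xy: np.ndarray) -> tuple:
    """f_d, f_dt, f_r of one trajectory (Eqs. 2-4).
    t: (n,) Unix timestamps in milliseconds; xy: (n, 2) coordinates in meters."""
    f_d = np.linalg.norm(np.diff(xy, axis=0), axis=1).sum()   # Eq. (2)
    f_dt = t[-1] - t[0]                                       # Eq. (3)
    f_r = np.linalg.norm(xy - xy.mean(axis=0), axis=1).max()  # Eq. (4)
    return f_d, f_dt, f_r

def is_noise(f_d: float, f_r: float, threshold: float = 0.5) -> bool:
    """Trajectories with distance or radius below 0.5 m are treated as
    noise from objects in the room and deleted."""
    return f_d < threshold or f_r < threshold
```
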
To create a feature vector, we calculated trajectory features in one-hour slots
for the 24 h of the day. To do this, we summed the distances, durations and
radii of all trajectories that started within an hour and stored the number fc of
trajectories that passed in an hour. The final set of features contains one feature
vector f for each day, describing the behaviour of a single day that looks as
follows:

$$f = (f_{c,1}, \ldots, f_{c,24},\, f_{d,1}, \ldots, f_{d,24},\, f_{\Delta t,1}, \ldots, f_{\Delta t,24},\, f_{r,1}, \ldots, f_{r,24}), \tag{5}$$

with every fx,h being the sum of all the feature values that were calculated for
every trajectory that took place in one hour of the day, for h ∈ [1, . . . , 24] and
for all feature types x ∈ {c, d, Δt, r}. The complete data set consists of 108 of
these vectors for the 108 days considered. The full set was used as input to the
classification algorithms.
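
The hourly aggregation of Eq. (5) can be sketched as follows, reusing the per-trajectory features from the previous snippet (assuming each trajectory of a day is reduced to its start hour and its f_d, f_Δt, f_r values):

```python
import numpy as np

def daily_feature_vector(day_trajectories) -> np.ndarray:
    """Build the 96-dimensional feature vector of Eq. (5) for one day.
    day_trajectories: iterable of (start_hour, f_d, f_dt, f_r) tuples."""
    counts = np.zeros(24)  # f_c: trajectories started per hour
    dists = np.zeros(24)   # f_d: summed distance per hour
    durs = np.zeros(24)    # f_dt: summed duration per hour
    radii = np.zeros(24)   # f_r: summed radius per hour
    for hour, f_d, f_dt, f_r in day_trajectories:
        counts[hour] += 1
        dists[hour] += f_d
        durs[hour] += f_dt
        radii[hour] += f_r
    return np.concatenate([counts, dists, durs, radii])
```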
We selected the described features from the trajectories and the split in 24 h
to obtain a useful representation of a person’s daily behaviour. The total number
of trajectories that was generated by the resident is a good description of the
general activity, like leaving and re-entering the room, bed, or bathroom, which
are events that create new trajectories. The distance parameter describes how
many meters the resident walks per hour in their room, which is an indication
of how capable the person is of walking. The duration parameter describes how
long it takes the person to walk one way in their room. The duration adds the
information about how long it takes the person to walk their distances. Here, it
is assumed that a person recovering from a hospitalisation will take longer for a
way than before the hospitalisation. Radius is a parameter that helps describe a
further geometrical property of the path. For example, the resident might walk
a distance of 5 m but only go back and forth in a small area, or they might walk
from one end of the room to the other. The area of the resident’s room that
is used during the paths should also change according to the health condition,
since a person in poor health is less active. Finally, the division into one-hour slots helps
to get an overview of the resident’s daily behaviour. It shows when the most
active hours are and when the person tends to be inactive or is not in the room
at all. When this daily routine is recorded, a repeating pattern can be identified
over several days, which also is assumed to change after a longer hospital stay.

4 Experiments and Results


This section summarises the experiments and compares the results of the classi-
fiers.
An ethics application was submitted and approved for the recording of the
data. Since April 2021, data from 10 nursing home residents have been recorded
continuously, 24 h per day. The residents’ rooms are 20.4 m² in size. Of this, 15 m²
are covered with Future-Shape’s SensFloor®, leaving space at the room edges
for power supply traces. In this case, this corresponds to 30 modules. Attached to
each room is a bathroom of 4.7 m², which is also covered with the SensFloor®.
Here, 3–4 modules are installed, depending on the layout of the interior.
Sensor Data Analysis 11

For our approach, only the data set of a single resident was used. This resident
was hospitalised for several weeks due to a COVID-19 diagnosis. We used the
days before hospitalisation since the start of the recording, 54 days, and an equal
number of days after discharge to obtain a balanced data set. This gives a total
of 108 days of recorded data. The days were labelled according to their class Ωi ,
which allows them to be assigned to the time of their admission before, i = 0,
or after, i = 1, the hospitalisation. We used a k-fold cross-validation with k = 5
folds for each classifier, one of the folds being a test set as described in Sect. 3.2, and
using the rest to train the machine learning models.
We used classifier implementations provided by scikit learn [12]. Our Multi-
Layer Perceptron consists of seven layers: The input, the output and five hidden
layers. Each hidden layer has a size of 100 neurons. It iterates over a maximum of
1,000,000 iterations. Since an Adam optimiser is used, this determines the number
of epochs, i.e. how many times each data point is used, rather than the number
of gradient steps. Our Random Decision Forest consists of 1000 estimator trees.
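
In scikit-learn terms, the described configuration corresponds roughly to the following sketch; any hyperparameter not mentioned above is left at its default value, which is an assumption:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

# Five hidden layers of 100 neurons each; with the Adam solver,
# max_iter bounds the number of epochs rather than gradient steps.
mlp = MLPClassifier(
    hidden_layer_sizes=(100, 100, 100, 100, 100),
    solver="adam",
    max_iter=1_000_000,
)

# Random Decision Forest with 1000 estimator trees.
rdf = RandomForestClassifier(n_estimators=1000)
```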

4.1 Results

All results for accuracy, F1-score, precision, recall/sensitivity and specificity of
the classifiers Multi-Layer Perceptron (MLP), Gaussian Naive Bayes (GNB),
Support Vector Machine (SVM) and Random Decision Forest (RDF) are listed
in Table 1. The results are calculated as the average over the individual
cross-validation folds.

Table 1. Average results over the five cross-validation folds of the four classifiers MLP,
GNB, SVM and RDF given in five different metrics

                    MLP   GNB   SVM   RDF
Accuracy            0.77  0.69  0.77  0.84
F1-Score            0.75  0.68  0.76  0.84
Precision           0.78  0.72  0.80  0.88
Recall/Sensitivity  0.77  0.61  0.76  0.79
Specificity         0.76  0.76  0.78  0.89
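
For reference, all five metrics follow from the binary confusion matrix; a small sketch (scikit-learn computes accuracy, precision, recall and F1 directly, while specificity is derived from the matrix; class 1, "after hospitalisation", is taken as the positive class):

```python
from sklearn.metrics import confusion_matrix

def binary_metrics(y_true, y_pred) -> dict:
    """The five reported metrics from a binary confusion matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # sensitivity
    return {
        "Accuracy": (tp + tn) / (tp + tn + fp + fn),
        "F1-Score": 2 * precision * recall / (precision + recall),
        "Precision": precision,
        "Recall/Sensitivity": recall,
        "Specificity": tn / (tn + fp),
    }
```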

From the result values it can be seen that the RDF results are generally
the best and the GNB results the worst. The MLP and the SVM both perform
equally well. The confusion matrices in Fig. 3 visualise the numbers of correctly
and incorrectly classified days per class.
Fig. 3. These confusion matrices visualise the statistical classification results of the
classifiers per class. The numbers in the lower left and upper right square of each
matrix represent the numbers of correctly classified data points. The numbers in the
upper left and lower right represent the falsely classified data points

4.2 Discussion

With comparable average results above 75% in all metrics, the MLP, SVM and
RDF classifiers perform well enough in the task of classifying the behavioural
patterns of the monitored nursing home resident before and after hospitalisation.
Since only the precision and specificity values of the GNB are above 70%, this
classifier is less suitable for this task. The confusion matrices (see Fig. 3) give
the numbers of correctly and falsely classified data points and are the basis of
the metrics given in Table 1. It is noticeable that only the MLP is better at cor-
rectly classifying the days after hospitalisation (recall/sensitivity) than at correctly
classifying the days before hospitalisation (specificity). The GNB in particular has
a worse recall/sensitivity; its specificity is nearly as good as that of the MLP and
the SVM. The results of the RDF are noticeably better in every metric.
These values are only meaningful for this specific resident, as each person is
individual and some people cope better with hospitalisation than others. The
extent of behavioural differences may also depend on the cause of the hospitali-
sation, i.e. whether it was a bone fracture, a severe infection or something else.
Our patient was hospitalised for a COVID-19 infection. Another difficulty for
the classification is the overall health condition of a person. In particular, the
condition of a nursing home resident can be better or worse depending on the
day, without the presence of measurable causes.

5 Conclusion

The results of the MLP, SVM and especially RDF classifiers show that it is
possible to detect changes in a person’s behaviour before and after a hospital
stay using only sensor floor data. This promising result implies that it may be
possible to detect individual behavioural changes just by using floor sensor data
in general. This would open many new possibilities in the field of care and early
detection of emerging diseases. Our current approach is limited for practical use
because labels added in hindsight are required to classify the data. Future plans
include an unsupervised approach that would allow behavioural analysis of more
than one person without the need for retrospective labelling of the data. This
would allow the system to serve as a continuous, divergent behaviour analysis
and alerting system.

Acknowledgement. This project was funded in part by the Christl-Lauterbach-
Stiftung. We thank Anika Fischer and Markus Eiba for their work in the data collection
process at the senior residence.

References
1. https://future-shape.com
2. Home & smart (2021). https://www.homeandsmart.de/ambient-assisted-living-aal
3. Aran, O., Sanchez-Cortes, D., Do, M.-T., Gatica-Perez, D.: Anomaly detection in
elderly daily behavior in ambient sensing environments. In: Chetouani, M., Cohn,
J., Salah, A.A. (eds.) HBU 2016. LNCS, vol. 9997, pp. 51–67. Springer, Cham
(2016). https://doi.org/10.1007/978-3-319-46843-3_4
4. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
5. Cook, D.J., Thompson, J.E., Prinsen, S.K., Dearani, J.A., Deschamps, C.: Func-
tional recovery in the elderly after major surgery: assessment of mobility recovery
using wireless technology. Ann. Thorac. Surg. 96(3), 1057–1061 (2013). https://
doi.org/10.1016/j.athoracsur.2013.05.092
6. Durgesh, K.S., Lekha, B.: Data classification using support vector machine. J.
Theor. Appl. Inf. Technol. 12(1), 1–7 (2010)
7. Hoffmann, R., Brodowski, H., Steinhage, A., Grzegorzek, M.: Detecting walking
challenges in gait patterns using a capacitive sensor floor and recurrent neural
networks. Sensors 21(4), 1086 (2021)
8. Lauterbach, C., Steinhage, A., Techmer, A., Sousa, M., Hoffmann, R.: AAL func-
tions for home care and security: a sensor floor supports residents and carers. Curr.
Dir. Biomed. Eng. 4(1), 127–129 (2018)
9. Murphy, K.P., et al.: Naive Bayes classifiers. Univ. Br. Columbia 18(60) (2006)
10. Noriega, L.: Multilayer perceptron tutorial. School of Computing, Staffordshire
University (2005)
14 L. Liebenow et al.

11. Palleschi, L., et al.: Functional recovery of elderly patients hospitalized in geri-
atric and general medicine units. The PROgetto DImissioni in GEriatria study:
in-hospital functional recovery in older adults. J. Am. Geriatr. Soc. 59(2), 193–
199 (2011). https://doi.org/10.1111/j.1532-5415.2010.03239.x
12. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn.
Res. 12, 2825–2830 (2011)
13. Rennie, J.D., Shih, L., Teevan, J., Karger, D.R.: Tackling the poor assumptions of
Naive Bayes text classifiers. In: Proceedings of the 20th International Conference
on Machine Learning (ICML-03), pp. 616–623 (2003)
14. Santos, R., Rupp, M., Bonzi, S., Fileti, A.: Comparison between multilayer feed-
forward neural networks and a radial basis function network to detect and locate
leaks in pipelines transporting gas. Chem. Eng. Trans. 32, 1375–1380 (2013)
15. Steinhage, A., Lauterbach, C.: SensFloor® and NaviFloor®: robotics applications
for a large-area sensor system. Int. J. Intell. Mechatron. Robot. (IJIMR) 3(3), 43–
59 (2013)
16. Theodoridis, S., Koutroumbas, K.: Chapter 2 - classifiers based on Bayes deci-
sion theory. In: Theodoridis, S., Koutroumbas, K. (eds.) Pattern Recognition,
4th edn, pp. 13–89. Academic Press, Boston (2009). https://doi.org/10.1016/
B978-1-59749-272-0.50004-9, https://www.sciencedirect.com/science/article/pii/
B9781597492720500049
17. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Cham (1999).
https://doi.org/10.1007/978-1-4757-3264-1
18. Veronese, F., Masciadri, A., Comai, S., Matteucci, M., Salice, F.: Behavior drift
detection based on anomalies identification in home living quantitative indicators.
Technologies 6(1), 16 (2018). https://doi.org/10.3390/technologies6010016
19. Verstaevel, N., Georgé, J.P., Bernon, C., Gleizes, M.P.: A self-organized learning
model for anomalies detection: application to elderly people. In: 2018 IEEE 12th
International Conference on Self-Adaptive and Self-Organizing Systems (SASO),
pp. 70–79 (2018). https://doi.org/10.1109/SASO.2018.00018
20. Zhang, H.: Exploring conditions for the optimality of Naive Bayes. Int. J. Pattern
Recogn. Artif. Intell. 19(02), 183–198 (2005)
Cloud-Based System for Vital Data
Recording at Patients’ Home

Alexander Keil1,2(B), Kai Hahn1, Rainer Brück1, Nick Brombach2,
Nabeel Farhan2, and Olaf Gaus2
1 Chair for Medical Informatics and Microsystem Engineering, University of Siegen,
Am Eichenhang 50, 57068 Siegen, Germany
{alexander.keil,kai.hahn,rainer.brueck}@uni-siegen.de
2 Research Center “Digitale Modellregion Gesundheit Dreiländereck”,
University of Siegen, Weidenauer Straße 167, 57076 Siegen, Germany
{nick.brombach,nabeel.farhan,olaf.gaus}@uni-siegen.de

Abstract. Due to demographic change and the declining number of
physicians, many rural areas are threatened by a lack of medical care.
To counter this problem, it is necessary to relieve the existing physicians.
This paper describes a technical infrastructure for patients to record their
vital data themselves and make it available to their physician. To do this,
physicians first prescribe vital parameter recordings for their patients in
a web frontend, i.e. what vital data should be measured, when, and
how often. The patient can view this data on a smartphone, which also
serves as a gateway for data transfer. Commercially available devices
with Bluetooth connectivity are provided for measuring the vital data.
This data is sent automatically from the devices to the smartphone and
transferred to a cloud environment. The general practitioner can then
display and evaluate the accessed data via the web frontend.

Keywords: Telemedicine · Vital data · Home monitoring · Health


platform

1 Introduction
Guaranteeing comprehensive GP (General Practice) care in the rural German
tri-border region of North Rhine-Westphalia, Rhineland-Palatinate, and Hesse is
an increasing challenge. On the one hand, this is caused by growing medical and nursing needs due to demographic changes [1]; on the other hand, regional disparities in the average age, and thus in the number, of practicing physicians in outpatient GP care contribute to the problem [2].
This paper describes the approach of a digital transformation to introduce a
patient-controlled home monitoring of vital data as a new form of outpatient GP
care. This new approach aims concurrently at care efficiency by relieving physi-
cians’ time through delegation and (partial) automation of documentation obli-
gations, and improved cost-effectiveness by minimizing follow-up examinations
and appointments, reducing hospital stays and the use of emergency services [3].
With home monitoring of vital data, the paper's scenario contributes to an improvement of medical care quality through improved health monitoring and faster sequencing of treatment process steps in diagnostics. Additionally, this approach allows data processing to present meaningful information to the physician.
are discrepancies in the data or a need for low-priority action, the physician can
contact the patient via video consultation [4].
This paper focuses on the technical infrastructure and concentrates on software
and hardware issues of the outlined scenario.

2 Project Background
Although the overall number of medics is not declining, the competition to hire
and keep country doctors turns out to be a problem not only for German rural
areas, but also for the rest of Europe [5].
The current project “DataHealth” is located in Burbach, a small village in rural southern Westphalia. A digital medical platform was implemented that allows the involvement of patients (or non-medical staff in nursing homes) in measuring vital signs. This considerably relieves doctors in their daily practice. The participating physicians assess which patients are suitable for this “self-measurement” and select the appropriate ones. Two scenarios were chosen for the project: about 20 patients participate in a nursing facility, and about the same number are ambulatory patients of two doctors' offices in the village. Medical indications addressed by the measuring devices include, e.g., arterial hypertension, diabetes mellitus type II, pneumonia, cardiac failure, and atrial fibrillation.
The project is part of a larger regional initiative led by the University of Siegen [6] to improve the health environment through digital medicine.

3 System Overview
Home-based measurement of vital parameters aims to initiate a paradigm shift
and thus to optimize care processes. On-spot measurements taken by patients
themselves, by relatives, or by caregivers are obtained from the monitoring sys-
tem running continuously in the background and can thus be presented to physi-
cians in different resolutions (discrete or continuous) or as aggregations. In this
way, physicians can examine the clocked measurements and also the detailed course at any time, e.g. in order to recognize intermediate values and their tendencies in the case of irregularities, and use these as a starting point for further interventions.
The doctors initially assess which patients are suitable for this self-measurement and select them. Different sensing devices allow the measurement
of vital parameters like blood pressure, ECG, heart rate, blood glucose, weight,
and oxygen saturation. Most of the devices are equipped with Bluetooth inter-
faces and can automatically sync with a smartphone app [7]. For devices with
no radio connection a manual transcript of data to a smartphone app is eas-
ily possible. Practices could be equipped with a viable number of such certified
(CE, MDD/MDR) monitoring systems [8] at moderate costs. In summary, the
vital data measurement builds a bridge from the traditional punctual vital data
collection in practices to an innovative, continuous, data-driven monitoring of
vital parameters by the physician while measurement takes places at patients’
home. In a subsequent step the data is transferred by a dedicated mobile app to
a certified cloud environment to be processed and to be accessible by the GPs,
as shown in Fig. 1.

Fig. 1. System view

This vital data evaluation supports physicians in their daily practice by pro-
cessing and evaluating patient data securely and in compliance with data pro-
tection laws allowing the additional use of statistics-based services. The focus
is on supporting patient-physician interaction. In this way, patients can easily
be involved in health decisions on the basis of their vital data. After informed
patient consent, physicians are given access to all data and the possibility of data
interaction. They can filter, view and summarize it. The use of data processing
methods like aggregations supports physicians by transparently supplementing
the presented vital data to make informed and fact-based decisions.
The vital data are recorded within the prescribed monitoring period. Aggre-
gated and processed data are temporarily cached in the cloud to populate ser-
vices such as evaluation and visualization for a physician’s web interface. A later
integration into PMS (Practice Management Systems) is intended.
Security, authentication, and encryption procedures are used for patient data.
Since legal regulations in Germany require the use of the so-called Telematics
Infrastructure (TI) for stakeholders' mutual authentication, there are also initial
ideas to connect cloud services to the TI [9].
4 Patient Frontend
For the described concept, a frontend for the patient is needed to transmit vital
parameter data. There are two basic ways to transmit: Either the sensing devices
for vital data measurement send the data directly to the cloud or some kind of
gateway is used.
In the current research project “DataHealth”, a smartphone has been used as the gateway due to the simple implementation of this approach. In use are 40 Apple iPhone SE devices (2020 version). The app was implemented in the programming language Swift to guarantee the best compatibility with Apple's iOS system.

4.1 Vital Values and Measuring Devices

For this project, vital data measuring devices were required that are easy to
use, intended for home use, have appropriate certifications, and can transmit
measured data to a smartphone via Bluetooth. In the end, the decision was made in favour of the German manufacturer Beurer, which has many years of experience in the fields of measuring devices and their networking with a smartphone app.
The following paragraphs refer briefly to the measured vital values and the measurement devices:
The blood pressure and heart rate are measured using the Beurer BM85, an easy-to-use blood pressure monitor with Bluetooth functionality. If it is connected to a smartphone, it transmits the measured blood pressure and heart rate.
The blood oxygen saturation is measured using the Beurer PO60, an easy-to-use pulse oximeter with Bluetooth functionality. As long as a fingertip is put into the device, it measures the heart rate and the oxygen saturation of the blood. If it is connected to a smartphone and the finger is removed at the end of the measurement, it automatically transmits the measured blood oxygen saturation. Unfortunately, it does not transmit the heart rate as well.
For ECG measurement, Apple Watches Series 6 were chosen. Apple Watches are currently the only devices that are allowed to write ECG data to Apple Health, from where the data is easily accessible. In addition, Apple Watches are also capable of recording the heart rate and the blood oxygen saturation permanently, instead of only sporadically as with conventional devices.
For measuring the body weight, the Beurer BF700 diagnostic bathroom scale with Bluetooth functionality is used. In addition, the scale can perform a bioelectrical impedance analysis to measure body fat, body water, muscle and bone mass.
The blood glucose is measured by the patients' own equipment, as diabetics already own such devices. Therefore, the option of entering a blood glucose value manually has been implemented.
4.2 Mobile Operating System iOS: Health Environment and App Implementation
On the iPhones, three different apps are currently in use to transmit the patients’
vital data to the cloud as shown in Fig. 2.

Fig. 2. Detailed path of the vital data in the smartphone

Health Manager Pro. The “HealthManager Pro” app of Beurer receives the
vital data via Bluetooth from the Beurer devices described in Sect. 4.1 and for-
wards them to Apple Health. After setting up the app for the first time, this
process is done automatically in the background without any interaction by the
user being necessary.

Apple Health. This is the central app provided by Apple to store and view all kinds of health-related data. Interfaces are provided for third-party apps to read and write health data in Apple Health, with one exception: currently, ECG data can only be read by third parties; only Apple devices like the Apple Watch are allowed to write them. Within the project DataHealth, Apple Health is only used as an interface so that Health Manager Pro stores new vital data and the project app can read it. Therefore, no user interaction with the app is necessary.

DataHealth. The third application is the DataHealth app developed in the project. The main purpose of this app is to transmit new incoming vital data
from Apple Health to the AWS cloud. The app was developed in Swift with Xcode. It was designed to be used by elderly people with little or no technical experience. Therefore, it has an easy-to-operate user interface, and the crucial function, the upload of new vital data, is performed automatically while the app is active.
Figure 3(a) shows the overview page of the app. On top of the page, the cur-
rent prescriptions are displayed, by default for the current day (Heute = today)
and by pressing on the buttons for the next two days (Morgen = tomorrow,
Übermorgen = the day after tomorrow). The daily prescription is divided into
daytimes (Nüchtern = empty stomach, Morgens = in the morning, Mittags = at
noon, Abends = in the evening).

Fig. 3. DataHealth App

The yellow box in the middle shows when and which vital value was last transferred, so the user has feedback that data has been transmitted. Below are three buttons to manually enter health complaints (Gesundheitsbeschwerden), blood glucose level (Blutzucker) and weight (Gewicht).
At the bottom of the page are buttons for the three main pages: overview (Übersicht), history (Historie) and messages (Nachrichten).
The history page is shown in Fig. 3(b). All transmitted vital values can be
viewed here. At the top of the page, the vital value can be selected. The table
below shows the selected values and the recording date.

5 Cloud Environments for Medical Data

All vital data recorded from the patient as previously described are transmitted
to a cloud to be accessed by the physician.
5.1 Amazon Web Services

A cloud solution was preferred to a local server solution here for various reasons:
On the one hand, the cloud provider allows problem-free scaling of the nec-
essary resources, i.e. the required computing and storage capacities are always
adapted to the current needs (e.g. with regard to the number of users). On the
other hand, no costly hardware purchases have to be made. Only the services
used are paid for. In addition, the cloud enables a dedicated access mechanism
with account management and redundant data storage and is thus also superior
to the so-called on-premise solutions in terms of security. The choice of a service
provider fell on the current market leader in cloud computing: Amazon.
Amazon Web Services, or AWS for short, is a cloud computing service that
has been offered by Amazon for about 15 years. Cloud computing is a devel-
opment model in which data storage and large parts of data processing are no
longer carried out on the specific user devices (in this case the patients’ smart-
phones or the doctors’ desktop computers), but on an online service provided
by the chosen service provider. Amazon is by far the largest cloud provider on
the market and is one of the few certified by the German Federal Office for
Information Security (BSI) (C5, ISO/IEC 27001:2013, GxP), so that even sen-
sitive medical informatics applications with patient data do not pose a security
problem [10].

AWS DynamoDB. DynamoDB is a fast, highly available, serverless database provided by AWS. Every table in a DynamoDB must have a partition key and can have an optional sort key. Both keys together constitute the primary key, which must be unique in the table. When a new element is created with the same partition key and sort key (if used), the old element with the same keys is overwritten. Other table attributes can be added dynamically and do not need to be known at creation.
When querying the database, any entry with the same partition key can be retrieved quickly and efficiently. Furthermore, it is possible to filter the items to be returned based on the sort key. In the referred project DataHealth, the partition key is always either the ID of the patient or the ID of the physician. This way, all information about a particular patient or a particular physician can be accessed easily and efficiently.
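As an illustration of how this key schema is typically queried, the following minimal Python sketch uses the AWS SDK (boto3); the table and attribute names (VitalValues, patientId, recordedAt) are assumptions for illustration, not the project's actual identifiers:

```python
import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical table and attribute names; the real schema may differ.
dynamodb = boto3.resource("dynamodb", region_name="eu-central-1")
table = dynamodb.Table("VitalValues")

def vitals_for_patient(patient_id: str, start: str, end: str) -> list:
    """Fetch all vital values of one patient within a time window.

    The partition key selects the patient; the sort key (a sortable
    timestamp string) narrows the result to the requested period.
    """
    response = table.query(
        KeyConditionExpression=Key("patientId").eq(patient_id)
        & Key("recordedAt").between(start, end)
    )
    return response["Items"]

# Example: all values recorded in January 2022.
items = vitals_for_patient("example-sub-id", "2022/01/01", "2022/02/01")
```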

AWS Cognito. Cognito is a token-based service used to perform user registrations, logins and access control. To register new users, a user pool with certain access privileges needs to be created first. In our case, we created two user pools: a physician user pool and a patient user pool.
Every user must register with a user name and a password. Depending on the configuration, an e-mail address or a phone number can be added. These are mainly used to reset the password. For every user, a unique subject ID (sub) is created automatically to identify the user. This ID is also used in the DynamoDB tables to store data like vital values belonging to a certain user.
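A minimal sketch of this registration and login flow with boto3, assuming a hypothetical app client ID of one of the user pools (account confirmation steps omitted):

```python
import boto3

cognito = boto3.client("cognito-idp", region_name="eu-central-1")
CLIENT_ID = "example-app-client-id"  # hypothetical app client of a user pool

# Register a new patient account (the e-mail is used for password resets).
cognito.sign_up(
    ClientId=CLIENT_ID,
    Username="patient001",
    Password="S3cure-Example!",
    UserAttributes=[{"Name": "email", "Value": "patient@example.org"}],
)

# Log in; the returned tokens authorize subsequent API calls.
auth = cognito.initiate_auth(
    ClientId=CLIENT_ID,
    AuthFlow="USER_PASSWORD_AUTH",
    AuthParameters={"USERNAME": "patient001", "PASSWORD": "S3cure-Example!"},
)
access_token = auth["AuthenticationResult"]["AccessToken"]

# The automatically generated subject ID ("sub") identifies the user
# and doubles as the partition key in the DynamoDB tables.
attrs = cognito.get_user(AccessToken=access_token)["UserAttributes"]
sub = next(a["Value"] for a in attrs if a["Name"] == "sub")
```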
5.2 Table Design

For the storage of the vital values, physician and patient data, and others, the DynamoDB database service was chosen. Most of the data, including the vital values, are stored in the AWS region ‘eu-central-1’, which is located in Frankfurt, Germany.

Vital Values. All vital values are stored almost in the same way, but each in a separate table. The primary key is the patient ID, which equals the unique sub ID of the user in Cognito. The sort key is the recording time of the vital value in the format “yyyy/MM/dd HH:mm:ss:fff”. This way, the combination of partition key and sort key should be unique at all times. Furthermore, it is possible to retrieve the data of a particular patient very efficiently, since all data from the same patient have the same partition key. Additionally, it is possible to retrieve a subset of the data using the sort key.
With some exceptions, all vital values have a numeric field for the vital value itself and an additional comment field for the physician to comment on the vital value.
In Apple's HealthKit, blood pressure is not stored in a single table but in two separate tables: one for the systolic and the other for the diastolic value. Therefore, the decision was made to use the same structure with two tables in DynamoDB. But since a comment field is needed only once, only the systolic table contains one.
In addition to the previously mentioned fields, the blood glucose has an optional field for information about the last meal.
Finally, a “general health condition” field was requested where the user can enter health complaints as plain text. For this value, the numeric field was replaced by a string comment field.
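A sketch of how a blood pressure reading might be written under this two-table layout; the table and attribute names are illustrative assumptions, and the resource API of boto3 requires Decimal for numeric attributes:

```python
from datetime import datetime
from decimal import Decimal

import boto3

dynamodb = boto3.resource("dynamodb", region_name="eu-central-1")
systolic = dynamodb.Table("BloodPressureSystolic")    # hypothetical names
diastolic = dynamodb.Table("BloodPressureDiastolic")

def timestamp() -> str:
    """Recording time in the paper's sort-key format yyyy/MM/dd HH:mm:ss:fff."""
    now = datetime.now()
    return now.strftime("%Y/%m/%d %H:%M:%S") + f":{now.microsecond // 1000:03d}"

def store_blood_pressure(sub: str, sys_mmhg: int, dia_mmhg: int) -> None:
    ts = timestamp()
    # The comment field exists only once, on the systolic side.
    systolic.put_item(Item={
        "patientId": sub, "recordedAt": ts,
        "value": Decimal(sys_mmhg), "comment": "",
    })
    diastolic.put_item(Item={
        "patientId": sub, "recordedAt": ts,
        "value": Decimal(dia_mmhg),
    })
```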

ECG. The storage of ECG data is slightly more complex than the rest of the vital signs due to the structure and amount of data. Like the other vital values, the primary key is the sub ID of the user and the sort key is the recording time. Additional values are the sampling frequency and the average heart rate. The ECG itself is stored as a numeric list of values in millivolts. Since the sampling frequency of the used Apple Watch 6 is 512 Hz and the duration of an ECG measurement is 30 s, one ECG consists of more than 15,000 measuring points (512 Hz × 30 s = 15,360 samples).
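An ECG item could then look roughly as follows (names again hypothetical); note that a single DynamoDB item is limited to 400 KB, which a list of 15,360 samples must stay under:

```python
from decimal import Decimal

def ecg_item(sub: str, recorded_at: str, samples_mv: list) -> dict:
    """Build one ECG item; samples_mv holds 15,360 floats in millivolts."""
    return {
        "patientId": sub,                    # Cognito sub ID (partition key)
        "recordedAt": recorded_at,           # recording time (sort key)
        "samplingFrequency": Decimal(512),   # Hz, fixed for the Apple Watch 6
        "averageHeartRate": Decimal(72),     # bpm, illustrative value
        # DynamoDB numbers must be Decimal; floats are converted via str.
        "voltages": [Decimal(str(v)) for v in samples_mv],
    }
```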

Physician Master Data. This table contains the subID of the Cognito user
as partition key. Additional fields are the name, first name, academic title and
role. The role can be physician, care or admin and enables different views in the
frontend (see Sect. 6.2).

Patient Master Data. This table is stored in the separate AWS region “eu-west-1”, which is located in Ireland, for data protection reasons. The primary key is the subID of the Cognito user; a sort key does not exist. Additionally, it contains the patient's name, first name, birthday, sex, height (for BMI), phone number and address.

Prescriptions. This table reflects the prescriptions of the physician with regard to vital data recordings for every patient. The primary key is the patient ID, the sort key is the time when the prescription was entered. Additional fields are the end time of the prescription, the physician ID and the vital value as strings, the measuring interval as a numeric, and the times of the day (morning, noon, evening and empty stomach) as booleans. This way, it is possible to prescribe, for example, a blood pressure measurement every second day in the morning and in the evening, or a blood sugar measurement every day on an empty stomach. When the field for the end time of the prescription is empty, the prescription is active; when it contains a valid date, it is inactive and shown in the prescription history. A sketch of this activation logic follows below.
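The activation rule could be enforced along the following lines; a rough sketch with assumed table and field names, not the project's actual code:

```python
from datetime import datetime

import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource("dynamodb").Table("Prescriptions")  # hypothetical name

def add_prescription(patient_id: str, vital: str, interval_days: int,
                     morning: bool, noon: bool, evening: bool, fasting: bool):
    now = datetime.now().strftime("%Y/%m/%d %H:%M:%S")

    # Deactivate any still-active prescription for the same vital value:
    # active means the end-time field is empty.
    active = table.query(
        KeyConditionExpression=Key("patientId").eq(patient_id),
        FilterExpression=Attr("vitalValue").eq(vital) & Attr("endTime").eq(""),
    )["Items"]
    for item in active:
        table.update_item(
            Key={"patientId": patient_id, "enteredAt": item["enteredAt"]},
            UpdateExpression="SET endTime = :t",
            ExpressionAttributeValues={":t": now},
        )

    # Store the new, now uniquely active prescription.
    table.put_item(Item={
        "patientId": patient_id, "enteredAt": now, "endTime": "",
        "vitalValue": vital, "intervalDays": interval_days,
        "morning": morning, "noon": noon, "evening": evening,
        "emptyStomach": fasting,
    })
```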

Mapping. This table contains the physician's ID as partition key and the patient ID. It is used to assign physicians to their patients; data of unassigned patients cannot be viewed. Both the physician ID and the patient ID are the Cognito sub IDs of the respective users.

Others. Additional tables store messages from the physician to the patient (not the other way around), diagnoses, and target areas for each patient and vital value. When the values of the specified patient are not inside the target area, the values are highlighted for the physician in the web frontend.

6 Web Frontend

To allow physicians and caregivers to access their patients’ vital data, a web
frontend was implemented using ASP.NET. It is running on a virtual server of
the data center of the University of Siegen and is accessible over the world wide
web.

6.1 User Interface

The Web Frontend provides several overview and detail pages.

Overview. The starting page for the physician is the overview page. The most recent vital signs that are not within the patient-specific target area are displayed here. This allows the physician to get a quick overview of the most important new values.
Dashboard. This page is the most important one and displays the patients' vital values. There are two dropdown menus, one for selecting the patient and one for selecting the vital value. Selectable vital values are timeline, blood pressure, blood sugar, heart rate, weight, blood oxygen saturation and the general health condition.
The timeline is the default selection and provides an overview of all vital values of the selected patient for a selected duration of days. The values are sorted by date starting with the newest values, as shown in Fig. 4. The columns are the date of measurement, the name of the vital value, the vital value itself and the comment of the physician if there is one.

Fig. 4. Timeline view of a patient’s dashboard

If a particular vital sign value is to be viewed in depth, it can be selected and displayed in a more detailed view. Figure 5 shows the blood pressure details of a patient as an example. The views for heart rate, weight, oxygen saturation and blood sugar are quite similar: on the left-hand side, a table of the vital values with the date of recording is displayed. Furthermore, it is possible to add a comment to the vital value here. Green values are within the defined target area for this patient, yellow values are outside the target area, and red values are far outside. The target area for the patient can be set on the bottom right of the page. In the middle of the right side, the values can be viewed as a graph including the target areas as green areas in the background.
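The colour coding could be computed along these lines; the paper does not quantify how far "far outside" is, so the relative margin below is purely an illustrative assumption:

```python
def classify(value: float, low: float, high: float, margin: float = 0.15) -> str:
    """Map a vital value to the dashboard colour of its target area.

    low/high bound the patient-specific target area; values beyond an
    assumed relative margin around it count as 'far outside' (red).
    """
    if low <= value <= high:
        return "green"                      # within the target area
    span = high - low
    if low - margin * span <= value <= high + margin * span:
        return "yellow"                     # outside, but close
    return "red"                            # far outside the target area

# Example: systolic target area 110-140 mmHg.
print(classify(128, 110, 140))  # green
print(classify(143, 110, 140))  # yellow
print(classify(175, 110, 140))  # red
```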

Patients. On this page, a list of all assigned patients can be viewed including
their personal data. It will be possible to change the personal data of the patient
here, but this is not implemented yet. From here the vital value prescriptions
page of the patients is accessible.
Fig. 5. Blood pressure view of a patient’s dashboard

Prescriptions. At the top of this page, there is a list of all active vital value
prescriptions, below is a list of all historic prescriptions. If no end date has been
set for a prescription, it is active.
At the top of the two lists is a button to create new prescriptions via a pop-up. In this pop-up, a vital value is chosen with a drop-down list, as well as a measurement interval in days, and there are four radio buttons for empty stomach, morning, noon and evening. If a new prescription is created and an active prescription with the same vital value for this patient already exists, the existing one is automatically deactivated, so that there can only be one active prescription of every vital value at the same time.

6.2 User Roles

For the users of the frontend, there are three different roles defined: physician, care and admin.
The physician can see all of his or her patients and all related vital values. The start page is the overview page, which shows whether the vital values of the patients are within the target area or not. The dashboard is fully functional, and comments on vital values can be made and edited. Furthermore, messages can be sent to the patients.
Care users have limited access to the system and cannot see the overview page. Instead, the start page is the dashboard; however, when selecting a vital value, only the table with the values is displayed, while graph and limits are not visible. This is due to the fact that care users require the data mainly for documentation purposes but do not evaluate them.
The admin role is mainly for configuration purposes. All functions are available, and all patients are visible independently of the mapping table. This user can create new patients. A sketch of such a role-permission mapping follows below.
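One simple way to express such role-based views is a static permission map, sketched below with illustrative permission names that are not taken from the project:

```python
# Hypothetical permission sets derived from the three roles described above.
PERMISSIONS = {
    "physician": {"overview", "dashboard_table", "dashboard_graph",
                  "comment", "message_patient"},
    "care":      {"dashboard_table"},            # documentation only
    "admin":     {"overview", "dashboard_table", "dashboard_graph",
                  "comment", "message_patient", "create_patient",
                  "ignore_mapping"},             # sees all patients
}

def allowed(role: str, action: str) -> bool:
    """Check whether a frontend action is permitted for the given role."""
    return action in PERMISSIONS.get(role, set())

assert allowed("physician", "comment")
assert not allowed("care", "dashboard_graph")    # care users see no graphs
```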

7 Conclusion
The paper described the technical infrastructure necessary for a new approach to measuring patients' vital parameters. Many appointments in doctors' practices are only conducted to measure a new set of vital data in order to assess the health status of the patient. Measurements at home can be a relief for patients as well as physicians. Additionally, the frequency and the duration of measurements can be individually adapted to the patient's condition. The system consists of a patient frontend with sensing devices and transmitting hardware and software, a middleware within a secure cloud environment, and a physician web frontend with capabilities to visualise the vital data.

Acknowledgement. Parts of this work are funded by the European Commission, European agricultural fund for rural development (EAFRD) within the project “DataHealth – Flexible Patientendaten für die Gesundheitsversorgung im ländlichen Raum” (project duration 2021/2022).

References
1. van den Bussche, H.: Die Zukunftsprobleme der hausärztlichen Versorgung in
Deutschland. Aktuelle Trends und notwendige Maßnahmen. Bundesgesundheits-
blatt 62, 1.129–1.137 (2019). https://doi.org/10.1007/s00103-019-02997-9
2. Kassenärztliche Bundesvereinigung: Gesundheitsdaten - Regionale Verteilung der
Ärzte in der vertragsärztlichen Versorgung 2020. Statistische Informationen
aus dem Bundesarztregister (2020). https://gesundheitsdaten.kbv.de/cms/html/
16402.php. Accessed 21 Jan 2022
3. Kramer, U., Vollmar, H.C.: Digit. Health: Forum 32(6), 470–475 (2017). https://
doi.org/10.1007/s12312-017-0326-7
4. Warzecha, M., Dräger, J., Hohenberg, G.: Visite via monitor. Heilberufe 70(5),
42–43 (2018). https://doi.org/10.1007/s00058-018-3461-3
5. Winkelmann, J., Muench, U., Maier, C.: Time trends in the regional distribution
of physicians, nurses and midwives in Europe. BMC Health Serv. Res. (2020).
https://doi.org/10.1186/s12913-020-05760-y
6. University of Siegen: Digitale Modellregion Gesundheit Dreilädereck (DMGD).
https://dmgd.de. Accessed 15 Feb 2022
7. Haak, D., Deserno, V., Deserno (geb. Lehmann), T.: Datenmanagement für Medi-
zinproduktestudien. In: Kramme, R. (ed.) Informationsmanagement und Kommu-
nikation in der Medizin, pp. 145–164. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-48778-5_48
8. Darms, M., Haßfeld, S., Fedtke, S.: Medizintechnik und medizinische Geräte
als potenzielle Schwachstelle. In: Darms, M., Haßfeld, S., Fedtke, S. (eds.) IT-
Sicherheit und Datenschutz im Gesundheitswesen, pp. 109–128. Springer, Wies-
baden (2019). https://doi.org/10.1007/978-3-658-21589-7_5
9. Voßhoff, A., Raum, B., Ernestus, W.: Telematik im Gesundheitswesen. Bundesgesundheitsblatt 58, 1094–1100 (2015). https://doi.org/10.1007/s00103-015-2222-6
10. Buddha, J.P., Beesetty, R.: Getting started. In: The Definitive Guide to AWS
Application Integration. Apress, Berkeley (2019). https://doi.org/10.1007/978-1-4842-5401-1_1
Tele-BRAIN Diagnostics Support System for Cognitive Disorders in Parkinson's Patients

Andrzej W. Mitas1(B), Agnieszka A. Gorzkowska2, Katarzyna Zawiślak-Fornagiel3, Andrzej S. Malecki4, Monika N. Bugdol1, Marcin Bugdol1, Marta Danch-Wierzchowska1, Julia M. Mitas1, and Robert Czarlewski5

1 Faculty of Biomedical Engineering, Silesian University of Technology, ul. Roosevelta 40, 41-800 Zabrze, Poland
{andrzej.mitas,monika.bugdol,marcin.bugdol,marta.danch-wierzchowska}@polsl.pl, julimit139@student.polsl.pl
2 Department of Neurorehabilitation, Faculty of Medical Sciences in Katowice, Medical University of Silesia, ul. Medyków 14, 40-752 Katowice, Poland
agorzkowska@sum.edu.pl
3 University Clinical Center prof. K. Gibiński of the Medical University of Silesia, ul. Ceglana 35, 40-752 Katowice, Poland
katarzyna zawislak@wp.pl
4 Institute of Physiotherapy and Health Science, Academy of Physical Education in Katowice, ul. Mikolowska 72A, 40-065 Katowice, Poland
a.malecki@awf.katowice.pl
5 APA Sp. z o. o., ul. Tarnogórska 251, 44-105 Gliwice, Poland
robert.czarlewski@apagroup.pl

Abstract. The material describes the basic functional features of the diagnostic support system for neurodegenerative diseases, which is the subject of the international research and implementation project Tele-BRAIN. The project's main goal is to optimise the diagnostic process in the area of Parkinson's disease based on automatic EEG analysis with the use of artificial intelligence methods. The main part of the project is a system of early, automatic or semi-automatic (with the participation of an expert) algorithmic diagnosis of cognitive disorders in Parkinson's disease. Identifying cognitive dysfunction in advance (in relation to the occurrence of easily recognisable symptoms) may allow for the implementation of a procedure that positively influences the extension of the patient's independence period and a better quality of life for the patient as well as for the caregiver, and in the future it may also allow for neuroprotective interventions slowing down the development of dementia. The presented algorithm is currently being implemented, and the aim of the article is to present the concept and functional evaluation of the pilot version of the system.

Keywords: Decision support system in neurology · qEEG in dementia recognition
1 Introduction
The diagnosis of neurodegenerative diseases is of constant interest today, espe-
cially in the context of the growing possibilities of neuroimaging and the use of
information search techniques in large sets of highly diversified data.
Despite recent advances in medical knowledge, clinical criteria are still used to diagnose Parkinson's disease (PD). The most recent and widely used criteria are the Movement Disorders Society (MDS) criteria published in 2015, according to which the diagnosis of PD requires confirmation of bradykinesia and one of the two axial symptoms: resting tremor or muscle stiffness. According to the current guidelines, the clinician should review the exclusion criteria, red flags, and criteria confirming the diagnosis after diagnosing parkinsonian syndrome. The MDS criteria also include cardiac scintigraphy using the I-meta-iodobenzylguanidine (MIBG) marker and the DaTSCAN test, which in practice have their limitations, primarily in terms of availability and cost, and are performed only in selected cases. In the process of diagnosing PD, the doctor uses standard neuroimaging tests and, only in special cases, e.g. in very young people or familial parkinsonism, additional genetic tests. Although biological markers of PD (so-called biomarkers) have not been introduced into routine clinical practice yet, the observations so far indicate that the sensitivity and specificity of the clinical criteria used are limited. Hence, the use of artificial intelligence is a significant direction of research, which may contribute to increasing the sensitivity and accuracy of diagnosis, especially at an earlier stage of PD, enable diagnosis already in the prodromal period, further improve the quality of life of patients and their caregivers, and, above all, enable the expected neuroprotective interventions in the future. This is particularly important from the point of view of one of the most aggravating aspects of this disease: dementia, which develops in a large group of patients with PD.
The theory of risk, which is the basis of medical diagnostics, applies to all of us in every situation, so the correctness of each inference can be (and usually is) analysed in a probabilistic way, as a random event rather than a certain one. Ultimately, we cannot be sure of well-being even in the simple situation of consuming an ordinary meal because, given the specific environmental conditions and the current mental condition of the consumer, it may be a dietary error. Unfortunately, it is impossible to analyse the impact of such exemplary non-measurable variables.
Usually, we estimate the risk of failure (or sometimes do not, relying on someone else's largely unverified suggestions) and take appropriate action. We experience every day the fact that although the probabilities of various life events are almost always lower than one, we nonetheless undertake activities related to them (i.e. events in the probabilistic sense), e.g. when driving a car to a specific destination, even though an accident remains a random event.
In the Tele-BRAIN project, the basis of diagnostics is the signal recording coming from the electrical manifestations of the brain's work. Bearing in mind the low mass of the bodies involved in generating charges, polarising selected areas of the head, and the high resistances involved (e.g. dry skin or skull bones), we analyse the potentials rather than the currents, because the currents are too low. On the other hand, potential disturbances occur for completely trivial reasons, precisely because of the high impedance, so the measurement is, by its very nature, burdened with high uncertainty. A separate material is devoted to the issue of reducing the impact of these disturbances, because that problem belongs to a completely different research area (compression, reduction, filtration or augmentation of data).
Human EEG data analysis is the obvious basis of inference for many neurological pathologies. However, despite being based on algorithmics, it does not avoid heuristics; the expert's opinions contain some conclusions that are difficult to justify other than with many years of practice and similar cases from personal experience.
A multi-channel electroencephalograph works simultaneously in each lead, providing separate courses. Simultaneous analysis of material dispersed over time (study period, divided into epochs) and space (amplitude of changes in each lead) is often limited to the visual registration and interpretation of selected patterns characteristic of previously described cases. Discovering a relationship resulting from a linear combination of several (or a dozen) leads is beyond human capabilities. Special courses and certificates regulate the appropriate authorisation to diagnose based on such records.
In this material, the subject of interest is the set of EEG signal processing methods commercialised as part of a research and implementation project, designated in NCBR as WPN-3/5/2018 and entitled: TeleBrain – Artificially Intelligent EEG Analysis in the Cloud.
It is an international project, and the German consortium member has shown in its research the usefulness of artificial intelligence in the effective analysis of electroencephalographic waveforms. The broadly understood aim of the project is to implement the developed method and its practical use in diagnostics.

2 Scientific Research Background and Goals of the Tele-BRAIN Project
Data processing in the cloud using stochastic methods of knowledge search is a
modern way of using mathematical methods in medicine. It is, of course, char-
acterised by a high level of technological complexity, which also requires solving
many partial problems with a research dimension. Only today do we have such
tools that enable the implementation of such tasks, not to mention the formal
and legal arrangements made in recent years, conditioning the safe management
of patient data.
The main issues that require special investment include:
1. artefact analysis and effective signal preprocessing, including analysis of the
acceptability of the extraction of selected information fractions;
2. management of personal medical data in an international cloud solution;
3. a mixed (algorithmic-stochastic) diagnosis support system containing an
explanatory system as a medical condition for potential therapy.
The literature often discusses the meaning of preprocessing as a method of removing undesirable components (or factors) obscuring the information sought. One can hypothetically assume that in the original signal, the desired
information may also be revealed in the presence of noise. Such a thesis does not
stand up to elementary criticism from the point of view of reliability theory and
testing [24] because the causes of deterministic disturbances should be removed
before starting the study. If this is not possible, then modelling the effects of
such disturbances allows for their a posteriori extraction. In the case of ran-
dom disturbances due to the randomness of the phenomenon, we cannot make a
diagnosis that the diagnosis is insensitive to such disturbances. If such phenom-
ena are distinguishable from usage information, the assumption of due diligence
implies the need to distinguish and separate such a component. On the other
hand, the information compression techniques used in diagnostics are based on
the property that any extension of the information sequence, including random
data, does not fundamentally change the diagnostic values of the technique.
According to the authors, in the absence of definitive mathematical models proving the negligible impact of disturbances, as long as it is possible to organise the event space before starting the analysis, such an action should be taken. EEG recordings contain, in particular, random artefacts, the characteristics of which are described in the literature. In the presented project, this task is carried out in a separate research department.
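As a minimal illustration of such preprocessing, the sketch below applies a mains notch and a band-pass filter to a single lead; the 50 Hz notch and the 0.5–45 Hz passband are common defaults assumed here, not settings taken from the project:

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

def preprocess_eeg(signal: np.ndarray, fs: float) -> np.ndarray:
    """Remove mains interference and out-of-band noise from one EEG lead.

    Assumed settings: 50 Hz notch (European mains) and a 0.5-45 Hz
    band-pass; zero-phase filtering avoids shifting waveform latencies.
    """
    b, a = iirnotch(w0=50.0, Q=30.0, fs=fs)
    signal = filtfilt(b, a, signal)
    b, a = butter(N=4, Wn=[0.5, 45.0], btype="bandpass", fs=fs)
    return filtfilt(b, a, signal)

# Example: a synthetic 10 s lead sampled at 256 Hz with 50 Hz interference.
fs = 256.0
t = np.arange(0, 10, 1 / fs)
lead = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)
clean = preprocess_eeg(lead, fs)
```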
There is much talk about data management today, and the regulations related to it are sometimes a cause for concern when developing a data processing system. With regard to personal data describing the state of health, the term 'sensitive data' is used, curiously defined in the context of unauthorised access, which would imply the possibility of such access to other personal data. They cannot be accessed without authorisation, in particular in the sense of linking personal identifiers and biomedical data streams. Some solutions prevent the distribution of such data outside the local hermetic network of the administrator. Such extremely complex conditions for the Tele-BRAIN system, operating in the cloud-based international access processing system, were the subject of special, secure coding.

3 The Current Status of Parkinson's Disease in the Area of Research
Parkinson’s disease (PD) is one of the most common neurodegenerative disorders
and the most common form of parkinsonian syndrome. The prevalence of PD in
the general population is approximately 0.3% for the population over 60 years
of age, over 70 years of age it is about 1%, and after 80 years of age - about 3%
[23,31]. It is estimated that in 2030, between 8.7 and 9.3 million people over
50 years old suffering from PD will live in the ten most populous countries of
Western Europe [12], therefore the problems resulting from this disease may be
a significant social problem.
Dementia is the worst complication of PD from the point of view of prognosis and treatment options and may develop annually in up to 10% of patients
with this diagnosis [17]. The researchers report that 30% of patients with PD
will develop dementia after ten years of disease [1,29]. According to other data,
the cumulative risk of dementia in PD is 75% in the group of patients with
PD over ten years of disease duration, 83% – after 20 years, up to 95% - after
90 years of age. This means that the majority of patients with PD will even-
tually have dementia, but the time of its onset is very variable. Up to 25% of
’de novo’ patients diagnosed with PD already have mild cognitive impairment
(MCI), which in some cases can quickly convert to dementia [17]. The presence of
dementia in PD has several clinical consequences; therefore, its early and proper
diagnosis is essential for the proper management of the patient.
Electroencephalography (EEG) is an important additional test that allows
studying the effects of developing dementia on the brain’s work [3,7]. As shown
by the previous studies, the assessment of EEG in PD can be a valuable
biomarker, especially in terms of predicting the occurrence of cognitive disorders
and their early diagnosis, as well as monitoring selected aspects of treatment
effectiveness [22]. EEG seems to be an objective and more precise diagnostic
method of assessing cognitive impairment in PD than other tools and may com-
plement psychological tests [20]. Neuropsychological diagnosis is the gold stan-
dard in assessing cognitive impairment in PD, but the way neuropsychological
tests are performed can be modified through the participant’s cooperation skills,
ability to exercise and the severity of motor symptoms. Therefore, this diagnosis
could be supported by EEG, which is a method that is not only simple and safe,
but also performed in the patient’s relaxation conditions and does not require
effort or retained verbal functions, and what is important - it does not depend on
motor symptoms [22]. Digital quantitative analysis of EEG allows the analysis
of the absolute power of various frequency bands of brain rhythms, temporal
relationships between individual brain regions, as well as functional connectivity
between them [16,30]. Changes typical for PD in quantitive EEG (qEEG) were
also analysed. It was found that specific changes in alpha and theta power spec-
tra may be markers for the diagnosis of the disease itself and are associated with
the presence of MCI in PD [9]. Other studies have compared the pathophys-
iological mechanisms of dementia in Alzheimer’s disease (AD) with dementia
in PD in qEEG and showed differences in recording, which may be associated
with different mechanisms of dementia development in both disease entities [16].
Early cortical and subcortical PD changes can potentially be mirrored in the
brain’s electrical activity. Therefore it may be suspected that an EEG would be
a valuable and sensitive diagnostic tool. There are already test results in which
the analysis of the EEG frequency allowed to identify and assess even very subtle
changes, e.g. slowing of the EEG recording. Among other things, differences in
EEG between PD and healthy patients have been described, where an increase in
delta and theta waves with accompanying reduction in alpha and beta was noted
[19]. Some researchers indicate that qEEG can be a valuable tool for the early
prognosis of dementia [13,15,18,22,27]. However, this requires further research, as some of these EEG changes are present in PD patients without dementia [5,28].
It is worth pointing out that the EEG may be useful in the differential diag-
nosis of dementia in general. For example, some differences in quantitative EEG
have been found between PD and AD patients [2,4,8]. The authors of studies
in this area have already attempted to identify a useful EEG biomarker in the
form of the alpha/theta index in spectral analysis to differentiate patients with
cognitive decline and healthy people [10].
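Such a spectral index is straightforward to compute from a resting-state recording. The sketch below uses Welch power spectral density estimation with conventional band limits (theta 4–8 Hz, alpha 8–13 Hz) as an assumption; the cited studies may define the bands differently (e.g. relative to the individual alpha frequency):

```python
import numpy as np
from scipy.signal import welch

def band_power(signal: np.ndarray, fs: float, f_lo: float, f_hi: float) -> float:
    """Absolute power of one EEG lead in the band [f_lo, f_hi) via Welch PSD."""
    freqs, psd = welch(signal, fs=fs, nperseg=int(2 * fs))
    band = (freqs >= f_lo) & (freqs < f_hi)
    df = freqs[1] - freqs[0]
    return float(psd[band].sum() * df)   # rectangle-rule integration

def alpha_theta_index(signal: np.ndarray, fs: float) -> float:
    """Ratio of alpha (8-13 Hz) to theta (4-8 Hz) absolute power."""
    return band_power(signal, fs, 8, 13) / band_power(signal, fs, 4, 8)

# Example with a synthetic lead: strong 10 Hz alpha plus weak 6 Hz theta.
fs = 256.0
t = np.arange(0, 60, 1 / fs)
lead = np.sin(2 * np.pi * 10 * t) + 0.3 * np.sin(2 * np.pi * 6 * t)
print(alpha_theta_index(lead, fs))  # clearly above 1 for this signal
```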

4 EEG Test – Potential Predictive Role in Testing Cognitive Disorders

Different potential biomarkers are currently being explored for estimating the risk of cognitive deterioration in Parkinson's Disease (PD). Although most of the studies so far have focused on patients with clinically established Parkinson's Disease Dementia (PDD), the most important task seems to be to discover a potential early biomarker of cognitive impairment in PD from its pre-clinical phases onwards, which is why this has become a more and more widely researched topic nowadays. Recent studies have shown that quantitative EEG (qEEG) is a promising method to study dynamic brain changes and have proven its suitability for identifying possible early electrophysiological markers of PD-related cognitive decline. That could allow marking the early stage and the onset of dementia in Parkinson's disease (PD) and enable the development of early treatment methods.
qEEG analysis of electrocortical activity demonstrated that a decrement of the background rhythm frequency (alpha rhythm), together with an increase of the low-frequency activity (delta and theta), could be associated with the degree of cognitive decline in PD [11]. Reduction in the alpha band is widely known in the literature. Many studies have already revealed a slow-down in the background activity in PD patients with cognitive decline [4,6,11]. Analysing the absolute power of seven frequency bands based on individual alpha frequency revealed that, compared to healthy controls, patients with PD without cognitive decline had higher power in theta and lower alpha1 bands, while PDD patients had higher power in delta, theta, lower alpha1 and beta bands. Higher delta and gamma power with no difference in theta and lower-alpha1 power was the characteristic feature of PD patients with dementia compared to those without dementia [26]. Moreover, studies show that theta and gamma oscillations in PD EEG significantly vary from matched controls. At rest, these oscillations increase in PD patients without dementia, but they decrease in those with cognitive decline [21].
A recent study analysing electrocortical networks in PD patients using low-
resolution electromagnetic tomography (LORETA) and qEEG analysis found a
significant relationship between the electrocortical networks and cognitive dys-
functions in Parkinson’s Disease Mild Cognitive Impairment (PD-MCI). QEEG
analysis showed a relevant decrease of alpha Power Spectral Density (PSD) over
the occipital regions and an increase of delta PSD over the left temporal region
in PD-MCI [25]. An increase of theta power in the left temporal region and a
reduction of median frequency was associated with the presence of early cognitive dysfunction in PD and Alzheimer's Disease (AD). However, EEG slowing was more pronounced in PD than in AD, which may arise from a greater cholinergic deficit in cognitively impaired patients with PD [4].
Other studies also looked at the Event-Related Potentials (ERPs), which
refer to event-related voltage changes in areas of the brain in response to spe-
cific stimuli (e.g., visual, auditory, and somatosensory stimuli). ERPs provide
a valid method to study the ongoing EEG activity during sensory, motor and
cognitive events and can be used as an electrophysiological indicator of cogni-
tive function. Research results show that ERP changes might be used to identify
cognitive impairment in PD patients and can be used to study the correlation
between cognitive and motor functions [11,32]. The P300 ERP is the best-studied
neural marker of cognitive functions. Studies have shown that changes in P300
ERP amplitude and latency are associated with impaired attention and cognitive
decline [32].

5 The Architecture of an Expert System to Assist in the Detection of Cognitive Disorders in Parkinson's Disease
The first part of the proposed inference system to assist in detecting cognitive
disorders in idiopathic Parkinson’s disease (PD) is the confirmation of the diag-
nosis of idiopathic PD. For this purpose, a detailed interview is collected from
the patient, and neurological tests are carried out to identify the parkinsonian
syndrome.
The diagnosis of parkinsonian syndrome requires confirmation of the presence of bradykinesia and one of the two axial symptoms: resting tremor or muscle stiffness. Clinical evaluation of parkinsonian motor disorders is performed using the Unified Parkinson's Disease Rating Scale (MDS-UPDRS) Part III in the off phase and in the on phase, i.e. the period of optimal dopaminergic drug activity, e.g. after administration of a levodopa preparation.
At the same time, we routinely assess the presence of non-motor symptoms
in a patient with PD. Cognitive impairment is of particular interest in this study.
The algorithm for its evaluation is provided at the end of this section.
Usually, after the interview and neurological examination, the next step is
to perform additional tests - laboratory and imaging tests aimed at excluding
secondary and atypical causes of parkinsonism.
In selected cases, the DaTSCAN test, cardiac scintigraphy using MIBG or
other research in a scientific context may be performed at this next stage. The
clinical diagnosis can be given various degrees of certainty. Thus, a patient with a clinically established (sensitivity and specificity 90%) diagnosis of PD cannot have any exclusion criterion or warning symptom but must meet ≥2 confirmatory criteria.
A patient with clinically probable PD (sensitivity and specificity 80%) cannot have any exclusion criteria and may have ≤2 warning symptoms, but their presence must be balanced by meeting 1 (when 1 warning symptom) or 2 (when 2 warning symptoms) confirmatory criteria. Exclusion criteria include cerebellar disturbances, supranuclear palsy of downward vertical gaze or slowing of vertical saccades downward, the presence of a behavioural variant of frontotemporal dementia or primary progressive aphasia ≤5 years of disease, parkinsonian symptoms restricted to the lower extremities >3 years, treatment with dopamine agonists or drugs which reduce the content of dopamine (inducing drug-induced parkinsonism), no improvement after the use of high doses of levodopa (>600 mg/day) despite moderate severity of symptoms, cortical abnormalities, ideomotor limb apraxia or progressive aphasia, a normal image of the presynaptic system in the DaTSCAN study, and a documented alternative cause of parkinsonism.
Warning symptoms include features characteristic of atypical parkinsonian
disorders (PSP, MSA) and rapid progression of gait disorders (wheelchair ≤5
years of disease duration), pyramidal symptoms, bilateral, symmetrical parkin-
sonism, a complete absence of symptom progression ≥5 years (except for sta-
bilisation of symptoms under treatment), lack of non-motor symptoms (sleep
disorders, autonomic or mental disorders, impaired sense of smell) ≤ first five
years of the disease.
Criteria to support PD include resting tremor, levodopa-induced dyskine-
sias, pronounced on/off fluctuations, including wearing off, olfactory impair-
ment, sympathetic denervation in an MIBG study, significant improvement after
dopaminergic treatment.
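The counting rule above can be expressed as a compact decision function. The following Python sketch encodes the described thresholds for illustration only; it is not a validated clinical tool, and the criterion counts are assumed to be established beforehand by the clinician:

```python
def classify_pd(exclusions: int, warnings: int, confirmatory: int) -> str:
    """Apply the certainty levels described above to criterion counts.

    exclusions   - number of exclusion criteria met
    warnings     - number of warning symptoms (red flags) present
    confirmatory - number of confirmatory (supportive) criteria met
    """
    if exclusions > 0:
        return "PD diagnosis not supported"
    # Clinically established: no warning symptoms, >=2 confirmatory criteria.
    if warnings == 0 and confirmatory >= 2:
        return "clinically established PD"
    # Clinically probable: <=2 warning symptoms, each balanced by
    # at least one confirmatory criterion.
    if warnings <= 2 and confirmatory >= warnings:
        return "clinically probable PD"
    return "criteria not met"

print(classify_pd(exclusions=0, warnings=1, confirmatory=1))  # probable
print(classify_pd(exclusions=0, warnings=0, confirmatory=3))  # established
```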
The algorithm for assessing a patient with PD for the presence of cognitive
disorders consists of collecting a detailed history from the patient and the care-
giver, followed by screening tests, and then a full neuropsychological assessment,
which includes the assessment of the presence of cognitive and neuropsychiatric
disorders. The diagnosis of cognitive disorders includes the assessment of the
occurrence, type and severity of attention disorders, executive and visual-spatial
functions, memory and language disorders. Assessment of neuropsychiatric disor-
ders includes the identification of behavioural disorders, mood disorders, anxiety
disorders, impulse control disorders, sleep disorders, and the occurrence of psy-
chotic symptoms. If abnormalities are found in any of these areas, a detailed
assessment should be made - the nature of these symptoms, their severity, fre-
quency, inconvenience for the patient and the caregiver.
According to the current criteria [14], in order to diagnose dementia in PD,
it is necessary to find disorders in ≥2 cognitive domains among attention, mem-
ory, visual-spatial and language disorders. According to the general criteria for
dementia, this deficit must significantly worsen the patient’s daily functioning.
In order to diagnose MCI-PD, it is necessary to determine the deterioration in
everyday activities (but without the need for external help) and the deterioration
in the range of ≥1 cognitive function, but confirmed by performing two tests.
For PDD, it should be confirmed that the motor symptoms appeared before
the cognitive impairment. Before diagnosing MCI-PD or PDD, it is essential to
exclude severe depression, delirium and other causes of cognitive decline.
The next stage of the inference procedure, which forms the basis of the expert
system architecture, is the performance of an electroencephalographic examina-
tion (EEG).
After performing the above tests, the results are summarised, and the final
diagnosis is made:

– non-PD,
– PD without cognitive impairment (PD-CogN),
– PD with MCI (Parkinson’s Disease Mild Cognitive Impairment, PD-MCI),
– PD with dementia (Parkinson’s Disease Dementia, PDD).

The collection and detailing of the entire procedure, which is the basis for devel-
oping the diagnostic support algorithm in Parkinson’s disease, is presented in
three subsequent block diagrams (Figs. 1 and 2).

6 Tele-BRAIN-PL System

The discussed IT system has been implemented in cloud and local (stand-alone) versions. The scope of functionality of both versions is similar from the user's point of view, with the difference that the local version has been equipped with a mechanism that automatically updates the algorithm to the current version held in the cloud. Other functionalities are the same in both versions.
The Microsoft Azure cloud has been used for building the system, as it pro-
vides the best possibilities in terms of data security, system scalability, and high
availability. Two language versions of the system were prepared - Polish and
English, and the switching of languages takes place at the stage of logging into
the system.
The principle of operation of the discussed system (regardless of its version)
is as follows:

– through the application interface, data are delivered to the algorithm, such as the result of the EEG examination and the description of the examination,
– additionally, demographic and clinical information is entered through the system forms.

The information gathered in this way is sent to the algorithm, which analyses
the data provided and returns a numerical result indicating the likelihood of the
disease.
Currently, in the research phase, the result of the algorithm is presented in
numerical form and additionally illustrated by a heat map. The result of the
algorithm is also generated as a .pdf file, which can be printed or saved.
The following technologies were used to build the system:

1. Azure BLOB storage – for data storage (EEG test results),
2. RestAPI – a web application making the system functions available,
3. REDIS – an element mediating communication between RestAPI and Celery,
4. Celery – contains an asynchronously executing algorithm for EEG analysis (a sketch of this pattern follows below).

Fig. 1. Block diagram of the basic diagnostic algorithm
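A rough sketch of how such an asynchronous Celery worker behind the RestAPI might look; the broker address, task name and the placeholder analysis are illustrative assumptions, not the project's actual code:

```python
from celery import Celery

# Redis mediates between the RestAPI web application and the workers.
app = Celery(
    "telebrain",
    broker="redis://localhost:6379/0",      # assumed broker address
    backend="redis://localhost:6379/1",     # stores task results
)

@app.task
def analyze_eeg(blob_url: str, clinical_data: dict) -> dict:
    """Run the EEG analysis asynchronously and return a disease likelihood.

    blob_url would point at the EEG recording in Azure BLOB storage;
    the body below is a placeholder for the actual algorithm.
    """
    # eeg = download_from_blob(blob_url)          # hypothetical helper
    # likelihood = model.predict(eeg, clinical_data)
    likelihood = 0.42                              # placeholder result
    return {"likelihood": likelihood}

# The RestAPI would enqueue work like this and poll for the result:
# task = analyze_eeg.delay("https://.../recording.edf", {"age": 67})
# result = task.get(timeout=300)
```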

Access to the system (regardless of its version) is possible from any internet browser. The necessary condition is the presence of Windows 10 64-bit in a version that already includes WSL (Windows Subsystem for Linux), i.e. build 19041 or later.
TCP and SSL protocols are responsible for data security during transmission in the system. Additionally, the security features provided by the Azure platform itself are used, such as AES encryption with a 256-bit key, one of the strongest block ciphers, compliant with the FIPS 140-2 standard.
Example screenshots of the system are shown in the three images of Fig. 3.
Fig. 2. Illustration of the test procedure for non-motor disorders (on the left) and research algorithm of examination for cognitive impairment (on the right)

Fig. 3. Welcome screen view. Login interface. Data input form

7 Summary and Conclusions

The Tele-BRAIN system results from joint work in an international research and development project, still in progress and under implementation. The presentation of the results of long-term cooperation between representatives of the biomedical engineering and neurology communities is an indispensable element of an open scientific discussion, which leads to the promotion of technical ideas in specific medical applications and is primarily a forum for product improvement. In this context, the currently tested fractions of the diagnostic tree are presented, which (the tree) is an element of a bipartite structure that uses, on the one hand, artificial intelligence methods to detect selected pathologies using electroencephalographic analysis and, on the other hand, simultaneously analyses the information available and used daily in in-depth diagnostics.
The subject of particular interest to the Polish scientific consortium member
(Faculty of Biomedical Engineering, Silesian University of Technology) was
cognitive disorders, which are common and among the most aggravating
complications of Parkinson's disease. Based on the analysis of the medical
literature and the experience of physiotherapists, it is estimated that intensive
prevention of neurocognitive disorders, undertaken well in advance, creates
invaluable opportunities to improve the patient's quality of life, extending
their ability to function independently. Moreover, new, effective neuroprotective
solutions and more effective procognitive therapies are expected in the future,
for which an early and appropriate diagnosis of the clinical problems discussed
here is crucial. Thus, the work undertaken is both of scientific value and a
valuable contribution to the humanistic dimension of personalised telemedicine
care. The ability to conduct examinations without the patient having to visit
the diagnosing hospital is invaluable, especially in the case of
neurodegenerative diseases that significantly reduce mobility.
It is expected that further work will lead to greater automation of the system's
operation, with an emphasis on the extended application of artificial intelligence
methods, and that each contribution to an open scientific discussion will
positively contribute to the patient's comfort.

Acknowledgement. The study was realised within the project 'TeleBrain – Artificially
Intelligent EEG Analysis in the Cloud' (grant number WPN3/9/TeleBrain/2018).

EEG Signal and Deep Learning Approach
in Evaluation of Cognitive Declines
in Parkinson’s Disease

Marcin Bugdol1(B) , Daniel Ledwoń1 , Monika N. Bugdol1 ,


Katarzyna Zawiślak-Fornagiel2 , Marta Danch-Wierzchowska1 ,
and Andrzej W. Mitas1
1
Faculty of Biomedical Engineering, Silesian University of Technology,
ul. Roosevelta 40, 41-800 Zabrze, Poland
{marcin.bugdol,daniel.ledwon,monika.bugdol,
marta.danch-wierzchowska,andrzej.mitas}@polsl.pl
2
University Clinical Center prof. K. Gibiński of the Medical University of Silesia,
ul. Medyków 14, 40-752 Katowice, Poland
katarzyna zawislak@wp.pl

Abstract. In the paper, convolutional neural network models were proposed
that classify the patient's EEG into one of three groups of parkinsonism,
i.e., no cognitive symptoms (PD-N), mild cognitive impairment (PD-MCI), and
Parkinson's Disease Dementia (PDD). Three different architectures of deep
convolutional neural networks were proposed. As the input of the CNN, two
approaches were employed: the raw EEG signal and its transformation to power
spectral density (PSD). The classification was performed as a three-class
task (PD-N vs PD-MCI vs PDD) and as two-class problems (PD-N vs PD-MCI,
PD-N vs PDD, and PD-MCI vs PDD). The obtained accuracy for three classes
exceeded 50%, and for two classes it was mostly between 60 and 70%.

Keywords: Deep learning · EEG · Parkinson’s Disease

1 Introduction
Parkinson’s Disease (PD) is one of the most common neurodegenerative diseases
and the most frequent form of parkinsonian syndrome. The frequency of
occurrence of PD in the general population is about 0.3%; in the population
older than 60 years it is about 1%, and in the population older than 80 years,
about 3% [15,27]. Therefore, the problems resulting from this disease may
constitute a significant social burden.
Parkinson’s Disease Mild Cognitive Impairment (PD-MCI) is diagnosed in
approximately 40–50% of PD patients within five years of PD diagnosis. It
is estimated that approximately 20.2% of PD patients develop mild cognitive
impairment at the time of diagnosis of Parkinson’s disease [21]. It is a group of
symptoms in which there is a cognitive decline, the so-called intermediate state
between the changes observed in the aging process and the disorders that meet
the criteria for the diagnosis of dementia. These symptoms are a well-established
risk factor for the development of dementia [8]. The risk of PD-MCI progressing
to Parkinson's Disease Dementia (PDD) is unclear. PD-MCI may remain stable, or
may even improve and return to normal cognitive functioning in about 11–28% of
cases, not necessarily turning into dementia [23,29].
The time at which PD-MCI symptoms may progress to PDD is an individual
feature, and studies show that the annual rate of progression from PD-MCI to
PDD is around 10%, with particular susceptibility to cognitive decline in people
over 70 [28]. The fact that conversion rates for PDD are much higher in people
with PD-MCI is also confirmed by studies in which almost 60% of patients with
PDD were reported for the ParkWest cohort after five years of follow-up of
patients with PD-MCI [21], and other data reports that as much as 80% of PD-
MCI can be converted to PDD [16]. The mean percentage of patients with PD
diagnosed with dementia is 24–31% [1]. Although the results vary depending on
the study, the prevalence of PDD in patients aged 54 to 70.2 years at diagnosis
is 17% five years after diagnosis, 19.49% ten years after diagnosis, reaching 83%
20 years after diagnosis [9,30,31].
Neuropsychological diagnosis is the gold standard in assessing the cognitive
impairment in PD. However, the way neuropsychological tests are performed
may be modified by the patient’s ability to cooperate, exercise capacity, and
the severity of motor symptoms. Therefore, this diagnosis could be supported
by EEG, which is not only a simple and safe method but also carried out in
the conditions of the patient’s relaxation and does not require effort or retained
verbal functions, which is important – it does not depend on motor symptoms
[13].
The EEG is an important supporting examination to study the effects of
developing dementia on brain function [2,4]. As shown by the previous studies,
the assessment of EEG in PD may be a valuable biomarker, especially in terms
of predicting the occurrence of cognitive disorders and their early diagnosis,
as well as monitoring selected aspects of the effectiveness of treatment [13].
Digital quantitative analysis of the EEG allows the analysis of the absolute
strength of the various frequency bands of the brain’s rhythms, the temporal
relationships between individual regions of the brain, as well as the functional
connectivity between them [7,25]. Bousleiman et al. [5] analyzed the changes in
qEEG characteristics of PD and found that specific changes in alpha and theta
signal strengths may be markers for the diagnosis of the disease itself and are
associated with the presence of MCI in PD. Other studies have tried to compare
the pathophysiological mechanisms of dementia in Alzheimer’s Disease (AD)
with dementia in PD in qEEG and found differences in recording, which may be
associated with different mechanisms of dementia development in both disease
entities [7].
Automated EEG analysis is typically used for the following tasks [6]:

– motor imagery [14],
– emotion recognition [33],
– mental workload assessment,
– pathology detection [17,24],
– seizure detection [10,26],
– evoked potential detection,
– sleep assessment [18].
There are different approaches in signal analysis and machine learning to do
this. Currently, Deep Learning methods have gained enormous popularity. One of
the reasons for this is that the amount of information stored in computer systems
has increased significantly. Thus it is possible to use new approaches that achieve
greater accuracy than “shallow” learning methods. The most commonly used is
the Convolutional Neural Network (CNN). As the network input, among others,
the raw signal [24], the selected frequency range [19] or its spatial representation
[3] may be used. A CNN analyzes the current input without using information
that was available at an earlier point in time. The Recurrent Neural Network
(RNN) offers such a possibility: it considers both the current and previous
states and develops an answer on this basis. One of the drawbacks of RNNs is
the problem of vanishing or exploding gradients, which may prevent these
networks from being trained well. The solution to this inconvenience is the
Long Short-Term Memory (LSTM) layer, in which the previous states are not lost
irretrievably but are used to work out the current classifier response.
In [6,32], it can be found that RNNs, which include LSTM classifiers, are
the most common type of deep learning tools used for EEG seizure detection.
According to these works, the inputs to the RNNs were raw signals, extracted
features, and images. Accuracy is usually lowest for raw signals and
highest for images. RNNs allow for the fusion of data from different EEG stud-
ies but require information about the connection of the electrodes to perform
such transformation [22]. RNN classifiers are usually great at dealing with data
from the same person, but their accuracy decreases when they receive data from
another patient.
In this paper, a convolutional neural network model is proposed that classifies
the patient’s EEG input frames into one of the three groups of parkinsonism, i.e.,
no cognitive symptoms, mild cognitive impairment, and Parkinson's Disease
Dementia, denoted PD-N, PD-MCI, and PDD, respectively.

2 Materials and Methods


The dataset consists of 64 EEG recordings of patients diagnosed with Parkinson's
Disease. It was obtained from the University Clinical Center prof. K. Gibiński
of the Medical University of Silesia in Katowice, and no permission of the
ethical commission was needed. The patients' mean age was 66.1 years
(±11.3 years). There were 21 women and 43 men. Based on the overall clinical
evaluation, including patient history, neurological examination, and
psychological test results, patients were classified into one of three groups
regarding the presence of cognitive impairment: none (28), PD-MCI (20), PDD (16).
The EEG examination was performed according to the routine procedure. The
measurements contain 19 EEG channels (Fig. 1), recorded at a 512 Hz sampling
frequency during a standard routine examination. The full EEG recording for
each patient contained approximately 20 min of signal collected during succes-
sive activation trials. According to the established study protocol, recording
begins with approximately 3 min of resting eyes-closed condition followed by
an eye-opening command. These fragments were used in further analysis and
classification of cognitive impairments.

Fig. 1. Sensor positions

The EEG fragments were first split into 5 s non-overlapping epochs. Epochs
with artifacts from bad channels (gaps in the collected signal) and incomplete
epochs were removed. The resulting number of epochs for each cognitive
disorder class was as follows: 763 PD-N, 607 PD-MCI, 402 PDD. The signals
were filtered in the frequency range of 1–40 Hz with a fourth-order Butterworth
bandpass filter to reduce low-frequency drifts and power line noise.
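
This preprocessing can be sketched as follows (our illustration, not the authors' published code; a channels-by-samples float array eeg recorded at 512 Hz is assumed):

import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 512                  # sampling frequency [Hz]
EPOCH = 5 * FS            # 5 s non-overlapping epochs

# Fourth-order Butterworth band-pass, 1-40 Hz, applied zero-phase.
sos = butter(4, [1, 40], btype="bandpass", fs=FS, output="sos")

def preprocess(eeg: np.ndarray) -> np.ndarray:
    """Filter and cut the signal; returns shape (n_epochs, n_channels, EPOCH)."""
    filtered = sosfiltfilt(sos, eeg, axis=-1)
    n = filtered.shape[-1] // EPOCH                 # drop the incomplete tail
    cut = filtered[:, : n * EPOCH]
    return cut.reshape(eeg.shape[0], n, EPOCH).swapaxes(0, 1)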
In many solutions for EEG classification using deep neural networks, high
performance was obtained with the raw EEG signal as input to a convolutional
neural network. We evaluated different CNN architectures in two approaches:
one-dimensional convolutions only along the time dimension, where the weights
for all EEG channels share the same values, and two-dimensional convolutions
along the time and channel dimensions. The third approach used in the
experiments is also based on 2D convolution; however, before input, each EEG
channel was transformed to its power spectral density (PSD) in the frequency
range limited to 1–40 Hz.
The first deep neural network architecture, EEG-Conv1D (Fig. 2(a)), consists
of three processing blocks and a multilayer perceptron classifier with two
hidden layers. Each block involves a 1D convolutional layer (10, 20, 40 filters,
kernel size 21) followed by a rectified linear unit (ReLU) and max pooling
(pool size equal to 5 and the same stride). After flattening, the two fully
connected layers with the ReLU activation function have 500 and 100 units,
respectively. The two-dimensional EEG-Conv2D (Fig. 2(b)) was evaluated with
the same structure and different variants of the convolution kernel dimensions.
Finally, the best results were achieved by modifying the first processing block
only. In EEG-Conv2D, this block consists of two convolutional layers: the first
with ten filters with a 1 × 21 kernel (convolution in the time dimension) and
the second with ten filters with a 19 × 1 kernel (convolution in the channel
dimension). This block accumulates information from all channels, and in the
following blocks, the 1 × 21 kernel is used in the convolutional layer. All the
networks end with softmax activation in the last layer, with two or three
outputs depending on the experiment.
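
Under these hyper-parameters, EEG-Conv1D can be sketched as follows (the paper does not name a framework, so Keras is assumed here; the input of 2560 samples corresponds to a 5 s epoch at 512 Hz across 19 channels):

from tensorflow.keras import layers, models

def eeg_conv1d(n_classes: int = 3) -> models.Model:
    model = models.Sequential()
    model.add(layers.Input(shape=(2560, 19)))       # time x channels
    for n_filters in (10, 20, 40):                  # three processing blocks
        model.add(layers.Conv1D(n_filters, kernel_size=21, activation="relu"))
        model.add(layers.MaxPooling1D(pool_size=5, strides=5))
    model.add(layers.Flatten())
    model.add(layers.Dense(500, activation="relu"))
    model.add(layers.Dense(100, activation="relu"))
    model.add(layers.Dense(n_classes, activation="softmax"))
    return model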
The third approach, PSD-Conv2D (Fig. 2(c)), was inspired by Ieracitano et al.
[11] and is based on the PSD of each channel as the input. The PSD was
estimated using a periodogram with a rectangular window function. The shape of
the resulting CNN input was 194 PSD samples in 19 channels. The deep neural
network architecture consists of only one processing block with 16 filters of
size 3 × 3, batch normalization, ReLU activation, and 3 × 3 max pooling with
strides of 2 × 2. The resulting 13,824 features were then classified by one
hidden layer with 300 units and a final layer with softmax activation.
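
A sketch of the PSD input and of PSD-Conv2D under the stated hyper-parameters is given below (framework assumed as above; the 'same' convolution padding is our assumption, chosen so that the flattened size matches the reported 13,824 features):

import numpy as np
from scipy.signal import periodogram
from tensorflow.keras import layers, models

def epoch_to_psd(epoch: np.ndarray, fs: int = 512) -> np.ndarray:
    """Per-channel periodogram (rectangular window), restricted to 1-40 Hz."""
    f, pxx = periodogram(epoch, fs=fs, window="boxcar", axis=-1)
    band = (f >= 1) & (f <= 40)
    return pxx[:, band]          # approx. (19, 194), as reported in the text

def psd_conv2d(n_classes: int = 3) -> models.Model:
    return models.Sequential([
        layers.Input(shape=(19, 194, 1)),
        layers.Conv2D(16, (3, 3), padding="same"),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2)),
        layers.Flatten(),        # 9 * 96 * 16 = 13824 features
        layers.Dense(300, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])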
Stratified 8-fold cross-validation was used to validate each model. The set of
patients was randomly divided into eight subsets preserving the percentage of
subjects from each class. We chose an 8-fold scheme because, with 64 patients,
the number of unique patients in each fold was the same. Eight patients in
the testing set are sufficient to provide an example from each class. The
training was performed eight times; each time, one of the subsets served as
the testing set, used after training to compute the evaluation metrics:
accuracy, precision, recall, and F1-score. Results from the eight folds were
then presented as the average and standard deviation. We also collected the
results from each iteration in a confusion matrix. For the evaluation of the
3-class results, we used the macro-averaging strategy: for each testing set,
metric values were calculated separately for each label and then averaged.
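
The patient-level split can be sketched with scikit-learn (our illustration; epochs are assigned to folds through their patient, so no subject appears in both the training and the testing set):

import numpy as np
from sklearn.model_selection import StratifiedKFold

# one label per patient: 28 PD-N (0), 20 PD-MCI (1), 16 PDD (2)
patient_labels = np.repeat([0, 1, 2], [28, 20, 16])
patient_ids = np.arange(len(patient_labels))

skf = StratifiedKFold(n_splits=8, shuffle=True, random_state=0)
for fold, (train_pat, test_pat) in enumerate(skf.split(patient_ids, patient_labels)):
    # select all 5 s epochs belonging to train_pat / test_pat patients,
    # train the CNN, then compute accuracy, precision, recall, and F1-score
    print(f"fold {fold}: {len(train_pat)} train / {len(test_pat)} test patients")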
In CNN training, we used stochastic gradient descent (SGD) optimization with
a learning rate of 0.0001 and a momentum coefficient α = 0.9. The mini-batch
size was set to 91 EEG fragments. The early stopping technique was used to
prevent overfitting: the number of training epochs depended on the improvement
of the categorical cross-entropy loss and on the accuracy value remaining
below 1.
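
In Keras terms (assumed framework, as above), this training configuration corresponds to the following sketch; the patience value and the monitored quantity are our interpretation of the stopping rule described above:

from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import SGD

model = eeg_conv1d(n_classes=3)     # from the sketch above
model.compile(optimizer=SGD(learning_rate=0.0001, momentum=0.9),
              loss="categorical_crossentropy", metrics=["accuracy"])

early_stop = EarlyStopping(monitor="loss", patience=5,   # patience assumed
                           restore_best_weights=True)
# model.fit(x_train, y_train, batch_size=91, epochs=200, callbacks=[early_stop])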

3 Results
In the first experiments, individual models were validated in a three-class clas-
sification. Results from each cross-validation iteration were aggregated in the
confusion matrix presented in Fig. 3. Average results presented in Table 1 are
poor, but on the other hand, results for different folds could reach almost 70%
accuracy.

Table 1. PD-N vs PD-MCI vs PDD

Accuracy Precision Recall F1-score


EEG-Conv1D 51.37 ± 17.92 41.92 ± 17.91 47.15 ± 18.30 43.44 ± 17.81
EEG-Conv2D 52.03 ± 12.53 45.52 ± 10.56 46.27 ± 11.24 43.06 ± 8.34
PSD-Conv2D 44.89 ± 12.23 43.28 ± 11.39 45.04 ± 11.49 40.95 ± 10.40

Fig. 2. Architectures of convolutional neural networks

The binary classification on reduced datasets was evaluated to verify the
CNNs' ability to separate signals from different cognitive disorder levels
(Tables 2, 3 and 4). For all models, the best results were achieved for the
classification between Parkinson's Disease without cognitive impairment and
dementia. The best results in all cases were achieved by EEG-Conv2D.

Fig. 3. Confusion matrices presenting the aggregated classification results from
the cross-validation procedure

Table 2. PD-N vs PDD

Accuracy Precision Recall F1-score


EEG-Conv1D 64.34 ± 10.58 48.66 ± 16.19 44.79 ± 18.15 44.14 ± 16.64
EEG-Conv2D 61.59 ± 15.65 40.05 ± 22.00 50.33 ± 33.42 43.41 ± 25.82
PSD-Conv2D 67.35 ± 10.81 50.66 ± 24.71 51.85 ± 25.20 49.28 ± 21.46

Table 3. PD-N vs PD-MCI

Accuracy Precision Recall F1-score


EEG-Conv1D 57.93 ± 12.71 48.18 ± 17.98 54.20 ± 25.11 46.77 ± 19.86
EEG-Conv2D 65.02 ± 13.69 61.04 ± 33.26 46.79 ± 26.12 48.81 ± 27.69
PSD-Conv2D 53.59 ± 13.33 46.50 ± 20.64 39.93 ± 18.18 39.12 ± 15.42

Table 4. PD-MCI vs PDD

Accuracy Precision Recall F1-score


EEG-Conv1D 57.43 ± 19.99 62.79 ± 22.22 58.33 ± 29.26 56.58 ± 24.76
EEG-Conv2D 58.64 ± 19.31 64.98 ± 22.99 58.26 ± 27.13 57.50 ± 24.24
PSD-Conv2D 54.12 ± 14.54 58.43 ± 25.62 58.91 ± 24.74 56.42 ± 21.98

4 Discussion
Regardless of the network architecture, the obtained results oscillate around
50–70% for two classes and about 50% for three classes. This indicates that
the solutions used to distinguish cognitive disorders from AD or healthy cases
from AD are not the best choice for the described task. The approaches in which
the network input was the raw EEG signal directly in the time domain did not
give good results, and the use of PSD did not improve the accuracy. The large
standard deviations show that the results depended heavily on the composition
of the train and test sets, which suggests that the dataset contains too few
cases to achieve global generalization for the problem of cognitive disorder
classification.
We evaluated three different CNN approaches presented in the literature
concerning EEG classification. They were used successfully in different
classification tasks: Parkinson's Disease (Oh 2020), cognitive impairments
[11], and motor imagery [24]. We reproduced the EEG signal
processing and preparation for deep learning classification presented in other
papers. The results showed the limitations of cognitive impairments classifica-
tion in PD patients by known CNN approaches. The cohort in the presented
study is relatively small, considering the diversity of PD duration, other dis-
eases, and pathologies in EEG recordings. The other shortcoming is the dif-
ference in recording lengths, leading to imbalances in training and testing set
during cross-validation based on patients. For this reason, it was difficult to
adjust the parameters in such a way as to keep the balance between complete
lack of generalization ability and overfitting.
One of the necessary preprocessing steps, which will be included in further
work, will be detecting frames containing artefacts and their removal from further
processing. Then, various possibilities of processing the EEG signal into the form
of a feature vector will be tested, which will constitute the input of the neural
network. An additional element that could improve classification accuracy is
data augmentation by overlapping consecutive frames.
In the previous works published so far on the automatic classification of cogni-
tive disorders based on EEG, the occurrence of neurological disorders, including
PD, was one of the exclusion criteria [12,20]. However, the risk of developing
cognitive impairment in people with PD is very high (20.2% of PD patients
develop mild cognitive impairment at the time of diagnosis of Parkinson’s dis-
ease and approximately 40–50% of PD patients within five years of PD diagnosis
[21]), therefore it is necessary to look for solutions enabling their early detection,
especially in the presence of neurological disorders.

5 Conclusion
The paper proposed the first approach for evaluating cognitive declines in Parkin-
son’s Disease using deep learning classifiers. The obtained results are promising,
and they indicate that such a solution is possible. On the other hand, the archi-
tectures of CNN proposed for a similar task are not quite suitable for patients
with PD. Further work is therefore needed to create a tool for monitoring
changes in the cognitive impairments of PD patients.

Acknowledgement. The study was realised within the project “TeleBrain – Artificially
Intelligent EEG Analysis in the Cloud” (grant no. WPN3/9/TeleBrain/2018).

References
1. Aarsland, D., Zaccai, J., Brayne, C.: A systematic review of prevalence studies of
dementia in Parkinson’s disease. Mov. Disord. 20(10), 1255–1263 (2005). https://
doi.org/10.1002/mds.20527
2. Babiloni, C., et al.: Abnormalities of cortical neural synchronization mechanisms in
patients with dementia due to Alzheimer’s and Lewy body diseases: an EEG study.
Neurobiol. Aging 55, 143–158 (2017). https://doi.org/10.1016/j.neurobiolaging.
2017.03.030
3. Bashivan, P., Rish, I., Yeasin, M., Codella, N.: Learning representations from EEG
with deep recurrent-convolutional neural networks (2015)
4. Bonanni, L., et al.: Quantitative electroencephalogram utility in predicting conver-
sion of mild cognitive impairment to dementia with Lewy bodies. Neurobiol. Aging
36(1), 434–445 (2015). https://doi.org/10.1016/j.neurobiolaging.2014.07.009
5. Bousleiman, H., et al.: P122. Alpha1/theta ratio from quantitative EEG (qEEG) as
a reliable marker for mild cognitive impairment (MCI) in patients with Parkinson’s
disease (PD). Clin. Neurophysiol. 126(8), e150–e151 (2015). https://doi.org/10.
1016/j.clinph.2015.04.249
6. Craik, A., He, Y., Contreras-Vidal, J.: Deep learning for electroencephalogram
(EEG) classification tasks: a review. J. Neural Eng. 16(3), 031001 (2019). https://
doi.org/10.1088/1741-2552/ab0ab5
7. Fonseca, L.C., Tedrus, G.M., Carvas, P.N., Machado, E.C.: Comparison of quanti-
tative EEG between patients with Alzheimer’s disease and those with Parkinson’s
disease dementia. Clin. Neurophysiol. 124(10), 1970–1974 (2013). https://doi.org/
10.1016/j.clinph.2013.05.001
8. Goldman, J.G., Sieg, E.: Cognitive impairment and dementia in Parkinson disease.
Clin. Geriatr. Med. 36, 365–377 (2020). https://doi.org/10.1016/j.cger.2020.01.001
9. Hely, M.A., Reid, W.G., Adena, M.A., Halliday, G.M., Morris, J.G.: The Sydney
multicenter study of Parkinson’s disease: the inevitability of dementia at 20 years.
Mov. Disord. 23(6), 837–844 (2008). https://doi.org/10.1002/mds.21956
10. Hussein, R., Palangi, H., Ward, R.K., Wang, Z.J.: Optimized deep neural network
architecture for robust detection of epileptic seizures using EEG signals. Clin.
Neurophysiol. 130(1), 25–37 (2019). https://doi.org/10.1016/j.clinph.2018.10.010
11. Ieracitano, C., Mammone, N., Bramanti, A., Hussain, A., Morabito, F.C.: A con-
volutional neural network approach for classification of dementia stages based on
2d-spectral representation of EEG recordings. Neurocomputing 323, 96–107 (2019)

12. Ieracitano, C., Mammone, N., Hussain, A., Morabito, F.C.: A novel multi-modal
machine learning based approach for automatic classification of EEG recordings in
dementia. Neural Netw. 123, 176–190 (2020)
13. Klassen, B., et al.: Quantitative EEG as a predictive biomarker for Parkinson dis-
ease dementia. Neurology 77(2), 118–124 (2011). https://doi.org/10.1212/WNL.
0b013e318224af8d
14. Kumar, S., Sharma, A., Tsunoda, T.: Brain wave classification using long short-
term memory network based optical predictor. Sci. Rep. 9(1), 9153 (2019). https://
doi.org/10.1038/s41598-019-45605-1
15. Lee, A., Gilbert, R.M.: Epidemiology of Parkinson disease. Neurol. Clin. 34(4),
955–965 (2016). https://doi.org/10.1016/j.ncl.2016.06.012
16. Litvan, I.: Diagnostic criteria for mild cognitive impairment in Parkinson’s dis-
ease: Movement disorder society task force guidelines. Mov. Disord. 27(3), 349–356
(2012). https://doi.org/10.1002/mds.24893
17. Medvedev, A.V., Agoureeva, G.I., Murro, A.M.: A long short-term memory neural
network for the detection of epileptiform spikes and high frequency oscillations.
Sci. Rep. 9(1), 19374 (2019). https://doi.org/10.1038/s41598-019-55861-w
18. Michielli, N., Acharya, U.R., Molinari, F.: Cascaded LSTM recurrent neural net-
work for automated sleep stage classification using single-channel EEG signals.
Comput. Biol. Med. 106, 71–81 (2019). https://doi.org/10.1016/j.compbiomed.
2019.01.013
19. Nejedly, P., et al.: Intracerebral EEG artifact identification using convolutional neu-
ral networks. Neuroinformatics 17(2), 225–234 (2018). https://doi.org/10.1007/
s12021-018-9397-6
20. Oltu, B., Akşahin, M.F., Kibaroğlu, S.: A novel electroencephalography based app-
roach for Alzheimer’s disease and mild cognitive impairment detection. Biomed.
Sig. Process. Control 63, 102223 (2021)
21. Pedersen, K.F., Larsen, J.P., Tysnes, O.B., Alves, G.: Natural course of mild cog-
nitive impairment in Parkinson disease. Neurology 88(8), 767–774 (2017). https://
doi.org/10.1212/WNL.0000000000003634
22. Praveena, M., Sarah, A., George, S.: Deep learning techniques for EEG signal appli-
cations - a review. IETE J. Res. 1–8 (2020). https://doi.org/10.1080/03772063.
2020.1749143
23. Saredakis, D., Collins-Praino, L., Gutteridge, D., Stephan, B., Keage, H.: Conver-
sion to MCI and dementia in Parkinson’s disease: a systematic review and meta-
analysis. Parkinsonism Relat. Disord. 65, 20–31 (2019). https://doi.org/10.1016/
j.parkreldis.2019.04.020
24. Schirrmeister, R., Gemein, L., Eggensperger, K., Hutter, F., Ball, T.: Deep learn-
ing with convolutional neural networks for decoding and visualization of EEG
pathology. In: 2017 IEEE Signal Processing in Medicine and Biology Symposium
(SPMB), pp. 1–7 (2017). https://doi.org/10.1109/SPMB.2017.8257015
25. Thatcher, R., North, D., Biver, C.: EEG and intelligence: relations between EEG
coherence, EEG phase delay and power. Clin. Neurophysiol. 116(9), 2129–2141
(2005). https://doi.org/10.1016/j.clinph.2005.04.026
26. Tsiouris, K.M., Pezoulas, V.C., Zervakis, M., Konitsiotis, S., Koutsouris, D.D.,
Fotiadis, D.I.: A long short-term memory deep learning network for the prediction
of epileptic seizures using EEG signals. Comput. Biol. Med. 99, 24–37 (2018).
https://doi.org/10.1016/j.compbiomed.2018.05.019
27. Tysnes, O.-B., Storstein, A.: Epidemiology of Parkinson’s disease. J. Neural
Transm. 124(8), 901–905 (2017). https://doi.org/10.1007/s00702-017-1686-y

28. Vasconcellos, L.F.R., et al.: Mild cognitive impairment in Parkinson’s disease: char-
acterization and impact on quality of life according to subtype. Geriatr. Gerontol.
Int. 19(6), 497–502 (2019). https://doi.org/10.1111/ggi.13649
29. Weil, R.S., Costantini, A.A., Schrag, A.E.: Mild cognitive impairment in Parkin-
son’s disease—what is it? Curr. Neurol. Neurosci. Rep. 18(4), 1–11 (2018). https://
doi.org/10.1007/s11910-018-0823-9
30. Williams-Gray, C.H., et al.: The distinct cognitive syndromes of Parkinson’s dis-
ease: 5 year follow-up of the CamPaIGN cohort. Brain 132(11), 2958–2969 (2009).
https://doi.org/10.1093/brain/awp245
31. Williams-Gray, C.H., et al.: The CamPaIGN study of Parkinson’s disease: 10-year
outlook in an incident population-based cohort. J. Neurol. Neurosurg. Psychi-
atry 84(11), 1258–1264 (2013). https://doi.org/10.1136/jnnp-2013-305277. URL
https://jnnp.bmj.com/content/84/11/1258
32. Zhang, X., Yao, L., Wang, X., Monaghan, J., McAlpine, D., Zhang, Y.: A survey on
deep learning-based non-invasive brain signals: recent advances and new frontiers.
J. Neural Eng. 18(3), 031002 (2021). https://doi.org/10.1088/1741-2552/abc902
33. Zhang, Y., et al.: An investigation of deep learning models for EEG-based emo-
tion recognition. Front. Neurosci. 14 (2020). https://doi.org/10.3389/fnins.2020.
622759. URL https://www.frontiersin.org/article/10.3389/fnins.2020.622759
The Role of Two-Dimensional Entropies
in IRT-Based Pregnancy Determination
Evaluated on the Equine Model

Marta Borowska1(B) , Malgorzata Maśko2 , Tomasz Jasiński3 ,


and Malgorzata Domino3
1
Institute of Biomedical Engineering, Faculty of Mechanical Engineering,
Bialystok University of Technology, Wiejska 45C, 15-351 Bialystok, Poland
m.borowska@pb.edu.pl
2
Institute of Animal Science, Department of Animal Breeding,
Warsaw University of Life Sciences, Nowoursynowska 100, 02-797 Warsaw, Poland
malgorzata masko@sggw.edu.pl
3
Institute of Veterinary Medicine, Department of Large Animal Diseases and Clinic,
Warsaw University of Life Sciences, Nowoursynowska 100, 02-797 Warsaw, Poland
{tomasz jasinski,malgorzata domino}@sggw.edu.pl

Abstract. Infrared thermography (IRT) has been used as a tool to detect


pregnancy in horses. IRT measures heat emission from the body surface,
which increases with increased blood flow and metabolic activity in uter-
ine and fetal tissues. This study aimed to extract the entropy-based fea-
tures of the IRT image texture and compare them to find pregnancy-
related and color-related differences. In the current study, 40 mares were
divided into non-pregnant and pregnant groups and IRT imaging was per-
formed. The thermal images were converted into grayscale images using
four conversion methods: grayscale image and three basic channels, the
Red, Green, and Blue channels. Image texture features were calculated
in each channel, based on four entropy measures: two-dimensional sam-
ple entropy, two-dimensional fuzzy entropy, two-dimensional dispersion
entropy, and two-dimensional distribution entropy. The signs of higher
irregularity and complexity of IRT image texture were evidenced for the
composite Grayscale image using two-dimensional sample entropy, two-
dimensional fuzzy entropy, and two-dimensional dispersion entropy. The
pattern of the IRT image texture differed in the Red and Blue channels
depending on the considered entropy feature.

Keywords: Two-dimensional entropies · Surface temperature · Mares

1 Introduction

Detection of pregnancy is important for reducing losses and managing the ani-
mal herd. Pregnancy detection is based on a clinical examination supported by
additional tests, classified accordingly as invasive or non-invasive methods.
Invasive additional tests, such as hormone level evaluation, require more than
a single blood sampling, whereas non-invasive methods, such as ultrasound
examination, require direct contact, which can also be a stressful factor
[4,16,22]. Thus, there is a need for a contactless, objective, quantitative
method of pregnancy detection that could be useful in large herds of animals
or in non-domestic animals. Infrared thermography (IRT) is a non-invasive method that
measures the temperature distribution of a surface and transforms it into an image
representing the differences in emitted heat [17]. Therefore, IRT is a useful diag-
nostic method [10,14], which has been successfully applied in veterinary medicine
to assess pregnancy in various species [3,20]. However, IRT is sensitive to different
types of factors: internal factors and external factors. Internal factors are related
to tissue metabolism and blood flow. During pregnancy, blood flows through the
uterus increases, making it possible to detect even subtle temperature changes
that can be detected by the IRT in its vicinity. External factors such as fluctua-
tions in ambient temperature [26], other weather conditions [20], and individual
characteristics of the equine body surface [9] can also affect the image recorded.
Therefore, thermal images should be obtained under standard conditions. Achiev-
ing this assumption is possible, however, the time required to detect pregnancy-
related changes in body surface temperature [3] and the accuracy of detection of
pregnancy [3,7,20] limit the application of IRT to late stages of pregnancy. There-
fore, there is a need to search for methods to describe thermal images in a case of
early and middle pregnancy detection. There are many approaches to describe
images. One of them is to treat an image as a texture [13]. Texture can be defined
as a measure of roughness, coarseness, directionality, or regularity [29]. Relatively
recent entropy-based methods have been introduced for quantitative description
of the texture. These methods are simple to implement and are based on the prop-
agation of measures for one-dimensional data. This new class of methods is com-
puted directly on the image and is related to image irregularity or complexity [13].
Until now, methods were known in which the entropy measure was calculated but
as a measure, disordered matrix obtained from the processing step applied to the
image [28]. Entropy-based approaches certainly deserve attention and needs to
be evaluated for usefulness in different types of applications [30]. Therefore, in
our work, entropy-based methods of images texture analysis have been applied
in the equine model of pregnancy. This study aimed to compare the results of
four based entropy methods: two-dimensional sample entropy, two-dimensional
fuzzy entropy, two-dimensional dispersion entropy, and two-dimensional distribu-
tion entropy; in order to find differences between IRT images obtained from preg-
nant and non-pregnant mares as well the contribution of subsequent color channels
in the whole measures of the entropy of thermal images.

2 Materials and Methods


2.1 Animals
In the research, 40 adult non-lactating, non-maiden Konik Polski breed mares
(n = 40; age 6.00 ± 3.95 years) were imaged using IRT according to the
protocol approved by the II Local Ethical Committee on Animal Testing in
Warsaw on behalf of the National Ethical Committees on Animal Testing (Number
WAW2/007/2020, dated 15 January 2020). The mares were selected out of a herd

WAW2/007/2020, day 15 January 2020). The mares were selected out of a herd
of 90 Konik Polski horses housed in the Polish state stud farm Dobrzyniewo.
The selection criteria were age over 3 years, lack of lactation, lack of the clin-
ical symptoms of the disease on the day of the first examination and in the
two weeks before, and the lack of pharmacological treatment during the two
weeks before the examination. On the first day of research, the basic clinical
examination was carried out following international veterinary standards. Then
the detailed reproductive tract examination with the ultrasonographic exami-
nation of the reproductive tract was conducted following standard protocol [6].
Based on the findings, the mares were divided into one of two distinct groups
of mares: (i) non-pregnant (n = 14; age 5.50 ± 3.90 years) and (ii) pregnant
(n = 26; age 6.27 ± 4.04 years). The non-pregnant group’s inclusion criteria were
lack of mating during the current reproductive season and the negative result
of ultrasonographical pregnancy examination. The pregnant group’s inclusion
criteria were natural mating during the current reproductive season in February
and/or March and the positive result of ultrasonographical pregnancy examina-
tion between 14 and 40-days post-ovulation. All mares were housed under the
same conditions.

2.2 Thermal Images Acquisition

The same body area of the mares, the left lateral surface of the abdomen, was imaged
using a non-contact thermal camera (FLIR Therma CAM E60, FLIR Systems
Brazil, Brazil) under the same imaging conditions [15,26] (0.99 emissivity, closed
space devoid of wind and sun radiation, 2.0 m distance, position of the central
beam on the half of the vertical line through the tuber coxae, 1 h acclimatiza-
tion to the imaging condition, brushing off dirt and mud from the imaged area
15 min before imaging). A total of 160 thermal images were taken four times
every two months, starting in June and ending in December, with the ambi-
ent temperature and humidity ranging from 1.0 ◦ C and 50% to 24 ◦ C and 90%,
respectively. Obtained thermal images were set in the same temperature range
(10.0 to 40.0 ◦ C) using the software FLIR Tools Professional (FLIR Systems
Brasil, Brazil) and manually segmented. During segmentation, one region of
interest (ROI) was annotated on each thermal image in the flank area of the left
lateral surface of the abdomen ranging from the vertical line behind the tuber
coxae, the dorsal edge of the abdomen, the caudal edge of the last rib, and the
lower 2/3 of the abdomen height. The annotated thermal images were converted
to bitmaps and prepared for entropy-based image texture analysis in the ImageJ
software (Wayne Rasband, National Institutes of Health, USA). In total, there
were 160 registered images, within which 160 ROIs were delineated (Fig. 1).
Each ROI was converted into grayscale, and four methods of image conversion
were performed (Fig. 1). The first method uses only the Red channel, the
second the Green channel, and the third the Blue channel. The fourth method
is based on grayscale conversion by the cv2 module in Python, version 3.8.5
64-bit. Next, in each color channel separately, the entropy-based texture
analysis was performed using two-dimensional sample entropy, two-dimensional
fuzzy entropy, two-dimensional dispersion entropy, and two-dimensional
distribution entropy. The mentioned features were computed using Python,
version 3.8.5 64-bit.
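
The conversion step can be sketched with cv2 as follows (our illustration; the file name is hypothetical, and cv2 loads bitmaps in BGR order, hence the channel unpacking below):

import cv2

bgr = cv2.imread("roi.bmp")                    # annotated ROI bitmap
blue, green, red = cv2.split(bgr)              # the three basic color channels
gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)   # composite grayscale image

channels = {"Grayscale": gray, "Red": red, "Green": green, "Blue": blue}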

Fig. 1. Example of the pregnant mares and the annotated ROI conversion to the
grayscale image, Red (R) channel, Green (G) channel, and Blue (B) channel

2.3 Two-Dimensional Entropy Measures

Two-Dimensional Sample Entropy (SampEn2D). The two-dimensional


sample entropy of images is an extension of the one-dimensional time series
measure SampEn [23,25]. The SampEn2D method defines two-dimensional patterns
(square windows) of size m × m. Each window is then compared to all other
windows of the same size in the image. Pattern matching refers to the case
where each pixel in one window differs by no more than r from the corresponding
pixel in the window with which it is compared. The average occurrence
probability is calculated for all windows of size m and m + 1, and SampEn2D is
defined as the negative natural logarithm of the ratio of these average
probabilities. Regular images or images with periodic structures have the same
number of matching patterns for both m and m + 1 and therefore achieve low
values of entropy. On the other hand, irregular images return high entropy values.
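
A direct, if naive, NumPy sketch of this definition is given below (our illustration, not the authors' code; following common practice, the tolerance is taken as r times the image standard deviation):

import numpy as np

def sampen2d(img, m: int = 3, r: float = 0.2) -> float:
    x = np.asarray(img, dtype=float)
    tol = r * x.std()
    h, w = x.shape
    # top-left corners at which both the m and the (m + 1) window fit
    corners = [(i, j) for i in range(h - m) for j in range(w - m)]

    def matches(size: int) -> int:
        wins = np.stack([x[i:i + size, j:j + size].ravel() for i, j in corners])
        count = 0
        for k in range(len(wins) - 1):          # pairwise Chebyshev distances
            d = np.abs(wins[k + 1:] - wins[k]).max(axis=1)
            count += int((d <= tol).sum())
        return count

    b, a = matches(m), matches(m + 1)           # similar pairs for m and m + 1
    return -np.log(a / b) if a and b else float("inf")

The quadratic pairwise comparison makes this sketch practical only for small ROIs; optimised implementations are advisable for full-resolution images.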

Two-Dimensional Fuzzy Entropy (F uzzEn2D). Two-dimensional fuzzy


entropy is an extension of the one-dimensional measure F uzzEn [5,12].
F uzzEn2D is defined as the negative natural logarithm of the conditional prob-
ability that two patterns similar for their corresponding m points will remain
similar for the next m + 1 points. It uses a continuous exponential function
to determine the degree of similarity between vectors. Images containing regu-
lar patterns or periodic structures have low value of entropy while images with
irregular patterns or non-periodic structures have high value of entropy.

Two-Dimensional Dispersion Entropy (DispEn2D). Two-dimensional dis-


persion entropy is an extension of the one-dimensional DispEn [2,24]. The
calculation of DispEn2D relies on mapping the image pixel values to c classes.
After mapping, the obtained results are matched to dispersion patterns π,
and the probability p(π) of each dispersion pattern is calculated. If all
possible two-dimensional image dispersion patterns have the same probability
value, DispEn2D reaches its maximum value. If only one probability value p is
other than zero, DispEn2D reaches its minimum value and the image has a regular
pattern.

Two-Dimensional Distribution Entropy (DistEn2D). The two-dimensional
distribution entropy was introduced for the quantitative description of the
irregularities of images, taking into account small image sizes [1,18]. In the
process of counting the amount of similarity between two patterns, the distance
between the corresponding windows is measured. The histogram of the distance
matrix is used to estimate the empirical probability density function (ePDF).
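
Reusing the channel dictionary and the sampen2d sketch above, the per-channel feature extraction reduces to a loop of the following form (fuzzen2d, dispen2d, and disten2d are hypothetical analogues of the cited 2D measures and are not spelled out here):

features = {}
for name, channel in channels.items():
    features[name] = {
        "SampEn2D": sampen2d(channel, m=3, r=0.2),
        # "FuzzEn2D": fuzzen2d(channel, ...),   # hypothetical analogue
        # "DispEn2D": dispen2d(channel, ...),   # hypothetical analogue
        # "DistEn2D": disten2d(channel, ...),   # hypothetical analogue
    }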

2.4 Statistical Analysis


Data series, representing the results of the four entropy-based methods of IRT
image analysis, were tested for univariate distributions using a Shapiro-Wilk
normality test, independently for the non-pregnant and pregnant groups as well
as for each examined color model. The comparisons between data series for the
non-pregnant and pregnant groups were assessed using the Mann-Whitney U test.
The comparisons between color channels were assessed using the Kruskal-Wallis
test followed by Dunn's multiple comparisons test. The alpha value was
established as α = 0.05. The numerical data were presented on the box plots
as the minimum and maximum values, the median, and the lower (Q1) and upper
(Q3) quartiles. Statistical analysis was performed using GraphPad Prism6
software (GraphPad Software Inc., USA).
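
The same pipeline can be sketched with open-source substitutes (scipy and scikit-posthocs here stand in for GraphPad Prism; the value arrays are synthetic placeholders):

import numpy as np
import scipy.stats as st
import scikit_posthocs as sp            # provides Dunn's multiple comparisons

rng = np.random.default_rng(0)
non_pregnant = rng.normal(1.5, 0.5, 14)  # placeholder entropy values
pregnant = rng.normal(1.8, 0.8, 26)
by_channel = [rng.normal(mu, 0.3, 40) for mu in (1.6, 1.4, 1.6, 1.7)]

w, p_norm = st.shapiro(pregnant)                         # normality per group
u, p_mw = st.mannwhitneyu(non_pregnant, pregnant,
                          alternative="two-sided")       # group comparison
h, p_kw = st.kruskal(*by_channel)                        # channel comparison
p_dunn = sp.posthoc_dunn(by_channel)                     # post hoc pairwise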

3 Results
For all calculated features, two parameters were manually selected: m = 3 and
r = 0.2. Within the entropy features measured in the grayscale images, the
SampEn2D, FuzzEn2D, and DispEn2D were higher in the pregnant than in the
non-pregnant group, whereas for the DistEn2D no difference was found between
groups (Gray boxes on Figs. 2, 3, 4 and 5). When the thermal images were
converted to subsequent color channels, the most significant differences between
studied groups were found for the Red channel. Concerning the Red channel, the
SampEn2D and FuzzEn2D were lower in the pregnant than in the non-pregnant
group, whereas the DispEn2D and DistEn2D were, on the contrary, higher (Red
boxes on Figs. 2, 3, 4 and 5). Concerning the Green channel, the FuzzEn2D,
DispEn2D, and DistEn2D were higher in the pregnant than in the non-pregnant
group, whereas for the SampEn2D no difference was found between groups (Green
boxes on Figs. 2, 3, 4 and 5). Finally, concerning the Blue channel, the
SampEn2D and FuzzEn2D were lower in the pregnant than in the non-pregnant
group, whereas the DispEn2D and DistEn2D were, on the contrary, higher (Blue
boxes on Figs. 2, 3,
4 and 5). When different color channels were considered, a different pattern
of entropy features occurred in the non-pregnant and pregnant groups. In the


non-pregnant group, the SampEn2D was higher in the Red and Blue channels
than in the Grayscale image and the Green channel. In the pregnant group, the
SampEn2D was lower in the Red channel than in the other channels, with no
differences between the Green channel and the Grayscale images and the Blue
channels, respectively. However, the SampEn2D was lower in the Blue channel
than in the Grayscale images and simultaneously higher than in the Red channel
(Table 1, Fig. 2).

Table 1. The results of two-dimensional sample entropy (SampEn2D) for different


color channels: mean, median, standard deviation (std) and the significance of p value
between groups (non-pregnant; pregnant)

Color channels Non-pregnant Pregnant p


mean median std mean median std
Grayscale 1.49 1.45 0.48 1.84 1.73 0.80 0.0091
Red 2.19 2.21 0.92 0.65 0.32 0.79 <0.0001
Green 1.47 1.45 0.51 1.72 1.57 0.78 0.0842
Blue 1.94 1.90 0.75 1.42 1.38 0.66 <0.0001

Fig. 2. The box plots of two-dimensional sample entropy (SampEn2D). The upper
whisker represents the maximum value; the upper line of the box represents Q3 (upper
quartile); the center line inside the box represents the median; the lower line of the box
represents Q1 (lower quartile); and the lower whisker represents the minimum value.
Lower case letters indicate differences between groups (non-pregnant; pregnant) for
p < 0.05. Asterisks indicate differences between color channels (Grayscale, Red, Green, Blue)
(∗ ∗ p < 0.01; ∗ ∗ ∗p < 0.0001)

In the non-pregnant group, the FuzzEn2D differed between all of the exam-
ined color channels. In this group, the FuzzEn2D was the lowest in the Red
channel, low in the Grayscale images, high in the Green channel, and the high-
est in the Blue channel. In the pregnant group, the FuzzEn2D was lower in the

Table 2. The results of two-dimensional fuzzy entropy (FuzzEn2D) for different color
channels: mean, median, standard deviation (std) and the significance of p value
between groups (non-pregnant; pregnant)

Color channels Non-pregnant Pregnant p


mean median std mean median std
Grayscale 2.10 2.06 0.46 2.66 2.57 0.99 0.0006
Red 1.61 1.64 0.43 0.89 0.85 0.37 <0.0001
Green 2.36 2.43 0.60 2.87 2.71 1.08 0.0035
Blue 2.75 2.76 0.66 2.33 2.27 0.74 0.0004

Fig. 3. The box plots of two-dimensional fuzzy entropy (FuzzEn2D). The upper whisker
represents the maximum value; the upper line of the box represents Q3 (upper quar-
tile); the center line inside the box represents the median; the lower line of the box
represents Q1 (lower quartile); and the lower whisker represents the minimum value.
Lower case letters indicate differences between groups (non-pregnant; pregnant) for
p < 0.05. Asterisks indicate differences between color channels (Grayscale, Red, Green, Blue)
(∗ ∗ p < 0.01; ∗ ∗ ∗p < 0.0001)

Red channel than in the other color channels, with no differences between the
Grayscale images, Green channel, and Blue channel (Table 2, Fig. 3).
In both groups, the DispEn2D was lower in the Red channel than in the other
color channels with no differences between others (Table 3, Fig. 4).
In both groups, the DistEn2D was lower in the Red channel than in the other
color channels. In the non-pregnant group, the DistEn2D was lower in the Blue
channel than in the Grayscale images and the Green channel, with no differences
between the last two. In the pregnant group, by contrast, the DistEn2D was higher
in the Green channel than in the Grayscale images and the Blue channel, with
no differences between the last two (Table 4, Fig. 5).

Table 3. The results of two-dimensional dispersion entropy (DispEn2D) for different


color channels: mean, median, standard deviation (std) and the significance of p value
between groups (non-pregnant; pregnant)

Color channels Non-pregnant Pregnant p


mean median std mean median std
Grayscale 7.81 7.78 0.14 7.89 7.88 0.18 0.0009
Red 7.39 7.40 0.32 7.66 7.76 0.46 <0.0001
Green 7.83 7.80 0.14 7.92 7.91 0.17 0.0007
Blue 7.83 7.80 0.14 7.89 7.89 0.18 0.0071

Fig. 4. The box plots of two-dimensional dispersion entropy (DispEn2D). The upper
whisker represents the maximum value; the upper line of the box represents Q3 (upper
quartile); the center line inside the box represents the median; the lower line of the box
represents Q1 (lower quartile); and the lower whisker represents the minimum value.
Lower case letters indicate differences between groups (non-pregnant; pregnant) for
p < 0.05. Asterisks indicate differences between color channels (Grayscale, Red, Green, Blue)
(∗ ∗ p < 0.01; ∗ ∗ ∗p < 0.0001)

Table 4. The results of two-dimensional distribution entropy (DistEn2D) for different


color channels: mean, median, standard deviation (std) and the significance of p value
between groups (non-pregnant; pregnant)

Color channels Non-pregnant Pregnant p


mean median std mean median std
Grayscale 0.93 0.93 0.04 0.91 0.92 0.05 0.1262
Red 0.64 0.65 0.08 0.84 0.90 0.14 <0.0001
Green 0.93 0.94 0.04 0.95 0.96 0.03 <0.0001
Blue 0.83 0.83 0.04 0.91 0.93 0.06 <0.0001

Fig. 5. The box plots of two-dimensional distribution entropy (DistEn2D). The upper
whisker represents the maximum value; the upper line of the box represents Q3 (upper
quartile); the center line inside the box represents the median; the lower line of the box
represents Q1 (lower quartile); and the lower whisker represents the minimum value.
Lower case letters indicate differences between groups (non-pregnant; pregnant) for
p < 0.05. Asterisks indicate differences between color channels (Grayscale, Red, Green, Blue)
(∗ ∗ ∗p < 0.0001)

4 Discussion
In recent research, IRT imaging of the mares' flank area of the abdomen was
successfully used for pregnancy evaluation using conventional [3,20] and
advanced [7] approaches. In the most recent research, two entropy features
extracted using the Gray Level Co-occurrence Matrix (GLCM) approach, summation
entropy (SumEntrp) and Entropy, increased with the progression of pregnancy
in the Red and I channels of IRT images [7]. Therefore, in the current study,
the investigation of mares' IRT images with four detailed entropy-based
measures is fully justified. One may observe that in the Red and
Blue channels, the studied groups differed regardless of the entropy feature. In
both of these channels, the SampEn2D and FuzzEn2D decreased with pregnancy,
whereas the DispEn2D and DistEn2D increased with pregnancy. The current
SampEn2D and FuzzEn2D results are convergent with the recent findings,
according to which the entropy features increased with pregnancy progression
[7]. It has been suggested that changes in physiological condition, such as
exercise [8,19] or pregnancy [7], cause a rise in the degree of thermal energy
dissipation and thus in the entropy of the thermal image texture. In the case
of pregnancy, the thermal energy emission increases with the increase in
metabolic energy produced by the growing fetus and the enlarging uterus
[3,4,11]. As the temperatures on IRT images are coded with respective image
colors, the Red component reflects high temperatures, whereas the Blue one
reflects low temperatures [21]. Therefore, the largest differences observed
currently in the Red and Blue channels are consistent with conventional thermal
results indicating that the minimal and maximal temperatures differ
significantly between pregnancy states [3,20].
One can also note that both in this and previous [7] research, the share of
differences in the studied features in the Green channel in the thermal pattern
of the mare's flank area was the lowest. Considering that each IRT image is com-
posed of the total effect of three separately considered channels, the Red, Green,
and Blue one [27], the final findings in the Grayscale image can be indicative of
the whole thermal image. It is worth noting that in three out of four explored
entropy-based methods, the SampEn2D, FuzzEn2D, and DispEn2D, the values
of the features were higher in the pregnant than in the non-pregnant group. One may sug-
gest that the IRT images obtained during pregnancy showed a more irregular
and complex texture of the thermal body surface. This could be considered as
the most important finding of the current preliminary research, as the main
result is convergent with the current state of knowledge [3,7,20]. Our primary
entropy-based results seem to be promising in distinguishing the pregnant and
non-pregnant state of the mare based on their IRT images of the flank area.
However, further research is required to apply these entropy-based measures not
only to detect pregnancy but also to monitor the progression of pregnancy during
its duration.

5 Conclusion
Thermal images obtained from the non-pregnant and pregnant mares can be
successfully investigated by four entropy-based measures after conversion to
grayscale image and three basic color channels. The signs of higher irregu-
larity and complexity of IRT image texture were evidenced for the composite
Grayscale image using three out of four explored entropy-based methods. How-
ever, the Red and Blue channel-dependent pattern of the IRT image differed
between the entropy features. Although this pattern could initially be assigned
as SampEn2D/FuzzEn2D-dependent or DispEn2D/DistEn2D-dependent, the role and
utility of the individual entropy features and color channels in distinguishing
or monitoring pregnancy or other pathologies require further research.

Acknowledgement. The study was performed as part of the project WI/WM-IIB/2/2021 and was partially financed with funds from the Polish Ministry of Science and Higher Education.

References
1. Azami, H., Escudero, J., Humeau-Heurtier, A.: Bidimensional distribution entropy
to analyze the irregularity of small-sized textures. IEEE Signal Process. Lett. 24(9),
1338–1342 (2017)
2. Azami, H., da Silva, L.E.V., Omoto, A.C.M., Humeau-Heurtier, A.: Two-
dimensional dispersion entropy: an information-theoretic method for irregularity
analysis of images. Signal Process.: Image Commun. 75, 178–187 (2019)
3. Bowers, S., Gandy, S., Anderson, B., Ryan, P., Willard, S.: Assessment of preg-
nancy in the late-gestation mare using digital infrared thermography. Theriogenol-
ogy 72(3), 372–377 (2009)
4. Bucca, S., Fogarty, U., Collins, A., Small, V.: Assessment of feto-placental well-
being in the mare from mid-gestation to term: transrectal and transabdominal
ultrasonographic features. Theriogenology 64(3), 542–557 (2005)
5. Chen, W., Wang, Z., Xie, H., Yu, W.: Characterization of surface EMG signal
based on fuzzy entropy. IEEE Trans. Neural Syst. Rehabil. Eng. 15(2), 266–272
(2007)
6. Dascanio, J., McCue, P.: Equine Reproductive Procedures. Wiley, Hoboken (2021)
7. Domino, M., et al.: Advances in thermal image analysis for the detection of preg-
nancy in horses using infrared thermography. Sensors 22(1), 191 (2022)
8. Domino, M., et al.: The effect of rider: horse bodyweight ratio on the superficial
body temperature of horse’s thoracolumbar region evaluated by advanced thermal
image processing. Animals 12(2), 195 (2022)
9. Domino, M., Romaszewski, M., Jasiński, T., Maśko, M.: Comparison of the surface
thermal patterns of horses and donkeys in infrared thermography images. Animals
10(12), 2201 (2020)
10. Durrant, B.S., Ravida, N., Spady, T., Cheng, A.: New technologies for the study
of carnivore reproduction. Theriogenology 66(6–7), 1729–1736 (2006)
11. Fowden, A.L., Giussani, D., Forhead, A.: Physiological development of the equine
fetus during late gestation. Equine Vet. J. 52(2), 165–173 (2020)
12. Hilal, M., Berthin, C., Martin, L., Azami, H., Humeau-Heurtier, A.: Bidimensional
multiscale fuzzy entropy and its application to pseudoxanthoma elasticum. IEEE
Trans. Biomed. Eng. 67(7), 2015–2022 (2019)
13. Humeau-Heurtier, A.: Texture feature extraction methods: a survey. IEEE Access
7, 8975–9000 (2019)
14. Jones, M., et al.: Assessing pregnancy status using digital infrared thermal imaging
in Holstein heifers. J. Dairy Sci. 88, 40–41 (2005)
15. Kastelic, J., Cook, R., Coulter, G., Wallins, G., Entz, T.: Environmental factors
affecting measurement of bovine scrotal surface temperature with infrared ther-
mography. Anim. Reprod. Sci. 41(3–4), 153–159 (1996)
16. Kirkpatrick, J.F., Lasley, B.L., Shideler, S.E., Roser, J.F., Turner, J.W., Jr.: Non-
instrumented immunoassay field tests for pregnancy detection in free-roaming feral
horses. J. Wildl. Manag. 57, 168–173 (1993)
17. Lahiri, B., Bagavathiappan, S., Jayakumar, T., Philip, J.: Medical applications of
infrared thermography: a review. Infrared Phys. Technol. 55(4), 221–235 (2012)
18. Li, P., Liu, C., Li, K., Zheng, D., Liu, C., Hou, Y.: Assessing the complexity
of short-term heartbeat interval series by distribution entropy. Med. Biol. Eng.
Comput. 53(1), 77–87 (2014). https://doi.org/10.1007/s11517-014-1216-0
19. Masko, M., Borowska, M., Domino, M., Jasinski, T., Zdrojkowski, L., Gajewski,
Z.: A novel approach to thermographic images analysis of equine thoracolumbar
region: the effect of effort and rider’s body weight on structural image complexity.
BMC Vet. Res. 17(1), 1–12 (2021)
20. Maśko, M., Witkowska-Pilaszewicz, O., Jasiński, T., Domino, M.: Thermal fea-
tures, ambient temperature and hair coat lengths: limitations of infrared imaging
in pregnant primitive breed mares within a year. Reprod. Domest. Anim. 56(10),
1315–1328 (2021)
21. Mccafferty, D.J.: The value of infrared thermography for research on mammals:
previous applications and future directions. Mamm. Rev. 37(3), 207–223 (2007)
22. McCue, P.M.: Reproductive evaluation of the mare. Equine Reproductive Proce-
dures, p. 1 (2014)
23. Richman, J.S., Moorman, J.R.: Physiological time-series analysis using approxi-
mate entropy and sample entropy. Am. J. Physiol.-Heart Circ. Physiol. 278(6),
H2039–H2049 (2000)
24. Rostaghi, M., Azami, H.: Dispersion entropy: a measure for time-series analysis.
IEEE Signal Process. Lett. 23(5), 610–614 (2016)
25. da Silva, L.E.V., Senra Filho, A.C.D.S., Fazan, V.P.S., Felipe, J.C., Murta, L.O.:
Two-dimensional sample entropy analysis of rat sural nerve aging. In: 2014 36th
Annual International Conference of the IEEE Engineering in Medicine and Biology
Society, pp. 3345–3348. IEEE (2014)
26. Soroko, M., Howell, K., Dudek, K.: The effect of ambient temperature on infrared
thermographic images of joints in the distal forelimbs of healthy racehorses. J.
Therm. Biol. 66, 63–67 (2017)
27. Szczypiński, P., Klepaczko, A., Pazurek, M., Daniel, P.: Texture and color based
image segmentation and pathology detection in capsule endoscopy videos. Comput.
Methods Programs Biomed. 113(1), 396–411 (2014)
28. Szczypiński, P.M., Klepaczko, A.: MaZda – a framework for biomedical image tex-
ture analysis and data exploration. In: Biomedical Texture Analysis, pp. 315–347.
Elsevier (2017)
29. Tamura, H., Mori, S., Yamawaki, T.: Textural features corresponding to visual
perception. IEEE Trans. Systems Man Cybern. 8(6), 460–473 (1978)
30. Zarychta, P.: Application of fuzzy image concept to medical images matching. In:
Pietka, E., Badura, P., Kawa, J., Wieclawek, W. (eds.) ITIB 2018. AISC, vol. 762,
pp. 27–38. Springer, Cham (2019). https://doi.org/10.1007/978-3-319-91211-0_3
Eye-Tracking as a Component
of Multimodal Emotion Recognition
Systems

Weronika Celniak(B) and Piotr Augustyniak

Department of Biocybernetics and Biomedical Engineering, AGH University of Science and Technology, Mickiewicza Ave. bldg. C-3, 30-059 Kraków, Poland
{wcelniak,august}@agh.edu.pl

Abstract. Emotions play an important role in everyday human interaction. The accurate detection of these non-verbal signals by intelligent systems would allow for a more intuitive way of communicating in human-computer interfaces and could enable more effective assistance to people suffering from mental disorders. Therefore, this is a constantly popular topic among researchers. Many detection algorithms based on a single signal such as speech, facial expression, EEG, or ECG have been described in the literature, but thanks to both recent technological advances in machine learning and increased available computing power, researchers' attention is increasingly focused on solutions using more than one modality. The aim of this paper is to review the multimodal emotion recognition systems introduced to date that use eye-tracking as one of their modalities, along with a brief overview of currently available eye-tracking devices and the psychophysiological basis of emotion-related eye features.

Keywords: Affective computing · Emotions detection · Eye-tracking

1 Introduction
Emotion recognition is a field of research that has been constantly gaining pop-
ularity. Algorithms enabling translation of this non-verbal communication cues
can not only provide another step towards enriching computer-user interaction,
but additionally may be a helpful tool in the treatment of anxiety, depression
or autism spectrum disorders. The beginning of research on automatic human
emotion detection goes back decades, but the constant development of new sig-
nal processing, machine learning and computer vision methods combined with
an increase in the accuracy and availability of measuring equipment prompts
researchers to further research in this area.
Computer recognition of emotion (also known as affective computing) is
mainly targeted at adapting man-machine communication to the emotional state of
the human user [25]. As it was proven in psychology (i.e. [39]), emotions favor
the fast (i.e. intuitive) part of human cognition and reduce slow analytic pro-
cesses. Consequently, as the intensity of the user's emotions increases, his or her perception (visual, auditory, and other) becomes faster, less detailed, and more prone to stereotypes. Certainly, maintaining unambiguous communication under the variable emotional status of the human requires the machine to recognize the emotions and to adapt cues accordingly.
Computer recognition of emotion is also employed as part of automatic
behavior tracking in real and virtual life (e.g. security surveillance, workplace
monitoring, communication with impaired people or computer games). Wide
areas of possible applications justify the attention paid recently by research
teams worldwide to develop an effective yet unobtrusive way of detection.
Achieving a method for the automatic assessment of emotions requires close
cooperation between psychologists and engineers as the selection of appropriate
parameters and further analyses require specialized knowledge. Among the most
popular emotion classification models are Circumplex [30], Ekman’s [12] and
Plutchik's models. The visual representation of Plutchik's emotion model can be presented as a wheel. It consists of eight basic bipolar emotions that result
from adaptation mechanisms. These are respectively anticipation, joy, trust, fear,
surprise, sadness, disgust and anger. In this theory all other emotions are either
a combination of basic emotions or a version of a basic emotion of greater or
lesser intensity [29]. Paul Ekman in his theory introduced six basic emotions: joy,
sadness, anger, fear, disgust and surprise, which in his opinion are experienced
and recognized universally regardless of culture [12]. The circumplex emotions
model differs from the other mentioned models because it is not discrete. Instead, every emotion is described by two parameters: valence and arousal. Every emotion
can be shown on a two-dimensional plane created by the axes corresponding to
these parameters [30].
Taking into account the models introduced by psychologists, researchers aim
to create a method that allows the computer system to translate measurement
signals into emotions. Depending on the usage scenario, solutions based on facial
expression recognition [20,23,38], speech and voice analysis [10,16,26] or physi-
ological signals have been presented. Observations of emotional state changes in
physiological signals are possible due to changes in the activity of the autonomic ner-
vous system. It consists of two bipolar components – sympathetic and parasym-
pathetic nervous systems. The former is responsible for the preparation of the
organism in the event of a potential threat. It causes symptoms such as increased
heart rate, pupil size, blood pressure and respiration rate, and excessive sweating. In contrast, the latter aims to maintain the homeostasis of the organism. Its activity causes, among others, a decrease in heart rate, pupil size, and blood pressure, as well as muscle relaxation [17].
Common knowledge about the effects of these systems and brainwave activ-
ity was the basis for the development of emotion recognition methods built on
ECG (electrocardiography), EEG (electroencephalography), EOG (electroocu-
lography), EMG (electromyography), GSR (Galvanic Skin Response), Respi-
ration Rate or HRV (Heart Rate Variability) signal analysis. The solutions
described in the literature use a single one of the above-mentioned signals, but also combinations of several of them. Nevertheless, the results obtained with these
methods still leave room for improvement. Eye-tracking is a relatively new and
promising approach to emotional evaluation. The use of eye-movement features
in a multimodal system could enrich the emotion assessment algorithms with
additional information. Moreover, eye-tracking is related to the principal human
sense of sight that rapidly responds to emotion changes in search for further cues.
Consequently, pupils become larger and eye motions show more variance with the growth of emotion intensity. The scanpath also has a technical advantage – it can be recorded remotely and, in particular circumstances, without the knowledge of the subject. Owing to similar advantages, eye-tracking has recently been proven effective for the detection of deception [9] in various investigation scenarios.
In this paper, we provide an overview of the solutions presented so far using
eye-tracking as a method in multimodal emotion recognition. It is organized as follows: first, the eye features of the greatest importance are presented; then, commonly used eye-tracking devices are characterized; the next section contains a description of the architectures introduced to date. In the final part, a summary and comparison are given along with the advantages and disadvantages of the presented approaches.

2 Emotion Assessment Based on Eye Features

The aim of automatic emotion recognition systems is to replicate human perception as closely as possible. The basis for creating such solutions should
therefore be psychological and psychophysiological knowledge based on reliable
research. To this day, there is no universal way to choose the most relevant eye feature for classification in all scenarios. Therefore, it is recommended to
conduct tests evaluating the effectiveness of the designed model with diverse
available emotion-related eye features described in the following part of this
section.
The most popular parameter, which can be used as an indicator of cur-
rent emotional state, is the change in pupil diameter. Strong emotional stimuli
(negative or positive) result in a sympathetic nervous system response. As a consequence, the iris dilator muscle contracts and the pupil expands [22]. Pupil diameter is easy to observe and measure; however, it cannot be used as the only parameter in an emotion recognition task, because it is also connected with lighting changes [35] and cognitive processing [2,11,13,40].
An alternative parameter for emotion recognition can be blink rate. The
primary goal of blinking is restoring the neutral protective tear layer on the
cornea, but it has been also shown that blinking frequency changes in different
attention states [5]. Recent studies have described the dependency of blinking
patterns with experienced emotion [19].
Another variable that could be used in an emotion recognition task is the fixation characteristic. It consists of fixation count, fixation duration, and fixation frequency. Studies have shown that there is a notable difference in fixation duration in response to negative versus positive stimuli [1,32]. In [28], the authors provide a list of scanpath parameters sorted according to their correlation with the intensity of acoustic stimuli accompanying a simple visual task. Besides those mentioned above, other eye features can be easily measured and used for emotion recognition purposes, such as pupil position, pupil movement, gaze distance, or eye motion speed. As such, more studies need to be conducted to assess the reliability of these features as emotion indicators.
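As a rough illustration of how the features above can be derived from raw gaze data, the sketch below computes blink rate, pupil statistics, and simple I-VT-style fixation measures. It is a minimal example under assumed conditions (a 60 Hz sampling rate, a validity flag marking blink samples, and a hypothetical velocity threshold); real eye-tracker SDKs provide their own, more robust event detection.

```python
import numpy as np

FS = 60.0          # assumed sampling rate [Hz]
VEL_THRESH = 30.0  # assumed I-VT velocity threshold [deg/s]

def eye_features(x_deg, y_deg, pupil_mm, valid):
    """Derive simple emotion-related eye features from one recording.

    x_deg, y_deg: gaze angles [deg]; pupil_mm: pupil diameter [mm];
    valid: boolean array, False during blinks or track loss.
    """
    duration_s = len(valid) / FS
    # Blink rate: number of valid -> invalid transitions per minute
    blink_onsets = np.flatnonzero(np.diff(valid.astype(int)) == -1)
    blink_rate = len(blink_onsets) / (duration_s / 60.0)
    # Pupil statistics over valid samples only
    pupil_mean = pupil_mm[valid].mean()
    pupil_std = pupil_mm[valid].std()
    # I-VT: consecutive valid samples below the velocity threshold = fixation
    vel = np.hypot(np.diff(x_deg), np.diff(y_deg)) * FS
    fix = (vel < VEL_THRESH) & valid[1:]
    fix_onsets = np.flatnonzero(np.diff(fix.astype(int)) == 1) + 1
    fixation_count = len(fix_onsets) + int(fix[0])
    mean_fix_dur = fix.sum() / FS / max(fixation_count, 1)
    return {"blink_rate": blink_rate, "pupil_mean": pupil_mean,
            "pupil_std": pupil_std, "fixation_count": fixation_count,
            "mean_fixation_duration": mean_fix_dur}

# Toy usage with random signals (illustration only)
rng = np.random.default_rng(1)
n = 600  # 10 s at 60 Hz
feats = eye_features(rng.normal(0, 2, n), rng.normal(0, 2, n),
                     rng.normal(4, 0.3, n), rng.random(n) > 0.05)
print(feats)
```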

3 Eye-Tracking Devices Overview

The beginnings of research on eye movement date back to the beginning of the
20th century. The author of the first eye-tracking device was Edmund Huey,
and the solution presented by him allowed for tracking eyes in a horizontal
line. It consisted of a lens with a pointer attached to it; the movement of the pointer reflected the movement of the eye while reading. Advancements in technology allowed for the creation of less invasive, more accurate, and more convenient methods. Nowadays, eye-trackers use image processing algorithms to calculate gaze direction from the pupil center corneal reflection caused by a near-infrared light source. Webcam image-based algorithms are also gaining popularity, but follow-up research must be conducted to assess their accuracy compared to near-infrared-based eye-tracking.
Both proprietary solutions and commercially available eye-tracking devices
are used in the research. They can be divided into two groups according to the
way they work: static and wearable (i.e. head-mounted). Eye-tracking is also used
in AR/VR headsets. Both options have their advantages and disadvantages. The
former allows for more natural conditions, while the latter enables measurement
while observing both the real world and the virtual world presented on the screen.

3.1 Desktop Eye-Trackers

Desktop eye-trackers usually have the form of a device mounted next to or below the computer monitor. In order for these devices to work properly, the
user’s face must be placed in front of the monitor with their eyes at the same
height (Fig. 1). Among the most popular screen-based eye-trackers are Tobii and
Gazepoint devices. Both of them are broadly used in research in a wide variety of cases. These include, in addition to emotion recognition, Tourette syndrome
research [4], autism spectrum disorder diagnosis [7], glaucoma [6] or strabismus
diagnosis [31] and many others.
Desktop eye trackers are static and thus applicable in a predefined workspace. Usually, a computer display is used for the presentation of visual cues in
so-called ‘visual tasks’. Human performance in standardized visual tasks is eval-
uated with a set of eye trajectory (i.e. scanpath) parameters. Some researchers
use the visual cues to elicit emotions, while others prefer the auditory stimu-
lation to maintain resulting scanpath parameters independent of the stimulus.
The advantage of static eye-trackers is their output in absolute space coordinates, which may easily be laid over the layer of the displayed visual stimulus. In some cases, however, an additional algorithm is needed to trace the face and maintain the focus on the eye. Alternatively, the head may be immobilized in a chin support, which is not comfortable for the observer, particularly in long-lasting experiments.

Fig. 1. Desktop eye-tracker coordinate systems

3.2 Mobile Eye-Trackers


Mobile eye-trackers are essentially glasses that have built-in cameras located
close to the eyes and an additional camera for world-view capture. This
design allows the user to move more freely, but is more susceptible to interference
from, for example, shifting the position of the cameras in relation to the eyes.
The most widely available mobile eye trackers used in research are products from
Pupil Labs, Eyelink and Tobii. Studies using these devices include among others
the reading strategies research [24], toddler attention [21] or flight simulation
research [15].
Wearable (or head-mounted) eye-trackers are mobile and follow the subject
wherever he or she goes. This advantage, supported by a lightweight hardware hub and a wireless data link, enables scanpath-based studies in the field in scenarios like sport, everyday living, car driving, etc. Some researchers replace natural scenes with computer-animated displays, either to arrange a safe experimental setup (such as flight or drive simulators) or to provide visual feedback to the observer.
Wearable eye-trackers provide the scanpath in coordinates relative to the head.
Consequently, relating the eye trajectory to the scene requires head positioning,
that can be done visually (i.e. with a head camera or environmental camera) or
with an inertial motion capture system.

3.3 Eye-Tracking in AR\VR Sets


With the constant expansion of the global virtual and augmented reality market, producers are constantly trying to surpass rivals' products. Eye-tracking techniques are considered one of the most beneficial features, as they allow users to achieve a more immersive experience. The principle of eye-tracking in AR/VR sets is similar to the classic approach, but with one major difference. As the display is head-mounted and located close to the eyes, the vergence distance may differ from the focal distance, whereas in real life they are the same [8] (Fig. 2). Because of that, there is a lack of depth information, and to carry out eye-tracking, additional calibration has to be conducted so that it is possible to build an accurate model of the gaze trajectory. In recent works, various parameters of the signal captured with commercially available virtual reality and mixed reality glasses have been evaluated; among the most popular devices are the Microsoft HoloLens 2 [3], HTC Vive Pro Eye [34], Fove-0, and the Varjo VR-1 [37]. Besides creating a more immersive experimental setup, the advantage of eye-tracking in AR/VR sets is the rigid correspondence of scene and eye coordinates; thus, the scanpath is natively represented in the geometry of the scene.

Fig. 2. VR/AR eye-tracking coordinate systems

4 Multimodal Emotion Recognition Systems Architecture


Emotion recognition is not a trivial task, and since the human organism is a combination of mutually interacting systems, detection based solely on one modality is a complicated matter. Thus, researchers more and more frequently lean towards
developing multimodal systems. In this section, we review solutions where eye-
tracking is one of the modalities. Predominantly, those approaches combine electroencephalography (EEG) with eye-tracking. To some extent, these techniques are compatible due to similar voltages, although the frequency of the oculomotor signal is higher. Some emotion-related parameters derived from eye-tracking, such as blinking, are also detectable in EEG, where they are known as 'oculomotor artifacts' and are subject to sophisticated removal algorithms like [33].
Signals in multimodal recognition systems can be fused at either the feature level or the decision level. Signals can also be fused by resampling them to the same sampling rate. In [41], the obtained feature vector is a combination of EEG signal features and pupil diameter features. A short-time Fourier transform with a non-overlapping 4-s Hanning window was used to extract these frequency-domain features. The eye-tracking signal was collected with SMI eye-tracking glasses. The classification was performed with a support vector machine model. At the decision level, the efficiency of two different approaches was evaluated – the sum strategy and the max strategy. The first method involves training two separate classifiers, each on a single-modality signal; as the final result, the output with the higher probability is chosen. The other sums up the probabilities of emotions from separate frequency bands and selects the highest probability. The best results obtained were 73.59% and 72.98%, respectively, for the feature fusion approach and the max strategy fusion.
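A minimal sketch of the two fusion levels described above may look as follows; it uses scikit-learn SVMs on random stand-in features, where the feature dimensions, trial count, and three-class labels are all hypothetical, and the classifiers are queried on their own training data purely to keep the example short.

```python
import numpy as np
from sklearn.svm import SVC

# Random stand-ins for per-trial EEG and eye-feature matrices (hypothetical)
rng = np.random.default_rng(0)
X_eeg = rng.normal(size=(120, 16))
X_eye = rng.normal(size=(120, 4))
y = rng.integers(0, 3, size=120)  # three emotion classes

# Feature-level fusion: concatenate the modality feature vectors
clf_fused = SVC(kernel="rbf", probability=True).fit(np.hstack([X_eeg, X_eye]), y)

# Decision-level fusion: one probabilistic classifier per modality
clf_eeg = SVC(kernel="rbf", probability=True).fit(X_eeg, y)
clf_eye = SVC(kernel="rbf", probability=True).fit(X_eye, y)
p_eeg, p_eye = clf_eeg.predict_proba(X_eeg), clf_eye.predict_proba(X_eye)

# Sum strategy: add the class probabilities and pick the largest total
pred_sum = (p_eeg + p_eye).argmax(axis=1)

# Max strategy: per trial, trust whichever classifier is more confident
stacked = np.stack([p_eeg, p_eye])            # (2, n_trials, n_classes)
winner = stacked.max(axis=2).argmax(axis=0)   # index of the surer modality
pred_max = stacked[winner, np.arange(len(y))].argmax(axis=1)
```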
Lu et al. [18], besides the max strategy and sum strategy, employed fuzzy integral fusion and compared all of the mentioned approaches to classification results based separately on EEG and eye features (gathered from SMI ETG eye-tracking glasses). They found that all of the fusion techniques achieved better accuracy than the single-modality technique. The most beneficial technique seemed to be fuzzy integral fusion, which outperformed the unimodal solution by nearly 10% and obtained a final result of 87.59% for the three basic emotional states investigated: positive, neutral, and negative.
Research presented by [36] showed that there is a significant difference in
performance between feature-level fusion and decision-level fusion. While the former did not allow for accuracy improvement in the presented examination settings, the latter improved arousal classification radically and did not cause an accuracy loss for valence classification. The modalities used were EEG and eye gaze data. Feature fusion was performed by feature vector concatenation; for decision-level fusion, an SVM classifier with an RBF kernel was trained for each modality signal separately, then specified confidence measures were calculated for both models and summed up. The final classification is done by selecting the class with the highest result.
Feature fusion efficiency was also investigated in [43]. The model proposed
for classification consisted of an FCNN with two hidden layers. The feature vector was built with parameters obtained from the electroencephalogram and external physiological signals such as eye features, galvanic skin response, skin temperature, and a respiratory belt. Accuracy for the classification of four emotions, namely satisfaction, depression, excitement, and wrath, reached 92.07%.
Paper [42] provides a description of a solution called EmotionMeter. It is
a framework for emotion recognition from a six-electrode EEG and an eye tracker. This solution differs significantly from those mentioned before, since it uses a bimodal deep auto-encoder (BDAE) for modality fusion. Its input was obtained from two restricted Boltzmann machines trained separately on the EEG and eye-tracking data (from a desktop Tobii eye tracker). The final accuracy of this method was 85.11%, which was a notable improvement compared with single-modality classification (67.82% and 70.33%).
Guo et al. [14], apart from BDAE, took advantage of feature fusion. They combined features extracted from eye images, eye movements, and EEG signals: from eye movements, pupil diameter, dispersion, fixation duration, saccades, and various eye event statistics (e.g., blink frequency); from eye images, temporal features obtained using a combination of CNN and LSTM networks; and from EEG, differential entropy (DE) features. SMI ETG glasses were used for eye image capturing. A support vector machine (SVM) with a linear kernel was used as the classification model. They found that the accuracy of the model trained on all of the features was 79.63%, but the combination of eye movements and eye images followed closely with a result of 71.99%.
In contrast to the combination of EEG and eye-tracking signals, Perdiz et al. [27] used EMG and EOG signals to classify four basic expressions: neutral, sad, happy, and angry, although more research has to be conducted to obtain more information about the accuracy of the proposed method.

5 Conclusion

Eye-tracking is a relatively easy measurement revealing the performance of sight – the principal sense in humans. It may be employed as a stationary, touchless, unobtrusive device with no hygienic issues; as a wearable device allowing for in-field experiments; or together with AR/VR goggles that display an immersive visual stimulus and collect the oculomotor response in observer experiences ranging far beyond what can be found in the real world. The eye-tracker may definitely be a valuable component of an emotion recognition system, providing a trace of visual interaction and emotion-related parameters of the scanpath.

Acknowledgement. This research was funded by AGH University of Science and Technology in 2022 as research project No. 16.16.120.773.

References
1. Alshehri, M., Alghowinem, S.: An exploratory study of detecting emotion states
using eye-tracking technology. In: 2013 Science and Information Conference, pp.
428–433 (2013)
2. Ariel, R., Castel, A.D.: Eyes wide open: enhanced pupil dilation when selectively
studying important information. Exp. Brain Res. 232(1), 337–344 (2013). https://
doi.org/10.1007/s00221-013-3744-5
3. Aziz, S.D., Komogortsev, O.V.: An assessment of the eye tracking signal quality
captured in the Hololens 2. arXiv preprint arXiv:2111.07209 (2021)
4. Beljaars, D.: Eye-tracking: retracing visual perception in the everyday environ-
ments of people with Tourette syndrome (2015)
5. Bentivoglio, A.R., Bressman, S.B., Cassetta, E., Carretta, D., Tonali, P., Albanese,
A.: Analysis of blink rate patterns in normal subjects. Mov. disord. 12(6), 1028–
1034 (1997)
6. Bhowmik, S., Arjunan, S.P., Sarossy, M., Radcliffe, P., Kumar, D.K.: Pupillometric
recordings to detect glaucoma. Physiol. Meas. 42(4), 045003 (2021)
7. Bristol, S., et al.: Visual biases and attentional inflexibilities differentiate those
at elevated likelihood of autism: an eye-tracking study. Am. J. Occup. Ther.
74(4 Supplement 1), 7411505221p1–7411505221p1 (2020)
8. Clay, V., König, P., Koenig, S.: Eye tracking in virtual reality. J. Eye Mov. Res.
12(1), 3 (2019)
9. Cook, A.E., et al.: Lyin’ eyes: ocular-motor measures of reading reveal deception.
J. Exp. Psychol. Appl. 18(3), 301–313 (2012). https://doi.org/10.1037/a0028307
10. Dimitrova-Grekow, T., Klis, A., Igras-Cybulska, M.: Speech emotion recognition
based on voice fundamental frequency. Arch. Acoust. 44, 277–286 (2019)
11. Eckstein, M.K., Guerra-Carrillo, B., Singley, A.T.M., Bunge, S.A.: Beyond eye
gaze: what else can eyetracking reveal about cognition and cognitive development?
Dev. Cogn. Neurosci. 25, 69–91 (2017)
12. Ekman, P.: Basic emotions. Handb. Cogn. Emot. 98(45–60), 16 (1999)
13. de Gee, J.W., Knapen, T., Donner, T.H.: Decision-related pupil dilation reflects
upcoming choice and individual bias. Proc. Natl. Acad. Sci. 111(5), E618–E625
(2014)
14. Guo, J.J., Zhou, R., Zhao, L.M., Lu, B.L.: Multimodal emotion recognition from
eye image, eye movement and EEG using deep neural networks. In: 2019 41st
Annual International Conference of the IEEE Engineering in Medicine and Biology
Society (EMBC), pp. 3071–3074. IEEE (2019)
15. Korek, W.T., Mendez, A., Asad, H.U., Li, W.-C., Lone, M.: Understanding human
behaviour in flight operation using eye-tracking technology. In: Harris, D., Li, W.-
C. (eds.) HCII 2020, Part II. LNCS (LNAI), vol. 12187, pp. 304–320. Springer,
Cham (2020). https://doi.org/10.1007/978-3-030-49183-3_24
16. Kwon, S., et al.: MLT-DNet: speech emotion recognition using 1D dilated CNN
based on multi-learning trick approach. Expert Syst. Appl. 167, 114177 (2021)
17. Levenson, R.: The autonomic nervous system and emotion. Emot. Rev. 6, 100–112
(2014)
18. Lu, Y., Zheng, W.L., Li, B., Lu, B.L.: Combining eye movements and EEG to
enhance emotion recognition. In: Twenty-Fourth International Joint Conference
on Artificial Intelligence (2015)
19. Maffei, A., Angrilli, A.: Spontaneous blink rate as an index of attention and emo-
tion during film clips viewing. Physiol. Behav. 204, 256–263 (2019)
20. Maglogiannis, I., Vouyioukas, D., Aggelopoulos, C.: Face detection and recognition
of natural human emotion using Markov random fields. Pers. Ubiquit. Comput.
13(1), 95–101 (2009)
21. Martin, K.B.: Differences aren’t deficiencies: eye tracking reveals the strengths of
individuals with autism. Retr. Sept. 12, 2021 (2018)
22. Mathôt, S.: Pupillometry: psychology, physiology, and function. J. Cogn. 1(1), 16
(2018)
23. Mehendale, N.: Facial emotion recognition using convolutional neural networks
(FERC). SN Appl. Sci. 2(3), 1–8 (2020)
24. Miranda, A.M., Nunes-Pereira, E.J., Baskaran, K., Macedo, A.F.: Eye movements,
convergence distance and pupil-size when reading from smartphone, computer,
print and tablet. Scand. J. Optom. Vis. Sci. 11(1), 1–5 (2018)
25. Nalepa, G.J., Palma, J., Herrero, M.T.: Affective computing in ambient intelligence
systems. Future Gener. Comput. Syst. 92, 454–457 (2019). https://doi.org/10.
1016/j.future.2018.11.016
26. Ntalampiras, S.: Speech emotion recognition via learning analogies. Pattern Recog-
nit. Lett. 144, 21–26 (2021)
27. Perdiz, J., Pires, G., Nunes, U.J.: Emotional state detection based on EMG and
EOG biosignals: a short survey. In: 2017 IEEE 5th Portuguese Meeting on Bio-
engineering (ENBENG), pp. 1–4. IEEE (2017)
28. Przybylo, J., Kańtoch, E., Augustyniak, P.: Eyetracking-based assessment of affect-
related decay of human performance in visual tasks. Future Gener. Comput. Syst.
92, 504–515 (2019). https://doi.org/10.1016/j.future.2018.02.012, https://www.
sciencedirect.com/science/article/pii/S0167739X17312001
29. Robert, P.: The nature of emotions. Am. Sci. 89(4), 344–350 (2001)
30. Russell, J.A.: A circumplex model of affect. J. Personal. Soc. Psychol. 39(6), 1161
(1980)
31. Saisara, U., Boonbrahm, P., Chaiwiriya, A.: Strabismus screening by eye tracker
and games. In: 2017 14th International Joint Conference on Computer Science and
Software Engineering (JCSSE), pp. 1–5. IEEE (2017)
32. Scott, G.G., O’Donnell, P.J., Sereno, S.C.: Emotion words affect eye fixations dur-
ing reading. J. Exp. Psychol.: Learn. Mem. Cogn. 38(3), 783 (2012)
33. Shahbakhti, M., et al.: Simultaneous eye blink characterization and elimination
from low-channel prefrontal EEG signals enhances driver drowsiness detection.
IEEE J. Biomed. Health Inform. 26, 1001–1012 (2021). https://doi.org/10.1109/
JBHI.2021.3096984
34. Sipatchin, A., Wahl, S., Rifai, K.: Eye-tracking for low vision with virtual reality
(VR): testing status quo usability of the HTC Vive Pro Eye. bioRxiv (2020)
35. Sirois, S., Brisson, J.: Pupillometry. Wiley Interdiscip. Rev.: Cogn. Sci. 5(6), 679–
692 (2014)
36. Soleymani, M., Pantic, M., Pun, T.: Multimodal emotion recognition in response
to videos. IEEE Trans. Affect. Comput. 3(2), 211–223 (2011)
37. Stein, N., et al.: A comparison of eye tracking latencies among several commercial
head-mounted displays. i-Perception 12(1), 2041669520983338 (2021)
38. Tie, Y., Guan, L.: A deformable 3-D facial expression model for dynamic human
emotional state recognition. IEEE Trans. Circuits Syst. Video Technol. 23(1), 142–
157 (2012)
39. Tversky, A., Kahneman, D.: Judgment under uncertainty: heuristics and biases.
Sci., New Ser. 185(4157), 1124–1131 (1974)
40. Zekveld, A.A., Kramer, S.E.: Cognitive processing load across a wide range of
listening conditions: insights from pupillometry. Psychophysiology 51(3), 277–284
(2014)
41. Zheng, W.L., Dong, B.N., Lu, B.L.: Multimodal emotion recognition using EEG
and eye tracking data. In: 2014 36th Annual International Conference of the IEEE
Engineering in Medicine and Biology Society, pp. 5040–5043. IEEE (2014)
42. Zheng, W.L., Liu, W., Lu, Y., Lu, B.L., Cichocki, A.: Emotionmeter: a multimodal
framework for recognizing human emotions. IEEE Trans. Cybern. 49(3), 1110–1122
(2018)
43. Zhou, J., Wei, X., Cheng, C., Yang, Q., Li, Q.: Multimodal emotion recognition
method based on convolutional auto-encoder. Int. J. Comput. Intell. Syst. 12(1),
351–358 (2019)
Sleep Quality in Population Studies –
Relationship of BMI and Sleep Quality
in Men

Agnieszka Witek1, Marcin Bugdol2(B), and Anna Lipowicz1

1 Department of Anthropology, Institute of Environmental Biology, Wroclaw University of Environmental and Life Sciences, ul. Kożuchowska 5, 51-631 Wroclaw, Poland
{agnieszka.witek,anna.lipowicz}@upwr.edu.pl
2 Faculty of Biomedical Engineering, Silesian University of Technology, ul. Roosevelta 40, 41-800 Zabrze, Poland
marcin.bugdol@polsl.pl

Abstract. Sleep is a physiological state whose quality and quantity should be kept adequate to maintain health in humans. Length of sleep is one of the indicators of sleep quality, along with sleep latency, characteristics of awakenings during the night, and sleep architecture disturbance. Such parameters can be analyzed with subjective and objective methods. The most comprehensive objective method, known as the gold standard of sleep assessment, is polysomnography. It consists primarily of electroencephalography, electrooculography, and electromyography, which enable the evaluation of the sleep stages occurring during the study. Apart from assessing sleep architecture, in polysomnography the sensors usually record the position of the patient's body at night, as well as limb and breathing movements. There are numerous risk factors for sleep disorders, such as smoking, alcohol abuse, or obesity. The aim of the study was to assess the sleep parameters of men in relation to their BMI value. Sleep parameters from polysomnographic records of 94 men were tested against the value of their BMI with the Pearson correlation coefficient test. The higher the BMI of the men, the shorter their total sleep time (p = 0.017), the lower their sleep efficiency (p = 0.013), and the more episodes of sleep apnea (p = 0.003) and oxygen desaturation (p = 0.002) per hour of NREM sleep they had. The mean BMI of the sample indicates the men were obese, hence the accumulation of fat tissue around the throat. Such sleep disturbances often occur due to fat accumulation around the throat in people with high BMI: it narrows the throat, making it easier to collapse during sleep and preventing breathing. Such episodes of apnea cause oxygen desaturation and awakening.
Keywords: Polysomnography · Body Mass Index · Sleep parameters

1 Introduction
Sleep is a physiological state present in humans as well as in most other species.
To maintain optimal health, both the quality and quantity of sleep should be adequate [2]. The
American Academy of Sleep Medicine and Sleep Research Society established that healthy sleep in adults should last 7 to 9 h a day [16]. Sleep time can be reduced due to lifestyle changes, stress, and mental problems, or by one's own choice [10]. Other causes of sleep deprivation may be sleep disorders like insomnia, sleep apnea, or abnormal sleep patterns [13]. Short-term consequences of this are, among others, impaired cognitive function and emotion control, memory and concentration loss, a higher risk of accidents, fatigue, and a modified inflammatory and hormonal response [4]. Sleep deprivation can lead to death when maintained
for a long time [18]. Length of sleep is one of the indicators of sleep quality.
According to World Health Organization (WHO), the indicators of sleep quality
are: total sleep time, the time spent waiting for falling asleep after turning the
lights out (known as sleep latency), number and duration of awakenings during
sleep, abnormal rhythm or duration of consecutive sleep stages and frequency of
sleep disruptions [18].
Sleep comprises four stages: one rapid eye movement (REM) stage and three non-rapid eye movement (NREM) stages [6]. The REM sleep stage is characterized by eye movement under closed eyelids, brain wave activity similar to that of wakefulness, paralysis of voluntary muscles, and the occurrence of dreams. In NREM sleep, the brain generates slow waves – these stages are crucial for restoring the body, including the nervous system. Consecutive phases NREM 1, NREM 2, NREM 3, and REM
make up one sleep cycle of a healthy adult. It usually lasts about 90 min and is
repeated several times at night [11].
In sleep studies, subjective and objective methods of assessing sleep are used.
Depending on the possibilities, the purpose of the study, and participants’ char-
acteristics, researchers may choose questionnaires of subjective assessment of
sleep quality and quantity, e.g., Pittsburgh Sleep Quality Index, Epworth Sleepi-
ness Scale, or sleep diaries [7]. They are relatively cheap, easy to use, and allow research to continue in specific circumstances, such as a global pandemic, via the Internet. However, the reliability of their results depends on respondents' memory and their ability to correctly assess the events that happened during sleep. Objective sleep assessment methods include actigraphy, respiratory polygraphy, and polysomnography [15]. Those methods use various sensors placed on the
patient’s body to register the physiological parameters of the tested person. The
most comprehensive method, known as the gold standard of sleep assessment, is
polysomnography. It is conducted in laboratories, often located in hospitals (e.g.,
neurological or pulmonological wards). According to international standards, a
polysomnography study should consist of at least four neurophysiological chan-
nels like electroencephalography, electrooculography, or electromyography [1].
Electroencephalography (EEG) is used to assess the bioelectric activity of the
brain by placing electrodes on the scalp of the patient, usually according to the international 10–20 electrode placement system. Depending on the frequency of waves
generated by the neurons, EEG enables defining if the patient is awake or asleep
and in case of sleep, makes it possible to recognize the stage of sleep the subject
is in. Using EEG recordings during the night, sleep architecture (sequence of the
stages of sleep), number, and timing of awakenings during sleep are observed [1].
Electrooculography (EOG) is used to register the horizontal and vertical move-
ment of the eyes by placing the electrodes on the outer corners of the eyes (one
above the right and one below the left). EOG results are complementary to EEG
in assessing the REM stage of sleep during polysomnography. Additionally, an
electromyography (EMG) sensor on the chin makes it possible to define mus-
cle atony, characteristic of the REM sleep stage. Another EMG sensors can be
located on the anterior muscles of the tibias to record lower limb movements dur-
ing sleep. Moreover, in polysomnography study can be used: electrocardiography
(ECG – to monitor heart activity), airflow sensors (to measure the episodes of
apnea), pulse oximetry (to assess the blood oxygen level), chest and abdomen
movement sensors (to notice respiratory muscle function), microphone (to record
snoring) and body position sensors (to assess the position of the patient’s body
during the night) [1].
Access to polysomnography is becoming more common as sleep disorders are
becoming more frequent, especially with the growing number of people exposed
to sleep disturbance risk factors. Such risk factors are, among others, depression
and anxiety, exposure to chronic stress, abuse of alcohol, coffee, or cigarettes,
and obesity [9,12,17]. High Body Mass Index (BMI – a value derived from the
weight and height of the person) indicates obesity and may be correlated with poor sleep quality [3,5,14]. The aim of this study was to analyze the sleep parameters of adult men in relation to the value of their Body Mass Index.
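For reference, the index used throughout this study relates body mass to the square of body height: BMI = m / h², where m is the body mass in kilograms and h is the body height in meters, giving units of kg/m². For example, a man weighing 95 kg at a height of 1.75 m has a BMI of 95 / 1.75² ≈ 31.0 kg/m², which lies within the obesity range (BMI ≥ 30) and close to the sample median reported later in Table 1.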

2 Participants and Methods

The materials for this study were obtained from the archives of a hospital polysomnography laboratory in Opolskie Province, Poland. Sleep analyses were conducted
using Nox A1 PSG System (Nox Medical, Reykjavik, Iceland) equipment. The
data were polysomnographic records of 94 men aged 38–60 examined in 2015 and
2016 on their own initiative due to their sleep disorders. Consumption of caffeine,
alcohol, or sleeping pills before the examination was a criterion for excluding
individuals from the group. Additional criteria for excluding participants from
the analysis were duration of sleep shorter than 90 min (one sleep cycle) and
body mass index (BMI [kg/m2]) greater than 50. In total, 75 men met the
criteria for inclusion in the study.
The characteristics of age and BMI of the examined group are presented in
Table 1.

Table 1. General characteristics of study participants (N = 75)

Variable        | Median | SD  | 10th percentile | 90th percentile
Body Mass Index | 31.24  | 6.2 | 25.42           | 41.52
Age [y]         | 51     | 6.5 | 42              | 59
The variables of sleep architecture assessed in the study:


– time in bed (TIB) – the time between turning the light off in the evening and
turning the light on in the morning;
– total sleep time (TST) – the time between sleep onset and final awakening;
– total sleepless time – time spent awake during TIB;
– wake after sleep onset (WASO) – time of wakefulness in the period between
sleep onset and final awakening;
– sleep efficiency – time spent asleep to time spent in bed ratio (TST to TIB
ratio);
– 2. stage of sleep to TIB ratio – the ratio of time spent sleeping in 2. stage of
sleep to time spent in bed;
– 4. stage of sleep to TIB ratio – the ratio of time spent sleeping in 4. stage of
sleep to time spent in bed;
– 3. stage of sleep to TST ratio – the ratio of time spent sleeping in 3. stage of
sleep to total sleep time;
– NREM obstructive sleep apnea (NREM OSA) – time of obstructive sleep
apnea episodes during non-REM sleep phases;
– Apnea and hypopnea in NREM – time of apnea and hypopnea in non-REM
sleep phases;
– NREM sleep apnea index – number of sleep apnea episodes in non-REM
phases per hour of sleep;
– NREM apnea/hypopnea index (NREM AHI) – number of NREM sleep apnea
episodes and number of NREM sleep hypopnea episodes ratio per hour of
sleep;
– NREM Oxygen Desaturation 90–94% – number of blood oxygen desaturation
episodes with oxygen saturation between 90% and 94%;
– NREM Oxygen Desaturation Index (NREM ODI) – number of blood oxygen
desaturation in non-REM sleep stages per hour of NREM sleep;
– Oxygen Desaturation Nadir in NREM sleep – the lowest blood oxygen satu-
ration a patient drops to during non-REM sleep;
– isolated limb movement index – number of isolated limb movements per hour
of sleep;
– snoring index in total – number of snoring episodes per hour of sleep.
Parameters describing NREM sleep and sleep in total (NREM and REM jointly) were highly correlated, hence the conclusion that events during NREM sleep determine the total parameters – therefore, only NREM parameters are presented.
Polysomnographic parameters were analyzed using Statistica 13.3 Software.
The data were tested with the Pearson correlation coefficient test. All results
were considered statistically significant at p < 0.05.
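As an illustration of this analysis step, the sketch below derives the sleep efficiency variable defined above, together with a generic per-hour NREM index, and correlates one of them with BMI using SciPy's Pearson test; all subject values are invented for demonstration and are not the study data.

```python
import numpy as np
from scipy import stats

def sleep_efficiency(tst_min, tib_min):
    """Sleep efficiency: TST to TIB ratio, expressed in percent."""
    return 100.0 * tst_min / tib_min

def nrem_index(n_events, nrem_hours):
    """Generic 'events per hour of NREM sleep' index (apnea, ODI, etc.)."""
    return n_events / nrem_hours

# Invented values for five subjects (illustration only, not study data)
bmi = np.array([24.1, 28.7, 31.2, 35.9, 41.5])
tst = np.array([432.0, 410.0, 381.0, 355.0, 330.0])   # [min]
tib = np.array([465.0, 455.0, 441.0, 430.0, 418.0])   # [min]
eff = sleep_efficiency(tst, tib)

r, p = stats.pearsonr(bmi, eff)
print(f"BMI vs sleep efficiency: r = {r:.2f}, p = {p:.3f}")
# A result is treated as significant when p < 0.05, as in the study
```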

3 Results and Discussion


Table 2 illustrates the correlation coefficient of parameters describing sleep archi-
tecture and BMI. The time spent sleeping in total during the polysomnographic
study and sleep efficiency correlated negatively with body mass index values
(Fig. 1A). The total time spent sleepless during the study and the time spent
awake after falling asleep and before final awakening correlated positively with
the BMI of participants. The ratio of time spent in the 2. stage of sleep to time
spent in bed correlated negatively while time spent in the 4. stage of sleep to
time in bed and time spent in the 3. stage of sleep to total sleep time correlated
positively with body mass index of men. Time of obstructive sleep apnea and
apnea and hypopnea in NREM sleep correlated positively with BMI. Events per
hour of NREM sleep regarding sleep apnea, apnea/hypopnea, oxygen desatura-
tion, and isolated limb movement and snoring index correlated positively with
body mass index (Fig. 1B). The number of oxygen desaturation of 90% to 94%
and oxygen desaturation Nadir in NREM sleep correlated negatively with BMI
(Fig. 1C).

Table 2. Pearson correlation coefficient of sleep parameters and body mass index
(BMI) of study participants (N = 75)

Variable                                 | r     | p
Total sleep time (TST) [min]             | −0.28 | 0.017
Total sleepless time [min]               | 0.27  | 0.018
Wake time After Sleep Onset (WASO) [min] | 0.29  | 0.013
Sleep efficiency (TST to TIB ratio) [%]  | −0.28 | 0.013
2. stage of sleep to TIB ratio [%]       | −0.26 | 0.026
4. stage of sleep to TIB ratio [%]       | 0.27  | 0.021
3. stage of sleep to TST ratio [%]       | 0.26  | 0.024
NREM obstructive sleep apnea [min]       | 0.27  | 0.019
Apnea and hypopnea in NREM [min]         | 0.23  | 0.043
NREM sleep apnea index (events per hour) | 0.34  | 0.003
NREM apnea/hypopnea index                | 0.34  | 0.003
NREM oxygen desaturation 90%–94%         | −0.31 | 0.007
NREM oxygen desaturation nadir           | −0.50 | 0.001
NREM oxygen desaturation index           | 0.35  | 0.002
Isolated limb movement index             | 0.39  | 0.001
Snoring index in total                   | 0.24  | 0.042

The study adds to the body of literature a sample of adult men whose sleep parameters were assessed by polysomnography. The BMI value of 90% of
men (above the 10th percentile) indicates that their body weight was above nor-
mal. The study results are consistent with those of Moraes et al. [8]. In both
studies, the higher the patient's BMI, the significantly shorter the total sleep time, the lower the sleep efficiency, and the more positively WASO values correlated with BMI. The significant positive correlation of BMI with the occurrence of sleep apnea, the apnea/hypopnea index, and oxygen desaturations indicates that the higher a man's BMI, the more severe the sleep apnea he experienced. In overweight and obese men, fat accumulation in the neck narrows the throat and predisposes it to collapse. The more fat accumulated in the neck, the more frequent the episodes of apnea and, therefore, the more awakenings during sleep. The occurrence of episodes of apnea causes episodes of oxygen desaturation in the blood. A throat constricted by the accumulated fat is more prone to vibration, causing more snoring episodes in men with higher BMI.

Fig. 1. Correlation of body mass index (BMI) and: (A) sleep efficiency, (B) NREM sleep apnea index, (C) NREM oxygen desaturation Nadir of study participants (N = 75)

4 Conclusion
In 90% of the study sample, the men's BMI indicated overweight or obesity. The higher the BMI, the shorter and less efficient the sleep. Episodes of apnea and hypopnea were more numerous the higher the men's BMI. Those sleep breathing disorders occurred significantly more often in NREM phases of sleep than in REM. Moreover, the numbers of limb movements and snoring episodes were higher the higher the BMI. These tendencies often
result from the accumulation of fat in the neck that narrows the throat – it
predisposes the throat to collapse and causes apnea, hypopnea, or snoring and,
as a result, waking from sleep.

References
1. Avidan, A.Y., Zee, P.C., Smalling, T.R.: Handbook of Sleep Medicine. Lippincott
Williams & Wilkins (LWW), Philadelphia (2007)
2. Brand, S., Kirov, R.: Sleep and its importance in adolescence and in common ado-
lescent somatic and psychiatric conditions. Int. J. Gener. Med. 4, 425–442 (2011).
https://doi.org/10.2147/IJGM.S11557
3. Carno, M.A., et al.: Symptoms of sleep apnea and polysomnography as predictors
of poor quality of life in overweight children and adolescents. J. Pediatr. Psychol.
33(3), 269–278 (2007). https://doi.org/10.1093/jpepsy/jsm127
4. Dattilo, M., et al.: Effects of sleep deprivation on the acute skeletal muscle recovery
after exercise. Med. Sci. Sports Exerc. 52, 507–514 (2020). https://doi.org/10.
1249/MSS.0000000000002137
5. Dixon, J., Schachter, L., O’Brien, P.: Polysomnography before and after weight
loss in obese patients with severe sleep apnea. Int. J. Obes. 29, 1048–1054 (2005).
https://doi.org/10.1038/sj.ijo.0802960
6. Krishnan, V., Collop, N.A.: Gender differences in sleep disorders. Curr. Opin.
Pulm. Med. 12, 383–389 (2006). https://doi.org/10.1097/01.mcp.0000245705.
69440.6a
7. Landry, G.J., Best, J.R., Liu-Ambrose, T.: Measuring sleep quality in older adults:
a comparison using subjective and objective methods. Front. Aging Neurosci. 7,
166–166 (2015). https://doi.org/10.3389/fnagi.2015.00166
8. Moraes, W., et al.: Association between body mass index and sleep duration
assessed by objective methods in a representative sample of the adult popula-
tion. Sleep Med. 14(4), 312–318 (2013). https://doi.org/10.1016/j.sleep.2012.11.
010
9. Ohayon, M., Li, K., Guilleminault, C.: Risk factors for sleep bruxism in the general
population. Chest 119, 53–61 (2001). https://doi.org/10.1378/chest.119.1.53
10. Peng, Z., Dai, C., Ba, Y., Zhang, L., Shao, Y., Tian, J.: Effect of sleep depriva-
tion on the working memory-related n2–p3 components of the event-related poten-
tial waveform. Front. Neurosci. 14, 469–469 (2020). https://doi.org/10.3389/fnins.
2020.00469
11. Reite, M., Weissberg, M., Ruddy, J.: Clinical Manual for Evaluation and Treatment
of Sleep Disorders. American Psychiatric Pub. (2009)
12. Riemann, D., Berger, M., Voderholzer, U.: Sleep and depression - results from
psychobiological studies: an overview. Biol. Psychol. 57(1), 67–103 (2001). https://
doi.org/10.1016/S0301-0511(01)00090-4
13. Sateia, M.J.: International classification of sleep disorders-third edition. Chest
146(5), 1387–1394 (2014). https://doi.org/10.1378/chest.14-0970
14. Spruyt, K., Molfese, D., Gozal, D.: Sleep duration, sleep regularity, body weight,
and metabolic homeostasis in school-aged children. Pediatrics 127, e345-52 (2011).
https://doi.org/10.1542/peds.2010-0497
15. Tan, H.L., Gozal, D., Ramirez, H., Bandla, H., Kheirandish-Gozal, L.: Overnight
polysomnography versus respiratory polygraphy in the diagnosis of pediatric
obstructive sleep apnea. Sleep 37, 255–260 (2014). https://doi.org/10.5665/sleep.
3392
16. Watson, N.F., et al.: Joint consensus statement of the American academy of sleep
medicine and sleep research society on the recommended amount of sleep for a
healthy adult: methodology and discussion. J. Clin. Sleep Med. 11(8), 931–952
(2015). https://doi.org/10.5664/jcsm.4950
17. Witek, A., Lipowicz, A.: The impact of cigarette smoking on the quality of sleep
in polish men. Anthropol. Rev. 84(4), 369–382 (2021)
18. World Health Organization: WHO technical meeting on sleep and health: Bonn
Germany (2004)
Exploring Developmental Factors Related
to Auditory Processing Disorder
in Children

Michal Krecichwost1(B), Natalia Moćko2, Magdalena Ławecka3, Zuzanna Miodońska1, Agata Sage1, and Pawel Badura1

1 Faculty of Biomedical Engineering, Silesian University of Technology, ul. Roosevelta 40, 41-800 Zabrze, Poland
{michal.krecichwost,zuzanna.miodonska,agata.sage,pawel.badura}@polsl.pl
2 Faculty of Humanities, Institute of Linguistics, University of Silesia, ul. Sejmu Śląskiego 1, 40-001 Katowice, Poland
natalia.mocko@us.edu.pl
3 Silesian Center for Hearing and Speech Medincus, Nasypowa 18, 40-551 Katowice, Poland

Abstract. In this paper, the authors discuss the diagnosis of auditory processing disorder (APD) in the context of the child's development from the prenatal period to early childhood. We propose dividing the etiological factors (disturbing the functioning of the auditory processing system) into three groups. Thanks to this, it will be possible to observe the likelihood of APD in children in whom specific factors were previously observed. So far, the diagnosis of APD has been associated with the school difficulties experienced by the child, not with the etiological factors of these disorders. The research carried out on 60 children indicates the recurrence of specific developmental phenomena alongside school difficulties. The topic is important from a therapeutic point of view. Currently, younger patients (preschool children) who are at risk of developing APD in the future are often overlooked in auditory therapy because they do not qualify for a diagnosis of APD due to their age. The authors want to show that early detection of disorders in higher hearing functions, regardless of age, is necessary to start the appropriate auditory therapy required to function correctly in a school environment. The long-term goal of the study is to prepare data and obtain preliminary findings for the development of computer-aided APD diagnosis tools.

Keywords: Computer-aided diagnosis · Auditory processing disorder · APD

1 Introduction

Auditory processing disorder (APD) is a disorder of auditory information processing along the central auditory nervous system. APD impacts the individual's listening and communication. It constitutes a significant barrier to the sustainable growth of children and an obstacle in the daily functioning of adults [1,10].
APD has received growing attention in the literature concerning audiology and its relation to speech development. General studies attempt to unify the definition of the term and compare patients with APD to the reference population [7]. Researchers also try to determine the method of conducting diagnostics, which includes differential diagnosis, screening tests, and diagnostic tests [2]. The speech therapy community discusses the APD problem relatively less frequently in the context of therapy planning based on its etiology. These problems result from the inability to transfer test results to functional therapy. The speech therapist cannot determine the child's difficulty profile based on the diagnosis results alone. The available tools for speech therapy do not differentiate exercises in terms of a difficulty profile. Most often, the exercises are the same for all students diagnosed with APD. When planning the therapy, the etiology of APD, which may be important information about the child's development, is not considered.
Specialists distinguish three primary profiles of APD: auditory decoding deficit, prosodic deficit, and integration deficit [3]. They concern various brain regions (left auditory cortex, right auditory cortex, and corpus callosum, respectively) and yield different symptoms. The difficulties at the level of the structures of the central nervous system translate into evident problems in the school functioning of children and adolescents diagnosed with APD [7]. The need to provide psychological and pedagogical support as part of work with a specialist (speech therapist, surdologopedist, surdologopedagogue) at the respective educational stages, excluding the preschool stage, confirms the relevance of the mentioned issues.
The process of audiological diagnosis confirming the presence of APD is time-consuming. It requires the assessment of physiological hearing, a psychological examination to determine the intellectual performance level, the qualification of the audiologist to conduct tests, several visits to a facility with a battery of tests, and finally confirmation of the diagnosis by the audiologist. Experts in the field highlight the necessity to apply the hierarchy principle. The process of diagnosis starts with general tests that indicate auditory processing issues related to various etiologies and leads to specific ones that verify an APD and a particular difficulty profile simultaneously [6]. Numerous studies emphasize the relevance of specialized tests. They allow for the diagnostic evaluation of auditory processing disorders related to audiological abnormalities that do not result from a hearing loss or physiological hearing impairment [5,8].
In Poland, the diagnosis takes place in healthcare facilities, while an educa-
tional institution carries out the main part of the therapy. The involvement of
the educational facilities results from the subjective assessment of the patient’s
abilities based on the information exchanged between parent and school. The
rehabilitation of children and adolescents diagnosed with APD shows that the
difficulties represented by individual patients may also take various forms. There-
fore, it becomes essential to consider the developmental factors that cause APD
in a child. The symptoms of APD will vary depending on what causes the disorder. School difficulties arising from damage during pregnancy may differ from those caused by auditory deprivation in adolescence.
In this paper, we present a preliminary study on the developmental factors that correlate with the presence of APD. We examined a group of 60 children and performed a qualitative analysis of the gathered material to evaluate the role of selected factors (pregnancy, perinatal, and early childhood) in APD risk assessment. The state of the art suggests the importance of considering multiple elements in the context of difficulties in sound processing that affect different age groups.
So far, diagnosis has been triggered mainly by noticing increased problems in learning. Specific tests, including tests of higher auditory functions, are ordered and assessed by an audiologist, who confirms the presence of APD in children. However, this process frequently takes over a year. Our work constitutes a preliminary stage for developing the examination protocol and implementing computer-aided approaches to speed up the diagnostic process and make it more accessible.
The remainder of the paper is structured as follows: Sect. 2 describes the
process of constructing the database, including areas of information collected
by the interviews. In Sect. 3, we present the outcomes of qualitative analysis of
gathered material. Section 4 discusses the results and indicates the ideas for the
development, while Sect. 5 concludes this paper.

2 Materials and Methods

We constructed a database by conducting interviews in educational and healthcare institutions during a pilot study. The interviews were performed by a pair of speech and hearing therapists while programming therapy in children showing abnormalities in higher auditory functions. The diagnostic description of the
difficulties was based on the detailed answers from parents or legal guardians
during the conversation. The starting question was related to the reason for vis-
iting the speech and hearing therapists. Each time it was associated with either
school difficulties or learning issues at the preschool stage. Then, the special-
ists acquired information about the diagnosis of APD or other disorders. Later,
the caregiver described specific skills problematic for the child by answering the
following questions. Older students confirmed their difficulties directly.
The second part of the interview concerned the child’s development, from the
prenatal and perinatal period up to the standard time of completion of speech
development associated with correct articulation. The questions explored the
etiological factors considered likely to impact the occurrence of abnormalities
in the higher auditory functions. The interviews were collected in 2016–2020 from
60 children (16 girls and 44 boys) aged 6–14. Figure 1 presents the distribution
of age and gender among examined children. The group consisted of patients
diagnosed with APD and undergoing the diagnostic process.
Fig. 1. Distribution of age in the investigated group. The median is marked with a red bar. The whiskers present the minimum and maximum values

Fig. 2. Distribution of learning difficulties and etiology factors

3 Results

The caregivers reported several school issues related to articulation, phonology in reading and writing, auditory memory, consistency of speech, understanding in a noisy environment, concentration during lessons, auditory hypersensitivity, and sound source location (Fig. 2(a)). The specialists noticed the following elements of etiological background during the interviews: burdened pregnancy, burdened childbirth, ear inflammation or hearing disorders (conductive hearing loss) during the period of speech acquisition, 3rd tonsil/polyps (pharyngeal adenoid), high exposure to processed signals (increased presence of digital sounds in the child's acoustic environment), abnormal muscle tone/neurological disorders, upper respiratory tract infections/obstruction of the Eustachian tubes (nasal infections), and allergy/asthma (Fig. 2(b)).
The respondents' answers suggest that the most frequent difficulties resulting from etiology include the inability to properly master reading and writing skills, a low concentration level, problems with understanding in noise, articulation disorders (persistent speech impediments), and a decreased auditory memory level. Auditory hypersensitivity occurs the least frequently (Fig. 3). It is also not an isolated difficulty; it mostly coexists with problems related to understanding in noise (Fig. 3(b)).
Figure 5 shows the most frequently co-occurring etiological factors mentioned by the parents. We observed that abnormal muscle tone (AbTo) was most common in those children whose parents reported allergy or asthma (AlAs) as well as frequent nasal infections (NaIn). An association was also observed between abnormal muscle tone (AbTo) and high exposure to processed signals (HiEx). In addition, high exposure to processed signals can be linked to allergies and asthma (AlAs).
We divided the etiological factors into three categories according to the time of occurrence and the influence of the environment on the factor's presence (Table 1): (Group 1) prenatal and perinatal, (Group 2) health problems during the speech acquisition period, and (Group 3) environment-specific factors present during the speech acquisition period.

Table 1. Etiological factors included in each group: (Group 1) prenatal and perinatal, (Group 2) health problems during the speech acquisition period, and (Group 3) environment-specific factors present during the speech acquisition period

Groups   Etiologies
Group 1  Burdened pregnancy
         Burdened childbirth
         Abnormal muscle tone/neurological disorders
Group 2  Ear inflammation/hearing disorders during the period of speech learning
         3rd tonsil/polyps
         Upper respiratory tract infections/obstruction of the Eustachian tubes
         Allergy/asthma
Group 3  High exposure to processed signals

Group 1 included both prenatal and perinatal factors. However, the issues related to complications in pregnancy constituted a smaller group than the perinatal factors (Fig. 6). Groups 2 and 3 involved factors occurring in the period of intense speech acquisition (2.5–6 years). These factors relate to the persistent disturbance of proper sound reception caused by co-occurring diseases and disorders (Group 2) in a specific age range. The last category of factors (Group 3) included environmental deprivation issues (high exposure to processed signals). The groups often co-occur, most commonly Group 1 with Group 2. Group 3 never occurs alone; it mostly accompanies Group 2.

Fig. 3. Distribution of etiology factors in the investigated groups: BuPr – burdened pregnancy, BuCh – burdened childbirth, EaDi – ear diseases, PhAd – pharyngeal adenoid, HiEx – high exposure to processed signals, AbTo – abnormal muscle tone, NaIn – nasal infections, AlAs – allergy/asthma
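The grouping and co-occurrence counting described above is simple to operationalize. The following minimal Python sketch (an illustration only, not the authors' software; the sample records are hypothetical) assigns the factor abbreviations of Fig. 3 to the groups of Table 1 and accumulates a pairwise co-occurrence matrix of the kind visualized in Fig. 5.

```python
import itertools
import numpy as np

# Factor abbreviations as in Fig. 3, assigned to the groups of Table 1.
GROUPS = {
    "BuPr": 1, "BuCh": 1, "AbTo": 1,             # Group 1: prenatal and perinatal
    "EaDi": 2, "PhAd": 2, "NaIn": 2, "AlAs": 2,  # Group 2: health problems
    "HiEx": 3,                                   # Group 3: environment-specific
}
FACTORS = list(GROUPS)

# Hypothetical interview records: the set of factors reported for each child.
records = [{"AbTo", "AlAs"}, {"BuPr", "EaDi"}, {"HiEx", "AlAs", "AbTo"}]

# Pairwise factor co-occurrence matrix (cf. Fig. 5).
cooc = np.zeros((len(FACTORS), len(FACTORS)), dtype=int)
for rec in records:
    for a, b in itertools.combinations(sorted(rec), 2):
        i, j = FACTORS.index(a), FACTORS.index(b)
        cooc[i, j] += 1
        cooc[j, i] += 1

# Group-level co-occurrence (cf. Fig. 6): which groups appear together per child.
for rec in records:
    print(sorted(rec), "-> groups:", sorted({GROUPS[f] for f in rec}))
```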

4 Discussion

Auditory processing disorder often constitutes a barrier to sustainable growth and functioning for a patient. It is a challenging issue for a diagnostician too. The profiles of symptoms sometimes cover different disorders and might not relate only to APD. The symptoms include a broad range of topics in the patient's functioning. Problems most frequently indicated by parents are easily measurable. Difficulties in reading and writing, articulation abnormalities, or concentration issues are effortlessly observable. The remaining issues are often noted by teachers and are related to functioning in the class.

Fig. 4. Distribution of learning difficulties in the investigated groups: Ar – articulation, PhRe – phonology: reading, PhWr – phonology: writing, AuMe – auditory memory, CoSp – continuity/consistency of speech, UnNo – understanding in noise, Co – concentration, AuHy – auditory hypersensitivity, SoLo – sound source location
Reception and decoding of sounds in an unfavorable acoustic environment
are mainly related to the development of cognitive [4] and linguistic functions
of a person [9]. The process of reproducing incomplete acoustic information
is possible due to phonological abilities. MacCutcheon [9] claims that noise
imposes requirements on cognitive speech processing in terms of working memory
resources that are necessary to help match incoming phonological information
with phonological representations stored in long-term memory. The maturation
of the auditory system is associated with the gradual mastery of auditory skills,
especially in the aspect of decoding speech sounds. The school skills that require
efficient higher auditory function operations should be mastered in the first years
of school education. This impacts the initiation of an audiological diagnosis after
the age of 7. Such solutions at the educational level are inconsistent with one of
the significant education assumptions – preventive actions to avoid difficulties at
later stages. The complexity of organizing help is associated with the assumption that, among non-physiological abnormalities at the level of decoding the received sound information, only APD constitutes a unit predisposing a child to obtain specialist help within school. The scientific discussion in this area focuses on the need to conduct a specialized diagnosis, including differential diagnosis.

Fig. 5. Co-occurrence of etiological factors. Color corresponds to the number of patients

Fig. 6. Distribution and connections of individual etiological groups. Color corresponds to the number of patients
The American Speech-Language-Hearing Association emphasizes the role of the interdisciplinary team in the diagnostic and therapeutic process of children and
adults with APD, and highlights the crucial role of the audiologist as the only one who can confirm the diagnosis (in the form of tests). However, a student often reaches an audiologist only when school difficulties are already severe. This delay is harmful to the child's psychological development.
Specialists report problems in determining whether a child with school difficul-
ties has APD or other disorders. Our observations based on 60 students indicate
the significance of determining the etiology. Various sets of reported problems
might be associated with different issues than APD. Among the most critical
factors influencing the occurrence of APD in the future, we suggest indicating
three groups of factors: prenatal and perinatal, health problems during speech
acquisition period, and environment-specific factors present during the speech
acquisition period. Identifying relationships between the most common etiolog-
ical factors helps determine the group of children at risk of developing APD
in the future (Fig. 5). Thus, the teachers and parents can take early preventive
measures focused on preparing the children for reading and writing, extending
concentration time, etc.
We plan to extend the study to a more extensive research group in the future. This would enable a quantitative analysis of how the co-occurrence of the risk factors increases the probability of a future APD diagnosis. The examination of the etiology of difficulties allows for determining the risk group among preschool children. This should also significantly accelerate students' diagnostic process and allow ample time for therapy before the school stage, thus preventing the emergence and increased intensity of school difficulties.

5 Conclusion
We performed interviews with parents of 60 children with auditory processing
difficulties. Based on them, we separated a set of etiological factors that cover
the background of APD and are likely to be relevant for computer-aided diag-
nosis of this disorder in the future. The employment of etiology-based factors
may improve the diagnostic process by considering the number and duration
of factors that translate into incorrect mastery of school skills. Through this
research, we emphasize the relevance of the constant development of diagnostic
tools dedicated to APD.
The next stage of the work will be developing a remote computer-aided diagnosis (CAD) tool supporting the diagnosis and therapy of APD using artificial intelligence methods. The proposed approach will enable a detailed examination of disorders of higher auditory functions (manifested in the form of school difficulties) based on a remote interview with the parent and teacher of the examined child. The data obtained as part of the interview will constitute a knowledge base for a fuzzy expert system whose task will be to determine the profile of the child's disorders (in conjunction with the etiology of their occurrence). In addition, it will help the therapist indicate areas that require special attention in therapy.
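The fuzzy expert system is planned, not yet built; purely to make the idea concrete, the sketch below shows how interview-derived difficulty scores could be fuzzified and combined by a rule. All variable names, membership parameters, and the rule itself are hypothetical assumptions of this illustration, not the authors' design.

```python
def trimf(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function on [a, c] with peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical inputs from a remote interview, scaled to [0, 10].
interview = {"understanding_in_noise": 7.5, "auditory_memory": 6.0}

# Degree to which each difficulty is "severe" (hypothetical membership).
severe = {k: trimf(v, 4.0, 10.0, 16.0) for k, v in interview.items()}

# One hypothetical rule: IF understanding-in-noise is severe AND auditory
# memory is severe THEN auditory-decoding-deficit profile (min acts as AND).
decoding_deficit = min(severe["understanding_in_noise"], severe["auditory_memory"])
print(f"auditory decoding deficit score: {decoding_deficit:.2f}")
```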
Acknowledgement. Publication supported under the Excellence Initiative – Research University program at the Silesian University of Technology, 2020/2021, No.: 07/010/SDU/10-22-02.

References
1. Agrawal, D., Dritsakis, G., Mahon, M., Mountjoy, A., Bamiou, D.E.: Experiences of
patients with auditory processing disorder in getting support in health, education,
and work settings: findings from an online survey. Front. Neurol. 12, 167 (2021).
https://doi.org/10.3389/fneur.2021.607907
2. Barry, J., Tomlin, D., Moore, D., Dillon, H.: Use of questionnaire-based measures
in the assessment of listening difficulties in school-aged children. Ear Hear. 36
(2015). https://doi.org/10.1097/AUD.0000000000000180
3. Bellis, T.J., Ferre, J.M.: Multidimensional approach to the differential diagnosis
of central auditory processing disorders in children. J. Am. Acad. Audiol. 10(6),
319–28 (1999)
4. Bradley, J.S., Sato, H.: The intelligibility of speech in elementary school classrooms.
J. Acoust. Soc. Am. 123(4), 2078–2086 (2008)
5. Cunha, P., Silva, I.M.d.C., Rabelo, N.E., Tristão, R.M.: Auditory processing dis-
order evaluations and cognitive profiles of children with specific learning disorder.
Clin. Neurophysiol. Pract. 4, 119–127 (2019). https://doi.org/10.1016/j.cnp.2019.
05.001
6. Dillon, H., Cameron, S., Glyde, H., Wilson, W., Tomlin, D.: An opinion on the
assessment of people who may have an auditory processing disorder. J. Am. Acad.
Audiol. 23, 97–105 (2012). https://doi.org/10.3766/jaaa.23.2.4
7. Iliadou, V.M., Bamiou, D.E.: Psychometric evaluation of children with auditory
processing disorder (APD): comparison with normal-hearing and clinical non-APD
groups. J. Speech Lang. Hear. Res. JSLHR 55, 791–9 (2012). https://doi.org/10.
1044/1092-4388(2011/11-0035)
8. Iliadou, V.M., Chermak, G., Bamiou, D.E., Musiek, F.: Gold standard, evidence
based approach to diagnosing APD. Hear. J. 72, 42–46 (2019). https://doi.org/10.
1097/01.HJ.0000553582.69724.78
9. Maccutcheon, D., Füllgrabe, C., Eccles, R., van der Linde, J., Panebianco, C.,
Ljung, R.: Investigating the effect of one year of learning to play a musical instru-
ment on speech-in-noise perception and phonological short-term memory in 5-to-7-
year-old children. Front. Psychol. 10 (2020). https://doi.org/10.3389/fpsyg.2019.
02865
10. Martins, J.H., Alves, M., Andrade, S., Falé, I., Teixeira, A.: Auditory processing
disorder test battery in European Portuguese-development and normative data for
pediatric population. Audiol. Res. 11(3), 474–490 (2021). https://doi.org/10.3390/
audiolres11030044
Morphological Language Features of Anorexia Patients Based on Natural Language Processing

Stella Maćkowska1(B), Klaudia Barańska1,2, Agnieszka Różańska1, Katarzyna Rojewska3, and Dominik Spinczyk1

1 Faculty of Biomedical Engineering, Silesian University of Technology, ul. Roosevelta 40, 41-800 Zabrze, Poland
{stella.mackowska,klaudia.baranska,agnieszka.rozanska,dominik.spinczyk}@polsl.pl
2 Maria Sklodowska-Curie National Research Institute of Oncology, ul. Roentgena 5, 02-781 Warsaw, Poland
klaudia.baranska@pib-nio.pl
3 Central Laboratory of Clinical Psychology, Independent Public University Hospital No 1 in Zabrze, Medical University of Silesia, ul. 3 Maja 13/15, 41-800 Zabrze, Poland
katroj@op.pl

Abstract. The aim of the article is to compare the morphological features of language between people suffering from anorexia and healthy people. The research was conducted in cooperation with the Medical University of Silesia on a pilot group of 41 anorexia patients and 55 healthy young females. The study focuses on the statistics for the following parts of speech: the personal pronoun 'I', the possessive adjective 'my', verbs, and adjectives. Several significant differences were detected, including abnormal usage of the personal pronoun 'I' and overuse of the verb 'to be' and action verbs by anorectic people. In the future, it is planned to conduct the research on a larger group of patients and include descriptions of drawings in the research.

Keywords: Adolescence · Anorexia nervosa · Eating disorders · Language features · Natural language processing · NLP

1 Introduction
Recent years have brought a severe increase in the number of people suffering from various eating disorders, with anorexia nervosa in the first place. Researchers estimate that eating disorders affect approximately 1% of the total population, and the number is still growing. Unfortunately, the coronavirus (Covid-19) pandemic has worsened the statistics. Many people fearing for their health avoid contact with others, including medical staff. Online education, remote work, and lockdowns are factors that impact mental condition, especially among adolescents. The young have been affected severely by the pandemic, as remote
learning, a sedentary lifestyle, and limitations on peer contacts and sports activities have disrupted their everyday routine and natural development [1]. The studies report a higher incidence in women and girls than in men and boys, with gender ratios of approximately 10/1 to 15/1 [2].
However, the latest data show that during the coronavirus pandemic the number of male patients dealing with anorexia has grown [3–5]. This situation raises serious concerns, as there is currently a lack of both quick diagnostic tools and therapists.
Anorexia nervosa is an eating disorder of unknown and very complex etiology. It is a mental disease characterized by severe abnormalities in food intake. Patients with anorexia take strict steps to lose weight. They are preoccupied with an extremely low-calorie diet and extensive physical exercise. Their body image is disrupted; even if they look skinny, they still see themselves as fat. Long-term anorexia leads to many somatic disorders resulting from a restricted diet. Possible complications include osteoporosis (even in young women), muscle problems, fatigue, constipation, loss of menstruation, fertility problems, and heart and cardiovascular problems, among others [6–8].
Anorexia has the highest mortality rate of all psychiatric disorders, and in recent years an increased risk of death among anorexia patients has been observed [9]. It is estimated that the DALY (disability-adjusted life-years) indicator for eating disorders worldwide is 29.49 per 100,000 people. In the USA the value is 62.27 per 100,000, and for Western Europe it reaches up to 71.26 per 100,000 people. Poland has a relatively low rate, amounting to 26.33 per 100,000 citizens [10]. The epidemiological research on eating disorders may be underestimated because of the relatively rare occurrence of these afflictions and their slight impact on the DALY indicator. However, early identification of eating disorders can prevent further disease growth and reduce the risk of its long-lasting consequences [10,11]. It is estimated that 25–33% of anorexia or bulimia patients suffer from other chronic diseases [12]. Thus, it seems crucial to popularize research on eating disorders. The problems concerning eating disorders and their symptoms mostly affect middle and late adolescence [13,14]. According to the research, 13% of girls under 20 years old have experienced one of the eating disorders [13]. More than 5% of people suffering from anorexia commit suicide or die due to somatic complications caused by growing cachexia [15].
The process of anorexia treatment is long and complex. The main goals of treating anorexia are stopping weight loss, beginning nutritional rehabilitation, eliminating other problematic eating patterns, psychological therapy that helps change issues such as low self-esteem and distorted thinking patterns, and developing long-term behavioral changes [16,17].
The diagnostic process can be supported by the newest methods of Natural Language Processing (NLP). NLP stands for the automatic handling of natural human language in speech or text. NLP is growing particularly fast in the healthcare industry. This technology can be applied to improving care delivery or disease diagnosis. Moreover, it can also reduce the cost of treatment and, at the same time, shorten the diagnostic process [18–21]. The aim of the article is to compare the morphological features of language between people suffering from anorexia and healthy people.

2 The State of the Art

In the professional literature, some research indicates the relationship between a patient's psychiatric condition, like anxiety, depression, or neuroses, and his or her thoughts and expressions [22–24]. In many cases, NLP tools are applied to diagnose the potential areas of psychiatric disorders. NLP, as a subfield at the intersection of linguistics and artificial intelligence, supports clinical diagnosis and helps to shorten the time needed for disease identification [18].
Another method described in the literature is applying voice analysis for detecting neurological disorders (Voice Analysis for Medical Professionals, VAMP). This method analyses the voice pitch, tone, loudness, and tempo of oral statements, and then, based on the received data, so-called voice biomarkers are constructed. Such markers help detect neurodegenerative and civilization diseases [25,26]. There are also commercial products developed by BeyondVerbal, Healthymize, or NeuroLex designed for English language analysis. Their goal is to determine the patient's health condition, mood, and emotions by analyzing various acoustic features of the speaker's voice. However, this method cannot be applied to other languages, which is its main drawback. Each spoken language abounds in various paralinguistic means, such as intonation, accent, voice timbre, or pauses. What is more, the parameters related to breathing during speaking also differ.
Studies concerning the automatic identification of people with eating disorders, focused on clinical diagnosis, can also be found in the literature. For instance, in the study [18], NLP methods were used for diagnosing binge-eating disorder (BED). The identification was possible thanks to the medical notes prepared by a doctor during therapeutic sessions with a patient. In this case, the conversation with a patient was supervised by a doctor, which made the disorder considerably easier to identify. To the authors' knowledge, no existing methods of natural language processing for automatic language analysis supporting a psychologist's work are known.
Primarily, NLP methods are used in a branch of linguistics called Clinical Linguistics and aim to support the therapy of patients with speech pathologies or neurological defects, e.g., dementia. However, very few studies focus on the language features of anorexia patients. Spoken language carries a great amount of information on the speaker's cognitive state, as it can be analyzed through the observation of speech disfluencies, such as spontaneous errors, fillers, pauses, or false starts. Therefore, spoken language poses challenges for automated syntactic analysis [27,28].
The authors of the paper [27] claim that some characteristics of anorexia
like negative body image, obsessive thinking, and anxious or depressive traits
can influence the language used by patients and can be detected using NLP
tools. Anorexia patients are characterized by emotional and cognitive disturbances, resulting in spoken and written language abnormalities. According to the authors, such disruptions can be seen at the syntactic, lexical, and semantic levels. It is possible to identify some language features among anorexia patients and, thanks to NLP techniques, create 'digital linguistic biomarkers'.

3 Material and Method

The proposed method aims to detect characteristic traits of the language used by people suffering from anorexia nervosa. The overarching objective of the developed method is to support the therapist and to shorten the diagnostic process. The essential step in the research was to establish a proper methodology. Because the language analysis uses notes prepared by people who in many cases were under-age, we established cooperation with a professional psychologist-therapist, Dr. K. Rojewska, from the Central Laboratory of Clinical Psychology, Clinical Hospital No. 1 in Zabrze Named after Professor S. Szyszko of the Medical University of Silesia. Thanks to this cooperation, we started preliminary research on the language analysis of people suffering from anorexia.

3.1 Data Collection


The material used in the analysis consisted of notes prepared by female patients
with anorexia under the treatment and healthy girls and young women from
primary and secondary schools in Gliwice. We asked both groups to write free
statements about their body image. We did not suggest the length of the notes.
The statements from anorexia patients were prepared during therapeutic sessions
with the therapist or at the most convenient place to the patient. We collected 96
notes – 44 from anorexia patients and 52 from healthy young females. Average,
[minimum; maximum] number of words in the note for a healthy and sick person
were 41 [20; 103] and 72 [19; 254], respectively.

3.2 Research and Control Group

To establish the research and control groups, we considered the following assumptions. In the case of the research group:
– The age of the patients corresponds to adolescence (12–19 years old),
– Anorexia diagnosed by a psychiatrist,
– No other concurrent mental disorders,
– The disorder lasted up to 3 years.
For the control group, we adopted the following inclusion criteria:
– The age of the participants corresponds to adolescence (12–19 years old),
– The lack of any diagnosed mental disorders.
Within the developed criteria, 41 girls with anorexia (restrictive form) were included in the research group. The participants were aged 12–19 (average 15.7 ± 2). Anorexia was diagnosed according to the ICD-10 and DSM-IV criteria. The average weight of the girls was 35.1 ± 4.7 kg, BMI ranged from 11.3 to 20.2 (average 15.1 ± 2.8, p < 0.001 vs. control group), and BMI SDS ranged from −4.2 to 0.9 (average −2.72 ± 1.49, p < 0.001 vs. control group).
The control group consisted of 55 healthy girls aged 12–20 (average 15.1 ± 1.9); the average weight was 57.1 ± 10.1 kg, BMI ranged from 16.5 to 25.8 (average 21.5 ± 3.4), and BMI SDS from −2.7 to 3.6 (average 0.19 ± 1.44).

3.3 Text Analysis

The text analysis assumed morphological detection of particular parts of speech. In the research, we wanted to detect the frequency of verbs, verb tense (present and past), and adjectives in both groups. We were also interested in determining the frequency of the personal pronoun 'I', references to other persons, and the possessive adjective 'my'. A parallel phase of the analysis concerned sentiment analysis. Such analysis provides information on whether a particular statement has a positive, neutral, or negative attitude or tone. Moreover, health centers, clinical centers, and other medical units have recently become familiar with sentiment analysis for diagnostic purposes, e.g., for oncological and psychological patients. The results of the sentiment analysis based on the same sample data have already been published by the authors in papers [20,29].
The current analysis was performed using SAS Viya analytic software, whose open architecture is able to mine, modify, manage, and retrieve data from a variety of sources and perform statistical analysis on it. Moreover, it provides comprehensive solutions for identifying parts of speech or categorizing essential textual data.
The analysis was divided into two steps. After incorporating the corpus of notes into the platform, we used the following analysis nodes to retrieve morphological features:

– Text parsing, including dropping and keeping terms (start/stop lists), stemming to the basic form, and part-of-speech tagging.
– Calculating the general number of adjectives and categorizing them into negative and positive adjectives, and determining the verb tense (present or past).

Text Parsing
The parsing node enables us to explore the text to find specific terms in our documents. It includes analyzing the sentence structure and representing it according to a syntactic formalism, such as constituency (or phrase-structure) and dependency [30]. This step also includes text cleaning, i.e., removing language errors and removing stop words (prepositions or conjunctions often provide minimal value and are often dropped during text parsing). The next process is stemming, which refers to reducing inflectional forms and sometimes derivationally related forms of a word to a common base form.
Another part of the analysis concerned the process of tagging. Briefly speaking, it refers to categorizing the words in a corpus into particular parts of speech. In our analysis, we focused on the following parts of speech (a minimal tagging sketch follows this list):

– Pronoun: an inflected part of speech used to replace any other part of speech (e.g., noun, adjective, adverb, or numeral) except verbs. The personal pronoun is a type of pronoun associated with a particular grammatical person – 1st person (I), 2nd person (You), etc.
– Verb: usually one of the main parts of a sentence; it expresses an action, an occurrence, or a state of being.
– Adjective: acts as a modifier, quantifier, or intensifier in any sentence.
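The study itself performed this step in SAS Viya. Purely as an open-source illustration of the same tagging idea, the sketch below counts the listed parts of speech in a Polish note using spaCy and its Polish model (an assumption of this example, not a tool used in the paper). Note that in Polish the pronoun 'ja' is often dropped, so such counts are only approximate.

```python
from collections import Counter
import spacy

# Requires the Polish model: python -m spacy download pl_core_news_sm
nlp = spacy.load("pl_core_news_sm")

def pos_profile(note: str) -> Counter:
    """Count the morphological features analysed in the study for one note."""
    counts = Counter()
    for tok in nlp(note):
        lemma = tok.lemma_.lower()
        if tok.pos_ == "PRON" and lemma == "ja":       # personal pronoun 'I'
            counts["I"] += 1
        if lemma == "mój":                             # possessive 'my'
            counts["my"] += 1
        if tok.pos_ in ("VERB", "AUX"):                # verbs, incl. 'to be'
            counts["verb"] += 1
            if "Past" in tok.morph.get("Tense"):       # past-tense verbs
                counts["verb_past"] += 1
        if tok.pos_ == "ADJ":
            counts["adjective"] += 1
    return counts

print(pos_profile("Nie lubię mojego ciała, byłam gruba."))
```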

3.4 The Proposed Measurement for Comparing the Data

As mentioned previously, the analysis was carried out for the determined parts of speech. The choice of the presented parts of speech was made after many consultations with the psychologist. Having acquired basic knowledge of anorexia, we decided to establish some grammatical patterns of patients with anorexia. At the beginning of the research, it seemed necessary to evaluate the language of anorexia patients in terms of morphology. First, we calculated the number of occurrences of the personal pronoun 'I' and the possessive pronoun 'my'. Next, the focus was on verbs and their tenses, present and past. The last point of the analysis concerned the number of adjectives with a negative or positive tone. The number of particular parts of speech for both groups is presented in the form of charts. Next, we compared the numerical amounts of all parts of speech for the individual groups.

4 Results

We analyzed 96 notes prepared by the participants in the research, of whom 41 were females with diagnosed anorexia (average age 15.7); the remaining 55 participants comprised the control group (average age 15.1).
Figure 1 depicts the results of the analysis for the collected notes of the anorexia participants and the healthy ones. As shown in Fig. 1, the anorexia group shows a tendency to use the personal pronoun 'I' very frequently. The same holds for the possessive adjective 'my'. References to other persons, e.g., 'you' or 'they', are relatively rare in the anorexia group and, due to this fact, were purposely omitted in the analysis. In the control group, on the other hand, there are no such references. Moreover, we also observed a high variation of values, as the length of notes in both groups differed significantly.
In the research group, we can observe slightly fewer adjectives used per note than in the control one. The sick participants tend to use considerably more negative (mean: 3.3 ± 2.1) than positive (mean: 0.8 ± 0.7) adjectives (Fig. 2). The females in the control group use this part of speech with similar frequency, but adjectives with a positive tone occur more often (mean: 2.8 ± 1.6) than negative ones (mean: 1.8 ± 1.4).

Fig. 1. The mean number of parts of speech per note in the research and control groups

Fig. 2. The number of positive and negative adjectives used in the research and control groups
Figure 3 presents the number of verbs in present and past form in the notes of the research and control groups. Patients in the research group used significantly more verbs than those in the control one (present tense mean (RG): 5.1 ± 3.4 vs. mean (CG): 1.9 ± 1.8). The analysis revealed that anorexia patients mostly used the present and past tenses, and the verb used with the highest frequency is 'to be'. In the control group, we did not observe verbs in the past tense.

Fig. 3. The number of verbs in present and past form in the research and control groups

5 Discussion and Conclusion

Our preliminary studies intend to create a linguistic profile of anorexia patients. This topic is a relatively new and unexplored area of computer language analysis that aims to support the diagnostic process of anorexia patients. To the authors' knowledge, no other similar studies have been conducted and published for the Polish language.
The first observation from the analysis of the results is the abnormal usage of the personal pronoun 'I' among anorexia patients. It corresponds with the psychological portrait of anorexia, in which the patient concentrates on herself. In the case of verbs, people with eating disorders use mostly the verb 'to be' and action verbs. Generally, they are characterized by low language fluency and infrequent use of cognition words like feel, want, know, or believe. What is more, they do not use metaphors or irony; their utterances are direct and very often aggressive.
Unfortunately, a weak point of the analysis refers to the length of the notes. In both groups, the notes are limited to a few sentences. Moreover, anorexia patients often avoid speaking of their problems or feelings, not to mention their body. They have a tendency to switch the topic of the conversation, so sometimes the notes are far from the topic.
The study is still ongoing, so the final results of the analysis are not yet available. The identified linguistic features will be used in the future to create rules allowing for their automatic detection and, ultimately, for the automatic classification of personal notes. Independently, it is planned to expand the research with a subject-specific analysis of the categories of difficulties, which will be based on dictionaries suggested by psychologists. Moreover, the focus will be put on extending the data samples. In further research, we intend to elaborate the questionnaire with additional questions and a drawing to collect a broader text corpus.

References
1. Springall, G., Cheung, M., Sawyer, S.M., Yeo, M.: Impact of the coronavirus pan-
demic on anorexia nervosa and atypical anorexia nervosa presentations to an Aus-
tralian tertiary paediatric hospital. J. Paediatr. Child Health (2021). https://doi.
org/10.1111/JPC.15755
2. Gigantesco, A., Masocco, M., Picardi, A., Lega, I., Conti, S., Vichi, M.: Hospital-
ization for anorexia nervosa in Italy. Riv. Psichiatr. 45, 154–162 (2010)
3. Miniati, M., et al.: Eating disorders spectrum during the COVID pandemic: a sys-
tematic review. Front. Psychol. 12, 4161 (2021). https://doi.org/10.3389/FPSYG.
2021.663376
4. Vuillier, L., May, L., Greville-Harris, M., Surman, R., Moseley, R.L.: The impact of
the COVID-19 pandemic on individuals with eating disorders: the role of emotion
regulation and exploration of online treatment experiences. J. Eat. Disord. 9, 1–18
(2021). https://doi.org/10.1186/S40337-020-00362-9
5. Rodgers, R.F., et al.: The impact of the COVID-19 pandemic on eating disorder
risk and symptoms. Int. J. Eat. Disord. 53, 1166–1170 (2020). https://doi.org/10.
1002/eat.23318
6. Surgenor, L.J., Maguire, S.: Assessment of anorexia nervosa: an overview of uni-
versal issues and contextual challenges. J. Eat. Disord. 1, 1–12 (2013). https://doi.
org/10.1186/2050-2974-1-29
7. Damiano, S.R., Atkins, L., Reece, J.: The psychological profile of adolescents with
anorexia and implications for treatment. J. Eat. Disord. 2, 1–1 (2014). https://
doi.org/10.1186/2050-2974-2-S1-P9
8. Anorexia Nervosa: Symptoms, Causes, and Treatments. https://www.healthline.
com/health/anorexia-nervosa. Accessed 23 Jan 2022
9. Smink, F.R.E., Van Hoeken, D., Hoek, H.W.: Epidemiology of eating disorders:
incidence, prevalence and mortality rates. Curr. Psychiatry Rep. 14, 406–414
(2012). https://doi.org/10.1007/S11920-012-0282-Y
10. Wu, J., Liu, J., Li, S., Ma, H., Wang, Y.: Trends in the prevalence and disability-
adjusted life years of eating disorders from 1990 to 2017: results from the Global
Burden of Disease Study 2017. Epidemiol. Psychiatr. Sci. 29 (2020). https://doi.
org/10.1017/S2045796020001055
11. Kotwas, A., Karakiewicz-Krawczyk, K., Zabielska, P., Jurczak, A., Bażydlo, M.,
Karakiewicz, B.: The incidence of eating disorders among upper secondary school
female students. Psychiatr. Pol. 54, 253–263 (2020). https://doi.org/10.12740/PP/
ONLINEFIRST/99164
12. Quick, V.M., Byrd-Bredbenner, C., Neumark-Sztainer, D.: Chronic illness and dis-
ordered eating: a discussion of the literature. Adv. Nutr. 4, 277 (2013). https://
doi.org/10.3945/AN.112.003608
13. Stice, E., Nathan Marti, C., Rohde, P.: Prevalence, incidence, impairment, and
course of the proposed DSM-5 eating disorder diagnoses in an 8-year prospective
community study of young women. J. Abnorm. Psychol. 122, 445 (2013). https://
doi.org/10.1037/A0030679
14. Abebe, D.S., Lien, L., Von Soest, T.: The development of bulimic symptoms from
adolescence to young adulthood in females and males: a population-based longi-
tudinal cohort study. Int. J. Eat. Disord. 45, 737–745 (2012). https://doi.org/10.
1002/EAT.20950
15. Guillaume, S., et al.: Characteristics of suicide attempts in anorexia and bulimia
nervosa: a case-control study. PLoS ONE 6, 23578 (2011). https://doi.org/10.1371/
JOURNAL.PONE.0023578
16. Abbate-Daga, G., Amianto, F., Delsedime, N., De-Bacco, C., Fassino, S.: Resis-
tance to treatment in eating disorders: a critical challenge. BMC Psychiatry 13,
1–18 (2013). https://doi.org/10.1186/1471-244X-13-294
17. Robertson, A., Thornton, C.: Challenging rigidity in Anorexia (treatment, training
and supervision): questioning manual adherence in the face of complexity. J. Eat.
Disord. 9, 1–8 (2021). https://doi.org/10.1186/S40337-021-00460-2
18. Bellows, B.K., et al.: Automated identification of patients with a diagnosis of
binge eating disorder from narrative electronic health records. J. Am. Med. Inform.
Assoc. 21, e163 (2014). https://doi.org/10.1136/AMIAJNL-2013-001859
19. Funk, B., et al.: A framework for applying natural language processing in digital
health interventions. J. Med. Internet Res. 22 (2020). https://doi.org/10.2196/
13855
20. Spinczyk, D., Bas, M., Dzieciako, M., Maćkowski, M., Rojewska, K., Maćkowska,
S.: Computer-aided therapeutic diagnosis for anorexia. Biomed. Eng. Online 19
(2020). https://doi.org/10.1186/S12938-020-00798-9
21. Barańska, K., Różańska, A., Maćkowska, S., Rojewska, K., Spinczyk, D.: Determin-
ing the intensity of basic emotions among people suffering from anorexia nervosa
based on free statements about their body. Electronics 11, 138 (2022). https://
doi.org/10.3390/electronics11010138
22. Iliev, R., Dehghani, M., Sagi, E.: Automated text analysis in psychology: methods,
applications, and future developments. Lang. Cogn. 7, 265–290 (2015). https://
doi.org/10.1017/LANGCOG.2014.30
23. Calvo, R.A., Milne, D.N., Hussain, M.S., Christensen, H.: Natural language pro-
cessing in mental health applications using non-clinical texts. Nat. Lang. Eng. 23,
649–685 (2017). https://doi.org/10.1017/S1351324916000383
24. Rezaii, N., Walker, E., Wolff, P.: A machine learning approach to predicting psy-
chosis using semantic density and latent content analysis. npj Schizophr. 5, 1–12
(2019). https://doi.org/10.1038/s41537-019-0077-9
25. Van Puyvelde, M., Neyt, X., McGlone, F., Pattyn, N.: Voice stress analysis: a
new framework for voice and effort in human performance. Front. Psychol. 9, 1994
(2018). https://doi.org/10.3389/FPSYG.2018.01994
26. Rocco, D., Pastore, M., Gennaro, A., Salvatore, S., Cozzolino, M., Scorza, M.:
Beyond verbal behavior: an empirical analysis of speech rates in psychotherapy ses-
sions. Front. Psychol. 9, 978 (2018). https://doi.org/10.3389/FPSYG.2018.00978
27. Cuteri, V., et al.: Linguistic Feature of Anorexia Nervosa: A Prospective Case-
Control Pilot Study (2021). https://doi.org/10.21203/RS.3.RS-186615/V1
28. Minori, G., et al.: Linguistic markers of anorexia nervosa: preliminary data from
a prospective observational study. In: 3rd RaPID Workshop: Resources and Pro-
cessing of Linguistic, Para-linguistic and Extra-linguistic Data from People with
Various Forms of Cognitive/Psychiatric/Developmental Impairments, pp. 34–37
(2020)
29. Spinczyk, D., Nabrdalik, K., Rojewska, K.: Computer aided sentiment analysis
of anorexia nervosa patients’ vocabulary. Biomed. Eng. Online 17, 1–11 (2018).
https://doi.org/10.1186/S12938-018-0451-2
30. Pyysalo, S.: Text parsing. In: Dubitzky, W., Wolkenhauer, O., Cho, KH., Yokota,
H. (eds.) Encyclopedia of Systems Biology, pp. 2162–2163. Springer, New York
(2013). https://doi.org/10.1007/978-1-4419-9863-7 182
Image Analysis
Comparison of Analytical and Iterative Algorithms for Reconstruction of Microtomographic Phantom Images and Rat Mandibular Scans

Pawel Lipowicz1(B), Agnieszka Dardzińska-Glebocka2, Marta Borowska1, and Ander Biguri3

1 Institute of Biomedical Engineering, Faculty of Mechanical Engineering, Bialystok University of Technology, ul. Wiejska 45C, 15-351 Bialystok, Poland
{p.lipowicz,m.borowska}@pb.edu.pl
2 Department of Mechanics and Applied Computer Science, Bialystok University of Technology, ul. Wiejska 45C, 15-351 Bialystok, Poland
a.dardzinska@pb.edu.pl
3 Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Wilberforce Road, CB3 0WA Cambridge, UK

Abstract. For the reconstruction of cone beam computed tomography (CBCT) images, the analytical algorithm of Feldkamp et al. (FDK) is mainly used. Apart from it, many iterative algorithms have been developed, e.g., algorithms from the ART family, conjugate gradient least squares (CGLS), or algorithms that use total variation regularization (i.e., ASD-POCS, OS-ASD-POCS, OS-AwASD-POCS). However, they are infrequently used commercially. This paper compares the reconstruction time of the above-mentioned algorithms and analyses the images obtained from the reconstruction using image similarity assessment methods. Both phantoms (a Head phantom and a Shepp-Logan phantom) and a scan of a rat mandibular angle with a composite implant (titanium + bio-glass) were used for reconstruction. The presented analysis makes it possible to determine the direction of further work related to methods of reducing artefacts caused by metallic implants in the reconstruction area.

Keywords: Computed microtomography · Reconstruction algorithms · Tomography software · Cone beam CT · Image reconstruction · Iterative reconstruction · Image processing

1 Introduction

Computed tomography (CT) is widely used in various areas of science and is still developing strongly. Nowadays, tomography is used not only in medical areas such as medical imaging, stomatology, or oncology, but also in materials science, geology, archaeology, and the life sciences.

Increasingly, cone beam geometry technology (CBCT) is used, especially in microtomography (µCT). A reduced radiation dose combined with the reconstruction of full 3D high-quality images is a very important feature of CBCT [11]. There are many methods for the mathematical reconstruction of tomographic images. Currently, the most commonly used algorithm is that of Feldkamp et al. (FDK) [12]. Continuous research on reconstruction algorithms has led to the publication of many other reconstruction methods, mainly iterative ones. The main reason why they are not used in practice is the reconstruction time, which is longer than that of the FDK algorithm. The difference is due to the need to perform multiple recalculations, which are time-consuming tasks.
It has been repeatedly shown that iterative algorithms give better results than the FDK algorithm [3,13,14]. In this paper, the authors compare several types of iterative algorithms and FDK with several types of filtering. The results of reconstruction time and a comparison of images using different statistical and analytical methods of calculating the difference between images are presented.

2 Methodology
2.1 Microtomography (µCT) Scanning

A group of 150 rats were treated with implants made of different materials (pure titanium, bio-glass, a composite of titanium and bio-glass, and polylactide (PLA)). The implants were placed in the mandibular angle area. Then, after 20, 50, and 100 days, mandibular material was harvested from each material group. Only the portion of the mandible that contained the implant was scanned on the µCT scanner. Samples were scanned using a SkyScan 1172 desktop micro-CT scanner (SkyScan, Kontich, Belgium). All samples were scanned under the same parameters. The X-ray source was operated at 80 kV/125 µA, and a 0.5 mm Al filter was used. The exposure rate of the matrix itself was kept at 45%. The image resolution was 7 µm and the rotation step was 0.4°. The images were saved as 16-bit TIFF files. The resolution of the matrix was 2000 × 1332 px. We have all the required documents and permits from the bioethics committee to perform procedures on laboratory animals.

2.2 Image Reconstruction

To perform the reconstruction, Python 3.8 and the Tomographic Iterative GPU-based Reconstruction Toolbox (TIGRE toolbox [4]) were used. The development environment was run on a Windows 10 computer equipped with an Intel Xeon CPU E3-1240 v3 3.40 GHz, an NVIDIA GeForce GTX 980 Ti 6 GB graphics card, and 20 GB DDR3 RAM. For the reconstruction, the Head phantom implemented in the toolbox and a generated Shepp-Logan phantom (Yu-Ye-Wang type) [21] were used. Both phantoms were generated at a high resolution (512 × 512 × 512 px – Shepp-Logan phantom and 512 × 512 × 901 px – Head phantom). The scan from the microtomograph had the highest resolution: 2000 × 1332 × 901 px. In order to speed up the reconstruction process for the microtomography data, only the central 100 cross-sections were reconstructed. The phantoms and the scan of the rat mandible were reconstructed using FDK algorithms with filtering (Ram-Lak, Shepp-Logan, Cosine, Hamming, Hann), algorithms known as the algebraic reconstruction technique family (ART family), conjugate gradient least squares (CGLS), and the total variation regularization algorithm family (ASD-POCS, OS-ASD-POCS, OS-AwASD-POCS). Phantoms and reconstructed images were analysed at 32-bit depth.
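As a rough sketch of this workflow (not the exact experiment script), the snippet below simulates cone-beam projections of the built-in head phantom and reconstructs them analytically and iteratively. The module and function names follow the TIGRE toolbox's published Python demos and should be treated as assumptions that may differ between toolbox versions.

```python
import numpy as np
import tigre
import tigre.algorithms as algs
from tigre.utilities import sample_loader

# Default cone-beam geometry shipped with the toolbox (assumed API).
geo = tigre.geometry_default(high_resolution=False)
angles = np.linspace(0, 2 * np.pi, 100)

# Built-in head phantom and its simulated cone-beam projections.
head = sample_loader.load_head_phantom(geo.nVoxel)
proj = tigre.Ax(head, geo, angles)

# Analytical reconstruction: FDK with a Hann filter (filter names as in the demos).
rec_fdk = algs.fdk(proj, geo, angles, filter="hann")

# Iterative reconstruction: OS-SART with 20 iterations, as used in this study.
rec_ossart = algs.ossart(proj, geo, angles, 20)
```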
FDK is an algorithm which processes the data linearly by treating each projection as a matrix with n rows and m columns of pixels [6]. It preserves some accuracy in the z-direction for low-complexity objects in the plane of the beam centre trajectory, preserving integrals in the longitudinal and oblique directions [8]. The FDK algorithm for CBCT is described as follows:

\hat{f}(x, y, z) = \frac{1}{2} \int_0^{2\pi} \frac{1}{U^2} \int_{-\infty}^{\infty} p(\theta, u', v') \, \frac{D}{\sqrt{D^2 + u'^2 + v'^2}} \, h(u - u') \, du' \, d\theta \quad (1)

where \hat{f} is the approximate reconstruction result and h(\cdot) is the ramp filter.
Phantom and sample data were reconstructed using five different filters. Each filter was defined in the frequency domain:

H(ω) = |ω|W(ω) (2)

where ω is the spatial frequency and W(ω) is defined as follows [7] (a numerical sketch of these windows is given after the list):


– for the Ram-Lak filter: W(ω) = rect(ω/2)
– for the Shepp-Logan filter: W(ω) = rect(ω/2) sinc(ω/2)
– for the Cosine filter: W(ω) = rect(ω/2) cos(ω/2)
– for the Hamming filter: W(ω) = rect(ω/2)(0.54 + 0.46 cos(ω))
– for the Hann filter: W(ω) = rect(ω/2)(0.5 + 0.5 cos(ω))
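A minimal numpy sketch of W(ω) and the resulting H(ω) from Eq. (2) is given below. The normalization of ω to [−π, π] is an assumption of this illustration; the paper leaves it implicit.

```python
import numpy as np

def window(name: str, w: np.ndarray) -> np.ndarray:
    """W(w) for each filter; w assumed normalized to [-pi, pi] (an assumption)."""
    rect = (np.abs(w) <= np.pi).astype(float)        # rect(w/2): pass-band indicator
    if name == "ram_lak":
        return rect
    if name == "shepp_logan":
        return rect * np.sinc(w / (2 * np.pi))       # sin(w/2)/(w/2) via numpy's sinc
    if name == "cosine":
        return rect * np.cos(w / 2)
    if name == "hamming":
        return rect * (0.54 + 0.46 * np.cos(w))
    if name == "hann":
        return rect * (0.5 + 0.5 * np.cos(w))
    raise ValueError(f"unknown window: {name}")

w = np.linspace(-np.pi, np.pi, 512)
H = np.abs(w) * window("hann", w)                    # H(w) = |w| * W(w), Eq. (2)
```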
The ART family of algorithms is a projection-by-projection reconstruction method. The differences between the various SART, SIRT, and OS-SART algorithms concern the number of simultaneously used projections [1]. The algorithms of the ART family are described by the formula:

x^{k+1} = x^k + \lambda_k V A^T W (b - A x^k) \quad (3)

where V and W are weight matrices based on ray length [1,5,9].
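The update rule of Eq. (3) can be demonstrated on a toy dense system. The sketch below (a schematic illustration, not the TIGRE implementation) applies the iteration with all projections at once, with W and V built from inverse row and column sums, as is common for SART-type weights.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((40, 20))          # toy system matrix (rays x voxels)
x_true = rng.random(20)
b = A @ x_true                    # simulated projections

# Weights of Eq. (3): W from inverse row sums, V from inverse column sums.
W = 1.0 / np.maximum(A.sum(axis=1), 1e-12)
V = 1.0 / np.maximum(A.sum(axis=0), 1e-12)

x = np.zeros(20)
lam = 1.0                         # relaxation parameter lambda_k
for k in range(50):
    # x^{k+1} = x^k + lam * V A^T W (b - A x^k)
    x = x + lam * V * (A.T @ (W * (b - A @ x)))

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```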


The conjugate gradient least squares (CGLS) algorithm employs conjugate gradient methods for solving the least squares problem via the normal equations:

A^T A x = A^T b \quad (4)

It iterates over Krylov subspaces, minimizing the residuals in descending order of the eigenvectors. The Krylov spaces are generated as follows:

\mathcal{K}_k(A, b) = \mathrm{span}\{b, Ab, A^2 b, \ldots, A^{k-1} b\} \quad (5)


This method is considerably faster and gives results similar to the algorithms from the ART family [15,17].
The Adaptive Steepest Descent Projection Onto Convex Subsets (ASD-POCS) method is based on constrained total variation (TV) minimisation. The algorithm stabilizes the image reconstruction in parts with a large beam cone angle and obtains good reconstructions of very noisy or heavily undersampled data. The total variation norm is defined as the sum of the 2-norms of the directional gradients of the variable:

\|x\|_{TV} = \sum_n \sqrt{\sum_\alpha (\delta_\alpha x_n)^2} \quad (6)

The ASD-POCS algorithm uses the SART algorithm as a constraint on the data when solving the minimisation problem. Using a different algorithm, such as the faster OS-SART, speeds up the method; this variant was named OS-ASD-POCS [18,19]. A variant using adaptive-weighted total variation minimization was named OS-AwASD-POCS. Its runtime is comparable to ASD-POCS despite the use of OS-SART, but it better detects edges between two media in the reconstructed image [10].
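For reference, Eq. (6) can be written directly for a 2D image. The following is a minimal numpy sketch using forward differences as the directional operators δα (one common choice; an assumption of this illustration).

```python
import numpy as np

def tv_norm(img: np.ndarray) -> float:
    """Total variation of a 2D image, Eq. (6): sum over pixels of the
    2-norm of the forward-difference gradient (one choice of delta_alpha)."""
    dx = np.diff(img, axis=1, append=img[:, -1:])   # horizontal differences
    dy = np.diff(img, axis=0, append=img[-1:, :])   # vertical differences
    return float(np.sum(np.sqrt(dx ** 2 + dy ** 2)))

flat = np.ones((8, 8))
noisy = flat + 0.1 * np.random.default_rng(1).standard_normal((8, 8))
print(tv_norm(flat), "<", tv_norm(noisy))           # noise raises the TV norm
```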

2.3 Image Quality Assessment

Several techniques have been used to measure errors between images in order to determine the errors created during reconstruction. These are the Root Mean Square Error (RMSE), the Structural Similarity Index (SSIM), the Multi-scale Structural Similarity Index (MS-SSIM), and Visual Information Fidelity (VIF).
RMSE is the square root of the mean square error. It measures to what extent the processing has induced changes per pixel. The RMSE between two images, the original (k) and the reconstructed (k'), is given by the formula:

\mathrm{RMSE} = \sqrt{\frac{1}{M \cdot N} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \left[ k'(i, j) - k(i, j) \right]^2} \quad (7)

The lower the RMSE value, the closer the image is to the original or reference image [2].
SSIM measures the similarity between two images by comparing them in terms of luminance l, contrast c, and structure s. These three components are combined to give a measure of image similarity that can be written with the formula [22]:

SSIM(x, y) = [l(x, y)]^\alpha \cdot [c(x, y)]^\beta \cdot [s(x, y)]^\gamma, \quad (8)

SSIM(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}, \quad (9)

C_1 = (K_1 L)^2, \quad C_2 = (K_2 L)^2, \quad C_3 = C_2 / 2, \quad (10)

where C_1 and C_2 are constants defined using two scalar constants K_1 and K_2 and the dynamic range of the image L. In this paper, K_1 = 0.01 and K_2 = 0.03. The SSIM indexing algorithm uses a sliding window method to evaluate the image quality. SSIM takes values from −1 to 1, and SSIM(x, y) = 1 if and only if x = y. The window is moved pixel by pixel over the whole image. In this paper, the window size is 8 × 8 px.
MS-SSIM involves low-pass filtering of the images and downsampling of the filtered image by a factor of 2; iterations up to scale M − 1 are performed. The overall evaluation is obtained by combining the measurements according to the formula [20]:

MS\text{-}SSIM(x, y) = [l_M(x, y)]^{\alpha_M} \cdot \prod_{j=1}^{M} [c_j(x, y)]^{\beta_j} \cdot [s_j(x, y)]^{\gamma_j} \quad (11)

In this paper, K_1 = 0.01, K_2 = 0.03, and the window size is 11 × 11 px.


The VIF method is based on the human visual system (HVS). The calculated vector models represent the information that could ideally be extracted by the brain from a particular sub-band in the reference and the test images, respectively [16]:

VIF = \frac{\sum_{j \in \text{subbands}} I(\vec{C}^{N,j}; \vec{F}^{N,j} \mid s^{N,j})}{\sum_{j \in \text{subbands}} I(\vec{C}^{N,j}; \vec{E}^{N,j} \mid s^{N,j})} \quad (12)

VIF takes values from 0 to 1, where 1 means that there is no distortion in the test image. However, values higher than 1 can also be obtained, which may indicate contrast enhancement, i.e., the test image has a higher visual quality than the reference image.
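The first two measures are readily available off the shelf. A minimal sketch using scikit-image is shown below with the constants reported above; note that scikit-image requires an odd window size, so 7 is used in place of the paper's 8 × 8 window (an assumption of this example). MS-SSIM and VIF are not part of scikit-image and are omitted here.

```python
import numpy as np
from skimage.metrics import structural_similarity

def rmse(ref: np.ndarray, rec: np.ndarray) -> float:
    """Root mean square error between reference and reconstruction, Eq. (7)."""
    return float(np.sqrt(np.mean((rec - ref) ** 2)))

rng = np.random.default_rng(2)
ref = rng.random((64, 64)).astype(np.float32)        # stand-in reference slice
rec = ref + 0.05 * rng.standard_normal((64, 64)).astype(np.float32)

# SSIM with a sliding window and K1, K2 as in Eqs. (8)-(10).
score = structural_similarity(ref, rec, win_size=7, K1=0.01, K2=0.03,
                              data_range=1.0)
print(rmse(ref, rec), score)
```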

3 Results
3.1 Head and Shepp-Logan Phantoms

Reconstruction Time. A script was written in Python which sequentially reconstructed the phantoms using the FDK algorithms with different filters, then the algorithms from the ART family, CGLS, and finally the POCS family. Before each algorithm, the start time of the reconstruction was registered, and after the script exited the algorithm, the end time was registered. The difference between the start time and the end time was calculated. The results of the Head and Shepp-Logan (S-L) phantom reconstructions are shown in Table 1.
There is a noticeable difference in the reconstruction time of the both phan-
toms. This is due to the different number of projections used for reconstruction
512 for S-L phantom and 901 for Head phantom. The differences are significant
especially with SART, CGLS, ASD-POCS and OS-AwASD-POCS algorithms.
In all iterative algorithms, 20 iterations were performed. The algorithms using
ordered subsets (OS) used a subset size of 53 for 901 projections and 32 for 512

Table 1. Duration of reconstruction of the Head phantom, Shepp-Logan phantom and rat mandibular scan. For the iterative algorithms, the time per iteration is shown; 20 iterations were performed

Algorithm Duration
Head phantom S-L phantom Rat mandibular
FDK Ram Lak 0:00:10 0:00:07 0:00:40
FDK Hann 0:00:10 0:00:06 0:00:38
FDK Cosine 0:00:11 0:00:06 0:00:44
FDK Shepp-Logan 0:00:10 0:00:06 0:00:39
FDK Hamming 0:00:11 0:00:06 0:00:41
SART 0:14:38 0:08:39 1:16:58
SIRT 0:00:06 0:00:04 0:01:08
OS-SART 0:00:23 0:00:22 0:01:30
CGLS 0:00:10 0:00:06 0:03:04
ASD-POCS 0:14:15 0:08:05 1:15:26
OS-ASD-POCS 0:00:28 0:00:22 0:04:55
OS-AwASD-POCS 0:15:39 0:08:36 1:17:35

The OS-SART algorithm and the variants using OS within ASD-POCS are less sensitive to the number of projections. The fastest reconstruction by far was performed by the FDK algorithm, with a duration of around 10 s, whereas for the fastest iterative algorithm, SIRT, the total duration was over 1 min.
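The timing loop itself can be sketched as follows; this is a minimal sketch in which the `algorithms` mapping and the callable signature wrapping the TIGRE calls are our assumptions for illustration.

```python
import time

def time_reconstructions(algorithms, projections, geometry, angles):
    """Measure the wall-clock duration of each reconstruction algorithm.

    `algorithms` maps a name to a callable wrapping a TIGRE reconstruction;
    the callable signature used here is an assumption for illustration.
    """
    durations = {}
    for name, reconstruct in algorithms.items():
        start = time.perf_counter()             # register the start time
        reconstruct(projections, geometry, angles)
        durations[name] = time.perf_counter() - start  # end minus start
    return durations
```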

Assessment of Image Similarity. From the reconstructed data, one cross-sectional image was selected for each of the aforementioned algorithms and compared with the reference image for each phantom using the RMSE, SSIM, MS-SSIM and VIF methods. Figure 1 shows the same cross-sections from each algorithm and the reference image for the Head phantom. Figure 2 shows the same data for the S-L phantom.
The images of the Head phantom shown in Fig. 1 demonstrate high similarity to each other. Significant distortion is seen in the images after SIRT and OS-ASD-POCS reconstruction. CGLS reconstruction visibly increases the contrast in relation to the other algorithms and the reference image.
As with the Head phantom, the images from the S-L phantom shown in Fig. 2 do not show high distortion apart from those of the SIRT, OS-ASD-POCS and CGLS algorithms. The FDK algorithm shows fluctuations in the area inside the phantom. CGLS shows increased contrast and has also introduced noise throughout the phantom area.
A script was executed in the Python environment to calculate the differences between the reference image and the reconstructed images. The parameters used for the similarity assessment methods are presented in the methodology. Table 2 shows the results of comparing the Head phantom images using the discussed methods.

Fig. 1. Comparison of Head phantom reference image (a) to equivalent cross-sections


from various reconstruction algorithms: FDK Ram-Lak (b), FDK Hann (c), FDK
Cosine (d), FDK Shepp-Logan (e), FDK Hamming (f), SART (g), SIRT (h), OS-SART
(i), CGLS (j), ASD-POCS (k), OS-ASD-POCS (l), OS-AwASD-POCS (m)

Fig. 2. Comparison of S-L phantom reference image (a) to equivalent cross-sections


from various reconstruction algorithms: FDK Ram-Lak (b), FDK Hann (c), FDK
Cosine (d), FDK Shepp-Logan (e), FDK Hamming (f), SART (g), SIRT (h), OS-SART
(i), CGLS (j), ASD-POCS (k), OS-ASD-POCS (l), OS-AwASD-POCS (m)

Out of all the reconstruction algorithms, in the case of the Head phantom the images obtained with the SART and ASD-POCS methods are the most similar to the reference image. Very similar results were also obtained with the OS-AwASD-POCS method. The analytical algorithm obtained slightly worse results than the aforementioned iterative algorithms. Among the filters used with FDK, the highest similarity was obtained with the Ram-Lak and Shepp-Logan filters, while the worst result in each comparison method was obtained with Hann filtering. Among all the presented algorithms, SIRT and CGLS distorted the image the most.
After the similarity analysis of the images reconstructed for the Head phantom, calculations were performed in the same manner for the S-L phantom; Table 2 shows the results. It can be seen that, as with the Head phantom, the data from the ASD-POCS

Table 2. Results of comparing the reconstructed Head phantom and S-L phantom images with the reference using the RMSE, SSIM, MS-SSIM and VIF methods

Algorithm RMSE SSIM MS-SSIM VIF


Head S-L Head S-L Head S-L Head S-L
FDK Ram Lak 0.0164 0.0277 0.9616 0.8912 0.9901 0.9853 0.9328 0.8481
FDK Hann 0.0171 0.0400 0.9601 0.9611 0.9898 0.9928 0.8682 0.7159
FDK Cosine 0.0167 0.0355 0.9612 0.9539 0.9901 0.9919 0.8990 0.7664
FDK Shepp-Logan 0.0165 0.0299 0.9616 0.9201 0.9901 0.9883 0.9214 0.8197
FDK Hamming 0.0170 0.0388 0.9603 0.9599 0.9899 0.9926 0.8733 0.7260
SART 0.0158 0.0140 0.9877 0.9520 0.9920 0.9944 0.9799 0.9608
SIRT 0.0629 0.1192 0.7677 0.7296 0.8600 0.8447 0.1136 0.0613
OS-SART 0.0165 0.0352 0.9858 0.9738 0.9914 0.9959 0.8857 0.7579
CGLS 0.0442 0.0468 0.6937 0.7480 0.9494 0.9652 1.1614 0.9588
ASD-POCS 0.0157 0.0140 0.9878 0.9966 0.9921 0.9998 0.9669 0.9354
OS-ASD-POCS 0.0207 0.0453 0.9716 0.9645 0.9857 0.9923 0.7063 0.6075
OS-AwASD-POCS 0.0160 0.0146 0.9868 0.9953 0.9919 0.9996 0.9801 0.9421

and OS-AwASD-POCS iterative algorithms are the most similar to the reference images. High similarity is also achieved by SART, especially in the RMSE and VIF methods. The SSIM and MS-SSIM methods also showed high similarity for the OS-SART algorithm. Images reconstructed with the SIRT and CGLS algorithms had the highest distortion. The high score of the CGLS algorithm in the VIF method is explained by the increased contrast. The analytical algorithm again scored slightly worse than the best iterative algorithms. In the SSIM and MS-SSIM methods, the highest similarity was obtained with the FDK algorithm with Hann and Hamming filtering, whereas in the RMSE and VIF methods the FDK Ram-Lak and Shepp-Logan filters gave better results.

3.2 Rat Mandibular Angle Scan


The previously conducted tests of the algorithms allowed us to conclude that the script and the TIGRE toolbox can effectively reconstruct images using iterative and analytical algorithms. Next, attempts were made to reconstruct a real object: a rat mandibular angle scanned with a microtomograph. After defining the geometry of the device and applying object motion correction and temperature correction, reconstructions were carried out using the algorithms presented above.

Reconstruction Time. The scan was performed at a high resolution (2000 × 1332 px) and therefore the reconstruction process required significantly more time. To limit the duration, the reconstruction included only 100 central cross-sections and the detector was resized to 400 × 1332 px. The measured times are presented in Table 1.

The longest reconstruction was with the SART algorithm and took over 1 day. The next longest calculations were those of the OS-AwASD-POCS and ASD-POCS algorithms (over 20 h). The FDK analytical algorithm reconstructed the scans in under 1 min. The other iterative algorithms took between 20 min and 1 h, which is still several dozen times longer than FDK.

Assessment of Image Similarity. After the reconstruction, the same cross-section was selected from each data set. These were compared with each other and are shown in Fig. 3.

Fig. 3. Comparison of equivalent cross-sections of the rat mandibular scan from various reconstruction algorithms: FDK Ram-Lak (a), FDK Hann (b), FDK Cosine (c), FDK Shepp-Logan (d), FDK Hamming (e), SART (f), SIRT (g), OS-SART (h), CGLS (i), ASD-POCS (j), OS-ASD-POCS (k), OS-AwASD-POCS (l)

In the images reconstructed with FDK (Fig. 3(a)–(e)) it is difficult to notice any differences. They have a homogeneously distributed brightness. The beam hardening artefacts at the borders of the composite implant (titanium + bio-glass) are clearly visible. The images from the SIRT and OS-ASD-POCS algorithms are the most distorted; in these images it is difficult to distinguish the bone structure. In the other iterative algorithms, a difference in brightness between the right and left sides of the image is visible. In SART and ASD-POCS the beam hardening artefacts are the least pronounced.
Since this is a scan of a real object, the reconstructed images cannot be compared to a reference image, so data from the FDK Ram-Lak algorithm was used to calculate image similarity: the images from the FDK analytical algorithm look the most homogeneous, and Ram-Lak filtering obtained the best results in the phantom reconstructions. The results of the RMSE, SSIM, MS-SSIM and VIF calculations are shown in Table 3. Among the analytical algorithms, Shepp-Logan filtering showed the smallest differences, while the SIRT algorithm distorted the data the most. The images from CGLS and OS-SART are close to FDK Ram-Lak in terms of pixel values.

Table 3. Results of comparing the rat mandibular scan reconstructions with the FDK Ram-Lak reference using the RMSE, SSIM, MS-SSIM and VIF methods

Algorithm RMSE SSIM MS-SSIM VIF


FDK Ram Lak 0.0000 1.0000 1.0000 1.0000
FDK Hann 0.0048 0.9772 0.9951 0.7520
FDK Cosine 0.0030 0.9912 0.9983 0.8471
FDK Shepp-Logan 0.0010 0.9990 0.9998 0.9452
FDK Hamming 0.0044 0.9808 0.9958 0.7692
SART 0.0249 0.8389 0.9443 0.6976
SIRT 0.0274 0.7590 0.8687 0.1088
OS-SART 0.0137 0.8767 0.9544 0.3290
CGLS 0.0130 0.8754 0.9532 0.3482
ASD-POCS 0.0258 0.8010 0.9310 0.4476
OS-ASD-POCS 0.0171 0.7787 0.9047 0.2279
OS-AwASD-POCS 0.0253 0.8253 0.9416 0.5481

However, the blurring of the image results in a low score in the VIF method, which analyses the image structure. Among the iterative algorithms, the images from the SART and OS-AwASD-POCS algorithms are the most similar considering all image assessment methods.

4 Discussion and Conclusion

Comparing the reconstruction times of the phantoms and the rat mandibular scan, the trends and differences between the algorithms were preserved. The reconstruction of the mandible scan took a very long time using iterative methods. This is due to the high image resolution of 2000 × 1332 px and the number of projections (901). It should be taken into account that only a fragment of the data set was reconstructed (100 cross-sections); when reconstructing the full set, the time would probably increase several times [4].
Analysis of the results from the image evaluation methods for the phantoms showed comparable results; slight differences may be due to the smaller number of projections in the S-L phantom. The iterative algorithms, particularly SART, ASD-POCS and OS-AwASD-POCS, gave slightly better results than the analytical algorithm. In both cases, the SIRT and CGLS algorithms distorted the image the most. The worse results may be due to the need to perform more iterations in these algorithms or, in the case of SIRT, to increase the size of the projection subsets. However, this would increase the reconstruction time [23].
The results from the reconstruction of the mandibular scan follow the trend observed in the reconstruction of the phantoms. The SART algorithm showed the best results among all iterative algorithms. The OS-SART and CGLS algorithms showed high similarity to FDK Ram-Lak in the RMSE, SSIM and MS-SSIM methods. The VIF method confirmed what was visible in the images: blurring significantly reduces the ability to recognise structures. In summary, the algorithms with the longest reconstruction times gave the best results, i.e. SART and OS-AwASD-POCS [3]. However, reconstruction by the analytical method is definitely faster, the images are homogeneous and the structure is clearly visible. Both iterative and analytical algorithms induce artefacts in the implant region; the fewest artefacts were visible with the SART and OS-AwASD-POCS algorithms. The filtering of the analytical algorithm does not introduce significant changes in the image and correctly removes high-frequency noise. Identifying which of the analysed filters least distorts object details requires further research.
Microtomography requires very high resolution scans and multiple projections. Reconstructing this amount of data with iterative algorithms requires very high computing power, and the reconstruction can take many hours or even days. The results presented in this work were obtained for the algorithms in TIGRE and are implementation specific; a more task-specific implementation of the iterative reconstructions may be much faster.
Further work on developing a method to reduce implant artefacts in reconstructed images will be based on analytical algorithms, due to their currently more practical application compared to iterative algorithms.

Acknowledgement. The research was performed as part of the projects WI/WM-IIB/2/2021, WI/WM-IIB/4/2021 and WZ/WM-IIM/3/2020, and was financed with funds for science from the Polish Ministry of Science and Higher Education.

References
1. Andersen, A.H., Kak, A.C.: Simultaneous Algebraic Reconstruction Technique
(SART): a superior implementation of the ART algorithm. Ultrason. Imaging 6(1),
81–94 (1984)
2. Asamoah, D., Ofori, E., Opoku, S., Danso, J.: Measuring the performance of image
contrast enhancement technique. Int. J. Comput. Appl. 181(22), 6–13 (2018)
3. Beister, M., Kolditz, D., Kalender, W.A.: Iterative reconstruction methods in X-ray
CT. Physica Med. 28(2), 94–108 (2012)
4. Biguri, A., Dosanjh, M., Hancock, S., Soleimani, M.: TIGRE: a MATLAB-GPU
toolbox for CBCT image reconstruction. Biomed. Phys. Eng. Express 2(5), 055010
(2016)
5. Censor, Y., Elfving, T.: Block-iterative algorithms with diagonally scaled oblique
projections for the linear feasibility problem. SIAM J. Matrix Anal. Appl. 24(1),
40–58 (2002)
6. Feldkamp, L.A., Davis, L.C., Kress, J.W.: Practical cone-beam algorithm. JOSA A 1(6), 612–619 (1984)
7. Lee, S.W., et al.: Effects of reconstruction parameters on image noise and spatial
resolution in cone-beam computed tomography. J. Korean Phys. Soc. 59(4), 2825–
2832 (2011)

8. Li, L., Chen, Z., Xing, Y., Zhang, L., Kang, K., Wang, G.: A general exact method
for synthesizing parallel-beam projections from cone-beam projections by filtered
backprojection. In: 2006 IEEE Nuclear Science Symposium Conference Record,
vol. 6, pp. 3476–3479. IEEE (2006)
9. Liu, J., Wright, S.: An accelerated randomized Kaczmarz algorithm. Math. Com-
put. 85(297), 153–178 (2016)
10. Liu, Y., Ma, J., Fan, Y., Liang, Z.: Adaptive-weighted total variation minimization
for sparse data toward low-dose X-ray computed tomography image reconstruction.
Phys. Med. Biol. 57(23), 7923 (2012)
11. Machin, K., Webb, S.: Cone-beam X-ray microtomography of small specimens.
Phys. Med. Biol. 39(10), 1639 (1994)
12. Pan, X., Sidky, E.Y., Vannier, M.: Why do commercial CT scanners still employ
traditional, filtered back-projection for image reconstruction? Inverse Probl.
25(12), 123009 (2009)
13. Pontana, F., et al.: Chest computed tomography using iterative reconstruction vs
filtered back projection (part 2): image quality of low-dose ct examinations in 80
patients. Eur. Radiol. 21(3), 636–643 (2011)
14. Pontana, F., et al.: Chest computed tomography using iterative reconstruction vs
filtered back projection (part 1): evaluation of image noise reduction in 32 patients.
Eur. Radiol. 21(3), 627–635 (2011)
15. Qiu, W., Titley-Péloquin, D., Soleimani, M.: Blockwise conjugate gradient methods
for image reconstruction in volumetric CT. Comput. Methods Programs Biomed.
108(2), 669–678 (2012)
16. Sheikh, H.R., Bovik, A.C.: Image information and visual quality. IEEE Trans.
Image Process. 15(2), 430–444 (2006)
17. Shewchuk, J.R., et al.: An introduction to the conjugate gradient method without
the agonizing pain (1994)
18. Sidky, E.Y., Pan, X.: Image reconstruction in circular cone-beam computed tomog-
raphy by constrained, total-variation minimization. Phys. Med. Biol. 53(17), 4777
(2008)
19. Song, X., et al.: Non-invasive location and tracking of tumors and other tissues for
radiation therapy (2010). US Patent App. 12/679,730
20. Wang, Z., Simoncelli, E.P., Bovik, A.C.: Multiscale structural similarity for image
quality assessment. In: The Thirty-Seventh Asilomar Conference on Signals, Sys-
tems & Computers 2003, vol. 2, pp. 1398–1402. IEEE (2003)
21. Yu, H., Ye, Y., Wang, G.: Katsevich-type algorithms for variable radius spiral
cone-beam CT. In: Developments in X-Ray Tomography IV, vol. 5535, pp. 550–
557. International Society for Optics and Photonics (2004)
22. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment:
from error visibility to structural similarity. IEEE Trans. Image Process. 13(4),
600–612 (2004)
23. Zeng, G.L.: Comparison of FBP and iterative algorithms with non-uniform angu-
lar sampling. In: 2014 IEEE Nuclear Science Symposium and Medical Imaging
Conference (NSS/MIC), pp. 1–13. IEEE (2014)
Comparison of Interpolation Methods
for MRI Images Acquired with Different
Matrix Sizes

Adam Cieślak1 , Adam Piórkowski1(B) , and Rafal Obuchowicz2


1
Department of Biocybernetics and Biomedical Engineering, AGH University
of Science and Technology, Mickiewicza 30 Av., 30-059 Krakow, Poland
acieslak@student.agh.edu.pl, pioro@agh.edu.pl
2
Department of Diagnostic Imaging, Jagiellonian University Medical College,
ul. Kopernika 19, 31-501 Krakow, Poland
rafalobuchowicz@su.krakow.pl

Abstract. Magnetic resonance is one of the most comprehensive and safe


radiological techniques. However, a serious limitation is signal strength, because it is inversely proportional to image resolution. One of the most important parameters determining the signal-to-resolution trade-off is the matrix size; therefore, a post-processing technique which allows the best possible resolution to be obtained is desired. This paper concerns a study
whose main goal was to evaluate seventeen popular interpolation methods
and select the one that best estimates human tissues when enlarging MRI
images. The experiment was conducted using data from twenty left shoul-
der MRI scans from different patients. In order to compare interpolation
methods, lower-resolution images were upsampled to higher-resolution
images, after which the quality of each method was checked using the
structural similarity index measure and mean square error.

Keywords: Interpolation · Matrix · MRI image · SSIM · MSE

1 Introduction
Magnetic resonance is a non-invasive, modern diagnostic technique for which no adverse biological effects have yet been identified. MR images offer a plethora of diagnostic information with little risk to the patient. MR imaging relies on complex physical phenomena, for which numerous computational processing steps are required. A key role is played by the proper choice of parameters, especially the matrix size (MS), which has to be adjusted according to the size of the anatomical region and the desired image resolution. Owing to the physical limitations of the signal strength received from spin-spin and spin-lattice dynamics, increasing the resolution (with a large MS) results in a significant signal strength reduction. Conversely, the high signal strength obtained with a small matrix size has the disadvantage of low image resolution [5]. Therefore, the matrix size has to be set carefully, with awareness of the underlying physical processes. One technique that is used to overcome these problems is post-processing of the acquired image, because rescaling the image to a larger size can lower the acquisition time. To reproduce an image with a larger matrix size as faithfully as possible, an appropriate interpolation method should be selected; interpolation sets the value at a point that was not recorded during acquisition based on the values of the surrounding voxels. Interpolation can also be used for three-dimensional images, where the cross-section in the XY plane usually has a higher resolution than the cross-section along the Z axis, thus helping to overcome geometric distortion; this would allow the reconstruction of images in three axes. An appropriate interpolation technique can aid in diagnosis, as it makes better use of the available signal quality [9].

2 Material and Methods

2.1 Interpolation Methods

Seventeen popular interpolation methods were used to enlarge the images. All of them came from the Matplotlib library written in Python [4]:

1. None
2. Nearest Neighbor
3. Bilinear
4. Bicubic [3]
5. Spline 16 [7]
6. Spline 36 [7]
7. Hanning [10]
8. Hamming [10]
9. Hermite [12]
10. Kaiser [6]
11. Quadric [3]
12. Catrom [14]
13. Gaussian [13]
14. Bessel [8]
15. Mitchell [1]
16. Sinc [2]
17. Lanczos [1]
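As an illustration of how these methods are selected in Matplotlib, the sketch below displays the same array with several of the `interpolation` options of `imshow`; the random array is a stand-in for an MRI slice, and the exact rescaling pipeline used in the study is not shown here.

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder low-resolution image; in the study these were MRI slices
img = np.random.rand(64, 64)

methods = ['nearest', 'bilinear', 'bicubic', 'quadric', 'gaussian', 'lanczos']
fig, axes = plt.subplots(1, len(methods), figsize=(3 * len(methods), 3))
for ax, method in zip(axes, methods):
    # imshow resamples the array to screen resolution with the chosen method
    ax.imshow(img, cmap='gray', interpolation=method)
    ax.set_title(method)
    ax.axis('off')
plt.show()
```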

2.2 Used Data


The study protocol was designed according to the guidelines of the Declaration
of Helsinki and the Good Clinical Practice Declaration Statement. Local Ethics
Committee acceptance was obtained for the study: 155/KBL/OIL/2017, dated
22.09.2017. All images were carefully anonymized. The data used in this study
came from 20 patients. T2-weighted coronal images of the shoulders were acquired during a normal diagnostic procedure; images of low quality or with artifacts were excluded. Each study includes a series of images of the same patient in the same shot, with sizes of 256 × 256, 320 × 320, 384 × 384, 448 × 448 and 512 × 512 pixels. The 512 × 512 series was rejected because the images contained too much noise, which could have a negative impact on the results of the quality evaluation methods. For two studies, a series of twenty cross-sections was acquired; for the rest of the patients, the number of cross-sections was fifteen. The field of view of the scanner for each of the images was 200 mm × 200 mm.

2.3 Image Scaling


Images Expanded to a Size of 448 × 448. Image sets with sizes of 256 × 256, 320 × 320 and 384 × 384 were scaled to 448 × 448. Sample cross-sections are shown in Fig. 1.

Fig. 1. Examples of cross-sections of various sizes enlarged to the size 448 × 448 by
using the bilinear method, along with their equivalent of a given size

Images Expanded to the Size of the Nearest Larger Matrix. The images in this set were scaled to the nearest larger matrix size. For example, an image with a size of 320 × 320 was enlarged to 384 × 384 (Fig. 2).

2.4 Evaluation of Interpolation Methods


In order to assess how well a method can estimate tissues while magnifying MRI images, each method was compared with an image whose size was the target size of the rescaling. For the quality evaluation, the series with the original M × M images and the series scaled to M × M size were compared using the SSIM, Eq. (1), and MSE, Eq. (2), measures. These measures were only computed for the five middle slices of each image series, in order to include the fragment of the series that contains the least background noise.

Fig. 2. Examples of cross-sections of various sizes enlarged to the size of the nearest
larger matrix size by using the bilinear method, along with their equivalent of a given
size

\[ SSIM(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \tag{1} \]

where μx, μy are the mean values of the compared images; σx, σy, σx², σy² and σxy are the standard deviations, variances and covariance of the respective images; and C1, C2 are small constants that avoid division by zero.
The second measure was the Mean Square Error (MSE) between the interpolated volume and the observed volume:

\[ MSE = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - y'_i \right)^2 \tag{2} \]

where y is the image of original size M × M, y' is the image scaled to size M × M, and n denotes the number of pixels in the images [11].
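A minimal sketch of this evaluation, assuming each series is a list of equally sized 2-D arrays and using scikit-image's metric functions (the library choice and the helper name are our assumptions), could look as follows:

```python
import numpy as np
from skimage.metrics import mean_squared_error, structural_similarity

def mean_scores(original_series, rescaled_series):
    """Average SSIM and MSE over the five middle slices of a series."""
    n = len(original_series)
    middle = range(n // 2 - 2, n // 2 + 3)  # five central cross-sections
    ssim_vals, mse_vals = [], []
    for i in middle:
        ref, test = original_series[i], rescaled_series[i]
        ssim_vals.append(structural_similarity(
            ref, test, data_range=ref.max() - ref.min()))
        mse_vals.append(mean_squared_error(ref, test))
    return np.mean(ssim_vals), np.mean(mse_vals)
```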

3 Results

Figure 3 shows an example cross-section of a randomly selected patient. One cross-section uses the default method (Fig. 3(a)); the other uses the Gaussian method (Fig. 3(b)). The differences between these images are visible at first glance. They are best seen in heavily noisy areas, where the Gaussian method makes the elements smoother. The image for the default interpolation method shows strong jumps in the signal, which may be because the original image contained much less information.
The quality of the methods can also be assessed from the difference images between the images of native size and the images enlarged to that size, which show the differences between the two methods (Fig. 3(c) and 3(d)). More artifacts are visible in the image upsampled with the default method, while the differences are slightly smaller in the Gaussian-interpolated image. However, this may be a subjective impression of the observer.

3.1 Excluding Studies

Group A. In this group, none of the studies were rejected. This resulted in a set of twenty studies, each with four sets of images. For these, the methods were compared when enlarging images to the size of 448 × 448 and to the size of the nearest larger matrix. The results for this group are presented in Fig. 4.
It is evident that the bicubic, quadric and gaussian methods turned out to be the best in this group; the others did not show results as good as these three.

Group B. This group consisted of sixteen studies. Studies 8, 10, 12 and 15 were rejected due to the largest MSE peaks when enlarging to the size of the nearest larger matrix. These results are shown in Fig. 5.

Fig. 3. Comparison of interpolation methods on enlarged images and differential images

For all sixteen studies, the methods were compared when enlarging images to the size of 448 × 448 and to the size of the nearest larger matrix. The results for this group are presented in Figs. 6(a)–6(d).
In this group, an increase in the range of SSIM and a decrease in the range of MSE were noticed (Fig. 6). The ranking of the best methods also changed. The quadric method shows the best results when enlarging to the size of the nearest larger matrix, whereas the bicubic method is best for enlarging images to the size of 448 × 448. The Gaussian and Mitchell methods also showed good results.

Fig. 4. Graphs of the mean SSIM (a), (c) and MSE (b), (d) for five middle slices acquired by enlarging images to a matrix of 448 × 448 size (a), (b) or to the nearest larger matrix size (c), (d) – Group A

Fig. 5. Series MSE plot for image scaling to the size of the nearest larger matrix, with the default interpolation method, for twenty patients

Group C. This group consisted of eleven studies. Those with values greater than that of patient number 20 were rejected, due to the highest mean MSE values across all study batches when enlarging to the size of the nearest larger matrix. These results are shown in Fig. 7. For all eleven studies, the methods were compared when enlarging images to the size of 448 × 448 and to the size of the nearest larger matrix.
The results for this group are presented in Fig. 8.
Again, there is an increase in the range of SSIM and a decrease in the range of MSE. The ranking of the best methods also changed: the methods that showed the best quality when enlarging the MRI images were quadric, bicubic, bilinear and Mitchell.

4 Interpolation Used in Construction of a Cephalometric Image Based on Data from Magnetic Resonance Imaging

The effectiveness of a better and a worse interpolation method is shown in Fig. 9. Using the default method (nearest) makes the reconstructed image more jagged, and one could say that it has a lower resolution than the image reconstructed with the bicubic method. The bicubic method creates a much smoother image, although some artifacts are still visible.

Fig. 6. Graphs of the mean SSIM (a), (c) and MSE (b), (d) for five middle slices acquired by enlarging images to a matrix of 448 × 448 size (a), (b) or to the nearest larger matrix size (c), (d) – Group B

Fig. 7. Series mean MSE plot for image scaling to the size of the nearest larger matrix, with the default interpolation method, for twenty patients

5 Discussion
In conclusion, it was possible to compare popular interpolation methods and indicate those that best estimate human tissues when enlarging MRI images. For the sets where images were enlarged to the size of the nearest larger matrix, the quadric method most often turned out to be the best; however, for the sets where the images were enlarged to 448 × 448, the bicubic method stood out. These results are similar to those presented by M. B. Hisham [3], who states that bicubic interpolation is the best of the methods he compares. It is also worth paying attention to the differences between the individual groups: the changes in the ranges of SSIM and MSE suggest that studies in which patients moved more between series could be excluded. The above experiments were conducted to find the best interpolation method for tissue estimation when enlarging MRI images; the best method will be used to reconstruct a cephalometric image from MRI images.

Fig. 8. Graphs of the mean SSIM (a), (c) and MSE (b), (d) for five middle slices acquired by enlarging images to a matrix of 448 × 448 size (a), (b) or to the nearest larger matrix size (c), (d) – Group C

Fig. 9. Cephalometric image reconstructed using the Nearest (a), (b) and Bicubic (c), (d) methods

Acknowledgement. The data acquisition was carried out based on the consent of the
Jagiellonian University’s Bioethics Committee (No 155/KBL/OIL/2017, 22.09.2017).
This work was financed by the AGH University of Science and Technology thanks
to the Rector’s Grant 18/GRANT/2022.
This work was co-financed by the AGH University of Science and Technology,
Faculty of EAIIB, KBIB no 16.16.120.773.
Work carried out within the grant Studenckie Kola tworza innowacje - II edition,
project no. SKN/SP/535131/2022 entitled “Cephalometric image reconstruction based
on magnetic resonance imaging”.

References
1. Conejero, J.: Interpolation algorithms in pixinsight (2011)
2. Getreuer, P.: Linear methods for image interpolation. Image Process. On Line 1, 238–259 (2011)
3. Hisham, M., Yaakob, S.N., Raof, R., Nazren, A., Wafi, N.: An analysis of perfor-
mance for commonly used interpolation method. Adv. Sci. Lett. 23(6), 5147–5150
(2017)
4. Hunter, J.D.: Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9(03),
90–95 (2007)
5. Kokeny, P., Cheng, Y.C.N., Xie, H.: A study of MRI gradient echo signals from dis-
crete magnetic particles with considerations of several parameters in simulations.
Magnet. Reson. Imaging 48, 129–137 (2018)
6. Kuo, F.F., Kaiser, J.F.: System Analysis by Digital Computer. Wiley (1966)
7. Limongelli, M., Carvelli, V.: Damage localization in a glass fiber reinforced compos-
ite plate via the surface interpolation method. In: Journal of Physics: Conference
Series, vol. 628, p. 012095. IOP Publishing (2015)
8. Mohan, P.G., Prakash, C., Gangashetty, S.V.: Bessel transform for image resizing.
In: 2011 18th International Conference on Systems, Signals and Image Processing
(2011)
9. Plenge, E., et al.: Super-resolution methods in MRI: can they improve the trade-
off between resolution, signal-to-noise ratio, and acquisition time? Magnet. Reson.
Med. 68(6), 1983–1993 (2012)
10. Podder, P., Khan, T.Z., Khan, M.H., Rahman, M.M.: Comparative performance
analysis of hamming, hanning and blackman window. Int. J. Comput. Appl. 96(18)
(2014)
11. Sara, U., Akter, M., Uddin, M.S.: Image quality assessment through FSIM, SSIM,
MSE and PSNR- a comparative study. J. Comput. Commun. 7(3), 8–18 (2019)
12. Seta, R., Okubo, K., Tagawa, N.: Digital image interpolation method using higher-
order hermite interpolating polynomials with compact finite-difference. In: Pro-
ceedings: APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Asso-
ciation, 2009 Annual Summit and Conference, pp. 406–409. Asia-Pacific Signal and
Information Processing Association (2009)
13. Smith, J.O.: Spectral Audio Signal Processing. W3K (2011)
14. Twigg, C.: Catmull-rom splines. Computer 41(6), 4–6 (2003)
Preprocessing of Laryngeal Images
from High-Speed Videoendoscopy

Justyna Kaluża1(B) , Pawel Strumillo1 , Ewa Niebudek-Bogusz2 ,


and Wioletta Pietruszewska2
1
Institute of Electronics, Faculty of Electrical, Electronic, Computer and Control
Engineering, Lodz University of Technology, 211/215 Wolczanska Street,
90-924 Lodz, Poland
justyna.sujecka@dokt.p.lodz.pl, pawel.strumillo@p.lodz.pl
2
Department of Otolaryngology, Head and Neck Oncology,
Medical University of Lodz, Lodz, Poland
{ewa.niebudek-bogusz,wioletta.pietruszewska}@umed.lodz.pl

Abstract. In this paper, we present a method for preprocessing images


from laryngeal high-speed videoendoscopy (LHSV). We developed image
processing procedures to better prepare the images for automated com-
puter analysis. Namely, we removed shifts between consecutive LHSV
images and glares distorting the images, detected the region of interest,
rotated the images, and finally detected vocal fold regions. We tested the
developed algorithms on 13 LHSV recordings made for healthy patients
and present example results of the conducted study.

Keywords: Laryngeal High-Speed Videoendoscopy · Image


processing · Vocal folds · RGB images

1 Introduction
At the turn of the 19th and 20th century, an effective solution was sought to
correlate the size and type of pathological changes with their impact on vocal
fold vibrations. Commonly used videolaryngostroboscopy (VLS) imaging has
some limitations due to the sampling technique [13]. An adequate effect can
only be obtained if the observed movement is sufficiently periodic and stable, so
a visualization problem may arise when vocal fold movements are nonperiodic
(e.g. voice breaks or vocal tremor) [17]. In many cases the acoustic signal is
distorted to such an extent that the fundamental frequency cannot be calculated,
and thus the sampling rate of the VLS images cannot be adjusted, resulting in
the inability to visualize properly the vibrations of the vocal folds [21]. The
Laryngeal High-Speed Videoendoscopy (LHSV) overcomes these limitations by
recording the images at a frame rate that is much higher and independent of the
fundamental frequency of the vocal folds [20]. At the end of the 20th century,
the development of electronic technology allowed the construction of high-speed
video cameras [1]. Due to their size, the original models were used for scientific and research purposes only. Initially, the cameras could record 2000 frames per second (fps); with time, the frame rate increased even to 6000 fps, but this came at the cost of compromised image quality or increased device weight [17]. The breakthrough came in recent years. Currently available high-speed cameras are of small size and weight and allow images to be recorded at a rate of 3200 colour images per second at a resolution of 480 × 400 pixels. Due to the short exposure intervals, a laser illuminator with dynamically adjustable light intensity is used. The camera can be directly coupled to a computer and the recording is carried out on-line. The use of the LHSV technique has reduced the patient examination time from about 20 s for VLS imaging to fractions of a second.
LHSV imaging is still quite new compared to previous imaging techniques such as VLS or kymography, and new algorithms need to be developed to fully exploit the capabilities of this laryngeal diagnostic tool. There are already many articles on glottis segmentation, optical flow and other types of analysis [9,12,15,19,22–26], but few studies even mention how the ROI is determined from these images [8,10,18], although this can significantly improve the computational efficiency of a given program. Appropriate preparation of the images also affects the comfort of working with a given analysis.
In this paper, we report a study on pre-processing images collected with LHSV so that they are best prepared for further processing, i.e. free of artifacts due to the applied image acquisition technique.

2 Materials and Methods


2.1 Participants and LHSV Recordings
Thirteen HSV films were recorded at the Department of Otolaryngology, Head and Neck Oncology of the Medical University of Lodz. The collection consists of 13 normophonic voices without pathology.
In all participants, the examination was performed using a high-speed camera. Images were collected by the Advanced Larynx Imager System (ALIS) (Diagnova Technologies), equipped with laser diodes as a light source (ALIS Lum-MF1) and a HighSpeed camera (ALIS Cam HS-1). Subjects were asked for a sustained phonation of the vowel “i” at a comfortable pitch and loudness level.
The images were collected at two recording frame rates, 2400 fps and 3200 fps. The corresponding time intervals for laryngeal video collection were 0.83 s and 0.63 s. The resolutions of the acquired images are 512 × 480, 480 × 400 or 512 × 448 pixels for the recording rate of 2400 fps, and 448 × 400 or 368 × 448 pixels for the rate of 3200 fps.

2.2 Software Tools


A high-level programming language, Python version 3.7, and the Spyder 4 environment were used to implement the developed algorithms. Python includes an extensive set of libraries providing ready-made functions, which directly contribute to easy and intuitive programming. The first package, OpenCV (Open Source Computer Vision Library), is an open-source library that includes several hundred computer vision algorithms [2]. It is predominantly used for image processing, video analysis and object detection. The second main library used is SciPy [6], which was used to perform operations on signals, e.g. the FFT or B-spline functions, and to eliminate shifts between consecutive frames of the laryngeal movie. Libraries such as NumPy [4], for creating and operating on image matrices, and Matplotlib [3], for creating graphs and displaying the results as images, were also used.

2.3 Image Preprocessing

Although videos captured with LHSV are obtained in a fraction of a second, they require many operations to prepare the images for further processing and analysis steps [14,16]. The block diagram in Fig. 1 illustrates the flow of the implemented image preprocessing procedures.

Fig. 1. Image preprocessing pipeline. Source: Private source

The videos used were in .avi, .mp4 or .hsv format. The first pre-processing step was to convert each video into a series of frames in .bmp format: the first two formats were converted using Python, and the .hsv files using the HSVviewer program created by Diagnova Technologies. Each BMP image has three components: R (red), G (green) and B (blue). The red component was separated for further processing, since the entire image of the vocal folds contains red colour components, so the difference in brightness between the vocal folds and the glottal gap is most prominent there. Additionally, in the G and B component images, blood vessels on the vocal folds were more visible than in the R component, which could interfere with further analysis. The blue component was used to reduce the reflections coming from the light source, as reflections are most visible in this component. Figure 2 shows the differences in reflection brightness between the R, G and B image components.

Therefore, a threshold corresponding to the reflective areas was determined for each image. The glares usually had a brightness level close to the maximum. The threshold was selected manually for each subject: the first frame of the blue component was displayed, the area around the reflections was magnified, the values in these regions were read, and the threshold was selected on this basis. The thresholded area was then enlarged by dilation (with a circle-shaped structuring element). Next, the resulting binary image served as a mask for the inpaint function from the OpenCV package, applied to the red component images [5]. The function reconstructs the selected image area from the pixels near the area boundary. Since the vocal folds vibrate during phonation, the reflections move in the image; if they were not removed, the program could additionally detect a fundamental frequency there, which would disturb the result.

Fig. 2. The brightness of reflections for the different image components: (a) red component, where only the edges of a reflection are bright and the centre is dark; (b) green component, with brighter reflections than the red component but still darker spots in the centre; (c) blue component, with the brightest reflections and the most marked difference in brightness between the vocal fold and the light reflection

Next, the offsets of successive frames relative to each other had to be reduced. The offsets were usually up to a dozen or so pixels (see Fig. 3). Such small offsets can significantly affect further analysis: the image appears blurred, but this is due to the overlap of two frames, frame no. 1 and frame no. 143. The fftconvolve [7] function from the SciPy package was used to remove the offsets. It convolves two images (the first image in a given series with each subsequent image) using the Fast Fourier Transform. The result is an image in which one point is the brightest; if this point is at the centre of the image, no shift has occurred. If the location of the brightest point is different, black rows and/or columns are added to the image to properly shift it with respect to the first frame. Figure 3(b) shows an example trajectory of image shifts caused by movement of the laryngoscope relative to the larynx during video acquisition; it is a visualization that uses b-spline functions to approximate the movement trajectory of the image content, with the displacements discretised to the grid of image pixels. Note that the number of distinct image shifts appears small, but in reality there are a large number of them that take repeated coordinates.
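A sketch of such fftconvolve-based shift estimation is given below; convolving with a flipped image is equivalent to cross-correlation, and the offset of the correlation peak from the image centre gives the shift (the mean subtraction and the sign convention are our assumptions).

```python
import numpy as np
from scipy.signal import fftconvolve

def estimate_shift(reference, frame):
    """Estimate the (row, col) offset of `frame` relative to `reference`."""
    ref = reference - reference.mean()   # remove the DC component
    frm = frame - frame.mean()
    # Convolution with a flipped image is equivalent to cross-correlation
    corr = fftconvolve(ref, frm[::-1, ::-1], mode='same')
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    centre = np.array(corr.shape) // 2
    return np.array(peak) - centre       # (0, 0) means no shift occurred
```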

Fig. 3. (a) Shift between two frames, no. 1 and no. 143; (b) trajectory of displacements with respect to the image centre

The next step in pre-processing the laryngeal images was to rotate the images in the video so that the edges of the vocal folds were vertically aligned. The images were rotated because, as the phoniatricians pointed out, it is easier to diagnose and analyse an image in a vertical position. First, the image representing the largest glottal opening is selected and displayed (the images with the largest glottal opening were annotated by the phoniatricians), and the user marks a line on it that is the axis of symmetry of the glottis. The centre of this line defines the midpoint of the image rotation. After rotating the image, the black rows and/or columns (added to move the centre of the line to the centre of the image) are removed. Figure 4 illustrates the adopted image rotation procedure.
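The rotation itself can be sketched with OpenCV as follows; the function name is ours, and the sign of the angle may need flipping depending on the coordinate convention.

```python
import cv2
import numpy as np

def rotate_to_vertical(image, p1, p2):
    """Rotate `image` so the user-marked axis p1 -> p2 becomes vertical.

    p1 and p2 are (x, y) pixel coordinates; the midpoint of the marked
    line is used as the centre of rotation.
    """
    (x1, y1), (x2, y2) = p1, p2
    centre = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
    # Angle of the marked line measured from the vertical image axis
    angle = np.degrees(np.arctan2(x2 - x1, y2 - y1))
    matrix = cv2.getRotationMatrix2D(centre, angle, 1.0)
    h, w = image.shape[:2]
    return cv2.warpAffine(image, matrix, (w, h))
```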
At the current stage of software development, we decided that the axis of symmetry would be defined by the user because of the complexity and polymorphic shapes of the vocal folds. Additionally, it was important to determine the axis of symmetry of the entire organ (the vocal folds) and not just the glottis (Fig. 5).
A further processing step is to crop the image to the region of interest (ROI). The phoniatricians suggested that it is important to be able to determine the area of the entire vocal folds as well as their left and right parts. For this purpose, the following interactive procedure was proposed. The user selects 20 points surrounding the vocal fold region, and a third-order B-spline curve is fitted to the marked points. B-spline curves are a generalization of the polynomial representation of Bezier curves [11]. To maintain continuity, the first point indicated by the user is appended to the end of the list of points. The delineated area corresponds to the area of the vocal folds, and the axis of symmetry was used to obtain the area of each vocal fold.
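A sketch of the contour fitting with SciPy is shown below; `per=1` makes the spline periodic, which mirrors appending the first user point at the end of the list, and the function name is ours.

```python
import numpy as np
from scipy.interpolate import splprep, splev

def fold_outline(points, samples=400):
    """Fit a closed third-order B-spline through the user-marked points."""
    pts = np.asarray(points, dtype=float)
    # s=0 forces interpolation; per=1 closes the contour periodically
    tck, _ = splprep([pts[:, 0], pts[:, 1]], s=0, per=1, k=3)
    u = np.linspace(0.0, 1.0, samples)
    x, y = splev(u, tck)
    return np.column_stack([x, y])  # densely sampled closed outline
```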

Fig. 4. (a) User-defined axis of the glottis symmetry and moving the centre of the
defined axis to the centre of the image; (b) Rotated image

Fig. 5. Exemplary image where the glottis gap is not symmetrical and it would be
difficult to determine the axis of symmetry automatically

Figure 6(a) shows the marked area of the vocal folds, while in Fig. 6(b) the
left and right vocal fold areas are marked with different shades. Finally, the
image was cropped to the vocal fold area with appropriate margins added.

3 Results

As a result of this study, image pre-processing algorithms were proposed to prepare the images for further processing. The objective of the analysis of LHSV images is to determine the glottal area, i.e. the area between the edges of the vocal folds, as well as the areas of the left and right vocal folds. The method was tested on LHSV recordings of 13 normophonic patients; the procedure had to be tested on several different patients to verify the validity of the developed algorithm. Table 1 presents the maximum shift between frames for each subject.

Fig. 6. (a) The area of the vocal folds surrounded by b-spline curves; (b) The area of
the glottis divided into left (gray) and right (orange) vocal fold

Table 1. Maximum shift between the first and n-th frame for each subject

Subject Original centre fftconvolve centre Pixel difference


S1 (240, 256) (239, 286) 31
S2 (200, 224) (196, 235) 15
S3 (224, 256) (224, 250) 6
S4 (224, 256) (223, 252) 4
S5 (224, 256) (224, 241) 15
S6 (224, 256) (210, 244) 26
S7 (224, 256) (208, 255) 17
S8 (224, 256) (223, 242) 15
S9 (224, 256) (230,261) 11
S10 (224, 256) (218, 254) 8
S11 (224, 256) (223, 260) 5
S12 (224, 256) (228, 258) 6
S13 (224, 256) (242, 259) 21

Figure 7 shows the results of the image matching procedure. In some cases, as shown in Fig. 7(b), one can see a slight blurring of the image even after the offset reduction. This is probably due to the different positioning of the vocal folds despite the similar phonation phase. Note, however, that the area of the glottis overlaps, which is most important if we decide to operate on the pixel brightness function later.
Figure 8 shows the result of the reflection reduction. This is an important result because the distribution of grey levels in the vocal fold regions will be investigated further.

Fig. 7. The maximum shift between two frames for images in the same phonation phase (a); the frame with black rows and/or columns added to compensate for the shift – the two frames overlapped after offset correction (b)

Fig. 8. Images before (a) and after (b) reduction of reflections



4 Conclusions
This paper presents a method for pre-processing LHSV images. The LHSV image
recording technique offers a new quality of visualization of the vibrating vocal
folds. However, specially designed pre-processing algorithms are required before
quantitative analysis of the images can be performed. In this paper, the following
LHSV image pre-processing procedures were developed:
– removal of shifts between consecutive images in a series of video images caused
by relative movements of the larynx in relation to the laryngoscope, occurring
even for short image acquisition intervals of fractions of a second,
– removal of glare on images due to the intense laser light source used to illu-
minate the larynx,
– determining the ROI containing the vocal folds area,
– rotating images and determining the axis of symmetry of the glottal area,
– delineation of regions containing the left and right vocal fold area.
It should be noted that these studies were performed under the close supervi-
sion of phoniatricians. They supervised the study and defined the requirements
for computer image analysis to facilitate diagnostic interpretation of the laryn-
geal images.
The current research direction is to analyze the periodicity of vocal fold move-
ments in normophonic subjects and test the developed algorithms for patients
with voice disorders.

References
1. DiagNova. https://www.diagnova.pl/pages/zasoby/rejestracja wideo IV szybka
kamera 1.html. Accessed 15 Jan 2022
2. OpenCV. https://opencv.org/. Accessed 31 Dec 2021
3. Matplotlib - Visualization with Python. https://matplotlib.org/. Accessed 31 Dec
2022
4. NumPy. https://numpy.org/. Accessed 31 Dec 2021
5. OpenCV: Image Inpainting. https://docs.opencv.org/3.4/df/d3d/tutorial py
inpainting.html. Accessed 31 Dec 2021
6. SciPy. https://scipy.org/. Accessed 31 Dec 2021
7. scipy.signal.fftconvolve - SciPy v1.8.0 Manual. https://docs.scipy.org/doc/scipy/
reference/generated/scipy.signal.fftconvolve.html. Accessed 31 Dec 2021
8. Andrade-Miranda, G., Godino-Llorente, J.I., Moro-Velázquez, L., Gómez-Garcı́a,
J.A.: An automatic method to detect and track the glottal gap from high speed
videoendoscopic images. Biomed. Eng. Online 14(1), 100 (2015). https://doi.org/
10.1186/s12938-015-0096-3
9. Andrade-Miranda, G., Henrich Bernardoni, N., Godino llorente, J., Cruz, H.: Vocal
folds dynamics by means of optical flow techniques: a review of the methods. Adv.
Sign. Process. Rev. (2018)
Preprocessing of Laryngeal Images from High-Speed Videoendoscopy 141

10. Dı́az-Cádiz, M.E., et al.: Estimating vocal fold contact pressure from raw laryngeal
high-speed videoendoscopy using a hertz contact model. Appl. Sci. 9(11), 2384
(2019). https://doi.org/10.3390/app9112384. https://www.mdpi.com/2076-3417/
9/11/2384. Number: 11 Publisher: Multidisciplinary Digital Publishing Institute
11. Forrest, A.R.: Interactive interpolation and approximation by Bezier polynomials.
Comput. J. 15(1), 71–79 (1972)
12. Gómez, P., et al.: BAGLS, a multihospital benchmark for automatic glottis segmentation. Sci. Data 7(1), 1–12 (2020)
13. Hillman, R., Mehta, D.: The science of stroboscopic imaging. Laryngeal Evaluation:
Indirect Laryngoscopy to High-Speed Digital Imaging, pp. 101–109 (2010)
14. Ikuma, T., Kunduk, M., McWhorter, A.J.: Preprocessing techniques for high-
speed videoendoscopy analysis. J. Voice 27(4), 500–505 (2013). https://doi.
org/10.1016/j.jvoice.2013.01.014. https://www.sciencedirect.com/science/article/
pii/S0892199713000155
15. Kist, A.M., Dürr, S., Schützenberger, A., Döllinger, M.: OpenHSV: an open plat-
form for laryngeal high-speed videoendoscopy. Sci. Rep. 11(1), 13,760 (2021).
https://doi.org/10.1038/s41598-021-93149-0. Number: 1 Publisher: Nature Pub-
lishing Group
16. Koç, T., Çiloğlu, T.: Automatic segmentation of high speed video images of vocal
folds. J. Appl. Math. 2014 (2014). https://doi.org/10.1155/2014/818415
17. Mehta, D.D., Hillman, R.E.: Current role of stroboscopy in laryngeal imag-
ing. Curr. Opin. Otolaryngol. Head Neck Surg. 20(6), 429–436 (2012).
https://doi.org/10.1097/MOO.0b013e3283585f04. https://www.ncbi.nlm.nih.gov/
pmc/articles/PMC3747974/
18. Naghibolhosseini, M., Deliyski, D.D., Zacharias, S.R., de Alarcon, A., Orlikoff,
R.F.: Temporal segmentation for laryngeal high-speed videoendoscopy in
connected speech. J. Voice, Offic. J. Voice Found. 32(2), 256.e1–256.e12
(2018). https://doi.org/10.1016/j.jvoice.2017.05.014, https://www.ncbi.nlm.nih.
gov/pmc/articles/PMC5740029/
19. Pedersen, M., Jønsson, A., Mahmood, S., Agersted, A.: Which mathematical and
physiological formulas are describing voice pathology: an overview. J. Gen. Pract.
4(3) (2016). https://doi.org/10.4172/2329-9126.1000253
20. Poburka, B.J., Patel, R.R., Bless, D.M.: Voice-vibratory assessment with laryngeal
imaging (VALI) form: reliability of rating stroboscopy and high-speed videoen-
doscopy. J. Voice 31(4), 513.e1–513.e14 (2017). https://doi.org/10.1016/j.jvoice.
2016.12.003. https://www.jvoice.org/article/S0892-1997(16)30360-5/fulltext.
Publisher: Elsevier
21. Powell, M.E., et al.: Efficacy of videostroboscopy and high-speed videoendoscopy
to obtain functional outcomes from perioperative ratings in patients with vocal fold
mass lesions. J. Voice 34(5), 769–782 (2020). https://doi.org/10.1016/j.jvoice.2019.
03.012. https://www.jvoice.org/article/S0892-1997(18)30466-1/fulltext.Publisher:
Elsevier
22. Schlegel, P., et al.: Influence of analyzed sequence length on parameters in laryngeal
high-speed videoendoscopy. Appl. Sci. 8(12), 2666 (2018). https://doi.org/10.3390/
app8122666. Number: 12 Publisher: Multidisciplinary Digital Publishing Institute
23. Schlegel, P., Stingl, M., Kunduk, M., Kniesburges, S., Bohr, C., Döllinger, M.:
Dependencies and ill-designed parameters within high-speed videoendoscopy and
acoustic signal analysis. J. Voice 33(5), 811-e1 (2019)
24. Yamauchi, A., et al.: Chapter 16 analysis of HSDI/HSDP with laryngotopography:
the principles p. 4 (2015)

25. Yamauchi, A., et al.: Vibratory phase difference of normal phonation: HSDI ana-
lyzed with laryngotopography p. 10 (2016)
26. Yamauchi, A., Yokonishi, H., Imagawa, H., Sakakibara, K.I., Nito, T., Tayama, N.,
Yamasoba, T.: Quantification of vocal fold vibration in various laryngeal disorders
using high-speed digital imaging. J. Voice Offic. J. Voice Found. 30(2), 205–214
(2016). https://doi.org/10.1016/j.jvoice.2015.04.016
Construction of a Cephalometric Image
Based on Magnetic Resonance
Imaging Data

Piotr Cenda1 , Rafal Obuchowicz2 , and Adam Piórkowski1(B)


1
Department of Biocybernetics and Biomedical Engineering, AGH University
of Science and Technology, Mickiewicza 30 Av, 30-059 Kraków, Poland
cenda@student.agh.edu.pl, pioro@agh.edu.pl
2
Department of Diagnostic Imaging, Jagiellonian University Medical College,
ul. Kopernika 19, 31-501 Kraków, Poland
rafalobuchowicz@su.krakow.pl

Abstract. Cephalometric images are commonly used in dental and


orthodontic treatment and diagnostics. The harmful effects of ionizing
radiation, on which this imaging method is based, are a well-known prob-
lem that is mentioned in numerous publications and respectable jour-
nals. The objective of this project was to reconstruct a cephalometric-
like image based on data from magnetic resonance imaging, which is
a harmless method. Two image series derived from T1-weighted and
T2-weighted magnetic resonance sequences were geometrically trans-
formed to spatially match each other. Subsequently, bone and soft tis-
sue segmentation was performed using selected morphological opera-
tions and sequence correlation. The segmented masks were interpolated
and projected onto the selected plane to form an image. The generated
cephalometric-like image, despite inaccuracies due to incorrect segmen-
tation of the sinuses, represents a good approximation of cephalometry
and could serve as a screening test that would determine whether more
accurate diagnostic methods are needed.

Keywords: Reconstruction · Cephalometry · Magnetic resonance

1 Objective
Examinations using ionizing radiation, such as cephalometry and pantomograms,
are still commonly used for imaging of the cerebrocranial and craniofacial skele-
ton in the diagnosis of malocclusion and the planning of orthodontic treat-
ment, despite the well-known harmful effects of this radiation [4,6,7,9,10], which can affect the
extremely radiosensitive thyroid gland and lens, for example. Ionizing radiation
is frequently used in dental and orthodontic treatment, especially during routine
and periodic examinations, but this unnecessarily exposes the body to radiation
and increases the risk of cancer.


Scientific papers from reputable journals [1,2,8] indicate a significant correlation between examinations performed with the use of ionizing radiation in children aged up to 10 years and the future incidence of meningiomas (nearly a
fivefold increase in risk). Emerging techniques aimed at reducing this risk are
based solely on reducing the radiation dose rather than using non-ionizing
imaging modalities such as MRI. In view of this problem, a question arises: is it
possible to obtain a similar reconstruction based on MRI images?
There is an increasing amount of research into the use of unique imaging
sequences [5,11], thanks to which very clear images of bone structures can be
obtained; for instance, “Black bone” MRI is a currently researched method of
MRI that makes bone tissues very easily segmentable. These techniques show
satisfactory results but they involve the use of non-standard MRI machines,
which precludes their use in most hospitals. The problem with using this
method instead of CT is the poor contrast between bony structures and calcified
tissues, soft tissues and air in the body.
The goal of this project is to develop a methodology that would allow the
reconstruction of cephalometry based on patient data obtained from an MRI
scanner; this includes developing a transformation so that the visibility and
accuracy of tissue outlines are preserved, which is important in the orthodontic
treatment process.

2 State of the Science


Despite the extensive number of described segmentation methods and MRI imag-
ing techniques, those involving the skull are scarce. They mostly involve segmen-
tation of the brain and bone tissue (without craniofacial area), most often using
custom MRI imaging sequences that are not widely available.
The biggest difficulty is bone tissue’s short relaxation time. Attempts to cir-
cumvent this problem include the use of “black bone” sequences (“Black Bone”
MRI) and combining results from gradient echo (GRE) and short echo time (TE)
sequences [5,11]. By using a combination of “Black Bone” and other imaging
techniques, these methods provide extremely good results in which bony struc-
tures are well distinguished from soft tissue and air in the sinuses (agreement
with CT of approximately 98%). However, these studies focus only on bone seg-
mentation for 3D models, therefore other types of tissue are ignored; also, they
require specialized scanners and many different MRI sequences, which increases the
time required to perform the study.
Attempts to segment bone, brain and scalp using only T1 sequences have
also been made. The method mentioned in [3] includes only simple morpholog-
ical operations like closing, opening, and thresholding. The brain is segmented
sequentially using the Brain Surface Extractor algorithm, which avoids mis-segmenting it as other structures; subsequently, scalp and bone masks are generated sequentially. The opening and closing operations make it possible to
find the boundaries of the skull and eliminate possible interference. Additionally,
using masks from previously segmented structures ensures that subsequent seg-
mented tissues do not intersect. These results turned out to be positive, with
the displacement error with respect to CT being within 3 mm. The limitation
of this method is that it can only be applied in the part of the skull contain-
ing the brain, therefore the craniofacial region that is important for dental and
orthodontic treatment is excluded.

3 Data

The data used in this project are two series of magnetic resonance images: T1-
weighted and T2-weighted (Fig. 1). Each series consists of 105 cross sections in
the transverse plane, which contains the craniofacial region, the cerebrocranial
region, and the cervical spine. The resolution of the images is 448 × 448 pixels
with a bit depth of 16.

Fig. 1. Selected cross-section of the data used, T1 (a) and T2 (b) sequences

The T1 and T2 sequences are not spatially aligned, which was taken into account in
the constructed algorithm by fitting appropriate geometric transformations to
minimize the differences between the sequences.

4 Algorithm of Reconstruction

The whole algorithm can be divided into stages: data preparation, tissue seg-
mentation and then generation of the final reconstruction. Data preparation
includes data loading, normalization, and matching of T1-weighted and T2-
weighted image series. Processing includes filtering, thresholding, and de-noising
of the prepared data (Fig. 2); reconstruction includes interpolation of the images
between the given cross-sections and transformation into a cephalometric image.
Fig. 2. Diagram showing the process of tissue segmentation

4.1 Data Preparation

Because the two sequences are used together but are spatially offset, it was necessary to match them to each other. For this purpose, the T2 sequence was sub-
jected to geometric transformations such as translation, rotation and scaling.
The optimization method was the BFGS algorithm, whose goal was to select
the parameters of each transformation in order to reduce the differences between
the sequences as much as possible.
An optimization function was used to minimize the differences between the
sequences by applying an appropriate cost function. Its construction consisted
of the use of bilateral filtering in order to simultaneously preserve clear edges
at the boundary between the imaging background and the patient’s head; edge
detection using the Canny algorithm was also performed. For the contours thus
prepared, in each iteration of the optimization algorithm the sum of overlapping
pixels was computed and used in the minimized cost function.
The image series prepared in this way allowed for further transformation
using the correlations and differences in tissue imaging between T1-weighted
and T2-weighted MRI images. To unify the data and facilitate comparison of the
images and the tissues, each image was then normalized to an interval of 0–1,
such that the smallest and largest values in the images were 0 and 1, respectively.
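For illustration, the matching step could be sketched in Python as follows (the paper does not give implementation details; the helper names, the similarity-transform parametrisation, and the Gaussian smoothing of the Canny edge maps, added here so that the BFGS cost varies smoothly, are assumptions of this sketch):

```python
import numpy as np
from scipy import ndimage, optimize
from skimage import feature, transform

def edge_map(img):
    # Canny edge detection; the Gaussian blur is an assumption added so that
    # the edge-overlap cost changes smoothly under BFGS line searches
    edges = feature.canny(img, sigma=2.0)
    return ndimage.gaussian_filter(edges.astype(float), sigma=3.0)

def cost(params, t1_edges, t2):
    tx, ty, angle, scale = params
    tform = transform.SimilarityTransform(scale=scale, rotation=angle,
                                          translation=(tx, ty))
    warped = transform.warp(t2, tform.inverse)   # translate, rotate, scale T2
    # the sum of overlapping edge pixels is maximised, i.e. its negative
    # is the minimised cost
    return -np.sum(t1_edges * edge_map(warped))

def register_t2_to_t1(t1, t2):
    res = optimize.minimize(cost, x0=np.array([0.0, 0.0, 0.0, 1.0]),
                            args=(edge_map(t1), t2), method='BFGS')
    return res.x  # fitted (tx, ty, rotation, scale)
```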
4.2 Background Segmentation

In the first step, bilateral filtering was performed on the image for a 10×10
mask size in order to suppress noise and maintain strong and unblurred bound-
aries between the background and the imaged head. Next, thresholding was
performed. Due to the normalization that took place during data preparation,
the thresholds had a value between 0 and 1. For the background segmentation
part, the threshold for both sequences was the same at 0.05 (true for values less
than the threshold).
These are preliminary versions of the masks, but they contain a lot of noise
and imperfections. Especially the area along the edges needs de-noising. For
this purpose, a merge of the two sequences was performed. As a result, the
numerous defects in the masks were mostly fixed and further de-noising was
more effective. De-noising was performed by removing all objects (unconnected
binary selections) that were smaller than the largest one.
In the next step, dilation and closing operations were applied to remove elements such as noise from the masks and to bring the masks as close as possible to the head boundary. For closing, a mask size of 15 × 15 was used; for dilation,
a mask size of 7 × 7 was used. The results are shown in Fig. 3. Additionally, this
allowed for the inclusion of peripheral parts of the head, such as the auricles
and parts of the nose, which would be harder to detect in the soft tissue mask
process.

Fig. 3. Selected cross-section after dilation and closing operations

The last part of the background segmentation was to remove all the signif-
icant indentations contained in the already segmented mask. This was done to
remove any fields that may have been created by dilation and closing by cutting
off the indentations that cut into the mask with a narrow element. Segmenting
the background mask in this way allows the algorithm to distinguish the dark
values of the background image from the bones, which are also imaged with low
values in the image.
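A minimal Python sketch of this background segmentation, assuming scikit-image morphology and a logical AND as the merge of the two per-sequence masks (the exact merge operation is not specified above), might read:

```python
import numpy as np
from skimage import measure, morphology

def background_mask(t1, t2, thr=0.05):
    # the background is darker than 0.05 in both normalised sequences
    mask = (t1 < thr) & (t2 < thr)
    # de-noising: keep only the largest connected object (the background)
    labels = measure.label(mask)
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0                      # label 0 lies outside the mask
    mask = labels == sizes.argmax()
    # closing (15 x 15) and dilation (7 x 7) pull the mask to the head boundary
    mask = morphology.binary_closing(mask, morphology.square(15))
    mask = morphology.binary_dilation(mask, morphology.square(7))
    return mask
```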
4.3 Soft Tissue Segmentation

In the soft tissue segmentation part, the first step was bilateral filtering to remove
noise and to enhance the contrast between soft tissues and other tissues; the filtering was performed with a 7 × 7 mask.
The next step was thresholding. The threshold was different for each
sequence: for the T1-weighted series, the threshold was 0.1; for the T2-weighted
series, it was 0.2 (true for values greater than the threshold). The different thresh-
olding values for the two sequences are due to the different representation of soft
tissues, particularly the white and gray matter of the brain and the fluids sur-
rounding it.
For the T2-weighted series, the threshold is twice as large because the key
issue was the segmentation of the outer parts of the brain and other soft tissues
that bordered other tissue types. These boundaries in the T1-weighted series
were darker, making their segmentation from the T1-weighted series impossible
because it would compromise the coherence of bone tissue segmentation. The
lower segmentation threshold of the T1-weighted series was due to the softer
contrast at the tissue boundaries and the absence of elements that could interfere
with the segmentation of the bone tissues.
The soft tissue binary masks created in this way were de-noised by removing
objects and holes in the image that were smaller than 15 pixels, thus preventing
formation of small artifacts outside the mask area and within the mask bound-
aries, but not affecting the coherence of further segmentation.
The final part of the soft tissue segmentation was to combine the T1-weighted
and T2-weighted sequences. The result of this operation is shown in Fig. 4. The
merging of the sequences involved a logical sum operation.

Fig. 4. Selected cross-sections after merging

The soft tissue binary masks prepared in this way allow the algorithm to
segment bone tissues from data more accurately. The masks take into account
brain tissues, cerebrospinal fluid, fatty tissues, connective tissues, other types of
fluids (such as blood), and pathologies (such as tumors and abscesses).
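Sketched in the same style, the soft-tissue step reduces to a few lines; the thresholds follow the text, while the function name and the use of scikit-image's small-object and small-hole removal are illustrative assumptions:

```python
from skimage import morphology

def soft_tissue_mask(t1, t2):
    m1 = t1 > 0.1                     # threshold for the T1-weighted series
    m2 = t2 > 0.2                     # threshold for the T2-weighted series
    mask = m1 | m2                    # logical sum of the two sequences
    # remove objects and holes smaller than 15 pixels
    mask = morphology.remove_small_objects(mask, min_size=15)
    mask = morphology.remove_small_holes(mask, area_threshold=15)
    return mask
```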
4.4 Segmentation of Bone Tissues


Segmentation for bone tissues began with bilateral filtering using a 7 × 7 mask to
remove noise from the image and to enhance the contrast between tissues. The
sequences were then thresholded using threshold values of 0.1 for T1-weighted
and 0.2 for T2-weighted images (the masks took high values for pixels with values
less than the threshold due to the dark representation of bone tissues).
In the next step, the previously obtained soft tissue masks (Fig. 4) and back-
ground (Fig. 3) were used to exclude these areas from the bone segmentation.
This required logic filtering consisting of two steps.
In the first step, the soft tissue and background masks were combined using
a logical sum. In the second step, the summed mask values were inverted by
logical negation and then combined with the masks obtained from thresholding
using the logical conjunction function. This function assigns a high value to each
pixel if the mask from the first step and the mask from thresholding have a high
value at that pixel in the image; low values are assigned to pixels whose value
is low on either of the masks. By combining the segmented images, masks were
obtained that include bone tissues without soft tissues and background, which
is the basis for cephalometric image reconstruction.
Next, as in the segmentations of other parts of the image, noise removal was
performed. Objects and holes in the mask, which disturbed and deformed the
obtained masks, with a size smaller than 25 pixels in the image were removed.
Figure 5 illustrates the obtained masks.
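The two-step logic filtering can be illustrated as follows; how the two thresholded sequences are merged (here a logical AND, since bone is dark in both) is an assumption of this sketch:

```python
from skimage import morphology

def bone_mask(t1, t2, soft, background):
    # bone is dark, so the thresholded masks are true below the thresholds
    dark = (t1 < 0.1) & (t2 < 0.2)
    # step 1: logical sum of soft tissue and background;
    # step 2: negate it and combine with the thresholded mask (conjunction)
    mask = dark & ~(soft | background)
    # remove objects and holes smaller than 25 pixels
    mask = morphology.remove_small_objects(mask, min_size=25)
    mask = morphology.remove_small_holes(mask, area_threshold=25)
    return mask
```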

Fig. 5. Selected cross-sections after merging

4.5 Cephalometric Image Generation


The binary masks of bone and soft tissues obtained from the previous part of
the algorithm are the basis of the cephalometric image reconstruction as they
serve as spatial representation of these tissues. Generating a cephalometric image
from them requires two steps: interpolation of available sections and projection
of mask values onto a plane. Due to the dimensions of the data held (105 cross-
sections in the transverse plane with a size of 448 × 448) and the characteristics
of cephalometry, which requires changing the plane to sagittal (when generating


a cephalometric image in lateral projection), it is necessary to interpolate the
masks in the transverse plane, thereby increasing the number of cross-sections.

Fig. 6. Generated and interpolated bone tissue models

For this purpose, a bicubic interpolation method was used to interpolate the
bone and soft tissue masks to 448 cross sections in the transverse plane. An
example spatial model of the bone mask is shown in Fig. 6. The models thus
prepared were used for the next step of cephalometric image generation.
Using the bone and tissue model, projection of mask values onto planes was
performed. During the projection, for each pixel on the selected plane the number
of pixels present in the mask in a given direction perpendicular to the plane
was counted. This operation simulated the density and thickness of the objects,
which are parameters that have a major influence on the appearance of the image
during cephalometric imaging. The counts from the bone and soft tissue masks
were added together with weights: 1 for bone and 0.25 for soft tissue. The values
thus calculated were normalized to 1. The results are shown in Fig. 7. These are
the generated projections in three different planes.
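As an illustration, the interpolation and the weighted projection might be implemented as below, with scipy's cubic-spline zoom standing in for the bicubic interpolation described above; the names and the 0.5 re-binarisation threshold are assumptions:

```python
import numpy as np
from scipy import ndimage

def interpolate_slices(mask, n_out=448):
    # resample 105 transverse slices to 448 using cubic-spline interpolation
    factors = (n_out / mask.shape[0], 1.0, 1.0)
    return ndimage.zoom(mask.astype(float), factors, order=3) > 0.5

def cephalometric_projection(bone, soft, axis=2):
    # count mask voxels along rays perpendicular to the chosen plane,
    # weighting bone by 1 and soft tissue by 0.25, then normalise to [0, 1]
    proj = bone.sum(axis=axis) + 0.25 * soft.sum(axis=axis)
    return proj / proj.max()
```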

5 Results
By running the algorithm, a cephalometric-like image was generated. For easier
and more accurate comparison with the real cephalometric image (Fig. 8(a)),
the generated image (Fig. 8(b)) was cropped and rotated. These images were
Fig. 7. Generated cephalometric images in three different planes

not obtained from the same individual due to the difficulty of obtaining such
data. Nevertheless, even an illustrative comparison of the two images will give
an idea of the differences and the possible inaccuracies present in the image
generated by the proposed algorithm.

Fig. 8. Comparison of the original cephalometric image (a) with the generated (b)

Leaving aside the differences caused by the nature of the examination from
which the image was obtained, the differences are most pronounced in the cran-
iofacial portion: in the area of the nasal septa and sinuses, as well as at the
esophageal site. Magnetic resonance imaging differs from computed tomog-
raphy in terms of tissue representation: in CT, bone tissues are represented as
bright elements and air and background are represented as dark elements; on
the other hand, in MRI, bone tissue, air, and background are all represented as
dark elements. Segmentation of the background is not a problem, unlike segmen-


tation of the air inside the body of the patient in the nasal septum, sinuses, and
esophagus. Thus, the background becomes indistinguishable from bone tissue.
Due to the same dark color representation during magnetic resonance imaging,
the extremely variable structure and the close proximity of bony tissues, air
segmentation is impossible without disturbing the segmentation of bone tissue.
This results in the segmentation of air inside the patient’s body as bone tissue
when applying the algorithm.
The result of this confusion is excessive image brightness at the site of the
sinuses, primarily the maxillary sinuses, as they are the largest air-containing
elements inside the body. This is because when reconstructing the cephalometric
image, the algorithm counts the number of pixels in a given direction when
projecting onto the plane. When the sinuses (or other air-containing elements)
are erroneously segmented as bone tissue, the algorithm interprets them as just
bone tissue and takes them into account during counting. This causes the image
to be brighter than it should be in such areas, indicating the presence of a thicker
layer of bone tissue, even though it is not actually present there. A clearly visible
segmentation of the sinuses is shown in Fig. 5(b), where one can see the large
volume they occupy; in some orientations, the area of the sinuses is relatively
larger than the bones themselves. The maxillary sinuses primarily interfere with
imaging of the maxilla and the zygomatic bone, which are important imaging
components for dental and orthodontic diagnosis. The frontal sinuses, which are
located in the frontal bone region, and the sphenoid sinuses, located at the temporal
bone, also interfere with bone segmentation but to a lesser extent due to their
smaller size.
Air in the esophagus and nasal septum does not disturb the image as much as
it does in the sinuses. This changes the appearance of the overall image generated;
however, clear imaging of the nasal septum or esophagus does not obscure key
elements of the patient’s body structure. The septum clearly does not obscure
anything due to its peripheral location, and the esophagus is predominantly
separated from the cervical portion of the spine.
The external auditory canals, which are also filled with air, became largely
separated from the bony tissues during the background segmentation process.
This was possible because of their peripheral location and their greater distance from
the bony tissues. The canal is for the most part surrounded by soft tissues, which
allowed it to be attached to the background mask without compromising the
integrity of the segmented bone tissues. However, elements such as the tympanic cavity in the middle ear and the Eustachian tube, both of
which also contain air, could not be attached to the background mask due to
their location inside the body. However, their influence is minimal and does not
interfere with the generated image.
At the location of the parietal bone and the frontal bone, in the area of
the cranial vault, a disturbance in the continuity of the image can be observed,
manifested by bands of brightness changes. This is the result of interpolation,
which made it possible to increase the number of cross sections of the generated
image in the transverse plane to match the size of the frontal and sagittal planes.
However, due to the high degree of interpolation (from 105 cross sections to 448),
distortion is present in areas of high volatility. In the region of the cranial vault,
the frontal and parietal bones begin to have a drastically smaller area in each
cross-section in the transverse plane, which causes visible brightness bands due
to inaccurate and non-smooth interpolation of values between known values in
the image. This effect can be removed by using a larger series of MRI images
that covers the same size area as previous data.

6 Discussion

The difference between the original and the created image is primarily due to
the position of the body during examination: for cephalometry, it is a standing
position; for MRI, it is a recumbent position. The biggest changes caused by this
are manifested as a different position of the skull and cranium in relation to the
cervical vertebrae.
The developed methodology, which reconstructs a cephalometric-like image,
is an initial step towards radiation-free imaging of the patient instead of methods that use
harmful ionizing radiation, such as cephalometry or pantomograms. Despite its
limitations due to inaccurate air segmentation, which distorts and disrupts the
output image generated by the proposed algorithm, it significantly resembles
cephalometric imaging. These distortions do not affect most areas, such as
the cervical vertebrae, the skull, and the dentition, all of which are frequently
used by dentists and orthodontists during treatment, therefore the developed
methodology could be utilized. If the number of sections comprising the magnetic
resonance imaging sequences were increased, the mapping would gain in accuracy
and legibility, which would reduce the impact of interpolation, which noticeably
compromises image integrity.
The constructed method fundamentally describes the bone structure of the human cerebrocranial and craniofacial skeleton and the cervical vertebrae. Despite its low resolution (which could be increased by using more cross-sections), it can serve as an initial screening and diagnostic test that provides a rough model of a cephalometric image and, therefore, could be used to determine the need for imaging with
ionizing radiation.
The problem with clear sinus imaging is not necessarily just a disturbance
of the cephalometric image. When sinusitis is present, the sinuses are imaged as
brighter than normal in magnetic resonance imaging. This is due to the presence
of sinus-filling secretions that are a consequence of infection or chronic allergies.
Thus, the sinuses would be darker on the generated image, suggesting an exist-
ing disease or other pathology. Such an examination could serve as a screening
procedure that suggests the possible need for further diagnostic tests.
In conclusion, the developed algorithm, despite the partial inaccuracies, can
serve as preliminary and approximate imaging that would determine whether
further steps in the diagnostic process should be taken.
Acknowledgement. This work was financed by the AGH University of Science and
Technology thanks to the Rector’s Grant 18/GRANT/2022.
This work was co-financed by the AGH University of Science and Technology,
Faculty of EAIIB, KBIB on 16.16.120.773.
Work carried out within the grant “Studenckie Koła tworzą innowacje” – II edition, project no. SKN/SP/535131/2022, entitled “Cephalometric image reconstruction based on magnetic resonance imaging”.

References
1. Claus, E.B., Calvocoressi, L., Bondy, M.L., Schildkraut, J.M., Wiemels, J.L., Wren-
sch, M.: Dental X-rays and risk of meningioma. Cancer 118(18), 4530–4537 (2012)
2. Cung, W., et al.: Cephalometry in adults and children with neurofibromatosis type
1: implications for the pathogenesis of sphenoid wing dysplasia and the NF1 facies.
Eur. J. Med. Genet. 58(11), 584–590 (2015)
3. Dogdas, B., Shattuck, D.W., Leahy, R.M.: Segmentation of skull and scalp in 3-D
human MRI using mathematical morphology. Hum. Brain Mapp. 26(4), 273–285
(2005)
4. Domeshek, L.F., Mukundan, S., Yoshizumi, T., Marcus, J.R.: Increasing concern
regarding computed tomography irradiation in craniofacial surgery. Plast. Recon-
str. Surg. 123(4), 1313–1320 (2009)
5. Eley, K.A., Delso, G.: Automated 3D MRI rendering of the craniofacial skeleton:
using ZTE to drive the segmentation of black bone and FIESTA-C images. Neu-
roradiology 63(1), 91–98 (2020). https://doi.org/10.1007/s00234-020-02508-7
6. Hwang, S.Y., Choi, E.S., Kim, Y.S., Gim, B.E., Ha, M., Kim, H.Y.: Health effects from exposure to dental diagnostic X-ray. Environ. Health Toxicol. 33(4), e2018017 (2018)
7. Krille, L., et al.: Risk of cancer incidence before the age of 15 years after exposure
to ionising radiation from computed tomography: results from a German cohort
study. Radiat. Environ. Biophys. 54(1), 1–12 (2015)
8. Maillie, H.D., Gilda, J.E.: Radiation-induced cancer risk in radiographic cephalom-
etry. Oral Surg. Oral Med. Oral Pathol. 75(5), 631–637 (1993)
9. Pflugbeil, S., Pflugbeil, C., Schmitz-Feuerhake, I.: Risk estimates for meningiomas
and other late effects after diagnostic X-ray exposure of the skull. Radiat. Prot.
Dosimetry 147(1–2), 305–309 (2011)
10. Smith-Bindman, R., et al.: Radiation dose associated with common computed
tomography examinations and the associated lifetime attributable risk of cancer.
Arch. Intern. Med. 169(22), 2078–2086 (2009)
11. Zhang, R., et al.: Bone-selective MRI as a nonradiative alternative to CT for cran-
iofacial imaging. Acad. Radiol. 27(11), 1515–1522 (2020)
Analysis of Changes in Corneal Structure
During Intraocular Pressure
Measurement by Air-Puff Method

Magdalena Jędzierowska1(B), Robert Koprowski1, and Sławomir Wilczyński2

1 Institute of Biomedical Engineering, Faculty of Science and Technology, University of Silesia in Katowice, ul. Będzińska 39, 41-200 Sosnowiec, Poland
{magdalena.jedzierowska,robert.koprowski}@us.edu.pl
2 Department of Basic Biomedical Science, School of Pharmacy with the Division of Laboratory Medicine in Sosnowiec, Medical University of Silesia in Katowice, Kasztanowa Street 3, 41-200 Sosnowiec, Poland
swilczynski@sum.edu.pl

Abstract. This paper presents a method for analysing changes in the


corneal structure during intraocular pressure measurements using the air-
puff method. The research consisted in the analysis of displacements of
specific areas of the cornea extracted from a sequence of images obtained
from the Corvis ST tonometer. The results obtained allow for real-time
tracking of specific areas of the cornea, the displacements of which,
according to preliminary studies, are characterized by asymmetry. The
parameters proposed in the paper, i.e. the absolute displacement of the
examined area (|Δn|) and the value of its maximum deviation (d), indi-
cate their potential application in the assessment of corneal biomechan-
ics. However, there is a need for further studies involving larger research
groups of both healthy subjects and patients with diseased corneas, which
will allow for an accurate assessment of the value distribution of the
parameters obtained in terms of their clinical usefulness.

Keywords: Image structure analysis · Dynamic corneal deformation · Image analysis · Corvis ST · Scheimpflug camera

1 Introduction
A dynamic process of corneal deformation occurs when measuring intraocular
pressure (IOP) by the air-puff method. The cornea deforms inwards and then
returns to its original shape. This corneal behaviour is influenced by many fac-
tors, including its thickness and mechanical properties of its structure, as well
as intraocular pressure [11]. Currently, using corneal visualization Scheimpflug
technology (Corvis ST tonometer), it is possible to obtain a number of biome-
chanical parameters of the cornea, helpful, among others, in the diagnosis of
keratoconus [5]. As indicated in the papers [1,8,13,14], it can be noticed that
during deformation the cornea performs characteristic vibrations in the form
of oscillations. What is more, changes in its structure, having an individual


character, as well as repetitive local density changes are visible in corneal cross-
section images. Following the successive frames of the IOP measurement per-
formed with a Scheimpflug camera-based tonometer, it can be noticed that these
specific changes in the corneal structure, e.g. in the form of transverse stripes,
change their position over time. Figure 1 shows sample images from the Corvis
ST camera with the highlighted specific areas of the corneal structure. The
above-mentioned changes in the corneal structure have not yet been discussed
in the literature, although the characteristic features of the corneal texture have
already been emphasized [12].

Fig. 1. Sample images of corneal cross-sections from the Corvis ST tonometer with the
highlighted characteristic areas of the corneal structure

In this study, the entire sequence of 140 corneal images from the Corvis ST
tonometer was analysed in order to find relationships in the observed changes
mentioned above. The study is not related to texture, but to the structure of
the cornea. Therefore, in the following sections, the corneal structure as well as
its displacement visible during IOP measurements with a non-contact tonometer
are analysed. The details contained in the texture of the analysed area were not
investigated. They will be the subject of discussion in future papers. Thus, the
authors decided not to use tools for analysing the texture of medical images,
such as grey level co-occurrence matrix (GLCM) methods [18], Laws Texture
Energy Method [4], wavelet texture analysis [15] or statistical approaches [3],
and only correlation, as the algorithm component, was used.

2 Materials

The analysis was performed on images obtained from a non-contact tonometer


using a Scheimpflug camera (Corvis ST tonometer). For each of the examined
subjects, the images were exported as a series of 140 images in *.jpg format.
The images had a resolution of M × N = 200 × 576 pixels (16.5 × 15.7 µm/pixel)
(where M – number of image rows, N – number of image columns). The study
collected data from 20 subjects aged 24 to 69 years undergoing routine eye


examinations. Patients with corneal diseases or lesions that could affect tissue
thickness and elasticity were excluded from the study. In total, 2,800 2D images
were subjected to further analysis. The images were obtained from Sven Reisdorf,
a specialist from the Oculus laboratory. The data were anonymised and healthy
subjects gave their voluntary consent.

3 Methods
The proposed method for analysing the corneal structure, allowing for fully auto-
matic tracking of changes visible in the images of its cross-sections during the
intraocular pressure examination, has been divided into two key stages:
1. Extraction of corneal cross-sections from an image sequence.
2. Analysis of corneal structure changes in consecutive images in the sequence
(over time).
Extraction of the corneal image from individual image frames involved image
pre-processing and outer corneal edge detection. For this purpose, the authors’
method presented in [9] was used. Then, to extract the cornea, its thickness was
assumed to be constant, which, as indicated later in the paper, is possible under
specific conditions and introduces certain limitations to the entire algorithm.
The second stage consisted in analysing corneal structure changes during the
intraocular pressure examination. For this purpose, specific areas of the cornea
were followed in 140 consecutive images. Tracking was based on finding the areas
with the highest correlation to the originally selected fragment. The individual
stages of the proposed image analysis are described in detail in the following sub-
sections. The algorithm was written in MATLAB® ver. 9.0.0.341360 (R2016a)
using Image Processing Toolbox (ver. 9.4).

3.1 Extraction of Corneal Cross-Sections from a Sequence of 140 Images
Each of the analysed sequences of 140 2D images L(m, n), where m is the row number, m ∈ (1, 200), and n is the column number, n ∈ (1, 576), was subjected to image pre-processing.
The first step involved median filtering with a 7 × 7 pixel window and image
normalization [9]. Next, to detect the outer corneal edge, the method presented
in [9] was used. It is based on local image thresholding and image morphological
operations, owing to which the outer corneal contour, hereinafter referred to as
L_k^SP(n), is identified in the obtained binary image of the cornea L_maxAL(m, n).
For the applied window sized 75 × 75 pixels, the binarization threshold value
t(m, n) of the applied method is determined in accordance with the following
equation:

$$t(m,n) = \mu(m,n)\left[1 - k\left(\frac{\sigma(m,n)}{R} - 1\right)\right] \qquad (1)$$
where: t(m, n) – threshold value for a pixel with coordinates (m,n), m ∈ (1, 200),
n ∈ (1, 576), μ(m, n) – mean brightness value for a given window, σ(m, n) – stan-
dard deviation for a given window, k – constant k > 0, selected experimentally
(k = 0.25), R – maximum value of the standard deviation.
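For illustration, Eq. (1) can be evaluated over the whole image with windowed statistics; the following minimal Python sketch (the authors used MATLAB) assumes uniform (box) windows for the local mean and standard deviation:

```python
import numpy as np
from scipy import ndimage

def local_threshold(img, win=75, k=0.25):
    # windowed mean and standard deviation over a win x win neighbourhood
    mean = ndimage.uniform_filter(img, size=win)
    sq_mean = ndimage.uniform_filter(img * img, size=win)
    std = np.sqrt(np.maximum(sq_mean - mean * mean, 0.0))
    R = std.max()                              # maximum standard deviation
    t = mean * (1.0 - k * (std / R - 1.0))     # Eq. (1)
    return img > t                             # binary image of the cornea
```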
The constant k (Eq. (1)) was selected experimentally based on the contrast
and size of the analysed objects. Then, based on the image resolution and anthro-
pometric data showing a mean corneal thickness of approx. 500 µm, an assump-
tion was made that the corneal thickness was constant, equal to 20 pixels. The
above assumption applies only under certain conditions, e.g. lack of diseases or
conditions affecting the corneal thickness such as keratoconus in the test sub-
jects. The introduction of the above assumption also requires pre-selection of
subjects taking into account their individual variability. The symbol i, where
i ∈ (1, 139), was adopted for marking consecutive images in the sequence.
Knowing the position values of the detected outer edge L_k^SP(n) for each i-th image from the sequence and using the adopted assumption of constant corneal thickness, successive images of the corneal cross-sections
tion of constant corneal thickness, successive images of the corneal cross-sections
LCOR (m, n), where m ∈ (1, 20), were extracted. Based on the above operation,
the LCOR (mi , ni ) image sized 2800 × 576 pixels was obtained, where the sub-
script i refers to the successive images in the sequence, thus assuming values in
the range from 1 to 139. The image LCOR (mi , ni ) is shown in Fig. 2.

Fig. 2. a) Diagram showing the method for extracting a single corneal cross-section
LCOR (m, n) from the selected 2D image. b) Newly created image LCOR (mi , ni ) con-
sisting of the consecutive 140 corneal images. An exemplary area LK (mi , ki ) for which
the area with the highest correlation LK (mi+1 , j) was searched for is marked with a
red rectangle
3.2 Analysis of Corneal Structure Changes in Consecutive Cross-Sections (Real-Time Tracking)

In this part of the study, individual corneal cross-sections were analysed in terms
of changes in their structure during the IOP examination. For this purpose, in
the consecutive 140 corneal images LCOR (m, n), areas with the greatest corre-
spondence, understood in the analysed issue as the highest value of correlation
between two selected areas in the sequence, were searched for. To accomplish the
above task, each two consecutive corneal cross-sections were analysed, arranged
one after the other in the LCOR (mi , ni ) image (see Fig. 2b) as follows: Start-
ing from the first LCOR (m1 , ni ) image, the area LK (mi , ki ) was determined (an
example area is shown in Fig. 2b) with a constant length of 64 pixels, where
ki ∈ (ni , ni + 63) and ni ∈ (1, 576 − 63), and then the corresponding area
LK (mi+1 , j) was searched for in the next cross-section LCOR (mi+1 , ni ), taking
into account the possibility of its shifting left or right by 50% of the length, there-
fore j ∈ (ki − 32, ki + 32). Then, the value of correlation ri (ki , j) was determined
for all the resulting pairs of images, i.e. the correlation between LK (mi , ki ) and
LK (mi+1 , j). The area with the greatest similarity was the area for which the
calculated correlation value was the highest, i.e. r_i(k_i) = max_j r_i(k_i, j):

$$L_{KS}(m_i) = \frac{1}{K_i}\sum_{k_i=1}^{K_i} L_K(m_i, k_i) \qquad (2)$$

$$L_{KS}(m_{i+1}) = \frac{1}{J}\sum_{j=1}^{J} L_K(m_{i+1}, j) \qquad (3)$$

$$r_i(k_i) = \max_j \frac{\sum\limits_{k_i=1}^{K_i}\sum\limits_{m_i=1}^{M_i}\left[L_K(m_i,k_i)-L_{KS}(m_i)\right]\left[L_K(m_{i+1},j)-L_{KS}(m_{i+1})\right]}{\sqrt{\sum\limits_{k_i=1}^{K_i}\sum\limits_{m_i=1}^{M_i}\left[L_K(m_i,k_i)-L_{KS}(m_i)\right]^2 \sum\limits_{k_i=1}^{K_i}\sum\limits_{m_i=1}^{M_i}\left[L_K(m_{i+1},j)-L_{KS}(m_{i+1})\right]^2}} \qquad (4)$$

where: LKS (mi ) – mean value for each image row LK (mi , ki ), where ki ∈
(ni , ni + 63) and ni ∈ (1, 576 − 63), LKS (mi+1 ) – mean value for each image row
LK (mi+1 , j), where j ∈ (ki − 32, ki + 32). The above steps were repeated from
i = 1 to i = 139 − 1 for all possible areas with the length of 64 pixels.
The above operations provided the matrix Ls (l, z), where l ∈ (1, 139 − 1),
z ∈ (1, N − 63) and N = 576, in which the values of ’position’ were recorded, i.e.
ki of successive areas with the maximum correlation. A fragment of the Ls (l, z)
matrix is shown in Fig. 3. It should be noted that the l coordinate of the Ls (l, z)
matrix indicates the consecutive corneal cross-sections, starting from i = 2, i.e.:
i = l + 1.
The values of the Ls (l, z) matrix were then tracked (over time) to determine
the position of the next most similar (in terms of correlation) area.
The position tracking method is shown schematically in Fig. 3 by means of
arrows, which indicate the highest correlation value in the consecutive corneal
cross-section for the first fragment LK (m1 , k1 ), i.e. for i = 2, Lk (m2 , Ls (1, 1))
was obtained for the fragment, where Ls(1, 1) = 1; then for Lk(m2, 1) the
Fig. 3. Fragment of the matrix Ls (l, z) where l ∈ (1, 139 − 1), z ∈ (1, 576 − 63) in
which the values of ’position’ were recorded, i.e. ki of the consecutive areas with the
maximum correlation. The l coordinate of the matrix denotes successive corneal cross-
sections starting from i+1, whereas the z coordinate refers to the successive extractable
areas LK (mi , ki ). Red arrows indicate the method of tracking the area LK = (m1 , k1 ),
blue arrows indicate the method of tracking the area LK (m1 , k2 )

most similar fragment in the consecutive corneal cross-section (i = 3) is


Lk (m3 , Ls (2, 1)), Ls (2, 1) = 4, and for Lk (m3 , 4) the most similar fragment in the
consecutive corneal cross-section (i = 4) is Lk (m4 , Ls (3, 4)), where Ls (3, 4) = 4
and then for Lk (m4 , 4) it is fragment Lk (m5 , Ls (4, 4)), where Ls (4, 4) = 4 . . . etc.
Using the above tracking algorithm for all possible LK (m1 , ki ) fragments
with the constant length of 64 pixels provided changes in the positions of these
areas during the intraocular pressure test (i.e. from i = 1 to i = 139). Figure 4
shows a graph of tracking the position changes of selected corneal fragments
performed for one subject. In the graph, the indicated value ki corresponds to
the position of the left side of the tracked area, the width of which is 64 pixels,
i.e. from ki to ki + 63.
The obtained data were the basis for the analysis of specific parameters for
all 20 test subjects.
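A simplified Python sketch of this matching step is given below; it scores each candidate shift with a global Pearson correlation over the flattened patches, whereas Eq. (4) subtracts per-row means, so this is an approximation, and all names are illustrative:

```python
import numpy as np

def track_area(cor, i, k, width=64, search=32):
    # cor: sequence of corneal cross-sections, each 20 x 576 pixels
    ref = cor[i][:, k:k + width]
    n_cols = cor[i + 1].shape[1]
    best_j, best_r = k, -np.inf
    for j in range(max(0, k - search), min(n_cols - width, k + search) + 1):
        cand = cor[i + 1][:, j:j + width]
        r = np.corrcoef(ref.ravel(), cand.ravel())[0, 1]
        if r > best_r:
            best_r, best_j = r, j
    return best_j  # position of the most similar area in cross-section i + 1
```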

4 Results
Two parameters determined on the basis of the obtained values of displacements
constituted the basis for further analysis of corneal structure changes during
the intraocular pressure examination. For each of the 20 subjects, the following
values were designated:

– absolute displacement between the end and start of each plot of displacement
change, hereinafter denoted as: |Δn| (see Fig. 5a),
– maximum deviation from the line defined by two points describing the position of a given area at the beginning and end of the IOP test, hereinafter referred to as: d (see Fig. 5b). A sketch of both computations is given after this list.
Fig. 4. Sample graph of tracking changes in the position of selected areas of the cornea
during IOP examination. The values ki , which correspond to the horizontal position
of a given fragment, are marked on the horizontal axis, whereas successive i values,
corresponding to consecutive corneal cross-sections recorded during the test, where
i ∈ (1, 139), are marked on the vertical axis

Fig. 5. Schematic diagram showing the method for determining: a) parameter |Δn|,
i.e. the value of the absolute displacement between the end and beginning of the dis-
placement change plot, b) parameter d, i.e. the value of the maximum deviation from
the line defined by two points describing the location of a given area at the beginning
and end of IOP examination

The data obtained are presented in Tables 1 and 2.


The analysis of corneal structure changes carried out in this study during IOP
measurements using the non-contact method consisted in tracking the changes
in the position of specific areas of the cornea during the examination. The mea-
Table 1. Absolute displacement between the end and beginning of each plot of dis-
placement change (|Δn|) for the selected ki values for 20 subjects

Subject no.  ki = 1   65   129   193   257   321   385   449   513
1 2 25 34 69 42 31 33 2 65
2 34 0 8 32 35 29 1 2 74
3 13 14 10 79 20 84 4 1 4
4 2 17 71 89 25 4 88 62 59
5 55 11 73 63 1 46 41 23 8
6 47 10 60 36 28 61 31 24 52
7 32 32 37 101 49 29 108 5 15
8 9 16 28 48 17 81 40 20 8
9 16 9 51 117 4 82 60 24 0
10 4 9 107 43 21 2 21 1 4
11 2 6 25 83 16 5 57 12 4
12 3 11 14 92 26 6 14 8 0
13 2 25 78 54 10 74 13 17 79
14 0 7 34 107 31 36 101 20 78
15 0 15 19 103 57 8 12 5 25
16 5 31 21 69 5 50 35 18 45
17 0 2 3 13 23 27 6 9 6
18 29 1 13 82 18 45 5 1 0
19 7 38 65 75 11 28 37 15 0
20 5 6 42 23 41 1 58 16 59
Mean 13.35 14.25 39.65 68.90 24.00 36.45 38.25 14.25 29.25
Std 16.33 10.52 27.46 28.68 14.74 27.57 31.24 13.62 29.72

surements were made with a Scheimpflug camera-based tonometer (Corvis ST),


and the test time was 30 ms. During this time, as shown in Fig. 4, some areas of
the cornea change their horizontal position, which can be described as changes in
the corneal structure during its dynamic deformation. The most visible changes
are observed in the areas of the presence of two characteristic peaks (visible in
the illustrative Fig. 1) appearing in approx. 8 ms when the cornea bends inward
until its greatest concavity, i.e. around 1/4 and 3/4 of the corneal width and in
the very centre of the cornea. In most of the analysed subjects, asymmetric (in
relation to the centre of the cornea) displacements in the lateral directions (to
the left and right) are observed in the areas covering the above-described peaks.
During the moment of maximum corneal bending, it seems that it stretches in
the centre, while its structure on the sides gets denser. It can be assumed that
it is related to the material properties of the cornea, i.e. its anisotropic, nonlin-
Table 2. Maximum deviation from the line defined by two points describing the posi-
tion of a given area at the beginning and end of IOP examination (d) for selected ki
values for 20 subjects

Subject no.  ki = 1   65   129   193   257   321   385   449   513
1 2.90 13.73 20.17 14.76 36.98 37.67 17.93 16.61 33.54
2 21.90 15.00 8.04 22.83 55.19 75.16 4.51 35.87 27.41
3 8.80 14.43 15.32 30.58 31.20 47.13 7.88 4.85 1.93
4 1.00 15.15 25.01 34.35 43.95 60.76 59.05 33.63 25.52
5 24.72 45.28 54.47 83.93 75.80 63.25 22.42 39.13 3.37
6 36.02 20.89 19.90 95.29 63.82 53.21 15.72 20.18 24.19
7 14.01 19.86 19.94 51.82 30.85 29.29 42.18 4.69 7.24
8 6.90 15.81 17.97 48.66 35.30 35.77 18.12 10.59 13.86
9 16.47 12.95 19.86 43.69 8.66 68.82 24.60 13.11 0.00
10 3.16 23.15 44.04 18.93 24.99 3.13 8.12 1.83 1.91
11 4.56 6.73 12.54 38.23 26.42 57.58 20.96 5.20 10.69
12 9.15 3.55 15.33 44.65 23.93 9.34 9.76 5.93 3.00
13 13.68 16.13 22.18 51.18 28.09 29.20 10.57 14.64 37.01
14 4.00 5.33 32.89 35.16 30.75 68.28 73.40 8.63 49.32
15 10.00 13.16 17.88 35.22 47.66 41.92 12.39 7.97 12.02
16 11.38 10.00 13.30 57.69 39.70 40.55 13.52 31.47 27.76
17 3.00 5.77 16.97 11.43 13.97 24.51 7.12 9.02 4.86
18 14.14 12.35 23.24 35.85 32.33 42.97 3.99 1.51 0.00
19 6.47 14.87 29.50 30.01 34.54 16.21 12.03 6.38 4.00
20 7.66 12.94 13.14 56.39 33.86 21.80 15.40 28.33 28.28
Mean 11.00 14.86 22.08 42.03 35.90 41.33 19.98 14.98 15.80
Std 8.45 8.57 10.79 20.42 15.32 20.05 17.68 11.85 14.26

early elastic and viscoelastic properties [6,17,20]. In addition, there are visible
displacements of areas in the central part of the cornea, which after the moment
of maximum bending take an oscillating character (in Fig. 4 it is visible around
i = 100, i.e. about 22 ms).
The parameters of absolute displacement between the end and beginning
of each plot of displacement change (|Δn|) indicate large discrepancies among
individual subjects. The standard deviations of the mean |Δn| values exceed at least half of the mean values, which indicates high variability in the data.
Nevertheless, a consistent observation for all subjects is an increase in the value of
|Δn| in the middle part of the cornea and in 1/4 and 3/4 of its width (see Fig. 6a),
i.e. in the areas subject to the greatest deformation during IOP measurements
by the air-puff method.
Fig. 6. a) Graph showing the mean values of the absolute displacement (|Δn|) for
the selected ki values, which correspond to the horizontal position of the fragment. b)
Graph of the mean values of the maximum deviation (d) for the selected ki values

Another parameter (d), indicating the maximum deviation from the line
defined by two points describing the position of a given area at the beginning
and end of the IOP examination, showed an increase in the value of the max-
imum deviation for ki = 193, ki = 257 and ki = 321 in all the test subjects
(see Fig. 6b). An increase in d in the central part of the cornea is also visible in
Fig. 6b, presenting the mean values of d. The above corresponds to the observed
corneal changes during the IOP test with the air-puff method – the greatest
changes in the form of vibrations [1,14] are observed from 1/4 to 3/4 of the
corneal width, which, according to the obtained data, affects the largest corneal
structure changes in these areas. In addition, the greatest deviations in the cen-
tral part of the cornea indicate that the corneal structure gets denser on the
sides – the traced areas “shift to the side” when the cornea deforms the most.
It can be assumed that the visible displacements are related to stretching and
crimping of the collagen fibres of the cornea occurring under elastic deformation
of the cornea.
The parameter d, determined on the basis of the obtained displacements, is
characterized by lower values of the standard deviation than the parameter |Δn|.
Therefore, based on the preliminary analysis of the corneal structure changes
presented in this paper, it seems to be the most appropriate parameter for the
assessment of the discussed issue. At the present stage, the obtained results have
been submitted to ophthalmologists for verification under clinical conditions.

5 Discussion
The study analysed changes in the corneal structure during the intraocular pres-
sure examination using the air-puff method. The analysed images were obtained
from the Corvis ST tonometer equipped with a Scheimpflug camera, which
records corneal deformation and its morphological changes under the influence
of an air puff within approximately 30 ms [10,21]. Changes resulting from the
acting force in the form of an air stream are described in biomechanics as a fast
loading and unloading process. The changes in the displacement of the corneal
structures observed in the conducted research can be described as propagating
waves, which is also visible to the naked eye when observing the images from the
IOP examination. The characteristic corneal structures move during dynamic
deformation in an asymmetrical manner (as shown in Fig. 4), which is reflected
in the observed asymmetry during the dynamic response of the cornea during
the loading and unloading process [2]. The studies conducted by the authors
are not devoid of a number of limitations that may have affected the obtained
results. Some of them include:
1. Introduction of an assumption about the constant corneal thickness. The
lack of reference to the actual corneal thickness, which in a healthy subject
is thinner in the middle and thicker on the sides, can potentially lead to
misinterpretation of the obtained results. Therefore, in further studies, the
authors intend to combine the measurement of the central corneal thickness
with the analysis of its structure.
2. The proposed algorithm for tracking specific areas is based on the correlation
parameter, thus the authors do not refer to the analysis of the corneal texture.
3. A small research group – 20 healthy subjects.
To the authors’ knowledge, the issue of corneal structure analysis, understood
as tracking its specific areas during the air-puff examination, has not yet been
described in the literature.

6 Summary
This paper presents the authors’ method for analysing corneal structure changes
during IOP measurements by using an air puff. The research was carried out
on data obtained from a Corvis ST tonometer equipped with a Scheimpflug
camera. The obtained results allow for real-time tracking of specific areas of
the cornea, the displacements of which, according to the preliminary results,
are characterized by asymmetry. In subsequent studies, the authors intend to
compare the obtained results for the group of healthy subjects and patients with
diseased corneas. The above will enable an assessment of the usefulness of the extracted
parameters, i.e. the absolute displacement |Δn| and the maximum deviation
d in the classification of eye diseases. The preliminary test results have been
submitted to ophthalmologists for verification in terms of their usefulness under
clinical conditions. To sum up, the preliminary analysis of corneal structure
changes carried out in this paper seems to be one of the possible ways to broaden
the knowledge about changes in the corneal morphology during its dynamic
deformation. Thus, it may allow for a more accurate assessment of the corneal
biomechanics, which plays an increasingly important role in the diagnosis of
corneal diseases [5,19] and in the screening of patients qualified for refractive
surgery [7,16]. However, further research is necessary to systematize and confirm
the obtained results.
References
1. Boszczyk, A., et al.: Non-contact tonometry using Corvis ST: analysis of corneal
vibrations and their relation with intraocular pressure. J. Opt. Soc. Am. A 36(4), B28–B34 (2019)
2. Boszczyk, A., et al.: Novel method of measuring corneal viscoelasticity using the
Corvis ST Tonometer. J. Clin. Med. 11, 261 (2022)
3. Corrias, G., et al.: Texture analysis imaging ’what a clinical radiologist needs to
know’. Eur. J. Radiol. 146, 110055 (2022). https://doi.org/10.1016/j.ejrad.2021.
110055
4. Dash, S., Jena, U.R.: Multi-resolution Laws’ Masks based texture classification. J.
Appl. Res. Technol. 15(6), 571–582 (2017). https://doi.org/10.1016/j.jart.2017.07.
005
5. Elham, R., et al.: Keratoconus diagnosis using Corvis ST measured biomechanical
parameters. J. Curr. Ophthalmol. 29, 175–181 (2017). https://doi.org/10.1016/j.
joco.2017.05.002
6. Eliasy, A., et al.: Determination of corneal biomechanical behavior in-vivo for
healthy eyes using CorVis ST tonometry: stress-strain index. Front. Bioeng.
Biotechnol. 7(May), 1–10 (2019). https://doi.org/10.3389/fbioe.2019.00105
7. Esporcatte, L.P.G., et al.: Biomechanical diagnostics of the cornea. Eye Vis. (Lon-
don, England). 7, 9 (2020). https://doi.org/10.1186/s40662-020-0174-x
8. Han, Z., et al.: Air puff induced corneal vibrations: theoretical simulations and
clinical observations. J. Refract. Surg. 30(3), 208–213 (2014). https://doi.org/10.
3928/1081597X-20140212-02
9. Jędzierowska, M., et al.: A new method for detecting the outer corneal contour in
images from an ultra-fast Scheimpflug camera. Biomed. Eng. Online. 18(1), 115
(2019). https://doi.org/10.1186/s12938-019-0735-1
10. Kling, S., Hafezi, F.: Corneal biomechanics - a review. Ophthalmic Physiol. Opt.
1–13 (2017). https://doi.org/10.1111/opo.12345
11. Kling, S., Marcos, S.: Contributing factors to corneal deformation in air puff mea-
surements. Invest. Ophthalmol. Vis. Sci. 54(7), 5078–85 (2013). https://doi.org/
10.1167/iovs.13-12509
12. Koprowski, R.: Image Analysis for Ophthalmological Diagnosis. Springer (2016).
https://doi.org/10.1007/978-3-319-29546-6
13. Koprowski, R., Ambrósio, R.: Quantitative assessment of corneal vibrations dur-
ing intraocular pressure measurement with the air-puff method in patients with
keratoconus. Comput. Biol. Med. 66, 170–178 (2015). https://doi.org/10.1016/j.
compbiomed.2015.09.007
14. Koprowski, R., Wilczyński, S.: Corneal vibrations during intraocular pressure measurement with an air-puff method. J. Healthc. Eng. 13 (2018). https://doi.org/
10.1155/2018/5705749
15. Kumar, I., et al.: Wavelet packet texture descriptors based four-class BIRADS
breast tissue density classification. Procedia Comput. Sci. 70, 76–84 (2015).
https://doi.org/10.1016/j.procs.2015.10.042
16. Ortiz, D., et al.: Corneal biomechanical properties in normal, post-laser in situ
keratomileusis, and keratoconic eyes. J. Cataract Refract. Surg. 33(8), 1371–1375
(2007). https://doi.org/10.1016/j.jcrs.2007.04.021
17. Qin, X., et al.: Evaluation of corneal elastic modulus based on Corneal Visualization
Scheimpflug Technology. Biomed. Eng. Online. 1–16 (2019). https://doi.org/10.
1186/s12938-019-0662-1
18. Shin, Y.G., et al.: Histogram and gray level co-occurrence matrix on gray-scale
ultrasound images for diagnosing lymphocytic thyroiditis. Comput. Biol. Med. 75,
257–266 (2016). https://doi.org/10.1016/j.compbiomed.2016.06.014
19. Vinciguerra, R., et al.: Detection of Keratoconus with a new biomechanical index.
J. Refract. Surg. 32(12), 803–810 (2016). https://doi.org/10.3928/1081597X-
20160629-01
20. Yazdi, A.A., et al.: Characterization of non-linear mechanical behavior of the
cornea. Sci. Rep. 10, 11549 (2020). https://doi.org/10.1038/s41598-020-68391-7
21. Zhang, D., et al.: Exploring the biomechanical properties of the human cornea in
vivo based on Corvis ST. Front. Bioeng. Biotechnol. 9(November), 1–10 (2021).
https://doi.org/10.3389/fbioe.2021.771763
Discrimination Between Stroke and Brain
Tumour in CT Images Based
on the Texture Analysis

Monika Kobus, Karolina Sobczak, Mariusz Jangas, Adrian Świątek, and Michał Strzelecki(B)

Institute of Electronics, Lodz University of Technology, Al. Politechniki 10, 93-590 Łódź, Poland
{monika.kobus,karolina.sobczak,mariusz.jangas,adrian.swiatek}@edu.p.lodz.pl, michal.strzelecki@p.lodz.pl

Abstract. The brain is one of the most important organs in the human
body because it is its control centre, and any disease of the brain is a
real threat to human life. A brain tumour is a newly formed, foreign
structure, whose growth causes an increase in intracranial pressure. A
stroke is defined as a sudden neurological deficit caused by central ner-
vous system ischemia or haemorrhage. CT is a routine examination when
these diseases are suspected. However, the distinction between stroke and
tumour is not always straightforward, even for experienced radiologists.
There are solutions for detecting each disease separately, but there is no
system that would support distinguishing of both diseases. Therefore,
the aim of this work is to develop a system that allows discrimination
between a stroke and a brain tumour on CT images based on the anal-
ysis of the texture features calculated for the region of interest marked
by radiologist. Next, feature selection was performed by Fisher crite-
rion and convex hull approach. Finally, selected features were classified
with use of the Classification Learner application available in MATLAB
R2021b. Classification methods based on classification trees, k-nearest
neighbours (KNN), and support vector machine (SVM) gave the best
classification results. It was demonstrated that classification accuracy
reached over 95% for the analysed feature set that is promising result in
semiautomatic discrimination between stroke and tumour in CT data.

Keywords: Stroke · Brain tumour · Texture analysis · CT imaging

1 Introduction

The brain is one of the most important organs in the human body because it
is its control centre. The human brain is responsible for functions necessary for
human survival, such as proper heartbeat, digesting meals, receiving sensory
impressions, learning and movement [12]. For this reason, any disease of the
brain is a real threat to human life. A brain tumour is a newly formed, foreign
structure, whose growth causes an increase in intracranial pressure. According


to the National Cancer Registry, the incidence of a brain tumour in Poland
amounts to approximately 2,800 cases per year, of which approximately 2,300
patients die [7]. There are several intracranial malignant tumours the human;
one of the deadliest and resistant to radiotherapy and chemotherapy is a glioma.
The glioma is classified into four grades: grade I includes pilocytic astrocytoma,
grades II-III include diffuse or anaplastic astrocytoma and oligodendrogliomas,
while grade IV includes most malignant glioblastomas [6]. Symptoms of brain
tumours depend on their location and size. The most common are headaches,
blurred vision or double vision, confusion, seizures, weakness of a limb, a change
in mental functioning. Another serious disease of the brain is stroke, which
is defined as a sudden neurological deficit caused by central nervous system
ischemia or haemorrhage. About 60,000 cases of stroke are reported annually.
Stroke was the second leading cause of death worldwide in 2012 [20], so correct
and timely diagnosis is very important to start the effective treatment. 80–85% of
strokes are ischemic strokes, 10–12% are hemorrhagic strokes, and 5–7% are sub-
arachnoid hemorrhage (SAH). Strokes can also be divided according to the dynamics
of symptoms, the type of disrupted vessels, or the pathogenesis mechanism. The ischemic
stroke happens when a sudden obstruction of the blood supply or a reduction
of normal cerebral blood flow (CBF) causes brain injuries. A cerebral infarc-
tion (brain tissue damage due to a low supply of nutrients) is caused by blood
vessel blockage or by drastic reduction of CBF in the overall brain because of
large artery atherosclerosis [22]. The symptoms of a stroke include sudden pare-
sis, numbness or weakness, especially on one side of the body; sudden confusion;
trouble seeing in one or both eyes; sudden trouble with walking; severe headache.
Distinguishing between brain tumour and stroke in patients with acute focal
neurologic signs and symptoms was a particular challenge before the CT era.
About 1–3% of patients were misdiagnosed at that time. Currently, CT is a
routine examination when these diseases are suspected. In the case of ischemic
stroke, the sensitivity of CT in the first 24 h is low, the percentage of correct
diagnoses is estimated at 60%, but it increases in the following days, reaching
practically 100% after 7 days from the onset of symptoms. In CT, a brain tumour
may be distinguished from a stroke by the presence of multiple lesions, mass effect,
lack of a vascular distribution and contrast enhancement. However, some overlap
does exist [13]. In some cases, the CT scan result is ambiguous – the radiological
picture does not correlate with the clinical diagnosis. It also happens that the
oedema is represented by a “finger-like” shape in the CT image and raises the suspi-
cion of a tumour. This is a special form of stroke called “tumour mask of ischemic
stroke” [23]. In the event of sudden paresis, the possibility of a brain tumour
should also be considered. The neurological symptoms in this case are due to
the presence of an abnormal “mass” in the brain. The presence of a pathological
mass causes the appearance of characteristic symptoms in CT that belong to the
criteria for the diagnosis of the proliferative process (presence of an area with an
abnormal density, swelling, reduction of intracranial reserve). The above image
of a radiological examination may, however, occur not only in the case of the
proliferative process and often raises doubts [22]. The problem of discriminating
between stroke and tumour is also discussed in [8] where gliomas are identified
as one of the factors that may mimic a stroke in CT, which may lead to mis-
diagnosis. To avoid this, a special protocol was developed which, in addition to
the CT examination, includes medical history of the patient, neurological symp-
toms, blood pressure, ECG, blood tests as well as other imaging methods like
chest X-Ray. Also, in [18] other nonischemic conditions that mimic the presence
of stroke were analysed. It was demonstrated that the application of a multimodal
CT protocol (including, besides the standard CT, also contrast-enhanced CT, CT
angiography and CT perfusion), due to its high specificity in the diagnosis of
stroke mimics, is able to support the clinician in deciding to avoid revascularization
treatment in patients diagnosed with stroke mimics.
Thus, the distinction between stroke and tumour visualised in CT images is
not always straightforward, even for experienced radiologists (especially since the
other observed clinical symptoms of these two conditions may be similar). In [13]
the retrospective study made on 224 patients with brain tumour demonstrated
that 4.9% of them were initially misdiagnosed as stroke. The conclusion from this
study is that CT is frequently employed for acute neurologic deficits to exclude
intracranial haemorrhage, but CT may not always be sufficient to exclude brain
tumour. For these reasons, there is a real need to develop a system that will
support the diagnosis of these two brain diseases.
There are several reports in the literature on the use of computer techniques
for the diagnosis of tumour and stroke separately.
Fahmi [4] developed the automatic detection of a brain tumour in computed
tomography (CT) images. After thresholding, the image features were deter-
mined on the images, and then the “zoning” method was applied, which facili-
tates the extraction of features by dividing the image into zones. As a result,
feature vectors are obtained that can be used for classification. LVQ (Learning
Vector Quantization) was used – a machine learning technique that combines
supervised training with a competitive layer. By combining the zoning algorithm
with learning vector quantization, they classified brain CT images into
healthy and diseased with an average accuracy of 85%.
Another example of brain tumour detection was presented by Nanthagopal
and Rajamony [14]. The method they use is based on regions of interest. Seg-
mentation is performed using the Ncut method. It divides the image into smaller
fragments maintaining the continuity of the edges. Another approach is to use
the fuzzy c-means clustering method. This technique assigns each sample to a
cluster based on similarity. In this way, an abnormal region can be distinguished
from a normal one. Then an analysis of the texture characteristics is performed.
The considered features include the mean and variance of the pixel
matrix, as well as the energy or entropy of the entire studied area. The features
are determined with the use of the wavelet transform. Classification is done using
the support vector machine (SVM), which is an example of supervised learning.
This solution uses a non-linear fit. The greatest efficiency was achieved using a
Gaussian RBF kernel.

Ramakrishnan [21] also presented how to classify brain CT images for brain
tumour detection. The classification process proposed in this approach is carried
out by Support-vector Machine (SVM) with various kernel functions and opti-
mization techniques followed by a segmentation process carried out by Modified
Region Growing (MRG) with threshold optimization. Algorithms such as Har-
mony Search (HS), Evolutionary Programming (EP) and Gray Wolf Optimiza-
tion (GWO) were used in threshold optimization. The entire approach was imple-
mented on the MATLAB platform. The obtained experimental results showed
that the proposed methodology (MRG-GWO) is characterized by high accuracy
and increased tumour detection compared to the other two techniques, achieving
an accuracy of 99.05%.
Qasem [19] used MRI scans, a watershed technique (which enabled feature
extraction) and machine learning to detect brain tumours. They decided to use
MRI scans instead of computed tomography images because the examination is
less harmful. The applied segmentation ensured appropriate identification of
regions (both foreground and background) with minor calculations. It also solved
the problem with wrong edges and distorted borders in images. The algorithm
of k-nearest neighbours was used for the classification.
Šušteršič [28] used active contour methods to evaluate the optimal trajectory
of the approach to a brain tumour. In this case, segmented CT images were used.
For this purpose, a method was used based on active contour models, considering
the following parameters: the complexity of the initial conditions, the accuracy
of the recognition of the tumour surface, computational time. The disadvantages
of this method are the strong dependence on the initial curve position and the
high sensitivity to image noise.
One of the methods for the automatic detection of stroke in CT images was devel-
oped by Chawla [1]; it is based on the detection of abnormalities between
the left and right hemispheres of the brain. It uses image histograms, which differ
significantly in the event of a stroke. The method used ensures 90% accuracy
and 100% sensitivity in detecting abnormalities at the patient’s level, while at
the layer level it achieves an average precision of 91% and a sensitivity of 90%.
A different approach was proposed by Dourado [3], in which to classify strokes
from CT images deep learning was applied. The convolutional neural network
(CNN) to identify a healthy brain, ischemic stroke or haemorrhagic stroke was
implemented. The designed CNN network was combined with various consoli-
dated machine learning methods such as Bayesian classifier, multi-layer percep-
tron, k-Nearest Neighbour, Random Forest, and Support-vector Machine.
Kalmutskiy [10] focused on a method for the automatic recognition
of strokes using non-contrast computed tomography brain images.
The distinguishing features of this solution are the use of a data set with a very
small number of images, as well as the fact that traditional methods of computer
vision and a convolutional neural network were applied to solve the problem.
Augmentation techniques and sub-image division have also been implemented
to increase the analysed dataset.

Nedel’ko [15] also presented a method that allows the detection of strokes
from non-contrast CT. The technique used deep neural networks
and classifiers based on image texture properties. They applied the basic U-
net neural network architecture and Haralick texture features extracted from
images for the following classifiers: SVM, Adaboost, KNN and Random Forest.
This method provides information that makes it possible to
recognize contours and to evaluate the importance of texture features. Reported
accuracy was 83%.
Ostrek [17] addressed the diagnosis of strokes based on computed tomography
images. Eight non-stroke cases and thirty-two stroke cases were analyzed. The
method was based on automatic CT image segmentation, multiscale analysis,
expression and visualization of semantic maps, as well as automatic patch-based
analysis. The following results were obtained: accuracy 74.6%, sensitivity 64.2%,
specificity 82.6%.
From the above literature review it appears that there are solutions for detect-
ing stroke and brain tumour separately, but no methods that distinguish the two
diseases. This may mean that the need to distinguish them simultaneously has
not been widely identified in the neurology community. On the other hand, a
team of neurologists and radiologists from the Department of Neurology and
Stroke, Medical University of Lodz, identified the difficulty in discriminating
between these two diseases as an important diagnostic problem. Therefore, this
paper presents a system that distinguishes a stroke from a brain tumour
on CT images based on the analysis of texture features, which will support doc-
tors in making a correct diagnosis. No additional medical imaging techniques
(such as accurate but expensive MRI, CTA or CTP) will be needed to identify
these diseases, saving patients time and money. It has already been demonstrated
that image texture reflects the structure and properties of visualized organs and
tissues for almost all medical imaging modalities, e.g. ultrasound [2,16],
MRI [5,24] or CT [14,15]. Thus, the texture analysis approach was selected to
solve the problem discussed in this research.

2 Materials and Methods


2.1 Dataset
In this study, CT images of a stroke (Fig. 1(a)) and brain tumour (Fig. 1(b)) in
BMP file format were used. In total, 39 CT images of an ischemic early stroke
(stroke effects were already visible in the CT scan) with a size of 588 × 714 pixels,
collected from 15 patients, and 44 CT images of brain glioblastomas (grade IV)
with a size of 668 × 602 pixels, acquired from 16 patients, were analysed. The
data was acquired in Military Medical Academy Memorial Teaching Hospital
of the Medical University of Lodz – Central Veterans’ Hospital using a Siemens
syngo CT scanner (acquisition parameters: voxel size 0.6 × 0.6 × 1.0 mm, FoV 256 mm,
reconstruction kernel H31s). In each image, the region of interest (ROI) was marked
manually by two radiologists with over 20 years’ experience wherever stroke or
brain tumour was suspected. The ROI size for stroke ranged from 221
to 10936 (mean 2968) pixels, and the ROI size for brain tumour ranged from 247
to 4646 (mean 1896) pixels. Such ROI sizes ensure reliable estimation of texture
parameters.

Fig. 1. Sample images with marked ROIs: (a) stroke, (b) brain tumour

2.2 Methodology
A block diagram of the adopted methodology is shown in Fig. 2. The input of the
algorithm is the image with marked ROI that represents brain pathology. In the
first step, 337 texture features (analysis options: no ROI normalization, 8-bit
grey-level images) were computed automatically for each ROI using the Qmazda
program, with a batch file used to load images and calculate the selected
texture features. Qmazda is a software package for digital image analysis, includ-
ing the computation of shape, colour, and texture attributes of arbitrarily shaped
regions of interest. The reduction of the number of bits per pixel to 8 (the originally
acquired images are 12-bit coded) is motivated by the need to reduce noise in the
collected texture information. A larger number of bits reflects the tissue structure
in the image more faithfully; at the same time, however, the noise (always present
in medical images) is emphasized. It blurs the texture and, as a result, modifies
texture parameter values. It has been demonstrated that a lower number of bits
may improve texture classification accuracy [11,25]. The Qmazda package consists of four programs:
MaZda, MzGengui, MzReport and MzMaps [26,27]. In this study, the MzReport
program was used to select 10 features based on the texture analysis of the ROIs.
MzReport is a program for data analysis, selection of the most discriminative fea-
tures, visualization of feature vector distributions, supervised machine learning
and testing of the resulting classifiers [26]. Selection of the mentioned texture features
was done with three algorithms implemented in the MzReport program: Fisher
coefficient-based discrimination, mutual information, and convex hull based dis-
crimination. Then, based on the selected features, data classification was per-
formed using the Classification Learner application available in MATLAB R2021b.
Manual feature reduction was performed before the classification to find the most
suitable feature subsets (containing 2 or 3 texture parameters) and to avoid
classifier overtraining. This is an acceptable number of dimensions when compared
to the number of samples (83). It also guarantees a reasonable classifier size
as well as reliable classification results. The automatic feature selection meth-
ods use general criteria (like Fisher coefficient or mutual information values)
that do not always ensure the best feature subset selection. This is due to, e.g.,
possible feature correlation (Fisher approach) or non-optimal convex hull shapes
(caused by outliers). Thus, not only the features with the highest rank generated
by the feature selection methods were tested, but also subsets selected from the
best 4–5 features were analyzed. Thanks to this approach, it was possible to
obtain a better classification result (by approximately 2%). The Clas-
sification Learner trained models to classify the data into stroke and tumour cases.
To train the models, the following 32 classifiers were used: Decision Trees, Linear
and Quadratic Discriminant Analysis, Logistic Regression, Naive Bayes Clas-
sifiers, Support Vector Machines (with linear, quadratic, cubic, and Gaussian
kernels), K-Nearest Neighbour Classifiers (k = 1, 10, 100), Ensemble Classifiers
(Boosted Trees, Bagged Trees, Subspace Discriminant, Subspace KNN, RUS-
Boosted Trees), Multilayer Perceptrons (different structures), Kernel Approx-
imation Classifiers. All experiments involving the three feature selection methods
and various classifiers were conducted independently; thus, for each case the same
relation between the number of features and training samples was preserved.

Fig. 2. Block diagram of the adopted methodology
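For illustration, the pipeline of Fig. 2 can be outlined as follows. The study itself used MzReport for feature selection and the MATLAB Classification Learner for classification; the Python sketch below is only an equivalent outline, and the input file name and its layout (one row per ROI, a 'label' column) are hypothetical assumptions, not the actual Qmazda output format.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def fisher_coefficient(x, y):
    # Fisher criterion for a two-class problem:
    # F = (mean_1 - mean_2)^2 / (var_1 + var_2)
    x1, x2 = x[y == 0], x[y == 1]
    return (x1.mean() - x2.mean()) ** 2 / (x1.var() + x2.var())

data = pd.read_csv("roi_features.csv")                  # hypothetical file name
y = (data.pop("label") == "tumour").to_numpy().astype(int)
X = data.to_numpy()

# Rank all features by the Fisher coefficient; small 2-3 feature subsets
# are then chosen from the top-ranked ones, as described in the text.
scores = np.array([fisher_coefficient(X[:, j], y) for j in range(X.shape[1])])
best = np.argsort(scores)[::-1][:5]
print("top-ranked features:", data.columns[best].tolist())

# Weighted KNN and an RBF-kernel SVM stand in here for the corresponding
# Classification Learner models; 5-fold cross-validation as in the study.
for clf in (KNeighborsClassifier(n_neighbors=10, weights="distance"),
            SVC(kernel="rbf")):
    acc = cross_val_score(clf, X[:, best[:3]], y, cv=5, scoring="accuracy")
    print(type(clf).__name__, "5-fold CV accuracy: %.3f" % acc.mean())
```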

3 Results

Using three feature selection methods, sets of 10 texture parameters were
obtained. Interestingly, none of the selected features were assigned to more than
one set. That means that each set contained unique features. It is worth not-
ing that among the features selected using Fisher discriminant, most of them
were based on properties of brightness distribution (histogram features). Other
parameters were calculated from the autoregressive model, gradient map, and
histogram of oriented gradients (HOG). In the sets of features chosen by other
selection methods, mutual information, and convex hull, 90% of all were based
on the Gray Level Co-occurrence Matrix (GLCM). Features were computed for
four specified directions: 0° (horizontal), 45°, 90° (vertical), and 135°, and inter-
pixel distances in the range of 1 to 4. The remaining texture features were based
on the histogram, including the mean value and Grey-Level Run-Length Matrix
(evaluated in the vertical direction).
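As an illustration of the GLCM computation mentioned above (four directions, inter-pixel distances 1 to 4), a minimal sketch using scikit-image is given below; the study used Qmazda, so this is only an assumption-laden stand-in (the function names assume scikit-image >= 0.19).

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(roi_8bit):
    # roi_8bit: 2-D uint8 array with the ROI pixels (8-bit grey levels)
    glcm = graycomatrix(roi_8bit,
                        distances=[1, 2, 3, 4],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    # one value per (distance, angle) pair for each property
    return {prop: graycoprops(glcm, prop)
            for prop in ("contrast", "correlation", "energy", "homogeneity")}
```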
For further analysis, manual feature reduction was performed on the obtained
sets of 10 selected features. Its aim was to limit the number of features by
discarding the correlated ones and others that did not improve the classification
result. The resulting subsets contain 2 or 3 features. Such a low number of input
features simplifies the classifier architecture and avoids overtraining. In each case, the
best results were obtained for the combinations of features with highest ranks as
selected by the feature selection algorithms. For each selected subset of features,
32 different classifiers were applied with 5-fold cross-validation. For each of the 5
training experiments, similar quality measure values were observed, which means
that the classifier achieved good generalization for the investigated dataset, avoiding
overfitting. The best classification results obtained by the feature subsets selected
by three algorithms are presented in Table 1 (three best classifiers were selected
for each feature selection method). The parameters in Table 1 were evaluated as
means of the values obtained in each experiment during 5-fold cross-validation.

Table 1. Quality measure values for best classifiers applied for selected texture feature
subsets (Acc – Accuracy, Sens – Sensitivity, Spec – Specificity)

Selection method Features Classification method Acc [%] Sens [%] Spec [%]
Fisher HistPerc99 Weighted KNN 96.4 92.9 100.0
Discrimination ArmTeta2 Fine KNN 95.2 94.9 95.5
(FD) GradMean Fine Gaussian SVM 94.0 94.7 93.3
Mutual glcmZ4InvDfMom Bagged Trees 90.4 89.7 90.9
Information glcmV1AngScMom Weighted KNN 88.0 80.9 97.2
(MI) grlmVLngREmph Fine Gaussian SVM 86.7 80.9 92.3
Convex glcmH2SumVarnc Fine Tree 95.2 90.7 100.0
Hull HistMean Weighted KNN 94.0 92.5 95.3
(CH) Bagged Trees 94.0 90.5 97.6

As shown in Table 1, the features selected by all three methods can be used
to develop a system with high classification efficiency around 86% and higher,
up to over 96%. As can be seen, the best results were achieved for the Fisher
discriminant (FD) and convex hull (CH) selection methods, while the weakest were
obtained using mutual information. It is worth noting that the best results for FD-
based selection were achieved using three features, while for CH-based selection
only two were needed. It can be also observed that among all the classifica-
tion methods, the most effective were those based on classification trees, KNN
(k = 10), and SVM with Gaussian kernel. Sample feature distributions and con-
fusion matrices (generated by the MATLAB Classification Learner application)
are shown in Figs. 3, 4 and 5.
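The quality measures reported in Table 1 follow directly from the binary confusion matrices; a minimal sketch is shown below (the counts and the choice of the tumour class as positive are illustrative assumptions — the paper does not state which class was treated as positive).

```python
import numpy as np

def quality_measures(cm):
    # cm: 2x2 confusion matrix in scikit-learn layout [[TN, FP], [FN, TP]]
    tn, fp, fn, tp = cm.ravel()
    acc = (tp + tn) / cm.sum()
    sens = tp / (tp + fn)     # sensitivity: true positive rate
    spec = tn / (tn + fp)     # specificity: true negative rate
    return acc, sens, spec

cm = np.array([[38, 1],       # illustrative counts, not the study data
               [2, 42]])
print("Acc=%.3f Sens=%.3f Spec=%.3f" % quality_measures(cm))
```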

Fig. 3. (a) Distribution of two FD-based texture features and (b) confusion matrix of
the weighted 10-NN classifier

Fig. 4. (a) Distribution of two MI-based texture features and (b) confusion matrix of
the bagged trees classifier

Fig. 5. (a) Distribution of two CH-based texture features and (b) confusion matrix of
the fine tree classifier

4 Discussion and Conclusion

The performed experiments demonstrated that, for the analysed data, the Fisher-
based criterion and convex hull-based feature selection are the best methods of obtaining
features that can be used to discriminate between brain tumour and stroke in
CT images. It is worth highlighting that the second method used only two features
instead of three as in the first approach, while similar accuracy was obtained
(over 95%). Significantly lower scores were observed for the features selected by
the mutual information method. Therefore, it can be concluded that the convex hull
method selects more discriminative features that are better able to separate
images into classes.
Among all the selected features, most were based on the histogram and Gray
Level Co-occurrence Matrix (GLCM). This demonstrates that not only the tex-
ture features can be used for classification, but also the histogram ones. Inter-
estingly, the combination of these two types of features, mean brightness and
sum variance, turned out to be one of the best solutions with an accuracy of
95.2%. A higher score (96.2%) was achieved only for classification using three
features selected by the Fisher discriminant (histogram 99th percentile, autoregres-
sive model parameter, and gradient map mean). Such results indicate that dif-
ferences between brain stroke and tumour visualized in CT images are not only
caused by the structure of diseased brain tissue (reflected by the texture param-
eters) but also by different grey level distribution that characterizes analysed
types of pathologies (coded by the histogram features).

It was also demonstrated that classification methods based on decision
trees, k-nearest neighbours (10-NN), and support vector machines (SVM) gave the
best results regardless of feature set. The features selected for the classification
separated the classes very well, which was confirmed by the high accuracy of these
classifiers. What is also important, the obtained sensitivity and specificity values
exceeded 90% in almost all cases, which means a reduced risk of misclassification of
both diseases.
The accuracy of the classification turned out to be comparable with the
previously obtained results reported in the literature. Due to the lack of available
results for the discrimination of both diseases, the results of separate classification
of tumour and stroke were used for comparison. Among the classifiers of brain
tumours, the Affected Region Extraction method [19] achieved an accuracy of
slightly over 94%, while the SVM [14] reached nearly 98%. For strokes, the accuracy
values achieved in other studies were 92% for machine learning [10] and 74.6% for
the approach of [17].
A limitation of the study was certainly the limited number of CT scans
with ROIs marked by the radiologists (3 scans per patient) as well as the number
of participating patients. To confirm the obtained results, it is necessary to test
the classifiers on a larger set of images. In addition, it would be worth comparing
the results obtained on images from various CT scanners. Particular attention
should be paid to the histogram features, to check whether they remain efficient
in multicentre studies.
Discrimination between stroke and brain tumour based on the texture anal-
ysis of CT images is an appealing solution that has a chance to be implemented
in the future after further extensive testing on larger datasets. The obtained
preliminary results are promising and give a chance to eliminate the need for
complementary imaging methods, e.g., MRI, if there is a difficulty in classifying
the disease. Further work will be focused on improving the discrimination accu-
racy. ROI normalization will be performed to check whether the grey-level distri-
bution is still a source of significant differences between the two analysed brain
pathologies. Additionally, if it is possible to collect a much larger data set, a deep
learning approach will be implemented to improve the classification accuracy. It
is also planned to implement automatic segmentation of the discussed pathologies,
which allows calculating quantitative information about detected structures
in digital images [9], such as volumes or shape parameters. Also, 3D texture
analysis will be performed in this case to verify whether the spatial informa-
tion about the tissue structure will be useful in discriminating between the two
brain pathologies.

Acknowledgement. We would like to thank Professor Andrzej Klimek, former
head of the Department of Neurology and Strokes, Medical University of Lodz, who
initiated this study and provided CT data for analysis.

References
1. Chawla, M., Sharma, S., Sivaswamy, J., Kishore, L.T.: A method for automatic
detection and classification of stroke from brain CT images. In: Proceedings of the
31st Annual International Conference of the IEEE Engineering in Medicine and
Biology Society: Engineering the Future of Biomedicine, EMBC 2009, pp. 3581–
3584 (2009). https://doi.org/10.1109/IEMBS.2009.5335289
2. Chrzanowski, L., Drozdz, J., Strzelecki, M., Krzeminska-Pakula, M., Jedrzejewski,
K.S., Kasprzak, J.D.: Application of neural networks for the analysis of intravas-
cular ultrasound and histological aortic wall appearance - an in vitro tissue char-
acterization study. Ultrasound Med. Biol. 34, 103–113 (2008). https://doi.org/10.
1016/J.ULTRASMEDBIO.2007.06.021
3. Dourado, C.M., da Silva, S.P.P., da Nóbrega, R.V.M., Antonio, A.C., Filho, P.P.,
de Albuquerque, V.H.C.: Deep learning IoT system for online stroke detection in
skull computed tomography images. Comput. Netw. 152, 25–39 (2019). https://
doi.org/10.1016/J.COMNET.2019.01.019
4. Fahmi, F., Apriyulida, F., Nasution, I.K., Sawaluddin: Automatic detection of
brain tumor on computed tomography images for patients in the intensive care
unit. J. Healthc. Eng. 2020 (2020). https://doi.org/10.1155/2020/2483285
5. Gentillon, H., Stefańczyk, L., Strzelecki, M., Respondek-Liberska, M.: Parameter
set for computer-assisted texture analysis of fetal brain. BMC Res. Notes 9, 1–
18 (2016). https://doi.org/10.1186/S13104-016-2300-3/TABLES/2. https://link.
springer.com/articles/10.1186/s13104-016-2300-3
6. Ghosh, M.K., Chakraborty, D., Sarkar, S., Bhowmik, A., Basu, M.: The interre-
lationship between cerebral ischemic stroke and glioma: a comprehensive study of
recent reports. Sign. Transduct. Target. Ther. 4(1), 1–13 (2019). https://doi.org/
10.1038/s41392-019-0075-4
7. Gośliński, J.: Nowotwory układu nerwowego - przyczyny i rodzaje - zwrotnikraka.pl
(2019). https://www.zwrotnikraka.pl/przyczyny-rodzaje-guzow-mozgu
8. Hatzitolios, A., et al.: Stroke and conditions that mimic it: a protocol secures a
safe early recognition. Hippokratia 12(2), 98 (2008)
9. Janowski, P., Strzelecki, M., Brzezinska-Blaszczyk, E., Zalewska, A.: Computer
analysis of normal and basal cell carcinoma mast cells. Med. Sci. Monit. 7(2),
260–265 (2001)
10. Kalmutskiy, K., Tulupov, A., Berikov, V.: Recognition of tomographic images in
the diagnosis of stroke. Lecture Notes in Computer Science (including subseries
Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol.
12665 LNCS, pp. 166–171 (2021). https://doi.org/10.1007/978-3-030-68821-9_16
11. Kociolek, M., Strzelecki, M., Obuchowicz, R.: Does image normalization and inten-
sity resolution impact texture classification? Comput. Med. Imaging Graph. 81,
101716 (2020). https://doi.org/10.1016/j.compmedimag.2020.101716
12. Kłos-Wojtczak, P.: Mózg człowieka - jaka jest jego budowa i funkcje? hellozdrowie
(2019). https://www.hellozdrowie.pl/mozg-czlowieka-anatomia-i-fizjologia-
organu/
13. Morgenstern, L.B., Frankowski, R.F.: Brain tumor masquerading as stroke. J. Neu-
rooncol. 44(1), 47–52 (1999). https://doi.org/10.1023/A:1006237421731
14. Nanthagopal, A.P., Rajamony, R.S.: A region-based segmentation of tumour from
brain CT images using nonlinear support vector machine classifier. J. Med.
Eng. Technol. 36, 271–277 (2012). https://doi.org/10.3109/03091902.2012.682638.
https://pubmed.ncbi.nlm.nih.gov/22621242/
15. Nedel’ko, V., Kozinets, R., Tulupov, A., Berikov, V.: Comparative analysis of deep
neural network and texture-based classifiers for recognition of acute stroke using
non-contrast CT images. In: Proceedings - 2020 Ural Symposium on Biomedical
Engineering, Radioelectronics and Information Technology, USBEREIT 2020, pp.
376–379 (2020). https://doi.org/10.1109/USBEREIT48449.2020.9117784
16. Obuchowicz, R., Kruszyńska, J., Strzelecki, M.: Classifying median nerves in carpal
tunnel syndrome: ultrasound image analysis. Biocybern. Biomed. Eng. 41(2), 335–
351 (2021). https://doi.org/10.1016/j.bbe.2021.02.011
17. Ostrek, G., Przelaskowski, A.: Automatic early stroke recognition algorithm in
CT images. Lecture Notes in Computer Science (including subseries Lecture Notes
in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7339 LNBI,
pp. 101–109 (2012). https://doi.org/10.1007/978-3-642-31196-3_11. https://link.
springer.com/chapter/10.1007/978-3-642-31196-3_11
18. Prodi, E., et al.: Stroke mimics in the acute setting: role of multimodal CT protocol.
Am. J. Neuroradiol. (2022). https://doi.org/10.3174/ajnr.A7379
19. Qasem, S.N., Nazar, A., Qamar, A., Shamshirband, S., Karim, A.: A learning based
brain tumor detection system. Comput. Mater. Cont. 59, 713–727 (2019). https://
doi.org/10.32604/CMC.2019.05617
20. Raciborski, F., Gawińska, E., Kłak, A., Słowik, A., Wnuk, M.: Udary mózgu:
rosnący problem w starzejącym się społeczeństwie. Instytut Ochrony Zdrowia
(2016)
21. Ramakrishnan, T., Sankaragomathi, B.: A professional estimate on the com-
puted tomography brain tumor images using SVM-SMO for classification
and MRG-GWO for segmentation. Patt. Recogn. Lett. 94, 163–171 (2017).
https://doi.org/10.1016/J.PATREC.2017.03.026. https://dl.acm.org/doi/abs/10.
1016/j.patrec.2017.03.026
22. S.M. (red.): Radiologia - diagnostyka obrazowa, część II. Akademia Medyczna w
Gdańsku (2001)
23. Sobotko-Waszczeniuk, O., Łukasiewicz, A., Pyd, E., Janica, J.R., Łebkowska, U.:
Differentiation of density of ischaemic brain tissue in computed tomography with
respect to neurological deficit in acute and subacute period of Ischaemic stroke.
Polish J. Radiol. 74(3) (2009)
24. Strzelecki, M.: Texture boundary detection using network of synchronised oscilla-
tors. Electron. Lett. 40, 466–467 (2004). https://doi.org/10.1049/EL:20040330
25. Strzelecki, M., Kociolek, M., Materka, A.: On the influence of image features
wordlength reduction on texture classification. In: International Conference on
Information Technologies in Biomedicine, pp. 15–26. Springer (2018). https://doi.
org/10.1007/978-3-319-91211-0_2
26. Szczypinski, P.M., Klepaczko, A., Kociolek, M.: Qmazda - software tools for image
analysis and pattern recognition, pp. 217–221. IEEE Computer Society (2017).
https://doi.org/10.23919/SPA.2017.8166867
27. Szczypiński, P.M.: qmazda manual (2020). http://www.eletel.p.lodz.pl/pms/
Programy/qmazda.pdf
28. Šušteršič, T., Peulić, M., Filipović, N., Peulić, A.: Application of active contours
method in assessment of optimal approach trajectory to brain tumor. In: 2015
IEEE 15th International Conference on Bioinformatics and Bioengineering, BIBE
2015 (2015). https://doi.org/10.1109/BIBE.2015.7367661
The Influence of Textural Features
on the Differentiation of Coronary Vessel
Wall Lesions Visualized on IVUS Images

Weronika Malek1(B), Tomasz Roleder2, and Elżbieta Pociask1

1 Faculty of Electrical Engineering, Automatics, Computer Science
and Biomedical Engineering, AGH University of Science and Technology,
Mickiewicza 30 Av., 30-059 Kraków, Poland
wmalek@student.agh.edu.pl, elzbieta.pociask@agh.edu.pl
2 Regional Specialist Hospital, Research and Development Center,
ul. H. Kamieńskiego 73a, 51-124 Wroclaw, Poland

Abstract. Distinguishing atherosclerotic plaque from other image ele-
ments is crucial for patient diagnosis and treatment. Due to the similar
echogenicity of soft (lipid) plaques and blood, there is a possibility of
misinterpreting the size of the vessel lumen, which is a key element in
correct stent selection. In this study, by means of texture analysis of
the IVUS image area in which the plaque is located, plaque was distin-
guished from the lumen of the vessel in which blood is present. qMaZda
software was used for this as it allows extensive analysis of image tex-
ture. The analysis was performed for features that were derived from
the brightness of the image. Subsequently, the focus was on statistical
analysis of the obtained features. The texture features that allow differ-
entiation between blood in the lumen area and plaque were determined
using Pearson’s and Spearman’s correlation. In order to obtain good clas-
sification accuracy, classification of the values of the selected parameters
was performed using logistic regression. The results were presented using
a confusion matrix and a plot of logistic regression. The texture features
that are capable of differentiating plaque from blood in IVUS images
were identified.

Keywords: Textural analysis · Plaque · Image texture · IVUS

1 Introduction
The rush of life, poor diet and stress cause many types of disease that affect
human health. One of them is atherosclerosis, which causes lipids to accumulate
in the walls of blood vessels. Development of atherosclerosis affects the release
of growth factors and cytokines that cause inflammation in blood vessels. The
first lesion is a fatty band which contains T lymphocytes and foam cells in its
structure [1]. The disease can progress, thus causing this fatty band to turn
into an atherosclerotic plaque. There are different types of plaques with dif-
ferent echogenicity, hence they are visible and distinguishable on IVUS images.
Intravascular ultrasound is a method that uses ultrasound waves to image struc-
tures with different acoustic impedance. The waves are reflected at the borders
of media with different impedances; this causes them to return to the receiver,
where they are converted into an electrical signal from which a gray-scale image
can be constructed. In the case of atherosclerotic plaques, ultrasound waves are
least reflected by soft plaques, which mainly consist of lipids [2] and are unsta-
ble and prone to rupture. As a result, they may form thrombi that cause partial
or complete occlusion of a blood vessel, which leads to myocardial necrosis or
infarction. One way to treat this is to enlarge the lumen of the closing vessel
by inserting a stent. Different types of mechanisms and types of stents are used
to increase vessel diameter. Depending on the generation of the stent, the abil-
ity to show them after insertion into the vessel varies. Second-generation stents
allow IVUS imaging before and after the procedure. Imaging is needed to deter-
mine the mechanisms of vessel lumen expansion after the use of an intracoronary
stent. IVUS image analysis is based on the analysis of individual elements, such
as the vessel lumen or EEM. Studies have shown that the stenting method is
effective, because patients who have undergone procedures visualized
with IVUS show plaque reduction, plaque redistribution, vessel dilation and dis-
section, depending on the device and plaque composition [3].
The selection of an appropriate type of stent is possible by determining the
diameter of the lumen of the vessel, e.g., from IVUS images. Stent size is deter-
mined by measuring the media-to-media (or maximum outer diameter of the
elastic membrane) orthogonally within the coronary lesion. The stent size is
read as the average of these measurements and the method is considered aggres-
sive IVUS [4]. Stent sizing can also be done in a traditional manner (traditional
IVUS method). In this case, the stent size is determined by the greater diameter
of the proximal and distal reference lumen of the coronary arteries. However,
studies have shown that stent sizing by these methods is feasible, safe, and has
good results when it comes to treating atherosclerosis [4]. The problem, how-
ever, is the blood in the vessel lumen, which has an echogenicity similar to that
of soft plaque. Thus, if blood and soft plaques are in close proximity, automatic
segmentation of the plaque and vessel lumen is very difficult and stent sizing can
be misleading.
IVUS results usually consist of many thousands of frames on which the vessel
lumen and the plaque boundary must be manually outlined. Automatic segmen-
tation of these structures is an important and key element in the development
of better and faster methods of analyzing IVUS results [5]. However, automatic
segmentation is still an unsolved problem due to the similar echogenicity of the
above-mentioned structures, such as blood and soft plaques. Research on the
texture of image elements shows that this problem could be solved. Textural
analysis that focuses on the texture of an image could be a solution to improve
automatic segmentation. A texture is an area of an image that we consider to
be homogeneous [6]. Analysis of medical images strives in many cases to classify
given parts of an image into the appropriate class to be able to distinguish the
parts of interest. The pixel parameters of different image fragments differ. With
pixel classification [7] and subsequent segmentation of the image, it is possible to
assign fragments of the image to appropriate classes based on pixel similarity.
Segmentation of IVUS images is difficult. Algorithms developed for this segmen-
tation, such as semi-automatic algorithms, which may allow better results, often
have drawbacks. For example, semi-automatic algorithms need manual interac-
tion, while other algorithms use too much memory or are very time consuming [8].
J. Tong et al. detected the lumen contour based on textural features and sparse
kernel coding [5]. The features they tested are based on first-order statistics
and feature calculation for the gray-level co-occurrence matrix for 4 angles. The
results they obtained are satisfactory but do not show the effect of normalization
or quantization [9] on the improvement or deterioration of classification [5].
M. Kociolek et al. indicated that normalization and its effect on improving
classification are affected by the algorithm used [10]. Although their study was
performed on MRI images, it is important to see if the same relationship applies
to images obtained with IVUS. P. Mazur pointed out that quantization also
affects some elements by influencing the values of correlation coefficients [11].
Thus, this is another aspect to be considered in the analysis of plaque and blood
textures in IVUS images.
Subsequent analyses were carried out using first-order statistical methods,
the Haralick method, the Laws texture energy method, the neighborhood
gray-tone difference matrix (NGTDM) method, and the texture spectrum method.
Classification of textural features was carried out using the pattern recognition
method, and validation of the results was performed using resubstitution and
cross-validation methods. The results showed that texture analysis is possible
and the best classification is done on texture features obtained from the Haralick
method [12].
Another example of a textural analysis approach is the use of MaZda soft-
ware [13,14]. MaZda is software that focuses on different algorithms and different
color models and allows the analysis of regions of interest for more than 10,000
features. Although this software does not focus on all possible texture parame-
ters, it is a good tool to initially discern between texture parameters that have
the potential to differentiate between plaques and blood in the vessel lumen.
Additional parameters such as the GDTM, for example, may be considered in
later stages of the analysis [15].

2 Materials and Methods

This study was performed on IVUS intravascular ultrasound images obtained
with the TVC Imaging System™ and the TVC Insight catheter (InfraReDx,
MA). The tip of the catheter was placed at least 10 mm distal from the imaged
target lesion. The catheter was then automatically withdrawn at 0.5 mm/s (240
rpm) until the TVC catheter exited into the guide catheter. Images were acquired
in grayscale. The specialists then contoured the vessel lumen and vessel wall using
the CAAS Intravascular 2.0 system. The images were saved in DICOM format.
All the data came from a single patient, and images of 30 frames were selected
in which the atherosclerotic plaque and blood in the lumen of the vessel were
visible. Although the features of the plaque area or the vessel lumen were not
considered in this analysis, images were selected so that the area of these regions
was as large as possible so as to allow better comparison of data obtained from
the textural analysis.
The first stage of data preparation was to superimpose the contours of the
vessel lumen and the vessel wall, which were described for each DICOM image
frame as coordinates. The Python programming language version 3.9.10 64-bit
was used for this. To perform statistical analysis and logistic regression, the fol-
lowing Python libraries were used: Scikit-learn [16], Matplotlib [17], Seaborn [18],
NumPy [19], Pandas [20], SciPy [21], CSV. The coordinates were rounded to
integer values and then binary masks were created from them (Fig. 1).

Fig. 1. Binary masks
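A minimal sketch of this mask-preparation step is shown below; skimage.draw.polygon rasterizes the area enclosed by a contour. The contour coordinates and the frame size are hypothetical, not taken from the study data.

```python
import numpy as np
from skimage.draw import polygon

def contour_to_mask(rows, cols, shape):
    # rasterize a closed contour (integer vertex coordinates) into a binary mask
    mask = np.zeros(shape, dtype=bool)
    rr, cc = polygon(rows, cols, shape=shape)
    mask[rr, cc] = True
    return mask

# hypothetical contours; in the study they came from the DICOM annotations
lumen_r, lumen_c = np.array([100, 150, 200, 150]), np.array([150, 200, 150, 100])
wall_r, wall_c = np.array([80, 170, 230, 170]), np.array([170, 240, 170, 90])

lumen_mask = contour_to_mask(lumen_r, lumen_c, (512, 512))
wall_mask = contour_to_mask(wall_r, wall_c, (512, 512))
plaque_mask = wall_mask & ~lumen_mask   # plaque ROI: wall interior minus lumen
```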

Binary masks were applied to 30 selected IVUS images and then prepared so
that only the plaque and lumen of the vessel were visible on the image without
including adventitia (Fig. 2).

Fig. 2. Prepared image for textural analysis with selected ROI of plaque (green) and
lumen (red)

The obtained images were entered into qMaZda software, and the area
belonging to the plaque and the area containing blood were manually marked.
Textural analysis was performed for the two regions of interest in qMaZda soft-
ware. 10,477 textural features were obtained for 30 images for the plaque and the
blood. IVUS images carry information primarily in brightness; hence, the features
considered later are based only on the brightness component, marked as Y in the
YUV model. Thus, the number of features was reduced to 6,770.
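Assuming the qMaZda output is exported as a table with one column per feature (a hypothetical layout, not the actual export format), this reduction amounts to keeping the columns whose names start with the Y component prefix:

```python
import pandas as pd

features = pd.read_csv("qmazda_features.csv")   # hypothetical export, 10,477 columns
y_features = features[[c for c in features.columns if c.startswith("Y")]]
print(y_features.shape)                         # ~6,770 brightness-based features
```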
The next step was to find the features whose values differed most between the two
regions. Statistical methods were used for this. Statistical tests were performed
for paired data, which meant that the feature determined for the atherosclerotic
plaque was compared with the values of the same feature determined for blood.
The Shapiro-Wilk test was performed to determine the data distribution of the
values obtained for each feature. A distinction was made between data that came
from a normal distribution and those that came from a non-normal distribution.
For features in which the data for plaque and blood were distributed differently,
it was considered that they would be analyzed as features from a non-normal
distribution.

3 Experimental Results
The Mann-Whitney test and the Student’s t-test for dependent samples were
conducted for data from a non-normal distribution and data from a normal
distribution, respectively. This determined whether the paired data came from
the same population or not. Spearman’s and Pearson’s correlations were then used
to find the features that differed most between plaque and blood. Table 1 shows the
lowest values of both correlations for statistically significant data, which indicates
the biggest difference between plaque and blood.
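The statistical screening described above can be sketched with SciPy as follows; the two input arrays (one value per image for each region) are synthetic stand-ins for the qMaZda output, and the test choice mirrors the procedure stated in the text.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
plaque = rng.normal(0.5, 0.1, 30)    # one feature value per image, plaque ROI
blood = rng.normal(0.4, 0.1, 30)     # the same feature, blood ROI

# 1) Shapiro-Wilk normality test for each sample
normal = (stats.shapiro(plaque).pvalue > 0.05 and
          stats.shapiro(blood).pvalue > 0.05)

# 2) comparison and correlation, chosen according to the distribution
if normal:
    test = stats.ttest_rel(plaque, blood)        # paired t-test
    corr, _ = stats.pearsonr(plaque, blood)
else:
    test = stats.mannwhitneyu(plaque, blood)     # as stated in the text
    corr, _ = stats.spearmanr(plaque, blood)
print("p=%.4f, correlation=%.2f" % (test.pvalue, corr))
```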
Due to the large number of features that MaZda presents, the program has
appropriate labels that allow the features to be recognized more easily. A feature
name consists of a color component, normalization and quantization, a feature
extraction algorithm, a feature, and a feature name abbreviation. An example
feature name is YD8GlcmZ5Correlat, which indicates a feature whose color com-
ponent is brightness; the image is analyzed without normalization using 8 bits
for grayscale encoding. Glcm indicates that the gray level co-occurrence matrix
algorithm is used. Z5 indicates the encoding direction and distance between pix-
els (in this case (5,−5,0)). ‘Correlat’ indicates one of the characteristics of Glcm.
The other feature names are presented in the same way. Features whose calcu-
lation was based on the area of the marked region of interest were discarded
because the fields differed in size for each image analyzed. Thus, the results that
would have been obtained might have been unreliable. More than one normal-
ization is available. S means that the mean value μ and standard deviation σ of
gray-levels are computed. The range for further computation is <μ−3σ, μ+3σ>.
N means that the area gray-level histogram percentiles are computed. The new
range is defined by the first and ninety-ninth percentiles <p1, p99>. Finally, M
shows the minimum and maximum gray-levels found in the region of interest
which define a new range [13].
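A sketch of these three normalization modes, followed by quantization to a given number of bits, might look as follows; it mirrors the description above, not the actual MaZda source code (the handling of the no-normalization case D is an assumption).

```python
import numpy as np

def normalize_and_quantize(gray, mode, bits=8):
    g = gray.astype(float)
    if mode == "S":                          # mean +/- 3 standard deviations
        lo, hi = g.mean() - 3 * g.std(), g.mean() + 3 * g.std()
    elif mode == "N":                        # 1st-99th percentile range
        lo, hi = np.percentile(g, [1, 99])
    elif mode == "M":                        # min-max range of the ROI
        lo, hi = g.min(), g.max()
    else:                                    # "D": assumed full input range
        lo, hi = 0.0, 2 ** bits - 1
    levels = 2 ** bits
    q = np.clip((g - lo) / (hi - lo + 1e-12) * (levels - 1), 0, levels - 1)
    return q.astype(np.uint16)
```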

Table 1. Features from the normal distribution and their Pearson’s (P) or Spearman’s
(S) coefficient values

Feature Coefficient value


YN8GlcmZ5Correlat −0.43 (P)
YM8GlcmZ5Entropy −0.50 (S)
YS7DwtHaarS4LH −0.51 (S)
YN8GlcmV1DifVarnc −0.45 (S)
YN6Gab8V4Mag −0.55 (S)

The values of Pearson’s and Spearman’s coefficients are negative. The fea-
tures were checked for the possibility of classifying regions of interest based on
them. Logistic regression was used for this purpose. A value of 1 was assigned
to values belonging to plaque; a value of 0 was assigned to values belonging to
blood. The data set was divided such that 70% of the data was the training set
and 30% of the data was the test set. A logistic regression plot was then cre-
ated for each feature (Fig. 3). The confusion matrix for each feature showed the
effectiveness of the classification. The results from each matrix were collected in
Table 2.
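A minimal sketch of this per-feature logistic regression step (scikit-learn, 70/30 split, labels 1 = plaque and 0 = blood) is shown below; the feature values are synthetic stand-ins for the qMaZda output.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
feature = np.concatenate([rng.normal(0.6, 0.1, 30),    # plaque values
                          rng.normal(0.4, 0.1, 30)])   # blood values
labels = np.array([1] * 30 + [0] * 30)

X_tr, X_te, y_tr, y_te = train_test_split(
    feature.reshape(-1, 1), labels, test_size=0.3, random_state=0)

model = LogisticRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)
print(confusion_matrix(y_te, pred))
print("accuracy: %.2f" % accuracy_score(y_te, pred))
```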

Table 2. Results obtained from logistic regression

Feature Accuracy Correctly classified plaque Correctly classified blood Type 1 error Type 2 error
YN8GlcmZ5Correlat 0.83 7 8 2 1
YM8GlcmZ5Entropy 0.89 7 9 1 1
YS7DwtHaarS4LH 0.83 5 10 1 2
YN8GlcmV1DifVarnc 0.89 7 9 1 1
YN6Gab8V4Mag 0.83 7 8 2 1

To determine the effect of normalization and the number of coding bits on
the quality of the feature used for classification, correlation values were read
without considering statistical tests. Results are presented for features whose
values were coded in the same directions. When comparing, the results were
grouped according to the algorithms and their features. The normalization names
were taken from the MaZda software (Tables 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12).
The results of the logistic regression are also presented as graphs (Figs. 3, 4,
5, 6 and 7). The graphs show the feature before image normalization (D) and the
statistically best feature after normalization.
The results of segmentation based on chosen texture parameters (texture
feature maps [22]) are presented in Fig. 8.
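A texture feature map of this kind can be sketched by evaluating a texture parameter in a sliding window so that every pixel receives a feature value; local entropy from scikit-image is used below only as a stand-in for the features actually selected in the study.

```python
import numpy as np
from skimage.filters.rank import entropy
from skimage.morphology import disk

def entropy_map(gray_8bit, radius=7):
    # gray_8bit: uint8 IVUS frame; returns a per-pixel local-entropy image
    return entropy(gray_8bit, disk(radius))

frame = np.random.default_rng(2).integers(0, 256, (128, 128)).astype(np.uint8)
fmap = entropy_map(frame)   # thresholding/clustering fmap yields a segmentation
```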

Table 3. Results obtained for different normalizations for GlcmCorrelat

Feature Pearson’s coefficient Accuracy


YN8GlcmZ5Correlat −0.43 0.83
YD8GlcmZ5Correlat −0.52 0.78
YS8GlcmZ5Correlat −0.32 0.67
YM8GlcmZ5Correlat −0.50 0.78

Table 4. Results obtained for different bits for the statistically selected feature for
GlcmCorrelat

Feature Pearson’s coefficient Accuracy


YN8GlcmZ5Correlat −0.43 0.83
YN7GlcmZ5Correlat −0.43 0.83
YN6GlcmZ5Correlat −0.43 0.83
YN5GlcmZ5Correlat −0.43 0.83
YN4GlcmZ5Correlat −0.42 0.83

Table 5. Results obtained for different normalizations for GlcmEntropy

Feature Spearman’s coefficient Accuracy


YM8GlcmZ5Entropy −0.50 0.89
YD8GlcmZ5Entropy −0.47 0.78
YS8GlcmZ5Entropy −0.27 0.89
YN8GlcmZ5Entropy −0.25 0.83

Table 6. Results obtained for different bits for the statistically selected feature for
GlcmEntropy

Feature Spearman’s coefficient Accuracy


YM8GlcmZ5Entropy −0.50 0.89
YM7GlcmZ5Entropy −0.35 0.89
YM6GlcmZ5Entropy −0.17 0.89
YM5GlcmZ5Entropy −0.16 0.89
YM4GlcmZ5Entropy −0.17 0.89

Table 7. Results obtained for different normalizations for GlcmDifVarnc

Feature Spearman’s coefficient Accuracy


YN8GlcmV1DifVarnc −0.45 0.89
YD8GlcmV1DifVarnc 0.59 0.39
YS8GlcmV1DifVarnc 0.28 0.56
YM8GlcmV1DifVarnc 0.71 0.50

Table 8. Results obtained for different bits for the statistically selected feature for
GlcmDifVarnc

Feature Spearman’s coefficient Accuracy


YN8GlcmV1DifVarnc −0.45 0.89
YN7GlcmV1DifVarnc −0.45 0.89
YN6GlcmV1DifVarnc −0.45 0.89
YN5GlcmV1DifVarnc −0.45 0.89
YN4GlcmV1DifVarnc −0.42 0.89

Table 9. Results obtained for different normalizations for GabMag

Feature Spearman’s coefficient Accuracy


YN6Gab8V4Mag −0.54 0.83
YD6Gab8V4Mag 0.50 0.67
YS6Gab8V4Mag 0.46 0.67
YM6Gab8V4Mag 0.63 0.67

Table 10. Results obtained for different bits for the statistically selected feature for GabMag

Feature Spearman’s coefficient Accuracy


YN6Gab8V4Mag −0.54 0.83
YN5Gab8V4Mag −0.51 0.83
YN4Gab8V4Mag −0.47 0.83
YN8Gab8V4Mag −0.54 0.83
YN7Gab8V4Mag −0.54 0.83

Table 11. Results obtained for different normalizations for DwtHaar

Feature Spearman’s coefficient Accuracy


YS7DwtHaarS4LH −0.51 0.83
YD7DwtHaarS4LH −0.35 0.76
YN7DwtHaarS4LH −0.46 0.83
YM7DwtHaarS4LH −0.41 0.76

Table 12. Results obtained for different bits for the statistically selected feature for
DwtHaar

Feature Spearman’s coefficient Accuracy


YS7DwtHaarS4LH −0.51 0.83
YS6DwtHaarS4LH −0.49 0.65
YS5DwtHaarS4LH −0.47 0.65
YS4DwtHaarS4LH −0.44 0.65
YS8DwtHaarS4LH −0.47 0.83

Fig. 3. Logistic regression for GLCM-Correlat

Fig. 4. Logistic regression for GLCM-Entropy

Fig. 5. Logistic regression for DWTHAAR

Fig. 6. Logistic regression for GLCM-DifVarnc



Fig. 7. Logistic regression for GAB-Mag

Fig. 8. Feature Map

4 Discussion

The analysis indicated several features that can be used to classify plaque and
distinguish it from blood. The classification accuracy values determined by logis-
tic regression oscillated around 0.7 or 0.8 for most features. The obtained fea-
tures were determined using the Gray Level Co-Occurrence Matrix (Glcm), Gabor
transform (Gab), and discrete wavelet transform (DwtHaar) algorithms.

The overarching goal of the work was to find the best possible features to
classify soft tissue plaque and blood. MaZda software makes it possible to check
different types of normalization and quantization for all algorithms and their
features. The information obtained from this software was used in this paper to
find the relationships. The results obtained varied according to the algorithm
and its features, hence the conclusions will be presented separately for each
algorithm.
The correlation feature obtained from Glcm came from a normal distribution.
The Pearson coefficient showed differences in values for different
normalizations. In most cases, this translated into classification accuracy using
logistic regression. The feature obtained in the statistical analysis of texture
features had the best accuracy. No normalization, or a different normalization at
the same quantization, gave worse classification results. Next, the effect of
quantization on the normalization that obtained the best values was considered.
In this case, the Pearson coefficient and accuracy for different bits were the same
or changed slightly, but this did not affect the overall results.
The entropy from the Glcm algorithm came from a non-normal distribution,
hence Spearman correlation was used to determine the coefficient. In this case,
normalization also affects the results. Better accuracy was obtained for the nor-
malized features. However, the value of Spearman’s coefficient did not translate
into accuracy results. Although the value of the correlation coefficient differs
significantly for different bits obtained for Entropy, the accuracy for each bit
remains the same. The same conclusion is reached for DifVarnc calculated from
Glcm, and Mag calculated from Gab. The different value of bits does not affect
the classification accuracy. Both these features also show that normalization sig-
nificantly affects the accuracy values. In these cases, the difference between the
features is also shown in the obtained values of the Spearman coefficients. Lower
values of this coefficient give better accuracy in logistic regression.
The logistic regression plots showed only slight differences, which could
nevertheless be significant in the development of automatic segmentation algorithms. The
feature maps for the different features attached in the paper show that these
texture analysis methods successfully distinguish blood and plaque in images.

Acknowledgement. This work was financed by AGH University of Science and Tech-
nology thanks to the Rector’s Grant 17/GRANT/2022. This work was co-financed
by the AGH University of Science and Technology, Faculty of EAIIB, KBIB no
16.16.120.773.

References
1. Szczeklik, A., Tendera, M.: Kardiologia tom 1, 696 (2009)
2. García-García, H.M., Gogas, B.D., Serruys, P.W., Bruining, N.: IVUS-based
imaging modalities for tissue characterization: similarities and differences. Int.
J. Cardiovas. Imaging 27(2), 215–224 (2011). https://doi.org/10.1007/s10554-010-
9789-7
3. Ahmed, J.M., et al.: Mechanism of lumen enlargement during intracoronary stent
implantation. Circulation 102(1), 7–10 (2000). https://doi.org/10.1161/01.CIR.
102.1.7
4. Wong, C.B., Hansen, N.D.: A novel method of coronary stent sizing using intravas-
cular ultrasound: safety and clinical outcomes. Int. J. Angiol. Off. Publ. Int. Coll.
Angiol. Inc. 18(1), 22 (2009). https://doi.org/10.1055/S-0031-1278317
5. Tong, J., Li, K., Lin, W., Shudong, X., Anwar, A., Jiang, L.: Automatic lumen
border detection in IVUS images using dictionary learning and Kernel sparse rep-
resentation. Biomed. Sign. Process. Control 66 (2021). https://doi.org/10.1016/j.
bspc.2021.102489
6. Strzelecki, M., Materka, A.: Tekstura obrazów biomedycznych. Wydawnictwo
Naukowe PWN, Metody analizy komputerowej. Warszawa (2017)
7. Pham, D.L., Xu, C., Prince, J. L.: Current methods in medical image segmentation
1 (2000). Accessed 22 Feb 2022. [Online]. www.annualreviews.org
8. Balocco, S., et al.: Standardized evaluation methodology and reference data-base
for evaluating IVUS image segmentation. Comput. Med. Imaging Graph. 38(2),
70–90 (2014). https://doi.org/10.1016/J.COMPMEDIMAG.2013.07.001
9. Strzelecki, M., Kociolek, M., Materka, A.: On the influence of image features word
length reduction on texture classification. In: International Conference on Infor-
mation Technologies in Biomedicine, pp. 15–26 (2018)
10. Kociolek, M., Strzelecki, M., Obuchowicz, R.: Does image normalization and inten-
sity resolution impact texture classification? Comput. Med. Imaging Graph. 81
(2020). https://doi.org/10.1016/j.compmedimag.2020.101716
11. Mazur, P.: The Influence of Bit-Depth Reduction on Correlation of Texture Fea-
tures with a Patient’s Age, in Lecture Notes in Networks and Systems, vol. 255,
pp. 191–198 (2022). https://doi.org/10.1007/978-3-030-81523-3_19
12. Vince, D.G., Dixon, K.J., Cothren, R.M., Cornhill, J.F.: Comparison of texture
analysis methods for the characterization of coronary plaques in intravascular ultra-
sound images. [Online]. www.elsevier.com/locate/compmedimag
13. Szczypiński, P.M., Strzelecki, M., Materka, A., Klepaczko, A.: MaZda - a soft-
ware package for image texture analysis. Comput. Meth. Progr. Biomed. 94, 66–76
(2009)
14. Szczypiński, P.M., Klepaczko, A., Kociolek, M.: QMaZda - Software tools for image
analysis and pattern recognition, in 2017 Signal Processing: Algorithms, Architec-
tures, Arrangements, and Applications (SPA), pp. 217–221 (2017)
15. Materka, A., Strzelecki, M.: Texture Analysis Methods - A Review (1998)
16. “scikit-image 0.19.2 docs - skimage v0.19.2 docs”. https://scikit-image.org/docs/
stable/. Accessed 22 Feb 2022
17. “Matplotlib documentation - Matplotlib 3.5.1 documentation”. https://matplotlib.
org/stable/. Accessed 8 Jan 2022
18. “seaborn: statistical data visualization - seaborn 0.11.2 documentation”. https://
seaborn.pydata.org/index.html. Accessed 22 Feb 2022
19. “NumPy documentation - NumPy v1.22 Manual”. https://numpy.org/doc/stable/.
Accessed 8 Jan 2022
20. “pandas documentation - pandas 1.4.1 documentation”. https://pandas.pydata.
org/docs/. Accessed 22 Feb 2022
The Influence of Textural Features on the Differentiation 193

21. “Chapter 8: SciPy”. https://scipython.com/book2/chapter-8-scipy/. Accessed 22


Feb 2022
22. Obuchowicz, R., Nurzynska, K., Obuchowicz, B., Urbanik, A., Piórkowski, A.: Use
of texture feature maps for the refinement of information derived from digital
intraoral radiographs of lytic and sclerotic lesions. Appl. Sci. (Switzerland) 9(15)
(2019). https://doi.org/10.3390/APP9152968
Computer Aided Analysis of Clock Drawing Test Samples via PACS Plugin

Jacek Kawa1(B), Maria Bieńkowska1, Adam Bednorz2, Michał Smoliński3, and Emilia J. Sitek4,5

1 Faculty of Biomedical Engineering, Silesian University of Technology, ul. Roosevelta 40, 41-800 Zabrze, Poland
{jacek.kawa,maria.bienkowska}@polsl.pl
2 John Paul II Geriatric Hospital, ul. Morawa 31, 40-353 Katowice, Poland
adam.bednorz@emc-sa.pl
3 RadPoint Ltd., ul. Ceglana 35, 40-514 Katowice, Poland
michal.smolinski@radpoint.pl
4 Division of Neurological and Psychiatric Nursing, Faculty of Health Sciences, Medical University of Gdansk, 80-211 Gdansk, Poland
emilia.sitek@gumed.edu.pl
5 Department of Neurology, St. Adalbert Hospital, Copernicus PL Ltd., 80-462 Gdansk, Poland

Abstract. Clock Drawing Test (CDT) is a screening tool employed as a standalone cognitive test or as a part of a cognitive assessment battery. This popular cognitive test has multiple procedures and scoring systems. Most often, the patient is asked to draw or fill in the clock face.
Automatic analysis and scoring of the Clock Drawing Test is not a straightforward task. There are many different scales, the input images are of variable quality, and handwriting is occasionally hard to decipher. However, psychologists can still employ computer tools to make the assessment easier or faster.
In this paper, a computer-aided analysis of the CDT using the Manos and Wu scale is proposed. A template that divides the clock face into eight parts is fitted to the image; then, clustering and counting of objects in individual segments are performed.
A proof-of-concept system designed for integration with the Picture Archiving and Communications System (PACS) and Radiological Information System (RIS) infrastructure is introduced.
The method is tested on static images registered on a tablet device. The system performance evaluation yields 70% accuracy in segment-wise assessment. In qualitative assessment, neutral and positive scores prevail.

Keywords: Clock drawing test · Dementia · Image analysis · PACS · VNA


1 Introduction
Demographic projections indicate growth in life expectancy and a significant
increase in the proportion of the elderly population. One of the undesired side
effects of this process is a growing number of people affected by age-related
diseases. It is estimated that by 2030, dementia may affect up to 65 million people [28].
Dementia (or major neurocognitive disorder) is a complex disorder comprising cognitive and behavioral changes that lead to the loss of independence in the activities of daily living [3]. The most common causes of dementia are Alzheimer's disease, vascular dementia, and dementia with Lewy bodies.
Early diagnosis enables the introduction of non-pharmacological and pharmacological management. Of note, dementia may also be related to poor medication management, which can put the patient at risk of complications due to omission or overdose of prescribed drugs.
Clock Drawing Test (CDT) is one of the most popular cognitive screening tests and has been used for years not only at stroke and dementia clinics [2] but also in primary care [19,21]. It is used as a standalone tool, as a part of batteries such as ACE (Addenbrooke's Cognitive Examination) [20], GPCOG (General Practitioner Assessment of Cognition) [10], MoCA (Montreal Cognitive Assessment) [27], and Mini-Cog [9], or in conjunction with the Mini-Mental State Examination [14]. It is an easy-to-administer test, but procedures and scoring methods differ considerably. It may involve drawing, filling in, or copying a clock face, sometimes with hands indicating a specific time (most commonly 11:10, 2:45, or 8:20) [24,36]. CDT engages visual semantic memory (as the patient has to visualize the clock face with reference to his/her knowledge), executive function (especially planning), and visuospatial function. Thus, CDT engages 3 out of 6 cognitive domains specified in DSM-V [3]. A lower CDT score is related to a greater risk of falls in Alzheimer's disease; CDT performance may also be used to predict driving ability [15,34]. CDT may be either simply classified as normal vs. impaired or scored according to quantitative characteristics (e.g., the score depends on the correct placement of numbers and sometimes hands) or qualitative analysis (the presence of critical errors) [36]. Differences in the test procedure and its scoring make CDT research data difficult to compare. Each scoring method has its own psychometric characteristics. Complex scoring systems usually have lower inter-rater reliability [2,24]. Some of the scoring systems require the use of a template to objectively assess whether the numbers are placed in the correct sections of the clock face. Obviously, the templates can be used only with clock face filling procedures, such as the one devised by Watson et al., which refers to the clock face quadrants [35], or the one by Manos and Wu, which refers to the clock face octants [26].
In recent years, different solutions have been proposed for a digital version of the Clock Drawing Test, where the subject performs the standard task but draws it on a tablet [18]. Apart from the clock sketch, some additional data are also available in this type of test. Nirjon et al. [29] used information referring to the moment when the user first touches the screen and ending when the finger is lifted. Other data include mobile sensor data such as x and y coordinates, timestamps, and touch events [30]. Binaco et al. [8] used not only a tablet but also a pen to register 350 different variables useful for analysis.
However, although the sensitivity of the digital and paper-pencil versions is comparable [11], it seems that the digital version of the CDT can increase the level of anxiety and stress in some of the elderly, influencing the test results [6]. Therefore, many studies are concerned with the automated assessment of analog (paper) tests. An attempt to automatically evaluate the analog test was made by Guha et al. [17]; however, it is based on digit recognition, assessing whether 12 numbers are present and where their centroids lie.
A group of methods features artificial intelligence and returns information about a suspected disorder. The neural networks are pre-trained with a set of correct and incorrect clock images. They do not evaluate particular elements of the image; thus, they do not allow for detailed scoring of the result [4,12,37].
The basic CDT assessment is a challenging task for an automated method,
as the scoring usually addresses not only the presence of indispensable elements,
but also the absence of irrelevant elements.
Still, manual, fine-grained scoring is significantly more troublesome for
humans and depends on the templates, guides, or experience. The computer-
aided analysis of the CDT could easily objectify the detailed scoring, providing
preprocessed CDT sheets or interactive utilities, yet leave the final decision to
the expert. Such an approach has successfully been employed in radiology.
Indeed, radiologists often employ computer-aided diagnosis (CAD) systems
to evaluate lesions, complex measurements, follow-up assessment, or triage. In
radiology, the CAD is traditionally [23] a Picture Archiving and Communications
System (PACS)/Vendor Neutral Archive (VNA) plugin or a standalone appli-
cation running on the radiological workstation. It operates on DICOM (Digi-
tal Imaging and Communications in Medicine) [7] objects as acquired by com-
puted tomography, magnetic resonance, etc., and stores the results as DICOM-
compatible data or in external databases. It operates using various image pro-
cessing techniques [31] or AI [16]. Moreover, it often integrates with Radiological
Information Systems (RIS) to help radiologists prepare reports. The same work-
flow could easily be adapted to any computer-assisted assessment or diagnosis
task requiring images on input and report generation, provided PACS/VNA and
RIS infrastructure could be used.
A proof-of-concept computer-aided analysis module dedicated to static CDT assessment is introduced in the paper. The module is designed for integration with the PACS/VNA for result storage and with the RIS for presentation and expert evaluation. It provides REST (Representational State Transfer) and DICOM network interfaces to directly process raster images or DICOM-encapsulated scans. The numerical results are stored in an auxiliary, transactional database. The graphical results are returned as raster images or Secondary Capture DICOM images for examination. Our long-term goal is to provide the tool to experts evaluating digitized, traditional CDT sheets via the PACS/VNA and RIS systems using various scoring systems. However, at the moment, the CDT scans are
processed for evaluation according to the Manos and Wu scale [25]. The Manos and Wu scale is a 10-point scoring system based on adding a point for each of the eight digits (1, 2, 4, 5, 7, 8, 10, 11) placed in the proper one-eighth of the circle (octant) and two points for the hands indicating 2 and 11. During the evaluation, a template positioned over the clock drawing is employed.
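As a hedged illustration (not the authors' Matlab implementation), a minimal Python sketch of the octant assignment could look as follows; the 45° borders starting at the 12 o'clock position and the clockwise numbering from the top-right octant are assumptions consistent with Table 1 below.

import numpy as np

def manos_wu_octant(centroid_xy, center_xy):
    # Octants assumed to start at the 12 o'clock border and to be
    # numbered 1..8 clockwise from the top-right one (cf. Table 1).
    dx = centroid_xy[0] - center_xy[0]   # x grows to the right
    dy = center_xy[1] - centroid_xy[1]   # image y axis points down
    angle = np.degrees(np.arctan2(dx, dy)) % 360.0  # clockwise from 12
    return int(angle // 45.0) + 1

# Example: a digit centroid up and to the right of the clock center
# lands in octant 1, where the digit 1 is expected.
print(manos_wu_octant((120, 40), (100, 100)))   # -> 1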

2 Materials and Methods

In the study, digital Clock Drawing Test sheets are processed. The binary images
containing a clock face, hours, and hands drawn by the patients are imported into
the Matlab environment. Next, the binary images are subjected to the morpho-
logical opening and flood-filled to remove holes. Then, an object corresponding
to the clock face is selected. Parameters of the object determine the location of
the clock face and define the ROI. Finally, the Manos and Wu template is posi-
tioned. Objects inside the ROI are labeled and clustered in groups containing
separate hours. The object’s location is used to assign the objects (hours) to the
appropriate section of the Manos and Wu template and generate the report for
the expert.
The method is implemented in Matlab and deployed as a standalone exe-
cutable working inside a Docker container. The public interface consists of asyn-
chronous DICOM SecondaryCaptureImageStorage SOP service and synchronous
Python (Flask) REST service.

2.1 Materials

During the study, deidentified Clock Drawing Test digital sheets were used. Samples were acquired using the mobile application Test Pamięci (Memory Test) run on two mobile devices: a Samsung Galaxy Tab 4 (7″ touch screen; resolution: 1280 × 800): 20 cases, and a Sony Xperia Z2 (10″ touch screen; screen resolution: 2560 × 1600): 16 cases.
Blank test sheets provided to patients via the app had the clock face and the center of the clock marked. The examinees (elderly patients of the John Paul II Geriatric Hospital in Katowice and volunteers) were asked to draw all the numbers designating hours inside the provided circle using a writing tool (stylus) or a finger at the correct positions and then draw the clock hands pointing at 11:10.
During the acquisition, static image, as well as dynamical parameters, were
registered. However, this study did not employ the dynamical parameters as the
sheets were meant to resemble paper scans. The static images were captured
with the screen resolution. Exemplary scans are shown in the Fig. 1.
The data was acquired within study IS-2/54/NCBR/2015, co-financed by the National Centre for Research and Development and approved by the bioethical committee of the Jerzy Kukuczka Academy of Physical Education in Katowice, resolution 1/2015.

Fig. 1. Exemplary CDT sheets as acquired in the application. Navigational controls are visible in all images

2.2 Methods

The processing steps are presented below and shown in Fig. 2.

Preprocessing. Image processing starts with the artifact cleaning procedure.


Binarized clock images are preprocessed first: 8-connected objects smaller than 0.00001% of the area of the whole CDT sheet (rounded to the nearest pixel; the threshold was selected experimentally based on the dimension of a single dot) are marked as artifacts and removed.
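A minimal sketch of this cleaning step, assuming a scikit-image re-implementation of the Matlab routine, could look as follows:

import numpy as np
from skimage.morphology import remove_small_objects

def clean_artifacts(binary_sheet, min_fraction=1e-7):
    # 0.00001% of the sheet area = 1e-7 of the total pixel count,
    # rounded to the nearest pixel; connectivity=2 is 8-connectivity.
    min_pixels = max(1, round(min_fraction * binary_sheet.size))
    return remove_small_objects(binary_sheet.astype(bool),
                                min_size=min_pixels, connectivity=2)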

Clock Face Detection. Next, all remaining objects are subjected to flood-
fill operation and evaluated. Eccentricity and area are collected and sorted in

Fig. 2. The workflow of the computer aided CDT analysis

ascending and descending order, respectively. The following algorithm is used to select the object representing the face of the clock:

1. Set n = 1.
2. Check whether any of the n first objects with the largest area is among the n first objects with the smallest eccentricity. If so, select that object as the clock face and stop processing.
3. Set n = n + 1 and go to step 2.

Next, the object selected as the clock face is analyzed. A centroid of the object is selected as the center of the clock. The radius of the clock face is obtained as

R = \sqrt{Area/\pi},    (1)

to increase the robustness of the method in the presence of drawings crossing the clock face's template.
The resulting clock coordinates define the processing ROI (region of interest)
for the following steps.
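The selection loop and Eq. (1) could be re-implemented as sketched below; this is a Python version under the assumption that scikit-image region properties match the original Matlab ones:

import numpy as np
from scipy.ndimage import binary_fill_holes
from skimage.measure import label, regionprops

def find_clock_face(binary_img):
    regions = regionprops(label(binary_fill_holes(binary_img),
                                connectivity=2))
    if not regions:
        raise ValueError('no objects found')
    by_area = sorted(regions, key=lambda r: r.area, reverse=True)
    by_ecc = sorted(regions, key=lambda r: r.eccentricity)
    for n in range(1, len(regions) + 1):
        # stop as soon as the two rankings share an object
        shared = set(map(id, by_area[:n])) & set(map(id, by_ecc[:n]))
        if shared:
            face = next(r for r in by_area[:n] if id(r) in shared)
            break
    row, col = face.centroid                # center of the clock
    radius = np.sqrt(face.area / np.pi)     # Eq. (1)
    return (row, col), radius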

Clock Template Segmentation. At this step, the clock template (border of


the clock face and small, filled circle marking the clock’s center) is separated
from the patient drawing.
First, all the objects in the ROI area are analyzed. The object (a closed, thick line) next to the ROI border is selected and subjected to iterative morphological thinning. The width of the line used to draw the template is estimated as twice the number of thinning steps necessary for object removal, i.e., reducing the area to 0 or breaking the single object into several 8-connected components.
Finally, the center of the clock face is analyzed: a circular object with a size matching the line width is detected.
As a result of this processing step, the template area is excluded from the ROI. The resulting ROI now resembles a thick ring shape.
Parameters of the clock extracted at this step are later used to superimpose
the Manos and Wu octant markers (template).
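A sketch of the line-width estimation is given below; note that binary erosion is used here as a stand-in for the iterative thinning described above, since each erosion pass peels one pixel from both sides of the line, which matches the "twice the number of steps" rule:

import numpy as np
from scipy.ndimage import binary_erosion, label

def template_line_width(template_mask):
    current = template_mask.astype(bool)
    eight = np.ones((3, 3), bool)            # 8-connectivity structure
    steps = 0
    # peel until the closed line disappears or falls apart
    while current.any() and label(current, structure=eight)[1] == 1:
        current = binary_erosion(current)
        steps += 1
    return 2 * steps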

Object Labeling and Clustering. At the final processing step, objects are
labeled and clustered based on their location.
Before labeling, the objects are subjected to morphological dilation with
a disk-like structural element of a size matching the line thickness detected
above. The operation effectively merges nearby segments. The following label-
ing extracts 8-connected components. Finally, the pixels appended during the
dilation are marked as background (leaving only the original components).
Next, the centroids of the objects are extracted. Cartesian coordinates of the
centroids are used as data points for a SOM (Self Organizing Map) [22] based
clustering.
The 5×5, hexagon-like SOM-topology is employed. The initial neighbour-
hood includes all available data points. All centroids are used in training and
clustering. All objects assigned to the same SOM neuron are eventually merged.
The resulting objects are assigned to the Manos and Wu template’s octants
(segments) based on the centroid location. Finally, the report is created.
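The SOM grouping could be sketched as follows; the third-party minisom package is an assumption (the original implementation is Matlab-based), and all parameters other than the 5×5 hexagonal grid are illustrative:

import numpy as np
from minisom import MiniSom   # assumed third-party package

def group_centroids(centroids, iterations=500):
    data = np.asarray(centroids, dtype=float)
    som = MiniSom(5, 5, input_len=2, sigma=2.5, learning_rate=0.5,
                  topology='hexagonal', random_seed=0)
    som.train_random(data, iterations)
    groups = {}
    for idx, point in enumerate(data):
        # objects mapped to the same winning neuron are merged
        groups.setdefault(som.winner(point), []).append(idx)
    return list(groups.values())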

2.3 Software Architecture

The methodology is implemented as a Matlab 2022a application and compiled into a standalone Linux binary. The binary is deployed in a Linux Ubuntu 20.04 Docker container with the Matlab Runtime installed (Fig. 3). The RadPoint PACS API is used to store preliminary resources.

Fig. 3. The diagram of the computer aided CDT analysis system architecture

The DICOM network interface of the container is provided by the storescp application from the dcmtk package (OFFIS, Germany). SecondaryCapture DICOM objects are accepted for asynchronous processing.
A REST (Representational State Transfer) web service written in Python serves as the entry point for web calls. PNG (Portable Network Graphics) and JPEG (Joint Photographic Experts Group) files are accepted for asynchronous processing.
In both cases, a new processing job is created (the CDT is queued). In the case of the REST service, the unique ID of the processing job is returned; it can later be used to check the status of the processing and retrieve text and image data. In the DICOM interface case, the results are stored in a new SecondaryCapture instance with the same PatientID, StudyInstanceUID and SeriesInstanceUID as the input object, with the SecondaryCaptureDeviceManufacturerModelName tag updated to 'PoC CDT CAD module 1.0'. The new instance is then stored in the predefined DICOM node (e.g., the PACS archive).
Text results (the number of objects in each octant) are stored in the RIS (Radiological Information System) auxiliary PRS (Preliminary Report Structure) database with the cdt10 prefix. When opening the CDT case, the RIS can read the table and pre-fill the report template shown to the expert. The startup configuration defines the DICOM Application Entity Titles and the location of the database.
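A minimal sketch of such a synchronous REST entry point is shown below; the endpoint paths and the in-memory job store are hypothetical stand-ins for the actual processing queue and PRS database:

import uuid
from flask import Flask, request, jsonify

app = Flask(__name__)
JOBS = {}   # in-memory stand-in for the queue / PRS database

@app.route('/cdt', methods=['POST'])
def submit_cdt():
    # accept a PNG/JPEG scan and queue it for asynchronous processing
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {'status': 'queued',
                    'image': request.files['image'].read()}
    return jsonify({'job_id': job_id}), 202

@app.route('/cdt/<job_id>', methods=['GET'])
def poll_cdt(job_id):
    # poll the job; octant counts would be attached once processed
    job = JOBS.get(job_id)
    if job is None:
        return jsonify({'error': 'unknown job'}), 404
    return jsonify({'status': job['status']})

if __name__ == '__main__':
    app.run()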

3 Results
The image dataset introduced in Sect. 2.1 has been processed using the developed module. The processing was performed on a Linux system running on an Intel Xeon Gold 6226 CPU, Matlab 2022a. The container and computational module were started before the test. Cases were processed sequentially. The mean processing time (not including the DICOM interface overhead) was 1.5 s for lower-resolution cases (min/max: 1.3/1.7 s) and 7 s for higher-resolution cases (min/max: 5.3/7.7 s). The obtained DICOM objects were validated using the dciodvfy tool from the dicom3tools [13] package.
The results were assessed in a quantitative and qualitative manner:

– The location of the Manos and Wu template was evaluated.


– All cases were manually evaluated.
– Results were graded by the psychologist routinely working with CDT cases.

3.1 Location of the Manos and Wu Template

During the first evaluation step, the size and location of the generated Manos and Wu template were assessed. The colored template was superimposed over the original CDT sheet. The location was considered correct if the red template's line was placed directly over the clock face template (as in Fig. 2, template fitting stage). As the hours were not recognized, the location of the first octant's center line over the 12:00 hour, as required by Manos and Wu, was not checked during the template placement assessment.
In all 36 cases, the template was assessed as correctly placed.

3.2 Manual Evaluation

Next, all available cases were manually evaluated. The objects in each Manos
and Wu octant (segment) were manually counted, and the quality of the object’s
segmentation was assessed. On that basis, the segment was considered correctly
or incorrectly processed.
The segment was considered incorrectly processed if any of the following conditions occurred inside the said region:
– missing objects were present (e.g., an hour or a hand was not detected),
– 2-digit numbers (e.g., 10, 11, 12) were detected as separate objects (e.g., 1–0, 1–1, 1–2),
– clustering errors were present, e.g., the clock's hand and a digit were considered a single object despite not being 8-connected.

Table 1 summarizes the evaluation. Of 288 evaluated octants, 201 (70%) were assessed as correct.

Table 1. Evaluation of the object segmentation with respect to the Manos and Wu octants. Octants are numbered clockwise starting from the top-right one (where 1:00 should be placed)

Octant  Hour   Correct cases   Incorrect cases
1       1:00   12 (33%)        24 (67%)
2       2:00   26 (72%)        10 (28%)
3       4:00   35 (97%)        1 (3%)
4       5:00   34 (94%)        2 (6%)
5       7:00   36 (100%)       0 (0%)
6       8:00   34 (94%)        2 (6%)
7       10:00  18 (50%)        18 (50%)
8       11:00  6 (17%)         30 (83%)
Total   –      201 (70%)       87 (30%)

3.3 Qualitative Evaluation


The object counting step was evaluated by the expert (psychologist). For each
case, a graphical report was provided. The expert was asked if/how the presen-
tation facilitates scoring the CDT according to the Manos and Wu scale. The
following options were available:

1. does not help at all/distracts,


2. does not help,
3. needs similar attention as manual evaluation with the template already
placed,
4. it helps,
5. it helps a lot.

In one case, the results were assessed as not helpful. In twenty-five cases, the neutral option was selected. In ten cases, the report was considered helpful. In no case was the report considered very helpful or distracting.

4 Discussion

The presented results are based on the CDT data acquired by the tablet application; however, in our study, only static images are analyzed. Although disregarding the dynamic parameters registered during the test could generally be considered a drawback (dynamic registration provides more options for robust handwriting recognition or kinematic analysis), it is intentional. The long-term goal is to analyze static scans of historical and still widely employed paper CDT sheets, which lack the dynamic parameters. A similar approach was previously employed in our Luria's test analysis framework [33].

The method performance itself is considered adequate for the proof-of-concept stage. However, the object (hours and hands) detection has to be improved for the method to be considered ready for clinical evaluation. A significant drawback is the lack of handwriting recognition options. Without robust detection of the digits, advanced scoring options are not available: one cannot position the template at about 12:00, check whether digits are placed in specific octants, or distinguish between digits, hands, other words, drawings inserted by the subject, artifacts, etc. Moreover, although dedicated routines can easily detect selected, even subtle, errors, such as recognizing incorrectly placed hours, they must be supervised. Indeed, for human experts, clock drawings are either obviously correct (hours and hands placed where expected) or recognizably incorrect (wrong numbers, all hours located in the upper/right half of the clock face, redundant hands, etc.). Some of the newest scoring systems have even abandoned using hand length and focus only on the placement of hands [32].
What is more, as long as the neutral score prevails in the experts' opinion (as in Sect. 3.3), the computerized method can hardly be considered an overall improvement. On the other hand, even with a quasi-automatic method, the computer-aided diagnosis approach provides several advantages. A fixed workflow makes the scoring less subjective; placing the digits in the correct octants can easily be validated in terms of including most of the digit or the center of the digit, etc. Moreover, in a computer system, the results are automatically archived and available for follow-up or remote assessment. The option to include new features that are not directly available to the expert might also be considered beneficial.
The introduced PACS/VNA integration on the back-end layer was straightforward. The development model based on containerization makes the processing module independent and ready for immediate deployment. On the PACS/VNA level, only the configuration of a new DICOM node and the corresponding auto-routing rules was necessary. However, the memory footprint of the running container, as well as its resource consumption, was significant. On the other hand, replacing the proof-of-concept, general-purpose environment with an optimized, dedicated application is expected to significantly reduce the load without changing the public network interface.
In general, the architecture of contemporary PACS/VNA systems should make the development of similar modules easier. Early PACS were considered proprietary, closed environments, compliant with selected parts of the standard only. Transitioning to the more open, vendor-agnostic architecture, traditionally linked with the VNA label [1,5], permitted various kinds of medical data to be stored in the same archive. Nowadays, PACSes or VNAs (terms often used synonymously despite historical differences) can store DICOM-embedded scans, pictures, or video recordings. The RIS interface can be adapted to handle new types of data and provide a unified user experience across various processing modules requiring user interaction.

Acknowledgement. This research has been co-financed within the statutory grant
of Silesian University of Technology no. 07/010/BK 22/1011.

References
1. Agarwal, T.K., Sanjeev: Vendor neutral archive in PACS. Indian J. Radiol. Imaging 22(04), 242–245 (2012). https://doi.org/10.4103/0971-3026.111468
2. Agrell, B., Dehlin, O.: The clock-drawing test. Age Ageing 27(3), 399–404 (1998)
3. American Psychiatric Association: Diagnostic and statistical manual of mental
disorders, vol. 5th edn. American Psychiatric Publishing, Arlington (2013)
4. Amini, S., et al.: An artificial intelligence-assisted method for dementia detection
using images from the clock drawing test. J. Alzheimers Dis. 83(2), 581–589 (2021)
5. Armbrust, L.J.: PACS and image storage. Vet. Clin. North Am. Small Anim. Pract. 39(4), 711–718 (2009). https://doi.org/10.1016/j.cvsm.2009.04.004
6. Bednorz, A., et al.: Zastosowanie tabletowej wersji Testu Rysowania Zegara do rozpoznawania łagodnych zaburzeń poznawczych (MCI) u osób starszych, jako próba telediagnostyki w geriatrii [Tablet version of the Clock Drawing Test in the assessment of mild cognitive impairment in the elderly as an attempt at tele-diagnostics in geriatrics] (2017)
7. Bidgood, W.D., Horii, S.C.: Introduction to the ACR-NEMA DICOM standard.
Radiographics 12(2), 345–355 (1992). https://doi.org/10.1148/radiographics.12.2.
1561424
8. Binaco, R., Calzaretto, N., Epifano, J., Emrani, S., Wasserman, V., Libon, D.,
et al.: Automated analysis of the clock drawing test for differential diagnosis of
mild cognitive impairment and Alzheimer’s disease. In: Mid-Year Meeting of the
International Neuropsychological Society (2018)
9. Borson, S., Scanlan, J., Brush, M., Vitaliano, P., Dokmak, A.: The Mini-Cog: a cognitive 'vital signs' measure for dementia screening in multi-lingual elderly. Int. J. Geriatr. Psychiatr. 15(11), 1021–1027 (2000). https://doi.org/10.1002/1099-1166(200011)15:11<1021::aid-gps234>3.0.co;2-6
10. Brodaty, H., Low, L.F., Gibson, L., Burns, K.: What is the best dementia screening
instrument for general practitioners to use? Am. J. Geriatr. Psychiatr. 14(5), 391–
400 (2006)
11. Chan, J.Y., et al.: Evaluation of digital drawing tests and paper-and-pencil drawing
tests for the screening of mild cognitive impairment and dementia: a systematic
review and meta-analysis of diagnostic studies. Neuropsychol. Rev. 1–11 (2021).
https://doi.org/10.1007/s11065-021-09523-2
12. Chen, S., Stromer, D., Alabdalrahim, H.A., Schwab, S., Weih, M., Maier, A.: Auto-
matic dementia screening and scoring by applying deep learning on clock-drawing
tests. Sci. Rep. 10(1), 1–11 (2020)
13. Clunie, D.A.: Dicom3tools software website. https://www.dclunie.com/
dicom3tools.html (2022). Accessed 19 Jan 2022
14. Ferrucci, L., Cecchi, F., Guralnik, J.M., Giampaoli, S., Noce, C.L., Salani, B., Bandinelli, S., Baroni, A., FINE Study Group: Does the clock drawing test predict cognitive decline in older persons independent of the mini-mental state examination? J. Am. Geriatr. Soc. 44(11), 1326–1331 (1996)
15. Freund, B., Gravenstein, S., Ferris, R., Burke, B.L., Shaheen, E.: Drawing clocks
and driving cars. J. Gen. Intern. Med. 20(3), 240–244 (2005)
16. Fujita, H.: AI-based computer-aided diagnosis (AI-CAD): the latest review to read
first. Radiol. Phys. Technol. 13(1), 6–19 (2020). https://doi.org/10.1007/s12194-
019-00552-4

17. Guha, A., Kim, H., Do, E.Y.L.: Automated clock drawing test through machine
learning and geometric analysis. In: DMS, pp. 311–314 (2010)
18. Harbi, Z., Hicks, Y., Setchi, R.: Clock drawing test interpretation system. Proced.
Comput. Sci. 112, 1641–1650 (2017)
19. Hazan, E., Frankenburg, F., Brenkel, M., Shulman, K.: The test of time: a history
of clock drawing. Int. J. Geriatr. Psychiatr. 33(1), e22–e30 (2018)
20. Hsieh, S., Schubert, S., Hoon, C., Mioshi, E., Hodges, J.R.: Validation of the Adden-
brooke’s cognitive examination iii in frontotemporal dementia and Alzheimer’s dis-
ease. Dement. Geriatr. Cogn. Disord. 36(3–4), 242–250 (2013)
21. Kirby, M., Denihan, A., Bruce, I., Coakley, D., Lawlor, B.A.: The clock drawing test
in primary care: sensitivity in dementia detection and specificity against normal
and depressed elderly. Int. J. Geriatr. Psychiatr. 16(10), 935–940 (2001)
22. Kohonen, T.: Self-organized formation of topologically correct feature maps. Biol.
Cybern. 43(1), 59–69 (1982). https://doi.org/10.1007/bf00337288
23. Le, A.H.T., Liu, B., Huang, H.K.: Integration of computer-aided diagno-
sis/detection (CAD) results in a PACS environment using CAD–PACS toolkit and
DICOM SR. Int. J. CARS 4(4), 317–329 (2009). https://doi.org/10.1007/s11548-
009-0297-y
24. Mainland, B.J., Shulman, K.I.: Clock drawing test. In: Larner, A.J. (ed.) Cognitive
Screening Instruments, pp. 67–108. Springer, Cham (2017). https://doi.org/10.
1007/978-3-319-44775-9 5
25. Manos, P.J.: The utility of the ten-point clock test as a screen for cognitive impair-
ment in general hospital patients. Gen. Hosp. Psychiatr. 19(6), 439–444 (1997)
26. Manos, P.J., Wu, R.: The ten point clock test: a quick screen and grading method
for cognitive impairment in medical and surgical patients. Int. J. Psychiatr. Med.
24(3), 229–244 (1994)
27. Nasreddine, Z.S., et al.: The Montreal cognitive assessment, MoCA: a brief screen-
ing tool for mild cognitive impairment. J. Am. Geriatr. Soc. 53(4), 695–699 (2005).
https://doi.org/10.1111/j.1532-5415.2005.53221.x
28. Ngo, J., Holroyd-Leduc, J.M.: Systematic review of recent dementia practice guide-
lines. Age Ageing 44(1), 25–33 (2014)
29. Nirjon, S., Emi, I.A., Mondol, M.A.S., Salekin, A., Stankovic, J.A.: MOBI-COG:
a mobile application for instant screening of dementia using the mini-cog test. In:
Proceedings of the Wireless Health 2014 on National Institutes of Health, pp. 1–7
(2014)
30. Park, I., Lee, U.: Automatic, qualitative scoring of the clock drawing test (CDT)
based on U-net, CNN and mobile sensor data. Sensors 21(15), 5239 (2021)
31. Pietka, E., Kawa, J., Badura, P., Spinczyk, D.: Open architecture computer-aided
diagnosis system. Expert. Syst. 27(1), 17–39 (2010). https://doi.org/10.1111/j.
1468-0394.2009.00524.x
32. Rakusa, M., Jensterle, J., Mlakar, J.: Clock drawing test: a simple scoring system
for the accurate screening of cognitive impairment in patients with mild cogni-
tive impairment and dementia. Dement. Geriatr. Cogn. Disord. 45(5–6), 326–334
(2018)
33. Stępień, P., et al.: Computer aided written character feature extraction in progressive supranuclear palsy and Parkinson's disease. Sensors 22(4), 1688 (2022). https://doi.org/10.3390/s22041688
34. Suzuki, Y., et al.: Quantitative and qualitative analyses of the clock drawing test
in fall and non-fall patients with Alzheimer’s disease. Dementia Geriatric Cogn.
Disord. Extra 9(3), 381–388 (2019)

35. Watson, Y.I., Arfken, C.L., Birge, S.J.: Clock completion: an objective screening
test for dementia. J. Am. Geriatr. Soc. 41(11), 1235–1240 (1993)
36. Wójcik, D., Szczechowiak, K.: Wybrane wersje testu rysowania zegara w prak-
tyce klinicznej-analiza porównawcza ilościowych i jakościowych systemów oceny
[Selected versions of the clock test in clinical practice - a comparative analysis of
quantitative and qualitative scoring systems]. Aktualn Neurol. 19, 83–90 (2019)
37. Youn, Y.C., et al.: Use of the clock drawing test and the rey-osterrieth complex
figure test-copy with convolutional neural networks to predict cognitive impair-
ment. Alzheimer’s Res. Ther. 13(1), 1–7 (2021)
Study on the Impact of Neural Network Architecture and Region of Interest Selection on the Result of Skin Layer Segmentation in High-Frequency Ultrasound Images

Dżesika Szymańska1, Joanna Czajkowska1(B), Szymon Korzekwa2, and Anna Platkowska-Szczerek3

1 Faculty of Biomedical Engineering, Silesian University of Technology, ul. Roosevelta 40, 41-800 Zabrze, Poland
joanna.czajkowska@polsl.pl
2 Department of Anatomy, Poznan University of Medical Sciences, ul. Swiecickiego 6, 60-781 Poznan, Poland
3 Anclara sp. z o.o., ul. Pulawska 136/61, 02-624 Warszawa, Poland

Abstract. One of the non-invasive methods that can be used to monitor inflammatory skin diseases, such as atopic dermatitis and psoriasis, is high-frequency ultrasound (>20 MHz). This type of imaging allows distinguishing skin layers, including the subepidermal low echogenic band, whose thickness can indicate the severity of the disease or be used to assess the therapy. This study presents an analysis of the impact of the neural network architecture and region of interest selection on the result of skin layer segmentation in high-frequency ultrasound images. In addition, it examines the influence of the training parameters. The U-Net, DC-UNet, and CFPNet-M were chosen to investigate network architectures. Ultimately, the highest segmentation result was obtained for the DC-UNet network model, the Adam optimizer, and the original images resized to 512 × 256 pixels. The highest Dice coefficient was equal to 0.943 for the epidermis and 0.93 for the subepidermal low echogenic band layer segmentation, respectively.

Keywords: High-frequency ultrasound · Convolutional neural networks · Skin layer segmentation · SLEB · U-Net · DC-UNet · CFPNet-M

1 Introduction
Atopic dermatitis is a chronic inflammatory skin disease characterized by intense
itching and recurrent eczema. It can occur at any age but most often appears
in early childhood (usually between 3 and 6). Its causes are multifactorial and

complex, and one of them is the genetic factor. To date, no cure has been developed [2].
Psoriasis is a chronic and recurrent immune-mediated disease of the skin and
joints. It has several clinical skin symptoms, but chronic symmetrical erythema-
tous scaly papules and plaques are the most common manifestation. It can occur
at any age. It has a strong genetic component, but environmental factors, such as infections, may also play an essential role [3].
To monitor both mentioned diseases, HFUS (high-frequency ultrasound,
>20 MHz) is now often used. It is a non-invasive method that enables the differ-
entiation of skin structures on a micro-scale [4].
Ultrasound images of inflammatory skin diseases show a band with low
echogenicity – SLEB layer (subepidermal low echogenic band). The thickness
of the SLEB layer may be an indicator of the disease’s severity, and its measure-
ment over time may also be used to monitor the effects of the applied therapy
[4].
Due to the growing popularity of skin imaging using ultrasound, it is neces-
sary to develop image processing techniques dedicated to this issue, which will
allow, for example, segmentation of skin layers or detection of skin lesions. One
of the possible solutions is the use of deep neural networks.

1.1 The Aims of the Study


The study aims to analyze the impact of the neural network architecture and region of interest selection on skin layer segmentation in high-frequency ultrasound images. Apart from these two aspects, in our experiments we analyze the following elements of the algorithm: the size of the images used for network training, the augmentation technique, the optimization methods used during network training, the binarization threshold value, and, additionally, the k values used in k-fold cross-validation.

1.2 State of the Art


The first applications of HFUS-based skin segmentation utilized the capabilities of active contour models [7,8]. The algorithm proposed in [7] enables the accurate segmentation of the skin layers: epidermis, SLEB, and dermis. In the first step, non-linear filtration is used to obtain the initial localization of the epidermis area. Then, the epidermis and dermis segmentations are carried out using an active contour model. The SLEB segmentation step divides the previously segmented dermis area into several thin layers. Finally, based on the analysis of local statistics of the ultrasound signal, the SLEB area is determined [7].
Another solution utilizing active contours was presented in [8]. The authors proposed a fully automatic segmentation method for skin lesions in dermatological ultrasound images. The algorithm starts with epidermis and dermis detection, each consisting of two steps. First, an initialization step locates the Region of Interest (ROI). Second, morphological operations and the active contour method refine the initial rough results of the analysis [8].

A slightly different method, for epidermis layer segmentation in histopathological images, is presented in [9]. The proposed algorithm is based on thickness measurements and the k-means clustering technique. The first stage involves rough segmentation employing thresholding and shape analysis. Then, the thickness of the segmented structure is successively calculated, and on its basis, a decision on further processing is made. If poor segmentation quality is detected, the obtained coarse results are subjected to the k-means algorithm and grouped into epidermis and dermis classes [9].
Currently, the use of neural networks in skin layer segmentation is widely explored in the literature [5,6]. For example, the authors of [5] proposed a method for epidermal layer and hair follicle segmentation in optical coherence tomography (OCT) images of healthy people. It consists of two main stages. The first applies a convolutional neural network of the coder-decoder type, which segments three image regions: the dermis, the epidermis with hair follicles, and the area above the epidermis. The second step is post-processing, which involves the use of a Savitzky-Golay filter and a frequency-domain filtration technique [5].
The deep neural network of the coder-decoder type was also used in [6] for the segmentation of the epidermis and the SLEB layers in ultrasound images of people suffering from atopic dermatitis and psoriasis. The authors proposed a three-step framework: a preprocessing step performs ROI selection; SegUNet is responsible for the main segmentation; and post-processing smooths the segmented masks after resizing to the original image size [6].

2 Materials and Methods


2.1 Dataset
The data set consists of 380 high-frequency ultrasound images of patients with atopic dermatitis (303 images) or psoriasis (77 images). The dataset is publicly available [1], along with the pre-trained SegUNet model for skin layer segmentation [6] and expert delineations. The images were acquired using a DUB SkinScanner75 with a 75 MHz transducer. The images have the same size of 2067 × 1555 pixels but different resolutions (lateral × axial): 0.0019 × 0.085, 0.0024 × 0.085, 0.0031 × 0.085, 0.0019 × 0.085 mm/pix.

2.2 Deep Neural Network Architectures


The U-Net [10], DC-UNet [11], and CFPNet-M [12] were chosen to analyze the impact of the neural network architecture, as the newest and most widely applied models for medical or layered image segmentation. Since these models were intended for binary segmentation, the last layer of each network was changed to fit the segmentation of three objects (background, epidermis, and SLEB): softmax replaced the sigmoid activation, and the number of filters was increased from one to three.
The U-Net architecture was selected due to its wide application in medical image segmentation [10]. The DC-UNet is its modification, providing better segmentation results in layered images [11]. The last network, CFPNet-M, was chosen due to its smaller size compared to the previously mentioned ones, yet similar application area [12]. An important factor that influenced the choice of networks was the year of publication – the latest solutions were preferred. The U-Net is the oldest one; however, it is often described as a reference solution in the literature. Implementations of the above architectures were taken from [11–14].
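The described head replacement could be sketched in Keras as follows; the layer name and the way the backbone output is obtained are assumptions:

import tensorflow as tf

def three_class_head(backbone_output):
    # 3-filter 1x1 convolution with softmax replaces the original
    # single-filter sigmoid head (background, epidermis, SLEB)
    return tf.keras.layers.Conv2D(filters=3, kernel_size=1,
                                  activation='softmax',
                                  name='skin_layers')(backbone_output)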

2.3 Region of Interest Selection

To prevent false detections of the epidermis, the input images were limited to the region of interest in one of our experiments. Based on the literature, it was assumed that this area extends 0.5 mm above the top of the epidermis and 2 mm below its lowest point (valley) [6,10,13,21,23]. After that, the ROI images were scaled to the size of 128 × 256 pixels. The designated region of interest in the original image and the expert mask is shown in Fig. 1 by a red frame.

Fig. 1. Region of interest selection
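A minimal sketch of the ROI row computation, assuming the axial resolution quoted in Sect. 2.1, is given below:

def roi_rows(epidermis_rows, axial_mm_per_px=0.085, img_height=1555):
    # 0.5 mm above the top of the epidermis, 2 mm below its lowest
    # point; the image y axis points downwards
    top = min(epidermis_rows) - round(0.5 / axial_mm_per_px)
    bottom = max(epidermis_rows) + round(2.0 / axial_mm_per_px)
    return max(top, 0), min(bottom, img_height - 1)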

2.4 Augmentation

Due to the limited number of images in the dataset, the augmentation step was considered potentially beneficial for the segmentation results. Therefore, three variants of the data set were applied: original data (380 images), data with augmentation v1 (2280 images), and data with augmentation v2 (2280 images). In each of the three variants, input images of the following sizes were tested: 128 × 64, 256 × 128, and 512 × 256 pixels, whereas the ROI images were always of size 128 × 256 pixels.
The augmentation v1 consists of extending the original data set with transformed data: rotation by 10° to the left, rotation by 10° to the right, mirroring (right–left), a shift of 20 pixels up, and a shift of 20 pixels down. The augmentation v2 differed from v1 only by the angle of rotation, which in this case was 20°.
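A sketch of the v1 transform set, assuming a SciPy-based implementation (nearest-neighbor interpolation keeps the masks binary), could look as follows:

import numpy as np
from scipy.ndimage import rotate, shift

def augment_v1(image, mask, angle=10.0):
    # v2 differs only in angle=20.0
    pairs = []
    for a in (angle, -angle):
        pairs.append((rotate(image, a, reshape=False, order=1),
                      rotate(mask, a, reshape=False, order=0)))
    pairs.append((np.fliplr(image), np.fliplr(mask)))   # mirroring
    for dy in (-20, 20):                                # shifts in px
        pairs.append((shift(image, (dy, 0), order=1),
                      shift(mask, (dy, 0), order=0)))
    return pairs   # five (image, mask) pairs per input sample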

2.5 Training

The last studied training parameter is the optimizer; the two considered optimizers are Stochastic Gradient Descent (SGD) and Adaptive Moment Estimation (Adam). The training setup was as follows: for the SGD optimizer, the learning rate was equal to 0.01 and the momentum to 0, whereas for the Adam optimizer, the learning rate was equal to 0.001, the exponential decay rate for the 1st moment estimates to 0.9, the exponential decay rate for the 2nd moment estimates to 0.999, and the epsilon to 1 × 10^-7. For all the architectures, categorical cross-entropy was used as the loss function. Each model was trained over 100 epochs with a batch size equal to 2. The architectures were implemented in Python using the Keras and TensorFlow libraries in the Google Colab environment.
To assess the segmentation, external k-fold cross-validation was introduced. To analyze the impact of the k value on the obtained results, the two most commonly explored k values were selected: k = 5 and k = 10.
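The training setup listed above could be expressed in Keras as follows (a sketch; model construction and data loading are omitted):

import tensorflow as tf

sgd = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.0)
adam = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9,
                                beta_2=0.999, epsilon=1e-7)

def train(model, x_train, y_train, optimizer):
    # categorical cross-entropy, 100 epochs, batch size 2
    model.compile(optimizer=optimizer, loss='categorical_crossentropy')
    return model.fit(x_train, y_train, epochs=100, batch_size=2)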

3 Results and Discussion


Both the Jaccard (IoU, Intersection over Union) and Dice indexes were used to evaluate the segmentation quality. They are linearly dependent; however, since the first is widely used in CNN-based applications and the second is intuitive for medical image processing experts, both are calculated. Tables 1, 2, 3, 4, 5 and 6 present the obtained values. Tables 1, 2 and 3 cover the three considered models trained on original images and augmented datasets of different sizes, using different optimizers and k-fold options. Tables 4, 5 and 6 include the same combinations of parameters; however, the input images were limited to the ROI and, as mentioned before, only one image size is taken into account. The numbers presented in all the tables are the median values of the Dice and Jaccard coefficients over all predictions of a given variant. The highest value in each experiment is underlined, and the highest value of the segmentation quality coefficients for each skin layer among all the analyzed models is shown in bold. Additionally, Figs. 2, 3, 4 and 5 present the comparison of the different segmentation methods illustrated as histograms.
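For reference, a minimal sketch of both metrics and their linear relation (Dice = 2·IoU/(1 + IoU)) is given below:

import numpy as np

def iou_and_dice(pred, target):
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    iou = np.logical_and(pred, target).sum() / union if union else 1.0
    return iou, 2 * iou / (1 + iou)   # Dice follows from IoU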
On this basis, we can conclude that the DC-UNet architecture works better for larger images (512 × 256) with augmentation v1. Moreover, this network achieved the highest segmentation results for both segmented layers among all the analyzed variants. It also returns better results when the SGD optimization method is used, compared to the U-Net and CFPNet-M networks. U-Net performed best in the case of a small data set (without augmentation); its results are better for smaller images and the Adam optimizer. Moreover, the Adam optimizer works better than SGD for most of the considered approaches (excluding DC-UNet). CFPNet-M makes it possible to achieve higher results than U-Net for larger images (512 × 256); compared to the DC-UNet network, it performs better only on the original image data set with the image size 512 × 256. In the case of the ROI data set without augmentation, the U-Net and CFPNet-M networks provide similar results, better than the DC-UNet architecture. In turn, for the set with augmentation, the DC-UNet network achieved the highest values.

Table 1. Results of skin layer segmentation achieved by U-Net architecture

Variant | Size | Index | Epidermis: Adam (k=5, k=10), SGD (k=5, k=10) | SLEB: Adam (k=5, k=10), SGD (k=5, k=10)
Original 128 × 64 IoU 0.769 0.766 0.620 0.671 0.670 0.670 0.456 0.539
Dice 0.870 0.867 0.766 0.803 0.803 0.802 0.626 0.701
256 × 128 IoU 0.823 0.821 0.670 0.702 0.739 0.745 0.519 0.540
Dice 0.903 0.902 0.802 0.825 0.850 0.854 0.684 0.701
512 × 256 IoU 0.831 0.834 0.607 0.687 0.742 0.761 0.426 0.576
Dice 0.908 0.910 0.755 0.814 0.852 0.864 0.598 0.731
Augmentation v1 128 × 64 IoU 0.805 0.807 0.758 0.761 0.762 0.763 0.659 0.668
Dice 0.892 0.893 0.863 0.864 0.865 0.866 0.794 0.801
256 × 128 IoU 0.845 0.846 0.798 0.801 0.799 0.797 0.712 0.713
Dice 0.916 0.917 0.888 0.890 0.888 0.887 0.832 0.832
512 × 256 IoU 0.860 0.863 0.809 0.812 0.816 0.827 0.717 0.726
Dice 0.924 0.927 0.894 0.896 0.899 0.905 0.835 0.842
Augmentation v2 128 × 64 IoU 0.802 0.804 0.756 0.758 0.748 0.750 0.656 0.659
Dice 0.890 0.891 0.861 0.862 0.856 0.857 0.792 0.795
256 × 128 IoU 0.841 0.843 0.798 0.797 0.788 0.791 0.702 0.705
Dice 0.914 0.915 0.888 0.887 0.881 0.883 0.825 0.827
512 × 256 IoU 0.855 0.859 0.802 0.810 0.804 0.808 0.713 0.718
Dice 0.922 0.924 0.890 0.895 0.892 0.894 0.833 0.836

Since the network's output is a probability map, the influence of the binarization threshold value on the final segmentation results was also reviewed. It showed that the highest values were achieved for a threshold equal to 0.4. Thresholds above 0.6 were too restrictive, and those below 0.3 were too imprecise.
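A minimal sketch of this binarization step (the class order is an assumption) is given below:

def binarize_layers(prob_map, threshold=0.4):
    # per-class softmax output (H, W, 3) -> boolean masks; 0.4 worked
    # best, >0.6 was too restrictive, <0.3 too imprecise
    names = ('background', 'epidermis', 'sleb')
    return {n: prob_map[..., k] >= threshold for k, n in enumerate(names)}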
Figures 6 and 7 show the boxplots for each network and the combinations of parameters that obtained the highest values of the segmentation quality coefficients for the epidermal and SLEB layers. On this basis, we can conclude that the DC-UNet network achieves the highest-quality segmentation of both layers among the analyzed architectures: 0.893 Jaccard and 0.943 Dice index for the epidermal layer, and 0.869 Jaccard and 0.93 Dice index for the SLEB layer. Exemplary segmentation results for the DC-UNet model are shown in Fig. 8.
CFPNet-M provides results of 0.873 Jaccard and 0.932 Dice for epidermal layer segmentation and 0.844 Jaccard and 0.915 Dice for the SLEB layer. In turn, the results are slightly worse for the U-Net architecture: 0.863 Jaccard and 0.927 Dice for the epidermis and 0.827 Jaccard and 0.905 Dice for the SLEB layer segmentation. It is also worth noting that these results were achieved for models trained on the set with augmentation v1 and images of size 512 × 256; these are the parameters for which DC-UNet works best.
The region of interest selection step does not provide the highest result among all the analyzed models. However, in the case of the U-Net architecture, reducing the image area to the ROI improves segmentation with the SGD optimizer. Similar results were obtained for the original images resized to 256 × 128 pixels and the Adam optimizer.
Applying the augmentation significantly improved the obtained results, especially variant v1. Since the two data augmentation methods differed only in the angle of rotation (10° for v1 and 20° for v2), we can conclude that too intensive rotation of ultrasound images (which are of a layered nature, with successive layers arranged in parallel) deteriorates the segmentation quality. The next parameter analyzed was the size of the images. It was noticed that the segmentation of skin layers is more accurate for larger images, which can be explained by the information loss during the interpolation step. Of the two optimization methods used during network training, the Adam optimizer results in significantly better segmentation than SGD.
The last analyzed parameter was the number of subsets used in the k-fold cross-validation. It is not considered a framework parameter; however, the validation strategy may strongly influence the obtained results. For an accurate evaluation of the developed algorithm on an extensive data collection, it is sufficient to use a smaller k (e.g., 5), whereas for small data sets, k needs to be higher (e.g., 10). The highest results in our experiments were obtained when dividing the set into 10 parts, as the networks were then trained on larger image sets.
The obtained results were also compared with the SegUNet described in [6], applied to the same dataset. The Dice indexes reported there are equal to 0.874 for the epidermis and 0.829 for the SLEB layer, respectively, worse than the currently selected model. From our observations, the CNN training results depend on the development environment and the used libraries (even for the same architectures).
It is worth mentioning that in our analysis, we considered only the models and parameters with promising results described in the literature. Our analysis could be complemented with other models or additional training parameters (e.g., the loss function, mini-batch size, number of epochs, etc.); however, such an analysis requires additional time and hardware resources. In our opinion, the selected setup and the conclusions drawn from the results can be beneficial for further work in the HFUS image segmentation area.

Table 2. Results of skin layer segmentation achieved by DC-UNet architecture

Variant | Size | Index | Epidermis: Adam (k=5, k=10), SGD (k=5, k=10) | SLEB: Adam (k=5, k=10), SGD (k=5, k=10)
Original 128 × 64 IoU 0.746 0.748 0.660 0.665 0.649 0.644 0.512 0.503
Dice 0.855 0.856 0.795 0.799 0.787 0.783 0.677 0.670
256 × 128 IoU 0.815 0.815 0.705 0.729 0.733 0.727 0.552 0.594
Dice 0.898 0.898 0.827 0.843 0.846 0.842 0.712 0.746
512 × 256 IoU 0.808 0.824 0.723 0.753 0.711 0.743 0.571 0.622
Dice 0.894 0.904 0.839 0.859 0.831 0.852 0.727 0.767
Augmentation v1 128 × 64 IoU 0.794 0.795 0.779 0.782 0.725 0.735 0.687 0.693
Dice 0.885 0.886 0.876 0.877 0.841 0.847 0.814 0.819
256 × 128 IoU 0.849 0.855 0.822 0.826 0.795 0.810 0.743 0.746
Dice 0.918 0.922 0.902 0.904 0.886 0.895 0.852 0.855
512 × 256 IoU 0.893 0.893 0.840 0.845 0.863 0.869 0.762 0.780
Dice 0.944 0.943 0.913 0.916 0.926 0.930 0.865 0.876
Augmentation v2 128 × 64 IoU 0.788 0.791 0.773 0.776 0.712 0.724 0.674 0.684
Dice 0.882 0.884 0.872 0.874 0.832 0.840 0.805 0.812
256 × 128 IoU 0.850 0.845 0.836 0.823 0.799 0.785 0.790 0.737
Dice 0.919 0.916 0.910 0.903 0.888 0.880 0.883 0.849
512 × 256 IoU 0.873 0.873 0.836 0.841 0.824 0.831 0.759 0.766
Dice 0.932 0.932 0.911 0.914 0.903 0.908 0.863 0.867

Fig. 2. The comparison results of different epidermis layer segmentation methods illus-
trated in a histogram


Table 3. Results of skin layer segmentation achieved by CFPNet-M

Variant | Size | Index | Epidermis: Adam (k=5, k=10), SGD (k=5, k=10) | SLEB: Adam (k=5, k=10), SGD (k=5, k=10)
Original 128 × 64 IoU 0.741 0.737 0.612 0.647 0.640 0.643 0.489 0.517
Dice 0.851 0.849 0.759 0.786 0.781 0.783 0.657 0.682
256 × 128 IoU 0.814 0.812 0.684 0.714 0.725 0.731 0.535 0.569
Dice 0.897 0.896 0.812 0.833 0.840 0.844 0.697 0.725
512 × 256 IoU 0.841 0.842 0.637 0.672 0.766 0.773 0.452 0.549
Dice 0.914 0.914 0.778 0.804 0.868 0.872 0.622 0.709
Augmentation v1 128 × 64 IoU 0.785 0.789 0.755 0.755 0.732 0.743 0.659 0.652
Dice 0.880 0.882 0.860 0.861 0.845 0.852 0.795 0.789
256 × 128 IoU 0.840 0.845 0.808 0.810 0.797 0.804 0.716 0.719
Dice 0.913 0.916 0.894 0.895 0.887 0.891 0.835 0.837
512 × 256 IoU 0.871 0.873 0.830 0.833 0.840 0.843 0.750 0.755
Dice 0.931 0.932 0.907 0.909 0.913 0.915 0.857 0.860
Augmentation v2 128 × 64 IoU 0.782 0.783 0.751 0.752 0.720 0.732 0.646 0.647
Dice 0.878 0.878 0.858 0.858 0.837 0.845 0.785 0.786
256 × 128 IoU 0.838 0.841 0.798 0.807 0.789 0.795 0.699 0.716
Dice 0.912 0.914 0.888 0.893 0.882 0.886 0.823 0.834
512 × 256 IoU 0.866 0.867 0.826 0.828 0.827 0.832 0.743 0.744
Dice 0.928 0.929 0.905 0.906 0.905 0.909 0.852 0.853

Table 4. Results of skin layer segmentation for ROI data set achieved by U-Net architecture

Data set | Index | Epidermis: Adam (k=5, k=10), SGD (k=5, k=10) | SLEB: Adam (k=5, k=10), SGD (k=5, k=10)
Original IoU 0.824 0.822 0.768 0.791 0.740 0.729 0.651 0.660
Dice 0.903 0.902 0.869 0.883 0.850 0.843 0.789 0.795
Augmentation v1 IoU 0.849 0.849 0.818 0.821 0.796 0.802 0.736 0.742
Dice 0.919 0.919 0.900 0.901 0.886 0.890 0.848 0.852
Augmetation v2 IoU 0.846 0.846 0.817 0.816 0.787 0.791 0.722 0.730
Dice 0.916 0.917 0.899 0.899 0.881 0.883 0.839 0.844

Table 5. Results of skin layer segmentation for ROI data set achieved by DC-UNet architecture

Data set | Index | Epidermis: Adam (k=5, k=10), SGD (k=5, k=10) | SLEB: Adam (k=5, k=10), SGD (k=5, k=10)
Original IoU 0.822 0.829 0.788 0.800 0.722 0.733 0.634 0.664
Dice 0.902 0.906 0.881 0.889 0.838 0.846 0.776 0.798
Augmentation v1 IoU 0.867 0.862 0.837 0.839 0.826 0.813 0.759 0.768
Dice 0.928 0.926 0.911 0.912 0.905 0.897 0.863 0.869
Augmentation v2 IoU 0.849 0.856 0.834 0.839 0.795 0.801 0.750 0.764
Dice 0.918 0.922 0.909 0.912 0.886 0.889 0.857 0.866
Table 6. Results of skin layer segmentation for ROI data set achieved by CFPNet-M architecture

Data set | Index | Epidermis: Adam (k=5, k=10), SGD (k=5, k=10) | SLEB: Adam (k=5, k=10), SGD (k=5, k=10)
Original IoU 0.824 0.823 0.764 0.775 0.725 0.737 0.643 0.666
Dice 0.903 0.903 0.866 0.873 0.840 0.848 0.783 0.799
Augmentation v1 IoU 0.851 0.852 0.823 0.825 0.809 0.811 0.741 0.743
Dice 0.920 0.920 0.903 0.904 0.894 0.896 0.851 0.853
Augmentation v2 IoU 0.847 0.849 0.821 0.824 0.797 0.802 0.736 0.736
Dice 0.917 0.918 0.902 0.903 0.887 0.890 0.848 0.848

Fig. 3. The comparison results of different SLEB layer segmentation methods illustrated in a histogram

Fig. 4. The comparison results of different epidermis layer segmentation methods for the ROI data set illustrated in a histogram

Fig. 5. The comparison results of different SLEB layer segmentation methods for the ROI data set illustrated in a histogram

Fig. 6. Boxplots for epidermis layer segmentation according to the network architecture. Each box covers the 25th to 75th percentile range, with the median value given and indicated by a central line; the extreme values are bounded by whiskers

Fig. 7. Boxplots for SLEB layer segmentation according to the network architecture. Each box covers the 25th to 75th percentile range, with the median value given and indicated by a central line; the extreme values are bounded by whiskers

Fig. 8. Exemplary segmentation results for the DC-UNet model (red) superimposed on expert delineation (yellow): (a) AD, (b) psoriasis

4 Conclusion
This study presents the results of analyzing the impact of neural network archi-
tecture and region of interest selection on skin layer segmentation in high-
frequency ultrasound images. In the analysis, we considered the influence of
image size, data augmentation, applied optimization method, and binarization
threshold. Additionally, the influence of k value used in k-fold cross-validation
was investigated.
The most critical concern of this analysis was the network architecture, which
seems to be strongly related to the size of the analyzed images and the augmen-
tation strategy. From our observation, the U-Net performed better for a small
set without augmentation and images of a smaller size. The CFPNet-M pro-
vides higher accuracy than U-Net for larger images. On the other hand, the
DC-UNet network was the best solution for the set with augmentation. The
region of interest selection improved segmentation quality for all architectures using the SGD optimization method; however, it did not achieve the highest results among all analyzed models. From this, we can conclude that this step is not necessary to limit erroneous detection of the epidermis and does not bring the expected improvement in segmentation. Among the optimization techniques, the Adam optimizer proved to be much better in this matter. The augmentation step significantly improved the segmentation results, although limited rotation works better for layered images. Finally, the larger the image size, the better the results; however, larger images strongly increase the training time.

Acknowledgement. This research was funded by the Polish Ministry of Science and
Silesian University of Technology statutory financial support No. 07/010/BK 22/1011.

Skin Lesion Matching Algorithm for Application in Full Body Imaging Systems

Maria Strąkowska(B) and Marcin Kociołek

Institute of Electronics, Lodz University of Technology, Al. Politechniki 10, 93-590 Łódź, Poland
{maria.strakowska,marcin.kociolek}@p.lodz.pl

Abstract. Full body imaging systems (FBS) have recently gained attention as an efficient tool for patient screening in early melanoma detection. Their advantage is the ability to detect suspicious changes that appear in places that are difficult for the patient to see independently (e.g., on the back), as well as to observe newly formed changes and detect the growth of existing nevi. An essential part of FBS software is a lesion matching algorithm that enables pairing of lesions detected during a patient's follow-up examinations. This paper proposes such an algorithm, based on feature matching followed by triangulation. It is demonstrated that the proposed method provides relatively fast and accurate lesion matching. The obtained sensitivity and precision, at the level of 85.9% and 86.9% respectively, satisfy the requirements defined in the specification of the FBS, which is currently under development.

Keywords: Matching · Triangulation · Melanocytic nevi

1 Introduction

Skin cancer takes its death toll worldwide, with an observable upward trend. However, if detected early, the chances of recovery are very high. Detection of this
neoplasm has long been supported by the development of image analysis meth-
ods that have been applied in the discrimination of pathological skin lesions [17].
Many algorithms for the detection and classification of skin neoplastic changes
have been described in the literature, good reviews of such methods can be found
in [3,13]. There are also many mobile applications devoted to analysis of skin
lesions that use photos taken with a smartphone [4]. Recently, full body (or
whole-, total-body) systems (FBS) have also been developed, which provide the
possibility of taking photographs of the patient’s entire body (and not just indi-
vidual moles). The advantage of this approach is the ability to detect suspicious
changes that appear in places that are difficult for the patient to see indepen-
dently (e.g., on the back). Another advantage of such systems is the possibility
of observing newly formed changes and detecting the growth of existing nevi,

which should be of particular interest to a dermatologist because they can turn into neoplastic changes.
A prototype of such a system was developed by Skopia Esthetic Clinic,
Kraków in cooperation with the Institute of Electronics, Lodz University of
Technology as part of a jointly implemented NCR&D project “Development of
a device to support early stage examination of skin lesions, including melanoma
using computer imaging, spatial modeling, comparative analysis and classifica-
tion methods”. To capture the patient’s pictures the prototype is equipped with
a system which automatically changes RGB digital camera (SONY DSC RX100)
location in two directions: vertically as well as around the person being examined.
During examination the person stands on a motionless platform at the center
of the device. The boom to which the camera is attached gradually rotates and
stops at selected number of equidistant angular positions. At each angular posi-
tion the camera is moved vertically and stops at selected number of positions.
For each stop of the camera in its vertical movement the camera captures the
picture. Usually, the examination contains 32 overlapping pictures (with size
of 5488 × 3664 pixels) which cover the whole area of skin of the patient skin.
The prototype is equipped with image processing software that includes, among others, lesion detection and segmentation algorithms as well as a procedure to build a 3D model of the patient. It also allows relating the acquired skin images to the model, and orthorectifying these images to enable detection of size and shape changes in nevi. Details about the FBS prototype, along with the implemented algorithms, can be found in [15,16]. The next step in system software development is the preparation of a lesion pairing algorithm. It will be needed to compare the images acquired during follow-ups to detect new lesions or significant growth of existing ones. Such algorithms are not very common in the literature. One
of them is presented in [10], where firstly lesions are detected using a pictorial
structure algorithm. The lesions that are located within the polygon defined by
the landmarks are identified. Then, these lesions are matched by labeling an
association graph using a tensor-based algorithm. A structured support vector
machine is employed to learn all free parameters in the aforementioned steps.
Another solution is described in [9], where the FBS equipped with a turntable
that allows patients body scans acquired for different positions is presented. The
detected and triangulated lesions are matched across stereo pairs at different
positions of the turntable and grouped into sets. Next, a mole map is built for a
topological description of all detected lesions. Using the controlled environment
of the acquisition chamber and following the scanning protocol, two such maps
from different scans can be matched, and temporal changes in individual lesions
detected. An updated version of the lesion matching algorithm is described in [8].
This paper presents another approach to matching detected moles. The developed algorithm is based on feature matching followed by triangulation. It is demonstrated that the proposed method provides relatively fast and accurate lesion matching, satisfying the requirements defined in the FBS specification.

2 Materials and Methods


2.1 Materials

Our research was conducted on image pairs obtained by means of the acquisition system described in [15,16]. Informed consent was obtained from all subjects
involved in the study. Ethical review and approval were waived by the Ethics
Committee of Regional Medical Chamber in Kraków for this study, due to full
non-invasiveness of the conducted research (no medical procedures, the only
action was taking optical pictures of the participants) and to ensuring the
anonymity of the volunteers participating in the study. We had at our disposal
a series of photos received for five patients taken over a certain period of time.
Each study contained 40 pairs of images taken from 8 directions at 5 different
heights. Due to the tedious manual assessment of matched nevi, required during
our research, we limited our data set to 4 pairs of photos for each patient (total
of 20 image pairs). Example of acquired image pair is shown in Fig. 1.

Fig. 1. Example of test image pair

Each pair contains images taken of the same person over a period of time. Both pictures of a pair are taken at the same height and from the same direction. We used images of both men and women, and the image pairs were taken from different directions. For anonymization, the pairs of images in this study were cropped so that the patient's head was not visible. There were from 8 to 143 melanocytic nevi in the individual images.
Neural networks are reliable tools for many medical image analysis tasks
[5,7,11,14]. Also in this study, a nevus detection procedure [15] was based on a
deep learning network (YOLO3 [12]), as a result of which a binary mask was obtained, containing areas of the detected nevi along with vectors containing
centroids and bounding boxes of individual nevi. An example of a mask with
discovered nevi superimposed on the input image is shown in Fig. 2.
The goal is to match the lesions from images taken at different times, e.g., with an interval of six months. The result of the mole matching algorithm is a list of indexes of matched moles, which correspond to the same spots on the patient's

Fig. 2. An example of a binary mask with discovered nevi superimposed on the input
image (light green spots on the image).

skin. Although the images mentioned above are taken in the same reference position of the patient's body, it is not easy to match lesions corresponding to each other. This results from differences in the patient's body posture, changes in their weight, skin tan, the lingerie worn, etc. Most importantly, the character of the lesions can change: their size, shape and color can evolve, which is in fact the reason to match and compare them. The block scheme of the developed algorithm is shown in Fig. 3.

Fig. 3. Simplified block scheme of the algorithm

Two main stages can be indicated: the first is based on feature matching, the second on triangulation using the lesions matched in stage one.

Matching Based on Features. The first step is intended to find pairs of lesions using the feature matching method. The features are detected from the Regions of Interest (ROI) defined in the input binary mask, which indicates the lesion areas on the skin. The ROIs are slightly expanded to include both the nevi and their neighborhood. Several feature types were tested, such as SIFT (Scale Invariant Feature Transform), BRISK (Binary Robust Invariant Scalable Keypoints), ORB (Oriented FAST and Rotated BRIEF), KAZE [1] and AKAZE (Accelerated-KAZE) [2]. Using such feature detectors, the keypoints

and their descriptors for both images are found. Keypoints are the points of interest found by the algorithm; a descriptor is the set of values which describes a keypoint. By comparing the similarity of descriptor sets, two different keypoints can be matched. A Brute Force Matcher is used to match these descriptors together and find similarities between the images. Tests of different types of feature detectors show that AKAZE features give the best results: they ensure the highest number of keypoint matches, both overall and valid ones (i.e., those indicating the same lesion on the skin). Exemplary results of detected AKAZE features and matching are shown in Fig. 4(a).

Fig. 4. AKAZE keypoints matched by the Brute Force Matcher algorithm: (a) all matched keypoints, (b) keypoints detected for the marked lesion (rectangle)
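A minimal sketch of this detection-and-matching step, assuming OpenCV (cv2); the ROI expansion and the later voting are omitted here:

import cv2

def match_keypoints(img1, img2):
    """Detect AKAZE keypoints in two grayscale images and match their
    binary descriptors with a brute-force Hamming matcher."""
    akaze = cv2.AKAZE_create()
    kp1, desc1 = akaze.detectAndCompute(img1, None)
    kp2, desc2 = akaze.detectAndCompute(img2, None)
    # AKAZE descriptors are binary, so Hamming distance applies;
    # crossCheck keeps only mutually best matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desc1, desc2)
    return kp1, kp2, sorted(matches, key=lambda m: m.distance)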

The pairs of matched keypoints are the intermediate result of this stage of the algorithm. Each lesion may have many keypoints (Fig. 4(b)), and not all of them can be paired with the ones corresponding to the same lesion in the second image. As a result, an additional operation must be performed to obtain unambiguous pairs of lesions. Simple voting for the highest number of connections between the keypoints of the lesions has been used to obtain the final matching presented in Fig. 5 (a sketch of this voting step is given below the figure).

Fig. 5. Connected pairs of the lesions in both images
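A minimal sketch of the voting step; the mappings from keypoint index to lesion label (lesion_of_kp1, lesion_of_kp2) are assumed to be derived from the ROI masks, and the greedy one-to-one assignment is one possible reading of the description:

from collections import Counter

def vote_lesion_pairs(matches, lesion_of_kp1, lesion_of_kp2):
    """Collapse keypoint matches into unambiguous lesion pairs by voting
    for the highest number of keypoint connections between two lesions."""
    votes = Counter((lesion_of_kp1[m.queryIdx], lesion_of_kp2[m.trainIdx])
                    for m in matches)
    pairs, used = {}, set()
    for (l1, l2), _count in votes.most_common():  # strongest votes first
        if l1 not in pairs and l2 not in used:
            pairs[l1] = l2
            used.add(l2)
    return pairs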



Many lesions, especially healthy ones, can be very similar to each other. Therefore, the above method may not work correctly in every case and can generate wrong matches, such as those seen in Fig. 5. The most obvious wrong matches are easy to find, as their connecting lines between the two images have a different angle from most of the others. To solve this problem, the RANSAC (Random Sample Consensus) [6] algorithm has been used to remove the wrong matches and leave only those which agree with the detected model. The result of the first stage of the algorithm is shown in Fig. 6.

Fig. 6. Matched lesions after RANSAC algorithm
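A minimal sketch of this outlier rejection, assuming OpenCV; fitting a homography is one common way to realize the RANSAC model check, since the paper does not specify the geometric model used:

import numpy as np
import cv2

def filter_matches_ransac(kp1, kp2, matches, reproj_thresh=5.0):
    """Keep only keypoint matches consistent with a RANSAC-fitted model."""
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # Matches disagreeing with the fitted model are flagged as outliers.
    _, inliers = cv2.findHomography(src, dst, cv2.RANSAC, reproj_thresh)
    return [m for m, ok in zip(matches, inliers.ravel()) if ok]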

The indicated pairs of lesions have been created correctly. However, many lesions in the images have not been matched, as is evident when comparing Fig. 4(a) to Fig. 6.

Matching Based on Triangulation. The second stage of the algorithm is based on the ML (matched lesions) from the previous one. These pairs are the reference points in both images. In simplification, the UL (unmatched lesions) from both images are paired based on their position with respect to these confirmed points. Let us assume that we have one UL in image 1, indicated in Fig. 7(a). The same lesion should be found in the follow-up image around the same area, because the body posture is similar. This UL is searched for in the second image inside a limited area defined by the circle presented in Fig. 7(a).
The center of the mentioned circle is placed at the coordinates of the unmatched lesion from image 1 (UL_im1), while its radius (cirRad) is calculated based on the patient's body center of mass in the two images according to Eq. (1). Moreover, the value of the radius is limited to the range [300, 500]; these values were defined experimentally for the used imaging system.

$$cirRad = 2 \cdot \|p_1 - p_2\| \qquad (1)$$

where $p_1$ and $p_2$ are the coordinates of the patient's body center of mass, estimated by binarization of the images in the H channel of the HSV color space.
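A minimal sketch of Eq. (1) with the experimentally chosen limits, assuming the norm reading of the difference of the two center-of-mass coordinates:

import numpy as np

def search_radius(p1, p2, r_min=300.0, r_max=500.0):
    """Radius of the circular search area (Eq. 1): twice the shift of the
    body's center of mass between the two images, clipped to [300, 500]."""
    r = 2.0 * np.linalg.norm(np.asarray(p1, float) - np.asarray(p2, float))
    return float(np.clip(r, r_min, r_max))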

Fig. 7. (a) Unmatched lesion in image 1 and (b) the corresponding area where the pair is searched for

The unmatched lesions form triangles with the matched ones found in their nearest neighborhood. The operation is performed on the first image with the currently considered UL1. The next sets of triangles are created on the follow-up image with unmatched spots which meet the neighborhood condition and share the same matched reference lesions as in image 1. Figure 8 illustrates this operation.

Fig. 8. Triangles created by the same vertex of matched (dots) and unmatched (crosses) lesions: (a) first potential pair to lesion no. 1, (b) second potential pair to lesion no. 1

Let’s assume that we have 4 matched lesions (dots) and 3 unmatched ones
(crosses) – Fig. 8. Now let’s try to find the pair to the lesion no. 1 in the first
image (U L1 ). The position condition defined by the circle radius is met in image
2 for points no. 1 and no. 2 (U L1.1 \U L1.2 ). Only these points are the candidates
to be paired with U L1 . Next, the set of triangles with the common unmatched

lesion (UL1) and the ML from algorithm stage 1 are created. Considering four ML in the neighborhood of UL1, three triangles (t1, t2, t3) will be created for UL1 in image 1 – Fig. 8(a). The coordinates of the centers of these lesions are the vertices of the built triangles; the common vertex of all triangles is the center of the unmatched lesion UL1. Taking the same ML in image 2, new sets of corresponding triangles are created: (t1.1, t2.1, t3.1) for UL1.1 (Fig. 8(a)) and (t1.2, t2.2, t3.2) for UL1.2 (Fig. 8(b)). The final matching is performed by analyzing the similarity of the sets of triangles created for each individual unmatched lesion from the follow-up image. To perform this task, two matrices of parameters are calculated according to Eqs. (2) and (3):

$$sss(m, n) = \mathrm{std}\!\left(\frac{a_n}{a_{n.m}}, \frac{b_n}{b_{n.m}}, \frac{c_n}{c_{n.m}}\right) \qquad (2)$$

$$aaa(m, n) = \mathrm{std}\!\left(\frac{\alpha_n}{\alpha_{n.m}}, \frac{\beta_n}{\beta_{n.m}}, \frac{\gamma_n}{\gamma_{n.m}}\right) \qquad (3)$$

where $a_{n.m}$, $b_{n.m}$, $c_{n.m}$ are the triangle sides, $\alpha_{n.m}$, $\beta_{n.m}$, $\gamma_{n.m}$ are the triangle angles, n is the number of triangles for UL1 created with the same reference matched lesions in images 1 and 2, and m is the number of candidate lesions (from the follow-up image) to be paired with UL1 from image 1.
In this way, an array of triangle similarity parameters is created. The number of rows equals the number of unmatched lesions in the second image that could be a valid match to the selected unmatched lesion in image 1; the number of columns equals the number of triangles created with the reference lesions. The parameter calculation is based on the Side-Side-Side (sss(m, n)) and Angle-Angle-Angle (aaa(m, n)) rules of triangle comparison. The reduction to a single value is made by calculating the standard deviation of the quotients of the corresponding sides or angles of a given triangle; the closer the standard deviation is to zero, the more similar the triangles are. Finally, voting takes place. Each value in the matrix casts a positive vote if it does not exceed the threshold value th_vote. Such voting takes place for both types of parameters – sides and angles – and aims to find the most similar triangles. The candidate lesion with the highest number of votes is chosen as the pair of the considered unmatched lesion. This part of the algorithm has two iterations; the result of the first one is shown in Fig. 9(a). To increase the number of matches, a second iteration is performed. The triangles are now created on new vertices, as the number of matched lesions (the reference points for triangles) has increased. It is also assumed that the area of interest was too small in the previous iteration, so some pairs of unmatched lesions in the second image were outside the area of interest; in the second iteration this area (cirRad) is extended 2 times. The result after this operation is shown in Fig. 9(b).
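A minimal sketch of Eqs. (2)–(3) and the vote test; the triangle sides and angles are assumed to be precomputed, and the threshold value is a placeholder, since th_vote is not given numerically in the paper:

import numpy as np

def sss_aaa(sides_ref, sides_cand, angles_ref, angles_cand):
    """Eqs. (2)-(3): standard deviation of the ratios of corresponding
    sides (sss) and angles (aaa); values near zero mean the triangles
    are nearly similar."""
    sss = np.std(np.asarray(sides_ref) / np.asarray(sides_cand))
    aaa = np.std(np.asarray(angles_ref) / np.asarray(angles_cand))
    return sss, aaa

def count_votes(param_matrix, th_vote=0.1):
    """Each matrix entry casts a positive vote when it does not exceed
    th_vote; votes are summed per candidate lesion (matrix row)."""
    return (np.asarray(param_matrix) <= th_vote).sum(axis=1)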
Figure 10 shows the result for all matched lesions paired by the developed algorithm, both by the feature matching method and by triangulation.

Fig. 9. New matches detected by the method of triangles: (a) first iteration, (b) second iteration

Fig. 10. All pairs of lesions matched by the algorithm

3 Results

In total, the melanocytic nevus detection algorithm found 599 nevi in the ini-
tial images and 648 nevi in the follow-up images. Table 1 summarizes the test
performed.

Table 1. Numerical summary of performed test

Total   Actually positive (P)   Actually negative (N)   True positive (TP)   False positive (FP)   True negative (TN)   False negative (FN)
34688   433                     34255                   372                  56                    34199                61

Table legend: Total – all possible nevi pairs in all analysed images; for each image pair, the number of nevi pairs in the original and follow-up image equals the product of the numbers of lesions found in both images, and the total number of nevi pairs is the sum over all analysed images. Actually Positive (P) – correct nevi pairs found by the manual review. Actually Negative (N) – incorrect nevi pairs rejected during the manual review. True Positive (TP) – nevi pairs found by the lesion matching algorithm and confirmed during the manual review. False Positive (FP) – nevi pairs found by the lesion matching algorithm but rejected during the manual review. True Negative (TN) – incorrect nevi pairs rejected both by the lesion matching algorithm and during the manual review. False Negative (FN) – nevi pairs not detected by the lesion matching algorithm but found during the manual review.
Based on the examined image pairs, the detected nevi can be combined into 34,688 pairs. Of course, most of these pairs can be easily rejected, e.g., due to a significant vertical and/or horizontal shift between the nevi. After careful manual analysis by a specialist, 433 correct pairs were found. Our matching algorithm identified 428 pairs, 56 of which were false positives. The algorithm failed to discover 61 (false negative) pairs. Table 2 shows basic statistics of the performed test.

Table 2. Statistics of performed test

Sensitivity   TP/P          85.9%
Specificity   TN/N          99.8%
Precision     TP/(TP+FP)    86.9%
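As a cross-check, the measures in Table 2 can be reproduced from the counts in Table 1 (a minimal sketch):

def detection_stats(tp, fp, tn, fn):
    """Sensitivity, specificity and precision as defined in Table 2."""
    return {
        "sensitivity": tp / (tp + fn),  # TP / P
        "specificity": tn / (tn + fp),  # TN / N
        "precision": tp / (tp + fp),
    }

# Counts from Table 1: TP = 372, FP = 56, TN = 34199, FN = 61
print(detection_stats(372, 56, 34199, 61))
# -> sensitivity ~0.859, specificity ~0.998, precision ~0.869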

Our algorithm achieved relatively high sensitivity (85.9%) as well as precision (86.9%). Additionally, we performed a speed test on a PC running Ubuntu 20.04 OS, equipped with an Intel Core i7-9800X (8 cores, 16 threads, 3.8 GHz) CPU, an Nvidia Titan RTX (4608 CUDA cores with 24 GB of GDDR6 RAM) GPU, and 128 GB of RAM. The average time to match the melanocytic nevi for one pair of images, using one processor core, was about 28 s. Hence, the analysis of the entire set of 40 images from a typical examination of one patient takes less than 20 min. In the future, we plan a multi-threaded implementation of the algorithm, which will significantly shorten the nevi matching time for the entire data set.

4 Discussion and Summary

The obtained preliminary results are promising. The sensitivity and precision at the level of 85.9% and 86.9% are satisfactory and exceed the values assumed in the system specification (85%). Compared to other research, the obtained results are also satisfactory. In [10], the reported measures for the structured graphical model algorithm are 80% (precision) and 87% (sensitivity). In [9], no quantitative results of the stereo-pair-processing-based algorithm are provided; from the partial results, it can be roughly estimated that the accuracy of the developed technique is in the range of 83%–89%. The improved version of this algorithm presented in [8] achieves precision of almost 100%, but the average sensitivity is 79%. Also, the algorithm proposed there is quite complex and rather inapplicable to systems that should operate relatively fast (due to the limited visit time of each patient).
There is still room for further improvement. The greatest number of misidentified pairs concerned moles located in places where the skin surface was inclined at an angle deviating significantly from 90◦. Currently, our method is based solely on the information contained in the images. In the near future, we intend to supplement this method with information from the 3D model. In such a situation, the analysis will cover only those pairs of marks whose inclination in relation to the camera axis is close to a right angle; the remaining nevi will be matched based on imaging taken from other directions. Our acquisition system [15,16] in its current configuration performs imaging from 8 directions around the patient's axis, which allows limiting the range of angles at which the nevi are analyzed to around 65◦÷115◦. Nevi matching accuracy is also influenced by the quality of the nevi segmentation algorithm: improper segmentation may change the lesion shape, which will adversely affect the feature calculation step in the matching algorithm. Although our CNN-based segmentation method [15] is quite accurate (both precision and sensitivity > 90% were obtained), we are still working on improving it. This is important because the binary masks of detected and segmented lesions obtained from the previous step of the developed system are used as the input data of the presented matching algorithm.
The algorithms were developed and tested based on the data obtained so far (patients' images acquired by the project partner – Skopia Estetic Clinic). Currently, these algorithms are being validated on the emerging images of new patients. In the future, the algorithm will be used in a dermatology clinic for screening; prior to that, the software will be tuned to the specific vision system (whose parameters may influence the algorithm performance) and tested on a larger number of images.

Acknowledgement. This work has been supported by the European Regional Fund
and National Centre for Research and Development project POIR.04.01.04-00-0125/18-
00 “Development of a device to support early stage examination of skin lesions, includ-
ing melanoma using computer imaging, spatial modeling, comparative analysis and
classification methods”, implemented by the Skopia Estetic Clinic Sp. z o.o. and the
Lodz University of Technology.

References
1. Alcantarilla, P.F., Bartoli, A., Davison, A.J.: KAZE features. In: Computer Vision – ECCV 2012. LNCS, vol. 7577, pp. 214–227 (2012). https://doi.org/10.1007/978-3-642-33783-3_16
2. Alcantarilla, P.F., Nuevo, J., Bartoli, A.: Fast explicit diffusion for accelerated features in nonlinear scale spaces. In: British Machine Vision Conference (2013). http://www.bmva.org/bmvc/2013/Papers/paper0013/abstract0013.pdf
3. Barata, C., Celebi, M.E., Marques, J.S.: A survey of feature extraction in dermoscopy image analysis of skin cancer. IEEE J. Biomed. Health Inform. 23, 1096–1109 (2019). https://doi.org/10.1109/JBHI.2018.2845939
4. de Carvalho, T.M., Noels, E., Wakkee, M., Udrea, A., Nijsten, T.: Development of smartphone apps for skin cancer risk assessment: progress and promise. JMIR Dermatol. 2(1), e13376 (2019). https://doi.org/10.2196/13376
5. Chrzanowski, L., Drozdz, J., Strzelecki, M., Krzeminska-Pakula, M., Jedrzejewski, K.S., Kasprzak, J.D.: Application of neural networks for the analysis of intravascular ultrasound and histological aortic wall appearance – an in vitro tissue characterization study. Ultrasound Med. Biol. 34, 103–113 (2008). https://doi.org/10.1016/J.ULTRASMEDBIO.2007.06.021
6. Fischler, M.A., Bolles, R.C.: Random sample consensus. Commun. ACM 24, 381–395 (1981). https://doi.org/10.1145/358669.358692
7. Gentillon, H., Stefańczyk, L., Strzelecki, M., Respondek-Liberska, M.: Parameter set for computer-assisted texture analysis of fetal brain. BMC Res. Notes 9, 1–18 (2016). https://doi.org/10.1186/S13104-016-2300-3
8. Korotkov, K., et al.: An improved skin lesion matching scheme in total body photography. IEEE J. Biomed. Health Inform. 23, 586–598 (2019). https://doi.org/10.1109/JBHI.2018.2855409
9. Korotkov, K., Quintana, J., Puig, S., Malvehy, J., Garcia, R.: A new total body scanning system for automatic change detection in multiple pigmented skin lesions. IEEE Trans. Med. Imaging 34, 317–338 (2015). https://doi.org/10.1109/TMI.2014.2357715
10. Mirzaalian, H., Lee, T.K., Hamarneh, G.: Skin lesion tracking using structured graphical models. Med. Image Anal. 27, 84–92 (2016). https://doi.org/10.1016/J.MEDIA.2015.03.001
11. Obuchowicz, J.R., Kruszyńska, M.S.: Classifying median nerves in carpal tunnel syndrome: ultrasound image analysis. Biocybern. Biomed. Eng. 41, 335–351 (2021). https://doi.org/10.1016/j.bbe.2021.02.011
12. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788 (2016)
13. Saba, T.: Recent advancement in cancer detection using machine learning: systematic survey of decades, comparisons and challenges. J. Infect. Public Health 13, 1274–1289 (2020). https://doi.org/10.1016/J.JIPH.2020.06.033
14. Strzelecki, M.: Texture boundary detection using network of synchronised oscillators. Electron. Lett. 40, 466–467 (2004). https://doi.org/10.1049/EL:20040330
15. Strzelecki, M.H., Strąkowska, M., Kozlowski, M., Urbańczyk, T., Wielowieyska-Szybińska, D., Kociołek, M.: Skin lesion detection algorithms in whole body images. Sensors 21, 6639 (2021). https://doi.org/10.3390/S21196639
16. Szczypiński, P.M., Sprawka, K.: Orthorectification of skin nevi images by means of 3D model of the human body. Sensors 21, 8367 (2021). https://doi.org/10.3390/S21248367
17. Zalewska, A., Strzelecki, M., Sugut, J.: Implementation of an image analysis system for morphological description of skin mast cells in urticaria pigmentosa. Med. Sci. Mon. 3, 260–265 (1997)
Artifact Detection on X-ray of Lung with COVID-19 Symptoms

Alicja Moskal, Magdalena Jasionowska-Skop(B), Grzegorz Ostrek, and Artur Przelaskowski

Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland
a.moskal@student.mini.pw.edu.pl, {jasionowskam,ostrekg,arturp}@mini.pw.edu.pl

Abstract. In the current COVID-19 pandemic, there is a growing number of research papers on computer-aided detection of SARS-CoV-2 symptoms in lung X-rays. Unfortunately, various types of radiographic artifacts are often present that may interfere with pathology recognition by computer-aided systems. The radiographic artifacts include acquisition artifacts, i.e., necklaces, buttons and bra elements, and surgical artifacts, i.e., pacemakers, electrodes, cables. A computational method is presented for detecting, segmenting, and marking foreign bodies using masks that exclude irrelevant areas from further analysis of chest radiograms. After preprocessing, seedpoint detection is performed using Sobel filters with adaptive thresholding based on pixel-oriented K-means. Lung segmentation is performed in parallel, so that only those artifacts that hinder disease recognition are further analyzed. Seed points are grown by thinning and lightly smoothing edges with hole removal, to finally delineate regions based on shape features. The experiments were carried out using both a database of 564 images with COVID-19 findings (including 270 cases with artifacts within the lungs) and a database of 573 images without findings (including 393 cases with artifacts within the lungs). The resulting sensitivity of artifact detection was 74%, including 73% for normal cases and 76% for separately analyzed COVID-19-confirmed cases.

Keywords: Chest X-rays · Artifact detection · Lung image analysis · COVID-19 symptoms

1 Introduction
The current COVID-19 pandemic [1] occurring around the world prompts us to develop automated methods for recognizing the symptoms of SARS-CoV-2 [2] on chest X-rays. Among different imaging modalities, the chest X-ray (CXR) is used to recognize SARS-CoV-2 symptoms due to its low cost, low radiation dose, wide accessibility and ease of operation in general or community hospitals [3]. One of the significant limitations of this diagnostic imaging is the large number of various artifacts that limit the effectiveness of the diagnosis.

There have been reported cases in which radiographic artifacts mimicked potentially more serious pathology. For example, an X-ray artifact caused a diagnostic problem in a child followed up for developmental dysplasia of the hip joint [4]; another time, underpants with a highly recognizable Snoopy print showed a lobulated density overlying the pelvic area [5]. In general, the presence of artifacts can significantly affect the results of computer analysis [6]. Serious problems arise in developing effective methods for computerized COVID-19 symptom recognition to support diagnosis, especially with deep learning methods. An extremely timely and important challenge is to increase the reliability of models and the efficiency of recognition methods by reducing the impact of unwanted, often unavoidable artifacts on the results of CXR image analysis and diagnostic interpretation [7]. This work focuses on developing effective preprocessing methods to detect these artifacts and to identify segmented areas to be excluded from further computerized analysis, so that disease symptoms can be recognized more reliably. This helps address the black-box problem in the results of analyses performed with deep models used to detect COVID-19 symptoms.

1.1 Radiographic Artifacts


In this section, we consider the types of possible radiographic artifacts which can be seen in CXR. They have unusual shapes and vary greatly in size, which affects their various non-standardized image features. Radiographic artifacts can be located in various, often surprising regions of the X-ray image. Generally, they can be divided into two groups: acquisition artifacts and surgical artifacts.

Acquisition Artifacts. This group of artifacts could be avoided at the image acquisition stage by removing these elements before X-ray imaging of the chest; for some reason, this was not done during image acquisition. These artifacts are difficult to remove using preprocessing methods due to their diversity in size, shape and location in the lung field. Moreover, their intensity may be similar to the intensity of ribs, clavicles or the lung periphery, so if foreign objects are located close to these regions, their precise detection becomes more complex. This artifact group includes bra elements such as sliders for adjustable shoulder straps (Fig. 1(a)), adjustable hook-and-eye fastenings (Fig. 1(b)), rings and underwires (Fig. 1(c)), buttons (Fig. 1(d)), chains (Fig. 1(e)), as well as others (Fig. 1(f)).

Surgical Artifacts. Surgical artifacts cannot be removed during image acquisition. This group of artifacts includes pacemakers (Fig. 1(g)), electrodes (Fig. 1(h)) and cables (Fig. 1(i)). The challenge in removing these elements from an X-ray is finding them entirely, along with the cables that are connected to them. The cables are not clearly visible in the image and do not stand out in pixel intensity; in the automated detection process, they are difficult to distinguish from ribs or lung opacity. In addition, radiographic artifacts like pacemakers or electrodes are difficult to detect as one element. In images

Fig. 1. Chest X-rays with ACQUISITION ARTIFACTS: bra sliders for adjustable shoulder straps (a), adjustable hook-and-eye fastening (b), underwire and rings in a bra (c), buttons (d), necklace (e), other (f). Chest X-rays with SURGICAL ARTIFACTS: pacemaker (g), electrode (h) and cable (i)

of this type of artifact, pixels of different intensity are observed, because some elements are made of metal and some of plastic, and they absorb X-rays to a different extent.

1.2 Computer-Aided Detection of Artifacts

The presence of image artifacts makes the diagnostically significant content difficult to enhance in the image. In the case of CXR, the artifacts primarily interfere with the effectiveness of computer-aided algorithms for pathology detection. The artifacts are visible enough not to be a problem for radiologists making a diagnosis. However, both their size and their distribution, often at the lung border or covering the ribs, can interfere with the results of automated detection methods for pathological findings. A lot of work on image preprocessing to remove redundant image artifacts is presented in the literature.
U. Subramaniam et al. [9] applied the histogram of oriented gradients algorithm, the Haar transform, and the local binary pattern algorithm to improve image quality, increasing the intersection-over-union scores in lung segmentation from the X-ray. According to the research group, segmentation of the lungs from the X-ray can improve the accuracy of COVID-19 detection algorithms or any machine/deep learning techniques. M. Heidari et al. [10] indicate that two-stage image preprocessing, using both a histogram equalization algorithm and a bilateral low-pass filter as well as generating a pseudo-color image, matters in developing a deep learning CAD scheme for chest X-rays to improve accuracy in detecting COVID-19 infected pneumonia. The method of Z. Xue et al. [11] was proposed for the detection of one type of foreign object in the X-ray, i.e., buttons on the gown the patient is wearing. The method involves four major steps: intensity normalization, low-contrast image identification and enhancement, segmentation of lung regions, and button object extraction; it is based on the circular Hough transform and the Viola-Jones algorithm. An automated method was presented by Hogeweg et al. [6] to detect, segment, and remove foreign objects from CXR. Detection of buttons, brassiere clips, jewellery, pacemakers and wires was performed with the use of supervised pixel classification with a kNN classifier, resulting in a per-pixel probability estimate of belonging to an artifact. Grouping and post-processing of pixels with a probability above a certain threshold was used for radiographic artifact segmentation. Based on a literature review of methods for automatic detection of COVID-19 symptoms in the X-ray, there seems to be a need for detection and removal of various types of foreign objects in a simple and low-cost algorithmic way at the image preprocessing step. Radiographic artifacts are often present in the lung field, which may erroneously suggest the presence of pathology in computer-aided diagnosis of lung diseases.

2 The Proposed Method for Radiographic Artifact Detection

The proposed method starts with preprocessing the DICOM image to grayscale and contrast enhancement. Lung area segmentation is then performed for further analysis. Edges are detected subsequently, and a binary mask image is created, to which morphological dilation and contour filling are applied in the next stage. The shape features of each detected region are analyzed to eliminate regions with a small area or too elongated, rib-like elements. The method overview is presented in Fig. 2.

Fig. 2. Scheme of the proposed method

The proposed methodology uses: a) initial pixel classification – thresholding to define seedpoints; b) growing of seedpoints to cover all useless, redundant objects located in the lung area; c) complete artifact masks indicating lung areas excluded from the high-level analysis of COVID-19 symptoms.

2.1 Image Preprocessing and Segmentation of Lung Field


The first stage of the method is to convert the raw DICOM intensities to grayscale, i.e., to values in the range [0, 1]. Next, contrast enhancement is performed with the use of histogram saturation at the lowest and highest intensities (as in [11]). In parallel, efficient segmentation of the lung region is performed to limit the area of further analysis to artifacts that hinder COVID-19 symptom recognition. A method combining the idea of representing a lung radiogram using a lung pattern dictionary with texture-based nonrigid registration was used to refine the segmentation results [12]. It consists of three main steps: i) recognition of the specific query lung characteristics by reference to a set of representative patterns; ii) first approximation of the lung outline based on the selected shape patterns; iii) shape refinement by fitting the boundary outline into the locally varying distribution of the relevant lung tissue relative to the surrounding tissues, under the control of an object model of the whole lung outline. Moreover, the segmentation criteria used relate primarily to the need to include any possibly abnormal lung area while excluding any redundancy. The adapted criterion narrows the delineated lung tissue to only diagnostically relevant regions. We refer to this as semantic segmentation of the lungs, implemented for the sake of COVID-19 diagnosis.
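A minimal sketch of the histogram-saturation step; the percentile cut-offs are assumed for illustration, since the paper only states that the saturation follows [11]:

import numpy as np

def normalize_and_stretch(img, p_low=1.0, p_high=99.0):
    """Rescale a radiograph to [0, 1] and saturate the histogram at the
    lowest and highest intensities (percentile-based cut-offs)."""
    img = img.astype(np.float32)
    lo, hi = np.percentile(img, [p_low, p_high])
    return np.clip((img - lo) / max(hi - lo, 1e-8), 0.0, 1.0)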

2.2 Lung Seedpoints Definition and Merging

The seedpoints of possible artifacts in the lungs are detected. Various filter banks and algorithms were experimentally verified, including the Canny and Roberts operators, but finally the Sobel operator gave the best results across the filtered training dataset, so it was chosen to establish seed points by thresholding. The K-means clustering algorithm was used to calculate the threshold values, with the number of clusters adjusted to k = 5. From the obtained cluster center values, the first one greater than t = 0.08 was selected; both parameters were chosen experimentally. The obtained binary image was then dilated with a neighborhood expressed as a 5 × 5 array of ones, to grow seedpoints and merge them into larger masks or discard them. Finally, the lightly smoothed contours are filled in, removing any holes in the finally defined objects.
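A minimal sketch of this seed-point stage, assuming scikit-image and scikit-learn; one plausible reading is that K-means clusters the Sobel response values, and the hole-removal details are simplified:

import numpy as np
from skimage import filters, morphology
from sklearn.cluster import KMeans

def artifact_seed_mask(img, k=5, t_min=0.08):
    """Seed-point mask: Sobel response thresholded at the first K-means
    cluster center greater than t_min, then dilated with a 5x5
    neighborhood of ones and cleaned of small holes."""
    grad = filters.sobel(img)  # img is the preprocessed image in [0, 1]
    centers = np.sort(KMeans(n_clusters=k, n_init=10)
                      .fit(grad.reshape(-1, 1)).cluster_centers_.ravel())
    threshold = centers[centers > t_min][0]  # assumes such a center exists
    mask = grad > threshold
    mask = morphology.binary_dilation(mask, np.ones((5, 5), dtype=bool))
    return morphology.remove_small_holes(mask)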

2.3 Object Selection with Shape Features

The next stage of the proposed method is the analysis of the shapes of the detected objects. First, objects with areas smaller than a given value of M = 1500 pixels are rejected. The elongated objects are then examined to determine whether they have a parabolic shape, which indicates that they may be falsely detected ribs. To detect parabolic-shaped objects, the skeleton of the region is created by Zhang's algorithm [15]. For the extreme left and right points, the straight line equation is delineated, and the middle point between them is checked as to whether it lies above or below the straight line.
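A sketch of the parabola test under the stated description (skimage's skeletonize implements Zhang's thinning for 2D images; the deviation tolerance tol is an assumed parameter, not given in the paper):

import numpy as np
from skimage.morphology import skeletonize

def is_rib_like(region_mask, tol=5.0):
    """Skeletonize the region and check whether the skeleton's middle
    point deviates from the chord joining its extreme left and right
    points (a parabolic, rib-like shape)."""
    ys, xs = np.nonzero(skeletonize(region_mask))
    i_l, i_r = np.argmin(xs), np.argmax(xs)
    x1, y1, x2, y2 = xs[i_l], ys[i_l], xs[i_r], ys[i_r]
    i_m = np.argmin(np.abs(xs - (x1 + x2) / 2.0))  # skeleton midpoint
    # y of the chord at the midpoint's x coordinate
    chord_y = y1 + (y2 - y1) * (xs[i_m] - x1) / max(x2 - x1, 1)
    return abs(float(ys[i_m]) - chord_y) > tol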

Fig. 3. Method optimization: (a) before – ribs detected as artifacts, (b) after – only the electrode detected (desired effect), (c) before – only the brightest artifacts detected, (d) after – all artifacts detected (desired effect)

2.4 Optimization of the Proposed Method

Additional analysis of the shape of the detected objects proved necessary due to the need to differentiate them from the brighter areas of the surrounding ribs. Another challenge was the simultaneous occurrence of several different artifacts with varying pixel intensities. Furthermore, if only the maximum threshold value obtained by the K-means algorithm had been accepted, some artifacts might have been missed. Therefore, a minimum threshold value was determined, and the first cluster whose center value was greater than this minimum was accepted. Figure 3 shows two examples of the applied optimization of the artifact segmentation method.

3 Method Results

For the experiments and evaluation of foreign object detection, a large chest radiogram database was used [13]. The data was collected from Hospital 108 and the Hanoi Medical University Hospital, two of the largest hospitals in Vietnam. The published dataset consists of 18,000 postero-anterior view CXR scans which were annotated by a group of experienced radiologists for the presence of critical findings. From the database, two types of images were extracted: those with no pathological findings (Fig. 4(a)) and those annotated as consolidation or lung opacity (Fig. 4(b)), because these are the most common pathological findings in CXR of COVID-19 patients [14]. There were 1378 and 1366 images in these categories, respectively. Furthermore, a subset of all radiograms with foreign objects, i.e., acquisition or surgical artifacts, was selected. For radiograms with findings, we selected 564 cases, but only 270 of them had artifacts within the lungs. In the second group, we selected 393 such cases to at least roughly balance the number of artifacts of interest in both collections (with or without pathological findings) (Table 1).

Fig. 4. Sample chest X-rays from the database with: (a) no pathological findings and slider artifacts, (b) lung opacity and a necklace artifact. (c) A case for manual evaluation – the lung mask (marked in blue) and the expert's mask (marked in green) have a common area, while the underwires are outside the segmented lung area
Artifact Detection on X-ray of Lung 241

Table 1. Summary of the number of artifact markers inside or outside the lung area

Artifact area          Cases with findings   Cases without findings
Inside lungs region    270                   393
Outside lungs region   294                   180

3.1 Artifacts Delineation

The X-rays were reviewed on a PC computer running Windows OS, with a NEC EA271U display and an NVIDIA Quadro K620 GPU, by a biomedical imaging engineer with 15 years of experience. All radiological artifact findings were manually marked by bounding boxes on preselected cases: 127 CXR with pathological findings and 131 normal cases. A subset of representative, most interesting cases was outlined, selected by location, size, form of manifestation, and diversity. The analysis of the delineated areas indicated that the main groups of artifacts are: small bra elements (sliders, hooks), bra underwires, jewelry, ornaments, and surgical equipment such as pacemakers, stents, oxygen tubes and electrodes. Table 2 shows the percentage of artifacts present in X-rays over the lung area, by category. Columns do not add up to 100% because different categories of artifacts may be present in the same image. The average number of separate small bra elements over the lung region was 6. Surgical equipment was present among patients with findings at a higher rate than in no-findings cases.
Table 2. Summary of artifact markers inside the lung area, in percent

Type                 Cases without findings, %   Cases with findings, %   All, %
underwire            34                          10                       22
bra elements         78                          54                       66
zipper               8                           2                        5
button               12                          10                       11
necklace             32                          9                        21
brooch/safety pin    1.5                         4.7                      2
pacemaker            1.5                         13                       7
stent                0                           2.3                      1
oxygen tube          0                           5.5                      2.7
electrode            1                           11                       5.8
R/L marker           0.7                         0                        0.3

3.2 Measurement of Artifact Detection Efficiency

Using the proposed method, a binary mask of artifacts was created for each image in the chosen dataset. To evaluate the method, for each manually marked region within the lung area, it was checked whether the created mask had any artifacts detected there. If the mask has marked pixels in this region, it is counted as a true positive (TP); otherwise, as a false negative (FN). The sensitivity parameter, defined as TP/(TP + FN), is used to describe the method's efficiency. To delineate the regions inside the lung area, the intersection of the lung mask and the mask selected by an expert was determined. Regions with a minimum of 3000 pixels (a value determined experimentally) were assumed to lie inside the lung field. Additionally, manual verification had to be performed: regions marked by the expert are rectangular in shape and may intersect with the lung area yet not contain any artifact regions in this area, which mainly applies to underwires or necklaces. An example is shown in Fig. 4(c).
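A minimal sketch of these evaluation rules; the mask names are illustrative, and all masks are assumed to be boolean numpy arrays of the image size:

import numpy as np

def inside_lungs(expert_region, lung_mask, min_pixels=3000):
    """An expert-marked region counts as lying inside the lung field when
    its intersection with the lung mask has at least min_pixels pixels."""
    return np.count_nonzero(expert_region & lung_mask) >= min_pixels

def is_true_positive(detected_mask, expert_region):
    """TP when the algorithm marked any pixel within the expert region;
    otherwise the region counts as FN."""
    return bool(np.any(detected_mask & expert_region))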

3.3 Experimental Results

Selected results of the proposed method are presented in Fig. 5(a)–5(h). All show chest X-rays with artifact ground-truth markers (green rectangles), the lung area (blue contours) and the detected mask of artifacts marked in yellow. Only artifacts within the lung area should be found.
All artifacts in Fig. 5(a)–5(c) were correctly detected and marked. Figure 5(d) presents a fully detected electrode, but without the wire, which was probably removed during the filtering of rib-like elements. Figure 5(g) shows that elements of the bra were not detected on the left lung; this may be caused by whiter regions in the left lung and their closeness to the lung's periphery. The problem near the lungs' periphery is also visible in Fig. 5(e), where the artifacts are only partially detected. Other problematic areas can be the clavicles, because of their high contrast on chest X-rays; for example, in Fig. 5(f), buttons located on the clavicles were not fully marked. During edge detection, ribs can easily be found and marked. Part of the proposed method is the filtering of rib-like objects, but it is not always successful, especially when artifacts are connected to the ribs, because of the low contrast between a rib and a foreign object (Fig. 5(h)).
For normal cases, the proposed method correctly detected 288 out of 393 artifact regions, which gives a sensitivity value of 0.73. In the case of the dataset with COVID-19 findings, the method discovered 204 out of 270 artifacts, so the sensitivity value is 0.76. For all cases, the method detected artifacts in 90% of the images. The results are summarized in Table 3.

Table 3. The sensitivity of the proposed solution

Dataset TP FN Sensitivity
Normal cases 288 105 0.73
Cases with confirmed COVID-19 204 66 0.76
All cases 492 171 0.74

Fig. 5. Correctly detected artifacts on chest X-rays (a)–(b) without pathological find-
ings, (c)–(d) with pathological findings. Partially detected artifacts on chest X-rays
with pathological findings (e)–(g). Detected artifacts on chest X-rays without patho-
logical findings (g)–(h). Lung masks are marked in blue. Expert’s masks of artifacts
are marked in green. Algorithm results of artifact detection are marked in yellow

4 Conclusions
Effective detection of radiographic artifacts can, in some cases, increase the utility of computerized lung segmentation by isolating areas of non-significant diagnostic content (most of the most effective methods do not use additional algorithms for artifact detection). However, the role of the proposed artifact detection method and its usefulness increase significantly in the case of lung interpretation, e.g., for the detection of COVID-19 symptoms.
A challenge of deep learning (DL) methods is to use in the trained model only those features that allow reliable interpretations to support diagnostic decisions. Only substantively justified image content related to the analyzed pathology should be described. An extensive computational model often catches random correlations between quality parameters of radiograms coming from different acquisition systems and other incidental image information. It selects the most discriminating features, which are not necessarily related to diagnostically relevant content. The specificity of radiographic artifacts, which depends on the X-ray machine, hospital procedures for organizing imaging studies, regulations on acquisition conditions, etc., may shape the model much more strongly than subtle pathological changes. By recognizing artifacts and eliminating their influence on further analysis and interpretation of radiograms, we increase the reliability and usefulness of these models.
In other machine learning solutions, by using inpainting methods to cover the area of detected artifacts with the specificity of adjacent tissue, we increase the effectiveness of local descriptors characterizing the distribution of local features of radiograms. This often results in an improvement in the number of accurate diagnoses.
The developed and tested method is sufficiently effective to achieve such positive effects, improving the effectiveness of COVID-19 diagnostics support. The presented method: i) is robust to various artifact types; ii) does not require labeled training samples (in this paper, the annotated set was used only for evaluation purposes); iii) can be easily adopted as a preprocessing step in more complex algorithms. The achieved efficiency of artifact detection is limited, mainly due to the highly variable nature of artifacts, different acquisition parameters of radiographs, and the strongly differing specificity of lung tissue in CXR depending on patient characteristics. However, when combined with an effective segmentation method, it addresses important DL problems highlighted in recent reports [16,17].

References
1. World Health Organization: Global research on coronavirus disease (COVID-
19). https://www.who.int/emergencies/diseases/novel-coronavirus-2019/global-
research-on-novel-coronavirus-2019-ncov
2. Kanne, J.P., et al.: COVID-19 imaging: what we know now and what remains
unknown. Radiology 299(3), 262–279 (2021). https://doi.org/10.1148/radiol.
2021204522
Artifact Detection on X-ray of Lung 245

3. Heidari, M., Mirniaharikandehei, S., Zargari, A., et al.: Improving the performance
of CNN to predict the likelihood of COVID-19 using chest X-ray with preprocessing
algorithms. Int. J. Med. Inform. 144 (2020). 104284, ISSN 1386–5056, https://doi.
org/10.1016/j.ijmedinf.2020.104284
4. Uras, I., Yavuz, O.Y., Kose, K.C., Atalar, H., Uras, N., Karadag, A.: Radiographic
artifact mimicking epiphysis of the femoral head in a seven-month-old girl. J. Natl.
Med. Assoc. 98(7), 1181–1182 (2006). PMID: 16895292; PMCID: PMC2569463
5. Mestayer, R.G., Attaway, K.C., Polchow, T.N., Brogdon, B.G.: Snooping around
the adolescent pelvis: good grief, it's the brief! AJR Am. J. Roentgenol.
186(2), 587–588 (2006). PMID: 16423982. https://doi.org/10.2214/AJR.05.0816
6. Hogeweg, L., et al.: Foreign object detection and removal to improve automated
analysis of chest radiographs. Med Phys. 40(7), 071901 (2013). PMID: 23822438.
https://doi.org/10.1118/1.4805104
7. Sarkar, A., et al.: Identification of COVID-19 from chest X-rays using deep
learning: comparing COGNEX VisionPro deep learning 1.0 software with open
source convolutional neural networks. SN Comput. Sci. 2(3), 130 (2021)
8. Murphy, A.: Clothing artifact. Case study, Radiopaedia.org. https://doi.org/10.
53347/rID-59812. Accessed 18 Jan 2022
9. Subramaniam, U., Monica Subashini, M., Almakhles, D., Karthick, A., Manoharan,
S.: An expert system for COVID-19 infection tracking in lungs using image processing
and deep learning techniques. BioMed Res. Int. 2021 (2021). Article ID 1896762,
17 pages. https://doi.org/10.1155/2021/1896762
10. Heidari, M., et al.: Improving the performance of CNN to predict the likeli-
hood of COVID-19 using chest X-ray with preprocessing algorithms. Int. J. Med.
Inform. 144, 104284 (2020). ISSN 1386–5056, https://doi.org/10.1016/j.ijmedinf.
2020.104284
11. Xue, Z., et al.: Foreign object detection in chest X-rays. In: IEEE International
Conference on Bioinformatics and Biomedicine (BIBM), 2015, pp. 956–961 (2015).
https://doi.org/10.1109/BIBM.2015.7359812
12. Przelaskowski, A., Jasionowska, M., Ostrek, G.: Semantic segmentation of abnor-
mal lung areas on chest X-rays to detect COVID-19. Submitted to ITIB 2022
13. Nguyen, H.Q., et al.: VinDr-CXR: An open dataset of chest X-rays with radiolo-
gist’s annotations (2020)
14. Wong, H., Lam, H., Fong, A.T., Leung, S., Chin, T.Y., Lo, C., Lui, M.S., Lee, J.,
Chiu, K.H., Chung, T.H., Lee, E., Wan, E., Hung, I., Lam, T., Kuo, M., Ng, M.Y.:
Frequency and distribution of chest radiographic findings in patients positive for
COVID-19. Radiology 296(2), E72–E78 (2020)
15. Zhang, T.Y., Suen, C.Y.: A fast parallel algorithm for thinning digital patterns.
Commun. ACM 27(3), 236–239 (1984)
16. Lopez-Cabrera, J.D., Orozco-Morales, R., et al.: Current limitations to identify
COVID-19 using artificial intelligence with chest X-ray imaging. Health Technol.
11, 411–424 (2021)
17. Maguolo, G., Nanni, L.: A critic evaluation of methods for COVID-19 automatic
detection from X-ray images. Inf. Fusion 76, 1–7 (2021)
Semantic Segmentation of Abnormal
Lung Areas on Chest X-rays to Detect
COVID-19

Artur Przelaskowski(B) , Magdalena Jasionowska-Skop, and Grzegorz Ostrek

Faculty of Mathematics and Information Science, Warsaw University of Technology,


75 Koszykowa st., 00-662 Warsaw, Poland
{arturp,jasionowskam,grzegorz.ostrek}@mini.pw.edu.pl

Abstract. The main objective of this study was to effectively segment
the lung tissue area in chest radiograms (called a chest X-ray: CXR). The
results of conducted analysis were related to the requirements of effective
detection and description of COVID-19 (C-19) symptoms to support
the diagnosis of this disease. The proposed method uses the concept
of representing the chest radiogram using a dictionary of matched lung
shape patterns in reference CXRs. The initial lung shape approximation
is then corrected by non-rigid registration based on the tissue texture
distribution. The optimization criteria used emphasize tissue features
that may have diagnostic significance. We refer to this as the semantic,
more reliable lung segmentation required by C-19 diagnostic support.
The obtained efficiency is comparable to the best ML reference methods
and not far from the average efficiency of DL methods. The relatively
high values of the minimum fit indices demonstrate the stability and
reliability of the segmentation performed.

Keywords: Chest X-ray radiography · Lung segmentation · Supported
diagnosis of COVID-19 · Semantic descriptors · Recognition of image
signatures

1 Introduction
Reliable recognition of COVID-19 and possibly precise determination of the
scope, severity and prognosis of the disease development is an extremely impor-
tant research task in the era of the current pandemic. However, until now, the
reliability and effectiveness of the commonly used methods for diagnosing C-19
have been limited, debatable and even controversial [1,2]. High hopes are placed
on chest imaging, including computed tomography (CT) and CXR. Although
CT imaging is essentially the gold standard for the diagnosis of lung disease
because it generates spatially differentiated, detailed scans, the low specificity
of CT interpretation is well known. In the case of CXR, a limitation is the low
sensitivity in the initial stages of C-19 development, as well as the ambiguity of


the observed symptoms. Therefore, the search for improvement of the diagnostic
methods used so far or the search for new ones is extremely important.
It can be said that the most common tool used both for diagnosis and for
monitoring the course of the disease is CXR, available in most health care
facilities. Compared to RT-PCR or CT imaging tests, acquiring an X-ray image is
an extremely low-cost and fast process, taking only a few seconds [3]. This
portable modality spares the patient from being moved, minimizing the
possibility of spreading the virus, and exposes the patient to a lower dose of
ionizing radiation [4]. The reported diagnostic performance of CXR in a large,
high C-19 prevalence cohort was significant. For example, 79% sensitivity and
81% specificity were reported for the diagnosis of viral pneumonia in
symptomatic patients with clinical suspicion of C-19 [5].
A special role is played by the concepts and specific implementations of
computer-aided lung diagnosis based on digital CXR images. This applies to
recognition, characteristics or clinical interpretations of such diseases like tuber-
culosis, pneumonia, cardiomegaly, pneumothorax, pneumoconiosis, emphysema
or cancer [6–8]. In particular, effective segmentation of the lung regions from
the surrounding thoracic tissue turns out to be an essential component of any
intelligent method of CXR image analysis, disease recognition or case
interpretation in the context of specific clinical circumstances [9,10]. This
task is not easy, especially in view of significantly diverse body habitus and
varying image quality, including sampling distortions, multiplicative noise,
possible motion and low contrast of the imaged objects, and the presence of
artifacts due to external or internal non-anatomical objects, e.g. introduced
medical equipment such as electrocardiographic leads, prosthetic devices, etc.

2 Lung Segmentation

An analysis of the state of the literature identified three major groups of algo-
rithms: i) rule-based methods generally used as initialization of more robust pixel
classification-based methods; ii) model-based methods including active shape or
active appearance models and level sets with extensions; iii) deep learning (DL)
models in various configurations. Of course, various forms of hybrid integration
of these concepts have also proved effective in some applications [9,11].
However, this problem has taken on particular importance in the context of C-19
diagnosis. The most effective recently developed methods typically propose CXR-
based DL models, overwhelmingly analyzing entire radiograms without the need
to precisely delineate the lung region. However, this has proven to be unreliable
in some cases [12]. The high performance of disease classification was sometimes
determined significantly by features extracted from outside the lung region [4].
Random correlations completely unrelated to the object of analysis proved
decisive. In such cases the trained model did not describe real relationships
between selected image descriptors and reliable symptoms of the disease
confirming its diagnosis. Using lung segmentation, we force the model to focus
only on lung areas, or even on a more narrowly defined region within the lung,
chosen to reflect knowledge regarding C-19 and human interpretive performance.
Segmentation may not improve

the classification results, but because it forces the model to use only lung
area information, it increases the reliability of the results and justifies them.
The effect of lung segmentation and more detailed CXR content on C-19
identification was evaluated by Teixeira et al. [10]. The hypothesis that a proper
lung segmentation might mitigate the bias introduced by data-driven models
and provide a more realistic performance was verified. The obtained results sug-
gest that proper lung segmentation can effectively remove a meaningful part of
noise and background information, forcing the model to take into account only
data from the lung area containing desired information in this specific context.
However, they found that even after segmentation, there is a strong bias due to
factors from the different specifics of the imaging systems. These experiments
made us realize the importance of narrowing the analysis to areas of interest
where actual C-19 symptoms may appear, while reducing the influence of
non-anatomical objects and of artifacts specific to certain imaging procedures.
Thus, we proposed a method that appeals to domain knowledge primarily con-
cerning the specifics of X-ray imaging, the determinants of C-19 diagnosis, includ-
ing the features of differential symptoms, and the object-oriented description of
lung shape using real, representative CXR manifestations ordered by criteria of a
sparse, semantic representation of lung tissue. The solution closest to our concept is
object-oriented lung description using anatomical atlases (AnAtlas) [8]. The refer-
ence atlas was constructed from a limited set of preselected, possibly representative
chest radiograms with binary lung shape GT (Ground Truth) patterns. Five items
were then selected from this atlas that were most similar to the query in terms of
the two (vertical and horizontal) Radon projections of the tissue intensity distribu-
tion. This robust idea, developed in the context of early detection of tuberculosis,
refers to the concept of patient-specific lung model extraction based on the most
similar radiograms preselected according to content-based image retrieval (CBIR),
also used for CXR analysis in [13].
The proposed method reduces to the integration of two fundamental con-
cepts: a) effective local characterization differentiating specific features of lung
tissue relevant to the description of C-19 symptoms with respect to other tissues
and useless objects; b) inference of domain knowledge regarding possible features
of lung shape in the context of possible determinants of C-19 diagnosis. Thus, it
combines object-oriented modeling of lung shape with differential tissue charac-
terization taking into account all anomalies and individual determinants of the
analyzed case. Thus, we developed a kind of dedicated segmentation, which can
serve as estimated activity maps of differentiating features of the C-19. These
maps can be directly analyzed by diagnosticians or used to construct or optimize
models identifying the disease or describing it numerically.

3 Method

We propose a method for effective lung segmentation with particular emphasis
on areas susceptible to manifestations of C-19. That is, segmented areas of
the lung should be primarily narrowed to the areas prone to C-19 lesions, while
segmentation of redundant areas of the lung is particularly inadvisable. Thus,
the immediate goal was to semantically segment any possibly abnormal or diag-
nostically important lung area. In designing the method, we aspired to reliable
simplicity, although we are aware that a matter is always more complex (referring
to Alfred North Whitehead [14]). Therefore, we have appealed to the paradigm
of knowledge-based model as essential in the construction of universal meth-
ods, relating to the essence, reaching deeper than even the most complete set of
its representatives. Formalized conclusions from discussed human observations
and collected decision-making interpretations were used, serving the interactive
optimization of models shaped on the basis of analyzes of available measurement
reconstructions. These reconstructions are formed using possibly invariant ker-
nel or sparse representations [15] using reliable usability criteria. This strictly
relates to the empirical model building proposed by Thompson [16]. Utilizing the
respective domain knowledge, objectified and standardized in most reliable way
our approach propose algorithmic framework to address human interpretations
and decisions as reliably and effectively as possible.
Based on these assumptions, the segmentation method developed is as
follows:
– CXR (C) preprocessing: appropriate preprocessing of the analyzed radio-
grams P(C) for their qualitative normalization, extracting/differentiating fea-
tures of the lung tissue and extracting its diagnostically relevant textural fea-
tures; we used simple but varied forms of preprocessing respectively for query
radiograms and dictionary patterns;
– designation of the knowledge-based dictionary (D): selection of repre-
sentative and possibly complete set of real lung patterns CD constituting GT
was done by analyzing available reference sources; each item was defined by pre-
cise binary maps B(CD ) determined by experienced radiologists; the quality of
the used patterns was controlled by means of selected preprocessing methods
P(CD ); this dictionary is used for approximation of radiogram input by expan-
sion in the set of most similar approximants selected from the dictionary;
– approximation A of the query lung CQ shape: determining a relatively
small set of the most similar approximants CD,Q allows their GT maps to be used
for approximating the segmented radiogram map; we do this by dictionary-based
retrieval, whose effectiveness depends both on the similarity measure
S(CQ, CD,Q) used directly and on the preliminary determination of the class of
the retrieved case, narrowing the search area to the nearest dictionary
patterns; next, texture-based nonrigid registration of the selected approximants
was used to determine the pixel-wise transformation model T to fit the
query, CQ ≈ T(CQ, CD,Q), based on the distribution of local image descriptors;
– final lung area refinement: GT bitmaps B(CD,Q ) of the respective approx-
imants are transformed and superimposed S to form the approximated query
bitmap B(CQ ) = S(T (B(CD,Q ))) describing the distribution of significant
lung tissue in the segmented query; S-operator was optimized as slight blurring
and summation of the registered lung masks followed by adaptive threshold-
ing; finally, B(CQ ) was perfected by smoothing using cubic splines-based with
respective control nodes to match the natural properties of the segmented tissue.

A schematic of the method is presented in Fig. 1. The details of the method
are described in turn.

Fig. 1. Flowchart of the proposed method for semantic segmentation of abnormal lung
regions on chest X-ray images. In addition to the basic scheme of the method, extensions
to the suggested method for its effective use are also presented. These are highlighted
in yellow. A description of a possible artifact detector is provided in [35]. The next
section provides a brief outline of our test implementations of the C-19 detector, which
uses feature distributions in a designated, diagnostically relevant ROI. The figure sym-
bolically shows selected forms of texture feature interpretation highlighting potential
C-19 symptoms

3.1 Preprocessing to Improve Segmentation


In our case, the purposes of preprocessing are as follows: a) qualitative
normalization, making further image analysis independent of various acquisition
system parameters (resolution, dynamic range, tube parameters, detector
specifics, noise level and type, artifacts, etc.); b) differentiation of lung
tissue from surrounding structures; c) emphasis on the clarity of the
distribution of key local features of C-19.

State of the Art. Among effective methods, median filtering was sometimes
used both for denoising and to enhance the local contrast [17]. Further contrast
enhancement by normalization allows the contrast between the whole chest and
the background area to be increased. The intensity range has been expanded
to take full advantage of all possible intensity ranges [18]. A more effective
solution starts with an image energy decomposition across subsequent bands.
Then, the localized energy of each band is iteratively scaled to a reference
value to reconstruct a locally normalized image [19], giving maximum
standardization of the structures of interest in the radiogram. For extraction
purposes, 2-D Laws' masks have been used, among others, to improve the texture
information of the segmented chest and highlight its micro-structure
characteristics. Nine 3 × 3 Laws' texture masks were applied to the initially
segmented chest image to produce a new texture image for every convolution mask
[18]. An ensemble model, as a group of weak learners combined into a stronger
one, was used to increase the precision of the model based on GLCM features
calculated from each of the 9 texture images. Efficient extraction of key lung
tissue features by unsharp masking was confirmed by Chen [20].
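For illustration, the standard construction of the nine 3 × 3 Laws' masks
referred to above can be sketched in MATLAB as follows (our own minimal
example, not code from [18]; the input file name is an assumption):

L3 = [1 2 1]; E3 = [-1 0 1]; S3 = [-1 2 -1];   % level, edge, spot vectors
v = {L3, E3, S3};
I = im2double(imread('chest.png'));            % assumed grayscale chest image
texImgs = cell(3, 3);                          % one texture image per mask
for i = 1:3
    for j = 1:3
        mask = v{i}' * v{j};                   % outer product, e.g. L3'*E3
        texImgs{i, j} = conv2(I, mask, 'same');
    end
end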

We Used the following experimentally validated forms of preprocessing:
a) texture- and edge-preserving denoising using 2-D median filtering of the
query CXR in an 11-by-11 pixel context, followed by unsharp masking to sharpen
the textures with an increase of the contrast to level 2 (Matlab
implementation); b) normalization by proportionally resizing the image to a
height of 256 with bilinear interpolation using a weighted average of pixels in
the 2-by-2 neighborhood; c) unsharp masking of the query and the patterns to
extract texture features and perfect the nonrigid registration, with contrast
level 2 and the standard deviation of the Gaussian lowpass filter equal to 2.
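A minimal MATLAB sketch of these three preprocessing steps, with the parameter
values quoted above (the input file name is an assumption; the exact
implementation may differ in detail):

C = im2double(imread('query_cxr.png'));        % assumed query radiogram
% a) texture- and edge-preserving denoising plus sharpening
Cd = medfilt2(C, [11 11]);                     % 2-D median filter, 11-by-11 context
Cs = imsharpen(Cd, 'Amount', 2);               % unsharp masking, contrast level 2
% b) proportional resize to height 256, bilinear interpolation
Cn = imresize(Cs, [256 NaN], 'bilinear');
% c) sharpening variant applied to queries and patterns before registration
Cr = imsharpen(Cn, 'Amount', 2, 'Radius', 2);  % Gaussian lowpass sigma = 2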

3.2 Knowledge-Based Dictionary of Lung Shapes


State of the Art. Dictionaries as forms of knowledge representation have been
used in CBIR systems, among others [21]. Image understanding with knowledge-
based dictionaries allows semantic expansion of the query [22]. To use it effec-
tively, we referred to the concept of highly nonlinear approximation [23]. It con-
sists in the fact that, given a target object C to be reliably described, we deter-
mine both a dictionary D of representative approximants belonging to a Hilbert
space and an n-term approximation of C from D. The expansion of C in D con-
sists of determining the projection of C onto the dictionary items. These projec-
tions are a measure of mutual similarity, which in our case should have diagnostic
significance. After determining the projections, dictionary items (approximants)
most similar to C in the number depending on the adopted threshold of accept-
able similarity are used to approximate C. Understandably, similarity between
approximated and reference objects can be defined by controlling their most
salient features in the context of their target diagnostic interpretation [8,24].
Domain knowledge in our case includes specifics of lung shape in CXR, i.e. large
variations in shape and specific features of the lungs taking into account the
relationship between the left and right one, characteristics of typical symptoms
of lung diseases according to appropriate distribution of diseases reflecting their
prevalence, adequate distribution among various age groups, gender diversity
etc. also taking into account the impact of different acquisition conditions[9].

We Proposed the dictionary-based representation of domain knowledge
regarding the specificity of CXRs in relation to C-19 diagnosis. Dictionary
design was used for spatial texture prediction based on sparse feature
approximations
giving a reduced analysis and content matching space. The quality and content
limitations of real diagnostic datasets are particularly important in constructing

an effective dictionary-based model. It is essential that all included
dictionary radiograms (patterns) be labeled by reliable binary lung masks
constituting the GT reference for our research. These masks should be manually
delineated by expert radiologists, in accordance with their interpretative
requirements. Considering these criteria, we used two collections of established
reputation to construct the dictionary: the Japanese Society of Radiological
Technology (JSRT) dataset [25] and the Montgomery County chest X-ray set (MC)
[26]. The dataset compiled by the JSRT contains 247 CXRs of 2048 × 2048 size and
12-bit grayscale, including 100 malignant nodule cases, 54 benign nodule cases,
and 93 normals. Gold standard masks of the lung areas were specified for all
cases. The MC dataset consists of 80 normal radiograms and 58 cases with
tuberculosis symptoms. The radiograms are in 12-bit gray-scale, with a
resolution of either 4020 × 4892 or 4892 × 4020.
The primary problem we sought to solve was to determine the size and con-
tent of the dictionary so that it would be complete and minimally redundant for
the predefined application requirements (pneumonia including specificity of C-19
symptoms). We conducted a series of experiments selecting the initial size and
content of the dictionary by characterizing the level of variation in dictionary
patterns and allowing for various forms of adaptation. We adopted a dictionary
construction based on the JSRT set by setting its size depending on the adopted
query to pattern and pattern to pattern similarity measure (described below
in text). We pre-estimated the average similarity between the elements of the
dictionary set and the query class representatives. For example, the experimen-
tally selected effective size of the normalized dictionary when segmenting the
selected CXRs of the MC set was significantly lower (lower mutual similarity,
smaller number of patterns satisfying threshold criteria of significant similarity
in general) than for the JSRT.

3.3 Query Radiogram Approximation


State of the Art. Content-based image retrieval has often been used to analyze
medical images, including for the purpose of assisted C-19 diagnosis. However,
it has typically used a specific image to perform a query on a large database
to find the most diagnostically similar cases, including retrieval models based
on deep metric learning [24]. A small subset of training database images most
similar to the patient query image, selected using a CBIR-inspired approach,
was applied in [8]. Pattern analysis and the ability to filter CBIR results accord-
ing to dominant disease features by selecting high-level semantic features have
been used to dramatically improve physicians’ accuracy in diagnosing interstitial
lung diseases [27]. Because of possibly nonlinear nature of lung shape changes,
it is appropriate to register the lung images by using a deformable or nonrigid
method. The radiograms to be registered have geometric differences that cannot
be accounted for by global translation, rotation, and scaling [8,28]. Intensity-
based methods tend to misregister small structures in the lung areas relying on
image intensity. Therefore, local feature-based methods are preferred when local
structural information is crucial in identifying similar areas. The performance of
such registration is closely related to the accuracy of feature extraction based

on relevant descriptors and matching key content in radiograms. The precision
of this clue matching depends on the choice of textural features used to cor-
rectly approximate the relevant content of the query lung tissue. Registration is
achieved by minimizing a cost function that is a combination of the cost associ-
ated with the smoothness of the resulting transformation and the cost associated
with the similarity of the matched images.

We Proposed to implement the nonlinear approximation of a radiographic
query by dictionary-based pattern retrieval, followed by textural registration of
patterns most similar to the query to determine areas of diagnostically relevant
lung tissue. Integrated optimization criteria refer to a semantic similarity that
takes into account the specificity and geometric distribution of local lung tissue
differentiation, to distinguish it from the characteristics of other uninformative
parts of the radiogram outside the lung region, which is extremely important in
verifying the reliability of advanced interpretation models of segmented regions,
including in particular DL models [4,12].
Thus, we approximate CXR as a query of dictionary-based retrieval in the
following steps:

– establish a similarity measure using high-level semantic features important
in C-19 diagnosis; this measure can be tailored to the predefined characteris-
tics of the analyzed radiograms and to the requirements for their diagnostic
evaluation;
– adaptive forming the dictionary: initial selection (knowledge-based) of rep-
resentative, interpreted datasets and retrieval to fit the query characteris-
tics represented by the training set, which establishes the specificity of the
query and determines the set of patterns matching the characteristics of the
analyzed radiograms according to the critical similarity conditions (for the
selected similarity measure);
– query-based retrieval to adaptively select the best-fitted dictionary patterns
according to similarity restrictions;
– texture-based registration to find model, which specifies transformations of
the selected patterns to match the query.

The goal is to select dictionary patterns similar to the query lung tissue
distribution at the interpretive level, in the context of the supported
diagnosis. Experimentally, we found CW-SSIM (Complex Wavelet Structural
Similarity Index Measure) [29] to be the most useful 2D similarity metric. It is
invariant under affine transformations and can be used effectively to reliably
determine the level of semantic similarity in the domain of predefined
multiscale, local features of tissue texture. The scale-invariant feature
transform (SIFT), especially effective in image retrieval applications [30], was
selected and experimentally verified to determine the domain in which high-level
semantic features are calculated. However, we only used SIFT maps of the three
fixed, most differential directions (7, 20, 50) to extract coarse-grained
essential features of tissue profiling and reduce computational complexity.

Adaptive forming of the dictionary was tested using the aforementioned
reference datasets: JSRT and MC. When analyzing MC queries we used JSRT as the
initial dictionary. Taking randomly selected queries, we evaluated the dictionary
size necessary to obtain the best lung segmentation results. It turned out
that the selection of only 20 dictionary patterns was enough to effectively
represent MC queries. Next, adaptive retrieval of the n best-fitted dictionary
patterns (approximants) was carried out according to the following rule: a)
sorting the similarity scores s_i calculated for each dictionary pattern i; b)
determining the adaptive threshold T = max(s) − 2 · µ(d)/max(d), where s = s_i
for i = 1, . . . , N, d_i = s_i − s_{i+1} for i = 1, . . . , N − 1, N denotes
the dictionary size, and µ the median operator; c) T-based hard thresholding of
the similarity scores to select the best-fitted dictionary patterns.
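The selection rule above transcribes almost literally into MATLAB; in this
sketch, s is assumed to hold the similarity scores of the query against all N
dictionary patterns:

[sSorted, order] = sort(s, 'descend');         % a) sorted similarity scores
d = sSorted(1:end-1) - sSorted(2:end);         %    successive score differences
T = max(sSorted) - 2 * median(d) / max(d);     % b) adaptive threshold
best = order(sSorted > T);                     % c) hard thresholding: indices
                                               %    of the best-fitted patterns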
Next, the set of best-fitted dictionary items was used to learn a
patient-specific lung model. We used nonrigid registration to fit the target
lung segment by matching corresponding pixels of the query and the approximants.
Texture-based analysis was used to determine the model of such a transformation.
Deformable registration solves the alignment process in an energy minimization
framework, calculating the corresponding pixels between the approximant and the
patient X-ray, which provides the transformation mapping for each pixel. We
adapted the SIFT flow algorithm [31], which involves matching densely sampled,
pixel-wise SIFT features between the respective approximant and the query while
preserving spatial discontinuities. Firstly, the corresponding pixels between
the two images are calculated by matching the SIFT descriptors to provide the
transformation mapping for each pixel. A predefined objective function forces
the algorithm to match these points. Next, the designated model was used to
warp the most similar training masks to the patient CXR.

3.4 Lung Shape Refinement

A set of GT masks of the selected approximants was used to obtain the cor-
responding final lung segmentation. The determined parameters of the warping
model allow mapping all points of these masks to the target shape of the seg-
mented lungs. The proposed algorithm of lung shape refinement thus looks as
follows:

– warping the most similar masks of the selected approximants to the patient
CXR on the basis of transformations determined in the SIFT feature space;
– the transformation model can be optimized by using local texture features
specified by descriptors that primarily characterize important diagnostic fea-
tures of lung tissue;
– lung shape refinement using transformed lung masks summed together, taking
into account the similarity weights of the corresponding dictionary patterns
to the query radiogram, calculated during query approximation;
– adaptive thresholding determines the initial form of the binary mask of the
query, followed by final contour smoothing with cubic splines (a sketch of the
fusion and thresholding steps follows below).
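A minimal sketch of the fusion and thresholding steps, assuming warpedMasks
holds the masks already registered to the query and w the corresponding
similarity weights (our naming; the final spline smoothing is omitted):

acc = zeros(size(warpedMasks{1}));
for k = 1:numel(warpedMasks)
    acc = acc + w(k) * double(warpedMasks{k}); % similarity-weighted summation
end
acc = imgaussfilt(acc / sum(w), 1);            % normalization and slight blurring
B = imbinarize(acc, 'adaptive');               % adaptive thresholding
% contour smoothing with cubic splines is applied afterwards (not shown)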

In our experiments, matched radiograms were also modeled using speeded-up
robust features (SURF), determining the distribution of Haar wavelets within
the preset neighborhood of the predetermined interest points following an
orientation assignment step [32]. Moreover, maximally stable extremal regions
(MSER) [33] and histograms of oriented gradients (HOG) [34] proved sufficiently
effective to validate C-19 symptom recognition. Using these features, it is
possible to narrow the area of C-19 diagnosis by reducing the number of false
indications for models used to support C-19 diagnosis. The proposed methodology
offers this possibility. However, only preliminary considerations and very
brief results are included in this publication. This is subtly hinted at in the
schematic of the presented method in Fig. 1.

4 Experiments
The research discussed here and the results of the experiments carried out were
oriented toward the effective diagnosis of C-19 in lung radiograms. In our opin-
ion, the key stage confirming the reliability of the obtained results and, above
all, the usefulness of the conclusions formulated on their basis is the semantic
segmentation of the lung area. The use of domain knowledge to indicate pos-
sible areas of expression of C-19 symptoms, taking into account the specificity
of the imaging system used, is essential in the clinically effective interpretation
of the acquired information. The experiments were conducted primarily to verify
the presented concept in its fundamental aspect concerning non-redundant
segmentation. Therefore, it was not so much the average efficiency, but mainly
the minimum error obtained on a reliable test set that was the most important
evaluation criterion. The implementation used includes the previously
characterized algorithms, methods, relationships, and parameters, including
registration based on SIFT features as universal in radiogram retrieval. The
reported results serve only as a preliminary verification of the proposed
concept for the use of advanced C-19 detection methods. They were planned as a
demonstration of its feasibility and a preliminary verification of its
practical potential. It is therefore a proof-of-concept study, not a full,
credible verification of the usability of the proposed segmentation method,
with the criterion of possibly minimal errors with respect to reference masks.

4.1 Adopted Conditions and Obtained Results


Three selected databases were applied in the experiments. The JSRT and MC
sets described earlier were used as test datasets to evaluate
the effectiveness of the proposed segmentation method. In addition to these
high quality standardized chest radiograms, CXRs of critically ill patients dif-
fering in image quality, presence of artifacts, varying body habitus, and disease
manifestation were included in the experiments to generalize the evaluation of
the effectiveness of the segmentation methods. The third database used in our
experiments was based on VinDr-CXR [36]. It is an open dataset of chest X-rays
with various resolutions and 12–14 bit dynamics, annotated by experienced
radiologists. VinDr-CXR contains 18,000 scans, comprising radiograms of the
control group and of 14 types of thoracic abnormalities. We selected a balanced
test set containing 1366 normal cases and 1378 cases of opacities and
consolidations to evaluate the role of segmentation in the verification and
substantiation of C-19 recognition methods. We verified the effectiveness of
lung area segmentation against reference masks using the two most commonly
applied similarity metrics: the Dice and Jaccard coefficients [6,8]. These are
used to quantify the area of overlap of segmentation effects relative to
reference patterns. The segmentation results obtained with respect to selected
reference methods previously verified on the same sets of radiograms are shown
in Table 1.
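Both overlap metrics reduce to a few operations on the binary masks (standard
definitions; A denotes the algorithm's mask and G the reference mask):

inter   = nnz(A & G);                          % overlapping pixels
dice    = 2 * inter / (nnz(A) + nnz(G));       % Dice coefficient
jaccard = inter / nnz(A | G);                  % Jaccard coefficient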

Table 1. Evaluation of the segmentation performance of the proposed solution on
the JSRT and MC reference datasets using the Dice and Jaccard metrics. The
obtained results are related to the performance of the most efficient methods
known from the literature. The top performers in each category are highlighted
in bold font. Differences in the overall methodology of the compared solutions
are also indicated, as previously described

Approach          Methodology  Database  Dice: mean ± CI  Dice: min  Jaccard: mean ± CI  Jaccard: min
SEDUCM [37]       rule-based   JSRT      .975 ± .010      –          .952 ± .018         –
Pix Class [38]    rule-based   JSRT      –                –          .946 ± .018         –
ShapApp [39]      model-based  JSRT      .972 ± .010      –          .946 ± .019         –
Shape Thr [40]    model-based  JSRT      –                –          .940 ± .053         –
U-Net [6]         DL           MC        .969 ± .027      .844       –                   –
                               JSRT      .982 ± .001      .950       –                   –
InvertedNet [41]  DL           JSRT      .974             –          .950                –
DCNN [42]         DL           JSRT      .962 ± .008      –          –                   –
NWTrain [43]      DL           JSRT      .980 ± .008      –          .961 ± .015         –
Atlas [8]         hybrid       MC        .960 ± .018      –          .941 ± .0034        –
                               JSRT      .967 ± .008      –          .954 ± .015         –
TVAC [6]          hybrid       MC        .957 ± .0025     .857       –                   –
                               JSRT      .950 ± .030      .849       –                   –
Proposed          hybrid       MC        .966 ± .002      .914       .935 ± .004         .842
                               JSRT      .975 ± .001      .950       .952 ± .001         .905

In addition, we investigated the effectiveness of modeling the distribution of
diagnostically relevant features of lung tissue using the aforementioned local
semantic descriptors SIFT, SURF, MSER and HOG. A weighted compilation of these
feature vectors was used to identify two key symptoms important for C-19
diagnosis: opacities and consolidations (the third database), resulting in a
recognition accuracy of 0.9. In this case the extracted features were classified
with integrated models of SVM, discriminant analysis and decision trees.
Alternatively, the same feature vectors computed from whole radiograms were
applied to train a DL ResNet50 (30 epochs, 2 passes). The resulting accuracy of
binary classification of norms against cases of opacities and consolidations
was 0.97. However, training the same ResNet50 structure with radiograms limited
to only the segmented regions of the lung, characterized by the same
descriptors, reduced the performance of the DL model to an accuracy below 0.6.
This experiment confirmed the particularly important role of semantic
segmentation in increasing the reliability of learned DL models.

5 Discussion and Conclusions


The selection of the most similar approximants from the dictionary of reference
radiographs plays an important role in the effectiveness of segmentation. What
matters is their necessary number, which usually increases with greater
complexity and atypicality of the diagnostic content contained in the radiogram
(it needs to be represented by more dictionary components). However, if the
dictionary contains more representations dedicated to the detailed specificity
of one case, a larger number of them can blur the unambiguity of the description
of other query cases. A solution was tested that, depending on the case
specificity, adaptively selects the number of dictionary representations
approximating the lung shape in a given case, based on the achieved query
similarity level. However, the small improvements in segmentation achieved in
some cases did not compensate for the increased computational complexity. A
trade-off between the complexity and the efficiency of the approximation process
was chosen. This took into account both the limited reliability of subjectively
determined GT contours and the semantic segmentation goals limited to areas
diagnostically useful for C-19.
The obtained results confirm the high efficiency of the proposed segmentation
method with respect to the state-of-the-art methods, including DL models
(Table 1). Developing as complete a set of dictionary patterns as possible is
challenging due to the enormous variation of lung shape in real radiograms. An
overly numerous dictionary significantly increases the computational complexity.
It is difficult to find a compromise between the size of the dictionary at
subsequent stages of its analysis, the efficiency of its retrieval, and the
effectiveness of the final segmentation. The conducted experiments demonstrate
the great potential of the considered methodology. The extreme segmentation
errors that occur are smaller than the results known from the literature. This
indicates that the aim of the conducted research, i.e., the development of a
segmentation method that increases the reliability of radiogram analyses,
especially when applied to the recognition of C-19 symptoms, has been
successfully achieved. Also, an additional analysis of the effectiveness of lung
region segmentation performed on a set of highly variable radiograms containing
various artifacts (a selected part of the third test set) confirmed the
reliability of the proposed method (no significant segmentation errors) [35].
The advantage of the proposed methodology lies in its deliberate adaptation to
the requirements of the considered applications by efficient selection of
relevant tissue content. Preliminarily tested methods of C-19 recognition by
feature analysis of segmented lung areas allow us to exclude the mechanism of
efficient learning based on random correlations, which have no informative
meaning in the context of a computer-aided form of C-19 diagnosis. We focused on
a highly reliable approximation of the lung area for further analysis and
interpretation of radiograms, with a particular emphasis on computationally
modeled symptoms based on semantic descriptors allowing control of quality
criteria but, above all, of usability.
The resulting segmentation of the lung area allowed us to significantly increase
the efficiency of supporting the diagnosis of C-19 by analyzing CXRs in
preliminary experiments beyond the study area presented here, with the diagnosis
being determined by a comprehensive description of the local features of the
lung lesions. Dysfunctional pneumonia with more prominent foci often results in
extensive changes in subtle tissue features described by the distribution of
local features, even beyond the opacities or consolidations seen by
radiologists. Their extraction and quantifiable evaluation could enhance the
diagnosis made by radiologists. However, this hypothesis should be verified by
much more extensive experiments on C-19 detection and prognostic evaluation.

References
1. Mardian, Y., Kosasih, H., et al.: Review of current COVID-19 diagnostics and
opportunities for further development. Front. Med. 8, 615099 (2021)
2. Laskar, P., Yallapu, M.M., Chauhan, S.C.: “Tomorrow Never Dies”: Recent
advances in diagnosis, treatment, and prevention modalities against coronavirus
(COVID-19) amid controversies. Diseases 8, 30 (2020)
3. Yamac, M., Ahishali, M., et al.: Convolutional sparse support estimator-based
COVID-19 recognition from X-ray images. IEEE Trans. Neural Networks Learn.
Syst. 32(5), 1810–1820 (2021)
4. Lopez-Cabrera, J.D., Orozco-Morales, R., et al.: Current limitations to identify
COVID-19 using artificial intelligence with chest X-ray imaging. Health Technol.
11, 411–424 (2021)
5. Flor, N., et al.: Diagnostic performance of chest radiography in high COVID-
19 prevalence setting: experience from a European reference hospital. Emergency
Radiol. 28(5), 877–885 (2021). https://doi.org/10.1007/s10140-021-01946-x
6. Reamaroon, N., Sjoding, M.W., et al.: Robust segmentation of lung in chest x-ray:
applications in analysis of acute respiratory distress syndrome. BMC Med. Imaging
20, 116 (2020)
7. Liu, X., Li, K.-W., et al.: Review of deep learning based automatic segmentation
for lung cancer radiotherapy. Front. Oncol. 11, 717039 (2021)
8. Candemir, S., Jaeger, S., Palaniappan, K., et al.: Lung segmentation in chest radio-
graphs using anatomical atlases with nonrigid registration. IEEE Tran. Med. Imag-
ing 33(2), 577–590 (2014)
9. Candemir, S., Antani, S.: A review on lung boundary detection in chest X-rays.
Int. J. Comput. Assist. Radiol. Surg. 14(4), 563–576 (2019). https://doi.org/10.
1007/s11548-019-01917-1
10. Teixeira, L.O., Pereira, R.M., Bertolini, D., et al.: Impact of lung segmentation
on the diagnosis and explanation of COVID-19 in chest X-ray images. Sensors 21,
7116 (2021)

11. Calli, E., Sogancioglu, E., et al.: Deep learning for chest X-ray analysis: a survey.
Med. Image Anal. 72, 102125 (2021)
12. Maguolo, G., Nanni, L.: A critic evaluation of methods for COVID-19 automatic
detection from X-ray images. Inf. Fusion 76, 1–7 (2021)
13. Yu, Y., Hu, P., Lin, J., Krishnaswamy, P.: Multimodal multitask deep learning
for X-ray image retrieval. In: de Bruijne, M., Cattin, P.C., Cotin, S., Padoy, N.,
Speidel, S., Zheng, Y., Essert, C. (eds.) MICCAI 2021. LNCS, vol. 12905, pp.
603–613. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87240-3 58
14. Stengers, I.: Thinking with Whitehead: A Free and Wild Creation of Concepts.
Harvard University Press, Cambridge (2014)
15. Wang, H., Wang, Q., et al.: Multi-scale location-aware kernel representation for
object detection. In: Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition, pp. 1248–57 (2018)
16. Thompson, J.R.: Empirical Model Building: Data, Models, and Reality. Wiley,
Hoboken (2011)
17. Hassanien, A.E., Mahdy, L.N., et al.: Automatic xray covid-19 lung image clas-
sification system based on multi-level thresholding and support vector machine.
medRxiv (2020)
18. Mohammed, S.N., Alkinani, F.S., Hassan, Y.A.: Automatic computer aided diag-
nostic for COVID-19 based on chest X-ray image and particle swarm intelligence.
Int. J. Intell. Eng. Syst. 13, 5 (2020)
19. Philipsen, R.H.H.M., Maduskar, P., et al.: Localized energy-based normalization
of medical images: application to chest radiography. IEEE Trans. Med. Imaging
34(9), 1965–1975 (2015)
20. Chen, S., Cai, Y.: Enhancement of chest radiograph in emergency intensive care
unit by means of reverse anisotropic diffusion-based unsharp masking model. Diag-
nostics 9, 45 (2019)
21. Khodaskar, A., Ladhake, S.: Semantic image analysis for intelligent image retrieval.
Procedia Comput. Sci. 48, 192–197 (2015)
22. Chenggang, L.L., Yan, C., et al.: Distributed image understanding with semantic
dictionary and semantic expansion. Neurocomputing 174(A), 384–392 (2016)
23. DeVore, R.A.: Nonlinear approximation. Acta Numerica 7, 51–150 (1998)
24. Zhong, A., Li, X., et al.: Deep metric learning-based image retrieval system for
chest radiograph and its clinical applications in COVID-19. Med. Image Anal. 70,
101993 (2021)
25. Shiraishi, J., Katsuragawa, S., Ikezoe, J., et al.: Development of a digital image
database for chest radiographs with and without a lung nodule: receiver Operating
Characteristic analysis of radiologists’ detection of pulmonary nodules. AJR 174,
71–74 (2000)
26. Jeager, S., Candemir, S., et al.: Two public chest X-ray datasets for computer-
aided screening of pulmonary diseases. Quant. Imaging Med. Surg. 4(6), 475–477
(2014)
27. Pogarell, T., Bayer, N., et al.: Evaluation of a novel content-based image retrieval
system for the differentiation of interstitial lung diseases in CT examinations. Diag-
nostics 11, 2114 (2021)
28. Nonrigid registration of lung CT images based on tissue features. Comput. Math.
Meth. Med. 834192, 1–7 (2013)
29. Sampat, M.P., Wang, Z., et al.: Complex wavelet structural similarity: a new image
similarity index. IEEE Trans. Image Process. 18(11), 2385–2401 (2009)

30. Nabizadeh-Shahre-Babak, Z., Karimi, N., et al.: Detection of COVID-19 in X-ray
images by classification of bag of visual words using neural networks. Biomed.
Signal Process. Control 68, 102750 (2021)
31. Liu, C., Yuen, J., Torralba, A.: SIFT flow: dense correspondence across different
scenes and its applications. IEEE Trans. Pattern Anal. Mach. Intell. 33(5), 978–994
(2011)
32. Ashour, A.S., Eissa, M.M., et al.: Ensemble-based bag of features for automated
classification of normal and COVID-19 CXR images. Biomed. Signal Process. Con-
trol 68, 102656 (2021)
33. Moitra, D., Mandal, R.K.: Automated AJCC (7th edition) staging of non-small
cell lung cancer (NSCLC) using deep convolutional neural network (CNN) and
recurrent neural network (RNN). Health Inf Sci Syst. 7(1), 14 (2019)
34. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In:
IEEE Computer Society Conference on Computer Vision and Pattern Recognition,
vol. 1, 886–893 (2005)
35. Moskal, A., Jasionowska-Skop, M., Ostrek, G., Przelaskowski, A.: Artifact detec-
tion on X-ray images of lung with COVID-19 symptoms. Submitted to ITIB 2022
36. Nguyen, H.Q., Lam, K., et al.: VinDr-CXR: an open dataset of chest X-rays with
radiologist’s annotations. arXiv:2012.15029
37. Yang, W., Liu, Y., et al.: Lung field segmentation in chest radiographs from
boundary maps by a structured edge detector. IEEE J. Biomed. Health Inform.
22(3), 842–851 (2018)
38. Ginneken, B., Stegmann, M., Loog, M.: Segmentation of anatomical structures
in chest radiographs using supervised methods: a comparative study on a public
database. Med. Image Anal. 10(1), 19–40 (2006)
39. Shao, Y., Gao, Y., et al.: Hierarchical lung field segmentation with joint shape and
appearance sparse learning. IEEE Trans. Med. Imaging 33(9), 1761–1780 (2014)
40. Dawoud, A.: Lung segmentation in chest radiographs by fusing shape information
in iterative thresholding. IET Comput. Vis. 5(3), 185–190 (2011)
41. Novikov, A., Major, D., et al.: Fully convolutional architectures for multi-class
segmentation in chest radiographs. IEEE Trans. Med. Imaging 37(8), 1865–1876
(2018)
42. Kalinovsky, A., Kovalev, V.: Lung image segmentation using deep learning methods
and convolutional neural networks. Pattern recognition and information processing.
Publishing Center of BSU, Minsk (2016)
43. Hwang, S., Park, S.: Accurate lung segmentation via network-wise training of
convolutional networks. In: Cardoso, M.J., et al. (eds.) DLMIA/ML-CDS -2017.
LNCS, vol. 10553, pp. 92–99. Springer, Cham (2017). https://doi.org/10.1007/978-
3-319-67558-9 11
Verification of Selected Segmentation
Methods in Relation to the Structures
of the Knee Joint

Weronika Żak and Piotr Zarychta(B)

Faculty of Biomedical Engineering, Silesian University of Technology,
ul. Roosevelta 40, 41-800 Zabrze, Poland
werozak059@student.polsl.pl, piotr.zarychta@polsl.pl

Abstract. The main aim of this research is to present a verification of
selected segmentation methods in relation to the structures of the knee
joint. The paper presents known medical image segmentation methods,
which have been used for extraction of hard (femur, patella, tibia) and
soft (anterior and posterior cruciate ligaments) structures of the knee
joint. These methods have been implemented in MATLAB and tested
on clinical MRI slices of the knee joint in sagittal plane.

Keywords: Knee joint · Cruciate ligaments · Femur · Patella · Tibia ·
RG · FCM · FC

1 Introduction
There are various joints of different sizes and structures in the human body.
From a medical point of view, the knee joint (Fig. 1) is one of the largest and
most complex joints of the human body. In this joint, hard (femur, patella and
tibia) and soft (e.g. ligaments, muscles) structures can be distinguished. The
joint consists of the distal end of the femur, which abuts and slides on the
proximal surface of the tibia. The knee joint is completed by the patella,
ligaments and muscles. The patella slides on the front surface of the distal end
of the femur. Large ligaments connect the femur and the tibia to provide
stability to the joint. The long thigh muscles ensure the strength of the knee
[2,7,8].
This complex joint of the human body plays a very important role in standing,
walking and running. Proper patella segmentation [2,11] plays a very important
role in the case of patellar chondromalacia, and correct segmentation of the
femoral and tibial heads [9] is equally important in the case of knee joint
arthroplasty (knee replacement). Therefore, in the above-mentioned cases, it is
very helpful for the specialist doctor (orthopedist) if these bone structures
can be extracted from MRI or CT slices of the knee joint and their
three-dimensional presentation obtained in the next step. This allows the
specialist doctor to accurately diagnose the pathological bone structures of
the knee joint.
The soft structures are also very important in the knee joint. The cruciate
ligaments (ACL – anterior cruciate ligament and PCL – posterior cruciate
ligament) together with the collateral ligaments are responsible for knee
stability and ensure proper arthrokinematics and contact forces [2]. The ACL
and PCL (Fig. 1) belong to the group of anatomical structures frequently
susceptible to injuries (especially in athletes) [5]. Proper segmentation of
the injured cruciate ligaments and their three-dimensional representation
allows the orthopedist to diagnose accurately.

Fig. 1. MRI slices of the knee joint in sagittal plane: a) original slice of the T1-weighted
series and b) internal schema

In the literature, many different methods have been dedicated to the
segmentation of human body structures. Usually, these are methods that require
user interaction; sometimes they are fully automatic methods (very desirable in
the medical field) [3,8]. In practical solutions, the methods mentioned below
are usually used [1,11]:

– thresholding – is based on pixel intensity, is not time consuming and can be
applied globally or locally (unfortunately, when applied to the knee joint
structures it gives only approximate results);
– region growing – includes region-based segmentation methods, which are
applied locally on the image and are used, in the case of the knee joint,
mainly to extract bone structures (tibia or femur);
– clustering – is widely used, especially for segmentation applied to MRI of the
human body in each plane;
– deformable models – are semi-automated and require user interaction; however,
they are used in clinical applications (e.g. snakes or active contours);
– atlas – these methods require a lot of work from the expert, but applied to
bone structures of the knee joint they give results at the level of 85–88% of
the Dice index.

2 Methodology
2.1 Region Growing
The method comprises selecting one or more seed points in the ROI (Region
Of Interest). Then, neighboring pixels are added to the growing region if they
meet the inclusion criterion, which can be an intensity level difference,
texture, etc. Each currently analyzed pixel is compared with its neighborhood,
and if a neighbor meets the inclusion criterion, it is added to the structure.
Most methods based on region growing are semi-automatic due to the need to
select a seed point. The exact steps of the algorithm are described in [4,13].
According to [12], the region growing method has some difficulties in
segmenting cartilage due to its thickness, so it is more widely used in
segmenting bony structures.
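As an illustration only, a minimal intensity-based variant can be sketched in
MATLAB as below; the inclusion criterion here is an assumed fixed tolerance
around the seed intensity, whereas the cited implementations may also use
texture criteria:

function mask = region_grow(I, seed, tol)      % seed = [row col], tol = tolerance
    I = double(I);
    mask = false(size(I));
    mask(seed(1), seed(2)) = true;
    ref = I(seed(1), seed(2));                 % criterion: |I - ref| <= tol
    stack = seed(:)';                          % pixels awaiting expansion
    offs = [0 1; 0 -1; 1 0; -1 0];             % 4-connected neighbourhood
    while ~isempty(stack)
        p = stack(end, :); stack(end, :) = [];
        for k = 1:4
            q = p + offs(k, :);
            if any(q < 1) || q(1) > size(I, 1) || q(2) > size(I, 2), continue; end
            if ~mask(q(1), q(2)) && abs(I(q(1), q(2)) - ref) <= tol
                mask(q(1), q(2)) = true;       % pixel joins the region
                stack(end+1, :) = q;           %#ok<AGROW>
            end
        end
    end
end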

2.2 FCM
In this study, in order to ensure an automated method of segmentation of
selected knee structures, atlas-based segmentation has been implemented. To
achieve this purpose, the algorithm starts with automated image matching,
followed by normalization of the clinical images. After these steps, a dataset
to which all scans in the series are allocated is determined. On this basis, the
average feature vector for the teaching group is delineated, which automates and
streamlines both fuzzy segmentation methods (fuzzy c-means and fuzzy
connectedness). These averaged features (the centroid and the surface area of
the segmented structure) are then passed to the fuzzy segmentation methods
implemented for the testing group, correspondingly, for each given scan. The
centroids then become seed (starting) points for the fuzzy methods, and the
surface area protects against oversegmentation.
According to the literature [9,10], the standard Fuzzy C-Means (FCM) algorithm
is very popular and widely used in practical solutions. This algorithm has many
advantages; however, it does not incorporate spatial context information, which
makes it sensitive to noise and image artefacts. In order to reduce this
disadvantage, in this study the FCM objective function has been modified by
adding a second term. This second term formulates a spatial constraint based on
the median estimator. In image processing approaches, an implementation of
median filtering replaces each data sample by the median of its spatial
neighbourhood: MedF(x_n, Z) = median(S), where S = neighbourhood(x_n, Z) and Z
determines the size of the mask. Then, the final formula for the
median-modified FCM can be expressed as
J(U, V) = \sum_{i=1}^{c} \sum_{n=1}^{N} u_{in}^{m} \left( \| x_n - v_i \|^{2} + \alpha \, \| MedF(x_n, Z) - v_i \|^{2} \right) \qquad (1)

where u_{in} denotes the membership function, v_i denotes the prototype for a
given fuzzification level m (where 1 ≤ m ≤ ∞), x_n = {x_1, . . . , x_k}, and
x_n, v_i ∈ F^k, where F^k is a feature space.

In this paper the fuzzy c-means algorithm with median modification is not
described in detail. An exhaustive description of the FCM algorithm with median
modification can be found in [7,9].
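For orientation, one iteration of the update rules minimizing (1) can be
sketched in MATLAB for scalar pixel intensities; this is our own vectorized
sketch consistent with the objective above, and the full method of [7,9] may
differ in detail (implicit expansion, R2016b+, is assumed):

% x:  1-by-N pixel intensities; xm: 1-by-N median-filtered values MedF(x,Z),
%     e.g. xm = reshape(medfilt2(I, [Z Z]), 1, []) for an image I;
% v:  1-by-c prototypes; m > 1: fuzzification level; alpha: spatial weight.
D = (x - v').^2 + alpha * (xm - v').^2;   % c-by-N combined distances, Eq. (1)
D = max(D, eps);                          % guard against division by zero
W = D .^ (-1 / (m - 1));
U = W ./ sum(W, 1);                       % membership update: columns sum to 1
Um = U .^ m;
v = ((Um * (x + alpha * xm)') ./ ((1 + alpha) * sum(Um, 2)))';  % prototypes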

2.3 Fuzzy Connectedness

On the basis of the literature [6], the fuzzy connectedness (FC) method is based
on the fuzzy affinity relation, and the generalized definition of FC introduces
an iterative method that permits the fuzzy connectedness to be determined in
relation to a marked image point (seed or starting point). The starting points
have been marked on the basis of the atlas-based segmentation (centroids).
In this paper the fuzzy connectedness algorithm is not described in detail. An
exhaustive description of this algorithm can be found in [7,9].
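To convey the idea, a generic propagation of connectedness strength from a
single seed can be sketched as follows (our simplified, intensity-based
affinity and a 4-neighbourhood are assumptions; the affinity actually used in
[7,9] is richer):

% conn(p) = strength of the best path from the seed, where path strength is
% the weakest affinity along the path.
function conn = fuzzy_connectedness(I, seed, sigma)
    I = double(I);
    [H, W] = size(I);
    conn = zeros(H, W);
    conn(seed(1), seed(2)) = 1;              % seed point (atlas centroid)
    queue = seed(:)';                        % simple FIFO of pixels to expand
    offs = [0 1; 0 -1; 1 0; -1 0];           % 4-connected neighbourhood
    while ~isempty(queue)
        p = queue(1, :); queue(1, :) = [];
        for k = 1:4
            q = p + offs(k, :);
            if any(q < 1) || q(1) > H || q(2) > W, continue; end
            % intensity-based affinity: close intensities -> affinity near 1
            aff = exp(-(I(p(1), p(2)) - I(q(1), q(2)))^2 / (2 * sigma^2));
            cand = min(conn(p(1), p(2)), aff);
            if cand > conn(q(1), q(2))
                conn(q(1), q(2)) = cand;     % better path found: update, expand
                queue(end+1, :) = q;         %#ok<AGROW>
            end
        end
    end
end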

3 Discussion and Results

3.1 Materials

The methodology has been tested on 15 clinical T1-weighted MRI studies of the
knee joint. The entire data set had a total of 303 slices in sagittal plane. The
MRI data have been acquired for females and males of different ages.

3.2 Experiments and Evaluation

In order to increase the efficiency of computational procedures the fuzzy meth-


ods have been limited to 3D ROI (Fig. 2) for the extraction the both cruciate
ligament (ACL and PCL). A detailed description of the 3D ROI determination
procedure has been presented in [10].
Figure 3 presents selected steps in the segmentation (based on the FC method) of the bone structures of the knee joint: the tibia, patella and femur, respectively. Figure 4 presents selected steps in the segmentation (based on the FC method) of the healthy and pathological cruciate ligament structures of the knee joint: the anterior and posterior cruciate ligament, respectively.
Each method was able to segment bony and soft structures in all images. Segmentation of bony structures and of soft structures such as ligaments requires different initial conditions, and each of the methods produced results of varying quality. The segmentation results, expressed as Dice coefficient values, are shown in the tables: the values of the Dice index obtained for the analyzed MRI series are collected in Table 1 (Region growing), Table 2 (Fuzzy connectedness) and Table 3 (Fuzzy C-Means). All methods were compared on the same input data and evaluated against delineations by the same expert.
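For clarity, a minimal sketch of the Dice index computation used in the evaluation is given below; the values reported in Tables 1–3 correspond to this quantity expressed as a percentage.

import numpy as np

def dice_index(segmentation, reference):
    """Dice coefficient between a binary segmentation and an expert mask."""
    seg = segmentation.astype(bool)
    ref = reference.astype(bool)
    denom = seg.sum() + ref.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return 2.0 * np.logical_and(seg, ref).sum() / denom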
Fig. 2. T1-weighted MRI series – signal from the clinical hospital database, with the 3D ROI including the ACL and PCL

Fig. 3. Segmentation (FC method) of the bone structures of the knee joint: (a) tibia, (b) patella and (c) femur

Fig. 4. Segmentation of the healthy and pathological anterior and posterior cruciate ligament structures in the MRI: (a) original image with starting point, (b) result after the fuzzy connectedness method, (c) result superimposed onto the original image

The discrepancy in the obtained values of the Dice index for the bone structures (tibia, patella, femur) is shown in the box-and-whisker plot (Fig. 5). Fig. 5 does not include a box-and-whisker plot for the RG method, because the dispersion of its minimum, maximum and median values is too large in relation to the other methods.

When the region growing method was applied to the bony structures, the results were at a good level. This method did not cope with the PCL and ACL ligament structures. From the data in Table 1, it can be seen that the region growing method performed best on the femur, with the Dice coefficient values being the highest here. Figure 6 shows the segmentation result for the best femur segmentation using the region growing method. Green and magenta colors have been used for the differences between the superimposed structures and white for the overlapping regions [13]. As far as the ACL and PCL structures are concerned, the region growing method did not cope with the segmentation – in addition to these structures, other soft structures were also segmented, which meant that the segmentation was not effective and the results were not satisfactory. For these structures the Dice values reached only about 15%.

Table 1. Values of Dice index for the analyzed MRI series for the Region growing
(RG) method

Case no.  tibia  patella  femur  ACL  PCL
1 78.44 58.04 87.97 – –
2 83.70 71.49 90.00 – –
3 78.90 67.00 91.60 – –
4 76.38 65.22 88.74 – –
5 77.59 67.22 85.38 – –
6 85.26 75.70 92.20 – –
7 89.60 72.84 89.12 – –
8 81.23 56.89 90.51 – –
9 80.77 60.69 91.70 – –
10 81.11 71.63 92.18 – –
11 89.49 47.47 87.83 – –
12 85.59 71.60 87.78 – –
13 80.55 61.63 92.78 – –
14 84.98 56.73 89.05 – –
15 81.99 56.76 90.64 – –
Average 82.37 64.06 89.83 – –

Table 2. Values of Dice index for the analyzed MRI series for the Fuzzy connectedness
(FC) method

Case no.  tibia  patella  femur  ACL  PCL
1 88.67 90.17 92.09 90.45 91.44
2 86.17 88.09 89.91 89.03 89.12
3 89.45 90.36 92.89 92.21 92.67
4 84.12 86.32 87.86 85.75 86.01
5 89.78 90.34 91.42 90.67 91.33
6 85.34 89.37 88.58 88.05 89.67
7 86.67 87.21 89.45 89.21 88.75
8 89.89 90.33 91.18 90.67 91.11
9 90.75 89.10 92.83 91.31 90.89
10 86.13 86.45 88.19 85.34 87.33
11 86.34 87.32 88.34 86.45 87.89
12 87.39 89.31 88.78 89.10 89.23
13 84.20 89.33 85.38 85.23 86.11
14 86.56 86.46 89.33 86.01 88.30
15 84.21 86.78 86.67 86.19 86.02
Average 88.47 87.67 89.53 88.38 89.06

Compared to the Dice coefficient values obtained for the bony structures, this does not allow one to conclude that the segmentation brought the expected results. Based on these results, it was concluded that the region growing method is not suitable for segmentation of the ACL and PCL ligaments.
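A minimal sketch of building the green/magenta/white difference overlay used in Fig. 6 is given below; which mask receives green and which magenta is our assumption, as this is not stated explicitly in the text.

import numpy as np

def difference_overlay(mask_a, mask_b):
    """Color overlay of two binary masks in the style of Fig. 6."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    rgb = np.zeros(a.shape + (3,), dtype=np.uint8)
    rgb[a & b] = (255, 255, 255)   # overlapping pixels: white
    rgb[a & ~b] = (0, 255, 0)      # only in mask_a: green (assumed)
    rgb[~a & b] = (255, 0, 255)    # only in mask_b: magenta (assumed)
    return rgb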

Table 3. Values of Dice index for the analyzed MRI series for the Fuzzy C-Means
(FCM) method

Case no.  tibia  patella  femur  ACL  PCL
1 87.79 89.67 90.92 89.56 90.22
2 86.11 88.01 88.71 88.77 89.03
3 86.67 89.78 92.16 92.02 92.79
4 83.88 86.10 86.67 85.25 85.67
5 87.23 90.11 91.01 90.23 91.30
6 85.11 88.12 87.17 87.75 89.75
7 85.45 86.89 88.79 88.67 87.98
8 88.17 89.17 90.24 90.33 90.67
9 89.23 88.78 91.67 91.10 90.13
10 85.49 85.34 86.89 85.21 88.67
11 85.24 86.33 88.16 86.59 88.10
12 85.98 87.89 87.15 86.92 90.33
13 83.67 88.10 85.67 84.67 89.45
14 85.43 86.03 88.36 85.89 88.34
15 84.12 84.67 85.67 85.90 87.43
Average 85.97 87.67 88.53 87.92 89.32

Fig. 5. Discrepancy in the obtained values of the Dice index for the following methods:
a) RG, b) FC, c) FCM, d) FC, e) FCM, f) RG, g) FC and h) FCM

The best results for the bone structures were obtained for the Fuzzy C-Means method (Table 3), and they are additionally reproducible for each imaging series. The obtained Dice coefficient values also depend on the quality of the images in a given series, which is not always the same.

Fig. 6. Segmentation result for the best femur segmentation using the region growing
method a) original slice and b) differences between superimposed structures (green and
magenta colors have been used for differences between superimposed structures and
white for overlapping images) (Color figure online)

4 Conclusions
The region growing method could perform better if it used a dynamic inclusion criterion rather than, as in our tests, a fixed intensity difference between pixels. As mentioned by the authors of [12], there is a problem with segmenting cartilage using the region growing method; a problem has also been encountered with the segmentation of ligaments. This could also be addressed by using additional image preprocessing to remove noise or improve the quality of the image.
In the case of the cruciate ligaments, the best way to obtain a correct segmentation is to reduce the analysis area to a region of interest containing the desired anatomical structures. Therefore, in this study, in order to increase the efficiency of the computational procedures, the fuzzy methods (FC and FCM) have been limited to a 3D ROI for the extraction of both the anterior and posterior cruciate ligaments.
The obtained results of the segmentation methods implemented in this article have confirmed the theses in the literature [1,3,9]. The RG method can be used for segmentation of the tibia (Dice index above 82%) and femur (Dice index above 89%). However, this method is not suited to the segmentation of the patella or the cruciate ligaments (Dice index of about 65% or lower). The fuzzy methods (FC and FCM) can be used for segmentation of the bone structures and cruciate ligaments in the knee joint (Dice index above 85%). Properly selected preprocessing methods, in combination with atlas-based segmentation and the fuzzy methods, even allow a Dice index above 90% to be obtained. Therefore, it can be concluded that the fuzzy methods perform well for the segmentation of knee joint structures.

References
1. Aprovitola, A., Gallo, L.: Knee bone segmentation from MRI: a classification and
literature review. Biocybern. Biomed. Eng. 36(2), 437–449 (2016)
2. Bochenek, A., Reicher, M.: The Human Anatomy. PZWL, Warsaw (1990)
3. Kubicek, J., Penhaker, M., Augustynek, M., et al.: Segmentation of knee cartilage:
a comprehensive review. J. Med. Imaging Health Inform. 8(3), 401–418 (2018)
4. Öztürk, C.N., Albayrak, S.: Automatic segmentation of cartilage in high-field Mag-
netic Resonance Images of the knee joint with an improved voxel-classification-
driven region-growing algorithm using vicinity-correlated subsampling. Comput.
Biol. Med. 72, 90–107 (2016). https://doi.org/10.1016/j.compbiomed.2016.03.011
5. Pasierbinski, A., Jarzabek, A.: Biomechanics of the cruciate ligaments. Acta
Clinica 4(1), 284–293 (2001)
6. Udupa, J., Samarasekera, S.: Fuzzy connectedness and object definition: theory,
algorithms, and applications in image segmentation. Graph Models Image Process.
58, 246–261 (1996)
7. Zarychta, P.: Automatic registration of the medical images T1- and T2-weighted
MR knee images. In: Napieralski, A. (ed.) Proceedings of the International Confer-
ence Mixed Design of Integrated Circuits and Systems MIXDES2006, pp. 741–745
(2006)
8. Zarychta, P.: Cruciate ligaments of the knee joint in the computer analysis.
In: Pietka, E., Kawa, J., Wieclawek, W. (eds.) Information Technologies in
Biomedicine, Advances in Intelligent and Soft Computing, vol. 283, pp. 71–80
(2014)
9. Zarychta, P.: A new approach to knee joint arthroplasty. Comput. Med. Imaging
Graph. 65, 32–45 (2018)
10. Zarychta, P.: Posterior Cruciate Ligament – 3D Visualization. In: Kurzynski, M.,
Puchala, E., Wozniak, M., Zolnierek, A. (eds.) Computer Recognition Systems 2.
Advances in Soft Computing, vol. 45, pp. 695–702. Springer, Heidelberg (2007).
https://doi.org/10.1007/978-3-540-75175-5 87
11. Zarychta, P.: Patella – atlas based segmentation. In: Pietka, E., Badura, P., Kawa,
J., Wieclawek, W. (eds.) Information Technologies in Medicine, Advances in Intel-
ligent Systems and Computing, vol. 1011, pp. 314–322 (2019)
12. Zhang, B., Zhang, Y., Cheng, H., Xian, M., Cheng, O., Huang, K.: Computer-
aided knee joint magnetic resonance image segmentation – a survey. ArXiv
abs/1802.04894 (2018)
13. Żak, W.: Segmentation and three-dimensional visualization of chondromalacia
lesions of the femoral head. In: Recent Advances in Computational Oncology and
Personalized Medicine, vol. 1 (2021)
Rigid and Elastic Registrations Benchmark on Re-stained Histologic Human Ileum Images

Pawel Cyprys1, Natalia Wyleżol2, Adrianna Jagodzińska2, Julia Uzdowska2, Bartlomiej Pyciński2, and Arkadiusz Gertych2,3,4(B)

1 Department of Cardiac Anaesthesiology and Intensive Care, Faculty of Medical Sciences in Zabrze, Medical University of Silesia in Katowice, ul. M. Curie-Sklodowskiej 9, 41-800 Zabrze, Poland
p.cyprys@sccs.pl
2 Faculty of Biomedical Engineering, Silesian University of Technology, ul. Roosevelta 40, 41-800 Zabrze, Poland
{adrijag943,juliuzd745}@student.polsl.pl, bartlomiej.pycinski@polsl.pl
3 Department of Surgery, Cedars-Sinai Medical Center, Los Angeles, USA
arkadiusz.gertych@cshs.org
4 Department of Pathology and Laboratory Medicine, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA

Abstract. Registration of images from re-stained tissue sections is an
initial step in generating ground truth image data for machine learn-
ing applications in histopathology. In this paper, we focused on evaluat-
ing existing feature-based and intensity-based registration methods using
regions of interest (ROIs) extracted from whole slide images (n = 25) of
human ileum that was first stained with hematoxylin and eosin (H&E)
and then re-stained with immunohistochemistry (IHC). Elastic and mov-
ing least squares deformation models with rigid, affine and similarity
feature matching were compared with intensity-based methods utilizing
an optimizer to find rigid and affine transformation parameters. Cor-
responding color H&E and IHC ROIs were registered through gray-
level luminance and deconvoluted hematoxylin images. Our goal was
to identify methods that can yield a high number of correctly regis-
tered ROIs and low median (MTRE) and average (ATRE) target reg-
istration errors. Based on the benchmark landmarks (n = 5020) placed
across the ROIs, the elastic deformation model with rigid matching and
the intensity-based rigid registrations on color-deconvoluted hematoxylin
channels yielded the highest (86%, 100%) rates of correctly registered
ROIs. For these two methods, the MTRE was 2.00 and 2.12 pixels
(0.982 µm, 1.04 µm), and the ATRE was 3.14 and 4.0 pixels (1.54 µm, 1.964 µm), respectively. Although the intensity-based rigid registration was the
slowest of all methods tested, it may be more practical in use due to the
highest rate of correctly registered ROIs and the second-best MTRE.
The WSIs and ROIs with landmarks that we prepared can be valuable
in benchmarking other image registration approaches.


Keywords: Multimodal image registration · Computational pathology · Tissue imaging

1 Introduction

Histopathology refers to the examination of tissue to provide diagnostic information about the underlying pathological processes. This examination is facilitated
by staining of the tissue with hematoxylin & eosin (H&E) and immunohisto-
chemistry (IHC). While the H&E staining is used to visualize tissue morphol-
ogy, the IHC, with different antibody selection detects targets of interest such
as proteins, receptors, cellular compartments, different cell lineages, etc. When
digitized on high-resolution slide scanners, the glass slides enable a thorough
quantitative tissue analysis of whole slides [29]. In addition, whole slide images
(WSI) outputted by the scanners are an invaluable resource of data for training
of advanced machine learning (ML) techniques [1,4,15] with the goal to alleviate
pathologist workload in identifying and quantitating histologic targets.
Tissue segmentation [5,26], histologic structure identification [10], or cell
detection tasks [13] in digital H&E slides are such areas where advanced ML tech-
niques can improve workflow of target detection in digital slides. However, devel-
opment of ML techniques for this purpose requires significant involvement of an
expert who would need to manually prepare ground truth data (i.e. delineate tar-
get of interest) for ML training and testing. Some studies showed that equiva-
lent ground truth can be obtained from the IHC-stained slides by image process-
ing, thereby significantly reducing the involvement of experts [4,15]. In this case,
obtaining the ground truth includes destaining of the H&E-stained slide which is
then re-stained with IHC that visualizes the histologic target. In subsequent steps,
regions from the digitized H&E and IHC stained slides are superimposed to gener-
ate data for ML development. However, since re-staining requires removal of cover-
slip and subsequent washing and bleaching of the H&E-stained tissue [14], it often
results in tissue displacement, rotation, local squeezing, stretching and sometimes
physical damage (tearing apart) leading to the lack of good correspondence (align-
ment) between H&E and IHC images after superimposing. Hence, the inherent lack
of good alignment between raw IHC and H&E images after slide re-staining makes
them poorly suited for generating data for ML development.
Image registration – a procedure of finding a transformation that will align
one image to its reference counterpart is a way to address these issues. It is
an iterative procedure where initial transformation of images to be registered
is followed by the computation of cost function and cost function gradient to
output new values of image transformation parameters for the next iteration.
The entire process ends when the minimum of the cost function is reached or
the predefined number of iterations is exceeded. The choice of the transformation
method and the cost function are critical to successful registration.
Most of the existing image registration techniques were developed to register
serial images such as radiology scans [6,25], but those for registering images of
serial tissue sections are also available. The latter group includes techniques with

rigid or affine body transformation. The cost functions are based on histology
feature matching (such as blood vessels or small tissue voids) [7], mutual infor-
mation of pixel intensities [23], and many others [2,3,18,28]. More sophisticated
methods include tissue mask boundary alignment [19], background segmentation
with B-spline registration [12], step-wise registrations of image patches with ker-
nel density estimation, and hierarchical resolution regression [16]. However, stud-
ies demonstrating applicability of the existing registration methods to images of
re-stained tissue sections are lacking.
Since the accuracy of H&E and IHC image registration is strongly related to
the quality of data generated for ML training, it is important to identify the most accurate methods. The goal is to achieve excellent correspondence between the
H&E and IHC-stained tissue images so that positions of the same cells in both
images are ideally matched [16]. To achieve this level of accuracy on images
from re-stained slides, the slides need to be scanned at a high resolution, unlike scans from serial sections, which can be digitized at the same or a lower resolution [20].
measured by the number of successfully registered regions and the registration
error expressed as the distance between landmark points before and after regis-
tration placed at random cells. Methods that yield a large number of registered
regions (without failure) and smallest possible registration errors would be most
suitable for inclusion in pipelines that generate data for ML development.
In this paper, we focus on investigating accuracy of several existing image-
registration methods using high-resolution WSIs of human ileum stained with
H&E and then re-stained with IHC with an antibody visualizing neuronal cells.
To carry out the analyses, regions extracted from the WSIs were annotated
with landmark points. Rigid [11], affine [30], elastic B-splines [2] and moving
least squares [24] transformations that utilize image intensity and feature-based
approaches [17,27] were tested. Our goal was to identify those techniques that
can yield a high number of accurately registered image regions isolated from the
re-stained H&E and IHC slides.

2 Materials
For the purpose of this study, we used 25 sections (1 section per case, 4 µm stan-
dard thickness) of human ileum. Each section was first stained with H&E, imaged
with Aperio AT Turbo whole slide scanner with 20x magnification objective
(Fig. 1(a)), then destained and re-stained with IHC, and imaged on the same
scanner again. Pixel size in the obtained WSIs was 0.491 µm × 0.491 µm. IHC
involved antibodies reactive to S100 proteins that are normally present in neuronal cell lineages including Schwann, glial and neural cells. The purpose was to distinguish the myenteric plexus, which is a network of nerves between the layers of the muscularis propria in the ileum. DAB (brown chromogen) was used
to visualize the positive staining (Fig. 1(b)). Slide re-staining and imaging was
performed at the Cedars-Sinai Biobank.
Using the digital slide viewer (Aperio ImageScope ver.12.4.3), we anno-
tated corresponding regions of interest (ROIs) in H&E and IHC WSIs by first

Fig. 1. Example WSIs of a re-stained tissue section. Manually annotated ROIs are
marked in green. The ROIs vary by size, location and tissue histology. H&E slide (a),
IHC slide (b). The tissue on the IHC slide is damaged. Example ROIs with visibly
damaged tissue are in the WSIs’ center

annotating ROIs in the IHC WSIs and then transferring the ROI annotations
to the H&E WSIs. We annotated as many ROIs with myenteric plexus (positive
staining) per WSI as possible. Other regions (sparse or no staining) were chosen
too without giving any preference to the location and included other parts of
ileal histology (crypts, stroma, inflammation, muscle layer, and fat) with differ-
ent proportions in each ROI. This process yielded n = 593 pairs (23.7 ± 14.2 per
WSI) of corresponding H&E and IHC ROIs. The ROI size varied from 617 ×
676 to 4450 × 2288 pixels, but ROIs measuring about 1k × 1.1k pixels were
most common (60.4% of all ROIs) (Fig. 2(a)).
The H&E and IHC ROIs were subsequently annotated for landmarks. The landmarks were manually generated in 3DSlicer [8], which proved very convenient for visualizing superimposed H&E and IHC ROIs [31]. By adjusting opacity, we could see individual cells in both ROIs, place landmarks precisely near or at the chosen corresponding cells, and save their coordinates to a file. Eight to ten corresponding landmark pairs were placed in each H&E and IHC ROI (5020 in total, 8.45 on average per ROI) (Fig. 2(b)). To assure good landmark correspondence, the landmarks were placed on borders of cell nuclei wherever possible.

Fig. 2. ROI and landmark statistics in 25 pairs of H&E and IHC WSIs used in this
study. (a) In the ROI area histogram, about 60% of the ROIs had an area around 1024 × 1024
pixels (N = 593). (b) Landmark distances between corresponding H&E and IHC ROIs
before registration (N = 5020). The median distance between landmarks before regis-
tration in our set was 28 pixels (equivalent to 13.78 µm). About 12% of these landmarks
were 200 pixels away or more

3 Methods
Registration methods tested in this study originate from two families: feature-
based and intensity-based. The feature-based methods such as the B-spline elas-
tic [3] (ELT) and moving least squares [34] (MLS) utilize the scale invariant
feature transform (SIFT) [21] which finds grid points in two images subjected to
registration. Since finding the points is carried out separately for each image, the
points may not correspond in terms of number and location in the two images.
However, corresponding grid points can be found through a matching transform
such as rigid (RG), affine (AF) or similarity (SM). The choice of the matching
transform is part of the feature extraction process and determined by the user.
The registration begins once parameters of the matching transform are found. In
the last step, the images undergo deformation through B-splines or the moving
least squares method applied to image pixels between the grid points found after
the matching.
Methods from the intensity-based family apply the matching transform (i.e.
RG or AF) directly to the images without finding the grid points. Since grid
points are not available, the transformation parameters are initially unknown.
However, they can be found iteratively i.e. by maximizing mutual information
between pixel intensities (IN) in images subjected to registration through an
evolutionary optimizer [36].
The registrations were run independently on two gray-level representations of
the H&E and IHC ROIs: (a) luminance channel (LU), and (b) hematoxylin stain-
ing channel (HE). Pixel intensities in the LU channel were calculated as follows:
LU = 0.2989R + 0.5870G + 0.1140B, where R, G and B are the red, green and blue color pixel intensities in the original H&E and IHC ROIs, whereas pixel
intensities of the HE channel were found through the color deconvolution algo-
rithm [32]. The gray-level image representation can be considered as a variable in
the registration method, leading to the following notation: MethodFeature (image
representation), where the gray-level image representation is either LU or HE.
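The two gray-level representations can be produced along the following lines. The sketch uses scikit-image's rgb2hed, which implements the color deconvolution of [32] with the library's default stain vectors, so the HE channel may differ slightly from the authors' implementation.

import numpy as np
from skimage.color import rgb2hed

def gray_level_representations(rgb):
    """Return the LU and HE representations of an RGB ROI in [0, 1]."""
    # luminance channel, as defined in the text
    lu = 0.2989 * rgb[..., 0] + 0.5870 * rgb[..., 1] + 0.1140 * rgb[..., 2]
    # hematoxylin channel via color deconvolution [32]
    he = rgb2hed(rgb)[..., 0]
    return lu, he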
Registration methods that we studied have previously been implemented
in Matlab and Fiji scientific computing environments. Specifically, the feature-
based methods: ELTRG , ELTAF , ELTSM , MLSRG , MLSAF , MLSSM are avail-
able in FIJI (ImageJ ver. 1.53c) [35] as “Register Virtual Stack Slices” pack-
age with the B-spline elastic being called through “bUnwarpJ ” function. The
intensity-based RGIN and AFIN image registrations can be called in Matlab
(ver.R2020b) through the “imregtform” function. Each of the tested registration
methods required method-specific hyperparameters to be set. However, we used
default values except the “modality” hyperparameter in the optimizer settings
used by RGIN and AFIN which we set to “multimodal ”.
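For readers without Matlab or Fiji, a comparable intensity-based rigid registration can be sketched in Python with SimpleITK. This mirrors the spirit of RGIN (mutual-information cost, evolutionary optimizer [36]) but is not the authors' exact configuration, and the hyperparameter values below are placeholders.

import SimpleITK as sitk

def rigid_mi_registration(fixed_arr, moving_arr):
    """Rigid registration of two gray-level ROIs, analogous to RGIN."""
    fixed = sitk.GetImageFromArray(fixed_arr.astype("float32"))
    moving = sitk.GetImageFromArray(moving_arr.astype("float32"))
    reg = sitk.ImageRegistrationMethod()
    # mutual information of pixel intensities as the cost function
    reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
    # evolutionary optimizer, in the spirit of [36]
    reg.SetOptimizerAsOnePlusOneEvolutionary(numberOfIterations=300)
    reg.SetInterpolator(sitk.sitkLinear)
    reg.SetInitialTransform(sitk.CenteredTransformInitializer(
        fixed, moving, sitk.Euler2DTransform(),
        sitk.CenteredTransformInitializerFilter.GEOMETRY))
    return reg.Execute(fixed, moving)  # rigid transform: moving -> fixed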
Accuracy of registration methods applicable to digital pathology is usually
assessed by the median target registration error (MTRE) expressed as the median
distance between corresponding landmarks in all ROIs after registration [9].
Besides MTRE, the average (mean) target registration error (ATRE) is often com-
puted. Since ATRE and MTRE may not fully reflect the performance of a method
in yielding highly accurate registrations for a series of ROIs, we introduced the
percent of correct registrations (PCC) and the percent of successful registrations (PSC) to emphasize advantages and shortcomings that ATRE or MTRE do not
capture. In our study, the registration was considered correct if coordinates of all
landmarks after registration were within the ROI frame. Likewise, the registra-
tion was considered successful if the distance between all corresponding landmarks
within the ROI frame decreased. We also recorded the registration time to assess
computational complexity of each method.
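These per-ROI criteria can be summarized as in the sketch below, which assumes landmarks given as (row, column) pixel coordinates; the function and argument names are our own.

import numpy as np

def roi_scores(lm_ref, lm_after, lm_before, roi_shape):
    """TRE statistics and the correct/successful flags for one ROI.

    lm_ref    : (K, 2) landmarks in the reference ROI
    lm_after  : (K, 2) corresponding landmarks after registration
    lm_before : (K, 2) corresponding landmarks before registration
    """
    tre = np.linalg.norm(lm_ref - lm_after, axis=1)
    d0 = np.linalg.norm(lm_ref - lm_before, axis=1)
    inside = np.all((lm_after >= 0) & (lm_after < np.array(roi_shape)), axis=1)
    return {
        "MTRE": float(np.median(tre)),
        "ATRE": float(np.mean(tre)),
        "correct": bool(inside.all()),         # all landmarks stayed in frame
        "successful": bool((tre < d0).all()),  # all landmark distances decreased
    }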

4 Results
593 ROIs and 5020 landmarks were extracted from 25 pairs of WSIs obtained
from re-stained tissue sections to test six selected feature-based and two intensity-
based image registration techniques. Example registrations are presented in Fig. 3.
PCCs, PSCs and registration times that we assessed first are shown in Table 1.
Distributions of TRE in successfully registered ROIs (Table 1, column with PSC)
are shown in Fig. 4(a). ATRE and MTRE for PSC ROIs are shown in Table 2. A
heatmap of successfully registered ROIs as a function of the percent landmarks
that are within a successfully registered ROI was also plotted (Fig. 5).

Table 1. Registration performance of ROIs in our dataset. PCC and PSC are the
respective percentages of correctly and successfully registered ROIs

Method  PCC [%]  PSC [%]  Time mean [s]  Time std. dev. [s]
ELTRG (LU) 89.54 61.89 47.48 9.52
ELTAF (LU) 90.22 62.90 41.78 9.63
ELTSM (LU) 83.92 57.50 41.87 9.69
MLSRG (LU) 81.95 65.94 35.61 7.14
MLSAF (LU) 81.28 64.59 35.57 6.99
MLSSM (LU) 65.77 51.94 35.49 7.15
RGIN (LU) 100 80.27 228.00 197.33
AFIN (LU) 100 78.76 228.08 191.44
ELTRG (HE) 86.85 58.52 40.03 12.24
ELTAF (HE) 79.60 56.32 40.54 12.52
ELTSM (HE) 69.48 52.11 39.76 16.88
MLSRG (HE) 80.78 61.89 33.43 10.54
MLSAF (HE) 75.38 59.19 33.43 9.85
MLSSM (HE) 64.59 50.25 21.89 9.49
RGIN (HE) 99.83 79.60 227.19 192.26
AFIN (HE) 100 80.61 225.55 193.08

All intensity-based methods had the highest rates of PCC (99–100%) and
their PSC was around 80% (Table 1). The feature-based registrations were less

Fig. 3. Registrations of corresponding H&E and IHC ROIs by two different methods.
(a) shows corresponding H&E (left) and IHC (right) ROIs extracted from a WSI and
annotated for landmarks (green and blue dots) before registration. A pair of corre-
sponding landmarks is marked by the green arrow (landmark placed in the H&E ROI)
and the blue arrow (landmark placed in the IHC ROI). Distances between paired land-
marks after registrations were used to measure the rates of PSC, PCC, and ATRE
and MTRE. (b) shows the registration by the RGIN (HE) and (c) by the MLSRG (HE)
method. Yellow arrows point at paired-landmarks after registration. ROIs are well
aligned if the blue and green dots overlap (solid arrows). Dashed arrows point at poorly
aligned landmark pairs. Note, the tissue damage in the IHC ROI

Fig. 4. Distribution of distances between landmarks before and after registration in ROIs registered by different methods. (a) shows distances for all landmarks and all
ROIs are included. (b) shows distances in ROIs which were successfully and correctly
registered

successful and accurate, that is, the PCC rate for this family was in the range
of 64–90% and the PSC oscillated between 50.2% and 65.94%. Although the
intensity-based methods yielded more accurate regions, the time needed to com-
plete registrations was approximately 5–10 times longer when compared to that
needed by the feature based-methods. The PSC and PCC rates for the LU
image representation were generally higher by 1–5% than the corresponding rates
obtained by the same registration method for the HE image representation.
Regardless of the success rate and accuracy of registration, the MTRE ranged from 2.0 to 2.5 pix for all the methods (Fig. 4(a)). However, in ROIs that were successfully registered, the ATRE was higher and ranged between 2.7 and 4.7 pix (Fig. 4(b)), suggesting that both PSC and MTRE are essential in assessing the success rate and accuracy of registration.

5 Discussion
Accurate registration of high-resolution tissue images from re-stained tissues
remains a challenge [20,33]. In this study, we tested two intensity-based and six
feature-based registration methods that were applied to WSIs from re-stained
histologic specimens of human ileum. Ileal sections are fragile [37] and
therefore susceptible to damage when exposed to mechanical stress or tissue

Table 2. Descriptive statistics of TREs in successfully registered ROIs

Method  PSC [%]  ATRE [pix]  std [pix]  min [pix]  MTRE [pix]  max [pix]
ELTRG (LU) 61.89 3.83 8.86 0.00 2.00 131.11
ELTAF (LU) 62.90 3.54 8.11 0.00 2.00 131.11
ELTSM (LU) 57.50 4.30 12.08 0.00 2.00 131.11
MLSRG (LU) 65.94 3.33 4.51 0.00 2.24 84.65
MLSAF (LU) 64.59 3.30 4.52 0.00 2.24 84.65
MLSSM (LU) 51.94 3.18 3.94 0.00 2.24 64.03
RGIN (LU) 80.27 3.58 8.96 0.45 2.12 212.09
AFIN (LU) 78.76 4.68 13.63 0.50 2.55 237.78
ELTRG (HE) 58.52 3.14 6.42 0.00 2.00 82.97
ELTAF (HE) 56.32 2.74 4.82 0.00 2.00 82.28
ELTSM (HE) 52.11 4.59 20.24 0.00 2.00 364.97
MLSRG (HE) 61.89 3.63 5.88 0.00 2.24 129.32
MLSAF (HE) 59.19 3.32 4.02 0.00 2.24 57.49
MLSSM (HE) 50.25 3.29 4.99 0.00 2.24 83.38
RGIN (HE) 79.60 4.00 11.32 0.50 2.12 212.21
AFIN (HE) 80.61 4.07 9.60 0.50 2.55 197.73
Before registr. 100 37.58 40.51 0.00 25.08 402.36

Fig. 5. Heatmaps of ROI registration performance as a function of the percent landmarks that remained within the ROI frame after registration. We used the percent
landmarks (X-axis) as indicator of registration success. The number of ROIs that failed
to register is expressed in the leftmost column. The number of ROIs which had all land-
marks within the ROI frame is expressed in the rightmost column. The numbers in the
rightmost column correspond to the PSC rate in Table 1

processing such as destaining, removal of the coverslip, etc. Although the re-staining was carried out with care, tearing, displacement and local separation in sub-
mucosa and muscularis propria were observed in nearly every IHC stained slide
(Fig. 1). The goal of this testing was to identify methods that can yield a large
number of accurately registered ROIs that we can later use to develop a ML-
based tool for segmentation of neuronal cells (myenteric plexus) in H&E WSIs.
Unlike other works which focused on registration of WSIs [16,20], we focused on
registration of ROIs because WSIs of re-stained tissue sections constitute a reli-
able source of ground truth data for ML development, particularly at the ROI
level [4,15]. The eight methods that we chose were evaluated using landmark
distance and success rates of ROI registration.
Results from our tests indicated that the intensity-based methods such as
the RGIN (*) and AFIN (*) outperformed the feature-based methods in terms
of the number of successfully and correctly registered ROIs; the PSC and PCC
rates were approximately 20% and 10% higher than those yielded by the feature-
based methods such as ELTRG (*) and MLSRG (*). Interestingly, all ELT methods
yielded the lowest MTRE (2 pix, equivalent to 0.982 µm). The MTRE by RGIN (*)
methods was slightly higher (2.12–2.55 pix). On the other hand, the MLSSM (*)
ELTSM (*) yielded the lowest PSCs and PCCs indicating that the similarity
matching is a sub-optimal choice as the feature extraction scheme and leading
to the overall smallest number of successfully and correctly registered regions.
Another interesting observation was that the ELTAF (HE) and ELTRG (HE)
had the lowest ATRE (2.74 pix and 3.14 pix, equivalent to 1.34 µm and 1.54 µm).
These error rates were lower than those by ELTAF (LU) and ELTRG (LU) meth-
ods suggesting that the elastic models [2] through color-deconvoluted hema-
toxylin channel (HE) may lead to more accurate registrations. Since the use of
hematoxylin channel did not seem to clearly improve the registration accuracy
of the intensity-based and MLS methods (as measured by ATRE), additional
data sets should be used to validate this observation.
High rates of accurately registered H&E and IHC ROIs achieved by the
intensity-based registrations suggest that these techniques can be key in gener-
ating ground truth image data useful in developing ML models that can learn
histologic patterns from H&E images through IHC staining. In our case, this will
be neuronal cells in human epithelium. The ROIs that we successfully registered
through this study can be processed to obtain neuronal cell masks and then
integrated with the manual ground truth masks of ileal epithelium, goblets and
stroma that we prepared and reported before [22].

6 Conclusions
The intensity-based and elastic registration methods were most accurate in reg-
istration of ROIs from re-stained H&E and IHC tissue sections of the human
ileum. The intensity-based rigid registration may be more practical for generat-
ing ground truth data for developing ML models segmenting neuronal cells due
to the highest rate of correctly registered ROIs and overall low median and aver-
age target registration errors. Further studies involving re-stained tissues from

additional repositories should validate our observations. The whole slide images
and ROIs with landmarks that we prepared can be valuable in benchmarking
other image registration approaches in computational pathology.

Acknowledgement. This project was in part supported by the grant from the Helm-
sley Charitable Trust and the grants from the Silesian University of Technology no.
BK-231/RIB1/2022 and 31/010/SDU20/0006-10 (Excellence Initiative – Research Uni-
versity). The authors would also like to thank the Cedars-Sinai Biobank for preparation
and digitization of slides.

References
1. Anand, D., et al.: Deep learning to estimate human epidermal growth factor recep-
tor 2 status from hematoxylin and eosin-stained breast tissue images. J. Pathol.
Inform. 11, 19 (2020). https://doi.org/10.4103/jpi.jpi 10 20
2. Arganda-Carreras, I., Sorzano, C.O.S., Marabini, R., Carazo, J.M., Ortiz-de-
Solorzano, C., Kybic, J.: Consistent and elastic registration of histological sections
using vector-spline regularization. In: Beichel, R.R., Sonka, M. (eds.) CVAMIA
2006. LNCS, vol. 4241, pp. 85–95. Springer, Heidelberg (2006). https://doi.org/10.
1007/11889762 8
3. Borovec, J., et al.: ANHIR: automatic non-rigid histological image registration chal-
lenge. IEEE Trans. Med. Imaging PP, 1–1 (2020). https://doi.org/10.1109/TMI.
2020.2986331
4. Bulten, W., et al.: Epithelium segmentation using deep learning in h&e-stained
prostate specimens with immunohistochemistry as reference standard. Scientific
Reports 9, 864 (2019). https://doi.org/10.1038/s41598-018-37257-4
5. Bándi, P., Balkenhol, M., Ginneken, B., Laak, J., Litjens, G.: Resolution-agnostic
tissue segmentation in whole-slide histopathology images with convolutional neural
networks. PeerJ 7, e8242 (2019). https://doi.org/10.7717/peerj.8242
6. Chen, C.T.: Radiologic image registration: old skills and new tools. Acad. Radiol.
10, 239–41 (2003)
7. Cooper, L., Sertel, O., Kong, J., Lozanski, G., Huang, K., Gurcan, M.: Feature-
based registration of histopathology images with different stains: an application for
computerized follicular lymphoma prognosis. Comput. Methods Programs Biomed.
96, 182–92 (2009). https://doi.org/10.1016/j.cmpb.2009.04.012
8. Fedorov, A., et al.: 3D slicer as an image computing platform for the quantitative
imaging network. Magnetic Resonance Imaging 30, 1323–41 (2012). https://doi.
org/10.1016/j.mri.2012.05.001. https://www.slicer.org
9. Fitzpatrick, J., West, J.: The distribution of target registration error in rigid-
body point-based registration. IEEE Trans. Med. Imaging 20(9), 917–927 (2001).
https://doi.org/10.1109/42.952729
10. Gallego, J., Swiderska, Z., Markiewicz, T., Yamashita, M., Gabaldon, M., Ger-
tych, A.: A U-Net based framework to quantify glomerulosclerosis in digitized PAS and H&E stained human tissues. Comput. Med. Imaging Graph. 89, 101865 (2021). https://doi.org/10.1016/j.compmedimag.2021.101865
11. Ghahremani, M., et al.: Rigid Registration, pp. 1087–1099. Springer, Cham (2021).
https://doi.org/10.1007/978-3-030-63416-2 184

12. Gonzalez, D., Frafjord, A., Øynebråten, I., Corthay, A., Olivo-Marin, J.C., Meas-
Yedid, V.: Multi-staining registration of large histology images. In: 2017 IEEE 14th
International Symposium on Biomedical Imaging, pp. 345–348 (2017). https://doi.
org/10.1109/ISBI.2017.7950534
13. Hatipoglu, N., Bilgin, G.: Cell segmentation in histopathological images with deep
learning algorithms by utilizing spatial relationships. Med. Biol. Eng. Comput.
55(10), 1829–1848 (2017). https://doi.org/10.1007/s11517-017-1630-1
14. Hinton, J., et al.: A method to reuse archived h&e stained histology slides for a
multiplex protein biomarker analysis. Methods Prot. 2, 86 (2019). https://doi.org/
10.3390/mps2040086
15. Ing, N., et al.: A novel machine learning approach reveals latent vascular pheno-
types predictive of renal cancer outcome. Sci. Rep. 7 (2017). https://doi.org/10.
1038/s41598-017-13196-4
16. Jiang, J., Larson, N., Prodduturi, N., Flotte, T., Hart, S.: Robust hierarchical
density estimation and regression for re-stained histological whole slide image
co-registration. PLoS ONE 14, e0220074 (2019). https://doi.org/10.1371/journal.
pone.0220074
17. Johnson, H., Christensen, G.: Consistent landmark and intensity-based image reg-
istration. IEEE Trans. Med. Imaging 21, 450–61 (2002). https://doi.org/10.1109/
TMI.2002.1009381
18. Kuska, J.P., et al.: Image registration of differently stained histological sections. In:
2006 International Conference on Image Processing, pp. 333–336 (2006). https://
doi.org/10.1109/ICIP.2006.313161
19. Kybic, J., Dolejšı́, M., Borovec, J.: Fast registration of segmented images by normal
sampling. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition
Workshops (2015). https://doi.org/10.1109/CVPRW.2015.7301311
20. Lotz, J., Weiss, N., van der Laak, J., Heldmann, S.: High-resolution image
registration of consecutive and re-stained sections in histopathology (2021).
ArXiv:2106.13150
21. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J.
Comput. Vision 60(2), 91–110 (2004). https://doi.org/10.1023/b:visi.0000029664.
99615.94
22. Ma, Z., et al.: Semantic segmentation of colon glands in inflammatory bowel disease
biopsies. In: Pietka, E., Badura, P., Kawa, J., Wieclawek, W. (eds.) ITIB 2018.
AISC, vol. 762, pp. 379–392. Springer, Cham (2019). https://doi.org/10.1007/978-
3-319-91211-0 34
23. Maes, F., Vandermeulen, D., Suetens, P.: Medical image registration using mutual
information. Proc. IEEE 91(10), 1699–1722 (2003). https://doi.org/10.1109/jproc.
2003.817864
24. Menon, H., Narayanankutty, K.A.: Applicability of non-rigid medical image regis-
tration using moving least squares. Int. J. Comput. Appl. 1, 85–92 (2010). https://
doi.org/10.5120/138-256
25. Mäkelä, T., et al.: A review of cardiac image registration methods. IEEE Trans.
Med. Imaging 21, 1011–21 (2002). https://doi.org/10.1109/TMI.2002.804441
26. Nirschl, J., et al.: Chapter 8 - Deep Learning Tissue Segmentation in Cardiac
Histopathology Images, pp. 179–195. Academic Press (2017). https://doi.org/10.
1016/B978-0-12-810408-8.00011-0
27. Oliveira, F.P., Tavares, J.M.R.: Medical image registration: a review. Comput.
Methods Biomech. Biomed. Engin. 17(2), 73–93 (2014). https://doi.org/10.1080/
10255842.2012.670855

28. Ourselin, S., Roche, A., Subsol, G., Pennec, X., Ayache, N.: Reconstructing a 3D
structure from serial histological sections. Image Vis. Comput. 19, 25–31 (2001).
https://doi.org/10.1016/S0262-8856(00)00052-4
29. Pantanowitz, L., et al.: Review of the current state of whole slide imaging in pathol-
ogy. J. Pathol. Inform. 2, 36 (2011). https://doi.org/10.4103/2153-3539.83746
30. Pitiot, A., Bardinet, E., Thompson, P., Malandain, G.: Piecewise affine registra-
tion of biological images for volume reconstruction. Med. Image Anal. 10, 465–83
(2006). https://doi.org/10.1016/j.media.2005.03.008
31. Pyciński, B., Yagi, Y., Walts, A.E., Gertych, A.: 3-D tissue image reconstruction
from digitized serial histologic sections to visualize small tumor nests in lung adeno-
carcinomas. In: Pietka, E., Badura, P., Kawa, J., Wieclawek, W. (eds.) Information
Technology in Biomedicine. AISC, vol. 1186, pp. 55–70. Springer, Cham (2021).
https://doi.org/10.1007/978-3-030-49666-1 5
32. Ruifrok, A.C., Johnston, D.A.: Quantification of histochemical staining by color
deconvolution. Anal. Quant. Cytol. Histol. 23(4), 291–299 (2001)
33. Ruusuvuori, P., et al.: Spatial analysis of histology in 3D: quantification and visu-
alization of organ and tumor level tissue environment. Heliyon p. e08762 (2022).
https://doi.org/10.1016/j.heliyon.2022.e08762
34. Schaefer, S., McPhail, T., Warren, J.: Image deformation using moving least
squares. In: ACM SIGGRAPH 2006 Papers on - SIGGRAPH 2006. ACM Press
(2006). https://doi.org/10.1145/1179352.1141920
35. Schindelin, J., et al.: Fiji: an open-source platform for biological-image analysis.
Nat. Methods 9(7), 676–682 (2012). https://doi.org/10.1038/nmeth.2019
36. Styner, M., Brechbuhler, C., Szckely, G., Gerig, G.: Parametric estimate of intensity
inhomogeneities applied to MRI. IEEE Trans. Med. Imaging 19(3), 153–165 (2000).
https://doi.org/10.1109/42.845174
37. Williams, J.M., Duckworth, C.A., Vowell, K., Burkitt, M.D., Pritchard, D.M.:
Intestinal preparation techniques for histological analysis in the mouse. Curr. Prot.
Mouse Biol. 6(2), 148–168 (2016). https://doi.org/10.1002/cpmo.2
DVT: Application of Deep Visual Transformer in Cervical Cell Image Classification

Wanli Liu1, Chen Li1(B), Hongzan Sun2, Weiming Hu1, Haoyuan Chen1, and Marcin Grzegorzek3

1 Microscopic Image and Medical Image Analysis Group, College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China
lichen@bmie.neu.edu.cn
2 Shengjing Hospital, China Medical University, Shenyang, China
sunhz@sj-hospital.org
3 Institute of Medical Informatics, University of Luebeck, Luebeck, Germany
marcin.grzegorzek@uni-luebeck.de

Abstract. Cervical cancer is a very common cancer among women. Cytopathologists use a microscope to examine cell slides collected from the
patient’s cervix to determine if it is cancerous. However, manually check-
ing the slides to diagnose cancer is a very difficult task, not only time-
consuming but also error-prone. A computer-aided diagnosis system can
automatically and accurately screen cervical cell images. In this study,
we propose a framework called DVT to perform classification tasks. We
use SIPaKMeD and CRIC datasets together. On 11-class classification
tasks, DVT achieves an accuracy of 87.35%. In the comparative experi-
ment, DVT performs better than the CNN and VT models alone.

Keywords: Visual transformer · Cervical cancer · Cell image · Image classification

1 Introduction
Cervical cancer is a very common cancer among women. Weak immunity, smok-
ing, use of contraceptives, and unsanitary menstruation may all lead to cervical
cancer. Patients with cervical cancer have symptoms such as abnormal vaginal
bleeding and leucorrhea [1]. However, cervical cancer can be prevented through
early examination and treatment [2].
The cytopathological examination is an effective means of diagnosing can-
cer. Cytopathologists use a microscope to examine cell slides collected from the
patient’s cervix to determine if it is cancerous. However, manually checking the
slides to diagnose cancer is a very difficult task, not only time-consuming but
also error-prone.
A computer-aided diagnosis system (CAD) can automatically and accurately
screen cervical cell images. Early CAD used traditional methods. In recent years,

deep learning methods have become popular in computer vision and other fields.
Recently, however, the Transformer, originally used in the field of Natural Language Processing (NLP), has appeared in the field of computer vision; we collectively refer to Transformer models applied in computer vision as Visual Transformers (VT). The first VT model to appear was the Vision Transformer (ViT). ViT outperforms Convolutional Neural Networks (CNN) when pre-trained on large amounts of data [3].
In this study, we propose a VT framework called Deep Visual Transformer
(DVT) to perform classification tasks. We use SIPaKMeD and CRIC datasets
together [4,5]. DVT achieves an accuracy of 87.35%. The workflow of DVT can
be seen in Fig. 1.
In Fig. 1, cell images in (a) are the training data, from the SIPaKMeD and
CRIC datasets together, including 11 categories. (b) is the preprocessing stage
of the data. The data first goes through the augmentation phase and then is
normalized. (c) is the training model stage, the preprocessed data is input into
model for training. In (d), test images of cells are input into the trained model
for testing. In (e), precision, recall, F1-Score, and accuracy are calculated to
evaluate the performance of DVT.

Fig. 1. Workflow of the DVT framework

The main contributions of this study are as follows: (1) This study combines
two datasets together, SIPaKMeD (five categories) and CRIC (six categories),
to perform an 11-class classification task. (2) We use DVT framework to classify
cervical cells, which is an applied innovation in the field of cell classification. (3)
DVT achieves a good result of 87.35%.

2 Related Work
2.1 Applications of Deep Learning in Cervical Cell Classification

Deep learning methods are widely used for cervical cell classification. The char-
acteristics of some of the deep learning methods used in this study are described
below.
VGG network is a classic network in CNN, which improves the performance
of the model by increasing the depth of the network. This network uses a small

3×3 convolution kernel, and the number of parameters is greatly reduced [6].
InceptionV3 improves performance by increasing the width of the network. It
uses multi-channel convolution to extract more information from the image [7].
ViT applies the Transformer in the NLP field to the computer vision field for the
first time, and its core components are the attention mechanism and Multilayer
Perceptron (MLP). It works better than CNN in the pre-trained case [3]. DeiT
is an upgraded version of ViT. It adds distillation tokens to ViT’s model. But
the tiny version of DeiT does not introduce distillation tokens, only improves
parameters [8]. T2T-ViT is an upgraded version of ViT, which proposes a new
T2T mechanism. It is better than ViT with fewer parameters [9].
Below, we outline several representative works to show how deep learning can be applied to cervical cell classification. For more detailed information, please refer to our review paper [10].
In [11], a transfer learning framework that integrates four CNN models is
proposed to process the Herlev dataset. In [12], this study uses CNN for feature
extraction and machine learning models for classification. In [13], this method
uses pre-trained CNN to extract features and uses feature fusion to process the
SIPaKMeD dataset. A method based on graph convolutional network is proposed
to classify cervical cells [14]. A comparative experiment using 22 deep learning
models to classify cervical cell images is performed in [15].
It can be seen that most of the literature uses the CNN model for processing.
No studies have used VT to classify cervical cell images. Therefore, this study
is of great significance for practical applications.

2.2 Applications of Deep Learning in Other Image Analysis Tasks


Deep learning methods are also widely used in other image analysis tasks. The
applications of deep learning are introduced below in the fields of COVID-19
images, histopathological images, and microscopic images of microorganisms.
In the field of COVID-19 electron microscope image analysis, a SARS-CoV-2
microscopic image dataset is constructed and novel deep learning features are
used to describe the visual information of the images [16]. In [17], a comparative
experiment using 15 pre-trained CNN models to detect COVID-19 samples is
performed. In [18], this study uses pre-trained CNN models to extract features
from chest X-ray images and classify them using classical classifiers such as
support vector machines.
In the field of histopathology image analysis, deep learning methods are often
used to classify and segment images [19,20]. A framework for classifying colorec-
tal histopathology images based on attention mechanism and interactive learning
is proposed in [21]. In [22], a publicly available database of gastric histopathol-
ogy subscale images is proposed. In [23], an ensemble learning method based
on pre-trained CNN models is proposed to classify breast cancer histopathology
images.
In the field of microscopic image analysis of microorganisms, deep learning
also has many applications, such as microorganism counting [24,25]. For exam-
ple, an environmental microorganism classification engine based on conditional

random fields and CNN is proposed in [26]. In [27], a network architecture based
on U-Net, Inception and connection operations is proposed for the environmental
microorganism image segmentation task.

3 Method
To use VT for the cervical cell classification task, we design the DVT framework,
and its structure can be seen in Fig. 1:
(a) uses training data from the SIPaKMeD and CRIC datasets together, includ-
ing 8838 cell images in 11 categories.
(b) demonstrates the preprocessing stage of the data. The data first pass through the augmentation stage, where each image has a 50% probability of being mirror-flipped. This can enhance the generalization ability of the model. Then the data undergo a normalization stage to complete preprocessing. The normalization stage normalizes the data per channel, i.e., subtracting the mean and then dividing by the standard deviation. It can speed up the convergence of the model (a minimal sketch of the preprocessing and model setup is given after this list).
(c) is the DVT model whose detailed structure can be seen in Fig. 2. The input
size of the image is 224×224 pixels. We use the DeiT model to classify cer-
vical cell images [8]. It is worth noting that the DeiT model we use is a
tiny version without distillation tokens, which is equivalent to a parameter-
improved version of ViT. This model performs better than ViT. Its core
components are the attention layer, the MLP layer, and the residual struc-
ture, which are marked in green in Fig. 2. The attention layer can extract the
global features of the image, and the residual structure solves the problem of
gradient disappearance or explosion that may be caused by the deepening of
the network. DVT extracts the 192-dimensional features of the image, and
finally obtains the classification result through the fully connected layer.
(d) uses cervical cell test images to test the trained DVT to see if the performance
is robust.
(e) evaluates the performance of DVT by calculating precision, recall, F1-Score,
and accuracy, which are commonly used evaluation metrics.
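A minimal PyTorch sketch of stages (b) and (c) is given below. The paper specifies the mirror flip, per-channel normalization, the 224×224 input, and the tiny DeiT without distillation tokens; the ImageNet normalization statistics and the timm model name are our assumptions.

import timm
import torch
from torchvision import transforms

# (b) preprocessing: mirror flip with probability 0.5, then per-channel
# normalization (ImageNet statistics assumed here)
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# (c) DeiT-tiny backbone (192-dim embedding, no distillation token) with
# an 11-class fully connected head
model = timm.create_model("deit_tiny_patch16_224", pretrained=True,
                          num_classes=11)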

4 Experiments and Analysis


4.1 Datasets
Cervical cell image datasets are widely explored in literature [4,5]:

CRIC. CRIC dataset has 400 Pap smear images. Images can be divided into
six categories: negative for intraepithelial lesion or malignancy (NILM); atypical
squamous cells of undetermined significance, possibly non-neoplastic (ASC-US);
low-grade squamous intraepithelial lesion (LSIL); atypical squamous cells, can-
not exclude a high-grade lesion (ASC-H); high-grade squamous intraepithelial
lesion (HSIL); squamous cell carcinoma (SCC) [5]. This study only uses 4789
cropped cell images [28]. There are some examples in Fig. 3.

Fig. 2. The structure of the DVT model

SIPaKMeD. SIPaKMeD dataset contains 4049 cropped cervical cell images.


Images can be divided into five categories: dyskeratotic, koilocytotic, metaplas-
tic, parabasal, and superficial intermediate [4]. There are some examples in Fig. 3.

Fig. 3. Examples of CRIC and SIPaKMeD datasets

Combined Dataset. To utilize the benefits of both datasets, in our work we use their combination. It is worth mentioning that the naming systems of the two datasets are different and do not overlap. This
combined dataset has 11 categories, including 8838 cervical cell images. We
randomly divide 60% of the data in 11 categories as the training set, 20% as
the validation set, and the rest as the test set. Table 1 shows the data allocation.
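The class-wise 60/20/20 division of Table 1 can be reproduced roughly as follows; image_paths and labels are hypothetical lists of file names and class indices, and the use of scikit-learn is our choice, not the paper's.

from sklearn.model_selection import train_test_split

# stratified split: 60% training, then the remaining 40% is halved into
# validation and test sets (20% each), preserving class proportions
train_x, rest_x, train_y, rest_y = train_test_split(
    image_paths, labels, train_size=0.6, stratify=labels, random_state=0)
val_x, test_x, val_y, test_y = train_test_split(
    rest_x, rest_y, test_size=0.5, stratify=rest_y, random_state=0)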

Table 1. Data division of the combined dataset. (1: ASC-H, 2: ASC-US, 3: SCC, 4:
HSIL, 5: LSIL, 6: NILM, 7: Dyskeratotic, 8: Koilocytotic, 9: Metaplastic, 10: Parabasal,
11: Superficial-Intermediate)

Dataset/Class 1 2 3 4 5 6 7 8 9 10 11 Total
Train 484 446 413 525 491 518 488 495 476 473 499 5308
Validation 161 148 137 175 164 172 163 165 159 157 166 1767
Test 161 148 137 174 163 172 162 165 158 157 166 1763
Total 806 742 687 874 818 862 813 825 793 787 831 8838

4.2 Experimental Environment


This experiment is performed on a local computer with 32 GB of memory. The
GPU is an NVIDIA Quadro RTX 4000 with 8 GB of memory. We use Pycharm
software and Pytorch deep learning framework for programming. The learning
rate is 0.0002, the batch size is 16, and the number of epochs is 100. AdamW is applied as
the optimizer in this paper. We select the model with the highest accuracy on
the validation set as the optimal model.
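These settings correspond to a training loop along the following lines; model and train_set are assumed from the sketch in Sect. 3, and the checkpoint selection is indicated by a comment.

import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=0.0002)
criterion = torch.nn.CrossEntropyLoss()
train_loader = torch.utils.data.DataLoader(train_set, batch_size=16,
                                           shuffle=True)

for epoch in range(100):
    model.train()
    for images, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
    # evaluate on the validation set here and keep the checkpoint with
    # the highest validation accuracy as the optimal model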

4.3 Evaluation Methods


We use accuracy, recall, precision, and F1-Score as evaluation methods. In this
11-class classification task, the evaluation index of each category is calculated,
and then the average value is taken as the final evaluation index. When calcu-
lating the evaluation index of each category, the current category is regarded
as the positive sample, and other categories are regarded as negative samples.
A negative sample predicted to be positive is a false positive (FP). A positive
sample predicted to be negative is a false negative (FN). True Positive (TP) is
a positive sample that is correctly predicted. True Negative (TN) is a negative
sample that is correctly predicted. Table 2 shows the formula of the evaluation
methods.

Table 2. Evaluation methods

Precision (P) = TP / (TP + FP)
Recall (R)    = TP / (TP + FN)
F1-Score      = 2 × (P × R) / (P + R)
Accuracy      = (TP + TN) / (TP + TN + FP + FN)
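Equivalently, these macro-averaged metrics can be computed with scikit-learn, as sketched below (the paper does not state which implementation was used).

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(y_true, y_pred):
    """Per-class precision, recall and F1 averaged over the 11 classes."""
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"precision": p, "recall": r, "f1_score": f1,
            "accuracy": accuracy_score(y_true, y_pred)}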

4.4 Experimental Results and Analysis


Table 3 shows the performance of DVT in the validation and testing phase. In
the validation phase, the accuracy of DVT is 88.56%, precision is 88.50%, recall
is 88.70%, and F1-Score is 88.40%. In the testing phase, the accuracy of DVT

is 87.35%, precision is 87.40%, recall is 87.40%, and F1-Score is 87.20%. In this 11-class classification task, a test accuracy of 87.35% indicates very good model performance. The performance of DVT in the validation and testing phases differs only slightly, indicating that the performance of DVT is relatively robust.

Table 3. The performance of DVT

Dataset  Average Precision (%)  Average Recall (%)  Average F1-Score (%)  Accuracy (%)
Validation 88.50 88.70 88.40 88.56
Test 87.40 87.40 87.20 87.35

4.5 Extended Experiment


To further demonstrate the effectiveness of the DVT model, we compare it against 12 models: three VT models (ViT [3], BoTNet [29], T2T-ViT [9]) and nine CNN models (VGG11 [6], VGG13 [6], VGG16 [6], VGG19 [6], InceptionV3 [7], MobileNetV2 [30], ShuffleNetV2×1.0 [31], ShuffleNetV2×0.5 [31], InceptionResNetV1 [32]). The rest of the experimental settings are the same as in the main experiment.
The comparison results of DVT and the other 12 models are shown in Table 4. DVT achieves the highest accuracy among all models, at 87.35%: about 3% higher than InceptionV3 among the CNNs, and about 2% higher than ViT among the VTs. DVT's precision, recall, and F1-Score are also the highest among all models, at 87.40%, 87.40%, and 87.20%, respectively. This shows that the proposed DVT is highly effective.

Table 4. Comparison results of models on the test set of the combined dataset

Models               Average Precision (%)  Average Recall (%)  Average F1-Score (%)  Accuracy (%)
VGG11 84.80 83.90 84.00 84.06
VGG13 85.30 84.70 84.40 84.62
VGG16 79.80 79.90 79.60 79.97
VGG19 78.60 78.30 78.20 78.38
InceptionV3 85.40 84.90 84.80 84.96
MobileNetV2 84.90 84.60 84.60 84.62
ShuffleNetV2×1.0 85.50 84.30 84.50 84.57
ShuffleNetV2×0.5 82.60 82.60 82.50 82.64
InceptionResNetV1 85.20 84.90 84.90 84.96
ViT 86.10 86.00 85.70 85.87
BoTNet 66.10 65.50 63.90 65.51
T2T-ViT 85.60 85.30 85.20 85.19
Our 87.40 87.40 87.20 87.35

5 Conclusion and Future Work


In this study, we propose a VT framework called DVT for cervical cell classification. We use the SIPaKMeD and CRIC datasets together for the classification experiments. On the 11-class classification task, DVT achieves an accuracy of 87.35%. In comparative experiments, DVT outperforms the individual CNN and VT models. In the future, we may improve the internal structure of the model or combine more models to improve the accuracy of DVT, and we may apply data augmentation methods to better preprocess the data.

References
1. Šarenac, T., Mikov, M.: Cervical cancer, different treatments and importance of
bile acids as therapeutic agents in this disease. Front. Pharmacol. 10, 484 (2019)
2. Saslow, D., et al.: American cancer society, American society for colposcopy and
cervical pathology, and American society for clinical pathology screening guidelines
for the prevention and early detection of cervical cancer. CA Cancer J. Clin. 62(3),
147–172 (2012)
3. Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image
recognition at scale. arXiv preprint arXiv:2010.11929 (2020). https://arxiv.org/abs/2010.11929
4. Plissiti, M.E., Dimitrakopoulos, P., Sfikas, G., Nikou, C., Krikoni, O., Charchanti,
A.: Sipakmed: a new dataset for feature and image based classification of normal
and pathological cervical cells in pap smear images. In: 2018 25th IEEE Interna-
tional Conference on Image Processing (ICIP), pp. 3144–3148. IEEE (2018)
5. Rezende, M.T., et al.: Cric searchable image database as a public platform for
conventional pap smear cytology data. Sci. Data 8(1), 1–8 (2021)
6. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale
image recognition. arXiv preprint arXiv:1409.1556 (2014). https://arxiv.org/abs/1409.1556
7. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the incep-
tion architecture for computer vision. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
8. Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jégou, H.: Training
data-efficient image transformers & distillation through attention. In: International
Conference on Machine Learning, pp. 10347–10357. PMLR (2021)
9. Yuan, L., et al.: Tokens-to-token vit: training vision transformers from scratch
on ImageNet. arXiv preprint arXiv:2101.11986 (2021). https://arxiv.org/abs/2101.11986
10. Rahaman, M.M., et al.: A survey for cervical cytopathology image analysis using
deep learning. IEEE Access 8, 61687–61710 (2020)
11. Xue, D., et al.: An application of transfer learning and ensemble learning techniques
for cervical histopathology image classification. IEEE Access 8, 104603–104618
(2020)
12. Khamparia, A., Gupta, D., de Albuquerque, V.H.C., Sangaiah, A.K., Jhaveri, R.H.:
Internet of health things-driven deep learning system for detection and classifica-
tion of cervical cells using transfer learning. J. Supercomput. 76(11), 8590–8608
(2020). https://doi.org/10.1007/s11227-020-03159-4

13. Rahaman, M.M., et al.: Deepcervix: a deep learning-based framework for the clas-
sification of cervical cells using hybrid deep feature fusion techniques. Comput.
Biol. Med. 136, 104649 (2021)
14. Shi, J., Wang, R., Zheng, Y., Jiang, Z., Zhang, H., Yu, L.: Cervical cell classifica-
tion with graph convolutional network. Comput. Methods Programs Biomed. 198,
105807 (2021)
15. Liu, W., et al.: Is the aspect ratio of cells important in deep learning? A robust
comparison of deep learning methods for multi-scale cytopathology cell image clas-
sification: From convolutional neural networks to visual transformers. Comput.
Biol. Med. 105026 (2021)
16. Li, C., Zhang, J., Kulwa, F., Qi, S., Qi, Z.: A SARS-CoV-2 microscopic image
dataset with ground truth images and visual features. In: Peng, Y., et al. (eds.)
PRCV 2020. LNCS, vol. 12305, pp. 244–255. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-60633-6_20
17. Rahaman, M.M., et al.: Identification of covid-19 samples from chest x-ray images
using deep learning: a comparison of transfer learning approaches. J. Xray Sci.
Technol. 28(5), 821–839 (2020)
18. Ismael, A.M., Şengür, A.: Deep learning approaches for covid-19 detection based
on chest x-ray images. Expert Syst. Appl. 164, 114054 (2021)
19. Li, C., et al.: A review for cervical histopathology image analysis using machine
vision approaches. Artif. Intell. Rev. 53(7), 4821–4862 (2020). https://doi.org/10.
1007/s10462-020-09808-7
20. Li, C., et al.: A comprehensive review of computer-aided whole-slide image analy-
sis: from datasets to feature extraction, segmentation, classification and detection
approaches. Artif. Intell. Rev. 1–70 (2021). https://doi.org/10.1007/s10462-021-10121-0
21. Chen, H., et al.: IL-MCAM: an interactive learning and multi-channel attention
mechanism-based weakly supervised colorectal histopathology image classification
approach. Comput. Biol. Med. 143, 105265 (2022)
22. Hu, W., et al.: GasHisSDB: a new gastric histopathology image dataset for com-
puter aided diagnosis of gastric cancer. Comput. Biol. Med. 105207 (2022)
23. Hameed, Z., Zahia, S., Garcia-Zapirain, B., Javier Aguirre, J., Marı́a Vanegas,
A.: Breast cancer histopathology image classification using an ensemble of deep
learning models. Sensors 20(16), 4373 (2020)
24. Li, C., Wang, K., Xu, N.: A survey for the applications of content-based microscopic
image analysis in microorganism classification domains. Artif. Intell. Rev. 51(4),
577–646 (2017). https://doi.org/10.1007/s10462-017-9572-4
25. Zhang, J., et al.: A comprehensive review of image analysis methods for microor-
ganism counting: from classical image processing to deep learning approaches.
Artif. Intell. Rev. 1–70 (2021)
26. Kosov, S., Shirahama, K., Li, C., Grzegorzek, M.: Environmental microorganism
classification using conditional random fields and deep convolutional neural net-
works. Pattern Recogn. 77, 248–261 (2018)
27. Zhang, J., et al.: LCU-Net: a novel low-cost u-net for environmental microorganism
image segmentation. Pattern Recogn. 115, 107885 (2021)
28. Diniz, N., et al.: A deep learning ensemble method to assist cytopathologists in
pap test image classification. J. Imaging 7(7), 111 (2021)
29. Srinivas, A., Lin, T.Y., Parmar, N., Shlens, J., Abbeel, P., Vaswani, A.: Bottle-
neck transformers for visual recognition. arXiv preprint arXiv:2101.11605 (2021).
https://arxiv.org/abs/2101.11605

30. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: Mobilenetv2:
Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018)
31. Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: ShuffleNet V2: practical guidelines for
efficient CNN architecture design. In: Ferrari, V., Hebert, M., Sminchisescu, C.,
Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 122–138.
Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01264-9_8
32. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, inception-resnet
and the impact of residual connections on learning. In: Proceedings of the Thirty-
First AAAI Conference on Artificial Intelligence, pp. 4278–4284 (2017). https://dl.acm.org/doi/10.5555/3298023.3298188
Image Classification in Breast
Histopathology Using Transfer
and Ensemble Learning

Yuchao Zheng1, Chen Li1(B), Xiaomin Zhou1, Haoyuan Chen1, Haiqing Zhang1, Yixin Li1, Hongzan Sun2, and Marcin Grzegorzek3

1 Microscopic Image and Medical Image Analysis Group, College of Medicine and Biological Information Engineering, Shenyang, China
lichen@bmie.neu.edu.cn
2 Shengjing Hospital, China Medical University, Shengjing, China
sunhz@sj-hospital.org
3 Institute for Medical Informatics, University of Luebeck, Luebeck, Germany
marcin.grzegorzek@uni-luebeck.de

Abstract. Breast cancer has the highest prevalence in women globally.


The classification of breast cancer histopathological images has always
been a hot spot of clinical concern. In Computer-Aided Diagnosis (CAD),
traditional models mostly use a single network to extract features, which
has significant limitations. Besides, many networks are trained and opti-
mized on patient-level datasets, ignoring the lower-level labels. This
paper proposes a deep ensemble model based on image-level labels for
the binary classification of benign and malignant breast histopathologi-
cal images. First, the BreakHis dataset is randomly divided into training,
validation and test sets. Then, data augmentation techniques are used
to balance the number of benign and malignant samples. Thirdly, VGG-16, Xception, Resnet-50 and Densenet-201 are selected as the base classifiers owing to their complementarity. Finally, with accuracy as the ensemble weight, an accuracy of 98.90% is achieved. To verify the capabilities of our method, the latest Transformer and Multilayer Perceptron (MLP) models are experimentally compared on the same dataset. Our model wins with a 5–20% advantage, emphasizing the ensemble model's far-reaching significance in classification tasks.

Keywords: Convolutional neural network · Transfer learning ·


Ensemble learning · Image classification · Histopathological image ·
Breast cancer

1 Introduction

Breast cancer is the most common malignant tumor in the world, directly affect-
ing the lives of 2.3 million women every year. Especially for females, it accounts
for 30% of all cancer deaths in the world [21]. Breast cancer is a disease in which
296 Y. Zheng et al.

breast epithelial cells proliferate out of control under the action of a variety of
carcinogens. It generally can be divided into benign (not dangerous to health)
and malignant (potentially dangerous). Early treatment of breast cancer is very
important, and doctors need to choose an effective treatment plan based on its
malignancy. At present, histopathological diagnosis is generally regarded as a
“gold standard” [13]. To better observe and analyze the different components of
the tissue under the microscope, Hematoxylin and Eosin staining is frequently
used, abbreviated as H&E [4].
However, it is difficult to observe tissue with the naked eye and to manually analyze the visual information based on prior medical knowledge, so CAD is of vital significance. Most existing methods are based on a single classifier, and few image-level models achieve good results. This is because image-level classification is more complex: images sharing a patient-level label are related, whereas image-level labels are relatively independent, which makes the task much more challenging. Therefore, this paper proposes an image-level classification method based on transfer learning and ensemble learning. The workflow is shown in Fig. 1.

Fig. 1. Workflow of the proposed method

As the workflow shows, the basic framework of this method is composed of six parts: (a) Data augmentation. The benign tumor images are flipped horizontally and vertically to balance the amount of training data between benign and malignant tumors. (b) Data input. The images are divided into training, validation and test sets. (c) Transfer learning. Six CNNs are used for transfer learning to obtain applicable networks. (d) Classifier selection. The idea of ensemble pruning is applied to select the most suitable base classifiers. (e) Ensemble learning. The weighted voting method is used to further improve the classification performance. (f) Result evaluation. The confusion matrix is used to measure the classification ability of the whole algorithm.

2 Related Work
In the past ten years, CNNs have shown outstanding performance in breast histopathology image classification. In [14], third-party software (the LNKNet package) containing a neural network classifier is applied to evaluate two specific textures, namely the quantity density of two landmark substances, achieving a 90% classification accuracy on breast histopathology images.
In [20], morphological features are extracted to classify cancerous and non-cancerous cells in breast histopathological images. In the experiment, the multilayer perceptron based on a feedforward ANN model achieves 80% accuracy, 82.9% sensitivity, and 89.2% AUC.
In [2], a CNN-based classifier of whole-slide histopathological images is designed. First, posterior estimates are obtained from the CNN at a specific magnification. Then, the posterior estimates of random multi-views are vote-filtered to provide a slide-level diagnosis. Finally, 5-fold cross-validation is used in the experiment, with an average accuracy of 94.67%, a sensitivity of 96%, a specificity of 92%, and an F1-score of 96.24%.
In [31], a new hybrid convolutional and recurrent DNN is created to classify breast cancer histopathological images. The paper uses a fine-tuned Inception-V3 to extract features for each image block. Then, the feature vectors are input to a 4-layer bidirectional long short-term memory network [17] for feature fusion. Finally, the average accuracy is 91.3%, and a new dataset containing 3,771 breast cancer histopathological images is published.
Other applications of classic CNNs and deep learning to breast histopathological images are summarized in [34]. However, there is little research on the binary classification of breast histopathological images using ensemble learning methods. Among the existing studies, patient-level classification is most common, as in the research listed in Table 1. Nevertheless, since image-level classification is more complex, the accuracy in previous research is relatively low, such as less than 90% in [11] and around 91% in [7]. Therefore, this paper focuses on the challenge of image-level classification. At the same time, the characteristics of each transfer learning model are comprehensively considered, and ensemble learning is carried out based on their complementarity.

Table 1. Researches on the patient-level classification of breast histopathological


images based on ensemble learning

Year Dataset Transfer learning CNNs Accuracy [%]


2019 [9] BreakHis VGG-19, Mobilenet-V2, Densenet-201 98.13
2019 [32] BreakHis (40×) Resnet-101, Resnet-152, Densenet-161 99.75
2020 [1] 544 Images VGG-16, VGG-19 95.29
2021 [18] BreakHis (40×) Resnet-152, Densenet-161 100.00
2021 [19] BreakHis (40×) Densenet-161, 6 image-wise CNNs 99.95

3 Method
In this section, the transfer learning and ensemble learning methods are intro-
duced separately, and some classical CNNs are mentioned.

3.1 Transfer Learning


Transfer learning is a method of applying knowledge or patterns learned in a
specific field or task to different fields or problems [16]. Recently, research is
conducted based on deep transfer learning techniques to identify COVID-19 cases
using CXR images and has obtained excellent results [15]. Given a source domain DS = {XS, fS(x)} with learning task TS, and a target domain DT = {XT, fT(x)} with learning task TT, transfer learning aims to improve the learning of the target predictive function fT(x) in DT using the knowledge in DS and TS, where DS ≠ DT or TS ≠ TT. For neural networks, there are two main ways of applying transfer learning: one is to fine-tune the parameters of a pre-trained network; the other is to use the pre-trained network as a feature extractor and then train a new classifier on these features. In this paper, we choose the former, as sketched below.
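A minimal sketch of this fine-tuning strategy in PyTorch could look as follows; the exact weights argument depends on the torchvision version, and the hyperparameters here are taken from the settings reported later in this paper.

```python
import torch
import torchvision

# Load an ImageNet pre-trained backbone and replace its classification
# head with a 2-class layer for benign vs. malignant; all parameters
# remain trainable, so the pre-trained weights are fine-tuned.
model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()
```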
In this paper, the VGG series [22], Inception series [25], Resnet series [6]
and Densenet [8] series are carefully studied and compared, and six CNNs are
selected to classify breast histopathological images. They are: VGG-16, VGG-
19, Inception-V3, Xception, Resnet-50 and Densenet-201. These classic networks
have passed numerous classification tasks and show high accuracy and stability
in various datasets. Besides, their performance shows great complementarity.
For VGGNet, the VGG-16 and the VGG-19 network model are selected for their
simple structure and excellent learning performance on many tasks. In terms
of the classification ability and parameter quantity, Inception-V3 and Xception
models are adopted. In addition, considering the calculation ability, the Resnet-
50 and the Densenet-201 network model are chosen.

3.2 Ensemble Learning


In the past few decades, ensemble learning has received wide attention in compu-
tational intelligence and machine learning. First, a group of individual learners

is generated from training data by existing learning algorithms (e.g. decision trees, back-propagation neural networks); after that, the learners are effectively combined through a specific strategy; finally, the expected experimental results can be obtained [5]. Using individual learners of the identical type (e.g. all neural networks) is called homogeneous ensemble learning, and applying various types is called heterogeneous ensemble learning. In this paper, the former is used. Voting, averaging, and stacking are three combination strategies of ensemble learning. Considering their complexity and ease of operation, the basic weighted voting strategy is applied in this paper, given in Eq. (1). With appropriate weights, weighted voting can outperform both the individual classifiers and the absolute majority voting method.
Here, w_i represents the weight of classifier h_i. In practical applications, similar to the weighted average method, the weight coefficients are often normalized and constrained to satisfy w_i ≥ 0 and Σ_{i=1}^{T} w_i = 1. Choosing the right weights is very important. Suppose l = (l_1, l_2, ..., l_T)^T is the output of the individual classifiers, where l_i represents the class label predicted by classifier h_i on sample x. Let p_i be the precision of h_i; the combined output for class label c_j can be expressed as Eq. (2) using a Bayesian optimal discriminant function. Assuming that the outputs of the individual classifiers are conditionally independent, Eq. (2) reduces to Eq. (3). Since the first term of Eq. (3) does not depend on the individual classifiers, the optimal weight for weighted voting can be obtained from the second term and satisfies Eq. (4). In this paper, the weights are set based on the evaluation indicators of the classifiers.
\[ H(x) = c_{\arg\max_{j} \sum_{i=1}^{T} w_i\, h_i^j(x)} \tag{1} \]
\[ H^j(x) = \log\bigl( P(c_j)\, P(l \mid c_j) \bigr) \tag{2} \]
\[ H^j(x) = \log P(c_j) + \sum_{i=1}^{T} h_i^j(x) \log \frac{p_i}{1 - p_i} \tag{3} \]
\[ w_i \propto \log \frac{p_i}{1 - p_i} \tag{4} \]
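A minimal sketch of weighted hard voting in the sense of Eq. (1) is given below; the function name and the toy inputs are illustrative assumptions, and accuracy is used as the weight, as in this paper.

```python
import numpy as np

def weighted_vote(class_probs, accuracies):
    """Weighted hard voting over T base classifiers, following Eq. (1).

    class_probs: list of (n_samples, n_classes) arrays, one per classifier.
    accuracies: the accuracy of each classifier, used as its weight after
    normalisation (w_i >= 0, sum w_i = 1).
    """
    w = np.asarray(accuracies, dtype=float)
    w /= w.sum()
    votes = np.zeros_like(class_probs[0])
    for w_i, probs in zip(w, class_probs):
        # h_i^j(x): one-hot indicator of classifier i predicting class j.
        votes += w_i * np.eye(probs.shape[1])[probs.argmax(axis=1)]
    return votes.argmax(axis=1)

# Example with two toy classifiers on 3 samples and 2 classes.
p1 = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
p2 = np.array([[0.6, 0.4], [0.7, 0.3], [0.1, 0.9]])
print(weighted_vote([p1, p2], accuracies=[0.95, 0.96]))  # [0 0 1]
```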

4 Experiments

In this section, the experimental settings, process, analysis and limitations of the results are presented. Empirically, we demonstrate that our method significantly improves on a number of prior machine learning schemes.

4.1 Experimental Settings

To implement the proposed model, the practical open-source BreakHis dataset is used in this research [23]. BreakHis consists of 7,909 clinically representative microscopic images of breast tumor tissue, collected from 82 patients at four magnifications. To date, there are 2,480 benign and 5,429 malignant tumor samples in this dataset, both of which are divided into different subtypes, as shown in Fig. 2.

Fig. 2. Benign/malignant tumor subtype samples. (a)–(d) are benign images, (e)–(h)
are malignant images

This experiment separates the BreakHis dataset at the image level into training, test and validation sets at a ratio of 7:1:2. However, the numbers of benign (2,480) and malignant (5,429) images are seriously imbalanced, so a data augmentation strategy, namely mirror flipping, is applied to the benign samples. A total of 2,480 images are generated by horizontal mirror flipping, and 469 images are randomly selected from vertical mirror flipping. In this way, the number of images in the two categories becomes equal, giving 10,858 images in total.
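A minimal sketch of this balancing step is shown below; the directory layout, file pattern, and random seed are illustrative assumptions, not details from the paper.

```python
import random
from pathlib import Path
from PIL import Image

# Every benign image is flipped horizontally (2,480 new images) and 469
# randomly chosen ones are also flipped vertically, so that both classes
# end up with 5,429 images each.
benign = sorted(Path("breakhis/benign").glob("*.png"))
out = Path("breakhis/benign_aug")
out.mkdir(parents=True, exist_ok=True)

for p in benign:
    Image.open(p).transpose(Image.FLIP_LEFT_RIGHT).save(out / f"h_{p.name}")

random.seed(0)
for p in random.sample(benign, k=469):
    Image.open(p).transpose(Image.FLIP_TOP_BOTTOM).save(out / f"v_{p.name}")
```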

4.2 Deep Learning Algorithm

In this paper, a transfer learning strategy is applied to train the models. After many trials, the parameters are set to the values that yield the best result on the validation set. The decay steps are set to 5, the decay rate is 0.1, the initial learning rate is 1 × 10−4, and adaptive moment estimation (Adam) is selected as the optimizer. Besides, these CNNs are also trained from scratch to evaluate the benefit of transfer learning, meaning that all weights are initialized randomly. Results are presented in Table 2.
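The reported schedule could be sketched as follows, assuming `model` and a `train_one_epoch` routine exist elsewhere; the epoch count is illustrative.

```python
import torch

# Adam with an initial learning rate of 1e-4, decayed by a factor of 0.1
# every 5 epochs (decay steps 5, decay rate 0.1).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

num_epochs = 30  # illustrative; the paper does not state the epoch count here
for epoch in range(num_epochs):
    train_one_epoch(model, optimizer)
    scheduler.step()  # lr: 1e-4 -> 1e-5 after epoch 5, -> 1e-6 after epoch 10
```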

Table 2. The results and prediction time (unit: s) of CNN models on the validation set (unit: [%]). Acc – Accuracy; P – Precision; R – Recall; F1 – F1-score

Model Training mode Acc P R F1 Time


VGG-16 Training from scratch 89.50 89.65 89.32 89.48 17.28
Transfer learning 95.49 95.40 95.58 95.49 32.50
VGG-19 Training from scratch 91.34 90.16 92.82 91.47 17.96
Transfer learning 95.03 93.27 97.05 95.13 38.61
Inception-V3 Training from scratch 94.48 90.59 99.26 94.73 42.75
Transfer learning 93.83 97.41 90.06 93.59 52.99
Xception Training from scratch 89.96 86.29 95.03 90.45 31.04
Transfer learning 96.59 94.54 98.90 96.67 39.45
Resnet-50 Training from scratch 84.71 94.15 74.03 82.89 34.97
Transfer learning 98.90 98.72 99.08 98.90 42.72
Densenet-201 Training from scratch 88.03 95.19 80.11 87.00 94.14
Transfer learning 98.25 96.95 99.63 98.27 103.06

From Table 2, it can be concluded that transfer learning generally outperforms training from scratch by around 3% to 14%. Although the performance of Inception-V3 shows the opposite trend, this does not affect our subsequent choice of transfer learning for the next step, because training a network from scratch usually takes more than two hours.
Figure 3 shows the transfer learning training curves of the six CNNs. Subsequently, these six networks are compared, and their confusion matrices are shown in Fig. 4.

Fig. 3. CNN training process curve. The X-axis is the epoch, the Y-axis is accuracy/loss. Training accuracy: blue; training loss: green; validation accuracy: yellow; validation loss: red

In the end, the Resnet-50 network performs best, with three
indicators that top all network models. To be precise, the accuracy is 98.90%,
the precision reaches 98.72%, and the F1-score reaches 98.90%. The second place
is the Densenet-201 network at around 98%, and the recall of the Densenet-201
network is the highest among all models, reaching 98.63%. By contrast, the results of the Inception-V3 network are relatively the worst, and the lowest precision is obtained by VGG-19, namely 93.27%.

4.3 Ensemble Learning


According to the evaluation system after transfer learning, we select the most
suitable base classifiers for the next process, and the weighted voting strategy is
used to further improve the accuracy.

Ensemble Pruning. Ensemble pruning refers to selecting a subset of the individual learners for the next step. Firstly, the ensemble result can be obtained with a smaller network scale, reducing the storage overhead of saving models and the computing overhead of producing each individual learner's output, thereby improving the efficiency of the model. In addition, the generalization performance of a pruned ensemble can even be better than that of using all learners. In this paper, the two classifiers with relatively poor performance are pruned from the six transfer-learning-based CNN models, and four classifiers are selected as individual learners, namely the VGG-16, Xception, Resnet-50, and Densenet-201 network models.

Fig. 4. Confusion matrix of CNN on validation set



When selecting the weights, the four evaluation indicators of each network are used as candidate weights and many trials are carried out. After comparing the results, accuracy is chosen as the weight due to its best ensemble performance.

Evaluation of Ensemble Learning Algorithms. After training, the four


indicators of accuracy, precision, recall, and F1-score are still used to evaluate
the overall performance of the system. Figure 5 shows the confusion matrix of
ensemble learning.
Finally, the overall indicators of all single CNNs and our ensembled method
are evaluated on the test set and summarized in Table 3. The classification per-
formance of the ensemble learning method is generally better than that of the
single transfer learning method.

Comparison of Experimental Results. To evaluate the capabilities of the algorithm proposed in this paper, a comparative experiment with Transformer and MLP models is carried out on the same dataset. The latest methods are summarized in Table 4; however, their overall classification performance is not as good.

Fig. 5. Confusion matrix of ensemble learning on validation set


Table 3. Summary of classification results in testing process (unit, [%])

Model Accuracy Precision Recall F1-score


VGG-16 95.44 95.74 95.12 95.43
VGG-19 94.66 92.47 97.24 94.79
Inception-V3 93.65 96.93 90.15 93.42
Xception 96.04 94.25 98.07 96.12
Resnet-50 98.80 99.07 98.53 98.53
Densenet-201 98.30 97.64 98.99 98.31
Ensemble 98.90 98.72 99.08 99.90

Table 4. A comparison between the existing methods and our method on BreakHis

Model Accuracy [%] Training time [s]


Transformer BoTNet-50 [24] 90.75 4502
CaiT [29] 96.70 5081
CoaT [30] 92.91 420
DeiT [28] 90.51 2101
LeViT [3] 93.42 13321
ViT [10] 80.11 1242
T2T-ViT [33] 80.02 2370
MLP MLP-mixer [26] 83.84 5541
gMLP [12] 93.88 8407
ResMLP [27] 79.56 10341


Discussion. It is clear that the accuracy of our method is much better than that of some state-of-the-art machine learning strategies. Compared with the single classifiers mentioned above, all four indicators are enhanced in the ensembled framework. In particular, the F1-score is even close to 100%, showing that both the precision and recall of the model are excellent.
Figure 6 illustrates some images that are not correctly classified. It can be found that a correctly classified image contains almost all the features of its benign or malignant class. However, some samples are quite similar to the other class in texture, distribution and color, such as Fig. 6 (a) and Fig. 6 (b). Also, the wrongly predicted images often contain very little valuable information, such as Fig. 6 (c) and Fig. 6 (d). Patch-based images are allowed to contain many blank regions during segmentation. These factors can interfere with the model's reliability.

Fig. 6. An example of the classification result. (a) and (c) are correctly classified
images. (b) and (d) are wrongly classified images

5 Conclusion and Future Work


In this paper, a framework that combines transfer learning and ensemble learning is proposed for the image-level classification of breast histopathological images, achieving an accuracy of 98.90%. Four transfer learning networks are ensembled using the weighted voting method with accuracy as the weight. In addition, we use the ten latest Transformer and MLP models to classify images on the same dataset, which proves that our method is quite competitive.

References
1. Anda, J.: Histopathology image classification using an ensemble of deep learning
models. Sensors 20 (2020)
2. Das, K., Karri, S., Roy, A., et al.: Classifying histopathology whole-slides using
fusion of decisions from deep convolutional network on a collection of random
multi-views at multi-magnification. In: Proceedings of ISBI 2017, pp. 1024–1027
(2017)
3. Graham, B., El-Nouby, A., Touvron, H., et al.: LeViT: a vision transformer in ConvNet's clothing for faster inference. In: Proceedings of ICCV 2021, pp. 12,259–
12,269 (2021)
4. Gurcan, M., Boucheron, L., Can, A., et al.: Histopathological image analysis: a
review. IEEE Rev. Biomed. Eng. 2, 147–171 (2009)
5. Hadad, O., Ran, B., Ben-Ari, R., et al.: Ensemble learning (2009)
6. He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In:
Proceedings of CVPR 2016, pp. 770–778 (2016)
7. Hu, C., Sun, X., Yuan, Z., et al.: Classification of breast cancer histopathological
image with deep residual learning. Int. J. Imaging Syst. Technol. 31, 1583–1594
(2021)
8. Huang, G., Liu, Z., Van Der Maaten, L., et al.: Densely connected convolutional
networks. In: Proceedings of CVPR 2017, pp. 4700–4708 (2017)
9. Kassani, S., Kassani, P., Wesolowski, M., et al.: Classification of histopathological
biopsy images using ensemble of deep learning networks (2019)
10. Kolesnikov, A., Dosovitskiy, A., Weissenborn, D., et al.: An image is worth 16x16
words: Transformers for image recognition at scale (2021)
11. Li, J., Zhang, J., Sun, Q., et al.: Breast cancer histopathological image classification
based on deep second-order pooling network, pp. 1–7 (2020)
12. Liu, H., Dai, Z., So, D., et al.: Pay attention to mlps. Adv. Neural. Inf. Process.
Syst. 34, 9204–9215 (2021)
13. de Matos, J., Britto Jr., A.S., Oliveira, L., et al.: Histopathologic image processing: a review. arXiv: 1904.07900 (2019)
14. Petushi, S., Garcia, F., Haber, M., et al.: Large-scale computations on histology
images reveal grade-differentiating parameters for breast cancer. BMC Med. Imag-
ing 6(1), 1–11 (2006)
15. Rahaman, M., Li, C., Yao, Y., et al.: Identification of covid-19 samples from chest
x-ray images using deep learning: a comparison of transfer learning approaches. J.
Xray Sci. Technol. 28(5), 821–839 (2020)
16. Ribani, R., Marengoni, M.: A survey of transfer learning for convolutional neural
networks. In: Proceedings of SIBGRAPI 2019, pp. 47–57 (2019)
306 Y. Zheng et al.

17. Schuster, M., Paliwal, K.: Bidirectional recurrent neural networks. IEEE Trans.
Signal Process. 45(11), 2673–2681 (1997)
18. Senousy, Z., Abdelsamea, M., Gaber, M., et al.: Mcua: multi-level context and
uncertainty aware dynamic deep ensemble for breast cancer histology image clas-
sification. IEEE Transactions on Biomedical Engineering, pp. 1–1 (2021)
19. Senousy, Z., Abdelsamea, M., Mostafa Mohamed, M., et al.: 3e-net: Entropy-based
elastic ensemble of deep convolutional neural networks for grading of invasive breast
carcinoma histopathological microscopic images. Entropy 23, 620 (2021)
20. Shukla, K., Tiwari, A., Sharma, S., et al.: Classification of histopathological images
of breast cancerous and non cancerous cells based on morphological features.
Biomed. Pharmacol. J. 10(1), 353–366 (2017)
21. Siegel, R., Miller, K., Fuchs, H., et al.: Cancer statistics, 2021. CA: A Cancer J.
Clin. 71(1), 7–33 (2021)
22. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale
image recognition. arXiv:1409.1556 (2014)
23. Spanhol, F., Oliveira, L., Petitjean, C., et al.: A dataset for breast cancer
histopathological image classification. IEEE Trans. Biomed. Eng. 63(7), 1455–1462
(2015)
24. Srinivas, A., Lin, T., Parmar, N., et al.: Bottleneck transformers for visual recog-
nition, pp. 16,519–16,529 (2021)
25. Szegedy, C., Liu, W., Jia, Y., et al.: Going deeper with convolutions, pp. 1–9 (2015)
26. Tolstikhin, I., Houlsby, N., Kolesnikov, A., et al.: Mlp-mixer: an all-mlp architecture
for vision. In: Advances in Neural Information Processing Systems, vol. 34 (2021)
27. Touvron, H., Bojanowski, P., Caron, M., et al.: Resmlp: Feedforward networks for
image classification with data-efficient training. arXiv: 2105.03404 (2021)
28. Touvron, H., Cord, M., Douze, M., et al.: Training data-efficient image transformers
& distillation through attention, pp. 10,347–10,357 (2021)
29. Touvron, H., Cord, M., Sablayrolles, A., et al.: Going deeper with image trans-
formers. In: Proceedings of ICCV 2021, pp. 32–42 (2021)
30. Xu, W., Xu, Y., Chang, T., et al.: Co-scale conv-attentional image transformers.
In: Proceedings of ICCV 2021, pp. 9981–9990 (2021)
31. Yan, R., Ren, F., Wang, Z., et al.: Breast cancer histopathological image classifi-
cation using a hybrid deep neural network. Methods 173, 52–60 (2020)
32. Yang, Z., Ran, L., Zhang, S., et al.: Ems-net: ensemble of multiscale convolutional
neural networks for classification of breast cancer histology images. Neurocomput-
ing 366, 46–53 (2019)
33. Yuan, L., Chen, Y., Wang, T., et al.: Tokens-to-token vit: training vision trans-
formers from scratch on imagenet. In: Proceedings of ICCV 2021, pp. 558–567
(2021)
34. Zhou, X., Li, C., Rahaman, M., et al.: A comprehensive review for breast
histopathology image analysis using classical and deep neural networks. IEEE
Access 8, 90931–90956 (2020)
PIS-Net: A Novel Pixel Interval Sampling
Network for Dense Microorganism
Counting in Microscopic Images

Jiawei Zhang1, Chen Li1(B), Hongzan Sun2, and Marcin Grzegorzek3

1 Microscopic Image and Medical Image Analysis Group, College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China
1971087@stu.neu.edu.cn, lichen@bmie.neu.edu.cn
2 Shengjing Hospital, China Medical University, Shengjing, China
sunhz@sj-hospital.org
3 Institute for Medical Informatics, University of Luebeck, Lübeck, Germany
marcin.grzegorzek@uni-luebeck.de

Abstract. A novel Pixel Interval Sampling Network (PIS-Net) is applied here for dense microorganism counting. The PIS-Net is designed for microorganism image segmentation with an encoder-decoder architecture, and connected domain detection is then applied for counting. The proposed method responds well at the edges between tiny objects. Several classical segmentation metrics (Dice, Jaccard, and Hausdorff distance) are applied for evaluation. Experimental results show that the proposed PIS-Net has the best performance and potential for dense tiny object counting tasks, achieving 96.88% counting accuracy on a dataset with 420 yeast cell images. Compared with state-of-the-art approaches such as Attention U-Net, Swin U-Net, and Trans U-Net, the proposed PIS-Net segments dense tiny objects with clearer boundaries and less incorrect debris, which shows the great potential of PIS-Net in accurate counting tasks.

Keywords: Microorganism counting · Pixel interval sampling · Image


segmentation · Deep learning · Dense objects

1 Introduction
The problem of environmental pollution needs to be resolved alongside the development of urbanization. With the outbreak of Corona Virus Disease 2019 (COVID-19), people pay more attention to microorganism analysis [20,22]. Biological methods, which are more efficient and pollution-free, are widely applied for pollution control compared with physical and chemical methods. Yeast is a kind of single-celled eukaryotic microorganism; it was first applied in wastewater treatment with remarkable performance [29], and it is still applied today for the treatment of toxic industrial wastewater and solid waste [34].


Research on yeast counting can quantitatively evaluate the performance of yeast in various tasks [30]. Classical counting methods are stable and straightforward, but they are subjective, and the counting accuracy is unsatisfactory when the number of cells becomes large. With the development of digital image processing and machine learning technologies, image analysis-based approaches are widely applied for microorganism classification, segmentation, and counting [16–19,26,35,36,38]. However, by reviewing our recent work [14], we find that deep learning methods are mostly applied for microorganism classification, but rarely for microorganism segmentation in the task of microorganism counting. To improve the performance of dense tiny microorganism counting, we propose a novel Pixel Interval Sampling Network (PIS-Net) for the yeast counting task with higher accuracy. The PIS-Net is designed as an encoder-decoder architecture based on a Convolutional Neural Network (CNN), pixel interval sampling, and concatenation operations. The proposed PIS-Net achieves improved counting performance compared with SegNet [3] and U-Net [23]. The workflow of the proposed PIS-Net counting method is shown in Fig. 1.

Fig. 1. The workflow of PIS-Net-based dense tiny microorganism counting system

In Fig. 1, (a) Original Dataset: the dataset contains images of yeast cells and their ground truth (GT); each image contains fewer than 256 yeast cells. (b) Data Augmentation: mirror and rotation operations are applied to expand the original dataset. (c) Training Process: PIS-Net is trained for image segmentation, and the model with the best performance is saved. (d) Segmentation Result: the best PIS-Net model is used to output binary segmentation images. (e) Counting Result: the number of yeast cells is counted using connected domain detection.
The main contributions of this paper are as follows: (1) We propose the PIS-Net for dense tiny object counting; all down-sampling operations are based on pixel interval sampling, without max-pooling. (2) Max-pooling loses some local features of tiny objects (edge lines may no longer be connected after max-pooling), while pixel interval sampling covers a more detailed region. (3) The proposed PIS-Net achieves the best counting result compared with other models.

2 Related Work
In Table 1, the digital image processing-based microorganism counting approaches
are listed, including classical and machine learning-based counting approaches.
Our survey paper [14] summarized and analyzed these methods in detail.

Table 1. Microorganism image counting methods

Category Subcategory Related work


Classical methods Thresholding based methods [2, 10, 33]
Edge detection based methods [4, 5, 27]
Watershed based methods [1, 13, 24]
Machine learning Classical machine learning based methods [28, 37]
Methods Deep learning based methods [6, 12]

Classical Counting Methods. The segmentation result determines the per-


formance of microorganism counting. In Table 1, the classical approaches are
listed, which contain thresholding, edge detection, and watershed based methods.
Many kinds of thresholding approaches are applied for microorganism counting
tasks [2,10,33]. In [2], the iterative local threshold is applied for bacteria image
segmentation, and then a Hough circle transformation is used to separate clus-
tered colonies into a single colony. In [10], an adaptive thresholding method is
applied for image segmentation, then, the minima function is used for center
locating of the microorganism. The work [33] uses Otsu thresholding for bac-
teria image segmentation, and the hypothesis testing is then used for debris
erasing. Edge detection approaches are applied for microorganism image count-
ing [4,5,27]. In [4], several combination approaches of different edge detection
filters are applied for the microorganism counting tasks. Then the concave sur-
face between the connected colonies is detected for counting. In [27], Sobel and
Laplacian filters are applied for bacteria edge detection. In [1,13,24], watershed
approaches are applied for microorganism counting. In [1], a watershed-based
method is used for bacteria images colonies separation. Then, the circularity
ratio of colonies is measured and counted. In [13], distance transformation is
combined with watershed and applied for the bacteria counting task. Then, the
sharp corners are eliminated by using morphological operations.

Machine Learning-Based Counting Methods. In Table 1, the counting


methods based on classical machine learning and deep learning are summarized.
In [28], Principal Component Analysis (PCA) is used to separate bacteria from useless debris; then, nearest neighbor searching is used to separate clustered colonies. In [37], a Support Vector Machine (SVM) is trained on shape features to classify different microorganisms; then, Otsu thresholding is used for counting. In [6], the Marr-Hildreth operator is used to detect the edges in bacteria images, and thresholding is then used for binary image segmentation. After that, an Artificial Neural Network (ANN) is trained for bacteria classification. In [12], contrast-limited adaptive histogram equalization is used to segment the bacteria, and a neural network with four convolutional layers and one fully connected layer is used for microorganism image classification. By analyzing all related works, it can be found that in the task of microorganism counting, most deep learning approaches are used for classification, but few for segmentation. Classical segmentation approaches performed unsatisfactorily in our pre-test because the contour lines of the yeast cells in our dataset are not clear, so we design an end-to-end CNN model for the task of dense tiny object segmentation.

3 PIS-Net-Based Yeast Counting Method


3.1 Image Dataset
In our work, we use the yeast image dataset proposed in [11], containing 51 different images of yeast cells and their corresponding ground truth (GT) images. The images contain 3799 yeast cells in total. All images are resized to a resolution of 256×256 pixels, as shown in Fig. 2. The original 51 images are then rotated (by 0°, 90°, 180°, and 270°) and flipped (mirrored), so the dataset is augmented to eight times its original size (408 images).

Fig. 2. The images in the yeast cell dataset: (a) shows the original yeast image and (b) shows the corresponding ground truth image

The augmented yeast image dataset is randomly divided into training, validation, and test sets at a ratio of 3:1:1. Therefore, 244 original images and their corresponding GT images are used for training, 82 original images and their corresponding GT images are used for validation, and 82 original images are used for testing.
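The eightfold augmentation could be sketched as follows; the function name is illustrative, and the same transform is applied to image and GT so the pair stays aligned.

```python
from PIL import Image

def eight_fold(image: Image.Image, gt: Image.Image):
    """Produce the eight augmented versions described above: the four
    rotations (0/90/180/270 degrees) of the image and of its mirrored
    copy, applied identically to the ground-truth mask.
    51 pairs x 8 = 408 pairs."""
    pairs = []
    for flip in (False, True):
        img = image.transpose(Image.FLIP_LEFT_RIGHT) if flip else image
        msk = gt.transpose(Image.FLIP_LEFT_RIGHT) if flip else gt
        for angle in (0, 90, 180, 270):
            # Rotating a square 256x256 image by a multiple of 90 degrees
            # preserves its resolution.
            pairs.append((img.rotate(angle), msk.rotate(angle)))
    return pairs
```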

3.2 The Structure of PIS-Net

The structure of the proposed PIS-Net is shown in Fig. 3; it is an end-to-end CNN structure based on an encoder and a decoder. There are four blocks in the encoder network (red dotted box in Fig. 3). The first part of each block consists of two convolution operations with a kernel size of 3 × 3 (each using filtering with padding and followed by a ReLU operation); then a pixel interval sampling operation is applied to halve the size of the feature maps. The architecture of pixel interval sampling is shown in Fig. 3: the feature maps with C channels are split into 4 parts based on pixel interval sampling, and the parts are then combined using a concatenation operation. To further extract features while considering the finite computational budget, convolution operations with a kernel size of 3×3 (filtering with padding and ReLU activation) tune the 4C-dimensional features back to C-dimensional features. Hereto, the initial feature maps of size H × W with C channels are changed to feature maps of size H/2 × W/2 with C channels. The procedure is repeated four times, giving an output resolution of H/16 × W/16 with 4C channels. A minimal sketch of this sampling operation is given after Fig. 3.

Fig. 3. The network structure of PIS-Net
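The pixel interval sampling block described above could be sketched in PyTorch as follows; the class and layer names are our illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class PixelIntervalSampling(nn.Module):
    """Down-sample by interleaved pixel sampling instead of max-pooling.

    The H x W feature map is split into four H/2 x W/2 maps by taking
    every second pixel with the four possible (row, column) offsets,
    the parts are concatenated along the channel axis (C -> 4C), and a
    3x3 convolution maps the 4C channels back to C."""

    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(4 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Four interleaved sub-grids; no pixel is discarded, unlike pooling.
        parts = [x[:, :, 0::2, 0::2], x[:, :, 0::2, 1::2],
                 x[:, :, 1::2, 0::2], x[:, :, 1::2, 1::2]]
        return self.fuse(torch.cat(parts, dim=1))

x = torch.randn(1, 64, 256, 256)
print(PixelIntervalSampling(64)(x).shape)  # torch.Size([1, 64, 128, 128])
```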

In the decoder network, four blocks are applied for up-sampling. Two convolution operations with a kernel size of 3×3 (each using filtering with padding and followed by a ReLU operation) are applied first. Then a transposed convolution operation with a kernel size of 3, a stride of 2, and a padding of 1 is applied
for up-sampling. The transposed convolution, whose parameters can be learned
in the process of backpropagation, can be applied to expand the size of the
image [32]. Meanwhile, the number of channels can be changed by using the
different number of convolutional kernels [31]. After that, the high-resolution
feature maps from the encoder are transformed to low-resolution feature maps
using 2×, 4× and 8× pixel interval sampling, and concatenated with the feature
maps after up-sampling. For instance, the 8× pixel interval sampling features
of the first block, 4× pixel interval sampling of the second block and 2× pixel
interval sampling of the third block in the encoder are concatenated with the
copied features of the fourth block and the features after up-sampling (5 parts
of pixel interval sampling features are concatenated in the fourth level of the
decoder). In the same way, there are 4, 3, 2 parts of features are concatenated in
the third, second, and first level of decoder, respectively. After that, two convo-
lutions and ReLU operations are applied to change the number of channels. The
up-sampling operation is repeated four times, with output resolutions of H × W
and channel of C, which has the same size as the encoder input features. Finally,
a Softmax layer with 2 output channels is applied for feature maps classification.

3.3 Counting Method

After segmentation, a post-processing step is applied to reduce the effect of noise. A morphological filter is applied to remove small regions, which significantly improves the counting performance. Finally, the connected regions in the post-processed segmentation images are counted based on an 8-neighborhood search [7]. A sketch of this counting step is given below.
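The following is a minimal sketch using OpenCV; the 3×3 kernel and the `min_area` threshold are illustrative choices, not values given in the paper.

```python
import cv2
import numpy as np

def count_cells(mask: np.ndarray, min_area: int = 30) -> int:
    """Count yeast cells in a binary segmentation mask (uint8, 0/255).

    A morphological opening removes small noise regions, then the
    8-connected components are counted."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    cleaned = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    n, _, stats, _ = cv2.connectedComponentsWithStats(cleaned, connectivity=8)
    areas = stats[1:, cv2.CC_STAT_AREA]   # label 0 is the background
    return int((areas >= min_area).sum())
```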

4 Experiments

4.1 Experimental Setting

Evaluation Indices. The evaluation indices are listed in Table 2. TP (True


Positive), TN (True Negative), FP (False Positive), and FN (False Negative)
are basic evaluation metrics, which can be applied to measure the performance
of segmentation in general. Furthermore, N_pred denotes the number of connected regions in the predicted image, and N_GT denotes the number of connected regions in the GT image, which indicates the number of yeast cells. In the definition of the Hausdorff distance, sup is the supremum and inf is the infimum. The Hausdorff distance measures the mismatch between two point sets (the greatest distance from a point in one set to the nearest point in the other), with the unit of pixels per image.

Experimental Environment. The experiment is conducted by Python 3.8.10


in Windows 10 operating system. The experimental environment is based on
Torch 1.9.0. The workstation is equipped with Intel(R) Core(TM) i7-8700 CPU
with 3.20 GHz, 16 GB RAM, and NVIDIA GEFORCE RTX 2080 8 GB.

Table 2. The definitions of the evaluation metrics. CA and HD are abbreviations of counting accuracy and Hausdorff distance, respectively

Metric     Definition
Accuracy   (TP + TN) / (TP + TN + FP + FN)
Jaccard    TP / (FN + TP + FP)
Dice       2TP / (FN + 2TP + FP)
Precision  TP / (TP + FP)
CA         1 − |N_pred − N_GT| / N_GT
HD         d_H(X, Y) = max( sup_{x∈X} inf_{y∈Y} d(x, y), sup_{y∈Y} inf_{x∈X} d(x, y) )
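The pixel-wise metrics in Table 2 can be computed directly from boolean masks, as in the minimal sketch below; the function name is an illustrative assumption.

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Dice, Jaccard and Precision following the definitions in Table 2.

    `pred` and `gt` are assumed to be boolean arrays of identical shape."""
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    return {
        "Dice": 2 * tp / (fn + 2 * tp + fp),
        "Jaccard": tp / (fn + tp + fp),
        "Precision": tp / (tp + fp),
    }
```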

Parameters Setting. Softmax is used for the final classification in our experiment. The Adam optimizer is applied to minimize the loss function; it adjusts the learning rate automatically by considering the gradient momentum of previous time steps [15]. In the training task, the learning rate is set to 0.001 and the batch size is 8. The number of epochs is set to 100 considering the convergence speed of the experimental models; an example of the loss and intersection over union (IoU) curves of the models is shown in Fig. 4. Although there are 92,319,298 parameters to be trained in total, the PIS-Net converges rapidly and smoothly, without overfitting. There is a jump in the loss and IoU plots for all three tested networks between 40 and 80 epochs, which is caused by the small batch size: a small batch size may lead to large differences between batches, so the loss and IoU curves may jump during convergence.

Fig. 4. The loss and IoU curves in training process

4.2 Evaluation of Segmentation and Counting Performance


To show the best performance of PIS-Net for the yeast counting task, we compare it with several state-of-the-art methods: Attention U-Net [21], Trans U-Net [9], and Swin U-Net [8].
The experimental settings and evaluation indices are the same for all comparative experiments, and all use the same training, validation, and test datasets described in Sect. 3.1. All models are trained from scratch without pre-training or fine-tuning, in line with the work in [11] (although pre-trained models might obtain better results). In addition, methods based on the Hough transformation, Otsu thresholding, and the watershed are compared to show the performance of classical methods in the task of tiny dense object counting. The images after segmentation are shown in Fig. 5.

Fig. 5. An example of segmentation images predicted by different models

Table 3 shows that PIS-Net has the highest Dice, Jaccard, and Precision, indicating that PIS-Net performs best in the task of dense tiny object segmentation. U-Net has slightly better Accuracy and Hausdorff distance, but the values are close to those of the proposed PIS-Net. It can be seen that the deep learning methods work better than the classical methods in this task.
After segmentation, the post-processing proposed in Sect. 3.3 is applied for noise removal. Then the region search method based on the 8-neighborhood is applied for dense tiny object counting. The counting results of several methods are shown in Table 4.
The counting accuracy of PIS-Net is the best, more than 5% higher than that of U-Net. The counting accuracy of SegNet, Attention U-Net, Trans U-Net, and Swin U-Net would be less than zero without post-processing, which is caused by the huge number of false positive pixels.

4.3 Repeatability Tests


Five extra experiments are done for repeatability tests to prove the stable and
accurate segmentation performance of PIS-Net. The result is shown in Table 5.

Table 3. The average segmentation evaluation indices of the predicted images (Accuracy, Dice, Jaccard and Precision in [%]; Hausdorff distance in pixels per image)

Methods Accuracy Dice Jaccard Precision Hausdorff distance


PIS-Net 97.42 95.75 91.89 95.68 4.72
SegNet 94.69 90.34 84.02 88.50 6.36
U-Net 97.47 95.71 91.84 95.62 4.67
Attention U-Net 96.62 93.36 88.96 92.67 5.12
Trans U-Net 96.84 93.60 88.99 93.25 5.07
Swin U-Net 96.47 92.99 88.32 92.43 5.31
Hough 82.12 61.12 44.74 88.26 9.25
Otsu 84.23 65.71 49.90 87.66 8.92
Watershed 78.67 50.15 34.88 78.61 9.69

Table 4. The average counting accuracy of predicted images (in %)

Methods Counting accuracy Methods Counting accuracy


PIS-Net 96.88 SegNet 68.82
U-Net 91.33 Attention U-Net 84.34
Trans U-Net 91.32 Swin U-Net 91.95
Hough 73.66 Otsu 74.34
Watershed 63.34

From Table 5, it can be seen that all evaluation indices of the repeated PIS-Net runs are close to each other, which shows satisfactory and stable counting performance for dense tiny object counting tasks.

Table 5. The evaluation indices of the repeatability tests (Accuracy, Dice, Jaccard, Precision and Counting Accuracy in [%]; Hausdorff distance in pixels per image)

Methods Accuracy Dice Jaccard Precision Counting accuracy Hausdorff distance


PIS-Net 97.42 95.75 91.89 95.68 96.88 4.72
PIS-Net (Re 1) 97.51 95.79 91.97 95.91 95.26 4.59
PIS-Net (Re 2) 97.33 95.59 91.62 95.70 96.25 4.73
PIS-Net (Re 3) 97.54 95.91 92.18 96.21 96.82 4.60
PIS-Net (Re 4) 97.37 95.64 91.70 95.70 95.51 4.75
PIS-Net (Re 5) 97.43 95.66 91.73 92.24 96.26 4.64
316 J. Zhang et al.

4.4 Computational Time


The training time, mean training time, test time, and mean test time are listed in Table 6. There are 244 images in the training dataset and 82 images in the test dataset. The mean training time of the PIS-Net model is approximately 1.8 s higher than that of Swin U-Net, and the mean test time is about 0.28 s higher than that of Swin U-Net. The memory cost of PIS-Net is about 36 MB, which is about 6 MB less than that of the Swin U-Net model; meanwhile, PIS-Net has better counting performance, with the counting accuracy increased by about 6%. Therefore, PIS-Net has satisfactory counting performance and tolerable computational time, and can be widely applied in the task of accurate dense tiny object counting.

Table 6. The summary of computational time (in seconds)

Model Training time Mean training time Test time Mean test time
PIS-Net 1750.80 7.18 76.23 0.93
U-Net 1033.00 4.23 42.94 0.52
Swin-UNet 1314.06 5.39 53.25 0.65
Att-UNet 1163.93 4.77 49.44 0.60

5 Conclusion and Future Work


In this paper, a CNN model, PIS-Net, is designed for the task of dense tiny object (yeast) counting. PIS-Net is an end-to-end model based on an encoder-decoder architecture; all down-sampling operations are based on pixel interval sampling without max-pooling, which covers more detailed regions and decreases the accuracy loss of down-sampling in dense tiny object counting tasks. The evaluation indices of PIS-Net, including Accuracy, Dice, Jaccard, Precision, Counting Accuracy, and Hausdorff distance, are 97.42%, 95.75%, 91.89%, 95.68%, 96.88%, and 4.7204, respectively. Compared with U-Net, the Dice, Jaccard, Precision, and Counting Accuracy are improved by 0.04%, 0.05%, 0.06%, and 5.55%, respectively, which shows that PIS-Net has excellent segmentation and counting performance in dense tiny object counting tasks. However, the Accuracy is decreased by 0.05% and the Hausdorff distance is increased by 0.0538, which shows there is still some room for improvement.
In the future, we plan to apply pixel interval sampling for down-sampling in other models to reduce the accuracy loss caused by max-pooling. On the other hand, we will improve the architecture of PIS-Net for better segmentation and counting performance. Model pruning [25] can also be applied to control the memory cost of PIS-Net, making it more stable to use.

Acknowledgement. This work is supported by “National Natural Science Foundation


of China” (No. 61806047).

References
1. Ates, H., Gerek, O.: An image-processing based automated bacteria colony counter.
In: Proceedings of ISCIS 2009, pp. 18–23 (2009)
2. Austerjost, J., Marquard, D., Raddatz, L., et al.: A smart device application for the
automated determination of E. coli colonies on agar plates. Eng. Life Sci. 17(8),
959–966 (2017)
3. Badrinarayanan, V., Kendall, A., Cipolla, R.: Segnet: a deep convolutional encoder-
decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach.
Intell. 39(12), 2481–2495 (2017)
4. Barbedo, J.: An algorithm for counting microorganisms in digital images. IEEE
Lat. Am. Trans. 11(6), 1353–1358 (2013)
5. Barber, P., Vojnovic, B., Kelly, J., et al.: An automated colony counter utilising a
compact Hough transform. Proc. MIUA 2000, 41–44 (2000)
6. Blackburn, N., Hagström, Å., Wikner, J., et al.: Rapid determination of bacterial
abundance, biovolume, morphology, and growth by neural network-based image
analysis. Appl. Environ. Microbiol. 64(9), 3246–3255 (1998)
7. Boss, R., Thangavel, K., Daniel, D.: Automatic mammogram image breast region
extraction and removal of pectoral muscle. arXiv: 1307.7474 (2013)
8. Cao, H., Wang, Y., Chen, J., et al.: Swin-unet: unet-like pure transformer for
medical image segmentation. arXiv: 2105.05537 (2021)
9. Chen, J., Lu, Y., Yu, Q., et al.: Transunet: Transformers make strong encoders for
medical image segmentation. arXiv: 2102.04306 (2021)
10. Clarke, M., Burton, R., Hill, A., et al.: Low-cost, high-throughput, automated
counting of bacterial colonies. Cytometry Part A 77(8), 790–797 (2010)
11. Dietler, N., Minder, M., Gligorovski, V., et al.: A convolutional neural network
segments yeast microscopy images with high accuracy. Nature Commun. 11(1),
1–8 (2020)
12. Ferrari, A., Lombardi, S., Signoroni, A.: Bacterial colony counting by convolutional
neural networks. In: Proceedings of EMBC 2015, pp. 7458–7461 (2015)
13. Hong, M., Yujie, W., Caihong, W., et al.: Study on heterotrophic bacteria colony
counting based on image processing method. Control Instrum. Chem. Ind. 35(3),
38–41 (2008)
14. Jiawei, Z., Chen, L., Rahaman, M., et al.: A comprehensive review of image anal-
ysis methods for microorganism counting: from classical image processing to deep
learning approaches. Artif. Intell. Rev. 55, 2875–2944 (2021)
15. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. arXiv: 1412.6980
(2014)
16. Kosov, S., Shirahama, K., Li, C., et al.: Environmental microorganism classification
using conditional random fields and deep convolutional neural networks. Pattern
Recogn. 77, 248–261 (2018)
17. Kulwa, F., Li, C., Zhao, X., et al.: A state-of-the-art survey for microorganism
image segmentation methods and future potential. IEEE Access 7, 100243–100269
(2019)
18. Kulwa, F., Li, C., Zhang, J., et al.: A new pairwise deep learning feature for
environmental microorganism image analysis. Environ. Sci. Pollut. Res., online
first (2022)
19. Li, C., Wang, K., Xu, N.: A survey for the applications of content-based microscopic
image analysis in microorganism classification domains. Artif. Intell. Rev. 51(4),
577–646 (2017). https://doi.org/10.1007/s10462-017-9572-4

20. Li, C., Zhang, J., Kulwa, F., Qi, S., Qi, Z.: A SARS-CoV-2 microscopic image
dataset with ground truth images and visual features. In: Peng, Y., et al. (eds.)
PRCV 2020. LNCS, vol. 12305, pp. 244–255. Springer, Cham (2020). https://doi.
org/10.1007/978-3-030-60633-6_20
21. Oktay, O., Schlemper, J.F., et al.: Attention u-net: Learning where to look for the
pancreas. arXiv: 1804.03999 (2018)
22. Rahaman, M., Li, C., Yao, Y., et al.: Identification of COVID-19 samples from chest
X-Ray images using deep learning: a comparison of transfer learning approaches.
J. X-ray Sci. Technol. 28(5), 821–839 (2020)
23. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomed-
ical image segmentation. In: Proceedings of MICCAI 2015, pp. 234–241 (2015)
24. Selinummi, J., Seppälä, J., Yli-Harja, O., et al.: Software for quantification of
labeled bacteria from digital microscope images by automated image analysis.
Biotechniques 39(6), 859–863 (2005)
25. Tang, Y., Ji, J., Gao, S., et al.: A pruning neural network model in credit clas-
sification analysis. Comput. Intell. Neurosci. 2018, 22 (2018). Article ID 9390410
26. Xu, H., Li, C., Rahaman, M.M., et al.: An enhanced framework of generative adver-
sarial networks (EF-GANs) for environmental microorganism image augmentation
with limited rotation-invariant training data. IEEE Access 8(1), 187455–187469
(2020)
27. Yamaguchi, N., Ichijo, T., Ogawa, M., et al.: Multicolor excitation direct counting
of bacteria by fluorescence microscopy with the automated digital image analysis
software BACS II. Bioimages 12(1), 1–7 (2004)
28. Yoon, S., Lawrence, K., Park, B.: Automatic counting and classification of bacterial
colonies using hyperspectral imaging. Food Bioprocess Technol. 8(10), 2047–2065
(2015)
29. Yoshizawa, K.: Treatment of waste-water discharged from sake brewery using yeast.
J. Ferment Technol. 56, 389–395 (1978)
30. You, L., Zhao, D., Zhou, R., et al.: Distribution and function of dominant yeast
species in the fermentation of strong-flavor baijiu. World J. Microbiol. Biotechnol.
37(2), 1–12 (2021)
31. Zeiler, M., Krishnan, D., Taylor, G., et al.: Deconvolutional networks. In: Proceed-
ings of CVPR 2010, pp. 2528–2535 (2010)
32. Zeiler, M., Taylor, G., Fergus, R.: Adaptive deconvolutional networks for mid and
high level feature learning. In: Proceedings of ICCV 2011, pp. 2018–2025 (2011)
33. Zhang, C., Chen, W., Liu, W., et al.: An automated bacterial colony counting
system. In: Proceedings of SUTC 2008, pp. 233–240 (2008)
34. Zhang, H., Jian, L.: Current microbial techniques for biodegradation of wastewater
with high lipid concentrations. Tech. Equipment Environ. Pollut. Control 3, 28–32
(2004)
35. Zhang, J., Li, C., Kosov, S., et al.: LCU-net: a novel low-cost U-net for environ-
mental microorganism image segmentation. Pattern Recogn. 115, 107885 (2021)
36. Zhang, J., Li, C., Kulwa, F., et al.: A multi-scale CNN-CRF framework for environ-
mental microorganism image segmentation. BioMed Res. Int. 2020, 1–27 (2020)
37. Zhang, R., Zhao, S., Jin, Z., et al.: Application of SVM in the food bacteria image
recognition and count. In: Proceedings of ICISP 2010, vol. 4, pp. 1819–1823 (2010)
38. Zhao, P., Li, C., Rahaman, M.M., et al.: Comparative study of deep learning classi-
fication methods on a small environmental microorganism image dataset (EMDS-
6): from convolutional neural networks to visual transformers. Front. Microbiol.
13, 792166 (2022). https://doi.org/10.3389/fmicb.2022.792166
Biomedical Engineering
and Physiotherapy, Joint Activities,
Common Goals
Analysis of Expert Agreement
on Determining the Duration of Writhing
Movements in Infants to Develop
an Algorithm in OSESEC

Dominika Latos1, Daniel Ledwoń2(B), Marta Danch-Wierzchowska2,
Iwona Doroniewicz1, Alicja Affanasowicz1, Katarzyna Kieszczyńska1,
Małgorzata Matyja1, and Andrzej Myśliwiec1

1 Institute of Physiotherapy and Health Science, Academy of Physical Education
in Katowice, ul. Mikołowska 72A, 40-065 Katowice, Poland
{d.latos,i.doroniewicz,a.affanasowicz,m.matyja,a.mysliwiec}@awf.katowice.pl
2 Faculty of Biomedical Engineering, Silesian University of Technology,
ul. Roosevelta 40, 41-800 Zabrze, Poland
{daniel.ledwon,marta.danch-wierzchowska}@polsl.pl

Abstract. Neurodevelopmental assessment aims to determine the quality of
motor patterns in newborns and infants at an early stage. Prechtl's general
movement assessment method is considered the most effective in identifying the
quality of infants' motor activity in the first 5 months of life. Assessment
according to Prechtl is not easy, as it requires very good knowledge and
experience from diagnosticians. The purpose of this study was to evaluate the
agreement of five experts on the assessment of normal writhing movements
(WMs), as described by Prechtl, in infants on days 2 and 3 of life. The
assessments were made using 5 recordings of infant activity randomly selected
from 36 videos. In the authors' opinion, the differences observed between
experts in the indicated sections of WMs were mainly due to discrepancies in
indicating the beginning and end of the movement. The obtained results show
the necessity of considering the observations of many experts in ground truth
data for developing and evaluating computer-aided neurodevelopmental diagnosis
of infants.

Keywords: General movement · Writhing movements · Infant

1 Introduction

Preterm delivery significantly increases the risk of abnormal infant development,


whereas perinatal complications can affect neurological development [15]. There-
fore, there is a need to search for tools that support the diagnostic procedure,
both from the specialist and screening standpoints. Furthermore, the use of mod-
ern information technologies, such as those associated with social media, makes

it possible to draw the infant’s parents’ attention to the presence of features that
should be consulted with an expert. Such management facilitates early diagno-
sis of neurological disorders in infants, which in turn allows for prompt medical
intervention early in the child’s development. Despite the development of modern
advanced diagnostic techniques, it is primarily the feedback obtained through
observation and clinical evaluation that is particularly important.
Early assessment of infant motor development has been the subject of study
by many researchers. Many publications describe testing procedures based on
diagnostic scales used in the early assessment of motor activity. The scales can
be divided into subjective and objective. The former indicate that the reliability
of assessing the motor development of a child depends on the experience of
the person who performs the examination, including work experience, regular
further training in pediatric physiotherapy, and individual observation
skills [17,23]. The most recognized and widely
used methods of the subjective assessment of motor development in infants are:
Dubowitz Scale which identifies developmental disorders of the infant [10]; Test
of Infant Motor Scale (TIMP) which evaluates posture and psychomotor skills
of the infant [25]; Neonatal Behavioral Assessment Scale (NBAS) which assesses
infant behavior [26]; General Movement Assessment (GMA) – an assessment of
general movement patterns according to Prechtl [24]; Harris Infant Neuromotor
Test (HINT) which identifies neurodevelopmental, cognitive, and behavioral dis-
orders [16]; Alberta Infant Motor Scale (AIMS) that assesses the development of
motor functions from birth to independent locomotion [1]; and the Munich Func-
tional Developmental Diagnosis (MFDD) assessing functional skills in manual
dexterity, perception, active speech, speech comprehension, child independence,
and postural reactivity. Of the subjective methods, the general move-
ments assessment has the highest reliability [3,4,27]. This method is described as
the most reliable for the prediction of cerebral palsy [22,27]. It is now a widely
used diagnostic tool in the neurological assessment of newborns and infants.
The method is considered extremely important for the overall assessment of the
integrity of the central nervous system [26].
Infants exhibit a spontaneous movement pattern called general movements
(GMs) [5]. GMs begin as fetal movements, which appear between 9 and
12 weeks of fetal life and last until 40 weeks of gestation. From the time of
delivery until 6 to 9 weeks of age, these movements are also called writhing
movements (WMs). They are performed with low velocity and amplitude and are
characterized by high complexity and high variability with respect to amplitude,
velocity, and acceleration [22]. After this period, writhing movements disappear
and are replaced by fidgety movements (FMs), which are present until about 6
months of age. One
of the abnormal patterns of general movements is a poor repertoire of movements
(PR). In this case, the movement pattern is simple and monotonous. Movements
of different body parts do not occur in a complex manner. They are character-
ized by low amplitude, speed, and intensity. PR occurs frequently in infants and
therefore the predictive value is described in the literature as very low [12]. In
their study on 10-day-old infants, De Vries & Bos presented the conclusion that

if at least one normal WM is observed in the infant, the likelihood of normal
development is high (94%) [6].
Objective methods are a small group and are relatively rarely used in prac-
tice. These include biological age assessment, EMG, ECG, and Podo Baby, a
device to assess the degree of infantile postural asymmetry. In recent years,
researchers have been exploring the possibilities of computer-aided early diag-
nosis through automated analysis of spontaneous infant activity. Some studies
have focused mainly on the assessment of movements to predict potential cere-
bral palsy and the development of various models for the assessment of move-
ments [19,20]. To acquire information about movements, researchers use optical
systems and various sensors placed on the infant’s body, such as accelerometric
[18] or electromagnetic sensors [21]. How sensors or markers used on the infant’s
body affect the infant’s spontaneous movements is debatable. When a marker-
less optical system based on a video camera or RGB-D systems is used, image
assessment provides opportunities to develop a simple, non-invasive, and, most
importantly, reproducible tool for the assessment of movement quality.
Studies have attempted to develop objective indicators for neurodevelopmen-
tal assessment of infants based on computer analysis of video recordings as part
of the OSESEC project (Objective System of Evaluation and Support in Early
Childhood) [7–9]. A prerequisite for the development and verification of an auto-
mated computer-aided diagnostic tool is the acquisition of data with an appro-
priate expert description. In the case of GMA, the diagnostic outcome is the
classification of the observed general movements into one of the subcategories
of normal or abnormal movements. The classification used refers to a holistic
observation of an infant over several minutes. However, it results from the char-
acteristic features of the movements observed by an expert at specific instants.
In the classical approach, the expert does not break down the entire recording
into individual interesting sections that confirm the final diagnosis. Due to the
variable nature of infants’ spontaneous activity, obtaining an expert’s opinion on
specific parts of the movement and their effect on the final diagnostic decision
seems necessary to create a reliable automated diagnostic tool that replaces the
expert’s work.
The purpose of this study was to determine the agreement of five experi-
enced experts on the diagnosis of infants on day 2 or 3 of life using GMA. To
develop a methodology to obtain reliable data for the development and valida-
tion of a computer-aided neonatal diagnosis tool, experts attempted to identify
sections indicating the presence of normal general movements for 5 cases classi-
fied as normal writhing movements. The data obtained were used to determine
expert agreement on the accurate identification of sections of normal writhing
movements.

2 Material and Methods


The research was positively evaluated by the Bioethics Committee for Sci-
entific Research at the Jerzy Kukuczka Academy of Physical Education in

Katowice (Resolution No. 5/2018 of 19 April 2018). All patients and their par-
ents/guardians gave written informed consent to participate in the study. The
research was carried out in the Piekary Medical Centre in the neonatal unit in
Piekary Śląskie, Poland. The equipment used in the tests did not pose a threat
of radiation or exposure to other energy that could in any way affect the safety
of the observed infants.

2.1 Research Population

The study population consisted of 125 full-term (38–42 weeks) infants on days 2
and 3 of life, born by physiologic delivery, with positive perinatal history. The
motor activity of the subjects was recorded with a video camera. From this group,
after taking into account the inclusion and exclusion criteria, such as infant’s
crying or sleeping and caregiver’s intervention aimed at calming the infant by
giving a pacifier, rocking, stroking, or feeding, 36 recordings were selected for
further analysis, with five videos (N1–N5) chosen for the main part of the study
(Fig. 1). Table 1 shows the specific information about infants recorded on selected
videos.

Fig. 1. Flow diagram showing the criteria for the qualification of infants for the test

Table 1. Specific information about subjects selected from the research group

Infant ID  Sex     Day of observation  Age (weeks)  Weight (g)  Apgar
N1         Female  2                   39           3250        10
N2         Male    3                   40           4080        10
N3         Male    2                   41           3330        10
N4         Female  3                   38           3560        10
N5         Male    3                   40           3720        10

2.2 Methodology
Observations of the infants were made after their spontaneous awakening and
after feeding. Recording of the infant's movement continued as long as the infant
was active and did not cry. Videos showing moments of the infant sleeping or
crying were removed. The hospital crib with the infant was placed on the
video recording station taking into account the appropriate direction: the side
of the infant’s head on the video recording device had a blue marking labeled
‘UP’, while the side of the infant’s legs – a pink marking labeled ‘DOWN’. The
infant was recorded while lying in a supine position, undressed, and in a diaper.
The temperature, lighting, and noise level in the test room were consistent with
the current regulations of the neonatal ward. The camera was placed 1 m above
the infant’s navel. Captured by a full HD (1920 × 1080) resolution camera, the
recording lasted 10 to 17 min [9].
Videos where the infant was crying or sleeping for an extended period were
removed. Videos of 36 infants, evaluated by five independent physiotherapists
(experts) with years of experience working with patients of developmental age,
were used for further analysis. The neonatal activity was assessed using
GMA. As a general movement assessment system, GMA is defined as a stan-
dard, non-invasive, convenient, inexpensive, and reliable research method [2,11].
A standard GMA is a process of a 3–5 min video observation of a supine, com-
fortably dressed, awake, and calm infant. The recorded movements of the infant
are evaluated according to its age (chronological or corrected).
The experts’ task was to identify videos with three complete WM sequences
in infants and videos of those with poor repertoire (PR). For the purposes of
the study, it was assumed that an infant would be identified as normal (N) if at
least four of the five experts classified the infant in the same way. The results
of the observations were documented in the form of a list of the following data
for each of the 36 recordings: sex and age of the infant (24 h of the observed
infant’s life), duration of the recording, determination of N or PR by each of the
five experts. In a further qualification step, five recordings of infants identified
as normal (N) were randomly selected for evaluation. Then, from each of the five
recordings, the sections in which the five experts indicated writhing movements
(WM), other movements (OM), and in which the infants made no movement
(NM) were selected.

The study used 5 videos of infants whose movements were unambiguously identified
by the five experts in general movement assessment as writhing movements. Next,
the percentage of WMs was evaluated by dividing the total time of occurrence
of WMs in a given video by the entire video length. Expert agreement was
determined by dividing the length of the sections marked as WMs by all experts
(the common part) by the length of the sections indicated by at least one expert
(the union of all experts' indications).
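For illustration, both quantities can be computed from per-frame annotations as
in the following minimal sketch; representing each expert's indications as a
boolean frame mask is our assumption, not part of the study protocol.

import numpy as np

def wm_percentage(mask: np.ndarray) -> float:
    # Share of the recording marked as WM by a single expert;
    # mask is a boolean vector with one entry per video frame.
    return 100.0 * mask.sum() / mask.size

def expert_agreement(masks: np.ndarray) -> float:
    # Intersection-over-union agreement; masks has shape
    # (n_experts, n_frames), True where a frame was marked as WM.
    intersection = masks.all(axis=0).sum()  # frames marked by all experts
    union = masks.any(axis=0).sum()         # frames marked by at least one
    return intersection / union if union else 0.0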

2.3 Experts
The experiments were conducted by highly qualified physiotherapists who were
certified experts in GMA chosen for the purposes of the study:
– Expert 1 (E1): physiotherapist with 45 years of experience in working with
infants and young children, psychologist, sensorimotor integration, and NDT
Bobath therapist.
– Expert 2 (E2): physiotherapy specialist with 20 years of experience in working
with infants and young children, NDT Bobath therapist for children.
– Expert 3 (E3): physiotherapist, NDT Bobath therapist for children with
9 years of experience in working with infants and young children.
– Expert 4 (E4): physiotherapist, NDT Bobath therapist for children with
14 years of experience in working with infants and young children.
– Expert 5 (E5): physiotherapist, NDT-Bobath therapist for children with
17 years of experience in working with infants and young children.

3 Results
Five experts received five video recordings of the infants studied. Each expert's
task was to analyze each video recording and identify writhing movements in
individual infants (Fig. 2). The observations made by the researchers allowed
for:
1. Evaluation of the movement on the timeline in percentage terms,
2. Determination of the number of time units of WMs,
3. Determination of the agreement among the five experts' assessments made at
the same time points.
Table 2 shows the duration of WMs observed by experts. The percentage
of WMs observed by each expert in each recording and the number of WM
periods/sections recorded by the experts at their indicated moments of infants’
activity are also presented.
The greatest expert agreement on the observation of WMs (9 common sec-
tions) was observed in the examination of infant N4 (Fig. 3). The agreement
recorded during the observation of this infant lasted 3 min. During the examina-
tions of infants N2, N3, and N5, the experts agreed on 5 common sections, whereas
in N1 they agreed on 4 common sections. The lowest percentage agreement was
observed in the examinations of N1 and N3, with 4 and 5 common sections,
respectively (Table 3).

Fig. 2. Results of observation of infants by experts

Most experts observed the largest number of WMs in recording N4. Further-
more, the experts identified the smallest number of WMs in the video of infant N5,
despite similar section lengths (except for E1, who included small breaks
between WMs). In this case, there was also a large variation in agreement, which
may be due to considerable difficulty in determining the beginning and end of
a WM. The most (and the longest) WM sections were indicated by E2, signifi-
cantly deviating from the observations of the other experts, while the fewest were
found by E4, who, compared to the other experts, also indicated a relatively small
percentage of movements as WMs. Similarly, E5 marked a small amount of WM
compared to the other experts.

4 Discussion
In the present study, discrepancies were observed in the assessment of videos by
experts. The discrepancies were caused by the fact that discretion was allowed
in assessing when WM started and when it ended. However, it is worth noting
that the above-mentioned definitions cannot be found in the extensive litera-
ture on the subject. Writhing movements are known to occur in the fetus from

Table 2. Parameters of the observation of writhing movements by individual experts
in each of the analyzed videos

Recording ID (duration)  Metric              E1     E2     E3     E4     E5
N1 (10:43)               WM time             03:52  08:31  02:50  02:41  03:21
                         % WM                36.08  79.47  26.44  25.04  31.26
                         Number of sections  7      5      4      4      7
N2 (17:10)               WM time             02:47  02:42  03:32  02:26  02:10
                         % WM                16.21  15.73  20.58  14.17  12.62
                         Number of sections  6      7      8      6      8
N3 (12:09)               WM time             04:24  09:24  05:26  04:39  04:56
                         % WM                36.21  77.37  44.72  38.27  40.60
                         Number of sections  12     6      5      5      9
N4 (14:36)               WM time             10:00  09:26  09:42  09:30  09:03
                         % WM                68.49  64.61  66.44  65.07  61.99
                         Number of sections  6      7      6      4      7
N5 (12:08)               WM time             05:51  09:13  05:58  02:18  03:57
                         % WM                48.21  75.96  49.18  18.96  32.55
                         Number of sections  12     3      5      3      4

Fig. 3. Comparison of writhing movement sections marked by at least one of the experts
to the ranges of writhing movements observed in full agreement by all experts

Table 3. Percentage agreement of experts on the assessment of individual infants,
determined as the quotient of the length of the common range determined by all
experts and the union of the ranges indicated by all experts

            N1      N2      N3      N4      N5
Agreement   18.89%  26.58%  17.92%  80.98%  19.58%

40 weeks of gestation and continue until approximately 2 months of age. They
are characterized by small to moderate movement amplitude and slow to moderate
speed. Rapid and extensive extension movements may occasionally occur
in the limbs, especially in the upper limbs. Typically, such movements are in
the form of an ellipse, which creates an impression of the writhing nature of the
movement [13,14]. In healthy full-term infants at the age of 6 to 9 weeks, gen-
eral movement patterns change from writhing movements to fidgety movements.
These are circular movements of the neck, body trunk, and other parts of the
body, performed at an average speed, with variable acceleration, and in different
directions [18].
In order to identify the infant’s activities as normal, a diagnostician should
isolate the WM patterns. All experts agreed on the occurrence of most WMs.
However, there were differences in terms of taking into account both the begin-
ning and the end of WMs on the timeline for the analyzed video recordings.
The discrepancy is shown in Fig. 3. Experts agree that defining clear criteria for
establishing the aforementioned points will increase agreement among them in
more accurate identification of WMs. However, the method does not specify how
to recognize the beginning and end of a movement. This was considered not to
impinge on the efficacy of the method. During the literature review, no studies
were found that assess the timeline agreement associated with accurately
determining both the beginning and end of WMs. Recognizing patterns of spon-
taneous movements alone is very challenging and requires intensive training from
experts. Given these limitations, computer-aided analysis of general movements
would be very helpful. Several research groups have attempted to overcome the
subjectivity of GMA over the past decade by trying to analyze general movements
automatically; an algorithm that, along with the researchers' observations, could
monitor the infant's development and possible progress in therapy would be very
helpful. The idea of the present study is
In the study, the experts focused on assessing the smooth and elegant WMs,
the degree of their variety and complexity, and the rotation that is observed
along the long axis of the limb. It should be noted that all the experts were
highly experienced in neurodevelopmental diagnosis and therapy of infants. High
agreement occurred when they identified infants with N and PR in a group of 36
infants. Infants with at least three complete sequences of WMs were identified
as normal, whereas those performing incomplete WMs, lacking fluency, elegance,
and complexity were identified as PR (poor repertoire) infants. In their study

on 10-day-old infants, De Vries & Bos presented the conclusion that if at least
one normal WM is observed in the infant, the likelihood of normal development
is high (94%) [6]. Therefore, a detailed determination of the beginning and end
of WMs is of no value in predicting neonatal development. The diagnosticians
unanimously distinguished normal WMs in the studied group of infants and iden-
tified them as normal. The GMA does not specify when such a movement begins
or ends. In developing computer-aided diagnosis tools for neurodevelopmental
disorders, it is important to consider the discrepancies that exist in the obser-
vations made by the experts indicated in this study. The results are the basis
for determining the effectiveness of the developed methods in the automated
identification of writhing movements.

References
1. Almeida, K.M., Dutra, M.V.P., Mello, R.R.D., Reis, A.B.R., Martins, P.S.: Concur-
rent validity and reliability of the Alberta Infant Motor Scale in premature infants.
J. de Pediatria 84, 442–448 (2008)
2. Bosanquet, M., Copeland, L., Ware, R., Boyd, R.: A systematic review of tests to
predict cerebral palsy in young children. Dev. Med. Child Neurol. 55(5), 418–426
(2013)
3. Cioni, G., Ferrari, F., Einspieler, C., Paolicelli, P.B., Barbani, T., Prechtl, H.F.:
Comparison between observation of spontaneous movements and neurologic exam-
ination in preterm infants. J. Pediatrics 130(5), 704–711 (1997)
4. Cioni, G., Prechtl, H.F., Ferrari, F., Paolicelli, P.B., Einspieler, C., Roversi, M.F.:
Which better predicts later outcome in fullterm infants: quality of general move-
ments or neurological examination? Early Hum. Dev. 50(1), 71–85 (1997)
5. De Vries, J.I., Visser, G.H., Prechtl, H.F.: The emergence of fetal behaviour. I.
Qualitative aspects. Early Hum. Dev. 7(4), 301–322 (1982)
6. De Vries, N., Bos, A.: The quality of general movements in the first ten days of life
in preterm infants. Early Hum. Dev. 86(4), 225–229 (2010)
7. Doroniewicz, I., et al.: Computer-based analysis of spontaneous infant activity: a
pilot study. In: Pietka, E., Badura, P., Kawa, J., Wieclawek, W. (eds.) Information
Technology in Biomedicine. AISC, vol. 1186, pp. 147–159. Springer, Cham (2021).
https://doi.org/10.1007/978-3-030-49666-1_12
8. Doroniewicz, I., et al.: Temporal and spatial variability of the fidgety movement
descriptors and their relation to head position in automized general movement
assessment. Acta Bioeng. Biomech. 23(3), 69–78 (2021)
9. Doroniewicz, I., et al.: Writhing movement detection in newborns on the second and
third day of life using pose-based feature machine learning classification. Sensors
20(21), 5986 (2020)
10. Dubowitz, L., Ricci, D., Mercuri, E.: The Dubowitz neurological examination of
the full-term newborn. Mental Retard. Dev. Disabil. Res. Rev. 11(1), 52–60 (2005)
11. Einspieler, C., Bos, A.F., Libertus, M.E., Marschik, P.B.: The general movement
assessment helps us to identify preterm infants at risk for cognitive dysfunction.
Front. Psychol. 7, 406 (2016)
12. Einspieler, C., Prechtl, H.F.: Prechtl’s assessment of general movements: a diagnos-
tic tool for the functional assessment of the young nervous system. Mental Retard.
Dev. Disabil. Res. Rev. 11(1), 61–67 (2005)

13. Einspieler, C., Prechtl, H.F., Ferrari, F., Cioni, G., Bos, A.F.: The qualitative
assessment of general movements in preterm, term and young infants-review of the
methodology. Early Hum. Dev. 50(1), 47–60 (1997)
14. Fjørtoft, T., Einspieler, C., Adde, L., Strand, L.I.: Inter-observer reliability of the
"Assessment of Motor Repertoire – 3 to 5 Months" based on video recordings of
infants. Early Hum. Dev. 85(5), 297–302 (2009)
15. Harris, S.R., Daniels, L.E.: Reliability and validity of the harris infant neuromotor
test. J. Pediatr. 139(2), 249–253 (2001)
16. Heineman, K.R., Hadders-Algra, M.: Evaluation of neuromotor function in infancy-
a systematic review of available methods. J. Dev. Behav. Pediatr. 29(4), 315–323
(2008)
17. Heinze, F., Hesels, K., Breitbach-Faller, N., Schmitz-Rode, T., Disselhorst-Klug,
C.: Movement analysis by accelerometry of newborns and infants for the early
detection of movement disorders due to infantile cerebral palsy. Med. Biol. Eng.
Comput. 48(8), 765–772 (2010)
18. Hopkins, B., Prechtl, H.R.: A qualitative approach to the development of move-
ments during early infancy. Clin. Dev. Med. 94, 179–197 (1984)
19. Ihlen, E.A., et al.: Machine learning of infant spontaneous movements for the early
prediction of cerebral palsy: a multi-site cohort study. J. Clin. Med. 9(1), 5 (2020)
20. Karch, D., et al.: Kinematic assessment of stereotypy in spontaneous movements
in infants. Gait Posture 36(2), 307–311 (2012)
21. Marcroft, C., Khan, A., Embleton, N.D., Trenell, M., Plötz, T.: Movement recog-
nition technology as a method of assessing spontaneous general movements in high
risk infants. Front. Neurol. 5, 284 (2015)
22. Noble, Y., Boyd, R.: Neonatal assessments for the preterm infant up to 4 months
corrected age: a systematic review. Dev. Med. Child Neurol. 54(2), 129–139 (2012)
23. Novak, I., et al.: Early, accurate diagnosis and early intervention in cerebral palsy:
advances in diagnosis and treatment. JAMA Pediatr. 171(9), 897–907 (2017)
24. Nuysink, J., et al.: Prediction of gross motor development and independent walking
in infants born very preterm using the Test of Infant Motor Performance and the
Alberta Infant Motor Scale. Early Hum. Dev. 89(9), 693–697 (2013)
25. Ploegstra, W.M., Bos, A.F., de Vries, N.K.: General movements in healthy full term
infants during the first week after birth. Early Hum. Dev. 90(1), 55–60 (2014)
26. Prechtl, H.F.: General movement assessment as a method of developmental neu-
rology: new paradigms and their consequences. The 1999 Ronnie MacKeith Lecture.
Dev. Med. Child Neurol. 43(12), 836–842 (2001)
27. Stewart, P., Reihman, J., Lonky, E., Darvill, T., Pagano, J.: Prenatal PCB expo-
sure and neonatal behavioral assessment scale (NBAS) performance. Neurotoxicol.
Teratol. 22(1), 21–29 (2000)
Comparative Analysis of Selected Methods
of Identifying the Newborn’s Skeletal
Model

Adam Mrozek1(B), Marta Danch-Wierzchowska2, Daniel Ledwoń2,
Dariusz Badura3, Iwona Doroniewicz4, Monika N. Bugdol2,
Małgorzata Matyja4, and Andrzej Myśliwiec4

1 Faculty of Exact and Technical Sciences, University of Silesia in Katowice,
Bankowa 12, 40-007 Katowice, Poland
m_adam007@o2.pl
2 Faculty of Biomedical Engineering, Silesian University of Technology,
ul. Roosevelta 40, 41-800 Zabrze, Poland
{marta.danch-wierzchowska,daniel.ledwon,monika.bugdol}@polsl.pl
3 Faculty of Applied Sciences, University of Dąbrowa Górnicza,
ul. Zygmunta Cieplaka 1c, 41-300 Dąbrowa Górnicza, Poland
4 Institute of Physiotherapy and Health Science, Academy of Physical Education
in Katowice, ul. Mikołowska 72A, 40-065 Katowice, Poland
{i.doroniewicz,a.mysliwiec}@awf.katowice.pl

Abstract. Determining and tracking the location of specific points of real
objects in space is not an easy task. Nowadays, this task is performed by deep
artificial neural networks. Various methods and techniques have been competing
with each other in recent years, and most of them perform effectively and with
satisfactory results. The success of such solutions is associated with long-term
learning and a large amount of training material. This article aims to answer
the question of whether and how the selected tool affects the obtained results.
Several approaches to solving the problem using different technologies are
presented. The material was verified on a selected group of images of children
in the first weeks of life.

Keywords: Computational geometry · Computer vision · Infant model · Pose estimation · Activity detection

1 Introduction

The problem discussed in this article is a challenge for many automatic behavior
analysis systems. The way in which a machine sees and interprets the image
obtained from the environment does not always correspond to the way in which
a human (or another living organism) perceives it. This study deals with adapting
generally available pose estimation systems to the assessment of the newborn's
posture. The research was inspired by problems encountered while

working on the OSESEC (Objective System of Evaluation and Support of Early
Childhood) project, implemented in cooperation with biomedical engineers and
physiotherapists involved in the neuromotor development of infants. The program
assumes the construction of a system supporting the assessment of the motor
functions of newborns. Currently, most methods of assessing the quality of new-
borns' motor functions are based on observation of the newborn by the examiner
and largely depend on the examiner's experience and substantive preparation.
This type of test does not allow for an accurate and repeatable assessment. There
are both subjective methods, using visual observations, and objective methods
using measuring instruments (e.g., anthropometers). The latter also include
techniques that use photographs (shading, silhouette, photometry, photogram-
metry). When examining the characteristics of the muscular and skeletal systems,
the third dimension becomes useful, as does measuring the rotation of body
segments around their axes.

There are many methods of recreating a skeleton model of a figure [3]. Some
of the more accurate ones use so-called markers (tags): elements detectable by
measuring sensors, enabling their precise identification and location in space.
Although accurate, this approach is inconvenient and therefore difficult to use in
most applications, especially at home with newborns. The second group of meth-
ods is non-contact image analysis. It allows registering the position of objects
and their changes and determining a group of selected characteristic points on
the captured image, even in real time. Since the photographic image is two-
dimensional, special techniques are used to obtain the next dimension, usually
employing additional shots taken synchronously (stereometry) or asynchronously.
The third dimension can be estimated on the basis of domain knowledge about
the structure and spatial shape of the real object, or it can be calculated from
other additional sources of information; in particular, such a source can be syn-
chronized shots from a different camera at another angle. The more shots from
diverse angles, the better the effect that can be obtained, but at the cost of
increasing computational complexity. Two-lens stereometric systems deserve
special attention here. Many methods require prior camera calibration, although
there are systems where it is not needed. It should also be mentioned that some
systems are suitable for estimating the position of fixed objects and others for
moving objects. The correct location of points in the 2D plane enables proper
calculation of the third component. The points obtained as a result of the method
are not intended to be used only for visualization, but to constitute intermediate
data for building a more complex model. In particular, they should be useful for
deriving motion parameters from multi-frame sequences. It is important that the
successively estimated positions of the selected points form a continuum.

The main goal was a comparative analysis enabling the assessment of the
suitability of selected methods for the assessment of the motor functions of
newborns in 3D. Obtaining a 3D model would allow visualizing the features of
movement dynamics necessary to support the expert assessment. The designated
preliminary task was to test the suitability of selected methods of acquiring
characteristic points on the body of a newborn, because only a properly
functioning 2D method guarantees an equally good 3D method.

The algorithms developed so far reproduce the characteristic places on the
body of adults with greater or lesser accuracy. Most of them fulfill their tasks
in applications where, due to their demonstrative nature, inaccuracies are
allowed. The preferred system should be resistant to such deviations and be able
to eliminate them or identify and separate them. The complexity of the developed
method is also not trivial. Appearing artifacts can be filtered out with specific
algorithms if they are incidental; however, this introduces further computational
overhead. Sometimes pre-processing of the images is also required. The fact that
pose estimation systems trained on universal collections of silhouettes do not
provide sufficient effectiveness may be evidenced by the development of special
video collections containing only children, especially newborns. One of them is
presented by Migliorelli and Moccia [15]. More advanced systems additionally
use the depth image; such solutions can be found in the works of Hesse et
al. [10–12]. The published datasets also contain labeled landmarks: sets ready
for training new neural network architectures or for fine-tuning pre-trained ones,
which is especially useful with a small data set [20]. A ready tool for automatic
pose estimation that can be used at home, with ordinary video recordings made
with a smartphone, is offered by Passmore et al. [17]. However, the biggest
problem is that the landmarks are independently defined by the creators of these
datasets and cannot be directly transferred to another system without tedious
learning. The measures introduced in the cited works are based on previously
known appropriate point coordinates. The research in this paper aims to discuss
the effects of the indiscriminate use of any of these methods.

2 Application of Machine Learning Methods


The three-dimensional model consists of a set of points with coordinates (x, y)
in the image plane and a depth coordinate z. In addition, the model introduces
connections between such points, which are to imitate the skeletal system. The
process of estimating a human pose can be divided into several parts:
1. Identifying the anatomical key points of the human body.
2. Connecting key points to form the skeleton structure.
3. Creating a 3D representation of the skeleton structure.
Algorithms that perform this task carry out a number of operations at the par-
ticular stages. The color image is first converted to shades of gray, reduced to
the desired size, and then subjected to a series of transformations. Finally, it is
fed to the input of the multilayer artificial neural networks that perform the
estimation. Two factors determine the estimation results: the network archi-
tecture, and the training set used together with the learning time. In general,
deep learning architectures suitable for pose estimation are based on convo-
lutional neural network (CNN) varieties. There are two overarching approaches:
bottom-up and top-down. Both types of architectures are equally applicable to
2D and 3D pose estimation [6]. In [3], the authors list 12 different concepts.
Many solutions based on different concepts compete with each other both in
terms of memory and time efficiency; the results of the rankings are available
in many publications [24,25]. Applied as a benchmark, OpenPose uses a
bottom-up approach: it first detects parts (key points) belonging to each person
in the image, and then assigns the parts to different people using part affinity
fields [1]. The opposite is true of RMPE (AlphaPose), which is a top-down
method. Popular architectures for semantic and instance segmentation and pose
estimation are Mask R-CNN and Deep High-Resolution Representation Learning
for Human Pose Estimation (HRNet, CVPR'19) [19]. Estimating the places of
occurrence of characteristic points assumes a certain probability of their
occurrence in a specific place. This estimation is based on a set of pre-tagged
images that constitute the training material for a previously built artificial
neural network. Most of the popular applications mentioned above use standard
data sets for learning, which makes it easier to compare performance as well as
to learn. Consequently, it can be expected that the result will be identical
positions of the characteristic points. Popular data sets are: the 25,000-image
MPII Human Pose dataset [9,23], the 2000-image Leeds Sports Pose Dataset
[13,22], Frames Labeled In Cinema (FLIC) [18], COCO (Microsoft Common
Objects in Context) [14], the CMU Panoptic Dataset [9] with 5 sequences (5.5 h)
and 1.5 million 3D skeletons, the VGG Human Pose Estimation Datasets [26],
YouTube Pose [4], and others. In addition, there are also data sets for teaching
three-dimensional estimation: Human3.6M [2], containing 3.6 million three-
dimensional human poses, and the HumanEva Dataset in versions I, II, and
baseline [18], containing 40 thousand frames with 3D positions. The character-
istic points calculated by the algorithms are given different names, not always
corresponding to anatomical parts of the body, e.g., in the PoseNet and MPII
formats. The diagrams (Fig. 1) show the different arrangements of the points.

Fig. 1. Location of characteristic points in models: COCO Format (a), OpenPose (b),
Pose MediaPipe format Landmark Model (BlazePose GHUM 3D) (c), Mobidev Pose
format (d)
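As an illustration of how such characteristic points are obtained in practice,
the sketch below reads the 33 BlazePose landmarks of the MediaPipe format
(Fig. 1c) with the MediaPipe Python API. This is a minimal example assuming the
legacy mp.solutions interface; the input file name is hypothetical.

import cv2
import mediapipe as mp

cap = cv2.VideoCapture("newborn.mp4")  # hypothetical recording
ok, frame = cap.read()
with mp.solutions.pose.Pose(static_image_mode=False) as pose:
    # MediaPipe expects RGB input; OpenCV delivers BGR frames.
    result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.pose_landmarks:
        for i, lm in enumerate(result.pose_landmarks.landmark):
            # x, y are normalized to [0, 1]; z is a relative depth estimate.
            print(i, lm.x, lm.y, lm.z, lm.visibility)
cap.release()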

3 Research Material and Research Methodology


As the assessment of the proposed system is related to a specific application,
films containing shots of newborns were used as the research material. The film
material is part of the collection prepared for the implementation of the OSESEC
project. The study was approved by the Bioethics Committee of Research of the
Jerzy Kukuczka Academy of Physical Education in Katowice (No. 5/2018) and
conducted in accordance with the Declaration of Helsinki. The study used a
database of 125 recordings of newborns collected in cooperation with the
neonatal ward of the St. Luke Municipal Hospital in Piekary Śląskie, Poland.
The measurement stand consisted of a 1 × 1 m tabletop and a frame with a camera
mount 1 m above the tabletop surface. The stand was equipped with a Sony
HDR-AS200V video camera, enabling recording at a spatial resolution of
1920 × 1080 px and a 60 fps sampling rate. The films, about 20 min long, show
a newborn baby lying on its back, free in its movements; the newborn is the only
figure in the shot. The recordings were made in a laboratory or home environment,
using natural daylight. This footage was used to calculate, with the OpenPose
program, the set of characteristic points that serve as the reference for the
other systems. The
research methodology consists of several stages:
1. Analysis of the video recording as a whole.
The analysis includes such components as:
– the course of variability and the frequency of occurrence of specific measure-
ment values, and the degree of confidence,
– calculation of derived values and their characteristics: the distance between
adjacent joints and its range of variability,
– examination of physical quantities: velocity and acceleration of motion (a
sketch of this computation follows the list),
– calculation of mutual relations between quantities,
– selection of the most extreme results.
2. Analysis of selected images:
– selecting frames with extreme parameter values,
– selecting frames with incorrectly estimated characteristic points.
3. The scale and scope of the appearing artifacts:
– maximum deviation from the correct position,
– artifact duration.
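A minimal sketch of the velocity and acceleration computation from stage 1,
assuming each keypoint trajectory is available as a per-frame coordinate array
(an illustration only, not the exact implementation used in the study):

import numpy as np

def kinematics(points: np.ndarray, fps: float = 60.0):
    # points: (n_frames, 2) pixel coordinates of one keypoint over time.
    dt = 1.0 / fps
    velocity = np.gradient(points, dt, axis=0)        # px/s
    acceleration = np.gradient(velocity, dt, axis=0)  # px/s^2
    speed = np.linalg.norm(velocity, axis=1)          # scalar speed per frame
    return velocity, acceleration, speed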

4 Results
The standard material – the sample against which the subsequent systems are
compared – is the set of data obtained from the OpenPose system. From the raw
dataset, including coordinates of points characteristic of the shoulders, hips,
knee and elbow joints, wrists and heels, as well as eyes and nose, a subset
common to the other systems was selected, i.e., shoulders, knees, hips, elbows,
wrists, and ankles. For the 20 still images constituting the previously selected
frames, estimates were generated for most of the available systems, such as
AlphaPose, EfficientPose, MMPose, and MediaPipe – where possible, for a wide
range of parameters, such as the applied artificial neural network architectures
together with pre-trained sets of their weights. The wrnchAI system [15] was
rejected due to its lack of availability for research and future application.
Preliminary results for EfficientPose [8] also proved unsatisfactory, despite
testing each of the available variants (RT Lite, RT, I Lite, I, II Lite, II,
III, IV). While the program managed to detect the characteristic points,
assigning them to the appropriate parts of the body turned out to be a problem,
which resulted in unnatural connections between them. Variant II achieved the
best effect. The AlphaPose system generated a set of results for the entire
recording, but quite often it did not recognize the points at all and
misinterpreted others. Subsequently, the MMPose system was used, for which the
following architectures with trained parameters were applied:
– hrnet_w32_coco_256x192_udp_regress,
– hrnet_w48_coco_256x192,
– res101_aic_256x192,
– hrnet_w32_aic_256x192,
– vgg16_bn_coco_256x192,
– res50_coco_256x192,
– res152_coco_256x192,
– litehrnet_30_coco_256x192.
For further tests, it was decided to choose three methods: OpenPose, MMPose,
and MediaPipe. 35,526 consecutive points were generated, corresponding to about
592 s of recording. As no significant differences were observed in the upper
body, detailed studies were limited to the lower body, i.e., the hips and legs.
This choice was dictated by errors in the location of the knees. As expected, in
each case there were oscillations of the computed coordinates. No sequence-based
method was used, contrary to the authors of [5,6], who applied Savitzky-Golay
high-frequency noise filtering to study the characteristics of infant movement
and replaced points with low confidence by their previous locations. Using only
the confidence coefficient is not reliable in this case, because it shows how
similar the test material is to the training set. Small values indicate a
presumed position, e.g., when a point on one side is obscured, and the
coefficient takes greater or smaller values for different parts of the body. The
points for the upper torso and head are similar for each method and indicate the
anatomical details fairly accurately. This proves that these features are very
well exposed and their similarity to the training images is very close, which is
also indicated by the high confidence factor for these points. For each lower
limb, the lengths (or, more precisely, their squares) were calculated for the
calf and the thigh, respectively. The systems estimate the position of the hip
joint more reliably than that of the knee joint. Using the graphs (Fig. 2), the
beginning and end of the defective sequences were determined; however, not all
the extreme lengths calculated turned out to be defective. Estimating the
position of the knees, especially with straightened legs, turned out to be a
problem. These few frames became the reference material for comparison with the
other systems.

Fig. 2. Histograms of confidence score for two different points on the body

The OpenPose indication became the benchmark. Two typical mistakes are
illustrated by the frames in Fig. 3.
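The squared segment lengths analyzed below can be derived directly from the
estimated keypoints. The following minimal sketch assumes a (frames × points × 2)
coordinate array; the keypoint indices and the threshold helper (used in the
frame selection stage described later) are illustrative assumptions:

import numpy as np

def squared_lengths(kpts: np.ndarray, hip: int, knee: int, ankle: int):
    # kpts: (n_frames, n_points, 2) estimated 2D keypoints;
    # hip, knee, ankle: indices in the chosen keypoint format.
    thigh2 = ((kpts[:, hip] - kpts[:, knee]) ** 2).sum(axis=1)
    calf2 = ((kpts[:, knee] - kpts[:, ankle]) ** 2).sum(axis=1)
    return thigh2, calf2

def suspect_frames(thigh2, calf2, frac=0.7):
    # Frames whose thigh/calf length difference exceeds a fraction
    # of its maximum over the recording.
    diff = np.abs(calf2 - thigh2)
    return np.flatnonzero(diff > frac * diff.max())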

Fig. 3. Estimation of points on the child's body with different methods (black –
OpenPose, green or white – MMPose, dotted – MediaPipe Pose, last three –
EfficientPose)

Figure 3 illustrates the mislocation of the characteristic points. There are
moments when a sudden increase in the length of the calves is accompanied by a
decrease in the length of the thigh; this is the situation where the knee point
is shifted upward. There are also other errors, where there is a bad assignment
of points to body parts and their connections. The charts in Figs. 4, 5 and 6
show the confidence scores, the squares of the thigh and calf lengths of the
left and right legs, and the difference between the squares of the calf and
thigh lengths of the left leg.
As can be seen above, the difference usually does not exceed 25,000 points.
This value is exceeded in 10 cases within 10 min of recording. The increase in
this value usually occurs simultaneously in each of the tested methods, but the
degree of exceeding is much greater in OpenPose than in the others, and there

Fig. 4. Confidence score (upper) and squares of length for left leg (down)

Fig. 5. Squares of the length of the thigh and calf of the right leg

Fig. 6. Difference in the squares of the length of the thigh and calf of the left leg
over time

are also situations in which it does not occur in the other methods. It is also
worth noting that in the MediaPipe method, despite the lowering of the hip line,
there is no shortening of the thigh part of the leg. The histogram in Fig. 7
presents the dispersion of the difference between the squares of the lengths of
both parts of the left and right legs.

Fig. 7. Histogram: difference in the squares of the length of the thigh and calf of the
left leg

Based on only one film, it can be assumed that the MediaPipe method may be
the best: it shows the smallest dispersion of values, and the indicated values
are the most compact. In the other two methods, single points are located in the
area from 20,000 to 25,000 points. Another attempt should confirm whether the
above dependencies are repeated in other recordings. The second stage consisted
of selecting the sequences in which the probability of anomalies was the
highest. The decisive parameter was the difference between the thigh and calf
lengths in relation to its maximum value; all frames with a current difference
greater than 0.7 of the maximum were selected. Out of 130 films with a total
length of over 44 h (9.5 million frames), an average of 170 frames per film met
the criteria. After rejecting the films for which not all points were calculated
in the OpenPose system (A), for the remaining films the coordinates of the
characteristic points in the selected frames were estimated with the MMPose (B)
(hrnet_w48_coco_256x192) and MediaPipe (C) methods. The following rule was
adopted for the evaluation: for each frame and point, the coordinates are
calculated using all three methods, obtaining three results. Of these three, the
one that is farthest away from the other two is discarded. The best method is
the one whose results are least often rejected. All six lower-body points are
taken into account separately for each selected frame and film. The percentages
of rejections are calculated, along with their average value for each method and
point.
It can be seen (Table 1) that the methods A and C are most often rejected,
and B and A the least frequently. If we only considered the number of films in
which a given method would most often be rejected, the situation would look

Table 1. The most common (MAX) and the least frequent (MIN) outliers: mean
rejection percentage per point

Point/method   1    2    3    4    5    6
OpenPose (A)   20%  20%  47%  52%  42%  41%
MMPose (B)     27%  22%  15%  13%  16%  18%
MediaPipe (C)  53%  58%  38%  35%  41%  41%
MAX            C    C    A    A    A    A
MIN            A/B  A/B  B    B    B    B

like in Table 2, which lists the number of films in which the results of the
indicated method most often differ from the others. The comparison shows that in
only 9% of the surveyed films the MMPose method gives way to another method
(i.e., is rejected), while in 2/3 of the films it is part of the winning pair.
At the same time, it keeps an advantage of about 30% over the next method.

Table 2. The most common deviations in the results for the methods in the following
videos

Point/method   1   2   3   4   5   6   Sum  [%]
OpenPose (A)   8   9   56  66  48  45  232  39%
MMPose (B)     20  17  7   2   5   5   56   9%
MediaPipe (C)  71  73  36  31  46  49  306  52%

The methods that are least often rejected are illustrated in Table 3.

Table 3. Methods for which results are the least likely to deviate in the following
videos

Point/method   1   2   3   4   5   6   Sum  [%]
OpenPose (A)   67  56  11  4   11  19  168  28%
MMPose (B)     27  42  81  83  78  74  385  65%
MediaPipe (C)  5   1   7   12  10  6   41   7%

5 Discussion and Conclusions

The built model of the bone skeleton is intended to be the basis for the
development of a universal method for studying the characteristics of spontaneous
movements of infants, in particular their intensity, fluidity, phases, and
duration, as well as the interdependencies between them. Determining each
activity of the newborn quantitatively and qualitatively will make it possible
to document disturbing developmental trends and to track changes with age and
during therapy. It will also create the foundations for building a model of a
child's physical development in the first weeks and months of life.

With regard to the methods of obtaining characteristic points, it can be stated
that the popular OpenPose method fared worse than the other two competing ones.
This verdict was strongly influenced by its lack of resistance to the occurrence
of artifacts and their fairly large range. Its advantages, however, include speed
and ease of implementation. The method works well for the upper torso, head,
hands, and feet, detecting the toes and heel; the extended method also identifies
the fingers, but not always. The MMPose method is easily configurable and, with
a little modification, can generate a large data set, although this takes a long
time; the necessity to install some libraries in advance may also be a
significant obstacle. It can run on both GPU and CPU, which is an advantage. The
MMPose method is the most stable, as long as certain parameters are used.
Compared to the other methods, the characteristic points calculated by it are
not among the most common outliers for the individual points; on the other hand,
they are the least likely outliers in three out of five places. The advantage of
the MediaPipe method is the large number of detected points and the ease of
installation and commissioning. The OpenPose method, in turn, is variable, as it
is most often rejected in three categories and least often in the other two.

The selection does not change when the criterion is only the ranking position in
each video, without taking the proportions into account. The MediaPipe method
fares the worst here, accounting for more than half of the rejections and only
7% of the non-rejected ones. Method B is again in the lead, as it participates
by far the least in rejections and in as many as 65% of the least frequently
rejected cases. Due to the nondeterministic test setup environment (Google
Colab), the time parameter was not included in the evaluation.

In the described experiment, a fairly simple comparative method was used, and
the adopted method of selecting the winner by elimination is not without
drawbacks. The studies did not take into account differences in the definition
of body parts, which may be a mistake; however, the verdicts are fairly
consistent across most focal points. The method also does not take into account
the seriousness of the error, and there are situations where the differences are
minimal (<5%) – in Table 1, points 1 and 2, methods A and B in the minimum
category. Considering only the number of clips, it can be concluded that
OpenPose and MediaPipe are ex aequo the most often eliminated, while MMPose is
most often found close to another method. In related work [7], the authors
compare their proprietary CIMA-Pose method with the popular OpenPose,
EfficientPose III, and EfficientHourglass B4 methods and with man-made
annotations, while also training the networks. In the cited article, OpenPose
performs the worst in terms of both accuracy and complexity.

Summarizing all the pros and cons, according to the authors, the MMPose method
is the most useful for the purpose indicated at the beginning. Because the
presented method of comparing algorithms was limited to a selected group of
methods and did not take into account the possibility of correction and
training, future work should focus on the use of a target set of children's
profiles. Tuning also requires a smooth transition between individual frames.
This is the intention of the continued work on improving the method of analyzing
the motor well-being of infants. The results should be approached with caution,
as the dataset was neither large nor representative, and the analysis does not
take real coordinates into account, so it may happen that the rejected method
(the one most different from the others) is in fact more accurate than the other
two.

References
1. Cao, Z., et al.: OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part
Affinity Fields. IEEE Trans. Pattern Anal. Mach. Intell. 43, 1 (2021)
2. Ionescu, C., Li, F., Sminchisescu, C.: Human3.6M Dataset. http://vision.imar.ro/human3.
6m/description.php. Accessed 03 Sep 2021
3. Ceseracciu, E., et al.: Comparison of markerless and marker-based motion capture
technologies through simultaneous data collection during gait: Proof of concept.
PLoS One. 9(3), 1–7 (2014)
4. Charles, J., et al.: Personalizing human video pose estimation. In: Proceedings of
the IEEE Computer Society Conference on Computer Vision and Pattern Recog-
nition, pp. 3063–3072 (2016)
5. Doroniewicz, I., et al.: Writhing movement detection in newborns on the second and
third day of life using pose-based feature machine learning classification. Sensors
(Switzerland). 20(21), 1–15 (2020)
6. Doroniewicz, I., et al.: Temporal and spatial variability of the fidgety movement
descriptors and their relation to head position in automized general movement
assessment. Acta Bioeng. Biomech. 23(3), 1–21 (2021)
7. Groos, D., et al.: Towards human performance on automatic motion tracking of
infant spontaneous movements. Comput. Med. Imaging Graph. 95, 1–14 (2021)
8. Groos, D., Ramampiaro, H., Ihlen, E.A.F.: EfficientPose: scalable single-person
pose estimation. Appl. Intell. 51(4), 2518–2533 (2020). https://doi.org/10.1007/
s10489-020-01918-7
9. Joo, H., Simon, T., et al.: CMU Panoptic Dataset.
http://domedb.perception.cs.cmu.edu/. Accessed 03 Sep 2021
10. Hesse, N., et al.: Body pose estimation in depth images for infant motion analysis.
In: Proceedings of the Annual International Conference of the IEEE Engineering
in Medicine and Biology Society, EMBS. (2017). https://doi.org/10.1109/EMBC.
2017.8037221
11. Hesse, N., Bodensteiner, C., Arens, M., Hofmann, U.G., Weinberger, R., Sebastian
Schroeder, A.: Computer vision for medical infant motion analysis: state of the
art and RGB-D data set. In: Leal-Taixé, L., Roth, S. (eds.) ECCV 2018. LNCS,
vol. 11134, pp. 32–49. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-
11024-6_3
12. Hesse, N., et al.: Estimating body pose of infants in depth images using random
ferns. In: Proceedings of the IEEE International Conference on Computer Vision
(2015). https://doi.org/10.1109/ICCVW.2015.63
13. Johnson, S., Everingham, M.: Clustered pose and nonlinear appearance models
for human pose estimation. In: British Machine Vision Conference BMVC 2010 -
Proceedings, pp. 1–11 (2010)

14. Lin, T., Maire, M.: COCO Dataset | Papers With Code. https://paperswithcode.
com/dataset/coco. Accessed 03 Sep 2021
15. Migliorelli, L., et al.: The babyPose dataset. Data Br. 33 (2020). https://doi.org/
10.1016/j.dib.2020.106329
16. Nakano, N., et al.: Evaluation of 3D markerless motion capture accuracy using
open-pose with multiple video cameras. Front. Sport. Act. Living. 2(50), 1–9 (2020)
17. Passmore, E., et al.: Deep learning for automated pose estimation of infants at
home from smart phone videos. Gait Posture. 81 (2020). https://doi.org/10.1016/
j.gaitpost.2020.08.026
18. Sapp, B., Taskar, B.: MODEC: Multimodal decomposable models for human pose
estimation. In: Proceedings of the IEEE Computer Society Conference on Com-
puter Vision and Pattern Recognition, pp. 3674–3681, Portland, OR, USA (2013)
19. Sun, K., et al.: Deep high-resolution representation learning for human pose esti-
mation. In: Proceedings of the IEEE Computer Society Conference on Computer
Vision and Pattern Recognition, pp. 5686–5696 (2019)
20. Huang, X., Fu, N., Liu, S., Ostadabbas, S.: Invariant representation learning for infant
pose estimation with small data. In: 16th IEEE International Conference on Auto-
matic Face and Gesture Recognition, p. 18, Jodhpur, India (2021). https://doi.
org/10.1109/FG52635.2021.9666956
21. Home - WRNCH. https://wrnch.ai/. Accessed 03 Sep 2021
22. COCO Dataset | Papers With Code. https://paperswithcode.com/dataset/coco.
Accessed 03 Sep 2021
23. Human3.6M Dataset. http://vision.imar.ro/human3.6m/description.php. Accessed
03 Sep 2021
24. Leeds Sports Pose Dataset. http://sam.johnson.io/research/lsp.html. Accessed 03
Sep 2021
25. MPII Human Pose Database. http://human-pose.mpi-inf.mpg.de/. Accessed 03
Sep 2021
26. Pose Estimation. https://paperswithcode.com/task/pose-estimation. Accessed 03
Sep 2021
27. Pose Estimation on MPII Human Pose. https://paperswithcode.com/sota/pose-
estimation-on-mpii-human-pose. Accessed 03 Sep 2021
28. VGG Pose Datasets. https://www.robots.ox.ac.uk/~vgg/data/pose/. Accessed 03
Sep 2021
Head Pose and Biomedical Signals
Analysis in Pain Level Recognition

Maria Bieńkowska1(B), Aleksandra Badura1, Andrzej Myśliwiec2, and Ewa Pietka1

1 Faculty of Biomedical Engineering, Silesian University of Technology, ul. Roosevelta 40, 41-800 Zabrze, Poland
{maria.bienkowska,aleksandra.badura,ewa.pietka}@polsl.pl
2 Institute of Physiotherapy and Health Science, Academy of Physical Education in Katowice, ul. Mikołowska 72a, 40-065 Katowice, Poland
a.mysliwiec@awf.katowice.pl

Abstract. Pain feeling assessment is crucial for a safe and efficient course of physiotherapy. In particular, the onset of severe pain acts as a specific tissue guard and protects the tissue from damage. In this study, an approach for automatic pain level recognition is described. Biomedical signals (EMG, BVP, EDA) and video data of the head pose are analyzed in patients undergoing fascial therapy. The impact of video data and their fusion with biomedical data on the system's performance is tested. Decision trees and a random forest are applied for classification, yielding an accuracy of 0.85. The energy of the EMG signal turned out to be a highly discriminative feature that dominated the weak classifier. Video features impact the classification results in ensemble methods.

Keywords: Pain in physiotherapy · Pain assessment · Pain recognition · Manual therapy · Video analysis

1 Introduction
Pain level recognition is an essential issue for medicine and physiotherapy. Since
a patient’s self-report is highly subjective and corresponds to the individual pain
experience, automatic pain feeling recognition systems need to be developed. This
is especially important for uncooperative patients (infants, mentally disabled
people, etc.). Pain level recognition also allows for identifying the stimulus inten-
sity, which can be harmful and may lead to tissue damage [24]. Pain intensity
adjustment is a key step in therapy procedures, where the pain cannot be con-
trolled by the subject himself and is induced by an external source (e.g., a
physiotherapist). Then, an automatic pain feeling recognition system seems to
be a kind of tissue emergency brake. There are some widely used databases in
the field of automatic pain assessment research. One of the most popular is the
BioVid Heat Pain Database [21]. It gathers data for 90 patients and is based
on electrodermal activity (EDA), electrocardiography (ECG), electromyography

(EMG), electroencephalography (EEG) signals, and video data collected during


thermal stimulation. Thermal stimulation was used for SenseEmotion Database
[20] as well. In the UNBC-McMaster database [14], patients suffering from back
pain performed range of motion (ROM) tests during the video data acquisition.
For other research projects, the pain was triggered by cold [26] or electrical stim-
ulation [9]. Some studies were performed on the data collected after surgery [12]
or during a tactile brush session [13].
EDA is a frequently considered signal in the level of pain feeling assess-
ment. Its variability is found to be a robust emotion and pain indicator [7]. The
amplitude of the EDA signal is correlated with the number of recruited sweat
glands [4] and shows emotional states such as arousal. It can be found that pain
causes heart rate (HR) variations as well [18]. In [15], the authors used HR-
based features for acute pain sensation assessment in postoperative patients.
Frontalis EMG was also found to be a pain sensation indicator [6,21]. In recent
years, video data have been increasingly used. Video data can present behav-
ioral pain responses such as body movements, facial expressions, vocalizations
and can be widely analyzed in terms of pain level recognition [22]. Salekin et al.
[16] proposed a method to assess neonatal postoperative pain occurrence using
a multimodal dataset (including facial expression, body movements, and crying
sounds) with an accuracy of 0.7. Zhi et al. [27] combined facial expression and
biomedical signals to assess the pain feeling with an accuracy of 0.6. Aquajari
et al. [1] used an EDA signal and classified baseline and high pain level with an
accuracy of 0.62.
Despite many achievements in the field, there is a high need to improve pain
sensation assessment systems in real use cases [22]. Manual therapy is one of
the fields where pain feeling recognition systems are awaited. Nevertheless, com-
pared to laboratory-based examinations, clinical conditions carry many imped-
iments. First, the pain stimulus cannot be designed in terms of intensity, duration,
and rate of increase. Thus, it is challenging to obtain the ground truth of the stimuli.
Moreover, collecting patients’ pain feelings is limited. Some therapy protocols
require a lying position, making using any buttons [21] or manual sliders impos-
sible. One more issue is the presence of inevitable movement artifacts in signals
resulting from the procedures performed on a patient. To the best of the authors’
knowledge, there is no research on pain sensation assessment systems dedicated
to manual physiotherapy.
The goal of our study is to extend our previous research [2,3] by incorporat-
ing head pose analysis and combining the data in one classification procedure.
A wider group of patients undergoing fascial therapy has been tested.
This paper is organized as follows. Section 2 describes the multimodal data
acquisition setup, followed by the overall method yielding the classification result.
It includes the video data analysis for head pose tracking and refers to the
signal features already described in [2,3]. Then, decision trees and random forest
employed at the classification step are presented. For evaluation (Sect. 3) two
different feature vectors were tested: one with signals features only, the other
with video and biomedical signals features combined. Several experiments were

carried out, including tests on balanced and imbalanced data sets. The benefit
of the head pose data analysis for pain sensation assessment in manual therapy
is discussed in Sect. 4. Section 5 concludes the paper.

2 Materials and Methods


The workflow of the study is presented in Fig. 1. A double-branch data analysis
concludes by classifying time frames as either severe or no pain feeling. The
details of data acquisition and processing are given in the following sections.

Fig. 1. The workflow of data acquisition and analysis for pain level assessment

2.1 Data Set

The study was conducted in an isolated, quiet therapy room. Participants were
informed about the course of the examination and signed a consent form. The
fascial therapy of the shoulders and neck area was performed by a physiothera-
pist and continued for 2–3 min. Patients were sitting on a chair. The procedure
was performed twice. Patients reported their pain during the therapy by saying
a number in the range of 0 to 10, where 0 means no pain and 10 refers to the
most severe pain imaginable.
Additionally, after the therapy, patients were asked to indicate numeric
thresholds of moderate pain feeling (MPT) and severe pain sensation (SPT)
in their own opinion. The boundaries were used to binarize the pain labels. For
this study, we used data from 46 patients aged 22–72. The measurements were
carried out at a physiotherapy clinic and were approved by a bioethics commit-
tee.
Data were collected with a measuring system consisting of a wearable device
(RespiBan Professional, Biosignalplux, Portugal) and an external video camera
for patient movement registration. EDA, BVP, and EMG signals were acquired
with sampling frequencies of 8, 64, and 256 Hz, respectively. EDA sensors were
attached to the middle part of the index and middle fingers. The BVP clip sensor
collected a signal from a ring finger. EMG electrodes were placed on the forehead
to register the corrugator supercilii muscle signals. The video camera captured
waist-up images at 30 fps with a resolution of 640 × 280 px. The patient was sitting

Fig. 2. Scheme of the data acquisition system

in an upright position, and a full view of the frontal face was recorded. The
platform (Fig. 2) acquired synchronized time-series with a pain feeling rating
declared by the patient during therapy [2]. The preprocessed signal data was
divided into 4-second frames with a 50% overlap. This meets the requirements of
short-term pain feeling assessment, which is important in manual therapy.

2.2 Video Analysis for Head Pose Tracking


A video frame including the patient’s upper body is subjected to a multiphase
processing algorithm (Fig. 3). It includes the face area detection in the first video
frame to determine the region of interest (ROI). Then, as selected corner points
are found, the point tracking algorithm (KLT) derives a geometric transforma-
tion. It is applied to the following video frame and yields a new ROI bounding
box position with already detected corner points. Processing of further frames
continues. A detailed description of consecutive phases is presented below.
The first video frame is subjected to the Viola-Jones algorithm [11] that
extracts Haar-like features and uses a pre-trained classifier to find the face area.
Haar features are based on the difference of total pixel values between rectangular
regions in the picture. Such regions are used for eyes, nose, and cheeks recognition
and finally allow for face area detection. It becomes a region of interest (ROI)
and is subjected to further processing (Fig. 4).
Corner point detection for face tracking is the next phase in the workflow
(Fig. 3). Since patients have been wearing EMG forehead electrodes and, in most
cases, a protective mask, the use of commonly known anatomical points
(right eyebrow inner corner, the midpoint of the upper lip) is limited. Therefore,
the corner points extraction is based on the eigenvalue algorithm [17]. A corner
point is detected if the lowest eigenvalue in the 2 × 2 pixel sub-window is higher
than a predefined threshold. The algorithm provides corner points (white signs
in Fig. 4) that are easy for tracking. Their values are above the image noise level
and feature a similar magnitude.
Extracted corner points are subjected to the Kanade-Lucas-Tomasi (KLT)
feature-tracking algorithm [19]. It aims to designate a geometric transformation

between subsequent video frames (Fig. 3). For proper tracking, the KLT algo-
rithm requires an ROI with many corner points that do not change significantly
between consecutive video frames. Still, the KLT uses a complex transformation
and manages to track points moving with various velocities. The transformation
derived is applied to the ROI bounding box outlined in Phase 1 and updated for
consecutive video frames tracking the face position.
For each video frame the angle between the ROI bounding box diagonal and
the x-axis is found (Fig. 4). The result is referred to as the head tilt angle. Two
features are extracted from each 4-second frame for further analysis: the range
of the head tilt angle and the maximum of its first derivative.

Fig. 3. The head pose tracking workflow
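The four phases of this workflow map onto standard OpenCV primitives. The following minimal sketch is our re-creation under assumed parameters (cascade type, corner-detector settings, the video file name), not the authors' exact code:

import cv2
import numpy as np

# Phase 1: Viola-Jones face detection on the first frame defines the ROI
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture("therapy_session.avi")          # hypothetical file name
ok, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
x, y, w, h = cascade.detectMultiScale(gray, 1.1, 5)[0]  # first detected face
bbox = np.float32([[x, y], [x + w, y], [x + w, y + h], [x, y + h]])

# Phase 2: minimum-eigenvalue (Shi-Tomasi) corner points inside the ROI
pts = cv2.goodFeaturesToTrack(gray[y:y + h, x:x + w], maxCorners=50,
                              qualityLevel=0.01, minDistance=5)
pts = (pts.reshape(-1, 2) + np.float32([x, y])).reshape(-1, 1, 2)

angles = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray_new = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Phase 3: KLT tracking of the corner points between frames
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(gray, gray_new, pts, None)
    good_old = pts[status.ravel() == 1]
    good_new = new_pts[status.ravel() == 1]
    # Phase 4: geometric transform between frames, applied to the ROI box
    M, _ = cv2.estimateAffinePartial2D(good_old, good_new)
    bbox = cv2.transform(bbox.reshape(1, -1, 2), M)[0]
    # head tilt angle: ROI bounding-box diagonal vs. the x-axis
    diag = bbox[2] - bbox[0]
    angles.append(np.degrees(np.arctan2(diag[1], diag[0])))
    gray, pts = gray_new, good_new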

2.3 Biomedical Signals Analysis


Firstly, the EDA signal is smoothed with the Gaussian-weighted moving average
filter. Then, the convex optimization approach of Greco et al. [8] is used, breaking the
signal down into three components: tonic, phasic, and additive white Gaussian
noise. The EDA phasic component is used for further analysis, and the range
of its amplitude in each frame is computed. For BVP, the median period is
determined. The median amplitude of the pulse wave stands for another feature.
Also, EMG energy is extracted to reveal facial expressions (like frowning, tight
eyelids closing). For details see [2,3].
Finally, the feature vector consists of 6 features computed for each 4-second
frame (Fig. 5). Two of them describe the head position, the other 4 are derived
from the signal data.
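A possible NumPy re-creation of the framing and the six-feature extraction is sketched below; the sampling rates follow the acquisition setup, while the BVP peak-detection settings (and the use of peak prominence as the pulse-wave amplitude) are our assumptions:

import numpy as np
from scipy.signal import find_peaks

def frame_indices(n, fs, win=4.0, overlap=0.5):
    """Start/stop sample indices of 4-second frames with 50% overlap."""
    size = int(win * fs)
    step = int(size * (1 - overlap))
    return [(s, s + size) for s in range(0, n - size + 1, step)]

def frame_features(eda_phasic, bvp, emg, tilt, fs_eda=8, fs_bvp=64,
                   fs_emg=256, fs_vid=30):
    """Six features per 4-s frame: EDA phasic range, BVP median period and
    amplitude, EMG energy, head-tilt range and maximum first derivative."""
    feats = []
    for a, b in frame_indices(len(eda_phasic), fs_eda):
        t0, t1 = a / fs_eda, b / fs_eda               # frame limits in seconds
        e = eda_phasic[a:b]
        p = bvp[int(t0 * fs_bvp):int(t1 * fs_bvp)]
        g = emg[int(t0 * fs_emg):int(t1 * fs_emg)]
        h = tilt[int(t0 * fs_vid):int(t1 * fs_vid)]
        peaks, props = find_peaks(p, distance=fs_bvp // 3, prominence=0.1)
        period = np.median(np.diff(peaks)) / fs_bvp if len(peaks) > 1 else np.nan
        amp = np.median(props["prominences"]) if len(peaks) else np.nan
        feats.append([e.max() - e.min(),              # EDA phasic range
                      period, amp,                    # BVP median period/amplitude
                      np.sum(g ** 2),                 # EMG energy
                      h.max() - h.min(),              # head-tilt range
                      np.abs(np.diff(h)).max() * fs_vid])  # max tilt derivative
    return np.array(feats)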

2.4 Classification
Frame labelling is determined based on the patient’s subjective pain thresholds.
All frames marked by a subject with a number higher than SPT are treated as
severe pain class, whereas those below MPT stand for no-pain class (details are
given in [3]). The remaining frames are not used in this study. Since there is
a noticeable class imbalance (300 severe pain and 736 no-pain frames), we use

Fig. 4. Head tilt angle detection. Characteristic points are marked by white signs,
yellow rectangles stand for tracked bounding box. Head tilt angle α = 49◦ (left frame)
and α = 57◦ (right frame)

Fig. 5. Workflow of feature extraction for pain feeling analysis

two differently arranged data sets. In the first one, the observations’ number
equals 300 in both classes, whereas the second data set contains the originally
imbalanced number of frames.
Decision tree and random forest classifiers are used for binary data classifica-
tion. A decision tree is a simple and fast classifier that can be used in real-time
analysis. Decision trees split nodes to maximize the purity of the resulting classes
by minimizing the Gini index; at each split, the feature that yields the two purest
subsets is selected. The same feature can be used several times, but a tree that
is too deep causes the classifier to learn specific cases instead of global rules. In
this study, the maximum number of splits was 4 to avoid overfitting to the
training set. Because decision trees are unstable (a slight variance in the data
may distort the result), their ensemble, the random forest, was also used.
Random forest is a tree-bagging method that combines the predictions of
several weak base learners. The training set is divided into a few groups, and
each group trains one weak learner. A random subset of predictors is considered
for node splitting in the growing phase, which results in a diverse ensemble of
trees [5]. The random forest prediction is then made as follows: all weak learners
vote, and the class with the most votes stands for the final decision. Compared
to the basic decision tree classifier, the random forest yields more accurate
output and reduced variance. In this study, the number of learning cycles was
set to 30; increasing it did not significantly affect the results.
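With scikit-learn, the two classifiers could be configured roughly as follows; the placeholder data, the max_leaf_nodes limit standing in for the four-split cap, and the feature-subsetting defaults are our assumptions, as the original toolchain is not stated:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 6))            # placeholder for the 6-feature vectors
y = rng.integers(0, 2, size=600)         # placeholder labels: 1 = severe pain

tree = DecisionTreeClassifier(max_leaf_nodes=5)   # ~ a limit of 4 splits
forest = RandomForestClassifier(n_estimators=30,  # 30 learning cycles
                                max_features="sqrt", bootstrap=True)
for clf in (tree, forest):
    scores = cross_val_score(clf, X, y, cv=10, scoring="f1")
    print(type(clf).__name__, scores.mean().round(2))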

3 Results

Several experiments were carried out. Firstly, we used two different feature sets
to validate the classification performance. One set included all six extracted
features, and the second one was reduced to the biomedical signal features only.
Besides, we tested imbalanced and balanced data sets (in terms of the number
of class observations). Both 10-fold and one-patient-out cross-validation were
applied. The presented results are the median of 10 runs, each with a new
random data set.
The assessment involved five metrics: accuracy, sensitivity, precision, specificity,
and F1 score. Results for the decision tree classifier are presented in Table 1 and
for the random forest in Table 2. Table 3 presents results for imbalanced data.

Table 1. Decision trees classification results

One-patient-out 10-fold
All features Without video All features Without video
Accuracy 0.68 ± 0.06 0.65 ± 0.08 0.75 ± 0.07 0.75 ± 0.05
Sensitivity 0.67 ± 0.16 0.78 ± 0.21 0.90 ± 0.07 0.89 ± 0.07
Precision 0.64 ± 0.20 0.64 ± 0.24 0.69 ± 0.06 0.70 ± 0.04
Specificity 0.73 ± 0.12 0.60 ± 0.12 0.59 ± 0.12 0.61 ± 0.08
F1 score 0.62 ± 0.08 0.66 ± 0.12 0.78 ± 0.05 0.78 ± 0.04

Table 2. Random forest classification results

One-patient-out 10-fold
All features Without video All features Without video
Accuracy 0.66 ± 0.10 0.61 ± 0.09 0.85 ± 0.06 0.81 ± 0.05
Sensitivity 0.60 ± 0.11 0.61 ± 0.16 0.85 ± 0.04 0.83 ± 0.06
Precision 0.68 ± 0.21 0.65 ± 0.15 0.85 ± 0.09 0.80 ± 0.08
Specificity 0.73 ± 0.13 0.59 ± 0.30 0.86 ± 0.08 0.80 ± 0.07
F1 score 0.64 ± 0.15 0.61 ± 0.10 0.85 ± 0.06 0.81 ± 0.06

Table 3. Random forest classification results for imbalanced data

One-patient-out 10-fold
All features Without video All features Without video
Accuracy 0.77 ± 0.10 0.72 ± 0.13 0.90 ± 0.02 0.84 ± 0.04
Sensitivity 0.51 ± 0.20 0.44 ± 0.23 0.76 ± 0.08 0.65 ± 0.09
Precision 0.60 ± 0.44 0.57 ± 0.27 0.86 ± 0.09 0.77 ± 0.13
Specificity 0.91 ± 0.08 0.85 ± 0.10 0.95 ± 0.03 0.92 ± 0.05
F1 score 0.46 ± 0.34 0.48 ± 0.23 0.80 ± 0.06 0.70 ± 0.09

4 Discussion

The experiments carried out aimed to determine the impact of video features
on pain level classification. The fusion of video and biomedical data provides
better results for the random forest than using biomedical signals only (F1
scores of 0.85 and 0.81, respectively). However, this does not seem to affect the
decision tree performance: the results for both feature sets are comparable.
Here, the first split was based on the EMG energy, implying that EMG is a
distinctive predictor; head pose features were used in further splits and thus had
less impact on the results. Since the random forest imposes a random feature
subset at every split, video data could be involved in the initial splits, which
boosted the classification performance. Therefore, head pose analysis may be
beneficial in pain level recognition systems for manual therapy.
The imbalance between severe pain and no-pain frames is well known in the
literature [23]. Hence, there is a need to validate systems on such imbalanced
data sets. Satisfactory results were obtained for the random forest, with a
sensitivity of 0.76 and an F1 score of 0.80 for the all-features set. Figure 6
presents confusion matrices for the imbalanced and balanced data set
classification.
It is worth considering which validation method is used for the pain classifi-
cation problem. Since, in some cases, the training and test sets include samples
from the same patient, k-fold cross-validation may artificially improve the results.
Given the widely observed subjectivity of pain reactions, one-patient-out
cross-validation reflects the credibility of the classifiers better than k-fold cross-
validation. Various head-pose responses were observed, from pain-avoiding moves
to a relaxed, upright position. Moreover, manual therapy itself could affect head
positions, e.g., some patients may turn their heads to facilitate access to the
muscle. Thus, a patient-specific model is desirable in our future work.
In previous works, Zhi et al. [27] proposed a method integrating biomedical
(EDA, ECG, and EMG) and video data and achieved an accuracy of 0.68, where
the pain was induced by heat and electric stimulation. Better results (accuracy
of 0.89) were obtained with cold pressor stimulation [10], where heart rate, blood
pressure, respiration, EDA, and video-based facial action were combined. Still,
those experiments were carried out in laboratory conditions. Zamzmi et al. [25]
performed pain level assessment in 32–41 weeks old infants for painful procedures,
e.g., heel lancing. The integration of facial expressions, body movements, and
vital signs (HR, respiratory rate, and spirometry) resulted in an accuracy of
0.95, yet the pain was assessed in 1-minute intervals. Manual therapy requires
continuous pain monitoring, affecting the therapy course and intensity. This
study used overlapping 4-second frames, which better capture pain onset
reactions. It also directs our further efforts towards a real-time recognition
approach.

Fig. 6. Confusion matrices for the imbalanced and balanced data sets, respectively.
Results of the random forest for all-features classification

5 Conclusion
In this paper, an approach for automatic pain assessment was presented. The
study was based on data acquired during the manual physiotherapy. Biomedical
signals (EDA, BVP, and EMG) and video data based on the head pose were used
in binary classification. Decision trees and a random forest were tested, yielding
an accuracy of 0.85 for balanced data sets. In some cases, video data enhanced the
classification performance.

Acknowledgement. This work was supported by the Polish-German grant in the field of DIGITIZATION of ECONOMY: 'Multimodal Platform for Pain Monitoring in Physiotherapy' (grant number WPN-3/1/2019).

References
1. Aqajari, S.A.H., et al.: Pain assessment tool with electrodermal activity for post-
operative patients: method validation study. JMIR mHealth uHealth 9(5), e25258 (2021)
2. Badura, A., Bieńkowska, M., Masłowska, A., Czarlewski, R., Myśliwiec, A.,
Pietka, E.: Multimodal signal acquisition for pain assessment in physiotherapy.
In: Pietka, E., Badura, P., Kawa, J., Wieclawek, W. (eds.) Information Technology
in Biomedicine. AISC, vol. 1186, pp. 227–237. Springer, Cham (2021). https://doi.
org/10.1007/978-3-030-49666-1_18

3. Badura, A., Masłowska, A., Myśliwiec, A., Piętka, E.: Multimodal signal analysis
for pain recognition in physiotherapy using wavelet scattering transform. Sensors
21(4), 1311 (2021)
4. Benedek, M., Kaernbach, C.: Decomposition of skin conductance data by means
of nonnegative deconvolution. Psychophysiology 47(4), 647–658 (2010)
5. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
6. Cram, J.R., Steger, J.C.: EMG scanning in the diagnosis of chronic pain. Biofeed-
back Self-regul. 8(2), 229–241 (1983)
7. Greco, A., Marzi, C., Lanata, A., Scilingo, E.P., Vanello, N.: Combining electro-
dermal activity and speech analysis towards a more accurate emotion recognition
system. In: 2019 41st Annual International Conference of the IEEE Engineering in
Medicine and Biology Society (EMBC), pp. 229–232 (2019)
8. Greco, A., Valenza, G., Lanata, A., Scilingo, E.P., Citi, L.: cvxEDA: a convex
optimization approach to electrodermal activity processing. IEEE Trans. Biomed.
Eng. 63(4), 797–804 (2016)
9. Haque, M.A., et al.: Deep multimodal pain recognition: a database and comparison
of spatio-temporal visual modalities. In: 2018 13th IEEE International Conference
on Automatic Face & Gesture Recognition (FG 2018), pp. 250–257. IEEE (2018)
10. Hinduja, S., Canavan, S., Kaur, G.: Multimodal fusion of physiological signals and
facial action units for pain recognition. In: 2020 15th IEEE International Confer-
ence on Automatic Face and Gesture Recognition (FG 2020), pp. 577–581. IEEE
(2020)
11. Jones, M.J., Viola, P., et al.: Robust real-time object detection. In: Workshop on
statistical and computational theories of vision, vol. 266, p. 56 (2001)
12. Lim, H., Kim, B., Noh, G.J., Yoo, S.K.: A deep neural network-based pain classifier
using a photoplethysmography signal. Sensors 19(2), 384 (2019)
13. Lopez-Martinez, D., Peng, K., Lee, A., Borsook, D., Picard, R.: Pain detection with
FNIRS-measured brain signals: a personalized machine learning approach using the
wavelet transform and bayesian hierarchical modeling with dirichlet process pri-
ors. In: 2019 8th International Conference on Affective Computing and Intelligent
Interaction Workshops and Demos (ACIIW), pp. 304–309. IEEE (2019)
14. Lucey, P., Cohn, J.F., Prkachin, K.M., Solomon, P.E., Matthews, I.: Painful data:
The UNBC-McMaster shoulder pain expression archive database. In: 2011 IEEE
International Conference on Automatic Face & Gesture Recognition (FG), pp.
57–64. IEEE (2011)
15. Naeini, E.K., et al.: Pain recognition with electrocardiographic features in post-
operative patients: method validation study. J. Med. Internet Res. 23(5), e25079 (2021)
16. Salekin, M.S., Zamzmi, G., Goldgof, D., Kasturi, R., Ho, T., Sun, Y.: Multimodal
spatio-temporal deep learning approach for neonatal postoperative pain assess-
ment. Comput. Biol. Med. 129, 104150 (2021)
17. Shi, J., et al.: Good features to track. In: 1994 Proceedings of IEEE Conference on
Computer Vision and Pattern Recognition, pp. 8–10. IEEE (1994)
18. Terkelsen, A.J., Mølgaard, H., Hansen, J., Andersen, O.K., Jensen, T.S.: Acute
pain increases heart rate: differential mechanisms during rest and mental stress.
Auton. Neurosci. 121(1–2), 101–109 (2005)
19. Tomasi, C., Kanade, T.: Detection and tracking of point features. Int. J. Comput. Vis. 9,
137–154 (1991)

20. Velana, M., Gruss, S., Layher, G., Thiam, P., Zhang, Y., Schork, D., Kessler, V.,
Meudt, S., Neumann, H., Kim, J., Schwenker, F., André, E., Traue, H.C., Walter,
S.: The SenseEmotion database: a multimodal database for the development and
systematic validation of an automatic pain- and emotion-recognition system. In:
Schwenker, F., Scherer, S. (eds.) MPRSS 2016. LNCS (LNAI), vol. 10183, pp.
127–139. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59259-6_11
21. Walter, S., et al.: The BioVid heat pain database: data for the advancement and
systematic validation of an automated pain recognition system. In: 2013 IEEE
International Conference on Cybernetics (CYBCO), pp. 128–131. IEEE (2013)
22. Werner, P., Al-Hamadi, A., Limbrecht, K., Walter, S., Gruss, S., Traue, H.C.:
Automatic pain assessment with facial activity descriptors. IEEE Trans. Affect.
Comput. 8, 286–299 (2017). https://doi.org/10.1109/TAFFC.2016.2537327
23. Werner, P., Lopez-Martinez, D., Walter, S., Al-Hamadi, A., Gruss, S., Picard, R.:
Automatic recognition methods supporting pain assessment: a survey. IEEE Trans.
Affect. Comput. 1 (2019)
24. Williams, A.C.D.C.: Facial expression of pain: an evolutionary account. Behav.
Brain Sci. 25(4), 439–455 (2002)
25. Zamzmi, G., Pai, C.Y., Goldgof, D., Kasturi, R., Ashmeade, T., Sun, Y.: An app-
roach for automated multimodal analysis of infants’ pain. In: 2016 23rd Interna-
tional Conference on Pattern Recognition (ICPR), pp. 4148–4153. IEEE (2016)
26. Zhang, X., et al.: BP4D-Spontaneous: a high-resolution spontaneous 3D dynamic
facial expression database. Image Vis. Comput. 32(10), 692–706 (2014)
27. Zhi, R., Zhou, C., Yu, J., Li, T., Zamzmi, G.: Multimodal-based stream integrated
neural networks for pain assessment. IEICE Trans. Inf. Syst. 104(12), 2184–2194
(2021)
Electromyograph as a Tool for Patient
Feedback in the Field of Rehabilitation
and Targeted Muscle Training

Michal Prochazka(B) and Vladimir Kasik

VSB-Technical University of Ostrava, FEECS, K450, 17. Listopadu 15, Ostrava-Poruba, Czech Republic
{michal.prochazka,vladimir.kasik}@vsb.cz

Abstract. Diagnostic instruments are nowadays an integral part of medicine and aim to make physicians' work easier. The measured data are often poorly understood by the layperson, so it is important that the physician sufficiently explains the information obtained from the biosignal. However, sometimes this information is not necessarily important. When measuring electromyographic signals during rehabilitation or the training of athletes, rapid feedback is essential. This paper deals exclusively with the creation of a prototype electromyograph measurement chain with a quick and simple presentation of the electromyographic signal for the layperson. Signal pre-processing is discussed, along with several presentation variants of the electromyographic signal, such as acoustic output, lighting of an LED by the EMG, visualization of the EMG signal intensity on a cascade of LEDs, and playback of a selected sound when the set intensity of the electromyographic signal is exceeded. The device has the ability to adjust the level of difficulty to monitor progress.

Keywords: sEMG · Audio output · Rehabilitation · Biosignal processing · bmeng DAU unit · Circuit design

1 Introduction

Electromyography (EMG) is a well-known and widely used diagnostic method for measuring the electrical activity of muscles.
The output from the electromyograph does not have to serve only for
subjective evaluation by a medical expert. Nowadays, attention is paid to meth-
ods and procedures in which the diagnostic part should be able to be performed
at least to some extent by an autonomous device or by the user/patient him-
self. Such efforts aim to lighten the burden on physicians and at the same time
improve the comfort of users. The conventional way of obtaining output infor-
mation from an electromyograph is in graphical form as a time course. This is
not always suitable for human assessment, as the chosen scale may cause the
observer to miss either some details or, on the contrary, the overall time frame
c The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
E. Pietka et al. (Eds.): ITIB 2022, AISC 1429, pp. 356–366, 2022.
https://doi.org/10.1007/978-3-031-09135-3_30
EMG as Feedback During Rehabilitation 357

of the biosignal. Another aspect is the fact that this form can be complicated
for assessment by the layperson.
However, for rehabilitation purposes or, e.g., targeted training of athletes,
effective and fast real-time feedback is essential. Here, other forms of
electromyograph output are useful, e.g., conversion to an acoustic signal or
another simplified, easier-to-perceive form of graphical output.
Biofeedback is a tool used for effective rehabilitation. The aim of biofeed-
back is to convey information to the patient in a simple way, for example about
their muscle activity. Our aim is to present a comprehensive aid for the field of
neurorehabilitation or for targeted training of athletes. Nowadays, good results
are achieved by aids that are equipped with an acoustic output. This type of
biofeedback has proven to be more beneficial and less stressful for patients than
visual and audiovisual biofeedback [13]. There are multiple ways to transform
information from an EMG signal into sound. For example, one presented approach
is to assign a tone color to a given channel. The patient has been shown to be able
to recognize which muscle is moving based on the tone [8]. Acoustic output has
been shown to be beneficial for biofeedback in rehabilitation [16], but it does
not allow for simple quantification of EMG intensity. We attempt to overcome
this drawback in our work by introducing several options to present the elec-
trical activity of muscles with different levels of difficulty. The presentation of
the EMG signal is enabled by simple visualization as well as acoustic output.
By monitoring the set difficulty level, the patient receives information about
the progress and benefits of rehabilitation. This motivates the patient. We also
propose a possible muscle training that uses the EMG signal intensity levels to
trigger the tone of a musical instrument.

2 Methods Used

People after a severe injury, learning to move their musculoskeletal system again,
need simple feedback as to whether the efforts made to move the limb are having
a progressive impact. Conventional EMG devices need a screen displaying the
measured signal to visualize the measured record. It is often difficult to explain to
a layperson how to read the plotted record and what information it carries. This
information may be superfluous for feedback. It is important to find a simple way
to give the patient feedback on their muscle activity [1,2,4,6,11,14,15]. Feedback
should carry with it positive motivation and a vision of progress.
The new solution of the electromyograph extends the device with an audio
output, simple visualization by lighting an LED, display of the EMG signal
intensity on a cascade of LEDs, and playback of a sound when the set EMG signal
intensity is exceeded (a fanfare or the sound of a musical instrument can be played).
Emphasis is placed on the easy transmission of information about muscle activity
to the patient. Motivation is also considered, to give the patient the strength to
continue rehabilitation. The audio output may be useful to assess the electrical
activity of the muscles from a different perspective. In order to obtain the most
faithful signal for the audio output, signal pre-processing must be implemented
in the analogue domain. To monitor the progress of rehabilitation, different levels are set to


light the LED and the cascade of LEDs and to play the audio. The device allows
the progress of rehabilitation to be recorded by tracking the change in difficulty level.

3 Technical Aspects of EMG and Equipment Requirements

The surface electromyographic signal varies in amplitude from 0.05 to 5 mV, with
a frequency range from 2 to 500 Hz [5,14] and the majority of the signal in the
frequency range of 20 to 200 Hz [18]. During muscle contraction, a pulse-like
electrical signal is produced by the synchronised activation of muscle fibres. This
pulse repeats at a frequency of 6 to 30 Hz [5,9,14]. The EMG signal pre-processing
filters must meet these requirements. This means setting the cut-off frequencies of
the filters to the mentioned frequency band and including a corresponding
amplifier in the measurement chain.

Fig. 1. Block diagram of biosignal pre-processing

An integral, although unwanted, part of the measured signal are interfering
signals called artefacts. In terms of origin, artefacts are divided into biological and
technical. Biological artefacts originate from the patient's own activity. Specif-
ically, in EMG, it is the electrical activity of the surrounding muscles, as well
as the motion artefact resulting from the mutual movement of the electrode and
the electrolyte. One way to remove a biological artefact is to capture the biological
activity itself separately and then subtract it from the signal. Another option is
sophisticated systems based on data analysis and classification (fuzzy systems,
machine learning, artificial intelligence). Technical artefacts are signals that do
not originate from the patient's own activity. A typical example is mains hum,
which can significantly distort the signal under examination due to capacitive
coupling. Other technical artefacts are different half-cell voltages on the elec-
trodes, drying of the electroconductive gel, and thermal and contact noise of the
components [3,10,12,17]. In the design of the EMG signal measurement chain,
artefacts must be suppressed as much as possible.
Biosignal pre-processing can be divided into several sub-blocks (Fig. 1).
As an input pre-amplifier, an instrumentation amplifier is most often used,
which with its high input impedance is suitable for sensing biological signals.
Next, a high-pass filter is used to remove the DC component, and the artifact
caused by the different half-cell potentials at the electrodes is also removed. An
isolation amplifier based on capacitive, inductive or optical coupling provides
galvanic isolation to ensure safety for the patient in the event of a failure. A
low-pass filter removes high-frequency interference. A band-pass filter is used
for analogue interference removal. If it is not necessary to visualize the signal
analogue, mains interference can be removed digitally using the latest algorithms
[7]. A variable gain amplifier provides the final amplification of the pre-processed
signal [2,5,10].
The interpretation of the EMG signal in this case means a simple visuali-
sation of the signal either optically or acoustically. For the creation of acoustic
output, well-known circuitry such as LM386 or voltage-to-frequency converter
circuitry (e.g. CD4046) can be used. LED illumination can be realized, for exam-
ple, by wiring a transistor in the form of a follower. The implementation of the
visualization of the EMG signal intensity on the LED cascade and the playback
of the sound for exceeding a certain EMG signal intensity will be approached dig-
itally. The level of EMG signal visualization can be easily changed by a variable
amplifier (penultimate block in the biosignal pre-processing).

4 Creating a Measurement Chain


According to the blocks in Fig. 1, the measurement chain for EMG signal pre-
processing is created step by step. The device is powered from a bmeng DAU
unit, which is an A/D converter with galvanic isolation; therefore, it is not
necessary to implement galvanic isolation in our solution. Because of its large
input impedance, the INA128 instrumentation amplifier was chosen. The gain
(G1) was set to approximately 10× according to formula (1), by placing a
5600 Ω resistor R1 between pins 1 and 8:

G1 = 1 + 50000 / R1    (1)

Active second-order Sallen-Key filters were chosen. These filters have a –6 dB
drop at the cutoff frequency followed by a –40 dB/dec roll-off, a reasonable
compromise between overshoot (higher-order filters) and insufficient attenuation
of unwanted frequencies. The cutoff frequencies are chosen according to the
frequency ranges given above: 23 Hz for the high-pass filter and 498 Hz for the
low-pass filter. The cutoff frequency (f_cut-off) is calculated using formula (2):

f_cut-off = 1 / (2πRC)    (2)

where R and C are the selected component values for the filter. The notch filter
for removing mains interference is an active second-order design; its centre
frequency is also calculated according to formula (2). The resulting frequency
response is shown in Fig. 2.
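Formula (2) can be verified numerically; the component values below are our assumptions, chosen only to reproduce the stated cut-off frequencies:

from math import pi

def f_cutoff(r, c):
    """Sallen-Key cut-off frequency, formula (2): f = 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * pi * r * c)

# hypothetical RC pairs reproducing the design targets
print(round(f_cutoff(r=6800, c=1.0e-6), 1))   # ~23.4 Hz (high-pass)
print(round(f_cutoff(r=3200, c=100e-9), 1))   # ~497.4 Hz (low-pass)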

Fig. 2. Frequency characteristic of EMG module pre-processing

For final amplification, a non-inverting operational amplifier circuit with a
potentiometer in the feedback loop is used. The gain can be varied from 25 to
125. The set gain (G2) can be expressed according to formula (3):

G2 = 1 + R2 / R1    (3)

where R2 is the feedback resistance and R1 is the resistance from the inverting
input to ground. The signal pre-processing stage can measure input signals from
0.05 to 11 mV. The difficulty is set by changing the gain, which is divided into
6 levels. The first level amplifies the differential input signal at the electrodes
1281 times; the sixth level amplifies it 218 times. With lower amplification, it is
more challenging for the patient to perform the selected biofeedback presentation.
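A short numeric check of formulas (1) and (3) against the stated difficulty levels; only the end-point total gains (about 1281× at level 1 and 218× at level 6) come from the text, so the resistor values are illustrative assumptions:

def g1(r1):
    """INA128 preamplifier gain, formula (1): G1 = 1 + 50000 / R1."""
    return 1 + 50_000 / r1

def g2(r2, r1):
    """Non-inverting final-stage gain, formula (3): G2 = 1 + R2 / R1."""
    return 1 + r2 / r1

pre = g1(5600)                                  # ~9.9x, as in the prototype
for level, r2 in ((1, 128_000), (6, 21_000)):   # hypothetical pot positions
    print(f"level {level}: total gain ~{pre * g2(r2, 1000):.0f}x")
# -> level 1: ~1281x, level 6: ~218x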

5 EMG Signal Presentation


The output of the pre-processed EMG signal feeds six possible variants of
EMG signal presentation:

1. Audio output.
2. Visualization by LED lighting.
3. Visualisation of the EMG signal intensity on cascade of LEDs.
4. Audio playback after exceeding a certain EMG signal intensity.
5. EMG signal measurement (a PC and the dedicated program for this device are required).
6. Saving EMG signal to SD card (for later analysis).

The first four options are designed for simple feedback during rehabilitation.
The last two options are for medical purposes and for subsequent analysis of the
measured EMG. The audio output is realized by wiring the LM386 as an audio
amplifier.

Fig. 3. Recorded audio output sound

Figure 3 shows the sound recorded by a microphone. A recording of the biceps
was made with and without a weight on the arm. The result shows that the
volume of the audio output is higher when a more intense EMG signal develops.

The LED illumination is created by wiring a transistor as a follower. A
microcontroller is included in the processing chain to visualize the EMG signal
intensity on the cascade of LEDs and to play audio once a set EMG signal
intensity is reached. For the visualization of the EMG signal on the LED cascade,
the maximum range of the EMG signal is set first; the instantaneous intensity
is then expressed relative to this maximum (Fig. 4).

Fig. 4. EMG intensity presentation on the LED cascade
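The microcontroller logic behind the cascade display and the threshold-triggered playback can be outlined as below; N_LEDS, the threshold value, and the set_leds/play_sound stubs are hypothetical stand-ins for the actual GPIO and SD-card routines:

N_LEDS = 8          # assumed length of the LED cascade
THRESHOLD = 0.7     # relative intensity that triggers sound playback

def set_leds(n_lit):
    """Stub for driving the LED cascade (replace with GPIO calls)."""
    print("LEDs:", "#" * n_lit + "." * (N_LEDS - n_lit))

def play_sound():
    """Stub for playing the selected file from the SD card."""
    print("fanfare!")

def update(intensity, maximum):
    """Express the EMG intensity relative to the pre-set maximum,
    light the corresponding LEDs, and play a sound above the threshold."""
    rel = min(intensity / maximum, 1.0)
    set_leds(round(rel * N_LEDS))
    if rel >= THRESHOLD:
        play_sound()

update(intensity=0.8, maximum=1.0)   # lights 6 of 8 LEDs and plays the sound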

For sound playback after the EMG signal intensity is reached, a threshold
is initially set, after which the selected sound is played. The audio for playback
is stored on the SD card. It is possible to choose a motivating sound (a fanfare).
Another option is to play the tone of a musical instrument.
The patient can play melodies of one tone. By connecting other devices with
different tones, it is possible to play a melody of several tones. By playing, the
patient thus exercises to stimulate his muscles, which has a positive impact on
the outcome of rehabilitation. For this exercise, the patient is presented with
sheet music (Fig. 5).

Fig. 5. Single tone melody for EMG device

The block diagram (Fig. 6) shows the complete measurement chain.


Fig. 6. Block diagram of the implemented pre-processing of the EMG signal and its possible visualization

Fig. 7. EMG biofeedback module

The proposed device (Fig. 7) has buttons on the top left side for setting the desired EMG signal presentation. The first button selects between visual and audio presentation. The second button selects between LED lighting and the LED cascade in the case of visual presentation. The third button, in the case of audio presentation, selects between the audio of the EMG signal and playback of the audio when the threshold is crossed. In the top middle there is a difficulty setting from level 1 to level 6, where level 1 amplifies the signal 1281 times and level 6 amplifies it 218 times. As the difficulty level increases, the amplification decreases, so the patient must produce stronger muscle activity to present the EMG signal. On the top right side, there is the visual presentation of the EMG signal, i.e., the LED and the LED cascade. A note holder is located at the top. On the right side there is an electrode input in the form of a 3.5 mm jack connector. In the back there is an SD card slot and a connector for connecting a bmeng DAU unit. At the bottom there is a speaker.

6 Discussion
The prototype was tested for functionality on ten volunteers. With the difficulty
level 1 setting, it could be observed that the presentation of the EMG signal was
activated at low effort. In contrast, at the level 6 setting, increased effort had to
be exerted to achieve the desired biofeedback. A shortcoming of the device is its
limitation to only one channel. For future work, the inclusion of at least two
channels is considered, which will open up possibilities to include other
biofeedback options in the form of games. Future work will also compare the
different biofeedback modes in terms of the benefit of faster recovery, and the
different presentations of the EMG signal will be compared in terms of user
concentration. Furthermore, the time spent with each biofeedback mode will be
calculated to evaluate the user's most used EMG signal presentation. It is also
planned to connect the device to a phone, where the EMG signal level will
control a simple game.

7 Conclusion
The main aim of the work was to create and present a new prototype with
several possibilities of EMG signal presentation for the general public for use
in rehabilitation or for targeted training of athletes. The problem of analogue
pre-processing of biosignals is discussed in detail. In the creation of analogue pre-
processing, EMG signal parameters (for setting the cutoff frequencies of filters
and sufficient amplification) are included. The device is powered by a galvani-
cally isolated A/D converter bmeng DAU unit. For this reason, galvanic isolation
on the board is not implemented. Four options for simple EMG signal presenta-
tion are considered. The first option is an acoustic signal derived directly from the
EMG signal. The recording resembles the dry crackling of wood. As the intensity of the
EMG signal increases, the volume of the audio output increases (Fig. 3). The
second option is to visualize the intensity of the EMG signal on an LED. By
clenching the muscle, the LED lights up. Again, at higher EMG signal intensi-
ties the LED glows more. The other two visualization options are handled by
the microcontroller. The third option displays the EMG signal intensity on the
cascade LEDs (Fig. 4). The last option presents the EMG signal as a playback
of the sound. When the set level of the EMG signal is exceeded, the selected
sound is played; it can be a sound like a fanfare or, for example, the tone of a
musical instrument. The change of level can be set by the variable gain amplifier.
The device will facilitate the work of doctors and nurses. It also presents simple
visualizations of the EMG signal for the patient, who will use the device
especially for feedback on their efforts during training.

References
1. Cases, C.M.P., Baldovino, R.G., Manguerra, M.V., Dupo, V.B., Dajay, R.C.R.,
Bugtai, N.T.: An EMG-based gesture recognition for active-assistive rehabilita-
tion. In: 2020 IEEE 12th International Conference on Humanoid, Nanotechnology,
Information Technology, Communication and Control, Environment, and Manage-
ment (HNICEM), pp. 1–6. IEEE (2020)
2. Chan, B., Saad, I., Bolong, N., Siew, K.E.: A review of surface EMG in clinical
rehabilitation care systems design. In: 2021 IEEE 19th Student Conference on
Research and Development (SCOReD), pp. 371–376. IEEE (2021)
3. Darak, B.S., Hambarde, S.: A review of techniques for extraction of cardiac artifacts
in surface EMG signals and results for simulation of ECG-EMG mixture signal.
In: 2015 International Conference on Pervasive Computing (ICPC), pp. 1–5. IEEE
(2015)
4. Gotuzzo, J., Vu, S., Dee, S., George, K.: Electromyography based orthotic arm and
finger rehabilitation system. In: 2018 IEEE International Conference on Healthcare
Informatics (ICHI), pp. 338–339. IEEE (2018)
5. Kieliba, P., Tropea, P., Pirondini, E., Coscia, M., Micera, S., Artoni, F.: How are
muscle synergies affected by electromyography pre-processing? IEEE Trans. Neural
Syst. Rehabil. Eng. 26(4), 882–893 (2018)
6. Kim, Y.H., Kim, S.J., Shim, H.M., Lee, S.M., Kim, K.S.: A method for gait reha-
bilitation training using EMG fatigue analysis. In: 2013 International Conference
on ICT Convergence (ICTC), pp. 52–55. IEEE (2013)
7. Malboubi, M., Razzazi, F., Sh, M.A., Davari, A.: Power line noise elimination
from EMG signals using adaptive laguerre filter with fuzzy step size. In: 2010 17th
Iranian Conference of Biomedical Engineering (ICBME), pp. 1–4. IEEE (2010)
8. Matsubara, M., Terasawa, H., Kadone, H., Suzuki, K., Makino, S.: Sonification of
muscular activity in human movements using the temporal patterns in EMG. In:
Proceedings of the 2012 Asia Pacific Signal and Information Processing Association
Annual Summit and Conference, pp. 1–5. IEEE (2012)
9. Merletti, R., Farina, D.: Surface Electromyography: Physiology, Engineering, and
Applications. Wiley, Hoboken (2016)
10. Merletti, R., Parker, P.J.: Electromyography: Physiology, Engineering, and Non-
invasive Applications, vol. 11. Wiley, Hoboken (2004)
11. Poonsiri, J., Charoensuk, W.: Surface EMG based controller design for knee reha-
bilitation devices. In: The 4th 2011 Biomedical Engineering International Confer-
ence, pp. 131–134. IEEE (2012)
12. Preston, D.C., Shapiro, B.E.: Electromyography and neuromuscular disorders
e-book: clinical-electrophysiologic correlations (Expert Consult-Online). Elsevier
Health Sciences (2012)
13. Rastogi, R., et al.: Which one is best: Electromyography biofeedback, efficacy anal-
ysis on audio, visual and audio-visual modes for chronic TTH on different character-
istics. Int. J. Comput. Intell. IoT 1(1), 25–31 (2018)

14. Sheng, G., Wang, L., Ma, D., Fan, F., Niu, H.: The design of a rehabilitation train-
ing system with EMG feedback. In: 2012 International Conference on Biomedical
Engineering and Biotechnology, pp. 917–920. IEEE (2012)
15. Suhaimi, R., et al.: Analysis of EMG-based muscles activity for stroke rehabili-
tation. In: 2014 2nd International Conference on Electronic Design (ICED), pp.
167–170. IEEE (2014)
16. Tsubouchi, Y., Suzuki, K.: Biotones: a wearable device for EMG auditory biofeed-
back. In: 2010 Annual International Conference of the IEEE Engineering in
Medicine and Biology, pp. 6543–6546. IEEE (2010)
17. Weiss, J.M., Weiss, L.D., Silver, J.K.: Easy EMG e-book: A Guide to Perform-
ing Nerve Conduction Studies and Electromyography. Elsevier Health Sciences,
Philadelphia (2022)
18. Winter, D.A.: Biomechanics and Motor Control of Human Movement. Wiley, Hobo-
ken (2009)
Touchless Pulse Diagnostics Methods
and Devices: A Review

Anna Pająk(B) and Piotr Augustyniak

Department of Biocybernetics and Biomedical Engineering, AGH University of Science and Technology, Mickiewicza Avenue Building C-3, 30-059 Krakow, Poland
{annapaja,august}@agh.edu.pl

Abstract. Noninvasive monitoring of human vital parameters is a widely studied topic. Scientists and engineers create many devices with telemedicine applications. In everyday life, people also use gadgets that perform noninvasive measurements (e.g., heart rate measurement in a smartwatch). In addition to medical diagnostics, such measurements are also used to monitor progress in sports. This paper presents a literature review on the noninvasive measurement of human vital signs. Our goal is to present methods that allow monitoring of human vital parameters and to provide examples of applications in constructed devices.

Keywords: Noninvasive measurements · Videoplethysmography · Photoplethysmography · Radar · Wearable devices

1 Introduction
The necessity for the continuous measurement of human vital signs exists as long
as medicine. An important aspect of the use of such measurement is not only the
ability to react quickly in the event ofI a life- or health-threatening situation,
but also to analyse the behaviour of the human body under certain conditions
and contribute to development of knowledge.
The acquisition methods can be divided into invasive and noninvasive. Examples of invasive measurements are vascular and cardiac catheterization [1], amniocentesis [2], and biopsy [3]. Although these methods are accurate, they involve patient stress [2], must be performed under appropriate, preferably sterile, conditions, and require supervision by qualified personnel during the measurement.
In this article we address noninvasive measurements. They are an increasingly competitive alternative to invasive testing. During noninvasive measurements the skin is not broken, and the achieved results allow for a correct diagnosis [4]. Among noninvasive methods we focus on touchless data acquisition due to its possible distant use, easier setup, and lack of hygienic issues. The parameters used to make such a measurement are the optical and mechanical properties of the patient's body.

Noninvasive methods of measurement also allow for continuous monitoring of vital signs under natural conditions. The advantage of this type of measurement is primarily the ability to provide medical services outside the hospital.

2 Methods of Noninvasive Measurements


2.1 Photoplethysmography

Photoplethysmography (PPG) uses the variability of light absorption through the skin [5]. It is used primarily in measurements of heart rate, body temperature, and blood pressure [6], and in heart rate variability analysis [5]. PPG has been widely used in emergency medicine for years (pulse oximeters applied to the finger or auricle) and has entered common use thanks to smart watches and other smart devices [6].
During PPG measurement, monochromatic light is analysed after passing through the human skin. Depending on where the device is located, the light beam is analysed as it is reflected inside the skin [7] or as it passes through, e.g., the finger [6]. The absorption capacity of the test object depends on the oxygenation and blood volume in the arterial vessel. Based on the absorption-time dependence curve, it is possible to study the pulse wave flow and related parameters (Fig. 1). More commonly used in diagnosis is the second derivative of the PPG signal (the acceleration photoplethysmogram), which reflects the acceleration of the blood [5].

Fig. 1. Example of the pulse wave flow recording (the arrow indicates the dicrotic
waveform) [8]
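As an illustration of the processing described above, the following minimal sketch (our own, not taken from the cited works) derives the acceleration photoplethysmogram from a sampled PPG trace; the function name, the smoothing cutoff, and the filter order are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def acceleration_ppg(ppg, fs):
    """Second derivative of a PPG trace (acceleration photoplethysmogram).

    ppg: 1-D array of raw PPG samples; fs: sampling rate in Hz.
    The 8 Hz low-pass cutoff is an illustrative choice, not a standard.
    """
    # Smooth first: numerical differentiation strongly amplifies noise.
    b, a = butter(4, 8.0 / (fs / 2.0), btype="low")
    smoothed = filtfilt(b, a, ppg)
    # Two successive derivatives of the absorption-time curve give the APG.
    velocity = np.gradient(smoothed, 1.0 / fs)
    return np.gradient(velocity, 1.0 / fs)
```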

Besides cardiovascular parameters, the PPG signal also contains information about the respiration rate. During respiration, the frequency, intensity, amplitude, and statistical characteristics of the signal change, paving the way to expanded diagnosis [5].
The obstacles to accurate determination of human vital signs using PPG are
primarily motion artifacts and ambient noise [5]. They can easily disturb the
measurement and make the data misleading if not useless.
Table 1 summarizes the heart rate determination errors reported in the analysed articles.
Table 1. Summary of heart rate measurement errors obtained in selected articles

Paper              Error in heart rate determination [%]
Hong et al. [7]    0.76
Nabeel et al. [9]  0.14

2.2 Videoplethysmography
Videoplethysmographic (VPG) measurement is usually performed with a commercially available camera [10,11]. The VPG method is used to detect heart rate and to search for cardiac disorders (e.g., atrial fibrillation [12]), and can collect the data needed for heart rate variability analysis [13]. Studies have also obtained blood pressure information from the video signal [14].
To perform a usable VPG measurement, it is important to choose appropriate lighting [15]. According to [16], the most favorable results are obtained for warm light colours. Because of differences in the absorption of the incident light, the skin tone must also be considered. In the RGB colour space, for people with lighter skin the green component of the signal shows higher variability, whereas for people with darker skin the red component was found more suitable for analysis [17].
The place most commonly chosen for measurement is the face, due to its availability and the good quality of the signal obtained from it. However, there are also studies using footage taken from other body parts; tests have been carried out on the chest [16] and the neck [18].
The VPG method has some limitations related to the use of visible light. The presence of shadows on the test object, inadequate illumination, and motion (of the camera or the examined person) interfere with the results, so that the analysed signal may be nondiagnostic [19].
In the VPG method, during or after a correctly performed video recording, the program should determine a skin region (e.g., using the Viola-Jones algorithm or Haar cascades) [20]. The next step is to split the signal contained in the area of interest into chromatic components [11] or convert it to grayscale [15]. The analysis of the obtained video is based on the principal component analysis algorithm [19], the independent component analysis algorithm [11], or an autoregressive model [21]. After correct acquisition and processing, the signal should exhibit frequency variability corresponding to the pulse rate of the examined person.
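A simplified sketch of this pipeline, under stated assumptions (OpenCV's stock Haar cascade for face detection, the green channel as the pulse carrier, and an FFT peak search instead of PCA or ICA), might look as follows; it is an illustration, not the implementation of any cited study.

```python
import cv2
import numpy as np

# Stock frontal-face Haar cascade shipped with OpenCV.
face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def heart_rate_from_video(path, fps):
    """Rough heart rate (bpm) from the mean green value of the face ROI."""
    cap = cv2.VideoCapture(path)
    green = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_detector.detectMultiScale(gray, 1.3, 5)
        if len(faces) == 0:
            continue  # skip frames without a detected face
        x, y, w, h = faces[0]
        # Mean of the green channel over the face ROI (OpenCV uses BGR order).
        green.append(frame[y:y + h, x:x + w, 1].mean())
    cap.release()
    sig = np.asarray(green) - np.mean(green)
    spectrum = np.abs(np.fft.rfft(sig * np.hanning(len(sig))))
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fps)
    # Keep only plausible pulse frequencies (0.7-4 Hz, i.e. 42-240 bpm).
    band = (freqs >= 0.7) & (freqs <= 4.0)
    return 60.0 * freqs[band][np.argmax(spectrum[band])]
```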
Table 2 summarizes the measurement errors obtained with the VPG method relative to the reference method.

Table 2. Summary of heart rate measurement errors obtained in selected articles

Paper                            Error in heart rate determination [%]
Poh et al. [11] (green channel)  9.52
Poh et al. [11] (with ICA)       3.63
2.3 Laser Pulse Measurements


In [22] a noninvasive, contactless system was proposed to measure arterial pulse waveforms using a laser triangulation method able to detect skin surface vibration. The experimental arterial pulsation measurement system employs an inexpensive laser diode, a magnet coil translator, and an image sensor. The obtained results were satisfactory for traditional Chinese medicine (although the absolute pulse rate error was not specified), and the authors claim to capture displacements of the radial artery near the wrist on the order of 48 µm.
In [23] a laser Doppler vibrometer was proposed to measure the blood pulse during sleep from the carotid artery. Four AI classifiers were employed to recognize the blood pulse from the sensor signal. Although consistent, all results were rather disappointing: the median rhythm error of 7.8 bpm compared to the ECG reference is beyond the interest of any medical or lifestyle application.
Both examples of laser-based pulse detectors have serious drawbacks: (1) they require stable positioning of the patient for 10 to 30 min, and (2) the laser light may be harmful to the eyes.

2.4 Radar Measurements


The very first measurements of human vital parameters with radar were performed in the 1970s [24]. Such measurement methods are still being developed for detecting heartbeat [25,26] and breathing [27,28] frequency and for determining the number of people present [28].
Radar operation focuses on the interaction of radio waves with objects in space [29], which allows measurements to be made regardless of lighting conditions. Waves sent out at a known frequency bounce off the tested person and nearby objects. The signal received by the receiver is analysed (envelope analysis [30], band-pass filtering [27], application of the fast Fourier transform) to extract information about the presence of a human in the radar range and to determine the frequency of breathing [27] or heartbeat [25].
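The band-pass-and-FFT step mentioned above can be sketched as follows (our own minimal illustration, not the processing chain of any cited system); the respiration band limits of 0.1–0.5 Hz are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def respiration_rate(baseband, fs):
    """Estimate breaths per minute from a real-valued radar baseband signal.

    The 0.1-0.5 Hz pass band (6-30 breaths/min) is an illustrative choice.
    """
    # Band-pass filter isolates the chest-motion component.
    b, a = butter(3, [0.1 / (fs / 2.0), 0.5 / (fs / 2.0)], btype="band")
    filtered = filtfilt(b, a, baseband)
    # The dominant spectral peak is taken as the breathing frequency.
    spectrum = np.abs(np.fft.rfft(filtered * np.hanning(len(filtered))))
    freqs = np.fft.rfftfreq(len(filtered), d=1.0 / fs)
    return 60.0 * freqs[np.argmax(spectrum)]
```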
The attenuation caused by walls and large objects (doors, windows, refrigerators) [31] and signal interference caused by noise [30] must be considered when designing a system for measuring vital signs with radar. The authors of [25] attempted to increase the signal-to-noise ratio (SNR) for an ultra-wideband Doppler radar. They extended the technique of complex signal demodulation. The purpose of this procedure was to reduce the influence of interference from harmonics of the parameters under study. The SNR was increased from about 20 dB to about 32 dB.
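The exact algorithm of [25] is not reproduced here; the sketch below only illustrates the textbook idea behind complex signal demodulation, i.e., combining the quadrature channels into one complex signal before spectral analysis, together with the SNR-in-dB formula behind the figures quoted above.

```python
import numpy as np

def complex_demodulation_spectrum(i_ch, q_ch, fs):
    """Spectrum of the complex baseband signal I + jQ.

    Using both quadrature channels avoids the null points of
    single-channel demodulation; this is only the basic idea,
    not the extended technique of the cited paper.
    """
    x = i_ch + 1j * q_ch
    x = x - np.mean(x)  # remove the DC offset
    spectrum = np.abs(np.fft.fft(x * np.hanning(len(x))))
    freqs = np.fft.fftfreq(len(x), d=1.0 / fs)
    return freqs, spectrum

def snr_db(signal_power, noise_power):
    # SNR in decibels, as in the quoted 20 dB -> 32 dB improvement.
    return 10.0 * np.log10(signal_power / noise_power)
```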
In the reviewed articles, the achieved results were satisfactory and allow the determination of respiratory or heart rate with good accuracy. Table 3 compiles a comparison of the results from the individual papers.

Table 3. Summary of respiration and heart rate measurement errors obtained in selected articles

Paper                    Error in respiration rate    Error in heart rate    Distance [m]
                         determination [%]            determination [%]
Ren et al. [25]          NA                           1.5                    0.8
Regev et al. [28]        0.67                         NA                     1–2
Michahelles et al. [29]  NA                           5                      0.15

2.5 Other Non-Electrical Wearable Sensors

In recent years many wearable devices have been made for noninvasive measurements (smart watches, smart glasses, smart bracelets) [32]. They are used for both medical and lifestyle purposes. Besides the methods referenced in Chap. 2, gyroscopes [33,34], magnetometers [34], and accelerometers [35] are briefly mentioned below. These wearables may also be considered touchless, as they operate without direct contact with the patient's skin. Accelerometers are the most popular among such devices. Human activity produces acceleration of the legs, arms, hands, or head. Movements can be related to illnesses [35] or natural activity (running [34], sitting, walking stairs [36], etc.).
Freedom from electrical contact, electromagnetic interference, complicated setups, and hygienic issues is a common advantage of touchless pulse acquisition methods. An example study on the influence of electrode fabrication on the quality of examination is given in [37].

3 Data Analysis Methods


3.1 Simple Unimodal Scenarios
The scope of research and development in the area of wearable devices covers not only mechanical construction but also software implementation and data analysis. Depending on the method and device used in the measurement, the embedded software collects and preprocesses the data. Then the analysis takes place. To improve the results, scientists use statistical analysis. One such method is principal component analysis with coordinate transformation [38], which serves to eliminate the effect of orientation variations. Another statistical method is independent component analysis or principal component analysis [21], mentioned in Sect. 2.2. It allows a mixed signal to be split into uncorrelated components.
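As a toy illustration of such a decomposition (a sketch of ours, not the procedure of any cited paper), FastICA from scikit-learn can separate simultaneously recorded channels, e.g., the mean R, G, B traces of a VPG recording, into statistically independent components:

```python
import numpy as np
from sklearn.decomposition import FastICA

# rgb_traces: (n_samples, 3) array with the mean R, G, B values of the
# region of interest in consecutive frames; random placeholder data here.
rng = np.random.default_rng(0)
rgb_traces = rng.standard_normal((512, 3))

ica = FastICA(n_components=3, random_state=0)
sources = ica.fit_transform(rgb_traces)  # (n_samples, 3) independent components
# Each recovered component can then be screened for periodicity in the
# plausible pulse band, e.g. with the FFT-based peak search shown earlier.
```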
Another approach to the analysis is shown by Huang et al. in [39]. They used the Data Density Functional Method (DDFM) on data collected from a tri-axial gravitational sensor. The DDFM makes it possible to correctly recognize the motion pattern. The best results were reached for position changes: lying face-up – turn left, and lying face-up – turn right.
Wearable devices used to recognize the activity being performed are aided by neural networks. In the case of [40], the network replenishes data lost during acquisition. The model was trained on data with randomly missing samples. During the experiment, the authors tested their algorithm using data from the Human Activity Sensing Consortium and an accelerometer mounted on the chest. They achieved better results in recognizing activities than with a network trained on good-quality data (the differences reached 4 percentage points) [40].
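The idea of training with randomly missing data can be sketched as a generic denoising-style setup (our own illustration, not the architecture of [40]): a random mask zeroes out part of each input window, and the network learns to reconstruct the complete window.

```python
import torch
import torch.nn as nn

# Generic reconstruction network for sensor windows of length 128;
# the layer sizes, mask rate, and learning rate are all illustrative.
net = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 128))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(window):  # window: float tensor of shape (batch, 128)
    mask = (torch.rand_like(window) > 0.2).float()  # drop ~20% of samples
    corrupted = window * mask  # simulate data lost during acquisition
    opt.zero_grad()
    loss = loss_fn(net(corrupted), window)  # learn to replenish missing values
    loss.backward()
    opt.step()
    return loss.item()
```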
3.2 Multimodal Monitoring Systems


Multimodal monitoring systems are also emerging among the continuous monitoring proposals. They are mainly used to record the patient's whereabouts [30,41] and to collect data from wearable sensors [42]. Such systems use multiple sensors, usually with different operating principles. In [43] the authors connected the following sensors in their system: infrared (for movement detection), pressure (in the bed and sofa), magnetic (for door-opening detection), and temperature (joined to each previously mentioned sensor).
The main challenge in using such measuring systems is safe data circulation. Because of the sensitivity of the data, only medical staff, the measured person, and possibly family members should be authorized to access it. In [44] a blockchain technique for pervasive social network-based healthcare was proposed. The pervasive social network serves to share data with authorized persons, while the blockchain technique uses cryptographic and mathematical operations to encrypt the data. The main advantages of this technique are its efficiency in encrypting the data and its low cost. The authors used an improved version of the IEEE 802.15.6 protocol, which puts a lower computational burden on the resource-limited sensor node [44].
Chen et al. point out another important issue: the psychological state of the measured person [45]. They proposed a system that measures physiological parameters and simultaneously analyses the emotional status of the patient. After analysing the sensor signals, the algorithm interprets the status as commands for the assisted living environment, e.g., changing the light in the room or heating or cooling the apartment, depending on the psychological condition of the person. A system like this is holistic and allows taking care of both the patient's body and mind.

4 Discussion
In this article we analyzed selected touchless measuring methods and devices, focusing on the blood pulse. There are several other ways to collect data on human vital signs noninvasively (e.g., augmented reality, intelligent virtual agents [46]), and they can be a topic for further research.
We focused on PPG, VPG, laser, and radar measurements. The main advantages of each are the possibility of continuous and noninvasive (or even contactless) measurement. The best results were reached for PPG: the error in heart rate estimation was 0.14% [9] and 0.76% [7]. The biggest errors were found for VPG measurements: 3.63% [11] and 9.52% [11]. This can be attributed to the longer history of PPG use in medicine (the common use of pulse oximeters in emergency medicine and in hospitals). VPG has existed for a much shorter time; moreover, some methods are designed for distant measurements (up to 10 m). Laser measurement was found the least mature of this group. To make things worse, it uses potentially harmful light, requires patient immobility for a considerable amount of time, and offers poor accuracy. In radar measurements, the respiration rate estimate has an error at the level of 0.67% [28], and the heart rate errors of 1.5% [25] and 5% [29] are fair for home care.
PPG is important for diagnostics because of the many parameters that can be extracted from the signal (blood pressure, respiration rate, etc.). The acquisition of the signal is simple, fast, and independent of electrical artifacts. A disadvantage of this measuring method is motion sensitivity: the patient's movements or ambient noise can easily disturb the measurement.
VPG can be used for much more than just heart rate detection. The procedure of taking the measurement allows for simultaneous monitoring of the subject's movements or facial expression. When the visible-light detector is supported by an infrared detector, temperature and respiration rate can also be measured. The variety of colour spaces opens a wide range of possibilities to analyze the obtained video and further parameters. The disadvantage of this method is its dependence on adequate illumination: shadows on the measured region interfere with the results and may degrade the accuracy of the final readout.
Radar measurement is well suited for long-distance acquisition. In the researched papers, the detector was placed 0.15 to 2 m from the measured person [25,28,29]. A big advantage of this method is the possibility of measuring an object located out of sight, so heart or respiration rate can be detected when the patient is in another room. A weakness of radar measurements is the significant attenuation of the wave when a large object such as a refrigerator or wardrobe is present in the vicinity of the subject. Intense electromagnetic noise is also harmful to data acquisition.
For parameters measured by each of these methods (e.g., heart rate), it is possible to validate their correctness against each other. Using the idea described in [44] or [45], each sensor may exchange information in a home network. A system composed of PPG, VPG, and radar measurements performs noninvasive and seamless pulse tracking, relieving the patient from having to remember to take measurements. An interesting option is to add wearable sensors like those mentioned in Sect. 3.1. The design of systems measuring human vital signs is still being expanded with new measurement and data acquisition methods and provides ample room for improvement.

Acknowledgement. Research supported by the AGH University of Science and Technology in 2022 from the subvention granted by the Polish Ministry of Science and Higher Education; grant no. 16.16.120.773.

References
1. Mazurek, T.: Wskazania diagnostyczne do cewnikowania jam serca, zasady zabiegu,
pp. 202–205. Gdańsk, Via Medica (2013)
2. Siedlecka, A., Ciach, K., Świątkowska-Frerund, M., Preis, K.: Fear related to amniocentesis as a method of invasive prenatal diagnosis. GinPolMedProject 4(18), 38–43 (2010)
3. Pawełczyk, K., Marciniak, M., Kołodziej, J.: Invasive diagnostics of thoracic malignant diseases. Adv. Clin. Exp. Med. 13(6), 1067–1072 (2004)
4. Swora, E., Stankowiak-Kulpa, H., Marcinkowska, E., Grzymisławski, M.: Clinical aspects of diagnostics in Helicobacter pylori infection. Nowiny Lekarskie 78(3–4), 228–230 (2009)
5. Castaneda, D., Esparza, A., Ghamari, M., Soltanpur, C., Nazeran, H.: A review on wearable photoplethysmography sensors and their potential future applications in health care. Int. J. Biosens. Bioelectron. 4(4), 195–202 (2018)
6. Celka, P., Charlton, P.H., Farukh, B., Chowienczyk, P., Alastruey, J.: Influence
of mental stress on the pulse wave features of photoplethysmograms. Healthcare
Technol. Lett. 7(1), 7–12 (2020)
7. Hong, S., Park, K.S.: Unobtrusive photoplethysmographic monitoring under the
foot sole while in a standing posture. Sensors 18, 3239 (2018)
8. Prokop, D.: Zastosowanie wieloczujnikowego optoelektronicznego systemu pomi-
arowego do badania przebiegów fali tętna (2017)
9. Nabeel, P.M., Jayaraj, J., Mohansankar, S.: Single-source PPG based local pulse
wave velocity measurement: a potential cuffless blood pressure estimation tech-
nique. Physiol. Meas. 38(12), 2122–2140 (2017)
10. Poh, M.-Z., McDuff, D.J., Picard, R.W.: Advancements in noncontact, multipa-
rameter physiological measurements using a webcam. IEEE Trans. Biomed. Eng.
58(1), 7–11 (2011)
11. Poh, M.-Z., McDuff, D.J., Picard, R.W.: Non-contact, automated cardiac pulse
measurements using video imaging and blind source separation. Opt. Exp. 18(10),
10762–10774 (2010)
12. Couderc, J.-P., et al.: Detection of atrial fibrillation using contactless facial video
monitoring. Heart Rhythm 12(1), 195–201 (2015)
13. Couderc, J.-P., et al.: Pulse harmonic strength of facial video signal for the detec-
tion of atrial fibrillation. Comput. Cardiol. 41, 661–664 (2014)
14. Sugita, N., et al.: Estimation of absolute blood pressure using video images cap-
tured at different heights from the heart. In: 2019 41st Annual International Con-
ference of the IEEE Engineering in Medicine and Biology Society (EMBC) (2019)
15. Przybyło, J., Kańoch, E., Jabłoński, M., Augustyniak, P.: Distant measurements
of plethysmographic signal in various lighting conditions using configurable frame-
rate camera. Metrol. Meas. Syst. 23(4), 579–592 (2016)
16. Mędrala, R., Augustyniak, P.: Taking Videoplethysmographic Measurements at
Alternative Parts of the Body - Pilot Study, PCBBE (2019)
17. Królak, A.: Influence of skin tone on efficiency of vision-based heart rate estimation.
In: Augustyniak, P., Maniewski, R., Tadeusiewicz, R. (eds.) PCBBE 2017. AISC,
vol. 647, pp. 44–55. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-
66905-2_4
18. Nabeel, P.M., Jayaraj, J., Mohanasankar, S.: Single-source PPG based local pulse wave velocity measurement: a potential cuffless blood pressure estimation technique. Inst. Phys. Eng. Med. 38(12), 2122–2140 (2017)
19. Al-Naji, A., Perera, A.G., Chahl, J.: Remote monitoring of cardiorespiratory sig-
nals from a hovering unmanned aerial vehicle. BioMed. Eng. OnLine 16, 101 (2017).
https://doi.org/10.1186/s12938-017-0395-y
20. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple
features. In: Proceedings of IEEE Conference on Computer Vision and Pattern
Recognition, p. 511. IEEE (2001)
21. Przybyło, J.: Continuous distant measurement of the user’s heart rate in human-
computer interaction applications. Sensors 19, 4205 (2019)
22. Wu, J.H., Chang, R.S., Jiang, J.A.: A novel pulse measurement system by using
laser triangulation and a CMOS image sensor. Sensors 7(12), 3366–3385 (2007).
https://doi.org/10.3390/s7123366
23. Antognoli, L., Moccia, S., Migliorelli, L., Casaccia, S., Scalise, L., Frontoni, E.:
Heartbeat detection by laser doppler vibrometry and machine learning. Sensors.
20(18), 5362 (2020)
24. Lin, J.C.: Noninvasive microwave measurement of respiration. Proc. IEEE 63(10), 1530 (1975)
25. Ren, L., et al.: Phase based methods for heart rate detection using UWB impulse
doppler radar. IEEE Trans. Microwave Theor. Tech. 64(10), 3319–3331 (2016)
26. Rong, Y., Herschfelt, A., Holtom, J., Bliss, D.W.: Cardiac and respiratory sensing
from a hovering UAV radar platform. In: 2021 IEEE Statistical Signal Processing
Workshop (2021)
27. Abdulatif, S., et al.: Power-based real-time respiration monitoring using FMCW
radar. Comput. Sci. Eng. (2017)
28. Regev, N., Wulich, D.: Radar-based, simultaneous human presence detection and
breathing rate estimation. Sensors 21, 3529 (2021)
29. Michahelles, F., Wicki, R., Schiele, B.: Less contact: heart-rate detection without
even touching the user. In: Eighth International Symposium on Wearable Com-
puters (2004)
30. Ravichandran, R.: et al., WiBreathe: estimating respiration rate using wireless
signals in natural settings in the home. In: 2015 IEEE International Conference on
Pervasive Computing and Communications (2015)
31. Jasiński, Ł.: Pomiar tłumienia ścian i innych elementów charakterystycznych dla środowiska wewnątrzbudynkowego w paśmie 2,4 GHz. www.alvarus.org (2011)
32. Liu, J., et al.: Recent progress in flexible wearable sensors for vital sign monitoring.
Sensors 20, 4009 (2020)
33. Qiu, S., Wang, Z., Zhao, H., Hu, H.: Using distributed wearable sensors to measure
and evaluate human lower limbs motion. IEEE Trans. Instrum. Measur. 65(4),
939–950 (2016)
34. Weich, C., Vieten, M.M.: The Gaitprint: identifying individuals by their running
style. Sensors 20, 3810 (2020)
35. Petersen, J., Austin, D., Sack, R., Hayes, T.L.: Actigraphy-based scratch detection
using logistic regression. IEEE J. Biomed. Health Inf. 17(2), 277–283 (2013)
36. Zhang, P., Zhang, Z., Chao, H.-C.: A stacked human activity recognition model
based on parallel recurrent network and time series evidence theory. Sensors 20,
4016 (2020)
37. Pitou, S., Michael, B., Thompson, K., Howard, M.: Hand-Made embroidered elec-
tromyography: towards a solution for low-income countries. Sensors 20, 3347 (2020)
38. Chen, Z., Zhu, Q., Soh, Y.C., Zhang, L.: Robust human activity recognition using smartphone sensors via CT-PCA and online SVM. IEEE Trans. Ind. Inf. 13(6), 3070–3080 (2017)
39. Huang, S.-J., Wu, C.-J., Chen, C.-C.: Pattern recognition of human postures using
the data density functional method. Appl. Sci. 8, 1615 (2018)
40. Hossain, T., Ahad, A.R., Inoue, S.: A method for sensor-based activity recognition
in missing data scenario. Sensors 20, 3811 (2020)
41. Horn, B.K.P.: Observation model for indoor positioning. Sensors 20, 4027 (2020)
42. Kańtoch, E.: Recognition of sedentary behaviour by machine learning analysis
of wearable sensors during activities of daily living for telemedical assessment of
cardiovascular risk. Sensors 18, 3219 (2018)
43. Zapata, J., Fernández-Luque, F.J., Ruiz, R.: Wireless sensor network for ambient
assisted living, December 2010. ISBN 978-953-307-321-7, https://doi.org/10.5772/
13005
44. Zhang, J., Xue, N., Huang, X.: A secure system for pervasive social network-based
healthcare. IEEE Access 4, 9239–9250 (2016)
45. Chen, M., Zhang, Y., Li, Y., Hassan, M.M., Alamri, A.: AIWAC: affective inter-
action through wearable computing and cloud technology. IEEE Wirel. Commun.
22(1), 20–27 (2015)
46. Norouzi, N., Bruder, G., Belna, B., Mutter, S., Turgut, D., Welch, G.: A systematic
review of the convergence of augmented reality, intelligent virtual agents, and the
internet of things. In: Al-Turjman, F. (ed.) Artificial Intelligence in IoT. TCSCI,
pp. 1–24. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-04110-6_1
Methods of Functional Assessment
of the Temporomandibular Joints –
Systematic Review

Damian Kania1, Patrycja Romaniszyn-Kania2, Marcin Bugdol2, Anna Lipowicz3, Krzysztof Dowgierd4, Małgorzata Kulesa-Mrowiecka5, Zofia Polewczyk2, Łukasz Krakowczyk6, and Andrzej Myśliwiec1

1 Institute of Physiotherapy and Health Science, The Jerzy Kukuczka Academy of Physical Education in Katowice, ul. Mikołowska 72A, 40-065 Katowice, Poland
{d.kania,a.mysliwiec}@awf.katowice.pl
2 Faculty of Biomedical Engineering, Silesian University of Technology, ul. Roosevelta 40, 41-800 Zabrze, Poland
{patrycja.romaniszyn-kania,marcin.bugdol,zofipol729}@polsl.pl
3 Department of Anthropology, Institute of Environmental Biology, Wroclaw University of Environmental and Life Sciences, Kozuchowska St. 5B, 50-375 Wroclaw, Poland
anna.lipowicz@upwr.edu.pl
4 Head and Neck Surgery Clinic for Children and Young Adults, Department of Clinical Pediatrics, University of Warmia and Mazury, Żołnierska 18a Street, 10-561 Olsztyn, Poland
5 Department of Physiotherapy, Institute of Physiotherapy, Faculty of Health Sciences, Jagiellonian University Medical College, Michałowskiego street 12, 31-008 Kraków, Poland
m.kulesa-mrowiecka@uj.edu.pl
6 Department of Oncological and Reconstructive Surgery, Maria Sklodowska-Curie National Research Institute of Oncology, Gliwice Branch, ul. Wybrzeże Armii Krajowej 15, 44-100 Gliwice, Poland
lukasz.krakowczyk@io.gliwice.pl

Abstract. The temporomandibular joint (TMJ) is a paired joint in the human head that allows for a movable connection between the skull and the mandible and performs complex movements. According to statistical data, it is estimated that disorders within the masticatory apparatus affect 60 to 80% of the population. The need to evaluate the temporomandibular joints, whether resulting from functional disorders or from conditions after surgical procedures, and to make a proper diagnosis is crucial in many areas of science. Therefore, it was decided that a systematic review was necessary. The aim was to search for functional assessment methods of the TMJ using standard techniques and innovative technological solutions. Detailed criteria for the inclusion of the found articles in further analysis were defined. More than 18 000 articles on the presented topic were found throughout the search process; however, only 22 met the inclusion criteria and were included in further analysis.

Keywords: Diagnostic methods of TMJ · Temporomandibular disorder diagnosis · Novel approaches · ROM of TMJ

1 Introduction
The temporomandibular joint (TMJ) is a paired joint in the human head that allows for a movable connection between the skull and the mandible and performs complex movements such as abduction and adduction of the mandible, protrusion and retraction of the mandible, and laterotrusion movements. Consequently, this allows for the crushing and grinding of food, speaking, and breathing [1]. Temporomandibular joint dysfunctions (TMD) comprise a large group of disease entities, distinct in etiology, symptoms, and subsequent treatment modalities, often accompanied by pain and reduced mobility [2]. TMJ disorders with accompanying symptoms are a common problem. According to statistics, it is estimated that 60 to 80% of the population is affected by TMJ disorders [3]. One example of TMD is functional impairment, which can be caused by pathologies in TMJ structure or tooth development and leads to abnormalities in the functioning of the muscles responsible for mandibular movements in all three planes [4–7]. The necessity to evaluate the TMJs, whether due to functional disorders (malocclusion, bruxism, ankylosis) or to conditions after surgical procedures, and to make a correct diagnosis is essential in many areas of science.
With technology now ubiquitous and readily available, there are many methods for the functional assessment of TMJs, ranging from standard ruler or caliper measurements to advanced magnetic resonance imaging techniques [8–31]. Range of motion (ROM) measurements with manual devices are always subjective and depend on many factors, such as the testing protocol, the examiner's experience, or the model of the equipment used [8–16]. Imaging techniques, repeatedly presented in the literature as methods to assess the TMJ, are expensive and difficult to access [32]. Solutions based on vibroacoustics are an interesting diagnostic alternative, but it should be remembered that TMJ movement engages many muscles, which may disturb the recorded signals [21–27]. It is essential to search for solutions that allow objective assessment of the temporomandibular joint while minimizing the influence of both internal and external factors.
Therefore, based on the authors' knowledge and experience, it was decided that a systematic review was necessary to search for methods of functional assessment of the TMJ using both standard, undemanding techniques and novel technological solutions.

2 Research Methods
2.1 Knowledge Sources and Search Strategy
The review of literature focused on the search for methods used for the functional assessment of the TMJ. The team of authors carried out this task in the period from September to November 2021. For this purpose, the databases of scientific articles of universities in Poland, i.e., the Silesian University of Technology in Gliwice and the J. Kukuczka Academy of Physical Education in Katowice, were used. Additionally, literature sources available through public online databases such as PubMed were also searched.

2.2 Selection Criteria


In the presented article, the search for scientific reports was divided into two categories. The first one concerned information about standard methods of functional assessment of the TMJ without highly specialized equipment. For this purpose, papers were searched for using the following keywords: staw skroniowo-żuchwowy fizjoterapia ocena funkcjonalna (Polish for: temporomandibular joint physiotherapy functional assessment) OR functional assessment temporomandibular joint TMJ physiotherapy stomatology OR assessment of functioning temporomandibular joints OR range of mobility TMJ OR examination of the active range of movement of the TMJ OR diagnosis of temporomandibular disorders. The second category concerned the functional evaluation of TMJs using different types of technical solutions. In this case, articles were searched for using the following keywords: temporomandibular joint diagnostics OR mandible temporomandibular assessment technology OR novel TMJ assessment methods.
For the retrieved literature reports, the following inclusion criteria were defined:

– the subject matter of the research concerned the categories listed above,
– the article was a scientific report and not a review,
– the type of paper was important: only clinical research, preliminary research, individual case studies, and the latest reports on the problem discussed were taken into account,
– the paper was written in English or Polish,
– the papers should have been published no earlier than 1 January 2011, except for those on established rehabilitation techniques and major reports in the field.

2.3 Data Extraction

Throughout the search process, more than 18 000 articles on the presented topic were found. However, only 23 met the inclusion criteria and were further analysed. The diagram below shows the successive criteria for including articles in further analysis (Fig. 1), where n is the number of articles selected at the given stage.

3 Functional Assessment of TMJ – Standard Methods

The variety of aspects that should be analyzed during the functional assessment of the temporomandibular joint is large. The ROM in the sagittal plane and the range of lateral movement in the frontal plane, measured with calipers, were most often analyzed in the available literature [8–17]. In addition, the strength of the masticatory muscles was often checked using electromyography [12,18–20]. The analysis results, including the essential elements of the research protocols and the results obtained, in the context of the functional assessment of the TMJs using traditional methods, are presented in the table below (Table 1).

Fig. 1. Flow diagram containing criteria for excluding articles from further analysis

Table 1. Traditional methods of evaluating the temporomandibular joint

Authors Kolodziejczyk, P., et al. 2011 [8]


Aim of study Develop a research method for evaluating the progress of
therapy in TMD
Material and Methods 15 patients with TMD; Measurement of (1) mandibular ROM in
the sagittal plane using calipers; (2) mandibular trajectory using
a diode system mounted on the patient’s face and long exposure
photography to record the trajectory of motion
Parameters analysed Passive and active ROM before and after Transcutaneus
Electrical Nerve Stimulation and infrared radiation
Conclusion Physiotherapy can be an ideal complement to dental treatment
of TMJ
Authors Bae, Y. 2014 [9]


Aim of study Identifying the changes in the myofascial pain and ROM of TMJ
to patients with latent myofascial trigger points of the SCM
Material and Methods 42 patients; kinesiotaping to the SCM; measurement of pain
triggered during palpation of a taut band or nodule using VAS
and pressure pain threshold, the ROM (distance between the
median clefts of the upper and lower teeth) using a goniometer,
before and after the intervention
Parameters analysed Pain intensity, pressure pain threshold, ROM
Conclusion Kinesio taping as an intervention method applicable to latent
myofascial trigger points
Authors Shaffer, S. et al. 2014 [10]
Aim of study Clinical evaluation of TMJ
Material and Methods Measurement of the mandibular ROM using Boyle’s tool
(specialised caliper) or TheraBite ROM, jaw dynamometry
Parameters analysed ROM as the distance between incisors, left and right retraction,
protrusion, squeeze force measurement
Conclusion Assessment of the ROM and progression as an essential element
in the evaluation of dental treatment
Authors Davoudi, A., et al. 2015 [11]
Aim of study Assessment of the activity of masticatory muscle of TMJs in
healthy individuals and patients with TMD
Material and Methods 69 patients with TMD; assessment of muscles using electromyography during maximal voluntary clenching of the jaws and voluntary chewing on one side of the jaw; ROM measurement using a ruler
Parameters analysed ROM, amplitudes, rest time, activity time, activity index for
EMG of chewing muscles
Conclusion Decreased masticatory muscle activity depending on the severity
of TMJ pathology
Authors Spagnol, G., et al. 2016 [12]
Aim of study Evaluation of occlusal force, electromyographic activity and
mandibular mobility
Material and Methods 13 patients; measurement of (1) maximum bite force with a
dynamometer, (2) mandibular mobility with a caliper, (3)
electromyographic activity of the right and left masseter, right
and left temporalis muscles using an EMG
Parameters analysed Bite force, maximum mouth opening, the maximum ROM,
protrusion, to the midline of the teeth, level of tension of
selected muscle groups
Conclusion Recovery of electromyographic activity, increase in maximum
occlusal force and mandibular mobility throughout the assessed
postoperative period
Authors Mazzetto, M. O., et al. 2017 [13]


Aim of study Comparision of the amplitude of mandibular movement
measurements obtained by two different methods
Material and Methods 60 patients (30 with TMD, 30 healthy); measurement of
mandibular mobility using a digital millimeter ruler and the
ultrasound JAM system
Parameters analysed ROM measurement in (1) sagittal, (2) frontal plane
Conclusion Evaluation of mandibular patterns as a diagnostic criterion for
TMD classification, no significant differences between rule and
ultrasound system
Authors Akgol, A. C., et al., 2019 [14]
Aim of study Evaluation of hypermobility and TMD and the relationship
between hypermobility and TMD
Material and Methods 97 healthy patients; assessment of hypermobility using the
Beighton Hypermobility Score, TMJ assessment using deflection or deviation on opening; measurement of pain at rest, during
chewing and at night using VAS and dolorimeter
Parameters analysed Stiffness in the TMJ, mandibular displacement in the sagittal
plane, ROM, maximum right and left tilt in the frontal plane,
the pain level
Conclusion The need for a thorough examination of the TMJ in the context
of pain, sensitivity, and functional health by introducing
rehabilitation
Authors Alonso-Royo, R., et al. 2021 [15]
Aim of study Validation of the use of the Helkimo Clinical Dysfunction Index
(HDCI) in patients with TMD
Material and Methods 107 patients (60 with TMD, 47 healthy); measurement of
mandibular ROM using a ruler, assessment of changes in masticatory function using Fonseca's anamnestic index, evaluation of pain, assessment of masticatory muscle pain and discomfort
Parameters analysed ROM as maximum mouth opening, maximum lateral
displacement of the mandible to both sides, level of pain, the
possibility of a dysfunctional neck disability, balance problems
and headaches
Conclusion HCDI as a battery of combined tests is a suitable tool for
diagnosing TMD
Authors Kulesa-Mrowiecka M., et al. 2021 [16]
Aim of study The prevalence of HJS in patients with TMD
Material and Methods 322 patients; the linear measurement of the maximum mouth
opening and opening pattern, using ruler, the pain-free
maximum mouth opening pattern, the assessment of pain
Parameters analysed ROM as a distance from the edges of lower incisors to the edges
of upper incisors, level of myofascial pain severity
Conclusion Physiotherapy focused on the TMJ of patients with HJS as an
effective method in reducing pain and improving mandibular
coordination
Authors Campos López, A., et al. 2021 [17]


Aim of study Assessment of jaw and neck function, pressure pain threshold and
presence of trigger points
Material and Methods 100 patients with disc displacement with reduction (DDWR) and
100 healthy people; clinical assessment – demographic data, range
of jaw motion using ruler, jaw and neck disability
Parameters analysed Pain intensity, range of jaw motion
Conclusion Cervical spine TMJ evaluation and treatment should be considered
in DDWR patients
Authors dos Santos Berni, K. C., et al. 2015 [18]
Aim of study Evaluation of surface electromyography activity in the diagnosis of
TMJ
Material and Methods 123 patients (80 with TMD, 43 healthy); measuring the muscle
strength of the temporal, masseter and supraglottic muscles using
EMG
Parameters analysed Strength of selected muscle parts (their maximum value) while
biting the jaw
Conclusion The use of sEMG as an adjunctive tool in the diagnosis of TMJ
Authors Woźniak, K., et al. 2015 [19]
Aim of study Assessment of temporalis and masseter muscle fatigue in patients
with TMD
Material and Methods 200 patients; qualitative palpation, auscultation and visual
assessment of the TMJ, measurement of the muscle strength of the
temporal and masseter muscles using EMG during the maximal
effort test, pain using VAS
Parameters analysed ROM, level of pain in the TMJ and masticatory muscles, muscle
strength measurement – analysis of the average power frequency of
the EMG signal, asymmetry index
Conclusion sEMG as an excellent diagnostic tool in identifying patients with
TMD
Authors Amrulloevich, G. S. et al. 2020 [20]
Aim of study Evaluation of the diagnostic efficacy of research methods for TMD
Material and Methods 120 patients with TMD; recording of mandibular movements using
occlusiography, masticatory muscle activity using EMG at different
activities, CT scanning to assess TMJ, pain levels
Parameters analysed mean EMG amplitude, during chewing, at maximum jaw
compression, chewing time, resting time, pain level
Conclusion Relationship between the amplitude of vertical movements of the jaw, changes in the biopotentials of the masticatory muscles, and the occurrence of different types of TMD

4 Functional Assessment of TMJ – New Technologies


The table below (Table 2) presents novel approaches to TMJ assessment using
new technologies such as accelerometer systems [21–23], miniature microphones
[25,26] or different types of cameras [28–30]. During the literature search, the
focus was on the variety of methods presented and the potential uses of these
methods – the methods used, the equipment used, and the parameters analyzed
are listed.

Table 2. Novel approaches to TMJ assessment

Authors Sharma, S., et al. 2017 [21]


Aim of study Evaluation of the reliability and diagnostic usefulness of joint
vibration analysis in patients with TMJ dysfunction
Material and Methods 36 patients with TMD; recording of jaw opening and closing
movements synchronised to a metronome, ROM measurement using
a tool based on accelerometers (BioPAK)
Parameters analysed Total energy of vibration, total energy of vibration 300 Hz and total
energy for vibration 300 Hz; above/below energy ratio, peak
amplitude, peak frequency, median frequency, ROM
Conclusion A complex variable-based analysis of joint vibration as a tool to
distinguish patients with TMD from healthy patients
Authors Baldini, A., et al. 2016 [22]
Aim of study Analysis of the influence of the mandibular positions on the active
cervical ROM using an accelerometer
Material and Methods 21 healthy patients; a cervical range of movement examination
using a 9-axis accelerometer
Parameters analysed Range of head rotation, difference between right and left rotation
angles, range of head lateral bending, difference between right and
left lateral bending angles, range of head flexion/extension
Conclusion No effect of mandibular position on active cervical ROM, on the
symmetry of active cervical ROM between left and right sides
Authors Whittingslow, D. C., et al. 2020 [23]
Aim of study Development of a system to record acoustic emissions of TMJs
Material and Methods 11 patients; recording of acoustic emissions using accelerometers
during alternate mouth opening and closing
Parameters analysed Root mean square power, the signal energy, the zero-crossing rate
Conclusion Acoustic emission analysis is a promising technique for the clinical
evaluation of TMJ
Authors Widmalm, S. E., et al. 2016 [24]
Aim of study Evaluation of the relationship of TMJ sounds to the extent of
lateral horizontal mandibular movements during jaw opening and
closing
Material and Methods 66 patients (28 healthy and 38 with TMD); recording of sounds using miniature microphones placed in the ear canals; recording of jaw movements using the JT-3D jaw tracker during six slow mouth opening and closing movements
Parameters analysed Value describing jaw movement deviation, degrees of maximum
lateral deviation per se during opening and closing, amplitude,
frequency distribution and frequency of TMJ sounds
Conclusion The diagnostic value of the introduced parameter towards TMD
screening
Authors Łyżwa, P., et al. 2018 [25]


Aim of study Assessing the usefulness of vibroacoustic methods for recording
TMJ as a tool for monitoring treatment progress
Material and Methods 13 patients with TMD; recording of sounds coming from the TMJ using binaural microphones in a small anechoic chamber, crackles using an electronic stethoscope, and other disturbances using accelerometers; video recording of the lower part of the face
Parameters analysed Parameters from each signal: peak value, root mean square, crest factor as the quotient of the peak and RMS values
Conclusion Binaural microphones as a promising tool to study acoustic
phenomena from the TMJs
Authors Carmignani, A., et al. 2017 [26]
Aim of study Evaluation of the correlation between ROM and skeletal classes
Material and Methods 108 patients; ROM measurement by axiography Cadiax III
Parameters analysed Distance between transversal axis and major lateral translation
point, maxillary position, maxillary depth, facial depth
Conclusion The need for further studies to confirm the obtained results
Authors Loster, B. W., et al. 2012 [27]
Aim of study Evaluation of treatment results in patients with TMD by
application of a decompression splint
Material and Methods 8 patients; clinical examination of the mandibular ROM, pain assessment using the VAS scale, condylography using the Cadiax Compact device
Parameters analysed Free ROM of the jaw, pain intensity on the VAS scale
Conclusion The results of condylography as a predictor of the need for further rehabilitation of the masticatory motor system
Authors Park, Y., et al. 2014 [28]
Aim of study Checking the change in ROM of the TMJs in scoliosis correction
Material and Methods 31 patients with scoliosis; Pilates-type exercises; assessment of scoliosis grade and TMJ condition before and after the examination using a digital camera and a body alignment analysis system
Parameters analysed Degree of spinal curvature, disproportion of lower limb length,
and deviation, ROM value of TMJ
Conclusion ROM change with reduction in spinal curvature; mild scoliosis
as a factor negatively affecting ROM of the TMJ
Authors Clemente, M. P., et al. 2018 [29]
Aim of study Definition of a research protocol for the diagnosis and treatment
of TMD in instrumentalists
Material and Methods Case study; ROM measurement using a ruler, assessment of
areas with pain during maximal assisted and unassisted mouth
opening, static analysis of contacts between teeth in supporting
areas and dynamic analysis of relations between teeth, analysis
of muscle areas using thermal imaging
Parameters analysed ROM, subjective evaluation of occlusion based on photographs,


palpation analysis of masseter and temporal muscles and TMJ,
asymmetry of the degree of heating of masseter and temporal
muscles
Conclusion A need to introduce screening of TMJ areas; application of
innovative techniques in dentistry; an opportunity to prevent
overuse of specific anatomical structures, with early diagnosis
and correct monitoring
Authors Barbosa, J. S., et al. 2020 [30]
Aim of study Evaluation of the sensitivity, specificity, and accuracy of
thermography in identifying patients with TMD
Material and Methods 86 patients with TMD, Thermographic pictures on both sides,
the definition of ROI for masseter muscles, anterior temporal
muscles, TMJ for the right and left side independently,
subjective evaluation of pain – VAS scale
Parameters analysed Mean temperature in individual ROIs, correlation coefficients
between pain and mean temperature
Conclusion Unsatisfactory results of TMD condition differentiation based on
thermal imaging analysis

5 Discussion
Nowadays, the world is struggling with many civilization diseases. Scientific reports indicate that TMJ pathologies are becoming one of them, as they affect an increasing number of people [31,33]. Temporomandibular joint dysfunctions comprise a large group of conditions that vary in etiology, symptoms, and subsequent treatment options, often accompanied by pain and reduced mobility [2]. The correct diagnosis of TMJs is a challenge addressed by many research centers. There are no simple, objective methods for functional assessment of
TMJs without highly specialized equipment. For diagnostic purposes, the range
of mandibular mobility in different planes has been the most frequently used
parameter to represent the TMJ status, and it has usually been measured with
a ruler, caliper, or goniometer [8–16]. Depending on the established testing proto-
col, this parameter in the sagittal plane was defined as the distance between the
incisors during maximum mouth opening [10,13] while in the frontal plane (max-
imum shift of the mandible to both sides) as the distance between the medial
slits of the upper and lower teeth [9,17]. Another variable repeatedly analyzed
in the literature for assessing TMJs is the maximum occlusal force measured
with dedicated dynamometers or occlusiography, which allows the evaluation of
masticatory muscle strength [10,12,31]. A technique also used to directly assess
the muscles responsible for TMJs mobility was recording an electromyographic
signal from multiple channels [12,18,19]. Indirect methods of muscle force anal-
ysis included recording thermal images to show the degree of heating of indi-
vidual regions of interest (ROI) or the symmetry of muscle work between the
right and left sides of the face [29,30]. These tools allowed to analyse and pre-
vent overexploitation of specific anatomical structures within the TMJs at the early diagnostic stage. However, the application of the solutions mentioned above required specialized systems for EMG and thermal-camera measurements, and the ability to interpret them correctly. An interesting alternative to the use of standard methods was the development of non-invasive systems for evaluating TMJs. The paper by Kolodziejczyk et al. presents a diode system for recording mandibular motion trajectories that was validated using calipers and long-exposure photographs [8]. The ultrasound JAM system is also a dedicated solution for ROM analysis [13]. For the functional assessment of the TMJ, the CADIAX device was developed, which returned ROM information based on the registration of mandibular position changes [26,27]. Unlike the other solutions cited above, it was one of the few commercially available, medically certified products for recording changes during the therapy process. Baldini et al. analyzed mandibular position and cervical spine ROM using an accelerometer system [22]. Accelerometers were also used in a system consisting of binaural microphones and an electronic stethoscope dedicated to the vibroacoustic analysis of signals recorded during mandibular movements [25]. Accelerometric signal recording for vibroacoustic analysis of TMJs has also been studied with respect to the extent of mandibular mobility during maximal mouth opening and closing movements [21,23]. The correlations of sounds generated by the TMJ and ROM were evaluated [24], and similar results were correlated with posture [28].
The cited scientific reports on temporomandibular joint sound analysis have introduced the diagnostic values of specific parameters into the literature, in order to develop a screening protocol for TMD and the clinical evaluation of the TMJ. Joint vibration analysis based on complex variables may in the future allow the creation of a tool to distinguish TMD patients from healthy ones. Since these systems require neither specialized equipment nor special measurement conditions, they may in the future be used in both dental and physiotherapy clinics. Unfortunately, the presented hardware solutions are prototypes of systems or devices beyond the financial reach of most TMJ professionals. Currently, there are no widely available, easy-to-use, inexpensive, progress-monitoring, and above all objective systems for assessing the TMJ that can be used by specialists involved in the diagnosis and subsequent treatment of the temporomandibular joint. It is, therefore, essential to develop devices that will meet all the above conditions.

6 Conclusion

Despite the weaknesses of the proposed solutions, the application of each of the above-mentioned techniques makes it possible to evaluate the TMJ, and the conclusions in the available literature prove the necessity of addressing the topic of functional assessment of the TMJ, the acquisition of knowledge by specialists in this subject area, and the search for objective solutions, all with the aim of improving the quality and speed of TMD diagnosis.
References
1. Herring, S.W.: TMJ anatomy and animal models. J. Musculoskelet. Neuronal Inter-
act. 3(4), 391 (2003)
2. Osiewicz, M.A., Lobbezoo, F., Loster, B.W., Wilkosz, M., Naeije, M., Ohrbach,
R.: Research Diagnostic Criteria for Temporomandibular Disorders (RDC/TMD):
the Polish version of a dual-axis system for the diagnosis of TMD. (RDC/TMD)
form. J. Stomatology 66, 576–649 (2013)
3. Kapandji, A.I.: Anatomia funkcjonalna stawów. Kręgoslup i glowa 3, 216–217
(2013). (in Polish)
4. Ivkovic, N., Racic, M.: Structural and functional disorders of the temporomandibu-
lar joint (Internal disorders). Maxillofacial Surgery and Craniofacial Deformity-
practices and Updates (2018)
5. Lipowicz, A., et al.: Evaluation of mandibular growth and symmetry in child with
congenital zygomatic-coronoid ankylosis. Symmetry 13(9), 1634 (2021)
6. Dowgierd, K., Pokrowiecki, R., Borowiec, M., Kozakiewicz, M., Smyczek, D.,
Krakowczyk, Ł: A protocol for the use of a combined microvascular free flap
with custom-made 3D-printed total temporomandibular joint (TMJ) prosthesis
for mandible reconstruction in children. Appl. Sci. 11(5), 2176 (2021)
7. Kulesa-Mrowiecka, M., Pihut, M., Słojewska, K., Sułko, J.: Temporomandibular
joint and cervical spine mobility assessment in the prevention of temporomandibu-
lar disorders in children with osteogenesis imperfecta: a pilot study. Int. J. Environ.
Res. Public Health 18(3), 1076 (2021)
8. Kołodziejczyk, P., Kuciel-Lewandowska, J., Paprocka-Borowicz, M., Jarząb, S.,
Aleksandrowicz, K.: A photographic method of recording movement trajectory to
evaluate the effectiveness of physiotherapy in temporomandibular joint dysfunc-
tions - a preliminary study. Adv. Clin. Exp. Med. 20(1), 79–85 (2011)
9. Bae, Y.: Change the myofascial pain and range of motion of the temporomandibular
joint following kinesio taping of latent myofascial trigger points in the sternoclei-
domastoid muscle. J. Phys. Ther. Sci. 26(9), 1321–1324 (2014)
10. Shaffer, S.M., Brismee, J.M., Sizer, P.S., Courtney, C.A.: Temporomandibular dis-
orders. Part 1: anatomy and examination diagnosis. J. Manual Manipulative Ther-
apy 22(1), 2–12 (2014)
11. Davoudi, A., Haghighat, A., Rybalov, O., Shadmehr, E., Hatami, A.: Investigating
activity of masticatory muscles in patients with hypermobile temporomandibular
joints by using EMG. J. Clin. Exp. Dent. 7(2), e310 (2015)
12. Spagnol, G., Palinkas, M., Regalo, S.C.H., de Vasconcelos, P.B., Sverzut, C.E.,
Trivellato, A.E.: Impact of midface and upper face fracture on bite force, mandibu-
lar mobility, and electromyographic activity. Int. J. Oral Maxillofac. Surg. 45(11),
1424–1429 (2016)
13. Mazzetto, M.O., Anacleto, M.A., Rodrigues, C.A., Bragança, R.M.F., Paiva, G.,
Valencise Magri, L.: Comparison of mandibular movements in TMD by means of
a 3D ultrasonic system and digital caliper rule. Cranio 35(1), 46–51 (2017)
14. Akgol, A.C., Saldiran, T.C., Tascilar, L.N., Okudan, B., Aydin, G., Rezaei, D.A.:
Temporomandibular joint dysfunction in adults: its relation to pain, general joint
hypermobility, and head posture. Int. J. Health Allied Sci. 8(1), 38 (2019)
15. Alonso-Royo, R., et al.: Validity and reliability of the helkimo clinical dysfunction
index for the diagnosis of temporomandibular disorders. Diagnostics 11(3), 472
(2021)
16. Kulesa-Mrowiecka, M., Piech, J., Gaździk, T.S.: The effectiveness of physical ther-
apy in patients with generalized joint hypermobility and concurrent temporo-
mandibular disorders - a cross-sectional study. J. Clin. Med. 10(17), 3808 (2021)
17. Campos López, A., De-Miguel, E.E., Malo-Urriés, M., Acedo, T.C.: Mouth open-
ing, jaw disability, neck disability, pressure pain thresholds, and myofascial trigger
points in patients with disc displacement with reduction: a descriptive and com-
parative study. In: CRANIO, pp. 1–7 (2021)
18. dos Santos Berni, K.C., Dibai-Filho, A.V., Pires, P.F., Rodrigues-Bigaton, D.:
Accuracy of the surface electromyography RMS processing for the diagnosis of
myogenous temporomandibular disorder. J. Electromyogr. Kinesiol. 25(4), 596–
602 (2015)
19. Woźniak, K., Lipski, M., Lichota, D., Szyszka-Sommerfeld, L.: Muscle fatigue in the
temporal and masseter muscles in patients with temporomandibular dysfunction.
BioMed Research International (2015)
20. Amrulloevich, G.S., Mirjonovich, A.O.: Clinical features of diagnostics and their
defenses in patients with dysfunction of the high-mandibular joint without pathol-
ogy, inflammatory-dystrophic origin. Int. J. Progressive Sci. Technol. 22(2), 36–43
(2020)
21. Sharma, S., Crow, H.C., Kartha, K., McCall, W.D., Gonzalez, Y.M.: Reliability
and diagnostic validity of a joint vibration analysis device. BMC Oral Health 17(1),
1–7 (2017)
22. Baldini, A., Nota, A., Tecco, S., Ballanti, F., Cozza, P.: Influence of the mandibular
position on the active cervical range of motion of healthy subjects analyzed using
an accelerometer. Cranio 36(1), 29–34 (2018)
23. Whittingslow, D.C., Orlandic, L., Gergely, T., Prahalad, S., Inan, O.T., Abramow-
icz, S.: Acoustic emissions of the temporomandibular joint in children: proof of
concept. Frontiers of Oral and Maxillofacial Medicine, 2 (2020)
24. Widmalm, S.E., Dong, Y., Li, B.X., Lin, M., Fan, L.J., Deng, S.M.: Unbalanced
lateral mandibular deviation associated with TMJ sound as a sign in TMJ disc
dysfunction diagnosis. J. Oral Rehabil. 43(12), 911–920 (2016)
25. Łyżwa, P., Kłaczyński, M., & Kazana, P. (2018). Vibroacoustic methods of imaging
in selected temporomandibular joint disorders during movement. Diagnostyka, 19
26. Carmignani, A., Carmignani, R., Ciampalini, G., Franchini, M., Greven, M.: Com-
parison of condylar lateral translation and skeletal classes. Zeitschrift für Kran-
iomandibuläre Funktion 9(3), 1–15 (2017)
27. Loster, B. W., Loster, J., Wieczorek, A., Ryniewicz, W.: Mycological analysis of the
oral cavity of patients using acrylic removable dentures. Gastroenterology Research
and Practice (2012)
28. Park, Y., Bae, Y.: Change of range of motion of the temporomandibular joint after
correction of mild scoliosis. J. Phys. Ther. Sci. 26(8), 1157–1160 (2014)
29. Clemente, M.P., Mendes, J., Moreira, A., Vardasca, R., Ferreira, A.P., Amarante,
J.M.: Wind instrumentalists and temporomandibular disorder: from diagnosis to
treatment. Dentistry J. 6(3), 41 (2018)
30. Barbosa, J.S., et al.: Infrared thermography assessment of patients with temporo-
mandibular disorders. Dentomaxillofacial Radiology 49(4), 20190392 (2020)
31. Gouw, S., de Wijer, A., Bronkhorst, E.M., Kalaykova, S.I., Creugers, N.H.: Asso-
ciation between self-reported bruxism and anger and frustration. J. Oral Rehabil.
46(2), 101–108 (2019)
390 D. Kania et al.

32. Krohn, S., Gersdorff, N., Wassmann, T., Merboldt, K.D., Joseph, A.A., Buergers,
R., Frahm, J.: Real-time MRI of the temporomandibular joint at 15 frames per
second - a feasibility study. Eur. J. Radiol. 85(12), 2225–2230 (2016)
33. Czernielewska, J., Gębska, M., Weber-Nowakowska, K.: Ocena ruchomości stawów
skroniowo-żuchwowych i odcinka szyjnego kręgosłupa u osób z bruksizmem. Medy-
cyna Ogólna i Nauki o Zdrowiu 26(1), 60 (2020). (in Polish)
Sound and Motion
The Effect of Therapeutic Commands on the Teaching of Maintaining Correct Static Posture

Damian Kania1, Tomasz Szurmik2, Karol Bibrowicz3, Patrycja Romaniszyn-Kania4(B), Miroslaw Czak4, Anna Mańka4, Maria Rosiak5, Bruce Turner6, Anita Pollak7, and Andrzej W. Mitas4

1 Institute of Physiotherapy and Health Science, The Jerzy Kukuczka Academy of Physical Education in Katowice, ul. Mikolowska 72A, 40-065 Katowice, Poland
d.kania@awf.katowice.pl
2 Faculty of Arts and Educational Science, University of Silesia, ul. Bielska 62, 43-400 Cieszyn, Poland
tomasz.szurmik@us.edu.pl
3 Science and Research Center of Body Posture, College of Education and Therapy in Poznań, ul. Grabowa 22, 61-473 Poznań, Poland
bibrowicz@wp.pl
4 Faculty of Biomedical Engineering, Silesian University of Technology, ul. Roosevelta 40, 41-800 Zabrze, Poland
{patrycja.romaniszyn-kania,miroslaw.czak,anna.manka,andrzej.mitas}@polsl.pl
5 Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, ul. Akademicka 16, 44-100 Gliwice, Poland
mariros974@studentpolsl.pl
6 dBs Music, HE Music Faculty, 17 St Thomas St, Redcliffe, Bristol BS1 6JS, UK
bruce.turner@dbsmusic.co.uk
7 Institute of Psychology, University of Silesia in Katowice, ul. Bankowa 12, 40-007 Katowice, Poland
anita.pollak@us.edu.pl

Abstract. The article presents the results of a preliminary study analy-


sing the physiological parameters obtained during exercises that teach
the patient’s correct body posture while sitting. Electrodermal activity
(EDA), blood volume pulse (BVP), and electromyographic (EMG) sig-
nals were recorded and analysed during the training process for position
shaping. A music preference and musicality questionnaire was carried
out before the study. The JAWS questionnaire was completed twice by
the respondent, before and after exercises. The physiotherapists provided
instructions with respect to the stimulation of the autonomic nervous sys-
tem, observed in EDA, heart rate and the subsequent motor units. While
performing the exercises, the subjects felt positive emotions, which can
be perceived as a positive experience for the probands and suggests their
willingness to learn and maintain correct body posture while sitting.
The sonification of the therapist’s commands and their sonic emotional
content is further researched.


Keywords: Body posture · Frankfurt position · Correct sitting posture · Electromyography · Electrodermal activity · Heart rate variability

1 Introduction

Rehabilitation is in great demand today. Humans naturally strive for comfort, and in doing so often adopt a collapsed posture, causing distinctly unfavourable changes in lifestyle. Counterintuitively, these habits become (next to excessive symptomatic pharmacotherapy) the primary method used to avoid pain [1–3]. Prevention is the best way to counteract the need for repair later on and would improve our health, something we know instinctively very well but are also very reluctant to pursue [4]. A participant's ability to adhere to a health program, to sustain motivation, and to maintain postural correctness are also of concern [5]. This can be seen as a cultural issue, related to an inadequate health policy. It therefore has a long-term dimension, which today is also shaped by the popularising influence of scientific institutions that address the issue of early prevention. This article, however, focuses on one of the elements of the formation of correct movement habits, conducive to the correctness of the physiotherapeutic or physioprophylactic activities undertaken.
In the context of long-term research, attention should be paid to the phenomenon of music entrainment, i.e., influencing the human psychosomatic system, in the sense of neuronal motor initiation, using appropriately selected and timed auditory stimuli [6]. Because of the natural connection between sonic stimulation and movement behaviour in the broadest sense, it is justified to treat these problems within the framework of music therapy. Compared with centuries of work in music therapy in its general definition, current research benefits from radically improved measurement instrumentation, which allows slow-moving processes to be observed and, above all, real-time data on the functional behaviour of the human body to be recorded.
The most common habitual postural defects in the sagittal plane are protraction and the flexed-relaxed position [7]. These positions are often adopted, especially by people with sedentary habits, due to the low muscular effort involved and the increased load on passive paraspinal structures [8]. Protraction, together with rounded shoulders, can weaken the cervical extensors and increase compressive forces on the cervical spine [9]. Biofeedback is an effective intervention for postural re-education and for reducing altered muscle activation [10]. EMG-based biofeedback can specifically reduce the activation of a target muscle. Visual feedback based on pressure signals can reduce or prevent shoulder protraction and rounding. EMG-based devices can support postural alignment by reducing static muscle activity [11].
The research presented in this article was preliminary. The aim was to analyse the effect of a series of voice commands given by a physiotherapist on the patient's ability to learn and maintain correct posture while sitting. The development of a research protocol and the analysis of the obtained results will make it possible, in the next phase, to sonify the voice commands and modify their emotional content in order to stimulate therapeutic actions.

2 Materials and Methods

2.1 Materials

Six people aged 22 to 39 years participated in the study. All probands were born, grew up, and currently live in Poland, and all have completed secondary education. The participants were informed about the aim of the research, the course of the experiment, and the instructions for correct posture. They had no feedback from self-observation of the movements performed, relying only upon their proprioception. Adopting the proper posture in the presented case meant that the physiotherapist conducting the examination placed the head in the Frankfurt position, otherwise known as the eye-ear plane. The aim was to align the skull so that the lower border of the orbit on both sides of the head was in the same horizontal plane as the upper border of the external auditory meatus [12].

Participants gave written consent before the measurements were taken, and the data obtained were fully anonymised. Participants had to comply with the requirements of the experiment: the neck and ears had to be exposed, hair pinned up, and a 'strapless' T-shirt worn so that the neck and clavicles were revealed.

2.2 Acquisition Process

The study was performed in a specially prepared laboratory where the subjects were provided with peace, comfort, and privacy. During the experiment, only the people conducting the research and the participants were in the room. The research protocol consisted of several elements (Fig. 1). The first of these was the completion by the patient of a questionnaire on selected aspects of musical preferences and musicality, according to the concept of Białkowski et al. [13], together with primary demographic data. The subject was also asked to complete the standardized Job-related Affective Well-being Scale (JAWS) test [14], which consisted of 12 questions and concerned the current emotions experienced. Then the patient was asked to take a designated seat – a chair without a backrest. An Empatica E4 device, worn each time on the wrist of the patient's non-dominant hand, was used to measure physiological signals in real time: electrodermal activity (EDA), body temperature, accelerometric signals (ACC), and blood volume pulse (BVP). The EDA and temperature signals were sampled at 4 Hz, the BVP signal at 64 Hz, and the ACC at 32 Hz.
Fig. 1. Scheme of the research protocol

In the next step, the physiotherapist placed electrodes for measuring the electromyographic (EMG) signal at predefined points of the patient's body, symmetrically on both sides (Fig. 2). The relevant muscle parts were chosen deliberately because of their primary role during head movements. Behavioral and neurophysiological data show that proprioception from neck muscles contributes to constructing a cognitive representation of the
body. It includes the position of limb segments, their hierarchical arrangement,
and the spatial configuration of segments [15]. Therefore, the regions of inter-
est consisted of muscles mainly involved in motor control of head movement.
Electrodes were mounted at the beginning and end of the sternocleidomastoid
muscle (SCMb/SCMe, Fig. 2(a)) and over the upper trapezius at the level of the
C2 vertebra (V, Fig. 2(b)). All EMG electrodes were attached symmetrically on
the right and left side of the patient’s body.

Fig. 2. Location of EMG electrodes on the patient’s body

A Noraxon Ultium sensor system was used to record the EMG signal, in a 6-
channel measurement configuration, with a sampling rate of 2 kHz. In addition,
using the same device, one of the sensors located on the top of the patient’s head,
at the vertex, was configured only for full IMU measurement (accelerometer,
gyroscope, and magnetometer), with a sampling rate of 0.4 kHz.
After placing all the necessary sensors on the patient's body, the measurement of all the signals mentioned above was started and continued until the end of the whole research protocol. After the recording was completed, the collected raw data were exported to .csv files for further analysis.
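Although not part of the protocol description itself, a minimal sketch of how such an export can be read may be useful. It assumes the standard single-channel Empatica raw export layout (first row: initial Unix timestamp; second row: sampling rate; remaining rows: samples); the file name and marker variables are hypothetical.

```python
import numpy as np
import pandas as pd

def load_e4_channel(path):
    """Load one single-channel raw Empatica E4 export (e.g. EDA.csv).

    Assumed layout: first row = initial Unix timestamp, second row =
    sampling rate in Hz, remaining rows = samples (multi-axis files
    such as ACC.csv would need per-column handling).
    """
    raw = pd.read_csv(path, header=None).iloc[:, 0].to_numpy(dtype=float)
    t0, fs = raw[0], raw[1]
    samples = raw[2:]
    t = t0 + np.arange(samples.size) / fs  # absolute timestamp of each sample
    return pd.DataFrame({"t": t, "value": samples})

# Example: cut out one protocol stage between two (hypothetical) time markers
eda = load_e4_channel("EDA.csv")
# stage = eda[(eda.t >= marker_start) & (eda.t <= marker_end)]
```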
The primary research and measurement procedure consisted of three stages. During each of them, the physiotherapist gave strictly defined voice commands, with the help of which the patient was supposed to reach the Frankfurt position from a specific initial position – the head tilted, with the chin touching the sternum, while sitting. The commands were given in Polish, without dialect accents, without emotion in the voice, and at a consistent sound intensity. The beginning and end of each event were marked with time markers independently on the E4 and the Noraxon. The first stage of the recording involved learning to reach the final position. The patient repeated the exercise three times according to the given commands, and the physiotherapist continuously corrected any incorrect position of the subject's head. Knowing what to do, as well as the basic expectations of the therapist, allowed the proband to assume the correct positions as far as their motor habits and motor limitations permitted. Then, while sitting on a chair without support, the pattern shown in Fig. 3 was repeated six times, but this time without the possibility of correction by the specialist.

Fig. 3. Diagram of the exercises performed

The final stage was to change the chair to a version with a backrest (bench), and the patient was again asked to execute the commands six times. The research task was to observe the subjects' susceptibility to the commands given. The angle values determined from the accelerometer measurements during the individual movement sequences are presented in Fig. 4, together with the corresponding commands.

Fig. 4. Angle values during movement and verbal commands given
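The paper does not specify how the angle values were derived from the accelerometer data; a common approach for quasi-static movements, sketched below under that assumption, is to measure the angle between the current acceleration vector (dominated by gravity) and a reference vertical captured at rest. The axis convention of the vertex-mounted sensor is likewise an assumption.

```python
import numpy as np

def inclination_deg(acc, g_ref=(0.0, 0.0, 1.0)):
    """Angle (degrees) between each accelerometer sample and a reference
    vertical, for an (N, 3) array of quasi-static measurements."""
    a = np.asarray(acc, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)   # unit vectors per sample
    g = np.asarray(g_ref, dtype=float)
    g = g / np.linalg.norm(g)
    cos_theta = np.clip(a @ g, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))
```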
Immediately after completing the registration, the patient was again asked to perform the JAWS test, which captures the subject's feelings during the exercises. Removing the E4 device from the subject's wrist and the Noraxon sensors from the body ended the monitoring.

2.3 Data Analysis


The Empatica E4 and Noraxon Ultium devices allowed time markers to be placed independently throughout the research protocol (at the beginning and end of each stage) in the following order: learning, chair exercise, bench exercise. In the subsequent analysis, this facilitated the division of the recorded signals, and of the HRV coefficients determined from the BVP, into appropriate time segments depending on the exercise. For all signals, similar features were determined for each of the presented protocol stages.
According to the concept of Greco et al., the EDA signal was decomposed into a tonic (t) component, a phasic (p) component, and an additive error (e) [16]. Sudden changes (local maxima) in the phasic component, i.e., galvanic skin responses (GSR), were then detected. The total number of GSRs, their amplitude, and the number of responses per minute (rpm) were determined in the analysed time segment. The skin conductance level (SCL) was calculated from the tonic component. From the sum of the tonic and phasic components, the basic statistical features of the signal in the time domain were calculated: the mean (x), standard deviation (σ), minimum (min), and maximum (max), together with the skin conductance level (SCL), the number of galvanic skin responses (nGSR), and the GSR amplitude (aGSR) [17].

The BVP signal was the basis for determining heart rate variability (HRV) coefficients. The accelerometric signal recorded by the Empatica E4 during the whole research protocol was also important at this stage of the analysis, as a reference signal for eliminating movement-related artifacts in the cardiac signals. Based on the BVP, an IBI (inter-beat interval) vector was determined – a vector of consecutive time intervals (dt) between individual heartbeats (local maxima or minima). On this basis, the following coefficients were calculated in the next stage of the analysis: the standard deviation of normal-to-normal intervals (SDNN), the root mean square of successive differences (RMSSD), the probability of successive intervals differing by more than 50 ms (pNN50), the integral of the density of the RR interval histogram divided by its height (TRI), and the mean heart rate based on the inter-beat intervals (HR). These parameters were purposely selected as suitable for ultra-short-term HRV analysis (recordings shorter than 5 min) [18].
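As an illustration of the feature definitions above, the sketch below computes the GSR-related EDA features from an already decomposed segment (the tonic-phasic decomposition itself, e.g. cvxEDA [16], is assumed to be done upstream) and the ultra-short-term HRV metrics from an IBI vector. The peak-detection threshold and the 7.8125 ms histogram bin width used for TRI are conventional choices, not values taken from the paper.

```python
import numpy as np
from scipy.signal import find_peaks

def eda_features(tonic, phasic, fs=4.0):
    """GSR counts/amplitudes and time-domain statistics of one EDA segment."""
    phasic = np.asarray(phasic, dtype=float)
    peaks, props = find_peaks(phasic, height=0.01)      # 0.01 uS threshold: an assumption
    minutes = phasic.size / fs / 60.0
    signal = np.asarray(tonic, dtype=float) + phasic    # reconstruction without the error term
    return {
        "nGSR": len(peaks),
        "aGSR": float(props["peak_heights"].mean()) if len(peaks) else 0.0,
        "rpm": len(peaks) / minutes,
        "SCL": float(np.mean(tonic)),
        "mean": float(signal.mean()), "std": float(signal.std()),
        "min": float(signal.min()), "max": float(signal.max()),
    }

def hrv_features(ibi_s):
    """SDNN, RMSSD, pNN50, TRI, and HR from inter-beat intervals in seconds."""
    nn = np.asarray(ibi_s, dtype=float) * 1000.0        # convert to milliseconds
    d = np.diff(nn)
    bins = np.arange(nn.min(), nn.max() + 7.8125, 7.8125)
    hist, _ = np.histogram(nn, bins=bins)
    return {
        "SDNN": float(np.std(nn, ddof=1)),
        "RMSSD": float(np.sqrt(np.mean(d ** 2))),
        "pNN50": float(np.mean(np.abs(d) > 50.0) * 100.0),
        "TRI": float(nn.size / hist.max()),             # N over the modal bin count
        "HR": float(60000.0 / nn.mean()),
    }
```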
The EMG signal was processed using the dedicated Noraxon MR3 software. Changes were monitored in the recruitment of neuromotor units, which is related to muscle fatigue caused by continuous activity [19]. The EMG signal was then rectified, i.e., its absolute value was calculated, and smoothed using the MS algorithm with a time window of 100 ms. For the signal prepared in this way, the averaged mean amplitude and the averaged maximum value were determined for each study stage. The values of these parameters were determined for each muscle and then averaged to obtain an overall mean for each stage of the study.
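A minimal numerical equivalent of this pipeline is sketched below; a plain moving average stands in for the MR3 smoothing step, since the exact "MS" smoothing kernel used by the software is not specified in the paper.

```python
import numpy as np

def emg_envelope(emg_uv, fs=2000, win_ms=100):
    """Full-wave rectification followed by 100 ms moving-window smoothing."""
    rectified = np.abs(np.asarray(emg_uv, dtype=float))   # rectification
    win = int(fs * win_ms / 1000)                         # 200 samples at 2 kHz
    return np.convolve(rectified, np.ones(win) / win, mode="same")

def stage_features(emg_uv, fs=2000):
    """Averaged mean amplitude and maximum value of the smoothed signal."""
    env = emg_envelope(emg_uv, fs)
    return {"mean_amplitude": float(env.mean()), "max_value": float(env.max())}
```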
The psychological test taken by the subjects is a standardised instrument widely used to assess emotions. The response format was a 5-point Likert scale (from 1 – never to 5 – very often). Based on the answers given by the respondents, it was possible to obtain JAWS coefficients such as the Total Score, Total Positive Emotions, and Total Negative Emotions. These coefficients indicate which type of emotion (positive or negative) is experienced more intensely and how this changes throughout the experiment.
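For illustration, the scoring reduces to a few sums, as in the sketch below; the assignment of the 12 items to the positive and negative subscales is hypothetical here, and the reverse-coding of negative items in the total follows common practice for this scale rather than anything stated in the paper.

```python
POSITIVE_ITEMS = (0, 2, 4, 6, 8, 10)   # assumed item indices, not from the paper
NEGATIVE_ITEMS = (1, 3, 5, 7, 9, 11)

def jaws_scores(responses):
    """responses: 12 Likert answers coded 1 (never) to 5 (very often)."""
    positive = sum(responses[i] for i in POSITIVE_ITEMS)
    negative = sum(responses[i] for i in NEGATIVE_ITEMS)
    # Total Score with negative items reverse-coded (1<->5, 2<->4)
    total = positive + sum(6 - responses[i] for i in NEGATIVE_ITEMS)
    return {"total": total, "positive": positive, "negative": negative}
```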

3 Results
The following table presents selected physiological signal parameters determined
for the relevant parts of the research protocol (Table 1).

Table 1. Mean values of physiological parameters at different stages of the research protocol

Feature          Baseline   Study     Bench     Chair
EDA  x [uS]      1.675      1.974     2.282     3.429
     σ           0.307      0.179     0.269     0.125
     min [uS]    0.578      1.703     1.899     3.126
     max [uS]    2.558      2.486     2.866     3.657
     SCL [uS]    1.578      1.817     2.094     3.371
     nGSR        19.333     19        22.75     23.500
     aGSR [uS]   1.540      0.242     0.272     0.170
HRV  SDNN [ms]   146.961    186.043   148.089   149.37
     RMSSD [ms]  23.379     25.896    23.969    21.546
     pNN50 [%]   37.200     35.956    33.621    23.747
     TRI         13.846     10.527    11.844    10.083
     HR          83.397     82.129    84.500    84.959

Based on the above results (Table 1), it is important to note the increase of the mean EDA value in the successive stages of the study and the consequent increase in the maximum value, the number of GSRs, and the SCL parameter. Analysing the heart rate variability parameters, the SDNN, pNN50, and HR values can be seen to be consistent with the norms for this age group [20].
The table below (Table 2) presents selected EMG signal parameters deter-
mined for each study stage.

Table 2. Mean values of EMG signal parameters at different stages of the research protocol

Feature                 Muscle                   Baseline  Study  Bench   Chair
Mean amplitude [uV]     SCMb – right             4.84      5.25   5.78    5.39
                        SCMb – left              4.58      4.67   5.12    5.16
                        SCMe – right             6.70      7.50   9.49    8.41
                        SCMe – left              7.60      7.94   8.90    8.92
                        V – right                9.47      7.85   8.58    7.56
                        V – left                 9.20      8.25   12.24   13.63
                        Average for all muscles  7.07      6.91   8.35    8.18
Average max value [uV]  SCMb – right             12.24     13.52  29.66   13.26
                        SCMb – left              9.45      13.30  14.89   14.51
                        SCMe – right             16.18     22.08  27.17   28.03
                        SCMe – left              19.20     21.76  24.42   23.95
                        V – right                51.68     71.16  43.53   55.00
                        V – left                 43.33     50.89  113.01  130.28
                        Average for all muscles  25.35     32.12  42.11   44.17

The obtained values of the mean amplitude and the maximum value are lowest for the stage before the test (baseline) and during the learning of the movement sequence, and increase as the subsequent stages of the experiment progress. Table 3 shows the results of the JAWS test performed before and after the study.

Table 3. Results of JAWS tests performed before and after exercise

                        Before  After
JAWS  Total sum         35.5    43.2
      Positive emotion  14.5    16.5
      Negative emotion  15.0    9.3

In the second stage (after), a higher mean summed test value was obtained, with a dominance of positive emotions. When analysing the music preference questionnaire results, it is essential to note the probands' attitude towards music. Most of the probands (80%) did not have any musical education, although some (34%) were self-taught musicians. Of all probands, 67% indicated that music and related activities were essential to them, while the remainder stated that music was an indifferent part of their lives. Participants were reluctant to take part in music-making, singing, and musical activities. They were much more likely to choose passive forms of communing with music, such as listening to music or attending concerts, devoting on average 15 h a week to this. The respondents most often used a smartphone or a computer during their leisure time, and the desire to interact with music was indicated much less frequently. The music genres most commonly chosen by the participants were film music and rock; half of the participants also preferred classical music. The study group's most frequently selected musical activities included buying records/subscriptions, talking about music, and making music for personal enjoyment. Less common choices were reading about, listening to, or watching music programs, attending music events, and making music with family and friends. None of the people participating in the survey performed music for a living or played/sang in a musical ensemble or choir.

4 Discussion

This paper presents the results of a preliminary study analysing the effect of a series of voice commands on a patient's ability to learn and maintain correct posture while sitting. The presented research protocol was based on the acquisition of physiological data, such as EMG, EDA, and BVP, and of cognitive data, such as the JAWS emotion intensity score and the subjects' music preferences.

The analysis was based on changes in the mean values of the individual physiological coefficients in the successive stages of the research protocol (Table 1). Each subsequent element required more and more involvement and control from the test subject, and an increase in the mean and maximum values of the EDA signal, the SCL level, and the nGSR was noted. This indicates stimulation of the autonomic nervous system, which is responsible for EDA activation and heart rate [21], and may be related to perceived stress or to coping with the situation [22]. It should also be noted that, despite the increase in the number of GSRs in the subsequent components of the research protocol, the mean value of their amplitude decreases. These arousals are no longer as intense as at the beginning, when they were associated with a new, unknown situation. In a sense, the patients got used to the situation, despite the stress and discomfort still felt [23].
The HRV coefficients SDNN and HR were consistent with the norms [18]. The RMSSD and pNN50 parameters showed some abnormalities – RMSSD was too low for this age group, while pNN50 was too high. This is probably because the recording was relatively short, classified as ultra-short (less than 5 min); according to the available literature, a more accurate analysis would require a recording of at least 5 min [24]. The determined values of the mean amplitude and the maximum value of the EMG signal differed between the individual stages of the analysis. They were lowest for the stage before the test (baseline) and during the learning of the movement sequence, and increased as the subsequent phases of the experiment progressed. This was expected, as prolonged execution of a specific movement increases the number of recruited motor units in the muscle performing it [19], which indicates the subjects' commitment to the exercises. Moreover, the values obtained for symmetrical muscles were similar, suggesting that the performed movements loaded the right and left sides of the body equally. Any deviation in the results achieved by a specific muscle pair may indicate a significantly weaker muscle, which may be an indication to adjust the training accordingly for a particular group of subjects in subsequent experiments [25]. For the upper trapezius muscle (V – left), a significantly different average maximum value was observed at the test stage involving independent movement, in contrast to the right side (V – right) and the initial stages of the test. A discrepancy in the results due to displacement of the pair of electrodes attached over the trapezius muscle could not be excluded, as the amplitude range of the EMG signal should be within 0–10 mV (±5 mV); this suggests the need to pay attention to the correct placement of the electrodes and to observe their position during the test [26].
When analysing the intensity of the subjects' emotions (Table 3), attention should be given to their variability before and after the corrective exercises. The predominance of positive over negative emotions in the mean values may indicate that negative affect was not important for the exercise performance. The observed phenomenon can be perceived as a positive experience for the probands and a willingness to learn to maintain correct body posture while sitting. Before the tests, the subjects indicated a predominance of negative emotions, which may have resulted from the unfamiliarity of the situation – a test supervised by specialists, the lack of a prior detailed description of the research protocol, or being informed about the need to expose the relevant muscle parts.

Due to the small study group and the preliminary character of the research, the importance of musical preference for the response to commands cannot be determined.
To the authors’ knowledge, there were no studies available that analyse the
effect of a series of voice commands on a patient’s ability to learn to main-
tain correct posture while sitting. Literature sources only report some related
methods using sensors, in an independent configuration, to assess the emotional
state, perceived stress during testing, response to musical entrainment, or muscle
tension during exercise.
The issues of emotion analysis in combination with EMG signal recording
were repeatedly referred to as facial EMG analysis [27–29]. The electrodermal
activity and electrocardiographic signals were the most frequently used physio-
logical patterns of subjects’ emotional responses as a direct reflection of sympa-
thetic nervous system activity [30–32]. However, when analysing the cited solu-
tions, it should be noted that the evoked emotions were controlled each time. A
given event was supposed to cause a targeted feeling. It was not a reaction to
The Effect of Therapeutic Commands 403

spontaneous behavior, significantly distinguishing the available literature from


the presented work.
Music, emotion, and physiological responses are inextricably linked, and their interdependence has been used in many areas of study involving a wide range of sensory-motor, cognitive, and emotional processes [33]. One key aspect is musical entrainment, which has been, and will continue to be, a frequently used technique in treating musculoskeletal dysfunction [34]. Music acts as a specific stimulus producing motor and emotional responses through movement and the stimulation of different sensory pathways [35]. Verbal interaction between the patient and the physiotherapist, especially its emotional character, is the main element of any therapy conducted, both for the proband's understanding and performance of the appropriate exercises and for their possible correction. According to the available data, verbal communication is the most important component in organising and conducting therapy, which is reflected in the treatment results [36–38].
A scientifically important and experimentally forward-looking intermediate
goal is to evaluate the qualitative and quantitative impact of the emotional
content of the physiotherapist’s oral expression used to obtain the desired patient
behavior during exercise.
A further part of the research is the sonification of these commands and their
emotional content, i.e., changing the acoustic signal of speech (the physiothera-
pist’s commands) into a continuous sound with melodic characteristics based on
the spoken text. The undertaken experiment uses the possibility of modifying
the expression of emotions during verbal communication with the patient during
therapeutic exercises.

5 Conclusion
This paper presents the results of a preliminary study analysing the effect of a series of voice commands on a patient's ability to learn and maintain a correct posture while sitting. Based on the conducted experiment, attention should be drawn to the need to study a larger group of people, to determine an additional number of physiological parameters, to correlate them subsequently with the psychological data and the recorded accelerations, and to examine the influence of personal musical preferences on the obtained results. To achieve such a set of tasks in assessing dynamic changes in the patient's condition, it is necessary to develop a multimodal measurement system that records the physiological state and psychological condition in real time, in relation to social and environmental external conditions.

References
1. Alves-Conceicao, V., da Silva, D.T., de Santana, V.L., Dos Santos, E.G., Santos,
L.M.C., de Lyra, D.P.: Evaluation of pharmacotherapy complexity in residents
of long-term care facilities: a cross-sectional descriptive study. BMC Pharmacol.
Toxicol. 18(1), 1–8 (2017)
2. Jesus, T.S., Hoenig, H., Landry, M.D.: Development of the rehabilitation health
policy, systems, and services research field: quantitative analyses of publications
over time (1990–2017) and across country type. Int. J. Environ. Res. Public Health
17(3), 965 (2020)
3. Osterweis, M., Kleinman, A., Mechanic, D.: Pain and disability: Clinical, behav-
ioral, and public policy perspectives (1987)
4. Muntigl, P., Horvath, A. O., Chubak, L., Angus, L.: Getting to “yes”: overcoming
client reluctance to engage in chair work. Front. Psychol. 11 (2020)
5. Palazzo, C., et al.: Barriers to home-based exercise program adherence with chronic
low back pain: Patient expectations regarding new technologies. Annals Phys.
Rehabilitation Med. 59(2), 107–113 (2016)
6. Clayton, M., Sager, R., Will, U.: In time with the music: the concept of entrainment and its significance for ethnomusicology. In: European Meetings in Ethnomusicology, vol. 11, pp. 1–82. Romanian Society for Ethnomusicology (2005)
7. Szeto, G.P., Straker, L., Raine, S.: A field comparison of neck and shoulder postures
in symptomatic and asymptomatic office workers. Appl. Ergon. 33(1), 75–84 (2002)
8. Chiu, T.T.W., Ku, W.Y., Lee, M.H., Sum, W.K., Wan, M.P., Wong, C.Y., Yuen,
C.K.: A study on the prevalence of and risk factors for neck pain among university
academic staff in Hong Kong. J. Occup. Rehabil. 12(2), 77–91 (2002)
9. Moore, M.K.: Upper crossed syndrome and its relationship to cervicogenic
headache. J. Manipulative Physiol. Ther. 27(6), 414–420 (2004)
10. Neblett, R., Mayer, T.G., Brede, E., Gatchel, R.J.: Correcting abnormal flexion-
relaxation in chronic lumbar pain: responsiveness to a new biofeedback training
protocol. Clin. J. Pain 26(5), 403 (2010)
11. Ma, C., Szeto, G.P., Yan, T., Wu, S., Lin, C., Li, L.: Comparing biofeedback
with active exercise and passive treatment for the management of work-related
neck and shoulder pain: a randomized controlled trial. Arch. Phys. Med. Rehabil.
92(6), 849–858 (2011)
12. Robinson, D., Kesser, B.W.: Frankfort horizontal plane. In: Kountakis, S.E. (eds.)
Encyclopedia of Otolaryngology, Head and Neck Surgery, pp. 960–960. Springer,
Heidelberg (2013). https://doi.org/10.1007/978-3-642-23499-6 200042
13. Białkowski, A., Migut, M., Socha, Z., Wyrzykowska, K.M.: Muzykowanie w Polsce. Badanie podstawowych form muzycznej aktywności Polaków (2014). (in Polish)
14. Van Katwyk, P.T., Fox, S., Spector, P.E., Kelloway, E.K.: Using the Job-Related
Affective Well-Being Scale (JAWS) to investigate affective responses to work stres-
sors. J. Occup. Health Psychol. 5(2), 219 (2000)
15. Mittelstaedt, H.: Origin and processing of postural information. Neurosci. Biobe-
havioral Rev. 22(4), 473–478 (1998)
16. Greco, A., Valenza, G., Lanata, A., Scilingo, E.P., Citi, L.: cvxEDA: a convex optimization approach to electrodermal activity processing. IEEE Trans. Biomed. Eng. 63(4), 797–804 (2016)
17. Romaniszyn-Kania, P., et al.: Affective state during physiotherapy and its analysis
using machine learning methods. Sensors 21(14), 4853 (2021)
18. Shaffer, F., Ginsberg, J.P.: An overview of heart rate variability metrics and norms. Front. Public Health 5, 258 (2017)
19. Konrad, P.: ABC EMG: praktyczne wprowadzenie do elektromiografii kinezjologicznej. Technomex Sp. z o.o. (2007). (in Polish)
20. Van Ravenswaaij-Arts, C.M., Kollee, L.A., Hopman, J.C., Stoelinga, G.B., van
Geijn, H.P.: Heart rate variability. Ann. Intern. Med. 118(6), 436–447 (1993)
21. Carlson, N.R.: Physiology of Behavior: Books a la Carte Edition. Prentice Hall
(2016)
22. Setz, C., Arnrich, B., Schumm, J., La Marca, R., Tröster, G., Ehlert, U.: Discrim-
inating stress from cognitive load using a wearable EDA device. IEEE Trans. Inf.
Technol. Biomed. 14(2), 410–417 (2009)
23. Junk, K., Peller, L., Brandenburg, H., Lehrmann, B., Henke, E.: Physiological
Stress Response to Anticipation of Physical Exertion (2018)
24. Munoz, M.L., et al.: Validity of (ultra-) short recordings for heart rate variability
measurements. PloS One 10(9), e0138921 (2015)
25. Balasubramanian, V., Adalarasu, K.: EMG-based analysis of change in muscle
activity during simulated driving. J. Bodyw. Mov. Ther. 11(2), 151–158 (2007)
26. Reaz, M.B.I., Hussain, M.S., Mohd-Yasin, F.: Techniques of EMG signal analysis:
detection, processing, classification and applications. Biological Procedures Online
8(1), 11–35 (2006)
27. Van Boxtel, A.: Facial EMG as a tool for inferring affective states. In: Proceed-
ings of measuring behavior, vol. 7, pp. 104–108. Wageningen: Noldus Information
Technology, August 2010
28. Mithbavkar, S.A., Shah, M.S.: Analysis of EMG based emotion recognition for
multiple people and emotions. In: 2021 IEEE 3rd Eurasia Conference on Biomedical
Engineering, Healthcare and Sustainability (ECBIOS), pp. 1–4. IEEE, May 2021
29. Sato, W., Murata, K., Uraoka, Y., Shibata, K., Yoshikawa, S., Furuta, M.: Emo-
tional valence sensing using a wearable facial EMG device. Sci. Rep. 11(1), 1–11
(2021)
30. Zheng, B. S., Murugappan, M., Yaacob, S., Murugappan, S.: Human emotional
stress analysis through time domain electromyogram features. In: 2013 IEEE Sym-
posium on Industrial Electronics & Applications, pp. 172–177. IEEE, September
2013
31. Canento, F., Fred, A., Silva, H., Gamboa, H., Lourenço, A.: Multimodal biosignal
sensor data handling for emotion recognition. In: 2011 IEEE SENSORS, pp. 647–
650. IEEE, October 2011
32. Egger, M., Ley, M., Hanke, S.: Emotion recognition from physiological signal anal-
ysis: a review. Electron. Notes Theoretical Comput. Sci. 343, 35–55 (2019)
33. Vuilleumier, P., Trost, W.: Music and emotions: from enchantment to entrainment.
Ann. N. Y. Acad. Sci. 1337(1), 212–222 (2015)
34. Galińska, E.: Music therapy in neurological rehabilitation settings. Psychiatr. Pol.
49(4), 835–846 (2015)
35. Le Roux, F.: Music: a new integrated model in physiotherapy. South African J.
Physiotherapy 54(2), 10–11 (1998)
36. Talvitie, U., Reunanen, M.: Interaction between physiotherapists and patients in
stroke treatment. Physiotherapy 88(2), 77–88 (2002)
37. Gyllensten, A.L., Gard, G., Salford, E., Ekdahl, C.: Interaction between patient and
physiotherapist: a qualitative study reflecting the physiotherapist’s perspective.
Physiother. Res. Int. 4(2), 89–109 (1999)
38. Klaber Moffett, J.A., Richardson, P.H.: The influence of the physiotherapist-
patient relationship on pain and disability. Physiother. Theory Pract. 13(1), 89–96
(1997)
Improving the Process of Verifying
Employee Potential During Preventive
Work Examinations – A Case Study

Marcin Bugdol1(B), Anita Pollak2, Patrycja Romaniszyn-Kania1, Monika N. Bugdol1, Magdalena Jesionek2, Aleksandra Badura1, Paulina Krasnodębska3, Agata Szkiełkowska3, and Andrzej W. Mitas1

1 Faculty of Biomedical Engineering, Silesian University of Technology, ul. Roosevelta 40, 41-800 Zabrze, Poland
{marcin.bugdol,patrycja.romaniszyn-kania,monika.bugdol,aleksandra.badura,andrzej.mitas}@polsl.pl
2 Institute of Psychology, University of Silesia in Katowice, ul. Bankowa 12, 40-007 Katowice, Poland
anita.pollak@us.edu.pl
3 Audiology and Phoniatric Clinic, World Hearing Centre, Institute of Physiology and Pathology of Hearing in Warsaw, ul. Mokra 17, 05-830 Kajetany, Poland
{p.krasnodebska,a.szkielkowska}@ifps.org.pl

Abstract. The paper presents a new approach to the preventive examination of employees, motivated by the low reliability of appraisals made when the stress experienced does not allow employees to present their potential fully. The approach is illustrated with a case study of a voice professional. The study provides a detailed, quantitative view of the emotions experienced and revealed during the voice recording and audiometry procedures. Differences in the intensity of emotions were found as a result of the study. EDA and HRV measurement values were highest when questionnaires concerning the emotions experienced during voice recording and audiometry were being completed. The discussion focuses on the possibilities of analysing emotions using psychophysiological measurements and on the benefits of combining research methods (physical examinations, psychological examinations, and psychophysiological measurements) in the context of employee appraisal, in order to predict the effects of the impact of work on the subjective well-being and health of individuals.

Keywords: Preventive work examinations · Stress during presentation · Job emotions · Signal analysis · Heart rate variability · Electrodermal activity

1 Introduction
Preventive examinations of employees make it possible to determine the absence
of contraindications for further work in the specific position and to estimate the
risk of occupational diseases. They thus confirm that the employee has competences at a level that ensures the performance of work as expected by the employer. At the same time, the appraisal makes it possible to plan the employee's further career development. As in the case of job interviews, a limitation is the low reliability of the appraisals made, due to the stress experienced, which does not allow the employee to present their potential fully. A modification of the current way of assessing voice professionals' examinations is suggested, consisting of supplementing the ENT examination with questionnaires and standardised emotion assessment methods, and supplementing stress surveys with psychophysiological measurements. The authors set themselves the objective of testing a procedure for the reliable appraisal of the professional competences presented, one that takes into account the stress that every appraisal situation induces in the individual, leading to a deterioration in the quality of task performance and to an inadequate assessment of the individual's potential by others.
The transactional stress theory proposed by Lazarus and Folkman treats
stress as a situational relationship between the environment and the individual,
interpreted by the latter as exceeding their resources or threatening their well-
being [1]. The situation in which the individual is assessed in terms of their pro-
fessional competences can be regarded as stress-inducing, as issuing an opinion
involves social evaluation, important for the candidate’s future [2]. McCarthy and
Goffin identified five areas of anxiety in an employee selection situation, related
to communication (stress related to verbal and non-verbal communication and
listening competencies); appearance (apprehension about physical look); social
(nervousness about social, behavioral appropriateness because of the desire to
be liked); performance (worry about the outcome, such as fear of failure); and
behavioral (expression of the autonomic arousal of anxiety, such as uneasiness or
fidgeting) [3]. In addition, displaying anxiety during a job interview reduces the
likelihood of being hired, and performance deteriorates, especially in situations
where candidates are competing [4,5].
The theory by van Katwyk et al. links emotional states to occupational activ-
ity, dividing them in terms of their positive and negative valence (pleasure and
displeasure) [6]. Taking into account the intensity of arousal (Low/High arousal),
four groups of emotions can be distinguished: high pleasure/high arousal (e.g.,
euphoria), low pleasure/high arousal (e.g., disgust), high pleasure/low arousal
(e.g., satisfaction), and low pleasure/low arousal (e.g., boredom). Van Katwyk’s
theory can be linked to the concept of stress by Selye, according to which pos-
itive emotions with a high degree of arousal can be referred to as psychological
eustress, while negative emotions with a high degree of arousal can be referred to
as psychological distress [7,8]. Eustress has a positive, mobilising impact on the
individual and has a protective effect. It increases one’s energy to act, improves
performance, and is accompanied by emotions such as excitement and joy. Dis-
tress, on the other hand, is a state that reduces concentration and performance,
perceived as a situation beyond the individual’s ability to cope. It is associated
with unpleasant emotions, i.e., fear, uncertainty, and helplessness [7].
The relationship mentioned above, consisting of feeling stress during an
employee selection situation, depends on the individual’s cognitive appraisal and
stress management skills [9]. Lazarus and Folkman pointed out that people con-
stantly assess what happens to them in terms of its relevance to their well-being
[10]. On the basis of research by these authors, two types of the appraisal can
be distinguished: primary appraisal and secondary appraisal. Primary appraisal
concerns motivational meaning, i.e., whether something relevant to our well-
being is happening. Primary judgments can be divided into three categories:
harm already experienced, threat, i.e., anticipated harm, and challenge, i.e., the
possibility of achieving mastery or gain. Challenge is mentioned in the context of
stress assessment because the individual needs to mobilise to overcome obstacles
and achieve a favorable outcome. Challenges always involve a threat, as there
is always a certain risk of suffering harm. Secondary appraisal is a key element
complementing primary appraisal, since harm, threat, challenge and benefit also
depend on the extent to which one believes they have control over the outcomes.
If there is a risk of a harmful outcome, but one is confident that it can be
prevented, the sense of threat is either minimal or entirely absent.
Choosing emotion as a variable in voice recording is justified, in addition to its association with stress, by the fact that current technology makes it possible to measure emotions, including stress, using a voice sample. The most commonly used measures are voice amplitude (i.e., loudness) and pitch (also referred to as fundamental frequency, or F0) [11]. The most consistent relationship described in
the literature concerning emotion and voice pitch showed a correlation between
stronger emotional arousal and a higher tone of voice [12–14]. Scherer et al.
investigated the acoustic features of phrases uttered by actors [15]. When actors
presented high valence emotions such as fear, joy, and anger, the pitch was
higher compared to when they presented lower valence emotions, such as sad-
ness. Bachorowski and Owren suggested that voice pitch could be used to assess
the level of emotional arousal being experienced by an individual [16]. Phys-
iological signals, i.e., electrodermal activity and skin temperature, as well as
heart rate variability (HRV), have been proven to be reliable stress indicators
[17,18]. Thus, using them in conjunction with subjective psychological measures
can deepen insights into the fundamental psychological mechanisms describing
reactions to stress. Combining more than one measurement strategy makes it
possible to achieve higher validity in studies, so it is justified to combine physi-
ological measurements with self-report questionnaires [19]. Taking into account the psychological variables present in the study (stress and emotions), it needs to be mentioned that self-report methods do not make it possible to measure stress "live" at a given moment, but only after a certain amount of time, following cognitive reappraisal [20]. Data collected using declarative methods make it possible to capture the individual's interpretation of their reaction rather than the reaction itself [21].

2 Materials and Methods


2.1 Aim of the Research
The nature of the study presented here is preliminary. It focused on collecting quantitative data concerning the emotions experienced during the appraisal of professional competences through a preventive examination. The description is provided in the form of a case study. Both standardised psychological methods in the form of questionnaires and psychophysiological measurements were used in the studies to improve validity.

2.2 Subject
The studied individual was chosen deliberately. The research problem addressed is personally important for the person invited to participate in the study, a singer from a leading opera ensemble in Poland. This representative of the group is excellently prepared for the job: they actively perform their work-related duties, and their decisions concerning the continuation and directions of further career development take into account their current state of health and competence level. They work with their voice for 4 h a day on average. In the process of competence appraisal, importance is given to the interpretation presented by both the subject and the relevant expert.
The research was conducted in accordance with the recommendations of the
Declaration of Helsinki, with prior approval by the ethics board. Before the
start of the measurements, the participant was informed about the purpose of
the study and the following steps and consented to them in writing. The data
obtained were fully anonymised.

2.3 Equipment

Specialised equipment was used in the research protocol presented here. An Empatica E4 was used to record physiological signals during the subsequent stages of the study. The patient's voice was recorded in an appropriately soundproofed audiometric chamber with a Kay Elemetrics device. The Voice Handicap Index (VHI) questionnaire was used in the study [22]. The psychological measurements were performed using a single survey question concerning anticipated stress and the standardised Job-related Affective Well-being Scale (JAWS) questionnaire [6]. The surveys and questionnaire studies were conducted using the paper-and-pencil method.

2.4 Data Acquisition


The study was conducted in a specially designated room in the premises of the
Institute of Hearing Physiology and Pathology at the World Hearing Centre in
Kajetany to ensure calm, comfort and intimacy to the participant. Only the
persons conducting the tests and the participants were present in the room
during the study. The presented research protocol consisted of several steps
(Fig. 1).
Fig. 1. Research protocol

The first stage involved a preliminary assessment by an audiologist and phoniatrist, to qualify the participant for further testing. The subject underwent a standard ENT/phoniatric examination in the physician's office, including video otoscopy, anterior rhinoscopy, examination of the oral cavity and of the pharynx,
and endoscopy of the larynx. In addition, the participant was asked to complete the VHI questionnaire, which consists of three parts with a total of 30 questions concerning the functional, emotional, and physical aspects of a person's voice disorders, together with a question about the intensity of anticipated stress related to the appraisal of professional competences [22]. Upon entering the room where further tests were to be performed, the subject was first fitted with the Empatica E4 device on the wrist of the non-dominant hand, making it possible to record physiological signals such as electrodermal activity (EDA), temperature, blood volume pulse (BVP), and accelerometer (ACC) signals in real time. The sampling frequency was 4 Hz for the EDA and temperature signals, 64 Hz for the BVP signal, and 32 Hz for the ACC. The signals were recorded from that moment until the completion of the entire research protocol, after which they were exported as raw data to a .csv file for further analysis.
The subject was then asked to enter a special soundproofed audiometric chamber, where their voice was recorded using the Kay Elemetrics device (Fig. 2). The subject's task was to pronounce the vowels a, e, and i one after another in prolonged phonation. During voice recording, the subject wore noise-canceling headphones and was alone in the booth. After the recording, the subject was asked to complete the JAWS questionnaire (JAWS1).
The next step in the research protocol involved tonal audiometry, followed by
impedance audiometry. The first test type involved determining what is referred
to as the hearing threshold, i.e., the quietest sound, in the range of frequencies
tested. For this purpose, the patient was asked to sit in the chamber wearing
noise-canceling headphones and holding a special button. After the test started,
the audiometer generated tones of different frequencies, changed by the operator
at the console (outside the chamber). The test was conducted using a standard
clinical audiometer, measuring air conduction in the range of 125 Hz to 8 kHz and bone conduction in the range of 250 Hz to 4 kHz [23]. The patient was asked to click
a button each time they heard a sound. The clinical criterion for normal hearing
level in tonal audiometry, according to World Health Organization guidelines,
refers to values not exceeding 20 dB HL. Directly afterwards, impedance audiom-
etry was performed as the most accurate and objective method of middle ear
examination, measuring middle ear pressure, stapedius reflex (with ipsilateral
and contralateral stimulation), and tympanic membrane tension. This type of audiometry was performed using a Clarinet Inventis impedance bridge. A probe with an appropriately sized tip was inserted into the subject's ear, and subsequently the device performed the entire test automatically. Directly after the test, the subject was once again asked to complete the JAWS test, this time concerning what they had felt during both types of audiometry (JAWS2). Removal of the E4 device from the participant's hand ended the research protocol.

Fig. 2. Soundproofed audiometric chamber

2.5 Data Analysis


Throughout the research protocol, the E4 made it possible to place time markers (at the start and end of each stage) in the following order: voice recording, JAWS1, audiometry, JAWS2. In the subsequent analysis, this facilitated the division of the EDA and BVP signals, and of the HRV coefficients determined from the BVP, into appropriate time segments depending on the activity. For each of the presented protocol stages, analogous features were determined for all physiological signals.
The EDA signal was divided into the tonic (t) and the phasic component (p),
and also additive error (e) according to the concept by Greco et al. [24]. Galvanic
skin responses (GSR), i.e., sudden spikes (local maxima), were then sought in the
p signal. On this basis, the number and amplitude of all GSRs in the analysed
time segment were determined, along with the number of responses per minute
(rpm). Skin conductance level (SCL) was calculated using the tonic component
[2]. The sum of the tonic and phasic components was used to calculate the basic
statistical characteristics of the signal in the time domain, such as the mean (x),
standard deviation (σ), minimum value (min), and maximum value (max ) [25].
The basic signal recorded in relation to cardiac activity was the blood volume pulse (BVP), providing the basis for the determination of heart rate variability (HRV) coefficients. The accelerometer signal recorded by the Empatica E4
throughout the research protocol was also important at this stage of the analy-
sis. It was used as a reference signal to eliminate artefacts of the cardiac signals
associated with the subject’s movement. The BVP signal provided the basis for
the determination of the IBI (inter-beat interval) vector, i.e., successive time
intervals (dt) between individual heartbeats – local maxima or minima, on the
basis of which the following coefficients were calculated at the next stage of the
analysis: standard deviation of normal-to-normal intervals (SDNN ), root mean
square of successive differences (RMSSD), probability of intervals greater than
50 ms (pNN50 ), and mean heart rate (HR) determined on the basis of IBIs, as
the basic parameters for ultrashort cardiac signal recordings [26].
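The step from the BVP trace to the IBI vector can be sketched as simple peak picking; the minimum beat separation below (corresponding to a maximum of about 180 bpm) and the prominence heuristic are assumptions, not values from the paper.

```python
import numpy as np
from scipy.signal import find_peaks

def ibi_from_bvp(bvp, fs=64.0):
    """Detect systolic peaks in a BVP trace and return inter-beat intervals in seconds."""
    bvp = np.asarray(bvp, dtype=float)
    min_dist = int(fs * 60.0 / 180.0)                    # at most ~180 bpm between beats
    peaks, _ = find_peaks(bvp, distance=min_dist, prominence=np.std(bvp))
    return np.diff(peaks) / fs                           # consecutive time intervals dt
```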
The Kay Elemetrics device, combined with a computer and appropriate software, makes it possible to perform a very accurate analysis of the acoustic structure of the voice. The Multi-Dimensional Voice Program (MDVP) software was used, which allows 33 acoustic parameters of the voice to be analysed. Most often, however, 17 parameters are used in clinical practice to determine percentage changes of a specific feature of the human voice. These parameters were divided into groups defining the physical characteristics of the voice: parameters describing the fundamental frequency (F0, Fhi, Flo, STD), parameters assessing frequency disturbance (Jitt, RAP, PPQ, sPPQ, vF0), amplitude disturbance (Shimm, APQ, sAPQ, vAm), voice irregularity (DUV), vocal tremor (modulation) (FTRI, ATRI), and voice breaks (DVB), parameters indicating the presence of subharmonic components (DSH), and parameters indicating the presence of noise components (NHR, VTI, SPI).
MDVP does not make it possible to extract parameters from selected sounds, only from the whole recording. It is also impossible to expand the list of determined coefficients. For this reason, additional parameters were calculated separately for the individual sounds. These were the parameters used in [27].
Harmonic to Noise Ratio (HNR) describes the ratio of harmonics to non-
harmonics of the signal. It is based on autocorrelation and fundamental frequency
(F0) analysis. Fundamental Frequency Variability in Frames (vF0frames) stands
for the ratio of F0 standard deviation to its mean value. The set of F0 values
is determined for each frame of a signal. Next, Spectrum Centroid (SC ) reflects
the contribution of formants in power spectrum density. It indicates voice sharp-
ness. Spectrum Spread (SS ) determines the energy distribution with respect to
the spectrum centroid. SS makes it possible to distinguish noise from sound sig-
nals. Signal amplitude modulation is described by Shimmer (Shimm). It shows
changes in subglottic pressure connected with vocal fold tension. Jitter (Jitt)
reflects short-term F0 variability. Jitter can be used to evaluate self-control of
vocal fold vibration. Fundamental Frequency Variability (vF0 ) corresponds to
the vF0frames feature, but F0 is determined for each signal period. Amplitude
Variability (vAm) is determined by the ratio of the standard deviation of the
amplitude to its mean value. Finally, Noise to Harmonic Ratio (NHR) shows the
ratio of non-harmonic to harmonic energy.
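Several of these definitions reduce to one-line formulas. A minimal Python sketch, illustrative only (the Hann window and voicing convention are assumptions):

import numpy as np

def vf0_frames(f0_per_frame):
    # ratio of the F0 standard deviation to its mean over voiced frames
    f0 = np.asarray(f0_per_frame, dtype=float)
    f0 = f0[f0 > 0]                       # drop unvoiced frames
    return f0.std() / f0.mean()

def spectrum_centroid_spread(frame, fs):
    # SC: power-spectrum "centre of mass"; SS: energy spread around SC
    psd = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    sc = np.sum(freqs * psd) / np.sum(psd)
    ss = np.sqrt(np.sum((freqs - sc) ** 2 * psd) / np.sum(psd))
    return sc, ss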

2.6 Psychological Measurement


The anticipated stress level related to the appraisal of professional competences
was determined using a single-item measurement, with the following question:
To what extent, on a scale of 1 to 10, do you feel stressed about your profes-
sional competence being verified during the tests? The respondent indicated their
answer on a graduated scale whose ends were marked with the appropriate labels,
with 1 = not at all on the left and 10 = strongest possible on the right.
Emotions were analysed using the Job-related Affective Well-being Scale
in its 12-item version. This tool measures emotional reactions occurring in an
occupational context. It is based on the theory by van Katwyk, which assumes
that experienced emotions can be described two-dimensionally: in terms of the
valence criterion (pleasure/displeasure), and in terms of arousal when the emo-
tion is experienced (low arousal/high arousal) [6]. The general categories consist
of individual emotions: the high pleasure/high arousal dimension involves excite-
ment, energy, and inspiration. This sphere can also be referred to as eustress.
High pleasure/low arousal emotions include relaxation, satisfaction, and feeling
at ease. The other part in terms of emotion valence involves low pleasure/high
arousal, i.e., anger, concern, and disgust, which can also be classified in the stress
category, and low pleasure/low arousal, i.e., fatigue, depression, and discourage-
ment. The response format is a 5-point Likert scale (from 1 – “never” to 5 – “very
often”). The answers given by the respondent make it possible to receive several
JAWS results: Total Score, Total Positive Emotions, Total Negative Emotions, as
well as totals for the individual subscales: eustress, High Pleasure/Low Arousal,
distress, Low Pleasure/Low Arousal.

3 Results

The subject assessed the situation of verifying their competences before the
start of the study as non-stressful. The VHI questionnaire score did not indicate
difficulties due to interference associated with voice pathologies. This is con-
trary to the answers they provided after the completion of the study. Both voice
recording, which was specific to the subject, and which measured the relevant
professional competence (voice), and the audiometry, related to the subject’s
complementary competence (hearing) induced emotions of a distress nature in
the subject, exceeding the theoretical average and the average for the subject
(Table 1).

Table 1. JAWS questionnaire results collected after the voice recording and audiometry

                  Total  Total     Total     Eustress  High      Distress  Low
                  score  positive  negative            pleasure            pleasure
                         emotions  emotions            low                 low
                                                       arousal             arousal
Possible range    12–60  6–30      6–30      3–18      3–18      3–18      3–18
Theoretical mean  30     15        15        6         6         6         6
JAWS 1            36     15        9         6         3         9         6
JAWS 2            37     16        9         8         3         8         6
The experienced stress is also confirmed by the measurements of EDA and
HRV signal parameters (Table 2): compared with the baseline recorded before
the start of the study, more stress was experienced during the voice recording
and the completion of JAWS 1 and, respectively, during the full test and the
completion of JAWS 2. On the basis of the data in Table 2, an increase should
be noted in the values of all EDA parameters (except the mean value and the
minimum) in the successive stages of the research protocol compared to the
baseline. In terms of the heart rate variability parameter analysis, the highest
SDNN, RMSSD and pNN50 values were determined during the voice recording.
The JAWS emotion assessment test caused an increase in HR each time
compared to the baseline. No clear relationship can be established in the
remaining cases.

Table 2. Mean values of features measured during the trial

                    Baseline  Voice    JAWS 1   Audiometry  JAWS 2
EDA  GSR            3         6        15       99          15
     GSR amp        0.901     1.772    1.620    2.438       1.094
     rpm            3.025     12       11.180   10.458      14.458
     SCL [uS]       1.452     1.057    3.225    1.637       1.303
     x [uS]         1.171     1.056    1.188    0.362       0.392
     σ              0.033     0.039    0.007    0.049       0.108
     min [uS]       0.172     0.070    0.030    0.060       0.150
     max [uS]       1.925     1.852    2.314    3.084       3.311
HRV  SDNN [ms]      131.101   125.352  121.442  185.230     197.944
     RMSSD [ms]     31.140    32.557   32.351   23.513      29.747
     pNN50 [%]      18.820    16.667   38.461   14.427      23.077
     HR             73        65       84       71          84

An analysis of the results obtained by the study subject in the voice recording
in terms of total positive emotions leads to the observation that they are equal
to the theoretical mean in the test – the maximum score on this scale is 30
(Fig. 3). At the same time, they are higher than the mean of all responses given
by the subject. In terms of total negative emotions, the result obtained is lower
than the theoretical mean and also lower than the mean of negative emotions at
the same stage of the study.
In this study phase, the individual emotions experienced by the study subject
can be characterised as a complex of low pleasure/low arousal negative emotions
that exceeded the theoretical mean and the mean for the study subject and
of low and high arousal positive emotions. The study subject scored below the
mean on high arousal negative emotions. The second stage of the study involved
the same variables but measured during audiometry. There was an increase in
overall mean positive emotions but mean negative emotions did not change.
Fig. 3. Differences in the intensity of certain categories of emotions during the voice
recording and audiometry

A closer look at the results shows that the mean did not change either in
terms of low arousal negative emotions or high arousal negative emotions. Thus,
the change concerns only positive emotions: compared to the voice recording, in
the other test the mean increased for high arousal positive emotions, while the
mean for low arousal positive emotions decreased. The emotions experienced
include low pleasure/high arousal and high pleasure/high arousal (above the
theoretical mean and the mean for the study subject), and low pleasure/low
arousal (whose value is below the theoretical mean and slightly below the mean
for the study subject), as well as high pleasure/low arousal.
The results indicate a predominance of positive emotions experienced dur-
ing the voice recording and audiometry. More intensely experienced emotions
appeared during the audiometry, which may imply the presence of eustress in
the subject. The emotion complex observed here is formed by emotions differing
in terms of valence and intensity, meaning that emotions of opposite valence can
be observed simultaneously. The effect is more noticeable at the first stage of
the study, with the subject facing a more stressful situation when their vocal
abilities were being tested.
4 Discussion
Voice professionals, such as singers, are required to undergo examinations of the
vocal organ every 3–5 years (Annex No. 1 to the Ordinance of the Polish Minister
of Health and Welfare of 30 May 1996) [28]. Obtaining a favorable opinion from
a specialist physician confirms their physical capacities and suitability to pursue
the profession. In the case of the individual described here, neither the physical
examination nor the voice recording and audiometry revealed any contraindica-
tions for continuing to pursue it. The subject proceeded to undergo the voice
recording and audiometry without anxiety, as indicated by their responses in the
anticipated stress questionnaire and in the psychophysiological measurements.
According to congruence theory, when individual abilities, capacities, and pref-
erences match the requirements of a specific job, one can speak of person-job fit
[29]. From the individual’s point of view, lack of job fit is associated with stress
and tension and is relevant to performance at work [30]. Higher levels of stress
result in worse performance [31]. Similarly, not being able to pursue a profession
one feels a vocation for determines the experience of distress and a reduction in
satisfaction and work engagement [32]. At the same time, research has confirmed
that job fit is linked to positive personal and organisational outcomes such as
satisfaction and productivity [33]. On the basis of the theory by McCarthy and
Goffin, it could be assumed that the subject would experience distress during
the voice recording and audiometry, especially during the former [3]. The pre-
dominance of negative emotions was related to the fact that performance of the
task induced fear of failure, uncertainty about the outcome, and fear of critical
judgment by others. In the group of professional musicians, fear of imperfect
performance is constantly present, which was reflected by commitment to work
manifesting itself, for example, in spending many hours practicing [34,35]. In the
case of the EDA signal, the results confirm this. On the other hand, in the case
of the variables determining HRV, the HR value was lower in the voice recording
and increased at the further stages of the study. This may be due to the fact
that the HRV signal is characterised by a slow varying trend: the time segments
of the individual stages of the measurement protocol are too short to unambigu-
ously determine the trend characteristics, and the body’s response to the stimuli
(the situation) is somewhat delayed [26]. In the study, a difference is seen in the
proportion of positive and negative emotions in both the voice recording and
audiometry. The negative emotions are equally intense, but during audiometry,
not specific to the subject’s competences, more high arousal positive emotions
appear. In other words, a positive cognitive response to a stressor appears, which
is mobilising rather than harmful [36].
A separate aspect is the increased intensity of EDA and HRV signals when
completing the questionnaire concerning the emotions experienced in voice
recording and audiometry. This can be explained by the fact that questionnaire
measurements may distort the original emotional responses, as respondents are
asked to recall their experiences, which activates the cognitive appraisal system,
i.e., involves cognitive activation. In fact, it was believed that data collected
using declarative methods often made it possible to capture the individual’s
interpretation of their emotional reaction rather than the emotional reaction


itself [21]. Interpretation involves, in addition to the appraisal of the stress stim-
ulus, determining one’s ability to cope with the situation and whether one was
in control and influenced the course of events. Explaining the changes in the
subject’s results in the consecutive tests in the individual emotion categories
represents a certain challenge due to the absence of unambiguous findings on the
possibility of simultaneous occurrence of emotions of opposite valence. Nelson
and Simmons believed that eustress and distress were separate and independent
aspects of the overall stress response, but they can occur simultaneously [37].
Lundberg and Frankenhaeuser were of a similar opinion, arguing that eustress
(positive emotions) and distress (negative emotions) could occur in response to
environmental demands separately or in combination [38]. However, along with
the division of emotions in terms of valence and arousal, the view was expressed
that the differences between emotions similar in terms of these two criteria, such
as anger and fear, depend on the subjective interpretation of actual events [39].
A study of over 300,000 individuals demonstrated that people were capable of
simultaneously feeling emotions which were different, but similar in terms of
valence [40]. Plutchik expressed a similar view when devel-
oping the three-dimensional model of emotions [41]. There was also extensive
discussion in the literature on what was referred to as the positivity ratio, i.e.,
the ratio of positive to negative affect. It was conceptualised as a key predictor
of well-being and psychological flourishing [42]. Research suggests that the critical
value of the positivity ratio that allows flourishing individuals to be distinguished
from non-flourishing ones is 2.9:1. Studies have demonstrated positive associations
of the ratio with emotional intelligence, life satisfaction, optimism, self-esteem,
and self-control [43–45]. The positivity ratio and the formula for its calculation
have been criticised [46]. A curvilinear relationship has been indicated between
the positivity ratio and exhaustion, with the suggestion that past a certain value
(i.e., 2.0) the positivity ratio can lead to negative outcomes such as job exhaus-
tion; nevertheless, it does indicate a certain trend [47]. In the subject's case,
the changes in the individual categories suggest that the stress response was
transformed into a more positive stress form (eustress). An analysis of these
changes seems very promising, as it would make it possible to better determine
the impact of work on work exhaustion. One of the limitations of the procedure
applied was that it did not include objective measures of individual performance
or questionnaires to measure commitment to work and occupational burnout.
Using them and conducting longitudinal studies would make it possible to assess
the impact of the emotions experienced on overall individual well-being and occu-
pational burnout. Such measurements would enable the development of a system
of care and support for workers based on data concerning their actual potential,
protect them in certain periods against excessive strain and exhaustion, and pre-
vent occupational diseases in the long-term perspective.
5 Conclusion
According to the findings, the research protocol provides a reliable assessment of
the professional potential of vocalists tested during preventive work examinations.
Mixed measurements were used to collect the results: the subject's questionnaire
replies (VHI and JAWS) as well as psychophysiological measures (EDA and
HRV). They were gathered at various points
throughout the research. As a result, it was possible to acquire more informa-
tion about the respondent’s stress and emotions, which could aid in a more
comprehensive understanding of behavior during preventive work examinations.
In the future, research should be carried out on a larger group of people, with
indicators of work exhaustion included and a much broader range of physiologi-
cal characteristics and behavioral-physiological correlations identified among the
probands.

References
1. Lazarus, R.S., Folkman, S.: Stress, Appraisal, and Coping. Springer, Cham (1984)
2. Finnerty, A.N., Muralidhar, S., Nguyen, L.S., Pianesi, F., Gatica-Perez, D.: Stress-
ful first impressions in job interviews. In: Proceedings of the 18th ACM Interna-
tional Conference on Multimodal Interaction, pp. 325–332, October 2016
3. McCarthy, J., Goffin, R.: Measuring job interview anxiety: beyond weak knees and
sweaty palms. Pers. Psychol. 57(3), 607–637 (2004)
4. Constantin, K.L., Powell, D.M., McCarthy, J.M.: Expanding conceptual under-
standing of interview anxiety and performance: Integrating cognitive, behavioral,
and physiological features. Int. J. Sel. Assessment (2021)
5. Feiler, A.R., Powell, D.M.: Behavioral expression of job interview anxiety. J. Bus.
Psychol. 31(1), 155–171 (2016)
6. Katwyk, P., Fox, S., Spector, P., Kelloway, K.: Using the Job-Related Affective
Well-Being Scale (JAWS) to investigate affective responses to work stressors. J.
Occup. Health Psychol. 5, 219–30 (2000)
7. Selye, H.: Implications of stress concept. N. Y. State J. Med. 75(12), 2139–2145
(1975)
8. Basińska, B.: Emocje w miejscu pracy w zawodach podwyższonego ryzyka psy-
chospołecznego. Polskie Forum Psychologiczne XVIII(1), 81–92 (2013)
9. Łosiak, W.: Psychologia stresu. Wydawnictwa Akademickie i Profesjonalne (2008)
10. Lazarus, R.S., Folkman, S.: Transactional theory and research on emotions and
coping. Eur. J. Pers. 1(3), 141–169 (1987)
11. Mauss, I.B., Robinson, M.D.: Measures of emotion: a review. Cognition Emotion
23(2), 209–237 (2009)
12. Bachorowski, J.-A.: Vocal expression and perception of emotion. Curr. Dir. Psy-
chol. Sci. 8(2), 53–57 (1999)
13. Kappas, A., Hess, U., Scherer, K.R.: Voice and emotion. In: Feldman, R.S., Rime,
B. (eds.) Fundamentals of Nonverbal Behavior, pp. 200–238. Cambridge University
Press, Cambridge (1999)
14. Pittam, J., Gallois, C., Callan, V.: The long-term spectrum and perceived emotion.
Speech Commun. 9(3), 177–187 (1990)
15. Scherer, K.R., Banse, R., Wallbott, H.G., Goldbeck, T.: Vocal cues in emotion
encoding and decoding. Motiv. Emot. 15(2), 123–148 (1991)
16. Bachorowski, J.A., Owren, M.J.: Vocal expression of emotion: acoustic properties
of speech are associated with emotional intensity and context. Psychol. Sci. 6(4),
219–224 (1995)
17. Karthikeyan, P., Murugappan, M., Yaacob, S.: Detection of human stress using
short-term ECG and HRV signals. J. Mech. Med. Biol. 13(02), 1350038 (2013)
18. Allen, A.P., Kennedy, P.J., Cryan, J.F., Dinan, T.G., Clarke, G.: Biological and
psychological markers of stress in humans: focus on the Trier Social Stress Test.
Neurosci. Biobehavioral Rev. 38, 94–124 (2014)
19. Seo, J., Laine, T.H., Sohn, K.A.: An exploration of machine learning methods
for robust boredom classification using EEG and GSR data. Sensors 19(20), 4561
(2019)
20. Micu, A.C., Plummer, J.T.: Measurable emotions: how television ads really work:
patterns of reactions to commercials can demonstrate advertising effectiveness. J.
Advert. Res. 50(2), 137–153 (2010)
21. Poels, K., Dewitte, S.: How to capture the heart? reviewing 20 years of emotion
measurement in advertising. J. Advert. Res. 46(1), 18–37 (2006)
22. Krasnodębska, P., Szkiełkowska, A., Rosińska, A., Domeracka-Kołodziej, A., Wło-
darczyk, E., Miaśkiewicz, B., Skarżyński, H.: Polska adaptacja kwestionariusza
oceny niepełnosprawności głosowej Pediatric Voice Handicap Index (pVHI). Nowa
Audiofonologia 8(1), 55–59 (2019)
23. Krasnodębska, P., Raj-Koziak, D., Szkiełkowska, A., Skarżyński, H.: Zastosowanie
audiometrii wysokich częstotliwości w diagnostyce nagłego niedosłuchu u muzyka.
Am. J. Case Rep. 5(4), 77–81 (2016)
24. Greco, A., Valenza, G., Lanata, A., Scilingo, E.P., Citi, L.: cvxEDA: a convex
optimization approach to electrodermal activity processing. IEEE Trans. Biomed.
Eng. 63(4), 797–804 (2015)
25. Romaniszyn-Kania, P., et al.: Affective state during physiotherapy and its analysis
using machine learning methods. Sensors 21(14), 4853 (2021)
26. Shaffer, F., Ginsberg, J.P.: An overview of heart rate variability metrics and norms.
Front. Public Health 5, 258 (2017)
27. Zyśk, A., Bugdol, M., Badura, P.: Voice fatigue evaluation: a comparison of singing
and speech. In: Tkacz, E., Gzik, M., Paszenda, Z., Piętka, E. (eds.) IBE 2018. AISC,
vol. 925, pp. 107–114. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-
15472-1_12
28. Załącznik nr 1 do rozporządzenia Ministra Zdrowia i Opieki Społecznej z dnia
30 maja 1996 r. https://ibhp.uj.edu.pl/documents/1028990/c7941269-7b5d-4249-
8fa3-03e20d7018e4
29. Lawrence, A., Doverspike, D., O’Connell, M.: An Examination of the Role Job Fit
Plays in Selection (2004)
30. Rousseau, D.M., Parks, I.M.: The contracts of individuals and organizations. Res.
Organizational Behav. 15, 41–43 (1993)
31. Tillmann, J., Hamill, L., Dungan, B., Lopez, S., Lu, S.: Employee stress, engage-
ment, and work outcomes. Paper presented at the meeting of the Society for
Industrial-Organizational Psychology, Chicago, IL, April 2018
32. Berg, J.M., Grant, A.M., Johnson, V.: When callings are calling: crafting work
and leisure in pursuit of unanswered occupational callings. Organ. Sci. 21(5), 973–
994 (2010)
33. Kristof-Brown, A.L., Zimmerman, R.D., Johnson, E.C.: Consequences of individ-


ual’s fit at work: a meta-analysis of person-job, person-organization, person-group,
and person-supervisor fit. Pers. Psychol. 58(2), 281–342 (2005)
34. McGinnis, A.M., Milling, L.S.: Psychological treatment of musical performance
anxiety: current status and future directions. Psychotherapy: Theory Res. Practice
Training 42(3), 357 (2005)
35. Ericsson, K.A., Krampe, R.T., Tesch-Römer, C.: The role of deliberate practice in
the acquisition of expert performance. Psychol. Rev. 100(3), 363–406 (1993)
36. Lazarus, R.S.: From psychological stress to the emotions: a history of changing
outlooks. Annu. Rev. Psychol. 44, 1–21 (1993)
37. Nelson, D.L., Simmons, B.L.: Savoring eustress while coping with distress: The
holistic model of stress. In: Quick, J.C., Tetrick, L.E. (eds.) Handbook of Occupa-
tional Health Psychology, pp. 55–74. American Psychological Association, Wash-
ington (2011)
38. Lundberg, U., Frankenhaeuser, M.: Pituitary-adrenal and sympathetic-adrenal cor-
relates of distress and effort. J. Psychosom. Res. 24, 125–130 (1980)
39. Russell, J.A.: Core affect and the psychological construction of emotion. Psychol.
Rev. 110, 145–172 (2003)
40. Cowen, A.S., Keltner, D.: Self-report captures 27 distinct categories of emotion
bridged by continuous gradients. Proc. Natl. Acad. Sci. U.S.A. 114(38), E7900–
E7909 (2017)
41. Plutchik, R.: A psychoevolutionary theory of emotions. Soc. Sci. Inf. 21(4–5), 529–
553 (1982)
42. Fredrickson, B.L., Losada, M.F.: Positive affect and the complex dynamics of
human flourishing. Am. Psychol. 60(7), 678–686 (2005)
43. Shrira, A., Palgi, Y., Wolf, J.J., Haber, Y., Goldray, O., Shacham-Shmueli, E.,
Ben-Ezra, M.: The positivity ratio and functioning under stress. Stress. Health 27,
265–271 (2011)
44. Orkibi, H., Ronen, T.: Basic psychological needs satisfaction mediates the associ-
ation between self-control skills and subjective well-being. Front. Psychol. 8, 936
(2017)
45. Moroń, M.: Perceived emotional intelligence and life satisfaction: the mediating
role of the positivity ratio. Curr. Issues Pers. Psychol. 6, 212–223 (2018)
46. Brown, N.J.L., Sokal, A.D., Friedman, H.L.: The complex dynamics of wishful
thinking: the critical positivity ratio. Am. Psychol. 68(9), 801–813 (2013)
47. Basińska, B.A., Gruszczyńska, E.: Positivity and job burnout in emergency per-
sonnel: examining linear and curvilinear relationship. Pol. Psychol. Bull. 48(2),
212–219 (2017)
The Microphone Type and Voice Acoustic
Parameters Values – A Comparative
Study

Łukasz Pawelec1(B), Anna Lipowicz1, Miroslaw Czak2, and Andrzej W. Mitas2

1 Institute of Environmental Biology, Division of Anthropology, Wroclaw University
of Environmental and Life Sciences, Kożuchowska St. 5, 51-631 Wroclaw, Poland
{lukasz.pawelec,anna.lipowicz}@upwr.edu.pl
2 Faculty of Biomedical Engineering, Silesian University of Technology,
ul. Roosevelta 40, 41-800 Zabrze, Poland
{miroslaw.czak,andrzej.mitas}@polsl.pl

Abstract. The selection of the appropriate voice signal recording equip-
ment, including a microphone, and the selection of appropriate conditions
for the recording site is crucial for obtaining reliable and authoritative
values of acoustic parameters in both medical and biological research.
The aim of this study was to compare selected acoustic measures of two
microphone types – dynamic and condenser. The study involved 80 adults
(including 48 women) for whom the values of voice parameters, i.e. inten-
sity, fundamental and formant frequencies, jitter, shimmer and noise-to-
harmonic ratio (NHR) were determined on the basis of 5 vowels recorded
simultaneously with both microphones. The existence of significant differ-
ences in the values of selected acoustic parameters between the two types
of microphones, despite the high correlation coefficient of these measures,
was demonstrated. The results of this study may prove important for voice
researchers when selecting the appropriate recording equipment.

Keywords: Dynamic microphone · Condenser microphone ·


Fundamental frequency · Formant frequencies · Vocal perturbation

1 Introduction
Specific values of acoustic parameters of the human voice are crucial from the
medical point of view, especially in the diagnosis and treatment of diseases of
the speech apparatus [1]. The cut-off values of voice parameters are particu-
larly important to discriminate between pathological and normal voices [2] –
the differences in acoustic parameters recorded by different types of microphone
can lead to wrong assessment of voice quality (i.e. labeled patient as having a
dysphonic voice, when some parameter’s values are close to cut-off values [3]).
Apart from medical diagnostics of the speech apparatus, reliably determined
values of acoustic parameters are important in the life sciences. Voice biology
researchers use these measures to assess an individual’s sex [4], age [5], emotional

state [6], and even body size [7]. For example, some voice characteristics, such
as fundamental and formant frequencies, are significantly correlated with mea-
surements describing body size and shape [8–15]. Also, some formant derivatives
(especially formant dispersion [Df] or formant spacing [ΔF]) are strongly related
to body size [16] or to body weight and age [17] among some mammals. As these
parameters are crucial for the correct description of biological phenomena and
medical diagnostics, precise determination of their value should be a particularly
important element in voice research.
The aim of the study was to compare quantitatively the acoustic parameters
of the voice of 80 adults, obtained with the use of two microphones of different
specifications.

2 Material and Methods

2.1 Subjects

The material was collected at two institutes: The Jerzy Kukuczka Academy of
Physical Education (Katowice, Poland) and Wroclaw University of Environmen-
tal and Life Sciences (Wroclaw, Poland). A total of 80 participants (including 48
women) participated in the study; 28 from Katowice (16 women) and 52 from
Wroclaw (32 women). Each subject was asked to complete a short questionnaire
and record a voice sample. All participants of the study were tested at the same
time of the day – between 9 a.m. and 12 noon. All subjects agreed to participate in the
study free of charge.

2.2 Preliminary Questionnaire

Each of the respondents completed a short questionnaire containing inclusion
criteria. The questionnaire consisted of birth record questions and questions regarding
all factors that may affect the acoustic parameters of the voice, i.e. speech and/or
hearing defects, occlusion defects, history of trauma and surgery of the head
and neck, being ill during the examination, the use of stimulants such as, for
example, cigarettes, alcohol consumption in significant amounts on the day pre-
ceding the day of examination or taking hormonal drugs (e.g. anabolic steroids,
growth hormone, steroid drugs, etc.). Additionally, the women were asked to fill
in a questionnaire on the current phase of the menstrual cycle, taking hormonal
agents (e.g. contraceptives) or being pregnant or after/during menopause.

2.3 Voice Recording Procedure

All voice recordings were made in the same acoustic conditions (room muted from
external sounds), using a professional acoustic cabin Mozos Mshield, placed on a
special tripod with a height adjusted to the height of the examined person. The
recordings were made in a standing position, using two microphones at the same
time. Both microphones were placed on special grips 15 cm from the mouth of the
examined person, directly at the height of his head. Special pop-filters were used
for both microphones. The background noise measurement was approximately
38 dB in Wroclaw and approximately 39 dB in Katowice. The volume (intensity)
of the recordings was controlled with a special digital sound level meter Benetech
GM1351 and was in the range of 65–70 dB.
The first type of microphone used in the study was a dynamic cardioid micro-
phone Shure SM 58 SE with a 50 Hz–15 kHz frequency range, connected to an
IMG Stageline MPA-202 amplifier with 45 dB sound amplification and a 60 Hz
low-cut filter. All voice samples were recorded as mono files.
The second type of microphone used in the study was a condenser cardioid
microphone Rode NT1-A Kit with a 20 Hz–20 kHz frequency range, connected to
a Zoom H4n PRO sound recorder with an 80 Hz low-cut filter. These recordings
were made as stereo files.
All samples were recorded as uncompressed files (.wav) with a sampling
frequency of 44.1 kHz and 16-bit resolution.
Each participant was asked to speak aloud 5 Polish vowels: a, e, i, o, u
which can be phonetically recorded as /a:/, /ɛ:/, /i:/, /ɔ:/, /u:/ with sustained
phonation for 3 s followed by 1 s break. Each sample was recorded simultaneously
via 2 microphone types: dynamic and condenser.

2.4 Acoustic Parameters Analyses


All voice recording analyses were performed in Praat software v. 6.0.56 [18].
For each vowel (a, e, i, o, u), a middle fragment of equal length (0.2 s) was used to
determine voice parameters. Then the values from the five vowels were averaged to
calculate a final acoustic parameter value. For each vowel the following param-
eters were determined (as described in [19]):

– sound intensity (loudness);


– fundamental frequency (F0) parameters: mean pitch (MF0), median pitch
(MeF0), standard deviation (SDF0), minimum (MinF0), maximum (MaxF0)
pitch;
– frequency perturbation indices: jitter (local ), jitter rap, jitter ppq5, jitter ddp;
– amplitude perturbation indices: shimmer (local ), shimmer apq3, shimmer
apq5, shimmer apq11, shimmer dda;
– harmonics/noises parameters: harmonic-to-noise ratio (HNR) and noise-to-
harmonic ratio (NHR);
– formant frequencies parameters (formants): F1-F4.

F0 parameters were measured using a pitch floor of 75 Hz and a pitch ceiling
of 500 Hz. For formants, the ceiling value was 5000 Hz.
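The authors worked in Praat directly; for illustration, an equivalent per-vowel extraction could be scripted through the praat-parselmouth Python bindings. The fragment timing and pitch limits below follow the text, everything else is an assumption:

import parselmouth
from parselmouth.praat import call

def vowel_parameters(wav_path, mid_time):
    snd = parselmouth.Sound(wav_path)
    # middle fragment of equal length (0.2 s), as in the protocol
    part = snd.extract_part(from_time=mid_time - 0.1, to_time=mid_time + 0.1)
    pitch = part.to_pitch(pitch_floor=75, pitch_ceiling=500)  # 75-500 Hz
    pp = call(part, "To PointProcess (periodic, cc)", 75, 500)
    return {
        "MF0": call(pitch, "Get mean", 0, 0, "Hertz"),
        "jitter_local": call(pp, "Get jitter (local)",
                             0, 0, 0.0001, 0.02, 1.3),
        "shimmer_local": call([part, pp], "Get shimmer (local)",
                              0, 0, 0.0001, 0.02, 1.3, 1.6),
    }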

2.5 Statistical Analysis


First, descriptive statistics of all mentioned acoustic parameters for dynamic and
condenser microphone were presented. Moreover, the normality of the distribu-
tions for all parameters was examined using Shapiro-Wilk W test.
424 L
 . Pawelec et al.

Secondly, the voice acoustic parameters were compared between two types
of microphones (dynamic and condenser) using Pearson’s correlation coefficient
(r). Additionally, for better illustration of this relationship, scatterplots for each
parameter were created.
Furthermore, the medians and quartile deviations (Q) of acoustic parameters
from these two microphones were compared and for each parameter a difference
was determined. Due to the lack of normal distribution of those parameters,
the Wilcoxon signed-rank test was applied to check the significance of those
differences.
For all statistical analyses, Statistica v. 13.3 software (1984–2017 TIBCO
Software Inc, Palo Alto, California, USA) was used. A significance level of 5%
(p ≤ 0.05) was adopted for all tests.
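The same pipeline can be expressed compactly; a sketch in Python/SciPy (the authors used Statistica), assuming paired arrays of one parameter measured with the dynamic (d) and condenser (c) microphones:

import numpy as np
from scipy.stats import pearsonr, shapiro, wilcoxon

def compare_parameter(d, c, alpha=0.05):
    d, c = np.asarray(d, float), np.asarray(c, float)
    normal = shapiro(d).pvalue > alpha and shapiro(c).pvalue > alpha
    r, r_p = pearsonr(d, c)             # between-microphone agreement
    _, w_p = wilcoxon(d, c)             # paired test of the differences
    return {
        "normal": normal,
        "r": r, "r_p": r_p,
        "median_diff": np.median(d) - np.median(c),
        "wilcoxon_p": w_p,
    }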

3 Results
3.1 Descriptive Data

Descriptive statistics of acoustic parameters recorded by the dynamic and condenser
microphone types for the 80 adult participants of the study (mean age: 39 y., SD =
14.2 y.; range: 18–72 y., Fig. 1) are presented in Table 1.

Fig. 1. Age structure of participants (N = 80)


Table 1. Descriptive statistics of acoustic parameters from two microphone types


(N=80)

Dynamic type microphone Condenser type microphone


Mean SD Median Min Max Mean SD Median Min Max
MeF0 [Hz] 161.00 44.86 170.71 87.22 237.88 160.91 44.67 171.22 87.40 237.45
MF0 [Hz] 161.21 44.88 170.82 87.33 237.60 160.94 44.62 171.20 87.57 237.37
SDF0 [Hz] 1.99 2.15 1.35 0.37 12.51 1.85 1.96 1.30 0.33 13.77
MinF0 [Hz] 158.28 44.47 169.52 84.50 236.08 158.17 44.27 168.89 85.82 236.05
MaxF0 [Hz] 164.78 45.58 173.99 90.09 239.70 164.30 45.23 174.39 89.67 239.34
Jitter (local) [%] 0.44 0.19 0.41 0.16 1.22 0.49 0.41 0.42 0.16 3.64
Jitter (rap) [%] 0.22 0.10 0.20 0.08 0.64 0.25 0.22 0.21 0.08 1.94
Jitter (ppq5) [%] 0.25 0.11 0.21 0.09 0.66 0.29 0.27 0.22 0.10 2.42
Jitter (ddp) [%] 0.66 0.29 0.59 0.25 1.91 0.76 0.65 0.64 0.25 5.82
Shimmer (local) [%] 4.51 2.73 3.98 0.77 13.16 5.21 3.76 4.24 0.88 19.06
Shimmer (apq3) [%] 2.14 1.29 1.84 0.40 6.57 2.48 1.82 1.98 0.46 9.86
Shimmer (apq5) [%] 2.79 1.75 2.46 0.45 8.22 3.26 2.59 2.62 0.48 14.13
Shimmer (apq11) [%] 4.17 2.89 3.49 0.57 11.99 4.76 3.59 3.63 0.71 16.83
Shimmer (dda) [%] 6.41 3.87 5.51 1.21 19.72 7.44 5.45 5.93 1.38 29.59
NHR 0.03 0.03 0.02 0.002 0.15 0.04 0.06 0.02 0.00 0.41
HNR [dB] 19.42 4.07 19.59 10.12 28.78 19.18 4.20 19.53 6.91 26.81
Intensity [dB] 74.99 12.01 72.52 56.23 149.05 71.93 11.81 77.33 47.62 86.02
F1 [Hz] 669.70 168.09 615.41 404.21 1113.27 630.10 141.85 584.07 409.79 1011.52
F2 [Hz] 1947.72 505.48 1681.66 1387.31 3090.85 1861.75 495.20 1616.84 1341.62 3135.48
F3 [Hz] 3263.82 601.03 2987.36 2530.88 4496.36 3217.29 598.08 2945.62 2507.47 4656.40
F4 [Hz] 4523.80 951.87 4032.63 3425.65 6441.54 4500.73 941.04 3991.01 3454.39 6474.54

None of the participants of this study had speech/hearing defects. None of them
declared any history of trauma and/or surgery of the head and neck, nor occlusion
defects. None of the subjects was ill on the day of examination. The question-
naire did not reveal the use of any hormonal agents among the respondents,
nor significant alcohol consumption in the evening before the study. All women
participants were in the infertile phase of the menstrual cycle or were after
menopause.

3.2 Linear Correlation of Acoustic Parameters from Different


Microphone Types

The values of all voice parameters between two microphone types were com-
pared using linear Pearson’s correlation (Table 2). The most similar values in
dynamic and condenser microphones were observed for parameters describing

Table 2. Pearson’s correlation coefficients of acoustic parameters between dynamic


and condenser microphone types (N=80)

Acoustic parameter    Pearson correlation    Acoustic parameter    Pearson correlation
                      coefficient (r)                              coefficient (r)
MeF0 0.993* Shimmer (apq5) [%] 0.726*
MF0 [Hz] 0.994* Shimmer (apq11) [%] 0.726*
SDF0 [Hz] 0.895* Shimmer (dda) [%] 0.750*
MinF0 [Hz] 0.995* NHR 0.558*
MaxF0 [Hz] 0.994* HNR [dB] 0.762*
Jitter (local) [%] 0.740* Intensity [dB] 0.751*
Jitter (rap) [%] 0.715* F1 [Hz] 0.830*
Jitter (ppq5) [%] 0.689* F2 [Hz] 0.947*
Jitter (ddp) [%] 0.714* F3 [Hz] 0.953*
Shimmer (local) [%] 0.725* F4 [Hz] 0.981*
Shimmer (apq3) [%] 0.750*
* p < 0.001

fundamental frequency (F0) of the sound wave, especially minimum pitch, max-
imum pitch, median pitch and mean pitch (Fig. 2). The weakest correlations
were noticed for parameters of amplitude/frequency perturbations (especially
jitter ppq5 ), mean noise-to-harmonic ratio (NHR) and intensity.

3.3 Differences Between Voice Parameters of Dynamic


and Condenser Microphone
To assess the amount and significance of differences in acoustic parameters
recorded by two types of microphones, Wilcoxon signed-rank test was applied
(Table 3).
The greatest significant differences in acoustic parameter values between the
two microphones were noticed for intensity (loudness) and the first two formants
(F1-F2). The analysis of the F1-F4 formants calculated from the measurement
results of both microphones shows that the maximum energy of the dynamic
microphone appears at higher frequencies than for the condenser microphone. This
phenomenon can be explained by comparing the frequency characteristics of
both microphones (Fig. 3) [20,21].
Table 3. Wilcoxon signed-rank test values of acoustic parameters of voice for the two
microphone types – dynamic and condenser. Median ± quartile deviation (Q)

Dynamic (D) Condenser (C) Difference (D-C) Z p


Median pitch [Hz] 170.71±41.41 171.22±41.59 −0.517 0.1919 0.8479
Mean pitch [Hz] 170.82±41.20 171.20±41.77 −0.381 0.0911 0.9274
Standard deviation [Hz] 1.35±0.74 1.30±0.63 0.049 0.5995 0.5488
Minimum pitch [Hz] 169.52±40.84 168.89±41.39 0.632 0.0096 0.9924
Maximum pitch [Hz] 173.99±43.71 174.39±42.27 −0.406 0.0863 0.9312
Jitter (local) [%] 0.41±0.14 0.42±0.15 −0.010 0.9737 0.3302
Jitter (rap) [%] 0.20±0.06 0.21±0.07 −0.015 1.3861 0.1657
Jitter (ppq5) [%] 0.21±0.07 0.22±0.07 −0.007 0.8178 0.4135
Jitter (ddp) [%] 0.59±0.18 0.64±0.20 −0.045 1.3813 0.1672
Shimmer (local) [%] 3.98±1.65 4.24±1.75 −0.261 1.3334 0.1824
Shimmer (apq3) [%] 1.84±0.71 1.98±0.76 −0.141 1.6499 0.0990
Shimmer (apq5) [%] 2.46±1.06 2.62±1.13 −0.158 1.3574 0.1747
Shimmer (apq11) [%] 3.49±1.82 3.63±2.14 −0.132 1.4917 0.1358
Shimmer (dda) [%] 5.51±2.14 5.93±2.27 −0.423 1.6499 0.0990
Mean noise-to-harmonics ratio 0.02±0.01 0.02±0.01 0.002 0.3837 0.7012
Mean harmonics-to-noise ratio [dB] 19.59±2.87 19.53±2.52 0.061 0.1967 0.8441
Intensity [dB] 72.51±7.33 77.33±11.49 −4.814 2.5372 0.0112*
F1 [Hz] 615.41±85.58 584.07±65.19 31.337 4.3694 <0.0001∗
F2 [Hz] 1681.66±450.46 1616.84±409.33 64.815 4.5373 <0.0001∗
F3 [Hz] 2987.36±530.31 2945.62±498.95 41.745 1.8706 0.0614
F4 [Hz] 4032.63±920.72 3991.01±884.78 41.614 1.0984 0.2721
* p<0.05

Fig. 2. Scatterplots of Pearson correlation of four fundamental frequency parameters:


A. Median pitch, B. Mean pitch, C. Minimum pitch and D. Maximum pitch recorded
by dynamic (D) and condenser (C) microphone types

Fig. 3. Frequency characteristics comparison of the tested microphones (data from


[22, 23])

The results indicate a different amplitude response of the tested microphones
to the voice stimulus. Shimmer is smaller in recordings made with a dynamic micro-
phone than with a condenser microphone, but the values of shimmer calculated
in relation to the averaged amplitudes give lower percentages as the number of
amplitudes taken into the average increases.
A possible explanation for this phenomenon is the different dynamic response
of microphones to impulse stimulation. Unfortunately, this parameter is usually
not included in technical specifications of microphones and can only be estimated
from the sensitivity parameter of the microphone’s specification.

4 Discussion

The comparative analysis showed the existence of significant differences in some


acoustic parameters registered by two types of microphones – dynamic and con-
denser. In the parameters describing the fundamental frequency of the voice
(i.e., mean pitch, maximum pitch etc.), formant frequencies (F1-F4) and inten-
sity (voice loudness), higher values were recorded in the case of a dynamic
microphone. The opposite situation was observed in the case of voice instability
parameters (perturbations of amplitude [shimmer local, shimmer apq3 etc.]) and
frequency perturbations [jitter local, jitter rap etc.] of the sound wave) where
higher values of these parameters were recorded with a condenser microphone.
Due to the relatively large mass of the diaphragm-coil assembly, dynamic
microphones filter changes in the amplitude of the waveform, showing less shim-
mer. This effect can be increased by signal averaging. Condenser microphones,
due to the lower inertia of the condenser capsule diaphragm, have a weaker
effect of masking amplitude changes. It is visible that the SM58 microphone
has a distinct increase in the frequency response above 1 kHz, while the NT1
microphone has a much flatter and more even frequency response. The use of a
switchable frequency response equalizer would make it possible to flatten the
measuring path characteristics. These findings contradict the results of
Titze & Winholtz [24]. These authors observed that perturbation parameters
recorded by dynamic microphones had higher values than by condenser types.
However, not all of these differences were significant – statistical significance was
obtained for the intensity and the first two formant frequencies (F1-F2). Parsa
et al. [2] found significant differences between the original sound's parameters of F0
perturbations (except absolute jitter), amplitude perturbations and harmonic-
to-noise ratio and those recorded by four microphones (2 dynamic, 1 electret
and 1 condenser). The condenser omnidirectional microphone parameters were
closest to the original values [2]. Also Bottalico et al. [3] concluded that acoustic
parameter values closest to the reference values were obtained using a condenser
microphone. On the other hand Keisenwether & Sataloff [1] found no statistically
significant effect of microphone type on the validity of acoustic measures.
Limitation of the Study. Unfortunately, the repeatability of the diaphragm-
coil assembly measurements was not verified during this test. The reproducibility
of the measurements of a condenser microphone is probably greater than that of
a dynamic microphone, which can be concluded from Fig. 3. A condenser micro-
phone has a flatter response than a dynamic microphone, which has a large
variation in the frequency characteristic. Therefore, it is necessary to use an
equalizer when measuring voice parameters with a dynamic microphone.

5 Conclusion
The differences obtained in this study are an indication for voice researchers
(e.g. in bioacoustics) that the selection of sound recording equipment may have a
significant impact on the results obtained by them. The above-mentioned use of
a switchable equalizer would make it possible to equalize the characteristics of
the signal path (Fig. 4). It is important that the equalizer introduces a
minimal and constant phase shift in the range of measured frequencies.

Fig. 4. Microphone frequency response equalizer

It is also possible to equalize the characteristics of the recorded waveform in
the digital domain using post-processing.
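One possible realisation of such digital post-processing, given a microphone response sampled from the data sheet, is an inverse linear-phase FIR filter. A sketch (the response points and tap count are placeholders, not the actual SM58/NT1-A curves):

import numpy as np
from scipy.signal import firwin2, lfilter

def equalize(x, fs, freqs_hz, gains_db, numtaps=1025):
    # freqs_hz/gains_db: microphone response sampled strictly inside (0, fs/2)
    f = np.asarray(freqs_hz, float) / (fs / 2.0)       # normalized frequencies
    g = 10.0 ** (-np.asarray(gains_db, float) / 20.0)  # inverse of the response
    f = np.concatenate(([0.0], f, [1.0]))              # firwin2 grid: 0..Nyquist
    g = np.concatenate(([g[0]], g, [g[-1]]))
    fir = firwin2(numtaps, f, g)   # linear phase -> constant group delay
    return lfilter(fir, [1.0], x)

A linear-phase FIR is chosen deliberately here: its group delay is constant, in line with the constant-phase-shift requirement mentioned above.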
A separate analysis must be carried out for significant differences of wave-
form amplitude fluctuations [25,26]. Further research, including other fac-
tors influencing voice measures' values (acoustic background, type of sound
card/audio interface, distance from the sound recorder and recording angle) and
comparing the obtained values with the reference values of artificially generated
sounds, is necessary for a better understanding of this process.
In our opinion, the better option for bioacoustic testing is a condenser micro-
phone because of its flatter response compared with a dynamic one. As condenser micro-
phones are usually more expensive, it is worth considering the use of dynamic

microphones with an appropriately selected equalizer, which would eliminate
the lower repeatability of measurements of acoustic parameters of this type of
microphone. Therefore, when comparing recorded voices from different papers,
attention should be paid to the type of microphone used, because e.g. formant
values (F1-F4) may differ significantly between the dynamic and condenser micro-
phone types. As a consequence, this may influence, for example, the values of the
observed correlations of body build and vocal parameters in research in the field
of bioacoustics. The safest solution seems to be choosing microphones with a
flat frequency response. To sum up, publications should provide information
on the type of microphone used, which is essential when making comparisons
between the results of various authors.

Acknowledgement. We would like to thank The Jerzy Kukuczka Academy of Phys-


ical Education in Katowice for the opportunity to collect research material.

References
1. Keisenwether, J.S., Sataloff, R.T.: The effect of microphone type on acoustical
measures of synthesized vowels. J. Voice 29(5), 548–551 (2015)
2. Parsa, V., Jamieson, D.G., Pretty, B.R.: Effects of microphone type on acoustic
measures of voice. J. Voice 15(3), 331–343 (2001)
3. Bottalico, P., et al.: Reproducibility of voice parameters: the effect of room acous-
tics and microphones. J. Voice 34(3), 320–334 (2020)
4. Puts, D.A., et al.: Sexual selection on male vocal fundamental frequency in humans
and other anthropoids. Proc. Roy. Soc. B: Biol. Sci. 283(1829), 20152830 (2016)
5. Ramig, L.A., Ringel, R.L.: Effects of physiological aging on selected acoustic char-
acteristics of voice. J. Speech Lang. Hear. Res. 26(1), 22–30 (1983)
6. Pisanski, K., Raine, J., Reby, D.: Individual differences in human voice pitch are
preserved from speech to screams, roars and pain cries. Roy. Soc. Open Sci. 7(2),
191642 (2020)
7. Fitch, W.T., Giedd, J.: Morphology and development of the human vocal tract: a
study using magnetic resonance imaging. J. Acoust. Soc. Am. 106(3), 1511–1522
(1999)
8. Bruckert, L., Liénard, J.S., Lacroix, A., Kreutzer, M., Leboucher, G.: Women
use voice parameters to assess men’s characteristics. Proc. Roy. Soc. B: Biol. Sci.
273, 83–89 (2006)
9. Evans, S., Neave, N., Wakelin, D.: Relationships between vocal characteristics and
body size and shape in human males: an evolutionary explanation for a deep male
voice. Biol. Psychol. 72, 160–163 (2006)
10. Gonzalez, J.: Formant frequencies and body size of speaker: a weak relationship in
adult humans. J. Phon. 32, 277–287 (2004)
11. Gonzalez, J.: Correlations between speakers’ body size and acoustic parameters of
voice. Percept. Mot. Skills 105, 215–220 (2007)
12. Pawelec, Ł.P., Graja, K., Lipowicz, A.: Vocal indicators of size, shape and body
composition in Polish men. J. Voice, in press (2020)
13. Pisanski, K., et al.: Vocal indicators of body size in men and women, a meta-
analysis. Anim. Behav. 95, 89–99 (2014)
14. Pisanski, K., et al.: Voice parameters predict sex-specific body morphology in men
and women. Anim. Behav. 112, 13–22 (2016)
15. Rendall, D., Kollias, S., Ney, C., Lloyd, P.: Pitch (F0) and formant profiles of
human vowels and vowel-like baboon grunts: the role of vocalizer body size and
voice-acoustic allometry. J. Acoustical Soc. Am. 117(2), 944–955 (2005)
16. Fitch, W.T.: Vocal tract length and formant frequency dispersion correlate with
body size in rhesus macaques. J. Acoustical Soc. Am. 102(2), 1213–1222 (1997)
17. Reby, D., McComb, K.: Anatomical constraints generate honesty: acoustic cues to
age and weight in the roars of red deer stags. Anim. Behav. 65(3), 519–530 (2003)
18. Boersma, P., Weenink, D.: Praat: doing phonetics by computer [Computer pro-
gram]. Version 6.0.56 (2019). http://www.praat.org/. Accessed 20 June 2019
19. Teixeira, J.P., Oliveira, C., Lopes, C.: Vocal acoustic analysis-jitter, shimmer and
HNR parameters. Procedia Technol. 9, 1112–1122 (2013)
20. RODE company data sheets. https://cdn1.rode.com/nt1-a_datasheet.pdf
21. Shure company datasheets. https://pubs-api.shure.com/file/260007
22. Rode NT1-A Kit microphone data sheet. https://cdn1.rode.com/nt1-a_datasheet.pdf
23. Shure SM 58 SE microphone data sheet. https://pubs-api.shure.com/file/260007
24. Titze, I.R., Winholtz, W.S.: Effect of microphone type and placement on voice
perturbation measurements. J. Speech Lang. Hear. Res. 36(6), 1177–1190 (1993)
25. Teixeira, J.P., Fernandes, P.O.: Jitter, shimmer and HNR classification within gen-
der, tones and vowels in healthy voices. Procedia Technol. 16, 1228–1237 (2014)
26. Source code of Praat. https://github.com/praat/praat/blob/382c64e43c64bf73b93fcec32ebfd788b5970a8d/fon/VoiceAnalysis.cpp
Signal Processing
Activities Classification Based on IMU
Signals

Monika N. Bugdol(B) , Marta Danch-Wierzchowska, Marcin Bugdol,


and Dariusz Badura

Faculty of Biomedical Engineering, Silesian University of Technology,


ul. Roosevelta 40, 41-800 Zabrze, Poland
{monika.bugdol,marta.danch-wierzchowska,marcin.bugdol,
dariusz.badura}@polsl.pl

Abstract. This article presents a method for recognizing activity from
data acquired from accelerometer, magnetometer, gyroscope and motion
sensors. The experiments providing the data were conducted in July 2021
in Katowice, Poland, as a part of the System for Monitoring Activity
and Training Rationalization (SMART) project, financed by the Polish
National Centre for Research and Development. A variety of classifiers
were tested in two approaches – using all available variables and using
features selected with the Joint Mutual Information method. Separate
models were built for each activity as well as models for selecting one
activity out of 8 possible. The best obtained results exceeded 98% for
the multi-activity classifier and 99% for each individual activity classifier,
which legitimizes their application for commercialization.

Keywords: Signal classification · Activity recognition · Machine


learning · Signal processing

1 Introduction
In recent years, an increasing trend of using wearable sensors in physiotherapy
has been observed. Miniaturized sensors are being used more and more to
track patients to assess their progress or during exercise [8] and allow obtaining
continuous information about one’s behavior even after the session. Many solu-
tions have been proposed for human activity recognition (HAR) tasks based on
wearable sensor data [7]. These methods adopted different feature selection algo-
rithms and achieved satisfactory recognition accuracy using different machine
learning algorithms.
Recent research surveys [9,10] presented a detailed review of human activity
recognition methods based on wearable sensors from different points of view,
including the variety of sensors, recognition approaches, and application strate-
gies. It can be concluded that the latest research is focused on employing inertial
sensors, especially accelerometers, for activity recognition by features related to
human motion.
Human activity recognition can be described as a supervised classification


problem. Due to the specificity of the collected signals, i.e., abstract and unin-
terpretable, the data gathered from the wearable sensors are fitted to a prede-
fined set of activities, like walking, running, or swimming. There is a gap between
these raw data and the corresponding activity. Therefore, the whole body motion
could be intractable [5].
One of the earliest research on HAR was described in [6]. The main goal of
the work was to collect data from an accelerometer in a laboratory environment
to evaluate tremor activity and detect posture and motion. After that, inertial
sensor-based systems have often investigated human activities using accelerom-
eters, gyroscopes, and other sensors. Another early work [2] used five two-axis
accelerometers worn on the user’s right hip, dominant wrist, non-dominant upper
arm, dominant ankle, and non-dominant thigh. It was the first work that allowed
recognizing so many (20) different activities. Nevertheless, the decision system
was complex and used decision tables, instance-based learning, C4.5, and naive
Bayes classifiers. The obtained recognition accuracy exceeded 80% on a variety of
everyday activities. The above-mentioned studies were among the most fundamental
in HAR research based on wearable sensors. The proposed procedures became a
vital reference for follow-up research works.
In recent years, research interests have been more focused on deep learning
methods in HAR and almost every research area. Chen et al. [4] presented a
survey of the state-of-the-art deep learning methods for sensor-based HAR. An
interesting overview of the current research progress was also presented. The
conclusion one can draw from this study is that layer-by-layer structures of
deep neural networks still meet many challenges, such as the interpretability of
features, which makes them difficult to understand.
Our work presents a simple and easy to interpret solution adapted to the
SMART system. It employs an effective features extraction procedure and clas-
sical machine learning algorithms. This approach gives a very accurate prediction
of the performed activity.

2 Materials and Methods

For the purposes of the SMART project, a set of signals was
recorded for 140 volunteers during July and August 2021. For the recordings,
Xsens DOT sensors were used [11]. The Xsens DOT sensor contains:

– accelerometer (nonlinearity ±0.5 %FS),


– magnetometer (resolution 0.25 mGauss, nonlinearity 0.2%FS),
– gyroscope (g-sensitivity 0.1 °/s/g, nonlinearity 0.1%FS).

The sampling frequency of the measured signals was 60 Hz. The sensors were placed
as presented in Fig. 1.
The examined persons were performing the following activities:

– walk – free 6-min walk on the treadmill,


Fig. 1. Sensors locations

– BTS walk – walking in laboratory conditions along a measuring path with


2 dynamometric platforms [3] measuring the ground reactions; the walk is
recorded by 6 cameras operating in the infrared range, which emit and detect
infrared radiation reflected from passive markers placed on the body of the
tested person in accordance with the applied model (in this case: Vicon plug-
in gait),
– picking up an item – taking a few steps and bending down to get the item,
– head movement – several head bows,
– pelvis movements – several pelvis rotations along the horizontal axis,
– stabilography – examination on a stabilographic platform,
– Astrand test – respiratory performance test [1],
– getting up – getting up from a chair and walking a few steps.

In total, 1121 activities were recorded. After excluding those measurements
in which the amount of missing data hampered machine learning, 1101 recorded
activities were further analyzed. Due to the fact that the recordings varied in
terms of their length, for each measurement a 4 s long clip was chosen as its repre-
sentative. For each variable transmitted by the sensors, concerning the object's
movement, the mean, standard deviation and range were calculated. The
movement, the mean, standard deviation and range has been calculated. The
obtained coefficients served as predictors, whereas the activity label was the
dependent variable. All calculations were performed in R 4.1.2 using RStudio
framework, employing the following packages: gdata, caTools, class, rpart, ran-
domForest, e1071, adabag, praznik (Fig. 2).

Fig. 2. Flow diagram of data
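The authors' calculations were done in R; the per-clip feature extraction itself reduces to a few lines, sketched here in Python for illustration (each 4 s clip at 60 Hz gives 240 samples per channel):

import numpy as np

def clip_features(clip):
    # clip: array of shape (240, n_channels) - one 4 s window at 60 Hz
    return np.concatenate([
        clip.mean(axis=0),                    # mean per channel
        clip.std(axis=0),                     # standard deviation per channel
        clip.max(axis=0) - clip.min(axis=0),  # range per channel
    ])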

For each machine learning session, the training set consisted of 80% of the avail-
able data, maintaining the proportions between representatives of individual
classes. Then a model was built and tested. This procedure was repeated 1000
times for each classifier and the averaged accuracies are presented in Tables 1
and 2. The modelling was conducted in two ways:

– without feature selection (all variables included),


– with features selected using the Joint Mutual Information (JMI) method [12].

The details of the variable selection process are as follows. The JMI algorithm
was performed on a set which constituted a random 80% of the
training set, maintaining the proportions of the original set in terms of the
participation of representatives of individual classes. The 100 coefficients with
the highest scores were then selected. This procedure was repeated 100 times.
Features that were selected more than 50 times were further selected for model
building.
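A sketch of this stability-selection loop is given below. The authors used the JMI implementation from the R praznik package; here a univariate mutual-information ranking stands in for JMI, which is a stated simplification:

import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import StratifiedShuffleSplit

def stable_features(X, y, n_keep=100, n_runs=100, min_votes=50):
    votes = np.zeros(X.shape[1], dtype=int)
    sss = StratifiedShuffleSplit(n_splits=n_runs, train_size=0.8)
    for subsample_idx, _ in sss.split(X, y):   # stratified random 80%
        mi = mutual_info_classif(X[subsample_idx], y[subsample_idx])
        votes[np.argsort(mi)[-n_keep:]] += 1   # top-scoring coefficients
    return np.flatnonzero(votes > min_votes)   # kept in more than 50 runs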
The tested classifiers included:


– K-Nearest Neighbours (KNN),
– Simple Decision Tree (SDT),
– Random Forest (RF),
– Adaptive boosting (ADA),
– Bootstrap aggregating (Bagging),
– Support Vector Machine with linear kernel (SVM lin).
Due to the fact that the polynomial and radial kernels in SVM gave sig-
nificantly worse results, they were removed from further considerations at the
preliminary stage, and the results obtained with them are not presented in the
Results section.
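The evaluation protocol itself is straightforward to reproduce; an illustrative Python version with AdaBoost (the authors' models were built in R) is shown below:

import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

def mean_accuracy(X, y, n_repeats=1000):
    # repeated stratified 80/20 splits, averaged test accuracy
    accs = []
    for _ in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=0.8, stratify=y)
        clf = AdaBoostClassifier().fit(X_tr, y_tr)
        accs.append(clf.score(X_te, y_te))
    return np.mean(accs)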

3 Results
The averaged accuracy results of the employed classifiers are presented in
Tables 1 and 2. Table 1 contains classification results obtained when choosing
between one of the 8 considered activities, whereas Table 2 presents the accu-
racy when verifying each activity separately. Bold font indicates the best result
for each activity. In both tables, the postscript "AllVar" means that all available
coefficients were used and "SelVar" denotes the case where the input variables
were previously selected using the JMI method.

Table 1. Classification accuracy when distinguishing between 8 activities: AllVar – All variables, SelVar – Selected variables

KNN SDT Bagging RF ADA SVM lin


8 activities AllVar 91.08 92.12 96.34 98.34 98.29 97.27
8 activities SelVar 96.01 96.20 96.67 98.35 98.38 97.36

When classifying activities into one of the 8 possibilities, employing Random
Forest and Adaptive Boosting enabled obtaining over 98% accuracy. Performing
pre-processing in the form of feature selection increased the averaged results by
0.04% points. Since the ADA algorithm returns an interpretable result, it should
be preferred over the Random Forest model in case of employing it in the SMART
project.
In almost all cases the best results of verifying an activity occurrence were
achieved with the ADA algorithm. When another algorithm gave better accuracy,
the difference between it and ADA did not exceed 0.31% points. The lowest
average accuracy of ADA was 99.21% and 99.25% when including all or selected
variables, respectively.
If the final product employs the solution with separate classifiers for each
activity, the following interpretation is suggested. If there are 8

Table 2. Classification accuracy when detecting particular activities: AllVar – All variables, SelVar – Selected variables

KNN SDT Bagging RF ADA SVM lin


Walk AllVar 97.55 96.97 98.45 99.22 99.43 98.63
Walk SelVar 98.25 98.36 98.74 99.36 99.47 98.91
BTS walk AllVar 97.25 98.10 98.78 99.09 99.64 98.69
BTS walk SelVar 98.20 98.12 98.74 99.15 99.46 98.48
Picking up an item AllVar 98.71 97.86 98.87 99.38 99.51 99.35
Picking up an item SelVar 99.46 99.30 98.91 99.33 99.54 99.30
Head movement AllVar 100 98.95 99.50 99.83 99.69 100
Head movement SelVar 100 99.50 99.55 99.87 99.73 100
Pelvis movements AllVar 96.37 98.21 98.75 99.34 99.39 95.83
Pelvis movements SelVar 98.45 98.40 98.70 99.37 99.33 97.25
Stabilography AllVar 99.40 99.59 99.61 99.57 99.62 99.73
Stabilography SelVar 99.51 99.58 99.67 99.65 99.72 99.91
Astrand test AllVar 98.13 94.80 97.98 99.32 99.48 98.17
Astrand test SelVar 99.43 98.92 98.20 99.36 99.58 98.34
Getting up AllVar 94.67 96.09 98.18 99.11 99.21 95.46
Getting up SelVar 97.75 98.11 98.50 99.17 99.25 97.51

zeros (i.e., none of the classifiers recognized the given signal as its particular
activity), then the analyzed time period should be labeled as “another activity”.
If more than one activity is identified, the label for the given time period should
be the activity for which the highest probability of belonging to class “1” is
obtained.
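A minimal sketch of this interpretation rule (our illustration, assuming each binary classifier outputs a class-“1” probability and that a probability of at least 0.5 counts as a detection):

  import numpy as np

  ACTIVITIES = ["walk", "BTS walk", "picking up an item", "head movement",
                "pelvis movements", "stabilography", "Astrand test", "getting up"]

  def label_time_period(probs, threshold=0.5):
      # probs[i]: probability of class "1" from the i-th activity classifier
      probs = np.asarray(probs)
      if not (probs >= threshold).any():  # eight zeros: nothing recognized
          return "another activity"
      return ACTIVITIES[int(np.argmax(probs))]  # largest class-"1" probability wins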

4 Discussion

The best solution was selected based on the obtained results, i.e., the method
that obtained the highest average classification quality. The chosen method, ADA,
obtained a result above 99% each time, and only in the case of two activities
was it (slightly) worse than the others. The use of simple statistical
characteristics of the signal, i.e., mean, standard deviation and range, reduces the
computational complexity and speeds up the signal processing. In addition, the
feature selection enabling precise recognition of a specific activity means that
there is no need to calculate a large set of features each time, but only those
identified as important from the point of view of classification. It should be taken
into account that for all tested methods the classification quality did not drop
below 90%, which may be the result of the large variety of activities selected for
identification in the SMART system.

The study uses two approaches: one-of-many recognition (Table 1), in this case
among eight activities, and is/is not recognition of each activity separately
(Table 2). The obtained classification qualities do not differ significantly.
However, in our opinion, the is/is not approach is more generic and will enable
the future expansion of the system with further activities not included at this
stage, without the need to rebuild the entire model.
Thanks to this approach, the created model can be successfully used as an
element of the SMART system at the stage of recognizing user characteristics,
providing stable operation in real time.
The quality of the classification in the work of various teams on the recog-
nition of activity published so far ranges from very low, approx. 50%, to very
high, 98% [7,9,10]. The methods used are a cross-section from simple systems
based on basic signal characteristics to deep learning solutions. In each study,
the classification results are obtained for different sets of activities and different
characteristics of signals from different sources, and the solutions are selected
according to the research problem. The problem of recognizing activity is com-
plex and each time requires the creation of a new model adapted to a specific
case.
Nevertheless, wearable sensors introduce a new quality into human movement
tracking, and it is possible that these sensors will revolutionize our
understanding of personal activities.

5 Conclusion
This paper presents the results of activity recognition using machine learning
methods on IMU signals. The obtained accuracy values are satisfying and lead
to the conclusion that it is possible to automatically detect an activity from
the provided activity list in a fast and effective way. For the final version of
the SMART system it is advised to verify each activity separately and then
perform the final activity selection, instead of trying to distinguish between
the activities using a single model. The details of the proposed final models
were delivered to Comfortel, the partner of Silesian University of Technology
and the main investor in the SMART project.

Acknowledgement. We would like to thank The Jerzy Kukuczka Academy of Physical Education in Katowice and the Comfortel company for conducting the experiments and providing the acquired data.
The study was realized within the project “SMART – activity monitoring and training rationalisation system” (grant no. WND-RPSL.01.02.00-24-045E/19-003).

References
1. Åstrand, P.O.: Experimental studies of physical working capacity in relation to sex
and age. FIEP Bull. On-line 52 (1952)

2. Bao, L., Intille, S.S.: Activity recognition from user-annotated acceleration data.
In: Ferscha, A., Mattern, F. (eds.) Pervasive 2004. LNCS, vol. 3001, pp. 1–17.
Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24646-6_1
3. BTS Bioengineering Corp.: BTS GAITLAB. https://www.btsbioengineering.com/
products/bts-gaitlab-gait-analysis/
4. Chen, K., Zhang, D., Yao, L., Guo, B., Yu, Z., Liu, Y.: Deep learning for sensor-
based human activity recognition: overview, challenges, and opportunities. ACM
Comput. Surv. CSUR 54, 1–40 (2021)
5. Fan, C., Gao, F.: Enhanced human activity recognition using wearable sensors via
a hybrid feature selection method. Sensors 21, 6434 (2021)
6. Foerster, F., Smeja, M.: Joint amplitude and frequency analysis of tremor activity.
Electromyogr. Clin. Neuro-Physiol. 39, 11–19 (1999)
7. Lara, O.D., Labrador, M.A.: A survey on human activity recognition using wear-
able sensors. IEEE Commun. Surv. Tutor. 15, 1192–1209 (2012)
8. Romaniszyn-Kania, P., et al.: Affective state during physiotherapy and its analysis
using machine learning methods. Sensors 21, 4853 (2021)
9. Slim, S.O., Atia, A., Elfattah, M.M.A., Mostafa, M.S.M.: Survey on human activity
recognition based on acceleration data. Int. J. Adv. Comput. Sci. Appl. 10, 84–98
(2019)
10. Wang, Y., Cang, S., Yu, H.: A survey on wearable sensor modality centered human
activity recognition in health care. Expert Syst. Appl. 137, 167–190 (2019)
11. Xsens: Xsens DOT user manual (2021). https://www.xsens.com/hubfs/
Downloads/Manuals/XsensDOTUserManual.pdf
12. Yang, H., Moody, J.: Data visualization and feature selection: new algorithms for
nongaussian data. In: Solla, S., Leen, T., Müller, K. (eds.) Advances in Neural
Information Processing Systems, vol. 12. MIT Press (2000). https://proceedings.
neurips.cc/paper/1999/file/8c01a75941549a705cf7275e41b21f0d-Paper.pdf
Heart Rate Measurement Based
on Embedded Accelerometer
in a Smartphone

Mirella Urzeniczok1 , Szymon Sieciński2(B) , and Paweł Kostka2


1 Łukasiewicz Research Network – Institute for Medical Equipment and Technology,
ul. Roosevelta 118, 41-800 Zabrze, Poland
mirella.urzeniczok@itam.lukasiewicz.gov.pl
2 Department of Biosensors and Processing of Biomedical Signals,
Faculty of Biomedical Engineering, Silesian University of Technology,
ul. Roosevelta 40, 41-800 Zabrze, Poland
{szymon.siecinski,pawel.kostka}@polsl.pl

Abstract. The emergence of new information technologies affects various aspects of life. The use of mobile communication devices in healthcare is known as mobile health or mHealth. In our study, we present an app for measuring the heart rate in real time based on seismocardiography. The heartbeats were detected with a modified version of the Pan-Tompkins algorithm. The results prove the feasibility of the designed app for real-time measurement of heart rate using only an accelerometer.

Keywords: Heart rate measurement · Seismocardiography · mHealth

1 Introduction
The emergence of new information and communication technologies (ICT) affects
various aspects of life. The use of mobile communication devices (e.g., smart-
phones, tablet computers, smart watches, smart wristbands) to provide health-
care services is defined as mobile health or mHealth [2,23,32]. On an individual
patient level, mHealth may find its use in increasing medication adherence or
control of vital signs [10,20]. Cardiology may also benefit from mHealth thanks to
the broadening capabilities of telecommunication networks and mobile devices,
which may improve the diagnosis and treatment of cardiovascular diseases [20].
Heart rate (HR) is one of the most important vital signs that can be acquired
with various techniques, such as electrocardiography (ECG) [8,30], photoplethys-
mography (PPG) [4,10], videoplethysmography (VPG) [6,24], seismocardiog-
raphy [9,17,25], and gyrocardiography [17]. Seismocardiography (SCG) is a
noninvasive technique for acquisition and analysis of cardiac vibrations by an
accelerometer placed on a chest wall [29,33,34]. Before 2007, the acquisition of
seismocardiograms (SCG signals) required the use of cumbersome piezoelectric
accelerometers [5,29] placed on the sternum. The availability of small, inex-
pensive, and accurate MEMS (microelectromechanical systems) accelerometers

encouraged researchers and clinicians to rediscover the clinical potential of
seismocardiography [14].
The aforementioned emergence of new ICT, in combination with a previously
forgotten (and now re-emerging) diagnostic technique (SCG), led us to conduct
the research with the following two purposes: the development of a mobile app
for real-time heart rate measurement with an embedded accelerometer of a
smartphone, and the assessment of the performance of heartbeat detection.

2 Material and Methods

2.1 Experimental Group

Conducting tests with volunteers was an integral part of the app development.
Due to the limited time and budget, the experimental group consisted of 9
volunteers: 5 female and 4 male subjects. The mean age of the female subjects
was 33.4 years, whereas the mean age of the male subjects was 38.3 years. One
subject was diagnosed with an arrhythmia. Before conducting the study, all
subjects gave informed consent. Each subject was lying in a supine position and
was asked to place the smartphone on the chest wall and stay still during the
signal registration. The SCG signal from each subject was registered for two
minutes with a sampling frequency of 100 Hz.

2.2 Heart Rate Measurement

The detection of heartbeats is performed with a modified version of the
Pan-Tompkins algorithm, which was designed to detect heartbeats from
electrocardiograms in real time [21]. Because the heartbeats in electrocardiograms
and seismocardiograms are represented by quasi-periodic single, sharp peaks, we
decided to adjust the well-known Pan-Tompkins algorithm to the characteristics
of seismocardiograms [31]. The original Pan-Tompkins algorithm consists of
several steps: bandpass filtration, differentiation, squaring, smoothing with a
moving average filter, adaptive thresholding, and finding local maxima [21].
In the original Pan-Tompkins algorithm, the first step is applying the But-
terworth bandpass filter with cut-off frequencies of 5 Hz and 11 Hz implemented
as a cascade of low-pass and high-pass recursive filters with integer coefficients
[11,21]. In our approach, we applied the bandpass filter with cut-off frequencies
of 15 Hz and 25 Hz implemented as a zero-phase finite impulse response (FIR)
filter [31]. The passband of 15–25 Hz was chosen based on [3,22,27].
The next step in the original Pan-Tompkins algorithm is differentiating the
input signal with a five-point derivative filter [21]. In our approach, we omitted
this step. Then, the signal samples were squared to raise the slope of the AO wave
[31], which is expressed as:

y(n) = [x(n)]² (1)

where y(n) is the output signal and x(n) is the input signal.

Then, the signal was smoothed by applying a moving average filter implemented
as a finite impulse response (FIR) filter, which is expressed as follows:

y(n) = (1/L) [x(n) + x(n − 1) + · · · + x(n − L + 1)] (2)

where y(n) is the output signal, L is the window width, and x(n) is the input
signal. The width of the moving average filter was set to 0.1 s.
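The preprocessing chain up to this point can be sketched as follows (an illustrative Python/SciPy rendering, not the app's Java code; the FIR filter order of 101 taps is an assumption):

  import numpy as np
  from scipy.signal import firwin, filtfilt

  FS = 100  # sampling frequency of the SCG signal [Hz]

  def preprocess_scg(x):
      # zero-phase FIR bandpass filtering with a 15-25 Hz passband
      taps = firwin(101, [15, 25], pass_zero=False, fs=FS)
      bandpassed = filtfilt(taps, 1.0, x)
      # Eq. (1): squaring raises the slope of the AO wave
      squared = bandpassed ** 2
      # Eq. (2): moving average with a 0.1 s window
      L = int(0.1 * FS)
      return np.convolve(squared, np.ones(L) / L, mode="same")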
The last step is finding the AO waves and automatic thresholding to attenuate
the noise [11,21]. The locations of the AO waves were determined on the rising
slope of the preprocessed signal. Signal thresholding uses two thresholds: a
higher threshold used in the learning phase and a lower threshold used if no
heartbeat is detected in a set time interval.
After the learning phase, the threshold is adapted to the running signal and
noise levels, and the missed peaks are added using the search-back technique [21].
In our modification of the Pan-Tompkins algorithm, the most significant changes
are the learning phase (the first estimation of signal and noise peaks) and the
calculation of the threshold. In general, the threshold level Thr is calculated as:

Thr = nl + 0.25 × (sl − nl) (3)

where sl is the signal level and nl is the noise level.
After estimating the initial signal and noise levels, the signal and noise levels
are adjusted to improve the signal-to-noise ratio (SNR) as:

sl = 0.2 × pk + 0.8 × sl (4)

nl = 0.2 × pk + 0.8 × nl (5)

where pk is the amplitude of the detected peak; Eq. (4) is applied when the peak
is classified as a heartbeat and Eq. (5) when it is classified as noise.


The beat-to-beat interval (NN) is calculated as:

NNi = ti − ti−1 (6)

where ti is the time of the i-th heartbeat.
The heart rate is calculated as:

HR = 60 / NN BPM (7)
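A compact sketch of Eqs. (3)–(7) (our illustrative Python rendering; the learning-phase initialization and the 0.3 s minimum peak distance are simplifying assumptions, and the search-back step is omitted):

  import numpy as np
  from scipy.signal import find_peaks

  def heart_rate(smoothed, fs=100):
      peaks, _ = find_peaks(smoothed, distance=int(0.3 * fs))
      # crude learning phase: initial signal and noise level estimates
      sl, nl = smoothed[peaks].max(), smoothed[peaks].min()
      beats = []
      for p in peaks:
          thr = nl + 0.25 * (sl - nl)            # Eq. (3)
          if smoothed[p] > thr:                  # peak classified as a heartbeat
              beats.append(p)
              sl = 0.2 * smoothed[p] + 0.8 * sl  # Eq. (4)
          else:                                  # peak classified as noise
              nl = 0.2 * smoothed[p] + 0.8 * nl  # Eq. (5)
      nn = np.diff(beats) / fs                   # Eq. (6): NN intervals [s]
      return 60.0 / nn.mean()                    # Eq. (7): heart rate [BPM]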

2.3 Mobile App Development


The app was written in Java, built with Gradle and consists of the following
types of classes:
– Three activities:
1. StartActivity – implements the main menu and controls the data acqui-
sition service
2. PlotActivity – plots charts with SCG signals in three axes (X, Y, Z) with
MPAndroidChart

3. HRDetectionActivity – implements HR measurement


– Service:
1. AccService – implements accelerometer data acquisition
– Filters:
1. FIRFilterImplementation – implements FIR filter
2. IIRFilterImplementation – implements IIR filter
3. LowpassFilterButterworthSection – creates lowpass Butterworth filter
4. LowpassFilterButterworthImplementation – implements lowpass Butter-
worth filter
5. HighpassFilterButterworthSection – creates highpass Butterworth filter
6. HighpassFilterButterworthImplementation – implements highpass But-
terworth filter
7. BandpassFilterButterworthSection – creates bandpass Butterworth filter
8. BandpassFilterButterworthImplementation – implements bandpass But-
terworth filter.
The implementation of the FIR and IIR filters was based on a C# implementation
available at [1] under The Code Project Open License (CPOL). The designed
app requires the availability of a working accelerometer and two permissions:
WAKE_LOCK, which prevents the device from entering sleep mode after
some time of inactivity, and WRITE_EXTERNAL_STORAGE, which enables
saving files with registered signals to an SD card [12].
The main activity (StartActivity) is responsible for the main menu, managing
the data acquisition service (AccService), and saving the registered signals as a
CSV file with the OpenCSV library. The available interactions of a user with the
app are shown in Fig. 1.

3 Results
3.1 Heartbeat Detection Performance
The performance of heartbeat detection was expressed as the number of true
positives (TP), false negatives (FN), and false positives (FP) annotated manually
in the SCG signals based on the description provided in [33]. A true positive is
defined as a correctly detected AO wave, whereas a false negative is defined as an
AO wave that was not detected. A false positive is defined as the detection of a
false AO wave. The tolerance for detecting AO waves was 0.1 s, based on [27].
The sensitivity (Se) is defined as:

Se = TP / (TP + FN) (8)

and the positive predictive value (PPV) is defined as:

PPV = TP / (TP + FP) (9)

For example, for the totals in Table 1 (TP = 1308, FN = 98, FP = 53),
Se = 1308/1406 ≈ 93.03% and PPV = 1308/1361 ≈ 96.11%.
The performance of heartbeat detection is shown in Table 1.
The comparison of the performance of our approach with other available
heartbeat detectors designed for SCG signals is presented in Table 2.
Fig. 1. Use cases diagram of the app

Table 1. Heartbeat detection performance

Subject TP FN FP Se [%] PPV [%]


1 150 14 0 91.46 100
2 112 24 0 82.35 100
3 152 25 0 85.88 100
4 232 15 0 93.93 100
5 162 10 0 94.18 100
6 100 8 12 92.59 89.28
7 130 1 16 99.23 89.04
8 130 1 12 99.23 91.54
9 140 0 13 100 91.50
Total 1308 98 53 93.03 96.11

Table 2. Comparison of the presented heartbeat detection algorithm with other available algorithms

Algorithm Reference Se PPV Remarks


Choudhary et al. [7] 0.974 0.974
Cocconcelli et al. [9] 0.991 0.979
Landreani et al. [15] 0.999 – Accuracy: 0.986; POS1 in supine
0.999 – Accuracy: 0.990; POS2 in supine
Li et al. [16] 0.993 0.994
Mora et al. [18] 0.985 0.986 Specificity: 0.986
Mora et al. [19] 0.985 0.986
Siecinski et al. [25] 0.995 0.991
Siecinski et al. [26] 0.893 0.896
Suresh et al. [28] 0.980 0.980
This study 0.930 0.961

3.2 Mobile App Tests

The app was tested on a Honor 10 smartphone with Android version 9.1.0; the
tests included the typical use case: lying still in a supine position, placing the
smartphone on the chest, and following the instructions provided by the app.
After launching the app, the user sees the welcome screen as shown in Fig. 2.

Fig. 2. Welcome screen and main menu

If the device has no available accelerometer, the user is informed that the
examination cannot be performed without a working accelerometer (see Fig. 3).
After choosing “Rozpocznij badanie” (Start examination), the user is
instructed how to perform the examination and then the signal registration and
heart rate measurement starts. The charts with an SCG signal (in X-axis, Y-
axis, and/or Z-axis) and heart rate are shown after choosing “Pokaż wykresy”
(Show chart). The current heart rate is displayed with a one-second delay (see
Fig. 4). To stop registering the signals, a user must choose “Zakończ badanie”
(Stop examination) and confirm the choice.
The registered signals may be exported to a CSV file saved on the device
storage by choosing “Zapisz wynik badania” (Save the recording). The user is
asked whether to save the CSV file or abort the export (see Fig. 5).

Fig. 3. The warning about the lack of available accelerometer

Fig. 4. SCG signal and heart rate charts: on the left for HR and Z-axis only, and for
all the axes and heart rate on the right

Fig. 5. Exporting the registered signals to a CSV file

4 Discussion

We have developed an app for real-time heart rate measurement based on SCG.
The implemented app has a simple and intuitive user interface, which helps
the user prepare to record the SCG signals. The app runs stably, the charts are
displayed smoothly, the heartbeat detector proved its feasibility on the study group
(Se = 0.930, PPV = 0.961), and the delay of the heart rate measurement was one
second. The reported performance is lower than in most studies shown in Table 2,
except Sieciński et al. [26] (Se = 0.893, PPV = 0.896). The main cause was
a relatively small difference in amplitude between the AO and RE waves, which
may be related to arterial stiffness due to ageing [13]. However, we proved that a
smartphone with an app can register seismocardiograms and display the current
heart rate in real time.
In future research, we consider improving the robustness of the implemented
heartbeat detector in order to decrease the number of false positives. Another
important aspect of future studies is conducting tests in different conditions
(e.g., in various positions, places, and emotional states) and including more
subjects. The app could be improved by adding new modules for the analysis of
SCG signals, such as heart rate variability analysis or atrial fibrillation detection.
Thanks to the improvements in signal processing techniques and devices, in
combination with the understanding of the physiological background of cardiac
vibrations, seismocardiography may become a useful technique for the assessment
of heart condition at home.

References
1. Lowpass, highpass, and bandpass Butterworth filters in C# (2019). https://www.
codeproject.com/Tips/5070936/Lowpass-Highpass-and-Bandpass-Butterworth-
Filters. Accessed 27 Dec 2021
2. Adibi, S. (ed.): Mobile Health. Springer, Cham (2015). https://doi.org/10.1007/
978-3-319-12817-7
3. Alamdari, N., Tavakolian, K., Zakeri, V., Fazel-Rezai, R., Akhbardeh, A.: A mor-
phological approach to detect respiratory phases of seismocardiogram. In: 2016
38th Annual International Conference of the IEEE Engineering in Medicine and
Biology Society (EMBC), Orland, FL, USA, pp. 4272–4275 (2016). https://doi.
org/10.1109/EMBC.2016.7591671
4. Allen, J.: Photoplethysmography and its application in clinical physiological mea-
surement. Physiol. Meas. 28(3), R1 (2007). https://doi.org/10.1088/0967-3334/
28/3/R01
5. Castiglioni, P., Faini, A., Parati, G., Rienzo, M.D.: Wearable seismocardiogra-
phy. In: 2007 29th Annual International Conference of the IEEE Engineering in
Medicine and Biology Society, Lyon, France, pp. 3954–3957 (2007). https://doi.
org/10.1109/IEMBS.2007.4353199
6. Celniak, W., Augustyniak, P.: Detection of human blood pulse based on displace-
ment vector in video footage. In: 2021 14th International Conference on Human
System Interaction (HSI). IEEE (2021). https://doi.org/10.1109/hsi52170.2021.
9538740
7. Choudhary, T., Bhuyan, M.K., Sharma, L.N.: A novel method for aortic valve
opening phase detection using SCG signal. IEEE Sens. J. 20(2), 899–908 (2020).
https://doi.org/10.1109/jsen.2019.2944235
8. Christov, I.I.: Real time electrocardiogram QRS detection using combined adaptive
threshold. Biomed. Eng. Online 3(1), 28 (2004). https://doi.org/10.1186/1475-
925X-3-28
9. Cocconcelli, F., Mora, N., Matrella, G., Ciampolini, P.: Seismocardiography-based
detection of heartbeats for continuous monitoring of vital signs. In: 2019 11th
Computer Science and Electronic Engineering (CEEC), Colchester, UK, pp. 53–58
(2019). https://doi.org/10.1109/CEEC47804.2019.8974343
10. Coppetti, T., et al.: Accuracy of smartphone apps for heart rate measure-
ment. Eur. J. Prev. Cardiol. 24(12), 1287–1293 (2017). https://doi.org/10.1177/
2047487317702044
11. Fariha, M.A.Z., Ikeura, R., Hayakawa, S., Tsutsumi, S.: Analysis of Pan-Tompkins
algorithm performance with noisy ECG signals. J. Phys. Conf. Ser. 1532, 012022
(2020). https://doi.org/10.1088/1742-6596/1532/1/012022
12. Google Inc: Manifest.permission (2021). https://developer.android.com/reference/
android/Manifest.permission. Accessed 27 Jan 2022
13. Gurev, V., Tavakolian, K., Constantino, J.C., Kaminska, B., Blaber, A.P.,
Trayanova, N.: Mechanisms underlying isovolumic contraction and ejection peaks
in seismocardiogram morphology. J. Med. Biol. Eng. 32(2), 103 (2012). https://
doi.org/10.5405/jmbe.847
14. Inan, O.T., et al.: Ballistocardiography and seismocardiography: a review of recent
advances. IEEE J. Biomed. Health Inform. 19(4), 1414–1427 (2015). https://doi.
org/10.1109/JBHI.2014.2361732

15. Landreani, F., et al.: Beat-to-beat heart rate detection by smartphone’s accelerom-
eters: validation with ECG. In: 2016 38th Annual International Conference of the
IEEE Engineering in Medicine and Biology Society (EMBC), Orlando, FL, USA,
pp. 525–528 (2016). https://doi.org/10.1109/EMBC.2016.7590755
16. Li, Y., Tang, X., Xu, Z.: An approach of heartbeat segmentation in seismocar-
diogram by matched-filtering. In: 2015 7th International Conference on Intelligent
Human-Machine Systems and Cybernetics, Hangzhou, China, vol. 2, pp. 47–51
(2015). https://doi.org/10.1109/IHMSC.2015.157
17. Mehrang, S., et al.: Machine learning based classification of myocardial infarction
conditions using smartphone-derived seismo- and gyrocardiography. In: 2018 Com-
puting in Cardiology Conference (CinC), Maastricht, Netherlands, vol. 45, pp. 1–4
(2018). https://doi.org/10.22489/CinC.2018.110
18. Mora, N., Cocconcelli, F., Matrella, G., Ciampolini, P.: Detection and analysis of
heartbeats in seismocardiogram signals. Sensors 20(6), 1670 (2020). https://doi.
org/10.3390/s20061670
19. Mora, N., Cocconcelli, F., Matrella, G., Ciampolini, P.: A unified methodology for
heartbeats detection in seismocardiogram and ballistocardiogram signals. Comput-
ers 9(2), 41 (2020). https://doi.org/10.3390/computers9020041
20. Nguyen, H.H., Silva, J.N.: Use of smartphone technology in cardiology. Trends
Cardiovasc. Med. 26(4), 376–386 (2016). https://doi.org/10.1016/j.tcm.2015.11.
002
21. Pan, J., Tompkins, W.J.: A real-time QRS detection algorithm. IEEE Trans.
Biomed. Eng. BME-32(3), 230–236 (1985)
22. Pandia, K., Inan, O.T., Kovacs, G.T.A., Giovangrandi, L.: Extracting respiratory
information from seismocardiogram signals acquired on the chest using a miniature
accelerometer. Physiol. Meas. 33(10), 1643–1660 (2012). https://doi.org/10.1088/
0967-3334/33/10/1643
23. Reeder, B., David, A.: Health at hand: a systematic review of smart watch uses
for health and wellness. J. Biomed. Inform. 63, 269–276 (2016). https://doi.org/
10.1016/j.jbi.2016.09.001
24. Rumiński, J.: Reliability of pulse measurements in videoplethysmography. Metrol.
Meas. Syst. 23(3), 359–371 (2016). https://doi.org/10.1515/mms-2016-0040
25. Siecinski, S., Kostka, P.S., Tkacz, E.J.: Heart rate variability analysis on CEBS
database signals. In: 2018 40th Annual International Conference of the IEEE Engi-
neering in Medicine and Biology Society, Honolulu, HI, USA, pp. 5697–5700 (2018).
https://doi.org/10.1109/EMBC.2018.8513551
26. Siecinski, S., Tkacz, E.J., Kostka, P.S.: Comparison of HRV indices obtained from
ECG and SCG signals from CEBS database. BioMed. Eng. OnLine 18(69) (2019).
https://doi.org/10.1186/s12938-019-0687-5
27. Sørensen, K., Schmidt, S.E., Jensen, A.S., Søgaard, P., Struijk, J.J.: Definition of
fiducial points in the normal seismocardiogram. Sci. Rep. 8(1) (2018). https://doi.
org/10.1038/s41598-018-33675-6
28. Suresh, P., Narayanan, N., Pranav, C.V., Vijayaraghavan, V.: End-to-end deep
learning for reliable cardiac activity monitoring using seismocardiograms. In:
2020 19th IEEE International Conference on Machine Learning and Applica-
tions (ICMLA), Miami, FL, USA, pp. 1369–1375 (2020). https://doi.org/10.1109/
ICMLA51294.2020.00213
29. Taebi, A., Solar, B.E., Bomar, A.J., Sandler, R.H., Mansy, H.A.: Recent advances
in seismocardiography. Vibration 2(1), 64–86 (2019). https://doi.org/10.3390/
vibration2010005

30. Task Force of the European Society of Cardiology the North American Society
of Pacing Electrophysiology: Heart rate variability. Standards of measurement,
physiological interpretation, and clinical use. Circulation 93, 1043–1065 (1996).
https://doi.org/10.1161/01.CIR.93.5.1043
31. Urzeniczok, M.: Aplikacja do detekcji tętna za pomocą akcelerometru wbu-
dowanego w urządzenie mobilne [An app for detecting the heart beats with an
accelerometer embedded into a mobile device]. Master’s thesis, Silesian University
of Technology, Zabrze, Poland (2020)
32. Yang, X., et al.: Exploring emerging IoT technologies in smart health research: a
knowledge graph analysis. BMC Med. Inform. Decision Mak. 20(1) (2020). https://
doi.org/10.1186/s12911-020-01278-9
33. Zanetti, J.M., Poliac, M.O., Crow, R.S.: Seismocardiography: waveform identifica-
tion and noise analysis. In: Proceedings Computers in Cardiology, Venice, Italy,
pp. 49–52 (1991). https://doi.org/10.1109/CIC.1991.169042
34. Zanetti, J.M., Tavakolian, K.: Seismocardiography: past, present and future. In:
2013 35th Annual International Conference of the IEEE Engineering in Medicine
and Biology Society (EMBC), Osaka, Japan, pp. 7004–7007 (2013). https://doi.
org/10.1109/EMBC.2013.6611170
Non-invasive Measurement of Human
Pulse Based on Photographic Images
of the Face

Jakub Gumulski, Marta Jankowska, and Dominik Spinczyk(B)

Faculty of Biomedical Engineering, Silesian University of Technology,
ul. Roosevelta 40, 41-800 Zabrze, Poland
{jakugum853,martjan942}@student.polsl.pl, dominik.spinczyk@polsl.pl

Abstract. The focus of this study was to test a contactless photoplethysmography based method to calculate the pulse of a patient from a video recording of their face. For this purpose, a deep convolutional neural network was used to detect the region of interest (the skin area of the face), and the variability of the image values was then processed as a signal in the frequency domain for pulse reconstruction. The method was tested on three video sets with different video resolutions: 1920 × 1080 px, 960 × 540 px, and 640 × 580 px. The best results came from the set with a resolution of 960 × 540 px, with a relative error of 10.6%, an absolute error of 10.4 BPM, and a processing speed of 3.7 FPS. The method can be useful when it is impossible to use dedicated medical equipment to measure the human pulse.

Keywords: Pulse measurement · Deep convolution neural network · Photoplethysmography

1 Introduction
During a global pandemic, keeping distance and remote health monitoring
became critical issues in the medical field. There are numerous noninvasive and
contactless diagnostic procedures available today [4,12]. C. Omar et al. compared
the accuracy of pulse measurement using pulse oximetry and electrocardiography
in detecting a low heart rate (lower than 100 beats per minute). 55 infants were
tested, and the mean difference between the two methods was ±2 beats per
minute. The sensitivity of pulse oximetry in low pulse detection was 89% and the
specificity was 99% [10]. Nelson et al. focused on examining the accuracy of heart
rate measurements of the wearable devices Apple Watch 3 and Fitbit Charge 2.
The measurements were taken during a 24-h period and compared with reference
data from an ambulatory electrocardiogram. The Apple Watch had a mean
difference of −1.80 beats per minute and a mean absolute error percent of 5.86%,
and the Fitbit Charge had a difference of −3.47 beats per minute and a mean
absolute error of 5.96% [9]. In the study by Cheng et al., the goal was to develop
a new optical method of detecting heart rate and blood pressure values. Heart
rate was measured using photoplethysmography with an SpO2 sensor, and blood
pressure was measured with ballistocardiography using a fiber sensor mat.
Accuracy was assessed by comparison with reference data. The mean error for
heart rate was 0.6 ± 0.9 beats per minute, and for blood pressure it was
1.8 ± 1.3 mmHg [2].
During systole, the heart pumps blood through the arteries, subtly changing the
color shade of tissue and skin. Photoplethysmography analyzes those changes
and detects the blood volume pulse (BVP) and heart rate. This method usually
requires an optical sensor that is physically connected to the skin, but in this
article, a video camera and face recognition algorithms were used instead. The
majority of optical approaches are less successful than standard contact methods,
but the suggested method should allow for contactless heart rate measurement
without the use of any special equipment, with only a standard phone camera [1].
The purpose and novelty of this article was the evaluation of the maximum
performance of a contactless photoplethysmography based method to calculate
the human pulse in daylight conditions with minimal hardware equipment, using
a smartphone camera. No such detailed analysis was found in the literature review.

2 Materials and Methods

The proposed method of human pulse detection is shown in the block diagram
in Fig. 1. Below is a brief description of the individual stages of the developed
method.
The first step of the developed method was face recognition. To recognize the
face in the image, the FaceLib library was used [3]. After successfully recognizing
the face in each frame, the pixels containing the skin area were selected and the
background pixels were removed from each frame to minimize irrelevant data and
perturbations. Each frame was resized into 256 × 256 px squares and converted
into the RGB color scale. Segmentation was based on pixel intensity and colors
corresponding to skin tones using the deep-learning LinkNet34 model [6–8]. The
pulse could not be calculated unless a face was recognized in the video frame.
Only one person's pulse could be calculated at a time.
The next step of the algorithm was constructing the variability function of the
image elements. The function was processed and the pulse was calculated for
every 90 frames. The mean signal for each RGB channel of each frame was
calculated, and then, by calculating a diagonal array and the inverse of a matrix,
the batch signal was acquired. A moving average with a window of 6 frames was
then applied to remove insignificant fluctuations of the signal. For signal
processing, the rPPG library was used [11]. The signal prepared in this way was
ready for pulse calculation. All calculations were made in Python configured in
the Anaconda environment.
To compute the pulse from the mean RGB signal, the one-dimensional discrete
Fourier transform with a sample frequency step of 1/(frames per second) was
used. The Fourier transform changes the function's domain from time to
frequency. Its formula is shown in Eq. 1.

Fig. 1. Block diagram of pulse acquisition algorithm

F(k) = Σ_{n=0}^{L−1} f(n) e^{−j·2πkn/L}, (1)

where:

– F(k) – frequency domain signal,
– f(n) – time domain signal,
– L – signal's length.

Then the absolute value of the signal was taken to remove the imaginary part
of the signal.
The next step was filtering the spectral signal by removing frequencies not
corresponding to heart rate frequencies. A band-pass filter (shown in Fig. 2) was
used for frequencies between 0.8 Hz and 3 Hz, which correspond to heart rate
values between 48 bpm and 180 bpm. The band-pass filter is also expressed in Eq. 2.

Fig. 2. Band-pass filter used to detect heart rate frequencies

F(k) = 0 ⇔ (fk < fmin ∨ fk > fmax), (2)

where:
– fk – frequency corresponding to the k-th spectral component,
– fmin – minimum frequency of the filter, equal to 0.8 Hz,
– fmax – maximum frequency of the filter, equal to 3 Hz.
The last step of calculating the pulse was the detection of the harmonic frequency
with the maximum value. The peak amplitude of the filtered signal corresponds
to the heart rate frequency, as shown in Eq. 3. The spectrum was squared to make
this peak more distinguishable from the rest of the signal. That harmonic
frequency, after multiplication by 60.0 s (Eq. 4), returned the pulse value in beats
per minute and was saved as the batch pulse value for its 90 frames of video. The
rest of the video was processed in the same way. Each of the batch pulses is shown
on the heart rate plot in Fig. 3. The final pulse was the average of all batch pulses
calculated throughout the video.

h = F(m), (3)
p = h × 60, (4)

where
– h – heart rate frequency,
– F(m) – the element of the frequency-domain signal with the highest amplitude,
– p – calculated pulse value.
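For one 90-frame batch, Eqs. (1)–(4) reduce to a few lines (a minimal NumPy sketch under the stated 0.8–3 Hz band; the function and variable names are ours):

  import numpy as np

  def batch_pulse(mean_rgb, fps):
      # Eq. (1): one-dimensional DFT of the mean skin-color signal,
      # followed by taking the absolute value and squaring the spectrum
      spectrum = np.abs(np.fft.rfft(mean_rgb)) ** 2
      freqs = np.fft.rfftfreq(len(mean_rgb), d=1.0 / fps)
      # Eq. (2): zero all components outside the 0.8-3 Hz heart rate band
      spectrum[(freqs < 0.8) | (freqs > 3.0)] = 0
      h = freqs[np.argmax(spectrum)]  # Eq. (3): harmonic of highest amplitude
      return h * 60.0                 # Eq. (4): batch pulse in BPM

  # example: a synthetic 90-frame batch at 30 FPS with a 1 Hz component
  t = np.arange(90) / 30.0
  print(batch_pulse(np.sin(2 * np.pi * t), fps=30))  # -> 60.0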

Fig. 3. Example of pulse plot acquired from video

2.1 Evaluation of the Proposed Method


The results of the heart rate calculation can be shown on a plot, as in Fig. 3.
The upper chart shows the blood volume pulse changes and the lower chart shows
the changes in pulse values in each frame. In the upper left corner, the heart rate
value is shown. For each video, the calculated heart rate value was measured in
beats per minute (BPM) and assessed with the absolute error in beats per minute
(Eq. 5) and the relative error in percentage (Eq. 6) [5]. Additionally, the standard
deviation was computed.

ea = |P − p|, (5)

er = (ea / P) × 100%, (6)
where
– ea – absolute error value,
– er – relative error value,
– P – measured pulse value,
– p – calculated pulse value.
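A direct transcription of Eqs. (5)–(6) (an illustrative sketch with hypothetical pulse values):

  def pulse_errors(measured_bpm, calculated_bpm):
      e_a = abs(measured_bpm - calculated_bpm)  # Eq. (5): absolute error [BPM]
      e_r = e_a / measured_bpm * 100.0          # Eq. (6): relative error [%]
      return e_a, e_r

  print(pulse_errors(80, 72))  # -> (8, 10.0)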

The algorithm was tested on a total of 72 recordings, showing 8 people after
three different levels of physical activity, additionally augmented by downscaling
(from the original size of 1920 × 1080 px to 960 × 540 px and to 640 × 580 px).
Each recording lasted from 10 to 22 s. The participants included both sexes,
some with makeup or a beard. For each video, the participant's face was recorded
with a camera while their pulse was measured using a pulse oximeter. There were
three types of video:
1. While resting (no prior physical activity),
2. After short period of physical activity,
3. Around one minute after physical activity.

3 Results
In the following tables, each of the 8 participants is marked with a letter from
A to H, which, together with the sample number, creates an ID for the specific
video used as base material. Both the measured and calculated pulse are expressed
in beats per minute (BPM), while the relative error is expressed in percentages
(%). Video length and processing time are given in seconds (s). The tables show
the results from recordings with resolutions of 1920 × 1080 px in Table 1,
960 × 540 px in Table 2, and 640 × 580 px in Table 3.

Table 1. Pulse values and algorithm performance from 1920 × 1080 px videos

ID Measured pulse (BPM) Calculated pulse (BPM) Absolute error (BPM) Relative error (%) Video length (s) Processing time (s) FPS
A1 75 73 2 2.74 19 196 2.4
A2 92 71 21 29.58 22 236 2.3
A3 93 86 7 8.14 20 220 2.3
B1 93 107 14 13.08 21 215 2.4
B2 98 108 10 9.26 21 225 2.3
B3 101 102 1 0.98 10 105 2.4
C1 92 92 0 0 20 229 2.2
C2 98 93 5 5.38 20 220 2.3
C3 98 111 13 11.71 20 184 2.7
D1 70 123 53 43.09 21 196 2.7
D2 103 120 17 14.17 21 220 2.4
D3 109 109 0 0 21 215 2.4
E1 76 76 0 0 21 208 2.5
E2 81 81 0 0 22 209 2.6
E3 89 86 3 3.49 20 190 2.6
F1 85 95 10 10.53 20 172 2.9
F2 86 82 4 4.88 20 200 2.5
F3 89 101 12 11.88 18 173 2.6
G1 81 119 38 31.93 22 209 2.6
G2 89 106 17 16.04 21 194 2.7
G3 115 124 9 7.26 20 185 2.7
H1 84 71 13 18.31 20 186 2.7
H2 84 94 10 10.64 20 180 2.8
H3 101 100 1 1 21 206 2.5

Table 2. Pulse values and algorithm performance from 960 × 540 px videos

ID Measured pulse (BPM) Calculated pulse (BPM) Absolute error (BPM) Relative error (%) Video length (s) Processing time (s) FPS
A1 75 72 3 4.17 19 162 2.9
A2 92 76 16 21.05 22 153 3.6
A3 93 91 2 2.2 20 142 3.5
B1 93 102 9 8.82 21 146 3.6
B2 98 102 4 3.92 21 148 3.5
B3 101 102 1 0.98 10 76 3.3
C1 92 95 3 3.16 20 140 3.6
C2 98 78 20 25.64 20 138 3.6
C3 98 111 13 11.71 20 121 4.1
D1 70 111 41 36.94 21 130 4
D2 103 120 17 14.17 21 149 3.5
D3 109 100 9 9 21 137 3.8
E1 76 75 1 1.33 21 147 3.6
E2 81 76 5 6.58 22 148 3.7
E3 89 81 8 9.88 20 132 3.8
F1 85 95 10 10.53 20 127 3.9
F2 86 82 4 4.88 20 125 4
F3 89 113 24 21.24 18 115 3.9
G1 81 101 20 19.8 22 138 4
G2 89 111 22 19.82 21 132 4
G3 115 120 5 4.17 20 129 3.9
H1 84 79 5 6.33 20 127 3.9
H2 84 91 7 7.69 20 127 3.9
H3 101 101 0 0 21 138 3.8

Table 3. Pulse values and algorithm performance from 640 × 580 px videos

ID Measured pulse (BPM) Calculated pulse (BPM) Absolute error (BPM) Relative error (%) Video length (s) Processing time (s) FPS
A1 75 71 4 5.63 19 129 3.7
A2 92 62 30 48.39 22 150 3.7
A3 93 71 22 30.99 20 134 3.7
B1 93 91 2 2.2 21 138 3.8
B2 98 72 26 36.11 21 142 3.7
B3 101 102 1 0.98 10 69 3.6
C1 92 87 5 5.75 20 136 3.7
C2 98 67 31 46.27 20 137 3.7
C3 98 80 18 22.5 20 128 3.9
D1 70 100 30 30 21 143 3.7
D2 103 86 17 19.77 21 144 3.6
D3 109 86 23 26.74 21 139 3.8
E1 76 66 10 15.15 21 142 3.7
E2 81 79 2 2.53 22 149 3.7
E3 89 72 17 23.61 20 136 3.7
F1 85 107 22 20.56 20 132 3.8
F2 86 76 10 13.16 20 132 3.8
F3 89 100 11 11 18 130 3.5
G1 81 99 18 18.18 22 140 3.9
G2 89 105 16 15.24 21 132 4
G3 115 120 5 4.17 20 129 3.9
H1 84 74 10 13.51 20 127 3.9
H2 84 81 3 3.7 20 120 4.2
H3 101 80 21 26.25 21 130 4

4 Discussion
The algorithm’s overall performance was the best for those with a size of
960 × 540 px, out of all three file resolutions. It was slightly faster (3.8 FPS
compared to 3.7 FPS with 960 × 540 px input files) but had higher absolute
(14.8 BPM for 640 × 580 px and 10.4 BPM for 960 × 540 px) and relative errors
when utilizing 640 × 580 px input files (18.4% for 640 × 580 px and 10.6% for
960 × 540 px).
However, when compared to 960 × 540 px, processing 1920 × 1080 px data
resulted in a considerable drop in performance, with just 2.5 FPS (a 32.4%
reduction in speed) and similar absolute (10.8 BPM) and relative error (10.6%).
It has been demonstrated that providing the algorithm with even higher resolu-
tion recordings does not necessarily improve its accuracy, but it does significantly
prolong the processing time.
Comparing the acquired results of contactless photoplethysmography with those
from conventional methods such as ECG or pulse oximetry, it can be noticed
that this technique in general yields less accurate heart rate results than
traditional practices. It should be noted that the average absolute error of a
measurement performed using a photoplethysmography-based pulse oximeter is
0.6 BPM (versus 10.4 BPM for the proposed method on the best suited data set) [2].

In comparison to data from a study using a pulse oximeter to measure the pulse
of infants, the suggested technique is 5 times less accurate (measurements from
the pulse oximeter in that study displayed an absolute error of 2 BPM) [10]. It
is also around 2 times less accurate when compared to pulse oximeters in
smartwatches such as the Apple Watch 3 (relative error of 5.86%) and the Fitbit
Charge 2 (5.96%) [9]. However, the method proposed in this article does not
require any professional equipment. It can be implemented using a low resolution
camera (commonly found in smartphones) and a computer. As such, it can be
used at home, requiring no contact between patient and doctor, limiting the risk
of contracting or spreading any potential disease, and eliminating the need to
travel possibly long distances between the patient's place of residence and a
medical facility.

5 Conclusions
The article presents a non-invasive pulse measurement method based on
photographic images of the face. The accuracy of the method was tested on three
resolutions of the input recordings. Relative and absolute error values (means and
deviations) were obtained for each data set. The algorithm performed best on
960 × 540 px videos, with an absolute error of 10.4 BPM and a relative error of
10.6%, as well as the shortest processing time. The relative and absolute error
values were 10.8 BPM and 10.6% for 1920 × 1080 px (slowest processing, with no
significant accuracy improvement compared to 960 × 540 px videos) and
14.8 BPM and 18.4% for 640 × 580 px (least accurate). As a result, it can be
deduced that using videos with an even higher level of detail will not improve
the accuracy of the suggested method. The method can be useful when it is
impossible to use dedicated medical equipment to measure the human pulse.
Further development could involve using video files shot under various lighting
conditions to see how they affect the accuracy of the results. Another area for
improvement would be to expand the research group, with a specific breakdown
of those wearing makeup or having a beard to conclusively exclude their impact
on the results, as well as to include people of various skin types. Examining the
possibility of including the calculation of the respiratory cycle as another function
of the algorithm would be another direction for further research. Finally, more
research on the influence of wearing a face mask would be required in order to
improve the utility of this method during pandemics.

References
1. Allen, J.: Photoplethysmography and its application in clinical physiological mea-
surement. https://doi.org/10.1088/0967-3334/28/3/R01
2. Cheng, Z., et al.: Noninvasive monitoring of blood pressure using optical Ballisto-
cardiography and Photoplethysmograph approaches. In: 35th Annual International
Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
(2013). https://doi.org/10.1109/EMBC.2013.6610029

3. FaceLib library documentation. https://github.com/sajjjadayobi/FaceLib.


Accessed 24 Jan 2022
4. Heusch, G., Anjos, A., Marcel, S.: A reproducible study on remote heart rate
measurement. CoRR abs/1709.00962 (2017)
5. Jones, D.: The Blood Volume Pulse - Biofeedback Basics (2018). https://www.
biofeedback-tech.com/articles/2016/3/24/the-blood-volume-pulse-biofeedback-
basics. Accessed 24 Jan 2022
6. Lamba, P.S., Virmani, P.: Contactless heart rate estimation in humans using low
cost face video. J. Stat. Manage. Syst. 23(7) (2020). https://doi.org/10.1080/
09720510.2020.1799584. Intelligent Decision Making using Best Practices of Big
Data Technologies (Part-II)
7. Liu, X., Zao, Y., Kuang, H., Ma, X.: Face image age estimation based on data
augmentation and lightweight convolutional neural network. Symmetry 12(1), 146
(2020). https://doi.org/10.3390/sym12010146
8. Nair, B.K., Lokhande, S.S.: Patient monitoring system using image processing.
Int. J. Adv. Res. Electr. Electron. Instrum. Eng. (2017). https://doi.org/10.15662/
IJAREEIE.2017.0606127
9. Nelson, B.W., Allen, N.B.: Accuracy of Consumer Wearable Heart Rate Mea-
surement During an Ecologically Valid 24-Hour Period: Intraindividual Validation
Study (2019). https://doi.org/10.2196/10828
10. Omar, C., et al.: Accuracy of Pulse Oximetry Measurement of Heart Rate of New-
born Infants in the Delivery Room (2008). https://doi.org/10.1016/j.jpeds.2008.
01.002
11. rPPG library documentation. https://github.com/nasir6/rPPG. Accessed 24 Jan
2022
12. Yu, Z., Peng, W., Li, X., Hong, X., Zhao, G.: Remote heart rate measurement from
highly compressed facial videos: an end-to-end deep learning solution with video
enhancement. In: IEEE/CVF International Conference on Computer Vision, pp.
151–160 (2019). https://doi.org/10.1109/ICCV.2019.00024
The Validation Concept for Automatic
Electroencephalogram Preprocessing

Julia M. Mitas1(B) and Katarzyna Zawiślak-Fornagiel2


1 Faculty of Biomedical Engineering, Silesian University of Technology,
ul. Roosevelta 40, 41-800 Zabrze, Poland
julimit139@student.polsl.pl
2 University Clinical Center prof. K. Gibiński of the Medical University of Silesia,
ul. Ceglana 35, 40-752 Katowice, Poland
katarzyna zawislak@wp.pl

Abstract. Artifact detection in the electroencephalogram (EEG) analysis is one of the preprocessing phases that improves the final recognition result. At this research step, a preprocessing function able to recognize selected types of artifacts is developed. Detected artifacts are labeled for further processing. As part of the proprietary concept, the effectiveness of the proposed software solution is compared with the expert outcome. This study is part of an international research project, Tele-BRAIN, scientifically oriented at supporting the early diagnosis of Parkinson's disease, with particular emphasis on cognitive disorders.
The obtained results in the form of preprocessed EEG signals are then subjected to analysis based on artificial intelligence methods, presented as a separate study.

Keywords: Medical signal analysis · Preprocessing · EEG analysis · Early diagnosis of neurodegenerative diseases

1 Introduction

Electroencephalography is a diagnostic method commonly used in medicine to
diagnose neurological disorders. It also provides valuable information in the event
of a stroke, increased intracranial pressure, toxic and metabolic disorders, or dis-
turbances of consciousness [9]. In the last dozen or so years, EEG has gained
popularity in the context of the use of biocybernetic feedback therapy (biofeed-
back), in which a trained person observes specific parameters of the functioning
of his body and, on this basis, tries to bring the body into an optimal state of
work. Electroencephalography as a non-invasive diagnostic method is also used
in brain-computer interfaces, allowing paralyzed people to communicate with
the outside world only through brain waves [6].
An electroencephalogram is a record of the electrical activity of the cerebral
cortex. The recorded signal is created by information derived from neurons, which


is a set of postsynaptic potentials, action potentials, and long-term depolariza-


tion of neurons [9]. Postsynaptic potentials, which result from synaptic activity
in neurons, are considered the main component of the brain’s electrical activity
recording due to their long duration and large electric field. Action potentials, due
to the induction of short-term local currents with a strongly limited field, do not
significantly contribute to the electroencephalogram. Long-term depolarization of
neurons can be recorded in EEG in the case of brain damage [7,9].
Common sources of signals that do not come from brain activity are transient
potentials called artifacts. Due to the reasons for their formation, artifacts are
divided into three groups that are not disjoint sets:
1. artifacts originating from external phenomena – events taking place outside
the body of the examined person, such as the electromagnetic activity of the
surrounding environment or measuring equipment,
2. artifacts occurring on the border of the internal and external environment –
events occurring at the contact of the skin and measuring electrodes,
3. artifacts derived from internal phenomena – all phenomena occurring as a
result of the natural bioelectric activity of the organism [7].
The presence of artifacts in the electroencephalogram generally makes it difficult
to analyze the results of the study. If the elimination of fragments containing
artifacts is performed visually by a qualified and authorized specialist, then (for
example) it requires 4 to 7 h to check the overnight EEG recording. The possi-
bility of automating such a costly operation is very interesting and attractive.
However, it should be borne in mind that the use of automatic analysis implies
particular problems, such as correlation perturbations between individual chan-
nels [7].
The validity of preprocessing may be the subject of a posteriori research
when, on the basis of the results of the Tele-BRAIN project, extensive diagnostic
knowledge is gathered, enabling the comparison of diagnostic effectiveness
without preprocessing and with the use of preliminary interference elimination
methods. From a formal logic point of view, removing redundant, non-useful
information shortens the data sequence and, as for any information compression,
generally lowers the likelihood of error masking.
2 Goals and Tasks


The main idea of the study is to analyze the compatibility of subjective expert
knowledge with the results of detection and extraction of artifacts in the EEG
signal in people suffering from Parkinson’s disease. The inclusion criterion is
justified by the similar nature of the EEG signal. In the first place, this task
requires the development and implementation of detection methods for selected
types of artifacts. In particular, it was arbitrarily decided, based on the opinion
of a high-class specialist scientist, to select the following three types of artifacts –
those occurring most frequently and whose elimination brings the most benefit
in terms of improving the quality of the record:

– slow-varying artifacts related to, inter alia, respiratory movements, caused by slow eye movements or originating from the bioelectrical activity of the skin;
– artifacts from the electrical activity of the heart;
– external electrostatic potentials associated with high movement of the exam-
ined person or resulting from serious disturbances caused by objects located
or moving in the vicinity of the examined person or measuring apparatus.

A separate, demanding technical issue is the extraction of artifacts, often
consisting of the isolation of so-called epochs and their elimination from the
recording. Such a method may be questionable in the case of a high content of
artifacts in the biomedical signal, but then it should be treated, with extreme
caution, as an indication for re-examination.
The main task in the context of justifying the indication for the detection and
elimination of artifacts was to compare the results of automatic preprocessing
with the assessments of experts, whose competence in the case of EEG signal
analysis is confirmed by appropriate licenses generally available to neurologists.
The starting point for further artificial intelligence-related activities in this
field is the assumption of the biomedical signal preprocessing requirement.
According to coding theory, the reduction of data that carries no pertinent
information (or whose information is indistinguishably dispersed within
stochastic jamming sequences) does not adversely affect the result of processing
diagnostic information.

3 Methods

The overall approach consists of two phases. In the first one, the artifact candi-
dates are located based on single-channel analysis. Then, signal regions, including
detected artifact candidates, are subjected to spatial filters for multichannel anal-
ysis that effectively extracts discriminatory information for artifact detection.
This section first presents some details on single-channel detection of the
external electrostatic potential, slow-varying potential, and heart activity signal.
Then, the spatial analysis is presented for verifying the initial artifact detection.

3.1 Channel Artifact Detection

Three methods are presented for the external electrostatic potential, slow-
varying potential, and heart activity signal detection. The analysis is performed
on disjoint EEG segments of constant length. Three separate approaches depend-
ing on the artifact shape have been developed.
The external electrostatic potential detection method contains the following
steps:

1. Calculation of maximum and minimum signal values in each block for each
EEG channel.
2. Calculation of the medians (mXmax(j) and mXmin(j)) and standard deviations
(σXmax(j) and σXmin(j)) of these values over all blocks in each channel.

3. Determination of the threshold values as:

gXmin (j) = mXmin (j) + 6σXmin (j), (1)


gXmax (j) = mXmax (j) + 6σXmax (j). (2)

4. Comparison of the values of the parameters Xmax(j, k) and Xmin(j, k) with the
threshold values gXmax(j) and gXmin(j), respectively. If the value of at least
one of the parameters is greater than the corresponding threshold value in
any of the EEG channels, the given block k is denoted as having an external
electrostatic potential artifact.
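A compact sketch of these four steps (our own NumPy rendering; the EEG is assumed to be already segmented into an array of blocks × channels × samples):

  import numpy as np

  def electrostatic_artifact_blocks(eeg):
      # eeg: array of shape (K blocks, J channels, N samples)
      x_max = eeg.max(axis=2)  # step 1: per-block extremes in each channel
      x_min = eeg.min(axis=2)
      # steps 2-3: medians and standard deviations over blocks, Eqs. (1)-(2)
      g_min = np.median(x_min, axis=0) + 6 * x_min.std(axis=0)
      g_max = np.median(x_max, axis=0) + 6 * x_max.std(axis=0)
      # step 4: block k is flagged if any channel exceeds its threshold
      return ((x_max > g_max) | (x_min > g_min)).any(axis=1)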

The slow-varying potential detection method is performed in the frequency
domain. The signal processing includes the following steps [7]:

1. Calculation of the Fourier transform of the EEG signal in each block for each EEG channel.
2. Calculation of the value of the function given by formula (3), defined based on the Fourier transform:

   F(j, k) = Σ_{f=0}^{λ} |ŝjk(f)|² / ( Σ_{f=0}^{fN} |ŝjk(f)|² − Σ_{f=fAC−2 Hz}^{fAC+2 Hz} |ŝjk(f)|² ),   (3)

where: j is the EEG channel number, j ∈ {1, ..., J}; k is the block number, k ∈ {1, ..., K}; λ is the frequency equal to the upper limit of the range in which the increase in power is tested, falling within [0, 1] Hz (default 0.625 Hz); fN is the Nyquist frequency, equal to half the EEG sampling frequency; fAC is the electricity grid frequency, 50 Hz in Europe.
3. Calculation of the threshold value for F(j, k). To calculate the threshold value, the median mF(j) of the distribution of F(j, k) over all K blocks is determined in each channel j. The threshold value in channel j for F(j, k) in individual blocks is calculated as:

   gF(j) = 0.75 + 0.25 mF(j).   (4)

4. Comparison of the value of F(j, k) with the threshold value gF(j). If the parameter value is greater than the corresponding threshold value in any EEG channel, the given block k is denoted as containing a slow-varying potential artifact.
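A minimal sketch of Eqs. (3)-(4), assuming each block is a 1-D NumPy array sampled at fs hertz (function and variable names are hypothetical, not from the paper):

    import numpy as np

    def slow_varying_index(block, fs, lam=0.625, f_ac=50.0):
        """F(j, k) of Eq. (3) for a single block of a single channel."""
        spec = np.abs(np.fft.rfft(block)) ** 2                 # |s_jk(f)|^2
        freqs = np.fft.rfftfreq(block.size, d=1.0 / fs)
        low = spec[freqs <= lam].sum()                         # power up to lambda
        mains = spec[(freqs >= f_ac - 2) & (freqs <= f_ac + 2)].sum()
        return low / (spec.sum() - mains)

    def detect_slow_varying(F):
        """F: (channels, blocks) array of indices; thresholded per Eq. (4)."""
        gF = 0.75 + 0.25 * np.median(F, axis=1)
        return (F > gF[:, None]).any(axis=0)                   # one boolean per block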

The method of detecting potentials originating from the electrical heart activity consists of several stages:

1. Calculation of the Pearson correlation coefficient of the EEG signal with the ECG signal in each block for each EEG channel:

   rXY = cov(X, Y) / (σX σY),   (5)

where: cov(X, Y) is the covariance, defined as cov(X, Y) = E((X − E(X))(Y − E(Y))); σX and σY are the standard deviations of X and Y, respectively; E(X) and E(Y) are the expected values of X and Y, respectively. The correlation of the EEG and ECG signals in each block k and each EEG channel j is given as:

   rkjECG = cov(kj, ECG) / (σkj σECG),   (6)

where: j is the EEG channel number, j ∈ {1, ..., J}; k is the block number, k ∈ {1, ..., K}; ECG is the ECG channel.
2. Finding the maximum value of the correlation coefficient among the coefficients calculated for a given block k in all J channels:

   RECG(k) = max_{j∈J} rkjECG.   (7)

3. Comparison of RECG(k) with the default threshold gRECG. The gRECG threshold breakpoints are in the range (0.8, 1). If the value of RECG(k) is greater than the threshold value, the given block k is marked as containing an ECG-related artifact.
The threshold value was chosen experimentally. The analysis was carried out repeatedly with various threshold values in the range (0.80, 0.99); the value of 0.9 gave the best results. This is verified at a further stage of the signal analysis.
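The per-block correlation test of Eqs. (5)-(7) might be sketched as below, assuming pre-segmented arrays and the experimentally chosen threshold of 0.9 (names are hypothetical):

    import numpy as np

    def detect_ecg_artifacts(eeg_blocks, ecg_blocks, g_recg=0.9):
        """eeg_blocks: (channels, blocks, block_len); ecg_blocks: (blocks, block_len)."""
        n_ch, n_blk, _ = eeg_blocks.shape
        flagged = np.zeros(n_blk, dtype=bool)
        for k in range(n_blk):
            # Pearson correlation of each EEG channel with the ECG channel, Eq. (6)
            r = [np.corrcoef(eeg_blocks[j, k], ecg_blocks[k])[0, 1] for j in range(n_ch)]
            flagged[k] = max(r) > g_recg      # R_ECG(k) of Eq. (7) vs. the threshold
        return flagged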

3.2 Spatial Filters for Multichannel Analysis

Potentials collected from the surface of the head with attached electrodes feature poor spatial resolution. This affects any pattern recognition performance, including EEG feature selection and classification as well as artifact selection. In our study, the unwanted structures are to be detected. In addition to the statistics implemented in the detection of external electrostatic potentials or ECG-related distortions, a spatial filtering technique has been employed. It is based on combining signals from adjacent electrodes (summation or subtraction with corresponding weights). This leads to a new signal that contains more useful information; such “creation” of a new signal is referred to as spatial filtering. Two methods operating in this field have been tested: the Laplace filter (LF) and the Local Average Technique (LAT) [8].
Assuming a symmetrical arrangement of the electrodes, the Laplacian can be determined by subtracting the weighted average of the adjacent electrode potentials according to:

   VLAP = V − (1/n) Σ_{i=1}^{n} Vi,   (8)

where: VLAP is the Laplacian value, V is the potential of the considered electrode, n is the number of adjacent electrodes, and Vi is the potential of the i-th adjacent electrode.

Laplace spatial filters are used to remove the mean value of the potential; every electrode features this potential, yet it does not provide useful information.
The Local Average Technique increases the overall value if the adjacent components feature a not significantly higher, yet spatially exciting, value. It is determined by summing the weighted average of the adjacent electrode potentials:

   VLAT = V + (1/n) Σ_{i=1}^{n} Vi,   (9)

where: VLAT is the LAT value, V is the potential of the considered electrode, n is the number of adjacent electrodes, and Vi is the potential of the i-th adjacent electrode.
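Both filters of Eqs. (8)-(9) reduce to a neighbourhood average with uniform weights; a minimal sketch follows, where the neighbors montage mapping is an assumed input not specified in the paper:

    import numpy as np

    def laplace_filter(v, neighbors):
        """V_LAP of Eq. (8); v: (channels, samples), neighbors: {channel: [adjacent...]}."""
        out = v.copy()                              # channels without neighbours unchanged
        for ch, adj in neighbors.items():
            out[ch] = v[ch] - v[adj].mean(axis=0)   # subtract the neighbourhood mean
        return out

    def local_average(v, neighbors):
        """V_LAT of Eq. (9): add the neighbourhood mean instead of subtracting it."""
        out = v.copy()
        for ch, adj in neighbors.items():
            out[ch] = v[ch] + v[adj].mean(axis=0)
        return out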
In order to test the Laplace filter and the Local Average Technique, a two-stage algorithm was implemented, consisting of the following steps:
1. performing artifact detection in the entire EEG signal using the channel artifact detection methods,
2. applying the spatial filtering technique to the artifact regions extracted in the first step.

4 System Architecture
The program was implemented in Python, version 3.8.5, using the PyQt5 library, version 5.15.1, for the graphical interface [2,3]. For the preprocessing of the EEG signal, the methods available in the NumPy (version 1.19.2) and SciPy (version 1.5.2) libraries, as well as the standard math and statistics modules, were used [1,4,5].
The user can manually select a single detection method or the option to automatically execute all methods sequentially. After an EEG signal is selected by the user, the indicated file is checked in terms of its storage format and the correctness of its internal structure. After pressing the Upload file button, the content of the input file is read and loaded into the program. The detected artifacts are displayed to the user (Fig. 1).

Fig. 1. Graphical user interface

5 Results
The detection evaluation was based on six EEG studies. Selected signal segments containing artifacts were labelled by three independent experts as disrupted by external electrostatic potentials, slow-varying potentials, or potentials originating from the electrical activity of the heart. These labels serve as the ground truth in the evaluation process. Each EEG channel was divided into 4-second segments and subjected to the analysis. The comparison was based on four parameters: sensitivity,

   TPR = TP / (TP + FN),   (10)

specificity,

   SPC = TN / (FP + TN),   (11)

precision,

   prec = TP / (TP + FP),   (12)

and accuracy,

   ACC = (TP + TN) / (P + N).   (13)
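For reference, the four measures of Eqs. (10)-(13) computed from scalar confusion-matrix counts (a trivial helper, not part of the described system):

    def metrics(tp, fp, fn, tn):
        """Sensitivity, specificity, precision and accuracy, Eqs. (10)-(13)."""
        return {
            "TPR": tp / (tp + fn),
            "SPC": tn / (fp + tn),
            "prec": tp / (tp + fp),
            "ACC": (tp + tn) / (tp + fp + fn + tn),   # P + N = all segments
        }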
Both detection steps have been evaluated. First, the channel artifact detection was performed; then, spatial filtering for the multichannel analysis was applied for a more general approach including the spatial information. The results are shown in Table 1. The first column presents the channel detection phase; the second and third columns indicate the correction obtained after the spatial filtering procedures, LF and LAT respectively, were employed. The overall effectiveness of the developed approach is shown in Table 2.

Table 1. Evaluation of the effectiveness of the spatial filters

Detection method     Laplace filter       LAT
TP = 43   FP = 17    TP = 47   FP = 14    TP = 53   FP = 8
FN = 0    TN = 30    FN = 0    TN = 30    FN = 0    TN = 30

Table 2. Evaluation of the effectiveness of the implemented methods

                                    Sensitivity  Specificity  Precision  Accuracy
External electrostatic potentials   0.769        0.998        0.909      0.993
Slow-varying potentials             0.862        0.970        0.908      0.942
ECG-related artifacts               0.719        0.990        0.888      0.964

6 Discussion and Conclusions


Signal preprocessing in electroencephalographic diagnosis faces an important issue: the detection of artifacts is essential for further analysis. Their extraction has to be precise in order to identify their type and location, and further signal correction or artifact removal must not influence the diagnosis. In this study, a two-phase detection method has been developed. It identifies three types of distortions: external electrostatic potentials, slow-varying potentials, and heart activity signals. The first phase locates the candidates; the second eliminates misclassifications, particularly the false positives. This removes as small an amount of signal data as possible and reduces unnecessary interventions in regions that lack any distortions.
The misclassification elimination algorithm was evaluated using 6 EEG files, on which the external electrostatic potentials were first detected separately in each channel. Then, the blocks with artifacts were tested again using the implemented mechanisms. The testing results are shown in Table 1. Of the 61 blocks marked as disturbed, the Laplace filter rejected 14 (23% of the results), which shows that more than 1/5 of the blocks had been marked as false positives. The LAT algorithm, in turn, rejected 8 blocks (13% of the results), from which it can be concluded that over 1/8 of the results were false positives.
The effectiveness of the approach was tested on clinical EEG studies consisting of over 3000 four-second signal segments, achieving satisfactory results. In order to bring the problem closer to clinical reality, the concept of validation was adopted with the participation of experts with many years of experience, who use developed heuristics in clinical practice and procedural neurological knowledge.
This opens up a possible implementation of methods allowing for the early detection of symptoms, supported by IT approaches, considering that in-depth clinical diagnostics in this area is time-consuming and requires medical experts and neuropsychologists. The algorithm, being part of a more general research grant, is implemented in a computer-assisted system to support EEG analysis in a clinical environment.

7 Summary
The data presented in the tables show the real values of the experiment. For the obtained results, the minimum precision value is 0.89 and the minimum accuracy is 0.94; in the other cases, these measures reach slightly higher values. A thorough a posteriori analysis (after the tests with the participation of specialists) shows the main advantages of the automatic artifact elimination method:

1. it reduces the smallest possible amount of information,
2. it prepares the signal for undisturbed analysis using artificial intelligence methods.

Early diagnosis of dementia-related changes (trends) is essential to prolong the patient's ability to function independently. The risk of masking this tendency (markers) in the EEG signal, essentially invisible to a specialist, is considerable and currently difficult to quantify. This is an argument in favor of preparing the results of biomedical measurements so that they contain as much useful information as possible, but without unnecessary data that merely obscures the spectrum of the results.
The essence of the project was the preparation, implementation, and validation of a study that is effective in situations commonly considered atypical (difficult to recognize), when the experience of a specialist doctor and the developed heuristics are effective only at a random level. In such cases the software is helpful, indicating with an acceptable probability the method of further preprocessing, which consists essentially of the sensitive extraction of, importantly, a minimal part of the signal. The practical usability of the software is currently being tested and will be the subject of further research, especially in terms of aliasing and the acceptable degree of information compression.

Acknowledgement. The study was supported by the Polish-German grant in the field of DIGITIZATION of ECONOMY: 'Artificially Intelligent EEG Analysis in the Cloud – TeleBrain' (grant no. WPN3/9/TeleBrain/2018).

References
1. NumPy documentation. https://numpy.org/. Accessed 13 Dec 2020
2. Pyqt5 Reference Guide. https://www.riverbankcomputing.com/static/Docs/
PyQt5/. Accessed 13 Dec 2020
3. Python languge documentation. https://www.python.org/. Accessed 13 Dec 2020
4. The Python Standard Library documentation. https://docs.python.org/3/library/
index.html. Accessed 13 Dec 2020
5. SciPy documentation. https://www.scipy.org/. Accessed 13 Dec 2020
6. Borkowski, P.: Atlas EEG i QEEG: podręcznik ilościowej elektroencefalografii i jej zastosowanie w planowaniu neurofeedbacku. Wydawnictwo Biomed Neurotechnologie (2017). https://books.google.pl/books?id=XRyQtAEACAAJ
7. Klekowicz, H.: Opis i identyfikacja struktur przejściowych w sygnale EEG. Ph.D. thesis, Instytut Fizyki Doświadczalnej, Wydział Fizyki, Uniwersytet Warszawski (2008)
8. Kolodziej, M.: Przetwarzanie, analiza i klasyfikacja sygnału EEG na użytek interfejsu mózg-komputer. Ph.D. thesis, Wydział Elektryczny, Politechnika Warszawska (2011)
9. Rowan, A.J., Tolunsky, E., Sobieszek, A.: Podstawy EEG z miniatlasem. Wydanie I polskie. Sobieszek, A. (red.), Urban & Partner, Wrocław (2004)
Do Contractions of Abdominal Muscles
Bias Parameters Describing Contractile
Activities of a Uterus? A Preliminary
Study

Dariusz Radomski(B)

Institute of Radioelectronics and Multimedia Technology,


Warsaw University of Technology, Nowowiejska 15/19, 00-665 Warsaw, Poland
D.Radomski@ire.pw.edu.pl

Abstract. Monitoring of uterine contractions is a routine procedure in obstetrical wards. The new method, called electrohysterography, has many advantages compared to external tocography. However, there is no clinical standard for measuring and parameterizing the bioelectrical activity of a pregnant uterus. In particular, factors affecting the electrohysterographical (EHG) signals have not been studied. This paper shows that contractions of abdominal muscles can change the commonly used parameters of electrohysterographical signals in the same way as uterine contractions. Therefore, we postulate parameterizing an envelope of the EHG signals instead of the raw EHG. Moreover, this paper initiates a terminological discussion, asking whether EHG really measures bioelectrical or biomechanical activities of the myometrium.

Keywords: Labour · Electrohysterography · Electromyography ·


Mean frequency · Sample entropy · Abdominal muscles

1 Introduction
Reproduction is an essential goal of life for many people. However, the biological and psychological mechanisms of reproduction, including labour, are still little known because of the historical and cultural constraints which prevail in many countries. On the other hand, the lack of a model describing a given process makes it impossible to control it effectively. Thus, midwives' and obstetricians' limited knowledge of labour biomechanics makes it difficult for them to manage the labour and delivery process. This results in an excessive number of Caesarean sections and an increasing number of neonatal complications observed in developed countries [10].
The most important clinical task in labour management is to control the
force which causes delivery of a healthy baby. This force is generated by period-
ically variable intrauterine pressure which moves the foetus down to the vagina.
There are two sources of that pressure: (i) uterine contractions and (ii) abdom-
inal pressure created by parturient’s consciously contracting abdominal skeletal
muscles and simultaneously relaxing pelvic floor muscles.

On the basis of a biomechanical model, Miller et al. indicated that the mother's pushing increases the expulsive force twice as much as the force generated only by uterine contractions [1]. Despite this fact, labour progress monitoring is limited to routine measurement of the uterine component only.
The most accurate method of assessing uterine contractions is the measurement of intrauterine pressure (IUP) by a balloon sensor placed inside the uterus. This approach is treated as the “gold standard”. However, it cannot be used clinically because it is invasive and increases the risk of an intrauterine infection.
The standard method used involves external tocography, which in fact measures the increasing stiffness of the uterine wall during its contraction. This approach has a number of drawbacks: comparing the intrauterine pressure course with the tocographic signal shows weak correlation in relation to the moment of a contraction, its duration, and its amplitude. These discrepancies are greater with the growing BMI of parturient women [3].
Electrohysterography (EHG), as an alternative to external tocography, measures the bioelectrical activity of the myometrium preceding the mechanical contraction of a uterus. There are a large number of methods used for the estimation of mechanical uterine activity based on EHG signals; an excellent review of those methods is provided by Garcia-Casado et al. [2]. Some of them use linear or nonlinear EHG parametrizations that highly correlate with labour progress. Others construct IUP estimators based on bioelectrical signals.
The multiplicity of these approaches stems from the fact that there is no coherent model explaining the physiology of a uterine contraction at the cell and organ level. It seems likely that a uterus has several pacemakers rather than one (as in the case of a heart); they are randomly distributed and work as coupled oscillators [13].
However, there is agreement among researchers that the propagation of bioelectrical signals through the uterus is very slow. Therefore, the frequency passband of interest for EHG signals is limited to a maximum of 3.00 Hz, with the mean frequency usually below 1 Hz.
Yet, the EHG can be seen as a particular case of electromyographic (EMG) signals. There is a widely accepted assumption stating that the low-frequency limit of EMG signals is 60–100 Hz. Traditionally, frequencies below 40 Hz are treated as mechanical artifacts, so researchers assume that the frequencies below 3 Hz observed in EHG signals come from a uterus.
However, the review of mechanographical responses during muscle activity as well as recent analyses of the low frequencies of EMG signals show that a contracted muscle produces mechanical vibration measured also in EMG [5,7].
Recently published results suggest that abdominal muscle contractions support uterine contractions in the 2nd stage of labour [8]. Moreover, Oladosu et al. showed that menstrual cramps of a uterus were also associated with spontaneous rectus abdominis contractions [6].
Therefore, the aim of the study was to verify whether isometric contractions of the rectus abdominis could affect some EMG parameters which are traditionally interpreted as uterine activities. We concentrated on the RMS values, the mean frequencies of the “uterine passband”, and the sample entropy values. These are the most common EHG parameters used for labour prediction. They can be treated as a basis of the EHG parameter space; the other published parameters are highly correlated with those selected for the presented study.

2 Methods

To identify a potential impact of abdominal muscle contractions on an EMG signal limited to the passband usually assigned to uterine contractions, the following studies were performed.

2.1 Study Groups


To avoid mutual confounding of uterine and muscle contractions, two independent study groups were used. The group designated for studying muscle contractions contained 14 healthy, non-pregnant women in the follicular phase of their menstrual cycles. Such a group was selected in order to limit other sources of the analysed signal. All women were of reproductive age (26 ± 5 years).
The second group, employed for studying uterine contractions, consisted of 15 pregnant women observed at two time points (a trivial longitudinal study). All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Bioethics Committee of the Medical University of Warsaw.

2.2 Acquisition of Rectus Abdominis EMG Signals

The study protocol was performed as follows. The bipolar electrodes were placed at the midpoint between the navel and the pubic symphysis. The reference electrode was placed on the right hip. The EMG signals were registered using the commercial EMG device named Summit Cadwell Electromyography System; the lowest sampling frequency was 100 kHz. First, the examined woman relaxed for 5 min. Next, the EMG signal was registered for 10 s in the rest state. Afterwards, the woman performed isometric contractions of the rectus abdominis for 15 s, during which the EMG signal was registered as well.

2.3 Acquisition of EHG Signals of Uterine Contractions

Using traditional nomenclature, the bioelectrical signal registered for monitoring of uterine contractions will be called electrohysterography. The first measurements were done in the 2nd trimester of pregnancy. The second ones were performed during the 2nd stage of labour.

2.4 EMG Signal Parametrization

The registered EMG signals were down-sampled to 200 Hz and filtered by a 7th-order low-pass Butterworth filter with cut-off frequencies corresponding to the low passband (0.32–3 Hz) and the high passband (20–60 Hz). To investigate the influence of abdominal muscle contractions on the low-frequency EMG signal (limited to the “uterine passband”), the following parameters were calculated. In the time domain it was the root mean square, computed as:

   rms = √( (1/N) Σ_{i=1}^{N} xi² ),   (1)

where xi is the i-th sample of the rectified EMG signal.
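A minimal preprocessing sketch under the stated settings, assuming the band-limiting is realized as a band-pass filter and that the signal has already been down-sampled to 200 Hz (function and parameter names are hypothetical):

    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    def band_rms(x, fs=200.0, band=(0.32, 3.0), order=7):
        """RMS of Eq. (1) on a band-limited signal ("uterine passband" by default)."""
        sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
        xf = sosfiltfilt(sos, x)                  # zero-phase Butterworth filtering
        return np.sqrt(np.mean(np.abs(xf) ** 2))  # RMS of the rectified signal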


Moreover, an instantaneous frequency was computed for the both passbands
to present a relation between slow and fast components of EMG signals. A con-
ception of instantaneous frequency derives from the theory of signal modulation.
Electromyographic signals are also modulated because the number of activated
myocytes depends on a muscle load. The instantaneous frequency is defined as
follows [11]:

1 dϕ(t)
f (t) = (2)
2π dt
where ϕ(t) is the instantaneous phase expressed by

   ϕ(t) = arctan( H(x(t)) / x(t) ).   (3)

The operator H(x(t)) is the Hilbert transform of the bioelectrical signal. It is given by the following formula:

   X(s) = H(x(t)) = (1/π) ∫_{−∞}^{+∞} x(t) / (s − t) dt.   (4)

Computing the instantaneous frequency directly from the definition is impractical because it produces negative frequencies. Therefore, the instantaneous frequency was estimated using the first moment of the time-frequency power spectrum, i.e.

   f(t) = ∫₀^{FS/2} f P(f, t) df / ∫₀^{FS/2} P(f, t) df,   (5)

There are many methods for estimating the time-frequency power spectrum P(f, t). The most accurate is the Wigner-Ville distribution. However, this method is memory- and time-consuming, so its use in clinical monitoring is limited; it also gives a biased instantaneous frequency in the case of noisy signals.

Moreover, we do not need a high time-frequency resolution, because we only want to observe changes of the instantaneous frequencies caused by the uterus or the abdominal muscles. Therefore, the power spectrum was estimated by the Short Time Fourier Transform:

   P(f, t) = | ∫_{−∞}^{+∞} x(τ) h(τ − t) e^{−j2πfτ} dτ |².   (6)

The Hann sliding window h(t) was applied. The estimator (5) can be treated as a time-dependent mean frequency because the following equation holds:

   f̄ = (1/T) ∫₀^{T} f(t) dt = ∫₀^{FS/2} f P(f) df / ∫₀^{FS/2} P(f) df,   (7)

where P(f) is a power spectrum. The right side of Eq. (7) was used for computing the mean frequency of the slow EMG waves and the EHG signals. The lowest frequency was fd = 0.32 Hz and the highest frequency fu = 3 Hz.
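A sketch of the estimator (5), computed with a Hann-windowed STFT as in Eq. (6); the window length is an assumption chosen to resolve the 0.32–3 Hz band, and the function name is hypothetical:

    import numpy as np
    from scipy.signal import spectrogram

    def mean_frequency_track(x, fs, band=(0.32, 3.0), window_s=10.0):
        """Time-dependent mean frequency, Eq. (5), restricted to the given band."""
        f, t, P = spectrogram(x, fs=fs, window="hann", nperseg=int(window_s * fs))
        sel = (f >= band[0]) & (f <= band[1])      # keep only the analysed passband
        fb, Pb = f[sel], P[sel]
        # first spectral moment of P(f, t) over frequency, per time frame
        return t, (fb[:, None] * Pb).sum(axis=0) / Pb.sum(axis=0)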
The last studied parameter was the sample entropy, which correctly describes the nonlinear character of the electrohysterographical signal. The algorithm for the sample entropy computation can be found in [9]. The Wilcoxon signed-rank test was applied to identify an impact of abdominal muscle contractions on the above parameters.

3 Results

The power spectrum, mean frequency, root mean square and sample entropy
values were calculated for the rectified, filtered EMG signals registered in 14
unpregnant women before and during the rectus abdominis isometric contrac-
tion. The Fig. 1 shows the low frequency (up to 3Hz) and the EMG signal reg-
istered in a random woman during contraction of the rectus abdominis. We can
note that contraction of this muscle produces also the low frequency waves which
traditionally are treated as the artefact stemmed from muscle movements. The
performed analyses also indicate that the instantaneous frequencies computed for
slow and fast EMG waves are mutually dependent. The short isometric contrac-
tion of the abdominal muscle is associated with increase of the high frequency.
This frequency decreases when muscles fatigue appears (Fig. 2). Simultaneously,
the strong muscle contraction decreases the slow instantaneous frequency. Next,
it increases when the muscle becomes tired. Interestingly, a cough also reduces
the instantaneous frequency in the low passband wave of the EMG signal.
Figure 3(a) presents the person-averaged power spectrum of the filtered EMG signals. We observe that during contraction of the rectus abdominis the low-frequency components of the EMG signals dominate. Table 1 shows a comparison of the analyzed parameters during the relaxation phase and the isometric muscle contractions.

Fig. 1. The frequency components of an EMG signal registered in a random non-pregnant woman during contraction of the rectus abdominis. Additionally, the impact of a cough on both components is presented. a) the EMG signal filtered to the standard passband, b) the EMG signal filtered to the passband typically used for EHG signals

Fig. 2. The instantaneous frequencies of a rectified EMG signal registered in a random non-pregnant woman during contraction of the rectus abdominis. a) the EMG signal filtered to the standard passband, b) the instantaneous frequency for the low passband corresponding to the typical EHG signal, c) the instantaneous frequency of the high passband usually analyzed for assessment of muscle activity

Table 1. Comparison of the studied parameters of the EMG signals in the low-frequency band. * denotes statistically significant differences (p < 0.05)

Parameter        Relax value        Contraction value
Mean frequency   0.82 ± 0.24 Hz     0.67 ± 0.15 Hz *
RMS              2.41E3 ± 559.82    3.02E3 ± 1.35E3 *
Sample entropy   1.41 ± 0.34        1.59 ± 0.43 *

Fig. 3. The person-averaged spectrum of (a) the EMG signal registered during relaxation and rectus abdominis contraction and (b) the EHG signal observed during pregnancies and labours

The performed Wilcoxon test showed statistically significant differences in all studied parameters between relaxations and contractions of the abdominal muscles (Table 1).
Figure 4 presents a comparison of a slow wave of an EMG signal registered in a random non-pregnant woman with the EHG signal registered in a random woman in the 2nd stage of labour. Both signals have the same passband, i.e. 0.32–3 Hz.

Fig. 4. Comparison of the EMG of the rectus abdominis and the EHG of a random labouring woman. a) slow waves of the EMG signal filtered to 0.32–3 Hz registered in a random non-pregnant woman, b) the EHG signal filtered to 0.32–3 Hz registered in a random woman during the 2nd stage of labour

We note that the time courses of these signals are similar. Naturally, the burst of the EHG signal is longer because in our study the duration of a muscle contraction was ca. 15–20 s, while a uterine contraction may last 60 s. However, these courses suggest that EMG signals can interfere with EHG signals. The person-averaged spectra obtained for the pregnant women and the women in active labour are shown in Fig. 3(b).

Table 2. Comparison of the studied parameters of the EHG signals in the low-frequency band. * denotes statistically significant differences (p < 0.05)

Parameter        Pregnancy value    Labour value
Mean frequency   1.06 ± 0.24 Hz     0.87 ± 0.16 Hz *
RMS              994.96 ± 279.56    1.22E3 ± 327 *
Sample entropy   1.81 ± 0.60        1.35 ± 0.50 *

The changes of the analyzed EHG parameters between the pregnancies (without uterine contractions) and the 2nd stages of labours are similar to those obtained in the previous experiment. Again, the differences in all studied parameters were statistically significant (Table 2).

4 Discussion
Although different parameters of electrohysterographical signals have been published, their robustness has never been evaluated. The obtained results show that the most commonly proposed parameters can be biased by contractions of the female abdominal muscles. The shapes of the spectra computed for the low frequency of EMG signals and for EHG signals were similar. The spectra obtained for pregnant or labouring women are more smoothed because these women had a greater BMI than the participants enrolled in the muscle study.
Admittedly, a study performed ex vivo proves that a myometrium has its own bioelectrical activity, but its frequency characteristic was not investigated [4]. It seems that the “uterine like” changes of the low-frequency components of EMG signals shown in this research can stem from the mechanical tremor of the contracted rectus abdominis. Our results agree with the study performed by Qian et al., who showed that contractions of the abdominal muscles can even affect a tocographic signal [8].
Therefore, on the basis of the obtained results, we suggest that the assessment of uterine activities first requires computation of the envelope of the EHG signals. This procedure averages out eventual muscle contractions because they are shorter than the uterine contractions of labour.
Confirmation of the presented results requires measuring the bioelectrical signals during a labour; then we could simultaneously monitor uterine contractions as well as abdominal muscle activities.

5 Limitations of Study
To the best of my knowledge, this is the first study indicating that contracted abdominal muscles may give a false positive detection of uterine contractions based on EHG signals. However, the major limitation of this study is the measurement of EMG and EHG signals in two separate groups of women. This was necessary in the preliminary study to exclude unaware uterine contractions during voluntary tensions of the abdominal muscles. These results should therefore be confirmed in prospective studies of pregnant women who consciously tense their abdominal muscles with and without uterine contractions.

6 Conclusions

In our opinion, the presented results have a dual significant impact. On the one hand, they show for the first time that the commonly used parameters of raw EHG signals can be biased by contractions of the abdominal muscles, mainly by the rectus abdominis. Thus, we must develop methods allowing for the differentiation of the sources of bioelectrical signals measured on the abdominal skin of a pregnant or labouring woman. On the other hand, it is a reason to recommend wide-band registration of the so-called EHG signals to monitor whether the conscious pushes performed by a woman are synchronous with uterine contractions.
Lastly, the presented considerations initiate a terminological discussion asking whether EHG really monitors the bioelectrical activity of the uterus, or whether this method should perhaps be called mechanohysterography. Perhaps the mechanical tremor stemming from a contracting myometrium changes the local electrical impedance, and these changes are received by the electrodes placed on the abdominal skin. This interpretation seems to be confirmed by an initial proposition of the application of accelerometers for this purpose [12].

Acknowledgement. I would like to extend my thanks to Elmico LTD, represented by Marcin Miatacz, for lending the EMG equipment, to Krzysztof Kruszewski for registering the bioelectrical signals of the abdominal muscles, and, last but not least, to my sister Magda Platek for her warm motivation to write this paper.

References
1. Ashton-Miller, J.A., DeLancey, J.O.: On the biomechanics of vaginal birth and
common sequelae. Ann. Rev. Biomed. Eng. 11(1), 163–176 (2009). https://doi.
org/10.1146/annurev-bioeng-061008-124823. PMID: 19591614
2. Garcia-Casado, J., Ye-Lin, Y., Prats-Boluda, G., Mas-Cabo, J., Alberola-Rubio, J.,
Perales, A.: Electrohysterography in the diagnosis of preterm birth: a review. Phys-
iol. Measur. 39(2), 02TR01 (2018). https://doi.org/10.1088/1361-6579/aaad56
3. Hayes-Gill, B., et al.: Accuracy and reliability of uterine contraction identification using abdominal surface electrodes. Clin. Med. Insights Women's Health 5, CMWH.S10444 (2012). https://doi.org/10.4137/CMWH.S10444

4. Kuijsters, N.P., et al.: Propagation of spontaneous electrical activity in the ex


vivo human uterus. Pflügers Archiv - Euro. J. Physiol. 472(8), 1065–1078 (2020).
https://doi.org/10.1007/s00424-020-02426-w
5. Moon, H., et al.: Force control is related to low-frequency oscillations in force
and surface EMG. PLOS ONE 9(11), 1–9 (2014). https://doi.org/10.1371/journal.
pone.0109202
6. Oladosu, F.A., et al.: Abdominal skeletal muscle activity precedes sponta-
neous menstrual cramping pain in primary dysmenorrhea. Am. J. Obstet.
Gynecol. 219(1), 91.e1–91.e7 (2018). https://doi.org/10.1016/j.ajog.2018.04.050.
URL https://www.sciencedirect.com/science/article/pii/S0002937818303818
7. Potvin, J.R.: Effects of muscle kinematics on surface EMG amplitude and frequency
during fatiguing dynamic contractions. J. Appl. Physiol. 82(1), 144–151 (1997).
https://doi.org/10.1152/jappl.1997.82.1.144. PMID: 9029209
8. Qian, X., Li, P., Shi, S.Q., Garfield, R.E., Liu, H.: Simultaneous recording and
analysis of uterine and abdominal muscle electromyographic activity in nulliparous
women during labor. Reprod. Sci. 24(3), 471–477 (2017). https://doi.org/10.1177/
1933719116658704. PMID: 27436367
9. Radomski, D.: Sensitivity analysis of a sample entropy estimator on its parameters
in application to electrohysterographical signals. Biocybern. Biomed. Eng. 30(2),
67–72 (2010)
10. Sandall, J., et al.: Short-term and long-term effects of caesarean section on the
health of women and children. Lancet 392(10155), 1349–1357 (2018). https://
doi.org/10.1016/S0140-6736(18)31930-5, https://www.sciencedirect.com/science/
article/pii/S0140673618319305
11. Singh, P.: Breaking the limits: redefining the instantaneous frequency. Circ. Syst.
Sign. Process. 37(8), 3515–3536 (2018). https://doi.org/10.1007/s00034-017-0719-
y
12. Urdal, J., Engan, K., Eftestol, T., Yarrot, L., Kidanto, H., Ersdal, H.: Noise and
contraction detection using fetal heart rate and accelerometer signals. In: Proceed-
ings of the 17th Scandinavian Conference on Health Informatics, pp. 121–126. Oslo,
Norway (2019)
13. Young, R.C.: The uterine pacemaker of labor. Best Pract. Res. Clin. Obstet.
Gynaecol. 52, 68–87 (2018). https://doi.org/10.1016/j.bpobgyn.2018.04.002. URL
https://www.sciencedirect.com/science/article/pii/S1521693418300786. Biological
Basis and Prevention of Preterm Birth Treatment
Simulation and Modelling
Finding the Time-Dependent Virus
Transmission Intensity via Gradient
Method and Adjoint Sensitivity Analysis

Krzysztof Łakomiec(B), Agata Wilk, Krzysztof Psiuk-Maksymowicz, and Krzysztof Fujarewicz

Department of Systems Biology and Engineering, Faculty of Automatic Control,


Electronics and Computer Science, Silesian University of Technology,
ul. Akademicka 16, 44-100 Gliwice, Poland
{krzysztof.lakomiec,agata.wilk,krzysztof.psiuk-maksymowicz,
krzysztof.fujarewicz}@polsl.pl

Abstract. In this work we propose a method for numerically finding a function representing the time-dependent virus transmission intensity coefficient in an exemplary SEIR model of infectious disease. Our method is based on gradient minimization of a predefined functional and uses a gradient obtained from adjoint sensitivity analysis. To apply this method to the exemplary SEIR model, we used publicly available infection data concerning the COVID-19 cumulative cases in Poland.

Keywords: Gradient optimization · Adjoint sensitivity analysis ·


SEIR epidemic model · COVID-19 pandemic

1 Introduction
The ongoing COVID-19 pandemic has caused a surge in the interest in the modelling of infectious diseases, mostly because mathematical modelling can be used, among other things, to predict the future number of infections in a particular population. This knowledge can be used for control strategy planning to mitigate the effects of these future infections.
There are numerous epidemic models in the literature which can be used to
describe the COVID-19 pandemic, an important group of which are compartmen-
tal models. They are based on differential equations describing the flow between
compartments representing different states of the individuals in the affected pop-
ulation. The models vary in number and type of compartments, ranging from the
classic Susceptible-Infected-Removed (SIR) model, to more sophisticated models
distinguishing for example asymptomatic cases, quarantined, isolated or hospi-
talized individuals, vaccinations or different levels of susceptibility to infection
[1,10,15,17].
To use these models for simulation of the COVID-19 pandemic we need to
estimate the parameters appearing in them. These parameters can be catego-
rized into two groups: stationary parameters representing constants, and non-stationary parameters, which can represent time-dependent functions. For the estimation of these

parameters, one can use publicly available data on the COVID-19 cases in a particular country. However, due to the very high noisiness of these data, the estimation process can be complicated. Additionally, the estimation of non-stationary parameters is particularly challenging due to the high computational cost.
Various solutions to the problem of parameter estimation in compartmental epidemic models have been employed. Some noteworthy approaches to parameter estimation include the ensemble Kalman filter [3,9], Particle Swarm Optimization [11], and trust-region methods [16]. We present a method of estimating the non-stationary parameters representing time functions in which we use the gradient approach, where the gradient is obtained from adjoint sensitivity analysis. The computational cost of the method presented in this paper depends mainly on the length of the time horizon used for the simulation. Moreover, only one simulation of the adjoint system is required in order to obtain the whole gradient.

2 Problem Formulation

Let us consider a mathematical model given in the following state-space representation:

   ẋ(t) = f(x(t), u(t)),
   y(t) = g(x(t)),   (1)

where: x(t) is the state of the system at time t, y(t) is the output of the system, and u(t) represents the input signal affecting the system at time t; f(·) and g(·) are some nonlinear functions.
The objective function which will be minimized is given by the following equation:

   Je = h(y(t)),   (2)

where h(·) is some nonlinear functional.
Problem: find u(t) that minimizes the objective function Je.
Solution: use a gradient method – the nontrivial task in this problem is the calculation of the gradient ∇u(t) Je.

3 Solution of the Problem with Exemplary Model

3.1 SEIR Model

To illustrate finding the time-dependent virus transmission intensity coefficient, we used the Susceptible-Exposed-Infectious-Removed (SEIR) model [12], described by the following system of ordinary differential equations:

   Ṡ(t) = −β(t)S(t)I(t)/N,
   Ė(t) = β(t)S(t)I(t)/N − kEI E(t),
   İ(t) = kEI E(t) − kIR I(t),   (3)
   Ṙ(t) = kIR I(t),

with the following initial conditions: S(0) = N − 1, E(0) = 0, I(0) = 1, R(0) = 0.
In the above equations, the variables S, E, I, and R represent the numbers of individuals in the population in the susceptible, exposed, infectious, and removed compartments, respectively. N is equal to the sum of all compartments of the SEIR model (3); therefore:

   N = S(t) + E(t) + I(t) + R(t) = const.   (4)
The coefficients kEI and kIR are parameters of the SEIR model (3), which stand for the inverses of the times of viral latency (defined as the time to becoming contagious, not to symptom onset) and of recovery from infection, respectively. We set these values as kEI = 0.15 and kIR = 0.1. These values correspond to approximately 6.7 days from exposure to the virus to the beginning of the infectious period [14], which then lasts for a further 10 days (consistent with the generally implemented quarantine time). The function β(t) represents the time-dependent virus transmission intensity.
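For illustration, the system (3) can be simulated in SciPy as below. This is only a sketch: beta is assumed to be any Python callable, and the population size is a placeholder, since the paper does not fix N explicitly.

    import numpy as np
    from scipy.integrate import solve_ivp

    def seir_rhs(t, x, beta, k_ei=0.15, k_ir=0.1, N=38.0e6):
        """Right-hand side of the SEIR model (3)."""
        S, E, I, R = x
        flow = beta(t) * S * I / N                 # transmission term beta(t)S(t)I(t)/N
        return [-flow, flow - k_ei * E, k_ei * E - k_ir * I, k_ir * I]

    N = 38.0e6                                     # placeholder population size (assumption)
    x0 = [N - 1.0, 0.0, 1.0, 0.0]                  # S(0), E(0), I(0), R(0)
    sol = solve_ivp(seir_rhs, (0.0, 376.0), x0,
                    args=(lambda t: 0.2,), dense_output=True)  # constant beta for testing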

3.2 Infection Data


The statistics on reported COVID-19 cumulative cases in Poland were taken
from the JHU CSSE data repository [2]. The chosen time horizon includes data
starting from 22 January 2020 and ending on 31 January 2021 (376 data points,
see Fig. 1).

Fig. 1. COVID-19 infection data for Poland

Due to the chosen time horizon and sampling density we can assume that
this data is quasi-continuous in time.

3.3 Adjoint Sensitivity Analysis


Objective Function. Under the assumption that the infection data (Cd) is continuous in time, the objective function that we minimized can be defined by the following equation:

   J = ∫₀^{tf} ( Cm(t) − Cd(t) )² dt,   (5)

where tf represents the final time of the simulation, and Cm(t) and Cd(t) are the numbers of cumulative infections at time t predicted by the SEIR model and obtained from the data, respectively. Therefore, Cm(t) is equal to:

   Cm(t) = I(t) + R(t).   (6)
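On daily-sampled data the integral (5) can be approximated by a trapezoidal rule; a minimal sketch, assuming C_m and C_d are aligned NumPy arrays (hypothetical names):

    import numpy as np

    def objective(C_m, C_d, dt=1.0):
        """Trapezoidal approximation of the objective (5)."""
        return np.trapz((C_m - C_d) ** 2, dx=dt)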

The gradient of the objective function (5) with respect to the signal β(t) can be
obtained from an adjoint system constructed by transformations of the original
system given by a block diagram [4–7].

Construction of the Adjoint System. To build the adjoint system for this
example we should do the following tasks:
1. Construct a block diagram representing the extended model where the input
is the function β(t) and the output is our objective function J (Fig. 2)
2. Construct the sensitivity model (Fig. 3) by applying the following transfor-
mations to the block diagram of the extended model:
– change each signal to its variation,
– change each nonlinear element to its derivative.
3. Construct the final adjoint system (Fig. 4) by applying the following trans-
formations to the block diagram of the sensitivity model:
– change the direction of all signals to the opposite,
– change all nodes to summing junctions and vice-versa,
– inverse in time all signals from the extended model.
If we stimulate the adjoint system (Fig. 4) by a Dirac delta at time 0, then the gradient ∇β(t) J can be obtained using the following equation:

   ∇β(t) J = β̃(tf − t).   (7)

3.4 Optimization Procedure


To find the function β(t), we used the iterative gradient-descent method, which is defined by the following equation:

   βnew(t) = βold(t) − α ∇β(t) J,   (8)

where α is the step of the gradient-descent method, decreasing as the optimization progresses.
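One iteration of the update (8) on a discretized β(t) is straightforward; the adjoint-based gradient routine in the commented loop below is a hypothetical placeholder standing for one simulation of the adjoint system:

    import numpy as np

    def gradient_descent_step(beta, grad_J, alpha):
        """Eq. (8) applied to a sampled beta(t)."""
        return beta - alpha * grad_J

    # illustrative loop with a decreasing step size:
    # beta = np.full(376, 0.2)
    # for it in range(n_iter):
    #     grad = adjoint_gradient(beta)            # hypothetical: one adjoint simulation
    #     beta = gradient_descent_step(beta, grad, alpha0 / (1 + it))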

Fig. 2. Block diagram of the analyzed extended model; the signal J̃(t) has the property that at the final time tf its value is equal to the objective function J

4 Results

The resulting function β(t) of the model (3) after the optimization procedure is shown in Fig. 5. The fit of the SEIR model to the analysed cumulative cases data is shown in Fig. 6. We can also compare the fit of the model (3) to the daily cases, introducing new variables Dm and Dd representing the daily new cases obtained from the model and from the cumulative data, respectively (Fig. 7). The simulation of the SEIR model using the function β(t) after the optimization procedure is shown in Fig. 8.
The model produced a near-perfect fit, smoothing the noisiness of the real
data (which is most pronounced compared with daily cases). However, there
is a visible discrepancy between the model and the analyzed data at the end
of the simulation. To avoid this discrepancy one can use an additional term in

the objective function, responsible for the improvement of the fit at the end of the simulation:

   J′ = J + c ( Cm(tf) − Cd(tf) )²,   (9)

where c is some positive constant responsible for the strength of the improvement. In that case, during optimization we should compute the gradient of the modified objective function J′ with respect to the signal β(t).

Fig. 3. Block diagram of the sensitivity model. Signals indicated with an overline are variations of particular signals from the extended model


Fig. 4. Block diagram of the adjoint system. Signals S̃(t), Ẽ(t), Ĩ(t), R̃(t) are the state variables of the adjoint system; J̃(t) and β̃(t) are the input and output of the adjoint system, respectively. Time t* is the inverted time, equal to tf − t

Fig. 5. The time-dependent virus transmission intensity coefficient of the SEIR model (3) after the optimization procedure

Fig. 6. Fit of the model (3) to the COVID-19 cumulative cases data

Fig. 7. Fit of the model (3) to the COVID-19 daily new cases. Variable Dm = kEI E(t); variable Dd is obtained by successive subtraction of the cumulative data

Fig. 8. Simulation of the model (3) using the function β(t) from Fig. 5

5 Conclusion
Compartmental mathematical models are a powerful tool for epidemic simulation
and prediction. However, estimating the parameters of a model both accurately
and within reasonable computation time is a non-trivial task, particularly for
time-dependent parameters. A major obstacle is the high noisiness of infection data, stemming from various factors encompassing natural variation as well as


technical reasons such as testing and reporting schedules (for example, in several
countries no infections were reported during weekends, obviously compensated
by a spike in cases on Mondays). Efficient optimization algorithms are therefore
a crucial element of constructing an epidemic model.
A well-fitted mathematical model may prove instrumental for pandemic con-
trol. Knowledge of a probable increase in cases before its occurrence provides
an opportunity to undertake preventive measures and prepare the necessary
resources. Furthermore, as a result of modelling, a time-dependent function
βopt (t) is estimated. This function represents the transmission rate, and can
be used to investigate factors affecting the spread of the virus, for example tem-
perature, humidity, and precipitation, as well as sociological phenomena related
to holiday seasons, school year organization etc.
In this work we demonstrate a method of parameter estimation for an exem-
plary SEIR model, based on gradient minimization of a non-linear objective
function. The gradient can be calculated using adjoint sensitivity analysis. This
approach, successfully employed by us before for different biological systems
[4–8,13], proved effective also for simulation of the COVID-19 pandemic. The
obtained estimation was consistent with observed data, at the same time show-
ing no signs of overfitting despite the noise. Furthermore, a converged solution
was obtained in a reasonable time, which is invaluable for the ongoing pandemic
as it allows for rapid re-estimation of the parameters for updated data. The
method presented in this paper is not limited to biological systems – it can be
used for any mathematical model described by a system of ordinary differential
equations.

Acknowledgement. This work was supported by the Polish National Science Cen-
tre under grant number UMO-2020/37/B/ST6/01959 and by the Silesian University
of Technology under statutory research funds. Calculations were performed on the
Ziemowit computer cluster in the Laboratory of Bioinformatics and Computational
Biology, created in the EU Innovative Economy Programme POIG.02.01.00-00-166/08
and expanded in the POIG.02.03.01-00-040/13 project. Data analysis was partially
carried out using the Biotest Platform developed within Project n. PBS3/B3/32/2015
financed by the Polish National Centre of Research and Development (NCBiR). This
work was carried out in part by the Silesian University of Technology internal research
funding.

References
1. Dashtbali, M., Mirzaie, M.: A compartmental model that predicts the effect of
social distancing and vaccination on controlling COVID-19. Sci. Rep. 11(1) (2021).
https://doi.org/10.1038/s41598-021-86873-0
2. Dong, E., Du, H., Gardner, L.: An interactive web-based dashboard to track
COVID-19 in real time. Lancet Infect. Dis. 20(5), 533–534 (2020). https://doi.
org/10.1016/S1473-3099(20)30120-1
3. Engbert, R., Rabe, M.M., Kliegl, R., Reich, S.: Sequential data assimilation of
the stochastic SEIR epidemic model for regional COVID-19 dynamics. Bull. Math.
Biol. 83(1) (2020). https://doi.org/10.1007/s11538-020-00834-8
Finding the Time-Dependent Virus Transmission Intensity 497

4. Fujarewicz, K., Galuszka, A.: Generalized backpropagation through time for con-
tinuous time neural networks and discrete time measurements. In: Rutkowski, L.,
Siekmann, J.H., Tadeusiewicz, R., Zadeh, L.A. (eds.) ICAISC 2004. LNCS (LNAI),
vol. 3070, pp. 190–196. Springer, Heidelberg (2004). https://doi.org/10.1007/978-
3-540-24844-6 24
5. Fujarewicz, K., Kimmel, M., Świerniak, A.: On fitting of mathematical models
of cell signaling pathways using adjoint systems. Mathe. Bioscie. Eng. 2(3), 527
(2005)
6. Fujarewicz, K., Łakomiec, K.: Parameter estimation of systems with delays via structural sensitivity analysis. Discr. Continuous Dyn. Syst. 19(8), 2521–2533 (2014)
7. Fujarewicz, K., Łakomiec, K.: Spatiotemporal sensitivity of systems modeled by cellular automata. Math. Meth. Appl. Sci. 41(18), 8897–8905 (2018). https://doi.org/10.1002/mma.5358
8. Fujarewicz, K., Łakomiec, K.: Adjoint sensitivity analysis of a tumor growth model and its application to spatiotemporal radiotherapy optimization. Math. Biosci. Eng. 13(6), 1131–1142 (2016)
9. Ghostine, R., Gharamti, M., Hassrouny, S., Hoteit, I.: An extended SEIR model
with vaccination for forecasting the COVID-19 pandemic in Saudi Arabia using an
ensemble kalman filter. Math. 9(6) (2021). https://doi.org/10.3390/math9060636,
https://www.mdpi.com/2227-7390/9/6/636
10. Giordano, G., et al.: Modelling the COVID-19 epidemic and implementation of
population-wide interventions in Italy. Nat. Med. 26(6), 855–860 (2020). https://
doi.org/10.1038/s41591-020-0883-7
11. He, S., Peng, Y., Sun, K.: SEIR modeling of the COVID-19 and its dynamics. Non-
linear Dyn. 101(3), 1667–1680 (2020). https://doi.org/10.1007/s11071-020-05743-
y
12. Hethcote, H.W.: The mathematics of infectious diseases. SIAM Rev. 42(4), 599–
653 (2000). https://doi.org/10.1137/S0036144500371907
13. Łakomiec, K., Kumala, S., Hancock, R., Rzeszowska-Wolny, J., Fujarewicz, K.: Modeling the repair of DNA strand breaks caused by γ-radiation in a minichromosome. Phys. Biol. 11(4), 045003 (2014). https://doi.org/10.1088/1478-3975/11/4/045003
14. Lauer, S.A., et al.: The incubation period of coronavirus disease 2019 (COVID-
19) from publicly reported confirmed cases: estimation and application. Ann.
Inter. Med. 172(9), 577–582 (2020). https://doi.org/10.7326/M20-0504. PMID:
32150748
15. Leontitsis, et al.: A specialized compartmental model for COVID-19. Int. J. Env-
iron. Res. Public Health 18(5) (2021). https://doi.org/10.3390/ijerph18052667,
https://www.mdpi.com/1660-4601/18/5/2667
16. López, L., Rodó, X.: A modified SEIR model to predict the COVID-19 outbreak in Spain and Italy: simulating control scenarios and multi-scale epidemics. Results Phys. 21, 103746 (2021). https://doi.org/10.1016/j.rinp.2020.103746
17. Ramezani, S.B., Amirlatifi, A., Rahimi, S.: A novel compartmental model to capture the nonlinear trend of COVID-19. Comput. Biol. Med. 134, 104421 (2021). https://doi.org/10.1016/j.compbiomed.2021.104421
A Revealed Imperfection in Concept Drift
Correction in Metabolomics Modeling

Jana Schwarzerova1,2(B) , Ales Kostoval1 , Adam Bajger3 , Lucia Jakubikova3 ,


Iro Pierides2 , Lubos Popelinsky3 , Karel Sedlar1,4 , and Wolfram Weckwerth2,5
1
Department of Biomedical Engineering, Faculty of Electrical Engineering and
Communication, Brno University of Technology, Brno, Czech Republic
{Jana.Schwarzerova,221515,sedlar}@vut.cz
2
Molecular Systems Biology (MOSYS), University of Vienna, Vienna, Austria
a11913926@unet.univie.ac.at, wolfram.weckwerth@univie.ac.at
3
Knowledges Discovery Group, Faculty of Informatics, Masaryk University, Brno,
Czech Republic
{469113,456634}@mail.muni.cz, popel@fi.muni.cz
4
Institute of Bioinformatics, Department of Informatics,
Ludwig-Maximilians-Universität München, Munich, Germany
5
Vienna Metabolomics Center (VIME), University of Vienna, Vienna, Austria

Abstract. Prediction models that rely on time series data are often
affected by diminished predictive accuracy. This occurs from the causal
relationships of the data that shift over time. Thus, the changing weights
that are used to create prediction models lose their informational value.
One way to correct this change is by using concept drift information.
That is exactly what prediction models in biomedical applications need.
Currently, metabolomics is at the forefront in modeling analysis for phe-
notype prediction, making it one of the most interesting candidates for
biomedical prediction diagnosis. However, metabolomics datasets include
dynamic information that can harm prediction modeling. The study
presents concept drift correction methods to account for dynamic changes
that occur in metabolomics data for better prediction outcomes of phe-
notypes in a biomedical setting.

Keywords: Biomedical analysis · Metabolomics · Machine learning ·


Prediction methods

1 Introduction

Predictive models are at the forefront of diagnostic methods aiding the rapid detection of diseases, whose diagnosis ultimately plays the most important role in treatment [1,2]. Recent studies [3–6] show that metabolomics data has the potential to address prediction changes in the immune system that may play a key role in the detection of early disease symptoms. Thus, this raises new challenges for the creation of prediction tools that are suitable for metabolomics data.

Nowadays, the most challenging applications of prediction tools are mostly related to scenarios in which the source data is provided in real time. As the distributions of the underlying reality shift over time, a classification model trained on the previously relevant data will begin to yield incorrect predictions about the current data. This undesirable phenomenon is called concept drift [7]. An untrained model retains roughly the same accuracy; most real-life scenarios, however, do not assume a static context. A model that was relevant at a previous point in time may begin yielding inaccurate predictions. This might be caused by a concept drift, in which the underlying concept of the machine learning task is simply changing its meaning [7]. Therefore, prediction modeling aims to eliminate this undesirable phenomenon.

1.1 Related Work


First, it is important to identify a concept drift when it occurs. This is the task of a drift detection method, which evaluates statistical features, including the sample mean and standard deviation, error rates, or other metrics of the base models [8]. The second, much more important part of concept drift analysis is concept drift correction. The learning algorithms applied to data streams must be able to handle concept drift and thus keep the learned model updated. Several approaches have been proposed to handle concept drift [9–14]. Basically, the methods for handling concept drift are divided into two main groups: explicit methods and implicit, so-called blinding, methods [9,12].
Explicit methods for drift detection involve reactively updating the model to respond to changes. This is done by monitoring the evolution of a performance indicator based on classification, or by monitoring the data distributions of different portions of the data; an example of these methods is the Early Drift Detection Method (EDDM). Windowing approaches rely on ADaptive WINdowing (ADWIN) [9], whose principle is based on maintaining an adaptive sliding window that grows by adding the most recent tuples as long as no concept drift is detected. Another explicit method is sampling, for example biased reservoir sampling [13].
Implicit, also called blinding, methods are those that update the decision model at regular intervals, independently of the occurrence of concept drifts. These methods rely on support vector regression (SVR) based forecasting methods [12], dynamic weighted majority [15], or ensemble methods [16]. Guajardo et al. [14] proposed an implicit concept drift handling method that relied on the moving window approach and SVR, in which the most recent data is added to the training set every time a predefined number of observations takes place [14].
One of the current issues in machine learning concerns high-dimensional data streams. The study by Liu et al. [17] focused on concept drift detection for data stream learning based on angle-optimized global embedding and principal component analysis. The authors rightly mention that traditional machine learning methods are not appropriate for such data due to the curse of dimensionality. Manifold learning algorithms have been successfully applied to high-dimensional data [18]. Nevertheless, as mentioned by Liu et al. [17], these methods mainly focus on evaluating data manifolds to reveal interesting properties of the data stream; the effectiveness of concept drift detection with a manifold learning approach still needs to be considered [17].
Despite the large potential for developing biomedical machine learning models [19], concept drift detection is not currently used in biomedical applications because there is a lack of understanding of how to update such forecasting models when new data arrive, i.e., when a new event occurs within the given time series. For example, within metabolomics, some confounding factors changing in time remain undetected. The study [20] presents applications affected by the concept drift problem and the associated importance of concept drift detection due to the adaptive nature of microorganisms in a biomedical scenario. This is relevant to the detection of early disease symptoms in metabolomics modeling of conditions such as diabetes [21], cancer prevention [22], and others [23,24].
Our study builds on our previous work published in the study by Schwarzerova et al. [25], in which the occurrence of a concept drift in human metabolites during adolescence was confirmed. However, the study by Schwarzerova et al. [25] only considered the first step in the determination of a concept drift. The current study verifies the occurrence of this drift on newly created models and performs a corrective analysis.

2 Materials and Methods

This study focuses on prediction using the published metabolomic data from the study by Chu et al. [26]. Metabolomics prediction is now at the forefront of scientific work because of its causal informational value regarding the immune system, as showcased in that study [26].

2.1 Dataset

The study by Chu et al. [26] measured the circulating blood metabolome and integrated metabolite features with deep immunophenotyping from a population-based cohort. In total, the analysis of the cohort study included 534 healthy individuals of Caucasian origin aged between 18 and 75 years.
The study by Chu et al. [26] provides metabolomics and phenotype data, available in its additional files 22 and 23 or at https://500fg-hfgp.bbmri.nl. This dataset was also used in the study by Schwarzerova et al. [25], in which the occurrence of a concept drift based on patient age was confirmed. Building on that study, our study aims to correct this drift to achieve better prediction values. The normalized metabolite abundance levels were acquired from General Metabolomics (GM) and Nightingale Health/Brainshake (BM). The two datasets (BM and GM) were acquired using two different technical platforms: (1) the BM blood metabolites, represented as 231 features with 200 absolute concentrations, were measured using nuclear magnetic resonance (NMR) spectroscopy; (2) the GM data comprised 1,589 features with 257 concentrations and were measured using flow injection-time-of-flight mass spectrometry.
2.2 Methods

Concept drift analysis can be divided into two branches. The first branch is concept drift detection, which was performed in the study by Schwarzerova et al. [25]. The second branch is concept drift correction, which is much more important because it leads to improved prediction values.
First, our study includes a prediction phase in which we modeled classifiers using Logistic Regression (LR), Random Forest (RF), and Gradient Boosting (GB) approaches with Scikit-multiflow [27,28]. The classifiers were trained on the BM and GM metabolite datasets, see Fig. 1. The selected classification problems were retrieved from the study by Schwarzerova et al. [25]: detecting patient gender, male or female, from metabolite prediction markers, and the more challenging classification of women who did or did not take birth control pills.
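As an illustration of this prediction phase, the following minimal Python sketch (our reconstruction, not the authors' code; the synthetic stand-in arrays replace the actual BM/GM matrices) trains the three classifier types and scores them with 10-fold cross-validation:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(93, 231))   # stand-in metabolite matrix (samples x features)
    y = rng.integers(0, 2, size=93)  # stand-in binary labels (e.g. gender or BCP status)

    classifiers = {
        "LR": LogisticRegression(max_iter=50_000),  # iteration cap used in Sect. 3
        "RF": RandomForestClassifier(),             # default parameters, as in Sect. 3
        "GB": GradientBoostingClassifier(),         # default parameters, as in Sect. 3
    }
    for name, clf in classifiers.items():
        scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
        print(f"{name}: accuracy {scores.mean():.3f} +/- {scores.std():.3f}")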

Fig. 1. Framework of our methodology, where GM represents general metabolomics and BM is nightingale health/brainshake metabolite data. Classifier G (gender) determines patient gender and BCP (birth control pills) determines women who take them, or not. These classifiers were created using logistic regression (LR), random forest (RF), and gradient boosting (GB) approaches. Concept drift analysis includes two steps. The first step is drift detection using the ADWIN and EDDM detectors. The second step is drift correction, using manual concept drift correction and auto concept drift correction represented by the OzaBaggingADWINClassifier

Finally, the last part is focused on concept drift analysis. This analysis includes two steps. First, the ADWIN and EDDM detectors, both available in Scikit-multiflow [27], are used for concept drift detection. Second, a manual concept drift correction and an automatic concept drift correction using the OzaBaggingADWINClassifier from Scikit-multiflow [27] are performed. The OzaBaggingADWINClassifier is based on the online bagging algorithm of [29] and was tested in [30].
The basic core of the OzaBaggingADWINClassifier relies on one of the most widely used correction approaches, ADWIN. This approach [9–11] maintains an adaptive sliding window that grows by adding the most recent tuples as long as no concept drift is detected. If ADWIN detects a concept drift, it shrinks the window by removing the old tuples. It does not require users to set minimum or maximum times between concept drifts in advance; it configures only a confidence value in (0, 1) to adjust the sensitivity of the concept drift detection, thereby eliminating the significant disadvantage of approaches that periodically recompute the models over fixed-size windows of data. However, ADWIN has time and memory inefficiencies; therefore, an improved version, ADWIN2, was introduced. It checks O(log W) cutpoints, uses only O(log W) memory words, and its processing time per example is O(log² W) in the worst case and O(log W) amortized.
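For illustration, the detection step can be sketched with the Scikit-multiflow API referenced above; this is a hypothetical Python fragment in which a synthetic 0/1 error stream stands in for the age-ordered prediction errors of a real classifier:

    import numpy as np
    from skmultiflow.drift_detection import ADWIN, EDDM

    rng = np.random.default_rng(0)
    # synthetic error stream (1 = misclassified) whose error rate rises halfway,
    # simulating a concept drift in an age-ordered sample stream
    errors = np.concatenate([rng.binomial(1, 0.1, 300), rng.binomial(1, 0.5, 300)])

    adwin, eddm = ADWIN(), EDDM()
    for i, e in enumerate(errors):
        adwin.add_element(e)
        eddm.add_element(e)
        if adwin.detected_change():
            print(f"ADWIN: drift detected at sample {i}")
        if eddm.detected_change():
            print(f"EDDM: drift detected at sample {i}")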
A summarized framework of the methodology is divided into two sections: prediction modeling and drift analysis. Figure 1 shows the overall concept of this methodology for easier comprehension.

3 Results

In total, we created 12 different classifiers. Four were based on logistic regression modeling, for which the maximum number of iterations was set to 50,000. In the RF and GB classifiers, the parameters were set to their defaults. All models were evaluated using 10-fold cross-validation accuracy scores, and then classification reports were obtained. A summary of the created classifiers is shown in Tables 1, 2 and 3.

Table 1. Evaluation parameters of predicted classifiers using logistic regression

Classifier                Accuracy  Standard deviation
Birth control pills (BM)  0.775     0.134
Gender (BM)               0.754     0.119
Birth control pills (GM)  0.630     0.192
Gender (GM)               0.626     0.144

Table 1 shows the evaluated parameters for the classifiers based on logistic regression modeling. The average accuracy of the logistic regression models is 0.696 and the average standard deviation is 0.147.
Similarly, Tables 2 and 3 show the evaluated parameters for the classifiers based on random forest and gradient boosting modeling.
Table 2. Evaluation parameters of predicted classifiers using random forest

Classifier                Accuracy  Standard deviation
Birth control pills (BM)  0.800     0.187
Gender (BM)               0.782     0.153
Birth control pills (GM)  0.710     0.204
Gender (GM)               0.779     0.123

Table 3. Evaluation parameters of predicted classifiers using gradient boosting

Classifier                Accuracy  Standard deviation
Birth control pills (BM)  0.825     0.195
Gender (BM)               0.780     0.113
Birth control pills (GM)  0.536     0.183
Gender (GM)               0.723     0.175

In all modeling approaches, the highest accuracy is achieved by the birth control pills model relying on BM data. The average accuracy of the random forest models is 0.768 with an average standard deviation of 0.167, while the gradient boosting models reach an average accuracy of 0.716 with an average standard deviation of 0.167. Overall, the prediction classifiers created with random forest models achieve the highest accuracy.

3.1 Concept Drift Detection


Only one concept drift was detected using the ADWIN detector. This drift was related to patients around 21 years of age and was detected by ADWIN only in the models created from the GM dataset that addressed determination of the patient's gender. In the same model, the EDDM detector determined 20 fuzzy concept drifts, also related to patients up to 21 years of age. Therefore, we assumed that correcting the concept drift would increase the accuracy of our models.

3.2 Manual Concept Drift Correction


The main parameter compared for each model before and after manual concept drift correction was accuracy, as seen in Figs. 2 and 3. In the models in which ADWIN detected a concept drift related to patients up to 21 years of age, the accuracy increased after manual correction.
Nevertheless, the largest accuracy increase was produced by the LR classifier that predicted whether or not women took birth control pills. This is evidence that a concept drift related to patients up to 21 years of age was also present in this model, even though it was not detected by the ADWIN detector.
The best modeling score before correction was generated by the RF classifier relying on GM data that solved the birth control pills determination. Nevertheless, the average of the evaluation parameters is highest for the modeling based on the GB classifiers. On the other hand, the minimum values were produced by the LR classifiers. On average, the accuracy increased in 67% of the created models. The poorest accuracy resulted from the LR classifier based on the GM dataset that focused on the birth control pills problem.
After completing the concept drift correction, the best modeling score was generated by the LR classifier focused on predicting women taking birth control pills, based on the BM dataset. Specifically, the accuracy of this model was 0.917; thus, the accuracy improvement of this model was 0.092.
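In code, the manual correction described here reduces to retraining with the drift-affected adolescent group removed (patients under 22, as detailed in the Discussion). A minimal sketch, with hypothetical stand-in arrays in place of the real data:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(93, 231))        # stand-in metabolite matrix
    y = rng.integers(0, 2, size=93)       # stand-in labels
    age = rng.integers(18, 76, size=93)   # stand-in patient ages

    mask = age >= 22                      # drop the drift-affected age group
    clf = LogisticRegression(max_iter=50_000)
    scores = cross_val_score(clf, X[mask], y[mask], cv=10, scoring="accuracy")
    print(f"accuracy after manual correction: {scores.mean():.3f}")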

Fig. 2. Accuracy for each created classifier from general metabolomics (GM) input
data. Gender determines patient gender and birth control pills determines women who
take them, or not. LR represents logistic regression, RF is random forest, and GB is
gradient boosting approaches

Fig. 3. Accuracy for each created classifier from nightingale health/brainshake metabo-
lite (BM) input data. Gender determines patient gender and birth control pills deter-
mines women who take them, or not. LR represents logistic regression, RF is random
forest, and GB is gradient boosting approaches

3.3 Auto Concept Drift Correction


This part of the study, called auto concept drift correction, focused on drift detection by ADWIN and automatic drift correction using the OzaBaggingADWINClassifier. Figure 4 reports the accuracy obtained using the OzaBaggingADWINClassifier. The purpose of this classifier is to detect and immediately correct drift in the model; thus, it was used to try to improve the resulting predictions of the previously created models.
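A hedged sketch of this step, assuming the samples are streamed in patient-age order (synthetic stand-ins below) and evaluated prequentially (test-then-train):

    import numpy as np
    from skmultiflow.meta import OzaBaggingADWINClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 20))    # stand-in for age-ordered metabolite rows
    y = rng.integers(0, 2, size=200)

    model = OzaBaggingADWINClassifier()   # ADWIN-equipped online bagging ensemble
    correct = 0
    for i in range(len(y)):
        x_i = X[i].reshape(1, -1)
        if i > 0:                                    # test-then-train (prequential)
            correct += int(model.predict(x_i)[0] == y[i])
        model.partial_fit(x_i, y[i:i + 1], classes=[0, 1])
    print("prequential accuracy:", correct / (len(y) - 1))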
Fig. 4. Accuracy of metabolomics modeling using the OzaBaggingADWINClassifier [27]; GM means general metabolomics and BM is nightingale health/brainshake metabolite input data. Gender determines patient gender and birth control pills determines women who take them, or not

Unfortunately, we found that drift detection using the ADWIN approach, or the classifier itself, is completely unsuitable for the analysis of metabolomics data. As seen in Fig. 4, the best accuracy was 0.461, for the Birth Control Pills (BCP) models based on the GM dataset. A reduction of about 40% is seen compared with the models that did not use ADWIN.

4 Discussion

This study revealed an imperfection in concept drift correction in metabolomics modeling. Metabolomics has huge potential to provide accurate diagnoses of diseases. Prediction models based on metabolite data contain dynamic information that is directly influenced by environmental inputs and have considerable potential for use as tools for medical diagnosis.
Due to the growing quality and decreasing cost of sequencing, research has shifted to the post-genomic era, whose main purpose is to understand the relationship between genotype and phenotype. This leads to an understanding of the metabolomic processes in the human body, which represent the uppermost intermediate molecular layer between genotype and phenotype. Thus, the methods and tools used in our study can also aid in developing computational methods for understanding human complexity. However, we must also identify any unsuitable paths to the solution for the given data, one of which has been presented in our study.
At the beginning of this study, we modeled twelve individual classifiers based on metabolite data. These models were created using three different approaches and solved two different classification problems, similar to the study [25] on which this study was built. The first classification predicted the patient's gender in the BM and GM datasets. The second predicted whether or not a female patient took birth control pills. The selected modeling approaches used computational methods of varying complexity, from LR to the most complex, such as RF or GB. As expected, the best accuracy scores were achieved by the RF and GB classifiers, as shown in Figs. 2 and 3.
The concept drift analysis in this study was performed using ADWIN and EDDM detection. However, as previously demonstrated in the study [25], ADWIN was found to be unsuitable for drift detection in the context of metabolomics modeling. This was not because drift does not occur in the data; the study [25] proved the occurrence of concept drift. In addition, ADWIN detected one drift occurrence within the GB classifier related to 21-year-olds. This reaffirmed the results of the study [25], which used different models and approaches but the same metabolomics data. Contrary to the study [25], while using ADWIN we did not detect any fuzzy decisions. Nevertheless, we detected a drift that was also detected in the study [25] using the unequivocally best model for drift detection, EDDM [25,31]. Hence, a manual concept drift correction was performed by eliminating patients under 22 years of age. The results showed an increase in accuracy in 67% of the model classifiers, as seen in Figs. 2 and 3.
The last part of this study tested an algorithm for automatic concept drift correction called OzaBaggingADWINClassifier, available in Scikit-multiflow [27,28]. As might be expected, due to the insufficient detection by the ADWIN detector, it provided no correct way to improve the predictions. Oza and Russell's online bagging and boosting uses a Poisson distribution to simulate the behavior of the corresponding offline algorithms in online environments [32]. In contrast, EDDM analyzes the distance between consecutive errors rather than the error rate [32]. Therefore, we conclude that algorithms based on sliding windows are unsuitable for detecting drift in metabolomics modeling.
This opens up further avenues for not only testing other available correc-
tion algorithms, but also innovating completely new ones that are focused on
metabolomics modeling.
Overall, this study not only revealed an imperfection in concept drift correction in metabolomics modeling but also confirmed the results of the study [25], which detected a concept drift phenomenon in prediction models based on a metabolite dataset. These metabolic changes in adolescents correspond with studies such as [33]. In addition, this study confirmed, through manual concept drift correction, that concept drift occurs in adolescents.

5 Conclusion

One of the biggest challenges in science is understanding human complexity and creating diagnostic tools, including prediction models based on metabolite datasets. These models could play a major role in early disease detection, which can lead to a reduction of treatment costs. This challenge requires an accurate prediction of metabolic changes. Moreover, natural drifts in the molecular underpinning of the genotype-phenotype relationship can be identified using concept drift detection, and an updated model can improve predictions by considering dynamic molecular changes and eliminating false-positive detections.
The main benefit of this study is the discovery of algorithms that are inappropriate for detecting and correcting concept drift: algorithms based on windowing approaches are unsuitable for use on metabolomics models. The evaluation included two main approaches: concept drift detection in two different metabolomics datasets and testing a correction classifier using the ADWIN method.
As a result, this study revealed an imperfection in concept drift correction in metabolomics modeling. Although the study concludes that this path is unsuitable for metabolomics models, it also confirms that drift can be detected in the model during adolescence. Therefore, metabolomics modeling is affected by concept drift based on confounding factors such as age, for example within the adolescent period. Consequently, innovative adaptive approaches for better prediction should be used with caution, and further effort is needed to improve them using concept drift correction. The study closes one path, the ADWIN-based correction algorithm, and recommends focusing on different correction approaches for concept drift analysis in metabolomics modeling.

Acknowledgement. This work has been supported by grant FEKT-K-21-6878 realised within the project Quality Internal Grants of BUT (KInG BUT), Reg. No. CZ.02.2.69/0.0/0.0/19_073/0016948, which is financed from the OP RDE.
We would like to thank Adam Hospodka for the support of our study, building on the project team's results in the Machine Learning and Data Mining (PV056) course at Masaryk University.

References
1. Birks, J., et al.: Evaluation of a prediction model for colorectal cancer: retrospective
analysis of 2.5 million patient records. Cancer Med. 6(10), 2453–2460 (2017)
2. Jae-woo, L., et al.: The development and implementation of stroke risk predic-
tion model in national health insurance Service’s personal health record. Comput.
Methods Program. Biomed. 153, 253–257 (2018)
3. Tantawy, A.A., Naguib, D.M.: Arginine, histidine and tryptophan: a new hope for
cancer immunotherapy. PharmaNutrition 8, 100149 (2019)
4. Changsong, G., et al.: Isoleucine plays an important role for maintaining immune
function. Curr. Protein Pept. Sci. 20(7), 644–651 (2019)
5. Iyer, A., Fairlie, D.P., Brown, L.: Lysine acetylation in obesity, diabetes and
metabolic disease. Immunol. Cell Biol. 90(1), 39–46 (2012)
6. Andras, P.: Metabolic control of immune system activation in rheumatic diseases.
Arthritis Rheumatol. 69(12), 2259–2270 (2017)
7. Webb, G.I., Hyde, R., Cao, H., Nguyen, H.L., Petitjean, F.: Characterizing con-
cept drift. Data Min. Knowl. Disc. 30(4), 964–994 (2016). https://doi.org/10.1007/
s10618-015-0448-4
8. Gama, J., Medas, P., Castillo, G., Rodrigues, P.: Learning with drift detection. In:
Bazzan, A.L.C., Labidi, S. (eds.) SBIA 2004. LNCS (LNAI), vol. 3171, pp. 286–295.
Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-28645-5 29
9. Grulich, P.M., et al. Scalable detection of concept drifts on data streams with
parallel adaptive windowing. In: EDBT, pp. 477–480 (2018)
10. Imen, K., et al.: Self-adaptive windowing approach for handling complex concept
drift. Cogn. Comput. 7(6), 772–790 (2015)
11. Huang, D.T.J., et al. Detecting volatility shift in data streams. In: 2014 IEEE
International Conference on Data Mining, pp. 863–868. IEEE (2014)
12. Sun, J., Li, H., Adeli, H.: Concept drift-oriented adaptive and dynamic support
vector machine ensemble with time window in corporate financial risk prediction.
IEEE Trans. Syst. Man Cybern. Syst. 43(4), 801–813 (2013)
13. Aggarwal, C.C.: On biased reservoir sampling in the presence of stream evolution.
In: Proceedings of the 32nd International Conference on Very Large Data Bases,
pp. 607–618 (2006)
14. Guajardo, J.A., Weber, R., Miranda, J.: A model updating strategy for predicting
time series with seasonal patterns. Appl. Soft Comput. 10(1), 276–283 (2010)
15. Kolter, J.Z., Maloof, M.A.: Dynamic weighted majority: an ensemble method for
drifting concepts. J. Mach. Learn. Res. 8, 2755–2790 (2007)
16. Sun, Y., et al.: Concept drift adaptation by exploiting historical knowledge. IEEE
Trans. Neural Netw. Learn. Syst. 29(10), 4822–4832 (2018)
17. Shenglan, L., et al.: Concept drift detection for data stream learning based on angle
optimized global embedding and principal component analysis in sensor networks.
Comput. Electr. Eng. 58, 327–336 (2017)
18. Pless, R., Souvenir, R.: A survey of manifold learning for images. IPSJ Trans.
Comput. Vis. Appl. 1, 83–94 (2009)
19. Wei, L., et al.: Guidelines for developing and reporting machine learning predictive
models in biomedical research: a multidisciplinary view. J. Med. Internet Res.
18(12), e323 (2016)
20. Žliobaité, I.: Learning under concept drift: an overview. arXiv preprint
arXiv:1010.4784 (2010)
21. Wang, T.J., et al.: Metabolite profiles and the risk of developing diabetes. Nat.
Med. 17(4), 448–453 (2011)
22. Clement, I.P., et al.: Chemical form of selenium, critical metabolites, and cancer
prevention. Cancer Res. 51(2), 595–600 (1991)
23. Montemayor, D., Sharma, K.: mGWAS: next generation genetic prediction in kid-
ney disease. Nat. Rev. Nephrol. 16(5), 255–256 (2020)
24. Moats, R.A., et al.: Abnormal cerebral metabolite concentrations in patients with
probable Alzheimer disease. Magn. Reson. Med. 32(1), 110–115 (1994)
25. Schwarzerova, J., Bajger, A., Pierdou, I., Popelinsky, L., Sedlar, K., Weckw-
erth, W.: An innovative perspective on metabolomics data analysis in biomedical
research using concept drift detection. In: Proceedings 2021 IEEE International
Conference on Bioinformatics and Biomedicine (BIBM2021) (2021). (in press)
26. Xiaojing, C., et al.: Integration of metabolomics, genomics, and immune pheno-
types reveals the causal roles of metabolites in disease. Genome Biol. 22(1), 1–22
(2021)
27. Jacob, M., et al.: Scikit-multiflow: a multi-output streaming framework. J. Mach.
Learn. Res. 19(1), 2914–2915 (2018)
28. Ekaba, B.: Logistic regression. In: Building Machine Learning and Deep Learning
Models on Google Cloud Platform, Apress, Brekeley, CA, pp. 243–250 (2019)
29. Oza, N.C., Russell, S.J.: Online bagging and boosting. In: International Workshop
on Artificial Intelligence and Statistics. PMLR, pp. 229–236 (2001)
30. Sanjeev, K., et al.: Design of adaptive ensemble classifier for online sentiment
analysis and opinion mining. PeerJ. Comput. Sci. 7, e660 (2021)
31. Manuel, B.G., et al.: Early drift detection method. In: Fourth International Work-
shop on Knowledge Discovery from Data Streams, pp. 77–86 (2006)
32. de Barros, R.S.M., de Carvalho Santos, S.G.T.: An overview and comprehensive comparison of ensembles for concept drift. Inf. Fusion 52, 213–244 (2019)
33. Bei, D., et al.: A prospective study of serum metabolomic and lipidomic changes
in myopic children and adolescents. Exp. Eye Res. 199, 108182 (2020)
Two-Dimensional vs. Scalar Control
of Blood Glucose Level in Diabetic
Patients

Jarosław Śmieja and Artur Wyciślok(B)

Department of Systems Biology and Engineering, Silesian University of Technology, ul. Akademicka 16, 44-100 Gliwice, Poland
{jaroslaw.smieja,artur.wycislok}@polsl.pl

Abstract. Closed-loop controllers for insulin pumps have been on the market for some time. It has been shown that modified PID or MPC control algorithms are best suited for an artificial pancreas. However, because control values can only be nonnegative and the dynamics of the response to insulin input are relatively slow, these algorithms are not well equipped to deal with hypoglycemia induced by physical effort. This paper focuses on that aspect of blood glucose control. Two alternative solutions are proposed and compared. The first one is based on feedforward, with additional information about future physical effort entered by the user. The second approach uses an additional control in the form of glucagon. Simulations are run for a fixed scenario of three meals and additional physical effort that affects the insulin-glucose system, for a cohort of virtual patients for whom model parameters were sampled. The performance of the control systems is evaluated with several quality indicators.

Keywords: Artificial pancreas · Feedforward · Closed-loop control · Glucagon · Insulin pumps

1 Introduction

According to a World Health Organisation report, between 1980 and 2014 the number of people diagnosed with diabetes rose from 108 million to 422 million [20], and that number is still on the rise. Therefore, improving the treatment itself and the quality of life of patients has become one of the most important problems in bioengineering.
Standard diabetes treatment involves multiple glucose measurements and insulin injections a day. This can be facilitated by continuous blood glucose monitors (CBCM) and insulin pumps that dose insulin automatically. The introduction of reliable insulin-administering and blood glucose level (BGL) monitoring devices provided an opportunity to create a so-called artificial pancreas (AP), i.e., an automatic system that takes over the role of maintaining the appropriate BGL from the malfunctioning organism. A controller that reads data from the CBCM,
determines the amount of insulin to be injected, and provides the input signal to the insulin pump is the core of the closed-loop blood glucose control system.
Multiple works concerning control algorithms for such systems have been published [3]. The two most widely used algorithms were chosen for further tests: Proportional-Integral-Derivative (PID) and Model Predictive Control (MPC). They have been proven to be safe in both controlled and free environments. However, intensive physical exercise or other physical effort may result in a dramatic decrease of the blood glucose level, which cannot be counteracted by the standard algorithms. This work aims at introducing modifications to these algorithms and comparing the PID and MPC approaches with respect to their performance in preventing effort-induced hypoglycemia.
The comparison was performed based on the results of numerical simulations obtained for 1000 virtual patients with randomised parameters. Separate parameter vectors represent both inter-patient heterogeneity and changes in physiological parameters in individual patients.

2 Simulation Model and Control Structures


Most works concerning the testing of different control algorithms and control structures involve clinical trials incorporating a small number of diabetic patients, e.g. [4]. Similarly, most computational model-based works focused on specific control algorithms run simulations for only a single individual or several individuals (e.g. [7,14]). In order to check whether the proposed approach may be safely introduced in a population of patients with various physiological conditions, a large pool of patients, whose models are characterised by different parameters, needs to be created.

2.1 Insulin-Glucose Mathematical Model


The model of the patient that had to be implemented in order to run simulations
consists of several subsystems: a glucose-insulin interaction subsystem, an insulin
pharmacokinetics subsystem, and a meal-related subsystem.

Glucose-Insulin Interaction Model. Mathematical models of glucose-insulin interaction as well as insulin pharmacokinetics (PK) are well established. The simplest model, which reflects the system dynamics surprisingly well and is used in this work, is the so-called Bergman minimal model [2], combined with a first-order PK model. It is given by the following equations

dG(t)/dt = −(p1 + X(t)) G(t) + p1 Gb + p2 Gin(t),   G(0) = Gb   (1)

dX(t)/dt = −p2 X(t) + p3 I(t),   X(0) = 0   (2)

dI(t)/dt = k1 Iin(t) − k2 I(t),   I(0) = 0   (3)
where Gin(t), Iin(t), I(t), X(t) and G(t) represent the glucose input, insulin input, insulin blood concentration, so-called insulin effect, and blood glucose, respectively. Additionally, Gb indicates basal glucose production, and the parameters p1, p2, p3, k1 and k2 are present.
The model of meal-associated glucose input Gin comes from [12]:

dGgut(t)/dt = −kgabs Ggut(t) + Gempt(t),   Ggut(0) = 0   (4)

where Gempt(t) is the gastric emptying rate and kgabs is a parameter. In [15] it is argued that, if that variable takes a triangular or trapezoidal curve depending on the meal's glycaemic index, then, in a reasonable simplification, this model combined with Eqs. (1)–(3) represents the BGL dynamics accurately.
combined with Eq. (1)–(3) represents BGL dynamics accurately.
One additional part of the model is the inclusion of effort’s effect on BGL.
That was done by modifying Eq. (1) to account for exercise-dependent changes
in minimal model’s parameters shown e.g. in [6]:
dG(t)
= −(p1 + P ∗ (t)X(t))G(t) + p1 Gb + p2 Gin (t) (5)
dt
The term P ∗ (t)X(t) determines the intensity of the effort, or, more accu-
rately, its effect on the insulin-glucose subsystem.

Glucagon Model. Apart from administering insulin to patients when the blood glucose level is above desired values, an action opposite in effect is considered. As insulin can only reduce the blood glucose level, a control system utilising only that substance can do nothing in case of hypoglycemia. To address that shortcoming, the administration of glucagon in cases of low blood glucose level is taken into account as an extension of traditional insulin-only blood glucose management systems.
Glucagon dynamics is reported to be of the same type as that of insulin, with different parameters [8,10,19]. Due to that fact, as proposed in [8], the inclusion of a glucagon model into the Bergman glucose-insulin model (Eq. (6)) is simple and requires just one additional term in Eq. (1), namely Y(t), the effect of glucagon concentration in blood, similar to X(t) for insulin but acting with the opposite sign.

dG(t)/dt = −(p1 + X(t) − Y(t)) G(t) + p1 Gb + p2 Gin(t)   (6)

where Y(t) is described by an additional equation, similar to that for X(t) in the Bergman minimal model:

dY(t)/dt = p3g · L(t) − p2g · Y(t),   Y(0) = 0   (7)

The description of the pharmacokinetics of glucagon is, as discussed, analogous to that of insulin:

dL(t)/dt = k1g · Lin(t) − k2g · L(t),   L(0) = 0   (8)
where Lin(t) is the administered glucagon and L(t) is the glucagon concentration in blood. In the simulations described in this paper, the values of the parameters p1g, p2g, p3g, k1g and k2g were adopted as identical to the respective values for insulin PK (without the "g" subscript).
Combining the Eq. (6) description with the patient's effort shown in Eq. (5), the patient's model used in this work for insulin-glucagon controller simulations consisted of a set of five differential equations: two concerning insulin propagation (Eqs. (2) and (3)), two concerning glucagon propagation (Eqs. (7) and (8)), and a modified version of the main equation of the Bergman model:

dG(t)/dt = −(p1 + P*(t) X(t) − Y(t)) G(t) + p1 Gb + p2 Gin(t)   (9)
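For concreteness, the five-state virtual patient (Eqs. (2), (3), (7), (8) and (9)) can be integrated numerically. The Python sketch below is our illustration (the authors worked in Matlab & Simulink); parameter values follow Table 2, and the input signals are assumed to be supplied as functions of time:

    from scipy.integrate import solve_ivp

    p1, p2, p3 = 0.015, 0.021, 7.5e-8        # Table 2 (nominal p2)
    k1, k2, Gb = 5.0, 0.214, 80.0
    p2g, p3g, k1g, k2g = p2, p3, k1, k2      # glucagon PK taken identical to insulin

    def patient(t, s, Gin, Iin, Lin, Pstar):
        G, X, I, Y, L = s
        dG = -(p1 + Pstar(t) * X - Y) * G + p1 * Gb + p2 * Gin(t)  # Eq. (9)
        dX = -p2 * X + p3 * I                                      # Eq. (2)
        dI = k1 * Iin(t) - k2 * I                                  # Eq. (3)
        dY = p3g * L - p2g * Y                                     # Eq. (7)
        dL = k1g * Lin(t) - k2g * L                                # Eq. (8)
        return [dG, dX, dI, dY, dL]

    # example: a 24-h run with no meals, no dosing and a unit effort multiplier
    sol = solve_ivp(patient, (0.0, 1440.0), [Gb, 0.0, 0.0, 0.0, 0.0],
                    args=(lambda t: 0.0, lambda t: 0.0, lambda t: 0.0,
                          lambda t: 1.0),
                    max_step=1.0)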

2.2 Control Structures


Two control loop structures aimed at reducing the negative impact of effort were tested. The first one involves feedforward, in which information about the planned effort is supplied to the controller in advance. The second structure employs a separate glucagon pump in addition to the insulin one [18].

Feed-Forward Structure for Planned Effort. The inclusion of the patient's effort results in potentially life-threatening hypoglycemic incidents when a sudden increase in the organism's glucose consumption happens after the control system has already provided additional insulin to deal with meal-associated glucose. Usually, intense efforts can be predicted and planned, and information about them can be supplied to the controller in order to reduce the amount of administered insulin prior to the effort.
Accounting for the nonlinear relation between effort and blood glucose level changes (Eq. (5)), the feed-forward aspect of the control loop, presented in Fig. 1, was designed accordingly.

Fig. 1. Structure of control system with feed-forward aspect – block diagram

Two-Dimensional Control. Many studies have researched the inclusion of glucagon into the AP in clinical trials [4,5,17]; however, only recently has simulation-based testing of different algorithms including glucagon as a second control signal become a field of extensive research, e.g. [16].
To make use of glucagon as indicated in Subsect. 2.1, the implemented controllers have to be modified in a way that allows them to calculate the values of two control signals based on only one measurement signal.
This is rather straightforward in the case of the MPC controller, as such an operation mode is natural for that algorithm. For PID, however, additional elements must be included to allow the mode of operation desired in this case. Two approaches were discussed: a single PID controller with a split-range control subsystem, and two separate PID controllers, one for each output signal, together with an auxiliary algorithm ensuring that at any time only one substance is being administered. The latter approach was chosen.
The block diagram of the control system using both insulin and glucagon is shown in Fig. 2.

Fig. 2. Structure of the control system using both insulin and glucagon – block diagram
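A toy sketch of the chosen arrangement (illustrative Python, not the authors' Matlab & Simulink implementation; the gains and the clamp are placeholders): one PID acts on the positive control error to dose insulin, a mirrored PID acts on the negative error to dose glucagon, and a simple arbitration rule guarantees that only one substance is administered at a time:

    class PID:
        def __init__(self, kp, ki, kd, dt):
            self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
            self.acc = 0.0   # integral accumulator
            self.prev = 0.0  # previous error, for the derivative term

        def step(self, error):
            self.acc += error * self.dt
            deriv = (error - self.prev) / self.dt
            self.prev = error
            return self.kp * error + self.ki * self.acc + self.kd * deriv

    def dual_hormone_step(bgl, setpoint, pid_ins, pid_glu, u_max):
        e = bgl - setpoint
        insulin = min(max(pid_ins.step(e), 0.0), u_max)    # clamp to [0, u_max]
        glucagon = min(max(pid_glu.step(-e), 0.0), u_max)
        if insulin > 0.0:   # arbitration: never dose both hormones at once
            return insulin, 0.0
        return 0.0, glucagon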

Controller Bounds. The output of the control algorithm had to be limited, because the simulation model should approximate reality, and in reality neither a negative amount nor an excessively high dose of a substance can be administered.
The upper bound was determined with reference to the average daily dose of insulin [9], accounting for the fact that in the developed model the controller is responsible only for the bolus [13].

2.3 Simulation Details

For each virtual patient, represented by a parameter vector, a simulation was run for a fixed scenario of meals and physical effort. Then, selected properties of the model responses were combined to facilitate comparison of controller performance in a population of 1000 virtual patients. The parameter vectors for the virtual patients were drawn from a uniform distribution with the lower and upper bounds shown in Table 1. The values of the parameters of the model's equations are shown in Table 2.
Table 1. Nominal values and bounds for changes of parameters

Parameter name      Lower bound  Upper bound
p2                  0.015        0.030
Pbase               0.5          3
Vmax multiplier     0.5          1.5
Tup_max multiplier  0.5          3

Table 2. Values of parameters of model's equations

Parameter name           Value      Unit
Gb                       80         mg·dL−1
p1                       0.015      min−1
p2 (nominal value)       0.021      min−1
p3                       7.5·10−8   mL·µU−1·min−2
k1                       5          mL−1·min−1
k2                       0.214      min−1
kgabs                    0.01(6)    min−1
Vmax (nominal value)     1/90       mg·min−1
Tup_max (nominal value)  30         min

Simulation Scenarios. Each simulation computed a 24-h BGL time course for a patient characterised by a unique set of model parameters. The meal scenario included three meals of different glucose value (20 g for breakfast, 40 g for lunch and 60 g for dinner). These doses and meal times were the same for all simulations.
Apart from meals, the effort distribution during the day had to be determined. In the chosen scenario, efforts occur relatively shortly after a meal, but their intensities are the same regardless of the glucose dose of the preceding meal.

Acceptable Blood Glucose Levels. To create a control system, a reference value of the controlled variable, the setpoint, must be provided. Additionally, as described further, to assess the quality of the control, two additional values must be chosen with reference to the setpoint: the hypo- and hyperglycemia thresholds. As the setpoint was taken at 80 mg/dL, the aforementioned limit values were chosen according to the World Health Organisation's [21] and American Diabetes Association's [1] guidelines, similar to those shown in [11].

Control Quality Indicators. Two undesirable states have to be taken into account: hypo- and hyperglycemia. Each of them should be considered separately, as they have a different impact on the patient's state. Therefore, the following control quality indicators were chosen (a small counting sketch follows the list):

– the number of hypoglycemic episodes in all simulations for a given control structure (referred to further in the text as a simulation batch) (BGL < 60 mg/dL);
– the total time of hypoglycemic episodes in a single simulation (BGL < 60 mg/dL);
– the total time of hyperglycemic episodes in a single simulation (BGL > 140 mg/dL).
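A short counting sketch for these indicators (our illustration; the hypothetical bgl array stands in for a per-minute simulated trace):

    import numpy as np

    # hypothetical per-minute blood glucose trace (mg/dL) over 24 h
    bgl = 80.0 + 30.0 * np.sin(np.linspace(0.0, 6.0 * np.pi, 1440))

    hypo = bgl < 60.0
    hyper = bgl > 140.0
    time_hypo = int(hypo.sum())    # minutes spent below 60 mg/dL
    time_hyper = int(hyper.sum())  # minutes spent above 140 mg/dL
    # an episode starts at every 0 -> 1 transition of the hypoglycemia mask
    n_hypo_episodes = int(np.diff(hypo.astype(int), prepend=0).clip(min=0).sum())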

3 Results

Simulation results obtained for each control structure, in terms of the quality indicators specified in Sect. 2.3, are presented in the form of histograms. Matlab & Simulink software with the Model Predictive Control Toolbox was used. In each figure, histograms for one indicator and both control algorithms are presented (a darker shade indicates overlapping of the histograms for both controllers). To better depict the distribution of the indicators for non-zero values, only those values are presented. The number of simulations in which not a single hypo- or hyperglycemic event occurred can be inferred from the total number of simulations.

3.1 Reference Results

Both proposed control structures are compared to a reference controller, with no feedforward and a single insulin control signal, for which the histograms are shown in Fig. 3.

3.2 Results for Analysed Structures

A first comparison between the proposed and reference structures can be made on the basis of the glucose time course during a single simulation. In Fig. 4 an example of such results is presented. It includes three meals, as stated in Sect. 2.3, and two efforts: the first starting at 160 min and ending at 260 min, the second starting at 900 min and lasting until 1120 min. Both efforts' intensities (the value of P*) were equal to 2.
As presented in Table 3, both control structures provided an improvement in terms of reducing the number of hypoglycemic episodes. One of the reasons for the more significant improvement in the case of the feed-forward structure can be seen in Fig. 4, where for the first effort no hypoglycemia occurred, as insulin delivery was reduced before the effort started.

Table 3. The number of hypoglycemic episodes in a simulation batch

Structure        Value for PID  Value for MPC
Reference        508            2329
Use of glucagon  464            1904
Feed-forward     67             1546
Fig. 3. Histograms of quality indicators for reference case

Fig. 4. Comparison of control structures for both algorithms


Fig. 5. Histograms of time of hypoglycemia for both structures

The results show that for the MPC algorithm, when glucagon is used, the time in which the state of hypoglycemia occurs is significantly lower than in the case of the feed-forward structure (Fig. 5). Such a conclusion is not entirely true in the case of the PID algorithm, where, though the maximum value of the hypoglycemia time is lower for the controller with glucagon, the overall number of simulations in which such a state occurred is lower for the feed-forward structure. Both control structures provide an improvement compared to the reference case.
Additionally, when looking at the number of hypoglycemic episodes, shown in Table 3, it is clear that with the feed-forward structure the number of such episodes is much lower and that the decrease is much bigger in the case of the PID algorithm. Values above 1000, seen for the MPC algorithm, indicate that more than one hypoglycemic episode happened during a single simulation. Such behaviour is characteristic of the tested MPC algorithm.
Regarding the time of hyperglycemia, shown in Fig. 6, it is clearly visible for both algorithms that the feed-forward structure may lead to longer periods of hyperglycemia. This is consistent with the way this structure works, i.e., reducing the amount of administered insulin some time before the BGL would fall due to effort, which can be clearly seen in Fig. 4. When compared against the reference case, virtually no change in the character of the distribution is visible for the additional glucagon.
Fig. 6. Histograms of time of hyperglycemia for both structures

From that, it is natural that in the case of the feed-forward structure the duration of hyperglycemia is longer than in the reference situation.

4 Conclusions

Physical effort may lead to dangerous hypoglycemia if its effects on blood glucose are not counteracted by appropriate controller actions. In terms of hypoglycemia-related indicators (both time and number of episodes), the structure that uses information about planned effort performed better for both control algorithms. However, that comes at the price of an increased duration of hyperglycemia. Moreover, as with each feedforward control system, the beginning, intensity and duration of the physical effort must be predicted accurately; otherwise its performance will be unacceptable. Further research is needed to determine to what extent inaccuracies in the prediction of the efforts' parameters hinder the efficiency of such a control structure.
Using glucagon as an additional control signal provides an alternative to the feedforward structure. Though slightly less efficient in reducing the hypoglycemia duration, it requires no additional information about planned effort. For that reason this solution seems more robust, allowing patients a more flexible daily regime. However, such a solution requires changes to the construction of insulin pumps to allow for administering two substances.
Combining both approaches is possible, with the user interface allowing the patient to switch the controller from glucagon administration to the feed-forward structure when a certain effort is well planned and determined, as in the case of regular exercises.

Acknowledgement. This work was supported by the SUT internal grant for young
researchers (AW) and the SUT internal grant 02/040/BK_21/1022.

References
1. American diabetes association. 6. glycemic targets. Diabetes Care 40(Supplement
1), S48–S56 (2016). https://doi.org/10.2337/dc17-S009
2. Bergman, R.: The minimal model: yesterday, today and tomorrow. In: R. Bergman,
J. Lovejoy (eds.) The minimal model approach and determinants of glucose toler-
ance. Louisiana University Press, Baton Rouge, USA (1997)
3. Bertachi, A., Ramkissoon, C.M., Bondia, J., Vehí, J.: Automated blood glucose
control in type 1 diabetes: a review of progress and challenges. Endocrinología, Dia-
betes y Nutrición 65(3), 172–181 (2018). https://doi.org/10.1016/j.endinu.2017.10.
011
4. Blauw, H., Onvlee, A.J., Klaassen, M., van Bon, A.C., DeVries, J.H.: Fully closed
loop glucose control with a bihormonal artificial pancreas in adults with type 1
diabetes: an outpatient, randomized, crossover trial. Diab. Care 44(3), 836–838
(2021). https://doi.org/10.2337/dc20-2106
5. van Bon, A.C., Luijf, Y.M., Koebrugge, R., Koops, R., Hoekstra, J.B., DeVries,
J.H.: Feasibility of a portable bihormonal closed-loop system to control glucose
excursions at home under free-living conditions for 48 hours. Diab. Technol. Ther.
16(3), 131–136 (2014). https://doi.org/10.1089/dia.2013.0166
6. Brun, J., Guintrand-Hugret, R., Boegner, C., Bouix, O., Orsetti, A.: Influence of
short-term submaximal exercise on parameters of glucose assimilation analyzed
with the minimal model. Metabolism 44(7), 833–840 (1995). https://doi.org/10.
1016/0026-0495(95)90234-1
7. Colmegna, P.H., Bianchi, F.D., Sanchez-Pena, R.S.: Automatic glucose control
during meals and exercise in type 1 diabetes: Proof-of-concept in silico tests using a
switched LPV approach. IEEE Control Syst. Lett. 5(5), 1489–1494 (2021). https://
doi.org/10.1109/lcsys.2020.3041211
8. Herrero, P., Georgiou, P., Oliver, N., Reddy, M., Johnston, J., Toumazou, C.: A
composite model of glucagon-glucose dynamics for in silico testing of bihormonal
glucose controllers. J. Diab. Sci. Technol. 7(4), 941–951 (2013). https://doi.org/
10.1177/193229681300700416
9. Hirsch, I.: Type 1 diabetes mellitus and the use of flexible insulin regimens. Am.
Family Phys. 60(8), 2343–2356 (1999)
10. Hovorka, R., et al.: Nonlinear model predictive control of glucose concentration in
subjects with type 1 diabetes. Phys. Measur. 25(4), 905–920 (2004)
11. Briscoe, V., Davis, S.: Hypoglycemia in Type 1 Diabetes. In: Type 1 Diabetes in
Adults, pp. 203–220. CRC Press, Boca Raton (2007)
12. Lehmann, E., Deutsch, T.: A physiological model of glucose-insulin interaction in type 1 diabetes mellitus. J. Biomed. Eng. 14, 235–242 (1992)
13. Matejko, B., Kukułka, A., Kieć-Wilk, B., Stąpór, A., Klupa, T., Malecki, M.T.:
Basal insulin dose in adults with type 1 diabetes mellitus on insulin pumps in
real-life clinical practice: a single-center experience. Adv. Med. 2018, 1–5 (2018).
https://doi.org/10.1155/2018/1473160
14. Paiva, H.M., Keller, W.S., da Cunha, L.G.R.: Blood-glucose regulation using
fractional-order PID Control. J. Control Autom. Electr. Syst. 31(1), 1–9 (2019).
https://doi.org/10.1007/s40313-019-00552-0
15. Śmieja, J., Gałuszka, A.: Rule-based pid control of blood glucose level. In: A. Świer-
niak, J. Krystek (eds.) ’Teoria i zastosowania. T. 2’. Wydawnictwo Politechniki
Ślaskiej, Gliwice (2018)
16. Tabassum, M.F., Farman, M., Naik, P.A., Ahmad, A., Ahmad, A.S., Hassan, S.M.:
Modeling and simulation of glucose insulin glucagon algorithm for artificial pan-
creas to control the diabetes mellitus. Netw. Model. Anal. Health Inf. Bioinf. 10(1),
1–8 (2021). https://doi.org/10.1007/s13721-021-00316-4
17. Taleb, N., et al.: Efficacy of single-hormone and dual-hormone artificial pancreas
during continuous and interval exercise in adult patients with type 1 diabetes: ran-
domised controlled crossover trial. Diabetologia 59(12), 2561–2571 (2016). https://
doi.org/10.1007/s00125-016-4107-0
18. Taleb, N., Haidar, A., Messier, V., Gingras, V., Legault, L., Rabasa-Lhoret, R.:
Glucagon in artificial pancreas systems: potential benefits and safety profile of
future chronic use. Diabetes Obes. Metab. 19(1), 13–23 (2016). https://doi.org/
10.1111/dom.12789
19. Wendt, S., et al.: Model of the glucose-insulin-glucagon dynamics after subcuta-
neous administration of a glucagon rescue bolus in healthy humans. In: Proceedings
of The American Diabetes Association’s 76th Scientific Sessions. The American
Diabetes Association, New Orleans, Louisiana, United States (2016)
20. Global Report on Diabetes. World Health Organisation (2016)
21. Guidelines on second- and third-line medicines and type of insulin for the control
of blood glucose levels in non-pregnant adults with diabetes mellitus. World Health
Organisation, Geneva (2018)
Gene Expression Analysis of the Bladder
Cancer Patients Managed by Radical
Cystectomy

Anna Tamulewicz(B) and Alicja Mazur

Faculty of Biomedical Engineering, Silesian University of Technology, ul. Roosevelta 40, 41-800 Zabrze, Poland
Anna.Tamulewicz@polsl.pl

Abstract. The aim of this research was to find differentially expressed genes in groups of patients with different stages of bladder cancer. Proper analysis could help to find biomarkers responsible for the occurrence of the most invasive forms of bladder cancer. The microarray data (series GSE31684) was used, and the obtained genes were characterized and described. The data came from 93 bladder cancer patients managed by radical cystectomy. The research also examined the impact of various parameters on survival after surgery. The results of the analysis showed that some genes may allow the diagnosis of various cancer grades and stages. The research also presented how smoking, lymph node metastases, and cancer stage and grade can affect survival.

Keywords: Bladder cancer · Gene expression · Microarray · Radical cystectomy

1 Introduction
In Poland, people suffering from bladder cancer account for 6% of all cancer patients. This cancer ranks 3rd among men and 15th among women in terms of incidence. Statistically, men suffer from the disease 4 times more often than women [30]. It is the most common neoplasm of the urinary system and the 9th most frequently diagnosed neoplasm overall. Annually, 330,000 people suffer from it, of whom 130,000 die [9]. Early and proper diagnosis of the cancer is very important: the later the disease is detected, the larger the tumor, which begins to infiltrate other tissues and shortens life expectancy. Very often, bladder tumors recur and metastasize to other organs. These diseases are very serious; they are characterized by a low survival rate and mainly affect the elderly.
The detection of biomarkers indicating the presence of neoplastic changes could significantly help patients. Finding predictors to distinguish between invasive and non-invasive types of cancer can help start the right treatment faster and prevent unnecessary surgeries.
The main aim of this research was to discover the expression of genes respon-
sible for the occurrence of characteristic types of bladder cancer. This analysis
identified genes that could become biomarkers for detecting non-invasive and
invasive stages of cancer. Another task was to assess the influence of parame-
ters on the survival of people after radical cystectomy. The results may reveal
which variables could be important in the treatment of patients managed by
the surgery. The last stage of the research was to compare the data with the
available literature and the biological analysis of the detected genes.
In this research, gene expression data collected using DNA microarrays were
used. Microarrays (gene chips) are slides with a dense array of immobilized DNA
oligomers – usually single-stranded DNA molecules consisting of several or sev-
eral dozen nucleotides obtained by chemical synthesis. The analysis of microar-
ray data may be closely related to the occurrence of neoplasms. Determining the
expression of given genes indicates which of them affect the uncontrolled division
of cells, which results in the formation of cancerous structures [34]. This paper
presents an analysis of microarray data from patients with bladder cancer.

1.1 Bladder Cancer


Bladder cancer comes from the transitional epithelium that lines the bladder. It can take the form of a wart that penetrates inside the organ. If the tumor has not crossed the lamina propria of the mucosa, it is referred to as a superficial tumor. If it infiltrates the muscle layer, it is classified as an advanced stage. The TNM (Tumor Nodes Metastases) classification is used to assess the staging of bladder cancer (Table 1). Stages T1–T4 refer to the size and extent of the tumor. Stage T1 is recognized as problematic to assess due to the variable prognosis for patients. Another characteristic TNM criterion is the assessment of nearby lymph nodes: the N feature takes into account the number and size of the lymph nodes, and the M feature describes distant metastasis [4,26].

Table 1. TNM staging system of bladder cancer – primary tumor (based on [4] and [26])

Classification  Description
Tx              Primary tumor cannot be measured
T0              No evidence of tumor
Ta              Noninvasive papillary tumor
T1              Tumor invades subepithelial connective tissue
T2              Tumor invades muscle layer
T3              Tumor invades perivesical tissue
T4              Tumor invades at least one of the following organs: prostate, uterus, vagina, pelvic wall, abdominal wall

At present, the exact mechanisms by which a healthy epithelial cell changes into a cancerous one are still unknown. Over the years, it has been discovered that the oncogenesis of this type of cancer may be caused by mutations in tumor suppressor genes (e.g., p53) [17]. The process of activating proto-oncogenes is also important. Disturbances in the expression of the RAS and FGFR3 oncogenes are present in the non-invasive type of cancer (Ta). Loss of an arm of chromosome 9 is also often found [11]. There are also studies confirming the presence of 6 BLCA proteins characteristic of bladder cancer [32]. In addition to genetic factors, the causes of the cancer may also include cigarette smoking, exposure to aromatic amines, aniline or aromatic hydrocarbon compounds, the presence of arsenic in drinking water, and a family history of bladder cancer [23].
One treatment option is radical cystectomy. It is used in patients with infiltrating bladder cancer and in people with non-infiltrating cancer that has a high risk of progression. The indications include tumors with high oncological potential, high-grade T1 tumors, and non-infiltrating tumors that have not responded to other treatments. The cystectomy procedure consists of removing the urinary bladder, the seminal vesicles and the prostate in men; in women, the uterus and adnexa are removed. It is also recommended to remove at least 15–17 lymph nodes to prolong survival [7].

2 Methodology
The analyzed data was quality checked and normalized. Then, a statistical anal-
ysis and a survival analysis for selected parameters were conducted.

2.1 Dataset
The analyzed data (series GSE31684) was sourced from the Gene Expression Omnibus (GEO) database, one of the resources of the National Center for Biotechnology Information (NCBI). Affymetrix Human Genome U133 Plus 2.0 microarrays were used. The study involved 93 patients with bladder cancer who were managed by radical cystectomy [15]. The GSE31684 series has also been used in the research presented in the paper by S. Chen et al. [6] to determine genetic biomarkers related to survival in people with bladder cancer and to evaluate the prediction of a gene expression signature in these patients. The data description included the age of the patients at which the procedure was performed, the stage of the cancer, the current condition of the patients, and information on chemotherapy. The average age of the patients was 69 years, and the average survival after surgery was 48 months. The important parameters are included in Table 2.

2.2 Data Analysis


Data analysis was conducted in R (version 3.6.3), using the RStudio programming
environment and Bioconductor software tools.
The first step in the analysis was quality control according to the recommended
procedures for analyzing data from Affymetrix microarrays [33]. The scale
factor, background level, percentage of genes called present, and 3'/5' ratios
were analyzed, and the series was considered qualitatively correct.
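
A hedged sketch of this quality-control step with the simpleaffy package [33]
is given below; the working directory with the unpacked CEL files and the
informal 3-fold rule for scale factors are assumptions.

    library(affy)           # CEL file handling
    library(simpleaffy)     # Affymetrix QC metrics [33]

    raw <- ReadAffy()       # reads all CEL files from the working directory
    qcs <- qc(raw)          # standard Affymetrix QC report

    sfs(qcs)                # scale factors (conventionally within ~3-fold)
    avbg(qcs)               # average background level per array
    percent.present(qcs)    # percentage of probe sets called present
    ratios(qcs)             # 3'/5' ratios of control probe sets (e.g. GAPDH)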
Then expression measures were determined and the data were normalized using
the RMA procedure [16]. In the next step, the selection of differentially
expressed genes was performed on the basis of a one-dimensional significance
test (Student's t-test). The dataset was divided into two groups: high and
low tumor grade.

Table 2. Description of GSE31684 series data (based on [6] and [28])

Parameter                        Number
Gender
  Female                         25 (26.9%)
  Male                           68 (73.1%)
Tumor grade
  High                           87 (93.5%)
  Low                            6 (6.5%)
Lymph node status
  Positive                       28 (30.1%)
  Negative                       49 (52.7%)
  Unknown                        16 (17.2%)
Smoking
  In the past                    56 (60.2%)
  At present                     19 (20.4%)
  Never                          18 (19.4%)
Tumor stage
  Ta/T1                          27 (29.0%)
  T2                             55 (59.1%)
  T3                             10 (10.8%)
  T4                             1 (1.1%)
Tumor stage after cystectomy
  pTa/pT1                        15 (16.1%)
  pT2                            17 (18.3%)
  pT3                            42 (45.2%)
  pT4                            19 (20.4%)
Last known status
  No evidence of disease         28 (30.1%)
  Death from bladder cancer      38 (40.9%)
  Death from other causes        27 (29.0%)

A p-value correction by the Benjamini-Yekutieli procedure controlling the FDR
was also performed [3].
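
The Benjamini-Yekutieli procedure controls the FDR under arbitrary dependence
by tightening the Benjamini-Hochberg threshold by the factor
c(m) = 1 + 1/2 + ... + 1/m, i.e. the i-th smallest p-value is compared against
(i / (m * c(m))) * q [3]. A minimal sketch of the normalization and the
per-probe test, continuing the sketches above, is given below; it is not the
authors' exact script, the grade labels are assumed to come from the GEO
annotations, and the 0.05 cut-off is an assumption.

    eset <- rma(raw)                # RMA: background correction, quantile
                                    # normalization, summarization [16]
    x    <- exprs(eset)             # probes x samples matrix (log2 scale)

    high_grade <- clin$grade == "High"   # illustrative column name

    # One-dimensional significance test: Student's t-test per probe
    pvals <- apply(x, 1, function(probe)
      t.test(probe[high_grade], probe[!high_grade],
             var.equal = TRUE)$p.value)

    # Benjamini-Yekutieli correction controlling the FDR under dependency [3]
    padj <- p.adjust(pvals, method = "BY")
    deg  <- rownames(x)[padj < 0.05]     # significance threshold assumed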
An analysis of variance (ANOVA) according to the cancer stage was also
conducted. The dataset was divided into groups: T1, T2, T3 and T4.
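
The per-probe ANOVA can be sketched in the same style, assuming a factor
stage derived from the annotations; a pairwise comparison such as T1-T2 simply
repeats the test on the corresponding subset of samples.

    stage <- factor(clin$stage,
                    levels = c("T1", "T2", "T3", "T4"))  # illustrative

    # One-way ANOVA per probe across the four stages
    p_anova <- apply(x, 1, function(probe)
      summary(aov(probe ~ stage))[[1]][["Pr(>F)"]][1])

    # Pairwise contrast, e.g. T1 vs T2
    sel <- stage %in% c("T1", "T2")
    g   <- droplevels(stage[sel])
    p_t1t2 <- apply(x[, sel], 1, function(probe)
      summary(aov(probe ~ g))[[1]][["Pr(>F)"]][1])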
In addition to statistical analysis of gene expression, a survival analysis was
performed examining the influence of various parameters (cigarette smoking,
cancer stage, cancer grade, lymph node status). The Kaplan-Meier estimator
was used [18].
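
The Kaplan-Meier estimator [18] is S(t) = prod over t_i <= t of (1 - d_i/n_i),
where d_i is the number of deaths at time t_i and n_i the number of patients
still at risk. A minimal sketch with the survival package follows; the column
names for follow-up time (months), vital status and the grouping variables are
assumptions.

    library(survival)

    time   <- clin$follow_up_months       # illustrative
    status <- clin$outcome == "death"     # TRUE marks an observed event

    fit_all   <- survfit(Surv(time, status) ~ 1)             # overall (Fig. 1)
    fit_smoke <- survfit(Surv(time, status) ~ clin$smoking)  # groups (Fig. 2)

    summary(fit_smoke)    # survival probabilities at the observed event times
    plot(fit_all, xlab = "Months after cystectomy",
         ylab = "Survival probability")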

3 Results and Discussion


3.1 Statistical Analysis

The obtained set was divided into two classes according to tumor grade, and
the selection of differentially expressed genes was made. The t-test yielded
fifteen differentially expressed probes, corresponding to 14 unique genes,
since ABCC3 is represented by two probes (Table 3).
An ANOVA statistical analysis was also conducted across the cancer stages
T1, T2, T3 and T4. The analysis of the T1 and T2 stages yielded 15
differentiating genes (Table 4), and the comparison of the T1 and T3 stages
yielded 8 (Table 5). The BMP5 gene appeared among the differentiating genes
of both the T1–T2 and T1–T3 comparisons.
The comparison of gene expression in the T1 and T4 groups resulted in
144 genes. None of them matched the genes from the T1–T2 comparison, but
the PRICKLE1 gene was present in both the T1–T4 and T1–T3 comparisons.
In the comparison of the T2–T3 groups, a single gene was found (SUSD4).

Table 3. Information on differentially expressed genes from t-test

Probe Gene Description


202952_s_at ADAM12 ADAM metallopeptidase domain
204995_at CDK5R1 Cyclin dependent kinase 5 regulatory
205064_at SPRR1B Small proline rich protein
206510_at SIX2 SIX homeobox
208161_s_at ABCC3 ATP binding cassette subfamily
209466_x_at PTN Pleiotrophin
209641_s_at ABCC3 ATP binding cassette subfamily
213909_at LRRC15 Leucine rich repeat containing
213943_at TWIST1 Twist family bHLH transcription factor
214549_x_at SPRR1A Small proline rich protein
222774_s_at NETO2 Neuropilin and tolloid like
225930_at NKIRAS1 NFKB inhibitor interacting Ras like
228293_at DEPDC7 DEP domain containing
233110_s_at BCL2L12 BCL2 like
238673_at SAMD12 Sterile alpha motif domain containing

Table 4. Information on differentially expressed genes from ANOVA of T1–T2 groups

Probe Gene Description


1554029_a_at TTC37 Tetratricopeptide repeat domain
1555203_s_at SLC44A4 Solute carrier family 44 member
1555383_a_at POF1B POF1B actin binding protein
1555982_at ZFYVE16 Zinc finger FYVE-type containing
205431_s_at BMP5 Bone morphogenetic protein 5
205523_at HAPLN1 Hyaluronan link protein
205524_s_at HAPLN1 Hyaluronan link protein
207549_x_at CD46 CD46 molecule
208611_s_at SPTAN1 Spectrin alpha, non-erythrocytic
209534_x_at AKAP13 A-kinase anchoring protein
209624_s_at MCCC2 Methylcrotonoyl-CoA carboxylase
210704_at FEZ2 Elongation protein zeta
211574_s_at CD46 CD46 molecule
211958_at IGFBP5 Growth factor binding protein
211959_at IGFBP5 Growth factor binding protein
214007_s_at TWF1 Twinfilin actin binding protein
215235_at SPTAN1 Spectrin alpha, non-erythrocytic
222387_s_at VPS35 VPS35 retromer complex component
230895_at HAPLN1 Hyaluronan link protein

Table 5. Information on differentially expressed genes from ANOVA of T1–T3 groups

Probe Gene Description


201431_s_at DPYSL3 Dihydropyrimidinase like 3
205431_s_at BMP5 Bone morphogenetic protein
205907_s_at OMD Osteomodulin
205908_s_at OMD Osteomodulin
214587_at COL8A1 Collagen type VIII alpha 1 chain
221029_s_at WNT5B Wnt family member 5B
225214_at LOC100129034 Uncharacterized LOC100129034
226065_at PRICKLE1 Prickle planar cell polarity protein
226069_at PRICKLE1 Prickle planar cell polarity protein
228368_at ARHGAP20 Rho GTPase activating protein

The T2–T4 comparison yielded 147 genes, 127 of which were the same as those
determined in the T1–T4 comparison. The study of the T3–T4 relationship
indicated 140 differentially expressed genes; 129 of them were identical to
those in the T1–T4 study, while 121 overlapped with the genes found in the
T2–T4 comparison.
The series GSE31684 was also used in other studies (e.g., [6,28]). Comparing
the obtained results with the other analyses conducted on the same dataset,
no common genes were found.

3.2 Survival Analysis

The overall analysis of survival by months and the patient's condition (Fig. 1)
shows a 50% decrease in the probability of survival in the first 2 years after
surgery. A plateau is visible between the 25th and the 100th month, with the
survival rate between 50% and 30%. Around the 100th month there is a clear
decrease of 17 percentage points compared to the previous range. In the last
range, between the 100th and 150th month, survival stabilizes at approximately 13%.
The survival analysis with regard to lymph node status, cigarette smoking,
tumor stage and grade is presented in Fig. 2. The first analyzed parameter was
the condition of the lymph nodes. In the group of patients with metastases,
there was a decrease of 60 percentage points in the first 25 months. In the
group of patients with unknown metastatic status, a similar behavior of the
curve was observed. The best results are seen in the group of patients who
did not have any metastases.
For the second parameter (cigarette smoking), the greatest decrease in survival
in the first years after surgery was observed for ex-smokers. Twenty months
after the surgery, their survival rate was 40%, and over the next 90 months
it decreased further, down to about 5%.
The analysis of the impact of cancer stage on survival showed that the best
results were found in the Ta/T1 group.

Fig. 1. The overall survival analysis by the Kaplan-Meier estimator

Fig. 2. The survival analysis according to lymph node status, cigarette smoking, tumor
stage and grade by the Kaplan-Meier estimator

At this stage, the probability of survival was about 85–90%, but before the
100th month it dropped to about 47% and remained at this level. In the T2
group, the probability decreased by 30 percentage points within 2 years of the
surgery and remained stable until the 50th month. In the T3 stage, the
mortality of patients was high up to about 30 months after the procedure; the
probability of survival at that time reached 40%. For people in the T4 stage,
survival drops drastically for up to 2 years after the surgery and then
stabilizes at around 15%.
In the low tumor grade group, the probability decreased to 85% in the first
few months and remained at this level until the 85th month, after which it
fell to 55%. In the high tumor grade group, the probability of survival
decreases from the beginning of the analysis and stabilizes around the
110th month at a value of 12%.

3.3 Analysis of the Differentially Expressed Genes Obtained in the T-Test

The differentially expressed genes found in the t-test (Table 3) are described
below based on the Atlas of Genetics and Cytogenetics in Oncology and Haema-
tology [1]:
– ADAM12 – the studies showed that the expression level was correlated with
the stage and grade of tumor malignancy. After surgical removal of the tumor,
the expression level decreased, but increased in the case of relapse [1].
– PTN – studies have shown that high expression was associated with the
advanced stage of several tumors and short survival times. The expression
level was not associated with lymph node metastases or grade. The gene can
serve as a biomarker for predicting adverse outcomes in survival analysis [14].
– ABCC3 – high expression of this gene has been detected in patients with
bladder cancer. Despite this, the role of the gene in the human body remains
unclear. The levels of mRNA and protein in bladder cancer patients were
much higher than in the group of healthy people. Additionally, expression
was found to be associated with tumor size, malignancy and lymph node
metastasis. Research results indicate that ABCC3 may be a potential prog-
nostic marker [20].
– LRRC15 – research indicates its high expression in fibroblasts in many
tumors. In an experiment conducted on several types of cancer, the LRRC15
gene was positively expressed in 47% of bladder cancer cells [27].
– TWIST1 – it was shown that patients with high expression of this gene and
the YB-1 gene have lower survival rates than people with low expression
of these genes. Both genes can be considered as promoters of bladder cancer
progression [31]. Studies have also shown increased expression of the TWIST1
gene in 60% of bladder cancer patients who smoked and had worse clinical
results [12].
– NETO2 – the gene could be a new potential marker of kidney cancer. Research
has shown that the expression of this gene may also be associated with other
malignant neoplasms of the lung and the bladder [25].
– DEPDC7 – the DEPDC7 gene has not been identified as a tumor marker,
however the DEPDC1 protein from the same family was shown to be overex-
pressed in bladder cancer cells. It is also associated with several other types
of cancer and contributes to the carcinogenicity [19].
– BCL2L12 – the studies carried out on patients with bladder cancer showed
that the samples with malignant cells, compared to normal cells, were char-
acterized by a high expression of the BCL2L12 gene. Expression was also
correlated with a higher relapse rate in patients with Ta–T1 stage [13].

Among the 14 genes selected in the t-test, no information was found about 6 of
them: CDK5R1, SPRR1B, SPRR1A, SIX2, NKIRAS1, SAMD12. The remaining
8 genes are associated with cancer, including bladder tumors.

3.4 Analysis of the Differentially Expressed Genes Obtained in ANOVA

The differentially expressed genes from ANOVA of T1–T2 groups (Table 4) are
described below:

– SLC44A4 – studies showed that SLC44A4 is upregulated in epithelial tumors,
most notably prostate and pancreatic cancer [22].
– SHH – human tissue analysis confirmed that there was a higher expression of
SHH in benign neoplastic lesions compared to the invasive stage [29].
– CD46 – studies showed that gene expression was very low in healthy samples,
while tissue from bladder cancer patients showed increased expression in 29
out of 59 cases. High expression of the CD46 gene has a negative correlation
with the stage and malignancy of the tumor and with the survival time [10].
– SPTAN1 – SPTAN1 may affect tumor growth by increasing the migration of
damaged cells. High levels of cytoplasmic SPTAN1 can be used as a marker
for tumors. SPTAN1 also plays a role in survival, angiogenesis, and other
cellular mechanisms. In bladder tumors, the gene was detected in relapses. It
can therefore be used as a predictor of disease recurrence in its early stages
[2].
– IGFBP5 – studies have shown that IGFBP5 mRNA levels in bladder can-
cer samples were negatively correlated with IGF1R gene activation, indicat-
ing that IGFBP5 overexpression may be a useful marker of IGF1R inhibitor
insensitivity. It has also been proven that high expression levels are associ-
ated with poor prognosis in patients with urothelial carcinomas of the upper
urinary tract and bladder [24].

No references regarding their occurrence in bladder tumors were found in the
literature for the remaining 10 genes. However, this does not mean that their
expression is not associated with bladder cancer. In the analysis of the
relationship between the T1–T3 groups (Table 5), 8 genes were found, of which
the BMP5 gene had already been detected in the earlier comparison and its
connection with the development of neoplasms was described. The differentially
expressed genes from the ANOVA of the T1–T3 groups are described below:

– COL8A1 – research has shown that the gene is associated with the stage of
bladder tumors. In addition, the COL5A1 and COL8A1 genes were the most
significant in predicting the prognosis of bladder cancer and can be used
as predictors [8].
– WNT5B – a microarray analysis showed that the oncogenic genes FABP4,
HBP17, RGS4, TIMP3, WNT5B, URB and COL8A1 were significantly suppressed
by emodin, a strong mutagen. This supports the role of the two genes found
here (WNT5B and COL8A1) as cancer factors [5].
– PRICKLE1 – studies conducted in a group of patients with Ta and T1 stage
bladder cancer yielded 33 genes differentiating patients with relapse after
one year from those without relapse. A set of genes including PRICKLE1
showed significant expression in the group of patients without relapse. These
genes may constitute a pool of biomarkers for detecting early stages of
cancer [21].
The other genes have not been described. The gene found as a result of the
comparison of the T2–T3 groups was also checked; however, no studies indicate
an association of SUSD4 with the occurrence of neoplastic changes. Finding
119 common genes when comparing the T1–T4, T2–T4 and T3–T4 groups may
indicate that the invasive stages of cancer do not show distinct gene
expression from each other.

4 Summary and Conclusions


The analysis of data from 93 patients suffering from bladder cancer revealed
that the expression of certain genes influences different stages of cancer and
its malignancy. As a result of the t-test analysis, 14 genes differentiating
between high and low tumor grade were discovered. It was confirmed that 8 of
them (BCL2L12, ADAM12, PTN, ABCC3, LRRC15, TWIST1, NETO2, DEPDC7)
have been described as tumor predictors.
ANOVA analysis led to the detection of 15 genes distinguishing between the
T1–T2 stages. The studies confirmed that 4 of them (BMP5, CD46, SPTAN1,
IGFBP5) can be classified as tumor biomarkers and that the SLC44A4 gene
was present in healthy bladder cells. Research with a healthy control group
would need to be conducted to verify whether classifying this gene as a
biomarker discriminating between the T1–T2 stages is correct. Eight genes were found in the
T1–T3 analysis, with the BMP5 gene already present in the previous compar-
ison. Descriptions of the following genes and their connection with the cancer
development have been found in the literature: COL8A1, WNT5B, PRICKLE1.
The detection of 119 common genes for the T1–T4, T2–T4 and T3–T4 groups
may indicate similar gene expression at these stages.
Survival analysis showed that all the parameters studied: smoking, lymph
nodes metastasis, cancer stage and grade, influence the probability of survival.
The conducted research has shown that there are genes associated with the
occurrence of a given stage and tumor grade. Furthermore, the genes selected
in this study have not been described in other studies conducted on the series
GSE31684, which could provide new insights into the analysis of genes
responsible for the occurrence of bladder cancer. Further research could
determine whether all the genes found were classified correctly and could be
considered biomarkers.

References
1. Atlas of Genetics and Cytogenetics in Oncology and Haematology. http://atlasgeneticsoncology.org/
2. Ackermann, A., Brieger, A.: The role of nonerythroid spectrin alpha II in cancer.
J. Oncol. 2019, 1–14 (2019)
3. Benjamini, Y., Yekutieli, D.: The control of the false discovery rate in multiple
testing under dependency. Ann. Stat. 29(4), 1165–1188 (2001)

4. Bostrom, P.J., et al.: Staging and staging errors in bladder cancer. Eur. Urol. Suppl.
9, 2–9 (2010)
5. Cha, T.L.: Emodin modulates epigenetic modifications and suppresses bladder car-
cinoma cell growth. Mol. Carcinog. 54(3), 167–177 (2015)
6. Chen, S., Zhang, N., Shao, J., Wang, T., Wang, X.: A novel gene signature com-
bination improves the prediction of overall survival in urinary bladder cancer. J.
Cancer 10, 5744–5753 (2019)
7. Chłosta, P.: Radykalne wycięcie pęcherza moczowego metodą laparoskopową - tech-
nika, wyniki i ograniczenia. Przegląd urologiczny 2(66), 24–28 (2011). (in Polish)
8. Di, Y., Chen, D., Yu, W., Yan, L.: Bladder cancer stage-associated hub genes
revealed by WGCNA co-expression network analysis. Hereditas 156(1), 1–11
(2019)
9. Długosz, A., Królik, E.: Profilaktyka w raku pęcherza moczowego. Biuletyn Pol-
skiego Towarzystwa Onkologicznego Nowotwory 2(4), 321–327 (2017). (in Polish)
10. Do, M.H., et al.: Targeting CD46 enhances anti-tumoral activity of adenovirus type
5 for bladder cancer. Int. J. Mol. Sci. 19, 2694 (2018)
11. Drewa, T.: Biologia raka naciekającego błonę mięśniową pęcherza moczowego.
Przegląd urologiczny 2(66), 10–12 (2011). (in Polish)
12. Fondrevelle, M.E., et al.: The expression of twist has an impact on survival in
human bladder cancer and is influenced by the smoking status. Urol. Oncol. Semin.
Original Invest. 27(3), 268–276 (2009)
13. Foutadakis, S., Avgeris, M., Tokas, T., Stravodimos, K., Scorilas, A.: Increased
BCL2L12 expression predicts the short-term relapse of patients with TaT1 bladder
cancer following transurethral resection of bladder tumors. Urol. Oncol. Semin.
Original Invest. 32(1), 39.e29-39.e36 (2014)
14. Fröhlich, C., Albrechtsen, R., Dyrskjøt, L., Rudkjær, L., Ørntoft, T.F., Wewer,
U.M.: Molecular profiling of ADAM12 in human bladder cancer. Clin. Cancer Res.
12(24), 7359–7368 (2006)
15. Gene Expression Omnibus: Series GSE31684. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE31684
16. Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J.,
Scherf, U., Speed, T.P.: Exploration, normalization, and summaries of high density
oligonucleotide array probe level data. Biostatistics 4(2), 249–264 (2003)
17. Izdebska, M., Grzanka, A., Ostrowski, M.: Rak pęcherza moczowego - molekularne
podłoże genezy i leczenia. Kosmos 54(2–3), 213–220 (2005). (in Polish)
18. Kaplan, E.L., Meier, P.: Nonparametric estimation from incomplete observations.
J. Am. Stat. Assoc. 53(282), 457–481 (1958)
19. Liao, Z., Wang, X., Zeng, Y., Zou, Q.: Identification of dep domain-containing pro-
teins by a machine learning method and experimental analysis of their expression
in human HCC tissues. Sci. Rep. 6(1), 1–11 (2016)
20. Liu, X., et al.: Overexpression of ABCC3 promotes cell proliferation, drug resis-
tance, and aerobic glycolysis and is associated with poor prognosis in urinary
bladder cancer patients. Tumor Biol. 37(6), 8367–8374 (2016). https://doi.org/10.1007/s13277-015-4703-5
21. Mares, J., Szakacsova, M., Soukup, V., Duskova, J., Horinek, A., Babjuk, M.: Pre-
diction of recurrence in low and intermediate risk non-muscle invasive bladder can-
cer by real-time quantitative PCR analysis: cDNA microarray results. Neoplasma
60(3), 295–301 (2013)
22. Mattie, M., et al.: The discovery and preclinical development of ASG-5ME, an
antibody–drug conjugate targeting SLC44A4-positive epithelial tumors including
pancreatic and prostate cancer. Mol. Cancer Ther. 15(11), 2679–2687 (2016)

23. Mitra, A.P., Cote, R.J.: Molecular pathogenesis and diagnostics of bladder cancer.
Annu. Rev. Pathol. 4(1), 251–285 (2009)
24. Neuzillet, Y., et al.: IGF1R activation and the in vitro antiproliferative efficacy
of IGF1R inhibitor are inversely correlated with IGFBP5 expression in bladder
cancer. BMC Cancer 17(1), 1–12 (2017)
25. Oparina, N., et al.: Increase in NETO2 gene expression is a potential molecular
genetic marker in renal and lung cancers. Russ. J. Genet. 48, 506–512 (2012).
https://doi.org/10.1134/S1022795412050171
26. Poletajew, S.: Ocena stopnia zaawansowania raka pęcherza moczowego. Przegląd
urologiczny 3(73), 22–26 (2012). (in Polish)
27. Purcell, J.W., et al.: LRRC15 is a novel mesenchymal protein and stromal target
for antibody–drug conjugates. Cancer Res. 78(14), 4059–4072 (2018)
28. Riester, M., et al.: Combination of a novel gene expression signature with a clinical
nomogram improves the prediction of survival in high-risk bladder cancer. Clin.
Cancer Res. 18(5), 1323–1333 (2012)
29. Shin, K., et al.: Hedgehog signaling restrains bladder cancer progression by eliciting
stromal production of urothelial differentiation factors. Cancer Cell 26(4), 521–533
(2014)
30. Skrzypczyk, M., Grothuss, G., Dobruch, J., Chłosta, P.L., Borówka, A.: Rak
pęcherza moczowego w Polsce. Bladder cancer in poland. Postępy Nauk Medy-
cznych/Progress in Medicine 25(4), 311–319 (2012). (in Polish)
31. Song, Y.H., et al.: TWIST1 and Y-box-binding protein-1 are potential prognostic
factors in bladder cancer. Urol. Oncol. Semin. Original Invest. 32(1), 31.e1-31.e7
(2014)
32. Szymańska, B., Długosz, A.: The role of the BLCA-4 nuclear matrix protein in
bladder cancer. Adv. Hyg. Exp. Med./Postępy Higieny i Medycyny Doświadczalnej
71, 681–689 (2017)
33. Wilson, C.L., Miller, C.J.: Simpleaffy: a BioConductor package for affymetrix qual-
ity control and data analysis. Bioinformatics 21(18), 3683–3685 (2005)
34. Xiong, J.: Essential Bioinformatics. Cambridge University Press, New York (2006)
The Influence of Low-Intensity Pulsed
Ultrasound (LIPUS) on the Properties
of PLGA Biodegradable Polymer
Coatings on Ti6Al7Nb Substrate

Karolina Goldsztajn1(B), Janusz Szewczenko1, Joanna Jaworska2,
Katarzyna Jelonek2, Katarzyna Nowińska3, Wojciech Kajzer1,
and Marcin Basiaga1
1
Department of Biomaterials and Medical Devices Engineering, Faculty of
Biomedical Engineering, Silesian University of Technology, ul. Roosevelta 40,
41-800 Zabrze, Poland
{karolina.goldsztajn,janusz.szewczenko,wojciech.kajzer,
marcin.basiaga}@polsl.pl
2
Centre of Polymer and Carbon Materials of the Polish Academy of Sciences,
ul. M. Curie-Sklodowskiej 34, 41-819 Zabrze, Poland
{jjaworska,kjelonek}@cmpw-pan.edu.pl
3
Department of Applied Geology, Faculty of Mining, Safety Engineering
and Industrial Automation, Silesian University of Technology,
ul. Akademicka 2, 44-100 Gliwice, Poland
katarzyna.nowinska@polsl.pl

Abstract. The influence of LIPUS on the properties of biodegradable
polymer coatings containing an active substance on an anodized titanium
alloy was investigated. PLGA polymer coatings with ciprofloxacin
nium alloy was investigated. PLGA polymer coatings with ciprofloxacin
were applied using the dipping method. The samples were stimulated by
ultrasound in Ringer’s solution or were exposed only to solution for 1, 3 or
4 weeks. The influence of ultrasound stimulation was determined by micro-
scopic observation, wettability, metallic ion release and drug release. The
polymer coating used in the study is characterized by hydrophilic prop-
erties and barrier properties limiting the release of substrate degradation
products into solution. Application of LIPUS caused a decrease in the
coatings' barrier properties and an increase in their hydrophilicity.
Nevertheless, the results show the usefulness of polymer coatings in bone
fracture stabilizers. Moreover, the application of biodegradable polymer
coatings enables drug delivery to the bone fracture site, and the release
can be controlled by the parameters of LIPUS therapy.

Keywords: LIPUS · Titanium alloys · Biodegradable polymer coatings

1 Introduction
Titanium alloys are one of the most commonly used metallic biomaterials. High
corrosion resistance, low density and good biocompatibility in an environment
of tissue and body fluids are properties that determine their usefulness, espe-
cially as implants for osteosynthesis. However, after many years of research,
it has been proved that they are not fully biologically inert, but may cause
allergies and other adverse reactions. For this reason, and to improve their
biocompatibility, bioactivity and corrosion resistance, the use of a titanium
alloy requires proper surface treatment. One of the most popular methods is
anodic oxidation, which allows a passive layer to be obtained with properties
controlled by the process parameters. However, anodization does not prevent
the migration of metal ions (such
as vanadium, aluminum or niobium) from the implant surface to the tissue envi-
ronment [5,6,11].
One of the modification methods may be the use of biodegradable polymer
coatings. The application of polymer coatings such as poly(D,L-lactide-co-glycolide)
(PLGA) improves biocompatibility by limiting the penetration of metal ions into
the tissue and body fluid environment, which was proven in previous studies.
Moreover, the degradation of the polymer will not deteriorate the mechanical
properties of the implant, since stability is ensured by the metal substrate [4,
9,10]. In addition, biodegradable coatings can also serve as a matrix for the release of
active substances. Released substances may have a beneficial effect on the bone
tissue healing process and also reduce the need for systemic drug therapy. One
of the available substances is ciprofloxacin (CFX) [3].
Ciprofloxacin is a second-generation fluoroquinolone. It is used in case of
urinary tract infections, chronic bacterial prostatitis, bone and joint infections,
lower respiratory tract infections, acute sinusitis, and skin infections due to its
antimicrobial activity. Release kinetics of the CFX is related to the degradation
process of the polymer coating and affects the achieved therapeutic effect [7,12].
To support the bone healing process, not only locally delivered drugs can be
used, but also stimulation with physical factors, which include, among others,
low-intensity pulsed ultrasound.
Low-Intensity Pulsed Ultrasound (LIPUS) is a non-invasive therapy support-
ing fracture healing. Ultrasound waves induce micromechanical stress at a frac-
ture site, which stimulates molecular and cellular responses involved in fracture
healing. LIPUS therapy is frequently used to enhance or accelerate fracture heal-
ing by employing a sinusoidal ultrasonic wave with specific parameters [1,8].
There are no previous reports describing the influence of LIPUS on prop-
erties of biodegradable polymer coatings. Therefore, the aim of the research
was to determine the influence of Low-Intensity Pulsed Ultrasound therapy
on the degradation of PLGA polymer coating formed on titanium alloy and
ciprofloxacin release kinetics.

2 Materials and Methods

The material used in the tests was Ti6Al7Nb alloy. The samples in the form of
discs were taken from rods 25 mm in diameter. Their chemical composition,
structure, and mechanical properties met the requirements of the ISO 5832-11
standard [2]. The surface of the samples was modified by grinding on 120, 320 and
536 K. Goldsztajn et al.

500 grit sandpaper, sandblasting and anodic oxidation. Anodic oxidation was
carried out in a bath based on phosphoric and sulfuric acids at 97 V for 2 min.
The surface of the metal substrate was coated by the dip-coating method
with a biodegradable poly(D,L-lactide-co-glycolide) (PLGA) copolymer with a
comonomer ratio of 85:15, containing ciprofloxacin (CFX). PLGA was synthesized
by the ring-opening polymerization of glycolide and D,L-lactide at 130 °C
for 24 h and 120 °C for 48 h under an argon atmosphere, using a zirconium(IV)
compound as a non-toxic initiator. The samples were immersed in a solution of
PLGA with CFX in dichloromethane using a Dip Coater PTL-OV6P, MTI
CORPORATION (1 dipping cycle, 30 s immersion time, 1500 mm/min immersion
rate). Coated samples were dried first for 3 days in air and then for one week
under reduced pressure. All samples were sterilized using radiation with an
energy of 10 MeV and a dose of 25 kGy.
The non-coated and coated samples were immersed in 0.1 dm3 of Ringer's
solution with the following chemical composition: NaCl – 8.6 g/dm3,
CaCl2·2H2O – 0.33 g/dm3, KCl – 0.3 g/dm3. The samples subjected to LIPUS
therapy were treated for 20 min per day with the following parameters:
ultrasound frequency 1.5 MHz, intensity 30 mW/cm2, pulse duration 250 µs and
a repetition rate of 1 kHz. Samples only immersed in Ringer's solution were
used as reference material. During the exposure, the samples were kept in a
heating chamber at 37 °C (Binder FD 115).
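
As a quick consistency check of these pulse parameters (reading the quoted
30 mW/cm2 as the temporal-average intensity is our assumption, not stated in
the protocol), a short R sketch:

    pulse_s <- 250e-6            # pulse duration: 250 us
    prf_hz  <- 1e3               # pulse repetition frequency: 1 kHz
    duty    <- pulse_s * prf_hz  # 0.25, i.e. a 1:4 (25%) duty cycle

    i_ta <- 30                   # mW/cm^2, assumed temporal-average intensity
    i_pa <- i_ta / duty          # 120 mW/cm^2 within each pulse
    dose <- i_ta / 1000 * 20 * 60  # ~36 J/cm^2 per 20-min daily session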
The samples' surface observations were performed using a Zeiss Stereo
Discovery V8 stereoscopic microscope with an MC5s camera. Tests were carried
out for the metal substrate and the samples with polymer coatings in the
initial state and after 1, 3 and 4 weeks of exposure.
To determine the surface wettability of the metal substrate and the
biodegradable polymer coatings before and after 1, 3 and 4 weeks of exposure,
contact angle measurements (θ) were performed using a 1.5 mm3 drop of
distilled water at room temperature (T = 23 °C), on a test stand consisting
of a SURFTENS UNIVERSAL goniometer (OEG) and a PC with Surftens 4.5
software for analyzing the recorded drop images. The measurements were carried
out over 60 s with a sampling rate of 1 Hz.
The metal ion concentration in Ringer's solution was measured with a JY 2000
spectrometer (Jobin Yvon) using the Inductively Coupled Plasma-Atomic
Emission Spectrometry (ICP-AES) method. The studies were performed for both
the non-coated and the coated samples after 1, 3 and 4 weeks of exposure.
Ciprofloxacin release from the polymer coatings was investigated using the
extraction method. The supernatant was analyzed using high-performance
liquid chromatography (HPLC). Measurements were carried out using a VWR-
Hitachi LaChromElite apparatus with a LiChroCART Purospher STAR RP-
18e column (150 × 4.6 mm, 5 µm). The mobile phase consisted of 2% glacial
acetic acid and acetonitrile (84:16) at a flow rate of 1 ml/min. CFX was
monitored by a diode array detector at 280 nm. The studies were performed for
samples after 1, 3 and 4 weeks of LIPUS therapy and exposure to Ringer's solution.

3 Results
3.1 Surface Observations

Macroscopic observations of the metal substrate in the initial state showed the
topography characteristic for sandblasting, as well as uneven coloration result-
ing from anodic oxidation (Fig. 1a). The biodegradable polymer coatings with
ciprofloxacin applied on the metal substrate are homogeneous, continuous and
transparent (Fig. 2a). Macroscopic observations after exposure to Ringer's
solution showed the presence of salt crystals on the surface of the metal
substrate (Fig. 1d). No visible changes of the PLGA coating were observed
after exposure to the corrosive environment (Fig. 2). After LIPUS treatment,
local brown discolorations were observed on the surface of the titanium alloy,
and their number increased with the duration of exposure; after three weeks,
however, these discolorations still did not cover the entire surface, and they
were not observed after four weeks or on the surface of the polymer coatings
(Fig. 3). Heterogeneities and discolorations on the polymer coatings' surface
were already noticeable after 1 week of LIPUS therapy. Their number and area
increased with the duration of the exposure, which resulted from the
degradation of the polymer (Fig. 4).

Fig. 1. Exemplary images of the Ti6Al7Nb surface: a) in the initial state, b) after
1 week, c) after 3 weeks, d) after 4 weeks in Ringer's solution

Fig. 2. Exemplary images of the PLGA polymer coating: a) in the initial state,
b) after 1 week, c) after 3 weeks, d) after 4 weeks in Ringer's solution

3.2 Wettability

All analyzed surfaces were hydrophilic (Fig. 5). The surface of the substrate
in the initial state was characterized by the lowest wettability. Application
of biodegradable polymer coatings containing the active substance increased
wettability. Exposure to Ringer's solution caused a decrease in the contact
angle of the coatings, probably due to swelling in the solution, and the
wettability of the tested surfaces improved with exposure time. The LIPUS
treatment almost always decreased the contact angle, both for the substrate
surface and for the PLGA coatings, and the improvement in wettability grew
with the duration of exposure. However, after four weeks in Ringer's solution
combined with LIPUS, a decrease in wettability was observed for the titanium
alloy.

3.3 Ion Release

The mass density of metal ions in Ringer's solution after 1, 3 and 4 weeks
of exposure is shown in Fig. 6. The highest mass density of ions (Ti, Al, Nb)
permeating from the surface into the solution was observed for the Ti6Al7Nb
alloy substrate. Biodegradable polymer coatings with the active substance
applied on the surface of the metal substrate significantly decreased the mass
density of metal ions in the solution. This proves that the polymer coatings
fulfill a protective function as a barrier to the degradation products of the
metal substrate.

Fig. 3. Exemplary images of the Ti6Al7Nb surface: a) in the initial state, b) after
1 week, c) after 3 weeks, d) after 4 weeks of LIPUS therapy

The
mass density of titanium, aluminum and niobium ions in the solution increases
with the time of exposure to Ringer's solution for both non-coated and coated
samples. The application of LIPUS in the corrosive environment significantly
increased the mass density of ions released from the surface of the metal
substrate as well as from the samples with biodegradable polymer coatings.
This shows that LIPUS increases the degradation rate of biodegradable polymer
coatings containing the active substance. Moreover, longer LIPUS exposure in
Ringer's solution further increased the mass density of metallic ions.

3.4 CFX Release

Due to the low concentration of the drug in the coating and the detection limit
of the HPLC method, it was not possible to perform a quantitative analysis.
Only a qualitative analysis was carried out in the study.
An increase in the time of exposure to Ringer's solution, as well as to
Ringer's solution combined with LIPUS therapy, caused an increase in the
amount of ciprofloxacin released from the PLGA coating. However, the fastest
release of CFX was observed during the first week of exposure in both cases.
The use of LIPUS therapy increases the amount of released CFX, which is most
likely related to the increased degradation rate of the polymer coating.

Fig. 4. Exemplary images of the PLGA polymer coating: a) in the initial state,
b) after 1 week, c) after 3 weeks, d) after 4 weeks of LIPUS therapy

Fig. 5. Surface wettability of Ti6Al7Nb and polymer coatings

4 Discussion
The analysis of the obtained results shows that the application of LIPUS has an
impact on the degradation of PLGA polymer coatings.
In the initial state, the samples showed a topography characteristic of
sandblasting, which resulted in an uneven color after anodic oxidation.

Fig. 6. Density of metal ion mass penetrating into Ringer's solution: a) Ti ions,
b) Al ions, c) Nb ions

PLGA biodegradable polymer coatings containing ciprofloxacin, after being
applied to the surface, showed translucency, homogeneity and continuity
(Fig. 2). After exposure to Ringer's solution, salt crystals were observed on
the surface of the metal substrate (Fig. 1). The surfaces of both the substrate
and the biodegradable coatings subjected to LIPUS show a heterogeneity not
observed in the case of samples exposed to the corrosive environment only
(Fig. 3, 4).
The research showed that the use of PLGA polymer coatings with active
substance leads to an improvement in the wettability of the surface. In addition,
hydrophilicity improves with the time of exposure to Ringer's solution, most
likely as a result of the biodegradable coating swelling in the solution.
LIPUS therapy decreases the contact angle of the titanium alloy as well as
polymer coatings after 1, 3 and 4 weeks of exposure. After the same exposure
periods, a higher wettability of the polymer coatings is observed for samples
subjected to LIPUS combined with Ringer's solution than for exposure to the
corrosive environment only (Fig. 5). This may indicate that ultrasonic waves
increase the rate of polymer coating degradation.
In Ringer’s solution, in which non-coated and coated samples were immersed,
the presence of titanium, aluminum and niobium ions was observed. The highest
concentration of metal ions was released from the surface of Ti6Al7Nb alloy. The
application of the polymer coatings containing ciprofloxacin causes a reduction
in the metal ions’ mass density compared to the metal substrate. This demon-
strates good protective properties of the polymer coatings effectively limiting
the degradation of the metal substrate. In addition, the mass density of ions
released from the surface increases with the time of exposure to Ringer’s solu-
tion. After LIPUS treatment in the corrosive environment, a significant increase
in the mass density of ions released from the surface was observed in all ana-
lyzed cases. Additionally, larger amounts of ions were observed in the solution
after extended periods of exposure, indicating a deterioration of the barrier
function fulfilled by the biodegradable polymer coatings (Fig. 6). Moreover,
with the time of exposure to both the corrosive environment alone and to
LIPUS in Ringer's solution, an increase in the amount of CFX released from
the biodegradable coating was observed. The use of LIPUS therapy also
increases the amount of released ciprofloxacin compared to exposure of the
polymer coatings to the corrosive environment only. Together with the
macroscopic observations of the surface and the results of the wettability
tests, it can be assumed that the application of LIPUS may increase the rate
of degradation of the polymer coating and of active substance release, which
can be controlled by the therapy parameters.

5 Conclusions
Poly(lactide-co-glycolide) (PLGA) biodegradable polymer coatings containing
ciprofloxacin formed on Ti6Al7Nb were characterized as continuous,
homogeneous and translucent. Polymer coatings formed on titanium alloys
decrease the mass density of metal ions released from the metal substrate into
the environment. Moreover, the degradation of the polymer coatings and the
release of ciprofloxacin can be controlled by the parameters of LIPUS therapy.

References
1. Berber, R., Aziz, S., Simkins, J., Lin, S.S.: Low intensity pulsed ultrasound therapy
(LIPUS): a review of evidence and potential applications in diabetics. J. Clin.
Orthop. Trauma 11, 500–505 (2020). https://doi.org/10.1016/j.jcot.2020.03.009
2. ISO 5832-11:2014 Implants for surgery – Metallic materials – Part 11: Wrought titanium 6-aluminium 7-niobium alloy
3. Jaworska, J., et al.: Development of antibacterial, ciprofloxacin-eluting biodegrad-
able coatings on Ti6Al7Nb implants to prevent peri-implant infections. J. Biomed.
Mater. Res. Part A 108, 1006–1015 (2020). https://doi.org/10.1002/jbm.a.36877
4. Kajzer, W., et al.: Corrosion resistance of Ti6Al4V alloy coated with caprolactone-
based biodegradable polymeric coatings. Maintenance Reliab. 20(1), 30–38 (2018).
https://doi.org/10.17531/ein.2018.1.5
5. Kiel-Jamrozik, M., Szewczenko, J., Basiaga, M., Nowińska, K.: Technological
capabilities of surface layers formation on implant made of Ti-6Al-4V ELI alloy.
Acta Bioeng. Biomech. 17(1), 31–37 (2015). https://doi.org/10.5277/ABB-00065-2014-03
6. Liu, X., Chu, P.K., Ding, C.: Surface modification of titanium, titanium alloys,
and related materials for biomedical application. Mater. Sci. Eng. R 47, 49–121
(2004)
7. Ma, X., Xia, Y., Xu, H., Lei, K., Lang, M.: Preparation, degradation and in
vitro release of ciprofloxacin-eluting ureteral stents for potential antibacterial
application. Mater. Sci. Eng. C 66, 92–99 (2016). https://doi.org/10.1016/j.msec.2016.04.072
8. Rutten, S., van den Bekerom, M.P.J., Sierevelt, I.N., Nolte, P.A.: Enhancement of bone-healing by low-intensity pulsed ultrasound. JBJS Rev. 4(3) (2016). https://doi.org/10.2106/jbjs.rvw.o.00027
9. Szewczenko, J., Kajzer, W., Grygiel-Pradelok, M., Jaworska, J., Jelonek, K.,
Nowińska, K., Gawliczek, M., Libera, M., Marcinkowski, A., Kasperczyk, J.: Cor-
rosion resistance of PLGA-coated biomaterials. Acta Bioeng. Biomech. 19(1), 173–
179 (2017). https://doi.org/10.5277/ABB-00556-2016-04
10. Szewczenko, J., et al.: Biodegradable polymer coatings on Ti6Al7Nb alloy. Acta Bioeng. Biomech. 21(4), 83–92 (2019). https://doi.org/10.37190/ABB-01461-2019-01
11. Wang, M.: Surface modification of metallic biomaterials for orthopedic applications. Mater. Sci. Forum 618–619, 285–290 (2009). https://doi.org/10.4028/www.scientific.net/MSF.618-619.285
12. Zhang, G.F., Liu, X., Zhang, S., Pan, B., Liu, M.L.: Ciprofloxacin derivatives and their antibacterial activities. Eur. J. Med. Chem. 146, 599–612 (2018). https://doi.org/10.1016/j.ejmech.2018.01.078
Author Index

A D
Affanasowicz, Alicja, 321 Danch-Wierzchowska, Marta, 28, 43, 321, 332,
Augustyniak, Piotr, 66, 367 435
Dardzińska-Głębocka, Agnieszka, 107
Domino, Małgorzata, 54
B Doroniewicz, Iwona, 321, 332
Badura, Aleksandra, 345, 406 Dowgierd, Krzysztof, 377
Badura, Dariusz, 332, 435
Badura, Paweł, 84 F
Bajger, Adam, 498 Farhan, Nabeel, 15
Barańska, Klaudia, 94 Fujarewicz, Krzysztof, 487
Basiaga, Marcin, 534
Bednorz, Adam, 194 G
Bibrowicz, Karol, 393 Gaus, Olaf, 15
Gertych, Arkadiusz, 271
Bieńkowska, Maria, 194, 345
Goldsztajn, Karolina, 534
Biguri, Ander, 107
Gorzkowska, Agnieszka A., 28
Borowska, Marta, 54, 107
Grzegorzek, Marcin, 3, 285, 295, 307
Brombach, Nick, 15
Gumulski, Jakub, 455
Brück, Rainer, 15
Bugdol, Marcin, 28, 43, 76, 377, 406, 435 H
Bugdol, Monika N., 28, 43, 332, 406, 435 Hahn, Kai, 15
Hoffmann, Raoul, 3
Hu, Weiming, 285
C
Celniak, Weronika, 66 J
Cenda, Piotr, 143 Jagodzińska, Adrianna, 271
Chen, Haoyuan, 285, 295 Jakubikova, Lucia, 498
Cieślak, Adam, 119 Jangas, Mariusz, 168
Cyprys, Paweł, 271 Jankowska, Marta, 455
Czajkowska, Joanna, 208 Jasiński, Tomasz, 54
Czak, Mirosław, 393, 421 Jasionowska-Skop, Magdalena, 234, 246
Czarlewski, Robert, 28 Jaworska, Joanna, 534


Jędzierowska, Magdalena, 155 O


Jelonek, Katarzyna, 534 Obuchowicz, Rafał, 119, 143
Jesionek, Magdalena, 406 Ostrek, Grzegorz, 234, 246

K P
Kajzer, Wojciech, 534 Pająk, Anna, 367
Kałuża, Justyna, 132 Pawelec, Łukasz, 421
Kania, Damian, 377, 393 Pierides, Iro, 498
Kasik, Vladimir, 356 Pietka, Ewa, 345
Kawa, Jacek, 194 Pietruszewska, Wioletta, 132
Keil, Alexander, 15 Piórkowski, Adam, 119, 143
Kieszczyńska, Katarzyna, 321 Płatkowska-Szczerek, Anna, 208
Kobus, Monika, 168 Pociask, Elżbieta, 181
Kociołek, Marcin, 222 Polewczyk, Zofia, 377
Koprowski, Robert, 155 Pollak, Anita, 393, 406
Korzekwa, Szymon, 208 Popelinsky, Lubos, 498
Kostka, Paweł, 443 Prochazka, Michal, 356
Kostoval, Ales, 498 Przelaskowski, Artur, 234, 246
Krakowczyk, Łukasz, 377 Psiuk-Maksymowicz, Krzysztof, 487
Krasnodębska, Paulina, 406 Pyciński, Bartłomiej, 271
Kręcichwost, Michał, 84
Kulesa-Mrowiecka, Małgorzata, 377
R
Radomski, Dariusz, 474
L
Rojewska, Katarzyna, 94
Latos, Dominika, 321
Roleder, Tomasz, 181
Łakomiec, Krzysztof, 487
Romaniszyn-Kania, Patrycja, 377, 393, 406
Ławecka, Magdalena, 84
Rosiak, Maria, 393
Ledwoń, Daniel, 43, 321, 332
Różańska, Agnieszka, 94
Li, Chen, 285, 295, 307
Li, Yixin, 295
Liebenow, Laura, 3 S
Lipowicz, Anna, 76, 377, 421 Sage, Agata, 84
Lipowicz, Paweł, 107 Schwarzerova, Jana, 498
Liu, Wanli, 285 Sedlar, Karel, 498
Sieciński, Szymon, 443
M Sitek, Emilia J., 194
Maćkowska, Stella, 94 Śmieja, Jarosław, 510
Małecki, Andrzej S., 28 Smoliński, Michał, 194
Małek, Weronika, 181 Sobczak, Karolina, 168
Mańka, Anna, 393 Spinczyk, Dominik, 94, 455
Maśko, Małgorzata, 54 Steinhage, Axel, 3
Matyja, Małgorzata, 321, 332 Strąkowska, Maria, 222
Mazur, Alicja, 522 Strumiłło, Paweł, 132
Miodońska, Zuzanna, 84 Strzelecki, Michał, 168
Mitas, Andrzej W., 28, 43, 393, 406, 421 Sun, Hongzan, 285, 295, 307
Mitas, Julia M., 28, 465 Świątek, Adrian, 168
Moćko, Natalia, 84 Szewczenko, Janusz, 534
Moskal, Alicja, 234 Szkiełkowska, Agata, 406
Mrozek, Adam, 332 Szurmik, Tomasz, 393
Myśliwiec, Andrzej, 321, 332, 345, 377 Szymańska, Dżesika, 208

N T
Niebudek-Bogusz, Ewa, 132 Tamulewicz, Anna, 522
Nowińska, Katarzyna, 534 Turner, Bruce, 393

U Wyciślok, Artur, 510


Urzeniczok, Mirella, 443 Wyleżoł, Natalia, 271
Uzdowska, Julia, 271
Z
Żak, Weronika, 261
W Zarychta, Piotr, 261
Walter, Jasmin, 3 Zawiślak-Fornagiel, Katarzyna, 28, 43, 465
Weckwerth, Wolfram, 498 Zhang, Haiqing, 295
Wilczyński, Sławomir, 155 Zhang, Jiawei, 307
Wilk, Agata, 487 Zheng, Yuchao, 295
Witek, Agnieszka, 76 Zhou, Xiaomin, 295
