Ewa Pietka
Pawel Badura
Jacek Kawa
Wojciech Wieclawek Editors
Information
Technology in
Biomedicine
9th International Conference, ITIB 2022
Kamień Śląski, Poland, June 20–22, 2022
Proceedings
Advances in Intelligent Systems and Computing
Volume 1429
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland
Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing,
Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering,
University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University,
Győr, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas
at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao
Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology,
University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute
of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro,
Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management,
Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering,
The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the areas of modern intelligent systems and computing such as: computational intelligence, soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms, social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and society, cognitive science and systems, perception and vision, DNA and immune based systems, self-organizing and adaptive systems, e-learning and teaching, human-centered and human-centric computing, recommender systems, intelligent control, robotics and mechatronics including human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent agents, intelligent decision making and support, intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia.
The publications within “Advances in Intelligent Systems and Computing” are
primarily proceedings of important conferences, symposia and congresses. They
cover significant recent developments in the field, both of a foundational and
applicable character. An important characteristic feature of the series is the short
publication time and world-wide distribution. This permits a rapid and broad
dissemination of research results.
Indexed by DBLP, INSPEC, WTI Frankfurt eG, zbMATH, Japanese Science and
Technology Agency (JST).
All books published in the series are submitted for consideration in Web of Science.
For proposals from Asia please contact Aninda Bose (aninda.bose@springer.com).
Editors
Ewa Pietka
Faculty of Biomedical Engineering
Silesian University of Technology
Gliwice, Poland

Pawel Badura
Faculty of Biomedical Engineering
Silesian University of Technology
Gliwice, Poland
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
The continuous growth of the amount of medical information and the variety of multimodal content creates a demand for fast and reliable technology able to process data and deliver results in a user-friendly manner at the time and place the information is needed. Multimodal acquisition systems, AI-powered applications, and computational methods for understanding human complexity, sound and motion in physiotherapy, and prevention give new meaning to the optimization of the functional requirements of the healthcare system for the benefit of the patients. We present to the readers a book that includes chapters written by members of the academic community. The scientific scope of the particular sections includes the aspects listed below.
New explainable approaches to Computational Methods for Understanding Human Complexity have become a challenging artificial intelligence frontier, able to discover and investigate the relationships between internal biomedical processes of the human body and the massive amount of heterogeneous data that can be collected from humans (image data, sensor signals, phenotype, genotype, microbiome, clinical history, etc.). Approaches discovering the complex character of the
human body and mind employ data-intensive computational methods, e.g., pattern
recognition, machine learning, data science, and pervasive computing. Cognitive
disorders in Parkinson’s patients, human activity recognition, behavior changes due
to a hospital stay, multimodal emotion recognition systems, auditory processing
disorder in children, natural language features in patients with anorexia, and sleep
quality in population study are just examples of applications discussed in the first
part.
The Image Analysis section presents original studies reporting on scientific approaches to support CT and MR brain image analysis, laryngeal image processing from high-speed video endoscopy, monitoring of changes in corneal structure, and skin layer segmentation. Computational pathology and cell studies have been carried out on breast, cervical, and ileum images for analysis and classification. A study on chest X-rays in patients with COVID-19 is a response to the need to face the pandemic. This section also covers fundamental studies on
The editors would like to express their gratitude to the authors who have submitted their original research papers and to all the reviewers for their valuable comments. Your effort has contributed to the high quality of the book we proudly pass on to the readers.
Image Analysis
Comparison of Analytical and Iterative Algorithms
for Reconstruction of Microtomographic Phantom Images and Rat
Mandibular Scans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Paweł Lipowicz, Agnieszka Dardzińska-Głębocka, Marta Borowska,
and Ander Biguri
Comparison of Interpolation Methods for MRI Images Acquired
with Different Matrix Sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
Adam Cieślak, Adam Piórkowski, and Rafał Obuchowicz
Preprocessing of Laryngeal Images from
High-Speed Videoendoscopy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Justyna Kałuża, Paweł Strumiłło, Ewa Niebudek-Bogusz,
and Wioletta Pietruszewska
Construction of a Cephalometric Image Based on Magnetic
Resonance Imaging Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
Piotr Cenda, Rafał Obuchowicz, and Adam Piórkowski
Analysis of Changes in Corneal Structure During Intraocular
Pressure Measurement by Air-Puff Method . . . . . . . . . . . . . . . . . . . . . . 155
Magdalena Jędzierowska, Robert Koprowski, and Sławomir Wilczyński
Discrimination Between Stroke and Brain Tumour in CT Images
Based on the Texture Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
Monika Kobus, Karolina Sobczak, Mariusz Jangas, Adrian Świątek,
and Michał Strzelecki
The Influence of Textural Features on the Differentiation
of Coronary Vessel Wall Lesions Visualized on IVUS Images . . . . . . . . 181
Weronika Małek, Tomasz Roleder, and Elżbieta Pociask
Signal Processing
Activities Classification Based on IMU Signals . . . . . . . . . . . . . . . . . . . . 435
Monika N. Bugdol, Marta Danch-Wierzchowska, Marcin Bugdol,
and Dariusz Badura
Heart Rate Measurement Based on Embedded Accelerometer
in a Smartphone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443
Mirella Urzeniczok, Szymon Sieciński, and Paweł Kostka
Non-invasive Measurement of Human Pulse Based on Photographic
Images of the Face . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455
Jakub Gumulski, Marta Jankowska, and Dominik Spinczyk
The Validation Concept for Automatic Electroencephalogram
Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
Julia M. Mitas and Katarzyna Zawiślak-Fornagiel
Do Contractions of Abdominal Muscles Bias Parameters Describing
Contractile Activities of a Uterus? A Preliminary Study . . . . . . . . . . . . 474
Dariusz Radomski
1 Introduction
For this project, we investigated the topic of behavioural analysis through motion
data recorded with floor sensors. Specifically, we look at the differences in the
behaviour of the same person before and after a hospitalisation.
The floor sensor used for this work is a relatively new type of sensor which,
due to its robustness, is able to cover large areas by being installed inconspicuously under the normal floor covering. This makes versatile use conceivable for
support in many areas, such as care. Due to under-staffing in nursing homes,
hospitals and similar institutions, it is not possible to care for and medically
monitor all patients to the extent that would be desirable. For people in need of
care, it is necessary to take enough time to assess the short-term and long-term
consequences of their impairment in order to be able to adapt the degree of
treatment individually.
With our work, we want to take a further step towards support with floor sensors, especially in the field of care. We set out to show that it is possible to
detect changes in a person’s behaviour using data from a sensor floor alone. For
this purpose, we have made the assumption that a person behaves differently
before and after a hospitalisation due to the physical and psychological stress of
that hospital stay. Our data set is an anonymised recording of floor sensor data
from a single nursing home resident whose entire room, including the bathroom,
is equipped with floor sensors that record data 24 h a day. There is also informa-
tion about a two-week hospital stay of the said resident. To analyse and classify
the data, we use different classification methods, such as a Multi-Layer Percep-
tron, Gaussian Naive Bayes classifier, a Support Vector Machine and Random
Decision Forests. We compare the results and analyse their classification quality
and reliability. In summary, our scientific contribution is to classify the data measured by a new variant of sensor, a large-area floor sensor, using known methods. Our long-term goal is to use the ability to detect changes in people's behaviour from floor sensor data alone to enable individualised analyses that provide predictive and decision-making support for treating doctors and nurses.
2 Related Work
Declining health in old age can result in behavioural changes and anomalies when compared to a previous healthy period. These behaviours can be measured unobtrusively by means of technical sensors in smart home environments, as was done with motion and environmental sensors in several studies on long-term behavioural monitoring and the detection of behavioural changes [3,18,19]. Sudden changes in the behaviour and activity of elderly people are often caused by the appearance of diseases of all kinds and the hospital stays that follow for treatment. Hospital stays and the associated treatments can have two divergent effects. First, they are inherently stressful events that come with physical and psychological side-effects, which can lead to a prolonged recovery period after discharge. This can become visible in daily behaviour as a reduction in activity compared to the baseline activity and behaviour observed some time before the emergence of the disease that led to the hospital admission. Second, since hospital stays aim to improve the physical condition of the patient, measures of personal functionality may already be improved at discharge, and improve further afterwards, compared with observations taken directly before admission, when the effects of the as-yet-untreated disease already show as declined activity. These patterns were also found in a large study of functional changes related to hospital stays [11]. The recovery process is typically not fully finished at the time of discharge; for example, a steady
increase in step counts per day can be measured in the days after a hospital stay
that came with some major surgery [5]. The present work adds to this field of
research by contributing an end-to-end behavioural classification method which
uses floor sensor data and machine learning classifiers to distinguish between
behavioural profiles that were recorded on days directly before admission and
after discharge from the hospital. The method is evaluated in a single-case pilot
study on sensor floor data recorded in a senior residence from an inhabitant who
had a hospital stay in the recording period.
3 Methods
This section introduces the methods that were used within this paper for the
data recording and analysis. In particular, it describes the hardware that was
used and the algorithms that were applied and developed for the classification
task of recognising behavioural changes after hospitalisation.
3.1 SensFloor
The sensor floor used for the data recordings in this study is the model SensFloor® by the company Future-Shape GmbH [1]. This sensor floor is a contact-less system, as it detects changes in the electric capacitances on a triangular grid of sensor fields. This measurement principle stands in contrast to sensor floors that work by measuring force or pressure, where direct contact with the sensors is necessary [7]. The contact-less sensor measures through a flooring layer on top of it. To date, it has commonly been used in elderly care facilities for fall detection [15] and for ambient assisted living at home [8], as it is easy to install and integrates well with the environment, being hidden under the normal flooring.
Our goal is to find out if it is possible to detect changes in a person's behaviour just from the data provided by a sensor floor. Around-the-clock observation that has no influence on a person's behaviour but still collects high-resolution data would be best suited for this. The fact that the SensFloor® is able to collect the most intensively researched parameters in gait analysis, like cadence, step width, step length, and timing, is promising for the outcome of our experiments. The focus of the present work is on movement behaviour as it is found in trajectory patterns, and the gait analysis results show that a lot of information is contained in the SensFloor® data [7].
The SensFloor® consists of a three-layer composite [7]. The layer at the bottom is a thin aluminium foil for electrical shielding. In the middle, a 3 mm polyester fleece is placed, and on top is an additional thin polyester fleece coated in metal, which makes it electrically conductive. The whole system is organised as a grid of independent modules. Each module has a microcontroller board in its centre. The microcontroller is connected to eight triangular sensor fields and to the power supply lines. The triangular-shaped sensor fields measure the electric capacitances. These modules are placed next to each other until the whole surface is covered, as seen in Fig. 1. The power
supply lines of adjacent modules are electrically connected with textile stripes of the same fabric as the top layer. If the room has a special shape, it is also possible to cut the modules into the right shape so that they fit in the corners, as long as the microcontroller circuit remains undamaged. The whole installation is powered by a single 12 V power supply unit that can be placed at any position at the edge of the sensor floor.
There are three standard types of SensFloor®: low resolution (1 m × 0.5 m modules, 16 sensors/m²), high resolution (0.5 m × 0.5 m modules, 32 sensors/m²), and [...]. The modules transmit in the Industrial, Scientific and Medical (ISM) bands on 868 MHz or 920 MHz, depending on the region and radio regulations. The wireless sensor messages can then be collected by a central transceiver in a connectionless mode. A message contains the ID of the module it comes from and the current capacitance values of the eight fields. These values remain valid as long as no new message from the same module ID arrives. This constant updating of sensor states by messages yields a time series data stream. SensFloor® data is stored in two different types: module capacitance information and trajectory information.
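To make the message semantics concrete, the following minimal Python sketch replays such a stream into the latest-known floor state; the message fields and names are illustrative assumptions, not the actual SensFloor protocol.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class SensorMessage:
    # Hypothetical message layout: module ID plus the current
    # capacitance values of the module's eight triangular fields.
    timestamp: float
    module_id: int
    capacitances: Tuple[float, ...]  # eight values

def replay(messages: List[SensorMessage]) -> Dict[int, Tuple[float, ...]]:
    """Replay a message stream into the latest-known floor state.

    A module's values stay valid until a newer message with the same
    module ID arrives, as described for the SensFloor system.
    """
    state: Dict[int, Tuple[float, ...]] = {}
    for msg in sorted(messages, key=lambda m: m.timestamp):
        state[msg.module_id] = msg.capacitances  # overwrite older values
    return state
```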
a hyperplane which linearly separates classes in the feature space [17]. They seemed like a good choice for us to try because several recent studies have reported that SVMs are generally able to provide higher classification accuracy than other data classification algorithms, especially for linear decision problems [6].
Since Random Decision Forests achieve very good classification accuracy, as described in many papers [4], and are robust to noise due to the randomness of the features and the large number of uncorrelated decision trees, we decided to use them in our approach as well. Random Forests have the advantage that a large ensemble of individual predictors is able to outperform any individual model.
3.3 Preprocessing
We used the trajectory information files to create a set of features needed for
the classifiers. The data is collected in a separate file for each day so that the
data samples start and end at midnight of each day.
One complete trajectory $T$ is given by a list of data points $v_i$ with variable length $n$, where each data point contains a timestamp $t_i$, an x-coordinate $x_i$, and a y-coordinate $y_i$. A trajectory can thus be described as follows:

$$T = \langle v_1, \ldots, v_n \rangle, \qquad v_i = (t_i, x_i, y_i), \quad i = 1, \ldots, n. \tag{1}$$
We deleted all trajectories that were not completed by the end of the day to ignore unfinished paths. Since it was not only the resident who entered and left the room, we also tried to filter out data from possible visitors and caregivers. For this, the following assumptions were made:
In this way, we were able to exclude, as far as possible, trajectories that did not belong to the resident (Fig. 2).
For the network features, we decided not to use the trajectory data directly, but
to extract additional information from it. For each trajectory we calculated the
following parameters:
Fig. 2. This is a simulation of the resident's room plan. The big blue shape represents the living room and the blue rectangle represents the bathroom. The numbers on the x- and y-axes give the room width and length in metres. The dark blue dotted line is a sample trajectory.
The duration was calculated using the Unix timestamps $t_i$ stored for each data point of the trajectories. For the total duration of a trajectory, the time difference between the start time $t_1$ and the end time $t_n$ of the trajectory was calculated:

$$f_{\Delta t} = t_n - t_1, \quad \text{where } t_i \text{ is taken from } v_i. \tag{3}$$
For the radius, we decided to calculate the maximum radius $f_r$ that each trajectory describes in the room. To do this, we first calculated the mean point $\bar{p}$ of each trajectory. We then calculated the distances between the mean point $\bar{p}$ and each point $p_i$ of the trajectory, and stored the maximum distance $d_{\max}$ from the mean point as the maximum radius of the trajectory:

$$f_r = d_{\max}(p_i, \bar{p}) = \max_{i \in [1, \ldots, n]} \sqrt{(\bar{p} - p_i)^2}. \tag{4}$$
With these parameters, we filtered out and deleted trajectories that are most likely noise from objects in the resident's room. Therefore, we deleted all trajectories with a radius or distance less than 0.5 m.
with every $f_{x,h}$ being the sum of all the feature values calculated for every trajectory that took place in one hour of the day, for $h \in [1, \ldots, 24]$ and for all feature types $x \in \{c, d, \Delta t, r\}$. The complete data set consists of 108 of these vectors for the 108 days considered. The full set was used as input to the classification algorithms.
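To make the feature construction concrete, the sketch below computes the per-trajectory features and sums them per hour of the day into one daily vector; it is a minimal reconstruction under the stated definitions, with hypothetical helper names, and it derives the hour of day from the Unix timestamp without time-zone handling.

```python
import numpy as np

def trajectory_features(traj):
    """Per-trajectory features; traj has shape (n, 3) with rows (t_i, x_i, y_i)."""
    t, xy = traj[:, 0], traj[:, 1:]
    duration = t[-1] - t[0]                              # f_dt = t_n - t_1, Eq. (3)
    steps = np.diff(xy, axis=0)
    distance = np.sqrt((steps ** 2).sum(axis=1)).sum()   # walked path length f_d
    mean_pt = xy.mean(axis=0)                            # mean point p-bar
    radius = np.sqrt(((xy - mean_pt) ** 2).sum(axis=1)).max()  # f_r, Eq. (4)
    return duration, distance, radius

def daily_vector(trajectories):
    """Sum count, distance, duration, and radius per hour: 24 x 4 features."""
    feats = np.zeros((24, 4))                            # columns: c, d, dt, r
    for traj in trajectories:
        duration, distance, radius = trajectory_features(traj)
        if radius < 0.5 or distance < 0.5:               # drop likely object noise
            continue
        hour = int(traj[0, 0] // 3600) % 24              # hour of day from Unix time
        feats[hour] += (1.0, distance, duration, radius)
    return feats.ravel()
```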
We selected the described features from the trajectories, and the split into 24 hourly bins, to obtain a useful representation of a person's daily behaviour. The total number of trajectories generated by the resident is a good description of the general activity, such as leaving and re-entering the room, bed, or bathroom, which are events that create new trajectories. The distance parameter describes how many metres the resident walks per hour in their room, which is an indication of how capable the person is of walking. The duration parameter adds the information about how long it takes the person to walk these distances. Here, it is assumed that a person recovering from a hospitalisation will take longer for a given path than before the hospitalisation. Radius is a parameter that helps describe a further geometrical property of the path. For example, the resident might walk a distance of 5 m but only go back and forth in a small area, or they might walk from one end of the room to the other. The area of the resident's room that is used during the paths should also change according to the health condition, since a person in poor health is less active. Finally, the division into 24 hourly bins helps to get an overview of the resident's daily behaviour. It shows when the most active hours are and when the person tends to be inactive or is not in the room at all. When this daily routine is recorded, a repeating pattern can be identified over several days, which is also assumed to change after a longer hospital stay.
For our approach, only the data set of a single resident was used. This resident was hospitalised for several weeks due to a COVID-19 diagnosis. We used the days before hospitalisation since the start of the recording, 54 days, and an equal number of days after discharge to obtain a balanced data set. This gives a total of 108 days of recorded data. The days were labelled with a class Ω_i, assigning them to the period before (i = 0) or after (i = 1) the hospitalisation. We used k-fold cross-validation with k = 5 folds for each classifier, one of the folds being the test set as described in Sect. 3.2, and the rest being used to train the machine learning models.
We used classifier implementations provided by scikit-learn [12]. Our Multi-Layer Perceptron consists of seven layers: the input, the output, and five hidden layers. Each hidden layer has a size of 100 neurons. It iterates over a maximum of 1,000,000 iterations. Since the Adam optimiser is used, this determines the number of epochs, i.e., how many times each data point is used, rather than the number of gradient steps. Our Random Decision Forest consists of 1000 estimator trees.
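As a rough reconstruction of this setup (not the authors' code), the configuration and the 5-fold cross-validation could be expressed with scikit-learn as follows; parameters not stated in the text are left at their defaults.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

classifiers = {
    # Five hidden layers of 100 neurons, Adam optimiser, up to
    # 1,000,000 epochs, as stated above; other parameters are defaults.
    "MLP": MLPClassifier(hidden_layer_sizes=(100,) * 5, solver="adam",
                         max_iter=1_000_000),
    "GNB": GaussianNB(),
    "SVM": SVC(),
    "RDF": RandomForestClassifier(n_estimators=1000),
}

def evaluate(X, y):
    """X: 108 daily feature vectors; y: 0 = before, 1 = after hospitalisation."""
    for name, clf in classifiers.items():
        scores = cross_validate(clf, X, y, cv=5,
                                scoring=("accuracy", "precision", "recall"))
        print(name, {k: v.mean() for k, v in scores.items()
                     if k.startswith("test_")})
```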
4.1 Results
Table 1. Average results over the five cross-validation folds for the four classifiers MLP, GNB, SVM, and RDF, given in five different metrics
From the result values it can be seen that the RDF results are generally the best and the GNB results the worst. The MLP and the SVM both perform equally well. The confusion matrices in Fig. 3 visualise the numbers of correctly and incorrectly classified days per class.
Fig. 3. These confusion matrices visualise the statistical classification results of the classifiers per class. The numbers in the lower left and upper right squares of each matrix represent the numbers of correctly classified data points. The numbers in the upper left and lower right represent the misclassified data points.
4.2 Discussion
With comparable average results above 75% in all metrics, the MLP, SVM, and RDF classifiers perform well enough in the task of classifying the behavioural patterns of the monitored nursing home resident before and after hospitalisation. Since only the precision and specificity values of the GNB are above 70%, this classifier is less suitable for this task. The confusion matrices (see Fig. 3) give the numbers of correctly and falsely classified data points and are the basis of the metrics given in Table 1. It is noticeable that only the MLP is better at correctly classifying the days after hospitalisation (recall/sensitivity) than at correctly classifying the days before hospitalisation (specificity). The GNB in particular has a worse recall/sensitivity, while its specificity is nearly as good as that of the MLP and the SVM. The results of the RDF are noticeably better in every metric.
These values are only meaningful for this specific resident, as each person is
individual and some people cope better with hospitalisation than others. The
extent of behavioural differences may also depend on the cause of the hospitali-
sation, i.e. whether it was a bone fracture, a severe infection or something else.
Our patient was hospitalised for a COVID-19 infection. Another difficulty for
the classification is the overall health condition of a person. In particular, the
5 Conclusion
The results of the MLP, SVM, and especially the RDF classifiers show that it is possible to detect changes in a person's behaviour before and after a hospital stay using only sensor floor data. This promising result implies that it may be possible to detect individual behavioural changes from floor sensor data in general. This would open many new possibilities in the field of care and the early detection of emerging diseases. Our current approach is limited for practical use because labels added in hindsight are required to classify the data. Future plans include an unsupervised approach that would allow behavioural analysis of more than one person without the need for retrospective labelling of the data. This would allow the system to serve as a continuous behaviour-deviation analysis and alerting system.
References
1. https://future-shape.com
2. Home & smart (2021). https://www.homeandsmart.de/ambient-assisted-living-aal
3. Aran, O., Sanchez-Cortes, D., Do, M.-T., Gatica-Perez, D.: Anomaly detection in
elderly daily behavior in ambient sensing environments. In: Chetouani, M., Cohn,
J., Salah, A.A. (eds.) HBU 2016. LNCS, vol. 9997, pp. 51–67. Springer, Cham
(2016). https://doi.org/10.1007/978-3-319-46843-3_4
4. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
5. Cook, D.J., Thompson, J.E., Prinsen, S.K., Dearani, J.A., Deschamps, C.: Functional recovery in the elderly after major surgery: assessment of mobility recovery using wireless technology. Ann. Thorac. Surg. 96(3), 1057–1061 (2013). https://doi.org/10.1016/j.athoracsur.2013.05.092
6. Durgesh, K.S., Lekha, B.: Data classification using support vector machine. J.
Theor. Appl. Inf. Technol. 12(1), 1–7 (2010)
7. Hoffmann, R., Brodowski, H., Steinhage, A., Grzegorzek, M.: Detecting walking
challenges in gait patterns using a capacitive sensor floor and recurrent neural
networks. Sensors 21(4), 1086 (2021)
8. Lauterbach, C., Steinhage, A., Techmer, A., Sousa, M., Hoffmann, R.: AAL functions for home care and security: a sensor floor supports residents and carers. Curr. Dir. Biomed. Eng. 4(1), 127–129 (2018)
9. Murphy, K.P., et al.: Naive Bayes classifiers. Univ. Br. Columbia 18(60) (2006)
10. Noriega, L.: Multilayer perceptron tutorial. School of Computing, Staffordshire
University (2005)
11. Palleschi, L., et al.: Functional recovery of elderly patients hospitalized in geriatric and general medicine units. The PROgetto DImissioni in GEriatria study: in-hospital functional recovery in older adults. J. Am. Geriatr. Soc. 59(2), 193–199 (2011). https://doi.org/10.1111/j.1532-5415.2010.03239.x
12. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn.
Res. 12, 2825–2830 (2011)
13. Rennie, J.D., Shih, L., Teevan, J., Karger, D.R.: Tackling the poor assumptions of
Naive Bayes text classifiers. In: Proceedings of the 20th International Conference
on Machine Learning (ICML-03), pp. 616–623 (2003)
14. Santos, R., Rupp, M., Bonzi, S., Fileti, A.: Comparison between multilayer feedforward neural networks and a radial basis function network to detect and locate leaks in pipelines transporting gas. Chem. Eng. Trans. 32, 1375–1380 (2013)
15. Steinhage, A., Lauterbach, C.: SensFloor® and NaviFloor®: robotics applications for a large-area sensor system. Int. J. Intell. Mechatron. Robot. (IJIMR) 3(3), 43–59 (2013)
16. Theodoridis, S., Koutroumbas, K.: Chapter 2 – Classifiers based on Bayes decision theory. In: Theodoridis, S., Koutroumbas, K. (eds.) Pattern Recognition, 4th edn, pp. 13–89. Academic Press, Boston (2009). https://doi.org/10.1016/B978-1-59749-272-0.50004-9
17. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Cham (1999).
https://doi.org/10.1007/978-1-4757-3264-1
18. Veronese, F., Masciadri, A., Comai, S., Matteucci, M., Salice, F.: Behavior drift
detection based on anomalies identification in home living quantitative indicators.
Technologies 6(1), 16 (2018). https://doi.org/10.3390/technologies6010016
19. Verstaevel, N., Georgé, J.P., Bernon, C., Gleizes, M.P.: A self-organized learning
model for anomalies detection: application to elderly people. In: 2018 IEEE 12th
International Conference on Self-Adaptive and Self-Organizing Systems (SASO),
pp. 70–79 (2018). https://doi.org/10.1109/SASO.2018.00018
20. Zhang, H.: Exploring conditions for the optimality of Naive Bayes. Int. J. Pattern
Recogn. Artif. Intell. 19(02), 183–198 (2005)
Cloud-Based System for Vital Data
Recording at Patients’ Home
1 Introduction
Guaranteeing comprehensive GP (General Practice) care in the rural German tri-border region of North Rhine-Westphalia, Rhineland-Palatinate, and Hesse is an increasing challenge. On the one hand, this is caused by growing medical and nursing needs due to demographic changes [1]; on the other hand, regional disparities with regard to the average age, and thus the number, of practicing physicians in outpatient GP care are a contributing factor [2].
This paper describes the approach of a digital transformation to introduce patient-controlled home monitoring of vital data as a new form of outpatient GP care. This new approach aims concurrently at care efficiency, by relieving physicians' time through delegation and (partial) automation of documentation obligations, and at improved cost-effectiveness, by minimizing follow-up examinations and appointments, reducing hospital stays, and reducing the use of emergency services [3].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
E. Pietka et al. (Eds.): ITIB 2022, AISC 1429, pp. 15–27, 2022.
https://doi.org/10.1007/978-3-031-09135-3_2
2 Project Background
Although the overall number of medics is not declining, the competition to hire and keep country doctors turns out to be a problem not only for German rural areas but also for the rest of Europe [5].
The current project "DataHealth" is located in Burbach, a small village in rural southern Westphalia. A digital medical platform was implemented that allows the involvement of patients (or non-medical staff in nursing homes) in measuring vital signs. This considerably relieves doctors in their daily practice. The participating physicians assess which patients are suitable for this "self-measurement" and select the appropriate ones. For the project, two scenarios were chosen: about 20 patients participate in a nursing facility, and about the same number are ambulatory patients of two doctors' offices in the village. Medical indications matching the measuring devices are, e.g., arterial hypertension, diabetes mellitus type 2, pneumonia, cardiac failure, and atrial fibrillation.
The project is part of a larger regional initiative led by the University of Siegen [6] to improve the health environment through digital medicine.
3 System Overview
Home-based measurement of vital parameters aims to initiate a paradigm shift and thus to optimize care processes. On-the-spot measurements taken by patients themselves, by relatives, or by caregivers are obtained from the monitoring system running continuously in the background and can thus be presented to physicians in different resolutions (discrete or continuous) or as aggregations. In this way, physicians can examine the clocked measurements and also the detailed course at any time, e.g., in order to recognize intermediate values and their tendencies in the case of irregularities and use these as a starting point for further interventions.
The doctors initially assess which patients are suitable for this self-measurement and select them. Different sensing devices allow the measurement of vital parameters like blood pressure, ECG, heart rate, blood glucose, weight, and oxygen saturation. Most of the devices are equipped with Bluetooth interfaces and can automatically sync with a smartphone app [7]. For devices with no radio connection, manual transcription of data into a smartphone app is easily possible. Practices could be equipped with a viable number of such certified
This vital data evaluation supports physicians in their daily practice by processing and evaluating patient data securely and in compliance with data protection laws, allowing the additional use of statistics-based services. The focus is on supporting patient-physician interaction. In this way, patients can easily be involved in health decisions on the basis of their vital data. After informed patient consent, physicians are given access to all data and the possibility of interacting with it: they can filter, view, and summarize it. The use of data processing methods like aggregation supports physicians by transparently supplementing the presented vital data to enable informed and fact-based decisions.
The vital data are recorded within the prescribed monitoring period. Aggre-
gated and processed data are temporarily cached in the cloud to populate ser-
vices such as evaluation and visualization for a physician’s web interface. A later
integration into PMS (Practice Management Systems) is intended.
Security, authentication, and encryption procedures are used for patient data.
Since legal regulations in Germany require the use of the so-called Telematics
Infrastructure (TI) for stakeholders’ mutual authentication there are also initial
ideas to connect cloud services to the TI [9].
4 Patient Frontend
For the described concept, a frontend for the patient is needed to transmit vital parameter data. There are two basic ways to transmit: either the sensing devices for vital data measurement send the data directly to the cloud, or some kind of gateway is used.
In the current research project "DataHealth", a smartphone has been used as a gateway due to the simple implementation of this approach. In use are 40 Apple iPhone SE (2020) devices. The app was implemented in the programming language Swift to guarantee the best compatibility with Apple's iOS system.
For this project, vital data measuring devices were required that are easy to use, intended for home use, have appropriate certifications, and can transmit measured data to a smartphone via Bluetooth. In the end, the decision was made in favour of the German manufacturer Beurer, which has many years of experience in the field of measuring devices and their networking with a smartphone app.
The following paragraphs refer briefly to the measured vital values and the measurement devices:
Blood pressure and heart rate are measured using the Beurer BM85, an easy-to-use blood pressure monitor with Bluetooth functionality. If it is connected to a smartphone, it transmits the measured blood pressure and heart rate.
Blood oxygen saturation is measured using the Beurer PO60, an easy-to-use pulse oximeter with Bluetooth functionality. As long as a fingertip is placed in the device, it measures the heart rate and the oxygen saturation of the blood. If it is connected to a smartphone and the finger is removed at the end of the measurement, it automatically transmits the measured blood oxygen saturation. Unfortunately, it does not transmit the heart rate as well.
For ECG measurement, Apple Watch Series 6 devices were chosen. Apple Watches are currently the only devices that are allowed to write ECG data to Apple Health, from where the data is easily accessible. In addition, Apple Watches are also capable of recording the heart rate and the blood oxygen saturation continuously instead of only sporadically, as is the case with conventional devices.
For measuring body weight, a Beurer BF700 diagnostic bathroom scale with Bluetooth functionality is used. In addition, the scale can perform a bioelectrical impedance analysis to measure body fat, body water, muscle mass, and bone mass.
Blood glucose is measured with the patients' own equipment, as diabetics already own such devices, so the option of entering a blood glucose value manually has been implemented.
Health Manager Pro. The "HealthManager Pro" app by Beurer receives the vital data via Bluetooth from the Beurer devices described in Sect. 4.1 and forwards them to Apple Health. After setting up the app for the first time, this process runs automatically in the background without any user interaction being necessary.
Apple Health. This is the central app provided by Apple to store and view all kinds of health-related data. Interfaces are provided for third-party apps to read and write health data in Apple Health, with one exception: currently, ECG data can only be read by third parties; only Apple devices like the Apple Watch are allowed to write it. Within the project DataHealth, Apple Health is only used as an interface, so that Health Manager Pro stores new vital data and the project app can read it. Therefore, no user interaction with this app is necessary.
Übermorgen = the day after tomorrow). The daily prescription is divided into
daytimes (Nüchtern = empty stomach, Morgens = in the morning, Mittags = at
noon, Abends = in the evening).
The yellow box in the middle shows when and which vital value was last transferred, so the user has feedback that data has been transmitted. Below there are three buttons to manually enter health complaints (Gesundheitsbeschwerden), blood glucose level (Blutzucker), and weight (Gewicht).
At the bottom of the page are buttons for the three main pages: overview (Übersicht), history (Historie), and messages (Nachrichten).
The history page is shown in Fig. 3(b). All transmitted vital values can be
viewed here. At the top of the page, the vital value can be selected. The table
below shows the selected values and the recording date.
All vital data recorded from the patient as previously described are transmitted
to a cloud to be accessed by the physician.
A cloud solution was preferred to a local server solution here for various reasons: on the one hand, the cloud provider allows problem-free scaling of the necessary resources, i.e., the required computing and storage capacities are always adapted to the current needs (e.g., with regard to the number of users). On the other hand, no costly hardware purchases have to be made; only the services used are paid for. In addition, the cloud enables a dedicated access mechanism with account management and redundant data storage and is thus also superior to so-called on-premise solutions in terms of security. The choice of a service provider fell on the current market leader in cloud computing: Amazon.
Amazon Web Services, or AWS for short, is a cloud computing service that has been offered by Amazon for about 15 years. Cloud computing is a deployment model in which data storage and large parts of data processing are no longer carried out on the specific user devices (in this case the patients' smartphones or the doctors' desktop computers), but on an online service provided by the chosen service provider. Amazon is by far the largest cloud provider on the market and is one of the few certified by the German Federal Office for Information Security (BSI) (C5, ISO/IEC 27001:2013, GxP), so that even sensitive medical informatics applications with patient data do not pose a security problem [10].
AWS Cognito. Cognito is a token-based service used to perform user registration, login, and access control. To register new users, a user pool with certain access privileges needs to be created first. In our case, we created two user pools: a physician user pool and a patient user pool.
Every user must register with a user name and a password. Depending on the configuration, an e-mail address or a phone number can be added; these are mainly used to reset the password. For every user, a unique subject ID (sub) is created automatically to identify the user. This ID is also used in the DynamoDB tables to store data, like vital values, belonging to a certain user.
For the storage of the vital values, physician and patient data, and other records, the DynamoDB database service was chosen. Most of the data, including the vital values, are stored in the region 'eu-central-1', which is located in Frankfurt, Germany.
Vital Values. All vital values are stored in almost the same way, but each in a separate table. The partition key is the patient ID, which equals the unique sub ID of the user in Cognito. The sort key is the recording time of the vital value in the format "yyyy/MM/dd HH:mm:ss:fff". This way, the combination of partition key and sort key should be unique at all times. Furthermore, it is possible to retrieve the data of a particular patient very efficiently, since all data from the same patient have the same partition key. Additionally, it is possible to retrieve a subset of the data using the sort key.
With some exceptions, all vital values have a numeric field for the vital value itself and an additional comment field for the physician to comment on the vital value.
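Under this key schema, a range query retrieves one patient's values efficiently; the boto3 sketch below uses hypothetical table and attribute names. Because the sort key format "yyyy/MM/dd HH:mm:ss:fff" sorts lexicographically in chronological order, a time window maps directly onto a between condition.

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb", region_name="eu-central-1")
table = dynamodb.Table("BloodPressureSystolic")  # hypothetical table name

def vitals_between(patient_sub_id: str, start: str, end: str) -> list:
    """Fetch one patient's vital values in a time range via the sort key."""
    response = table.query(
        KeyConditionExpression=Key("patientId").eq(patient_sub_id)
        & Key("recordedAt").between(start, end)  # sort-key range condition
    )
    return response["Items"]

# Example: all systolic values recorded in March 2022 for one patient.
items = vitals_between("example-cognito-sub-id",
                       "2022/03/01 00:00:00:000", "2022/03/31 23:59:59:999")
```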
In Apple's HealthKit, blood pressure is not stored in a single table but in two separate tables, one for the systolic and the other for the diastolic value. Therefore, the decision was made to use the same structure with two tables in DynamoDB. But since the comment field is needed only once, only the systolic table contains one.
In addition to the previously mentioned fields, the blood glucose table has an optional field for information about the last meal.
Finally, a "general health condition" field was requested where the user can enter health complaints as plain text. Here, the numeric field for the vital value was replaced by a comment field containing a string.
ECG. The storage of ECG data is slightly more complex than for the rest of the vital signs due to the structure and amount of data. Like the other vital values, the partition key is the sub ID of the user and the sort key is the recording time. Additional values are the sampling frequency and the average heart rate. The ECG itself is stored as a numeric list of values in millivolts. Since the sampling frequency of the Apple Watch Series 6 is 512 Hz and the duration of an ECG measurement is 30 s, one ECG consists of more than 15,000 measuring points (512 Hz × 30 s = 15,360 samples).
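A single ECG item following this description might look like the sketch below; the attribute names are assumptions, and the sample count follows from the stated parameters.

```python
# Hypothetical shape of one ECG item in DynamoDB (attribute names assumed):
ecg_item = {
    "patientId": "cognito-sub-id-of-the-user",  # partition key: Cognito sub ID
    "recordedAt": "2022/03/14 08:15:00:000",    # sort key: recording time
    "samplingFrequency": 512,                   # Hz, Apple Watch Series 6
    "averageHeartRate": 72,                     # beats per minute
    # 512 Hz x 30 s = 15,360 voltage samples in millivolts (truncated here):
    "ecg": [0.012, 0.015, -0.003],
}
```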
Physician Master Data. This table contains the subID of the Cognito user
as partition key. Additional fields are the name, first name, academic title and
role. The role can be physician, care or admin and enables different views in the
frontend (see Sect. 6.2).
Patient Master Data. This table is stored in the separate region "eu-west-1", which is located in Ireland, for data protection reasons. The partition key is the sub ID of the Cognito user; a sort key does not exist. Additionally, the table contains the patient's name, first name, birthday, sex, height (for BMI), phone number, and address.
Prescriptions. This table reflects the prescriptions of the physician with regard to vital data recordings for every patient. The partition key is the patient ID; the sort key is the time when the prescription was entered. Additional fields are the end time of the prescription, the physician ID and the vital value as strings, the measuring interval as a number, and the times of day (morning, noon, evening, and empty stomach) as booleans. This way it is possible to prescribe, for example, a blood pressure measurement every second day in the morning and in the evening, or a blood sugar measurement every day on an empty stomach. When the field for the end time of the prescription is empty, the prescription is active; when it contains a valid date, it is inactive and shown in the prescription history.
Mapping. This table contains the physician ID as partition key and the patient ID. It is used to assign physicians to their patients; data of unassigned patients cannot be viewed. Both the physician ID and the patient ID are the Cognito sub IDs of the respective users.
Others. Additional tables store messages from the physician to the patient (not the other way around), as well as diagnoses and target ranges for each patient and vital value. When the values of the specified patient are not inside the target range, the values are highlighted for the physician in the web frontend.
6 Web Frontend
To allow physicians and caregivers to access their patients’ vital data, a web
frontend was implemented using ASP.NET. It is running on a virtual server of
the data center of the University of Siegen and is accessible over the world wide
web.
Overview. The starting page for the physician is the overview page. The most recent vital values that are not within the patient-specific target range are displayed here. This allows the physician to get a quick overview of the most important new values.
Dashboard. This page is the most important one and displays the patients' vital values. There are two dropdown menus, one for selecting the patient and one for selecting the vital value. Selectable vital values are timeline, blood pressure, blood sugar, heart rate, weight, blood oxygen saturation, and the general health condition.
The timeline is the default selection and provides an overview of all vital values of the selected patient for a selected period of days. The values are sorted by date, starting with the newest values, as shown in Fig. 4. The columns are the date of measurement, the name of the vital value, the vital value itself, and the comment of the physician if there is one.
Patients. On this page, a list of all assigned patients can be viewed including
their personal data. It will be possible to change the personal data of the patient
here, but this is not implemented yet. From here the vital value prescriptions
page of the patients is accessible.
Prescriptions. At the top of this page, there is a list of all active vital value prescriptions; below it is a list of all historic prescriptions. If no end date has been set for a prescription, it is active.
Above the two lists is a button to create new prescriptions via a pop-up. In this pop-up, a vital value is chosen from a drop-down list, as well as a measurement interval in days, and there are four radio buttons for empty stomach, morning, noon, and evening. If a new prescription is created and an active prescription with the same vital value for this patient already exists, the existing one is automatically deactivated, so that there can only be one active prescription per vital value at the same time.
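The deactivate-on-create rule could be implemented along the following lines; this is an illustrative sketch with assumed field names ("endTime", "vitalValue"), not the project's actual backend code.

```python
from datetime import datetime, timezone
from boto3.dynamodb.conditions import Key

def create_prescription(table, patient_id: str, new_item: dict) -> None:
    """Store a prescription, deactivating an active one of the same vital value.

    A prescription is active while its end-time field is empty; writing an
    end time moves it into the prescription history, so at most one active
    prescription per vital value exists at any time.
    """
    existing = table.query(
        KeyConditionExpression=Key("patientId").eq(patient_id)
    )["Items"]
    now = datetime.now(timezone.utc).strftime("%Y/%m/%d %H:%M:%S:000")
    for item in existing:
        if item.get("vitalValue") == new_item["vitalValue"] and not item.get("endTime"):
            item["endTime"] = now      # deactivate the currently active one
            table.put_item(Item=item)  # overwrite under the same keys
    table.put_item(Item=new_item)      # the new prescription becomes the active one
```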
The Admin role is mainly for configuration purposes. All functions are available, and all patients are visible independent of the mapping table. This user can create new patients.
7 Conclusion
The paper described the technical infrastructure necessary for a new approach to measuring patients' vital parameters. Many appointments in doctors' practices are conducted only to measure a new set of vital data in order to assess the health status of the patient. Measurements at home can be a relief for patients as well as physicians. Additionally, the frequency and the duration of measurements can be individually adapted to the patient's condition. The system consists of a patient frontend with sensing devices and transmitting hardware and software, a middleware within a secure cloud environment, and a physician web frontend with capabilities to visualise the vital data.
References
1. van den Bussche, H.: Die Zukunftsprobleme der hausärztlichen Versorgung in Deutschland. Aktuelle Trends und notwendige Maßnahmen. Bundesgesundheitsblatt 62, 1129–1137 (2019). https://doi.org/10.1007/s00103-019-02997-9
2. Kassenärztliche Bundesvereinigung: Gesundheitsdaten - Regionale Verteilung der
Ärzte in der vertragsärztlichen Versorgung 2020. Statistische Informationen
aus dem Bundesarztregister (2020). https://gesundheitsdaten.kbv.de/cms/html/
16402.php. Accessed 21 Jan 2022
3. Kramer, U., Vollmar, H.C.: Digit. Health: Forum 32(6), 470–475 (2017). https://
doi.org/10.1007/s12312-017-0326-7
4. Warzecha, M., Dräger, J., Hohenberg, G.: Visite via monitor. Heilberufe 70(5),
42–43 (2018). https://doi.org/10.1007/s00058-018-3461-3
5. Winkelmann, J., Muench, U., Maier, C.: Time trends in the regional distribution
of physicians, nurses and midwives in Europe. BMC Health Serv. Res. (2020).
https://doi.org/10.1186/s12913-020-05760-y
6. University of Siegen: Digitale Modellregion Gesundheit Dreiländereck (DMGD).
https://dmgd.de. Accessed 15 Feb 2022
7. Haak, D., Deserno, V., Deserno (geb. Lehmann), T.: Datenmanagement für Medizinproduktestudien. In: Kramme, R. (ed.) Informationsmanagement und Kommunikation in der Medizin, pp. 145–164. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-48778-5_48
8. Darms, M., Haßfeld, S., Fedtke, S.: Medizintechnik und medizinische Geräte als potenzielle Schwachstelle. In: Darms, M., Haßfeld, S., Fedtke, S. (eds.) IT-Sicherheit und Datenschutz im Gesundheitswesen, pp. 109–128. Springer, Wiesbaden (2019). https://doi.org/10.1007/978-3-658-21589-7_5
1 Introduction
The diagnosis of neurodegenerative diseases is of constant interest today, especially in the context of the growing possibilities of neuroimaging and the use of information search techniques in large sets of highly diversified data.
Despite recent advances in medical knowledge, clinical criteria are still used to diagnose Parkinson's disease (PD). The most recent and widely used criteria are the Movement Disorders Society (MDS) criteria published in 2015, according to which the diagnosis of PD requires confirmation of bradykinesia and one of the two axial symptoms: resting tremor or muscle stiffness. According to the current guidelines, the clinician should review the exclusion criteria, red flags, and criteria confirming the diagnosis after diagnosing parkinsonian syndrome. The MDS criteria also include cardiac scintigraphy using the I-meta-iodobenzylguanidine (MIBG) marker and the DaTSCAN test, which in practice have their limitations, primarily in terms of availability and cost, and are performed only in selected cases. In the process of diagnosing PD, the doctor uses standard neuroimaging tests, and only in special cases, e.g., in very young people or familial parkinsonism, additional genetic tests. Although biological markers of PD (so-called biomarkers) have not been introduced into routine clinical practice yet, the observations so far indicate that the sensitivity and specificity of the clinical criteria used are limited. Hence, the use of artificial intelligence is a significant direction of research, which may contribute to increasing the sensitivity and accuracy of diagnosis, especially at an earlier stage of PD, enable diagnosis already in the prodromal period, further improve the quality of life of patients and their caregivers, and, above all, support the expected future neuroprotective intervention. This is particularly important from the point of view of one of the most aggravating aspects of this disease: dementia, which develops in a large group of patients with PD.
The theory of risk, which is the basis of medical diagnostics, applies to all of us in every situation, so the correctness of each inference can be (and usually is) analysed in a probabilistic way, as a random event rather than a specific event. Ultimately, we cannot be sure of well-being even in the simple situation of consuming an ordinary meal because, given the specific environmental conditions and the current mental condition of the consumer, it may be a dietary error. Unfortunately, it is impossible to analyse the impact of such exemplary non-measurable variables.
Usually, we estimate the risk of failure (or sometimes do not, relying on someone else's largely unverified suggestions) and take appropriate action. We experience every day the fact that although the probabilities of various life events are almost always lower than one, we nonetheless undertake activities related to them (i.e., events in the probabilistic sense), e.g., when driving a car to a specific destination. The accident, however, remains a random variable.
In the Tele-BRAIN project, the basis of diagnostics is the recording of signals coming from the electrical manifestations of the brain's work. Bearing in mind the low mass of the bodies involved in generating the charges polarising selected areas of the head, and the high resistances involved (e.g., dry skin or skull bones), we analyse the potentials, because the currents are too low. On the other hand, potential disturbances occur for completely trivial reasons, precisely because of the high impedance, so the measurement is, by its very nature, burdened with high uncertainty. A separate paper is devoted to the issue of reducing the impact of these disturbances, because that problem belongs to a different research area (compression, reduction, filtration, or augmentation of data).
Human EEG data analysis is the obvious basis of inference for many neurological pathologies. However, despite being based in principle on algorithmics, it does not avoid heuristics; the expert's opinions contain conclusions that are difficult to justify other than by many years of practice and similar cases from personal experience.
A multi-channel electroencephalograph works simultaneously in each lead, providing separate waveforms. Simultaneous analysis of material dispersed over time (the study period, divided into epochs) and space (the amplitude of changes in each lead) is often limited to the visual registration and interpretation of selected patterns characteristic of previously described cases. Discovering a relationship resulting from the linear combination of several (or a dozen) leads is beyond the scope of human possibilities. Special courses and certificates regulate the authorisation to diagnose based on such records.
In this paper, the focus of interest is on methods of EEG signal processing, commercialised as part of a research and implementation project, designated by NCBR as WPN-3/5/2018 and entitled TeleBrain – Artificially Intelligent EEG Analysis in the Cloud.
It is an international project, and the German consortium has shown in its research the usefulness of artificial intelligence in the effective analysis of electroencephalographic waveforms. The broadly understood aim of the project is to implement the developed method and its practical use in diagnostics.
Different potential biomarkers are currently being explored for estimating the risk of cognitive deterioration in Parkinson's disease (PD). Although most of the studies so far have focused on patients with clinically established Parkinson's Disease Dementia (PDD), the most important task seems to be to discover a potential early biomarker of cognitive impairment in PD from its pre-clinical phases. That is why this has become a more and more widely researched topic nowadays. Recent studies have shown that quantitative EEG (qEEG) is a promising method to study dynamic brain changes and have proved its suitability for identifying possible early electrophysiological markers of PD-related cognitive decline. That could make it possible to mark the early stage and the onset of dementia in Parkinson's disease (PD) and enable the development of early treatment methods.
qEEG analysis of electrocortical activity demonstrated that a decrement of
the background rhythm frequency (alpha rhythm) together with an increase of
the low-frequency activity (delta and theta) could be associated with the degree
of cognitive decline in PD [11]. Reduction in the alpha band is widely known in the
literature. Many studies have already revealed a slow-down in the background
activity in PD patients with cognitive decline [4,6,11]. Analysing the absolute
power of seven frequency bands based on individual alpha frequency revealed
that compared to healthy controls, patients with PD without cognitive decline
had higher power in theta and lower alpha1 bands while PDD had higher power
in delta, theta, lower alpha1 and beta bands. Higher delta and gamma power
with no difference in theta and lower-alpha 1 power was the characteristic feature
of PD patients with dementia compared to those without dementia [26]. Moreover,
studies show that theta and gamma oscillations in the PD EEG vary significantly
from those of matched controls: at rest, these oscillations increase in PD patients
without dementia, but they decrease in those with cognitive decline [21].
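The band powers discussed above are typically estimated from the power spectral density. The following minimal sketch is not part of the Tele-BRAIN pipeline: it uses Welch's method from SciPy, and the fixed band borders are a simplifying assumption, whereas studies such as [26] derive them from the individual alpha frequency.

```python
import numpy as np
from scipy.signal import welch

# Fixed band limits (a simplification; individual-alpha-based borders differ).
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 40)}

def band_power(signal, fs, band):
    """Absolute power of one EEG band, integrated from Welch's PSD."""
    lo, hi = BANDS[band]
    freqs, psd = welch(signal, fs=fs, nperseg=2 * fs)
    mask = (freqs >= lo) & (freqs < hi)
    return np.trapz(psd[mask], freqs[mask])  # integrate PSD over the band
```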
A recent study analysing electrocortical networks in PD patients using low-
resolution electromagnetic tomography (LORETA) and qEEG analysis found a
significant relationship between the electrocortical networks and cognitive dys-
functions in Parkinson’s Disease Mild Cognitive Impairment (PD-MCI). QEEG
analysis showed a relevant decrease of alpha Power Spectral Density (PSD) over
the occipital regions and an increase of delta PSD over the left temporal region
in PD-MCI [25]. An increase of theta power in the left temporal region and a
reduction of the median frequency were associated with the presence of early
cognitive dysfunction in PD and Alzheimer's Disease (AD). However, EEG slowing was
more pronounced in PD than in AD, which may arise from a greater cholinergic
deficit in cognitively impaired patients with PD [4].
Other studies also looked at the Event-Related Potentials (ERPs), which
refer to event-related voltage changes in areas of the brain in response to spe-
cific stimuli (e.g., visual, auditory, and somatosensory stimuli). ERPs provide
a valid method to study the ongoing EEG activity during sensory, motor and
cognitive events and can be used as an electrophysiological indicator of cogni-
tive function. Research results show that ERP changes might be used to identify
cognitive impairment in PD patients and can be used to study the correlation
between cognitive and motor functions [11,32]. The P300 ERP is the best-studied
neural marker of cognitive functions. Studies have shown that changes in P300
ERP amplitude and latency are associated with impaired attention and cognitive
decline [32].
The next stage of the inference procedure, which forms the basis of the expert
system architecture, is the performance of an electroencephalographic (EEG)
examination.
After performing the above tests, the results are summarised, and the final
diagnosis is made:
– non-PD,
– PD without cognitive impairment (PD-CogN),
– PD with MCI (Parkinson’s Disease Mild Cognitive Impairment, PD-MCI),
– PD with dementia (Parkinson’s Disease Dementia, PDD).
The collection and detailing of the entire procedure, which is the basis for devel-
oping the diagnostic support algorithm in Parkinson's disease, is presented in
the subsequent block diagrams (Figs. 1 and 2).
6 Tele-BRAIN-PL System
The discussed IT system has been implemented in cloud and local (stand-alone)
versions. From the user's point of view, the scope of functionality of both versions
is similar; the difference is that the local version has been equipped with a
mechanism for automatically updating the algorithm, the current version of which
is kept in the cloud. The other functionalities are the same in both versions.
The Microsoft Azure cloud has been used to build the system, as it provides
the best possibilities in terms of data security, system scalability, and high
availability. Two language versions of the system were prepared – Polish and
English – and the language is selected at the stage of logging into the system.
The principle of operation of the discussed system (regardless of its version)
is as follows:
– through the application interface, data such as the result of the EEG
examination and its description are delivered to the algorithm,
– additionally, demographic and clinical information is entered through the
system forms.
The information gathered in this way is sent to the algorithm, which analyses
the data provided and returns a numerical result indicating the likelihood of the
disease.
Currently, in the research phase, the result of the algorithm is presented in
numerical form and additionally illustrated by a heat map. The result of the
algorithm is also generated as a .pdf file, which can be printed or saved.
The following technologies were used to build the system:
Access to the system (regardless of its version) is possible from any internet
browser. The necessary condition is Windows 10 64-bit in a version that already
includes WSL (Windows Subsystem for Linux), i.e. build 19041 or later.
The TCP and SSL protocols are responsible for data security during transmission
within the system. Additionally, the security features provided by the Azure
platform itself are used, such as AES encryption with a 256-bit key – one of the
strongest block ciphers, compliant with the FIPS 140-2 standard.
Three example screenshots are shown in Fig. 3.
Fig. 2. Illustration of the test procedure for non-motor disorders (on the left) and
research algorithm of examination for cognitive impairment (on the right)
Acknowledgement. The study was realised within the project ‘TeleBrain – Arti-
ficially Intelligent EEG Analysis in the Cloud’ (grant number WPN3/9/Tele-
Brain/2018).
References
1. Aarsland, D., Zaccai, J., Brayne, C.: A systematic review of prevalence studies of
dementia in Parkinson's disease. Mov. Disord. 20(10), 1255–1263 (2005)
2. Babiloni, C., et al.: Cortical sources of resting state electroencephalographic
rhythms in Parkinson’s disease related dementia and Alzheimer’s disease. Clin.
Neurophysiol. 122(12), 2355–2364 (2011)
3. Babiloni, C., et al.: Abnormalities of cortical neural synchronization mechanisms
in patients with dementia due to Alzheimer’s and Lewy body diseases: an EEG
study. Neurobiol. Aging 55, 143–158 (2017)
4. Benz, N., et al.: Slowing of EEG background activity in Parkinson’s and
Alzheimer’s disease with early cognitive dysfunction. Front. Aging Neurosci. 6,
314 (2014)
5. Berendse, H.W., Stam, C.J.: Stage-dependent patterns of disturbed neural syn-
chrony in Parkinson’s disease. Parkinsonism Relat. Disord. 13, S440–S445 (2007)
6. Bočková, M., Rektor, I.: Impairment of brain functions in Parkinson’s disease
reflected by alterations in neural connectivity in EEG studies: a viewpoint. Clin.
Neurophysiol. 130(2), 239–247 (2019)
26. Pal, A., Pegwal, N., Behari, M., Sharma, R.: High delta and gamma EEG power
in resting state characterise dementia in Parkinson's patients. Biomark. Neuropsy-
chiatry 3, 100027 (2020)
27. Sinanović, O., Kapidzić, A., Kovacević, L., Hudić, J., Smajlović, D.: EEG frequency
and cognitive dysfunction in patients with Parkinson’s disease. Med. Arh. 59(5),
286–287 (2005)
28. Stoffers, D., Bosboom, J., Deijen, J., Wolters, E.C., Berendse, H., Stam, C.: Slowing
of oscillatory brain activity is a stable characteristic of Parkinson’s disease without
dementia. Brain 130(7), 1847–1860 (2007)
29. Svenningsson, P., Westman, E., Ballard, C., Aarsland, D.: Cognitive impairment in
patients with Parkinson’s disease: diagnosis, biomarkers, and treatment. LANCET
Neurol. 11(8), 697–707 (2012)
30. Thatcher, R.W., North, D., Biver, C.: EEG and intelligence: relations between
EEG coherence, EEG phase delay and power. Clin. Neurophysiol. 116(9), 2129–
2141 (2005)
31. Tysnes, O.B., Storstein, A.: Epidemiology of Parkinson’s disease. J. Neural Transm.
124(8), 901–905 (2017)
32. Wang, Q., Meng, L., Pang, J., Zhu, X., Ming, D.: Characterization of EEG data
revealing relationships with cognitive and motor symptoms in Parkinson’s disease:
a systematic review. Front. Aging Neurosci. 373 (2020)
EEG Signal and Deep Learning Approach
in Evaluation of Cognitive Declines
in Parkinson’s Disease
1 Introduction
Parkinson’s Disease (PD) is one of the most common neurodegenerative diseases
and the most frequent form of parkinsonian syndrome. The frequency of occur-
rence of PD in the population is about 0.3%, for population older than 60 years it
is about 1%, and in the population older than 80 years, about 3% [15,27]. There-
fore, the problems resulting from this disease may be a significant social problem.
Parkinson’s Disease Mild Cognitive Impairment (PD-MCI) is diagnosed in
approximately 40–50% of PD patients within five years of PD diagnosis. It
is estimated that approximately 20.2% of PD patients develop mild cognitive
impairment at the time of diagnosis of Parkinson’s disease [21]. It is a group of
symptoms in which there is a cognitive decline, the so-called intermediate state
between the changes observed in the aging process and the disorders that meet
the criteria for the diagnosis of dementia. PD-MCI is a well-established risk factor
for the development of dementia [8]. The risk of PD-MCI progressing to Parkin-
son's Disease Dementia (PDD) is unclear: PD-MCI may remain stable, or may
even improve and return to normal cognitive functioning in about 11–28% of
cases, not necessarily turning into dementia [23,29].
The time at which PD-MCI symptoms may progress to PDD is an individual
feature; studies show that the annual rate of progression from PD-MCI to
PDD is around 10%, with particular susceptibility to cognitive decline in people
over 70 [28]. The fact that conversion rates to PDD are much higher in people
with PD-MCI is also confirmed by studies: in the ParkWest cohort, almost 60% of
patients with PD-MCI had developed PDD after five years of follow-up [21],
and other data report that as much as 80% of PD-MCI cases may convert
to PDD [16]. The mean percentage of patients with PD
diagnosed with dementia is 24–31% [1]. Although the results vary depending on
the study, the prevalence of PDD in patients aged 54 to 70.2 years at diagnosis
is 17% five years after diagnosis and 19.49% ten years after diagnosis, reaching 83%
twenty years after diagnosis [9,30,31].
Neuropsychological diagnosis is the gold standard in assessing cognitive
impairment in PD. However, the way neuropsychological tests are performed
may be affected by the patient's ability to cooperate, exercise capacity, and
the severity of motor symptoms. Therefore, this diagnosis could be supported
by EEG, which is not only a simple and safe method but is also carried out with
the patient relaxed, does not require effort or retained verbal functions and,
importantly, does not depend on motor symptoms [13].
The EEG is an important supporting examination to study the effects of
developing dementia on brain function [2,4]. As shown by the previous studies,
the assessment of EEG in PD may be a valuable biomarker, especially in terms
of predicting the occurrence of cognitive disorders and their early diagnosis,
as well as monitoring selected aspects of the effectiveness of treatment [13].
Digital quantitative analysis of the EEG allows the analysis of the absolute
power of the various frequency bands of the brain's rhythms, the temporal
relationships between individual regions of the brain, and the functional
connectivity between them [7,25]. Bousleiman et al. [5] analyzed the changes in
qEEG characteristics of PD and found that specific changes in alpha and theta
signal power may be markers for the diagnosis of the disease itself and are
associated with the presence of MCI in PD. Other studies have tried to compare
the pathophysiological mechanisms of dementia in Alzheimer’s Disease (AD)
with dementia in PD in qEEG and found differences in recording, which may be
associated with different mechanisms of dementia development in both disease
entities [7].
Automated EEG analysis is typically used for the following tasks [6]:
frequency during a standard routine examination. The full EEG recording for
each patient contained approximately 20 min of signal collected during succes-
sive activation trials. According to the established study protocol, recording
begins with approximately 3 min of resting eyes-closed condition followed by
an eye-opening command. These fragments were used in further analysis and
classification of cognitive impairments.
The EEG fragments were first split into 5 s non-overlapping epochs. Epochs
with artifacts from bad channels (gaps in the collected signal) and incomplete
epochs were removed. The resulting number of epochs for each cognitive
disorder class was as follows: 763 N-PD, 607 PD-MCI, 402 PDD. The signals
were filtered in the frequency range of 1–40 Hz with a fourth-order Butterworth
bandpass filter to reduce low-frequency drifts and power line noise.
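A minimal sketch of this preprocessing step, assuming SciPy and a hypothetical sampling rate (the exact rate is not stated in the text), could look as follows:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 250           # assumed sampling rate; not stated explicitly in the text
EPOCH_SECONDS = 5  # 5 s non-overlapping epochs, as in the study protocol

def make_epochs(eeg, fs=FS):
    """Filter a (channels, samples) recording and cut it into 5 s epochs."""
    # Fourth-order Butterworth bandpass, 1-40 Hz; sosfiltfilt gives a
    # zero-phase result (the filtering direction is not specified in the text).
    sos = butter(4, [1, 40], btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, eeg, axis=-1)
    step = EPOCH_SECONDS * fs
    n_epochs = filtered.shape[-1] // step    # incomplete epochs are dropped
    return np.stack([filtered[:, i * step:(i + 1) * step]
                     for i in range(n_epochs)])
```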
In many solutions for EEG classification using deep neural networks, high
performance was obtained with the raw EEG signal fed into a convolutional neural
network. We evaluated different CNN architectures in two approaches: one-
dimensional convolutions along the time dimension only, where all EEG
channels share the same weights, and two-dimensional convolutions along the time
and channel dimensions. A third approach used in the experiments is also based on
2D convolution; however, before input, each EEG channel was transformed
to its power spectral density (PSD), limited to the 1–40 Hz frequency range.
The first deep neural network architecture, EEG-Conv1D (Fig. 2(a)), consists
of three processing blocks and a multilayer perceptron classifier with two hidden
layers. Each block involves a 1D convolutional layer (10, 20, 40 filters; kernel
size 21) followed by a rectified linear unit (ReLU) and max pooling (pool size 5
with the same stride). After flattening, two fully connected layers with
the ReLU activation function have 500 and 100 units, respectively. The two-
dimensional EEG-Conv2D (Fig. 2(b)) was evaluated with the same structure
and different variants of the convolutional layer kernel dimensions. Finally, the best
results were achieved by modifying the first processing block only. In
EEG-Conv2D this block consists of two convolutional layers: the first with ten
filters with a 1 × 21 kernel (convolution in the time dimension) and the second
with ten filters with a 19 × 1 kernel (convolution in the channel dimension).
This block accumulates information from all channels, and in the following blocks
a 1 × 21 kernel is used in the convolutional layer. All the networks ended with
softmax activation in the last layer, with two or three outputs depending on the
experiment.
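The EEG-Conv1D description above translates almost directly into code. The sketch below is an assumption-laden illustration rather than the authors' implementation: the deep learning framework (Keras) and the input length (5 s at an assumed 250 Hz) are not stated in the text.

```python
from tensorflow import keras
from tensorflow.keras import layers

N_SAMPLES, N_CHANNELS, N_CLASSES = 1250, 19, 3   # 5 s at an assumed 250 Hz

def build_eeg_conv1d():
    """Three conv blocks (10/20/40 filters, kernel 21) and an MLP head."""
    model = keras.Sequential([keras.Input(shape=(N_SAMPLES, N_CHANNELS))])
    for n_filters in (10, 20, 40):
        model.add(layers.Conv1D(n_filters, 21, activation="relu"))
        model.add(layers.MaxPooling1D(pool_size=5, strides=5))
    model.add(layers.Flatten())
    model.add(layers.Dense(500, activation="relu"))
    model.add(layers.Dense(100, activation="relu"))
    # Two or three outputs, depending on the experiment.
    model.add(layers.Dense(N_CLASSES, activation="softmax"))
    return model
```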
The third approach, PSD-Conv2D (Fig. 2(c)), was inspired by Ieracitano et al. [11]
and is based on the PSD of each channel as the input. The PSD was estimated
using a periodogram with a rectangular windowing function. The shape of the
resulting CNN input was 194 PSD samples in 19 channels. The deep neural network
architecture consists of only one processing block with 16 filters of size 3 × 3,
batch normalization, ReLU activation and 3 × 3 max pooling with 2 × 2 strides.
The resulting 13824 features were then classified by one hidden layer with 300 units
and a last layer with softmax activation.
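The PSD input can be prepared, for instance, with SciPy's periodogram. This is a sketch under the same sampling-rate assumption as above; the exact number of retained bins (194 in the paper) depends on the true sampling rate and epoch length.

```python
import numpy as np
from scipy.signal import periodogram

def psd_input(epoch, fs=250):
    """Per-channel periodogram with a rectangular window, cut to 1-40 Hz."""
    freqs, psd = periodogram(epoch, fs=fs, window="boxcar", axis=-1)
    band = (freqs >= 1) & (freqs <= 40)   # keep the 1-40 Hz range
    return psd[..., band]                 # shape: (19 channels, n_bins)
```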
Stratified 8-fold cross-validation was used to validate each model. The set of
patients was randomly divided into eight subsets, preserving the percentage
of subjects in each class. We decided on the 8-fold scheme because of the 64 patients:
the number of unique patients in each fold was then the same, and the eight patients
in the testing set are sufficient to provide an example from each class. The training
was performed eight times; each time, one of the subsets was a testing set used
after training to compute the evaluation metrics: accuracy, precision, recall, and F1-
score. The results achieved from the 8 folds were then presented as an average and
standard deviation. We also collected the results from each iteration in the confusion
matrix. For the evaluation of the 3-class results, we used the macro-averaging
strategy: for each testing set, the metric values were calculated separately for each
label and then averaged.
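A sketch of the patient-wise stratified split follows. The scikit-learn implementation and the placeholder arrays are assumptions, but the key point – stratifying over the 64 patients so that all epochs of one patient fall on the same side of the split – follows the description above.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
patient_ids = np.arange(64)                          # 64 unique patients
patient_labels = np.repeat([0, 1, 2], [22, 21, 21])  # placeholder class counts
epoch_patient = rng.choice(patient_ids, size=1772)   # patient ID per epoch

skf = StratifiedKFold(n_splits=8, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(patient_ids, patient_labels):
    # All epochs of one patient stay on the same side of the split.
    train_mask = np.isin(epoch_patient, patient_ids[train_idx])
    test_mask = np.isin(epoch_patient, patient_ids[test_idx])
```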
In CNN training, we used stochastic gradient descent (SGD) optimization
with a learning rate of 0.0001 and a momentum coefficient α = 0.9. The mini-
batch size was set to 91 EEG fragments. The early stopping technique was used
to prevent overfitting: the number of training epochs depended on the improvement
of the categorical cross-entropy loss while the accuracy remained lower than 1.
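The training setup translates directly into, e.g., Keras. This hedged sketch reuses the hypothetical build_eeg_conv1d from above; the patience and epoch budget are assumptions not given in the text.

```python
from tensorflow import keras

model = build_eeg_conv1d()      # hypothetical builder from the sketch above
model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=1e-4, momentum=0.9),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True)
# x_train / y_train: training-fold epochs and one-hot labels (not shown here).
# model.fit(x_train, y_train, batch_size=91, epochs=200,
#           validation_split=0.1, callbacks=[early_stop])
```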
3 Results
In the first experiments, the individual models were validated in a three-class
classification. The results from each cross-validation iteration were aggregated in
the confusion matrix presented in Fig. 3. The average results presented in Table 1
are poor but, on the other hand, the results for individual folds could reach almost
70% accuracy.
4 Discussion
Regardless of the network architecture, the obtained results oscillate around
50–70% for two classes and about 50% for three classes. This indicates that
the solutions used to distinguish cognitive disorders from AD, or healthy cases
from AD, are not the best choice for the described task. The approaches in which
the network input was the raw EEG signal directly in the time domain did not
give good results, and the use of the PSD did not improve the accuracy. The large
standard deviation shows that the results depended heavily on the composition of
the training and test sets, which suggests that the dataset contains too few cases
to achieve global generalization for the problem of cognitive disorders classification.
We evaluated three different CNN approaches presented in the literature
concerning EEG classification. They were used successfully in different classifi-
cation tasks: Parkinson's Disease (Oh 2020), cognitive impairments (Ieracitano
2019), and motor imagery (Schirrmeister 2017). We reproduced the EEG signal
processing and preparation for deep learning classification presented in other
papers. The results showed the limitations of classifying cognitive impairments
in PD patients with known CNN approaches. The cohort in the presented
study is relatively small, considering the diversity of PD duration, other dis-
eases, and pathologies in the EEG recordings. The other shortcoming is the
difference in recording lengths, which leads to imbalances in the training and
testing sets during patient-based cross-validation. For this reason, it was difficult
to adjust the parameters so as to keep the balance between a complete lack of
generalization ability and overfitting.
One of the necessary preprocessing steps, which will be included in further
work, is the detection of frames containing artefacts and their removal from further
processing. Then, various ways of transforming the EEG signal into a feature
vector, which will constitute the input of the neural network, will be tested. An
additional element that could improve the classification accuracy is data
augmentation by overlapping consecutive frames.
In the works published so far on the automatic classification of cognitive
disorders based on EEG, the occurrence of neurological disorders, including
PD, was one of the exclusion criteria [12,20]. However, the risk of developing
cognitive impairment in people with PD is very high (20.2% of PD patients
develop mild cognitive impairment at the time of diagnosis of Parkinson's dis-
ease, and approximately 40–50% do so within five years of PD diagnosis [21]);
therefore, it is necessary to look for solutions enabling its early detection,
especially in the presence of neurological disorders.
5 Conclusion
The paper proposed the first approach to evaluating cognitive declines in Parkin-
son's Disease using deep learning classifiers. The obtained results are promising,
and they indicate that such a solution is possible. On the other hand, the CNN
architectures proposed for similar tasks are not quite suitable for patients
with PD. Further work should therefore be undertaken to create a tool for
monitoring changes in the cognitive impairments of PD patients.
Acknowledgement. The study was realized within the project “TeleBrain – Artifi-
cially Intelligent EEG Analysis in the Cloud” (grant no. WPN3/9/TeleBrain/2018).
References
1. Aarsland, D., Zaccai, J., Brayne, C.: A systematic review of prevalence studies of
dementia in Parkinson’s disease. Mov. Disord. 20(10), 1255–1263 (2005). https://
doi.org/10.1002/mds.20527
2. Babiloni, C., et al.: Abnormalities of cortical neural synchronization mechanisms in
patients with dementia due to Alzheimer’s and Lewy body diseases: an EEG study.
Neurobiol. Aging 55, 143–158 (2017). https://doi.org/10.1016/j.neurobiolaging.
2017.03.030
3. Bashivan, P., Rish, I., Yeasin, M., Codella, N.: Learning representations from EEG
with deep recurrent-convolutional neural networks (2015)
4. Bonanni, L., et al.: Quantitative electroencephalogram utility in predicting conver-
sion of mild cognitive impairment to dementia with Lewy bodies. Neurobiol. Aging
36(1), 434–445 (2015). https://doi.org/10.1016/j.neurobiolaging.2014.07.009
5. Bousleiman, H., et al.: P122. Alpha1/theta ratio from quantitative EEG (qEEG) as
a reliable marker for mild cognitive impairment (MCI) in patients with Parkinson’s
disease (PD). Clin. Neurophysiol. 126(8), e150–e151 (2015). https://doi.org/10.
1016/j.clinph.2015.04.249
6. Craik, A., He, Y., Contreras-Vidal, J.: Deep learning for electroencephalogram
(EEG) classification tasks: a review. J. Neural Eng. 16(3), 031001 (2019). https://
doi.org/10.1088/1741-2552/ab0ab5
7. Fonseca, L.C., Tedrus, G.M., Carvas, P.N., Machado, E.C.: Comparison of quanti-
tative EEG between patients with Alzheimer’s disease and those with Parkinson’s
disease dementia. Clin. Neurophysiol. 124(10), 1970–1974 (2013). https://doi.org/
10.1016/j.clinph.2013.05.001
8. Goldman, J.G., Sieg, E.: Cognitive impairment and dementia in Parkinson disease.
Clin. Geriatr. Med. 36, 365–377 (2020). https://doi.org/10.1016/j.cger.2020.01.001
9. Hely, M.A., Reid, W.G., Adena, M.A., Halliday, G.M., Morris, J.G.: The Sydney
multicenter study of Parkinson’s disease: the inevitability of dementia at 20 years.
Mov. Disord. 23(6), 837–844 (2008). https://doi.org/10.1002/mds.21956
10. Hussein, R., Palangi, H., Ward, R.K., Wang, Z.J.: Optimized deep neural network
architecture for robust detection of epileptic seizures using EEG signals. Clin.
Neurophysiol. 130(1), 25–37 (2019). https://doi.org/10.1016/j.clinph.2018.10.010
11. Ieracitano, C., Mammone, N., Bramanti, A., Hussain, A., Morabito, F.C.: A con-
volutional neural network approach for classification of dementia stages based on
2d-spectral representation of EEG recordings. Neurocomputing 323, 96–107 (2019)
12. Ieracitano, C., Mammone, N., Hussain, A., Morabito, F.C.: A novel multi-modal
machine learning based approach for automatic classification of EEG recordings in
dementia. Neural Netw. 123, 176–190 (2020)
13. Klassen, B., et al.: Quantitative EEG as a predictive biomarker for Parkinson dis-
ease dementia. Neurology 77(2), 118–124 (2011). https://doi.org/10.1212/WNL.
0b013e318224af8d
14. Kumar, S., Sharma, A., Tsunoda, T.: Brain wave classification using long short-
term memory network based optical predictor. Sci. Rep. 9(1), 9153 (2019). https://
doi.org/10.1038/s41598-019-45605-1
15. Lee, A., Gilbert, R.M.: Epidemiology of Parkinson disease. Neurol. Clin. 34(4),
955–965 (2016). https://doi.org/10.1016/j.ncl.2016.06.012. Glob. Domest. Publ.
Health Neuroepidemiol
16. Litvan, I.: Diagnostic criteria for mild cognitive impairment in Parkinson’s dis-
ease: Movement disorder society task force guidelines. Mov. Disord. 27(3), 349–356
(2012). https://doi.org/10.1002/mds.24893
17. Medvedev, A.V., Agoureeva, G.I., Murro, A.M.: A long short-term memory neural
network for the detection of epileptiform spikes and high frequency oscillations.
Sci. Rep. 9(1), 19374 (2019). https://doi.org/10.1038/s41598-019-55861-w
18. Michielli, N., Acharya, U.R., Molinari, F.: Cascaded LSTM recurrent neural net-
work for automated sleep stage classification using single-channel EEG signals.
Comput. Biol. Med. 106, 71–81 (2019). https://doi.org/10.1016/j.compbiomed.
2019.01.013
19. Nejedly, P., et al.: Intracerebral EEG artifact identification using convolutional neu-
ral networks. Neuroinformatics 17(2), 225–234 (2018). https://doi.org/10.1007/
s12021-018-9397-6
20. Oltu, B., Akşahin, M.F., Kibaroğlu, S.: A novel electroencephalography based app-
roach for Alzheimer’s disease and mild cognitive impairment detection. Biomed.
Sig. Process. Control 63, 102223 (2021)
21. Pedersen, K.F., Larsen, J.P., Tysnes, O.B., Alves, G.: Natural course of mild cog-
nitive impairment in Parkinson disease. Neurology 88(8), 767–774 (2017). https://
doi.org/10.1212/WNL.0000000000003634
22. Praveena, M., Sarah, A., George, S.: Deep learning techniques for EEG signal appli-
cations - a review. IETE J. Res. 1–8 (2020). https://doi.org/10.1080/03772063.
2020.1749143
23. Saredakis, D., Collins-Praino, L., Gutteridge, D., Stephan, B., Keage, H.: Conver-
sion to MCI and dementia in Parkinson's disease: a systematic review and meta-
analysis. Parkinsonism Relat. Disord. 65, 20–31 (2019). https://doi.org/10.1016/
j.parkreldis.2019.04.020
24. Schirrmeister, R., Gemein, L., Eggensperger, K., Hutter, F., Ball, T.: Deep learn-
ing with convolutional neural networks for decoding and visualization of EEG
pathology. In: 2017 IEEE Signal Processing in Medicine and Biology Symposium
(SPMB), pp. 1–7 (2017). https://doi.org/10.1109/SPMB.2017.8257015
25. Thatcher, R., North, D., Biver, C.: EEG and intelligence: relations between EEG
coherence, EEG phase delay and power. Clin. Neurophysiol. 116(9), 2129–2141
(2005). https://doi.org/10.1016/j.clinph.2005.04.026
26. Tsiouris, K.M., Pezoulas, V.C., Zervakis, M., Konitsiotis, S., Koutsouris, D.D.,
Fotiadis, D.I.: A long short-term memory deep learning network for the prediction
of epileptic seizures using EEG signals. Comput. Biol. Med. 99, 24–37 (2018).
https://doi.org/10.1016/j.compbiomed.2018.05.019
27. Tysnes, O.-B., Storstein, A.: Epidemiology of Parkinson’s disease. J. Neural
Transm. 124(8), 901–905 (2017). https://doi.org/10.1007/s00702-017-1686-y
28. Vasconcellos, L.F.R., et al.: Mild cognitive impairment in Parkinson’s disease: char-
acterization and impact on quality of life according to subtype. Geriatr. Gerontol.
Int. 19(6), 497–502 (2019). https://doi.org/10.1111/ggi.13649
29. Weil, R.S., Costantini, A.A., Schrag, A.E.: Mild cognitive impairment in Parkin-
son’s disease—what is it? Curr. Neurol. Neurosci. Rep. 18(4), 1–11 (2018). https://
doi.org/10.1007/s11910-018-0823-9
30. Williams-Gray, C.H., et al.: The distinct cognitive syndromes of Parkinson’s dis-
ease: 5 year follow-up of the CamPaIGN cohort. Brain 132(11), 2958–2969 (2009).
https://doi.org/10.1093/brain/awp245
31. Williams-Gray, C.H., et al.: The CamPaIGN study of Parkinson's disease: 10-year
outlook in an incident population-based cohort. J. Neurol. Neurosurg. Psychi-
atry 84(11), 1258–1264 (2013). https://doi.org/10.1136/jnnp-2013-305277. URL
https://jnnp.bmj.com/content/84/11/1258
32. Zhang, X., Yao, L., Wang, X., Monaghan, J., McAlpine, D., Zhang, Y.: A survey on
deep learning-based non-invasive brain signals: recent advances and new frontiers.
J. Neural Eng. 18(3), 031002 (2021). https://doi.org/10.1088/1741-2552/abc902
33. Zhang, Y., et al.: An investigation of deep learning models for EEG-based emo-
tion recognition. Front. Neurosci. 14 (2020). https://doi.org/10.3389/fnins.2020.
622759. URL https://www.frontiersin.org/article/10.3389/fnins.2020.622759
The Role of Two-Dimensional Entropies
in IRT-Based Pregnancy Determination
Evaluated on the Equine Model
1 Introduction
Detection of pregnancy is important for reducing losses and managing the ani-
mal herd. Pregnancy detection is based on a clinical examination supported by
additional tests, classified accordingly as invasive or non-invasive
methods. Invasive additional tests, such as hormone level evaluation, require
more than a single blood sampling, whereas non-invasive methods, such as ultra-
sound examination, require direct contact, which can also be a stressful factor
[4,16,22]. Thus, there is a need for a contactless, objective, quantitative method
of pregnancy detection which could be useful in large herds of animals or in non-
domestic animals. Infrared thermography (IRT) is a non-invasive method that
domestic animals. Infrared thermography (IRT) is a non-invasive method that
measures the temperature distribution of a surface and transforms it into an image
representing the differences in emitted heat [17]. Therefore, IRT is a useful diag-
nostic method [10,14], which has been successfully applied in veterinary medicine
to assess pregnancy in various species [3,20]. However, IRT is sensitive to different
types of factors: internal and external. Internal factors are related
to tissue metabolism and blood flow. During pregnancy, blood flow through the
uterus increases, making it possible for IRT to detect even subtle temperature
changes in its vicinity. External factors, such as fluctuations in ambient
temperature [26], other weather conditions [20], and individual characteristics of
the equine body surface [9], can also affect the recorded image. Therefore, thermal
images should be obtained under standard conditions. Achieving this assumption
is possible; however, the time required to detect pregnancy-related changes in body
surface temperature [3] and the accuracy of pregnancy detection [3,7,20] limit the
application of IRT to the late stages of pregnancy. Therefore, there is a need to
search for methods of describing thermal images for early and middle pregnancy
detection. There are many approaches to describing
images. One of them is to treat an image as a texture [13]. Texture can be defined
as a measure of roughness, coarseness, directionality, or regularity [29]. Relatively
recently, entropy-based methods have been introduced for the quantitative
description of texture. These methods are simple to implement and are based on
the extension of measures defined for one-dimensional data. This new class of
methods is computed directly on the image and is related to image irregularity or
complexity [13]. Until now, methods were known in which an entropy measure was
calculated not directly on the image but on a matrix obtained from a processing
step applied to the image [28]. Entropy-based approaches certainly deserve
attention and need to be evaluated for usefulness in different types of applications
[30]. Therefore, in our work, entropy-based methods of image texture analysis have
been applied to the equine model of pregnancy. This study aimed to compare the
results of four entropy-based methods – two-dimensional sample entropy, two-
dimensional fuzzy entropy, two-dimensional dispersion entropy, and two-dimensional
distribution entropy – in order to find differences between IRT images obtained from
pregnant and non-pregnant mares, as well as the contribution of the individual color
channels to the overall entropy measures of the thermal images.
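As an illustration of the first of these measures, the sketch below gives a naive SampEn2D implementation in the spirit of [25]. The window-pairing and tolerance conventions are assumptions, and the quadratic pairwise loop is written for clarity rather than speed.

```python
import numpy as np

def sample_entropy_2d(image, m=3, r=0.2):
    """Naive two-dimensional sample entropy (SampEn2D) sketch."""
    u = np.asarray(image, dtype=float)
    tol = r * u.std()                       # tolerance relative to image std
    h, w = u.shape
    # Anchors that fit both an m x m and an (m+1) x (m+1) window.
    anchors = [(i, j) for i in range(h - m) for j in range(w - m)]

    def count_similar(size):
        windows = [u[i:i + size, j:j + size] for i, j in anchors]
        count = 0
        for a in range(len(windows) - 1):
            for b in range(a + 1, len(windows)):
                # Chebyshev distance between windows.
                if np.max(np.abs(windows[a] - windows[b])) < tol:
                    count += 1
        return count

    u_m, u_m1 = count_similar(m), count_similar(m + 1)
    if u_m == 0 or u_m1 == 0:
        return np.inf                       # undefined when no matches occur
    return -np.log(u_m1 / u_m)
```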
The same body area of the mares – the left lateral surface of the abdomen – was
imaged using a non-contact thermal camera (FLIR ThermaCAM E60, FLIR Systems
Brasil, Brazil) under the same imaging conditions [15,26] (0.99 emissivity, closed
space devoid of wind and sun radiation, 2.0 m distance, position of the central
beam on the half of the vertical line through the tuber coxae, 1 h acclimatiza-
tion to the imaging conditions, brushing off dirt and mud from the imaged area
15 min before imaging). A total of 160 thermal images were taken four times
every two months, starting in June and ending in December, with the ambi-
ent temperature and humidity ranging from 1.0 °C and 50% to 24 °C and 90%,
respectively. The obtained thermal images were set in the same temperature range
(10.0 to 40.0 °C) using the FLIR Tools Professional software (FLIR Systems
Brasil, Brazil) and manually segmented. During segmentation, one region of
interest (ROI) was annotated on each thermal image in the flank area of the left
lateral surface of the abdomen, ranging from the vertical line behind the tuber
coxae, the dorsal edge of the abdomen, the caudal edge of the last rib, and the
lower 2/3 of the abdomen height. The annotated thermal images were converted
to bitmaps and prepared for entropy-based image texture analysis in the ImageJ
software (Wayne Rasband, National Institutes of Health, USA). In total, 160
images were registered, within which 160 ROIs were separated (Fig. 1). Each
ROI was converted to grayscale, and then four methods of image conver-
sion were performed (Fig. 1). The first method uses only the Red channel, the second
the Green channel, and the third the Blue channel. The fourth method is based on
grayscale computation with the cv2 module in Python, version 3.8.5 64-bit. Next,
in each color channel separately, the entropy-based texture analysis was per-
formed using two-dimensional sample entropy, two-dimensional fuzzy entropy,
two-dimensional dispersion entropy, and two-dimensional distribution entropy.
The mentioned features were computed in Python, version 3.8.5 64-bit.
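A minimal sketch of the channel-wise conversion described above follows; the file name is hypothetical, and note that OpenCV loads bitmaps in B, G, R channel order.

```python
import cv2

roi = cv2.imread("roi_0001.bmp")              # hypothetical ROI bitmap
blue, green, red = cv2.split(roi)             # three single-channel images
# Weighted grayscale conversion (0.299 R + 0.587 G + 0.114 B).
gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
```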
Fig. 1. Example of a pregnant mare and the annotated ROI converted to the
grayscale image, Red (R) channel, Green (G) channel, and Blue (B) channel
After mapping, the obtained results are matched to dispersion patterns π,
and the probability p(π) of each dispersion pattern is calculated. If all possible
two-dimensional image dispersion patterns have the same probability,
DispEn2D reaches its maximum value. If only one probability value p(π) is
different from zero, DispEn2D reaches its minimum value and the image has a
regular pattern.
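Written out, this is the Shannon entropy of the dispersion-pattern distribution; the normalization shown below is a common convention (cf. [2]) and is stated here as an assumption:

```latex
\mathrm{DispEn2D} = -\sum_{\pi} p(\pi)\,\ln p(\pi),
\qquad
\mathrm{NDispEn2D} = \frac{\mathrm{DispEn2D}}{\ln\!\left(c^{\,m^{2}}\right)},
```

where c is the number of classes and m × m the embedding window, so the normalized value reaches 1 for a uniform pattern distribution and 0 when a single pattern has probability one.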
3 Results
In all calculated features, two parameters were manually selected: m = 3 and
r = 0.2. Within the entropy features measured in the grayscale images, the
SampEn2D, FuzzEn2D, and DispEn2D were higher in the pregnant than in the
non-pregnant group, whereas for the DistEn2D no difference was found between
the groups (Gray boxes in Figs. 2, 3, 4 and 5). When the thermal images were
converted to the individual color channels, the most significant differences between
the studied groups were found for the Red channel. Concerning the Red channel, the
SampEn2D and FuzzEn2D were lower in the pregnant than in the non-pregnant
group, whereas the DispEn2D and DistEn2D were, by contrast, higher (Red boxes in
Figs. 2, 3, 4 and 5). Concerning the Green channel, the FuzzEn2D, DispEn2D,
and DistEn2D were higher in the pregnant than in the non-pregnant group,
whereas for the SampEn2D no difference was found between the groups (Green boxes
in Figs. 2, 3, 4 and 5). Finally, concerning the Blue channel, the SampEn2D and
FuzzEn2D were lower in the pregnant than in the non-pregnant group, whereas
the DispEn2D and DistEn2D were, by contrast, higher (Blue boxes in Figs. 2, 3,
4 and 5). When the different color channels were compared, a different pattern
was observed for each entropy feature.
Fig. 2. The box plots of two-dimensional sample entropy (SampEn2D). The upper
whisker represents the maximum value; the upper line of the box represents Q3 (upper
quartile); the center line inside the box represents the median; the lower line of the box
represents Q1 (lower quartile); and the lower whisker represents the minimum value.
Lower case letters indicate differences between groups (non-pregnant; pregnant) for
p < 0.05. Asterisks indicate differences between color channels (Grayscale, Red, Green,
Blue) (∗∗p < 0.01; ∗∗∗p < 0.0001)
In the non-pregnant group, the FuzzEn2D differed between all of the exam-
ined color channels. In this group, the FuzzEn2D was the lowest in the Red
channel, low in the Grayscale images, high in the Green channel, and the high-
est in the Blue channel. In the pregnant group, the FuzzEn2D was lower in the
Table 2. The results of two-dimensional fuzzy entropy (FuzzEn2D) for different color
channels: mean, median, standard deviation (std) and the significance of p value
between groups (non-pregnant; pregnant)
Fig. 3. The box plots of two-dimensional fuzzy entropy (FuzzEn2D). The upper whisker
represents the maximum value; the upper line of the box represents Q3 (upper quar-
tile); the center line inside the box represents the median; the lower line of the box
represents Q1 (lower quartile); and the lower whisker represents the minimum value.
Lower case letters indicate differences between groups (non-pregnant; pregnant) for
p < 0.05. Asterisks indicate differences between color channels (Grayscale, Red, Green,
Blue) (∗∗p < 0.01; ∗∗∗p < 0.0001)
Red channel than in the other color channels, with no differences between the
Grayscale images, Green channel, and Blue channel (Table 2, Fig. 3).
In both groups, the DispEn2D was lower in the Red channel than in the other
color channels, with no differences between the others (Table 3, Fig. 4).
In both groups, the DistEn2D was lower in the Red channel than in the other
color channels. In the non-pregnant group, the DistEn2D was lower in the Blue
channel than in the Grayscale images and the Green channel, with no differences
between the last two. In the pregnant group, by contrast, the DistEn2D was higher
in the Green channel than in the Grayscale images and the Blue channel, with
no differences between the last two (Table 4, Fig. 5).
Fig. 4. The box plots of two-dimensional dispersion entropy (DispEn2D). The upper
whisker represents the maximum value; the upper line of the box represents Q3 (upper
quartile); the center line inside the box represents the median; the lower line of the box
represents Q1 (lower quartile); and the lower whisker represents the minimum value.
Lower case letters indicate differences between groups (non-pregnant; pregnant) for
p < 0.05. Asterisks indicate differences between color channels (Grayscale, Red, Green,
Blue) (∗∗p < 0.01; ∗∗∗p < 0.0001)
Fig. 5. The box plots of two-dimensional distribution entropy (DistEn2D). The upper
whisker represents the maximum value; the upper line of the box represents Q3 (upper
quartile); the center line inside the box represents the median; the lower line of the box
represents Q1 (lower quartile); and the lower whisker represents the minimum value.
Lower case letters indicate differences between groups (non-pregnant; pregnant) for
p < 0.05. Asterisks indicate differences between color channels (Grayscale, Red, Green,
Blue) (∗∗∗p < 0.0001)
4 Discussion
In recent research, IRT imaging of the mares' flank area of the abdomen was
successfully used for pregnancy evaluation using conventional [3,20] and
advanced [7] approaches. In the most recent research, it has been shown
that two entropy features extracted using the Gray Level Co-occurrence Matrix
(GLCM) approach, summation entropy (SumEntrp) and Entropy, increased with
the progression of pregnancy in the Red and I channels of IRT images [7]. There-
fore, in the current study, the investigation of mares' IRT images with four detailed
entropy-based measures is fully justified. One may observe that in the Red and
Blue channels, the studied groups differed regardless of the entropy feature. In
both of these channels, the SampEn2D and FuzzEn2D decreased with the preg-
nancy, whereas the DispEn2D and DistEn2D increased with the pregnancy. The
current SampEn2D and FuzzEn2D results are convergent with the recent find-
ings, according to which the entropy features increased with the pregnancy pro-
gression [7]. It has been suggested that changes in physiological condition, such
as exercise [8,19] or pregnancy [7], cause a rise in the degree of thermal energy dis-
sipation and thus in the entropy of the thermal image texture. In the case of pregnancy,
the thermal energy emission increases with the increase in metabolic energy pro-
duced by the growing fetus and the enlarging uterus [3,4,11]. As the temperatures
on IRT images are coded with respective image colors, the Red component
reflects the high temperatures, whereas the Blue one reflects the low temperatures
[21]. Therefore, the largest differences, observed currently in the Red and Blue
channels, are consistent with conventional thermal results indicating that the minimal
and maximal temperatures differ significantly between pregnancy states [3,20].
One can also note that, both in this and the previous [7] research, the share of dif-
ferences in the studied features in the Green channel in the thermal pattern of
the mare’s flank area was the lowest. Considering that each IRT image is com-
posed of the total effect of three separately considered channels, the Red, Green,
and Blue one [27], the final findings in the Grayscale image can be indicative of
the whole thermal image. It is worth noting that in three out of four explored
entropy-based methods, the SampEn2D, FuzzEn2D, and DispEn2D, the values
of features were higher in the pregnant than non-pregnant group. One may sug-
gest that the IRT images obtained during pregnancy showed a more irregular
and complex texture of the thermal body surface. This could be considered as
the most important finding of the current preliminary research, as the main
result is convergent with the current state of knowledge [3,7,20]. Our primary
entropy-based results seem to be promising in distinguishing the pregnant and
non-pregnant state of the mare based on their IRT images of the flank area.
However, further research is required to apply these entropy-based measures not
only to detect pregnancy but also to monitor the progression of pregnancy during
its duration.
5 Conclusion
Thermal images obtained from non-pregnant and pregnant mares can be
successfully investigated with four entropy-based measures after conversion to
a grayscale image and the three basic color channels. Signs of higher irregu-
larity and complexity of the IRT image texture were evidenced for the composite
Grayscale image using three out of the four explored entropy-based methods. How-
ever, the Red and Blue channel-dependent patterns of the IRT image differed
between the entropy features. Although this pattern could be initially classed
as SampEn2D/FuzzEn2D-dependent or DispEn2D/DistEn2D-dependent,
the role and utility of the individual entropy features and color channels in
distinguishing or monitoring pregnancy or other conditions require further research.
References
1. Azami, H., Escudero, J., Humeau-Heurtier, A.: Bidimensional distribution entropy
to analyze the irregularity of small-sized textures. IEEE Signal Process. Lett. 24(9),
1338–1342 (2017)
2. Azami, H., da Silva, L.E.V., Omoto, A.C.M., Humeau-Heurtier, A.: Two-
dimensional dispersion entropy: an information-theoretic method for irregularity
analysis of images. Signal Process.: Image Commun. 75, 178–187 (2019)
3. Bowers, S., Gandy, S., Anderson, B., Ryan, P., Willard, S.: Assessment of preg-
nancy in the late-gestation mare using digital infrared thermography. Theriogenol-
ogy 72(3), 372–377 (2009)
4. Bucca, S., Fogarty, U., Collins, A., Small, V.: Assessment of feto-placental well-
being in the mare from mid-gestation to term: transrectal and transabdominal
ultrasonographic features. Theriogenology 64(3), 542–557 (2005)
5. Chen, W., Wang, Z., Xie, H., Yu, W.: Characterization of surface EMG signal
based on fuzzy entropy. IEEE Trans. Neural Syst. Rehabil. Eng. 15(2), 266–272
(2007)
6. Dascanio, J., McCue, P.: Equine Reproductive Procedures. Wiley, Hoboken (2021)
7. Domino, M., et al.: Advances in thermal image analysis for the detection of preg-
nancy in horses using infrared thermography. Sensors 22(1), 191 (2022)
8. Domino, M., et al.: The effect of rider: horse bodyweight ratio on the superficial
body temperature of horse’s thoracolumbar region evaluated by advanced thermal
image processing. Animals 12(2), 195 (2022)
9. Domino, M., Romaszewski, M., Jasiński, T., Maśko, M.: Comparison of the surface
thermal patterns of horses and donkeys in infrared thermography images. Animals
10(12), 2201 (2020)
10. Durrant, B.S., Ravida, N., Spady, T., Cheng, A.: New technologies for the study
of carnivore reproduction. Theriogenology 66(6–7), 1729–1736 (2006)
11. Fowden, A.L., Giussani, D., Forhead, A.: Physiological development of the equine
fetus during late gestation. Equine Vet. J. 52(2), 165–173 (2020)
12. Hilal, M., Berthin, C., Martin, L., Azami, H., Humeau-Heurtier, A.: Bidimensional
multiscale fuzzy entropy and its application to pseudoxanthoma elasticum. IEEE
Trans. Biomed. Eng. 67(7), 2015–2022 (2019)
13. Humeau-Heurtier, A.: Texture feature extraction methods: a survey. IEEE Access
7, 8975–9000 (2019)
14. Jones, M., et al.: Assessing pregnancy status using digital infrared thermal imaging
in Holstein heifers. In: Journal of Dairy Science, vol. 88, pp. 40–41 (2005)
15. Kastelic, J., Cook, R., Coulter, G., Wallins, G., Entz, T.: Environmental factors
affecting measurement of bovine scrotal surface temperature with infrared ther-
mography. Anim. Reprod. Sci. 41(3–4), 153–159 (1996)
16. Kirkpatrick, J.F., Lasley, B.L., Shideler, S.E., Roser, J.F., Turner, J.W., Jr.: Non-
instrumented immunoassay field tests for pregnancy detection in free-roaming feral
horses. J. Wildl. Manag. 57, 168–173 (1993)
17. Lahiri, B., Bagavathiappan, S., Jayakumar, T., Philip, J.: Medical applications of
infrared thermography: a review. Infrared Phys. Technol. 55(4), 221–235 (2012)
18. Li, P., Liu, C., Li, K., Zheng, D., Liu, C., Hou, Y.: Assessing the complexity
of short-term heartbeat interval series by distribution entropy. Med. Biol. Eng.
Comput. 53(1), 77–87 (2014). https://doi.org/10.1007/s11517-014-1216-0
19. Masko, M., Borowska, M., Domino, M., Jasinski, T., Zdrojkowski, L., Gajewski,
Z.: A novel approach to thermographic images analysis of equine thoracolumbar
region: the effect of effort and rider’s body weight on structural image complexity.
BMC Vet. Res. 17(1), 1–12 (2021)
20. Maśko, M., Witkowska-Pilaszewicz, O., Jasiński, T., Domino, M.: Thermal fea-
tures, ambient temperature and hair coat lengths: limitations of infrared imaging
in pregnant primitive breed mares within a year. Reprod. Domest. Anim. 56(10),
1315–1328 (2021)
21. Mccafferty, D.J.: The value of infrared thermography for research on mammals:
previous applications and future directions. Mamm. Rev. 37(3), 207–223 (2007)
22. McCue, P.M.: Reproductive evaluation of the mare. Equine Reproductive Proce-
dures, p. 1 (2014)
23. Richman, J.S., Moorman, J.R.: Physiological time-series analysis using approxi-
mate entropy and sample entropy. Am. J. Physiol.-Heart Circ. Physiol. 278(6),
H2039–H2049 (2000)
24. Rostaghi, M., Azami, H.: Dispersion entropy: a measure for time-series analysis.
IEEE Signal Process. Lett. 23(5), 610–614 (2016)
25. da Silva, L.E.V., Senra Filho, A.C.D.S., Fazan, V.P.S., Felipe, J.C., Murta, L.O.:
Two-dimensional sample entropy analysis of rat sural nerve aging. In: 2014 36th
Annual International Conference of the IEEE Engineering in Medicine and Biology
Society, pp. 3345–3348. IEEE (2014)
26. Soroko, M., Howell, K., Dudek, K.: The effect of ambient temperature on infrared
thermographic images of joints in the distal forelimbs of healthy racehorses. J.
Therm. Biol. 66, 63–67 (2017)
27. Szczypiński, P., Klepaczko, A., Pazurek, M., Daniel, P.: Texture and color based
image segmentation and pathology detection in capsule endoscopy videos. Comput.
Methods Programs Biomed. 113(1), 396–411 (2014)
28. Szczypiński, P.M., Klepaczko, A.: MaZda – a framework for biomedical image tex-
ture analysis and data exploration. In: Biomedical Texture Analysis, pp. 315–347.
Elsevier (2017)
29. Tamura, H., Mori, S., Yamawaki, T.: Textural features corresponding to visual
perception. IEEE Trans. Systems Man Cybern. 8(6), 460–473 (1978)
30. Zarychta, P.: Application of fuzzy image concept to medical images matching. In:
Pietka, E., Badura, P., Kawa, J., Wieclawek, W. (eds.) ITIB 2018. AISC, vol. 762,
pp. 27–38. Springer, Cham (2019). https://doi.org/10.1007/978-3-319-91211-0_3
Eye-Tracking as a Component
of Multimodal Emotion Recognition
Systems
1 Introduction
Emotion recognition is a field of research that has constantly been gaining pop-
ularity. Algorithms enabling the translation of these non-verbal communication cues
can not only provide another step towards enriching computer-user interaction
but may also be a helpful tool in the treatment of anxiety, depression
or autism spectrum disorders. The beginnings of research on automatic human
emotion detection go back decades, but the constant development of new sig-
nal processing, machine learning and computer vision methods, combined with
an increase in the accuracy and availability of measuring equipment, prompts
researchers to further work in this area.
Computer recognition of emotion (also known as affective computing) is
mainly aimed at adapting man-machine communication to the emotional state of
the human user [25]. As has been proven in psychology (e.g. [39]), emotions favor
the fast (i.e. intuitive) part of human cognition and reduce slow analytic pro-
cesses. Consequently, with an increase in the user's emotion intensity, his or her
perception (visual, auditory and other) becomes faster, less detailed and more
prone to stereotypes. Certainly, maintaining unambiguous communication
under the variable emotional status of the human requires the machine to recognize
the emotions and to adapt its cues accordingly.
Computer recognition of emotion is also employed as part of automatic
behavior tracking in real and virtual life (e.g. security surveillance, workplace
monitoring, communication with impaired people or computer games). The wide
range of possible applications justifies the attention paid recently by research
teams worldwide to developing an effective yet unobtrusive way of detection.
Achieving a method for the automatic assessment of emotions requires close
cooperation between psychologists and engineers, as the selection of appropriate
parameters and further analyses require specialized knowledge. Among the most
popular emotion classification models are the Circumplex [30], Ekman's [12] and
Plutchik's models. A visual representation of Plutchik's emotion model can be
presented as a wheel. It consists of eight basic bipolar emotions that result
from adaptation mechanisms: anticipation, joy, trust, fear, surprise, sadness,
disgust and anger. In this theory, all other emotions are either a combination of
the basic emotions or a version of a basic emotion of greater or lesser intensity
[29]. Paul Ekman in his theory introduced six basic emotions: joy, sadness, anger,
fear, disgust and surprise, which in his opinion are experienced and recognized
universally, regardless of culture [12]. The Circumplex emotion model differs from
the other mentioned models because it is not discrete: every emotion is described
by two parameters, valence and arousal, and can be shown on a two-dimensional
plane created by the axes corresponding to these parameters [30].
Taking into account the models introduced by psychologists, researchers aim
to create a method that allows a computer system to translate measurement
signals into emotions. Depending on the usage scenario, solutions based on facial
expression recognition [20,23,38], speech and voice analysis [10,16,26] or physi-
ological signals have been presented. Observations of emotional state changes in
physiological signals are possible due to changes in the activity of the autonomic
nervous system. It consists of two bipolar components – the sympathetic and
parasympathetic nervous systems. The former is responsible for the preparation
of the organism in the event of a potential threat: it causes symptoms such as
increased heart rate, pupil size, blood pressure and respiration rate, and excessive
sweating. By contrast, the latter aims to maintain the homeostasis of the organism;
its activity causes, among others, a decrease in heart rate, pupil size and blood
pressure, as well as muscle relaxation [17].
Common knowledge about the effects of these systems and of brainwave activ-
ity was the basis for the development of emotion recognition methods built on
the analysis of ECG (electrocardiography), EEG (electroencephalography), EOG
(electrooculography), EMG (electromyography), GSR (Galvanic Skin Response),
Respiration Rate or HRV (Heart Rate Variability) signals. The solutions
described in the literature use single signals from the list above, but also
combinations of several of them. Nevertheless, the results obtained with these
methods still leave room for improvement. Eye-tracking is a relatively new and
promising approach to emotional evaluation. The use of eye-movement features
in a multimodal system could enrich emotion assessment algorithms with
additional information. Moreover, eye-tracking is related to the principal human
sense of sight, which rapidly responds to emotion changes in search of further cues.
Consequently, the pupils become larger and eye motions show more variance with
the growth of emotion intensity. The scanpath also has a technical advantage –
it can be recorded remotely and, in particular circumstances, without the knowledge
of the subject. For similar advantages, eye-tracking has recently proved
effective for the detection of deception [9] in various investigation scenarios.
In this paper, we provide an overview of the solutions presented so far that use
eye-tracking as a method in multimodal emotion recognition. The paper is organized
as follows: first, the eye features of the greatest importance are presented; then,
commonly used eye-tracking devices are characterized; the next section contains a
description of the architectures introduced to date. In the final part, a summary
and comparison are given, along with the advantages and disadvantages of the
presented approaches.
Research on eye movement dates back to the beginning of the 20th century.
The author of the first eye-tracking device was Edmund Huey,
and the solution he presented allowed eyes to be tracked along a horizontal
line. It consisted of a lens with a pointer attached to it; the movement of
the pointer reflected the movement of the eye while reading. Advances in
technology allowed the creation of less invasive, more accurate and more convenient
methods. Nowadays, eye-trackers use image processing algorithms to calculate the
gaze direction from the pupil-center corneal reflection caused by a near-infrared
light source. Webcam image-based algorithms are also gaining popularity, but
follow-up research must be conducted to assess their accuracy compared to
near-infrared-based eye-tracking.
Both proprietary solutions and commercially available eye-tracking devices
are used in research. They can be divided into two groups according to the
way they work: static and wearable (i.e. head-mounted). Eye-tracking is also used
in AR/VR headsets. Both options have their advantages and disadvantages. The
former allows for more natural conditions, while the latter enables measurement
while observing both the real world and the virtual world presented on the screen;
the resulting gaze record may easily be laid over the layer of the displayed visual
stimulus. In some cases, however, an additional algorithm is needed to trace the
face and maintain focus on the eye. Alternatively, the head may be immobilized
in a chin support, which is not comfortable for the observer, particularly in
long-lasting experiments.
[8] (Fig. 2). Because of that, there is a lack of depth information, and to carry out
eye-tracking, additional calibration has to be conducted so that an accurate model
of the gaze trajectory can be built. In recent works, various parameters of
the signal captured with commercially available virtual reality and mixed reality
glasses have been evaluated; among the most popular devices are the Microsoft
Hololens 2 [3], HTC Vive Pro Eye [34], Fove-0, and the Varjo VR-1 [37]. Besides
creating a more immersive experimental setup, the advantage of eye-tracking
in AR/VR sets is the rigid correspondence of scene and eye coordinates; thus,
the scanpath is natively represented in the geometry of the scene.
Guo et al. [14], apart from BDAE, took advantage of feature fusion. They
combined features extracted from eye images, eye movement, and EEG signals:
from eye movement, pupil diameter, dispersion, fixation duration, saccades, and
statistics of different eye events, e.g., blink frequency; from eye images, temporal
features obtained with a combination of CNN and LSTM networks; and from
EEG, Differential Entropy (DE) features. SMI ETG glasses were used to capture
the eye images. A Support Vector Machine (SVM) with a linear kernel was used
as the classification model. They found that the accuracy of the model trained
on all of the features was 79.63%, with the combination of eye movement and
eye images following closely with a result of 71.99%.
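To make the fusion step concrete, the following is a minimal sketch of feature-level fusion with a linear SVM, loosely following the setup described above; the array shapes and random data are hypothetical stand-ins, not the actual features of [14].

```python
# Hypothetical sketch of feature-level fusion with a linear SVM; the feature
# arrays below are random stand-ins for the modalities described in [14].
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200                                    # number of labelled samples (assumed)
eye_movement = rng.normal(size=(n, 10))    # pupil diameter, fixation, saccade stats
eye_image = rng.normal(size=(n, 32))       # temporal CNN+LSTM embeddings
eeg_de = rng.normal(size=(n, 62))          # differential entropy features from EEG

X = np.hstack([eye_movement, eye_image, eeg_de])   # simple concatenation fusion
y = rng.integers(0, 3, size=n)                     # three emotion classes (assumed)

clf = SVC(kernel='linear')                 # linear-kernel SVM, as in [14]
print(cross_val_score(clf, X, y, cv=5).mean())
```

Concatenation is the simplest fusion strategy; the cited work also evaluates the modalities separately before combining them.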
In contrast to the combination of EEG and eye-tracking signals, Perdiz et al.
[27] used EMG and EOG signals to classify four basic expressions: neutral, sad,
happy, and angry. However, more research has to be conducted to obtain more
information about the accuracy of the proposed method.
5 Conclusion
References
1. Alshehri, M., Alghowinem, S.: An exploratory study of detecting emotion states
using eye-tracking technology. In: 2013 Science and Information Conference, pp.
428–433 (2013)
2. Ariel, R., Castel, A.D.: Eyes wide open: enhanced pupil dilation when selectively
studying important information. Exp. Brain Res. 232(1), 337–344 (2013). https://
doi.org/10.1007/s00221-013-3744-5
3. Aziz, S.D., Komogortsev, O.V.: An assessment of the eye tracking signal quality
captured in the Hololens 2. arXiv preprint arXiv:2111.07209 (2021)
4. Beljaars, D.: Eye-tracking: retracing visual perception in the everyday environ-
ments of people with Tourette syndrome (2015)
5. Bentivoglio, A.R., Bressman, S.B., Cassetta, E., Carretta, D., Tonali, P., Albanese,
A.: Analysis of blink rate patterns in normal subjects. Mov. disord. 12(6), 1028–
1034 (1997)
6. Bhowmik, S., Arjunan, S.P., Sarossy, M., Radcliffe, P., Kumar, D.K.: Pupillometric
recordings to detect glaucoma. Physiol. Meas. 42(4), 045003 (2021)
7. Bristol, S., et al.: Visual biases and attentional inflexibilities differentiate those
at elevated likelihood of autism: an eye-tracking study. Am. J. Occup. Ther.
74(4 Supplement 1), 7411505221p1–7411505221p1 (2020)
8. Clay, V., König, P., Koenig, S.: Eye tracking in virtual reality. J. Eye Mov. Res.
12(1), 3 (2019)
9. Cook, A.E., et al.: Lyin’ eyes: ocular-motor measures of reading reveal deception.
J. Exp. Psychol. Appl. 18(3), 301–313 (2012). https://doi.org/10.1037/a0028307
10. Dimitrova-Grekow, T., Klis, A., Igras-Cybulska, M.: Speech emotion recognition
based on voice fundamental frequency. Arch. Acoust. 44, 277–286 (2019)
11. Eckstein, M.K., Guerra-Carrillo, B., Singley, A.T.M., Bunge, S.A.: Beyond eye
gaze: what else can eyetracking reveal about cognition and cognitive development?
Dev. Cogn. Neurosci. 25, 69–91 (2017)
12. Ekman, P.: Basic emotions. Handb. Cogn. Emot. 98(45–60), 16 (1999)
13. de Gee, J.W., Knapen, T., Donner, T.H.: Decision-related pupil dilation reflects
upcoming choice and individual bias. Proc. Natl. Acad. Sci. 111(5), E618–E625
(2014)
14. Guo, J.J., Zhou, R., Zhao, L.M., Lu, B.L.: Multimodal emotion recognition from
eye image, eye movement and EEG using deep neural networks. In: 2019 41st
Annual International Conference of the IEEE Engineering in Medicine and Biology
Society (EMBC), pp. 3071–3074. IEEE (2019)
15. Korek, W.T., Mendez, A., Asad, H.U., Li, W.-C., Lone, M.: Understanding human
behaviour in flight operation using eye-tracking technology. In: Harris, D., Li, W.-
C. (eds.) HCII 2020, Part II. LNCS (LNAI), vol. 12187, pp. 304–320. Springer,
Cham (2020). https://doi.org/10.1007/978-3-030-49183-3_24
16. Kwon, S., et al.: MLT-DNet: speech emotion recognition using 1D dilated CNN
based on multi-learning trick approach. Expert Syst. Appl. 167, 114177 (2021)
17. Levenson, R.: The autonomic nervous system and emotion. Emot. Rev. 6, 100–112
(2014)
18. Lu, Y., Zheng, W.L., Li, B., Lu, B.L.: Combining eye movements and EEG to
enhance emotion recognition. In: Twenty-Fourth International Joint Conference
on Artificial Intelligence (2015)
19. Maffei, A., Angrilli, A.: Spontaneous blink rate as an index of attention and emo-
tion during film clips viewing. Physiol. Behav. 204, 256–263 (2019)
20. Maglogiannis, I., Vouyioukas, D., Aggelopoulos, C.: Face detection and recognition
of natural human emotion using Markov random fields. Pers. Ubiquit. Comput.
13(1), 95–101 (2009)
21. Martin, K.B.: Differences aren’t deficiencies: eye tracking reveals the strengths of
individuals with autism. Retr. Sept. 12, 2021 (2018)
22. Mathôt, S.: Pupillometry: psychology, physiology, and function. J. Cogn. 1(1), 16
(2018)
23. Mehendale, N.: Facial emotion recognition using convolutional neural networks
(FERC). SN Appl. Sci. 2(3), 1–8 (2020)
24. Miranda, A.M., Nunes-Pereira, E.J., Baskaran, K., Macedo, A.F.: Eye movements,
convergence distance and pupil-size when reading from smartphone, computer,
print and tablet. Scand. J. Optom. Vis. Sci. 11(1), 1–5 (2018)
25. Nalepa, G.J., Palma, J., Herrero, M.T.: Affective computing in ambient intelligence
systems. Future Gener. Comput. Syst. 92, 454–457 (2019). https://doi.org/10.
1016/j.future.2018.11.016
26. Ntalampiras, S.: Speech emotion recognition via learning analogies. Pattern Recog-
nit. Lett. 144, 21–26 (2021)
27. Perdiz, J., Pires, G., Nunes, U.J.: Emotional state detection based on EMG and
EOG biosignals: a short survey. In: 2017 IEEE 5th Portuguese Meeting on Bio-
engineering (ENBENG), pp. 1–4. IEEE (2017)
28. Przybylo, J., Kańtoch, E., Augustyniak, P.: Eyetracking-based assessment of affect-
related decay of human performance in visual tasks. Future Gener. Comput. Syst.
92, 504–515 (2019). https://doi.org/10.1016/j.future.2018.02.012, https://www.
sciencedirect.com/science/article/pii/S0167739X17312001
29. Robert, P.: The nature of emotions. Am. Sci. 89(4), 344–350 (2001)
30. Russell, J.A.: A circumplex model of affect. J. Personal. Soc. Psychol. 39(6), 1161
(1980)
31. Saisara, U., Boonbrahm, P., Chaiwiriya, A.: Strabismus screening by eye tracker
and games. In: 2017 14th International Joint Conference on Computer Science and
Software Engineering (JCSSE), pp. 1–5. IEEE (2017)
32. Scott, G.G., O’Donnell, P.J., Sereno, S.C.: Emotion words affect eye fixations dur-
ing reading. J. Exp. Psychol.: Learn. Mem. Cogn. 38(3), 783 (2012)
33. Shahbakhti, M., et al.: Simultaneous eye blink characterization and elimination
from low-channel prefrontal EEG signals enhances driver drowsiness detection.
IEEE J. Biomed. Health Inform. 26, 1001–1012 (2021). https://doi.org/10.1109/
JBHI.2021.3096984
34. Sipatchin, A., Wahl, S., Rifai, K.: Eye-tracking for low vision with virtual reality
(VR): testing status quo usability of the HTC Vive Pro Eye. bioRxiv (2020)
35. Sirois, S., Brisson, J.: Pupillometry. Wiley Interdiscip. Rev.: Cogn. Sci. 5(6), 679–
692 (2014)
36. Soleymani, M., Pantic, M., Pun, T.: Multimodal emotion recognition in response
to videos. IEEE Trans. Affect. Comput. 3(2), 211–223 (2011)
37. Stein, N., et al.: A comparison of eye tracking latencies among several commercial
head-mounted displays. i-Perception 12(1), 2041669520983338 (2021)
38. Tie, Y., Guan, L.: A deformable 3-D facial expression model for dynamic human
emotional state recognition. IEEE Trans. Circuits Syst. Video Technol. 23(1), 142–
157 (2012)
39. Tversky, A., Kahneman, D.: Judgment under uncertainty: heuristics and biases.
Sci., New Ser. 185(4157), 1124–1131 (1974)
40. Zekveld, A.A., Kramer, S.E.: Cognitive processing load across a wide range of
listening conditions: insights from pupillometry. Psychophysiology 51(3), 277–284
(2014)
41. Zheng, W.L., Dong, B.N., Lu, B.L.: Multimodal emotion recognition using EEG
and eye tracking data. In: 2014 36th Annual International Conference of the IEEE
Engineering in Medicine and Biology Society, pp. 5040–5043. IEEE (2014)
42. Zheng, W.L., Liu, W., Lu, Y., Lu, B.L., Cichocki, A.: Emotionmeter: a multimodal
framework for recognizing human emotions. IEEE Trans. Cybern. 49(3), 1110–1122
(2018)
43. Zhou, J., Wei, X., Cheng, C., Yang, Q., Li, Q.: Multimodal emotion recognition
method based on convolutional auto-encoder. Int. J. Comput. Intell. Syst. 12(1),
351–358 (2019)
Sleep Quality in Population Studies –
Relationship of BMI and Sleep Quality
in Men
1 Introduction
Sleep is a physiological state present in humans as well as in most other species.
To maintain optimal health, both the quality and quantity of sleep should be adequate [2]. The
stages of sleep), number, and timing of awakenings during sleep are observed [1].
Electrooculography (EOG) is used to register the horizontal and vertical movements
of the eyes by placing electrodes on the outer corners of the eyes (one
above the right and one below the left). EOG results are complementary to EEG
in assessing the REM stage of sleep during polysomnography. Additionally, an
electromyography (EMG) sensor on the chin makes it possible to detect the muscle
atonia characteristic of the REM sleep stage. Other EMG sensors can be placed
on the anterior muscles of the tibias to record lower limb movements during
sleep. Moreover, a polysomnography study can include: electrocardiography
(ECG – to monitor heart activity), airflow sensors (to measure episodes of
apnea), pulse oximetry (to assess the blood oxygen level), chest and abdomen
movement sensors (to record respiratory muscle function), a microphone (to record
snoring), and body position sensors (to assess the position of the patient's body
during the night) [1].
Access to polysomnography is becoming more common as sleep disorders grow
more frequent, especially with the growing number of people exposed
to sleep disturbance risk factors. Such risk factors include, among others, depression
and anxiety, exposure to chronic stress, abuse of alcohol, coffee, or cigarettes,
and obesity [9,12,17]. A high Body Mass Index (BMI – a value derived from the
weight and height of a person) indicates obesity and may be correlated with poor
sleep quality [3,5,14]. The aim of this study was to analyze the sleep parameters of
adult men in relation to the value of their Body Mass Index.
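For reference, BMI is body mass divided by the square of height; a one-line Python helper illustrates the definition (the example values are arbitrary):

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index in kg/m^2: weight divided by the square of height."""
    return weight_kg / height_m ** 2

print(bmi(95.0, 1.78))  # ~30.0, the conventional obesity threshold
```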
The materials for this study were obtained from the archives of a hospital
polysomnography laboratory in Opolskie Province, Poland. Sleep analyses were
conducted using Nox A1 PSG System (Nox Medical, Reykjavik, Iceland) equipment.
The data were polysomnographic records of 94 men aged 38–60, examined in 2015
and 2016 on their own initiative due to sleep disorders. Consumption of caffeine,
alcohol, or sleeping pills before the examination was a criterion for excluding
individuals from the group. Additional criteria for excluding participants from
the analysis were a sleep duration shorter than 90 min (one sleep cycle) and a
body mass index (BMI, [kg/m2]) greater than 50. In total, 75 men met the
criteria for inclusion in the study.
The characteristics of age and BMI of the examined group are presented in
Table 1.
study and sleep efficiency correlated negatively with body mass index values
(Fig. 1A). The total time spent sleepless during the study and the time spent
awake after falling asleep and before final awakening correlated positively with
the BMI of participants. The ratio of time spent in stage 2 of sleep to time
spent in bed correlated negatively, while the ratios of time spent in stage 4 of
sleep to time in bed and of time spent in stage 3 of sleep to total sleep time
correlated positively with the body mass index of the men. The duration of
obstructive sleep apnea and of apnea and hypopnea in NREM sleep correlated
positively with BMI. Events per hour of NREM sleep regarding sleep apnea,
apnea/hypopnea, and oxygen desaturation, as well as the isolated limb movement
and snoring indices, correlated positively with body mass index (Fig. 1B). The
number of oxygen desaturations of 90% to 94% and the oxygen desaturation
nadir in NREM sleep correlated negatively with BMI (Fig. 1C).
Table 2. Pearson correlation coefficient of sleep parameters and body mass index
(BMI) of study participants (N = 75)
Variable r p
Total sleep time (TST) [min] −0.28 0.017
Total sleepless time [min] 0.27 0.018
Wake time After Sleep Onset (WASO) [min] 0.29 0.013
Sleep efficiency (TST to TIB ratio) [%] −0.28 0.013
Stage 2 of sleep to TIB ratio [%] −0.26 0.026
Stage 4 of sleep to TIB ratio [%] 0.27 0.021
Stage 3 of sleep to TST ratio [%] 0.26 0.024
NREM obstructive sleep apnea [min] 0.27 0.019
Apnea and hypopnea in NREM [min] 0.23 0.043
NREM sleep apnea index (events per hour) 0.34 0.003
NREM apnea/hypopnea index 0.34 0.003
NREM oxygen desaturation 90%–94% −0.31 0.007
NREM oxygen desaturation nadir −0.50 0.001
NREM oxygen desaturation index 0.35 0.002
Isolated limb movement index 0.39 0.001
Snoring index in total 0.24 0.042
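The correlations in Table 2 are standard Pearson coefficients; a minimal sketch of how one such r and p pair can be computed with SciPy is shown below, using synthetic stand-in data rather than the actual study records:

```python
# Sketch only: synthetic data standing in for the N = 75 study records.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
bmi = rng.normal(30.0, 5.0, size=75)                       # body mass index [kg/m2]
tst = 400.0 - 3.0 * (bmi - 30.0) + rng.normal(0, 40, 75)   # total sleep time [min]

r, p = pearsonr(bmi, tst)
print(f"r = {r:.2f}, p = {p:.3f}")  # a negative r, as in the TST row of Table 2
```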
The study adds to the literature a sample of adult men whose sleep parameters
were assessed by polysomnography. The BMI values of 90% of the men (those
above the 10th percentile) indicate that their body weight was above normal.
The study results are consistent with those of Moraes et al. [8]. In both studies,
total sleep time was significantly shorter and sleep efficiency lower the higher
the patient's BMI, and WASO values correlated positively with BMI. A
significant positive correlation of BMI with the occurrence
Fig. 1. Correlation of body mass index (BMI) and: (A) sleep efficiency, (B) NREM
sleep apnea index, (C) NREM oxygen desaturation Nadir of study participants (N =
75)
4 Conclusion
In 90% of cases, the men in the study sample had a BMI that indicated
overweight or obesity. The higher the BMI, the shorter and less efficient the
sleep. Episodes of apnea and hypopnea were more numerous the higher the
men's BMI. These sleep breathing disorders occurred significantly more often
in the NREM phases of sleep than in REM. Moreover, the numbers of limb
movements and snoring events were higher the higher the BMI. These tendencies
often result from the accumulation of fat in the neck, which narrows the throat –
it predisposes the throat to collapse and causes apnea, hypopnea, or snoring and,
as a result, waking from sleep.
References
1. Avidan, A.Y., Zee, P.C., Smalling, T.R.: Handbook of Sleep Medicine. Lippincott
Williams & Wilkins (LWW), Philadelphia (2007)
2. Brand, S., Kirov, R.: Sleep and its importance in adolescence and in common ado-
lescent somatic and psychiatric conditions. Int. J. Gener. Med. 4, 425–442 (2011).
https://doi.org/10.2147/IJGM.S11557
3. Carno, M.A., et al.: Symptoms of sleep apnea and polysomnography as predictors
of poor quality of life in overweight children and adolescents. J. Pediatr. Psychol.
33(3), 269–278 (2007). https://doi.org/10.1093/jpepsy/jsm127
4. Dattilo, M., et al.: Effects of sleep deprivation on the acute skeletal muscle recovery
after exercise. Med. Sci. Sports Exerc. 52, 507–514 (2020). https://doi.org/10.
1249/MSS.0000000000002137
5. Dixon, J., Schachter, L., O’Brien, P.: Polysomnography before and after weight
loss in obese patients with severe sleep apnea. Int. J. Obes. 29, 1048–1054 (2005).
https://doi.org/10.1038/sj.ijo.0802960
6. Krishnan, V., Collop, N.A.: Gender differences in sleep disorders. Curr. Opin.
Pulm. Med. 12, 383–389 (2006). https://doi.org/10.1097/01.mcp.0000245705.
69440.6a
7. Landry, G.J., Best, J.R., Liu-Ambrose, T.: Measuring sleep quality in older adults:
a comparison using subjective and objective methods. Front. Aging Neurosci. 7,
166–166 (2015). https://doi.org/10.3389/fnagi.2015.00166
8. Moraes, W., et al.: Association between body mass index and sleep duration
assessed by objective methods in a representative sample of the adult popula-
tion. Sleep Med. 14(4), 312–318 (2013). https://doi.org/10.1016/j.sleep.2012.11.
010
9. Ohayon, M., Li, K., Guilleminault, C.: Risk factors for sleep bruxism in the general
population. Chest 119, 53–61 (2001). https://doi.org/10.1378/chest.119.1.53
10. Peng, Z., Dai, C., Ba, Y., Zhang, L., Shao, Y., Tian, J.: Effect of sleep depriva-
tion on the working memory-related n2–p3 components of the event-related poten-
tial waveform. Front. Neurosci. 14, 469–469 (2020). https://doi.org/10.3389/fnins.
2020.00469
11. Reite, M., Weissberg, M., Ruddy, J.: Clinical Manual for Evaluation and Treatment
of Sleep Disorders. American Psychiatric Pub. (2009)
12. Riemann, D., Berger, M., Voderholzer, U.: Sleep and depression - results from
psychobiological studies: an overview. Biol. Psychol. 57(1), 67–103 (2001). https://
doi.org/10.1016/S0301-0511(01)00090-4
13. Sateia, M.J.: International classification of sleep disorders-third edition. Chest
146(5), 1387–1394 (2014). https://doi.org/10.1378/chest.14-0970
14. Spruyt, K., Molfese, D., Gozal, D.: Sleep duration, sleep regularity, body weight,
and metabolic homeostasis in school-aged children. Pediatrics 127, e345–e352 (2011).
https://doi.org/10.1542/peds.2010-0497
15. Tan, H.L., Gozal, D., Ramirez, H., Bandla, H., Kheirandish-Gozal, L.: Overnight
polysomnography versus respiratory polygraphy in the diagnosis of pediatric
obstructive sleep apnea. Sleep 37, 255–260 (2014). https://doi.org/10.5665/sleep.
3392
16. Watson, N.F., et al.: Joint consensus statement of the American academy of sleep
medicine and sleep research society on the recommended amount of sleep for a
healthy adult: methodology and discussion. J. Clin. Sleep Med. 11(8), 931–952
(2015). https://doi.org/10.5664/jcsm.4950
17. Witek, A., Lipowicz, A.: The impact of cigarette smoking on the quality of sleep
in polish men. Anthropol. Rev. 84(4), 369–382 (2021)
18. World Health Organization: WHO technical meeting on sleep and health: Bonn
Germany (2004)
Exploring Developmental Factors Related
to Auditory Processing Disorder
in Children
Michal Krecichwost1(B), Natalia Moćko2, Magdalena Ławecka3,
Zuzanna Miodońska1, Agata Sage1, and Pawel Badura1
1 Faculty of Biomedical Engineering, Silesian University of Technology,
ul. Roosevelta 40, 41-800 Zabrze, Poland
{michal.krecichwost,zuzanna.miodonska,agata.sage,pawel.badura}@polsl.pl
2 Faculty of Humanities, Institute of Linguistics, University of Silesia,
ul. Sejmu Śląskiego 1, 40-001 Katowice, Poland
natalia.mocko@us.edu.pl
3 Silesian Center for Hearing and Speech Medincus,
Nasypowa 18, 40-551 Katowice, Poland
1 Introduction
in a child. The symptoms of APD vary depending on what causes the disorder.
School difficulties arising from damage during pregnancy may differ from those
caused by auditory deprivation in adolescence.
In this paper, we present a preliminary study on the developmental factors
that correlate with the presence of APD. We examined a group of 60 children
and performed a qualitative analysis of gathered material to evaluate the role
of selected factors (pregnancy, perinatal, and early childhood) in the APD risk
assessment. State of the art suggests the importance of considering multiple
elements in the context of difficulties in sound processing that affect different
age groups.
So far, diagnosis is triggered mainly by noticing increased learning problems.
Specific tests, including tests of higher auditory functions, are ordered and
assessed by an audiologist, who confirms the presence of APD in children.
However, this process frequently takes over a year. Our work constitutes
a preliminary stage for developing the examination protocol and implementing
computer-aided approaches to speed up the diagnostic process and make it more
accessible.
The remainder of the paper is structured as follows: Sect. 2 describes the
process of constructing the database, including areas of information collected
by the interviews. In Sect. 3, we present the outcomes of qualitative analysis of
gathered material. Section 4 discusses the results and indicates the ideas for the
development, while Sect. 5 concludes this paper.
Fig. 1. Distribution of age in investigated group. The median is marked with a red
bar. The whiskers present minimum and maximum values
3 Results
loss) during the period of speech acquisition, 3rd tonsil/polyps (pharyngeal adenoid),
high exposure to processed signals (increased presence of digital sounds in
the child's acoustic environment), abnormal muscle tone/neurological disorders,
upper respiratory tract infections/obstruction of the Eustachian tubes (nasal
infections), and allergy/asthma (Fig. 2(b)).
The respondents' answers suggest that the most frequent difficulties resulting
from these etiologies include: the inability to properly master reading and writing
skills, a low concentration level, problems with understanding speech in noise,
articulation disorders (persistent speech impediments), and a decreased auditory
memory level. Auditory hypersensitivity occurs least frequently (Fig. 3). It is also
not an isolated difficulty; it mostly coexists with problems related to understanding
in noise (Fig. 3(b)).
Figure 5 shows the most frequently related etiological factors that the parents
mentioned. We observed that abnormal muscle tone (AbTo) was most common
in those children whose parents reported allergy or asthma (AlAs) as well as
frequent nasal infections (NaIn). An association was also observed in the case of
abnormal muscle tone (AbTo) and high exposure to processed signals (HiEx).
In addition, high exposure to processed signals can be linked to allergies and
asthma (AlAs).
We divided the etiological factors into three categories according to the time of
occurrence and the influence of the environment on the factor's presence (Table 1):
(Group 1) prenatal and perinatal, (Group 2) health problems during the speech
acquisition period, and (Group 3) environment-specific factors present during
the speech acquisition period.
Table 1. Etiological factors included in each group: (Group 1) prenatal and peri-
natal, (Group 2) health problems during speech acquisition period, and (Group 3)
environment-specific factors present during speech acquisition period
Groups Etiologies
Group 1 Burdened pregnancy
Burdened childbirth
Abnormal muscle tone/neurological disorders
Group 2 Ear inflammation/hearing disorders during the period of speech learning
3rd tonsil/polyps
Upper respiratory tract infections/obstruction of the Eustachian tubes
Allergy/asthma
Group 3 High exposure to processed signals
Group 1 included both prenatal and perinatal factors. However, the issues
related to complications in pregnancy constituted a smaller group than the
perinatal factors (Fig. 6). Groups 2 and 3 involved factors occurring in the period
of intense speech acquisition (2.5–6 years). These factors relate to the persistent
disturbance of the proper reception of sounds caused by co-occurring diseases and
disorders (Group 2) in a specific age range. The last category of factors (Group 3)
included environmental deprivation issues (high exposure to processed signals).
The groups often co-occur, most commonly Group 1 with Group 2. Group 3 only
co-occurs with other groups, mostly with Group 2.
4 Discussion
tration issues are effortlessly observable. The remaining issues are often noticed
by teachers and are related to functioning in class.
Reception and decoding of sounds in an unfavorable acoustic environment
are mainly related to the development of cognitive [4] and linguistic functions
of a person [9]. The process of reproducing incomplete acoustic information
is possible due to phonological abilities. MacCutcheon [9] claims that noise
imposes requirements on cognitive speech processing in terms of working memory
resources that are necessary to help match incoming phonological information
with phonological representations stored in long-term memory. The maturation
of the auditory system is associated with the gradual mastery of auditory skills,
especially in the aspect of decoding speech sounds. The school skills that require
efficient higher auditory function operations should be mastered in the first years
of school education. This impacts the initiation of an audiological diagnosis after
the age of 7. Such solutions at the educational level are inconsistent with one of
the significant education assumptions – preventive actions to avoid difficulties at
5 Conclusion
We performed interviews with parents of 60 children with auditory processing
difficulties. Based on them, we separated a set of etiological factors that cover
the background of APD and are likely to be relevant for computer-aided diag-
nosis of this disorder in the future. The employment of etiology-based factors
may improve the diagnostic process by considering the number and duration
of factors that translate into incorrect mastery of school skills. Through this
research, we emphasize the relevance of the constant development of diagnostic
tools dedicated to APD.
The next stage of the work will be the development of a remote computer-aided
diagnosis (CAD) tool supporting the diagnosis and therapy of APD using artificial
intelligence methods. The proposed approach will enable a detailed examination
of disorders of higher auditory functions (manifested in the form of school
difficulties) based on a remote interview with the parent and teacher of the
examined child. The data obtained as part of the interview will constitute a
knowledge base for a fuzzy expert system, whose task will be to determine the
profile of the child's disorders (in conjunction with the etiology of their
occurrence). In addition, it will help the therapist indicate areas that require
special attention in therapy.
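As an illustration only, a fuzzy inference step of the kind envisaged here could be sketched with the scikit-fuzzy control API; the variables, universes, and rules below are hypothetical and do not come from the study:

```python
# Hypothetical sketch of a fuzzy expert system step (scikit-fuzzy API assumed).
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

# Illustrative inputs: count of etiological factors and their duration in months.
n_factors = ctrl.Antecedent(np.arange(0, 9), 'n_factors')
duration = ctrl.Antecedent(np.arange(0, 73), 'duration')
risk = ctrl.Consequent(np.arange(0, 101), 'risk')   # APD risk score, 0-100

n_factors.automf(3)   # auto labels: 'poor' (low), 'average', 'good' (high)
duration.automf(3)
risk['low'] = fuzz.trimf(risk.universe, [0, 0, 50])
risk['high'] = fuzz.trimf(risk.universe, [50, 100, 100])

rules = [
    ctrl.Rule(n_factors['good'] | duration['good'], risk['high']),
    ctrl.Rule(n_factors['poor'] & duration['poor'], risk['low']),
]
sim = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))
sim.input['n_factors'] = 4
sim.input['duration'] = 24
sim.compute()
print(sim.output['risk'])
```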
References
1. Agrawal, D., Dritsakis, G., Mahon, M., Mountjoy, A., Bamiou, D.E.: Experiences of
patients with auditory processing disorder in getting support in health, education,
and work settings: findings from an online survey. Front. Neurol. 12, 167 (2021).
https://doi.org/10.3389/fneur.2021.607907
2. Barry, J., Tomlin, D., Moore, D., Dillon, H.: Use of questionnaire-based measures
in the assessment of listening difficulties in school-aged children. Ear Hear. 36
(2015). https://doi.org/10.1097/AUD.0000000000000180
3. Bellis, T.J., Ferre, J.M.: Multidimensional approach to the differential diagnosis
of central auditory processing disorders in children. J. Am. Acad. Audiol. 10(6),
319–28 (1999)
4. Bradley, J.S., Sato, H.: The intelligibility of speech in elementary school classrooms.
J. Acoust. Soc. Am. 123(4), 2078–2086 (2008)
5. Cunha, P., Silva, I.M.d.C., Rabelo, N.E., Tristão, R.M.: Auditory processing dis-
order evaluations and cognitive profiles of children with specific learning disorder.
Clin. Neurophysiol. Pract. 4, 119–127 (2019). https://doi.org/10.1016/j.cnp.2019.
05.001
6. Dillon, H., Cameron, S., Glyde, H., Wilson, W., Tomlin, D.: An opinion on the
assessment of people who may have an auditory processing disorder. J. Am. Acad.
Audiol. 23, 97–105 (2012). https://doi.org/10.3766/jaaa.23.2.4
7. Iliadou, V.M., Bamiou, D.E.: Psychometric evaluation of children with auditory
processing disorder (APD): comparison with normal-hearing and clinical non-APD
groups. J. Speech Lang. Hear. Res. JSLHR 55, 791–9 (2012). https://doi.org/10.
1044/1092-4388(2011/11-0035)
8. Iliadou, V.M., Chermak, G., Bamiou, D.E., Musiek, F.: Gold standard, evidence
based approach to diagnosing APD. Hear. J. 72, 42–46 (2019). https://doi.org/10.
1097/01.HJ.0000553582.69724.78
9. Maccutcheon, D., Füllgrabe, C., Eccles, R., van der Linde, J., Panebianco, C.,
Ljung, R.: Investigating the effect of one year of learning to play a musical instru-
ment on speech-in-noise perception and phonological short-term memory in 5-to-7-
year-old children. Front. Psychol. 10 (2020). https://doi.org/10.3389/fpsyg.2019.
02865
10. Martins, J.H., Alves, M., Andrade, S., Falé, I., Teixeira, A.: Auditory processing
disorder test battery in European Portuguese-development and normative data for
pediatric population. Audiol. Res. 11(3), 474–490 (2021). https://doi.org/10.3390/
audiolres11030044
Morphological Language Features
of Anorexia Patients Based on Natural
Language Processing
1 Introduction
Recent years have brought a sharp increase in the number of people suffering from
various eating disorders, with anorexia nervosa in first place. Researchers
estimate that eating disorders affect approximately 1% of the total population,
and the number is still growing. Unfortunately, the coronavirus (Covid-19)
pandemic has worsened the statistics. Many people, fearing for their health,
avoid contact with others, including medical staff. Online education, remote work,
and lockdowns are factors that impact mental condition, especially among
adolescents. The young have been affected severely by the pandemic, as remote
time, shorten the diagnostic process [18–21]. The aim of this article is to compare
the morphological features of language between people suffering from anorexia
and healthy people.
Within the developed criteria, 41 girls with anorexia (restrictive form) were
included in the research group. The participants were aged 12–19, with an average
of 15.7 ± 2 years. Anorexia was diagnosed according to the ICD-10 and DSM-IV
criteria. The average weight of the girls was 35.1 ± 4.7 kg; BMI ranged from 11.3
to 20.2, with an average of 15.1 ± 2.8 (p < 0.001 vs. the control group), and BMI
SDS from −4.2 to 0.9, with an average of −2.72 ± 1.49 (p < 0.001 vs. the control group).
The control group consisted of 55 healthy girls aged 12–20 (average 15.1 ± 1.9);
their average weight was 57.1 ± 10.1 kg, BMI ranged from 16.5 to 25.8 (average
21.5 ± 3.4), and BMI SDS from −2.7 to 3.6 (average 0.19 ± 1.44).
– Text parsing, including the term dropping and keeping process (start/stop
lists), stemming to the basic form, and part-of-speech tagging.
– Calculating the total number of adjectives and categorizing them into negative
and positive adjectives, and determining the verb tense (present or past).
Text Parsing
The parsing node enables us to explore the text to find specific terms in our
documents. It includes analyzing the sentence structure and representing it
according to a syntactic formalism, such as constituency (or phrase-structure)
and dependency [30]. This step also includes text cleaning, i.e., removing language
errors and removing stop words (prepositions or conjunctions often provide minimal
value and are dropped during text parsing). The next process is stemming,
As mentioned previously, the analysis was carried out for selected parts of
speech. The choice of the presented parts of speech was made after extensive
consultations with a psychologist. Having acquired basic knowledge of anorexia,
we decided to establish some grammatical patterns of patients with anorexia. At
the beginning of the research, it seemed necessary to evaluate the language
of anorexia patients in terms of morphology. First, we calculated the number of
occurrences of the personal pronoun 'I' and the possessive pronoun 'my'. Next,
the focus was on verbs and their tenses, present and past. The last point of the
analysis concerned the number of adjectives with a negative or positive tone.
The number of particular parts of speech for both groups is presented in the
form of a chart. Next, we compared the total counts of all parts of speech for
the individual groups and calculated the number of particular parts of speech.
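An illustrative sketch of such morphological counting, using NLTK and Penn Treebank tags on English text, is shown below; the study itself analysed Polish notes, so this is only a simplified analogue.

```python
# Simplified analogue of the morphological counts, using NLTK on English text.
import nltk
# First run: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

def morph_counts(text):
    tags = nltk.pos_tag(nltk.word_tokenize(text))
    return {
        'pronoun_I': sum(1 for w, _ in tags if w == 'I'),
        'pronoun_my': sum(1 for w, _ in tags if w.lower() == 'my'),
        'adjectives': sum(1 for _, t in tags if t.startswith('JJ')),
        'present_verbs': sum(1 for _, t in tags if t in ('VBP', 'VBZ', 'VBG')),
        'past_verbs': sum(1 for _, t in tags if t in ('VBD', 'VBN')),
    }

print(morph_counts("I was sad about my body, and I am still trying."))
```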
4 Results
Fig. 1. The mean number of parts of speech per note in research and control groups
Fig. 2. The number of positive and negative adjectives used in research and control
groups
Fig. 2. The females in the control group use this part of speech with similar
frequency, but the occurrence of adjectives with a positive tone (Mean: 2.8 ± 1.6)
is higher than that of negative ones (Mean: 1.8 ± 1.4).
Fig. 3. The number of verbs in present and past form in research and control groups
Figure 3 presents the number of verbs in present and past forms in the research
and control group notes. Patients in the research group used significantly more
verbs than those in the control one (present tense Mean (RG): 5.1 ± 3.4 vs. Mean
(CG): 1.9 ± 1.8). The analysis revealed that anorexia patients mostly used the
present and past tenses, and the verb used with the highest frequency was 'to
be'. In the control group, we did not observe verbs in the past tense.
References
1. Springall, G., Cheung, M., Sawyer, S.M., Yeo, M.: Impact of the coronavirus pan-
demic on anorexia nervosa and atypical anorexia nervosa presentations to an Aus-
tralian tertiary paediatric hospital. J. Paediatr. Child Health (2021). https://doi.
org/10.1111/JPC.15755
2. Gigantesco, A., Masocco, M., Picardi, A., Lega, I., Conti, S., Vichi, M.: Hospital-
ization for anorexia nervosa in Italy. Riv. Psichiatr. 45, 154–162 (2010)
3. Miniati, M., et al.: Eating disorders spectrum during the COVID pandemic: a sys-
tematic review. Front. Psychol. 12, 4161 (2021). https://doi.org/10.3389/FPSYG.
2021.663376
4. Vuillier, L., May, L., Greville-Harris, M., Surman, R., Moseley, R.L.: The impact of
the COVID-19 pandemic on individuals with eating disorders: the role of emotion
regulation and exploration of online treatment experiences. J. Eat. Disord. 9, 1–18
(2021). https://doi.org/10.1186/S40337-020-00362-9
5. Rodgers, R.F., et al.: The impact of the COVID-19 pandemic on eating disorder
risk and symptoms. Int. J. Eat. Disord. 53, 1166–1170 (2020). https://doi.org/10.
1002/eat.23318
6. Surgenor, L.J., Maguire, S.: Assessment of anorexia nervosa: an overview of uni-
versal issues and contextual challenges. J. Eat. Disord. 1, 1–12 (2013). https://doi.
org/10.1186/2050-2974-1-29
7. Damiano, S.R., Atkins, L., Reece, J.: The psychological profile of adolescents with
anorexia and implications for treatment. J. Eat. Disord. 2, 1–1 (2014). https://
doi.org/10.1186/2050-2974-2-S1-P9
8. Anorexia Nervosa: Symptoms, Causes, and Treatments. https://www.healthline.
com/health/anorexia-nervosa. Accessed 23 Jan 2022
9. Smink, F.R.E., Van Hoeken, D., Hoek, H.W.: Epidemiology of eating disorders:
incidence, prevalence and mortality rates. Curr. Psychiatry Rep. 14, 406–414
(2012). https://doi.org/10.1007/S11920-012-0282-Y
10. Wu, J., Liu, J., Li, S., Ma, H., Wang, Y.: Trends in the prevalence and disability-
adjusted life years of eating disorders from 1990 to 2017: results from the Global
Burden of Disease Study 2017. Epidemiol. Psychiatr. Sci. 29 (2020). https://doi.
org/10.1017/S2045796020001055
11. Kotwas, A., Karakiewicz-Krawczyk, K., Zabielska, P., Jurczak, A., Bażydlo, M.,
Karakiewicz, B.: The incidence of eating disorders among upper secondary school
female students. Psychiatr. Pol. 54, 253–263 (2020). https://doi.org/10.12740/PP/
ONLINEFIRST/99164
12. Quick, V.M., Byrd-Bredbenner, C., Neumark-Sztainer, D.: Chronic illness and dis-
ordered eating: a discussion of the literature. Adv. Nutr. 4, 277 (2013). https://
doi.org/10.3945/AN.112.003608
13. Stice, E., Nathan Marti, C., Rohde, P.: Prevalence, incidence, impairment, and
course of the proposed DSM-5 eating disorder diagnoses in an 8-year prospective
community study of young women. J. Abnorm. Psychol. 122, 445 (2013). https://
doi.org/10.1037/A0030679
14. Abebe, D.S., Lien, L., Von Soest, T.: The development of bulimic symptoms from
adolescence to young adulthood in females and males: a population-based longi-
tudinal cohort study. Int. J. Eat. Disord. 45, 737–745 (2012). https://doi.org/10.
1002/EAT.20950
15. Guillaume, S., et al.: Characteristics of suicide attempts in anorexia and bulimia
nervosa: a case-control study. PLoS ONE 6, e23578 (2011). https://doi.org/10.1371/
JOURNAL.PONE.0023578
16. Abbate-Daga, G., Amianto, F., Delsedime, N., De-Bacco, C., Fassino, S.: Resis-
tance to treatment in eating disorders: a critical challenge. BMC Psychiatry 13,
1–18 (2013). https://doi.org/10.1186/1471-244X-13-294
17. Robertson, A., Thornton, C.: Challenging rigidity in Anorexia (treatment, training
and supervision): questioning manual adherence in the face of complexity. J. Eat.
Disord. 9, 1–8 (2021). https://doi.org/10.1186/S40337-021-00460-2
18. Bellows, B.K., et al.: Automated identification of patients with a diagnosis of
binge eating disorder from narrative electronic health records. J. Am. Med. Inform.
Assoc. 21, e163 (2014). https://doi.org/10.1136/AMIAJNL-2013-001859
19. Funk, B., et al.: A framework for applying natural language processing in digital
health interventions. J. Med. Internet Res. 22 (2020). https://doi.org/10.2196/
13855
20. Spinczyk, D., Bas, M., Dzieciako, M., Maćkowski, M., Rojewska, K., Maćkowska,
S.: Computer-aided therapeutic diagnosis for anorexia. Biomed. Eng. Online 19
(2020). https://doi.org/10.1186/S12938-020-00798-9
21. Barańska, K., Różańska, A., Maćkowska, S., Rojewska, K., Spinczyk, D.: Determin-
ing the intensity of basic emotions among people suffering from anorexia nervosa
based on free statements about their body. Electronics 11, 138 (2022). https://
doi.org/10.3390/electronics11010138
22. Iliev, R., Dehghani, M., Sagi, E.: Automated text analysis in psychology: methods,
applications, and future developments. Lang. Cogn. 7, 265–290 (2015). https://
doi.org/10.1017/LANGCOG.2014.30
23. Calvo, R.A., Milne, D.N., Hussain, M.S., Christensen, H.: Natural language pro-
cessing in mental health applications using non-clinical texts. Nat. Lang. Eng. 23,
649–685 (2017). https://doi.org/10.1017/S1351324916000383
24. Rezaii, N., Walker, E., Wolff, P.: A machine learning approach to predicting psy-
chosis using semantic density and latent content analysis. npj Schizophr. 5, 1–12
(2019). https://doi.org/10.1038/s41537-019-0077-9
25. Van Puyvelde, M., Neyt, X., McGlone, F., Pattyn, N.: Voice stress analysis: a
new framework for voice and effort in human performance. Front. Psychol. 9, 1994
(2018). https://doi.org/10.3389/FPSYG.2018.01994
26. Rocco, D., Pastore, M., Gennaro, A., Salvatore, S., Cozzolino, M., Scorza, M.:
Beyond verbal behavior: an empirical analysis of speech rates in psychotherapy ses-
sions. Front. Psychol. 9, 978 (2018). https://doi.org/10.3389/FPSYG.2018.00978
27. Cuteri, V., et al.: Linguistic Feature of Anorexia Nervosa: A Prospective Case-
Control Pilot Study (2021). https://doi.org/10.21203/RS.3.RS-186615/V1
28. Minori, G., et al.: Linguistic markers of anorexia nervosa: preliminary data from
a prospective observational study. In: 3rd RaPID Workshop: Resources and Pro-
cessing of Linguistic, Para-linguistic and Extra-linguistic Data from People with
Various Forms of Cognitive/Psychiatric/Developmental Impairments, pp. 34–37
(2020)
29. Spinczyk, D., Nabrdalik, K., Rojewska, K.: Computer aided sentiment analysis
of anorexia nervosa patients’ vocabulary. Biomed. Eng. Online 17, 1–11 (2018).
https://doi.org/10.1186/S12938-018-0451-2
30. Pyysalo, S.: Text parsing. In: Dubitzky, W., Wolkenhauer, O., Cho, KH., Yokota,
H. (eds.) Encyclopedia of Systems Biology, pp. 2162–2163. Springer, New York
(2013). https://doi.org/10.1007/978-1-4419-9863-7_182
Image Analysis
Comparison of Analytical and Iterative
Algorithms for Reconstruction
of Microtomographic Phantom Images
and Rat Mandibular Scans
1 Introduction
Computed tomography (CT) is widely used in various areas of science and still
strongly developing. Nowadays, tomography is used not only in medical areas
such as medical imaging, stomatology or oncology, but also in materials science,
geology, archaeology or life sciences.
2 Methodology
2.1 Microtomography (µCT) Scanning
A group of 150 rats was treated with implants made of different materials (pure
titanium, bio-glass, a composite of titanium and bio-glass, and polylactide (PLA)).
The implants were placed in the mandibular angle area. Then, after 20, 50, and
100 days, mandibular material was harvested from each material group. Only
the portion of the mandible that contained the implant was scanned on the
µCT scanner. Samples were scanned using a SkyScan 1172 desktop microCT
scanner (SkyScan, Kontich, Belgium). All samples were scanned under the same
parameters. The X-ray source was operated at 80 kV/125 µA, and a 0.5 mm Al
filter was used. The exposure rate of the matrix itself was kept at 45%. The image
resolution was 7 µm and the rotation step was 0.4°. The images were saved as
16-bit TIFF files. The resolution of the matrix was 2000 × 1332 px. We
have all the required documents and permits from the bioethics committee to
perform procedures on laboratory animals.
2000 × 1332 × 901 px. In order to speed up the reconstruction process for the
microtomography data, only the central 100 cross-sections were reconstructed.
Phantoms and a scan of the rat mandible were reconstructed using FDK algo-
rithms with filtering (Ram-Lak, Shepp-Logan, Cosine, Hamming, Hann), algo-
rithms known as the algebraic reconstruction technique family (ART-family),
Conjugate Gradient Least Squares (CGLS), and the total variation regular-
ization algorithms family (ASD-POCS, OS-ASD-POCS, OS-AwASD-POCS).
Phantoms and reconstructed images were analysed at 32-bit depth.
FDK is an algorithm which extracts data linearly by treating the projection
as a matrix with n rows and m columns of pixels [6]. It preserves some accuracy
in the z-direction for low-complexity objects in the plane of the beam centre
trajectory, preserving integrals in the longitudinal and oblique directions [8]. The
FDK algorithm for CBCT is described as follows:
\hat{f}(x, y, z) = \frac{1}{2} \int_0^{2\pi} \frac{1}{U^2} \int_{-\infty}^{\infty} p(\theta, u', v) \, \frac{D}{\sqrt{D^2 + u'^2 + v^2}} \, h(u - u') \, du' \, d\theta \qquad (1)
where fˆ is the approximate reconstruction result and h() is the ramp filter.
Phantom and sample data were reconstructed using five different filters. Each
filter was defined in the frequency domain:
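The filter definitions themselves follow the cited references; as a rough, assumed (textbook-style, unnormalised) sketch, the common variants can be generated in NumPy as:

```python
# Sketch of common FDK filter variants in the frequency domain (definitions
# are textbook-style approximations; exact scaling varies between toolboxes).
import numpy as np

def fdk_filter(n, kind='ram_lak'):
    f = np.fft.fftfreq(n)            # normalised frequency axis
    H = np.abs(f)                    # Ram-Lak: the pure ramp |f|
    if kind == 'shepp_logan':
        H *= np.sinc(f)              # ramp damped by a sinc window
    elif kind == 'cosine':
        H *= np.cos(np.pi * f)       # ramp damped by a cosine window
    elif kind in ('hamming', 'hann'):
        a = 0.54 if kind == 'hamming' else 0.5
        H *= a + (1.0 - a) * np.cos(2.0 * np.pi * f)
    return H
```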
A^{T} A x = A^{T} b \qquad (4)
This method is definitely faster and gives similar results to algorithms from
the ART-family [15,17].
The Adaptive Steepest Descent Projection Onto Convex Subsets method
(ASD-POCS) is based on constrained total variation (TV) minimisation. The
algorithms stabilize the image reconstruction in parts with a large beam cone
angle and obtain good reconstruction of very noisy or heavily undersampled
images. The norm of the total variation is defined as the sum of the 2-norm
directional gradients of the variable.
\|x\|_{TV} = \sum_{n} \left\| \delta_{\alpha} x_{n} \right\|_{2} \qquad (6)
Several techniques have been used to measure errors between images, to deter-
mine the errors created during reconstruction. These are Root Mean Square
Error (RMSE), Structural Similarity Index (SSIM), Multi-scale Structural Sim-
ilarity Index (MS-SSIM), and Visual Information Fidelity (VIF).
RMSE is the square root of the mean square error. It measures to what extent
the processing has induced changes per pixel. The RMSE between two images,
original (k) and reconstructed (k'), is given by the formula:

RMSE = \sqrt{\frac{1}{M \cdot N} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \left[ k'(i, j) - k(i, j) \right]^{2}} \qquad (7)
The lower the RMSE value, the closer the image is to the original or reference
image [2].
SSIM measures the similarity between two images by comparing them in
terms of luminance l, contrast c, and structure s. These three components are
combined to give a measure of image similarity that can be written with the
formula [22]:

SSIM(x, y) = \frac{(2\mu_{x}\mu_{y} + C_{1})(2\sigma_{xy} + C_{2})}{(\mu_{x}^{2} + \mu_{y}^{2} + C_{1})(\sigma_{x}^{2} + \sigma_{y}^{2} + C_{2})} \qquad (8)

where C_1 = (K_1 L)^2 and C_2 = (K_2 L)^2 are constants built from two scalar
constants K_1 and K_2 and the dynamic range of the image L. In this paper,
K_1 = 0.01 and K_2 = 0.03. The SSIM indexing algorithm uses a sliding window
method to evaluate image quality. SSIM takes values from −1 to 1, and
SSIM(x, y) = 1 if and only if x = y. The window is moved pixel by pixel over the
whole image. In this paper, the window size is 8 × 8 px.
MS-SSIM involves low-pass filtering of the images and downsampling of the
filtered image by a factor of 2. The iterations up to M − 1 are performed. The
overall evaluation of SSIM is obtained by combining the measurements according
to the formula [20]:
MS\text{-}SSIM(x, y) = \left[l_{M}(x, y)\right]^{\alpha_{M}} \cdot \prod_{j=1}^{M} \left[c_{j}(x, y)\right]^{\beta_{j}} \cdot \left[s_{j}(x, y)\right]^{\gamma_{j}} \qquad (11)
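As a small illustration of the first two metrics, RMSE follows Eq. (7) directly in NumPy, and SSIM is available in scikit-image; note that scikit-image's sliding window must be odd-sized (e.g. 7 × 7 rather than the 8 × 8 window used here), so the sketch below is an approximation of the setup described above.

```python
# Sketch: RMSE per Eq. (7) in NumPy, SSIM via scikit-image (assumed available).
import numpy as np
from skimage.metrics import structural_similarity as ssim

rng = np.random.default_rng(0)
k = rng.random((128, 128)).astype(np.float32)               # reference image
k2 = k + rng.normal(0, 0.05, k.shape).astype(np.float32)    # "reconstructed" image

rmse = np.sqrt(np.mean((k2 - k) ** 2))
s = ssim(k, k2, data_range=float(k.max() - k.min()), win_size=7)
print(f"RMSE = {rmse:.4f}, SSIM = {s:.4f}")
```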
3 Results
3.1 Head and Shepp-Logan Phantoms
Reconstruction Time. A script was written in Python which sequentially
reconstructed the phantoms using FDK algorithms with different filters, then
algorithms from the ART-family, CGLS, and finally the POCS family. Before
each algorithm, the start time of the reconstruction was registered, and after the
script exited the algorithm, the end time was registered. The difference between
the start and end times was calculated. The results of the Head and
Shepp-Logan (S-L) phantom reconstructions are shown in Table 1.
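A minimal sketch of such a timing harness is shown below, assuming the Python port of the TIGRE toolbox (pytigre); function names follow its tigre.algorithms module, but signatures may differ between versions, and the phantom here is a synthetic placeholder:

```python
# Sketch of a reconstruction timing harness (pytigre API assumed; names and
# signatures follow tigre.algorithms but may vary between versions).
import time
import numpy as np
import tigre
import tigre.algorithms as algs

geo = tigre.geometry(mode='cone', default=True)              # default cone-beam geometry
angles = np.linspace(0, 2 * np.pi, 512, dtype=np.float32)    # 512 projections, as for S-L
phantom = np.ones(geo.nVoxel.astype(int), dtype=np.float32)  # placeholder volume
proj = tigre.Ax(phantom, geo, angles)                        # simulated projections

def timed(name, fn, *args, **kwargs):
    t0 = time.perf_counter()
    fn(*args, **kwargs)
    print(f"{name}: {time.perf_counter() - t0:.1f} s")

for flt in ('ram_lak', 'hann', 'cosine', 'shepp_logan', 'hamming'):
    timed(f"FDK {flt}", algs.fdk, proj, geo, angles, filter=flt)
timed("SART", algs.sart, proj, geo, angles, niter=20)
timed("SIRT", algs.sirt, proj, geo, angles, niter=20)
timed("OS-SART", algs.ossart, proj, geo, angles, niter=20, blocksize=32)
timed("CGLS", algs.cgls, proj, geo, angles, niter=20)
timed("ASD-POCS", algs.asd_pocs, proj, geo, angles, niter=20)
```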
There is a noticeable difference in the reconstruction time of the two phantoms.
This is due to the different numbers of projections used for reconstruction:
512 for the S-L phantom and 901 for the Head phantom. The differences are
especially significant for the SART, CGLS, ASD-POCS, and OS-AwASD-POCS
algorithms. In all iterative algorithms, 20 iterations were performed. The
algorithms using ordered subsets (OS) used a subset size of 53 for 901 projections
and 32 for 512
Table 1. Duration of the reconstruction of the Head phantom, S-L phantom, and rat
mandibular scan

Algorithm        Head phantom   S-L phantom   Rat mandibular
FDK Ram-Lak      0:00:10        0:00:07       0:00:40
FDK Hann         0:00:10        0:00:06       0:00:38
FDK Cosine       0:00:11        0:00:06       0:00:44
FDK Shepp-Logan  0:00:10        0:00:06       0:00:39
FDK Hamming      0:00:11        0:00:06       0:00:41
SART             0:14:38        0:08:39       1:16:58
SIRT             0:00:06        0:00:04       0:01:08
OS-SART          0:00:23        0:00:22       0:01:30
CGLS             0:00:10        0:00:06       0:03:04
ASD-POCS         0:14:15        0:08:05       1:15:26
OS-ASD-POCS      0:00:28        0:00:22       0:04:55
OS-AwASD-POCS    0:15:39        0:08:36       1:17:35
projections. The OS-SART algorithm and those using OS-ASD-POCS are less
sensitive to the number of projections. By far the fastest reconstruction
was performed by the FDK algorithm, with a duration of around 10 s, whereas
for the fastest iterative algorithm, SIRT, the duration was over 1 min.
Table 2. Results of comparing the reconstructed Head phantom and S-L phantom
images with the reference using the RMSE, SSIM, MS-SSIM, and VIF methods
Reconstruction Time. The scan was performed at a high resolution
(2000 × 1332 px), and therefore the reconstruction process required significantly
more time. The reconstruction included only 100 central cross-sections, and the
detector was resized to 400 × 1332 px to limit the duration. The measured times
are presented in Table 1.
The longest reconstruction time was for the SART algorithm, at over 1 day.
The next longest calculations were with the OS-AwASD-POCS and ASD-POCS
algorithms (over 20 h). The FDK analytical algorithm reconstructed the scans
in under 1 min. The other iterative algorithms took in the range of 20 min to 1 h.
However, this is several dozen times longer than FDK.
Fig. 3. Comparison of equivalent cross-sections of the rat mandibular scan from
various reconstruction algorithms: FDK Ram-Lak (a), FDK Hann (b), FDK Cosine (c),
FDK Shepp-Logan (d), FDK Hamming (e), SART (f), SIRT (g), OS-SART (h),
CGLS (i), ASD-POCS (j), OS-ASD-POCS (k), OS-AwASD-POCS (l)
In the images after reconstruction with FDK (Fig. 3(a)–(e)), it is difficult
to notice any differences. They have homogeneously distributed brightness.
The beam hardening artefacts at the borders of the composite implant (titanium
+ bio-glass) are clearly visible. Images from the SIRT and OS-ASD-POCS
algorithms are the most distorted; in these images, it is difficult to differentiate
the bone structure. In the other iterative algorithms, a difference in brightness
between the right and left sides of the image is visible. In SART and ASD-POCS,
the beam hardening artefacts are the slightest.
Because this is a scan of a real object, the reconstructed images cannot be
compared to a reference image. Data from the FDK Ram-Lak algorithm was used
to calculate image similarity: the images from the FDK analytical algorithm look
the most homogeneous, and Ram-Lak filtering obtained the best results in the
phantom reconstruction. The results of the RMSE, SSIM, MS-SSIM, and VIF
calculations are shown in Table 3. Among the analytical algorithms, the
Shepp-Logan filtering method showed the smallest differences. The SIRT
algorithm distorted the data the most. The images from CGLS and OS-SART
are close to FDK Ram-Lak in terms of pixel values, but the blurring of the

Table 3. Results of comparing the reconstructed rat mandibular images with the FDK
Ram-Lak reference using the RMSE, SSIM, MS-SSIM, and VIF methods

image results in a low score in the VIF method, which analyses the image struc-
ture. Among the iterative algorithms, the images from the SART and OS-AwASD-
POCS algorithms are the most similar, considering all image assessment methods.
Comparing the reconstruction times for the phantoms and the rat mandibular
scan, the trend and differences between the algorithms were preserved. The
reconstruction of the mandible scan took a very long time using iterative
methods. This is due to the high image resolution of 2000 × 1332 px and the
number of projections (901). It should be taken into account that only a fragment
of the data set was reconstructed (100 cross-sections); when reconstructing the
full set, the time would probably increase several times [4].
Analysis of the results from the image evaluation methods for the phantoms
showed comparable results; slight differences may be due to the smaller number
of projections in the S-L phantom. The iterative algorithms, particularly SART,
ASD-POCS, and OS-AwASD-POCS, gave slightly improved results compared to
the analytical algorithm. In both cases, the SIRT and CGLS algorithms distorted
the image the most. The worse results may be due to the need to perform more
iterations in these algorithms or, in the case of SIRT, to increase the size of the
projection subsets. However, this would result in increased reconstruction time [23].
The results from the reconstruction of the mandibular scans follow the trend
observed in the reconstruction of the phantoms. The SART iterative algorithm
showed the best results among all iterative algorithms. The OS-SART and CGLS
algorithms showed high similarity to FDK Ram-Lak in the RMSE, SSIM, and
MS-SSIM methods. The VIF method confirmed what was visible in the images:
blurring significantly reduces the ability to recognise structures in the images.
In summary, the algorithms with the longest reconstruction times gave the best
results, i.e., SART and OS-AwASD-POCS [3]. However, reconstruction by the
analytical method is definitely faster, the images are homogeneous, and the
structure is clearly visible. Both iterative and analytical algorithms induce
artefacts in the implant region; the fewest artefacts were visible with the SART
and OS-AwASD-POCS algorithms. The filtering of the analytical algorithm does
not introduce significant changes in the image; it correctly removes high-frequency
noise. Identifying which of the analysed filters least distorts object details
requires further research.
Microtomography requires very high resolution scans and multiple projections.
This amount of data requires very high computing power during reconstruction
with iterative algorithms, and the reconstruction can take many hours or even
days. The results presented in this work were obtained for the algorithms in
TIGRE and are implementation-specific; a more task-specific implementation
of the iterative reconstructions may be much faster.
Further work to develop a method to reduce implant artefacts in reconstruc-
tion images will be carried out on the basis of analytical algorithms due to their
currently more practical application than iterative algorithms.
References
1. Andersen, A.H., Kak, A.C.: Simultaneous Algebraic Reconstruction Technique
(SART): a superior implementation of the ART algorithm. Ultrason. Imaging 6(1),
81–94 (1984)
2. Asamoah, D., Ofori, E., Opoku, S., Danso, J.: Measuring the performance of image
contrast enhancement technique. Int. J. Comput. Appl. 181(22), 6–13 (2018)
3. Beister, M., Kolditz, D., Kalender, W.A.: Iterative reconstruction methods in X-ray
CT. Physica Med. 28(2), 94–108 (2012)
4. Biguri, A., Dosanjh, M., Hancock, S., Soleimani, M.: TIGRE: a MATLAB-GPU
toolbox for CBCT image reconstruction. Biomed. Phys. Eng. Express 2(5), 055010
(2016)
5. Censor, Y., Elfving, T.: Block-iterative algorithms with diagonally scaled oblique
projections for the linear feasibility problem. SIAM J. Matrix Anal. Appl. 24(1),
40–58 (2002)
6. Feldkamp, L.A., Davis, L.C., Kress, J.W.: Practical cone-beam algorithm. JOSA A
1(6), 612–619 (1984)
7. Lee, S.W., et al.: Effects of reconstruction parameters on image noise and spatial
resolution in cone-beam computed tomography. J. Korean Phys. Soc. 59(4), 2825–
2832 (2011)
8. Li, L., Chen, Z., Xing, Y., Zhang, L., Kang, K., Wang, G.: A general exact method
for synthesizing parallel-beam projections from cone-beam projections by filtered
backprojection. In: 2006 IEEE Nuclear Science Symposium Conference Record,
vol. 6, pp. 3476–3479. IEEE (2006)
9. Liu, J., Wright, S.: An accelerated randomized Kaczmarz algorithm. Math. Com-
put. 85(297), 153–178 (2016)
10. Liu, Y., Ma, J., Fan, Y., Liang, Z.: Adaptive-weighted total variation minimization
for sparse data toward low-dose X-ray computed tomography image reconstruction.
Phys. Med. Biol. 57(23), 7923 (2012)
11. Machin, K., Webb, S.: Cone-beam X-ray microtomography of small specimens.
Phys. Med. Biol. 39(10), 1639 (1994)
12. Pan, X., Sidky, E.Y., Vannier, M.: Why do commercial CT scanners still employ
traditional, filtered back-projection for image reconstruction? Inverse Probl.
25(12), 123009 (2009)
13. Pontana, F., et al.: Chest computed tomography using iterative reconstruction vs
filtered back projection (part 2): image quality of low-dose ct examinations in 80
patients. Eur. Radiol. 21(3), 636–643 (2011)
14. Pontana, F., et al.: Chest computed tomography using iterative reconstruction vs
filtered back projection (part 1): evaluation of image noise reduction in 32 patients.
Eur. Radiol. 21(3), 627–635 (2011)
15. Qiu, W., Titley-Péloquin, D., Soleimani, M.: Blockwise conjugate gradient methods
for image reconstruction in volumetric CT. Comput. Methods Programs Biomed.
108(2), 669–678 (2012)
16. Sheikh, H.R., Bovik, A.C.: Image information and visual quality. IEEE Trans.
Image Process. 15(2), 430–444 (2006)
17. Shewchuk, J.R., et al.: An introduction to the conjugate gradient method without
the agonizing pain (1994)
18. Sidky, E.Y., Pan, X.: Image reconstruction in circular cone-beam computed tomog-
raphy by constrained, total-variation minimization. Phys. Med. Biol. 53(17), 4777
(2008)
19. Song, X., et al.: Non-invasive location and tracking of tumors and other tissues for
radiation therapy (2010). US Patent App. 12/679,730
20. Wang, Z., Simoncelli, E.P., Bovik, A.C.: Multiscale structural similarity for image
quality assessment. In: The Thirty-Seventh Asilomar Conference on Signals, Sys-
tems & Computers 2003, vol. 2, pp. 1398–1402. IEEE (2003)
21. Yu, H., Ye, Y., Wang, G.: Katsevich-type algorithms for variable radius spiral
cone-beam CT. In: Developments in X-Ray Tomography IV, vol. 5535, pp. 550–
557. International Society for Optics and Photonics (2004)
22. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment:
from error visibility to structural similarity. IEEE Trans. Image Process. 13(4),
600–612 (2004)
23. Zeng, G.L.: Comparison of FBP and iterative algorithms with non-uniform angu-
lar sampling. In: 2014 IEEE Nuclear Science Symposium and Medical Imaging
Conference (NSS/MIC), pp. 1–13. IEEE (2014)
Comparison of Interpolation Methods
for MRI Images Acquired with Different
Matrix Sizes
1 Introduction
Magnetic resonance imaging is a modern, non-invasive diagnostic technique for which no
adverse biological effects have yet been identified. MR images offer a wealth of
diagnostic information with little risk to the patient. They rely on complex
physical phenomena and require numerous computational processing steps.
A key role is played by the proper choice of acquisition parameters, especially the matrix
size (MS), which has to be adjusted to the size of the anatomical
region and the desired image resolution. Owing to physical limits on
the signal strength received from spin-spin and spin-lattice dynamics, increasing
the resolution (with a high MS) significantly reduces the signal strength.
Conversely, the high signal strength obtained with a small matrix size has the
disadvantage of low image resolution [5]. Therefore, the matrix size has to be set
carefully, with awareness of the underlying physical processes.
Seventeen popular interpolation methods were used to enlarge the images, all
of them taken from the Matplotlib library written in Python [4]:
1. None
2. Nearest Neighbor
3. Bilinear
4. Bicubic [3]
5. Spline 16 [7]
6. Spline 36 [7]
7. Hanning [10]
8. Hamming [10]
9. Hermite [12]
10. Kaiser [6]
11. Quadric [3]
12. Catrom [14]
13. Gaussian [13]
14. Bessel [8]
15. Mitchell [1]
16. Sinc [2]
17. Lanczos [1]
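The names above are the interpolation values accepted by Matplotlib's imshow. The paper does not show its resampling code; a minimal sketch of one possible approach, assuming the slice is rendered on a canvas of exactly the target pixel size and read back, is:

import numpy as np
import matplotlib
matplotlib.use("Agg")            # render off-screen so buffer_rgba() is available
import matplotlib.pyplot as plt

def enlarge(image, target_size, method):
    # Render `image` on a canvas of target_size x target_size pixels with the
    # chosen imshow interpolation, then read the canvas back as an array.
    dpi = 100
    fig = plt.figure(figsize=(target_size / dpi, target_size / dpi), dpi=dpi)
    ax = fig.add_axes([0, 0, 1, 1])      # axes fill the whole canvas
    ax.set_axis_off()
    ax.imshow(image, cmap="gray", interpolation=method)
    fig.canvas.draw()
    rgba = np.asarray(fig.canvas.buffer_rgba())
    plt.close(fig)
    return rgba[..., 0]                  # one channel of the grayscale render

slice_320 = np.random.rand(320, 320)           # stand-in for a 320 x 320 MR slice
enlarged = enlarge(slice_320, 448, "bicubic")  # any of the 17 names listed above

Rendering through the figure canvas is only one way of reaching Matplotlib's interpolators; it is a sketch, not the authors' implementation.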
Fig. 1. Examples of cross-sections of various sizes enlarged to the size 448 × 448 by
using the bilinear method, along with their equivalent of a given size
Images Expanded to the Size of the Nearest Larger Matrix Size. The
images in this set were scaled to the size of the nearest larger matrix.
For example, an image of size 320 × 320 was enlarged to 384 × 384
(Fig. 2).
Fig. 2. Examples of cross-sections of various sizes enlarged to the size of the nearest
larger matrix size by using the bilinear method, along with their equivalent of a given
size
$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \qquad (1)$$

where $\mu_x$, $\mu_y$ are the mean values of the compared images; $\sigma_x^2$, $\sigma_y^2$ and $\sigma_{xy}$ are the variances and covariance of the respective images; and $C_1$, $C_2$ are small constants that avoid division-by-zero problems.

The second measure was the Mean Square Error (MSE) between the interpolated volume and the observed volume:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \qquad (2)$$

where $y_i$ denotes voxel values of the observed volume and $\hat{y}_i$ those of the interpolated volume.
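As a minimal sketch (not the authors' code), Eqs. (1) and (2) can be evaluated globally over a pair of volumes as follows; the constants C1 and C2 below are placeholder values, since the paper does not state them:

import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    # Eq. (1) computed over whole images/volumes; c1, c2 are placeholders.
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (x.var() + y.var() + c2))

def mse(observed, interpolated):
    # Eq. (2): mean squared error between observed and interpolated volumes.
    return np.mean((observed - interpolated) ** 2)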
3 Results
Group A. In this group, none of the studies were rejected, which gave a
set of twenty studies, each with four sets of images. For these, the methods were
compared when enlarging images to the size of 448 × 448 and to the size of the
nearest larger matrix. The results for this group are presented in Fig. 4.
It is evident that the bicubic, quadric and Gaussian methods turned out to
be the best in this group; the others did not reach comparably good results.
Group B. This group consisted of sixteen studies; studies number 8, 10, 12 and
15 were rejected because they produced the largest MSE peaks when enlarging
images to the size of the nearest larger matrix. These results are shown in
Fig. 5. For all sixteen studies, the methods were compared when enlarging
images to the size of 448 × 448 and to the size of the nearest larger matrix.
The results for this group are presented in Figs. 6(a), 6(b), 6(c) and 6(d).
In this group, an increase in the range of SSIM and a decrease in the range
of MSE were noticed (Fig. 6). The ranking of the best methods also changed:
the quadric method gives the best results when enlarging to the size of the
nearest larger matrix, whereas the bicubic method is best for enlarging images
to the size of 448 × 448. The Gaussian and Mitchell methods also showed good
results.
Fig. 4. Graphs of the mean SSIM (a), (c) and MSE (b), (d) for five middle slices
acquired by enlarging images to a matrix of 448 × 448 size (a), (b) or to the
nearest larger matrix size (c), (d) – Group A
Fig. 5. Series MSE plot for image scaling to the size of the nearest larger matrix,
for the default interpolation method, for twenty patients
Group C. This group consisted of eleven studies. Studies with values greater
than that of patient number 20 were rejected because they had the highest mean
MSE values across all study lots when enlarging images to the size of the
nearest larger matrix. These results are shown in Fig. 7. For all eleven studies,
the methods were compared when enlarging images to the size of 448 × 448
and to the size of the nearest larger matrix.
The results for this group are presented in Fig. 8.
Again, there is an increase in the range of SSIM and a decrease in the range
of MSE, and the ranking of the best methods changed once more. The methods
that showed the best quality when enlarging the MRI images were quadric,
bicubic, bilinear and Mitchell.
Fig. 6. Graphs of the mean SSIM (a), (c) and MSE (b), (d) for five middle slices
acquired by enlarging images to a matrix of 448 × 448 size (a), (b) or to the
nearest larger matrix size (c), (d) – Group B
Fig. 7. Series mean MSE plot for image scaling to the size of the nearest larger matrix,
for the default interpolation method, for twenty patients
5 Discussion
In conclusion, it was possible to compare popular interpolation methods and
indicate those that best estimate human tissue when enlarging MRI images.
For the sets of images enlarged to the size of the nearest larger matrix, the
quadric method most often turned out to be the best; for the sets enlarged to
the size of 448 × 448, the bicubic method stood out. These results are similar
to those presented by Hisham et al. [3], who state that bicubic interpolation is
the best of the methods they compare. The differences between the individual
groups are also worth noting: the changes in the range of SSIM and MSE
suggest that studies in which patients moved more between series could be
excluded. The above experiments were conducted to find the best interpolation
method for tissue estimation when enlarging MRI images; the best method will
be used to reconstruct a cephalometric image from MRI images.
Fig. 8. Graphs of the mean SSIM (a), (c) and MSE (b), (d) for five middle slices
acquired by enlarging images to a matrix of 448 × 448 (a), (b) or to the nearest
larger matrix (c), (d) size – Group C
Fig. 9. Cephalometric image reconstructed using the Nearest Neighbor (a), (b) and
Bicubic (c), (d) methods
Acknowledgement. The data acquisition was carried out based on the consent of the
Jagiellonian University’s Bioethics Committee (No 155/KBL/OIL/2017, 22.09.2017).
This work was financed by the AGH University of Science and Technology thanks
to the Rector’s Grant 18/GRANT/2022.
This work was co-financed by the AGH University of Science and Technology,
Faculty of EAIIB, KBIB no 16.16.120.773.
Work carried out within the grant Studenckie Kola tworza innowacje - II edition,
project no. SKN/SP/535131/2022 entitled “Cephalometric image reconstruction based
on magnetic resonance imaging”.
References
1. Conejero, J.: Interpolation algorithms in PixInsight (2011)
2. Getreuer, P.: Linear methods for image interpolation. Image Process. On Line 1,
238–259 (2011)
3. Hisham, M., Yaakob, S.N., Raof, R., Nazren, A., Wafi, N.: An analysis of perfor-
mance for commonly used interpolation method. Adv. Sci. Lett. 23(6), 5147–5150
(2017)
4. Hunter, J.D.: Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9(03),
90–95 (2007)
5. Kokeny, P., Cheng, Y.C.N., Xie, H.: A study of MRI gradient echo signals from dis-
crete magnetic particles with considerations of several parameters in simulations.
Magnet. Reson. Imaging 48, 129–137 (2018)
6. Kuo, F.F., Kaiser, J.F.: System Analysis by Digital Computer. Wiley (1966)
7. Limongelli, M., Carvelli, V.: Damage localization in a glass fiber reinforced compos-
ite plate via the surface interpolation method. In: Journal of Physics: Conference
Series, vol. 628, p. 012095. IOP Publishing (2015)
8. Mohan, P.G., Prakash, C., Gangashetty, S.V.: Bessel transform for image resizing.
In: 2011 18th International Conference on Systems, Signals and Image Processing
(2011)
9. Plenge, E., et al.: Super-resolution methods in MRI: can they improve the trade-
off between resolution, signal-to-noise ratio, and acquisition time? Magnet. Reson.
Med. 68(6), 1983–1993 (2012)
10. Podder, P., Khan, T.Z., Khan, M.H., Rahman, M.M.: Comparative performance
analysis of Hamming, Hanning and Blackman window. Int. J. Comput. Appl. 96(18)
(2014)
11. Sara, U., Akter, M., Uddin, M.S.: Image quality assessment through FSIM, SSIM,
MSE and PSNR- a comparative study. J. Comput. Commun. 7(3), 8–18 (2019)
12. Seta, R., Okubo, K., Tagawa, N.: Digital image interpolation method using higher-
order hermite interpolating polynomials with compact finite-difference. In: Pro-
ceedings: APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Asso-
ciation, 2009 Annual Summit and Conference, pp. 406–409. Asia-Pacific Signal and
Information Processing Association (2009)
13. Smith, J.O.: Spectral Audio Signal Processing. W3K (2011)
14. Twigg, C.: Catmull-rom splines. Computer 41(6), 4–6 (2003)
Preprocessing of Laryngeal Images
from High-Speed Videoendoscopy
1 Introduction
At the turn of the 19th and 20th century, an effective solution was sought to
correlate the size and type of pathological changes with their impact on vocal
fold vibrations. Commonly used videolaryngostroboscopy (VLS) imaging has
some limitations due to the sampling technique [13]. An adequate effect can
only be obtained if the observed movement is sufficiently periodic and stable, so
a visualization problem may arise when vocal fold movements are nonperiodic
(e.g. voice breaks or vocal tremor) [17]. In many cases the acoustic signal is
distorted to such an extent that the fundamental frequency cannot be calculated,
and thus the sampling rate of the VLS images cannot be adjusted, resulting in
the inability to properly visualize the vibrations of the vocal folds [21]. Laryngeal
High-Speed Videoendoscopy (LHSV) overcomes these limitations by recording
images at a frame rate that is much higher than, and independent of, the
fundamental frequency of the vocal folds [20]. At the end of the 20th century,
the development of electronic technology allowed the construction of high-speed
video cameras [1]. Due to their size, the original models were used for scientific
and research purposes only. Initially, the cameras could record 2000 frames
per second (fps); over time, the frame rate increased to as much as 6000 fps,
but at the cost of compromised image quality or increased device weight [17].
The breakthrough came in recent years. Currently available high-speed cameras
are small and light and can record 3200 colour images per second at a resolution
of 480 × 400 pixels. Because of the short exposure intervals, a laser illuminator
with dynamically adjustable light intensity is used. The camera can be coupled
directly to a computer, and the recording is carried out on-line. The use of the
LHSV technique has reduced the patient examination time from about 20 s
with VLS imaging to fractions of a second.
LHSV imaging is still quite new compared to earlier techniques such as VLS
or kymography, and new algorithms need to be developed to fully exploit the
capabilities of this laryngeal diagnostic tool. There are already many articles
on glottis segmentation, optical flow and other types of analysis [9,12,15,
19,22–26], but few studies even mention how the ROI is determined from these
images [8,10,18], although this can significantly improve the computational
efficiency of a given program. Appropriate preparation of the images also
affects the comfort of working with a given analysis.
In this paper, we report a study on pre-processing images collected with
LHSV so that they are optimally prepared for further processing, i.e. free of
the artifacts introduced by the applied image acquisition technique.
directly contribute to easy and intuitive programming. The first package,
OpenCV (Open Source Computer Vision Library), is an open-source library
that includes several hundred computer vision algorithms [2]; it is predominantly
used for image processing, video analysis and object detection. The second
main library is SciPy [6], which was used to perform operations on signals
(e.g. the FFT or B-spline functions) and to eliminate shifts between consecutive
frames of the laryngeal movie. NumPy [4], for creating and operating on image
matrices, and Matplotlib [3], for creating graphs and displaying the results as
images, were also used.
The videos used were in .avi, .mp4 or .hsv format. The first pre-processing
step was to convert each video into a series of frames in .bmp format: the first
two formats were converted with Python, the .hsv format with the HSVviewer
program created by Diagnova Technologies. Each BMP image has three
components: R (red), G (green) and B (blue). The red component was
separated for further processing: since the entire image of the vocal folds is
dominated by red, the difference in brightness between the vocal fold and the
glottal gap is most prominent in this component. Additionally, in the G and B
images, blood vessels on the vocal folds were more visible than in the R
component, which could interfere with further analysis. The blue component
was used to reduce the reflections coming from the light source, as reflections
are most visible in this component. Figure 2 shows the differences in reflection
brightness between the R, G and B image components.
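A minimal sketch of the channel separation with OpenCV (assumed code, not the authors'; the frame file name is hypothetical):

import cv2

# OpenCV loads BMP frames in BGR order; the R channel is kept for structural
# analysis and the B channel for locating laser-light reflections.
frame = cv2.imread("frame_0001.bmp")   # hypothetical frame file name
blue, green, red = cv2.split(frame)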
Fig. 2. The brightness of reflections for different image components: (a) red component,
only the edges of the reflection are bright, the centre is dark; (b) green component,
brighter reflection than for the red component, but still with darker spots in the centre;
(c) blue component, the brightest reflections and the most marked difference in the level
of brightness between the vocal fold and the light reflection
Next, the offsets of successive frames relative to each other had to be reduced.
The offsets were usually up to a dozen or so pixels (see Fig. 3), yet even such
small offsets can significantly affect further analysis: the image appears blurred,
but this is due to the overlap of two frames – frame no. 1 and frame no. 143. The
fftconvolve [7] function from the SciPy package was used to remove the offsets.
It convolves two images (the first image in a given series with each subsequent
image) using the Fast Fourier Transform. The result is an image in which one
point is the brightest: if this point lies at the centre of the image, no shift
has occurred; if the location of the brightest point is different, black rows
and/or columns are added to the image to shift it properly with respect to the
first frame. Figure 3(b) shows an example trajectory of image shifts caused by
movement of the laryngoscope relative to the larynx during video acquisition;
it is a visualization using B-spline functions that approximate the movement
trajectory of the image content, with the displacements discretised to the grid
of image pixels. Note that the number of distinct image shifts appears small,
but in reality there are many of them, taking repeated coordinates.
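A minimal sketch of this shift estimation with scipy.signal.fftconvolve, under our assumptions about the implementation (not the authors' exact code):

import numpy as np
from scipy.signal import fftconvolve

def estimate_shift(ref, img):
    # Locate the peak of the FFT-based cross-correlation of two frames;
    # a peak at the image centre means no shift has occurred.
    ref = ref - ref.mean()
    img = img - img.mean()
    corr = fftconvolve(ref, img[::-1, ::-1], mode="same")  # cross-correlation
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    centre = (corr.shape[0] // 2, corr.shape[1] // 2)
    return peak[0] - centre[0], peak[1] - centre[1]        # (row, col) shift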
Fig. 3. (a) Shift between two images frame no. 1 and no. 143; (b) Trajectory of dis-
placements with respect to the image centre
The next step in pre-processing the laryngeal images was to rotate the images
in the video so that the edges of the vocal folds were vertically aligned. The
images were rotated because, as the phoniatricians pointed out, it is easier to
diagnose and analyse the image when it is in a vertical position. First, the image
representing the largest glottal opening is selected and displayed (images with
the largest glottal opening were annotated by the phoniatricians), and the user
marks a line that is the axis of symmetry of the glottis. The centre of this line
marks the midpoint of the image rotation. After rotating the image, the black
rows and/or columns (added to move the centre of the line to the centre of the
image) were removed. Figure 4 illustrates the adopted image rotation procedure.
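A minimal sketch of this rotation step (assumed implementation, not the authors' code; p1 and p2 are hypothetical endpoints of the user-marked axis):

import cv2
import numpy as np

def align_axis(image, p1, p2):
    # Rotate the frame about the midpoint of the marked glottal axis so that
    # the axis becomes vertical.
    (x1, y1), (x2, y2) = p1, p2
    # deviation of the axis from the vertical direction, in degrees; the sign
    # may need flipping depending on the image coordinate convention
    angle = np.degrees(np.arctan2(x2 - x1, y2 - y1))
    centre = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
    rot = cv2.getRotationMatrix2D(centre, -angle, 1.0)
    h, w = image.shape[:2]
    return cv2.warpAffine(image, rot, (w, h))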
At the current stage of software development we decided that the axis of
symmetry would be defined by the user. This is because of the complexity and
polymorphic shapes of the vocal folds. Additionally, it was important to deter-
mine the axis of symmetry of the entire organ (vocal folds) and not just the
glottis (Fig. 5).
A further processing step is to crop the image to the region of interest (ROI).
Phoniatricians have suggested that it is important to be able to determine the
area of the entire vocal folds and their left and right parts. For this purpose,
the following interactive procedure was proposed. The user selects 20 points sur-
rounding the vocal fold region. A third order B-spline curve is fitted based on
the marked points. B-spline curves are a generalization of polynomial represen-
tations for Bezier curves [11]. To maintain continuity, the first point indicated
by the user is added at the end to the list of designated points. The designated
area corresponds to the area of the vocal folds. The axis of symmetry was used
to obtain the area of each vocal fold.
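A minimal sketch of the curve fitting with SciPy, under the assumption that splprep/splev are acceptable stand-ins for the authors' implementation (the points below are stand-ins for user-marked points):

import numpy as np
from scipy.interpolate import splprep, splev

# Closed third-order B-spline through 20 points surrounding the vocal folds,
# with the first point repeated at the end for continuity, as described above.
pts = np.random.rand(20, 2) * 400
pts = np.vstack([pts, pts[:1]])                  # close the polygon
tck, _ = splprep([pts[:, 0], pts[:, 1]], k=3, s=0, per=True)
xs, ys = splev(np.linspace(0, 1, 500), tck)      # dense boundary samples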
Fig. 4. (a) User-defined axis of the glottis symmetry and moving the centre of the
defined axis to the centre of the image; (b) Rotated image
Fig. 5. Exemplary image where the glottis gap is not symmetrical and it would be
difficult to determine the axis of symmetry automatically
Figure 6(a) shows the marked area of the vocal folds, while in Fig. 6(b) the
left and right vocal fold areas are marked with different shades. Finally, the
image was cropped to the vocal fold area with appropriate margins added.
3 Results
Fig. 6. (a) The area of the vocal folds surrounded by b-spline curves; (b) The area of
the glottis divided into left (gray) and right (orange) vocal fold
Table 1. Maximum shift between the first and n-th frame for each subject
Fig. 7. The maximum shift between two frames for images in the same phonetic phase
(a); the frame with added black rows and/or columns to compensate for the shift –
two frames overlapped after offset correction (b)
4 Conclusions
This paper presents a method for pre-processing LHSV images. The LHSV image
recording technique offers a new quality of visualization of the vibrating vocal
folds. However, specially designed pre-processing algorithms are required before
quantitative analysis of the images can be performed. In this paper, the following
LHSV image pre-processing procedures were developed:
– removal of shifts between consecutive images in a series of video images caused
by relative movements of the larynx in relation to the laryngoscope, occurring
even for short image acquisition intervals of fractions of a second,
– removal of glare on images due to the intense laser light source used to illu-
minate the larynx,
– determining the ROI containing the vocal folds area,
– rotating images and determining the axis of symmetry of the glottal area,
– delineation of regions containing the left and right vocal fold area.
It should be noted that these studies were performed under the close supervision
of phoniatricians, who defined the requirements for computer image analysis
so as to facilitate diagnostic interpretation of the laryngeal images.
The current research direction is to analyze the periodicity of vocal fold move-
ments in normophonic subjects and test the developed algorithms for patients
with voice disorders.
References
1. DiagNova. https://www.diagnova.pl/pages/zasoby/rejestracja wideo IV szybka
kamera 1.html. Accessed 15 Jan 2022
2. OpenCV. https://opencv.org/. Accessed 31 Dec 2021
3. Matplotlib - Visualization with Python. https://matplotlib.org/. Accessed 31 Dec
2022
4. NumPy. https://numpy.org/. Accessed 31 Dec 2021
5. OpenCV: Image Inpainting. https://docs.opencv.org/3.4/df/d3d/tutorial_py_inpainting.html. Accessed 31 Dec 2021
6. SciPy. https://scipy.org/. Accessed 31 Dec 2021
7. scipy.signal.fftconvolve - SciPy v1.8.0 Manual. https://docs.scipy.org/doc/scipy/
reference/generated/scipy.signal.fftconvolve.html. Accessed 31 Dec 2021
8. Andrade-Miranda, G., Godino-Llorente, J.I., Moro-Velázquez, L., Gómez-García,
J.A.: An automatic method to detect and track the glottal gap from high speed
videoendoscopic images. Biomed. Eng. Online 14(1), 100 (2015). https://doi.org/
10.1186/s12938-015-0096-3
9. Andrade-Miranda, G., Henrich Bernardoni, N., Godino-Llorente, J., Cruz, H.: Vocal
folds dynamics by means of optical flow techniques: a review of the methods. Adv.
Sign. Process. Rev. (2018)
10. Díaz-Cádiz, M.E., et al.: Estimating vocal fold contact pressure from raw laryngeal
high-speed videoendoscopy using a Hertz contact model. Appl. Sci. 9(11), 2384
(2019). https://doi.org/10.3390/app9112384. https://www.mdpi.com/2076-3417/
9/11/2384. Number: 11 Publisher: Multidisciplinary Digital Publishing Institute
11. Forrest, A.R.: Interactive interpolation and approximation by Bezier polynomials.
Comput. J. 15(1), 71–79 (1972)
12. Gómez, P., et al.: BAGLS, a multihospital benchmark for automatic glottis segmen-
tation. Sci. Data 7(1), 1–12 (2020)
13. Hillman, R., Mehta, D.: The science of stroboscopic imaging. Laryngeal Evaluation:
Indirect Laryngoscopy to High-Speed Digital Imaging, pp. 101–109 (2010)
14. Ikuma, T., Kunduk, M., McWhorter, A.J.: Preprocessing techniques for high-
speed videoendoscopy analysis. J. Voice 27(4), 500–505 (2013). https://doi.
org/10.1016/j.jvoice.2013.01.014. https://www.sciencedirect.com/science/article/
pii/S0892199713000155
15. Kist, A.M., Dürr, S., Schützenberger, A., Döllinger, M.: OpenHSV: an open plat-
form for laryngeal high-speed videoendoscopy. Sci. Rep. 11(1), 13760 (2021).
https://doi.org/10.1038/s41598-021-93149-0. Number: 1 Publisher: Nature Pub-
lishing Group
16. Koç, T., Çiloğlu, T.: Automatic segmentation of high speed video images of vocal
folds. J. Appl. Math. 2014 (2014). https://doi.org/10.1155/2014/818415
17. Mehta, D.D., Hillman, R.E.: Current role of stroboscopy in laryngeal imag-
ing. Curr. Opin. Otolaryngol. Head Neck Surg. 20(6), 429–436 (2012).
https://doi.org/10.1097/MOO.0b013e3283585f04. https://www.ncbi.nlm.nih.gov/
pmc/articles/PMC3747974/
18. Naghibolhosseini, M., Deliyski, D.D., Zacharias, S.R., de Alarcon, A., Orlikoff,
R.F.: Temporal segmentation for laryngeal high-speed videoendoscopy in
connected speech. J. Voice, Offic. J. Voice Found. 32(2), 256.e1–256.e12
(2018). https://doi.org/10.1016/j.jvoice.2017.05.014, https://www.ncbi.nlm.nih.
gov/pmc/articles/PMC5740029/
19. Pedersen, M., Jønsson, A., Mahmood, S., Agersted, A.: Which mathematical and
physiological formulas are describing voice pathology: an overview. J. Gen. Pract.
4(3) (2016). https://doi.org/10.4172/2329-9126.1000253
20. Poburka, B.J., Patel, R.R., Bless, D.M.: Voice-vibratory assessment with laryngeal
imaging (VALI) form: reliability of rating stroboscopy and high-speed videoen-
doscopy. J. Voice 31(4), 513.e1–513.e14 (2017). https://doi.org/10.1016/j.jvoice.
2016.12.003. https://www.jvoice.org/article/S0892-1997(16)30360-5/fulltext.
Publisher: Elsevier
21. Powell, M.E., et al.: Efficacy of videostroboscopy and high-speed videoendoscopy
to obtain functional outcomes from perioperative ratings in patients with vocal fold
mass lesions. J. Voice 34(5), 769–782 (2020). https://doi.org/10.1016/j.jvoice.2019.
03.012. https://www.jvoice.org/article/S0892-1997(18)30466-1/fulltext. Publisher:
Elsevier
22. Schlegel, P., et al.: Influence of analyzed sequence length on parameters in laryngeal
high-speed videoendoscopy. Appl. Sci. 8(12), 2666 (2018). https://doi.org/10.3390/
app8122666. Number: 12 Publisher: Multidisciplinary Digital Publishing Institute
23. Schlegel, P., Stingl, M., Kunduk, M., Kniesburges, S., Bohr, C., Döllinger, M.:
Dependencies and ill-designed parameters within high-speed videoendoscopy and
acoustic signal analysis. J. Voice 33(5), 811-e1 (2019)
24. Yamauchi, A., et al.: Chapter 16 analysis of HSDI/HSDP with laryngotopography:
the principles p. 4 (2015)
25. Yamauchi, A., et al.: Vibratory phase difference of normal phonation: HSDI ana-
lyzed with laryngotopography p. 10 (2016)
26. Yamauchi, A., Yokonishi, H., Imagawa, H., Sakakibara, K.I., Nito, T., Tayama, N.,
Yamasoba, T.: Quantification of vocal fold vibration in various laryngeal disorders
using high-speed digital imaging. J. Voice Offic. J. Voice Found. 30(2), 205–214
(2016). https://doi.org/10.1016/j.jvoice.2015.04.016
Construction of a Cephalometric Image
Based on Magnetic Resonance
Imaging Data
1 Objective
Examinations using ionizing radiation, such as cephalometry and pantomograms,
are still commonly used to image the cerebrocranial and craniofacial skeleton
in the diagnosis of malocclusion and the planning of orthodontic treatment,
despite the well-known harmful effects of radiation [4,6,7,9,10], which can affect,
for example, the extremely radiosensitive thyroid gland and lens. Ionizing radiation
is frequently used in dental and orthodontic treatment, especially during routine
and periodic examinations, but this unnecessarily exposes the body to radiation
and increases the risk of cancer.
the displacement error with respect to CT, which is within 3 mm. The limitation
of this method is that it can only be applied in the part of the skull contain-
ing the brain, therefore the craniofacial region that is important for dental and
orthodontic treatment is excluded.
3 Data
The data used in this project are two series of magnetic resonance images: T1-
weighted and T2-weighted (Fig. 1). Each series consists of 105 cross-sections in
the transverse plane, covering the craniofacial region, the cerebrocranial
region, and the cervical spine. The resolution of the images is 448 × 448 pixels
with a bit depth of 16.
Fig. 1. Selected cross-section of the data used, T1 (a) and T2 (b) sequences
The T1 and T2 sequences do not overlap, which was taken into account in
the constructed algorithm by fitting appropriate geometric transformations to
minimize the differences between the sequences.
4 Algorithm of Reconstruction
The whole algorithm can be divided into stages: data preparation, tissue seg-
mentation and then generation of the final reconstruction. Data preparation
includes data loading, normalization, and matching of T1-weighted and T2-
weighted image series. Processing includes filtering, thresholding, and de-noising
of the prepared data (Fig. 2); reconstruction includes interpolation of the images
between the given cross-sections and transformation into a cephalometric image.
Because the T1 and T2 sequences are offset relative to each other, it was
necessary to match them before using them together. For this purpose, the T2
sequence was subjected to geometric transformations: translation, rotation and
scaling. The BFGS algorithm was used for optimization; its goal was to select
the parameters of each transformation so as to reduce the differences between
the sequences as much as possible.
The cost function was constructed as follows: bilateral filtering was applied
in order to preserve clear edges at the boundary between the imaging
background and the patient's head, and edge detection with the Canny
algorithm was then performed. For the contours prepared in this way, in each
iteration of the optimization algorithm the sum of overlapping edge pixels was
computed and used in the minimized cost function.
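A minimal sketch of this matching step, under our assumptions about the implementation (the paper names the operations but not the code; the Canny thresholds, bilateral-filter parameters and edge-map blurring are placeholders):

import cv2
import numpy as np
from scipy.optimize import minimize

def edge_map(img):
    # Bilateral filtering followed by Canny edges; the edge map is blurred so
    # the cost varies smoothly enough for a gradient-based optimizer.
    smooth = cv2.bilateralFilter(img.astype(np.float32), 9, 75, 75)
    u8 = cv2.normalize(smooth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    edges = cv2.Canny(u8, 50, 150).astype(np.float32)
    return cv2.GaussianBlur(edges, (15, 15), 0)

def cost(params, e1, t2):
    tx, ty, angle, scale = params
    h, w = t2.shape
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    m[:, 2] += (tx, ty)
    e2 = cv2.warpAffine(edge_map(t2), m, (w, h))
    return -float((e1 * e2).sum())      # more edge overlap -> lower cost

# t1_slice, t2_slice: hypothetical 448 x 448 arrays from the two sequences
# res = minimize(cost, x0=[0, 0, 0, 1], args=(edge_map(t1_slice), t2_slice),
#                method="BFGS")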
The image series matched in this way allowed further transformations to exploit
the correlations and differences in tissue appearance between T1-weighted
and T2-weighted MRI images. To unify the data and facilitate comparison of the
images and tissues, each image was then normalized to the interval 0–1,
so that the smallest and largest values in the image became 0 and 1, respectively.
In the first step, bilateral filtering with a 10 × 10 mask was performed on the
image in order to suppress noise while maintaining strong, unblurred boundaries
between the background and the imaged head. Next, thresholding was
performed; owing to the normalization carried out during data preparation,
the thresholds have values between 0 and 1. For the background segmentation
part, the threshold for both sequences was the same, 0.05 (true for values less
than the threshold).
The resulting masks are preliminary versions: they contain a lot of noise
and imperfections, and the area along the edges in particular needs de-noising.
For this purpose, the two sequences were merged. As a result, most of the
defects in the masks were fixed and further de-noising was more effective.
De-noising was performed by removing all objects (unconnected binary
components) smaller than the largest one.
In the next step, dilation and closing operations were applied to remove residual
noise from the masks and to bring the masks as close as possible to the head
boundary. For closing, a mask size of 15 × 15 was used; for dilation, a mask
size of 7 × 7. The results are shown in Fig. 3. Additionally, this allowed the
inclusion of peripheral parts of the head, such as the auricles and parts of the
nose, which would be harder to detect in the soft tissue mask process.
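A minimal sketch of these morphological operations with OpenCV (assumed implementation; the rectangular kernel shapes are our choice):

import cv2
import numpy as np

# Clean the merged background mask with closing (15 x 15) and dilation (7 x 7),
# as described above. `mask` is a stand-in binary mask.
mask = np.zeros((448, 448), np.uint8)
k15 = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 15))
k7 = cv2.getStructuringElement(cv2.MORPH_RECT, (7, 7))
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, k15)
mask = cv2.dilate(mask, k7)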
The last part of the background segmentation was to remove all significant
indentations in the segmented mask, i.e. to cut off any fields created by dilation
and closing where a narrow element cuts into the mask. Segmenting the
background mask in this way allows the algorithm to distinguish the dark
values of the image background from the bones, which are also imaged with
low values.
In the soft tissue segmentation part, the first step was bilateral filtering with a
7 × 7 mask to remove noise and to enhance the contrast between soft tissues
and other tissues.
The next step was thresholding. The threshold was different for each
sequence: for the T1-weighted series, the threshold was 0.1; for the T2-weighted
series, it was 0.2 (true for values greater than the threshold). The different thresh-
olding values for the two sequences are due to the different representation of soft
tissues, particularly the white and gray matter of the brain and the fluids sur-
rounding it.
For the T2-weighted series, the threshold is twice as large because the key
issue was the segmentation of the outer parts of the brain and other soft tissues
that bordered other tissue types. These boundaries in the T1-weighted series
were darker, making their segmentation from the T1-weighted series impossible
because it would compromise the coherence of bone tissue segmentation. The
lower segmentation threshold of the T1-weighted series was due to the softer
contrast at the tissue boundaries and the absence of elements that could interfere
with the segmentation of the bone tissues.
The soft tissue binary masks created in this way were de-noised by removing
objects and holes in the image that were smaller than 15 pixels, thus preventing
formation of small artifacts outside the mask area and within the mask bound-
aries, but not affecting the coherence of further segmentation.
The final part of the soft tissue segmentation was to combine the T1-weighted
and T2-weighted sequences. The result of this operation is shown in Fig. 4. The
merging of the sequences involved a logical sum operation.
The soft tissue binary masks prepared in this way allow the algorithm to
segment bone tissue from the data more accurately. The masks take into account
brain tissue, cerebrospinal fluid, fatty tissue, connective tissue, other fluids
(such as blood), and pathologies (such as tumors and abscesses).
For this purpose, a bicubic interpolation method was used to interpolate the
bone and soft tissue masks to 448 cross sections in the transverse plane. An
example spatial model of the bone mask is shown in Fig. 6. The models thus
prepared were used for the next step of cephalometric image generation.
Using the bone and soft tissue models, the mask values were projected onto
planes. During the projection, for each pixel of the selected plane, the number
of mask voxels lying along the direction perpendicular to that plane was
counted. This operation simulates the density and thickness of the objects,
parameters that have a major influence on the appearance of the image in
cephalometric imaging. The counts from the bone and soft tissue masks were
added together with weights: 1 for bone and 0.25 for soft tissue. The values
thus calculated were normalized to 1. The results, i.e. the generated projections
in three different planes, are shown in Fig. 7.
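A minimal sketch of the weighted projection (assumed implementation; the projection axis below is illustrative):

import numpy as np

# Count mask voxels along the axis perpendicular to the chosen plane, weight
# bone 1.0 and soft tissue 0.25, and normalise to [0, 1], as described above.
bone = np.zeros((448, 448, 448), dtype=bool)   # stand-in interpolated masks
soft = np.zeros((448, 448, 448), dtype=bool)
proj = 1.0 * bone.sum(axis=2) + 0.25 * soft.sum(axis=2)
if proj.max() > 0:
    proj /= proj.max()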
5 Results
By running the algorithm, a cephalometric-like image was generated. For easier
and more accurate comparison with the real cephalometric image (Fig. 8(a)),
the generated image (Fig. 8(b)) was cropped and rotated. These images were
not obtained from the same individual due to the difficulty of obtaining such
data. Nevertheless, even an illustrative comparison of the two images will give
an idea of the differences and the possible inaccuracies present in the image
generated by the proposed algorithm.
Fig. 8. Comparison of the original cephalometric image (a) with the generated (b)
Leaving aside differences caused by the nature of the examinations from
which the images were obtained, the differences are most pronounced in the
craniofacial portion: in the area of the nasal septa and sinuses, as well as at the
esophageal site. Magnetic resonance imaging differs from computed tomography
in tissue representation: in CT, bone tissue is represented as bright elements
while air and background are dark; in MRI, bone tissue, air, and background
are all represented as dark elements. The masks were then interpolated in the
transverse plane to match the size of the frontal and sagittal planes.
However, due to the high degree of interpolation (from 105 cross-sections to 448),
distortion is present in areas of high variability. In the region of the cranial vault,
the frontal and parietal bones have a drastically smaller area in each successive
transverse cross-section, which causes visible brightness biases due to inaccurate,
non-smooth interpolation between the known values in the image. This effect
could be removed by using a larger series of MRI images covering the same
area as the present data.
6 Discussion
The difference between the original and the created image is primarily due to
the position of the body during examination: for cephalometry, it is a standing
position; for MRI, it is a recumbent position. The biggest changes caused by this
are manifested as a different position of the skull and cranium in relation to the
cervical vertebrae.
The developed methodology, which reconstructs a cephalometric-like image,
is an initial step towards imaging that is harmless to the patient, in place of
methods that use ionizing radiation, such as cephalometry or pantomograms.
Despite the limitations due to inaccurate air segmentation, which distorts parts
of the output image generated by the proposed algorithm, the result significantly
resembles cephalometric imaging. The distortions do not affect most areas, such
as the cervical vertebrae, the skull, and the dentition, all of which are frequently
used by dentists and orthodontists during treatment; therefore, the developed
methodology could be utilized. If the number of sections comprising the magnetic
resonance imaging sequences were increased, the mapping would gain in accuracy
and legibility, reducing the impact of interpolation, which noticeably
compromises image integrity.
The constructed method fundamentally depicts the bone structure of the
human cerebrocranial and craniofacial regions and the cervical vertebrae. Despite
its low resolution (which could be increased by using more cross-sections), it can
serve as an initial screening and diagnostic test that provides a rough model of a
cephalometric image and could therefore be used to determine the need for
imaging with ionizing radiation.
The problem with clear sinus imaging is not necessarily just a disturbance
of the cephalometric image. When sinusitis is present, the sinuses are imaged as
brighter than normal in magnetic resonance imaging. This is due to the presence
of sinus-filling secretions that are a consequence of infection or chronic allergies.
Thus, the sinuses would be darker on the generated image, suggesting an exist-
ing disease or other pathology. Such an examination could serve as a screening
procedure that suggests the possible need for further diagnostic tests.
In conclusion, the developed algorithm, despite the partial inaccuracies, can
serve as preliminary and approximate imaging that would determine whether
further steps in the diagnostic process should be taken.
Acknowledgement. This work was financed by the AGH University of Science and
Technology thanks to the Rector’s Grant 18/GRANT/2022.
This work was co-financed by the AGH University of Science and Technology,
Faculty of EAIIB, KBIB no. 16.16.120.773.
Work carried out within the grant Studenckie Kola tworza innowacje - II edition,
project no. SKN/SP/535131/2022 entitled “Cephalometric image reconstruction based
on magnetic resonance imaging”.
References
1. Claus, E.B., Calvocoressi, L., Bondy, M.L., Schildkraut, J.M., Wiemels, J.L., Wren-
sch, M.: Dental X-rays and risk of meningioma. Cancer 118(18), 4530–4537 (2012)
2. Cung, W., et al.: Cephalometry in adults and children with neurofibromatosis type
1: implications for the pathogenesis of sphenoid wing dysplasia and the NF1 facies.
Eur. J. Med. Genet. 58(11), 584–590 (2015)
3. Dogdas, B., Shattuck, D.W., Leahy, R.M.: Segmentation of skull and scalp in 3-D
human MRI using mathematical morphology. Hum. Brain Mapp. 26(4), 273–285
(2005)
4. Domeshek, L.F., Mukundan, S., Yoshizumi, T., Marcus, J.R.: Increasing concern
regarding computed tomography irradiation in craniofacial surgery. Plast. Recon-
str. Surg. 123(4), 1313–1320 (2009)
5. Eley, K.A., Delso, G.: Automated 3D MRI rendering of the craniofacial skeleton:
using ZTE to drive the segmentation of black bone and FIESTA-C images. Neu-
roradiology 63(1), 91–98 (2020). https://doi.org/10.1007/s00234-020-02508-7
6. Hwang, S.Y., Choi, E.S., Kim, Y.S., Gim, B.E., Ha, M., Kim, H.Y.: Health
effects from exposure to dental diagnostic X-ray. Environ. Health Toxicol. 33(4),
e2018017 (2018)
7. Krille, L., et al.: Risk of cancer incidence before the age of 15 years after exposure
to ionising radiation from computed tomography: results from a German cohort
study. Radiat. Environ. Biophys. 54(1), 1–12 (2015)
8. Maillie, H.D., Gilda, J.E.: Radiation-induced cancer risk in radiographic cephalom-
etry. Oral Surg. Oral Med. Oral Pathol. 75(5), 631–637 (1993)
9. Pflugbeil, S., Pflugbeil, C., Schmitz-Feuerhake, I.: Risk estimates for meningiomas
and other late effects after diagnostic X-ray exposure of the skull. Radiat. Prot.
Dosimetry 147(1–2), 305–309 (2011)
10. Smith-Bindman, R., et al.: Radiation dose associated with common computed
tomography examinations and the associated lifetime attributable risk of cancer.
Arch. Intern. Med. 169(22), 2078–2086 (2009)
11. Zhang, R., et al.: Bone-selective MRI as a nonradiative alternative to CT for cran-
iofacial imaging. Acad. Radiol. 27(11), 1515–1522 (2020)
Analysis of Changes in Corneal Structure
During Intraocular Pressure
Measurement by Air-Puff Method
Magdalena Jedzierowska¹(B), Robert Koprowski¹, and Sławomir Wilczyński²
¹ Institute of Biomedical Engineering, Faculty of Science and Technology,
University of Silesia in Katowice, ul. Będzińska 39, 41-200 Sosnowiec, Poland
{magdalena.jedzierowska,robert.koprowski}@us.edu.pl
² Department of Basic Biomedical Science, School of Pharmacy with the Division
of Laboratory Medicine in Sosnowiec, Medical University of Silesia in Katowice,
Kasztanowa Street 3, 41-200 Sosnowiec, Poland
swilczynski@sum.edu.pl
1 Introduction
A dynamic process of corneal deformation occurs when measuring intraocular
pressure (IOP) by the air-puff method. The cornea deforms inwards and then
returns to its original shape. This corneal behaviour is influenced by many fac-
tors, including its thickness and mechanical properties of its structure, as well
as intraocular pressure [11]. Currently, using corneal visualization Scheimpflug
technology (Corvis ST tonometer), it is possible to obtain a number of biome-
chanical parameters of the cornea, helpful, among others, in the diagnosis of
keratoconus [5]. As indicated in the papers [1,8,13,14], during deformation the
cornea performs characteristic vibrations.
Fig. 1. Sample images of corneal cross-sections from the Corvis ST tonometer with the
highlighted characteristic areas of the corneal structure
In this study, the entire sequence of 140 corneal images from the Corvis ST
tonometer was analysed in order to find relationships in the observed changes
mentioned above. The study concerns not the texture but the structure of
the cornea; therefore, in the following sections, the corneal structure and its
displacement visible during IOP measurement with a non-contact tonometer
are analysed. The details contained in the texture of the analysed area were not
investigated; they will be the subject of future papers. Thus, the authors decided
not to use tools for analysing the texture of medical images, such as grey level
co-occurrence matrix (GLCM) methods [18], Laws' Texture Energy Method [4],
wavelet texture analysis [15] or statistical approaches [3]; only correlation was
used as a component of the algorithm.
2 Materials
3 Methods
The proposed method for analysing the corneal structure, which allows fully
automatic tracking of changes visible in the images of corneal cross-sections
during the intraocular pressure examination, has been divided into two key stages:
1. Extraction of corneal cross-sections from the image sequence.
2. Analysis of corneal structure changes in consecutive images of the sequence
(over time).
Extraction of the corneal image from individual frames involved image
pre-processing and outer corneal edge detection, for which the authors'
method presented in [9] was used. Then, to extract the cornea, its thickness was
assumed to be constant, which, as indicated later in the paper, is possible only
under specific conditions and introduces certain limitations to the entire algorithm.
The second stage consisted in analysing corneal structure changes during the
intraocular pressure examination. For this purpose, specific areas of the cornea
were followed through 140 consecutive images. Tracking was based on finding the
areas with the highest correlation to the originally selected fragment. The individual
stages of the proposed image analysis are described in detail in the following
subsections. The algorithm was written in MATLAB® ver. 9.0.0.341360 (R2016a)
using the Image Processing Toolbox (ver. 9.4).
where: $t(m, n)$ – threshold value for a pixel with coordinates $(m, n)$, $m \in (1, 200)$,
$n \in (1, 576)$; $\mu(m, n)$ – mean brightness value for a given window; $\sigma(m, n)$ – standard
deviation for a given window; $k$ – constant, $k > 0$, selected experimentally
($k = 0.25$); $R$ – maximum value of the standard deviation.
The constant $k$ (Eq. (1)) was selected experimentally based on the contrast
and size of the analysed objects. Then, based on the image resolution and
anthropometric data showing a mean corneal thickness of approx. 500 µm, the
corneal thickness was assumed constant and equal to 20 pixels. This assumption
holds only under certain conditions, e.g. the absence of diseases or conditions
affecting corneal thickness, such as keratoconus, in the test subjects; it also
requires pre-selection of subjects taking into account their individual variability.
The symbol $i$, where $i \in (1, 139)$, was adopted for marking consecutive images
in the sequence.
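The threshold equation itself is not reproduced here, but the listed symbols (local mean μ, local standard deviation σ, the constant k, and R as the maximum standard deviation) match Sauvola's local thresholding rule; assuming that is the intended formula, a minimal sketch is:

import numpy as np
from scipy.ndimage import uniform_filter

def local_threshold(img, win=15, k=0.25, R=None):
    # t(m, n) = mu(m, n) * (1 + k * (sigma(m, n) / R - 1)) -- Sauvola's rule,
    # which the where-clause above appears to describe; `win` is a hypothetical
    # window size (the paper's value is not visible in this copy).
    img = img.astype(float)
    mean = uniform_filter(img, win)
    sq_mean = uniform_filter(img ** 2, win)
    sigma = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))
    R = sigma.max() if R is None else R
    return mean * (1.0 + k * (sigma / R - 1.0))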
Knowing the position of the detected outer edge $L_k^{SP}(n)$ for each $i$-th image
in the sequence, and using the adopted assumption of constant corneal thickness,
successive images of the corneal cross-sections $L_{COR}(m, n)$, where $m \in (1, 20)$,
were extracted. Based on this operation, the image $L_{COR}(m_i, n_i)$ of size
2800 × 576 pixels was obtained, where the subscript $i$ refers to the successive
images in the sequence, taking values from 1 to 139. The image $L_{COR}(m_i, n_i)$
is shown in Fig. 2.
Fig. 2. a) Diagram showing the method for extracting a single corneal cross-section
LCOR (m, n) from the selected 2D image. b) Newly created image LCOR (mi , ni ) con-
sisting of the consecutive 140 corneal images. An exemplary area LK (mi , ki ) for which
the area with the highest correlation LK (mi+1 , j) was searched for is marked with a
red rectangle
In this part of the study, individual corneal cross-sections were analysed in terms
of changes in their structure during the IOP examination. To this end, in the
consecutive 140 corneal images $L_{COR}(m, n)$, areas with the greatest correspondence,
understood here as the highest correlation between two selected areas of the
sequence, were searched for. Each two consecutive corneal cross-sections,
arranged one after the other in the $L_{COR}(m_i, n_i)$ image (see Fig. 2b), were
analysed as follows. Starting from the first image $L_{COR}(m_1, n_i)$, the area
$L_K(m_i, k_i)$ was determined (an example is shown in Fig. 2b) with a constant
length of 64 pixels, where $k_i \in (n_i, n_i + 63)$ and $n_i \in (1, 576 - 63)$; then the
corresponding area $L_K(m_{i+1}, j)$ was searched for in the next cross-section
$L_{COR}(m_{i+1}, n_i)$, allowing for a shift left or right by 50% of the length, i.e.
$j \in (k_i - 32, k_i + 32)$. The correlation $r_i(k_i, j)$ was then determined for all
resulting pairs of areas, i.e. between $L_K(m_i, k_i)$ and $L_K(m_{i+1}, j)$. The area
with the greatest similarity was the one with the highest correlation value, i.e.
$r_i(k_i) = \max_j r_i(k_i, j)$.
$$L_{KS}(m_i) = \frac{1}{K_i} \sum_{k_i=1}^{K_i} L_K(m_i, k_i) \qquad (2)$$

$$L_{KS}(m_{i+1}) = \frac{1}{J} \sum_{j=1}^{J} L_K(m_{i+1}, j) \qquad (3)$$

$$r_i(k_i) = \max_j \frac{\sum\limits_{k_i=1}^{K_i} \sum\limits_{m_i=1}^{M_i} \bigl[L_K(m_i, k_i) - L_{KS}(m_i)\bigr]\bigl[L_K(m_{i+1}, j) - L_{KS}(m_{i+1})\bigr]}{\sqrt{\sum\limits_{k_i=1}^{K_i} \sum\limits_{m_i=1}^{M_i} \bigl[L_K(m_i, k_i) - L_{KS}(m_i)\bigr]^2 \sum\limits_{k_i=1}^{K_i} \sum\limits_{m_i=1}^{M_i} \bigl[L_K(m_{i+1}, j) - L_{KS}(m_{i+1})\bigr]^2}} \qquad (4)$$

where: $L_{KS}(m_i)$ – mean value of each image row $L_K(m_i, k_i)$, where $k_i \in (n_i, n_i + 63)$ and $n_i \in (1, 576 - 63)$; $L_{KS}(m_{i+1})$ – mean value of each image row $L_K(m_{i+1}, j)$, where $j \in (k_i - 32, k_i + 32)$. The above steps were repeated from $i = 1$ to $i = 139 - 1$ for all possible areas of length 64 pixels.
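The paper's algorithm was written in MATLAB; purely for illustration, a minimal Python sketch of one correlation-maximizing step over a pair of consecutive cross-sections (arrays of shape 20 × 576) might look as follows:

import numpy as np

def track_step(cor_i, cor_next, k, width=64, search=32):
    # One tracking step, implementing Eq. (4) directly: find, in the next
    # cross-section, the start column of the 64-pixel-wide area most
    # correlated (row means removed, as in Eqs. (2)-(3)) with the area
    # starting at column k of the current cross-section.
    patch = cor_i[:, k:k + width]
    a = patch - patch.mean(axis=1, keepdims=True)
    best_j, best_r = k, -np.inf
    lo, hi = max(0, k - search), min(cor_next.shape[1] - width, k + search)
    for j in range(lo, hi + 1):
        cand = cor_next[:, j:j + width]
        b = cand - cand.mean(axis=1, keepdims=True)
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        r = (a * b).sum() / denom if denom > 0 else 0.0
        if r > best_r:
            best_j, best_r = j, r
    return best_j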
These operations produced the matrix $L_s(l, z)$, where $l \in (1, 139 - 1)$ and
$z \in (1, N - 63)$ with $N = 576$, in which the 'positions' were recorded, i.e. the
$k_i$ of successive areas with the maximum correlation. A fragment of the
$L_s(l, z)$ matrix is shown in Fig. 3. Note that the $l$ coordinate of $L_s(l, z)$
indexes the consecutive corneal cross-sections starting from $i = 2$, i.e. $i = l + 1$.
The values of the $L_s(l, z)$ matrix were then tracked (over time) to determine
the position of the next most similar (in terms of correlation) area.
The tracking method is shown schematically in Fig. 3 by means of arrows,
which indicate the highest correlation value in the consecutive corneal
cross-sections. For the first fragment $L_K(m_1, k_1)$, i.e. for $i = 2$, the fragment
$L_K(m_2, L_s(1, 1))$ was obtained, where $L_s(1, 1) = 1$; then for $L_K(m_2, 1)$ the
procedure was repeated in the next cross-section.
Fig. 3. Fragment of the matrix $L_s(l, z)$, where $l \in (1, 139 - 1)$, $z \in (1, 576 - 63)$, in
which the 'positions' were recorded, i.e. the $k_i$ of the consecutive areas with the
maximum correlation. The $l$ coordinate denotes successive corneal cross-sections
starting from $i + 1$, whereas the $z$ coordinate refers to the successive extractable
areas $L_K(m_i, k_i)$. Red arrows indicate the method of tracking the area $L_K(m_1, k_1)$,
blue arrows the method of tracking the area $L_K(m_1, k_2)$
4 Results
Two parameters, determined from the obtained displacement values, constituted
the basis for further analysis of corneal structure changes during the intraocular
pressure examination. For each of the 20 subjects, the following values were
determined:
– the absolute displacement between the end and start of each plot of displacement
change, hereinafter denoted |Δn| (see Fig. 5a),
– the maximum deviation from the line defined by the two points describing the
position of a given area at the beginning and end of the IOP test, hereinafter
denoted d (see Fig. 5b).
Fig. 4. Sample graph of tracking changes in the position of selected areas of the cornea
during IOP examination. The values ki , which correspond to the horizontal position
of a given fragment, are marked on the horizontal axis, whereas successive i values,
corresponding to consecutive corneal cross-sections recorded during the test, where
i ∈ (1, 139), are marked on the vertical axis
Fig. 5. Schematic diagram showing the method for determining: a) parameter |Δn|,
i.e. the value of the absolute displacement between the end and beginning of the dis-
placement change plot, b) parameter d, i.e. the value of the maximum deviation from
the line defined by two points describing the location of a given area at the beginning
and end of IOP examination
Table 1. Absolute displacement between the end and beginning of each plot of dis-
placement change (|Δn|) for the selected ki values for 20 subjects
Subject no. | ki=1 | ki=65 | ki=129 | ki=193 | ki=257 | ki=321 | ki=385 | ki=449 | ki=513
1 2 25 34 69 42 31 33 2 65
2 34 0 8 32 35 29 1 2 74
3 13 14 10 79 20 84 4 1 4
4 2 17 71 89 25 4 88 62 59
5 55 11 73 63 1 46 41 23 8
6 47 10 60 36 28 61 31 24 52
7 32 32 37 101 49 29 108 5 15
8 9 16 28 48 17 81 40 20 8
9 16 9 51 117 4 82 60 24 0
10 4 9 107 43 21 2 21 1 4
11 2 6 25 83 16 5 57 12 4
12 3 11 14 92 26 6 14 8 0
13 2 25 78 54 10 74 13 17 79
14 0 7 34 107 31 36 101 20 78
15 0 15 19 103 57 8 12 5 25
16 5 31 21 69 5 50 35 18 45
17 0 2 3 13 23 27 6 9 6
18 29 1 13 82 18 45 5 1 0
19 7 38 65 75 11 28 37 15 0
20 5 6 42 23 41 1 58 16 59
Mean 13.35 14.25 39.65 68.90 24.00 36.45 38.25 14.25 29.25
Std 16.33 10.52 27.46 28.68 14.74 27.57 31.24 13.62 29.72
Table 2. Maximum deviation from the line defined by two points describing the posi-
tion of a given area at the beginning and end of IOP examination (d) for selected ki
values for 20 subjects
Subject no. | ki=1 | ki=65 | ki=129 | ki=193 | ki=257 | ki=321 | ki=385 | ki=449 | ki=513
1 2.90 13.73 20.17 14.76 36.98 37.67 17.93 16.61 33.54
2 21.90 15.00 8.04 22.83 55.19 75.16 4.51 35.87 27.41
3 8.80 14.43 15.32 30.58 31.20 47.13 7.88 4.85 1.93
4 1.00 15.15 25.01 34.35 43.95 60.76 59.05 33.63 25.52
5 24.72 45.28 54.47 83.93 75.80 63.25 22.42 39.13 3.37
6 36.02 20.89 19.90 95.29 63.82 53.21 15.72 20.18 24.19
7 14.01 19.86 19.94 51.82 30.85 29.29 42.18 4.69 7.24
8 6.90 15.81 17.97 48.66 35.30 35.77 18.12 10.59 13.86
9 16.47 12.95 19.86 43.69 8.66 68.82 24.60 13.11 0.00
10 3.16 23.15 44.04 18.93 24.99 3.13 8.12 1.83 1.91
11 4.56 6.73 12.54 38.23 26.42 57.58 20.96 5.20 10.69
12 9.15 3.55 15.33 44.65 23.93 9.34 9.76 5.93 3.00
13 13.68 16.13 22.18 51.18 28.09 29.20 10.57 14.64 37.01
14 4.00 5.33 32.89 35.16 30.75 68.28 73.40 8.63 49.32
15 10.00 13.16 17.88 35.22 47.66 41.92 12.39 7.97 12.02
16 11.38 10.00 13.30 57.69 39.70 40.55 13.52 31.47 27.76
17 3.00 5.77 16.97 11.43 13.97 24.51 7.12 9.02 4.86
18 14.14 12.35 23.24 35.85 32.33 42.97 3.99 1.51 0.00
19 6.47 14.87 29.50 30.01 34.54 16.21 12.03 6.38 4.00
20 7.66 12.94 13.14 56.39 33.86 21.80 15.40 28.33 28.28
Mean 11.00 14.86 22.08 42.03 35.90 41.33 19.98 14.98 15.80
Std 8.45 8.57 10.79 20.42 15.32 20.05 17.68 11.85 14.26
early elastic and viscoelastic properties [6,17,20]. In addition, there are visible
displacements of areas in the central part of the cornea, which after the moment
of maximum bending take an oscillating character (in Fig. 4 it is visible around
i = 100, i.e. about 22 ms).
The absolute displacement between the end and beginning of each plot of
displacement change (|Δn|) shows large discrepancies among individual subjects:
the standard deviations of the mean |Δn| values exceed at least half of the
means, which indicates high data variability. Nevertheless, a consistent
observation for all subjects is an increase in |Δn| in the middle part of the
cornea and at 1/4 and 3/4 of its width (see Fig. 6a), i.e. in the areas subject
to the greatest deformation during IOP measurement by the air-puff method.
Fig. 6. a) Graph showing the mean values of the absolute displacement (|Δn|) for
the selected ki values, which correspond to the horizontal position of the fragment. b)
Graph of the mean values of the maximum deviation (d) for the selected ki values
The other parameter (d), the maximum deviation from the line defined by
the two points describing the position of a given area at the beginning and end
of the IOP examination, increased for ki = 193, ki = 257 and ki = 321 in all
test subjects (see Fig. 6b). An increase of d in the central part of the cornea is
also visible in Fig. 6b, which presents the mean values of d. This corresponds
to the corneal changes observed during the IOP test with the air-puff method:
the greatest changes, in the form of vibrations [1,14], are observed from 1/4 to
3/4 of the corneal width, which, according to the obtained data, is where the
largest corneal structure changes occur. In addition, the greatest deviations in
the central part of the cornea indicate that the corneal structure becomes denser
on the sides – the tracked areas "shift to the side" when the cornea deforms the
most. It can be assumed that the visible displacements are related to stretching
and crimping of the corneal collagen fibres under elastic deformation.
The parameter d, determined on the basis of the obtained displacements, is characterized by lower standard deviation values than the parameter |Δn|. Therefore, based on the preliminary analysis of the corneal structure changes presented in this paper, it seems to be the more appropriate parameter for the assessment of the discussed issue. At the present stage, the obtained results have been submitted to ophthalmologists for verification under clinical conditions. A sketch of how |Δn| and d can be computed is given below.
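As an illustration of the two parameters, the following minimal sketch computes |Δn| and d for a single displacement trace; the trace itself is a hypothetical stand-in for the tracked-area displacements, not the authors' data.

```python
import numpy as np

def delta_n(trace):
    """|Δn|: absolute displacement between the last and first position
    of a tracked area over the examination."""
    return abs(trace[-1] - trace[0])

def max_deviation(trace):
    """d: maximum deviation of the trace from the straight line joining
    the positions at the beginning and end of the examination."""
    n = len(trace)
    # line through the first and last samples, evaluated at every frame index
    line = trace[0] + (trace[-1] - trace[0]) * np.arange(n) / (n - 1)
    return np.max(np.abs(trace - line))

# hypothetical displacement of one tracked corneal area (one value per frame)
trace = 30 * np.sin(np.linspace(0, np.pi, 140)) + np.linspace(0, 5, 140)
print(f"|Δn| = {delta_n(trace):.2f}, d = {max_deviation(trace):.2f}")
```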
5 Discussion
The study analysed changes in the corneal structure during the intraocular pres-
sure examination using the air-puff method. The analysed images were obtained
from the Corvis ST tonometer equipped with a Scheimpflug camera, which
records corneal deformation and its morphological changes under the influence
of an air puff within approximately 30 ms [10,21]. Changes resulting from the
acting force in the form of an air stream are described in biomechanics as a fast
loading and unloading process. The changes in the displacement of the corneal
structures observed in the conducted research can be described as propagating
waves, which is also visible to the naked eye when observing the images from the
IOP examination. The characteristic corneal structures move during dynamic
deformation in an asymmetrical manner (as shown in Fig. 4), which is reflected in the asymmetry observed in the dynamic response of the cornea during the loading and unloading process [2]. The studies conducted by the authors
are not devoid of a number of limitations that may have affected the obtained
results. Some of them include:
1. Introduction of an assumption about the constant corneal thickness. The
lack of reference to the actual corneal thickness, which in a healthy subject
is thinner in the middle and thicker on the sides, can potentially lead to
misinterpretation of the obtained results. Therefore, in further studies, the
authors intend to combine the measurement of the central corneal thickness
with the analysis of its structure.
2. The proposed algorithm for tracking specific areas is based on the correlation parameter; thus, the authors do not refer to the analysis of the corneal texture.
3. A small research group – 20 healthy subjects.
To the authors’ knowledge, the issue of corneal structure analysis, understood
as tracking its specific areas during the air-puff examination, has not yet been
described in the literature.
6 Summary
This paper presents the authors’ method for analysing corneal structure changes
during IOP measurements by using an air puff. The research was carried out
on data obtained from a Corvis ST tonometer equipped with a Scheimpflug
camera. The obtained results allow for real-time tracking of specific areas of
the cornea, the displacements of which, according to the preliminary results,
are characterized by asymmetry. In subsequent studies, the authors intend to compare the results obtained for the group of healthy subjects with those for patients with diseased corneas. This will enable assessment of the usefulness of the extracted parameters, i.e. the absolute displacement |Δn| and the maximum deviation d, in the classification of eye diseases. The preliminary test results have been
submitted to ophthalmologists for verification in terms of their usefulness under
clinical conditions. To sum up, the preliminary analysis of corneal structure
changes carried out in this paper seems to be one of the possible ways to broaden
the knowledge about changes in the corneal morphology during its dynamic
deformation. Thus, it may allow for a more accurate assessment of the corneal
biomechanics, which plays an increasingly important role in the diagnosis of
corneal diseases [5,19] and in the screening of patients qualified for refractive
surgery [7,16]. However, further research is necessary to systematize and confirm
the obtained results.
References
1. Boszczyk, A., et al.: Non-contact tonometry using Corvis ST: analysis of corneal vibrations and their relation with intraocular pressure. J. Opt. Soc. Am. A 36(4), B28–B34 (2019). https://doi.org/10.1364/JOSAA.36.000B28
2. Boszczyk, A., et al.: Novel method of measuring corneal viscoelasticity using the
Corvis ST Tonometer. J. Clin. Med. 11, 261 (2022)
3. Corrias, G., et al.: Texture analysis imaging ’what a clinical radiologist needs to
know’. Eur. J. Radiol. 146, 110055 (2022). https://doi.org/10.1016/j.ejrad.2021.
110055
4. Dash, S., Jena, U.R.: Multi-resolution Laws’ Masks based texture classification. J.
Appl. Res. Technol. 15(6), 571–582 (2017). https://doi.org/10.1016/j.jart.2017.07.
005
5. Elham, R., et al.: Keratoconus diagnosis using Corvis ST measured biomechanical
parameters. J. Curr. Ophthalmol. 29, 175–181 (2017). https://doi.org/10.1016/j.
joco.2017.05.002
6. Eliasy, A., et al.: Determination of corneal biomechanical behavior in-vivo for
healthy eyes using CorVis ST tonometry: stress-strain index. Front. Bioeng.
Biotechnol. 7(May), 1–10 (2019). https://doi.org/10.3389/fbioe.2019.00105
7. Esporcatte, L.P.G., et al.: Biomechanical diagnostics of the cornea. Eye Vis. (Lon-
don, England). 7, 9 (2020). https://doi.org/10.1186/s40662-020-0174-x
8. Han, Z., et al.: Air puff induced corneal vibrations: theoretical simulations and
clinical observations. J. Refract. Surg. 30(3), 208–213 (2014). https://doi.org/10.
3928/1081597X-20140212-02
9. Jedzierowska, M., et al.: A new method for detecting the outer corneal contour in images from an ultra-fast Scheimpflug camera. Biomed. Eng. Online 18(1), 115 (2019). https://doi.org/10.1186/s12938-019-0735-1
10. Kling, S., Hafezi, F.: Corneal biomechanics - a review. Ophthalmic Physiol. Opt.
1–13 (2017). https://doi.org/10.1111/opo.12345
11. Kling, S., Marcos, S.: Contributing factors to corneal deformation in air puff mea-
surements. Invest. Ophthalmol. Vis. Sci. 54(7), 5078–85 (2013). https://doi.org/
10.1167/iovs.13-12509
12. Koprowski, R.: Image Analysis for Ophthalmological Diagnosis. Springer (2016).
https://doi.org/10.1007/978-3-319-29546-6
13. Koprowski, R., Ambrósio, R.: Quantitative assessment of corneal vibrations dur-
ing intraocular pressure measurement with the air-puff method in patients with
keratoconus. Comput. Biol. Med. 66, 170–178 (2015). https://doi.org/10.1016/j.
compbiomed.2015.09.007
14. Koprowski, R., Wilczyński, S.: Corneal vibrations during intraocular pressure measurement with an air-puff method. J. Healthc. Eng. 2018, 5705749 (2018). https://doi.org/10.1155/2018/5705749
15. Kumar, I., et al.: Wavelet packet texture descriptors based four-class BIRADS
breast tissue density classification. Procedia Comput. Sci. 70, 76–84 (2015).
https://doi.org/10.1016/j.procs.2015.10.042
16. Ortiz, D., et al.: Corneal biomechanical properties in normal, post-laser in situ
keratomileusis, and keratoconic eyes. J. Cataract Refract. Surg. 33(8), 1371–1375
(2007). https://doi.org/10.1016/j.jcrs.2007.04.021
17. Qin, X., et al.: Evaluation of corneal elastic modulus based on Corneal Visualization
Scheimpflug Technology. Biomed. Eng. Online. 1–16 (2019). https://doi.org/10.
1186/s12938-019-0662-1
18. Shin, Y.G., et al.: Histogram and gray level co-occurrence matrix on gray-scale
ultrasound images for diagnosing lymphocytic thyroiditis. Comput. Biol. Med. 75,
257–266 (2016). https://doi.org/10.1016/j.compbiomed.2016.06.014
19. Vinciguerra, R., et al.: Detection of Keratoconus with a new biomechanical index.
J. Refract. Surg. 32(12), 803–810 (2016). https://doi.org/10.3928/1081597X-
20160629-01
20. Yazdi, A.A., et al.: Characterization of non-linear mechanical behavior of the
cornea. Sci. Rep. 10, 11549 (2020). https://doi.org/10.1038/s41598-020-68391-7
21. Zhang, D., et al.: Exploring the biomechanical properties of the human cornea in
vivo based on Corvis ST. Front. Bioeng. Biotechnol. 9(November), 1–10 (2021).
https://doi.org/10.3389/fbioe.2021.771763
Discrimination Between Stroke and Brain
Tumour in CT Images Based
on the Texture Analysis
Abstract. The brain is one of the most important organs in the human
body because it is its control centre, and any disease of the brain is a
real threat to human life. A brain tumour is a newly formed, foreign
structure, whose growth causes an increase in intracranial pressure. A
stroke is defined as a sudden neurological deficit caused by central ner-
vous system ischemia or haemorrhage. CT is a routine examination when
these diseases are suspected. However, the distinction between stroke and
tumour is not always straightforward, even for experienced radiologists.
There are solutions for detecting each disease separately, but there is no system that supports distinguishing between the two diseases. Therefore, the aim of this work is to develop a system that allows discrimination between a stroke and a brain tumour on CT images based on the analysis of texture features calculated for the region of interest marked by a radiologist. Feature selection was performed using the Fisher criterion and a convex hull approach. Finally, the selected features were classified using the Classification Learner application available in MATLAB R2021b. Classification methods based on classification trees, k-nearest neighbours (KNN), and support vector machines (SVM) gave the best results. It was demonstrated that classification accuracy reached over 95% for the analysed feature set, which is a promising result for semiautomatic discrimination between stroke and tumour in CT data.
1 Introduction
The brain is one of the most important organs in the human body because it
is its control centre. The human brain is responsible for functions necessary for
human survival, such as proper heartbeat, digesting meals, receiving sensory
impressions, learning and movement [12]. For this reason, any disease of the
brain is a real threat to human life. A brain tumour is a newly formed, foreign
proliferative process, and it often raises diagnostic doubts [22]. The problem of discriminating
between stroke and tumour is also discussed in [8] where gliomas are identified
as one of the factors that may mimic a stroke in CT, which may lead to mis-
diagnosis. To avoid this, a special protocol was developed which, in addition to
the CT examination, includes medical history of the patient, neurological symp-
toms, blood pressure, ECG, blood tests as well as other imaging methods like
chest X-ray. Also, in [18] other nonischemic conditions that mimic the presence of stroke were analysed. It was demonstrated that a multimodal CT protocol (including, besides standard CT, also contrast-enhanced CT, CT angiography and CT perfusion), owing to its high specificity in the diagnosis of stroke mimics, can support the clinician in deciding to avoid revascularization treatment in patients diagnosed with stroke mimics.
Thus, the distinction between stroke and tumour visualised in CT images is
not always straightforward, even for experienced radiologists (especially since the
other observed clinical symptoms of these two conditions may be similar). In [13]
a retrospective study of 224 patients with brain tumour demonstrated that 4.9% of them were initially misdiagnosed as stroke. The conclusion from this study is that CT is frequently employed for acute neurologic deficits to exclude intracranial haemorrhage, but it may not always be sufficient to exclude a brain tumour. For these reasons, there is a real need to develop a system that will
support the diagnosis of these two brain diseases.
There are several reports in the literature on the use of computer techniques
for the diagnosis of tumour and stroke separately.
Fahmi [4] developed a method for the automatic detection of brain tumours in computed tomography (CT) images. After thresholding, image features were determined, and then a “zoning” method was applied, which facilitates feature extraction by dividing the image into zones. As a result, feature vectors are obtained that can be used for classification. LVQ (Learning Vector Quantization) was used – a supervised machine learning technique built on a competitive layer. By combining the zoning algorithm with learning vector quantization, they classified brain CT images into healthy and diseased with an average accuracy of 85%.
Another example of brain tumour detection was presented by Nanthagopal and Rajamony [14]. Their method is based on regions of interest. Segmentation is performed using the Ncut method, which divides the image into smaller fragments while maintaining the continuity of the edges. Another approach is to use fuzzy c-means clustering, which assigns each sample to a cluster based on similarity; in this way, an abnormal region can be distinguished from a normal one. Then an analysis of the texture characteristics is performed. The considered features included the mean and variance of the pixel matrix, as well as the energy and entropy of the entire studied area. The features are determined with the use of the wavelet transform. Classification is done using a support vector machine (SVM), an example of supervised learning; this solution uses a non-linear fit. The greatest efficiency was achieved using a Gaussian RBF kernel.
Ramakrishnan [21] also presented how to classify brain CT images for brain tumour detection. The classification process proposed in this approach is carried out by a support vector machine (SVM) with various kernel functions and optimization techniques, followed by a segmentation process carried out by Modified Region Growing (MRG) with threshold optimization. Algorithms such as Harmony Search (HS), Evolutionary Programming (EP) and Gray Wolf Optimization (GWO) were used for the threshold optimization. The entire approach was implemented on the MATLAB platform. The experimental results showed that the proposed methodology (MRG-GWO) is characterized by high accuracy and improved tumour detection compared to the other two techniques, achieving an accuracy of 99.05%.
Qasem [19] used MRI scans, a watershed technique (which enabled feature extraction) and machine learning to detect brain tumours. They decided to use MRI scans instead of computed tomography images because the examination is less harmful. The applied segmentation ensured appropriate identification of regions (both foreground and background) with few calculations. It also solved the problem of wrong edges and distorted borders in images. The k-nearest neighbours algorithm was used for the classification.
Šušteršič [28] used active contour methods to evaluate the optimal trajectory of approach to a brain tumour. In this case, segmented CT images were used. For this purpose, a method based on active contour models was applied, considering the following parameters: the complexity of the initial conditions, the accuracy of recognition of the tumour surface, and the computational time. The disadvantages of this method are a strong dependence on the initial curve position and a high sensitivity to image noise.
One of the methods of automatic detection of stroke in CT images was developed by Chawla [1]; it is based on the detection of abnormalities between the left and right hemispheres of the brain. It uses image histograms, which differ significantly in the event of a stroke. The method ensures 90% accuracy and 100% sensitivity in detecting abnormalities at the patient level, while at the slice level it achieves an average precision of 91% and a sensitivity of 90%.
A different approach was proposed by Dourado [3], in which deep learning was applied to classify strokes from CT images. A convolutional neural network (CNN) was implemented to identify a healthy brain, ischemic stroke or haemorrhagic stroke. The designed CNN was combined with various consolidated machine learning methods such as the Bayesian classifier, multi-layer perceptron, k-nearest neighbour, random forest, and support vector machine.
Kalmutskiy [10] focused on the automatic recognition of strokes in non-contrast computed tomography brain images. The distinguishing features of this solution are the use of a data set with a very small number of images, and the fact that both traditional computer vision methods and a convolutional neural network were applied to solve the problem. Augmentation techniques and sub-image division were also implemented to enlarge the analysed dataset.
Nedel’ko [15] also presented a method that allows the detection of strokes in non-contrast CT. The technique used deep neural networks and classifiers based on image texture properties. They applied the basic U-net neural network architecture, and Haralick texture features extracted from the images were used with the following classifiers: SVM, AdaBoost, KNN and random forest. The method provides information that makes it possible to recognize contours and to evaluate the importance of texture features. The reported accuracy was 83%.
Ostrek [17] addressed the diagnosis of strokes based on computed tomography images. Eight non-stroke cases and thirty-two stroke cases were analyzed. The method was based on automatic CT image segmentation, multiscale analysis, expression and visualization of semantic maps, as well as automatic patch-based classification. The following results were obtained: accuracy 74.6%, sensitivity 64.2%, specificity 82.6%.
From the above literature review, it appears that there are solutions for detecting stroke and brain tumour separately, but no methods that distinguish between the two diseases. This may mean that the need to distinguish them has not been widely recognized in the neurology community. On the other hand, a team of neurologists and radiologists from the Department of Neurology and Stroke, Medical University of Lodz, identified the difficulty in discriminating between these two diseases as an important diagnostic problem. Therefore, this paper presents a system that allows a stroke to be distinguished from a brain tumour on CT images based on the analysis of texture features, which will support doctors in making a correct diagnosis. No additional medical imaging techniques (such as the accurate but expensive MRI, CTA or CTP) will be needed to identify these diseases, saving patients time and money. It has already been demonstrated that image texture reflects the structure and properties of visualized organs and tissues for almost all medical imaging modalities, e.g. ultrasound [2,16], MRI [5,24] or CT [14,15]. Thus, the texture analysis approach was also selected to solve the problem discussed in this research.
to 10936 (mean 2968) pixels and the ROIs size for brain tumour ranged from 247
to 4646 (mean 1896) pixels. Such ROI sizes ensure reliable estimation of texture
parameters.
Fig. 1. Sample images with marked ROIs: (a) stroke, (b) brain tumour
2.2 Methodology
A block diagram of the adopted methodology is shown in Fig. 2. The input of the algorithm is an image with a marked ROI that represents brain pathology. In the first step, 337 texture features of each ROI (analysis options: no ROI normalization, 8-bit grey level images) were computed automatically using the Qmazda program, with a batch file for loading the images and calculating the selected texture features. Qmazda is a software package for digital image analysis, including the computation of shape, colour, and texture attributes of regions of interest of any shape. The reduction of the number of bits per pixel to 8 (the originally acquired images are 12-bit coded) is motivated by the need to reduce noise in the collected texture information. A larger number of bits reflects the tissue structure in the image more faithfully; at the same time, however, the noise (always present in medical images) is emphasized. It blurs the texture and, as a result, modifies the texture parameter values. It has been demonstrated that a lower number of bits may improve texture classification accuracy [11,25]. The Qmazda package consists of four programs: MaZda, MzGengui, MzReport and MzMaps [26,27]. In this study, the MzReport program was used to select 10 features based on the texture analysis of the ROI. MzReport is a program for data analysis, selection of the most discriminative features, visualization of feature vector distributions, supervised machine learning and testing of the resulting classifiers [26]. The selection of the mentioned texture features
was done with three algorithms implemented in the MzReport program: Fisher coefficient-based discrimination, mutual information, and convex hull-based discrimination. Then, based on the selected features, data classification was performed using the Classification Learner application available in MATLAB R2021b. Manual feature reduction was performed before the classification to find the most suitable feature subsets (containing 2 or 3 texture parameters) and to avoid classifier overtraining. This is an acceptable number of dimensions when compared to the number of samples (83). It also guarantees a reasonable classifier size as well as reliable classification results. The automatic feature selection methods use general criteria (such as the Fisher coefficient or mutual information values) that do not always ensure the best feature subset selection, due to, e.g., possible feature correlation (the Fisher approach) or a non-optimal convex hull shape (caused by outliers). Thus, not only the features with the highest ranks generated by the feature selection methods were tested, but also subsets selected from the best 4–5 features were analyzed. Thanks to this approach, it was possible to obtain a better classification result (by approximately 2 percentage points). The Classification Learner trained models to classify the data into stroke and tumour cases. The following 32 classifiers were used: decision trees, linear and quadratic discriminant analysis, logistic regression, naive Bayes classifiers, support vector machines (with linear, quadratic, cubic, and Gaussian kernels), k-nearest neighbour classifiers (k = 1, 10, 100), ensemble classifiers (boosted trees, bagged trees, subspace discriminant, subspace KNN, RUSBoosted trees), multilayer perceptrons (with different structures), and kernel approximation classifiers. All experiments involving the three feature selection methods and various classifiers were conducted independently; thus, for each case, the same relation between the number of features and the number of training samples was preserved. The Fisher ranking step is sketched below.
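As a minimal sketch of the Fisher coefficient-based ranking mentioned above (not the MzReport implementation), the two-class Fisher coefficient F = (μ1 − μ2)² / (σ1² + σ2²) can be computed per feature and used to rank the feature set; the data shapes below are hypothetical stand-ins.

```python
import numpy as np

def fisher_coefficient(features, labels):
    """Two-class Fisher coefficient for every feature column:
    squared difference of class means over the sum of class variances."""
    f0, f1 = features[labels == 0], features[labels == 1]
    return (f0.mean(axis=0) - f1.mean(axis=0)) ** 2 / (f0.var(axis=0) + f1.var(axis=0))

# hypothetical stand-in: 83 samples (stroke = 0, tumour = 1) x 337 features
rng = np.random.default_rng(0)
X = rng.normal(size=(83, 337))
y = rng.integers(0, 2, size=83)

ranking = np.argsort(fisher_coefficient(X, y))[::-1]
print("10 highest-ranked feature indices:", ranking[:10])
```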
3 Results
Of the features chosen by the three feature selection methods (Fisher discrimination, mutual information, and convex hull), 90% were based on the Gray Level Co-occurrence Matrix (GLCM). The features were computed for four specified directions: 0° (horizontal), 45°, 90° (vertical), and 135°, and for inter-pixel distances in the range of 1 to 4. The remaining texture features were based on the histogram (including the mean value) and on the Grey-Level Run-Length Matrix (evaluated in the vertical direction).
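The GLCM setup described above can be reproduced, for instance, with scikit-image; the sketch below uses the four directions and the 1–4 inter-pixel distances, with a hypothetical 8-bit ROI in place of the real CT data (scikit-image here is only a stand-in for the Qmazda implementation).

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

# hypothetical 8-bit ROI cropped from a CT slice
rng = np.random.default_rng(1)
roi = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

# four directions (0, 45, 90, 135 degrees) and inter-pixel distances 1..4
glcm = graycomatrix(roi,
                    distances=[1, 2, 3, 4],
                    angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                    levels=256, symmetric=True, normed=True)

# a few classical GLCM-derived texture parameters (one value per distance/angle)
for prop in ("contrast", "correlation", "energy", "homogeneity"):
    print(prop, graycoprops(glcm, prop).round(3))
```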
For further analysis, manual feature reduction was performed on the obtained sets of 10 selected features. Its aim was to limit the number of features by discarding the correlated ones and those that did not improve the classification result. The resulting subsets contain 2 or 3 features. Such a low number of input features simplifies the classifier architecture and helps avoid overtraining. In each case, the best results were obtained for combinations of the features with the highest ranks assigned by the feature selection algorithms. For each selected subset of features, 32 different classifiers were applied with 5-fold cross-validation (a sketch of this protocol follows Table 1). For each of the 5 training experiments, similar quality measure values were observed, which means that the classifiers achieved good generalization on the investigated dataset, avoiding overfitting. The best classification results obtained for the feature subsets selected by the three algorithms are presented in Table 1 (the three best classifiers are listed for each feature selection method). The parameters in Table 1 were evaluated as the means of the values obtained in each experiment during the 5-fold cross-validation.
Table 1. Quality measure values for the best classifiers applied to the selected texture feature subsets (Acc – Accuracy, Sens – Sensitivity, Spec – Specificity)

Selection method (features)                       Classification method   Acc [%]   Sens [%]   Spec [%]
Fisher Discrimination (FD)                        Weighted KNN            96.4      92.9       100.0
(HistPerc99, ArmTeta2, GradMean)                  Fine KNN                95.2      94.9       95.5
                                                  Fine Gaussian SVM       94.0      94.7       93.3
Mutual Information (MI)                           Bagged Trees            90.4      89.7       90.9
(glcmZ4InvDfMom, glcmV1AngScMom, grlmVLngREmph)   Weighted KNN            88.0      80.9       97.2
                                                  Fine Gaussian SVM       86.7      80.9       92.3
Convex Hull (CH)                                  Fine Tree               95.2      90.7       100.0
(glcmH2SumVarnc, HistMean)                        Weighted KNN            94.0      92.5       95.3
                                                  Bagged Trees            94.0      90.5       97.6
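Outside MATLAB, the same evaluation protocol can be approximated as follows; the sketch runs 5-fold cross-validation for three of the classifier families listed in Table 1 using scikit-learn. The feature matrix, labels and classifier settings are hypothetical stand-ins for the Classification Learner models.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# hypothetical stand-in: 83 cases, a 3-feature subset; 0 = stroke, 1 = tumour
rng = np.random.default_rng(2)
X = rng.normal(size=(83, 3))
y = rng.integers(0, 2, size=83)

classifiers = {
    "weighted 10-NN": KNeighborsClassifier(n_neighbors=10, weights="distance"),
    "fine tree": DecisionTreeClassifier(),
    "Gaussian SVM": SVC(kernel="rbf"),
}
for name, clf in classifiers.items():
    scores = cross_val_score(make_pipeline(StandardScaler(), clf), X, y, cv=5)
    print(f"{name}: mean 5-fold accuracy = {scores.mean():.3f}")
```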
As shown in Table 1, the features selected by all three methods can be used to develop a system with high classification accuracy, from around 86% up to over 96%. The best results were achieved for the Fisher discrimination (FD) and convex hull (CH) selection methods, while the weakest were obtained when using mutual information. It is worth noting that the best results for FD-based selection were achieved using three features, while for CH-based selection only two were needed. It can also be observed that, among all the classification methods, the most effective were those based on classification trees, KNN (k = 10), and SVM with a Gaussian kernel. Sample feature distributions and confusion matrices (generated by the MATLAB Classification Learner application) are shown in Figs. 3, 4 and 5.
Fig. 3. (a) Distribution of two FD-based texture features and (b) confusion matrix of
the weighted 10-NN classifier
Fig. 4. (a) Distribution of two MI-based texture features and (b) confusion matrix of the bagged trees classifier
Fig. 5. (a) Distribution of two CH-based texture features and (b) confusion matrix of the fine tree classifier
The performed experiments demonstrated that, for the analysed data, Fisher criterion-based and convex hull-based feature selection are the best methods of obtaining features that can be used to discriminate between brain tumour and stroke in CT images. It is worth highlighting that the second method used only two features instead of three, as in the first approach, while a similar accuracy was obtained (over 95%). Significantly lower scores were observed for the features selected by the mutual information method. Therefore, it can be concluded that the convex hull method selects more discriminative features that are better able to separate the images into classes.
Among all the selected features, most were based on the histogram and the Gray Level Co-occurrence Matrix (GLCM). This demonstrates that not only texture features but also histogram features can be used for classification. Interestingly, the combination of these two types of features, mean brightness and sum variance, turned out to be one of the best solutions, with an accuracy of 95.2%. A higher score (96.2%) was achieved only for the classification using three features selected by the Fisher discriminant (the histogram 99th percentile, an autoregressive model parameter, and the gradient map mean). Such results indicate that the differences between brain stroke and tumour visualized in CT images are caused not only by the structure of the diseased brain tissue (reflected by the texture parameters) but also by the different grey level distributions that characterize the analysed types of pathologies (coded by the histogram features).
References
1. Chawla, M., Sharma, S., Sivaswamy, J., Kishore, L.T.: A method for automatic
detection and classification of stroke from brain CT images. In: Proceedings of the
31st Annual International Conference of the IEEE Engineering in Medicine and
Biology Society: Engineering the Future of Biomedicine, EMBC 2009, pp. 3581–
3584 (2009). https://doi.org/10.1109/IEMBS.2009.5335289
2. Chrzanowski, L., Drozdz, J., Strzelecki, M., Krzeminska-Pakula, M., Jedrzejewski,
K.S., Kasprzak, J.D.: Application of neural networks for the analysis of intravas-
cular ultrasound and histological aortic wall appearance - an in vitro tissue char-
acterization study. Ultrasound Med. Biol. 34, 103–113 (2008). https://doi.org/10.
1016/J.ULTRASMEDBIO.2007.06.021
3. Dourado, C.M., da Silva, S.P.P., da Nóbrega, R.V.M., Antonio, A.C., Filho, P.P.,
de Albuquerque, V.H.C.: Deep learning IoT system for online stroke detection in
skull computed tomography images. Comput. Netw. 152, 25–39 (2019). https://
doi.org/10.1016/J.COMNET.2019.01.019
4. Fahmi, F., Apriyulida, F., Nasution, I.K., Sawaluddin: Automatic detection of
brain tumor on computed tomography images for patients in the intensive care
unit. J. Healthc. Eng. 2020 (2020). https://doi.org/10.1155/2020/2483285
5. Gentillon, H., Stefańczyk, L., Strzelecki, M., Respondek-Liberska, M.: Parameter
set for computer-assisted texture analysis of fetal brain. BMC Res. Notes 9, 1–
18 (2016). https://doi.org/10.1186/S13104-016-2300-3/TABLES/2. https://link.
springer.com/articles/10.1186/s13104-016-2300-3
6. Ghosh, M.K., Chakraborty, D., Sarkar, S., Bhowmik, A., Basu, M.: The interre-
lationship between cerebral ischemic stroke and glioma: a comprehensive study of
recent reports. Sign. Transduct. Target. Ther. 4(1), 1–13 (2019). https://doi.org/
10.1038/s41392-019-0075-4
7. Gośliński, J.: Nowotwory układu nerwowego – przyczyny i rodzaje. zwrotnikraka.pl (2019). https://www.zwrotnikraka.pl/przyczyny-rodzaje-guzow-mozgu
8. Hatzitolios, A., et al.: Stroke and conditions that mimic it: a protocol secures a
safe early recognition. Hippokratia 12(2), 98 (2008)
9. Janowski, P., Strzelecki, M., Brzezinska-Blaszczyk, E., Zalewska, A.: Computer
analysis of normal and basal cell carcinoma mast cells. Med. Sci. Monit. 7(2),
260–265 (2001)
10. Kalmutskiy, K., Tulupov, A., Berikov, V.: Recognition of tomographic images in
the diagnosis of stroke. Lecture Notes in Computer Science (including subseries
Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol.
12665 LNCS, pp. 166–171 (2021). https://doi.org/10.1007/978-3-030-68821-9_16
11. Kociolek, M., Strzelecki, M., Obuchowicz, R.: Does image normalization and inten-
sity resolution impact texture classification? Comput. Med. Imaging Graph. 81,
101716 (2020). https://doi.org/10.1016/j.compmedimag.2020.101716
12. Kłos-Wojtczak, P.: Mózg człowieka – jaka jest jego budowa i funkcje? Hello Zdrowie (2019). https://www.hellozdrowie.pl/mozg-czlowieka-anatomia-i-fizjologia-organu/
13. Morgenstern, L.B., Frankowski, R.F.: Brain tumor masquerading as stroke. J. Neu-
rooncol. 44(1), 47–52 (1999). https://doi.org/10.1023/A:1006237421731
14. Nanthagopal, A.P., Rajamony, R.S.: A region-based segmentation of tumour from
brain CT images using nonlinear support vector machine classifier. J. Med.
Eng. Technol. 36, 271–277 (2012). https://doi.org/10.3109/03091902.2012.682638.
https://pubmed.ncbi.nlm.nih.gov/22621242/
180 M. Kobus et al.
15. Nedel’ko, V., Kozinets, R., Tulupov, A., Berikov, V.: Comparative analysis of deep
neural network and texture-based classifiers for recognition of acute stroke using
non-contrast CT images. In: Proceedings - 2020 Ural Symposium on Biomedical
Engineering, Radioelectronics and Information Technology, USBEREIT 2020, pp.
376–379 (2020). https://doi.org/10.1109/USBEREIT48449.2020.9117784
16. Obuchowicz, R., Kruszyńska, J., Strzelecki, M.: Classifying median nerves in carpal
tunnel syndrome: ultrasound image analysis. Biocybern. Biomed. Eng. 41(2), 335–
351 (2021). https://doi.org/10.1016/j.bbe.2021.02.011
17. Ostrek, G., Przelaskowski, A.: Automatic early stroke recognition algorithm in
CT images. Lecture Notes in Computer Science (including subseries Lecture Notes
in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7339 LNBI,
pp. 101–109 (2012). https://doi.org/10.1007/978-3-642-31196-3_11. https://link.springer.com/chapter/10.1007/978-3-642-31196-3_11
18. Prodi, E., et al.: Stroke mimics in the acute setting: role of multimodal CT protocol.
Am. J. Neuroradiol. (2022). https://doi.org/10.3174/ajnr.A7379
19. Qasem, S.N., Nazar, A., Qamar, A., Shamshirband, S., Karim, A.: A learning based
brain tumor detection system. Comput. Mater. Cont. 59, 713–727 (2019). https://
doi.org/10.32604/CMC.2019.05617
20. Raciborski, F., Gawińska, E., Kłak, A., Słowik, A., Wnuk, M.: Udary mózgu: rosnący problem w starzejącym się społeczeństwie. Instytut Ochrony Zdrowia (2016)
21. Ramakrishnan, T., Sankaragomathi, B.: A professional estimate on the com-
puted tomography brain tumor images using SVM-SMO for classification
and MRG-GWO for segmentation. Patt. Recogn. Lett. 94, 163–171 (2017).
https://doi.org/10.1016/J.PATREC.2017.03.026. https://dl.acm.org/doi/abs/10.
1016/j.patrec.2017.03.026
22. S.M. (ed.): Radiologia – diagnostyka obrazowa, część II. Akademia Medyczna w Gdańsku (2001)
23. Sobotko-Waszczeniuk, O., Łukasiewicz, A., Pyd, E., Janica, J.R., Łebkowska, U.: Differentiation of density of ischaemic brain tissue in computed tomography with respect to neurological deficit in acute and subacute period of ischaemic stroke. Polish J. Radiol. 74(3) (2009)
24. Strzelecki, M.: Texture boundary detection using network of synchronised oscilla-
tors. Electron. Lett. 40, 466–467 (2004). https://doi.org/10.1049/EL:20040330
25. Strzelecki, M., Kociolek, M., Materka, A.: On the influence of image features
wordlength reduction on texture classification. In: International Conference on
Information Technologies in Biomedicine, pp. 15–26. Springer (2018). https://doi.org/10.1007/978-3-319-91211-0_2
26. Szczypinski, P.M., Klepaczko, A., Kociolek, M.: Qmazda - software tools for image
analysis and pattern recognition, pp. 217–221. IEEE Computer Society (2017).
https://doi.org/10.23919/SPA.2017.8166867
27. Szczypiński, P.M.: qmazda manual (2020). http://www.eletel.p.lodz.pl/pms/
Programy/qmazda.pdf
28. Šušteršič, T., Peulić, M., Filipović, N., Peulić, A.: Application of active contours
method in assessment of optimal approach trajectory to brain tumor. In: 2015
IEEE 15th International Conference on Bioinformatics and Bioengineering, BIBE
2015 (2015). https://doi.org/10.1109/BIBE.2015.7367661
The Influence of Textural Features
on the Differentiation of Coronary Vessel
Wall Lesions Visualized on IVUS Images
1 Introduction
The rush of life, poor diet and stress cause many types of disease that affect
human health. One of them is atherosclerosis, which causes lipids to accumulate
in the walls of blood vessels. Development of atherosclerosis affects the release
of growth factors and cytokines that cause inflammation in blood vessels. The
first lesion is a fatty streak, which contains T lymphocytes and foam cells in its structure [1]. The disease can progress, causing this fatty streak to turn
into an atherosclerotic plaque. There are different types of plaques with dif-
ferent echogenicity, hence they are visible and distinguishable on IVUS images.
With pixel classification [7] and the subsequent segmentation of the image, it is possible to assign fragments of the image to appropriate classes based on pixel similarity. Segmentation of IVUS images is difficult. Algorithms developed for this segmentation, such as semi-automatic algorithms, which may allow better results, often have drawbacks: semi-automatic algorithms need manual interaction, while other algorithms use too much memory or are very time consuming [8].
J. Tong et al. detected the lumen contour based on textural features and sparse kernel coding [5]. The features they tested are based on first-order statistics and on features calculated from the gray-level co-occurrence matrix for 4 angles. The results they obtained are satisfactory but do not show the effect of normalization or quantization [9] on the improvement or deterioration of the classification [5].
M. Kociolek et al. indicated that the effect of normalization on improving classification depends on the algorithm used [10]. Although their study was performed on MRI images, it is important to see whether the same relationship applies to images obtained with IVUS. P. Mazur pointed out that quantization also matters, influencing the values of correlation coefficients [11]. Thus, this is another aspect to be considered in the analysis of plaque and blood textures in IVUS images.
Subsequent analyses were carried out using first-order statistical methods, the Haralick method, the Laws texture energy method, the neighborhood gray-tone difference matrix (NGTDM) method, and the texture spectrum method. Classification of the textural features was carried out using a pattern recognition method, and validation of the results was performed using resubstitution and cross-validation. The results showed that texture analysis is possible and that the best classification is achieved with texture features obtained from the Haralick method [12].
Another example of a textural analysis approach is the use of the MaZda software [13,14]. MaZda implements different algorithms and different color models and allows the analysis of regions of interest with more than 10,000 features. Although this software does not cover all possible texture parameters, it is a good tool for initially discerning which texture parameters have the potential to differentiate between plaques and blood in the vessel lumen. Additional parameters, such as the GDTM, may be considered in later stages of the analysis [15].
in which the atherosclerotic plaque and blood in the lumen of the vessel were
visible. Although the features of the plaque area or the vessel lumen were not
considered in this analysis, images were selected so that the area of these regions
was as large as possible so as to allow better comparison of data obtained from
the textural analysis.
The first stage of data preparation was to superimpose the contours of the vessel lumen and the vessel wall, which were described for each DICOM image frame as coordinates. The Python programming language, version 3.9.10 (64-bit), was used for this. To perform the statistical analysis and logistic regression, the following Python libraries were used: Scikit-learn [16], Matplotlib [17], Seaborn [18], NumPy [19], Pandas [20], SciPy [21], and the csv module. The coordinates were rounded to integer values and binary masks were then created from them (Fig. 1).
Binary masks were applied to 30 selected IVUS images and then prepared so
that only the plaque and lumen of the vessel were visible on the image without
including adventitia (Fig. 2).
Fig. 2. Prepared image for textural analysis with selected ROI of plaque (green) and
lumen (red)
The obtained images were loaded into the qMaZda software, and the area belonging to the plaque and the area containing blood were manually marked. Textural analysis was performed for the two regions of interest in qMaZda. 10,477 textural features were obtained for the 30 images, for both the plaque and the blood. IVUS images focus primarily on brightness, hence the features considered later are based only on the brightness of the image, marked as Y in the YUV model. Thus, the number of features was reduced to 6,770.
The next step was to find the features that differed most between the two regions. Statistical methods were used for this. The statistical tests were performed for paired data, which meant that the feature determined for the atherosclerotic plaque was compared with the values of the same feature determined for blood. The Shapiro-Wilk test was performed to determine the distribution of the values obtained for each feature. A distinction was made between data that came from a normal distribution and data that came from a non-normal distribution. Features for which the plaque and blood data were distributed differently were analyzed as features from a non-normal distribution.
3 Experimental Results
The Mann-Whitney test and the t-Student test for dependent samples were
conducted for data from a non-normal distribution and data from a normal
distribution, respectively. This determined whether the data were from a single
sample or not. Spearman’s and Pearson’s correlations were then used to find the
features that were most different between plaques and blood. Table 1 shows the
lowest values of both correlations for statistically valuable data which indicates
the biggest difference between plaque and blood.
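A minimal sketch of this testing pipeline (a stand-in, not the authors' code), assuming recent SciPy and hypothetical feature values; the choice of tests follows the description above.

```python
import numpy as np
from scipy import stats

def compare_feature(plaque_vals, blood_vals, alpha=0.05):
    """Compare one texture feature between paired plaque and blood ROIs,
    using the tests named in the text above."""
    normal = (stats.shapiro(plaque_vals).pvalue > alpha
              and stats.shapiro(blood_vals).pvalue > alpha)
    if normal:
        p = stats.ttest_rel(plaque_vals, blood_vals).pvalue    # paired t-test
        r = stats.pearsonr(plaque_vals, blood_vals)[0]         # Pearson
    else:
        p = stats.mannwhitneyu(plaque_vals, blood_vals).pvalue # Mann-Whitney
        r = stats.spearmanr(plaque_vals, blood_vals)[0]        # Spearman
    return normal, p, r

# hypothetical values of one feature on the 30 analysed images
rng = np.random.default_rng(3)
plaque, blood = rng.normal(1.0, 0.2, 30), rng.normal(0.4, 0.2, 30)
print(compare_feature(plaque, blood))
```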
Due to the large number of features that MaZda provides, the program uses labels that allow the features to be recognized more easily. A feature name consists of a color component, the normalization and quantization, a feature extraction algorithm, and an abbreviated feature name. An example is YD8GlcmZ5Correlat, which indicates a feature whose color component is brightness; the image is analyzed without normalization, using 8 bits for grayscale encoding. Glcm indicates that the gray-level co-occurrence matrix algorithm is used. Z5 indicates the encoding direction and the distance between pixels (in this case (5, −5, 0)). ‘Correlat’ indicates one of the Glcm characteristics. The other feature names are constructed in the same way. Features whose calculation was based on the area of the marked region of interest were discarded, because the region areas differed for each analyzed image; the results obtained from them might therefore have been unreliable. More than one normalization is available. S means that the mean value μ and standard deviation σ of the gray levels are computed; the range for further computation is <μ−3σ, μ+3σ>. N means that the gray-level histogram percentiles of the area are computed; the new range is defined by the first and ninety-ninth percentiles, <p1, p99>. Finally, M means that the minimum and maximum gray levels found in the region of interest define the new range [13]. A parser sketch for this convention is given below.
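The naming convention described above can be decoded mechanically. The sketch below is a rough parser; the set of letter codes (in particular ‘D’ for the no-normalization option) is inferred from the example in the text and should be treated as an assumption.

```python
import re

# rough regex for the qMaZda feature-name convention described above:
# colour component + normalization code + bit depth + algorithm + variant/feature
NAME_RE = re.compile(
    r"^(?P<colour>[YUVRGB])"            # colour component (Y = brightness)
    r"(?P<norm>[DSNM])(?P<bits>\d+)"    # normalization code ('D' assumed = none) + bits
    r"(?P<algo>Glcm|Grlm|Gab|Dwt|Hist|Arm|Grad)"  # extraction algorithm
    r"(?P<rest>.*)$"                    # direction/distance code and feature name
)

def parse_feature_name(name):
    m = NAME_RE.match(name)
    return m.groupdict() if m else None

# example from the text: brightness, no normalization, 8 bits, GLCM, Z5, Correlat
print(parse_feature_name("YD8GlcmZ5Correlat"))
```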
Table 1. Features from the normal distribution and their Pearson’s (P) or Spearman’s
(S) coefficient values
The values of Pearson's and Spearman's coefficients are negative. The features were checked for the possibility of classifying regions of interest based on them. Logistic regression was used for this purpose. A value of 1 was assigned to values belonging to plaque; a value of 0 was assigned to values belonging to blood. The data set was divided such that 70% of the data formed the training set and 30% the test set. A logistic regression plot was then created for each feature (Fig. 3). The confusion matrix for each feature showed the effectiveness of the classification. The results from each matrix were collected in Table 2; the classification step is sketched below.
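A minimal sketch of the described classification step with scikit-learn: a 70/30 split, logistic regression on a single feature, and a confusion matrix. The feature values and class separation are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score

# hypothetical values of one selected texture feature
rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(1.0, 0.3, 30),    # plaque -> label 1
                    rng.normal(0.3, 0.3, 30)])   # blood  -> label 0
y = np.array([1] * 30 + [0] * 30)

# 70% training / 30% test split, as described above
x_tr, x_te, y_tr, y_te = train_test_split(x.reshape(-1, 1), y,
                                          test_size=0.3, random_state=0)
clf = LogisticRegression().fit(x_tr, y_tr)
pred = clf.predict(x_te)
print(confusion_matrix(y_te, pred))
print("accuracy:", accuracy_score(y_te, pred))
```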
Table 4. Results obtained for different bits for the statistically selected feature for
GlcmCorrelat
Table 6. Results obtained for different bits for the statistically selected feature for
GlcmEntropy
Table 8. Results obtained for different bits for the statistically selected feature for
GlcmDifVarnc
Table 10. Results obtained for different bits for statistically selected feature GabMag
Table 12. Results obtained for different bits for the statistically selected feature for
DwtHaar
4 Discussion
The analysis indicated several features that can be used to classify plaque and distinguish it from blood. The classification accuracy values determined by logistic regression oscillated around 0.7 or 0.8 for most features. The selected features were determined using the Gray Level Co-occurrence Matrix (Glcm), the Gabor transform (Gab), and the discrete wavelet transform (DwtHaar) algorithms.
The overarching goal of the work was to find the best possible features to classify soft-tissue plaque and blood. The MaZda software makes it possible to check different types of normalization and quantization for all algorithms and their features. The information obtained from this software was used in this paper to find these relationships. The results varied according to the algorithm and its features; hence, the conclusions are presented separately for each algorithm.
The Glcm correlation feature had normally distributed values. Its Pearson coefficient showed differences for different normalizations, and in most cases this translated into the classification accuracy obtained with logistic regression. The feature indicated in the statistical analysis of texture features had the best accuracy; no normalization, or a different normalization at the same quantization, gave worse classification results. Next, the effect of quantization on the normalization that obtained the best values was considered. In this case, the Pearson coefficient and the accuracy for different bit depths were the same or changed only slightly, which did not affect the overall results.
The entropy from the Glcm algorithm came from a non-normal distribution; hence, the Spearman correlation was used to determine the coefficient. In this case, normalization also affects the results: better accuracy was obtained for the normalized features. However, the value of Spearman's coefficient did not translate into the accuracy results. Although the correlation coefficient differs significantly for the different bit depths obtained for the entropy, the accuracy for each bit depth remains the same. The same conclusion is reached for DifVarnc calculated from Glcm and Mag calculated from Gab: the bit depth does not affect the classification accuracy. Both of these features also show that normalization significantly affects the accuracy values. In these cases, the difference between the features is also reflected in the obtained values of the Spearman coefficients: lower values of this coefficient give better accuracy in logistic regression.
The logistic regression plots showed very small differences, which could nevertheless be significant in the development of automatic segmentation algorithms. The feature maps for the different features included in the paper show that these texture analysis methods successfully distinguish blood and plaque in the images.
Acknowledgement. This work was financed by AGH University of Science and Tech-
nology thanks to the Rector’s Grant 17/GRANT/2022. This work was co-financed
by the AGH University of Science and Technology, Faculty of EAIIB, KBIB no
16.16.120.773.
References
1. Szczeklik, A., Tendera, M.: Kardiologia tom 1, 696 (2009)
2. Garcı́a-Garcı́a, H.M., Gogas, B.D., Serruys, P.W., Bruining, N.: IVUS-based
imaging modalities for tissue characterization: similarities and differences. Int.
J. Cardiovas. Imaging 27(2), 215–224 (2011). https://doi.org/10.1007/s10554-010-
9789-7
Computer Aided Analysis of Clock Drawing Test Samples
1 Introduction
Demographic projections indicate growth in life expectancy and a significant
increase in the proportion of the elderly population. One of the undesired side
effects of this process is a growing number of people affected by age-related
diseases. It is estimated that by 2030 dementia may affect up to 65 million people [28].
Dementia (or major neurocognitive disorder) is a complex disorder comprising cognitive and behavioral changes that lead to the loss of independence
in the activities of daily living [3]. The most common causes of dementia are:
Alzheimer’s disease, vascular dementia and dementia with Lewy bodies.
Early diagnosis enables the introduction of non-pharmacological and pharmacological management. Also of note, dementia may be related to poor medication management, which may put the patient at risk of complications due to
omission or overdose of prescribed drugs.
The Clock Drawing Test (CDT) is one of the most popular cognitive screening tests and has been used for years not only at stroke and dementia clinics [2] but also in primary care [19,21]. It is used as a standalone tool, as part of tests such as the ACE (Addenbrooke's Cognitive Examination) [20], GPCOG (General Practitioner Assessment of Cognition) [10], MoCA (Montreal Cognitive Assessment) [27] and Mini-Cog [9], or in conjunction with the Mini-Mental State Examination [14]. It is an easy-to-administer test, but procedures and scoring methods differ considerably. It may involve drawing, filling in or copying a clock face, sometimes with hands indicating a specific time (most commonly 11:10, 2:45 or 8:20) [24,36]. The CDT engages visual semantic memory (as the patient has to visualize the clock face with reference to his/her knowledge), executive function (especially planning) and visuospatial function. Thus, the CDT engages 3 out of the 6 cognitive domains specified in DSM-V [3]. A lower CDT score is related to a greater risk of falls in Alzheimer's disease; CDT performance may also be used to predict driving ability [15,34]. The CDT may be either simply classified as normal vs. impaired, or scored according to quantitative characteristics (e.g. the score depends on the correct placement of numbers and sometimes hands) or a qualitative analysis (the presence of critical errors) [36]. Differences in the test procedure and its scoring make CDT research data difficult to compare. Each scoring method has its own psychometric characteristics. Complex scoring systems usually have lower inter-rater reliability [2,24]. Some of the scoring systems require the use of a template to objectively assess whether the numbers are placed in the correct sections of the clock face. Obviously, the templates can be used only with clock-face filling procedures, such as the one devised by Watson et al., which refers to the clock face quadrants [35], or the one by Manos and Wu, which refers to the clock face octants [26].
In recent years, different solutions have been proposed for a digital version of the Clock Drawing Test, in which the subject performs the standard task but draws it on a tablet [18]. Apart from the clock sketch, some additional data are also available in this type of test. Nirjon et al. [29] used information spanning the moment the user first touches the screen until he lifts his finger. Other data could be mobile sensor data, such as x and y coordinates, timestamps, and touch events [30]. Binaco et al. [8] used not only a tablet but also a pen to register 350 different variables useful for the analysis.
However, although the sensitivity of the digital and paper-pencil versions is comparable [11], it seems that the digital version of the CDT can increase the level of anxiety and stress in some of the elderly, influencing the test results [6]. Therefore, many studies are concerned with the automated assessment of analog (paper) tests. An attempt to automatically evaluate the analog test was made by Guha et al. [17]; however, it is based on digit recognition, assessing whether the 12 numbers are present and where their centroids lie.
A group of methods features artificial intelligence and returns information about a suspected disorder. The neural networks are pre-trained with sets of correct and incorrect clock images. They do not evaluate particular elements of the image and thus do not allow for a detailed scoring of the result [4,12,37].
The basic CDT assessment is a challenging task for an automated method,
as the scoring usually addresses not only the presence of indispensable elements,
but also the absence of irrelevant elements.
Still, manual, fine-grained scoring is significantly more troublesome for humans and depends on templates, guides, or experience. Computer-aided analysis of the CDT could easily objectify the detailed scoring, providing preprocessed CDT sheets or interactive utilities, yet leave the final decision to the expert. Such an approach has successfully been employed in radiology.
Indeed, radiologists often employ computer-aided diagnosis (CAD) systems
to evaluate lesions, complex measurements, follow-up assessment, or triage. In
radiology, the CAD is traditionally [23] a Picture Archiving and Communication System (PACS)/Vendor Neutral Archive (VNA) plugin or a standalone application running on the radiological workstation. It operates on DICOM (Digi-
tal Imaging and Communications in Medicine) [7] objects as acquired by com-
puted tomography, magnetic resonance, etc., and stores the results as DICOM-
compatible data or in external databases. It operates using various image pro-
cessing techniques [31] or AI [16]. Moreover, it often integrates with Radiological
Information Systems (RIS) to help radiologists prepare reports. The same work-
flow could easily be adapted to any computer-assisted assessment or diagnosis
task requiring images on input and report generation, provided PACS/VNA and
RIS infrastructure could be used.
A proof-of-concept computer-aided analysis module dedicated to static CDT assessment is introduced in this paper. The module is designed for integration with the PACS/VNA for result storage and with the RIS for presentation and expert evaluation. It provides REST (Representational State Transfer) and DICOM network interfaces to directly process raster images or DICOM-encapsulated scans. The numerical results are stored in an auxiliary, transactional database. The graphical results are returned as raster images or Secondary Capture DICOM images for examination. Our long-term goal is to provide the tool to experts evaluating digitized, traditional CDT sheets via the PACS/VNA and RIS systems using various scoring systems. However, at the moment, the CDT scans are processed for evaluation according to the Manos and Wu scale [25]. The Manos and Wu scale is a 10-point scoring system based on adding a point for each of the eight digits (1, 2, 4, 5, 7, 8, 10, 11) placed in the proper one-eighth of the circle (octant) and two points for hands indicating 2 and 11. During the evaluation, a template positioned over the clock drawing is employed; the octant logic is sketched below.
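The octant logic of this scale can be illustrated as follows. The sketch assumes octant boundaries at 0°, 45°, 90°, ... measured clockwise from 12 o'clock, under which each scored hour falls into a distinct octant; the exact orientation of the Manos and Wu template may differ.

```python
import math

# the eight digits scored by the Manos and Wu system
SCORED_DIGITS = (1, 2, 4, 5, 7, 8, 10, 11)

def octant(x, y, cx, cy):
    """0..7 index of the 45-degree sector containing point (x, y),
    measured clockwise from 12 o'clock (image y axis points down)."""
    ang = math.degrees(math.atan2(x - cx, -(y - cy))) % 360
    return int(ang // 45)

def digit_score(centroids, cx, cy):
    """centroids: {hour: (x, y)} of recognized digits. One point per scored
    digit placed in its expected octant (hour h lies at h * 30 degrees)."""
    return sum(1 for h in SCORED_DIGITS if h in centroids
               and octant(*centroids[h], cx, cy) == (h * 30) // 45)

# toy usage: a perfectly drawn "2" at 60 degrees on a unit clock centred at (0, 0)
two = (math.sin(math.radians(60)), -math.cos(math.radians(60)))
print(digit_score({2: two}, 0, 0))  # -> 1
```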
2 Materials and Methods
In this study, digital Clock Drawing Test sheets are processed. The binary images containing a clock face, hours, and hands drawn by the patients are imported into the Matlab environment. Next, the binary images are subjected to morphological opening and flood-filled to remove holes. Then, an object corresponding to the clock face is selected. Parameters of this object determine the location of the clock face and define the ROI. Finally, the Manos and Wu template is positioned. Objects inside the ROI are labeled and clustered into groups containing separate hours. The objects' locations are used to assign the objects (hours) to the appropriate sections of the Manos and Wu template and to generate the report for the expert; a rough sketch of this pipeline follows.
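A rough Python sketch of the pipeline above (the authors' implementation is in Matlab): morphological opening, hole filling, selection of the clock-face object, and the radius from Eq. (1) given later in the text. Treating the largest filled object as the clock face is an assumption made for this illustration.

```python
import numpy as np
from scipy.ndimage import binary_fill_holes
from skimage.measure import label, regionprops
from skimage.morphology import binary_opening, disk

def locate_clock_face(binary_sheet):
    """Opening, hole filling, then picking the largest filled object
    as the clock face; returns its centre and radius."""
    cleaned = binary_opening(binary_sheet, disk(1))
    filled = binary_fill_holes(cleaned)
    regions = regionprops(label(filled))
    face = max(regions, key=lambda r: r.area)   # assumed: largest = clock face
    cy, cx = face.centroid                      # centre of the clock
    radius = np.sqrt(face.area / np.pi)         # Eq. (1)
    return (cx, cy), radius

# toy binary sheet with a filled disk standing in for the clock face
yy, xx = np.mgrid[:200, :200]
sheet = (xx - 100) ** 2 + (yy - 100) ** 2 < 80 ** 2
print(locate_clock_face(sheet))
```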
The method is implemented in Matlab and deployed as a standalone executable working inside a Docker container. The public interface consists of an asynchronous DICOM SecondaryCaptureImageStorage SOP service and a synchronous Python (Flask) REST service.
2.1 Materials
During the study, deidentified Clock Drawing Test digital sheets were used. Samples were acquired using the mobile application Test Pamięci (Memory Test) run on two mobile devices: a Samsung Galaxy Tab 4 (7″ touch screen; resolution: 1280 × 800), 20 cases, and a Sony Xperia Z2 (10″ touch screen; resolution: 2560 × 1600), 16 cases.
Blank test sheets provided to patients via the app had the clock face and the center of the clock marked. The examinees (elderly patients of the John Paul II Geriatric Hospital in Katowice and volunteers) were asked to draw all the numbers designating hours inside the provided circle, using a writing tool (stylus) or a finger, at the correct positions, and then to draw the clock hands pointing at 11:10.
During the acquisition, the static image as well as dynamic parameters were registered. However, this study did not employ the dynamic parameters, as the sheets were meant to resemble paper scans. The static images were captured at the screen resolution. Exemplary scans are shown in Fig. 1.
The data was acquired within study IS-2/54/NCBR/2015, co-financed by the National Centre for Research and Development, and approved by the bioethical committee of the Jerzy Kukuczka Academy of Physical Education in Katowice, resolution 1/2015.
198 J. Kawa et al.
2.2 Methods
Clock Face Detection. Next, all remaining objects are subjected to a flood-fill operation and evaluated. Eccentricity and area are collected and sorted.
Next, the object selected as the clock face is analyzed. A centroid of the object is selected as the center of the clock. The radius of the clock face is obtained as

R = √(Area/π),  (1)

to increase the robustness of the method in the presence of drawings crossing the clock face's template.
The resulting clock coordinates define the processing ROI (region of interest)
for the following steps.
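For completeness, Eq. (1) is just the disk-area formula inverted; because the filled face area is a bulk quantity, strokes crossing the outline perturb it only marginally:

```latex
% disk area solved for the radius
A = \pi R^2 \;\Longrightarrow\; R = \sqrt{A / \pi}
```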
Object Labeling and Clustering. At the final processing step, objects are
labeled and clustered based on their location.
Before labeling, the objects are subjected to morphological dilation with
a disk-like structural element of a size matching the line thickness detected
above. The operation effectively merges nearby segments. The following label-
ing extracts 8-connected components. Finally, the pixels appended during the
dilation are marked as background (leaving only the original components).
Next, the centroids of the objects are extracted. Cartesian coordinates of the
centroids are used as data points for a SOM (Self Organizing Map) [22] based
clustering.
A 5×5 SOM with hexagonal topology is employed. The initial neighbourhood includes all available data points. All centroids are used in training and clustering. All objects assigned to the same SOM neuron are eventually merged.
The resulting objects are assigned to the Manos and Wu template’s octants
(segments) based on the centroid location. Finally, the report is created.
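The clustering step can be prototyped in Python with the minisom package (a sketch under the assumption that minisom's hexagonal topology mirrors the SOM tooling used by the authors; the sigma, learning rate, and iteration count are arbitrary choices):

```python
import numpy as np
from minisom import MiniSom  # pip install minisom

def cluster_centroids(centroids, grid=5, iters=500):
    """Group object centroids (N x 2 array of x, y) with a 5 x 5 hexagonal
    SOM; objects mapped to the same neuron form one candidate 'hour'."""
    data = np.asarray(centroids, dtype=float)
    som = MiniSom(grid, grid, 2, sigma=2.0, learning_rate=0.5,
                  topology='hexagonal')
    som.train_random(data, iters)
    clusters = {}
    for idx, point in enumerate(data):
        clusters.setdefault(som.winner(point), []).append(idx)
    return clusters  # winning neuron -> indices of merged objects
```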
Fig. 3. The diagram of the computer aided CDT analysis system architecture
The DICOM network interface for the container is provided by the storescp
application from the dcmtk package (OFFIS, Germany). SecondaryCapture
DICOM objects are accepted for asynchronous processing.
A REST (Representational State Transfer) web service written in Python serves
as an entry point for web calls. PNG (Portable Network Graphics) or JPEG
(Joint Photographic Experts Group) files are accepted for asynchronous pro-
cessing.
In both cases, a new processing job is created (CDT is queued). However, in
the case of the REST service, the unique ID of the processing job is returned. It
can be later used to check the status of the processing and retrieve text and image
data. In the case of the DICOM interface, the results are stored in a new SecondaryCapture
instance with the same PatientID, StudyInstanceUID and SeriesInstanceUID
as the input object with SecondaryCaptureDeviceManufacturerModelName tag
updated to ’PoC CDT CAD module 1.0’. The new instance is later stored in the
predefined DICOM node (e.g., PACS archive).
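A minimal Flask entry point consistent with this description could look as follows (the endpoint names and the in-memory job store are illustrative assumptions; the deployed service hands jobs to the Matlab module asynchronously):

```python
import uuid
from flask import Flask, jsonify, request

app = Flask(__name__)
jobs = {}  # job_id -> job record; a stand-in for the real processing queue

@app.route('/cdt', methods=['POST'])
def submit_cdt():
    """Accept a PNG/JPEG sheet and return the unique ID of the queued job."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {'status': 'queued', 'image': request.get_data()}
    return jsonify({'job_id': job_id}), 202

@app.route('/cdt/<job_id>', methods=['GET'])
def job_status(job_id):
    """Poll the job status; text and image results are fetched when done."""
    job = jobs.get(job_id)
    if job is None:
        return jsonify({'error': 'unknown job'}), 404
    return jsonify({'status': job['status']})
```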
Text results (number of objects in each octant) are stored in the RIS (Radi-
ological Information System) auxiliary PRS (Preliminary Report Structure)
database with the cdt10 prefix. When opening the CDT case, the RIS can read the table and pre-fill the report template shown to the expert. The startup config-
uration defines the DICOM’s Application Entity Titles and the location of the
database.
3 Results
The image dataset introduced in Sect. 2.1 has been processed using the devel-
oped module. The processing was performed on the Linux system running on
Intel Xeon Gold 6226 CPU, Matlab 2022a. The container and computational
module were started before the test. Cases were processed sequentially. The
mean processing time (not including DICOM interface overhead) for lower res-
olution cases was 1.5 s (min/max: 1.3/1.7 s), and for higher-resolution cases, 7 s (min/max: 5.3/7.7 s). The obtained DICOMs were validated using the dciodvfy tool
from the dicom3tools [13] package.
The results were assessed in a quantitative and qualitative manner:
During the first evaluation step, the size and location of a generated Manos and
Wu template were assessed. The colored template was superimposed over the
original CDT sheet. The location was considered correct if the red template’s
line was placed directly over the clock face template (as in Fig. 2, Template
fitting stage). As the hours were not recognized, the location of the first octant’s
center line over the 12:00 h, as required by the Manos and Wu scale, was not checked
during the template placement assessment.
In all 36 cases, the template was assessed as correctly placed.
Next, all available cases were manually evaluated. The objects in each Manos
and Wu octant (segment) were manually counted, and the quality of the object’s
segmentation was assessed. On that basis, the segment was considered correctly
or incorrectly processed.
The segment was considered incorrectly processed if any of the following
conditions occurred inside the said region:
– missing objects were present (e.g., hour or hand was not detected),
– 2-digit numbers (e.g., 10, 11, 12) were detected as separate objects (e.g., 1–0, 1–1, 1–2),
– clustering errors were present, e.g., the clock’s hand and digit were considered
a single object despite not being 8-connected.
Table 1 summarizes the evaluation. Of 288 evaluated octants, 201 (70%)
were assessed as correct.
In one case, the results were assessed as not helpful. In twenty-five cases, the neutral option was selected. In ten cases, the report was considered helpful. In no case was the report considered very helpful or distracting.
4 Discussion
The presented results are based on the CDT data acquired by the tablet applica-
tion. However, in our study, only static images are analyzed. Although disregard-
ing dynamic parameters registered during the test could be generally considered
a drawback (dynamic registration provides more options for robust handwrit-
ing recognition or kinematic analysis), it is intentional. The long-term goal is to
analyze static scans of historical and still widely employed paper CDT sheets
lacking the dynamic parameters. A similar approach was previously employed in our Luria's test analysis framework [33].
The method performance itself is considered adequate for the proof-of-concept stage. However, the object (hours and hands) detection has to be improved for the method to be considered ready for clinical evaluation. A significant drawback is the lack of handwriting recognition. Without robust detection of the digits, advanced scoring options are not available: one cannot position the template at about 12:00, check whether specific digits are placed in specific octants, or distinguish between digits, hands, other words, drawings inserted by the subject, artifacts, etc. Moreover, although dedicated routines can easily detect selected, even subtle, errors, such as incorrectly placed hours, they must be supervised. Indeed, for human experts, clock drawings are either obviously correct (hours and hands placed where expected) or recognizably incorrect (wrong numbers, all hours located in the upper/right half of the clock face, redundant hands, etc.). Some of the newest scoring systems have even abandoned hand length and focus only on the placement of the hands [32].
What is more, as long as the neutral score prevails in the experts' opinions (as in Sect. 3.3), the computerized method can hardly be considered an overall improvement. On the other hand, even with a quasi-automatic method, the computer-aided diagnosis approach provides several advantages. A fixed workflow makes the scoring less subjective; placing the digits in the correct octants can easily be validated in terms of including most of the digit or the center of the digit, etc. Moreover, in a computer system, the results are automatically archived and available for follow-up or remote assessment. The option to include new features that are not directly available to the expert might also be considered beneficial.
The introduced PACS/VNA integration on the back-end layer was straightforward. The development model based on containerization makes the processing
module independent and ready for immediate deployment. On the PACS/VNA
level, only the configuration of a new DICOM node and corresponding auto-
routing rules was necessary. However, the memory footprint of the running con-
tainer, as well as resource consumption, was significant. On the other hand,
replacing the proof-of-concept, general-purpose environment with an optimized,
dedicated application is expected to significantly reduce the load without chang-
ing the public network interface.
In general, the architecture of contemporary PACS/VNA systems should make the development of similar modules easier. Early PACS were considered proprietary, closed environments, compliant with selected parts of the standard only. Transitioning to the more open, vendor-agnostic architecture, traditionally linked with the VNA label [1,5], permitted various kinds of medical data to be stored in the same archive. Nowadays, PACSes and VNAs (terms often used synonymously despite historical differences) can store DICOM-embedded scans, pictures, or video recordings. The RIS interface can be adapted to handle new types of data and
provide a unified user experience through various processing modules requiring
user interaction.
Acknowledgement. This research has been co-financed within the statutory grant
of Silesian University of Technology no. 07/010/BK 22/1011.
References
1. Agarwal, T.K., Sanjeev: Vendor neutral archive in PACS. Indian J. Radiol. Imaging 22(04), 242–245 (2012). https://doi.org/10.4103/0971-3026.111468
2. Agrell, B., Dehlin, O.: The clock-drawing test. Age Ageing 27(3), 399–404 (1998)
3. American Psychiatric Association: Diagnostic and statistical manual of mental
disorders, vol. 5th edn. American Psychiatric Publishing, Arlington (2013)
4. Amini, S., et al.: An artificial intelligence-assisted method for dementia detection
using images from the clock drawing test. J. Alzheimers Dis. 83(2), 581–589 (2021)
5. Armbrust, L.J.: PACS and image storage. The veterinary clinics of North America.
Small Animal Practi. 39(4), 711–718 (2009). https://doi.org/10.1016/j.cvsm.2009.
04.004
6. Bednorz, A., et al.: Zastosowanie tabletowej wersji Testu Rysowania Zegara do
rozpoznawania lagodnych zaburzeń poznawczych (MCI) u osób starszych, jako
próba telediagnostyki w geriatrii [Tablet version of Clock Drawing Test in assess-
ment of mild cognitive impairment in the elderly as an attempt to tele-diagnostics
in geriatrics] (2017)
7. Bidgood, W.D., Horii, S.C.: Introduction to the ACR-NEMA DICOM standard.
Radiographics 12(2), 345–355 (1992). https://doi.org/10.1148/radiographics.12.2.
1561424
8. Binaco, R., Calzaretto, N., Epifano, J., Emrani, S., Wasserman, V., Libon, D.,
et al.: Automated analysis of the clock drawing test for differential diagnosis of
mild cognitive impairment and Alzheimer’s disease. In: Mid-Year Meeting of the
International Neuropsychological Society (2018)
9. Borson, S., Scanlan, J., Brush, M., Vitaliano, P., Dokmak, A.: The Mini-Cog:
a cognitive ‘vital signs’ measure for dementia screening in multi-lingual elderly.
Int. J. Geriat. Psychiatr. 15(11), 1021–1027 (2000). https://doi.org/10.1002/1099-1166(200011)15:11<1021::aid-gps234>3.0.co;2-6
10. Brodaty, H., Low, L.F., Gibson, L., Burns, K.: What is the best dementia screening
instrument for general practitioners to use? Am. J. Geriatr. Psychiatr. 14(5), 391–
400 (2006)
11. Chan, J.Y., et al.: Evaluation of digital drawing tests and paper-and-pencil drawing
tests for the screening of mild cognitive impairment and dementia: a systematic
review and meta-analysis of diagnostic studies. Neuropsychol. Rev. 1–11 (2021).
https://doi.org/10.1007/s11065-021-09523-2
12. Chen, S., Stromer, D., Alabdalrahim, H.A., Schwab, S., Weih, M., Maier, A.: Auto-
matic dementia screening and scoring by applying deep learning on clock-drawing
tests. Sci. Rep. 10(1), 1–11 (2020)
13. Clunie, D.A.: Dicom3tools software website. https://www.dclunie.com/dicom3tools.html (2022). Accessed 19 Jan 2022
14. Ferrucci, L., Cecchi, F., Guralnik, J.M., Giampaoli, S., Noce, C.L., Salani, B.,
Bandinelli, S., Baroni, A., Group, F.S.: Does the clock drawing test predict cogni-
tive decline in older persons independent of the mini-mental state examination? J.
Am. Geriatr. Soc. 44(11), 1326–1331 (1996)
15. Freund, B., Gravenstein, S., Ferris, R., Burke, B.L., Shaheen, E.: Drawing clocks
and driving cars. J. Gen. Intern. Med. 20(3), 240–244 (2005)
16. Fujita, H.: AI-based computer-aided diagnosis (AI-CAD): the latest review to read
first. Radiol. Phys. Technol. 13(1), 6–19 (2020). https://doi.org/10.1007/s12194-
019-00552-4
17. Guha, A., Kim, H., Do, E.Y.L.: Automated clock drawing test through machine
learning and geometric analysis. In: DMS, pp. 311–314 (2010)
18. Harbi, Z., Hicks, Y., Setchi, R.: Clock drawing test interpretation system. Proced.
Comput. Sci. 112, 1641–1650 (2017)
19. Hazan, E., Frankenburg, F., Brenkel, M., Shulman, K.: The test of time: a history
of clock drawing. Int. J. Geriatr. Psychiatr. 33(1), e22–e30 (2018)
20. Hsieh, S., Schubert, S., Hoon, C., Mioshi, E., Hodges, J.R.: Validation of the Adden-
brooke’s cognitive examination iii in frontotemporal dementia and Alzheimer’s dis-
ease. Dement. Geriatr. Cogn. Disord. 36(3–4), 242–250 (2013)
21. Kirby, M., Denihan, A., Bruce, I., Coakley, D., Lawlor, B.A.: The clock drawing test
in primary care: sensitivity in dementia detection and specificity against normal
and depressed elderly. Int. J. Geriatr. Psychiatr. 16(10), 935–940 (2001)
22. Kohonen, T.: Self-organized formation of topologically correct feature maps. Biol.
Cybern. 43(1), 59–69 (1982). https://doi.org/10.1007/bf00337288
23. Le, A.H.T., Liu, B., Huang, H.K.: Integration of computer-aided diagno-
sis/detection (CAD) results in a PACS environment using CAD–PACS toolkit and
DICOM SR. Int. J. CARS 4(4), 317–329 (2009). https://doi.org/10.1007/s11548-
009-0297-y
24. Mainland, B.J., Shulman, K.I.: Clock drawing test. In: Larner, A.J. (ed.) Cognitive
Screening Instruments, pp. 67–108. Springer, Cham (2017). https://doi.org/10.
1007/978-3-319-44775-9 5
25. Manos, P.J.: The utility of the ten-point clock test as a screen for cognitive impair-
ment in general hospital patients. Gen. Hosp. Psychiatr. 19(6), 439–444 (1997)
26. Manos, P.J., Wu, R.: The ten point clock test: a quick screen and grading method
for cognitive impairment in medical and surgical patients. Int. J. Psychiatr. Med.
24(3), 229–244 (1994)
27. Nasreddine, Z.S., et al.: The Montreal cognitive assessment, MoCA: a brief screen-
ing tool for mild cognitive impairment. J. Am. Geriatr. Soc. 53(4), 695–699 (2005).
https://doi.org/10.1111/j.1532-5415.2005.53221.x
28. Ngo, J., Holroyd-Leduc, J.M.: Systematic review of recent dementia practice guide-
lines. Age Ageing 44(1), 25–33 (2014)
29. Nirjon, S., Emi, I.A., Mondol, M.A.S., Salekin, A., Stankovic, J.A.: MOBI-COG:
a mobile application for instant screening of dementia using the mini-cog test. In:
Proceedings of the Wireless Health 2014 on National Institutes of Health, pp. 1–7
(2014)
30. Park, I., Lee, U.: Automatic, qualitative scoring of the clock drawing test (CDT)
based on U-net, CNN and mobile sensor data. Sensors 21(15), 5239 (2021)
31. Pietka, E., Kawa, J., Badura, P., Spinczyk, D.: Open architecture computer-aided
diagnosis system. Expert. Syst. 27(1), 17–39 (2010). https://doi.org/10.1111/j.
1468-0394.2009.00524.x
32. Rakusa, M., Jensterle, J., Mlakar, J.: Clock drawing test: a simple scoring system
for the accurate screening of cognitive impairment in patients with mild cogni-
tive impairment and dementia. Dement. Geriatr. Cogn. Disord. 45(5–6), 326–334
(2018)
33. Stepień, P., et al.: Computer aided written character feature extraction in pro-
gressive supranuclear palsy and Parkinson’s disease. Sensors 22(4), 1688 (2022).
https://doi.org/10.3390/s22041688
34. Suzuki, Y., et al.: Quantitative and qualitative analyses of the clock drawing test
in fall and non-fall patients with Alzheimer’s disease. Dementia Geriatric Cogn.
Disord. Extra 9(3), 381–388 (2019)
35. Watson, Y.I., Arfken, C.L., Birge, S.J.: Clock completion: an objective screening
test for dementia. J. Am. Geriatr. Soc. 41(11), 1235–1240 (1993)
36. Wójcik, D., Szczechowiak, K.: Wybrane wersje testu rysowania zegara w prak-
tyce klinicznej-analiza porównawcza ilościowych i jakościowych systemów oceny
[Selected versions of the clock test in clinical practice - a comparative analysis of
quantitative and qualitative scoring systems]. Aktualn Neurol. 19, 83–90 (2019)
37. Youn, Y.C., et al.: Use of the clock drawing test and the rey-osterrieth complex
figure test-copy with convolutional neural networks to predict cognitive impair-
ment. Alzheimer’s Res. Ther. 13(1), 1–7 (2021)
Study on the Impact of Neural Network
Architecture and Region of Interest
Selection on the Result of Skin Layer
Segmentation in High-Frequency
Ultrasound Images
1 Introduction
Atopic dermatitis is a chronic inflammatory skin disease characterized by intense
itching and recurrent eczema. It can occur at any age but most often appears
in early childhood (usually between 3 and 6). Its causes are multifactorial and complex, and one of them is the genetic factor. So far, no cure has been developed [2].
Psoriasis is a chronic and recurrent immune-mediated disease of the skin and
joints. It has several clinical skin symptoms, but chronic symmetrical erythema-
tous scaly papules and plaques are the most common manifestation. It can occur
at any age. One of its causes is a strong genetic component, but environmental factors, such as infections, may also play an essential role [3].
To monitor both mentioned diseases, HFUS (high-frequency ultrasound,
>20 MHz) is now often used. It is a non-invasive method that enables the differ-
entiation of skin structures on a micro-scale [4].
Ultrasound images of inflammatory skin diseases show a band with low
echogenicity – SLEB layer (subepidermal low echogenic band). The thickness
of the SLEB layer may be an indicator of the disease’s severity, and its measure-
ment over time may also be used to monitor the effects of the applied therapy
[4].
Due to the growing popularity of skin imaging using ultrasound, it is neces-
sary to develop image processing techniques dedicated to this issue, which will
allow, for example, segmentation of skin layers or detection of skin lesions. One
of the possible solutions is the use of deep neural networks.
segmentation results in layered images [11]. The last network, CFPNet-M, was chosen due to its smaller size compared to the previously mentioned ones, despite a similar application area [12]. An important factor that influenced the choice of networks was the year of publication: preference was given to the latest solutions. The U-Net is the oldest one; however, it is often described as a reference solution in the literature. Implementations of the above architectures were taken from [11–14].
To prevent false detection of the epidermis, the input images were limited to the region of interest in one of our experiments. Based on the literature, it was assumed that this area spans from 0.5 mm above the top of the epidermis to 2 mm below its valley [6,10,13,21,23]. After that, the ROI images were scaled to the size of 128 × 256 pixels. The designated region of interest in the original image and the expert mask is marked in Fig. 1 with a red frame.
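In code, the ROI extraction reduces to converting the millimetre margins into pixels and cropping before resizing; a Python sketch assuming the epidermis top and valley rows are already known and that px_per_mm comes from the scanner calibration:

```python
from skimage.transform import resize

def extract_roi(image, top_row, valley_row, px_per_mm):
    """Crop 0.5 mm above the epidermis top and 2 mm below its valley,
    then scale the crop to the 128 x 256 network input size."""
    r0 = max(0, int(top_row - 0.5 * px_per_mm))
    r1 = min(image.shape[0], int(valley_row + 2.0 * px_per_mm))
    return resize(image[r0:r1, :], (128, 256),
                  preserve_range=True, anti_aliasing=True)
```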
2.4 Augmentation
Due to the limited number of images in the dataset, the augmentation step is expected to potentially improve the segmentation results. Therefore, three variants of the data set were applied: original data (380 images), data with
2.5 Training
The last studied parameter is the optimizer, and the two considered optimiz-
ers are Stochastic Gradient Descent (SGD) and Adaptive Moment Estimation
(Adam). The training setup was as follows: for SGD optimizer, the learning
rate was equal to 0.01, and momentum was equal to 0, whereas, for Adam opti-
mizer, the learning rate was equal to 0.001, the exponential decay rate for the
1st moment estimates was equal to 0.9, the exponential decay rate for the 2nd
moment estimates was equal to 0.999, and the epsilon was equal to 1 × 10−7 .
For all the architectures, categorical cross-entropy was used as the loss function.
Each model was trained over 100 epochs with a batch size of 2. The architectures were implemented in Python using the Keras and TensorFlow libraries in the Colab environment.
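The stated setup maps directly onto Keras; a sketch reproducing the reported hyper-parameters (the model argument stands in for any of the three architectures):

```python
import tensorflow as tf

# Optimizers exactly as reported: SGD (lr 0.01, momentum 0) and
# Adam (lr 0.001, beta1 0.9, beta2 0.999, epsilon 1e-7).
sgd = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.0)
adam = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9,
                                beta_2=0.999, epsilon=1e-7)

def compile_and_train(model, x_train, y_train, optimizer=adam):
    """Shared schedule: categorical cross-entropy, 100 epochs, batch size 2."""
    model.compile(optimizer=optimizer, loss='categorical_crossentropy')
    return model.fit(x_train, y_train, epochs=100, batch_size=2)
```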
To assess the segmentation, external k-fold cross-validation was introduced. To analyze the impact of the k value on the obtained results, the two most commonly explored k values were selected: k = 5 and k = 10.
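The split itself can be expressed with scikit-learn for either k (a sketch; shuffling and its seed are assumptions):

```python
import numpy as np
from sklearn.model_selection import KFold

def folds(n_samples, k):
    """Train/test index splits for k-fold cross-validation (k = 5 or k = 10)."""
    kf = KFold(n_splits=k, shuffle=True, random_state=0)
    return list(kf.split(np.arange(n_samples)))
```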
allowed for achieving the highest segmentation results for both segmented layers among all the analyzed variants. It also returns better results with the SGD optimization method than the U-Net and CFPNet-M networks. U-Net performed best in the case of a small data set (without augmentation); its results are better for smaller images and the Adam optimizer. Moreover, the Adam optimizer works better than SGD for most of the considered approaches (excluding DC-UNet). CFPNet-M achieves higher results than U-Net for larger images (512 × 256). Compared to the DC-UNet network, it performs better only on the original image data set with the image size 512 × 256. In the case of the ROI data set without augmentation, the U-Net and CFPNet-M networks provide similar results, better than the DC-UNet architecture. In turn, for the set with augmentation, the DC-UNet network achieved the highest values.
Since the network’s output is a probability map, the influence of the binariza-
tion threshold value on the final segmentation results is also reviewed. It shows
that the highest values were achieved for the threshold equal to 0.4. Thresh-
olds above the value of 0.6 were too restrictive, and those below 0.3 were too
imprecise.
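Applied to the network output, the binarization is a single comparison; a sketch with the best-performing threshold:

```python
import numpy as np

def binarize(prob_map, threshold=0.4):
    """Turn the per-pixel probability map into a binary segmentation mask;
    0.4 scored best, above 0.6 was too restrictive, below 0.3 too imprecise."""
    return (np.asarray(prob_map) >= threshold).astype(np.uint8)
```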
Figures 6 and 7 show the boxplots for each network and the combinations
of parameters, which obtained the highest values of the segmentation quality
assessment coefficients for the epidermal layer and SLEB. On this basis, we can
conclude that the DC-UNet network achieves the highest quality segmentation
of both layers among the analyzed architectures: 0.893 Jaccard and 0.943 Dice
index for the epidermal layer, and 0.869 Jaccard and 0.93 Dice index for the
SLEB layer. Exemplary segmentation results for the DC-UNet model are shown in Fig. 8.
CFPNet-M provides results of 0.873 Jaccard and 0.932 Dice for epidermal layer segmentation, and 0.844 Jaccard and 0.915 Dice for the SLEB layer. In turn, the results are slightly worse for the U-Net architecture: 0.863 Jaccard and 0.927 Dice for the epidermis, and 0.827 Jaccard and 0.905 Dice for the SLEB layer segmentation. It is also worth noting that these results were achieved for the models trained on the set with augmentation v1 and images of size 512 × 256, i.e., the parameters for which DC-UNet works best.
The region of interest selection step does not provide the highest result among all the analyzed models. However, in the case of the U-Net architecture, reducing the image area to the ROI improves segmentation with the SGD optimizer. Similar results were obtained for the original images resized to 256 × 128 pixels and the Adam optimizer.
Applying the augmentation significantly improved the obtained results, especially variant v1. Since the data augmentation methods differed only in the angle of rotation of the images (10° for v1 and 20° for v2), we can conclude that too intensive rotation of ultrasound images (which have a layered nature, with successive layers arranged in parallel) causes a deterioration of the segmentation quality. The next parameter analyzed was the size of the images. It was noticed that the segmentation of skin layers is more accurate for larger images, which can be explained by information loss during the interpolation step. For the two optimization methods considered during network training, the Adam optimizer results in significantly better segmentation than SGD.
The last analyzed parameter was the number of subsets used in the k-fold cross-validation. It is not a framework parameter; however, the validation strategy may strongly influence the obtained results. For an accurate evaluation of the developed algorithm on an extensive data collection, it is sufficient to use a smaller k (e.g., 5), whereas for small data sets, k needs to be higher (e.g., 10). The highest results in our experiments were obtained when dividing the set into 10 parts, as the networks were then trained on larger image sets.
The obtained results were also compared with the SegUNet described in [6], applied to the same dataset. The Dice indexes reported there are equal to 0.874 for the epidermis and 0.829 for the SLEB layer, respectively, worse than the currently selected model. From our observation, the CNN training results depend on the development environment and the libraries used (even for the same architectures).
It is worth mentioning that our analysis considered only the models and parameters with promising results described in the literature. The analysis could be complemented with other models or additional training parameters (e.g., loss function, mini-batch size, number of epochs); however, such an analysis requires additional time and hardware resources. In our opinion, the selected setup and the conclusions coming from the results can be beneficial for further works in the HFUS image segmentation area.

Fig. 2. The comparison results of different epidermis layer segmentation methods illustrated in a histogram
Table 4. Results of skin layer segmentation for ROI data set achieved by U-Net archi-
tecture
Table 5. Results of skin layer segmentation for ROI data set achieved by DC-UNet
architecture
Fig. 3. The comparison results of different SLEB layer segmentation methods illus-
trated in a histogram
Fig. 4. The comparison results of different epidermis layer segmentation methods for
ROI data set illustrated in a histogram
Fig. 5. The comparison results of different SLEB layer segmentation methods for ROI
data set illustrated in a histogram
Fig. 6. Boxplots for epidermis layer segmentation according to the network architec-
ture. Each box covers a 25th to 75th percentile range with median value given and
indicated by a central line. The extreme values are bordered by whiskers
Fig. 7. Boxplots for SLEB layer segmentation according to the network architecture.
Each box covers a 25th to 75th percentile range with median value given and indicated
by a central line. The extreme values are bordered by whiskers
4 Conclusion
This study presents the results of analyzing the impact of neural network archi-
tecture and region of interest selection on skin layer segmentation in high-
frequency ultrasound images. In the analysis, we considered the influence of
image size, data augmentation, applied optimization method, and binarization
threshold. Additionally, the influence of k value used in k-fold cross-validation
was investigated.
The most critical concern of this analysis was the network architecture, which
seems to be strongly related to the size of the analyzed images and the augmen-
tation strategy. From our observation, the U-Net performed better for a small
set without augmentation and images of a smaller size. The CFPNet-M pro-
vides higher accuracy than U-Net for larger images. On the other hand, the
DC-UNet network was the best solution for the set with augmentation. The
region of interest selection improved segmentation quality for all architectures
using the SGD optimization method. However, it does not achieve the highest
results among all analyzed models. From this, we can conclude that this step is
not necessary to limit erroneous detection of the epidermis and does not bring
the expected improvement in segmentation. For the optimization techniques,
the Adam optimizer proved to be much better in this matter. The augmentation
step significantly improved the segmentation results, but limited rotation works better for layered images. Finally, the larger the image size, the better the results; however, it strongly influences the training time.
Acknowledgement. This research was funded by the Polish Ministry of Science and
Silesian University of Technology statutory financial support No. 07/010/BK 22/1011.
References
1. Czajkowska, J., Badura, P., Platkowska-Szczerek, A., Korzekwa, S.: Data for: Deep
Learning Approach to Skin Layers Segmentation in Inflammatory Dermatoses,
Mendeley Data. https://doi.org/10.17632/5p7fxjt7vs.1
2. Langan, S.M., Irvine, A.D., Weidinger, S.: Atopic dermatitis. Lancet 396(10247), 345–360 (2020). https://doi.org/10.1016/s0140-6736(20)31286-1
3. Langley, R.G., Krueger, G.G., Griffiths, C.E.: Psoriasis: epidemiology, clinical features, and quality of life. Ann. Rheum. Dis. 64(Suppl 2), ii18–ii23 (2005). https://doi.org/10.1136/ard.2004.033217
4. Polańska, A., Dańczak-Pazdrowska, A., Jalowska, M., Żaba, R., Adamski, Z.: Current applications of high-frequency ultrasonography in dermatology. Adv. Dermatol. Allergol. 34(6), 535–542 (2017). https://doi.org/10.5114/ada.2017.72457
5. del Amor, R., et al.: Automatic segmentation of epidermis and hair follicles in optical coherence tomography images of normal skin by convolutional neural networks. Front. Med. 7 (2020). https://doi.org/10.3389/fmed.2020.00220
6. Czajkowska, J., Badura, P., Korzekwa, S., Platkowska-Szczerek, A.: Deep learning approach to skin layers segmentation in inflammatory dermatoses. Ultrasonics 114, 106412 (2021). https://doi.org/10.1016/j.ultras.2021.106412
7. Sciolla, B., Digabel, J.L., Josse, G., Dambry, T., Guibert, B., Delachartre, P.: Joint segmentation and characterization of the dermis in 50 MHz ultrasound 2D and 3D images of the skin. Comput. Biol. Med. 103, 277–286 (2018). https://doi.org/10.1016/j.compbiomed.2018.10.029
8. Marosán, P., Szalai, K., Csabai, D., Csány, G., Horváth, A., Gyöngy, M.: Automated seeding for ultrasound skin lesion segmentation. Ultrasonics 110, 106268 (2021). https://doi.org/10.1016/j.ultras.2020.106268
9. Xu, H., Mandal, M.: Epidermis segmentation in skin histopathological images
based on thickness measurement and k-means algorithm. EURASIP J. Image Video
Process. 2015(1), 1–14 (2015). https://doi.org/10.1186/s13640-015-0076-3
10. Siddique, N., Paheding, S., Elkin, C., Devabhaktuni, V.: U-net and its variants for
medical image segmentation: a review of theory and applications. IEEE Access 9,
82031–82057 (2021). https://doi.org/10.1109/ACCESS.2021.3086020
11. Lou, A., Guan, S., Loew, M.H.: DC-UNET: rethinking the U-Net architecture with
dual channel efficient CNN for medical image segmentation. In: Medical Imaging
2021: Image Processing, SPIE, vol. 11596, pp. 115962T (2021). https://doi.org/10.
1117/12.2582338
12. Lou, A., Guan, S., Loew, M.: CFPNet-M: a light-weight encoder-decoder based
network for multimodal biomedical image real-time segmentation. arXiv preprint
arXiv:2105.04075 (2021). https://doi.org/10.48550/ARXIV.2105.04075
13. Sterbak, T.: U-net for segmenting seismic images with keras. https://www.depends-on-the-definition.com/unet-keras-segmenting-images/, April 2020
14. Chen, J.: OCT-image-segmentation-ml. https://github.com/jessicaychen/OCT-Image-Segmentation-ML, August 2020
Skin Lesion Matching Algorithm
for Application in Full Body Imaging
Systems
Maria Strakowska and Marcin Kociolek
Abstract. Full body imaging systems (FBS) have recently gained attention as an efficient tool for patient screening in early melanoma detection. Their advantage is the ability to detect suspicious changes that appear in places that are difficult for the patient to see independently (e.g., on the back), as well as to observe newly formed changes and detect the growth of existing nevi. An essential part of FBS software is a lesion matching algorithm that enables pairing lesions detected during the patient's follow-up examination. This paper proposes such an algorithm, based on feature matching followed by triangulation. It was demonstrated that the proposed method provides relatively fast and accurate lesion matching. The obtained sensitivity and precision, at the level of 85.9% and 86.9% respectively, satisfy the requirements defined for the FBS specification, which is currently under development.
1 Introduction
Skin cancer takes its death toll worldwide, with an observable upward trend.
However, if detected early, the chances of recovery are very high. Detection of this
neoplasm has long been supported by the development of image analysis meth-
ods that have been applied in the discrimination of pathological skin lesions [17].
Many algorithms for the detection and classification of skin neoplastic changes
have been described in the literature, good reviews of such methods can be found
in [3,13]. There are also many mobile applications devoted to analysis of skin
lesions that use photos taken with a smartphone [4]. Recently, full body (or
whole-, total-body) systems (FBS) have also been developed, which provide the
possibility of taking photographs of the patient’s entire body (and not just indi-
vidual moles). The advantage of this approach is the ability to detect suspicious
changes that appear in places that are difficult for the patient to see indepen-
dently (e.g., on the back). Another advantage of such systems is the possibility
of observing newly formed changes and detecting the growth of existing nevi,
Each pair contains images taken of the same person over a period of time. Both pictures of a pair are taken at the same height and from the same direction. We used images of both men and women, and the image pairs were taken from various directions. For anonymization, the image pairs in this study were cropped so that the patient's head was not visible. There were from 8 to 143 melanocytic nevi in the individual images.
Neural networks are reliable tools for many medical image analysis tasks [5,7,11,14]. Also in this study, the nevus detection procedure [15] was based on a deep learning network (YOLO3 [12]). As a result, a binary mask was obtained containing the areas of the detected nevi, along with vectors containing the centroids and bounding boxes of individual nevi. An example of a mask with discovered nevi superimposed on the input image is shown in Fig. 2.
The goal is to match the lesions from images taken at different times, e.g., with an interval of six months. The result of the mole matching algorithm is a list of indexes of matched moles which correspond to the same spots on the patient's skin. Although the images mentioned above are taken in the same reference position of the patient's body, it is not easy to match corresponding lesions. This results from differences in the patient's body posture, changes in their weight, skin tan, the lingerie worn, etc. Most importantly, the character of the lesions could change: their size, shape, and color could evolve, which is in fact the reason to match and compare them. The block scheme of the developed algorithm is shown in Fig. 3.

Fig. 2. An example of a binary mask with discovered nevi superimposed on the input image (light green spots in the image).
Two main stages can be indicated. The first step is based on feature matching, the second on triangulation using the spots from stage one.
Keypoints and their descriptors for both images are found. Keypoints are the points of interest found by the algorithm. A descriptor is the set of values that describes a keypoint. By comparing the similarities of descriptor sets, two different keypoints can be matched. A Brute Force Matcher is used to match these descriptors together and find similarities between the images. Tests of different types of feature detectors show that AKAZE features give the best results: they ensure the highest number of keypoint matches, both overall and valid ones, which indicate the same lesion on the skin. An exemplary result of detected AKAZE features and matching is shown in Fig. 4(a).
Fig. 4. AKAZE keypoints matched by the Brute Force Matcher algorithm: (a) all matched keypoints, (b) keypoints detected for the marked lesion (rectangle)
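An OpenCV sketch of this stage (grayscale uint8 inputs assumed; binary AKAZE descriptors are compared with the Hamming norm, and cross-checking approximates the brute-force pairing described above):

```python
import cv2

def match_akaze(img1, img2):
    """Detect AKAZE keypoints and descriptors in both images and
    brute-force match the descriptors, best matches first."""
    akaze = cv2.AKAZE_create()
    kp1, des1 = akaze.detectAndCompute(img1, None)
    kp2, des2 = akaze.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    return kp1, kp2, sorted(matches, key=lambda m: m.distance)
```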
The pairs of matched keypoints are the intermediate result of this stage of the algorithm. Each lesion can have many keypoints (Fig. 4(b)), and not all of them can be paired with keypoints corresponding to the same lesion in the second image. As a result, an additional operation must be performed to obtain unambiguous pairs of lesions. Simple voting for the highest number of connections between the keypoints of the lesions has been used to get the final matching presented in Fig. 5.
Many lesions, especially healthy ones, can be very similar to each other. Therefore, the above method may not work correctly in every case and can generate wrong matches, such as those seen in Fig. 5. The most obvious wrong matches are easy to find, as the lines connecting the two images have a different angle from most of the others. To solve this problem, the RANSAC (Random Sample Consensus) [6] algorithm has been used to remove the wrong matches and leave only those that agree with the detected model. The result of the first stage of the algorithm is shown in Fig. 6.
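The paper does not name the fitted model; a common OpenCV realization is a RANSAC-estimated homography whose inlier mask prunes the match list (the reprojection threshold is an assumption of this sketch):

```python
import cv2
import numpy as np

def ransac_filter(kp1, kp2, matches, reproj_thresh=5.0):
    """Keep only matches consistent with one RANSAC-fitted homography;
    at least four matches are required for the estimation."""
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    _, inliers = cv2.findHomography(src, dst, cv2.RANSAC, reproj_thresh)
    return [m for m, keep in zip(matches, inliers.ravel()) if keep]
```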
The indicated pairs of lesions have been created correctly. However, many lesions have not been matched, as can be seen by comparing Fig. 4(a) with Fig. 6.
Fig. 7. (a) Unmatched lesion in image 1 and (b) the corresponding area where the pair is searched for
The unmatched lesions form triangles with the matched ones found in their nearest neighborhood. The operation is performed on the first image, with UL1 being the lesion currently considered. The next sets of triangles are created on the follow-up image with the unmatched spots that meet the neighborhood condition and share the same matched reference lesions as in image 1. Figure 8 illustrates this operation.
Fig. 8. Triangles created on the same vertices of matched (dots) and unmatched (crosses) lesions: (a) first potential pair to lesion no. 1, (b) second potential pair to lesion no. 1
Let’s assume that we have 4 matched lesions (dots) and 3 unmatched ones
(crosses) – Fig. 8. Now let’s try to find the pair to the lesion no. 1 in the first
image (U L1 ). The position condition defined by the circle radius is met in image
2 for points no. 1 and no. 2 (U L1.1 \U L1.2 ). Only these points are the candidates
to be paired with U L1 . Next, the set of triangles with the common unmatched
Skin Lesion Matching 229
lesion (U L1 ) and the M L from algorithm stage 1 are created. Considering four
ML in the neighborhood of U L1 , three triangles (t1 ,t2 ,t3 ) will be created for the
U L1 in the image 1 – Fig. 8(a). Coordinates of the centers of these lesions are
the vertices of build triangles. Common vertex for all triangles is the center of
unmatched lesion U L1 . Taking the same M L in the image 2, the new sets of the
corresponding triangles are created. These sets of triangles are (t1.1 , t2.1 , t3.1 ) for
U L1.1 (Fig. 8(a)) and (t1.2 , t2.2 , t3.2 ) for U L1.2 (Fig. 8(b)). The final matching
is performed by analyzing the similarity of the set of triangles created for each
individual unmatched lesions from follow-up image. To perform this task, two
matrices of parameters are calculated according to Eq. (2) and (3).
$$sss(m, n) = \mathrm{std}\left(\frac{a_n}{a_{n.m}}, \frac{b_n}{b_{n.m}}, \frac{c_n}{c_{n.m}}\right), \qquad (2)$$

$$aaa(m, n) = \mathrm{std}\left(\frac{\alpha_n}{\alpha_{n.m}}, \frac{\beta_n}{\beta_{n.m}}, \frac{\gamma_n}{\gamma_{n.m}}\right), \qquad (3)$$

where $a_{n.m}$, $b_{n.m}$, $c_{n.m}$ are the triangle sides, $\alpha_{n.m}$, $\beta_{n.m}$, $\gamma_{n.m}$ the triangle angles, $n$ is the number of triangles for $UL_1$ created with the same reference matched lesions in images 1 and 2, and $m$ is the number of candidate lesions (from the follow-up image) to be paired with $UL_1$ from image 1.
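Both parameter matrices are one vectorized expression in Python (a sketch; side and angle triples are assumed to be listed in corresponding order):

```python
import numpy as np

def similarity_matrix(tri1, tri2_candidates):
    """Eqs. (2)-(3): for candidate m and triangle n, the standard deviation
    of the three side (or angle) quotients; near-zero means near-similar.
    tri1: (n, 3) sides or angles in image 1; tri2_candidates: (m, n, 3)."""
    tri1 = np.asarray(tri1, dtype=float)
    tri2 = np.asarray(tri2_candidates, dtype=float)
    return np.std(tri1[None, :, :] / tri2, axis=2)   # shape (m, n)
```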
In this way, the array of triangle similarity parameters is created. The number of rows is the number of unmatched lesions in the second image that could be a valid match to the selected unmatched lesion in image 1. The number of columns indicates the number of triangles created with the reference lesions. The parameter calculation is based on the Side-Side-Side (sss(m, n)) and Angle-Angle-Angle (aaa(m, n)) rules of triangle comparison. The reduction to a single value is made by calculating the standard deviation of the quotients of the corresponding sides or angles of a given triangle: the closer the standard deviation is to zero, the more similar the triangles. Finally, the voting takes place. Each value in the matrix casts a positive vote if it does not exceed the threshold value thvote. Such voting takes place for both types of parameters (sides and angles) and aims to find the most similar triangles. The candidate lesion with the highest number of votes is chosen as the pair to the considered unmatched lesion. This part of the algorithm has two iterations; the result of the first one is shown in Fig. 9(a). To increase the number of matches, a second iteration is performed. The triangles are now created on new vertices, as the number of matched lesions (the reference points for the triangles) has increased. It is also assumed that the area of interest was too small in the previous iteration, so some pairs of unmatched lesions in the second image were not inside it. In the second iteration, this area (cirRad) is extended 2 times. The result after this operation is shown in Fig. 9(b).
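The voting itself is a thresholded count over both matrices; a sketch (the thvote value is not given in this excerpt, and the no-vote fallback is an assumption):

```python
import numpy as np

def choose_pair(sss_mat, aaa_mat, th_vote):
    """Every sss or aaa entry below th_vote casts one vote for its candidate
    row; the candidate with the most votes is paired with the lesion."""
    votes = ((np.asarray(sss_mat) < th_vote).sum(axis=1)
             + (np.asarray(aaa_mat) < th_vote).sum(axis=1))
    return int(np.argmax(votes)) if votes.max() > 0 else None
```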
Figure 10 shows the result for all matched lesions paired by the developed algorithm, both by the feature matching method and by triangulation.
Fig. 9. New matches detected by the method of triangles: (a) first iteration, (b) second iteration
3 Results
In total, the melanocytic nevus detection algorithm found 599 nevi in the ini-
tial images and 648 nevi in the follow-up images. Table 1 summarizes the test
performed.
Table legend: Total: all possible nevi pairs in all analysed images; for each image pair, the number of nevi pairs in the original and follow-up image equals the product of the numbers of lesions found in both images, and the total number of nevi pairs is the sum over all analysed image pairs. Actually Positive (P): correct nevi pairs found by the manual review. Actually Negative (N): incorrect nevi pairs rejected during the manual review. True Positive (TP): nevi pairs found by the lesion matching algorithm and confirmed during the manual review. False Positive (FP): nevi pairs found by the lesion matching algorithm but rejected during the manual review. True Negative (TN): all incorrect nevi pairs rejected both by the lesion matching algorithm and during the manual review. False Negative (FN): nevi pairs not detected by the lesion matching algorithm but found during the manual review.
Based on the examined image pairs, the detected nevi can be combined into 34,688 pairs. Of course, most of these pairs can be easily rejected, e.g., due to the significant vertical and/or horizontal shift between most nevi. After careful manual analysis by a specialist, 433 correct pairs were found. Our matching algorithm identified 428 pairs, 56 of which were false positives. The algorithm failed to discover 61 (false negative) pairs. Table 2 shows some basic statistics of the performed test.
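From these figures, the reported quality measures follow directly: TP = 428 − 56 = 372 confirmed pairs, hence sensitivity = TP/(TP + FN) = 372/(372 + 61) ≈ 85.9% and precision = TP/(TP + FP) = 372/428 ≈ 86.9%, consistent with the values quoted in the abstract.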
achieves a precision of almost 100%, but the average sensitivity is 79%. Also, the algorithm proposed there is quite complex and rather inapplicable to systems that should operate relatively fast (due to the limited visit time of each patient).
There is still room for further improvement. The greatest number of misidentified pairs concerned moles located in places where the skin surface was inclined at an angle deviating significantly from 90°. Currently, our method is based solely on the information contained in the images. In the near future, we intend to supplement it with information from a 3D model. In that case, the analysis will cover only those pairs of marks whose inclination relative to the camera axis is close to a right angle; the remaining nevi will be matched based on imaging taken from other directions. Our acquisition system [15,16] in its current configuration performs imaging from 8 directions around the patient's axis. This allows limiting the range of angles at which the nevi are analyzed to around 65°–115°. Nevi matching accuracy is also influenced by the quality of the nevi segmentation algorithm: improper segmentation may change the lesion shape, which will adversely affect the feature calculation step in the matching algorithm. Although our CNN-based segmentation method [15] is quite accurate (both precision and sensitivity above 90% were obtained), we are still working on improving it. This is important because the binary masks of detected and segmented lesions obtained from the previous step of the developed system are used as the input data of the presented matching algorithm.
The algorithms were developed and tested based on the data obtained so far (patients' images acquired by the project partner, Skopia Estetic Clinic). Currently, these algorithms are being validated on the emerging images of new patients. In the future, the algorithm will be used in a dermatology clinic for screening. Prior to that, the software will be tuned to the specific vision system (whose parameters may influence the algorithm performance) and tested on a larger number of images.
Acknowledgement. This work has been supported by the European Regional Fund
and National Centre for Research and Development project POIR.04.01.04-00-0125/18-
00 “Development of a device to support early stage examination of skin lesions, includ-
ing melanoma using computer imaging, spatial modeling, comparative analysis and
classification methods”, implemented by the Skopia Estetic Clinic Sp. z o.o. and the
Lodz University of Technology.
References
1. Alcantarilla, P.F., Bartoli, A., Davison, A.J.: KAZE features. In: Lecture Notes in Computer Science, vol. 7577, pp. 214–227. Springer (2012). https://doi.org/10.1007/978-3-642-33783-3_16
2. Alcantarilla, P.F., Nuevo, J., Bartoli, A.: Fast explicit diffusion for accelerated features in nonlinear scale spaces (2011). http://www.bmva.org/bmvc/2013/Papers/paper0013/abstract0013.pdf
3. Barata, C., Celebi, M.E., Marques, J.S.: A survey of feature extraction in der-
moscopy image analysis of skin cancer. IEEE J. Biomed. Health Inform. 23, 1096–
1109 (2019). https://doi.org/10.1109/JBHI.2018.2845939
4. de Carvalho, T.M., Noels, E., Wakkee, M., Udrea, A., Nijsten, T.: Development of
smartphone apps for skin cancer risk assessment: progress and promise. JMIR Der-
matol. 2019 2(1), e13376 (2019). https://doi.org/10.2196/13376, https://derma.
jmir.org/2019/1/e13376
5. Chrzanowski, L., Drozdz, J., Strzelecki, M., Krzeminska-Pakula, M., Jedrzejewski,
K.S., Kasprzak, J.D.: Application of neural networks for the analysis of intravas-
cular ultrasound and histological aortic wall appearance - an in vitro tissue char-
acterization study. Ultrasound Med. Biol. 34, 103–113 (2008). https://doi.org/10.
1016/J.ULTRASMEDBIO.2007.06.021
6. Fischler, M.A., Bolles, R.C.: Random sample consensus. Commun. ACM 24, 381–
395 (1981). https://doi.org/10.1145/358669.358692, https://dl.acm.org/doi/abs/
10.1145/358669.358692
7. Gentillon, H., Stefańczyk, L., Strzelecki, M., Respondek-Liberska, M.: Parameter
set for computer-assisted texture analysis of fetal brain. BMC Res. Notes 9, 1–
18 (2016). https://doi.org/10.1186/S13104-016-2300-3/TABLES/2. https://link.
springer.com/articles/10.1186/s13104-016-2300-3
8. Korotkov, K., et al.: An improved skin lesion matching scheme in total body pho-
tography. IEEE J. Biomed. Health Inform. 23, 586–598 (2019). https://doi.org/
10.1109/JBHI.2018.2855409
9. Korotkov, K., Quintana, J., Puig, S., Malvehy, J., Garcia, R.: A new total body
scanning system for automatic change detection in multiple pigmented skin lesions.
IEEE Trans. Med. Imaging 34, 317–338 (2015). https://doi.org/10.1109/TMI.
2014.2357715
10. Mirzaalian, H., Lee, T.K., Hamarneh, G.: Skin lesion tracking using structured
graphical models. Med. Image Anal. 27, 84–92 (2016). https://doi.org/10.1016/J.
MEDIA.2015.03.001
11. Obuchowicz, J.R., Kruszyńska, M.S.: Classifying median nerves in carpal tun-
nel syndrome: ultrasound image analysis. Biocybern. Biomed. Eng. 41, 335–351
(2021). https://doi.org/10.1016/j.bbe.2021.02.011
12. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified,
real-time object detection, pp. 779–788 (2016)
13. Saba, T.: Recent advancement in cancer detection using machine learning: sys-
tematic survey of decades, comparisons and challenges. J. Inf. Public Health 13,
1274–1289 (2020). https://doi.org/10.1016/J.JIPH.2020.06.033
14. Strzelecki, M.: Texture boundary detection using network of synchronised oscilla-
tors. Electron. Lett. 40, 466–467 (2004). https://doi.org/10.1049/EL:20040330
15. Strzelecki, M.H., Strakowska, M., Kozlowski, M., Urbańczyk, T., Wielowieyska-Szybińska, D., Kociolek, M.: Skin lesion detection algorithms in whole body images.
Sensors 21, 6639 (2021). https://doi.org/10.3390/S21196639, https://www.mdpi.
com/1424-8220/21/19/6639/htm
16. Szczypiński, P.M., Sprawka, K.: Orthorectification of skin nevi images by means of
3d model of the human body. Sensors 21, 8367 (2021). https://doi.org/10.3390/
S21248367, https://www.mdpi.com/1424-8220/21/24/8367/htm
17. Zalewska, A., Strzelecki, M., Sugut, J.: Implementation of an image analysis system
for morphological description of skin mast cells in urticaria pigmentosa. Med. Sci.
Mon. 3, 260–265 (1997)
Artifact Detection on X-ray of Lung
with COVID-19 Symptoms
1 Introduction
The current COVID-19 pandemic [1] occurring around the world prompts attempts to develop automated methods for recognizing the symptoms of SARS-CoV-2 [2] on chest X-rays. Among different imaging modalities, the chest X-ray (CXR) is used to recognize SARS-CoV-2 symptoms due to its low cost, low radiation dose, wide accessibility, and ease of operation in general or community hospitals [3]. One of the significant limitations of this diagnostic imaging is the large number of various artifacts that limit the effectiveness of the diagnosis.
There have been reported cases in which radiographic artifacts have mimicked potentially more serious pathology. For example, an X-ray artifact caused a diagnostic problem in a child followed up for developmental dysplasia of the hip joint [4]; another time, underpants with a highly recognizable Snoopy showed a lobulated density overlying the pelvic area [5]. In general, the presence of artifacts can significantly affect the results of computer analysis [6]. Serious problems arise in developing effective methods for computerized COVID-19 symptom recognition to support diagnosis, especially with deep learning methods. An extremely timely and important challenge is to increase the reliability of models and the efficiency of recognition methods by reducing the impact of unwanted, often unavoidable artifacts on the results of CXR image analysis and diagnostic interpretation [7]. This work focuses on developing effective preprocessing methods to detect these artifacts and to identify segmented areas to be excluded from further computerized analysis, so that disease symptoms can be recognized more reliably. This also helps address the black-box problem in analyses performed with deep models used to detect COVID-19 symptoms.
Fig. 1. Chest X-ray with ACQUISITION ARTIFACTS: sliders in a bra for adjustable shoulder straps (a), adjustable hook-and-eye fastening (b), underwire and rings in a bra (c), buttons (d), necklace (e), other (f). Chest X-ray with SURGICAL ARTIFACTS: pacemaker (g), electrode (h) and cable (i)
of this type of artifact, pixels of different intensity are observed, because some elements are made of metal and some of plastic, and they absorb X-rays to different extents.
The presence of image artifacts makes the diagnostically significant content difficult to enhance in the image. In the case of CXR, the artifacts primarily interfere with the effectiveness of computer-aided algorithms for pathology detection. The artifacts are visible enough that they are not a problem for radiologists making a diagnosis. However, both their size and their distribution, often at the lung border or covering the ribs, can interfere with the results of automated methods for detecting pathological findings. A lot of work on image preprocessing to remove redundant image artifacts has been presented in the literature.
U. Subramaniam et al. [9] applied the histogram of oriented gradients algorithm, the Haar transform, and the local binary pattern algorithm to improve image quality, increasing the intersection-over-union scores in lung segmentation from the X-ray. According to the research group, segmenting the lungs from the X-ray can improve the accuracy of COVID-19 detection algorithms or any machine/deep learning techniques. M. Heidari et al. [10] indicate that two-stage image preprocessing, using both a histogram equalization algorithm and a bilateral low-pass filter as well as generating a pseudo-color image, matters in developing a deep learning CAD scheme for chest X-rays to improve accuracy in detecting COVID-19 infected pneumonia. The method of Z. Xue et al. [11] was proposed for the detection of one type of foreign object in the X-ray, i.e., buttons on the gown that the patient is wearing. The method involves four major steps: intensity normalization, low-contrast image identification and enhancement, segmentation of lung regions, and button object extraction. It is based on the circular Hough transform and the Viola-Jones algorithm. An automated method to detect, segment, and remove foreign objects from CXR was presented by Hogeweg et al. [6]. Detection of buttons, brassiere clips, jewellery, or pacemakers and wires was performed using supervised pixel classification with a kNN classifier, resulting in a per-pixel probability estimate of belonging to an artifact. Grouping and post-processing pixels with a probability above a certain threshold was used for radiographic artifact segmentation. Based on a literature review of methods for the automatic detection of COVID-19 symptoms in the X-ray, there seems to be a need for the detection and removal of various types of foreign objects in a simple and low-cost algorithmic way at the image preprocessing step.
Radiographic artifacts are often present in the lung field, which may erroneously suggest the presence of pathology in the computer-aided diagnosis of lung diseases. The proposed method starts with preprocessing the DICOM image to gray scale and enhancing the contrast. Lung area segmentation is then performed for further analysis. The edges are subsequently detected, and a binary mask image is created, to which morphological dilation and contour filling are applied in the next stage. The shape features of each detected region are analyzed to eliminate regions with a small area or overly elongated elements such as ribs. The method overview is presented in Fig. 2.
concepts. Ultimately, the Sobel operator gave the best results across the filtered training dataset, so it was proposed to establish seed points by thresholding. The K-means clustering algorithm was used to calculate the threshold values, with the number of clusters adjusted to k = 5. From the obtained cluster center values, the first one greater than t = 0.08 was selected. Both parameters were chosen experimentally. The obtained binary image was then dilated with a neighborhood expressed as a 5×5 array of ones to grow the seed points and either merge them into larger masks or discard them. Finally, the lightly smoothed contours are filled in, removing any holes in the finally defined objects.
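A minimal sketch of this seed-point stage, assuming a grayscale image scaled to [0, 1]; scikit-image and scikit-learn are our stand-ins for the authors' implementation, and only k = 5, t = 0.08 and the 5×5 neighborhood come from the text:

```python
# Seed points from Sobel gradients: cluster gradient magnitudes with
# K-means (k = 5), take the first cluster center above t = 0.08 as the
# threshold, then dilate with a 5x5 neighborhood and fill holes.
import numpy as np
from sklearn.cluster import KMeans
from skimage import filters, morphology
from scipy import ndimage

def artifact_seed_mask(image, k=5, t=0.08):
    """image: 2-D float array scaled to [0, 1]."""
    grad = filters.sobel(image)                       # edge magnitude map
    centers = KMeans(n_clusters=k, n_init=10).fit(
        grad.reshape(-1, 1)).cluster_centers_.ravel()
    centers.sort()
    above = centers[centers > t]
    thr = above[0] if above.size else centers[-1]     # first center above t
    seeds = grad > thr
    seeds = morphology.binary_dilation(seeds, footprint=np.ones((5, 5)))
    return ndimage.binary_fill_holes(seeds)           # close remaining holes
```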
The next stage of the proposed method is the analysis of the shapes of the detected objects. First, objects with areas smaller than M = 1500 pixels are rejected. The elongated objects are then examined for a parabolic shape, which indicates that they may be falsely detected ribs. The skeleton of each region is created with Zhang's algorithm [15] in order to detect parabolic-shaped objects. For the extreme left and right skeleton points, the equation of the straight line connecting them is determined, and the middle point between them is checked as to whether it lies above or below that line.
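A sketch of this rib test under the same assumptions; the helper name and the bend tolerance min_offset are ours, as the paper only states that the middle point is compared against the chord:

```python
# Parabola test: skeletonize (Zhang-Suen thinning), take the leftmost and
# rightmost skeleton points, and check how far the skeleton's midpoint
# deviates vertically from the straight line joining them.
import numpy as np
from skimage.morphology import skeletonize

def looks_parabolic(region_mask, min_offset=5):
    skel = skeletonize(region_mask)                  # Zhang's thinning
    rows, cols = np.nonzero(skel)
    left, right = cols.argmin(), cols.argmax()
    x1, y1 = cols[left], rows[left]
    x2, y2 = cols[right], rows[right]
    if x1 == x2:
        return False                                 # degenerate, vertical
    mid_x = (x1 + x2) // 2
    mid_rows = rows[cols == mid_x]
    if mid_rows.size == 0:
        return False                                 # skeleton gap at mid_x
    mid_y = mid_rows.mean()
    line_y = y1 + (y2 - y1) * (mid_x - x1) / (x2 - x1)   # chord at mid_x
    return abs(mid_y - line_y) > min_offset          # bent above/below chord
```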
Fig. 3. Method optimization: (a) before detecting ribs as artifacts, (b) after detecting
only electrode – desired effect, (c) before detecting only the brightness artifacts, (d)
after detecting all artifacts – desired effect
Additional analysis of the shape of the detected objects proved necessary due to
the need to differentiate them relative to the brighter areas of the surrounding
ribs. Another challenge was the simultaneous occurrence of several different arti-
facts with varying pixel intensities. Furthermore, if only the maximum threshold value obtained by the K-means algorithm had been accepted, some artifacts might have been missed. Therefore, a minimum threshold value was determined and the first cluster whose center value was greater than this minimum was accepted. Figure 3 shows two examples of the applied optimization of the artifact segmentation method.
3 Method Results
For the experiments and the evaluation of foreign object detection, a large chest radiogram database was used [13]. The data were collected from Hospital 108 and the Hanoi Medical University Hospital, two of the largest hospitals in Vietnam. The published dataset consists of 18,000 postero-anterior view CXR scans annotated by a group of experienced radiologists for the presence of critical findings. From the database, two types of radiograms were extracted: those with no pathological findings (Fig. 4(a)) and those annotated as consolidation or lung opacity (Fig. 4(b)), because these are the most common pathological findings in CXR of COVID-19 patients [14]. There were 1378 and 1366 radiograms in these categories, respectively. Furthermore, a subset of all radiograms with foreign objects, such as acquisition or surgical artifacts, was selected. For radiograms with findings, we selected 564 cases, but only 270 of them had artifacts within the lungs. In the second group, we selected 393 such cases to at least roughly balance the number of artifacts of interest in both collections (with or without pathological findings) (Table 1).
Fig. 4. Sample chest X-rays from the database with: (a) no pathological findings and slider artifacts, (b) lung opacity and a necklace artifact. Case for manual evaluation – the lung mask (marked in blue) and the expert's mask (marked in green) have a common area (c). Underwires are outside the segmented area of the lung
Using the proposed method, a binary mask of artifacts was created for each image in the chosen dataset. To evaluate the method, for each manually marked region within the lung area, it was checked whether the created mask contained any detected artifacts there. If the mask has marked pixels in this region, it is considered a true positive (TP); otherwise, a false negative (FN). The sensitivity, defined as TP/(TP + FN), is used to describe the method's efficiency. To delineate the regions inside the lung area, the intersection of the lung mask and the mask selected by the expert was determined. Regions with at least 3000 pixels (a value determined experimentally) were assumed to lie inside the lung field. Additionally, manual verification had to be performed: the regions marked by the expert are rectangular in shape and may intersect with the lung area while not containing any artifact in this area. This mainly applies to underwires or necklaces. An example is shown in Fig. 4(c).
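The evaluation protocol can be summarized in a short sketch; the rectangle format and the function names are our assumptions:

```python
# Count an expert rectangle as TP if the detected artifact mask has any
# pixel inside its intersection with the lung mask; intersections below
# 3000 pixels are discarded as lying outside the lung field.
import numpy as np

def evaluate(detected_mask, lung_mask, expert_boxes, min_pixels=3000):
    tp = fn = 0
    for (r0, c0, r1, c1) in expert_boxes:            # expert rectangles
        region = np.zeros_like(lung_mask, dtype=bool)
        region[r0:r1, c0:c1] = True
        region &= lung_mask.astype(bool)             # keep the lung part only
        if region.sum() < min_pixels:                # outside the lung field
            continue
        if (detected_mask.astype(bool) & region).any():
            tp += 1
        else:
            fn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return tp, fn, sensitivity
```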
Selected results of the proposed method are presented in Fig. 5(a)–5(h). All show chest X-rays with ground truth artifact markers (green rectangles), the lung area (blue contours), and the detected artifact mask marked in yellow. Only artifacts within the lung area should be found.
All artifacts in Fig. 5(a)–5(c) were correctly detected and marked. Figure 5(d) presents a fully detected electrode but without the wire, which was probably removed during the filtering of rib-like elements. Figure 5(g) shows that elements of the bra were not detected in the left lung. This may be caused by brighter regions in the left lung and their closeness to the lung's periphery. The problem near the lungs' periphery is also visible in Fig. 5(e), where artifacts are only partially detected. Other problematic areas might be the clavicles because of their high contrast on chest X-rays. For example, in Fig. 5(f) buttons located on the clavicles were not fully marked. During edge detection, ribs can easily be found and marked. Part of the proposed method is filtering objects like ribs, but it is not always successful, especially when artifacts are connected to them because of the low contrast between the rib and the foreign object (Fig. 5(h)).
For normal cases, the proposed method correctly detected 288 out of 393 artifact regions, which gives a sensitivity of 0.73. In the case of the dataset with COVID-19 findings, the method discovered 204 out of 270 artifacts, so the sensitivity is 0.76. For all cases combined, the method detected 492 out of 663 artifact regions (sensitivity 0.74). The results are summarized in Table 3.
Dataset TP FN Sensitivity
Normal cases 288 105 0.73
Cases with confirmed COVID-19 204 66 0.76
All cases 492 171 0.74
Fig. 5. Correctly detected artifacts on chest X-rays (a)–(b) without pathological find-
ings, (c)–(d) with pathological findings. Partially detected artifacts on chest X-rays
with pathological findings (e)–(g). Detected artifacts on chest X-rays without patho-
logical findings (g)–(h). Lung masks are marked in blue. Expert’s masks of artifacts
are marked in green. Algorithm results of artifact detection are marked in yellow
4 Conclusions
Effective detection of radiographic artifacts can, in some cases, increase the utility of computerized lung segmentation by isolating areas of non-significant diagnostic content (most of the most effective methods do not use additional algorithms for artifact detection). However, the role of the proposed artifact detection method and its usefulness increase significantly in the case of lung interpretation, e.g., for the detection of COVID-19 symptoms.
A challenge for deep learning (DL) methods is to use only those features in the trained model that allow reliable interpretations to support diagnostic decisions. Only substantively justified image content related to the analyzed pathology should be described. An extensive computational model often catches random correlations between the different quality parameters of radiograms coming from different acquisition systems. It selects the most discriminating features, not necessarily related to diagnostically relevant content. The specificity of the radiographic artifacts, dependent on the X-ray machine, the hospital procedures organizing imaging studies, regulations on acquisition conditions, etc., may shape the model much more strongly than subtle pathological changes. By recognizing artifacts and eliminating their influence on further analysis and interpretation of radiograms, we increase the reliability and usefulness of these models.
In other machine learning solutions, by using inpainting methods to cover the area of detected artifacts with the texture of the adjacent tissue, we increase the effectiveness of local descriptors characterizing the distribution of local features of radiograms. This often results in an improvement in the number of accurate diagnoses.
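As an illustration of this inpainting idea (the papers cited here do not prescribe a specific algorithm), OpenCV's Telea inpainting can serve as a stand-in:

```python
# Cover detected artifact pixels with texture synthesized from the
# surrounding tissue using OpenCV's Telea inpainting (our choice).
import cv2
import numpy as np

def remove_artifacts(image_u8, artifact_mask, radius=5):
    """image_u8: 8-bit grayscale CXR; artifact_mask: binary mask."""
    mask = (artifact_mask > 0).astype(np.uint8) * 255
    return cv2.inpaint(image_u8, mask, radius, cv2.INPAINT_TELEA)
```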
The developed and tested method is sufficiently effective to achieve such positive effects, improving the effectiveness of COVID-19 diagnostic support. The presented method: i) is robust to various artifact types; ii) does not require labeled training samples – in this paper, the annotated set was used only for evaluation purposes; iii) can be easily adopted as a preprocessing step in more complex algorithms. The achieved efficiency of artifact detection is limited, mainly due to the highly variable nature of artifacts, the different acquisition parameters of radiographs, and the strongly varying specificity of lung tissue in CXR depending on patient characteristics. However, when combined with an effective segmentation method, it addresses important DL problems highlighted in recent reports [16,17].
References
1. World Health Organization: Global research on coronavirus disease (COVID-
19). https://www.who.int/emergencies/diseases/novel-coronavirus-2019/global-
research-on-novel-coronavirus-2019-ncov
2. Kanne, J.P., et al.: COVID-19 imaging: what we know now and what remains
unknown. Radiology 299(3), 262–279 (2021). https://doi.org/10.1148/radiol.
2021204522
3. Heidari, M., Mirniaharikandehei, S., Zargari, A., et al.: Improving the performance
of CNN to predict the likelihood of COVID-19 using chest X-ray with preprocessing
algorithms. Int. J. Med. Inform. 144, 104284 (2020). ISSN 1386–5056, https://doi.
org/10.1016/j.ijmedinf.2020.104284
4. Uras, I., Yavuz, O.Y., Kose, K.C., Atalar, H., Uras, N., Karadag, A.: Radiographic
artifact mimicking epiphysis of the femoral head in a seven-month-old girl. J. Natl.
Med. Assoc. 98(7), 1181–1182 (2006). PMID: 16895292; PMCID: PMC2569463
5. Mestayer, R.G., Attaway, K.C., Polchow, T.N., Brogdon, B.G.: Snooping around
the adolescent pelvis: good grief, it’s the brief! AJR Am. J. Roentgenol.
186(2), 587–588 (2006). PMID: 16423982. https://doi.org/10.2214/AJR.05.0816
6. Hogeweg, L., et al.: Foreign object detection and removal to improve automated
analysis of chest radiographs. Med. Phys. 40(7), 071901 (2013). PMID: 23822438.
https://doi.org/10.1118/1.4805104
7. Sarkar, A., et al.: Identification of COVID-19 from chest X-rays using deep
learning: comparing COGNEX VisionPro deep learning 1.0 software with open
source convolutional neural networks. SN Comput. Sci. 2(3), 130 (2021)
8. Murphy, A.: Clothing artifact. Case study, Radiopaedia.org. https://doi.org/10.
53347/rID-59812. Accessed 18 Jan 2022
9. Subramaniam, U., Monica Subashini, M., Almakhles, D., Karthick, A., Manoharan,
S.: An expert system for COVID-19 infection tracking in lungs using image processing
and deep learning techniques. BioMed Res. Int. 2021 (2021). Article ID 1896762,
17 pages. https://doi.org/10.1155/2021/1896762
10. Heidari, M., et al.: Improving the performance of CNN to predict the likeli-
hood of COVID-19 using chest X-ray with preprocessing algorithms. Int. J. Med.
Inform. 144, 104284 (2020). ISSN 1386–5056, https://doi.org/10.1016/j.ijmedinf.
2020.104284
11. Xue, Z., et al.: Foreign object detection in chest X-rays. In: IEEE International
Conference on Bioinformatics and Biomedicine (BIBM), 2015, pp. 956–961 (2015).
https://doi.org/10.1109/BIBM.2015.7359812
12. Przelaskowski, A., Jasionowska, M., Ostrek, G.: Semantic segmentation of abnormal lung areas on chest X-rays to detect COVID-19. Submitted to ITIB 2022
13. Nguyen, H.Q., et al.: VinDr-CXR: An open dataset of chest X-rays with radiolo-
gist’s annotations (2020)
14. Wong, H., Lam, H., Fong, A.T., Leung, S., Chin, T.Y., Lo, C., Lui, M.S., Lee, J.,
Chiu, K.H., Chung, T.H., Lee, E., Wan, E., Hung, I., Lam, T., Kuo, M., Ng, M.Y.:
Frequency and distribution of chest radiographic findings in patients positive for
COVID-19. Radiology 296(2), E72–E78 (2020)
15. Zhang, T.Y., Suen, C.Y.: A fast parallel algorithm for thinning digital patterns.
Commun. ACM 27(3), 236–239 (1984)
16. Lopez-Cabrera, J.D., Orozco-Morales, R., et al.: Current limitations to identify
COVID-19 using artificial intelligence with chest X-ray imaging. Health Technol.
11, 411–424 (2021)
17. Maguolo, G., Nanni, L.: A critic evaluation of methods for COVID-19 automatic
detection from X-ray images. Inf. Fusion 76, 1–7 (2021)
Semantic Segmentation of Abnormal
Lung Areas on Chest X-rays to Detect
COVID-19
1 Introduction
Reliable recognition of COVID-19 and possibly precise determination of the
scope, severity and prognosis of the disease development is an extremely impor-
tant research task in the era of the current pandemic. However, until now, the reliability and effectiveness of the commonly used methods for diagnosing C-19 have been limited, debatable and even controversial [1,2]. High hopes are placed
on chest imaging including computed tomography (CT) and CXR. Although
CT imaging is essentially the gold standard for the diagnosis of lung disease
because it generates spatially differentiated, detailed scans, the low specificity
of CT interpretation is well known. In the case of CXR, a limitation is the low
sensitivity in the initial stages of C-19 development, as well as the ambiguity of
the observed symptoms. Therefore, the search for improvement of the diagnostic
methods used so far or the search for new ones is extremely important.
It can be said that the most common tool used for both diagnosis and monitoring of the course of the disease is CXR, available in most health care facilities. Compared to RT-PCR or CT imaging tests, acquiring an X-ray image is an extremely low-cost and fast process taking only a few seconds [3]. This portable modality spares the patient from being moved, minimizing the possibility of spreading the virus, and exposes the patient to a lower dose of ionizing radiation [4]. Reported diagnostic performance of CXR in a large, high-C-19-prevalence cohort was significant. For
example, 79% sensitivity and 81% specificity were reported for the diagnosis of
viral pneumonia in symptomatic patients with clinical suspicion of C-19 [5].
A special role is played by the concepts and specific implementations of
computer-aided lung diagnosis based on digital CXR images. This applies to
recognition, characteristics or clinical interpretation of diseases such as tuberculosis, pneumonia, cardiomegaly, pneumothorax, pneumoconiosis, emphysema or cancer [6–8]. In particular, effective segmentation of the lung regions from the surrounding thoracic tissue turns out to be an essential, very important compo-
nent of any intelligent method of CXR image analysis, disease recognition or
case interpretation in the context of specific clinical circumstances [9,10]. This
task is not easy, especially in view of significantly diverse body habitus, varying
image quality including sampling distortions, multiplicative noise level, possi-
ble motion and low contrast of the imaged objects, presence of artifacts due to
external or internal non-anatomical objects, i.e. introduced medical equipment
including electrocardiographic leads, prosthetic devices etc.
2 Lung Segmentation
An analysis of the state of the literature identified three major groups of algo-
rithms: i) rule-based methods generally used as initialization of more robust pixel
classification-based methods; ii) model-based methods including active shape or
active appearance models and level sets with extensions; iii) deep learning (DL)
models in various configurations. Of course, also various forms of hybrid integra-
tion of these concepts have been proved to be effective in some applications [9,11].
However, this problem has taken on particular importance in the context of C-19
diagnosis. The most effective recently developed methods typically propose CXR-
based DL models, overwhelmingly analyzing entire radiograms without the need
to precisely delineate the lung region. However, this has proven to be unreliable
in some cases [12]. The high performance of disease classification was sometimes
determined significantly by features extracted from outside the lung region alone
[4]. Random correlations completely unrelated to the object of analysis proved
decisive. The trained model in this case did not describe real relationships between
selected image descriptors and reliable symptoms of the disease confirming its diag-
nosis. Using lung segmentation, we force the model to focus only on lung areas or
even a more narrowly defined region within the lung reasoned to model knowledge
regarding C-19 and human related performance. Segmentation may not improve
248 A. Przelaskowski et al.
the classification results, but because it forces you to use only lung area informa-
tion, it increases reliability of the results and justifies them.
The effect of lung segmentation and more detailed CXR content on C-19
identification was evaluated by Teixeira et al. [10]. The hypothesis that a proper
lung segmentation might mitigate the bias introduced by data-driven models
and provide a more realistic performance was verified. The obtained results sug-
gest that proper lung segmentation can effectively remove a meaningful part of
noise and background information, forcing the model to take into account only
data from the lung area containing desired information in this specific context.
However, they found that even after segmentation, there is a strong bias due to
factors from the different specifics of the imaging systems. These experiments
made us realize the importance of narrowing the analysis to areas of interest where actual C-19 symptoms may appear, while reducing the influence of non-anatomical objects and artifacts specific to certain imaging procedures.
Thus, we proposed a method that appeals to domain knowledge primarily con-
cerning the specifics of X-ray imaging, the determinants of C-19 diagnosis, includ-
ing the features of differential symptoms, and the object-oriented description of
lung shape using real, representative CXR manifestations ordered by criteria of a
sparse, semantic representation of lung tissue. The solution closest to our concept is
object-oriented lung description using anatomical atlases (AnAtlas) [8]. The refer-
ence atlas was constructed from a limited set of preselected, possibly representative
chest radiograms with binary lung shape GT (Ground Truth) patterns. Five items
were then selected from this atlas that were most similar to the query in terms of
the two (vertical and horizontal) Radon projections of the tissue intensity distribu-
tion. This robust idea, developed in the context of early detection of tuberculosis,
refers to the concept of patient-specific lung model extraction based on the most similar radiograms preselected according to content-based image retrieval (CBIR), also used for CXR analysis in [13].
The proposed method reduces to the integration of two fundamental con-
cepts: a) effective local characterization differentiating specific features of lung
tissue relevant to the description of C-19 symptoms with respect to other tissues
and useless objects; b) inference of domain knowledge regarding possible features
of lung shape in the context of possible determinants of C-19 diagnosis. It thus combines object-oriented modeling of lung shape with differential tissue characterization, taking into account all anomalies and individual determinants of the analyzed case. In effect, we developed a kind of dedicated segmentation, which can serve to estimate activity maps of features differentiating C-19. These maps can be directly analyzed by diagnosticians or used to construct or optimize models identifying the disease or describing it numerically.
3 Method
Fig. 1. Flowchart of the proposed method for semantic segmentation of abnormal lung
regions on chest X-ray images. In addition to the basic scheme of the method, extensions
to the suggested method for its effective use are also presented. These are highlighted
in yellow. A description of a possible artifact detector is provided in [35]. The next
section provides a brief outline of our test implementations of the C-19 detector, which
uses feature distributions in a designated, diagnostically relevant ROI. The figure sym-
bolically shows selected forms of texture feature interpretation highlighting potential
C-19 symptoms
State of the Art. Among effective methods, median filtering was sometimes
used both for denoising and to enhance the local contrast [17]. Further contrast
enhancement by normalization allows the contrast between the whole chest and
the background area to be increased. The intensity range has been expanded
to take full advantage of all possible intensity ranges [18]. A more effective solution starts with an image energy decomposition across subsequent bands. Then, the localized energy of each band is iteratively scaled to a reference value to reconstruct a locally normalized image [19], giving maximum standardization of the structures of interest in the radiogram. For extraction purposes, 2-D Law's masks have been used so far, among others, to improve the texture information of the segmented chest and highlight its micro-structure characteristics. Nine 3 × 3 Law's texture masks were applied to the initially segmented chest image to produce a new texture image for every convolution mask [18]. An ensemble model, as a group of weak learners forming a powerful learner, increased the precision of a model based on GLCM features calculated from each of the 9 texture images. Efficient extraction of key lung tissue features by unsharp masking was confirmed by Chen [20].
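For reference, the nine 3 × 3 Law's masks mentioned above are the outer products of the level, edge and spot vectors; a minimal sketch of their application:

```python
# The nine 3x3 Law's texture masks: outer products of L3 (level),
# E3 (edge) and S3 (spot), each convolved with the chest image to
# produce one texture image per mask.
import numpy as np
from scipy.signal import convolve2d

L3 = np.array([1, 2, 1])     # level
E3 = np.array([-1, 0, 1])    # edge
S3 = np.array([-1, 2, -1])   # spot

def laws_texture_images(image):
    vectors = [L3, E3, S3]
    return [convolve2d(image, np.outer(a, b), mode="same")
            for a in vectors for b in vectors]       # 9 texture images
```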
The goal is to select dictionary patterns similar to the query lung tissue distribution at the interpretive levels in the context of the supported diagnosis.
Experimentally, we found CW-SSIM (Complex Wavelet Structural Similarity Index Measure) [29] to be the most useful 2D similarity metric. It is invariant under affine transformations and can be used effectively to reliably determine the level of semantic similarity in the domain of predefined multiscale, local features of tissue texture. The scale-invariant feature transform (SIFT), especially effective in image retrieval applications [30], was selected and experimentally verified to determine the domain in which high-level semantic features are calculated. However, we used SIFT maps of only the three fixed, most differential directions (7, 20, 50) to extract coarse-grained essential features of tissue profiling and reduce computational complexity.
A set of GT masks of the selected approximants was used to obtain the corresponding final lung segmentation. The determined parameters of the warping model allow mapping all points of these masks to the target shape of the segmented lungs. The proposed algorithm of lung shape refinement (a code sketch follows the list) thus proceeds as follows:
– warping the most similar masks of the selected approximants to the patient
CXR on the basis of transformations determined in the SIFT feature space;
– the transformation model can be optimized by using local texture features
specified by descriptors that primarily characterize important diagnostic fea-
tures of lung tissue;
– lung shape refinement using transformed lung masks summed together, taking
into account the similarity weights of the corresponding dictionary patterns
to the query radiogram, calculated during query approximation;
– adaptive thresholding determines the initial form of the binary mask of the
query followed by final contour smoothing with cubic splines.
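A minimal sketch of the weighted fusion, thresholding and smoothing steps above; the fusion threshold and the spline smoothing factor are our assumptions, not values from the paper:

```python
# Fuse warped GT masks weighted by their similarity to the query,
# threshold the resulting probability map, and smooth the largest
# contour with a periodic cubic spline.
import numpy as np
from skimage import measure
from scipy.interpolate import splprep, splev

def fuse_lung_masks(warped_masks, weights, level=0.5, smooth=500.0):
    w = np.asarray(weights, dtype=float)
    prob = np.tensordot(w / w.sum(),
                        np.asarray(warped_masks, dtype=float), axes=1)
    binary = prob >= level                            # thresholded mask
    contour = max(measure.find_contours(binary.astype(float), 0.5), key=len)
    tck, _ = splprep([contour[:, 1], contour[:, 0]], s=smooth, per=True)
    x, y = splev(np.linspace(0, 1, 400), tck)         # smoothed lung outline
    return binary, np.stack([x, y], axis=1)
```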
4 Experiments
The research discussed here and the results of the experiments carried out were
oriented toward the effective diagnosis of C-19 in lung radiograms. In our opin-
ion, the key stage confirming the reliability of the obtained results and, above
all, the usefulness of the conclusions formulated on their basis is the semantic
segmentation of the lung area. The use of domain knowledge to indicate pos-
sible areas of expression of C-19 symptoms, taking into account the specificity
of the imaging system used, is essential in the clinically effective interpretation
of the acquired information. The experiments were conducted primarily to verify the presented concept in its fundamental aspect concerning non-redundant segmentation. Therefore, it was not so much the average efficiency, but mainly the minimum error obtained on a reliable test set, that was the most important evaluation criterion. The implementation used includes the previously characterized algorithms, methods, relationships, and parameters, including registration based on SIFT features as universal in the retrieval of radiograms. The reported results serve only as a preliminary verification of the proposed concept for use in advanced C-19 detection methods. They were planned as a demonstration of its feasibility and a preliminary verification of its practical potential. It is therefore a proof-of-concept study, not a full, credible verification of the usability of the proposed segmentation method, with the criterion of possibly minimal errors with respect to reference masks.
References
1. Mardian, Y., Kosasih, H., et al.: Review of current COVID-19 diagnostics and
opportunities for further development. Front. Med. 8, 615099 (2021)
2. Laskar, P., Yallapu, M.M., Chauhan, S.C.: “Tomorrow Never Dies”: Recent
advances in diagnosis, treatment, and prevention modalities against coronavirus
(COVID-19) amid controversies. Diseases 8, 30 (2020)
3. Yamac, M., Ahishali, M., et al.: Convolutional sparse support estimator-based
COVID-19 recognition from X-ray images. IEEE Trans. Neural Networks Learn.
Syst. 32(5), 1810–1820 (2021)
4. Lopez-Cabrera, J.D., Orozco-Morales, R., et al.: Current limitations to identify
COVID-19 using artificial intelligence with chest X-ray imaging. Health Technol.
11, 411–424 (2021)
5. Flor, N., et al.: Diagnostic performance of chest radiography in high COVID-
19 prevalence setting: experience from a European reference hospital. Emergency
Radiol. 28(5), 877–885 (2021). https://doi.org/10.1007/s10140-021-01946-x
6. Reamaroon, N., Sjoding, M.W., et al.: Robust segmentation of lung in chest x-ray:
applications in analysis of acute respiratory distress syndrome. BMC Med. Imaging
20, 116 (2020)
7. Liu, X., Li, K.-W., et al.: Review of deep learning based automatic segmentation
for lung cancer radiotherapy. Front. Oncol. 11, 717039 (2021)
8. Candemir, S., Jaeger, S., Palaniappan, K., et al.: Lung segmentation in chest radio-
graphs using anatomical atlases with nonrigid registration. IEEE Trans. Med. Imag-
ing 33(2), 577–590 (2014)
9. Candemir, S., Antani, S.: A review on lung boundary detection in chest X-rays.
Int. J. Comput. Assist. Radiol. Surg. 14(4), 563–576 (2019). https://doi.org/10.
1007/s11548-019-01917-1
10. Teixeira, L.O., Pereira, R.M., Bertolini, D., et al.: Impact of lung segmentation
on the diagnosis and explanation of COVID-19 in chest X-ray images. Sensors 21,
7116 (2021)
11. Calli, E., Sogancioglu, E., et al.: Deep learning for chest X-ray analysis: a survey.
Med. Image Anal. 72, 102125 (2021)
12. Maguolo, G., Nanni, L.: A critic evaluation of methods for COVID-19 automatic
detection from X-ray images. Inf. Fusion 76, 1–7 (2021)
13. Yu, Y., Hu, P., Lin, J., Krishnaswamy, P.: Multimodal multitask deep learning
for X-ray image retrieval. In: de Bruijne, M., Cattin, P.C., Cotin, S., Padoy, N.,
Speidel, S., Zheng, Y., Essert, C. (eds.) MICCAI 2021. LNCS, vol. 12905, pp.
603–613. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87240-3 58
14. Stengers, I.: Thinking with Whitehead: A Free and Wild Creation of Concepts.
Harvard University Press, Cambridge (2014)
15. Wang, H., Wang, Q., et al.: Multi-scale location-aware kernel representation for
object detection. In: Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition, pp. 1248–1257 (2018)
16. Thompson, J.R.: Empirical Model Building: Data, Models, and Reality. Wiley,
Hoboken (2011)
17. Hassanien, A.E., Mahdy, L.N., et al.: Automatic X-ray COVID-19 lung image classification system based on multi-level thresholding and support vector machine.
medRxiv (2020)
18. Mohammed, S.N., Alkinani, F.S., Hassan, Y.A.: Automatic computer aided diag-
nostic for COVID-19 based on chest X-ray image and particle swarm intelligence.
Int. J. Intell. Eng. Syst. 13, 5 (2020)
19. Philipsen, R.H.H.M., Maduskar, P., et al.: Localized energy-based normalization
of medical images: application to chest radiography. IEEE Trans. Med. Imaging
34(9), 1965–1975 (2015)
20. Chen, S., Cai, Y.: Enhancement of chest radiograph in emergency intensive care
unit by means of reverse anisotropic diffusion-based unsharp masking model. Diag-
nostics 9, 45 (2019)
21. Khodaskar, A., Ladhake, S.: Semantic image analysis for intelligent image retrieval.
Procedia Comput. Sci. 48, 192–197 (2015)
22. Chenggang, L.L., Yan, C., et al.: Distributed image understanding with semantic
dictionary and semantic expansion. Neurocomputing 174(A), 384–392 (2016)
23. DeVore, R.A.: Nonlinear approximation. Acta Numerica 7, 51–150 (1998)
24. Zhong, A., Li, X., et al.: Deep metric learning-based image retrieval system for
chest radiograph and its clinical applications in COVID-19. Med. Image Anal. 70,
101993 (2021)
25. Shiraishi, J., Katsuragawa, S., Ikezoe, J., et al.: Development of a digital image
database for chest radiographs with and without a lung nodule: receiver Operating
Characteristic analysis of radiologists’ detection of pulmonary nodules. AJR 174,
71–74 (2000)
26. Jaeger, S., Candemir, S., et al.: Two public chest X-ray datasets for computer-
aided screening of pulmonary diseases. Quant. Imaging Med. Surg. 4(6), 475–477
(2014)
27. Pogarell, T., Bayer, N., et al.: Evaluation of a novel content-based image retrieval
system for the differentiation of interstitial lung diseases in CT examinations. Diag-
nostics 11, 2114 (2021)
28. Nonrigid registration of lung CT images based on tissue features. Comput. Math.
Meth. Med. 834192, 1–7 (2013)
29. Sampat, M.P., Wang, Z., et al.: Complex wavelet structural similarity: a new image
similarity index. IEEE Trans. Image Process. 18(11), 2385–2401 (2009)
1 Introduction
The human body contains various joints of different sizes and structures. From a medical point of view, the knee joint (Fig. 1) is one of the largest and most complex joints of the human body. In this joint, hard (femur, patella and tibia) and soft (e.g., ligaments, muscles) structures can be distinguished. The joint consists of the distal end of the femur, which abuts and slides on the proximal surface of the tibia. The knee joint is completed by the patella, ligaments and muscles. The patella slides on the front surface of the distal end of the femur. Large ligaments connect the femur and the tibia to provide stability to the joint. The long thigh muscles ensure the strength of the knee [2,7,8].
This complex joint of the human body plays a very important role in standing, walking and running. Proper patella segmentation [2,11] is crucial in the case of patellar chondromalacia, as is correct segmentation of the femoral and tibial heads [9] in the case of knee joint arthroplasty (knee replacement). Therefore, in the above-mentioned cases, it is very helpful for the specialist doctor (orthopedist) that these bone structures can be extracted from MRI or CT slices of the knee joint and that their three-dimensional presentation can then be obtained. This allows the specialist to accurately diagnose the pathological bone structures of the knee joint.
The soft structures are also very important in the knee joint. The cruci-
ate ligaments (ACL – anterior cruciate ligament and PCL – posterior cruciate
ligament) together with the collateral ligaments are responsible for the knee sta-
bility and ensure proper arthrokinematics and contact forces [2]. The ACLs and
PCLs (Fig. 1) belong to the group of anatomical structures frequently susceptible
to injuries (especially in athletes) [5]. Proper segmentation of the injured cruciate ligaments and their three-dimensional representation allows the orthopedist to make an accurate diagnosis.
Fig. 1. MRI slices of the knee joint in sagittal plane: a) original slice of the T1-weighted
series and b) internal schema
In the literature, many different methods have been dedicated to the segmentation of human body structures. Usually, these are methods that require user interaction; sometimes they are fully automatic methods (very desirable in the medical field) [3,8]. In practical solutions, the methods described in the following subsections are usually used [1,11]:
2 Methodology
2.1 Region Growing
The method comprises selecting one or more seed points in the ROI (Region Of Interest). Then, neighboring pixels are added to the seed point if they meet the inclusion criterion, which can be an intensity level difference, texture, etc. Each currently analyzed pixel is compared with its neighborhood and, if it meets the inclusion criterion, it is added to the structure. Most methods based on region growing are semi-automatic due to the need to select a seed point. The exact steps of the algorithm are described in [4,13]. According to [12], the region growing method has some difficulties in segmenting cartilage due to its thickness, so it is more widely used in segmenting bony structures.
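A minimal sketch of such a region growing loop with a fixed intensity-difference criterion (the tolerance value is our placeholder):

```python
# Grow a region from a seed pixel: 4-connected neighbours are added
# while their intensity stays within a fixed tolerance of the seed value.
import numpy as np
from collections import deque

def region_growing(image, seed, tol=10):
    h, w = image.shape
    grown = np.zeros((h, w), dtype=bool)
    ref = float(image[seed])                     # seed intensity
    queue = deque([seed])
    grown[seed] = True
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and not grown[nr, nc]
                    and abs(float(image[nr, nc]) - ref) <= tol):
                grown[nr, nc] = True             # inclusion criterion met
                queue.append((nr, nc))
    return grown
```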
2.2 FCM
In this study, in order to ensure an automated method of segmentation of selected knee structures, atlas-based segmentation has been implemented. To achieve this, the algorithm starts with automated image matching, followed by normalization of the clinical images. After these steps, a dataset to which all scans in the series are allocated is determined. On this basis, the average feature vector for the teaching group is delineated, which automates and streamlines both fuzzy segmentation methods (fuzzy c-means and fuzzy connectedness). These averaged features (centroid and surface area of the segmented structure) are then transmitted to the fuzzy segmentation methods implemented for the testing group, correspondingly, for each given scan. The centroids then become seed (starting) points for the fuzzy methods, and the surface area protects against oversegmentation.
According to the literature [9,10], the standard Fuzzy C-Means (FCM) algorithm is very popular and widely used in practical solutions. This algorithm has many advantages; however, it does not incorporate spatial context information, which makes it sensitive to noise and image artefacts. To reduce this disadvantage, in this study the FCM objective function has been modified by adding a second term, which formulates a spatial constraint based on the median estimator. In image processing approaches, an implementation of median filtering replaces each data sample by a function of its spatial neighbourhood. The neighbourhood function is defined as MedF(x_n, Z) = median(S), where S = neighbourhood(x_n, Z) and Z determines the size of the mask. The final formula for the median-modified FCM can then be expressed as
J(\mathbf{U}, \mathbf{V}) = \sum_{i=1}^{c} \sum_{n=1}^{N} u_{in}^{m} \left( \| x_n - v_i \|^2 + \alpha \, \| \mathrm{MedF}(x_n, Z) - v_i \|^2 \right), \qquad (1)

where u_in denotes the membership function, v_i denotes the prototype for a given fuzzification level m (where 1 ≤ m ≤ ∞), x_n = {x_1, ..., x_k}, and x_n, v_i ∈ F^k, where F^k is a feature space.
In this paper the fuzzy c-means algorithm with median modification is not
described in detail. An exhaustive description of the FCM algorithm with median
modification can be found in [7,9].
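For illustration only, one possible iteration scheme for the objective in Eq. (1) in the grayscale (k = 1) case; the update formulas follow from the standard FCM derivation applied to the modified objective, and the default parameter values are our assumptions, not those of [7,9]:

```python
# Median-modified FCM: the median-filtered image enters the distance,
# membership and prototype updates with weight alpha, as in Eq. (1).
import numpy as np
from scipy.ndimage import median_filter

def fcm_median(image, c=3, m=2.0, alpha=0.5, Z=3, iters=50, eps=1e-9):
    x = image.ravel().astype(float)
    xm = median_filter(image.astype(float), size=Z).ravel()   # MedF(x, Z)
    v = np.linspace(x.min(), x.max(), c)                      # prototypes
    for _ in range(iters):
        d = ((x[None, :] - v[:, None]) ** 2
             + alpha * (xm[None, :] - v[:, None]) ** 2 + eps)  # c x N
        u = d ** (-1.0 / (m - 1))
        u /= u.sum(axis=0, keepdims=True)                     # memberships
        um = u ** m
        # Minimizing Eq. (1) w.r.t. v gives this closed-form update:
        v = um @ (x + alpha * xm) / ((1 + alpha) * um.sum(axis=1))
    return u.reshape((c,) + image.shape), v
```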
On the basis of the literature [6], the fuzzy connectedness (FC) method is based on the fuzzy affinity relation, and the generalized definition of FC introduces an iterative method that permits the fuzzy connectedness to be determined in relation to a marked image point (seed or starting point). The starting points have been marked on the basis of the atlas-based segmentation (centroids). In this paper, the fuzzy connectedness algorithm is not described in detail; an exhaustive description of this algorithm can be found in [7,9].
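A compact sketch of seed-based fuzzy connectedness with a simple intensity-based affinity (the affinity choice is our assumption; see [6,7,9] for the full definitions):

```python
# Connectedness of a pixel to the seed = best over all paths of the
# weakest affinity along the path, computed with Dijkstra-like
# max-min propagation (max-heap emulated by negated strengths).
import heapq
import numpy as np

def fuzzy_connectedness(image, seed, sigma=20.0):
    h, w = image.shape
    conn = np.zeros((h, w))
    conn[seed] = 1.0
    heap = [(-1.0, seed)]
    while heap:
        strength, (r, c) = heapq.heappop(heap)
        strength = -strength
        if strength < conn[r, c]:
            continue                              # stale heap entry
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                diff = float(image[r, c]) - float(image[nr, nc])
                affinity = np.exp(-(diff ** 2) / (2 * sigma ** 2))
                cand = min(strength, affinity)    # weakest link on the path
                if cand > conn[nr, nc]:
                    conn[nr, nc] = cand
                    heapq.heappush(heap, (-cand, (nr, nc)))
    return conn                                   # threshold to segment
```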
3.1 Materials
The methodology has been tested on 15 clinical T1-weighted MRI studies of the knee joint. The entire data set comprised a total of 303 slices in the sagittal plane. The MRI data were acquired from females and males of different ages.
Fig. 2. T1-weighted MRI series – data from a clinical hospital database, with the 3D ROI including the ACL and PCL
Fig. 3. Segmentation (FC method) of the bone structures of the knee joint: (a) tibia,
(b) patella and (c) femur
Fig. 4. Segmentation of the healthy and pathological anterior and posterior cruciate ligament structures in MRI: (a) original image with starting point, (b) result of the fuzzy connectedness method, (c) result superimposed onto the original image
growing method performed best on the femur, with the Dice coefficient values
being the highest here. Figure 6 shows the result of the best femur segmentation using the region growing method. Green and magenta colors have been used for differences between the superimposed structures and white for overlapping areas [13]. As far as the ACL and PCL structures are concerned, the
region growing method did not cope with the segmentation – in addition to these structures, other soft structures were also segmented, which meant that the segmentation was not effective and the results were not satisfactory. The results reached values of about 15%, which, compared to the Dice coefficient values of the bony structures, does not allow one to conclude that the segmentation brought the expected results. Based on the results, it was concluded that the region growing method was not suitable for segmentation of the ACL and PCL ligaments.
Table 1. Values of the Dice index for the analyzed MRI series for the region growing (RG) method
Table 2. Values of the Dice index for the analyzed MRI series for the fuzzy connectedness (FC) method
Table 3. Values of Dice index for the analyzed MRI series for the Fuzzy C-Means
(FCM) method
Fig. 5. Discrepancy in the obtained values of the Dice index for the following methods:
a) RG, b) FC, c) FCM, d) FC, e) FCM, f) RG, g) FC and h) FCM
The best results for the bone structures were obtained with the Fuzzy C-Means method (Table 3), and these results are additionally reproducible for each imaging series. The obtained Dice coefficient values also depend on the quality of the images in each series, which is not always the same.
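For completeness, the Dice index reported in Tables 1–3, for a result mask A and a reference mask B, is 2|A ∩ B|/(|A| + |B|):

```python
# Dice index between a segmentation result and a reference mask.
import numpy as np

def dice(result, reference):
    a = result.astype(bool)
    b = reference.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0
```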
Fig. 6. Segmentation result for the best femur segmentation using the region growing method: a) original slice and b) differences between superimposed structures (green and magenta colors have been used for differences between superimposed structures and white for overlapping areas) (Color figure online)
4 Conclusions
The region growing method could perform better if it used a dynamic inclusion criterion rather than, as in our tests, a fixed intensity difference between pixels. As mentioned by the authors of [12], there is a problem with segmenting cartilage using the region growing method; we also encountered a problem with the segmentation of ligaments. This could be addressed by additional image preprocessing to remove noise or improve the quality of the image.
In the case of the cruciate ligaments, the best way to obtain a correct segmentation is to reduce the analysis area to a region of interest containing the desired anatomical structures. Therefore, in this study, in order to increase the efficiency of the computational procedures, the fuzzy methods (FC and FCM) have been limited to a 3D ROI for the extraction of both the anterior and posterior cruciate ligaments.
The results obtained with the segmentation methods implemented in this article have confirmed the theses in the literature [1,3,9]. The RG method can be used for segmentation of the tibia (Dice index above 82%) and femur (Dice index above 89%). However, this method is not suited to the segmentation of the patella or the cruciate ligaments (Dice index below 65% or lower). The fuzzy methods (FC and FCM) can be used for segmentation of the bone structures and cruciate ligaments in the knee joint (Dice index above 85%). Properly selected preprocessing methods, in combination with atlas-based segmentation and the fuzzy methods, even allow a Dice index above 90% to be obtained. Therefore, it can be concluded that the fuzzy methods perform quite well in segmenting the knee joint structures.
References
1. Aprovitola, A., Gallo, L.: Knee bone segmentation from MRI: a classification and
literature review. Biocybern. Biomed. Eng. 36(2), 437–449 (2016)
2. Bochenek, A., Reicher, M.: The Human Anatomy. PZWL, Warsaw (1990)
3. Kubicek, J., Penhaker, M., Augustynek, M., et al.: Segmentation of knee cartilage:
a comprehensive review. J. Med. Imaging Health Inform. 8(3), 401–418 (2018)
4. Öztürk, C.N., Albayrak, S.: Automatic segmentation of cartilage in high-field Mag-
netic Resonance Images of the knee joint with an improved voxel-classification-
driven region-growing algorithm using vicinity-correlated subsampling. Comput.
Biol. Med. 72, 90–107 (2016). https://doi.org/10.1016/j.compbiomed.2016.03.011
5. Pasierbinski, A., Jarzabek, A.: Biomechanics of the cruciate ligaments. Acta
Clinica 4(1), 284–293 (2001)
6. Udupa, J., Samarasekera, S.: Fuzzy connectedness and object definition: theory,
algorithms, and applications in image segmentation. Graph Models Image Process.
58, 246–261 (1996)
7. Zarychta, P.: Automatic registration of the medical images T1- and T2-weighted
MR knee images. In: Napieralski, A. (ed.) Proceedings of the International Confer-
ence Mixed Design of Integrated Circuits and Systems MIXDES2006, pp. 741–745
(2006)
8. Zarychta, P.: Cruciate ligaments of the knee joint in the computer analysis.
In: Pietka, E., Kawa, J., Wieclawek, W. (eds.) Information Technologies in
Biomedicine, Advances in Intelligent and Soft Computing, vol. 283, pp. 71–80
(2014)
9. Zarychta, P.: A new approach to knee joint arthroplasty. Comput. Med. Imaging
Graph. 65, 32–45 (2018)
10. Zarychta, P.: Posterior Cruciate Ligament – 3D Visualization. In: Kurzynski, M.,
Puchala, E., Wozniak, M., Zolnierek, A. (eds.) Computer Recognition Systems 2.
Advances in Soft Computing, vol. 45, pp. 695–702. Springer, Heidelberg (2007).
https://doi.org/10.1007/978-3-540-75175-5 87
11. Zarychta, P.: Patella – atlas based segmentation. In: Pietka, E., Badura, P., Kawa,
J., Wieclawek, W. (eds.) Information Technologies in Medicine, Advances in Intel-
ligent Systems and Computing, vol. 1011, pp. 314–322 (2019)
12. Zhang, B., Zhang, Y., Cheng, H., Xian, M., Cheng, O., Huang, K.: Computer-
aided knee joint magnetic resonance image segmentation – a survey. ArXiv
abs/1802.04894 (2018)
13. Żak, W.: Segmentation and three-dimensional visualization of chondromalacia
lesions of the femoral head. In: Recent Advances in Computational Oncology and
Personalized Medicine, vol. 1 (2021)
Rigid and Elastic Registrations
Benchmark on Re-stained Histologic
Human Ileum Images
1 Introduction
rigid or affine body transformation. The cost functions are based on histology
feature matching (such as blood vessels or small tissue voids) [7], mutual infor-
mation of pixel intensities [23], and many others [2,3,18,28]. More sophisticated
methods include tissue mask boundary alignment [19], background segmentation
with B-spline registration [12], step-wise registrations of image patches with ker-
nel density estimation, and hierarchical resolution regression [16]. However, stud-
ies demonstrating the applicability of the existing registration methods to images of
re-stained tissue sections are lacking.
Since the accuracy of H&E and IHC image registration is strongly related to the quality of data generated for ML training, it is important to identify the most accurate methods. The goal is to achieve excellent correspondence between the H&E and IHC-stained tissue images so that positions of the same cells in both images are ideally matched [16]. To achieve this level of accuracy on images from re-stained slides, the slides need to be scanned at a high resolution, in contrast to scans of serial sections, which can be digitized at the same or a lower resolution [20]. The level of correspondence (or quality of registration) can be measured by the number of successfully registered regions and by the registration error, expressed as the distance between landmark points, placed at randomly chosen cells, before and after registration. Methods that yield a large number of registered regions (without failures) and the smallest possible registration errors would be the most suitable for inclusion in pipelines that generate data for ML development.
In this paper, we focus on investigating the accuracy of several existing image registration methods using high-resolution WSIs of human ileum stained with H&E and then re-stained with IHC with an antibody visualizing neuronal cells.
To carry out the analyses, regions extracted from the WSIs were annotated
with landmark points. Rigid [11], affine [30], elastic B-splines [2] and moving
least squares [24] transformations that utilize image intensity and feature-based
approaches [17,27] were tested. Our goal was to identify those techniques that
can yield a high number of accurately registered image regions isolated from the
re-stained H&E and IHC slides.
2 Materials
For the purpose of this study, we used 25 sections (1 section per case, 4 µm stan-
dard thickness) of human ileum. Each section was first stained with H&E, imaged
with an Aperio AT Turbo whole slide scanner with a 20x magnification objective
(Fig. 1(a)), then destained and re-stained with IHC, and imaged on the same
scanner again. Pixel size in the obtained WSIs was 0.491 µm × 0.491 µm. IHC
involved antibodies reactive to S100 proteins that are normally present in neuronal cell lineages, including Schwann, glial and neural cells. The purpose was to distinguish the myenteric plexus, which is a network of nerves between the layers of the muscularis propria in the ileum. DAB (brown chromogen) was used
to visualize the positive staining (Fig. 1(b)). Slide re-staining and imaging was
performed at the Cedars-Sinai Biobank.
Using the digital slide viewer (Aperio ImageScope ver.12.4.3), we anno-
tated corresponding regions of interest (ROIs) in H&E and IHC WSIs by first
Fig. 1. Example WSIs of a re-stained tissue section. Manually annotated ROIs are
marked in green. The ROIs vary by size, location and tissue histology. H&E slide (a),
IHC slide (b). The tissue on the IHC slide is damaged. Example ROIs with visibly
damaged tissue are in the WSIs’ center
annotating ROIs in the IHC WSIs and then transferring the ROI annotations
to the H&E WSIs. We annotated as many ROIs with myenteric plexus (positive
staining) per WSI as possible. Other regions (sparse or no staining) were chosen
too without giving any preference to the location and included other parts of
ileal histology (crypts, stroma, inflammation, muscle layer, and fat) with differ-
ent proportions in each ROI. This process yielded n = 593 pairs (23.7 ± 14.2 per
WSI) of corresponding H&E and IHC ROIs. The ROI size varied from 617 ×
676 to 4450 × 2288 pixels, but ROIs measuring about 1k × 1.1k pixels were
most common (60.4% of all ROIs) (Fig. 2(a)).
The H&E and IHC ROIs were subsequently annotated for landmarks. The
landmarks were manually generated in 3DSlicer [8], which turned out to be very handy for visualizing superimposed H&E and IHC ROIs [31]. By adjusting opacity, we could see individual cells in both ROIs and hence place landmarks precisely near or at chosen corresponding cells, then save their coordinates in a file. Eight to ten corresponding landmark pairs were placed in each H&E and IHC ROI (5020 in total, 8.45 on average per ROI) (Fig. 2(b)). To assure good landmark correspondence, they were placed on the borders of cell nuclei wherever possible.
Fig. 2. ROI and landmark statistics in 25 pairs of H&E and IHC WSIs used in this
study. (a) The ROI area histogram shows that about 60% of the ROIs had an area of around 1024 × 1024
pixels (N = 593). (b) Landmark distances between corresponding H&E and IHC ROIs
before registration (N = 5020). The median distance between landmarks before regis-
tration in our set was 28 pixels (equivalent to 13.78 µm). About 12% of these landmarks
were 200 pixels away or more
3 Methods
Registration methods tested in this study originate from two families: feature-
based and intensity-based. The feature-based methods, such as the B-spline elastic [3] (ELT) and moving least squares [34] (MLS), utilize the scale-invariant feature transform (SIFT) [21], which finds grid points in the two images subjected to registration. Since finding the points is carried out separately for each image, the points may not correspond in number and location between the two images.
such as rigid (RG), affine (AF) or similarity (SM). The choice of the matching
transform is part of the feature extraction process and determined by the user.
The registration begins once parameters of the matching transform are found. In
the last step, the images undergo deformation through B-splines or the moving
least squares method applied to image pixels between the grid points found after
the matching.
Methods from the intensity-based family apply the matching transform (i.e., RG or AF) directly to the images without finding the grid points. Since grid points are not available, the transformation parameters are initially unknown. However, they can be found iteratively, e.g., by maximizing mutual information between pixel intensities (IN) in the images subjected to registration with an evolutionary optimizer [36].
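A sketch of this intensity-based variant; SimpleITK is our stand-in for the Matlab pipeline the study actually used, with a (1+1) evolutionary optimizer matching the multimodal setting described below:

```python
# Rigid registration by maximizing Mattes mutual information with a
# (1+1) evolutionary optimizer; returns the fitted 2-D rigid transform.
import SimpleITK as sitk

def register_mi_rigid(fixed, moving):
    """fixed, moving: 2-D float32 numpy arrays (gray-level ROIs)."""
    f = sitk.GetImageFromArray(fixed)
    m = sitk.GetImageFromArray(moving)
    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
    reg.SetOptimizerAsOnePlusOneEvolutionary(numberOfIterations=300)
    reg.SetInitialTransform(sitk.CenteredTransformInitializer(
        f, m, sitk.Euler2DTransform()))
    reg.SetInterpolator(sitk.sitkLinear)
    return reg.Execute(f, m)
```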
The registrations were run independently on two gray-level representations of
the H&E and IHC ROIs: (a) luminance channel (LU), and (b) hematoxylin stain-
ing channel (HE). Pixel intensities in the LU channel were calculated as follows:
LU = 0.2989R + 0.5870G + 0.1140B, where R, G and B are the red, green and blue color pixel intensities in the original H&E and IHC ROIs, whereas pixel
intensities of the HE channel were found through the color deconvolution algo-
rithm [32]. The gray-level image representation can be considered as a variable in
the registration method, leading to the following notation: MethodFeature (image
representation), where the gray-level image representation is either LU or HE.
Registration methods that we studied have previously been implemented
in Matlab and Fiji scientific computing environments. Specifically, the feature-
based methods ELTRG, ELTAF, ELTSM, MLSRG, MLSAF, and MLSSM are available in Fiji (ImageJ ver. 1.53c) [35] as the “Register Virtual Stack Slices” package, with the B-spline elastic being called through the “bUnwarpJ” function. The
intensity-based RGIN and AFIN image registrations can be called in Matlab
(ver.R2020b) through the “imregtform” function. Each of the tested registration
methods required method-specific hyperparameters to be set. However, we used
default values, except for the “modality” hyperparameter in the optimizer settings used by RGIN and AFIN, which we set to “multimodal”.
Accuracy of registration methods applicable to digital pathology is usually
assessed by the median target registration error (MTRE) expressed as the median
distance between corresponding landmarks in all ROIs after registration [9].
Besides MTRE, the average (mean) target registration error (ATRE) is often computed. Since ATRE and MTRE may not fully reflect the performance of a method in yielding highly accurate registrations for a series of ROIs, we also introduced the percentages of correctly (PCC) and successfully (PSC) registered ROIs (Table 1).
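A sketch of the landmark-based measures; the thresholds deciding "successful" and "correct" ROIs below are placeholders, since their exact values are not given in this excerpt:

```python
# Per-landmark distances after registration give ATRE (mean) and
# MTRE (median); per-ROI medians against thresholds give PSC and PCC.
import numpy as np

def registration_errors(moved_pts, target_pts):
    d = np.linalg.norm(moved_pts - target_pts, axis=1)   # per-landmark TRE
    return d.mean(), np.median(d)                        # ATRE, MTRE

def roi_rates(all_roi_distances, success_thr=10.0, correct_thr=50.0):
    n = len(all_roi_distances)
    psc = sum(np.median(d) <= success_thr for d in all_roi_distances) / n
    pcc = sum(np.median(d) <= correct_thr for d in all_roi_distances) / n
    return 100 * pcc, 100 * psc                          # percentages
```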
4 Results
593 ROIs and 5020 landmarks were extracted from 25 pairs of WSIs obtained from re-stained tissue sections to test six selected feature-based and two intensity-based image registration techniques. Example registrations are presented in Fig. 3.
PCCs, PSCs and registration times that we assessed first are shown in Table 1.
Distributions of TRE in successfully registered ROIs (Table 1, column with PSC)
are shown in Fig. 4(a). ATRE and MTRE for PSC ROIs are shown in Table 2. A
heatmap of successfully registered ROIs as a function of the percent landmarks
that are within a successfully registered ROI was also plotted (Fig. 5).
Table 1. Registration performance of ROIs in our dataset. PCC and PSC are the
respective percentages of correctly and successfully registered ROIs
Fig. 3. Registrations of corresponding H&E and IHC ROIs by two different methods. (a) shows corresponding H&E (left) and IHC (right) ROIs extracted from a WSI and annotated with landmarks (green and blue dots) before registration. A pair of corresponding landmarks is marked by the green arrow (landmark placed in the H&E ROI) and the blue arrow (landmark placed in the IHC ROI). Distances between paired landmarks after registration were used to measure the rates of PSC and PCC, and the ATRE and MTRE. (b) shows the registration by the RGIN(HE) method and (c) by the MLSRG(HE) method. Yellow arrows point at paired landmarks after registration. ROIs are well aligned where the blue and green dots overlap (solid arrows). Dashed arrows point at poorly aligned landmark pairs. Note the tissue damage in the IHC ROI
All intensity-based methods had the highest rates of PCC (99–100%) and their PSC was around 80% (Table 1). The feature-based registrations were less successful and accurate, that is, the PCC rate for this family was in the range
of 64–90% and the PSC oscillated between 50.2% and 65.94%. Although the intensity-based methods yielded more accurately registered regions, the time needed to complete registrations was approximately 5–10 times longer than that needed by the feature-based methods. The PSC and PCC rates for the LU image representation were generally higher by 1–5% than the corresponding rates obtained by the same registration method for the HE image representation. Regardless of the rate of success and accuracy of registration, the MTRE ranged from 2.0 to 2.5 pixels for all methods (Fig. 4(a)). However, in ROIs that were successfully registered, the ATRE was higher, ranging between 2.7 and 4.7 pixels (Fig. 4(b)), suggesting that both PSC and MTRE are essential in assessing the success rate and accuracy of registration.
5 Discussion
Accurate registration of high-resolution tissue images from re-stained tissues
remains a challenge [20,33]. In this study, we tested two intensity-based and six feature-based registration methods applied to WSIs from re-stained histologic specimens of human ileum. Ileal sections are fragile [37] and
therefore susceptible to damage when exposed to mechanical stress or tissue
6 Conclusions
The intensity-based and elastic registration methods were the most accurate in registration of ROIs from re-stained H&E and IHC tissue sections of the human ileum. The intensity-based rigid registration may be the most practical for generating ground truth data for developing ML models segmenting neuronal cells, due to its highest rate of correctly registered ROIs and overall low median and average target registration errors. Further studies involving re-stained tissues from
additional repositories should validate our observations. The whole slide images
and ROIs with landmarks that we prepared can be valuable in benchmarking
other image registration approaches in computational pathology.
Acknowledgement. This project was in part supported by the grant from the Helm-
sley Charitable Trust and the grants from the Silesian University of Technology no.
BK-231/RIB1/2022 and 31/010/SDU20/0006-10 (Excellence Initiative – Research Uni-
versity). The authors would also like to thank the Cedars-Sinai Biobank for preparation
and digitization of slides.
References
1. Anand, D., et al.: Deep learning to estimate human epidermal growth factor recep-
tor 2 status from hematoxylin and eosin-stained breast tissue images. J. Pathol.
Inform. 11, 19 (2020). https://doi.org/10.4103/jpi.jpi 10 20
2. Arganda-Carreras, I., Sorzano, C.O.S., Marabini, R., Carazo, J.M., Ortiz-de-
Solorzano, C., Kybic, J.: Consistent and elastic registration of histological sections
using vector-spline regularization. In: Beichel, R.R., Sonka, M. (eds.) CVAMIA
2006. LNCS, vol. 4241, pp. 85–95. Springer, Heidelberg (2006). https://doi.org/10.1007/11889762_8
3. Borovec, J., et al.: ANHIR: automatic non-rigid histological image registration challenge. IEEE Trans. Med. Imaging 39(10), 3042–3052 (2020). https://doi.org/10.1109/TMI.2020.2986331
4. Bulten, W., et al.: Epithelium segmentation using deep learning in H&E-stained prostate specimens with immunohistochemistry as reference standard. Sci. Rep. 9, 864 (2019). https://doi.org/10.1038/s41598-018-37257-4
5. Bándi, P., Balkenhol, M., Ginneken, B., Laak, J., Litjens, G.: Resolution-agnostic
tissue segmentation in whole-slide histopathology images with convolutional neural
networks. PeerJ 7, e8242 (2019). https://doi.org/10.7717/peerj.8242
6. Chen, C.T.: Radiologic image registration: old skills and new tools. Acad. Radiol.
10, 239–41 (2003)
7. Cooper, L., Sertel, O., Kong, J., Lozanski, G., Huang, K., Gurcan, M.: Feature-
based registration of histopathology images with different stains: an application for
computerized follicular lymphoma prognosis. Comput. Methods Programs Biomed.
96, 182–92 (2009). https://doi.org/10.1016/j.cmpb.2009.04.012
8. Fedorov, A., et al.: 3D slicer as an image computing platform for the quantitative
imaging network. Magnetic Resonance Imaging 30, 1323–41 (2012). https://doi.
org/10.1016/j.mri.2012.05.001. https://www.slicer.org
9. Fitzpatrick, J., West, J.: The distribution of target registration error in rigid-
body point-based registration. IEEE Trans. Med. Imaging 20(9), 917–927 (2001).
https://doi.org/10.1109/42.952729
10. Gallego, J., Swiderska, Z., Markiewicz, T., Yamashita, M., Gabaldon, M., Gertych, A.: A U-Net based framework to quantify glomerulosclerosis in digitized PAS and H&E stained human tissues. Comput. Med. Imaging Graph. 89, 101865 (2021). https://doi.org/10.1016/j.compmedimag.2021.101865
11. Ghahremani, M., et al.: Rigid Registration, pp. 1087–1099. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-63416-2_184
12. Gonzalez, D., Frafjord, A., Øynebråten, I., Corthay, A., Olivo-Marin, J.C., Meas-
Yedid, V.: Multi-staining registration of large histology images. In: 2017 IEEE 14th
International Symposium on Biomedical Imaging, pp. 345–348 (2017). https://doi.
org/10.1109/ISBI.2017.7950534
13. Hatipoglu, N., Bilgin, G.: Cell segmentation in histopathological images with deep
learning algorithms by utilizing spatial relationships. Med. Biol. Eng. Comput.
55(10), 1829–1848 (2017). https://doi.org/10.1007/s11517-017-1630-1
14. Hinton, J., et al.: A method to reuse archived H&E stained histology slides for a multiplex protein biomarker analysis. Methods Protoc. 2, 86 (2019). https://doi.org/10.3390/mps2040086
15. Ing, N., et al.: A novel machine learning approach reveals latent vascular pheno-
types predictive of renal cancer outcome. Sci. Rep. 7 (2017). https://doi.org/10.
1038/s41598-017-13196-4
16. Jiang, J., Larson, N., Prodduturi, N., Flotte, T., Hart, S.: Robust hierarchical density estimation and regression for re-stained histological whole slide image co-registration. PLoS ONE 14, e0220074 (2019). https://doi.org/10.1371/journal.pone.0220074
17. Johnson, H., Christensen, G.: Consistent landmark and intensity-based image reg-
istration. IEEE Trans. Med. Imaging 21, 450–61 (2002). https://doi.org/10.1109/
TMI.2002.1009381
18. Kuska, J.P., et al.: Image registration of differently stained histological sections. In:
2006 International Conference on Image Processing, pp. 333–336 (2006). https://
doi.org/10.1109/ICIP.2006.313161
19. Kybic, J., Dolejšı́, M., Borovec, J.: Fast registration of segmented images by normal
sampling. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition
Workshops (2015). https://doi.org/10.1109/CVPRW.2015.7301311
20. Lotz, J., Weiss, N., van der Laak, J., Heldmann, S.: High-resolution image
registration of consecutive and re-stained sections in histopathology (2021).
ArXiv:2106.13150
21. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J.
Comput. Vision 60(2), 91–110 (2004). https://doi.org/10.1023/b:visi.0000029664.
99615.94
22. Ma, Z., et al.: Semantic segmentation of colon glands in inflammatory bowel disease biopsies. In: Pietka, E., Badura, P., Kawa, J., Wieclawek, W. (eds.) ITIB 2018. AISC, vol. 762, pp. 379–392. Springer, Cham (2019). https://doi.org/10.1007/978-3-319-91211-0_34
23. Maes, F., Vandermeulen, D., Suetens, P.: Medical image registration using mutual
information. Proc. IEEE 91(10), 1699–1722 (2003). https://doi.org/10.1109/jproc.
2003.817864
24. Menon, H., Narayanankutty, K.A.: Applicability of non-rigid medical image regis-
tration using moving least squares. Int. J. Comput. Appl. 1, 85–92 (2010). https://
doi.org/10.5120/138-256
25. Mäkelä, T., et al.: A review of cardiac image registration methods. IEEE Trans.
Med. Imaging 21, 1011–21 (2002). https://doi.org/10.1109/TMI.2002.804441
26. Nirschl, J., et al.: Chapter 8 - Deep Learning Tissue Segmentation in Cardiac
Histopathology Images, pp. 179–195. Academic Press (2017). https://doi.org/10.
1016/B978-0-12-810408-8.00011-0
27. Oliveira, F.P., Tavares, J.M.R.: Medical image registration: a review. Comput.
Methods Biomech. Biomed. Engin. 17(2), 73–93 (2014). https://doi.org/10.1080/
10255842.2012.670855
28. Ourselin, S., Roche, A., Subsol, G., Pennec, X., Ayache, N.: Reconstructing a 3D
structure from serial histological sections. Image Vis. Comput. 19, 25–31 (2001).
https://doi.org/10.1016/S0262-8856(00)00052-4
29. Pantanowitz, L., et al.: Review of the current state of whole slide imaging in pathol-
ogy. J. Pathol. Inform. 2, 36 (2011). https://doi.org/10.4103/2153-3539.83746
30. Pitiot, A., Bardinet, E., Thompson, P., Malandain, G.: Piecewise affine registra-
tion of biological images for volume reconstruction. Med. Image Anal. 10, 465–83
(2006). https://doi.org/10.1016/j.media.2005.03.008
31. Pyciński, B., Yagi, Y., Walts, A.E., Gertych, A.: 3-D tissue image reconstruction from digitized serial histologic sections to visualize small tumor nests in lung adenocarcinomas. In: Pietka, E., Badura, P., Kawa, J., Wieclawek, W. (eds.) Information Technology in Biomedicine. AISC, vol. 1186, pp. 55–70. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-49666-1_5
32. Ruifrok, A.C., Johnston, D.A.: Quantification of histochemical staining by color
deconvolution. Anal. Quant. Cytol. Histol. 23(4), 291–299 (2001)
33. Ruusuvuori, P., et al.: Spatial analysis of histology in 3D: quantification and visualization of organ and tumor level tissue environment. Heliyon 8, e08762 (2022). https://doi.org/10.1016/j.heliyon.2022.e08762
34. Schaefer, S., McPhail, T., Warren, J.: Image deformation using moving least
squares. In: ACM SIGGRAPH 2006 Papers on - SIGGRAPH 2006. ACM Press
(2006). https://doi.org/10.1145/1179352.1141920
35. Schindelin, J., et al.: Fiji: an open-source platform for biological-image analysis.
Nat. Methods 9(7), 676–682 (2012). https://doi.org/10.1038/nmeth.2019
36. Styner, M., Brechbuhler, C., Szckely, G., Gerig, G.: Parametric estimate of intensity
inhomogeneities applied to MRI. IEEE Trans. Med. Imaging 19(3), 153–165 (2000).
https://doi.org/10.1109/42.845174
37. Williams, J.M., Duckworth, C.A., Vowell, K., Burkitt, M.D., Pritchard, D.M.:
Intestinal preparation techniques for histological analysis in the mouse. Curr. Prot.
Mouse Biol. 6(2), 148–168 (2016). https://doi.org/10.1002/cpmo.2
DVT: Application of Deep Visual
Transformer in Cervical Cell Image
Classification
Wanli Liu1, Chen Li1(B), Hongzan Sun2, Weiming Hu1, Haoyuan Chen1, and Marcin Grzegorzek3

1 Microscopic Image and Medical Image Analysis Group, College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China
lichen@bmie.neu.edu.cn
2 Shengjing Hospital, China Medical University, Shenyang, China
sunhz@sj-hospital.org
3 Institute of Medical Informatics, University of Luebeck, Luebeck, Germany
marcin.grzegorzek@uni-luebeck.de
1 Introduction
Cervical cancer is a very common cancer among women. Weak immunity, smoking, use of contraceptives, and poor menstrual hygiene may all contribute to cervical cancer. Patients with cervical cancer show symptoms such as abnormal vaginal bleeding and leucorrhea [1]. However, cervical cancer can be prevented through early examination and treatment [2].
Cytopathological examination is an effective means of diagnosing cancer. Cytopathologists use a microscope to examine cell slides collected from the patient's cervix to determine whether cancer is present. However, manually checking the slides to diagnose cancer is a very difficult task: it is not only time-consuming but also error-prone.
A computer-aided diagnosis system (CAD) can automatically and accurately
screen cervical cell images. Early CAD used traditional methods. In recent years,
deep learning methods have become popular in computer vision and other fields. More recently, the Transformer, originally developed for Natural Language Processing (NLP), has entered the field of computer vision; we collectively refer to Transformer models applied to computer vision as Visual Transformers (VT). The first VT model was the Vision Transformer (ViT), which performs better than Convolutional Neural Networks (CNN) when pre-trained on large amounts of data [3].
In this study, we propose a VT framework called Deep Visual Transformer
(DVT) to perform classification tasks. We use the SIPaKMeD and CRIC datasets together [4,5]. DVT achieves an accuracy of 87.35%. The workflow of DVT is shown in Fig. 1.
In Fig. 1, the cell images in (a) are the training data, drawn jointly from the SIPaKMeD and CRIC datasets and covering 11 categories. (b) is the preprocessing stage of the data: the data first go through the augmentation phase and are then normalized. (c) is the training stage, where the preprocessed data are input into the model for training. In (d), test images of cells are input into the trained model for testing. In (e), precision, recall, F1-score, and accuracy are calculated to evaluate the performance of DVT.
The main contributions of this study are as follows: (1) This study combines two datasets, SIPaKMeD (five categories) and CRIC (six categories), to perform an 11-class classification task. (2) We use the DVT framework to classify cervical cells, which is an applied innovation in the field of cell classification. (3) DVT achieves a promising accuracy of 87.35%.
2 Related Work
2.1 Applications of Deep Learning in Cervical Cell Classification
Deep learning methods are widely used for cervical cell classification. The char-
acteristics of some of the deep learning methods used in this study are described
below.
The VGG network is a classic CNN that improves model performance by increasing network depth. It uses small 3×3 convolution kernels, which greatly reduces the number of parameters [6]. InceptionV3 improves performance by increasing the width of the network, using multi-channel convolutions to extract more information from the image [7]. ViT applies the Transformer from the NLP field to computer vision for the first time; its core components are the attention mechanism and the Multilayer Perceptron (MLP). It works better than CNNs in the pre-trained case [3]. DeiT is an upgraded version of ViT that adds a distillation token to the model; however, the tiny version of DeiT does not introduce the distillation token and only refines the parameters [8]. T2T-ViT is another upgraded version of ViT, which proposes a new T2T mechanism and outperforms ViT with fewer parameters [9].
We next outline several representative studies to show how deep learning can be applied to cervical cell classification. For more detailed information, please refer to our review paper [10].
In [11], a transfer learning framework that integrates four CNN models is proposed to process the Herlev dataset. In [12], CNNs are used for feature extraction and machine learning models for classification. In [13], pre-trained CNNs extract features, which are fused to process the SIPaKMeD dataset. A method based on graph convolutional networks is proposed to classify cervical cells in [14]. A comparative experiment using 22 deep learning models to classify cervical cell images is performed in [15].
It can be seen that most of the literature uses CNN models, and no study has so far used VT to classify cervical cell images. Therefore, this study is of great significance for practical applications.
In related work on microscopic image analysis, a classification method based on conditional random fields and CNNs is proposed in [26], and in [27] a network architecture based on U-Net, Inception, and connection operations is proposed for the environmental microorganism image segmentation task.
3 Method
To use VT for the cervical cell classification task, we design the DVT framework,
and its structure can be seen in Fig. 1:
(a) uses training data from the SIPaKMeD and CRIC datasets together, includ-
ing 8838 cell images in 11 categories.
(b) demonstrates the preprocessing stage of the data. The data first pass through the augmentation stage, in which each image is mirror-flipped with a probability of 0.5; this enhances the generalization ability of the model. The data then undergo a normalization stage to complete preprocessing. The normalization stage normalizes the data by channel, that is, it subtracts the mean and divides by the standard deviation, which speeds up the convergence of the model.
(c) is the DVT model, whose detailed structure is shown in Fig. 2. The input image size is 224×224 pixels. We use the DeiT model to classify cervical cell images [8]. It is worth noting that the DeiT model we use is the tiny version without distillation tokens, which is equivalent to a parameter-improved version of ViT and performs better than ViT (see the sketch after this list). Its core components are the attention layer, the MLP layer, and the residual structure, marked in green in Fig. 2. The attention layer extracts the global features of the image, and the residual structure mitigates the vanishing or exploding gradients that may arise as the network deepens. DVT extracts 192-dimensional image features and finally obtains the classification result through a fully connected layer.
(d) uses cervical cell test images to test the trained DVT to see if the performance
is robust.
(e) evaluates the performance of DVT by calculating precision, recall, F1-Score,
and accuracy, which are commonly used evaluation metrics.
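A minimal sketch of stages (b), (c), and (e), assuming a PyTorch/timm implementation (the paper does not name its framework). The timm model `deit_tiny_patch16_224` is the DeiT-tiny variant without the distillation token (192-dimensional embedding); the ImageNet normalization constants and the example labels are assumptions, since the exact values are not reported.

```python
import torch
import timm
from torchvision import transforms
from sklearn.metrics import precision_recall_fscore_support, accuracy_score

# (b) Preprocessing: mirror flip with probability 0.5, then per-channel
# normalization (subtract mean, divide by standard deviation).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # assumed ImageNet stats
                         std=[0.229, 0.224, 0.225]),
])

# (c) DeiT-tiny without the distillation token, with an 11-way head
# for the combined SIPaKMeD + CRIC label set.
model = timm.create_model("deit_tiny_patch16_224", pretrained=True, num_classes=11)

x = torch.randn(4, 3, 224, 224)               # stands in for a preprocessed batch
with torch.no_grad():
    pred = model(x).argmax(dim=1).numpy()     # predicted class indices

# (e) Evaluation: precision, recall, F1-score, and accuracy.
y_true = [0, 3, 7, 10]                        # hypothetical ground-truth labels
p, r, f1, _ = precision_recall_fscore_support(y_true, pred, average="macro",
                                              zero_division=0)
print(p, r, f1, accuracy_score(y_true, pred))
```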
CRIC. The CRIC dataset has 400 Pap smear images, divided into six categories: negative for intraepithelial lesion or malignancy (NILM); atypical squamous cells of undetermined significance, possibly non-neoplastic (ASC-US); low-grade squamous intraepithelial lesion (LSIL); atypical squamous cells, cannot exclude a high-grade lesion (ASC-H); high-grade squamous intraepithelial lesion (HSIL); and squamous cell carcinoma (SCC) [5]. This study uses only 4789 cropped cell images [28]. Examples are shown in Fig. 3.
Table 1. Data division of the combined dataset. (1: ASC-H, 2: ASC-US, 3: SCC, 4:
HSIL, 5: LSIL, 6: NILM, 7: Dyskeratotic, 8: Koilocytotic, 9: Metaplastic, 10: Parabasal,
11: Superficial-Intermediate)
Dataset/Class 1 2 3 4 5 6 7 8 9 10 11 Total
Train 484 446 413 525 491 518 488 495 476 473 499 5308
Validation 161 148 137 175 164 172 163 165 159 157 166 1767
Test 161 148 137 174 163 172 162 165 158 157 166 1763
Total 806 742 687 874 818 862 813 825 793 787 831 8838
Table 4. Comparison results of models on the test set of the combined dataset
References
1. Šarenac, T., Mikov, M.: Cervical cancer, different treatments and importance of
bile acids as therapeutic agents in this disease. Front. Pharmacol. 10, 484 (2019)
2. Saslow, D., et al.: American cancer society, American society for colposcopy and
cervical pathology, and American society for clinical pathology screening guidelines
for the prevention and early detection of cervical cancer. CA Cancer J. Clin. 62(3),
147–172 (2012)
3. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
4. Plissiti, M.E., Dimitrakopoulos, P., Sfikas, G., Nikou, C., Krikoni, O., Charchanti,
A.: Sipakmed: a new dataset for feature and image based classification of normal
and pathological cervical cells in pap smear images. In: 2018 25th IEEE Interna-
tional Conference on Image Processing (ICIP), pp. 3144–3148. IEEE (2018)
5. Rezende, M.T., et al.: Cric searchable image database as a public platform for
conventional pap smear cytology data. Sci. Data 8(1), 1–8 (2021)
6. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
7. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the incep-
tion architecture for computer vision. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
8. Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., Jégou, H.: Training
data-efficient image transformers & distillation through attention. In: International
Conference on Machine Learning, pp. 10347–10357. PMLR (2021)
9. Yuan, L., et al.: Tokens-to-token ViT: training vision transformers from scratch on ImageNet. arXiv preprint arXiv:2101.11986 (2021)
10. Rahaman, M.M., et al.: A survey for cervical cytopathology image analysis using
deep learning. IEEE Access 8, 61687–61710 (2020)
11. Xue, D., et al.: An application of transfer learning and ensemble learning techniques
for cervical histopathology image classification. IEEE Access 8, 104603–104618
(2020)
12. Khamparia, A., Gupta, D., de Albuquerque, V.H.C., Sangaiah, A.K., Jhaveri, R.H.:
Internet of health things-driven deep learning system for detection and classifica-
tion of cervical cells using transfer learning. J. Supercomput. 76(11), 8590–8608
(2020). https://doi.org/10.1007/s11227-020-03159-4
13. Rahaman, M.M., et al.: Deepcervix: a deep learning-based framework for the clas-
sification of cervical cells using hybrid deep feature fusion techniques. Comput.
Biol. Med. 136, 104649 (2021)
14. Shi, J., Wang, R., Zheng, Y., Jiang, Z., Zhang, H., Yu, L.: Cervical cell classifica-
tion with graph convolutional network. Comput. Methods Programs Biomed. 198,
105807 (2021)
15. Liu, W., et al.: Is the aspect ratio of cells important in deep learning? A robust
comparison of deep learning methods for multi-scale cytopathology cell image clas-
sification: From convolutional neural networks to visual transformers. Comput.
Biol. Med. 105026 (2021)
16. Li, C., Zhang, J., Kulwa, F., Qi, S., Qi, Z.: A SARS-CoV-2 microscopic image dataset with ground truth images and visual features. In: Peng, Y., et al. (eds.) PRCV 2020. LNCS, vol. 12305, pp. 244–255. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-60633-6_20
17. Rahaman, M.M., et al.: Identification of covid-19 samples from chest x-ray images
using deep learning: a comparison of transfer learning approaches. J. Xray Sci.
Technol. 28(5), 821–839 (2020)
18. Ismael, A.M., Şengür, A.: Deep learning approaches for covid-19 detection based
on chest x-ray images. Expert Syst. Appl. 164, 114054 (2021)
19. Li, C., et al.: A review for cervical histopathology image analysis using machine
vision approaches. Artif. Intell. Rev. 53(7), 4821–4862 (2020). https://doi.org/10.
1007/s10462-020-09808-7
20. Li, C., et al.: A comprehensive review of computer-aided whole-slide image analy-
sis: from datasets to feature extraction, segmentation, classification and detection
approaches. Artif. Intell. Rev. 1–70 (2021). https://doi.org/10.1007/s10462-021-
10121-0
21. Chen, H., et al.: IL-MCAM: an interactive learning and multi-channel attention
mechanism-based weakly supervised colorectal histopathology image classification
approach. Comput. Biol. Med. 143, 105265 (2022)
22. Hu, W., et al.: GasHisSDB: a new gastric histopathology image dataset for com-
puter aided diagnosis of gastric cancer. Comput. Biol. Med. 105207 (2022)
23. Hameed, Z., Zahia, S., Garcia-Zapirain, B., Javier Aguirre, J., Marı́a Vanegas,
A.: Breast cancer histopathology image classification using an ensemble of deep
learning models. Sensors 20(16), 4373 (2020)
24. Li, C., Wang, K., Xu, N.: A survey for the applications of content-based microscopic
image analysis in microorganism classification domains. Artif. Intell. Rev. 51(4),
577–646 (2017). https://doi.org/10.1007/s10462-017-9572-4
25. Zhang, J., et al.: A comprehensive review of image analysis methods for microor-
ganism counting: from classical image processing to deep learning approaches.
Artif. Intell. Rev. 1–70 (2021)
26. Kosov, S., Shirahama, K., Li, C., Grzegorzek, M.: Environmental microorganism
classification using conditional random fields and deep convolutional neural net-
works. Pattern Recogn. 77, 248–261 (2018)
27. Zhang, J., et al.: LCU-Net: a novel low-cost u-net for environmental microorganism
image segmentation. Pattern Recogn. 115, 107885 (2021)
28. Diniz, N., et al.: A deep learning ensemble method to assist cytopathologists in
pap test image classification. J. Imaging 7(7), 111 (2021)
29. Srinivas, A., Lin, T.Y., Parmar, N., Shlens, J., Abbeel, P., Vaswani, A.: Bottleneck transformers for visual recognition. arXiv preprint arXiv:2101.11605 (2021)
30. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: Mobilenetv2:
Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018)
31. Ma, N., Zhang, X., Zheng, H.-T., Sun, J.: ShuffleNet V2: practical guidelines for efficient CNN architecture design. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 122–138. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01264-9_8
32. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, inception-resnet
and the impact of residual connections on learning. In: Proceedings of the Thirty-
First AAAI Conference on Artificial Intelligence, pp. 4278–4284 (2017). https://
dl.acm.org/doi/10.5555/3298023.3298188
Image Classification in Breast
Histopathology Using Transfer
and Ensemble Learning
1 Introduction
Breast cancer is the most common malignant tumor in the world, directly affecting the lives of 2.3 million women every year. Among women, it accounts for 30% of all cancer deaths worldwide [21]. Breast cancer is a disease in which
breast epithelial cells proliferate out of control under the action of a variety of
carcinogens. It generally can be divided into benign (not dangerous to health)
and malignant (potentially dangerous). Early treatment of breast cancer is very
important, and doctors need to choose an effective treatment plan based on its
malignancy. At present, histopathological diagnosis is generally regarded as the “gold standard” [13]. To better observe and analyze the different components of the tissue under the microscope, Hematoxylin and Eosin (H&E) staining is frequently used [4].
However, it is difficult to observe the tissue with the naked eye and to manually analyze the visual information based on prior medical knowledge, so CAD is of vital significance. Most of the existing methods are based on a single classifier, and few image-level models achieve good results. This is because image-level classification is more complex: images under the same patient-level label are correlated, whereas image-level labels are relatively independent, which is much more challenging. Therefore, this paper proposes an image-level classification method based on transfer learning and ensemble learning. The workflow is shown in Fig. 1.
As the workflow shows, the basic framework of this method is composed of six parts: (a) Data augmentation. The benign tumor images are flipped horizontally and vertically in order to balance the amount of training data between benign and malignant tumors. (b) Data input. The images are divided into training, validation, and test sets. (c) Transfer learning. Six CNNs are used for transfer learning to obtain applicable networks. (d) Classifier selection. The idea of ensemble pruning is applied to select the most suitable base classifiers. (e) Ensemble learning. The weighted voting method is used to further improve the classification performance (a sketch follows this list). (f) Result evaluation. The confusion matrix is used to measure the classification ability of the whole algorithm.
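A minimal sketch of the weighted-voting step in (e), assuming each selected base classifier outputs class probabilities that are combined with accuracy-based weights; the paper does not spell out the exact weighting scheme, and all numbers below are hypothetical.

```python
import numpy as np

def weighted_vote(prob_list, weights):
    # prob_list: one (n_samples, n_classes) probability array per base classifier.
    # weights:   one weight per classifier, e.g. its validation accuracy.
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                                   # normalize the weights
    combined = sum(wi * p for wi, p in zip(w, prob_list))
    return combined.argmax(axis=1)                 # 0 = benign, 1 = malignant

# Hypothetical outputs of three selected base classifiers on four images.
p1 = np.array([[0.9, 0.1], [0.4, 0.6], [0.7, 0.3], [0.2, 0.8]])
p2 = np.array([[0.8, 0.2], [0.6, 0.4], [0.6, 0.4], [0.3, 0.7]])
p3 = np.array([[0.7, 0.3], [0.3, 0.7], [0.8, 0.2], [0.1, 0.9]])
print(weighted_vote([p1, p2, p3], weights=[0.989, 0.98, 0.95]))  # [0 1 0 1]
```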
2 Related Work
In the past ten years, CNNs have performed outstandingly in breast histopathology image classification. In [14], third-party software (the LNKNet package) containing a neural network classifier is applied to evaluate two specific textures, namely the quantity density of two landmark substances, achieving a 90% classification accuracy on breast histopathology images.
In [20], morphological features are extracted to classify cancerous and non-cancerous cells in breast histopathological images. In the experiment, a multilayer perceptron based on the feedforward ANN model obtains 80% accuracy, 82.9% sensitivity, and 89.2% AUC.
In [2], a CNN-based classifier of whole-slide histopathological images is designed. First, posterior estimates are obtained from CNNs at specific magnifications. Then, the posterior estimates of these random multi-views are vote-filtered to provide a slide-level diagnosis. Finally, 5-fold cross-validation is used in the experiment, with an average accuracy of 94.67%, a sensitivity of 96%, a specificity of 92%, and an F1-score of 96.24%.
In [31], a new hybrid convolutional and recurrent DNN is created to classify breast cancer histopathological images. The paper uses a fine-tuned Inception-V3 to extract features for each image block; the feature vectors are then input to a 4-layer bidirectional long short-term memory network [17] for feature fusion. The average accuracy is 91.3%, and a new dataset containing 3,771 histopathological images of breast cancer is published.
Other applications of classic CNNs and deep learning to breast histopathological images have been summarized in [34]. However, there is little research on the binary classification of breast histopathological images using ensemble learning methods, and the patient-level setting is the most commonly studied, as in the research listed in Table 1. Nevertheless, since image-level classification is more complex, the accuracy in previous research is relatively low, e.g., below 90% in [11] and around 91% in [7]. Therefore, this paper focuses on the challenge of image-level classification. At the same time, the characteristics of each transfer learning model are comprehensively considered, and ensemble learning is carried out based on their complementarity.
3 Method
In this section, the transfer learning and ensemble learning methods are intro-
duced separately, and some classical CNNs are mentioned.
4 Experiments
In this section, the experimental settings, process, analysis, and limitations of the results are presented. Empirically, we demonstrate that our method significantly improves on a number of prior machine learning schemes.
Fig. 2. Benign/malignant tumor subtype samples. (a)–(d) are benign images, (e)–(h)
are malignant images
In this paper, a transfer learning strategy is applied to train the models. After many trials, the parameters were set to the values that gave the best result on the validation set: the decay step is 5, the decay rate is 0.1, the initial learning rate is 1 × 10−4, and adaptive moment estimation (Adam) is selected as the optimizer. Besides, these CNNs are also trained from scratch, with all weights initialized randomly, to evaluate the benefit of transfer learning. Results are presented in Table 2.
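A minimal sketch of this training setup, assuming PyTorch (the paper does not name its framework): Adam with an initial learning rate of 1e-4 and a step scheduler that multiplies the rate by 0.1 every 5 epochs. The ResNet-50 backbone stands in for any of the six CNNs, and interpreting the decay step in epochs is our assumption.

```python
import torch
from torchvision import models

# ImageNet-pretrained ResNet-50 with a new binary (benign/malignant) head.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

for epoch in range(15):
    # ... one training pass over the data would go here ...
    scheduler.step()   # lr: 1e-4 for epochs 0-4, 1e-5 for 5-9, 1e-6 for 10-14
```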
Table 2. Results (in %) and prediction time (in s) of CNN models on the validation set. Acc – Accuracy; P – Precision; R – Recall; F1 – F1-score
From Table 2, it can be concluded that transfer learning is generally better than training from scratch by around 3% to 14%. Although Inception-V3 shows the opposite trend, this does not affect our choice of transfer learning for the next step, because training a network from scratch usually takes more than 2 h.
Figure 3 shows the transfer learning training curves of the six CNNs. Subsequently, these six networks are compared, and their confusion matrices are shown in Fig. 4. In the end, the Resnet-50 network performs best, topping all network models on three indicators: its accuracy is 98.90%, its precision reaches 98.72%, and its F1-score reaches 98.90%. The second place is the Densenet-201 network at around 98%, whose recall is the highest among all models, reaching 98.63%. By contrast, the results of the Inception-V3 network are relatively the worst, while the lowest precision, namely 93.27%, is obtained on VGG-19.
Fig. 3. CNN training process curves. The X-axis is the epoch; the Y-axis is accuracy/loss. Training accuracy: blue, training loss: green, validation accuracy: yellow, validation loss: red
Table 4. A comparison between the existing methods and our method on BreakHis
Discussion. It is clear that the accuracy of our method is much better than that of some state-of-the-art machine learning strategies. Compared with the single classifiers mentioned above, all four indicators are enhanced in the ensembled framework. In particular, the F1-score is close to 100%, showing that both the precision and the recall of the model are excellent.
Figure 6 illustrates some images that are not correctly classified. It can be found that correctly classified images contain almost all the typical features of benign or malignant tissue. However, some samples are quite similar to the other class in texture, distribution, and color, such as Fig. 6(a) and Fig. 6(b). Also, the wrongly predicted images often contain very little valuable information, such as Fig. 6(c) and Fig. 6(d); patch-based images are allowed to contain many blank areas during segmentation. These factors can interfere with the model's reliability.
Fig. 6. An example of the classification result. (a) and (c) are correctly classified
images. (b) and (d) are wrongly classified images
References
1. Anda, J.: Histopathology image classification using an ensemble of deep learning
models. Sensors 20 (2020)
2. Das, K., Karri, S., Roy, A., et al.: Classifying histopathology whole-slides using
fusion of decisions from deep convolutional network on a collection of random
multi-views at multi-magnification. In: Proceedings of ISBI 2017, pp. 1024–1027
(2017)
3. Graham, B., El-Nouby, A., Touvron, H., et al.: LeViT: a vision transformer in ConvNet's clothing for faster inference. In: Proceedings of ICCV 2021, pp. 12259–12269 (2021)
4. Gurcan, M., Boucheron, L., Can, A., et al.: Histopathological image analysis: a
review. IEEE Rev. Biomed. Eng. 2, 147–171 (2009)
5. Hadad, O., Ran, B., Ben-Ari, R., et al.: Ensemble learning (2009)
6. He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In:
Proceedings of CVPR 2016, pp. 770–778 (2016)
7. Hu, C., Sun, X., Yuan, Z., et al.: Classification of breast cancer histopathological
image with deep residual learning. Int. J. Imaging Syst. Technol. 31, 1583–1594
(2021)
8. Huang, G., Liu, Z., Van Der Maaten, L., et al.: Densely connected convolutional
networks. In: Proceedings of CVPR 2017, pp. 4700–4708 (2017)
9. Kassani, S., Kassani, P., Wesolowski, M., et al.: Classification of histopathological
biopsy images using ensemble of deep learning networks (2019)
10. Kolesnikov, A., Dosovitskiy, A., Weissenborn, D., et al.: An image is worth 16x16
words: Transformers for image recognition at scale (2021)
11. Li, J., Zhang, J., Sun, Q., et al.: Breast cancer histopathological image classification
based on deep second-order pooling network, pp. 1–7 (2020)
12. Liu, H., Dai, Z., So, D., et al.: Pay attention to mlps. Adv. Neural. Inf. Process.
Syst. 34, 9204–9215 (2021)
13. de Matos, J., et al.: Histopathologic image processing: a review. arXiv:1904.07900 (2019)
14. Petushi, S., Garcia, F., Haber, M., et al.: Large-scale computations on histology
images reveal grade-differentiating parameters for breast cancer. BMC Med. Imag-
ing 6(1), 1–11 (2006)
15. Rahaman, M., Li, C., Yao, Y., et al.: Identification of covid-19 samples from chest
x-ray images using deep learning: a comparison of transfer learning approaches. J.
Xray Sci. Technol. 28(5), 821–839 (2020)
16. Ribani, R., Marengoni, M.: A survey of transfer learning for convolutional neural
networks. In: Proceedings of SIBGRAPI 2019, pp. 47–57 (2019)
17. Schuster, M., Paliwal, K.: Bidirectional recurrent neural networks. IEEE Trans.
Signal Process. 45(11), 2673–2681 (1997)
18. Senousy, Z., Abdelsamea, M., Gaber, M., et al.: MCUa: multi-level context and uncertainty aware dynamic deep ensemble for breast cancer histology image classification. IEEE Trans. Biomed. Eng. (2021)
19. Senousy, Z., Abdelsamea, M., Mostafa Mohamed, M., et al.: 3e-net: Entropy-based
elastic ensemble of deep convolutional neural networks for grading of invasive breast
carcinoma histopathological microscopic images. Entropy 23, 620 (2021)
20. Shukla, K., Tiwari, A., Sharma, S., et al.: Classification of histopathological images of breast cancerous and non cancerous cells based on morphological features. Biomed. Pharmacol. J. 10(1), 353–366 (2017)
21. Siegel, R., Miller, K., Fuchs, H., et al.: Cancer statistics, 2021. CA: A Cancer J.
Clin. 71(1), 7–33 (2021)
22. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale
image recognition. arXiv:1409.1556 (2014)
23. Spanhol, F., Oliveira, L., Petitjean, C., et al.: A dataset for breast cancer
histopathological image classification. IEEE Trans. Biomed. Eng. 63(7), 1455–1462
(2015)
24. Srinivas, A., Lin, T., Parmar, N., et al.: Bottleneck transformers for visual recognition, pp. 16519–16529 (2021)
25. Szegedy, C., Liu, W., Jia, Y., et al.: Going deeper with convolutions, pp. 1–9 (2015)
26. Tolstikhin, I., Houlsby, N., Kolesnikov, A., et al.: Mlp-mixer: an all-mlp architecture
for vision. In: Advances in Neural Information Processing Systems, vol. 34 (2021)
27. Touvron, H., Bojanowski, P., Caron, M., et al.: Resmlp: Feedforward networks for
image classification with data-efficient training. arXiv: 2105.03404 (2021)
28. Touvron, H., Cord, M., Douze, M., et al.: Training data-efficient image transformers & distillation through attention, pp. 10347–10357 (2021)
29. Touvron, H., Cord, M., Sablayrolles, A., et al.: Going deeper with image trans-
formers. In: Proceedings of ICCV 2021, pp. 32–42 (2021)
30. Xu, W., Xu, Y., Chang, T., et al.: Co-scale conv-attentional image transformers.
In: Proceedings of ICCV 2021, pp. 9981–9990 (2021)
31. Yan, R., Ren, F., Wang, Z., et al.: Breast cancer histopathological image classifi-
cation using a hybrid deep neural network. Methods 173, 52–60 (2020)
32. Yang, Z., Ran, L., Zhang, S., et al.: Ems-net: ensemble of multiscale convolutional
neural networks for classification of breast cancer histology images. Neurocomput-
ing 366, 46–53 (2019)
33. Yuan, L., Chen, Y., Wang, T., et al.: Tokens-to-token vit: training vision trans-
formers from scratch on imagenet. In: Proceedings of ICCV 2021, pp. 558–567
(2021)
34. Zhou, X., Li, C., Rahaman, M., et al.: A comprehensive review for breast
histopathology image analysis using classical and deep neural networks. IEEE
Access 8, 90931–90956 (2020)
PIS-Net: A Novel Pixel Interval Sampling
Network for Dense Microorganism
Counting in Microscopic Images
1 Introduction
The problem of environmental pollution needs to be addressed alongside the development of urbanization, and with the outbreak of Coronavirus Disease 2019 (COVID-19), people pay ever more attention to microorganism analysis [20,22]. Biological methods, which are more efficient and pollution-free compared with physical and chemical methods, are widely applied for pollution control. Yeast is a single-celled eukaryotic microorganism that was first applied in wastewater treatment with striking performance [29], and it is still applied today for the treatment of toxic industrial wastewater and solid waste [34].
In Fig. 1, (a) Original Dataset: the dataset contains images of yeast cells and their ground truth (GT); each image contains fewer than 256 yeast cells. (b) Data Augmentation: mirror and rotation operations are applied to expand the original dataset. (c) Training Process: PIS-Net is trained for image segmentation, and the model with the best performance is saved. (d) Segmentation Result: the best PIS-Net model outputs the segmentation binary images. (e) Counting Result: the number of yeast cells is counted using connected domain detection.
The main contributions of this paper are as follows: (1) We propose PIS-Net for dense tiny object counting; all down-sampling operations are based on pixel interval sampling without max-pooling (see the sketch after this list). (2) Max-pooling loses some local features of tiny objects (edge lines may become disconnected after max-pooling operations), whereas PIS-Net can cover a more detailed region. (3) The proposed PIS-Net achieves the best counting result in comparison with other models.
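The following is a minimal sketch of pixel interval sampling as we read the paper's description, assuming PyTorch: every r-th pixel is taken on each of the r×r shifted grids, and the sub-grids are stacked along the channel axis, so no pixel value is discarded (unlike max-pooling). Up to channel ordering, this matches PyTorch's nn.PixelUnshuffle.

```python
import torch

def pixel_interval_sample(x, r=2):
    # (B, C, H, W) -> (B, C*r*r, H/r, W/r): keep every r-th pixel on each of
    # the r*r shifted grids and concatenate the sub-grids channel-wise.
    # Thin edge lines of tiny objects survive, since nothing is discarded.
    subgrids = [x[:, :, i::r, j::r] for i in range(r) for j in range(r)]
    return torch.cat(subgrids, dim=1)

x = torch.randn(1, 64, 128, 128)
print(pixel_interval_sample(x, r=2).shape)  # torch.Size([1, 256, 64, 64])
```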
2 Related Work
Table 1 lists the digital image processing-based microorganism counting approaches, including classical and machine learning-based counting approaches. Our survey paper [14] summarizes and analyzes these methods in detail.
trained using shape features for the classification of different microorganisms, and then Otsu thresholding is used for counting. In [6], the Marr-Hildreth operator is used to detect the edges of bacteria images, and thresholding is then used for binary image segmentation; after that, an Artificial Neural Network (ANN) is trained for bacteria classification. In [12], contrast-limited adaptive histogram equalization is used to segment the bacteria, and a neural network comprising four convolutional layers and one fully connected layer is used for microorganism image classification. Analyzing all related work shows that in the task of microorganism counting, most deep learning approaches are used for classification tasks and few for segmentation. The classical segmentation approaches performed unsatisfactorily in our pre-test because the contour lines of the yeast cells in our dataset are not clear, so we design an end-to-end CNN model for the task of dense tiny object segmentation.
Fig. 2. Images in the yeast cell dataset: (a) shows an original yeast image and (b) the corresponding ground truth image
The augmented yeast image dataset is randomly divided into training, validation, and test sets at a ratio of 3:1:1. Therefore, 244 original images and their corresponding GT images are used for training, 82 original images and their corresponding GT images are used for validation, and 82 original images are used for testing.
In the decoder network, four blocks are applied for up-sampling. Two convolution operations with a kernel size of 3×3 (each using padded filtering followed by a ReLU operation) are applied first. Then a transposed convolution with a kernel size of 3, a stride of 2, and a padding of 1 is applied for up-sampling. The transposed convolution, whose parameters are learned during backpropagation, can be applied to expand the size of the image [32]; meanwhile, the number of channels can be changed by using a different number of convolutional kernels [31]. After that, the high-resolution feature maps from the encoder are transformed into low-resolution feature maps using 2×, 4×, and 8× pixel interval sampling and concatenated with the feature maps after up-sampling. For instance, the 8× pixel-interval-sampled features of the first encoder block, the 4× features of the second block, and the 2× features of the third block are concatenated with the copied features of the fourth block and the features after up-sampling (five parts of pixel interval sampling features are concatenated at the fourth level of the decoder). In the same way, 4, 3, and 2 parts of features are concatenated at the third, second, and first levels of the decoder, respectively. After that, two convolutions and ReLU operations are applied to adjust the number of channels. The up-sampling operation is repeated four times, producing an output resolution of H × W with C channels, the same size as the encoder input features. Finally, a Softmax layer with 2 output channels is applied for feature map classification.
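A minimal sketch of one decoder level under the stated hyper-parameters, assuming PyTorch: two padded 3×3 convolutions with ReLU, then a learnable transposed convolution with kernel 3, stride 2, and padding 1. The `output_padding=1` and the channel sizes are our assumptions; the former is needed to exactly double the spatial resolution.

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Two padded 3x3 convolutions, each followed by a ReLU.
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )
        # Learnable 2x up-sampling (kernel 3, stride 2, padding 1).
        self.up = nn.ConvTranspose2d(out_ch, out_ch, kernel_size=3, stride=2,
                                     padding=1, output_padding=1)

    def forward(self, x, skips):
        # skips: pixel-interval-sampled encoder features already matching
        # x's spatial resolution, concatenated along the channel axis.
        x = torch.cat([x, *skips], dim=1)
        return self.up(self.convs(x))

block = UpBlock(in_ch=512 + 256, out_ch=256)
x, skip = torch.randn(1, 512, 28, 28), torch.randn(1, 256, 28, 28)
print(block(x, [skip]).shape)  # torch.Size([1, 256, 56, 56])
```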
4 Experiments
Metric      Definition
Accuracy    (TP + TN) / (TP + TN + FP + FN)
Jaccard     TP / (FN + TP + FP)
Dice        2TP / (FN + 2TP + FP)
Precision   TP / (TP + FP)
CA          1 − |N_pred − N_GT| / N_GT
HD          d_H(X, Y) = max( sup_{x∈X} inf_{y∈Y} d(x, y), sup_{y∈Y} inf_{x∈X} d(x, y) )
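A minimal NumPy sketch of the pixel-level metrics defined above, computed from binary prediction and ground-truth masks; the toy 3×3 masks are hypothetical.

```python
import numpy as np

def segmentation_metrics(pred, gt):
    # Pixel-level confusion counts from binary masks (values in {0, 1}).
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt);  tn = np.sum(~pred & ~gt)
    fp = np.sum(pred & ~gt); fn = np.sum(~pred & gt)
    return {
        "accuracy":  (tp + tn) / (tp + tn + fp + fn),
        "jaccard":   tp / (fn + tp + fp),
        "dice":      2 * tp / (fn + 2 * tp + fp),
        "precision": tp / (tp + fp),
    }

pred = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 1]])
gt   = np.array([[1, 0, 0], [0, 1, 0], [0, 1, 1]])
print(segmentation_metrics(pred, gt))
```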
Parameters Setting. Softmax is used for the final classification in our experiments. The Adam optimizer is applied to minimize the loss function; it adjusts the learning rate automatically by considering the gradient momentum of previous time steps [15]. In the training task, the learning rate is set to 0.001 and the batch size to 8. The number of epochs is set to 100, considering the converging speed of the experimental models; example loss and intersection-over-union (IoU) curves of the models are shown in Fig. 4. Although there are 92,319,298 parameters to be trained in total, PIS-Net converges rapidly and smoothly, without overfitting. There is a jump in the loss and IoU plots of all three tested networks between 40 and 80 epochs, which is caused by the small batch size: a small batch size may lead to large differences between batches, so the loss and IoU curves may jump during convergence.
training, validation, and test sets, as described in Sect. 3.1. All models are trained from scratch without pre-training or fine-tuning, the same as in [11] (although pre-trained models might obtain more satisfactory results). In addition, methods based on the Hough transform, Otsu thresholding, and Watershed are compared to show the performance of classical methods in the task of tiny dense object counting. The images after segmentation are shown in Fig. 5.
Table 3 shows that PIS-Net has the highest Dice, Jaccard, and Precision, which indicates that PIS-Net performs best in the task of dense tiny object segmentation. U-Net has slightly better Accuracy and Hausdorff distance, but the values are close to those of the proposed PIS-Net. It can be seen that the deep learning methods work better than the classical methods in this task.
After segmentation, the post-processing proposed in Sect. 3.3 is applied for noise removal. Then a region search method based on the 8-neighborhood is applied for dense tiny object counting. The counting results of several methods are shown in Table 4.
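A minimal sketch of the counting step, assuming SciPy's connected-component labeling with an 8-neighborhood structuring element; `min_area` is a hypothetical noise threshold standing in for the post-processing of Sect. 3.3, and the last line follows the CA definition above.

```python
import numpy as np
from scipy import ndimage

def count_cells(binary_mask, min_area=5):
    # 8-connected region search: a 3x3 structuring element of ones makes
    # diagonally touching pixels belong to the same region.
    structure = np.ones((3, 3), dtype=int)
    labels, n = ndimage.label(binary_mask, structure=structure)
    # Drop tiny regions as noise (hypothetical stand-in for post-processing).
    areas = ndimage.sum(binary_mask, labels, index=range(1, n + 1))
    return int(np.sum(np.asarray(areas) >= min_area))

mask = np.zeros((64, 64), dtype=np.uint8)
mask[5:15, 5:15] = 1      # one cell
mask[30:42, 40:52] = 1    # another cell
n_pred, n_gt = count_cells(mask), 2
print(n_pred, 1 - abs(n_pred - n_gt) / n_gt)   # counting accuracy (CA)
```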
The counting accuracy of PIS-Net is the best, more than 5% higher than that of U-Net. The counting accuracy of SegNet, Attention-UNet, Trans U-Net, and Swin U-Net would be less than zero without post-processing, owing to the huge number of false positive pixels.
From Table 5, it can be found that all evaluation indices of the repeated PIS-Net runs are similar, which shows satisfactory and stable performance for dense tiny object counting tasks.
Table 5. The evaluation indices of repeatability tests (accuracy, dice, jaccard, precision
and counting accuracy are in %, hausdorff distance is in pixel per image)
Model Training time Mean training time Test time Mean test time
PIS-Net 1750.80 7.18 76.23 0.93
U-Net 1033.00 4.23 42.94 0.52
Swin-UNet 1314.06 5.39 53.25 0.65
Att-UNet 1163.93 4.77 49.44 0.60
References
1. Ates, H., Gerek, O.: An image-processing based automated bacteria colony counter.
In: Proceedings of ISCIS 2009, pp. 18–23 (2009)
2. Austerjost, J., Marquard, D., Raddatz, L., et al.: A smart device application for the
automated determination of E. coli colonies on agar plates. Eng. Life Sci. 17(8),
959–966 (2017)
3. Badrinarayanan, V., Kendall, A., Cipolla, R.: Segnet: a deep convolutional encoder-
decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach.
Intell. 39(12), 2481–2495 (2017)
4. Barbedo, J.: An algorithm for counting microorganisms in digital images. IEEE
Lat. Am. Trans. 11(6), 1353–1358 (2013)
5. Barber, P., Vojnovic, B., Kelly, J., et al.: An automated colony counter utilising a
compact Hough transform. Proc. MIUA 2000, 41–44 (2000)
6. Blackburn, N., Hagström, Å., Wikner, J., et al.: Rapid determination of bacterial
abundance, biovolume, morphology, and growth by neural network-based image
analysis. Appl. Environ. Microbiol. 64(9), 3246–3255 (1998)
7. Boss, R., Thangavel, K., Daniel, D.: Automatic mammogram image breast region
extraction and removal of pectoral muscle. arXiv: 1307.7474 (2013)
8. Cao, H., Wang, Y., Chen, J., et al.: Swin-unet: unet-like pure transformer for
medical image segmentation. arXiv: 2105.05537 (2021)
9. Chen, J., Lu, Y., Yu, Q., et al.: Transunet: Transformers make strong encoders for
medical image segmentation. arXiv: 2102.04306 (2021)
10. Clarke, M., Burton, R., Hill, A., et al.: Low-cost, high-throughput, automated
counting of bacterial colonies. Cytometry Part A 77(8), 790–797 (2010)
11. Dietler, N., Minder, M., Gligorovski, V., et al.: A convolutional neural network
segments yeast microscopy images with high accuracy. Nature Commun. 11(1),
1–8 (2020)
12. Ferrari, A., Lombardi, S., Signoroni, A.: Bacterial colony counting by convolutional
neural networks. In: Proceedings of EMBC 2015, pp. 7458–7461 (2015)
13. Hong, M., Yujie, W., Caihong, W., et al.: Study on heterotrophic bacteria colony
counting based on image processing method. Control Instrum. Chem. Ind. 35(3),
38–41 (2008)
14. Zhang, J., Li, C., Rahaman, M., et al.: A comprehensive review of image analysis methods for microorganism counting: from classical image processing to deep learning approaches. Artif. Intell. Rev. 55, 2875–2944 (2021)
15. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. arXiv: 1412.6980
(2014)
16. Kosov, S., Shirahama, K., Li, C., et al.: Environmental microorganism classification
using conditional random fields and deep convolutional neural networks. Pattern
Recogn. 77, 248–261 (2018)
17. Kulwa, F., Li, C., Zhao, X., et al.: A state-of-the-art survey for microorganism
image segmentation methods and future potential. IEEE Access 7, 100243–100269
(2019)
18. Kulwa, F., Li, C., Zhang, J., et al.: A new pairwise deep learning feature for environmental microorganism image analysis. Environ. Sci. Pollut. Res., online first (2022)
19. Li, C., Wang, K., Xu, N.: A survey for the applications of content-based microscopic
image analysis in microorganism classification domains. Artif. Intell. Rev. 51(4),
577–646 (2017). https://doi.org/10.1007/s10462-017-9572-4
20. Li, C., Zhang, J., Kulwa, F., Qi, S., Qi, Z.: A SARS-CoV-2 microscopic image
dataset with ground truth images and visual features. In: Peng, Y., et al. (eds.)
PRCV 2020. LNCS, vol. 12305, pp. 244–255. Springer, Cham (2020). https://doi.
org/10.1007/978-3-030-60633-6_20
21. Oktay, O., Schlemper, J., et al.: Attention U-Net: learning where to look for the pancreas. arXiv: 1804.03999 (2018)
22. Rahaman, M., Li, C., Yao, Y., et al.: Identification of COVID-19 samples from chest
X-Ray images using deep learning: a comparison of transfer learning approaches.
J. X-ray Sci. Technol. 28(5), 821–839 (2020)
23. Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomed-
ical image segmentation. In: Proceedings of ICMICCAI 2015, pp. 234–241 (2015)
24. Selinummi, J., Seppälä, J., Yli-Harja, O., et al.: Software for quantification of
labeled bacteria from digital microscope images by automated image analysis.
Biotechniques 39(6), 859–863 (2005)
25. Tang, Y., Ji, J., Gao, S., et al.: A pruning neural network model in credit classification analysis. Comput. Intell. Neurosci. 2018, 22 (2018). Article ID: 9390410
26. Xu, H., Li, C., Rahaman, M.M., et al.: An enhanced framework of generative adver-
sarial networks (EF-GANs) for environmental microorganism image augmentation
with limited rotation-invariant training data. IEEE Access 8(1), 187455–187469
(2020)
27. Yamaguchi, N., Ichijo, T., Ogawa, M., et al.: Multicolor excitation direct counting
of bacteria by fluorescence microscopy with the automated digital image analysis
software BACS II. Bioimages 12(1), 1–7 (2004)
28. Yoon, S., Lawrence, K., Park, B.: Automatic counting and classification of bacterial
colonies using hyperspectral imaging. Food Bioprocess Technol. 8(10), 2047–2065
(2015)
29. Yoshizawa, K.: Treatment of waste-water discharged from sake brewery using yeast.
J. Ferment Technol. 56, 389–395 (1978)
30. You, L., Zhao, D., Zhou, R., et al.: Distribution and function of dominant yeast
species in the fermentation of strong-flavor baijiu. World J. Microbiol. Biotechnol.
37(2), 1–12 (2021)
31. Zeiler, M., Krishnan, D., Taylor, G., et al.: Deconvolutional networks. In: Proceedings of CVPR 2010, pp. 2528–2535 (2010)
32. Zeiler, M., Taylor, G., Fergus, R.: Adaptive deconvolutional networks for mid and
high level feature learning. In: Proceedings of ICCV 2011, pp. 2018–2025 (2011)
33. Zhang, C., Chen, W., Liu, W., et al.: An automated bacterial colony counting
system. In: Proceedings of SUTC 2008, pp. 233–240 (2008)
34. Zhang, H., Jian, L.: Current microbial techniques for biodegradation of wastewater
with high lipid concentrations. Tech. Equipment Environ. Pollut. Control 3, 28–32
(2004)
35. Zhang, J., Li, C., Kosov, S., et al.: LCU-net: a novel low-cost U-net for environ-
mental microorganism image segmentation. Pattern Recogn. 115, 107885 (2021)
36. Zhang, J., Li, C., Kulwa, F., et al.: A multi-scale CNN-CRF framework for environ-
mental microorganism image segmentation. BioMed Res. Int. 2020, 1–27 (2020)
37. Zhang, R., Zhao, S., Jin, Z., et al.: Application of SVM in the food bacteria image
recognition and count. In: Proceedings of ICISP 2010, vol. 4, pp. 1819–1823 (2010)
38. Zhao, P., Li, C., Rahaman, M.M., et al.: Comparative study of deep learning classi-
fication methods on a small environmental microorganism image dataset (EMDS-
6): from convolutional neural networks to visual transformers. Front. Microbiol.
13, 792166 (2022). https://doi.org/10.3389/fmicb.2022.792166
Biomedical Engineering
and Physiotherapy, Joint Activities,
Common Goals
Analysis of Expert Agreement
on Determining the Duration of Writhing
Movements in Infants to Develop
an Algorithm in OSESEC
1 Introduction
it possible to draw the infant’s parents’ attention to the presence of features that
should be consulted with an expert. Such management facilitates early diagno-
sis of neurological disorders in infants, which in turn allows for prompt medical
intervention early in the child’s development. Despite the development of modern
advanced diagnostic techniques, it is primarily the feedback obtained through
observation and clinical evaluation that is particularly important.
Early assessment of infant motor development has been the subject of study
by many researchers. Many publications describe testing procedures based on
diagnostic scales used in the early assessment of motor activity. The scales can
be divided into subjective and objective. The former indicate that the reliabil-
ity of assessing the motor development of a child depends on the experience of
the person who performs the examination (including work experience, regular
further training in pediatric physiotherapy, and individual observation skills) of
the person performing the examination [17,23]. The most recognized and widely
used methods of the subjective assessment of motor development in infants are:
Dubowitz Scale which identifies developmental disorders of the infant [10]; Test
of Infant Motor Scale (TIMP) which evaluates posture and psychomotor skills
of the infant [25]; Neonatal Behavioral Assessment Scale (NBAS) which assesses
infant behavior [26]; General Movement Assessment (GMA) – an assessment of
general movement patterns according to Prechtl [24]; Harris Infant Neuromotor
Test (HINT) which identifies neurodevelopmental, cognitive, and behavioral dis-
orders [16]; Alberta Infant Motor Scale (AIMS) that assesses the development of
motor functions from birth to independent locomotion [1]; and the Munich Func-
tional Developmental Diagnosis (MFDD) assessing functional skills in manual
dexterity, perception, active speech, speech comprehension, child independence,
and assessing postural reactivity. Of the subjective methods, the general move-
ments assessment has the highest reliability [3,4,27]. This method is described as
the most reliable for the prediction of cerebral palsy [22,27]. It is now a widely
used diagnostic tool in the neurological assessment of newborns and infants.
The method is considered extremely important for the overall assessment of the
integrity of the central nervous system [26].
Infants exhibit a spontaneous movement pattern called general movements (GMs) [5]. GMs begin as fetal movements, present from 9–12 weeks of fetal life until 40 weeks of gestation. From the time of delivery until 6 to 9 weeks of age, these movements are called writhing movements (WMs). They are performed with low velocity and amplitude and are characterized by high complexity and high variability with respect to amplitude, velocity, and acceleration [22]. After this period, writhing movements disappear, replaced by fidgety movements (FMs), which are present until about 6 months of age. One of the abnormal patterns of general movements is a poor repertoire of movements (PR): the movement pattern is simple and monotonous, movements of different body parts do not occur in a complex manner, and they are characterized by low amplitude, speed, and intensity. PR occurs frequently in infants, and therefore its predictive value is described in the literature as very low [12]. In their study on 10-day-old infants, De Vries & Bos presented the conclusion that
Katowice (Resolution No. 5/2018 of 19 April 2018). All patients and their par-
ents/guardians gave written informed consent to participate in the study. The
research was carried out in the Piekary Medical Centre in the neonatal unit in
Piekary Śląskie, Poland. The equipment used in the tests did not pose a threat
of radiation or exposure to other energy that could in any way affect the safety
of the observed infants.
The study population consisted of 125 full-term (38–42 weeks) infants on days 2 and 3 of life, born by physiological delivery, with a positive perinatal history. The motor activity of the subjects was recorded with a video camera. From this group, after applying the inclusion and exclusion criteria, such as the infant crying or sleeping and caregiver intervention aimed at calming the infant by giving a pacifier, rocking, stroking, or feeding, 36 recordings were selected for further analysis, with five videos (N1–N5) chosen for the main part of the study (Fig. 1). Table 1 shows specific information about the infants recorded in the selected videos.
Fig. 1. Flow diagram showing the criteria for the qualification of infants for the test
Table 1. Specific information about subjects selected from the research group
2.2 Methodology
Observations of the infants were made after their spontaneous awakening and after feeding. Recording of the infant's movement continued as long as the infant was active and did not cry. Fragments showing the infant sleeping or crying were removed. The hospital crib with the infant was placed on the video recording station in the appropriate orientation: the side of the infant's head had a blue marking labeled 'UP' on the video recording device, while the side of the infant's legs had a pink marking labeled 'DOWN'. The infant was recorded lying in a supine position, undressed except for a diaper.
The temperature, lighting, and noise level in the test room were consistent with
the current regulations of the neonatal ward. The camera was placed 1 m above
the infant’s navel. Captured by a full HD (1920 × 1080) resolution camera, the
recording lasted 10 to 17 min [9].
Videos in which the infant was crying or sleeping for an extended part of the recording were removed. Videos of 36 infants, evaluated by five independent physiotherapists (experts) with years of experience working with patients of developmental age, were used for further analysis. The neonatal activity was assessed using GMA. As a general movement assessment system, GMA is described as a standard, non-invasive, convenient, inexpensive, and reliable research method [2,11]. A standard GMA is a 3–5 min video observation of a supine, comfortably dressed, awake, and calm infant. The recorded movements are evaluated according to the infant's age (chronological or corrected).
The experts’ task was to identify videos with three complete WM sequences
in infants and videos of those with poor repertoire (PR). For the purposes of
the study, it was assumed that an infant would be identified as normal (N) if at
least four of the five experts classified the infant in the same way. The results
of the observations were documented in the form of a list of the following data for each of the 36 recordings: sex and age of the infant (in hours of life), duration of the recording, and the determination of N or PR by each of the five experts. In a further qualification step, five recordings of infants identified
as normal (N) were randomly selected for evaluation. Then, from each of the five
recordings, the sections in which the five experts indicated writhing movements
(WM), other movements (OM), and in which the infants made no movement
(NM) were selected.
The study used five videos of infants whose movements were unequivocally identified by the five experts in general movement assessment as writhing movements. Next, the percentage value of WMs was evaluated by dividing the total time of occurrence of WMs in a given video by the entire video length. Expert agreement was determined by dividing the length of the time ranges marked as WMs by all experts (the common part) by the length of the ranges indicated by at least one expert (the union of all experts' indications).
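The agreement measure defined above is the ratio of the length of the intersection to the length of the union of the experts' WM annotations (a Jaccard-type index). A minimal sketch is given below, assuming each expert's annotation is a list of (start, end) intervals in seconds; all names are illustrative, not taken from the study's software.

```python
def merge(intervals):
    """Merge overlapping (start, end) intervals into a disjoint, sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def total_length(intervals):
    """Total duration covered by a list of disjoint intervals (seconds)."""
    return sum(end - start for start, end in intervals)

def agreement(annotations):
    """Common part (marked by ALL experts) divided by the union
    (marked by AT LEAST ONE expert)."""
    common = merge(annotations[0])
    for expert in annotations[1:]:
        common = [(max(a0, b0), min(a1, b1))
                  for a0, a1 in common for b0, b1 in merge(expert)
                  if max(a0, b0) < min(a1, b1)]
    union = merge([iv for expert in annotations for iv in expert])
    return total_length(common) / total_length(union)

# Two experts marking WM sections (seconds from the start of the recording)
experts = [[(10, 60), (120, 200)], [(15, 70), (130, 190)]]
print(f"agreement = {agreement(experts):.2f}")  # 105 s / 140 s = 0.75
```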
2.3 Experts
The experiments were conducted by highly qualified physiotherapists who were
certified experts in GMA chosen for the purposes of the study:
– Expert 1 (E1): physiotherapist with 45 years of experience in working with infants and young children; psychologist, sensorimotor integration therapist, and NDT-Bobath therapist.
– Expert 2 (E2): physiotherapy specialist with 20 years of experience in working with infants and young children; NDT-Bobath therapist for children.
– Expert 3 (E3): physiotherapist, NDT-Bobath therapist for children, with 9 years of experience in working with infants and young children.
– Expert 4 (E4): physiotherapist, NDT-Bobath therapist for children, with 14 years of experience in working with infants and young children.
– Expert 5 (E5): physiotherapist, NDT-Bobath therapist for children, with 17 years of experience in working with infants and young children.
3 Results
Five experts received the five video recordings of the infants studied. Each expert's task was to analyze every video recording and identify the writhing movements of the individual infants (Fig. 2). The observations allowed for:
1. Evaluation of the movement on the timeline in percentage terms,
2. Determination of the number of WM time sections,
3. Determination of the agreement of the five experts' assessments at the same time points.
Table 2 shows the duration of WMs observed by the experts. The percentage of WMs observed by each expert in each recording and the number of WM sections marked by the experts at the indicated moments of the infants' activity are also presented.
The greatest expert agreement on the observation of WMs (9 common sections) was observed in the examination of infant N4 (Fig. 3); the agreement recorded during the observation of this infant lasted 3 min. During the examinations of infants N2, N3, and N5, the experts agreed on 5 common sections, whereas in N1 on 4 common sections. The lowest agreement was observed in the examinations of N1 and N3, with 4 and 5 common sections, respectively (Table 3).
Most experts observed the largest number of WMs in recording N4. Furthermore, the experts identified the smallest number of WMs in the video of infant N5, despite similar numbers of sections (except for E1, who included small breaks between WMs). In this case, there was also a large variation in agreement, which may be due to considerable difficulty in determining the beginning and end of a WM. The most (and the longest) WM sections were indicated by E2, thus deviating significantly from the observations of the other experts, while the fewest were found by E4, who, compared to the other experts, also indicated a relatively small percentage of movements as WMs. Similarly, E5 marked a small amount of WM compared to the other experts.
4 Discussion
In the present study, discrepancies were observed in the assessment of the videos by the experts. The discrepancies were caused by the fact that discretion was allowed in assessing when a WM started and when it ended. It is worth noting, however, that such definitions cannot be found in the extensive literature on the subject. Writhing movements are known to occur already in the fetal period.
Table 2. Duration and percentage of WMs and the number of WM sections indicated by each expert in recordings N1–N5

Recording (duration)  Measure             E1     E2     E3     E4     E5
N1 (10:43)            WM time             03:52  08:31  02:50  02:41  03:21
                      % WM                36.08  79.47  26.44  25.04  31.26
                      Number of sections  7      5      4      4      7
N2 (17:10)            WM time             02:47  02:42  03:32  02:26  02:10
                      % WM                16.21  15.73  20.58  14.17  12.62
                      Number of sections  6      7      8      6      8
N3 (12:09)            WM time             04:24  09:24  05:26  04:39  04:56
                      % WM                36.21  77.37  44.72  38.27  40.60
                      Number of sections  12     6      5      5      9
N4 (14:36)            WM time             10:00  09:26  09:42  09:30  09:03
                      % WM                68.49  64.61  66.44  65.07  61.99
                      Number of sections  6      7      6      4      7
N5 (12:08)            WM time             05:51  09:13  05:58  02:18  03:57
                      % WM                48.21  75.96  49.18  18.96  32.55
                      Number of sections  12     3      5      3      4
Fig. 3. Comparison of writhing movement sections marked by at least one of the experts
to the ranges of writhing movements observed in full agreement by all experts
Table 3. Percentage agreement of all five experts for each recording

Recording   N1       N2       N3       N4       N5
Agreement   18.89%   26.58%   17.92%   80.98%   19.58%
In their study on 10-day-old infants, De Vries and Bos presented the conclusion that if at least
one normal WM is observed in the infant, the likelihood of normal development
is high (94%) [6]. Therefore, a detailed determination of the beginning and end
of WMs is of no value in predicting neonatal development. The diagnosticians
unanimously distinguished normal WMs in the studied group of infants and iden-
tified them as normal. The GMA does not specify when such a movement begins
or ends. In developing computer-aided diagnosis tools for neurodevelopmental
disorders, it is important to consider the discrepancies that exist in the obser-
vations made by the experts indicated in this study. The results are the basis
for determining the effectiveness of the developed methods in the automated
identification of writhing movements.
References
1. Almeida, K.M., Dutra, M.V.P., Mello, R.R.D., Reis, A.B.R., Martins, P.S.: Concurrent validity and reliability of the Alberta Infant Motor Scale in premature infants. J. Pediatr. (Rio J.) 84, 442–448 (2008)
2. Bosanquet, M., Copeland, L., Ware, R., Boyd, R.: A systematic review of tests to
predict cerebral palsy in young children. Dev. Med. Child Neurol. 55(5), 418–426
(2013)
3. Cioni, G., Ferrari, F., Einspieler, C., Paolicelli, P.B., Barbani, T., Prechtl, H.F.:
Comparison between observation of spontaneous movements and neurologic exam-
ination in preterm infants. J. Pediatrics 130(5), 704–711 (1997)
4. Cioni, G., Prechtl, H.F., Ferrari, F., Paolicelli, P.B., Einspieler, C., Roversi, M.F.:
Which better predicts later outcome in fullterm infants: quality of general move-
ments or neurological examination? Early Hum. Dev. 50(1), 71–85 (1997)
5. De Vries, J.I., Visser, G.H., Prechtl, H.F.: The emergence of fetal behaviour. i.
qualitative aspects. Early Hum. Dev. 7(4), 301–322 (1982)
6. De Vries, N., Bos, A.: The quality of general movements in the first ten days of life
in preterm infants. Early Hum. Dev. 86(4), 225–229 (2010)
7. Doroniewicz, I., et al.: Computer-based analysis of spontaneous infant activity: a
pilot study. In: Pietka, E., Badura, P., Kawa, J., Wieclawek, W. (eds.) Information
Technology in Biomedicine. AISC, vol. 1186, pp. 147–159. Springer, Cham (2021).
https://doi.org/10.1007/978-3-030-49666-1_12
8. Doroniewicz, I., et al.: Temporal and spatial variability of the fidgety movement
descriptors and their relation to head position in automized general movement
assessment. Acta Bioeng. Biomech. 23(3), 69–78 (2021)
9. Doroniewicz, I., et al.: Writhing movement detection in newborns on the second and
third day of life using pose-based feature machine learning classification. Sensors
20(21), 5986 (2020)
10. Dubowitz, L., Ricci, D., Mercuri, E.: The Dubowitz neurological examination of the full-term newborn. Mental Retard. Dev. Disabil. Res. Rev. 11(1), 52–60 (2005)
11. Einspieler, C., Bos, A.F., Libertus, M.E., Marschik, P.B.: The general movement
assessment helps us to identify preterm infants at risk for cognitive dysfunction.
Front. Psychol. 7, 406 (2016)
12. Einspieler, C., Prechtl, H.F.: Prechtl’s assessment of general movements: a diagnos-
tic tool for the functional assessment of the young nervous system. Mental Retard.
Dev. Disabil. Res. Rev. 11(1), 61–67 (2005)
13. Einspieler, C., Prechtl, H.F., Ferrari, F., Cioni, G., Bos, A.F.: The qualitative
assessment of general movements in preterm, term and young infants-review of the
methodology. Early Hum. Dev. 50(1), 47–60 (1997)
14. Fjørtoft, T., Einspieler, C., Adde, L., Strand, L.I.: Inter-observer reliability of the "Assessment of Motor Repertoire – 3 to 5 Months" based on video recordings of infants. Early Hum. Dev. 85(5), 297–302 (2009)
15. Harris, S.R., Daniels, L.E.: Reliability and validity of the Harris Infant Neuromotor Test. J. Pediatr. 139(2), 249–253 (2001)
16. Heineman, K.R., Hadders-Algra, M.: Evaluation of neuromotor function in infancy: a systematic review of available methods. J. Dev. Behav. Pediatr. 29(4), 315–323 (2008)
17. Heinze, F., Hesels, K., Breitbach-Faller, N., Schmitz-Rode, T., Disselhorst-Klug,
C.: Movement analysis by accelerometry of newborns and infants for the early
detection of movement disorders due to infantile cerebral palsy. Med. Biol. Eng.
Comput. 48(8), 765–772 (2010)
18. Hopkins, B., Prechtl, H.R.: A qualitative approach to the development of move-
ments during early infancy. Clin. Dev. Med. 94, 179–197 (1984)
19. Ihlen, E.A., et al.: Machine learning of infant spontaneous movements for the early
prediction of cerebral palsy: a multi-site cohort study. J. Clin. Med. 9(1), 5 (2020)
20. Karch, D., et al.: Kinematic assessment of stereotypy in spontaneous movements
in infants. Gait Posture 36(2), 307–311 (2012)
21. Marcroft, C., Khan, A., Embleton, N.D., Trenell, M., Plötz, T.: Movement recog-
nition technology as a method of assessing spontaneous general movements in high
risk infants. Front. Neurol. 5, 284 (2015)
22. Noble, Y., Boyd, R.: Neonatal assessments for the preterm infant up to 4 months
corrected age: a systematic review. Dev. Med. Child Neurol. 54(2), 129–139 (2012)
23. Novak, I., et al.: Early, accurate diagnosis and early intervention in cerebral palsy:
advances in diagnosis and treatment. JAMA Pediatr. 171(9), 897–907 (2017)
24. Nuysink, J., et al.: Prediction of gross motor development and independent walking
in infants born very preterm using the test of infant motor performance and the
alberta infant motor scale. Early Hum. Dev. 89(9), 693–697 (2013)
25. Ploegstra, W.M., Bos, A.F., de Vries, N.K.: General movements in healthy full term
infants during the first week after birth. Early Hum. Dev. 90(1), 55–60 (2014)
26. Prechtl, H.F.: General movement assessment as a method of developmental neurology: new paradigms and their consequences. The 1999 Ronnie MacKeith Lecture. Dev. Med. Child Neurol. 43(12), 836–842 (2001)
27. Stewart, P., Reihman, J., Lonky, E., Darvill, T., Pagano, J.: Prenatal PCB expo-
sure and neonatal behavioral assessment scale (NBAS) performance. Neurotoxicol.
Teratol. 22(1), 21–29 (2000)
Comparative Analysis of Selected Methods
of Identifying the Newborn’s Skeletal
Model
1 Introduction
The problem discussed in this article is a challenge for many automatic behavior control systems. The way in which a machine sees and interprets an image obtained from the environment does not always correspond to the way in which a human (or another living organism) perceives it. This study deals with the customization of generally available pose estimation systems for the assessment of the newborn's posture. The research was inspired by the problems encountered while
an equally good 3D method. The algorithms developed so far reproduce the characteristic places on the body of adults with greater or lesser accuracy. Most of them fulfill tasks of a demonstrative nature, in which inaccuracies are acceptable. The preferred system should be resistant to such deviations and be able to eliminate them, or at least to identify and separate them. The complexity of the developed method is also not trivial. Incidental artifacts can be filtered out using specific algorithms; however, this introduces further computational overhead. Sometimes pre-processing of the images is also required. That pose estimation systems trained on universal collections of silhouettes do not provide sufficient effectiveness is evidenced by the development of special video collections containing only children, especially newborns. One of them is presented by Migliorelli and Moccia [15]. More advanced systems additionally use the depth image; such solutions can be found in the works of Hesse et al. [10–12]. The published datasets also contain labeled landmarks and are ready for training new neural network architectures or fine-tuning pre-trained ones, which is especially useful with a small data set [20]. Passmore et al. [17] offer a ready tool for automatic pose estimation that can be used at home with ordinary video recordings made with a smartphone. However, the biggest problem is that the landmarks are independently defined by the creators of these datasets and cannot be directly transferred to another system without tedious learning. The measures introduced in the cited works are based on previously known, appropriate point coordinates. This paper aims to discuss the effects of the indiscriminate use of any of these methods.
Pose estimation systems based on different concepts compete with each other in terms of both memory and time efficiency. The results of the rankings are available in many publications [24,25]. Applied as a benchmark, OpenPose uses a bottom-up approach: it first detects the parts (key points) belonging to each person in the image, and then assigns the parts to different people using part affinity fields [1]. The opposite is true of RMPE
(AlphaPose), which is a top-down method. Popular architectures for semantic and instance segmentation include Mask R-CNN and Deep High-Resolution Representation Learning for Human Pose Estimation (HRNet) [19]. Predicting the places of occurrence of characteristic points assumes a certain probability of their occurrence at a specific location. The estimation is based on a set of pre-tagged images that constitute the training material for a previously built artificial neural network. Most popular applications (mentioned above) use standard data sets for learning, which makes it easier both to compare performance and to train. Consequently, identical positions of the characteristic points can be expected. Popular data sets are: the MPII Human Pose dataset with 25,000 images [9,23], the Leeds Sports Pose Dataset with 2,000 images [13,22], Frames Labeled In Cinema (FLIC) [18], COCO (Microsoft Common Objects in Context) [14], the CMU Panoptic Dataset with 5 sequences (5.5 h) and 1.5 million 3D skeletons [9], the VGG Human Pose Estimation Datasets [26], YouTube Pose [4], and others. In addition, there are data sets for learning three-dimensional estimation: Human3.6M [2], containing 3.6 million three-dimensional human poses, and the HumanEva Dataset in versions I, II, and baseline [18], containing 40 thousand frames with 3D poses. The characteristic points calculated by the algorithms are given different names, not always corresponding to anatomical parts of the body, e.g., in the PoseNet and MPII formats. The diagrams (Fig. 1) show the different arrangements of the points.
Fig. 1. Location of characteristic points in various models: COCO format (a), OpenPose (b), MediaPipe Pose Landmark Model (BlazePose GHUM 3D) (c), Mobidev Pose format (d)
4 Results
The standard material – the sample against which the subsequent systems are compared – is the set of data obtained from the OpenPose system. From the raw dataset, including the coordinates of the points characteristic of the shoulders, hips, knee and elbow joints, wrists and heels, as well as the eyes and nose, a subset common to the other systems was selected, i.e., shoulders, knees, hips, elbows, wrists, and ankles. For the 20 still images constituting the previously selected frames, estimates for most of the
Fig. 2. Histograms of confidence score for two different points on the body
indication has become a benchmark. Two typical mistakes are illustrated by the frames (Fig. 3).

Fig. 3. Estimation of points on the child's body by the different methods (black – OpenPose, green or white – MMPose, dotted – MediaPipe Pose, last three – EfficientPose)
The graph in Fig. 3 shows the misdiagnosis of the location of the characteristic points. There are moments when a sudden increase in the length of the calf is accompanied by a decrease in the length of the thigh. This is the situation where the knee point is shifted upward. There are also other errors, where points are wrongly assigned to body parts and their connections. The charts in Fig. 4 show the difference between the squares of the lengths of the calf and thigh of the left leg (see also Figs. 5 and 6).
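To make this indicator concrete, the sketch below computes the difference between the squared thigh and calf lengths from per-frame keypoints and selects the frames used in the second stage of the experiment (difference above 0.7 of its maximum). The data layout and function names are our own assumptions, not the authors' code.

```python
import numpy as np

def sq_len(p, q):
    """Squared Euclidean length of a limb segment between keypoints p and q (px)."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def thigh_calf_diff(hip, knee, ankle):
    """Difference of squared thigh and calf lengths for one leg in one frame;
    a sudden jump signals a shifted knee point (calf grows, thigh shrinks)."""
    return abs(sq_len(hip, knee) - sq_len(knee, ankle))

def suspicious_frames(frames, fraction=0.7):
    """Indices of frames whose difference exceeds `fraction` of its maximum.
    `frames` is a list of dicts with 'hip', 'knee', 'ankle' (x, y) tuples."""
    diffs = np.array([thigh_calf_diff(f['hip'], f['knee'], f['ankle'])
                      for f in frames], dtype=float)
    return np.flatnonzero(diffs > fraction * diffs.max())
```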
As can be seen above, the difference usually does not exceed 25,000 points. This value is exceeded in 10 cases within the 10 min of recording. The increase in this value usually occurs simultaneously in each of the tested methods, but the degree of exceedance is much greater in OpenPose than in the others, and there

Fig. 4. Confidence score (top) and squares of length for the left leg (bottom)

Fig. 5. Squares of the length of the thigh and calf of the right leg

Fig. 6. Difference in the squares of the length of the thigh and calf of the left leg over time

are also situations in which it does not occur in the other methods. It is also worth
noting that in the MediaPipe method, despite the lowering of the hip line, there is no shortening of the thigh part of the leg. The histogram in Fig. 7 presents the dispersion of the difference between the squares of the lengths of both parts of the left and right legs.
Fig. 7. Histogram: difference in the squares of the length of the thigh and calf of the
left leg
Based on only one film, it can be assumed that the MediaPipe method may be the best: it shows the smallest dispersion of values, and the indicated values are the most compact. In the other two methods, single points are located in the range from 20,000 to 25,000 points. Another experiment should confirm whether the above dependencies are repeated in other recordings. The second stage consisted of selecting the sequences in which the probability of anomalies was the highest. The decisive parameter was the difference between the thigh and calf lengths in relation to its maximum value. All frames with a current difference greater than 0.7 of the maximum were selected. Out of 130 films with a total length of over 44 h (9.5 million frames), an average of 170 frames per film met the criteria. After rejecting the films for which not all points were calculated in the OpenPose system (A), the coordinates of the characteristic points in the selected frames of the remaining films were estimated with the MMPose (B) (hrnet_w48_coco_256 × 192) and MediaPipe (C) methods. The following
rule was adopted for the evaluation: for each frame and point, the coordinates are calculated using all three methods, obtaining three results. Of these three, the one that is farthest from the other two is discarded. The best method is the one whose results are least often rejected. All six lower-body points are taken into account, separately for each selected frame and film. The percentages of rejections and their average values are calculated for each method and point.
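The elimination rule itself can be stated in a few lines. The sketch below (our own illustration with toy data, not the authors' code) discards, for each frame and point, the estimate lying farthest from the other two and tallies the rejections per method:

```python
import numpy as np

def reject_farthest(pa, pb, pc):
    """Index (0 = A, 1 = B, 2 = C) of the estimate farthest from the other two.
    Each argument is the (x, y) coordinate of the same body point produced
    by one of the three methods for one frame."""
    pts = np.asarray([pa, pb, pc], dtype=float)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return int(np.argmax(dist.sum(axis=1)))  # largest total distance to the others

rejections = [0, 0, 0]  # counts for methods A, B, C
frames = [((100, 200), (103, 198), (150, 260))]  # toy data: one frame, one point
for pa, pb, pc in frames:
    rejections[reject_farthest(pa, pb, pc)] += 1
print(rejections)  # [0, 0, 1]: the C estimate is discarded here
```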
It can be seen (Table 1) that methods A and C are most often rejected, and B and A the least frequently.

Table 1. Mean percentage of rejections for each method and point

Point/method    1     2     3     4     5     6
OpenPose (A)    20%   20%   47%   52%   42%   41%
MMPose (B)      27%   22%   15%   13%   16%   18%
MediaPipe (C)   53%   58%   38%   35%   41%   41%
MAX             C     C     A     A     A     A
MIN             A/B   A/B   B     B     B     B

If we consider only the number of films in which a given method is most often rejected, the situation looks like in Table 2. It lists the number of films in which the results of the indicated
method most often differ from the others. The comparison shows that in only 9% of the films surveyed does the MMPose method give way to another method (i.e., it is rejected), while in 2/3 of the films it is part of the winning pair. At the same time, it keeps an advantage of about 30% over the next method.
Table 2. The most common deviations in the results for the methods in the following
videos
The methods that are least often rejected are shown in Table 3.
Table 3. Methods for which results are the least likely to deviate from subsequent
videos
The built model of the bone skeleton is intended to be the basis for the development of a universal method for studying the characteristics of the spontaneous movements of infants, in particular their intensity, fluidity, phase, and duration, as well as the interdependencies between them. Quantitative and qualitative determination of each activity of the newborn will make it possible to document disturbing developmental trends and to track changes with age and during therapy. It will also create the foundations for building a model of a child's physical development in the first weeks and months of life. With regard to the methods of obtaining char-
acteristic points, it can be stated that the popular OpenPose method fared worse
than the other two competing ones. Such a verdict was strongly influenced by the
lack of resistance to the occurrence of artifacts and their fairly large range. The
advantages, however, include the speed of this method and the ease of imple-
mentation. The method works well for the upper torso, head, hands, and feet by
detecting the toes and heel. The extended method also identifies the fingers, but
not always. The MMPose method is easily configurable, and with a little modification it can generate a large data set; however, this takes a long time. The necessity to install some libraries in advance may also be a significant obstacle. On the other hand, it can run on both GPU and CPU, which is an advantage. The MMPose method is the most stable, as long as certain parameters are used: compared to the other methods, the characteristic points it calculates are not among the most common outliers at the individual points, and they are the least likely outliers in three out of five places. The advantage of the MediaPipe method is the large number of detectable points and the ease of installation
and commissioning. On the other hand, the OpenPose method is variable, as it is most often rejected in three categories and least often in the other two. The selection does not change when the criterion is only the ranking position in each video, without taking the percentage shares into account. The MediaPipe method fares the worst here, accounting for more than half of the rejections and only 7% of the non-rejected ones. Method B is again in the lead, as it participates by far the least in the rejections and in as much as 65% of the least frequently rejected ones.
Due to the nondeterministic test environment (Google Colab), the time parameter was not included in the evaluation. In the described experiment, a fairly simple comparative method was used, and the adopted method of selecting the winner by elimination is not without drawbacks. The studies did not take into account differences in the definition of body parts, which may be a mistake; still, the verdicts are fairly consistent across most characteristic points. The approach also does not take the severity of the error into account, and there are situations where the differences are minimal (<5%) – in Table 1, points 1 and 2, methods A and B in the minimum category. Considering only the number of clips, it can be concluded that OpenPose and MediaPipe are ex aequo the most often eliminated, while MMPose is most often found close to another method. In related articles [7], the authors compare their proprietary CIMA-Pose method with the popular OpenPose, EfficientPose III, and EfficientHourglass B4 methods and with manual annotations, training the network at the same time. In the cited article,
OpenPose performs the worst in terms of both accuracy and complexity. Summarizing all the pros and cons, according to the authors, the MMPose method is the most useful for the purpose indicated at the beginning. Since the presented comparison of algorithms was limited to a selected group and did not take into account the possibility of correction and training, future work should focus on the use of a target set of children's profiles. Tuning also requires a smooth transition between individual frames. This is the intended direction for continuing the work on improving the method of analyzing the motor well-being of infants. The results should be approached with caution, as the dataset was neither large nor representative, and the analysis does not take real coordinates into account, so it may happen that the rejected method (the one most different from the others) is in fact more accurate than the other two.
References
1. Cao, Z., et al.: OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part
Affinity Fields. IEEE Trans. Pattern Anal. Mach. Intell. 43, 1 (2021)
2. Ionescu, C., Li, F., Sminchisescu, C.: Human3.6M Dataset. http://vision.imar.ro/human3.6m/description.php. Accessed 03 Sep 2021
3. Ceseracciu, E., et al.: Comparison of markerless and marker-based motion capture
technologies through simultaneous data collection during gait: Proof of concept.
PLoS One. 9(3), 1–7 (2014)
4. Charles, J., et al.: Personalizing human video pose estimation. In: Proceedings of
the IEEE Computer Society Conference on Computer Vision and Pattern Recog-
nition, pp. 3063–3072 (2016)
5. Doroniewicz, I., et al.: Writhing movement detection in newborns on the second and
third day of life using pose-based feature machine learning classification. Sensors
(Switzerland). 20(21), 1–15 (2020)
6. Doroniewicz, I., et al.: Temporal and spatial variability of the fidgety movement
descriptors and their relation to head position in automized general movement
assessment. Acta Bioeng. Biomech. 23(3), 1–21 (2021)
7. Groos, D., et al.: Towards human performance on automatic motion tracking of
infant spontaneous movements. Comput. Med. Imaging Graph. 95, 1–14 (2021)
8. Groos, D., Ramampiaro, H., Ihlen, E.A.F.: EfficientPose: scalable single-person
pose estimation. Appl. Intell. 51(4), 2518–2533 (2020). https://doi.org/10.1007/
s10489-020-01918-7
9. Joo, H., Simon, T., Xiang, D., et al.: CMU Panoptic Dataset. http://domedb.perception.cs.cmu.edu/. Accessed 03 Sep 2021
10. Hesse, N., et al.: Body pose estimation in depth images for infant motion analysis.
In: Proceedings of the Annual International Conference of the IEEE Engineering
in Medicine and Biology Society, EMBS. (2017). https://doi.org/10.1109/EMBC.
2017.8037221
11. Hesse, N., Bodensteiner, C., Arens, M., Hofmann, U.G., Weinberger, R., Sebastian
Schroeder, A.: Computer vision for medical infant motion analysis: state of the
art and RGB-D data set. In: Leal-Taixé, L., Roth, S. (eds.) ECCV 2018. LNCS,
vol. 11134, pp. 32–49. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-
11024-6_3
12. Hesse, N., et al.: Estimating body pose of infants in depth images using random
ferns. In: Proceedings of the IEEE International Conference on Computer Vision
(2015). https://doi.org/10.1109/ICCVW.2015.63
13. Johnson, S., Everingham, M.: Clustered pose and nonlinear appearance models
for human pose estimation. In: British Machine Vision Conference BMVC 2010 -
Proceedings, pp. 1–11 (2010)
14. Lin, T., Maire, M.: COCO Dataset | Papers With Code. https://paperswithcode.
com/dataset/coco. Accessed 03 Sep 2021
15. Migliorelli, L., et al.: The babyPose dataset. Data Br. 33 (2020). https://doi.org/
10.1016/j.dib.2020.106329
16. Nakano, N., et al.: Evaluation of 3D markerless motion capture accuracy using
open-pose with multiple video cameras. Front. Sport. Act. Living. 2(50), 1–9 (2020)
17. Passmore, E., et al.: Deep learning for automated pose estimation of infants at
home from smart phone videos. Gait Posture. 81 (2020). https://doi.org/10.1016/
j.gaitpost.2020.08.026
18. Sapp, B., Taskar, B.: MODEC: Multimodal decomposable models for human pose
estimation. In: Proceedings of the IEEE Computer Society Conference on Com-
puter Vision and Pattern Recognition, pp. 3674–3681, Portland, OR, USA (2013)
19. Sun, K., et al.: Deep high-resolution representation learning for human pose esti-
mation. In: Proceedings of the IEEE Computer Society Conference on Computer
Vision and Pattern Recognition, pp. 5686–5696 (2019)
20. Huang, X., Fu, N., Liu, S., Ostadabbas, S.: Invariant representation learning for infant
pose estimation with small data. In: 16th IEEE International Conference on Auto-
matic Face and Gesture Recognition, p. 18, Jodhpur, India (2021). https://doi.
org/10.1109/FG52635.2021.9666956
21. Home - WRNCH. https://wrnch.ai/. Accessed 03 Sep 2021
22. COCO Dataset | Papers With Code. https://paperswithcode.com/dataset/coco.
Accessed 03 Sep 2021
23. Human3.6M Dataset. http://vision.imar.ro/human3.6m/description.php. Accessed
03 Sep 2021
24. Leeds Sports Pose Dataset. http://sam.johnson.io/research/lsp.html. Accessed 03
Sep 2021
25. MPII Human Pose Database. http://human-pose.mpi-inf.mpg.de/. Accessed 03
Sep 2021
26. Pose Estimation. https://paperswithcode.com/task/pose-estimation. Accessed 03
Sep 2021
27. Pose Estimation on MPII Human Pose. https://paperswithcode.com/sota/pose-estimation-on-mpii-human-pose. Accessed 03 Sep 2021
28. VGG Pose Datasets. https://www.robots.ox.ac.uk/~vgg/data/pose/. Accessed 03
Sep 2021
Head Pose and Biomedical Signals
Analysis in Pain Level Recognition
1 Introduction
Pain level recognition is an essential issue for medicine and physiotherapy. Since
a patient’s self-report is highly subjective and corresponds to the individual pain
experience, automatic pain feeling recognition systems need to be developed. It is
really important, especially in uncooperative patients (infants, mentally-disabled
people, etc.). Pain level recognition also allows for identifying the stimulus inten-
sity, which can be harmful and may lead to tissue damage [24]. Pain intensity
adjustment is a key step in therapy procedures, where the pain cannot be con-
trolled by the subject himself and is induced by an external source (e.g., a
physiotherapist). Then, an automatic pain feeling recognition system seems to
be a kind of tissue emergency brake. There are some widely used databases in
the field of automatic pain assessment research. One of the most popular is the
BioVid Heat Pain Database [21]. It gathers data for 90 patients and is based
on electrodermal activity (EDA), electrocardiography (ECG), and electromyography (EMG) signals. In this study, several experiments on pain level recognition were carried out, including tests on balanced and imbalanced data sets. The benefit
of the head pose data analysis for pain sensation assessment in manual therapy
is discussed in Sect. 4. Section 5 concludes the paper.
Fig. 1. The workflow of data acquisition and analysis for pain level assessment
The study was conducted in an isolated, quiet therapy room. Participants were informed about the course of the examination and gave written consent. The fascial therapy of the shoulder and neck area was performed by a physiotherapist and continued for 2–3 min. Patients were sitting on a chair. The procedure was performed twice. Patients reported their pain during the therapy by saying a number in the range of 0 to 10, where 0 means no pain and 10 refers to the most severe pain imaginable.
Additionally, after the therapy, patients were asked to indicate numeric
thresholds of moderate pain feeling (MPT) and severe pain sensation (SPT)
in their own opinion. The boundaries were used to binarize the pain labels. For
this study, we used data from 46 patients aged 22–72. The measurements were
carried out at a physiotherapy clinic and were approved by a bioethics commit-
tee.
Data were collected with a measuring system consisting of a wearable device (RespiBan Professional, Biosignalplux, Portugal) and an external video camera for patient movement registration. EDA, BVP, and EMG signals were acquired with sampling frequencies of 8, 64, and 256 Hz, respectively. EDA sensors were attached to the middle parts of the index and middle fingers. The BVP clip-sensor collected a signal from the ring finger. EMG electrodes were placed on the forehead to register the corrugator supercilii muscle signals. The video camera captured waist-up images at 30 Hz and 640 × 280 px. The patient was sitting
in an upright position, and a full view of the frontal face was recorded. The platform (Fig. 2) acquired synchronized time series together with the pain rating declared by the patient during therapy [2]. The preprocessed signal data were divided into 4-second frames with a 50% overlap, which meets the requirements of short-term pain assessment, important in manual therapy.
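For illustration, the framing step can be sketched as follows (a minimal example with synthetic data; only the 4 s window and 50% overlap are taken from the description above):

```python
import numpy as np

def frame_signal(x, fs, frame_len_s=4.0, overlap=0.5):
    """Split a 1-D signal into frames of frame_len_s seconds with the given
    fractional overlap (0.5 = 50%, i.e. a 2 s hop for a 4 s frame)."""
    frame = int(frame_len_s * fs)          # samples per frame
    hop = int(frame * (1.0 - overlap))     # frame shift in samples
    starts = range(0, len(x) - frame + 1, hop)
    return np.stack([x[s:s + frame] for s in starts])

eda = np.random.randn(8 * 120)        # 2 min of a synthetic EDA-like signal at 8 Hz
print(frame_signal(eda, fs=8).shape)  # (59, 32): 59 frames of 32 samples each
```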
between subsequent video frames (Fig. 3). For proper tracking, the KLT algorithm requires a ROI with many corner points that do not differ significantly between consecutive video frames. Still, the KLT uses a complex transformation and manages to track points moving with various velocities. The derived transformation is applied to the ROI bounding box outlined in Phase 1 and updated for consecutive video frames, tracking the face position.
For each video frame, the angle between the ROI bounding box diagonal and the x-axis is found (Fig. 4). The result is referred to as the head tilt angle. Two features are extracted from each 4-second frame for further analysis: the range of the head tilt angle and the maximum of its first derivative.
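A sketch of the angle and the two per-frame features is given below; the corner-point layout of the tracked bounding box is our own assumption, not taken from the paper:

```python
import numpy as np

def tilt_angle(corners):
    """Angle (degrees) between a bounding-box diagonal and the x-axis.
    `corners` holds the four tracked ROI corners (top-left, top-right,
    bottom-right, bottom-left) in pixel coordinates."""
    corners = np.asarray(corners, dtype=float)
    dx, dy = corners[2] - corners[0]      # top-left -> bottom-right diagonal
    return np.degrees(np.arctan2(dy, dx))

def head_pose_features(angles, fps=30.0):
    """Features of one 4-second frame: range of the head tilt angle and the
    maximum of its first derivative (degrees per second)."""
    angles = np.asarray(angles, dtype=float)
    return angles.max() - angles.min(), np.max(np.diff(angles)) * fps
```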
2.4 Classification
Frame labelling is determined based on the patient’s subjective pain thresholds.
All frames marked by a subject with a number higher than SPT are treated as
severe pain class, whereas those below MPT stand for no-pain class (details are
given in [3]). The remaining frames are not used in this study. Since there is
a noticeable class imbalance (300 severe pain and 736 no-pain frames), we use

Fig. 4. Head tilt angle detection. Characteristic points are marked by white signs; yellow rectangles stand for the tracked bounding box. Head tilt angle α = 49° (left frame) and α = 57° (right frame)

two differently arranged data sets. In the first one, the number of observations equals 300 in both classes, whereas the second data set contains the originally imbalanced number of frames.
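A minimal sketch of the labelling and balancing steps, under our own naming assumptions, could look as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

def binarize_labels(ratings, mpt, spt):
    """Map 0-10 pain ratings to classes: 1 = severe pain (above SPT),
    0 = no pain (below MPT), -1 = discarded (in between)."""
    ratings = np.asarray(ratings)
    labels = np.full(ratings.shape, -1)
    labels[ratings > spt] = 1
    labels[ratings < mpt] = 0
    return labels

def balance(X, y):
    """Random undersampling of the majority class to the minority class size."""
    idx0, idx1 = np.flatnonzero(y == 0), np.flatnonzero(y == 1)
    n = min(len(idx0), len(idx1))
    keep = np.concatenate([rng.choice(idx0, n, replace=False),
                           rng.choice(idx1, n, replace=False)])
    return X[keep], y[keep]
```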
Decision tree and random forest classifiers are used for binary data classification. A decision tree is a simple and quick classifier that can be used in real-time analysis. Decision trees split nodes to achieve the lowest impurity in the resulting classes through minimization of the Gini index: at each split, the feature that divides the data into the two purest classes is selected. The same feature can be used several times, but too deep a tree causes the classifier to learn specific cases instead of global rules. In this study, the maximum number of splits was 4 to avoid overfitting to the training set. Because decision trees are unstable (a slight variance in the data may distort the result), their ensemble, the random forest, was also used.
Random forest is a tree-bagging method that combines the predictions of several weak base learners. The training set is divided into a few groups, and each group trains one weak learner. A random subset of predictors is used for node splitting in the growing phase, which results in the diversity of the tree ensemble [5].
Then, the random forest prediction is made as follows: the results of all weak learners are collected, and the class with the most votes stands for the final decision. Compared to the basic decision tree classifier, the random forest leads to more accurate output and reduced variance. In this study, the number of learning cycles was set to 30; increasing it did not significantly affect the results.
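The paper does not name the software used; a rough scikit-learn equivalent of the described setup is sketched below. The 4-split limit is emulated via the leaf count (a binary tree with k splits has k + 1 leaves), which is an approximation on our part.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Shallow decision tree: Gini impurity, at most 4 splits (= 5 leaves).
tree = DecisionTreeClassifier(criterion="gini", max_leaf_nodes=5, random_state=0)

# Ensemble of 30 learning cycles: trees grown on bootstrap samples with a
# random subset of predictors considered at every split.
forest = RandomForestClassifier(n_estimators=30, max_leaf_nodes=5, random_state=0)

# 10-fold cross-validation with metrics close to those reported in the paper;
# X would be the (n_frames, 6) feature matrix and y the binary pain labels.
scoring = ["accuracy", "recall", "precision", "f1"]
# scores = cross_validate(forest, X, y, cv=10, scoring=scoring)
```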
3 Results
Several experiments were carried out. First, we used two different feature sets to validate the classification performance: one set included all six extracted features, and the second was reduced to the biomedical signal features only. Besides, we tested imbalanced and balanced data sets (in terms of the number of observations per class). Both 10-fold and one-patient-out cross-validation were applied. The presented results are the medians of 10 runs, each with a new random data set. The assessment involved five metrics: accuracy, sensitivity, precision, specificity, and F1 score. Results for the decision tree classifier are presented in Table 1 and for the random forest in Table 2. Table 3 presents the results for imbalanced data.
Table 1. Results of the decision tree classifier (balanced data set)

              One-patient-out               10-fold
              All features   Without video  All features   Without video
Accuracy      0.68 ± 0.06    0.65 ± 0.08    0.75 ± 0.07    0.75 ± 0.05
Sensitivity   0.67 ± 0.16    0.78 ± 0.21    0.90 ± 0.07    0.89 ± 0.07
Precision     0.64 ± 0.20    0.64 ± 0.24    0.69 ± 0.06    0.70 ± 0.04
Specificity   0.73 ± 0.12    0.60 ± 0.12    0.59 ± 0.12    0.61 ± 0.08
F1 score      0.62 ± 0.08    0.66 ± 0.12    0.78 ± 0.05    0.78 ± 0.04
Table 2. Results of the random forest classifier (balanced data set)

              One-patient-out               10-fold
              All features   Without video  All features   Without video
Accuracy      0.66 ± 0.10    0.61 ± 0.09    0.85 ± 0.06    0.81 ± 0.05
Sensitivity   0.60 ± 0.11    0.61 ± 0.16    0.85 ± 0.04    0.83 ± 0.06
Precision     0.68 ± 0.21    0.65 ± 0.15    0.85 ± 0.09    0.80 ± 0.08
Specificity   0.73 ± 0.13    0.59 ± 0.30    0.86 ± 0.08    0.80 ± 0.07
F1 score      0.64 ± 0.15    0.61 ± 0.10    0.85 ± 0.06    0.81 ± 0.06
Table 3. Results of the random forest classifier (imbalanced data set)

              One-patient-out               10-fold
              All features   Without video  All features   Without video
Accuracy      0.77 ± 0.10    0.72 ± 0.13    0.90 ± 0.02    0.84 ± 0.04
Sensitivity   0.51 ± 0.20    0.44 ± 0.23    0.76 ± 0.08    0.65 ± 0.09
Precision     0.60 ± 0.44    0.57 ± 0.27    0.86 ± 0.09    0.77 ± 0.13
Specificity   0.91 ± 0.08    0.85 ± 0.10    0.95 ± 0.03    0.92 ± 0.05
F1 score      0.46 ± 0.34    0.48 ± 0.23    0.80 ± 0.06    0.70 ± 0.09
4 Discussion
The experiments carried out aimed to determine the impact of video features on pain level classification. The fusion of video and biomedical data provides better results for the random forest than using biomedical signals only (F1 scores of 0.85 and 0.81, respectively). However, this seems not to affect the decision tree performance: the results for both feature sets are comparable. Here, the first split was based on the EMG energy, which implies that EMG is a distinctive predictor. Head pose features were used in further splits and thus had less impact on the results. Since the random forest imposes a random feature subset at every split, video data could be involved in the initial splits, which boosted the classification performance. Therefore, head pose analysis may be beneficial in pain level recognition systems for manual therapy.
The imbalance between severe pain and no-pain frames is well known in the literature [23]. Hence, there is a need to validate systems on such imbalanced data sets. Satisfying results were obtained for the random forest, with a sensitivity of 0.76 and an F1 score of 0.80 for the all-features set. Figure 6 presents the confusion matrices for imbalanced and balanced data set classification.
It is worth considering which validation method is used for the pain classification problem. Since, in some cases, training and test sets include samples of the same patient, k-fold cross-validation may artificially improve the results. Given the widely observed subjectivity of pain reactions, one-patient-out cross-validation reflects the credibility of the classifiers better than k-fold cross-validation. Several head-pose responses were observed, from pain-avoiding moves to a relaxed, upright position. What is more, manual therapy itself could affect head positions, i.e., some patients may turn their heads to facilitate access to the muscle. Thus, a patient-specific model is desirable in our future work.
In previous works, Zhi et al. [27] proposed a method integrating biomedical (EDA, ECG, and EMG) and video data and obtained an accuracy of 0.68, where the pain was induced by heat and electric stimulation. Better results (an accuracy of 0.89) were obtained for cold pressor stimulation [10], where heart rate, blood pressure, respiration, EDA, and video-based facial action were combined. Still, those experiments were carried out in laboratory conditions. Zamzmi et al. [25] performed pain level assessment in 32–41-week-old infants during painful procedures, e.g., heel lancing. The integration of facial expressions, body movements, and vital signs (HR, respiratory rate, and spirometry) resulted in an accuracy of 0.95, yet the pain was assessed in 1-minute intervals. Manual therapy requires continuous pain monitoring, affecting the therapy course and intensity. This study used overlapping 4-second frames, which better capture pain onset reactions. It also directs our further efforts toward a real-time recognition approach.
Fig. 6. Confusion matrices for imbalanced and balanced data set, respectively. Results
of random forest for all features classification
5 Conclusion
In this paper, an approach to automatic pain assessment was presented. The study was based on data acquired during manual physiotherapy. Biomedical signals (EDA, BVP, and EMG) and video data based on the head pose were used in binary classification. Decision trees and random forests were tested, yielding an accuracy of 0.85 for balanced data sets. In some cases, video data enhanced the classification performance.
References
1. Aqajari, S.A.H., et al.: Pain assessment tool with electrodermal activity for post-
operative patients: method validation study. JMIR mHealth uHealth 9(5), e25258
(2021)
2. Badura, A., Bieńkowska, M., Masłowska, A., Czarlewski, R., Myśliwiec, A.,
Pietka, E.: Multimodal signal acquisition for pain assessment in physiotherapy.
In: Pietka, E., Badura, P., Kawa, J., Wieclawek, W. (eds.) Information Technology
in Biomedicine. AISC, vol. 1186, pp. 227–237. Springer, Cham (2021). https://doi.
org/10.1007/978-3-030-49666-1_18
3. Badura, A., Masłowska, A., Myśliwiec, A., Piętka, E.: Multimodal signal analysis
for pain recognition in physiotherapy using wavelet scattering transform. Sensors
21(4), 1311 (2021)
4. Benedek, M., Kaernbach, C.: Decomposition of skin conductance data by means
of nonnegative deconvolution. Psychophysiology 47(4), 647–658 (2010)
5. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
6. Cram, J.R., Steger, J.C.: EMG scanning in the diagnosis of chronic pain. Biofeed-
back Self-regul. 8(2), 229–241 (1983)
7. Greco, A., Marzi, C., Lanata, A., Scilingo, E.P., Vanello, N.: Combining electro-
dermal activity and speech analysis towards a more accurate emotion recognition
system. In: 2019 41st Annual International Conference of the IEEE Engineering in
Medicine and Biology Society (EMBC), pp. 229–232 (2019)
8. Greco, A., Valenza, G., Lanata, A., Scilingo, E.P., Citi, L.: cvxEDA: a convex
optimization approach to electrodermal activity processing. IEEE Trans. Biomed.
Eng. 63(4), 797–804 (2016)
9. Haque, M.A., et al.: Deep multimodal pain recognition: a database and comparison
of spatio-temporal visual modalities. In: 2018 13th IEEE International Conference
on Automatic Face & Gesture Recognition (FG 2018), pp. 250–257. IEEE (2018)
10. Hinduja, S., Canavan, S., Kaur, G.: Multimodal fusion of physiological signals and
facial action units for pain recognition. In: 2020 15th IEEE International Confer-
ence on Automatic Face and Gesture Recognition (FG 2020), pp. 577–581. IEEE
(2020)
11. Jones, M.J., Viola, P., et al.: Robust real-time object detection. In: Workshop on
statistical and computational theories of vision, vol. 266, p. 56 (2001)
12. Lim, H., Kim, B., Noh, G.J., Yoo, S.K.: A deep neural network-based pain classifier
using a photoplethysmography signal. Sensors 19(2), 384 (2019)
13. Lopez-Martinez, D., Peng, K., Lee, A., Borsook, D., Picard, R.: Pain detection with
FNIRS-measured brain signals: a personalized machine learning approach using the
wavelet transform and bayesian hierarchical modeling with dirichlet process pri-
ors. In: 2019 8th International Conference on Affective Computing and Intelligent
Interaction Workshops and Demos (ACIIW), pp. 304–309. IEEE (2019)
14. Lucey, P., Cohn, J.F., Prkachin, K.M., Solomon, P.E., Matthews, I.: Painful data:
The UNBC-McMaster shoulder pain expression archive database. In: 2011 IEEE
International Conference on Automatic Face & Gesture Recognition (FG), pp.
57–64. IEEE (2011)
15. Naeini, E.K., et al.: Pain recognition with electrocardiographic features in post-
operative patients: method validation study. J. Med. Internet Res. 23(5), e25079
(2021)
16. Salekin, M.S., Zamzmi, G., Goldgof, D., Kasturi, R., Ho, T., Sun, Y.: Multimodal
spatio-temporal deep learning approach for neonatal postoperative pain assess-
ment. Comput. Biol. Med. 129, 104150 (2021)
17. Shi, J., et al.: Good features to track. In: 1994 Proceedings of IEEE Conference on
Computer Vision and Pattern Recognition, pp. 8–10. IEEE (1994)
18. Terkelsen, A.J., Mølgaard, H., Hansen, J., Andersen, O.K., Jensen, T.S.: Acute
pain increases heart rate: differential mechanisms during rest and mental stress.
Auton. Neurosci. 121(1–2), 101–109 (2005)
19. Tomasi, C., Kanade, T.: Detection and tracking of point features. Int. J. Comput. Vis. 9, 137–154 (1991)
20. Velana, M., Gruss, S., Layher, G., Thiam, P., Zhang, Y., Schork, D., Kessler, V.,
Meudt, S., Neumann, H., Kim, J., Schwenker, F., André, E., Traue, H.C., Walter,
S.: The SenseEmotion database: a multimodal database for the development and
systematic validation of an automatic pain- and emotion-recognition system. In:
Schwenker, F., Scherer, S. (eds.) MPRSS 2016. LNCS (LNAI), vol. 10183, pp.
127–139. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59259-6_11
21. Walter, S., et al.: The biovid heat pain database data for the advancement and
systematic validation of an automated pain recognition system. In: 2013 IEEE
International Conference on Cybernetics (CYBCO), pp. 128–131. IEEE (2013)
22. Werner, P., Al-Hamadi, A., Limbrecht, K., Walter, S., Gruss, S., Traue, H.C.:
Automatic pain assessment with facial activity descriptors. IEEE Trans. Affect.
Comput. 8, 286–299 (2017). https://doi.org/10.1109/TAFFC.2016.2537327
23. Werner, P., Lopez-Martinez, D., Walter, S., Al-Hamadi, A., Gruss, S., Picard, R.:
Automatic recognition methods supporting pain assessment: a survey. IEEE Trans.
Affect. Comput. 1 (2019)
24. Williams, A.C.D.C.: Facial expression of pain: an evolutionary account. Behav.
Brain Sci. 25(4), 439–455 (2002)
25. Zamzmi, G., Pai, C.Y., Goldgof, D., Kasturi, R., Ashmeade, T., Sun, Y.: An app-
roach for automated multimodal analysis of infants’ pain. In: 2016 23rd Interna-
tional Conference on Pattern Recognition (ICPR), pp. 4148–4153. IEEE (2016)
26. Zhang, X., et al.: BP4D-Spontaneous: a high-resolution spontaneous 3D dynamic
facial expression database. Image Vis. Comput. 32(10), 692–706 (2014)
27. Zhi, R., Zhou, C., Yu, J., Li, T., Zamzmi, G.: Multimodal-based stream integrated
neural networks for pain assessment. IEICE Trans. Inf. Syst. 104(12), 2184–2194
(2021)
Electromyograph as a Tool for Patient
Feedback in the Field of Rehabilitation
and Targeted Muscle Training
1 Introduction
of the biosignal. Another aspect is the fact that this form can be complicated for a layperson to assess. However, for rehabilitation purposes or, e.g., targeted training of athletes, effective and fast real-time feedback is essential. Here, other forms of electromyograph output are useful, e.g., conversion to an acoustic signal or another, simplified and easier-to-perceive form of graphical output.
Biofeedback is a tool used for effective rehabilitation. The aim of biofeed-
back is to convey information to the patient in a simple way, for example about
their muscle activity. Our aim is to present a comprehensive aid for the field of
neurorehabilitation or for targeted training of athletes. Nowadays, good results
are achieved by aids that are equipped with an acoustic output. This type of
biofeedback has proven to be more beneficial and less stressful for patients than
visual and audiovisual biofeedback [13]. There are multiple ways to transform information from an EMG signal into sound; one presented approach is to assign a tone color to a given channel. Patients have been shown to be able to recognize which muscle is active based on the tone [8]. Acoustic output has
been shown to be beneficial for biofeedback in rehabilitation [16], but it does
not allow for simple quantification of EMG intensity. We attempt to overcome
this drawback in our work by introducing several options to present the elec-
trical activity of muscles with different levels of difficulty. The presentation of
the EMG signal is enabled by simple visualization as well as acoustic output.
By monitoring the set difficulty level, the patient receives information about
the progress and benefits of rehabilitation. This motivates the patient. We also
propose a possible muscle training that uses the EMG signal intensity levels to
trigger the tone of a musical instrument.
2 Methods Used
People after a severe injury, learning to move their musculoskeletal system again,
need simple feedback as to whether the efforts made to move the limb are having
a progressive impact. Conventional EMG devices need a screen displaying the
measured signal to visualize the measured record. It is often difficult to explain to
a layperson how to read the plotted record and what information it carries. This
information may be superfluous for feedback. It is important to find a simple way
to give the patient feedback on their muscle activity [1,2,4,6,11,14,15]. Feedback
should carry with it positive motivation and a vision of progress.
The new electromyograph solution extends the device with an audio output, simple visualization by lighting an LED, display of the EMG signal intensity on a cascade of LEDs, and sound playback when a set EMG signal intensity is exceeded (with the possibility of playing a fanfare or the sound of a musical instrument). Emphasis is placed on easy transmission of information about muscle activity to the patient. Motivation is also considered, to give the patient the strength to continue rehabilitation. The audio output may be useful for assessing the electrical activity of the muscles from a different perspective. In order to obtain the most faithful signal for the audio output, signal pre-processing must be implemented.
An integral, though unwanted, part of the measured signal are interfering signals called artifacts. In terms of origin, artifacts are divided into biological and technical. Biological artifacts originate from the patient's own activity: specifically, in EMG, the electrical activity of the surrounding muscles, as well as the motion artifact resulting from the mutual movement of the electrode and the electrolyte. One way to remove a biological artifact is to capture the biological activity itself separately and then subtract it from the signal. Another option is sophisticated systems based on data analysis and classification (fuzzy systems, machine learning, artificial intelligence). Technical artifacts are signals that do not originate from the patient's own activity. A typical example is mains hum, which can significantly distort the signal under examination due to capacitive coupling. Other technical artifacts are the different half-cell voltages on the electrodes, drying of the electroconductive gel, and thermal and contact noise of the components [3,10,12,17]. In the design of the EMG signal measurement chain, artifacts must be eliminated as far as possible.
Biosignal pre-processing can be divided into several sub-blocks (Fig. 1). As the input pre-amplifier, an instrumentation amplifier is most often used; its high input impedance makes it suitable for sensing biological signals. Next, a high-pass filter removes the DC component together with the artifact caused by the different half-cell potentials at the electrodes. An isolation amplifier based on capacitive, inductive, or optical coupling provides galvanic isolation to ensure patient safety in the event of a failure. A low-pass filter removes high-frequency interference. A band-stop (notch) filter is used for analogue removal of mains interference. If it is not necessary to visualize the signal in analogue form, mains interference can be removed digitally using the latest algorithms [7]. A variable-gain amplifier provides the final amplification of the pre-processed signal [2,5,10].
The interpretation of the EMG signal in this case means a simple visualisation of the signal, either optically or acoustically. For the creation of the acoustic output, well-known circuitry such as the LM386 or a voltage-to-frequency converter (e.g., CD4046) can be used. LED illumination can be realized, for example, by wiring a transistor as a follower. The visualization of the EMG signal intensity on the LED cascade and the playback of a sound when a certain EMG signal intensity is exceeded are implemented digitally. The level of EMG signal visualization can be easily changed by the variable-gain amplifier (the penultimate block in the biosignal pre-processing).
The filters are implemented as active second-order Sallen-Key stages. These filters exhibit a –6 dB attenuation at the cutoff frequency followed by a –40 dB/dec roll-off. This is a reasonable compromise between overshoot (higher-order filters) and insufficient attenuation of unwanted frequencies. The cutoff frequencies are chosen according to the frequency ranges given above: since the high-pass filter removes the low-frequency artifacts, its cutoff is 23 Hz, while the low-pass cutoff is 498 Hz. The cutoff frequency \(f_{\text{cut-off}}\) is calculated using Formula (2):
\[ f_{\text{cut-off}} = \frac{1}{2\pi R C} \qquad (2) \]
where R and C are the selected component values for the filter.
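As a minimal numeric check of Formula (2), the resistor values implied by the two cutoff frequencies can be computed in Python; the 100 nF capacitor is an illustrative assumption, not a component value taken from the design.

import math

def cutoff_frequency(r_ohm, c_farad):
    # Formula (2): f = 1 / (2*pi*R*C)
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)

def required_resistance(f_cutoff_hz, c_farad):
    # Formula (2) rearranged for R: R = 1 / (2*pi*f*C)
    return 1.0 / (2.0 * math.pi * f_cutoff_hz * c_farad)

C = 100e-9                              # assumed 100 nF capacitor
print(required_resistance(23.0, C))     # ~69.2 kOhm for the 23 Hz high-pass
print(required_resistance(498.0, C))    # ~3.2 kOhm for the 498 Hz low-pass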
The notch filter for removing mains interference is also an active second-order design; its centre frequency is calculated according to Formula (2). The resulting frequency response is shown in Fig. 2. The finished device offers six options for presenting the measured signal:
1. Audio output.
2. Visualisation by LED lighting.
3. Visualisation of the EMG signal intensity on a cascade of LEDs.
4. Audio playback after exceeding a certain EMG signal intensity.
5. EMG signal measurement (PC and ready program for this device required).
6. Saving EMG signal to SD card (for later analysis).
The first four options are designed for simple feedback during rehabilitation.
The last two options are for medical purposes and for subsequent analysis of the
measured EMG. The audio output is realized by wiring the LM386 as an audio
amplifier.
For sound playback, a threshold is initially set; when the EMG signal intensity exceeds it, the selected sound is played. The audio for playback is stored on the SD card. A motivating sound (a fanfare) can be chosen; another option is to play the tone of a musical instrument. The patient can thus play one-tone melodies, and by connecting other devices with different tones, it is possible to play a melody of several tones. By playing, the patient exercises and stimulates the muscles, which has a positive impact on the outcome of rehabilitation. For this exercise, the patient is presented with sheet music (Fig. 5).
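A digital realization of this threshold logic might look like the following sketch: the EMG intensity is estimated with a moving RMS envelope, and each upward crossing of the set threshold would trigger playback of the selected sound. The window length and function names are illustrative assumptions.

import numpy as np

def moving_rms(emg, fs, window_s=0.125):
    # Smoothed EMG intensity: moving RMS over a short window.
    n = max(1, int(window_s * fs))
    mean_square = np.convolve(emg ** 2, np.ones(n) / n, mode="same")
    return np.sqrt(mean_square)

def threshold_crossings(envelope, threshold):
    # Indices where the intensity rises above the threshold; each
    # such crossing would start playback of the sound from the SD card.
    above = envelope >= threshold
    return np.flatnonzero(~above[:-1] & above[1:]) + 1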
Fig. 6. Block diagram of the implemented pre-processing of EMG signal and its possible
visualization
threshold is crossed. In the top middle there is a difficulty setting from level 1 to level 6, where level 1 amplifies the signal 1281 times and level 6 amplifies it 218 times. As the difficulty level increases, the gain decreases, so the patient has to exert more effort to present the EMG signal. On the top right side there is a visual presentation of the EMG signal, i.e. the LED lighting and the LED cascade. A note holder is located in the top. On the right side there is an electrode input in the form of a 3.5 mm jack connector. On the back there is an SD card slot and a connector for connecting the bmeng DAU unit. At the bottom there is a speaker.
6 Discussion
The prototype was tested for functionality on ten volunteers. With the difficulty level 1 setting, it could be observed that the presentation of the EMG signal was activated at low effort. In contrast, at the level 6 setting, increased effort had to be exerted to achieve the desired biofeedback. A shortcoming of the device is its limitation to only one channel. For future work, the inclusion of at least two channels is considered, which will open up possibilities to include other biofeedback options in the form of games. Future work will also compare the different biofeedback modes in terms of their benefit for faster recovery. The different presentations of the EMG signal will be compared in terms of concentricity. Furthermore, the time spent with each biofeedback will be calculated to evaluate the user's most used EMG signal presentation. It is also planned to connect the device to a phone, where the EMG signal level will control a simple game.
7 Conclusion
The main aim of the work was to create and present a new prototype with
several possibilities of EMG signal presentation for the general public for use
in rehabilitation or for targeted training of athletes. The problem of analogue
pre-processing of biosignals is discussed in detail. The analogue pre-processing takes the EMG signal parameters into account (for setting the filter cutoff frequencies and sufficient amplification). The device is powered by a galvani-
cally isolated A/D converter bmeng DAU unit. For this reason, galvanic isolation
on the board is not implemented. Four options for simple EMG signal presenta-
tion are considered. The first option is an acoustic output driven directly by the EMG signal; the resulting sound resembles the dry crackling of wood. As the intensity of the EMG signal increases, the volume of the audio output increases (Fig. 3). The second option is to visualize the intensity of the EMG signal on an LED: by clenching the muscle, the LED lights up, and at higher EMG signal intensities the LED glows brighter. The other two visualization options are handled by the microcontroller. The third option displays the EMG signal intensity on the LED cascade (Fig. 4). The last option presents the EMG signal as the playback of a sound: when the set level of the EMG signal is exceeded, the selected sound is played; it can be a fanfare or, for example, the tone of
References
1. Cases, C.M.P., Baldovino, R.G., Manguerra, M.V., Dupo, V.B., Dajay, R.C.R.,
Bugtai, N.T.: An EMG-based gesture recognition for active-assistive rehabilita-
tion. In: 2020 IEEE 12th International Conference on Humanoid, Nanotechnology,
Information Technology, Communication and Control, Environment, and Manage-
ment (HNICEM), pp. 1–6. IEEE (2020)
2. Chan, B., Saad, I., Bolong, N., Siew, K.E.: A review of surface EMG in clinical
rehabilitation care systems design. In: 2021 IEEE 19th Student Conference on
Research and Development (SCOReD), pp. 371–376. IEEE (2021)
3. Darak, B.S., Hambarde, S.: A review of techniques for extraction of cardiac artifacts
in surface EMG signals and results for simulation of ECG-EMG mixture signal.
In: 2015 International Conference on Pervasive Computing (ICPC), pp. 1–5. IEEE
(2015)
4. Gotuzzo, J., Vu, S., Dee, S., George, K.: Electromyography based orthotic arm and
finger rehabilitation system. In: 2018 IEEE International Conference on Healthcare
Informatics (ICHI), pp. 338–339. IEEE (2018)
5. Kieliba, P., Tropea, P., Pirondini, E., Coscia, M., Micera, S., Artoni, F.: How are
muscle synergies affected by electromyography pre-processing? IEEE Trans. Neural
Syst. Rehabil. Eng. 26(4), 882–893 (2018)
6. Kim, Y.H., Kim, S.J., Shim, H.M., Lee, S.M., Kim, K.S.: A method for gait reha-
bilitation training using EMG fatigue analysis. In: 2013 International Conference
on ICT Convergence (ICTC), pp. 52–55. IEEE (2013)
7. Malboubi, M., Razzazi, F., Sh, M.A., Davari, A.: Power line noise elimination
from EMG signals using adaptive laguerre filter with fuzzy step size. In: 2010 17th
Iranian Conference of Biomedical Engineering (ICBME), pp. 1–4. IEEE (2010)
8. Matsubara, M., Terasawa, H., Kadone, H., Suzuki, K., Makino, S.: Sonification of
muscular activity in human movements using the temporal patterns in EMG. In:
Proceedings of the 2012 Asia Pacific Signal and Information Processing Association
Annual Summit and Conference, pp. 1–5. IEEE (2012)
9. Merletti, R., Farina, D.: Surface Electromyography: Physiology, Engineering, and
Applications. Wiley, Hoboken (2016)
10. Merletti, R., Parker, P.J.: Electromyography: Physiology, Engineering, and Non-
invasive Applications, vol. 11. Wiley, Hoboken (2004)
11. Poonsiri, J., Charoensuk, W.: Surface EMG based controller design for knee reha-
bilitation devices. In: The 4th 2011 Biomedical Engineering International Confer-
ence, pp. 131–134. IEEE (2012)
12. Preston, D.C., Shapiro, B.E.: Electromyography and Neuromuscular Disorders: Clinical-Electrophysiologic Correlations. Elsevier Health Sciences (2012)
13. Rastogi, R., et al.: Which one is best: electromyography biofeedback, efficacy analysis on audio, visual and audio-visual modes for chronic TTH on different characteristics. Int. J. Comput. Intell. IoT 1(1), 25–31 (2018)
14. Sheng, G., Wang, L., Ma, D., Fan, F., Niu, H.: The design of a rehabilitation train-
ing system with EMG feedback. In: 2012 International Conference on Biomedical
Engineering and Biotechnology, pp. 917–920. IEEE (2012)
15. Suhaimi, R., et al.: Analysis of EMG-based muscles activity for stroke rehabili-
tation. In: 2014 2nd International Conference on Electronic Design (ICED), pp.
167–170. IEEE (2014)
16. Tsubouchi, Y., Suzuki, K.: Biotones: a wearable device for EMG auditory biofeed-
back. In: 2010 Annual International Conference of the IEEE Engineering in
Medicine and Biology, pp. 6543–6546. IEEE (2010)
17. Weiss, J.M., Weiss, L.D., Silver, J.K.: Easy EMG: A Guide to Performing Nerve Conduction Studies and Electromyography. Elsevier Health Sciences, Philadelphia (2022)
18. Winter, D.A.: Biomechanics and Motor Control of Human Movement. Wiley, Hobo-
ken (2009)
Touchless Pulse Diagnostics Methods
and Devices: A Review
1 Introduction
The necessity for the continuous measurement of human vital signs has existed for as long as medicine itself. An important aspect of such measurements is not only the ability to react quickly in the event of a life- or health-threatening situation, but also to analyse the behaviour of the human body under certain conditions and contribute to the development of knowledge.
The acquisition methods can be divided into invasive and noninvasive. Examples of invasive measurements are vascular and cardiac catheterization [1], amniocentesis [2], and biopsy [3]. Although these methods are accurate, they involve patient stress [2], need to be performed under appropriate, preferably sterile conditions, and require the supervision of qualified personnel during the measurement.
In this article we address noninvasive measurements. They are an increasingly competitive alternative to invasive testing. When performing noninvasive measurements, the skin is not interrupted, and the achieved results allow for a correct diagnosis [4]. Among noninvasive methods, we focus on touchless data acquisition due to its possible distant use, easier setup, and lack of hygienic issues. The parameters used to make such a measurement are the optical and mechanical properties of the patient's body.
Fig. 1. Example of the pulse wave flow recording (the arrow indicates the dicrotic
waveform) [8]
2.2 Videoplethysmography
Videoplethysmographic (VPG) measurement is usually performed with a commercially available camera [10,11]. The VPG method is used to detect heart rate and to look for cardiac disorders (e.g. atrial fibrillation [12]), and can collect the data needed for heart rate variability analysis [13]. Studies have also obtained blood pressure information from the video signal [14].
To perform a usable VPG measurement, it is important to choose the appropriate lighting [15]. According to [16], the most favorable results are obtained for warm light colours. Because of the difference in absorption of the incident light, the skin tone must also be considered. In the RGB colour space, for people with lighter skin colour the green component of the signal shows higher variability, whereas for people with darker skin colour the red component was found better to analyze [17].
The place most commonly chosen for measurements is the face, due to its availability and the good quality of the signal obtained from it. However, there are also studies performed with footage taken from other places; tests were carried out on the chest [16] and neck [18].
The VPG method has some limitations related to the use of visible light. The presence of shadows on the test object, inadequate illumination, and motion (of the camera or the examined person) interfere with the results, so that the analysed signal becomes nondiagnostic [19].
In the VPG method, during or after a correctly performed video recording, the program should determine a skin region (e.g. using the Viola-Jones algorithm or Haar cascades) [20]. The next step is to split the signal contained in the area of interest into chromatic components [11] or to convert it to grayscale [15]. The analysis of the obtained video is based on the principal component analysis algorithm [19], the independent component analysis algorithm [11], or an autoregressive model [21]. After correct acquisition and processing, the signal should be characterized by a frequency variability adequate to the pulse rate of the examined person.
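A minimal Python sketch of this processing chain, using OpenCV's stock Haar cascade for the skin region and a spectral estimate of the pulse rate from the green channel, could look as follows. The frequency band and the handling of frames without a detected face are simplifying assumptions, not the procedure of the cited studies.

import cv2
import numpy as np

def estimate_heart_rate_bpm(frames, fps):
    # Detect the face in each frame, average the green channel over the
    # ROI, and take the dominant frequency in the physiological band.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    trace = []
    for frame in frames:                      # frames: BGR images
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, 1.3, 5)
        if len(faces) == 0:
            continue                          # a real system would interpolate
        x, y, w, h = faces[0]
        trace.append(frame[y:y + h, x:x + w, 1].mean())  # 1 = green in BGR
    trace = np.asarray(trace) - np.mean(trace)
    spectrum = np.abs(np.fft.rfft(trace))
    freqs = np.fft.rfftfreq(len(trace), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 4.0)    # 42-240 beats per minute
    return 60.0 * freqs[band][np.argmax(spectrum[band])]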
Table 2 summarizes the measurement errors obtained with the VPG method relative to the reference method.
4 Discussion
In this article we analyzed some touchless measuring methods and devices focusing on blood pulse. There are several other ways to collect data on human vital signs noninvasively (e.g. augmented reality, intelligent virtual agents [46]), and they can be a topic for further research.
We focused on PPG, VPG, laser, and radar measurements. The main advantages of each are the possibility of continuous and noninvasive (or even contactless) measurement. The best results were reached for PPG – the error in heart rate estimation was 0.14% [9] and 0.76% [7]. The biggest errors were found in VPG measurements: 3.63% [11] and 9.52% [11]. This can be attributed to the longer history of PPG use in medicine (the common use of pulse oximeters in emergency medicine and in hospitals). VPG has existed for a much shorter time; moreover, some of its methods are designed for distant measurements (up to 10 m). Laser measurement was found to be the least mature of this group. Moreover, it uses potentially harmful light, requires patient immobility for a considerable amount of time, and offers poor accuracy. In radar measurements, the respiration rate was estimated with an error at the level of 0.67% [28], and the heart rate with errors of 1.5% [25] and 5% [29], which is fair for home care.
References
1. Mazurek, T.: Wskazania diagnostyczne do cewnikowania jam serca, zasady zabiegu, pp. 202–205. Via Medica, Gdańsk (2013). (in Polish)
2. Siedlecka, A., Ciach, K., Świątkowska-Freund, M., Preis, K.: Fear related to amniocentesis as a method of invasive prenatal diagnosis. GinPolMedProject 4(18), 38–43 (2010)
3. Pawełczyk, K., Marciniak, M., Kołodziej, J.: Invasive diagnostics of thoracic malignant diseases. Adv. Clin. Exp. Med. 13(6), 1067–1072 (2004)
4. Swora, E., Stankowiak-Kulpa, H., Marcinkowska, E., Grzymisławski, M.: Clinical aspects of diagnostics in Helicobacter pylori infection. Nowiny Lekarskie 78(3–4), 228–230 (2009)
5. Castaneda, D., Esparza, A., Ghamari, M., Soltanpur, C., Nazeran, H.: A review on wearable photoplethysmography sensors and their potential future applications in health care. Int. J. Biosens. Bioelectron. 4(4), 195–202 (2018)
6. Celka, P., Charlton, P.H., Farukh, B., Chowienczyk, P., Alastruey, J.: Influence
of mental stress on the pulse wave features of photoplethysmograms. Healthcare
Technol. Lett. 7(1), 7–12 (2020)
7. Hong, S., Park, K.S.: Unobtrusive photoplethysmographic monitoring under the
foot sole while in a standing posture. Sensors 18, 3239 (2018)
8. Prokop, D.: Zastosowanie wieloczujnikowego optoelektronicznego systemu pomiarowego do badania przebiegów fali tętna (2017). (in Polish)
9. Nabeel, P.M., Jayaraj, J., Mohansankar, S.: Single-source PPG based local pulse
wave velocity measurement: a potential cuffless blood pressure estimation tech-
nique. Physiol. Meas. 38(12), 2122–2140 (2017)
10. Poh, M.-Z., McDuff, D.J., Picard, R.W.: Advancements in noncontact, multipa-
rameter physiological measurements using a webcam. IEEE Trans. Biomed. Eng.
58(1), 7–11 (2011)
11. Poh, M.-Z., McDuff, D.J., Picard, R.W.: Non-contact, automated cardiac pulse
measurements using video imaging and blind source separation. Opt. Exp. 18(10),
10762–10774 (2010)
12. Couderc, J.-P., et al.: Detection of atrial fibrillation using contactless facial video
monitoring. Heart Rhythm 12(1), 195–201 (2015)
13. Couderc, J.-P., et al.: Pulse harmonic strength of facial video signal for the detec-
tion of atrial fibrillation. Comput. Cardiol. 41, 661–664 (2014)
14. Sugita, N., et al.: Estimation of absolute blood pressure using video images cap-
tured at different heights from the heart. In: 2019 41st Annual International Con-
ference of the IEEE Engineering in Medicine and Biology Society (EMBC) (2019)
15. Przybyło, J., Kańoch, E., Jabłoński, M., Augustyniak, P.: Distant measurements
of plethysmographic signal in various lighting conditions using configurable frame-
rate camera. Metrol. Meas. Syst. 23(4), 579–592 (2016)
16. Mędrala, R., Augustyniak, P.: Taking Videoplethysmographic Measurements at
Alternative Parts of the Body - Pilot Study, PCBBE (2019)
17. Królak, A.: Influence of skin tone on efficiency of vision-based heart rate estimation.
In: Augustyniak, P., Maniewski, R., Tadeusiewicz, R. (eds.) PCBBE 2017. AISC,
vol. 647, pp. 44–55. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-
66905-2_4
18. Nabeel, P.M., Jayaraj, J., Mohanasankar, S.: Single-source PPG based local pulse wave velocity measurement: a potential cuffless blood pressure estimation technique. Inst. Phys. Eng. Med. 38(12), 2122–2140 (2017)
19. Al-Naji, A., Perera, A.G., Chahl, J.: Remote monitoring of cardiorespiratory sig-
nals from a hovering unmanned aerial vehicle. BioMed. Eng. OnLine 16, 101 (2017).
https://doi.org/10.1186/s12938-017-0395-y
20. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple
features. In: Proceedings of IEEE Conference on Computer Vision and Pattern
Recognition, p. 511. IEEE (2001)
21. Przybyło, J.: Continuous distant measurement of the user’s heart rate in human-
computer interaction applications. Sensors 19, 4205 (2019)
22. Wu, J.H., Chang, R.S., Jiang, J.A.: A novel pulse measurement system by using
laser triangulation and a CMOS image sensor. Sensors 7(12), 3366–3385 (2007).
https://doi.org/10.3390/s7123366
23. Antognoli, L., Moccia, S., Migliorelli, L., Casaccia, S., Scalise, L., Frontoni, E.:
Heartbeat detection by laser doppler vibrometry and machine learning. Sensors.
20(18), 5362 (2020)
24. Lin, J.C.: Noninvasive microwave measurement of respiration. Proc. IEEE 63(10), 1530–1530 (1975)
25. Ren, L., et al.: Phase based methods for heart rate detection using UWB impulse
doppler radar. IEEE Trans. Microwave Theor. Tech. 64(10), 3319–3331 (2016)
26. Rong, Y., Herschfelt, A., Holtom, J., Bliss, D.W.: Cardiac and respiratory sensing
from a hovering UAV radar platform. In: 2021 IEEE Statistical Signal Processing
Workshop (2021)
27. Abdulatif, S., et al.: Power-based real-time respiration monitoring using FMCW
radar. Comput. Sci. Eng. (2017)
28. Regev, N., Wulich, D.: Radar-based, simultaneous human presence detection and
breathing rate estimation. Sensors 21, 3529 (2021)
29. Michahelles, F., Wicki, R., Schiele, B.: Less contact: heart-rate detection without
even touching the user. In: Eighth International Symposium on Wearable Com-
puters (2004)
30. Ravichandran, R., et al.: WiBreathe: estimating respiration rate using wireless signals in natural settings in the home. In: 2015 IEEE International Conference on Pervasive Computing and Communications (2015)
31. Jasiński, Ł.: Pomiar tłumienia ścian i innych elementów charakterystycznych dla środowiska wewnątrzbudynkowego w paśmie 2,4 GHz. www.alvarus.org (2011). (in Polish)
32. Liu, J., et al.: Recent progress in flexible wearable sensors for vital sign monitoring.
Sensors 20, 4009 (2020)
33. Qiu, S., Wang, Z., Zhao, H., Hu, H.: Using distributed wearable sensors to measure
and evaluate human lower limbs motion. IEEE Trans. Instrum. Measur. 65(4),
939–950 (2016)
34. Weich, C., Vieten, M.M.: The Gaitprint: identifying individuals by their running
style. Sensors 20, 3810 (2020)
35. Petersen, J., Austin, D., Sack, R., Hayes, T.L.: Actigraphy-based scratch detection
using logistic regression. IEEE J. Biomed. Health Inf. 17(2), 277–283 (2013)
36. Zhang, P., Zhang, Z., Chao, H.-C.: A stacked human activity recognition model
based on parallel recurrent network and time series evidence theory. Sensors 20,
4016 (2020)
37. Pitou, S., Michael, B., Thompson, K., Howard, M.: Hand-Made embroidered elec-
tromyography: towards a solution for low-income countries. Sensors 20, 3347 (2020)
38. Chen, Z., Zhu, Q., Soh, Y.C., Zhang, L.: Robust human activity recognition using smartphone sensors via CT-PCA and online SVM. IEEE Trans. Ind. Inf. 13(6), 3070–3080 (2017)
39. Huang, S.-J., Wu, C.-J., Chen, C.-C.: Pattern recognition of human postures using
the data density functional method. Appl. Sci. 8, 1615 (2018)
40. Hossain, T., Ahad, A.R., Inoue, S.: A method for sensor-based activity recognition
in missing data scenario. Sensors 20, 3811 (2020)
41. Horn, B.K.P.: Observation model for indoor positioning. Sensors 20, 4027 (2020)
42. Kańtoch, E.: Recognition of sedentary behaviour by machine learning analysis
of wearable sensors during activities of daily living for telemedical assessment of
cardiovascular risk. Sensors 18, 3219 (2018)
43. Zapata, J., Fernández-Luque, F.J., Ruiz, R.: Wireless sensor network for ambient
assisted living, December 2010. ISBN 978-953-307-321-7, https://doi.org/10.5772/
13005
44. Zhang, J., Xue, N., Huang, X.: A secure system for pervasive social network-based
healthcare. IEEE Access 4, 9239–9250 (2016)
45. Chen, M., Zhang, Y., Li, Y., Hassan, M.M., Alamri, A.: AIWAC: affective inter-
action through wearable computing and cloud technology. IEEE Wirel. Commun.
22(1), 20–27 (2015)
46. Norouzi, N., Bruder, G., Belna, B., Mutter, S., Turgut, D., Welch, G.: A systematic
review of the convergence of augmented reality, intelligent virtual agents, and the
internet of things. In: Al-Turjman, F. (ed.) Artificial Intelligence in IoT. TCSCI,
pp. 1–24. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-04110-6_1
Methods of Functional Assessment
of the Temporomandibular Joints –
Systematic Review
1 Introduction
The temporomandibular joint (TMJ) is a paired joint in the human head that allows for a movable connection between the skull and the mandible and performs complex movements such as abduction and adduction of the jaws, protrusion and retraction of the mandible, and laterotrusion. Consequently, this allows for crushing and grinding of food, speaking and breathing [1]. Temporo-
mandibular joint dysfunctions (TMD) comprise a large group of disease enti-
ties, distinct in etiology, symptoms, and subsequent treatment modalities, often
accompanied by pain and reduced mobility [2]. TMJ disorders with accompa-
nying symptoms are a common problem. According to statistics, it is estimated
that 60 to 80% of the population is affected by TMJ disorders [3]. One example
of TMD is functional impairment, which can be caused by pathologies in TMJ
structure or tooth development and leads to abnormalities in the functioning
of the muscles responsible for mandibular movements in all three planes [4–7].
The necessity to evaluate the TMJs, whether resulting from functional disorders
(malocclusion, bruxism, ankylosis) or conditions after surgical procedures and
the correct diagnosis, is essential in many areas of science.
With the ubiquitous and readily available technology, there are many meth-
ods for functional assessment of TMJs, ranging from a standard ruler or caliper
measurements to advanced magnetic resonance imaging techniques [8–31]. Range
of motion (ROM) measures with manual devices are always subjective and
depend on many factors, such as the testing protocol, the examiner’s experience,
or the model of the equipment used [8–16]. Imaging techniques, repeatedly pre-
sented in the literature as methods to assess TMJ, are expensive and difficult to
access [32]. Solutions based on vibroacoustic are an interesting diagnostic alter-
native, but it should be remembered that the TMJ uses many muscles, which
may disturb the recorded signals [21–27]. It is essential to search for solutions
that allow objective assessment of the temporomandibular joint while minimiz-
ing the influence of both internal and external factors.
Therefore, based on authors’ knowledge and the experience, it was decided
that a systematic review was necessary to search for methods of functional assess-
ment of the TMJ using both, standard undemanding techniques and novel tech-
nological solutions.
2 Research Methods
2.1 Knowledge Sources and Search Strategy
The review of literature focused on the search for methods used for functional
assessment of the TMJ. The team of authors carried out this task in the period
from September to November 2021. For this purpose, databases of scientific publications were searched; the inclusion criteria were as follows:
– the subject matter of the research concerned the categories listed above,
– the article was a scientific report and not a review,
– the type of paper was important: only clinical research, preliminary research,
individual case studies, and the latest reports on the problem discussed were
taken into account,
– the paper was written in English or Polish,
– the papers should have been published not earlier than 1 January 2011, except
for those on established rehabilitation techniques and major reports in the
field.
Throughout the search process, more than 18 000 articles were found considering
the presented topic. However, only 23 met the inclusion criteria and were further
analysed. The diagram below shows the successive criteria for including articles
in further analysis (Fig. 1), where n denotes the number of articles selected at the given stage.
The variety of aspects that should be analyzed during the functional assessment
of the temporomandibular joint is large. The ROM in the sagittal plane and
the range of lateral movement in the frontal plane using calipers were most
often analyzed in the available literature [8–17]. In addition, the strength of the
masticatory muscles was often checked using electromyography [12,18–20]. The
analysis results, including the essential elements of the research protocol and the
results obtained, in the context of the functional assessment of the TMJs using
traditional methods, are presented in the table below (Table 1).
Fig. 1. Flow diagram containing criteria for excluding articles from further analysis
focus was on the variety of methods presented and the potential uses of these
methods – the methods used, the equipment used, and the parameters analyzed
are listed.
5 Discussion
Nowadays, the world is struggling with many diseases of civilization. Scientific reports indicate that TMJ pathologies are becoming one of them, as they affect an increasing number of people [31,33]. Temporomandibular joint
dysfunctions comprise a large group of conditions that vary in etiology, symp-
toms, and subsequent treatment options, often accompanied by pain and reduced
mobility [2]. The correct diagnosis of TMJs is a challenge posed by many research
centers. There are no simple, objective methods for functional assessment of
TMJs without highly specialized equipment. For diagnostic purposes, the range
of mandibular mobility in different planes has been the most frequently used
parameter to represent the TMJ status, and it has usually been measured with
a ruler, caliper, or goniometer [8–16]. Depending on the established testing proto-
col, this parameter in the sagittal plane was defined as the distance between the
incisors during maximum mouth opening [10,13], while in the frontal plane (maximum shift of the mandible to both sides) it was defined as the distance between the midlines of the upper and lower teeth [9,17]. Another variable repeatedly analyzed
in the literature for assessing TMJs is the maximum occlusal force measured
with dedicated dynamometers or occlusiography, which allows the evaluation of
masticatory muscle strength [10,12,31]. A technique also used to directly assess
the muscles responsible for TMJs mobility was recording an electromyographic
signal from multiple channels [12,18,19]. Indirect methods of muscle force anal-
ysis included recording thermal images to show the degree of heating of indi-
vidual regions of interest (ROI) or the symmetry of muscle work between the
right and left sides of the face [29,30]. These tools allowed the recorded muscle activity to be analysed and presented.
6 Conclusion
Despite the weaknesses of the proposed solutions, the application of each of the above-mentioned techniques makes it possible to evaluate the TMJ. The conclusions in the available literature prove the necessity of addressing the topic of functional assessment of the TMJ, of knowledge acquisition by specialists in this subject area, and of the search for objective solutions, all with the aim of improving the quality and speed of TMD diagnosis.
References
1. Herring, S.W.: TMJ anatomy and animal models. J. Musculoskelet. Neuronal Inter-
act. 3(4), 391 (2003)
2. Osiewicz, M.A., Lobbezoo, F., Loster, B.W., Wilkosz, M., Naeije, M., Ohrbach,
R.: Research Diagnostic Criteria for Temporomandibular Disorders (RDC/TMD):
the Polish version of a dual-axis system for the diagnosis of TMD. (RDC/TMD)
form. J. Stomatology 66, 576–649 (2013)
3. Kapandji, A.I.: Anatomia funkcjonalna stawów. Kręgoslup i glowa 3, 216–217
(2013). (in Polish)
4. Ivkovic, N., Racic, M.: Structural and functional disorders of the temporomandibu-
lar joint (Internal disorders). Maxillofacial Surgery and Craniofacial Deformity-
practices and Updates (2018)
5. Lipowicz, A., et al.: Evaluation of mandibular growth and symmetry in child with
congenital zygomatic-coronoid ankylosis. Symmetry 13(9), 1634 (2021)
6. Dowgierd, K., Pokrowiecki, R., Borowiec, M., Kozakiewicz, M., Smyczek, D.,
Krakowczyk, Ł: A protocol for the use of a combined microvascular free flap
with custom-made 3D-printed total temporomandibular joint (TMJ) prosthesis
for mandible reconstruction in children. Appl. Sci. 11(5), 2176 (2021)
7. Kulesa-Mrowiecka, M., Pihut, M., Słojewska, K., Sułko, J.: Temporomandibular
joint and cervical spine mobility assessment in the prevention of temporomandibu-
lar disorders in children with osteogenesis imperfecta: a pilot study. Int. J. Environ.
Res. Public Health 18(3), 1076 (2021)
8. Kołodziejczyk, P., Kuciel-Lewandowska, J., Paprocka-Borowicz, M., Jarząb, S.,
Aleksandrowicz, K.: A photographic method of recording movement trajectory to
evaluate the effectiveness of physiotherapy in temporomandibular joint dysfunc-
tions - a preliminary study. Adv. Clin. Exp. Med. 20(1), 79–85 (2011)
9. Bae, Y.: Change the myofascial pain and range of motion of the temporomandibular
joint following kinesio taping of latent myofascial trigger points in the sternoclei-
domastoid muscle. J. Phys. Ther. Sci. 26(9), 1321–1324 (2014)
10. Shaffer, S.M., Brismee, J.M., Sizer, P.S., Courtney, C.A.: Temporomandibular dis-
orders. Part 1: anatomy and examination diagnosis. J. Manual Manipulative Ther-
apy 22(1), 2–12 (2014)
11. Davoudi, A., Haghighat, A., Rybalov, O., Shadmehr, E., Hatami, A.: Investigating
activity of masticatory muscles in patients with hypermobile temporomandibular
joints by using EMG. J. Clin. Exp. Dent. 7(2), e310 (2015)
12. Spagnol, G., Palinkas, M., Regalo, S.C.H., de Vasconcelos, P.B., Sverzut, C.E.,
Trivellato, A.E.: Impact of midface and upper face fracture on bite force, mandibu-
lar mobility, and electromyographic activity. Int. J. Oral Maxillofac. Surg. 45(11),
1424–1429 (2016)
13. Mazzetto, M.O., Anacleto, M.A., Rodrigues, C.A., Bragança, R.M.F., Paiva, G.,
Valencise Magri, L.: Comparison of mandibular movements in TMD by means of
a 3D ultrasonic system and digital caliper rule. Cranio 35(1), 46–51 (2017)
14. Akgol, A.C., Saldiran, T.C., Tascilar, L.N., Okudan, B., Aydin, G., Rezaei, D.A.:
Temporomandibular joint dysfunction in adults: its relation to pain, general joint
hypermobility, and head posture. Int. J. Health Allied Sci. 8(1), 38 (2019)
15. Alonso-Royo, R., et al.: Validity and reliability of the helkimo clinical dysfunction
index for the diagnosis of temporomandibular disorders. Diagnostics 11(3), 472
(2021)
16. Kulesa-Mrowiecka, M., Piech, J., Gaździk, T.S.: The effectiveness of physical ther-
apy in patients with generalized joint hypermobility and concurrent temporo-
mandibular disorders - a cross-sectional study. J. Clin. Med. 10(17), 3808 (2021)
17. Campos López, A., De-Miguel, E.E., Malo-Urriés, M., Acedo, T.C.: Mouth open-
ing, jaw disability, neck disability, pressure pain thresholds, and myofascial trigger
points in patients with disc displacement with reduction: a descriptive and com-
parative study. In: CRANIO, pp. 1–7 (2021)
18. dos Santos Berni, K.C., Dibai-Filho, A.V., Pires, P.F., Rodrigues-Bigaton, D.:
Accuracy of the surface electromyography RMS processing for the diagnosis of
myogenous temporomandibular disorder. J. Electromyogr. Kinesiol. 25(4), 596–
602 (2015)
19. Woźniak, K., Lipski, M., Lichota, D., Szyszka-Sommerfeld, L.: Muscle fatigue in the
temporal and masseter muscles in patients with temporomandibular dysfunction.
BioMed Research International (2015)
20. Amrulloevich, G.S., Mirjonovich, A.O.: Clinical features of diagnostics and their
defenses in patients with dysfunction of the high-mandibular joint without pathol-
ogy, inflammatory-dystrophic origin. Int. J. Progressive Sci. Technol. 22(2), 36–43
(2020)
21. Sharma, S., Crow, H.C., Kartha, K., McCall, W.D., Gonzalez, Y.M.: Reliability
and diagnostic validity of a joint vibration analysis device. BMC Oral Health 17(1),
1–7 (2017)
22. Baldini, A., Nota, A., Tecco, S., Ballanti, F., Cozza, P.: Influence of the mandibular
position on the active cervical range of motion of healthy subjects analyzed using
an accelerometer. Cranio 36(1), 29–34 (2018)
23. Whittingslow, D.C., Orlandic, L., Gergely, T., Prahalad, S., Inan, O.T., Abramow-
icz, S.: Acoustic emissions of the temporomandibular joint in children: proof of
concept. Frontiers of Oral and Maxillofacial Medicine, 2 (2020)
24. Widmalm, S.E., Dong, Y., Li, B.X., Lin, M., Fan, L.J., Deng, S.M.: Unbalanced
lateral mandibular deviation associated with TMJ sound as a sign in TMJ disc
dysfunction diagnosis. J. Oral Rehabil. 43(12), 911–920 (2016)
25. Łyżwa, P., Kłaczyński, M., Kazana, P.: Vibroacoustic methods of imaging in selected temporomandibular joint disorders during movement. Diagnostyka 19 (2018)
26. Carmignani, A., Carmignani, R., Ciampalini, G., Franchini, M., Greven, M.: Com-
parison of condylar lateral translation and skeletal classes. Zeitschrift für Kran-
iomandibuläre Funktion 9(3), 1–15 (2017)
27. Loster, B.W., Loster, J., Wieczorek, A., Ryniewicz, W.: Mycological analysis of the oral cavity of patients using acrylic removable dentures. Gastroenterology Research and Practice (2012)
28. Park, Y., Bae, Y.: Change of range of motion of the temporomandibular joint after
correction of mild scoliosis. J. Phys. Ther. Sci. 26(8), 1157–1160 (2014)
29. Clemente, M.P., Mendes, J., Moreira, A., Vardasca, R., Ferreira, A.P., Amarante,
J.M.: Wind instrumentalists and temporomandibular disorder: from diagnosis to
treatment. Dentistry J. 6(3), 41 (2018)
30. Barbosa, J.S., et al.: Infrared thermography assessment of patients with temporo-
mandibular disorders. Dentomaxillofacial Radiology 49(4), 20190392 (2020)
31. Gouw, S., de Wijer, A., Bronkhorst, E.M., Kalaykova, S.I., Creugers, N.H.: Asso-
ciation between self-reported bruxism and anger and frustration. J. Oral Rehabil.
46(2), 101–108 (2019)
32. Krohn, S., Gersdorff, N., Wassmann, T., Merboldt, K.D., Joseph, A.A., Buergers,
R., Frahm, J.: Real-time MRI of the temporomandibular joint at 15 frames per
second - a feasibility study. Eur. J. Radiol. 85(12), 2225–2230 (2016)
33. Czernielewska, J., Gębska, M., Weber-Nowakowska, K.: Ocena ruchomości stawów
skroniowo-żuchwowych i odcinka szyjnego kręgosłupa u osób z bruksizmem. Medy-
cyna Ogólna i Nauki o Zdrowiu 26(1), 60 (2020). (in Polish)
Sound and Motion
The Effect of Therapeutic Commands
on the Teaching of Maintaining Correct
Static Posture
1 Introduction
Rehabilitation is very popular today. Humans naturally strive for comfort and, in doing so, often adopt a collapsed posture, causing dramatically unfavourable changes in lifestyle. Counterintuitively, these habits become (next to excessive symptomatic pharmacotherapy) the primary method used to avoid pain [1–3]. Prevention is the best way to counteract the need for repair later on and would improve our health, which we know instinctively very well but are also very reluctant to pursue [4]. A participant's ability to adhere to a health program, to sustain motivation, and to maintain postural correctness are also of concern [5]. This can be seen as a cultural issue, related to an inadequate health policy. Therefore, it has a long-term dimension, which today is also shaped by the popularising influence of scientific institutions that address the issue of early prevention.
This article, however, focuses on one of the elements of the formation of cor-
rect movement habits, conducive to the correctness of the physiotherapeutic or
physioprophylactic activities undertaken.
In the context of long-term research, attention should be paid to the phenomenon of music entrainment, i.e., the influence on the human psychosomatic system, in the sense of neuronal motor initiation, of appropriately selected and timed auditory stimuli [6]. Because of the natural connection of sonic stimulation with movement behaviour in the broadest sense, it is justified to treat the problems undertaken here as belonging to music therapy. Current research has an advantage over the centuries-old work in music therapy, in its general definition, constituted by radically improved measurement instrumentation. It allows the observation of slow-moving processes and, above all, the recording of real-time data on the functional behaviour of the human body.
The most common habitual postural defects in the sagittal plane are pro-
traction and the flexed-relaxed position [7]. These positions are often adopted, especially by people with sedentary habits, due to the low muscular effort and the increased load on passive paraspinal structures [8]. Protraction, together with rounded shoulders, can
cause weakening of the cervical extensors as well as increased compressive forces
on the cervical spine [9]. Biofeedback is an effective intervention for postural re-
education and reduction of changed muscle activation [10]. EMG-based biofeed-
back can specifically reduce activation of a target muscle. Visual feedback based
on pressure signals can reduce or prevent shoulder protraction and rounding.
EMG-based devices can support postural alignment by reducing static muscle
activity [11].
The research presented in this article was preliminary. The aim was to analyse the effect of a series of voice commands given by a physiotherapist on the patient's ability to learn and maintain correct posture while sitting. The development of a research protocol and the analysis of the obtained results will allow, in the next phase, the sonification and modification of the emotional content of voice commands to stimulate therapeutic actions.
2.1 Materials
Six people in the age range of 22 to 39 years participated in the study. All probands were born, grew up, and currently live in Poland and have completed secondary education. The participants were informed about the aim of the research, the course of the experiment, and the instructions for correct posture. They had no feedback from self-observation of the movements performed, relying only upon their proprioception. Adopting the proper posture in the presented case meant that the physiotherapist conducting the examination placed the head in the Frankfurt position, otherwise known as the eye-ear plane. The aim was to promote skull alignment so that the lower border of the orbit on both sides of the head was in the same horizontal plane as the upper border of the external auditory meatus [12].
Participants gave written consent before the measurements were taken, and the data obtained were fully anonymised. Participants had to comply with the requirements of the experiment: the neck and ears had to be exposed, hair pinned up, and a strapless T-shirt worn, allowing the neck and clavicles to be revealed.
The study was performed in a specially prepared laboratory where the subjects were provided with peace, comfort, and privacy. During the experiment, only the people conducting the research and the participants were in the room. The research protocol consisted of several elements (Fig. 1). The first of these was the completion by the patient of a questionnaire on selected aspects of musical preferences and musicality, according to the concept of Bialkowski et al. [13], and primary demographic data. The subject was also asked to complete the standardized Job-related Affective Well-being Scale (JAWS) test [14], which consisted of 12 questions and concerned the currently experienced emotions. Then the patient was asked to take a designated seat – a chair without a backrest. An Empatica E4 band, worn each time on the wrist of the patient's non-dominant hand, was used to measure physiological signals in real time: electrodermal activity (EDA), body temperature, accelerometric signals (ACC), and blood volume pulse (BVP). The EDA and temperature signals were sampled at 4 Hz, the BVP signal at 64 Hz, and the ACC at 32 Hz.
In the next step, the physiotherapist placed electrodes for measuring the electromyographic (EMG) signal at predefined points on the patient's body, according to the following figure (Fig. 2), symmetrically on both sides. The relevant muscle parts were chosen deliberately because of their primary role during head movements.
A Noraxon Ultium sensor system was used to record the EMG signal in a 6-channel measurement configuration, with a sampling rate of 2 kHz. In addition, using the same device, one of the sensors, located on the top of the patient's head at the vertex, was configured only for full IMU measurement (accelerometer, gyroscope, and magnetometer), with a sampling rate of 0.4 kHz.
After placing all the necessary sensors on the patient's body, the measurement of all the signals mentioned above was started and continued from that moment until the end of the whole research protocol. After the recording was completed, the collected information was exported as raw data to a .csv file for further analysis.
The primary research and measurement procedure consisted of three stages. During each of them, the physiotherapist gave strictly defined voice commands, with the help of which the patient was supposed to strive to reach the Frankfurt position from a specific initial position – a tilted head, with the chin touching the sternum, while sitting. The commands were given in Polish, without dialect accents, without emotion in the voice, and at a neutral sound intensity. The beginning and end of each event were marked with time markers independently on the E4 and the Noraxon. The first stage of the recording involved learning to reach the final position. The patient repeated the exercise three times according to the given commands, and the physiotherapist continuously corrected any incorrect position of the subject's head. Knowing what to do, as well as the basic expectations of the therapist, allowed the proband to assume the correct positions as far as possible with respect to their motor habits and limitations. Then, while sitting on a chair without support, the pattern shown in Fig. 3 was repeated six times, but this time without the possibility of correction by the specialist.
The final stage was to change the chair to a version with a backrest (bench), and the patient was again asked to execute the commands six times. The research task was to observe the susceptibility to the commands given. Angle values determined from the accelerometer measurements during the individual movement sequences are presented below, together with the corresponding commands (Fig. 4).
Immediately after completing the registration, the patient was again asked to perform the JAWS test, which captures the subject's feelings during the exercise. Removing the E4 device from the subject's wrist and detaching the Noraxon sensors ended the condition monitoring.
stages) in the following order: learning, chair exercise, bench exercise. In subse-
quent analysis, this facilitated the division of the recorded signals and the HRV
coefficients determined from the BVP into appropriate time segments, depending
on the exercises. For all signals, similar features recorded during the presented
protocol stages were determined.
According to the concept of Greco et al., the EDA signal was divided into tonic (t) and phasic (p) components and an additive error (e) [16]. Then, in the phasic component, sudden changes (local maxima) were searched for, i.e., galvanic skin responses (GSR). The total number of GSRs, their amplitude, and the number of responses per minute (rpm) were determined in the analysed time segment. The skin conductance level (SCL) was calculated based on the tonic component. From the sum of the tonic and phasic components, the basic statistical features of the signal in the time domain were calculated, such as the mean (x̄), standard deviation (σ), minimum (min), maximum (max), skin conductance level (SCL), number of galvanic skin responses (nGSR), and GSR amplitude (aGSR) [17]. The BVP was the basis for determining the heart rate variability (HRV) coefficients. The accelerometric signal recorded by the Empatica E4 during the whole research protocol was also important in this stage of the analysis, as a reference signal to eliminate motion-related artifacts in the analysis of the cardiac signals. Based on the BVP, an IBI (inter-beat interval) vector was determined – a vector of consecutive time intervals (dt) between individual heartbeats (local maxima or minima). On this basis, the following coefficients were calculated in the next stage of the analysis: the standard deviation of normal-to-normal intervals (SDNN), the root mean square of successive differences (RMSSD), the proportion of successive intervals differing by more than 50 ms (pNN50), the integral of the density of the RR interval histogram divided by its height (TRI), and the mean heart rate based on the inter-beat intervals (HR).
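The listed time-domain coefficients can be computed from the IBI vector as in the following sketch; the automatic histogram binning used for TRI is an assumption (the standard definition uses fixed 1/128 s bins).

import numpy as np

def hrv_metrics(ibi_s):
    # Time-domain HRV coefficients from inter-beat intervals in seconds.
    ibi_ms = np.asarray(ibi_s) * 1000.0
    diffs = np.diff(ibi_ms)
    hist, _ = np.histogram(ibi_ms, bins="auto")
    return {
        "SDNN": np.std(ibi_ms, ddof=1),                  # ms
        "RMSSD": np.sqrt(np.mean(diffs ** 2)),           # ms
        "pNN50": 100.0 * np.mean(np.abs(diffs) > 50.0),  # % of successive diffs
        "TRI": ibi_ms.size / hist.max(),                 # triangular index
        "HR": 60000.0 / np.mean(ibi_ms),                 # mean heart rate, bpm
    }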
3 Results
The following table presents selected physiological signal parameters determined for the relevant parts of the research protocol (Table 1). Based on these results, it is important to note the increase in the mean EDA value in the successive stages of the study and the consequent increase in the maximum value, the number of GSRs, and the SCL parameter. Analysing the heart rate variability parameters, the regularity of the SDNN, pNN50, and HR values can be noticed, in accordance with the norms for the age group [20].
The table below (Table 2) presents selected EMG signal parameters deter-
mined for each study stage.
Table 2. Mean values of EMG signal parameters at different stages of the research
protocol
The obtained values of mean amplitude and maximum value are the lowest for the stage before the test (baseline) and during the learning of the movement sequence, and they increase with the progress of the subsequent stages of the experiment.
Table 3 shows the results of the JAWS test performed before and after the
study.
Table 3. Results of the JAWS test before and after the study

                   Before  After
JAWS total sum      35.5    43.2
Positive emotion    14.5    16.5
Negative emotion    15.0     9.3
In the second stage (after), a higher mean summed test value was obtained, with a dominance of experienced positive emotions. When analysing the music preference questionnaire results, it is essential to note the attitude of the probands towards music. Most of the probands (80%) did not have any musical education; however, some of them (34%) were self-taught musicians. Of all probands, 67% indicated that music and related activities were essential to them. In contrast, the remainder stated that music was an indifferent part of their lives. Participants were reluctant to participate in music-making, singing, and musical activities. They were much more likely to choose passive forms of engaging with music, such as listening to music or concerts, devoting on average 15 h a week to this. The respondents most often use a smartphone or a computer during their leisure time, and the desire to interact with music was indicated much less frequently. The music genres most commonly chosen by the participants were film music and rock; half of the participants also prefer classical music. The study group's most frequently selected musical activities included buying records/subscriptions, talking about music, and making music for personal enjoyment. Less common choices were reading about, listening to, or watching music programs, attending music events, and making music with family and friends. None of the people participating in the survey performs music for a living or plays/sings in a musical ensemble or choir.
4 Discussion
This paper presents the results of a preliminary study analysing the effect of a series of voice commands on a patient's ability to learn and maintain correct posture while sitting. The presented research protocol was based on the acquisition of physiological data such as EMG, EDA, and BVP, and cognitive data such as the JAWS emotion intensity score and the music preferences of the subject.
The analysis was based upon changes in the mean values of the individual physiological coefficients in the successive stages of the research protocol (Table 1). With each subsequent element, which required more and more involvement and control from the test subject, an increase in the mean and maximum value of the EDA signal, the SCL level, and nGSR was noted. This indicates the stimulation of the autonomic nervous system responsible for EDA activation and heart rate [21]. It may be related to perceived stress or to coping with the situation [22]. It should also be noted that despite the increase in the number of GSRs (in the following components of the research protocol), the mean value of their amplitude decreases. These arousals are no longer as intense as at the beginning, when they were associated with a new, unknown situation. In a sense, patients get used to the situation, despite the stress and discomfort still felt [23].
The HRV coefficients SDNN and HR were consistent with the norms [18]. The RMSSD and pNN50 parameters showed some abnormalities – RMSSD was too low for this age group, while pNN50 was too high. This is probably because the recording was relatively short, classified as ultra-short (less than 5 min); according to the available literature, a more accurate analysis would require recordings of at least 5 min [24]. The determined values of the mean amplitude and maximum value of the EMG signal differed between the individual stages of the analysis: they were lowest before the test (baseline) and during the learning of the movement sequence, and increased with the progress of the subsequent experiment phases. This was a correct occurrence, as prolonged execution of a specific movement increases the number of recruited motor units of the muscle performing this movement [19], which indicates the subjects' commitment to the exercises. Moreover, the values obtained for symmetrical muscles were similar, suggesting that the performed movements loaded the right and left sides of the body equally. Any deviation in the results achieved by a specific muscle pair may indicate a significantly weaker muscle, which may be an indication to adjust the training accordingly for a particular group of subjects in subsequent experiments [25]. For the upper trapezius muscle (V – left), a significantly different average maximum value was observed for the test stage with the execution of the independent movement, in contrast to the right side (V – right) or the initial stages of the test. A discrepancy in the results due to displacement of the pair of electrodes attached over the trapezius muscle could not be excluded, as the amplitude range of the EMG signal should be within 0–10 mV (±5 mV); this suggests the need to pay attention to the correct placement of the electrodes and to observe their position during the test [26].
Analysing the intensity of the subjects' emotions (Table 3), attention should be given to their variability before and after performing the corrective exercises. The predominance of the average value of positive over negative emotions may indicate that negative affect was not important for the exercise performance. The observed phenomenon could be perceived as a positive experience for the probands and a willingness to learn to maintain correct body posture while sitting. Before the tests, the subjects indicated a predominance of negative emotions, which may have resulted from the unfamiliarity of the situation – a test supervised by specialists, the lack of a previous detailed description of the research protocol, or being informed about the need to expose the relevant muscle parts.
Due to the small study group and the preliminary character of the research, the importance of musical preference for the command response cannot be determined.
To the authors’ knowledge, there were no studies available that analyse the
effect of a series of voice commands on a patient’s ability to learn to main-
tain correct posture while sitting. Literature sources only report some related
methods using sensors, in an independent configuration, to assess the emotional
state, perceived stress during testing, response to musical entrainment, or muscle
tension during exercise.
The issues of emotion analysis in combination with EMG signal recording
were repeatedly referred to as facial EMG analysis [27–29]. The electrodermal
activity and electrocardiographic signals were the most frequently used physio-
logical patterns of subjects’ emotional responses as a direct reflection of sympa-
thetic nervous system activity [30–32]. However, when analysing the cited solu-
tions, it should be noted that the evoked emotions were controlled each time. A
given event was supposed to cause a targeted feeling. It was not a reaction to
5 Conclusion
This paper presents the results of a preliminary study analysing the effect of a series of voice commands on a patient's ability to learn and maintain a correct posture while sitting. Based on the conducted experiment, it is important to draw attention to the necessity of studying a larger group of people, determining an additional number of physiological parameters, and subsequently correlating them with psychological data, the recorded accelerations, and the personal musical preferences of the subjects. To accomplish such a set of tasks in assessing dynamic change in the patient's condition, it is necessary to develop a multimodal measurement system that records the physiological state and psychological condition in real time in relation to social and environmental external conditions.
References
1. Alves-Conceicao, V., da Silva, D.T., de Santana, V.L., Dos Santos, E.G., Santos,
L.M.C., de Lyra, D.P.: Evaluation of pharmacotherapy complexity in residents
of long-term care facilities: a cross-sectional descriptive study. BMC Pharmacol.
Toxicol. 18(1), 1–8 (2017)
2. Jesus, T.S., Hoenig, H., Landry, M.D.: Development of the rehabilitation health
policy, systems, and services research field: quantitative analyses of publications
over time (1990–2017) and across country type. Int. J. Environ. Res. Public Health
17(3), 965 (2020)
3. Osterweis, M., Kleinman, A., Mechanic, D.: Pain and disability: Clinical, behav-
ioral, and public policy perspectives (1987)
4. Muntigl, P., Horvath, A. O., Chubak, L., Angus, L.: Getting to “yes”: overcoming
client reluctance to engage in chair work. Front. Psychol. 11 (2020)
5. Palazzo, C., et al.: Barriers to home-based exercise program adherence with chronic
low back pain: Patient expectations regarding new technologies. Annals Phys.
Rehabilitation Med. 59(2), 107–113 (2016)
6. Clayton, M., Sager, R., Will, U.: In time with the music: the concept of entrainment and its significance for ethnomusicology. In: European Meetings in Ethnomusicology, vol. 11, pp. 1–82. Romanian Society for Ethnomusicology (2005)
7. Szeto, G.P., Straker, L., Raine, S.: A field comparison of neck and shoulder postures
in symptomatic and asymptomatic office workers. Appl. Ergon. 33(1), 75–84 (2002)
8. Chiu, T.T.W., Ku, W.Y., Lee, M.H., Sum, W.K., Wan, M.P., Wong, C.Y., Yuen,
C.K.: A study on the prevalence of and risk factors for neck pain among university
academic staff in Hong Kong. J. Occup. Rehabil. 12(2), 77–91 (2002)
9. Moore, M.K.: Upper crossed syndrome and its relationship to cervicogenic
headache. J. Manipulative Physiol. Ther. 27(6), 414–420 (2004)
10. Neblett, R., Mayer, T.G., Brede, E., Gatchel, R.J.: Correcting abnormal flexion-
relaxation in chronic lumbar pain: responsiveness to a new biofeedback training
protocol. Clin. J. Pain 26(5), 403 (2010)
11. Ma, C., Szeto, G.P., Yan, T., Wu, S., Lin, C., Li, L.: Comparing biofeedback
with active exercise and passive treatment for the management of work-related
neck and shoulder pain: a randomized controlled trial. Arch. Phys. Med. Rehabil.
92(6), 849–858 (2011)
12. Robinson, D., Kesser, B.W.: Frankfort horizontal plane. In: Kountakis, S.E. (eds.)
Encyclopedia of Otolaryngology, Head and Neck Surgery, pp. 960–960. Springer,
Heidelberg (2013). https://doi.org/10.1007/978-3-642-23499-6_200042
13. Białkowski, A., Migut, M., Socha, Z., Wyrzykowska, K.M.: Muzykowanie w Polsce.
Badanie podstawowych form muzycznej aktywności Polaków (2014)
14. Van Katwyk, P.T., Fox, S., Spector, P.E., Kelloway, E.K.: Using the Job-Related
Affective Well-Being Scale (JAWS) to investigate affective responses to work stres-
sors. J. Occup. Health Psychol. 5(2), 219 (2000)
15. Mittelstaedt, H.: Origin and processing of postural information. Neurosci. Biobe-
havioral Rev. 22(4), 473–478 (1998)
16. Greco, A., Valenza, G., Lanata, A., Scilingo, E.P., Citi, L.: cvxEDA: a convex
optimization approach to electrodermal activity processing. IEEE Trans. Biomed.
Eng. 63(4), 797–804 (2015)
17. Romaniszyn-Kania, P., et al.: Affective state during physiotherapy and its analysis
using machine learning methods. Sensors 21(14), 4853 (2021)
18. Shaffer, F., Ginsberg, J.P.: An overview of heart rate variability metrics and norms. Front. Public Health 5, 258 (2017)
19. Konrad, P.: ABC EMG: praktyczne wprowadzenie do elektromiografii kinezjologicznej. Technomex Sp. z o.o. (2007)
20. Van Ravenswaaij-Arts, C.M., Kollee, L.A., Hopman, J.C., Stoelinga, G.B., van
Geijn, H.P.: Heart rate variability. Ann. Intern. Med. 118(6), 436–447 (1993)
21. Carlson, N.R.: Physiology of Behavior: Books a la Carte Edition. Prentice Hall
(2016)
22. Setz, C., Arnrich, B., Schumm, J., La Marca, R., Tröster, G., Ehlert, U.: Discrim-
inating stress from cognitive load using a wearable EDA device. IEEE Trans. Inf.
Technol. Biomed. 14(2), 410–417 (2009)
23. Junk, K., Peller, L., Brandenburg, H., Lehrmann, B., Henke, E.: Physiological
Stress Response to Anticipation of Physical Exertion (2018)
24. Munoz, M.L., et al.: Validity of (ultra-) short recordings for heart rate variability
measurements. PloS One 10(9), e0138921 (2015)
25. Balasubramanian, V., Adalarasu, K.: EMG-based analysis of change in muscle
activity during simulated driving. J. Bodyw. Mov. Ther. 11(2), 151–158 (2007)
26. Reaz, M.B.I., Hussain, M.S., Mohd-Yasin, F.: Techniques of EMG signal analysis:
detection, processing, classification and applications. Biological Procedures Online
8(1), 11–35 (2006)
27. Van Boxtel, A.: Facial EMG as a tool for inferring affective states. In: Proceed-
ings of measuring behavior, vol. 7, pp. 104–108. Wageningen: Noldus Information
Technology, August 2010
28. Mithbavkar, S.A., Shah, M.S.: Analysis of EMG based emotion recognition for
multiple people and emotions. In: 2021 IEEE 3rd Eurasia Conference on Biomedical
Engineering, Healthcare and Sustainability (ECBIOS), pp. 1–4. IEEE, May 2021
29. Sato, W., Murata, K., Uraoka, Y., Shibata, K., Yoshikawa, S., Furuta, M.: Emo-
tional valence sensing using a wearable facial EMG device. Sci. Rep. 11(1), 1–11
(2021)
30. Zheng, B. S., Murugappan, M., Yaacob, S., Murugappan, S.: Human emotional
stress analysis through time domain electromyogram features. In: 2013 IEEE Sym-
posium on Industrial Electronics & Applications, pp. 172–177. IEEE, September
2013
31. Canento, F., Fred, A., Silva, H., Gamboa, H., Lourenço, A.: Multimodal biosignal
sensor data handling for emotion recognition. In: 2011 IEEE SENSORS, pp. 647–
650. IEEE, October 2011
32. Egger, M., Ley, M., Hanke, S.: Emotion recognition from physiological signal anal-
ysis: a review. Electron. Notes Theoretical Comput. Sci. 343, 35–55 (2019)
33. Vuilleumier, P., Trost, W.: Music and emotions: from enchantment to entrainment.
Ann. N. Y. Acad. Sci. 1337(1), 212–222 (2015)
34. Galińska, E.: Music therapy in neurological rehabilitation settings. Psychiatr. Pol.
49(4), 835–846 (2015)
35. Le Roux, F.: Music: a new integrated model in physiotherapy. South African J.
Physiotherapy 54(2), 10–11 (1998)
36. Talvitie, U., Reunanen, M.: Interaction between physiotherapists and patients in
stroke treatment. Physiotherapy 88(2), 77–88 (2002)
37. Gyllensten, A.L., Gard, G., Salford, E., Ekdahl, C.: Interaction between patient and
physiotherapist: a qualitative study reflecting the physiotherapist’s perspective.
Physiother. Res. Int. 4(2), 89–109 (1999)
38. Klaber Moffett, J.A., Richardson, P.H.: The influence of the physiotherapist-
patient relationship on pain and disability. Physiother. Theory Pract. 13(1), 89–96
(1997)
Improving the Process of Verifying
Employee Potential During Preventive
Work Examinations – A Case Study
1 Introduction
Preventive examinations of employees make it possible to determine the absence
of contraindications for further work in the specific position and to estimate the
risk of occupational diseases. They thus confirm that the employee has com-
petences at a level that ensures the performance of work as expected by the
employer. At the same time, the appraisal given makes it possible to plan the
employee’s further career development. As in the case of job interviews, a limitation is the low reliability of the appraisals due to the stress experienced, which prevents the employee from fully presenting their potential.
A modification is suggested to the current way of assessing voice professionals’
examinations, consisting of supplementing the ENT examination with question-
naires and standardised emotion assessment methods, and supplementing stress
surveys with psychophysiological measurements. The authors set themselves the
objective of testing a procedure for the reliable appraisal of the professional compe-
tences presented, taking into account the stress induced by each appraisal situation
in the individual, leading to a deterioration in the quality of task performance and
to a lack of adequate assessment of the individual’s potential by others.
The transactional stress theory proposed by Lazarus and Folkman treats
stress as a situational relationship between the environment and the individual,
interpreted by the latter as exceeding their resources or threatening their well-
being [1]. The situation in which the individual is assessed in terms of their pro-
fessional competences can be regarded as stress-inducing, as issuing an opinion
involves social evaluation, important for the candidate’s future [2]. McCarthy and
Goffin identified five areas of anxiety in an employee selection situation, related
to communication (stress related to verbal and non-verbal communication and
listening competencies); appearance (apprehension about physical look); social
(nervousness about social, behavioral appropriateness because of the desire to
be liked); performance (worry about the outcome, such as fear of failure); and
behavioral (expression of the autonomic arousal of anxiety, such as uneasiness or
fidgeting) [3]. In addition, displaying anxiety during a job interview reduces the
likelihood of being hired, and performance deteriorates, especially in situations
where candidates are competing [4,5].
The theory by van Katwyk et al. links emotional states to occupational activ-
ity, dividing them in terms of their positive and negative valence (pleasure and
displeasure) [6]. Taking into account the intensity of arousal (Low/High arousal),
four groups of emotions can be distinguished: high pleasure/high arousal (e.g.,
euphoria), low pleasure/high arousal (e.g., disgust), high pleasure/low arousal
(e.g., satisfaction), and low pleasure/low arousal (e.g., boredom). Van Katwyk’s
theory can be linked to the concept of stress by Selye, according to which pos-
itive emotions with a high degree of arousal can be referred to as psychological
eustress, while negative emotions with a high degree of arousal can be referred to
as psychological distress [7,8]. Eustress has a positive, mobilising impact on the
individual and has a protective effect. It increases one’s energy to act, improves
performance, and is accompanied by emotions such as excitement and joy. Dis-
tress, on the other hand, is a state that reduces concentration and performance,
perceived as a situation beyond the individual’s ability to cope. It is associated
with unpleasant emotions, i.e., fear, uncertainty, and helplessness [7].
The relationship mentioned above, consisting of feeling stress during an
employee selection situation, depends on the individual’s cognitive appraisal and
stress management skills [9]. Lazarus and Folkman pointed out that people con-
stantly assess what happens to them in terms of its relevance to their well-being
[10]. On the basis of research by these authors, two types of appraisal can be distinguished: primary appraisal and secondary appraisal. Primary appraisal
concerns motivational meaning, i.e., whether something relevant to our well-
being is happening. Primary judgments can be divided into three categories:
harm already experienced, threat, i.e., anticipated harm, and challenge, i.e., the
possibility of achieving mastery or gain. Challenge is mentioned in the context of
stress assessment because the individual needs to mobilise to overcome obstacles
and achieve a favorable outcome. Challenges always involve a threat, as there
is always a certain risk of suffering harm. Secondary appraisal is a key element
complementing primary appraisal, since harm, threat, challenge and benefit also
depend on the extent to which one believes they have control over the outcomes.
If there is a risk of a harmful outcome, but one is confident that it can be
prevented, the sense of threat is either minimal or entirely absent.
Choosing emotion as a variable in voice recording is justified, in addition to its association with stress, by the fact that current technology makes it possible to measure emotions, including stress, using a voice sample. The most commonly used
measures are voice amplitude (i.e., loudness) and pitch (also referred to as fun-
damental frequency, or F0 ) [11]. The most consistent relationship described in
the literature concerning emotion and voice pitch showed a correlation between
stronger emotional arousal and a higher tone of voice [12–14]. Scherer et al.
investigated the acoustic features of phrases uttered by actors [15]. When actors
presented high valence emotions such as fear, joy, and anger, the pitch was
higher compared to when they presented lower valence emotions, such as sad-
ness. Bachorowski and Owen suggested that voice pitch could be used to assess
the level of emotional arousal being experienced by an individual [16]. Phys-
iological signals, i.e., electrodermal activity and skin temperature, as well as
heart rate variability (HRV), have been proven to be reliable stress indicators
[17,18]. Thus, using them in conjunction with subjective psychological measures
can deepen insights into the fundamental psychological mechanisms describing
reactions to stress. Combining more than one measurement strategy makes it
possible to achieve higher validity in studies, so it is justified to combine physi-
ological measurements with self-report questionnaires [19]. Taking into account the psychological variables present in the study (stress and emotions), it needs to be mentioned that self-report methods do not make it possible to measure stress “live” at a given moment, but only after a certain amount of time, following cognitive reappraisal
[20]. Data collected using declarative methods make it possible to capture the
individual’s interpretation of their reaction rather than the reaction itself [21].
2.2 Subject
The studied individual was chosen deliberately. The research problem addressed
affects and is personally important for the person invited to participate in the
study, a singer from a leading opera ensemble in Poland. The representative
of this group is excellently prepared for the job. They actively perform their
work-related duties, and their decisions concerning the continuation and direc-
tions of further career development take into account the current state of health
and competence level. They work with their voice for 4 h a day on average. In
the process of competence appraisal, importance is given to the interpretation
presented by both the subject and the relevant expert.
The research was conducted in accordance with the recommendations of the
Declaration of Helsinki, with prior approval by the ethics board. Before the
start of the measurements, the participant was informed about the purpose of
the study and the following steps and consented to them in writing. The data
obtained were fully anonymised.
2.3 Equipment
Specialised equipment was used in the research protocol presented here. Empat-
ica E4 was used to record physiological signals during the subsequent stages of
the study. The patient’s voice was recorded in an appropriately soundproofed
audiometric chamber with a Kay Elemetrics device. The Voice Handicap Index
Questionnaire (VHI) was used in the study [22]. The psychological measurements
were performed using a single survey question concerning anticipated stress and
the standardised Job-related Affective Well-being Scale (JAWS) questionnaire
[6]. The surveys and questionnaire studies were conducted using the paper-and-
pencil method.
The laryngological examination included otoscopy, anterior rhinoscopy, examination of the oral cavity and of the pharynx, and endoscopy of the larynx. In addition, the participant was asked to complete the VHI questionnaire, which consisted of three parts, with a total of 30 questions, concerning the functional, emotional and physical aspects of a person’s voice disorders, together with a question about the intensity of anticipated stress related to the appraisal of professional competences [22]. Upon
entering the room where further tests would be performed, the subject was first
fitted with the Empatica E4 device on the wrist of the non-dominant hand,
making it possible to record physiological signals such as electrodermal activity (EDA), temperature, blood volume pulse (BVP), and accelerometer (ACC) signals in real time. The sampling frequency of the EDA and temperature signals was 4 Hz, of the BVP signal 64 Hz, and of the ACC signal 32 Hz. The signals were recorded from
that moment until the completion of the entire research protocol, after which
they were exported in raw data form to a .csv file for further analysis.
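For illustration, such raw exports can be read back as in the following minimal Python sketch. It assumes the standard Empatica export layout (first row: session start timestamp, second row: sampling frequency, then one sample per row); the file names in the usage comments are hypothetical.

```python
import numpy as np

def load_e4_channel(path):
    """Load one raw Empatica E4 export file (assumed layout:
    row 1 = session start time as Unix epoch, row 2 = sampling
    frequency in Hz, remaining rows = samples)."""
    raw = np.atleast_2d(np.loadtxt(path, delimiter=","))
    if raw.shape[0] == 1:          # single-column file came back as one row
        raw = raw.T
    t0 = float(raw[0, 0])          # session start (Unix epoch, UTC)
    fs = float(raw[1, 0])          # sampling frequency in Hz
    samples = raw[2:]
    t = t0 + np.arange(samples.shape[0]) / fs
    return t, samples.squeeze(), fs

# Hypothetical file names from an E4 session folder:
# t_eda, eda, fs_eda = load_e4_channel("EDA.csv")   # 4 Hz
# t_bvp, bvp, fs_bvp = load_e4_channel("BVP.csv")   # 64 Hz
# t_acc, acc, fs_acc = load_e4_channel("ACC.csv")   # 32 Hz, 3 axes
```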
The subject was then asked to enter a special soundproofed audiometric chamber, where their voice was recorded using the Kay Elemetrics device (Fig. 2). The
subject’s task was to pronounce the vowels a, e and i one after another in
prolonged phonation. During voice recording, the subject had noise-canceling
headphones on and was alone in the booth. After the recording, the subject was
asked to complete the JAWS questionnaire (JAWS1 ).
The next step in the research protocol involved tonal audiometry, followed by
impedance audiometry. The first test type involved determining what is referred
to as the hearing threshold, i.e., the quietest sound, in the range of frequencies
tested. For this purpose, the patient was asked to sit in the chamber wearing
noise-canceling headphones and holding a special button. After the test started,
the audiometer generated tones of different frequencies, changed by the operator
at the console (outside the chamber). The test was conducted using a standard
clinical audiometer, measuring air conduction in a range 125 Hz to 8 kHz and
bone conduction in a range 250 Hz to 4 kHz [23]. The patient was asked to click
a button each time they heard a sound. The clinical criterion for normal hearing
level in tonal audiometry, according to World Health Organization guidelines,
refers to values not exceeding 20 dB HL. Directly afterwards, impedance audiom-
etry was performed as the most accurate and objective method of middle ear
examination, measuring middle ear pressure, stapedius reflex (with ipsilateral
and contralateral stimulation), and tympanic membrane tension. This type of
Improving the Process of Verifying Employee Potential 411
associated with the subject’s movement. The BVP signal provided the basis for
the determination of the IBI (inter-beat interval) vector, i.e., successive time
intervals (dt) between individual heartbeats – local maxima or minima, on the
basis of which the following coefficients were calculated at the next stage of the
analysis: standard deviation of normal-to-normal intervals (SDNN), root mean square of successive differences (RMSSD), percentage of successive intervals differing by more than 50 ms (pNN50), and mean heart rate (HR) determined on the basis of the IBIs, as the basic parameters for ultrashort cardiac signal recordings [26].
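A minimal Python sketch of these ultra-short HRV parameters, computed directly from an IBI vector (function and variable names are illustrative):

```python
import numpy as np

def hrv_ultrashort(ibi_s):
    """Basic time-domain HRV metrics from an inter-beat-interval
    vector given in seconds, as used for ultra-short recordings [26]."""
    ibi = np.asarray(ibi_s, dtype=float)
    diffs = np.diff(ibi)
    sdnn = np.std(ibi, ddof=1) * 1000.0             # ms
    rmssd = np.sqrt(np.mean(diffs ** 2)) * 1000.0   # ms
    pnn50 = np.mean(np.abs(diffs) > 0.050) * 100.0  # % of successive pairs
    hr = 60.0 / np.mean(ibi)                        # beats per minute
    return {"SDNN": sdnn, "RMSSD": rmssd, "pNN50": pnn50, "HR": hr}

# ibi = np.diff(beat_times_s)   # dt between successive heartbeats
# print(hrv_ultrashort(ibi))
```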
The Kay Elemetrics device, combined with a computer and appropriate software, makes it possible to perform a very accurate analysis of the acoustic structure of the voice. Multi-Dimensional Voice Program (MDVP) software was used, making it possible to analyse 33 acoustic parameters of the voice. In clinical practice, however, 17 parameters are most often used to determine percentage changes of a specific
feature of the human voice. These parameters were divided into groups defining
the physical characteristics of the voice: parameters describing fundamental fre-
quency (F0 , Fhi , Flo , STD), parameters assessing frequency disturbance (Jitt,
RAP, PPQ, sPPQ, vF0 ), amplitude disturbance (Shimm, APQ, sAPQ, vAm),
voice irregularity (DUV ), vocal tremor (modulation) (FTRI, ATRI ), and voice
breaks (DVB ), parameters indicating the presence of subharmonic components
(DSH ) and parameters indicating the presence of noise components (NHR, VTI,
SPI ).
MDVP does not allow parameters to be extracted from selected sounds, only from the whole recording. It is also impossible to expand the list of determined coefficients. For this reason, additional parameters were calculated separately for the individual sounds. These were the parameters used in [27].
Harmonic to Noise Ratio (HNR) describes the ratio of harmonics to non-
harmonics of the signal. It is based on autocorrelation and fundamental frequency
(F0) analysis. Fundamental Frequency Variability in Frames (vF0frames) stands for the ratio of the F0 standard deviation to its mean value. The set of F0 values
is determined for each frame of a signal. Next, Spectrum Centroid (SC ) reflects
the contribution of formants in power spectrum density. It indicates voice sharp-
ness. Spectrum Spread (SS ) determines the energy distribution with respect to
the spectrum centroid. SS makes it possible to distinguish noise from sound sig-
nals. Signal amplitude modulation is described by Shimmer (Shimm). It shows
changes in subglottic pressure connected with vocal fold tension. Jitter (Jitt)
reflects short-term F0 variability. Jitter can be used to evaluate self-control of
vocal fold vibration. Fundamental Frequency Variability (vF0) corresponds to the vF0frames feature, but F0 is determined for each signal period. Amplitude
Variability (vAm) is determined by the ratio of the standard deviation of the
amplitude to its mean value. Finally, Noise to Harmonic Ratio (NHR) shows the
ratio of non-harmonic to harmonic energy.
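Several of these measures can be sketched directly from the extracted per-period pitch and amplitude sequences. The Python sketch below uses the standard local jitter/shimmer definitions; the exact variants used in [27] may differ, and all names are illustrative.

```python
import numpy as np

def jitter_percent(periods):
    """Local jitter: mean absolute difference of consecutive
    periods over the mean period, in %."""
    p = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(p))) / np.mean(p)

def shimmer_percent(amplitudes):
    """Local shimmer: mean absolute difference of consecutive
    peak amplitudes over the mean amplitude, in %."""
    a = np.asarray(amplitudes, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(a))) / np.mean(a)

def variability_coeff(values):
    """vF0 / vAm style coefficient: std over mean, in %."""
    v = np.asarray(values, dtype=float)
    return 100.0 * np.std(v, ddof=1) / np.mean(v)

def spectrum_centroid_spread(freqs, psd):
    """Spectrum centroid (SC) and spread (SS) of a power spectrum."""
    w = psd / np.sum(psd)
    sc = np.sum(freqs * w)
    ss = np.sqrt(np.sum(((freqs - sc) ** 2) * w))
    return sc, ss
```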
The single survey question concerning anticipated stress read: “To what extent, on a scale of 1 to 10, do you feel stressed about your professional competence being verified during the tests?” The respondent indicated their answer on a graduated scale whose ends were marked with the appropriate labels, with 1 = not at all on the left and 10 = strongest possible on the right.
Emotions were analysed using the Job-related Affective Well-being Scale
in its 12-item version. This tool measures emotional reactions occurring in an
occupational context. It is based on the theory by van Katwyk, which assumes
that experienced emotions can be described two-dimensionally: in terms of the
valence criterion (pleasure/displeasure), and in terms of arousal when the emo-
tion is experienced (low arousal/high arousal) [6]. The general categories consist
of individual emotions: the high pleasure/high arousal dimension involves excite-
ment, energy, and inspiration. This sphere can also be referred to as eustress.
High pleasure/low arousal emotions include relaxation, satisfaction, and feeling
at ease. The other part in terms of emotion valence involves low pleasure/high
arousal, i.e., anger, concern, and disgust, which can also be classified in the stress
category, and low pleasure/low arousal, i.e., fatigue, depression, and discourage-
ment. The response format is a 5-point Likert scale (from 1 – “never” to 5 – “very
often”). The answers given by the respondent make it possible to receive several
JAWS results: Total Score, Total Positive Emotions, Total Negative Emotions, as
well as totals for the individual subscales: eustress, High Pleasure/Low Arousal,
distress, Low Pleasure/Low Arousal.
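Scoring can be sketched as below; the item-to-subscale assignment and the reverse-scoring of negative items within the total score are assumptions, since the exact 12-item form is not reproduced here.

```python
# Hypothetical item-to-subscale mapping: three items per quadrant;
# the actual item order depends on the JAWS form used.
SUBSCALES = {
    "HPHA (eustress)": [0, 1, 2],
    "HPLA":            [3, 4, 5],
    "LPHA (distress)": [6, 7, 8],
    "LPLA":            [9, 10, 11],
}

def score_jaws(responses):
    """responses: 12 Likert answers, 1 ('never') .. 5 ('very often')."""
    assert len(responses) == 12 and all(1 <= r <= 5 for r in responses)
    sub = {k: sum(responses[i] for i in idx) for k, idx in SUBSCALES.items()}
    pos = sub["HPHA (eustress)"] + sub["HPLA"]   # max 30, cf. Fig. 3
    neg = sub["LPHA (distress)"] + sub["LPLA"]
    neg_items = SUBSCALES["LPHA (distress)"] + SUBSCALES["LPLA"]
    # Assumed convention: negative items are reverse-scored (6 - r)
    # before entering the total score.
    total = pos + sum(6 - responses[i] for i in neg_items)
    return {"Total": total, "Positive": pos, "Negative": neg, **sub}
```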
3 Results
The subject assessed the situation of verifying their competences before the
start of the study as non-stressful. The VHI questionnaire score did not indicate
difficulties due to interference associated with voice pathologies. This is con-
trary to the answers they provided after the completion of the study. Both voice
recording, which was specific to the subject, and which measured the relevant
professional competence (voice), and the audiometry, related to the subject’s
complementary competence (hearing) induced emotions of a distress nature in
the subject, exceeding the theoretical average and the average for the subject
(Table 1).
Table 1. JAWS questionnaire results collected after the voice recording and audiometry
An analysis of the results obtained by the study subject in the voice recording in terms of total positive emotions leads to the observation that they are equal to the theoretical mean in the test – the maximum score on this scale is 30
(Fig. 3). At the same time, they are higher than the mean of all responses given
by the subject. In terms of total negative emotions, the result obtained is lower
than the theoretical mean and also lower than the mean of negative emotions at
the same stage of the study.
In this study phase, the individual emotions experienced by the study subject
can be characterised as a complex of low pleasure/low arousal negative emotions
that exceeded the theoretical mean and the mean for the study subject and
of low and high arousal positive emotions. The study subject scored below the
mean on high arousal negative emotions. The second stage of the study involved
the same variables but measured during audiometry. There was an increase in
overall mean positive emotions but mean negative emotions did not change.
Fig. 3. Differences in the intensity of certain categories of emotions during the voice
recording and audiometry
A closer look at the results shows that the mean did not change either in
terms of low arousal negative emotions or high arousal negative emotions. Thus,
the change concerns only positive emotions: compared to the voice recording, in
the other test the mean increased for high arousal positive emotions, while the
mean for low arousal positive emotions decreased. The emotions experienced
include low pleasure/high arousal and high pleasure/high arousal (above the
theoretical mean and the mean for the study subject), and low pleasure/low
arousal (whose value is below the theoretical mean and slightly below the mean
for the study subject), as well as high pleasure/low arousal.
The results indicate a predominance of positive emotions experienced dur-
ing the voice recording and audiometry. More intensely experienced emotions
appeared during the audiometry, which may imply the presence of eustress in
the subject. The emotion complex observed here is formed by emotions differing
in terms of valence and intensity, meaning that emotions of opposite valence can
be observed simultaneously. The effect is more noticeable at the first stage of
the study, with the subject facing a more stressful situation when their vocal
abilities were being tested.
4 Discussion
Voice professionals, such as singers, are required to undergo examinations of the
vocal organ every 3–5 years (Annex No. 1 to the Ordinance of the Polish Minister
of Health and Welfare of 30 May 1996) [28]. Obtaining a favorable opinion from
a specialist physician confirms their physical capacities and suitability to pursue
the profession. In the case of the individual described here, neither the physical
examination nor the voice recording and audiometry revealed any contraindica-
tions for continuing to pursue it. The subject proceeded to undergo the voice
recording and audiometry without anxiety, as indicated by their responses in the
anticipated stress questionnaire and in the psychophysiological measurements.
According to congruence theory, when individual abilities, capacities, and pref-
erences match the requirements of a specific job, one can speak of person-job fit
[29]. From the individual’s point of view, lack of job fit is associated with stress
and tension and is relevant to performance at work [30]. Higher levels of stress
result in worse performance [31]. Similarly, not being able to pursue a profession one feels a vocation for determines the experience of distress and a reduction in satisfaction and work engagement [32]. At the same time, research has confirmed
that job fit is linked to positive personal and organisational outcomes such as
satisfaction and productivity [33]. On the basis of the theory by McCarthy and
Goffin, it could be assumed that the subject would experience distress during
the voice recording and audiometry, especially during this first one [3]. The pre-
dominance of negative emotions was related to the fact that performance of the
task induced fear of failure, uncertainty about the outcome, and fear of critical
judgment by others. In the group of professional musicians, fear of imperfect
performance is constantly present, which was reflected by commitment to work
manifesting itself, for example, in spending many hours practicing [34,35]. In the
case of the EDA signal, the results confirm this. On the other hand, in the case
of the variables determining HRV, the HR value was lower in the voice recording
and increased at the further stages of the study. This may be due to the fact
that the HRV signal is characterised by a slow varying trend: the time segments
of the individual stages of the measurement protocol are too short to unambigu-
ously determine the trend characteristics, and the body’s response to the stimuli
(the situation) is somewhat delayed [26]. In the study, a difference is seen in the
proportion of positive and negative emotions in both the voice recording and
audiometry. The negative emotions are equally intense, but during audiometry,
not specific to the subject’s competences, more high arousal positive emotions
appear. In other words, a positive cognitive response to a stressor appears, which
is mobilising rather than harmful [36].
A separate aspect is the increased intensity of EDA and HRV signals when
completing the questionnaire concerning the emotions experienced in voice
recording and audiometry. This can be explained by the fact that questionnaire
measurements may distort the original emotional responses, as respondents are
asked to recall their experiences, which activates the cognitive appraisal system,
i.e., involves cognitive activation. In fact, it was believed that data collected
using declarative methods often made it possible to capture the individual’s
5 Conclusion
According to the findings, the research protocol provides a reliable assessment of the professional potential of vocalists tested during preventive work examinations. Mixed measurements were used to collect the results: questionnaire replies from the subject (VHI and JAWS) as well as psychophysiological measures (EDA, HRV). They were gathered at various points
throughout the research. As a result, it was possible to acquire more informa-
tion about the respondent’s stress and emotions, which could aid in a more
comprehensive understanding of behavior during preventive work examinations.
In the future, research should be carried out on a larger group of people, with
indicators of work exhaustion included and a much broader range of physiologi-
cal characteristics and behavioral-physiological correlations identified among the
probands.
References
1. Lazarus, R.S., Folkman, S.: Stress, Appraisal, and Coping. Springer, Cham (1984)
2. Finnerty, A.N., Muralidhar, S., Nguyen, L.S., Pianesi, F., Gatica-Perez, D.: Stress-
ful first impressions in job interviews. In: Proceedings of the 18th ACM Interna-
tional Conference on Multimodal Interaction, pp. 325–332, October 2016
3. McCarthy, J., Goffin, R.: Measuring job interview anxiety: beyond weak knees and
sweaty palms. Pers. Psychol. 57(3), 607–637 (2004)
4. Constantin, K.L., Powell, D.M., McCarthy, J.M.: Expanding conceptual under-
standing of interview anxiety and performance: Integrating cognitive, behavioral,
and physiological features. Int. J. Sel. Assessment (2021)
5. Feiler, A.R., Powell, D.M.: Behavioral expression of job interview anxiety. J. Bus.
Psychol. 31(1), 155–171 (2016)
6. Katwyk, P., Fox, S., Spector, P., Kelloway, K.: Using the Job-Related Affective
Well-Being Scale (JAWS) to investigate affective responses to work stressors. J.
Occup. Health Psychol. 5, 219–30 (2000)
7. Selye, H.: Implications of stress concept. N. Y. State J. Med. 75(12), 2139–2145
(1975)
8. Basińska, B.: Emocje w miejscu pracy w zawodach podwyższonego ryzyka psychospołecznego. Polskie Forum Psychologiczne XVIII(1), 81–92 (2013)
9. Łosiak, W.: Psychologia stresu. Wydawnictwa Akademickie i Profesjonalne (2008)
10. Lazarus, R.S., Folkman, S.: Transactional theory and research on emotions and
coping. Eur. J. Pers. 1(3), 141–169 (1987)
11. Mauss, I.B., Robinson, M.D.: Measures of emotion: a review. Cognition Emotion
23(2), 209–237 (2009)
12. Bachorowski, J.-A.: Vocal expression and perception of emotion. Curr. Dir. Psy-
chol. Sci. 8(2), 53–57 (1999)
13. Kappas, A., Hess, U., Scherer, K.R.: Voice and emotion. In: Feldman, R.S., Rime,
B. (eds.) Fundamentals of Nonverbal Behavior, pp. 200–238. Cambridge University
Press, Cambridge (1999)
14. Pittam, J., Gallois, C., Callan, V.: The long-term spectrum and perceived emotion.
Speech Commun. 9(3), 177–187 (1990)
Improving the Process of Verifying Employee Potential 419
15. Scherer, K.R., Banse, R., Wallbott, H.G., Goldbeck, T.: Vocal cues in emotion
encoding and decoding. Motiv. Emot. 15(2), 123–148 (1991)
16. Bachorowski, J.A., Owren, M.J.: Vocal expression of emotion: acoustic properties
of speech are associated with emotional intensity and context. Psychol. Sci. 6(4),
219–224 (1995)
17. Karthikeyan, P., Murugappan, M., Yaacob, S.: Detection of human stress using
short-term ECG and HRV signals. J. Mech. Med. Biol. 13(02), 1350038 (2013)
18. Allen, A.P., Kennedy, P.J., Cryan, J.F., Dinan, T.G., Clarke, G.: Biological and
psychological markers of stress in humans: focus on the Trier Social Stress Test.
Neurosci. Biobehavioral Rev. 38, 94–124 (2014)
19. Seo, J., Laine, T.H., Sohn, K.A.: An exploration of machine learning methods
for robust boredom classification using EEG and GSR data. Sensors 19(20), 4561
(2019)
20. Micu, A.C., Plummer, J.T.: Measurable emotions: how television ads really work:
patterns of reactions to commercials can demonstrate advertising effectiveness. J.
Advert. Res. 50(2), 137–153 (2010)
21. Poels, K., Dewitte, S.: How to capture the heart? Reviewing 20 years of emotion measurement in advertising. J. Advert. Res. 46(1), 18–37 (2006)
22. Krasnodębska, P., Szkiełkowska, A., Rosińska, A., Domeracka-Kołodziej, A., Wło-
darczyk, E., Miaśkiewicz, B., Skarżyński, H.: Polska adaptacja kwestionariusza
oceny niepełnosprawności głosowej Pediatric Voice Handicap Index (pVHI). Nowa
Audiofonologia 8(1), 55–59 (2019)
23. Krasnodębska, P., Raj-Koziak, D., Szkiełkowska, A., Skarżyński, H.: Zastosowanie
audiometrii wysokich częstotliwości w diagnostyce nagłego niedosłuchu u muzyka.
Am. J. Case Rep. 5(4), 77–81 (2016)
24. Greco, A., Valenza, G., Lanata, A., Scilingo, E.P., Citi, L.: cvxEDA: a convex
optimization approach to electrodermal activity processing. IEEE Trans. Biomed.
Eng. 63(4), 797–804 (2015)
25. Romaniszyn-Kania, P., et al.: Affective state during physiotherapy and its analysis
using machine learning methods. Sensors 21(14), 4853 (2021)
26. Shaffer, F., Ginsberg, J.P.: An overview of heart rate variability metrics and norms. Front. Public Health 5, 258 (2017)
27. Zyśk, A., Bugdol, M., Badura, P.: Voice fatigue evaluation: a comparison of singing
and speech. In: Tkacz, E., Gzik, M., Paszenda, Z., Piętka, E. (eds.) IBE 2018. AISC,
vol. 925, pp. 107–114. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-
15472-1_12
28. Załącznik nr 1 do rozporządzenia Ministra Zdrowia i Opieki Społecznej z dnia
30 maja 1996 r. https://ibhp.uj.edu.pl/documents/1028990/c7941269-7b5d-4249-
8fa3-03e20d7018e4
29. Lawrence, A., Doverspike, D., O’Connell, M.: An Examination of the Role Job Fit
Plays in Selection (2004)
30. Rousseau, D.M., Parks, I.M.: The contracts of individuals and organizations. Res.
Organizational Behav. 15, 41–43 (1993)
31. Tillmann, J., Hamill, L., Dungan, B., Lopez, S., Lu, S.: Employee stress, engage-
ment, and work outcomes. Paper presented at the meeting of the Society for
Industrial-Organizational Psychology, Chicago, IL, April 2018
32. Berg, J.M., Grant, A.M., Johnson, V.: When callings are calling: crafting work
and leisure in pursuit of unanswered occupational callings. Organ. Sci. 21(5), 973–994 (2010)
The Microphone Type and Voice Acoustic Parameters Values
1 Introduction
Specific values of acoustic parameters of the human voice are crucial from the
medical point of view, especially in the diagnosis and treatment of diseases of
the speech apparatus [1]. The cut-off values of voice parameters are particu-
larly important to discriminate between pathological and normal voices [2] –
the differences in acoustic parameters recorded by different types of microphone
can lead to a wrong assessment of voice quality (i.e., labeling a patient as having a dysphonic voice when some parameter values are close to the cut-off values [3]).
Apart from medical diagnostics of the speech apparatus, reliably determined
values of acoustic parameters are important in the life sciences. Voice biology
researchers use these measures to assess an individual’s sex [4], age [5], emotional
state [6], and even body size [7]. For example, some voice characteristics, such as fundamental and formant frequencies, are significantly correlated with measurements describing body size and shape [8–15]. Also, some formant derivatives (especially formant dispersion (Df) or formant spacing (δF)) are strongly related to body size [16] or body weight and age [17] in some mammals. As these
parameters are crucial for the correct description of biological phenomena and
medical diagnostics, precise determination of their value should be a particularly
important element in voice research.
The aim of the study was to compare quantitatively the acoustic parameters
of the voice of 80 adults, obtained with the use of two microphones of different
specifications.
2.1 Subjects
The material was collected at two institutes: The Jerzy Kukuczka Academy of
Physical Education (Katowice, Poland) and Wroclaw University of Environmen-
tal and Life Sciences (Wroclaw, Poland). A total of 80 participants (including 48 women) took part in the study: 28 from Katowice (16 women) and 52 from Wroclaw (32 women). Each subject was asked to complete a short questionnaire and record a voice sample. All participants were tested at the same time of the day – between 9 a.m. and 12 noon. All subjects agreed to participate in the
study free of charge.
All voice recordings were made in the same acoustic conditions (a room muted from external sounds), using a professional acoustic shield (Mozos MSHIELD) placed on a special tripod with a height adjusted to the height of the examined person. The recordings were made in a standing position, using two microphones at the same time. Both microphones were placed on special grips 15 cm from the mouth of the
examined person, directly at the height of their head. Special pop filters were used
for both microphones. The background noise measurement was approximately
38 dB in Wroclaw and approximately 39 dB in Katowice. The volume (intensity)
of the recordings was controlled with a special digital sound level meter Benetech
GM1351 and was in the range of 65–70 dB.
The first type of microphone used in the study was a dynamic cardioid microphone, Shure SM 58 SE, with a 50 Hz–15 kHz frequency response, connected to an IMG Stageline MPA-202 amplifier with 45 dB sound amplification and a low-cut filter at 60 Hz. All voice samples from this microphone were recorded as mono files.
The second type was a condenser cardioid microphone, Rode NT1-A Kit, with a 20 Hz–20 kHz frequency response, connected to a Zoom H4n PRO sound recorder with a low-cut filter at 80 Hz. These recordings were saved as stereo files.
All samples were recorded as uncompressed files (.wav) with a sampling frequency of 44.1 kHz and 16-bit resolution.
Each participant was asked to speak aloud 5 Polish vowels: a, e, i, o, u, which can be phonetically transcribed as /a:/, /ɛ:/, /i:/, /ɔ:/, /u:/, with sustained phonation for 3 s followed by a 1 s break. Each sample was recorded simultaneously via the 2 microphone types: dynamic and condenser.
F0 parameters were measured using a pitch floor of 75 Hz and a pitch ceiling of 500 Hz. For formants, the ceiling value was 5000 Hz.
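These settings correspond to a Praat-style analysis (the cited voice-analysis source code [26] is Praat’s). A hedged sketch using the parselmouth Python interface to Praat is shown below; the file name is hypothetical, and the trailing arguments of “To Formant (burg)” are Praat’s defaults.

```python
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("vowel_a.wav")   # hypothetical recording

# Pitch analysis with the floor and ceiling used in the study.
pitch = call(snd, "To Pitch", 0.0, 75.0, 500.0)  # time step (auto), floor, ceiling (Hz)
f0_mean = call(pitch, "Get mean", 0.0, 0.0, "Hertz")

# Formant analysis (Burg) with a 5000 Hz ceiling; window length and
# pre-emphasis are left at Praat's defaults.
formants = call(snd, "To Formant (burg)", 0.0, 5.0, 5000.0, 0.025, 50.0)
f1_mean = call(formants, "Get mean", 1, 0.0, 0.0, "hertz")
print(f0_mean, f1_mean)
```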
Secondly, the voice acoustic parameters were compared between two types
of microphones (dynamic and condenser) using Pearson’s correlation coefficient
(r ). Additionally, for better illustration of this relationship scatterplots for each
parameter were created.
Furthermore, the medians and quartile deviations (Q) of the acoustic parameters from these two microphones were compared, and for each parameter a difference was determined. Due to the lack of normal distribution of those parameters, the Wilcoxon signed-rank test was applied to check the significance of those differences.
For all statistical analyses, Statistica v 13.3 software (1984–2017 TIBCO Software Inc, Palo Alto, California, USA) was used. A significance level of 5% (p ≤ 0.05) was adopted for all tests.
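The same paired comparison can be sketched in a few lines of Python with SciPy; function and variable names are illustrative, and Statistica remains the tool actually used in the study.

```python
import numpy as np
from scipy import stats

def compare_microphones(dyn, con):
    """Paired comparison of one acoustic parameter recorded
    simultaneously with the dynamic and condenser microphones."""
    dyn, con = np.asarray(dyn, float), np.asarray(con, float)
    r, p_r = stats.pearsonr(dyn, con)     # between-microphone agreement
    _, p_w = stats.wilcoxon(dyn, con)     # paired, distribution-free test
    q = lambda x: (np.percentile(x, 75) - np.percentile(x, 25)) / 2
    return {"r": r, "p_r": p_r, "p_wilcoxon": p_w,
            "median_dyn": np.median(dyn), "Q_dyn": q(dyn),
            "median_con": np.median(con), "Q_con": q(con)}
```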
3 Results
3.1 Descriptive Data
The values of all voice parameters between two microphone types were com-
pared using linear Pearson’s correlation (Table 2). The most similar values in
dynamic and condenser microphones were observed for parameters describing
Table 3. Wilcoxon signed-rank test values of acoustic parameters of voice for the two microphone types – dynamic and condenser. Median ± quartile deviation (Q)
4 Discussion
5 Conclusion
The differences obtained in this study are an indication for voice researchers (e.g., in bioacoustics) that the selection of sound recording equipment may have a significant impact on the results they obtain. The above-mentioned use of a switchable equalizer would make it possible to equalize the characteristics of the signal path (Fig. 4). It is important that the corrector system introduce a minimal and constant phase shift in the range of measured frequencies.
References
1. Keisenwether, J.S., Sataloff, R.T.: The effect of microphone type on acoustical
measures of synthesized vowels. J. Voice 29(5), 548–551 (2015)
2. Parsa, V., Jamieson, D.G., Pretty, B.R.: Effects of microphone type on acoustic
measures of voice. J. Voice 15(3), 331–343 (2001)
3. Bottalico, P., et al.: Reproducibility of voice parameters: the effect of room acous-
tics and microphones. J. Voice 34(3), 320–334 (2020)
4. Puts, D.A., et al.: Sexual selection on male vocal fundamental frequency in humans
and other anthropoids. Proc. Roy. Soc. B: Biol. Sci. 283(1829), 20152830 (2016)
5. Ramig, L.A., Ringel, R.L.: Effects of physiological aging on selected acoustic char-
acteristics of voice. J. Speech Lang. Hear. Res. 26(1), 22–30 (1983)
6. Pisanski, K., Raine, J., Reby, D.: Individual differences in human voice pitch are
preserved from speech to screams, roars and pain cries. Roy. Soc. Open Sci. 7(2),
191642 (2020)
7. Fitch, W.T., Giedd, J.: Morphology and development of the human vocal tract: a
study using magnetic resonance imaging. J. Acoust. Soc. Am. 106(3), 1511–1522
(1999)
8. Bruckert, L., Liénard, J.S., Lacroix, A., Kreutzer, M., Leboucher, G.: Women
use voice parameters to assess men’s characteristics. Proc. Roy. Soc. B: Biol. Sci.
273, 83–89 (2006)
9. Evans, S., Neave, N., Wakelin, D.: Relationships between vocal characteristics and
body size and shape in human males: an evolutionary explanation for a deep male
voice. Biol. Psychol. 72, 160–163 (2006)
10. Gonzalez, J.: Formant frequencies and body size of speaker: a weak relationship in
adult humans. J. Phon. 32, 277–287 (2004)
11. Gonzalez, J.: Correlations between speakers’ body size and acoustic parameters of
voice. Percept. Mot. Skills 105, 215–220 (2007)
12. Pawelec, Ł.P., Graja, K., Lipowicz, A.: Vocal indicators of size, shape and body composition in Polish men. J. Voice, in press (2020)
13. Pisanski, K., et al.: Vocal indicators of body size in men and women, a meta-
analysis. Anim. Behav. 95, 89–99 (2014)
14. Pisanski, K., et al.: Voice parameters predict sex-specific body morphology in men
and women. Anim. Behav. 112, 13–22 (2016)
15. Rendall, D., Kollias, S., Ney, C., Lloyd, P.: Pitch (F0) and formant profiles of
human vowels and vowel-like baboon grunts: the role of vocalizer body size and
voice-acoustic allometry. J. Acoustical Soc. Am. 117(2), 944–955 (2005)
16. Fitch, W.T.: Vocal tract length and formant frequency dispersion correlate with
body size in rhesus macaques. J. Acoustical Soc. Am. 102(2), 1213–1222 (1997)
17. Reby, D., McComb, K.: Anatomical constraints generate honesty: acoustic cues to
age and weight in the roars of red deer stags. Anim. Behav. 65(3), 519–530 (2003)
18. Boersma, P., Weenink, D.: Praat: doing phonetics by computer [Computer pro-
gram]. Version 6.0.56 (2019). http://www.praat.org/. Accessed 20 June 2019
19. Teixeira, J.P., Oliveira, C., Lopes, C.: Vocal acoustic analysis-jitter, shimmer and
HNR parameters. Procedia Technol. 9, 1112–1122 (2013)
20. RODE company data sheets. https://cdn1.rode.com/nt1-a_datasheet.pdf
21. Shure company datasheets. https://pubs-api.shure.com/file/260007
22. Rode NT1-A Kit microphone data sheet. https://cdn1.rode.com/nt1-a_datasheet.pdf
23. Shure SM 58 SE microphone data sheet. https://pubs-api.shure.com/file/260007
24. Titze, I.R., Winholtz, W.S.: Effect of microphone type and placement on voice
perturbation measurements. J. Speech Lang. Hear. Res. 36(6), 1177–1190 (1993)
25. Teixeira, J.P., Fernandes, P.O.: Jitter, shimmer and HNR classification within gen-
der, tones and vowels in healthy voices. Procedia Technol. 16, 1228–1237 (2014).
www.sciencedirect.com
26. Source code of Praat. https://github.com/praat/praat/blob/382c64e43c64bf73b93fcec32ebfd788b5970a8d/fon/VoiceAnalysis.cpp
Signal Processing
Activities Classification Based on IMU
Signals
1 Introduction
In recent years, an increasing trend of using wearable sensors in physiotherapy has been observed. Miniaturized sensors are increasingly used to track patients, to assess their progress or during exercise [8], and allow obtaining continuous information about one’s behavior even after the session. Many solu-
tions have been proposed for human activity recognition (HAR) tasks based on
wearable sensor data [7]. These methods adopted different feature selection algo-
rithms and achieved satisfactory recognition accuracy using different machine
learning algorithms.
Recent research surveys [9,10] presented a detailed review of human activity
recognition methods based on wearable sensors from different points of view,
including the variety of sensors, recognition approaches, and application strate-
gies. It can be concluded that the latest research is focused on employing inertial sensors, especially accelerometers, for activity recognition using features related to human motion.
For the purposes of the SMART project, a set of signals was recorded for 140 volunteers during July and August 2021. The Xsens DOT sensors were used for the registrations [11]. The Xsens DOT sensor contains a 3-axis accelerometer, a 3-axis gyroscope, and a 3-axis magnetometer. The sampling frequency of the measured signals was 60 Hz. The sensors were placed as presented in Fig. 1.
The examined persons performed a set of eight predefined activities.
In total, 1121 activities were recorded. After excluding the measurements in which the amount of missing data hampered machine learning, 1101 recorded activities were further analyzed. Because the recordings varied in length, a 4 s long clip was chosen as the representative of each measurement. For each variable transmitted by the sensors, concerning the object’s
movement, the mean, standard deviation, and range were calculated. The obtained coefficients served as predictors, whereas the activity label was the dependent variable. All calculations were performed in R 4.1.2 using the RStudio framework, employing the following packages: gdata, caTools, class, rpart, randomForest, e1071, adabag, praznik (Fig. 2).
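A minimal Python sketch of this feature extraction is given below (the study itself used R); the choice of a centred clip is an assumption, as the text does not state how the 4 s representative was selected.

```python
import numpy as np

FS = 60            # Hz, Xsens DOT sampling frequency
CLIP_LEN = 4 * FS  # 4-second representative clip

def central_clip(recording):
    """Pick a centred 4 s clip from a longer recording
    (centring is an assumption)."""
    start = max(0, (recording.shape[0] - CLIP_LEN) // 2)
    return recording[start:start + CLIP_LEN]

def clip_features(clip):
    """clip: array of shape (240, n_channels). Returns mean, standard
    deviation and range for every channel, concatenated."""
    return np.concatenate([clip.mean(axis=0),
                           clip.std(axis=0, ddof=1),
                           clip.max(axis=0) - clip.min(axis=0)])
```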
For each machine learning session, the training set consisted of 80% of the available data, maintaining the proportions between representatives of individual classes. Then a model was built and tested. This procedure was repeated 1000 times for each classifier, and the averaged accuracies are presented in Tables 1 and 2. The modelling was conducted in two ways: using all available coefficients, and using input variables previously selected with the JMI method.
The details of the variable selection process are as follows. The JMI algorithm was run on a set constituting a random 80% of the training set, maintaining the proportions of the original set in terms of the participation of representatives of individual classes, and the 100 coefficients with the highest scores were selected. This procedure was repeated 100 times. Features that were selected more than 50 times were retained for model building.
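This selection loop can be sketched as follows; scikit-learn’s mutual_info_classif is used only as a stand-in scorer, since the JMI criterion itself (praznik::JMI in the original R pipeline) has no direct scikit-learn equivalent, and the stratified subsampling is simplified.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def stable_selection(X, y, n_top=100, n_runs=100, min_votes=50, rng=None):
    """Stability-style selection mirroring the procedure above: each run
    scores features on a random 80% subset and keeps the n_top best;
    features chosen in more than min_votes runs survive."""
    rng = np.random.default_rng(rng)
    votes = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_runs):
        idx = rng.choice(len(y), size=int(0.8 * len(y)), replace=False)
        scores = mutual_info_classif(X[idx], y[idx])  # JMI stand-in
        votes[np.argsort(scores)[-n_top:]] += 1
    return np.flatnonzero(votes > min_votes)
```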
3 Results
The averaged accuracy results of the employed classifiers are presented in
Tables 1 and 2. Table 1 contains classification results obtained when choosing among the 8 considered activities, whereas Table 2 presents the accuracy when verifying each activity separately. Bold font indicates the best result
for each activity. In both tables the postscript “AllVar” means, that all available
coefficients were used and “SelVar” denotes the case, where the input variables
were previously selected using the JMI method.
If all the binary outputs are zeros (i.e., no classifier recognized the given signal as its particular activity), the analyzed time period should be labeled as “another activity”. If more than one activity is identified, the label for the given time period should be the activity for which the larger probability of belonging to class “1” is obtained.
4 Discussion
The best solution was selected based on the obtained results, i.e., the method that obtained the highest average classification quality. The chosen method, ADA, obtained a result above 99% each time, and only in the case of two activities was it slightly worse than the others. The use of simple statistical characteristics of the signal, i.e., mean, standard deviation, and range, reduces the computational complexity and speeds up the signal processing time. In addition, the feature
selection enabling precise recognition of a specific activity means that there is
no need to calculate a large set of features each time, but only those identified as
important from the point of view of classification. It should be taken into account
that for all tested methods, the classification quality did not drop below 90%,
which may be the result of a large variety of activities selected for identification
in the SMART system.
5 Conclusion
This paper presents the results of activity recognition using machine learning
methods on IMU signals. The obtained accuracy values are satisfying and lead
to the conclusion that it is possible to automatically detect an activity from
the provided activity list in a fast and effective way. For the final version of
the SMART system it is advised to verify each activity separately and then
perform final activity selection instead of trying to distinguish between them
using a single model. The details of the proposed final models were delivered
to Comfortel, the partner of Silesian University of Technology and the main
investor in the SMART project.
References
1. Åstrand, P.O.: Experimental studies of physical working capacity in relation to sex
and age. FIEP Bull. On-line 52 (1952)
2. Bao, L., Intille, S.S.: Activity recognition from user-annotated acceleration data.
In: Ferscha, A., Mattern, F. (eds.) Pervasive 2004. LNCS, vol. 3001, pp. 1–17.
Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24646-6 1
3. BTS Bioengineering Corp.: BTS GAITLAB. https://www.btsbioengineering.com/
products/bts-gaitlab-gait-analysis/
4. Chen, K., Zhang, D., Yao, L., Guo, B., Yu, Z., Liu, Y.: Deep learning for sensor-
based human activity recognition: overview, challenges, and opportunities. ACM
Comput. Surv. CSUR 54, 1–40 (2021)
5. Fan, C., Gao, F.: Enhanced human activity recognition using wearable sensors via
a hybrid feature selection method. Sensors 21, 6434 (2021)
6. Foerster, F., Smeja, M.: Joint amplitude and frequency analysis of tremor activity.
Electromyogr. Clin. Neuro-Physiol. 39, 11–19 (1999)
7. Lara, O.D., Labrador, M.A.: A survey on human activity recognition using wear-
able sensors. IEEE Commun. Surv. Tutor. 15, 1192–1209 (2012)
8. Romaniszyn-Kania, P., et al.: Affective state during physiotherapy and its analysis
using machine learning methods. Sensors 21, 4853 (2021)
9. Slim, S.O., Atia, A., Elfattah, M.M.A., Mostafa, M.S.M.: Survey on human activity
recognition based on acceleration data. Int. J. Adv. Comput. Sci. Appl. 10, 84–98
(2019)
10. Wang, Y., Cang, S., Yu, H.: A survey on wearable sensor modality centered human
activity recognition in health care. Expert Syst. Appl. 137, 167–190 (2019)
11. Xsens: Xsens DOT user manual (2021). https://www.xsens.com/hubfs/
Downloads/Manuals/XsensDOTUserManual.pdf
12. Yang, H., Moody, J.: Data visualization and feature selection: new algorithms for
nongaussian data. In: Solla, S., Leen, T., Müller, K. (eds.) Advances in Neural
Information Processing Systems, vol. 12. MIT Press (2000). https://proceedings.
neurips.cc/paper/1999/file/8c01a75941549a705cf7275e41b21f0d-Paper.pdf
Heart Rate Measurement Based
on Embedded Accelerometer
in a Smartphone
1 Introduction
The emergence of new information and communication technologies (ICT) affects
various aspects of life. The use of mobile communication devices (e.g., smart-
phones, tablet computers, smart watches, smart wristbands) to provide health-
care services is defined as mobile health or mHealth [2,23,32]. On an individual
patient level, mHealth may find its use in increasing medication adherence or
control of vital signs [10,20]. Cardiology may also benefit from mHealth thanks to
the broadening capabilities of telecommunication networks and mobile devices,
which may improve the diagnosis and treatment of cardiovascular diseases [20].
Heart rate (HR) is one of the most important vital signs that can be acquired
with various techniques, such as electrocardiography (ECG) [8,30], photoplethys-
mography (PPG) [4,10], videoplethysmography (VPG) [6,24], seismocardiog-
raphy [9,17,25], and gyrocardiography [17]. Seismocardiography (SCG) is a
noninvasive technique for acquisition and analysis of cardiac vibrations by an
accelerometer placed on a chest wall [29,33,34]. Before 2007, the acquisition of
seismocardiograms (SCG signals) required the use of cumbersome piezoelectric
accelerometers [5,29] placed on the sternum. The availability of small, inex-
pensive, and accurate MEMS (microelectromechanical systems) accelerometers
Conducting tests with volunteers was an integral part of the app development. Due to the lim-
ited time and budget, the experimental group consisted of 9 volunteers: 5 female
and 4 male subjects. The mean age of female subjects was 33.4 years, whereas
the mean age of male subjects was 38.3 years. One subject was diagnosed with
an arrhythmia. Before conducting the study, all subjects gave informed con-
sent. Each subject was lying in a supine position and was asked to place the
smartphone on the chest wall and stay still during the signal registration. The
SCG signal from each subject was registered for two minutes with a sampling frequency of 100 Hz.
Then, the signal was smoothed by applying a moving average filter imple-
mented as a finite impulse response (FIR) filter, which is expressed as follows:
y(n) = (1/L) [x(n) + x(n − 1) + · · · + x(n − L + 1)]    (2)
where y(n) is the output signal, L is the window width, and x(n) is the input
signal. The width of the moving average filter was set to 0.1 s.
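A direct Python rendering of Eq. (2), for illustration only:

```python
import numpy as np

def moving_average(x, fs, width_s=0.1):
    """Moving-average FIR filter of Eq. (2): each output sample is the
    mean of the last L input samples, with L chosen to span width_s
    seconds at sampling frequency fs."""
    L = max(1, int(round(width_s * fs)))
    h = np.ones(L) / L                    # FIR impulse response
    return np.convolve(x, h)[:len(x)]     # causal form of Eq. (2)

# smoothed = moving_average(scg, fs=100)  # 0.1 s window -> L = 10
```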
The last step is finding the AO waves and automatic thresholding to atten-
uate the noise [11,21]. The locations of the AO waves were determined as the
rising slope of the preprocessed signal. Signal thresholding uses two thresholds:
a higher threshold used in the learning phase and a lower threshold used if no
heartbeat is detected in a set time interval.
After the learning phase, the threshold is adapted to the running signal and
noise levels and the missed peaks are added using search-back technique [21].
In our modification of the Pan-Tompkins algorithm, the most significant changes are the learning phase (the first estimation of signal and noise peaks) and the calculation of the threshold, which is derived from the running estimates of the signal and noise levels. The successive normal-to-normal intervals are determined as

NNi = ti − ti−1    (6)

where ti denotes the occurrence time of the i-th detected heartbeat.
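The exact threshold formula of the modified algorithm is not reproduced in this text, so the Python sketch below substitutes the classic Pan-Tompkins running-estimate rule; it illustrates the two-threshold, search-back idea rather than the authors’ implementation, and all names are illustrative.

```python
import numpy as np

def detect_beats(env, fs, refractory_s=0.3, searchback_s=1.5):
    """Two-threshold AO-wave detector on a smoothed envelope 'env'.
    Thr = noise + 0.25*(signal - noise) is a stand-in threshold rule."""
    sig_lvl = np.max(env[: int(2 * fs)])     # learning phase: initial peak
    noise_lvl = np.mean(env[: int(2 * fs)])  # learning phase: initial noise
    beats, last = [], -np.inf
    for n in range(1, len(env) - 1):
        if env[n] >= env[n - 1] and env[n] > env[n + 1]:       # local peak
            thr = noise_lvl + 0.25 * (sig_lvl - noise_lvl)
            low_thr = 0.5 * thr                                # search-back level
            if env[n] > thr and (n - last) / fs > refractory_s:
                beats.append(n); last = n
                sig_lvl = 0.125 * env[n] + 0.875 * sig_lvl
            elif env[n] > low_thr and (n - last) / fs > searchback_s:
                beats.append(n); last = n                      # missed beat recovered
                sig_lvl = 0.25 * env[n] + 0.75 * sig_lvl
            else:
                noise_lvl = 0.125 * env[n] + 0.875 * noise_lvl
    return np.array(beats)
```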
3 Results
3.1 Heartbeat Detection Performance
The performance of heartbeat detection was expressed as the number of true positives (TP), false negatives (FN), and false positives (FP), annotated manually in SCG signals based on the description provided in [33]. A true positive is defined as a correctly detected AO wave, whereas a false negative is defined as an AO wave that was not detected. A false positive is defined as the detection of a false AO wave. The tolerance for detecting AO waves was 0.1 s, based on [27].
The sensitivity (Se) is defined as:

Se = TP / (TP + FN)    (8)

and the positive predictive value (PPV) is defined as:

PPV = TP / (TP + FP)    (9)
The performance of heartbeat detection is shown in Table 1.
The comparison of the performance of our approach with other available
heartbeat detectors designed for SCG signals is presented in Table 2.
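Given detected and manually annotated AO-wave times, Se and PPV with the 0.1 s tolerance can be computed as in this sketch (names illustrative):

```python
import numpy as np

def beat_detection_performance(detected_s, annotated_s, tol_s=0.1):
    """Greedily match detected AO-wave times (in seconds) against manual
    annotations within tol_s and return Se and PPV per Eqs. (8)-(9)."""
    annotated = list(annotated_s)
    tp = 0
    for d in detected_s:
        errs = [abs(d - a) for a in annotated]
        if errs and min(errs) <= tol_s:
            tp += 1
            annotated.pop(int(np.argmin(errs)))  # each annotation matches once
    fp = len(detected_s) - tp
    fn = len(annotated)                          # unmatched annotations
    return tp / (tp + fn), tp / (tp + fp)        # Se, PPV
```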
Fig. 1. Use case diagram of the app
Table 2. Comparison of the presented heartbeat detection algorithm with other avail-
able algorithms
The app was tested on a Honor 10 smartphone with Android version 9.1.0; the tests included the typical use case: lying still in a supine position, placing the smartphone on the chest, and following the instructions provided by the app. After launching the app, the user sees the welcome screen shown in Fig. 2.
If the device has no available accelerometer, the user is informed that the
examination cannot be performed without a working accelerometer (see Fig. 3).
After choosing “Rozpocznij badanie” (Start examination), the user is
instructed how to perform the examination and then the signal registration and
heart rate measurement starts. The charts with an SCG signal (in X-axis, Y-
axis, and/or Z-axis) and heart rate are shown after choosing “Pokaż wykresy”
(Show chart). The current heart rate is displayed with a one-second delay (see
Fig. 4). To stop registering the signals, a user must choose “Zakończ badanie”
(Stop examination) and confirm the choice.
The registered signals may be exported to a CSV file saved on a device
storage by choosing “Zapisz wynik badania” (Save the recording). The user is
asked whether to save the CSV file or abort the export (see Fig. 5).
Fig. 4. SCG signal and heart rate charts: on the left for HR and Z-axis only, and for
all the axes and heart rate on the right
4 Discussion
We have developed an app for real-time heart rate measurement based on SCG.
The implemented app has a simple and intuitive user interface, which helps the user prepare to record the SCG signals. The app runs stably, the charts are displayed smoothly, the heartbeat detector proved its feasibility on a study group (Se = 0.930, PPV = 0.961), and the delay of the heart rate measurement was one second. The reported performance is lower than in most studies shown in Table 2, except Sieciński et al. [26] (Se = 0.893, PPV = 0.896). The main cause was
a relatively small difference in amplitude between the AO and RE waves that
may be related to arterial stiffness due to ageing [13]. However, we proved that a
smartphone with an app can register seismocardiograms and display the current
heart rate in real time.
In future research, we consider improving the robustness of the implemented
heartbeat detector in order to decrease the number of false positives. Another
important aspect of future studies is conducting tests in different conditions
(e.g., in various positions, places, and emotional states) and including more sub-
jects. The app could be improved by adding new modules for the analysis of SCG signals, such as heart rate variability analysis or atrial fibrillation detection.
Thanks to improvements in signal processing techniques and devices, combined with an understanding of the physiological background of cardiac vibrations, seismocardiography may become a useful technique for the assessment of heart condition at home.
References
1. Lowpass, highpass, and bandpass Butterworth filters in C# (2019). https://www.
codeproject.com/Tips/5070936/Lowpass-Highpass-and-Bandpass-Butterworth-
Filters. Accessed 27 Dec 2021
2. Adibi, S. (ed.): Mobile Health. Springer, Cham (2015). https://doi.org/10.1007/
978-3-319-12817-7
3. Alamdari, N., Tavakolian, K., Zakeri, V., Fazel-Rezai, R., Akhbardeh, A.: A mor-
phological approach to detect respiratory phases of seismocardiogram. In: 2016
38th Annual International Conference of the IEEE Engineering in Medicine and
Biology Society (EMBC), Orlando, FL, USA, pp. 4272–4275 (2016). https://doi.
org/10.1109/EMBC.2016.7591671
4. Allen, J.: Photoplethysmography and its application in clinical physiological mea-
surement. Physiol. Meas. 28(3), R1 (2007). https://doi.org/10.1088/0967-3334/
28/3/R01
5. Castiglioni, P., Faini, A., Parati, G., Rienzo, M.D.: Wearable seismocardiogra-
phy. In: 2007 29th Annual International Conference of the IEEE Engineering in
Medicine and Biology Society, Lyon, France, pp. 3954–3957 (2007). https://doi.
org/10.1109/IEMBS.2007.4353199
6. Celniak, W., Augustyniak, P.: Detection of human blood pulse based on displace-
ment vector in video footage. In: 2021 14th International Conference on Human
System Interaction (HSI). IEEE (2021). https://doi.org/10.1109/hsi52170.2021.
9538740
7. Choudhary, T., Bhuyan, M.K., Sharma, L.N.: A novel method for aortic valve
opening phase detection using SCG signal. IEEE Sens. J. 20(2), 899–908 (2020).
https://doi.org/10.1109/jsen.2019.2944235
8. Christov, I.I.: Real time electrocardiogram QRS detection using combined adaptive
threshold. Biomed. Eng. Online 3(1), 28 (2004). https://doi.org/10.1186/1475-
925X-3-28
9. Cocconcelli, F., Mora, N., Matrella, G., Ciampolini, P.: Seismocardiography-based
detection of heartbeats for continuous monitoring of vital signs. In: 2019 11th
Computer Science and Electronic Engineering (CEEC), Colchester, UK, pp. 53–58
(2019). https://doi.org/10.1109/CEEC47804.2019.8974343
10. Coppetti, T., et al.: Accuracy of smartphone apps for heart rate measure-
ment. Eur. J. Prev. Cardiol. 24(12), 1287–1293 (2017). https://doi.org/10.1177/
2047487317702044
11. Fariha, M.A.Z., Ikeura, R., Hayakawa, S., Tsutsumi, S.: Analysis of Pan-Tompkins
algorithm performance with noisy ECG signals. J. Phys. Conf. Ser. 1532, 012022
(2020). https://doi.org/10.1088/1742-6596/1532/1/012022
12. Google Inc: Manifest.permission (2021). https://developer.android.com/reference/
android/Manifest.permission. Accessed 27 Jan 2022
13. Gurev, V., Tavakolian, K., Constantino, J.C., Kaminska, B., Blaber, A.P.,
Trayanova, N.: Mechanisms underlying isovolumic contraction and ejection peaks
in seismocardiogram morphology. J. Med. Biol. Eng. 32(2), 103 (2012). https://
doi.org/10.5405/jmbe.847
14. Inan, O.T., et al.: Ballistocardiography and seismocardiography: a review of recent
advances. IEEE J. Biomed. Health Inform. 19(4), 1414–1427 (2015). https://doi.
org/10.1109/JBHI.2014.2361732
15. Landreani, F., et al.: Beat-to-beat heart rate detection by smartphone’s accelerom-
eters: validation with ECG. In: 2016 38th Annual International Conference of the
IEEE Engineering in Medicine and Biology Society (EMBC), Orlando, FL, USA,
pp. 525–528 (2016). https://doi.org/10.1109/EMBC.2016.7590755
16. Li, Y., Tang, X., Xu, Z.: An approach of heartbeat segmentation in seismocar-
diogram by matched-filtering. In: 2015 7th International Conference on Intelligent
Human-Machine Systems and Cybernetics, Hangzhou, China, vol. 2, pp. 47–51
(2015). https://doi.org/10.1109/IHMSC.2015.157
17. Mehrang, S., et al.: Machine learning based classification of myocardial infarction
conditions using smartphone-derived seismo- and gyrocardiography. In: 2018 Com-
puting in Cardiology Conference (CinC), Maastricht, Netherlands, vol. 45, pp. 1–4
(2018). https://doi.org/10.22489/CinC.2018.110
18. Mora, N., Cocconcelli, F., Matrella, G., Ciampolini, P.: Detection and analysis of
heartbeats in seismocardiogram signals. Sensors 20(6), 1670 (2020). https://doi.
org/10.3390/s20061670
19. Mora, N., Cocconcelli, F., Matrella, G., Ciampolini, P.: A unified methodology for
heartbeats detection in seismocardiogram and ballistocardiogram signals. Comput-
ers 9(2), 41 (2020). https://doi.org/10.3390/computers9020041
20. Nguyen, H.H., Silva, J.N.: Use of smartphone technology in cardiology. Trends
Cardiovasc. Med. 26(4), 376–386 (2016). https://doi.org/10.1016/j.tcm.2015.11.
002
21. Pan, J., Tompkins, W.J.: A real-time QRS detection algorithm. IEEE Trans.
Biomed. Eng. BME-32(3), 230–236 (1985)
22. Pandia, K., Inan, O.T., Kovacs, G.T.A., Giovangrandi, L.: Extracting respiratory
information from seismocardiogram signals acquired on the chest using a miniature
accelerometer. Physiol. Meas. 33(10), 1643–1660 (2012). https://doi.org/10.1088/
0967-3334/33/10/1643
23. Reeder, B., David, A.: Health at hand: a systematic review of smart watch uses
for health and wellness. J. Biomed. Inform. 63, 269–276 (2016). https://doi.org/
10.1016/j.jbi.2016.09.001
24. Rumiński, J.: Reliability of pulse measurements in videoplethysmography. Metrol.
Meas. Syst. 23(3), 359–371 (2016). https://doi.org/10.1515/mms-2016-0040
25. Sieciński, S., Kostka, P.S., Tkacz, E.J.: Heart rate variability analysis on CEBS
database signals. In: 2018 40th Annual International Conference of the IEEE Engi-
neering in Medicine and Biology Society, Honolulu, HI, USA, pp. 5697–5700 (2018).
https://doi.org/10.1109/EMBC.2018.8513551
26. Sieciński, S., Tkacz, E.J., Kostka, P.S.: Comparison of HRV indices obtained from
ECG and SCG signals from CEBS database. Biomed. Eng. Online 18, 69 (2019).
https://doi.org/10.1186/s12938-019-0687-5
27. Sørensen, K., Schmidt, S.E., Jensen, A.S., Søgaard, P., Struijk, J.J.: Definition of
fiducial points in the normal seismocardiogram. Sci. Rep. 8(1) (2018). https://doi.
org/10.1038/s41598-018-33675-6
28. Suresh, P., Narayanan, N., Pranav, C.V., Vijayaraghavan, V.: End-to-end deep
learning for reliable cardiac activity monitoring using seismocardiograms. In:
2020 19th IEEE International Conference on Machine Learning and Applica-
tions (ICMLA), Miami, FL, USA, pp. 1369–1375 (2020). https://doi.org/10.1109/
ICMLA51294.2020.00213
29. Taebi, A., Solar, B.E., Bomar, A.J., Sandler, R.H., Mansy, H.A.: Recent advances
in seismocardiography. Vibration 2(1), 64–86 (2019). https://doi.org/10.3390/
vibration2010005
30. Task Force of the European Society of Cardiology the North American Society
of Pacing Electrophysiology: Heart rate variability. Standards of measurement,
physiological interpretation, and clinical use. Circulation 93, 1043–1065 (1996).
https://doi.org/10.1161/01.CIR.93.5.1043
31. Urzeniczok, M.: Aplikacja do detekcji tętna za pomocą akcelerometru wbu-
dowanego w urządzenie mobilne [An app for detecting the heart beats with an
accelerometer embedded into a mobile device]. Master’s thesis, Silesian University
of Technology, Zabrze, Poland (2020)
32. Yang, X., et al.: Exploring emerging IoT technologies in smart health research: a
knowledge graph analysis. BMC Med. Inform. Decision Mak. 20(1) (2020). https://
doi.org/10.1186/s12911-020-01278-9
33. Zanetti, J.M., Poliac, M.O., Crow, R.S.: Seismocardiography: waveform identifica-
tion and noise analysis. In: Proceedings Computers in Cardiology, Venice, Italy,
pp. 49–52 (1991). https://doi.org/10.1109/CIC.1991.169042
34. Zanetti, J.M., Tavakolian, K.: Seismocardiography: past, present and future. In:
2013 35th Annual International Conference of the IEEE Engineering in Medicine
and Biology Society (EMBC), Osaka, Japan, pp. 7004–7007 (2013). https://doi.
org/10.1109/EMBC.2013.6611170
Non-invasive Measurement of Human Pulse Based on Photographic Images of the Face
1 Introduction
During a global pandemic, keeping distance and remote health monitoring
became critical issues in the medical field. There are numerous noninvasive and
contactless diagnostic procedures available today [4,12]. C. Omar et al. compared the accuracy of pulse measurement using pulse oximetry and electrocardiography in detecting a low heart rate (lower than 100 beats per minute). Fifty-five infants were tested, and the mean difference between the two methods was ±2 beats per minute; the sensitivity of pulse oximetry in low pulse detection was 89% and its specificity was 99% [10]. Nelson et al. examined the accuracy of heart rate measurements of the wearable devices Apple Watch 3 and Fitbit Charge 2. The measurements were taken over a 24-h period and compared with reference data from an ambulatory electrocardiogram. The Apple Watch had a mean difference of −1.80 beats per minute and a mean absolute percentage error of 5.86%, and the Fitbit Charge had a mean difference of −3.47 beats per minute and a mean absolute percentage error of 5.96% [9]. The goal of Cheng et al. was to develop a new optical method of measuring heart rate and blood pressure values. Heart rate was measured by
using photoplethysmography with an SpO2 sensor, and blood pressure was measured with ballistocardiography using a fiber sensor mat. Accuracy was assessed by comparison with reference data: the mean error for heart rate was 0.6 ± 0.9 beats per minute and for blood pressure 1.8 ± 1.3 mmHg [2].
During systole, the heart pumps blood through the arteries, subtly changing the color shade of tissue and skin. Photoplethysmography analyzes those changes and detects the blood volume pulse (BVP) and the heart rate. This method usually requires an optical sensor physically attached to the skin, but in this article a video camera and face recognition algorithms were used instead. Most optical approaches are less accurate than conventional contact methods, but the suggested method should allow for contactless heart rate measurement without any special equipment, using only a standard phone camera [1].
The purpose and novelty of this article was to evaluate the maximum performance of a contactless photoplethysmography-based method for calculating the human pulse in daylight conditions with minimal hardware, using a smartphone camera. No such detailed analysis was found in the literature review.
The proposed method of human pulse detection is shown in the block diagram in Fig. 1. Below is a brief description of the individual stages of the developed method.
The first step of the developed method was face recognition. To recognize a face in the image, the FaceLib library was used [3]. After successfully recognizing a face in each frame, pixels containing skin areas were selected and background pixels were removed from each frame to minimize irrelevant data and perturbations. Each frame was resized into 256 × 256 px squares and converted into the RGB color space. Segmentation was based on pixel intensity and colors corresponding to skin tones using the deep-learning LinkNet34 model [6–8]. The pulse could not be calculated unless a face was recognized in the video frame. Only one person's pulse could be calculated at a time.
The next step of the algorithm was constructing the variability function of image elements. The function was processed and the pulse was calculated for every 90 frames. The mean signal of each RGB channel was calculated for each frame, and then, by computing a diagonal matrix and its inverse, the batch signal was acquired. A moving average with a window of 6 frames was then applied to remove insignificant fluctuations of the signal. The rPPG library was used for signal processing [11]. The prepared signal was then ready for pulse calculation. All calculations were made in Python configured in the Anaconda environment.
To compute the pulse from the mean RGB signal, a one-dimensional discrete Fourier transform with a sample spacing of 1/(frames per second) was used. The Fourier transform converts a function from the time domain to the frequency domain. Its formula is shown in Eq. 1.
$$F(k) = \sum_{n=0}^{L-1} f(n)\, e^{-j\frac{2\pi k n}{L}}, \quad (1)$$
where $F(k)$ is the $k$-th component of the spectrum, $f(n)$ is the $n$-th sample of the input signal, and $L$ is the number of samples.
Then the absolute value of the spectrum was taken to remove the imaginary part of the signal.
The next step was filtering the spectral signal by removing frequencies that do not correspond to heart rate. A band-pass filter (shown in Fig. 2 and given in Eq. 2) was applied for frequencies between 0.8 Hz and 3 Hz, which correspond to heart rate values between 48 bpm and 180 bpm. The dominant frequency and the pulse were then obtained as:
$$h = F(m), \quad (3)$$
$$p = h \times 60, \quad (4)$$
where $m$ is the index of the maximum of the filtered spectrum and $h$ is the corresponding frequency in Hz, so that $p$ expresses the calculated pulse in beats per minute. The measurement errors were computed as:
$$e_a = |P - p|, \quad (5)$$
$$e_r = \frac{e_a}{P} \times 100\%, \quad (6)$$
where:
– $e_a$ – absolute error value,
– $e_r$ – relative error value,
– $P$ – measured pulse value,
– $p$ – calculated pulse value.
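Putting Eqs. 1–6 together, here is a condensed sketch of the spectral pulse estimation in Python. The paper's batch-signal construction (diagonal matrix and inverse) is not reproduced, and the use of the green channel and the frame rate are our assumptions:

```python
import numpy as np

def pulse_bpm(mean_rgb, fps):
    """mean_rgb: array of shape (frames, 3) with the per-frame mean R, G, B
    of the segmented skin pixels. Returns the calculated pulse p in BPM."""
    g = mean_rgb[:, 1] - mean_rgb[:, 1].mean()        # detrended green channel
    g = np.convolve(g, np.ones(6) / 6, mode="same")   # 6-frame moving average
    spec = np.abs(np.fft.rfft(g))                     # Eq. 1 + absolute value
    freqs = np.fft.rfftfreq(len(g), d=1.0 / fps)
    band = (freqs >= 0.8) & (freqs <= 3.0)            # band-pass, 48-180 BPM
    h = freqs[band][np.argmax(spec[band])]            # dominant frequency [Hz]
    return h * 60.0                                   # Eq. 4: pulse in BPM
```

The errors of Eqs. 5 and 6 then follow directly, e.g. `ea = abs(P - p)` and `er = 100 * ea / P` for a reference pulse `P`.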
3 Results
In the following tables, each of the 8 participants is marked with a letter from A to H which, together with the sample number, creates an ID for the specific video used as base material. Both the measured and the calculated pulse are expressed in beats per
minute (BPM), while relative error is expressed in percentages (%). Video length
and processing time are given in seconds (s). The tables show the results from
recordings with resolutions of 1920 × 1080 px in Table 1, 960 × 540 px in Table 2,
and 640 × 580 px in Table 3.
Table 1. Pulse values and algorithm performance from 1920 × 1080 px videos
Table 2. Pulse values and algorithm performance from 960 × 540 px videos
Table 3. Pulse values and algorithm performance from 640 × 580 px videos
4 Discussion
Out of all three file resolutions, the algorithm's overall performance was best for files with a size of 960 × 540 px. Processing 640 × 580 px input files was slightly faster (3.8 FPS compared to 3.7 FPS for 960 × 540 px) but yielded a higher absolute error (14.8 BPM for 640 × 580 px versus 10.4 BPM for 960 × 540 px) and a higher relative error (18.4% versus 10.6%).
However, when compared to 960 × 540 px, processing 1920 × 1080 px data
resulted in a considerable drop in performance, with just 2.5 FPS (a 32.4%
reduction in speed) and similar absolute (10.8 BPM) and relative error (10.6%).
It has been demonstrated that providing the algorithm with even higher resolu-
tion recordings does not necessarily improve its accuracy, but it does significantly
prolong the processing time.
Comparing the acquired results of contactless photoplethysmography with those of conventional methods such as ECG or pulse oximetry, it can be noticed that this technique generally yields less accurate heart rate detection than traditional practices. It should be noted that the average absolute error of a measurement performed using a photoplethysmography-based pulse oximeter is 0.6 BPM, versus 10.4 BPM for the proposed method on the best suited data set [2].
Contactless Heart Rate Measurement 463
5 Conclusions
The article presents a non-invasive pulse measurement method based on photographic images of the face. The accuracy of the method was tested on three resolutions of the input recordings. Relative and absolute error values (means and deviations) were obtained for each data set. The algorithm performed best on 960 × 540 px videos, with an absolute error of 10.4 BPM and a relative error of 10.6%, as well as the shortest processing time. The corresponding values were 10.8 BPM and 10.6% for 1920 × 1080 px (slowest, with no significant accuracy improvement over 960 × 540 px videos) and 14.8 BPM and 18.4% for 640 × 580 px (least accurate). As a
result, it can be deduced that using videos with an even higher level of detail will
not improve the accuracy of the suggested method. The method can be useful
when it is impossible to use dedicated medical equipment to measure the human
pulse. Further development could involve using video files shot under various
lighting conditions to see how they affect the accuracy of the results. Another
area for improvement would be to expand the research group, with a specific
breakdown of those wearing makeup or having a beard to conclusively exclude
their impact on the results, as well as to include people of various skin types.
Examining the possibility of including the calculation of the respiratory cycle as another function of the algorithm would be another direction for further research.
Finally, more research on the influence of wearing a face mask would be required
in order to improve the utility of this method during pandemics.
References
1. Allen, J.: Photoplethysmography and its application in clinical physiological measurement. Physiol. Meas. 28(3), R1 (2007). https://doi.org/10.1088/0967-3334/28/3/R01
2. Cheng, Z., et al.: Noninvasive monitoring of blood pressure using optical Ballisto-
cardiography and Photoplethysmograph approaches. In: 35th Annual International
Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
(2013). https://doi.org/10.1109/EMBC.2013.6610029
1 Introduction
3 Methods
The overall approach consists of two phases. In the first one, the artifact candi-
dates are located based on single-channel analysis. Then, signal regions, including
detected artifact candidates, are subjected to spatial filters for multichannel anal-
ysis that effectively extracts discriminatory information for artifact detection.
This section first presents some details on single-channel detection of the
external electrostatic potential, slow-varying potential, and heart activity signal.
Then, the spatial analysis is presented for verifying the initial artifact detection.
Three methods are presented for the external electrostatic potential, slow-
varying potential, and heart activity signal detection. The analysis is performed
on disjoint EEG segments of constant length. Three separate approaches depend-
ing on the artifact shape have been developed.
The external electrostatic potential detection method consists of the following steps:
1. Calculation of the maximum and minimum signal values in each block for each EEG channel.
2. Calculation of the medians ($m_{X_{max}}(j)$ and $m_{X_{min}}(j)$) and standard deviations ($\sigma_{X_{max}}(j)$ and $\sigma_{X_{min}}(j)$) within each block in each channel.
4. Comparison of the values of the parameters $X_{max}(j,k)$ and $X_{min}(j,k)$ with the threshold values $g_{X_{max}}(j)$ and $g_{X_{min}}(j)$, respectively. If the value of at least one of the parameters is greater than the corresponding threshold value in any of the EEG channels, the given block $k$ is denoted as containing an external electrostatic potential artifact.
The slow-varying potential detection method consists of the following steps:
1. Calculation of the Fourier transform of the EEG signal in each block for each EEG channel.
2. Calculation of the value of the function defined in (3), based on the Fourier transform:
$$F(j,k) = \frac{\sum_{f=0}^{\lambda} |s_{jk}(f)|^2}{\sum_{f=0}^{f_N} |s_{jk}(f)|^2 - \sum_{f=f_{AC}-2\,\mathrm{Hz}}^{f_{AC}+2\,\mathrm{Hz}} |s_{jk}(f)|^2}, \quad (3)$$
where: $j$ is the EEG channel number, $j \in \{1, ..., J\}$; $k$ is the block number, $k \in \{1, ..., K\}$; $\lambda$ is the frequency equal to the upper limit of the range in which the increase in power was tested, falling within the range [0, 1] Hz (default 0.625 Hz); $f_N$ is the Nyquist frequency, equal to half the EEG sampling frequency; and $f_{AC}$ is the electricity grid frequency, 50 Hz in Europe.
3. Calculation of the threshold value for $F(j,k)$. To calculate the threshold value, the median $m_F(j)$ of the distribution of $F(j,k)$ is determined over all $k$ blocks in each channel $j$. The threshold value in channel $j$ for $F(j,k)$ in individual blocks is calculated as:
$$g_F(j) = 0.75 + 0.25\, m_F(j). \quad (4)$$
4. Comparing the value of F (j, k) with the threshold value gF (j).
If a parameter value is greater than the corresponding threshold value in
any EEG channel, the given block k is denoted as containing a slow-varying
potential artifact.
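A minimal sketch of the block statistic of Eq. (3) for one block of one channel, assuming a one-dimensional NumPy array input:

```python
import numpy as np

def slow_potential_ratio(block, fs, lam=0.625, f_ac=50.0):
    """Share of low-frequency power (0..lam Hz) in the total block power,
    with the +-2 Hz band around the mains frequency f_ac excluded (Eq. 3)."""
    power = np.abs(np.fft.rfft(block)) ** 2
    f = np.fft.rfftfreq(len(block), d=1.0 / fs)
    low = power[f <= lam].sum()
    mains = power[(f >= f_ac - 2) & (f <= f_ac + 2)].sum()
    return low / (power.sum() - mains)
```

The returned ratio is then compared against the channel threshold $g_F(j)$ of Eq. (4).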
The method of detecting potentials originating from the electrical heart activ-
ity consists of several stages:
1. Calculation of the Pearson correlation coefficient of the EEG signal with the
ECG signal in each block for each EEG channel:
$$r_{XY} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}, \quad (5)$$
Potentials collected from the surface of the head with the electrodes attached
to it feature a poor spatial resolution. This affects any pattern recognition per-
formance, including EEG feature selection and classification as well as artifact selection. In our study, the unwanted structures are to be detected.
In addition to the statistics implemented in the detection of external electrostatic potentials or ECG-related distortions, a spatial filter technique has been employed. It is based on combining signals from adjacent electrodes (summation or subtraction with corresponding weights). This leads to a completely new signal that contains much more useful information. Such “creation” of a new signal is referred to as spatial filtering. Two methods operating in this field have been tested: the Laplace filter (LF) and the Local Average Technique (LAT) [8].
Assuming a symmetrical arrangement of the electrodes, the Laplacian can be determined by subtracting the weighted average of the adjacent electrode potentials according to:
$$V_{LAP} = V - \frac{1}{n}\sum_{i=1}^{n} V_i \quad (8)$$
where: $V_{LAP}$ is the Laplacian value, $V$ is the potential of the considered electrode, $n$ is the number of adjacent electrodes, and $V_i$ is the potential of the $i$-th adjacent electrode.
Laplace spatial filters are used to remove the mean value of the potential, which is present at every electrode yet does not provide useful information.
The Local Average Technique increases the overall value when the adjacent electrodes carry a similar, spatially extended potential. It is determined by adding the weighted average of the adjacent electrode potentials:
$$V_{LAT} = V + \frac{1}{n}\sum_{i=1}^{n} V_i. \quad (9)$$
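Both spatial filters reduce to one line each; in the sketch below the electrode adjacency is a hypothetical example for illustration, not the montage used in the study:

```python
import numpy as np

def laplace_filter(v, neighbors):
    """Eq. (8): subtract the mean potential of the adjacent electrodes."""
    return v - np.mean(neighbors)

def local_average(v, neighbors):
    """Eq. (9): add the mean potential of the adjacent electrodes."""
    return v + np.mean(neighbors)

# Hypothetical 10-20 neighbourhood for channel Cz, for illustration only.
adjacency = {"Cz": ["Fz", "Pz", "C3", "C4"]}
eeg = {"Cz": 1.0, "Fz": 0.2, "Pz": 0.3, "C3": 0.1, "C4": 0.4}
v_lap = laplace_filter(eeg["Cz"], [eeg[ch] for ch in adjacency["Cz"]])
v_lat = local_average(eeg["Cz"], [eeg[ch] for ch in adjacency["Cz"]])
```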
4 System Architecture
The program was implemented in Python, version 3.8.5, using the PyQt5 library, version 5.15.1, for the graphical interface [2,3]. For the preprocessing of the EEG signal, the methods available in the NumPy (1.19.2) and SciPy (1.5.2) libraries and the standard math and statistics modules were used [1,4,5].
The user can manually select one detection method or the option to automat-
ically execute all methods sequentially. After an EEG signal is selected by the
user, the indicated file is checked in terms of its saving format and the correct-
ness of the internal structure. After pressing the Upload file button, the content
of the input file is read and loaded into the program. The artifacts detected are
displayed to the user (Fig. 1).
5 Results
The detection evaluation was based on six EEG studies. Selected signal segments containing artifacts were annotated by three independent experts as disrupted by external electrostatic potentials, slow-varying potentials, or potentials originating from the electrical activity of the heart. They serve as the ground truth in the evaluation process. Each EEG channel has been divided into 4-second segments
and subjected to the analysis. The comparison was based on four parameters:
sensitivity,
$$TPR = \frac{TP}{TP + FN}, \quad (10)$$
specificity,
$$SPC = \frac{TN}{FP + TN}, \quad (11)$$
precision,
$$prec = \frac{TP}{TP + FP}, \quad (12)$$
and accuracy,
$$ACC = \frac{TP + TN}{P + N}. \quad (13)$$
Both detection steps have been evaluated. First, the single-channel artifact detection was performed; then spatial filtering was applied for the multichannel analysis, giving a more general approach that includes the spatial information. The results are shown in Table 1. The first column presents the channel detection phase; the second and third columns indicate the correction obtained after the spatial filtering procedures, LF and LAT respectively, have been employed. The overall effectiveness of the developed approach is shown in Table 2.
7 Summary
The data presented in the tables show the actual values of the experiment. For the obtained results, the minimum precision value is 0.89 and the minimum accuracy is 0.94; in the other cases, these measures reach slightly higher values. A thorough a posteriori analysis (after the tests with the participation of specialists) shows the main advantages of the automatic artifact elimination method.
References
1. NumPy documentation. https://numpy.org/. Accessed 13 Dec 2020
2. Pyqt5 Reference Guide. https://www.riverbankcomputing.com/static/Docs/
PyQt5/. Accessed 13 Dec 2020
3. Python languge documentation. https://www.python.org/. Accessed 13 Dec 2020
4. The Python Standard Library documentation. https://docs.python.org/3/library/
index.html. Accessed 13 Dec 2020
5. SciPy documentation. https://www.scipy.org/. Accessed 13 Dec 2020
6. Borkowski, P.: Atlas EEG i QEEG: podręcznik ilościowej elektroencefalografii i jej zastosowanie w planowaniu neurofeedbacku. Wydawnictwo Biomed Neurotechnologie (2017). https://books.google.pl/books?id=XRyQtAEACAAJ
7. Klekowicz, H.: Opis i identyfikacja struktur przejściowych w sygnale EEG. Ph.D. thesis, Instytut Fizyki Doświadczalnej, Wydział Fizyki, Uniwersytet Warszawski (2008)
8. Kołodziej, M.: Przetwarzanie, analiza i klasyfikacja sygnału EEG na użytek interfejsu mózg-komputer. Ph.D. thesis, Wydział Elektryczny, Politechnika Warszawska (2011)
9. Rowan, A.J., Tolunsky, E.: Podstawy EEG z miniatlasem. Wydanie I polskie, Sobieszek, A. (red.). Urban & Partner, Wrocław (2004)
Do Contractions of Abdominal Muscles Bias Parameters Describing Contractile Activities of a Uterus? A Preliminary Study
Dariusz Radomski
1 Introduction
Reproduction is an essential goal of life for many people. However, the biological
and psychological mechanism of reproduction, including labour is still little known
because of historical and cultural constraints which prevail in many countries. On
the other hand, lack of a model describing a given process makes it impossible to
control it effectively. Thus, midwives and obstetricians’ lack of knowledge of labour
biomechanics makes it difficult for them to manage the labour and delivery process.
It results in an excessive number of Caesarean sections and increasing number of
neonatal complications observed in developed countries [10].
The most important clinical task in labour management is to control the
force which causes delivery of a healthy baby. This force is generated by period-
ically variable intrauterine pressure which moves the foetus down to the vagina.
There are two sources of that pressure: (i) uterine contractions and (ii) abdom-
inal pressure created by the parturient consciously contracting abdominal skeletal muscles while simultaneously relaxing the pelvic floor muscles.
treated as a base of the EHG parameter space. The other published parameters are highly correlated with those selected for the presented study.
2 Methods
The study protocol was performed as follows. The bipolar electrodes were placed at the midpoint between the navel and the pubic symphysis. The reference electrode was placed on the right hip. The EMG signals were registered using the commercial Summit Cadwell electromyography system. The lowest sampling frequency was 100 kHz. First, the examined woman relaxed for 5 min. Next, the EMG signal was registered for 10 s in the resting state. Afterwards, the woman performed isometric contractions of the rectus abdominis for 15 s, during which the EMG signal was registered as well.
The registered EMG signals were down-sampled to 200 Hz and filtered by a 7th-order low-pass Butterworth filter with cut-off frequencies corresponding to the low passband (0.32–3 Hz) and the high passband (20–60 Hz). To investigate the influence of abdominal muscle contractions on the low-frequency EMG signal (limited to the “uterine passband”), the following parameters were calculated. In the time domain, it was the root mean square, computed as:
$$rms = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2}, \quad (1)$$
In the frequency domain, the instantaneous frequency was computed as:
$$f(t) = \frac{1}{2\pi}\frac{d\varphi(t)}{dt}, \quad (2)$$
where $\varphi(t)$ is the instantaneous phase expressed by
$$\varphi(t) = \arctan\frac{\mathcal{H}(x(t))}{x(t)}. \quad (3)$$
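Equations (2)–(3) can be evaluated through the analytic signal; a minimal sketch using SciPy's Hilbert transform:

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_frequency(x, fs):
    """f(t) = (1/2pi) dphi/dt, with phi(t) taken as the unwrapped phase
    of the analytic signal x(t) + j*H(x(t)) (Eqs. 2-3)."""
    phase = np.unwrap(np.angle(hilbert(x)))
    return np.diff(phase) * fs / (2.0 * np.pi)
```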
The time-dependent mean frequency was estimated from the time-frequency power spectrum $P(f,t)$ as:
$$f(t) = \frac{\int_0^{F_S/2} f\, P(f,t)\, df}{\int_0^{F_S/2} P(f,t)\, df} \quad (5)$$
There are many methods for estimating the time-frequency power spectrum $P(f,t)$. The most accurate is the Wigner-Ville distribution; however, this method is memory- and time-consuming, so its use in clinical monitoring is limited. It also gives a biased instantaneous frequency in the case of noisy signals.
The Hann sliding window $h(t)$ was applied. The estimator (5) can be treated as a time-dependent mean frequency because the following equation holds:
$$\bar{f} = \frac{1}{T}\int_0^T f(t)\, dt = \frac{\int_0^{F_S/2} f\, P(f)\, df}{\int_0^{F_S/2} P(f)\, df}, \quad (7)$$
where $P(f)$ is a power spectrum. The right side of Eq. (7) was used for computing the mean frequency of the slow EMG waves and the EHG signals. The lowest frequency was $f_d = 0.32$ Hz and the highest frequency $f_u = 3$ Hz.
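A sketch of the estimator (5) and its time average (7) using a Hann sliding window; the 4 s window length is an assumption, as it is not stated in this excerpt:

```python
import numpy as np
from scipy.signal import spectrogram

def mean_frequency(x, fs, fd=0.32, fu=3.0, win_s=4.0):
    """Time-dependent mean frequency of Eq. (5), restricted to the fd..fu
    band, together with its time average, Eq. (7)."""
    nper = int(win_s * fs)
    f, t, P = spectrogram(x, fs=fs, window="hann",
                          nperseg=nper, noverlap=nper // 2)
    band = (f >= fd) & (f <= fu)
    fb, Pb = f[band], P[band, :]
    f_t = (fb[:, None] * Pb).sum(axis=0) / Pb.sum(axis=0)   # Eq. (5)
    return t, f_t, f_t.mean()                               # Eq. (7)
```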
The last studied parameter was the sample entropy, which correctly describes the nonlinear character of the electrohysterographical signal. The algorithm for the sample entropy computation can be found in [9]. The Wilcoxon signed-rank test was applied to identify the impact of abdominal muscle contractions on the above parameters.
3 Results
The power spectrum, mean frequency, root mean square, and sample entropy values were calculated for the rectified, filtered EMG signals registered in 14 non-pregnant women before and during the isometric contraction of the rectus abdominis. Fig. 1 shows the low-frequency (up to 3 Hz) component of the EMG signal registered in a randomly selected woman during contraction of the rectus abdominis. We can note that contraction of this muscle also produces low-frequency waves, which are traditionally treated as artefacts stemming from muscle movements. The performed analyses also indicate that the instantaneous frequencies computed for the slow and fast EMG waves are mutually dependent. A short isometric contraction of the abdominal muscle is associated with an increase of the high instantaneous frequency, which decreases when muscle fatigue appears (Fig. 2). Simultaneously, a strong muscle contraction decreases the slow instantaneous frequency, which then increases when the muscle becomes tired. Interestingly, a cough also reduces the instantaneous frequency in the low passband of the EMG signal.
Fig. 3(a) presents the person-averaged power spectrum of the filtered EMG signals. We observe that during contraction of the rectus abdominis the low-frequency components of the EMG signals dominate. Table 1 shows a comparison of the analyzed parameters during the relaxation phase and the isometric muscle contractions.
Table 1. Comparison of the studied parameters of the EMG signals in the low fre-
quency band. * denotes statistically significant differences (p < 0.05)
Fig. 3. The person averaged spectrum of (a) the EMG signal registered during relax
and rectus abdominis contraction and (b) EHG signal observed during pregnancies and
labours
Fig. 4. Comparison of the EMG of the rectus abdominis and the EHG of a randomly selected labouring woman: a) slow waves of the EMG signal filtered to 0.32–3 Hz, registered in a randomly selected non-pregnant woman; b) the EHG signal filtered to 0.32–3 Hz, registered in a randomly selected woman during the 2nd stage of labour
We note that the time courses of these signals are similar. Naturally, the burst of the EHG signal is longer, because in our study the duration of a muscle contraction was ca. 15–20 s, while a uterine contraction may last 60 s. However, these courses suggest that EMG signals can interfere with EHG signals. The person-averaged spectra obtained for the pregnant women and the women in active labour are shown in Fig. 3(b).
Table 2. Comparison of the studied parameters of the EHG signals in the low frequency
band. * denotes statistically significant differences (p < 0.05)
The changes of the analyzed EHG parameters between the pregnancies (without uterine contractions) and the 2nd stage of labour are similar to those obtained in the previous experiment. Again, the differences in all the studied parameters were statistically significant (Table 2).
4 Discussion
Although different parameters of electrohysterographical signals have been published, their robustness has never been evaluated. The obtained results show that the most commonly proposed parameters can be biased by contractions of the female abdominal muscles. The shapes of the spectra computed for the low-frequency EMG signals and the EHG signals were similar. The spectra obtained for the pregnant or labouring women are smoother because these women had a greater BMI than the participants enrolled in the muscle study.
Admittedly, a study performed post vivo proves that the myometrium has its own bioelectrical activity, but its frequency characteristic was not investigated [4]. It seems that the “uterine-like” changes of the low-frequency components of the EMG signals shown in this research can stem from the mechanical tremor of the contracted rectus abdominis. Our results agree with the study performed by Qian et al., who showed that contractions of the abdominal muscles can even affect a tocographic signal [8].
Therefore, on the basis of the obtained results, we suggest that the assessment of uterine activities first requires computation of the envelope of the EHG signals. This procedure averages out eventual muscle contractions because they are shorter than the uterine contractions of labour.
Confirmation of the presented results requires measuring the bioelectrical signals during labour. Then we could simultaneously monitor uterine contractions as well as abdominal muscle activities.
5 Limitations of Study
To the best of my knowledge, this is the first study indicating that contracted abdominal muscles may give a false positive detection of uterine contractions based on EHG signals. However, the major limitation of this study is the measurement of the EMG and EHG signals in two separate groups of women. This was necessary in the preliminary study to exclude unnoticed uterine contractions during voluntary tensions of the abdominal muscles. These results should therefore be confirmed in prospective studies of pregnant women who consciously tense their abdominal muscles with and without uterine contractions.
6 Conclusions
In our opinion, the presented results have a dual significance. On the one hand, they show for the first time that the commonly used parameters of raw EHG signals can be biased by contractions of the abdominal muscles, mainly by the rectus abdominis. Thus, we must develop methods allowing for differentiation of the sources of the bioelectrical signals measured on the abdominal skin of a pregnant or labouring woman. On the other hand, it is a reason to recommend wide-band registration of so-called EHG signals to monitor whether the conscious pushes performed by a woman are synchronous with uterine contractions.
Lastly, the presented considerations initiate a terminological discussion asking whether EHG really monitors the bioelectrical activity of the uterus, or whether this method should rather be called mechanohysterography. Perhaps the mechanical tremor stemming from the contracting myometrium changes the local electrical impedance, and these changes are received by the electrodes placed on the abdominal skin. This interpretation seems to be confirmed by an initial proposition of the application of accelerometers for this purpose [12].
References
1. Ashton-Miller, J.A., DeLancey, J.O.: On the biomechanics of vaginal birth and
common sequelae. Ann. Rev. Biomed. Eng. 11(1), 163–176 (2009). https://doi.
org/10.1146/annurev-bioeng-061008-124823. PMID: 19591614
2. Garcia-Casado, J., Ye-Lin, Y., Prats-Boluda, G., Mas-Cabo, J., Alberola-Rubio, J., Perales, A.: Electrohysterography in the diagnosis of preterm birth: a review. Physiol. Meas. 39(2), 02TR01 (2018). https://doi.org/10.1088/1361-6579/aaad56
3. Hayes-Gill, B., et al.: Accuracy and reliability of uterine contraction identification using abdominal surface electrodes. Clin. Med. Insights Women's Health 5, CMWH.S10444 (2012). https://doi.org/10.4137/CMWH.S10444
1 Introduction
The ongoing COVID-19 pandemic has caused a surge of interest in the modelling of infectious diseases, mostly because mathematical modelling can be used to, among other things, predict the future number of infections in a particular population. This knowledge can be used for control strategy planning to mitigate the effects of these future infections.
There are numerous epidemic models in the literature which can be used to
describe the COVID-19 pandemic, an important group of which are compartmen-
tal models. They are based on differential equations describing the flow between
compartments representing different states of the individuals in the affected pop-
ulation. The models vary in number and type of compartments, ranging from the
classic Susceptible-Infected-Removed (SIR) model, to more sophisticated models
distinguishing for example asymptomatic cases, quarantined, isolated or hospi-
talized individuals, vaccinations or different levels of susceptibility to infection
[1,10,15,17].
To use these models for simulation of the COVID-19 pandemic we need to
estimate the parameters appearing in them. These parameters can be catego-
rized into two groups: stationary parameters representing constants, and non-stationary parameters, which can represent time-dependent functions. For the estimation of these
parameters, one can use publicly available data on COVID-19 cases in a particular country. However, due to the very high noisiness of these data, the estimation process can be complicated. Additionally, the estimation of non-stationary parameters is particularly challenging due to the high computational cost.
Various solutions to the problem of parameter estimation in compartmental
epidemic models have been employed. Some noteworthy approaches to parameter
estimation include the ensemble Kalman filter [3,9], Particle Swarm Optimization [11], and trust-region methods [16]. We present a method of estimating the non-stationary
parameters representing time functions in which we use the gradient approach,
where the gradient is obtained from adjoint sensitivity analysis. The computa-
tional cost of the method presented in this paper depends mainly on the length
of the time horizon used for simulation. Moreover, only one simulation of the
adjoint system is required in order to obtain the whole gradient.
2 Problem Formulation
(Figure: cumulative COVID-19 infections, ×10^5, versus time in days.)
Due to the chosen time horizon and sampling density we can assume that
this data is quasi-continuous in time.
where: $t_f$ represents the final time of simulation, and $C_m(t)$ and $C_d(t)$ are the numbers of cumulative infections at time $t$ predicted by the SEIR model and obtained from the data, respectively. $C_m(t)$ is obtained by accumulating the new infections predicted by the model.
The gradient of the objective function (5) with respect to the signal β(t) can be
obtained from an adjoint system constructed by transformations of the original
system given by a block diagram [4–7].
Construction of the Adjoint System. To build the adjoint system for this
example we should do the following tasks:
1. Construct a block diagram representing the extended model where the input
is the function β(t) and the output is our objective function J (Fig. 2)
2. Construct the sensitivity model (Fig. 3) by applying the following transfor-
mations to the block diagram of the extended model:
– change each signal to its variation,
– change each nonlinear element to its derivative.
3. Construct the final adjoint system (Fig. 4) by applying the following trans-
formations to the block diagram of the sensitivity model:
– change the direction of all signals to the opposite,
– change all nodes to summing junctions and vice-versa,
– invert in time all signals from the extended model.
If we stimulate the adjoint system (Fig. 4) with a Dirac delta at time 0, the gradient $\nabla_{\beta(t)} J$ can then be read from the output signal of the adjoint system.
Fig. 2. Block diagram of the analyzed extended model; the signal $\tilde{J}(t)$ has the property that at the final time $t_f$ its value is equal to the objective function $J$
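To illustrate the adjoint idea in discrete time (backpropagation through the Euler steps, in the spirit of [4]), here is a minimal sketch. The paper's model (3) and its constants are not reproduced in this excerpt, so the SEIR equations, rates, and population size below are assumptions; note that a single backward sweep yields the gradient with respect to all samples of β(t):

```python
import numpy as np

N_POP = 38e6                      # population size (assumption)
K_EI, GAMMA = 1 / 5.0, 1 / 7.0    # E->I and I->R rates (assumptions)
DT, DAYS = 1.0, 350
STEPS = int(DAYS / DT)

def forward(beta, Cd):
    """Forward Euler SEIR with cumulative infections Cm' = K_EI * E.
    Returns pre-step states, post-step Cm, and J = sum DT*(Cm - Cd)^2."""
    S, E, I, Cm = N_POP - 1.0, 0.0, 1.0, 0.0
    pre, Cm_post, J = np.zeros((STEPS, 3)), np.zeros(STEPS), 0.0
    for n in range(STEPS):
        pre[n] = S, E, I
        new_inf = beta[n] * S * I / N_POP
        S, E, I, Cm = (S - DT * new_inf,
                       E + DT * (new_inf - K_EI * E),
                       I + DT * (K_EI * E - GAMMA * I),
                       Cm + DT * K_EI * E)
        Cm_post[n] = Cm
        J += DT * (Cm - Cd[n]) ** 2
    return pre, Cm_post, J

def gradient(beta, Cd):
    """Adjoint (reverse) sweep: dJ/dbeta[n] for all n in one pass."""
    pre, Cm_post, _ = forward(beta, Cd)
    aS = aE = aI = aCm = 0.0
    grad = np.zeros(STEPS)
    for n in reversed(range(STEPS)):
        S, E, I = pre[n]
        aCm += 2.0 * DT * (Cm_post[n] - Cd[n])   # objective term at step n
        a_inf = DT * (aE - aS)                   # adjoint of new infections
        grad[n] = a_inf * S * I / N_POP          # dJ/dbeta[n]
        aS, aE, aI = (aS + a_inf * beta[n] * I / N_POP,
                      aE * (1 - DT * K_EI) + DT * K_EI * (aI + aCm),
                      aI * (1 - DT * GAMMA) + a_inf * beta[n] * S / N_POP)
    return grad
```

A plain gradient step `beta -= lr * gradient(beta, Cd)` can then drive the optimization; the gradient is easy to verify against finite differences on a few components.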
4 Results
The resulting function β(t) of the model (3) after the optimization procedure is shown in Fig. 5. The fit of the SEIR model to the analysed cumulative cases data is shown in Fig. 6. We can also compare the fit of the model (3) to the daily cases by introducing new variables $D_m$ and $D_d$ representing the daily new cases obtained from the model and from the cumulative data, respectively (Fig. 7).
Simulation of the SEIR model using the function β(t) after the optimization
procedure is shown in Fig. 8.
The model produced a near-perfect fit, smoothing out the noisiness of the real data (most pronounced in the daily cases). However, there is a visible discrepancy between the model and the analyzed data at the end of the simulation. To avoid this discrepancy, one can use an additional term in
Fig. 3. Block diagram of the sensitivity model. Signals indicated with an overline are variations of particular signals from the extended model
the objective function, responsible for improving the fit at the end of the simulation:
$$J^{*} = J + c\left(C_m(t_f) - C_d(t_f)\right)^2, \quad (9)$$
where $c$ is a positive constant controlling the strength of the improvement. In that case, during optimization we should compute the gradient of the modified objective function $J^{*}$ with respect to the signal $\beta(t)$.
Fig. 4. Block diagram of the adjoint system. The signals $\bar{S}(t)$, $\bar{E}(t)$, $\bar{I}(t)$, $\bar{R}(t)$ are state variables of the adjoint system; $\tilde{J}(t)$ and $\bar{\beta}(t)$ are the input and output of the adjoint system, respectively. Time $t^*$ is the inverted time equal to $t_f - t$
Fig. 5. The time-dependent virus transmission intensity coefficient of the SEIR model
(3) after optimization procedure
Fig. 6. Fit of the model (3) to the COVID-19 cumulative cases data
Fig. 7. Fit of the model (3) to the COVID-19 daily new cases. Variable Dm = kEI E(t),
variable Dd is obtained by successive subtraction of cumulative data
Fig. 8. Simulation of model (3) using the function β(t) from Fig. 5
5 Conclusion
Compartmental mathematical models are a powerful tool for epidemic simulation
and prediction. However, estimating the parameters of a model both accurately
and within reasonable computation time is a non-trivial task, particularly for
time-dependent parameters. A major obstacle is the high noisiness of the reported infection data.
Acknowledgement. This work was supported by the Polish National Science Cen-
tre under grant number UMO-2020/37/B/ST6/01959 and by the Silesian University
of Technology under statutory research funds. Calculations were performed on the
Ziemowit computer cluster in the Laboratory of Bioinformatics and Computational
Biology, created in the EU Innovative Economy Programme POIG.02.01.00-00-166/08
and expanded in the POIG.02.03.01-00-040/13 project. Data analysis was partially
carried out using the Biotest Platform developed within Project n. PBS3/B3/32/2015
financed by the Polish National Centre of Research and Development (NCBiR). This
work was carried out in part by the Silesian University of Technology internal research
funding.
References
1. Dashtbali, M., Mirzaie, M.: A compartmental model that predicts the effect of
social distancing and vaccination on controlling COVID-19. Sci. Rep. 11(1) (2021).
https://doi.org/10.1038/s41598-021-86873-0
2. Dong, E., Du, H., Gardner, L.: An interactive web-based dashboard to track
COVID-19 in real time. Lancet Infect. Dis. 20(5), 533–534 (2020). https://doi.
org/10.1016/S1473-3099(20)30120-1
3. Engbert, R., Rabe, M.M., Kliegl, R., Reich, S.: Sequential data assimilation of
the stochastic SEIR epidemic model for regional COVID-19 dynamics. Bull. Math.
Biol. 83(1) (2020). https://doi.org/10.1007/s11538-020-00834-8
4. Fujarewicz, K., Galuszka, A.: Generalized backpropagation through time for con-
tinuous time neural networks and discrete time measurements. In: Rutkowski, L.,
Siekmann, J.H., Tadeusiewicz, R., Zadeh, L.A. (eds.) ICAISC 2004. LNCS (LNAI),
vol. 3070, pp. 190–196. Springer, Heidelberg (2004). https://doi.org/10.1007/978-
3-540-24844-6 24
5. Fujarewicz, K., Kimmel, M., Świerniak, A.: On fitting of mathematical models of cell signaling pathways using adjoint systems. Math. Biosci. Eng. 2(3), 527 (2005)
6. Fujarewicz, K., Łakomiec, K.: Parameter estimation of systems with delays via structural sensitivity analysis. Discrete Continuous Dyn. Syst. 19(8), 2521–2533 (2014)
7. Fujarewicz, K., Łakomiec, K.: Spatiotemporal sensitivity of systems modeled by cellular automata. Math. Meth. Appl. Sci. 41(18), 8897–8905 (2018). https://doi.org/10.1002/mma.5358
8. Fujarewicz, K., Łakomiec, K.: Adjoint sensitivity analysis of a tumor growth model and its application to spatiotemporal radiotherapy optimization. Math. Biosci. Eng. 13(6), 1131–1142 (2016)
9. Ghostine, R., Gharamti, M., Hassrouny, S., Hoteit, I.: An extended SEIR model with vaccination for forecasting the COVID-19 pandemic in Saudi Arabia using an ensemble Kalman filter. Mathematics 9(6), 636 (2021). https://doi.org/10.3390/math9060636
10. Giordano, G., et al.: Modelling the COVID-19 epidemic and implementation of
population-wide interventions in Italy. Nat. Med. 26(6), 855–860 (2020). https://
doi.org/10.1038/s41591-020-0883-7
11. He, S., Peng, Y., Sun, K.: SEIR modeling of the COVID-19 and its dynamics. Non-
linear Dyn. 101(3), 1667–1680 (2020). https://doi.org/10.1007/s11071-020-05743-y
12. Hethcote, H.W.: The mathematics of infectious diseases. SIAM Rev. 42(4), 599–
653 (2000). https://doi.org/10.1137/S0036144500371907
13. Łakomiec, K., Kumala, S., Hancock, R., Rzeszowska-Wolny, J., Fujarewicz, K.: Modeling the repair of DNA strand breaks caused by γ-radiation in a minichromosome. Phys. Biol. 11(4), 045003 (2014). https://doi.org/10.1088/1478-3975/11/4/045003
14. Lauer, S.A., et al.: The incubation period of coronavirus disease 2019 (COVID-
19) from publicly reported confirmed cases: estimation and application. Ann.
Inter. Med. 172(9), 577–582 (2020). https://doi.org/10.7326/M20-0504. PMID:
32150748
15. Leontitsis, A., et al.: A specialized compartmental model for COVID-19. Int. J. Environ. Res. Public Health 18(5), 2667 (2021). https://doi.org/10.3390/ijerph18052667
16. López, L., Rodó, X.: A modified SEIR model to predict the COVID-19 outbreak in Spain and Italy: simulating control scenarios and multi-scale epidemics. Results Phys. 21, 103746 (2021). https://doi.org/10.1016/j.rinp.2020.103746
17. Ramezani, S.B., Amirlatifi, A., Rahimi, S.: A novel compartmental model to capture the nonlinear trend of COVID-19. Comput. Biol. Med. 134, 104421 (2021). https://doi.org/10.1016/j.compbiomed.2021.104421
A Revealed Imperfection in Concept Drift
Correction in Metabolomics Modeling
Abstract. Prediction models that rely on time series data are often affected by diminished predictive accuracy. This occurs because the causal relationships in the data shift over time; thus, the weights used to create the prediction models lose their informational value. One way to correct this change is by using concept drift information, which is exactly what prediction models in biomedical applications need. Currently, metabolomics is at the forefront of modeling analysis for phenotype prediction, making it one of the most interesting candidates for biomedical prediction diagnosis. However, metabolomics datasets include dynamic information that can harm prediction modeling. This study presents concept drift correction methods that account for the dynamic changes occurring in metabolomics data, for better prediction of phenotypes in a biomedical setting.
1 Introduction
Predictive models are at the forefront of diagnostic methods, aiding the rapid detection of diseases, which ultimately plays the most important role in treatment [1,2]. Recent studies [3–6] show that metabolomics data have the potential to capture prediction-relevant changes in the immune system that may play a key role in early disease symptom detection. This raises new challenges for the creation of prediction tools that are suitable for metabolomics data.
Liu et al. [17]; these methods mainly focus on evaluating data manifolds so as to reveal interesting properties of the data stream. The effectiveness of concept drift detection should be considered through a manifold learning approach [17].
Despite the large potential for developing biomedical machine learning models [19], concept drift detection is not currently used in biomedical applications because there is a lack of understanding of how to update such forecasting models when new data arrive, i.e., when a new event occurs within the given time series. For example, within metabolomics, some confounding factors changing in time remain undetected. The study [20] presents applications affected by the concept drift problem and the associated importance of concept drift detection due to the adaptive nature of microorganisms in a biomedical scenario. This is related to the detection of early disease symptoms in metabolomics modeling, such as diabetes [21], cancer prevention [22], and others [23,24].
Our study builds on our previous work that was published in the study by
Schwarzerova et al. [25] in which the occurrence of a concept drift in human
metabolites during adolescence was confirmed. However, the study by Schwarze-
rova et al. [25] only considered the first step in the determination of a concept
drift. The current study verifies the occurrence of this drift on the newly created
models and performs a corrective analysis.
This study focuses on prediction using the published metabolomics data from the study by Chu et al. [26]. Metabolomics prediction is now at the forefront of scientific work because of its causal informational value regarding the immune system, as showcased in that study [26].
2.1 Dataset
The study by Chu et al. [26] measured the circulating blood metabolome and integrated metabolite features with deep immunophenotyping from a population-based cohort. In total, the analysis included 534 healthy individuals of Caucasian origin aged between 18 and 75 years.
The study by Chu et al. [26] provides the metabolomics and phenotype data, which are available in additional files 22 and 23, or at https://500fg-hfgp.bbmri.nl. This dataset was also used in the study by Schwarzerova et al. [25], in which the occurrence of a concept drift based on patient age was confirmed. Building on that study, our study aims to correct this drift to achieve better prediction values. The normalized metabolite abundance levels were acquired from General Metabolomics (GM) and Nightingale Health/Brainshake (BM). The two datasets (BM and GM) were acquired using two different technical platforms: (1) the BM blood metabolites, represented as 231 features with 200 absolute concentrations, were measured using nuclear magnetic resonance (NMR) spectroscopy; (2) the GM data comprised 1,589 features with 257 concentrations and were measured using flow injection time-of-flight mass spectrometry.
2.2 Methods
Concept drift analysis can be divided into two branches. The first branch is concept drift detection, which was performed in the study by Schwarzerova et al. [25]. The second branch is concept drift correction, which is more important because it leads to improved prediction values.
First, our study includes a prediction phase in which we modeled the classifiers using Logistic Regression (LR), Random Forest (RF), and Gradient Boosting (GB) approaches in Scikit-multiflow [27,28]. The classifiers were trained on the metabolite datasets BM and GM (see Fig. 1). The selected classification problems were retrieved from the study by Schwarzerova et al. [25]: using metabolite markers to predict patient gender (male or female), and the more challenging classification of women who did or did not take birth control pills.
Finally, the last part is focused on concept drift analysis. This analysis includes two steps. First, the ADWIN and EDDM detectors, available in Scikit-multiflow [27], are used for concept drift detection. A manual concept drift
3 Results
control pills models rely on the BM data across all modeling approaches. The average accuracy is 0.768 with an average standard deviation of 0.167 for the random forest models, while the gradient boosting models have an average accuracy of 0.716 with an average standard deviation of 0.167. The highest accuracy is achieved by the prediction classifiers created with the random forest models.
The average of the evaluation parameters is highest for the modeling based on the GB classifiers. On the other hand, the minimum values were obtained with the LR classifiers. On average, the accuracy increased in 67% of the created models. The poorest accuracy resulted from the LR classifier based on the GM dataset that focused on the birth control pills problem.
After completing the concept drift correction, the best modeling score was achieved by the LR classifier focused on predicting women taking birth control pills, based on the BM dataset. Precisely, the accuracy of this model was 0.917; thus, the accuracy improvement of this model was 0.092.
Fig. 2. Accuracy for each created classifier from General Metabolomics (GM) input data. Gender determines patient gender, and birth control pills determines women who take them or not. LR represents logistic regression, RF random forest, and GB gradient boosting
Fig. 3. Accuracy for each created classifier from Nightingale Health/Brainshake metabolite (BM) input data. Gender determines patient gender, and birth control pills determines women who take them or not. LR represents logistic regression, RF random forest, and GB gradient boosting
Unfortunately, we found that drift detection using the ADWIN approach, or the classifier itself, is completely unsuitable for the analysis of metabolomics data. As seen in Fig. 4, the best accuracy was 0.461 for the Birth Control Pills (BCP) models based on the GM dataset, a reduction of 40% compared with the models that do not use ADWIN.
4 Discussion
5 Conclusion
References
1. Birks, J., et al.: Evaluation of a prediction model for colorectal cancer: retrospective
analysis of 2.5 million patient records. Cancer Med. 6(10), 2453–2460 (2017)
2. Jae-woo, L., et al.: The development and implementation of stroke risk prediction model in National Health Insurance Service's personal health record. Comput. Methods Programs Biomed. 153, 253–257 (2018)
3. Tantawy, A.A., Naguib, D.M.: Arginine, histidine and tryptophan: a new hope for
cancer immunotherapy. PharmaNutrition 8, 100149 (2019)
4. Changsong, G., et al.: Isoleucine plays an important role for maintaining immune
function. Curr. Protein Pept. Sci. 20(7), 644–651 (2019)
5. Iyer, A., Fairlie, D.P., Brown, L.: Lysine acetylation in obesity, diabetes and
metabolic disease. Immunol. Cell Biol. 90(1), 39–46 (2012)
6. Andras, P.: Metabolic control of immune system activation in rheumatic diseases.
Arthritis Rheumatol. 69(12), 2259–2270 (2017)
7. Webb, G.I., Hyde, R., Cao, H., Nguyen, H.L., Petitjean, F.: Characterizing con-
cept drift. Data Min. Knowl. Disc. 30(4), 964–994 (2016). https://doi.org/10.1007/
s10618-015-0448-4
8. Gama, J., Medas, P., Castillo, G., Rodrigues, P.: Learning with drift detection. In:
Bazzan, A.L.C., Labidi, S. (eds.) SBIA 2004. LNCS (LNAI), vol. 3171, pp. 286–295.
Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-28645-5 29
9. Grulich, P.M., et al.: Scalable detection of concept drifts on data streams with parallel adaptive windowing. In: EDBT, pp. 477–480 (2018)
10. Khamassi, I., et al.: Self-adaptive windowing approach for handling complex concept drift. Cogn. Comput. 7(6), 772–790 (2015)
11. Huang, D.T.J., et al.: Detecting volatility shift in data streams. In: 2014 IEEE International Conference on Data Mining, pp. 863–868. IEEE (2014)
12. Sun, J., Li, H., Adeli, H.: Concept drift-oriented adaptive and dynamic support
vector machine ensemble with time window in corporate financial risk prediction.
IEEE Trans. Syst. Man Cybern. Syst. 43(4), 801–813 (2013)
13. Aggarwal, C.C.: On biased reservoir sampling in the presence of stream evolution.
In: Proceedings of the 32nd International Conference on Very Large Data Bases,
pp. 607–618 (2006)
14. Guajardo, J.A., Weber, R., Miranda, J.: A model updating strategy for predicting
time series with seasonal patterns. Appl. Soft Comput. 10(1), 276–283 (2010)
15. Kolter, J.Z., Maloof, M.A.: Dynamic weighted majority: an ensemble method for
drifting concepts. J. Mach. Learn. Res. 8, 2755–2790 (2007)
16. Sun, Y., et al.: Concept drift adaptation by exploiting historical knowledge. IEEE
Trans. Neural Netw. Learn. Syst. 29(10), 4822–4832 (2018)
17. Shenglan, L., et al.: Concept drift detection for data stream learning based on angle
optimized global embedding and principal component analysis in sensor networks.
Comput. Electr. Eng. 58, 327–336 (2017)
18. Pless, R., Souvenir, R.: A survey of manifold learning for images. IPSJ Trans.
Comput. Vis. Appl. 1, 83–94 (2009)
19. Wei, L., et al.: Guidelines for developing and reporting machine learning predictive
models in biomedical research: a multidisciplinary view. J. Med. Internet Res.
18(12), e323 (2016)
20. Žliobaitė, I.: Learning under concept drift: an overview. arXiv preprint arXiv:1010.4784 (2010)
21. Wang, T.J., et al.: Metabolite profiles and the risk of developing diabetes. Nat.
Med. 17(4), 448–453 (2011)
22. Ip, C., et al.: Chemical form of selenium, critical metabolites, and cancer prevention. Cancer Res. 51(2), 595–600 (1991)
23. Montemayor, D., Sharma, K.: mGWAS: next generation genetic prediction in kid-
ney disease. Nat. Rev. Nephrol. 16(5), 255–256 (2020)
24. Moats, R.A., et al.: Abnormal cerebral metabolite concentrations in patients with
probable Alzheimer disease. Magn. Reson. Med. 32(1), 110–115 (1994)
25. Schwarzerova, J., Bajger, A., Pierides, I., Popelinsky, L., Sedlar, K., Weckwerth, W.: An innovative perspective on metabolomics data analysis in biomedical research using concept drift detection. In: Proceedings of the 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2021) (2021, in press)
26. Chu, X., et al.: Integration of metabolomics, genomics, and immune phenotypes reveals the causal roles of metabolites in disease. Genome Biol. 22(1), 1–22 (2021)
27. Montiel, J., et al.: Scikit-multiflow: a multi-output streaming framework. J. Mach. Learn. Res. 19(1), 2914–2915 (2018)
28. Bisong, E.: Logistic regression. In: Building Machine Learning and Deep Learning Models on Google Cloud Platform, pp. 243–250. Apress, Berkeley, CA (2019)
29. Oza, N.C., Russell, S.J.: Online bagging and boosting. In: International Workshop
on Artificial Intelligence and Statistics. PMLR, pp. 229–236 (2001)
30. Sanjeev, K., et al.: Design of adaptive ensemble classifier for online sentiment analysis and opinion mining. PeerJ Comput. Sci. 7, e660 (2021)
31. Baena-García, M., et al.: Early drift detection method. In: Fourth International Workshop on Knowledge Discovery from Data Streams, pp. 77–86 (2006)
1 Introduction
According to a World Health Organization report, between 1980 and 2014 the number of people diagnosed with diabetes rose from 108 million to 422 million [20], and that number is still on the rise. Therefore, improving treatment itself and the quality of life of patients has become one of the most important problems in bioengineering.
Standard diabetes treatment involves multiple glucose measurements and insulin injections a day. This can be facilitated by continuous blood glucose monitors (CBCM) and insulin pumps that dose insulin automatically. The introduction of reliable insulin-administering and blood glucose level (BGL) monitoring devices provided an opportunity to create a so-called artificial pancreas (AP), i.e. an automatic system that takes over the role of maintaining the appropriate BGL from the malfunctioning organism. A controller that reads data from the CBCM, determines the amount of insulin to be injected and provides the input signal to the insulin pump is the core of the closed-loop blood glucose control system.
Multiple works concerning control algorithms for such systems have been published [3]. The two most widely used algorithms were chosen for further tests: Proportional-Integral-Derivative (PID) and Model Predictive Control (MPC). They have been proven to be safe in both controlled and free-living environments. However, intensive physical exercise or other physical effort may result in a dramatic decrease of the blood glucose level, which cannot be counteracted by the standard algorithms. This work aims at introducing modifications to these algorithms and comparing the PID and MPC approaches with respect to their performance in preventing effort-induced hypoglycemia.
The comparison was performed based on the results of numerical simulations obtained for 1000 virtual patients with randomised parameters. Separate parameter vectors represent both inter-patient heterogeneity and changes in physiological parameters in individual patients.
\frac{dG(t)}{dt} = -(p_1 + X(t))\,G(t) + p_1 G_b + p_2 G_{in}(t), \quad G(0) = G_b \qquad (1)

\frac{dX(t)}{dt} = -p_2 X(t) + p_3 I(t), \quad X(0) = 0 \qquad (2)

\frac{dI(t)}{dt} = k_1 I_{in}(t) + k_2 I(t), \quad I(0) = 0 \qquad (3)
where G_in(t), I_in(t), I(t), X(t) and G(t) represent glucose input, insulin input, insulin blood concentration, the so-called insulin effect and blood glucose, respectively. Additionally, G_b denotes basal glucose production, and p_1, p_2, p_3, k_1 and k_2 are model parameters.
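For illustration, Eqs. (1)–(3) can be integrated numerically as in the sketch below; the parameter values, the basal glucose level and the constant insulin input are placeholders rather than the values used in this study (k_2 is taken negative here so that insulin is cleared over time).

from scipy.integrate import solve_ivp

p1, p2, p3 = 0.028, 0.025, 1.3e-5   # illustrative minimal-model parameters
k1, k2 = 0.01, -0.05                # illustrative insulin PK parameters
Gb = 90.0                           # basal glucose level [mg/dl]

def G_in(t):
    return 0.0                      # no meal in this sketch

def I_in(t):
    return 10.0                     # constant insulin infusion (placeholder)

def model(t, x):
    G, X, I = x
    dG = -(p1 + X) * G + p1 * Gb + p2 * G_in(t)   # Eq. (1)
    dX = -p2 * X + p3 * I                         # Eq. (2)
    dI = k1 * I_in(t) + k2 * I                    # Eq. (3)
    return [dG, dX, dI]

sol = solve_ivp(model, (0.0, 1440.0), [Gb, 0.0, 0.0], max_step=1.0)
print(f"BGL after 24 h: {sol.y[0, -1]:.1f} mg/dl")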
The model of meal-associated glucose input G_in(t) comes from [12]:

\frac{dG_{gut}(t)}{dt} = -k_{gabs} G_{gut}(t) + G_{empt}(t), \quad G_{gut}(0) = 0 \qquad (4)
where G_empt(t) is the gastric emptying rate and k_gabs is a parameter. In [15] it is argued that, if that variable follows a triangular or trapezoidal curve depending on the meal's glycaemic index, then, as a reasonable simplification, this model combined with Eqs. (1)–(3) represents BGL dynamics accurately.
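A minimal sketch of such a gastric emptying profile is given below; the meal time, ramp length, plateau length and peak rate are illustrative assumptions (setting the plateau to zero yields the triangular variant).

def g_empt(t, t_meal=0.0, ramp=15.0, plateau=30.0, rate=4.0):
    """Trapezoidal gastric emptying rate; zero before the meal starts."""
    tau = t - t_meal
    if tau < 0.0 or tau > 2.0 * ramp + plateau:
        return 0.0
    if tau < ramp:                      # rising edge
        return rate * tau / ramp
    if tau < ramp + plateau:            # flat top
        return rate
    return rate * (2.0 * ramp + plateau - tau) / ramp   # falling edge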
One additional part of the model is the inclusion of the effect of effort on BGL. This was done by modifying Eq. (1) to account for exercise-dependent changes in the minimal model's parameters, shown e.g. in [6]:
\frac{dG(t)}{dt} = -(p_1 + P^*(t)X(t))\,G(t) + p_1 G_b + p_2 G_{in}(t) \qquad (5)
The term P^*(t)X(t) determines the intensity of the effort or, more accurately, its effect on the insulin-glucose subsystem.
To incorporate glucagon as an additional control signal, Eq. (1) is extended with an additional term Y(t):

\frac{dG(t)}{dt} = -(p_1 + X(t) - Y(t))\,G(t) + p_1 G_b + p_2 G_{in}(t) \qquad (6)
where Y(t) is described by an additional equation, analogous to that for X(t) in the Bergman minimal model:

\frac{dY(t)}{dt} = p_{3g} L(t) - p_{2g} Y(t), \quad Y(0) = 0 \qquad (7)
The description of the pharmacokinetics of glucagon is, as discussed, analogous to that of insulin:

\frac{dL(t)}{dt} = k_{1g} L_{in}(t) - k_{2g} L(t), \quad L(0) = 0 \qquad (8)
where L_in(t) is the administered glucagon and L(t) is the glucagon concentration in blood. In the simulations described in this paper, the values of the parameters p_{1g}, p_{2g}, p_{3g}, k_{1g} and k_{2g} were adopted as identical to the respective insulin PK values (those without the "g" subscript).
Combining the description of Eq. (6) with the patient's effort shown in Eq. (5), the patient model used in this work for insulin-glucagon controller simulations consisted of a set of 5 differential equations: two concerning insulin propagation (Eqs. (2) and (3)), two concerning glucagon propagation (Eqs. (7) and (8)), and a modified version of the main equation of the Bergman model:

\frac{dG(t)}{dt} = -(p_1 + P^*(t)X(t) - Y(t))\,G(t) + p_1 G_b + p_2 G_{in}(t) \qquad (9)
Fig. 2. Structure of the control system using both insulin and glucagon – block diagram
Controller Bounds. The output of the control algorithm had to be limited: the simulation model should approximate reality, and in reality neither a negative amount nor an excessively high dose of a substance can be administered.
The upper bound was determined with reference to the average daily dose of insulin [9], accounting for the fact that in the developed model the controller is responsible only for the bolus [13].
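A minimal sketch of a discrete PID step with such bounds is shown below; the gains, setpoint and dose limit are illustrative placeholders rather than the tuned values used in the simulations.

class BoundedPID:
    """Discrete PID controller whose output is clamped to [0, u_max]."""

    def __init__(self, kp, ki, kd, setpoint, u_max, dt=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint, self.u_max, self.dt = setpoint, u_max, dt
        self.integral, self.prev_error = 0.0, 0.0

    def step(self, bgl):
        error = bgl - self.setpoint          # positive error -> dose insulin
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        return min(max(u, 0.0), self.u_max)  # no negative or excessive dose

pid = BoundedPID(kp=0.05, ki=0.001, kd=0.1, setpoint=100.0, u_max=2.0)
print(pid.step(bgl=160.0))  # insulin command for one CBCM reading

In practice such clamping is usually paired with integrator anti-windup, which is omitted here for brevity.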
3 Results
Simulation results obtained for each control structure, in terms of the quality indicators specified in Sect. 2.3, are presented in the form of histograms. Matlab & Simulink software with the Model Predictive Control Toolbox was used. In each figure, histograms for one indicator and both control algorithms are presented (a darker shade indicates overlapping of the histograms for both controllers). To better depict the distribution of indicators for non-zero values, only those values are presented. The number of simulations in which not a single hypo- or hyperglycemic event occurred can be inferred from the total number of simulations.
A first comparison between the proposed and reference structures can be made on the basis of the glucose time course during a single simulation. An example of such results is presented in Fig. 4. It includes three meals, as stated in Sect. 2.3, and two efforts: the first starting at 160 min and ending at 260 min, the second starting at 900 min and lasting until 1120 min. The intensity of both efforts (the value of P^*) was equal to 2.
As presented in Table 3, both control structures provided an improvement in terms of reducing the number of hypoglycemic episodes. One of the reasons for the more significant improvement in the case of the feed-forward structure can be seen in Fig. 4, where no hypoglycemia occurred during the first effort, as insulin administration was reduced before the effort started.
The results show that for the MPC algorithm, when glucagon is used, the time in which a state of hypoglycemia occurs is significantly lower than in the case of the feed-forward structure (Fig. 5). Such a conclusion is not entirely true for the PID algorithm, where, although the maximum value of the hypoglycemia time is lower for the controller with glucagon, the overall number of simulations in which such a state occurred is lower for the feed-forward structure. Both control structures provide an improvement compared to the reference case.
Additionally, when looking at the number of hypoglycemic episodes, shown in Table 3, it is clear that with the feed-forward structure the number of such episodes is much lower and that the decrease is much bigger in the case of the PID algorithm. Values above 1000, seen for the MPC algorithm, indicate that more than one hypoglycemic episode happened during a single simulation. Such behaviour is characteristic of the tested MPC algorithm.
Regarding the time of hyperglycemia, shown in Fig. 6, for both algorithms it is clearly visible that the feed-forward structure may lead to longer periods of hyperglycemia. This is consistent with the way this structure works, i.e. reducing the amount of administered insulin some time before BGL would fall due to effort, which can be clearly seen in Fig. 4. When compared against the reference case, virtually no change in the character of the distribution is visible for additional glucagon.
4 Conclusions
Physical effort may lead to dangerous hypoglycemia if its effects on blood glucose are not counteracted by appropriate controller actions. In terms of hypoglycemia-related indicators (both time and number of episodes), the structure that uses information about the planned effort performed better for both control algorithms. However, that comes at the price of an increased duration of hyperglycemia. Moreover, as with every feedforward control system, the beginning, intensity and duration of the physical effort must be predicted accurately, otherwise its performance will be unacceptable. Further research is needed to determine to what extent inaccuracies in the prediction of effort parameters hinder the efficiency of such a control structure.
Using glucagon as an additional control signal provides an alternative to the feedforward structure. Though slightly less efficient in reducing hypoglycemia duration, it requires no additional information about the planned effort. For that reason this solution seems more robust, allowing patients a more flexible daily regime. However, such a solution requires changes to the construction of insulin pumps to allow for administering two substances.
Combining both approaches is possible, with the user interface allowing the patient to switch the controller from glucagon administration to the feed-forward structure when a certain effort is well planned and determined, as in the case of regular exercise.
Acknowledgement. This work was supported by the SUT internal grant for young
researchers (AW) and the SUT internal grant 02/040/BK_21/1022.
References
1. American Diabetes Association: 6. Glycemic targets. Diabetes Care 40(Supplement 1), S48–S56 (2016). https://doi.org/10.2337/dc17-S009
2. Bergman, R.: The minimal model: yesterday, today and tomorrow. In: Bergman, R., Lovejoy, J. (eds.) The Minimal Model Approach and Determinants of Glucose Tolerance. Louisiana University Press, Baton Rouge, USA (1997)
3. Bertachi, A., Ramkissoon, C.M., Bondia, J., Vehí, J.: Automated blood glucose
control in type 1 diabetes: a review of progress and challenges. Endocrinología, Dia-
betes y Nutrición 65(3), 172–181 (2018). https://doi.org/10.1016/j.endinu.2017.10.
011
4. Blauw, H., Onvlee, A.J., Klaassen, M., van Bon, A.C., DeVries, J.H.: Fully closed
loop glucose control with a bihormonal artificial pancreas in adults with type 1
diabetes: an outpatient, randomized, crossover trial. Diab. Care 44(3), 836–838
(2021). https://doi.org/10.2337/dc20-2106
5. van Bon, A.C., Luijf, Y.M., Koebrugge, R., Koops, R., Hoekstra, J.B., DeVries,
J.H.: Feasibility of a portable bihormonal closed-loop system to control glucose
excursions at home under free-living conditions for 48 hours. Diab. Technol. Ther.
16(3), 131–136 (2014). https://doi.org/10.1089/dia.2013.0166
6. Brun, J., Guintrand-Hugret, R., Boegner, C., Bouix, O., Orsetti, A.: Influence of
short-term submaximal exercise on parameters of glucose assimilation analyzed
with the minimal model. Metabolism 44(7), 833–840 (1995). https://doi.org/10.
1016/0026-0495(95)90234-1
7. Colmegna, P.H., Bianchi, F.D., Sanchez-Pena, R.S.: Automatic glucose control
during meals and exercise in type 1 diabetes: Proof-of-concept in silico tests using a
switched LPV approach. IEEE Control Syst. Lett. 5(5), 1489–1494 (2021). https://
doi.org/10.1109/lcsys.2020.3041211
8. Herrero, P., Georgiou, P., Oliver, N., Reddy, M., Johnston, J., Toumazou, C.: A
composite model of glucagon-glucose dynamics for in silico testing of bihormonal
glucose controllers. J. Diab. Sci. Technol. 7(4), 941–951 (2013). https://doi.org/
10.1177/193229681300700416
9. Hirsch, I.: Type 1 diabetes mellitus and the use of flexible insulin regimens. Am.
Family Phys. 60(8), 2343–2356 (1999)
10. Hovorka, R., et al.: Nonlinear model predictive control of glucose concentration in subjects with type 1 diabetes. Physiol. Meas. 25(4), 905–920 (2004)
11. Briscoe, V., Davis, S.: Hypoglycemia in Type 1 Diabetes. In: Type 1 Diabetes in
Adults, pp. 203–220. CRC Press, Boca Raton (2007)
1 Introduction
In Poland, people suffering from bladder cancer account for 6% of all cancer patients. This cancer ranks 3rd among men and 15th among women in terms of incidence. Statistically, men suffer from the disease 4 times more often than women [30]. It is the most common neoplasm of the urinary system and the 9th most frequently diagnosed neoplasm overall. Annually, 330,000 people suffer from it, of whom 130,000 die [9]. Early and proper diagnosis of the cancer is very important: the later the disease is detected, the larger the tumor is and the more it infiltrates other tissues, shortening life expectancy. Very often, bladder tumors recur and metastasize to other organs. These diseases are very serious, as they are characterized by a low survival rate and mainly affect the elderly.
The detection of biomarkers indicating the presence of neoplastic changes
could significantly help patients. Finding predictors to distinguish between inva-
sive and non-invasive types of cancer can help start the right treatment faster
and prevent unnecessary surgeries.
The main aim of this research was to discover the expression of genes respon-
sible for the occurrence of characteristic types of bladder cancer. This analysis
identified genes that could become biomarkers for detecting the non-invasive and invasive stages of cancer. Another task was to assess the influence of selected parameters on the survival of people after radical cystectomy. The results may reveal which variables could be important in the treatment of patients managed by surgery. The last stage of the research was to compare the data with the available literature and to carry out a biological analysis of the detected genes.
In this research, gene expression data collected using DNA microarrays were used. Microarrays (gene chips) are slides with a dense array of immobilized DNA oligomers, usually single-stranded DNA molecules consisting of several to several dozen nucleotides obtained by chemical synthesis. The analysis of microarray data can be closely related to the occurrence of neoplasms: determining the expression of given genes indicates which of them affect the uncontrolled division of cells that results in the formation of cancerous structures [34]. This paper presents an analysis of microarray data from patients with bladder cancer.
Table 1. TNM staging system of bladder cancer – primary tumor (based on [4] and [26])

Tx – Primary tumor cannot be measured
T0 – No evidence of tumor
Ta – Noninvasive papillary tumor
T1 – Tumor invades subepithelial connective tissue
T2 – Tumor invades muscle layer
T3 – Tumor invades perivesical tissue
T4 – Tumor invades at least one of the following organs: prostate, uterus, vagina, pelvic wall, abdominal wall
present in the non-invasive type of cancer (Ta). Loss of an arm of chromosome 9 is also often found [11]. There are also studies confirming the presence of 6 BLCA proteins characteristic of bladder cancer [32]. In addition to genetic factors, the causes of cancer may also include cigarette smoking, exposure to aromatic amines, aniline or aromatic hydrocarbon compounds, the presence of arsenic in drinking water, and a family history of bladder cancer [23].
One of the treatment options is radical cystectomy. It is used in patients with infiltrating bladder cancer and in people with non-infiltrating cancer that has a high risk of progression. The indications include tumors with high oncological potential, T1 tumors of high malignancy, and non-infiltrating tumors that have not responded to other treatments. The cystectomy procedure consists in removing the urinary bladder, together with the seminal vesicles and the prostate in men; in women, the uterus and appendages are removed. It is also recommended to remove at least 15–17 lymph nodes to prolong survival [7].
2 Methodology
The analyzed data were quality checked and normalized. Then, a statistical analysis and a survival analysis for selected parameters were conducted.
2.1 Dataset
The analyzed data (series GSE31684) were sourced from the Gene Expression Omnibus (GEO) database, one of the resources of the National Center for Biotechnology Information (NCBI). Affymetrix Human Genome U133 Plus 2.0 microarrays were used. The study involved 93 patients with bladder cancer who were managed by radical cystectomy [15]. The GSE31684 series has been used in the research presented in the paper by S. Chen et al. [6] to determine genetic biomarkers related to survival in people with bladder cancer and to evaluate the prediction of a gene expression signature in these patients. The data description included the age of the patients at which the procedure was performed, the stage of the cancer, the current condition of the patients and information on chemotherapy. The average age of the patients was 69 years, and the average survival after surgery was 48 months. The important parameters are included in Table 2.
The obtained set was divided into 2 classes according to the level of radical cystectomy, and the selection of differentially expressed genes was performed. Fifteen genes were obtained from the t-test (Table 3).
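A minimal sketch of this selection step is given below, assuming a genes-by-samples expression matrix and the Benjamini-Yekutieli correction [3] for multiple testing; the placeholder data and the 0.05 threshold are illustrative, not the authors' exact pipeline.

import numpy as np
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

def select_degs(expr, labels, alpha=0.05):
    """Return indices of differentially expressed genes between two classes.

    expr: genes x samples matrix; labels: 0/1 class label per sample."""
    g1, g2 = expr[:, labels == 0], expr[:, labels == 1]
    _, pvals = ttest_ind(g1, g2, axis=1)          # per-gene t-test
    reject, _, _, _ = multipletests(pvals, alpha=alpha, method="fdr_by")
    return np.where(reject)[0]

expr = np.random.rand(20000, 93)            # placeholder for GSE31684 data
labels = np.random.randint(0, 2, size=93)   # placeholder class assignment
print(select_degs(expr, labels))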
An ANOVA statistical analysis was also conducted for the different stages of cancer: T1, T2, T3 and T4. As a result of the analysis of the T1 and T2 stages, 15 different genes were obtained (Table 4). As a result of comparing the T1 and T3 stages, 8 different genes were obtained (Table 5). Among the differentiating genes of the T1–T2 and T1–T3 groups, the same gene, BMP5, appeared.
The comparison of gene expression in the T1 and T4 groups resulted in 144 genes. None of them matched the genes from the T1–T2 comparison, but a common gene, PRICKLE1, was present in both the T1–T4 and T1–T3 comparisons. In the comparison of the T2–T3 groups, 1 gene was found (SUSD4), while the T2–T4 comparison yielded 147 genes, 127 of which were the same as those determined in the T1–T4 comparison. The study of the T3–T4 relationship indicated 140 differentially expressed genes, 129 of them identical to those in the T1–T4 study, while 121 overlapped with the genes found during the T2–T4 comparison.
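The overlaps between comparisons reduce to set intersections of the per-comparison gene lists; a minimal sketch is given below, with hypothetical gene names standing in for the full result tables.

# Hypothetical gene lists illustrating the overlap analysis above.
degs = {
    ("T1", "T2"): {"BMP5", "GENE_A"},
    ("T1", "T3"): {"BMP5", "PRICKLE1"},
    ("T1", "T4"): {"PRICKLE1", "GENE_B"},
}
pairs = sorted(degs)
for i, ab in enumerate(pairs):
    for cd in pairs[i + 1:]:
        common = degs[ab] & degs[cd]     # genes shared by both comparisons
        print(f"{ab[0]}-{ab[1]} vs {cd[0]}-{cd[1]}: {sorted(common)}")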
The series GSE31684 has also been used in other studies (e.g. [6,28]). Comparing the obtained results with the other analyses conducted on the same dataset, it can be noticed that no common genes were found.
The overall analysis of survival by months and the patients' condition (Fig. 1) shows a 50% decrease in the probability of survival in the first 2 years after surgery. Stabilization is visible between the 25th and the 100th month, with the survival rate between 50% and 30%. In the 100th month there is a clear decrease by 17 p.p. compared to the previous range. The last range, between the 100th and 150th month, shows stabilization at a level of approximately 13% chance of survival.
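A minimal sketch of this estimation is given below, assuming the Python lifelines implementation of the Kaplan-Meier estimator [18] (the paper does not state which software was used); the toy durations and event flags are illustrative stand-ins for the GSE31684 follow-up data.

import numpy as np
from lifelines import KaplanMeierFitter

months = np.array([12, 48, 100, 30, 150, 24])  # survival after surgery
died = np.array([1, 1, 0, 1, 0, 1])            # 1 = death observed, 0 = censored

kmf = KaplanMeierFitter()
kmf.fit(durations=months, event_observed=died, label="all patients")
print(kmf.survival_function_)                   # stepwise estimate of S(t)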
The survival analysis with regard to lymph node status, cigarette smoking, tumor stage and grade is presented in Fig. 2. The first analyzed parameter was the condition of the lymph nodes. In the group of patients with metastases, there was a decrease of 60 p.p. in the first 25 months. In the group of people for whom it was not determined whether metastases were present, a similar behavior of the curve was observed. The best results are seen in the group of patients who did not have any metastases.
For the second parameter (cigarette smoking), the greatest decrease in survival in the first years after surgery was observed for ex-smokers. Twenty months after the surgery, the survival rate was 40%, and over the next 90 months it decreased further, down to about 5%.
Fig. 2. The survival analysis according to lymph node status, cigarette smoking, tumor stage and grade, by the Kaplan-Meier estimator

The analysis of the impact of the cancer stage on survival showed that the best results were found in the Ta/T1 group. At this stage, the probability of survival was about 85–90%, but before the 100th month it dropped to about 47% and remained at this level. In the T2 group, the probability decreased by 30 p.p. within 2 years of the surgery and remained stable until the 50th month.
In the T3 stage, the mortality of patients is high up to about 30 months after the procedure, when the probability of survival reached the value of 40%. For people in the T4 stage, the survival value drops drastically for up to 2 years after the surgery and then stabilizes at around 15%.
In the group with a low tumor grade, in the first few months the probability decreased to 85% and remained at this level until the 85th month; after this period, it fell to 55%. In the group with a high tumor grade, the probability of survival decreases from the beginning of the analysis and then stabilizes around the 110th month at the value of 12%.
The differentially expressed genes found in the t-test (Table 3) are described below, based on the Atlas of Genetics and Cytogenetics in Oncology and Haematology [1]:
– ADAM12 – the studies showed that the expression level was correlated with
the stage and grade of tumor malignancy. After surgical removal of the tumor,
the expression level decreased, but increased in the case of relapse [1].
– PTN – studies have shown that high expression was associated with the
advanced stage of several tumors and short survival times. The expression
level was not associated with lymph node metastases or grade. The gene can
serve as a biomarker for predicting adverse outcomes in survival analysis [14].
– ABCC3 – high expression of this gene has been detected in patients with
bladder cancer. Despite this, the role of the gene in the human body remains
unclear. The levels of mRNA and protein in bladder cancer patients were
much higher than in the group of healthy people. Additionally, expression
was found to be associated with tumor size, malignancy and lymph node
metastasis. Research results indicate that ABCC3 may be a potential prog-
nostic marker [20].
– LRRC15 – research indicates its high expression in fibroblasts in many
tumors. In an experiment conducted on several types of cancer, the LRRC15
gene was positively expressed in 47% of bladder cancer cells [27].
– TWIST1 – it was shown that patients with high expression of this gene and
the YB-1 gene have lower survival rates than people with low expression
of these genes. Both genes can be considered as promoters of bladder cancer
progression [31]. Studies have also shown increased expression of the TWIST1
gene in 60% of bladder cancer patients who smoked and had worse clinical
results [12].
– NETO2 – the gene could be a new potential marker of kidney cancer. Research
has shown that the expression of this gene may also be associated with other
malignant neoplasms of the lung and the bladder [25].
– DEPDC7 – the DEPDC7 gene has not been identified as a tumor marker; however, the DEPDC1 protein from the same family was shown to be overexpressed in bladder cancer cells. It is also associated with several other types of cancer and contributes to carcinogenicity [19].
– BCL2L12 – the studies carried out on patients with bladder cancer showed
that the samples with malignant cells, compared to normal cells, were char-
acterized by a high expression of the BCL2L12 gene. Expression was also
correlated with a higher relapse rate in patients with Ta–T1 stage [13].
Among the 14 genes selected in the t-test, no information was found about 6 of
them: CDK5R1, SPRR1B, SPRR1A, SIX2, NKIRAS1, SAMD12. The remaining
8 genes are associated with cancer, including bladder tumors.
The differentially expressed genes from the ANOVA of the T1–T2 groups (Table 4) are described below:
No references concerning occurrence in bladder tumors were found in the literature for the remaining 10 genes. However, this does not mean that their expression is not associated with bladder cancer. In the analysis of the relationship between the T1–T3 groups (Table 5), 8 genes were found, of which the BMP5 gene had already been detected in the earlier comparison and its connection with the development of neoplasms was described. The differentially expressed genes from the ANOVA of the T1–T3 groups are described below:
– COL8A1 – research has shown that the gene is associated with the stage of
bladder tumors. In addition, the COL5A1 and COL8A1 genes were the most
significant in predicting the prognosis of this bladder cancer and can be used
as predictors [8].
– WNT5B – the microarray analysis showed that the oncogenic genes FABP4, HBP17, RGS4, TIMP3, WNT5B, URB and COL8A1 were significantly suppressed by emodin, a strong mutagen. This confirms the influence of the two identified genes as cancer factors [5].
– PRICKLE1 – the studies conducted in the group of patients with bladder cancer of the Ta and T1 stages resulted in 33 genes differentiating the groups of patients with and without relapse after one year. A set of genes, including PRICKLE1, in the group of patients without relapse showed
References
1. Atlas of Genetics and Cytogenetics in Oncology and Haematology. http://
atlasgeneticsoncology.org/
2. Ackermann, A., Brieger, A.: The role of nonerythroid spectrin alpha II in cancer.
J. Oncol. 2019, 1–14 (2019)
3. Benjamini, Y., Yekutieli, D.: The control of the false discovery rate in multiple
testing under dependency. Ann. Stat. 29(4), 1165–1188 (2001)
4. Bostrom, P.J., et al.: Staging and staging errors in bladder cancer. Eur. Urol. Suppl.
9, 2–9 (2010)
5. Cha, T.L.: Emodin modulates epigenetic modifications and suppresses bladder car-
cinoma cell growth. Mol. Carcinog. 54(3), 167–177 (2015)
6. Chen, S., Zhang, N., Shao, J., Wang, T., Wang, X.: A novel gene signature com-
bination improves the prediction of overall survival in urinary bladder cancer. J.
Cancer 10, 5744–5753 (2019)
7. Chłosta, P.: Radykalne wycięcie pęcherza moczowego metodą laparoskopową - tech-
nika, wyniki i ograniczenia. Przegląd urologiczny 2(66), 24–28 (2011). (in Polish)
8. Di, Y., Chen, D., Yu, W., Yan, L.: Bladder cancer stage-associated hub genes
revealed by WGCNA co-expression network analysis. Hereditas 156(1), 1–11
(2019)
9. Długosz, A., Królik, E.: Profilaktyka w raku pęcherza moczowego. Biuletyn Pol-
skiego Towarzystwa Onkologicznego Nowotwory 2(4), 321–327 (2017). (in Polish)
10. Do, M.H., et al.: Targeting CD46 enhances anti-tumoral activity of adenovirus type
5 for bladder cancer. Int. J. Mol. Sci. 19, 2694 (2018)
11. Drewa, T.: Biologia raka naciekającego błonę mięśniową pęcherza moczowego.
Przegląd urologiczny 2(66), 10–12 (2011). (in Polish)
12. Fondrevelle, M.E., et al.: The expression of twist has an impact on survival in
human bladder cancer and is influenced by the smoking status. Urol. Oncol. Semin.
Original Invest. 27(3), 268–276 (2009)
13. Foutadakis, S., Avgeris, M., Tokas, T., Stravodimos, K., Scorilas, A.: Increased
BCL2L12 expression predicts the short-term relapse of patients with TaT1 bladder
cancer following transurethral resection of bladder tumors. Urol. Oncol. Semin.
Original Invest. 32(1), 39.e29-39.e36 (2014)
14. Fröhlich, C., Albrechtsen, R., Dyrskjøt, L., Rudkjær, L., Ørntoft, T.F., Wewer,
U.M.: Molecular profiling of ADAM12 in human bladder cancer. Clin. Cancer Res.
12(24), 7359–7368 (2006)
15. Gene Expression Omnibus: Series GSE31684. https://www.ncbi.nlm.nih.gov/geo/
query/acc.cgi?acc=GSE31684
16. Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J.,
Scherf, U., Speed, T.P.: Exploration, normalization, and summaries of high density
oligonucleotide array probe level data. Biostatistics 4(2), 249–264 (2003)
17. Izdebska, M., Grzanka, A., Ostrowski, M.: Rak pęcherza moczowego - molekularne
podłoże genezy i leczenia. Kosmos 54(2–3), 213–220 (2005). (in Polish)
18. Kaplan, E.L., Meier, P.: Nonparametric estimation from incomplete observations.
J. Am. Stat. Assoc. 53(282), 457–481 (1958)
19. Liao, Z., Wang, X., Zeng, Y., Zou, Q.: Identification of dep domain-containing pro-
teins by a machine learning method and experimental analysis of their expression
in human HCC tissues. Sci. Rep. 6(1), 1–11 (2016)
20. Liu, X., et al.: Overexpression of ABCC3 promotes cell proliferation, drug resis-
tance, and aerobic glycolysis and is associated with poor prognosis in urinary
bladder cancer patients. Tumor Biol. 37(6), 8367–8374 (2016). https://doi.org/10.
1007/s13277-015-4703-5
21. Mares, J., Szakacsova, M., Soukup, V., Duskova, J., Horinek, A., Babjuk, M.: Pre-
diction of recurrence in low and intermediate risk non-muscle invasive bladder can-
cer by real-time quantitative PCR analysis: cDNA microarray results. Neoplasma
60(3), 295–301 (2013)
22. Mattie, M., et al.: The discovery and preclinical development of ASG-5ME, an
antibody–drug conjugate targeting SLC44A4-positive epithelial tumors including
pancreatic and prostate cancer. Mol. Cancer Ther. 15(11), 2679–2687 (2016)
23. Mitra, A.P., Cote, R.J.: Molecular pathogenesis and diagnostics of bladder cancer.
Annu. Rev. Pathol. 4(1), 251–285 (2009)
24. Neuzillet, Y., et al.: IGF1R activation and the in vitro antiproliferative efficacy
of IGF1R inhibitor are inversely correlated with IGFBP5 expression in bladder
cancer. BMC Cancer 17(1), 1–12 (2017)
25. Oparina, N., et al.: Increase in NETO2 gene expression is a potential molecular
genetic marker in renal and lung cancers. Russ. J. Genet. 48, 506–512 (2012).
https://doi.org/10.1134/S1022795412050171
26. Poletajew, S.: Ocena stopnia zaawansowania raka pęcherza moczowego. Przegląd
urologiczny 3(73), 22–26 (2012). (in Polish)
27. Purcell, J.W., et al.: LRRC15 is a novel mesenchymal protein and stromal target
for antibody–drug conjugates. Cancer Res. 78(14), 4059–4072 (2018)
28. Riester, M., et al.: Combination of a novel gene expression signature with a clinical
nomogram improves the prediction of survival in high-risk bladder cancer. Clin.
Cancer Res. 18(5), 1323–1333 (2012)
29. Shin, K., et al.: Hedgehog signaling restrains bladder cancer progression by eliciting
stromal production of urothelial differentiation factors. Cancer Cell 26(4), 521–533
(2014)
30. Skrzypczyk, M., Grothuss, G., Dobruch, J., Chłosta, P.L., Borówka, A.: Rak pęcherza moczowego w Polsce [Bladder cancer in Poland]. Postępy Nauk Medycznych/Progress in Medicine 25(4), 311–319 (2012). (in Polish)
31. Song, Y.H., et al.: TWIST1 and Y-box-binding protein-1 are potential prognostic
factors in bladder cancer. Urol. Oncol. Semin. Original Invest. 32(1), 31.e1-31.e7
(2014)
32. Szymańska, B., Długosz, A.: The role of the BLCA-4 nuclear matrix protein in
bladder cancer. Adv. Hyg. Exp. Med./Postępy Higieny i Medycyny Doświadczalnej
71, 681–689 (2017)
33. Wilson, C.L., Miller, C.J.: Simpleaffy: a BioConductor package for Affymetrix quality control and data analysis. Bioinformatics 21(18), 3683–3685 (2005)
34. Xiong, J.: Essential Bioinformatics. Cambridge University Press, New York (2006)
The Influence of Low-Intensity Pulsed
Ultrasound (LIPUS) on the Properties
of PLGA Biodegradable Polymer
Coatings on Ti6Al7Nb Substrate
1 Introduction
Titanium alloys are one of the most commonly used metallic biomaterials. High
corrosion resistance, low density and good biocompatibility in an environment
of tissue and body fluids are properties that determine their usefulness, especially as implants for osteosynthesis. However, after many years of research, it was proved that they are not fully biologically inert and may cause allergies and other adverse reactions. For this reason, and to improve their biocompatibility, bioactivity and corrosion resistance, the use of a titanium alloy requires proper surface treatment. One of the most popular methods is anodic oxidation, which allows obtaining a passive layer with properties controlled by the process parameters. However, anodization does not prevent the migration of metal ions (such as vanadium, aluminum or niobium) from the implant surface to the tissue environment [5,6,11].
One of the modification methods may be the use of biodegradable polymer coatings. The application of polymer coatings, such as poly(D,L-lactide-co-glycolide) (PLGA), improves biocompatibility by limiting the penetration of metal ions into the tissue and body fluid environment, which was proven in previous studies. Moreover, the degradation of the polymer will not deteriorate the mechanical properties of the implant, since the stability is ensured by the metal substrate [4,9,10]. In addition, biodegradable coatings can also be a matrix for the release of active substances. The released substances may have a beneficial effect on the bone tissue healing process and also reduce the need for systemic drug therapy. One of the available substances is ciprofloxacin (CFX) [3].
Ciprofloxacin is a second-generation fluoroquinolone. It is used in cases of urinary tract infections, chronic bacterial prostatitis, bone and joint infections, lower respiratory tract infections, acute sinusitis, and skin infections due to its antimicrobial activity. The release kinetics of CFX is related to the degradation process of the polymer coating and affects the achieved therapeutic effect [7,12].
To support the bone healing process, not only locally delivered drugs can be used, but also stimulation with physical factors, which include, among others, low-intensity pulsed ultrasound.
Low-Intensity Pulsed Ultrasound (LIPUS) is a non-invasive therapy support-
ing fracture healing. Ultrasound waves induce micromechanical stress at a frac-
ture site, which stimulates molecular and cellular responses involved in fracture
healing. LIPUS therapy is frequently used to enhance or accelerate fracture heal-
ing, by employing a sinusoidal ultrasonic wave with specific parameters [1,8].
There are no previous reports describing the influence of LIPUS on prop-
erties of biodegradable polymer coatings. Therefore, the aim of the research
was to determine the influence of Low-Intensity Pulsed Ultrasound therapy
on the degradation of PLGA polymer coating formed on titanium alloy and
ciprofloxacin release kinetics.
The material used in the tests was the Ti6Al7Nb alloy. The samples, in the form of discs, were taken from rods 25 mm in diameter. Their chemical composition, structure, and mechanical properties met the requirements of the ISO 5832-11 standard [2]. The surface of the samples was modified by grinding on 120, 320 and 500 grit sandpaper, sandblasting and anodic oxidation. Anodic oxidation was carried out in a bath based on phosphoric and sulfuric acid at 97 V for 2 min.
The surface of the metal substrate was coated by the dip-coating method with biodegradable poly(D,L-lactide-co-glycolide) (PLGA) copolymer with a comonomer ratio of 85:15, containing ciprofloxacin (CFX). PLGA was synthesized by the ring-opening polymerization of glycolide and D,L-lactide at 130 °C for 24 h and 120 °C for 48 h in an argon atmosphere, using zirconium (IV) as a non-toxic initiator. The samples were immersed in a solution of PLGA with CFX in dichloromethane using a Dip Coater PTL-OV6P, MTI Corporation (1 dipping cycle, 30 s immersion time, 1500 mm/min immersion rate). The coated samples were dried first for 3 days in air and then for one week under reduced pressure. All samples were sterilized using radiation with an energy of 10 MeV and a dose of 25 kGy.
The non-coated and coated samples were immersed in 0.1 dm3 of Ringer's solution of the following chemical composition: NaCl – 8.6 g/dm3, CaCl2·2H2O – 0.33 g/dm3, KCl – 0.3 g/dm3. The samples immersed in Ringer's solution were subjected to LIPUS therapy for 20 min per day with the following parameters: ultrasound frequency 1.5 MHz, intensity 30 mW/cm2, pulse duration 250 µs and a repetition rate of 1 kHz. Samples only immersed in Ringer's solution were used as reference material. During the exposure, the samples were kept in a heating chamber at 37 °C (Binder FD 115).
The samples’ surface observations were performed using the stereoscopic Zeiss
Stereo Discovery V8 microscope with MC5s camera. Tests were carried out for
metal substrate and the samples with polymer coatings in the initial state and
after 1, 3 and 4 weeks exposure.
To determine the surface wettability of the metal substrate and the biodegradable polymer coatings before and after 1, 3 and 4 weeks of exposure, contact angle measurements (θ) were performed using a drop of distilled water with a volume of 1.5 mm3, at room temperature (T = 23 °C), at a test stand consisting of a SURFTENS UNIVERSAL goniometer by OEG and a PC with Surftens 4.5 software for analyzing the recorded image of the drops. The studies were carried out over 60 s with a sampling rate of 1 Hz.
The metal ion concentration in Ringer's solution was measured with a JY 2000 spectrometer by Jobin-Yvon, using the Inductively Coupled Plasma-Atomic Emission Spectrometry (ICP-AES) method. The studies were performed for both the non-coated and the coated samples after 1, 3 and 4 weeks of exposure.
Ciprofloxacin release from the polymer coatings was investigated using the extraction method. The supernatant was analyzed using high-performance liquid chromatography (HPLC). Measurements were carried out using a VWR-Hitachi LaChromElite apparatus with a LiChroCART Purospher STAR RP-18e column (150 × 4.6 mm, 5 µm). The mobile phase consisted of 2% glacial acetic acid and acetonitrile (84:16) at a flow rate of 1 ml/min. CFX was monitored by a diode array detector at 280 nm. The studies were performed for samples after 1, 3 and 4 weeks of LIPUS therapy and exposure to Ringer's solution.
3 Results
3.1 Surface Observations
Macroscopic observations of the metal substrate in the initial state showed the topography characteristic of sandblasting, as well as uneven coloration resulting from anodic oxidation (Fig. 1a). The biodegradable polymer coatings with ciprofloxacin applied on the metal substrate are homogeneous, continuous and transparent (Fig. 2a). Macroscopic observations after exposure to Ringer's solution have shown the presence of salt crystals on the surface of the metal substrate (Fig. 1d). No visible changes of the PLGA coating were observed after exposure to the corrosive environment (Fig. 2). After the LIPUS treatment, local brown discolorations were observed on the surface of the titanium alloy, the number of which increased with the duration of exposure. However, even after three weeks these discolorations did not cover the entire surface; in addition, they were not observed after four weeks, nor on the surface of the polymer coatings (Fig. 3). Heterogeneities and discolorations on the polymer coatings' surface were already noticeable after 1 week of LIPUS therapy. Their amount and area increased with the duration of the exposure, which resulted from the degradation of the polymer (Fig. 4).
Fig. 1. Exemplary images of the Ti6Al7Nb surface: a) in the initial state, b) after 1 week, c) after 3 weeks, d) after 4 weeks in Ringer's solution
Fig. 2. Exemplary images of the PLGA polymer coating: a) in the initial state, b) after 1 week, c) after 3 weeks, d) after 4 weeks in Ringer's solution
3.2 Wettability
All analyzed surfaces were hydrophilic (Fig. 5). The surface of the substrate in the initial state was characterized by the lowest wettability. The application of biodegradable polymer coatings containing the active substance increased the wettability. Exposure to Ringer's solution caused a decrease in the contact angle of the coatings, which is probably due to swelling in the solution. The exposure time improves the wettability of the tested surfaces. The LIPUS treatment almost always decreased the value of the contact angle, both for the substrate surface and for the surfaces of the PLGA coatings. The improvement in wettability progressed with the duration of exposure. However, after four weeks in Ringer's solution combined with LIPUS, a decrease in wettability was observed for the titanium alloy.
The mass density of metal ions in Ringer's solution after 1, 3 and 4 weeks of exposure is shown in Fig. 6. The highest mass density of ions (Ti, Al, Nb) permeating from the surface into the solution was observed for the Ti6Al7Nb alloy substrate. Biodegradable polymer coatings with the active substance applied on the surfaces of the metal substrate significantly decreased the mass density of metal ions in the solution. This proves that the polymer coatings fulfill a protective function.
Fig. 3. Exemplary images of the Ti6Al7Nb surface: a) in the initial state, b) after 1 week, c) after 3 weeks, d) after 4 weeks of LIPUS therapy
Due to the low concentration of the drug in the coating and the detection limit of the HPLC method, it was not possible to perform a quantitative analysis; only a qualitative analysis was carried out in the study.
An increase in the time of exposure to Ringer's solution, as well as to Ringer's solution combined with LIPUS therapy, caused an increase in the amount of ciprofloxacin released from the PLGA coating. However, the fastest release of CFX was observed during the first week of exposure in both cases. The use of LIPUS therapy increases the amount of released CFX, which is most likely related to the increase in the degradation rate of the polymer coating.
Fig. 4. Exemplary images of the PLGA polymer coating: a) in the initial state, b) after 1 week, c) after 3 weeks, d) after 4 weeks of LIPUS therapy
4 Discussion
The analysis of the obtained results shows that the application of LIPUS has an
impact on the degradation of PLGA polymer coatings.
Fig. 6. Density of metal ion mass penetrating into Ringer's solution: a) Ti ions, b) Al ions, c) Nb ions

In the initial state, the samples showed a topography characteristic of sandblasting, which resulted in uneven color after anodic oxidation. PLGA biodegradable polymer coatings containing ciprofloxacin, after being applied to the surface, showed translucency, homogeneity and continuity (Fig. 2). After exposure
to Ringer’s solution, the salt crystals were observed on the surface of the metal
substrate (Fig. 1). The changes observed on the surfaces of both the substrate and the biodegradable coatings subjected to LIPUS show heterogeneity not observed in the case of samples exposed only to the corrosive environment (Figs. 3, 4).
The research showed that the use of PLGA polymer coatings with the active substance leads to an improvement in the wettability of the surface. In addition, hydrophilicity improves with the time of exposure to Ringer's solution. Most likely, this is the result of the biodegradable coating swelling under the influence of the solution. LIPUS therapy decreases the contact angle of the titanium alloy as well as of the polymer coatings after 1, 3 and 4 weeks of exposure. After the same exposure periods, a higher wettability of the polymer coatings is observed for samples subjected to LIPUS combined with Ringer's solution in comparison to exposure to the corrosive environment only (Fig. 5). This may indicate that ultrasonic waves increase the rate of polymer coating degradation.
In Ringer’s solution, in which non-coated and coated samples were immersed,
the presence of titanium, aluminum and niobium ions was observed. The highest
concentration of metal ions was released from the surface of Ti6Al7Nb alloy. The
application of the polymer coatings containing ciprofloxacin causes a reduction
in the metal ions’ mass density compared to the metal substrate. This demon-
strates good protective properties of the polymer coatings effectively limiting
the degradation of the metal substrate. In addition, the mass density of ions
released from the surface increases with the time of exposure to Ringer’s solu-
tion. After LIPUS treatment in the corrosive environment, a significant increase
in the mass density of ions released from the surface was observed in all ana-
lyzed cases. Additionally, larger amounts of ions were observed in the solution
after extended periods of exposure. There is a deterioration of the barrier func-
tion fulfilled by biodegradable polymer coatings (Fig. 6). Moreover, with the
time of exposure to both corrosive environment and LIPUS in Ringer’s solu-
tion an increase in the amount of CFX released from the biodegradable coating
was observed. Also, the use of LIPUS therapy increases the amount of released
ciprofloxacin compared to polymer coatings exposure to the corrosive environ-
ment only. In the overview with the macroscopic observation of the surface, as
well as the results of wettability tests, it can be assumed that the application of
LIPUS may increase the rate of degradation of the polymer coating and active
substance release, which can be controlled by the therapy parameters.
5 Conclusions
Poly(lactide-co-glycolide) (PLGA) biodegradable polymer coatings containing ciprofloxacin formed on Ti6Al7Nb were characterized as continuous, homogeneous and translucent. Polymer coatings formed on titanium alloys decrease the mass density of metal ions released from the metal substrate into the environment. Moreover, the degradation of the polymer coatings and the ciprofloxacin release can be controlled by the parameters of the LIPUS therapy.
References
1. Berber, R., Aziz, S., Simkins, J., Lin, S.S.: Low intensity pulsed ultrasound therapy
(LIPUS): a review of evidence and potential applications in diabetics. J. Clin.
Orthop. Trauma 11, 500–505 (2020). https://doi.org/10.1016/j.jcot.2020.03.009
2. ISO 5832-11:2014 Implants for surgery - Metallic materials - Part 11: Wrought titanium 6-aluminium 7-niobium alloy
3. Jaworska, J., et al.: Development of antibacterial, ciprofloxacin-eluting biodegrad-
able coatings on Ti6Al7Nb implants to prevent peri-implant infections. J. Biomed.
Mater. Res. Part A 108, 1006–1015 (2020). https://doi.org/10.1002/jbm.a.36877
4. Kajzer, W., et al.: Corrosion resistance of Ti6Al4V alloy coated with caprolactone-
based biodegradable polymeric coatings. Maintenance Reliab. 20(1), 30–38 (2018).
https://doi.org/10.17531/ein.2018.1.5
5. Kiel-Jamrozik, M., Szewczenko, J., Basiaga, M., Nowińska, K.: Technological
capabilities of surface layers formation on implant made of Ti-6Al-4V ELI alloy.
Acta Bioeng. Biomech. 17(1), 31–37 (2015). https://doi.org/10.5277/ABB-00065-
2014-03
6. Liu, X., Chu, P.K., Ding, C.: Surface modification of titanium, titanium alloys,
and related materials for biomedical application. Mater. Sci. Eng. R 47, 49–121
(2004)
7. Ma, X., Xia, Y., Xu, H., Lei, K., Lang, M.: Preparation, degradation and in
vitro release of ciprofloxacin-eluting ureteral stents for potential antibacterial
application. Mater. Sci. Eng. C 66, 92–99 (2016). https://doi.org/10.1016/j.msec.
2016.04.072
8. Rutten, S., van den Bekerom, M.P.J., Sierevelt, I.N., Nolte, P.A.: Enhancement of bone-healing by low-intensity pulsed ultrasound. JBJS Rev. 4(3) (2016). https://doi.org/10.2106/jbjs.rvw.o.00027
9. Szewczenko, J., Kajzer, W., Grygiel-Pradelok, M., Jaworska, J., Jelonek, K.,
Nowińska, K., Gawliczek, M., Libera, M., Marcinkowski, A., Kasperczyk, J.: Cor-
rosion resistance of PLGA-coated biomaterials. Acta Bioeng. Biomech. 19(1), 173–
179 (2017). https://doi.org/10.5277/ABB-00556-2016-04
10. Szewczenko, J., et al.: Biodegradable polymer coatings on Ti6Al7Nb alloy. Acta
Bioeng. Biomech. 21(4), 83–92 (2019). https://doi.org/10.37190/ABB-01461-
2019-01
11. Wang, M.: Surface modification of metallic biomaterials for orthopedic applica-
tions. Mater. Sci. Forum 618–619, 285–290 (2009). https://doi.org/10.4028/www.
scientific.net/MSF.618-619.285
12. Zhang, G.F., Liu, X., Zhang, S., Pan, B., Liu, M.L.: Ciprofloxacin derivatives and
their antibacterial activities. Eur. J. Med. Chem 146, 599–612 (2018). https://doi.
org/10.1016/j.ejmech.2018.01.078
Author Index
A
Affanasowicz, Alicja, 321
Augustyniak, Piotr, 66, 367

B
Badura, Aleksandra, 345, 406
Badura, Dariusz, 332, 435
Badura, Paweł, 84
Bajger, Adam, 498
Barańska, Klaudia, 94
Basiaga, Marcin, 534
Bednorz, Adam, 194
Bibrowicz, Karol, 393
Bieńkowska, Maria, 194, 345
Biguri, Ander, 107
Borowska, Marta, 54, 107
Brombach, Nick, 15
Brück, Rainer, 15
Bugdol, Marcin, 28, 43, 76, 377, 406, 435
Bugdol, Monika N., 28, 43, 332, 406, 435

C
Celniak, Weronika, 66
Cenda, Piotr, 143
Chen, Haoyuan, 285, 295
Cieślak, Adam, 119
Cyprys, Paweł, 271
Czajkowska, Joanna, 208
Czak, Mirosław, 393, 421
Czarlewski, Robert, 28

D
Danch-Wierzchowska, Marta, 28, 43, 321, 332, 435
Dardzińska-Głębocka, Agnieszka, 107
Domino, Małgorzata, 54
Doroniewicz, Iwona, 321, 332
Dowgierd, Krzysztof, 377

F
Farhan, Nabeel, 15
Fujarewicz, Krzysztof, 487

G
Gaus, Olaf, 15
Gertych, Arkadiusz, 271
Goldsztajn, Karolina, 534
Gorzkowska, Agnieszka A., 28
Grzegorzek, Marcin, 3, 285, 295, 307
Gumulski, Jakub, 455

H
Hahn, Kai, 15
Hoffmann, Raoul, 3
Hu, Weiming, 285

J
Jagodzińska, Adrianna, 271
Jakubikova, Lucia, 498
Jangas, Mariusz, 168
Jankowska, Marta, 455
Jasiński, Tomasz, 54
Jasionowska-Skop, Magdalena, 234, 246
Jaworska, Joanna, 534
K
Kajzer, Wojciech, 534
Kałuża, Justyna, 132
Kania, Damian, 377, 393
Kasik, Vladimir, 356
Kawa, Jacek, 194
Keil, Alexander, 15
Kieszczyńska, Katarzyna, 321
Kobus, Monika, 168
Kociołek, Marcin, 222
Koprowski, Robert, 155
Korzekwa, Szymon, 208
Kostka, Paweł, 443
Kostoval, Ales, 498
Krakowczyk, Łukasz, 377
Krasnodębska, Paulina, 406
Kręcichwost, Michał, 84
Kulesa-Mrowiecka, Małgorzata, 377

L
Latos, Dominika, 321
Łakomiec, Krzysztof, 487
Ławecka, Magdalena, 84
Ledwoń, Daniel, 43, 321, 332
Li, Chen, 285, 295, 307
Li, Yixin, 295
Liebenow, Laura, 3
Lipowicz, Anna, 76, 377, 421
Lipowicz, Paweł, 107
Liu, Wanli, 285

M
Maćkowska, Stella, 94
Małecki, Andrzej S., 28
Małek, Weronika, 181
Mańka, Anna, 393
Maśko, Małgorzata, 54
Matyja, Małgorzata, 321, 332
Mazur, Alicja, 522
Miodońska, Zuzanna, 84
Mitas, Andrzej W., 28, 43, 393, 406, 421
Mitas, Julia M., 28, 465
Moćko, Natalia, 84
Moskal, Alicja, 234
Mrozek, Adam, 332
Myśliwiec, Andrzej, 321, 332, 345, 377

N
Niebudek-Bogusz, Ewa, 132
Nowińska, Katarzyna, 534

P
Pająk, Anna, 367
Pawelec, Łukasz, 421
Pierides, Iro, 498
Pietka, Ewa, 345
Pietruszewska, Wioletta, 132
Piórkowski, Adam, 119, 143
Płatkowska-Szczerek, Anna, 208
Pociask, Elżbieta, 181
Polewczyk, Zofia, 377
Pollak, Anita, 393, 406
Popelinsky, Lubos, 498
Prochazka, Michal, 356
Przelaskowski, Artur, 234, 246
Psiuk-Maksymowicz, Krzysztof, 487
Pyciński, Bartłomiej, 271

R
Radomski, Dariusz, 474
Rojewska, Katarzyna, 94
Roleder, Tomasz, 181
Romaniszyn-Kania, Patrycja, 377, 393, 406
Rosiak, Maria, 393
Różańska, Agnieszka, 94

S
Sage, Agata, 84
Schwarzerova, Jana, 498
Sedlar, Karel, 498
Sieciński, Szymon, 443
Sitek, Emilia J., 194
Śmieja, Jarosław, 510
Smoliński, Michał, 194
Sobczak, Karolina, 168
Spinczyk, Dominik, 94, 455
Steinhage, Axel, 3
Strąkowska, Maria, 222
Strumiłło, Paweł, 132
Strzelecki, Michał, 168
Sun, Hongzan, 285, 295, 307
Świątek, Adrian, 168
Szewczenko, Janusz, 534
Szkiełkowska, Agata, 406
Szurmik, Tomasz, 393
Szymańska, Dżesika, 208

T
Tamulewicz, Anna, 522
Turner, Bruce, 393