
SN Computer Science (2023) 4:374

https://doi.org/10.1007/s42979-023-01801-5

ORIGINAL RESEARCH

Forecasting the Spread of COVID‑19 Using Deep Learning and Big Data
Analytics Methods
Cylas Kiganda1 · Muhammet Ali Akcayol1

Received: 2 June 2022 / Accepted: 22 March 2023 / Published online: 3 May 2023
© The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2023

Abstract
To contain the spread of the COVID-19 pandemic, cutting-edge approaches that make use of existing technological capabilities are needed. Forecasting its spread in one or several countries ahead of time is a common strategy in most research. There is, however, a need for comprehensive studies that cover all regions of the African continent. This study closes this gap by conducting a wide-ranging investigation and analysis to forecast COVID-19 cases and identify the most critical countries in terms of the COVID-19 pandemic in all five major African regions. The proposed approach leveraged both statistical and deep learning models: the autoregressive integrated moving average (ARIMA) model with a seasonal component, the long short-term memory (LSTM) model, and the Prophet model. In this approach, the forecasting problem was treated as a univariate time series problem using confirmed cumulative COVID-19 cases. Model performance was evaluated using seven performance metrics: the mean-squared error, root mean-square error, mean absolute percentage error, symmetric mean absolute percentage error, peak signal-to-noise ratio, normalized root mean-square error, and the R2 score. The best-performing model was selected and used to make predictions for the next 61 days. In this study, the long short-term memory model performed the best. The most vulnerable countries, with the highest expected increase in the number of cumulative positive cases, were Mali (Western Africa, 22.77%), Angola (Southern Africa, 18.97%), Egypt (Northern Africa, 11.83%), Somalia (Eastern Africa, 10.72%), and Gabon (Central Africa, 2.81%).

Keywords Deep learning · COVID-19 · Artificial neural networks · Long short-term memory · Autoregressive integrated
moving average · Prophet

* Cylas Kiganda
kigandacylas@gmail.com; cylas.kiganda1@gazi.edu.tr
Muhammet Ali Akcayol
akcayol@gazi.edu.tr
1 Computer Science Department, Institute of Informatics, Gazi University, Ankara, Turkey

Introduction

The coronavirus disease (COVID-19) is an epidemic that first appeared in Wuhan, Hubei Province, China, on December 31, 2019. It was initially reported as a cluster of pneumonia cases. After a thorough analysis of the severity of the spread, the World Health Organization (WHO) declared COVID-19 a pandemic on March 11, 2020, as documented in the WHO timeline of COVID-19 [4]. COVID-19 is caused by the SARS-CoV-2 virus and can infect anyone. In most cases, infected patients recover without special treatment. According to WHO [22], infected individuals may display mild to moderate symptoms. Individuals with chronic medical illnesses, particularly the elderly, are more likely to develop severe symptoms.
The virus spreads from person to person via small liquid particles released when an infected person coughs, sneezes, speaks, or breathes. When these particles settle on surfaces such as door handles, the virus can spread to others who touch these surfaces without taking the necessary precautions. To prevent the spread of the virus, WHO [22] recommends that people keep a 1-m distance from others, wash their hands frequently or use a disinfectant, wear a mask, and get vaccinated.
Various approaches have been deployed to prevent and control the spread of the COVID-19 pandemic. Among these strategies is the prediction of the spread of the


COVID-19 virus. In this context, the spread of COVID-19 is treated as a time series problem to which deep learning forecasting algorithms and big data statistical models are applied. Among the deep learning algorithms are the long short-term memory (LSTM) model, as applied by Marzouk et al. [12], Hssayeni et al. [6], Yu et al. [24], Zeroual et al. [25], Pal et al. [14], and Shastri et al. [16]; the convolutional neural network (CNN) model, as applied by Huang et al. [8], which performs well on image data such as X-ray images; the autoencoder model, applied by Hu [7]; gradient boosting, which provided the best results in the research conducted by Zoabi et al. [27]; and the Prophet model, which was applied by P. Wang et al. [19] to perform epidemiological trend prediction. Big data statistical models include the autoregressive integrated moving average (ARIMA) model, as applied by Gebretensae and Asmelash [5], and the susceptible-exposed-infectious-removed (SEIR) model, which Yang et al. [23] applied as a robust model for predicting the trend of COVID-19.
Among the deep learning models used for time series prediction, the LSTM has been widely used owing to its successful results in most research experiments. The ARIMA statistical model has also been widely applied in the health sector, for example, by Y. W. Wang et al. [20] to predict the spread of hepatitis B, by Y. Huang et al. [9] to forecast medical service demand, and by Zhang et al. [26] to predict daily blood sampling room visits.
The following questions are addressed by this research:

1. What is the best-performing prediction model given the COVID-19 cumulative positive case data from African countries in five key regions?
2. Is it possible to estimate the total number of cumulative positive cases 61 days ahead of time using the best prediction model?
3. After a 61-day forecasting period, which countries on the African continent are in the most vulnerable position in terms of the spread of the COVID-19 virus?

In this study, a comparative and analytical approach was followed to predict the spread of the COVID-19 virus. This approach includes two deep learning models, LSTM and Prophet, and one statistical model, ARIMA. In most studies, the ARIMA model does not include the seasonal component of the problem; in this study, however, it is modeled to take the seasonal component of the time series into consideration. The spread of COVID-19 was treated as a univariate time series problem using the number of COVID-19-positive cases. In “Methods and Materials”, the models used in this study are discussed in detail.
This study uses the African continent as a case study. In this comprehensive approach, the African continent was broken down into five major subregions: Northern, Southern, Eastern, Western, and Central Africa. While most studies focus on a single country or a few countries when predicting the spread of COVID-19, this study covered all of the African continent's regions. The best prediction model was selected using seven performance indicators: mean-square error (MSE), root mean-square error (RMSE), mean absolute percentage error (MAPE), symmetric mean absolute percentage error (SMAPE), R2 score, normalized root mean-square error (NRMSE), and peak signal-to-noise ratio (PSNR). In “Model Selection Criteria”, the performance metrics are provided in detail. The best-performing model was then used to predict COVID-19 cases 61 days ahead. In “Results and Discussion”, the model results are provided and discussed in detail.

Related Work

In this section, prediction approaches and methods used in other research studies are addressed. These studies mainly concentrate on predicting the spread of COVID-19 using both statistical and deep learning tools.
In a research study by Gebretensae and Asmelash [5], the autoregressive integrated moving average (ARIMA) algorithm was used to forecast the spread of COVID-19 in Ethiopia. The autocorrelation function (ACF) and partial autocorrelation function (PACF) were used to obtain the model's optimal terms. The ARIMA (0, 1, 5) and ARIMA (2, 1, 3) models produced the best results. Ribeiro et al. [15] developed a stacking-ensemble learning algorithm that included ARIMA, cubist regression, random forest, and support vector regression. In this study, the Gaussian process was employed as a meta-learner, while random forest, ridge regression, and other algorithms were used as base learners. Support vector regression produced the best results.
Abdulmajeed et al. [1] applied a deep learning ensemble method to predict COVID-19 cases in Nigeria. The emphasis in this study was on creating a prediction method that uses as little data as possible to give accurate predictions, because only limited training data was available for models to learn the spread of COVID-19. This approach combined four prediction approaches, including one statistical method, ARIMA. Among the other models in the ensemble approach were


the Prophet model (supported and provided by Facebook), the Holt–Winters exponential smoothing model, and the generalized autoregressive conditional heteroscedasticity (GARCH) model. The ARIMA model was applied without a seasonal component. To find the best ARIMA model, strategies such as brute-force search, autocorrelation function inspection, and partial autocorrelation function plots were used.
Wang et al. [19] used a hybrid prediction strategy combining the logistic and Prophet models to predict cumulative COVID-19 cases. With the Prophet model, the primary focus was on modeling non-periodic changes. The model used the date and the total number of COVID-19 cases from a specific country. In this hybrid method, the logistic model was used to identify the fastest-rising point in the data, and its output was then fed into the Prophet model, which made the final forecast.
Marzouk et al. [12] used three deep learning models to forecast the spread of COVID-19 in Egypt: the LSTM, a convolutional neural network, and a multilayer perceptron neural network. The COVID-19 data was modeled as time series data, and the LSTM outperformed the other two models.
Hssayeni et al. [6] used mobility data to predict the risk of COVID-19 spread using the LSTM model and a gradient tree boosting model. They found that the number of daily cases decreased in the retiree context, while it increased in the youth context.
Yang et al. [23], on the other hand, used the susceptible-exposed-infectious-removed (SEIR) and LSTM models to forecast the spread of the COVID-19 pandemic in China. The SEIR algorithm was used to model epidemiological and mobility data by specifying parameters. The parameter β was defined as the product of the daily number of people in contact with COVID-19 patients and the probability of transmission. σ was related to the time it took for an exposed person to develop infection symptoms (the incubation period). Finally, γ was the average mortality or recovery rate. These parameters were used to determine the rate of pandemic spread in Hubei Province and were then fed into the LSTM model as input.
Zeroual et al. [25] used five models to predict new and recovered COVID-19 cases: the recurrent neural network, long short-term memory, bidirectional LSTM, gated recurrent units, and variational autoencoder. The study was carried out in six countries: Italy, Spain, France, China, the USA, and Australia. The variational autoencoder produced the best results and was used to forecast cases for the next 2 weeks.
To forecast a positive COVID-19 outcome in a PCR test, Zoabi et al. [27] used the gradient-boosting algorithm in conjunction with the Shapley additive explanations (SHAP) bee-swarm plot. Sex, contact with COVID-19 patients, and the presence of the five most notable COVID-19 symptoms were all model input features. Techniques such as early stopping were used to improve the results.
Pal et al. [14] used the LSTM model and Bayesian optimization to determine COVID-19 risk categories. To obtain the hyperparameters, the search space first had to be defined. The optimal hyperparameters were then used by the model in the local trend prediction phase to perform country-specific predictions. Finally, a fuzzy rule-based risk categorization process was carried out, in which the data obtained from the previous module was used to determine each country's risk status. This study concluded that weather had no significant impact on the spread of COVID-19.
Shastri et al. [16] conducted COVID-19 time series prediction and comparative analysis using variants of long short-term memory neural network models, including bidirectional LSTM, convolutional LSTM, and stacked LSTM. Two countries, the USA and India, were used as case studies. Because the models are sensitive to the scale of the input values, tools such as MinMaxScaler were used for data normalization. Regions of the USA and India were divided into initial, moderate, and severe groups based on the severity of the COVID-19 situation; regions with a high number of COVID-19 cases were classified as severe. The convolutional LSTM model produced the best results of the three models.
In the related literature, several models have been used to forecast the spread of COVID-19 in a few countries. However, the African continent has not been extensively studied in this regard. This study aims to close this gap by applying the most successful of the forecasting models considered (LSTM) to conduct an extensive investigation and analysis of African states from the five major regions of the continent. In addition, the most critical state in each region, that is, the one with the highest expected COVID-19 increase rate, was identified for immediate action.

Methods and Materials

Data Gathering

Africa’s Geographical Regions and Populations

The case studies used in this study included countries from the five major regions of the African continent. These regions, as depicted in Fig. 1, are the Northern, Eastern, Southern, Central, and Western regions.
Much work on the COVID-19 pandemic has been done in the literature. In some research, one or several African countries have been used as case studies, for example, in the research by Abdulmajeed et al. [1]. In this study, the


Fig. 1  Africa’s five major regions

African continent is considered from a broader perspective, including countries from each of the major regions that make up the continent. This study performs a comparative analysis of the spread of the COVID-19 pandemic.

COVID‑19 Data

The COVID-19 dataset used in this research was obtained from the Humanitarian Data Exchange [2]. The data of each country was first assigned to one of five groups based on the country's geography: the Northern, Southern, Central, Eastern, and Western regions of Africa. Model fitting was then done for each country separately. Each country's data was split into training and testing datasets, with the training set accounting for 80% of the data.

ARIMA Model

The ARIMA model is made up of three main parts: the “AR,” “I,” and “MA” terms. As mentioned by Noureen et al. [13], the “AR” term refers to the autoregression parameter. It expresses a linear relationship between the present and prior values of the variable under consideration. That is to say, an AR(1) model implies that the current data point in the series is based directly on the immediately preceding data point, while an AR(2) model implies that it is based on the two preceding data points, as described by Kırbaş et al. [10]. The “I” component stands for the integrated element, which indicates how many times the series is differenced, that is, how many times each data point is replaced by its difference from the preceding value. This part of the ARIMA model handles the stationarity requirement for reliable ARIMA time series modeling, which is attained through the differencing process, as explained by Noureen et al. [13]. Stationarity in ARIMA processing refers to the condition in which the mean and variance of the time series are constant over time. The last part of the basic ARIMA structure is the “MA” part, which represents the moving average. This component expresses the current value as a linear combination of the error values at past intervals in the time series, as described by Ribeiro et al. [15]. The basic ARIMA model is denoted ARIMA (p, d, q), where p, d, and q are the orders of the autoregressive, differencing, and moving average terms, respectively, as described by Abdulmajeed et al. [1]. The AR (p) term can be represented mathematically as shown in Eq. 1.

$$ Y_t = \delta + \varphi_1 Y_{t-1} + \varphi_2 Y_{t-2} + \cdots + \varphi_p Y_{t-p} + \varepsilon_t \tag{1} $$
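To make the “I” and “AR” components concrete, the following minimal sketch (Python with NumPy) differences a series once, fits the constant δ and coefficients φ_1, …, φ_p of Eq. 1 by ordinary least squares, forecasts recursively, and integrates the forecasts back to cumulative levels. It is an illustration under stated assumptions, not the study's implementation: it omits the MA and seasonal terms, and the helper names (`difference`, `fit_ar`, `forecast_ar`, `undifference`) and the synthetic data are hypothetical.

```python
import numpy as np


def difference(y, d=1):
    """Apply Delta Y_t = Y_t - Y_{t-1} d times (the "I" step of ARIMA)."""
    y = np.asarray(y, dtype=float)
    for _ in range(d):
        y = np.diff(y)
    return y


def undifference(last_level, diffs):
    """Invert one order of differencing, starting from the last observed level."""
    return last_level + np.cumsum(diffs)


def fit_ar(y, p):
    """Estimate delta and phi_1..phi_p of Eq. 1 by ordinary least squares."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Design-matrix row for target Y_t: [1, Y_{t-1}, ..., Y_{t-p}]
    lags = [y[p - k:n - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n - p)] + lags)
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef[0], coef[1:]


def forecast_ar(y, delta, phi, steps):
    """Recursive multi-step forecast: each prediction is fed back as a lag."""
    history = list(np.asarray(y, dtype=float))
    preds = []
    for _ in range(steps):
        nxt = delta + sum(phi[k] * history[-1 - k] for k in range(len(phi)))
        history.append(nxt)
        preds.append(nxt)
    return np.array(preds)


if __name__ == "__main__":
    # Hypothetical cumulative case counts (synthetic, not the study's data)
    rng = np.random.default_rng(0)
    cumulative = np.cumsum(rng.poisson(50, size=100)).astype(float)
    cut = int(0.8 * len(cumulative))            # chronological 80/20 split
    train, test = cumulative[:cut], cumulative[cut:]
    daily = difference(train, d=1)              # d = 1 removes the upward trend
    delta, phi = fit_ar(daily, p=2)
    pred = undifference(train[-1], forecast_ar(daily, delta, phi, len(test)))
    print("phi:", phi, "test RMSE:", np.sqrt(np.mean((pred - test) ** 2)))
```

Differencing cumulative counts once turns them into daily new cases, which is usually much closer to stationary; in practice a library ARIMA/SARIMA implementation would estimate the MA and seasonal terms jointly by maximum likelihood rather than by this least-squares shortcut.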


In Eq. 1, Y_t denotes the time series value at a given time point t; p, δ, and ε_t denote the autoregressive order, a constant, and the error value, respectively; and φ_1, …, φ_p are the autoregressive coefficients. The moving average component can be defined mathematically as in Eq. 2.

$$ Y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q} \tag{2} $$

In Eq. 2, q denotes the order of the moving average term. The differencing term d can be obtained from Eq. 3, where L is the lag operator (L Y_t = Y_{t-1}).

$$ \Delta Y_t = Y_t - Y_{t-1} = Y_t - L Y_t \tag{3} $$

In Eq. 3, ΔY_t denotes the differenced (stationary) time series value at time t; applying this operation d times gives Δ^d Y_t.

$$ \left(1 - \varphi_1 L - \varphi_2 L^2 - \cdots - \varphi_p L^p\right) \Delta^d Y_t = \delta + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} \tag{4} $$

Equation 4 combines the equations for the basic ARIMA terms and is the full ARIMA (p, d, q) model equation.
The partial autocorrelation function (PACF) and autocorrelation function (ACF) plots, as shown in Fig. 2, can also be used to obtain the ARIMA model's p and q terms. The ACF plot shows the correlation between the time series and its own lagged values over different lag intervals. The PACF differs in that it shows the correlation at each lag after removing the effect of the shorter, intermediate lags, as explained by Noureen et al. [13]. In practice, the lag after which the PACF cuts off suggests p, and the lag after which the ACF cuts off suggests q.

Fig. 2  Representation of PACF and ACF plots

Prophet Model

The Prophet model is a deep learning model for time series forecasting. It was created and is maintained by Facebook as an open-source project. According to Taylor and Letham [18], it is based on the general specification of a generalized additive model (GAM), a regression model in which the response depends linearly on smooth functions of the predictor variables. GAMs can be represented quantitatively using Eq. 5.

$$ g(E(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots + f_m(x_m) \tag{5} $$

In Eq. 5, Y represents the univariate response variable, g is the link function, β_0 is the intercept, x_1, …, x_m are the predictor variables, and f_1, …, f_m are the smoothing functions. Owing to its GAM formulation, the Prophet model offers several benefits, including flexibility and quick fitting times, and it evaluates a time series problem from three perspectives: trend, seasonality, and holiday components, as discussed by Taylor and Letham [18]. The trend component captures whether the time series increases or decreases over time. Seasonality, on the other hand, captures periodic changes that recur over shorter time periods, such as weekly or yearly cycles.

$$ y(t) = g(t) + s(t) + h(t) + \varepsilon_t \tag{6} $$

The final predicted value y(t) is obtained by combining the trend g(t), seasonal s(t), and holiday h(t) component functions, as shown in Eq. 6, where ε_t represents the changes that are not captured by the model [18].

LSTM Model

The LSTM model is composed of three core components: the forget gate, input gate, and output gate [16]. The forget gate determines the degree to which past information is discarded. The input gate controls which new data is written into the cell's internal state, while the output gate is used to create the next hidden state, or output, from the current internal state.
Fig. 3 displays the major building blocks of the LSTM model, which consist of the forget gate, input gate


and output gates, as described by Le and Lee [11]. Activation functions such as the sigmoid and tanh functions are used to compute the gate values and the cell and hidden states, whose weights are optimized during training.

Model Selection Criteria

In this study, seven metrics were adopted to assess the predictive performance of the models: the peak signal-to-noise ratio (PSNR), mean-squared error (MSE), root mean-square error (RMSE), symmetric mean absolute percentage error (SMAPE), mean absolute percentage error (MAPE), normalized root mean-square error (NRMSE), and R2 score.

Mean‑Square Error

The mean-squared error can be calculated as follows.

$$ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2 \tag{7} $$

In Eq. 7, n is the total number of observations, Y_i is the actual value, and Ŷ_i is the predicted value.

Root Mean‑Square Error

The RMSE can be calculated using Eq. 8.

$$ \mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( \hat{Y}_i - Y_i \right)^2 } \tag{8} $$

In Eq. 8, n, Y_i, and Ŷ_i have the same meaning as in Eq. 7.

Mean Absolute Percentage Error

Equation 9 can be used to represent this performance measure numerically.

$$ \mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right| \tag{9} $$

In Eq. 9, A_t is the observed value, F_t is the forecasted value, and n is the total number of data points.

Symmetric Mean Absolute Percentage Error

Equation 10 can be used to represent this measurement numerically.

$$ \mathrm{SMAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \frac{\left| F_t - A_t \right|}{\left( |A_t| + |F_t| \right)/2} \tag{10} $$

In Eq. 10, A_t is the observed value, F_t is the forecasted value, and n is the total number of observations.

Peak Signal‑to‑Noise Ratio

$$ \mathrm{PSNR} = 20 \log_{10} \left( \frac{\mathrm{MAX}_f}{\sqrt{\mathrm{MSE}}} \right) \tag{11} $$

In Eq. 11, MAX_f is the highest signal value, and MSE is the mean-square error of Eq. 7.

Normalized Root Mean‑Square Error

$$ \mathrm{NRMSE} = \frac{\mathrm{RMSD}}{Y_{\max} - Y_{\min}} \tag{12} $$

In Eq. 12, RMSD is the root mean-square deviation, and Y_max and Y_min are the maximum and minimum observed values. The RMSD measure is also known as the RMSE statistic (Fig. 3).

R2 Score

$$ R^2 = 1 - \frac{\sum_i \left( y_i - f_i \right)^2}{\sum_i \left( y_i - \bar{y} \right)^2} \tag{13} $$

In Eq. 13, f_i are the predicted values, y_i are the original values, and ȳ is the mean of the original values.

The Framework of Applied Approach

As shown in Fig. 4, the major stages of this study are splitting the preprocessed cumulative positive COVID-19 case data into 80% training and 20% testing datasets, fitting the models, validating model performance using the performance metrics, and then selecting the best-performing model and using it to forecast positive COVID-19 cases for the next 61 days.

Rationale for the Selected Models

This section addresses the reasons for choosing the LSTM, ARIMA, and Prophet models to predict and forecast the COVID-19 cumulative positive case data of the various African countries in this study.
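As a companion to the Model Selection Criteria, the seven metrics of Eqs. 7–13 can be sketched in plain Python as below. This is an illustrative reading of the formulas, not the authors' evaluation code; in particular, taking MAX_f in the PSNR as the largest actual value and normalizing the NRMSE by the range of the actual values are assumptions.

```python
import math


def mse(actual, predicted):
    # Eq. 7: mean of squared errors
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    # Eq. 8: square root of the MSE
    return math.sqrt(mse(actual, predicted))


def mape(actual, predicted):
    # Eq. 9; undefined when an actual value is 0 (e.g. days before the first case)
    return 100.0 / len(actual) * sum(abs((a - p) / a) for a, p in zip(actual, predicted))


def smape(actual, predicted):
    # Eq. 10: absolute error relative to the mean magnitude of A_t and F_t
    return 100.0 / len(actual) * sum(
        abs(p - a) / ((abs(a) + abs(p)) / 2) for a, p in zip(actual, predicted)
    )


def psnr(actual, predicted):
    # Eq. 11; assumption: MAX_f is the largest actual value
    return 20 * math.log10(max(actual) / math.sqrt(mse(actual, predicted)))


def nrmse(actual, predicted):
    # Eq. 12; RMSD (= RMSE) normalized by the range of the actual values
    return rmse(actual, predicted) / (max(actual) - min(actual))


def r2_score(actual, predicted):
    # Eq. 13: 1 - residual sum of squares / total sum of squares
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

Lower values are better for MSE, RMSE, MAPE, SMAPE, and NRMSE, while higher values are better for PSNR and the R2 score. Note that PSNR divides by the RMSE, so it is undefined for a perfect fit (MSE = 0); a real pipeline would need to guard that case and the zero-valued actuals in MAPE.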


Fig. 3  General structure of an LSTM model
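As a minimal sketch of the gate computations summarized in Fig. 3, the NumPy code below implements a single forward step of a standard LSTM cell and runs it over a short univariate window. The weight layout, hidden size, and window length are assumptions for illustration; training (backpropagation through time) and the study's actual library and hyperparameters are not shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4*H, H + D) and b has shape (4*H,); the four row blocks hold the
    forget, input, candidate, and output parameters (an arbitrary ordering chosen
    for this sketch).
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:H])            # forget gate: share of c_prev to keep
    i = sigmoid(z[H:2 * H])        # input gate: share of the candidate to write
    g = np.tanh(z[2 * H:3 * H])    # candidate cell values
    o = sigmoid(z[3 * H:4 * H])    # output gate: share of the cell to expose
    c_t = f * c_prev + i * g       # additive cell-state update
    h_t = o * np.tanh(c_t)         # new hidden state / output
    return h_t, c_t


def run_lstm(sequence, W, b, hidden_size):
    """Feed a univariate sequence through the cell and return the last hidden state."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for value in sequence:
        h, c = lstm_step(np.array([value]), h, c, W, b)
    return h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, D = 8, 1
    W = rng.normal(scale=0.1, size=(4 * H, H + D))   # untrained, random weights
    b = np.zeros(4 * H)
    window = np.linspace(0.0, 1.0, 14)                # e.g. 14 min-max-scaled days
    print(run_lstm(window, W, b, H).shape)            # (8,)
```

The line `c_t = f * c_prev + i * g` is the additive update that lets information, and gradients during training, pass through many time steps when the forget gate stays close to 1. In a forecasting model, a dense layer would typically map the final hidden state to the next value.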

Fig. 4  Structural depiction of the methodology used in this study (flowchart: COVID-19 dataset → 80% train data / 20% test data → training phase (ARIMA, LSTM, Prophet) → testing phase → evaluation → forecasting phase → 61-day forecasting)
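The workflow in Fig. 4 can be sketched as a small selection loop. To keep the example self-contained, two trivial forecasters (a naive last-value model and a linear-drift model) stand in for ARIMA, LSTM, and Prophet, and only RMSE is used for selection instead of all seven metrics. These stand-ins, the function names, and recomputing the chosen model on the full series before the 61-day forecast are assumptions, not details reported by the study.

```python
import math


def chronological_split(series, train_frac=0.8):
    """Split without shuffling so the test set is the most recent 20%."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical stand-ins for ARIMA, LSTM, and Prophet: each takes a history
# and a horizon and returns that many forecasts.
def naive_forecaster(history, horizon):
    return [history[-1]] * horizon


def drift_forecaster(history, horizon):
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (k + 1) for k in range(horizon)]


def select_and_forecast(series, forecasters, horizon=61, train_frac=0.8):
    train, test = chronological_split(series, train_frac)
    # Testing phase: score every candidate on the held-out 20%
    scores = {name: rmse(test, f(train, len(test))) for name, f in forecasters.items()}
    best = min(scores, key=scores.get)            # lower RMSE is better
    # Forecasting phase: use the full series and look `horizon` days ahead
    return best, scores, forecasters[best](series, horizon)


if __name__ == "__main__":
    cumulative = [100 + 12 * day for day in range(150)]   # synthetic series
    best, scores, future = select_and_forecast(
        cumulative, {"naive": naive_forecaster, "drift": drift_forecaster}
    )
    print(best, len(future))
```

Splitting chronologically rather than randomly matters for time series: a shuffled split would let the models see days that lie after the test period, which makes the evaluation optimistic.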


LSTM model

This model is a special class of recurrent neural network with the capability to identify and learn the relationships that exist within a given series of data observations, as described by Yu et al. [24]. This is possible because the LSTM has memory modules that connect past and current data points. Information with strong predictive value is retained, while information with weaker weights is discarded by the forget gate of the LSTM model. This both focuses the model on extracting the dependencies within a given input sequence and reduces the error by filtering noise out of the learned data. As described by Zeroual et al. [25], the LSTM model mitigates the vanishing gradient problem of traditional recurrent neural networks, in which the gradients computed during training become either too small or too large as they are propagated back through many time steps. The LSTM addresses this problem through its gates: because the cell state is updated additively, with the forget gate controlling how much of the previous state is kept, gradients can flow across many time steps without vanishing, which improves accuracy and overall performance. LSTM implementations also expose several hyperparameters, such as the batch size and the number of epochs, which can easily be adjusted to obtain better results. These qualities make the LSTM model well suited to the time series prediction task.

ARIMA

This is a statistical regression method in which past data points and errors are connected using weight factors, which improves the overall prediction results, as described by Singh et al. [17]. The model combines the strengths of both the autoregression and moving average models, which makes it a robust choice for extracting the inherent statistical relationship between the current value and its lagged values and errors. It is a flexible model, since it incorporates the differences between past and present data points, which lets it handle nonstationary data using only a few parameters, as described by Abdulmajeed et al. [1]. In addition, the optimal order terms of this model are easy to obtain using simple methods such as the PACF and ACF plots, as described by Gebretensae and Asmelash [5]. Metrics such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC) also make it possible to measure how good the ARIMA model is for a given combination of order terms, which further makes it easier to streamline the prediction results. The model can also process data with seasonal trends by extending its order terms to include seasonal factors, as explained by Y. Wang et al. [21]. This makes it possible to capture any seasonal relationship within the COVID-19 dataset.

Prophet

According to Abdulmajeed et al. [1], this is an additive regression model supported by Facebook with a robust architecture that takes into account seasonal dynamics within a data sequence, such as yearly, weekly, and daily patterns. It also handles missing data points and extreme values well, since it is able to identify data anomalies, as described by Y. Wang et al. [21]. This makes it well suited to predicting the COVID-19 data of countries whose series show sharp spikes away from the general trend. According to Taylor and Letham [18], the Prophet model has built-in support for non-linear, saturating growth curves that level off when a natural limit is reached, and it offers flexible tuning, such as smoothing parameters that control how seasonality is modeled so that the fit follows historical cycles. The effects of events such as holidays can also be captured easily with the Prophet model using limited data [18]. These qualities make this model appropriate for predicting the spread of COVID-19.

Results and Discussion

In this study, countries from the African continent were grouped into the five regions named in “Data Gathering”. Three forecasting models were used: ARIMA, LSTM, and Prophet. In this section, the performance results obtained from these models are given for each region of Africa.

Model Training and Testing

Northern Africa

In the Northern region of Africa, of the six countries studied, the most populous country is Egypt, as shown in Fig. 2, with a population of 102,334,404, while the least populous country is Mauritania, with a population of 4,649,658, according to Worldometer [3].
In Fig. 5, it can be seen that Morocco has maintained the highest number of COVID-19 cases over time. This was

SN Computer Science
SN Computer Science (2023) 4:374 Page 9 of 32 374

Fig. 5  Cumulative positive cases for Northern Africa

followed by Tunisia in this critical condition. Mauritania, on the other hand, has had the lowest number of cases over time of all the states in this region.
Libya shows a relative increase in cases, with a gradual increase between October 2020 and July 2021. After July, a sharp increase is observed that slowly levels off toward October. This describes the first wave of COVID-19 cases in Libya. Algeria's trend is similar to Libya's; however, Algeria's cases settle at a roughly constant number, while Libya's continue to increase.
According to Fig. 6, the LSTM model fits the data better than both the ARIMA and Prophet models. In Tunisia, the Prophet model performs the worst on the test data: while the test data flattens to a constant case value, the Prophet model predicts a sharp increase to more than 800000 cases. In Egypt and Tunisia, the ARIMA and Prophet models predicted lower and higher case numbers, respectively, than the actual data. In the other four countries, both models predicted lower cumulative positive cases than the actual data. This confirms the poor performance of these two models compared with the LSTM model, whose predictions are close to the actual data in five of the six countries, the exception being Egypt.
In Table 1, the best results in terms of PSNR and R value correspond to larger numbers: the greater the value, the better the model's relative performance.

Central Africa

In this region, five states were studied. At the time of this study, the most populous state in this group was Cameroon, with a population of 26545863 [3], while the least populated state was São Tomé and Príncipe, with a population of 219,159.
In Fig. 7, the cumulative COVID-19 cases for the five countries in this region are shown. According to this graph, Cameroon has more COVID-19 cases than the rest of the countries, with more than two significant waves. Cameroon is followed by Gabon, which also has more than two waves. The remaining countries maintain a roughly constant curve, with minor increases in COVID-19 cases, and the lowest number of cases is seen in São Tomé and Príncipe. A positive association is therefore observed between population and the number of cases: the highest number of cases is observed in Cameroon, the most populous state in this region [3], and the lowest in São Tomé and Príncipe, the country with the smallest population. This makes Cameroon the member with the highest risk of COVID-19 spread in this region.
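The model comparisons discussed for Figs. 6 and 8 and Table 1 rest on a hold-out evaluation: each model is fitted on the earlier part of a country's cumulative series and scored on the later part. The split ratio is not stated in this part of the paper, so the following numpy sketch assumes an 80/20 chronological split and uses two trivial stand-in forecasters (persistence and a linear trend; all function names are hypothetical) in place of the LSTM, ARIMA, and Prophet models. It illustrates the protocol only, not the paper's models.

```python
import numpy as np


def chronological_split(series, test_fraction=0.2):
    """Split a series into contiguous train/test parts (no shuffling)."""
    series = np.asarray(series, dtype=float)
    n_test = max(1, int(round(len(series) * test_fraction)))
    return series[:-n_test], series[-n_test:]


def persistence_forecast(train, horizon):
    """Stand-in model 1: repeat the last observed cumulative value."""
    return np.full(horizon, train[-1])


def linear_trend_forecast(train, horizon, window=14):
    """Stand-in model 2: extend a straight line fitted to the last `window` points."""
    y = train[-window:]
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, deg=1)
    return intercept + slope * np.arange(len(y), len(y) + horizon)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    return float(100.0 * np.mean(np.abs((actual - predicted) / actual)))


# Hypothetical cumulative series (not study data): 1000 cases plus 50 new cases per day.
cumulative = 1000.0 + 50.0 * np.arange(100)
train, test = chronological_split(cumulative, test_fraction=0.2)

scores = {
    "persistence": mape(test, persistence_forecast(train, len(test))),
    "linear_trend": mape(test, linear_trend_forecast(train, len(test))),
}
best = min(scores, key=scores.get)
print(scores, "->", best)
```

Because the split is chronological, the test window always lies after the training window, which mirrors real forecasting; a random split would leak future values into training.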

Fig. 6  Actual and predicted cumulative cases in Northern Africa

Figure 8 shows the model predictions on the test data for the countries in the Central African region. In three countries, the LSTM model's predictions generally match the actual data well, which implies that the best performance in this region was achieved by the LSTM model. The worst performance is given by the Prophet model, for example in Cameroon. In Chad, the ARIMA model performs relatively well, while in the rest of the countries it ranks immediately after the LSTM model.

Southern Africa

Ten countries from this region were used in this study. As shown in Fig. 3, the most populous country in this region is South Africa, with a population of 59308690, while the least populated is Eswatini, with a population of 1160164.
In Fig. 9, South Africa clearly has the highest number of cases of all the countries in the region. This indicates rapid spread of the virus in this country, which puts the neighboring countries in the same region at a high risk of increased spread. While the other

Table 1  Performance parameters of the models for Northern Africa


Country      Model    PSNR      R value   NRMSE   SMAPE    RMSE          MSE                 MAPE
Mauritania   ARIMA    −28.3210  −1.0387   0.4392  20.0932  6646.4399     44175163.3400       18.0172
Mauritania   LSTM     −1.0514   0.9962    0.0190  0.7515   287.8118      82835.6322          0.7551
Mauritania   Prophet  −31.3765  −3.1201   0.6244  30.2207  9448.5330     89274775.8500       25.8536
Algeria      ARIMA    −39.6473  −0.3443   0.3715  12.4041  24485.1867    599524367.7000      11.5554
Algeria      LSTM     −18.1055  0.9906    0.0311  1.0187   2050.2803     4203649.3090        1.0244
Algeria      Prophet  −43.4308  −2.2125   0.5744  20.2395  37851.3220    1432722577.0000     18.1621
Morocco      ARIMA    −59.9838  −1.7886   0.6184  29.8376  254525.4081   64783183368.0000    25.0037
Morocco      LSTM     −38.0839  0.9820    0.0497  1.8985   20452.0670    418287044.6000      1.9400
Morocco      Prophet  −60.8397  −2.3961   0.6824  33.7894  280882.9106   78895209467.0000    27.7835
Libya        ARIMA    −49.3128  −1.4477   0.4721  24.5338  74504.0709    5550856581.0000     21.5143
Libya        LSTM     −25.4671  0.9899    0.0303  1.4573   4785.1562     22897719.8600       1.4395
Libya        Prophet  −51.1119  −2.7040   0.5808  31.1302  91650.8128    8399871487.0000     26.4318
Egypt        ARIMA    −31.1439  0.5917    0.1873  2.0405   9198.8946     84619661.8600       2.0052
Egypt        LSTM     −37.1818  −0.6398   0.3753  5.5535   18434.4519    339829016.9000      5.7408
Egypt        Prophet  −46.5591  −13.2078  1.1048  15.3217  54261.6346    2944324989.0000     16.7849
Tunisia      ARIMA    −55.2048  −2.9832   0.5456  13.7406  146819.1010   21555848418.0000    15.5706
Tunisia      LSTM     −28.8865  0.9907    0.0264  0.9365   7093.5494     50318443.0900       0.9371
Tunisia      Prophet  −54.7576  −2.5934   0.5182  23.6105  139450.1419   19446342076.0000    21.0428

The highest PSNR and R values were obtained by the LSTM model in Mauritania: −1.0514 and 0.9962, respectively. For the remaining performance metrics, lower values indicate better results; in Mauritania, the LSTM model also obtained the lowest RMSE, 287.8118. The ARIMA and Prophet models produced MAPE ranges of 2.0052–25.0037 and 16.7849–27.7835, respectively, whereas the LSTM model produced the lowest MAPE range, 0.7551–5.7408. The highest LSTM MAPE (5.7408, Egypt) is lower than every Prophet MAPE and every ARIMA MAPE except Egypt's (2.0052). This makes the LSTM the best-performing model for predicting cumulative COVID-19 cases in the Northern African region. Among the countries in this region, the best model performance was observed in Mauritania, while the worst was observed in Morocco, where the Prophet model had an RMSE of 280882.9106.
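The notes above rank the models by seven metrics without restating their formulas. The following numpy sketch computes all seven under stated assumptions, none of which is spelled out in this part of the paper: the "R value" is taken to be the coefficient of determination R² (the tables contain values such as −558.9784, which a correlation coefficient cannot take); NRMSE is assumed to divide RMSE by the mean of the actual values; SMAPE uses the common 2|A−F|/(|A|+|F|) form; and PSNR uses a peak value of 255. The last choice is inferred from the tables rather than stated: 20·log10(255/RMSE) reproduces, for example, the Algeria LSTM row of Table 1 (RMSE 2050.2803, PSNR −18.1055) and the Mauritania LSTM row (RMSE 287.8118, PSNR −1.0514).

```python
import numpy as np


def psnr_from_rmse(rmse, peak=255.0):
    """PSNR in decibels; peak=255 is an assumption inferred from the tables."""
    return 20.0 * np.log10(peak / rmse)


def evaluation_metrics(actual, predicted, peak=255.0):
    """The seven metrics of Tables 1-4, under the assumptions described above."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = a - p
    mse = float(np.mean(err ** 2))
    rmse = float(np.sqrt(mse))
    return {
        "PSNR": float(psnr_from_rmse(rmse, peak)),
        # Coefficient of determination: can fall far below -1 for poor forecasts.
        "R value": float(1.0 - np.sum(err ** 2) / np.sum((a - a.mean()) ** 2)),
        # Assumed normalization: RMSE divided by the mean of the actual values.
        "NRMSE": rmse / float(np.mean(a)),
        "SMAPE": float(100.0 * np.mean(2.0 * np.abs(err) / (np.abs(a) + np.abs(p)))),
        "RMSE": rmse,
        "MSE": mse,
        "MAPE": float(100.0 * np.mean(np.abs(err / a))),
    }


metrics = evaluation_metrics([100, 200, 300, 400], [110, 190, 310, 390])
print({name: round(value, 4) for name, value in metrics.items()})
print(round(float(psnr_from_rmse(2050.2803)), 4))  # Algeria, LSTM row of Table 1
```

PSNR is undefined for a perfect forecast (RMSE = 0), and MAPE and NRMSE assume non-zero actual values, which holds for cumulative counts once the first case has been reported.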

Fig. 7  Cumulative positive cases for Central Africa

Table 2  Performance parameters of the models for Central Africa


Country                Model    PSNR      R value   NRMSE   SMAPE    RMSE        MSE             MAPE
Cameroon               ARIMA    −23.7735  0.6842    0.1819  3.8398   3937.4568   15503566.0519   3.8655
Cameroon               LSTM     −29.9249  −0.3019   0.3694  7.0175   7994.4080   63910559.2705   7.4279
Cameroon               Prophet  −40.3369  −13.3144  1.2249  24.9969  26508.4573  702698308.4259  28.8557
Gabon                  ARIMA    −22.9030  −0.0850   0.3428  7.2343   3561.9664   12687604.6347   6.6077
Gabon                  LSTM     −5.0523   0.9822    0.0439  1.0980   456.2018    208120.0823     1.1101
Gabon                  Prophet  −19.1268  0.5452    0.2219  7.4839   2306.1026   5318109.2017    7.7485
Chad                   ARIMA    14.9747   −0.3830   0.3921  0.7312   45.4783     2068.2758       0.7272
Chad                   LSTM     25.6706   0.8822    0.1144  0.2213   13.2742     176.2044        0.2212
Chad                   Prophet  0.0557    −41.9264  2.1842  4.6249   253.3710    64196.8636      4.7480
Equatorial Guinea      ARIMA    −19.4805  −0.9611   0.5230  16.0832  2401.9573   5769398.8710    13.9253
Equatorial Guinea      LSTM     −10.6582  0.7428    0.1894  5.8417   869.8616    756659.2032     6.1844
Equatorial Guinea      Prophet  −12.1101  0.6407    0.2238  8.8312   1028.1208   1057032.3790    8.8613
São Tomé and Príncipe  ARIMA    −7.0908   −0.2709   0.4364  12.6397  576.8751    332784.8810     11.1389
São Tomé and Príncipe  LSTM     16.3165   0.9942    0.0295  0.8522   38.9686     1518.5518       0.8449
São Tomé and Príncipe  Prophet  −8.3251   −0.6886   0.5030  15.7833  664.9582    442169.4077     13.7283

In Table 2, the lowest RMSE, 13.2742, was obtained by the LSTM model in Chad, and the highest RMSE, 26508.4573, by the Prophet model in Cameroon. The lowest and highest PSNR values, −40.3369 and 25.6706, were obtained by the Prophet model in Cameroon and by the LSTM model in Chad, respectively. The best MAPE range, 0.2212–7.4279, was obtained by the LSTM model, followed by 0.7272–13.9253 for the ARIMA model and 4.7480–28.8557 for the Prophet model. In this region, the best model performance was obtained by the LSTM model, while the worst was obtained by the Prophet model.
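The per-model MAPE ranges quoted in the notes above can be recomputed directly from the MAPE column of Table 2. A short Python sketch, using only values copied from that column (the helper names are illustrative, not from the paper), derives each model's minimum and maximum MAPE and the lowest-MAPE model in each country:

```python
# MAPE column of Table 2 (Central Africa), copied from the table.
TABLE2_MAPE = {
    "Cameroon": {"ARIMA": 3.8655, "LSTM": 7.4279, "Prophet": 28.8557},
    "Gabon": {"ARIMA": 6.6077, "LSTM": 1.1101, "Prophet": 7.7485},
    "Chad": {"ARIMA": 0.7272, "LSTM": 0.2212, "Prophet": 4.7480},
    "Equatorial Guinea": {"ARIMA": 13.9253, "LSTM": 6.1844, "Prophet": 8.8613},
    "São Tomé and Príncipe": {"ARIMA": 11.1389, "LSTM": 0.8449, "Prophet": 13.7283},
}


def mape_range_per_model(table):
    """(min, max) MAPE of each model across all countries."""
    ranges = {}
    for scores in table.values():
        for model, value in scores.items():
            low, high = ranges.get(model, (value, value))
            ranges[model] = (min(low, value), max(high, value))
    return ranges


def best_model_per_country(table):
    """Model with the lowest MAPE in each country."""
    return {country: min(scores, key=scores.get) for country, scores in table.items()}


print(mape_range_per_model(TABLE2_MAPE))
print(best_model_per_country(TABLE2_MAPE))
```

By this measure, the LSTM model has the lowest MAPE in four of the five countries, while the ARIMA model has the lowest MAPE in Cameroon.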

countries in the same region are experiencing their second wave of virus spread, South Africa is observed to have had three waves. Since South Africa also has the largest population, its large number of cases is consistent with a positive association between population size and case numbers.
For clarity, South Africa was excluded from Fig. 10 to allow a comparative analysis of the COVID-19 situation in the other countries of the region. Apart from South Africa, Zambia has the largest number of cases, and it was also the first country to show an increase in the number of cases. All countries have had their second major wave of COVID-19 spread, and the lowest number of cases was observed in Lesotho. After October, the number of cases in all countries remains roughly constant, with flattened curves. This likely reflects the effect of control practices such as quarantines and vaccinations.
In Fig. 11, the LSTM model provided the best-matching prediction results in three countries (Botswana, Malawi, and Mozambique). In Lesotho, the ARIMA model performed better than the other two models. The Prophet model emerged as the worst performer, as clearly observed in four countries: Malawi, Mozambique, Eswatini, and Lesotho. In these countries, this model predicts a roughly constant number of cases, with only slight increases. In Angola, both the LSTM and Prophet models produced predictions fairly close to the actual data, while the ARIMA model predicted a lower number of cases, which differed from the actual data but was still reasonably close to it. Angola is the country in which the three models' predictions agree most closely. This can generally be attributed to the smooth rise in the number of cases in Angola, which makes it easier for all the models to capture the underlying relationships and trends in the data and to make better predictions.
In Fig. 12, the ARIMA model performed the worst of the three models. Its predictions were generally higher than the actual data: in all four countries, the ARIMA model predicts more cases than the other models. The LSTM model again provides the best performance, with the best-matching predictions, followed by the Prophet model with the second-best performance. Overall, in the Southern African region the LSTM model provides the best prediction results compared with the ARIMA and Prophet models, as shown in Figs. 11 and 12, while the worst prediction results are produced by the ARIMA model.
Table 3 displays the performance metrics used to determine the best prediction model in the Southern African region.
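Waves and flattening, as discussed for Figs. 9 and 10, are easier to see in daily new cases than in cumulative totals: a wave is a hump in the first difference of the cumulative curve, and a flattened cumulative curve means daily increments near zero. The following numpy sketch works on a hypothetical single-wave series (not data from the study); the helper names are illustrative.

```python
import numpy as np


def daily_new_cases(cumulative):
    """First difference of a cumulative series; day 0 keeps its own value."""
    return np.diff(np.asarray(cumulative, dtype=float), prepend=0.0)


def trailing_mean(values, window=7):
    """Trailing moving average; the first window-1 entries are NaN."""
    v = np.asarray(values, dtype=float)
    out = np.full(v.shape, np.nan)
    csum = np.cumsum(np.insert(v, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


# Hypothetical single wave: daily counts rise, peak, and fall back to zero.
daily = np.array([0, 0, 10, 20, 30, 20, 10, 0, 0, 0], dtype=float)
cumulative = np.cumsum(daily)  # what a cumulative-cases plot shows
recovered = daily_new_cases(cumulative)
smoothed = trailing_mean(recovered, window=3)
peak_day = int(np.argmax(recovered))  # top of the wave
# First day after the peak with no new cases, i.e. where the cumulative curve flattens.
flat_after = int(np.argmax((recovered == 0) & (np.arange(len(recovered)) > peak_day)))
print(cumulative, peak_day, flat_after)
```

In practice a 7-day window is a common choice for daily COVID-19 counts because it smooths out weekday reporting effects; the 3-day window here only keeps the toy example short.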

Fig. 8  Actual and predicted cumulative cases in Central Africa

Western Africa

In this research study, 12 countries from this region were used as case studies. In the Western region, Nigeria is the country with the largest population, with a total of 206139589 people, while Guinea-Bissau has the smallest population, 1968001 [3].
In Fig. 13, a comparative plot of the 12 countries from the Western region of Africa used in this study is given. It shows the state of the COVID-19 pandemic in each of the 12 countries, as well as the severity of the risk of COVID-19 spread, as reflected by the cumulative positive cases. Between January and April 2020, very few COVID-19 cases were reported in this region; after April 2020, the first cases began to be reported. Notably, after this, there is a sharp increase in the number of cases in about four countries, namely Nigeria, Ghana, Senegal, and Mali, while in the other countries the number of cases increases gradually. Nigeria, followed by Ghana and Senegal, displays the highest number of cases over time. Nigeria, being the most populous country, with over 200 million people, and having the highest number of cases, is the highest-risk member of this

Fig. 9  Cumulative positive cases in the Southern African region including South Africa

Fig. 10  Cumulative positive cases in the Southern African region excluding South Africa

region. If immediate measures are not taken, there is a higher chance of faster spread to other countries too.
Figure 14 displays the prediction results of the three models in the Western region of Africa. In this first group of countries from the region, the LSTM model outperformed the other two models, producing the best-matching prediction results. This can be clearly observed in countries like Guinea, Guinea-Bissau, Gambia,

Fig. 11  Actual and predicted cumulative positive cases for Southern Africa (a)

Ghana, and Togo. In Burkina Faso, the Prophet model manages to make the most successful prediction. The ARIMA and Prophet models make predictions that closely match each other in three countries: Guinea-Bissau, Ghana, and Togo. These predictions suggest lower COVID-19 case numbers than the actual data, which is further evidence that these two models perform poorly compared with the LSTM model.
Figure 15 gives the second group of model predictions for the Western region of Africa. According to this figure, the best prediction performance in Niger is obtained by the Prophet model; Niger is the only country in this group where this model performs best. It can also be concluded from this figure that the ARIMA model did not achieve the top performance in any of the countries. In the remaining countries of this group, the LSTM model maintains the best-matching prediction results, which continues to affirm the LSTM model as the top-performing model in this region. In Nigeria, the ARIMA and Prophet models make predictions that closely match each other but are still lower than, and significantly different from, the actual data. These results

Fig. 12  Actual and predicted cumulative positive cases in Southern Africa (b)

prove the LSTM model to be the best prediction model in the West African region.
In Table 4, the prediction results of the three models, based on the seven metrics used in this study, are provided for the 12 countries from the Western region of Africa.

Eastern Africa

From this region, 12 countries were studied. Among these, the Comoros is the least populated country, with a population of 869601, while the most populous country is Ethiopia, with a population of 114963588 at the time of this study.
The cumulative positive COVID-19 cases for the countries in the Eastern region of Africa are plotted in Fig. 16. In this region, the highest number of cases is observed in Ethiopia, followed by Kenya. The population of Kenya, at 53771296 people, is second only to that of Ethiopia, and its number of cumulative cases is also second only to Ethiopia's, which suggests a roughly positive correlation between population size and the number of confirmed cases. If proactive measures are not applied, the Eastern region is at a higher risk of experiencing a surge in the spread of COVID-19. The first cases occurred relatively late in this region: significant numbers of cases only started to be registered after July 2020 in all countries. Kenya has had the highest number of waves of COVID-19 spread in this region. Apart from Ethiopia, Kenya, Uganda, Rwanda, Madagascar, and Sudan, the countries show a relatively slow increase in the number of reported cases. This may be due to differing measures taken by the respective countries and their populations; the Comoros, the least populated country in this region, is one example.
Both Figs. 16 and 17 display the prediction results from the LSTM, ARIMA, and Prophet models for the 12 countries from the Eastern region of Africa used in this study, plotting both the models' predicted data and the actual data. Fig. 16 shows that all three models performed relatively well in the Comoros, followed by Sudan, as displayed in Fig. 17. In the rest of the countries, in both figures, it can be observed

Table 3  Performance parameters of the models for Southern Africa


Country       Model    PSNR      R value    NRMSE   SMAPE    RMSE         MSE                MAPE
South Africa  ARIMA    −68.6754  −5.7832    0.7905  16.5715  692322.2002  479310000000.0000  19.0097
South Africa  LSTM     −39.7648  0.9913     0.0283  0.7800   24818.6368   615964732.6000     0.7830
South Africa  Prophet  −68.2322  −5.1251    0.7512  26.7633  657882.1667  432809000000.0000  23.5047
Zambia        ARIMA    −48.9441  −48.6151   1.7532  23.3656  71407.6662   5099054792.0000    28.1140
Zambia        LSTM     −6.7122   0.9970     0.0136  0.2252   552.2662     304997.9557        0.2252
Zambia        Prophet  −44.8702  −18.4190   1.0968  24.6496  44673.6041   1995730903.0000    21.8100
Namibia       ARIMA    −50.7674  −168.8737  3.0293  40.7690  88086.5415   7759238793.0000    57.0550
Namibia       LSTM     −11.0236  0.9820     0.0312  0.6692   907.2334     823072.4421        0.6697
Namibia       Prophet  −39.0956  −10.5598   0.7902  19.9281  22978.5282   528012758.2000     17.8522
Eswatini      ARIMA    −35.7542  −1.6589    0.5871  40.7329  15640.5055   244625412.3000     32.6660
Eswatini      LSTM     −25.7619  0.7336     0.1858  10.6050  4950.3420    24505885.9200      9.5222
Eswatini      Prophet  −38.0908  −3.5536    0.7683  58.2823  20468.2129   418947739.3000     43.3898
Lesotho       ARIMA    −8.5919   0.9618     0.0775  0.6781   685.6982     470182.0215        0.6157
Lesotho       LSTM     −21.4589  0.2614     0.3409  14.5044  3016.3651    9098458.4170       16.4016
Lesotho       Prophet  −25.6834  −0.9537    0.5545  23.6798  4905.8203    24067072.8200      20.2240
Malawi        ARIMA    −28.4637  −0.0858    0.2957  9.3612   6756.5296    45650692.2400      9.5218
Malawi        LSTM     −20.0071  0.8451     0.1117  3.3325   2552.0723    6513073.0240       3.4858
Malawi        Prophet  −38.2399  −9.3125    0.9113  41.4722  20822.5839   433580000.3000     34.0048
Mozambique    ARIMA    −42.8290  −2.4604    0.5430  14.7160  35317.3358   1247314208.0000    16.9596
Mozambique    LSTM     −21.2013  0.9762     0.0450  1.9387   2928.2214    8574480.5670       1.9128
Mozambique    Prophet  −47.2140  −8.4979    0.8996  50.2045  58511.4488   3423589641.0000    39.7238
Botswana      ARIMA    −45.9388  −1.1362    0.4543  34.8145  50522.0752   2552480083.0000    29.1172
Botswana      LSTM     −21.9147  0.9915     0.0286  1.9982   3178.8901    10105342.2700      2.0284
Botswana      Prophet  −47.6825  −2.1915    0.5553  44.8292  61753.7964   3813531370.0000    35.9618
Angola        ARIMA    −28.5294  0.3054     0.2763  9.3038   6807.8362    46346633.7300      8.6089
Angola        LSTM     −19.7646  0.9077     0.1007  3.0467   2481.8282    6159471.2140       2.9731
Angola        Prophet  −21.5315  0.8613     0.1234  5.5594   3041.7034    9251959.5740       5.7561
Zimbabwe      ARIMA    −50.9603  −25.8469   1.3262  38.1986  90064.6700   8111644782.0000    54.1828
Zimbabwe      LSTM     −19.8316  0.9793     0.0368  1.6663   2501.0312    6255157.0630       1.6868
Zimbabwe      Prophet  −48.4614  −14.1010   0.9947  75.8921  67547.6906   4562690505.0000    54.7687

The highest R value, 0.9970, was obtained by the LSTM model in Zambia, while the smallest R value, −168.8737, was obtained by the ARIMA model in Namibia. The smallest PSNR value, −68.6754, was obtained by the ARIMA model in South Africa, while the highest PSNR value, −6.7122, was obtained by the LSTM model in Zambia. Both the PSNR and R metrics suggest that the LSTM model is the best prediction model in this region, while the worst is ARIMA. The RMSE ranges 552.2662–24818.6368, 685.6982–692322.2002, and 3041.7034–657882.1667 were obtained by the LSTM, ARIMA, and Prophet models, respectively, so the best RMSE range was also produced by the LSTM model. With the smallest MAPE value, 0.2252 in Zambia, the overall best-performing model in the Southern African region is the LSTM model, while the worst-performing model, with the largest MAPE, 57.0550 in Namibia, is the ARIMA model.
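The RMSE ranges above span several orders of magnitude partly because RMSE is expressed in the same units as the case counts: a country with far more cases, such as South Africa, tends to produce larger RMSE values even at the same relative accuracy, whereas MAPE is scale-free. The following numpy sketch with hypothetical numbers makes this concrete; it is an illustration of the metric definitions, not an analysis from the paper.

```python
import numpy as np


def rmse(actual, predicted):
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((a - p) ** 2)))


def mape(actual, predicted):
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(100.0 * np.mean(np.abs((a - p) / a)))


# Hypothetical series: every forecast is 5% too low in both countries,
# but the second country has 100 times as many cases.
small_actual = np.array([1000.0, 1100.0, 1200.0])
small_pred = 0.95 * small_actual
large_actual, large_pred = 100.0 * small_actual, 100.0 * small_pred

print("RMSE:", rmse(small_actual, small_pred), rmse(large_actual, large_pred))
print("MAPE:", mape(small_actual, small_pred), mape(large_actual, large_pred))
```

The RMSE grows a hundredfold while the MAPE stays at 5%, so cross-country comparisons are safer with scale-free metrics such as MAPE and SMAPE, while RMSE is best used to compare models within one country.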

that the three models show significant relative discrepancies in performance. In Fig. 16, both the LSTM and ARIMA models obtained better-matching predictions than the Prophet model in Madagascar. In Fig. 16, the worst performance is observed for the Prophet model in both Djibouti and Madagascar, whereas the best performance is obtained by the LSTM model in all countries shown in the same figure. In Fig. 17 too, the LSTM model has the overall best-matching prediction results compared with the ARIMA and Prophet models. In both Mauritius and Rwanda, the worst performance is observed for both the ARIMA and Prophet models, whose predictions deviate strongly from the actual data. These results indicate that the LSTM model outperformed the ARIMA and Prophet models in the Eastern region.
In Table 5, the performance of the three models is given for the 12 countries from the Eastern region of Africa.
Figure 18 displays the overall combined model performance from all individual regions used in this study. It shows the percentage distributions in both the positive and

Fig. 13  Cumulative positive cases in Western Africa

negative directions, quantifying each model's performance by its contribution to the total error value for each of the seven error metrics used in this study. For both PSNR and R, good performance is indicated by a distribution shifted toward the positive direction, and bad performance by a more negative percentage distribution. For the RMSE, MAPE, NRMSE, SMAPE, and MSE errors, good performance corresponds to a smaller positive percentage share, and bad performance to a larger positive percentage share. The RMSE, MAPE, NRMSE, SMAPE, and MSE metrics indicate that the overall best performance in this study was obtained by the LSTM model, followed by the ARIMA model and, lastly, the Prophet model, because the LSTM model obtained the smallest percentage share of the total error in all five of these metrics. The ARIMA model follows, with percentage shares larger than the LSTM model's but smaller than the Prophet model's. The PSNR and R values also show that the LSTM model outperforms the other two models: both values tend toward the positive direction for the LSTM model, showing that it achieved the highest values for these two metrics, again followed by the ARIMA and, lastly, the Prophet model.
The LSTM model's performance is owed to the fact that it can process and handle sequential data of all kinds, while the other two models are affected by the properties of their input data. The ARIMA model works best with stationary data and also requires a larger amount of data to fit well; with non-stationary data, it performs poorly. The amount of data used in this study was small, because the COVID-19 pandemic was still relatively new and little data was available. In most countries, the datasets could not be made adequately stationary, despite the differencing applied during ARIMA model fitting. All of these factors contribute to ARIMA's poor performance compared with the LSTM model. The overall worst-performing model in this study is the Prophet model. Despite its ease of setup and the fact that it does not require data preprocessing, this Fourier series-based model failed to find and learn significant trend, seasonality, and holiday structures in the data that would allow best-matching predictions, owing to the limited data available for training. Because the LSTM model has several tunable hyperparameters, it could be tuned until the best-matching results were reached. However, compared with the other two models, the LSTM model had the highest computational and time complexity for achieving optimal results.
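The discussion above attributes ARIMA's weakness partly to non-stationarity and to differencing that did not fully succeed. The following numpy-only sketch illustrates the two steps involved: first differencing (the "I" in ARIMA) turns a trending cumulative series into increments, and the autoregressive order is then chosen by AIC, one of the criteria the paper mentions for ARIMA. It is a simplified stand-in, not the authors' procedure: it fits only AR terms by ordinary least squares (no MA part, no maximum likelihood), the data are simulated, and the function names are hypothetical. A full ARIMA fit would normally use a statistical library such as statsmodels.

```python
import numpy as np


def difference(series, d=1):
    """Apply d rounds of first differencing (the "I" in ARIMA)."""
    x = np.asarray(series, dtype=float)
    for _ in range(d):
        x = np.diff(x)
    return x


def fit_ar_least_squares(x, p):
    """Fit x_t = c + phi_1*x_{t-1} + ... + phi_p*x_{t-p} + e_t by least squares.

    Returns ([c, phi_1, ..., phi_p], aic) with the Gaussian AIC n*log(RSS/n) + 2*(p+1).
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - p
    lags = [x[p - i: len(x) - i] for i in range(1, p + 1)]
    design = np.column_stack([np.ones(n)] + lags)
    target = x[p:]
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    rss = float(np.sum((target - design @ coef) ** 2))
    return coef, n * np.log(rss / n) + 2 * (p + 1)


def select_ar_order(x, max_p=5):
    """Lowest-AIC AR order; every candidate is fitted on the same target sample."""
    x = np.asarray(x, dtype=float)
    aics = {p: fit_ar_least_squares(x[max_p - p:], p)[1] for p in range(1, max_p + 1)}
    return min(aics, key=aics.get), aics


# Simulated data: a cumulative series whose daily increments follow AR(1) with phi = 0.7.
rng = np.random.default_rng(0)
increments = np.zeros(2000)
for t in range(1, len(increments)):
    increments[t] = 0.7 * increments[t - 1] + rng.normal()
cumulative = 100.0 + np.cumsum(increments)

dx = difference(cumulative, d=1)  # trending level -> stationary increments
best_p, aics = select_ar_order(dx, max_p=5)
coef, _ = fit_ar_least_squares(dx, best_p)
print("selected p:", best_p, "phi_1 estimate:", round(float(coef[1]), 3))
```

Every candidate order is scored on the same target sample; comparing AIC values computed on samples of different lengths would bias the choice toward some orders.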

Fig. 14  Actual and predicted cumulative positive cases in Western Africa (a)

Forecasting for the Next 61 Days

In this study, after the best prediction model was determined through the training and testing processes, the second major phase involved forecasting the cumulative positive cases with the best-performing model for each country over a period of 61 days. At the time the main COVID-19 case dataset used in this study was accessed, the last reported date for each country in all regions was 2021-11-01. Cumulative positive cases were then forecasted from the last date of the original dataset up to 2022-01-02 for each country in the five major regions of the African continent.

Northern Africa

As displayed in Fig. 19, the COVID-19 cumulative positive cases are expected to increase rapidly in Egypt. In countries such as Tunisia, Algeria, and Mauritania, cases are expected to maintain a flat rate of increase, whereas in Libya a gradual increase in the rate of

Fig. 15  Actual and predicted cumulative cases in Western Africa (b)

increase is expected. In Morocco, a slight decrease is expected, after which the number of cases is expected to remain constant, with a small increase at the end of the forecasting period. At the end of the prediction period, all these countries in Northern Africa that reported cumulative cases are expected to show an increase. In Algeria, Mauritania, Tunisia, Egypt, Libya, and Morocco, cases are expected to increase from 206452 to 208009, 37320 to 38250, 712747 to 716835, 331017 to 370164, 357338 to 369986, and 946145 to 947226, respectively. With an 11.83% increase in the number of cases at the end of the forecasting period, Egypt is the country in this region with the largest expected increase in the number of COVID-19 cumulative positive cases.

Central Africa

In Fig. 20, the forecasted cases for the five Central African countries are plotted. In Cameroon, the cases are expected to drop slightly and then remain roughly constant. In Gabon and Equatorial Guinea, a gradual increase is expected, while in Chad and São Tomé and Príncipe, a constant rate of change in the cases is expected. At the end of

Table 4  Performance parameters of the models for Western Africa


Country        Model    PSNR      R value    NRMSE   SMAPE    RMSE        MSE             MAPE
Niger          ARIMA    −1.1740   −0.6132    0.3475  3.9254   291.9044    85208.1787      3.8076
Niger          LSTM     −0.5325   −0.3917    0.3228  3.8508   271.1221    73507.1931      3.7488
Niger          Prophet  12.2196   0.9262     0.0743  0.9621   62.4539     3900.4896       0.9659
Mali           ARIMA    −7.2888   −0.7804    0.3645  2.8408   590.1751    348306.6487     2.7670
Mali           LSTM     19.7407   0.9965     0.0162  0.1374   26.2729     690.2653        0.1375
Mali           Prophet  −24.2073  −86.5715   2.5566  22.7287  4139.1138   17132263.0500   25.9383
Liberia        ARIMA    −25.5340  −558.9784  5.2358  49.7815  4822.1454   23253086.2600   72.6324
Liberia        LSTM     7.4258    0.7167     0.1178  1.4477   108.4557    11762.6389      1.4647
Liberia        Prophet  −14.4240  −42.3685   1.4571  26.6279  1341.9666   1800874.3560    23.2167
Guinea         ARIMA    −23.7940  −1.8990    0.5884  12.7953  3946.7398   15576755.0500   11.8515
Guinea         LSTM     3.7225    0.9949     0.0248  0.5042   166.1170    27594.8577      0.5034
Guinea         Prophet  −19.6152  −0.1076    0.3637  7.5991   2439.4969   5951145.1250    7.2484
Guinea-Bissau  ARIMA    −15.9092  −2.9237    0.7195  28.2155  1592.2175   2535156.5670    23.9589
Guinea-Bissau  LSTM     13.1228   0.9951     0.0254  0.8095   56.2861     3168.1251       0.8156
Guinea-Bissau  Prophet  −16.0789  −3.0801    0.7337  29.4110  1623.6256   2636160.0890    24.9430
Ghana          ARIMA    −36.9592  −1.3960    0.5336  13.5331  17967.9858  322848513.7000  12.4038
Ghana          LSTM     −18.7581  0.9637     0.0656  1.5789   2210.2812   4885342.9830    1.5967
Ghana          Prophet  −37.8953  −1.9724    0.5943  15.6611  20012.7714  400511019.1000  14.2355
Gambia         ARIMA    −18.8193  −3.6199    0.6265  24.9010  2225.9044   4954650.3980    21.8765
Gambia         LSTM     9.2716    0.9928     0.0247  0.7308   87.6925     7689.9746       0.7358
Gambia         Prophet  −21.1720  −6.9414    0.8214  34.6748  2918.3552   8516797.0730    29.1977
Burkina Faso   ARIMA    −4.7163   −0.0108    0.3379  2.0304   438.8913    192625.5732     1.9840
Burkina Faso   LSTM     3.2095    0.8370     0.1357  0.9683   176.2237    31054.7924      0.9607
Burkina Faso   Prophet  −0.2520   0.6384     0.2021  1.7649   262.5062    68909.5050      1.7794
Togo           ARIMA    −27.1499  −0.6747    0.4841  23.3990  5808.0929   33733943.1400   20.0786
Togo           LSTM     −18.9326  0.7475     0.1880  8.1182   2255.1162   5085549.0760    8.7542
Togo           Prophet  −26.4842  −0.4367    0.4484  21.2872  5379.6047   28940146.7300   18.5120
Sierra Leone   ARIMA    −10.4028  −88.3031   2.1384  11.3549  844.6563    713444.2651     12.1692
Sierra Leone   LSTM     22.5264   0.9545     0.0483  0.2727   19.0642     363.4437        0.2728
Sierra Leone   Prophet  −6.9960   −39.7560   1.4446  8.0593   570.6140    325600.3370     7.6311
Nigeria        ARIMA    −39.5807  −1.5257    0.5510  10.7816  24298.1514  590400161.5000  10.0128
Nigeria        LSTM     −12.3440  0.9952     0.0239  0.4389   1056.1850   1115526.7540    0.4374
Nigeria        Prophet  −39.5353  −1.4995    0.5481  10.7166  24171.6715  584269703.1000  9.9560
Senegal        ARIMA    −36.9374  −2.5463    0.5945  27.3320  17923.0581  321236011.7000  23.6709
Senegal        LSTM     −9.9932   0.9928     0.0267  0.9606   805.7498    649232.7402     0.9669
Senegal        Prophet  −39.2209  −4.9995    0.7732  37.3071  23312.2755  543462189.0000  30.8797

Based on the PSNR metric, the best model performance was obtained by the LSTM model in Sierra Leone, with a PSNR value of 22.5264, while the worst was obtained by the ARIMA model in Nigeria, with a PSNR value of −39.5807. For the R metric, the highest value, 0.9965, was produced by the LSTM model in Mali, and the lowest, −558.9784, by the ARIMA model in Liberia. Since higher PSNR and R values imply better results, these metrics show that the LSTM model outperforms the other two models in the Western African region as well. The lowest RMSE value, 19.0642, obtained by the same model in Sierra Leone, further reinforces this observation, while the worst RMSE is seen for the ARIMA model in Nigeria, with the highest value of 24298.1514. When the MAPE ranges of all three models are taken into consideration, the LSTM model provides the best range, 0.1375–8.7542, compared with 1.9840–72.6324 for the ARIMA model and 0.9659–30.8797 for the Prophet model. These results affirm that the best-performing model in the Western region is the LSTM model.
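The forecasting phase reports, for each country, a start and end value over a 61-day horizon and a percentage increase for the country with the largest growth. The following Python sketch recomputes those percentages from the reported values and builds a daily horizon after the last observed date, 2021-11-01; the helper names are hypothetical. Counting 61 daily steps after 2021-11-01 ends on 2022-01-01, so the stated end date of 2022-01-02 implies one extra day under some counting convention that the text does not specify. Because cumulative counts should not decrease apart from data revisions, forecasts that fall (such as those reported for Cameroon and Mozambique) are a model artifact; carrying the running maximum forward is one simple, hypothetical post-processing step, and the paper does not state that any such step was used.

```python
from datetime import date, timedelta

import numpy as np


def forecast_dates(last_observed, horizon):
    """Daily dates for a forecast that starts the day after the last observation."""
    return [last_observed + timedelta(days=k) for k in range(1, horizon + 1)]


def percent_increase(start, end):
    """Relative growth from the first to the last forecast value, in percent."""
    return 100.0 * (end - start) / start


def enforce_non_decreasing(forecast):
    """Hypothetical post-processing: carry the running maximum forward so a
    cumulative forecast never falls below an earlier value."""
    return np.maximum.accumulate(np.asarray(forecast, dtype=float))


dates = forecast_dates(date(2021, 11, 1), horizon=61)
print(dates[0], dates[-1])  # prints 2021-11-02 2022-01-01

# Start and end values as reported in the forecasting section.
reported = {"Egypt": (331017, 370164), "Gabon": (35525, 36522), "Angola": (64433, 76655)}
for country, (start, end) in reported.items():
    print(f"{country}: {percent_increase(start, end):.2f}%")

# Illustrative falling forecast (hypothetical values).
print(enforce_non_decreasing([102499, 102300, 102129]))
```

The recomputed values, 11.83% for Egypt, 2.81% for Gabon, and 18.97% for Angola, match the percentages given in the text.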

Fig. 16  Cumulative positive cases for Eastern Africa

the forecasting period in Cameroon, a decrease in the number of cases is expected, from 102,499 to 102,129. Cameroon is the only country in the Central African region with an expected decrease. The remaining countries are all expected to see an increase: from 35,525 to 36,522 in Gabon, 5069 to 5072 in Chad, 13,368 to 13,508 in Equatorial Guinea, and 3714 to 3717 in São Tomé and Príncipe. The largest increase in this region is expected in Gabon, with an expected percentage increase of 2.81%.

Southern Africa

For clarity, the forecasted cumulative cases for the Southern African region are shown in two separate plots. The number of cases in South Africa is much larger than in the other countries of the region. Plotting all countries together would therefore compress the other curves into an indistinguishable cluster. Fig. 21 shows the actual and forecasted cumulative cases for seven countries in the Southern African region. In this figure, Angola has a higher expected rate of increase in cumulative positive cases than the other countries. Angola is followed by Lesotho, with a moderate rate of increase, and then by Botswana, with a small but notable increase. The remaining countries are expected to keep an almost constant number of cases, with only insignificant increases.

Figure 22, a continuation of Fig. 21, shows the forecasted and actual cases for the remaining three countries in the Southern African region. In both Zambia and Mozambique, the number of cumulative cases is expected to follow a nearly constant course. The third country in the figure, South Africa, is expected to see a significant, gradual increase. At the end of the forecasting period, Mozambique is the only country in this region where the number of COVID-19 cumulative cases is expected to decrease, from 151,292 to 151,051. In the rest of the countries, cases are expected to increase: from 64,433 to 76,655 in Angola, 186,594 to 193,024 in Botswana, 61,796 to 63,201 in Malawi, 128,886 to 129,401 in Namibia, 209,734 to 210,955 in Zambia, 46,421 to 46,874 in Eswatini, 21,635 to 24,334 in Lesotho, and 132,977 to 133,267 in Zimbabwe. South Africa is also expected to see an increase. The highest percentage increase in this region is 18.97%, in Angola.

Eastern Africa

Forecasted cases for the Eastern region of Africa are plotted in two separate graphs (Figs. 23 and 24). This makes it possible to examine the forecasts for every country studied in this region clearly.

Fig. 23 shows the actual and forecasted cases for seven countries from the Eastern African region. These forecasts were produced by the top-performing model, which is the LSTM in most countries. In two of these countries, Rwanda and Mauritius, the rate of increase of cumulative positive cases is expected to grow gradually. Djibouti is expected to keep the same number of cases. Apart from these three countries, the remaining countries in the figure are expected to show small fluctuations in the number of cases.

Fig. 24 shows the COVID-19 cumulative positive cases for five countries in the Eastern region of Africa. In Kenya, a constant number of cases is expected, while in Ethiopia and Somalia a notable increase is expected. In both Uganda and Sudan, a small increase followed by a small but noticeable decrease is expected.

At the end of the forecasting period, the cases in Djibouti are expected to remain constant. The last value in the original dataset, 13,478 cases, is also the forecasted value at the end of the period. In Eritrea, a small decrease from 6834 to 6820 cases is expected. In the rest of the countries, an increase is expected by the end of the forecasting period: from 126,236 to 127,628 in Uganda, 40,433 to 40,598 in Sudan, 43,626 to 44,150 in Madagascar, 253,310 to 253,901 in Kenya, 12,410 to 12,761 in South Sudan, 21,998 to 24,356 in Somalia, 99,698 to 102,205 in Rwanda, 17,812 to 18,297 in Mauritius, 365,167 to 377,935 in Ethiopia, and 4259 to 4472 in Comoros. The highest expected increase in the cumulative number of cases is in Somalia, with an expected percentage increase of 10.72%.

Fig. 17 Actual and predicted cumulative positive cases for Eastern Africa (a). Actual and predicted cumulative positive cases for Eastern Africa (b)

Fig. 17 (continued)

Western Africa

The forecasted cases for the Western African countries were split into two groups of six countries each, plotted in Figs. 25 and 26. Countries with similar numbers of cumulative cases were grouped together so that the results of the forecasting stage could be analyzed clearly.
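The regional maxima reported in the preceding paragraphs are 2.81% for Gabon, 18.97% for Angola, and 10.72% for Somalia. They are consistent with a simple relative change between the first and last cumulative values of the 61-day forecast window. The Python sketch below assumes that this is how the percentages were derived; the helper names are hypothetical. It applies the calculation to the Central African start and end values reported above.

```python
from __future__ import annotations


def percentage_increase(start: float, end: float) -> float:
    """Relative change (%) between the first and last cumulative values."""
    return 100.0 * (end - start) / start


def most_critical(region: dict[str, tuple[float, float]]) -> tuple[str, float]:
    """Return the country with the largest expected percentage increase."""
    changes = {
        country: percentage_increase(start, end)
        for country, (start, end) in region.items()
    }
    country = max(changes, key=changes.get)
    return country, round(changes[country], 2)


# Start and end cumulative cases for the Central African region,
# copied from the forecast figures reported in the text.
central_africa = {
    "Cameroon": (102_499, 102_129),
    "Gabon": (35_525, 36_522),
    "Chad": (5_069, 5_072),
    "Equatorial Guinea": (13_368, 13_508),
    "São Tomé and Príncipe": (3_714, 3_717),
}

print(most_critical(central_africa))  # ('Gabon', 2.81)
```

Because the measure is relative, a small absolute change in a country with few cases can rank above a larger absolute change elsewhere. Reporting both the start and end values, as the text does, keeps that visible.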


Table 5 Performance parameters of the models for Eastern Africa

Country      Model    PSNR      R value     NRMSE    SMAPE     RMSE        MSE              MAPE
Mauritius    ARIMA    −30.9283  −1.5722     0.5692   93.1564   8973.3930   80521781.9300    59.9527
             LSTM     −9.8988   0.9797      0.0506   7.7913    797.0358    635266.0665      8.3734
             Prophet  −31.3969  −1.8653     0.6007   104.3478  9470.7650   89695389.6900    65.2334
Madagascar   ARIMA    −10.3860  −2.8964     0.6832   1.7484    843.0209    710684.2378      1.7672
             LSTM     −2.1830   0.4107      0.2657   0.6019    327.8600    107492.1796      0.6048
             Prophet  −37.2296  −1882.7825  15.0212  32.4367   18536.1595  343589209.0000   39.7624
Kenya        ARIMA    −38.5762  0.1094      0.3237   8.2885    21644.5874  468488163.7000   7.8676
             LSTM     −16.2976  0.9947      0.0249   0.6621    1665.0265   2772313.2460     0.6640
             Prophet  −34.9861  0.6104      0.2141   5.2961    14316.8267  204971526.8000   5.1147
Rwanda       ARIMA    −42.6049  −3.1725     0.6113   22.6311   34418.0092  1184599357.0000  27.5678
             LSTM     −11.4368  0.9968      0.0169   0.9610    951.4349    905228.3689      0.9569
             Prophet  −44.8293  −5.9637     0.7897   66.9946   44463.6562  1977016723.0000  49.7882
Eritrea      ARIMA    −17.9013  −234.2806   3.3829   21.4673   2002.6522   4010615.8340     25.1679
             LSTM     8.3570    0.4431      0.1646   1.2999    97.4299     9492.5854        1.3106
             Prophet  −11.2635  −50.0280    1.5754   10.6952   932.6455    869827.6287      11.4391
Ethiopia     ARIMA    −45.0386  −1.0228     0.5137   11.0389   45548.2859  2074646348.0000  10.0965
             LSTM     −25.4272  0.9779      0.0537   1.1764    4763.2242   22688304.7800    1.1653
             Prophet  −40.5723  0.2767      0.3072   8.1788    27236.8169  741844194.8000   8.5723
Comoros      ARIMA    2.7683    −6.6139     0.7476   4.2035    185.4063    34375.4961       4.3021
             LSTM     18.6599   0.8039      0.1200   0.5967    29.7540     885.3005         0.5992
             Prophet  8.3879    −1.0876     0.3915   2.2780    97.0829     9425.0895        2.3056
Djibouti     ARIMA    −3.8103   0.6871      0.2116   2.6073    395.4184    156355.7111      2.5933
             LSTM     7.8871    0.9788      0.0550   0.4601    102.8451    10577.1146       0.4634
             Prophet  −22.6304  −22.8474    1.8469   23.6379   3451.9329   11915840.7500    27.0253
Uganda       ARIMA    −35.2449  0.0511      0.3294   10.4174   14749.7766  217555909.7000   11.2426
             LSTM     −34.4800  0.2044      0.3017   8.4224    13506.4698  182424726.5000   9.2481
             Prophet  −37.2039  −0.4897     0.4128   17.6531   18481.4583  341564300.9000   16.1231
South Sudan  ARIMA    −9.3188   −1.3764     0.4994   5.1883    745.5587    555857.7751      4.9893
             LSTM     3.9456    0.8879      0.1084   1.2819    161.9053    26213.3262       1.2915
             Prophet  −6.4872   −0.2381     0.3604   3.6874    538.1478    289603.0546      3.5850
Somalia      ARIMA    −22.5747  −1.0788     0.4898   15.0670   3429.8690   11764001.3600    13.4804
             LSTM     −10.2801  0.8774      0.1189   3.9871    832.8056    693565.1674      3.8825
             Prophet  −15.7176  0.5714      0.2224   6.7348    1557.4683   2425707.5060     6.5082
Sudan        ARIMA    −4.6451   0.8122      0.1243   0.8412    435.3047    189490.1818      0.8396
             LSTM     −17.3794  −2.5243     0.5385   3.9144    1885.8611   3556472.0890     4.0335
             Prophet  −18.5771  −3.6434     0.6181   5.0666    2164.6931   4685896.2170     5.2195

Seven metrics were used to measure the models’ performance. For PSNR and R value, higher values indicate better performance. For the other five metrics (NRMSE, SMAPE, RMSE, MSE, and MAPE), lower values are better. On PSNR, the LSTM model obtained the highest value, 18.6599, for Comoros. The worst performance, the smallest PSNR value of −45.0386, was obtained by the ARIMA model for Ethiopia. This implies a much larger amount of noise in the ARIMA predictions, which directly reflects poor performance. It is also evident in Fig. 17, where the ARIMA predictions deviate more from the actual data. For the R value, the Prophet model for Madagascar has the worst value, −1882.7825. Its predictions therefore agree least with the actual data of any model–country combination in this region. Conversely, the LSTM model for Rwanda obtained the best and highest value, 0.9968. This indicates close agreement between the LSTM-predicted and actual cases, as also shown for Rwanda in Fig. 17. Ranked by RMSE range, the LSTM has the best range (29.7540–13506.4698). It is followed by the Prophet model (97.0829–44463.6562) and lastly the ARIMA model (185.4063–45548.2859). This metric confirms that the LSTM is the most accurate model in the Eastern region.
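The RMSE-range comparison above can be checked directly against Table 5. The sketch below uses hypothetical helper names and copies its values from the RMSE column of Table 5. It ranks the models by the upper end, then the lower end, of their RMSE range across the twelve Eastern African countries. As an extra cross-check not used in the text, it also counts the countries in which each model has the lowest RMSE.

```python
from collections import Counter

# Country order of the rows in Table 5.
COUNTRIES = ["Mauritius", "Madagascar", "Kenya", "Rwanda", "Eritrea", "Ethiopia",
             "Comoros", "Djibouti", "Uganda", "South Sudan", "Somalia", "Sudan"]

# RMSE column of Table 5, one list per model, in COUNTRIES order.
RMSE_TABLE = {
    "ARIMA": [8973.3930, 843.0209, 21644.5874, 34418.0092, 2002.6522, 45548.2859,
              185.4063, 395.4184, 14749.7766, 745.5587, 3429.8690, 435.3047],
    "LSTM": [797.0358, 327.8600, 1665.0265, 951.4349, 97.4299, 4763.2242,
             29.7540, 102.8451, 13506.4698, 161.9053, 832.8056, 1885.8611],
    "Prophet": [9470.7650, 18536.1595, 14316.8267, 44463.6562, 932.6455, 27236.8169,
                97.0829, 3451.9329, 18481.4583, 538.1478, 1557.4683, 2164.6931],
}


def rmse_ranges(table: dict) -> list:
    """(model, (min, max)) pairs, best first by upper bound, then lower bound."""
    ranges = {model: (min(vals), max(vals)) for model, vals in table.items()}
    return sorted(ranges.items(), key=lambda item: (item[1][1], item[1][0]))


def wins_per_model(table: dict, countries: list) -> dict:
    """Number of countries in which each model has the lowest RMSE."""
    wins = Counter({model: 0 for model in table})
    for i in range(len(countries)):
        best = min(table, key=lambda model: table[model][i])
        wins[best] += 1
    return dict(wins)


for model, (low, high) in rmse_ranges(RMSE_TABLE):
    print(f"{model}: {low}-{high}")
print(wins_per_model(RMSE_TABLE, COUNTRIES))
```

By the per-country count, LSTM has the lowest RMSE in eleven of the twelve countries and ARIMA in Sudan. Ranking by range alone is sensitive to a single country: the LSTM upper bound comes entirely from Uganda.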


[Figure: percentage error distribution (y-axis, −100% to 100%) of the Prophet, ARIMA and LSTM models for the RMSE, MAPE, PSNR, R, NRMSE, SMAPE and MSE metrics]

Fig. 18 Total error distribution of the models

Fig. 19 Actual and forecasted COVID-19 cumulative positive cases for Northern Africa

In Fig. 25, six countries from the Western region of Africa are shown with their forecasted and actual cumulative cases. The expected cases in Guinea show a small increase, immediately followed by a generally constant number of cases. In the other five countries, a constant number of cases with small fluctuations is expected by the end of the forecasting period. Every country in this figure maintained its fluctuation pattern, so countries with a higher number of cases before forecasting kept these higher numbers after forecasting. Guinea, which has the highest number of actual cases, is still expected to have the highest number of forecasted cases, as depicted in Fig. 25. No significant decrease in the forecasted cases is expected, so this still presents a great risk for the region if preemptive measures are not taken.

Fig. 20 Actual and forecasted COVID-19 cumulative positive cases for Central Africa

Fig. 21 Actual and forecasted COVID-19 cumulative positive cases for Southern Africa (a)

Fig. 22 Actual and forecasted COVID-19 cumulative positive cases for Southern Africa (b)

Fig. 23 Actual and forecasted COVID-19 cumulative positive cases for Eastern Africa (a)

Fig. 24 Actual and forecasted COVID-19 cumulative positive cases for Eastern Africa (b)

Fig. 25 Actual and forecasted COVID-19 cumulative positive cases for Western Africa (a)

In Fig. 26, the remaining six countries from the Western region of Africa are shown, including the forecasted and actual cases for each country. A significant increase in the expected cases is observed in Mali. The other countries show a constant number of cases with minor fluctuations. In the Gambia, a very small decrease in the forecasted number of cumulative cases is expected by the end of the forecasting period, from 9967 to 9964. In all other countries in the Western region, the number of cases is expected to increase: from 6366 to 6565 in Niger, 6134 to 6151 in Guinea-Bissau, 30,653 to 30,909 in Guinea, 14,793 to 14,848 in Burkina Faso, 26,079 to 26,195 in Togo, 6398 to 6408 in Sierra Leone, 73,917 to 74,171 in Senegal, 211,961 to 214,460 in Nigeria, 16,074 to 19,734 in Mali, 5815 to 5838 in Liberia, and 130,077 to 131,347 in Ghana. The highest expected percentage increase, 22.77%, is in Mali.

Fig. 26 Actual and forecasted COVID-19 cumulative positive cases for Western Africa (b)

Conclusions and Suggestions

This study forecasts COVID-19 cumulative positive cases in countries from the five major regions of the African continent: the Northern, Eastern, Western, Central, and Southern regions. Containing and controlling the spread of the COVID-19 pandemic calls for strategies that can predict, in advance, the course the pandemic might take. Such forecasts enable authorities to plan ahead and to allocate resources effectively and efficiently to the most critical areas. The literature has a significant gap: few studies take a continental perspective on forecasting COVID-19, especially in Africa. This study aimed to close this gap by forecasting and investigating the expected COVID-19 cumulative positive cases over a period of sixty-one days. From the forecasted values, the study also aimed to identify the most critical countries in each of the five major regions, namely those with the highest expected percentage increase in the number of cases.

To achieve these objectives, this study employed both statistical and deep learning approaches, in the form of three prediction models: ARIMA, Prophet, and LSTM. Seven performance metrics were used to compare the three models: MSE, RMSE, MAPE, SMAPE, R² score, NRMSE, and PSNR. The best-performing model was then selected to forecast the future COVID-19 cumulative positive cases over a 61-day period. In this study, the best-performing model was the LSTM model, while the worst-performing model was the Prophet model. In the Western African region, the highest expected increase in the number of cases is 22.77%, in Mali. In the Southern region, the highest expected increase is 18.97%, in Angola. In the Northern region, the highest expected increase is in Egypt, at 11.83%. In the Eastern region, the highest increase, 10.72%, is expected in Somalia. Lastly, in the Central African region, the highest expected increase is 2.81%, in Gabon. Further studies are needed that consider the influence of population demographics on the spread of COVID-19.

Author Contributions All authors have participated in (a) the conception and design, or analysis and interpretation of the data; (b) drafting the article or revising it critically for important intellectual content; and (c) approval of the final version. This manuscript has not been submitted to, nor is under review for, another journal or other publishing venue.

Funding No funding was granted to this research work.

Data availability The dataset used as a case study was obtained and retrieved on October 1, 2021, from https://data.humdata.org/dataset/africa-covid19-infected

Declarations

Conflict of Interest The authors declare that they have no conflict of interest.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.