https://doi.org/10.1007/s42979-023-01801-5
ORIGINAL RESEARCH
Forecasting the Spread of COVID‑19 Using Deep Learning and Big Data
Analytics Methods
Cylas Kiganda1 · Muhammet Ali Akcayol1
Received: 2 June 2022 / Accepted: 22 March 2023 / Published online: 3 May 2023
© The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2023
Abstract
To contain the spread of the COVID-19 pandemic, there is a need for cutting-edge approaches that make use of existing
technology capabilities. Forecasting its spread in a single or multiple countries ahead of time is a common strategy in most
research. There is, however, a need for all-inclusive studies that cover all regions of the African continent. This
study closes this gap by conducting a wide-ranging investigation and analysis to forecast COVID-19 cases and identify the
most critical countries in terms of the COVID-19 pandemic in all five major African regions. The proposed approach lever-
aged both statistical and deep learning models that included the autoregressive integrated moving average (ARIMA) model
with a seasonal perspective, the long short-term memory (LSTM), and Prophet models. In this approach, the forecasting problem
was considered as a univariate time series problem using confirmed cumulative COVID-19 cases. The model performance
was evaluated using seven performance metrics that included the mean-squared error, root mean-square error, mean abso-
lute percentage error, symmetric mean absolute percentage error, peak signal-to-noise ratio, normalized root mean-square
error, and the R² score. The best-performing model was selected and used to make future predictions for the next 61 days.
In this study, the long short-term memory model performed the best. Mali, Angola, Egypt, Somalia, and Gabon from the
Western, Southern, Northern, Eastern, and Central African regions, with an expected increase of 22.77%, 18.97%, 11.83%,
10.72%, and 2.81%, respectively, were the most vulnerable countries with the highest expected increase in the number of
cumulative positive cases.
Keywords Deep learning · COVID-19 · Artificial neural networks · Long short-term memory · Autoregressive integrated
moving average · Prophet
SN Computer Science
374 Page 2 of 32 SN Computer Science (2023) 4:374
COVID-19 virus. In this context, the spread of COVID-19 is considered a time series problem to which deep learning forecasting algorithms and big data statistical models are applied. Among the deep learning algorithms are the long short-term memory (LSTM) model as applied by Marzouk et al. [12], Hssayeni et al. [6], Yu et al. [24], Zeroual et al. [25], Pal et al. [14] and Shastri et al. [16]; the convolutional neural network (CNN) model as applied in research by Huang et al. [8], which performs well on image data such as X-ray images; the autoencoder model, which was applied by Hu [7]; gradient boosting, which provided the best results in research conducted by Zoabi et al. [27]; and the Prophet model, which was applied by P. Wang et al. [19] to perform epidemiological trend prediction. Big data statistical models include the autoregressive integrated moving average (ARIMA) model as applied by Gebretensae and Asmelash [5] and the susceptible-exposed-infectious-removed (SEIR) model, which has been proven to be a robust model to predict the trend of COVID-19 as applied by Yang et al. [23]. Among the deep learning models used to perform time series prediction, the LSTM has been widely used due to its successful results in most research experiments. On the other hand, the ARIMA statistical model has also been widely applied in the health sector, for example, in a study by Y. W. Wang et al. [20] to predict the spread of hepatitis B disease, in the forecasting of medical service demand by Y. Huang et al. [9] and in the prediction of daily blood sampling room visits by Zhang et al. [26].

The following questions will be addressed by this research:

1. What is the best-performing prediction model given the COVID-19 cumulative positive cases data from African countries in five key regions?
2. Is it possible to estimate the total number of cumulative positive cases 61 days ahead of time using the best prediction model?
3. After a 61-day forecasting period, which countries on the African continent are in the most vulnerable position in terms of the COVID-19 virus's spread?

In this study, a comparative and analytical approach was followed to predict the spread of the COVID-19 virus. This approach includes two deep learning models and a statistical model. The deep learning models include LSTM and Prophet. The statistical model is the ARIMA model. In most studies, the ARIMA model does not include the seasonal component of the problem. However, in this study, it is modeled to take into consideration the seasonal component of the time series problem. The spread of COVID-19 was considered to be a univariate time series problem using the number of COVID-19-positive cases. In “Model Selection Criteria”, the models used in this study are discussed in detail.

This study uses the African continent as a case study. In this comprehensive approach, the African continent was broken down into five major subregions: Northern, Southern, Eastern, Western, and Central Africa. While most studies focus on a single or a few countries as a case study during the prediction of the spread of COVID-19, this study included all the African continent’s regions. In this study, the successful prediction model was selected by using seven performance indicators. The performance indicators include mean-square error (MSE), root mean-square error (RMSE), mean absolute percentage error (MAPE), symmetric mean absolute percentage error (SMAPE), R² score, normalized root mean-square error (NRMSE), and peak signal-to-noise ratio (PSNR). In “The Framework Of The Applied Approach”, the performance metrics are provided in detail. The best-performing model was then used to predict COVID-19 cases 61 days ahead. In “Results and Discussion”, the model results are provided and discussed in detail.

Related Work

In this section, prediction approaches and methods used in other research studies are addressed. These studies mainly concentrate on the prediction of the spread of COVID-19 using both statistical and deep learning tools.

In a research study by Gebretensae and Asmelash [5], the autoregressive integrated moving average (ARIMA) algorithm was used to forecast the spread of COVID-19 in Ethiopia. The autocorrelation function (ACF) and partial autocorrelation function (PACF) were used to obtain the model’s optimal terms. It was observed that the ARIMA models, ARIMA (0, 1, 5) and ARIMA (2, 1, 3), produced the best results. Ribeiro et al. [15] developed a stacking-ensemble learning algorithm that included ARIMA, cubist regression, random forest, and support vector regression. In this study, the Gaussian process was employed as a meta-learner, while the random forest, ridge regression, and other algorithms were utilized as base learners. In this study, it was observed that the support vector regression algorithm produced the best results.

Abdulmajeed et al. [1] applied a deep learning ensemble method to predict COVID-19 cases in Nigeria. The emphasis in this study was to create a prediction method that uses as little data as possible to give accurate predictions. This was because there was a problem with limited training data for models to learn the COVID-19 spread. This deep learning approach combined four prediction approaches, which included one statistical method called ARIMA. Among the other deep learning models in the ensemble approach were
the Prophet model (supported and provided by Facebook), the Holt–Winters exponential smoothing model, and the generalized autoregressive conditional heteroscedasticity (GARCH). While applying the ARIMA model, non-seasonal phenomena were used. To find the best ARIMA model, strategies such as brute-force search, autocorrelation function inspection, and partial autocorrelation function plots were used.

Wang et al. [19] used a hybrid prediction strategy to predict the COVID-19 cumulative cases in their study. This included the logistic and Prophet models. With the Prophet model, the primary focus was on modeling non-periodic changes. The model included the date and the total number of COVID-19 cases obtained from a specific country. The logistic model was used to identify the quickest rising point in the data in this hybrid method. The output of this model was then fed into the Prophet model, which was used to make the final forecast. Marzouk et al. [12] used three deep learning models to forecast the spread of COVID-19 in Egypt: the LSTM, convolutional neural network, and multilayer perceptron neural network. In this study, the COVID-19 data was modeled as time series data, and the LSTM outperformed the other two models.

Hssayeni et al. [6] used mobility data to predict the COVID-19 risk spread using the LSTM model and the gradient tree boosting model in their study. In this study, it was discovered that the number of daily cases decreased in the retiree context, while it increased in the youth context. Yang et al. [23], on the other hand, used the susceptible-exposed-infectious-removed (SEIR) and the LSTM models to forecast the spread of the COVID-19 pandemic in China. The SEIR algorithm was used to model epidemiological and mobility data by specifying parameters, and the parameter β was defined as the product of the daily number of people in contact with COVID-19 patients and the likelihood of transmission. σ was the amount of time it took for a COVID-19 patient to develop infection symptoms. Finally, γ was determined to be the average mortality or recovery rate. The rate of pandemic spread in Hubei province was determined using these parameters. These parameters were then fed into the LSTM model as input.

Zeroual et al. [25] used five models to predict new and recovered COVID-19 cases. The recurrent neural network, long short-term memory, bidirectional LSTM, gated recurrent units, and variational autoencoder were among the models used. The study was carried out in six different countries: Italy, Spain, France, China, the USA, and Australia. The variational autoencoder model produced the best results. The best model was used to forecast cases for the next 2 weeks. To forecast the positive COVID-19 outcome in a PCR test, Zoabi et al. [27] used the gradient-boosting algorithm in conjunction with the Shapley additive explanations (SHAP) bee-swarm plot. Sex, contact with COVID-19 patients, and the presence of the five most notable COVID-19 symptoms were all model input features. Techniques such as early stopping were used to improve the results.

Pal et al. [14] used the LSTM model and Bayesian optimization to determine COVID-19 risk categories. To obtain the hyperparameters, the search space had to be defined. The optimal hyperparameters were obtained and used by the model in the local trend prediction phase to perform country-specific predictions. Finally, a fuzzy rule-based risk categorization process was carried out, in which the data obtained from the previous module was used to determine each country’s risk status. This study concluded that weather had no significant impact on the spread of COVID-19.

Shastri et al. [16] conducted research on COVID-19 time series prediction and comparative analysis using variants of long short-term memory neural network models. Among them were bidirectional long short-term memory, convolutional long short-term memory, and stacked long short-term memory. Two countries, the USA and India, were used as case studies. Because models are sensitive to the size of data input values, tools like MinMaxScaler were used to perform data normalization. Various regions of the USA and India were divided into groups based on the severity of the COVID-19 situation. These were the initial, moderate, and severe groups. Regions with a high number of COVID-19 cases were classified as severe. When compared to the other two models, the convolutional LSTM model produced the best results.

In the related literature, several models have been used to forecast the spread of COVID-19 in a couple of countries. However, the African continent has not been extensively studied in this regard. This study aimed to close this gap by applying the most successful model (LSTM) among the rest of the forecasting models to conduct an extensive investigation and analysis of African states from the five major regions of the continent. In addition, the most critical states with the highest expected COVID-19 increase rate from each region were identified for immediate action in the region.

Methods and Materials

Data Gathering

Africa’s Geographical Regions and Populations

The case studies used in this study included countries from the five major regions of the African continent. These regions, as depicted in Figure 1, include the Northern, Eastern, Southern, Central, and Western regions.

Much work on the COVID-19 pandemic has been done in the literature. In some research, several or individual African countries have been used as case studies, for example, research done by Abdulmajeed et al. [1]. In this study, the
African continent is considered from a broader perspective, including countries from each of the major regions that make up the continent. This study performs a comparative analysis of the COVID-19 pandemic spread.

COVID‑19 Data

The Humanitarian Data Exchange [2] provided the COVID-19 dataset used in this research. The data of each country was first split into distinct groups based on the country's geography. The Northern, Southern, Central, Eastern, and Western regions of Africa were used in the study. Model fitting was then done for each country separately. This data was split into training and testing datasets, with the former accounting for 80% of the total data.

ARIMA Model

The ARIMA model is made up of three main parts: “AR,” “I,” and “MA.” As mentioned by Noureen et al. [13], the “AR” term refers to the autoregression parameter. This shows that the variable under consideration has a linear relationship between its present and prior values. That is to say, an AR(1) model of order one implies that the current data point in the series is based directly on the immediate past data point, while an AR(2) model implies that it is based on two past data points in the series, as noted by Kırbaş et al. [10]. The “I” component stands for the integrated element, which shows the amount of difference between the current data points and their preceding values. This is the part of the ARIMA model that handles the data stationarity requirement for better results in ARIMA time series processing, which is attained by the differencing process as explained in the research by Noureen et al. [13]. Stationarity in ARIMA processing refers to the condition in which the mean and variance of the time series data are constant over time. The last part in the basic ARIMA structure is the “MA” part, which represents the moving average. This component represents the linear combination of the error values at past intervals in the time series, as described by Ribeiro et al. [15]. The standard notation of the basic ARIMA model is ARIMA(p, d, q). The p, d, and q terms represent the autoregressive, differencing, and moving average terms as described in the research by Abdulmajeed et al. [1]. The mathematical notation for the AR(p) term can be represented as shown in Eq. 1.

$$Y_t = \delta + \varphi_1 Y_{t-1} + \varphi_2 Y_{t-2} + \cdots + \varphi_p Y_{t-p} + \varepsilon_t. \tag{1}$$
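As a minimal sketch of Eq. 1, the Python function below evaluates the AR(p) expression one step ahead, with the error term ε_t replaced by its assumed mean of zero. The function name `ar_one_step`, the sample series, and the coefficient values are illustrative assumptions and are not taken from the study.

```python
from typing import Sequence


def ar_one_step(history: Sequence[float], delta: float, phi: Sequence[float]) -> float:
    """One-step AR(p) prediction: delta + phi_1*Y_{t-1} + ... + phi_p*Y_{t-p}.

    The error term of Eq. 1 is set to zero (assumed zero-mean noise).
    """
    p = len(phi)
    if len(history) < p:
        raise ValueError("need at least p past observations")
    # history[-1] is Y_{t-1}, history[-2] is Y_{t-2}, and so on.
    lags = [history[-(i + 1)] for i in range(p)]
    return delta + sum(coef * value for coef, value in zip(phi, lags))


# Hypothetical series and AR(2) coefficients, for illustration only.
cases = [100.0, 120.0, 150.0]
forecast = ar_one_step(cases, delta=5.0, phi=[0.5, 0.25])
print(forecast)
```

In practice the coefficients would be estimated from the training data rather than chosen by hand; the sketch only shows how the lagged values enter the prediction.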
In Eq. 1, $Y_t$ denotes the time series value at a given time point t. The p, δ, and $\varepsilon_t$ denote the autoregression term, fixed value, and the error value, respectively, and $\varphi_1, \ldots, \varphi_p$ are the autoregressive coefficients. The moving average component can be defined mathematically as in Eq. 2.

$$Y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}. \tag{2}$$

In Eq. 2, q depicts the order of the moving average term. The difference term d can be obtained from Eq. 3.

$$\Delta Y_t = Y_t - Y_{t-1} = Y_t - L Y_t. \tag{3}$$

In Eq. 3, $\Delta Y_t$ denotes the stationary time series value at a time interval t, and L is the lag operator.

$$(1 - \varphi_1 L - \varphi_2 L^2 - \cdots - \varphi_p L^p)\,\Delta^d Y_t = \delta + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}. \tag{4}$$

Equation 4 combines the equations for the basic ARIMA model terms. It denotes the full ARIMA(p, d, q) model equation with the complete set of terms represented.

The partial autocorrelation function (PACF) and autocorrelation function (ACF) graphs, as shown in Fig. 2, can also be used to obtain the ARIMA model's p and q terms. The ACF plot is a graphical representation of the average correlation between data and prior values in a time series over different lag intervals. The PACF differs in that it shows the correlation at each lag after the effect of the shorter lags has been removed, as explained in the research by Noureen et al. [13].

Prophet Model

The Prophet model is a deep learning model for time series forecasting. Facebook created and maintains this model as an open-source initiative. According to Taylor and Letham [18], it is based on the generic specification of a generalized additive model (GAM), which is a regression model whose linear predictor depends on smoothing functions. GAMs can be quantitatively represented using Eq. 5.

$$g(E(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots + f_m(x_m). \tag{5}$$

In Eq. 5, Y represents the univariate response variable, $x_1, \ldots, x_m$ represent the predictor variables, and $f_1, \ldots, f_m$ represent the smoothing functions. Due to its use of the GAM formulation, the Prophet model has a variety of benefits, including flexibility and quick fitting times, and evaluates a time series problem from three perspectives: trend, seasonality, and holiday components, as discussed in research carried out by Taylor and Letham [18]. The trend component takes into account the likelihood of time series data increasing or decreasing over time. Seasonality, on the other hand, looks at data changes that happen over a short time period.

$$y(t) = g(t) + s(t) + h(t) + \varepsilon_t. \tag{6}$$

The final predicted value y(t) is obtained from a combination of the trend, seasonal and holiday component functions as shown in Eq. 6 above, where $\varepsilon_t$ represents the changes that are not captured by the model [18].

LSTM Model

The LSTM model is composed of three core components: the forget gate, input gate, and output gate [16]. The forget gate identifies the degree to which past data is discarded. The input gate receives the data that is taken into the cell’s internal state, while the output gate is used to create the next hidden state or output that is obtained from the existing internal state value.

The above figure displays the major building blocks of the LSTM model. It is evident that the main building blocks of the LSTM model consist of the forget gate, input gate
Mean Absolute Percentage Error

Equation 9 can be used to represent this performance measure numerically.

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|. \tag{9}$$

Here, $A_t$ and $F_t$ denote the actual and forecast values at time t, and n is the number of observations.

In Fig. 4, the major stages of this study include splitting the preprocessed positive COVID-19 cumulative cases data into 80% training and 20% testing datasets, fitting the models, validating the model performance using the performance metrics, and then selecting the best-performing model to forecast the future positive COVID-19 cases for the next 61 days.
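The stages described above (chronological 80/20 split, evaluation on the test portion, and selection of the best model) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the study's implementation: the series values, the candidate names `model_a` and `model_b`, and their forecasts are made up, MAPE follows the formula given above, and RMSE uses its standard definition.

```python
import math
from typing import List, Sequence, Tuple


def chronological_split(series: Sequence[float], train_fraction: float = 0.8) -> Tuple[List[float], List[float]]:
    """Split a time series in time order, without shuffling."""
    cut = int(len(series) * train_fraction)
    return list(series[:cut]), list(series[cut:])


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    return 100.0 / len(actual) * sum(abs((a - f) / a) for a, f in zip(actual, forecast))


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean-square error (standard definition)."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


# Hypothetical cumulative-case series, for illustration only.
series = [100.0, 110.0, 125.0, 140.0, 160.0, 175.0, 190.0, 200.0, 210.0, 220.0]
train, test = chronological_split(series)

# Made-up test-period forecasts standing in for the outputs of fitted models.
candidates = {"model_a": [205.0, 225.0], "model_b": [180.0, 190.0]}
scores = {name: mape(test, pred) for name, pred in candidates.items()}
best = min(scores, key=scores.get)  # lower MAPE is better
print(best, scores[best])
```

The split keeps the observations in time order because a forecasting model must be evaluated on data that comes after its training period.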
[Fig. 4 (flowchart): training phase (ARIMA, LSTM, Prophet), testing phase, evaluation, forecasting phase (61-day forecasting)]
This is a statistical method that uses regression in which past data points and errors are connected using weight factors, which improves the overall prediction results, as described in the research by Singh et al. [17]. This model also amalgamates the strengths of both the autoregression and moving average models, which further makes it a robust choice that extracts the inherent statistical relationship between the dependent and independent variables. It is a flexible model to use, since it incorporates the difference between data points both in the past and present context, which makes it able to handle and process data which is not stationary using a few parameters as described by Abdulmajeed et al. [1]. Another factor lies in the fact that it is easier to obtain the optimal parameter terms of this model using simple methods like the PACF and ACF plots, as described in the research by Gebretensae and Asmelash [5]. Also, metrics such as the Akaike information criterion and Bayesian information criterion make it possible to measure how good the ARIMA

In this study, countries from the African continent were grouped into the five groups named in “Data Gathering”. Three forecasting models were used: ARIMA, LSTM, and Prophet. In this section, the performance results obtained from these models are given for each region of Africa.

Model Training and Testing

Northern Africa

In the Northern region of Africa, of the six countries studied, the most populated country is Egypt, as shown in Fig. 2, with a population of 102,334,404, while the least populated country is Mauritania, with a population of 4,649,658, as reported by Worldometer [3].

In Fig. 5, it can be seen that Morocco has maintained the highest number of COVID-19 cases over time. This was
followed by Tunisia in this critical condition. Mauritania, on the other hand, has the lowest number of cases over time compared to other states in this region. Libya has a relative increase in cases, with a gradual increase occurring between the months of October 2020 and July 2021. Beyond the month of July, a sharp increase that slowly reduces toward the month of October is observed. This clearly describes the first wave of COVID-19 cases in Libya. Algeria's trend is similar to Libya’s. However, it is observed that the cases reach a constant number, while in Libya there is an increase.

According to Fig. 6, it is observed that the LSTM model fits better than both the ARIMA and Prophet models. In Tunisia, it can be observed that the Prophet model performs the worst in predicting the test data. This is because while the test data flattens to a constant case value, the Prophet model predicts a sharp increase of over 800,000 cases. In countries like Egypt and Tunisia, the ARIMA and Prophet models predicted lower and higher cases, respectively, with respect to the actual data. Apart from these two countries, in the four other countries, both models predicted lower cumulative positive cases with regard to the actual data. This confirms the poor performance of these two models when compared to the LSTM model, which predicts results close to the actual data in every country except Egypt.

In Table 1, the best results in terms of the PSNR and R² values can be observed with larger numbers, which implies that the greater the number, the better the model’s relative performance.

Central Africa

In this region, five states were studied. At the time of this study, the most populated state in this group was Cameroon, with a population of 26,545,863 [3]. On the other hand, the least populated state is São Tomé and Príncipe, whose population is 219,159.

In Fig. 7 above, the COVID-19 cumulative cases from the five countries in this region have been given. According to this graph, COVID-19 cases in Cameroon are higher than in the rest of the countries, with more than two significant waves. Cameroon is followed by Gabon, which also has more than two waves. The rest of the countries maintain a nearly constant curve, with minor increases in COVID-19 cases. The lowest number of cases is seen in São Tomé and Príncipe. A positive correlation is observed between the population variable and the number of cases. This is because the highest number of cases is observed in Cameroon, which is also the most populated state in this region [3]. On the other hand, it can also be observed that the lowest number of cases is observed in São Tomé and Príncipe, the country with the smallest population. This makes Cameroon the member with the highest risk in terms of COVID-19 spread in this region.
The highest PSNR and R² values were obtained by the LSTM model in Mauritania. These values were −1.0514 and 0.9962, respectively. For the
rest of the performance metrics other than the PSNR and R² values, the best results are observed with lower values. It is also evident in Mauritania
that the lowest RMSE value of 287.8118 was obtained from the LSTM model. The ARIMA and Prophet models produced MAPE ranges of
2.0052–25.0037 and 16.7849–26.4318, respectively. On the other hand, it was observed that the LSTM model produced the lowest MAPE range of 0.7551–5.7408.
The highest MAPE value for the LSTM model is clearly observed to be lower than the lowest MAPE values for both the ARIMA and Prophet
models. This makes the LSTM the best-performing model in predicting the COVID-19 cumulative cases in the Northern African region. Among the
countries in this region, the best model performance was observed in Mauritania, while the worst model performance was observed in Morocco,
with an RMSE value of 280882.9106 by the Prophet model.
In Table 2, it is observed that the lowest RMSE value of 13.2742 was obtained by the LSTM model from Chad, while the highest RMSE
of 26508.4573 was obtained by the Prophet model in Cameroon. It is evident that the lowest and highest PSNR values of 40.3369 and 16.3165
were observed in Cameroon and São Tomé and Príncipe by the Prophet and LSTM models, respectively. The best MAPE range of 0.2212–
7.4279 was obtained by the LSTM model, followed by 0.7272–1.1389 and 4.7480–28.8557 by the ARIMA and Prophet models, respectively. In this region,
the best model performance was obtained by the LSTM model, while the worst model performance was seen in the Prophet model.
countries in the same region are experiencing their second wave of virus spread, South Africa is observed to have three waves. Since it has the largest population, there is a positive correlation between the large number of cases observed and the large population.

For clarity, in Fig. 10, South Africa was excluded to allow a comparative analysis of the COVID-19 state in other countries in the same region. It can be observed that, apart from South Africa, Zambia has the largest number of cases compared to other countries. It is also the first country to show an increase in the number of cases. It is also observed that all countries have had their second major wave of COVID-19 spread. It is worth noting that the lowest number of cases was observed in Lesotho. Beyond the month of October, it is clearly observed that in all countries, the number of cases is constant and the curves have flattened. This signifies the effects of some form of control of the spread through practices such as quarantines and vaccinations.

In Fig. 11, in three countries (Botswana, Malawi, and Mozambique), the LSTM model provided the best-matching prediction results. In Lesotho, the ARIMA model performed better than the other two models. The Prophet model emerged as the worst performer, as clearly observed in four countries: Malawi, Mozambique, Eswatini, and Lesotho. In these countries, this model predicts a roughly constant number of cases, with slight increases in the predicted number of cases. In Angola, both the LSTM and Prophet models produced similar predictions close to the actual data, while the ARIMA model predicted a lower number of cases, quite different but also substantially close to the actual data. It is in this country that the three models show a significant uniformity in their predicted results. This can be generally attributed to the smooth rise in the number of cases in Angola, which makes it easier for all the models to capture the inherent data relationships and trends and so make better predictions.

In Fig. 12, it is observed that the ARIMA model performed the worst when compared to the other models. This model made predictions that were generally higher than the actual data. In all four countries, the ARIMA model predicts a higher number of cases than the numbers predicted by the rest of the models. The LSTM model is also observed to provide the best performance with the best-matching predictions. The LSTM model is followed by the Prophet model, with the second-best prediction performance. In the Southern African region, the LSTM model is observed to provide the best overall prediction results compared to the ARIMA and Prophet models, as shown in both Figs. 11 and 12, while the worst prediction results are observed from the ARIMA model.

Table 3 displays the performance metrics used to determine the best prediction model in the Southern African region.
Fig. 9 Cumulative positive cases in the Southern African region including South Africa
Fig. 10 Cumulative positive cases in the Southern African region excluding South Africa
region. If immediate measures are not taken, there is a higher chance of a faster spread to other countries too.

Figure 14 displays the prediction results of the three models in the region of Western Africa. In this first group of countries from this region, it can be observed that the LSTM model outperformed the other two models in producing the best-matching prediction results. This can be clearly observed in countries like Guinea, Guinea-Bissau, Gambia,
Fig. 11 Actual and predicted cumulative positive cases for Southern Africa (a)
Ghana, and Togo. In Burkina Faso, the Prophet model manages to make the most successful prediction. The ARIMA and Prophet models are observed to make matching predictions in three countries: Guinea-Bissau, Ghana, and Togo. These predictions suggest a lower COVID-19 case number when compared to the actual data. This provides further evidence of how these two models perform poorly when compared to the LSTM model. In Fig. 15, the second group of model predictions in the Western region of Africa is given. According to this figure, it can be observed that the best model prediction performance in Niger is obtained from the Prophet model. This is the only country where this model performs best when its performance is compared to the remaining countries. It can also be concluded from this figure that the ARIMA model did not display any top performance in any of the countries. In all the six countries in this group in the Western region of Africa, the LSTM model maintains the best-matching prediction results, which continues to affirm the LSTM model as the top performing model in this region. In Nigeria, both the ARIMA and Prophet models make predictions that match each other but are still lower than and significantly different from the actual data. These results
Fig. 12 Actual and predicted cumulative positive cases in Southern Africa (b)
prove the LSTM model to be the best prediction model in the West African region.
In Table 4, the prediction results based on the seven metrics used in this study for the three models are provided for the 12 countries from the Western region of Africa.

Eastern Africa

From this region, 12 countries were studied. Among these, the Comoros is the least populated country, with a population of 869601, while the most populated country is Ethiopia, with a population of 114963588 at the time of this study.
The cumulative positive COVID-19 cases for the countries in the Eastern region of Africa have been given in the plot in Fig. 16. It is clear that in this region, the highest number of cases is obtained in Ethiopia, which is followed by Kenya. It is worth noting that the population of Kenya, at 53771296 people, immediately follows that of Ethiopia, while at the same time, its number of cumulative cases immediately follows that of Ethiopia, which suggests a roughly positive correlation between the population size and the number of confirmed cases. If proactive measures are not applied, the Eastern region is at a higher risk of experiencing a surge in the spread of COVID-19. In the region, there was a relatively late occurrence of the first cases, which is observed from the fact that significant numbers of cases started to be registered just after July 2020 in all countries. In this region, Kenya is observed to have the highest number of waves of the COVID-19 spread. Apart from Ethiopia, Kenya, Uganda, Rwanda, Madagascar, and Sudan, the rest of the countries are observed to have a relatively slow increase in the number of cases reported. This can be due to varying measures that might have been taken by the respective countries and also the general population, for example, in the Comoros, the least populated country in this region.
Both Figs. 16 and 17 display the prediction results from the LSTM, ARIMA, and Prophet models in the 12 countries used in this study from the Eastern region of Africa. These results display both the plots of the predicted data by the models and the expected actual data. It is observed from Fig. 16 that all three models performed relatively well in the Comoros, followed by Sudan, as displayed in Fig. 17. In the rest of the countries, in both figures, it can be observed
Country       Model    PSNR      R          NRMSE   SMAPE    RMSE         MSE                MAPE
South Africa  ARIMA    −68.6754  −5.7832    0.7905  16.5715  692322.2002  479310000000.0000  19.0097
              LSTM     −39.7648  0.9913     0.0283  0.7800   24818.6368   615964732.6000     0.7830
              Prophet  −68.2322  −5.1251    0.7512  26.7633  657882.1667  432809000000.0000  23.5047
Zambia        ARIMA    −48.9441  −48.6151   1.7532  23.3656  71407.6662   5099054792.0000    28.1140
              LSTM     −6.7122   0.9970     0.0136  0.2252   552.2662     304997.9557        0.2252
              Prophet  −44.8702  −18.4190   1.0968  24.6496  44673.6041   1995730903.0000    21.8100
Namibia       ARIMA    −50.7674  −168.8737  3.0293  40.7690  88086.5415   7759238793.0000    57.0550
              LSTM     −11.0236  0.9820     0.0312  0.6692   907.2334     823072.4421        0.6697
              Prophet  −39.0956  −10.5598   0.7902  19.9281  22978.5282   528012758.2000     17.8522
Eswatini      ARIMA    −35.7542  −1.6589    0.5871  40.7329  15640.5055   244625412.3000     32.6660
              LSTM     −25.7619  0.7336     0.1858  10.6050  4950.3420    24505885.9200      9.5222
              Prophet  −38.0908  −3.5536    0.7683  58.2823  20468.2129   418947739.3000     43.3898
Lesotho       ARIMA    −8.5919   0.9618     0.0775  0.6781   685.6982     470182.0215        0.6157
              LSTM     −21.4589  0.2614     0.3409  14.5044  3016.3651    9098458.4170       16.4016
              Prophet  −25.6834  −0.9537    0.5545  23.6798  4905.8203    24067072.8200      20.2240
Malawi        ARIMA    −28.4637  −0.0858    0.2957  9.3612   6756.5296    45650692.2400      9.5218
              LSTM     −20.0071  0.8451     0.1117  3.3325   2552.0723    6513073.0240       3.4858
              Prophet  −38.2399  −9.3125    0.9113  41.4722  20822.5839   433580000.3000     34.0048
Mozambique    ARIMA    −42.8290  −2.4604    0.5430  14.7160  35317.3358   1247314208.0000    16.9596
              LSTM     −21.2013  0.9762     0.0450  1.9387   2928.2214    8574480.5670       1.9128
              Prophet  −47.2140  −8.4979    0.8996  50.2045  58511.4488   3423589641.0000    39.7238
Botswana      ARIMA    −45.9388  −1.1362    0.4543  34.8145  50522.0752   2552480083.0000    29.1172
              LSTM     −21.9147  0.9915     0.0286  1.9982   3178.8901    10105342.2700      2.0284
              Prophet  −47.6825  −2.1915    0.5553  44.8292  61753.7964   3813531370.0000    35.9618
Angola        ARIMA    −28.5294  0.3054     0.2763  9.3038   6807.8362    46346633.7300      8.6089
              LSTM     −19.7646  0.9077     0.1007  3.0467   2481.8282    6159471.2140       2.9731
              Prophet  −21.5315  0.8613     0.1234  5.5594   3041.7034    9251959.5740       5.7561
Zimbabwe      ARIMA    −50.9603  −25.8469   1.3262  38.1986  90064.6700   8111644782.0000    54.1828
              LSTM     −19.8316  0.9793     0.0368  1.6663   2501.0312    6255157.0630       1.6868
              Prophet  −48.4614  −14.1010   0.9947  75.8921  67547.6906   4562690505.0000    54.7687
The highest R value of 0.9970 was obtained from Zambia by the LSTM model, while the smallest R value of −168.8737 was observed from Namibia by the ARIMA model. On the other hand, the smallest PSNR value of −68.6754 was obtained from South Africa by the ARIMA model, while the highest PSNR value of −6.7122 was obtained by the LSTM model from Zambia. Both the PSNR and R metrics suggest that the LSTM model is the best prediction model in this region, while the worst prediction model in this region is ARIMA. The RMSE metric ranges of 552.2662–24818.6368, 685.6982–692322.2002, and 3041.7034–657882.1667 were obtained by the LSTM, ARIMA, and Prophet models, respectively. It is also evident from this metric that the best RMSE range was produced by the LSTM model compared to the rest of the models. With the smallest value of the MAPE metric of 0.2252 from Zambia, the overall best-performing model in the Southern Africa region is observed to be the LSTM model, while the worst-performing model, with the largest MAPE of 57.0550 from Namibia, is observed to be the ARIMA model.
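The per-model RMSE ranges quoted in this note can be recomputed from the table. The following is a minimal Python sketch, not the authors' code. It assumes that the fifth numeric column of the table is the RMSE (this is the column whose extremes match the ranges in the note), and the helper names `metric_range` and `best_model` are hypothetical.

```python
# RMSE values transcribed from the Southern Africa table above
# (country -> model -> RMSE). Treating this column as RMSE is an
# assumption based on the ranges quoted in the table note.
RMSE = {
    "South Africa": {"ARIMA": 692322.2002, "LSTM": 24818.6368, "Prophet": 657882.1667},
    "Zambia": {"ARIMA": 71407.6662, "LSTM": 552.2662, "Prophet": 44673.6041},
    "Namibia": {"ARIMA": 88086.5415, "LSTM": 907.2334, "Prophet": 22978.5282},
    "Eswatini": {"ARIMA": 15640.5055, "LSTM": 4950.3420, "Prophet": 20468.2129},
    "Lesotho": {"ARIMA": 685.6982, "LSTM": 3016.3651, "Prophet": 4905.8203},
    "Malawi": {"ARIMA": 6756.5296, "LSTM": 2552.0723, "Prophet": 20822.5839},
    "Mozambique": {"ARIMA": 35317.3358, "LSTM": 2928.2214, "Prophet": 58511.4488},
    "Botswana": {"ARIMA": 50522.0752, "LSTM": 3178.8901, "Prophet": 61753.7964},
    "Angola": {"ARIMA": 6807.8362, "LSTM": 2481.8282, "Prophet": 3041.7034},
    "Zimbabwe": {"ARIMA": 90064.6700, "LSTM": 2501.0312, "Prophet": 67547.6906},
}


def metric_range(table, model):
    """Return (min, max) of one model's metric values across all countries."""
    values = [row[model] for row in table.values()]
    return min(values), max(values)


def best_model(table, country):
    """Return the model with the lowest error value for one country."""
    row = table[country]
    return min(row, key=row.get)


for model in ("LSTM", "ARIMA", "Prophet"):
    low, high = metric_range(RMSE, model)
    print(f"{model}: {low}-{high}")
```

With these values, `best_model` returns LSTM for every country except Lesotho, where ARIMA has the lowest RMSE.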
that the three models show significant relative discrepancies in performance. In Fig. 16, both the LSTM and ARIMA models obtained better-matching prediction results when compared with the Prophet model in Madagascar. In Fig. 16, the worst model performance is observed in both Djibouti and Madagascar by the Prophet model. On the other hand, the best model performance is evidently obtained by the LSTM model in all countries represented by the same figure. In Fig. 17 too, the LSTM model is observed to have the overall best-matching prediction results when compared to the ARIMA and Prophet models. In both Mauritius and Rwanda, the worst model performance can be observed from both the ARIMA and Prophet models. In this particular scenario, both models predicted results that deviated greatly from the actual data. These results show that the LSTM model outperformed the ARIMA and Prophet models in the Eastern region.
In Table 5, the three model performances have been given for the 12 countries from the Eastern region of Africa.
Figure 18 displays the overall combined model performance from all individual regions used in this study. It shows the percentage distributions both in the positive and
negative directions to quantify each model’s performance depending on its contribution to the total error value for the seven error metrics used in this study. In both PSNR and R, good performance is indicated by having more distribution toward the positive direction, just as bad performance can be observed by having a more negative percentage distribution. For the RMSE, MAPE, NRMSE, SMAPE, and MSE errors, good performance can be observed in having smaller percentage distributions tending in the positive direction. On the other hand, bad performance for the models can be observed in having a large positive percentage distribution. The RMSE, MAPE, NRMSE, SMAPE, and MSE metrics clearly indicate that the overall best performance in this study was obtained by the LSTM model, followed by the ARIMA model, and lastly, the Prophet model. This is because the LSTM model is observed to have obtained the smallest percentage distribution of the total error in all these five metrics. The ARIMA model follows, with relatively larger percentage distributions than the LSTM model, but smaller compared to the Prophet model. The PSNR and R values also confirm that the LSTM model outperforms the other two models. Both the PSNR and R values for the LSTM model tend toward the positive direction, showing that it achieved the highest values for these two metrics compared to the ARIMA and Prophet models. It is again followed by the ARIMA and, lastly, the Prophet model. The LSTM model's performance is owed to the fact that it can process and handle sequential data of all natures, while the other two models are affected by the quality of their inherent data properties. The ARIMA model works best with stationary data, and also requires a larger amount of data to fit well. With data that is not stationary, the ARIMA model performs poorly. The data used in this study was small in amount due to the fact that the COVID-19 pandemic is still a new ordeal with little data available. In most countries, the datasets could not be made sufficiently stationary, despite the differencing efforts to make them so during ARIMA model fitting. All of these factors contribute to its poor performance when compared to the LSTM model. On the other hand, in this study, it is observed that the overall worst-performing model is the Prophet model. Despite its ease of setup and not requiring data preprocessing, this Fourier series-based model failed to find and learn significant trends, seasonality, and holiday structures within the data to make best-matching predictions, which is because of the limited data available for training. The fact that the LSTM model has several hyperparameter tuning points made it possible for it to be tuned until the best-matching results were reached. When compared to the other two models, the computational and time complexity of the LSTM model in order to achieve optimal results was the highest.
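The differencing step mentioned for ARIMA fitting can be illustrated with a short sketch. This is a minimal Python example under stated assumptions, not the procedure used in the study: the helper name `difference` and the sample counts are hypothetical, and the number of passes corresponds to the "I" (integration) order usually written d in ARIMA(p, d, q).

```python
def difference(series, order=1):
    """Apply first differencing (y[t] - y[t-1]) `order` times."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


# Hypothetical cumulative case counts, for illustration only
# (not taken from the dataset used in the study).
cumulative = [100, 110, 125, 145, 170, 200]

daily_new = difference(cumulative, order=1)  # day-over-day new cases
second = difference(cumulative, order=2)     # change in daily new cases

print(daily_new)
print(second)
```

A cumulative series never decreases, so it always carries a trend. Differencing once yields daily new cases, and a second pass removes a linear trend in those daily counts, which is why more than one pass may be tried before a series looks stationary.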
Fig. 14 Actual and predicted cumulative positive cases in Western Africa (a)
Forecasting for the Next 61 Days

In this study, after determining the best prediction model through the training and testing processes, the second major phase involved the forecasting of the cumulative positive cases by the best-performing model for each country for a period of 61 days. At the time of access to the main COVID-19 case dataset used in this study, the last date of the reported cases for each country in all regions was 2021-11-1. Cumulative positive cases were then forecasted from the last date of the original dataset up to the date of 2022-01-02 for each country in the five major regions of the African continent.

Northern Africa

As displayed in Fig. 19, the COVID-19 cumulative positive cases are expected to have a fast increasing rate in Egypt as well. While in countries like Tunisia, Algeria, and Mauritania, cases are expected to maintain a flat rate of increase, in Libya it is expected to show a gradual increase in the rate of
increase. In Morocco, a notable slight decrease is expected, after which a constant number of cases with a small increase at the end of the forecasting period is expected. At the end of the prediction period, all these countries in Northern Africa that reported cumulative cases are expected to show an increase. In Algeria, Mauritania, Tunisia, Egypt, Libya, and Morocco, cases are expected to increase from 206452 to 208009, 37320 to 38250, 712747 to 716835, 331017 to 370164, 357338 to 369986, and 946145 to 947226, respectively. With an 11.83% increase in the number of cases at the end of the forecasting period, it is observed that Egypt is the country in this region with the largest expected increase in the number of COVID-19 cumulative positive cases.

Central Africa

In Fig. 20, the forecasted cases for the five Central African countries have been plotted. In Cameroon, the cases are expected to slightly drop to a constant rate of increase. In Gabon and Equatorial Guinea, a gradual increase is expected, while in Chad and São Tomé and Príncipe, a constant rate of change in the cases is expected. At the end of
As annotated in the table based on the PSNR metric, the best model performance was obtained from the LSTM model in Sierra Leone, with a PSNR value of 22.5264. On the other hand, the worst performance was observed in Nigeria, with a PSNR value of −39.5807, which was obtained by the ARIMA model. From the R metric, it is also observed that the highest value of 0.9965 was produced by the LSTM model in Mali, while the lowest value of −558.9784 was obtained by the ARIMA model in Liberia. It is worth noting that higher PSNR and R values imply better results. Values obtained with these metrics show that the LSTM model outperforms the other two models in the Western African region as well. The lowest RMSE value of 19.0642, obtained by the same model in Sierra Leone, further reinforces this observation, while the worst performance with regard to the same metric can be seen in Nigeria by the ARIMA model, with the highest value of 24298.1514. When the MAPE metric ranges from all three models are taken into consideration, it is observed that the LSTM model provides the best range, 0.1375–8.7542, which is followed by the ARIMA model with a range of 1.9840–72.6324 and the Prophet model with a range of 0.9659–30.8797. These performance metric results affirm that the best-performing model is the LSTM model in the Western region.
the forecasting period in Cameroon, a decrease in the number of cases is expected to occur from 102499 to 102129. In the Central African region, Cameroon is the only country with an expected decrease in the number of cases.
The rest of the countries are expected to experience an increase in the number of cases. Cases are expected to increase from 35525 to 36522, 5069 to 5072, 13368 to 13508, and 3714 to 3717 in Gabon, Chad, Equatorial Guinea, and São Tomé and Príncipe, respectively. The largest increase in the number of cases in this region is expected to occur in Gabon, with an expected percentage increase of 2.81%.

Southern Africa

For the sake of clarity, countries from the Southern African region were separated into two plots showing the forecasted cumulative cases. This is because the number of cases in South Africa is so much bigger than in the rest of the countries in this region. Plotting them together would result in the plots for the other countries being stacked together and impossible to examine. In Fig. 21, a plot for the actual and forecasted cumulative cases for seven countries in the Southern African region is provided. According to this figure, it is observed that in Angola, the expected rate of increase in the cumulative positive cases is higher than in the rest of the countries. Angola is followed by Lesotho, with a moderate rate of increase in the number of cumulative cases. Lesotho is in turn followed by Botswana, with a small but notable increase in the cumulative cases. The rest of the countries, apart from these three, are observed to maintain a constant number of cases with insignificant increases.
Figure 22 is a continuation of Fig. 21, which also shows a plot of the forecasted cases and actual cases for three countries in the Southern African region. In both Zambia and Mozambique, the number of cumulative cases is expected to maintain a constant course while a significant gradual increase in the number of cumulative cases is expected to occur. At the end of the forecasting period among the countries of this region, it is only in Mozambique that the number of COVID-19 cumulative cases is expected to decrease from 151292 to 151051. In the rest of the countries, the cases are expected to increase. In Angola, Botswana, Malawi, Namibia, South Africa, Zambia, Eswatini, Lesotho, and Zimbabwe, the number of cases is expected to increase from 64433 to 76655, 186594 to 193024, 61796 to 63201, 128886 to 129401, 209734 to 210955, 46421 to 46874, 21635 to 24334, and 132977 to 133267, respectively. In this region, the highest percentage increase is observed to be 18.97% from Angola.

Eastern Africa

Forecasted cases in the Eastern region of Africa have been plotted in two separate graphs (Figs. 23 and 24). This made it possible to analyze and observe clearly the forecasted cases in all countries studied in this region.
In Fig. 23, a plot of the actual and forecasted cases for seven countries from the Eastern African region has been
Fig. 17 Actual and predicted cumulative positive cases for Eastern Africa (a). Actual and predicted cumulative positive cases for Eastern Africa
(b)
given. This forecast has been produced by the top performing model, which is the LSTM in most countries. According to this forecast, it is observed that in two countries, Rwanda and Mauritius, there is an expected gradual increase in the rate of increase of cumulative positive cases. Apart from these two countries and Djibouti, which are expected to have the same number of cases, the rest of the countries are expected to have small fluctuations in the number of cases.
In Fig. 24, five countries in the Eastern region of Africa have been shown with their respective COVID-19 cumulative positive cases. In Kenya, a constant number of cases is expected, while in Ethiopia and Somalia, a notable increase is expected to occur. In both Uganda and Sudan, a small increase, which will be followed by a small but significant decrease, is expected to take place.
At the end of the forecasting period in Djibouti, the cases are expected to remain constant. The previous number in the original dataset was 13478 cases, which was expected to remain the same at the end of the forecast for Djibouti. In Eritrea, a small decrease is expected to happen from 6834 to 6820 cases. On the other hand, in the
Fig. 17 (continued)
rest of the countries, an increase is expected by the end of the forecasting period. In these countries, Uganda, Sudan, Madagascar, Kenya, South Sudan, Somalia, Rwanda, Mauritius, Ethiopia, and Comoros, cases are expected to increase from 126236 to 127628, 40433 to 40598, 43626 to 44150, 253310 to 253901, 12410 to 12761, 21998 to 24356, 99698 to 102205, 17812 to 18297, 365167 to 377935, and 4259 to 4472, respectively. The highest expected increase in the cumulative number of cases is observed to take place in Somalia, with a 10.72% expected percentage increase.

Western Africa

The forecasted cases from the Western African countries were grouped into two groups. As shown in Figs. 25 and 26, six countries were plotted together in each group. This was done in order to separate countries that have closer numbers of cumulative cases for a clear analysis of the results from the forecasting stage.
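The regional percentage increases quoted in this section follow from the start and end counts of the forecast. The sketch below is a minimal Python illustration; the helper name `percentage_increase` is hypothetical, and the count pairs are the ones reported in the text for the Northern, Central, Southern, and Eastern regions.

```python
def percentage_increase(start, end):
    """Relative change from `start` to `end`, in percent."""
    return (end - start) / start * 100.0


# (last observed count, forecasted count) pairs quoted in the text.
forecasts = {
    "Egypt": (331017, 370164),
    "Gabon": (35525, 36522),
    "Angola": (64433, 76655),
    "Somalia": (21998, 24356),
    "Eritrea": (6834, 6820),  # a forecasted decrease gives a negative value
}

for country, (start, end) in forecasts.items():
    print(f"{country}: {percentage_increase(start, end):.2f}%")
```

Rounded to two decimals, the first four values reproduce the 11.83%, 2.81%, 18.97%, and 10.72% figures given for Egypt, Gabon, Angola, and Somalia.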
Seven metrics have been used to measure the models' performance. Of these metrics, the PSNR and R values show the best performance for higher numbers, while the rest of the metrics display the best performance for lower values. Based on the PSNR metric, it is observed that the LSTM model obtained the highest value of 18.6599 from the Comoros. Using the same metric, it is evident that in Ethiopia the worst performance, with the smallest PSNR value of −45.0386, was obtained by the ARIMA model. This implies that there is a significantly greater amount of noise in the predicted data by the ARIMA model, which directly reflects poor performance. This is also evident in Fig. 17, whereby there is a greater deviation of the predictions from the ARIMA model with regard to the actual data. In Madagascar, the Prophet model is observed to have the worst R value of −1882.7825. This implies that the predicted values by the Prophet model in Madagascar have the poorest correlation with the actual data in this country when compared with the other two model predictions and their respective actual values in both Madagascar and the other countries. Using the same metric, it is observed that the LSTM model in Rwanda obtained the best and highest value of 0.9968. This shows a strong correlation between the LSTM predicted cases and the actual cases, as further represented in Fig. R in Rwanda. When ranked by the RMSE ranges, it is observed that the best range is 29.7540–13506.4698, which is obtained by the LSTM, followed by the Prophet model with a range of 97.0829–44463.6562 and lastly, the ARIMA model with a range of 185.4063–45548.2859. It is clear from this metric that the most accurate model is the LSTM model in the Eastern region.
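The error metrics named in this note can be sketched with their common textbook definitions. The Python block below is illustrative only and is not the authors' implementation. Three points are assumptions: the SMAPE variant (several exist), the reading of the R column as the R2 score, and the PSNR peak value of 255, which is inferred here because it reproduces the reported pairing of the lowest LSTM RMSE (29.7540) with the highest PSNR (18.6599), assuming both come from the same country.

```python
import math


def mse(actual, predicted):
    """Mean-squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean-square error."""
    return math.sqrt(mse(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric MAPE, in percent (one common variant; an assumption)."""
    terms = (2.0 * abs(p - a) / (abs(a) + abs(p)) for a, p in zip(actual, predicted))
    return 100.0 * sum(terms) / len(actual)


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def psnr_from_rmse(rmse_value, peak=255.0):
    """PSNR in dB; the peak of 255 is inferred from the reported values."""
    return 20.0 * math.log10(peak / rmse_value)


print(round(psnr_from_rmse(29.7540), 4))
```

Under these definitions, PSNR turns negative as soon as the RMSE exceeds the peak value, and the R2 score can fall far below −1 when a model fits worse than the mean of the actual data, which is consistent with values such as −1882.7825.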
[Figure: percentage error distribution (−100% to 100%) per metric (RMSE, MAPE, PSNR, R, NRMSE, SMAPE, MSE) for the Prophet, ARIMA, and LSTM models]
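One plausible reading of the percentage distribution in this chart is that, for each metric, every model's value is expressed as a share of the summed absolute values of the three models, so that negative metric values plot in the negative direction. The Python sketch below illustrates that reading only; it is an assumption, not the authors' documented procedure, and the helper name `percentage_shares` is hypothetical. The sample values are the Zambia rows of the Southern Africa table, whereas the chart itself aggregates all regions.

```python
def percentage_shares(values):
    """Express each model's value as a signed share (in percent) of the
    summed absolute values across models."""
    total = sum(abs(v) for v in values.values())
    return {model: 100.0 * v / total for model, v in values.items()}


# Zambia values from the Southern Africa table, for illustration only.
rmse_values = {"ARIMA": 71407.6662, "LSTM": 552.2662, "Prophet": 44673.6041}
psnr_values = {"ARIMA": -48.9441, "LSTM": -6.7122, "Prophet": -44.8702}

rmse_shares = percentage_shares(rmse_values)
psnr_shares = percentage_shares(psnr_values)

for model in rmse_values:
    print(f"{model}: RMSE {rmse_shares[model]:.2f}%, PSNR {psnr_shares[model]:.2f}%")
```

In this example the LSTM model holds well under 1% of the total RMSE, and its PSNR share is the least negative of the three, matching the direction of the comparison described in the text.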
Fig. 19 Actual and forecasted COVID-19 cumulative positive cases for Northern Africa
In Fig. 25, six countries from the Western region of Africa, including their respective forecasted and actual cumulative cases, are shown. According to this figure, it is clear that the expected cases in Guinea will have a small increase, which is immediately followed by a generally constant number of cases. In the rest of the five countries, a constant number of cases is expected, with small fluctuations by the end of the forecasting period. Since all countries in this figure maintained their respective fluctuation courses in the number of cases, it is evident that countries with a higher number of cases before
Fig. 20 Actual and forecasted COVID-19 cumulative positive cases for Central Africa
Fig. 21 Actual and forecasted COVID-19 cumulative positive cases for Southern Africa (a)
the forecasting processes maintained these higher numbers after forecasting. Countries such as Guinea, with the highest number of actual cases, are still expected to have the highest number of forecasted cases, as depicted in Fig. 25. Since there is no expected significant decrease in the forecasted cases, this
Fig. 22 Actual and forecasted COVID-19 cumulative positive cases for Southern Africa (b)
Fig. 23 Actual and forecasted COVID-19 cumulative positive cases for Eastern Africa (a)
still presents a great risk for the region if preemptive measures are not taken.
In Fig. 26, the rest of the six countries from the Western region of Africa are given, including the forecasted and actual cases in each state. A significant increase in the expected cases
Fig. 24 Actual and forecasted COVID-19 cumulative positive cases for Eastern Africa (b)
Fig. 25 Actual and forecasted COVID-19 cumulative positive cases for Western Africa (a)
in Mali is observed, while in the rest of the countries, a constant number of cases with minor fluctuations is observed. In Gambia, a very small decrease is expected to occur in the forecasted number of cumulative cases at the end of the forecasting period. In this country, cases are forecasted to decrease
Fig. 26 Actual and forecasted COVID-19 cumulative positive cases for Western Africa (b)
from 9967 to 9964. In other countries in the Western region, apart from the Gambia, there is an expected increase in the number of cases. The COVID-19 cumulative positive cases are expected to increase from 6366 to 6565, 6134 to 6151, 30653 to 30909, 14793 to 14848, 26079 to 26195, 6398 to 6408, 73917 to 74171, 211961 to 214460, 16074 to 19734, 5815 to 5838, and 130077 to 131347 in Niger, Guinea-Bissau, Guinea, Burkina Faso, Togo, Sierra Leone, Senegal, Nigeria, Mali, Liberia, and Ghana, respectively. According to these results, the highest expected percentage increase of 22.77% is expected to occur in Mali.

Conclusions and Suggestions

This study involves the forecasting of COVID-19 cumulative positive cases in countries from the five major regions of the African continent, which include the Northern, Eastern, Western, Central, and Southern regions. To contain and control the spread of the COVID-19 pandemic, there is a great need for strategies that can predict the future course that the pandemic might take beforehand. This is because it would enable authorities to plan ahead of time and eventually allocate resources effectively and efficiently to more critical areas. There is a significant gap in the literature for studies that consider a continent’s perspective, especially in Africa, when dealing with the forecasting of COVID-19. This study aimed at closing this gap by focusing on the forecasting and investigation of the expected future COVID-19 cumulative positive cases for a period of sixty-one days. From the forecasted values, this study also aimed to identify the most critical states in each of the five major regions that have the highest expected percentage increase in the number of cases.
To achieve these objectives, this study employed both statistical and deep learning approaches, which consisted of three prediction models: the ARIMA, Prophet, and LSTM models. In a comparative analysis of the performance of these three models, seven performance metrics were used. These included the MSE, RMSE, MAPE, SMAPE, R2 score, NRMSE, and PSNR. The best-performing model was then selected to perform the forecasting of the future COVID-19 cumulative positive cases for a 61-day
perspective. In this study, the best-performing model was the LSTM model, while the worst-performing model was the Prophet model. The highest expected increase in the number of cases from the Western African region is expected to be 22.77% from Mali. On the other hand, in Angola, a country from the Southern region, the highest expected increase in that region is 18.97%. The highest expected increase from the Northern region is expected to take place in Egypt, at 11.83%. In the Eastern region, the highest increase of 10.72% is expected to occur in Somalia. Lastly, from the Central African region, the highest expected increase is 2.81% in Gabon. There is a need for studies that consider the influence of population demographics on the spread of COVID-19.

Author Contributions All authors have participated in (a) the conception and design, or analysis and interpretation of the data; (b) drafting the article or revising it critically for important intellectual content; and (c) approval of the final version. This manuscript has not been submitted to, nor is under review for, another journal or other publishing venue.

Funding No funding was granted to this research work.

Data availability The dataset used as case study was obtained and retrieved on October 1, 2021, from https://data.humdata.org/dataset/africa-covid19-infected

Declarations

Conflict of Interest The authors declare that they have no conflict of interest.