Received June 29, 2020, accepted July 30, 2020, date of publication August 4, 2020, date of current version August 17, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.3014241
  ABSTRACT With the deregulation of the electric energy industry, accurate electricity price forecast-
  ing (EPF) is increasingly significant to market participants’ bidding strategies and uncertainty risk control.
  However, it remains a challenging task owing to the high volatility and complicated nonlinearity of electricity
  prices. Aimed at this, a novel hybrid deep-learning framework is proposed for day-ahead EPF, which includes
  four modules: the feature preprocessing module, the deep learning-based point prediction module, the error
  compensation module, and the probabilistic prediction module. The feature preprocessing module is based
on the isolation forest (IF) and the least absolute shrinkage and selection operator (Lasso), and is used to detect
  outliers and select the correlated features of electricity price series. The point prediction module combines
the deep belief network (DBN), long short-term memory (LSTM) recurrent neural network (RNN), and convolutional
neural network (CNN), and is employed to extract complicated nonlinear features. The residual error between
  forecasting price and actual price can be reduced based on the error compensation module. The probabilistic
  prediction module based on quantile regression (QR) is used to estimate the uncertainty under various
  confidence levels. The PJM market data is employed in case studies to evaluate the proposed framework,
  and the results revealed that it has a competitive advantage compared with all of the considered comparison
  methods.
  INDEX TERMS Electricity market, day-ahead electricity price forecasting, feature preprocessing, deep
  learning, error compensation, probabilistic forecasting.
will lead to complex computation [7], and the situation grows worse when physical methods encounter unexpected inputs during prediction. Therefore, physical methods may not be very suitable for day-ahead EPF. Statistical methods aim to unveil the dynamic trend in historical electricity price series using curve fitting. They have advantages in high-speed performance, model simplicity, and convenience. Statistical models include the autoregressive moving average (ARMA) model [8], the generalized autoregressive conditional heteroskedasticity (GARCH) model [9], and fuzzy theory [10]. In [8], the authors proposed a novel EPF algorithm that combines the results of a multiple linear regression (MLR) model with ARMA and Holt-Winters models. In [9], the authors exploited the GARCH methodology to predict next-day electricity prices. The authors in [10] proposed a hybrid model combining the wavelet transform, the firefly algorithm, and the fuzzy adaptive resonance theory map (ARTMAP) for day-ahead EPF. However, since statistical methods mainly presuppose linear modeling, most of them struggle to predict high-dimensional nonlinear electricity price series.

Machine learning methods can be broadly split into two types: shallow learning models and deep learning models [40]. Shallow learning models are based on the principle of error minimization and usually perform better than physical and statistical methods. Owing to their notable feature extraction capabilities, they have become one of the most common approaches to electricity price forecasting. Shallow learning models include support vector regression (SVR) [11], the artificial neural network (ANN) [12], and the regression tree [13]. In [14], the authors employed a hybrid model based on SVR and feature selection techniques to predict electricity price spikes. In [15], [16], the authors utilized an ANN together with the loads and prices of days similar to the corresponding forecasting day to predict day-ahead electricity prices in the PJM market. The simulation results show that the average mean absolute percentage error (MAPE) of the proposed method is up to 9.75%. The authors in [17] developed a hybrid forecasting model based on a seasonal-component autoregressive model and an ANN model, which was applied to forecast day-ahead electricity prices with tolerable performance.

However, the above methods for EPF concentrate only on shallow learning models. Because shallow machine learning models are prone to over-fitting and gradient disappearance, they are restricted in processing big data and complicated nonlinear problems [18]. With the development of intelligent optimization theories and computer technology in recent years, deep learning methods, as a promising forecasting technology, have been growing rapidly to circumvent these problems, with successful applications in pattern recognition, text processing, and fault location [19]. In addition, deep learning methods have also been widely employed for short-term renewable energy forecasting and electricity load forecasting in energy systems. Deep learning methods include the deep neural network (DNN) [20], deep belief network (DBN) [21], deep reinforcement learning [22], long short-term memory (LSTM) [23], and convolutional neural network (CNN) [24]. In [25], the authors developed multi-input and multi-output LSTM models to forecast electricity load for the time specified by the user. Numerical results demonstrate that the LSTM improves forecasting accuracy compared with SVM, ANN, and the recurrent neural network (RNN). In [26], [27], the authors proposed novel hybrid methods based on the wavelet transform and CNN for renewable energy forecasting. The wavelet transform was employed to decompose the raw renewable energy data into a set of better-behaved subseries, and the CNN was applied to extract the complicated nonlinear features. The experimental results also demonstrated that the deep learning models outperformed all of the considered shallow learning models across seasons and prediction horizons. To the authors' knowledge, deep learning methods for day-ahead EPF have received little attention compared with wind energy forecasting, load forecasting, and photovoltaic power forecasting in the field of energy systems.

The existing studies focus only on the application of feature extraction for point prediction; other issues remain unresolved. First, raw features with outliers and high dimensionality make feature extraction more complex. Therefore, feature preprocessing techniques are crucial for anomaly detection (reducing outliers) and for identifying the features correlated with the corresponding forecast time (reducing the dimension) [28]. Second, a single forecasting model may produce large residuals between the forecast and actual values. Therefore, it is reasonable to reduce the forecasting residuals with an error compensation model, which serves as a post-processing learning model for the shifting trends of the forecasting errors. Finally, considering model misspecification and data noise, point prediction models fail to evaluate the forecasting uncertainty present in electricity prices, which is not conducive to counteracting risk. Consequently, a probabilistic day-ahead EPF model that can describe these uncertainties becomes more meaningful [29].

In light of this, this article combines feature preprocessing techniques, deep learning-based point prediction models, an error compensation module, and a probabilistic prediction model to rethink and design a hybrid deep-learning framework for day-ahead EPF. The main contributions of this article are presented as follows:

(1) A novel feature preprocessing module consisting of the isolation forest (IF) [30] and the least absolute shrinkage and selection operator (Lasso) [31] is proposed, which helps to detect anomalous outliers and identify the correlated features.

(2) A deep learning-based point prediction module combining three types of deep-learning models (DBN, LSTM, and CNN) is developed and contrasted to extract the complicated nonlinear features of electricity prices, and their forecasting performance is compared with shallow learning models.
(3) For the first time, the residuals between the forecast electricity price and the actual electricity price are trained to correct the error of the point-predicted electricity prices by an error compensation module, for which the inputs are the validation-set errors.

(4) The uncertainties registered in the electricity price are evaluated using a probabilistic prediction module that accounts for data noise and model misspecification (inspired by quantile regression (QR)) [32].

The rest of the paper is organized as follows. Section II introduces the proposed forecasting framework; Section III describes the experimental methods; Section IV describes the assessment indicators of point and probabilistic forecasting performance; Section V shows and discusses case studies of the PJM electricity market. Finally, Section VI presents the concluding remarks.

A. FEATURE PREPROCESSING MODULE
The feature preprocessing module covers anomaly detection (outlier removal) and dimension reduction (that is, feature selection). This article concentrates on IF-based anomaly detection and Lasso-based feature selection, as described in Section III-A.

B. DEEP LEARNING-BASED POINT PREDICTION MODULE
The point prediction module based on DBN, LSTM, and CNN aims to extract the complicated nonlinear features of day-ahead electricity prices. The DBN uses semi-supervised learning for weight initialization, whereas the CNN and LSTM are a kind of supervised learning. Furthermore, they share a common character: a quadratic loss function over the weights can be minimized by BP based on various gradient descent methods [33], such as stochastic gradient descent (SGD), momentum gradient descent (MGD), Ada gradient descent (Ada), and RMSProp gradient descent (RMSProp). The details are described in Section III-B.
III. THE DESCRIPTION OF EXPERIMENTAL METHODS
A. FEATURE PREPROCESSING MODULE
1) ISOLATION FOREST
Conventional anomaly detection methods, such as the mean-square error method and the quartile method, are generally based on the hypothesis that normal samples fall within a given region, which can result in false valuations or partial detection. Conversely, the IF makes no prior assumptions; it consists of multiple isolation trees whose nodes have either no child nodes or two child nodes, and it uses an anomaly score to isolate outliers [34], defined as follows:

S(x, n) = 2^{-E(h(x))/c(n)}    (1)

where E(h(x)) is the average path length of sample x over a set of isolation trees, and c(n) represents the average path length of a binary search tree with n samples [34]. Once the anomaly score of each sample x is computed, the samples with the highest anomaly scores (i.e., the shortest average path lengths) are treated as outliers and can be excluded according to the abnormal proportional coefficient ζ.
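For illustration, a minimal Python sketch of this anomaly detection step is given below, using scikit-learn's IsolationForest, where the contamination argument plays the role of the abnormal proportional coefficient ζ; the toy data and variable names are illustrative assumptions, not the configuration used in this work.

    import numpy as np
    from sklearn.ensemble import IsolationForest

    # Hourly day-ahead electricity prices (toy data); shape (n_samples, n_features).
    rng = np.random.default_rng(0)
    prices = rng.normal(50, 10, size=(1000, 1))
    prices[::200] += 200  # inject a few spikes to act as outliers

    # contamination corresponds to the abnormal proportional coefficient (e.g., 0.015).
    iforest = IsolationForest(n_estimators=100, contamination=0.015, random_state=0)
    labels = iforest.fit_predict(prices)        # -1 marks outliers, 1 marks normal samples
    scores = -iforest.score_samples(prices)     # larger values indicate more anomalous samples

    cleaned = prices[labels == 1]               # keep only the non-anomalous samples
    print(f"removed {np.sum(labels == -1)} outliers out of {len(prices)} samples")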
2) LEAST ABSOLUTE SHRINKAGE AND SELECTION OPERATOR
In general, feature selection aims to choose the attributes relevant to the corresponding forecast day. Compared with traditional methods such as Pearson correlation, the chi-square test, and information gain, Lasso can effectively handle continuous time series and parameter estimation problems [35]. Therefore, feature selection based on Lasso is introduced in this article, formulated as follows:

(r, β_j) = arg min { Σ_{i=1}^{n} ( y_i − r − Σ_{j=1}^{p} β_j x_{i,j} )^2 },   s.t.  Σ_{j=1}^{p} |β_j| ≤ ϕ    (2)

where x_{i,j} is the time-shared load or historical electricity price of the jth feature at the ith time step, and r and β_j are regression coefficients. From (2), it can be seen that time-shared uncorrelated features are eliminated when the feature selection coefficient ϕ is small.
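A minimal sketch of Lasso-based feature selection with scikit-learn follows; the constrained form in (2) is replaced by the equivalent penalized form implemented by sklearn.linear_model.Lasso, and the feature matrix, the penalty value, and the selection threshold are illustrative assumptions rather than this paper's settings.

    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.preprocessing import StandardScaler

    # X: candidate features (e.g., lagged prices and loads); y: day-ahead price target (toy data).
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 24))
    y = X[:, 0] * 2.0 - X[:, 5] * 0.5 + rng.normal(scale=0.1, size=500)

    # Penalized Lasso: a larger alpha behaves like a smaller budget in the constrained form (2).
    X_std = StandardScaler().fit_transform(X)
    lasso = Lasso(alpha=0.05).fit(X_std, y)

    selected = np.flatnonzero(np.abs(lasso.coef_) > 1e-8)   # features with non-zero coefficients
    print("selected feature indices:", selected)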
B. DEEP LEARNING-BASED POINT PREDICTION MODULE
1) DEEP BELIEF NETWORK
The deep belief network (DBN) contains stacked restricted Boltzmann machine (RBM) layers for unsupervised pre-training and a logistic regression layer for the forecasting outputs [36], as shown in Figure 2. As an important part of the DBN, the RBM is a simple binary network model that includes a visible layer to accept the inputs fed by the preceding layer and a hidden layer to extract features from the visible layer, as shown in Figure 3. There are no interconnections between neurons within the visible layer or within the hidden layer, but the neurons of different layers are fully connected [37].

FIGURE 3. RBM architecture.

According to the Boltzmann distribution [38], the energy function and the activation probability of each neuron of the visible layer (binary state v and offset a) and the hidden layer (binary state h and offset b) can be deduced as follows:

E(v, h) = − Σ_s a_s v_s − Σ_k b_k h_k − Σ_k Σ_s h_k W_{k,s} v_s    (3)
p(v_s = 1 | h) = σ( a_s + Σ_k w_{k,s} h_k )    (4)
p(h_k = 1 | v) = σ( b_k + Σ_s w_{k,s} v_s )    (5)

where E(v, h) is the energy function and σ is the logistic sigmoid function. P(h_k = 1 | v) and P(v_s = 1 | h) represent the activation probability of h at a given v and the activation probability of v at a given h, respectively.

Once the energy function and the activation probability of each RBM neuron are acquired, the objective function L_{θ,S} of the DBN is defined as follows:

L_{θ,S} = Σ_s log( Σ_h e^{−E(v_s, h)} / Z )    (6)

where Z is the partition function. Then, the model parameters of the visible and hidden layers can be updated using the derivative of L_{θ,S} based on Bayesian statistics theory and Gibbs sampling [39], as shown below:

W_{k,s} = W_{k,s} + η { P(h_k = 1 | v) v^T − P(h*_k = 1 | v*) (v*)^T }    (7)
a = a + η ( v − v* )    (8)
b = b + η { p(h_k = 1 | v) − P(h*_k = 1 | v*) }    (9)

where η is the learning rate and v* and h* denote the reconstructed (sampled) states. Once the pre-trained weights have been obtained in this unsupervised manner, the whole network weights of the DBN can finally be fine-tuned by the BP algorithm.
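To make the update rules (7)-(9) concrete, below is a minimal NumPy sketch of one contrastive-divergence (CD-1) step for a single RBM, following the Gibbs-sampling pattern described above; the dimensions, learning rate, and data are illustrative assumptions, not the paper's configuration.

    import numpy as np

    rng = np.random.default_rng(2)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    n_visible, n_hidden, eta = 24, 16, 0.05
    W = rng.normal(scale=0.01, size=(n_hidden, n_visible))   # W_{k,s}
    a = np.zeros(n_visible)                                   # visible offsets
    b = np.zeros(n_hidden)                                    # hidden offsets

    v = (rng.random(n_visible) < 0.5).astype(float)           # one binary training sample

    # Positive phase: hidden activation probabilities, eq. (5).
    p_h = sigmoid(b + W @ v)
    h = (rng.random(n_hidden) < p_h).astype(float)

    # Negative phase: one Gibbs step giving the reconstructed states v*, h*, eqs. (4)-(5).
    p_v_star = sigmoid(a + W.T @ h)
    v_star = (rng.random(n_visible) < p_v_star).astype(float)
    p_h_star = sigmoid(b + W @ v_star)

    # Parameter updates corresponding to eqs. (7)-(9).
    W += eta * (np.outer(p_h, v) - np.outer(p_h_star, v_star))
    a += eta * (v - v_star)
    b += eta * (p_h - p_h_star)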
2) CONVOLUTIONAL NEURAL NETWORK
The convolutional neural network (CNN) was proposed based on biological information transfer, in which the connection pattern among neurons references the cat's cerebral optical cortex; this arrangement effectively lessens the complexity of the network structure [40]. On this basis, the CNN has two advantages: translation invariance and the shared-weights technique [41]. A typical CNN consists of an input layer, stacked convolutional layers with sub-sampling layers, a fully connected layer, and an output layer, as shown in Figure 4.

a: INPUT LAYER
Compared with pattern recognition, EPF has distinct characteristics. Its input features are one-dimensional (1D) data, whereas the stacked convolutional and down-sampling layers of the CNN operate on two-dimensional (2D) data. Thus, the input layer serves as a dimension transducer that converts the 1D day-ahead electricity price series into a 2D image, and the size of the image is determined by the number of correlated features from the feature preprocessing module. The process is described as follows: (1) the size of the image
c: SUB-SAMPLING LAYER
The sub-sampling layer performs an information filtering task, in which a single-point statistical result replaces the pooled feature map through a sub-sampling function, which helps to avoid over-fitting [42]. As a sub-sampling function, the max-pooling function max(x^{l−1}_{i,j}) can easily keep the invariance of the local extracted features, which is expressed as follows:

y^l_j = f( β^l_j max(x^{l−1}_{i,j}) + c^l_j )    (11)
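A small NumPy sketch of non-overlapping 2×2 max pooling over a feature map is shown below, in the spirit of (11); the affine term and activation of the full layer are omitted, and the toy feature map is an assumption for illustration.

    import numpy as np

    def max_pool2d(x, size=2):
        """Non-overlapping max pooling of a 2D feature map (height and width must be multiples of `size`)."""
        h, w = x.shape
        return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

    feature_map = np.arange(16, dtype=float).reshape(4, 4)   # toy convolutional output
    pooled = max_pool2d(feature_map)                          # shape (2, 2), keeps the local maxima
    print(pooled)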
3) LONG SHORT-TERM MEMORY
The LSTM cell description is as follows:

f_t = σ( W_f [h_{t−1}, x_t] + b_f )    (12)
i_t = σ( W_i [h_{t−1}, x_t] + b_i )    (13)
C̃_t = tanh( W_C [h_{t−1}, x_t] + b_C )    (14)
C_t = f_t ∗ C_{t−1} + i_t ∗ C̃_t    (15)
o_t = σ( W_o [h_{t−1}, x_t] + b_o )    (16)
h_t = o_t ∗ tanh(C_t)    (17)

where f_t, i_t, and o_t are the forget, input, and output gates, C_t is the cell state, and h_t is the hidden state at time t.
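A compact NumPy sketch of a single LSTM cell step implementing (12)-(17) is given below; the dimensions and the random initialization are illustrative assumptions.

    import numpy as np

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, W, b):
        """One LSTM step, eqs. (12)-(17). W maps the concatenation [h_{t-1}, x_t] to the 4 gates."""
        z = W @ np.concatenate([h_prev, x_t]) + b          # pre-activations of f, i, c~, o
        n = h_prev.size
        f_t = sigmoid(z[0:n])                              # forget gate, eq. (12)
        i_t = sigmoid(z[n:2 * n])                          # input gate, eq. (13)
        c_tilde = np.tanh(z[2 * n:3 * n])                  # candidate cell state, eq. (14)
        o_t = sigmoid(z[3 * n:4 * n])                      # output gate, eq. (16)
        c_t = f_t * c_prev + i_t * c_tilde                 # cell state update, eq. (15)
        h_t = o_t * np.tanh(c_t)                           # hidden state, eq. (17)
        return h_t, c_t

    n_in, n_hidden = 24, 8
    rng = np.random.default_rng(3)
    W = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden + n_in))
    b = np.zeros(4 * n_hidden)
    h, c = np.zeros(n_hidden), np.zeros(n_hidden)
    h, c = lstm_step(rng.normal(size=n_in), h, c, W, b)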
4) BACK PROPAGATION ALGORITHM
To improve the accuracy and stability of day-ahead EPF, the network parameters of the DBN, LSTM RNN, and CNN (the weights and biases) are trained and updated by the BP algorithm using gradient descent methods in a mini-batch form. Table 1 shows the advantages and disadvantages of various gradient descent methods for weight updating.

TABLE 1. The advantages and disadvantages of various gradient descent methods.

Compared with the batch form, the mini-batch form presents two main benefits: first, fewer iterations are needed to reach good convergence and stability; second, matrix optimization is more efficient [45]. Thus, the BP algorithm that minimizes the mean squared error between the output value p_{m,t} and the target value r_{m,t} can be described as follows:

J = (1/M) Σ_{m=1}^{M} Σ_{t=1}^{T} ( r_{m,t} − p_{m,t} )^2    (18)

where M is the number of samples and T is the number of forecast time steps.
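The mini-batch weight update described above could look like the following sketch, here with plain SGD on the quadratic loss (18) applied to a toy linear model standing in for the deep networks; the batch size, learning rate, and data are placeholders chosen for illustration.

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.normal(size=(512, 24))                                   # input features (toy data)
    y = X @ rng.normal(size=24) + rng.normal(scale=0.1, size=512)    # targets

    w = np.zeros(24)
    eta, batch_size = 0.05, 32                       # learning rate and mini-batch size

    for epoch in range(20):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx]
            resid = xb @ w - yb                      # p_{m,t} - r_{m,t} for the mini-batch
            grad = 2.0 * xb.T @ resid / len(idx)     # gradient of the quadratic loss (18)
            w -= eta * grad                          # SGD weight update

    print("final mini-batch MSE:", np.mean((X @ w - y) ** 2))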
C. PROBABILISTIC PREDICTION MODULE
Although the feature preprocessing module, the deep learning-based point prediction module, and the error compensation module have been proposed to improve the accuracy of day-ahead EPF, the point forecasting results remain uncertain. However, this uncertainty can be depicted by a probabilistic prediction module to diminish the bidding risk for market participants [29]. In general, probabilistic forecasting methods can be grouped into parametric and nonparametric approaches, i.e., with or without prior distribution assumptions [46]. The premise of parametric approaches is that the probabilistic electricity price follows a given distribution, such as the Gaussian [47], [48] or logistic [49] distribution. Nevertheless, this may be improper because it is difficult to specify a suitable prior distribution [50]. Therefore, QR is introduced as a nonparametric approach to estimate these uncertainties of electricity prices, as follows:

min_{β_τ}  Σ_j ρ_τ( y_j − x'_j β_{1,τ} − β_{0,τ} − r )    (19)

where y_j and x'_j are the forecast and real values of sample j, respectively, β_{1,τ} and β_{0,τ} are the optimal linear parameters for quantile τ, and ρ_τ is a piecewise linear loss function defined by:

ρ_τ(r) = { τ r,  if r > 0;  −(1 − τ) r,  otherwise }    (20)

Once the parameters are estimated, the uncertainty of the point forecasting results of day-ahead electricity prices can be evaluated at different quantiles, as follows:

y_τ = β_{0,τ} + β_{1,τ} x'_j    (21)
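As an illustration of (19)-(21), the sketch below fits a single-quantile linear model by directly minimizing the pinball (check) loss with SciPy; the toy data and the 0.9 quantile are assumptions, and in practice a dedicated quantile regression routine could be used instead.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(5)
    x = rng.normal(50, 10, size=300)                    # point forecasts (toy data)
    y = x + rng.normal(scale=5, size=300)               # corresponding actual prices

    def pinball_loss(params, tau):
        """Piecewise linear loss of eq. (20) summed over samples, as in eq. (19)."""
        b0, b1 = params
        r = y - (b0 + b1 * x)
        return np.sum(np.where(r > 0, tau * r, -(1 - tau) * r))

    tau = 0.9
    res = minimize(pinball_loss, x0=np.array([0.0, 1.0]), args=(tau,), method="Nelder-Mead")
    b0, b1 = res.x
    y_tau = b0 + b1 * x                                  # conditional 0.9-quantile, eq. (21)
    print(f"empirical coverage below the 0.9 quantile: {np.mean(y <= y_tau):.3f}")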
IV. PERFORMANCE ASSESSMENT
In this section, several indexes are introduced to evaluate the point and probabilistic forecasting performance and to demonstrate the superiority of the proposed hybrid forecasting framework.

A. ASSESSMENT OF POINT FORECASTING PERFORMANCE
The mean absolute percentage error (MAPE), mean absolute error (MAE), and root mean square error (RMSE) are utilized to evaluate the point forecasting performance. MAPE represents the average absolute forecasting deviation between forecasts and targets, while MAE and RMSE assess the forecasting accuracy and capability of the point forecasting results, as follows:

MAPE = (1/T) Σ_{t∈T} |r_t − p_t| / r_t × 100%    (22)
MAE = (1/T) Σ_{t∈T} |r_t − p_t|    (23)
RMSE = sqrt( (1/T) Σ_{t∈T} (r_t − p_t)^2 )    (24)

where r_t and p_t are the actual and forecast prices at hour t, and T is the number of forecast hours.

B. ASSESSMENT OF PROBABILISTIC FORECASTING PERFORMANCE
The average coverage percentage (ACP) and interval sharpness (IS) are considered as the criteria for probabilistic forecasting performance assessment. ACP evaluates the coverage value c^α_t of the observed values located in the prediction interval (PI) at the given prediction interval nominal confidence (1 − α)%. IS comprehensively measures the offset between the PI and the out-of-range observed values, together with the width w^α_t of the PI at the given prediction interval nominal confidence, as follows:
ACP = (1/T) Σ_{t∈T} c^α_t × 100%    (25)

IS = (1/T) Σ_{t∈T} s^α_t,  where
s^α_t = −2α w^α_t,                       if L^α_t < r_t < U^α_t
s^α_t = −2α w^α_t − 4( L^α_t − r_t ),    if L^α_t ≥ r_t
s^α_t = −2α w^α_t − 4( r_t − U^α_t ),    if U^α_t ≤ r_t    (26)

where L^α_t and U^α_t are the lower and upper bounds of the PI at hour t.
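For completeness, a short NumPy sketch of the point metrics (22)-(24) and the interval metrics (25)-(26) is given below, assuming a symmetric lower/upper band around the point forecast as the prediction interval; the toy arrays are placeholders.

    import numpy as np

    def point_metrics(r, p):
        """MAPE, MAE, RMSE of eqs. (22)-(24)."""
        mape = np.mean(np.abs(r - p) / r) * 100.0
        mae = np.mean(np.abs(r - p))
        rmse = np.sqrt(np.mean((r - p) ** 2))
        return mape, mae, rmse

    def interval_metrics(r, lower, upper, alpha):
        """ACP and IS of eqs. (25)-(26) for a (1 - alpha) prediction interval."""
        covered = (r > lower) & (r < upper)
        acp = np.mean(covered) * 100.0
        width = upper - lower
        score = -2.0 * alpha * width
        score = np.where(r <= lower, -2.0 * alpha * width - 4.0 * (lower - r), score)
        score = np.where(r >= upper, -2.0 * alpha * width - 4.0 * (r - upper), score)
        return acp, np.mean(score)

    rng = np.random.default_rng(6)
    actual = rng.normal(50, 10, size=168)               # one week of hourly prices (toy data)
    forecast = actual + rng.normal(scale=3, size=168)
    print(point_metrics(actual, forecast))
    print(interval_metrics(actual, forecast - 8, forecast + 8, alpha=0.2))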
V. CASE STUDIES
A. EXPERIMENTAL SETTINGS
The proposed hybrid framework for day-ahead EPF is evaluated using PJM market data [51]. In this study, the day-ahead electricity price data cover the period from June 2018 to December 2019 at an interval of one hour. The whole data set is divided into a training set, a validation set, and a testing set, in which the testing set contains seven days of each season in 2019 (the 7 days before March, December, June, and September). Of the remaining data, 60% is used as the training set and the other 40% as the validation set.

The parameters of the feature preprocessing module include the abnormal proportional coefficient ζ for IF and the feature selection coefficient ϕ for Lasso. Simulation results show that an abnormal proportional coefficient ζ within the range of 0.005-0.1 performs well, because a high value would result in the loss of information on significant attributes, whereas a low value would affect the overall forecasting performance. According to a stability analysis, the abnormal proportional coefficient ζ is set to 0.015 in winter and 0.0015 in the other seasons. The feature selection coefficient ϕ characterizes the correlation degree of the impact features and is fixed at 0.00005.
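The data partition described at the beginning of this subsection could be set up as in the following pandas sketch, where the specific seven-day windows are placeholders standing in for the per-season test weeks used in this work.

    import numpy as np
    import pandas as pd

    # Hourly day-ahead prices from June 2018 to December 2019 (toy values).
    idx = pd.date_range("2018-06-01", "2019-12-31 23:00", freq="h")
    prices = pd.Series(np.random.default_rng(7).normal(50, 10, len(idx)), index=idx)

    # Seven-day test windows, one per season of 2019 (placeholder dates).
    test_windows = [("2019-02-22", "2019-02-28"), ("2019-05-25", "2019-05-31"),
                    ("2019-08-25", "2019-08-31"), ("2019-11-24", "2019-11-30")]
    test_mask = np.zeros(len(idx), dtype=bool)
    for start, end in test_windows:
        test_mask |= (idx >= start) & (idx <= pd.Timestamp(end) + pd.Timedelta(hours=23))

    test_set = prices[test_mask]
    remaining = prices[~test_mask]
    split = int(0.6 * len(remaining))                 # 60% training, 40% validation
    train_set, val_set = remaining.iloc[:split], remaining.iloc[split:]
    print(len(train_set), len(val_set), len(test_set))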
FIGURE 6. Electricity prices autocorrelation of PJM market.
FIGURE 7. Correlation between electricity prices and load in PJM market.

Figure 6 and Figure 7 give the 20 highest correlation values obtained from the historical electricity prices and loads over the week, respectively. Other influential factors, such as bidding behavior, congestion, and maintenance schedules, cannot be quantified as input values of the feature extractors. From the figures, it can be seen that the electricity price autocorrelation coefficients increase slowly as the number of forecasting hours decreases; the autocorrelation performs well at a lag of 1 hour, but lags of 0-23 hours cannot be used for day-ahead EPF. According to [52], day-ahead load forecasting has reached a satisfactory accuracy with a 1%-2% error, so the load corresponding to the forecasting day can be obtained easily. Therefore, we choose the correlated features derived from the historical prices at lags of more than 24 h, and combine the forecast-day and historical loads as input data for the forecasting models in this article. In addition, the data dimension is related to the correlation degree of the correlated features and the structure of the forecasting models. In order to verify the effectiveness of the hybrid deep-learning framework, the forecasting results are compared with the light gradient boosting machine (LGBM) [53], BPNN [16], k-nearest neighbor (KNN) [54], and SVR [14]. Some vital parameter settings of the proposed and contrast models are listed in Table 2.

B. NUMERICAL RESULTS
TABLE 3. The forecasting statistical results for various contrast models in various seasons.

In order to evaluate the effectiveness of the proposed framework, the three deep-learning forecasting models (DBN, LSTM RNN, and CNN) are compared with the four contrast forecasting models, and the simulation results over the various seasons are presented in Table 3. It can be seen that the MAPE value of CNN varies from 0.0587 to 0.0988 with an average of 0.0814, the MAPE value of DBN varies from 0.0613 to 0.1085 with an average of 0.0902, and the MAPE value of LSTM varies from 0.0608 to 0.1073 with an average of 0.0896. The average MAPE results for LGBM, BPNN, SVR, and KNN are 0.1233, 0.1271, 0.3169, and 0.1252, respectively.
Compared with LSTM RNN and DBN, the MAPE of CNN is improved by 10.81% and 10.11% on average, respectively. Compared with LGBM, BPNN, SVR, and KNN, the MAPE of CNN is improved by 51.46%, 56.16%, 289.40%, and 53.86% on average, respectively. Compared with LSTM, DBN, LGBM, BPNN, SVR, and KNN, the MAE of CNN is improved by 10.26%, 10.26%, 39.83%, 43.22%, 232.19%, and 45.95% on average, respectively, and the RMSE is improved by 9.06%, 8.53%, 31.38%, 29.91%, 167.35%, and 59.84% on average, respectively.

To graphically verify the superiority of the three deep-learning forecasting models, the day-ahead electricity price forecasting results of the different forecasting models in the various seasons are compared in Figure 8-Figure 11. It can be seen that CNN performs best in terms of the MAPE, MAE, and RMSE, followed by LSTM RNN, DBN, LGBM, BPNN, KNN, and SVR. The reason may be that the CNN model uses a weight-sharing technique to extract local complicated nonlinear features during training.

FIGURE 10. Day-ahead electricity forecasting results of different forecasting models in fall.
FIGURE 11. Day-ahead electricity forecasting results of different forecasting models in winter.
FIGURE 12. Hourly absolute percentage error results of CNN with and without ECM in spring.
The inferior performance of the SVR model is mainly driven by the kernel following an unsuitable distribution and is worsened by its low feature extraction capability. It is worth noting that the three deep-learning forecasting models have a competitive advantage over the four contrast forecasting models. There are two reasons for this:

(1) Feature extraction capability: Shallow machine learning models (the four contrast forecasting models) need to make full use of feature selection to identify shifting trends, whereas deep learning models excel in high nonlinear mapping capability, especially as the volume of data increases.

(2) Error compensation module: Because the proposed deep learning forecasting models contain an error compensation module, small deviations between the forecast and actual electricity prices can be corrected.

To further demonstrate the advantages of the error compensation module (ECM), several simulations using the proposed and contrast models with and without the ECM are performed, taking spring as an illustrative example. The hourly absolute percentage error results of the different deep learning models in spring are shown in Figure 12-Figure 14.
FIGURE 13. Hourly absolute percentage error results of DBN with and without ECM in spring.
FIGURE 14. Hourly absolute percentage error results of LSTM RNN with and without ECM in spring.
TABLE 4. Statistical results of hourly absolute percentage error for the effect of ECM in spring.
TABLE 5. The ACPs and ISs with 80% confidence level using various models in various seasons.
The statistical results are listed in Table 4. It can be seen that the hourly absolute percentage error results of CNN with ECM vary from 0.0007 to 0.3743 with an average of 0.0820 and a variance of 0.0045. The hourly absolute percentage error results of DBN with ECM vary from 0.0054 to 0.3405 with an average of 0.0877 and a variance of 0.0042.
FIGURE 15. PIs with confidence level 80% for March 1 using various forecasting models.
TABLE 6. The average ACPs and ISs with various confidence level ranging from 2% to 98% level using various models in various seasons.
The hourly absolute percentage error results of LSTM RNN with ECM vary from 0.0015 to 0.3490 with an average of 0.0837 and a variance of 0.0048. Correspondingly, the variances of CNN without ECM, DBN without ECM, and LSTM without ECM are 0.0046, 0.0058, and 0.0077, respectively, and their averages are 0.0883, 0.0903, and 0.0881, respectively. Compared with the contrast models without ECM, it is found that the average indexes of CNN with ECM, DBN with ECM, and LSTM RNN with ECM are improved by 2.44%, 2.96%, and 5.26%, respectively, and their variances are improved by 7.68%, 38.10%, and 60.42%, respectively. Moreover, the green dashed lines in Figure 12-Figure 14 also show that the ECM can cut the hourly absolute percentage error arising from the deep learning-based point prediction module. Therefore, these results demonstrate that the ECM can not only enhance forecasting accuracy but also provide higher stability and robustness.

According to the results above, the forecasts of day-ahead electricity prices always contain residual errors (uncertainties), which have an adverse influence on market participants' bidding strategies. Therefore, a probabilistic prediction module with QR is introduced in this work to describe these uncertainties. Table 5 presents each seasonal result at a confidence level of 80% using the proposed and contrast models. In Table 5, the average ACPs for CNN+QR, DBN+QR, and LSTM+QR are 79.46%, 79.31%, and 79.31%, respectively. The average ACPs for LGBM+QR, BPNN+QR, SVR+QR, and KNN+QR are 79.16%, 79.02%, 79.12%, and 79.31%, respectively. Meanwhile, the ISs of CNN+QR range from −3.79 to −1.92 with an average of −3.07. The ISs of DBN+QR range from −4.17 to −2.13 with an average of −3.32. The ISs of LSTM+QR range from −1.53 to −0.90 with an average of −3.48. The average ISs of LGBM+QR, BPNN+QR, SVR+QR, and KNN+QR are −3.72, −3.67, −4.32, and −3.67, respectively. Compared with DBN+QR, LSTM+QR, LGBM+QR, BPNN+QR, SVR+QR, and KNN+QR, the IS of CNN+QR is improved by 8.48%, 13.61%, 21.27%, 19.72%, 40.75%, and 19.64%, respectively.

Taking the case of March 1, the prediction intervals constructed with a confidence level of 80% under the various forecasting models are also presented in Figure 15. They exhibit three distinct characteristics for the different forecasting models. The first is that the numbers of observed values inside the prediction intervals are similar for the various forecasting models. The second is that the offset distance between the prediction interval and the out-of-range observed values is smaller for CNN+QR than for DBN+QR and LSTM+QR, as the green dotted lines indicate.
FIGURE 16. The ACPs with various confidence level using various forecasting models in various seasons.
FIGURE 17. The ISs with various confidence level using various forecasting models in various seasons.
The third is that the widths of the intervals of the various contrast forecasting models are larger than those of the proposed forecasting models, as the red dotted lines show. It is clear from these charts that the ACP results of the various forecasting models are almost consistent in each season, and the IS results of CNN+QR perform best in each season, followed by DBN+QR, LSTM+QR, BPNN+QR, LGBM+QR, KNN+QR, and SVR+QR. This is mainly because the CNN model produces smaller errors, which makes it easier to predict these uncertainties than with the other forecasting models.

Considering that market participants may have different risk preferences (that is, confidence levels), the probabilistic forecasting performance at various confidence levels is compared using the different forecasting models in each season to demonstrate the effectiveness and feasibility of the framework. Table 6 shows the average ACP and IS results with confidence levels ranging from 2% to 98%. The results show that the ACPs of CNN+QR are improved on average by 0.12%, 0.09%, 0.27%, 0.27%, 3.7%, and 2.7%, respectively, compared with DBN+QR, LSTM+QR, LGBM+QR, BPNN+QR, SVR+QR, and KNN+QR. The ISs of CNN+QR are improved on average by 9.49%, 11.95%, 21.70%, 22.20%, 38.22%, and 24.46%, respectively, compared with DBN+QR, LSTM+QR, BPNN+QR, LGBM+QR, KNN+QR, and SVR+QR. Therefore, it can be concluded that the ISs of CNN+QR exhibit the best forecasting performance compared with the other forecasting models, while the ACPs of CNN+QR change little. Figure 16 and Figure 17 respectively show the statistical results at the various confidence levels ranging from 2% to 98% in each season, and they are in good agreement with the results above. From the cases above, it can be concluded that the high-accuracy day-ahead EPF framework is highly attractive for implementation.

VI. CONCLUSION
In this article, a novel deep-learning based hybrid framework, composed of a feature preprocessing module, a deep-learning based point prediction module, an error compensation module, and a probabilistic prediction module, is presented for forecasting day-ahead electricity prices. The first module is applied to detect outliers and identify the correlated features of the electricity price series. The three deep learning models in the second module, DBN, LSTM RNN, and CNN,
are proposed and compared to extract the complicated nonlinear features. The residual errors between the forecast and actual prices can be reduced by the third module. The fourth module is applied as a probabilistic estimator to depict the uncertainty under various confidence levels.

It is demonstrated that the proposed hybrid deep-learning framework has advantages in point forecasting performance (MAPE, MAE, and RMSE) and probabilistic forecasting performance (ACP and IS) compared with the benchmarks (LGBM, BPNN, SVR, and KNN). Therefore, the deep-learning based hybrid framework proposed in this study has high potential for application in market participants' bidding strategies and uncertainty risk control.
REFERENCES
[1] I.-Y. Joo and D.-H. Choi, "Distributed optimization framework for energy management of multiple smart homes with distributed energy resources," IEEE Access, vol. 5, pp. 15551–15560, 2017.
[2] D. Xu, Q. Wu, B. Zhou, C. Li, L. Bai, and S. Huang, "Distributed multi-energy operation of coupled electricity, heating and natural gas networks," IEEE Trans. Sustain. Energy, early access, Dec. 23, 2020, 10.1109/TSTE.2019.2961432.
[3] K. Hubicka, G. Marcjasz, and R. Weron, "A note on averaging day-ahead electricity price forecasts across calibration windows," IEEE Trans. Sustain. Energy, vol. 10, no. 1, pp. 321–323, Jan. 2019.
[4] A. Pourdaryaei, H. Mokhlis, H. A. Illias, S. H. A. Kaboli, S. Ahmad, and S. P. Ang, "Hybrid ANN and artificial cooperative search algorithm to forecast short-term electricity price in de-regulated electricity market," IEEE Access, vol. 7, pp. 125369–125386, 2019.
[5] S. Zhou, L. Zhou, M. Mao, H.-M. Tai, and Y. Wan, "An optimized heterogeneous structure LSTM network for electricity price forecasting," IEEE Access, vol. 7, pp. 108161–108173, 2019.
[6] F. Li and R. Bo, "Congestion and price prediction under load variation," IEEE Trans. Power Syst., vol. 24, no. 2, pp. 911–922, May 2009.
[7] T. Li and M. Shahidehpour, "Strategic bidding of transmission-constrained GENCOs with incomplete information," IEEE Trans. Power Syst., vol. 20, no. 1, pp. 437–447, Feb. 2005.
[8] D. Bissing, M. T. Klein, R. A. Chinnathambi, D. F. Selvaraj, and P. Ranganathan, "A hybrid regression model for day-ahead energy price forecasting," IEEE Access, vol. 7, pp. 36833–36842, 2019.
[9] R. C. Garcia, J. Contreras, M. van Akkeren, and J. B. C. Garcia, "A GARCH forecasting model to predict day-ahead electricity prices," IEEE Trans. Power Syst., vol. 20, no. 2, pp. 867–874, May 2005.
[10] P. Mandal, A. U. Haque, J. Meng, A. K. Srivastava, and R. Martinez, "A novel hybrid approach using wavelet, firefly algorithm, and fuzzy ARTMAP for day-ahead electricity price forecasting," IEEE Trans. Power Syst., vol. 28, no. 2, pp. 1041–1051, May 2013.
[11] E. Stathakis, T. Papadimitriou, and P. Gogas, Forecasting Electricity Price Spikes Using Support Vector Machines. Accessed: Jun. 21, 2017. [Online]. Available: https://ssrn.com/abstract=2990407
[12] N. M. Pindoriya, S. N. Singh, and S. K. Singh, "An adaptive wavelet neural network-based energy price forecasting in electricity markets," IEEE Trans. Power Syst., vol. 23, no. 3, pp. 1423–1432, Aug. 2008.
[13] W. Zhao, L. Shang, and J. Sun, "Power quality disturbance classification based on time-frequency domain multi-feature and decision tree," Protection Control Mod. Power Syst., vol. 4, no. 1, p. 27, Dec. 2019.
[14] J. H. Zhao, Z. Y. Dong, X. Li, and K. P. Wong, "A framework for electricity price spike analysis with advanced data mining methods," IEEE Trans. Power Syst., vol. 22, no. 1, pp. 376–385, Feb. 2007.
[15] P. Mandal, T. Senjyu, and T. Funabashi, "Neural networks approach to forecast several hour ahead electricity prices and loads in deregulated market," Energy Convers. Manage., vol. 47, nos. 15–16, pp. 2128–2142, Sep. 2006.
[16] P. Mandal, T. Senjyu, N. Urasaki, T. Funabashi, and A. K. Srivastava, "A novel approach to forecast electricity price for PJM using neural network and similar days method," IEEE Trans. Power Syst., vol. 22, no. 4, pp. 2058–2065, Nov. 2007.
[17] G. Marcjasz, B. Uniejewski, and R. Weron, "On the importance of the long-term seasonal component in day-ahead electricity price forecasting with NARX neural networks," Int. J. Forecasting, vol. 35, no. 4, pp. 1520–1532, Oct. 2019.
[18] A. U. Haque, M. H. Nehrir, and P. Mandal, "A hybrid intelligent model for deterministic and quantile regression approach for probabilistic wind power forecasting," IEEE Trans. Power Syst., vol. 29, no. 4, pp. 1663–1672, Jul. 2014.
[19] Z. Han and J. Liang, "The analysis of node planning and control logic optimization of 5G wireless networks under deep mapping learning algorithms," IEEE Access, vol. 7, pp. 156489–156499, 2019.
[20] C. Yan, H. Xie, D. Yang, J. Yin, Y. Zhang, and Q. Dai, "Supervised hash coding with deep neural network for environment perception of intelligent vehicles," IEEE Trans. Intell. Transp. Syst., vol. 19, no. 1, pp. 284–295, Jan. 2018.
[21] J. Zheng, X. Fu, and G. Zhang, "Research on exchange rate forecasting based on deep belief network," Neural Comput. Appl., vol. 31, no. S1, pp. 573–582, Jan. 2019.
[22] Y. Peng, J. Zhang, and Z. Ye, "Deep reinforcement learning for image hashing," IEEE Trans. Multimedia, vol. 22, no. 8, pp. 2061–2073, Aug. 2020.
[23] Y. Zhu, R. Dai, G. Liu, Z. Wang, and S. Lu, "Power market price forecasting via deep learning," in Proc. IECON 44th Annu. Conf. IEEE Ind. Electron. Soc., Oct. 2018, pp. 4935–4939.
[24] K. Muhammad, J. Ahmad, I. Mehmood, S. Rho, and S. W. Baik, "Convolutional neural networks based fire detection in surveillance videos," IEEE Access, vol. 6, pp. 18174–18183, 2018.
[25] J. Bedi and D. Toshniwal, "Deep learning framework to forecast electricity demand," Appl. Energy, vol. 238, pp. 1312–1326, Mar. 2019.
[26] H. Wang, H. Yi, J. Peng, G. Wang, Y. Liu, H. Jiang, and W. Liu, "Deterministic and probabilistic forecasting of photovoltaic power based on deep convolutional neural network," Energy Convers. Manage., vol. 153, pp. 409–422, Dec. 2017.
[27] H.-Z. Wang, G.-Q. Li, G.-B. Wang, J.-C. Peng, H. Jiang, and Y.-T. Liu, "Deep learning based ensemble approach for probabilistic wind power forecasting," Appl. Energy, vol. 188, pp. 56–70, Feb. 2017.
[28] C. Fan, Y. Sun, Y. Zhao, M. Song, and J. Wang, "Deep learning-based feature engineering methods for improved building energy prediction," Appl. Energy, vol. 240, pp. 35–45, Apr. 2019.
[29] M. Dicorato, G. Forte, M. Trovato, and E. Caruso, "Risk-constrained profit maximization in day-ahead electricity market," IEEE Trans. Power Syst., vol. 24, no. 3, pp. 1107–1114, Aug. 2009.
[30] S. Ahmed, Y. Lee, S.-H. Hyun, and I. Koo, "Unsupervised machine learning-based detection of covert data integrity assault in smart grid networks utilizing isolation forest," IEEE Trans. Inf. Forensics Security, vol. 14, no. 10, pp. 2765–2777, Oct. 2019.
[31] C. Li, L. Chen, J. Feng, D. Wu, Z. Wang, J. Liu, and W. Xu, "Prediction of length of stay on the intensive care unit based on least absolute shrinkage and selection operator," IEEE Access, vol. 7, pp. 110710–110721, 2019.
[32] D. Rocchini and B. S. Cade, "Quantile regression applied to spectral distance decay," IEEE Geosci. Remote Sens. Lett., vol. 5, no. 4, pp. 640–643, Oct. 2008.
[33] S. Ruder, "An overview of gradient descent optimization algorithms," 2016, arXiv:1609.04747. [Online]. Available: http://arxiv.org/abs/1609.04747
[34] F. T. Liu, K. M. Ting, and Z.-H. Zhou, "Isolation forest," in Proc. 8th IEEE Int. Conf. Data Mining, Dec. 2008, pp. 413–422.
[35] N. Tang, S. Mao, Y. Wang, and R. M. Nelms, "Solar power generation forecasting with a LASSO-based approach," IEEE Internet Things J., vol. 5, no. 2, pp. 1090–1099, Apr. 2018.
[36] H. Wang, Z. Lei, X. Zhang, B. Zhou, and J. Peng, "A review of deep learning for renewable energy forecasting," Energy Convers. Manage., vol. 198, Oct. 2019, Art. no. 111799.
[37] K. Wang, X. Qi, H. Liu, and J. Song, "Deep belief network based k-means cluster approach for short-term wind power forecasting," Energy, vol. 165, pp. 840–852, Dec. 2018.
[38] H. Z. Wang, G. B. Wang, G. Q. Li, J. C. Peng, and Y. T. Liu, "Deep belief network based deterministic and probabilistic wind speed forecasting approach," Appl. Energy, vol. 182, pp. 80–93, Nov. 2016.
[39] X. Kong, C. Li, F. Zheng, and C. Wang, "Improved deep belief network for short-term load forecasting considering demand-side management," IEEE Trans. Power Syst., vol. 35, no. 2, pp. 1531–1538, Mar. 2020.
[40] H. Wang, Y. Liu, B. Zhou, C. Li, G. Cao, N. Voropai, and E. Barakhtenko, "Taxonomy research of artificial intelligence for deterministic solar power forecasting," Energy Convers. Manage., vol. 214, Jun. 2020, Art. no. 112909.
[41] H. Liu, X. Mi, and Y. Li, "Smart deep learning based wind speed prediction model using wavelet packet decomposition, convolutional neural network and convolutional long short term memory network," Energy Convers. Manage., vol. 166, pp. 120–131, Jun. 2018.
[42] S. Zubair, F. Yan, and W. Wang, "Dictionary learning based sparse coefficients for audio classification with max and average pooling," Digit. Signal Process., vol. 23, no. 3, pp. 960–970, May 2013.
[43] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[44] W. Kong, Z. Y. Dong, Y. Jia, D. J. Hill, Y. Xu, and Y. Zhang, "Short-term residential load forecasting based on LSTM recurrent neural network," IEEE Trans. Smart Grid, vol. 10, no. 1, pp. 841–851, Jan. 2019.
[45] H. Robbins and S. Monro, "A stochastic approximation method," Ann. Math. Statist., vol. 22, no. 3, pp. 400–407, 1951.
[46] S. Lefevre, C. Sun, R. Bajcsy, and C. Laugier, "Comparison of parametric and non-parametric approaches for vehicle speed prediction," in Proc. Amer. Control Conf., Jun. 2014, pp. 3494–3499.
[47] C. Wan, Z. Xu, Y. Wang, Z. Y. Dong, and K. P. Wong, "A hybrid approach for probabilistic forecasting of electricity price," IEEE Trans. Smart Grid, vol. 5, no. 1, pp. 463–470, Jan. 2014.
[48] R. Tahmasebifar, M. K. Sheikh-El-Eslami, and R. Kheirollahi, "Point and interval forecasting of real-time and day-ahead electricity prices by a novel hybrid approach," IET Gener., Transmiss. Distrib., vol. 11, no. 9, pp. 2173–2183, Jun. 2017.
[49] S. Chai, Z. Xu, and Y. Jia, "Conditional density forecast of electricity price based on ensemble ELM and logistic EMOS," IEEE Trans. Smart Grid, vol. 10, no. 3, pp. 3031–3043, May 2019.
[50] D. W. van der Meer, J. Widén, and J. Munkhammar, "Review on probabilistic forecasting of photovoltaic power production and electricity consumption," Renew. Sustain. Energy Rev., vol. 81, pp. 1484–1512, Jan. 2018.
[51] PJM. Electricity Market Data. Accessed: Dec. 10, 2019. [Online]. Available: http://www.pjm.com/
[52] X. Zhang, J. Wang, and K. Zhang, "Short-term electric load forecasting based on singular spectrum analysis and support vector machine optimized by Cuckoo search algorithm," Electr. Power Syst. Res., vol. 146, pp. 270–285, May 2017.
[53] Y. Ju, G. Sun, Q. Chen, M. Zhang, H. Zhu, and M. U. Rehman, "A model combining convolutional neural network and LightGBM algorithm for ultra-short-term wind power forecasting," IEEE Access, vol. 7, pp. 28309–28318, 2019.
[54] Y. Dai, H. Hua, C. Ma, H. Zhang, and L. Yang, "A fusion method for word vector based on fasttext-KdTree," in Proc. 7th Int. Conf. Adv. Cloud Big Data (CBD), Sep. 2019, pp. 229–234.

RONGQUAN ZHANG received the M.E. degree in control science and engineering from Shenzhen University, Shenzhen, China, in 2018. He is currently a Research Assistant with the College of Urban Transportation and Logistics, Shenzhen Technology University, and the College of Mechatronics and Control Engineering, Shenzhen University. His research interests include electricity markets, energy optimization, and electric vehicles.

GANGQIANG LI received the B.E. degree in electronic engineering from the Henan University of Urban Construction, Pingdingshan, China, in 2014, and the M.Eng. degree in control science and engineering from Shenzhen University, Shenzhen, China, in 2017, where he is currently pursuing the Ph.D. degree in communication and information engineering. His research interests include machine learning, data mining, and distributed protocols.

ZHENGWEI MA received the B.E. degree in transportation from the North China University of Water Resources and Electric Power, China, in 2010, and the Ph.D. degree in vehicle engineering from the South China University of Technology, China, in 2015. He has been an Assistant Professor with the College of Mechatronics and Control Engineering, Shenzhen University, since 2015, and an Assistant Professor and an Associate Professor with the College of Urban Transportation and Logistics, Shenzhen Technology University, since 2017. His research interests include the design, control, and optimization in energy efficiency, safety, and man–machine aspects of smart and sustainable mobility, especially for electric vehicles.