2018 International Conference on Frontiers of Information Technology (FIT)
Support Vector Machine and Gaussian Process Regression based Modeling for Photovoltaic Power Prediction

Sidra Kanwal, Bilal Khan, Sahibzada Muhammad Ali, Chaudhry Arshad Mehmood
Electrical Engineering, COMSATS University Islamabad, Abbottabad Campus, Pakistan
sidrakanwal@ciit.net.pk, bilalkhan@ciit.net.pk, hallianali@ciit.net.pk, chaudhry@ciit.net.pk

Muhammad Qasim Rauf
Electrical Engineering, Capital University of Science & Technology, Islamabad, Pakistan
muhammadqasimrauf@gmail.com
978-1-5386-9355-1/18/$31.00 ©2018 IEEE
DOI 10.1109/FIT.2018.00028

Abstract: Grid integration of solar energy benefits the energy market through an inexhaustible fuel supply and virtually zero emissions. However, this renewable supply is intermittent, and intermittency makes it harder for grid operators to bridge the gap between supply and demand. A precise output power forecast of grid-interfaced Photovoltaic (PV) systems is therefore required for economic dispatch, market regulation, and stable grid operation. This study compares the statistical models of Gaussian Process Regression (GPR) and Support Vector Machine (SVM) for solar power prediction. The models are trained to predict PV system output power using data recorded for Abbottabad City, Pakistan. Both models are trained, validated, and compared with each other for varying irradiance and temperature settings. The results show that SVM-based modeling excels in solar power prediction, with a Root Mean Square Error (RMSE) lower than that of the GPR-based technique. The models are evaluated with the error metrics of RMSE, Mean Absolute Error (MAE), and Mean Square Error (MSE). Moreover, prediction quality is assessed through residual analysis, benchmarked against load line analysis of the PV system in Simulink.

Keywords: Gaussian process; Microgrid; Machine learning; Photovoltaic system; Support Vector Machine; Solar Power Prediction

I. INTRODUCTION

The power output of Photovoltaic Generators (PVGs) is intermittent, unlike that of conventional fossil fuel based power plants [1]. The key phenomena influencing PV system performance are earth movements and cloud shading. Earth movements are deterministic, so the PVG output power can be calculated precisely for clear sky over various time scales. Cloud shading, however, is a random process. Sudden irradiance variations due to partial shading induce variations in PVG output power. Grid operators are concerned about these variations because abrupt changes in the output of a grid-interfaced PVG result in grid perturbations [2].

Modern smart grid management requires real-time energy production that accommodates variations in demand and in PVG input effectively and efficiently. Forecasting the energy produced by PVGs therefore supports effective and secure grid operation. Forecasting 72 hours ahead is of great concern for grid-interfaced PVGs [3]. The significance of the power forecast is driven by demanding grid requirements, ranging from the startup time of a conventional power plant to the energy market perspective [4]. The PVG output power depends on solar irradiance, which in turn varies with cloud motion and the movement of the earth. Abrupt variations in the supply from a solar plant to the grid cause grid instability, so a precise forecast is of key importance. With an accurate forecast, the grid operator can track the difference between predicted and generated power and start the backup generators in time. Consequently, low operating cost and grid reliability are ensured [5], [6]. Numerous techniques are proposed in the literature to predict PVG power. Some models use exogenous data based on satellite images, sky imager information, and nearby installed PV models, while others use only nonexogenous information of the location [5]. The forecasting horizon varies from instants to days, while the spatial scope ranges from an individual location to a district. The prediction horizon is categorized as short, medium, or long duration forecast [7]. An Artificial Neural Network (ANN) based technique for forecasting PVG output power 48 hours ahead for Italy was recommended in [8]. The authors observed that the suggested model forecasts PVG power for sunny, partially shaded, and cloudy days with a Root Mean Square Error (RMSE) of 12.5, 24, and 36.9 respectively. Autoregressive Moving Average with exogenous inputs (ARMAX) was employed for PV power prediction in [9], where the suggested model was shown to improve output precision. A Numerical Weather Prediction (NWP) based PVG output power prediction was implemented in [10]. Multivariate Adaptive Regression Splines (MARS) for power prediction of a PV system was described in [11]. The authors also compared their results with additional techniques and observed that the suggested technique achieved better results than the other methods for both testing and training [11].

Extensive literature exists on the comparison of statistical and deterministic techniques for modeling PV systems. The nonlinear behavior of a PVG cannot be modeled accurately with a deterministic method. A five-parameter deterministic model is compared with a Gaussian Process Regression (GPR) based technique, and the superiority of GPR
over deterministic modeling is established for internal reserve estimation of PVG in [12]. The focus of this work is to train a Support Vector Machine (SVM) and GPR for PV system power prediction. The two techniques are then compared with each other for accuracy. The technique with the lower estimated forecasting error is declared appropriate for output power prediction.

The main contributions of this paper are:
• GPR based statistical modeling of the PV system is developed by considering the weather conditions of Abbottabad, Pakistan. The system is designed for output power prediction of the PVG and Maximum Power Point (MPP) estimation in ambient weather.
• An SVM technique for power prediction of the PVG is also developed for varying temperature and irradiance settings. The model is also formulated for MPP in ambient weather settings.
• The two statistical models are compared with each other for accurate solar power prediction. Moreover, performance evaluation of the models is conducted with error metrics.

The work is structured as follows. Section II presents the statistical techniques of GPR and SVM for output power prediction of the PV system, together with the commonly used performance evaluation indicators. Section III presents the microgrid architecture, the training data description, and the comparative analysis of GPR and SVM. Finally, the conclusion and future directions are given in Section IV.

II. MATERIALS AND METHODS

Artificial intelligence enables a computer to learn a specific task independently from data, without relying on an explicit mathematical analysis. Machine learning teaches computers to learn from experience, and the algorithms adaptively improve their accuracy as the number of training samples increases. Image and speech recognition, electricity load forecasting, and data mining are among the numerous applications of machine learning, which is categorized as supervised and unsupervised learning. Supervised learning predicts future responses by training on input samples. Regression learning estimates a continuous response, while classification learning estimates a categorical output [13]. The supervised learning techniques of GPR and SVM are used in this paper for PVG output power prediction.

A. Gaussian Process Regression

In this section, the GPR method is summarized for predicting the dynamic system of the PVG. The regression framework takes the training set $D = \{(x_i, y_i),\ i = 1, \ldots, n\}$ as input, where $n$, $x$, and $y$ denote the number of observations, the covariates, and the response respectively. The trained GPR forecasts $y_{new}$ for an input $x_{new}$. The model is represented in (1).

$$ y = x^{T}\beta + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^{2}), \tag{1} $$

where $\sigma^{2}$ denotes the error variance and $\beta$ are coefficients calculated from the input data.

A Gaussian Process (GP) is a collection of random variables, any finite number of which have a joint Gaussian distribution. The GPR model extends (1) by introducing latent variables $f(x_i)$, $i = 1, 2, \ldots, 515$, from a GP, together with explicit basis functions $h$ [14]. The covariance function ensures a smooth response. The extension of (1) is given in (2).

$$ y = h(x)^{T}\beta + f(x), \qquad f(x) \sim \mathcal{GP}\big(0, k(x, x')\big), \tag{2} $$

where $k(x, x')$ is the covariance function. For each observation, the response $y$ is modeled as in (3).

$$ P\big(y_i \mid f(x_i), x_i\big) \sim \mathcal{N}\big(y_i \mid h(x_i)^{T}\beta + f(x_i), \sigma^{2}\big), \qquad P(y \mid f, X) \sim \mathcal{N}\big(y \mid H\beta + f, \sigma^{2} I\big). \tag{3} $$

The remaining model parameters are the initial coefficients ($\beta$ = 1.0855e+5) and the variance ($\sigma^{2}$ = 72.97e+3). Matern 5/2 is chosen as the kernel or covariance function $k$, as given in (4).

$$ k(x_i, x_j) = \sigma_f^{2}\left(1 + \frac{\sqrt{5}\,r}{\sigma_l} + \frac{5r^{2}}{3\sigma_l^{2}}\right)\exp\left(-\frac{\sqrt{5}\,r}{\sigma_l}\right), \tag{4} $$

where $r = \sqrt{(x_i - x_j)^{T}(x_i - x_j)}$ is the Euclidean distance between $x_i$ and $x_j$, $\sigma_f$ is the signal standard deviation, and $\sigma_l$ is the characteristic length scale [14].

B. Support Vector Machine

SVM is a nonparametric technique based on kernel functions. The technique is built on Structural Risk Minimization (SRM). SRM based learning algorithms control two factors, namely the empirical risk and the confidence interval. The first term on the right-hand side of inequality (5) is the empirical risk and the second is the confidence interval.

$$ R(\alpha) \le R_{\mathrm{emp}}(\alpha) + \Phi\left(\frac{l}{h}\right), \tag{5} $$

where $l/h$ denotes the ratio of the number of training samples $l$ to the Vapnik-Chervonenkis (VC) dimension $h$ of the machine's set of functions. SVM keeps the empirical risk fixed or zero and minimizes the confidence interval. The task is thus to search for the function that minimizes the error on the training data samples. SVM implements the set of functions given in (6).

$$ f(x, \alpha, w) = \sum_{i=1}^{N} \alpha_i K(x, w_i) - b, \tag{6} $$

where $N$ is an integer, $\alpha_i$, $i = 1, \ldots, N$, are scalars in $R^{1}$, $w_i$, $i = 1, \ldots, N$, are vectors in $R^{n}$, $b$ is a threshold, and $K(x, w_i)$ is a symmetric kernel function [15]. A continuous function can be approximated to a desired degree of accuracy using a polynomial, radial basis function, or neural network kernel. The kernels that generate polynomials, radial basis functions, and neural networks as approximating functions are given in (7), (8), and (9) respectively. The learning machine is thus changed only by varying the kernel $K(x, w_i)$ in the SVM. The best results are achieved when the vectors $w_i$ in (6) coincide with vectors of the training dataset (the support vectors).

$$ K(x, x_i) = \left[(x * x_i) + 1\right]^{d}. \tag{7} $$

$$ K(x, x_i) = K_{\gamma}\left(|x - x_i|\right). \tag{8} $$
                                                      Fig. 1. Schematic of Machine Learning based grid interfaced PV system.
$$ K(x, x_i) = S\left(v\,(x * x_i) + c\right). \tag{9} $$
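The kernel functions named in the text (the Matern 5/2 covariance of the GPR model and the polynomial, radial basis, and neural network kernels of the SVM) can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the authors' MATLAB implementation. The function names, the default hyperparameter values, the Gaussian form chosen for the radial basis kernel, and the use of tanh as the sigmoid S are all assumptions made for the example.

```python
import math


def dot(x, z):
    """Inner product (x * z) of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))


def euclidean(x, z):
    """Euclidean distance r between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))


def matern52(x, z, sigma_f=1.0, sigma_l=1.0):
    """Matern 5/2 covariance; sigma_f and sigma_l defaults are assumed values."""
    s = math.sqrt(5.0) * euclidean(x, z) / sigma_l
    # s * s / 3 equals 5 r^2 / (3 sigma_l^2)
    return sigma_f ** 2 * (1.0 + s + s * s / 3.0) * math.exp(-s)


def polynomial_kernel(x, z, d=2):
    """Polynomial kernel [(x * z) + 1]^d; the degree d is an assumed value."""
    return (dot(x, z) + 1.0) ** d


def rbf_kernel(x, z, gamma=0.5):
    """Radial basis kernel, here in the Gaussian form exp(-gamma |x - z|^2) (assumption)."""
    return math.exp(-gamma * euclidean(x, z) ** 2)


def sigmoid_kernel(x, z, v=1.0, c=0.0):
    """Neural network kernel S(v (x * z) + c), with S taken as tanh (assumption)."""
    return math.tanh(v * dot(x, z) + c)


# Hypothetical normalized (irradiance, temperature) predictor pairs.
x_a = (0.80, 0.25)
x_b = (0.60, 0.30)
similarities = {
    "matern52": matern52(x_a, x_b),
    "polynomial": polynomial_kernel(x_a, x_b),
    "rbf": rbf_kernel(x_a, x_b),
    "sigmoid": sigmoid_kernel(x_a, x_b),
}
```

Each function returns a similarity between two predictor vectors. The Matern 5/2 and Gaussian kernels equal their maximum when the two vectors coincide and decay as the distance grows, which is what makes nearby weather conditions produce similar predicted power.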
SVM has low complexity and fits data well. It performs multiple linear regression on transformed predictors. The objective is to observe and fine-tune three major parameters: the precision (ε), the cost (C), which handles the tradeoff between model complexity and accuracy, and the kernel regulator (γ). The underlying ability of SVM to generalize solutions for nonlinear problems is remarkable [15].

C. Evaluation of Model Parameters

A few standard statistical error metrics are employed in this work. The purpose is to evaluate and compare the two techniques of GPR and SVM for accurate output power prediction [13]. Mean Absolute Error (MAE) and Mean Bias Error (MBE) are presented in (10) and (11).

$$ \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|P_{f,i} - P_{m,i}\right|. \tag{10} $$

$$ \mathrm{MBE} = \frac{1}{N}\sum_{i=1}^{N}\left(P_{f,i} - P_{m,i}\right). \tag{11} $$

where $P_{m,i}$ denotes the measured power, $P_{f,i}$ represents the forecast of the prediction method, and $N$ is the number of data samples.

MBE is not a reliable accuracy measure on its own, because positive and negative residuals cancel one another. However, MBE indicates whether the model tends to overestimate or underestimate. RMSE is given in (12). Meteorology, economics, and regression analysis are a few of the fields that use RMSE as a performance evaluation indicator [13].

$$ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(P_{f,i} - P_{m,i}\right)^{2}}. \tag{12} $$

III. RESULTS AND DISCUSSION

The comparison of SVM and GPR based PVG modeling is presented here. The model with the more accurate output power prediction is qualified as the better technique for the chosen climate conditions. RMSE, MAE, and MSE serve as prediction quality parameters. Moreover, model reliability is established through visual analysis of the residual plots of each modeling technique.

A. Microgrid Architecture

The system under observation is a microgrid model consisting of a 100 kW PVG and a 50 kW backup diesel generator connected to a utility grid, as depicted in Fig. 1. The diesel generator serves as a peaker plant that balances supply and demand once the PVG's internal power reserve is exhausted. An imbalance between supply and demand is reflected in the grid frequency deviation. Power demand greater than supply causes the grid frequency to fall below the nominal 60 Hz, and vice versa. The Microgrid Control System (MCS) is a centralized microgrid control architecture that supplies the PWM signal for suboptimal MPP operation [12]. The suboptimal MPP is suggested by either the SVM or the GPR model and is fine-tuned based on the grid frequency imbalance. The MCS also generates the inverter control signals that produce three-phase, grid-interfaced 240 V (RMS) voltages. During utility grid faults, the MCS isolates the microgrid through the microgrid islanding relay. Finally, when the PVG saturates at 95% of MPP operation, the MCS activates the secondary frequency protection layer by triggering the peaker plant.
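The supervisory behavior described for the MCS can be illustrated with a single decision step. The following is a hypothetical sketch, not the authors' Simulink controller. It assumes the usual convention that a supply deficit pulls the grid frequency below nominal. The function name `mcs_step`, the proportional gain, and the deadband are invented for illustration; only the 60 Hz nominal frequency and the 95% saturation level come from the text.

```python
from dataclasses import dataclass

NOMINAL_HZ = 60.0            # nominal grid frequency stated in the text
SATURATION_FRACTION = 0.95   # PVG saturation level stated in the text


@dataclass
class McsDecision:
    pv_setpoint_kw: float    # suboptimal MPP operating point after fine-tuning
    start_peaker: bool       # True when the diesel peaker plant should be triggered
    island: bool             # True when the microgrid should be isolated


def mcs_step(predicted_mpp_kw, pv_setpoint_kw, freq_hz, grid_fault,
             gain_kw_per_hz=20.0, deadband_hz=0.05):
    """One hypothetical MCS decision step (gain and deadband are assumed values)."""
    # Ceiling derived from the SVM or GPR prediction of maximum power.
    ceiling_kw = SATURATION_FRACTION * predicted_mpp_kw
    # Positive deviation: frequency below nominal, so demand exceeds supply.
    deviation_hz = NOMINAL_HZ - freq_hz
    if abs(deviation_hz) > deadband_hz:
        pv_setpoint_kw += gain_kw_per_hz * deviation_hz
    saturated = pv_setpoint_kw >= ceiling_kw
    # Keep the setpoint between zero and the saturation ceiling.
    pv_setpoint_kw = max(0.0, min(pv_setpoint_kw, ceiling_kw))
    # The peaker plant is needed only if the PVG is saturated and a deficit remains.
    start_peaker = saturated and deviation_hz > deadband_hz
    return McsDecision(pv_setpoint_kw, start_peaker, island=grid_fault)


balanced = mcs_step(100.0, 80.0, 60.0, grid_fault=False)
deficit = mcs_step(100.0, 90.0, 59.5, grid_fault=False)
surplus = mcs_step(100.0, 80.0, 60.4, grid_fault=False)
faulted = mcs_step(100.0, 80.0, 60.0, grid_fault=True)
```

In the `deficit` case the PVG setpoint is raised until it reaches 95% of the predicted maximum power, at which point the sketch requests the peaker plant. In the `surplus` case the setpoint is lowered instead.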
B. Site Description and Data

The training data for both statistical models are the yearly irradiance and temperature recorded for Abbottabad, Pakistan. During autumn and spring, the temperature ranges from mild to warm. Hot weather is experienced in the middle of the year, with cool to mild conditions in winter. A heavy monsoon season from July through September is followed by sparse snowfall in December and January [16]. Both GPR and SVM are trained on the climate data of Abbottabad, Pakistan.

The irradiance data, illustrated in Fig. 2 as a scatter plot, are recorded over a year in HOMER software. Hours, days, and irradiance are depicted on the x, y, and z axes respectively.
Fig. 3 presents the average temperature per month during 2009-2017 [17]. Peak temperature is observed in the middle of every year. The relation between maximum output power, temperature, and irradiance is represented in Fig. 4. The collected irradiance and temperature data and the computed maximum power are used to train the GPR Matern 5/2 and SVM models. The maximum power output is then predicted with the trained model for ambient irradiance and temperature settings.

Fig. 2. Photovoltaic annual irradiance model of Abbottabad, Pakistan.

Fig. 3. Average temperature of Abbottabad, Pakistan observed per month in 2009-2017.

Fig. 4. Training data of Irradiance, Maximum power, and Temperature for GPR and SVM based modeling.

C. Model Training

The SVM and GPR models are trained on data obtained from the load line analysis of a 100 kW PVG. The climate conditions of Abbottabad, Pakistan are tabulated as the training inputs or predictors. The maximum power of the PVG obtained from the load line study in Simulink is tabulated as the training output or response. The 515 predictor samples and their respective response observations are fed into the Statistics and Machine Learning Toolbox of MATLAB. The models trained are SVM and GPR with the Matern 5/2 kernel. The trained models are expected to forecast the maximum power output for a given irradiance and temperature, as learnt from training. The Predicted versus Actual plot in Fig. 5 compares the SVM and GPR responses with the true response. The technique with the best prediction quality is expected to lie on the diagonal line that represents the true response. Fig. 5 depicts a close match between the two modeling techniques.

Fig. 5. Response plot comparing SVM and GPR response with true response.

Fig. 6 and Fig. 7 show the individual impact of the input predictors, irradiance and temperature, on the predicted response, that is, the PVG output power of the two modeling techniques. The individual predictor impact of the two techniques is also closely matched, as anticipated from the Predicted versus Actual plot in Fig. 5.
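The study trains its models in MATLAB's Statistics and Machine Learning Toolbox. As a rough, self-contained illustration of the same workflow (fit on irradiance and temperature, predict maximum power, then score the prediction with the error metrics of Section II-C plus MSE), the sketch below implements a minimal GPR posterior mean with a Matern 5/2 kernel in plain Python. It is not the authors' implementation: the synthetic `toy_power` response, the 40 training samples, the noise level, the unit kernel hyperparameters, and the forecast-minus-measured sign convention for MBE are all assumptions, and SVM regression is omitted for brevity.

```python
import math
import random


def matern52(x, z, sigma_f=1.0, sigma_l=1.0):
    """Matern 5/2 covariance between two predictor vectors."""
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))
    s = math.sqrt(5.0) * r / sigma_l
    return sigma_f ** 2 * (1.0 + s + s * s / 3.0) * math.exp(-s)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def gpr_fit(xs, ys, noise=1e-2):
    """Return (mean, alpha) with alpha = (K + noise I)^-1 (y - mean)."""
    mean = sum(ys) / len(ys)
    k = [[matern52(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    return mean, solve(k, [y - mean for y in ys])


def gpr_predict(model, xs, x_new):
    """Posterior mean of the fitted GPR at x_new."""
    mean, alpha = model
    return mean + sum(a * matern52(x, x_new) for a, x in zip(alpha, xs))


def error_metrics(measured, forecast):
    """MAE, MBE, MSE, and RMSE; residual = forecast - measured (assumed convention)."""
    residuals = [f - m for m, f in zip(measured, forecast)]
    n = len(residuals)
    mse = sum(e * e for e in residuals) / n
    return {"MAE": sum(abs(e) for e in residuals) / n,
            "MBE": sum(residuals) / n,
            "MSE": mse,
            "RMSE": math.sqrt(mse)}


def toy_power(irradiance, temperature):
    """Hypothetical stand-in for the load line maximum power in kW."""
    return 100.0 * irradiance * (1.0 - 0.2 * (temperature - 0.5))


random.seed(0)
train_x = [(random.random(), random.random()) for _ in range(40)]
train_y = [toy_power(g, t) for g, t in train_x]
test_x = [(random.random(), random.random()) for _ in range(20)]
test_y = [toy_power(g, t) for g, t in test_x]

model = gpr_fit(train_x, train_y)
pred_y = [gpr_predict(model, train_x, x) for x in test_x]
metrics = error_metrics(test_y, pred_y)
```

Plotting `pred_y` against `test_y` would give the kind of Predicted versus Actual view described in the text, and the residuals `pred_y[i] - test_y[i]` are what a residual plot displays. By construction, the RMSE entry of `metrics` is the square root of the MSE entry.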
Fig. 6. Irradiance impact on plot comparing SVM and GPR response with true response.

Fig. 7. Temperature impact on plot comparing SVM and GPR response with true response.

The residual plot is another helpful tool for closely comparing the prediction quality of models. It displays the error between the predicted and the true response. The model whose residuals are concentrated symmetrically about the x-axis is expected to perform better. The residuals of SVM are more symmetrically concentrated along the x-axis, as depicted in Fig. 8. GPR offers inferior prediction quality for extremely low irradiance cases, as represented by an outlier in Fig. 8. Conversely, SVM offers better prediction quality for all irradiance cases, as evident from the residuals concentrated along the x-axis in Fig. 8.

Fig. 8. Residual plot comparing deviation of predicted response from true response for GPR and SVM.

The superiority of SVM is further cemented by comparing the performance indices of the two techniques, as highlighted in Table I. The indicators chosen for evaluation are MSE, RMSE, and MAE. It is clear from Table I that the SVM technique estimates the nonlinear behavior of the PV system more appropriately than GPR and thus predicts the output power more accurately.

TABLE I. Performance indicators comparison of GPR and SVM.

Performance Indicators           Gaussian Process Regression    Support Vector Machine
Root Mean Square Error (RMSE)    102.32                         22.31
Mean Absolute Error (MAE)        76.21                          15.43
Mean Square Error (MSE)          10468.72                       497.93
Training Time (s)                19.97                          16.87

IV. CONCLUSION AND FUTURE WORK

Numerous models for PV system output power prediction have been proposed and tested lately. In this work, GPR and SVM based statistical models are trained for output power forecast of a PV system. The techniques have been trained, tested, and validated for the varying temperature and irradiance conditions of Abbottabad, Pakistan. The performance of both statistical techniques is evaluated using standard statistical error metrics. The RMSE for GPR is 102.32, while that of SVM is 22.31. Thus, the trained SVM model is better than GPR for the proposed power prediction scenario.

In future, an extended microgrid incorporating multiple sources will be simulated. Hardware validation of the statistical techniques will also be carried out.

REFERENCES

[1] F. D. Alexander Buttler, Simon Franz, Hartmut Spliethoff, “Variability of wind and solar power – An assessment of the current situation in the European Union based on the year 2014,” Energy, vol. 106, pp. 147-161, 2016.
[2] M. A. Andrew Mills, Michael Brower, Abraham Ellis, Ray George, Thomas Hoff, Benjamin Kroposki, Carl Lenox, Nicholas Miller, Michael Milligan, Joshua Stein, Yih-huei Wan, “Understanding Variability and Uncertainty of Photovoltaics for Integration with the Electric Power System,” IEEE Power and Energy Magazine, vol. 9, pp. 33-41, 2011.
[3] T. S. Elke Lorenz, Johannes Hurka, Detlev Heinemann, Christian Kurz, “Regional PV power prediction for improved grid integration,” Progress in Photovoltaics, vol. 19, pp. 757-771, 2011.
[4] M. B. Marius Paulescu, Remus Boata, Viorel Badescu, “Structured, physically inspired (gray box) models versus black box modeling for forecasting the output power of photovoltaic plants,” Energy, vol. 121, pp. 792-802, 2017.
[5] N. O. J. Antonanzas, R. Escobar, R. Urraca, F. J. Martinez-de-Pison, F. Antonanzas-Torres, “Review of photovoltaic power forecasting,” Solar Energy, vol. 136, pp. 78-111, 2016.
[6] V. O. Pamela Ramsami, “A hybrid method for forecasting the energy output of photovoltaic systems,” Energy Conversion and Management, vol. 95, pp. 406-413, 2015.
[7] M. N. Muhammad Qamar Raza, Chandima Ekanayake, “On recent advances in PV output power forecast,” Solar Energy, vol. 136, pp. 125-144, 2016.
[8] A. D. S. Leva, F. Grimaccia, M. Mussetta, E. Ogliari, “Analysis and validation of 24 hours ahead neural network forecasting of photovoltaic output power,” Mathematics and Computers in Simulation, vol. 131, pp. 88-100, 2017.
[9] Y. S. Yanting Li, Lianjie Shu, “An ARMAX model for forecasting the power output of a grid connected photovoltaic system,” Renewable Energy, vol. 66, pp. 78-89, 2014.
[10] L. N. David P. Larson, Carlos F. M. Coimbra, “Day-ahead forecasting of solar power output from photovoltaic plants in the American Southwest,” Renewable Energy, vol. 91, pp. 11-20, 2016.
[11] Y. H. Yanting Li, Yan Su, Lianjie Shu, “Forecasting the daily power output of a grid-connected photovoltaic system based on multivariate adaptive regression splines,” Applied Energy, vol. 180, pp. 392-401, 2016.
[12] B. K. Sidra Kanwal, Sahibzada Muhammad Ali, Chaudhry Arshad Mehmood, “Gaussian process regression based inertia emulation and reserve estimation for grid interfaced photovoltaic system,” Renewable Energy, vol. 126, pp. 865-875, 2018.
[13] G. N. Cyril Voyant, Soteris Kalogirou, Marie-Laure Nivet, F. M. Christophe Paoli, and Alexis Fouilloy, “Machine learning methods for solar radiation forecasting: A review,” Renewable Energy, vol. 105, pp. 569-582, 2017.
[14] Carl Edward Rasmussen, C. K. I. Williams, Gaussian Processes for Machine Learning: MIT Press, 2006.
[15] V. N. Vapnik, The Nature of Statistical Learning Theory, first ed.: Springer, New York, USA, 1995.
[16] “Abbottabad Wikipedia,” 07 December, 2017; https://en.wikipedia.org/wiki/Abbottabad.
[17] “Abbottabad Weather Averages,” 10 November, 2017; https://www.worldweatheronline.com/v2/weather-averages.aspx?q=aaw.