Rainfall Forecasting
net/publication/334089285
Space Application Center
All content following this page was uploaded by Urmay Shah on 07 October 2019.
Abstract—This paper aims to give users from various fields, e.g. agriculturists and researchers, insight into the climate so that they can understand the significance of changes in climate and atmospheric parameters such as precipitation, temperature and humidity. Precipitation estimation is one of the critical investigations in the field of meteorological research. To predict precipitation, several statistical procedures and machine learning techniques are applied to forecast and estimate meteorological parameters. Daily observations were used for the experiments. The accuracy of the forecasting models is assessed by validating the results against ground truth. The experiments demonstrate that ARIMA and Neural Network work best for forecasting meteorological parameters, and that the Random Forest model gives the best classification accuracy among the machine learning algorithms compared for forecasting precipitation for the next season.

Keywords—Precipitation, ARIMA, SVM, Decision Tree, Holt Winter, Machine Learning, Random Forest

I. INTRODUCTION
In India, where the majority of agriculture depends on precipitation as its main source of water, the timing and amount of precipitation are highly important and can affect the entire economy of the nation. Climate plays a vital role in our everyday life. Since the beginning of human development, people have been occupied with thinking about climatic changes. Weather forecasting has been one of the most challenging problems in science and technology over the last couple of centuries. Prediction is the act of estimating what may happen to a system in the near future. Present weather observations are obtained from ground-based instruments and from satellites through remote sensing. As India's economy depends significantly on agriculture, precipitation plays an important part.
Monthly climatic changes are analyzed using spatiotemporal mining, and the variability in seasonal rainfall is studied using IMD data from many rain gauge stations, by K. Chowdari in [5][1]. Cluster analysis is also performed using the number of rainy days and rainfall as the input variables. L. Ingsrisawang in [11] has done a comparative study of rainfall prediction using different machine learning techniques on the north-eastern part of Thailand. The paper shows how feature selection can be used to find the correlation between the other weather parameters and rainfall, and it also presents same-day, next-day and next-2-day classification using ANN, SVM and KNN. Thai Meteorological Department (TMD) data is used for the experiments. Attributes like temperature, humidity, pressure, wind and rain occurrence are used as input to the model. In [15], S.N. Kohail has used daily historical data of Gaza city, and outlier analysis, prediction, classification and clustering are done for temperature prediction. The paper shows temperature prediction and classification for Gaza city using many machine learning techniques; it also performs outlier detection and clustering. Daily relative humidity, average temperature, wind speed with direction, time of highest speed and rainfall are used as input parameters in the study. The onset of the monsoon over the Indian sub-continent is predicted in [13] from features extracted from satellite images using data mining methods. KNN with Euclidean distance is applied to sea surface temperature (SST), cloud top temperature (CTT), cloud density and water vapour attributes, and the proposed method predicts the monsoon onset 10-30 days in advance.
Rainfall classification using Supervised Learning in Quest (SLIQ) and a decision tree method with different Gini index values is performed in [18]. Dew point, temperature, pressure, humidity and wind speed were used as input parameters. Petre in [17] (2008) proposed an approach that uses a decision tree with the CART algorithm on data from the meteorological department of Hong Kong. They used year, month, average pressure, relative humidity, cloud quantity, precipitation and average temperature as input parameters. The work by S.-Y. Ji in [12] uses decision trees with the CART and C4.5 algorithms, with temperature, wind direction, wind speed, wind gust, outdoor
humidity, evaporation, solar radiation, dew point, cloud cover, air density, vapour pressure and pressure altitude as parameters. The proposed method predicts rain and classifies hourly rain into three categories: 0.0 to 0.5 mm as level 1, 0.5 to 2.0 mm as level 2, and > 2.0 mm as level 3.
A comparative study of data mining techniques is done in [2] using a historical weather data set. It analyzes different machine learning techniques for regression as well as for classification and shows that KNN performs better for classification and Naive Bayes performs better for regression. Monthly rainfall for the Assam region is forecast using multiple linear regression with 6 years of data gathered from the regional meteorological center, Guwahati [6]. Using Nigerian Meteorological Agency data, the paper in [14] individually predicts minimum temperature, maximum temperature, evaporation, rainfall and radiation using ANN and decision tree; the error in rainfall is very high compared to the prediction error of the other parameters. In [19], several machine learning algorithms such as ANN, Multivariate Adaptive Regression Splines (MARS) and radial basis SVM are compared for forecasting the average daily and monthly rainfall of Fukuoka city, Japan. Rainfall forecasting using a neural network on satellite images is attempted in [14] with parameters like relative humidity, pressure, temperature, precipitable water and wind speed. Daily rainfall prediction over the Dhaka station in Bangladesh using a Markov chain model and logistic regression is performed in [8][16] with the number of rainy days, the number of dry days and rainfall as parameters.
We have observed that most of the papers [5][2][19][14][9][3] claiming higher accuracy have either classified rainfall into three or fewer categories, or have estimated rainfall using machine learning techniques without forecasting it; a few of them have used only a few meteorological parameters for the estimation of rainfall. The papers that do forecast rainfall have used regression and forecasting techniques, which have lower accuracy. We propose a model that predicts rainfall using a fusion of forecasting and machine learning techniques. Prediction of rainfall depends on various other parameters along with temperature. Classifying the rainfall gives good classification accuracy, but our ultimate goal is to predict the rainfall using the other forecasted parameters. The objective of this study is not only to classify rainfall correctly but also to predict it correctly using various forecasted parameters. Our work focuses on understanding the effects of different meteorological parameters on rainfall prediction, along with an exploration of the forecasting and machine learning approaches used for rainfall and their limitations. The proposed model predicts the rainfall for the next season using machine learning and forecasting techniques. Our contribution to this problem is to analyze the accuracy of different machine learning and forecasting techniques for predicting precipitation for the next season.
The rest of the paper is organized as follows: Section II elaborates the methods used. The proposed architecture is explained in Section III. Section IV presents the source of the data and the parameters used with their dates, along with a comparison of the results of the different machine learning and forecasting methods. This section also includes the final classification accuracy of the rainfall. We conclude our work in Section V with the future scope.

II. BASIC PRELIMINARIES
A. Machine learning methods for Regression
Multiple Linear Regression: In multiple linear regression [7], multiple independent parameters are taken as input and the dependent continuous variable is predicted based on the best-fitted line. The relation between them is given by the equation:
Y = a*X + b*Z + c
where Y = dependent variable, a, b = regression parameters, X, Z = independent variables, c = intercept.
Support Vector Regression: Support vector regression (SVR) [3] uses the same principles as the SVM for classification, with only a few minor differences. To minimize the error, it identifies the hyperplane that maximizes the margin, keeping in mind that part of the error is tolerated in linear support vector regression.
Prediction of rainfall using other independent parameters (temperature, humidity, pressure, wind speed etc.) is attempted in many studies, which compare different machine learning techniques and claim higher accuracy when categorizing rainfall into two to three categories, but most of them have not attempted to forecast rainfall for the next season using machine learning techniques. In a few papers, forecasting of rainfall as well as of different weather parameters like temperature, relative humidity and number of rainy days is attempted. The results show that forecasting rainfall individually gives less accurate results compared to the other weather parameters.
As forecasting rainfall individually using forecasting techniques gives lower accuracy, and prediction of rainfall from different weather parameters using machine learning techniques gives higher accuracy, it is necessary to design a fusion model.

III. PROPOSED ARCHITECTURE
In the first part of the proposed model, the retrieved weather data is cleaned and reordered, after which the rainfall data is categorized into different categories according to IMD guidelines. The data is partitioned into two parts, 70% for training and 30% for testing. Four machine learning techniques (decision tree, random forest, KNN and SVM) were applied to the partitioned data, and the individual results were analyzed and tuned.
In the second part of the proposed model, the correlation of rainfall with minimum temperature, maximum temperature, relative humidity and wind speed was calculated. The study found that all four parameters are significantly related to rainfall. All past years' maximum temperature and minimum temperature
were retrieved except last year. Based on the past data six different forecasting methods (Holt winter method [10], ARIMA model [10], Simple Moving Average model [2], Neural Network method [10], Seasonal Naive method [10]) were applied and the best-fitted model output was taken into consideration. Relative humidity and wind speed were retrieved from minimum temperature and maximum temperature using linear regression and support vector regression as it is found that it gives better accuracy by this method compared to a direct forecast of the individual.

detailed explanation of tuned parameters. The detailed analysis of the best-fitted model and comparison of all methods based on performance is done.
The data for the experimental purpose is retrieved from global weather site and it is provided by National Centers for Environmental Prediction (NCEP). For experimentation, daily data from 1/1/1979 to 7/31/2014 is collected from five locations. Data also contains parameters like minimum
original data set. Table 1 shows the error of the 365-day maximum temperature forecast. The results show that for maximum temperature the ARIMA model performs better than the other models. Arima(3,0,4) is the best-fitted model.
Forecasting Minimum Temperature: Various forecasting methods for the minimum temperature were analyzed; the neural network shows a significantly lower RMSE than the other models. NNR(30,1,16)[365] is the best-fitted model: an average of 20 networks, each of which is a 31-16-1 network with 529 weights; the options were linear output units. Estimated sigma^2 = 0.01786.
Forecasting Relative Humidity: As the correlation between relative humidity and rainfall is significant (0.303), we have also forecasted relative humidity. We used minimum temperature and maximum temperature as the input to the model and predicted relative humidity. The forecasted minimum and maximum temperatures were given as input instead of the measured temperatures to get the final model accuracy. The results show that support vector regression, which is a combination of linear regression and support vector machine, works best.
Forecasting Wind Speed: Wind speed is one of the important parameters for predicting rainfall, as its correlation with rainfall is 0.49, so it is also important to forecast the wind speed (m/s). We applied the same two regression techniques to predict the wind speed from two input parameters, minimum temperature and maximum temperature; support vector regression gives a lower RMSE than simple linear regression.

Table 2: Forecasted Relative Humidity and Wind Speed RMSE
Method                      Relative Humidity RMSE (fraction)   Wind Speed RMSE (m/s)
Linear Regression           0.75                                0.1345
Support Vector Regression   0.68                                0.1116

B. Machine Learning Model
KNN Method: To identify the best number of nearest neighbors, we tried different values of k. The study reveals that k=15 gives the best classification accuracy for the 1-year forecast, and k=9 gives the best classification accuracy for the June to December forecast. The confusion matrix shows that no sample was classified as very heavy rain. The results also show considerable accuracy for no rain, very light rain and moderate rain. For very heavy rain, heavy rain and rather heavy rain, the results were not impressive.
Decision Tree: In this method, we have used the Gini index algorithm for the selection of the most homogeneous node. The higher the value of Gini, the higher the homogeneity, and the decision tree is generated on that basis.
Pruning is also done in order to limit the depth of the tree. To ensure that the tree is neither overfitted nor underfitted, we have also tuned the tree. It shows the best result at level 5, so to avoid overfitting we have grown the tree only up to level 5. 10-fold cross validation is done on this data set to measure the accuracy of the model.
The results were also analyzed with a confusion matrix. It is found that, unlike KNN, this method has classified very heavy rain. But, as in the case of KNN, it shows considerable accuracy only for no rain, moderate rain and very light rain.
Support Vector Machine: To obtain the best classification accuracy, different combinations of kernel, gamma and C values were tried for tuning. The radial basis function kernel, linear kernel and sigmoid kernel were tried for the kernel parameter, along with different gamma and C values. It is found that the linear kernel with gamma value 0.1 and C value 1 gives the best accuracy compared to the others. From the confusion matrix, it is found that SVM is unable to classify Heavy Rain and Very Heavy Rain. Even for light rain and very light rain the results were poor.
In the experimentation we have used a larger number of classes to classify the rainfall, but as SVM works best with an optimal margin, multiple categories may overlap each other, because of which SVM performs worst compared to the others.
Random Forest: Random forest [4] is a tree-based model; it is a collection of many tree models. We have applied different tuning parameters to tune it. For a random forest, one of the parameters is how many trees should be used to get more accurate results. It works well with high-variance, low-bias models. It is noticed that beyond 250 trees the error rate is constant, so we restrict the number of trees in the forest to 250. From the confusion matrix, it is found that the random forest method gives the best accuracy for very light rain. It also performs well for no rain, moderate rain and light rain.
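Each classifier in this section is tuned by trying several settings and keeping the one with the best accuracy (k for KNN, tree depth, kernel/gamma/C for SVM, number of trees for the random forest). The sketch below illustrates that selection loop for KNN only. It is a minimal illustration under stated assumptions, not the authors' implementation: the hand-written `knn_predict`, the helper names, the hold-out evaluation and the toy samples (maximum temperature in deg C, relative humidity as a fraction) are all hypothetical, and the toy data does not reproduce the k=15 or k=9 results reported above.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, rainfall class)


def knn_predict(train: List[Sample], x: Sequence[float], k: int) -> str:
    """Majority vote among the k training samples closest to x (Euclidean)."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy_for_k(train: List[Sample], test: List[Sample], k: int) -> float:
    """Fraction of held-out samples whose class is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def best_k(train: List[Sample], test: List[Sample], candidates: List[int]) -> int:
    """Candidate k with the highest held-out accuracy (first one wins ties)."""
    return max(candidates, key=lambda k: accuracy_for_k(train, test, k))


# Made-up samples for illustration only; features are left unscaled.
TRAIN: List[Sample] = [
    ([30.0, 0.40], "No Rain"),
    ([31.0, 0.35], "No Rain"),
    ([29.5, 0.45], "No Rain"),
    ([25.5, 0.82], "No Rain"),        # atypical point that misleads k=1
    ([25.0, 0.85], "Moderate Rain"),
    ([24.0, 0.90], "Moderate Rain"),
    ([26.0, 0.80], "Moderate Rain"),
]
TEST: List[Sample] = [
    ([30.5, 0.38], "No Rain"),
    ([24.5, 0.88], "Moderate Rain"),
    ([25.4, 0.83], "Moderate Rain"),
]

if __name__ == "__main__":
    for k in (1, 3, 5):
        print(k, round(accuracy_for_k(TRAIN, TEST, k), 3))
    print("selected k:", best_k(TRAIN, TEST, [1, 3, 5]))
```

The same loop applies to any single tuning parameter; with real data, a cross-validated score (such as the 10-fold scheme mentioned for the decision tree) would usually replace the single hold-out split assumed here.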
Table 3: Accuracy on 30% test data.
Method          AUC (Area Under Curve)   Classification Accuracy   Precision   Recall
KNN             0.873                    0.721                     0.691       0.721
Tree            0.755                    0.721                     0.716       0.721
SVM             0.684                    0.539                     0.659       0.539
Random Forest   0.914                    0.762                     0.744       0.76

Table 4: Final Accuracy Comparison on Forecast
Method   Final Accuracy (1 year, 365 days)   Final Accuracy (for June to Dec)

axis against the False Positive Rate (1-Specificity) on the x-axis for each category.
Table 4 shows the final classification accuracy of each method with the forecasted parameters as input to the trained model. In a country like India, rainfall occurs in only a limited number of months, so we have also analyzed our accuracy for the monsoon season, and it is noticed that it gives considerable classification accuracy.

Table 5: Confusion Matrix for the Random Forest (for Jun to Dec)
Rows: predicted (days). Columns: actual (days).
Predicted     Heavy Rain   Light Rain   Moderate Rain   No Rain   Rather Heavy Rain   Very Heavy Rain   Very Light Rain
Heavy Rain    0            0            0               0         0                   0                 0
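The accuracy, precision and recall figures in Table 3 can all be derived from a confusion matrix laid out like Table 5 (rows predicted, columns actual). The sketch below shows one way to do this. It is an illustration under assumptions: the function name `weighted_metrics` and the 3-class toy matrix are hypothetical, and class-weighted averaging is assumed because the source does not state how per-class precision and recall were averaged. That assumption is at least consistent with Table 3, where recall matches classification accuracy for each method, since class-weighted recall always equals overall accuracy.

```python
from typing import Dict, List


def weighted_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus class-weighted precision and recall.

    cm[i][j] is the number of days predicted as class i whose actual
    class is j (rows predicted, columns actual).
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[c][c] for c in range(n_classes))

    precision_w = 0.0
    recall_w = 0.0
    for c in range(n_classes):
        tp = cm[c][c]
        predicted_c = sum(cm[c])                             # row sum
        actual_c = sum(cm[r][c] for r in range(n_classes))   # column sum
        weight = actual_c / total                            # class support
        precision_c = tp / predicted_c if predicted_c else 0.0
        recall_c = tp / actual_c if actual_c else 0.0
        precision_w += weight * precision_c
        recall_w += weight * recall_c

    return {
        "accuracy": correct / total,
        "precision": precision_w,
        "recall": recall_w,
    }


# Made-up 3-class matrix for illustration only (not data from the paper).
TOY_CM = [
    [50, 5, 0],
    [10, 20, 5],
    [0, 5, 5],
]

if __name__ == "__main__":
    for name, value in weighted_metrics(TOY_CM).items():
        print(f"{name}: {value:.3f}")
```

A class that is never predicted, such as the all-zero Heavy Rain row in Table 5, contributes a precision of 0 under the convention used here; other conventions leave it undefined.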