Mini Project Report on
RAINFALL PREDICTION SYSTEM
Submitted in partial fulfillment of the requirement for the award of the
                               degree of
                    BACHELOR OF TECHNOLOGY
                                   IN
                 COMPUTER SCIENCE & ENGINEERING
                          Submitted by:
 Student Name: Sumit Malan                         University Roll No.: 2019162
                         Under the Mentorship of
                         Ms. Meenakshi Maindola
 Department of Computer Science and Engineering
      Graphic Era (Deemed to be University)
             Dehradun, Uttarakhand
                  January-2024
                      CANDIDATE’S DECLARATION
I hereby certify that the work which is being presented in the project report entitled "Rainfall
Prediction System" in partial fulfillment of the requirements for the award of the Degree of
Bachelor of Technology in Computer Science and Engineering of the Graphic Era (Deemed
to be University), Dehradun has been carried out by me under the mentorship of Ms.
Meenakshi Maindola, Department of Computer Science and Engineering, Graphic Era
(Deemed to be University), Dehradun.
               Name: Sumit Malan
               University Roll No.: 2019162
                   Table of Contents
 Chapter 1    Introduction
 Chapter 2    Literature Survey
 Chapter 3    Methodology
 Chapter 4    Result and Discussion
 Chapter 5    Conclusion and Future Work
              References
Chapter 1
                                      Introduction
1.1 Introduction
Rainfall prediction remains a serious concern and has attracted the attention of governments,
industries, risk-management entities, and the scientific community. Rainfall is a climatic
factor that affects many human activities, such as agricultural production, construction,
power generation, forestry, and tourism. To this end, rainfall prediction is essential, since
rainfall is the variable most strongly correlated with adverse natural events such as
landslides, flooding, mass movements, and avalanches. These incidents have affected society
for years. An appropriate approach to rainfall prediction therefore makes it possible to take
preventive and mitigation measures against these natural phenomena.
Weather forecasting is a dynamic and time-consuming task. Even with recent scientific
advancements it remains difficult, because the atmosphere is intensely active and, in
practice, chaotic. Existing climate prediction methods are largely deterministic and
mathematical; artificial intelligence, by contrast, offers the capacity to process knowledge
correctly in such a dynamic setting.
Atmospheric parameters are not predefined and therefore cannot be fixed in advance. The
forecasting of rainfall remains a major issue that has drawn the attention of governments,
markets, risk-control agencies, and the research community, and such forecasts are obtained
through the collection of knowledge. The last decade, moreover, has seen an unprecedented
level of involvement in the field from both scientists and technology developers. Rainfall is
one of the most critical parameters in a hydrological model, and many models for the
analysis and prediction of precipitation patterns have been established. Thanks to their
time-frequency interpretation, wavelet techniques have been widely applied to water-
management work in recent years. In this work we predict rainfall using data mining
algorithms such as neural networks, random forests, classification and regression trees,
support vector machines, and k-nearest neighbours. The approach proposed in this work is a
spatial-temporal mining algorithm for a better understanding of weather and climate data.
We also developed some novel mining algorithms based on case studies such as rainfall
analysis and simulation, cyclone analysis and simulation, and temperature analysis and
simulation. We found, based on correlation, that various parameters such as temperature and
humidity cause precipitation in the atmosphere. In the proposed work, we predict rainfall
from quantitative data about the current atmospheric state through a complex combination of
mathematical abstractions.
1.2 Existing System
Agriculture is the strength of the Indian economy, and farmers depend on the monsoon for
their cultivation. Good crop productivity needs good soil, fertilizer, and a favourable
climate, so weather forecasting is a very important requirement for every farmer. Due to
sudden changes in climate and weather, people suffer economically and physically. Weather
prediction is one of the challenging problems of the current state of the art. The main
motivation of this work is to predict the weather using various data mining techniques, such
as classification, clustering, decision trees, and neural networks. Weather-related
information is also called meteorological data. The most commonly used weather parameters
in this work are rainfall, wind speed, temperature, and cold.
1.2.1 Disadvantages of Existing System
1. Classification
2. Clustering
3. Decision Tree
1.3 PROPOSED SYSTEM
Rainfall is important for food production planning, water resource management, and all
activities that depend on nature. The occurrence of a prolonged dry period or of heavy rain
at critical stages of crop growth and development may lead to a significant reduction in crop
yield. India is an agricultural country whose economy is largely based on crop productivity;
thus, rainfall prediction is a significant factor in agricultural countries like India. Rainfall
forecasting has been one of the most scientifically and technologically challenging problems
around the world over the last century.
1.3.1 Advantages of Proposed System
1. Numerical Weather Prediction
2. Statistical Weather Prediction
3. Synoptic Weather Prediction
Chapter 2
                                 Literature Survey
 Pritpal Singh et al. Statistical investigation shows the nature of Indian Summer Monsoon
Rainfall (ISMR), which cannot be accurately predicted by statistics or statistical data alone.
Hence, this study demonstrates the use of three techniques: object creation, entropy, and
artificial neural networks (ANN). On this basis, a new method for forecasting ISMR time
series has been created to address the nature of ISMR. The model has been validated and
supported by training and exploratory data, and statistical examination of the various data
sets and comparative studies demonstrate the performance of the proposed technique.
 Sam Cramer, Michael Kampouridis, Alex A. Freitas, Antonios Alexandridis et al. The
primary contribution of this work is to demonstrate the benefits of machine learning
algorithms, as well as their greater degree of predictive power compared with current
rainfall forecasting methods. The authors analyse and compare the existing approach (a
Markov chain extended with rainfall research) with the predictions of six well-known
machine learning methods: genetic programming, support vector regression, radial basis
function networks, M5 rules, M5 model trees, and k-nearest neighbours. To enable a more
detailed assessment, they conducted a rainfall study using data from 42 cities.
 Sahar Hadi Pour, Shamsuddin Shahid, Eun-Sung Chung et al. Random forest (RF) was
used to predict whether it would rain on a given day, while SVM was used to predict the
amount of rain on a rainy day. The capability of the hybrid model was assessed by
downscaling day-by-day rainfall at three sites in the eastern part of Malaysia. The hybrid
models were also found to reproduce the full variability, the number of consecutive wet
days, the 95th percentile of month-to-month rainfall, and the distribution of the observed
rainfall.
 Tanvi Patil, Dr. Kamal Shah et al. The purpose of the framework is to predict the weather
at some point in the future. Climatic conditions are determined using many kinds of
variables at every location; of these, only the most significant features are used in weather
forecasts. Choosing such features depends a great deal on the time period chosen. Structural
modelling is used to incorporate predictive modelling, machine learning applications,
information exchange, and characteristic analysis.
 N. Divya Prabha, P. Radha et al. Compared with other places, where rainfall data are not
available it takes a long time to build up a reliable water survey over many years. Improving
complex neural networks is intended to provide an excellent instrument for predicting the
rainy season. The rainfall sequence was confirmed using a multilayer perceptron neural
network. Measurements such as MSE (mean squared error) and NMSE (normalized mean
squared error), and the arrangement of data sets for short-term planning, are clear in the
comparison of different networks such as AdaNaive and AdaSVM.
 Senthamil Selvi S, Seetha et al. In this paper, artificial neural network (ANN) technology
is used to develop a weather forecasting method that identifies rainfall using Indian rainfall
data. A feed-forward neural network (FFNN) trained with the backpropagation algorithm
was used. The performance of the two models is evaluated on the basis of iteration analysis,
mean square error (MSE), and magnitude of relative error (MRE). The report also provides
a future guide to rainfall forecasting.
 Yashas Athreya, Vaishali B V, Sagar K and Srinidhi H R et al. This work presents rainfall
analysis using machine learning. The main motivation behind the program is to protect
against the effects of floods. The program can be used by ordinary residents or by the public
authorities to anticipate what will happen before a flood and, using a flood map, to provide
the necessary help by relocating people or taking other important measures.
Chapter 3
                                         Methodology
In this work, the overall architecture includes four major components: Data Exploration and
Analysis, Data Pre-processing, Model Implementation, and Model Evaluation, as shown in
Fig. 3.1.
                                  Fig. 3.1 Overall Architecture.
3.1 Data Exploration and Analysis
Exploratory Data Analysis is valuable to machine learning problems since it allows to get
closer to the certainty that the future results will be valid, correctly interpreted, and applicable
to the desired business contexts. Such level of certainty can be achieved only after raw data is
validated and checked for anomalies, ensuring that the data set was collected without errors.
EDA also helps to find insights that were not evident or worth investigating to business
stakeholders and researchers. We performed EDA using two methods - Univariate
Visualization which provides summary statistics for each field in the raw data set (figure 3.2)
and Pair-wise Correlation Matrix which is performed to understand interactions between
different fields in the data set (figure 3.3).
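
As a minimal sketch of these two EDA methods, the snippet below produces per-field
summary statistics, histograms, and a pair-wise correlation heat map. The file name
weatherAUS.csv is our assumption about the underlying Australian rainfall data set; the rest
uses standard pandas and seaborn calls.

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Assumed file name for the Australian rainfall data set
    df = pd.read_csv("weatherAUS.csv")

    # Univariate visualization: summary statistics and a histogram per field
    print(df.describe(include="all"))
    df.hist(figsize=(14, 10))
    plt.tight_layout()
    plt.show()

    # Pair-wise correlation matrix of the numeric fields, drawn as a heat map
    corr = df.corr(numeric_only=True)
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
    plt.show()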
                                 Table 3.1 Irrelevant Features
                               Fig. 3.2 Univariate Visualization.
We have other features with null values too, which we will impute in our preprocessing
steps. If we look at the distribution of our target variable, it is clear that we have a class
imbalance problem, with 110316 positive instances and 31877 negative instances.
                                       Fig. 3.3 Heat Map.
The correlation matrix shows that the features MaxTemp, Pressure9am, Pressure3pm,
Temp3pm, and Temp9am are negatively correlated with the target variable. Hence, we can
drop these features in our feature selection step later.
3.2 Data Preprocessing
Data preprocessing is a data mining technique that involves transforming raw data into an
understandable format. Real-world data are often incomplete, inconsistent, and/or lacking in
certain behaviors or trends, and are likely to contain many errors. We carried out the
preprocessing steps below.
3.2.1 Missing Values: As per our EDA step, we learned that we have a few instances with
null values, so imputing them becomes an important step. To impute the missing values, we
group our instances based on location and date and replace the null values with their
respective group means.

Feature Expansion: The Date feature can be expanded into Day, Month, and Year, and these
newly created features can then be used in other preprocessing steps.
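
A minimal pandas sketch of these two steps follows. Grouping by Location and the expanded
Month column (rather than the exact date) is an assumption about how the group means were
computed; the column names come from the data set discussed above.

    import pandas as pd

    df = pd.read_csv("weatherAUS.csv", parse_dates=["Date"])

    # Feature expansion: split Date into Day, Month and Year
    df["Day"] = df["Date"].dt.day
    df["Month"] = df["Date"].dt.month
    df["Year"] = df["Date"].dt.year

    # Impute nulls in each numeric column with the mean of its
    # (Location, Month) group, falling back to the overall column mean
    num_cols = df.select_dtypes("number").columns
    df[num_cols] = df.groupby(["Location", "Month"])[num_cols].transform(
        lambda s: s.fillna(s.mean()))
    df[num_cols] = df[num_cols].fillna(df[num_cols].mean())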
3.2.2 Categorical Values: A categorical feature is one that has two or more categories with
no intrinsic ordering among them. We have a few categorical features (WindGustDir,
WindDir9am, WindDir3pm), each with 16 unique values. Machines process numbers more
readily than text, since the models are based on mathematical equations and calculations;
therefore, we have to encode the categorical data. We tried two different techniques here.
3.2.2.1 Dummy Variables: A dummy variable is an artificial variable created to represent
an attribute with two or more distinct categories/levels. However, as we have 16 unique
values, our one feature gets transformed into 16 new features, which in turn invites the curse
of dimensionality: for each instance, we will have one feature with value 1 and the
remaining 15 features with value 0. Example: categorical encoding of the feature
windDir3pm using dummy variables.
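
A short sketch of this encoding with the pandas get_dummies API (reference [5]); the four
sample directions are illustrative only.

    import pandas as pd

    # A few sample wind directions; the real feature has 16 unique values
    wind = pd.DataFrame({"WindDir3pm": ["N", "SSE", "W", "N"]})

    # Each distinct category becomes its own 0/1 indicator column
    dummies = pd.get_dummies(wind["WindDir3pm"], prefix="WindDir3pm")
    print(dummies)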
                                  Fig. 3.4 Sample Instance.
3.2.2.2 Feature Hashing: Feature hashing is another useful feature engineering scheme for
dealing with large-scale categorical features. In this scheme, a hash function is typically
used with the number of encoded features pre-set (as a vector of pre-defined length), such
that the hashed values of the features are used as indices into this pre-defined vector and the
values are updated accordingly. Example: categorical encoding of the feature windDir3pm
using feature hashing.
                                  Fig. 3.5 Sample Instance.
                                  Fig. 3.6 Feature Hashing.
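
The sketch below encodes the same feature with scikit-learn's FeatureHasher (reference
[6]). The vector length of 8 is an illustrative choice for readability, not necessarily the
setting used in our experiments.

    from sklearn.feature_extraction import FeatureHasher

    # Hash each direction into a fixed-length vector with 8 slots
    hasher = FeatureHasher(n_features=8, input_type="string")
    directions = [["N"], ["SSE"], ["W"]]  # one token list per instance
    hashed = hasher.transform(directions)
    print(hashed.toarray())  # 3 instances x 8 hashed features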
3.2.3 Feature Selection: Feature selection is the process of automatically or manually
selecting those features which contribute most to our prediction variable or output. Having
irrelevant features in the data can decrease the accuracy of the models and make the model
learn based on irrelevant features. Feature selection helps to reduce overfitting, improves
accuracy, and reduces training time. We used two techniques to perform this activity and
obtained the same results.
3.2.3.1 Univariate Selection: Statistical tests can be used to select those features that have
the strongest relationship with the output variable. The scikit-learn library provides the
SelectKBest class, which can be used with a suite of different statistical tests to select a
specific number of features. We used the chi-squared statistical test for non-negative
features to select the 5 best features from our data set.
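
A hedged sketch of this step combines SelectKBest with the chi-squared test (reference [8])
and MinMax scaling (reference [7]) to keep the inputs non-negative; the synthetic X, y
merely stand in for our preprocessed features and target.

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.preprocessing import MinMaxScaler

    # Placeholder data standing in for the preprocessed rainfall features
    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    X = MinMaxScaler().fit_transform(X)  # chi2 requires non-negative inputs

    selector = SelectKBest(score_func=chi2, k=5)
    X_best = selector.fit_transform(X, y)
    print(selector.get_support(indices=True))  # indices of the 5 kept features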
3.2.3.2 Correlation Matrix with Heatmap: Correlation states how the features are related to
each other or to the target variable. Correlation can be positive (an increase in one feature's
value increases the value of the target variable) or negative (an increase in one feature's
value decreases the value of the target variable). A heat map makes it easy to identify which
features are most related to the target variable; we plotted a heat map of correlated features
using the seaborn library (figure 3.3).
3.2.4 Handling Class Imbalance: We learned in our EDA step that our data set is highly
imbalanced. Imbalanced data lead to biased results, as our model does not learn much about
the minority class. We performed two experiments, one with oversampled data and another
with undersampled data.
3.2.4.1 Undersampling: We used Imblearn's RandomUnderSampler to randomly eliminate
instances of the majority class [10] (figure 3.7).
3.2.4.2 Oversampling: We used Imblearn's SMOTE technique to generate synthetic
instances for the minority class. A subset of the minority class is taken as an example, and
new, similar synthetic instances are then created (figure 3.8).
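
The sketch below exercises both resampling strategies with imbalanced-learn (reference
[10]); the synthetic X, y mimic the roughly 78/22 class split observed in our EDA step.

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.under_sampling import RandomUnderSampler
    from imblearn.over_sampling import SMOTE

    # Synthetic two-class data with an imbalance similar to our target
    X, y = make_classification(n_samples=1000, weights=[0.78], random_state=0)
    print("original:", Counter(y))

    # Undersampling: randomly drop majority-class instances
    X_u, y_u = RandomUnderSampler(random_state=0).fit_resample(X, y)
    print("undersampled:", Counter(y_u))

    # Oversampling: synthesize new minority-class instances with SMOTE
    X_o, y_o = SMOTE(random_state=0).fit_resample(X, y)
    print("oversampled:", Counter(y_o))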
                                  Fig. 3.7 Undersampling.

3.3 Models

We chose different classifiers, each belonging to a different model family (such as linear,
tree-based, distance-based, rule-based, and ensemble). All the classifiers were implemented
using scikit-learn, except for the Decision Table, which was implemented using Weka. The
following classification algorithms have been used to build prediction models and perform
the experiments:
3.3.1 Logistic Regression is a classification algorithm used to predict a binary outcome (1/0,
Yes/No, True/False) given a set of independent variables. To represent a binary/categorical
outcome, we use dummy variables. We can also think of logistic regression as a special case
of linear regression where the outcome variable is categorical and the log of odds is used as
the dependent variable. In simple words, it predicts the probability of occurrence of an event
by fitting data to a logit function. This makes Logistic Regression a good fit, as ours is a
binary classification problem.
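
In symbols, writing p for the predicted probability of rain tomorrow, the model fits the log
of odds as a linear function, log(p / (1 - p)) = b0 + b1x1 + ... + bnxn, which is equivalent to
passing the linear combination through the logistic function,
p = 1 / (1 + e^-(b0 + b1x1 + ... + bnxn)).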
3.3.2 Decision Trees have a natural if-then-else construction that makes them fit easily into
a programmatic structure. They are also well suited to categorization problems where
attributes or features are systematically checked to determine a final category. The technique
works for both categorical and continuous input and output variables: we split the population
or sample into two or more homogeneous sets (or sub-populations) based on the most
significant splitter/differentiator among the input variables. These characteristics make the
Decision Tree a good fit for our problem, as our target variable is a binary categorical
variable.
3.3.3 K-Nearest Neighbour (KNN) is a non-parametric and lazy learning algorithm. Non-
parametric means there is no assumption about the underlying data distribution; in other
words, the model structure is determined from the dataset. Lazy means the algorithm does
not build a model from the training data ahead of time; all the training data are used in the
testing phase. KNN performs better with a small number of features than with a large
number, since as the number of features increases, more data are required, and an increase
in dimension also leads to overfitting. However, we have performed feature selection, which
helps to reduce the dimension, so KNN looks like a good candidate for our problem. Our
model's configuration: we tried various values of n ranging from 3 to 30 and learned that the
model performs best with n equal to 25, 27, or 29.
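
A hedged sketch of that search, scoring odd neighbourhood sizes from 3 to 29 with 10-fold
cross-validated accuracy; the synthetic data merely stands in for our preprocessed set.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    # Placeholder data standing in for the preprocessed rainfall features
    X, y = make_classification(n_samples=400, random_state=0)

    for n in range(3, 30, 2):  # odd values of n from 3 to 29
        acc = cross_val_score(KNeighborsClassifier(n_neighbors=n),
                              X, y, cv=10).mean()
        print(f"n={n}: mean accuracy={acc:.3f}")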
3.3.4 Decision Table provides a handy and compact way to represent complex business
logic. In a decision table, the logic is divided into conditions, actions (decisions), and rules
representing the various components that form the business logic. This classifier was
implemented using Weka.
3.3.5 Random Forest is a supervised ensemble learning algorithm. Ensemble means that it
takes a bunch of weak learners and has them work together to form one strong predictor.
Here, we have a collection of decision trees, known as a forest. To classify a new object
based on its attributes, each tree gives a classification, and we say the tree votes for that
class. The forest chooses the classification having the most votes over all the trees in the
forest. Our model's configuration: number of weak learners = 100, maximum depth of each
tree = 4.
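
To tie the model line-up together, the sketch below assembles the scikit-learn classifiers with
the configurations stated above and scores each with 10-fold stratified cross-validation on
accuracy and AUC, mirroring the metrics reported in Chapter 4. The Weka Decision Table
is omitted, and the synthetic data is a placeholder for our preprocessed set.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Placeholder data standing in for the preprocessed rainfall features
    X, y = make_classification(n_samples=500, random_state=0)

    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "KNN (n=25)": KNeighborsClassifier(n_neighbors=25),
        "Random Forest": RandomForestClassifier(n_estimators=100, max_depth=4),
        "Gradient Boosting": GradientBoostingClassifier(learning_rate=0.25),
    }

    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    for name, model in models.items():
        acc = cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
        print(f"{name}: accuracy={acc:.3f}, AUC={auc:.3f}")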
Chapter 4
                              Result and Discussion
4.1 Result
4.1.1 Experiment 1 - Original Dataset: After all the preprocessing steps (as described in the
Methodology chapter), we ran all the implemented classifiers, each one with the same input
data (shape: 92037 x 4). Figure 4.1 depicts the two considered metrics (10-fold stratified
cross-validation accuracy and Area Under Curve) for all the classifiers. Accuracy-wise,
Gradient Boosting with a learning rate of 0.25 performed best; coverage-wise, Random
Forest and Decision Tree performed worst.
                                  Fig. 4.1 Experiment 1.
4.1.2 Experiment 2 - Undersampled Dataset: After all the preprocessing steps (as described
in the Methodology chapter), including the undersampling step, we ran all the implemented
classifiers, each one with the same input data (shape: 54274 x 4). Figure 4.2 depicts the two
considered metrics (10-fold stratified cross-validation accuracy and Area Under Curve) for
all the classifiers.
                                   Fig. 4.2 Experiment 2.
Accuracy- and coverage-wise, Logistic Regression performed best and Decision Tree
performed worst.
4.1.3 Experiment 3 - Oversampled Dataset: After all the preprocessing steps (as described
in the Methodology chapter), including the oversampling step, we ran all the implemented
classifiers, each one with the same input data (shape: 191160 x 4). Figure 4.3 depicts the two
considered metrics (10-fold stratified cross-validation accuracy and Area Under Curve) for
all the classifiers.
                                    Fig. 4.3 Experiment 3.
Accuracy- and coverage-wise, Decision Tree performed best and Logistic Regression
performed worst. We obtained a varying range of results with respect to the different input
data and different classifiers. Further metrics are provided in the appendix.
4.2 Discussion
Given the issues with our original dataset, we learned many things from the preprocessing
steps we carried out to rectify them. The first important lesson is the importance of knowing
your data. While imputing missing values, we grouped on two other features and calculated
the group mean instead of directly calculating the mean over all instances; this way, our
imputed values were closer to the correct information. Another lesson concerns leaky
features: while exploring our data, we found that one of our features (RiskMM) had been
used to generate the target variable, and hence it made no sense to use this feature for
predictions. We learned about the curse of dimensionality while dealing with categorical
variables, which we addressed using feature hashing. We also learned two techniques for
performing feature selection: univariate selection and the correlation heat map. Finally, we
explored undersampling and oversampling techniques for handling the class imbalance
problem.
From the experiments that we carried out using different data, we also came to know that in
a few cases we achieved suspiciously high accuracy (Decision Tree), clearly implying a
classic case of overfitting. We also observed that the performance of the classifiers varied
with different input data. To name a few cases: Logistic Regression performed best with
undersampled data whereas it performed worst with oversampled data; the same goes for
KNN, which performed best with oversampled data and worst with undersampled data.
Hence we can say that the input data plays a very important role here. Among the ensembles,
Gradient Boosting in particular performed consistently across all the experiments.
Chapter 5
                          Conclusion and Future Work
In this work, we explored and applied several preprocessing steps and learned their impact
on the overall performance of our classifiers. We also carried out a comparative study of all
the classifiers with different input data and observed how the input data can affect the model
predictions. We can conclude that Australian weather is uncertain and that there is no strong
correlation between rainfall and the respective region and time. We identified certain
patterns and relationships in the data which helped in determining important features; refer
to the appendix section. As we have a huge amount of data, we could apply Deep Learning
models such as the Multilayer Perceptron, Convolutional Neural Networks, and others. It
would be valuable to perform a comparative study between the machine learning classifiers
and deep learning models.
References
1. World Health Organization: Climate Change and Human Health: Risks and Responses.
   World Health Organization, January 2003.
2. Alcántara-Ayala, I.: Geomorphology, natural hazards, vulnerability and prevention of
   natural disasters in developing countries. Geomorphology 47(2-4), 107-124 (2002).
3. Nicholls, N.: Atmospheric and climatic hazards: Improved monitoring and prediction for
   disaster mitigation. Natural Hazards 23(2-3), 137-155 (2001).
4. [Online] InDataLabs, Exploratory Data Analysis: the Best Way to Start a Data Science
   Project. Available: https://medium.com/@InDataLabs/why-start-a-data-science-project-with-exploratory-data-analysis-f90c0efcbe49
5. [Online] Pandas Documentation. Available:
   https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html
6. [Online] Scikit-Learn Documentation. Available:
   https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.FeatureHasher.html
7. [Online] Scikit-Learn Documentation. Available:
   https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html
8. [Online] Scikit-Learn Documentation. Available:
   https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html