AGRICULTURE CROP PREDICTION
Kaviya Bharathi. D (21MIS0069)
Bharani Kumar A (21MIS0110)
Katta Vamsi Krishna (21MIS0250)
Anill Udhayakumar (21MIS0363)
Arvind E (21MIS0439)
PROBLEM IDENTIFICATION
■ The existing crop prediction system has high time complexity and cannot include multiple variables
(attributes) in its predictions.
■ Agricultural data is vast, encompassing satellite imagery, sensor data, weather reports, soil samples,
market prices, and historical crop yields. The sheer volume of this data makes it difficult to process and
store using traditional data processing systems.
■ The data comes in different formats, resolutions, and frequencies, creating a significant challenge in
harmonizing them for consistent analysis.
■ Agricultural datasets often suffer from missing, incomplete, or inconsistent data due to human errors,
sensor malfunctions, or measurement inaccuracies.
■ Incorporating Temporal and Spatial Variability.
■ Data Quality and Consistency.
Methodology
✓ Predicting crop yield involves leveraging the DecisionTreeRegressor, a
powerful algorithm suited for handling complex datasets with multiple
features.
Data collection:
❑ Diverse datasets encompassing soil attributes, meteorological conditions,
and crop-specific information are gathered.
Data preprocessing:
❑ Conducted to clean the data, handle missing values, and ensure
consistency.
Feature engineering:
❑ Relevant features are selected and transformed into a format suitable for
modeling.
❑ This includes combining multiple attributes into a single feature vector and
encoding categorical variables as numerical data.
Decision Tree Regressor
■ The core of the methodology involves model training with the Decision Tree Regressor.
■ This model builds a tree-like structure to make predictions, where each node represents a
decision based on a feature, ultimately predicting crop yields based on historical data.
■ The model's performance is assessed using metrics such as Root Mean Squared Error
(RMSE) and R-squared, which quantify how closely the predictions match the observed yields.
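The training and evaluation steps above can be sketched as follows. This is a minimal illustration on synthetic data, not the project's actual notebook: the three feature columns stand in for rainfall, pesticide use, and temperature, and the yield formula is made up for the example.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): three features scaled to [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 3))
# Hypothetical non-linear "yield" with a little noise.
y = (50.0 * X[:, 0] + 20.0 * (X[:, 1] > 0.5) + 10.0 * X[:, 2] ** 2
     + rng.normal(0.0, 1.0, 500))

# Hold out 30% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = DecisionTreeRegressor(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# RMSE is in the units of the target; R-squared is unitless (1.0 is a perfect fit).
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"RMSE: {rmse:.3f}  R^2: {r2:.3f}")
```

A lower RMSE and an R-squared closer to 1 indicate predictions that track the held-out yields more closely.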
Optimization:
■ The model’s parameters, such as tree depth and minimum samples per leaf, are fine-tuned to
enhance predictive accuracy.
■ Because the DecisionTreeRegressor produces explicit decision rules, this methodology aims for accurate
and interpretable predictions of crop yields, supporting informed decision-making and enhancing agricultural
productivity.
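One common way to tune tree depth and minimum samples per leaf is a cross-validated grid search. The sketch below assumes scikit-learn's GridSearchCV and synthetic data; the grid values are illustrative and are not the ones used in the project.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): three features, noisy target.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(300, 3))
y = 40.0 * X[:, 0] + 15.0 * X[:, 1] * X[:, 2] + rng.normal(0.0, 2.0, 300)

# Candidate values for the two parameters named above (illustrative only).
param_grid = {
    "max_depth": [3, 5, 8, None],
    "min_samples_leaf": [1, 5, 10],
}

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, so RMSE is negated
    cv=5,
)
search.fit(X, y)

best_params = search.best_params_
best_rmse = -search.best_score_
print(best_params, round(best_rmse, 3))
```

Limiting depth or raising the minimum leaf size trades some fit on the training data for less overfitting.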
Data Analysis Method – Regression Analysis
Direct Prediction of Continuous Outcomes:
❑ It models the relationship between crop yield and multiple independent variables (e.g., soil nutrients, weather
conditions, and crop-specific information).
Handling Multiple Variables:
❑ Regression analysis can handle multiple variables simultaneously, which aligns well with the need to include various
attributes such as soil nutrients, meteorological factors (TEMPERATURE, HUMIDITY, RAINFALL), environmental
conditions (pH), and crop-related factors (STATE, CROP_PRICE).
Flexibility and Adaptability:
❑ Advanced regression techniques, such as Decision Tree Regressor, Random Forest Regressor, or Gradient Boosting
Regressor, can model non-linear relationships and interactions between variables, providing flexibility in capturing
complex patterns in the data.
Insightful Feature Importance:
❑ Regression analysis not only predicts crop yield but also provides insights into which factors are most influential.
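For tree-based regressors, one way to read off influential factors is the fitted model's feature_importances_ attribute. The sketch below uses synthetic data in which the hypothetical "pesticide" column deliberately has no effect on the target; the feature names are placeholders.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

feature_names = ["rainfall", "pesticide", "temperature"]  # placeholder names

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(400, 3))
# Made-up target: driven mostly by column 0, slightly by column 2, not at all by column 1.
y = 30.0 * X[:, 0] + 5.0 * X[:, 2] + rng.normal(0.0, 0.5, 400)

model = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)

# Importances are non-negative and sum to 1 across features.
importances = dict(zip(feature_names, model.feature_importances_))
for name, score in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```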
DEVELOPMENT
■ The selected IDE for implementing the agriculture crop yield prediction model is Jupyter Notebook.
■ Jupyter Notebook is an ideal platform for crop yield prediction due to its interactive coding environment,
ease of use, strong support for Python libraries, data visualization capabilities, and wide community support.
■ Implementation begins with loading the dataset into Jupyter Notebook and
performing data cleaning and exploration using Python libraries.
Data Collection (Yield Dataset)
Data Cleaning (Removing Unwanted Columns)
Data Collection (Rainfall Dataset)
The average_rain_fall_mm_per_year column must be converted from object to float because its values may be
stored as text (object type) even when they look like numbers. Calculations such as averages or sums require a
numeric type (such as float). Unnecessary columns are also dropped at this step.
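A minimal sketch of this conversion, assuming pandas. The sample rows, the ".." placeholder for an unparseable entry, and the "notes" column are made up for illustration; only the column name average_rain_fall_mm_per_year comes from the dataset.

```python
import pandas as pd

df_rain = pd.DataFrame({
    "Area": ["A", "B", "C"],
    "Year": [2000, 2000, 2000],
    # Stored as text, so arithmetic on this column is not possible yet.
    "average_rain_fall_mm_per_year": ["327", "1485", ".."],
    "notes": ["x", "y", "z"],  # hypothetical column that is not needed
})

# Convert text to numbers; entries that cannot be parsed become NaN.
df_rain["average_rain_fall_mm_per_year"] = pd.to_numeric(
    df_rain["average_rain_fall_mm_per_year"], errors="coerce"
)

# Drop rows that failed to parse, then drop the unneeded column.
df_rain = df_rain.dropna(subset=["average_rain_fall_mm_per_year"])
df_rain = df_rain.drop(columns=["notes"])

mean_rain = df_rain["average_rain_fall_mm_per_year"].mean()
print(df_rain.dtypes)
print(mean_rain)
```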
Merged Dataset of yield and rainfall
Merged Dataset of yield and rainfall and pesticide
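The merges can be sketched with pandas.merge. The join keys (Year and Area), the pesticides_tonnes column name, and the sample rows are assumptions for illustration; the slides do not state which keys were used.

```python
import pandas as pd

df_yield = pd.DataFrame({
    "Area": ["A", "A", "B"],
    "Year": [2000, 2001, 2000],
    "hg/ha_yield": [100, 110, 90],
})
df_rain = pd.DataFrame({
    "Area": ["A", "A", "B"],
    "Year": [2000, 2001, 2000],
    "average_rain_fall_mm_per_year": [300.0, 300.0, 800.0],
})
df_pest = pd.DataFrame({
    "Area": ["A", "B"],
    "Year": [2000, 2000],
    "pesticides_tonnes": [12.0, 7.5],  # hypothetical column name
})

# Inner joins (the default) keep only Area/Year pairs present in both tables.
yield_rain = pd.merge(df_yield, df_rain, on=["Year", "Area"])
yield_rain_pest = pd.merge(yield_rain, df_pest, on=["Year", "Area"])

print(yield_rain_pest)
```

With an inner join, rows lacking a match in the pesticide table (here, area A in 2001) drop out of the merged result.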
Pesticide Dataset
• Removing unwanted columns and
changing column names
Data Collection (Temperature Dataset)
Data Collection (Complete Dataset)
Data Cleaning (Complete Dataset)
Checking for Null Values
[No empty value is found]
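A null check of this kind is typically a one-liner in pandas. The sketch below runs it on a small made-up table that, like the project's dataset, contains no missing values; avg_temp is a placeholder column name.

```python
import pandas as pd

df = pd.DataFrame({
    "Area": ["A", "B", "C"],
    "hg/ha_yield": [100, 90, 120],
    "avg_temp": [16.4, 24.1, 20.0],  # hypothetical column name
})

# Count missing values per column.
null_counts = df.isnull().sum()
has_nulls = bool(null_counts.any())

print(null_counts)
print("Any nulls:", has_nulls)
```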
Data Exploration
Data Exploration by grouping of Area.
(India has the highest yield production)
Data Exploration by grouping of items.
Data Exploration by grouping of item and area.
(India is the highest producer of Cassava and Potato)
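The three groupings above can be sketched with pandas groupby. The rows and the area names X, Y, Z are made up; only the Area, Item, and hg/ha_yield column names follow the dataset described in the slides.

```python
import pandas as pd

df = pd.DataFrame({
    "Area": ["X", "X", "Y", "Y", "Z"],
    "Item": ["Potatoes", "Cassava", "Potatoes", "Maize", "Cassava"],
    "hg/ha_yield": [200, 150, 120, 80, 60],
})

# Total yield per area, per item, and per (item, area) pair; keep the top two of each.
by_area = df.groupby("Area")["hg/ha_yield"].sum().nlargest(2)
by_item = df.groupby("Item")["hg/ha_yield"].sum().nlargest(2)
by_item_area = df.groupby(["Item", "Area"])["hg/ha_yield"].sum().nlargest(2)

print(by_area)
print(by_item)
print(by_item_area)
```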
CORRELATION VIA HEATMAP
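The matrix behind such a heatmap can be computed with pandas. The sketch below uses made-up values and a placeholder avg_temp column, and it stops at the correlation matrix rather than drawing the plot.

```python
import pandas as pd

df = pd.DataFrame({
    "hg/ha_yield": [100, 120, 140, 160, 180],
    "average_rain_fall_mm_per_year": [300.0, 350.0, 400.0, 450.0, 500.0],
    "avg_temp": [25.0, 18.0, 22.0, 16.0, 20.0],  # hypothetical column name
    "Area": list("ABCDE"),
})

# Correlation is only defined for numeric columns, so text columns are excluded first.
corr = df.select_dtypes(include="number").corr()
print(corr.round(2))
```

The resulting matrix can then be passed to a plotting routine such as seaborn's heatmap function to produce the figure.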
DATA PREPROCESSING
Data Preprocessing is a technique that is used to convert the raw data into a clean data set.
Encoding Categorical Variables:
❑ There are two categorical columns in the dataset.
❑ Categorical data are variables that contain label values rather than numeric values. The number of possible values
is often limited to a fixed set. In this case, the item and country values must be converted to a numerical
form.
Technique Used:
❑ One-hot encoding converts categorical variables into a numeric form that ML algorithms can use
directly for prediction.
❑ It is applied here to convert these two columns into a one-hot numeric array.
❑ The encoding creates a binary column for each category and returns a matrix with the results.
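One way to perform this encoding is pandas.get_dummies. In the sketch below the sample rows and the "Country" and "Item" prefixes are illustrative choices, not necessarily those used in the project.

```python
import pandas as pd

df = pd.DataFrame({
    "Area": ["India", "Brazil", "India"],
    "Item": ["Potatoes", "Cassava", "Cassava"],
    "hg/ha_yield": [200, 150, 180],
})

# Each category becomes its own 0/1 column; the original text columns are removed.
encoded = pd.get_dummies(
    df, columns=["Area", "Item"], prefix=["Country", "Item"], dtype=int
)
print(encoded)
```

Each row has exactly one 1 among the country columns and one 1 among the item columns.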
Scaling of Features
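The slides do not say which scaler was used. As one possibility, min-max scaling maps each feature to the range [0, 1]; the sketch below assumes scikit-learn's MinMaxScaler and made-up values for rainfall, pesticide use, and temperature.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Columns (assumed for illustration): rainfall, pesticide use, temperature.
X = np.array([
    [300.0, 10.0, 16.0],
    [800.0, 50.0, 24.0],
    [1300.0, 30.0, 20.0],
])

# Each column is rescaled independently: (value - column min) / (column max - column min).
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled)
```

In a train/test workflow the scaler is usually fitted on the training split only and then applied to the test split, so that no test information leaks into the model.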
Model Selection, Training and Testing Data
Model Selected: Decision Tree Regressor
Comparison of all model accuracies
Testing Model for Prediction
Comparing the model's predicted values against the actual values
Prediction Results
Accuracy Results
Evaluation and Accuracy of the Model
Crop Prediction
The crop can be predicted correctly from the inputs (hg/ha_yield, rainfall, pesticide, temperature).
When the expected yield is not known in advance, the crop can still be predicted correctly from the
remaining inputs (rainfall, pesticide, temperature).
Workflow Proposed
1) Data Collection: Completed
2) Data Cleaning: Completed
3) Data Exploration: Completed
4) Data Pre-Processing: Completed
5) Feature Selection: Completed
6) Model Selection: Completed
7) Model Training and Evaluation: Completed
Thank You