Report TTT
by
We hereby declare that, to the best of our knowledge and belief, this submission is entirely our
own original work, except for passages where appropriate credit has been given within the text.
It contains neither material previously published or written by another person nor material that
has been substantially accepted for the award of any other degree or certificate of the university
or another higher education institution.
Signature:
Name: Aryan Parashar
Roll number: 2000320120048
Date:
Signature:
Name: Aryan Gupta
Roll number: 2000320120046
Date:
Signature:
Name: Aryan Tyagi
Roll number: 2000320120050
Date:
Signature:
Name: Anurag Bhardwaj
Roll number: 2000320120039
Date:
CERTIFICATE
This certifies that Aryan Parashar, Aryan Gupta, Aryan Tyagi, and Anurag Bhardwaj's project
report, "Predicting soil moisture using meteorological data," is a record of their own work
completed under my supervision and partially fulfills the requirements for the granting of a
B.Tech. degree in the Department of Computer Science at Dr. A.P.J. Abdul Kalam Technical
University. The content of this report is original and has not been submitted for any
other degree.
It gives us great pleasure to present the report of the B.Tech project undertaken during our
final year. We are especially grateful to Ms. Disha Mohini Pathak, Assistant Professor in
the Computer Science Department of ABES Engineering College, Ghaziabad, for her
unwavering support and direction during the project. Her honesty, diligence, and tenacity
have always served as an example for us. It is only because of her conscientious efforts that
our endeavors have been successful.
We also take this opportunity to thank Professor (Dr.) Pankaj Kumar Sharma, Head of the
Computer Science Department at ABES Engineering College, Ghaziabad, for his invaluable
support and help during the project's development.
Furthermore, we would like to thank the department's faculty members for their generous
support and cooperation during the development of our project. Last but not least, we thank
our friends for helping us see the project through to completion.
Signature:
Name: Aryan Parashar
Roll No. 2000320120048
Date:
Signature:
Name: Aryan Gupta
Roll No. 2000320120046
Date:
Signature:
Name: Aryan Tyagi
Roll No. 2000320120050
Date:
Signature:
Name: Anurag Bhardwaj
Roll No. 2000320120039
Date:
ABSTRACT
Agricultural research has the potential to ease food and water scarcity, two problems
that the world is currently grappling with. One contribution is to use economically
viable machine learning approaches to predict soil moisture from meteorological data.
Some farmers can afford to install and monitor several moisture sensors, but many lack
the means to track soil moisture for long-term crops. One way to address this is to let
farmers hire a specialist to carry out a one-time sensor-based study of their land; a
model trained on that data could then forecast soil moisture levels from meteorological
information alone.
Keeping the soil at the proper moisture level throughout the growing season can result
in higher yields and fewer crop-related problems overall. A water surplus or deficit has
different, and sometimes negligible, effects at different growth stages. It is critical to
understand how a plot of land uses and stores water, as this can vary greatly with the
type of plants, the terrain, and the elevation. We compare several regression models on
this problem; judged by prediction time, fit time, and r2 score, Random Forest emerges
as the top option.
TABLE OF CONTENTS
DECLARATION
CERTIFICATE
ACKNOWLEDGEMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
LIST OF SYMBOLS
LIST OF ABBREVIATIONS
CHAPTER 1 (INTRODUCTION)
1.1 (Motivation)
1.2 (Project Objective)
1.3 (Scope of the project)
1.4 (Related Work)
1.5 (Organization of the report)
CHAPTER 2 (LITERATURE REVIEW)
CHAPTER 3
CHAPTER 4 (CONCLUSIONS)
APPENDIX A
APPENDIX B
REFERENCES
LIST OF TABLES
Table Description
Figure Description
≠ Not Equal
∈ Belongs to
€ Euro- A Currency
_ Optical distance
Maintaining the right amount of moisture in the soil during the plant growth season
can increase yields and reduce crop issues overall. Water excess or shortage has
varying, or even insignificant, impacts at different development stages.
Because water use can vary widely with the plants grown, the terrain, and the
elevation of the region, it is important to understand how a plot of land consumes
and stores water. Farmers have paid attention to soil moisture for hundreds of
years; what matters now is the precision that real data makes possible. For the
last few centuries, farmers had to evaluate the moisture level of their soil mostly
by touch and experience. Even though many farmers succeeded in growing crops this
way, there was still room to improve the productivity of their harvests.
Although factors other than the water available to plants also affect yields, the
goal of this study is to develop a model that farmers can use to estimate soil
moisture levels without having to purchase and install costly sensors. There are
several possible uses for the developed model. The primary use is tracking present
soil conditions, so that problems can be corrected by making the necessary
adjustments.
Second, a farmer might assess past data, compare it to yields or other harvest
outcomes, and use this analysis to guide future actions. A maize farmer, for
instance, may only want to confirm that the predicted conditions fall within
acceptable bounds. A grape farmer in a wine vineyard may combine this information
with other data to forecast the wine's quality or even the blend of wine that would
be best made from grapes grown under these conditions.
The goal of this research is to observe precisely how weather affects a specific
plot of land in the state of Washington. The same process could be repeated
anywhere in the world to obtain benchmarks. For a farmer without the resources to
undertake a comprehensive study of water consumption on their field, these
benchmarks may be an affordable alternative source of training data. Alternatively,
they may choose a model built for a comparable soil composition and/or topography,
and then estimate the soil moisture content using their own meteorological data.
This project's main objective is to provide the best possible tool at a price that
will allow for widespread use.
1. Problem Statement
1.1 Motivation
Predicting soil moisture using machine learning and weather data is crucial for
optimizing agriculture, enhancing water resource management, and mitigating the
impacts of droughts. Accurate predictions aid in crop yield optimization through
precise irrigation, promoting resource efficiency and sustainable farming practices.
Early warning systems for droughts benefit communities, governments, and farmers,
enabling proactive measures. The information is valuable for reservoir management,
urban planning, and infrastructure development, reducing risks associated with soil
moisture variations. Integration into weather models improves forecasting accuracy,
while the insights contribute to climate change research. Soil moisture predictions
also play a role in insurance risk assessment for agriculture. Overall, this approach
not only addresses immediate challenges but fosters scientific understanding,
supporting innovation and informed decision-making in various sectors.
1.2 Project Objective
The project aims to develop a robust machine learning model for predicting soil
moisture levels based on weather data. By leveraging advanced algorithms, the
objective is to provide accurate and timely information to optimize agricultural
practices, enhance water resource management, and mitigate the impact of
droughts. The model will contribute to precision farming by guiding farmers in
optimal irrigation scheduling, ultimately improving crop yields and resource
efficiency. Additionally, the project seeks to facilitate early warning systems for
droughts, empowering communities, governments, and farmers to proactively
respond to water scarcity challenges. The research will also explore the integration
of soil moisture predictions into weather forecasting models, with the potential to
improve overall forecasting accuracy. The overarching goal is to create a practical
tool that addresses real-world challenges in agriculture, water management, and
environmental sustainability, fostering informed decision-making and contributing to
the advancement of scientific knowledge.
Chapter 5: Conclusion
Synthesizing the project, this chapter summarizes key findings, reflecting on limitations
and proposing avenues for future research. It underscores the project's significance in
agriculture and environmental sustainability, providing closure to the report while
emphasizing real-world applications.
CHAPTER 2
LITERATURE SURVEY
India, recognized for its agricultural legacy, saw around 73 million hectares fitted with irrigation
infrastructure in fiscal year 2022-23, accounting for 52% of the total 141 million hectares of gross planted
land [10]. The nation's available utilizable water resources are limited to an annual capacity of 1122 BCM
(Billion Cubic Meters), which includes 690 BCM from surface water and 432 BCM from groundwater. The
utilized water potential for this allotment is roughly 699 BCM, with 450 BCM generated from surface water
and 249 BCM from groundwater. Notably, the agriculture sector is estimated to account for 85-90% of the
country's overall water demand.
The work in [2] focuses on the prediction of soil moisture using specific implementations of RNN (Recurrent
Neural Network), LSTM (Long Short-Term Memory), and Volumetric Soil maintenance clustering on
weather data [5]. The data set was collected at 28 different locations near Siberia, covering 4 distinct
types of soil. The data were collected during 2017-2018, and the model was tested for about 8 months, from
September 2019 to April 2020. The reported accuracy was 84 percent, and it degraded when the number of
days was reduced and when soil depth was increased.
Gap: Predictions over shorter time frames were not accurate enough, which indicates that a larger data set
should be studied.
The research in [3] focuses on soil moisture prediction for irrigation automation and forecasting utilising
time-series modelling. The models utilised were Lasso, Decision Tree, Random Forest, and Support Vector
Machine. The investigation was conducted on a cotton farm, employing wireless soil moisture monitoring
equipment set in five plots. Temperature, depth, air humidity, and wind speed were some of the factors used.
Gap: Only a single data set with a small number of rows is used, and the results and analysis section
provides no comparative analysis for selecting the best model.
The study in [4] focuses on predicting soil moisture. The model used was Deep Neural Network
Regression (DNNR). The research region included 25 weather stations that collected data on 32 factors such
as air temperature, rainfall, soil moisture, and wind direction, with 15 placed in eight North Dakota
counties and 10 in seven Minnesota counties. The study covered various crops, including wheat, dry
beans, canola, oats, and barley.
Gap: Soil moisture is predicted only at a depth of 20 cm.
The research in [5] focuses on forecasting soil moisture. The models used were Random Forest (RF),
Extremely Randomised Trees (ET), and Gradient Boosting Machines (GBM), with temperature, wind speed,
and air humidity as variables.
Gap: Limited parameters; not suitable for varied depths.
In this work, we aim to address the gaps identified in these research papers.
[Table residue: "... and artificial neural network)"; inputs listed: soil moisture, daily precipitation, illumination duration]
Figure 1
CHAPTER 3
1. System Design
2. Methodology
1. Linear Regression
2. Ridge Regression
3. Lasso Regression
9. ElasticNet Regression
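To make the linear family in this list concrete, here is a minimal sketch of how the four regressors could be compared with scikit-learn. The synthetic features, coefficients, and penalty strengths are assumptions chosen only so the example runs; they are not the project's data or its tuned settings. The Lasso entry with alpha = 1 mirrors the setting logged in the appendix, where it scores an r2 close to zero.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for standardized weather features (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
# Hypothetical small-valued target, loosely resembling a moisture fraction.
y = 0.30 + 0.05 * X[:, 0] - 0.03 * X[:, 1] + 0.02 * X[:, 3]
y = y + rng.normal(scale=0.01, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Penalty strengths below are illustrative assumptions.
models = {
    "Linear": LinearRegression(),
    "Ridge (alpha=1)": Ridge(alpha=1.0),
    "Lasso (alpha=1)": Lasso(alpha=1.0),
    "Lasso (alpha=0.001)": Lasso(alpha=0.001),
    "ElasticNet (alpha=0.001)": ElasticNet(alpha=0.001, l1_ratio=0.5),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name:>26}: r2 = {score:.3f}")
```

On this synthetic target the L1 penalty at alpha = 1 is far larger than any feature's correlation with the target, so Lasso shrinks every coefficient to zero and predicts a constant. That is one plausible reason for a near-zero r2 with this setting, though the cause in the project's own runs was not investigated here.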
SYSTEM DESIGN
1. Architecture diagrams
Figure 3
3. Class Diagram
Figure 4
4. Database schema diagrams
Figure 5
CHAPTER 4
RESULTS AND DISCUSSION
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10
Figure 11
The Hybrid KNN-Decision Tree and Hybrid Decision Tree-Ridge models have the longest training times,
which suggests that they need more time to fit the training data and capture its underlying
patterns. The Hybrid Lasso-Ridge and Hybrid KNN-Lasso models train much faster, in approximately
1.5 to 2 seconds, possibly because they discard unimportant features or are less complex overall.
For prediction, both Hybrid Decision Tree-Ridge and Hybrid Lasso-Ridge return results in 0.2 seconds
or less, whereas Hybrid KNN-Decision Tree and Hybrid KNN-Lasso take far longer, about 24 seconds.
This is likely because KNN must compare each new data point against all of the data it was trained
on before making a prediction. The r2_score measures how well the model's predictions correspond
with the actual values; here, Hybrid Decision Tree-Ridge has the highest score, so its predictions
track the observed values most closely.
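The three quantities compared above (fit time, prediction time, and r2 score) can be measured with a small harness like the one below. This is only a sketch: the report does not state how the hybrid models are combined, so averaging the two base models with scikit-learn's `VotingRegressor` is an assumption, and the synthetic data and the helper name `time_model` are hypothetical.

```python
import time

import numpy as np
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor


def time_model(model, X_train, y_train, X_test, y_test):
    """Return fit time, prediction time (seconds) and r2 for one model."""
    t0 = time.perf_counter()
    model.fit(X_train, y_train)
    t1 = time.perf_counter()
    preds = model.predict(X_test)
    t2 = time.perf_counter()
    return {"fit_time": t1 - t0, "pred_time": t2 - t1,
            "r2": r2_score(y_test, preds)}


# Synthetic stand-in data (hypothetical, not the project's data set).
rng = np.random.default_rng(1)
X = rng.uniform(size=(2000, 5))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.05, size=2000)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Assumed hybrid construction: average the predictions of two base models.
hybrids = {
    "KNN + Decision Tree": VotingRegressor([
        ("knn", KNeighborsRegressor()),
        ("tree", DecisionTreeRegressor(random_state=0)),
    ]),
    "Decision Tree + Ridge": VotingRegressor([
        ("tree", DecisionTreeRegressor(random_state=0)),
        ("ridge", Ridge(alpha=1.0)),
    ]),
}

results = {name: time_model(model, X_train, y_train, X_test, y_test)
           for name, model in hybrids.items()}

for name, res in results.items():
    print(f"{name}: fit {res['fit_time']:.3f}s, "
          f"predict {res['pred_time']:.3f}s, r2 {res['r2']:.3f}")
```

Absolute timings depend on the machine and the data size, so only the relative ordering between models is meaningful when a harness like this is used.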
Figure 12
CONCLUSION
1. Performance Evaluation
The outcome of the experimentation is a procedure in which two data sets are joined and fed
into a model that forecasts soil moisture with high accuracy: using a Random Forest Regressor
with default settings, the r2 score lies between 0.977 and 0.991 depending on the depth. This
procedure could be repeated wherever a farmer contracts a company to collect training data on
their land for one growing season. Because collecting and maintaining sensor data can be
cumbersome and expensive for a farmer, this is a cheaper alternative that gives nearly the
same results as sensors that run constantly.
Alternatively, the procedure could be one component of a larger software suite that farmers
use for forecasting, or it could supply a season's worth of soil moisture data for post-season
analysis of the crop. As long as large-scale AI programs remain expensive and cumbersome for
farmers to deal with, adoption will stay low. This project has shown that large-scale soil
moisture prediction can be achieved at relatively low computational cost.
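As an illustration of the per-depth evaluation described above, the following sketch fits one Random Forest Regressor with default hyperparameters for each depth and records its r2 score. The depth labels follow the column names used in the appendix notebook; the features and targets are synthetic assumptions, so the scores it prints are not the project's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

depths = ["30cm", "60cm", "90cm", "120cm", "150cm"]

# Synthetic stand-in for the joined weather features (hypothetical data).
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 6))

# Hypothetical targets: deeper layers respond more weakly to the same driver.
targets = {
    depth: 0.25
    + (0.05 / (i + 1)) * X[:, 0]
    + 0.02 * np.tanh(X[:, 1])
    + rng.normal(scale=0.005, size=1000)
    for i, depth in enumerate(depths)
}

r2_by_depth = {}
for depth in depths:
    X_train, X_test, y_train, y_test = train_test_split(
        X, targets[depth], test_size=0.2, random_state=0
    )
    # Default hyperparameters; only the seed is fixed for repeatability.
    model = RandomForestRegressor(random_state=0)
    model.fit(X_train, y_train)
    r2_by_depth[depth] = r2_score(y_test, model.predict(X_test))

for depth, score in r2_by_depth.items():
    print(f"{depth:>6}: r2 = {score:.3f}")
```

Training a separate model per depth keeps each target independent, which matches the way the notebook loops over its depth columns.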
}
],
"source": [
"pipe_with_estimator = Pipeline(steps=[('preprocessor', preprocessor),\n",
"                                      ('classifier', Lasso(alpha = 1))])\n",
"\n",
"data_cols = ['30cm', '60cm', '90cm', '120cm', '150cm']\n",
"try:\n",
"    log\n",
"except NameError:\n",
"    log = pd.DataFrame(columns = ['Experiment', 'Depth', 'Fit_Time', 'Pred_Time', 'r2_score', 'datetime'])\n",
"    \n",
"for cols in data_cols:\n",
"    t0 = time.time()\n",
"    pipe_with_estimator.fit(X_train_set[cols], y_train_set[cols])\n",
"    t1 = time.time()\n",
"    preds = pipe_with_estimator.predict(X_test_set[cols])\n",
"    t2 = time.time()\n",
"    r2sc = r2_score(y_test_set[cols], preds)\n",
"    now = datetime.now().strftime('%Y-%m-%d %H:%M:%S')\n",
"    log.loc[len(log)] = ['Lasso Reg - Alpha = 1', cols, t1-t0, t2-t1, r2sc, now]\n",
"    \n",
"print(log)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### At least with these parameters, Lasso Fits Poorly"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Ridge with a built in gridsearch cross validation"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Experiment Depth Fit_Time Pred_Time r2_score \\\n",
"0 First Linear Reg 30cm 32.042364 14.371771 9.154623e-01 \n",
"1 First Linear Reg 60cm 2.874317 0.135288 -1.662894e+14 \n",
"2 First Linear Reg 90cm 2.858727 0.156215 9.487954e-01 \n",
"3 First Linear Reg 120cm 2.874285 0.171831 9.460321e-01 \n",
"4 First Linear Reg 150cm 2.952462 0.125001 9.433287e-01 \n",
"5 Ridge Reg - Alpha = 1 30cm 28.188629 0.140590 9.162112e-01 \n",
"6 Ridge Reg - Alpha = 1 60cm 1.312190 0.218699 9.427566e-01 \n",
"7 Ridge Reg - Alpha = 1 90cm 1.202838 0.156215 9.487904e-01 \n",
"8 Ridge Reg - Alpha = 1 120cm 1.140353 0.140597 9.460320e-01 \n",
"9 Ridge Reg - Alpha = 1 150cm 1.140351 0.140591 9.433202e-01 \n",
"10 Lasso Reg - Alpha = 1 30cm 15.871741 0.156247 -1.832157e-04 \n",
"11 Lasso Reg - Alpha = 1 60cm 1.327864 0.140546 -4.613909e-05 \n",
"12 Lasso Reg - Alpha = 1 90cm 1.359043 0.140626 -5.673799e-06 \n",
"13 Lasso Reg - Alpha = 1 120cm 1.373374 0.140591 -1.131381e-06 \n",
"14 Lasso Reg - Alpha = 1 150cm 1.348597 0.140538 -1.814059e-04 \n",
"15 Ridge Reg - GSCV 30cm 15.183893 0.187457 9.162351e-01 \n",
"16 Ridge Reg - GSCV 60cm 5.366714 0.156221 9.427570e-01 \n",
"17 Ridge Reg - GSCV 90cm 5.436197 0.140594 9.487957e-01 \n",
"18 Ridge Reg - GSCV 120cm 5.483074 0.171830 9.460322e-01 \n",
"19 Ridge Reg - GSCV 150cm 5.561178 0.203086 9.433280e-01 \n",
"\n",
}
],
"source": [
"pipe_with_estimator = Pipeline(steps=[('preprocessor', preprocessor),\n",
"                                      ('classifier', RidgeCV(alphas = [0.001, 0.01, 0.1, 1, 10, 100, 1000]))])\n",
"\n",
"data_cols = ['30cm', '60cm', '90cm', '120cm', '150cm']\n",
"try:\n",
"    log\n",
"except NameError:\n",
"    log = pd.DataFrame(columns = ['Experiment', 'Depth', 'Fit_Time', 'Pred_Time', 'r2_score', 'datetime'])\n",
"    \n",
"for cols in data_cols:\n",
"    t0 = time.time()\n",
"    pipe_with_estimator.fit(X_train_set[cols], y_train_set[cols])\n",
"    t1 = time.time()\n",
"    preds = pipe_with_estimator.predict(X_test_set[cols])\n",
"    t2 = time.time()\n",
"    r2sc = r2_score(y_test_set[cols], preds)\n",
"    now = datetime.now().strftime('%Y-%m-%d %H:%M:%S')\n",
"    log.loc[len(log)] = ['Ridge Reg - GSCV', cols, t1-t0, t2-t1, r2sc, now]\n",
"    \n",
"print(log)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Gridsearch found alpha = 1 to be the best parameter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Other Regressor Tests"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Right now Ridge Regression with an alpha of 1 is winning as the best model so far. Let's see if we can beat it"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Experiment Depth Fit_Time Pred_Time r2_score \\\n",