
School of Computer Science Engineering and Information Systems

Winter Semester 2024-2025

BITE410L - MACHINE LEARNING

Assignment – I

Topic: Water Quality Prediction

Team Members:

1. Introduction

Access to safe drinking water is crucial for public health, yet traditional water quality assessment
methods are time-consuming and resource-intensive. To address this challenge, we propose an
advanced machine learning (ML) approach for predicting water quality based on chemical
parameters such as pH, turbidity, conductivity, and hardness.

While conventional models such as Naive Bayes, Decision Tree, and Multilayer Perceptron (MLP)
offer reasonable accuracy, they often fall short when features are interdependent, and they lack
transparency in their decision-making. To overcome these limitations, we introduce a novel
Ensemble Learning approach using a Stacking Classifier, which combines Naive Bayes,
Decision Tree, and MLP models, with Logistic Regression as the meta-classifier. Furthermore,
we use SHAP (SHapley Additive exPlanations) to identify the most influential water quality
parameters.

2. Literature Review

Peretz et al. (2024), in their paper published in Engineering Applications of Artificial Intelligence,
state that the Naïve Bayes (NB) classifier is a widely used probabilistic model due to its simplicity
and efficiency. However, its key limitation lies in its assumption of feature independence, which
often reduces classification accuracy. To address this, they proposed the Naïve Bayes Enrichment
Method (NBEM), which enhances classification performance by optimizing feature selection
through threshold learning and employing multiple NB classifiers with different distributions. Their
study found that NBEM significantly improves recall and precision by integrating results through a
weighted classification function. Additionally, they highlight the broad applications of NB
classifiers in domains such as text classification, fraud detection, and medical diagnostics. The
study emphasizes that while NB is efficient for high-dimensional data, future research should
focus on refining its independence assumption, integrating deep learning methodologies, and
improving interpretability to enhance its trustworthiness in real-world decision-making scenarios.
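
For reference, the independence assumption they critique has the standard textbook form (it is not a
formula from Peretz et al.): the NB posterior factorizes over the features,

P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),

and the class maximizing this product is predicted. The assumption is violated whenever features are
correlated, for example chemically related water quality parameters, which is the situation the
ensemble in Section 3 is intended to handle.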

Mienye and Jere (2024), in their survey paper "A Survey of Decision Trees: Concepts, Algorithms,
and Applications", state that decision tree-based algorithms are widely used due to their
simplicity, interpretability, and efficiency in classification and regression tasks. They discuss key
algorithms such as CART, ID3, C4.5, and CHAID, along with ensemble methods like random
forest and gradient-boosted decision trees, highlighting their applications in medical diagnosis,
fraud detection, and finance. The authors note that while decision trees offer transparency, they
are prone to overfitting and sensitivity to noise; techniques such as pruning, ensemble learning,
and hybrid models help mitigate these issues. Their study also emphasizes that decision trees,
particularly ensemble models, achieve high accuracy in diagnosing diseases and detecting
fraud. In conclusion, they suggest that decision trees remain valuable in machine learning
despite their limitations, and that future research should focus on enhancing scalability, handling
high-dimensional data, and integrating deep learning techniques to improve their predictive
performance.
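
For context, the split criteria behind CART and ID3/C4.5 rest on standard impurity measures
(textbook definitions, not results from Mienye and Jere):

\mathrm{Gini}(t) = 1 - \sum_{k} p_k^{2}, \qquad \mathrm{Entropy}(t) = -\sum_{k} p_k \log_2 p_k,

where p_k is the proportion of class k at node t; each split is chosen to maximize the reduction in
impurity, and pruning limits how deep such splits are allowed to grow.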

Ramchoun et al. (2016), in their paper "Multilayer Perceptron: Architecture Optimization and
Training", discuss optimizing the Multilayer Perceptron (MLP) architecture to improve
classification and regression performance. They highlight that selecting the optimal number of
hidden layers and neurons is crucial for preventing underfitting and overfitting. The study proposes
a genetic algorithm-based approach to optimize the network architecture, ensuring efficiency in
training and generalization. The authors further explain that traditional MLP models fix their
architecture before training, which can lead to suboptimal performance. Their research introduces
an optimization model that dynamically adjusts connections and hidden layers using binary
variables, enhancing network adaptability. The study demonstrates the effectiveness of this
approach through experiments on the Iris dataset, showing improved classification accuracy
and reduced computational complexity compared to existing methods. In conclusion, Ramchoun
et al. (2016) emphasize that optimizing MLP architecture is essential for improving neural
network performance, and they suggest further research applying the model to real-world
datasets, such as medical diagnosis and financial forecasting, to validate its effectiveness
across different domains.
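
Their genetic-algorithm formulation is outside the scope of this assignment; the sketch below
illustrates the same underlying idea, selecting a hidden-layer configuration by validation
performance, using a plain grid search in scikit-learn. The candidate layer sizes are illustrative,
and X_train/y_train refer to the preprocessed training data prepared in Section 3.2 below.

from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Simple architecture search over a few candidate hidden-layer configurations
# (a stand-in for the genetic-algorithm search described by Ramchoun et al.)
param_grid = {"hidden_layer_sizes": [(16,), (32,), (32, 16), (64, 32)]}
search = GridSearchCV(
    MLPClassifier(max_iter=500, random_state=42),
    param_grid,
    scoring="f1",
    cv=5,
)
search.fit(X_train, y_train)
print("Best architecture:", search.best_params_)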

3. Proposed Design/Solution of the Identified Problem

Our proposed solution involves developing an ensemble-based ML model for water quality
prediction, focusing on both performance and interpretability.

3.1 Data Collection:

• We use the Water Quality Dataset, which includes chemical parameters such as pH,
turbidity, conductivity, hardness, sulphate, and dissolved solids.

3.2 Data Preprocessing:

• Handle missing values using median imputation.

• Normalize features using Min-Max scaling.

• Split the dataset into 80% training and 20% testing sets (a combined preprocessing sketch follows this list).
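
A minimal preprocessing sketch, assuming the dataset is a CSV file; the file name
water_quality.csv and the target column Potability are illustrative assumptions:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("water_quality.csv")

# Median imputation for missing values
df = df.fillna(df.median(numeric_only=True))

X = df.drop(columns=["Potability"])
y = df["Potability"]

# 80/20 train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Min-Max scaling, fitted on the training set only to avoid leakage
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)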

3.3 Model Training:

1. Base Models: Train the individual classifiers: Naive Bayes, Decision Tree, and MLP.
2. Ensemble Learning: Combine their predictions using a Stacking Classifier, with Logistic
Regression as the meta-classifier (see the sketch after this list).
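
A minimal sketch of this setup with scikit-learn, using the preprocessed X_train and y_train from
Section 3.2; the hyperparameter values shown are illustrative defaults, not tuned choices:

from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import StackingClassifier

# Base models: Naive Bayes, Decision Tree, and MLP
base_models = [
    ("nb", GaussianNB()),
    ("dt", DecisionTreeClassifier(max_depth=5, random_state=42)),
    ("mlp", MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=42)),
]

# Stacking ensemble with Logistic Regression as the meta-classifier;
# cv=5 generates out-of-fold base predictions to train the meta-classifier
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)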

3.4 Performance Evaluation:

• Evaluate models using metrics such as accuracy, precision, recall, and F1-score.

• Compare the performance of the individual models and the ensemble approach (an evaluation sketch follows this list).
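
A minimal evaluation sketch on the held-out test split, assuming the fitted stack from Section 3.3;
the same calls apply to each base model for comparison:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_pred = stack.predict(X_test)
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-score :", f1_score(y_test, y_pred))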

3.5 Model Explainability:

• Use SHAP values to identify the most influential parameters, such as pH and turbidity,
enhancing the model's transparency (a minimal SHAP sketch follows).
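
A minimal explainability sketch using the shap library's model-agnostic KernelExplainer on the
fitted ensemble; the background and test subset sizes are illustrative choices made to keep the
computation tractable:

import shap

# KernelExplainer only needs a prediction function and a background sample
background = shap.sample(X_train, 100, random_state=42)
explainer = shap.KernelExplainer(stack.predict_proba, background)

# Explain a subset of the test set (KernelExplainer is slow on large inputs)
shap_values = explainer.shap_values(X_test[:50])

# Summary plot of feature influence (e.g. pH, turbidity); depending on the
# shap version, shap_values is a list (one array per class) or a single array
shap.summary_plot(shap_values, X_test[:50], feature_names=list(X.columns))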

Expected Outcomes:

1. Improved Accuracy: Higher classification accuracy compared to individual models.

2. Enhanced Explainability: Clear insights into the key water quality parameters.

3. Deployable Solution: A user-friendly system for real-time water quality assessment.
