FIND-S Algorithm Demonstration
This document explains and demonstrates the FIND-S algorithm, which is used to find the
most specific hypothesis that fits all the positive examples in a given dataset.
Step 1: Sample CSV Format
The following is a sample of training data stored in a CSV file named 'training_data.csv':
Sky           AirTemp       Humidity      Wind            Water       Forecast     EnjoySport
Sunny         Warm          Normal        Strong          Warm        Same         Yes
Sunny         Warm          High          Strong          Warm        Same         Yes
Rainy         Cold          High          Strong          Warm        Change       No
Sunny         Warm          High          Strong          Cool        Change       Yes
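When saved as 'training_data.csv', the same data is plain comma-separated text:
Sky,AirTemp,Humidity,Wind,Water,Forecast,EnjoySport
Sunny,Warm,Normal,Strong,Warm,Same,Yes
Sunny,Warm,High,Strong,Warm,Same,Yes
Rainy,Cold,High,Strong,Warm,Change,No
Sunny,Warm,High,Strong,Cool,Change,Yes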
Step 2: Python Code Implementation
import pandas as pd
def find_s_algorithm(df):
    # Keep only the positive examples (label "Yes" in the last column)
    positive_examples = df[df.iloc[:, -1] == "Yes"]
    # Initialize the hypothesis with the attribute values of the first positive example
    hypothesis = positive_examples.iloc[0, :-1].tolist()
    # Generalize: replace any attribute that disagrees with a positive example by '?'
    for _, row in positive_examples.iterrows():
        for i in range(len(hypothesis)):
            if hypothesis[i] != row.iloc[i]:
                hypothesis[i] = "?"
    return hypothesis
# Load data from CSV file
file_path = "training_data.csv" # Update with your actual path
data = pd.read_csv(file_path)
# Apply FIND-S
final_hypothesis = find_s_algorithm(data)
print("Final hypothesis:", final_hypothesis)
Step 3: Output Example
Final hypothesis: ['Sunny', 'Warm', '?', 'Strong', '?', '?']
This indicates the most specific hypothesis that fits all positive examples.
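In words: the learned concept predicts EnjoySport = Yes whenever Sky is Sunny, AirTemp is Warm, and Wind is Strong, while Humidity, Water, and Forecast are left unconstrained ('?').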
Candidate Elimination Algorithm - Detailed Explanation
Overview:
The Candidate-Elimination algorithm is a supervised learning algorithm used in Concept
Learning. It aims to find all hypotheses consistent with a given training dataset. It maintains a
Version Space, defined by the most specific hypothesis (S) and the most general hypothesis
(G).
Sample Training Data:
The following dataset is used for demonstration:
Sky           AirTemp        Humidity      Wind          Water       Forecast      EnjoySport
Sunny         Warm           Normal        Strong        Warm        Same          Yes
Sunny         Warm           High          Strong        Warm        Same          Yes
Rainy         Cold           High          Strong        Warm        Change        No
Sunny         Warm           High          Strong        Cool        Change        Yes
Algorithm Steps:
1. Initialize S to the first positive example.
2. Initialize G to the most general hypothesis: all fields are '?'
3. For each training example:
   - If the example is positive, generalize S minimally so that it covers the example,
     and remove from G any hypothesis that no longer covers the example.
   - If the example is negative, specialize the hypotheses in G minimally so that they
     exclude the example, keeping only specializations that remain consistent with
     (i.e. more general than) S.
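These boundary updates can be written as a short Python sketch (assuming the same pandas DataFrame layout as the FIND-S example above, with the 'Yes'/'No' label in the last column). It is a simplified version that omits some of the full algorithm's generality checks, but on the dataset above it reproduces the S and G boundaries listed under Final Output:
import pandas as pd

def candidate_elimination(df):
    X = df.iloc[:, :-1].values
    y = df.iloc[:, -1].values
    n = X.shape[1]
    # S: most specific boundary, initialized to the first positive example
    first_pos = next(i for i, label in enumerate(y) if label == "Yes")
    S = list(X[first_pos])
    # G: most general boundary, initialized to the all-'?' hypothesis
    G = [["?"] * n]
    for x, label in zip(X, y):
        if label == "Yes":
            # Generalize S minimally so that it covers x
            for i in range(n):
                if S[i] != x[i]:
                    S[i] = "?"
            # Drop members of G that no longer cover this positive example
            G = [g for g in G if all(g[i] in ("?", x[i]) for i in range(n))]
        else:
            # Specialize each member of G just enough to exclude x,
            # using only attribute values taken from S
            new_G = []
            for g in G:
                for i in range(n):
                    if g[i] == "?" and S[i] != "?" and S[i] != x[i]:
                        specialized = list(g)
                        specialized[i] = S[i]
                        new_G.append(specialized)
            G = new_G
    return S, G

data = pd.read_csv("training_data.csv")
S, G = candidate_elimination(data)
print("Specific Hypothesis (S):", S)
print("General Hypotheses (G):", G)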
Final Output:
After processing all the examples, we obtain the following:
Specific Hypothesis (S): ['Sunny', 'Warm', '?', 'Strong', '?', '?']
General Hypotheses (G):
[['Sunny', '?', '?', '?', '?', '?'], ['?', 'Warm', '?', '?', '?', '?']]
Explanation:
- The Specific Hypothesis (S) is the most specific hypothesis that covers all positive
examples.
- The General Hypotheses (G) are the most general boundaries that are consistent with all
positive and negative examples.
- Together, S and G represent the version space of all consistent hypotheses.
ID3 Decision Tree Algorithm - Implementation & Demonstration
Overview:
The ID3 (Iterative Dichotomiser 3) algorithm is a decision tree learning algorithm used for
classification.
It builds the tree by choosing the attribute that yields the highest information gain at each
step.
It is typically used with categorical data.
Dataset:
The following dataset is used to demonstrate the ID3 algorithm (Play Tennis dataset):
Outlook      Temperature   Humidity   Wind     PlayTennis
Sunny        Hot           High       Weak     No
Sunny        Hot           High       Strong   No
Overcast     Hot           High       Weak     Yes
Rain         Mild          High       Weak     Yes
Rain         Cool          Normal     Weak     Yes
Rain         Cool          Normal     Strong   No
Overcast     Cool          Normal     Strong   Yes
Sunny        Mild          High       Weak     No
Sunny        Cool          Normal     Weak     Yes
Rain         Mild          Normal     Weak     Yes
Sunny        Mild          Normal     Strong   Yes
Overcast     Mild          High       Strong   Yes
Overcast     Hot           Normal     Weak     Yes
Rain         Mild          High       Strong   No
Python Code:
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt
# Load dataset
data = pd.read_csv("play_tennis.csv")
# Encode categorical values, keeping one fitted encoder per column for later reuse
encoders = {}
for column in data.columns:
    encoders[column] = LabelEncoder()
    data[column] = encoders[column].fit_transform(data[column])
# Train model
X = data.drop("PlayTennis", axis=1)
y = data["PlayTennis"]
clf = DecisionTreeClassifier(criterion="entropy")
clf.fit(X, y)
# Visualize tree
plot_tree(clf, feature_names=X.columns, class_names=["No", "Yes"], filled=True)
plt.show()
# Predict a new sample, encoding it with the same encoders fitted above
sample = pd.DataFrame([["Sunny", "Cool", "High", "Strong"]],
                      columns=["Outlook", "Temperature", "Humidity", "Wind"])
sample_encoded = sample.apply(lambda col: encoders[col.name].transform(col))
print("Prediction:", clf.predict(sample_encoded))
Explanation:
The above code approximates the ID3 algorithm using sklearn's DecisionTreeClassifier with entropy as the splitting criterion; scikit-learn's tree is CART-based (binary splits), but selecting splits by entropy mirrors ID3's information-gain criterion.
It trains a model on the Play Tennis dataset, visualizes the decision tree, and predicts the
output for a new sample.
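Since scikit-learn hides the attribute-selection computation, the following minimal sketch computes entropy and information gain directly from the Play Tennis table above (reusing the same play_tennis.csv file; the helper names entropy and information_gain are illustrative). Outlook should come out with the highest gain, which is why it sits at the root of the learned tree.
import pandas as pd
import numpy as np

def entropy(labels):
    # H(S) = -sum over classes of p * log2(p)
    probs = labels.value_counts(normalize=True)
    return float(-(probs * np.log2(probs)).sum())

def information_gain(df, attribute, target="PlayTennis"):
    # Gain(S, A) = H(S) - sum over values v of (|S_v| / |S|) * H(S_v)
    weighted = sum(
        (len(subset) / len(df)) * entropy(subset[target])
        for _, subset in df.groupby(attribute)
    )
    return entropy(df[target]) - weighted

data = pd.read_csv("play_tennis.csv")
for attr in ["Outlook", "Temperature", "Humidity", "Wind"]:
    print(attr, round(information_gain(data, attr), 3))
# Expected: Outlook has the largest gain (about 0.247) on this 14-row dataset.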
Artificial Neural Network using Backpropagation - Implementation Guide
Overview:
An Artificial Neural Network (ANN) is a computational model inspired by the human brain.
The backpropagation algorithm is used to train the ANN by minimizing the error between the
predicted and actual output through gradient descent.
Dataset:
We use a simple dataset such as the XOR function to demonstrate the working of ANN with
backpropagation.
Input1                         Input2                       Output
0                              0                            0
0                              1                            1
1                              0                            1
1                              1                            0
Python Code (using Keras):
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
# XOR dataset
X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([[0],[1],[1],[0]])
# Build ANN model
model = Sequential()
model.add(Dense(4, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Train model
model.fit(X, y, epochs=1000, verbose=0)
# Test predictions
predictions = model.predict(X)
print("Predictions:\n", predictions)
Explanation:
The model consists of an input layer, one hidden layer with 4 neurons, and an output layer.
The activation function 'relu' is used in the hidden layer and 'sigmoid' in the output layer.
Backpropagation is used internally by Keras to adjust weights and minimize error using the
'adam' optimizer.
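To make the weight updates visible rather than hidden inside Keras, here is a minimal from-scratch sketch of backpropagation on the same XOR data, using one sigmoid hidden layer and plain batch gradient descent (the hidden-layer size, learning rate, epoch count, and random seed are illustrative choices and differ from the Keras model above):
import numpy as np

np.random.seed(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weights and biases for a 2-4-1 network
W1 = np.random.randn(2, 4); b1 = np.zeros((1, 4))
W2 = np.random.randn(4, 1); b2 = np.zeros((1, 1))
lr = 0.5

for epoch in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: error signals for the output and hidden layers
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent updates
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out, 3))  # should end up close to [0, 1, 1, 0]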
Naïve Bayes Classifier - Implementation & Accuracy Evaluation
Overview:
The Naïve Bayes classifier is a probabilistic classifier based on Bayes’ Theorem with strong
independence assumptions between features.
It is simple, fast, and effective for many classification problems.
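Concretely, for a sample with feature values x1, ..., xn, the classifier predicts the class c that maximizes P(c) * P(x1 | c) * ... * P(xn | c), where the prior P(c) and each conditional probability P(xi | c) are estimated from counts in the training data.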
Dataset:
We assume a CSV file containing labeled data for training and testing. For demonstration,
let’s consider a Play Tennis dataset.
Outlook      Temperature   Humidity   Wind     PlayTennis
Sunny        Hot           High       Weak     No
Sunny        Hot           High       Strong   No
Overcast     Hot           High       Weak     Yes
Rain         Mild          High       Weak     Yes
Rain         Cool          Normal     Weak     Yes
Rain         Cool          Normal     Strong   No
Overcast     Cool          Normal     Strong   Yes
Sunny        Mild          High       Weak     No
Sunny        Cool          Normal     Weak     Yes
Rain         Mild          Normal     Weak     Yes
Python Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score
# Load dataset
data = pd.read_csv("play_tennis.csv")
# Encode categorical data
label_encoders = {}
for col in data.columns:
   le = LabelEncoder()
   data[col] = le.fit_transform(data[col])
   label_encoders[col] = le
# Split into train and test sets
X = data.drop("PlayTennis", axis=1)
y = data["PlayTennis"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train Naive Bayes model
# min_categories tells CategoricalNB how many categories each feature has,
# so a category that appears only in the test split does not cause an error
model = CategoricalNB(min_categories=[len(label_encoders[c].classes_) for c in X.columns])
model.fit(X_train, y_train)
# Predict and evaluate
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)
Explanation:
The code uses sklearn's `CategoricalNB` to train the Naïve Bayes classifier on categorical
data.
The dataset is encoded, split into training and test sets, and the model is trained and evaluated
for accuracy.
Document Classification using Naïve Bayes in Java
Overview:
This document outlines the implementation of a Naïve Bayes Classifier in Java using the
Weka API to classify a set of documents.
The model computes Accuracy, Precision, and Recall to evaluate its performance.
Assumptions:
- The dataset is in ARFF format (Weka-compatible) and contains labeled document data.
- The Java program uses Weka’s NaiveBayes class.
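For reference, a Weka-compatible ARFF file for this task might look like the sketch below; the attribute names and rows are purely illustrative (a real documents.arff would typically hold word-frequency features, for example produced by Weka's StringToWordVector filter), with a nominal class attribute in the last position:
@relation documents
@attribute word_offer numeric
@attribute word_meeting numeric
@attribute class {0, 1}

@data
3, 0, 1
0, 2, 0
1, 1, 1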
Java Code:
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.Evaluation;
import java.util.Random;
public class DocumentClassifier {
   public static void main(String[] args) throws Exception {
        // Load dataset
        DataSource source = new DataSource("documents.arff");
        Instances data = source.getDataSet();
        data.setClassIndex(data.numAttributes() - 1);
        // Train Naive Bayes model
        NaiveBayes nb = new NaiveBayes();
        nb.buildClassifier(data);
        // Evaluate with 10-fold cross-validation
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(nb, data, 10, new Random(1));
        // Output metrics
        System.out.println("Accuracy: " + eval.pctCorrect());
        System.out.println("Precision: " + eval.precision(1));
        System.out.println("Recall: " + eval.recall(1));
    }
}
Explanation:
This Java program loads a document dataset, builds a Naïve Bayes classifier using Weka,
and evaluates its performance with 10-fold cross-validation. It prints Accuracy, Precision, and
Recall for class '1'.
Bayesian Network for Heart Disease Diagnosis
Overview:
This program demonstrates how to construct and use a Bayesian Network to diagnose heart
disease using a medical dataset.
The model is implemented in Python using the `pgmpy` library.
Dataset:
We use a simplified version of the UCI Heart Disease dataset with attributes such as Age,
Sex, ChestPainType, Cholesterol, and HeartDisease.
Python Code (using pgmpy):
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination
# Sample medical dataset
data = pd.DataFrame([
    [63, 1, 'typical', 233, 1],
    [37, 1, 'non-anginal', 250, 1],
    [41, 0, 'atypical', 204, 0],
    [56, 1, 'asymptomatic', 236, 1],
    [57, 0, 'typical', 354, 0],
], columns=["Age", "Sex", "ChestPainType", "Cholesterol", "HeartDisease"])
# Convert categorical values to integer codes
# (alphabetical order: asymptomatic=0, atypical=1, non-anginal=2, typical=3)
data["ChestPainType"] = data["ChestPainType"].astype('category').cat.codes
# Define Bayesian Network structure
model = BayesianNetwork([
     ("Age", "HeartDisease"),
     ("Sex", "HeartDisease"),
     ("ChestPainType", "HeartDisease"),
     ("Cholesterol", "HeartDisease")
])
# Learn CPDs using Maximum Likelihood Estimator
model.fit(data, estimator=MaximumLikelihoodEstimator)
# Perform inference: query P(HeartDisease) for a patient with the given evidence
# (ChestPainType=3 corresponds to 'typical' under the encoding above)
inference = VariableElimination(model)
result = inference.query(variables=["HeartDisease"],
                         evidence={"Age": 56, "Sex": 1, "ChestPainType": 3, "Cholesterol": 236})
print(result)
Explanation:
This code builds a Bayesian Network with connections from predictors to the target variable
(HeartDisease).
The `MaximumLikelihoodEstimator` learns the CPDs from the data.
The `VariableElimination` module performs inference to diagnose if a patient with given
attributes has heart disease.
Clustering with EM and K-Means Algorithms - Implementation & Comparison
Overview:
This program applies the Expectation-Maximization (EM) algorithm and the K-Means
algorithm to cluster a dataset stored in a CSV file.
It uses Python's scikit-learn library for implementation and compares the results using
silhouette score for clustering quality.
Python Code:
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler
# Load dataset (assumes all columns in data.csv are numeric features)
data = pd.read_csv("data.csv")
X = StandardScaler().fit_transform(data)
# K-Means Clustering
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans_labels = kmeans.fit_predict(X)
kmeans_score = silhouette_score(X, kmeans_labels)
print("K-Means Silhouette Score:", kmeans_score)
# EM Clustering using Gaussian Mixture Model
gmm = GaussianMixture(n_components=3, random_state=42)
gmm_labels = gmm.fit_predict(X)
gmm_score = silhouette_score(X, gmm_labels)
print("EM Silhouette Score:", gmm_score)
Explanation:
Both K-Means and EM algorithms aim to group similar data points together:
- K-Means partitions the dataset into clusters by minimizing the sum of squared distances.
- EM uses a probabilistic model (Gaussian Mixture) to estimate the likelihood of data points
belonging to clusters.
The silhouette score measures how well data points fit into their assigned clusters. Higher
scores indicate better-defined clusters.
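For reference, the silhouette value of a single point i is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from i to the other points of its own cluster and b(i) is the mean distance from i to the points of the nearest other cluster; the reported score is the average of s(i) over all points and lies between -1 and 1.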
Comparison and Conclusion:
- If the silhouette score for EM is higher than K-Means, EM provides better clustering,
especially when data clusters are elliptical.
- If K-Means performs similarly or better, it suggests that the clusters are spherical and well-
separated.
- Depending on the data shape, EM may handle overlapping clusters better than K-Means.
K-Nearest Neighbour (KNN) Algorithm - Iris Dataset Classification
Overview:
The K-Nearest Neighbour (KNN) algorithm is a simple, non-parametric method used for
classification and regression.
This implementation uses Python's scikit-learn library to classify the Iris dataset.
It prints both correct and incorrect predictions.
Python Code (using sklearn):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names
# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train KNN classifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
# Predict and evaluate
predictions = knn.predict(X_test)
for i in range(len(y_test)):
    actual = target_names[y_test[i]]
    predicted = target_names[predictions[i]]
    status = "Correct" if y_test[i] == predictions[i] else "Incorrect"
    print(f"Sample {i+1}: Actual = {actual}, Predicted = {predicted} --> {status}")
Explanation:
- The KNN algorithm classifies test samples based on the majority vote of the k-nearest
training examples.
- The Iris dataset contains three classes of 50 instances each, where each class refers to a type
of iris plant.
- The script prints the prediction results for each test sample and indicates whether each
prediction was correct or incorrect.
Locally Weighted Regression (LWR) - Implementation & Visualization
Overview:
Locally Weighted Regression (LWR) is a non-parametric algorithm used to fit a curve through
data points.
It computes weights for each training point depending on its distance to the query point and
fits a regression using weighted least squares.
Python Code Summary:
- `kernel`: Defines the weight for each training point based on Gaussian kernel.
- `predict`: Computes the regression prediction using weighted least squares.
- Data: 100 points generated from `sin(x)` with added noise.
- Bandwidth (`tau`) determines how 'local' the fit is.
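Since the code itself is not reproduced above, the following is a minimal sketch consistent with that summary (the exact noise level, bandwidth value, and x-range are illustrative choices):
import numpy as np
import matplotlib.pyplot as plt

def kernel(x_query, X, tau):
    # Gaussian weights: training points near the query get weights close to 1
    return np.exp(-((X - x_query) ** 2) / (2 * tau ** 2))

def predict(x_query, X, y, tau):
    # Weighted least squares fit of a local line around the query point
    w = kernel(x_query, X, tau)
    A = np.c_[np.ones_like(X), X]   # design matrix [1, x]
    W = np.diag(w)
    theta = np.linalg.pinv(A.T @ W @ A) @ (A.T @ W @ y)
    return theta[0] + theta[1] * x_query

# 100 points generated from sin(x) with added Gaussian noise
np.random.seed(0)
X = np.linspace(0, 2 * np.pi, 100)
y = np.sin(X) + np.random.normal(scale=0.2, size=X.shape)

tau = 0.5   # bandwidth: smaller values make the fit more local
y_pred = np.array([predict(x, X, y, tau) for x in X])

plt.scatter(X, y, color="red", s=10, label="Noisy data")
plt.plot(X, y_pred, color="blue", label="LWR fit")
plt.legend()
plt.show()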
Graphical Output:
The plot produced by this code shows the noisy data (red) and the smooth curve fitted using LWR (blue).