NAME OF THE EXPERIMENT
1. Python program to compute Central Tendency Measures (Mean, Median, Mode) and Measures of Dispersion (Variance, Standard Deviation)
2. Study of Python Basic Libraries such as Statistics, Math, NumPy and SciPy
3. Study of Python Libraries for ML Applications such as Pandas and Matplotlib
4. Python Program for Simple Linear Regression
5. Implementation of Multiple Linear Regression for House Price Prediction using sklearn
6. Implementation of Decision Tree using sklearn and its Parameter Tuning
7. Implementation of KNN using sklearn
8. Implementation of Logistic Regression using sklearn
9. Implementation of K-Means Clustering
10. Performance Analysis of Classification Algorithms
Program 1: Python program to compute Central Tendency Measures (Mean, Median, Mode) and Measures of Dispersion (Variance, Standard Deviation)
import statistics as stats

def central_tendency_dispersion(data):
    # Central Tendency Measures
    mean = stats.mean(data)
    median = stats.median(data)
    try:
        # Python 3.8+ returns the first mode encountered for multimodal data;
        # older versions raise StatisticsError when there is no unique mode
        mode = stats.mode(data)
    except stats.StatisticsError:
        mode = "No unique mode found"
    # Measures of Dispersion (sample variance and standard deviation, n - 1 denominator)
    variance = stats.variance(data)
    std_dev = stats.stdev(data)
    # Display results
    print(f"Mean: {mean}")
    print(f"Median: {median}")
    print(f"Mode: {mode}")
    print(f"Variance: {variance}")
    print(f"Standard Deviation: {std_dev}")

# Example data
data = [10, 15, 14, 10, 15, 18, 20, 25, 30]
central_tendency_dispersion(data)
OUTPUT:
Mean: 17.444444444444443
Median: 15
Mode: 10
Variance: 44.52777777777778
Standard Deviation: 6.672913739722534
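The variance printed above is the sample variance: the sum of squared deviations from the mean divided by n - 1. The population variance divides by n instead. The sketch below computes both by hand so the formulas are visible, and uses `statistics.multimode` (Python 3.8+) to list every mode, since this data set is bimodal (10 and 15 each occur twice) while `statistics.mode` reports only the first one. The variable names are illustrative, not part of the program above.

```python
import statistics as stats

data = [10, 15, 14, 10, 15, 18, 20, 25, 30]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean
squared_deviations = sum((x - mean) ** 2 for x in data)

sample_variance = squared_deviations / (n - 1)   # definition used by stats.variance
population_variance = squared_deviations / n     # definition used by stats.pvariance

print("Sample variance:", sample_variance)
print("Population variance:", population_variance)
print("Sample standard deviation:", sample_variance ** 0.5)

# multimode lists every value that shares the highest frequency
print("All modes:", stats.multimode(data))
```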
Program 2: Study of Python Basic Libraries such as Statistics, Math, NumPy and SciPy
Python's standard library (the statistics and math modules) and the third-party packages NumPy and SciPy provide the basic tools for statistical calculations, mathematical operations, and scientific computing. Here is an overview:
Statistics Module
Used for statistical computations such as mean, median, mode, variance, etc.
Example
import statistics
data = [1, 2, 2, 3, 4]
print("Mean:", statistics.mean(data))
print("Median:", statistics.median(data))
print("Mode:", statistics.mode(data))
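When the data has an even number of values, `statistics.median` averages the two middle values, so the result need not be a member of the data. `statistics.median_low` and `statistics.median_high` instead return the lower or upper of the two middle values. A short sketch with an illustrative even-length list:

```python
import statistics

even_data = [1, 2, 3, 4]

median = statistics.median(even_data)            # average of 2 and 3
median_low = statistics.median_low(even_data)    # lower middle value
median_high = statistics.median_high(even_data)  # upper middle value

print("Median:", median)
print("Low median:", median_low)
print("High median:", median_high)
```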
Math Module
Provides mathematical functions such as trigonometric calculations, logarithms,
factorials, and more.
Example
import math
print("Square root of 16:", math.sqrt(16))
print("Factorial of 5:", math.factorial(5))
print("Cosine of 45 degrees:", math.cos(math.radians(45)))
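Trigonometric and logarithmic results from `math` are floating-point approximations, so comparisons should use a tolerance (`math.isclose`) rather than `==`. The sketch below checks that cos(45 degrees) matches sqrt(2)/2 and shows the logarithm and exponential helpers; the specific values are only illustrations.

```python
import math

angle = math.radians(45)  # 45 degrees = pi/4 radians
cos_45 = math.cos(angle)

# Floating-point results are rarely exact, so compare with a tolerance
print("cos(45 degrees) is close to sqrt(2)/2:", math.isclose(cos_45, math.sqrt(2) / 2))

# Logarithms in different bases and the exponential function
print("log10(1000):", math.log10(1000))
print("log2(8):", math.log2(8))
print("ln(e):", math.log(math.e))
print("e^2:", math.exp(2))
```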
NumPy Library
Widely used for numerical computations with arrays, matrices, and linear algebra
functions.
Example:
import numpy as np
array = np.array([1, 2, 3, 4, 5])
print("Mean of array:", np.mean(array))
print("Sum of array:", np.sum(array))
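NumPy's main advantage is vectorised, element-wise arithmetic on whole arrays without Python loops. One detail worth knowing: `np.var` and `np.std` use the population formula (`ddof=0`) by default, whereas `statistics.variance` and `statistics.stdev` use the sample formula; passing `ddof=1` makes NumPy match the latter. A minimal sketch reusing the array above:

```python
import numpy as np

array = np.array([1, 2, 3, 4, 5])

# Vectorised, element-wise arithmetic: no explicit Python loop
squared = array ** 2
scaled = array * 10

# np.var/np.std use the population formula (ddof=0) by default;
# ddof=1 switches to the sample formula used by statistics.variance/stdev
pop_var = np.var(array)
sample_var = np.var(array, ddof=1)

# Reshape the first four elements into a 2x2 matrix and multiply it by itself
matrix = array[:4].reshape(2, 2)
product = matrix @ matrix

print("Squared:", squared)
print("Scaled:", scaled)
print("Population variance:", pop_var, "Sample variance:", sample_var)
print("Matrix product:\n", product)
```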
SciPy Library
Built on NumPy, it provides additional functionality for optimization, integration, and other scientific computations.
Example
from scipy import integrate
# Integrate f(x) = x**2 over [0, 1]; quad returns (value, estimated absolute error)
result, _ = integrate.quad(lambda x: x**2, 0, 1)
print("Integral of x^2 from 0 to 1:", result)
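The example above discards the second value returned by `integrate.quad`, which is an estimate of the absolute error. The sketch below keeps it, and also illustrates the optimization side of SciPy with `optimize.minimize_scalar`; the function (x - 2)^2 + 1 is an arbitrary illustration with a known minimum of 1 at x = 2.

```python
from scipy import integrate, optimize

# quad returns the integral and an estimate of its absolute error
value, abs_err = integrate.quad(lambda x: x**2, 0, 1)
print("Integral of x^2 from 0 to 1:", value, "error estimate:", abs_err)

# Find the x that minimises f(x) = (x - 2)**2 + 1
res = optimize.minimize_scalar(lambda x: (x - 2) ** 2 + 1)
print("Minimum at x =", res.x, "with f(x) =", res.fun)
```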
Program 3: Study of Python Libraries for ML Applications such as Pandas and Matplotlib
For machine learning and data analysis, Python libraries like Pandas and Matplotlib are
essential for data manipulation and visualization.
Pandas
Provides data structures like Series and DataFrame for handling and analyzing data
efficiently.
Example:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
print(df)
print("Mean Age:", df['Age'].mean())
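Beyond building a DataFrame, the everyday Pandas operations in ML work are boolean filtering, adding derived columns, and filling missing values. The sketch below applies each to the same small table; the `Age_in_months` column and the `ages` Series with a missing entry are illustrative additions.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# Boolean filtering: keep rows where Age is greater than 28
older = df[df['Age'] > 28]

# Derived column computed element-wise from an existing column
df['Age_in_months'] = df['Age'] * 12

# Fill a missing value with the column mean (the same fillna idea used in Program 5)
ages = pd.Series([25, None, 35])
filled = ages.fillna(ages.mean())

print(older)
print(df)
print(filled.tolist())
```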
Matplotlib
A visualization library used for creating static, interactive, and animated plots.
Example:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 35]
plt.plot(x, y, marker='o', linestyle='--', color='r')
plt.title("Sample Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
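For reports it is often more convenient to use Matplotlib's object-oriented interface (`fig, ax = plt.subplots(...)`) and save the figure with `savefig` instead of showing a window. The sketch below draws the same data as a line plot and a bar chart side by side; it assumes the non-interactive `Agg` backend so it also runs on a machine without a display, and writes the PNG to an in-memory buffer rather than a file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render without opening a window
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [10, 20, 25, 30, 35]

# One figure with two side-by-side Axes
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.plot(x, y, marker='o', linestyle='--', color='r')
ax1.set_title("Line plot")
ax2.bar(x, y, color='steelblue')
ax2.set_title("Bar chart")
fig.tight_layout()

# Save as PNG into an in-memory buffer (use a filename to write to disk instead)
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```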
Program 4: Python Program for Simple Linear Regression
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Generate some example data
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared: {r2:.2f}")
# Plotting the results
plt.scatter(X_test, y_test, color="black", label="Actual data")
plt.plot(X_test, y_pred, color="blue", linewidth=2, label="Fitted line")
plt.xlabel("X")
plt.ylabel("y")
plt.title("Simple Linear Regression")
plt.legend()
plt.show()
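OUTPUT:
[Figure: test data (black points) with the fitted regression line (blue), titled "Simple Linear Regression"]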
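For a single feature, the least-squares line has a closed form: slope b1 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2) and intercept b0 = y_mean - b1 * x_mean. The sketch below computes these by hand on the same synthetic data and checks them against `LinearRegression`. Unlike the program above, it fits on all 100 points rather than the training split, so its numbers differ slightly from the model above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same synthetic data as Program 4
np.random.seed(0)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

x, t = X.ravel(), y.ravel()

# Closed-form least squares: slope = covariance(x, y) / variance(x)
slope = np.sum((x - x.mean()) * (t - t.mean())) / np.sum((x - x.mean()) ** 2)
intercept = t.mean() - slope * x.mean()

# sklearn fit on the same (full) data for comparison
model = LinearRegression().fit(X, y)

print(f"Closed form: slope={slope:.4f}, intercept={intercept:.4f}")
print(f"sklearn:     slope={model.coef_[0][0]:.4f}, intercept={model.intercept_[0]:.4f}")
```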
Program 5: Implementation of Multiple Linear Regression for House Price Prediction using sklearn
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Load the dataset
data = pd.read_csv('house_prices.csv')
# Display the first few rows of the dataset
print(data.head())
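# Drop rows with a missing target; imputing the target with its mean would distort the model
data = data.dropna(subset=['Price'])
# Selecting features and target variable
X = data[['Size', 'Bedrooms', 'Age']]
y = data['Price']
# Handling missing data: fill missing feature values with each column's mean
X = X.fillna(X.mean())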
# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
# Creating and training the model
model = LinearRegression()
model.fit(X_train, y_train)
# Making predictions on the testing set
y_pred = model.predict(X_test)
# Evaluating the model's performance
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')
# Model coefficients
print("Intercept:", model.intercept_)
print("Coefficients:", model.coef_)
coefficients = pd.DataFrame(model.coef_, X.columns, columns=['Coefficient'])
print(coefficients)
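Program 5 depends on a `house_prices.csv` file that is not included here. To see the same pipeline run end to end, the sketch below builds a synthetic DataFrame with the same column names and an assumed pricing rule (the base price of 50,000, 120 per square foot, 10,000 per bedroom and -800 per year of age are invented for illustration). Because the true coefficients are known, you can check that multiple linear regression recovers them.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 200

# Hypothetical housing data: same column names as house_prices.csv, synthetic values
data = pd.DataFrame({
    'Size': rng.uniform(500, 3500, n),   # square feet
    'Bedrooms': rng.integers(1, 6, n),    # 1 to 5 bedrooms
    'Age': rng.uniform(0, 50, n),         # years
})
# Assumed "true" pricing rule plus Gaussian noise
data['Price'] = (50_000 + 120 * data['Size'] + 10_000 * data['Bedrooms']
                 - 800 * data['Age'] + rng.normal(0, 10_000, n))

X = data[['Size', 'Bedrooms', 'Age']]
y = data['Price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
coefficients = pd.Series(model.coef_, index=X.columns)
r2 = r2_score(y_test, model.predict(X_test))

print(coefficients)
print("Intercept:", model.intercept_)
print("R-squared on test set:", r2)
```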
Program 6: Implementation of Decision Tree using sklearn and its Parameter Tuning
# Importing necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.datasets import load_iris
# Load dataset (for example, the Iris dataset)
data = load_iris()
X = data.data
y = data.target
# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
# Initialize a basic DecisionTreeClassifier
clf = DecisionTreeClassifier(random_state=42)
# Fit the model with the training data
clf.fit(X_train, y_train)
# Predict on the test set
y_pred = clf.predict(X_test)
# Evaluate model performance
print("Accuracy without tuning: ", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
# Parameter tuning using GridSearchCV
param_grid = {
    'criterion': ['gini', 'entropy'],        # Different criteria for splitting
    'splitter': ['best', 'random'],          # Split strategy
    'max_depth': [None, 10, 20, 30],         # Maximum depth of the tree
    'min_samples_split': [2, 5, 10],         # Minimum number of samples required to split a node
    'min_samples_leaf': [1, 2, 4],           # Minimum number of samples required at a leaf node
    # Number of features to consider for the best split
    # ('auto' is omitted: it was deprecated and later removed from DecisionTreeClassifier)
    'max_features': [None, 'sqrt', 'log2'],
}
# Using GridSearchCV for parameter tuning
grid_search = GridSearchCV(estimator=clf, param_grid=param_grid, cv=5,
n_jobs=-1, verbose=1)
# Fit GridSearchCV
grid_search.fit(X_train, y_train)
# Best parameters from GridSearchCV
print("Best Parameters: ", grid_search.best_params_)
# Predict with the best estimator from grid search
best_clf = grid_search.best_estimator_
y_pred_best = best_clf.predict(X_test)
# Evaluate performance with the tuned model
print("Accuracy with tuning: ", accuracy_score(y_test, y_pred_best))
print("Classification Report:\n", classification_report(y_test, y_pred_best))
OUTPUT:
Accuracy with tuning: 1.0
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
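The `criterion` values tuned above measure node impurity. Gini impurity is 1 - sum(p_k^2) and entropy is -sum(p_k * log2(p_k)), where p_k is the fraction of samples of class k at the node. The sketch below implements both (the helper names `gini` and `entropy` are illustrative) and checks the Gini value against the root-node impurity stored by scikit-learn in `tree_.impurity`. For the full Iris dataset, with 50 samples per class, the root Gini impurity is 2/3 and the entropy is log2(3) bits.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Shannon entropy in bits: -sum_k p_k * log2(p_k)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

X, y = load_iris(return_X_y=True)
print("Gini at root:", gini(y))
print("Entropy at root:", entropy(y))

# Fit a shallow Gini tree on all data; tree_.impurity[0] is the root node's impurity
tree = DecisionTreeClassifier(criterion='gini', max_depth=2, random_state=42).fit(X, y)
print("sklearn root impurity:", tree.tree_.impurity[0])
```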
Program 7: Implementation of KNN using sklearn
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Load the dataset (Iris dataset)
iris = load_iris()
X = iris.data # Features
y = iris.target # Target labels
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)
# Create the KNN model with k=3
knn = KNeighborsClassifier(n_neighbors=3)
# Train the model
knn.fit(X_train, y_train)
# Make predictions
y_pred = knn.predict(X_test)
# Evaluate the model's performance
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
OUTPUT:
Accuracy: 100.00%
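KNN classifies a point by finding the k training points closest to it (here by Euclidean distance) and taking a majority vote of their labels. The from-scratch sketch below (the helper `knn_predict` is illustrative) uses a brute-force distance computation on the same Iris split. `KNeighborsClassifier` may use a KD-tree or ball tree to find neighbours faster, and tie-breaking between equally distant points can differ, so predictions may not match sklearn's exactly in every case.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def knn_predict(X_train, y_train, X_query, k=3):
    """Predict labels by majority vote among the k nearest training points."""
    preds = []
    for x in X_query:
        # Euclidean distance from the query point to every training point
        dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        # Indices of the k smallest distances
        nearest = np.argsort(dists)[:k]
        # Majority vote; bincount(...).argmax() breaks vote ties toward the smaller label
        preds.append(np.bincount(y_train[nearest]).argmax())
    return np.array(preds)

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

y_pred_manual = knn_predict(X_train, y_train, X_test, k=3)
manual_accuracy = np.mean(y_pred_manual == y_test)
print(f"From-scratch KNN accuracy: {manual_accuracy * 100:.2f}%")
```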
Program 8: Implementation of Logistic Regression using sklearn
# Import necessary libraries
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.datasets import load_iris
# Load a sample dataset
# Here, we're using the Iris dataset for simplicity.
# We'll use only two classes (binary classification) for logistic regression.
iris = load_iris()
X = iris.data
y = iris.target
# For binary classification, we'll select only two classes (e.g., class 0 and 1)
X = X[y != 2] # Select only class 0 and 1
y = y[y != 2] # Select only class 0 and 1
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)
# Create a Logistic Regression model
log_reg = LogisticRegression()
# Train the model
log_reg.fit(X_train, y_train)
# Make predictions on the test set
y_pred = log_reg.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
# Print the results
print("Accuracy:", accuracy)
print("\nConfusion Matrix:\n", conf_matrix)
print("\nClassification Report:\n", class_report)
OUTPUT:
Accuracy: 1.0
Confusion Matrix:
[[17  0]
 [ 0 13]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        17
           1       1.00      1.00      1.00        13

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
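For binary classification, logistic regression computes a linear score z = w . x + b and passes it through the sigmoid 1 / (1 + e^(-z)) to get the probability of class 1; class 1 is predicted when that probability exceeds 0.5, i.e. when z > 0. The sketch below rebuilds the probabilities from the fitted `coef_` and `intercept_` on the same two-class Iris split and checks them against `predict_proba` and `predict`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Same two-class subset and split as Program 8
iris = load_iris()
X, y = iris.data[iris.target != 2], iris.target[iris.target != 2]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
log_reg = LogisticRegression().fit(X_train, y_train)

# Linear score z = w . x + b for every test sample
z = X_test @ log_reg.coef_.ravel() + log_reg.intercept_[0]

# Sigmoid maps z to P(class 1 | x)
p_manual = 1.0 / (1.0 + np.exp(-z))
p_sklearn = log_reg.predict_proba(X_test)[:, 1]

# Threshold at 0.5 (equivalently z > 0) to get class labels
labels_manual = (p_manual > 0.5).astype(int)
print("Max probability difference:", np.max(np.abs(p_manual - p_sklearn)))
```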
Program 9: Implementation of K-Means Clustering
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
# Generate synthetic data with 4 clusters
X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.60,
random_state=0)
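# Create a KMeans model with the number of clusters set to 4
# (n_init=10 runs 10 initialisations and keeps the best; setting it explicitly
# avoids a FutureWarning about the changing default in some scikit-learn versions)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)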
# Fit the model to the data
kmeans.fit(X)
# Predict the cluster labels for each data point
y_kmeans = kmeans.predict(X)
# Plotting the clusters and their centroids
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')
# Marking the centroids
centers = kmeans.cluster_centers_
plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, alpha=0.75, marker='X')
plt.title("K-Means Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
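OUTPUT:
[Figure: the 300 points coloured by assigned cluster, with the four centroids marked by red X markers, titled "K-Means Clustering"]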
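K-Means alternates two steps until the assignments stop changing: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The sketch below is plain Lloyd's algorithm with centroids initialised from randomly chosen data points (the helper `lloyd_kmeans` is illustrative). scikit-learn's `KMeans` uses k-means++ initialisation and multiple restarts by default, so its clusters and inertia (sum of squared distances to the nearest centroid) may differ from this version.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns labels, centroids and inertia."""
    rng = np.random.default_rng(seed)
    # Initialise centroids as k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Assignment step: squared distance from every point to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster ends up empty
                centroids[j] = members.mean(axis=0)
    inertia = ((X - centroids[labels]) ** 2).sum()
    return labels, centroids, inertia

X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
labels, centroids, inertia = lloyd_kmeans(X, 4)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print("From-scratch inertia:", inertia)
print("sklearn inertia:", kmeans.inertia_)
```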