EXPERIMENT-11: Exploratory Data Analysis for Classification using Pandas and Matplotlib.
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
# Load the built-in Iris dataset
iris = load_iris()
# Convert the dataset to a pandas DataFrame
data = pd.DataFrame(iris.data, columns=iris.feature_names)
data['species'] = iris.target  # Class label: 0 = setosa, 1 = versicolor, 2 = virginica
# Display the first few rows of the dataset
print("First 5 rows of the dataset:")
print(data.head())
# Summary statistics
print("\nSummary statistics:")
print(data.describe())
# Scatter plot of the first two features (sepal length vs. sepal width), colored by species
plt.scatter(data.iloc[:, 0], data.iloc[:, 1], c=data['species'], cmap='viridis')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.title('Scatter plot of first two features')
plt.show()
OUTPUT:
First 5 rows of the dataset:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2

   species
0        0
1        0
2        0
3        0
4        0

Summary statistics:
       sepal length (cm)  sepal width (cm)  petal length (cm)  \
count         150.000000        150.000000         150.000000
mean            5.843333          3.057333           3.758000
std             0.828066          0.435866           1.765298
min             4.300000          2.000000           1.000000
25%             5.100000          2.800000           1.600000
50%             5.800000          3.000000           4.350000
75%             6.400000          3.300000           5.100000
max             7.900000          4.400000           6.900000

       petal width (cm)     species
count        150.000000  150.000000
mean           1.199333    1.000000
std            0.762238    0.819232
min            0.100000    0.000000
25%            0.300000    0.000000
50%            1.300000    1.000000
75%            1.800000    2.000000
max            2.500000    2.000000
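
The summary statistics above pool all three species, and they treat the integer species column as if it were a numeric feature. For a classification task it is usually more informative to look at each class separately. The listing below is a minimal sketch of one way to do that, not part of the original experiment. The column name species_name and the variables class_counts and per_class_means are assumptions introduced here for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the same DataFrame used in the experiment
iris = load_iris()
data = pd.DataFrame(iris.data, columns=iris.feature_names)
data['species'] = iris.target

# Hypothetical helper column: map the integer codes to readable class names
code_to_name = dict(enumerate(iris.target_names))
data['species_name'] = data['species'].map(code_to_name)

# Number of samples per class (checks whether the classes are balanced)
class_counts = data['species_name'].value_counts()
print("Samples per class:")
print(class_counts)

# Mean of each feature within each class
per_class_means = data.groupby('species_name')[iris.feature_names].mean()
print("\nPer-class feature means:")
print(per_class_means)
```

Each class should show 50 samples, which matches the count of 150 and the species mean of 1.0 in the summary statistics. Features whose per-class means lie far apart relative to their spread are likely to be useful for separating the classes.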