
Experiment 11 PML

This experiment performs exploratory data analysis on the Iris dataset using Pandas and Matplotlib. It loads the dataset, displays the first five rows, prints summary statistics, and draws a scatter plot of the first two features (sepal length and sepal width) colored by species. The output lists the count, mean, standard deviation, minimum, quartiles, and maximum of each column for the 150 samples.

Uploaded by

sri117537

EXPERIMENT-11: Exploratory Data Analysis for Classification using Pandas or Matplotlib.

# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Load the built-in Iris dataset
iris = load_iris()

# Convert the dataset to a pandas DataFrame
data = pd.DataFrame(iris.data, columns=iris.feature_names)
data['species'] = iris.target  # Add target column: integer species codes 0, 1, 2

# Display the first few rows of the dataset
print("First 5 rows of the dataset:")
print(data.head())

# Summary statistics
print("\nSummary statistics:")
print(data.describe())

# Scatter plot of the first two features, colored by species code
plt.scatter(data.iloc[:, 0], data.iloc[:, 1], c=data['species'], cmap='viridis')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.title('Scatter plot of first two features')
plt.show()

OUTPUT:
First 5 rows of the dataset:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2

   species
0        0
1        0
2        0
3        0
4        0

Summary statistics:
       sepal length (cm)  sepal width (cm)  petal length (cm)  \
count         150.000000        150.000000         150.000000
mean            5.843333          3.057333           3.758000
std             0.828066          0.435866           1.765298
min             4.300000          2.000000           1.000000
25%             5.100000          2.800000           1.600000
50%             5.800000          3.000000           4.350000
75%             6.400000          3.300000           5.100000
max             7.900000          4.400000           6.900000

       petal width (cm)     species
count        150.000000  150.000000
mean           1.199333    1.000000
std            0.762238    0.819232
min            0.100000    0.000000
25%            0.300000    0.000000
50%            1.300000    1.000000
75%            1.800000    2.000000
max            2.500000    2.000000
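
The summary statistics above pool all 150 samples, and the species column is summarized as if the codes 0, 1, 2 were measurements. For a classification task it is often more informative to look at each class separately. The following is a minimal sketch of one way to do that, not part of the original experiment: the helper name summarize_by_species and the column name species_name are assumptions introduced here for illustration. It maps the integer codes to the names stored in iris.target_names and averages each feature per species.

```python
import pandas as pd
from sklearn.datasets import load_iris


def summarize_by_species() -> pd.DataFrame:
    """Return the mean of every feature for each Iris species.

    Hypothetical helper, not part of the original experiment.
    """
    iris = load_iris()
    data = pd.DataFrame(iris.data, columns=iris.feature_names)

    # Translate the integer codes (0, 1, 2) into readable species names
    # by indexing the target_names array with the target codes.
    data["species_name"] = iris.target_names[iris.target]

    # One row per species, one column per feature.
    return data.groupby("species_name").mean()


per_species_means = summarize_by_species()
print(per_species_means.round(3))
```

Comparing the rows of this table gives a rough indication of which features separate the classes; a feature whose per-species means differ widely relative to its overall spread is likely to be more useful to a classifier.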
