PRACTICAL PROGRAM
PYTHON PROGRAMMING: Data Handling
Q1: Create a panda’s series from a dictionary of values and a ndarray
Creating a Pandas Series from a dictionary:
import pandas as pd
# Create a dictionary
data_dict = {'a': 10, 'b': 20, 'c': 30}
# Create a Pandas Series from the dictionary
series_from_dict = pd.Series(data_dict)
print("Series from dictionary:")
print(series_from_dict)
OUTPUT:
Creating a Pandas Series from a NumPy ndarray:
import numpy as np
import pandas as pd
# Create a NumPy ndarray
data_ndarray = np.array([10, 20, 30])
# Create a Pandas Series from the ndarray
series_from_ndarray = pd.Series(data_ndarray, index=['a', 'b', 'c'])
print("Series from ndarray:")
print(series_from_ndarray)
OUTPUT:
Q2: Given a Series, print all the elements that are above the 75th percentile
import pandas as pd
# Example Series
data = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
# Calculate the 75th percentile
percentile_75 = data.quantile(0.75)
# Filter the Series to include only elements above the 75th percentile
above_75th_percentile = data[data > percentile_75]
print("Elements above the 75th percentile:")
print(above_75th_percentile)
OTPUT:
Q3: Create a Data Frame quarterly sales where each row contains the item category, item name, and
expenditure. Group the rows by the category and print the total expenditure per category.
Coding
import pandas as pd
# Create a DataFrame with sample quarterly sales data
data = {
'Item Category': ['Electronics', 'Electronics', 'Furniture', 'Furniture', 'Clothing', 'Clothing'],
'Item Name': ['Laptop', 'Smartphone', 'Sofa', 'Desk', 'Jacket', 'Shoes'],
'Expenditure': [1200, 800, 600, 300, 150, 100]
df = pd.DataFrame(data)
# Group by 'Item Category' and calculate the total expenditure per category
total_expenditure_per_category = df.groupby('Item Category')['Expenditure'].sum()
# Print the results
print("Total expenditure per category:")
print(total_expenditure_per_category)
Output:
Q4: . Create a data frame for examination result and display row labels, column labels data types of
each column and the dimensions.
Coding:
import pandas as pd
# Create a DataFrame with sample examination results data
data = {
'Student Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Math': [85, 90, 78, 88],
'Science': [92, 85, 80, 87],
'English': [88, 91, 85, 82]
df = pd.DataFrame(data)
# Display row labels (index)
print("Row labels (index):")
print(df.index)
# Display column labels
print("\nColumn labels:")
print(df.columns)
# Display data types of each column
print("\nData types of each column:")
print(df.dtypes)
# Display dimensions of the DataFrame (rows, columns)
print("\nDimensions of the DataFrame:")
print(df.shape)
Output:
Q5:
Let's start by creating an example DataFrame that contains some duplicate rows:
Create a sample DataFrame with duplicate rows
data = {
'Student Name': ['Alice', 'Bob', 'Charlie', 'David', 'Alice', 'Bob'],
'Math': [85, 90, 78, 88, 85, 90],
'Science': [92, 85, 80, 87, 92, 85],
'English': [88, 91, 85, 82, 88, 91]
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
Removing Duplicate Rows
To remove duplicate rows based on all columns or specific columns, you can use the
drop_duplicates() method.
Remove duplicate rows based on all columns
df_no_duplicates_all = df.drop_duplicates()
print("\nDataFrame with duplicate rows removed (based on all columns):")
print(df_no_duplicates_all)
Removing Duplicate Rows Based on Specific Columns
# Remove duplicate rows based on specific columns, e.g., 'Student Name'
df_no_duplicates_specific = df.drop_duplicates(subset=['Student Name'])
print("\nDataFrame with duplicate rows removed (based on 'Student Name'):")
print(df_no_duplicates_specific)
Filtering Based on Conditions
# Filter rows where Math score is greater than 80
df_filtered_math = df[df['Math'] > 80]
print("\nDataFrame with rows where Math score > 80:")
print(df_filtered_math)
Filtering Rows Where Multiple Conditions Are Met
# Filter rows where Math score is greater than 80 and Science score is greater than 85
df_filtered_multiple = df[(df['Math'] > 80) & (df['Science'] > 85)]
print("\nDataFrame with rows where Math score > 80 and Science score > 85:")
print(df_filtered_multiple)
Removing Rows with Missing Values
# Assume some missing values for demonstration
df.loc[2, 'Math'] = None
# Remove rows with any missing values
df_no_missing = df.dropna()
print("\nDataFrame with rows containing missing values removed:")
print(df_no_missing)
Notes:
Summary of Filtering Methods
1. Removing Duplicate Rows:
o df.drop_duplicates(): Removes duplicate rows based on all columns.
o df.drop_duplicates(subset=['column_name']): Removes duplicates based on
specific columns.
2. Filtering Based on Conditions:
o df[df['column_name'] > value]: Filters rows based on a condition.
o df[(df['column1'] > value1) & (df['column2'] < value2)]: Filters rows based on
multiple conditions.
3. Removing Rows with Missing Values:
o df.dropna(): Removes rows with any missing values.
o df.dropna(subset=['column_name']): Removes rows with missing values in
specific columns.
Q6: Write python code to read ‘dept.csv’ file containing columns as (d_id, d_name and city )
and display all records separated by comma
Ans:
import csv
print('Details of csv file:')
with open('dept.csv') as csvfile:
data = csv.reader(csvfile, delimiter=' ')
for row in data:
print(', '.join(row))
Output is:
Details of csv file:
10,sales,kolkatta
20,marketing,delhi
Q7: Write a program to read the contents of “dept.csv” file using with open()
Ans:
import csv
with open('dept.csv','r') as csv_file:
reader=csv.reader(csv_file)
rows=[ ]
for rec in reader:
rows.append(rec)
print(rows)
Output is:
[['d_id d_name city '], ['10 sales kolkatta'], ['11 marketing delhi']]
Q8: Write a program to count the number of records present in “dept.csv” file
Ans:
import csv
f=open('dept.csv','r')
csv_reader=csv.reader(f)
columns=next(csv_reader)
c=0
for row in csv_reader:
c=c+1
print('No.of records are:',c)
Output is
No.of records are: 2
Use of next() = The function next() is used to directly point to this list of fields to read the
next line in the CSV file. .next() method returns the current row and advances the iterator to
the next row.
Q9: Write a program to search the record of a particular student from CSV file on the basis of
inputted name.
Ans:
import csv
f=open('student.csv','r')
csv_reader=csv.reader(f)
name=input('Enter name to be searched:')
for row in csv_reader:
if (row[1]==name):
print(row)
Output is:
Enter name to be searched:APARNA
['1', 'APARNA', 'XII']
Q10: Write a program to add (append) Employee records onto a csv file.
Ans:
import csv
with open('student.csv',mode='a') as csvfile:
mywriter=csv.writer(csvfile,delimiter=',')
ans='y'
while ans.lower()=='y':
rno=int(input('Enter roll no:'))
name=input('Enter student name:')
clas=input('Enter class:')
mywriter.writerow([rno,name,clas])
print('## Data Saved ##')
ans=input('Add More?:')
Output is:
Enter roll no:4
Enter student name:MANGESH
Enter class:XI
## Data Saved ##
Add More?:y
Enter roll no:5
Enter student name:PRIYANKA
Enter class:XII
## Data Saved ##
Add More?:n
Q11: Write a menu-driven program implementing user-defined functions to perform different
functions on a csv file “student” such as:
(a) Write a single record to csv
(b) Write all the records in one single go onto the csv.
(c) Display the contents of the csv file
Ans:
import csv
def crcsv1():
with open('student.csv','a') as csvfile:
fobj=csv.writer(csvfile)
while True:
rno=int(input('Enter Roll no:'))
name=input('Name:')
marks = float(input("Marks:"))
Line=[rno,name,marks]
fobj.writerow(Line)
ch=input('More (Y/N):')
if ch=='N':
break
def crcsv2():
with open('student.csv','w') as csvfile:
fobj=csv.writer(csvfile)
Lines=[]
while True:
rno=int(input('Enter Roll no:'))
name=input('Name:')
marks = float(input("Marks:"))
Lines.append([rno,name,marks])
fobj.writerow(Lines)
ch=input('More (Y/N):')
if ch=='N':
break
fobj.writerows(Lines)
def showall():
with open('student.csv') as csvfile:
csvobj=csv.reader(csvfile)
for line in csvobj:
print(','.join(line))
print('csv file handling:')
while True:
opt=input('1: Create 2: CreateCSVAll 3:Show CSV 4.Quit\n')
if opt=='1':
crcsv1()
elif opt=='2':
crcsv2()
elif opt=='3':
showall()
else:
break
Output is:
csv file handling:
1: Create 2: CreateCSVAll 3:Show CSV 4.Quit
Enter Roll no:1
Name:ARCHANA SAKAT
Marks:453
More (Y/N):Y
Enter Roll no:2
Name:SUNIL
Marks:433
More (Y/N):N
1: Create 2: CreateCSVAll 3:Show CSV 4.Quit
1,ARCHANA SAKAT,453.0
2,SUNIL,433.0
1: Create 2: CreateCSVAll 3:Show CSV 4.Quit
Enter Roll no:3
Name:AAKASH
Marks:455
More (Y/N):N
1: Create 2: CreateCSVAll 3:Show CSV 4.Quit
[3, 'AAKASH', 455.0]
1: Create 2: CreateCSVAll 3:Show CSV 4.Quit
>>>
Q12: Given the school result data, analyses the performance of the students on different
parameters, e.g subject wise or class wise.
coding
import pandas as pd
import matplotlib.pyplot as plt
# Load the CSV file into a DataFrame
df = pd.read_csv('school_results.csv')
print("DataFrame loaded:")
print(df)
# Subject-wise performance statistics
subject_stats = df[['Math', 'Science', 'English']].agg(['mean', 'median', 'std'])
print("\nSubject-wise performance statistics:")
print(subject_stats)
# Class-wise performance
class_performance = df.groupby('Class').agg({'Math': 'mean', 'Science': 'mean', 'English':
'mean'})
print("\nClass-wise performance:")
print(class_performance)
# Plot subject-wise performance
subject_stats.T.plot(kind='bar', figsize=(10, 6))
plt.title('Subject-wise Performance Statistics')
plt.xlabel('Subject')
plt.ylabel('Score')
plt.legend(title='Statistics')
plt.show()
# Plot class-wise performance
class_performance.plot(kind='bar', figsize=(10, 6))
plt.title('Class-wise Performance')
plt.xlabel('Class')
plt.ylabel('Average Score')
plt.legend(title='Subject')
plt.show()