
DS Practical

The document is a lab report submitted by Amandeep (Roll No. 105/20, Branch: CSE, 6th Sem) to Dr. Vinay Chopra (Associate Professor) at D.A.V. Institute of Engineering & Technology, Jalandhar. It contains 15 experiments performed using Python libraries and concepts related to data science and data analysis. The experiments include using pandas to load and analyze NBA player data, performing matrix operations in NumPy, combining DataFrames in pandas, adding/selecting columns in pandas DataFrames, and creating visualizations like box plots, histograms, pivot tables and heatmaps.


DAVIET/CSE/2003635

D.A.V. INSTITUTE OF ENGINEERING & TECHNOLOGY, JALANDHAR

Data Science Lab (BTCS 616-18)


Submitted To: Dr. Vinay Chopra (Associate Professor)
Submitted By: Amandeep, Roll No. 105/20, Branch: CSE, 6th Sem


INDEX
S.NO.  EXPERIMENTS
1   Using the pandas Python library
2   a) Write basic Python code to read a matrix from the user
    b) Create a matrix using the map function with NumPy
    c) Write a Python program to demonstrate the add, subtract, multiply and divide operations using NumPy
3   Combine two DataFrames in pandas
    a) Concatenating DataFrames
    b) Joining DataFrames
    c) Concatenating using append
4   Adding a new column to an existing DataFrame in pandas
    a) By declaring a new list as a column
    b) By using DataFrame.insert()
    c) Using the DataFrame.assign() method
    d) By using a dictionary
5   Create a new column in a pandas DataFrame based on the existing columns
    a) Using DataFrame.apply()
    b) Performing the required operation directly on the desired column, element-wise
    c) Using the .map() function
6   Create a new column in a pandas DataFrame based on a given condition
    a) Using list comprehension
    b) Using the DataFrame.apply() function
    c) Using the .map() function
    d) Using the numpy.where() function
7   Selecting rows in a pandas DataFrame based on a given condition
    a) Selecting all the rows of the given DataFrame in which 'Percentage' is greater than 80 using loc[]
    b) Selecting those rows whose column value is present in a list using the isin() method
    c) Selecting rows based on multiple column conditions using the '&' operator
8   Pandas DataFrame where()
9   Create Box Plot
10  Create Histogram
11  Create Pivot Table
12  Create Heatmap
13  Demonstrating mean() in a Python program
14  Demonstrating standard deviation in a Python program
15  Calculating skewness and kurtosis in Python

Task 1: Using the pandas Python Library

import requests
download_url = (
    "https://raw.githubusercontent.com/fivethirtyeight/data/master/nba-elo/"
    "nbaallelo.csv"
)
target_csv_path = "nba_all_elo.csv"
response = requests.get(download_url)
response.raise_for_status()  # Check that the request was successful
with open(target_csv_path, "wb") as f:
    f.write(response.content)
print("Download ready.")

import pandas as pd
nba = pd.read_csv("nba_all_elo.csv")
type(nba)

len(nba)

nba.shape

nba.describe()


import numpy as np

nba.describe(include=object)

nba["team_id"].value_counts()


Task 2: a) Write basic Python code to read a matrix from the user

INPUT:
R = int(input("Enter the number of rows:"))
C = int(input("Enter the number of columns:"))
matrix = []
print("Enter the entries rowwise:")
for i in range(R):  # Loop over the rows
    a = []
    for j in range(C):  # Loop over the columns of the current row
        a.append(int(input()))
    matrix.append(a)
# Print the matrix, one row per line
for i in range(R):
    for j in range(C):
        print(matrix[i][j], end=" ")
    print()

OUTPUT:

b) Create a matrix using the map function with NumPy


INPUT:
import numpy as np
R = int(input("Enter the number of rows:"))
C = int(input("Enter the number of columns:"))
print("Enter the entries in a single line (separated by space): ")
entries = list(map(int, input().split()))
matrix = np.array(entries).reshape(R, C)
print(matrix)

OUTPUT:
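
The reshape step in the listing above only succeeds when the number of entries equals R x C. A minimal sketch of that check follows; `build_matrix` is a hypothetical helper introduced here for illustration, and it takes the entries as a string instead of reading them with `input()` so that it can be tried without a prompt.

```python
import numpy as np


def build_matrix(rows, cols, line):
    """Parse space-separated integers and reshape them to rows x cols.

    build_matrix is a hypothetical helper, not part of the lab listing.
    """
    entries = list(map(int, line.split()))
    # reshape() would fail on a size mismatch, so report it clearly first
    if len(entries) != rows * cols:
        raise ValueError(
            f"expected {rows * cols} entries, got {len(entries)}"
        )
    return np.array(entries).reshape(rows, cols)


matrix = build_matrix(2, 3, "1 2 3 4 5 6")
print(matrix)
```

Checking the length before calling `reshape` is a design choice: it lets the error message state how many entries were expected for the requested shape.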

c) Write a Python program to demonstrate the add, subtract, multiply and divide
operations using NumPy

INPUT:
import numpy as np
print("Add:")
print(np.add(1.0, 4.0))
print("Subtract:")
print(np.subtract(1.0, 4.0))
print("Multiply:")
print(np.multiply(1.0, 4.0))
print("Divide:")
print(np.divide(1.0, 4.0))

OUTPUT:


Task 3: Combine two DataFrames in pandas

a) Concatenating dataframe

INPUT:
import pandas as pd
df1 = pd.DataFrame({'id': ['A01', 'A02', 'A03', 'A04'],
'Name': ['ABC', 'PQR', 'DEF', 'GHI']})
df2 = pd.DataFrame({'id': ['B05', 'B06', 'B07', 'B08'],
'Name': ['XYZ', 'TUV', 'MNO', 'JKL']})
frames = [df1, df2]
result = pd.concat(frames)
display(result)

OUTPUT:

b) Joining dataframe

INPUT:
import pandas as pd
df1 = pd.DataFrame({'id': ['A01', 'A02', 'A03', 'A04'],
                    'Name': ['ABC', 'PQR', 'DEF', 'GHI']})
df3 = pd.DataFrame({'City': ['MUMBAI', 'PUNE', 'MUMBAI', 'DELHI'],
                    'Age': ['12', '13', '14', '12']})
result = pd.concat([df1, df3], axis=1, join='inner')
display(result)

OUTPUT:
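
The listing above joins the two frames side by side on their row index. When the frames share a key column instead, a key-based join is a common alternative. The sketch below assumes a hypothetical second table, `df_city`, that carries the same 'id' column; its contents are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['A01', 'A02', 'A03', 'A04'],
                    'Name': ['ABC', 'PQR', 'DEF', 'GHI']})
# Hypothetical table sharing the 'id' key (illustrative data only)
df_city = pd.DataFrame({'id': ['A01', 'A03', 'A05'],
                        'City': ['MUMBAI', 'PUNE', 'DELHI']})

# Inner join: keep only the ids that appear in both frames
joined = pd.merge(df1, df_city, on='id', how='inner')
print(joined)
```

With `how='inner'`, ids that appear in only one of the frames (A02, A04 and A05 here) are dropped from the result.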

c) Concatenating using append

INPUT:
import pandas as pd
df1 = pd.DataFrame({'id': ['A01', 'A02', 'A03', 'A04'],
'Name': ['ABC', 'PQR', 'DEF', 'GHI']})
df2 = pd.DataFrame({'id': ['B05', 'B06', 'B07', 'B08'],
'Name': ['XYZ', 'TUV', 'MNO', 'JKL']})
# DataFrame.append() was removed in pandas 2.0; pd.concat() gives the same result
result = pd.concat([df1, df2])
display(result)

OUTPUT:


Task 4: Adding a new column to an existing DataFrame in pandas

a) By declaring a new list as a column.

INPUT:
import pandas as pd
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Height': [5.1, 6.2, 5.1, 5.2],
'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
df = pd.DataFrame(data)
address = ['Delhi', 'Bangalore', 'Chennai', 'Patna']
df['Address'] = address
print(df)

OUTPUT:


b) By using dataframe insert()

INPUT:
import pandas as pd
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Height': [5.1, 6.2, 5.1, 5.2],
'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
df = pd.DataFrame(data)
df.insert(2, "Age", [21, 23, 24, 21], True)
print(df)
OUTPUT:

c) Using dataframe assign() method

INPUT:
import pandas as pd
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Height': [5.1, 6.2, 5.1, 5.2],
'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
df = pd.DataFrame(data)
df2 = df.assign(address=['Delhi', 'Bangalore', 'Chennai', 'Patna'])
print(df2)

OUTPUT:


d) By using a dictionary

INPUT:
import pandas as pd
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Height': [5.1, 6.2, 5.1, 5.2],
'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
# Dictionary keyed by name, so that each row receives its matching address
address = {'Jai': 'Delhi', 'Princi': 'Bangalore',
           'Gaurav': 'Patna', 'Anuj': 'Chennai'}
df = pd.DataFrame(data)
df['Address'] = df['Name'].map(address)
print(df)

OUTPUT:


Task 5: Create a new column in a pandas DataFrame based on the existing columns
a) Using DataFrame.apply()

INPUT:
import pandas as pd
df = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
                   'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'],
                   'Cost': [10000, 5000, 15000, 2000]})
print(df)
# Apply a 10% discount to the cost in each row
df['Discounted_Price'] = df.apply(lambda row: row.Cost - (row.Cost * 0.1), axis=1)
print(df)

OUTPUT:


b) We can achieve the same result by directly performing the required operation on
the desired column element-wise

INPUT:
import pandas as pd
df = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
                   'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'],
                   'Cost': [10000, 5000, 15000, 2000]})
# Element-wise arithmetic on the whole column
df['Discounted_Price'] = df['Cost'] - (0.1 * df['Cost'])
print(df)
# The row-wise apply() version produces the same values
df['Discounted_Price'] = df.apply(lambda row: row.Cost -
                                  (row.Cost * 0.1), axis=1)
print(df)

OUTPUT:


c) Using dataframe .map() function

INPUT:
import pandas as pd
data = {
    "name": ["John", "Ted", "Dev", "Brad", "Rex", "Smith", "Samuel", "David"],
    "salary": [10000, 20000, 50000, 45500, 19800, 95000, 5000, 50000]
}
df = pd.DataFrame(data)
display(df.head())

def salary_stats(value):
    # Bucket a salary into a descriptive label
    if value < 10000:
        return "very low"
    elif 10000 <= value < 25000:
        return "low"
    elif 25000 <= value < 40000:
        return "average"
    elif 40000 <= value < 50000:
        return "better"
    else:  # value >= 50000
        return "very good"

df['salary_stats'] = df['salary'].map(salary_stats)
display(df.head())

OUTPUT:


Task 6: Create a new column in Pandas dataframe based on a given condition


a) Using list comprehension


INPUT:
import pandas as pd
df = pd.DataFrame({'Date' : ['11/8/2011', '11/9/2011', '11/10/2011',
'11/11/2011', '11/12/2011'],
'Event' : ['Music', 'Poetry', 'Music', 'Comedy', 'Poetry']})
print(df)
df['Price'] = [1500 if x =='Music' else 800 for x in df['Event']]
print(df)

OUTPUT:

b) Using dataframe.apply() function

INPUT:
def set_value(row_number, assigned_value):
    # Look up the price for the given event name in the dictionary
    return assigned_value[row_number]
event_dictionary ={'Music' : 1500, 'Poetry' : 800, 'Comedy' : 1200}
df['Price'] = df['Event'].apply(set_value, args =(event_dictionary, ))
print(df)

OUTPUT:


c) Using dataframe .map() function

INPUT:
event_dictionary ={'Music' : 1500, 'Poetry' : 800, 'Comedy' : 1200}
df['Price'] = df['Event'].map(event_dictionary)
print(df)

OUTPUT:

d) Using numpy.where() function

INPUT:
import numpy as np
df['Price'] = np.where(df['Event'] == 'Music', 1500, 800)
print(df)
OUTPUT:
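
`numpy.where()` chooses between only two values, so the listing above prices 'Comedy' at 800, whereas the dictionary in part c) prices it at 1200. One way to express several conditions is `numpy.select()`. The sketch below rebuilds the 'Event' column from part a) and uses a default of 0 for unmatched events, which is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Event': ['Music', 'Poetry', 'Music', 'Comedy', 'Poetry']})

# One condition per event, paired by position with the price to assign
conditions = [df['Event'] == 'Music',
              df['Event'] == 'Poetry',
              df['Event'] == 'Comedy']
prices = [1500, 800, 1200]

# default=0 is an illustrative choice for events that match no condition
df['Price'] = np.select(conditions, prices, default=0)
print(df)
```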


Task 7: Selecting rows in a pandas DataFrame based on a given condition

a) Selecting all the rows of the given DataFrame in which 'Percentage' is greater
than 80 using loc[]

INPUT:
import pandas as pd
record = {
'Name': ['Ankit', 'Amit', 'Aishwarya', 'Priyanka', 'Priya', 'Shaurya' ],
'Age': [21, 19, 20, 18, 17, 21],
'Stream': ['Math', 'Commerce', 'Science', 'Math', 'Math', 'Science'],
'Percentage': [88, 92, 95, 70, 65, 78] }
dataframe = pd.DataFrame(record, columns = ['Name', 'Age', 'Stream',
'Percentage'])
print("Given Dataframe :\n", dataframe)
rslt_df = dataframe.loc[dataframe['Percentage'] > 80]
print('\nResult dataframe :\n', rslt_df)

OUTPUT:

b) Selecting those rows whose column value is present in the list using isin()
method of the dataframe

INPUT:
import pandas as pd


record = {
'Name': ['Ankit', 'Amit', 'Aishwarya', 'Priyanka', 'Priya', 'Shaurya' ],
'Age': [21, 19, 20, 18, 17, 21],
'Stream': ['Math', 'Commerce', 'Science', 'Math', 'Math', 'Science'],
'Percentage': [88, 92, 95, 70, 65, 78]}
dataframe = pd.DataFrame(record, columns = ['Name', 'Age', 'Stream',
'Percentage'])
print("Given Dataframe :\n", dataframe)
options = ['Math', 'Commerce']
rslt_df = dataframe[dataframe['Stream'].isin(options)]
print('\nResult dataframe :\n', rslt_df)

OUTPUT:

c) Selecting rows based on multiple column conditions using ’&’ operator

INPUT:
import pandas as pd
record = {
'Name': ['Ankit', 'Amit', 'Aishwarya', 'Priyanka', 'Priya', 'Shaurya' ],
'Age': [21, 19, 20, 18, 17, 21],
'Stream': ['Math', 'Commerce', 'Science', 'Math', 'Math', 'Science'],
'Percentage': [88, 92, 95, 70, 65, 78]}

dataframe = pd.DataFrame(record, columns=['Name', 'Age', 'Stream',
                                          'Percentage'])
print("Given Dataframe :\n", dataframe)
options = ['Math', 'Science']
rslt_df = dataframe[(dataframe['Age'] == 21) &
dataframe['Stream'].isin(options)]
print('\nResult dataframe :\n', rslt_df)

OUTPUT:


Task 8: Pandas dataframe where()

INPUT:
import pandas as pd
data = {
"age": [50, 40, 30, 40, 20, 10, 30],
"qualified": [True, False, False, False, False, True, True]
}
df = pd.DataFrame(data)
print(df)
newdf = df.where(df["age"] > 30)
print(newdf)

OUTPUT:
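
`DataFrame.where()` keeps the shape of the frame: rows that fail the condition are not removed but filled with NaN. The sketch below, using the same data, contrasts this with dropping those rows and with plain boolean indexing; the variable names `masked`, `kept` and `filtered` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [50, 40, 30, 40, 20, 10, 30],
    "qualified": [True, False, False, False, False, True, True],
})

# Rows where the condition is False are kept but filled with NaN
masked = df.where(df["age"] > 30)
print(masked)

# Dropping the NaN rows leaves only the rows that satisfied the condition
kept = masked.dropna()
print(kept)

# Boolean indexing selects the same rows without the intermediate NaN values
filtered = df[df["age"] > 30]
print(filtered)
```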


Task 9: Create Box Plot

INPUT:
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(10)
data_1 = np.random.normal(100, 10, 200)
data_2 = np.random.normal(90, 20, 200)
data_3 = np.random.normal(80, 30, 200)
data_4 = np.random.normal(70, 40, 200)
data = [data_1, data_2, data_3, data_4]
fig = plt.figure(figsize =(10, 7))
ax = fig.add_axes([0, 0, 1, 1])
bp = ax.boxplot(data)
plt.show()

OUTPUT:


Task 10: Create Histogram

INPUT:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import colors
from matplotlib.ticker import PercentFormatter
np.random.seed(23685752)
N_points = 10000
n_bins = 20
x = np.random.randn(N_points)
y = .8 ** x + np.random.randn(10000) + 25
fig, axs = plt.subplots(1, 1,
figsize =(6, 5),
tight_layout = True)
axs.hist(x, bins = n_bins)
plt.show()


OUTPUT:

Task 11: Create Pivot Table

INPUT:
import pandas as pd
df = pd.DataFrame({'Product' : ['Carrots', 'Broccoli', 'Banana', 'Banana',
'Beans', 'Orange', 'Broccoli', 'Banana'],
'Category' : ['Vegetable', 'Vegetable', 'Fruit', 'Fruit',
'Vegetable', 'Fruit', 'Vegetable', 'Fruit'],
'Quantity' : [8, 5, 3, 4, 5, 9, 11, 8],
'Amount' : [270, 239, 617, 384, 626, 610, 62, 90]})
pivot = df.pivot_table(index =['Product'],
values =['Amount'],
aggfunc ='sum')
print(pivot)

OUTPUT:
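
A pivot table with a single index column and a 'sum' aggregation is closely related to a group-by. As a rough sketch using the 'Product' and 'Amount' columns from the listing above, the totals per product can also be obtained with `groupby()`; the name `grouped` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Product': ['Carrots', 'Broccoli', 'Banana', 'Banana',
                               'Beans', 'Orange', 'Broccoli', 'Banana'],
                   'Amount': [270, 239, 617, 384, 626, 610, 62, 90]})

# Pivot table as in the listing above
pivot = df.pivot_table(index=['Product'], values=['Amount'], aggfunc='sum')

# Group-by formulation of the same per-product totals
grouped = df.groupby('Product')[['Amount']].sum()
print(grouped)
```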


Task 12: Create Heatmap

INPUT:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
data_set = np.random.rand( 10 , 10 )
ax = sns.heatmap( data_set , linewidth = 0.5 , cmap = 'coolwarm' )
plt.title( "2-D Heat Map" )
plt.show()

OUTPUT:


Task 13: Demonstrating mean() in a Python program

INPUT:
import statistics
data1 = [1, 3, 4, 5, 7, 9, 2]
x = statistics.mean(data1)
print("Mean is :", x)

OUTPUT:
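
The arithmetic mean is the sum of the values divided by their count. A short sketch that computes it by hand for the same data and compares it with `statistics.mean()`; the name `manual_mean` is introduced here for illustration.

```python
import statistics

data1 = [1, 3, 4, 5, 7, 9, 2]

# Sum of the values (31) divided by the number of values (7)
manual_mean = sum(data1) / len(data1)
print("Manual mean:", manual_mean)
print("statistics.mean:", statistics.mean(data1))
```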


Task 14: Demonstrating standard deviation in a Python program

INPUT:
import statistics
sample = [1, 2, 3, 4, 5]
print("Standard Deviation of sample is % s " % (statistics.stdev(sample)))

OUTPUT:
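
`statistics.stdev()` computes the sample standard deviation, which divides the sum of squared deviations by n - 1, while `statistics.pstdev()` computes the population version, which divides by n. The sketch below works both out by hand for the same sample; the variable names are chosen here for illustration.

```python
import math
import statistics

sample = [1, 2, 3, 4, 5]

mean = sum(sample) / len(sample)
# Sum of squared deviations from the mean
squared_deviations = sum((x - mean) ** 2 for x in sample)

# Sample standard deviation: divide by n - 1
sample_sd = math.sqrt(squared_deviations / (len(sample) - 1))
# Population standard deviation: divide by n
population_sd = math.sqrt(squared_deviations / len(sample))

print("Sample SD:", sample_sd, "vs", statistics.stdev(sample))
print("Population SD:", population_sd, "vs", statistics.pstdev(sample))
```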


Task 15: Calculating skewness and kurtosis in Python


INPUT:

from scipy.stats import kurtosis, skew
# Data for the kurtosis calculation
kurtosis_data = [10, 25, 14, 26, 35, 45, 67, 90,
                 40, 50, 60, 10, 16, 18, 20]
# Data for the skewness calculation
skew_data = [88, 85, 82, 97, 67, 77, 74, 86,
             81, 95, 77, 88, 85, 76, 81]
print("SKEWNESS")
print(skew(skew_data, axis=0, bias=True))
print("KURTOSIS")
print(kurtosis(kurtosis_data, axis=0, bias=True))

OUTPUT:
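
Skewness and kurtosis can be written in terms of central moments: with m_k the mean of (x - mean)^k, the biased skewness is m_3 / m_2^1.5 and the excess kurtosis is m_4 / m_2^2 - 3. The sketch below implements these formulas with NumPy only. `moment_skew_kurtosis` is a hypothetical helper, and the assumption is that it corresponds to the `bias=True` estimators with SciPy's default Fisher (excess) definition of kurtosis.

```python
import numpy as np


def moment_skew_kurtosis(values):
    """Return (skewness, excess kurtosis) from central moments.

    Hypothetical helper for illustration; uses the biased (divide by n) moments.
    """
    x = np.asarray(values, dtype=float)
    deviations = x - x.mean()
    m2 = np.mean(deviations ** 2)  # second central moment (variance)
    m3 = np.mean(deviations ** 3)  # third central moment
    m4 = np.mean(deviations ** 4)  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


# A symmetric sample has zero skewness
skewness, excess_kurtosis = moment_skew_kurtosis([1, 2, 3, 4, 5])
print("Skewness:", skewness)
print("Excess kurtosis:", excess_kurtosis)
```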
