DAVIET/CSE/2003635
D.A.V. INSTITUTE OF ENGINEERING & TECHNOLOGY
JALANDHAR
Data Science LAB (BTCS 616-18)
Submitted To: Dr. Vinay Chopra (Associate Professor)
Submitted By: Amandeep (Roll No. 105/20), Branch: CSE, 6th Sem
DS_ LAB Page 1
INDEX
S.No.  Experiment
1   Using the pandas Python Library
2   a) Write basic Python code for matrix input from the user
    b) Create a matrix using the map() function and NumPy
    c) Write a Python program to demonstrate the add, subtract, multiply and divide operations using NumPy
3   Combine two DataFrames in Python-Pandas
    a) Concatenating DataFrames
    b) Joining DataFrames
    c) Concatenating using append
4   Adding a new column to an existing DataFrame in Pandas
    a) By declaring a new list as a column
    b) By using dataframe insert()
    c) Using dataframe assign() method
    d) By using a dictionary
5   Create a new column in Pandas dataframe based on the existing columns
    a) Using DataFrame.apply()
    b) We can achieve the same result by directly performing the required operation on the desired column element-wise
    c) Using dataframe .map() function
6   Create a new column in Pandas dataframe based on a given condition
    a) Using list comprehension
    b) Using dataframe.apply() function
    c) Using dataframe .map() function
    d) Using numpy.where() function
7   Selecting rows in a Pandas DataFrame based on a given condition
    a) Selecting all the rows from the given DataFrame in which 'Percentage' is greater than 80% using loc[]
    b) Selecting those rows whose column value is present in the list using isin() method of the dataframe
    c) Selecting rows based on multiple column conditions using the '&' operator
8   Pandas dataframe where()
9   Create Box Plot
10  Create Histogram
11  Create Pivot Table
12  Create Heatmap
13  Demonstrating mean() in a Python program
14  Demonstrating standard deviation in a Python program
15  Calculating skewness and kurtosis in Python
Task 1: Using the pandas Python Library
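INPUT:
import requests
import pandas as pd

# Download the FiveThirtyEight NBA Elo dataset once and save it locally
download_url = (
    "https://raw.githubusercontent.com/fivethirtyeight/data/master/nba-elo/nbaallelo.csv"
)
target_csv_path = "nba_all_elo.csv"
response = requests.get(download_url)
response.raise_for_status()  # Check that the request was successful
with open(target_csv_path, "wb") as f:
    f.write(response.content)
print("Download ready.")

# Load the CSV file into a DataFrame and inspect it
nba = pd.read_csv(target_csv_path)
print(type(nba))                       # pandas.core.frame.DataFrame
print(len(nba))                        # number of rows
print(nba.shape)                       # (rows, columns)
print(nba.describe())                  # statistics for the numeric columns
print(nba.describe(include=object))    # count/unique/top/freq for the text columns
print(nba["team_id"].value_counts())   # number of rows (games) per team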
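The inspection calls above can be tried without downloading anything. The sketch below builds a tiny stand-in DataFrame by hand: the `team_id` column name matches the one used above, while the `pts` column and all values are made-up placeholders, not real NBA data.

```python
import pandas as pd

# Small stand-in for the NBA data; every value here is made up for illustration
nba_sample = pd.DataFrame({
    "team_id": ["BOS", "LAL", "BOS", "NYK", "BOS", "LAL"],
    "pts": [100, 98, 110, 87, 95, 102],
})

print(type(nba_sample))   # a pandas DataFrame
print(len(nba_sample))    # number of rows
print(nba_sample.shape)   # (rows, columns)
print(nba_sample.describe())                # count, mean, std, min, quartiles, max of "pts"
print(nba_sample.describe(include=object))  # count, unique, top, freq of "team_id"

counts = nba_sample["team_id"].value_counts()  # ordered from most to least frequent
print(counts)
```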
Task 2: a) Write basic Python code for matrix input from the user
INPUT:
R = int(input("Enter the number of rows:"))
C = int(input("Enter the number of columns:"))
matrix = []
print("Enter the entries rowwise:")
for i in range(R):        # A for loop for row entries
    a = []
    for j in range(C):    # A for loop for column entries
        a.append(int(input()))
    matrix.append(a)

for i in range(R):
    for j in range(C):
        print(matrix[i][j], end=" ")
    print()
OUTPUT:
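Because `input()` makes the program above hard to test, here is a sketch of the same row-wise construction written as a nested list comprehension over pre-supplied text. `read_matrix` is a hypothetical helper introduced for illustration, and it assumes one whitespace-separated row per line rather than one number per prompt.

```python
def read_matrix(lines, rows, cols):
    """Build a rows x cols list of lists from text lines (hypothetical helper).

    Each line holds one row of whitespace-separated integers.
    """
    if len(lines) < rows:
        raise ValueError(f"expected {rows} rows, got {len(lines)}")
    matrix = [[int(tok) for tok in line.split()] for line in lines[:rows]]
    # Reject rows with the wrong number of entries instead of accepting them silently
    for r, row in enumerate(matrix):
        if len(row) != cols:
            raise ValueError(f"row {r} has {len(row)} entries, expected {cols}")
    return matrix


m = read_matrix(["1 2 3", "4 5 6"], rows=2, cols=3)
for row in m:
    print(*row)  # same layout as the nested print loop above
```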
b) Create a matrix using the map() function and NumPy
INPUT:
import numpy as np
R = int(input("Enter the number of rows:"))
C = int(input("Enter the number of columns:"))
print("Enter the entries in a single line (separated by space): ")
entries = list(map(int, input().split()))
matrix = np.array(entries).reshape(R, C)
print(matrix)
OUTPUT:
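`reshape(R, C)` fills the array row by row and raises `ValueError` when the number of entries is not exactly R × C. The sketch below checks the count first to give a clearer message; `to_matrix` is a hypothetical helper, and the input line is hard-coded in place of `input()`.

```python
import numpy as np


def to_matrix(line, rows, cols):
    """Parse a line of space-separated integers into a rows x cols NumPy array."""
    entries = list(map(int, line.split()))
    if len(entries) != rows * cols:
        raise ValueError(f"expected {rows * cols} entries, got {len(entries)}")
    return np.array(entries).reshape(rows, cols)  # filled row by row


matrix = to_matrix("1 2 3 4 5 6", 2, 3)
print(matrix)
# [[1 2 3]
#  [4 5 6]]
```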
c) Write a Python program to demonstrate the add, subtract, multiply and divide operations using NumPy
INPUT:
import numpy as np
print("Add:")
print(np.add(1.0, 4.0))
print("Subtract:")
print(np.subtract(1.0, 4.0))
print("Multiply:")
print(np.multiply(1.0, 4.0))
print("Divide:")
print(np.divide(1.0, 4.0))
OUTPUT:
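The same functions work element-wise on whole arrays, and the operators `+`, `-`, `*` and `/` are shorthand for them. With broadcasting, a scalar or a 1-D row is stretched to match a 2-D array. The arrays in this sketch are arbitrary examples.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([10.0, 20.0])  # 1-D row, broadcast across each row of a

added = np.add(a, b)            # same as a + b
subtracted = np.subtract(a, b)  # same as a - b
multiplied = np.multiply(a, 2)  # scalar broadcast, same as a * 2
divided = np.divide(a, b)       # same as a / b

print(added)
print(subtracted)
print(multiplied)
print(divided)
```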
Task 3: Combine two DataFrames in Python-Pandas
a) Concatenating DataFrames
INPUT:
import pandas as pd
df1 = pd.DataFrame({'id': ['A01', 'A02', 'A03', 'A04'],
'Name': ['ABC', 'PQR', 'DEF', 'GHI']})
df2 = pd.DataFrame({'id': ['B05', 'B06', 'B07', 'B08'],
'Name': ['XYZ', 'TUV', 'MNO', 'JKL']})
frames = [df1, df2]
result = pd.concat(frames)
print(result)
OUTPUT:
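By default `pd.concat` keeps each frame's original index, so the rows above carry the labels 0 to 3 twice. A sketch of two common options: `ignore_index=True` renumbers the rows, and `keys=` adds an outer index level recording which frame each row came from (the labels `first` and `second` are arbitrary).

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['A01', 'A02', 'A03', 'A04'],
                    'Name': ['ABC', 'PQR', 'DEF', 'GHI']})
df2 = pd.DataFrame({'id': ['B05', 'B06', 'B07', 'B08'],
                    'Name': ['XYZ', 'TUV', 'MNO', 'JKL']})

# Fresh 0..7 index instead of 0..3 repeated twice
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered)

# Outer index level records the source frame of each row
labelled = pd.concat([df1, df2], keys=['first', 'second'])
print(labelled.loc['second'])  # only the rows that came from df2
```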
b) Joining DataFrames
INPUT:
import pandas as pd
df1 = pd.DataFrame({'id': ['A01', 'A02', 'A03', 'A04'],
'Name': ['ABC', 'PQR', 'DEF', 'GHI']})
df3 = pd.DataFrame({'City': ['MUMBAI', 'PUNE', 'MUMBAI', 'DELHI'],
'Age': ['12', '13', '14', '12']})
result = pd.concat([df1, df3], axis=1, join='inner')
print(result)
OUTPUT:
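`pd.concat(..., axis=1, join='inner')` aligns the frames on their row index. `DataFrame.join` performs the same index alignment (its default is `how='left'`, so `how='inner'` is passed to match), while `pd.merge` matches rows on the values of a key column instead. The `df_ages` frame is hypothetical data made up for this sketch.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['A01', 'A02', 'A03', 'A04'],
                    'Name': ['ABC', 'PQR', 'DEF', 'GHI']})
df3 = pd.DataFrame({'City': ['MUMBAI', 'PUNE', 'MUMBAI', 'DELHI'],
                    'Age': ['12', '13', '14', '12']})

# Index-based join: same result as pd.concat([df1, df3], axis=1, join='inner') here
joined = df1.join(df3, how='inner')
print(joined)

# Key-based join: rows are matched on 'id' values, not on row position
df_ages = pd.DataFrame({'id': ['A02', 'A04', 'A09'], 'Score': [7, 9, 5]})  # hypothetical data
merged = pd.merge(df1, df_ages, on='id', how='inner')
print(merged)  # only A02 and A04 appear in both frames
```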
c) Concatenating using append
INPUT:
import pandas as pd
df1 = pd.DataFrame({'id': ['A01', 'A02', 'A03', 'A04'],
'Name': ['ABC', 'PQR', 'DEF', 'GHI']})
df2 = pd.DataFrame({'id': ['B05', 'B06', 'B07', 'B08'],
'Name': ['XYZ', 'TUV', 'MNO', 'JKL']})
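# DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0;
# pd.concat() is its replacement and gives the same result here
result = pd.concat([df1, df2])
print(result)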
OUTPUT:
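`DataFrame.append()` no longer exists in pandas 2.x, so a single row is also added with `pd.concat`, by wrapping the row in a one-row DataFrame. A minimal sketch; the new row's values (`C09`, `QRS`) are placeholders.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['A01', 'A02', 'A03', 'A04'],
                    'Name': ['ABC', 'PQR', 'DEF', 'GHI']})

new_row = pd.DataFrame([{'id': 'C09', 'Name': 'QRS'}])  # one-row frame, placeholder values
result = pd.concat([df1, new_row], ignore_index=True)   # new row gets the next label, 4
print(result)
```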
Task 4: Adding a new column to an existing DataFrame in Pandas
a) By declaring a new list as a column.
INPUT:
import pandas as pd
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Height': [5.1, 6.2, 5.1, 5.2],
'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
df = pd.DataFrame(data)
address = ['Delhi', 'Bangalore', 'Chennai', 'Patna']
df['Address'] = address
print(df)
OUTPUT:
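A list assigned as a column must contain exactly one value per row, otherwise pandas raises `ValueError`; a scalar, by contrast, is repeated on every row. The `Country` column below is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Height': [5.1, 6.2, 5.1, 5.2]})

df['Country'] = 'India'  # scalar: repeated on all four rows

try:
    df['Address'] = ['Delhi', 'Bangalore']  # only 2 values for 4 rows
except ValueError as err:
    print("Rejected:", err)

print(df)
```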
b) By using dataframe insert()
INPUT:
import pandas as pd
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Height': [5.1, 6.2, 5.1, 5.2],
'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
df = pd.DataFrame(data)
df.insert(2, "Age", [21, 23, 24, 21], True)
print(df)
OUTPUT:
c) Using dataframe assign() method
INPUT:
import pandas as pd
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Height': [5.1, 6.2, 5.1, 5.2],
'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
df = pd.DataFrame(data)
df2 = df.assign(address=['Delhi', 'Bangalore', 'Chennai', 'Patna'])
print(df2)
OUTPUT:
d) By using a dictionary
INPUT:
import pandas as pd
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Height': [5.1, 6.2, 5.1, 5.2],
'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
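# Dictionary keyed by name; map() looks up each row's Name to get its address
# (assigning the dict itself to a column would not match it to the Name column)
address = {'Jai': 'Delhi', 'Princi': 'Bangalore',
           'Gaurav': 'Patna', 'Anuj': 'Chennai'}
df = pd.DataFrame(data)
df['Address'] = df['Name'].map(address)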
print(df)
OUTPUT:
Task 5: Create a new column in Pandas dataframe based on the existing columns
a) Using DataFrame.apply()
INPUT:
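import pandas as pd
df = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
                   'Event': ['Music', 'Poetry', 'Theatre', 'Comedy'],
                   'Cost': [10000, 5000, 15000, 2000]})
print(df)
# apply() with axis=1 passes each row (as a Series) to the lambda and
# collects the returned values into the new column
df['Discounted_Price'] = df.apply(lambda row: row.Cost - (row.Cost * 0.1), axis=1)
print(df)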
OUTPUT:
b) We can achieve the same result by directly performing the required operation on
the desired column element-wise
INPUT:
import pandas as pd
df = pd.DataFrame({'Date':['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
'Event':['Music', 'Poetry', 'Theatre', 'Comedy'],
'Cost':[10000, 5000, 15000, 2000]})
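# Vectorised: the arithmetic is applied to the whole 'Cost' column at once
df['Discounted_Price'] = df['Cost'] - (0.1 * df['Cost'])
print(df)
# Row-by-row version with apply(), for comparison: same values, usually slower on large frames
df['Discounted_Price'] = df.apply(lambda row: row.Cost - (row.Cost * 0.1), axis=1)
print(df)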
OUTPUT:
c) Using dataframe .map() function
INPUT:
import pandas as pd

data = {
    "name": ["John", "Ted", "Dev", "Brad", "Rex", "Smith", "Samuel", "David"],
    "salary": [10000, 20000, 50000, 45500, 19800, 95000, 5000, 50000]
}
df = pd.DataFrame(data)
print(df.head())

def salary_stats(value):
    """Classify a salary into a named band."""
    if value < 10000:
        return "very low"
    elif 10000 <= value < 25000:
        return "low"
    elif 25000 <= value < 40000:
        return "average"
    elif 40000 <= value < 50000:
        return "better"
    elif value >= 50000:
        return "very good"

# Series.map() calls salary_stats once for each value in the column
df['salary_stats'] = df['salary'].map(salary_stats)
print(df.head())
OUTPUT:
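For simple range-based bands like `salary_stats`, `pd.cut` is a swapped-in alternative to the hand-written function, not the method shown above. With `right=False` each bin includes its left edge and excludes its right edge, matching the `<` and `>=` comparisons in `salary_stats`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Ted", "Dev", "Brad", "Rex", "Smith", "Samuel", "David"],
    "salary": [10000, 20000, 50000, 45500, 19800, 95000, 5000, 50000],
})

# Bins: [-inf, 10000), [10000, 25000), [25000, 40000), [40000, 50000), [50000, inf)
edges = [-np.inf, 10000, 25000, 40000, 50000, np.inf]
labels = ["very low", "low", "average", "better", "very good"]
df["salary_stats"] = pd.cut(df["salary"], bins=edges, labels=labels, right=False)
print(df)
```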
Task 6: Create a new column in Pandas dataframe based on a given condition
a) Using list comprehension
INPUT:
import pandas as pd
df = pd.DataFrame({'Date' : ['11/8/2011', '11/9/2011', '11/10/2011',
'11/11/2011', '11/12/2011'],
'Event' : ['Music', 'Poetry', 'Music', 'Comedy', 'Poetry']})
print(df)
df['Price'] = [1500 if x =='Music' else 800 for x in df['Event']]
print(df)
OUTPUT:
b) Using dataframe.apply() function
INPUT:
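# Continues with the df created in part a)
def set_value(event, assigned_value):
    # Look up the price for this event name in the dictionary
    return assigned_value[event]

event_dictionary = {'Music': 1500, 'Poetry': 800, 'Comedy': 1200}
# Series.apply() calls set_value(event, event_dictionary) for every value in the column
df['Price'] = df['Event'].apply(set_value, args=(event_dictionary,))
print(df)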
OUTPUT:
c) Using dataframe .map() function
INPUT:
event_dictionary ={'Music' : 1500, 'Poetry' : 800, 'Comedy' : 1200}
df['Price'] = df['Event'].map(event_dictionary)
print(df)
OUTPUT:
d) Using numpy.where() function
INPUT:
import numpy as np
# 1500 where the event is 'Music', 800 everywhere else
df['Price'] = np.where(df['Event'] == 'Music', 1500, 800)
print(df)
OUTPUT:
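`np.where` chooses between only two values. For three or more cases, `np.select` takes a list of boolean conditions and a matching list of choices, using `default` wherever no condition holds. The prices follow `event_dictionary` above; the extra `Dance` row and the default price of 0 are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Event': ['Music', 'Poetry', 'Music', 'Comedy', 'Poetry', 'Dance']})

conditions = [
    df['Event'] == 'Music',
    df['Event'] == 'Poetry',
    df['Event'] == 'Comedy',
]
choices = [1500, 800, 1200]
# The first matching condition wins; rows matching none get the default
df['Price'] = np.select(conditions, choices, default=0)
print(df)
```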
Task 7: Selecting rows in a Pandas DataFrame based on a given condition
a) Selecting all the rows from the given DataFrame in which 'Percentage' is greater
than 80% using loc[]
INPUT:
import pandas as pd
record = {
'Name': ['Ankit', 'Amit', 'Aishwarya', 'Priyanka', 'Priya', 'Shaurya' ],
'Age': [21, 19, 20, 18, 17, 21],
'Stream': ['Math', 'Commerce', 'Science', 'Math', 'Math', 'Science'],
'Percentage': [88, 92, 95, 70, 65, 78] }
dataframe = pd.DataFrame(record, columns = ['Name', 'Age', 'Stream',
'Percentage'])
print("Given Dataframe :\n", dataframe)
rslt_df = dataframe.loc[dataframe['Percentage'] > 80]  # boolean mask passed to loc[]
print('\nResult dataframe :\n', rslt_df)
OUTPUT:
b) Selecting those rows whose column value is present in the list using isin()
method of the dataframe
INPUT:
import pandas as pd
record = {
'Name': ['Ankit', 'Amit', 'Aishwarya', 'Priyanka', 'Priya', 'Shaurya' ],
'Age': [21, 19, 20, 18, 17, 21],
'Stream': ['Math', 'Commerce', 'Science', 'Math', 'Math', 'Science'],
'Percentage': [88, 92, 95, 70, 65, 78]}
dataframe = pd.DataFrame(record, columns = ['Name', 'Age', 'Stream',
'Percentage'])
print("Given Dataframe :\n", dataframe)
options = ['Math', 'Commerce']
rslt_df = dataframe[dataframe['Stream'].isin(options)]
print('\nResult dataframe :\n', rslt_df)
OUTPUT:
c) Selecting rows based on multiple column conditions using the '&' operator
INPUT:
import pandas as pd
record = {
'Name': ['Ankit', 'Amit', 'Aishwarya', 'Priyanka', 'Priya', 'Shaurya' ],
'Age': [21, 19, 20, 18, 17, 21],
'Stream': ['Math', 'Commerce', 'Science', 'Math', 'Math', 'Science'],
'Percentage': [88, 92, 95, 70, 65, 78]}
dataframe = pd.DataFrame(record, columns = ['Name', 'Age', 'Stream',
'Percentage'])
print("Given Dataframe :\n", dataframe)
options = ['Math', 'Science']
rslt_df = dataframe[(dataframe['Age'] == 21) &
dataframe['Stream'].isin(options)]
print('\nResult dataframe :\n', rslt_df)
OUTPUT:
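The same filter can be written with `DataFrame.query`, where `and`/`or` take the place of `&`/`|` and `@options` refers to the local Python variable. When `&` is used, each comparison needs its own parentheses because `&` binds more tightly than `==`. A sketch that checks both forms select the same rows:

```python
import pandas as pd

record = {
    'Name': ['Ankit', 'Amit', 'Aishwarya', 'Priyanka', 'Priya', 'Shaurya'],
    'Age': [21, 19, 20, 18, 17, 21],
    'Stream': ['Math', 'Commerce', 'Science', 'Math', 'Math', 'Science'],
    'Percentage': [88, 92, 95, 70, 65, 78]}
dataframe = pd.DataFrame(record)
options = ['Math', 'Science']

# Boolean-mask form, as in the program above
mask_result = dataframe[(dataframe['Age'] == 21) & dataframe['Stream'].isin(options)]
# Equivalent query() form; @options reads the Python list defined above
query_result = dataframe.query("Age == 21 and Stream in @options")
print(query_result)
```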
Task 8: Pandas dataframe where()
INPUT:
import pandas as pd
data = {
"age": [50, 40, 30, 40, 20, 10, 30],
"qualified": [True, False, False, False, False, True, True]
}
df = pd.DataFrame(data)
print(df)
newdf = df.where(df["age"] > 30)
print(newdf)
OUTPUT:
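`where()` keeps values where the condition is True and puts NaN everywhere else, which is why the integer `age` column comes back as floats in `newdf`. A sketch on the `age` column alone: the `other` argument supplies a replacement instead of NaN (0 here is arbitrary), and `mask()` does the inverse of `where()`.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [50, 40, 30, 40, 20, 10, 30],
    "qualified": [True, False, False, False, False, True, True],
})

kept = df["age"].where(df["age"] > 30, other=0)   # ages <= 30 replaced by 0
masked = df["age"].mask(df["age"] > 30, other=0)  # inverse: ages > 30 replaced by 0
print(kept.tolist())    # [50, 40, 0, 40, 0, 0, 0]
print(masked.tolist())  # [0, 0, 30, 0, 20, 10, 30]
```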
Task 9: Create Box Plot
INPUT:
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(10)
data_1 = np.random.normal(100, 10, 200)
data_2 = np.random.normal(90, 20, 200)
data_3 = np.random.normal(80, 30, 200)
data_4 = np.random.normal(70, 40, 200)
data = [data_1, data_2, data_3, data_4]
fig = plt.figure(figsize =(10, 7))
ax = fig.add_axes([0, 0, 1, 1])
bp = ax.boxplot(data)
plt.show()
OUTPUT:
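Each box summarises one data set: the line inside is the median, the box spans the first to third quartile (Q1 to Q3), and by default Matplotlib draws the whiskers out to the furthest points within 1.5 × IQR of the box and marks anything beyond as an outlier. The sketch computes those numbers with NumPy for a small hand-made array; `box_stats` is a hypothetical helper meant to mirror the default `whis=1.5` rule, not Matplotlib's internal code.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Return the numbers a default box plot is built from (sketch)."""
    data = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = data[(data >= low_fence) & (data <= high_fence)]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": data[(data < low_fence) | (data > high_fence)].tolist(),
    }


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)  # 100 lies above Q3 + 1.5 * IQR, so it is reported as an outlier
```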
Task 10: Create Histogram
INPUT:
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(23685752)
N_points = 10000
n_bins = 20
# 10,000 samples from the standard normal distribution
x = np.random.randn(N_points)
fig, axs = plt.subplots(1, 1, figsize=(6, 5), tight_layout=True)
axs.hist(x, bins=n_bins)  # 20 equal-width bins
plt.show()
OUTPUT:
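`hist` sorts the values into `n_bins` equal-width bins spanning the minimum to the maximum of the data and draws one bar per bin. `np.histogram` returns the counts and bin edges as numbers, which is handy for checking a plot; the sketch reuses the seed and sizes above and draws nothing.

```python
import numpy as np

np.random.seed(23685752)
N_points = 10000
n_bins = 20
x = np.random.randn(N_points)

# counts[i] is the number of samples in [edges[i], edges[i+1]);
# the last bin also includes its right edge
counts, edges = np.histogram(x, bins=n_bins)
widths = np.diff(edges)
print(counts)
print("bin width:", widths[0])
```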
Task 11: Create Pivot Table
INPUT:
import pandas as pd
df = pd.DataFrame({'Product' : ['Carrots', 'Broccoli', 'Banana', 'Banana',
'Beans', 'Orange', 'Broccoli', 'Banana'],
'Category' : ['Vegetable', 'Vegetable', 'Fruit', 'Fruit',
'Vegetable', 'Fruit', 'Vegetable', 'Fruit'],
'Quantity' : [8, 5, 3, 4, 5, 9, 11, 8],
'Amount' : [270, 239, 617, 384, 626, 610, 62, 90]})
pivot = df.pivot_table(index =['Product'],
values =['Amount'],
aggfunc ='sum')
print(pivot)
OUTPUT:
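With `aggfunc='sum'`, `pivot_table` groups the rows by the index column and adds up each value column, so it gives the same numbers as `groupby` followed by `sum()`. A sketch pivoting the same data by `Category` and aggregating two columns at once:

```python
import pandas as pd

df = pd.DataFrame({'Product': ['Carrots', 'Broccoli', 'Banana', 'Banana',
                               'Beans', 'Orange', 'Broccoli', 'Banana'],
                   'Category': ['Vegetable', 'Vegetable', 'Fruit', 'Fruit',
                                'Vegetable', 'Fruit', 'Vegetable', 'Fruit'],
                   'Quantity': [8, 5, 3, 4, 5, 9, 11, 8],
                   'Amount': [270, 239, 617, 384, 626, 610, 62, 90]})

# One row per category, one column per aggregated value
by_category = df.pivot_table(index='Category', values=['Amount', 'Quantity'], aggfunc='sum')
# The same totals computed with groupby
grouped = df.groupby('Category')[['Amount', 'Quantity']].sum()
print(by_category)
```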
Task 12: Create Heatmap
INPUT:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
data_set = np.random.rand( 10 , 10 )
ax = sns.heatmap( data_set , linewidth = 0.5 , cmap = 'coolwarm' )
plt.title( "2-D Heat Map" )
plt.show()
OUTPUT:
Task 13: Demonstrating mean() in a Python program
INPUT:
import statistics
data1 = [1, 3, 4, 5, 7, 9, 2]
x = statistics.mean(data1)
print("Mean is :", x)
OUTPUT:
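The arithmetic mean is the sum of the values divided by how many there are; for `data1` that is 31 / 7 ≈ 4.43. A sketch comparing the hand computation with `statistics.mean`:

```python
import statistics

data1 = [1, 3, 4, 5, 7, 9, 2]

manual_mean = sum(data1) / len(data1)  # 31 / 7
library_mean = statistics.mean(data1)
print("Manual mean     :", manual_mean)
print("statistics.mean :", library_mean)
```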
Task 14: Demonstrating standard deviation in a Python program
INPUT:
import statistics
sample = [1, 2, 3, 4, 5]
print(f"Standard Deviation of sample is {statistics.stdev(sample)}")
OUTPUT:
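`statistics.stdev` is the sample standard deviation, which divides the sum of squared deviations from the mean by n - 1; `statistics.pstdev` divides by n (population standard deviation). For `[1, 2, 3, 4, 5]` the mean is 3 and the squared deviations sum to 10, so the sample value is √(10/4) ≈ 1.581 and the population value is √(10/5) ≈ 1.414. The sketch repeats that arithmetic in code:

```python
import math
import statistics

sample = [1, 2, 3, 4, 5]
n = len(sample)
mean = sum(sample) / n
squared_dev = sum((x - mean) ** 2 for x in sample)  # 4 + 1 + 0 + 1 + 4 = 10

sample_sd = math.sqrt(squared_dev / (n - 1))  # matches statistics.stdev
population_sd = math.sqrt(squared_dev / n)    # matches statistics.pstdev
print(sample_sd, statistics.stdev(sample))
print(population_sd, statistics.pstdev(sample))
```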
Task 15: Calculating skewness and kurtosis in Python
INPUT:
from scipy.stats import kurtosis, skew

# Data set used for kurtosis
kurt_data = [10, 25, 14, 26, 35, 45, 67, 90,
             40, 50, 60, 10, 16, 18, 20]
# Data set used for skewness
skew_data = [88, 85, 82, 97, 67, 77, 74, 86,
             81, 95, 77, 88, 85, 76, 81]

print("SKEWNESS")
print(skew(skew_data, axis=0, bias=True))
print("KURTOSIS")
# fisher=True (the default) subtracts 3, so a normal distribution scores 0.0
print(kurtosis(kurt_data, axis=0, fisher=True, bias=True))
OUTPUT:
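With the defaults used above (`bias=True`, plus `fisher=True` for `kurtosis`), SciPy defines skewness as m3 / m2^1.5 and kurtosis as m4 / m2^2 - 3, where mk is the mean of the k-th power of the deviations from the mean; subtracting 3 makes a normal distribution score 0. The sketch recomputes both with NumPy only, on small made-up arrays rather than the lab's data sets.

```python
import numpy as np


def central_moment(values, k):
    """Mean of the k-th power of deviations from the mean (biased moment)."""
    data = np.asarray(values, dtype=float)
    return np.mean((data - data.mean()) ** k)


def skewness(values):
    # Same definition as scipy.stats.skew(values, bias=True)
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    # Same definition as scipy.stats.kurtosis(values, fisher=True, bias=True)
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


print(skewness([1, 2, 3, 4, 5]))         # symmetric data: 0.0
print(excess_kurtosis([1, 2, 3, 4, 5]))  # flatter than a normal curve: negative
print(skewness([1, 1, 1, 10]))           # long right tail: positive
```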