Data Science
Assignment 1: SET-A
Q.1: Write a python program to create a data frame containing columns named Name, Age,
and Percentage. Add 10 rows to the data frame. View the data frame.
import pandas as pd
# create an empty DataFrame with the required columns, then add 10 rows with .loc
df=pd.DataFrame(columns=['Name','Age','Percentage'])
df.loc[0]=['subhan',20,78]
df.loc[1]=['tofik',22,45]
df.loc[2]=['rayyan',21,15]
df.loc[3]=['sharif',21,65]
df.loc[4]=['alim',88,99]
df.loc[5]=['shoib',18,97]
df.loc[6]=['danish',19,49]
df.loc[7]=['mustakim',25,6]
df.loc[8]=['mosin',20,78]
df.loc[9]=['arbaz',22,15]
print(df)
Output:
        Name Age Percentage
0     subhan 20          78
1      tofik 22          45
2     rayyan 21          15
3     sharif 21          65
4       alim 88          99
5      shoib 18          97
6     danish 19          49
7   mustakim 25           6
8      mosin 20          78
9      arbaz 22          15
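The next Output block (shape, size, data types, column index, and the <bound method ...> line) appears without its code in the original; a minimal sketch of what likely produced it, assuming the same df built in Q.1 (note that print(df.describe) without parentheses prints the bound method rather than the summary table):
print(df.shape)      # (rows, columns) -> (10, 3)
print(df.size)       # total number of cells -> 30
print(df.dtypes)     # data type of each column
print(df.columns)    # the column index
print(df.describe)   # missing (): prints the bound method, not the statistics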
Output:
(10, 3)
30
Name          object
Age           object
Percentage    object
dtype: object
Index(['Name', 'Age', 'Percentage'], dtype='object')
<bound method NDFrame.describe of        Name Age Percentage
0       alim 20         78
1     tofik 22          45
2    rayyan 21          15
3    sharif 21          65
4    subhan 20          12
5     shoib 18          97
6    danish 19          49
7 mustakim 25            6
8     mosin 20          78
9     arbaz 22          15>
import pandas as pd
df=pd.DataFrame(columns=['Name','Age','Percentage'])
df.loc[0]=['subhan',20,78]
df.loc[1]=['tofik',22,45]
df.loc[2]=['rayyan',21,15]
df.loc[3]=['sharif',21,65]
df.loc[4]=['alim',20,12]
df.loc[5]=['shoib',18,97]
df.loc[6]=['danish',19,49]
df.loc[7]=['mustakim',25,6]
df.loc[8]=['mosin',20,78]
df.loc[9]=['arbaz',22,15]
print(df.describe())
Output:
             Name    Age   Percentage
count          10     10           10
unique         10      6            8
top         arbaz     20           15
freq            1      3            2
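Because every column of the manually built DataFrame has object dtype, describe() reports count, unique, top, and freq rather than numeric statistics such as mean and std.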
Q.4: Write a python program to add 5 rows with duplicate values and missing values. Add
a column Remark with empty values.
import pandas as pd
df=pd.DataFrame(columns=['Name','Age','Percentage'])
df.loc[0]=['alim',20,78]
df.loc[1]=['tofik',22,45]
df.loc[2]=['subhan',21,15]
df.loc[3]=[None,21,65]
df.loc[4]=['subhan',20,12]
df['Remarks']=None
print(df)
Output:
      Name Age Percentage Remarks
0     alim 20          78    None
1    tofik 22          45    None
2   subhan 21          15    None
3      NaN 21          65    None
4   subhan 20          12    None
Q.5: Write a python program to get the number of observations, missing values and
duplicate values.
import pandas as pd
df=pd.DataFrame(columns=['Name','Age','Percentage'])
df.loc[0]=['alim',20,15]
df.loc[1]=['tofik',22,45]
df.loc[2]=['subhan',20,15]
df.loc[3]=[None,21,65]
df.loc[4]=['subhan',20,15]
print(df['Name'].size)   # number of observations
missing=df.isnull()      # True where a value is missing
print(missing)
dup=df.duplicated()      # True for rows that repeat an earlier row
print(dup)
Output:
5
    Name    Age  Percentage
0  False  False       False
1  False  False       False
2  False  False       False
3   True  False       False
4  False  False       False
0    False
1    False
2    False
3    False
4     True
dtype: bool
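duplicated() keeps the first occurrence by default, so only row 4, which repeats row 2's values, is flagged as a duplicate.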
Q.6: Write a python program to drop the 'Remark' column from the dataframe. Also drop all
null and empty values. Print the modified data.
import pandas as pd
df=pd.DataFrame(columns=['Name','Age','Percentage'])
df.loc[0]=['ALIM',20,15]
df.loc[1]=['SHOAIB',22,45]
df.loc[2]=['SHARIF',20,15]
df.loc[3]=[None,21,65]
df.loc[4]=['ALIM',20,15]
df['Remarks']=None
print(df)
df.drop(labels=['Remarks'],axis=1,inplace=True)   # drop the Remarks column
print(df)
df.dropna(axis=0,inplace=True)                    # drop rows containing null values
print(df)
Output:
      Name Age Percentage Remarks
0     ALIM 20          15    None
1   SHOAIB 22          45    None
2   SHARIF 20          15    None
3      NaN 21          65    None
4     ALIM 20          15    None
      Name Age Percentage
0     ALIM 20          15
1   SHOAIB 22          45
2   SHARIF 20          15
3      NaN 21          65
4     ALIM 20          15
      Name Age Percentage
0     ALIM 20          15
1   SHOAIB 22          45
2   SHARIF 20          15
4     ALIM 20          15
import pandas as pd
import matplotlib.pyplot as plt
df=pd.DataFrame(columns=['Name','age','percentage'])
df.loc[0]=['kashish',19,95]
df.loc[1]=['Ramiza',20,91]
df.loc[2]=['naki',7,90]
df.loc[3]=['Faisal',18,85]
df.loc[4]=['Aman',23,80]
df.loc[5]=['Anas',24,75]
df.loc[6]=['Fazil',21,70]
df.loc[7]=['Mustaqim',22,65]
df.loc[8]=['Alfiya',20,89]
df.loc[9]=['Aqsa',21,86]
print(df)
df.plot(x="Name",y="percentage")
plt.title('Line plot name vs percentage')
plt.xlabel('name of student')
plt.ylabel('percentage')
plt.show()
print(df)
Output:
        Name age percentage
0    kashish 19          95
1     Ramiza 20          91
2       naki   7         90
3     Faisal 18          85
4       Aman 23          80
5       Anas 24          75
6      Fazil 21          70
7   Mustaqim 22          65
8     Alfiya 20          89
9       Aqsa 21          86
import pandas as pd
import matplotlib.pyplot as plt
df=pd.DataFrame(columns=['Name','age','percentage'])
df.loc[0]=['kashish',19,95]
df.loc[1]=['Ramiza',20,91]
df.loc[2]=['naki',7,90]
df.loc[3]=['Faisal',18,85]
df.loc[4]=['Aman',23,80]
df.loc[5]=['Anas',24,75]
df.loc[6]=['Fazil',21,70]
df.loc[7]=['Mustaqim',22,65]
df.loc[8]=['Alfiya',20,89]
df.loc[9]=['Aqsa',21,86]
print(df)
plt.scatter(x=df["Name"],y=df["percentage"])
plt.title('Scatter plot name vs percentage')
plt.xlabel('name of student')
plt.ylabel('percentage')
plt.show()
print(df)
Output:
        Name age percentage
0    kashish 19          95
1     Ramiza 20          91
2       naki   7         90
3     Faisal 18          85
4       Aman 23          80
5       Anas 24          75
6      Fazil 21          70
7   Mustaqim 22          65
8     Alfiya 20          89
9       Aqsa 21          86
                                       Assignment 1: SET-B
Q1) Download the heights and weights dataset and load the dataset from the given csv file
into a dataframe. Print the first 10 rows, last 10 rows and 20 random rows.
import pandas as pd
data=pd.read_csv('HeightWeight.csv')
df=pd.DataFrame(data)
print('\nfirst 10 rows')
print(df.head(10))
print('\nlast 10 rows')
print(df.tail(10))
print('\nrandom 20 rows')
print(df.sample(20))   # 20 randomly sampled rows
Output:
first 10 rows
   Index Height(Inches) Weight(Pounds)
0      1        65.78331      112.9925
1      2        71.51521      136.4873
2      3        69.39874      153.0269
3      4        68.21660      142.3354
4      5        67.78781      144.2971
5      6        68.69784      123.3024
6      7        69.80204      141.4947
7      8        70.01472      136.4623
8      9        67.90265      112.3723
9     10        66.78236      120.6672
last 10 rows
       Index Height(Inches) Weight(Pounds)
24990 24991         69.97767      125.3672
24991 24992         71.91656      128.2840
24992 24993         70.96218      146.1936
24993 24994         66.19462      118.7974
24994 24995         67.21126      127.6603
24995 24996         69.50215      118.0312
24996 24997         64.54826      120.1932
24997 24998         64.69855      118.2655
24998 24999         67.52918      132.2682
24999 25000         68.87761      124.8742
random 20 rows
       Index  Height(Inches)  Weight(Pounds)
18515  18516        71.15912        143.7729
3550    3551        65.95300        130.0755
16400  16401        67.20032        130.9151
10718  10719        70.79804        125.7816
4830    4831        66.24238        121.9611
9121    9122        68.05361        137.0546
11516  11517        68.67632        115.3375
3126    3127        67.59507        126.1888
15670  15671        68.55083        137.6187
20293  20294        65.96939        139.4453
5842    5843        68.92916        129.1092
21409  21410        69.27081        124.6497
15365  15366        67.18395        137.6251
16889  16890        69.07788        131.0112
12382  12383        69.50005        135.9850
14220  14221        68.49492        132.0698
2701    2702        69.25709        142.5795
4578    4579        68.91069        103.7011
11468  11469        65.93881        125.8178
19312  19313        68.47651        117.6580
Q2) Write a python program to find the shape, size, and datatypes of the dataframe object.
import pandas as pd
data=pd.read_csv('HeightWeight.csv')
df=pd.DataFrame(data)
print('\n shape of dataframe',df.shape)
print('\n size of dataframe',df.size)
print('\n datatype of dataframe',df.dtypes)
Output:
 shape of dataframe (25000, 3)

 size of dataframe 75000

 datatype of dataframe Index              int64
Height(Inches)    float64
Weight(Pounds)    float64
dtype: object
Q3) Write a python program to view basic statistical details of the data.
import pandas as pd
data=pd.read_csv('HeightWeight.csv')
df=pd.DataFrame(data)
print("/n basic statistical details of a data:/n",df.describe())
Output:
 basic statistical details of the data:
               Index Height(Inches) Weight(Pounds)
count 25000.000000     25000.000000    25000.000000
mean   12500.500000       67.993114      127.079421
std     7217.022701        1.901679       11.660898
min        1.000000       60.278360       78.014760
25%     6250.750000       66.704397      119.308675
50%    12500.500000       67.995700      127.157750
75%    18750.250000       69.272958      134.892850
max    25000.000000       75.152800      170.924000
Q4) Write a python program to get the number of observations, missing values and nan
values.
import pandas as pd
import numpy as np
data=pd.read_csv('HeightWeight.csv')
df=pd.DataFrame(data)
print('\n number of observations:',df['Index'].size)
missing=df.isnull()                                 # boolean mask: True where a value is missing
print('\n missing values:',missing.values.sum())    # total count of missing cells
nan_values=np.isnan(df)                             # NaN mask over the numeric columns
print('\n nan values:',nan_values.size)             # .size counts every cell (25000 rows x 3 columns)
Output:
 number of observations: 25000
 missing values: 0
 nan values: 75000
Q5) Write a python program to add a column “BMI” to the dataframe, which is
calculated as: weight/height^2.
import pandas as pd
data=pd.read_csv('HeightWeight.csv')
df=pd.DataFrame(data)
df['BMI']=(df['Weight(Pounds)']/df['Height(Inches)']**2)
print("after adding colom/n",df)
Output:
after adding column
        Index  Height(Inches)  Weight(Pounds)       BMI
0           1        65.78331        112.9925  2.950311
1           2        71.51521        136.4873  3.642400
2           3        69.39874        153.0269  4.862195
3           4        68.21660        142.3354  4.353572
4           5        67.78781        144.2971  4.531187
...       ...             ...             ...       ...
24995   24996        69.50215        118.0312  2.884013
24996   24997        64.54826        120.1932  3.467294
24997   24998        64.69855        118.2655  3.341389
24998   24999        67.52918        132.2682  3.836436
24999   25000        68.87761        124.8742  3.286921
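The code for the maximum-BMI output that follows is not shown in the original; a minimal sketch, assuming the df with the BMI column added above:
print("Maximum of BMI = ",df['BMI'].max())   # largest value in the BMI column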
Output:
 Maximum of BMI = 5.933879009339526
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('HeightWeight.csv')
df = pd.DataFrame(data)
plt.scatter(x=df['Height(Inches)'],y=df['Weight(Pounds)'],c='blue')
plt.title("Scatter Plot")
plt.xlabel("Height(Inches)")
plt.ylabel("Weight(Pounds)")
plt.show()
Output:
(A scatter plot of Height(Inches) versus Weight(Pounds) is displayed.)
                                        Assignment 2: SET-A
Q1) Create an array using numpy and display mean and median.
import numpy as np
demo = np.array([[30,75,70],[80,90,20],[50,95,60]])
print(demo)
print('\n')
print(np.mean(demo))     # mean of all elements
print('\n')
print(np.median(demo))   # median of all elements
print('\n')
Output:
[[30 75 70]
 [80 90 20]
 [50 95 60]]

63.333333333333336

70.0
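The Output block below (a column-wise sum) appears without its code in the original; a minimal sketch, assuming the same dictionary used in the describe() example that follows it (df.sum() concatenates the string column and adds the numeric ones):
import pandas as pd
md={'Name':pd.Series(['Ram','Sham','Meena','Seeta','Geeta','Rakesh','Madhav']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
df=pd.DataFrame(md)
print(df.sum())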
Output::
      Name  RamShamMeenaSeetaGeetaRakeshMadhav
Age                                          181
Rating                                     25.61
dtype: object
import pandas as pd
import numpy as np
md={'Name':pd.Series(['Ram','Sham','Meena','Seeta','Geeta','Rakesh','Madhav']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
df=pd.DataFrame(md)
print(df.describe())
Output:
       Age     Rating
count      7.000000 7.000000
mean      25.857143 3.658571
std        2.734262 0.698628
min       23.000000 2.560000
25%       24.000000 3.220000
50%       25.000000 3.800000
75%       27.500000 4.105000
max       30.000000 4.600000
import numpy as np
data=np.array([13,52,44,32,30,0,36,45])
print("Standard Deviation of sample is %s"%(np.std(data)))
Output::
Standard Deviation of sample is 16.263455967290593
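np.std uses the population formula (dividing by n rather than n - 1), so the printed value is sqrt(2116 / 8) = sqrt(264.5) ≈ 16.26.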
Virat    Rohit
92       89
97       87
85       67
74       55
71       47
55       72
85       76
63       79
42       44
32       92
71       99
55       47
import pandas as pd
import scipy.stats as s
score={'Virat':[92,97,85,74,71,55,85,63,42,32,71,55],'Rohit':
[89,87,67,55,47,72,76,79,44,92,99,47]}
df=pd.DataFrame(score)
print(df)
print("\nArithmetic Mean Values")
print("Score 1",s.tmean(df["Virat"]).round(2))
print("Score 2",s.tmean(df["Rohit"]).round(2))
Output:
     Virat Rohit
0        92    89
1        97    87
2        85    67
3        74    55
4        71    47
5        55    72
6        85    76
7        63    79
8        42    44
9        32    92
10       71    99
11       55    47
import numpy as np
mydata=np.array([24,29,20,22,24,26,27,30,20,31,26,38,44,47])
q3,q1=np.percentile(mydata,[75,25])
iqrvalue=q3-q1
print(iqrvalue)
Output:
6.75
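With the data sorted (20, 20, 22, 24, 24, 26, 26, 27, 29, 30, 31, 38, 44, 47), np.percentile interpolates to Q1 = 24 and Q3 = 30.75, so the interquartile range is 30.75 - 24 = 6.75, matching the output.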
Q7) Write a python program to find the maximum value of a given flattened array:
import numpy as np
arr=np.array([[25,26,45],[12,36,42],[8,50,65]])
print("\n Original flattened Array:\n",arr)
flat=arr.flatten()                                   # flatten the 2-D array into 1-D
max_val=np.max(flat)
print("\n Maximum value of flattened array:\n",max_val)
min_val=np.min(flat)
print("\n Minimum value of flattened array:\n",min_val)
Output:
 Original flattened Array:
 [[25 26 45]
 [12 36 42]
 [ 8 50 65]]
Q8) Write a python program to compute Euclidean Distance between two data points in a
dataset.
import numpy as np
point1= np.array((1,2,3))
point2= np.array((1,1,1))
dist = np.linalg.norm(point1 - point2)
print("Euclidian Distance between two pints: ",dist)
Output:
Euclidean Distance between two points:  2.23606797749979
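Here np.linalg.norm computes sqrt((1-1)^2 + (2-1)^2 + (3-1)^2) = sqrt(5) ≈ 2.236, which is the distance printed above.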
Q.9: Create a dataframe of data values. Find out the mean, range, and IQR for this data.
import pandas as pd
import numpy as np
import scipy.stats as s
d = [32,36,46,47,56,69,75,79,79,88,89,91,92,93,96,97,101,105,112,116]
data=pd.DataFrame(d)
print("\n DataFrame:\n",data)
print("\n Mean of dataframe : ",s.tmean(data))
data_range = np.max(data)-np.min(data)
print("\n Range of dataframe : ",data_range)
Q1 = np.median(data[:10])   # median of the lower half of the data
Q3 = np.median(data[10:])   # median of the upper half of the data
IQR = Q3 - Q1
print("\n Inter Quartile Range (IQR) of dataframe : ",IQR)
Output::
 DataFrame:
         0
0     32
1     36
2     46
3     47
4     56
5     69
6     75
7     79
8     79
9     88
10    89
11    91
12    92
13    93
14    96
15    97
16   101
17   105
18   112
19   116
 Range of dataframe :            0      84
dtype: int64
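The question, data, and code for the Manhattan-distance output that follows are not included in the original; a minimal sketch of the approach with hypothetical points (so the printed total will not match the 18 shown below):
import numpy as np
# hypothetical 2-D points; the original data set is not shown
points = np.array([[1, 2], [3, 4], [5, 6]])
total = 0
for i in range(len(points)):
    for j in range(i + 1, len(points)):
        # Manhattan (L1) distance between point i and point j
        total += np.abs(points[i] - points[j]).sum()
print("sum of Manhattan distance between all pairs of points =", total)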
OUTPUT:
 sum of Manhattan distance between all pairs of points is = 18
Q.12: Create a Dataframe for students' information such as name, graduation percentage
and age. Display the average age of students and the average graduation percentage. Also
describe all basic statistics of the data. (Hint: use describe().)
import pandas as pd
import scipy.stats as s
data={'Name':['sharif','shoaib','nafisa','alim'],'Age':[20,22,23,21],
    'perc':[65.2,78.4,78.6,74.5]}
df=pd.DataFrame(data)
print("\n Average Age :",sum(df['Age']/len(df['Age'])))
print("\n Average Percentage : ",sum(df['perc']/len(df['perc'])))
print("\n Basic Stastistics of data :\n",df.describe())
Output:
 Average Age : 21.5
 Average Percentage :          74.17500000000001
Q.11: Write a Numpy program to compute the histogram of nums against bins.
Sample Output:
nums:[0.5 0.7 1.0 1.2 1.3 2.1]
bins:[0 1 2 3].
import numpy as np
import matplotlib.pyplot as plt
nums = np.array([0.5,0.7,1.0,1.2,1.3,2.1])
bins = np.array([0,1,2,3])
print("nums: ",nums)
print("bins: ",bins)
print("Result:", np.histogram(nums, bins))
plt.hist(nums, bins=bins)
plt.show()
Output:
nums: [0.5 0.7 1. 1.2 1.3 2.1]
bins: [0 1 2 3]
Result: (array([2, 3, 1]), array([0, 1, 2, 3]))
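The counts follow directly from the bin edges: 0.5 and 0.7 fall in [0, 1), 1.0, 1.2 and 1.3 fall in [1, 2), and 2.1 falls in [2, 3], giving the array [2, 3, 1].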
                   Assignment 2: SET-B
Q1) Download the iris dataset file. Read this csv file using the read_csv() function. Take a sample
from the entire dataset. Display the maximum and minimum values of all numeric attributes.
import pandas as pd
data=pd.read_csv("Iris.csv")
df=pd.DataFrame(data)
sample=df.sample()
print("\n Sample from Dataset:\n",sample)
print("\n Maximum of sepal Length:",max(df['SepalLengthCm']))
print("\n Minimum of sepal Length:",min(df['SepalLengthCm']))
print("\n Maximum of sepal Width:",max(df['SepalWidthCm']))
print("\n Minimum of sepal Width:",min(df['SepalWidthCm']))
Output:
Q.2: Continue with the above dataset; find the number of records for each distinct value of the class
attribute. Consider the entire dataset and not the sample.
import pandas as pd
data=pd.read_csv("Iris.csv")
df=pd.DataFrame(data)
print("\n DataFrame:\n",df)
cnt=df['Species'].value_counts()
print("\n number of records for each distinct value of species attribute:\n",cnt)
Output:
Q.3: Display column-wise mean and median for the iris dataset (Hint: use the mean() and
median() functions of the pandas dataframe).
import pandas as pd
import scipy.stats as s
import statistics as st
data=pd.read_csv('Iris.csv')
df=pd.DataFrame(data)
print("\n DataFrame :\n",df)
print("\n Mean of sepal Length:", s.tmean(df['SepalLengthCm']))
print("\n Median of sepal Length:", st.median(df['SepalLengthCm']))
print("\n Mean of sepal Width:", s.tmean(df['SepalWidthCm']))
print("\n Median of sepal Width:", st.median(df['SepalWidthCm']))
OUTPUT:
 DataFrame :
        Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  \
0        1            5.1           3.5            1.4           0.2
1        2            4.9           3.0            1.4           0.2
2        3            4.7           3.2            1.3           0.2
3        4            4.6           3.1            1.5           0.2
4        5            5.0           3.6            1.4           0.2
..     ...            ...           ...            ...           ...
145    146            6.7           3.0            5.2           2.3
146    147            6.3           2.5            5.0           1.9
147    148            6.5           3.0            5.2           2.0
148    149            6.2           3.4            5.4           2.3
149    150            5.9           3.0            5.1           1.8
             Species
0        Iris-setosa
1        Iris-setosa
2        Iris-setosa
3        Iris-setosa
4        Iris-setosa
..               ...
145   Iris-virginica
146   Iris-virginica
147   Iris-virginica
148   Iris-virginica
149   Iris-virginica
import pandas as pd
data=pd.read_csv("Data.csv")
df=pd.DataFrame(data)
print("\n Describing Dataset:\n",df.describe())
print("\n Shape of Dataset:\n",df.shape)
print("\n First three rows of Dataset:\n",df.head(3))
OUTPUT:
 Describing Dataset:
                 Age            Salary
count      9.000000          9.000000
mean      38.777778      63777.777778
std        7.693793      12265.579662
min       27.000000      48000.000000
25%       35.000000      54000.000000
50%       38.000000      61000.000000
75%       44.000000      72000.000000
max       50.000000      83000.000000
 Shape of Dataset:
 (10, 4)
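The describe() count of 9 (against the 10 rows reported by shape) reflects the one missing Age value and one missing Salary value visible in the Data.csv listing shown in the next output.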
import pandas as pd
data=pd.read_csv("Data.csv")
df=pd.DataFrame(data)
print("\n Displaying Dataset:\n",df)
# replace missing values with the column mean
df['Salary']=df['Salary'].fillna(df['Salary'].mean())
df['Age']=df['Age'].fillna(df['Age'].mean())
print("\n ****** Modified Dataset ******\n",df)
OUTPUT:
 Displaying Dataset:
     Country       Age      Salary Purchased
0    France      44.0     72000.0        No
1     Spain      27.0     48000.0       Yes
2   Germany      30.0     54000.0        No
3     Spain      38.0     61000.0        No
4   Germany      40.0         NaN       Yes
5    France      35.0     58000.0       Yes
6     Spain       NaN     52000.0        No
7    France      48.0     79000.0       Yes
8   Germany      50.0     83000.0        No
9    France      37.0     67000.0       Yes
Q3) Data.csv has two categorical columns (the Country column and the Purchased
column).
a. Apply one-hot encoding on the Country column.
b. Apply label encoding on the Purchased column.
import pandas as pd
from sklearn import preprocessing
data=pd.read_csv("Data.csv")
df=pd.DataFrame(data)
print("\n Describing Dataset:\n",df)
one_hot_encoded_data = pd.get_dummies(data, columns = ['Country'])
print("\n *******After applying OneHot coding on Country*******\n",one_hot_encoded_data)
label_encoder = preprocessing.LabelEncoder()
df['Purchased']= label_encoder.fit_transform(df['Purchased'])
df['Purchased'].unique()
print("\n *******After applying OneHot coding on Country**********\n",df)
OUTPUT:
 Describing Dataset:
     Country     Age     Salary Purchased
0    France    44.0    72000.0        No
1     Spain    27.0    48000.0       Yes
2   Germany    30.0    54000.0        No
3     Spain    38.0    61000.0        No
4   Germany    40.0        NaN       Yes
5    France    35.0    58000.0       Yes
6     Spain     NaN    52000.0        No
7    France    48.0    79000.0       Yes
8   Germany    50.0    83000.0        No
9    France    37.0    67000.0       Yes
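For reference, pd.get_dummies replaces Country with indicator columns (Country_France, Country_Germany, Country_Spain), while LabelEncoder assigns integer codes alphabetically, so Purchased becomes 0 for No and 1 for Yes.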
import pandas as pd
data=pd.read_csv("winequality_red.csv")
df=pd.DataFrame(data)
print("\n Dataset is : \n",df)
OUTPUT:
    Dataset is :
         fixed acidity      volatile acidity   citric acid   residual sugar    chlorides   \
0                 7.4                 0.700          0.00              1.9        0.076
1                 7.8                 0.880          0.00              2.6        0.098
2                 7.8                 0.760          0.04              2.3        0.092
3                11.2                 0.280          0.56              1.9        0.075
4                 7.4                 0.700          0.00              1.9        0.076
...               ...                   ...           ...              ...          ...
1594              6.2                 0.600          0.08              2.0        0.090
1595              5.9                 0.550          0.10              2.2        0.062
1596              6.3                 0.510          0.13              2.3        0.076
1597              5.9                 0.645          0.12              2.0        0.075
1598              6.0                 0.310          0.47              3.6        0.067
        alcohol    quality
0           9.4          5
1           9.8          5
2           9.8          5
3           9.8          6
4           9.4          5
...         ...         ...
1594       10.5           5
1595       11.2           6
1596       11.0           6
1597       10.2           5
1598       11.0           6
OUTPUT:
       alcohol     quality
0          9.4           5
1          9.8           5
2          9.8           5
3          9.8           6
4          9.4           5
...        ...         ...
1594      10.5           5
1595      11.2           6
1596       11.0           6
1597       10.2           5
1598       11.0           6
Q.3:) Standardizing Data (transform them into a standard Gaussian distribution with a mean
of 0 and a standard deviation of 1).
import pandas as pd
from sklearn import preprocessing
data=pd.read_csv("winequality-red.csv",sep=';')
df=pd.DataFrame(data)
print("\n Dataset is : \n",df)
standard = preprocessing.scale(df)
print("\n *********Standardized Data*********\n",standard)
OUTPUT:
  Dataset is :
        fixed acidity      volatile acidity       citric acid    residual sugar    chlorides   \
0                7.4                 0.700              0.00               1.9        0.076
1                7.8                 0.880              0.00               2.6        0.098
2                7.8                 0.760              0.04               2.3        0.092
3               11.2                 0.280              0.56               1.9        0.075
4                7.4                 0.700              0.00               1.9        0.076
...              ...                   ...               ...               ...          ...
1594             6.2                 0.600              0.08               2.0        0.090
1595             5.9                 0.550              0.10               2.2        0.062
1596             6.3                 0.510              0.13               2.3        0.076
1597             5.9                 0.645              0.12               2.0        0.075
1598             6.0                 0.310              0.47               3.6        0.067
       alcohol    quality
0          9.4          5
1          9.8          5
2          9.8          5
3          9.8          6
4          9.4          5
...        ...        ...
1594      10.5          5
1595      11.2          6
1596      11.0          6
1597      10.2          5
1598      11.0          6
 *********Standardized Data*********
 [[-0.52835961 0.96187667 -1.39147228 ... -0.57920652 -0.96024611
  -0.78782264]
 [-0.29854743 1.96744245 -1.39147228 ... 0.1289504 -0.58477711
  -0.78782264]
 [-0.29854743 1.29706527 -1.18607043 ... -0.04808883 -0.58477711
  -0.78782264]
 ...
 [-1.1603431 -0.09955388 -0.72391627 ... 0.54204194 0.54162988
   0.45084835]
 [-1.39015528 0.65462046 -0.77526673 ... 0.30598963 -0.20930812
  -0.78782264]
 [-1.33270223 -1.21684919 1.02199944 ... 0.01092425 0.54162988
   0.45084835]]
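Each column is standardized as z = (x - mean) / std. For example, the first fixed acidity value of 7.4, with a column mean of roughly 8.32 and a standard deviation of roughly 1.74, gives (7.4 - 8.32) / 1.74 ≈ -0.53, which matches the first entry of the standardized array (the column statistics quoted here are approximate).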
Q.4:) Normalizing Data (rescale each observation to a length of 1 (a unit norm). For this,
use the Normalizer class.)
import pandas as pd
import numpy as np
from sklearn import preprocessing
data=pd.read_csv("winequality-red.csv",sep=';')
df=pd.DataFrame(data)
print("\n Dataset is : \n",df)
normalized = preprocessing.normalize(df,norm='l2')
print("\n***********Normalized Data***********\n",normalized)
OUTPUT:
 Dataset is :
        fixed acidity     volatile acidity      citric acid     residual sugar   chlorides   \
0                7.4                0.700             0.00                1.9       0.076
1                7.8                0.880             0.00                2.6       0.098
2                7.8                0.760             0.04                2.3       0.092
3               11.2                0.280             0.56                1.9       0.075
4                7.4                0.700             0.00                1.9       0.076
...              ...                  ...              ...                ...         ...
1594             6.2                0.600             0.08                2.0       0.090
1595             5.9                0.550             0.10                2.2       0.062
1596             6.3                0.510             0.13                2.3       0.076
1597             5.9                0.645             0.12                2.0       0.075
1598             6.0                0.310             0.47                3.6       0.067
        alcohol      quality
0           9.4            5
1           9.8            5
2           9.8            5
3           9.8            6
4           9.4            5
...         ...          ...
1594       10.5            5
1595       11.2            6
1596       11.0            6
1597       10.2            5
1598       11.0            6
***********Normalized Data***********
 [[0.19347777 0.01830195 0.         ... 0.01464156 0.24576906 0.13072822]
 [0.10698874 0.01207052 0.         ... 0.00932722 0.13442175 0.06858252]
 [0.13494887 0.01314886 0.00069205 ... 0.01124574 0.16955114 0.08650569]
 ...
 [0.1222319 0.00989496 0.00252225 ... 0.01455142 0.21342078 0.11641133]
 [0.10524769 0.01150589 0.00214063 ... 0.0126654 0.18195363 0.08919296]
 [0.12491328 0.00645385 0.00978487 ... 0.01374046 0.22900768 0.12491328]]
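With norm='l2', every row is divided by its own Euclidean length, so each transformed row has unit length; for instance, dividing the first row's fixed acidity value of 7.4 by that row's length yields the 0.1934... seen as the first entry above.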
5. Binarizing Data using the Binarizer class (using a binary threshold, it is possible to
transform our data by marking the values above it as 1 and those equal to or below it as 0).
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import preprocessing
from sklearn.preprocessing import Binarizer
data_set = pd.read_csv('Data.csv')
data_set.head()
age = data_set['Age'].values
salary = data_set['Salary'].values
print("\n Original age data values : \n",age)
print("\n Original salary data values : \n",salary)
# note: if Salary contains NaN (as in the earlier Data.csv listing), fill it before binarizing
x = age.reshape(1,-1)
y = salary.reshape(1,-1)
binarizer_age = Binarizer(threshold=35)         # ages above 35 become 1, the rest 0
binarizer_salary = Binarizer(threshold=61000)   # salaries above 61000 become 1, the rest 0
print("\n Binarized age : \n",binarizer_age.fit_transform(x))
print("\n Binarized salary : \n",binarizer_salary.fit_transform(y))
OUTPUT:
Original age data values :
 [44 27 30 38 40 35 58 48 50 37]
 Binarized age :
 [[1 0 0 1 1 0 1 1 1 1]]
 Binarized salary :
 [[1 0 0 0 0 0 0 1 1 1]]
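Binarizer maps values strictly greater than the threshold to 1 and values less than or equal to it to 0, which is why the age of exactly 35 and the salary of exactly 61000 both stay 0.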