ASSIGNMENT-1
ABHISHEK SHRINGI
03-03-2021
CLL788
PROF. HARIPRASAD KODAMANA
Question 1:
PART-(A):
For “data_1”
Scatter plot
Inference:
From the scatter plot we can infer that the given data forms a cluster within a certain range of values, and the points lying far away from this cluster are potential outliers.
Histograms
Boxplot showing x and y distribution
Heatmap
Question 1:
PART-(A):
For “data_3”
Scatter plot
Inference:
From the scatter plot we can infer that the given data forms a cluster within a certain range of values, and the points lying far away from this cluster are potential outliers.
Histograms
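As a reference, the following is a minimal sketch of how the Part (A) plots could be produced in Python with matplotlib. The file name data_1.txt, the whitespace-separated two-column layout, and the bin counts are assumptions, since the report does not show the plotting code.

```python
# Minimal sketch of the Part (A) plots; file name and layout are assumptions.
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt("data_1.txt")          # assumed: two whitespace-separated columns
x, y = data[:, 0], data[:, 1]

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Scatter plot of y against x
axes[0, 0].scatter(x, y, s=10)
axes[0, 0].set(title="Scatter plot", xlabel="x", ylabel="y")

# Histograms of both variables
axes[0, 1].hist(x, bins=30, alpha=0.6, label="x")
axes[0, 1].hist(y, bins=30, alpha=0.6, label="y")
axes[0, 1].set(title="Histograms")
axes[0, 1].legend()

# Boxplot showing the x and y distributions
axes[1, 0].boxplot([x, y])
axes[1, 0].set_xticklabels(["x", "y"])
axes[1, 0].set(title="Boxplot")

# Heatmap (2D histogram) of the joint distribution
h = axes[1, 1].hist2d(x, y, bins=30)
axes[1, 1].set(title="Heatmap", xlabel="x", ylabel="y")
fig.colorbar(h[3], ax=axes[1, 1])

plt.tight_layout()
plt.show()
```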
Question 1:
PART-(C):
For “data_1”
Parameter            x                 y
Mean                 4.9397            5.0429
Median               4.9242            5.0747
Variance             0.9718            1.014
Range                (2.373, 8.117)    (2.181, 8.190)

Percentiles (deciles):
10th                 3.68476102        3.73504038
20th                 4.08570612        4.21923318
30th                 4.4123583         4.55248202
40th                 4.700721          4.87424146
50th                 4.9242784         5.07476839
60th                 5.18017309        5.36513593
70th                 5.46958487        5.57664436
80th                 5.78852584        5.87166202
90th                 6.20755249        6.21449875
100th                8.1170447         8.19010885

Quartiles:
25th (Q1)            4.30398749        4.33146421
50th (Q2)            4.9242784         5.07476839
75th (Q3)            5.60721395        5.6823799
100th                8.1170447         8.19010885

Skewness             -0.00136          -0.1207
Kurtosis             -0.2039           0.09080
For “data_3”
Parameter            x                 y
Mean                 5.0824            4.9528
Median               5.0548            4.9788
Variance             2.6572            2.629488617717448
Range                (-1.458, 12.267)  (-1.004, 10.58)

Percentiles (deciles):
10th                 3.43512982        3.28478833
20th                 4.09472424        3.95274238
30th                 4.50069037        4.3584519
40th                 4.81775713        4.69351167
50th                 5.05487549        4.97884708
60th                 5.38188447        5.25595864
70th                 5.70739195        5.56424354
80th                 6.04270545        5.94105513
90th                 6.53338054        6.59228622
100th                12.26702453       10.58925244

Quartiles:
25th (Q1)            4.31898111        4.21810056
50th (Q2)            5.05487549        4.97884708
75th (Q3)            5.89160317        5.7826677
100th                12.2670245        10.58925244

Skewness             0.0413            -0.0825
Kurtosis             3.30098           2.417
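The statistics tabulated above can be reproduced with numpy and scipy. The sketch below is a minimal version, assuming each data file holds two whitespace-separated columns; the file name data_1.txt is likewise an assumption.

```python
# Minimal sketch of the Part (C) summary statistics; file name/layout assumed.
import numpy as np
from scipy import stats

data = np.loadtxt("data_1.txt")
x, y = data[:, 0], data[:, 1]

for name, v in (("x", x), ("y", y)):
    print(f"--- {name} ---")
    print("Mean:     ", np.mean(v))
    print("Median:   ", np.median(v))
    print("Variance: ", np.var(v))            # ddof=1 would give the sample variance
    print("Range:    ", (v.min(), v.max()))
    print("Deciles:  ", np.percentile(v, np.arange(10, 101, 10)))
    print("Quartiles:", np.percentile(v, [25, 50, 75, 100]))
    print("Skewness: ", stats.skew(v))
    print("Kurtosis: ", stats.kurtosis(v))    # Fisher (excess) kurtosis by default
```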
Question 1:
PART-(D):
By the standard deviation approach (for data_3):

Outliers in x           Outliers in y
10.10393747643514       10.43902165955905
10.67798532964827       10.55061243676815
12.2670245277286        10.58925243952102
11.44965456902049       10.11551641326384
10.28782093813039       -0.5334956955315495
-1.458402520742969      -0.2557338130229054
-1.171853375119153      -1.004100360593079
-0.763132811291513      -0.5528195925039532
                        -0.2198563833130491
                        -0.2439263484483771
By the median absolute deviation (MAD) approach (for data_3):

Outliers in x           Outliers in y
10.10393747643514       9.745989964192137
9.14841306110033        9.265264240840457
9.630313376450168       10.43902165955905
9.109233422770801       9.219071559753377
9.37144942459267        9.717401494534483
9.005201225564582       8.5540601585995
9.039507424436444       10.55061243676815
10.67798532964827       9.749970745029252
9.510497893324324       9.127858533076473
12.2670245277286        8.581245337617629
9.5456448855292         8.529730294461498
9.375609831291062       10.58925243952102
8.780482242073463       8.693157145943976
11.44965456902049       9.431652444355333
9.321302250247287       10.11551641326384
10.28782093813039       0.739593465779022
-1.458402520742969      1.028275835883979
-1.171853375119153      1.336612395334499
1.172063763666684       1.289365241997683
1.180680541173682       0.357106853679473
0.7916468945614012      0.303137254711518
1.400723237983714       0.580120633348619
-0.763132811291513      1.271821699448418
1.232122283140872       0.365306465834663
0.7725850670343857      0.432114885726590
1.179109482383718       0.468033364310671
-0.2198563833130491     -0.53349569553154
0.7398399056096148      1.154781275984739
-0.2439263484483771     -0.25573381302290
0.5680693847459939      0.434412418225085
1.083946032586385       -1.00410036059307
0.9053918140205244      0.747056391444499
1.286385738350685       1.047516482080888
                        -0.55281959250395
                        0.957823807905711
                        1.421603535576362
Inference:
From the results we can see that the number of outliers obtained through the MAD approach is greater. This is because the standard deviation approach relies on the mean of the data, which is itself influenced by the outliers. The MAD approach, however, uses the median of the data, and is therefore less affected by outliers and can detect them more easily.
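A minimal sketch of the two detection rules compared above is given below. The cut-off multipliers (3 for both rules) and the file name data_3.txt are assumptions, since the report does not state the thresholds used.

```python
# Minimal sketch of the two outlier-detection rules; thresholds assumed.
import numpy as np

def outliers_std(v, k=3.0):
    """Flag points more than k standard deviations from the mean."""
    mu, sigma = np.mean(v), np.std(v)
    return v[np.abs(v - mu) > k * sigma]

def outliers_mad(v, k=3.0):
    """Flag points more than k scaled MADs from the median."""
    med = np.median(v)
    mad = np.median(np.abs(v - med))
    # 1.4826 makes the MAD a consistent estimator of sigma for normal data
    return v[np.abs(v - med) > k * 1.4826 * mad]

data = np.loadtxt("data_3.txt")   # assumed file name/layout
x, y = data[:, 0], data[:, 1]
print("Std-dev outliers in x:", outliers_std(x))
print("MAD outliers in x:    ", outliers_mad(x))
```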
Question 2: Batch Gradient Descent
STEPS DONE:
1) Input of data from file "q1"
2) Then this data was divided into input (x) and output (y)
3) Then the learning rate was assigned a value, and the parameters (theta) were initialized to 1.
4) Then batch gradient descent was implemented and iterated until the minimum cost was reached.
5) The cost function was plotted against the iteration count for different learning rates, as given below.
Inference: We see that the cost decreases with each iteration, showing that it gradually converges to a particular value. On reaching this value the cost becomes constant, stopping the batch gradient descent algorithm and giving us the corresponding values of theta. These thetas are then used to fit a straight line to the data. As seen from the plot on the left, the line fits the data quite well, showing that the model works well.
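A minimal sketch of the batch gradient descent described above, for a straight-line fit y = θ₀ + θ₁x. The file name q1.txt, its two-column layout, the learning rate value, the iteration cap, and the stopping tolerance are assumptions.

```python
# Minimal sketch of batch gradient descent for a straight-line fit.
import numpy as np

data = np.loadtxt("q1.txt")                 # assumed file name/layout
x, y = data[:, 0], data[:, 1]
X = np.column_stack([np.ones_like(x), x])   # add intercept column
theta = np.ones(2)                          # parameters initialized to 1
alpha = 0.01                                # learning rate (assumed value)
m = len(y)

costs = []
for _ in range(1000):
    error = X @ theta - y
    costs.append((error @ error) / (2 * m))    # least squares cost
    theta -= alpha * (X.T @ error) / m         # batch update over all samples
    if len(costs) > 1 and abs(costs[-2] - costs[-1]) < 1e-9:
        break                                  # stop once the cost is flat

print("theta:", theta)
```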
Question 2: Stochastic Gradient Descent
STEPS DONE:
1) Input of data from file "q1"
2) Then this data was divided into input (x) and output (y)
3) Then the learning rate was assigned a value, and the parameters (theta) were initialized to 1.
4) Then stochastic gradient descent was implemented and iterated until the minimum cost was reached.
5) The cost function was plotted against the iteration count for different learning rates, as given below.
Inference: We see that the cost decreases with each iteration, showing that it gradually converges to a particular value. However, on comparing this with batch gradient descent, we see that the path followed by stochastic gradient descent is quite noisy. This agrees with the expected behaviour: in this algorithm we update the hypothesis after each example, as a result of which we may not always follow the optimum path.
Once the algorithm converges we obtain the corresponding values of theta. These thetas are then used to fit a straight line to the data. As seen from the plot on the left, the line fits the data quite well, showing that the model created is working.
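A minimal sketch of the stochastic variant: the same setup as the batch version, but the parameters are updated after every individual example, which produces the noisy cost trace noted above. The file name, learning rate, and epoch count are again assumptions.

```python
# Minimal sketch of stochastic gradient descent (per-example updates).
import numpy as np

data = np.loadtxt("q1.txt")                 # assumed file name/layout
x, y = data[:, 0], data[:, 1]
X = np.column_stack([np.ones_like(x), x])
theta = np.ones(2)
alpha = 0.01                                # learning rate (assumed value)
m = len(y)

costs = []
rng = np.random.default_rng(0)
for epoch in range(50):
    for i in rng.permutation(m):            # shuffle the examples each epoch
        error_i = X[i] @ theta - y[i]
        theta -= alpha * error_i * X[i]     # update from a single example
    error = X @ theta - y
    costs.append((error @ error) / (2 * m))

print("theta:", theta)
```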
Question 2: Least Squares Closed-Form Solution
STEPS DONE:
1) Input of data from file "q1"
2) Then this data was divided into input (x) and output (y)
3) Then the least squares closed-form solution was computed according to the following formula, directly giving the optimum value of the parameters (no learning rate, initialization, or iterations are needed):
θ = (XᵀX)⁻¹Xᵀy
4) The resulting straight-line fit was plotted, as given below.
Inference: As expected, the model obtained through this method appears to be the best fit, since here we use a theoretical equation to arrive directly at the optimum value. Note that this is also expected to be the fastest algorithm, as it involves the fewest computations and no loops in the code.
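A minimal sketch of the closed-form solution using the formula above; only the file name and layout are assumptions.

```python
# Minimal sketch of the closed-form least squares solution
# theta = (X^T X)^{-1} X^T y; no learning rate or iterations are involved.
import numpy as np

data = np.loadtxt("q1.txt")                 # assumed file name/layout
x, y = data[:, 0], data[:, 1]
X = np.column_stack([np.ones_like(x), x])

# np.linalg.solve is preferred over an explicit matrix inverse for stability
theta = np.linalg.solve(X.T @ X, X.T @ y)
print("theta:", theta)
```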
Question 2b:
x y w h
6.2101 17.612 0.023960093 17.612
5.6277 9.2302 0.000504599 9.078502328
8.6186 13.762 0.113718462 13.52965597
7.1032 11.954 0.639492951 10.80446502
theta0 theta1
0 0
0.004219852 0.0262057
0.004265662 0.026463506
0.019651378 0.159066843
0.088745171 0.649853868
Question 2c:
Using suitable python libraries, I was able to implement the following regression algorithms:
Ridge Regression:
Lasso Regression:
Elastic Net Regression:
Inference: From the three plots above we can see that each of the three algorithms has almost the same performance and fits the data equally well. It was expected that lasso regression would perform better; however, as the given dataset has only one input feature, lasso regression underperforms (this can be observed by calculating the model score). It is well known that lasso regression tries to reduce the number of input parameters and works well for datasets with a large number of input features, whereas ridge regression is more suitable for datasets with fewer data points.
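A minimal sketch of the three regularized regressions compared above, using scikit-learn. The regularization strengths (alpha, l1_ratio) and the file name are assumptions, since the report does not state which values were used.

```python
# Minimal sketch of ridge, lasso, and elastic net regression with sklearn.
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

data = np.loadtxt("q1.txt")                 # assumed file name/layout
X, y = data[:, :1], data[:, 1]              # single input feature

for model in (Ridge(alpha=1.0),
              Lasso(alpha=0.1),
              ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    # model.score returns R^2, the "model score" mentioned in the inference
    print(type(model).__name__, "score:", model.score(X, y))
```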
Question 3:
STEPS DONE:
1) Input of data from files "q2_train" and "q2_test"
2) Then this data was divided into input (x) and output (y)
3) Then, as the input involves more than one feature (independent variable), feature scaling was done so as to avoid any skewness in the results. This was done using the 'MinMaxScaler()' function from the sklearn library.
4) Then the learning rate was assigned a value, and the parameters (theta) were initialized to 1.
5) The sigmoid function was calculated as g(z) = 1 / (1 + e^(−z)).
6) Then batch gradient descent was implemented and iterated until the minimum cost was reached.
7) The cost function was plotted against the iteration count for different learning rates, as given below (a minimal code sketch follows this list).
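The sketch below follows the steps listed above. The file name q2_train.txt and its layout (two feature columns plus a 0/1 label) are assumptions based on the contour plots shown later; the iteration cap is likewise assumed.

```python
# Minimal sketch of logistic regression trained with batch gradient descent.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.loadtxt("q2_train.txt")           # assumed file name/layout
X_raw, y = data[:, :2], data[:, 2]
X = MinMaxScaler().fit_transform(X_raw)     # feature scaling as in step 3
X = np.column_stack([np.ones(len(X)), X])   # add intercept column

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.ones(X.shape[1])                 # parameters initialized to 1
alpha = 0.1                                 # learning rate used in the report
m = len(y)

costs = []
for _ in range(500):
    h = sigmoid(X @ theta)
    # cross-entropy cost for logistic regression
    costs.append(-np.mean(y * np.log(h) + (1 - y) * np.log(1 - h)))
    theta -= alpha * (X.T @ (h - y)) / m    # batch gradient step

print("theta:", theta)
```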
OBSERVATIONS:
1) For the various learning rates at or below 0.1 the cost function shows excellent convergence. The graph below shows cost vs. iteration for the batch gradient descent implementation (α = 0.1).
Clearly, from the above graph it can be inferred that convergence occurs after around 50 iterations. The values of the parameters after the complete set of iterations were found to be θ = [−3.32681817, 5.82141106, 1.10280864].
2) In order to compare the actual and the predicted values, a contour plot was made between the input parameters for the training data, as shown below.
It can be inferred from the above graph that for the training data most of the predicted and actual labels coincide, and hence it can be said that the logistic regression fairly predicts the correct output for a given set of inputs. For the test data, predictions were made, and the output is represented on a similar contour plot between the input parameters, as shown below: