ASSIGNMENT-1
ABHISHEK SHRINGI
03-03-2021
CLL788
PROF. HARIPRASAD KODAMANA
Question 1:
PART-(A):
For “data_1”
Scatter plot
Inference:
From the scatter plot we can infer that the given data forms a cluster within a certain range of values, and the points lying far away from this cluster are potential outliers.
Histograms
Boxplot showing x and y distribution
Heatmap
Question 1:
PART-(A):
For “data_3”
Scatter plot
Inference:
From the scatter plot we can infer that the given data forms a cluster within a certain range of values, and the points lying far away from this cluster are potential outliers.
Histograms
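As a reference, the following is a minimal sketch of how the Part (A) plots could be produced in Python with matplotlib. The file name data_1.txt, the whitespace-separated two-column layout, and the bin counts are assumptions, since the report does not show the plotting code.

```python
# Minimal sketch of the Part (A) plots; file name and layout are assumptions.
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt("data_1.txt")          # assumed: two whitespace-separated columns
x, y = data[:, 0], data[:, 1]

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Scatter plot of y against x
axes[0, 0].scatter(x, y, s=10)
axes[0, 0].set(title="Scatter plot", xlabel="x", ylabel="y")

# Histograms of both variables
axes[0, 1].hist(x, bins=30, alpha=0.6, label="x")
axes[0, 1].hist(y, bins=30, alpha=0.6, label="y")
axes[0, 1].set(title="Histograms")
axes[0, 1].legend()

# Boxplot showing the x and y distributions
axes[1, 0].boxplot([x, y])
axes[1, 0].set_xticklabels(["x", "y"])
axes[1, 0].set(title="Boxplot")

# Heatmap (2D histogram) of the joint distribution
h = axes[1, 1].hist2d(x, y, bins=30)
axes[1, 1].set(title="Heatmap", xlabel="x", ylabel="y")
fig.colorbar(h[3], ax=axes[1, 1])

plt.tight_layout()
plt.show()
```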
Question 1:
PART-(C):
For “data_1”
Parameter            x                 y
Mean                 4.9397            5.0429
Median               4.9242            5.0747
Variance             0.9718            1.014
Range                (2.373, 8.117)    (2.181, 8.190)

Percentiles (deciles):
10th                 3.68476102        3.73504038
20th                 4.08570612        4.21923318
30th                 4.4123583         4.55248202
40th                 4.700721          4.87424146
50th                 4.9242784         5.07476839
60th                 5.18017309        5.36513593
70th                 5.46958487        5.57664436
80th                 5.78852584        5.87166202
90th                 6.20755249        6.21449875
100th                8.1170447         8.19010885

Quartiles:
25th (Q1)            4.30398749        4.33146421
50th (Q2)            4.9242784         5.07476839
75th (Q3)            5.60721395        5.6823799
100th                8.1170447         8.19010885

Skewness             -0.00136          -0.1207
Kurtosis             -0.2039           0.09080
For “data_3”
Parameter            x                 y
Mean                 5.0824            4.9528
Median               5.0548            4.9788
Variance             2.6572            2.629488617717448
Range                (-1.458, 12.267)  (-1.004, 10.58)

Percentiles (deciles):
10th                 3.43512982        3.28478833
20th                 4.09472424        3.95274238
30th                 4.50069037        4.3584519
40th                 4.81775713        4.69351167
50th                 5.05487549        4.97884708
60th                 5.38188447        5.25595864
70th                 5.70739195        5.56424354
80th                 6.04270545        5.94105513
90th                 6.53338054        6.59228622
100th                12.26702453       10.58925244

Quartiles:
25th (Q1)            4.31898111        4.21810056
50th (Q2)            5.05487549        4.97884708
75th (Q3)            5.89160317        5.7826677
100th                12.2670245        10.58925244

Skewness             0.0413            -0.0825
Kurtosis             3.30098           2.417
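The statistics tabulated above can be reproduced with numpy and scipy. The sketch below is a minimal version, assuming each data file holds two whitespace-separated columns; the file name data_1.txt is likewise an assumption.

```python
# Minimal sketch of the Part (C) summary statistics; file name/layout assumed.
import numpy as np
from scipy import stats

data = np.loadtxt("data_1.txt")
x, y = data[:, 0], data[:, 1]

for name, v in (("x", x), ("y", y)):
    print(f"--- {name} ---")
    print("Mean:     ", np.mean(v))
    print("Median:   ", np.median(v))
    print("Variance: ", np.var(v))            # ddof=1 would give the sample variance
    print("Range:    ", (v.min(), v.max()))
    print("Deciles:  ", np.percentile(v, np.arange(10, 101, 10)))
    print("Quartiles:", np.percentile(v, [25, 50, 75, 100]))
    print("Skewness: ", stats.skew(v))
    print("Kurtosis: ", stats.kurtosis(v))    # Fisher (excess) kurtosis by default
```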
Question 1:
PART-(D):
By the standard deviation approach (for data_3):

Outliers in x           Outliers in y
10.10393747643514       10.43902165955905
10.67798532964827       10.55061243676815
12.2670245277286        10.58925243952102
11.44965456902049       10.11551641326384
10.28782093813039       -0.5334956955315495
-1.458402520742969      -0.2557338130229054
-1.171853375119153      -1.004100360593079
-0.763132811291513      -0.5528195925039532
                        -0.2198563833130491
                        -0.2439263484483771
By the median absolute deviation (MAD) approach (for data_3):

Outliers in x           Outliers in y
10.10393747643514       9.745989964192137
9.14841306110033        9.265264240840457
9.630313376450168       10.43902165955905
9.109233422770801       9.219071559753377
9.37144942459267        9.717401494534483
9.005201225564582       8.5540601585995
9.039507424436444       10.55061243676815
10.67798532964827       9.749970745029252
9.510497893324324       9.127858533076473
12.2670245277286        8.581245337617629
9.5456448855292         8.529730294461498
9.375609831291062       10.58925243952102
8.780482242073463       8.693157145943976
11.44965456902049       9.431652444355333
9.321302250247287       10.11551641326384
10.28782093813039       0.739593465779022
-1.458402520742969      1.028275835883979
-1.171853375119153      1.336612395334499
1.172063763666684       1.289365241997683
1.180680541173682       0.357106853679473
0.7916468945614012      0.303137254711518
1.400723237983714       0.580120633348619
-0.763132811291513      1.271821699448418
1.232122283140872       0.365306465834663
0.7725850670343857      0.432114885726590
1.179109482383718       0.468033364310671
-0.2198563833130491     -0.53349569553154
0.7398399056096148      1.154781275984739
-0.2439263484483771     -0.25573381302290
0.5680693847459939      0.434412418225085
1.083946032586385       -1.00410036059307
0.9053918140205244      0.747056391444499
1.286385738350685       1.047516482080888
                        -0.55281959250395
                        0.957823807905711
                        1.421603535576362
Inference:
From the results we can see that the number of outliers obtained through the MAD approach is greater. This is because the standard deviation approach relies on the mean of the data, which is itself influenced by the outliers. The MAD approach, however, uses the median of the data, and is therefore less affected by outliers and can detect them more easily.
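A minimal sketch of the two detection rules compared above is given below. The cut-off multipliers (3 for both rules) and the file name data_3.txt are assumptions, since the report does not state the thresholds used.

```python
# Minimal sketch of the two outlier-detection rules; thresholds assumed.
import numpy as np

def outliers_std(v, k=3.0):
    """Flag points more than k standard deviations from the mean."""
    mu, sigma = np.mean(v), np.std(v)
    return v[np.abs(v - mu) > k * sigma]

def outliers_mad(v, k=3.0):
    """Flag points more than k scaled MADs from the median."""
    med = np.median(v)
    mad = np.median(np.abs(v - med))
    # 1.4826 makes the MAD a consistent estimator of sigma for normal data
    return v[np.abs(v - med) > k * 1.4826 * mad]

data = np.loadtxt("data_3.txt")   # assumed file name/layout
x, y = data[:, 0], data[:, 1]
print("Std-dev outliers in x:", outliers_std(x))
print("MAD outliers in x:    ", outliers_mad(x))
```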
Question 2: Batch Gradient Descent
STEPS DONE:
1) Input of data from file "q1"
2) Then this data was divided into input (x) and output (y)
3) Then the learning rate was assigned a value, and the parameters (theta) were initialized to 1.
4) Then batch gradient descent was implemented and iterated until the minimum cost was reached.
5) The cost function was plotted against the iteration count for different learning rates, as given below.
Inference: We see that the cost decreases with each iteration, showing that it gradually converges to a particular value. On reaching this value the cost becomes constant, stopping the batch gradient descent algorithm and giving us the corresponding values of theta. These thetas are then used to fit a straight line to the data. As seen from the plot on the left, the line fits the data quite well, showing that the model works well.
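A minimal sketch of the batch gradient descent described above, for a straight-line fit y = θ₀ + θ₁x. The file name q1.txt, its two-column layout, the learning rate value, the iteration cap, and the stopping tolerance are assumptions.

```python
# Minimal sketch of batch gradient descent for a straight-line fit.
import numpy as np

data = np.loadtxt("q1.txt")                 # assumed file name/layout
x, y = data[:, 0], data[:, 1]
X = np.column_stack([np.ones_like(x), x])   # add intercept column
theta = np.ones(2)                          # parameters initialized to 1
alpha = 0.01                                # learning rate (assumed value)
m = len(y)

costs = []
for _ in range(1000):
    error = X @ theta - y
    costs.append((error @ error) / (2 * m))    # least squares cost
    theta -= alpha * (X.T @ error) / m         # batch update over all samples
    if len(costs) > 1 and abs(costs[-2] - costs[-1]) < 1e-9:
        break                                  # stop once the cost is flat

print("theta:", theta)
```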
Question 2: Stochastic Gradient Descent
STEPS DONE:
1) Input of data from file "q1"
2) Then this data was divided into input (x) and output (y)
3) Then the learning rate was assigned a value, and the parameters (theta) were initialized to 1.
4) Then stochastic gradient descent was implemented and iterated until the minimum cost was reached.
5) The cost function was plotted against the iteration count for different learning rates, as given below.
Inference: We see that the cost decreases with each iteration, showing that it gradually converges to a particular value. However, on comparing this with batch gradient descent, we see that the path followed by stochastic gradient descent is quite noisy. This agrees with the expected behaviour: in this algorithm we update the hypothesis after each example, as a result of which we may not always follow the optimum path.
Once the algorithm converges we obtain the corresponding values of theta. These thetas are then used to fit a straight line to the data. As seen from the plot on the left, the line fits the data quite well, showing that the model created is working.
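A minimal sketch of the stochastic variant: the same setup as the batch version, but the parameters are updated after every individual example, which produces the noisy cost trace noted above. The file name, learning rate, and epoch count are again assumptions.

```python
# Minimal sketch of stochastic gradient descent (per-example updates).
import numpy as np

data = np.loadtxt("q1.txt")                 # assumed file name/layout
x, y = data[:, 0], data[:, 1]
X = np.column_stack([np.ones_like(x), x])
theta = np.ones(2)
alpha = 0.01                                # learning rate (assumed value)
m = len(y)

costs = []
rng = np.random.default_rng(0)
for epoch in range(50):
    for i in rng.permutation(m):            # shuffle the examples each epoch
        error_i = X[i] @ theta - y[i]
        theta -= alpha * error_i * X[i]     # update from a single example
    error = X @ theta - y
    costs.append((error @ error) / (2 * m))

print("theta:", theta)
```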
Question 2: Least Squares Closed-Form Solution
STEPS DONE:
1) Input of data from file "q1"
2) Then this data was divided into input (x) and output (y)
3) Then the least squares closed-form solution was computed according to the following formula, directly giving the optimum value of the parameters (no learning rate, initialization, or iterations are needed):
θ = (XᵀX)⁻¹Xᵀy
4) The resulting straight-line fit was plotted, as given below.
Inference: As expected, the model obtained through this method appears to be the best fit, since here we use a theoretical equation to arrive directly at the optimum value. Note that this is also expected to be the fastest algorithm, as it involves the fewest computations and no loops in the code.
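A minimal sketch of the closed-form solution using the formula above; only the file name and layout are assumptions.

```python
# Minimal sketch of the closed-form least squares solution
# theta = (X^T X)^{-1} X^T y; no learning rate or iterations are involved.
import numpy as np

data = np.loadtxt("q1.txt")                 # assumed file name/layout
x, y = data[:, 0], data[:, 1]
X = np.column_stack([np.ones_like(x), x])

# np.linalg.solve is preferred over an explicit matrix inverse for stability
theta = np.linalg.solve(X.T @ X, X.T @ y)
print("theta:", theta)
```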
Question 2b:
x y w h
6.2101 17.612 0.023960093 17.612
5.6277 9.2302 0.000504599 9.078502328
8.6186 13.762 0.113718462 13.52965597
7.1032 11.954 0.639492951 10.80446502
theta0 theta1
0 0
0.004219852 0.0262057
0.004265662 0.026463506
0.019651378 0.159066843
0.088745171 0.649853868
Question 2c:
Using suitable python libraries, I was able to implement the following regression algorithms:
Ridge Regression:
Lasso Regression:
Elastic Net Regression:
Inference: From the three plots above we can see that each of the three algorithms has almost the same performance and fits the data equally well. It was expected that lasso regression would perform better; however, as the given dataset has only one input feature, lasso regression underperforms (this can be observed by calculating the model score). It is well known that lasso regression tries to reduce the number of input parameters and works well for datasets with a large number of input features, whereas ridge regression is more suitable for datasets with fewer data points.
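A minimal sketch of the three regularized regressions compared above, using scikit-learn. The regularization strengths (alpha, l1_ratio) and the file name are assumptions, since the report does not state which values were used.

```python
# Minimal sketch of ridge, lasso, and elastic net regression with sklearn.
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

data = np.loadtxt("q1.txt")                 # assumed file name/layout
X, y = data[:, :1], data[:, 1]              # single input feature

for model in (Ridge(alpha=1.0),
              Lasso(alpha=0.1),
              ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    # model.score returns R^2, the "model score" mentioned in the inference
    print(type(model).__name__, "score:", model.score(X, y))
```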
Question 3:
STEPS DONE:
1) Input of data from files "q2_train" and "q2_test"
2) Then this data was divided into input (x) and output (y)
3) Then, as the input involves more than one feature (independent variable), feature scaling was done so as to avoid any skewness in the results. This was done using the 'MinMaxScaler()' function from the sklearn library.
4) Then the learning rate was assigned a value, and the parameters (theta) were initialized to 1.
5) The sigmoid function was calculated as g(z) = 1 / (1 + e^(−z)).
6) Then batch gradient descent was implemented and iterated until the minimum cost was reached.
7) The cost function was plotted against the iteration count for different learning rates, as given below (a minimal code sketch follows this list).
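The sketch below follows the steps listed above. The file name q2_train.txt and its layout (two feature columns plus a 0/1 label) are assumptions based on the contour plots shown later; the iteration cap is likewise assumed.

```python
# Minimal sketch of logistic regression trained with batch gradient descent.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.loadtxt("q2_train.txt")           # assumed file name/layout
X_raw, y = data[:, :2], data[:, 2]
X = MinMaxScaler().fit_transform(X_raw)     # feature scaling as in step 3
X = np.column_stack([np.ones(len(X)), X])   # add intercept column

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.ones(X.shape[1])                 # parameters initialized to 1
alpha = 0.1                                 # learning rate used in the report
m = len(y)

costs = []
for _ in range(500):
    h = sigmoid(X @ theta)
    # cross-entropy cost for logistic regression
    costs.append(-np.mean(y * np.log(h) + (1 - y) * np.log(1 - h)))
    theta -= alpha * (X.T @ (h - y)) / m    # batch gradient step

print("theta:", theta)
```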
OBSERVATIONS:
1) For the various learning rates at or below 0.1 the cost function shows excellent convergence. The graph below shows cost vs. iteration for the batch gradient descent implementation (α = 0.1).
Clearly, from the above graph it can be inferred that convergence occurs after around 50 iterations. The values of the parameters after the complete set of iterations were found to be θ = [−3.32681817, 5.82141106, 1.10280864].
2) In order to compare the actual and the predicted values, a contour plot was made between the input parameters for the training data, as shown below.
It can be inferred from the above graph that for the training data most of the predicted and actual labels coincide, and hence it can be said that the logistic regression fairly predicts the correct output for a given set of inputs. For the test data, predictions were made, and the output is represented on a similar contour plot between the input parameters, as shown below: