AI for Mechanical Engineering
Dr. Arsalan Arif
Artificial Intelligence: A Modern Approach
Stuart J. Russell and Peter Norvig
Spring 2024
Linear Regression
Examples:
- Weight gain vs. intake of food: as the independent variable increases, the dependent variable also increases (a positive relationship).
- Weight gain vs. energy expenditure: as the independent variable increases, the dependent variable decreases (a negative relationship).
The regression line is chosen to minimize the difference (the error) between the estimated and actual values of the dependent variable.
(Figure: scatter plots with the independent variable on the x-axis and the dependent variable on the y-axis, showing the regression line and the errors.)
Linear Regression
The fitted line has the form

\hat{y} = b_0 + b_1 x_1

(the familiar y = mx + c, with slope b_1 and intercept b_0), where the slope is

b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
For the data below, b_1 = 6/10 = 0.6. Since the regression line passes through the mean point (\bar{x}, \bar{y}) = (3, 4), the intercept b_0 follows from

4 = b_0 + 0.6 \times 3 \quad\Rightarrow\quad b_0 = 2.2

so the fitted line is \hat{y} = b_0 + b_1 x_1 = 2.2 + 0.6 x_1.
 x     y     x−x̄   y−ȳ   (x−x̄)²   (x−x̄)(y−ȳ)
 1     2     −2    −2    4        4
 2     4     −1    0     1        0
 3     5     0     1     0        0
 4     4     1     0     1        0
 5     5     2     1     4        2
 x̄=3   ȳ=4               Σ=10     Σ=6

Hence b_1 = 6/10 = 0.6 and b_0 = 4 − 0.6 × 3 = 2.2.
(Figure: scatter plot of the five data points.)
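A minimal NumPy sketch of the same calculation (the five data points are from the table; the variable names are mine):

    import numpy as np

    # Data points from the worked example
    x = np.array([1, 2, 3, 4, 5])
    y = np.array([2, 4, 5, 4, 5])

    # Slope: b1 = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

    # Intercept: the line passes through the mean point (x_bar, y_bar)
    b0 = y.mean() - b1 * x.mean()

    print(b1, b0)  # 0.6 2.2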
Linear Regression: Goodness of Fit
The coefficient of determination measures how much of the variation in y the line explains:

R^2 = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2} = \frac{3.6}{6} = 0.6

Its range is from 0 to 1, and R² = 0.6 means the line is a reasonably good fit. The fitted line is \hat{y} = 2.2 + 0.6 x.
(Figure: the five data points with the fitted line ŷ = 2.2 + 0.6x.)
 x     y     x−x̄   y−ȳ   (y−ȳ)²   ŷ=2.2+0.6x   ŷ−ȳ    (ŷ−ȳ)²   (x−x̄)²   (x−x̄)(y−ȳ)
 1     2     −2    −2    4        2.8          −1.2   1.44     4        4
 2     4     −1    0     0        3.4          −0.6   0.36     1        0
 3     5     0     1     1        4.0          0      0        0        0
 4     4     1     0     0        4.6          0.6    0.36     1        0
 5     5     2     1     1        5.2          1.2    1.44     4        2
 x̄=3   ȳ=4               Σ=6                          Σ=3.6    Σ=10     Σ=6

Hence R² = 3.6/6 = 0.6.
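The same check in NumPy (a sketch; y_hat is my name for the predicted values):

    import numpy as np

    x = np.array([1, 2, 3, 4, 5])
    y = np.array([2, 4, 5, 4, 5])

    y_hat = 2.2 + 0.6 * x  # predictions from the fitted line

    # R^2 = explained variation / total variation
    r2 = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)
    print(r2)  # 0.6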
Bias (Underfitting)
- The gap between the actual and estimated values: high bias means the estimated values are far from the actual values, and vice versa.
- Arises when the algorithm has limited flexibility to learn: it pays too little attention to the training data and oversimplifies the model.
- Such models lead to high error on both the training and the test data.
Variance (Overfitting)
- How scattered the estimated values are.
- A model with high variance pays too much attention to the training data and does not generalize to unseen data; this is overfitting.
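A minimal sketch contrasting the two failure modes with polynomial fits of different flexibility; the data here is invented purely for illustration:

    import numpy as np

    # Illustrative data (assumed): a quadratic trend plus noise
    rng = np.random.default_rng(0)
    x_train = np.linspace(0, 5, 10)
    x_test = np.linspace(0.25, 4.75, 10)
    y_train = 0.5 * x_train**2 + rng.normal(0, 0.4, 10)
    y_test = 0.5 * x_test**2 + rng.normal(0, 0.4, 10)

    # Degree 1 has limited flexibility (high bias); degree 9 has too much (high variance)
    for degree in (1, 9):
        coeffs = np.polyfit(x_train, y_train, degree)
        train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
    # Expect: degree 1 errs on both sets (underfitting); degree 9 fits the
    # training points almost exactly but errs on the test set (overfitting).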
Anaconda
- Anaconda is an open-source (free) Python distribution for machine learning, bundled with tools such as Spyder.
- Jupyter Notebook is a platform for interactive coding.
- Google Colab provides free GPU access online (a faster paid tier is also available): more cores, hence more parallel computation.
- Pandas: an Application Programming Interface (API) for data analysis.
- NumPy: Numerical Python.
Pandas
- Pandas is an open-source Python data-analysis library providing high-performance data structures and tools.
- read_csv() reads a "csv" file.
- head() shows the first 5 samples by default; the number of rows can be chosen by passing the required count.
- describe() gives quick data visibility through summary statistics: for example, a 25% quartile of 7 means 25% of your data is less than or equal to 7, and a 50% quartile of 7.5 means 50% of your data is less than or equal to 7.5.
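A minimal sketch of these pandas calls; "data.csv" is a placeholder file name, not a file from the course:

    import pandas as pd

    df = pd.read_csv("data.csv")  # read a CSV file (placeholder name)

    print(df.head())      # first 5 samples by default
    print(df.head(10))    # or choose the required number of rows
    print(df.describe())  # summary statistics, including the 25% / 50% / 75% quartiles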
Split the Data: Training, Validation and Testing
- Test size = 40% of the data.
- Call the scikit-learn (sklearn) library for data splitting (see the sketch below).
- Train the system on the training set and validate it on the validation set.
- Keep some percentage of the data unseen by the system for testing, then report the results on that test data.
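A sketch of the split using scikit-learn's train_test_split with the slide's 40% test size; X and y reuse the five-point example from earlier:

    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.array([[1], [2], [3], [4], [5]])  # features from the earlier example
    y = np.array([2, 4, 5, 4, 5])

    # Hold out 40% of the data as the unseen test set
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.4, random_state=0)

    print(X_train.shape, X_test.shape)  # (3, 1) (2, 1)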
Regularization
Calibrates regression models to prevent under- or overfitting by minimizing an adjusted (penalized) loss function:

Cost = Loss + λ × penalty(w)

where Loss = the sum of the squared errors, λ = the penalty weight, and w = the slope of the line.
- Ridge regularization uses the L2 norm: Cost = Loss + λ w²
- Lasso regularization uses the L1 norm: Cost = Loss + λ |w|

Worked example (ridge penalty, λ = 1), comparing two candidate lines:
- A steep line passing through the training points exactly: Loss = 0, w = 1.4, so Cost = 0 + 1 × (1.4)² = 1.96.
- A flatter line: Loss = 0.13, w = 0.7, so Cost = 0.13 + 1 × (0.7)² ≈ 0.61.
The flatter line has the lower total cost, so ridge regularization prefers it even though it does not fit the training points perfectly.
(Figure: the training points with the steep overfit line and the flatter regularized line.)
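A sketch of ridge and lasso in scikit-learn on the earlier five-point data; alpha plays the role of λ here, and the data differs from the slide's two-line example:

    import numpy as np
    from sklearn.linear_model import Ridge, Lasso

    X = np.array([[1], [2], [3], [4], [5]])
    y = np.array([2, 4, 5, 4, 5])

    # alpha is the penalty weight (the slide's λ)
    for model in (Ridge(alpha=1.0), Lasso(alpha=1.0)):
        model.fit(X, y)
        print(type(model).__name__, model.coef_, model.intercept_)
    # Both shrink the slope w toward zero relative to plain least squares
    # (w = 0.6); lasso's L1 penalty can drive coefficients exactly to zero.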