Logistic classification with cross-entropy loss
Julián D. Arias Londoño
August 3, 2020
1 Definition
Logistic regression (LG) is one of the basic models studied in Statistics and Machine Learning to solve two-class classification problems. The intuition behind this model is to find a polynomial function capable of splitting the feature space into two parts, i.e. the polynomial function plays the role of the decision boundary between the two classes. The aim of the training algorithm is to find the model's parameters (the polynomial's weights) such that, as far as possible, each part of the space contains samples from only one of the classes. Figure 1 shows a scatter plot of a two-class toy problem and the boundary function.
Figure 1: Scatter plot for a two-class problem and a linear decision function
Formally, given a dataset $D = \{(x_j, y_j)\}_{j=1}^{N}$, where $x_j$ is a feature vector representing a sample $j$, and $y_j$ is its corresponding target output, which can take one of two possible values $\{0, 1\}$, the aim is to build a function able to predict whether a new sample belongs to class 0 or 1. The LG model is composed of a polynomial function wrapped by a logistic function; it can be expressed as:

$$g(w^T x) = \frac{1}{1 + \exp(-w^T x)} \qquad (1)$$
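As a concrete illustration, Eq. (1) can be evaluated for a whole dataset at once. The following is a minimal sketch in Python/NumPy; the names logistic and predict_proba are chosen here for illustration, and a bias column of ones is assumed to be part of the feature matrix:

import numpy as np

def logistic(z):
    # Logistic (sigmoid) function of Eq. (1): 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, X):
    # Probability of class 1 for each row of X, i.e. g(w^T x_j).
    # X is an (N, d) matrix whose rows are the feature vectors x_j;
    # a bias column of ones is assumed to have been appended to X.
    return logistic(X @ w)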
The logistic function is chosen because it is a differentiable approximation of the sign function, and thus it can be used with gradient-based optimization methods.
Figure 2 shows a graphic representation of the logistic function.
Figure 2: Logistic function
In order to train the model, we need to define a loss function that can be used for optimization purposes. Typically, the LG model uses the well-known cross-entropy function as its cost function. Taking into account that the logistic function provides a value in the interval [0, 1], it can be interpreted as the probability of belonging to class 1. Therefore, we can use the Maximum Likelihood criterion applied to a Bernoulli distribution to derive the function we want to optimize. Assuming the samples are i.i.d., the log-likelihood function can be estimated as:
$$\begin{aligned}
\arg\max_{w} L &= \log \prod_{j=1}^{N} p_j^{y_j} (1 - p_j)^{(1 - y_j)} \\
&= \log \prod_{j=1}^{N} \left(g(w^T x_j)\right)^{y_j} \left(1 - g(w^T x_j)\right)^{(1 - y_j)} \\
&= \sum_{j=1}^{N} \log \left(g(w^T x_j)\right)^{y_j} + \log \left(1 - g(w^T x_j)\right)^{(1 - y_j)} \\
&= \sum_{j=1}^{N} y_j \log g(w^T x_j) + (1 - y_j) \log \left(1 - g(w^T x_j)\right) \qquad (2)
\end{aligned}$$
For the sake of numerical stability, and in order to use a minimization algorithm instead of a maximization one, the final cross-entropy loss function is given by:
$$J(w) = -\frac{1}{N} \sum_{j=1}^{N} y_j \log g(w^T x_j) + (1 - y_j) \log \left(1 - g(w^T x_j)\right) \qquad (3)$$
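As a quick illustration, Eq. (3) can be computed with a short NumPy sketch, reusing predict_proba from the sketch above; the clipping constant 1e-12 is an assumption added here purely to avoid log(0):

def cross_entropy_loss(w, X, y):
    # Cross-entropy loss J(w) of Eq. (3), averaged over the N samples.
    p = predict_proba(w, X)                # g(w^T x_j) for every sample
    p = np.clip(p, 1e-12, 1.0 - 1e-12)     # avoid log(0) for numerical stability
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))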
One of the most common optimization algorithms used for LG is Gradient Descent, which is based on iteratively applying the following rule:

$$w(\tau) = w(\tau - 1) - \eta \nabla J(w) \qquad (4)$$
In order to apply the former rule, the gradient of J(w) must be estimated. The first step to get the gradient is to estimate the derivative of the logistic function:
$$\begin{aligned}
\nabla_w g(w^T x) &= \nabla_w \frac{1}{1 + \exp(-w^T x)} \\
&= \frac{\exp(-w^T x)\, x}{\left(1 + \exp(-w^T x)\right)^2} \\
&= \frac{\exp(-w^T x)}{1 + \exp(-w^T x)} \cdot \frac{x}{1 + \exp(-w^T x)} \\
&= g(w^T x)\left(1 - g(w^T x)\right) x \qquad (5)
\end{aligned}$$
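The scalar form of the identity in Eq. (5), g'(z) = g(z)(1 - g(z)), can be sanity-checked against a finite-difference approximation. This is only a small sketch; the step size 1e-6 and the evaluation points are arbitrary choices:

# Finite-difference check of g'(z) = g(z) * (1 - g(z)) at a few points.
eps = 1e-6
for z in (-2.0, 0.0, 3.0):
    numeric = (logistic(z + eps) - logistic(z - eps)) / (2.0 * eps)
    analytic = logistic(z) * (1.0 - logistic(z))
    print(z, numeric, analytic)  # both values should agree to ~6 decimals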
Based on the former result, it is quite easy to estimate the gradient of the cross-entropy function as:

$$\begin{aligned}
\nabla_w J(w) &= \nabla_w \left( -\frac{1}{N} \sum_{j=1}^{N} y_j \log g(w^T x_j) + (1 - y_j) \log \left(1 - g(w^T x_j)\right) \right) \\
&= -\frac{1}{N} \sum_{j=1}^{N} y_j \frac{\nabla_w g(w^T x_j)}{g(w^T x_j)} + (1 - y_j) \frac{-\nabla_w g(w^T x_j)}{1 - g(w^T x_j)} \qquad (6)
\end{aligned}$$
By replacing Eq. (5) into Eq. (6) we obtain:

$$\begin{aligned}
\nabla_w J(w) &= -\frac{1}{N} \sum_{j=1}^{N} y_j \left(1 - g(w^T x_j)\right) x_j - (1 - y_j)\, g(w^T x_j)\, x_j \\
&= \frac{1}{N} \sum_{j=1}^{N} \left( g(w^T x_j) - y_j \right) x_j \qquad (7)
\end{aligned}$$
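Putting Eqs. (4) and (7) together, a basic batch gradient-descent training loop could look as follows. This is a sketch only, reusing the helpers defined above; the learning rate, number of iterations, and the synthetic data in the usage example are arbitrary assumptions:

def gradient(w, X, y):
    # Gradient of J(w), Eq. (7): (1/N) * sum_j (g(w^T x_j) - y_j) * x_j.
    return X.T @ (predict_proba(w, X) - y) / X.shape[0]

def fit_logistic(X, y, eta=0.1, n_iters=1000):
    # Batch gradient descent, Eq. (4): w <- w - eta * grad J(w).
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        w = w - eta * gradient(w, X, y)
    return w

# Toy usage on synthetic data (for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
X = np.hstack([np.ones((200, 1)), X])   # prepend the bias column of ones
w = fit_logistic(X, y)
print("training accuracy:", np.mean((predict_proba(w, X) >= 0.5) == y))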
The expression in Eq. (7) is similar to the one obtained for multiple linear regression using the least-squares error cost function, which in turn can be derived from the maximum likelihood criterion applied to a normal distribution instead [1].
References
[1] C. M. Bishop, Pattern recognition and machine learning. Springer, 2006.