Activation Functions
An activation function is a mathematical function applied to the output of a neuron.
It introduces non-linearity into the model, allowing the network to learn and
represent complex patterns in the data.
Without this non-linearity, a neural network would behave like a linear
regression model, no matter how many layers it has.
The activation function decides whether a neuron should be activated: the neuron
computes the weighted sum of its inputs and adds a bias term, and the activation
function is then applied to that result. This introduces non-linearity into the
output of each neuron and helps the model make complex decisions and predictions.
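As a minimal illustrative sketch in NumPy (the function and variable names here are my own, not from the original), a neuron's weighted sum plus bias is passed through an activation function:

import numpy as np

def neuron_output(x, w, b, activation):
    # weighted sum of inputs plus bias, then the activation is applied
    z = np.dot(w, x) + b
    return activation(z)

# any of the activations discussed below could be passed in;
# np.tanh is used here only as an example
y = neuron_output(np.array([1.0, 2.0]), np.array([0.5, -0.3]), 0.1, np.tanh)
print(y)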
Sigmoid Activation Function
Sigmoidal functions are frequently used in machine learning, specifically in
artificial neural networks, as a way of understanding the output of a node or
"neuron."
A sigmoid function is a type of activation function, and more specifically a
squashing function: it limits the output to a range between 0 and 1.
Sigmoid function: f(x) = 1 / (1 + exp(-x))
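A minimal NumPy sketch of the sigmoid (illustrative, not from the original slides):

import numpy as np

def sigmoid(x):
    # squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # approx. [0.007, 0.5, 0.993]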
Pros And Cons of Sigmoid Activation function
Pros:
1. Performance on binary classification is very good compared to other activation functions.
2. Clear predictions, i.e. outputs very close to 1 or 0.
Cons:
1. The calculation in the sigmoid function is relatively expensive (it involves an exponential).
2. It is not useful for multiclass classification.
3. For large negative inputs the output saturates close to 0.
4. It becomes nearly constant, giving outputs close to 1, for large positive inputs.
5. The function output is not zero-centered.
Hyperbolic Tangent (Tanh) Activation Function
This function is defined as the ratio between the hyperbolic sine and the
hyperbolic cosine functions:
tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
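A minimal NumPy sketch of tanh as the ratio of hyperbolic sine to hyperbolic cosine (illustrative; in practice np.tanh would be used directly):

import numpy as np

def tanh(x):
    # ratio of hyperbolic sine to hyperbolic cosine; output lies in (-1, 1) and is zero-centered
    return np.sinh(x) / np.cosh(x)

print(tanh(np.array([-2.0, 0.0, 2.0])))  # approx. [-0.964, 0.0, 0.964]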
Pros And Cons of Tanh Activation function
Pros:
1. The gradient is stronger for tanh than for sigmoid (its derivatives are steeper).
2. The output of tanh lies in the interval (-1, 1), and the whole function is zero-centered, which is better than sigmoid.
Cons:
1. Tanh also suffers from the vanishing gradient problem.
ReLU Activation Function
The ReLU function takes the maximum of zero and its input:
ReLU(x) = max(0, x)
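A minimal NumPy sketch of ReLU (illustrative):

import numpy as np

def relu(x):
    # max(0, x): passes positive inputs through unchanged and zeros out negative inputs
    return np.maximum(0.0, x)

print(relu(np.array([-3.0, 0.0, 2.5])))  # [0.  0.  2.5]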
Pros And Cons of ReLU Activation Function
Pros:
1. When the input is positive, there is no gradient saturation problem.
2. The calculation speed is much faster.
3. The ReLU function involves only a simple linear relationship (no exponentials).
4. Whether forward or backward, it is much faster than sigmoid and tanh.
Cons:
1. When the input is negative, ReLU is completely inactive; once a neuron only receives negative inputs, it can "die" (the dying ReLU problem).
2. The output of the ReLU function is either 0 or a positive number, which means the ReLU function is not zero-centered.
Leaky ReLU Function
It is an attempt to solve the dying ReLU problem. Instead of outputting 0 for
negative inputs, it outputs a small "leaked" value:
LeakyReLU(x) = x for x > 0, and a·x otherwise
The leak helps to increase the range of the ReLU function.
Usually, the value of a is around 0.01.
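A minimal NumPy sketch of Leaky ReLU, assuming a leak coefficient of 0.01 as mentioned above (illustrative):

import numpy as np

def leaky_relu(x, a=0.01):
    # like ReLU, but negative inputs are scaled by a small slope a instead of being zeroed
    return np.where(x > 0, x, a * x)

print(leaky_relu(np.array([-3.0, 0.0, 2.5])))  # [-0.03  0.    2.5 ]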
Pros And Cons of Leaky ReLU Activation Function
Pros:
1. There will be no problem with dead ReLU neurons.
2. A parameter-based variant, Parametric ReLU, uses f(x) = max(alpha·x, x), where alpha can be learned through backpropagation.
Cons:
1. It has not been fully proved that Leaky ReLU is always better than ReLU.
ELU (Exponential Linear Units) Function
ELU is very similar to ReLU except for negative inputs. Both act as the identity
function for non-negative inputs. For negative inputs, ELU smoothly saturates
towards -α, whereas ReLU bends sharply at zero:
ELU(x) = x for x > 0, and α·(exp(x) - 1) otherwise
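A minimal NumPy sketch of ELU; the default α = 1.0 is an assumption here, common in practice but not stated in the original:

import numpy as np

def elu(x, alpha=1.0):
    # identity for x > 0; for x <= 0 the output smoothly saturates towards -alpha
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

print(elu(np.array([-5.0, 0.0, 2.0])))  # approx. [-0.993, 0.0, 2.0]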
Pros And Cons of ELU Activation function
Pros:
1. For negative inputs, ELU smoothly saturates towards -α, whereas ReLU bends sharply at zero.
2. ELU is a strong alternative to ReLU.
3. Unlike ReLU, ELU can produce negative outputs.
Cons:
1. For x > 0, it can blow up the activation, since the output on that side is unbounded in [0, ∞).
Softmax Function
The softmax function calculates the probability distribution of an event over
n different events. In other words, it calculates the probability of each
target class over all possible target classes:
softmax(x_i) = exp(x_i) / Σ_j exp(x_j)
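A minimal NumPy sketch of softmax, with the usual max-subtraction for numerical stability (an implementation detail I am assuming, not stated in the original):

import numpy as np

def softmax(x):
    # exponentiate (shifted by the max for numerical stability) and normalise so the outputs sum to 1
    e = np.exp(x - np.max(x))
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # approx. [0.659, 0.242, 0.099]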
Pros And Cons of Softmax Activation function
Pros:
1. It mimics one-hot encoded labels better than absolute values would.
2. If we used the absolute (modulus) values we would lose information, while the exponential intrinsically takes care of this.
Cons:
1. The softmax function should not be used for multi-label classification; the sigmoid function (discussed above) is preferred there.
2. The softmax function should not be used for a regression task either.
Swish Function
Swish's design was inspired by the use of sigmoid functions for gating in LSTMs
and highway networks. It uses the same value for the gate and for the gated
input, which simplifies the gating mechanism and is called self-gating:
Swish(x) = x · sigmoid(x)
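A minimal NumPy sketch of Swish in its common self-gated form x · sigmoid(x) (illustrative):

import numpy as np

def swish(x):
    # self-gating: the input is multiplied by a sigmoid of itself
    return x / (1.0 + np.exp(-x))

print(swish(np.array([-3.0, 0.0, 3.0])))  # approx. [-0.142, 0.0, 2.858]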
Pros And Cons of Swish Activation function
Pros:
1. No dying ReLU problem.
2. Increase in accuracy over ReLU.
3. Outperforms ReLU at every batch size.
Cons:
1. Slightly more computationally expensive.
2. More problems with the function may arise over time, since it is relatively new.
Maxout Function
The Maxout activation function is defined as the maximum over a set of learned
affine pieces, for example:
Maxout(x) = max(w1·x + b1, w2·x + b2)
The Maxout activation is a generalization of the ReLU and the leaky ReLU
functions. It is a learnable activation function.
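A minimal NumPy sketch of a single maxout unit with two affine pieces (the shapes and example values are assumptions for illustration):

import numpy as np

def maxout(x, W, b):
    # maximum over k affine pieces: max_j (W[j] . x + b[j])
    return np.max(W @ x + b)

# k = 2 pieces over a 3-dimensional input
W = np.array([[1.0, -0.5, 0.2],
              [-1.0, 0.5, -0.2]])
b = np.array([0.0, 0.1])
print(maxout(np.array([1.0, 2.0, 3.0]), W, b))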
Pros And Cons of Maxout Activation function
Pros:
1. It is a learnable activation function.
Cons:
1. It doubles the number of parameters for each neuron, so a higher total number of parameters needs to be trained.
Softplus Activation Function
The softplus function is similar to the ReLU function, but it is relatively
smooth. Like ReLU, it performs unilateral suppression, and it has a wide
acceptance range (0, +∞).
Softplus function: f(x) = ln(1 + exp(x))
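A minimal NumPy sketch of softplus (using log1p for slightly better numerical behaviour; that choice is mine, not from the original):

import numpy as np

def softplus(x):
    # smooth approximation of ReLU: ln(1 + exp(x)), always positive
    return np.log1p(np.exp(x))

print(softplus(np.array([-3.0, 0.0, 3.0])))  # approx. [0.049, 0.693, 3.049]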
Pros And Cons of Softplus Activation function
Pros:
1. It is relatively smooth.
2. It performs unilateral suppression like ReLU.
3. It has a wide acceptance range (0, +∞).
Cons:
1. Leaky ReLU is a piecewise linear function, just like ReLU, and is therefore quicker to compute than softplus. ELU has the advantage over softplus and ReLU that its mean output is closer to zero, which improves learning.