Neural Network Notes

The document explains how neural networks can resolve the XOR problem using a specific architecture consisting of an input layer, a middle layer with ReLU-based units, and an output layer. It details the process of calculating outputs through weighted sums and activation functions, emphasizing the advantages of using ReLU and tanh over sigmoid functions. Additionally, it describes the structure and training of feed-forward neural networks for classification tasks.


How do Neural Networks resolve the XOR problem?

ARCHITECTURE: 1 input layer, x1 and x2

1 middle (hidden) layer, h1 and h2

1 output layer, y1

# Middle layer consists of 2 ReLU-based units

[Diagram]

x1  x2  |  x1 XOR x2
0   0   |  0
0   1   |  1
1   0   |  1
1   1   |  0

When the input (0, 0) reaches the middle-layer unit h1, the weights [1, 1] are applied to the input values to obtain:

0(1) + 0(1) = 0

Adding the bias term (-1):

0 + (-1) = -1

However, since ReLU-based units produce an output of 0 for all negative values, the output value for h1 = 0.

Similarly, h2 yields a value of 0.

∴ The middle layer yields the vector [0, 0]; applying the output-layer weights [-2, 1] and the bias term to it gives a weighted sum of 0, so the output of the entire neural network is 0.

When this entire process is repeated for all the input pairs in the table, the values produced are exactly those of the XOR operation.
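A minimal Python sketch of this computation (added for illustration, not part of the original notes). The h1 weights [1, 1] with bias -1 and the output weights [-2, 1] follow the notes; h2's bias (0) and the output bias (0) are assumptions chosen so that the network reproduces XOR exactly.

    def relu(z):
        # ReLU outputs 0 for negative values and passes positives through
        return max(0.0, z)

    def xor_net(x1, x2):
        h1 = relu(1 * x1 + 1 * x2 - 1)   # weights [1, 1], bias -1 (from the notes)
        h2 = relu(1 * x1 + 1 * x2 + 0)   # weights [1, 1], bias 0 (assumed)
        return -2 * h1 + 1 * h2 + 0      # output weights [-2, 1], bias 0 (assumed)

    for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x1, x2, xor_net(x1, x2))   # prints 0, 1, 1, 0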

PREREQUISITES

=> Concept of Neural Networks:

A neural network is simply a network of neural computing units, each of which takes in a vector of inputs and produces a single output value.

[Diagram]

A neural computing unit is the fundamental building block of a neural network.

It takes in a vector of input values, performs some computation on them and produces an output value.

When a neural computing unit receives a vector of input values, it performs a weighted sum on these input values and then adds a bias to the result of this weighted sum.

The result is then passed into some non-linear function, known as the activation function, to produce an output value.

Eg: w = [0.2, 0.2, 0.2, 0.1], b = 0.5

x = [5.0, 4.0, 1.0, 2.0]

weighted sum = 0.2(5) + 0.2(4) + 0.2(1) + 0.1(2)

= 2.2
adding bias = 2.2 + 0.5 = 2.7 → value supplied to the activation function
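A small Python sketch of a single neural computing unit using the example weights, bias, and inputs above. The sigmoid at the end is just one possible choice of activation function (it is introduced in the Activation Functions section below).

    import math

    def unit_output(w, x, b):
        # weighted sum of the inputs plus the bias term
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        # pass the result through an activation function (sigmoid here)
        return 1.0 / (1.0 + math.exp(-z))

    w = [0.2, 0.2, 0.2, 0.1]
    x = [5.0, 4.0, 1.0, 2.0]
    b = 0.5
    print(unit_output(w, x, b))   # z = 2.2 + 0.5 = 2.7, sigmoid(2.7) ≈ 0.937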
=> Activation Functions → a function that is added to an ANN in order to help the network learn complex patterns in the data.

Comparing with the neuron-based model in our brain, the activation function is what ultimately decides what is to be fired to the next neuron.

In an ANN, the activation function of a node defines the output of that node given an input or set of inputs.

They are simply non-linear functions that convert the values they receive into output values for the neural computing units.

1. Sigmoid Activation Function

f(z) = 1 / (1 + e^-z)

[Graph of sigmoid]

2. Tanh or Hyperbolic Tangent Activation Function (Sigmoidal)

f(x) = tanh(x) = (2 / (1 + e^-2x)) - 1

[Graph of tanh]
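A quick Python check (a sketch added for illustration) that the formula above agrees with the built-in hyperbolic tangent:

    import math

    def tanh_from_formula(x):
        # tanh written as (2 / (1 + e^(-2x))) - 1, as in the formula above
        return (2.0 / (1.0 + math.exp(-2.0 * x))) - 1.0

    for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
        print(x, tanh_from_formula(x), math.tanh(x))   # the last two columns match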

→ Why is tanh better than the sigmoid activation function?

- For both functions, when the input is very large or very small, the output curve is almost flat and the gradient is small, which is not conducive to weight updates.

- The difference is the output interval: the output interval for tanh is [-1, 1] and the function is 0-centred, which is better than sigmoid (whose output interval is [0, 1]).

- A major advantage is that negative inputs are mapped strongly negative and inputs near 0 are mapped near 0 in the tanh graph.

→ In binary classification problems, tanh is typically used for the hidden layer and the sigmoid function for the output layer, as sketched below.
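A minimal sketch of that arrangement in Python, with made-up weights and biases just to show the wiring (tanh on the hidden layer, sigmoid on the output layer):

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def binary_classifier(x1, x2):
        # hidden layer: two units with tanh activation (weights/biases are made up)
        h1 = math.tanh(0.5 * x1 - 0.3 * x2 + 0.1)
        h2 = math.tanh(-0.2 * x1 + 0.8 * x2 - 0.4)
        # output layer: one sigmoid unit, giving a probability between 0 and 1
        return sigmoid(1.2 * h1 - 0.7 * h2 + 0.05)

    print(binary_classifier(1.0, 0.0))   # a value in (0, 1)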

3. ReLU (Rectified Linear Unit) Activation Function

f(x) = max(0, x), i.e. f(x) = x for x > 0 and f(x) = 0 for x ≤ 0

[Graph of ReLU]

Range: [0, ∞)
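A one-line Python sketch of ReLU applied to a few sample values:

    def relu(x):
        # positive inputs pass through unchanged; everything else maps to 0
        return max(0.0, x)

    print([relu(x) for x in [-2.0, -0.5, 0.0, 0.5, 2.0]])   # [0.0, 0.0, 0.0, 0.5, 2.0]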

→ Why ReLU is better than tanh and sigmoid:

- When the input is positive, there is no gradient saturation problem.

- The calculation speed is much faster: ReLU involves only a linear relationship, so both the forward and backward passes are faster than for the other two (sigmoid and tanh need to compute an exponential, which is slower).

(* caveat: the dead ReLU problem — units whose input stays negative always output 0 and stop updating)


=> Feed-Forward Neural Networks

A multilayer network of neural units in which the outputs from the units in each layer are passed to the units in the next higher layer.

These networks don't have any cycles within them, i.e., the outputs from these units don't flow back in a cyclical manner.

[Diagram]

x = (x1, x2, ..., xn) → input values

y = (y1, y2, ..., ym) → output values

x1 to xn → the n input values of the network reside on the first layer (Layer 0)

y1 to ym → the m output values reside on the last layer (Layer 2)

W → matrix containing the weights to be applied to the input values

U → matrix containing the weights to be applied to the output values of the hidden layer

b → vector containing the bias terms to be applied to the input values

Mathematical Representation:

Multinomial classification:

h = activation function (W.x + b)

z = U.h

y = softmax(z)
For multinomial classification, it's prudent to choose a softmax function to normalize the vector of real values produced by the matrix multiplication between U and h.

The normalization is meant to transform that vector of real values into a vector that represents a probability distribution.

softmax(zi) = e^zi / Σj e^zj, summing over j = 1 to d, for 1 ≤ i ≤ d
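A minimal Python sketch of this forward pass. The matrices W and U, the bias vector b, and their sizes are made-up placeholders, and tanh is just one possible choice of activation function:

    import math

    def softmax(z):
        # subtract the max before exponentiating, for numerical stability
        m = max(z)
        exps = [math.exp(zi - m) for zi in z]
        total = sum(exps)
        return [e / total for e in exps]

    def matvec(M, v):
        return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

    def forward(x, W, b, U):
        # h = activation(W.x + b); z = U.h; y = softmax(z)
        h = [math.tanh(s + bi) for s, bi in zip(matvec(W, x), b)]
        z = matvec(U, h)
        return softmax(z)

    W = [[0.2, -0.1], [0.4, 0.3]]                  # made-up 2x2 input weights
    b = [0.0, 0.1]                                 # made-up bias vector
    U = [[1.0, -1.0], [0.5, 0.5], [-0.3, 0.8]]     # made-up 3x2 output weights
    print(forward([1.0, 2.0], W, b, U))            # three probabilities summing to 1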

A feed-forward neural network is a supervised ML algorithm.

To train a neural network means to figure out the right values of W and U for each layer in the network, so that it predicts accurate values of y when given input values of x.
