Name: AHAMMAD NADENDLA
Roll no: B19BB030
1) DD/MM/YY - 27/01/2001
 ABC        -030
 FIRST      -Nadendla
 SECOND       -Hussain
G = [m1 m2 m3
     m4 m5 m6
     m7 m8 m9]
m1 = 1, m2 = 0, m3 = 1
m4 = 0, m5 = 0, m6 = 0
m7 = 1, m8 = 1, m9 = 0
G = [1 0 1
     0 0 0
     1 1 0]
The total number of 1's is 4, so the paper ID is 3 (since the number of 1's is greater than 3).
2) In this paper, two architectures are used for the generator and the discriminator: the first is a standard CNN and the second is a ResNet, included for comparison. The authors use a different architecture for each dataset; the standard CNN models used in the CIFAR10 and STL10 image experiments are described here. The slope of all LeakyReLU functions in the network is set to 0.1. In the CIFAR10 generator architecture, a random noise vector z with 128 dimensions is fed to a dense generator layer of size 4 x 4 x 512. Next, a deconvolution layer of size 4 x 4 with stride 2 is built, followed by batch normalization and a ReLU activation function; this set of layers is applied repeatedly to upsample the feature map. Finally, a convolutional layer of size 3 x 3 with stride 1 is built, and the tanh activation function is applied.
The output of the generator is a 32x32x3 image, which is given as the input to the discriminator (a sketch of such a generator is given below).
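A minimal sketch of a generator of this shape, assuming PyTorch; the channel counts per block (512 -> 256 -> 128 -> 64) are illustrative assumptions and not taken from the paper:

import torch
import torch.nn as nn

class Generator(nn.Module):
    # Sketch: z (128-d) -> dense 4x4x512 -> deconv blocks (4x4, stride 2) -> 3x3 conv -> tanh
    def __init__(self, z_dim=128):
        super().__init__()
        self.fc = nn.Linear(z_dim, 4 * 4 * 512)                   # dense layer sized 4 x 4 x 512
        self.blocks = nn.Sequential(
            # each block: 4x4 deconvolution with stride 2, then batch normalization, then ReLU
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1), nn.BatchNorm2d(256), nn.ReLU(),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
        )
        self.out = nn.Sequential(
            nn.Conv2d(64, 3, 3, stride=1, padding=1),              # 3x3 convolution with stride 1
            nn.Tanh(),                                             # tanh activation -> 32x32x3 image
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 512, 4, 4)
        return self.out(self.blocks(x))

g = Generator()
print(g(torch.randn(2, 128)).shape)                                # torch.Size([2, 3, 32, 32])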
   ● Next, create a convolution layer that is 3 x 3 in size with a stride of 1, followed by a LeakyReLU activation function; this set is applied twice in a row. Then build another convolution layer of size 3 x 3 with a stride of 1, again followed by a LeakyReLU activation function, and finally create a dense layer. Similarly, for the SVHN dataset, an architecture is created with M = 6 for the generator and M = 32 for the discriminator. As mentioned earlier, GAN training uses the generator and discriminator to continuously perform training steps and create fake images. Although different datasets are used in the paper, the same operations apply; there are several different activation functions, and ResNet architectures are also used for the CIFAR10 dataset. (A sketch of such a discriminator follows this item.)
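As mentioned in the item above, here is a sketch of a discriminator along these lines, again assuming PyTorch and a LeakyReLU slope of 0.1; the channel counts and the single-score output are illustrative assumptions:

import torch
import torch.nn as nn

class Discriminator(nn.Module):
    # Sketch: 3x3 stride-1 convolutions with LeakyReLU(0.1), then a dense layer producing a score
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            # set of (3x3 conv, stride 1) + LeakyReLU, applied twice in a row
            nn.Conv2d(3, 64, 3, stride=1, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(64, 64, 3, stride=1, padding=1), nn.LeakyReLU(0.1),
            # one more (3x3 conv, stride 1) + LeakyReLU
            nn.Conv2d(64, 128, 3, stride=1, padding=1), nn.LeakyReLU(0.1),
        )
        self.dense = nn.Linear(128 * 32 * 32, 1)                   # dense output layer (real/fake score)

    def forward(self, x):
        return self.dense(self.features(x).flatten(1))

d = Discriminator()
print(d(torch.randn(2, 3, 32, 32)).shape)                          # torch.Size([2, 1])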
As noted above, the architecture for the SVHN dataset uses M = 6 for the generator and M = 32 for the discriminator, and the GAN training loop runs the generator and discriminator continuously in order to produce fake images; ResNet architectures are also used for the CIFAR10 dataset.
Similarly, we build the ResNet architecture. For conditional GANs, the generator's regular batch normalization layers in the ResBlocks are replaced with conditional batch normalization layers, and the projection discriminator is modelled with an architecture that is otherwise the same as above. The other dataset used is the STL-10 dataset, again on the ResNet architectures.
Optimizer:
In this paper the Adam optimizer is used. Adam stands for adaptive moment estimation; it is an extension of stochastic gradient descent (assuming you know what gradient descent is), and it works well for optimizing almost all of the machine learning algorithms you see in practice.
As shown in this paper, the Adam optimizer takes the positive attributes of the two techniques it builds on (gradient descent with momentum and RMSProp) and extends them to provide a more optimized gradient descent. Here, the rate of gradient descent is controlled so that there is minimal oscillation when the minimum is reached, while still taking steps (step sizes) big enough to pass over local hurdles along the way. Combining the elements of the two techniques above gives an efficient way of reaching the minimum. Using the mathematical form of Adam, built from the formulas used in the two techniques above, we get the parameters below.
Parameters used:
   1. ε: a small positive constant to avoid division by zero when vt ≈ 0 (ε = 10^-8).
   2. β1 & β2: decay rates of the moving averages of the gradients in the two methods above (β1 = 0.9 & β2 = 0.999).
   3. α: step size parameter / learning rate (0.001).
Since both mt and vt are initialized to 0 (in the strategy above), they tend to be "biased towards 0", because both β1 and β2 ≈ 1. The optimizer fixes this problem by computing "bias-corrected" estimates of mt and vt. This is also done to keep the weight updates controlled as they approach the minimum, to prevent large oscillations in its vicinity. Intuitively, the moment estimates are adapted at every iteration to keep the updates controlled and unbiased throughout training, which is why the method is named Adam (adaptive moment estimation). Now, instead of the raw moment estimates mt and vt, we use the bias-corrected estimates m̂t and v̂t. Putting them into the overall update rule gives the equations below.
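For reference, the standard Adam update equations referred to above are (g_t is the gradient of the loss with respect to the parameters θ at step t):

    m_t = β1 * m_{t-1} + (1 − β1) * g_t
    v_t = β2 * v_{t-1} + (1 − β2) * g_t^2
    m̂_t = m_t / (1 − β1^t),    v̂_t = v_t / (1 − β2^t)
    θ_{t+1} = θ_t − α * m̂_t / (√(v̂_t) + ε)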
Back propagation:
There is really nothing unusual about the backpropagation calculation in a Generative Adversarial Network (GAN): it is equivalent to that of a convolutional neural network (CNN), because a GAN's generator and discriminator typically are CNNs. The GAN consists of a discriminator and a generator, and each of them is held fixed while the other is being trained; therefore, training alternates between the discriminator and the generator. Training the discriminator is much simpler, so let's look at that first, and then at training the generator, as specified in the question.
Training the discriminator:
The discriminator has two output nodes, which are used to distinguish between real examples and fake examples. To train the discriminator, the forward pass through the generator is used to produce m examples; these are fake examples and their label is y = 0. To create them, a random noise vector is simply passed as input to the generator. We also use m examples from the real data, which have the label y = 1. The discriminator is then trained in exactly the same way as a basic CNN classifier with two output nodes, as sketched below.
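A minimal sketch of one discriminator update, assuming PyTorch and the Generator/Discriminator sketches above; m = 64 and the single-logit binary cross-entropy setup are illustrative assumptions (a single logit is equivalent to the two-output-node formulation described here):

import torch
import torch.nn.functional as F

generator, discriminator = Generator(), Discriminator()            # from the sketches above
d_opt = torch.optim.Adam(discriminator.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-8)

def discriminator_step(real_batch, m=64):
    # One discriminator update: m fake examples (y = 0) plus m real examples (y = 1).
    z = torch.randn(m, 128)                                        # noise vector fed to the generator
    fake = generator(z).detach()                                   # generator is held fixed here
    real = real_batch[:m]

    logits = torch.cat([discriminator(real), discriminator(fake)]).squeeze(1)
    labels = torch.cat([torch.ones(m), torch.zeros(m)])            # y = 1 for real, y = 0 for fake
    loss = F.binary_cross_entropy_with_logits(logits, labels)

    d_opt.zero_grad()
    loss.backward()                                                # backpropagation, as in a plain CNN
    d_opt.step()
    return loss.item()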
Training the generator:
Hold the discriminator firmly whenever you train the generator. This is the basis for not
over-tightening the discriminator over the limit . So basically what we have is a CNN
(generator) connected to another CNN (discriminator). Connecting the hub between
these two models will generate a and when created it will produce the ideal image.
Please note that the generator requires the case provided. This situation is named y =1
because the discriminator labels it as from the true scatter. Overall, this is just a CNN,
and backpropagation is registered in exactly the same way. First it goes through the
Kramer vector and then through the generator. Any image is created with the result of ,
then passes through the discriminator and is called a fake.Use tilted drops to update
these boundaries with the final goal that the next passthrough should result in a lower
unhappiness rate. Choosing the right unfortunate work is the basis of this interaction.
When grouping tasks, as with GAN, you typically choose parallel cross entropy, as
shown below.
                             L = −y log(ŷ) − (1 − y) log(1 − ŷ)
This is accumulated over N or more samples, as is the normal situation for stochastic gradient descent, where y is the actual label and ŷ is the predicted label. The StackExchange answer listed in the references shows how to use backpropagation and gradient descent for a single-neuron perceptron and then for a deeper network. The main difference here is that we are using the binary cross-entropy loss, which has a different derivative with respect to ŷ. This loss is then propagated through the whole network using the chain rule with the activation functions chosen at each layer. A sketch of the corresponding generator update follows.
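A matching sketch of one generator update, with the discriminator held fixed and the fake examples labelled y = 1 so that the combined network is pushed toward fooling the discriminator (same assumptions as in the discriminator sketch):

g_opt = torch.optim.Adam(generator.parameters(), lr=0.001, betas=(0.9, 0.999), eps=1e-8)

def generator_step(m=64):
    # One generator update: only the generator's parameters are stepped.
    z = torch.randn(m, 128)                                        # random latent vector
    logits = discriminator(generator(z)).squeeze(1)

    labels = torch.ones(m)                                         # y = 1: we want the fakes called real
    loss = F.binary_cross_entropy_with_logits(logits, labels)

    g_opt.zero_grad()
    loss.backward()                                                # gradients flow through D into G ...
    g_opt.step()                                                   # ... but only G's parameters change
    return loss.item()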
b) For the loss function, the authors used hinge loss for both the generator loss and the discriminator loss. Those are given below.
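In the standard formulation (the notation may differ slightly from the paper), the hinge-loss objectives are:

    L_D = E_{x ~ p_data}[ max(0, 1 − D(x)) ] + E_{z ~ p_z}[ max(0, 1 + D(G(z))) ]
    L_G = −E_{z ~ p_z}[ D(G(z)) ]

where D(x) is the discriminator's raw score for an input x and G(z) is the generator's output for noise z.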
A GAN attempts to replicate a probability distribution. Therefore, it needs to use a loss function that reflects the distance between the distribution of the data generated by the GAN and the distribution of the real data. How do you capture the difference between the two distributions in a GAN loss function? This question is an area of active research, and many approaches have been proposed. This section describes two common GAN loss functions, both of which are implemented in TFGAN:
Minimax loss: the loss function used in the original GAN paper.
Wasserstein loss: the default loss function for TFGAN estimators. TFGAN implements a number of other losses as well.
Normally, the GAN architecture uses a Gaussian random variable as the generator's input. The two players compete with each other: the discriminator wants to make a good distinction between real and fake samples, while the generator tries to trick it. A GAN therefore has two loss functions: one for training the generator and one for training the discriminator. How can two loss functions work together to reflect a distance measure between probability distributions? In the loss schemes considered here, the generator and discriminator losses derive from a single measure of distance between probability distributions. In both of these schemes, however, the generator can only affect one of the terms in the distance measure: the term that reflects the distribution of the fake data. Therefore, during generator training we drop the other term, which reflects the distribution of the real data.
The generator and discriminator losses end up looking different, even though they derive from a single formula.
Since the introduction of GANs, there have been many extensions and GAN variants. In this exercise we look at specific variations of GANs, focus on the tools needed to interpret the generated data properly, and finally provide a piece of knowledge that promotes a well-behaved generative model: a hinge-loss-based GAN, i.e., one using a different loss function (hinge loss instead of log loss).
Loss function Chosen is Mean Absolute Error
MEAN ABSOLUTE ERROR:
Mean Absolute Error is defined as the mean of the absolute values of the differences between the actual values and the predicted values.
Mathematically, it can be written as
            MAE = (1/n) Σ_{i=1}^{n} |yᵢ − ŷᵢ|
Here n represents the total number of samples, yᵢ represents the ground-truth values and ŷᵢ represents the predicted values.
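A small sketch of the MAE computation, assuming NumPy:

import numpy as np

def mean_absolute_error(y_true, y_pred):
    # MAE = (1/n) * sum of |y_i - y_hat_i| over all n samples
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs(y_true - y_pred))

print(mean_absolute_error([3.0, -0.5, 2.0], [2.5, 0.0, 2.0]))      # 0.333...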
Architecture Chosen is Convolutional Neural Networks (CNN)
A CNN consists of input layers followed by hidden layers followed by output layers, as shown. The input is given to the input layer and the output is obtained from the output layer. Between the layers there are associated weights.
Below is the block diagram of a CNN:
Let's assume there are inputs x1, x2, x3, and also let's assume the hidden layer is a single layer with 2 neurons, and that they are fully connected:
h1 = x1*w11 + x2*w21 + x3*w31
h2 = x1*w12 + x2*w22 + x3*w32
Let's assume that the output layer has a single neuron; then
y_predicted = activation_function(h1*w3 + h2*w4)
The activation function could be sigmoid, ReLU, etc.
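The same toy forward pass written out in NumPy (the weight values are arbitrary, purely for illustration):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x1, x2, x3 = 0.5, -1.0, 2.0          # the three inputs
w11, w21, w31 = 0.1, 0.4, -0.2       # weights into hidden neuron h1
w12, w22, w32 = -0.3, 0.2, 0.5       # weights into hidden neuron h2
w3, w4 = 0.7, -0.6                   # weights from h1, h2 to the output neuron

h1 = x1 * w11 + x2 * w21 + x3 * w31
h2 = x1 * w12 + x2 * w22 + x3 * w32
y_predicted = sigmoid(h1 * w3 + h2 * w4)   # the activation could also be ReLU, etc.
print(h1, h2, y_predicted)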
This is how weight updates work:
Start with an input image of size 5x5. Then apply a convolution with a 2x2 kernel and stride = 1; this generates a 4x4 feature map. Then apply 2x2 max-pooling with stride = 2, which reduces the feature map to size 2x2. Next, apply a logistic sigmoid, then a fully connected layer with two neurons, and finally the output layer. For simplicity, suppose the forward pass has already been completed.
After a full forward pass and a partially completed backward pass, the deltas are first computed for the non-linear layer (the logistic sigmoid). These deltas are then propagated back to the 4x4 layer, and all of the values that were filtered out by max-pooling are set to 0 in the gradient map, so that the gradient flows only through the positions that produced the maxima.
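A NumPy sketch of the step just described: the deltas arriving from the 2x2 layer are routed back to the 4x4 feature map, and every position that was filtered out by max-pooling gets a gradient of 0 (the 4x4 values and the deltas are made-up numbers for illustration):

import numpy as np

def maxpool_backward_2x2(feature_map_4x4, deltas_2x2):
    # Route each delta to the arg-max position of its 2x2 pooling window; all other positions stay 0.
    grad = np.zeros_like(feature_map_4x4)
    for i in range(2):
        for j in range(2):
            window = feature_map_4x4[2*i:2*i+2, 2*j:2*j+2]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            grad[2*i + r, 2*j + c] = deltas_2x2[i, j]
    return grad

fm = np.arange(16, dtype=float).reshape(4, 4)     # stand-in for the 4x4 feature map after the convolution
deltas = np.array([[0.1, -0.2], [0.3, 0.4]])      # deltas coming back from the 2x2 layer
print(maxpool_backward_2x2(fm, deltas))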
Reference
   ● https://www.geeksforgeeks.org/intuition-of-adam-optimizer/
   ● https://www.jefkine.com/general/2016/09/05/backpropagation-in-convolutional-neural-networks/
   ● https://datascience.stackexchange.com/questions/27506/back-propagation-in-cnn
   ● https://www.youtube.com/watch?v=KBZV2X8jo7E
   ● https://youtu.be/3VcPZ4SuIaA