DL Lab Manual
Course Objectives
• Perform deep learning tasks using Google Colab and GPU resources.
• Study and implement feedforward neural networks and backpropagation.
• Implement Batch Gradient Descent, Stochastic Gradient Descent, and Mini-batch
Gradient Descent.
• Study and apply batch normalization and dropout for neural networks.
• Train RNNs for sentiment analysis, CNNs for object detection, and GANs for image
generation.
Course Outcomes
At the end of this course, the students will be able to:
• Gain proficiency in using Google Colab for coding, collaborating, and accessing
powerful computing resources like GPUs for faster processing.
• Gain hands-on experience with libraries like TensorFlow, Keras, or PyTorch, implementing
FFNNs for various data types and tasks.
• Learn the step-by-step process of backpropagation and how gradients are calculated and
used to update weights in a neural network.
• Understand how batch, stochastic, and mini-batch gradient descent affect the efficiency,
accuracy, and computational cost of training models.
• Learn to apply SVD to reduce the dimensionality of datasets, improving computational
efficiency and mitigating issues related to high-dimensional data.
• Understand techniques like sparsity, dropout, and variational autoencoders that improve
the performance and generalization of autoencoders.
• Apply word2vec embeddings to tasks such as sentiment analysis, document clustering, and
recommendation systems.
• Learn about convolutional neural network (CNN) architectures tailored for object
detection, including feature extraction, detection layers, and bounding box regression.
• Learn the structure and functioning of Long Short-Term Memory (LSTM) networks,
including gates (input, forget, output) and their role in managing long-term dependencies
in sequential data.
• Gain hands-on experience implementing GANs using frameworks like TensorFlow or
PyTorch to generate synthetic handwritten digit images, focusing on data preparation,
model training, and evaluation of the generated images.
Deep Learning Lab
List of Experiments/Exercises
1. Getting familiar with the usage of Google Colab and using GPU as the processing unit
   Exercise: Upload a dataset and perform basic data pre-processing on it. Compare the training time with GPU enabled versus CPU only.
2. Study and implementation of a Feed Forward Neural Network
   Exercise: Conduct an experiment with a dataset of your choice to determine the impact of different hyperparameters on the performance of a feed-forward neural network.
3. Study and implementation of Back Propagation
   Exercise: Write Python code for backward propagation in a neural network using concepts like error calculation and learning rate.
4. Implement Batch Gradient Descent, Stochastic Gradient Descent and Mini-Batch Gradient Descent
   Exercise: Implement gradient descent for a neural network (or logistic regression) by predicting whether a person would buy life insurance based on their age.
5. Study and implementation of Singular Value Decomposition for Dimensionality Reduction
   Exercise: Draw the box-and-whisker plot for the distribution of accuracy scores for each configured number of dimensions, and find the predicted class using a combination of the SVD transform and a logistic regression model.
6. Implementing an Autoencoder for Encoding Real-World Data
   Exercise: Write a program to add random noise to the MNIST dataset and train the autoencoder to reconstruct the clean images.
7. Implementing Word2Vec for Real-World Data
   Exercise: Use pre-trained word2vec models from Google or Wikipedia for improved accuracy and efficiency.
8. Write a program to perform object detection using a CNN
   Exercise: Write a CNN program using the Pascal VOC or COCO dataset and train the network to detect objects of interest.
9. Study and implementation of LSTM
   Exercise: Write Python code using time series analysis to predict the number of international airline passengers from January 1949 to December 1960.
10. Implementation of a GAN for Generating Handwritten Digit Images
    Exercise: Implement a Conditional GAN (cGAN) where the generator and discriminator are conditioned on class labels (e.g., digits 0-9 from the MNIST dataset). Train the cGAN and generate images for each digit class.
Introduction
Collaboratory by Google (Google Colab for short) is a Jupyter notebook-based runtime
environment that allows you to run code entirely in the cloud. This matters because it
means you can train large-scale ML and DL models even if you don't have access to a
powerful machine or a high-speed internet connection. Google Colab supports both GPU and TPU
instances, which makes it a perfect tool for deep learning and data analytics enthusiasts who
face computational limitations on their local machines. Since a Colab notebook can be accessed
remotely from any machine through a browser, it is well suited for commercial purposes as
well. In this tutorial you will learn:
• Getting around in Google Colab
• Installing python libraries in Colab
• Downloading large datasets in Colab
• Training a Deep learning model in Colab
• Using TensorBoard in Colab
Creating your first .ipynb notebook in Colab
Open a browser of your choice, go to colab.research.google.com, and sign in using your
Google account. Click on “New notebook” to create a new runtime instance. In the top left
corner, you can change the name of the notebook from “Untitled.ipynb” to a name of your
choice by clicking on it. The cell execution block is where you type your code. To execute the
cell, press Shift + Enter. A variable declared in one cell can be used in other cells as a global
variable. The environment automatically displays the value of the expression on the last line of a
code block, even without an explicit print statement.
Training a sample TensorFlow model
Training a machine learning model in Colab is very easy. The best part is not having
to set up a custom runtime environment; it is all handled for you. For example, let's look at
training a basic deep learning model to recognize handwritten digits from the MNIST
dataset. The data is loaded from the standard Keras dataset archive. The model is very basic:
it classifies images of handwritten digits into the ten digit classes.
Setup:
#import necessary libraries
import tensorflow as tf
#load training data and split into train and test sets
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
#define model
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])
#compile, train and evaluate the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)
Output:
Downloading data from https://storage.googleapis.com/tensorflow/tf-
keras-datasets/mnist.npz
11490434/11490434 [==============================] - 0s 0us/step
Epoch 1/5
1875/1875 [==============================] - 16s 8ms/step - loss: 0.2975
- accuracy: 0.9144
Epoch 2/5
1875/1875 [==============================] - 7s 3ms/step - loss: 0.1444
- accuracy: 0.9574
Epoch 3/5
1875/1875 [==============================] - 8s 4ms/step - loss: 0.1086
- accuracy: 0.9677
Epoch 4/5
1875/1875 [==============================] - 7s 3ms/step - loss: 0.0871
- accuracy: 0.9726
Epoch 5/5
1875/1875 [==============================] - 8s 4ms/step - loss: 0.0752
- accuracy: 0.9762
313/313 - 1s - loss: 0.0742 - accuracy: 0.9765 - 576ms/epoch - 2ms/step
Downloading a dataset
When you’re training a machine learning model on your local machine, you’re likely to have
trouble with the storage and bandwidth costs that come with downloading and storing the
dataset required for training a model. Deep learning datasets can be massive, often ranging
from 20 to 50 GB, and downloading them can be especially challenging where high-speed
internet isn't available. The most efficient way to use datasets
is to use a cloud interface to download them, rather than manually uploading the dataset
from a local machine. Thankfully, Colab gives us a variety of ways to download the dataset
from common data hosting platforms.
To download an existing dataset from Kaggle, we can follow the steps outlined below:
1. Go to your Kaggle account and click on “Create New API Token”. This will
download a kaggle.json file to your machine.
2. Go to your Google Colab project file, and run the following commands:
! pip install -q kaggle
from google.colab import files
# choose the kaggle.json file that you downloaded
files.upload()
! mkdir ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json
# datasets can then be fetched, e.g. !kaggle datasets download -d <owner>/<dataset-name>
Downloading the dataset from Google Cloud Platform (GCP) or Google Drive
Google Cloud Platform is a cloud computing and storage platform. You can use it to store large
datasets, and you can import that dataset directly from the cloud into Colab. To upload and
download files on GCP, first you need to authenticate your Google account.
from google.colab import auth
auth.authenticate_user()
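Alternatively, if the dataset already sits in your Google Drive, you can mount the drive directly. A minimal sketch (the MyDrive path below is simply Colab's default mount location):
from google.colab import drive
# mount your Google Drive inside the Colab VM
drive.mount('/content/drive')
# files then appear under /content/drive/MyDrive
!ls /content/drive/MyDrive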
Exercise:
1. Try uploading a dataset yourself and perform basic data pre-processing on it. Compare
the training time with GPU enabled versus CPU only.
Solution:
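One possible outline for this exercise, using MNIST as the uploaded dataset and timing three training epochs on each device (the layer sizes and epoch count are arbitrary choices):
import time
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0  # basic pre-processing: scale pixels to [0, 1]

def build_model():
    return tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

def timed_fit(device):
    # build and train the same model, pinned to the given device
    with tf.device(device):
        model = build_model()
        model.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
        start = time.time()
        model.fit(x_train, y_train, epochs=3, verbose=0)
        return time.time() - start

print("CPU time:", timed_fit('/CPU:0'))
if tf.config.list_physical_devices('GPU'):
    print("GPU time:", timed_fit('/GPU:0'))
The GPU timing line only runs when Colab's GPU runtime is enabled (Runtime > Change runtime type).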
• Input Layer — contains one or more input nodes. For example, suppose you want to
predict whether it will rain tomorrow and base your decision on two variables, humidity
and wind speed. In that case, your first input would be the value for humidity, and the
second input would be the value for wind speed.
• Hidden Layer — this layer houses hidden nodes, each containing an activation function
(more on these later). Note that a Neural Network with multiple hidden layers is known as a
Deep Neural Network.
• Output Layer — contains one or more output nodes. Following the same weather
prediction example above, you could choose to have only one output node generating a rain
probability (where >0.5 means rain tomorrow, and ≤0.5 no rain tomorrow). Alternatively,
you could have two output nodes, one for rain and another for no rain. Note that you can use a
different activation function for output nodes vs. hidden nodes.
• Connections — lines joining different nodes are known as connections. These contain
kernels (weights) and biases, the parameters that get optimized during the training of a
neural network.
• Kernels (weights) — used to scale input and hidden node values. Each connection
typically holds a different weight.
• Biases — used to adjust scaled values before passing them through an activation
function.
• Activation functions — think of activation functions as standard curves (building
blocks) used by the Neural Network to create a custom curve that fits the training
data. Passing different input values through the network selects different sections
of the standard curve, which are then assembled into a final custom-fit curve; a tiny
numeric sketch of a single node follows below.
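To make these pieces concrete, here is a small sketch of what one hidden node computes, with illustrative (made-up) inputs, weights, and bias:
import numpy as np

# one hidden node: scale the inputs with the weights, shift with the bias,
# then pass the result through an activation function (sigmoid here)
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

inputs = np.array([0.7, 0.2])      # e.g. humidity, wind speed
weights = np.array([1.5, -0.8])    # kernels on the two connections
bias = 0.1

node_output = sigmoid(np.dot(inputs, weights) + bias)
print(node_output)                 # a value between 0 and 1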
Loss functions, optimizers, and training
Training Neural Networks involves a complicated process known as backpropagation. I will
not go through a step-by-step explanation of how backpropagation works since it is a big
enough topic deserving a separate article. Instead, let me briefly introduce you to loss
functions and optimizers and summarize what happens when we “train” a Neural Network.
• Loss — represents the “size” of error between the true values/labels and the predicted
values/labels. The goal of training a Neural Network is to minimize this loss. The smaller
the loss, the closer the match between the true and the predicted data. There are many loss
functions to choose from, with Binary Crossentropy, Categorical Crossentropy, and Mean
Squared Error being the most common.
• Optimizers — are the algorithms used in backpropagation. The goal of an optimizer
is to find the optimum set of kernels (weights) and biases to minimize the loss. Optimizers
typically use a gradient descent approach, which allows them to iteratively find the “best”
possible configuration of weights and biases. The most commonly used ones are SGD,
Adam, and RMSProp.
Training a Neural Network is basically fitting a custom curve through the training data until it
can approximate it as well as possible. The graph below illustrates what a custom-fitted curve
could look like in a specific scenario. This example contains a set of data that seem to flip
between 0 and 1 as the value for input increases.
Example: Implementation of Feed Forward Neural Network
import math
import random
import pandas as pd
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn import preprocessing
from tensorflow.keras import models, layers, optimizers
# load the South African heart disease data (assumed CSV layout with a row-name column)
file_name = '/content/SAheart.data'
data = pd.read_csv(file_name, index_col=0)
data.head()
# split the rows into a 70/30 train/test partition
random.seed(42)
n = len(data)
test_ixs = random.sample(range(n), int(0.3 * n))
train_ixs = [ix for ix in range(n) if ix not in test_ixs]
train = data.iloc[train_ixs, :]
test = data.iloc[test_ixs, :]
print(len(train))
print(len(test))
# numeric predictors (the categorical famhist column is left out); 'chd' is the response
features = ['sbp', 'tobacco', 'ldl', 'adiposity', 'typea', 'obesity', 'alcohol', 'age']
response = 'chd'
x_train = train[features]
y_train = train[response]
x_test = test[features]
y_test = test[response]
x_train = preprocessing.normalize(x_train)
x_test = preprocessing.normalize(x_test)
# example hyperparameter values
hidden_units = 10
activation = 'relu'
learning_rate = 0.01
model = models.Sequential()
model.add(layers.Dense(input_dim=len(features),
                       units=hidden_units,
                       activation=activation))
model.add(layers.Dense(input_dim=hidden_units,
                       units=1,
                       activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.Adam(learning_rate=learning_rate),
              metrics=['accuracy'])
# train the model
history = model.fit(x_train, y_train, epochs=5, batch_size=16)
# evaluate accuracy
_, train_acc = model.evaluate(x_train, y_train)
_, test_acc = model.evaluate(x_test, y_test)
print('Training accuracy:', train_acc)
print('Testing accuracy:', test_acc)
# plot the training loss
losses = history.history['loss']
plt.plot(losses)
plt.xlabel('epoch')
plt.ylabel('loss')
plt.show()
Output
Epoch 1/5
21/21 [==============================] - 1s 2ms/step - loss: 0.6914 -
accuracy: 0.5263
Epoch 2/5
21/21 [==============================] - 0s 2ms/step - loss: 0.6701 -
accuracy: 0.6718
Epoch 3/5
21/21 [==============================] - 0s 2ms/step - loss: 0.6546 -
accuracy: 0.6718
Epoch 4/5
21/21 [==============================] - 0s 2ms/step - loss: 0.6457 -
accuracy: 0.6718
Epoch 5/5
21/21 [==============================] - 0s 2ms/step - loss: 0.6396 - accuracy: 0.6718
5/5 [==============================] - 0s 3ms/step - loss: 0.6717 - accuracy: 0.6115
Training accuracy: 0.6718266010284424
Testing accuracy: 0.6115108132362366
Exercise:
1. Conduct an experiment with a dataset of your choice to determine the impact of different
hyperparameters on the performance of a feed-forward neural network.
Solution:
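A minimal sketch of such an experiment, reusing the x_train/x_test split from the example above and sweeping two hyperparameters (the grids of hidden-unit counts and learning rates are arbitrary choices):
from tensorflow.keras import models, layers, optimizers

# grid of hyperparameters to compare (example values)
for hidden_units in (4, 16, 64):
    for learning_rate in (0.1, 0.01, 0.001):
        model = models.Sequential([
            layers.Dense(hidden_units, activation='relu',
                         input_dim=x_train.shape[1]),
            layers.Dense(1, activation='sigmoid')
        ])
        model.compile(loss='binary_crossentropy',
                      optimizer=optimizers.Adam(learning_rate=learning_rate),
                      metrics=['accuracy'])
        model.fit(x_train, y_train, epochs=20, batch_size=16, verbose=0)
        _, acc = model.evaluate(x_test, y_test, verbose=0)
        print(f"hidden={hidden_units:3d} lr={learning_rate:6.3f} "
              f"test accuracy={acc:.3f}")
The same loop can be extended to other hyperparameters such as the batch size, number of epochs, or activation function.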
Example:
import numpy as np
# Sigmoid activation used by the single neuron
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
# Forward pass: weighted sum of the inputs passed through the sigmoid
def forward_pass(x, weights):
    output = sigmoid(np.dot(x, weights))
    return output
# Inputs (features)
x = np.array([0.1, 0.2, 0.3, 0.4])
# Initial weights
weights = np.array([0.5, -0.5, 0.3, -0.3])
# Actual output
actual_output = 1
# Learning rate
learning_rate = 0.01
# Number of iterations
num_iterations = 10000
output_before = forward_pass(x, weights)
error = actual_output - output_before
# Backward pass: nudge each weight in proportion to its input and the current error
for _ in range(num_iterations):
    output = forward_pass(x, weights)
    weights = weights + learning_rate * (actual_output - output) * x
output_after = forward_pass(x, weights)
# Print results
print("Initial Output:", output_before)
print("Updated Output:", output_after)
print("Error Difference:", error - (actual_output - output_after))
print("Updated Weights after Backward Pass:", weights)
Output
Initial Output: 0.48001065984441826
Updated Output: 0.9734967745349465
Error Difference: 0.4934861146905283
Updated Weights after Backward Pass: [1.72787602 1.95575205 3.98362807
4.61150409]
Exercise:
1. Write Python code for backward propagation in a neural network using concepts like error
calculation and learning rate.
Solution:
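A minimal sketch of one possible answer, extending the single-neuron example above with a hidden layer (the inputs, weights, and learning rate are illustrative values):
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# tiny network: 2 inputs -> 2 hidden nodes -> 1 output, trained on one sample
x = np.array([0.5, 0.9])
target = 1.0
W1 = np.array([[0.2, -0.4], [0.7, 0.1]])   # input -> hidden weights
W2 = np.array([0.3, -0.6])                 # hidden -> output weights
learning_rate = 0.5

for _ in range(1000):
    # forward pass
    hidden = sigmoid(x @ W1)
    output = sigmoid(hidden @ W2)
    # error calculation and backward pass (chain rule)
    error = target - output
    delta_out = error * output * (1 - output)
    delta_hidden = delta_out * W2 * hidden * (1 - hidden)
    # weight updates scaled by the learning rate
    W2 += learning_rate * delta_out * hidden
    W1 += learning_rate * np.outer(x, delta_hidden)

print("final output:", output, "remaining error:", target - output)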
Introduction
We will use a very simple home prices data set to implement batch and stochastic gradient
descent in Python.
Batch gradient descent uses all training samples in the forward pass to calculate the cumulative
error, and then we adjust the weights using derivatives. In stochastic GD, we randomly pick one
training sample, perform the forward pass, compute the error, and immediately adjust the weights.
So, the key difference here is that to adjust weights, batch GD will use all training samples
whereas stochastic GD will use one randomly picked training sample.
Mini-batch GD is an intermediate version of batch GD and stochastic GD. In mini-batch gradient
descent you use a batch of samples in each iteration. For example, if you have 50 training samples
in total, you can take a batch of 10 samples, calculate the cumulative error for those 10 samples,
and then adjust the weights.
To summarize: in SGD we adjust weights after every single sample, in batch GD we adjust weights
after going through all samples, and in mini-batch GD we do so after every m samples (where m is
the batch size and 0 < m < n, with n the total number of samples).
Gradient descent allows you to find the weights (w1, w2, w3) and bias in a linear equation of the
form price = w1*x1 + w2*x2 + w3*x3 + bias, where x1, x2, x3 are input features (for example
area, number of bedrooms, and age of the home) used for housing price prediction.
Example:
Gradient Descent
import numpy as np
import matplotlib.pyplot as plt
# X: feature matrix and y: target prices from the home prices data set loaded earlier
# Batch gradient descent: every update uses all training samples
def gradient_descent(X, y, learning_rate=0.001, epochs=500):
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0
    cost_list = []
    for i in range(epochs):
        # Compute predictions
        y_predicted = np.dot(X, w) + b
        # Compute gradients over the full training set
        w_grad = -(2/n_samples) * np.dot(X.T, (y - y_predicted))
        b_grad = -(2/n_samples) * np.sum(y - y_predicted)
        # Move the parameters against the gradient
        w = w - learning_rate * w_grad
        b = b - learning_rate * b_grad
        cost_list.append(np.mean((y - y_predicted) ** 2))
    return w, b, cost_list
# Running GD
w_gd, b_gd, cost_list_gd = gradient_descent(X, y,
                                            learning_rate=0.001, epochs=500)
Output:
Mini-batch gradient descent uses the same update rule, but the gradients in each step are
computed on a small randomly drawn batch:
# Compute predictions
y_predicted = np.dot(X_mini_batch, w) + b
# Compute gradients
w_grad = -(2/batch_size) * np.dot(X_mini_batch.T,
                                  (y_mini_batch - y_predicted))
b_grad = -(2/batch_size) * np.sum(y_mini_batch - y_predicted)
Output:
Exercise:
1. Implement gradient descent for a neural network (or logistic regression) by predicting
whether a person would buy life insurance based on their age.
2. Conduct an experiment to investigate the impact of different mini-batch sizes on the
performance of Mini-Batch Gradient Descent (e.g., 16, 32, 64, 128).
Solution:
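A minimal sketch for the first exercise, using a small made-up age/insurance table and batch gradient descent on the logistic-regression loss (the data values, learning rate, and epoch count are illustrative):
import numpy as np

# tiny insurance dataset: age -> bought_insurance (1 = yes); values are assumed for illustration
ages = np.array([22, 25, 47, 52, 46, 56, 55, 60, 62, 18, 28, 27, 29, 49], dtype=float)
bought = np.array([0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1], dtype=float)
ages_scaled = ages / 100.0          # keep the feature in a small range

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# batch gradient descent on the cross-entropy loss of logistic regression
w, b, lr = 0.0, 0.0, 0.5
for epoch in range(5000):
    y_pred = sigmoid(w * ages_scaled + b)
    w_grad = np.mean((y_pred - bought) * ages_scaled)
    b_grad = np.mean(y_pred - bought)
    w -= lr * w_grad
    b -= lr * b_grad

print("w:", w, "b:", b)
print("P(buys insurance | age 40):", sigmoid(w * 0.40 + b))
For the second exercise, the same loop can be rewritten to draw mini-batches of size 16, 32, 64, and 128 and the convergence curves compared.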
Introduction
Reducing the number of input variables for a predictive model is referred to as dimensionality
reduction. Fewer input variables can result in a simpler predictive model that may have better
performance when making predictions on new data. Perhaps the more popular technique for
dimensionality reduction in machine learning is Singular Value Decomposition, or SVD for
short. This is a technique that comes from the field of linear algebra and can be used as a data
preparation technique to create a projection of a sparse dataset prior to fitting a model.
Examples of sparse data appropriate for applying SVD for dimensionality reduction:
• Recommender Systems
• Customer-Product purchases
• User-Song Listen Counts
• User-Movie Ratings
• Text Classification
SVD can be thought of as a projection method where data with m-columns (features) is
projected into a subspace with m or fewer columns, whilst retaining the essence of the original
data.
The SVD is widely used both in the calculation of other matrix operations, such as the matrix
inverse, and as a data reduction method in machine learning.
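Before the worked examples, here is a quick sketch of the decomposition itself on a small made-up matrix, keeping only the k largest singular values:
import numpy as np

# decompose a small data matrix A into U, singular values s, and V^T
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [10.0, 11.0, 12.0]])
U, s, VT = np.linalg.svd(A, full_matrices=False)

# keep only the k largest singular values to project onto k columns
k = 2
A_projected = U[:, :k] * s[:k]          # m x k representation of the data
A_approx = (U[:, :k] * s[:k]) @ VT[:k]  # rank-k reconstruction of A
print(A_projected.shape)
print(np.round(A_approx, 2))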
Example 1:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
# load iris, centre it, and project onto the first two SVD components
X, y = load_iris(return_X_y=True)
X_centered = X - X.mean(axis=0)
U, s, VT = np.linalg.svd(X_centered, full_matrices=False)
X_2d = X_centered @ VT[:2].T
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)
plt.tight_layout()
plt.show()
Output:
Example 2:
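The code for this example is not reproduced here; a minimal sketch using scikit-learn's TruncatedSVD on a synthetic classification dataset (an assumed setup) looks like this:
from sklearn.datasets import make_classification
from sklearn.decomposition import TruncatedSVD

# synthetic high-dimensional data with 20 features
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=10, random_state=7)
# reduce to 5 components while keeping most of the variance
svd = TruncatedSVD(n_components=5, random_state=7)
X_reduced = svd.fit_transform(X)
print(X.shape, '->', X_reduced.shape)
print('explained variance ratio:', svd.explained_variance_ratio_.sum())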
Output:
Exercise:
1. Draw the box-and-whisker plot for the above solution, showing the distribution of accuracy
scores for each configured number of dimensions, and also find the predicted class by using a
combination of the SVD transform and a logistic regression model.
Solution:
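A minimal sketch of one way to answer this, following the same synthetic-data setup as above (the component range and cross-validation settings are example choices):
from numpy import mean, std
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=10, random_state=7)

# evaluate an SVD + logistic-regression pipeline for 1..19 components
results, names = [], []
for k in range(1, 20):
    model = Pipeline([('svd', TruncatedSVD(n_components=k)),
                      ('lr', LogisticRegression())])
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv)
    results.append(scores)
    names.append(str(k))
    print('>%s %.3f (%.3f)' % (k, mean(scores), std(scores)))

# box-and-whisker plot of the accuracy distribution per dimensionality
plt.boxplot(results, labels=names, showmeans=True)
plt.xlabel('number of components')
plt.ylabel('accuracy')
plt.show()

# predict the class of a row with the combined SVD + logistic regression model
final = Pipeline([('svd', TruncatedSVD(n_components=10)),
                  ('lr', LogisticRegression())])
final.fit(X, y)
print('Predicted class:', final.predict(X[:1]))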
Example
import keras
from keras import layers
import matplotlib.pyplot as plt
# (excerpt) display loop: x_test and decoded_imgs come from the omitted training step
n = 10
for i in range(n):
    # Display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    # Display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
Output:
Downloading data from https://storage.googleapis.com/tensorflow/tf-
keras-datasets/mnist.npz
11490434/11490434 [==============================] - 0s 0us/step
(60000, 784)
(10000, 784)
Epoch 1/50
235/235 [==============================] - 4s 5ms/step - loss: 0.2775
- val_loss: 0.1906
Epoch 2/50
235/235 [==============================] - 1s 4ms/step - loss: 0.1714
- val_loss: 0.1541
Epoch 3/50
235/235 [==============================] - 1s 4ms/step - loss: 0.1447
- val_loss: 0.1341
Epoch 4/50
235/235 [==============================] - 1s 4ms/step - loss: 0.1290
- val_loss: 0.1220
Epoch 5/50
235/235 [==============================] - 1s 4ms/step - loss: 0.1189
- val_loss: 0.1135
Experiment 6 Outcome:
Exercise:
1. Write a program to add random noise to the MNIST dataset and train the autoencoder to
reconstruct the clean images. Compare the reconstructed images with the original images
and evaluate the quality of the reconstruction
Solution:
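A minimal sketch of a denoising autoencoder for this exercise (the noise level, bottleneck size, and epoch count are example choices):
import numpy as np
import keras
from keras import layers
from keras.datasets import mnist

# load MNIST, flatten, scale to [0, 1]
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32').reshape(-1, 784) / 255.0
x_test = x_test.astype('float32').reshape(-1, 784) / 255.0

# add Gaussian noise and clip back to the valid pixel range
noise = 0.5
x_train_noisy = np.clip(x_train + noise * np.random.normal(size=x_train.shape), 0., 1.)
x_test_noisy = np.clip(x_test + noise * np.random.normal(size=x_test.shape), 0., 1.)

# simple dense autoencoder: 784 -> 32 -> 784
inp = keras.Input(shape=(784,))
encoded = layers.Dense(32, activation='relu')(inp)
decoded = layers.Dense(784, activation='sigmoid')(encoded)
autoencoder = keras.Model(inp, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

# train on noisy inputs with the clean images as targets
autoencoder.fit(x_train_noisy, x_train, epochs=20, batch_size=256,
                validation_data=(x_test_noisy, x_test))

# reconstruction quality: mean squared error against the clean test images
reconstructed = autoencoder.predict(x_test_noisy)
print('test MSE:', np.mean((reconstructed - x_test) ** 2))
The reconstructed images can be compared with the originals using the same display loop as in the example above.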
Introduction
Word2vec is a powerful technique in natural language processing (NLP) that transforms words
into dense vector representations, capturing semantic relationships and contextual meanings.
By embedding words in a continuous vector space, word2vec enables algorithms to interpret
textual data more effectively, facilitating tasks such as sentiment analysis, language translation,
and recommendation systems.
Implementing word2vec involves training neural networks on large corpora of text to learn
these embeddings. This process allows words with similar meanings to have vectors that are
close together in the embedding space, enhancing the model's ability to generalize and
understand language nuances.
In this context, understanding how to implement word2vec for real-world data involves
selecting appropriate training data, choosing the right parameters for the model, and
interpreting the resulting embeddings to derive meaningful insights from textual data.
Using the model: once the word2vec model is trained, it can be used to perform several
operations, such as finding the similarity between words, finding the odd one out, finding
analogies, and more.
Example:
# Sample dataset
dataset = [
    "natural language processing is a field of computer science",
    "word embeddings are used in natural language processing tasks",
    "deep learning models have revolutionized natural language processing",
    "word2vec is a popular technique for generating word embeddings"
]
# train a small word2vec model on the tokenized sentences
from gensim.models import Word2Vec
sentences = [sentence.split() for sentence in dataset]
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1)
# use the model: words most similar to "language"
print(model.wv.most_similar('language', topn=3))
Output:
Exercise:
1. Implement pre-trained word2vec models from Google or Wikipedia for improved
accuracy and efficiency.
Solution:
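A minimal sketch using the pre-trained Google News vectors available through gensim's downloader (the download is roughly 1.6 GB):
import gensim.downloader as api

# download and load the 300-dimensional Google News word2vec vectors
wv = api.load('word2vec-google-news-300')

print(wv.most_similar('computer', topn=5))
print(wv.similarity('king', 'queen'))
print(wv.doesnt_match(['breakfast', 'cereal', 'dinner', 'python']))
Wikipedia-trained alternatives such as the GloVe vectors ('glove-wiki-gigaword-300') can be loaded through the same downloader interface.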
Introduction
Object Detection is the process of finding real-world object instances like cars, bikes, TVs,
flowers, and humans in still images or videos. It allows for the recognition, localization, and
detection of multiple objects within an image which provides us with a much better
understanding of an image as a whole. It is commonly used in applications such as image
retrieval, security, surveillance, and advanced driver assistance systems (ADAS). Image
classification is straightforward, but the differences between object localization and object
detection can be confusing, especially since all three tasks are often referred to collectively
as object recognition.
Image classification involves assigning a class label to an image, whereas object localization
involves drawing a bounding box around one or more objects in an image. Object detection is
more challenging and combines these two tasks and draws a bounding box around each object
of interest in the image and assigns them a class label. Together, all of these problems are
referred to as object recognition.
As such, we can distinguish between these three computer vision tasks:
• Image Classification: Predict the type or class of an object in an image.
• Input: An image with a single object, such as a photograph.
• Output: A class label (e.g. one or more integers that are mapped to class
labels).
• Object Localization: Locate the presence of objects in an image and indicate
their location with a bounding box.
• Input: An image with one or more objects, such as a photograph.
• Output: One or more bounding boxes (e.g. defined by a point, width, and
height).
• Object Detection: Locate the presence of objects with a bounding box and the types
or classes of the located objects in an image.
• Input: An image with one or more objects, such as a photograph.
• Output: One or more bounding boxes (e.g. defined by a point, width, and
height), and a class label for each bounding box.
Object Detection can be done via multiple ways:
• Feature-Based Object Detection
• Viola Jones Object Detection
• SVM Classifications with HOG Features
• Deep Learning Object Detection
Example
import cv2
from google.colab.patches import cv2_imshow
import numpy as np
# Load a pre-trained YOLOv3 network (weights, config and class names downloaded separately)
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
classes = open("coco.names").read().strip().split("\n")
layer_names = net.getUnconnectedOutLayersNames()
# Load the input image and run a forward pass
img = cv2.imread("image.jpg")
height, width = img.shape[:2]
blob = cv2.dnn.blobFromImage(img, 1/255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(layer_names)
# Process detections
boxes = []
confidences = []
class_ids = []
for output in outputs:
    for detection in output:
        scores = detection[5:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        if confidence > 0.5:
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(class_id)
# Non-maximum suppression, then draw the surviving boxes
indices = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
for i in np.array(indices).flatten():
    x, y, w, h = boxes[i]
    label = f"{classes[class_ids[i]]}: {confidences[i]:.2f}"
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(img, label, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
cv2_imshow(img)
Output:
Exercise:
1. Write a CNN program using Pascal VOC or COCO dataset and train the network to detect
objects of interest. Evaluate the model's performance using metrics such as precision,
recall, and mean Average Precision (mAP).
Solution:
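As a starting point for this exercise, the sketch below loads a detection CNN pre-trained on COCO from torchvision (recent versions) and runs it on a single, hypothetical input image; fine-tuning it on Pascal VOC or COCO and computing precision, recall, and mAP would follow the torchvision detection reference scripts.
import torch
import torchvision
from torchvision.transforms import functional as F
from PIL import Image

# Faster R-CNN pre-trained on COCO
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = Image.open("image.jpg").convert("RGB")   # hypothetical input image
with torch.no_grad():
    prediction = model([F.to_tensor(img)])[0]
print(prediction["boxes"][:5])
print(prediction["labels"][:5])
print(prediction["scores"][:5])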
Introduction
Long short-term memory (LSTM) units (or blocks) are a building unit for layers of a recurrent
neural network (RNN). An RNN composed of LSTM units is often called an LSTM network.
A common LSTM unit is composed of a cell, an input gate, an output gate and a forget gate.
The cell is responsible for "remembering" values over arbitrary time intervals; hence the word
"memory" in LSTM. Each of the three gates can be thought of as a "conventional" artificial
neuron, as in a multi-layer (or feed forward) neural network: that is, they compute an activation
(using an activation function) of a weighted sum. Intuitively, they can be thought of as regulators
of the flow of values that goes through the connections of the LSTM; hence the denotation
"gate". There are connections between these gates and the cell.
The expression long short-term refers to the fact that LSTM is a model for the short-term
memory which can last for a long period of time. An LSTM is well-suited to classify, process
and predict time series given time lags of unknown size and duration between important events.
LSTMs were developed to deal with the exploding and vanishing gradient problem when
training traditional RNNs.
Recurrent neural networks have a wide array of applications. These include time series
analysis, document classification, and speech and voice recognition. In contrast to feed-forward
artificial neural networks, the predictions made by recurrent neural networks depend on the
previous elements of the sequence, because the network carries an internal state from one time
step to the next.
Draw a straight line. Let us see if an LSTM can learn the relationship of a straight line and
predict it.
Example
import numpy as np
import matplotlib.pyplot as plt
# Generate x values and the corresponding straight-line y values
x_values = np.linspace(0, 10, 100)
y_values = 2 * x_values + 3   # assumed slope and intercept for the demo
plt.figure(figsize=(8, 6))
plt.plot(x_values, y_values, label="Straight Line", color='blue')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Straight Line Dataset')
plt.legend()
plt.grid(True)
plt.show()
Output:
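The model-building code is not reproduced above; a minimal sketch that would produce a training log like the one below (the 50-unit layer size is an assumption) is:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# turn the series into (previous value -> next value) pairs
look_back = 1
X = y_values[:-look_back].reshape(-1, look_back, 1)
y = y_values[look_back:]

# a single LSTM layer followed by a one-unit regression head
model = Sequential([
    LSTM(50, input_shape=(look_back, 1)),
    Dense(1)
])
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X, y, epochs=1000, verbose=1)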
Output:
Epoch 1/1000
4/4 [==============================] - 6s 28ms/step - loss: 199.4389
Epoch 2/1000
4/4 [==============================] - 0s 6ms/step - loss: 197.8245
Epoch 3/1000
4/4 [==============================] - 0s 6ms/step - loss: 196.1669
.
.
.
Epoch 995/1000
4/4 [==============================] - 0s 9ms/step - loss: 0.0033
Epoch 996/1000
4/4 [==============================] - 0s 11ms/step - loss: 0.0033
Epoch 997/1000
4/4 [==============================] - 0s 10ms/step - loss: 0.0032
Epoch 998/1000
4/4 [==============================] - 0s 11ms/step - loss: 0.0033
Epoch 999/1000
Exercise:
1. Write Python code using time series analysis to predict the number of international
airline passengers from January 1949 to December 1960.
Solution:
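A minimal sketch for this exercise, assuming the classic airline-passengers CSV (144 monthly values with a 'Passengers' column) is available locally; the look-back window, layer size, and training settings are example choices:
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# monthly passenger counts, Jan 1949 - Dec 1960 (assumed local CSV)
data = pd.read_csv('airline-passengers.csv', usecols=['Passengers']).values.astype('float32')

# scale to [0, 1] and build (look_back months -> next month) samples
scaler = MinMaxScaler()
data = scaler.fit_transform(data)
look_back = 3
X, y = [], []
for i in range(len(data) - look_back):
    X.append(data[i:i + look_back, 0])
    y.append(data[i + look_back, 0])
X = np.array(X).reshape(-1, look_back, 1)
y = np.array(y)

# simple train/test split that preserves the time order
split = int(len(X) * 0.7)
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

model = Sequential([LSTM(32, input_shape=(look_back, 1)), Dense(1)])
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train, y_train, epochs=100, batch_size=8, verbose=0)

# predictions back on the original passenger scale
pred = scaler.inverse_transform(model.predict(X_test))
print(pred[:5].ravel())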
Introduction
GANs consist of two neural networks, one trained to generate data and the other trained to
distinguish fake data from real data (hence the “adversarial” nature of the model).
Discriminative vs Generative Models
If you’ve studied neural networks, then most of the applications you’ve come across were likely
implemented using discriminative models. Generative adversarial networks, on the other hand,
are part of a different class of models known as generative models.
Discriminative models are those used for most supervised classification or regression
problems. As an example of a classification problem, suppose you’d like to train a model to
classify images of handwritten digits from 0 to 9. For that, you could use a labeled dataset
containing images of handwritten digits and their associated labels indicating which digit each
image represents.
During the training process, you'd use an algorithm to adjust the model's parameters. The goal
would be to minimize a loss function so that the model learns the conditional probability of each
label given its input.
The Architecture of Generative Adversarial Networks
Generative adversarial networks consist of an overall structure composed of two neural
networks, one called the generator and the other called the discriminator.
The role of the generator is to estimate the probability distribution of the real samples in order
to provide generated samples resembling real data. The discriminator, in turn, is trained to
estimate the probability that a given sample came from the real data rather than being provided
by the generator.
These structures are called generative adversarial networks because the generator and
discriminator are trained to compete with each other: the generator tries to get better at fooling
the discriminator, while the discriminator tries to get better at identifying generated samples.
Example: Students are asked to perform Handwritten Digits Generator with a GAN
import torch
from torch import nn
import matplotlib.pyplot as plt
import torchvision
import torchvision.transforms as transforms
# use the GPU when one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# scale MNIST pixels to [-1, 1] to match the generator's tanh output
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
train_set = torchvision.datasets.MNIST(
    root=".", train=True, download=True,
    transform=transform
)
batch_size = 32
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=batch_size, shuffle=True
)
# Discriminator Model
class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 1024),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(1024, 512),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # flatten the batch and score each sample as real (1) or fake (0)
        x = x.view(x.size(0), 784)
        return self.model(x)

discriminator = Discriminator().to(device=device)
# Generator Model
class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 1024),
            nn.ReLU(),
            nn.Linear(1024, 784),
            nn.Tanh(),
        )

    def forward(self, x):
        # map a 100-dimensional latent vector to a 28x28 image
        output = self.model(x)
        return output.view(x.size(0), 1, 28, 28)

generator = Generator().to(device=device)
# Training settings
lr = 0.0001
num_epochs = 6
loss_function = nn.BCELoss()
optimizer_discriminator = torch.optim.Adam(discriminator.parameters(),
lr=lr)
optimizer_generator = torch.optim.Adam(generator.parameters(), lr=lr)
# Training loop
for epoch in range(num_epochs):
    for n, (real_samples, mnist_labels) in enumerate(train_loader):
        # Data for training the discriminator
        real_samples = real_samples.to(device=device)
        real_samples_labels = torch.ones((real_samples.size(0), 1)).to(device=device)
        latent_space_samples = torch.randn((real_samples.size(0), 100)).to(device=device)
        generated_samples = generator(latent_space_samples)
        generated_samples_labels = torch.zeros((real_samples.size(0), 1)).to(device=device)
        all_samples = torch.cat((real_samples.view(real_samples.size(0), -1),
                                 generated_samples.view(generated_samples.size(0), -1)))
        all_samples_labels = torch.cat((real_samples_labels,
                                        generated_samples_labels))
        # Training the discriminator
        discriminator.zero_grad()
        output_discriminator = discriminator(all_samples)
        loss_discriminator = loss_function(output_discriminator, all_samples_labels)
        loss_discriminator.backward()
        optimizer_discriminator.step()
        # Data for training the generator
        latent_space_samples = torch.randn((real_samples.size(0), 100)).to(device=device)
        # Training the generator: fool the discriminator into predicting "real"
        generator.zero_grad()
        generated_samples = generator(latent_space_samples)
        output_discriminator_generated = discriminator(generated_samples)
        loss_generator = loss_function(output_discriminator_generated, real_samples_labels)
        loss_generator.backward()
        optimizer_generator.step()
        # Show loss
        if n == len(train_loader) - 1:
            print(f"Epoch: {epoch} Loss D.: {loss_discriminator:.4f}")
            print(f"Epoch: {epoch} Loss G.: {loss_generator:.4f}")
# Save models
torch.save(generator.state_dict(), "generator.pth")
torch.save(discriminator.state_dict(), "discriminator.pth")
Output:
Exercise:
1. Implement a Conditional GAN (cGAN) where the generator and discriminator are
conditioned on class labels (e.g., digits 0-9 from the MNIST dataset). Train the cGAN
and generate images for each digit class.
Solution:
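A minimal sketch of the conditional networks for this exercise; the layer sizes follow the unconditional example above, and the class label is injected through an nn.Embedding (one common cGAN design, stated here as an assumption rather than the only option):
import torch
from torch import nn

# Conditional generator: the class label is embedded and concatenated
# with the latent vector before the fully connected layers.
class ConditionalGenerator(nn.Module):
    def __init__(self, latent_dim=100, num_classes=10):
        super().__init__()
        self.label_emb = nn.Embedding(num_classes, num_classes)
        self.model = nn.Sequential(
            nn.Linear(latent_dim + num_classes, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 784),
            nn.Tanh(),
        )

    def forward(self, z, labels):
        x = torch.cat((z, self.label_emb(labels)), dim=1)
        return self.model(x).view(z.size(0), 1, 28, 28)

# Conditional discriminator: the label embedding is concatenated with the
# flattened image before scoring real vs. fake.
class ConditionalDiscriminator(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.label_emb = nn.Embedding(num_classes, num_classes)
        self.model = nn.Sequential(
            nn.Linear(784 + num_classes, 512),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
            nn.Sigmoid(),
        )

    def forward(self, img, labels):
        x = torch.cat((img.view(img.size(0), -1), self.label_emb(labels)), dim=1)
        return self.model(x)
During training, the loop from the example above can be reused with the MNIST labels (and randomly sampled labels for fake images) passed to both networks; after training, one image per digit can be generated by fixing labels = torch.arange(10).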