Introduction:
This is a “Face Mask Detection” project that detects whether a user is wearing a face mask or not.
In the present scenario it is very important to keep ourselves safe, and in order to protect ourselves almost every one of us wears a face mask. With the virus spreading widely, it becomes increasingly necessary to check that people wear face masks in most public gatherings, such as malls, theatres, and parks. A solution that detects whether a person is wearing a face mask before allowing entry would be of great help to society. This face mask detection system is built using the deep learning technique known as the Convolutional Neural Network (CNN). The model is built with the TensorFlow framework and the OpenCV library, which is widely used for real-time applications, and it can be extended into full-fledged software that scans every person before they enter a public gathering. It can also be refined further to achieve even higher accuracy. Detecting face masks is an extremely challenging task for face detectors, because masked faces appear in varied orientations, with various degrees of occlusion, and with diverse mask types. This face mask detection model is prepared using a Convolutional Neural Network (CNN), TensorFlow, Keras, and MobileNetV2, which is used as an image classifier.
Objective:
After the outbreak of the worldwide pandemic, there arose a severe need for protection mechanisms, the face mask being the primary one. The basic aim of the project is to detect the presence of a face mask on human faces. Deep learning is used to develop this face detector model, applying basic concepts of transfer learning in neural networks to finally output the presence or absence of a face mask.
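As a brief illustration of this transfer-learning idea, a network pre-trained on a large dataset can be reused as a frozen feature extractor and topped with a small classification head. The sketch below assumes a MobileNetV2 base with ImageNet weights and a two-class head; it is a minimal example of the pattern, not the exact project code.

    import tensorflow as tf

    # Minimal transfer-learning sketch (assumed setup, not the project code):
    # a frozen MobileNetV2 base pre-trained on ImageNet, plus a small head.
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights="imagenet")
    base.trainable = False  # keep the pre-trained features fixed

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(2, activation="softmax"),  # mask / no mask
    ])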
Tools and Technologies used:
TENSORFLOW:
TensorFlow is an end-to-end open-source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state of the art in ML and lets developers easily build and deploy ML-powered applications. Models can be built and trained easily using intuitive high-level APIs like Keras with eager execution, which makes for immediate model iteration and easy debugging. TensorFlow makes it easy to train and deploy models in the cloud, on-premises, in the browser, or on-device, no matter what language you use, and its simple, flexible architecture takes new ideas from concept to code, to state-of-the-art models, and to publication faster.
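For example, eager execution (the default in TensorFlow 2) evaluates operations immediately, so intermediate results can be inspected directly while iterating on a model:

    import tensorflow as tf

    # Eager execution evaluates operations immediately,
    # so results can be printed without building a session.
    x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    y = tf.matmul(x, x) + 1.0
    print(y.numpy())  # a plain NumPy array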
KERAS:
Keras is a minimalist Python library for deep learning that runs on top of TensorFlow. It was developed to make implementing deep learning models as fast and easy as possible for research and development. Keras is the high-level API of TensorFlow: an approachable, highly productive interface for solving machine learning problems, with a focus on modern deep learning. It provides essential abstractions and building blocks for developing and shipping machine learning solutions with high iteration velocity. Keras empowers engineers and researchers to take full advantage of the scalability and cross-platform capabilities of TensorFlow: Keras models can run on TPUs or on large clusters of GPUs, and they can be exported to run in the browser or on a mobile device.
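As a small illustration of that portability, a trained Keras model can be converted for on-device use with the TensorFlow Lite converter. The sketch below uses a toy model purely as a stand-in:

    import tensorflow as tf

    # Stand-in model; any trained tf.keras model could be used here.
    model = tf.keras.Sequential([tf.keras.layers.Dense(2, input_shape=(4,))])

    # Convert the Keras model to TensorFlow Lite for mobile deployment.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()
    with open("model.tflite", "wb") as f:
        f.write(tflite_model)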
OPENCV:
OpenCV is a huge open-source library for computer vision, machine learning, and image processing. It supports a wide variety of programming languages such as Python, C++, and Java, and it can process images and videos to identify objects, faces, or even human handwriting. When integrated with other libraries, such as NumPy, a highly optimized library for numerical operations, its arsenal grows further: whatever operations one can do in NumPy can be combined with OpenCV.
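This interoperability exists because OpenCV represents images in Python as NumPy arrays, so the two libraries compose directly. A minimal illustration (the file name is only a placeholder):

    import cv2
    import numpy as np

    # OpenCV images are NumPy arrays, so NumPy operations apply directly.
    img = cv2.imread("photo.jpg")  # placeholder path; BGR NumPy array
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    print(gray.shape, gray.dtype)  # e.g. (480, 640) uint8

    # Brighten the image with plain NumPy arithmetic.
    brighter = np.clip(gray.astype(np.int16) + 40, 0, 255).astype(np.uint8)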
MOBILENET:
MobileNet is a streamlined architecture that uses depthwise separable convolutions to construct lightweight deep convolutional neural networks, providing an efficient model for mobile and embedded vision applications. The structure of MobileNet is based on depthwise separable filters. A depthwise separable convolution filter is composed of a depthwise convolution filter and a pointwise convolution filter: the depthwise convolution filter performs a single convolution on each input channel, and the pointwise convolution filter combines the outputs of the depthwise convolution linearly with 1 × 1 convolutions.
Fig: MobileNet Architecture
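In Keras, this factorization can be written out explicitly as a depthwise convolution followed by a 1 × 1 pointwise convolution. The sketch below shows a single such block; the filter count of 64 is an arbitrary illustrative choice, not a MobileNet hyperparameter:

    import tensorflow as tf
    from tensorflow.keras import layers

    # One depthwise separable block: per-channel 3x3 filtering,
    # then a 1x1 pointwise convolution that mixes channels.
    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = layers.DepthwiseConv2D(kernel_size=3, padding="same",
                               activation="relu")(inputs)
    x = layers.Conv2D(filters=64, kernel_size=1,  # illustrative filter count
                      activation="relu")(x)
    block = tf.keras.Model(inputs, x)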
CONVOLUTIONAL NEURAL NETWORK (CNN):
The face mask detection model is built using the Sequential API of the Keras library, which allows the layers of the model to be created step by step.
The first layer is a Conv2D layer with 100 filters and a kernel size of 3×3, using the ‘ReLU’ activation function. ReLU stands for Rectified Linear Unit; it outputs the input directly if it is positive and zero otherwise. The input size is initialized as 224×224×3 for all the images to be trained and tested with this model.
The second layer is a MaxPooling2D layer with a pool size of 2×2.
The next layer is again a Conv2D layer with another 100 filters of the same kernel size 3×3 and the ‘ReLU’ activation function. This Conv2D layer is followed by another MaxPooling2D layer with a pool size of 2×2.
In the next step, the Flatten() layer flattens the pooled feature maps into a single one-dimensional vector.
After the Flatten layer, a Dropout(0.5) layer is used to prevent the model from overfitting.
Towards the end, a Dense layer with 50 units and the ‘ReLU’ activation function is added.
The last layer of the model is another Dense layer, with only two units and the ‘Softmax’ activation function. The softmax function outputs a vector representing a probability distribution over the output classes; with two units here, it outputs two probability values, one per class.
After building the model, we compile it and define the loss function and optimizer. This model uses the ‘Adam’ optimizer and ‘Binary Cross-Entropy’ as the loss function for training.
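Putting the layers described above together, the model can be written with the Keras Sequential API as follows. This is a sketch reconstructed from the description; the data pipeline is omitted:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten,
                                         Dropout, Dense)

    # The model as described above: two Conv2D + MaxPooling2D stages,
    # then Flatten, Dropout, and two Dense layers.
    model = Sequential([
        Conv2D(100, (3, 3), activation="relu", input_shape=(224, 224, 3)),
        MaxPooling2D(pool_size=(2, 2)),
        Conv2D(100, (3, 3), activation="relu"),
        MaxPooling2D(pool_size=(2, 2)),
        Flatten(),
        Dropout(0.5),
        Dense(50, activation="relu"),
        Dense(2, activation="softmax"),  # mask / no mask
    ])

    model.compile(optimizer="adam",
                  loss="binary_crossentropy",  # expects one-hot labels
                  metrics=["accuracy"])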
For face detection, Haar feature-based cascade classifiers are used in this experiment. This is a machine learning object detection algorithm used to identify objects in an image or video, based on the concept of features proposed by Paul Viola and Michael Jones. A cascade function is trained from a large number of positive and negative images and is then used to detect objects in other images.
The cascade classifier used in this experiment is the face detection cascade classifier: a model pre-trained on frontal facial features, used here to detect faces in real time.
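A minimal sketch of this real-time detection loop, using the pre-trained frontal face cascade that ships with OpenCV (the webcam index and window handling are assumptions):

    import cv2

    # Load OpenCV's pre-trained frontal face Haar cascade.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    cap = cv2.VideoCapture(0)  # default webcam (assumed)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        for (x, y, w, h) in faces:
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("faces", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()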
Finally, the CNN model, together with the cascade classifier, is trained for 30 epochs with two classes: one denoting the images with face masks and the other the images without.
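The training step itself reduces to a single fit call. A hedged sketch, assuming train_data and val_data are label-paired image datasets prepared elsewhere (for example with ImageDataGenerator.flow_from_directory):

    # Sketch: train the compiled model for 30 epochs on the two classes.
    # train_data and val_data are hypothetical datasets prepared elsewhere.
    history = model.fit(train_data, epochs=30, validation_data=val_data)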
HAAR CASCADE:
Haar cascade classifiers are an effective method for object detection. The method was proposed by Paul Viola and Michael Jones in their paper Rapid Object Detection using a Boosted Cascade of Simple Features. Haar cascade is a machine learning-based approach in which many positive and negative images are used to train the classifier.
Positive images – images that contain the object we want our classifier to identify.
Negative images – images of everything else, which do not contain the object we want to detect.
In summary:
- Haar cascades are machine learning object detection algorithms.
- They use Haar features to determine the likelihood of a certain point being part of an object.
- Boosting algorithms are used to produce a strong prediction out of a combination of “weak” learners.
- Cascading classifiers are used to run boosting algorithms on different subsections of the input image.
- Haar cascades should be tuned to minimize false negatives.
- OpenCV provides a ready-made implementation for building and running Haar cascade models.
Haar feature-based cascade classification is an effective machine learning-based approach in which a cascade function is trained using a sample that contains many positive and negative images. The strong classifiers produced by AdaBoost are divided into stages to form the cascade classifier. The term “cascade” means that the resulting classifier consists of a set of simpler classifiers that are applied to a region of interest until the candidate region is either discarded or passed.
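The stage-wise rejection idea can be sketched in a few lines of illustrative Python; the stage functions below are stand-ins for trained classifiers, not OpenCV internals:

    # Illustrative sketch of cascade evaluation: each stage is a cheap
    # classifier, and a region must pass every stage to be accepted.
    def cascade_predict(region, stages, thresholds):
        for stage, threshold in zip(stages, thresholds):
            if stage(region) < threshold:  # fail early: discard the region
                return 0
        return 1  # survived all stages: likely contains the object

    # Two dummy stages standing in for trained weak-classifier ensembles.
    stages = [lambda r: sum(r) / len(r), lambda r: max(r)]
    print(cascade_predict([0.2, 0.9, 0.7], stages, thresholds=[0.3, 0.6]))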
The cascade classifier splits the classification work into two stages: training and detection. The training stage gathers the samples, which are classified as positive and negative, and the cascade classifier employs supporting functions to generate a training dataset and to evaluate the performance of classifiers.
In order to train the cascade classifier, we need a set of positive and negative samples. In our work, we used the opencv_createsamples utility to create the positive samples for opencv_traincascade; its output file serves as the input to opencv_traincascade for training the face detector. The negative samples are collected from arbitrary images that do not contain the objects to be detected.
The flow of the cascade classifier is as follows. Initially, the classifier is trained with a few positive and negative samples, arbitrary images scaled to the same size. The classifier outputs “1” if the region is likely to contain a face and “0” otherwise. The major goal of the cascade classifier is to find face objects of interest at diverse sizes, making the classifier more efficient without altering the size of the input images.