
DEEP LEARNING BASED FACIAL RECOGNITION FOR TASK

AUTHENTICATION

A PROJECT REPORT
Submitted by

SHRUTHI S
CB.SC.I5DAS20144

in partial fulfillment of the requirements for the award of the


degree of

INTEGRATED MASTER OF SCIENCE


IN
DATA SCIENCE

Department of Mathematics

AMRITA SCHOOL OF PHYSICAL SCIENCES

AMRITA VISHWA VIDYAPEETHAM

COIMBATORE 641112

NOVEMBER 2024
DEPARTMENT OF MATHEMATICS

AMRITA SCHOOL OF PHYSICAL SCIENCES,

Coimbatore - 641112

BONAFIDE CERTIFICATE

This is to certify that the project report entitled "DEEP LEARNING BASED FACIAL
RECOGNITION FOR TASK AUTHENTICATION" submitted by SHRUTHI S
(CB.SC.I5DAS20144) in partial fulfillment of the requirements for the award of the Degree of
Integrated Master of Science in Data Science is a bonafide record of the work carried out under my
guidance and supervision at Amrita School of Physical Sciences, Coimbatore.

Project Coordinator Project Advisor

Chairperson
Department of Mathematics
Dr. K. Somasundaram

The project was evaluated by us on:

Internal Examiner Internal Examiner


DEPARTMENT OF MATHEMATICS

AMRITA SCHOOL OF PHYSICAL SCIENCES,

Coimbatore - 641112

DECLARATION

I, SHRUTHI S (CB.SC.I5DAS20144), hereby declare that this project report entitled "DEEP
LEARNING BASED FACIAL RECOGNITION FOR TASK AUTHENTICATION" is the
record of the original work done by me at the Department of Mathematics, Amrita School of Physical
Sciences, Coimbatore. To the best of my knowledge, this work has not formed the basis for the
award of any degree, diploma, associateship, fellowship, or similar award to any candidate in
any university or institution.

Date:

Place:

Signature of the Student

COUNTERSIGNED
Dr. P. Tamilalagan
Project Coordinator,
Department of Mathematics,
School of Physical Sciences,
Amrita Vishwa Vidyapeetham,
Coimbatore-641112, Tamil Nadu, India.
Acknowledgements

I would like to express my sincere gratitude to everyone who has supported me in the completion
of this project. I am especially thankful to my project guide, Dr. Neha Singh, Assistant Professor,
Department of Mathematics, School of Physical Sciences, Coimbatore, for her continuous support
and guidance throughout this project. I also extend my gratitude to our Chairperson, Dr.
Somasundaram K, Department of Mathematics, School of Physical Sciences, Coimbatore, for
fostering a creative and encouraging environment that greatly contributed to this project’s success.
I am grateful to my friends for their constant encouragement and assistance in bringing this project
to life.

Most importantly, I would like to express my heartfelt appreciation to my parents for their
unwavering understanding and support, which provided me with the strength and motivation to
complete this project successfully.

(SHRUTHI S)
ABSTRACT

This project, titled Deep Learning-Based Facial Recognition for Task Authentication, aims to
develop a robust and secure system for task authentication using deep learning and computer
vision techniques. By combining Convolutional Neural Networks (CNN), Vision Transformers
(ViT), and traditional computer vision methods, the system provides a highly accurate solution
for real-time facial verification, ensuring only authorized individuals can complete specific
tasks.

The authentication process leverages a structured dataset where each employee’s identity is
confirmed using five reference images and a live image. These images are processed to generate
facial embeddings through a pre-trained VGG16 model, which are then stored in Elasticsearch
for efficient retrieval and comparison. During the verification stage, the system fetches the latest
image’s embeddings, which are compared to stored embeddings based on a cosine similarity
threshold of 0.9, ensuring reliable identity verification.

Key challenges addressed include managing compatibility across frameworks and
synchronizing data between SQL Server and Elasticsearch. Testing and comparison of CNN,
ViT, and traditional computer vision approaches enabled selection of the optimal model for this
specific application. This project highlights the synergy between deep learning and computer
vision in creating secure, automated solutions for task management, enhancing both the
reliability and efficiency of organizational processes.
CONTENTS

1. Introduction
   1.1. Background
   1.2. Objectives
   1.3. Overview of Approach

2. Literature Review
   2.1. Overview of Facial Recognition Techniques
   2.2. Traditional vs. Deep Learning Approaches
   2.3. CNN vs. Vision Transformer Models

3. Methodology
   3.1. Data Collection and Preprocessing
        3.1.1. Dataset Selection
        3.1.2. Image Preprocessing Techniques
        3.1.3. Storage and Retrieval in Elasticsearch
   3.2. Model Selection and Training
        3.2.1. CNN Model Implementation
        3.2.2. Vision Transformer (ViT) Model Implementation
        3.2.3. Traditional Computer Vision Methods
   3.3. Embedding Storage and Verification Process
        3.3.1. Embedding Generation with VGG16
        3.3.2. Embedding Storage and Management in Elasticsearch
        3.3.3. Cosine Similarity for Verification and Threshold Criteria
        3.3.4. User Interface for Presentation Purposes

4. System Design
   4.1. Workflow of the Facial Authentication System
        4.1.1. Data Collection and Initialization
        4.1.2. Image Preprocessing
        4.1.3. Embedding Generation and Storage
        4.1.4. Real-Time Verification
        4.1.5. Data Pipeline Workflow
   4.2. Integration of Elasticsearch for Storage and Retrieval
        4.2.1. Indexing and Data Mapping
        4.2.2. Efficient Storage and Scalability
        4.2.3. Data Synchronization
   4.3. Real-Time Verification Process
        4.3.1. Capture and Embedding Generation
        4.3.2. Retrieving Stored Embeddings from Elasticsearch
        4.3.3. Cosine Similarity and Matching
        4.3.4. Real-Time Feedback and Access Control
        4.3.5. Employee Facial Recognition Pipeline

5. Implementation
   5.1. Development Environment Setup
   5.2. Implementation of the Verification Model
   5.3. System Testing and Debugging
   5.4. Addressing Version Compatibility

6. Milestones

7. Evaluation
   7.1. Performance Analysis of Each Model
   7.2. Verification Success Rate and Accuracy Metrics
   7.3. Cosine Similarity Observations

8. Results and Analysis
   8.1. Performance of VGG16 Model
   8.2. Verification Success Rate and Accuracy
   8.3. Observations from Cosine Similarity Testing

9. Conclusion


CHAPTER 1
INTRODUCTION

INTRODUCTION

In today's digital landscape, security and efficiency are of utmost importance for organizations
managing various tasks. Traditional authentication methods, such as passwords or tokens, often
present vulnerabilities or usability challenges. Facial recognition, powered by advancements
in deep learning and computer vision, offers a promising alternative that is both secure and
user-friendly. By automating the process of verifying individuals based on their facial features,
organizations can enhance security and streamline task verification.

This project, titled "Deep Learning-Based Facial Recognition for Task Authentication," is the
first step toward building a production-ready system that uses facial recognition for secure task
authentication. The goal is to ensure that only authorized personnel are allowed to sign off or
complete specific tasks, thus maintaining integrity and accountability in organizational
workflows.

Leveraging deep learning models such as Convolutional Neural Networks (CNN) and Vision
Transformers (ViT), combined with traditional computer vision techniques, this project aims to
implement an initial prototype of a facial authentication system. The system works by
processing a set of reference images for each employee, generating facial embeddings using a
pre-trained VGG16 model, and storing these embeddings in Elasticsearch for comparison
during real-time verification. The verification process uses cosine similarity to ensure accurate
matching, with a threshold set at 0.9 for successful identification.

This project serves as the foundation for a more robust and scalable system, which will undergo
rigorous testing and refinement to meet production standards. Key challenges addressed in this
first phase include the synchronization of data between SQL Server and Elasticsearch, as well
as evaluating the performance of different models in terms of accuracy and computational
efficiency. Based on testing outcomes, the system will be enhanced to ensure it meets the
security and performance requirements necessary for production deployment.

Ultimately, this project demonstrates the potential of deep learning and computer vision to
revolutionize task authentication processes. While the current implementation is a first step, it
sets the stage for continuous improvement and optimization, paving the way for a fully
functional and scalable facial recognition system in organizational settings.

CHAPTER 2
LITERATURE REVIEW

LITERATURE REVIEW

2.1 Overview of Facial Recognition Techniques

Facial recognition has become one of the most reliable forms of biometric authentication due
to its non-intrusive nature and high accuracy in identifying individuals. The core of facial
recognition involves detecting facial features and comparing them against a stored dataset of
known faces to verify identity. Over the years, facial recognition has evolved through several
stages, starting from basic geometric-based methods to complex, deep learning-based models
that can handle variations in lighting, pose, and expression. Recent advancements leverage
feature extraction methods such as face embeddings, which convert faces into vector
representations, allowing for highly accurate comparisons. This progression demonstrates the
field’s adaptation to meet increasing demands for accuracy, speed, and scalability.

2.2 Traditional Computer Vision Methods vs. Deep Learning Approaches

Early facial recognition systems primarily relied on traditional computer vision techniques,
such as Eigenfaces and Fisherfaces, which used Principal Component Analysis (PCA) and
Linear Discriminant Analysis (LDA), respectively, to extract features. Although effective, these
techniques struggled with variations in environmental conditions, such as changes in lighting
and angle. As deep learning emerged, more sophisticated models, particularly Convolutional
Neural Networks (CNNs), became the standard for facial recognition. Deep learning
approaches revolutionized facial recognition by allowing models to learn complex facial
features from large datasets, thus improving accuracy significantly under diverse conditions.
Deep learning also enables end-to-end training, making models more adaptable and less reliant
on manual feature engineering. This shift from traditional methods to deep learning has
expanded facial recognition capabilities, especially in areas requiring real-time authentication.

2.3 Comparison of CNN and Vision Transformer (ViT) Models

Convolutional Neural Networks (CNNs) have traditionally been the backbone of facial
recognition systems due to their ability to capture spatial hierarchies in images, making them
ideal for detecting facial features. CNNs use convolutional layers to identify patterns at various

5
levels, from simple edges to complex structures, providing accurate embeddings for facial
comparisons. However, CNNs have limitations in handling long-range dependencies and may
require extensive labeled data for training.

Vision Transformers (ViT), a newer approach, bring the attention mechanism—originally used
in natural language processing—into computer vision, enabling models to capture relationships
across the entire image, rather than focusing on local features alone. ViTs divide an image into
patches, treating each patch as a sequence element and learning long-range dependencies. ViTs
have shown promise in facial recognition, especially in cases where relationships between
distant features, such as eyes and mouth, need to be captured for precise recognition. Although
ViTs are computationally demanding and may require large amounts of data, they provide an
alternative that can complement or enhance the performance of CNNs, depending on the
specific requirements of the application.

CHAPTER 3
METHODOLOGY

METHODOLOGY

This section outlines the steps involved in developing the facial authentication system,
including data collection, model selection, embedding generation, and verification processes.
Each stage was carefully designed to ensure accuracy, efficiency, and scalability, providing a
strong foundation for real-time task authentication.

3.1 Data Collection and Preprocessing

3.1.1 Dataset Selection

To ensure accurate facial recognition, the project required a diverse and reliable dataset. Each
employee’s identity was validated using five high-quality reference images, captured in various
lighting conditions and angles to account for real-world variations. A live image is also used
for real-time verification during task sign-off. This dataset structure enables the system to
handle typical variations while ensuring accurate recognition.

3.1.2 Image Preprocessing Techniques

Before feeding images into the model, several preprocessing steps were applied to enhance
accuracy (a minimal sketch follows this list):

• Face Detection: Each image undergoes face detection to isolate the face from the
background, reducing noise and improving the quality of embeddings.
• Image Resizing: Images are resized to a fixed input size compatible with the selected
deep learning models (e.g., 224x224 pixels for VGG16).
• Normalization: Image pixel values are normalized to a standard scale (typically
between 0 and 1) to improve model convergence and performance.
• Data Augmentation: To further improve model robustness, augmentation techniques
such as random rotations, zooms, and brightness adjustments were applied, helping the
model generalize to new images during real-time authentication.
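
A minimal sketch of the detection, resizing, and normalization steps is given below. The
report does not name a specific face detector, so OpenCV's bundled Haar cascade is assumed
here; the function and variable names are illustrative, and augmentation (rotations, zooms,
brightness shifts) would be applied separately during training.

```python
import cv2

# Assumed detector: OpenCV's bundled frontal-face Haar cascade.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess(image_path, size=(224, 224)):
    """Detect the face, crop it, resize to the model input, normalize to [0, 1]."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        raise ValueError("no face detected in " + image_path)
    x, y, w, h = faces[0]                            # keep the first detected face
    face = cv2.resize(img[y:y + h, x:x + w], size)   # e.g. 224x224 for VGG16
    return face.astype("float32") / 255.0            # scale pixels to [0, 1]
```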

3.1.3 Storage and Retrieval in Elasticsearch

Elasticsearch serves as the primary storage solution for employee facial embeddings. After
preprocessing, each image's embedding is generated and stored in the facerecognition index
within Elasticsearch, organized by unique employee IDs. Elasticsearch allows for efficient
retrieval and updating of embeddings, making it suitable for high-speed, real-time facial
verification. This setup also supports scalability, enabling the system to accommodate a
growing number of employees without performance degradation.
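
As a sketch of this storage step (assuming the Python Elasticsearch client and a local node;
the field names follow the index structure described in Section 4.2.1):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local node

def store_embedding(employee_id, embedding, timestamp):
    """Index one facial embedding in the facerecognition index."""
    es.index(index="facerecognition", document={
        "employee_id": employee_id,          # unique employee ID
        "timestamp": timestamp,              # when the image was enrolled
        "embedding": embedding.tolist(),     # embedding vector as a plain list
    })
```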

3.2 Model Selection and Training

3.2.1 CNN Model Implementation

Convolutional Neural Networks (CNNs), specifically the VGG16 model, were chosen due to
their robust feature extraction capabilities. The VGG16 model uses multiple convolutional
layers to detect intricate patterns and structures in facial images, making it well-suited for facial
recognition tasks. Key steps in implementing the CNN include (a sketch follows this list):

• Transfer Learning: Pre-trained VGG16 weights are used to leverage existing
knowledge, allowing the model to produce accurate embeddings without requiring
extensive retraining.
• Embedding Generation: Once images are processed through VGG16, the output from
the penultimate layer serves as a high-dimensional vector, or embedding, capturing
unique facial features.
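
A minimal sketch of this embedding setup follows. In Keras's pre-trained VGG16, the
penultimate fully connected layer is named "fc2" and outputs a 4096-dimensional vector;
truncating the model at that layer is one common way to obtain embeddings, assumed here.

```python
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.models import Model

# Truncate VGG16 at its penultimate layer so it emits embeddings, not classes.
base = VGG16(weights="imagenet", include_top=True)
embedder = Model(inputs=base.input, outputs=base.get_layer("fc2").output)

def embed(face):
    """face: 224x224x3 array normalized to [0, 1]; returns a 4096-d embedding."""
    x = np.expand_dims(face * 255.0, axis=0)      # back to [0, 255] for VGG16
    return embedder.predict(preprocess_input(x), verbose=0)[0]
```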

3.2.2 Vision Transformer (ViT) Model Implementation

Vision Transformers (ViT) offer an alternative to CNNs by using the self-attention mechanism,
capturing relationships between all parts of an image. Key steps in implementing ViT include (a patch-embedding sketch follows this list):

• Patch Embedding: Each image is divided into fixed-size patches, which are then linearly
embedded to serve as the input tokens for the transformer.
• Attention Mechanism: ViT models learn to capture long-range dependencies between
patches, enabling a more holistic understanding of facial features compared to CNNs.

• Embedding Comparison: Like CNNs, ViT outputs an embedding vector, which is stored
and compared during verification. ViTs were assessed for performance and accuracy to
determine their suitability for real-time applications.
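
To illustrate the patch-embedding step alone (a full ViT also adds position embeddings and
transformer encoder blocks), here is a minimal TensorFlow sketch; the 16x16 patch size and
768-dimensional projection are the standard ViT-Base settings, assumed rather than taken
from the report.

```python
import tensorflow as tf

PATCH, DIM = 16, 768                        # assumed ViT-Base settings
projection = tf.keras.layers.Dense(DIM)     # learned linear patch projection

def patch_embed(images):
    """Split 224x224x3 images into 16x16 patches and project each flattened
    patch to a DIM-dimensional token, the input sequence of a ViT."""
    patches = tf.image.extract_patches(
        images=images,
        sizes=[1, PATCH, PATCH, 1],
        strides=[1, PATCH, PATCH, 1],
        rates=[1, 1, 1, 1],
        padding="VALID")                    # -> (batch, 14, 14, 16*16*3)
    batch = tf.shape(images)[0]
    patches = tf.reshape(patches, (batch, 196, PATCH * PATCH * 3))
    return projection(patches)              # -> (batch, 196, DIM) tokens
```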

3.2.3 Traditional Computer Vision Methods

To complement deep learning methods, traditional computer vision techniques such as
Eigenfaces and Fisherfaces were also explored. These methods use Principal Component
Analysis (PCA) and Linear Discriminant Analysis (LDA) to identify key facial features and
reduce dimensionality, making them computationally efficient for specific applications.
However, due to their limitations in handling variations in lighting and pose, traditional
methods were primarily included for comparison purposes and were not used in the final
implementation.

3.3 Embedding Storage and Verification Process

3.3.1 Embedding Generation with VGG16

The VGG16 model generates embeddings for each image after preprocessing, producing a
unique high-dimensional vector that captures the essential facial features. This embedding
serves as a compressed representation of the face, which is subsequently stored in Elasticsearch
for future comparison. Using VGG16 pre-trained weights ensures that the embeddings are
accurate without extensive training time, enhancing system efficiency.

3.3.2 Embedding Storage and Management in Elasticsearch

Elasticsearch provides a structured environment for storing and managing facial embeddings:

• Indexing and Storage: Each employee’s embeddings are indexed within Elasticsearch
under a unique employee ID. Multiple embeddings per employee are stored to account for
variations in appearance over time.
• Data Syncing: The system syncs the SQL Server database with Elasticsearch, ensuring
that any new employee images added to the SQL database are automatically reflected in
Elasticsearch, maintaining consistency across both databases.

• Scalability: Elasticsearch’s scalability allows the system to handle a growing number of
employee records while maintaining fast retrieval times, a crucial feature for real-time
applications.

3.3.3 Cosine Similarity for Verification and Threshold Criteria

For the verification process, cosine similarity is used to measure the closeness between the
embedding of the live image and the stored embeddings. The steps in the verification process
are as follows (a minimal verification sketch follows the list):

• Cosine Similarity Calculation: Cosine similarity measures the angle between two
embedding vectors, yielding a score between -1 and 1. A higher cosine similarity indicates
a closer match between the embeddings.
• Threshold Setting: A threshold of 0.9 was set as the minimum acceptable similarity score
for successful authentication. If the cosine similarity between the live image embedding
and all reference embeddings for a given employee is above 0.9, the person is considered
authenticated; otherwise, access is denied.
• Real-Time Verification: During task sign-off, the system retrieves the latest image
embedding from Elasticsearch and performs cosine similarity checks against stored
embeddings in real time. This real-time comparison enables prompt and accurate task
authentication.
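
A minimal sketch of this check, assuming NumPy arrays for the embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors; result in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_authenticated(live_embedding, reference_embeddings, threshold=0.9):
    """Accept only if the live embedding clears the 0.9 threshold against
    every stored reference embedding, as described above."""
    return all(cosine_similarity(live_embedding, ref) >= threshold
               for ref in reference_embeddings)
```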

3.3.4 User Interface for Presentation Purposes

While the final deployment does not require a user interface, a simple UI was added for
demonstration purposes. This UI allows for an interactive experience to showcase the
workflow, from uploading employee images to displaying verification results. The UI provides
a visual representation of the system's functionality, making it easier for stakeholders to
understand the authentication process, especially during panel presentations.

CHAPTER 4
SYSTEM DESIGN

SYSTEM DESIGN

The system design of this project focuses on achieving efficient, accurate, and secure facial
authentication for task sign-off. The following subsections outline the overall workflow,
integration with Elasticsearch for efficient data management, and the real-time verification
process.

4.1 Workflow of the Facial Authentication System

The facial authentication system is designed with a step-by-step workflow to capture, process,
store, and verify employee images for task authentication. This structured approach helps
ensure that each component of the system—from data collection to real-time verification—
functions optimally and integrates smoothly.

4.1.1 Data Collection and Initialization:

• Each employee provides five images during an initial enrollment phase. These images
are captured under various conditions to enhance the model's ability to generalize across
different lighting, expressions, and poses. A unique employee ID is assigned to each
person, linking their images and authentication details in the system.
• A live image is captured during each task sign-off attempt. This image is used for real-
time verification to ensure that only authorized personnel complete or sign off on the
task.

4.1.2 Image Preprocessing:

• Images undergo face detection to isolate and focus on the facial area, eliminating
background noise.
• Images are resized and normalized to match the input requirements of the deep learning
models (e.g., VGG16).
• Data augmentation techniques may be applied to account for environmental variations,
improving model robustness.

4.1.3 Embedding Generation and Storage:

• After preprocessing, each image is passed through a deep learning model (e.g., VGG16
or Vision Transformer) to generate a facial embedding. This embedding is a unique
high-dimensional vector that represents the core features of an individual’s face.
• These embeddings are then stored in Elasticsearch, mapped to the employee’s ID,
ensuring quick access during the verification process.

4.1.4 Real-Time Verification:

• During task sign-off, the system captures a live image and generates an embedding,
which is compared against the stored embeddings in Elasticsearch. Cosine similarity
measures are used to assess how closely the live embedding matches the stored
embeddings.
• This workflow provides a comprehensive, end-to-end process that enables the system
to capture, store, and verify facial data for secure and efficient task authentication.

4.1.5 Data Pipeline Workflow:

[Figure: data pipeline workflow diagram]
4.2 Integration of Elasticsearch for Storage and Retrieval

Elasticsearch plays a central role in managing and retrieving facial embeddings for real-time
verification. It was chosen for its speed, scalability, and ability to handle complex data queries,
making it well-suited for high-frequency access in authentication systems.

4.2.1 Indexing and Data Mapping:

• All embeddings are indexed within Elasticsearch under a unique identifier
corresponding to each employee ID. This allows the system to store and organize
multiple embeddings for a single employee, covering variations across different
images.
• The index is set up to include fields such as the employee ID, timestamp, and
embedding data, creating an efficient structure for storing and accessing facial data (a possible index mapping is sketched below).
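
A possible mapping for this index is sketched below; the dense_vector type and field names
are assumptions consistent with the description above, and 4096 matches the VGG16 fc2
embedding size (the maximum dense_vector dimension varies across Elasticsearch versions,
so older clusters may require a smaller embedding).

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Hypothetical mapping for the facerecognition index described above.
es.indices.create(index="facerecognition", mappings={
    "properties": {
        "employee_id": {"type": "keyword"},       # unique employee ID
        "timestamp":   {"type": "date"},          # enrollment time
        "embedding":   {"type": "dense_vector",   # facial embedding vector
                        "dims": 4096},
    }
})
```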

4.2.2 Efficient Storage and Scalability:

• With Elasticsearch, the system can efficiently store embeddings for many employees,
supporting horizontal scalability to accommodate future growth.
• Elasticsearch’s architecture enables fast retrieval, allowing the system to perform real-
time comparisons without delays. This scalability is essential as the system expands to
handle more employees and data.

4.2.3 Data Synchronization:

• To ensure the database is up-to-date, the system synchronizes with an SQL Server
database that maintains employee information. Data from the SQL Server
(`bas_wc_wemp_empimgs` table) is mapped to the Elasticsearch index
(`facerecognition`), ensuring consistency across both databases.
• Data synchronization enables real-time updates, where any new image uploaded to
SQL Server is automatically reflected in Elasticsearch, making the latest embeddings
available for immediate verification (a minimal sync sketch follows this list).

• This integration of Elasticsearch with the facial authentication system ensures fast,
reliable data management, enabling smooth and efficient operations for real-time task
sign-offs.
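
A minimal one-way sync sketch follows. The table name (`bas_wc_wemp_empimgs`) and index
name (`facerecognition`) come from the text above; the connection string, column names, and
Base64 image encoding are assumptions, and embed() refers to the Section 3.2.1 sketch.

```python
import base64

import cv2
import numpy as np
import pyodbc
from elasticsearch import Elasticsearch

sql = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                     "SERVER=localhost;DATABASE=company;Trusted_Connection=yes")
es = Elasticsearch("http://localhost:9200")

cursor = sql.cursor()
cursor.execute("SELECT emp_id, img_b64, created_at FROM bas_wc_wemp_empimgs")
for emp_id, img_b64, created_at in cursor.fetchall():
    raw = np.frombuffer(base64.b64decode(img_b64), np.uint8)
    img = cv2.imdecode(raw, cv2.IMREAD_COLOR)               # Base64 -> image
    face = cv2.resize(img, (224, 224)).astype("float32") / 255.0
    vec = embed(face)                     # embed() as sketched in Section 3.2.1
    es.index(index="facerecognition", document={
        "employee_id": emp_id,
        "timestamp": created_at.isoformat(),
        "embedding": vec.tolist(),
    })
```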

4.3 Real-Time Verification Process for Task Sign-Off

The real-time verification process is critical to ensuring secure and efficient task authentication.
This process involves generating a live image embedding, retrieving stored embeddings, and
comparing them to determine if a match exists.

4.3.1 Capture and Embedding Generation:

During task sign-off, a live image is captured, and its facial embedding is generated using a pre-
trained VGG16 model. This embedding represents the individual’s facial features in a compact,
high-dimensional format, capturing unique details essential for verification.

4.3.2 Retrieving Stored Embeddings from Elasticsearch:

The system queries Elasticsearch to retrieve all embeddings associated with the employee’s ID.
Since embeddings are organized by employee ID and timestamp, the latest embeddings are
easily accessible for comparison.
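
As a sketch of this retrieval step (field names as assumed earlier):

```python
def fetch_embeddings(es, employee_id, count=5):
    """Return the newest stored embeddings for one employee, latest first."""
    resp = es.search(index="facerecognition",
                     query={"term": {"employee_id": employee_id}},
                     sort=[{"timestamp": {"order": "desc"}}],
                     size=count)
    return [hit["_source"]["embedding"] for hit in resp["hits"]["hits"]]
```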

4.3.3 Cosine Similarity and Matching:

• Cosine Similarity Calculation: The similarity between the live image embedding and
stored embeddings is calculated using cosine similarity, which measures the angle
between the two vectors. A similarity score near 1 indicates a close match, while a
score near 0 or below indicates a lack of similarity.
• Threshold Verification: A threshold of 0.9 is set to determine a successful match. If
the similarity score for the live embedding compared to each stored embedding meets
or exceeds this threshold, the task is authenticated as valid. If the score falls below 0.9,
access is denied, preventing unauthorized sign-off.

4.3.4 Real-Time Feedback and Access Control:

• Based on the cosine similarity results, the system grants or denies access. Successful
matches allow task sign-off, while failed matches trigger an alert, preventing
unauthorized actions.
• This real-time feedback ensures that the verification process is both secure and
efficient, minimizing the risk of errors and providing a streamlined experience for end
users.
• The real-time verification process is essential for maintaining the integrity of task
authentication, allowing only verified individuals to complete sensitive actions.

4.3.5 Employee Facial Recognition Pipeline:

[Figure: employee facial recognition pipeline diagram]
CHAPTER 5
IMPLEMENTATION

IMPLEMENTATION

The implementation of the Deep Learning-Based Facial Recognition for Task Authentication
system was carried out in multiple stages, including setting up the development
environment, integrating the verification model, conducting system testing and debugging, and
addressing version compatibility issues. Each stage involved careful planning, implementation,
and testing to ensure the system meets its performance and security requirements.

5.1 Development Environment Setup

The development environment was established to support the various tools and libraries
necessary for building the facial recognition system. The following steps were taken:

• Software and Libraries:
o PyCharm was chosen as the Integrated Development Environment (IDE) for writing
and debugging Python code.
o SQL Server was selected as the database management system for storing employee
information and images. Specifically, the `bas_wc_wemp_empimgs` table stores
images, while employee-related metadata is also kept for integration with the facial
recognition system.
o Elasticsearch was integrated into the system for efficient storage, indexing, and retrieval
of facial embeddings. Elasticsearch is well-suited for handling large datasets and
provides quick access to embeddings during real-time verification.
o Deep Learning Frameworks: TensorFlow/Keras was used to build the deep learning
model for generating facial embeddings. The VGG16 model was pre-trained on a large
dataset and adapted for the facial recognition task, enabling accurate and fast feature
extraction.
• Configuration Steps:
o A connection was established between SQL Server and Elasticsearch for seamless data
synchronization. The `facerecognition` index in Elasticsearch was configured to store
facial embeddings along with the associated employee IDs and timestamps.

o Python packages were installed via `pip`, ensuring compatibility with TensorFlow,
Keras, and Elasticsearch. The environment was tested to confirm that all dependencies
worked together without issues.

5.2 Implementation of the Verification Model

The core of the system involves face recognition for task sign-off, relying on generating facial
embeddings and comparing them with stored embeddings in Elasticsearch. The process is
explained below:

• Model Selection and Integration:
o VGG16: This pre-trained model was chosen for generating facial embeddings. VGG16
is a deep convolutional neural network that excels in feature extraction tasks, including
facial recognition. The model was fine-tuned to output embeddings suitable for our
application.
o Embedding Generation: When an employee provides images, they are passed through
the VGG16 model to generate embeddings. These embeddings are unique, high-
dimensional vectors that represent distinct facial features and are crucial for accurate
identity verification.
o Real-Time Verification: During task sign-off, a live image is captured, and its
embedding is generated by passing it through the pre-trained model. The live image’s
embedding is then compared to stored embeddings in Elasticsearch to verify the identity
of the person attempting to sign off on the task.
• Cosine Similarity for Matching:
o To determine if the live image belongs to the employee, we compute the cosine
similarity between the live image embedding and each stored embedding for that
employee. The cosine similarity is a measure of the angle between two vectors, and a
high similarity indicates that the two embeddings are likely from the same person.
o A threshold of 0.9 was chosen to classify a match. If the cosine similarity score is above
this threshold, the task is authenticated, and the employee is allowed to sign off on it. If
the score is below 0.9, the system denies access and triggers an alert (a combined sketch follows this list).
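
A combined sketch of this verification step, reusing the helpers sketched earlier
(preprocess/embed from Chapter 3, fetch_embeddings and cosine_similarity from Chapter 4);
this illustrates the described flow rather than reproducing the production code:

```python
import numpy as np

def verify_sign_off(es, employee_id, live_face):
    """Embed the live capture, fetch stored references, apply the 0.9 threshold."""
    live = embed(live_face)                        # Section 3.2.1 sketch
    stored = fetch_embeddings(es, employee_id)     # Section 4.3.2 sketch
    scores = [cosine_similarity(live, np.asarray(ref)) for ref in stored]
    if stored and min(scores) >= 0.9:
        return True, scores        # grant task sign-off
    return False, scores           # deny access; caller raises an alert
```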

5.3 System Testing and Debugging

The system underwent rigorous testing to ensure that it functions as expected in a real-world
environment. Several tests were conducted in different phases:

• Data Pipeline Testing:
o The flow of data between SQL Server and Elasticsearch was thoroughly tested to ensure
that images and embeddings were being transferred correctly. This involved testing the
synchronization mechanism to verify that any updates in the SQL Server database were
immediately reflected in Elasticsearch.
o Edge cases, such as adding new employee images or updating existing ones, were tested
to ensure that the system handled these updates without errors.
• Verification Accuracy Testing:
o Several tests were conducted to evaluate the accuracy of the verification model. These
included testing with a variety of images under different lighting conditions, facial
expressions, and angles to ensure that the model could consistently and accurately match
the live image with the stored embeddings.
o The model's ability to reject false positives and false negatives was also tested, with the
system being calibrated to ensure that the 0.9 threshold appropriately balanced security
and accuracy.
• Debugging Tools and Techniques:
o During the testing phase, bugs related to the image preprocessing pipeline, such as
improper image resizing or detection failures, were encountered and resolved.
o Log files were used to track system behavior and identify areas where the process broke
down. Tools like PyCharm’s debugger helped isolate issues and identify mismatched
data or system crashes.
o Extensive testing was done to ensure that all components of the system (image capturing,
embedding generation, Elasticsearch querying, and cosine similarity comparison)
worked together seamlessly.

5.4 Addressing Version Compatibility

One significant challenge encountered during the implementation phase was version
compatibility between Keras and Elasticsearch, particularly with the VGGFace model:

• Issue Identification:
o The incompatibility between Keras and Elasticsearch arose when integrating pre-trained
facial recognition models. Specifically, there were issues with the VGGFace model,
which was built using Keras but did not directly integrate with the Elasticsearch version
used for the project.
o This caused errors during the embedding storage and retrieval process, affecting the
system’s ability to retrieve embeddings from Elasticsearch and use them for verification.
• Solution and Adjustments:
o Workaround Solution: After identifying the version mismatch, I explored several
alternatives, including using a different version of Elasticsearch or modifying the pre-
trained model to work with the current version of Elasticsearch.
o Model Conversion: I also attempted converting the VGGFace model's embeddings into
a format compatible with Elasticsearch, ensuring that embeddings could be stored and
retrieved without errors.
o Dependency Updates: The versions of Keras and TensorFlow were updated to ensure
they were compatible with the required libraries for Elasticsearch. Additionally, the
Elasticsearch client was upgraded to ensure compatibility with Python-based libraries.

CHAPTER 6
MILESTONES

MILESTONES

The project milestones for the facial recognition system have been systematically organized to
ensure a structured development approach. In the Initial Research phase, comprehensive
studies were conducted on Convolutional Neural Networks (CNNs), Vision Transformers, and
various facial recognition workflows. This phase provided foundational knowledge of computer
vision techniques and pre-trained models, including VGGFace, and led to the development of
essential functions for handling Base64 image encoding, a key component for data processing.

Model Development was then initiated, where CNN models were built from scratch and tested
on datasets such as CIFAR-10, which provided insights into initial accuracy and performance
improvement opportunities. Integration of pre-trained models, including VGGFace, allowed for
significant accuracy gains on publicly available data. Additionally, research into facial
recognition systems on platforms like Android, iOS, and Huawei HarmonyOS provided insights
into potential device-level applications and compatibility requirements.

The project is now in Model Development (In Progress), where efforts are focused on
integrating computer vision models and exploring traditional face recognition methods such as
Haar Cascades and Histogram of Oriented Gradients (HOG). These approaches are being
compared to CNN and VGGFace to determine the best method for the company's dataset and
infrastructure.

In the Next Steps, a dedicated data pipeline will be built to manage the company's dataset, with
images and metadata stored in Elasticsearch for streamlined retrieval. A detailed comparison
of the CNN, VGGFace, and traditional models will be conducted using company data to
evaluate performance metrics, leading to the integration of the most effective model as an API.
This final model will be validated with real-world data to ensure its reliability, and the system
will undergo rigorous testing and security integration to prepare it for production deployment.

CHAPTER 7
EVALUATION

EVALUATION

The evaluation phase is critical in assessing the performance, accuracy, and reliability of the
facial recognition system designed for task authentication. The system was evaluated using key
performance metrics such as accuracy, precision, recall, and F1 score, which provide
comprehensive insights into its overall effectiveness. Accuracy represents the overall
correctness of the system in distinguishing authorized users from unauthorized ones, ensuring
only verified individuals can authenticate tasks. Precision measures the proportion of correct
matches among the predicted matches, while recall assesses the system’s capability to identify
all true matches. The F1 score, the harmonic mean of precision and recall, was particularly
important as it reflects the balance between false positives and false negatives, ensuring the
system’s robustness without sacrificing either sensitivity or specificity.
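
For illustration, these metrics could be computed with scikit-learn as below, treating each
authentication attempt as a binary accept/reject decision; the labels shown are made up,
not results from the report.

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

# y_true: 1 if the attempt was by the genuine employee, else 0.
# y_pred: 1 if the system accepted the attempt (similarity >= 0.9), else 0.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]   # illustrative labels only
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
```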

During evaluation, three primary computer vision models were compared for their suitability
in facial recognition: Convolutional Neural Networks (CNN), Vision Transformers (ViT), and
traditional computer vision techniques. CNNs demonstrated superior performance in face
detection and embedding generation, providing high accuracy while maintaining efficiency in
terms of computational resources. The CNN model's ability to generate meaningful embeddings
from the facial features enabled the system to achieve robust task authentication results. Vision
Transformers, though showing competitive accuracy, presented challenges in terms of
computational overhead, particularly when deployed for real-time verification tasks. These
models, while effective in larger datasets, did not offer the same speed as CNNs, making them
less optimal for this system's real-time requirements. Traditional computer vision techniques,
while faster and less resource-intensive, fell short in terms of accuracy and the ability to handle
diverse face variations, which ultimately led to their exclusion from the system.

As part of the evaluation, the system’s performance was also tested on multiple datasets
consisting of images captured under different conditions (e.g., lighting, expression, and angles).
This comprehensive testing helped identify the strengths and weaknesses of each model. The
system’s efficiency was also assessed by checking the retrieval time for facial embeddings from
Elasticsearch, which proved crucial for real-time performance. CNNs were found to be the most
efficient in embedding generation and retrieval speed, providing a good balance of accuracy
and real-time processing, making them the ideal choice for the task authentication process.

After a thorough comparison and testing phase, CNN-based models were selected as the optimal
approach for this system. They offered a reliable mix of accuracy, computational efficiency,
and speed, which aligns with the system's real-time verification requirements. However, there
is still room for further improvements. Future enhancements might include fine-tuning the CNN
model to better handle challenging environmental conditions, such as varying lighting and
facial expressions, by leveraging advanced data augmentation techniques and further training
on diverse datasets. Additionally, exploring Vision Transformers for future versions of the
system may become feasible if computational efficiency improves or if the system scales up to
handle more complex datasets. Incorporating hybrid models that combine CNNs and traditional
computer vision methods could also be explored for specific use cases where rapid verification
with lower computational demands is required.

Moreover, future versions of the system could incorporate additional security layers, such as
multi-factor authentication, to further enhance the robustness of the facial recognition-based
task authentication process. In conclusion, while the current system performs well with high
accuracy and efficiency, ongoing refinements and optimizations are planned, particularly in
addressing diverse facial variations and enhancing computational efficiency, to make the
system even more reliable and scalable in real-world scenarios.

CHAPTER 8
RESULTS AND ANALYSIS

RESULTS AND ANALYSIS

The facial recognition system was evaluated using various metrics, including cosine similarity,
accuracy, and verification success rate to assess its performance and effectiveness in
authenticating employees. The system uses embeddings derived from a pre-trained VGG16
model for face feature extraction, and the cosine similarity is used to compare embeddings for
verification.

8.1 Performance Analysis of Each Model

• VGG16 Model for Feature Extraction: The VGG16 model, pre-trained on ImageNet,
was used for extracting face embeddings. This model is effective for high-dimensional
image feature extraction and demonstrated robust performance in producing reliable
embeddings, which is critical for facial recognition.
• Cosine Similarity Calculation: The model uses cosine similarity to compare the stored
embeddings with the live image embedding. Cosine similarity measures how similar two
embeddings are by computing the cosine of the angle between them. A threshold of 0.9
was set to determine the success or failure of the facial recognition.

8.2 Verification Success Rate and Accuracy Metrics

• Verification Success Rate: The verification function uses cosine similarity to compare
the live image against the stored embeddings for each employee. Authentication succeeds
only if the average cosine similarity between the live image embedding and the five stored
embeddings is greater than or equal to 0.975; a lower average causes the authentication to
fail.
• Accuracy: Accuracy is evaluated as the proportion of attempts in which the system
correctly authenticates the employee. For each employee, five reference images were
enrolled, and the similarity scores between the live embedding and each stored embedding
were calculated. If all individual scores were above the per-embedding threshold of 0.9,
the verification passed; if any score fell below it, the verification failed.

• Cosine Similarity Observations:
o The intra-employee similarity matrix was used to analyze how similar the stored
images of the same employee are to each other. This matrix helps ensure that the
embeddings of the same employee are sufficiently close, meaning the system can
correctly authenticate the employee even with different facial expressions or angles (a small sketch of this computation follows the list).
o During the test, the average cosine similarity score between the live image and stored
embeddings was calculated. A higher average score indicated better alignment
between the live image and the stored images, leading to a successful authentication.
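
A small sketch of the intra-employee similarity matrix described above, assuming the five
stored embeddings are stacked as NumPy rows:

```python
import numpy as np

def similarity_matrix(embeddings):
    """Pairwise cosine similarities between one employee's stored embeddings.
    Consistently high off-diagonal values indicate stable enrollment images."""
    e = np.asarray(embeddings, dtype="float32")
    e /= np.linalg.norm(e, axis=1, keepdims=True)   # L2-normalize each row
    return e @ e.T                                  # (5, 5) matrix for 5 images
```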

8.3 Observations from Cosine Similarity Threshold Testing

The threshold of 0.9 for cosine similarity was selected to ensure high accuracy in employee
verification. Below are the observations based on threshold testing:

• Threshold Testing: When comparing the cosine similarity scores of the stored
embeddings with the live image, if any of the similarity scores were below 0.9, the system
flagged the verification as failed. For successful verification, all similarity scores needed
to be above the threshold.
• Average Similarity Calculation: The average similarity score is calculated for each
employee by comparing the live image with the embeddings of the stored images. If the
average similarity score falls below 0.975, the system rejects the verification. This ensures
that even small variations in facial features (due to lighting, pose, etc.) do not result in
false acceptances.

CHAPTER 9
CONCLUSION

CONCLUSION

In this project, substantial progress has been made in developing a facial recognition system for
task authentication, demonstrating the feasibility of using a deep learning-based approach
tailored to real-time applications. Key accomplishments include building and refining a CNN
model from scratch, implementing a Base64 data pipeline, successfully integrating a VGGFace
pre-trained model, and storing image data and embeddings within Elasticsearch. Additionally,
the project has explored and compared CNNs, Vision Transformers, and traditional computer
vision methods, with CNNs emerging as the preferred model for its balance between accuracy,
computational efficiency, and real-time compatibility.

Despite these accomplishments, there are limitations to the current system. The training data,
limited to 4-5 images per employee, poses challenges for model accuracy, especially under
varied conditions such as lighting or facial expressions. Additionally, the current environment
has computational constraints, impacting real-time performance when testing more resource-
intensive models like Vision Transformers.

Future work will focus on completing the system’s integration into the production environment,
improving model robustness, and ensuring scalability. This includes extensive testing on real
employee data, system stress testing to monitor performance under heavy traffic, and refining
the model to improve accuracy across diverse conditions. Additional security measures,
including authentication and data encryption, will also be incorporated to ensure a production-
ready solution.
