DEEP LEARNING BASED FACIAL RECOGNITION FOR TASK AUTHENTICATION
A PROJECT REPORT
Submitted by
SHRUTHI S
CB.SC.I5DAS20144
Department of Mathematics
Amrita School of Physical Sciences
Amrita Vishwa Vidyapeetham
COIMBATORE - 641112
NOVEMBER 2024
DEPARTMENT OF MATHEMATICS
Coimbatore - 641112
BONAFIDE CERTIFICATE
This is to certify that the project report entitled "DEEP LEARNING BASED FACIAL
RECOGNITION FOR TASK AUTHENTICATION" submitted by SHRUTHI S
(CB.SC.I5DAS20144) in partial fulfillment of the requirements for the award of the Degree of
Integrated Master of Science in Data Science is a bonafide record of the work carried out under my
guidance and supervision at Amrita School of Physical Sciences, Coimbatore.
Dr. K. Somasundaram
Chairperson
Department of Mathematics
Coimbatore - 641112
DECLARATION
I, SHRUTHI S (CB.SC.I5DAS20144) hereby declare that this project report entitled "DEEP
LEARNING BASED FACIAL RECOGNITION FOR TASK AUTHENTICATION" is the
record of the original work done by me at Department of Mathematics, Amrita School of Physical
Sciences, Coimbatore. To the best of my knowledge, this work has not formed the basis for the
award of any degree / diploma / associateship / fellowship or a similar award to any candidate in
any university or institution.
Date:
Place:
COUNTERSIGNED
Dr. P. Tamilalagan
Project Coordinator,
Department of Mathematics,
School of Physical Sciences,
Amrita Vishwa Vidyapeetham,
Coimbatore-641112, Tamil Nadu, India.
Acknowledgements
I would like to express my sincere gratitude to everyone who has supported me in the completion
of this project. I am especially thankful to my project guide, Dr. Neha Singh, Assistant Professor,
Department of Mathematics, School of Physical Sciences, Coimbatore, for her continuous support
and guidance throughout this project. I also extend my gratitude to our Chairperson, Dr.
Somasundaram K, Department of Mathematics, School of Physical Sciences, Coimbatore, for
fostering a creative and encouraging environment that greatly contributed to this project’s success.
I am grateful to my friends for their constant encouragement and assistance in bringing this project
to life.
Most importantly, I would like to express my heartfelt appreciation to my parents for their
unwavering understanding and support, which provided me with the strength and motivation to
complete this project successfully.
(SHRUTHI S)
ABSTRACT
This project, titled Deep Learning-Based Facial Recognition for Task Authentication, aims to
develop a robust and secure system for task authentication using deep learning and computer
vision techniques. By combining Convolutional Neural Networks (CNN), Vision Transformers
(ViT), and traditional computer vision methods, the system provides a highly accurate solution
for real-time facial verification, ensuring only authorized individuals can complete specific
tasks.
The authentication process leverages a structured dataset where each employee’s identity is
confirmed using five reference images and a live image. These images are processed to generate
facial embeddings through a pre-trained VGG16 model, which are then stored in Elasticsearch
for efficient retrieval and comparison. During the verification stage, the system fetches the latest
image’s embeddings, which are compared to stored embeddings based on a cosine similarity
threshold of 0.9, ensuring reliable identity verification.
TABLE OF CONTENTS
1. Introduction ....................................................................................................................... 1
1.1. Background .................................................................................................................. 2
1.2. Objectives ..................................................................................................................... 2
1.3. Overview of Approach .................................................................................................. 3
2. Literature Review .............................................................................................................. 4
3. Methodology ....................................................................................................................... 7
3.1. Data Collection and Preprocessing ............................................................................... 8
3.1.1. Dataset Selection ................................................................................................ 8
3.1.2. Image Preprocessing Techniques ....................................................................... 9
3.1.3. Storage and Retrieval in Elasticsearch ............................................................... 9
3.2. Model Selection and Training ...................................................................................... 9
3.2.1. CNN Model Implementation ............................................................................. 9
3.2.2. Vision Transformer (ViT) Model Implementation ............................................ 9
3.2.3. Traditional Computer Vision Methods............................................................. 10
3.3. Embedding Storage and Verification Process ............................................................ 10
3.3.1. Embedding Generation with VGG16 ............................................................... 10
3.3.2. Embedding Storage and Management in Elasticsearch .................................... 10
3.3.3. Cosine Similarity for Verification and Threshold Criteria ............................... 11
3.3.4. User Interface for Presentation Purposes .......................................................... 11
4. System Design ................................................................................................................ 12
5. Implementation ............................................................................................................... 20
6. Milestones ......................................................................................................................... 25
7. Evaluation ........................................................................................................................ 28
7.1. Performance Analysis of Each Model ......................................................................... 28
7.2. Verification Success Rate and Accuracy Metrics ........................................................ 29
7.3. Cosine Similarity Observations .................................................................................. 30
8. Results and Analysis ...................................................................................................... 31
9. Conclusion ........................................................................................................................ 34
CHAPTER 1
INTRODUCTION
In today's digital landscape, security and efficiency are of utmost importance for organizations
managing various tasks. Traditional authentication methods, such as passwords or tokens, often
present security vulnerabilities or usability challenges. Facial recognition, powered by advancements
in deep learning and computer vision, offers a promising alternative that is both secure and
user-friendly. By automating the process of verifying individuals based on their facial features,
organizations can enhance security and streamline task verification.
This project, titled "Deep Learning-Based Facial Recognition for Task Authentication," is the
first step toward building a production-ready system that uses facial recognition for secure task
authentication. The goal is to ensure that only authorized personnel are allowed to sign off or
complete specific tasks, thus maintaining integrity and accountability in organizational
workflows.
Leveraging deep learning models such as Convolutional Neural Networks (CNN) and Vision
Transformers (ViT), combined with traditional computer vision techniques, this project aims to
implement an initial prototype of a facial authentication system. The system works by
processing a set of reference images for each employee, generating facial embeddings using a
pre-trained VGG16 model, and storing these embeddings in Elasticsearch for comparison
during real-time verification. The verification process uses cosine similarity to ensure accurate
matching, with a threshold set at 0.9 for successful identification.
This project serves as the foundation for a more robust and scalable system, which will undergo
rigorous testing and refinement to meet production standards. Key challenges addressed in this
first phase include the synchronization of data between SQL Server and Elasticsearch, as well
as evaluating the performance of different models in terms of accuracy and computational
efficiency. Based on testing outcomes, the system will be enhanced to ensure it meets the
security and performance requirements necessary for production deployment.
Ultimately, this project demonstrates the potential of deep learning and computer vision to
revolutionize task authentication processes. While the current implementation is a first step, it
sets the stage for continuous improvement and optimization, paving the way for a fully
functional and scalable facial recognition system in organizational settings.
CHAPTER 2
LITERATURE REVIEW
LITERATURE REVIEW
Facial recognition has become one of the most reliable forms of biometric authentication due
to its non-intrusive nature and high accuracy in identifying individuals. The core of facial
recognition involves detecting facial features and comparing them against a stored dataset of
known faces to verify identity. Over the years, facial recognition has evolved through several
stages, starting from basic geometric-based methods to complex, deep learning-based models
that can handle variations in lighting, pose, and expression. Recent advancements leverage
feature extraction methods such as face embeddings, which convert faces into vector
representations, allowing for highly accurate comparisons. This progression demonstrates the
field’s adaptation to meet increasing demands for accuracy, speed, and scalability.
Early facial recognition systems primarily relied on traditional computer vision techniques,
such as Eigenfaces and Fisherfaces, which used Principal Component Analysis (PCA) and
Linear Discriminant Analysis (LDA), respectively, to extract features. Although effective, these
techniques struggled with variations in environmental conditions, such as changes in lighting
and angle. As deep learning emerged, more sophisticated models, particularly Convolutional
Neural Networks (CNNs), became the standard for facial recognition. Deep learning
approaches revolutionized facial recognition by allowing models to learn complex facial
features from large datasets, thus improving accuracy significantly under diverse conditions.
Deep learning also enables end-to-end training, making models more adaptable and less reliant
on manual feature engineering. This shift from traditional methods to deep learning has
expanded facial recognition capabilities, especially in areas requiring real-time authentication.
Convolutional Neural Networks (CNNs) have traditionally been the backbone of facial
recognition systems due to their ability to capture spatial hierarchies in images, making them
ideal for detecting facial features. CNNs use convolutional layers to identify patterns at various
levels, from simple edges to complex structures, providing accurate embeddings for facial
comparisons. However, CNNs have limitations in handling long-range dependencies and may
require extensive labeled data for training.
Vision Transformers (ViT), a newer approach, bring the attention mechanism—originally used
in natural language processing—into computer vision, enabling models to capture relationships
across the entire image, rather than focusing on local features alone. ViTs divide an image into
patches, treating each patch as a sequence element and learning long-range dependencies. ViTs
have shown promise in facial recognition, especially in cases where relationships between
distant features, such as eyes and mouth, need to be captured for precise recognition. Although
ViTs are computationally demanding and may require large amounts of data, they provide an
alternative that can complement or enhance the performance of CNNs, depending on the
specific requirements of the application.
CHAPTER 3
METHODOLOGY
METHODOLOGY
This section outlines the steps involved in developing the facial authentication system,
including data collection, model selection, embedding generation, and verification processes.
Each stage was carefully designed to ensure accuracy, efficiency, and scalability, providing a
strong foundation for real-time task authentication.
3.1 Data Collection and Preprocessing
3.1.1 Dataset Selection
To ensure accurate facial recognition, the project required a diverse and reliable dataset. Each
employee’s identity was validated using five high-quality reference images, captured in various
lighting conditions and angles to account for real-world variations. A live image is also used
for real-time verification during task sign-off. This dataset structure enables the system to
handle typical variations while ensuring accurate recognition.
3.1.2 Image Preprocessing Techniques
Before feeding images into the model, several preprocessing steps were applied to enhance
accuracy:
• Face Detection: Each image undergoes face detection to isolate the face from the
background, reducing noise and improving the quality of embeddings.
• Image Resizing: Images are resized to a fixed input size compatible with the selected
deep learning models (e.g., 224x224 pixels for VGG16).
• Normalization: Image pixel values are normalized to a standard scale (typically
between 0 and 1) to improve model convergence and performance.
• Data Augmentation: To further improve model robustness, augmentation techniques
such as random rotations, zooms, and brightness adjustments were applied, helping the
model generalize to new images during real-time authentication.
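A minimal sketch of these preprocessing steps, assuming OpenCV's bundled Haar cascade for face detection (the project's actual detector is not specified here) and the 224x224 VGG16 input size mentioned above:

```python
# Preprocessing sketch: detect the face, crop it, resize to the model's
# input size, and scale pixel values to [0, 1]. The Haar cascade detector
# is an assumption; any face detector could fill this role.
import cv2
import numpy as np

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess(image_path, size=(224, 224)):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        raise ValueError("no face detected in " + image_path)
    x, y, w, h = faces[0]                       # keep the first detected face
    face = cv2.resize(img[y:y + h, x:x + w], size)
    return face.astype(np.float32) / 255.0      # normalize to [0, 1]
```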
3.1.3 Storage and Retrieval in Elasticsearch
Elasticsearch serves as the primary storage solution for employee facial embeddings. After
preprocessing, each image's embedding is generated and stored in the facerecognition index
within Elasticsearch, organized by unique employee IDs. Elasticsearch allows for efficient
retrieval and updating of embeddings, making it suitable for high-speed, real-time facial
verification. This setup also supports scalability, enabling the system to accommodate a
growing number of employees without performance degradation.
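A sketch of this storage layout, assuming the Elasticsearch 8.x Python client; the `facerecognition` index name comes from the text, while the field names and the 512-dimension figure (matching the pooled VGG16 embedding sketched in Section 3.3.1) are assumptions:

```python
# Elasticsearch storage sketch: a dense_vector mapping keyed by employee ID,
# plus a helper that indexes one embedding document.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="facerecognition",
    mappings={
        "properties": {
            "employee_id": {"type": "keyword"},
            "timestamp":   {"type": "date"},
            "embedding":   {"type": "dense_vector", "dims": 512},
        }
    },
)

def store_embedding(employee_id, embedding, timestamp):
    # embedding is a 1-D NumPy vector; Elasticsearch expects a JSON list.
    es.index(index="facerecognition", document={
        "employee_id": employee_id,
        "timestamp": timestamp,
        "embedding": embedding.tolist(),
    })
```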
3.2 Model Selection and Training
3.2.1 CNN Model Implementation
Convolutional Neural Networks (CNNs), specifically the VGG16 model, were chosen for their
robust feature extraction capabilities. The VGG16 model uses multiple convolutional layers to
detect intricate patterns and structures in facial images, making it well-suited for facial
recognition tasks. Key steps in implementing the CNN include loading the pre-trained network,
preprocessing each face image, and extracting an embedding vector for comparison, as detailed
in Section 3.3.1.
3.2.2 Vision Transformer (ViT) Model Implementation
Vision Transformers (ViT) offer an alternative to CNNs by using the self-attention mechanism,
capturing relationships between all parts of an image. Key steps in implementing ViT include:
• Patch Embedding: Each image is divided into fixed-size patches, which are then linearly
embedded to serve as the input tokens for the transformer.
• Attention Mechanism: ViT models learn to capture long-range dependencies between
patches, enabling a more holistic understanding of facial features compared to CNNs.
• Embedding Comparison: Like CNNs, ViT outputs an embedding vector, which is stored
and compared during verification. ViTs were assessed for performance and accuracy to
determine their suitability for real-time applications.
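To make the patch-embedding step concrete, here is a small NumPy sketch using the common 16x16 patch size for 224x224 inputs; the random projection matrix stands in for the learned linear embedding:

```python
# Patch embedding sketch: split an image into fixed-size patches and project
# each flattened patch to a token vector, as a ViT's input layer does.
import numpy as np

def patchify(img, patch=16):
    h, w, c = img.shape                              # e.g. (224, 224, 3)
    grid = img.reshape(h // patch, patch, w // patch, patch, c)
    return grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
W = rng.normal(size=(16 * 16 * 3, 768))              # stand-in for learned weights
tokens = patchify(image) @ W                         # (196, 768) input tokens
```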
3.3 Embedding Storage and Verification Process
3.3.1 Embedding Generation with VGG16
The VGG16 model generates embeddings for each image after preprocessing, producing a
unique high-dimensional vector that captures the essential facial features. This embedding
serves as a compressed representation of the face, which is subsequently stored in Elasticsearch
for future comparison. Using VGG16 pre-trained weights ensures that the embeddings are
accurate without extensive training time, enhancing system efficiency.
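A minimal sketch of this embedding step with Keras' pre-trained VGG16; dropping the classifier head and average-pooling the final convolutional block yields a 512-dimensional vector, though the exact layer the project taps is not specified:

```python
# Embedding generation sketch: VGG16 pre-trained on ImageNet, used as a
# fixed feature extractor (no fine-tuning).
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

model = VGG16(weights="imagenet", include_top=False, pooling="avg")

def get_embedding(face):
    # face: (224, 224, 3) RGB array scaled to [0, 1]; VGG16's preprocess_input
    # expects 0-255 pixel values, so rescale before calling it.
    x = preprocess_input(face[np.newaxis].astype("float32") * 255.0)
    return model.predict(x, verbose=0)[0]            # shape (512,)
```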
3.3.2 Embedding Storage and Management in Elasticsearch
Elasticsearch provides a structured environment for storing and managing facial embeddings:
• Indexing and Storage: Each employee’s embeddings are indexed within Elasticsearch
under a unique employee ID. Multiple embeddings per employee are stored to account for
variations in appearance over time.
• Data Syncing: The system syncs the SQL Server database with Elasticsearch, ensuring
that any new employee images added to the SQL database are automatically reflected in
Elasticsearch, maintaining consistency across both databases.
• Scalability: Elasticsearch’s scalability allows the system to handle a growing number of
employee records while maintaining fast retrieval times, a crucial feature for real-time
applications.
3.3.3 Cosine Similarity for Verification and Threshold Criteria
For the verification process, cosine similarity is used to measure the closeness between the
embedding of the live image and the stored embeddings. The steps in the verification process
are as follows:
• Cosine Similarity Calculation: Cosine similarity measures the angle between two
embedding vectors, yielding a score between -1 and 1. A higher cosine similarity indicates
a closer match between the embeddings.
• Threshold Setting: A threshold of 0.9 was set as the minimum acceptable similarity score
for successful authentication. If the cosine similarity between the live image embedding
and all reference embeddings for a given employee is above 0.9, the person is considered
authenticated; otherwise, access is denied.
• Real-Time Verification: During task sign-off, the system retrieves the latest image
embedding from Elasticsearch and performs cosine similarity checks against stored
embeddings in real time. This real-time comparison enables prompt and accurate task
authentication.
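A compact sketch of this check; the helper names are assumptions, while the 0.9 threshold and the all-references rule follow the description above:

```python
# Verification sketch: the live embedding must clear the 0.9 cosine
# similarity threshold against every stored reference embedding.
import numpy as np

THRESHOLD = 0.9

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_authenticated(live_emb, stored_embs):
    return all(cosine_similarity(live_emb, ref) >= THRESHOLD
               for ref in stored_embs)
```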
3.3.4 User Interface for Presentation Purposes
While the final deployment does not require a user interface, a simple UI is added for
demonstration purposes. This UI allows for an interactive experience to showcase the
workflow, from uploading employee images to displaying verification results. The UI provides
a visual representation of the system's functionality, making it easier for stakeholders to
understand the authentication process, especially during panel presentations.
CHAPTER 4
SYSTEM DESIGN
SYSTEM DESIGN
The system design of this project focuses on achieving efficient, accurate, and secure facial
authentication for task sign-off. The following subsections outline the overall workflow,
integration with Elasticsearch for efficient data management, and the real-time verification
process.
The facial authentication system is designed with a step-by-step workflow to capture, process,
store, and verify employee images for task authentication. This structured approach helps
ensure that each component of the system—from data collection to real-time verification—
functions optimally and integrates smoothly.
• Each employee provides five images during an initial enrollment phase. These images
are captured under various conditions to enhance the model's ability to generalize across
different lighting, expressions, and poses. A unique employee ID is assigned to each
person, linking their images and authentication details in the system.
• A live image is captured during each task sign-off attempt. This image is used for real-
time verification to ensure that only authorized personnel complete or sign off on the
task.
• Images undergo face detection to isolate and focus on the facial area, eliminating
background noise.
• Images are resized and normalized to match the input requirements of the deep learning
models (e.g., VGG16).
• Data augmentation techniques may be applied to account for environmental variations,
improving model robustness.
4.1.3 Embedding Generation and Storage:
• After preprocessing, each image is passed through a deep learning model (e.g., VGG16
or Vision Transformer) to generate a facial embedding. This embedding is a unique
high-dimensional vector that represents the core features of an individual’s face.
• These embeddings are then stored in Elasticsearch, mapped to the employee’s ID,
ensuring quick access during the verification process.
• During task sign-off, the system captures a live image and generates an embedding,
which is compared against the stored embeddings in Elasticsearch. Cosine similarity
measures are used to assess how closely the live embedding matches the stored
embeddings.
• This workflow provides a comprehensive, end-to-end process that enables the system
to capture, store, and verify facial data for secure and efficient task authentication.
4.1.5 Data Pipeline Workflow:
4.2 Integration of Elasticsearch for Image Storage and Retrieval
Elasticsearch plays a central role in managing and retrieving facial embeddings for real-time
verification. It was chosen for its speed, scalability, and ability to handle complex data queries,
making it well-suited for high-frequency access in authentication systems.
• With Elasticsearch, the system can efficiently store embeddings for many employees,
supporting horizontal scalability to accommodate future growth.
• Elasticsearch’s architecture enables fast retrieval, allowing the system to perform real-
time comparisons without delays. This scalability is essential as the system expands to
handle more employees and data.
• To ensure the database is up-to-date, the system synchronizes with an SQL Server
database that maintains employee information. Data from the SQL Server
(`bas_wc_wemp_empimgs` table) is mapped to the Elasticsearch index
(`facerecognition`), ensuring consistency across both databases.
• Data synchronization enables real-time updates, where any new image uploaded to
SQL Server is automatically reflected in Elasticsearch, making the latest embeddings
available for immediate verification.
• This integration of Elasticsearch with the facial authentication system ensures fast,
reliable data management, enabling smooth and efficient operations for real-time task
sign-offs.
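A sync sketch under stated assumptions: the `bas_wc_wemp_empimgs` table and `facerecognition` index names come from the text, but the column names, connection string, and the `preprocess_base64`/`get_embedding` helpers are hypothetical:

```python
# Data-sync sketch: copy newly added SQL Server image rows into the
# Elasticsearch index so both stores stay consistent.
import pyodbc

def sync_new_images(es, conn_str, since):
    conn = pyodbc.connect(conn_str)
    cursor = conn.cursor()
    cursor.execute(
        "SELECT emp_id, img_base64, created_at "      # hypothetical columns
        "FROM bas_wc_wemp_empimgs WHERE created_at > ?", since)
    for emp_id, img_b64, created_at in cursor.fetchall():
        face = preprocess_base64(img_b64)             # hypothetical decoder
        es.index(index="facerecognition", document={
            "employee_id": emp_id,
            "timestamp": created_at,
            "embedding": get_embedding(face).tolist(),
        })
```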
4.3 Real-Time Verification Process
The real-time verification process is critical to ensuring secure and efficient task authentication.
This process involves generating a live image embedding, retrieving stored embeddings, and
comparing them to determine if a match exists.
During task sign-off, a live image is captured, and its facial embedding is generated using a pre-
trained VGG16 model. This embedding represents the individual’s facial features in a compact,
high-dimensional format, capturing unique details essential for verification.
The system queries Elasticsearch to retrieve all embeddings associated with the employee’s ID.
Since embeddings are organized by employee ID and timestamp, the latest embeddings are
easily accessible for comparison.
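A retrieval sketch matching that layout, using the Elasticsearch Python client's search API; the field names follow the mapping assumed in Section 3.3.2:

```python
# Retrieval sketch: fetch one employee's stored embeddings, newest first.
def fetch_stored_embeddings(es, employee_id):
    resp = es.search(
        index="facerecognition",
        query={"term": {"employee_id": employee_id}},
        sort=[{"timestamp": {"order": "desc"}}],
        size=5,                    # five reference images per employee
    )
    return [hit["_source"]["embedding"] for hit in resp["hits"]["hits"]]
```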
• Cosine Similarity Calculation: The similarity between the live image embedding and
stored embeddings is calculated using cosine similarity, which measures the angle
between the two vectors. A similarity score near 1 indicates a close match, while a
score near 0 or below indicates a lack of similarity.
• Threshold Verification: A threshold of 0.9 is set to determine a successful match. If
the similarity score for the live embedding compared to each stored embedding meets
or exceeds this threshold, the task is authenticated as valid. If the score falls below 0.9,
access is denied, preventing unauthorized sign-off.
4.3.4 Real-Time Feedback and Access Control:
• Based on the cosine similarity results, the system grants or denies access. Successful
matches allow task sign-off, while failed matches trigger an alert, preventing
unauthorized actions.
• This real-time feedback ensures that the verification process is both secure and
efficient, minimizing the risk of errors and providing a streamlined experience for end
users.
• The real-time verification process is essential for maintaining the integrity of task
authentication, allowing only verified individuals to complete sensitive actions.
4.3.5 Employee Facial Recognition Pipeline:
CHAPTER 5
IMPLEMENTATION
IMPLEMENTATION
5.1 Development Environment Setup
The development environment was established to support the various tools and libraries
necessary for building the facial recognition system. The following steps were taken:
o Python packages were installed via `pip`, ensuring compatibility with TensorFlow,
Keras, and Elasticsearch. The environment was tested to confirm that all dependencies
worked together without issues.
5.2 Face Recognition for Task Sign-Off
The core of the system involves face recognition for task sign-off, relying on generating facial
embeddings and comparing them with stored embeddings in Elasticsearch. The process is
explained below:
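As a condensed illustration of that process, the sketch below chains the hypothetical helpers introduced in Chapter 3; it is not the project's actual code:

```python
# End-to-end sign-off sketch: preprocess the live capture, embed it, fetch
# the enrolled references, and apply the cosine-similarity check.
def sign_off(es, employee_id, live_image_path):
    live_emb = get_embedding(preprocess(live_image_path))
    stored = fetch_stored_embeddings(es, employee_id)
    if not stored:
        return False               # employee has no enrolled images
    return is_authenticated(live_emb, stored)
```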
5.3 System Testing and Debugging
The system underwent rigorous testing to ensure that it functions as expected in a real-world
environment. Several tests were conducted in different phases:
One significant challenge encountered during the implementation phase was version
compatibility between Keras and Elasticsearch, particularly with the VGGFace model:
• Issue Identification:
o The incompatibility between Keras and Elasticsearch arose when integrating pre-trained
facial recognition models. Specifically, there were issues with the VGGFace model,
which was built using Keras but did not directly integrate with the Elasticsearch version
used for the project.
o This caused errors during the embedding storage and retrieval process, affecting the
system’s ability to retrieve embeddings from Elasticsearch and use them for verification.
• Solution and Adjustments:
o Workaround Solution: After identifying the version mismatch, I explored several
alternatives, including using a different version of Elasticsearch or modifying the pre-
trained model to work with the current version of Elasticsearch.
o Model Conversion: I also attempted converting the VGGFace model's embeddings into
a format compatible with Elasticsearch, ensuring that embeddings could be stored and
retrieved without errors.
o Dependency Updates: The versions of Keras and TensorFlow were updated to ensure
they were compatible with the required libraries for Elasticsearch. Additionally, the
Elasticsearch client was upgraded to ensure compatibility with Python-based libraries.
CHAPTER 6
MILESTONES
MILESTONES
The project milestones for the facial recognition system have been systematically organized to
ensure a structured development approach. In the Initial Research phase, comprehensive
studies were conducted on Convolutional Neural Networks (CNNs), Vision Transformers, and
various facial recognition workflows. This phase provided foundational knowledge of computer
vision techniques and pre-trained models, including VGGFace, and led to the development of
essential functions for handling Base64 image encoding, a key component for data processing.
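As an illustration of what such Base64 helpers might look like (a sketch, not the project's actual functions):

```python
# Base64 helpers sketch: decode an incoming Base64 string to an image array
# and re-encode an image for storage or transport.
import base64
import cv2
import numpy as np

def b64_to_image(img_b64):
    buf = np.frombuffer(base64.b64decode(img_b64), dtype=np.uint8)
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)       # BGR image array

def image_to_b64(img):
    ok, buf = cv2.imencode(".jpg", img)
    return base64.b64encode(buf.tobytes()).decode("ascii")
```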
Model Development was then initiated, where CNN models were built from scratch and tested
on datasets such as CIFAR-10, which provided insights into initial accuracy and performance
improvement opportunities. Integration of pre-trained models, including VGGFace, allowed for
significant accuracy gains on publicly available data. Additionally, research into facial
recognition systems on platforms like Android, iOS, and Huawei HarmonyOS provided insights
into potential device-level applications and compatibility requirements.
The project is now in Model Development (In Progress), where efforts are focused on
integrating computer vision models and exploring traditional face recognition methods such as
Haar Cascades and Histogram of Oriented Gradients (HOG). These approaches are being
compared to CNN and VGGFace to determine the best method for the company's dataset and
infrastructure.
In the Next Steps, a dedicated data pipeline will be built to manage the company's dataset, with
images and metadata stored in Elasticsearch for streamlined retrieval. A detailed comparison
of the CNN, VGGFace, and traditional models will be conducted using company data to
evaluate performance metrics, leading to the integration of the most effective model as an API.
This final model will be validated with real-world data to ensure its reliability, and the system
will undergo rigorous testing and security integration to prepare it for production deployment.
CHAPTER 7
EVALUATION
EVALUATION
The evaluation phase is critical in assessing the performance, accuracy, and reliability of the
facial recognition system designed for task authentication. The system was evaluated using key
performance metrics such as accuracy, precision, recall, and F1 score, which provide
comprehensive insights into its overall effectiveness. Accuracy represents the overall
correctness of the system in distinguishing authorized users from unauthorized ones, ensuring
only verified individuals can authenticate tasks. Precision measures the proportion of correct
matches among the predicted matches, while recall assesses the system’s capability to identify
all true matches. The F1 score, the harmonic mean of precision and recall, was particularly
important as it reflects the balance between false positives and false negatives, ensuring the
system’s robustness without sacrificing either sensitivity or specificity.
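In standard form, writing TP, TN, FP, and FN for true/false positives and negatives, these metrics are:

```latex
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad
\text{Precision} = \frac{TP}{TP + FP} \qquad
\text{Recall} = \frac{TP}{TP + FN} \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
```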
During evaluation, three primary computer vision models were compared for their suitability
in facial recognition: Convolutional Neural Networks (CNN), Vision Transformers (ViT), and
traditional computer vision techniques. CNNs demonstrated superior performance in face
detection and embedding generation, providing high accuracy while maintaining efficiency in
terms of computational resources. The CNN model's ability to generate meaningful embeddings
from the facial features enabled the system to achieve robust task authentication results. Vision
Transformers, though showing competitive accuracy, presented challenges in terms of
computational overhead, particularly when deployed for real-time verification tasks. These
models, while effective in larger datasets, did not offer the same speed as CNNs, making them
less optimal for this system's real-time requirements. Traditional computer vision techniques,
while faster and less resource-intensive, fell short in terms of accuracy and the ability to handle
diverse face variations, which ultimately led to their exclusion from the system.
As part of the evaluation, the system’s performance was also tested on multiple datasets
consisting of images captured under different conditions (e.g., lighting, expression, and angles).
This comprehensive testing helped identify the strengths and weaknesses of each model. The
system’s efficiency was also assessed by checking the retrieval time for facial embeddings from
Elasticsearch, which proved crucial for real-time performance. CNNs were found to be the most
efficient in embedding generation and retrieval speed, providing a good balance of accuracy
and real-time processing, making them the ideal choice for the task authentication process.
After a thorough comparison and testing phase, CNN-based models were selected as the optimal
approach for this system. They offered a reliable mix of accuracy, computational efficiency,
and speed, which aligns with the system's real-time verification requirements. However, there
is still room for further improvements. Future enhancements might include fine-tuning the CNN
model to better handle challenging environmental conditions, such as varying lighting and
facial expressions, by leveraging advanced data augmentation techniques and further training
on diverse datasets. Additionally, exploring Vision Transformers for future versions of the
system may become feasible if computational efficiency improves or if the system scales up to
handle more complex datasets. Incorporating hybrid models that combine CNNs and traditional
computer vision methods could also be explored for specific use cases where rapid verification
with lower computational demands is required.
Moreover, future versions of the system could incorporate additional security layers, such as
multi-factor authentication, to further enhance the robustness of the facial recognition-based
task authentication process. In conclusion, while the current system performs well with high
accuracy and efficiency, ongoing refinements and optimizations are planned, particularly in
addressing diverse facial variations and enhancing computational efficiency, to make the
system even more reliable and scalable in real-world scenarios.
CHAPTER 8
RESULTS AND ANALYSIS
RESULTS AND ANALYSIS
The facial recognition system was evaluated using various metrics, including cosine similarity,
accuracy, and verification success rate to assess its performance and effectiveness in
authenticating employees. The system uses embeddings derived from a pre-trained VGG16
model for face feature extraction, and the cosine similarity is used to compare embeddings for
verification.
• VGG16 Model for Feature Extraction: The VGG16 model, pre-trained on ImageNet,
was used for extracting face embeddings. This model is effective for high-dimensional
image feature extraction and demonstrated robust performance in producing reliable
embeddings, which is critical for facial recognition.
• Cosine Similarity Calculation: The model uses cosine similarity to compare the stored
embeddings with the live image embedding. Cosine similarity measures how similar two
embeddings are by computing the cosine of the angle between them. A threshold of 0.9
was set to determine the success or failure of the facial recognition.
• Verification Success Rate: The verification function uses cosine similarity to compare
the live image against the stored embeddings for each employee. If the average cosine
similarity between the live image embedding and the five stored embeddings is greater
than or equal to 0.975, the authentication succeeds; if the average falls below this value,
it fails.
• Accuracy: The accuracy is evaluated based on how many times the system correctly
authenticates employees as compared to the total number of attempts. For each employee,
five images were used to train the model, and the similarity scores between the stored
embeddings were calculated. If all similarity scores were above the threshold (0.9), the
verification passed. If any score was below the threshold, the verification failed.
• Cosine Similarity Observations:
o The intra-employee similarity matrix was used to analyze how similar the stored
images of the same employee are to each other. This matrix helps ensure that the
embeddings of the same employee are sufficiently close, meaning the system can
correctly authenticate the employee even with different facial expressions or angles.
o During the test, the average cosine similarity score between the live image and stored
embeddings was calculated. A higher average score indicated better alignment
between the live image and the stored images, leading to a successful authentication.
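A small sketch of how such an intra-employee matrix can be computed (helper names are assumptions):

```python
# Intra-employee similarity matrix sketch: pairwise cosine similarities
# among one employee's stored reference embeddings.
import numpy as np

def similarity_matrix(embeddings):
    E = np.asarray(embeddings, dtype=np.float64)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-normalize rows
    return E @ E.T          # (5, 5) for five references; diagonal is 1.0
```

Off-diagonal values near 1.0 indicate that the reference images agree with one another; a consistently low row flags an outlier reference image worth re-capturing.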
The threshold of 0.9 for cosine similarity was selected to ensure high accuracy in employee
verification. Below are the observations based on threshold testing:
• Threshold Testing: When comparing the cosine similarity scores of the stored
embeddings with the live image, if any of the similarity scores were below 0.9, the system
flagged the verification as failed. For successful verification, all similarity scores needed
to be above the threshold.
• Average Similarity Calculation: The average similarity score is calculated for each
employee by comparing the live image with the embeddings of the stored images. If the
average similarity score falls below 0.975, the system rejects the verification. This
ensures that small variations in facial features (due to lighting, pose, etc.) do not
result in false acceptances.
CHAPTER 9
CONCLUSION
CONCLUSION
In this project, substantial progress has been made in developing a facial recognition system for
task authentication, demonstrating the feasibility of using a deep learning-based approach
tailored to real-time applications. Key accomplishments include building and refining a CNN
model from scratch, implementing a Base64 data pipeline, successfully integrating a VGGFace
pre-trained model, and storing image data and embeddings within Elasticsearch. Additionally,
the project has explored and compared CNNs, Vision Transformers, and traditional computer
vision methods, with CNNs emerging as the preferred model for its balance between accuracy,
computational efficiency, and real-time compatibility.
Despite these accomplishments, there are limitations to the current system. The training data,
limited to 4-5 images per employee, poses challenges for model accuracy, especially under
varied conditions such as lighting or facial expressions. Additionally, the current environment
has computational constraints, impacting real-time performance when testing more resource-
intensive models like Vision Transformers.
Future work will focus on completing the system’s integration into the production environment,
improving model robustness, and ensuring scalability. This includes extensive testing on real
employee data, system stress testing to monitor performance under heavy traffic, and refining
the model to improve accuracy across diverse conditions. Additional security measures,
including authentication and data encryption, will also be incorporated to ensure a production-
ready solution.