Dhaka University of Engineering & Technology (DUET), Gazipur
Department of Computer Science and Engineering
                             Thesis Proposal Report
Title: Detecting Emotions Using Facial Expression Analysis: A Machine Learning Approach
                                       Prepared By
             Palash Chandra Paul        Nahidur Rahman            Md. Shawon Miah
             Std_Id: 204011             Std_Id: 204010            Std_Id: 204033
             Semester: 4/2              Semester: 4/2             Semester: 4/2
                                         Supervised By
                                     Md. Abu Bakkar Siddique
                                       Assistant Professor
                         Department of Computer Science and Engineering
                                               1. Introduction
Facial emotion is a fundamental aspect of nonverbal communication. It plays a critical role in conveying
human feelings, perceptions, behavioral reactions, intentions, social signals, criminal tendencies,
deception, and degrees of gratification or displeasure. Facial expressions often provide an immediate and
involuntary reflection of a person's emotional state, and regardless of gender, nationality, culture, or race,
most people can recognize facial emotions easily. The general approach to automatic facial expression analysis
consists of three steps: face detection and tracking, feature extraction, and expression classification and
recognition [4].
As technology becomes more embedded in our daily lives, the ability of machines to comprehend and react
to human emotions is emerging as a significant focus within artificial intelligence (AI) and human-computer
interaction (HCI). With advancements in computer vision and machine learning, it has become possible to
analyze facial features automatically and classify emotional states with increasing accuracy. The field of
emotion detection through facial expression analysis has seen significant advancements in recent years,
primarily driven by progress in machine learning and computer vision.
Facial expressions can vary significantly based on factors such as age, gender, ethnicity, facial hair,
makeup, accessories, lighting conditions, and head orientation. When working with a limited dataset, these
variations make it challenging for machine learning models to accurately detect and interpret expressions
under such diverse conditions. Using a large and diverse dataset can significantly improve the accuracy
of facial expression detection. To achieve high performance, various machine learning and deep learning
algorithms can be applied, such as Convolutional Neural Networks (CNN), Recurrent Neural Networks
(RNN), Support Vector Machines (SVM), AdaBoost, and Linear Discriminant Analysis (LDA).
Additionally, feature selection techniques like Principal Component Analysis (PCA) can be used to reduce
dimensionality and enhance model efficiency. By combining robust datasets with these advanced
algorithms and feature selection methods, facial expressions can be detected more accurately and reliably
[2]. However, automating facial emotion detection and classification remains a challenging task [7]. This
area of research focuses on training machines on labeled datasets so that they can automatically recognize
human emotions from facial features captured in images or video streams. Humans can usually read
emotions with ease, yet even they sometimes misjudge them, and machines such as robots find the task
considerably harder; this motivates automated emotion detection through facial expression analysis.
Facial emotion analysis can help
identify signs of emotional distress, depression, anxiety, or other psychological conditions. Machine
learning models can assist therapists by continuously monitoring patients’ emotional responses in remote
or in-person sessions, offering early warnings, or aiding in diagnostics [3].
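As a concrete illustration of the PCA-based dimensionality reduction mentioned above, the following
scikit-learn sketch reduces flattened 48×48 grayscale faces (2,304 raw pixels) to 100 principal
components; the random array is only a stand-in for a real dataset such as FER2013, not actual data:

import numpy as np
from sklearn.decomposition import PCA

# Illustrative sketch only: PCA on flattened 48x48 grayscale faces
# ("eigenfaces"-style features). The random array below is a stand-in
# for a real face dataset such as FER2013 (hypothetical data, not results).
rng = np.random.default_rng(0)
X = rng.random((500, 48 * 48))            # 500 faces, 2304 raw pixels each

pca = PCA(n_components=100, whiten=True)  # keep 100 principal components
X_reduced = pca.fit_transform(X)          # shape: (500, 100)
print(X_reduced.shape, round(pca.explained_variance_ratio_.sum(), 3))

The reduced feature vectors can then be fed to a classical classifier such as an SVM, which is typically
much faster to train on 100 components than on raw pixels.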
This thesis proposal presents a machine learning-based approach for detecting emotions through facial
expression analysis. The goal is to develop a hybrid model combining EfficientNetV2 and ConvNeXt
that not only achieves high classification accuracy but also generalizes well across different faces and
environments.
                                            2. Literature Review
Numerous studies have investigated diverse techniques for emotion detection, offering significant
contributions to the advancement of this field. The following section offers a brief examination of several
influential studies that have contributed to the progress of emotion recognition research.
In [6], the authors proposed a deep learning approach utilizing Convolutional Neural Networks (CNNs) for
emotion recognition from facial images. The model’s effectiveness was assessed using two well-
established datasets: the Facial Emotion Recognition Challenge (FER2013) and the Japanese Female
Facial Expression (JAFFE) dataset. It achieved notable accuracy scores of 70.14% and 98.65% on the
FER2013 and JAFFE datasets, respectively. However, their approach was designed to detect only the
seven basic emotions. It does not account for more nuanced or compound emotions (e.g., confusion,
frustration, sarcasm), which limits its applicability in more realistic, emotionally rich environments.
Pranav E. et al. [10] proposed a Deep Convolutional Neural Network (DCNN) model for recognizing five
facial emotions (angry, happy, neutral, sad, and surprised), achieving a test accuracy of 78.04%. The
model, trained on a manually collected dataset, utilizes standard CNN layers with ReLU and Softmax
activations, and is optimized using the Adam algorithm. While the approach shows promise and avoids
overfitting, it primarily relies on a simple two-layer architecture and lacks integration of more advanced
deep learning strategies or temporal dynamics—key elements that are increasingly vital for robust emotion
detection in real-world scenarios.
In [8], the authors developed a facial expression recognition system using traditional machine learning
techniques, specifically Haar-like features for face detection and HOG features with SVM for emotion
classification. While the system achieved a strong F1 score of 0.8759 and demonstrated effective emotion
detection across three expressions, its performance hinged on handcrafted features and personalized
classifiers—limitations that reduce scalability and adaptability to broader, more diverse datasets.
Singh et al. [11] proposed a facial emotion recognition system using CNN integrated with SVM for
classification, achieving up to 93% accuracy, though the model faced challenges with training time and
generalization across diverse datasets. Gaddam et al. [5] introduced a CNN-based model using ResNet50
architecture for classifying facial emotions from static images, achieving a test accuracy of 55.6%, but the
model struggled with data imbalance and limited accuracy compared to more advanced deep learning
frameworks.
Ali et al. [1] developed a CNN-based facial emotion detection system using Keras and real-world image
datasets, achieving up to 93% accuracy, but the study lacked integration of ensemble techniques and did
not address computational constraints for deployment on low-resource systems. Singh and Fang [2]
proposed a deep learning-based approach for emotion recognition using both audio and video modalities.
They evaluated multiple neural architectures, including CNN, CNN+RNN, and a hybrid
CNN+RNN+3DCNN model. Their best performance was achieved with the hybrid model, reaching 71.75%
accuracy on three emotions (sad, angry, and neutral). However, despite its enhanced learning capacity,
the hybrid model was noted for its high computational demand and only marginal improvement over simpler
CNN+RNN models—highlighting the trade-off between model complexity and practical efficiency in
emotion detection tasks.
In [9], Mellouk and Handouzi conducted a comprehensive review of deep learning models applied to
facial emotion recognition (FER), highlighting architectures like CNN, CNN-LSTM, 3DCNN, and BiLSTM.
These models achieved high recognition rates—many exceeding 90%—across benchmark datasets such
as CK+, JAFFE, and FER2013. While the results were promising, challenges persisted regarding
computational costs, data diversity, and the generalizability of models across varied real-world conditions,
limiting their practical scalability and interpretability in sensitive domains like healthcare.
Khan [7] conducted a comprehensive review of both traditional machine learning and deep learning
methods for facial emotion recognition, highlighting that CNN-based models deliver high accuracy on
standard datasets, while traditional ML methods like SVM and KNN remain computationally efficient.
However, the study lacked experimental implementation and did not address fairness or real-time
deployment challenges, limiting its practical applicability.
Yadav et al. [12] present a deep learning-based facial emotion recognition framework using CNN, LSTM,
and transfer learning models such as MobileNet and DCNN. While effective in feature extraction and
classification across standard datasets, the approach still faced challenges with intra-class diversity and
struggled with real-world variability, as it primarily relied on pre-defined sentiment categories and lacked
real-time adaptability.
Comparison among different models and their datasets:
    Jaiswal [6]      : CNN on the FER2013 and JAFFE datasets
    Pranav [10]      : DCNN model with ReLU and Softmax activations
    Kim [8]          : Haar and HOG features with SVM
    Singh [11]       : CNN integrated with SVM
    Gaddam [5]       : CNN-based ResNet50 architecture
    Ali [1]          : CNN with Keras on real-world images
    Singh & Fang [2] : CNN, CNN+RNN, and CNN+RNN+3DCNN
    Mellouk [9]      : CNN, CNN-LSTM, and BiLSTM on FER datasets
    Yadav [12]       : CNN model on FER-2013
Current machine learning models like CNN, RNN, SVM, LDA, and Random Forest, and architectures such
as MobileNet, DeXpression, DenseNet, and ResNet are widely used for facial emotion recognition.
However, they often face limitations in accuracy and generalization. Developing a hybrid model can help
overcome these issues and improve overall performance.
                                                 3. Objectives
The primary objectives of our research are as follows:
   ➢ To analyze existing machine learning models for facial emotion recognition to identify strengths,
       limitations, and gaps.
   ➢ To select and preprocess relevant facial expression datasets to ensure quality, balance, and
       suitability for training.
   ➢ To design, implement, and evaluate an effective hybrid machine learning model that combines
       EfficientNetV2 and ConvNeXt to detect emotions from facial expressions.
                                                4. Methodology
The leading challenge for this work, as for most machine learning systems, lies in the training stage,
where the system must learn from real data of human facial reactions. For a facial emotion recognition
system to be effective, it must be trained on real-world data showing different human facial reactions.
For instance, if the system is expected to detect a happy or angry face, it must first learn how these
emotions look. This learning is achieved through repeated training, in which the models are fed real
examples until they begin to recognize patterns and emotional features with high accuracy.
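As a minimal sketch of this training process (not the final implementation), the following PyTorch loop
assumes a classifier named model producing logits for the seven emotion classes and a DataLoader
named loader yielding (image, label) batches; both names are our own illustrative choices:

import torch
import torch.nn as nn

# Minimal sketch of the repeated training described above (assumptions:
# `model` is any classifier returning logits for 7 emotion classes, and
# `loader` is a torch.utils.data.DataLoader of (image, label) batches).
def train(model, loader, epochs=10, device="cuda"):
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    model.to(device).train()
    for epoch in range(epochs):
        running_loss = 0.0
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)  # softmax is inside the loss
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        print(f"epoch {epoch + 1}: mean loss {running_loss / len(loader):.4f}")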
Image Collection → Image Processing → Feature Selection and Transformation (using a hybrid model)
→ Feature Extraction → Classification/Recognition
Fig: Block diagram of image processing for emotion detection
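One plausible realization of the Image Collection and Image Processing stages is sketched below:
OpenCV's bundled Haar cascade (face detection, as in [8]) finds the largest face, which is then cropped
and resized to the 224×224 input the backbones expect; the preprocess function name is our own:

import cv2
import numpy as np

# Hypothetical realization of the Image Collection / Image Processing stages:
# detect the largest face with OpenCV's bundled Haar cascade, crop it, and
# resize it to the 224x224 input size the backbones expect.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess(image_bgr):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                                     # no face in this frame
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest detection
    crop = cv2.resize(image_bgr[y:y + h, x:x + w], (224, 224))
    return crop.astype(np.float32) / 255.0              # scale pixels to [0, 1]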
Creating a hybrid model that combines EfficientNetV2 and ConvNeXt is a powerful approach, since it can
leverage the efficiency of EfficientNetV2 and the feature richness of ConvNeXt.
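A minimal sketch of one such fusion is given below, assuming torchvision's pretrained EfficientNetV2-S
and ConvNeXt-Tiny as the two backbones; the class name and the concatenation-based fusion are our own
illustrative choices, not a finalized design:

import torch
import torch.nn as nn
from torchvision import models

class HybridEmotionNet(nn.Module):
    """Sketch of an EfficientNetV2-S + ConvNeXt-Tiny fusion (one design choice)."""
    def __init__(self, num_classes=7):
        super().__init__()
        eff = models.efficientnet_v2_s(weights="IMAGENET1K_V1")
        cnx = models.convnext_tiny(weights="IMAGENET1K_V1")
        self.eff_features = eff.features     # final feature map: 1280 channels
        self.cnx_features = cnx.features     # final feature map: 768 channels
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.head = nn.Sequential(           # shared classification head
            nn.LayerNorm(1280 + 768),
            nn.Dropout(0.3),
            nn.Linear(1280 + 768, num_classes),
        )

    def forward(self, x):                    # x: (batch, 3, 224, 224)
        a = self.pool(self.eff_features(x)).flatten(1)  # (batch, 1280)
        b = self.pool(self.cnx_features(x)).flatten(1)  # (batch, 768)
        return self.head(torch.cat([a, b], dim=1))      # emotion logits

Concatenating the two globally pooled feature vectors keeps both streams intact; weighted averaging or
attention-based fusion (e.g., the SENet mechanism discussed below) are natural alternatives.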
Fig: Network Structure of ConvNeXt
ConvNeXt Stages (Stage 1 to Stage 4)
Input Layer (Conv2D + LayerNorm): Facial input images pass through a 2D convolutional layer to
extract low-level features, followed by Layer Normalization to stabilize and accelerate training.
   ❖ Stage 1: Initial feature extraction using ConvNeXt blocks, followed by downsampling to reduce
     image size and increase abstraction.
   ❖ Stage 2: Builds on Stage 1's features, capturing more complex structures (e.g., edge patterns,
     basic facial parts). Downsampling continues for deeper representation.
   ❖ Stage 3: Further extracts semantic features (e.g., eyes, mouth shapes). Ends with another
     downsampling operation.
   ❖ Stage 4: Extracts the deepest features, focusing on subtle emotion cues. Ends with Global
     Average Pooling, compressing the feature maps to a compact representation.
 Output Layer (LayerNorm + Linear Layer): Normalizes and passes final features to a fully connected
layer. Predicts one of the emotional categories (e.g., anger, disgust, fear, happy, neutral, sad, surprise).
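For concreteness, the core computation of a single ConvNeXt block used in these stages can be sketched
as follows (layer scale and stochastic depth from the original ConvNeXt paper are omitted for brevity):

import torch.nn as nn

class ConvNeXtBlock(nn.Module):
    """Core of one ConvNeXt block (layer scale / stochastic depth omitted)."""
    def __init__(self, dim):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)           # applied over the channel dim
        self.pwconv1 = nn.Linear(dim, 4 * dim)  # pointwise expansion (MLP)
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)  # pointwise projection back

    def forward(self, x):                       # x: (N, C, H, W)
        residual = x
        x = self.dwconv(x)                      # 7x7 depthwise convolution
        x = x.permute(0, 2, 3, 1)               # NCHW -> NHWC for norm/linear
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)               # back to NCHW
        return residual + x                     # residual connection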
Fig: EfficientNetV2 for Emotion Detection
Fig: Attention Mechanism (SENet)
Function:
   ➢ Input Layer: Input size is typically 224×224×3 (RGB); takes in preprocessed face images.
   ➢ Stem: Processes the input image before deeper feature extraction begins. In emotion detection,
     the stem plays a critical role in preparing the raw facial image data for meaningful feature learning.
   ➢ EfficientNetV2 Backbone: Acts as the feature extractor. It progressively reduces spatial resolution
     while increasing channel depth, extracting complex hierarchical features.
   ➢ Global Average Pooling (GAP): Reduces the feature maps to a 1D vector while maintaining the
     essential features.
   ➢ Dropout: A regularization technique that prevents overfitting by randomly "dropping out" (i.e.,
     setting to zero) a fraction of the neurons during training.
   ➢ Fully Connected (Dense) Layer: Outputs a vector of seven softmax probabilities corresponding
     to [Angry, Disgust, Fear, Happy, Sad, Surprise, Neutral].
   ➢ Output Layer: Softmax activation produces the class probabilities for the emotion classes.
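The SENet attention mechanism in the figure above can be sketched as a squeeze-and-excitation block:
global average pooling "squeezes" each channel to one number, a small bottleneck MLP produces
per-channel gates, and the input feature map is rescaled channel-wise. A minimal version, using the
original SENet paper's reduction ratio of 16:

import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention (reduction ratio 16)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # squeeze bottleneck
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),  # expand back
            nn.Sigmoid(),                                # per-channel weights
        )

    def forward(self, x):                  # x: (N, C, H, W)
        n, c, _, _ = x.shape
        w = self.gate(x.mean(dim=(2, 3)))  # global average pool -> (N, C)
        return x * w.view(n, c, 1, 1)      # rescale each channel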
                                     5. Proposed Research Plan
   ➢   To develop a new hybrid architecture combining EfficientNetV2, ConvNeXt blocks, and SENet.
   ➢   To build our own dataset for higher accuracy.
   ➢   To compare performance against CNN, MobileNet, ResNet, DenseNet, and DeXpression.
   ➢   To support real-time use with webcam or mobile deployment (see the sketch after the timeline below).
 Month     | Research Background | Existing Method Analysis | Research Objective | Data Collection and Analysis | Methodology | Result Test
 Sep – Nov |          ✓          |            X             |         X          |              X               |      X      |      X
 Dec – Jan |          X          |            ✓             |         X          |              X               |      X      |      X
 Feb       |          X          |            X             |         ✓          |              X               |      X      |      X
 Mar – May |          X          |            X             |         X          |              ✓               |      X      |      X
 Jun – Jul |          X          |            X             |         X          |              X               |      ✓      |      X
 Aug       |          X          |            X             |         X          |              X               |      X      |      ✓
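As an illustration of the real-time webcam goal above, the following sketch loops over webcam frames,
reuses the hypothetical preprocess function from Section 4 together with a trained model, and prints a
predicted label; color conversion and input normalization are deliberately simplified:

import cv2
import torch

# Illustrative real-time loop (assumes the hypothetical `preprocess` function
# and a trained `model` from Section 4; color conversion and normalization
# are simplified for brevity). Label order follows the output layer above.
LABELS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def run_webcam(model, device="cpu"):
    model.to(device).eval()
    cap = cv2.VideoCapture(0)                  # default webcam
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        face = preprocess(frame)               # None when no face is detected
        if face is not None:
            x = torch.from_numpy(face).permute(2, 0, 1).unsqueeze(0).to(device)
            with torch.no_grad():
                pred = model(x).argmax(dim=1).item()
            print(LABELS[pred])
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to quit
            break
    cap.release()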
                                         6. References
1. Md Forhad Ali, Mehenag Khatun, and Nakib Aman Turzo. Facial emotion detection using neural
    network. International Journal of Scientific and Engineering Research, 2020.
2. Tanveer Aslam, Salman Qadri, Muhammad Shehzad, S Furqan Qadri, Abdul Razzaq, and Syed
    Shah. Emotion based facial expression detection using machine learning. Life Science Journal,
    17(8):35–43, 2020.
3. Carmen Bisogni, Aniello Castiglione, Sanoar Hossain, Fabio Narducci, and Saiyed Umer. Impact
    of deep learning approaches on facial expression recognition in healthcare industries. IEEE
    Transactions on Industrial Informatics, 18(8):5619–5627, 2022.
4. Renuka S. Deshmukh, Vandana Jagtap, and Shilpa Paygude. Facial emotion recognition system
    through machine learning approach. In 2017 International Conference on Intelligent Computing and
    Control Systems (ICICCS), pages 272–277, 2017.
5. Dharma Karan Reddy Gaddam, Mohd Dilshad Ansari, Sandeep Vuppala, Vinit Kumar Gunjan, and
    Madan Mohan Sati. Human facial emotion detection using deep learning. In ICDSMLA 2020:
    Proceedings of the 2nd International Conference on Data Science, Machine Learning and
    Applications, pages 1417–1427. Springer, 2022.
6. Akriti Jaiswal, A Krishnama Raju, and Suman Deb. Facial emotion detection using deep learning.
    In 2020 International Conference for Emerging Technology (INCET), pages 1–5. IEEE, 2020.
7. Amjad Rehman Khan. Facial emotion recognition using conventional machine learning and deep
    learning methods: current achievements, analysis and remaining challenges. Information,
    13(6):268, 2022.
8. Sanghyuk Kim, Gwon Hwan An, and Suk-Ju Kang. Facial expression recognition system using
    machine learning. In 2017 International SoC Design Conference (ISOCC), pages 266–267. IEEE,
    2017.
9. Wafa Mellouk and Wahida Handouzi. Facial emotion recognition using deep learning: review and
    insights. Procedia Computer Science, 175:689–694, 2020.
10. Pranav, Suraj Kamal, C Satheesh Chandran, and MH Supriya. Facial emotion recognition using
    deep convolutional neural network. In 2020 6th International Conference on Advanced Computing
    and Communication Systems (ICACCS), pages 317–320. IEEE, 2020.
11. Shubham Kumar Singh, Revant Kumar Thakur, Satish Kumar, and Rohit Anand. Deep learning
    and machine learning based facial emotion detection using cnn. In 2022 9th International
    Conference on Computing for Sustainable Global Development (INDIACom), pages 530–535.
    IEEE, 2022.
12. Yatharth Yadav, Vikas Kumar, Vipin Ranga, and Ram Murti Rawat. Analysis of facial sentiments:
    a deep-learning way. In 2020 International Conference on Electronics and Sustainable
    Communication Systems (ICESC), pages 541–545. IEEE, 2020.
13. Feifei Zhang, Tianzhu Zhang, Qirong Mao, and Changsheng Xu. Geometry guided pose-invariant
    facial expression recognition. IEEE Transactions on Image Processing, 29:4445–4460, 2020.