Real-Time Facial Emotion Detection
www.ijcrt.org © 2023 IJCRT | Volume 11, Issue 4 April 2023 | ISSN: 2320-2882
Abstract: This paper presents a real-time facial emotion recognition system built with Convolutional Neural Networks (CNNs) and
OpenCV. The system processes video frames in real time to detect faces and recognize emotions from facial expressions. The
CNN model is trained on a large dataset of labelled facial images, and the results demonstrate accurate and fast emotion
recognition. Integrating OpenCV with the CNN model enables real-time processing of video frames, which makes the system
suitable for various practical applications. Identifying facial expressions of emotion is one application of machine learning:
based on features extracted from an image, a classifier assigns a face image to one of the facial emotion classes. Among
classification techniques, the convolutional neural network (CNN) also extracts patterns from the image itself. In this study
we use a CNN model to recognize facial expressions, and then apply the wavelet transform to increase the precision of facial
emotion detection. The facial emotion image dataset, gathered from Kaggle, represents seven different facial emotions. In our
experiments, combining the CNN with the wavelet transform increases the accuracy of facial emotion recognition.
Index Terms - Computer Vision, Emotion Recognition, Face Recognition, Feature Extraction, Image Sequences,
Convolutional Neural Network, Facial Expressions, Wavelet Transform.
I. INTRODUCTION
1. Intelligent Systems:
Intelligent systems are increasingly present in people's lives, and users often have to be identified before they can use them.
Traditional identification methods rely on items a person carries, such as identity documents and keys, which are easily
forgotten, lost, or tampered with. Identification based on personal biometric characteristics, such as facial recognition and
fingerprinting, works considerably better.
In terms of algorithms, a CNN shares parameters within each convolution layer: the same filter weights are applied at every
position of the input. This reduces both the amount of memory needed and the number of parameters that must be trained, which
improves the algorithm's performance. Most other machine learning algorithms require manual feature extraction or
pre-processing of the images, whereas a CNN used for image processing hardly ever needs these operations.
Deep learning has some drawbacks as well. One of them is that building a deep model needs a large number of samples, which
restricts where the algorithm can be used. Since face recognition and licence plate character recognition have made significant
strides in recent years, this work conducts some basic research on CNN-based face recognition technology.
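The parameter saving that comes from weight sharing can be made concrete with a small count. The sketch below is illustrative only: the 48x48 grayscale input, the 3x3 filter size, and the 32 filters are assumed values, not settings reported in this paper. It compares a convolution layer with a hypothetical layer that has the same local connections but a separate set of weights at every position.

```python
def shared_params(kernel, in_channels, n_filters):
    """Convolution layer: one weight set (plus bias) per filter, reused everywhere."""
    return kernel * kernel * in_channels * n_filters + n_filters

def unshared_params(image_size, kernel, in_channels, n_filters):
    """Same local connectivity, but a separate weight set at every output position."""
    positions = (image_size - kernel + 1) ** 2  # 'valid' sliding-window positions
    return positions * shared_params(kernel, in_channels, n_filters)

# Assumed example values: 48x48 grayscale input, 3x3 filters, 32 filters.
with_sharing = shared_params(3, 1, 32)
without_sharing = unshared_params(48, 3, 1, 32)
print(with_sharing, without_sharing)
```

Under these assumptions the shared layer needs a few hundred parameters, while the unshared variant needs several hundred thousand, and the shared count does not grow with the image size.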
The basic unit of a neural network, the single neuron, is also known as a logistic regression model. When numerous neurons are
connected to one another and arranged in layers, the structure is referred to as a neural network model. A neural network with
one hidden layer is shown in Figure 1.1:
The input of this neural network consists of $x_1$, $x_2$, and $x_3$. The node labelled +1 is the offset node, also called the
intercept term. The left-most column of the model is the input layer and the right-most column is the output layer. The middle
layer is a hidden layer that is fully connected to both the input layer and the output layer; it is called hidden because the
training sample set does not reveal the values of its nodes.
The model has three input units, three hidden units, and one output unit. Using $n_l$ to denote the number of layers, this
network has $n_l = 3$. Layer $l$ is written $L_l$, so the input layer is $L_1$ and the output layer is $L_{n_l}$. This neural
network has the following parameters:
$W_{ij}^{(l)}$ is the connection parameter between the $j$th unit of layer $l$ and the $i$th unit of layer $l+1$, and
$b_i^{(l)}$ is the offset of the $i$th unit of layer $l+1$. Let $a_i^{(l)}$ denote the output value of the $i$th unit of layer
$l$. Given the set of parameters $W$ and $b$, the network's output is computed by the formula $h_{W,b}(x)$.
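As an illustration of how the output $h_{W,b}(x)$ could be computed for the network with three input units, three hidden units, and one output unit, the following minimal sketch performs one forward pass with NumPy. The sigmoid activation is an assumption suggested by the logistic regression analogy, and the weights and input values are arbitrary placeholders, not trained values.

```python
import numpy as np

def sigmoid(z):
    """Logistic function (assumed activation)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward propagation through a 3-3-1 network; returns h_{W,b}(x)."""
    a2 = sigmoid(W1 @ x + b1)    # hidden-layer activations a^(2)
    a3 = sigmoid(W2 @ a2 + b2)   # output-layer activation a^(3)
    return a3

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 3)), np.zeros(3)   # layer 1 -> layer 2
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # layer 2 -> layer 3
x = np.array([0.5, -1.0, 2.0])                  # placeholder input x1, x2, x3
h = forward(x, W1, b1, W2, b2)
print(h)
```

Row $i$ of `W1` holds the weights $W_{ij}^{(1)}$ feeding hidden unit $i$, so the matrix product applies every connection of a layer in one step.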
Equation (3) illustrates how forward propagation is calculated. Because the network has multiple layers, training uses gradient
descent together with the chain rule of differentiation; in this respect, training a neural network resembles training a
logistic regression model.
3. CNN Model Building and Instruction
Typical CNN architectures currently include LeNet5, AlexNet, ZF Net, GoogLeNet, and VGGNet. The discussion here focuses on the
LeNet5 architecture. LeNet5 is a classic early CNN structure used primarily for recognizing handwritten characters. It has seven
layers in total; aside from the input layer, every layer contains multiple trainable parameters.
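To make the seven-layer LeNet5 structure more tangible, the sketch below traces how the feature-map size shrinks through its convolution and pooling layers. It follows the commonly cited LeNet5 layout (32x32 input, 5x5 convolutions, 2x2 pooling); the layer names and the helper `conv_out` are illustrative and are not taken from this paper.

```python
def conv_out(size, kernel, stride=1):
    """Output width/height of an unpadded convolution or pooling window."""
    return (size - kernel) // stride + 1

# (name, kernel size, stride, output channels) for the commonly cited LeNet5 layout.
layers = [
    ("C1 conv", 5, 1, 6),
    ("S2 pool", 2, 2, 6),
    ("C3 conv", 5, 1, 16),
    ("S4 pool", 2, 2, 16),
    ("C5 conv", 5, 1, 120),
]

size = 32  # LeNet5 takes 32x32 grayscale inputs
shapes = []
for name, kernel, stride, channels in layers:
    size = conv_out(size, kernel, stride)
    shapes.append((name, channels, size, size))
    print(f"{name}: {channels} x {size} x {size}")
# C5 is followed by the fully connected layer F6 (84 units) and the 10-way output
# layer, which gives seven layers in total besides the input.
```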
In our proposed work, facial emotion recognition is combined with the wavelet transform, and OpenCV is used to analyse facial
expressions in video.
We develop a Convolutional Neural Network algorithm with MobileNet and the wavelet transform, trained on an emotion image
dataset.
Using the CNN algorithm, we identify and classify facial emotions with the feb 2018 dataset.csv file.
To solve the issue, we explore the presentation order of the samples during training and apply pre-processing techniques that
extract only expression-specific features from a face image.
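The paper does not state which wavelet is used. As a minimal sketch, assuming a single-level 2D Haar wavelet, the transform splits a face image into one low-frequency approximation and three detail sub-bands, each half the size of the input. The function name `haar_dwt2` and the 48x48 placeholder image are illustrative assumptions.

```python
import numpy as np

def haar_dwt2(image):
    """One-level 2D Haar transform of an image with even height and width.

    Returns four sub-bands (ll, lh, hl, hh); the first letter refers to the
    filter applied along each row, the second to the filter along each column.
    """
    img = np.asarray(image, dtype=float)
    s = np.sqrt(2.0)
    # Along rows: scaled sums (low-pass) and differences (high-pass) of pixel pairs.
    lo = (img[:, 0::2] + img[:, 1::2]) / s
    hi = (img[:, 0::2] - img[:, 1::2]) / s
    # Along columns of each intermediate result.
    ll = (lo[0::2, :] + lo[1::2, :]) / s
    lh = (lo[0::2, :] - lo[1::2, :]) / s
    hl = (hi[0::2, :] + hi[1::2, :]) / s
    hh = (hi[0::2, :] - hi[1::2, :]) / s
    return ll, lh, hl, hh

face = np.arange(48 * 48, dtype=float).reshape(48, 48)  # stand-in for a face crop
ll, lh, hl, hh = haar_dwt2(face)
print(ll.shape)
```

One possible use, again an assumption rather than the authors' stated method, is to feed the approximation band `ll` or a stack of all four sub-bands to the CNN instead of the raw pixels.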
a. Proposed system:
The research work can be described by the block diagram shown in Fig. 1.
Image based:
Here we collect one image per person.
Training:
We train our model with the CNN algorithm on the pre-processed training dataset.
The CNN consists of five layers: the input layer, the convolution layer, the pooling layer, the flatten layer, and the dense
layer.
The input layer takes the images as input.
The convolution layer treats the image as a matrix of pixel values and applies filters to it to produce feature maps.
The pooling layer downsamples these numerical values. The Softmax function, used in the supervised classifier, converts the
resulting numerical scores into class probabilities, from which a binary (one-hot) class decision is taken.
IJCRT2304305 International Journal of Creative Research Thoughts (IJCRT) c555
The flatten layer turns the pooled feature maps into a single vector, and the dense layer maps that vector to the classes of the
dataset, which are encoded in binary (one-hot) format.
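The sequence of layers described above can be sketched end to end in NumPy. This is a simplified illustration, not the trained model of this paper: it uses a single random 3x3 filter, a ReLU non-linearity (an assumption), 2x2 max pooling, and random dense weights; only the seven output classes are taken from the text.

```python
import numpy as np

def conv2d(image, kernel):
    """Unpadded 2D cross-correlation of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.random((48, 48))                    # input layer: one grayscale face crop
kernel = rng.normal(size=(3, 3))                # one randomly initialised filter
fmap = np.maximum(conv2d(image, kernel), 0.0)   # convolution + ReLU -> 46x46
pooled = max_pool(fmap)                         # pooling -> 23x23
flat = pooled.ravel()                           # flatten -> 529 values
W = rng.normal(size=(7, flat.size)) * 0.01      # dense layer for 7 emotion classes
probs = softmax(W @ flat)                       # class probabilities
one_hot = np.eye(7, dtype=int)[np.argmax(probs)]  # binary (one-hot) decision
print(probs.round(3), one_hot)
```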
We save the data as .yml files using the fit generator method.
Labeling:
Here we split the path of each image in the directory where the images are saved. After splitting, we collect the label part
and save it in pickle format.
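The labeling step might look like the following sketch. The directory layout `<root>/<emotion>/<file>`, the file names, and the output name `labels.pickle` are hypothetical; the paper only states that the label part of each path is collected and pickled.

```python
import pickle
import tempfile
from pathlib import Path

def collect_labels(image_paths):
    """Take the parent directory name of each image path as its label."""
    return [Path(p).parent.name for p in image_paths]

# Hypothetical paths; the folder-per-emotion layout is an assumption.
paths = [
    "dataset/happy/img_001.jpg",
    "dataset/sad/img_002.jpg",
    "dataset/happy/img_003.jpg",
]
labels = collect_labels(paths)

# Save the labels in pickle format and read them back to check the round trip.
with tempfile.TemporaryDirectory() as tmp:
    label_file = Path(tmp) / "labels.pickle"
    with open(label_file, "wb") as f:
        pickle.dump(labels, f)
    with open(label_file, "rb") as f:
        restored = pickle.load(f)
print(restored)
```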
Testing:
Image Based:
Here we upload the image to be recognized for testing.
Video Based:
Here we test on video data using computer vision.
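The per-frame logic of video-based testing can be sketched independently of the camera and the trained model. In the sketch below, `detect_faces` and `classify` are hypothetical callables supplied by the caller. In a real system the frames might come from OpenCV's `cv2.VideoCapture` and the detector could wrap `cv2.CascadeClassifier.detectMultiScale`, but the paper does not specify the detection method, so stand-ins are used here.

```python
import numpy as np

def process_frame(frame, detect_faces, classify):
    """Return a list of ((x, y, w, h), class_index) for every face in one frame."""
    results = []
    for (x, y, w, h) in detect_faces(frame):
        crop = frame[y:y + h, x:x + w]   # cut the face region out of the frame
        probs = classify(crop)           # class probabilities from the model
        results.append(((x, y, w, h), int(np.argmax(probs))))
    return results

# Stand-ins so the sketch runs without a camera or a trained model.
def fake_detector(frame):
    return [(10, 20, 48, 48)]            # one face box: x, y, width, height

def fake_classifier(crop):
    probs = np.zeros(7)                  # seven emotion classes
    probs[3] = 1.0
    return probs

frame = np.zeros((240, 320), dtype=np.uint8)   # placeholder grayscale frame
detections = process_frame(frame, fake_detector, fake_classifier)
print(detections)
```

Keeping the detector and classifier as parameters lets the same function serve both the image-based and the video-based test path.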
V. UML DIAGRAMS
UML stands for Unified Modelling Language. It is a standardized, general-purpose modelling language used in object-oriented
software engineering. The standard was developed and is managed by the Object Management Group.
The objective is for UML to become a common language for modelling object-oriented computer programs. UML currently consists
of two main parts: a meta-model and a notation. In the future, a method or process might also be added to or associated with
UML.
The Unified Modelling Language is a standard language for specifying, visualising, constructing, and documenting the artefacts
of software systems, as well as for business modelling and other non-software systems.
The UML is a collection of best engineering practices that have proven effective in modelling large, complex systems.
The UML is a crucial part of the software development process and of developing object-oriented software. It primarily uses
graphical notations to express the design of software projects.
GOALS:
In the Unified Modeling Language (UML), a use case diagram is a specific kind of behavioural diagram that is defined by and
produced by a use-case analysis.
Its goal is to provide a graphical overview of a system's functionality in terms of actors, their objectives (represented as use cases),
and any dependencies between those use cases.
A use case diagram's primary function is to display which system functions are carried out for which actor. The system's actors can
be represented by their roles.
A sequence diagram is a Message Sequence Chart construct. Sequence diagrams are also called event diagrams, event scenarios,
and timing diagrams.
Fig10: Collaboration diagram
Fig11: Deployment Diagram
ACTIVITY DIAGRAM
With support for choice, iteration, and concurrency, activity diagrams are graphical representations of workflows involving
sequential activities and actions. Activity diagrams can be used to describe the operational and business workflows of system
components in the Unified Modeling Language. An activity diagram demonstrates the overall control flow.
COMPONENT DIAGRAM
A component diagram, also referred to as a UML component diagram, outlines the arrangement and wiring of the actual physical
components in a system. Component diagrams are commonly drawn to model implementation details and to verify that the planned
development covers every required function of the system.
Fig14: ER Diagram
DFD DIAGRAM:
A data flow diagram (DFD) is a common method for illustrating how information moves through a system. A clean and clear DFD can
graphically depict a good deal of the system requirements. The system it describes may be manual, automated, or a hybrid of the
two. A DFD shows how information enters and exits the system, what modifies the information, and where information is stored.
Its main function is to outline the scope and boundaries of a system as a whole. It may also serve as a communication tool
between a systems analyst and any individual who plays a part in the system, and as the foundation for redesigning a system.
Context Level Diagram:
Level-2 Diagram:
The results and discussion for real-time facial expression recognition using CNN depend on the specific dataset, architecture,
hyperparameters, and evaluation metrics used in the experiment. Possible results and discussion points include:
1. Dataset: The choice of dataset affects the performance of the facial expression recognition system. Common datasets include
FER2013, CK+, and JAFFE. The system's accuracy and generalization ability can be evaluated using metrics such as precision,
recall, F1 score, and the confusion matrix.
2. CNN architecture: The architecture of the CNN model affects the accuracy, computational efficiency, and real-time performance
of the system. Popular architectures include VGGNet, ResNet, and Inception. The number of layers, filter size, and pooling
strategies can be tuned to optimize performance.
3. Hyperparameters: The hyperparameters of the CNN model affect the training process and the performance of the system. The
learning rate, batch size, and regularization strength can be tuned using techniques such as grid search or random search.
4. Real-time performance: The real-time performance of the facial expression recognition system is a critical factor for
practical applications. The latency, frames per second, and memory usage can be evaluated to determine the system's efficiency
and feasibility.
5. Discussion: The discussion of the results should address the strengths, limitations, and potential future directions of the
facial expression recognition system. The system's accuracy, real-time performance, and generalization ability can be compared
with other state-of-the-art methods. The limitations of the dataset, architecture, and hyperparameters should be acknowledged.
Potential future directions include improving the accuracy and efficiency of the system, expanding the dataset to include more
diverse facial expressions and conditions, and exploring the use of other deep learning models such as recurrent neural
networks and attention mechanisms.
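The evaluation metrics named in point 1 can be computed directly from true and predicted class indices. The sketch below is a minimal NumPy version; the three-class toy labels are made up for illustration and are not results from this work.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_class_metrics(cm):
    """Precision, recall and F1 per class; undefined ratios are reported as 0."""
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)   # how often each class was predicted
    actual = cm.sum(axis=1)      # how often each class actually occurred
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1

# Toy labels for three classes (illustrative only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
precision, recall, f1 = per_class_metrics(cm)
print(cm)
print(precision, recall, f1)
```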
TEST CASES
In general, the results and discussion for real-time facial expression recognition using CNN should demonstrate the
effectiveness and feasibility of the system for practical applications. The system should achieve high accuracy, real-time
performance, and generalization ability while addressing the limitations and potential future directions of the field. In this
project, we successfully developed the code for Real-Time Facial Expression Recognition using CNN in PyCharm.
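Real-time performance of this kind could be checked by timing the processing function over a batch of frames. The sketch below is one simple way to estimate mean latency and frames per second; `dummy_process` is a stand-in workload, and a real measurement would call the face detection and CNN pipeline instead.

```python
import time

def measure_fps(process, frames):
    """Return (mean latency in seconds, frames per second) for `process` over `frames`."""
    start = time.perf_counter()
    for frame in frames:
        process(frame)
    elapsed = time.perf_counter() - start
    latency = elapsed / len(frames)
    fps = 1.0 / latency if latency > 0 else float("inf")
    return latency, fps

# Stand-in workload and placeholder frames (assumptions for illustration).
def dummy_process(frame):
    return sum(frame)

frames = [list(range(1000)) for _ in range(50)]
latency, fps = measure_fps(dummy_process, frames)
print(f"mean latency: {latency * 1000:.3f} ms, throughput: {fps:.1f} FPS")
```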
VIII. CONCLUSION
In this project we have successfully created an application that can detect faces and recognize facial emotions. We developed
two methods, image based and video based, using the CNN algorithm. After training on the dataset, the results were tested by
uploading images and by streaming video with face inputs.