VISVESVARAYA TECHNOLOGICAL UNIVERSITY
“JnanaSangama”, Belagavi-590014, Karnataka, India
MINI PROJECT REPORT ON
“SPEECH EMOTION RECOGNITION”
SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE SIXTH SEMESTER BACHELOR OF ENGINEERING DEGREE
SUBMITTED BY
RAKSHA B S 1IC21AI025
MONICA K 1IC21CD002
RISHITHA B H 1IC21CD004
VARSHITHA R 1IC21CD009
UNDER THE GUIDANCE OF
Dr. SHEETHAL AJJ MANI
Professor
Dept. of AI&ML
IMPACT COLLEGE OF ENGINEERING AND APPLIED SCIENCES
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING-DATA SCIENCE
Sahakarnagar, Bengaluru-560092
2023-2024
IMPACT COLLEGE OF ENGINEERING AND APPLIED SCIENCES
Sahakarnagar Bangalore-560092
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING-DATA SCIENCE
CERTIFICATE
This is to certify that the project entitled “SPEECH EMOTION RECOGNITION” carried out by RAKSHA B S [1IC21AI025], MONICA K [1IC21CD002], RISHITHA B H [1IC21CD004] and VARSHITHA R [1IC21CD009], bonafide students of Impact College of Engineering and Applied Sciences, has been submitted in partial fulfillment of the requirements of the Mini Project of the VI Semester Bachelor of Engineering degree in COMPUTER SCIENCE AND ENGINEERING-DATA SCIENCE as prescribed by VISVESVARAYA TECHNOLOGICAL UNIVERSITY during the academic year 2023-2024.
Signature of the Guide Signature of the HOD
Dr. Sheethal Ajj Mani Dr. Kaipa Sandhya
Professor Professor & HoD,
Dept. of AI&ML Dept. of CS&E-DATA SCIENCE
Name of Examiner Signature with date
1. 1.
2. 2.
ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the successful completion of any task would
be incomplete without the mention of the people who made it possible and whose constant
encouragement and guidance crowned our efforts with success.
We consider ourselves proud to be part of the Impact College of Engineering and Applied Sciences
family, the institution which stood by us in our endeavor.
We are grateful to Dr. Kaipa Sandhya, Professor & HoD-Department of Computer
Science and Engineering-Data Science, Impact College of Engineering and Applied
Sciences, who is a source of inspiration and of invaluable help in channelling our efforts in the
right direction.
We express our deep and sincere thanks to our Management and Principal, Dr. Jalumedi
Babu, Impact College of Engineering and Applied Sciences for their continuous
support.
We would like to thank the faculty members and supporting staff of the Department of
Computer Science and Engineering-Data Science, Impact College of Engineering and
Applied Sciences for providing all the support for completing the project.
Finally, we are grateful to our parents and friends for their unconditional support and help
during the course of our project.
RAKSHA BS [1IC21AI025]
MONICA K [1IC21CD002]
RISHITHA BH [1IC21CD004]
VARSHITHA R [1IC21CD009]
ABSTRACT
The speech signal is one of the most natural and fastest methods of communication between
humans. Many systems have been developed by various researchers to identify the emotions
from the speech signal. Speech features are particularly useful in differentiating between
various emotions, and when these features are not distinct, recognizing emotion from a
speaker's speech becomes very difficult. After feature extraction, another important part is the
classification of speech emotions.
A familiar example of speech emotion recognition can be seen at call centres. Call-centre
employees never talk to every customer in the same manner; their way of pitching and talking
changes with the customer. This also happens among ordinary people, but it is especially
relevant to call centres: by recognizing customers' emotions from speech, employees can
improve their service and convert more customers. In this way, call centres already make use
of speech emotion recognition, and it makes a simple yet practical Python mini-project.
CONTENTS
ACKNOWLEDGEMENT i
ABSTRACT ii
CHAPTER NO. TITLE PAGE NO.
Chapter 1 INTRODUCTION
1.1 Problem statement 1
1.2 Objectives 1
1.3 Background and literature review 1
Chapter 2 REQUIREMENT ANALYSIS
2.1 Hardware requirements 3
2.2 Software requirements 3
Chapter 3 SYSTEM DESIGN AND ARCHITECTURE
3.1 System design 5
3.2 System architecture 6
Chapter 4 IMPLEMENTATION
4.1 Introduction 8
4.2 Technical information 8
4.3 Model selection 8
4.4 Selection criteria 9
4.5 Model training 9
4.6 Evaluation metrics 9
4.7 Extract features and labels 10
4.8 Load and use trained model 12
Chapter 5 TESTING
5.1 Introduction to testing phase 13
5.2 Testing strategies 13
5.3 Performance analysis 13
5.4 Train the model 14
5.5 Integrated testing 15
5.6 Sub-system testing 15
5.7 System testing 15
5.8 Acceptance testing 15
Chapter 6 RESULTS AND DISCUSSIONS
6.1 Recognition of the emotion 16
6.2 Output 16
6.3 Discussions 17
Chapter 7 CONCLUSION & FUTURE WORK
7.1 Future work 18
7.2 Conclusion 18
REFERENCES 19
List of Figures
Sl. No Fig. No Name of the figure Page No
1 3.1 Data flow of the project 5
2 4.1 Extract features and labels 11
3 4.2 Load and use trained model 12
4 5.1 Train the model 14
5 6.1 Output 16
CHAPTER 1
INTRODUCTION
Human speech, through tone, pitch and many other features of the human vocal system,
conveys information and context. Speech emotion recognition is widely used in many fields
with natural human-computer interaction requirements. Speech emotion recognition is
characterized as extracting the emotional condition of a speaker from his or her discourse, and
this sort of recognition can also be used to extract useful semantics for speech recognition
systems.
The model of SER includes the discrete speech emotion model and continuous speech emotion
model. The discrete emotion model expresses several independent emotions, indicating that a
certain speech utterance has a single independent emotion, while the continuous speech
emotion model means that the emotion lies in an emotion space, where every emotion has a
different strength along each dimension.
Determining the emotional state of humans is an idiosyncratic task and may be used as a
standard for any emotion recognition model. The system considers various emotions such as
anger, disgust, surprise, fear, joy, happiness, neutral and sadness.
1.1 Problem Statement:
For a dataset containing audio files of different actors, design a Speech Emotion
Recognition (SER) system which will analyse audio using required machine learning
techniques. Also, create a wave plot of the voice.
1.2 Objectives:
The primary objectives of this project are:
• The main goal of this module is to recognise emotion.
• To design a system which will extract, characterize and recognize the information of
speaker’s emotions.
• Accurately identify and classify emotional states (e.g., happiness, sadness, anger)
from speech signals.
• Aid in monitoring and assessing emotional well-being, particularly in therapeutic
settings.
• Identify signs of distress or frustration to enable timely intervention in high-stakes
situations.
• Contribute to the scientific understanding of emotion dynamics and their acoustic
manifestations.
1.3 Background and Literature Review:
Speech Emotion Recognition (SER) is a multidisciplinary field that combines aspects of
signal processing, machine learning, and psychology. The primary goal of SER is to analyze
vocal features and identify the emotional states of speakers based on their speech patterns.
This technology has seen rapid advancements due to increased interest in human-computer
interaction, sentiment analysis, and artificial intelligence.
Historically, early work in SER focused on acoustic features such as pitch, tone, and rhythm,
leveraging basic statistical methods. However, the advent of more sophisticated machine
learning algorithms and deep learning techniques has revolutionized the field, allowing for
more accurate and advanced emotion detection.
Studies have extensively explored various acoustic features used for emotion classification,
such as Mel-frequency cepstral coefficients (MFCCs), pitch, intensity, and speech rate. These
features serve as fundamental inputs for machine learning models.
Traditional approaches utilized classifiers like Support Vector Machines (SVM), Random
Forests, and k-Nearest Neighbors (k-NN). Recent research has shifted towards deep learning
methods, particularly Convolutional Neural Networks (CNNs) and Recurrent Neural
Networks (RNNs), which have shown superior performance in capturing temporal
dependencies in speech data.
Integration:
• Call Centers: Enhance customer interactions by identifying emotions, leading to
tailored responses and improved service quality.
• Mental Health Monitoring: Used in teletherapy and mental health applications to
gauge patients' emotional states.
• E-Learning Platforms: Track students' emotions to adjust content delivery and
improve engagement.
• Consumer Feedback: Analyze emotions in user feedback to gain insights into brand
perception and product reception.
Benefits:
• Improves understanding in conversations, helping to identify underlying feelings that
may not be verbally expressed.
• Tailors interactions based on emotional state, leading to higher satisfaction.
• Promotes more empathetic interactions, especially in fields like healthcare and
customer service.
• Identifies signs of distress or frustration, enabling timely intervention.
CHAPTER 2
REQUIREMENT ANALYSIS
Requirement analysis is a critical phase in the project lifecycle, defining the necessary
software and hardware components to achieve the project's objectives. This section outlines
the requirements used for speech emotion recognition.
2.1 Hardware requirements
Computers / Laptops:
High-performance machines for development, capable of handling intensive data
processing and machine learning tasks.
Processor:
Multi-core processors (e.g., Intel i7/i9, AMD Ryzen 7/9).
RAM:
Minimum 16 GB, preferably 32 GB.
Storage:
SSD with at least 512 GB for fast data access and retrieval.
GPU:
Dedicated GPU (e.g., NVIDIA RTX 2070 or higher) for training machine learning models.
2.2 Software requirements
Data Collection and Processing- APIs and Data Sources:
• RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song):
Contains audio and video files with 8 emotions expressed by actors.
Programming Languages:
• Python: The most commonly used language for SER due to its extensive libraries and
ease of use.
Libraries and Frameworks- Machine Learning:
• Development environment: Visual Studio Code.
• NumPy: For numerical operations.
• Librosa: For loading, visualizing, and transforming audio.
• train_test_split (scikit-learn): Splits a dataset into training and testing subsets.
• OS: For interacting with the operating system.
• TensorFlow: An open-source library for deep learning.
• Keras: A high-level neural networks API, running on top of TensorFlow.
• tqdm: Provides fast, extensible progress bars for loops and other iterable tasks.
Emotions Dictionary:
Maps emotion codes to their corresponding emotional labels.
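As an illustration, the sketch below shows one such mapping for the RAVDESS dataset, whose file names encode the emotion in their third field; the helper name is an assumption for illustration, not the report's exact code.

# Hypothetical emotion dictionary for RAVDESS file names such as
# "03-01-06-01-02-01-12.wav", whose third field ("06") encodes the emotion.
emotions = {
    "01": "neutral", "02": "calm",    "03": "happy",   "04": "sad",
    "05": "angry",   "06": "fearful", "07": "disgust", "08": "surprised",
}

def emotion_from_filename(filename):
    """Return the emotion label encoded in a RAVDESS file name."""
    code = filename.split("-")[2]
    return emotions.get(code, "unknown")

print(emotion_from_filename("03-01-06-01-02-01-12.wav"))  # -> fearful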
Feature Extraction:
• Extract MFCCs, chroma features, spectral contrast, etc., using Librosa or
OpenSMILE.
• Save extracted features in a structured format (e.g., CSV or HDF5).
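A minimal feature-extraction sketch with Librosa is shown below; the function name and the choice of 40 mean MFCCs are illustrative assumptions, and chroma or spectral-contrast features could be appended in the same way.

import numpy as np
import librosa

def extract_mfcc(path, n_mfcc=40):
    """Load an audio file and return its mean MFCC vector."""
    signal, sr = librosa.load(path, sr=None)            # keep the native sample rate
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return np.mean(mfcc.T, axis=0)                      # one 40-dimensional vector per file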
Visualization:
• Matplotlib: For plotting and visualizing data in Python.
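Since the problem statement also asks for a wave plot of the voice, a minimal plotting sketch is given below; the input file name is a placeholder, and older Librosa versions expose the same call as librosa.display.waveplot.

import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load("speech_sample.wav")       # hypothetical input file
plt.figure(figsize=(10, 3))
librosa.display.waveshow(y, sr=sr)              # draw the waveform
plt.title("Waveform of the input speech")
plt.xlabel("Time (s)")
plt.tight_layout()
plt.show()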
Ubuntu environment in Visual Studio Code:
Visual Studio Code is a lightweight but powerful source code editor which runs on your
desktop and is available for Windows, macOS and Linux. It comes with built-in support for
JavaScript, TypeScript and Node.js and has a rich ecosystem of extensions for other
languages and runtimes (such as C++, C#, Java, Python, PHP, Go, .NET).
CHAPTER 3
SYSTEM DESIGN AND ARCHITECTURE
3.1 System design
Figure 3.1: Data flow of the project
The diagram above illustrates the data flow in the speech emotion recognition system. The
input speech is used for training and testing, and the system then recognises the emotion of
the input speech.
The image provides a clear illustration of a Speech Emotion Recognition (SER) system
architecture.
Components and Data Flow:
• Training Samples Collection: Collection of audio recordings of speech with labeled
emotions.
• Pre-Processing: Cleans the audio data with noise reduction, normalization,
silence removal, and segmentation (a minimal sketch appears after this list).
• Feature Extraction: Convert audio signals into feature vectors that capture essential
characteristics. The technique used is MFCC (Mel-Frequency Cepstral Coefficients), which
captures the power spectrum of the audio.
• Training Dictionary (Feature Vectors): Stores the extracted feature vectors along
with their corresponding emotion labels.
• Classifier Training: The classifier is the core component that processes the extracted
features from the test samples and assigns emotion labels based on the patterns
learned during training. Methods used are SVM (Support Vector Machine), Random
Forest, k-NN (k-Nearest Neighbors), CNN (Convolutional Neural Network), RNN
(Recurrent Neural Network), LSTM (Long Short-Term Memory).
• Test Samples Collection: Collection of new, unlabeled audio recordings for emotion
recognition.
• Emotion Recognition: Classify the extracted features into one of the emotion
categories using the trained classifier. The predicted emotion label for the test sample
(e.g., happy, sad, angry, excited).
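As mentioned in the Pre-Processing step above, the sketch below shows one possible way to trim silence and normalize amplitude with Librosa; it is an assumption for illustration, not the report's exact pipeline.

import numpy as np
import librosa

def preprocess(path, top_db=30):
    """Load a clip, trim leading/trailing silence and normalize its amplitude."""
    signal, sr = librosa.load(path, sr=None)
    trimmed, _ = librosa.effects.trim(signal, top_db=top_db)   # silence removal
    peak = np.max(np.abs(trimmed))
    normalized = trimmed / peak if peak > 0 else trimmed       # amplitude normalization
    return normalized, sr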
Data Flow Description:
• Collect and preprocess training samples.
• Extract features and create a training dictionary.
• Train a classifier with the feature vectors.
• Collect test samples, preprocess, and extract features.
• Use the trained classifier to recognize emotions in the test samples.
This flow ensures that the system can accurately detect the user's emotion from the pitch and
other characteristics of their speech.
3.2 System Architecture
The system architecture for this project comprises one main component: emotion recognition
through speech. This component detects the user's emotions from their speech.
• Data Collection Layer:
Training Data: Pre-recorded speech samples with emotion labels.
Test Data: New speech samples for emotion prediction.
• Preprocessing Layer:
Noise Reduction: Filters and techniques to remove background noise.
Silence Removal: Trimming silent portions of the audio clips.
Normalization: Standardizing audio signal amplitudes.
• Feature Extraction Layer:
Feature Extractor: Extracts relevant features from the audio signals using
MFCC (Mel-Frequency Cepstral Coefficients).
• Data Storage Layer:
Training Dictionary (Feature Vectors): Stores extracted features and their
corresponding emotion labels in a database or other storage medium.
• Model Training Layer:
Training Module: Trains a machine learning or deep learning model using
the feature vectors using SVM, Random Forest, k-NN, CNN, RNN, LSTM.
• Inference Layer:
Feature Extraction: Extracts features from test samples using the same
methods as in the training phase.
Classifier: Uses the trained model to classify the emotion of the test samples
based on extracted features.
• Output Layer:
Emotion Recognition Results: Outputs the predicted emotions for the test
samples.
Visualization and Reporting: Provides a user interface or dashboard to
display results.
Explanation of Each Layer and Their Interactions:
• Data Collection Layer: Audio data is collected, either pre-labeled for training or
unlabeled for testing.
• Preprocessing Layer: Raw audio data is cleaned and prepared for feature extraction,
ensuring better performance in subsequent steps.
• Feature Extraction Layer: Extracted features serve as the input to the machine
learning models. These features capture essential characteristics of the audio signals
relevant to emotion.
• Data Storage Layer: Stores the feature vectors and their corresponding labels, which
are used to train the model.
• Model Training Layer: The extracted features and labels are used to train a machine
learning model. The trained model is then saved for future inference.
• Inference Layer: For new audio samples, features are extracted using the same
method as in training. The trained model then predicts the emotion based on these
features.
• Output Layer: The predicted emotions are presented to the user. This layer may also
include tools for visualization and reporting of results.
By following this architecture, the system can effectively process audio data, extract
meaningful features, train a robust model, and accurately recognize emotions from speech in
real-time or batch processing scenarios.
CHAPTER 4
IMPLEMENTATION
4.1 Introduction
This chapter discusses the implementation of the system and provides technical information
about it. Systems implementation is the process of defining how the information system
should be built (i.e., physical system design), ensuring that the information system is
operational and used, and ensuring that the information system meets quality standards
(i.e., quality assurance).
The purpose of the implementation process is to design and create (or fabricate) a system
element conforming to that element’s design properties or requirements. The element is
constructed employing appropriate technologies and industry practices. This process bridges
the system definition processes and the integration process. This chapter also presents the
implemented parts of the system.
4.2 Technical Information
System and Software Design
System and Software design is a mechanism to transform user requirements into some
suitable form, which helps the programmer in software coding and implementation.
It deals with representing the client's requirement, as described in SRS (Software
Requirement Specification) document, into a form, i.e., easily implementable using
programming language.
The software design phase is the first step in the SDLC (Software Development Life Cycle), which
moves the concentration from the problem domain to the solution domain. In software
design, we consider the system to be a set of components or modules with clearly defined
behaviors & boundaries.
4.3 Model Selection
In speech emotion recognition (SER), selecting an appropriate model is crucial for achieving
high accuracy and reliability. Various models can be employed, ranging from traditional
machine learning algorithms to more advanced deep learning techniques. Here are some
common model types used in SER:
• Traditional Machine Learning Models:
Support Vector Machines (SVM): Effective for high-dimensional spaces.
Commonly used with feature extraction techniques such as Mel-Frequency
Cepstral Coefficients (MFCCs).
• Deep Learning Models:
Convolutional Neural Networks (CNN): Excellent for feature extraction
from spectrograms or raw audio. Can capture local patterns in speech signals.
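To make the CNN option concrete, a minimal 1D-CNN sketch in Keras over 40-dimensional MFCC vectors is shown below; the layer sizes, the eight output classes and the function name are illustrative assumptions, not the exact model used in the report.

from tensorflow.keras import layers, models

def build_cnn(input_length=40, num_classes=8):
    """Build a small 1D-CNN that classifies an MFCC vector into emotion classes."""
    model = models.Sequential([
        layers.Input(shape=(input_length, 1)),
        layers.Conv1D(64, kernel_size=5, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=5, activation="relu"),
        layers.GlobalAveragePooling1D(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",   # integer-encoded labels
                  metrics=["accuracy"])
    return model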
4.4 Selection Criteria
Considering Support Vector Machines (SVM) and Convolutional Neural Networks (CNN)
for speech emotion recognition (SER) is beneficial due to their unique strengths and
capabilities in handling different aspects of audio data.
• SVM: Best for high-dimensional feature spaces, robustness to overfitting, versatility
with kernel functions, and good performance with limited data.
• CNN: Effective in feature extraction, automatic feature learning, handling data
variability, and scalability for large datasets.
Using SVMs and CNNs in tandem or in a hybrid approach can combine the strengths of both
methods, potentially leading to even better performance in speech emotion recognition tasks.
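For the SVM side, a minimal scikit-learn sketch is shown below; the random feature matrix merely stands in for real MFCC features, and the kernel and C value are illustrative assumptions.

import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Dummy stand-ins for the real MFCC feature vectors and emotion labels.
X = np.random.randn(200, 40)
y = np.random.randint(0, 8, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

svm_clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma="scale"))
svm_clf.fit(X_train, y_train)
print("Test accuracy:", svm_clf.score(X_test, y_test))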
4.5 Model Training
• Data Sources: Use datasets like RAVDESS datasets.
• Data Preprocessing: Noise Reduction, normalization, segmentation.
• Feature Extraction: Extract features such as MFCCs. Convert audio signals into
waveforms for CNN-based models.
• Training Process: Use frameworks like TensorFlow, Keras, PyTorch, or Scikit-learn.
• Fine-Tuning: Adjust hyperparameters, model architecture, or retrain with more data
to improve performance.
• Deployment: Export the trained model for deployment.
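A condensed training-and-export sketch, assuming the build_cnn() model defined earlier and the feature matrix and integer-encoded labels produced in Section 4.7, is shown below; the epoch count, batch size and file name are illustrative.

import numpy as np

model = build_cnn()
history = model.fit(X_train[..., np.newaxis], y_train,     # add a channel axis for Conv1D
                    validation_split=0.1, epochs=50, batch_size=32)
model.save("ser_cnn.h5")                                    # export for later deployment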
4.6 Evaluation Metrics
• Accuracy: The ratio of correctly predicted emotions to the total number of
predictions.
• Precision: The ratio of correctly predicted positive observations to the total predicted
positives.
• Recall: The ratio of correctly predicted positive observations to all observations in the
actual class.
• F1-Score: The harmonic mean of precision and recall, providing a balance between
the two.
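These metrics can be computed with scikit-learn as sketched below, assuming y_test holds the true labels and y_pred the model's predictions; weighted averaging is one reasonable choice for this multi-class task.

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average="weighted"))
print("Recall   :", recall_score(y_test, y_pred, average="weighted"))
print("F1-score :", f1_score(y_test, y_pred, average="weighted"))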
4.7 Extract Features and Labels
Figure 4.1: Extract Features and Labels
Extracting features and labels is a crucial step in building a speech emotion recognition
(SER) system. The main steps, sketched in code after this list, are:
• Import Necessary Libraries
• Define Feature Extraction Function
• Extract Features and Labels from Dataset
• Encode Labels
• Split Data into Training and Testing Sets
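A condensed sketch of these steps is shown below, assuming the RAVDESS files live under a data/ folder and reusing the extract_mfcc() and emotion_from_filename() helpers sketched in Chapter 2; names and paths are illustrative assumptions.

import os
import numpy as np
from tqdm import tqdm
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

features, labels = [], []
for root, _, files in os.walk("data"):
    for name in tqdm(files):
        if name.endswith(".wav"):
            path = os.path.join(root, name)
            features.append(extract_mfcc(path))           # 40-dim MFCC vector
            labels.append(emotion_from_filename(name))    # e.g. "happy"

X = np.array(features)
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(labels)                    # emotion names -> integers

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)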
4.8 Load and Use Trained Model
Figure 4.2: Load and Use Trained Model
To load and use a trained model in speech emotion recognition (SER), the following steps are
needed (a sketch follows the list):
• Train the model (if not already trained).
• Save the trained model to a file.
• Load the trained model from the file when needed.
• Use the loaded model to make predictions on new data.
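A minimal load-and-predict sketch is shown below; the model file, recording name and the extract_mfcc()/label_encoder helpers come from the earlier sketches and are assumptions rather than the report's exact code.

import numpy as np
from tensorflow.keras.models import load_model

model = load_model("ser_cnn.h5")                            # load the saved model
mfcc = extract_mfcc("new_recording.wav")                    # features of a new clip
probs = model.predict(mfcc[np.newaxis, :, np.newaxis])      # input shape (1, 40, 1)
predicted = label_encoder.inverse_transform([np.argmax(probs)])
print("Predicted emotion:", predicted[0])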
CHAPTER 5
TESTING
Testing is a crucial phase in the development of any software project, including this
project. Below, we detail the testing phase with a focus on unit testing and integrated testing.
5.1 Introduction to Testing Phase
The testing phase ensures that all components of the project work as expected, meet
functional requirements, and provide a seamless user experience. Testing involves both unit
testing, which checks individual components, and integrated testing, which verifies interactions
between components.
The following statements serve as the objectives for testing:
1. Testing is a process of executing a program with the intent of finding errors.
2. A good test case is one that has a high probability of finding an as-yet
undiscovered error.
3. A successful test is one that uncovers an as-yet undiscovered error.
5.2 Testing Strategies
• Purpose:
Generalization: Evaluate how well the model performs on unseen data.
Validation: Confirm model reliability before deployment.
• Methodology:
Data Preparation: Extract Features from Test Data: The same feature
extraction techniques used during training are applied to the test data.
Label Encoding: If the labels are categorical, they need to be encoded into
numerical values, just as done during training.
• Model Loading:
Load the Trained Model: Load the previously trained and saved model.
• Prediction:
Make Predictions: Use the trained model to make predictions on the test
dataset.
• Evaluation:
Calculate Metrics: Evaluate the model using various metrics such as
accuracy, precision, recall, F1-score, and confusion matrix.
5.3 Performance Analysis
• Metrics Comparison: Evaluate accuracy, precision, recall, and F1 score across test
cases.
5.4 Train the Model
Figure 5.1: Train the Model
Once the model is trained, testing the model involves evaluating its performance on a
separate, unseen test dataset. This helps in understanding how well the model generalizes to
new data.
• Load the Model: Load the trained CNN model from a file if it's not already in
memory.
• Prepare Test Data: Ensure test data is preprocessed and reshaped in the same way as
the training data.
• Make Predictions: Use the model to make predictions on the test data.
• Evaluate Performance: Calculate and display evaluation metrics such as
classification report and confusion matrix to assess the model's performance.
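A sketch of this evaluation, reusing the test split, trained model and label encoder from the earlier sketches (all assumptions for illustration), is given below.

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from tensorflow.keras.models import load_model

model = load_model("ser_cnn.h5")
probs = model.predict(X_test[..., np.newaxis])             # class probabilities
y_pred = np.argmax(probs, axis=1)                          # predicted class indices

print(classification_report(y_test, y_pred, target_names=label_encoder.classes_))
print(confusion_matrix(y_test, y_pred))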
5.5 Integrated testing
The main function is designed to call many sub-functions, where different options are given in
the sub-functions. The different functions were then included in the main function separately
and tested for errors. The project was compiled and tested without any errors, and the output
was obtained as expected.
5.6 Sub-system testing
This phase involves testing collections of modules which have been integrated into sub-
systems. The sub-system test process should concentrate on the detection of module interface
errors by rigorously exercising these interfaces.
5.7 System testing
The sub-systems are integrated to make up the system. This process is concerned with finding
errors that result from unanticipated interactions between sub-systems and sub-system
interface problems. It is also concerned with validating that the system meets its functional
and non-functional requirements and testing the emergent system properties.
5.8 Acceptance testing
This is the final stage in the testing process before the system is accepted for operational use.
The system is tested with the given data and outputs the recognised emotion of the user.
CHAPTER 6
RESULTS & DISCUSSIONS
The results and discussions section of a project is crucial as it allows us to reflect on
the outcomes, findings, implications, and potential future directions of the project.
6.1 Recognition of the Emotion:
Model Output Interpretation:
The output is shown in the form of a waveform together with the predicted emotion of the
voice. For example, if the predicted emotion is anger, sadness, happiness or excitement, the
resultant output is shown based on the predicted emotion.
Presentation Format:
• Output Format: Display predicted emotion alongside corresponding waveforms.
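A minimal sketch of this presentation, reusing the waveform plotting from Chapter 2 and the prediction from Section 4.8 (file name and variable names are assumptions), is shown below.

import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load("new_recording.wav")                  # hypothetical test clip
plt.figure(figsize=(10, 3))
librosa.display.waveshow(y, sr=sr)
plt.title(f"Predicted emotion: {predicted[0]}")            # label from Section 4.8
plt.tight_layout()
plt.show()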
6.2 Output:
Figure 6.1: Output
6.3 Discussions
• Challenges Faced:
Speech Emotion Recognition (SER) involves several challenges that can impact the
accuracy and reliability of models.
Variability in Speech Data: Emotions are expressed differently by each
individual, making it difficult to generalize across different speakers.
Variations in accent, pronunciation, and speech rate can affect the consistency
of emotional cues.
Emotional Ambiguity: Some emotions are subtle and hard to distinguish,
especially if they are expressed with slight variations in tone or intensity.
People often express multiple emotions simultaneously, which can complicate
the classification.
Dataset Limitations: High-quality, large-scale datasets that cover a wide
range of emotions and speakers are often scarce. The accuracy of emotion labels
depends on human annotators, and inconsistencies or biases in labeling can
affect model performance.
Real-time Processing: Achieving real-time emotion recognition while
maintaining accuracy can be challenging, especially in live interactions or
applications.
Feature Extraction: Identifying the most relevant features (e.g., MFCCs,
pitch, tone) that best represent emotional states is challenging.
• Lessons Learned:
Importance of High-Quality Data
Feature Engineering is Critical
Model Complexity vs. Performance
Handling Real-World Challenges
Model Training and Optimization
CHAPTER 7
CONCLUSION & FUTURE WORK
7.1 Future Work:
As the field of Speech Emotion Recognition (SER) continues to evolve, several areas offer
exciting opportunities for future research and development. Future work in SER encompasses
a range of exciting opportunities, from enhancing data collection and model architectures to
improving contextual understanding and real-time processing. Addressing these areas will
lead to more accurate, robust, and practical SER systems with broad applications across
different domains. Continued innovation and research in these areas will drive the
advancement of SER technology and its integration into real-world solutions.
7.2 Conclusion:
Speech Emotion Recognition (SER) is a complex and evolving field with significant potential
applications in various domains such as customer service, mental health, and interactive
technologies.
Successful implementation of SER systems requires a multifaceted approach that includes
high-quality data, effective feature engineering, careful model selection, robust evaluation,
and attention to real-world challenges. By addressing these aspects, SER systems can achieve
high accuracy, practical utility, and ongoing relevance in various applications.
REFERENCES
1. Introduction to Machine Learning, PHI Learning Pvt. Ltd., 2nd Ed., 2013.
2. A Guide to Convolutional Neural Networks for Computer Vision
3. https://www.kaggle.com/code/shivamburnwal/speech-emotion-recognition