VISVESVARAYA TECHNOLOGICAL UNIVERSITY
“JnanaSangama”, Belagavi-590014, Karnataka, India
MINI PROJECT REPORT ON
“SPEECH EMOTION RECOGNITION”
SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE SIXTH SEMESTER BACHELOR OF ENGINEERING DEGREE
SUBMITTED BY
RAKSHA B S 1IC21AI025
MONICA K 1IC21CD002
RISHITHA B H 1IC21CD004
VARSHITHA R 1IC21CD009
UNDER THE GUIDANCE OF
Dr. SHEETHAL AJJ MANI
Professor
Dept. of AI&ML
IMPACT COLLEGE OF ENGINEERING AND APPLIED SCIENCES
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING-DATA SCIENCE
Sahakarnagar, Bengaluru-560092
2023-2024
IMPACT COLLEGE OF ENGINEERING AND APPLIED SCIENCES
Sahakarnagar Bangalore-560092
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING-DATA SCIENCE
CERTIFICATE
This is to certify that the project entitled “SPEECH EMOTION RECOGNITION” carried out by RAKSHA B S [1IC21AI025], MONICA K [1IC21CD002], RISHITHA B H [1IC21CD004] and VARSHITHA R [1IC21CD009], bonafide students of Impact College of Engineering and Applied Sciences, has been submitted in partial fulfillment of the requirements of the Mini Project of the VI Semester Bachelor of Engineering degree in COMPUTER SCIENCE AND ENGINEERING-DATA SCIENCE as prescribed by VISVESVARAYA TECHNOLOGICAL UNIVERSITY during the academic year 2023-2024.
Signature of the Guide Signature of the HOD
Dr. Sheethal Ajj Mani Dr. Kaipa Sandhya
Professor Professor & HoD,
Dept. of AI&ML Dept. of CS&E-DATA SCIENCE
Name of Examiner Signature with date
1. 1.
2. 2.
ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the successful completion of any task would
be incomplete without the mention of the people who made it possible and whose constant
encouragement and guidance crowned our efforts with success.
We consider ourselves proud to be part of the Impact College of Engineering and Applied Sciences
family, the institution which stood by us in our endeavor.
We are grateful to Dr. Kaipa Sandhya, Professor & HoD-Department of Computer
Science and Engineering-Data Science, Impact College of Engineering and Applied
Sciences, who is a source of inspiration and of invaluable help in channelling our efforts in the
right direction.
We express our deep and sincere thanks to our Management and Principal, Dr. Jalumedi
Babu, Impact College of Engineering and Applied Sciences for their continuous
support.
We would like to thank the faculty members and supporting staff of the Department of
Computer Science and Engineering-Data Science, Impact College of Engineering and
Applied Sciences for providing all the support for completing the project.
Finally, we are grateful to our parents and friends for their unconditional support and help
during the course of our project.
RAKSHA BS [1IC21AI025]
MONICA K [1IC21CD002]
RISHITHA BH [1IC21CD004]
VARSHITHA R [1IC21CD009]
ABSTRACT
The speech signal is one of the most natural and fastest methods of communication between
humans. Many systems have been developed by various researchers to identify the emotions
from the speech signal. Speech features are particularly useful in differentiating between
various emotions, and when these features are not distinct, recognizing emotion from a
speaker's speech becomes very difficult. After feature extraction, another important part is the
classification of speech emotions.
A familiar example of speech emotion recognition can be seen at call centres. Call-centre
employees never talk to every customer in the same manner; their way of pitching and talking
changes with the customer. This also happens among ordinary people, but it is especially
relevant to call centres: by recognizing customers' emotions from speech, employees can
improve their service and convert more customers. In this way, call centres already make use
of speech emotion recognition, and it makes a simple yet practical Python mini-project.
CONTENTS
ACKNOWLEDGEMENT i
ABSTRACT ii
CHAPTER NO. TITLE PAGE NO.
Chapter 1 INTRODUCTION
1.1 Problem statement 1
1.2 Objectives 1
1.3 Background and literature review 1
Chapter 2 REQUIREMENT ANALYSIS
2.1 Hardware requirements 3
2.2 Software requirements 3
Chapter 3 SYSTEM DESIGN AND ARCHITECTURE
3.1 System design 5
3.2 System architecture 6
Chapter 4 IMPLEMENTATION
4.1 Introduction 8
4.2 Technical information 8
4.3 Model selection 8
4.4 Selection criteria 9
4.5 Model training 9
4.6 Evaluation metrics 9
4.7 Extract features and labels 10
4.8 Load and use trained model 12
Chapter 5 TESTING
5.1 Introduction to testing phase 13
5.2 Testing strategies 13
5.3 Performance analysis 13
5.4 Train the model 14
5.5 Integrated testing 15
5.6 Sub-system testing 15
5.7 System testing 15
5.8 Acceptance testing 15
Chapter 6 RESULTS AND DISCUSSIONS
6.1 Recognition of the emotion 16
6.2 Output 16
6.3 Discussions 17
Chapter 7 CONCLUSION & FUTURE WORK
7.1 Future work 18
7.2 Conclusion 18
REFERENCES 19
List of Figures
Sl. No Fig. No Name of the figure Page No
1 3.1 Data flow of the project 5
2 4.1 Extract features and labels 11
3 4.2 Load and use trained model 12
4 5.1 Train the model 14
5 6.1 Output 16
CHAPTER 1
INTRODUCTION
Human speech, through tone, pitch and many other features of the human vocal system,
conveys information and context. Speech emotion recognition is widely used in many fields
with natural human-computer interaction requirements. Speech emotion recognition is
characterized as extracting the emotional condition of a speaker from his or her discourse, and
this sort of recognition can also be used to extract useful semantics for speech recognition
systems.
The model of SER includes the discrete speech emotion model and continuous speech emotion
model. The discrete emotion model expresses several independent emotions, indicating that a
certain speech utterance has a single independent emotion, while the continuous speech
emotion model means that the emotion lies in an emotion space, where every emotion has a
different strength along each dimension.
Determining the emotional state of humans is an idiosyncratic task and may be used as a
standard for any emotion recognition model. The system considers various emotions such as
anger, disgust, surprise, fear, joy, happiness, neutral and sadness.
1.1 Problem Statement:
For a dataset containing audio files of different actors, design a Speech Emotion
Recognition (SER) system which will analyse audio using required machine learning
techniques. Also, create a wave plot of the voice.
1.2 Objectives:
The primary objectives of this project are:
• The main goal of this module is to recognise emotion.
• To design a system which will extract, characterize and recognize the information of
speaker’s emotions.
• Accurately identify and classify emotional states (e.g., happiness, sadness, anger)
from speech signals.
• Aid in monitoring and assessing emotional well-being, particularly in therapeutic
settings.
• Identify signs of distress or frustration to enable timely intervention in high-stakes
situations.
• Contribute to the scientific understanding of emotion dynamics and their acoustic
manifestations.
1.3 Background and Literature Review:
Speech Emotion Recognition (SER) is a multidisciplinary field that combines aspects of
signal processing, machine learning, and psychology. The primary goal of SER is to analyze
vocal features and identify the emotional states of speakers based on their speech patterns.
This technology has seen rapid advancements due to increased interest in human-computer
interaction, sentiment analysis, and artificial intelligence.
Historically, early work in SER focused on acoustic features such as pitch, tone, and rhythm,
leveraging basic statistical methods. However, the advent of more sophisticated machine
learning algorithms and deep learning techniques has revolutionized the field, allowing for
more accurate and advanced emotion detection.
Studies have extensively explored various acoustic features used for emotion classification,
such as Mel-frequency cepstral coefficients (MFCCs), pitch, intensity, and speech rate. These
features serve as fundamental inputs for machine learning models.
Traditional approaches utilized classifiers like Support Vector Machines (SVM), Random
Forests, and k-Nearest Neighbors (k-NN). Recent research has shifted towards deep learning
methods, particularly Convolutional Neural Networks (CNNs) and Recurrent Neural
Networks (RNNs), which have shown superior performance in capturing temporal
dependencies in speech data.
Integration:
• Call Centers: Enhance customer interactions by identifying emotions, leading to
tailored responses and improved service quality.
• Mental Health Monitoring: Used in teletherapy and mental health applications to
gauge patients' emotional states.
• E-Learning Platforms: Track students' emotions to adjust content delivery and
improve engagement.
• Consumer Feedback: Analyze emotions in user feedback to gain insights into brand
perception and product reception.
Benefits:
• Improves understanding in conversations, helping to identify underlying feelings that
may not be verbally expressed.
• Tailors interactions based on emotional state, leading to higher satisfaction.
• Promotes more empathetic interactions, especially in fields like healthcare and
customer service.
• Identifies signs of distress or frustration, enabling timely intervention.
CHAPTER 2
REQUIREMENT ANALYSIS
Requirement analysis is a critical phase in the project lifecycle, defining the necessary
software and hardware components to achieve the project's objectives. This section outlines
the requirements used for speech emotion recognition.
2.1 Hardware requirements
Computers / Laptops:
High-performance machines for development, capable of handling intensive data
processing and machine learning tasks.
Processor:
Multi-core processors (e.g., Intel i7/i9, AMD Ryzen 7/9).
RAM:
Minimum 16 GB, preferably 32 GB.
Storage:
SSD with at least 512 GB for fast data access and retrieval.
GPU:
Dedicated GPU (e.g., NVIDIA RTX 2070 or higher) for training machine learning models.
2.2 Software requirements
Data Collection and Processing- APIs and Data Sources:
• RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song):
Contains audio and video files with 8 emotions expressed by actors.
Programming Languages:
• Python: The most commonly used language for SER due to its extensive libraries and
ease of use.
Libraries and Frameworks- Machine Learning:
• Development environment: Visual Studio Code.
• NumPy: For numerical operations.
• Librosa: For loading, visualizing, and transforming audio.
• train_test_split (scikit-learn): Splits a dataset into training and testing subsets.
• OS: For interacting with the operating system.
• TensorFlow: An open-source library for deep learning.
• Keras: A high-level neural networks API, running on top of TensorFlow.
• tqdm: Provides fast, extensible progress bars for loops and other iterable tasks.
Emotions Dictionary:
Maps emotion codes to their corresponding emotional labels.
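As an illustration, the sketch below shows one such mapping for the RAVDESS dataset, whose file names encode the emotion in their third field; the helper name is an assumption for illustration, not the report's exact code.

# Hypothetical emotion dictionary for RAVDESS file names such as
# "03-01-06-01-02-01-12.wav", whose third field ("06") encodes the emotion.
emotions = {
    "01": "neutral", "02": "calm",    "03": "happy",   "04": "sad",
    "05": "angry",   "06": "fearful", "07": "disgust", "08": "surprised",
}

def emotion_from_filename(filename):
    """Return the emotion label encoded in a RAVDESS file name."""
    code = filename.split("-")[2]
    return emotions.get(code, "unknown")

print(emotion_from_filename("03-01-06-01-02-01-12.wav"))  # -> fearful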
Feature Extraction:
• Extract MFCCs, chroma features, spectral contrast, etc., using Librosa or
OpenSMILE.
• Save extracted features in a structured format (e.g., CSV or HDF5).
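A minimal feature-extraction sketch with Librosa is shown below; the function name and the choice of 40 mean MFCCs are illustrative assumptions, and chroma or spectral-contrast features could be appended in the same way.

import numpy as np
import librosa

def extract_mfcc(path, n_mfcc=40):
    """Load an audio file and return its mean MFCC vector."""
    signal, sr = librosa.load(path, sr=None)            # keep the native sample rate
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return np.mean(mfcc.T, axis=0)                      # one 40-dimensional vector per file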
Visualization:
• Matplotlib: For plotting and visualizing data in Python.
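Since the problem statement also asks for a wave plot of the voice, a minimal plotting sketch is given below; the input file name is a placeholder, and older Librosa versions expose the same call as librosa.display.waveplot.

import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load("speech_sample.wav")       # hypothetical input file
plt.figure(figsize=(10, 3))
librosa.display.waveshow(y, sr=sr)              # draw the waveform
plt.title("Waveform of the input speech")
plt.xlabel("Time (s)")
plt.tight_layout()
plt.show()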
Ubuntu environment in Visual Studio Code:
Visual Studio Code is a lightweight but powerful source code editor which runs on your
desktop and is available for Windows, macOS and Linux. It comes with built-in support for
JavaScript, TypeScript and Node.js and has a rich ecosystem of extensions for other
languages and runtimes (such as C++, C#, Java, Python, PHP, Go, .NET).
CHAPTER 3
SYSTEM DESIGN AND ARCHITECTURE
3.1 System design
Figure 3.1: Data flow of the project
The diagram above illustrates the data flow in the speech emotion recognition system. The
input speech is used for training and testing, and the system then recognises the emotion of
the input speech.
The image provides a clear illustration of a Speech Emotion Recognition (SER) system
architecture.
Components and Data Flow:
• Training Samples Collection: Collection of audio recordings of speech with labeled
emotions.
• Pre-Processing: Cleans the audio data with noise reduction, normalization,
silence removal, and segmentation (a minimal sketch appears after this list).
• Feature Extraction: Convert audio signals into feature vectors that capture essential
characteristics. The technique used is MFCC (Mel-Frequency Cepstral Coefficients), which
captures the power spectrum of the audio.
• Training Dictionary (Feature Vectors): Stores the extracted feature vectors along
with their corresponding emotion labels.
• Classifier Training: The classifier is the core component that processes the extracted
features from the test samples and assigns emotion labels based on the patterns
learned during training. Methods used are SVM (Support Vector Machine), Random
Forest, k-NN (k-Nearest Neighbors), CNN (Convolutional Neural Network), RNN
(Recurrent Neural Network), LSTM (Long Short-Term Memory).
• Test Samples Collection: Collection of new, unlabeled audio recordings for emotion
recognition.
• Emotion Recognition: Classify the extracted features into one of the emotion
categories using the trained classifier. The predicted emotion label for the test sample
(e.g., happy, sad, angry, excited).
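As mentioned in the Pre-Processing step above, the sketch below shows one possible way to trim silence and normalize amplitude with Librosa; it is an assumption for illustration, not the report's exact pipeline.

import numpy as np
import librosa

def preprocess(path, top_db=30):
    """Load a clip, trim leading/trailing silence and normalize its amplitude."""
    signal, sr = librosa.load(path, sr=None)
    trimmed, _ = librosa.effects.trim(signal, top_db=top_db)   # silence removal
    peak = np.max(np.abs(trimmed))
    normalized = trimmed / peak if peak > 0 else trimmed       # amplitude normalization
    return normalized, sr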
Data Flow Description:
• Collect and preprocess training samples.
• Extract features and create a training dictionary.
• Train a classifier with the feature vectors.
• Collect test samples, preprocess, and extract features.
• Use the trained classifier to recognize emotions in the test samples.
This flow ensures that the system can accurately detect the user's emotion from the pitch and
other characteristics of their speech.
3.2 System Architecture
The system architecture for this project comprises one main component: emotion recognition
through speech. This component detects the user's emotions from their speech.
• Data Collection Layer:
Training Data: Pre-recorded speech samples with emotion labels.
Test Data: New speech samples for emotion prediction.
• Preprocessing Layer:
Noise Reduction: Filters and techniques to remove background noise.
Silence Removal: Trimming silent portions of the audio clips.
Normalization: Standardizing audio signal amplitudes.
• Feature Extraction Layer:
Feature Extractor: Extracts relevant features from the audio signals using
MFCC (Mel-Frequency Cepstral Coefficients).
• Data Storage Layer:
Training Dictionary (Feature Vectors): Stores extracted features and their
corresponding emotion labels in a database or other storage medium.
• Model Training Layer:
Training Module: Trains a machine learning or deep learning model using
the feature vectors using SVM, Random Forest, k-NN, CNN, RNN, LSTM.
• Inference Layer:
Feature Extraction: Extracts features from test samples using the same
methods as in the training phase.
Classifier: Uses the trained model to classify the emotion of the test samples
based on extracted features.
• Output Layer:
Emotion Recognition Results: Outputs the predicted emotions for the test
samples.
Visualization and Reporting: Provides a user interface or dashboard to
display results.
Explanation of Each Layer and Their Interactions:
• Data Collection Layer: Audio data is collected, either pre-labeled for training or
unlabeled for testing.
• Preprocessing Layer: Raw audio data is cleaned and prepared for feature extraction,
ensuring better performance in subsequent steps.
• Feature Extraction Layer: Extracted features serve as the input to the machine
learning models. These features capture essential characteristics of the audio signals
relevant to emotion.
• Data Storage Layer: Stores the feature vectors and their corresponding labels, which
are used to train the model.
• Model Training Layer: The extracted features and labels are used to train a machine
learning model. The trained model is then saved for future inference.
• Inference Layer: For new audio samples, features are extracted using the same
method as in training. The trained model then predicts the emotion based on these
features.
• Output Layer: The predicted emotions are presented to the user. This layer may also
include tools for visualization and reporting of results.
By following this architecture, the system can effectively process audio data, extract
meaningful features, train a robust model, and accurately recognize emotions from speech in
real-time or batch processing scenarios.
CHAPTER 4
IMPLEMENTATION
4.1 Introduction
This chapter discusses the implementation of the system and provides technical information
about it. Systems implementation is the process of defining how the information system
should be built (i.e., physical system design), ensuring that the information system is
operational and used, and ensuring that the information system meets quality standards
(i.e., quality assurance).
The purpose of the implementation process is to design and create (or fabricate) a system
element conforming to that element’s design properties or requirements. The element is
constructed employing appropriate technologies and industry practices. This process bridges
the system definition processes and the integration process. This chapter also presents the
implemented parts of the system.
4.2 Technical Information
System and Software Design
System and Software design is a mechanism to transform user requirements into some
suitable form, which helps the programmer in software coding and implementation.
It deals with representing the client's requirement, as described in SRS (Software
Requirement Specification) document, into a form, i.e., easily implementable using
programming language.
The software design phase is the first step in the SDLC (Software Development Life Cycle), which
moves the concentration from the problem domain to the solution domain. In software
design, we consider the system to be a set of components or modules with clearly defined
behaviors & boundaries.
4.3 Model Selection
In speech emotion recognition (SER), selecting an appropriate model is crucial for achieving
high accuracy and reliability. Various models can be employed, ranging from traditional
machine learning algorithms to more advanced deep learning techniques. Here are some
common model types used in SER:
• Traditional Machine Learning Models:
Support Vector Machines (SVM): Effective for high-dimensional spaces.
Commonly used with feature extraction techniques such as Mel-Frequency
Cepstral Coefficients (MFCCs).
• Deep Learning Models:
Convolutional Neural Networks (CNN): Excellent for feature extraction
from spectrograms or raw audio. Can capture local patterns in speech signals.
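To make the CNN option concrete, a minimal 1D-CNN sketch in Keras over 40-dimensional MFCC vectors is shown below; the layer sizes, the eight output classes and the function name are illustrative assumptions, not the exact model used in the report.

from tensorflow.keras import layers, models

def build_cnn(input_length=40, num_classes=8):
    """Build a small 1D-CNN that classifies an MFCC vector into emotion classes."""
    model = models.Sequential([
        layers.Input(shape=(input_length, 1)),
        layers.Conv1D(64, kernel_size=5, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=5, activation="relu"),
        layers.GlobalAveragePooling1D(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",   # integer-encoded labels
                  metrics=["accuracy"])
    return model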
4.4 Selection Criteria
Considering Support Vector Machines (SVM) and Convolutional Neural Networks (CNN)
for speech emotion recognition (SER) is beneficial due to their unique strengths and
capabilities in handling different aspects of audio data.
• SVM: Best for high-dimensional feature spaces, robustness to overfitting, versatility
with kernel functions, and good performance with limited data.
• CNN: Effective in feature extraction, automatic feature learning, handling data
variability, and scalability for large datasets.
Using SVMs and CNNs in tandem or in a hybrid approach can combine the strengths of both
methods, potentially leading to even better performance in speech emotion recognition tasks.
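For the SVM side, a minimal scikit-learn sketch is shown below; the random feature matrix merely stands in for real MFCC features, and the kernel and C value are illustrative assumptions.

import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Dummy stand-ins for the real MFCC feature vectors and emotion labels.
X = np.random.randn(200, 40)
y = np.random.randint(0, 8, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

svm_clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma="scale"))
svm_clf.fit(X_train, y_train)
print("Test accuracy:", svm_clf.score(X_test, y_test))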
4.5 Model Training
• Data Sources: Use datasets like RAVDESS datasets.
• Data Preprocessing: Noise Reduction, normalization, segmentation.
• Feature Extraction: Extract features such as MFCCs. Convert audio signals into
waveforms for CNN-based models.
• Training Process: Use frameworks like TensorFlow, Keras, PyTorch, or Scikit-learn.
• Fine-Tuning: Adjust hyperparameters, model architecture, or retrain with more data
to improve performance.
• Deployment: Export the trained model for deployment.
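A condensed training-and-export sketch, assuming the build_cnn() model defined earlier and the feature matrix and integer-encoded labels produced in Section 4.7, is shown below; the epoch count, batch size and file name are illustrative.

import numpy as np

model = build_cnn()
history = model.fit(X_train[..., np.newaxis], y_train,     # add a channel axis for Conv1D
                    validation_split=0.1, epochs=50, batch_size=32)
model.save("ser_cnn.h5")                                    # export for later deployment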
4.6 Evaluation Metrics
• Accuracy: The ratio of correctly predicted emotions to the total number of
predictions.
• Precision: The ratio of correctly predicted positive observations to the total predicted
positives.
• Recall: The ratio of correctly predicted positive observations to all observations in the
actual class.
• F1-Score: The harmonic mean of precision and recall, providing a balance between
the two.
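These metrics can be computed with scikit-learn as sketched below, assuming y_test holds the true labels and y_pred the model's predictions; weighted averaging is one reasonable choice for this multi-class task.

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average="weighted"))
print("Recall   :", recall_score(y_test, y_pred, average="weighted"))
print("F1-score :", f1_score(y_test, y_pred, average="weighted"))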
4.7 Extract Features and Labels
Figure 4.1: Extract Features and Labels
Extracting features and labels is a crucial step in building a speech emotion recognition
(SER) system. The main steps, sketched in code after this list, are:
• Import Necessary Libraries
• Define Feature Extraction Function
• Extract Features and Labels from Dataset
• Encode Labels
• Split Data into Training and Testing Sets
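A condensed sketch of these steps is shown below, assuming the RAVDESS files live under a data/ folder and reusing the extract_mfcc() and emotion_from_filename() helpers sketched in Chapter 2; names and paths are illustrative assumptions.

import os
import numpy as np
from tqdm import tqdm
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

features, labels = [], []
for root, _, files in os.walk("data"):
    for name in tqdm(files):
        if name.endswith(".wav"):
            path = os.path.join(root, name)
            features.append(extract_mfcc(path))           # 40-dim MFCC vector
            labels.append(emotion_from_filename(name))    # e.g. "happy"

X = np.array(features)
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(labels)                    # emotion names -> integers

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)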
4.8 Load and Use Trained Model
Figure 4.2: Load and Use Trained Model
To load and use a trained model in speech emotion recognition (SER), the following steps are
needed (a sketch follows the list):
• Train the model (if not already trained).
• Save the trained model to a file.
• Load the trained model from the file when needed.
• Use the loaded model to make predictions on new data.
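A minimal load-and-predict sketch is shown below; the model file, recording name and the extract_mfcc()/label_encoder helpers come from the earlier sketches and are assumptions rather than the report's exact code.

import numpy as np
from tensorflow.keras.models import load_model

model = load_model("ser_cnn.h5")                            # load the saved model
mfcc = extract_mfcc("new_recording.wav")                    # features of a new clip
probs = model.predict(mfcc[np.newaxis, :, np.newaxis])      # input shape (1, 40, 1)
predicted = label_encoder.inverse_transform([np.argmax(probs)])
print("Predicted emotion:", predicted[0])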
CHAPTER 5
TESTING
Testing is a crucial phase in the development of any software project, including this
project. Below, we detail the testing phase with a focus on unit testing and integrated testing.
5.1 Introduction to Testing Phase
The testing phase ensures that all components of the project work as expected, meet
functional requirements, and provide a seamless user experience. Testing involves both unit
testing, which checks individual components, and integrated testing, which verifies interactions
between components.
The following statements serve as the objectives for testing:
1. Testing is a process of executing a program with the intent of finding errors.
2. A good test case is one that has a high probability of finding an as-yet
undiscovered error.
3. A successful test is one that uncovers an as-yet undiscovered error.
5.2 Testing Strategies
• Purpose:
Generalization: Evaluate how well the model performs on unseen data.
Validation: Confirm model reliability before deployment.
• Methodology:
Data Preparation: Extract Features from Test Data: The same feature
extraction techniques used during training are applied to the test data.
Label Encoding: If the labels are categorical, they need to be encoded into
numerical values, just as done during training.
• Model Loading:
Load the Trained Model: Load the previously trained and saved model.
• Prediction:
Make Predictions: Use the trained model to make predictions on the test
dataset.
• Evaluation:
Calculate Metrics: Evaluate the model using various metrics such as
accuracy, precision, recall, F1-score, and confusion matrix.
5.3 Performance Analysis
• Metrics Comparison: Evaluate accuracy, precision, recall, and F1 score across test
cases.
5.4 Train the Model
Figure 5.1: Train the Model
Once the model is trained, testing the model involves evaluating its performance on a
separate, unseen test dataset. This helps in understanding how well the model generalizes to
new data.
• Load the Model: Load the trained CNN model from a file if it's not already in
memory.
• Prepare Test Data: Ensure test data is preprocessed and reshaped in the same way as
the training data.
• Make Predictions: Use the model to make predictions on the test data.
• Evaluate Performance: Calculate and display evaluation metrics such as
classification report and confusion matrix to assess the model's performance.
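A sketch of this evaluation, reusing the test split, trained model and label encoder from the earlier sketches (all assumptions for illustration), is given below.

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from tensorflow.keras.models import load_model

model = load_model("ser_cnn.h5")
probs = model.predict(X_test[..., np.newaxis])             # class probabilities
y_pred = np.argmax(probs, axis=1)                          # predicted class indices

print(classification_report(y_test, y_pred, target_names=label_encoder.classes_))
print(confusion_matrix(y_test, y_pred))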
5.5 Integrated testing
The main function is designed to call many sub-functions, where different options are given in
the sub-functions. The different functions were then included in the main function separately
and tested for errors. The project was compiled and tested without any errors, and the output
was obtained as expected.
5.6 Sub-system testing
This phase involves testing collections of modules which have been integrated into sub-
systems. The sub-system test process should concentrate on the detection of module interface
errors by rigorously exercising these interfaces.
5.7 System testing
The sub-systems are integrated to make up the system. This process is concerned with finding
errors that result from unanticipated interactions between sub-systems and sub-system
interface problems. It is also concerned with validating that the system meets its functional
and non-functional requirements and testing the emergent system properties.
5.8 Acceptance testing
This is the final stage in the testing process before the system is accepted for operational use.
The system is tested with the given data and outputs the recognised emotion of the user.
CHAPTER 6
RESULTS & DISCUSSIONS
The results and discussions section of a project is crucial as it allows us to reflect on
the outcomes, findings, implications, and potential future directions of the project.
6.1 Recognition of the Emotion:
Model Output Interpretation:
The output is shown in the form of a waveform together with the predicted emotion of the
voice. For example, if the predicted emotion is anger, sadness, happiness or excitement, the
resultant output is shown based on the predicted emotion.
Presentation Format:
• Output Format: Display predicted emotion alongside corresponding waveforms.
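A minimal sketch of this presentation, reusing the waveform plotting from Chapter 2 and the prediction from Section 4.8 (file name and variable names are assumptions), is shown below.

import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load("new_recording.wav")                  # hypothetical test clip
plt.figure(figsize=(10, 3))
librosa.display.waveshow(y, sr=sr)
plt.title(f"Predicted emotion: {predicted[0]}")            # label from Section 4.8
plt.tight_layout()
plt.show()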
6.2 Output:
Figure 6.1: Output
6.3 Discussions
• Challenges Faced:
Speech Emotion Recognition (SER) involves several challenges that can impact the
accuracy and reliability of models.
Variability in Speech Data: Emotions are expressed differently by each
individual, making it difficult to generalize across different speakers.
Variations in accent, pronunciation, and speech rate can affect the consistency
of emotional cues.
Emotional Ambiguity: Some emotions are subtle and hard to distinguish,
especially if they are expressed with slight variations in tone or intensity.
People often express multiple emotions simultaneously, which can complicate
the classification.
Dataset Limitations: High-quality, large-scale datasets that cover a wide
range of emotions and speakers are often scarce. The accuracy of emotion labels
depends on human annotators, and inconsistencies or biases in labeling can
affect model performance.
Real-time Processing: Achieving real-time emotion recognition while
maintaining accuracy can be challenging, especially in live interactions or
applications.
Feature Extraction: Identifying the most relevant features (e.g., MFCCs,
pitch, tone) that best represent emotional states is challenging.
• Lessons Learned:
Importance of High-Quality Data
Feature Engineering is Critical
Model Complexity vs. Performance
Handling Real-World Challenges
Model Training and Optimization
CHAPTER 7
CONCLUSION & FUTURE WORK
7.1 Future Work:
As the field of Speech Emotion Recognition (SER) continues to evolve, several areas offer
exciting opportunities for future research and development. Future work in SER encompasses
a range of exciting opportunities, from enhancing data collection and model architectures to
improving contextual understanding and real-time processing. Addressing these areas will
lead to more accurate, robust, and practical SER systems with broad applications across
different domains. Continued innovation and research in these areas will drive the
advancement of SER technology and its integration into real-world solutions.
7.2 Conclusion:
Speech Emotion Recognition (SER) is a complex and evolving field with significant potential
applications in various domains such as customer service, mental health, and interactive
technologies.
Successful implementation of SER systems requires a multifaceted approach that includes
high-quality data, effective feature engineering, careful model selection, robust evaluation,
and attention to real-world challenges. By addressing these aspects, SER systems can achieve
high accuracy, practical utility, and ongoing relevance in various applications.
REFERENCES
1. Introduction to Machine Learning, PHI Learning Pvt. Ltd., 2nd Ed., 2013.
2. A Guide to Convolutional Neural Networks for Computer Vision
3. https://www.kaggle.com/code/shivamburnwal/speech-emotion-recognition