DOI: 10.1145/3573428.3573684

Speech Emotion Recognition Based on Multi-feature Fusion and DCNN

Published: 15 March 2023

Abstract

To extract more effective information and improve recognition accuracy, this paper proposes a speech emotion recognition model based on multi-feature fusion and a deep convolutional neural network (DCNN). First, the speech data are preprocessed to obtain two-dimensional, three-channel fused feature parameters, which serve as the input to an AlexNet-based DCNN. Second, the model is improved: batch normalization is added after each convolutional layer, and a combined genetic and simulated annealing algorithm is used to optimize the model. Finally, a Softmax classifier assigns the emotion labels. The model is evaluated with cross-validation on the EMO-DB and IEMOCAP datasets, and the experimental results show that the method outperforms existing speech emotion recognition approaches.
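
The full text is not reproduced on this page, so the sketch below is only a rough, hypothetical illustration of the architecture the abstract describes: an AlexNet-style DCNN in PyTorch with batch normalization after every convolutional layer and a Softmax over the emotion classes. The layer sizes, the 227x227 input resolution, and the seven-class output are assumptions for illustration, not values taken from the paper.

# Hypothetical sketch (not the authors' code): AlexNet-style DCNN with
# BatchNorm after every conv layer, fed a two-dimensional, three-channel
# fused feature map per utterance. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class AlexNetBN(nn.Module):
    def __init__(self, num_classes: int = 7):  # e.g. 7 emotion classes in EMO-DB
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.BatchNorm2d(64),   # batch normalization added after each conv
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.BatchNorm2d(192),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.BatchNorm2d(384),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),
            nn.Linear(256 * 6 * 6, 4096),  # 6x6 spatial size for 227x227 input
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),  # logits; Softmax applied below
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# One fused three-channel feature "image" per utterance (assumed shape).
logits = AlexNetBN()(torch.randn(1, 3, 227, 227))
probs = torch.softmax(logits, dim=1)  # Softmax classification over emotions

The genetic and simulated annealing (GSA) optimization the abstract mentions would sit outside a model like this, searching over training hyperparameters (for example, learning rate or layer widths) rather than changing the forward pass itself.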


Cited By

  • (2024) The Use of Multi-Feature Fusion in the Evaluation of Emotional Expressions in Spoken English. Applied Mathematics and Nonlinear Sciences, 9(1). DOI: 10.2478/amns-2024-2342. Online publication date: 3-Sep-2024.



Information

Published In

EITCE '22: Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering
October 2022
1999 pages
ISBN: 9781450397148
DOI: 10.1145/3573428
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Deep convolutional neural network
  2. Genetic and simulated annealing algorithm (GSA)
  3. Multi-feature fusion
  4. Speech emotion recognition

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EITCE 2022

Acceptance Rates

Overall Acceptance Rate 508 of 972 submissions, 52%



