DOI: 10.1145/3573428.3573684

Speech Emotion Recognition Based on Multi-feature Fusion and DCNN

Published: 15 March 2023

Abstract

To extract more effective information and improve recognition accuracy, this paper proposes a speech emotion recognition model based on multi-feature fusion and a deep convolutional neural network (DCNN). First, the speech data are preprocessed to obtain two-dimensional, three-channel fused feature parameters, which serve as the input to an AlexNet-based DCNN. Second, the model is improved: batch normalization is added after each convolutional layer, and a combined genetic and simulated annealing algorithm is used to optimize the model. Finally, a Softmax classifier assigns the emotion labels. The model is evaluated with cross-validation on the EMO-DB and IEMOCAP datasets, and the experimental results show that the method outperforms existing speech emotion recognition approaches.
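
The full text is not reproduced on this page, so the sketch below is only a rough, hypothetical illustration of the architecture the abstract describes: an AlexNet-style DCNN in PyTorch with batch normalization after every convolutional layer and a Softmax over the emotion classes. The layer sizes, the 227x227 input resolution, and the seven-class output are assumptions for illustration, not values taken from the paper.

# Hypothetical sketch (not the authors' code): AlexNet-style DCNN with
# BatchNorm after every conv layer, fed a two-dimensional, three-channel
# fused feature map per utterance. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class AlexNetBN(nn.Module):
    def __init__(self, num_classes: int = 7):  # e.g. 7 emotion classes in EMO-DB
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.BatchNorm2d(64),   # batch normalization added after each conv
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.BatchNorm2d(192),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.BatchNorm2d(384),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),
            nn.Linear(256 * 6 * 6, 4096),  # 6x6 spatial size for 227x227 input
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),  # logits; Softmax applied below
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# One fused three-channel feature "image" per utterance (assumed shape).
logits = AlexNetBN()(torch.randn(1, 3, 227, 227))
probs = torch.softmax(logits, dim=1)  # Softmax classification over emotions

The genetic and simulated annealing (GSA) optimization the abstract mentions would sit outside a model like this, searching over training hyperparameters (for example, learning rate or layer widths) rather than changing the forward pass itself.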


Cited By

  • (2024) The Use of Multi-Feature Fusion in the Evaluation of Emotional Expressions in Spoken English. Applied Mathematics and Nonlinear Sciences, 9(1). DOI: 10.2478/amns-2024-2342. Online publication date: 3-Sep-2024.



Information

Published In

EITCE '22: Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering
October 2022
1999 pages
ISBN: 9781450397148
DOI: 10.1145/3573428
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Deep convolutional neural network
  2. Genetic and simulated annealing algorithm (GSA)
  3. Multi-feature fusion
  4. Speech emotion recognition

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EITCE 2022

Acceptance Rates

Overall Acceptance Rate 508 of 972 submissions, 52%



