{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,17]],"date-time":"2025-10-17T14:25:24Z","timestamp":1760711124345,"version":"3.37.3"},"reference-count":26,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,12,5]],"date-time":"2022-12-05T00:00:00Z","timestamp":1670198400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,12,5]],"date-time":"2022-12-05T00:00:00Z","timestamp":1670198400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100014717","name":"National Outstanding Youth Science Fund Project of National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["11804068"],"award-info":[{"award-number":["11804068"]}],"id":[{"id":"10.13039\/100014717","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100011787","name":"Heilongjiang Provincial Science and Technology Department","doi-asserted-by":"publisher","award":["LH2020F033"],"award-info":[{"award-number":["LH2020F033"]}],"id":[{"id":"10.13039\/501100011787","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J AUDIO SPEECH MUSIC PROC."],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In the task of sound event detection and localization (SEDL) in a complex environment, the acoustic signals of different events usually have nonlinear superposition, so the detection and localization effect is not good. Given this, this paper is based on the Residual-spatially and channel Squeeze-Excitation (Res-scSE) model. Combined with Multiple-scale Convolutional Recurrent Neural Network (M-CRNN), the Res-scSE-CRNN model is proposed. 
Firstly, to solve the problem of insufficient extraction of time-frequency feature in single-size convolution kernel, multi-scale feature fusion is carried out by using the feature hierarchy of the convolutional neural network to improve the accuracy of detection. Secondly, aiming at the problem of overlapping audio event localization accuracy is not high, with Res-scSE to replace common convolution module and add residual structure to strengthen the feature extraction, and combining with an attention mechanism to enhance neural network channels and spatial relationships, to improve the network to extract the characteristics of directivity, achieve the goal of the overlapped audio localization. In this paper, experiments are carried out in the open dataset DCASE2019, and evaluation indicators are used to analyze the effectiveness of the proposed model and baseline model in the detection and localization of audio events. The results show that compared with the M-CRNN model, the detection error rate of Res-scSE-CRNN model is reduced 4%, the F1-Score is increased 3.4%, the localization error is reduced by 22.8\u00b0, and the frame recall rate is increased 3%.<\/jats:p>","DOI":"10.1186\/s13636-022-00263-6","type":"journal-article","created":{"date-parts":[[2022,12,5]],"date-time":"2022-12-05T08:03:23Z","timestamp":1670227403000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Attention mechanism combined with residual recurrent neural network for sound event detection and 
localization"],"prefix":"10.1186","volume":"2022","author":[{"given":"Chaofeng","family":"Lan","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lei","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuanyuan","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1399-6076","authenticated-orcid":false,"given":"Lirong","family":"Fu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chao","family":"Sun","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yulan","family":"Han","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Meng","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,12,5]]},"reference":[{"issue":"12","key":"263_CR1","first-page":"45","volume":"42","author":"L Weijie","year":"2022","unstructured":"L. Weijie, L. Bo, Modern Electronic Technique. 42(12), 45\u201347 (2022)","journal-title":"Modern Electronic Technique."},{"key":"263_CR2","first-page":"345","volume-title":"HMM-based Acoustic Event Detection with AdaBoost feature selection","author":"X Zhou","year":"2007","unstructured":"X. Zhou, X. Zhuang, M. Liu et al., HMM-based Acoustic Event Detection with AdaBoost feature selection (Springer, Berlin, Heidelberg, 2007), pp.345\u2013353"},{"issue":"05","key":"263_CR3","first-page":"65","volume":"41","author":"W Junqin","year":"2020","unstructured":"W. Junqin, W. Yingfu, Speech Separation Based on GCC-NMF. J. Jiangxi Univ. Sci. Technol. 41(05), 65\u201372 (2020)","journal-title":"J. Jiangxi Univ. Sci. 
Technol."},{"issue":"1","key":"263_CR4","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1109\/TASLP.2014.2367814","volume":"23","author":"H Phan","year":"2014","unstructured":"H. Phan, M. Maa\u00df, R. Mazur et al., Random Regression Forests for Acoustic Event Detection and Classification. IEEE\/ACM Trans. Audio Speech Lang. Process. 23(1), 20\u201331 (2014)","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"263_CR5","doi-asserted-by":"crossref","unstructured":"K.J. Piczak, 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP). Environmental Sound Classification with Convolutional Neural Networks (IEEE, Boston, 2015), pp. 1\u20136","DOI":"10.1109\/MLSP.2015.7324337"},{"key":"263_CR6","first-page":"10","volume":"1","author":"TH Vu","year":"2016","unstructured":"T.H. Vu, J.C. Wang, Acoustic Scene and Event Recognition Using Recurrent Neural Networks. Detect. Classif. Acoust. Scenes Events. 1, 10\u201315 (2016)","journal-title":"Detect. Classif. Acoust. Scenes Events."},{"key":"263_CR7","first-page":"10","volume-title":"Design of binary network system for one-dimensional time series signal","author":"S Jianan","year":"2020","unstructured":"S. Jianan, Design of binary network system for one-dimensional time series signal (Beijing University of Posts and Telecommunications, Beijing, 2020), pp.10\u201315"},{"issue":"6","key":"263_CR8","doi-asserted-by":"publisher","first-page":"1291","DOI":"10.1109\/TASLP.2017.2690575","volume":"25","author":"RE Cak\u0131","year":"2017","unstructured":"R.E. Cak\u0131, G. Parascandolo, T. Heittola et al., Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection. IEEE\/ACM Trans. Audio Speech Lang. Process. 25(6), 1291\u20131303 (2017)","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"263_CR9","doi-asserted-by":"crossref","unstructured":"S. Adavanne, A. Politis, T. 
Virtanen, 2018 International Joint Conference on Neural Networks (IJCNN). Multichannel Sound Event Detection Using 3D Convolutional Neural Networks for Learning Inter-channel Features (IEEE, Rio de Janeiro, 2018), pp. 1\u20137","DOI":"10.1109\/IJCNN.2018.8489542"},{"key":"263_CR10","doi-asserted-by":"publisher","unstructured":"Q. Kong, Y. Xu, I. Sobieraj, in IEEE\/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 4. IEEE\/ACM Transactions on Sound Event Detection and Time-frequency Segmentation from Weakly Labelled Data (2019), pp. 777-787. https:\/\/doi.org\/10.1109\/TASLP.2019.2895254","DOI":"10.1109\/TASLP.2019.2895254"},{"key":"263_CR11","doi-asserted-by":"crossref","unstructured":"T. Iqbal, Y. Xu, Q. Kong, et al., 2018 26th European Signal Processing Conference (EUSIPCO). Capsule routing for sound event detection (IEEE, Rome, 2018), pp. 2255\u20132259","DOI":"10.23919\/EUSIPCO.2018.8553198"},{"issue":"03","key":"263_CR12","first-page":"1102","volume":"44","author":"L Yang","year":"2022","unstructured":"L. Yang, J. Hao, X. Gu, Z. Hou, Audio Label consistency constraint CRNN Sound Event Detection. J. Electron. Inf. Technol. 44(03), 1102\u20131110 (2022)","journal-title":"J. Electron. Inf. Technol."},{"key":"263_CR13","doi-asserted-by":"publisher","unstructured":"Y. Huang. Noise cases, sound detection, classification and localization (University of electronic science and technology, 2021), https:\/\/doi.org\/10.27005\/d.cnki.gdzku.2021.003635.","DOI":"10.27005\/d.cnki.gdzku.2021.003635"},{"issue":"8","key":"263_CR14","doi-asserted-by":"publisher","first-page":"1208","DOI":"10.1109\/LSP.2017.2713830","volume":"24","author":"J Lee","year":"2017","unstructured":"J. Lee, J. Nam, Multi-level and multi-scale feature aggregation using pretrained convolutional neural networks for music auto-tagging. IEEE Sig. Process. Lett. 24(8), 1208\u20131212 (2017)","journal-title":"IEEE Sig. Process. 
Lett."},{"key":"263_CR15","first-page":"8","volume-title":"Research on multi-scale feature fusion and Data augmentation method for sound scene classification","author":"C Xinxing","year":"2019","unstructured":"C. Xinxing, Research on multi-scale feature fusion and Data augmentation method for sound scene classification (Chongqing University, Chongqing, 2019), pp.8\u201310"},{"key":"263_CR16","unstructured":"Z. Weizhe, QIU Peng, Wei Juan. Voice Recognition and Detection Based on Multi-scale Attention Fusion in Weak Label Environment. Comput. Sci. 47(05), 120-123 (2020)"},{"key":"263_CR17","doi-asserted-by":"publisher","unstructured":"H. Xinyuan. Research on deep network model for acoustic event localization and detection (Yanshan University, 2021), https:\/\/doi.org\/10.27440\/d.cnki.gysdu.2021.001846.","DOI":"10.27440\/d.cnki.gysdu.2021.001846"},{"key":"263_CR18","doi-asserted-by":"publisher","first-page":"487","DOI":"10.1109\/TASLP.2019.2958408","volume":"28","author":"I Trowitzsch","year":"2019","unstructured":"I. Trowitzsch, C. Schymura, D. Kolossa et al., Joining sound event detection and localization through spatial segregation. IEEE\/ACM Trans. Audio Speech Lang. Process. 28, 487\u2013502 (2019)","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"263_CR19","doi-asserted-by":"crossref","unstructured":"Y. Cao, Q. Kong, T. Iqbal, et al., Polyphonic Sound Event Detection and Localization using a two-stage Strategy. (2019). ArXiv Preprint arXiv:1905.00268","DOI":"10.33682\/4jhy-bj81"},{"key":"263_CR20","doi-asserted-by":"crossref","unstructured":"R. Ranjan, S. Jayabalan, T.N.T. Nguyen, et al., Sound Event Detection and Direction of Arrival Estimation using Residual Net and Recurrent Neural Networks. (2019). ArXiv preprint arXiv:1902.00260","DOI":"10.33682\/93dp-f064"},{"issue":"7","key":"263_CR21","first-page":"1751","volume":"26","author":"EL Tan","year":"2019","unstructured":"E.L. Tan, R. Ranjan, S. 
Jayabalan, Sound Event Detection and Localization using ResNet RNN and Time-delay DOA. Detect. Classif. Acoust. Scenes Events Chall. 26(7), 1751\u20131760 (2019)","journal-title":"Detect. Classif. Acoust. Scenes Events Chall."},{"key":"263_CR22","unstructured":"J. Naranjo-Alcazar, S. Perez-Castanos, J. Ferrandis, et al., Sound Event Localization and Detection using Squeeze-Excitation Residual CNNs. (2020). arXiv preprint arXiv:2006.14436"},{"key":"263_CR23","doi-asserted-by":"publisher","first-page":"430","DOI":"10.1016\/j.neucom.2021.05.036","volume":"454","author":"L Min","year":"2021","unstructured":"L. Min, M. Zhenjiang, X. Wanru, A CRNN-based attention-seq2seq model with fusion feature for automatic Labanotation generation. Neurocomputing. 454, 430\u2013440 (2021)","journal-title":"Neurocomputing."},{"key":"263_CR24","doi-asserted-by":"crossref","unstructured":"L. Bin, Z. Junyue, C. Jie. Efficient Residual Neural Network for Semantic Segmentation. Pattern Recognit. Image Anal. 31 (2), (2022)","DOI":"10.1134\/S1054661821020103"},{"issue":"05","key":"263_CR25","first-page":"35","volume":"13","author":"W Zhanguo","year":"2021","unstructured":"W. Zhanguo, S. Yaping, L. Ya, Improved IMAGE segmentation algorithm of logistics tray based on squeeze excitation cavity convolution in U-NET Network. J. Packag. 13(05), 35\u201341 (2021)","journal-title":"J. Packag."},{"issue":"8","key":"263_CR26","doi-asserted-by":"publisher","first-page":"2011","DOI":"10.1109\/TPAMI.2019.2913372","volume":"42","author":"J Hu","year":"2020","unstructured":"J. Hu, L. Shen, G. Sun, Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 42(8), 2011\u20132023 (2020)","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."}],"container-title":["EURASIP Journal on Audio, Speech, and Music Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-022-00263-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13636-022-00263-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13636-022-00263-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,5]],"date-time":"2022-12-05T08:13:27Z","timestamp":1670228007000},"score":1,"resource":{"primary":{"URL":"https:\/\/asmp-eurasipjournals.springeropen.com\/articles\/10.1186\/s13636-022-00263-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,5]]},"references-count":26,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,12]]}},"alternative-id":["263"],"URL":"https:\/\/doi.org\/10.1186\/s13636-022-00263-6","relation":{},"ISSN":["1687-4722"],"issn-type":[{"type":"electronic","value":"1687-4722"}],"subject":[],"published":{"date-parts":[[2022,12,5]]},"assertion":[{"value":"19 March 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 November 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 December 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Springer Nature remains neutral 
concerning jurisdictional claims in published maps and institutional affiliations.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"29"}}