research-article

Teacher-Student Framework for Polyphonic Semi-supervised Sound Event Detection: Survey and Empirical Analysis

Published: 17 October 2024

Abstract

Polyphonic sound event detection refers to the task of automatically identifying sound events that occur simultaneously in an auditory scene. Because real-world auditory scenes are inherently complex and variable, building robust detectors for polyphonic sound event detection is a significant challenge. The task becomes even harder when there is not enough annotated data to train sound event detection systems under a supervised learning regime. In this article, we explore recent developments in polyphonic sound event detection, with particular emphasis on the application of Teacher-Student techniques within the semi-supervised learning paradigm. Unlike previous works, we consolidate and organize the fragmented literature on Teacher-Student techniques for polyphonic sound event detection. By examining the latest research, categorizing Teacher-Student approaches, and conducting an empirical study to assess the performance of each approach, this survey offers insights and practical guidance for researchers and practitioners in the field. Our findings highlight the potential benefits of using multiple learners, enforcing consistent predictions, and choosing perturbation strategies carefully.
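To make the Teacher-Student idea more concrete, here is a minimal sketch of one widely used instance, the mean-teacher scheme, in plain Python. It assumes a hypothetical one-layer, frame-wise classifier with one independent sigmoid per event class (so several events can be active in the same frame). It also assumes that the teacher's weights are an exponential moving average (EMA) of the student's, and that a mean-squared-error consistency loss penalizes disagreement between the student's prediction on a perturbed input and the teacher's prediction on the clean input. The model, the Gaussian-noise perturbation, and names such as `frame_probs`, `student_step`, and `ema_update` are illustrative assumptions, not the specific methods evaluated in the article.

```python
import math
import random
from typing import List

Weights = List[List[float]]  # shape: [num_classes][num_features]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def frame_probs(weights: Weights, frame: List[float]) -> List[float]:
    """Polyphonic (multi-label) output: one independent sigmoid per event class."""
    return [sigmoid(sum(w * x for w, x in zip(row, frame))) for row in weights]


def perturb(frame: List[float], sigma: float, rng: random.Random) -> List[float]:
    """Hypothetical perturbation: additive Gaussian noise on the input features."""
    return [x + rng.gauss(0.0, sigma) for x in frame]


def consistency_loss(student: List[float], teacher: List[float]) -> float:
    """Mean squared error between student and teacher class probabilities."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def student_step(student: Weights, teacher_target: List[float],
                 frame: List[float], lr: float) -> Weights:
    """One gradient-descent step on the consistency loss (teacher output is a fixed target)."""
    probs = frame_probs(student, frame)
    n = len(probs)
    updated = []
    for row, p, t in zip(student, probs, teacher_target):
        # d/dz of (p - t)^2 / n with p = sigmoid(z) is 2 (p - t) p (1 - p) / n
        g = 2.0 * (p - t) * p * (1.0 - p) / n
        updated.append([w - lr * g * x for w, x in zip(row, frame)])
    return updated


def ema_update(teacher: Weights, student: Weights, alpha: float) -> Weights:
    """Teacher weights track an exponential moving average of the student's weights."""
    return [
        [alpha * t + (1.0 - alpha) * s for t, s in zip(t_row, s_row)]
        for t_row, s_row in zip(teacher, student)
    ]


# Usage on two unlabeled frames (2 event classes, 3 features; values are arbitrary).
rng = random.Random(0)
student = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.2]]
teacher = [row[:] for row in student]
unlabeled = [[1.0, 0.5, -0.5], [0.2, -1.0, 0.8]]

for frame in unlabeled:
    target = frame_probs(teacher, frame)      # teacher sees the clean input
    noisy = perturb(frame, 0.1, rng)          # student sees a perturbed copy
    student = student_step(student, target, noisy, lr=0.5)
    teacher = ema_update(teacher, student, alpha=0.99)
```

The EMA turns the teacher into a temporally averaged, typically more stable copy of the student, and its outputs serve as targets on unlabeled data. A complete mean-teacher system would also add a supervised loss on labeled clips and often perturbs the teacher's input as well. Both are omitted here to keep the sketch short.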

References

[1]
Tamer S. Abdelgayed, Walid G. Morsi, and Tarlochan S. Sidhu. 2018. Fault detection and classification based on co-training of semisupervised machine learning. IEEE Transactions on Industrial Electronics 65, 2 (2018), 1595–1605. DOI:
[2]
Jakob Abeßer. 2020. A review of deep learning based methods for acoustic scene classification. Applied Sciences 10, 6 (2020). DOI:
[3]
Sharath Adavanne, Giambattista Parascandolo, Pasi Pertilä, Toni Heittola, and Tuomas Virtanen. 2017. Sound event detection in multichannel audio using spatial and harmonic features. arXiv1706.02293. Retrieved from http://arxiv.org/abs/1706.02293
[4]
Sharath Adavanne, Archontis Politis, Joonas Nikunen, and Tuomas Virtanen. 2018. Sound event localization and detection of overlapping sources using convolutional recurrent neural networks. IEEE Journal of Selected Topics in Signal Processing 13, 1 (2018), 34–48. DOI:
[5]
Sharath Adavanne, Archontis Politis, and Tuomas Virtanen. 2019. A multi-room reverberant dataset for sound event localization and detection. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE ’16), 6–10. arXiv:1905.08546. Retrieved from http://arxiv.org/abs/1905.08546
[6]
Anthony Agnone and Umair Altaf. 2019. Virtual Adversarial Training System for DCASE 2019 Task 4. Technical Report. Detection and Classification of Acoustics Scenes and Events 2019 Challenge.
[7]
Ning An, Huitong Ding, Jiaoyun Yang, Rhoda Au, and Ting F. Ang. 2020. Deep ensemble learning for Alzheimer's disease classification. Journal of Biomedical Informatics 105 (2020), 103411. DOI:
[8]
George Awad, Jon Fiscus, Brian Antonishek, Martial Michel, Alan Smeaton, Wessel Kraaij, and Georges Quénot. 2011. TRECVID 2011 – An Overview of the Goals, Tasks, Data, Evaluation Mechanisms, and Metrics. TREC Video Retrieval Evaluation. Retrieved from https://api.semanticscholar.org/CorpusID:2983005
[9]
Elham Babaee, Nor Badrul Anuar, Ainuddin Wahid Abdul Wahab, Shahaboddin Shamshirband, and Anthony T. Chronopoulos. 2017. An overview of audio event detection methods from feature extraction to classification. Applied Artificial Intelligence 31, 9–10 (2017), 661–714. DOI:
[10]
Bram Bakker, Shimon Whiteson, Leon J. H. M. Kester, and Frans C. A. Groen. 2010. Traffic light control by multiagent reinforcement learning systems. In Interactive Collaborative Information Systems. Robert Babuska and Frans C. A. Groen (Eds.), Studies in Computational Intelligence, Vol. 281, Springer, 475–510. DOI:
[11]
Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, James Bergstra, Ian J. Goodfellow, Arnaud Bergeron, Nicolas Bouchard, David Warde-Farley, and Yoshua Bengio. 2012. Theano: New features and speed improvements. In Proceedings of the Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop. arXiv e-prints, 1–8. arXiv:1211.5590. Retrieved from http://arxiv.org/abs/1211.5590
[12]
David Berthelot, Nicholas Carlini, Ian J. Goodfellow, Nicolas Papernot, Avital Oliver, and Colin Raffel. 2019. MixMatch: A holistic approach to semi-supervised learning. In Proceedings of the Advances in Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, USA, Article 454, 5049–5059. Retrieved from http://arxiv.org/abs/1905.02249
[13]
Michael Bowling and Manuela Veloso. 2002. Multiagent learning using a variable learning rate. Artificial Intelligence 136, 2 (2002), 215–250. DOI:
[14]
Brian J. Brandler and Zehra F. Peynircioglu. 2015. A comparison of the efficacy of individual and collaborative music learning in ensemble rehearsals. Journal of Research in Music Education 63, 3 (2015), 281–297. DOI:
[15]
Cristian Bucila, Rich Caruana, and Alexandru Niculescu-Mizil. 2006. Model compression. In Proceedings of the Knowledge Discovery and Data Mining (KDD’06). Association for Computing Machinery, New York, NY, USA, 535–541.
[16]
Emre Cakir and Tuomas Virtanen. 2017. Convolutional recurrent neural networks for rare sound event detection. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE ’17) . Tuomas Virtanen, Annamaria Mesaros, Toni Heittola, Aleksandr Diment, Emmanuel Vincent, Emmanouil Benetos, and Benjamin Elizalde (Eds.), 27–31.
[17]
Yin Cao, Turab Iqbal, Qiuqiang Kong, Fengyan An, Wenwu Wang, and Mark D. Plumbley. 2020. An improved event-independent network for polyphonic sound event localization and detection. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 885–889. Retrieved from https://arxiv.org/abs/2010.13092
[18]
Yin Cao, Turab Iqbal, Qiuqiang Kong, Miguel Galindo, Wenwu Wang, and Mark D. Plumbley. 2019. Two-Stage Sound Event Localization and Detection Using Intensity Vector and Generalized Cross-Correlation. Technical Report. Detection Classification Acoustic Scenes Events (DCASE) Challenge. 2019.
[19]
Mark Cartwright, Ayanna Seals, Justin Salamon, Alex C. Williams, Stefanie Mikloska, Duncan MacConnell, Edith Law, Juan Pablo Bello, and Oded Nov. 2017. Seeing sound: Investigating the effects of visualizations and complexity on crowdsourced audio annotations. Proceedings of the ACM on Human Computer Interaction 1, 29 (2017), 1–29:21. DOI:
[20]
Teck Kai Chan and Cheng Siong Chin. 2020. A comprehensive review of polyphonic sound event detection. IEEE Access 8 (2020), 103339–103373. DOI:
[21]
Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien (Eds.). 2006. Semi-Supervised Learning. The MIT Press. DOI:
[22]
Sameer Chauhan, Sharang Phadke, and Christian Sherland. 2013. IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events. Technical Report. Event Detection and Classification.
[23]
Guobin Chen, Wongun Choi, Xiang Yu, Tony X. Han, and Manmohan Chandraker. 2017. Learning efficient object detection models with knowledge distillation. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017. Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.), 742–751. Retrieved from https://proceedings.neurips.cc/paper/2017/hash/e1e32e235eee1f970470a3a6658dfdd5-Abstract.html
[24]
Wen-Chang Cheng, Tin-Yu Wu, and Dai-Wei Li. 2018. Ensemble convolutional neural networks for face recognition. In Proceedings of the 2018 International Conference on Algorithms, Computing and Artificial Intelligence. Association for Computing Machinery, New York, NY, USA, Article 44, 1–6. DOI:
[25]
Yu. Cheng, Duo. Wang, Pan. Zhou, and Tao. Zhang. 2018. Model compression and acceleration for deep neural networks: the principles, progress, and challenges. IEEE Signal Process. Mag. 35, 1, 126–136. DOI:
[26]
In Kyu Choi, Kisoo Kwon, Soo Hyun Bae, and Nam Soo Kim. 2016. DNN-based sound event detection with exemplar-based approach for noise reduction. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE ’16). Tuomas Virtanen, Annamaria Mesaros, Toni Heittola, Mark D. Plumbley, Peter Foster, Emmanouil Benetos, and Mathieu Lagrange (Eds.), 16–19. Retrieved from http://dcase.community/documents/workshop2016/proceedings/Choi-DCASE2016workshop.pdf
[27]
Ronan Collobert and Jason Weston. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, 160–167. DOI:
[28]
Courtenay V. Cotton and Daniel P. W. Ellis. 2011. Spectral vs. spectro-temporal features for acoustic event detection. In Proceedings of the 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 69–72. DOI:
[29]
Wenhui Cui, Yanlin Liu, Yuxing Li, Menghao Guo, Yiming Li, Xiuli Li, Tianle Wang, Xiangzhu Zeng, and Chuyang Ye. 2019. Semi-supervised brain lesion segmentation with an adapted mean teacher model. In Proceedings of the Information Processing in Medical Imaging. Springer, 554–565. Retrieved from http://arxiv.org/abs/1903.01248
[30]
An Dang, Toan H. Vu, and Jia-Ching Wang. 2017. A survey of deep learning for polyphonic sound event detection. In Proceedings of the 2017 International Conference on Orange Technologies (ICOT), 75–78. DOI:
[31]
Lionel Delphin-Poulat, Rozenn Nicol, Cyril Plapous, and Katell Peron. 2020. Comparative assessment of data augmentation for semi-supervised polyphonic sound event detection. In Proceedings of the 2020 27th Conference of Open Innovations Association (FRUCT), 46–53. DOI:
[32]
Lionel Delphin-Poulat and Cyril Plapous. 2019. Mean Teacher with Data Augmentation for DCASE 2019 Task 4. Technical Report. Detection and Classification of Acoustic Scenes and Events.
[33]
Jinhong Deng, Wen Li, Yuhua Chen, and Lixin Duan. 2020. Unbiased mean teacher for cross domain object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4091–4101. Retrieved from https://arxiv.org/abs/2003.00707
[34]
Li Deng, Geoffrey Hinton, and Brian Kingsbury. 2013. New types of deep neural network learning for speech recognition and related applications: An overview. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 8599–8603. DOI:
[35]
Zhor Diffallah, Hadjer Ykhlef, Hafida Bouarfa, and Nardjesse Diffallah. 2022. Consistency regularization-based polyphonic audio event detection with minimal supervision. In Proceedings of the 2022 IEEE 21st international Conference on Sciences and Techniques of Automatic Control and Computer Engineering (STA), 325–330. DOI:
[36]
Zhor Diffallah, Hadjer Ykhlef, Hafida Bouarfa, and Farid Ykhlef. 2021. Impact of mixup hyperparameter tunning on deep learning-based systems for acoustic scene classification. In Proceedings of the 2021 International Conference on Recent Advances in Mathematics and Informatics (ICRAMI), 1–6. DOI:
[37]
Heinrich Dinkel and Kai Yu. 2020. Duration robust weakly supervised sound event detection. In Proceedings of the ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 311–315. DOI:
[38]
Xibin Dong, Zhiwen Yu, Wenming Cao, Yifan Shi, and Qianli Ma. 2020. A survey on ensemble learning. Frontiers of Computer Science 14, 2 (2020), 241–258. DOI:
[39]
Janek Ebbers and Reinhold Haeb-Umbach. 2021. Self-trained audio tagging and sound event detection in domestic environments. In Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events 2021 (DCASE ’21). Frederic Font, Annamaria Mesaros, Daniel P. W. Ellis, Eduardo Fonseca, Magdalena Fuentes, and Benjamin Elizalde (Eds.), 226–230. Retrieved from http://dcase.community/documents/workshop2021/proceedings/DCASE2021Workshop_Ebbers_71.pdf
[40]
Peter Foster, Siddharth Sigtia, Sacha Krstulovic, Jon Barker, and Mark D. Plumbley. 2015. Chime-home: A dataset for sound source recognition in a domestic environment. In Proceedings of the 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 1–5. DOI:
[41]
Jürgen T. Geiger and Karim Helwani. 2015. Improving event detection for audio surveillance using Gabor filterbank features. In Proceedings of the 2015 23rd European Signal Processing Conference (EUSIPCO), 714–718. DOI:
[42]
Dimitrios Giannoulis, Emmanouil Benetos, Dan Stowell, Mathias Rossignol, Mathieu Lagrange, and Mark D. Plumbley. 2013a. Detection and classification of acoustic scenes and events: An IEEE AASP challenge. In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA ’13). IEEE, 1–4. DOI:
[43]
Dimitrios Giannoulis, Dan Stowell, Emmanouil Benetos, Mathias Rossignol, Mathieu Lagrange, and Mark D. Plumbley. 2013b. A database and challenge for acoustic scene classification and event detection. In Proceedings of the 21st European Signal Processing Conference (EUSIPCO ’13). IEEE, 1–5. Retrieved from https://ieeexplore.ieee.org/document/6811416/
[44]
Ross B. Girshick. 2015. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, 1440–1448. DOI:
[45]
Yunchao Gong, Liu Liu, Ming Yang, and Lubomir D. Bourdev. 2014. Compressing deep convolutional networks using vector quantization. CoRR. Retrieved from http://arxiv.org/abs/1412.6115
[46]
Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv:1412.6572. Retrieved from https://api.semanticscholar.org/CorpusID:6706414
[47]
Arseniy Gorin, Nurtas Makhazhanov, and Nickolay Shmyrev. 2016. DCASE 2016 Sound Event Detection System Based on Convolutional Neural Network. Technical Report. Detection and Classification of Acoustic Scenes and Events.
[48]
Jann Goschenhofer, Rasmus Hvingelby, David Rügamer, Janek Thomas, Moritz Wagner, and Bernd Bischl. 2021. Deep semi-supervised learning for time series classification. In Proceedings of the 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), 422–428. DOI:
[49]
Jianping Gou, Baosheng Yu, Stephen J. Maybank, and Dacheng Tao. 2021. Knowledge distillation: A survey. International Journal of Computer Vision 129, 6 (2021), 1789–1819. DOI:
[50]
Sangchul Hahn and Heeyoul Choi. 2019. Self-knowledge distillation in natural language processing. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP ’19), 423–430. arXiv:1908.01851. Retrieved from http://arxiv.org/abs/1908.01851
[51]
Robert Harb and Franz Pernkopf. 2018. Sound event detection using weakly-labeled semi-supervised data with GCRNNS, VAT and self-adaptive label refinement. arXiv:1810.06897. Retrieved from http://arxiv.org/abs/1810.06897
[52]
Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Takaaki Hori, Jonathan Le Roux, and Kazuya Takeda. 2017. Duration-controlled LSTM for polyphonic sound event detection. IEEE/ACM Transactions on Audio, Speech, and Language Processing 25, 11 (2017), 2059–2070. DOI:
[53]
Yuhang He, Niki Trigoni, and Andrew Markham. 2021. SoundDet: Polyphonic sound event detection and localization from raw waveform. arXiv:2106.06969. Retrieved from https://arxiv.org/abs/2106.06969
[54]
Geoffrey Hinton, Jeff Dean, and Oriol Vinyals. 2014. Distilling the knowledge in a neural network. arXiv:1503.02531, 1–9. Retrieved from https://api.semanticscholar.org/CorpusID:7200347
[55]
Rui Huang, Shu Zhang, Tianyu Li, and Ran He. 2017. Beyond face rotation: Global and local perception GAN for photorealistic and identity preserving frontal view synthesis. arXiv:1704.04086. Retrieved from http://arxiv.org/abs/1704.04086
[56]
Yuxin Huang, Liwei Lin, Shuo Ma, Xiangdong Wang, Hong Liu, Yueliang Qian, Min Liu, and Kazushige Ouchi. 2020. Guided multi-branch learning systems for sound event detection with sound separation. In Proceedings of the Detection Classification Acoustic Scenes Events Workshop, 61–65.
[57]
Zhen Huang, You-Chi Cheng, Kehuang Li, Ville Hautamäki, and Chin-Hui Lee. 2013. A blind segmentation approach to acoustic event detection based on i-vector. In Proceedings of Interspeech, 2282–2286. DOI:
[58]
Keisuke Imoto, Sakiko Mishima, Yumi Arai, and Reishi Kondo. 2021. Impact of sound duration and inactive frames on sound event detection performance. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 860–864. arXiv:2102.01927. Retrieved from https://arxiv.org/abs/2102.01927
[59]
Intelligent Interface and Interaction. [n. d.].
[60]
Lu Jiakai and Pfu Shanghai. 2018. Mean Teacher Convolution System for DCASE 2018 Task 4. Technical Report. Detection and Classification of Acoustic Scenes and Events.
[61]
Wangkai Jin, Junyu Liu, Meili Feng, and Jianfeng Ren. 2022. Polyphonic sound event detection using capsule neural network on multi-type-multi-scale time-frequency representation. In Proceedings of the 2022 IEEE 2nd International Conference on Software Engineering and Artificial Intelligence (SEAI), 146–150. DOI:
[62]
Wang Kaiwu, Liping Yang, and Yang Bin. 2017. Audio events detection and classification using extended R-FCN approach. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, 128–132. Retrieved from https://api.semanticscholar.org/CorpusID:30679271
[63]
Slawomir Kapka. 2020. ID-conditioned auto-encoder for unsupervised anomaly detection. In Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE ’20) (full virtual). Nobutaka Ono, Noboru Harada, Yohei Kawaguchi, Annamaria Mesaros, Keisuke Imoto, Yuma Koizumi, and Tatsuya Komatsu (Eds.), 71–75.
[64]
Rainer Stiefelhagen, Keni Bernardin, Rachel Bowers, John Garofolo, Djamel Mostefa, and Padmanabhan Soundararajan. 2007. The CLEAR 2006 evaluation. In Multimodal Technologies for Perception of Humans. Rainer Stiefelhagen and John Garofolo (Eds.), Springer Berlin Heidelberg, 1–44.
[65]
Rainer Stiefelhagen, Keni Bernardin, Rachel Bowers, R. Travis Rose, Martial Michel, and John Garofolo. 2008. The CLEAR 2007 evaluation. In Multimodal Technologies for Perception of Humans. Rachel Bowers, Rainer Stiefelhagen, and Jonathan Fiscus (Eds.), Springer Berlin Heidelberg, 3–34.
[66]
Tanmay Khandelwal, Rohan Kumar Das, Andrew Koh, and Eng Siong Chng. 2022. Detection and Classification of Acoustic Scenes and Events 2022 FMSG-NTU Submission for DCASE 2022 Task 4 on Sound Event Detection in Domestic Environments. Technical Report. Detection and Classification of Acoustic Scenes and Events.
[67]
Changmin Kim and Siyoung Yang. 2022. Detection and Classification of Acoustic Scenes and Events 2022 Sound Event Detection System Using Fixmatch for DCASE 2022 Challenge Task 4. Technical Report. Detection and Classification of Acoustic Scenes and Events.
[68]
Chih-Yuan Koh, You-Siang Chen, Yi-Wen Liu, and Mingsian R Bai. 2021. Sound event detection by consistency training and pseudo-labeling with feature-pyramid convolutional recurrent neural networks. In Proceedings of the ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 376–380. DOI:
[69]
Qiuqiang Kong, Iwona Sobieraj, Wenwu Wang, and Mark D. Plumbley. 2016. Deep neural network baseline for DCASE challenge 2016. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE ’16), Tuomas Virtanen, Annamaria Mesaros, Toni Heittola, Mark D. Plumbley, Peter Foster, Emmanouil Benetos, and Mathieu Lagrange (Eds.), 50–54.
[70]
Khaled Koutini, Hamid Eghbal-zadeh, and Gerhard Widmer. 2018. Iterative knowledge distillation in R-CNNs for weakly-labeled semi-supervised sound event detection. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE ’18). Mark D. Plumbley, Christian Kroos, Juan Pablo Bello, Gaël Richard, Daniel P. W. Ellis, and Annamaria Mesaros (Eds.), 173–177.
[71]
Jan Kukacka, Vladimir Golkov, and Daniel Cremers. 2017. Regularization for deep learning: A taxonomy. arXiv:1710.10686. Retrieved from http://arxiv.org/abs/1710.10686
[72]
Anurag Kumar, Rajesh M. Hegde, Rita Singh, and Bhiksha Raj. 2013. Event detection in short duration audio using Gaussian mixture model and random forest classifier. In Proceedings of the 21st European Signal Processing Conference (EUSIPCO ’13), IEEE, 1–5. Retrieved from https://ieeexplore.ieee.org/document/6811668/
[73]
Anurag Kumar and Vamsi Krishna Ithapu. 2020. A sequential self teaching approach for improving generalization in sound event recognition. International Conference on Machine Learning. Retrieved from https://api.semanticscholar.org/CorpusID:220280250
[74]
Samuli Laine and Timo Aila. 2017. Temporal ensembling for semi-supervised learning. In Proceedings of the 5th International Conference on Learning Representations (ICLR ’17), Conference Track Proceedings. OpenReview.net. Retrieved from https://openreview.net/forum?id=BJ6oOfqge
[75]
Mario Lasseck. 2018. Acoustic bird detection with deep convolutional neural networks. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE ’18). Mark D. Plumbley, Christian Kroos, Juan Pablo Bello, Gaël Richard, Daniel P. W. Ellis, and Annamaria Mesaros (Eds.), 143–147.
[76]
Alfred Laugros, Alice Caplier, and Matthieu Ospici. 2020. Addressing neural network robustness with mixup and targeted labeling adversarial training. ECCV Workshops. Retrived from https://api.semanticscholar.org/CorpusID:221172827
[77]
Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew P Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, and Wenzhe Shi. 2017. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, Los Alamitos, CA, USA, 105–114. DOI:
[78]
Chuan Li and Michael Wand. 2016. Precomputed real-time texture synthesis with Markovian Generative Adversarial Networks. (January 1970). Retrieved May 18, 2023 from https://link.springer.com/chapter/10.1007/978-3-319-46487-9_43
[79]
Kunpeng Li, Zizhao Zhang, Guanhang Wu, Xuehan Xiong, Chen-Yu Lee, Zhichao Lu, Yun Fu, and Tomas Pfister. 2022. Learning from weakly-labeled web videos via exploring sub-concepts. In Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI ’22), 34th Conference on Innovative Applications of Artificial Intelligence (IAAI ’22), 12th Symposium on Educational Advances in Artificial Intelligence (EAAI ’22) Virtual Event. AAAI Press, 1341–1349.
[80]
Zhixin Li, Lan Lin, Canlong Zhang, Huifang Ma, and Weizhong Zhao. 2019. Automatic image annotation based on co-training. In Proceedings of the2019 International Joint Conference on Neural Networks (IJCNN), 1–8. DOI:
[81]
Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, and Ming-Ting Sun. 2017. Adversarial ranking for language generation. Neural Information Processing Systems. Retrieved from https://api.semanticscholar.org/CorpusID:4857922
[82]
Liwei Lin, Xiangdong Wang, Hong Liu, and Yueliang Qian. 2019. Guided learning convolution system for DCASE 2019 task 4. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events 2019 (DCASE ’19). New York University. Michael I. Mandel, Justin Salamon, and Daniel P. W. Ellis (Eds.), 134–138.
[83]
Bo Liu, Lin Gu, and Feng Lu. 2019. Unsupervised ensemble strategy for retinal vessel segmentation. In Proceedings of the Medical Image Computing and Computer Assisted Intervention (MICCAI ’19): 22nd International Conference, Part I 111–119. DOI:
[84]
Pengfei Liu, Xipeng Qiu, and Xuanjing Huang. 2017. Adversarial multi-task learning for text classification. Retrieved May 18, 2023 from https://aclanthology.org/P17-1001/
[85]
Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2014. Fully convolutional networks for semantic segmentation. arXiv:1411.4038. Retrieved from https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Long_Fully_Convolutional_Networks_2015_CVPR_paper.pdf
[86]
Vincent Lostanlen, Justin Salamon, Andrew Farnsworth, Steve Kelling, and Juan Pablo Bello. 2019. Robust sound event detection in bioacoustic sensor networks. PLoS ONE 14(10): e0214168. DOI:
[87]
Xugang lu, Yu Tsao, Shigeki Matsuda, and Chiori Hori. 2014. Sparse representation based on a bag of spectral exemplars for acoustic event detection. In Proceedings of the ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, 6255–6259. DOI:
[88]
Maja J. Matarić. 1997. Reinforcement learning in the multi-robot domain. Autonomous Robots 4, 1 (1997), 73–83, 1573–7527. DOI:
[89]
Annamaria Mesaros, Aleksandr Diment, Benjamin Elizalde, Toni Heittola, Emmanuel Vincent, Bhiksha Raj, and Tuomas Virtanen. 2019. Sound event detection in the DCASE 2017 challenge. IEEE/ACM Transactions on Audio, Speech, and Language Processing 27 (2019), 992–1006.
[90]
Annamaria Mesaros, Toni Heittola, Aleksandr Diment, Benjamin Elizalde, Ankit Shah, Emmanuel Vincent, Bhiksha Raj, and Tuomas Virtanen. 2017. DCASE 2017 Challenge Setup: Tasks, Datasets and Baseline System. Technical Report. Detection and Classification of Acoustic Scenes and Events. DOI: https://hal.inria.fr/hal-01627981
[91]
Annamaria Mesaros, Toni Heittola, Antti Eronen, and Tuomas Virtanen. 2010. Acoustic event detection in real life recordings. In Proceedings of the 2010 18th European Signal Processing Conference, 1267–1271.
[92]
Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. 2016. TUT database for acoustic scene classification and sound event detection. In Proceedings of the 2016 24th European Signal Processing Conference (EUSIPCO), 1128–1132. DOI:
[93]
Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. 2018. Acoustic scene classification: An overview of Dcase 2017 challenge entries. In Proceedings of the 2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC), 411–415. DOI:
[94]
Annamaria Mesaros, Toni Heittola, Tuomas Virtanen, and Mark D. Plumbley. 2021. Sound event detection: A tutorial. IEEE Signal Processing Magazine 38, 5 (2021), 67–83. DOI:
[95]
Seyed-Iman Mirzadeh, Mehrdad Farajtabar, Ang Li, and Hassan Ghasemzadeh. 2019. Improved knowledge distillation via teacher assistant: Bridging the gap between student and teacher. AAAI Conference on Artificial Intelligence. Retrieved from https://api.semanticscholar.org/CorpusID:212908749
[96]
Takeru Miyato, Shin-Ichi Maeda, Masanori Koyama, and Shin Ishii. 2019. Virtual adversarial training: A regularization method for supervised and semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 41, 8 (2019), 1979–1993. DOI:
[97]
Koichi Miyazaki, Tatsuya Komatsu, Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, and K. Takeda. 2020. Convolution-Augmented Transformer for Semi-Supervised Sound Event Detection. Technical Report. Detection and Classification of Acoustic Scenes and Events.
[98]
Veronica Morfi, Ines Nolasco, Vincent Lostanlen, Shubhr Singh, Ariana Strandburg-Peshkin, Lisa F. Gill, Hanna Pamula, David Benvent, and Dan Stowell. 2021. Few-shot bioacoustic event detection: A new task at the DCASE 2021 challenge. In Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events 2021 (DCASE ’21). Frederic Font, Annamaria Mesaros, Daniel P. W. Ellis, Eduardo Fonseca, Magdalena Fuentes, and Benjamin Elizalde (Eds.), 145–149.
[99]
Loris Nanni, Yandre M. G. Costa, Rafael L. Aguiar, Rafael B. Mangolin, Sheryl Brahnam, and Carlos N. Silla. 2020. Ensemble of convolutional neural networks to improve animal audio classification. EURASIP Journal on Audio, Speech, and Music Processing 8, 1 (2020), 1687–4722. DOI:
[100]
Loris Nanni, Gianluca Maguolo, Sheryl Brahnam, and Michelangelo Paci. 2021. An ensemble of convolutional neural networks for audio classification. Applied Sciences 11, 13 (2021), 2076–3417. DOI:
[101]
Maria E. Niessen, Tim L. M. Van Kasteren, and Andreas Merentitis. 2013. Hierarchical modeling using automated sub-clustering for sound event recognition. In Proceedings of the 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 1–4. DOI:
[102]
V. Nivitha Varghees and K. I. Ramachandran. 2017. Effective heart sound segmentation and murmur classification using empirical wavelet transform and instantaneous phase for electronic stethoscope. IEEE Sensors Journal 17, 12 (2017), 3861–3872. DOI:
[103]
Mauricio Orbes-Arteaga, M. Jorge Cardoso, Lauge S{orensen, Christian Igel, Sébastien Ourselin, Marc Modat, Mads Nielsen, and Akshay Pai. 2019. Knowledge distillation for semi-supervised domain adaptation. In Proceedings of the OR 2.0 Context-Aware Operating Theaters and Machine Learning in Clinical Neuroimaging - Second International Workshop, OR 2.0 2019, and Second International Workshop, MLCN 2019, Held in Conjunction with MICCAI 2019. Luping Zhou, Duygu Sarikaya, Seyed Mostafa Kia, Stefanie Speidel, Anand Malpani, Daniel A. Hashimoto, Mohamad Habes, Tommy Löfstedt, Kerstin Ritter, and Hongzhi Wang (Eds.). Lecture Notes in Computer Science, Vol. 11796, Springer, 68–76. DOI:
[104]
Yassine Ouali, Céline Hudelot, and Myriam Tami. 2020. An overview of deep semi-supervised learning. arXiv:2006.05278. Retrieved from https://arxiv.org/abs/2006.05278
[105]
Paul Over, George Awad, Jonathan G. Fiscus, Brian Antonishek, Martial Michel, Wessel Kraaij, Alan F. Smeaton, and Georges Quénot (Eds.). 2010. TRECVID 2010 Workshop Participants Notebook Papers. National Institute of Standards and Technology (NIST), Gaithersburg, MD, USA. Retrieved from https://www-nlpir.nist.gov/projects/tvpubs/tv.pubs.10.org.html
[106]
Paul Over, Jon Fiscus, Gregory A. Sanders, Barbara Shaw, George Awad, Martial Michel, Alan F. Smeaton, Wessel Kraaij, and Georges Quénot. 2012. TRECVID 2012 - An overview of the goals, tasks, data, evaluation mechanisms and metrics. In Proceedings of the 2012 TREC Video Retrieval Evaluation, TRECVID 2012. National Institute of Standards and Technology (NIST). Paul Over, Jonathan G. Fiscus, Gregory A. Sanders, Barbara Shaw, George Awad, Martial Michel, Alan F. Smeaton, Wessel Kraaij, and Georges Quénot (Eds.), 10–13. Retrieved from https://www-nlpir.nist.gov/projects/tvpubs/tv12.papers/tv12overview.pdf
[107]
Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian J. Goodfellow, and Kunal Talwar. 2017. Semi-supervised knowledge transfer for deep learning from private training data. In Proceedings of the 5th International Conference on Learning Representations (ICLR ’17), Conference Track Proceedings. Retrieved from https://openreview.net/forum?id=HkwoSDPgg
[108]
Sangwook Park, David K. Han, and Mounya Elhilali. 2021. Cross-referencing self-training network for sound event detection in audio mixtures. IEEE Trans. Multimedia 25 (2023), 4573–4585. DOI: https://doi.org/10.1109/tmm.2022.3178591.
[109]
Mark D. Plumbley, Christian Kroos, Juan Pablo Bello, Gaël Richard, Daniel P. W. Ellis, and Annamaria Mesaros (Eds.). 2018. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE ’18), 9–221. Retrieved from http://dcase.community/workshop2018/proceedings
[110]
Archontis Politis, Sharath Adavanne, Daniel Krause, Antoine Deleforge, Prerak Srivastava, and Tuomas Virtanen. 2021a. Detection and Classification of Acoustic Scenes and Events 2021. A Dataset of Dynamic Reverberant Sound Scenes with Directional Interferers for Sound Event Localization and Detection. Technical Report. Detection and Classification of Acoustic Scenes and Events. DOI:
[111]
Archontis Politis, Annamaria Mesaros, Sharath Adavanne, Toni Heittola, and Tuomas Virtanen. 2021b. Overview and evaluation of sound event localization and detection in DCASE 2019. IEEE/ACM Transactions on Audio, Speech, and Language Processing 29 (2021), 684–698.
[112]
Jordi Pons, Oriol Nieto, Matthew Prockup, Erik M. Schmidt, Andreas F. Ehmann, and Xavier Serra. 2017. End-to-end learning for music audio tagging at scale. arXiv:1711.02520v4. Retrieved from http://arxiv.org/abs/1711.02520
[113]
Andrés Pérez-López, Eduardo Fonseca, and Xavier Serra. 2019. A hybrid parametric-deep learning approach for sound event localization and detection. Retrieved from https://api.semanticscholar.org/CorpusID:201646197
[114]
Siyuan Qiao, Wei Shen, Zhishuai Zhang, Bo Wang, and Alan L. Yuille. 2018. Deep co-training for semi-supervised image recognition. Retrieved May 18, 2024 from https://link.springer.com/chapter/10.1007/978-3-030-01267-0_9
[115]
Xueheng Qiu, Ponnuthurai Nagaratnam Suganthan, and Gehan A. J. Amaratunga. 2018. Ensemble incremental learning Random Vector Functional Link network for short-term electric load forecasting. Knowledge-Based Systems 145 (2018), 182–196.
[116]
Santiago S. Silva R., Boris A. Gutman, Eduardo Romero, Paul M. Thompson, André Altmann, and Marco Lorenzi. 2019. Federated learning in distributed medical databases: Meta-analysis of large-scale subcortical brain data. In Proceedings of the 16th IEEE International Symposium on Biomedical Imaging (ISBI ’19). IEEE, 270–274.
[117]
Bharath Ramsundar, Steven Kearnes, Patrick Riley, Dale Webster, David Konerding, and Vijay Pande. 2015. Massively multitask networks for drug discovery. arXiv:1502.02072. Retrieved from https://api.semanticscholar.org/CorpusID:2127453
[118]
Antti Rasmus, Mathias Berglund, Mikko Honkala, Harri Valpola, and Tapani Raiko. 2015. Semi-supervised learning with ladder networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2 (NIPS’15). MIT Press, Cambridge, MA, USA, 3546–3554.
[119]
Kuniaki Saito, Yoshitaka Ushiku, and Tatsuya Harada. 2017. Asymmetric tri-training for unsupervised domain adaptation. In Proceedings of the 34th International Conference on Machine Learning, Vol. 70, Doina Precup and Yee Whye Teh (Eds.), 2988–2997. Retrieved from https://proceedings.mlr.press/v70/saito17a.html
[120]
Mohammadreza Salehi, Niousha Sadjadi, Soroosh Baselizadeh, Mohammad Hossein Rohban, and Hamid R. Rabiee. 2020. Multiresolution knowledge distillation for anomaly detection. arXiv:2011.11108. Retrieved from https://arxiv.org/abs/2011.11108
[121]
Thomas Schlegl, Philipp Seeböck, Sebastian M. Waldstein, Ursula Schmidt-Erfurth, and Georg Langs. 2017. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. arXiv:1703.05921. Retrieved from http://arxiv.org/abs/1703.05921
[122]
Jens Schröder, Stefan Goetze, and Jörn Anemüller. 2015. Spectro-temporal Gabor filterbank features for acoustic event detection. IEEE/ACM Transactions on Audio, Speech, and Language Processing 23, 12 (2015), 2198–2208.
[123]
Nian Shao, Erfan Loweimi, and Xiaofei Li. 2022. RCT: Random consistency training for semi-supervised sound event detection. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH ’22), 1541–1545.
[124]
Rajat Sharma, Sourabh Tiwari, Rohit Singla, Saksham Goyal, Vinay Vasanth Patage, and Rashmi T. Shankarappa. 2020. Sound event separation and classification in domestic environment using mean teacher. In Proceedings of the 2020 IEEE 17th India Council International Conference (INDICON), 1–6.
[125]
Ziqiang Shi, Liu Liu, Huibin Lin, and Rujie Liu. 2020. Hodge and Podge: Hybrid supervised sound event detection with multi-hot MixMatch and composition consistence training. In Proceedings of the 28th European Signal Processing Conference (EUSIPCO), Amsterdam, Netherlands, 1–5.
[126]
Ziqiang Shi, Liu Liu, Huibin Lin, Rujie Liu, and Anyan Shi. 2019. HODGEPODGE: Sound event detection based on ensemble of semi-supervised learning methods. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events 2019 (DCASE ’19). New York University. Michael I. Mandel, Justin Salamon, and Daniel P. W. Ellis (Eds.), 224–228.
[127]
Yusuke Shinohara. 2016. Adversarial multi-task learning of deep neural networks for robust speech recognition. In Proceedings of Interspeech 2016, 17th Annual Conference of the International Speech Communication Association. ISCA. Nelson Morgan (Ed.), 2369–2372.
[128]
Joan Claudi Socoró, Francesc Alías, and Rosa Ma Alsina-Pages. 2017. An anomalous noise events detector for dynamic road traffic noise mapping in real-life urban and suburban environments. Sensors 17, 10 (2017).
[129]
Konstantin Sozinov, Vladimir Vlassov, and Sarunas Girdzijauskas. 2018. Human activity recognition using federated learning. In Proceedings of the 2018 IEEE International Conference on Parallel and Distributed Processing with Applications, Ubiquitous Computing and Communications, Big Data and Cloud Computing, Social Computing and Networking, Sustainable Computing and Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom), 1103–1111.
[130]
Dan Stowell, Dimitrios Giannoulis, Emmanouil Benetos, Mathieu Lagrange, and Mark Plumbley. 2015. Detection and classification of acoustic scenes and events. IEEE Transactions on Multimedia 17 (2015), 1733–1746.
[131]
Dan Stowell, Michael D. Wood, Hanna Pamuła, Yannis Stylianou, and Hervé Glotin. 2019. Automatic acoustic detection of birds through deep learning: The first Bird Audio Detection challenge. Methods in Ecology and Evolution 10, 3 (2019), 368–380.
[132]
Toshiyuki Sueyoshi and Gopalakrishna Tadiparthi. 2008. An agent-based decision support system for wholesale electricity market. Decision Support Systems 44 (2008), 425–446.
[133]
Heung-Il Suk, Seong-Whan Lee, and Dinggang Shen. 2017. Deep ensemble learning of sparse regression models for brain disease diagnosis. Medical Image Analysis 37 (2017), 101–113.
[134]
Mingxing Tan and Quoc V. Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, Long Beach, 9–15 June 2019, 6105–6114. Retrieved from http://proceedings.mlr.press/v97/tan19a.html
[135]
Yihe Tang, Weifeng Chen, Yijun Luo, and Yuting Zhang. 2021. Humble teachers teach better students for semi-supervised object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR ’21), virtual, Computer Vision Foundation/IEEE, 3132–3141.
[136]
Antti Tarvainen and Harri Valpola. 2017. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017. Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.), 1195–1204. Retrieved from https://proceedings.neurips.cc/paper/2017/hash/68053af2923e00204c3ca7c6a3150cf7-Abstract.html
[137]
P. R. J. Tillotson, Q. H. Wu, and P. M. Hughes. 2000. Multi-agent learning for control of Internet traffic routing. IEE Colloquium (Digest), 4/1–4/6.
[138]
Nicolas Turpault, Romain Serizel, Justin Salamon, and Ankit Parag Shah. 2019. Sound event detection in domestic environments with weakly labeled data and soundscape synthesis. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events 2019 (DCASE ’19). New York University. Michael I. Mandel, Justin Salamon, and Daniel P. W. Ellis (Eds.), 253–257.
[139]
Andrew Varga and Herman J. M. Steeneken. 1993. Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Speech Communication 12, 3 (1993), 247–251.
[140]
Jozef Vavrek, Jozef Juhár, and Anton Cizmar. 2013. Audio classification utilizing a rule-based approach and the support vector machine classifier. In Proceedings of the 2013 36th International Conference on Telecommunications and Signal Processing, 512–516.
[141]
Vikas Verma, Kenji Kawaguchi, Alex Lamb, Juho Kannala, Arno Solin, Yoshua Bengio, and David Lopez-Paz. 2022. Interpolation consistency training for semi-supervised learning. Neural Networks 145 (2022), 90–106.
[142]
Fabio Vesperini, Leonardo Gabrielli, Emanuele Principi, and Stefano Squartini. 2019. Polyphonic sound event detection by using capsule neural networks. IEEE Journal of Selected Topics in Signal Processing 13, 2 (2019), 310–322.
[143]
Tuomas Virtanen, Mark Plumbley, and Dan Ellis. 2017. Computational Analysis of Sound Scenes and Events. Springer International Publishing, 1–422.
[144]
Weiguo Wan and Hyo Jong Lee. 2019. Generative adversarial multi-task learning for face sketch synthesis and recognition. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), 4065–4069.
[145]
Lin Wang and Kuk-Jin Yoon. 2022. Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 6 (2022), 3048–3068.
[146]
Yih-Wen Wang, Chia-Ping Chen, Chung-Li Lu, and Bo-Cheng Chan. 2021. Semi-supervised sound event detection using self-attention and multiple techniques of consistency training. In Proceedings of the 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 269–274.
[147]
Xianjun Xia, Roberto Togneri, Ferdous Sohel, Yuanjun Zhao, and Defeng Huang. 2019. A survey: Neural network-based deep learning for acoustic event detection. Circuits, Systems, and Signal Processing 38, 8 (2019), 3433–3453.
[148]
Yingda Xia, Dong Yang, Zhiding Yu, Fengze Liu, Jinzheng Cai, Lequan Yu, Zhuotun Zhu, Daguang Xu, Alan L. Yuille, and Holger Roth. 2020. Uncertainty-aware multi-view co-training for semi-supervised medical image segmentation and domain adaptation. Retrieved May 18, 2023 from https://pubmed.ncbi.nlm.nih.gov/32623276/
[149]
Mengde Xu, Zheng Zhang, Han Hu, Jianfeng Wang, Lijuan Wang, Fangyun Wei, Xiang Bai, and Zicheng Liu. 2021. End-to-end semi-supervised object detection with soft teacher. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 3060–3069.
[150]
Ruihang Xue, Xueru Bai, and Feng Zhou. 2021. Spatial–temporal ensemble convolution for sequence SAR target classification. IEEE Transactions on Geoscience and Remote Sensing 59, 2 (2021), 1250–1262.
[151]
Xiangli Yang, Zixing Song, Irwin King, and Zenglin Xu. 2021. A survey on deep semi-supervised learning. IEEE Transactions on Knowledge and Data Engineering 35, 9 (2023), 8934–8954.
[152]
Junliang Yu, Hongzhi Yin, Min Gao, Xin Xia, Xiangliang Zhang, and Nguyen Quoc Viet Hung. 2021. Socially-aware self-supervised tri-training for recommendation. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2084–2092.
[153]
Giovanni Zambon, Chiara Confalonieri, H. Eduardo Roman, Fabio Angelini, and Roberto Benocci. 2020. Achievements of dynamap project. In Proceedings of the Forum Acusticum, 1685–1690.
[154]
Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. 2018. MixUp: Beyond empirical risk minimization. In Proceedings of the 6th International Conference on Learning Representations (ICLR ’18) - Conference Track.
[155]
Keming Zhang, Yuanwen Cai, Yuan Ren, Ruida Ye, Xianwei Zhang, and Liang He. 2021. Multi-scale convolutional recurrent neural network and data augmentation for polyphonic sound event detection. Journal of Physics: Conference Series 1769 (2021), 12008.
[156]
Xiao-Lei Zhang and DeLiang Wang. 2016. A deep ensemble learning method for monaural speech separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing 24, 5 (2016), 967–977.
[157]
Ying Zhang, Tao Xiang, Timothy M. Hospedales, and Huchuan Lu. 2017. Deep mutual learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). arXiv:1706.00384. Retrieved from http://arxiv.org/abs/1706.00384
[158]
Xu Zheng, Yan Song, Jie Yan, Li-Rong Dai, Ian McLoughlin, and Lin Liu. 2020. An effective perturbation based semi-supervised learning method for sound event detection. In Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, ISCA. Helen Meng, Bo Xu, and Thomas Fang Zheng (Eds.), 841–845.
[159]
Jie Zhou, Ganqu Cui, Shengding Hu, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, and Maosong Sun. 2020. Graph neural networks: A review of methods and applications. AI Open 1 (2020), 57–81.
[160]
Xi Zhou, Xiaodan Zhuang, Ming Liu, Hao Tang, and Mark Hasegawa-Johnson. 2007. HMM-based acoustic event detection with AdaBoost feature selection. In Proceedings of the Multimodal Technologies for Perception of Humans: International Evaluation Workshops CLEAR 2007 and RT 2007, Lecture Notes in Computer Science, Vol. 4625, 345–353.
[161]
Zhi-Hua Zhou and Ming Li. 2005. Tri-training: Exploiting unlabeled data using three classifiers. IEEE Transactions on Knowledge and Data Engineering 17, 11 (2005), 1529–1541.
[162]
Michael Zhu and Suyog Gupta. 2018. To prune, or not to prune: Exploring the efficacy of pruning for model compression. In Proceedings of the 6th International Conference on Learning Representations (ICLR ’18), Workshop Track Proceedings.
[163]
Xiaodan Zhuang, Jing Huang, Gerasimos Potamianos, and Mark Hasegawa-Johnson. 2009. Acoustic fall detection using Gaussian mixture models and GMM supervectors. In Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 69–72.
[164]
Xiaodan Zhuang, Xi Zhou, Mark A. Hasegawa-Johnson, and Thomas S. Huang. 2010. Real-world acoustic event detection. Pattern Recognition Letters 31, 12 (2010), 1543–1551.
[165]
Matthias Zöhrer and Franz Pernkopf. 2016. Gated recurrent networks applied to acoustic scene classification. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE ’16). Tuomas Virtanen, Annamaria Mesaros, Toni Heittola, Mark D. Plumbley, Peter Foster, Emmanouil Benetos, and Mathieu Lagrange (Eds.), 115–119.
[166]
Matthias Zöhrer and Franz Pernkopf. 2017. Virtual adversarial training and data augmentation for acoustic event detection with gated recurrent neural networks. In Proceedings of Interspeech 2017, 18th Annual Conference of the International Speech Communication Association. ISCA. Francisco Lacerda (Ed.), 493–497.

      Published In

      ACM Transactions on Intelligent Systems and Technology, Volume 15, Issue 5
      October 2024
      719 pages
      EISSN: 2157-6912
      DOI: 10.1145/3613688
      Editor: Huan Liu

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 17 October 2024
      Online AM: 23 April 2024
      Accepted: 07 April 2024
      Revised: 07 July 2023
      Received: 07 July 2023
      Published in TIST Volume 15, Issue 5

      Author Tags

      1. Polyphonic sound event detection
      2. Teacher-Student framework
      3. Semi-supervised learning
