A Survey of Multimodal Sarcasm Detection
Shafkat Farabi, Tharindu Ranasinghe, Diptesh Kanojia, Yu Kong, Marcos Zampieri
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Survey Track. Pages 8020-8028.
https://doi.org/10.24963/ijcai.2024/887
Sarcasm is a rhetorical device that is used to convey the opposite of the literal meaning of an utterance. Sarcasm is widely used on social media and other forms of computer-mediated communication, motivating the use of computational models to identify it automatically. While the clear majority of approaches to sarcasm detection operate on text only, sarcasm detection often requires additional information present in tonality, facial expression, and contextual images. This has led to the introduction of multimodal models, which make it possible to detect sarcasm across multiple modalities such as audio, images, text, and video. In this paper, we present the first comprehensive survey on multimodal sarcasm detection (henceforth MSD) to date. We survey papers published between 2018 and 2023 on the topic and discuss the models and datasets used for this task. We also present future research directions in MSD.
Keywords:
Machine Learning: ML: Multi-modal learning
Machine Learning: General
Natural Language Processing: General
Natural Language Processing: NLP: Applications
Natural Language Processing: NLP: Sentiment analysis, stylistic analysis, and argument mining
Natural Language Processing: NLP: Speech
Natural Language Processing: NLP: Text classification