{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T04:26:06Z","timestamp":1773807966647,"version":"3.50.1"},"reference-count":0,"publisher":"Association for the Advancement of Artificial Intelligence (AAAI)","issue":"41","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AAAI"],"abstract":"<jats:p>Multimodal sarcasm detection (MSD) aims to identify sarcasm polarity from diverse modalities (i.e., image\u2013text pairs), a task that has received increasing attention. While significant progress has been made, existing approaches still face two major issues: lack of explainability and weak generalizability. In this paper, we introduce a new large vision\u2013language model (LVLM) dubbed S\u00b3-MSD for explainable and generalizable MSD through three key components. For explainability, we develop (1) a self-training paradigm that automatically bootstraps answers with explanations, and (2) a self-calibrating mechanism that rectifies flawed explanations. For generalizability, we design (3) a self-focusing module that amplifies visual semantic entities through preference optimization, thereby mitigating textual over-reliance. Experimental results on both in-distribution and out-of-distribution (OOD) benchmarks demonstrate that S\u00b3-MSD consistently outperforms state-of-the-art methods in detection performance. Furthermore, the proposed S\u00b3-MSD provides persuasive explanations, as verified by both quantitative metrics and human evaluations.<\/jats:p>","DOI":"10.1609\/aaai.v40i41.40834","type":"journal-article","created":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T03:27:49Z","timestamp":1773804469000},"page":"35266-35274","source":"Crossref","is-referenced-by-count":0,"title":["S\u00b3-MSD: Large Vision-Language Model for Explainable and Generalizable Multi-modal Sarcasm Detection"],"prefix":"10.1609","volume":"40",
"author":[{"given":"Zhihong","family":"Zhu","sequence":"first","affiliation":[]},{"given":"Fan","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Yunyan","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Jinghan","family":"Sun","sequence":"additional","affiliation":[]},{"given":"Guimin","family":"Hu","sequence":"additional","affiliation":[]},{"given":"Hao","family":"Wu","sequence":"additional","affiliation":[]},{"given":"Yuyan","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Bowen","family":"Xing","sequence":"additional","affiliation":[]},{"given":"Xian","family":"Wu","sequence":"additional","affiliation":[]}],"member":"9382","published-online":{"date-parts":[[2026,3,14]]},"container-title":["Proceedings of the AAAI Conference on Artificial Intelligence"],"original-title":[],
"link":[{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/40834\/44795","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/40834\/44795","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T03:27:51Z","timestamp":1773804471000},"score":1,"resource":{"primary":{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/40834"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,14]]},"references-count":0,"journal-issue":{"issue":"41","published-online":{"date-parts":[[2026,3,17]]}},"URL":"https:\/\/doi.org\/10.1609\/aaai.v40i41.40834","relation":{},"ISSN":["2374-3468","2159-5399"],"issn-type":[{"value":"2374-3468","type":"electronic"},{"value":"2159-5399","type":"print"}],"subject":[],"published":{"date-parts":[[2026,3,14]]}}}