33rd MM 2025: Dublin, Ireland
- Cathal Gurrin, Klaus Schoeffmann, Min Zhang, Luca Rossetto, Stevan Rudinac, Duc-Tien Dang-Nguyen, Wen-Huang Cheng, Phoebe Chen, Jenny Benois-Pineau:
Proceedings of the 33rd ACM International Conference on Multimedia, MM 2025, Dublin, Ireland, October 27-31, 2025. ACM 2025, ISBN 979-8-4007-2035-2
Keynote Talks
- Shalini De Mello:
AI-Mediated Human Interaction. 1
- Tat-Seng Chua:
Next Phase of Research on Multimodal Foundation Models: From Alignments to Content Generation and Quality Assessment. 2
- Steve Hodges:
SenseCam and Isotyping: The Challenges and Benefits of Working with New Hardware. 3-4
Content: Media Interpretation
- Haolun Li, Weihuang Liu, Jiateng Liu, Zhenhua Tang, Chi-Man Pun, Qiguang Miao, Feng Xu, Hao Gao:
MotionRefineNet: Fine-Grained Pose Sequence Smoothing and Refinement. 5-14
- Mo Yang, Luo Chen, Jiali Zhou:
Change-UP: Advancing Visualization and Inference Capability for Multi-level Remote Sensing Change Interpretation. 15-24
- Yuxiang Zhao, Wei Huang, Haipeng Zeng, Huan Zhao, Yujie Song:
Cross Time Domain Intention Interaction for Conditional Trajectory Prediction. 25-33
- Ye-Chan Kim, SeungJu Cha, Si-Woo Kim, Taewhan Kim, Dong-Jin Kim:
SIDA: Synthetic Image Driven Zero-shot Domain Adaptation. 34-42
- Han Hu, Wenli Du, Bing Wang:
Efficient Video Anomaly Detection via Scene-Dependent Memory Assisted Inter-Frame RGB Difference Reconstruction. 43-51
- Hyungjun Doh, Dong In Lee, Seunggeun Chi, Pin-Hao Huang, Kwonjoon Lee, Sangpil Kim, Karthik Ramani:
Occlusion-Aware Temporally Consistent Amodal Completion for 3D Human-Object Interaction Reconstruction. 52-61
- Guoyi Li, Die Hu, Haozhe Li, Qirui Tang, Xiaomeng Fu, Yulei Wu, Xiaodan Zhang, Honglei Lyu:
Zero-Shot Multimodal Fact-Checking with Conceptual Reasoning. 62-71
- Junyu Zhou, Yuyang Huang, Wenrui Dai, Junni Zou, Ziyang Zheng, Nuowen Kan, Chenglin Li, Hongkai Xiong:
3DGabSplat: 3D Gabor Splatting for Frequency-adaptive Radiance Field Rendering. 72-81
- Songze Li, Yunfei Guo, Shen Chen, Bin Li, Kaiqing Lin, Changsheng Chen, Haodong Li, Taiping Yao, Shouhong Ding:
DITL2: Dual-Stage Invariance Transfer Learning for Generalizable Document Image Tampering Localization. 82-91
- Rouqi Zhang, Chengdi Lu, Hancheng Lu, Yang Cao, Tiesong Zhao:
RobustVisH: Robust Visual-Haptic Cross-Modal Recognition under Transmission Interference. 92-100
- Zhangchi Hu, Peixi Wu, Jie Chen, Huyue Zhu, Yijun Wang, Yansong Peng, Hebei Li, Xiaoyan Sun:
Dome-DETR: DETR with Density-Oriented Feature-Query Manipulation for Efficient Tiny Object Detection. 101-110
- Xiaojian Lin, Wenxin Zhang, Yuchu Jiang, Wangyu Wu, Yiran Guo, Kangxu Wang, Zongzheng Zhang, Guijin Wang, Lei Jin, Hao Zhao:
Butter: Frequency Consistency and Hierarchical Fusion for Autonomous Driving Object Detection. 111-120
- Xinkui Lin, Yongxiu Xu, Minghao Tang, Shilong Zhang, Hongbo Xu, Hao Xu, Yubin Wang:
REMOTE: A Unified Multimodal Relation Extraction Framework with Multilevel Optimal Transport and Mixture-of-Experts. 121-130
- Xiaoran Xu, Jiangang Yang, Wenyue Chong, Wenhui Shi, Shichu Sun, Jing Xing, Jian Liu:
Boosting Single-Domain Generalized Object Detection via Vision-Language Knowledge Interaction. 131-140
- Shaohua Liu, Ning Gao, Zuoya Gu, Hongkun Dou, Yue Deng, Hongjue Li:
Spatiotemporal Degradation-Aware 3D Gaussian Splatting for Realistic Underwater Scene Reconstruction. 141-150
- Tianyi Ma, Maoying Qiao:
EBaR: Efficient Buffer and Resetting for Single-Sample Continual Test-Time Adaptation. 151-160
- Wenzhe He, Xiaojun Chen, Wentang Chen, Hongyu Wang, Ying Liu, Ruihui Li:
RWKV-PCSSC: Exploring RWKV Model for Point Cloud Semantic Scene Completion. 161-170
- Ruian He, Zixian Zhang, Ri Cheng, Weimin Tan, Bo Yan:
Efficient Trajectory Space-Time Super-Resolution for Fast Live-cell Imaging. 171-179
- Hongzhao Li, Hualei Wan, Liangzhi Zhang, Mingyuan Jiu, Shupan Li, Mingliang Xu, Muhammad Haris Khan:
Towards Robust Multimodal Domain Generalization via Modality-Domain Joint Adversarial Training. 180-188
- Hongda Qin, Xiao Lu, Zhiyong Wei, Ningjiang Chen:
Object-Preserving Counterfactual Diffusion Augmentation for Single-Domain Generalized Object Detection. 189-198
- Yidong Chen, Qi Li, Yuyang Yang, Wen Li, Sheng Ao, Cheng Wang:
Unleashing the Power of Data Generation in One-Pass Outdoor LiDAR Localization. 199-208
- Wenli Zheng, Huiyuan Fu, Xicong Wang, Hao Kang, Chuanming Wang, Jin Liu, Zekai Xu, Heng Zhang, Huadong Ma:
EvRAW: Event-guided Structural and Color Modeling for RAW-to-sRGB Image Reconstruction. 209-218
- Zhaoxi Mu, Rilin Chen, Andong Li, Meng Yu, Xinyu Yang, Dong Yu:
From Continuous to Discrete: Cross-Domain Collaborative General Speech Enhancement via Hierarchical Language Models. 219-228
- Jin Han, Yixin Yang, Zhan Zhan, Boxin Shi, Imari Sato:
EDeF-Net: Spatio-temporal Association Network for Flicker Removal in Event Streams. 229-237
- Jinxiang Lai, Wenlong Wu, Jiawei Zhan, Jian Li, Bin-Bin Gao, Jun Liu, Jie Zhang, Song Guo:
BoxSeg: Quality-Aware and Peer-Assisted Learning for Box-supervised Instance Segmentation. 238-246
- Jiaxu Li, Rui Li, Jianyu Qi, Songning Lai, Linpu Lv, Kejia Fan, Jianheng Tang, Yutao Yue, Dongzhan Zhou, Yunhuai Liu, Huiping Zhuang:
CFSSeg: Closed-Form Solution for Class-Incremental Semantic Segmentation of 2D Images and 3D Point Clouds. 247-256
- Trong-Thang Pham, Anh Nguyen, Zhigang Deng, Carol C. Wu, Hien Nguyen, Ngan Le:
Interpreting Radiologist's Intention from Eye Movements in Chest X-ray Diagnosis. 257-266
- Mingliang Zhai, Yiheng Wang, Haidong Hu, Chi-Man Pun, Hao Gao:
FGRFlow: Learning Fine-Grained Rigidity Scene Flow from 4D Radar Point Cloud. 267-276
- Xiaoyu Zhang, Zhifeng Bao, Hai Dong, Ziwei Wang, Jiajun Liu:
Querying Autonomous Vehicle Point Clouds: Enhanced by 3D Object Counting with CounterNet. 277-285
- Guiping Cao, Xiangyuan Lan, Wenjian Huang, Jianguo Zhang, Dongmei Jiang, Yaowei Wang:
DS-Det: Single-Query Paradigm and Attention Disentangled Learning for Flexible Object Detection. 286-295
- Zhen Wang, Dongyuan Li, Yaozu Wu, Peide Zhu, Shiyin Tan, Renhe Jiang:
Video-based Transparent Object Segmentation via Temporal Feature Aggregation. 296-304
- Haosheng Cai, Yang Xue:
G2LFormer: Global-to-Local Query Enhancement for Robust Table Structure Recognition. 305-314
- Xinyi Hu, Yuran Wang, Ruixu Zhang, Yue Li, Wenxuan Liu, Zheng Wang:
SPAN: Continuous Modeling of Suspicion Progression for Temporal Intention Localization. 315-323
- Tianyi Zhang, Qinglong Lin, Yang Hu, Pengming Feng, Rubo Zhang:
Edge-aware Affinity Enhancement for Image Manipulation Localization. 324-332
- Kanglin Qu, Pan Gao, Qun Dai, Yuanhao Sun:
HydraMamba: Multi-Head State Space Model for Global Point Cloud Learning. 333-342
- Runmin Cong, Zongji Yu, Hao Fang, Haoyan Sun, Sam Kwong:
UIS-Mamba: Exploring Mamba for Underwater Instance Segmentation via Dynamic Tree Scan and Hidden State Weaken. 343-352
- Kuo Shi, Jie Lu, Shanshan Ye, Guangquan Zhang, Zhen Fang:
MiraGe: Multimodal Discriminative Representation Learning for Generalizable AI-Generated Image Detection. 353-361
- Runtian Yuan, Mohan Chen, Jilan Xu, Ling Zhou, Qingqiu Li, Yuejie Zhang, Rui Feng, Tao Zhang, Shang Gao:
Text-Promptable Propagation for Referring Medical Image Sequence Segmentation. 362-371
- Dunwei Tu, Huiyu Yi, Yuchi Wang, Baile Xu, Jian Zhao, Furao Shen:
Multiple Queries with Multiple Keys: A Precise Prompt Matching Paradigm for Prompt-based Continual Learning. 372-381
- Zihou Zhang, Hao Li, Zhengwei Yang, Zechao Hu, Liang Li, Zheng Wang:
From Language to Instance: Generative Visual Prompting for Zero-shot Camouflaged Object Detection. 382-391
- Chen Cai, Tianyi Liu, Jianjun Gao, Wenyang Liu, Kejun Wu, Ruoyu Wang, Yi Wang, Soo Chin Liew:
From Semantics, Scene to Instance-awareness: Distilling Foundation Model for Open-vocabulary Grounded Situation Recognition. 392-401
- Hanyu Guo, Suzhou Que, Junlong Gao, Hanzi Wang:
TFPA: Text Features Guided Dynamic Parameter Adjustment for Few Shot Action Recognition. 402-411
- Jitong Liao, Yulu Gao, Shaofei Huang, Jialin Gao, Jie Lei, Ronghua Liang, Si Liu:
DOMR: Establishing Cross-View Segmentation via Dense Object Matching. 412-421
- Yue Guo, Haoxiang Liao, Haibin Ling, Bingyao Huang:
NeuroPump: Simultaneous Geometric and Color Rectification for Underwater Images. 422-431
- Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Lei Liang, Wen Zhang, Huajun Chen:
Client-Server Co-design with Multi-modal Codebooks Makes Better and Faster Federate Knowledge Sharing. 432-440
- Bo Wang, Jin Liu, Huiyuan Fu, Xin Wang, Heng Zhang, Huadong Ma:
Severe Light, Textureless Sight: A Benchmark for Extreme Exposure Correction. 441-449
- Zhicheng Lian, Lizhi Wang, Hua Huang:
APG-MOS: Auditory Perception Guided-MOS Predictor for Synthetic Speech. 450-459
- Zhaoyu Chen, Qian Huang, Xing Li, Yunfei Zhang, Shihao Han, Ge Gao, Yirui Wu, Xin Li, Ziyang Yin:
Geo-CF2Net: Geometry-Prior Cross-Frequency Interactive Fusion Network for 3D Human Action Recognition. 460-469
- Naisong Luo, Yuan Wang, Yuwen Pan, Rui Sun:
Focus on the Object: Gradient-based Feature Modulation for Camouflaged Object Segmentation. 470-478
- Liuyi Li, Feng Shi, Jian Wang, Jinjing Zhu, Wenze Shao:
An Event-tailored State-Space Based Model for Pedestrian Detection. 479-488
- Zhihong Zheng, Yang Cao, Junlong Gao, Hanzi Wang:
OV-VOD: Open-Vocabulary Video Object Detection. 489-498
- Yin Wang, Zixuan Wang, Hao Lu, Zhen Qin, Hailiang Zhao, Guanjie Cheng, Xin Du, Ge Su, Li Kuang, Mengchu Zhou, Shuiguang Deng:
SeMi: When Imbalanced Semi-Supervised Learning Meets Mining Hard Examples. 499-507
- Kuiye Ding, Fanda Fan, Yao Wang, Ruijie Jian, Xiaorui Wang, Luqi Gong, Yishan Jiang, Chunjie Luo, Jianfeng Zhan:
DualSG: A Dual-Stream Explicit Semantic-Guided Multivariate Time Series Forecasting Framework. 508-517
- Quanmin Liang, Jinyi Lu, Qiang Li, Shuai Liu, Zhihao Zhao, Yinzheng Zhao, Wei Zhang, Kai Huang, Yonghong Tian:
ESOD: Event-Based Small Object Detection. 518-527
- Michael Kohl, Tobias Wursthorn, Christof Weiß:
Cross-Modal Metrics for Capturing Correspondences Between Music Audio and Stage Lighting Signals. 528-534
- Yingbing Liu, Fei Ma, Yanan Wu, Xinxin Zuo, Fan Zhang, Yang Wang:
Collaborative Cloud-edge Generalized Category Discovery. 535-543
- Ping Li, Chenhao Ping, Wenxiao Wang, Mingli Song:
Sample-level Adaptive Knowledge Distillation for Action Recognition. 544-552
- Jiale Yu, Baopeng Zhang, Zhu Teng, Jianping Fan:
OV-DAVEL: Towards Open-Vocabulary Dense Audio-Visual Event Localization in Untrimmed Videos. 553-562
- Jie Fu, Bingkun Bao:
Retaining Temporal Semantics and Relation Topologies for Continual Weakly-Supervised Audio-Visual Video Parsing. 563-572
- Xiaofeng Liu, Guanchen Meng, Chongyang Feng, Risheng Liu, Zhongxuan Luo, Xin Fan:
TNT-GS: Truncated and Tailored Gaussian Splatting. 573-581
- Pengfei Cai, Yan Song, Qing Gu, Nan Jiang, Haoyu Song, Ian McLoughlin:
Detect Any Sound: Open-Vocabulary Sound Event Detection with Multi-Modal Queries. 582-591
- Zhaolin Cai, Fan Li, Ziwei Zheng, Yanjun Qin:
HiProbe-VAD: Video Anomaly Detection via Hidden States Probing in Tuning-Free Multimodal LLMs. 592-601
- Guanchun Wang, Xiangrong Zhang, Yifei Zhang, Zelin Peng, Tianyang Zhang, Xu Tang, Licheng Jiao:
ACMamba: Fast Unsupervised Anomaly Detection via An Asymmetrical Consensus State Space Model. 602-611
- Jian Zhou, Yingjie Xie, Cunhang Fan, Huabin Wang, Zhao Lv, Liang Tao:
DHGCN: Dual HyperGraph Convolutional Network for EEG-Based Auditory Attention Detection. 612-620
- Peiqi Jiang, Bohan Lei, Yuhao Sun, Lingyun Yu, Zhineng Chen, Hongtao Xie, Yongdong Zhang:
Proactive Deepfake Detection via Self-Verifiable Semantic Watermarking. 621-630
- Yuzhen Li, Yuehui Han, Jianjun Qian, Jian Yang:
Self-Supervised Vision Graph Neural Networks Based on Contrastive Learning. 631-640
- Luosheng Xu, Dalin Zhang, Zhaohui Song:
Pushing Trade-Off Boundaries: Compact yet Effective Remote Sensing Change Detection. 641-649
- Chenglong Sun, Shijie Pang, Yuzheng Wang, Lizhe Qi:
RWKV3D: An RWKV-Based Model with Multiple Training Strategies for Point Cloud Analysis. 650-659
- Jinghan Liu, Xingmei Wang, Jiaxiang Meng:
Adaspeaker: Learning Discriminative Speaker Representations with Gradient-Aware Adaptive Scaling. 660-668
- Wenpeng Lang, Saihui Hou, Yongzhen Huang:
Beyond Sparse Keypoints: Dense Pose Modeling for Robust Gait Recognition. 669-678
- Jinwen Wang, Youfang Lin, Xiaobo Hu, Siyu Yang, Sheng Han, Shuo Wang, Kai Lv:
From Pixels to Temporal Correlations: Learning Informative Representations for Reinforcement Learning Pre-training. 679-688
- Yaoxun Xu, Hangting Chen, Jianwei Yu, Wei Tan, Shun Lei, Zhiwei Lin, Rongzhi Gu, Zhiyong Wu:
MuCodec: Ultra Low-Bitrate Music Codec for Music Generation. 689-698
- Chi Huang, Qi Zhang, Qian Zhang, Nan Li, Yipu Gong, Xiaowei Wang, Wei Feng:
TriGS: Tri-consistency 3D Gaussian Splatting from Sparse and Unposed Views. 699-708
- Xuedong He, Huiying Xu, Xinzhong Zhu, Hongbo Li:
High-Performance Discriminative Tracking with Spatio-Temporal Template Fusion. 709-718
- Jingdong Zhang, Hanrong Ye, Xin Li, Wenping Wang, Dan Xu:
Multi-Task Label Discovery via Hierarchical Task Tokens for Partially Annotated Dense Predictions. 719-728
- Jiaxi Wang, Yaosen Min, Xun Zhu, Miao Li, Ji Wu:
MIPS: A Multimodal Infinite Polymer Sequence Pre-training Framework for Polymer Property Prediction. 729-738
- Yuxuan Zhang, Bo Wang, Yu Du, Yangfu Zhu, Haorui Wang, Guangyao Su, Tao Zhou, Bin Wu:
Cause and Effect: Video Social Relationship Recognition from Causal Perspective. 739-747
- Mashiro Toyooka, Kiyoharu Aizawa, Yoko Yamakata:
A Highly Clean Recipe Dataset with Ingredient States Annotation for State Probing Task. 748-756
- Guitao Xu, Ziqi Yi, Peirong Zhang, Jiahuan Cao, Shihang Wu, Lianwen Jin:
From Pixels to Semantics: A Novel MLLM-Driven Approach for Explainable Tampered Text Detection. 757-766
- Yifan Wang, Yuntai Ding, Yiyang Gu, Ziyue Qiao, Chong Chen, Xian-Sheng Hua, Ming Zhang, Wei Ju:
Deep Graph Clustering with Disentangled Representation Learning. 767-776
- Han Li, Shaofei Huang, Longfei Xu, Yulu Gao, Beipeng Mu, Si Liu:
RATopo: Improving Lane Topology Reasoning via Redundancy Assignment. 777-786
- Sensen Wang, Yuehu Liu, Chi Zhang:
BiOMamba: Mamba-based Forward-Then-Backward Temporal Modeling for Online Action Detection and Anticipation. 787-795
- Xiangyu Zheng, Songcheng He, Wanyun Li, Xiaoqiang Li, Wei Zhang:
Shallow Features Matter: Hierarchical Memory with Heterogeneous Interaction for Unsupervised Video Object Segmentation. 796-805
- Xiaobo Liu, Henglu Wei, Chuxi Yang, Wei Yu, Xudong Zhao, Xiangyang Ji:
Camera-Specific Imaging Simulation for Raw Domain Image Super Resolution. 806-815
- Zongsheng Cao, Yangfan He, Anran Liu, Jun Xie, Zhepeng Wang, Feng Chen:
PurifyGen: A Risk-Discrimination and Semantic-Purification Model for Safe Text-to-Image Generation. 816-825
- Haonan Cheng, Junwei Zhang, Hengyan Huang, Long Ye:
FG-Midiformer: A Symbolic Music Understanding Model towards Fine-Grained Learning of Multi-Attributes. 826-835
- Yiran Meng, Junhong Ye, Wei Zhou, Guanghui Yue, Xudong Mao, Ruomei Wang, Baoquan Zhao:
VideoForest: Person-Anchored Hierarchical Reasoning for Cross-Video Question Answering. 836-845
- Guorui Song, Guocun Wang, Zhe Huang, Jing Lin, Xuefei Zhe, Jian Li, Haoqian Wang:
Towards Fine-Grained Human Motion Video Captioning. 846-855
Content: Multimodal Fusion
- Junpu Zhang, Shengju Yu, Suyuan Liu, Siwei Wang, Miaomiao Li, Xinwang Liu, En Zhu, Kunlun He:
Learning the Anchors with Similar Distributions to Original Data for Multi-view Clustering. 857-866
- Fengshun Wang, Qiurui Wang, Peilin Zhao:
Learning Long-Range Action Representation by Two-Stream Mamba Pyramid Network for Figure Skating Assessment. 867-875
- Yan Zhang, Gangyan Zeng, Daiqing Wu, Huawen Shen, Binbin Li, Yu Zhou, Can Ma, Xiaojun Bi:
Gather and Trace: Rethinking Video TextVQA from an Instance-oriented Perspective. 876-885
- Hui Zhang, Yiteng Xu, Yonglin Tian, Yidong Li, Tiago H. Falk, Fei-Yue Wang:
Selective Shift: Towards Personalized Domain Adaptation in Multi-Agent Collaborative Perception. 886-895
- Mingqian Ji, Jian Yang, Shanshan Zhang:
Enhancing Pseudo-Boxes via Data-Level LiDAR-Camera Fusion for Unsupervised 3D Object Detection. 896-904
- Gaoxiang Cong, Liang Li, Jiadong Pan, Zhedong Zhang, Amin Beheshti, Anton van den Hengel, Yuankai Qi, Qingming Huang:
FlowDubber: Movie Dubbing with LLM-based Semantic-aware Learning and Flow Matching based Voice Enhancing. 905-914
- Wenhui Wu, Guanqi Wen, Le Ou-Yang, Ran Wang, Sam Kwong:
DUIMC: Deep Unbalanced Incomplete Multi-View Clustering via Graph Constrained Imputation and Contrastive Learning. 915-924
- Hao Wang, Xiaobao Wei, Xiaoan Zhang, Jianing Li, Chengyu Bai, Ying Li, Ming Lu, Wenzhao Zheng, Shanghang Zhang:
EmbodiedOcc++: Boosting Embodied 3D Occupancy Prediction with Plane Regularization and Uncertainty Sampler. 925-934
- Zhongfan Sun, Kan Guo, Yongli Hu, Daxin Tian, Qingqing Gao, Jiapu Wang, Junbin Gao, Yanfeng Sun, Baocai Yin:
Large-Small Model Synergy with Multimodal Fine-Grained Heuristics for Knowledge-Based Visual Question Answering. 935-944
- Peng Chen, Xiaobao Wei, Qingpo Wuwu, Xinyi Wang, Xingyu Xiao, Ming Lu:
MixedGaussianAvatar: Realistically and Geometrically Accurate Head Avatar via Mixed 2D-3D Gaussians. 945-954
- Peiyuan Jiang, Yao Liu, Qiao Liu, Zongshun Zhang, Jiaye Yang, Lu Liu, Daibing Yao:
DRKF: Decoupled Representations with Knowledge Fusion for Multimodal Emotion Recognition. 955-964
- Tao Ling, Siping Shi, Dan Wang:
Accelerating Long Video Understanding via Compressed Scene Graph-Enabled Chain-of-Thought. 965-974
- Tong Chen, Bowen Du, Jiejie Zhao, Hanyang Xia, Haiquan Wang, Jiakai Wang:
BadMDA: Towards Backdoor Injection during Domain Adaptation to Collapse Multi-Agent Perception. 975-983
- Chen Gao, Youfang Lin, Wenbin Wang, Shuo Zhang:
Epipolar Consistency-based Network for Structure-Aware LF Semantic Segmentation. 984-992
- Jia-Xuan Jiang, Jiashuai Liu, Hongtao Wu, Yifeng Wu, Zhong Wang, Qi Bi, Yefeng Zheng:
Single Domain Generalization for Multimodal Cross-Cancer Prognosis via Dirac Rebalancer and Distribution Entanglement. 993-1002
- Yi Liu, Xinyi Liu, Yi Wan, Panwang Xia, Qiong Wu, Yongjun Zhang:
StereoINR: Cross-View Geometry Consistent Stereo Super Resolution with Implicit Neural Representation. 1003-1012
- Lanhu Wu, Zilin Gao, Hao Fei, Mong-Li Lee, Wynne Hsu:
LEAF-Mamba: Local Emphatic and Adaptive Fusion State Space Model for RGB-D Salient Object Detection. 1013-1022
- Min Li, Jinghui He, Jiachen Li, Delong Han, Jin Wan, Gang Li:
HGCF: Hierarchical Geometry-Color Fusion for Multimodal Industrial Anomaly Detection. 1023-1031
- Qiyuan Zhu, Lujun Li, Dezhi Li, Jiacheng Liu, Pengyu Cheng, Yucheng Xu, Sirui Han, Yike Guo:
Outlier-Aware Model Merging for Efficient Multitask Inference. 1032-1041
- Zhenyang Liu, Sixiao Zheng, Siyu Chen, Cairong Zhao, Longfei Liang, Xiangyang Xue, Yanwei Fu:
A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding. 1042-1051
- Jinbao Wei, Yuhang Chen, Zhijie Wang, Gang Yang, Shimin Tao, Jian Gao, Aiping Liu, Xun Chen:
Rethinking Diffusion Bridge Model with Dual Alignments for Medical Image Synthesis. 1052-1061
- Haichuan Fang, Haoran Zhang, Yulin Du, Qiang Guo, Zhen Tian, Youwei Wang, Yangdong Ye:
CDIB: Consistency Discovery-guided Information Bottleneck for Multi-modal Knowledge Graph Reasoning. 1062-1071
- Yalan Qin, Nan Pu, Hanzhou Wu, Zhaoxin Fan:
Flexible Multi-view Clustering with Dynamic Views Generation. 1072-1081
- Zheng Guan, Xue Wang, Wenhua Qian, Peng Liu, Runzhuo Ma:
Residual Prior-driven Frequency-aware Network for Image Fusion. 1082-1091
- Mulin Chen, Bocheng Wang, Jiaxin Zhong, Zongcheng Miao, Xuelong Li:
Clustering-Oriented Generative Attribute Graph Imputation. 1092-1101
- Taichun Zhou, Zhibin Dong, Siwei Wang, Ke Liang, Miaomiao Li, Xinwang Liu, En Zhu, Xiangjun Dong:
DPFMVC: Dynamic Progressive Fusion for Multi-view Clustering. 1102-1111
- Runlin Yu, Yipu Gong, Wenrui Li, Aiwen Sun, Mengren Zheng:
Discrepancy-Aware Attention Network for Enhanced Audio-Visual Generalized Zero-Shot Learning. 1112-1121
- Ziming Quan, Penglei Wang, Danyang Wu, Jin Xu:
Unsupervised Cross-view Message Passing Method for Multi-view Graph Clustering. 1122-1131
- Mingrui Li, Dong Li, Sijia Hu, Kangxu Wang, Zhenjun Zhao, Hongyu Wang:
SLAM-X: Generalizable Dynamic Removal for NeRF and Gaussian Splatting SLAM. 1132-1140
- Jinjia Peng, Tianhang Cheng, Guangqi Jiang, Huibing Wang:
Prior-oriented Anchor Learning with Coalesced Semantics for Multi-View Clustering. 1141-1150
- Hao Wang, Hanxiao Li, Li Xu:
CrosST: Cross Swin 4D Transformer for Multi-Modal Alzheimer's Detection. 1151-1160
- Binbin Zheng, Aiqiu Wu, Kai Fan, Ao Li, Minghui Wang:
Domain-Specific Interactive Prompting for Generalized Nuclei Classification. 1161-1170
- Shaochen Zhang, Zekun Qi, Runpei Dong, Xiuxiu Bai, Xing Wei:
Positional Prompt Tuning for Efficient 3D Representation Learning. 1171-1180
- Zhicheng Dong, Xiaodong Yue, Yufei Chen, Yuxian Zhou:
Trusted Open-World Multi-View Classification with Dynamic Opinion Aggregation. 1181-1189
- Zihan Wang, Yunhang Shen, Yuan Fang, Zuwei Long, Ke Li, Xing Sun, Jiao Xie, Shaohui Lin:
Towards Universal Perception through Language-Guided Open-World Object Detection. 1190-1199
- Junyu Chen, Jiawei Peng, Yuan Sun, Jian Dai, Xingfeng Li, Zhenwen Ren:
Scalable Unpaired Multi-View Clustering via Anchor-Driven High-Throughput Encoding. 1200-1209
- Zeyan Li, Cankun Guo, Yin Tang:
Modal Symbiosis: Variational Alignment Unveils New Horizons in Multimodal Representation Learning. 1210-1219
- Zihan Fang, Zhiyong Xu, Lan Du, Shide Du, Zhiling Cai, Shiping Wang:
Enhancing Multi-view Open-set Learning via Ambiguity Uncertainty Calibration and View-wise Debiasing. 1220-1228
- Zhangyong Tang, Tianyang Xu, Xuefeng Zhu, Chunyang Cheng, Tao Zhou, Xiaojun Wu, Josef Kittler:
Serial Over Parallel: Learning Continual Unification for Multi-Modal Visual Object Tracking and Benchmarking. 1229-1238
- Weiqi Liu, Yongshan Zhang, Xinxin Wang, Lefei Zhang:
Deep Multi-Level Contrastive Clustering for Multi-Modal Remote Sensing Images. 1239-1247
- Jiaqi Cui, Yilun Li, Xi Wu, Jiliu Zhou, Yan Wang:
PREMISE: Individual Preference-aware Multi-modal Cooperation for Survival Prediction. 1248-1257
- Jiaxing Qi, Yifan Xu, Zhifei Yang, Ruifei Ma, Chao Zhang, Kuifei Yu:
BridgeGLM: Bridging Graph and Language Spaces for Domain Generalization. 1258-1267
- Yating Liu, Yang Zou, Xingyuan Li, Xingyue Zhu, Kaiqi Han, Zhiying Jiang, Long Ma, Jinyuan Liu:
Toward a Training-Free Plug-and-Play Refinement Framework for Infrared and Visible Image Registration and Fusion. 1268-1277
- Cai Xu, Ziqi Wen, Jie Zhao, Wanqing Zhao, Jinlong Yu, Haishun Chen, Ziyu Guan, Wei Zhao:
Beyond Equal Views: Strength-Adaptive Evidential Multi-View Learning. 1278-1287
- Yoorhim Cho, Hongyeob Kim, Semin Kim, Youjia Zhang, Yunseok Choi, Sungeun Hong:
RA-Touch: Retrieval-Augmented Touch Understanding with Enriched Visual Data. 1288-1297
- Xinlei Yu, Changmiao Wang, Hui Jin, Ahmed Elazab, Gangyong Jia, Xiang Wan, Changqing Zou, Ruiquan Ge:
CRISP-SAM2: SAM2 with Cross-Modal Interaction and Semantic Prompting for Multi-Organ Segmentation. 1298-1307
- Bingyu Li, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li:
StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation. 1308-1317
- Liang Zhao, Shubin Ma, Bo Xu, Qingchen Zhang:
Dual-Learning based Penalized Multi-Align Clustering for Multi-View Incomplete and Disorderly Data. 1318-1326
- Jialei Cui, Jianwei Du, Yanzhe Li, Lei Gao, Hui Jiang, Chenfu Bao:
HAMLET-FFD: Hierarchical Adaptive Multi-modal Learning Embeddings Transformation for Face Forgery Detection. 1327-1336
- Disen Hu, Xun Jiang, Zhe Sun, Hao Yang, Chong Peng, Peng Yan, Heng Tao Shen, Xing Xu:
Geometric Gradient Divergence Modulation for Imbalanced Multimodal Learning. 1337-1345
- Xuanming Jiang, Baoyi An, Zhengwei Zou, Dingyu Nie, Jialie Shen, Xueming Qian, Guoshuai Zhao:
Ear with Eye: Lightweight Multimodal Audio-Visual Network Inspired by Bionic Structures. 1346-1355
- Chengzhou Li, Xiaokang Liu, Qi Jia, Jinyuan Liu, Zhiying Jiang, Longhan Feng, Yu Liu, Zhongxuan Luo, Xin Fan:
Physics-Guided Sonar Image Fine-grained Recognition under Scarce Annotations. 1356-1365
- Mianzimei Yang, Zhipeng Zhou, Jin Zhang, Yuanhao Pu, Hong Xie, Defu Lian:
Conflict-Buffering Optimization by Symmetry Teleportation for Deep Long-Tailed Recognition. 1366-1375
- Jiahao Wang, Fang Liu, Licheng Jiao, Hao Wang, Shuo Li, Lingling Li, Puhua Chen, Xu Liu, Xinyi Wang:
FA3T: Feature-Aware Adversarial Attacks for Multi-modal Tracking. 1376-1385
- Zhiwei Zhang, Ruikai Xu, Weijian Zhang, Zhizhong Zhang, Xin Tan, Jingyu Gong, Yuan Xie, Lizhuang Ma:
PFDepth: Heterogeneous Pinhole-Fisheye Joint Depth Estimation via Distortion-aware Gaussian-Splatted Volumetric Fusion. 1386-1394
- Siyuan Zhang, Xiaoping Wang, Jiang Li, Weibin Feng, Xin Zhan, Hongzhi Huang:
HAFUNet: A Hierarchical Attention Fusion Network for Monocular Depth Estimation Integrating Event and Frame Data. 1395-1403
- Ronghui Li, Lingxiao Han, Shi Shu, Yueyao Liu, Yukang Lin, Yue Ma, Jie Guo, Ziwei Liu, Xiu Li:
A Motion is Worth a Hybrid Sentence: Taming Language Model for Unified Motion Generation by Fine-grained Planning. 1404-1413
- Hongyu Jiang, Yuxin Huo, Sirou Sheng, Hong Tao, Chenping Hou:
Scalable One-step Unaligned Multi-view Clustering via Joint High-Order Correlation Learning. 1414-1422
- Xiangping Zheng, Xuan Feng, Bo Wu, Bin Ren, Wei Li, Xiuxin Hao, Xun Liang, Bin Tang, Zhiwen Yu:
Breaking Semantic Barriers: A Zero-Shot Generalized Framework for Graph Anomaly Detection. 1423-1432
- Mi Zheng, Guanglei Yang, Zitong Huang, Zhenhua Guo, Kevin Han, Wangmeng Zuo:
Segmenting Objectiveness and Task-awareness Unknown Region for Autonomous Driving. 1433-1442
- Yuhao Wang, Lingjuan Miao, Zhiqiang Zhou, Lei Zhang, Yajun Qiao:
Infrared and Visible Image Fusion with Language-Driven Loss in CLIP Embedding Space. 1443-1451
- Min Dang, Gang Liu, Jingqi Zhao, Adams Wai-Kin Kong, Nan Luo, Di Wang:
DDFD: Diffusion-Based Denoising Fusion for Object Detection in Infrared-Visible Images. 1452-1461
- Jiahuan Long, Wen Yao, Tingsong Jiang, Jiacheng Hou, Shuai Jia, Junqi Wu, Xiaoya Zhang, Xiaohu Zheng, Chao Ma:
CDUPatch: Color-Driven Universal Adversarial Patch Attack for Dual-Modal Visible-Infrared Detectors. 1462-1470
- Peirong Zhang, Kai Ding, Lianwen Jin:
Capturing More: Learning Multi-Domain Representations for Robust Online Handwriting Verification. 1471-1479
- Zhenxi Wang, Zongyao Yin, Yujie Hou, Xianchuan Yu:
Robust Multi-view Clustering via Pseudo Label Guided Universum Learning. 1480-1489
- Yao Zhang, Ping Huang, Rui Zhang:
Multimodal Dual Population Evolutionary Reinforcement Learning. 1490-1499
- Bo Xu, Jie Wei, Hongya Wang, Ming Du, Hui Song, Yanghua Xiao:
Bridging the Unseen Gap: Label-Enhanced Information Bottleneck Distillation for Multimodal Named Entity Recognition. 1500-1509
- Mingle Zhou, Jiahui Liu, Jin Wan, Gang Li, Min Li:
Exploring Multimodal Prompts For Unsupervised Continuous Anomaly Detection. 1510-1519
- Hongming Wang, Yifeng Wu, Huimin Huang, Hongtao Wu, Jiaxuan Jiang, Xiaodong Zhang, Hao Zheng, Yawen Huang, Xian Wu, Yefeng Zheng, Jinping Xu, Jing Cheng:
BrainSegDMIF: A Dynamic Fusion-enhanced SAM for Brain Lesion Segmentation. 1520-1529
- Tairan Huang, Yili Wang, Qiutong Li, Changlong He, Jianliang Gao:
Can LLMs Find Fraudsters? Multi-level LLM Enhanced Graph Fraud Detection. 1530-1538
- Naichuan Zheng, Yuchen Du, Hailun Xia, Zeyu Liang:
Signal-SGN: A Spiking Graph Convolutional Network for Skeleton Action Recognition via Learning Temporal-Frequency Dynamics. 1539-1548
- Yang Zhou, Jin Wang, Yuxiao Zhang, Kaixiang Huang, Guodong Lu, Jingru Yang, Shengfeng He:
Art4Math: Handwritten Mathematical Expression Recognition via Multimodal Sketch Grounding. 1549-1558
- Feiyu Peng, Chaobo He, Junwei Cheng, Huijuan Hu, Wenkai Zhang, Youda Mo:
Frequency-refined Graph Convolution Network with Cross-modal Wavelet Denoising for Recommendation. 1559-1568
- Chuan Zeng, Zhao Zhang, Wei Huang, Lei Zhang, Le Yi, Kefu Zhao:
DC2-SR: A Dual-Consistency Guided Curriculum Learning method for Thick-Slice Fetal MRI Super-Resolution. 1569-1578
- An Xiang, Zixuan Huang, Xitong Gao, Kejiang Ye, Cheng-zhong Xu:
BridgeNet: A Unified Multimodal Framework for Bridging 2D and 3D Industrial Anomaly Detection. 1579-1587
- Hui Li, Pengfei Yang, Juanyang Chen, Le Dong, Yanxin Chen, Quan Wang:
MST-Distill: Mixture of Specialized Teachers for Cross-Modal Knowledge Distillation. 1588-1597
- Shifeng Bao, Zhe Xue, Qi Chen, Shilong Ou, Amin Beheshti, Quan Z. Sheng, Anton van den Hengel, Yuankai Qi:
CausalMVC: Causal Content-Style Representation Learning for Deep Multi-View Clustering. 1598-1606
- Wei Li, Junwei Zhu, Honghui Xu, Jiawei Jiang, Jianwei Zheng:
SpecSolver: Solving Spatial-Spectral Fusion via Semantic Transformer. 1607-1616
- Junwei Zhu, Wei Li, Honghui Xu, Jiawei Jiang, Zhi Liu, Jianwei Zheng:
Arbitrary-scale Fusion Neural Operator. 1617-1626
- Zhongyun Bao, Gang Fu, Jianchi Sun, Jing Zhou, Ziqi Yu, Chunxia Xiao:
I2HDiffuser: Image Illumination Harmonization Meets the Diffusion Model. 1627-1636
- Weitai Kang, Luowei Zhou, Junyi Wu, Changchang Sun, Yan Yan:
Visual Grounding with Attention-Driven Constraint Balancing. 1637-1645
- Pengfei Ren, Jingyu Wang, Haifeng Sun, Qi Qi, Jing Wang, Jianxin Liao:
Rule Meets Learning: Confidence-Aware Multi-View Fusion for Self-Supervised 3D Hand Pose Estimation. 1646-1655
- Bingfeng Liu, Songwei Pei, Shuhuai Wang, Wenzheng Yang, Qian Li, Shangguang Wang:
Prior-Constrained Relevant Feature driven Image Fusion with Hybrid Feature via Mode Decomposition. 1656-1665
- Yue Zhu, Haiwen Diao, Shang Gao, Jiazuo Yu, Jiawen Zhu, Yunzhi Zhuge, Shuai Hao, Xu Jia, Lu Zhang, Ying Zhang, Huchuan Lu:
Regularizing Subspace Redundancy of Low-Rank Adaptation. 1666-1675
- Jintian Ji, Songhe Feng:
Anchors Bring Stability and Efficiency: Fast Tensorial Multi-view Clustering on Shuffled Datasets. 1676-1685
- Ziyu Wang, Yiming Du, Rui Ning, Lusi Li:
Energy-based Deep Incomplete Multi-View Clustering. 1686-1694
- Kai Zhu, Jun Yin:
Neighbor Contrastive Learning with Weakened Consensus Graph for Deep Multi-View Clustering. 1695-1703
- Hankun Liu, Yujian Zhao, Guanglin Niu:
Try Harder: Hard Sample Generation and Learning for Cloth-Changing Person Re-ID. 1704-1713
- Shide Du, Chunming Wu, Zihan Fang, Wendi Zhao, Yilin Wu, Changwei Wang, Shiping Wang:
LargeMvC-Net: Anchor-based Deep Unfolding Network for Large-scale Multi-view Clustering. 1714-1723
- Quangui He, Jiahui Qu, Wenqian Dong, Song Xiao, Qinghao Gao:
Cycle-Consistent Mamba-Based Registration-Fusion Joint Network for Unregistered Hyperspectral Image Super-Resolution. 1724-1733
- Liyuan Cao, Zihang Guo, Huaiwen Zhang:
Event Consistency-aware Robust Fake News Detection. 1734-1743
- Qi Peng, Jialin Cui, Jiayuan Xie, Yi Cai, Qing Li:
Tree-of-Reasoning: Towards Complex Medical Diagnosis via Multi-Agent Reasoning with Evidence Tree. 1744-1753
- Mengzhen Wang, Xunbin Huang, Jiayuan Xie, Shukai Ma, Jiale Men, Dayong Liang, Yi Cai:
From Model Diagram to Code: A Benchmark Dataset and Multi-Agent Framework. 1754-1763
- Ziqiang Shi, Rujie Liu, Jun Takahashi, Shan Jiang:
TrueCount: Improving Open-World Object Counting with Visual-Language Models and Dynamic Multi-Modal Inputs. 1764-1773
- Hong Gao, Xiangkai Xu, Tianqi Zhu, Xiugang Dong, Yiming Bao, Min-Ling Zhang:
Radar-Mamba: 4D Millimeter-Wave Point Cloud Enhancement via State Space Models. 1774-1782 - Jiangyong Yu
, Sifan Zhou
, Dawei Yang
, Shuoyu Li
, Shuo Wang
, Xing Hu
, Chen Xu
, Zukang Xu
, Changyong Shu
, Zhihang Yuan
:
MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Static Quantization. 1783-1792 - Peican Zhu
, Yubo Jing
, Le Cheng
, Keke Tang
, Yangming Guo
:
KEN: Knowledge Augmentation and Emotion Guidance Network for Multimodal Fake News Detection. 1793-1801 - Runqi Wang
, Caoyuan Ma
, Jian Zhao
, Hanrui Xu
, Dongfang Sun
, Haoyang Chen
, Lin Xiong
, Zheng Wang
, Xuelong Li
:
Leader is Guided: Interactive Motion Generation via Lead-Follow Paradigm and Trajectory Guidance. 1802-1811 - Xuesong Li
, Jinguang Tong
, Jie Hong
, Vivien Rolland
, Lars Petersson
:
DGNS: Deformable Gaussian Splatting and Dynamic Neural Surface for Monocular Dynamic 3D Reconstruction. 1812-1821 - Pingting Hao
, Huijie Zhang
, Yongshan Zhang
:
Tensor-based Opposing yet Complementary Learning for Multi-view Multi-label Feature Selection. 1822-1831 - Hui Liu
, Chen Jia
, Fan Shi
, Xu Cheng
, Mengfei Shi
, Xia Xie
, Shengyong Chen
:
LIDAR: Lightweight Adaptive Cue-Aware Fusion Vision Mamba for Multimodal Segmentation of Structural Cracks. 1832-1841 - Mufan Liu
, Wu Ran
, Zhiquan He
, Zuojie Xie
, Hong Lu
, Peirong Ma
:
Implicit Retinex Decomposition with Chromaticity Disentanglement for Low-Light Image Enhancement. 1842-1851 - Chenbo Zhang
, Bing Huangfu
, Hongxu Ma
, Jihong Guan
, Shuigeng Zhou
:
Multi-modal Prototype Guided Few-shot Object Detection. 1852-1861 - Qiyin Zhong
, Xianglin Qiu
, Xiaolei Wang
, Zhen Zhang
, Gang Liu
, Jimin Xiao
:
FAMRD: Frequency-Aware Multimodal Reverse Distillation for Industrial Anomaly Detection. 1862-1871 - Lei Xie
, Junxiong Huang
, Yuanjing Feng
, Qingrun Zeng
:
Tractography-Guided Dual-Label Collaborative Learning for Multi-Modal Cranial Nerves Parcellation. 1872-1879 - Guoqiang Liang
, Chuan Qin
, De Cheng
, Shizhou Zhang
, Yanning Zhang
:
Boosting Multi-Modal Alignment: Geometric Feature Separation for Class Incremental Learning. 1880-1889 - Xueheng Li
, Xuanhua He
, Tao Hu
, Jie Zhang
, Man Zhou
, Chengjun Xie
, Yingying Wang
, Bo Huang
:
Freq-RWKV: Granularity-Aware Spatial-Frequency Synergy via Dual-Domain Recurrent Scanning for Pan-sharpening. 1890-1899 - Lingren Wang
, Wenxuan Tu
, Jieren Cheng
, Jianan Wang
, Xiangyan Tang
, Chenchen Wang
:
Discovering Maximum Frequency Consensus: Lightweight Federated Learning for Medical Image Segmentation. 1900-1909 - Nan Gao
, Junchao Zhu
, Yilong Zhang
, Ronghua Liang
, Guodao Sun
, Peng Chen
:
Dual Teacher with Dempster-Shafer Guidance for Decision Making in Semi-Supervised Small Object Detection. 1910-1919 - Nan Ma
, Beining Sun
, Yiheng Han
, Genbao Xu
:
Kinematic Enhanced Hypergraph Convolutional Network for Skeleton-based Human Action Recognition with LLM Training Guides. 1920-1928 - Yufei Zhang
, Yicheng Xu
, Hongxin Wei
, Zhiping Lin
, Xiaofeng Zou
, Cen Chen
, Huiping Zhuang
:
Analytic Continual Test-Time Adaptation for Multi-Modality Corruption. 1929-1937 - Pengfei Gu
, Hongxiao Wang
, Yejia Zhang
, Huimin Li
, Chaoli Wang
, Danny Chen
:
TopoImages: Incorporating Local Topology Encoding into Deep Learning Models for Medical Image Classification. 1938-1947 - Dawei Lin
, Meng Yuan
, Ziming Wang
, Tieru Wu
, Yuanning Liu
:
FreeCAD: A Multimodal Framework for 3D CAD Model Generation from Free-Form Prompts. 1948-1956 - Renjie Lin
, Jiacheng Li
, Shide Du
, Shiping Wang
, Le Zhang
:
OIMGC-Net: Optimization-inspired Interpretable Multi-view Graph Clustering Network. 1957-1966 - Qi Shen
, Junchang Xin
, Bing Tian Dai
, Shudi Zhang
, Xinyao Liu
, Zhiqiong Wang
:
ElaSleepNet: Exploring an Elastic Multimodal Neural Network for Sleep Staging via Temporal and Contextual Consistency Learning. 1967-1976 - Zeyu Zhu
, Ke Liang
, Lingyuan Meng
, Xingchen Hu
, Xinwang Liu
, Wanwei Liu
, Kunlun He
:
SALVG: Latent Variable Gene Augmented Graph Learning for Multi-View Clustering in Spatial Transcriptomics. 1977-1986 - Lamei Di
, Bin Zhang
, Yiming Wang
, Wenxia Zhang
:
Frequency Meets Semantics: Text-Visual Fusion with Directional Spectral Enhancement for Salient Object Detection in Optical Remote Sensing Images. 1987-1996 - Miaosen Luo
, Yuncheng Jiang
, Sijie Mai
:
Towards Explainable Fusion and Balanced Learning in Multimodal Sentiment Analysis. 1997-2006 - Zeyu Xia
, Canqun Yang
, Haoang Chi
, Tao Tang
, Weiming Xiang
, Yingbo Cui
:
MMF-SV: A Multi-Modal Feature Fusion-Based Structural Variant Caller. 2007-2015 - Ziang Li
, Chengxiang Si
, Zhenyu Cheng
:
Zero in on the Target: A Composite Robust Model for Retrieving Information in Traffic Data to Discover Network Attacks. 2016-2025 - Long Chen
, De Cheng
, Shizhou Zhang
, Yinghui Xing
, Di Xu
, Yanning Zhang
:
Amplitude-aware Domain Style Replay for Lifelong Person Re-identification. 2026-2035 - Jie Qin
, Wei Yang
, Yan Su
, Yiran Zhu
, Weizhen Li
, Yunyue Pan
, Chengchang Pan
, Honggang Qi
:
HER2 Expression Prediction with Flexible Multi-Modal Inputs via Dynamic Bidirectional Reconstruction. 2036-2043 - Zhaochen Guo
, Zhixiang Shen
, Xuanting Xie
, Liangjian Wen
, Zhao Kang
:
Disentangling Homophily and Heterophily in Multimodal Graph Clustering. 2044-2053 - Zhishuo Zhao
, Yi Lin
, Dongyue Guo
, Junyu Fan
:
AV-RISE: Hierarchical Cross-Modal Denoising for Learning Robust Audio-Visual Speech Representation. 2054-2063 - Jiahao Zhang
, Wenzhe Yin
, Shujian Yu
:
Cross-Modal Retrieval with Cauchy-Schwarz Divergence. 2064-2073 - Xinbo Geng
, Fan Shi
, Xu Cheng
, Chen Jia
, Meng Zhao
, Shengyong Chen
:
LFMamba: Focal Stack-aware State Space Modeling for Light Field Salient Object Detection. 2074-2083 - Xiaodi Xu
, Lijie Li
, Ye Wang
, Tao Ren
, Tian Qiao
:
WFF: Wavelet-based Information Fusion for Multimodal Knowledge Graph Link Prediction. 2084-2093 - Xuyao Liu
, Jiahui Qu
, Wenqian Dong
:
Breaking the Spatial-Temporal Consistency Constraint: Towards Reference-Based Hyperspectral Image Super-Resolution. 2094-2103 - Yifan Liu
, Yu Fang
, Zhouhan Lin
:
Visual-informed Silent Video Identity Conversion. 2104-2112 - Zebing Yao
, Hao Fu
, Yuanhang Yang
, Guanghua Gu
:
Dynamic Optimization Noisy Cross-Modal Hashing. 2113-2121 - Yuhang Lan
, Shilin Xu
, Chao Su
, Run Ye
, Dezhong Peng
, Yuan Sun
:
Multi-view Hashing Classification. 2122-2130 - Jielong Lu
, Zhihao Wu
, Jiajun Yu
, Qianqian Shen
, Jiajun Bu
, Haishuai Wang
:
Where Views Meet Curves: Virtual Anchors for Hyperbolic Multi-View Graph Diffusion. 2131-2140 - Jun Yang
, Maoyu Mao
:
DiffuSeg: Diffusion-Enhanced Cross-Modal Semantic Segmentation for RGB-D. 2141-2149 - Haochen Yang
, Lei Li
, Jiacheng Guo
, Baolu Li
, Minghai Qin
, Hongkai Yu
, Tianyun Zhang
:
DA3D: Domain-Aware Dynamic Adaptation for All-Weather Multimodal 3D Detection. 2150-2158 - Wentao Wu
, Xiao Wang
, Chenglong Li
, Bo Jiang
, Jin Tang
, Bin Luo
, Qi Liu
:
CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework. 2159-2168 - Yichen Bao
, Yuxuan Liu
, Yu Duan
, Jing Li
, Quanxue Gao
:
Multi-view Clustering Based on Probabilistic Tensor Regression. 2169-2177 - Xingchen Li
, Wuyang Zhang
, Guoliang You
, Xiaomeng Chu
, Wenhao Yu
, Yifan Duan
, Yuxuan Xiao
, Yanyong Zhang
:
CalibWorkflow: A General MLLM-Guided Workflow for Centimeter-Level Cross-Sensor Calibration. 2178-2187 - Yongzheng Liu
, Siru Zhong
, Gefeng Luo
, Weilin Ruan
, Yuxuan Liang
:
Towards Multi-Scenario Forecasting of Building Electricity Loads with Multimodal Data. 2188-2196 - Yan Chen
, Bingbing Jiang
, Peng Zhou
, Lei Duan
, Yuhua Qian
, Liang Du
:
Balanced Multiple Kernel Clustering with Discrete Partition Entropy Auto Regularization. 2197-2206 - Jiale Zou
, Yan Chen
, Bingbing Jiang
, Peng Zhou
, Liang Du
, Lei Duan
, Yuhua Qian
:
Robust Tensor Learning with Graph Diffusion for Scalable Multi-view Graph Clustering. 2207-2215 - Linxin Xiao
, Xin Wang
, Zeyang Zhang
, Yang Yao
, Wenwu Zhu
:
DyNAS-DDI: Dynamic Pairwise Architecture Search for Generalizable Drug-Drug Interaction LLM. 2216-2225 - Jianxiang Xie
, Yao Wu
, Yachao Zhang
, Xiaopei Zhang
, Yuan Xie
, Yanyun Qu
:
PLATO-TTA: Prototype-Guided Pseudo-Labeling and Adaptive Tuning for Multi-Modal Test-Time Adaptation of 3D Segmentation. 2226-2234 - Shilin Liu
, Kyohei Kamikawa
, Keisuke Maeda
, Takahiro Ogawa
, Miki Haseyama
:
Context-aware Image-to-Music Generation via Bridging Modalities through Musical Captions. 2235-2243 - Yan Li
, Xingchen Hu
, Jiyuan Liu
, Zhong Liu
:
Federated Incomplete Multi-view Clustering with Individual Structure Preservation and Central Representation Tensorization. 2244-2253 - Hanghui Guo
, Weijie Shi
, Mengze Li
, Juncheng Li
, Hao Chen
, Yue Cui
, Jiajie Xu
, Jia Zhu
, Jiawei Shen
, Zhangze Chen
, Sirui Han
:
Consistent and Invariant Generalization Learning for Short-video Misinformation Detection. 2254-2263 - Ruilin Yao
, Yi Rong
, Tianyu Zou
, Bo Zhang
, Jian Li
, Shengwu Xiong
, Shili Xiong
:
MAP: Parameter-Efficient Tuning for Referring Expression Comprehension via Multi-Modal Adaptive Positional Encoding. 2264-2273 - Hongyang Lin
, Kuixiang Shao
, Peijun Xu
, Zhuoyang Bu
, Yuyang Jiao
, Ziyuan Tang
, Chenxi Xiao
, Jingyi Yu
:
HandCraft: Tactile-Informed Hand-Object Dynamics Capture and Realistic Rendering. 2274-2283 - Linxuan Luo
, Pan Mu
, Cong Bai
:
Physics-Coupled Frequency Dynamic Adaptation Network for Domain Generalized Underwater Object Detection. 2284-2293 - Yanfeng Liu
, Lefei Zhang
:
Multimodal Decomposed Distillation with Instance Alignment and Uncertainty Compensation for Thermal Object Detection. 2294-2303 - Rui Wang
, Yuxuan Liu
, Guangyu Yang
, Quanxue Gao
, Cheng Deng
:
Bi-Orthogonal Non-negative Tensor tri-Factorization for Tensorized Label Learning. 2304-2312 - Xin Peng
, Bowen Liu
, Renxiang Guan
, Wenxuan Tu
:
Multi-view Graph Clustering with Dual Structure Awareness for Remote Sensing Data. 2313-2322 - Mingliang Yan
, Yanhua Yu
, Ruochi Zhang
, Zhiyuan Liu
, Ruicheng Zhang
, Yimeng Ren
, Kangkang Lu
, Zhiyong Huang
, Feng Luo
, Zhen Cai
:
DeepMolTex: Deep Alignment of Molecular Graphs with Large Language Models via Mixture of Modality Experts. 2323-2332 - Xinzhu Li
, Juepeng Zheng
, Yikun Chen
, Xudong Mao
, Guanghui Yue
, Wei Zhou
, Chenlei Lv
, Ruomei Wang
, Fan Zhou
, Baoquan Zhao
:
DepthGait: Multi-Scale Cross-Level Feature Fusion of RGB-Derived Depth and Silhouette Sequences for Robust Gait Recognition. 2333-2341 - Tianming Xu
, Tiantian Guo
, Youdan Feng
, Zihan Chen
, Qiaoyi Xue
, Lingzhi Hu
, Yuhang Shi
:
Anatomical Region-Guided 3D PET/MR Tumor Segmentation via Medical Record. 2342-2351 - Rongqiang Fang
, Yongqi Sun
, Jidong Yuan
, Hongbo Cao
, Jinkun Dong
:
A Language-Assisted Semantic-Aware Disentangled Method for Link Prediction on Heterogeneous Graphs. 2352-2361 - Guimin Hu
, Yi Xin
, Lijie Hu
, Zhihong Zhu
, Hasti Seifi
:
PgM: Partitioner Guided Modal Learning Framework. 2362-2371 - Kaixiang Wang
, Xiaojian Ding
, Wanqi Yang
, Ming Yang
:
Label-Semantics-Guided Multi-View Multi-Label Learning via High-Order Semantic Fusion. 2372-2380 - Chenyang Zhou
, Monghjaya Ha
, Chao Tang
, Licheng Wu
:
UniMTR: Unified Recognition of Dual-style Traditional Mongolian Scripts via Contrastive Representation Alignment. 2381-2389 - Mingyang Yu
, Xiahui Guo
, Peng Chen
, Zhenkai Li
, Yang Shu
:
Towards Measuring and Modeling Geometric Structures in Time Series Forecasting via Image Modality. 2390-2398 - Shu-Xun Yang
, Xian-Ling Mao
, Heyan Huang
:
ESTJ: Enhancing Structured Tendency Judgment in Hybrid-Modal Table Understanding. 2399-2408 - Maoxun Yuan
, Bo Cui
, Tianyi Zhao
, Jiayi Wang
, Shan Fu
, Xue Yang
, Xingxing Wei
:
UniRGB-IR: A Unified Framework for Visible-Infrared Semantic Tasks via Adapter Tuning. 2409-2418 - Nokap Tony Park
:
M2PE-Diff: Music-to-Pose Encoder for Dance Video Generation Leveraging Latent Diffusion Framework. 2419-2428 - Xiaorui Ding
, Huan Ma
, Changqing Zhang
:
A Theoretical Proof of Dynamic Multimodal Fusion Exacerbates Modality Greedy. 2429-2436 - Yiming Xu
, Jiarun Chen
, Zhen Peng
, Zihan Chen
, Qika Lin
, Lan Ma
, Bin Shi
, Bo Dong
:
Court of LLMs: Evidence-Augmented Generation via Multi-LLM Collaboration for Text-Attributed Graph Anomaly Detection. 2437-2446 - Shanghui Deng
, Xiao Zheng
, Chang Tang
, Kun Sun
, Yuanyuan Liu
, Xinwang Liu
:
Find True Collaborators: Banzhaf Index-based Cross View Alignment for Partially View-aligned Clustering. 2447-2456 - Wenlan Chen
, Lu Gao
, Cheng Liang
, Fei Guo
:
Deep Variational Incomplete Multi-View Clustering with Information-Theoretic Guidance. 2457-2466 - Jieyi Ge
, Zhaodong Sun
, Wei Peng
, Chenhang Ying
, Yuwei Chen
, Kui Ren
, Xiaobai Li
:
Evidential Remote Physiological Measurement via Uncertainty-aware Fusion of Video and RF. 2467-2475 - Fujian Ren
, Wenlan Chen
, Lu Gao
, Fei Guo
, Cheng Liang
:
Dual-Level Distribution Alignment for Deep Incomplete Multi-View Clustering. 2476-2485 - Guoyi Li
, Die Hu
, Xiaomeng Fu
, Qirui Tang
, Yulei Wu
, Xiaodan Zhang
, Honglei Lyu
:
Entity Graph Alignment and Visual Reasoning for Multimodal Fake News Detection. 2486-2495 - Peng Zhao
, Zhiguang Cao
, Di Wang
, Wen Song
, Wei Pang
, You Zhou
, Yuan Jiang
:
Visual-Enhanced Multimodal Framework for Flexible Job Shop Scheduling Problem. 2496-2505 - Yu Zhao
, Ying Zhang
, Xuhui Sui
, Baohang Zhou
, Haoze Zhu
, Jeff Z. Pan
, Xiaojie Yuan
:
Dark Side of Modalities: Reinforced Multimodal Distillation for Multimodal Knowledge Graph Reasoning. 2506-2515 - Jianting Tang
, Yubo Wang
, Haoyu Cao
, Linli Xu
:
CROP: Integrating Topological and Spatial Structures via Cross-View Prefixes for Molecular LLMs. 2516-2525 - Guyue Jin
, Tianming Zhao
, Jiacan Yan
, Tian Tian
:
Contextually-Guided State Space Fusion for Misaligned Multi-Spectral Object Detection. 2526-2535 - Libin Liu
, Shen Chen
, Sen Jia
, Jingzhe Shi
, Can Jin
, Zongkai Wu
, Jenq-Neng Hwang
, Lei Li
:
Graph Canvas for Controllable 3D Scene Generation. 2536-2545 - Berta Céspedes-Sarrias
, Carlos Collado-Capell
, Pablo Rodenas-Ruiz
, Olena Hrynenko
, Andrea Cavallaro
:
MM-HSD: Multi-Modal Hate Speech Detection in Videos. 2546-2555
Content: Vision and Language
- Yijie Yang
, Lianyong Qi
, Weiming Liu
, Fan Wang
, Jing Du
, Yuwen Liu
, Xiaolong Xu
, Qiang Ni
, Wanchun Dou
, Xiaokang Zhou
:
Joint Test-time Adaptation with Refined Pseudo-labels and Latent Score Matching. 2556-2565 - Hua Wang
, Hong Liu
, Jiale Ren
, Mingxin Tan
, Zhongzien Jiang
:
CLIP-6D: Empowering CLIP as a Zero-Shot 6D Pose Estimator Through Generalizable Object-Specific Representations. 2566-2575 - Ruipu Wu
, Yige Zhang
, Jinyu Chen
, Linjiang Huang
, Shifeng Zhang
, Xu Zhou
, Liang Wang
, Si Liu
:
AeroDuo: Aerial Duo for UAV-based Vision and Language Navigation. 2576-2585 - Yihua Shao
, Haojin He
, Sijie Li
, Siyu Chen
, Xinwei Long
, Fanhu Zeng
, Yuxuan Fan
, Muyang Zhang
, Ziyang Yan
, Ao Ma
, Xiaochen Wang
, Hao Tang
, Yan Wang
, Shuyan Li
:
EventVAD: Training-Free Event-Aware Video Anomaly Detection. 2586-2595 - Qiuyu Liang
, Yongqiang Zhang
:
SAM based Region-Word Clustering and Inference Score Adjusting for Open-Vocabulary Object Detection. 2596-2605 - Xiao Liang
, Jiawei Hu
, Di Wang
, Zhi Ma
, Lin Zhao
, Ronghan Li
, Bo Wan
, Quan Wang
:
CheXPO: Preference Optimization for Chest X-ray VLMs with Counterfactual Rationale. 2606-2615 - Qian Sun
, Chengzhuo Lu
, Wenyu Chen
, Wenjie Wei
, Jingya Wang
, Jieyuan Zhang
, Xiaoli Liu
, Yalan Ye
, Yang Yang
, Malu Zhang
:
Temporal-coded Spiking Transformer. 2616-2624 - Yuwu Lu
, Haoyu Huang
, Xue Hu
:
Domain-aware Visual Context Prompt for Multi-Source Domain Adaptation. 2625-2633 - Xingke Song
, Jianxu Shangguan
, Yiran Li
, Jialu Zhang
, Jianfeng Ren
, Ruibin Bai
, Xin Chen
, Xudong Jiang
:
CEARI: Co-Evolutionary Agents for Reassembling and Inpainting Puzzles with Gaps and Missing Pieces. 2634-2642 - Xiaoyu Chen
, Yigang Cen
, Wanru Xu
, Yue Zhang
, Yi Jin
, Yidong Li
, Linna Zhang
:
Hierarchical Meta-prototypes Network for Few-shot Action Recognition. 2643-2652 - Kyungjune Lee
, Seongjean Kim
, Hoseok Tong
, Hyucksang Lee
, Seongmin Lee
, Weisi Lin
, Ping An
, Sanghoon Lee
:
Domain Crossover Non-Rigid Registration for 3D Human Meshes. 2653-2662 - Jingyao Wang
, Yiming Chen
, Lingyu Si
, Changwen Zheng
:
Advancing Complex Wide-Area Scene Understanding with Hierarchical Coresets Selection. 2663-2672 - Yuxing Liu
, Ji Zhang
, Xuchuan Zhou
, Jingzhong Xiao
, Huimin Yang
, Jiaxin Zhong
:
OoDDINO: A Multi-level Framework for Anomaly Segmentation on Complex Road Scenes. 2673-2682 - Si-Woo Kim
, MinJu Jeon
, Ye-Chan Kim
, Soeun Lee
, Taewhan Kim
, Dong-Jin Kim
:
SynC: Synthetic Image Caption Dataset Refinement with One-to-many Mapping for Zero-shot Image Captioning. 2683-2692 - Xun Zhu
, Fanbin Mo
, Zheng Zhang
, Jiaxi Wang
, Yiming Shi
, Ming Wu
, Chuang Zhang
, Miao Li
, Ji Wu
:
Enhancing Multi-task Learning Capability of Medical Generalist Foundation Model via Image-centric Multi-annotation Data. 2693-2702 - Linpu He
, Yanan Li
, Bingze Li
, Elvis Han Cui
, Donghui Wang
:
DSS-Prompt: Dynamic-Static Synergistic Prompting for Few-Shot Class-Incremental Learning. 2703-2712 - Yian Li
, Wentao Tian
, Yang Jiao
, Tianwen Qian
, Na Zhao
, Bin Zhu
, Jingjing Chen
, Yu-Gang Jiang
:
Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning. 2713-2722 - Yifei Deng
, Chenglong Li
, Futian Wang
, Jin Tang
:
Learning Hierarchical Cross-modal Association with Intra-modal Context for Text-Image Person Retrieval. 2723-2731 - Xiubo Liang
, Hongzhi Wang
, Zigen Li
, Jinxing Han
, Yu Zhao
, Weidong Geng
:
SGM-Transformer: Rethinking Gradient Information Loss and Compensation in Spiking Neural Networks. 2732-2741 - Qinyue Tong
, Ziqian Lu
, Jun Liu
, Yangming Zheng
, Zhe-Ming Lu
:
MediSee: Reasoning-Based Pixel-Level Perception in Medical Images. 2742-2751 - Shuyong Gao
, Qianyu Guo
, Yu'ang Feng
, Chunyuan Chen
, Xujun Wei
, Yan Wang
, Wenqiang Zhang
:
Progressive Representation Learning for Weakly-Supervised Camouflaged Object Detection. 2752-2761 - Huaihai Lyu
, Chaofan Chen
, Yuheng Ji
, Changsheng Xu
:
EgoPrompt: Prompt Learning for Egocentric Action Recognition. 2762-2770 - Yuwu Lu
, Chunzhi Liu
, Yihan Yang
:
CWCP: Generalizing Virtual Reality to Real World with Contextual-Weather Correlation Pairing for Deraining and Desnowing. 2771-2780 - Pei Liu
, Xin Liu
, Ruoyu Yao
, Junming Liu
, Siyuan Meng
, Ding Wang
, Jun Ma
:
HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation. 2781-2790 - Yan Zhang
, Shiwen He
, Lin Yuan
, Jiaxu Leng
, Xinbo Gao
:
DichotomyIR: Universal Image Reconstruction via Dichotomy Classification and Uncertainty Elimination. 2791-2800 - Francesco Tonini
, Lorenzo Vaquero
, Alessandro Conti
, Cigdem Beyan
, Elisa Ricci
:
Dynamic Scoring with Enhanced Semantics for Training-Free Human-Object Interaction Detection. 2801-2810 - Zhilin Huang
, Chujun Qin
, Yifei Xing
, Wenming Yang
:
Enhanced Motion-aware Latent Diffusion Models for Video Frame Interpolation. 2811-2820 - Zeming Wei
, Junyi Lin
, Yang Liu
, Weixing Chen
, Jingzhou Luo
, Guanbin Li
, Liang Lin
:
3DAffordSplat: Efficient Affordance Reasoning with 3D Gaussians. 2821-2830 - Huy Le
, Nhat Chung
, Tung Kieu
, Anh Nguyen
, Ngan Le
:
BiMa: Towards Biases Mitigation for Text-Video Retrieval via Scene Element Guidance. 2831-2840 - Jianghang Lin
, Yue Hu
, Jiangtao Shen
, Yunhang Shen
, Liujuan Cao
, Shengchuan Zhang
, Rongrong Ji
:
What You Perceive Is What You Conceive: A Cognition-Inspired Framework for Open Vocabulary Image Segmentation. 2841-2850 - Zhengyang Liang
, Meiyu Liang
, Wei Huang
, Yawen Li
, Wu Liu
, Yingxia Shao
, Kangkang Lu
:
Dynamic Self-adaptive Multiscale Distillation from Pre-trained Multimodal Large Model for Efficient Cross-modal Retrieval. 2851-2859 - Tiancheng Gu
, Kaicheng Yang
, Ziyong Feng
, Xingjun Wang
, Yanzhao Zhang
, Dingkun Long
, Yingda Chen
, Weidong Cai
, Jiankang Deng
:
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs. 2860-2869 - Lin Peng
, Cong Wan
, Shaokun Wang
, Xiang Song
, Yuhang He
, Yihong Gong
:
CIA: Class- and Instance-aware Adaptation for Vision-Language Models. 2870-2879 - Xi Xiao
, Yunbei Zhang
, Xingjian Li
, Tianyang Wang
, Xiao Wang
, Yuxiang Wei
, Jihun Hamm
, Min Xu
:
Visual Instance-aware Prompt Tuning. 2880-2889 - Yuliang Chen
, Xi Lin
, Chao Sang
, Xiu Su
:
DualFPT: Handling Data Heterogeneity in Federated Prompt Tuning from both Generalized and Personalized Perspective. 2890-2899 - Lingbo Zhang
, Bingqian Sun
, Linghan Cai
, Yifeng Wang
, Ye Zhang
, Songhan Jiang
, Kai Zhang
, Yongbing Zhang
:
Counting by Points: Density-Guided Weakly-Supervised Nuclei Segmentation in Histopathological Images. 2900-2908 - Haodong Chen
, Haojian Huang
, Xinxiang Yin
, Dian Shao
:
FineQuest: Adaptive Knowledge-Assisted Sports Video Understanding via Agent-of-Thoughts Reasoning. 2909-2918 - Shaowu Xu
, Xibin Jia
, Junyu Gao
, Qianmei Sun
, Jing Chang
, Chao Fan
:
Cross-Modal Dual-Causal Learning for Long-Term Action Recognition. 2919-2928 - Jiahao Li
, Yang Lu
, Yachao Zhang
, Fangyong Wang
, Yuan Xie
, Yanyun Qu
:
Novel Category Discovery with X-Agent Attention for Open-Vocabulary Semantic Segmentation. 2929-2938 - Jingyuan Fang
, Yang Ning
, Xiushan Nie
, Xinfeng Liu
, Zhiyong Cheng
:
VLHP: Learning Discriminative Vision-Language Hybrid Prototypes for Weakly Supervised Semantic Segmentation. 2939-2948 - Xin Li
, Mingming Gong
, Yunfei Wu
, Jianxin Dai
, Antai Guo
, Xinghua Jiang
, Haoyu Cao
, Yinsong Liu
, Deqiang Jiang
, Xing Sun
:
DREAM: Document Reconstruction via End-to-end Autoregressive Model. 2949-2957 - Longzhen Yang
, Zhangkai Ni
, Ying Wen
, Yihang Liu
, Lianghua He
, Heng Tao Shen
:
Self-Supervised Anatomical Consistency Learning for Vision-Grounded Medical Report Generation. 2958-2967 - Wenxuan Yang
, Qingqv Wei
, Chenxi Ma
, Weimin Tan
, Bo Yan
:
Scaling Laws for Data-Efficient Visual Transfer Learning. 2968-2976 - Pengcheng Zheng
, Kecheng Chen
, Jiaxin Huang
, Bohao Chen
, Ju Liu
, Yazhou Ren
, Xiaorong Pu
:
Lightweight Medical Image Restoration via Integrating Reliable Lesion-Semantic Driven Prior. 2977-2986 - Kun-Hsiang Lin
, Yu-Wen Tseng
, Kang-Yang Huang
, Jhih-Ciang Wu
, Wen-Huang Cheng
:
InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing. 2987-2996 - Kai Niu
, Liucun Shi
, Ke Han
, Qinzi Zhao
, Yue Wu
, Yanning Zhang
:
Test-Time Adaptation for Text-Based Person Search. 2997-3006 - Si Chen
, Yujia Chen
, Xiaotian Yin
, Xin Liu
, Huakai Lai
, Tianzhu Zhang
:
PAF: Prototype Adaptive Fusion for Test-Time Adaptation of Vision-Language Models. 3007-3016 - Chunyan She
, Fujun Han
, Chengyu Fang
, Shukai Duan
, Lidan Wang
:
Exploring Fourier Prior and Event Collaboration for Low-Light Image Enhancement. 3017-3026 - Liang Yao
, Fan Liu
, Delong Chen
, Chuanyi Zhang
, Yijun Wang
, Ziyun Chen
, Wei Xu
, Shimin Di
, Yuhui Zheng
:
RemoteSAM: Towards Segment Anything for Earth Observation. 3027-3036 - Jiawei Ge
, Xinyu Zhang
, Jiuxin Cao
, Xuelin Zhu
, Weijia Liu
, Qingqing Gao
, Biwei Cao
, Kun Wang
, Chang Liu
, Bo Liu
, Chen Feng
, Ioannis Patras
:
Gen4Track: A Tuning-free Data Augmentation Framework via Self-correcting Diffusion Model for Vision-Language Tracking. 3037-3046 - Kangjie Chen
, BingQuan Dai
, Minghan Qin
, Dongbin Zhang
, Peihao Li
, Yingshuang Zou
, Haoqian Wang
:
SLGaussian: Fast Language Gaussian Splatting in Sparse Views. 3047-3056 - Jo-Ku Cheng
, Zeren Zhang
, Ran Chen
, Jingyang Deng
, Ziran Qin
, Jinwen Ma
:
GeoUni: A Unified Model for Generating Geometry Diagrams, Problems and Problem Solutions. 3057-3066 - Hang Xiong
, Runmin Cong
, Jinpeng Chen
, Chen Zhang
, Feng Li
, Huihui Bai
, Sam Kwong
:
MM-Prompt: Multi-modality and Multi-granularity Prompts for Few-Shot Segmentation. 3067-3075 - Jiawei Gu
, Ziyue Qiao
, Zechao Li
:
Activation Shape Matters: OOD Detection with Norm-Entropy Fusion. 3076-3084 - Xinchen Ye
, Aokai Zhang
, Rui Xu
:
Semantics-Driven Contrastive Learning for Real-World Depth Super Resolution. 3085-3093 - Jiawen Lin
, Shiran Bian
, Yihang Zhu
, Wenbin Tan
, Yachao Zhang
, Yuan Xie
, Yanyun Qu
:
SeqVLM: Proposal-Guided Multi-View Sequences Reasoning via VLM for Zero-Shot 3D Visual Grounding. 3094-3103 - Yucheng Shu
, Yaohui Wang
, Lihong Qiao
, Feiyan Li
, Bin Xiao
, Weisheng Li
, Xinbo Gao
:
The Overlooked Matters: Revisiting Background, Prototype, and Activation in Few-Shot Medical Image Segmentation. 3104-3113 - Jiaxin Peng
, Siwang Zhou
, Chengqing Li
, Yucheng Li
, Dunyun Chen
:
Mitigating Delivery Artifacts in Real-World Video Super-Resolution. 3114-3123 - Wei Chen
, Jianwei Niu
, Xuefeng Liu
, Xinghao Wu
:
Decoupling Dense Video Captioning via Task-specific Prompts. 3124-3132 - Yongxin Li
, Ying Cheng
, Yaning Pan
, Wen He
, Qing Wang
, Rui Feng
, Xiaobo Zhang
:
Semantic-Aware Hard Negative Mining for Medical Vision-Language Contrastive Pretraining. 3133-3142 - Jiale Li
, Mingrui Wu
, Zixiang Jin
, Hao Chen
, Jiayi Ji
, Xiaoshuai Sun
, Liujuan Cao
, Rongrong Ji
:
MIHBench: Benchmarking and Mitigating Multi-Image Hallucinations in Multimodal Large Language Models. 3143-3152 - Hezhao Liu
, Yang Lu
, Mengke Li
, Yiqun Zhang
, Shreyank N. Gowda
, Chen Gong
, Hanzi Wang
:
FATE: A Prompt-Tuning-Based Semi-Supervised Learning Framework for Extremely Limited Labeled Data. 3153-3162 - Wangsheng He
, Wanru Xu
, Ping Guo
, Zhenjiang Miao
, Yi Tian
:
InstructStep: Fine-Grained Localization of Step Content and Relation in Instructional Video. 3163-3172 - Jiaqi Xu
, Cuiling Lan
, Yan Lu
:
Deciphering Functions of Neurons in Vision-Language Models. 3173-3181 - Kamakshya Prasad Nayak
, Kamalakar Vijay Thakare
, Ashesh Xalxo
, Lalit Lohani
, Debi Prosad Dogra
:
Can Person-Level Attributes Improve Group Re-Identification? 3182-3191 - Changshuo Wang
, Shuting He
, Xiang Fang
, Fangzhe Nan
, Prayag Tiwari
:
Seeing the Overlooked: Bio-Visual Inspired Weak Saliency Feedback Transformer for Person Re-identification. 3192-3201 - Weihuang Lin
, Yiwei Ma
, Xiaoshuai Sun
, Shuting He
, Jiayi Ji
, Liujuan Cao
, Rongrong Ji
:
HRSeg: High-Resolution Visual Perception and Enhancement for Reasoning Segmentation. 3202-3211 - Da Zhang
, Feiyu Wang
, Bingyu Li
, Zhiyuan Zhao
, Junyu Gao
, Xuelong Li
:
KAID: Knowledge-Aware Interactive Distillation for Vision-Language Models. 3212-3221 - Xiao Hu
, Heiko Neumann
, Jochen Lang
:
A Filtering Framework for Semi-online Referring Video Object Segmentation. 3222-3231 - Ruiqi Dong
, Wenjing Pang
, Chenjie Pan
, Hengyang Lu
, Chenyou Fan
:
StoryCrafter: Instance-Aligned Multi-Character Storytelling with Diffusion Policy Learning. 3232-3241 - Xiaohan Yu
, Zicheng Pan
, Yang Zhao
, Qin Zhang
, Yongsheng Gao
:
Contrastive Lie Algebra Learning for Ultra-Fine-Grained Visual Categorization. 3242-3250 - Xiaoxing Hu
, Kaicheng Yang
, Jun Wang
, Haoran Xu
, Ziyong Feng
, Yupei Wang
:
Decoupled Global-Local Alignment for Improving Compositional Understanding. 3251-3260 - Jingxing Guo
, Guilian Chen
, Yimu Sun
, Huisi Wu
, Jing Qin
:
EchoVim: Making Vision Mamba Docile for Echocardiography Video Segmentation via Dynamic Interaction and Semantic Token-attentive Refinement. 3261-3269 - Haifeng Zhao
, Shuo Xu
, Leilei Ma
, Yufei Zhang
, Lei Wang
, Dengdi Sun
:
Towards Space and Semantics: Object-Purified Representation Learning for Multi-Label Image Classification. 3270-3279 - Junyu Gao
, Xuan Yao
, Yong Rui
, Changsheng Xu
:
Building Embodied EvoAgent: A Brain-inspired Paradigm for Bridging Multimodal Large Models and World Models. 3280-3289 - Chen Feng
, Nicu Sebe
, Georgios Tzimiropoulos
, Miguel R. D. Rodrigues
, Ioannis Patras
:
Unveiling Open-set Noise: Theoretical Insights into Label Noise. 3290-3299 - Zhongrui Gui
, Junyu Xie
, Tengda Han
, Weidi Xie
, Andrew Zisserman
:
Character-Centric Understanding of Animated Movies. 3300-3309 - Ziyun Dai
, Xiaoqiang Li
, Shaohua Zhang
, Yuanchen Wu
, Jide Li
:
See Different, Think Better: Visual Variations Mitigating Hallucinations in LVLMs. 3310-3319 - Cheng Ye
, Weidong Chen
, Peipei Song
, Xinyan Liu
, Lei Zhang
, Zhendong Mao
:
Multi-round Mutual Emotion-Cause Pair Extraction for Emotion-Attributed Video Captioning. 3320-3329 - Wenhao Zheng
, Chenwei Sun
, Wenbo Zhang
, Jiancheng Lv
, Xianggen Liu
:
Target-Guided Bayesian Flow Networks for Quantitatively Constrained CAD Generation. 3330-3339 - Zhiyu Ye
, Guowen Li
, Haoyuan Liang
, Zixi Wang
, Shilei Cao
, Yushan Lai
, Juepeng Zheng
:
Quantifying Samples with Invariance for Source-Free Class Incremental Domain Adaptation. 3340-3349 - Shuai Huang
, Yongxiong Wang
, Huan Luo
, Haodong Jing
, Chendong Qin
, Jingqun Tang
:
MINDEV: Multi-modal Integrated Diffusion Framework for Video Reconstruction from EEG Signals. 3350-3359 - Zhijie Rao
, Jingcai Guo
:
Balancing Cross-Modal Attention for Generalized Zero-Shot Learning. 3360-3369 - Zhenxuan Fang
, Shuaibo Wang
, Weisheng Dong
, Junwei Xu
, Fangfang Wu
, Xin Li
, Guangming Shi
:
Beyond Visual Quality: Fidelity-Oriented Diffusion Model for Real-world Image Super-Resolution. 3370-3379 - Peng Ying
, Zhongnian Li
, Meng Wei
, Xinzheng Xu
:
Reversible Privacy Preserving on Vision-Language Models via Adversarial Multimodal Key. 3380-3389 - Taras Kucherenko
, Derek Peristy
, Judith Bütepage
:
Evaluating the Evaluators: Towards Human-aligned Metrics for Missing Markers Reconstruction. 3390-3398 - Changho Choi
, Youngwoo Shin
, Gyojin Han
, Dong-Jae Lee
, Junmo Kim
:
B4DL: A Benchmark for 4D LiDAR LLM in Spatio-Temporal Understanding. 3399-3407 - Fenghe Tang
, Bingkun Nian
, Jianrui Ding
, Wenxin Ma
, Quan Quan
, Chengqi Dong
, Jie Yang
, Wei Liu
, S. Kevin Zhou
:
Mobile U-ViT: Revisiting large kernel and U-shaped ViT for efficient medical image segmentation. 3408-3417 - Ling You
, Wenxuan Huang
, Xinni Xie
, Xiangyi Wei
, Bangyan Li
, Shaohui Lin
, Yang Li
, Changbo Wang
:
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation. 3418-3427 - Bowen Guo
, Shiwei Gan
, Yafeng Yin
, Xiao Liu
, Zhiwei Jiang
, Shunmei Meng
:
Sentence-level Segmentation for Long Sign Language Videos with Captions. 3428-3437 - Jiayi Zou
, Chaofan Chen
, Bing-Kun Bao
, Changsheng Xu
:
DMC3: Dual-Modal Counterfactual Contrastive Construction for Egocentric Video Question Answering. 3438-3447 - Penglei Sun
, Yaoxian Song
, Xiangru Zhu
, Xiang Liu
, Qiang Wang
, Yue Liu
, Changqun Xia
, Tiefeng Li
, Yang Yang
, Xiaowen Chu
:
City-VLM: Towards Multidomain Perception Scene Understanding via Multimodal Incomplete Learning. 3448-3457 - Yuzhen Niu
, Siling Chen
, Yuzhong Chen
, Fusheng Li
, Rui Xu
, Hui Da
:
CoFiVLA: Synergistic Coarse-Fine Vision-Language Alignment for Image Aesthetic Assessment. 3458-3467 - Duolin Wang
, Guanyu Xing
, Yanli Liu
:
FlowTrack: Integrating Adjacent-Frame Motion Tracking and Adaptive Prediction for Robust Semi-Supervised VOS. 3468-3476 - Lin Zhang
, Yi Tian
, Xiyun Wang
, Wanru Xu
, Yi Jin
, Yaping Huang
:
Differential Contrastive Training for Gaze Estimation. 3477-3486 - Tiancheng Gu
, Kaicheng Yang
, Chaoyi Zhang
, Yin Xie
, Xiang An
, Ziyong Feng
, Dongnan Liu
, Weidong Cai
, Jiankang Deng
:
RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm. 3487-3496 - Yanting Pei
, Fan Yang
:
Adaptive Neighbors and Uncertainty Estimation for Source-Free Unsupervised Domain Adaptation with Noisy Labels. 3497-3506 - Bingshuai Liu
, Ante Wang
, Zijun Min
, Chenyang Lyu
, Longyue Wang
, Zhihao Wang
, Xu Han
, Peng Li
, Jinsong Su
:
EditEval: Towards Comprehensive and Automatic Evaluation for Text-guided Video Editing. 3507-3516 - Rui Chen
, Lei Sun
, Jing Tang
, Geng Li
, Xiangxiang Chu
:
FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos. 3517-3526 - Zizhi Chen
, Xinyu Zhang
, Minghao Han
, Yizhou Liu
, Ziyun Qian
, Weifeng Zhang
, Xukun Zhang
, Jingwei Wei
, Lihua Zhang
:
VLM-based Prompts as the Optimal Assistant for Unpaired Histopathology Virtual Staining. 3527-3536 - Zihao Mo
, Junye Chen
, Chaowei Fang
, Guanbin Li
:
PatchWiper: Leveraging Dynamic Patch-Wise Parameters for Real-World Visible Watermark Removal. 3537-3545 - Xueyu Yuan
, Jiarui Zhang
, Jiangqi Song
, Liu Liu
, Li Zhang
, Dan Guo
, Richang Hong
, Meng Wang
:
DFGAP: Towards Depth-Free Cross-Category GAParts Perception via Uncertainty-Quantified Modeling. 3546-3554 - Yudong Zhang
, Ruobing Xie
, Xingwu Sun
, Yiqing Huang
, Jiansheng Chen
, Zhanhui Kang
, Di Wang
, Yu Wang
:
DHCP: Detecting Hallucinations by Cross-modal Attention Pattern in Large Vision-Language Models. 3555-3564 - Wenjie Zhu
, Yabin Zhang
, Xin Jin
, Wenjun Zeng
, Lei Zhang
:
Knowledge Regularized Negative Feature Tuning of Vision-Language Models for Out-of-Distribution Detection. 3565-3574 - Ji Ma
, Wei Suo
, Peng Wang
, Yanning Zhang
:
Short-LVLM: Compressing and Accelerating Large Vision-Language Models by Pruning Redundant Layers. 3575-3584 - Xudong Wang
, Lei Tan
, Pingyang Dai
, Liujuan Cao
, Rongrong Ji
:
GPT-ReID: Learning Fine-grained Representation with GPT for Text-based Person Retrieval. 3585-3594 - Runze Zhao
, Fuqing Zhu
, Jizhong Han
, Songlin Hu
:
Visual Perception Uncertainty Learning for Hallucination Detection in Large Vision-Language Models. 3595-3604 - Lei Liu
, Xiangdong Su
, Guanglai Gao
:
Fourier Self-Adaptation for Transferring General Pretrained Models to Specific Domains. 3605-3614 - Yiying Yang
, Fukun Yin
, Jiayuan Fan
, Wanzhang Li
, Xin Chen
, Gang Yu
:
Scene123: One Prompt to 3D Scene Generation via Video-Assisted and Consistency-Enhanced MAE. 3615-3624 - Gefan Ye
, Lin Li
, Kexin Li
, Jun Xiao
, Long Chen
:
Zero-shot Compositional Action Recognition with Neural Logic Constraints. 3625-3634 - Yijun Wang
, Siying Wu
, Lubin Gan
, Zheyu Zhang
, Jing Zhang
, Zhangchi Hu
, Huyue Zhu
, Peixi Wu
, Xiaoyan Sun
:
MeDKCoOp: Dual Knowledge-guided Graph Prompt Learning for Biomedical Vision-Language Models. 3635-3644 - Jianhui Wang
, Yangfan He
, Yan Zhong
, Xinyuan Song
, Jiayi Su
, Yuheng Feng
, Ruoyu Wang
, Hongyang He
, Wenyu Zhu
, Xinhang Yuan
, Miao Zhang
, Keqin Li
, Jiaqi Chen
, Tianyu Shi
, Xueqian Wang
:
Twin Co-Adaptive Dialogue for Progressive Image Generation. 3645-3653 - Jiayuan Rao
, Zifeng Li
, Haoning Wu
, Ya Zhang
, Yanfeng Wang
, Weidi Xie
:
Multi-Agent System for Comprehensive Soccer Understanding. 3654-3663 - Yuguang Zhang
, Qihang Fan
, Huaibo Huang
:
Vision Transformer with Sparse Scan Prior. 3664-3672 - Shaohui Dai
, Yansong Qu
, Zheyan Li
, Xinyang Li
, Shengchuan Zhang
, Liujuan Cao
:
Training-Free Hierarchical Scene Understanding for Gaussian Splatting with Superpoint Graphs. 3673-3682 - Qinchen Wu
, Difei Gao
, Qinghong Lin
, Zhuoyu Wu
, Mike Zheng Shou
:
GUI-Narrator: Detecting and Captioning Computer GUI Actions. 3683-3692 - Liangyu Fu
, Junbo Wang
, Yuke Li
, Qiangguo Jin
, Hongsong Wang
, Jing Ya
, Linjiang Huang
, Liang Yao
, Jiangbin Zheng
, Xuecheng Wu
, Zhiyong Wang
:
DSACap: Enhancing Visual-Semantic Alignment with Diffusion-based Framework for Image Captioning. 3693-3701 - Meng Wei
, Zhongnian Li
, Peng Ying
, Xinzheng Xu
:
Seeing the Undefined: Chain-of-Action for Generative Semantic Labels. 3702-3711 - Zikang Liu
, Kun Zhou
, Wayne Xin Zhao
, Dawei Gao
, Yaliang Li
, Ji-Rong Wen
:
Less is More: High-value Data Selection for Visual Instruction Tuning. 3712-3721 - Mengzu Liu
, Junwei Xu
, Tao Huang
, Fangfang Wu
, Le Dong
, Xin Li
, Weisheng Dong
:
Exploring Global Correlations via Polarity Memory for Multispectral Demosaicing. 3722-3730 - Zhaofeng Shi
, Heqian Qiu
, Lanxiao Wang
, Qingbo Wu
, Fanman Meng
, Hongliang Li
:
Unsupervised Ego- and Exo-centric Dense Procedural Activity Captioning via Gaze Consensus Adaptation. 3731-3740 - Chao Yin
, Hao Li
, Kequan Yang
, Jide Li
, Pinpin Zhu
, Xiaoqiang Li
:
Stepwise Decomposition and Dual-stream Focus: A Novel Approach for Training-free Camouflaged Object Segmentation. 3741-3750 - Shanding Diao
, Yang Zhao
, Yuan Chen
, Zhao Zhang
, Wei Jia
, Ronggang Wang
:
Multi-Layer Gaussian Splatting for Single-Image Feed-Forward Spatial Scene Reconstruction. 3751-3759 - Yang Ren
, Hai Jiang
, Wei Li
, Menglong Yang
, Heng Zhang
, Zehua Sheng
, Qingsheng Ye
, Shuaicheng Liu
:
Learning Arbitrary-Scale RAW Image Downscaling with Wavelet-based Recurrent Reconstruction. 3760-3768 - Wenqi Zeng
, Yuqi Sun
, Chenxi Ma
, Weimin Tan
, Bo Yan
:
MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks. 3769-3778 - Zelei Wu
, Xulun Ye
, Jieyu Zhao
:
Clustering-Based Tail-class Mitigation for New-class Discovery. 3779-3787 - Siqi Song
, Limin Yu
, Jimin Xiao
:
SDP: Spectral-Decomposed Prompting for Continual Learning. 3788-3797 - Shubo Liu
, Hongsheng Zhang
, Qian Qiao
, Qi Wu
, Peng Wang
:
VLN-ChEnv: Vision-language Navigation in Changeable Environments. 3798-3807 - Kedong Xiu
, Sai Qian Zhang
:
CapRecover: A Cross-Modality Feature Inversion Attack Framework on Vision Language Models. 3808-3816 - Fan Yang
, Ling Deng
, Zhiyong Gan
, Qisheng He
, Yuanbo Fang
, Xiangmin Xu
, Shuangping Huang
, Tianshui Chen
:
Optimal Feature Embedding for Document Large Visual Language Model. 3817-3826 - Lin Li
, Guikun Chen
, Zhen Wang
, Jun Xiao
, Long Chen
:
Compositional Zero-shot Learning via Progressive Language-based Observations. 3827-3836 - Weimin Cheng
, Zhenyu Wang
, Tao Huang
, Fangfang Wu
, Weisheng Dong
:
Pushing the Limit of Binarized Neural Network for Image Super Resolution with Smooth Information Transmission. 3837-3846 - Xiang Ma
, Litian Xu
, Lexin Fang
, Caiming Zhang
, Lizhen Cui
:
Reliable Cross-modal Alignment via Prototype Iterative Construction. 3847-3855 - Ran Chen
, Taiyi Su
, Hanli Wang
:
WaveCL: Wavelet Calibration Learning for Referring Video Object Segmentation. 3856-3864 - Jingxing Guo
, Guilian Chen
, Yimu Sun
, Huisi Wu
, Jing Qin
:
Hierarchical Spatiotemporal Context Aggregation and Speckle-aware Deformable Convolution for Echocardiography Video Segmentation. 3865-3874 - Junkang Liu
, Fanhua Shang
, Yuxuan Tian
, Hongying Liu
, Yuanyuan Liu
:
Consistency of Local and Global Flatness for Federated Learning. 3875-3883 - Yangxu Yin
, Honglong Chen
, Yudong Gao
, Peng Sun
, Liantao Wu
, Zhe Li
, Weifeng Liu
:
FFCBA: Feature-based Full-target Clean-label Backdoor Attacks. 3884-3892 - Sijing Li
, Tianwei Lin
, Lingshuai Lin
, Wenqiao Zhang
, Jiang Liu
, Xiaoda Yang
, Juncheng Li
, Yucheng He
, Xiaohui Song
, Jun Xiao
, Yueting Zhuang
, Beng Chin Ooi
:
EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model. 3893-3902 - Changtao Miao
, Qi Chu
, Tao Gong
, Zhentao Tan
, Zhenchao Jin
, Wanyi Zhuang
, Man Luo
, Honggang Hu
, Nenghai Yu
:
Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization. 3903-3912 - Shanshan Li
, Jiawei Hou
, Da Huang
, Yanwei Fu
, Xiangyang Xue
:
Ali-UI: Enhancing Complex Vision-Language Navigation with Alignment of Unified Map and Instruction Parsing. 3913-3922 - Ziming Zhao
, Zhaoxuan Li
, Tingting Li
, Fan Zhang
:
Stealthy-AE: Generating Stealthy Adversarial Examples through Online Social Networks. 3923-3931 - Hanning Chen
, Yang Ni
, Wenjun Huang
, Hyunwoo Oh
, Yezi Liu
, Tamoghno Das
, Mohsen Imani
:
LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation. 3932-3941 - Yonghyeon Jo
, Janghyun Kim
, Jinsun Park
:
BAC-GCN: Background-Aware CLIP-GCN Framework for Unsupervised Multi-Label Classification. 3942-3951 - Dingwei Zhang
, Dong Zhang
, Jinhui Tang
:
Mitigating Query Selection Bias in Referring Video Object Segmentation. 3952-3961 - Xiangyu Shan
, Heng Song
, Junwu Zhu
:
DFCNet: Dual-Factor Compensatory Clustering Network for Modality-Imbalanced Generalized Zero-Shot Learning. 3962-3971 - Zhiyuan Fan
, Keyi Liang
:
Video-to-Image Affordance Grounding via Visual Conceptual Learning. 3972-3980 - Qiyan Zhao
, Xiaofeng Zhang
, Yiheng Li
, Yun Xing
, Xiaosong Yuan
, Feilong Tang
, Sinan Fan
, Xuhang Chen
, Da-Han Wang
, Xu-Yao Zhang
:
MCA-LLaVA: Manhattan Causal Attention for Reducing Hallucination in Large Vision-Language Models. 3981-3990 - Dexuan Xu
, Yanyuan Chen
, Yu Huang
, Shihao E
, Yiwei Lou
, Yongzhi Cao
, Hanpin Wang
, Meikang Qiu
:
Medical Vision-Language Pre-training with Multimodal Variational Masked Autoencoder for Robust Medical VQA. 3991-4000 - Yili Li
, Gang Xiong
, Gaopeng Gou
, Xiangyan Qu
, Jiamin Zhuang
, Zhen Li
, Junzheng Shi
:
T2VParser: Adaptive Decomposition Tokens for Partial Alignment in Text to Video Retrieval. 4001-4009 - Yizhi Hu
, Zezhao Tian
, Xingqun Qi
, Chen Su
, Bingkun Yang
, Junhui Yin
, Muyi Sun
, Man Zhang
, Zhenan Sun
:
ReMeREC: Relation-aware and Multi-entity Referring Expression Comprehension. 4010-4019 - Xiaoqin Wang
, Xianxu Hou
, Meidan Ding
, Junliang Chen
, Kaijun Deng
, Jinheng Xie
, Linlin Shen
:
DisFaceRep: Representation Disentanglement for Co-occurring Facial Components in Weakly Supervised Face Parsing. 4020-4029 - Zhenni Yu
, Li Zhao
, Guobao Xiao
, Xiaoqin Zhang
:
SAM-TTT: Segment Anything Model via Reverse Parameter Configuration and Test-Time Training for Camouflaged Object Detection. 4030-4038 - Jing Ma
, Haochen Sun
, Zeyuan Zang
, Fangxiang Feng
, Caixia Yuan
, Lei Ren
, Huixing Jiang
, Wei Chen
, Xiaojie Wang
:
VL-DynaRefine: A Vision-Language Dynamic Refinement Approach for Visual Reasoning. 4039-4047 - Jiao Chen
, Jiayi He
, Fangfang Chen
, Zuohong Lv
, Jianhua Tang
:
Forward-Only Continual Learning. 4048-4057 - Jiahua Bao
, Siyao Cheng
, Jiaxing Du
, Changjiang He
, Zeming Lang
, Hao Zhang
, Jie Liu
:
BOLT: Fewer Tokens but More Performance Retention for Efficient Vision-Language Models Inference. 4058-4067 - Ziqi Yuan
, Jun Li
, Yanghao Li
, Yuxiang Huang
, Chi Chen
, Shuo Wang
, Zhinan Gou
:
CITR: Efficient Long Video Understanding Needs Causal Importance. 4068-4076 - Qi Li
, Yucan Zhou
, Jiang Zhou
, XingYou Yang
, Xiaoyan Gu
:
Diverse and Public Features Cooperation via Gradient Rectification for Federated Prompt Learning. 4077-4086 - Shilei Wang
, Gong Cheng
, Pujian Lai
, Dong Gao
, Junwei Han
:
Multi-State Tracker: Enhancing Efficient Object Tracking via Multi-State Specialization and Interaction. 4087-4096 - Xinyu Zhang
, Lingling Zhang
, Yanrui Wu
, Muye Huang
, Jun Liu
:
Cognitive Predictive Coding Network: Rethinking the Generalization in Raven's Progressive Matrices. 4097-4106 - Xiaoxuan Mu
, Haoyu Tang
, Han Jiang
, Tianyuan Liang
, Qinghai Zheng
, Jihua Zhu
:
FACE: A Dual-Template and Adaptive Curriculum Framework for Unsupervised Text-Based Person Search. 4107-4116 - Xinyu Huang
, Yi-Jie Huang
, Youcai Zhang
, Weiwei Tian
, Rui Feng
, Yuejie Zhang
, Yanchun Xie
, Yaqian Li
, Lei Zhang
:
Open-Set Image Tagging with Multi-Grained Text Supervision. 4117-4126 - Zhihao Wang
, Shiyu Liu
, Zhiwei He
, Kangjie Zheng
, Liangying Shao
, Junfeng Yao
, Jinsong Su
:
Gloss Matters: Unlocking the Potential of Non-Autoregressive Sign Language Translation. 4127-4136 - Jiye Xie
, Yifei Gao
, Liangliang You
, Xiang Xu
, Haoran Xu
, Zhiqiang Kou
, Kexue Fu
, Youyang Qu
, Wenjie Yang
, Jianwei Guo
, Weiliang Meng
, Longxiang Gao
, Haoran Yang
, Changwei Wang
, Yu Zhang
:
Collaboration Wins More: Dual-Modal Collaborative Attention Reinforcement for Mitigating Large Vision Language Models Hallucination. 4137-4146 - Xinzhe Xia
, Weiguang Zhao
, Yuyao Yan
, Guanyu Yang
, Rui Zhang
, Kaizhu Huang
, Xi Yang
:
Towards Training-Free Open-World Classification with 3D Generative Models. 4147-4155 - Mingyu Fu
, Wei Suo
, Ji Ma
, Lin Yuanbo Wu
, Peng Wang
, Yanning Zhang
:
Mitigating Information Loss under High Pruning Rates for Efficient Large Vision Language Models. 4156-4165 - Zijun Xu
, Jiahao Guo
, Chunjie Zhang
, Zhongyuan Wang
, Chunxia Xiao
, Chao Liang
:
Quantum Interference-Inspired Who-What-Where Composite-Semantics Instance Search for Story Videos. 4166-4174 - Lihong Qiao
, Shiyi Gao
, Yucheng Shu
, Bin Xiao
, Weisheng Li
, Xinbo Gao
:
Pathology-Aware Reconstruction with Discriminative Knowledge Boosting Alignment for Che-Xray Vision-Language Pre-training. 4175-4184 - Rongzhen Zhao
, Yi Zhao
, Juho Kannala
, Joni Pajarinen
:
Slot Attention with Re-Initialization and Self-Distillation. 4185-4192 - Qucheng Peng
, Chen Bai
, Guoxiang Zhang
, Bo Xu
, Xiaotong Liu
, Xiaoyin Zheng
, Chen Chen
, Cheng Lu
:
NavigScene: Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving. 4193-4202 - Zizhuo Li
, Chunbao Su
, Fan Fan
, Jun Huang
, Jiayi Ma
:
CorrNeXt: Making the ConvNet-Style Correspondence Pruner Stronger for Two-View Geometry. 4203-4212 - Jinxu Zhang
, Qiyuan Fan
, Yongqi Yu
, Yu Zhang
:
DREAM: Integrating Hierarchical Multimodal Retrieval with Multi-page Multimodal Language Model for Documents VQA. 4213-4221 - Junyi Wang
, Yue Qi
:
Visual Localization using Hybrid Feature Grid and Learned Weighted Global Point Cloud. 4222-4231 - Yifan Zhang
, Yang Shi
, Weichen Yu
, Qingsong Wen
, Xue Wang
, Wenjing Yang
, Zhang Zhang
, Liang Wang
, Rong Jin
:
Debiasing Multimodal Large Language Models via Penalization of Language Priors. 4232-4241 - Xiaolei Bo
, Feiyang Yang
, Feilong Xu
, Xiaoli Zhang
:
Cross-Counter-Repeat Attention for Enhanced Understanding of Visual Semantics in Radiology Report Generation. 4242-4250 - Jiacheng Ruan
, Zongyun Zhang
, Jingsheng Gao
, Wenzhen Yuan
, Ting Liu
, Yuzhuo Fu
:
MPI-CD: Multi-Path Information Contrastive Decoding for Mitigating Hallucinations in Large Vision-Language Models. 4251-4260 - Hao Sun
, Fenggen Yu
, Huiyao Xu
, Tao Zhang
, Changqing Zou
:
LL-Gaussian: Low-Light Scene Reconstruction and Enhancement via Gaussian Splatting for Novel View Synthesis. 4261-4270 - Hongchen Wei
, Zhenzhong Chen
:
RealVG: Unleashing MLLMs for Training-Free Spatio-Temporal Video Grounding in the Wild. 4271-4280 - Hongchen Wei
, Zhenzhong Chen
:
Visual Context Window Extension: A New Perspective for Long Video Understanding. 4281-4289 - Yu Liu
, Kun Sun
, Chang Tang
, Yuhua Qian
, Xin Li
:
TPDepth: Leveraging Text Prompts with ControlNet to Boost Diffusion-based Depth Estimation. 4290-4299 - Yingxin Lai
, Hongyang Wang
, Jing Yang
, Xiangui Kang
, Bin Li
, Linlin Shen
, Zitong Yu
:
GM-DF: Generalized Multi-Scenario Deepfake Detection. 4300-4309 - Kun Zhai
, Siheng Chen
, Xingjun Ma
, Yu-Gang Jiang
:
FedAPT: Federated Adversarial Prompt Tuning for Vision-Language Models. 4310-4318 - Jie Wan
, Jianhao Fu
, Ziqi Yang
, Kui Ren
:
BTUAP: Boosting the Transferability of Universal Adversarial Perturbations in the Black-box Setting under various data dependencies. 4319-4328 - Hui Wu
, Haoquan Zhai
, Yuchen Li
, Hengyi Cai
, Peirong Zhang
, Yidan Zhang
, Lei Wang
, Chunle Wang
, Yingyan Hou
, Shuaiqiang Wang
, Dawei Yin
:
MARA: A Multimodal Adaptive Retrieval-Augmented Framework for Document Question Answering. 4329-4338 - Bocheng Pan
, Hailong Shi
, Xingyu Gao
:
DR-VQA: Decompose-then-Reconstruct for Visual Question Answering in BLV Assistance. 4339-4348 - Wei Jia
, Li Jin
, Kaiwen Wei
, Yuying Shang
, Nayu Liu
, Zhicong Lu
, Qing Liu
, Linhao Zhang
, Jiang Zhong
, Yanfeng Hu
:
U-MERE: Unconstrained Multimodal Entity and Relation Extraction with Collaborative Modeling and Order-Sensitive Optimization. 4349-4358 - Luyao Ren
, Wenxin Yu
, Zhiqiang Zhang
, Chang Liu
:
EMIFS: Efficient Multi-scale Information Fusion Self-supervision for Medical Image Segmentation. 4359-4368 - Chenxi Zhang
, Qing Zhang
, Jiayun Wu
, Youwei Pang
:
CGCOD: Class-Guided Camouflaged Object Detection. 4369-4377 - Wenzheng Yang
, Songwei Pei
, Bingfeng Liu
, Qian Li
, Shangguang Wang
:
OGDepth: Leveraging Object Guidance in Diffusion Models for Enhanced Monocular Depth Estimation. 4378-4387 - Xueyi Zhang
, Peiyin Zhu
, Yuan Liao
, Xiyu Wang
, Mingrui Lao
, Siqi Cai
, Yanming Guo
, Haizhou Li
:
TrustCLIP: Learning from Noisy Labels via Semantic Label Verification and Trust-aligned Gradient Projection. 4388-4397 - Yikun Ji
, Yan Hong
, Jiahui Zhan
, Haoxing Chen
, Jun Lan
, Huijia Zhu
, Weiqiang Wang
, Liqing Zhang
, Jianfu Zhang
:
Towards Explainable Fake Image Detection with Multi-Modal Large Language Models. 4398-4407 - Xiaodong Wang
, Hongmin Hu
, Fei Yan
, Junwen Lu
, Zhiqiang Zeng
, Weidong Hong
, Zhedong Zheng
:
UniAD: Integrating Geometric and Semantic Cues for Unified Anomaly Detection. 4408-4417 - Runwei Situ
, Yi Cai
, Yong Xu
, Jiexin Wang
:
Ground and Reconstruct: Entity-Region Bidirectional Alignment Pre-Training for Low-Resource GMNER. 4418-4426 - Yongquan Xue
, Zhaoru Guo
, Zhaozhao Su
, Chong Peng
, Jun Feng
, Pan Zhou
, Marcin Pietron
, Xiyuan Wang
, Liejun Wang
, Panpan Zheng
:
Rodecon-net: Medical Image Segmentation via Robust Decoupling and Contrast-enhanced Fusion. 4427-4435 - Wenxi Huang
, Xiaojun Chen
, Qin Zhang
, Ting Wan
, Ziqi Liu
, Liangjie Zhang
:
MRBench: A Multi-Image Reasoning Benchmark with Adaptive Knowledge Retrieval. 4436-4445 - Xuanliu Zhu
, Yiqiao Chai
, Runnan Li
, Mingying Lan
, Li Gao
:
CrossMind-VL: Multi-Subject Mind-to-Video Decoding with Multimodal LLM Semantic Grounding. 4446-4454 - Jiaqing Fan
, Hanwen Qian
, Mengjuan Jiang
, Fanzhang Li
:
PeriodVOS: Learning Periodic Patterns for Unsupervised Video Object Segmentation via Adaptive Contextual Coupling. 4455-4463 - Xiangzhao Hao
, Kuan Zhu
, Hongyu Guo
, Haiyun Guo
, Ning Jiang
, Quan Lu
, Ming Tang
, Jinqiao Wang
:
Referring Expression Instance Retrieval and A Strong End-to-End Baseline. 4464-4473 - Lifeng Lin
, Rongfeng Lu
, Quan Chen
, Haofan Ren
, Ming Lu
, Yaoqi Sun
, Chenggang Yan
, Anke Xue
:
VGNC: Reducing the Overfitting of Sparse-view 3DGS via Validation-guided Gaussian Number Control. 4474-4483 - Sidun Liu
, Wenyu Li
, Peng Qiao
, Yong Dou
:
Regist3R: Incremental Registration with Stereo Foundation Model. 4484-4493 - Zichi Liu
, Yinggui Wang
, Tao Wei
, Chao Ma
:
AnchorSync: Global Consistency Optimization for Long Video Editing. 4494-4503 - Hongxu Ma
, Chenbo Zhang
, Lu Zhang
, Jiaogen Zhou
, Jihong Guan
, Shuigeng Zhou
:
Fine-grained Zero-Shot Object Detection. 4504-4513 - Hongxu Ma
, Guanshuo Wang
, Fufu Yu
, Qiong Jia
, Shouhong Ding
:
MS-DETR: Towards Effective Video Moment Retrieval and Highlight Detection by Joint Motion-Semantic Learning. 4514-4523 - Hao Ruan
, Jinliang Lin
, Yingxin Lai
, Zhiming Luo
, Shaozi Li
:
HCCM: Hierarchical Cross-Granularity Contrastive and Matching Learning for Natural Language-Guided Drones. 4524-4533 - Yun Li
, Lina Yao
, Zhe Liu
:
Compositional Zero-Shot Learning with Contextualized Cues and Adaptive Contrastive Training. 4534-4541 - Zhuming Wang
, Yihao Zheng
, Jiarui Li
, Yaofei Wu
, Yan Huang
, Zun Li
, Lifang Wu
, Liang Wang
:
VicKAM: Visual Conceptual Knowledge Guided Action Map for Weakly Supervised Group Activity Recognition. 4542-4551 - Yuzhen Li
, Min Liu
, Yuan Bian
, Xueping Wang
, Zhaoyang Li
, Gen Li
, Yaonan Wang
:
Dual Enhancement on 3D Vision-Language Perception for Monocular 3D Visual Grounding. 4552-4561 - Yiliang Zhu
, Dayan Wu
, Qinghang Su
, Zexian Yang
, Zheng Lin
, Weiping Wang
:
Mitigating the Evolving Semantic Entanglement in Continual Learning of Vision-Language Models. 4562-4570 - Xiongwei Dang
, Wenxuan Liu
, Xian Zhong
, Zheng Wang
:
SegTraj: A Segmented-Trajectory-Aware Spatio-Temporal Graph Convolutional Network for Social Group Detection. 4571-4579 - Sifan Zuo
, Youfa Liu
, Bo Du
:
CSDN: CLIP-Driven Similarity-Aligned Distillation Network for Weakly-Supervised Object Localization. 4580-4589 - Dirui Xie
, Xiaofang Hu
, Zihan Wei
, Zhengqiqi Yang
, Yanlian Jiang
, Yue Zhou
:
Learning Structural Priors via Laplacian RWKV Diffusion with Light-Effect Dataset for Nighttime Visibility Enhancement. 4590-4599 - Biao Chen
, Kunbin He
, Zhikun Zheng
, Mengmeng Jing
, Lin Zuo
:
Chain-of-Thought Guided Semantic Debiasing for Low-Shot Vision-Language Tasks. 4600-4609 - Shengli Zhou
, Yang Liu
, Feng Zheng
:
Learn 3D VQA Better with Active Selection and Reannotation. 4610-4618 - Kun Ding
, Ying Wang
, Shiming Xiang
:
EvoVLMA: Evolutionary Vision-Language Model Adaptation. 4619-4628 - Yang Liu
, Zhiyong Zhang
:
DSP: Dense-Sparse Parallel Networks for Self-supervised 3D Multi-person Pose Estimation from Multiple Views. 4629-4638 - Meng Chu
, Yicong Li
, Tat-Seng Chua
:
GraphVideoAgent: Enhancing Long-form Video Understanding with Entity Relation Graphs. 4639-4648 - Hancong Wang
, Yue Yu
, Hairong Zheng
, Tong Zhang
:
Test-Time Adaptation of Medical Vision-Language Models with Mixture of Modality Experts. 4649-4658 - Zixuan Wan
, Jiqing Zhang
, Yushan Wang
, Hu Lin
, Yafei Wang
, Zetian Mi
, Xin Yang
, Xianping Fu
, Huibing Wang
:
Eye-based Emotion Recognition via Event-Driven Sparse Transformers. 4659-4668 - Guoxin Zhang
, Zhonghong Ou
, Kaiwen Xue
, Jiangfeng Sun
, Yifan Zhu
, Siyuan Yao
, Yiran Shen
, Meina Song
:
DGFSD: Bridging the Gap between Dense and Sparse for Fully Sparse 3D Object Detection. 4669-4678 - Benlong Wu
, Yuang Qi
, Xiuwei Shang
, Weiming Zhang
, Nenghai Yu
, Kejiang Chen
:
MMPro: A Decoupled Perception-Thinking-Execution Framework for Secure GUI Agent. 4679-4687 - Shengqian Zhu
, Chengrong Yu
, Wenbo Qi
, Jiafei Wu
, Ying Song
, Guangjun Li
, Zhang Yi
, Xiaogang Xu
, Junjie Hu
:
PRIME: Prototype-Driven Class Incremental Learning for Medical Image Segmentation. 4688-4697 - Qile Su
, Shoutai Zhu
, Shuai Zhang
, Baoyu Liang
, Chao Tong
:
EventFormer: A Node-graph Hierarchical Attention Transformer for Action-centric Video Event Prediction. 4698-4707 - Haijing Liu
, Tao Pu
, Hefeng Wu
, Keze Wang
, Liang Lin
:
DART: Dual Adaptive Refinement Transfer for Open-Vocabulary Multi-Label Recognition. 4708-4717 - Mahiro Ukai
, Shuhei Kurita
, Nakamasa Inoue
:
STATUS Bench: A Rigorous Benchmark for Evaluating Object State Understanding in Vision-Language Models. 4718-4727 - Shiying Lin
, Rong Hu
, Zuoyong Li
, Qinghua Lin
, Jiawei Wu
, Changqing Zhang
:
Gradient-Aware Revitalization of Non-Effective Samples in Medical Image Segmentation. 4728-4737 - Chang Su
, Beihong Jin
, Fusang Zhang
, Siheng Li
, Zhi Wang
:
Self-Supervised Human Mesh Recovery from Partial Point Cloud via a Self-Improving Loop. 4738-4747 - Ruoxuan Li
, Xiangyu Wu
, Yang Yang
:
Noise Self-Correction via Relation Propagation for Robust Cross-Modal Retrieval. 4748-4757 - Yangyang Xu
, Xi Ye
, Duo Su
:
Multi-Task Dense Prediction Fine-Tuning with Mixture of Fine-Grained Experts. 4758-4767 - Siran Peng
, Tianshuo Zhang
, Li Gao
, Xiangyu Zhu
, Haoyuan Zhang
, Kai Pang
, Zhen Lei
:
WMamba: Wavelet-based Mamba for Face Forgery Detection. 4768-4777 - Nanxing Hu
, Xiaoyue Duan
, Jinchao Zhang
, Guoliang Kang
:
Enhancing Visual Reliance in Text Generation: A Bayesian Perspective on Mitigating Hallucination in Large Vision-Language Models. 4778-4787 - Yiwen Liang
, Hui Chen
, Yizhe Xiong
, Zihan Zhou
, Mengyao Lyu
, Zijia Lin
, Shuaicheng Niu
, Sicheng Zhao
, Jungong Han
, Guiguang Ding
:
Advancing Reliable Test-Time Adaptation of Vision-Language Models under Visual Variations. 4788-4797 - Chunpeng Wang
, Wenlong Ma
, Li Zou
, Zhiqiu Xia
, Qi Li
, Bin Ma
, Yunan Liu
:
Toward Robust Deepfake Detection: A Proactive Method Based on Watermarking and Knowledge Distillation. 4798-4807 - Futa Waseda
, Saku Sugawara
, Isao Echizen
:
Quality Text, Robust Vision: The Role of Language in Enhancing Visual Robustness of Vision-Language Models. 4808-4816 - Zhenghao Liu
, Xingsheng Zhu
, Tianshuo Zhou
, Xinyi Zhang
, Xiaoyuan Yi
, Yukun Yan
, Ge Yu
, Maosong Sun
:
Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts. 4817-4826 - Garry Yang
, Zizhe Chen
, Man Hon Wong
, Haoyu Lei
, Yongqiang Chen
, Zhenguo Li
, Kaiwen Zhou
, James Cheng
:
MESH - Understanding Videos Like Human: Measuring Hallucinations in Large Video Models. 4827-4836 - Jiawei Zheng
, Feiyan Liu
, Xiaoli Wang
:
Seeing Through Ambiguity: Effective Video-guided Machine Translation via Chaotic Fusion and Causally Aligned Spatio-temporal Attention. 4837-4845 - Qingqing Fang
, Wenxi Lv
, Qinliang Su
:
AF-CLIP: Zero-Shot Anomaly Detection via Anomaly-Focused CLIP Adaptation. 4846-4855 - Qiqi Zhan
, Shiwei Li
, Qingjie Liu
, Yunhong Wang
:
AttriPrompt: Dynamic Prompt Composition Learning for CLIP. 4856-4865 - Rui Pan
, Ruiying Lu
:
SP-Mamba: Spatial-Perception State Space Model for Unsupervised Medical Anomaly Detection. 4866-4874 - Feiran Liu
, Yuzhe Zhang
, Xinyi Huang
, Yinan Peng
, Xinfeng Li
, Lixu Wang
, Yutong Shen
, Ranjie Duan
, Simeng Qin
, Xiaojun Jia
, Qingsong Wen
, Wei Dong
:
The Eye of Sherlock Holmes: Uncovering User Private Attribute Profiling via Vision-Language Model Agentic Framework. 4875-4883 - Xianrun Xu
, Baoyao Yang
, Wanyun Li
, Jingsong Lin
, Yufei Xu
:
Simple but Effective: Sub-Volume Contrastive Learning for Class-Imbalanced Semi-Supervised 3D Medical Image Segmentation. 4884-4893 - Junlin Fang
, Wenya Wang
, Lingli Zhang
, Fengmao Lv
:
Why is a Bird's Caption a Good Demonstration? Towards Effective Multimodal In-Context Learning without Dedicated Data. 4894-4903 - Xide Xu
, Sandesh Kamath
, Muhammad Atif Butt
, Bogdan Raducanu
:
An h-space Based Adversarial Attack for Protection Against Few-shot Personalization. 4904-4913 - Yiqing Hao
, Yangru Huang
, Yi Jin
, Tao Wang
, Yidong Li
, Yigang Cen
:
Tree of Prompts: Aligning Hierarchical Visual Prior for Continual Generalized Category Discovery. 4914-4922 - Wenxiang Liu
, Yongkang Liu
, Weiliang Meng
, Gaoqi He
, Jianhua Li
:
D3L: Curvature-Constrained Denoising Diffusion Model for 3D Lane Detection. 4923-4931 - Bingcai Wei
, Hui Liu
, Chuang Qian
, Zijian Li
, Wangyu Wu
, Zijie Meng
:
Robust Single Image Sand Removal by Leveraging Uncertainty-aware SAM Priors and Prompt Learning with Refined Perceptual Loss. 4932-4941 - Ziyan Liu
, Junwen Li
, Kaiwen Li
, Tong Ruan
, Chao Wang
, Xinyan He
, Zongyu Wang
, Xuezhi Cao
, Jingping Liu
:
I2CR: Intra- and Inter-modal Collaborative Reflections for Multimodal Entity Linking. 4942-4951 - Changzhou Li
, Xinyu Yang
, Weiguo Yang
, Xinyi Li
:
VaF-LangSplat: Voxel-Aware Fusion Language Gaussian Splatting. 4952-4961 - Chang Huang
, Jiahang Cao
, Jun Ma
, Kieren Yu
, Cong Li
, Huayong Yang
, Kaishun Wu
:
DACA-Net: A Degradation-Aware Conditional Diffusion Network for Underwater Image Enhancement. 4962-4971 - Muzhi Dai
, Jiashuo Sun
, Zhiyuan Zhao
, Shixuan Liu
, Rui Li
, Junyu Gao
, Xuelong Li
:
From Captions to Rewards (CaReVL): Leveraging Large Language Model Experts for Enhanced Reward Modeling in Large Vision-Language Models. 4972-4981 - Yimou Guo
, Yaochen Li
, Jingze Liu
, Jiahui Feng
, Haoyi Lou
, Zhimin Chen
, Yuan Gao
, Yuanqi Su
:
Image Captioning with Multimodal Guidance and Search Space Optimization. 4982-4991 - Yongqi Li
, Lu Yang
, Jian Wang
, Runyang You
, Wenjie Li
, Liqiang Nie
:
Towards Harmless Multimodal Assistants with Blind Preference Optimization. 4992-5000 - Donglu Yang
, Liang Zhang
, Zihao Yue
, Liangyu Chen
, Yichen Xu
, Wenxuan Wang
, Qin Jin
:
ChartM3: Benchmarking Chart Editing with Multimodal Instructions. 5001-5009 - Hao Cheng
, Erjia Xiao
, Jiayan Yang
, Jinhao Duan
, Yichi Wang
, Jiahang Cao
, Qiang Zhang
, Le Yang
, Kaidi Xu
, Jindong Gu
, Renjing Xu
:
Transfer Attack for Bad and Good: Explain and Boost Adversarial Transferability across Multimodal Large Language Models. 5010-5019 - Shibo Sun
, Xue Li
, Donglin Di
, Mingjie Wei
, Lanshun Nie
, Weinan Zhang
, Dechen Zhan
, Yang Song
, Lei Fan
:
LLaPa: A Vision-Language Model Framework for Counterfactual-Aware Procedural Planning. 5020-5029 - Guoxin Zang
, Xue Li
, Donglin Di
, Lanshun Nie
, Dechen Zhan
, Yang Song
, Lei Fan
:
SAGE: A Visual Language Model for Anomaly Detection via Fact Enhancement and Entropy-aware Alignment. 5030-5039 - Zhipeng Tang
, Sha Zhang
, Jiajun Deng
, Chenjie Wang
, Guoliang You
, Yuting Huang
, Xinrui Lin
, Yanyong Zhang
:
VLMPlanner: Integrating Visual Language Models with Motion Planning. 5040-5049 - Zhiqing Cui
, Jiahao Yuan
, Hanqing Wang
, Yanshu Li
, Chenxu Du
, Zhenglong Ding
:
Draw with Thought: Unleashing Multimodal Reasoning for Scientific Diagram Generation. 5050-5059 - Jinghan Yang
, Zhenbo Xu
, Dehua Ma
, Liu Liu
, Fei Liu
, Gong Huang
, Zhaofeng He
:
RecipeRAG: Advancing Recipe Generation with Reinforced Retrieval Augmented Generation. 5060-5069 - Shiqi Zhang
, Sha Zhang
, Jiajun Deng
, Yedong Shen
, Mingxiao Ma
, Yanyong Zhang
:
PGOV3D: Open-Vocabulary 3D Semantic Segmentation with Partial-to-Global Curriculum. 5070-5079 - Xinyao Li
, Dan Zhang
, Zhekai Du
, Lei Zhu
, Zhi Chen
, Jingjing Li
:
PatAug: Augmentation of Augmentation for Test-Time Adaptation. 5080-5089 - Xueqi Ma
, Yanbei Jiang
, Sarah M. Erfani
, James Bailey
, Weifeng Liu
, Krista A. Ehinger
, Jey Han Lau
:
Reasoning Like Experts: Leveraging Multimodal Large Language Models for Drawing-based Psychoanalysis. 5090-5099 - Yidan Wang
, Chenyi Zhuang
, Wutao Liu
, Pan Gao
, Nicu Sebe
:
AlignCAT: Visual-Linguistic Alignment of Category and Attribute for Weakly Supervised Visual Grounding. 5100-5109 - YongXiang Hua
, Haoyu Cao
, Zhou Tao
, Bocheng Li
, Zihao Wu
, Chaohu Liu
, Linli Xu
:
Input Domain Aware MoE: Decoupling Routing Decisions from Task Optimization in Mixture of Experts. 5110-5119 - Yuxin Xie
, Dongyue Chen
, Yue Zhu
, Tong Jia
, Shizhuo Deng
:
Noise-Aware Decoding with Salient Region Enhancing for Zero-Shot Image Captioning. 5120-5129 - Huiyi Chen
, Jiawei Peng
, Kaihua Tang
, Xin Geng
, Xu Yang
:
Enhancing Multimodal In-Context Learning for Image Classification through Coreset Optimization. 5130-5139 - Bin Kang
, Bin Chen
, Junjie Wang
, Yulin Li
, Junzhi Zhao
, Junle Wang
, Zhuotao Tian
:
CalibCLIP: Contextual Calibration of Dominant Semantics for Text-Driven Image Retrieval. 5140-5149 - Jianming Liu
, Wenlong Qiu
, Haitao Wei
:
Textual and Visual Guided Task Adaptation for Source-Free Cross-Domain Few-Shot Segmentation. 5150-5159 - Gang Pan
, Hongen Liu
, Di Sun
:
Formula Spotting Based on Synergy Perception and Representation Mining. 5160-5168 - Hang Yang
, Le Hui
, Jianjun Qian
, Jian Yang
, Yigong Zhang
, Jin Xie
:
Cross-View Geometric Collaboration for Generalizable Sparse View Neural Surface Reconstruction. 5169-5177 - Wenju Sun
, Qingyong Li
, Wen Wang
, Yangliao Geng
, Boyang Li
:
Task Arithmetic in Trust Region: A Training-Free Model Merging Approach to Navigate Knowledge Conflicts. 5178-5187 - Yiyan Ji
, Haoran Chen
, Qiguang Chen
, Chengyue Wu
, Libo Qin
, Wanxiang Che
:
MPCC: A Novel Benchmark for Multimodal Planning with Complex Constraints in Multimodal Large Language Models. 5188-5197 - Changsheng Gao
, Zijie Liu
, Li Li
, Dong Liu
, Xiaoyan Sun
, Weisi Lin
:
DT-UFC: Universal Large Model Feature Coding via Peaky-to-Balanced Distribution Transformation. 5198-5207 - Xinshu Li
, Ruoyu Wang
, Erdun Gao
, Mingming Gong
, Lina Yao
:
Causality-aligned Prompt Learning via Diffusion-based Counterfactual Generation. 5208-5217 - Sujuan Hou
, Zhihui Feng
, Hao Xiong
, Weiqing Min
, Peng Li
, Shuqiang Jiang
:
DSDGF-Nutri: A Decoupled Self-Distillation Network with Gating Fusion For Food Nutritional Assessment. 5218-5227 - Yixin Xu
, Hao Wu
, Jingzhou Zhu
, Fengyuan Xu
, Sheng Zhong
:
PriCAF: Privacy-Preserving Contribution Assessment in Federated Learning Before Model Training. 5228-5236 - Yuehao Huang
, Liang Liu
, Shuangming Lei
, Yukai Ma
, Hao Su
, Jianbiao Mei
, Pengxiang Zhao
, Yaqing Gu
, Yong Liu
, Jiajun Lv
:
CogDDN: A Cognitive Demand-Driven Navigation with Decision Optimization and Dual-Process Thinking. 5237-5246 - Yafei Zhang
, Yongle Shang
, Huafeng Li
:
Dual-Granularity Cross-Modal Identity Association for Weakly-Supervised Text-to-Person Image Matching. 5247-5256 - Fan Li
, Zanyi Wang
, Zeyi Huang
, Guang Dai
, Jingdong Wang
, Mengmeng Wang
:
TriCLIP-3D: A Unified Parameter-Efficient Framework for Tri-Modal 3D Visual Grounding based on CLIP. 5257-5266 - Yongheng Zhang
, Xu Liu
, Ruihan Tao
, Qiguang Chen
, Hao Fei
, Wanxiang Che
, Libo Qin
:
ViTCoT: Video-Text Interleaved Chain-of-Thought for Boosting Video Understanding in Large Language Models. 5267-5276 - Gang Pan
, Liming Pan
, Hongze Mi
, Rongyu Xiong
, Jiahao Wang
, Di Sun
:
AFFIR: Dual-Modal Attention Feature Fusion for Scene Text Image Retargeting. 5277-5285 - Chunyan Wang
, Dong Zhang
, Jinhui Tang
:
Diffusion-Guided Knowledge Distillation for Weakly-Supervised Low-Light Semantic Segmentation. 5286-5295 - Hanting Wang
, Shengpeng Ji
, Shulei Wang
, Hai Huang
, Xiao Jin
, Qifei Zhang
, Tao Jin
:
TAP: Parameter-efficient Task-Aware Prompting for Adverse Weather Removal. 5296-5305 - Ming Li
, Yupeng Hu
, Yinwei Wei
, Hao Liu
, Haocong Wang
, Weili Guan
:
DCount: Decoupled Spatial Perception and Attribute Discrimination for Referring Expression Counting. 5306-5315 - Jiawei Meng
, Zhengmao Yang
, Zhiqiang Liu
, Shaokai Chen
, Zhizhen Liu
, Wen Zhang
, Huajun Chen
:
Text-to-Image Generation with Multi-modal Knowledge Graph Construction and Retrieval. 5316-5325 - Shehzad Ali
, Md Tanvir Islam
, Ik Hyun Lee
, Mingfu Xiong
, Minh-Son Dao
, Saeed Anwar
, Sambit Bakshi
, Khan Muhammad
:
Towards Hazardous Activity Recognition for A Novel Real-World Dataset. 5326-5335 - Maksim Golyadkin
, Valeria Rubanova
, Aleksandr Utkov
, Dmitry Nikolotov
, Ilya Makarov
:
Evaluation of Egyptian Hieroglyph Classification Across Diverse Writing Styles. 5336-5344 - Jiaming Liang
, Chi-Man Pun
:
I-C Attack: In-place and Cross-pixel Augmentations for Highly Transferable Transformation-based Attacks. 5345-5354 - Jinlan Fu
, Shenzhen Huangfu
, Hao Fei
, Yichong Huang
, Xiaoyu Shen
, Xipeng Qiu
, See-Kiong Ng
:
MCM-DPO: Multifaceted Cross-Modal Direct Preference Optimization for Alt-text Generation. 5355-5364 - Zhaoyun Jiang
, Jiaqi Guo
, Shakie Liu
, Chao Han
, Ting Liu
, Jian-Guang Lou
, Dongmei Zhang
:
Illustration Layout Generation for Slide Enhancement with Pixel-based Diffusion Model. 5365-5374 - Yubin Zheng
, Pak-Hei Yeung
, Jing Xia
, Tianjie Ju
, Peng Tang
, Weidong Qiu
, Jagath C. Rajapakse
:
FedDEAP: Adaptive Dual-Prompt Tuning for Multi-Domain Federated Learning. 5375-5384 - Qiuna Tan
, Runqi Qiao
, Guanting Dong
, Yifan Zhang
, Minhui Wu
, Jiapeng Wang
, Miaoxuan Zhang
, Yida Xu
, Chong Sun
, Chen Li
, Honggang Zhang
:
OCR-Critic: Aligning Multimodal Large Language Models' Perception through Critical Feedback. 5385-5393 - Yicheng Pan, Zhenrong Zhang
, Pengfei Hu
, Jiefeng Ma, Jun Du, Jianshu Zhang, Quan Liu, Jianqing Gao
, Feng Ma:
Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration. 5394-5403 - Yongji Li
, Luping Wang
:
Spatial-Frequency Mamba Collaborative Learning Network for Infrared Small Target Detection. 5404-5412 - Yiming Zhao
, Guorong Li
, Laiyun Qing
, Amin Beheshti
, Jian Yang
, Quan Z. Sheng
, Yuankai Qi
, Qingming Huang
:
SDVPT: Semantic-Driven Visual Prompt Tuning for Open-world Object Counting. 5413-5421 - Rongzhen Zhao
, Vivienne Huiling Wang
, Juho Kannala
, Joni Pajarinen
:
Vector-Quantized Vision Foundation Models for Object-Centric Learning. 5422-5430 - Wanyi Zhuang
, Qi Chu
, Tao Gong
, Changtao Miao
, Nenghai Yu
:
Towards Good Generalizations for Diffusion Generated Image Detection Using Multiple Reconstruction Contrastive Learning. 5431-5440 - Xianzhi Ma
, Jianhui Li
, Changhua Pei
, Hao Liu
:
GeoMag: A Vision-Language Model for Pixel-level Fine-Grained Remote Sensing Image Parsing. 5441-5450
Engagement: Emotional and Social Signals
- Yijie Zhu
, Yibo Lyu
, Zitong Yu
, Rui Shao
, Kaiyang Zhou
, Liqiang Nie
:
EmoSym: A Symbiotic Framework for Unified Emotional Understanding and Generation via Latent Reasoning. 5451-5460 - Jihao Gu
, Kun Li
, Fei Wang
, Yanyan Wei
, Zhiliang Wu
, Hehe Fan
, Meng Wang
:
Motion Matters: Motion-guided Modulation Network for Skeleton-based Micro-Action Recognition. 5461-5470 - Cheng Peng
, Oya Çeliktutan
:
Multi-Task Gaze Communication Understanding. 5471-5479 - Zhiyuan Zhou
, Jilong Liu
, Sanwang Wang
, Shijie Hao
, Yanrong Guo
, Richang Hong
:
InterMind: Doctor-Patient-Family Interactive Depression Assessment Empowered by Large Language Models. 5480-5489 - Bing Wang
, Ximing Li
, Mengzhe Ye
, Changchun Li
, Bo Fu
, Jianfeng Qu
, Lin Yuanbo Wu
:
Remember Past, Anticipate Future: Learning Continual Multimodal Misinformation Detectors. 5490-5498 - Qinfu Xu
, Liyuan Pan
, Shaozu Yuan
, Yiwei Wei
, Chunlei Wu
:
From Subtle Hints to Grand Expressions - Mastering Fine-grained Emotions with Dynamic Multimodal Analysis. 5499-5508 - Zhihao Jia
, Meiyan Xu
, Jingyuan Wang
, Ziyu Jia
, Yong Li
, Xinliang Zhou
, Chenyu Liu
, Junfeng Yao
, Yi Ding
:
Sera: Separated Coarse-to-fine Representation Alignment for Cross-subject EEG-based Emotion Recognition. 5509-5518 - Chuhang Zheng
, Chunwei Tian
, Jie Wen
, Daoqiang Zhang
, Qi Zhu
:
HeLo: Heterogeneous Multi-Modal Fusion with Label Correlation for Emotion Distribution Learning. 5519-5527 - Zhiyuan Han
, Beier Zhu
, Yanlong Xu
, Peipei Song
, Xun Yang
:
Benchmarking and Bridging Emotion Conflicts for Multimodal Emotion Reasoning. 5528-5537 - Mingle Zhou
, Xingli Wang
, Jiachen Li
, Delong Han
, Gang Li
:
Unsupervised Dual-Domain Memory Model for Time Series Anomaly Detection. 5538-5546 - Hao Cheng
, Zhiwei Zhao
, Yichao He
, Zhenzhen Hu
, Jia Li
, Meng Wang
, Richang Hong
:
VAEmo: Efficient Representation Learning for Visual-Audio Emotion With Knowledge Injection. 5547-5556 - Hanmo Chen
, Chenghao Xu
, Jiexi Yan
, Cheng Deng
:
AStF: Motion Style Transfer via Adaptive Statistics Fusor. 5557-5566 - Huiyu Zhai
, Xingxing Yang
, Yalan Ye
, Chenyang Li
, Bin Fan
, Changze Li
:
Rethinking Occlusion in FER: A Semantic-Aware Perspective and Go Beyond. 5567-5576 - Dongyang Li
, Haoyang Qin
, Mingyang Wu
, Chen Wei
, Quanying Liu
:
BrainFLORA: Uncovering Brain Concept Representation via Multimodal Neural Embeddings. 5577-5586 - Feng-Qi Cui
, Anyang Tong
, Jinyang Huang
, Jie Zhang
, Dan Guo
, Zhi Liu
, Meng Wang
:
Learning from Heterogeneity: Generalizing Dynamic Facial Expression Recognition via Distributionally Robust Optimization. 5587-5596 - Feng Liu
, Lingna Gu
, Chen Shi
, Xiaolan Fu
:
Action Unit Enhance Dynamic Facial Expression Recognition. 5597-5606 - Cheng Luo
, Siyang Song
, Siyuan Yan
, Zhen Yu
, Zongyuan Ge
:
ReactDiff: Fundamental Multiple Appropriate Facial Reaction Diffusion Model. 5607-5616 - Jiateng Liu
, Hengcan Shi
, Haiwen Liang
, Xiaolin Xu
, Yuan Zong
, Yaonan Wang
, Wenming Zheng
:
NaME: A Natural Micro-expression Dataset for Micro-expression Recognition in the Wild. 5617-5626 - Sha Zhao
, Song Yi
, Yangxuan Zhou
, Jiadong Pan
, Jiquan Wang
, Jie Xia
, Shijian Li
, Shurong Dong
, Gang Pan
:
Wearable Music2Emotion: Assessing Emotions Induced by AI-Generated Music through Portable EEG-fNIRS Fusion. 5627-5636 - Yuanchen Shi
, Fang Kong
, Longyin Zhang
:
Impact of Stickers on Multimodal Sentiment and Intent in Social Media: A New Task, Dataset and Baseline. 5637-5646 - Yan-Kai Liu
, Shunyang Yao
, Tao Xi
, Bao-Liang Lu
, Wei-Long Zheng
:
Human vs AI: How Digital Human News Anchors Affect Our Cognitive Processes? 5647-5656 - Yang Deng
, Yu-Kun Lai
, Paul L. Rosin
:
CCDb+: Enhanced Annotations and Multi-Modal Benchmark for Natural Dyadic Conversations. 5657-5666 - Guanyu Hu
, Dimitrios Kollias
, Xinyu Yang
:
Grounding Emotion Recognition with Visual Prototypes: VEGA - Revisiting CLIP in MERC. 5667-5676 - Zhuozhao Hu
, Kaishen Yuan
, Xin Liu
, Zitong Yu
, Yuan Zong
, Jingang Shi
, Huanjing Yue
, Jingyu Yang
:
FEALLM: Advancing Facial Emotion Analysis in Multimodal Large Language Models with Emotional Synergy and Reasoning. 5677-5686 - Tianzuo Xin
, Jing Wang
, Xiyuan Jin
, Xiaojun Ning
, Zhiyang Feng
, Youfang Lin
:
MoCERNet: A Modality-Complete Modeling Framework for Emotion Recognition in Physiological Signals under Imperfect Modal Matching. 5687-5696 - Yue Pan
, Cunbo Li
, Peiyang Li
, Fali Li
, Feng Wan
, Dezhong Yao
, Zehong Cao
, Peng Xu
:
Real-Time EEG Emotion Recognition from Dynamic Mixed Spatiotemporal Graph Learning. 5697-5706 - Deng Li
, Bohao Xing
, Xin Liu
, Baiqiang Xia
, Bihan Wen
, Heikki Kälviäinen
:
DEEMO: De-identity Multimodal Emotion Recognition and Reasoning. 5707-5716 - Ziyi Li
, Wei-Long Zheng
, Bao-Liang Lu
:
Multimodal Emotion Recognition with Missing Modality via a Unified Multi-task Pre-training Framework. 5717-5725 - Tongfei Bian
, Mathieu Chollet
, Tanaya Guha
:
Robust Understanding of Human-robot Social Interactions through Multimodal Distillation. 5726-5734 - Xuandong Huang
, Yuzhe Zhou
, Jiashu Li
, Shiqian Lu
, Shangfei Wang
:
EmoDETective: Detecting, Exploring, and Thinking Emotional Cause in Videos. 5735-5744 - Chengzhe Wang
, Wenqing Ji
, Chenyang Li
, Tongjie Pan
, Yalan Ye
:
Toward Reliable Emotion Recognition: Alleviating Label Noise and Reducing Uncertain Prediction. 5745-5754 - Rui Liu
, Haolin Zuo
, Zheng Lian
, Hongyu Yuan
, Qi Fan
:
Hardness-Aware Dynamic Curriculum Learning for Robust Multimodal Emotion Recognition with Missing Modalities. 5755-5764 - Xiao Fu
, Pengyu Wang
, Wei Xi
, Kun Zhao
, Jiadong Feng
, Jizhong Zhao
:
LES-CLIP: A Lightweight Emotion-Sensitive Adaptation of CLIP for Precise Similar Emotion Discrimination. 5765-5774 - Dan Wu
, Xincheng Ju
, Dong Zhang
, Shoushan Li
, Erik Cambria
, Guodong Zhou
:
Emotion across Modalities and Cultures: Multilingual Multimodal Emotion-Cause Analysis with Memory-inspired Framework. 5775-5783 - Jiankun Zhu
, Sicheng Zhao
, Lulu Tian
, Jing Jiang
, Xi Chen
, Hongxun Yao
:
Emotion in a Bottle: Information Bottleneck Guided Disentanglement for Emotion Domain Adaptation. 5784-5793 - Jian Chen
, Yuxuan Hu
, Haifeng Lu
, Wei Wang
, Min Yang
, Chengming Li
, Xiping Hu
:
MGHFT: Multi-Granularity Hierarchical Fusion Transformer for Cross-Modal Sticker Emotion Recognition. 5794-5803 - Weicheng Xie, Chunlin Yan
, Siyang Song
, Zitong Yu
, Linlin Shen
, Laizhong Cui
:
Smooth Online Multiple Appropriate Facial Reaction Generation. 5804-5813 - Jinpeng Hu
, Hongchang Shi
, Chongyuan Dai
, Zhuo Li
, Peipei Song
, Meng Wang
:
Beyond Emotion Recognition: A Multi-Turn Multimodal Emotion Understanding and Reasoning Benchmark. 5814-5823 - Zheng Zhang
, Nuoqian Xiao
, Qi Chai
, Deheng Ye
, Hao Wang
:
MultiMind: Enhancing Werewolf Agents with Multimodal Reasoning and Theory of Mind. 5824-5833 - Qile Liu
, Weishan Ye
, Lingli Zhang
, Zhen Liang
:
EEG-SCMM: Soft Contrastive Masked Modeling for Cross-Corpus EEG-Based Emotion Recognition. 5834-5842 - Wending Xiong
, Ruimin Hu
, Lingfei Ren
, Xixi Li
, Dengshi Li
:
SE2E: Recognizing Emotion behind Societal Behavior. 5843-5852 - Zhiming Ma
, Peidong Wang
, Minhua Huang
, Jinpeng Wang
, Kai Wu
, Xiangzhao Lv
, Yachun Pang
, Yin Yang
, Wenjie Tang
, Yuchen Kang
:
TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection. 5853-5862 - Bohao Zhang
, Haoxin Xu
, Jingzhong Lin
, Changbo Wang
, Gaoqi He
:
Regulatory Focus Theory Induced Micro-Expression Analysis with Structured Representation Learning. 5863-5872 - Jinsheng Wei
, Jialiang Sun
, Guanming Lu
, Jingjie Yan
, Dong Zhang
:
Multi-Information Hierarchical Fusion Transformer with Local Alignment and Global Correlation for Micro-Expression Recognition. 5873-5882 - Jinzhao Zhou
, Zehong Cao
, Yiqun Duan
, Connor Barkley
, Daniel Leong
, Xiaowei Jiang
, Quoc-Toan Nguyen
, Ziyi Zhao
, Thomas Do
, Yu-Cheng Chang
, Sheng-Fu Liang
, Chin-Teng Lin
:
Pretraining Large Brain Language Model for Active BCI: Silent Speech. 5883-5892 - Shenjie Jiang
, Zhuoyu Wang
, Xuecheng Wu
, Hongru Ji
, Mingxin Li
, Xianghua Li
, Chao Gao
:
DDSE: A Decoupled Dual-Stream Enhanced Framework for Multimodal Sentiment Analysis with Text-Centric SSM. 5893-5902 - Jinghui Zhang
, Kaiyang Wan
, Longwei Xu
, Ao Li
, Zongfang Liu
, Xiuying Chen
:
From Individuals to Crowds: Dual-Level Public Response Prediction in Social Media. 5903-5912 - Ziying Tan
, Linbo Luo
, Haiyan Yin
, Yew-Soon Ong
, Wentong Cai
:
Crowd Dynamics Demand Adaptivity: Self-Adaptive Physics-Informed Neural Network for Crowd Simulation. 5913-5921
Engagement: Multimedia Search and Recommendation
- Yifan Wang
, Tao Wang
, Chenwei Tang
, Caiyang Yu
, Zhengqing Zang
, Mengmi Zhang
, Shudong Huang
, Jiancheng Lv
:
Dual Prompt Learning for Adapting Vision-Language Models to Downstream Image-Text Retrieval. 5922-5931 - Zhan Yang
, Binghong Chen
, Jiajun Tang
, Yinan Li
:
Unsupervised Similarity-Fusion Transformer Hashing for Multimodal Retrieval. 5932-5941 - Penglei Wang
, Ziming Quan
, Danyang Wu
, Jin Xu
:
Cluster-Aware Contrastive Multi-View Clustering Based on Masked Views. 5942-5950 - Yixuan Zhou
, Yulu Tian
, Wenliang Zhong
, Xingbin Yu
, Heng Tao Shen
, Xing Xu
:
SaP-Bot: A Multimodal Large-Language Model for End-to-End Same-Product Identification. 5951-5960 - Beibei Zhang
, Yanan Lu
, Ruobing Xie
, Zongyi Li
, Siyuan Xing
, Tongwei Ren
, Fen Lin
:
Harnessing Multimodal Large Language Models for Personalized Product Search with Query-aware Refinement. 5970-5978 - Siyuan Huang
, Jiahui Jin
, Xin Lin
, Xigang Sun
, Yukun Ban
:
IM-POI: Bridging ID and Multi-modal Gaps in Next POI Recommendation. 5979-5987 - Mingjie Li
, Junhao Lin
, Dian Ouyang
, Ying Zhang
, Wei Wang
:
Graph-based Approximate Nearest Neighbor Search by Deep Reinforcement Routing. 5988-5997 - Fanshen Meng
, Zhenhua Meng
, Ru Jin
, Yuli Chen
, Rongheng Lin
, Budan Wu
:
TAMER: Interest Tree Augmented Modality Graph Recommender for Multimodal Recommendation. 5998-6006 - Yanbiao Ji
, Dan Luo
, Chang Liu
, Shaokai Wu
, Jing Tong
, Qichen He
, Deyi Ji
, Hongtao Lu
, Yue Ding
:
Generating Negative Samples for Multi-Modal Recommendation. 6007-6016 - Fengxin Li
, Zhiqian Yin
, Hongyan Liu
, Jingcai Guo
, Jun He
, Yi Li
, Chao Zhou
, Jun Zhang
, Haijie Gu
:
Topic Guided Multi-faceted Semantic Disentanglement for CTR prediction. 6017-6026 - Junan Lin
, Daizong Liu
, Xianke Chen
, Xiaoye Qu
, Xun Yang
, Jixiang Zhu
, Sanyuan Zhang
, Jianfeng Dong
:
Audio Does Matter: Importance-Aware Multi-Granularity Fusion for Video Moment Retrieval. 6027-6036 - Yufei Zheng
, Jiawei Liu
, Bingyu Hu
, Zikun Wei
, Yong Wu
, Zheng-Jun Zha
:
Dual Uncertainty-Guided Feature Alignment Learning for Text-Based Person Retrieval. 6037-6046 - Feng Chen
, Jielong He
, Yang Liu
, Heng Liu
, Zhe Chen
, Yaxiong Wang
:
Unsupervised Cross-Modal Person Search via Progressive Diverse Text Generation. 6047-6056 - Yiru Li
, Yingying Zhu
:
PLGeo: A Patch-level Framework to Overcome Orientation Discrepancies in Cross-view Geo-localization. 6057-6065 - Yadong Huo
, Qibing Qin
, Wenfeng Zhang
, Lei Huang
, Jie Nie
:
Factorized Transformer Hashing with Adaptive Routing for Large-scale Image Retrieval. 6066-6074 - Binrui Wu
, Haochen Sui
, Jiaye Lin
, Jiechao Gao
, Ting Xu
, Keyan Jin
, Xuesong Zhang
:
Prototype-Guided Representation Projection for Multi-Domain Multi-Task Recommendation. 6075-6083 - Hang Lv
, Zixuan Guo
, Zijie Wu
, Yanchao Tan
, Guofang Ma
, Zhigang Lin
, Xiping Chen
, Hong Cheng
, Carl Yang
:
MedAlign: Enhancing Combinatorial Medication Recommendation with Multi-modality Alignment. 6084-6092 - Guipeng Xv
, Xinyu Li
, Yi Liu
, Chen Lin
, Xiaoli Wang
:
Unveiling the Impact of Multi-modal Content in Multi-modal Recommender Systems. 6093-6102 - Kuan Liu
, Ke Wang
, Ji Zhang
, Gang Zhou
:
LLM-Grounded Diffusion for Cross-Domain Recommendation. 6103-6112 - Zhiwei Chen
, Yupeng Hu
, Zixu Li
, Zhiheng Fu
, Xuemeng Song
, Liqiang Nie
:
OFFSET: Segmentation-based Focus Shift Revision for Composed Image Retrieval. 6113-6122 - Yujia Zhu
, Hao Yang
, Yibo Zhao
, Chunjie Ma
, Weili Guan
, Zan Gao
:
Lightweight Relational Proposal Network with Dual-Branch Distillation for Video Moment Retrieval. 6123-6132 - Huilin Chen
, Miaomiao Cai
, Fan Liu
, Zhiyong Cheng
, Richang Hong
, Meng Wang
:
I3-MRec: Invariant Learning with Information Bottleneck for Incomplete Modality Recommendation. 6133-6142 - Zhiwei Chen
, Yupeng Hu
, Zixu Li
, Zhiheng Fu
, Haokun Wen
, Weili Guan
:
HUD: Hierarchical Uncertainty-Aware Disambiguation Network for Composed Video Retrieval. 6143-6152 - Zhengxin Pan
, Haishuai Wang
, Fangyu Wu
, Peng Zhang
, Jiajun Bu
:
Hubness Reduction with Dual Bank Sinkhorn Normalization for Cross-Modal Retrieval. 6153-6162 - Zhaoqi Chen
, Wanni Xu
, Yunfeng Zhang
, Yawei Hou
, Zhenyu Wen
, Cong Wang
:
DeCoRec: Decoupled Collaborative Refinement for Multi-Modal Sequential Recommendations. 6163-6172 - Fan Hu
, Zijie Xin
, Xirong Li
:
Learning Partially-Decorrelated Common Spaces for Ad-hoc Video Search. 6173-6182 - Xiong Li
, Yikang Yan
, Zhenyu Wen
, Qin Yuan
, Fangda Guo
, Zhen Hong
, Ye Yuan
:
Open3DSearch: Zero-Shot Precise Retrieval of 3D Shapes Using Text Descriptions. 6183-6192 - Wei Yang
, Rui Zhong
, Yiqun Chen
, Shixuan Li
, Heng Ping
, Chi Lu
, Peng Jiang
:
FITMM: Adaptive Frequency-Aware Multimodal Recommendation via Information-Theoretic Representation Learning. 6193-6202 - Te Song
, Lianyong Qi
, Weiming Liu
, Fan Wang
, Xiaolong Xu
, Hongsheng Hu
, Yang Cao
, Xuyun Zhang
, Amin Beheshti
:
Boosting Guided Diffusion with Large Language Models for Multimodal Sequential Recommendation. 6203-6212 - Rui Shang
, Min Liu
, Xueping Wang
, Yuan Bian
, Yaonan Wang
:
Decoupled Identity and Attribute Tokenization for Person Re-Identification. 6213-6222 - Qing Wang
, Chong-Wah Ngo
, Yu Cao
, Ee-Peng Lim
:
Mitigating Cross-modal Representation Bias for Multicultural Image-to-Recipe Retrieval. 6223-6231 - Jingmao Zhang
, Zhiting Zhao
, Yunqi Lin
, Jianghong Ma
, Tianjun Wei
, Haijun Zhang
, Xiaofeng Zhang
:
Dual-Phase Playtime-guided Recommendation: Interest Intensity Exploration and Multimodal Random Walks. 6232-6241 - Fengyuan Yu
, Yuyuan Li
, Xiaohua Feng
, Junjie Fang
, Tao Wang
, Chaochao Chen
:
LEGO: A Lightweight and Efficient Multiple-Attribute Unlearning Framework for Recommender Systems. 6242-6251 - Tianqi Liu
, Kairui Fu
, Shengyu Zhang
, Wenyan Fan
, Zhaocheng Du
, Jieming Zhu
, Fan Wu
, Fei Wu
:
CHORD: Customizing Hybrid-precision On-device Model for Sequential Recommendation with Device-cloud Collaboration. 6252-6261 - Min Tan
, Guanhao Liu
, Huijing Zhan
, Yuyu Yin
, Zhou Yu
, Jiajun Ding
, Yinfu Feng
:
DiSCo: Disentangled Attribute Manipulation Retrieval via Semantic Reconstruction and Consistency Regularization. 6262-6270 - Wentao Fan
, Chao Zhang
, Chunlin Chen
, Huaxiong Li
:
Online Cross-Modal Hashing with Multi-Level Memory. 6271-6279 - Chenxu Wang
, Dong Zhou
, Ting Liu
, Jianghao Lin
, Yongmei Zhou
, Aimin Yang
:
DiffTMR: Diffusion-based Hierarchical Alignment for Text-Molecule Retrieval. 6280-6288 - Fengbin Zhu
, Junfeng Li
, Liangming Pan
, Wenjie Wang
, Fuli Feng
, Chao Wang
, Huanbo Luan
, Tat-Seng Chua
:
Towards Temporal-Aware Multi-Modal Retrieval Augmented Generation in Finance. 6289-6297 - Yue He
, Jingxi Xie
, Fengling Li
, Lei Zhu
, Jingjing Li
:
Flip is Better than Noise: Unbiased Interest Generation for Multimedia Recommendation. 6298-6306 - Jinhong Wang
, Tajamul Ashraf
, Zongyan Han
, Jorma Laaksonen
, Rao Muhammad Anwer
:
MIRA: A Novel Framework for Fusing Modalities in Medical RAG. 6307-6315 - Shouxing Ma
, Yawen Zeng
, Shiqing Wu
, Guandong Xu
:
Refining Contrastive Learning and Homography Relations for Multi-Modal Recommendation. 6316-6324 - Jinfeng Xu
, Zheyu Chen
, Shuo Yang
, Jinze Li
, Edith C. H. Ngai
:
The Best is Yet to Come: Graph Convolution in the Testing Phase for Multimodal Recommendation. 6325-6334 - Kipp Freud
, Daniel E. Collins
, Delmiro D. Sampaio Neto
, Grant Stevens
:
VibeSpace: Automatic Generation of Data and Vector Embeddings for Arbitrary Domains and Cross-domain Mappings using LLMs. 6335-6342 - Kaihang Jiang
, Wai Keung Wong
, Jianyang Qin
, Xiaozhao Fang
, Jie Wen
, Bingzhi Chen
, Hongbo Gao
:
Label Prediction Inherited Hashing for Cross-Modal Retrieval: Applying Supervised Hashing to Unsupervised Tasks. 6343-6352 - Yang Yu
, Meiyu Liang
, Wei Huang
, Juncheng Zheng
, Kangkang Lu
, Yawen Li
, Junping Du
, Zhe Xue
, Wu Liu
:
Asymmetric Pre-aligned Anchor Contrastive Enhanced Diffusion Hashing Model for Incomplete Multimodal Retrieval. 6353-6362 - Weihai Lu
, Li Yin
:
DMMD4SR: Diffusion Model-based Multi-level Multimodal Denoising for Sequential Recommendation. 6363-6372 - Fan Li
, Jiazhen Huang
, Shisong Tang
, Bing Han
, Huafeng Cao
, Haochen Sui
, Ting Xu
, Xiaoyu Kang
:
Contrastive Prototype Framework for Calibrating Video Recommendation. 6373-6382 - Zixin Tang
, Haihui Fan
, Jinchao Zhang
, Hui Ma
, Xiaoyan Gu
, Bo Li
, Weiping Wang
:
ShieldIR: Privacy-Preserving Unsupervised Cross-Domain Image Retrieval via Dual Protection Transformation. 6383-6392 - Kun Cheng
, Qibing Qin
, Wenfeng Zhang
, Lei Huang
, Jie Nie
:
Deep Probabilistic Binary Embedding via Learning Reliable Uncertainty for Cross-Modal Retrieval. 6393-6402 - Haowen Gao
, Liang Pang
, Shicheng Xu
, Leigang Qu
, Tat-Seng Chua
, Huawei Shen
, Xueqi Cheng
:
Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos. 6403-6411 - Nhu-Thuat Tran
, Hady W. Lauw
:
Parameter-Efficient Variational AutoEncoder for Multimodal Multi-Interest Recommendation. 6412-6420 - Qingtian Bian
, Tieying Li
, Marcus Vinícius de Carvalho
, Jiaxing Xu
, Hui Fang
, Yiping Ke
:
Multi-Domain Enhancement via Residual Interwoven Transfer in Cross-Domain Sequential Recommendation. 6421-6430 - Yuli Liu
, Wenjun Kong
, Weizhi Ma
, Cheng Luo
:
Why Generate When You Can Transform? Unleashing Generative Attention for Dynamic Recommendation. 6431-6440 - Yanwei Xie
, Weizhi Nie
, Lanjun Wang
, Hongshuo Tian
, Changtai Shi
, An-An Liu
:
When Headlines Meet Minds: Empowering News Recommendations with Social Simulator. 6441-6450 - Fan Zhang
, Jinpeng Chen
, Huan Li
, Senzhang Wang
, Yuan Cao
, Kaimin Wei
, Jianxiang He
, Feifei Kou
, Jinqing Wang
:
Leveraging Multimodal Data and Side Users for Diffusion Cross-Domain Recommendation. 6451-6460
Engagement: Summarization, Analytics, and Storytelling
- Pengyuan Li
, Man Liu
, Dongxia Chang
, Yiming Wang
, Zisen Kong
, Yao Zhao
:
AEMVC: Mitigate Imbalanced Embedding Space in Multi-view Clustering. 6461-6470 - Jiajun Han
, Xuran Yang
, Hui Zhang
:
Query-Focused Multimodal Summarization with Gate-Guided Mixture-of-Experts. 6471-6480 - Xiaohang Zhang
, Hui Gao
, Bo Zhang
, Xiao Chen
, Kun Niu
, Tan Yang
, Wufan Wang
, Wendong Wang
:
AIGC-Enhanced UAV-Based 3D Mapping and Trajectory Planning for Rapid Disaster Response. 6481-6489 - Zhijiang Tang
, Jiaxin Qi
, Yuhua Zheng
, Jianqiang Huang
:
A Comprehensive Benchmark for Electrocardiogram Time-Series. 6490-6499 - Lei Yao, Yi Wang, Yi Zhang, Moyun Liu, Lap-Pui Chau:
GaussianCross: Cross-modal Self-supervised 3D Representation Learning via Gaussian Splatting. 6500-6509 - Hanling Wang
, Qing Li
, Li Chen
, Haidong Kang
, Fei Ma
, Yong Jiang
:
HoloTrace: LLM-based Bidirectional Causal Knowledge Graph for Edge-Cloud Video Anomaly Detection. 6510-6519 - Xinyu Xiao
, Peixi Peng
, Qiang Wang
, Chao Xing
, Shuhan Qi
:
Multi-faceted Complementary Learning for Incomplete Multi-view Multi-label Classification. 6520-6529 - Yawen Cui
, Wenbin Zou
, Huiping Zhuang
, Yi Wang
, Lap-Pui Chau
:
Probabilistic Mixture of Hyperbolic Mamba for Few-Shot Class-Incremental Learning. 6530-6539 - Mingkang Li
, Xuexiong Luo
, Yue Zhang
, Yaoyang Li
, Fu Lin
:
GTHNA: Local-global Graph Transformer with Memory Reconstruction for Holistic Node Anomaly Evaluation. 6540-6548 - Yamiao Ding
, Tianrui Liu
, Zhizhou Lu
, Jun-Jie Huang
, Wentao Zhao
, Xinwang Liu
, Meng Wang
:
VSumMamba: Mamba Empowered Efficient Video Summarization with Multi-Scale Spatial-Temporal Modeling. 6549-6557 - Haotian Gan
, Yudong Li
, Wanyue Li
, Weidong Tang
:
Aligned or Apart? Multi-Agent Insights into Consumer and Brand Messaging Discrepancies. 6558-6566 - Na Jiang
, Wenhui Zheng
, Xuqian Gu
, Jingjing Wang
:
OmniDoctor: Towards LLM-centric Lifelong Learning for New Emerging Medical VQA Tasks. 6567-6575 - Jiali Chen
, Yujie Jia
, Zihan Wu
, Jinyu Yang
, Jianpeng Chen
, Xusen Hei
, Jiayuan Xie
, Yi Cai
, Qing Li
:
ExpStar: Towards Automatic Commentary Generation for Multi-discipline Scientific Experiments. 6576-6585 - Junxiao Ma
, Jingjing Wang
, Min Zhang
, Guodong Zhou
:
Skynet-V1: Towards Early Warning of Video Abnormal Events via A Spatial-temporal Causal-enhanced MoE Framework. 6586-6595 - Manolis Mylonas
, Evlampios Apostolidis
, Vasileios Mezaris
:
SD-VSum: A Method and Dataset for Script-Driven Video Summarization. 6596-6604
Experience: Art and Culture
- Changjuan Ran
, Fang Liu
, Runqi Fang
, Xiangyu Meng
, Shenglan Cui
, Yunfan Ye
:
Where Watermark Meets Beauty: Expert-Guided Aesthetic Visible Watermarking for Digital Artworks. 6605-6614 - Jiayun Hu
, Yueyi He
, Tianyi Liang
, Changbo Wang
, Chenhui Li
:
Music2Palette: Emotion-aligned Color Palette Generation via Cross-Modal Representation Learning. 6615-6624 - Shiqi Jiang
, Xinpeng Li
, Xi Mao
, Changbo Wang
, Chenhui Li
:
PPJudge: Towards Human-Aligned Assessment of Artistic Painting Process. 6625-6633 - Ruixiang Jiang
, Chang Wen Chen
:
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot. 6634-6643 - Weiran Chen
, Guiqian Zhu
, Ying Li
, Yi Ji
, Chunping Liu
:
DA-Font: Few-Shot Font Generation via Dual-Attention Hybrid Integration. 6644-6653 - Chuanwei Huang
, Zexi Jia
, Hongyan Fei
, Yeshuang Zhu
, Zhiqiang Yuan
, Jinchao Zhang
, Jie Zhou
:
ArtFRD: A Fisher-Rao Mixture Metric for Generative Model Aesthetic Evaluation. 6654-6662 - Kaixing Yang
, Xulong Tang
, Haoyu Wu
, Biao Qin
, Hongyan Liu
, Jun He
, Zhaoxin Fan
:
CoheDancers: Enhancing Interactive Group Dance Generation through Music-Driven Coherence Decomposition. 6663-6671 - Xiao Zhang
, Johan Bos
:
Multi-Modal Semantic Parsing for the Interpretation of Tombstone Inscriptions. 6672-6681 - Yuhong Zhang
, Liyao Wang
, Han Wang
, Danni Wu
, Zuzeng Lin
, Feng Wang
, Li Song
:
AnimeColor: Reference-based Animation Colorization with Diffusion Transformers. 6682-6690 - Zuona Chen
, James She
:
Infusing AI Art with Cultural Authenticity Through the Culture-Specific LoRA. 6691-6699 - Shuai Wang
, Ivona Najdenkoska
, Hongyi Zhu
, Stevan Rudinac
, Monika Kackovic
, Nachoem Wijnberg
, Marcel Worring
:
ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding. 6700-6709 - Wei Zhang
, Wong Kam-Kwai
, Biying Xu
, Yiwen Ren
, Yuhuai Li
, Yingchaojie Feng
, Minfeng Zhu
, Wei Chen
:
CultiVerse: Towards Cross-Cultural Understanding for Paintings with Large Language Model. 6710-6719 - Songtao Zhou
, Xiaoyu Qin
, Yixuan Zhou
, Qixin Wang
, Zeyu Jin
, Zixuan Wang
, Zhiyong Wu
, Jia Jia
:
HarmoniVox: Painting Voices to Match the Avatar's Soul. 6720-6729 - Tiancheng Liu
, Jiayi Ye
, Shumeng Zhang
, Kang Zhang
, Chen Liang
:
Quantifying Structural Aesthetic Features and Personality Trait Preferences in Kai Shu Calligraphy. 6730-6739 - Mingyang Su
, Chao Liu
, Jingling Zhang
, Shuang Wu
, Mingming Fan
:
SimViews: An Interactive Multi-Agent System Simulating Visitor-to-Visitor Conversational Patterns to Present Diverse Perspectives of Artifacts in Virtual Museums. 6740-6750
Experience: Interactions and Quality of Experience
- Ziheng Jia
, Zicheng Zhang
, Jiaying Qian
, Haoning Wu
, Wei Sun
, Chunyi Li
, Xiaohong Liu
, Weisi Lin
, Guangtao Zhai
, Xiongkuo Min
:
VQA2: Visual Question Answering for Video Quality Assessment. 6751-6760 - Jiaying Qian
, Ziheng Jia
, Zicheng Zhang
, Zeyu Zhang
, Guangtao Zhai
, Xiongkuo Min
:
Towards Explainable Partial-AIGC Image Quality Assessment. 6761-6770 - Zhichao Zhang
, Wei Sun
, Xinyue Li
, Yunhao Li
, Qihang Ge
, Jun Jia
, Zicheng Zhang
, Zhongpeng Ji
, Fengyu Sun
, Shangling Jui
, Xiongkuo Min
, Guangtao Zhai
:
Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric. 6771-6780 - Jiawei Zhang
, Haonan Zhang
, Weitao Zhang
, Liang Pu
, Zesen Feng
, Jie Guo
:
Decoupled Motion Prediction for Real-time G-buffer Free Frame Extrapolation. 6781-6790 - Wenhao Li
, Xiu Su
, Jingyi Wu
, Feng Yang
, Yang Liu
, Yi Chen
, Shan You
, Chang Xu
:
Identify, Isolate, and Purge: Mitigating Hallucinations in LVLMs via Self-Evolving Distillation. 6791-6800 - Tong Liu
, Zhiwei Fan
, Guanyan Peng
, Haodan Zhang
, Yucheng Zhang
, Zhen Wang
, Pengjin Xie
, Liang Liu
:
DeLoad: Demand-Driven Short-Video Preloading with Scalable Watch-Time Estimation. 6801-6809 - Richen Liu
, Lingyu Sun
, Xuefeng Huang
, Yiran Li
, Jiang Zhang
, Siru Chen
, Zhouhao Wu
, Ayush Kumar
, Chufan Lai
:
Meta-Illustrator: Transferring Illustrations from 2D Interactive Image Space to 3D Immersive Exploration Space. 6810-6819 - Junzhe Zhang
, Chengfeng Han
, Dandan Ding
, Zhan Ma
:
GeoQE: Enhancing Quality of Experience in Point Cloud Streaming. 6820-6829 - Xiangfei Sheng
, Pangu Xie
, Weidong Zou
, Pengfei Chen
, Tong Zhu
, Leida Li
:
InstructCrop: Teaching Multimodal Large Language Models to Crop Aesthetic Images. 6830-6839 - Sihan Zhao
, Zixuan Wang
, Tianyu Luan
, Jia Jia
, Wentao Zhu
, Jiebo Luo
, Junsong Yuan
, Nan Xi
:
PP-Motion: Physical-Perceptual Fidelity Evaluation for Human Motion Generation. 6840-6849 - Songpei Xu
, Xuri Ge
, Chaitanya Kaul
, Roderick Murray-Smith
:
HandSolo: A Mid-Air Hand Pose Interaction Method Based on Disentangled Degrees-of-Hand-Freedom. 6850-6858 - Yitong Zhu
, Zhuowen Liang
, Yiming Wu
, Tangyao Li
, Yuyang Wang
:
Towards Consumer-Grade Cybersickness Prediction: Multi-Model Alignment for Real-Time Vision-Only Inference. 6859-6867 - Xuan Zhang
, Sin Chee Chin
, Jing-Hao Xue
, Xiaochen Yang
, Wenming Yang
:
DARL: Mitigating Gradient Conflicts in Long-Tailed Out-of-Distribution Learning. 6868-6877 - Weizhi Chen
, Ziwei Wang
, Leyang Yang
, Sheng Zhou
, Xiaoxuan Tang
, Jiajun Bu
, Yong Li
, Wei Jiang
:
PG-Agent: An Agent Powered by Page Graph. 6878-6887 - Yongyang Zhou, Fanglue Zhang, Zichen Wang, Lei Zhang:
RTR-GS: 3D Gaussian Splatting for Inverse Rendering with Radiance Transfer and Reflection. 6888-6897 - Lei Chen
:
Graph-Perceptron with Semantic Fidelity for No-Reference Super-Resolution Image Quality Assessment. 6898-6907 - Zitong Xu, Huiyu Duan
, Bingnan Liu, Guangji Ma, Jiarui Wang, Liu Yang, Shiqi Gao, Xiaoyu Wang, Jia Wang, Xiongkuo Min, Guangtao Zhai, Weisi Lin:
LMM4Edit: Benchmarking and Evaluating Multimodal Image Editing with LMMs. 6908-6917 - Gyeongjin Kim
, Sebin Lee
, Daye Kim
, Jungjin Lee
, Minju Kim
:
Bring the VibeOn: Designing a Multimodal Interface for Shared Emotional Experiences in Live-streamed Concerts. 6918-6927 - Sijing Wu
, Yunhao Li
, Ziwen Xu
, Yixuan Gao
, Huiyu Duan
, Wei Sun
, Guangtao Zhai
:
FVQ: A Large-Scale Dataset and an LMM-based Method for Face Video Quality Assessment. 6928-6937 - De Li
, Zhou Tan
, Qiyu Li
, Zeming Gan
, Tiange Xia
, Jinyan Wang
, Xianxian Li
:
FedRog: Robust Federated Graph Classification for Strong Heterogeneity and High-Noise Scenarios. 6938-6947 - Yixuan Gao
, Xiongkuo Min
, Jinliang Han
, Yuqin Cao
, Sijing Wu
, Yunze Dou
, Guangtao Zhai
:
Multi-Dimensional Text-to-Face Image Quality Assessment Using LLM: Database and Method. 6948-6957 - Qiang Li
, Qingsen Yan
, Haojian Huang
, Peng Wu
, Haokui Zhang
, Yanning Zhang
:
Text-Visual Semantic Constrained AI-Generated Image Quality Assessment. 6958-6966 - Yang Hu
, Jingui Ma
, Yucheng Yang
, Jie Liang
, Jinbo Yan
, Jiahao Wu
, Jiayu Yang
, Yang Deng
, Ronggang Wang
:
Excavating the Most Critical Gaussians: Sparse Selection and Structural Optimization for Efficient 3DGS Compression. 6967-6976 - Cunhang Fan
, Sheng Zhang
, Jingjing Zhang
, Enrui Liu
, Xinhui Li
, Gangming Zhao
, Zhao Lv
:
DMF2Mel: A Dynamic Multiscale Fusion Network for EEG-Driven Mel Spectrogram Reconstruction. 6977-6985 - Shalayiding Sirejiding
, Yue Ding
, Yuxiang Lu
, Xinyi Hou
, Shaokai Wu
, Qichen He
, Chunlin Wang
, Wenqiang Guo
, Hongtao Lu
:
CLIP-MT: Multi-Modal Knowledge-Driven Adaptive Scale Feature Allocation for Multi-Task Dense Prediction. 6986-6995 - Yangyang Zhuang
, Wenjia Jiang
, Jiayu Zhang
, Ze Yang
, Joey Tianyi Zhou
, Chi Zhang
:
Learning to Be a Doctor: Searching for Effective Medical Agent Architectures. 6996-7005 - Changhao Pan
, Wenxiang Guo
, Yu Zhang
, Zhiyuan Zhu
, Zhetao Chen
, Han Wang
, Zhou Zhao
:
A Multimodal Evaluation Framework for Spatial Audio Playback Systems: From Localization to Listener Preference. 7006-7015 - Jiawei Li
, Linjie Qiu
, Zhiqing Wu
, Qiongyan Chen
, Ziyan Wang
, Mingming Fan
:
ExplorAR: Assisting Older Adults to Learn Smartphone Apps through AR-powered Trial-and-Error with Interactive Guidance. 7016-7025 - Yu-Fan Lin
, Chia-Ming Lee
, Chih-Chung Hsu
:
DenseSR: Image Shadow Removal as Dense Prediction. 7026-7035 - Yu Chen
, Binbin Yan
, Shuo Chen
, Xinzhu Sang
:
A Comprehensive Model for Visual Fatigue Assessment in 3D Light Field Displays Based on Eye Movement Data Analysis. 7036-7044 - Zhou Tan
, De Li
, Yirui Huang
, Jia-Li Yin
, Ximeng Liu
:
FeatShield: Isolating Malicious Feature Extractors for Backdoor-Robust Federated Learning. 7045-7054 - Liqian Zhang
, Feng Yuan
, Haoran Xie
, Fu Lee Wang
, Zhaoqing Pan
:
Evaluating Visual Quality of Autostereoscopic 3D Displays via a Multimodal Parameter Perception Network. 7055-7063 - Lancheng Gao
, Ziheng Jia
, Yunhao Zeng
, Wei Sun
, Yiming Zhang
, Wei Zhou
, Guangtao Zhai
, Xiongkuo Min
:
EEmo-Bench: A Benchmark for Multi-modal Large Language Models on Image Evoked Emotion Assessment. 7064-7073 - Leidong Fan
, Qian Zhang
, Qing Li
:
Inverse-Tone-Mapped HDR Video Quality Assessment for Broadcast Television: A Comprehensive Dataset and SDR-Referenced Method. 7074-7083 - Dominika Wanat
, Dawid Juszka
, Mikolaj Leszczuk
, Lucjan Janowski
:
Bridging the Lab and the Wild: Behavioral Experiments as a Pathway to QoE Research Closer to Realistic Environment. 7084-7092 - Duy X. Nguyen
, Hoang V. Hoan
, Ninh A. Vu
, Loc T. Nguyen
, Trung T. Phan
:
Like or Not to Like: An Usecase of Vietnamese Street Food Videos on YouTube. 7093-7102 - Ana Rita Rebelo
, Pedro A. Ferreira
, André Tomás Ribeiro
, Rui Nóbrega
:
Walking-with-Portals vs. Teleport in VR: Why Walking and Portals Matter in Small Spaces. 7103-7112 - Swarna Chakraborty
, Mylène C. Q. Farias
:
MT-DPCQA: A Multimodal Time-aware Learning Approach for No-Reference Dynamic Point Cloud Quality Assessment. 7113-7122 - Seung-gyeom Kim
, Areum Kim
, Eunchae Kim
, Minho Chung
, Yongjae Yoo
:
Automatic Accessible Multimodal Translation of Graphics Using A Refreshable Pin Array. 7123-7132
Experience: Multimedia Applications
- Hanzhe Liang
, Jie Zhang
, Tao Dai
, Linlin Shen
, Jinbao Wang
, Can Gao
:
Taming Anomalies with Down-Up Sampling Networks: Group Center Preserving Reconstruction for 3D Anomaly Detection. 7133-7141 - Daixun Li
, Sibo He
, Jiayun Tian
, Yusi Zhang
, Weiying Xie
, Mingxiang Cao
, Donglai Liu
, Zirui Li
, Tianlin Hui
, Rui Huang
, Yunsong Li
:
Uni-Sight: An E2E Vision-Language-Action System Unifying Multi-View Alignment and Multi-Modal Fusion. 7142-7151 - Jinhong He
, Minglong Xue
, Zhipu Liu
, Mingliang Zhou
, Aoxiang Ning
, Palaiahnakote Shivakumara
:
Degradation-Consistent Learning via Bidirectional Diffusion for Low-Light Image Enhancement. 7152-7161 - Ren Wang
, Xin Wang
, Tongtong Feng
, Xinyue Gong
, Guangyao Li
, Yu-Wei Zhan
, Qing Li
, Wenwu Zhu
:
Improving Compositional Generalization in Cross-Embodiment Learning via Mixture of Disentangled Prototypes. 7162-7171 - Xin Zhang
, Weiying Xie
, Yunsong Li
, Xiaoyu Chen
, Tianlin Hui
, Jitao Ma
, Leyuan Fang
:
TF-ATM: Training-Free Adaptive Token Merging. 7172-7180 - Bolei Chen
, Jiaxu Kang
, Haonan Yang
, Ping Zhong
, Jianxin Wang
:
Perspective from a Higher Dimension: Can 3D Geometric Priors Help Visual Floorplan Localization? 7181-7190 - Chengpei Xu
, Wenhao Zhou
, Long Ma
, Weimin Wang
, Feng Xia
, Binghao Li
, Wenjie Zhang
:
Bright to Dark: Stage-wise Bilevel Knowledge Transfer for Seeing Text in the Dark. 7191-7199 - Yunqiang Pei
, Hongrong Yang
, Kaiyue Zhang
, Guoqing Wang
, Peng Wang
, Chaoning Zhang
, Yang Yang
, Heng Tao Shen
:
InteractGuide: LLM-Enhanced Multimodal Reasoning for User-Centric Interaction Recommendations in AR-HRI Authoring. 7200-7209 - Tianyi Wang
, Harry Cheng
, Ming-Hui Liu
, Mohan Kankanhalli
:
FractalForensics: Proactive Deepfake Detection and Localization via Fractal Watermarks. 7210-7219 - Mingwei Li
, Pu Pang
, Hehe Fan
, Hua Huang
, Yi Yang:
TSGS: Improving Gaussian Splatting for Transparent Surface Reconstruction via Normal and De-lighting Priors. 7220-7229 - Qirui Yang
, Fangpu Zhang
, Yeying Jin
, Qihua Cheng
, Peng-Tao Jiang
, Huanjing Yue
, Jingyu Yang
:
DSDNet: Raw Domain Demoiréing via Dual Color-Space Synergy. 7230-7238 - Xu Chen
, Yang Li
, Yahong Han
, Jialie Shen
:
Ex Pede Herculem, Predicting Global Actionness Curve from Local Clips. 7239-7247 - Wei Jiang
, Junru Li
, Kai Zhang
, Li Zhang
:
BiECVC: Gated Diversification of Bidirectional Contexts for Learned Video Compression. 7248-7257 - Zeshuai Deng
, Guohao Chen
, Shuaicheng Niu
, Hui Luo
, Shuhai Zhang
, Yifan Yang
, Renjie Chen
, Wei Luo
, Mingkui Tan
:
Test-Time Model Adaptation for Quantized Neural Networks. 7258-7267 - Zeyi Lu
, Xiaoxiao Ma
, Yujun Huang
, Minxiao Chen
, Bin Chen
, Baoyi An
, Shu-Tao Xia
:
EDPC: Accelerating Lossless Compression via Lightweight Probability Models and Decoupled Parallel Dataflow. 7268-7276 - Yuxiong Xu
, Bin Li
, Weixiang Li
, Sara Mandelli
, Viola Negroni
, Sheng Li
:
ALDEN: Dual-Level Disentanglement with Meta-learning for Generalizable Audio Deepfake Detection. 7277-7286 - Boyuan Tian
, Qizhe Gao
, Siran Xianyu
, Xiaotong Cui
, Minjia Zhang
:
FlexGaussian: Flexible and Cost-Effective Training-Free Compression for 3D Gaussian Splatting. 7287-7296 - Chen Pang
, Xuequan Lu
, Qianyu Zhou
, Lei Lyu
:
Learning Adaptive Node Selection with External Attention for Human Interaction Recognition. 7297-7306 - Shuyang Chu
, Jingang Shi
, Xu Cheng
, Haoyu Chen
, Xin Liu
, Jian Xu
, Guoying Zhao
:
To Remember, To Adapt, To Preempt: A Stable Continual Test-Time Adaptation Framework for Remote Physiological Measurement in Dynamic Domain Shifts. 7307-7316 - Bingchen Miao
, Wenqiao Zhang
, Juncheng Li
, Wangyu Wu
, Siliang Tang
, Zhaocheng Li
, Haochen Shi
, Jun Xiao
, Yueting Zhuang
:
Robust Modality-Incomplete Anomaly Detection: A Modality-Instructive Framework with Benchmark. 7317-7326 - Huanqi Wu
, Huangbiao Xu
, Xiao Ke
:
The Devil in the Stego Image: Far from Being Usable in Real-World Scenarios. 7327-7335 - Nan An
, Siqi Xu
, Long Ma
, Zhu Liu
, Guangchao Han
, Tengyu Ma
, Risheng Liu
:
Inter-Task Weaving in Image Enhancement: From a New Unified Architecture to a Better Meta-Representation Learning. 7336-7345 - Renxiang Guan
, Junhong Li
, Siwei Wang
, Wenxuan Tu
, Miaomiao Li
, En Zhu
, Xinwang Liu
, Ping Chen
:
Multi-view Graph Clustering with Dual Relation Optimization for Remote Sensing Data. 7346-7355 - Chaoran Feng
, Zhenyu Tang
, Wangbo Yu
, Yatian Pang
, Yian Zhao
, Jianbin Zhao
, Li Yuan
, Yonghong Tian
:
E-4DGS: High-Fidelity Dynamic Reconstruction from the Multi-view Event Cameras. 7356-7365 - Sifan Zhou
, Jiahao Nie
, Ziyu Zhao
, Yichao Cao
, Xiaobo Lu
:
FocusTrack: One-Stage Focus-and-Suppress Framework for 3D Point Cloud Object Tracking. 7366-7375 - Hongtao Wu
, Yifeng Wu
, Jiaxuan Jiang
, Chengyu Wu
, Hong Wang
, Yefeng Zheng
:
SAMVSR: Leveraging Semantic Priors to Zone-Focused Mamba for Video Snow Removal. 7376-7385 - Ze Huang
, Zhongyang Xiao
, Mingliang Song
, Yu Fang
, Hongyuan Yuan
, Kevin Li Sun
, Li Zhang
:
MS-Road: Towards Spatiotemporal-Consistent Large-Scale Road Reconstruction. 7386-7394 - Shuo Li
, Xingchen Liu
, Fang Liu
, Licheng Jiao
, Jiahao Wang
, Xinyan Huang
, Yanbiao Ma
, Puhua Chen
, Lingling Li
, Xu Liu
, Xuejian Gou
:
Imagining Vision From Language for Few-Shot Class-Incremental Learning. 7395-7404 - Seonghwa Choi
, Moonkyeong Choi
, Mingyu Jang
, Jaekyung Kim
, Jianfei Cai
, Wen-Huang Cheng
, Sanghoon Lee
:
Relightable and Dynamic Gaussian Avatar Reconstruction from Monocular Video. 7405-7414 - Binyan Xu
, Fan Yang
, Xilin Dai
, Di Tang
, Kehuan Zhang
:
CLIP-Guided Backdoor Defense through Entropy-Based Poisoned Dataset Separation. 7415-7423 - Ziyun Qian
, Zeyu Xiao
, Xingliang Jin
, Dingkang Yang
, Mingcheng Li
, Zhenyi Wu
, Dongliang Kou
, Peng Zhai
, Lihua Zhang
:
UMSD: High Realism Motion Style Transfer via Unified Mamba-based Diffusion. 7424-7433 - Zhaohu Xing
, Lihao Liu
, Tian Ye
, Sixiang Chen
, Yijun Yang
, Guang Liu
, Lei Zhu
:
Farther Than Mirror: Explore Pattern-Compensated Depth of Mirror with Temporal Changes for Video Mirror Detection. 7434-7443 - Danyang Li
, Zenghui Yang
, Guangpeng Qi
, Songtao Pang
, Guangyong Shang
, Qiang Ma
, Zheng Yang
:
OpenMap: Instruction Grounding via Open-Vocabulary Visual-Language Mapping. 7444-7452 - Yan Zhong
, Xinping Zhao
, Li Zhang
, Xinyuan Song
, Tingting Jiang
:
Adaptive Prompt Learning for Blind Image Quality Assessment with Multi-modal Mixed-datasets Training. 7453-7462 - Changyu Rao
, Gaozhi Liu
, Sheng Li
, Xinpeng Zhang
, Zhenxing Qian
:
DynMark: A Robust Watermarking Solution for Dynamic Screen Content with Small-size Screenshot Support. 7463-7471 - Mingrui Li
, Shuhao Zhai
, Zibing Zhao
, Luyue Sun
, Xinxiao Wang
, Dong Li
, Shuhong Liu
, Hongyu Wang
:
Wild3A: Novel View Synthesis from Any Dynamic Images in Seconds. 7472-7480 - Shuo Wang
, Zhichuan Wang
, Yanmin Chen
, Mengyao Zhou
, Jun Luo
:
DRMix: Decomposition-Recomposition Data Augmentation with Diffusion Model. 7481-7489 - Lin Zhu
, Ruonan Liu
, Xiao Wang
, Lizhi Wang
, Hua Huang
:
Revealing Latent Information: A Physics-inspired Self-supervised Pre-training Framework for Noisy and Sparse Events. 7490-7499 - Haojie Zhang
, Yixiong Liang
, Hulin Kuang
, Lihui Cen
, Zhe Qu
, Yigang Cen
, Min Zeng
, Shichao Kan
:
Contrastive Regularization over LoRA for Multimodal Biomedical Image Incremental Learning. 7500-7509 - Yuanhong Chen
, Kazuki Shimada
, Christian Simon
, Yukara Ikemiya
, Takashi Shibuya
, Yuki Mitsufuji
:
CCStereo: Audio-Visual Contextual and Contrastive Learning for Binaural Audio Generation. 7510-7518 - Haitao Wang
, Sijia Wen
, Bo Guo
:
Polarimetric Monocular Gaussian Splatting SLAM for Dense Surface Reconstruction. 7519-7528 - Chen Qian
, Danyang Li
, Xinran Yu
, Zheng Yang
, Qiang Ma
:
OpenMoCap: Rethinking Optical Motion Capture under Real-world Occlusion. 7529-7537 - Chang Liu
, Ye Pan
, Chenyang Ding
, Susanto Rahardja
, Xiaokang Yang
:
MEDTalk: Multimodal Controlled 3D Facial Animation with Dynamic Emotions by Disentangled Embedding. 7538-7547 - Hongyan Xu
, Zhongze Wu
, Ang He
, Xi Lin
, Yi Chen
, Xiu Su
:
Addressing Granularity-induced Semantic Drift in OvOD via Graph-guided semantically consistent representation. 7548-7557 - Yanrui Yu
, Tianfei Zhou
, Jiaxin Sun
, Lianpeng Qiao
, Lizhong Ding
, Ye Yuan
, Guoren Wang
:
Lava: Language Driven Scalable and Versatile Traffic Video Analytics. 7558-7567 - Qian Li
, Siyuan Liang
, Yuzheng Zhang
, Cheng Ji
, Zongyu Chang
, Shangguang Wang
:
Meta-Knowledge Path Augmentation for Multi-Hop Reasoning on Satellite Commonsense Multi-Modal Knowledge Graphs. 7568-7577 - Yuxin Cheng
, Binxiao Huang
, Wenyong Zhou
, Taiqiang Wu
, Zhengwu Liu
, Graziano Chesi
, Ngai Wong
:
Re-Activating Frozen Primitives for 3D Gaussian Splatting. 7578-7586 - Songning Lai
, Ninghui Feng
, Jiechao Gao
, Hao Wang
, Haochen Sui
, Xin Zou
, Jiayu Yang
, Wenshuo Chen
, Lijie Hu
, Hang Zhao
, Xuming Hu
, Yutao Yue
:
From Guesswork to Guarantee: Towards Faithful Multimedia Web Forecasting with TimeSieve. 7587-7595 - Yiang Zhu
, Haoyue Wang
, Zhenxing Qian
, Sheng Li
, Xinpeng Zhang
, Jian Liu
:
Towards Generalized Physical Occlusion Detection On Documents. 7596-7605 - Bolun Zheng
, Xinjie Liu
, Qianyu Zhang
, Canjin Wang
, Fangni Chen
, Mingen Xu
:
EHPE: A Segmented Architecture for Enhanced Hand Pose Estimation. 7606-7615 - Zheyun Qin
, Deng Yu
, Yang Shi
, Qiangchang Wang
, Zhumin Chen
:
Video Instance Segmentation by Weighted Structure Inference. 7616-7624 - Jiaqi Hou
, Kewei Zhang
, Tianyu Yang
, Chengyu Jia
, Qiqi Lin
, Hui Wei
, Zheng Wang
:
FAB-Attack: Fabric-Aware Adversarial Attacks on Person Detectors under Motion Blur. 7625-7634 - Qi He
, Xiao Wu
, Jun-Yan He
, Wei Li
, Zhaoquan Yuan
:
DualEnhance: External Multimodal Foundation Models Guidance and Internal Fast-Slow Teacher Regulation. 7635-7643 - Yuezhou Li
, Yuzhen Niu
, Huangbiao Xu
, Hui Da
, Rui Xu
, Wenxi Liu
:
IPCMoE: Integrating Perceptual Cues with Mixture-of-Experts for Joint Low-Light Image Enhancement and Deblurring. 7644-7652 - Yibo Lyu
, Rui Shao
, Gongwei Chen
, Yijie Zhu
, Weili Guan
, Liqiang Nie
:
PUMA: Layer-Pruned Language Model for Efficient Unified Multimodal Retrieval with Modality-Adaptive Learning. 7653-7662 - Quanhong Peng
, Dan Zhang
, Dong Zhao
, Jianpeng Zhang
, Meihua Song
, Chenlei Lv
:
Cam-Bench: A Benchmark for Image-based Camera Parameter Estimation. 7663-7671 - Chenda Wei
, Haoyue Wang
, Zhenxing Qian
, Sheng Li
, Xinpeng Zhang
, Jian Liu
:
Learning Discrepant Transformations for Face Privacy Protection. 7672-7680 - Zhiwen Yang
, Yuxin Peng
:
SPHERE: Semantic-PHysical Engaged REpresentation for 3D Semantic Scene Completion. 7681-7690 - Zhaolin Wei
, Xiuwen Shi
, Dengpan Ye
, Yuhan Lin
, Zhigang Wang
, Jiacheng Deng
, Ziyi Liu
, Long Tang
:
PhonoFence: A Cross-Task Defense Framework for DeepFake via Phoneme-Level Adversarial Perturbations. 7691-7699 - Wei Shang
, Dongwei Ren
, Wanying Zhang
, Pengfei Zhu
, Qinghua Hu
, Wangmeng Zuo
:
Motion-Aware Adaptive Pixel Pruning for Efficient Local Motion Deblurring. 7700-7708 - Shuo Lu
, Yanyin Chen
, Wei Feng
, Jiahao Fan
, Fengheng Li
, Zheng Zhang
, Jingjing Lv
, Junjie Shen
, Ching Law
, Jian Liang
:
Uni-Layout: Integrating Human Feedback in Unified Layout Generation and Evaluation. 7709-7718 - Shibei Meng
, Saihui Hou
, Yang Fu
, Xuecai Hu
, Junzhou Huang
, Yongzhen Huang
:
Seeing from Magic Mirror: Contrastive Learning from Reconstruction for Pose-based Gait Recognition. 7719-7728 - Jinhao Li
, Zijian Chen
, Runze Jiang
, Tingzhu Chen
, Changbo Wang
, Guangtao Zhai
:
Mitigating Long-tail Distribution in Oracle Bone Inscriptions: Dataset, Model, and Benchmark. 7729-7738 - Yawei Chen
, Huibing Wang
, Mingze Yao
, Jinjia Peng
, Guangqi Jiang
, Jiqing Zhang
:
Scalable Multi-view Clustering based on Tight Anchor Distribution. 7739-7747 - Luyan Cui
, Huibing Wang
, Yawei Chen
, Mingze Yao
, Xianping Fu
, Jiqing Zhang
:
Dual-Constraint Multi-view Fuzzy Clustering with Scalable Anchor Graph Learning. 7748-7756 - Shuning Sun
, Yu Zhang
, Chen Wu
, Dianjie Lu
, Guijuan Zhang
, Yang Wen
, Zhuoran Zheng
:
UniFlowRestore: A General Video Restoration Framework via Flow Matching and Prompt Guidance. 7757-7765 - Kailong Yu
, Liyuan Pan
, Liu Liu
, Wei Liang
:
Enhanced Dual-Pixel Image Reflection Removal via Gaussian Splatting. 7766-7775 - Sitian Gu
, Zhiyu Pan
, Chaoyi Hong
, Chengxin Liu
, Zhiguo Cao
:
Dynamic Beauty is Easy to Find: A Large-Scale Composition-Aware Dataset and an End-to-End Framework for Video Reframing. 7776-7784 - Yong Liu
, Jinshan Pan
, Yinchuan Li
, Qingji Dong
, Chao Zhu
, Yu Guo
, Fei Wang
:
UltraVSR: Achieving Ultra-Realistic Video Super-Resolution with Efficient One-Step Diffusion Space. 7785-7794 - Yunlong Zhao
, Xiaoheng Deng
, Zhuohua Qiu
, Feng Yang
, Chang Xu
, Xiangjian He
, Shan You
, Xiu Su
:
CaDGS: Modeling Inter-Gaussian Mutual Information for Dynamic Novel View Synthesis. 7795-7804 - Jingjun Yi
, Qi Bi
, Hao Zheng
, Huimin Huang
, Haolan Zhan
, Yixian Shen
, Wei Ji
, Yawen Huang
, Yuexiang Li
, Xian Wu
, Yefeng Zheng
:
AtlantisGS: Underwater Sparse-View Scene Reconstruction via Gaussian Splatting. 7805-7814 - Ruicheng Zhang
, Yu Sun
, Zeyu Zhang
, Jinai Li
, Xiaofan Liu
, Hoi Fan Au
, Haowei Guo
, Puxin Yan
:
MARL-MambaContour: Unleashing Multi-Agent Deep Reinforcement Learning for Active Contour Optimization in Medical Image Segmentation. 7815-7824 - Ruicheng Zhang
, Haowei Guo
, Kanghui Tian
, Jun Zhou
, Mingliang Yan
, Zeyu Zhang
, Shen Zhao
:
Unified Medical Image Segmentation with State Space Modeling Snake. 7825-7834 - Xiang Huang
, Ao Luo
, Xiao Wu
, Zhaoquan Yuan
:
Latent Interactiveness Field for Non-Contact Human Object Interaction Detection. 7835-7843 - Ru Jia
, Xiaoqian Liang
, Xubin Duan
, Jianji Wang
, Nanning Zheng
:
HybridPlane: A General 4D Representation for Dynamic Scene Reconstruction. 7844-7853 - Ruoxuan Zhang
, Bin Wen
, Hongxia Xie
, Yi Yao
, Songhan Zuo
, Jian-Yu Jiang-Lin
, Hong-Han Shuai
, Wen-Huang Cheng
:
CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image Generation. 7854-7863 - Juan Zhao
, Yudao Sun
, Zhihai Yang
, Cai Xu
, Hongji Chen
, Fan Zhang
, Jianxin Li
:
Cross-Model Watermarking via Discriminative Samples for Secure Authentication. 7864-7873 - Chengcheng Xing
, Yanyu Xu
, Yonghui Xu
, Lizhen Cui
:
Learning Invariant Discriminative Patterns for Unified Anomaly Detection. 7874-7882 - Kerun Mi
, Guoliang Kang
, Guangyu Li
, Lin Zhao
, Tao Zhou
, Chen Gong
:
Cross-Domain Attribute Alignment with CLIP: A Rehearsal-Free Approach for Class-Incremental Unsupervised Domain Adaptation. 7883-7892 - Tianshun Han
, Benjia Zhou
, Ajian Liu
, Yanyan Liang
, Du Zhang
, Zhen Lei
, Jun Wan
:
PESTalk: Speech-Driven 3D Facial Animation with Personalized Emotional Styles. 7893-7901 - Yu Tong
, Weihai Lu
, Xiaoxi Cui
, Yifan Mao
, Zhejun Zhao
:
DAPT: Domain-Aware Prompt-Tuning for Multimodal Fake News Detection. 7902-7911 - Xuanchen Wang
, Heng Wang
, Weidong Cai
:
ChoreoMuse: Robust Music-to-Dance Video Generation with Style Transfer and Beat-Adherent Motion. 7912-7921 - Zikai Zhang
, Xu Zhang
, Ziyi Li
, Yidong Li
, Yuanzhouhan Cao
:
GMML: Gradient-Modulated Robustness for Imbalance-Aware Multimodal Learning. 7922-7930 - Ziwei Niu
, Shiao Xie
, Ziyue Wang
, Yen-Wei Chen
, Yueming Jin
, Lanfen Lin
:
EIR-SDG: Explore Invariant Representation for Single-source Domain Generalization in Medical Image Segmentation. 7931-7939 - Huabin Wang
, Yingfan Cheng
, Wu Zheng
, Jiayuan Cheng
, Xin Li
, Min Li
, Fei Liu
:
A Multi-illumination Dataset and an Illumination Domain Adaptation Network for Finger Vein Identification. 7940-7948 - Tianyi Liu
, Kejun Wu
, Chen Cai
, Yi Wang
, Kim-Hui Yap
, Lap-Pui Chau
:
Towards Blind Bitstream-corrupted Video Recovery: A Visual Foundation Model-driven Framework. 7949-7958 - Jie Yu
, Songping Mai
, Peng Zhang
, Yucheng Jiang
, Jian Cheng
:
Activation and Weight Distribution Balancing for Optimal Post-Training Quantization in Learned Image Compression. 7959-7967 - Yu Hong
, Yize Wu
, Zhehao Shen
, Chengcheng Guo
, Yuheng Jiang
, Yingliang Zhang
, Qiang Hu
, Jingyi Yu
, Lan Xu
:
BEAM: Bridging Physically-based Rendering and Gaussian Modeling for Relightable Volumetric Video. 7968-7977 - Yi Dai
, Yang Ding
, Kaisheng Zeng
:
Bridging Domains in Mental Stress Assessment via Retrieval-Augmented Reasoning. 7978-7987 - Lin Wu
, Wei Wei
, Peizhuo Yu
, Jianglin Lan
:
Open-Vocabulary 3D Affordance Understanding via Functional Text Enhancement and Multilevel Representation Alignment. 7988-7997 - Shangheng Chen
, Shengsheng Qian
, Quan Fang
, Jun Hu
, Changsheng Xu
:
A Large-Scale Dataset for Short-Video Topic Peak Prediction and a Large Heterogeneous Graph Model. 7998-8007 - Yue Sun
, Xinqi Liu
, Zhiliang He
, Jialu Zhang
, Chenming Wu
, Guodong Lu
, Jituo Li
:
DAFU-CAD: Depth-assisted Feature Unraveling for Sketch-based Robust CAD Modeling. 8008-8017 - Hang Guo
, Qing Zhang
, Zixuan Gao
, Siyuan Yang
, Shulin Peng
, Xiang Tao
, Ting Yu
, Yan Wang
, Qingli Li
:
Efficient Multi-Slide Visual-Language Feature Fusion for Placental Disease Classification. 8018-8027 - Yihang Huang
, Yuanfei Huang
, Junhui Lin
, Hua Huang
:
DeflareMamba: Hierarchical Vision Mamba for Contextually Consistent Lens Flare Removal. 8028-8037 - Lizhi Xiong
, Peipeng Yu
, Yue Wu
:
MADPHash: Manipulation-Aware Deep Perceptual Hashing using Feature Consistency. 8038-8047 - Yuanyi Duan
, Wei Xu
, Qinlong Wu
, Guo-Sen Xie
, Fang Zhao
, Caifeng Shan
:
AnomalyControl: Highly-Aligned Anomalous Image Generation with Controlled Diffusion Model. 8048-8057 - Wei Li
, Yizhao Wan
, Xiao Wu
, Jianshuai Wang
, Penglin Dai
, Zhaoquan Yuan
:
HOPNet: Learning Hand-Object-Person Interaction Network for Hand Contact State Detection. 8058-8066 - Fansheng Zeng
, Bineng Zhong
, Haiying Xia
, Yufei Tan
, Xiantao Hu
, Liangtao Shi
, Shuxiang Song
:
Explicit Context Reasoning with Supervision for Visual Tracking. 8067-8076 - Tianzhong Lan
, Zhang Yi
, Xiuyuan Xu
, Min Zhu
:
LooBox: Loose-box-supervised 3D Tumor Segmentation with Self-correcting Bidirectional Learning. 8077-8086 - Fei Ye
, Adrian G. Bors
:
Online Continual Learning via Dynamic Expandable Recursive Model. 8087-8096 - Huynh Dang Nguyen
, Trong-Thang Pham
, Ngan Le
, Van Nguyen
:
TolerantECG: A Foundation Model for Imperfect Electrocardiogram. 8097-8105 - Xiangyu Wu
, Feng Yu
, Yang Yang
, Jianfeng Lu
:
Text as Any-Modality for Zero-Shot Classification by Consistent Prompt Tuning. 8106-8115 - Giovanni Zanin
, Ritujoy Biswas
, Pietro Morerio
, Sylvio Barbon Junior
, Alberto Carini
, Alessio Del Bue
, Vittorio Murino
:
Direction-Aware Room Impulse Response Estimation for Immersive Audio Rendering in Real Environments. 8116-8124 - Haocheng Tang
, Ruoke Yan
, Xinhui Yin
, Qi Zhang
, Xinfeng Zhang
, Siwei Ma
, Wen Gao
, Chuanmin Jia
:
HGC-Avatar: Hierarchical Gaussian Compression for Streamable Dynamic 3D Avatars. 8125-8134 - Cheng Peng
, Zhen Wang
:
Method and Applications of Solid-State Lidar Modeling for X-in-the-Loop Testing of Autonomous Vehicles. 8135-8143 - Shuai Zhang
, Guanjun Wu
, Zhoufeng Xie
, Xinggang Wang
, Bin Feng
, Wenyu Liu
:
Dynamic 2D Gaussians: Geometrically Accurate Radiance Fields for Dynamic Objects. 8144-8153 - Ao Yang
, Yanglin Feng
, Yuan Sun
, Dezhong Peng
, Guiduo Duan
, Yang Qin
:
Noise-Robust Cross-modal Learning for Reliable 2D-3D Retrieval. 8154-8163 - Xinqi Su
, Zitong Yu
, Yawen Cui
, Ajian Liu
, Xun Lin
, Yuhao Wang
, Haochen Liang
, Wenhui Li
, Li Shen
, Xiaochun Cao
:
Dynamic Analysis and Adaptive Discriminator for Fake News Detection. 8164-8173 - Hanbing Wu
, Ping Jiang
, Anyang Su
, Chenxu Zhao
, Tianyu Fu
, Minghui Wu
, Beiping Tan
, Huiying Li
:
PRE-MAP: Personalized Reinforced Eye-tracking Multimodal LLM for High-Resolution Multi-Attribute Point Prediction. 8174-8183 - Ziang Wang
, Xiaoqin Wang
, Dingyi Wang
, Qiang Li
, Shushan Qiao
:
DIME-Net: A Dual-Illumination Adaptive Enhancement Network Based on Retinex and Mixture-of-Experts. 8184-8193 - Xihang Hu
, Fuming Sun
, Jiazhe Liu
, Feilong Xu
, Xiaoli Zhang
:
ST-SAM: SAM-Driven Self-Training Framework for Semi-Supervised Camouflaged Object Detection. 8194-8203 - Fangmin Zhao
, Weichao Zeng
, Zhenhang Li
, Dongbao Yang
, Binbin Li
, Xiaojun Bi
, Yu Zhou
:
Uni-DocDiff: A Unified Document Restoration Model Based on Diffusion. 8204-8213 - Shuzhao Xie
, Jiahang Liu
, Weixiang Zhang
, Shijia Ge
, Sicheng Pan
, Chen Tang
, Yunpeng Bai
, Cong Zhang
, Xiaoyi Fan
, Zhi Wang
:
SizeGS: Size-aware Compression of 3D Gaussian Splatting via Mixed Integer Programming. 8214-8223 - Shuoshuo Li
, Shuli Cheng
, Liejun Wang
:
Entity-Level Alignment with Prompt-Guided Adapter for Remote Sensing Image-Text Retrieval. 8224-8233 - Feng-Kai Huang
, Bo-Lun Huang
, Li-Wu Tsao
, Jhih-Ciang Wu
, Hong-Han Shuai
, Wen-Huang Cheng
:
Flowing Crowd to Count Flows: A Self-Supervised Framework for Video Individual Counting. 8234-8243 - Qianqian Sun
, Jixiang Luo
, Dell Zhang
, Xuelong Li
:
SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding. 8244-8252 - Zheng Qin
, Ruobing Zheng
, Yabing Wang
, Tianqi Li
, Zixin Zhu
, Sanping Zhou
, Ming Yang
, Le Wang
:
Versatile Multimodal Controls for Expressive Talking Human Animation. 8253-8262 - Xueyi Zhang
, Jialu Sun
, Chengwei Zhang
, Xianghu Yue
, Tianfang Xiao
, Siqi Cai
, Mingrui Lao
, Haizhou Li
:
EventLip: Enhancing Event-Based Lip Reading via Frequency-Aware Spatiotemporal Hypergraph Modeling. 8263-8272 - Haolin Wang
, Yafei Ou
, Prasoon Ambalathankandy
, Gen Ota
, Pengyu Dai
, Masayuki Ikebe
, Kenji Suzuki
, Tamotsu Kamishima
:
Layer Separation: Towards Adjustable Joint Space Width Images Synthesis. 8273-8282 - Yujie Yang
, Shuang Li
, Jun Ye
, Neng Dong
, Fan Li
, Huafeng Li
:
DINOv2 Driven Gait Representation Learning for Video-Based Visible-Infrared Person Re-identification. 8283-8292 - Lingling Dai
, Andong Li
, Zhe Han
, Chengshi Zheng
, Xiaodong Li
:
BAPEN: Towards Versatile Audio Phase Retrieval. 8293-8302 - Yishu Liu
, Zhiming Chen
, Desen Wang
, Xiaoling Luo
, Bingzhi Chen
, Guangming Lu
:
PET-GPRA: Rethinking PET with Gradient-Aware Prompting and Router-Free Adapters for Few-shot Class-Incremental Learning. 8303-8312 - Guanjie Huang
, Danny H. K. Tsang
, Shan Yang
, Guangzhi Lei
, Li Liu
:
Cued-Agent: A Collaborative Multi-Agent System for Automatic Cued Speech Recognition. 8313-8321 - Mingyang Ding
, Zhan Wang
, Jiachen Wang
, Tingting Han
, Xinyuan Hu
, Jiajun Ding
, Min Tan
, Zhenzhong Kuang
:
FutureGS: Structured Gaussian Fields for Future-Aware Dynamic Scene Modeling. 8322-8331 - Shuyang Wang
, Chunxiao Li
, Anlong Ming
:
IFS-Light: An Interactive Framework for Single-view Face Relighting with both Facial and Lighting Consistency. 8332-8340 - Lei Liu
, Zhenghao Chen
, Dong Xu
:
3D Gaussian Splatting Data Compression with Mixture of Priors. 8341-8350 - An Zhao
, Piaopiao Yu
, Zhe Zhu
, Mingqiang Wei
:
BSGS: Bi-Stage 3D Gaussian Splatting for Camera Motion Deblurring. 8351-8359 - Teng Jin
, Ziwen He
, Zhangjie Fu
, Songping Wang
, Yueming Lyu
, Yufei Shi
:
Frequency Domain Distributed Perturbations: Towards Query-Efficient Black-Box Adversarial Video Attack. 8360-8368 - Zhihao Luo
, Luojun Lin
, Zheng Lin
:
Synthetic-to-Real Camouflaged Object Detection. 8369-8378 - Hua Li
, Gaowei Lin
, Zhiyuan Li
, Sam Kwong
, Runmin Cong
:
FSCDiff: Frequency-Spatial Entangled Conditional Diffusion model for Underwater Salient Object Detection. 8379-8388 - Jing Jin
, Xu Liu
, Te Gao
, Zhihong Shi
, Yixiong Liang
, Ruiqing Zheng
, Hulin Kuang
, Min Zeng
, Shichao Kan
:
Dynamic Residual Encoding with Slide-Level Contrastive Learning for End-to-End Whole Slide Image Representation. 8389-8398 - Xuewen Liu
, Zhikai Li
, Minghao Jiang
, Mengjuan Chen
, Jianquan Li
, Qingyi Gu
:
DilateQuant: Accurate and Efficient Quantization-Aware Training for Diffusion Models via Weight Dilation. 8399-8408 - Yuxi Bi
, Yunfan Gao
, Haofen Wang
:
StePO-Rec: Towards Personalized Outfit Styling Assistant via Knowledge-Guided Multi-Step Reasoning. 8409-8417 - Tung-I Chen
, Dae Yeol Lee
, Guan-Ming Su
, Mohammad Hajiesmaili
, Ramesh K. Sitaraman
:
NIVM: Real-time View Morphing via Neural Implicit Function. 8418-8427 - Zhiqian Xia
, Haifeng Xia
, Shichao Jin
, Wei Wang
, Zhengming Ding
, Xiaochun Cao
:
DSPF: Dual-Stage Preservation and Fusion for Source-Free Domain Adaptive Point Cloud Completion. 8428-8437 - Youchen Xie
, Chen Li
, Sheng Qiu
, Zhi-Jun Wang
, Chenhui Li
, Yibo Zhao
, Zan Gao
, Changbo Wang
:
FluidGS: Physics Informed Gaussian Splatting for Dynamic Fluid Reconstruction from Sparse Views. 8438-8447 - Shun Zou
, Yi Zou
, Juncheng Li
, Guangwei Gao
, Guo-Jun Qi
:
Cross Paradigm Representation and Alignment Transformer for Image Deraining. 8448-8457 - Xiongjian Lv
, Yimin Wen
, Hang Yu
:
DiffuFuse: Diffusion-Driven Dual-Stream Fusion Framework for Multimodal Sentiment Analysis. 8458-8467 - Lizhi Xiong
, Linsen Ding
, Ziqiang Li
:
Detecting Forged HEVC Videos via Anomalous Bitrate-Compressed Traces: A Frame-Level Bitrate Analysis Framework. 8468-8477 - Zhicong Wu
, Hongbin Xu
, Gang Xu
, Ping Nie
, Zhixin Yan
, Jinkai Zheng
, Liangqiong Qu
, Ming Li
, Liqiang Nie
:
TextSplat: Text-Guided Semantic Fusion for Generalizable Gaussian Splatting. 8478-8487 - Liqi Yan
, Xuebin Li
, Jianhui Zhang
, Fangli Guan
, Kanglei Peng
, Pan Li
:
F-DDIM: A Featurized Denoising Diffusion Implicit Model for Facial Image Steganography. 8488-8496 - Yongqi Shao
, Bingxin Mei
, Cong Tan
, Hong Huo
, Tao Fang
:
MoTAS: MoE-Guided Feature Selection from TTS-Augmented Speech for Enhanced Multimodal Alzheimer's Early Screening. 8497-8505 - Yilin Zhang
, Yanyan Wei
, Zhao Zhang
, Jicong Fan
, Haijun Zhang
, Shuicheng Yan
:
From Outline to Detail: An Hierarchical End-to-end Framework for Coherent and Consistent Visual Novel Generation and Assembly. 8506-8516 - Hongbin Lin
, Yifan Jiang
, Juangui Xu
, Jesse Jiaxi Xu
, Yi Lu
, Zhengyu Hu
, Ying-Cong Chen
, Hao Wang
:
Graph-Guided Dual-Level Augmentation for 3D Scene Segmentation. 8517-8526 - Xin Xu
, Chaoyue Ren
, Wei Liu
, Wenke Huang
, Bin Yang
, Zhixi Yu
, Kui Jiang
:
Positive Style Accumulation: A Style Screening and Continuous Utilization Framework for Federated DG-ReID. 8527-8536 - Zefan Zhang
, Weiqi Zhang
, Kailong Suo
, Yanhui Li
, Tian Bai
:
Video-Level Multimodal Relation Extraction with Event-Entity Semantic Consistency. 8537-8546 - Xiaodong Zhu
, Suting Wang
, Junqi Yang
, Yuhong Yang
, Weiping Tu
, Zhongyuan Wang
:
Query-Based Audio-Visual Temporal Forgery Localization with Register-Enhanced Representation Learning. 8547-8556 - Yu-Wei Zhan
, Fan Liu
, Xin Luo
, Xin-Shun Xu
, Liqiang Nie
, Mohan Kankanhalli
:
Enhancing HOI Detection with Contextual Cues from Large Vision-Language Models. 8557-8566 - Yongtang Bao
, Chengjie Tang
, Yuze Wang
, Haojie Li
:
Seg-Wild: Interactive Segmentation based on 3D Gaussian Splatting for Unconstrained Image Collections. 8567-8576 - Xiangui Huang
, Taotao Lai
, Yizhang Liu
, Shuyuan Lin
, Zuoyong Li
:
Two-View Correspondence Pruning via Channel-Spatial Interaction and Bidirectional Consensus Interaction. 8577-8585 - Hongjun Liu
, Chao Yao, Yalan Zhang
, Xiaokun Wang
, Xiaojuan Ban
:
Spatial Imputation Drives Cross-Domain Alignment for EEG Classification. 8586-8595 - Wanting Zhang
, Jingxuan Zhang
, Libao Zhang
:
Saliency-Guided Adaptive Random Diffusion for Remote Sensing Images Restoration with Cloud and Haze. 8596-8605 - Sheng Lyu
, Ruiming Huang
, Sijie Ji
, Yasar Abbas Ur Rehman
, Lan Ma
, Chenshu Wu
:
CardioLive: Empowering Video Streaming with Online Cardiac Monitoring via Audio-Visual Learning. 8606-8615 - Beizhen Zhao
, Yifan Zhou
, Sicheng Yu
, Zijian Wang
, Hao Wang
:
Wavelet-GS: 3D Gaussian Splatting with Wavelet Decomposition. 8616-8625 - Qi Zheng
, Haozhi Wang
, Zihao Liu
, Jiaming Liu
, Zhijian Hao
, Bu Chen
, Min Li
, Rui Wan
, Peiye Liu
, Yanheng Lu
, Dimin Niu
, Jinjia Zhou
, Minge Jing
, Yibo Fan
:
Unicorn: Unified Neural Image Compression with One Number Reconstruction. 8626-8635 - Ruoqi Wang
, Haitao Wang
, Qiong Luo
:
GalaxAlign: Mimicking Citizen Scientists' Multimodal Guidance for Galaxy Morphology Analysis. 8636-8644 - Dengwen Wang
, Guanyu Xing
, Yanli Liu
:
Low-light Invariant Representation Learning for Visible-Infrared Person Re-identification. 8645-8653 - Yiqiang Guo
, Lei Zhong
, Bin Chen
, Jia-Li Yin
, Xiaolei Liu
, Shouling Ji
:
Focus on Generalization: Improving Adversarial Transferability via Bi-Level Bias Mitigation. 8654-8662 - Shucheng Gong
, Lingzhe Zhao
, Wenpu Li
, Hong Xie
, Yin Zhang
, Shiyu Zhao
, Peidong Liu
:
Casual3DHDR: High Dynamic Range 3D Gaussian Splatting from Casually Captured Videos. 8663-8672 - Fan Wang
, Zhangjie Fu
, Xiang Zhang
, Ziqiang Li
, Ziwen He
, Manyu Wang
:
Pair-wise Confidence Difference-based Pseudo-Label Selection for Universal Mismatched Steganalysis. 8673-8681 - Ruonan Wei
, Yuntao Wang
, Siyan Fang
, Yuehuan Wang
:
End-to-End Multiple Object Tracking with Dynamic Scene Perception. 8682-8691 - Haiyang Mei
, Difei Gao
, Xiaopeng Wei
, Xin Yang, Mike Zheng Shou
:
Can I Trust You? Advancing GUI Task Automation with Action Trust Score. 8692-8700 - Kefan Tang
, Lihuo He
, Jisheng Dang
, Xinbo Gao
:
Boosting Temporal Sentence Grounding via Causal Inference. 8701-8710 - Jiang Shao
, Xinbo Zhao
, Xiaochun Zou
, Xiaolin Ye
:
EgoHierMask: Hierarchical Semantic-Prior Guided Masked Autoencoder for Egocentric Action Recognition. 8711-8720 - Junran Wu
, Beng Chin Ooi
, Ke Xu
:
Toward Robust Signed Graph Learning through Joint Input-Target Denoising. 8721-8729 - Yusen Wang
, Huan Zhou
, Yu Jiang
, Chunxia Xiao
:
Robust Gaussian Surface Reconstruction with Semantic Aware Progressive Propagation. 8730-8739 - Haoyu Shi
, Huaiwen Zhang
:
Sequence-Event Semantic Consistent Learning for Text-to-Motion Retrieval. 8740-8749 - Ting Xiao
, Minqian Sun
, Yiqing Xia
, Zhe Wang
:
Dual-Prototype Learning in Multiple Instance Learning for Histopathology Image Classification. 8750-8758 - Yunyu Zou
, Yishu Liu
, Jun Liang
, Bingzhi Chen
:
SG-FSL: Cross-Domain Few-Shot Learning with Style-Decoupled Augmentation and Gradient-Conflict Adjustment. 8759-8768 - Jiawei Zhang
, Xiaoli Jiang
, Hao Wang
, Lin Yuan
, Xiangyang Luo
, Bin Ma
, Jinwei Wang
:
DVW: Diffusion Visible Watermark. 8769-8777 - Kaixin Li
, Ziyang Meng
, Hongzhan Lin
, Ziyang Luo
, Yuchen Tian
, Jing Ma
, Zhiyong Huang
, Tat-Seng Chua
:
ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use. 8778-8786 - Qin Li
, Congcong Xiao
, Limei Liu
, Han Peng
, Junfeng Yang
:
Skeleton Compression and Complementary Enhanced Fusion Under Branch-Stage Supervision for Human Action Recognition. 8787-8796 - Zeyu Huang
, Juyuan Wang
, Longfeng Chen
, Boyi Xiao
, Leng Cai
, Yawen Zeng
, Jin Xu
:
MVISU-Bench: Benchmarking Mobile Agents for Real-World Tasks by Multi-App, Vague, Interactive, Single-App and Unethical Instructions. 8797-8805 - Yize Song
, Yunqing Chen
, Zhou Wang
, Cheng Chen
, Ruoxiu Xiao
:
Symmetrical Awareness Generation for Pelvic Image Segmentation. 8806-8814 - Hantao Zhou
, Rui Yang
, Longxiang Tang
, Guanyi Qin
, Runze Hu
, Xiu Li
:
Gamma: Toward Generic Image Assessment with Mixture of Assessment Experts. 8815-8824 - Woo Yi Yang
, Jiarui Wang
, Sijing Wu
, Huiyu Duan
, Yuxin Zhu
, Liu Yang
, Kang Fu
, Guangtao Zhai
, Xiongkuo Min
:
LMME3DHF: Benchmarking and Evaluating Multimodal 3D Human Face Generation with LMMs. 8825-8834 - Ting Li
, Songtao Li
, Shuaifeng Li
, Xiaolin Qin
, Maoyuan Zhao
, Luping Ji
, Mao Ye
:
SAM-Guided Semantic Knowledge Fusion for Visible-Infrared Object Detection. 8835-8844 - Yue Hou
, Yingke Su
, Junran Wu
, Ke Xu
:
Test-time Graph OOD Detection via Dynamic Dictionary Expansion and OOD Score Calibration. 8845-8853 - Shuai Yu
, Xiaoliang He
, Kangjie Dong
, Yi Yu
:
DUDA: A Two-stage Decoupling Unsupervised Domain Adaptation Framework for Semi-supervised Singing Melody Extraction from Polyphonic Music. 8854-8862 - Dongjian Yu
, Weiqing Min
, Xin Jin
, Qian Jiang
, Shuqiang Jiang
:
Spatial-Aware Multi-Modal Information Fusion for Food Nutrition Estimation. 8863-8871 - Yan Rong
, Jinting Wang
, Guangzhi Lei
, Shan Yang
, Li Liu
:
AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation. 8872-8881 - Le Liu
, Shizhou Zhang
, Di Xu
:
SUVIS: A Depth- and Motion-Encoded Stereoscopic System for Communicating Forecast Uncertainty. 8882-8890 - Hanyuan Liu
, Minshan Xie
, Jinbo Xing
, Chengze Li
, Chi-Sing Leung
, Tien-Tsin Wong
:
ColorDiffuser: Video Colorization with Pretrained Text-to-Image Diffusion Models. 8891-8900 - Zhuo Su
, Jufeng Li
, Yan Zhang
, Xin Li
, Fuwei Zhang
, Yuxin Feng
, Fan Zhou
:
Breaking the Synthetic Barrier: Towards Stable and Generalizable Real-World Image Dehazing. 8901-8909 - Chuan Zhang
, Zihan Li
, Zihao Xu
, Xuhao Ren
, Liehuang Zhu
:
SepVAMark: Deep Separable Visual-Audio Fusion Watermarking for Source Tracing and Deepfake Detection. 8910-8919 - Jiayi Zeng
, Tao Ren
, Changhu Wang
, Yifan Wang
, Wei Ju
, Zhipeng Sun
, Xiao Luo
:
DATE: Dual Prompt Learning with Information Bottleneck for Graph Out-of-Distribution Generalization. 8920-8929 - Seungmi Choi
, TaeHwa Lee
, Jun Yeong Cha
, Suhyun Jo
, Hyunmin Ban
, Kwan-Jung Oh
, Hyunsuk Ko
, Hui Yong Kim
:
Phase Distribution Matters: On the Importance of Phase Distribution Alignment (PDA) in Holographic Applications. 8930-8938 - Yanyin Guo
, Runxuan An
, Junwei Li
, Zhiyuan Zhang
:
LSFDNet: A Single-Stage Fusion and Detection Network for Ships Using SWIR and LWIR. 8939-8948 - Yiyang Gu
, Taian Guo
, Hang Zhou
, Zihao Chen
, Zhiping Xiao
, Yifang Qin
, Xiao Luo
, Wei Ju
, Yifan Wang
, Ming Zhang
:
CODE: Towards Partial Label Graph Learning via Coupled Dual Separation. 8949-8958 - Zixi Wang
, Yubo Huang
, Jingzehua Xu
, Jinzhu Wei
, Shuai Zhang
, Xin Lai
:
Multi-Modal Gradual Domain Osmosis: Stepwise Dynamic Learning with Batch Matching for Gradual Domain Adaptation. 8959-8967 - Shan Wang
, Weisi Lin
, Yun Liu
, Libao Zhang
:
CLIP-HNet: Hybrid Network with Cross-Modal Guidance for Self-Supervised Remote Sensing Dehazing. 8968-8977 - Xiaoyan Yuan
, Wei Wang
, Junxin Chen
, Xiping Hu
:
Reading Between the Channels: Knowledge-Augmented Medical Time Series Classification. 8978-8987 - Liang Xu
, Songkai Jia
, Cathal Gurrin
, Monica Ward
, Allie Tran
:
Through Someone Else's Eyes: Lifelogging Meets Narrative Virtual Reality. 8988-8996 - Guangfei Li
, Quanxue Gao
, Yu Lei
, Yichen Bao
, Qianqian Wang
:
Multi-view Collaborative Representation Learning from Noisy Labels for VHR Imagery Classification. 8997-9005 - Dongyang Ma
, Zhengyu Ma
, Wei Zhang
, Yonghong Tian
:
DSF-Net: Dynamic Sparse Fusion of Event-RGB via Spike-Triggered Attention for High-Speed Detection. 9006-9015 - Tengyu Ma
, Jiafa Ruan
, Yuetong Wang
, Guangchao Han
, Zhu Liu
, Long Ma
, Risheng Liu
:
Degradation-Aware One-Step Diffusion Model for Content-Sensitive Super-Resolution in the Dark. 9016-9025 - Ruocheng Gu
, Sen Jia
, Yule Ma
, Jinqin Zhong
, Jenq-Neng Hwang
, Lei Li
:
MoCount: Motion-Based Repetitive Action Counting. 9026-9034 - Lin Zuo
, Kunshan Yang
, Mengmeng Jing
, Xiangxu Zhao
, Jiaqiao Chen
:
Bridging Inter-Class Ambiguity and Spatial Variability in Flexible Object Recognition via Graph Distillation. 9035-9043 - Xubo Liu
, Wenya Guo
, Ruxue Yan
, Xumeng Liu
, Ying Zhang
, Ru Zhou
:
Rethinking the Reliability of Evidence in End-to-End Fact-Checking from the Causal Perspective. 9044-9052 - Shengze Shi
, Tao Ren
, Guoliang Zhu
, Guan Dong Feng
, Jun Hu
:
Closing the Feedback Loop in Text2Vis: Refining Visualization with Vision-Language Models. 9053-9061 - Vitalii Emelianov
, Niki Martinel
:
Neural Additive Adapters for Interpretable Nutrition Prediction. 9062-9070 - Zongsheng Cao
, Yangfan He
, Anran Liu
, Jun Xie
, Feng Chen
, Zhepeng Wang
:
TV-RAG: A Temporal-aware and Semantic Entropy-Weighted Framework for Long Video Retrieval and Understanding. 9071-9079 - Yuting Zhao
, Yuheng Ji
, Xiaoshuai Hao
, Shuxiao Li
:
FastRSR: Efficient and Accurate Road Surface Reconstruction in Bird's Eye View. 9080-9089 - Domenic Zingsheim
, Markus Plack
, Hannah Dröge
, Janelle Pfeifer
, Patrick Stotko
, Matthias B. Hullin
, Reinhard Klein
:
RIFTCast: A Template-Free End-to-End Multi-View Live Telepresence Framework and Benchmark. 9090-9099 - Yongan Guo
, Zhongyan Zhou
, Yuao Wang
, Na Zhu
, Xuyun Zhang
, Hongwang Xiao
, Yuan Miao
, Bo Li
:
RSFomer: Time Series Transformer for Robust Sports Action Recognition. 9100-9109 - Qi Chen
, Jingxuan Wei
, Zhuoya Yao
, Haiguang Wang
, Gaowei Wu
, Bihui Yu
, Siyuan Li
, Cheng Tan
:
ResearchPulse: Building Method-Experiment Chains through Multi-Document Scientific Inference. 9110-9119 - Hengnian Gu
, Zhifu Chen
, Jin Peng Zhou
, Dongdai Zhou
:
Hierarchical Disentanglement of Cognitive States for Enhanced Cognitive Diagnosis. 9120-9129 - Fenghua Yu
, Jianwen Sun
, Qian Wan
, Meicheng Chen
, Xiaoxuan Shen
, Qing Li
:
DiffuQKT: A Diffusion-Based Approach for Improved Question Representation in Knowledge Tracing. 9130-9139 - Chong Wu
, Maolin Che
, Renjie Xu
, Zhuoheng Ran
, Hong Yan
:
ELFATT: Efficient Linear Fast Attention for Vision Transformers. 9140-9149 - Dahao Fu
, Jiangqun Ni
, Jian Zhang
:
JPEG-RAE: Reversible Adversarial Example for Privacy and Copyright Protection of JPEG Images. 9150-9158 - Donglin Zhang
, Boyuan Ma
, Xiaojun Wu
, Josef Kittler
:
Ingredients-Guided and Nutrients-Prompted Network for Food Nutrition Estimation. 9159-9167 - Baoquan Zhao
, Xiaofan Ma
, Qianshi Pang
, Ruomei Wang
, Fan Zhou
, Shujin Lin
:
VisAug: Facilitating Speech-Rich Web Video Navigation and Engagement with Auto-Generated Visual Augmentations. 9168-9176 - Hongyu Liu
, Hongwei Ge
, Yuxuan Liu
, Yaqing Hou
:
Dialogue-Driven Interactive Dynamic Learning for Text-to-Image Person Retrieval. 9177-9185 - Yuyang Jiang
, Binzhu Xie
, Lina Xu
, Xiaokang Lei
, Shi Qiu
, Luwen Yu
, Pan Hui
:
Generative Multi-Sensory Meditation: Exploring Immersive Depth and Activation in Virtual Reality. 9186-9195 - Chunzheng Zhu
, Yangfang Lin
, Jialin Shao
, Jianxin Lin
, Yijun Wang
:
Pathology-Aware Prototype Evolution via LLM-Driven Semantic Disambiguation for Multicenter Diabetic Retinopathy Diagnosis. 9196-9205 - Seungkyu Leem
, Seokhyun Jeong
, Yeonho Cho
, Yoonjae Lee
, Jungjin Lee
:
VRMusicStage: A System for Converting Fixed-Camera Music Stage Videos into Immersive VR Content. 9206-9215 - Zhi Zeng
, Jiaying Wu
, Minnan Luo
, Xiangzheng Kong
, Zihan Ma
, Guang Dai
, Qinghua Zheng
:
Understand, Refine and Summarize: Multi-View Knowledge Progressive Enhancement Learning for Fake News Video Detection. 9216-9225
Generative AI: Generative Multimedia
- Huijie Liu
, Jingyun Wang
, Shuai Ma
, Jie Hu
, Xiaoming Wei
, Guoliang Kang
:
Separate Motion from Appearance: Customizing Motion via Customizing Text-to-Video Diffusion Models. 9227-9236 - Guotao Liang
, Juncheng Hu
, Ximing Xing
, Jing Zhang
, Qian Yu
:
Multi-Object Sketch Animation with Grouping and Motion Trajectory Priors. 9237-9246 - Yiming Li
, Peng Zhou
, Xiaokang Qin
, Hongwei Hu
, Jun Sun
, Yi Xu
:
Position-LoRA: Enhanced Relation Customization through Structural Prior in Initial Latent Noise. 9247-9256 - Leyang Li
, Shilin Lu
, Yan Ren
, Adams Wai-Kin Kong
:
Set You Straight: Auto-Steering Denoising Trajectories to Sidestep Unwanted Concepts. 9257-9266 - Ke Xing
, Hanwen Liang
, Dejia Xu
, Yuyang Yin
, Konstantinos N. Plataniotis
, Yao Zhao
, Yunchao Wei
:
TiP4GEN: Text to Immersive Panorama 4D Scene Generation. 9267-9276 - Minho Park
, Youngjoo Jo
, Jae-Hyeok Lee
, Jiyong Lee
, Dong-oh Kang
, Yong Man Ro
:
Focus Where It Matters: LLM-Guided Regional Identification for Instruction-based Image Editing. 9277-9286 - Mengling Xu
, Ming Tao
, Bing-Kun Bao
:
Chain-of-Cooking: Cooking Process Visualization via Bidirectional Chain-of-Thought Guidance. 9287-9295 - Yefei Sheng
, Jie Wang
, Ming Tao
, Bing-Kun Bao
:
D2Gaussian: Dynamic Control with Discretized 3D View Modeling for Text-Driven 3D Gaussian Splatting Editing. 9296-9305 - Dezhi Zheng
, Kaijun Deng
, Xianxu Hou
, Jinbao Wang
, Xiaoqin Wang
, Linlin Shen
:
Unknown Pixel Mask Based Fine-tuning of 2D Inpainting Models for Unbounded 3D Scene Generation from a Single Image. 9306-9315 - Yifan Yang
, Shujie Liu
, Jinyu Li
, Yuxuan Hu
, Haibin Wu
, Hui Wang
, Jianwei Yu
, Lingwei Meng
, Haiyang Sun
, Yanqing Liu
, Yan Lu
, Kai Yu
, Xie Chen
:
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis. 9316-9325 - Zongye Zhang
, Bohan Kong
, Qingjie Liu
, Yunhong Wang
:
Towards Robust and Controllable Text-to-Motion via Masked Autoregressive Diffusion. 9326-9335 - Duoyou Chen
, Yunqing Chen
, Can Zhang
, Zhou Wang
, Cheng Chen
, Ruoxiu Xiao
:
Latent Space Consistency for Sparse-View CT Reconstruction. 9336-9344 - Min Wei
, Chaohui Yu
, Jingkai Zhou
, Fan Wang
:
3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models. 9345-9354 - Yuxuan Xiong
, Ye Chen
, Yue Shi
, Zhangli Hu
, Bingbing Ni
:
Rig-Reconstruct-Render (R33D): Collaborative Representation for Editable and Skeleton-Drivable 3D Asset Generation. 9355-9364 - Tao Tang
, Enhui Ma
, Xia Zhou
, Letian Wang
, Tianyi Yan
, Xueyang Zhang
, Kun Zhan
, Peng Jia
, Xianpeng Lang
, Jia-Wang Bian
, Kaicheng Yu
, Xiaodan Liang
:
OmniGen: Unified Multimodal Sensor Generation for Autonomous Driving. 9365-9374 - Xueyang Kang
, Zhengkang Xiang
, Zezheng Zhang
, Kourosh Khoshelham
:
Look Beyond: Two-Stage Scene View Generation via Panorama and Video Diffusion. 9375-9384 - Liang Yue
, Shao-Kui Zhang
, Lin Yuan
, Yi-Tao Chen
, Zirui Zhou
, Song-Hai Zhang
:
Synthesizing 3D Scenes via Diffusion Model that Incorporates Indoor Scene Characteristics. 9385-9394 - Yihong Ji
, Yunze Liu
, Yiyao Zhuo
, Weijiang Yu
, Fei Ma
, Joshua Zhexue Huang
, Fei Yu
:
OnlineHOI: Towards Online Human-Object Interaction Generation and Perception. 9395-9403 - Siyi Qian
, Jian Fang
, Yuzhou Mao
, Yayun Zou
, Wentao Zhang
, Haiwei Xue
:
Human Motion Generation in 3D Scenes from Open-Ended Textual Instructions with MLLM Planning. 9404-9413 - Jianzhi Liu
, Junchen Zhu
, Pengpeng Zeng
, Lianli Gao
, Heng Tao Shen
, Jingkuan Song
:
AICL: Action In-Context Learning for Text-to-Video Generation. 9414-9423 - Yingjie Xi
, Jian Jun Zhang
, Xiaosong Yang
:
PMG: Progressive Motion Generation via Sparse Anchor Postures Curriculum Learning. 9424-9433 - Zhenghao Zhang
, Junchao Liao
, Xiangyu Meng
, Long Qin
, Weizhi Wang
:
Tora2: Motion and Appearance Customized Diffusion Transformer for Multi-Entity Video Generation. 9434-9443 - Xinyang Li
, Chengjie Yi
, Jiawei Lai
, Mingbao Lin
, Yansong Qu
, Shengchuan Zhang
, Liujuan Cao
:
SynergyAmodal: Deocclude Anything with Text Control. 9444-9453 - Yangyang Xu
, Shengfeng He
, Wenqi Shao
, Yong Du
, Kwan-Yee K. Wong
, Yu Qiao
, Jun Yu
, Ping Luo
:
DiffusionMat: Alpha Matting as Deterministic Sequential Refinement Learning. 9454-9462 - Sijie Xu
, Runqi Wang
, Wei Zhu
, Dejia Song
, Nemo Chen
, Xu Tang
, Yao Hu
:
Single Trajectory Distillation for Accelerating Image and Video Style Transfer. 9463-9471 - Zhenbo Yu
, Jimin Dai
, Yingzhen Zhang
, Jian Yang
, Lei Luo
:
SSAIM: Not All Self-Attentions Contain Effective Spatial Structure in Diffusion Models for Text-to-Image Editing. 9472-9480 - Peiang Zhao
, Han Li
, Ruiyang Jin
, S. Kevin Zhou
:
LoCo: Training-Free Layout-to-Image Synthesis with Localized Constraints. 9481-9490 - Xinhao Cai
, Minghang Zheng
, Xin Jin
, Yang Liu
:
InteractMove: Text-Controlled Human-Object Interaction Generation in 3D Scenes with Movable Objects. 9491-9499 - Zhu Xu
, Zhaowen Wang
, Yuxin Peng
, Yang Liu
:
Interact-Custom: Customized Human Object Interaction Image Generation. 9500-9508 - Jiahui Zhang
, Mengtian Li
, Jiewei Tang
, Junyu Deng
, Siyu Tian
, Xiang Liu
, Meng Zhang
, Guangnan Ye
, Yu-Gang Jiang
:
EditMaster: Bridging Text instruction and Visual Example for Multimodal guided Image Editing. 9509-9518 - Zhenyu Xu
, Junjie Wu
, Zhiyan Piao
, Xiaoqi Sheng
, Yu Xiao
, Xinyu Zhang
:
AnyStyleDiffusion: Flexible Style Transfer with Consistent Content Adaptation Across Diffusion Models. 9519-9528 - Wenhui Song
, Hanhui Li
, Jiehui Huang
, Panwen Hu
, Yuhao Cheng
, Long Chen
, Yiqiang Yan
, Xiaodan Liang
:
LaVieID: Local Autoregressive Diffusion Transformers for Identity-Preserving Video Creation. 9529-9538 - Kunsheng Ma
, Fan Qi
, Changsheng Xu
:
Granular Music Attribute Transformation with Proximal Policy Optimization Adapters for Diffusion Model. 9539-9548 - Luyang Cao
, Han Xu
, Jian Zhang
, Lei Qi
, Jiayi Ma
, Yinghuan Shi
, Yang Gao
:
Towards Perfection: Building Inter-component Mutual Correction for Retinex-based Low-light Image Enhancement. 9549-9558 - Yingzhen Zhang
, Jimin Dai
, Qianliang Wu
, Jian Yang
, Lei Luo
:
DCNOT: Diffusion-Cascaded Neural Optimal Transport for Scalable Multi-Domain Image-to-Image Translation. 9559-9568 - Yi Han
, Yaochen Li
, Peijun Chen
, Wenlong Zhou
, Jinhuo Yang
, Jintao Chang
:
SVDGNet: Shapley Value-Based Weight Adjustment for Unsupervised Image Style Transfer. 9569-9577 - Chuhang Ma
, Shuai Tan
, Junjie Wei
, Ye Pan
:
GOES: 3D Gaussian-based One-shot Head Animation with Any Emotion and Any Style. 9578-9587 - Junxiang Qiu
, Shuo Wang
, Jinda Lu
, Lin Liu
, Houcheng Jiang
, Xingyu Zhu
, Yanbin Hao
:
Accelerating Diffusion Transformer via Error-Optimized Cache. 9588-9597 - Ruixiang Jiang
, Chang Wen Chen
:
DiffArtist: Towards Structure and Appearance Controllable Image Stylization. 9598-9607 - Feiyu Wang
, Zhiyuan Zhao
, Yuandong Liu
, Da Zhang
, Junyu Gao
, Hao Sun
, Xuelong Li
:
SVGen: Interpretable Vector Graphics Generation with Large Language Models. 9608-9617 - Yu Zhang
, Wenxiang Guo
, Changhao Pan
, Zhiyuan Zhu
, Tao Jin
, Zhou Zhao
:
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting. 9618-9627 - Le Han
, Kaixuan Chen
, Minchen Ye
, Nenggan Zheng
:
Hi-Motion: Hierarchical Intention Guided Conditional Motion Synthesis. 9628-9637 - Zhimin Zhang
, Bi'an Du
, Caoyuan Ma
, Zheng Wang
, Wei Hu
:
Behave Your Motion: Habit-preserved Cross-category Animal Motion Transfer. 9638-9647 - Qixun Zeng
:
Retrieval Augmented 3D Garment Generation from Single Image. 9648-9656 - Fan Qi
, Zhan Wang
, Changsheng Xu
, Huaiwen Zhang
:
Fine-tuning Bias Neurons for Fair Text-to-Image Generation. 9657-9666 - Yiyan Xu
, Wuqiang Zheng
, Wenjie Wang
, Fengbin Zhu
, Xinting Hu
, Yang Zhang
, Fuli Feng
, Tat-Seng Chua
:
DRC: Enhancing Personalized Image Generation via Disentangled Representation Composition. 9667-9676 - Zihang Zhang
, Shoulong Zhang
, Yan Wang
, Shuai Li
:
Reactffusion: Physical Contact-guided Diffusion Model for Reaction Generation. 9677-9685 - Hyebin Cho
, Jaehyup Lee
:
Uncertainty-Guided Face Matting for Occlusion-Aware Face Transformation. 9686-9694 - Yufan Hu
, Kunlin Yang
, Junyu Gao
, Bin Fan
, Hongmin Liu
:
Learning Evidential Delta Denoising Scores for Video Editing. 9695-9703 - Tianqi Li
, Ruobing Zheng
, Minghui Yang
, Jingdong Chen
, Ming Yang
:
Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis. 9704-9713 - Yiming Wu
, Zhenghao Chen
, Huan Wang
, Dong Xu
:
Individual Content and Motion Dynamics Preserved Pruning for Video Diffusion Models. 9714-9723 - Zezhou Chen
, Ping Chen
, Huan Hu
, Xiang Liu
, Zipeng Wang
, Zhaoxiang Liu
, Kai Wang
, Shiguo Lian
:
CP3: Customizable 3D Pop-Out Effect Creation for Immersive Content Using Multimodal Models. 9724-9732 - Sizhe Zhao
, Chenyang Wang
, Weiyu Zhao
, Zonglin Li
, Ming Li
, Shengping Zhang
:
REA-Listener: Real-Time Listening Head Generation with Dynamic Emotion Modeling and Flexible Modality Adaptation. 9733-9742 - Zihao Liu
, Mingwen Ou
, Zunnan Xu
, Jiaqi Huang
, Haonan Han
, Ronghui Li
, Xiu Li
:
Separate to Collaborate: Dual-Stream Diffusion Model for Coordinated Piano Hand Motion Synthesis. 9743-9752 - Xiaofan Li
, Chenming Wu
, Zhao Yang
, Zhihao Xu
, Yumeng Zhang
, Dingkang Liang
, Ji Wan
, Jun Wang
:
DriVerse: Navigation World Model for Driving Simulation via Multimodal Trajectory Prompting and Motion Alignment. 9753-9762 - Haiyang Zhou
, Wangbo Yu
, Jiawen Guan
, Xinhua Cheng
, Yonghong Tian
, Li Yuan
:
HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation. 9763-9772 - Hyunwoo Oh
, SeungJu Cha
, Kwanyoung Lee
, Si-Woo Kim
, Dong-Jin Kim
:
CatchPhrase: EXPrompt-Guided Encoder Adaptation for Audio-to-Image Generation. 9773-9782 - Wenqi Dong
, Bangbang Yang
, Zesong Yang
, Yuan Li
, Tao Hu
, Hujun Bao
, Yuewen Ma
, Zhaopeng Cui
:
HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation. 9783-9792 - Ruiyan Wang
, Zhengxue Cheng
, Zonghao Lin
, Jun Ling
, Yuzhou Liu
, Yanru An
, Rong Xie
, Li Song
:
SemanticGarment: Semantic-Controlled Generation and Editing of 3D Gaussian Garments. 9793-9802 - Pinxin Liu
, Pengfei Zhang
, Hyeongwoo Kim
, Pablo Garrido
, Ari Shapiro
, Kyle Olszewski
:
Contextual Gesture: Co-Speech Gesture Video Generation through Context-aware Gesture Representation. 9803-9812 - Xingbo Yao
, Xuanmin Wang
, Hui Xiong
:
CitySculpt: 3D City Generation from Satellite Imagery with UV Diffusion. 9813-9821 - Xiufeng Huang
, Ka Chun Cheung
, Runmin Cong
, Simon See
, Renjie Wan
:
Stereo-GS: Multi-View Stereo Vision Model for Generalizable 3D Gaussian Splatting Reconstruction. 9822-9831 - Zeren Xiong
, Zikun Chen
, Zedong Zhang
, Xiang Li
, Ying Tai
, Jian Yang
, Jun Li
:
Category-Aware 3D Object Composition with Disentangled Texture and Shape Multi-view Diffusion. 9832-9841 - Guoqiang Liang
, Qingnan Fan
, Bingtao Fu
, Jinwei Chen
, Hong Gu
, Lin Wang
:
AuthFace: Towards Authentic Blind Face Restoration with Face-oriented Generative Diffusion Prior. 9842-9851 - Wenshuo Chen
, Kuimou Yu
, Haozhe Jia
, Kaishen Yuan
, Zexu Huang
, Bowen Tian
, Songning Lai
, Hongru Xiao
, Erhang Zhang
, Lei Wang
, Yutao Yue
:
ANT: Adaptive Neural Temporal-Aware Text-to-Motion Model. 9852-9861 - Aravindan Kamatchi Sundaram
, Ujjayan Pal
, Abhimanyu Chauhan
, Aishwarya Agarwal
, Srikrishna Karanam
:
CoCoNO: Attention Contrast-and-Complete for Initial Noise Optimization in Text-to-Image Synthesis. 9862-9870 - Yuxuan Jiang
, Zehua Chen
, Zeqian Ju
, Chang Li
, Weibei Dou
, Jun Zhu
:
FreeAudio: Training-Free Timing Planning for Controllable Long-Form Text-to-Audio Generation. 9871-9880 - Bo Gao
, Jianhui Wang
, Xinyuan Song
, Yangfan He
, Fangxu Xing
, Tianyu Shi
:
Free-Mask: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing. 9881-9890 - Mengchao Wang
, Qiang Wang
, Fan Jiang
, Yaqi Fan
, Yunpeng Zhang
, Yonggang Qi
, Kun Zhao
, Mu Xu
:
FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis. 9891-9900 - Zejian Li
, Yize Li
, Chenye Meng
, Zhongni Liu
, Ling Yang
, Shengyuan Zhang
, Guang Yang
, Changyuan Yang
, Zhiyuan Yang
, Lingyun Sun
:
Inversion-DPO: Precise and Efficient Post-Training for Diffusion Models. 9901-9910 - Hongxing Fan
, Lipeng Wang
, Haohua Chen
, Zehuan Huang
, Jiangtao Wu
, Lu Sheng
:
Multi-Agent Amodal Completion: Direct Synthesis with Fine-Grained Semantic Guidance. 9911-9919 - Haoxiang Cao
, Chaoqun Wang
, Yongwen Lai
, Shaobo Min
, Xuejin Chen
:
CausalCtrl: Causality-Aware Control Framework for Text-Guided Visual Editing. 9920-9929 - Yifeng Huang
, Zhang Chen
, Yi Xu
, Minh Hoai
, Zhong Li
:
DualMat: PBR Material Estimation via Coherent Dual-Path Diffusion. 9930-9939 - Amruta Muthal
, Varghese P. Kuruvilla
, Ravi Kiran Sarvadevabhatla
:
PLATO: Generating Objects from Part Lists via Synthesized Layouts. 9940-9949 - Haofan Zhang
, Shangfei Wang
:
EmIT: Emotional Interaction control in Text-to-image diffusion models. 9950-9958 - Yechao Xu
, Zhengxing Sun
, Qian Li
, Yunhan Sun
:
Text Prompted Spatiotemporal Sequence Prediction with Text-Vision Prompt Refiner and Masked Diffusion Transformers. 9959-9968 - Na Li
, Zihao Li
, Zuoli Tang
, Yuqing Yu
, Lixin Zou
, Chenliang Li
:
Bridging the Gap: Consistent Image Outpainting via Training-Free Noise Optimization. 9969-9977 - Zhibing Zhang
, Jiantao Lin
, Cangqi Zhou
, Rui Xia
:
MPPR: Memory-Prior-based Prompt Refinement in Continuous Space for Advanced Text-to-Image Generation. 9978-9986 - Weipeng Tan
, Chuming Lin
, Chengming Xu
, FeiFan Xu
, Xiaobin Hu
, Xiaozhong Ji
, Junwei Zhu
, Chengjie Wang
, Yanwei Fu
:
Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation. 9987-9995 - Junwen He
, Yifan Wang
, Lijun Wang
, Huchuan Lu
, Chenyang Li
, Hanyuan Chen
, Jin-Peng Lan
, Jun-Yan He
, Bin Luo
, Yifeng Geng
:
GLDesigner: Leveraging Multi-Modal LLMs as Designer for Enhanced Aesthetic Text Glyph Layouts. 9996-10005 - Shuang Hao
, Pengfei Ren
, Lei Zhang
, Haifeng Sun
, Pan Ting
, Menghao Zhang
, Cong Liu
, Qi Qi
, Jianxin Liao
, Jingyu Wang
:
A Dual-Branch 3D Spatial-Aware Latent Diffusion for Realistic Depth Image Synthesis. 10006-10014 - Xin Li
, Kaixiang Yang
, Qiang Li
, Zhiwei Wang
:
Joint Holistic and Lesion Controllable Mammogram Synthesis via Gated Conditional Diffusion Model. 10015-10023 - Jiacheng Liu
, Chang Zou
, Yuanhuiyi Lyu
, Fei Ren
, Shaobo Wang
, Kaixin Li
, Linfeng Zhang
:
SpeCa: Accelerating Diffusion Transformers with Speculative Feature Caching. 10024-10033 - Qinwei Lin
, Xiaopeng Sun
, Yu Gao
, Yujie Zhong
, Zheng Zhao
, Dengjie Li
, Haoqian Wang
:
TASR: Timestep-Aware Diffusion Model for Image Super-Resolution. 10034-10043 - Seung Young Noh
, Ju Yong Chang
:
Stable Diffusion-Based Approach for Human De-Occlusion. 10044-10053 - Ting Xiang
, Changjian Chen
, Zhuo Tang
, Qifeng Zhang
, Fei Lyu
, Li Yang
, Jiapeng Zhang
, Kenli Li
:
Enhancing Small-Scale Dataset Expansion with Triplet-Connection-based Sample Re-Weighting. 10054-10063 - Xinlong Zhang
, Zejian Li
, Wei Li
, Xiaoyu Zhang
, Jia Wei
, Chengyu Lin
, Yongchuan Tang
:
ObjCtrl: Object-based Control Relaxation for Conditional Text-to-Image Generation. 10064-10073 - Sung-Lin Tsai
, Bo-Lun Huang
, Yu-Ting Shen
, Cheng-Yu Yeo
, Chiang Tseng
, Bo-Kai Ruan
, Wen-Sheng Lien
, Hong-Han Shuai
:
Color Me Correctly: Bridging Perceptual Color Spaces and Text Embeddings for Improved Diffusion Generation. 10074-10082 - Huadai Liu
, Jialei Wang
, Xiangtai Li
, Wen Wang
, Qian Chen
, Rongjie Huang
, Yang Liu
, Jiayang Xu
, Zhou Zhao
, Wei Xue
:
MelodyEdit: Zero-shot Music Editing with Disentangled Inversion Control. 10083-10092 - Yue Ling
, Dong Zhao
, Kaikai Deng
, Kangwen Yin
, Zixiao He
, Yizong Wang
, Huadong Ma
:
Venus: Generating Large-scale mmWave Radar Data via Few 2D Videos for Gesture Recognition While Lying Down. 10093-10102 - Liu Yang
, Huiyu Duan
, Yucheng Zhu
, Xiaohong Liu
, Lu Liu
, Zitong Xu
, Guangji Ma
, Xiongkuo Min
, Guangtao Zhai
, Patrick Le Callet
:
Omni2: Unifying Omnidirectional Image Generation and Editing in an Omni Model. 10103-10112 - Gang Pan
, Meihua Liu
, Lei Zhou
, Jiahao Wang
, Di Sun
:
Image Retargeting based on Text Region Awareness. 10113-10121 - Ruiqi Li
, Yiu-ming Cheung
:
Modeling and Identifying Distractors with Curriculum for Robust 3D Gaussian Splatting. 10122-10131 - Yitong Yang
, Yinglin Wang
, Tian Zhang
, Jing Wang
, Shuting He
:
Prompt-Softbox-Prompt: A Free-Text Embedding Control for Image Editing. 10132-10141 - Ruohao Zhan
, Yijin Li
, Yisheng He
, Shuo Chen
, Yichen Shen
, Xinyu Chen
, Zilong Dong
, Zhaoyang Huang
, Guofeng Zhang
:
CoProSketch: Controllable and Progressive Sketch Generation with Diffusion Model. 10142-10151 - Bowen Tian
, Wenshuo Chen
, Zexi Li
, Songning Lai
, Jiemin Wu
, Yutao Yue
:
Text2Weight: Bridging Natural Language and Neural Network Weight Spaces. 10152-10160 - Muhammad Ali Farooq
, Waseem Shariff
, Peter Corcoran
:
ThermVision: Exploring FLUX for Synthesizing Hyper-Realistic Thermal Face Data and Animations via Image to Video Translation. 10161-10170 - Jianhong Bai
, Tianyu He
, Yuchi Wang
, Junliang Guo
, Haoji Hu
, Zuozhu Liu
, Jiang Bian
:
UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing. 10171-10180 - Zhixin Zheng
, Xinyu Wang
, Chang Zou
, Shaobo Wang
, Linfeng Zhang
:
Compute Only 16 Tokens in One Timestep: Accelerating Diffusion Transformers with Cluster-Driven Feature Caching. 10181-10189 - Zeyang Bai
, Yunbiao Wang
, Dongbo Yu
, Jun Xiao
, Lupeng Liu
:
GraphSplat: Sparse-View Generalizable 3D Gaussian Splatting is Worth Graph of Nodes. 10190-10199 - Jiahao Song
, Yuzhao Wang
:
MusFlow: Multimodal Music Generation via Conditional Flow Matching. 10200-10209 - Yizhe Yuan
, Bingsen Xue
, Bangzheng Pu
, Chengxiang Wang
, Cheng Jin
:
PRINTER: Deformation-Aware Adversarial Learning for Virtual IHC Staining with In Situ Fidelity. 10210-10219 - He Wang
, Longquan Dai
, Shihao Pu
, Shaomeng Wang
, Jinhui Tang
:
Generative Semantic Probing for Vision-Language Models via Hierarchical Feature Optimization. 10220-10228 - Hui Wang
, Shujie Liu
, Lingwei Meng
, Jinyu Li
, Yifan Yang
, Shiwan Zhao
, Haiyang Sun
, Yanqing Liu
, Haoqin Sun
, Jiaming Zhou
, Yan Lu
, Yong Qin
:
FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching. 10229-10238 - Shikun Sun
, Chengrui Wang
, Min Zhou
, Zixuan Wang
, Xiaoyu Qin
, Tiezheng Ge
, Bo Zheng
, Jia Jia
:
DEPO: Enhancing E-commerce Image Background Generation with Short Trajectory Direct Expected Preference Optimization. 10239-10247 - Yifan Hu
, Rui Liu
, Yi Ren
, Xiang Yin
, Haizhou Li
:
UniTalker: Conversational Speech-Visual Synthesis. 10248-10257 - Biao Dong
, Lei Zhang
:
Talking Head Generation via Viewpoint and Lighting Simulation Based on Global Representation. 10258-10267 - Yike Liu
, Jianhui Zhang
, Haipeng Li
, Shuaicheng Liu
, Bing Zeng
:
Coding-Prior Guided Diffusion Network for Video Deblurring. 10268-10277 - Weitao You
, Heda Zuo
, Junxian Wu
, Dengming Zhang
, Zhibin Zhou
, Lingyun Sun
:
Spatial-Temporal Decomposition and Alignment in Controllable Video-to-Music Generation. 10278-10286 - Qianru Qiu
, Jiafeng Mao
, Xueting Wang
:
Exploring Palette based Color Guidance in Diffusion Models. 10287-10295 - Na Zhang
, Moran Li
, Chengming Xu
, Han Feng
, Xiaobin Hu
, Jiangning Zhang
, Weijian Cao
, Chengjie Wang
, Yanwei Fu
:
StrandDesigner: Towards Practical Strand Generation with Sketch Guidance. 10296-10304 - Yukang Lin
, Yan Hong
, Zunnan Xu
, Xindi Li
, Chao Xu
, Chuanbiao Song
, Ronghui Li
, Haoxing Chen
, Jun Lan
, Huijia Zhu
, Weiqiang Wang
, Jianfu Zhang
, Xiu Li
:
InterAnimate: Taming Region-Aware Diffusion Model for Realistic Human Interaction Animation. 10305-10314 - Wenkang Han
, Wang Lin
, Yiyun Zhou
, Qi Liu
, Shulei Wang
, Chang Yao
, Jingyuan Chen
:
Show and Polish: Reference-Guided Identity Preservation in Face Video Restoration. 10315-10324 - Longquan Dai
, He Wang
, Xiaolu Wei
, Shaomeng Wang
, Jinhui Tang
:
Conducting Conditional Diffusion by Estimating the Mean Vector of von Mises-Fisher Distribution. 10325-10333 - Bin Wang
, Yang Xu
, Huan Zhao
, Hao Zhang
, Zixing Zhang
:
PTalker: Personalized Speech-Driven 3D Talking Head Animation via Style Disentanglement and Modality Alignment. 10334-10342 - Nian Liu
, Zilong Zhang
, Zi Wang
, Tengyu Liu
, Hongzhao Xie
, Xinyi Tong
, Libin Liu
, Yaodong Yang
, Zhaofeng He
:
Learning Uniformly Distributed Embedding Clusters of Stylistic Skills for Physically Simulated Characters. 10343-10351 - Tongfei Liu
, Yufan Liu
, Bing Li
, Weiming Hu
, Yuming Li
, Chenguang Ma
:
Noise-Optimized Distribution Distillation for Dataset Condensation. 10352-10360 - Yuhong Zhang
, Han Wang
, Yiwen Wang
, Rong Xie
, Li Song
:
FreeInsert: Personalized Object Insertion with Geometric and Style Control. 10361-10369 - Meng Yu
, Kun Zhan
:
Frequency Regulation for Exposure Bias Mitigation in Diffusion Models. 10370-10378 - Zihou Liu
, Dongming Zhang
, Jing Zhang
, Jun Li
, Yongdong Zhang
:
RealText: Realistic Text Image Generation based on Glyph and Scene Aware Inpainting. 10379-10387 - Zhihang Yuan
, Siyuan Wang
, Yuzhang Shang
, Hanling Zhang
, Tongcheng Fang
, Rui Xie
, Shengen Yan
, Guohao Dai
, Yu Wang
:
DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation. 10388-10397 - Jiajing Lin
, Zhenzhong Wang
, Dejun Xu
, Shu Jiang
, Yunpeng Gong
, Min Jiang
:
Phys4DGen: Physics-Compliant 4D Generation with Multi-Material Composition Perception. 10398-10407 - Zichao Yu
, Zhen Zou
, Guojiang Shao
, Chenwei Zhang
, Shengze Xu
, Jie Huang
, Feng Zhao
, Xiaodong Cun
, Wenyi Zhang
:
AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature Reuse. 10408-10417 - Leyuan Liu
, Shen Chen
, Jingying Chen
:
HumanPrinter: Reconstructing 3D Human from a Single Image Like a 3D Printer. 10418-10426 - Junxian Wu
, Weitao You
, Heda Zuo
, Dengming Zhang
, Pei Chen
, Lingyun Sun
:
Controllable Video-to-Music Generation with Multiple Time-Varying Conditions. 10427-10436 - Lingzhou Mu
, Baiji Liu
, Ruonan Zhang
, Guiming Mo
, Jiawei Jin
, Kai Zhang
, Haozhi Huang
:
FLAP: Fully-controllable Audio-driven Portrait Video Generation through 3D head conditioned diffusion model. 10437-10446 - Jinming Zhang
, Yunlian Sun
, Hongwen Zhang
, Jinhui Tang
:
EDMG: Towards Efficient Long Dance Motion Generation with Fundamental Movements from Dance Genres. 10447-10456 - Jiawen Wang
, Jianjun Li
, Zhiyuan Ma
, Ruixia Bai
:
SAKR-Edit: Scene-Aware Knowledge Reasoning for Text-to-Image Editing. 10457-10466 - Yuwei Zhou
, Xin Wang
, Hong Chen
, Yipeng Zhang
, Zeyang Zhang
, Wenwu Zhu
:
ModuleTeam: Open-Set Multi-Conditional Image Generation with Training-Free Latent Mixture of Any Control Module. 10467-10475 - Kien T. Pham
, Yingqing He
, Yazhou Xing
, Qifeng Chen
, Long Chen
:
SpA2V: Harnessing Spatial Auditory Cues for Audio-driven Spatially-aware Video Generation. 10476-10485 - Kai Wang
, Shijian Deng
, Jing Shi
, Dimitrios Hatzinakos
, Yapeng Tian
:
AV-DiT: Taming Image Diffusion Transformers for Efficient Joint Audio and Video Generation. 10486-10495 - Yuying Shang
, Xinyi Zeng
, Yutao Zhu
, Xiao Yang
, Zhengwei Fang
, Jingyuan Zhang
, Jiawei Chen
, Zinan Liu
, Yu Tian
:
From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models. 10496-10505 - Wangzheng Shi
, Yinglin Zheng
, Yuxin Lin
, Jianmin Bao
, Ming Zeng
, Dong Chen
:
HairShifter: Consistent and High-Fidelity Video Hair Transfer via Anchor-Guided Animation. 10506-10515 - YoungChan Choi
, HengFei Wang
, YiHua Cheng
, Boeun Kim
, Hyung Jin Chang
, Younggeun Choi
, Sang-Il Choi
:
Roll Your Eyes: Gaze Redirection via Explicit 3D Eyeball Rotation. 10516-10524 - Ruipeng Wang
, Junfeng Fang
, Jiaqi Li
, Hao Chen
, Jie Shi
, Kun Wang
, Xiang Wang
:
ACE: Concept Editing in Diffusion Models without Performance Degradation. 10525-10534 - Matteo Trippodo
, Federico Becattini
, Lorenzo Seidenari
:
Immunizing Images from Text to Image Editing via Adversarial Cross-Attention. 10535-10543 - Hohyun Na
, Seunghoo Hong
, Simon S. Woo
:
PromptFlare: Prompt-Generalized Defense via Cross-Attention Decoy in Diffusion-Based Inpainting. 10544-10553 - Atharva Mehta
, Shivam Chauhan
, Monojit Choudhury
:
Exploring Adapter Design Tradeoffs for Low Resource Music Generation. 10554-10562 - Gangjian Zhang
, Jian Shu
, Nanjie Yao
, Hao Wang
:
SAT: Supervisor Regularization and Animation Augmentation for Two-process Monocular Texture 3D Human Reconstruction. 10563-10572 - Feiwei Qin
, Shichao Lu
, Junhao Hou
, Changmiao Wang
, Meie Fang
, Ligang Liu
:
Drawing2CAD: Sequence-to-Sequence Learning for CAD Generation from Vector Drawings. 10573-10582 - Yuqi Chen
, Xiubo Liang
, Yu Zhao
, Hongzhi Wang
, Weidong Geng
:
S2-Edit3DV: Diffusion-Guided Style Meets Structure for Consistent Multi-View 3D Video Generation. 10583-10592 - Gwon-Jung Kim
, Du Yeol Lee
, Jae Hong Yang
, Chae-Eun Rhee
:
See Through the Occlusions: Few-Shot Gaussian Splatting with Layered Amodal Supervision. 10593-10601 - Pengyu Long
, Zijun Zhao
, Min Ouyang
, Qingcheng Zhao
, Wei Yang
, Lan Xu
, Jingyi Yu
:
Generating 3D Hair Strands from Images with Diverse Styles and Viewpoints. 10602-10611 - Jiajun Zhang
, Xin Li
, Si Wu
, Yong Xu
, Yaowei Wang
:
Prior-Free Augmentation for Cloth-Changing Person Re-Identification. 10612-10621 - Xueyun Tian
, Wei Li
, Bingbing Xu
, Yige Yuan
, Yuanzhuo Wang
, Huawei Shen
:
MIGE: Mutually Enhanced Multimodal Instruction-Based Image Generation and Editing. 10622-10631 - Wenrui Liu
, Qian Chen
, Wen Wang
, Guanrou Yang
, Weiqin Li
, Minghui Fang
, Jialong Zuo
, Xiaoda Yang
, Tao Jin
, Jin Xu
, Zemin Liu
, Yafeng Chen
, Jionghao Bai
, Zhifang Guo
:
Speech Token Prediction via Compressed-to-fine Language Modeling for Speech Generation. 10632-10641 - Dongdong Hu
, Yang Zhou
, Xiaofeng Huang
, Haibing Yin
, Zhu Li
:
Sparse4DGS: Flow-Geometry Assisted 4D Gaussian Splatting for Dynamic Sparse View Synthesis. 10642-10651 - Yanming Chen
, Zixin Ma
, Chuanguang Yang
, Zhulin An
, Yiwen Zhang
:
Accelerating Diffusion Models via Parallel Denoising. 10652-10661 - Qifan Fu
, Xu Chen
, Muhammad Asad
, Shanxin Yuan
, Changjae Oh
, Gregory G. Slabaugh
:
Robust Photo-Realistic Hand Gesture Generation: from Single View to Multiple View. 10662-10670 - Wenjie Tian
, Xinfa Zhu
, Haohe Liu
, Zhixian Zhao
, Zihao Chen
, Chaofan Ding
, Xinhan Di
, Junjie Zheng
, Lei Xie
:
DualDub: Video-to-Soundtrack Generation via Joint Speech and Background Audio Synthesis. 10671-10680 - Yongjie Hu
, Yifan Jiang
, Ziyun Li
, Fei Gao
, Henrik Boström
, Nannan Wang
:
CADQ: Attribute-Consistent Face Cartoonization with Cross-modal Aligned and Deformable Quantization. 10681-10689 - Fangli Ying
, Zhihong Zhang
, Liting Zhou
, Cathal Gurrin
, Jinhai Wang
:
Identity-Preserving Facial Aesthetic Enhancement via Hierarchical Prompt Learning and Pivotal Tuning. 10690-10698 - Ilya Borovik
, Dmitrii Gavrilev
, Vladimir Viro
:
SyMuPe: Affective and Controllable Symbolic Music Performance. 10699-10708 - Zongsheng Cao
, Yangfan He
, Anran Liu
, Jun Xie
, Zhepeng Wang
, Feng Chen
:
CoFi-Dec: Hallucination-Resistant Decoding via Coarse-to-Fine Generative Feedback in Large Vision-Language Models. 10709-10718 - Doga Yilmaz
, He Wang
, Towaki Takikawa
, Duygu Ceylan
, Kaan Aksit
:
Learned Single-Pass Multitasking Perceptual Graphics for Immersive Displays. 10719-10727 - Zihao Zhang
, Xingjiao Wu
, Junjie Xu
, Tianlong Ma
, Tangren Yao
, Wen Wu
, Liang He
:
Temporal-Conditioned Symbolic Alignment for Controllable Text-to-Music Generation. 10728-10737 - Yuntian Xiao
, Shoulong Zhang
, Zihang Zhang
, Jiahao Cui
, Yan Wang
, Shuai Li
:
Phys4DRT: Physics-based 4D Generation for Real-Time Interaction with Time-Frequency Supervision. 10738-10747 - Guanrou Yang
, Chen Yang
, Qian Chen
, Ziyang Ma
, Wenxi Chen
, Wen Wang
, Tianrui Wang
, Yifan Yang
, Zhikang Niu
, Wenrui Liu
, Fan Yu
, Zhihao Du
, Zhifu Gao
, Shiliang Zhang
, Xie Chen
:
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting. 10748-10757 - Jeongsoo Choi
, Ji-Hoon Kim
, Sung-Bin Kim
, Tae-Hyun Oh
, Joon Son Chung
:
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation. 10758-10767 - Hongjie Wu
, Mingqin Zhang
, Linchao He
, Ji-Zhe Zhou
, Jiancheng Lv
:
Enhancing Diffusion Model Stability for Image Restoration via Gradient Management. 10768-10777 - Wenming Wu
, Tianlei Sheng
, Gaofeng Zhang
, Liping Zheng
:
FloorplanSBS: Synthesizing Vector Floorplans by Patch-Based Floorplan Segmentation. 10778-10786
Generative AI: Multimedia Foundation Models
- Che Liu
, Yingji Zhang
, Dong Zhang
, Weijie Zhang
, Chenggong Gong
, Yu Lu
, Shilin Zhou
, Ziliang Gan
, Ziao Wang
, Haipang Wu
, Ji Liu
, André Freitas
, Qifan Wang
, Zenglin Xu
, Rongjunchen Zhang
, Yong Dai
:
NEXUS-O: An Omni-Perceptive and -Interactive Model for Language, Audio, and Vision. 10787-10796 - Renyang Liu
, Ziyu Lyu
, Wei Zhou
, See-Kiong Ng
:
D-Judge: How Far Are We? Assessing the Discrepancies Between AI-synthesized and Natural Images through Multimodal Guidance. 10797-10806 - Linli Yao
, Yicheng Li
, Yuancheng Wei
, Lei Li
, Shuhuai Ren
, Yuanxin Liu
, Kun Ouyang
, Lean Wang
, Shicheng Li
, Sida Li
, Lingpeng Kong
, Qi Liu
, Yuanxing Zhang
, Xu Sun
:
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos. 10807-10816 - Zhipeng Liu
, Peibo Duan
, Binwu Wang
, Xuan Tang
, Qi Chu
, Changsheng Zhang
, Yongsheng Huang
, Bin Zhang
:
DisMS-TS: Eliminating Redundant Multi-scale Features for Time Series Classification. 10817-10826 - Xiangyue Zhang
, Jianfang Li
, Jiaxu Zhang
, Jianqiang Ren
, Liefeng Bo
, Zhigang Tu
:
EchoMask: Speech-Queried Attention-based Mask Modeling for Holistic Co-Speech Motion Generation. 10827-10836 - Wei Miao
, Jiangrong Shen
, Hongming Xu
, Tommi Kärkkäinen
, Qi Xu
, Yi Xu
, Fengyu Cong
:
Advanced SpikingYOLOX: Extending Spiking Neural Network on Object Detection with Spike-based Partial Self-Attention and 2D-Spiking Transformer. 10837-10846 - Yihang Liu
, Ying Wen
, Longzhen Yang
, Lianghua He
, Heng Tao Shen
:
RadLAS: A Foundation Model for Interpretable Radiography Image Analysis with Lesion-Aware Self-Supervised Pre-training. 10847-10856 - Qiang Chen
, Zhongze Wu
, Ang He
, Xi Lin
, Shuo Jiang
, Shan You
, Chang Xu
, Yi Chen
, Xiu Su
:
Graph Unlearning Meets Influence-aware Negative Preference Optimization. 10857-10866 - Junwei Zhao
, Qianchun Luo
, Shiliang Zhang
, Shen Gao
, Jie Wu
:
HDCFN: Haze Distribution-aware Cross-modal Fusion Network for Infrared-guided Dense Haze Removal in UAVs. 10867-10875 - Yang Li
, Xinyi Zeng
, Zhe Xue
, Pinxian Zeng
, Zikai Zhang
, Yan Wang
:
Incorporating the Refractory Period into Spiking Neural Networks through Spike-Triggered Threshold Dynamics. 10876-10885 - Chunshi Wang
, Hongxing Li
, Yawei Luo
:
SonicGauss: Position-Aware Physical Sound Synthesis for 3D Gaussian Representations. 10886-10895 - Wenyu Yin
, Shuyuan Lin
, David Suter
, Hanzi Wang
:
Adaptive Graph Attention-Guided Parallel Sampling and Embedded Selection for Multi-Model Fitting. 10896-10904 - Yaowen Hu
, Wenxuan Tu
, Yue Liu
, Miaomiao Li
, Wenpeng Lu
, Zhigang Luo
, Xinwang Liu
, Ping Chen
:
Divide-Then-Rule: A Cluster-Driven Hierarchical Interpolator for Attribute-Missing Graphs. 10905-10914 - Chenxi Li
, Weijie Wang
, Qiang Li
, Nicu Sebe
, Bruno Lepri
, Weizhi Nie
:
FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors. 10915-10924 - Jiaqi Xu
, Kunzhe Huang
, Xinyi Zou
, Yunkuo Chen
, Bo Liu
, Mengli Cheng
, Jun Huang
, Xing Shi
:
EasyAnimate: High-Performance Video Generation Framework with Hybrid Windows Attention and Reward Backpropagation. 10925-10934 - Wencan Huang
, Daizong Liu
, Wei Hu
:
Fast3D: Accelerating 3D Multi-modal Large Language Models for Efficient 3D Scene Understanding. 10935-10944 - Weibin Wu
, Zitong Wang
, Zhengjie Luo
, Wenqing Chen
, Zibin Zheng
:
Detecting Violations of Physical Common Sense in Images: A Challenge Dataset and Effective Model. 10945-10954 - Le Wang
, Zonghao Ying
, Tianyuan Zhang
, Siyuan Liang
, Shengshan Hu
, Mingchuan Zhang
, Aishan Liu
, Xianglong Liu
:
Manipulating Multimodal Agents via Cross-Modal Prompt Injection. 10955-10964 - Shangqing Tu
, Yucheng Wang
, Daniel Zhang-Li
, Yushi Bai
, Jifan Yu
, Yuhao Wu
, Lei Hou
, Huiqin Liu
, Zhiyuan Liu
, Bin Xu
, Juanzi Li
:
LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models. 10965-10974 - Haoran Zhang
, Yong Liu
, Yunzhong Qiu
, Haixuan Liu
, Zhongyi Pei
, Jianmin Wang
, Mingsheng Long
:
TimesBERT: A BERT-Style Foundation Model for Time Series Understanding. 10975-10983 - Hongfei Xue
, Yufeng Tang
, Hexin Liu
, Jun Zhang
, Xuelong Geng
, Lei Xie
:
Enhancing Non-Core Language Instruction-Following in Speech LLMs via Semi-Implicit Cross-Lingual CoT Reasoning. 10984-10993 - Yang Shi
, Jiaheng Liu
, Yushuo Guan
, Zhenhua Wu
, Yuanxing Zhang
, Zihao Wang
, Weihong Lin
, Jingyun Hua
, Zekun Wang
, Xinlong Chen
, Bohan Zeng
, Wentao Zhang
, Fuzheng Zhang
, Wenjing Yang
, Di Zhang
:
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model. 10994-11003 - Hanqi Chen
, Zhongyin Zhao
, Ye Chen
, Zhujin Liang
, Bingbing Ni
:
SVGThinker: Instruction-Aligned and Reasoning-Driven Text-to-SVG Generation. 11004-11012 - Na Zhao
, Kejiang Chen
, Yuang Qi
, Kai Zeng
, Weiming Zhang
, Nenghai Yu
:
Merging-Resistant Watermarking for LoRA Modules. 11013-11021 - Kotaro Kikuchi
, Ukyo Honda
, Naoto Inoue
, Mayu Otani
, Edgar Simo-Serra
, Kota Yamaguchi
:
Multimodal Markup Document Models for Graphic Design Completion. 11022-11031 - Zhihan Zhang
, Yixin Cao
, Lizi Liao
:
Boosting Chart-to-Code Generation in MLLM via Dual Preference-Guided Refinement. 11032-11041 - Jiangrong Shen
, Yulin Xie
, Qi Xu
, Gang Pan
, Huajin Tang
, Badong Chen
:
Spiking Neural Networks with Temporal Attention-Guided Adaptive Fusion for Imbalanced Multi-modal Learning. 11042-11051 - Jingzhi Li
, Changjiang Luo
, Ruoyu Chen
, Hua Zhang
, Wenqi Ren
, Jianhou Gan
, Xiaochun Cao
:
FaceInsight: A Multimodal Large Language Model for Face Perception. 11052-11061 - Chang Gao
, Kang Zhao
, Runqi Wang
, Jianfei Chen
, Liping Jing
:
Maximum Redundancy Pruning: A Principle-Driven Layerwise Sparsity Allocation for LLMs. 11062-11070 - Baining Zhao
, Ziyou Wang
, Jianjie Fang
, Chen Gao
, Fanhang Man
, Jinqiang Cui
, Xin Wang
, Xinlei Chen
, Yong Li
, Wenwu Zhu
:
Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning. 11071-11080 - Wenyu Li
, Sidun Liu
, Peng Qiao
, Yong Dou
:
Mono3R: Exploiting Monocular Cues for Geometric 3D Reconstruction. 11081-11090 - Pengsheng Liu
, Zhaojie Chu
, Xiaofen Xing
, Xiangmin Xu
:
SemGesture: Synthesizing Semantically Enhanced and Coherent Gestures. 11091-11100 - Zongxin Liu
, Yishu Liu
, Guangming Lu
, Xiaoling Luo
, Bingzhi Chen
:
CauRDG: Enhancing Domain Generalization with Causal-Driven Semantic Consistency Reasoning. 11101-11110 - Jiahuan Cao
, Yang Liu
, Peirong Zhang
, Yongxin Shi
, Kai Ding
, Lianwen Jin
:
TongGu-VL: Advancing Visual-Language Understanding in Chinese Classical Studies through Parameter Sensitivity-Guided Instruction Tuning. 11111-11120 - Mingjie Wei
, Wei-Nan Zhang
, Chen Zhang
, Yifeng Ding
, Donglin Di
, Lei Ren
, Wei Chen
, Ting Liu
:
PRISM: A Benchmark for Unveiling Cross-modal Knowledge Inconsistency in Large Vision-Language Models. 11121-11129 - Pengfei Jiang
, Hanjun Li
, Linglan Zhao
, Fei Chao
, Ke Yan
, Shouhong Ding
, Rongrong Ji
:
VISA: Group-wise Visual Token Selection and Aggregation via Graph Summarization for Efficient MLLMs Inference. 11130-11139 - Zongxing Zhao
, Shenzhi Yang
, Xingkai Yao
, Yuying Wang
, Zhongqiu Chen
, Xiaofang Zhang
:
HGACLLM: Attribute Completion in Heterogeneous Graph with Integration of External Knowledge from Large Language Models. 11140-11149 - Yuhan Jing
, Bo He
, Haifeng Sun
, Qi Qi
, Zirui Zhuang
, Lei Zhang
, Jianxin Liao
, Jingyu Wang
:
Foresail: LLM Sensor Knowledge Empowered Status-guided Network for Multivariate Time-series Classification. 11150-11159 - Wanying Zhou
, Yuqi Sun
, Yu Ling
, Zhen Xing
, Chenxi Ma
, Weimin Tan
, Bo Yan
:
TabiMed: Tabularizing Medical Images for Few-Shot In-Context Diagnosis. 11160-11169 - Hao Ye
, Mengshi Qi
, Zhaohong Liu
, Liang Liu
, Huadong Ma
:
SafeDriveRAG: Towards Safe Autonomous Driving with Knowledge Graph-based Retrieval-Augmented Generation. 11170-11178 - Duy-Cat Can
, Quang-Huy Tang
, Huong Ha
, Binh T. Nguyen
, Oliver Y. Chén
:
REMEMBER: Retrieval-based Explainable Multimodal Evidence-guided Modeling for Brain Evaluation and Reasoning in Zero- and Few-shot Neurodegenerative Diagnosis. 11179-11188 - Yanyun Pu
, Kehan Li
, Zeyi Huang
, Zhijie Zhong
, Kaixiang Yang
:
MVQA-68K: A Multi-dimensional and Causally-annotated Dataset with Quality Interpretability for Video Assessment. 11189-11198 - Wanying Wang
, Zeyu Ma
, Han Zheng
, Xin Tan
, Mingang Chen
:
Self-Aware Safety Augmentation: Leveraging Internal Semantic Understanding to Enhance Safety in Vision-Language Models. 11199-11208
Generative AI: Social Aspects of Generative AI
- Hongyao Yu
, Yixiang Qiu
, Yiheng Yang
, Hao Fang
, Tianqu Zhuang
, Jiaxin Hong
, Bin Chen
, Hao Wu
, Shu-Tao Xia
:
ICAS: Detecting Training Data from Autoregressive Image Generative Models. 11209-11217 - Jiadong Pan
, Liang Li
, Hongcheng Gao
, Zheng-Jun Zha
, Qingming Huang
, Jiebo Luo
:
SafeCFG: Controlling Harmful Features with Dynamic Safe Guidance for Safe Generation. 11219-11228 - Jiachen Fu
, Chun-Le Guo
, Chongyi Li
:
DetectAnyLLM: Towards Generalizable and Robust Detection of Machine-Generated Text Across Domains and Models. 11229-11238 - Longjian Zeng
, Zunjie Zhu
, Rongfeng Lu
, Ming Lu
, Bolun Zheng
, Chenggang Yan
, Anke Xue
:
DepthDark: Robust Monocular Depth Estimation for Low-Light Environments. 11239-11248 - Liu Yu
, Jiajun Sun
, Ping Kuang
, Rui Zhou
, Fan Zhou
, Zhikun Feng
:
Bimodal Debiasing for Text-to-Image Diffusion: Adaptive Guidance in Textual and Visual Spaces. 11249-11258 - Yilin Lu
, Jianghang Lin
, Linhuang Xie
, Kai Zhao
, Yansong Qu
, Shengchuan Zhang
, Liujuan Cao
, Rongrong Ji
:
Generate Aligned Anomaly: Region-Guided Few-Shot Anomaly Image-Mask Pair Synthesis for Industrial Inspection. 11259-11268 - Yujiang Li
, Zhili Zhou
, Ruohan Meng
, Baowei Wang
, Xiaojuan Wang
, Cheng Qiao
, Jiantao Zhou
:
Zero Matrix guided Adaptive Image Vaccine against Diffusion Model-based Multitask. 11269-11278 - Song Yan
, Hui Wei
, Jinlong Fei
, Guoliang Yang
, Zhengyu Zhao
, Zheng Wang
:
Universally Unfiltered and Unseen: Input-Agnostic Multimodal Jailbreaks against Text-to-Image Model Safeguards. 11279-11287 - Fan Qi
, Ao Liu
, Zixin Zhang
, Changsheng Xu
:
FORGET ME: Federated Unlearning for Face Generation Models. 11288-11297 - Pengyu Zeng
, Jun Yin
, Haoyuan Sun
, Yuqin Dai
, Maowei Jiang
, Miao Zhang
, Shuai Lu
:
MRED-14: A Benchmark for Low-Energy Residential Floor Plan Generation with 14 Flexible Inputs. 11298-11307 - Jinjie Shen
, Yaxiong Wang
, Lechao Cheng
, Nan Pu
, Zhun Zhong
:
Beyond Artificial Misalignment: Detecting and Grounding Semantic-Coordinated Multimodal Manipulations. 11308-11317 - Jipeng Liu
, Haichao Shi
, Yaru Zhang
, Xiao-Yu Zhang
:
Knowledge Negative Distillation: Circumventing Overfitting to Unlock More Generalizable Deepfake Detection. 11318-11327 - Yizhou Lin
, Nisha Huang
, Kaer Huang
, Henglin Liu
, Yiqiang Yan
, Jie Guo
, Tong-Yee Lee
, Xiu Li
:
ICE: Intercede Concept Erasure in Text-to-Image Diffusion Models. 11328-11336 - Hongyu Zhu
, Sichu Liang
, Wenwen Wang
, Boheng Li
, Tongxin Yuan
, Fangqi Li
, Hanyi Wang
, Shi-Lin Wang
, Zhuosheng Zhang
:
Revisiting Data Auditing in Large Vision-Language Models. 11337-11346 - Shunchang Liu
, Zhuan Shi
, Lingjuan Lyu
, Yaochu Jin
, Boi Faltings
:
CopyJudge: Automated Copyright Infringement Identification and Mitigation in Text-to-Image Diffusion Models. 11347-11356 - Haichao Sha
, Yuncheng Wu
, Ruixuan Liu
, Yang Cao
, Hong Chen
:
Differentially Private Visual Learning with Public Subspace Augmented by Synthetic Data. 11357-11366 - Kai Li
, Wenqi Ren
, Wei Wang
, Linchao Zhang
, Xiaochun Cao
:
Detecting Synthetic Image by Cross-Modal Commonality Interaction. 11367-11375 - Naresh Kumar Devulapally
, Shruti Agarwal
, Tejas Gokhale
, Vishnu Suresh Lokhande
:
Latent Diffusion Unlearning: Protecting Against Unauthorized Personalization Through Trajectory Shifted Perturbations. 11376-11384 - Shengjiu Dai
, Xiujian Liang
, Sheng Li
, Zhenxing Qian
, Xinpeng Zhang
:
Safe-BVAR: Text-to-Image Generative Watermarking for Bitwise Visual AutoRegressive Model. 11385-11393 - Hanzhe Yu
, Yun Ye
, Jintao Rong
, Qi Xuan
, Chen Ma
:
RealHD: A High-Quality Dataset for Robust Detection of State-of-the-Art AI-Generated Images. 11394-11403 - Tianjiao Xu
, Hao Fu
, Suiyang Zhang
, Jianhua Yin
, Tian Gan
, Liqiang Nie
:
Enhancing Democratic Mediation through Norm-Awareness in Generative Agent Societies. 11404-11413 - Muzhi Dai
, Shixuan Liu
, Zhiyuan Zhao
, Junyu Gao
, Hao Sun
, Xuelong Li
:
Secure Tug-of-War (SecTOW): Iterative Defense-Attack Training with Reinforcement Learning for Multimodal Model Security. 11414-11423 - Aryana Hou
, Li Lin
, Justin Li
, Shu Hu
:
Rethinking Individual Fairness in Deepfake Detection. 11424-11433 - Yifan Zeng
, Fangzhou Dong
, Jian Zhao
, Peijia Zheng
, Jian Li
, Huiyu Zhou
:
Towards Culturally Fair Multimodal Generation: Quantifying and Mitigating Orientalist Biases in Text-to-Visual Models. 11434-11442 - Poyuan Mao
, Cheng-Chang Tsai
, Chun-Shien Lu
:
MaXsive: High-Capacity and Robust Training-Free Generative Image Watermarking in Diffusion Models. 11443-11452 - Junlei Zhou
, Jiashi Gao
, Xinwei Guo
, Haiyan Wu
, Quanying Liu
, Xiangyu Zhao
, Hongxin Wei
, Xin Yao
, Xuetao Wei
:
Mitigating Stereotypes in Text-to-Image Generation: A Novel Perspective of Selective Neural Suppression. 11453-11461 - Yitong Sun
, Yao Huang
, Ruochen Zhang
, Huanran Chen
, Shouwei Ruan
, Ranjie Duan
, Xingxing Wei
:
NDM: A Noise-driven Detection and Mitigation Framework against Implicit Sexual Intentions in Text-to-Image Generation. 11462-11471 - Eungi Lee
, Jae Hyun Yoon
, Seok Bong Yoo
:
SCOL: Style Code Orchestration in Latent Space for Proactive Face-Swapping Defense. 11472-11481 - Zhicheng Zhang
, Peizhuo Lv
, Mengke Wan
, Jiang Fang
, Diandian Guo
, Yezeng Chen
, Yinlong Liu
, Wei Ma
, Jiyan Sun
, Liru Geng
:
Hot-Swap MarkBoard: An Efficient Black-box Watermarking Approach for Large-scale Model Distribution. 11482-11491 - Jiahao Li
, Yiqiang Chen
, Yunbing Xing
, Yang Gu
, Xiangyuan Lan
:
K-Space Bispectrum Steganography for Robust Unlearnable Data. 11492-11501 - Xueyi Zhang
, Peiyin Zhu
, Jinping Sui
, Xiaoda Yang
, Jiahe Tian
, Mingrui Lao
, Siqi Cai
, Yanming Guo
, Jun Tang
:
Choose Your Expert: Uncertainty-Guided Expert Selection for Continual Deepfake Detection. 11502-11511 - Wenpeng Mu
, Zheng Li
, Qiang Xu
, Xinghao Jiang
, Tanfeng Sun
:
ExDA: Towards Universal Detection and Plug-and-Play Attribution of AI-Generated Ex-Regulatory Images. 11512-11521 - Demin Yu
, Wenchuan Du
, Kenghong Lin
, Xutao Li
, Yunming Ye
, Chuyao Luo
, Xunlai Chen
:
PiMMNet: Introducing Multi-Modal Precipitation Nowcasting via a Physics-informed Perspective. 11522-11531 - Jilong Wei
, Yangyang Hu
, Xiangjuan Wu
, Yiqiang Wu
, Hao Liu
:
Appearance Contrasts for Unconstrained Age Estimation. 11532-11541 - Yan Wang
, Qindong Sun
, Dongzhu Rong
:
Audio-Visual Asynchrony Mitigation: Cross-Modal Alignment and Feature Reconstruction for Deepfake Detection. 11542-11551 - Xinyu Xia
, Xingjun Ma
, Yunfeng Hu
, Ting Qu
, Hong Chen
, Xun Gong
:
From Failures to Fixes: LLM-Driven Scenario Repair for Self-Evolving Autonomous Driving. 11552-11561 - Liqin Wang
, Qianyue Hu
, Wei Lu
, Xiangyang Luo
:
Diffusion-based Adversarial Identity Manipulation for Facial Privacy Protection. 11562-11571 - Midou Guo
, Qilin Yin
, Wei Lu
, Xiangyang Luo
:
Towards Open-world Generalized Deepfake Detection: General Feature Extraction via Unsupervised Domain Adaptation. 11572-11580 - Wenbo Xu
, Junyan Wu
, Wei Lu
, Xiangyang Luo
, Qian Wang
:
A Multimodal Deviation Perceiving Framework for Weakly-Supervised Temporal Forgery Localization. 11581-11589 - Cong Kong
, Rui Xu
, Jiawei Chen
, Zhaoxia Yin
:
Protecting Copyright of Medical Pre-trained Language Models: Training-Free Backdoor Model Watermarking. 11590-11599 - Mingru Yang
, Yanmei Gu
, Qianhua He
, Peirong Zhang
, Haolin He
, Zhiming Wang
, Huijia Zhu
, Jian Liu
, Weiqiang Wang
:
Generalizable Audio Deepfake Detection via Risk-Aware Style Alignment and Structural Empirical Risk Minimization. 11600-11609 - Jianqiao Cui
, Bingyao Yu
, Qihao Wang
, Fei Meng
, Jiwen Lu
:
WhiADD: Semantic-Acoustic Fusion for Robust Audio Deepfake Detection. 11610-11618 - Xuan Hai
, Xin Liu
, Zihao Zhang
, Ziyao Yu
, Xiangzhen Kong
, Song Li
, Weina Niu
, Rui Zhou
, Qingguo Zhou
:
SiFMimicEvader: Evading Fake Voice Detection with Adversarial Neural Mimicry Attacks. 11619-11628 - Bingqian Zhou
, Zhihao Wu
, Yushi Cheng
, Wenyuan Xu
:
AdvPainting: Clean-text Jailbreaking Against Inpainting Models. 11629-11637 - Zhou Feng
, Jiahao Chen
, Chunyi Zhou
, Yuwen Pu
, Qingming Li
, Tianyu Du
, Shouling Ji
:
Enkidu: Universal Frequential Perturbation for Real-Time Audio Privacy Protection against Voice Deepfakes. 11638-11647 - Yurun Chen
, Xueyu Hu
, Keting Yin
, Juncheng Li
, Shengyu Zhang
:
Evaluating the Robustness of Multimodal Agents Against Active Environmental Injection Attacks. 11648-11656 - Man Xiao
, Jianbin Ye
, Bo Liu
, Zijian Gao
, Kele Xu
, Xiaodong Wang
:
Analytic Synaptic Dynamic Scaling Balancer for Multimodal Deepfake Continual Detection. 11657-11666 - Inzamamul Alam
, Md Tanvir Islam
, Simon S. Woo
:
SpecXNet: A Dual-Domain Convolutional Network for Robust Deepfake Detection. 11667-11676 - Shiyao Cui
, Qinglin Zhang
, Xuan Ouyang
, Renmiao Chen
, Zhexin Zhang
, Yida Lu
, Hongning Wang
, Han Qiu
, Minlie Huang
:
ShieldVLM: Safeguarding the Multimodal Implicit Toxicity via Deliberative Reasoning with LVLMs. 11677-11686 - Yunbo Lyu
, Zhou Yang
, Yuqing Niu
, Jing Jiang
, David Lo
:
Do Existing Testing Tools Really Uncover Gender Bias in Text-to-Image Models? 11687-11696 - Beijing Chen
, Yuting Hong
, Ziqiang Li
, Zhangjie Fu
:
DFPD: Dual-Forgery Proactive Defense against Both Deepfakes and Traditional Image Manipulations. 11697-11705 - Jiayi Gao
, Huaiwen Zhang
:
Evaluating and Mitigating Sycophancy in Large Vision-Language Models. 11706-11715 - Shahroz Tariq
, Simon S. Woo
, Priyanka Singh
, Irena Irmalasari
, Saakshi Gupta
, Dev Gupta
:
From Prediction to Explanation: Multimodal, Explainable, and Interactive Deepfake Detection Framework for Non-Expert Users. 11716-11725 - Jiehua Zhang
, Liang Li
, Chenggang Yan
, Wei Ke
, Yihong Gong
:
Frequency-aware Correlation Discovering and Spatial Forgery Clue Distilling for Synthetic Image Detection. 11726-11735 - Hao Gu
, Jiangyan Yi
, Chenglong Wang
, Jianhua Tao
, Zheng Lian
, Jiayi He
, Yong Ren
, Yujie Chen
, Zhengqi Wen
:
ALLM4ADD: Unlocking the Capabilities of Audio Large Language Models for Audio Deepfake Detection. 11736-11745 - Tianxiao Li
, Zhenglin Huang
, Haiquan Wen
, Yiwei He
, Shuchang Lyu
, Baoyuan Wu
, Guangliang Cheng
:
RAIDX: A Retrieval-Augmented Generation and GRPO Reinforcement Learning Framework for Explainable Deepfake Detection. 11746-11755 - Renmiao Chen
, Shiyao Cui
, Xuancheng Huang
, Chengwei Pan
, Victor Shea-Jay Huang
, Qinglin Zhang
, Xuan Ouyang
, Zhexin Zhang
, Hongning Wang
, Minlie Huang
:
JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering. 11756-11765 - Hoan My Tran
, Damien Lolive
, Aghilas Sini
, Arnaud Delhay
, Pierre-François Marteau
, David Guennec
:
Multi-level SSL Feature Gating for Audio Deepfake Detection. 11766-11775 - Naquee Rizwan
, Nayandeep Deb
, Sarthak Roy
, Vishwajeet Singh Solanki
, Kiran Garimella
, Animesh Mukherjee
:
Toxicity Begets Toxicity: Unraveling Conversational Chains in Political Podcasts. 11776-11784
Systems: Systems and Middleware
- Zheqi Lv
, Wenqiao Zhang
, Kairui Fu
, Qi Tian
, Shengyu Zhang
, Jiajie Su
, Jingyuan Chen
, Kun Kuang
, Fei Wu
:
Tackling Device Data Distribution Real-time Shift via Prototype-based Parameter Editing. 11785-11794 - Jiaye Zhang
, Hongyi Wang
, Peiru Yang
, Zili Meng
, Mingwei Xu
:
Configuring Dynamic Multi-Stage Serverless Pipelines for Video Processing with Minimal Profiling Overhead. 11795-11804 - Xiaokun Wang
, Yuting Yan
, Sheng Zhang
, Andong Zhu
, Ning Chen
, Yu Chen
, Zhuzhong Qian
, Sanglu Lu
, Yu Liang
:
Decode-What-Matters: Frame-Level Parallel Generative Decoding to Accelerate Large-Scale Video Analytics. 11805-11814 - Shengzhe You
, Libo Weng
, Fei Gao
:
ViTraj: Learning Dual-Side Representations for Vehicle-Infrastructure Cooperative Trajectory Prediction. 11815-11824 - Yuxin Zhang
, Jiahao Yang
, Zhe Chen
, Wenjun Zhu
, Jin Zhao
, Yue Gao
:
A Satellite-Ground Synergistic Large Vision-Language Model System for Earth Observation. 11825-11833 - Zhe Sun
, Qiang Xu
, Qi Zhang
, Shan Liu
, Ge Li
:
Overfitted Point Cloud Attribute Codec Using Sparse Hierarchical Implicit Neural Representations. 11834-11843 - Nan He
, Yiming Chen
, Zheng Jiang
, Song Yang
, Lifeng Sun
:
DynFed: Adaptive Federated Learning via Quantization-Aware Knowledge Distillation. 11844-11852 - Fangxin Liu
, Junjie Wang
, Ning Yang
, Zongwu Wang
, Junping Zhao
, Li Jiang
, Haibing Guan
:
ASTER: Adaptive Dynamic Layer-Skipping for Efficient Transformer Inference via Markov Decision Process. 11853-11861 - Yang Zhao
, Shusheng Li
, Xueshang Feng
:
Lightweight Remote Sensing Scene Classification on Edge Devices via Knowledge Distillation and Early-exit. 11862-11870 - Adhi Widagdo
, Teemu Kämäräinen
, Ahmad Alhilal
, Matti Siekkinen
, Cheng-Hsin Hsu
:
Gaze-Adaptive Foveation for Remote Rendered VR. 11871-11879 - Dong Li
, Yichen Niu
, Ying Ai
, Xiang Zou
, Biqing Qi
, Jianxing Liu
:
T-GRAG: A Dynamic GraphRAG Framework for Resolving Temporal Conflicts and Redundancy in Knowledge Retrieval. 11880-11889 - Kewei Zhao
, Xiaowei Hu
, Qinya Li
:
Device-Cloud Collaborative Learning Framework for Efficient Unknown Object Detection. 11890-11898 - Weiwu Pang
, Rajrup Ghosh
, Jiawei Yang
, Ziyu Wei
, Branden Leong
, Yue Wang
, Ramesh Govindan
:
SplatPose: On-Device Outdoor AR Pose Estimation Using Gaussian Splatting. 11899-11908 - Youbo Mao
, Ziyang Kang
, Pengfei Li
, Jiyao Chen
, Zenglin Yang
, Zhijun Li
:
FCG: High-Throughput JPEG Heterogeneous Inference with Hybrid Parallel Pipeline on Mobile Devices. 11909-11917 - Xinbiao Gan
, Qiang Zhang
, Tiejun Li
, Chunye Gong
, Kai Lu
:
GraphWorld: Ultra-fast Graph Engine for World-Wide Web Searching. 11918-11927 - Lehao Lin
, Baohua Fang
, Ziheng Sun
, Ke Wang
, Hong Kang
, Wei Cai
:
BS3: Bézier Slicing Middleware for 3D Mesh LOD Optimization. 11928-11936 - Jiacheng Jiang
, Yuan Meng
, Chen Tang
, Han Yu
, Qun Li
, Zhi Wang
, Wenwu Zhu
:
Quantization Meets OOD: Generalizable Quantization-aware Training from a Flatness Perspective. 11937-11946 - Xinhai Yan
, Libing Wu
, Zhuangzhuang Zhang
, Bingyi Liu
, Lijuan Huo
, Jing Wang
:
FedBAP: Backdoor Defense via Benign Adversarial Perturbation in Federated Learning. 11947-11955 - Jiazhen Chen
, Zheng Ma
, Sichao Fu
, Mingbin Feng
, Tony S. Wirjanto
, Weihua Ou
:
Towards Effective Open-set Graph Class-incremental Learning. 11956-11965 - Haizhou Wang
, Guobing Zou
, Fei Xu
, Yangguang Cui
, Tongquan Wei
:
Multi-Width Neural Network-Assisted Hierarchical Federated Learning in Heterogeneous Cloud-Edge-Device Computing. 11966-11975 - Sarmistha Das
, R. E. Zera Marveen Lyngkhoi
, Sriparna Saha
, Alka Maurya
:
Unlocking Financial Insights: An Advanced Multimodal Summarization with Multimodal Output Framework for Financial Advisory Videos. 11976-11985
Systems: Transport and Delivery
- Chunyu Qiao
, Tong Liu
, Yucheng Zhang
, Zhiwei Fan
, Pengjin Xie
, Zhen Wang
, Liang Liu
:
PIRA: Pan-CDN Intra-video Resource Adaptation for Short Video Streaming. 11987-11995 - Fenghao Tian
, Mingtao Feng
, Jianqiao Luo
, Zijie Wu
, Longlong Mei
, Lijie Yang
, Weisheng Dong
, Yaonan Wang
:
Generalizing to New Area: Self-Distillation Curriculum Learning for Fine-Grained Cross View Localization. 11996-12005 - Yuheng Wu
, Thanh-Tung Nguyen
, Lucas Liebe
, Quang Tau
, Pablo Espinosa Campos
, Jinghan Cheng
, Dongman Lee
:
How2Compress: Scalable and Efficient Edge Video Analytics via Adaptive Granular Video Compression. 12006-12015 - Yaojun Wu
, Chaoyi Lin
, Yiming Wang
, Semih Esenlik
, Zhaobin Zhang
, Kai Zhang
, Li Zhang
:
Neural Video Compression with In-Loop Contextual Filtering and Out-of-Loop Reconstruction Enhancement. 12016-12024 - Andong Zhu
, Sheng Zhang
, Xiaohang Shi
, Hesheng Sun
, Yu Liang
, Zhuzhong Qian
, Han Zheng
, Xiaokun Wang
, Ning Jiang
:
VidIQ: Inference-Aware Neural Codecs for Quality-Enhanced, Real-Time Video Analytics. 12025-12034 - Lianchen Jia
, Chaoyang Li
, Ziqi Yuan
, Jiahui Chen
, Tianchi Huang
, Jiangchuan Liu
, Lifeng Sun
:
Beyond Interpretability: Exploring the Comprehensibility of Adaptive Video Streaming through Large Language Models. 12035-12044 - Zhaohui Jiang
, Xuening Feng
, Tianchi Huang
, Ruixiao Zhang
, Paul Weng
, Yifei Zhu
:
Progressive Learning with Human Feedback for Personalized Adaptive Video Streaming. 12045-12053 - Jiaxun Zhang
, Haicheng Liao
, Yumu Xie
, Chengyue Wang
, Yanchen Guan
, Bin Rao
, Zhenning Li
:
Eyes on the Road, Mind Beyond Vision: Context-Aware Multi-modal Enhanced Risk Anticipation. 12054-12063 - Jingrou Wu
, Haoxian Liu
, Jin Zhang
, Dan Wang
, Jing Jiang
:
P2VS: Progressive Partition-Based Volumetric Video Streaming under Network Dynamics. 12064-12073 - Ahmad Alhilal
, Ze Wu
, Teemu Kämäräinen
, Tristan Braud
, Matti Siekkinen
:
Congestion Control for VR Cloud Gaming: Integration and Comparison in Real VR Gaming Environment. 12074-12082 - Junqi Liao
, Yaojun Wu
, Chaoyi Lin
, Zhipin Deng
, Li Li
, Dong Liu
, Xiaoyan Sun
:
EHVC: Efficient Hierarchical Reference and Quality Structure for Neural Video Coding. 12083-12091 - Runjie Wang
, Kemi Chen
, Shuijie Li
, Mingkai Chen
, Tiesong Zhao
:
Efficient Semantic Codec for Real-time Vibrotactile Transmission. 12092-12101 - Ruonan Chai
, Yixiang Zhu
, Xinjiao Li
, Jiawei Li
, Zili Meng
, Dirk Kutscher
:
INDS: Incremental Named Data Streaming for Real-Time Point Cloud Video. 12102-12110 - Yufeng Chen
, Umakant Kulkarni
, Voicu Popescu
, Sonia Fahmy
:
RUN: A Case for Cross-Layer Networked Virtual Reality. 12111-12120 - Daoxu Sheng
, Qi Qi
, Jingyu Wang
, Jianxin Liao
:
Watch, Skip, Repeat: Hotspot-Aware Joint Optimization for Video Streaming. 12121-12130 - Feida Liu
, Yifan Wang
, Jiaqi Zheng
, Boxi Liu
, Guihai Chen
:
Themis: Toward Stable Near-Zero Queuing Delay in Congestion Control for Low-Latency Interactive Video Streaming. 12131-12139
Brave New Ideas
- Xinyu Zhang
, Dong Gong
, Zicheng Duan
, Anton van den Hengel
, Lingqiao Liu
:
Let Your Video Listen to Your Music! - Beat-Aligned, Content-Preserving Video Editing with Arbitrary Music. 12140-12149 - Matyas Bohacek
, Ignacio Vilanova Echavarri
:
Compliance Rating Scheme: A Data Provenance Framework for Generative AI Datasets. 12150-12159 - Weilin Wu
, Shifan Yang
, Qizhao Lin
, Xinghong Chen
, Kunping Yang
, Jing Wang
, Guannan Chen
:
A Novel Perspective on Low-Light Image Enhancement: Leveraging Artifact Regularization and Walsh-Hadamard Transform. 12160-12169 - Esen K. Tütüncü
, Lissette Lemus
, Kris Pilcher
, Holger Sprengel
, Jordi Sabater-Mir
:
Teaching AI to Feel: A Collaborative, Full-Body Exploration of Emotive Communication. 12170-12178 - Xiaohao Liu
, Xiaobo Xia
, Zhuo Huang
, See-Kiong Ng
, Tat-Seng Chua
:
Towards Modality Generalization: A Benchmark and Prospective Analysis. 12179-12188 - Xiufeng Huang
, Ziyuan Luo
, Qi Song
, Ruofei Wang
, Renjie Wan
:
MarkSplatter: Generalizable Watermarking for 3D Gaussian Splatting Model via Splatter Image Structure. 12189-12198 - Qi Song
, Ziyuan Luo
, Ka Chun Cheung
, Simon See
, Renjie Wan
:
Align 3D Representation and Text Embedding for 3D Content Personalization. 12199-12208 - Divya Kothandaraman
, Ming Lin
, Dinesh Manocha
:
Financial Models meets Generative Art: Black-Scholes-Inspired Concept Blending in Text-to-Image Diffusion. 12209-12217 - Yili Jin
, Ling Pan
, Rui-Xiao Zhang
, Jiangchuan Liu
, Xue Liu
:
Generative Flow Networks for Personalized Multimedia Systems: A Case Study on Short Video Feeds. 12218-12226 - Zihao Wang
, Ruibin Yuan
, Ziqi Geng
, Hengjia Li
, Xingwei Qu
, Xinyi Li
, Songye Chen
, Haoying Fu
, Roger B. Dannenberg
, Kejun Zhang
:
Singing Timbre Popularity Assessment Based on Multimodal Large Foundation Model. 12227-12236 - Yili Jin
, Xue Liu
, Jiangchuan Liu
:
Generative AI for Multimedia Communication: Recent Advances, An Information-Theoretic Framework, and Future Opportunities. 12237-12246 - Mengying Duan
, He Li
, Mang Ye
:
MLLMs Meet Person Re-identification. 12247-12256 - Xingjun Ma
, Hanxun Huang
, Tianwei Song
, Ye Sun
, Yifeng Gao
, Yu-Gang Jiang
:
T2UE: Generating Unlearnable Examples from Text Descriptions. 12257-12265 - Yudong Zhang
, Ruobing Xie
, Yiqing Huang
, Jiansheng Chen
, Xingwu Sun
, Zhanhui Kang
, Di Wang
, Yu Wang
:
Fighting Fire with Fire (F3): A Training-free and Efficient Visual Adversarial Example Purification Method in LVLMs. 12266-12275 - Emanuele Artioli
, Daniele Lorenzi
, Shivi Vats
, Farzad Tashtarian
, Christian Timmerer
:
GenStream: Semantic Streaming Framework for Generative Reconstruction of Human-centric Media. 12276-12284 - Zhihao Hao
, Bob Zhang
, Haisheng Li
:
Towards a Global Spatial-Temporal Food Memory: A Vision for Privacy-Preserving Collaborative Multimedia Analysis. 12285-12294 - Luca Rossetto
, Heiko Schuldt
, Ralph Gasser
:
Towards a Universal Query Representation for Multimodal Information Retrieval. 12295-12303 - Xingqi Wang
, Xiaoyuan Yi
, Xing Xie
, Jia Jia
:
Specify Privacy Yourself: Assessing Inference-Time Personalized Privacy Preservation Ability of Large Vision-Language Models. 12304-12313 - Songning Lai
, Mingqian Liao
, Zhangyi Hu
, Jiayu Yang
, Wenshuo Chen
, Hongru Xiao
, Jianheng Tang
, Haicheng Liao
, Yutao Yue
:
Learning New Concepts, Remembering the Old: Continual Learning for Multimodal Concept Bottleneck Models. 12314-12322 - Yichi Zhang
, Zhuo Chen
, Lingbing Guo
, Yajing Xu
, Min Zhang
, Wen Zhang
, Huajun Chen
:
Abstractive Visual Understanding of Multi-modal Structured Knowledge: A New Perspective for MLLM Evaluation. 12323-12332 - Sujaya Maiyya
, Shantanu Sharma
, Avinash Kumar
:
HEALTH+: Empowering Individuals via Unifying Health Data. 12333-12341 - Quanqi Du
, Loic De Langhe
, Els Lefever
, Véronique Hoste:
LDW: Label Divergence Weighting for Multimodal Sentiment Analysis. 12342-12351 - Haozhe Jia
, Wenshuo Chen
, Zhihui Huang
, Lei Wang
, Hongru Xiao
, Nanqian Jia
, Keming Wu
, Songning Lai
, Bowen Tian
, Yutao Yue
:
Physics-Informed Representation Alignment for Sparse Radio-Map Reconstruction. 12352-12360 - Aminul Islam
, Md. Mustakin Alam
, Shaker Islam
:
The Birth of Vision Language. 12361-12370 - Nidham Tekaya
, Manuela Waldner
, Matthias Zeppelzauer
:
A Matter of Time: Revealing the Structure of Time in Vision-Language Models. 12371-12380 - Kangzhong Wang
, Zitong Shen
, Youqian Zhang
, MK Michael Cheung
, Xiapu Luo
, Grace Ngai
, Eugene Yujun Fu
:
One Size Fits All? A Modular Adaptive Sanitization Kit (MASK) for Customizable Privacy-Preserving Phone Scam Detection. 12381-12389 - Qichao Dong
, Lingyu Liu
, Yaxiong Wang
, Jason J. R. Liu
, Zhedong Zheng
:
Domain-Agnostic Neural Oil Painting via Normalization Affine Test-Time Adaptation. 12390-12398 - Ana Carolina Condez
, Diogo Tavares
, João Magalhães:
MoralCLIP: Contrastive Alignment of Vision-and-Language Representations with Moral Foundations Theory. 12399-12408 - Aleksandr Farseev
, Marlo Ongpin
, Qi Yang
, Ilia Gossoudarev
, Yu-Yi Chu-Farseeva
, Sergey I. Nikolenko
:
MindFuse: Towards GenAI Explainability in Marketing Strategy Co-Creation. 12409-12418 - Kevin Kailun Zhang
, Ying Sun
, Hui Xiong
:
Multifractal Comparison of Billboard and AI-Generated Music. 12419-12427 - Natalia Jakubiec
, Lucjan Janowski
:
From Hemoglobin to MOS: Towards Neuro-Based QoE Assessment Using fNIRS. 12428-12436 - Julie Tores
, Elisa Ancarani
, Rémy Sun, Lucile Sassatelli
, Hui-Yin Wu
, Frédéric Precioso:
Re-examining Concept-based Explainable Models for Multimodal Interpretative Tasks. 12437-12445 - Rutger Hendrix
, Giovanni Patanè
, Leonardo G. Russo
, Simone Carnemolla
, Federica Proietto Salanitri
, Giovanni Bellitto
, Concetto Spampinato
, Matteo Pennisi
:
Pre-Forgettable Models: Prompt Learning as a Native Mechanism for Unlearning. 12446-12454 - Yingjun Dai
, Ahmed El-Roby
:
RQ-Rec: Residual Quantized Hierarchical Preference Modeling for Cross-Domain Recommendation. 12455-12463 - Hongru Xiao
, Xiang Li
, Duyi Pan
, Longfei Zhang
, Zhixue Song
, Jiale Han
, Songning Lai
, Wenshuo Chen
, Jing Tang
, Benyou Wang
:
Can Audio Language Models Listen Between the Lines? A Study on Metaphorical Reasoning via Unspoken. 12464-12472 - Hridayesh Lekhak
, Tuan M. Dang
, Theron S. Wang
, Kenny Q. Zhu
:
A Data-driven Approach to the Longitudinal Study of Canine Vocal Pattern Development. 12473-12482 - Kamran Gholizadeh HamlAbadi
, Monica (Monireh) Vahdati
, Fedwa Laamarti
, Abdulmotaleb El Saddik
:
Agent-to-Agent (A2A) Protocol Integrated Digital Twin System with AgentIQ for Multimodal AI Fitness Coaching and Personalized Well-Being. 12483-12491 - Xun Li
, Rodrigo Santa Cruz
, Mingze Xi
, Hu Zhang
, Madhawa Perera
, Ziwei Wang
, Ahalya Ravendran
, Brandon J. Matthews
, Feng Xu
, Matt Adcock
, Dadong Wang
, Jiajun Liu
:
Queryable 3D Scene Representation: A Multi-Modal Framework for Semantic Reasoning and Robotic Task Planning. 12492-12500 - Hao Yang
, Tian Zheng
, Yanyan Zhao
, Bing Qin
:
Ensuring Responses Contain Appropriate Images: Timing Judgment for Multimodal Responses. 12501-12508 - Xiang Fang
, Wanlong Fang
, Wei Ji
, Tat-Seng Chua
:
Turing Patterns for Multimedia: Reaction-Diffusion Multi-Modal Fusion for Language-Guided Video Moment Retrieval. 12509-12518 - Baining Zhao
, Rongze Tang
, Mingyuan Jia
, Ziyou Wang
, Fanhang Man
, Xin Zhang
, Yu Shang
, Weichen Zhang
, Wei Wu
, Chen Gao
, Xinlei Chen
, Yong Li
:
AirScape: An Aerial Generative World Model with Motion Controllability. 12519-12528 - Dong Liu
, Yanxuan Yu
:
TinyServe: Query-Aware Cache Selection for Efficient LLM Serving. 12529-12537 - Igor N. Meleshin
, Anna Chistyakova
, Anastasia Antsiferova
, Dmitriy S. Vatolin
:
Robustness as Architecture: Designing IQA Models to Withstand Adversarial Perturbations. 12538-12546 - Xiangxian Li
, Yawen Zheng
, Baiqiao Zhang
, Yijia Ma
, Xianhui Cao
, Juan Liu
, Yulong Bian
, Jin Huang
, Chenglei Yang
:
MAGNeT: Multimodal Adaptive Gaussian Networks for Intent Inference in Moving Target Selection across Complex Scenarios. 12547-12555
Datasets
- Feng-Kai Huang, Hong-Wei Xu, Chu-Chuan Lee, Hong-Yi Tu, Hong-Han Shuai
, Wen-Huang Cheng:
OinkTrack: An Ultra-Long-Term Dataset for Multi-Object Tracking and Re-Identification of Group-Housed Pigs. 12556-12563 - Zhuojun Wu
, Dong Liu
, Juan Liu
, Yechen Wang
, Linxi Li
, Liwei Jin
, Hui Bu
, Pengyuan Zhang
, Ming Li
:
SMIIP-NV: A Multi-Annotation Non-Verbal Expressive Speech Corpus in Mandarin for LLM-Based Speech Synthesis. 12564-12570 - Bingjian Yang
, Danni Xu
, Kaipeng Niu
, Wenxuan Liu
, Zheng Wang
, Mohan Kankanhalli
:
A New Dataset and Benchmark for Grounding Multimodal Misinformation. 12571-12577 - Abel Yu Hao Chai
, Kelly Li Zhen Jee
, Sue Han Lee
, Fei Siang Tay
, Jules Vandeputte
, Hervé Goëau
, Pierre Bonnet
, Alexis Joly
:
Deep-Plant-Disease Dataset Is All You Need for Plant Disease Identification. 12578-12584 - Wentao Liu
, Qianjun Pan
, Yi Zhang
, Zhuo Liu
, Ji Wu
, Jie Zhou
, Aimin Zhou
, Qin Chen
, Bo Jiang
, Liang He
:
CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models. 12585-12591 - Hengrui Lou
, Zunlei Feng
, Jinsong Geng
, Erteng Liu
, Lechao Cheng
, Jie Lei
, Jie Song
, Mingli Song
, Yijun Bei
:
A Large-scale Universal Evaluation Benchmark For Face Forgery Detection. 12592-12599 - Duy-Khang Ho
, Minh-Quan Ho-Le
, Van-Tu Ninh
, Cathal Gurrin
, Minh-Triet Tran
:
LSC-ADL: An Activity of Daily Living (ADL)-Annotated Lifelog Dataset Generated via Semi-Automatic Clustering. 12600-12606 - Qiyuan Guan
, Qianfeng Yang
, Xiang Chen
, Tianyu Song
, Guiyue Jin
, Jiyu Jin
:
WeatherBench: A Real-World Benchmark Dataset for All-in-One Adverse Weather Image Restoration. 12607-12613 - Tao Feng
, Tingfa Xu
, Haolin Qin
, Tianhao Li
, Shuaihao Han
, Xuyang Zou
, Zhan Lv
, Jianan Li
:
MSITrack: A Challenging Benchmark for Multispectral Single Object Tracking. 12614-12620 - Quang-Trung Truong
, Yuk-Kwan Wong
, Vo Hoang Kim Tuyen Dang
, Rinaldi Gotama
, Duc Thanh Nguyen
, Sai-Kit Yeung
:
MSC: A Marine Wildlife Dataset for Video Understanding with Grounded Segmentation and Clip-Level Captions. 12621-12628 - Luca Rossetto
, Werner Bailer
, Duc-Tien Dang-Nguyen
, Graham Healy
, Björn Þór Jónsson
, Onanong Kongmeesub
, Hoang-Bao Le
, Stevan Rudinac
, Klaus Schöffmann
, Florian Spiess
, Allie Tran
, Minh-Triet Tran
, Quang-Linh Tran
, Cathal Gurrin
:
The CASTLE 2024 Dataset: Advancing the Art of Multimodal Understanding. 12629-12635 - Guankun Wang
, Han Xiao
, Renrui Zhang
, Huxin Gao
, Long Bai
, Xiaoxiao Yang
, Zhen Li
, Hongsheng Li
, Hongliang Ren
:
CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection. 12636-12643 - Cheng Zhang
, Hongxia Xie
, Bin Wen
, Songhan Zuo
, Ruoxuan Zhang
, Wen-Huang Cheng
:
EmoArt: A Multidimensional Dataset for Emotion-Aware Artistic Generation. 12644-12650 - Yulong Li
, Yuxuan Zhang
, Rui Chen
, Feilong Tang
, Zhixiang Lu
, Ming Hu
, Jianghao Wu
, Haochen Xue
, Mian Zhou
, Chong Li
, Jionglong Su
, Imran Razzak
:
Genesis: A Large-Scale Benchmark for Multimodal Large Language Model in Emotional Causality Analysis. 12651-12658 - Ruoxuan Zhang
, Jidong Gao
, Bin Wen
, Hongxia Xie
, Chenming Zhang
, Hong-Han Shuai
, Wen-Huang Cheng
:
RecipeGen: A Step-Aligned Multimodal Benchmark for Real-World Recipe Generation. 12659-12665 - Jiarui Wang
, Huiyu Duan
, Juntong Wang
, Ziheng Jia
, Woo Yi Yang
, Xiaorong Zhu
, Yu Zhao
, Jiaying Qian
, Yuke Xing
, Guangtao Zhai
, Xiongkuo Min
:
DFBench: Benchmarking Deepfake Image Detection Capability of Large Multimodal Models. 12666-12673 - Bohan Zeng
, Ling Yang
, Jiaming Liu
, Minghao Xu
, Yuanxing Zhang
, Pengfei Wan
, Wentao Zhang
, Shuicheng Yan
:
EditWorld: Simulating World Dynamics for Instruction-Following Image Editing. 12674-12681 - Yuke Xing
, Jiarui Wang
, Peizhi Niu
, Wenjie Huang
, Guangtao Zhai
, Yiling Xu
:
3DGS-IEval-15K: A Large-scale Image Quality Evaluation Database for 3D Gaussian-Splatting. 12682-12689 - Xiaorong Zhu, Ziheng Jia, Jiarui Wang, Xiangyu Zhao, Haodong Duan, Xiongkuo Min, Jia Wang, Zicheng Zhang
, Guangtao Zhai
:
GOBench: Benchmarking Geometric Optics Generation and Understanding of MLLMs. 12690-12697 - Chenhui Qiang
, Zhaoyang Wei
, Xumeng Han
, Zipeng Wang
, Siyao Li
, Xiangyuan Lan
, Jianbin Jiao
, Zhenjun Han
:
VER-Bench: Evaluating MLLMs on Reasoning with Fine-Grained Visual Evidence. 12698-12705 - Yingbo Tang
, Lingfeng Zhang
, Shuyi Zhang
, Yinuo Zhao
, Xiaoshuai Hao
:
RoboAfford: A Dataset and Benchmark for Enhancing Object and Spatial Affordance Learning in Robot Manipulation. 12706-12713 - Zihao Wang
, Shulei Ji
, Le Ma
, Yuhang Jin
, Shun Lei
, Jianyi Chen
, Haoying Fu
, Roger B. Dannenberg
, Kejun Zhang
:
Multi-Accent Mandarin Dry-Vocal Singing Dataset: Benchmark for Singing Accent Recognition. 12714-12721 - Wenzhuo Jin
, Qianfeng Yang
, Xianhao Wu
, Hongming Chen
, Pengpeng Li
, Xiang Chen
:
SmokeBench: A Real-World Dataset for Surveillance Image Desmoking in Early-Stage Fire Scenes. 12722-12728 - Chenghanyu Zhang
, Zekun Li
, Peipei Li
, Xing Cui
, Shuhan Xia
, Weixiang Yan
, Yiqiao Zhang
, Qianyu Zhuang
:
SpineBench: Benchmarking Multimodal LLMs for Spinal Pathology Analysis. 12729-12736 - Shushi Wang
, Chunyi Li
, Zicheng Zhang
, Han Zhou
, Wei Dong
, Jun Chen
, Guangtao Zhai
, Xiaohong Liu
:
AU-IQA: A Benchmark Dataset for Perceptual Quality Assessment of AI-Enhanced User-Generated Content. 12737-12744 - Shuyi Zhang
, Xiaoshuai Hao
, Yingbo Tang
, Lingfeng Zhang
, Pengwei Wang
, Zhongyuan Wang
, Hongxuan Ma
, Shanghang Zhang
:
Video-CoT: A Comprehensive Dataset for Spatiotemporal Understanding of Videos Based on Chain-of-Thought. 12745-12752 - Xinyi Wang
, Pengfei Ren
, Haoyang Zhang
, Xin Sheng
, Da Li
, Liang Xie
, Yue Gao
, Erwei Yin
:
VIHand: Enhancing 3D Hand Pose Estimation with Visual-Inertial Benchmark. 12753-12760 - Tingrui Shen
, Bangzhen Liu
, Zhirun Fan
, Shiting Zhang
, Weifeng Pan
, Fan Sun
, Dan Cao
, Shengfeng He
:
Language-Driven 3D Human Pose Estimation in Multi-Person Scenarios: A New Dataset and Approach. 12761-12768 - Ruiyan Wang
, Lin Zuo
, Zonghao Lin
, Qiang Wang
, Zhengxue Cheng
, Rong Xie
, Jun Ling
, Li Song
:
PA-HOI: A Physics-Aware Human and Object Interaction Dataset. 12769-12775 - Xusheng He
, Wei Liu
, Shanshan Ma
, Qian Liu
, Chenghao Ma
, Jianlong Wu
:
FineBadminton: A Multi-Level Dataset for Fine-Grained Badminton Video Understanding. 12776-12783 - Weichen Zhang
, Zile Zhou
, Xin Zeng
, Xuchen Liu
, Jianjie Fang
, Chen Gao
, Jinqiang Cui
, Yong Li
, Xinlei Chen
, Xiao-Ping Zhang
:
Open3D-VQA: A Benchmark for Embodied Spatial Concept Reasoning with Multimodal Large Language Model in Open Space. 12784-12791 - Liang Yao
, Fan Liu
, Shengxiang Xu
, Chuanyi Zhang
, Shimin Di
, Xing Ma
, Jianyu Jiang
, Zequan Wang
, Jun Zhou
:
UEMM-Air: Enable UAVs to Undertake More Multi-modal Tasks. 12792-12798 - Minghao Zou
, Qingtian Zeng
, Yongping Miao
, Shangkun Liu
, Zilong Wang
, Hantao Liu
, Wei Zhou
:
PhysLab: A Benchmark Dataset for Multi-Granularity Visual Parsing of Physics Experiments. 12799-12806 - Zheng Liu, Hao Liang, Bozhou Li, Wentao Xiong, Chong Chen, Conghui He, Wentao Zhang, Bin Cui:
SynthVLM: Towards High-Quality and Efficient Synthesis of Image-Caption Datasets for Vision-Language Models. 12807-12814 - Wuxia Zhang
, Yang Xin
, Shibo Lv
, Xin Zhang
, Xiang Zhong
, Jianmin Jiang
:
EEG-Face: A Facial-Image Stimulated EEG Data-Set for Analysis of Brain Perceived Multimedia. 12815-12821 - Stefan J. Arzberger
, Paul Raith
, Werner Bailer
, Marion Jaks
:
A Dataset and Metric for Textual Video Content Description. 12822-12828 - Lei Zhang
, Xin Zhou
, Chaoyue He
, Di Wang
, Yi Wu
, Hong Xu
, Wei Liu
, Chunyan Miao
:
MMESGBench: Pioneering Multimodal Understanding and Complex Reasoning Benchmark for ESG Tasks. 12829-12836 - Tan Yue
, Xuzhao Shi
, Rui Mao
, Zilong Song
, Zonghai Hu
, Dongyan Zhao
:
AnaFig: A Human-Aligned Dataset for Scientific Figure Analysis. 12837-12843 - Zhihua Wang
, Weixia Zhang
, Wei Zhou
, Xiaohong Liu
, Guangtao Zhai
, Patrick Le Callet
:
Evaluating Perceptual Color Preferences in Smartphone Photography: Dataset and Challenges. 12844-12850 - Liang Cheng
, Hao Wang
, Chenwei Wu, Haochen You
, Xianhao Wu
:
Unlocking Joint Image Deraining and Low-Light Enhancement: Benchmark and Baseline. 12851-12858 - SungHyun Moon
, Aidyn Zhakatayev
, SeungJae Lee
:
HAN: Korean Heritage Augmented Narrative Visual-Language Description Dataset. 12859-12866 - Ziliang Gan
, Dong Zhang
, Haohan Li
, Yang Wu
, Xueyuan Lin
, Ji Liu
, Haipang Wu
, Chaoyou Fu
, Zenglin Xu
, Rongjunchen Zhang
, Yong Dai
:
MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning. 12867-12874 - Maksim Golyadkin
, Innokentiy Humonen
, Valeria Rubanova
, Danil Kalin
, Ianis Plevokas
, Dmitry Nikolotov
, Aleksandr Utkov
, Nikita Sidelnikov
, Petr Ivanov
, Ekaterina Bureeva
, Ekaterina Alexandrova
, Ilya Makarov
:
MuMMy: Multimodal Dataset supporting VLM-based Egyptology Research Assistant. 12875-12881 - Bate Li
, Houqiang Zhong
, Zhengxue Cheng
, Qiang Hu
, Qiang Wang
, Li Song
, Wenjun Zhang
:
MultiEgo: A Multi-View Egocentric Video Dataset for 4D Scene Reconstruction. 12882-12889 - Shaokai Wu
, Yapan Guo
, Yanbiao Ji
, Jing Tong
, Yuxiang Lu
, Mei Li
, Suizhi Huang
, Yue Ding
, Hongtao Lu
:
MORE: Multi-Organ Medical Image REconstruction Dataset. 12890-12896 - Minyi Zhao
, Yi Liu
, Wensong He
, Bingzhe Yu
, Yuxi Mi
, Shuigeng Zhou
:
Towards High Robust Vision-Language Large Models: Benchmark and Method. 12897-12904 - Xing Zi
, Jinghao Xiao
, Yunxiao Shi
, Xian Tao
, Jun Li
, Ali Braytee
, Mukesh Prasad
:
RSVLM-QA: A Benchmark Dataset for Remote Sensing Vision Language Model-based Question Answering. 12905-12911 - Bichen Wang
, Yixin Sun
, Yanyan Zhao
, Bing Qin
:
Beyond Snapshots: A Multimodal User-Level Dataset for Depression Detection in Dynamic Social Media Streams. 12912-12918 - Chenxi Wang
, Jizhan Fang
, Xiang Chen
, Bozhong Tian
, Ziwen Xu
, Huajun Chen
, Ningyu Zhang
:
ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems. 12919-12926 - Mingsong Yang
, Xinhong Hei
, Kehai Chen
, Haining Meng
, Haoyang Dong
, Qin Zhao
:
BIMCompNet: Multimodal Dataset for Geometric Deep Learning in Building Information Model. 12927-12933 - Chunyi Li
, Xiele Wu
, Haoning Wu
, Donghui Feng
, Zicheng Zhang
, Guo Lu
, Xiongkuo Min
, Xiaohong Liu
, Guangtao Zhai
, Weisi Lin
:
Towards a New Paradigm of Visual Signal Compression. 12934-12941 - Hao Liang
, Linzhuang Sun
, Minxuan Zhou
, Zirong Chen
, Meiyi Qiang
, Mingan Lin
, Tianpeng Li
, Fan Yang
, Zenan Zhou
, Wentao Zhang
:
MathScape: Benchmarking Multimodal Large Language Models in Real-World Mathematical Contexts. 12942-12948 - Janet Wang
, Xin Hu
, Yunbei Zhang
, Diabate Almamy
, Vagamon Bamba
, Konan Amos Sébastien Koffi
, Koffi Aubin Yao
, Zhengming Ding
, Jihun Hamm
, Rie Roselyne Yotsu
:
eSkinHealth: A Multimodal Dataset for Neglected Tropical Skin Diseases. 12949-12956 - Cong Cai
, Shan Liang
, Xuefei Liu
, Kang Zhu
, Zhengqi Wen
, Jianhua Tao
, Heng Xie
, Jizhou Cui
, Yiming Ma
, Zhenhua Cheng
, Hanzhe Xu
, Ruibo Fu
, Bin Liu
, Yongwei Li
:
MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics. 12957-12964 - Xiaoyu Guo
, Pengzhi Zhong
, Hao Zhang
, Defeng Huang
, Huikai Shao
, Qijun Zhao
, Shuiwang Li
:
Camouflaged Object Tracking: A Benchmark. 12965-12972 - Wentao Mo
, Qingchao Chen
, Yuxin Peng
, Siyuan Huang
, Yang Liu
:
Advancing 3D Scene Understanding with MV-ScanQA Multi-View Reasoning Evaluation and TripAlign Pre-training Dataset. 12973-12980 - Tianlong Yu
, Chenghang Ye
, Zheyu Yang
, Ziyi Zhou
, Cui Tang
, Zui Tao
, Jun Zhang
, Kailong Wang
, Liting Zhou
, Yang Yang
, Ting Bi
:
SEAR: A Multimodal Dataset for Analyzing AR-LLM-Driven Social Engineering Behaviors. 12981-12987 - Mohammad Hossein Izadimehr
, Milad Ghanbari
, Guodong Chen
, Wei Zhou
, Xiaoshuai Hao
, Mallesham Dasari
, Christian Timmerer
, Hadi Amirpour
:
SVD: Spatial Video Dataset. 12988-12994 - Felix Immohr
, Gareth Rendle
, Annika Neidhardt
, Anton Benjamin Lammert
, Bernd Froehlich
, Alexander Raake
:
ICS-MR: Interactive Conversation Scenarios for Assessment of Mixed Reality Communication. 12995-13001 - Ruixu Zhang
, Yuran Wang
, Xinyi Hu
, Chaoyu Mai
, Wenxuan Liu
, Danni Xu
, Xian Zhong
, Zheng Wang
:
Beyond the Individual: Introducing Group Intention Forecasting with SHOT Dataset. 13002-13008 - Yuzhuo Li
, Di Zhao
, Tingrui Qiao
, Yihao Wu
, Bo Pang
, Yun Sing Koh
:
MetaWild: A Multimodal Dataset for Animal Re-Identification with Environmental Metadata. 13009-13015 - Zhixia Zhao
, Qiyue Li
, Jie Li
, Richang Hong
, Zhi Liu
:
ViewGauss: A Head Movement Dataset for 6DoF Gaussian Splatting Video Viewing. 13016-13022 - Jianqiang Xiao
, Yuexuan Sun
, Yixin Shao
, Boxi Gan
, Rongqiang Liu
, Yanjin Wu
, Weili Guan
, Xiang Deng
:
UAV-ON: A Benchmark for Open-World Object Goal Navigation with Aerial Agents. 13023-13029 - Zhende Song
, Chenchen Wang
, Jiamu Sheng
, Chi Zhang
, Shengji Tang
, Jiayuan Fan
, Tao Chen
:
DreamFrame: Enhancing Video Understanding via Automatically Generated QA and Style-Consistent Keyframes. 13030-13037 - Yifan Wang
, Jie Gui
, Baosheng Yu
, Qi Li
, Zhenan Sun
, Juho Kannala
, Guoying Zhao
:
FingerVeinSyn-5M: A Million-Scale Dataset and Benchmark for Finger Vein Recognition. 13038-13045 - Peirong Zhang
, Yidan Zhang
, Hanru Shi
, Dianyu Wang
, Xiaoxuan Liu
, Lei Wang
:
Referring Multi-Object Tracking in Satellite Videos: A New Benchmark and Baseline. 13046-13052 - Konstantin Egorov
, Stepan Botman
, Pavel Blinov
, Galina Zubkova
, Anton Ivaschenko
, Alexander Kolsanov
, Andrey V. Savchenko
:
Gaze into the Heart: A Multi-View Video Dataset for rPPG and Health Biomarkers Estimation. 13053-13059 - Bowen Yuan
, Selena Song
, Javier Fernandez
, Yadan Luo
, Mahsa Baktashmotlagh
, Zijian Wang
:
WisWheat: A Three-Tiered Vision-Language Dataset for Wheat Management. 13060-13067 - Zecheng Zhao
, Selena Song
, Tong Chen
, Zhi Chen
, Shazia Sadiq
, Yadan Luo
:
Are Synthetic Videos Useful? A Benchmark for Retrieval-Centric Evaluation of Synthetic Videos. 13068-13074 - Mohammad Ghasempour
, Hadi Amirpour
, Christian Timmerer
:
Nature-1k: The Raw Beauty of Nature in 4K at 60FPS. 13075-13081 - Sowmya Vijayakumar
, Tong Xue
, Abdallah El Ali
, Irene Viola
, Ronan Flynn
, Peter Corcoran
, Pablo César
, Niall Murray
:
RCQoEA-360VR: Real-time Continuous QoE Scores for HMD-based 360° VR Dataset. 13082-13088 - Yulin Sun
, Qisheng Xu
, Yi Su
, Qian Zhu
, Yong Dou
, Xinwang Liu
, Kele Xu
:
AudioSet-R: A Refined AudioSet with Multi-Stage LLM Label Reannotation. 13089-13096 - Ines Riahi
, Abduljalil Radman
, Zixin Guo
, Rachid Hedjam
, Jorma Laaksonen
:
Valor32k-AVQA v2.0: Open-Ended Audio-Visual Question Answering Dataset and Benchmark. 13097-13103 - Alessandro Ragano
, Carl Timothy Tolentino
, Kata Szita
, Dan Barry
, Davoud Shariat Panah
, Niall Murray
, Andrew Hines
:
EgoMusic: An Egocentric Augmented Reality Glasses Dataset for Music. 13104-13111 - Guillaume Gautier
, Xuemei Zhou
, Thong Nguyen
, Jack Jansen
, Louis Fréneau
, Marko Viitanen
, Uyen Phan
, Jani Käpylä, Irene Viola
, Alexandre Mercat
, Pablo César
, Jarno Vanne
:
UVG-CWI-DQPC: Dual-Quality Point Cloud Dataset for Volumetric Video Applications. 13112-13118 - Hieu Nguyen
, Phuc-Tan Nguyen
, Thien-Phuc Tran
, Minh-Quang Nguyen
, Tam V. Nguyen
, Minh-Triet Tran
, Trung-Nghia Le
:
OpenEvents V1: Large-Scale Benchmark Dataset for Multimodal Event Grounding. 13119-13125 - Zihao Ding
, Cheng-Tse Lee
, Mufeng Zhu
, Tao Guan
, Yuan-Chun Sun
, Cheng-Hsin Hsu
, Yao Liu
:
EyeNavGS: A 6-DoF Navigation Dataset and Record-n-Replay Software for Real-World 3DGS Scenes in VR. 13126-13132 - Yang Yao
, Lingyu Li
, Jiaxin Song
, Chiyu Chen
, Zhenqi He
, Yixu Wang
, Xin Wang
, Tianle Gu
, Jie Li
, Yan Teng
, Yingchun Wang
:
Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes? 13133-13140 - Sahar Nasirihaghighi
, Negin Ghamsarian
, Leonie Peschek
, Matteo Munari
, Heinrich Husslein
, Raphael Sznitman
, Klaus Schoeffmann
:
GynSurg: A Comprehensive Gynecology Laparoscopic Surgery Dataset. 13141-13147 - Shifu Xiong
, Hang Chen
, Shi Cheng
, Kai Shen
, Hengshun Zhou
, Genshun Wan
, Chenyue Zhang
, Kewei Li
, Jun Du
, Lirong Dai
:
MISP-QEKS: A Large-Scale Dataset with Multimodal Cues for Query-by-Example Keyword Spotting. 13148-13155 - Jinke Li
, Jiarui Yu
, Chenxing Wei
, Hande Dong
, Qiang Lin
, Liangjing Yang
, Zhicai Wang
, Yanbin Hao
:
UniSVG: A Unified Dataset for Vector Graphic Understanding and Generation with Multimodal Large Language Models. 13156-13163 - Debora Russo
, Nicola Mazzocca
, Valeria Vittorini
:
UR-MAT: A Multimodal, Material-Aware Synthetic Dataset of Urban Scenarios. 13164-13169 - Gabriele Magrini
, Niccolò Marini, Federico Becattini
, Lorenzo Berlincioni
, Niccolò Biondi
, Pietro Pala
, Alberto Del Bimbo
:
FRED: The Florence RGB-Event Drone Dataset. 13170-13176 - Yuchen Zhang
, Tailin Chen
, Jiangbei Yue
, Yueming Sun
, Rahul Singh
, Jianbo Jiao
, Zeyu Fu
:
DeHate: A Holistic Hateful Video Dataset for Explicit and Implicit Hate Detection. 13177-13183 - Guangxun Zhu
, Shiyu Fan
, Hang Dai
, Edmond S. L. Ho
:
Waymo-3DSkelMo: A Multi-Agent 3D Skeletal Motion Dataset for Pedestrian Interaction Modeling in Autonomous Driving. 13184-13190 - Negin Ghamsarian
, Raphael Sznitman
, Klaus Schoeffmann
, Jens Kowal
:
WetCat: Enabling Automated Skill Assessment in Wet-Lab Cataract Surgery Videos. 13191-13197 - Zijing Zhao
, Zhu Xu
, Qingchao Chen
, Yuxin Peng
, Yang Liu
:
Investigating Domain Gaps for Indoor 3D Object Detection. 13198-13205 - Mitsuki Watanabe
, Sosuke Amano
, Kiyoharu Aizawa
, Yoko Yamakata
:
FoodLogAthl-218: Constructing a Real-World Food Image Dataset Using Dietary Management Applications. 13206-13212 - Bo Liu
, Xiangyu Zhao
, Along He
, Yidi Chen
, Huazhu Fu
, Xiao-Ming Wu
:
GEMeX-RMCoT: An Enhanced Med-VQA Dataset for Region-Aware Multimodal Chain-of-Thought Reasoning. 13213-13220 - Tariq Al Shoura
, Ali Mollaahmadi Dehaghi
, Reza Razavi
, Mohammad Moshirpour
:
VIDEA-8K-60FPS Dataset: 8K 60FPS Video Sequences for Analysis and Development. 13221-13227 - Linlin Zong
, Shilin Sui
, Wenjun Liang
, Wanyu Song
, Linlin Tian
, Xinyue Liu
, Xianchao Zhang
, Bo Xu
:
CH-SV: A Benchmark for Multi-Type Chinese Harmful Short Video Detection. 13228-13234 - Changtao Miao
, Yi Zhang
, Man Luo
, Weiwei Feng
, Kaiyuan Zheng
, Qi Chu
, Tao Gong
, Jianshu Li
, Yunfeng Diao
, Wei Zhou
, Joey Tianyi Zhou
, Xiaoshuai Hao
:
MFFI: Multi-Dimensional Face Forgery Image Dataset for Real-World Scenarios. 13235-13242 - Tung-Lam Ngo
, Ba-Hoang Tran
, Duy-Cat Can
, Trung-Hieu Do
, Oliver Y. Chén
, Hoang-Quynh Le
:
MUDI: A Multimodal Biomedical Dataset for Understanding Pharmacodynamic Drug-Drug Interactions. 13243-13249 - Keyue Shi
, Qianqian Shen
, Zhaoming Ye
, Liangjun Jiang
, Jiajun Bu
, Haishuai Wang
:
LUMOS: A Lumbar Multimodal Osteoporosis Screening Dataset with X-ray and CT images. 13250-13257 - Parul Gupta
, Shreya Ghosh
, Tom Gedeon
, Thanh-Toan Do
, Abhinav Dhall
:
Multiverse Through Deepfakes: The MultiFakeVerse Dataset of Person-Centric Visual and Conceptual Manipulations. 13258-13265 - Chenxi Wang
, Yusheng Dai
, Lei Sun
, Jun Du
, Jianqing Gao
:
AudioAtlas: A Comprehensive and Balanced Benchmark Towards Movie-Oriented Text-to-Audio Generation. 13266-13272 - Bobo Li
, Yuheng Wang
, Hao Fei
, Juncheng Li
, Wei Ji
, Mong-Li Lee
, Wynne Hsu
:
FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling Agents. 13273-13280 - Tuan M. Dang
, Theron S. Wang
, Hridayesh Lekhak
, Kenny Q. Zhu
:
EmotionalCanines: A Dataset for Analysis of Arousal and Valence in Dog Vocalization. 13281-13288 - Siqi Chen
, Xinyu Dong
, Haolei Xu
, Xingyu Wu
, Fei Tang
, Hang Zhang
, Yuchen Yan
, Linjuan Wu
, Wenqi Zhang
, Guiyang Hou
, Yongliang Shen
, Weiming Lu
, Yueting Zhuang
:
SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation. 13289-13296 - Xiangnan Chen
, Yuancheng Fang
, Juncheng Li
, Qian Xiao
, Jun Lin
, Siliang Tang
, Yueting Zhuang
:
Chart-HQA: A Benchmark for Hypothetical Question Answering in Charts. 13297-13303 - Han Wang
, Zhuoran Wang
, Roy Ka-Wei Lee
:
HateClipSeg: A Segment-Level Annotated Dataset for Fine-Grained Hate Video Detection. 13304-13310 - Deqiang Yin
, Junyi Guo
, Huanda Lu
, Fangyu Wu
, Dongming Lu
:
EditGarment: An Instruction-Based Garment Editing Dataset Constructed with Automated MLLM Synthesis and Semantic-Aware Evaluation. 13311-13317 - Nora Hofer
, Rainer Böhme:
Challenging Cases of Neural Image Compression: A Dataset of Visually Compelling Yet Semantically Incorrect Reconstructions. 13318-13324 - Ruoxi Chen
, Dongping Chen
, Siyuan Wu
, Sinan Wang
, Shiyun Lang
, Peter Sushko
, Gaoyang Jiang
, Yao Wan
, Ranjay Krishna
:
MultiRef: Controllable Image Generation with Multiple Visual References. 13325-13331 - Peng Wang
, Minh Huy Pham
, Zhihao Guo
, Wei Zhou
:
A Spatial Relationship Aware Dataset for Robotics. 13332-13338 - Yang Li
, Tingfa Xu
, Shuyan Bai
, Peifu Liu
, Jianan Li
:
MCOD: The First Challenging Benchmark for Multispectral Camouflaged Object Detection. 13339-13345 - Jieyu Li
, Xin Zhang
, Joey Tianyi Zhou
:
AEGIS: Authenticity Evaluation Benchmark for AI-Generated Video Sequences. 13346-13353 - Chunyi Li
, Bo Hu
, Taiyang Chen
, Leida Li
, Lihuo He
, Xinbo Gao
:
Low-light Image Enhancement Quality Assessment: A Real-World Dataset and An Objective Method. 13354-13361 - Jan Zdenek
, Wataru Shimoda
, Kota Yamaguchi
:
OTR: Synthesizing Overlay Text Dataset for Text Removal. 13362-13368 - Hridayesh Lekhak
, Theron S. Wang
, Tuan M. Dang
, Kenny Q. Zhu
:
DogSpeak: A Canine Vocalization Classification Dataset. 13369-13375 - Sijing Wu
, Yunhao Li
, Huiyu Duan
, Yanwei Jiang
, Yucheng Zhu
, Guangtao Zhai
:
HVEval: Towards Unified Evaluation of Human-Centric Video Generation and Understanding. 13376-13383 - Wenxu Gao
, Liang Xie
, Kangli Wang
, Jingxuan Su
, Changhao Peng
, Wei Gao
:
DPCSet: A Large-scale Dynamic Point Cloud Dataset for Compression and Perception. 13384-13390 - Chenglin Wang
, Yucheng Zhou
, Qianning Wang
, Zhe Wang
, Kai Zhang
:
ComplexBench-Edit: Benchmarking Complex Instruction-Driven Image Editing via Compositional Dependencies. 13391-13397 - Haohui Li
, Bowen Qu
, Wei Gao
:
T23D-QA: An Open Dataset and Benchmark for Text-driven 3D Generation Quality Assessment. 13398-13404 - Aleksandr Gushchin
, Maksim Smirnov
, Dmitriy S. Vatolin
, Anastasia Antsiferova
:
LEHA-CVQAD: Dataset To Enable Generalized Video Quality Assessment of Compression Artifacts. 13405-13412 - Jianing Jin
, Jiangyong Ying
, Huiyu Duan
, Liu Yang
, Sijing Wu
, Yunhao Li
, Yushuo Zheng
, Xiongkuo Min
, Guangtao Zhai
:
RGC-VQA: An Exploration Database for Robotic-Generated Video Quality Assessment. 13413-13420 - Jiahao Lin
, Weixuan Peng
, Bojia Zi
, Yifeng Gao
, Xianbiao Qi
, Xingjun Ma
, Yu-Gang Jiang
:
BrokenVideos: A Benchmark Dataset for Fine-Grained Artifact Localization in AI-Generated Videos. 13421-13427 - Nickolay Safonov
, Rakhmanov Mikhail
, Dmitriy S. Vatolin
:
Screen Content Video Dataset and Benchmark. 13428-13434 - Shuo Yang
, Yuqin Dai
, Guoqing Wang
, Xinran Zheng
, Jinfeng Xu
, Jinze Li
, Zhenzhe Ying
, Weiqiang Wang
, Edith C. H. Ngai
:
RealFactBench: A Benchmark for Evaluating Large Language Models in Real-World Fact-Checking. 13435-13441 - Bei Yan
, Zhiyuan Chen
, Yuecong Min
, Jie Zhang
, Jiahao Wang
, Xiaozhen Wang
, Shiguang Shan
:
SHALE: A Scalable Benchmark for Fine-grained Hallucination Evaluation in LVLMs. 13442-13449 - Changsheng Gao
, Wei Zhou
, Guosheng Lin
, Weisi Lin
:
Compressed Feature Quality Assessment: Dataset and Baselines. 13450-13456 - Heng Er Metilda Chee
, Jiayin Wang
, Zhiqiang Guo
, Weizhi Ma
, Min Zhang
:
Small Stickers, Big Meanings: A Multilingual Sticker Semantic Understanding Dataset with a Gamified Approach. 13457-13463 - Yuhang Hu
, Zhenyu Yang
, Shihan Wang
, Shengsheng Qian
, Bin Wen
, Fan Yang
, Tingting Gao
, Changsheng Xu
:
StreamingCoT: A Dataset for Temporal Dynamics and Multimodal Chain-of-Thought Reasoning in Streaming VideoQA. 13464-13470
Demos and Videos
- Masatoshi Hamanaka
:
Implementation of Visualizer for Beats and Scratches. 13471-13473 - Aishan Liu
, Jiakai Wang
, Tianyuan Zhang
, Hainan Li
, Jiangfan Liu
, Siyuan Liang
, Yilong Ren
, Xianglong Liu
, Dacheng Tao
:
MetAdv: A Unified and Interactive Adversarial Testing Platform for Autonomous Driving. 13474-13476 - Ziqin Wang
, Jinyu Chen
, Xiangyi Zheng
, Qinan Liao
, Linjiang Huang
, Si Liu
:
'Hi AirStar, Guide Me to the Badminton Court.'. 13477-13479 - Liang Xu
, Songkai Jia
, Cathal Gurrin
, Allie Tran
:
SLIVeR: A Narrative VR Experience for Immersive Lifelog Exploration. 13480-13482 - Mohan Zhang
, Qianqian Hu
, Chuhan Li
, Yanxiu Dan
, Shenglan Cui
, Fang Liu
:
An Aesthetic Cultural Relic Poster Generation Framework Based on Multi-target Learning and Multimodal Large Language Model. 13483-13485 - Chaolong Yang
, Yinuo Guo
, Kai Yao
, Yuyao Yan
, Jie Sun
, Kaizhu Huang
:
KDTalker++: Controllable Talking Portrait Generation with Audio, Text, and Expression Editing. 13486-13488 - Wei Cai
, Jian Zhao
, Yuchu Jiang
, Tianle Zhang
, Xuelong Li
:
Safe Semantics, Unsafe Interpretations: Tackling Implicit Reasoning Safety in Large Vision-Language Models. 13489-13491 - Rongyu Zhang
, Zhanbin Hu
, Jiamu Wang
, Qiang Zhu
:
Omni-LLaMA-AD: A Unified Model for Open-Set Visual Anomaly Detection. 13492-13494 - Xiao Chen
, Wenrui He
, Meng Wang
, Zhanbin Hu
, Chaoquan Shen
, Qiang Zhu
:
PrivEdit: A Zero-Shot Interactive Image Privacy Editing System. 13495-13497 - Zhaofan Qiu
, Zijian Gong
, Yingwei Pan
, Ting Yao
, Tao Mei
:
Talk, Imagine, Evolve: A Unified Multimodal Agent for Seamless Visual Generation and Editing. 13498-13500 - Mikhail Mozikov
, Daniil Orekhov
, Ivan Nasonov
, Konstantin Baltsat
, Vladislav Pedashenko
, Dmitrii Abramov
, Nikita Severin
, Yury Maximov
, Andrey V. Savchenko
, Ilya Makarov
:
HL-EAI: A Multimodal Framework Enabling Emotional Reciprocity in Human-AI Strategic Decision-Making. 13501-13503 - Hadi Amirpour
, Doris Putzgruber-Adamitsch
, Klaus Schoeffmann
:
Depth-Enabled Inspection of Medical Videos. 13504-13506 - Yaojie Li
, Yiheng Zhang
, Zhaofan Qiu
, Yingwei Pan
, Wu Liu
, Ting Yao
, Tao Mei
:
Edit-by-Example: Adaptive Exemplar-Based Image Editing. 13507-13509 - Tan-Hiep To
, Duy-Khang Nguyen
, Minh-Triet Tran
, Trung-Nghia Le
:
Streamlining Virtual KOL Generation Through Modular Generative AI Architecture. 13510-13512 - Andreas Babic
, Xihui Chen
, Djordje Slijepcevic
, Adrian Jaques Böck, Matthias Zeppelzauer
:
CounterHelp: Promoting Online Civil Courage Among Young People Through AI-Generated Counterspeech. 13513-13515 - Nils Riekers
, Marten Risius
, Tong Chen
:
MAXplain: A Multi-Agent System for Interactive Multimodal Hate Speech Detection. 13516-13518 - Anvar Iskhakov
, Viktor Kovalev
, Vladislav Naumov
, Ilya Makarov
:
Real-Time SSL Sperm Whale Click Detector: Interactive Web Demo. 13519-13521 - Igor Abramov
:
EEG-Driven Image Reconstruction with Saliency-Guided Diffusion Models. 13522-13524 - Ashan Perera
, Md Eimran Hossain Eimon
, Juan Merlos
, Velibor Adzic
, Hari Kalva
, Borko Furht
:
FCM-RT: Real-Time Feature Coding for Machines. 13525-13527 - Michael Francis Perez
, Yichi Yang
, Yuheng Zha
, Enze Ma
, Danish Nisar Ahmed Tamboli, Haodi Ma
, Reza Shahriari
, Vyom Pathak
, Dzmitry Kasinets
, Rohith Venkatakrishnan
, Daisy Zhe Wang
, Jaime Ruiz
, Eric D. Ragan
, Zhiting Hu
, Eric P. Xing
, Jun-Yan Zhu
:
CReLeRI: Explainable, Concept-centric, Representation, Learning, Reasoning, and Interaction Video Analysis System. 13528-13530 - Jungsu Kim
, Jungwoo Huh
, Yeseung Park
, Seongjean Kim
, Jeongwook Choi
, Sanghoon Lee
:
Permission to Dance: An End-to-End Dance Enhancement System from Dance Capture to Analysis. 13531-13533 - Nhu-Binh Nguyen Truc
, Nhu-Vinh Hoang
, Tam V. Nguyen
, Minh-Triet Tran
, Trung-Nghia Le
:
Advancing Fashion Design Through Intelligent Sketchpad Studio. 13534-13536 - Ruifan Ji
, Mingyuan Wu
, Bo Chen
, Michael Zink
, Ramesh K. Sitaraman
, Jacob Chakareski
, Klara Nahrstedt
:
Anywhere Avatar: 3D Telepresence with Just a Phone and a Laptop. 13537-13539 - Peng Jin
, Yilin Wen
, Mingzhe Yu
, Yunshan Ma
, Rong Zheng
, Jintu Fan
, Chong Wah Ngo
:
GenWardrobe: A Fully Generative System for Travel Fashion Wardrobe Construction. 13540-13542 - Milad Ghanbari
, Wei Zhou
, Cosmin Stejerean
, Christian Timmerer
, Hadi Amirpour
:
SDART: Spatial Dart AR Simulation with Hand-Tracked Input. 13543-13545 - Alexander Filonenko
, Ilya Makarov
, Andrey V. Savchenko
:
FaceCluster: Interactive Photo Organization with Enhanced Face Recognition. 13546-13548 - Jinzhao Zhou
, Daniel Leong
, Zehong Cao
, Thomas Do
, Sheng-Fu Liang
, Tzyy-Ping Jung
, Chin-Teng Lin
:
MindSpeak: A Real-Time BCI System for Silent Speech. 13549-13551 - Zhifei Xie
, Hu Zongzheng
, Guibin Zhang
, Jialin Zhang
, Yue Liao
, Chunyan Miao
, Shuicheng Yan
:
Pask: Providing Answer before AsKing toward Proactive AI agent. 13552-13554 - Qinglan Wei
, Ruiqi Xue
, Mingyue Liao
, Long Ye
:
Event Chain-Driven Communication Strategy Generation for News Videos. 13555-13557 - Shiqin Liu
, Minjun Zhao
, Jiajun Bu
:
'What Can I Cook?' LetMeCook: An LLM-Based Interactive System for Personalized Recipe Generation. 13558-13560
Doctoral Symposium
- Yuan-Chun Sun
:
Streaming 3DGS Virtual Worlds in 6DoF over Next-Generation Networks. 13561-13565 - João Diogo
:
Enhancing Sports Experiences Through Video-Based Interactions. 13566-13570 - Zhucun Xue
:
Multi-Modal Retrieval Augmented Visual Understanding and Generation. 13571-13575
Interactive/Digital Art
- Masatoshi Hamanaka
, Gou Koutaki
:
RoboSax Melody Slot Machine. 13576-13578 - Jinfan Liu
, Zhangli Hu
, Hanqi Chen
, Ye Chen
, Bingbing Ni
, Shuicheng Yan
:
AR 2O Painter: An Artistic Oriented Realtime Realistic Oil Painting Agent Powered by Efficient Fluid Simulation. 13579-13581 - Raphaëlle Lemaire, Azamat Kaibaldiyev
, Eléonore Mariette, Débora Viglieri, Alexis Lechervy
, Fabrice Maurel
, Gaël Dias, Jérémie Pantin
, Gaëtane Blaizot, Véronique Agin, Nicolas Poirel
, Eric Bui
, Hervé Platel, Denis Vivien
, Youssef Chahir
:
WYSIWYG: What You See Is Where Your Gaze. 13582-13584 - Xuanyang Huang
, Wei Huang
:
Mirage. 13585-13586 - Mingdong Song
, Yufei Huang
:
Through The Mirage, Sky Meets Oculus: Rethinking Human-AI Romantic Relationships in a Posthumanist Context. 13587-13588 - Yifan Chen
:
Echoes of the Creator: An Immersive VR System for Spatial Storytelling and Empathy Towards Co-Creation. 13589-13591 - Meichun Cai
, Yiou Wang
:
Mixanthropy: Holographic Metamorphic Clouds. 13592-13593 - Tianxing Zhou
, Chengkai Xu
, Xinyue Yao
:
So Long: Interactive Storytelling, Embodying Collective Historical Memory, and Participatory Archiving in a VR Voyage. 13594-13596 - Xia Liu
, Xiao Zhang
:
Rhythm Gate: Invisible Conversations in the Elevator - Echoes of Material, Behavior, Memory and Transformation. 13597-13599 - Jean-Denis Durou
, Jean Mélou
, Yvain Quéau
, Gilles Azzaro
, Hugo Pauget Ballesteros
, Gabriel Gournay
, Achille Jeanvoine
, Clément Lacire, Floriane Payen
, Julie Remenant
:
Transform your Smartphone in a Real-time Sonagram Player. 13600-13602 - Zheyu Feng
, Boya Liu
, Zhonghe Ruan
, Xinyi Zhang
, Zihan Gao
:
Reconstructing the Experience of Nüshu Culture: An Exploration via Multimodal Mixed Reality Systems. 13603-13605 - Anna Borou Yu
, Jiajian Min
:
Embodied Ink: A Multisensory Reinterpretation of Chinese Calligraphy Through Digital Twins and Immersive Realities. 13606-13607
Open Source Software
- Kaiyu Li
, Jiawei Jiang
, Chengxi Han
, Yupeng Deng
, Keyan Chen
, Zhuo Zheng
, Hao Chen
, Ziyuan Liu
, Yuantao Gu
, Zhengxia Zou
, Zhenwei Shi
, Sheng Fang
, Deyu Meng
, Zhi Wang
, Xiangyong Cao
:
Open-CD: A Comprehensive Toolbox for Change Detection. 13608-13612 - Huiming Zheng
, Wei Gao
:
OpenMVC: An Open-Source Library for Learning-based Multi-view Compression. 13613-13617 - Huiming Zheng
, Linjie Zhou
, Wei Gao
:
SCID-Compress900: A Multi-Scene Dataset of 4K and 1080P Screen Content Images for Image Compression Research. 13618-13622 - Andrew C. Freeman
, Luke Reinkensmeyer
:
adder-viz: Real-Time Visualization Software for Transcoding Event Video. 13623-13627 - Tao Zhou
, Lingyu Shu
, Zixing Zhang
, Jing Han
:
Tyee: A Unified, Modular, and Fully-Integrated Configurable Toolkit for Intelligent Physiological Health Care. 13628-13631 - Cheng Zhu
, Jing Han
, Qianshuai Xue
, Kehan Wang
, Huan Zhao
, Zixing Zhang
:
AudioFab: Building A General and Intelligent Audio Factory through Tool Learning. 13632-13635 - Qi Cai
, Yehao Li
, Yingwei Pan
, Ting Yao
, Tao Mei
:
HiDream-I1: An Open-Source High-Efficient Image Generative Foundation Model. 13636-13639 - Pablo Alonso-Jiménez
, Pedro Ramoneda
, Recep Oguz Araz
, Andrea Poltronieri
, Dmitry Bogdanov
:
OMAR-RQ: Open Music Audio Representation Model Trained with Multi-Feature Masked Token Prediction. 13640-13643 - Luca Rossetto
, Florian Ruosch
:
MeGraS: An Open-Source Store for Multimodal Knowledge Graphs. 13644-13647 - Travis Seng
, Axel Carlier
, Wei Tsang Ooi
:
Video Lecture Analysis Toolkit: An Open-Source Framework for Interactive Learning. 13648-13651 - Ralph Gasser
, Rahel Arnold
, Laura Rettig
, Heiko Schuldt
, Raphael Waltenspül
, Luca Rossetto
:
Open-Source Multimedia Retrieval with vitrivr-engine. 13652-13655 - Snehil Kumar
, Neil Vaughan
, Zeyu Fu
, Heather Wilson
:
PySimPace v2.0: An Easy-to-Use Simulation Tool with Machine Learning Pipelines for Realistic MRI Motion Artifact Generation. 13656-13659 - Minsoo Park, Youngkwon Lim, Yangwoo Kim, Sam Richards, Min Woo Park, Kwang Pyo Choi:
OpenAPV: Open Collaborative Innovation in Professional Video Ecosystem. 13660-13663 - Mario Leopold
, Farzad Tashtarian
, Klaus Schoeffmann
:
diveXplore - An Open-Source Software for Modern Video Retrieval with Image/Text Embeddings. 13664-13668
Reproducibility Companion Papers
- Jingjing Hu
, Dan Guo
, Meng Wang
, Jiaxi Li
, Fei Liu
:
Reproducibility Companion Paper: Maskable Retentive Network for Video Moment Retrieval. 13669-13672 - Lorenzo Catania
, Dario Allegra
, Luigi Capogrosso
, Thu Nguyen
:
Reproducibility Companion Paper: NIF: A Fast Implicit Image Compression with Bottleneck Layers and Modulated Sinusoidal Activations. 13673-13676 - Hamed Alimohammadzadeh
, Shahram Ghandeharizadeh
, Federico Cunico
, Joshua Springer
:
Reproducibility Companion Paper: Swarical: An Integrated Hierarchical Approach to Localizing Flying Light Specks. 13677-13681 - Zhiyu Zhu
, Zhibo Jin
, Jiayu Zhang
, Fang Chen
, Jianlong Zhou
, Vijay John
, Florian Spiess
:
Reproducibility Companion Paper: Enhancing Model Interpretability with Local Attribution over Global Exploration. 13682-13685
Grand Challenges
- Zhixi Cai
, Kartik Kuckreja
, Shreya Ghosh
, Akanksha Chuchra
, Muhammad Haris Khan
, Usman Tariq
, Tom Gedeon
, Abhinav Dhall
:
AV-Deepfake1M++: A Large-Scale Audio-Visual Deepfake Benchmark with Real-World Perturbations. 13686-13691 - Xuecheng Wu
, Heli Sun
, Danlei Huang
, Xinyi Yin
, Yifan Wang
, Hao Wang
, Jia Zhang
, Fei Wang
, Peihao Guo
, Suyu Xing
, Junxiao Xue
, Liang He
:
HOLA: Enhancing Audio-visual Deepfake Detection via Hierarchical Contextual Aggregations and Efficient Pre-training. 13692-13699 - Nicholas Klein
, Hemlata Tak
, James Fullwood
, Krishna Regmi
, Leonidas Spinoulas
, Ganesh Sivaraman
, Tianxiang Chen
, Elie Khoury
:
Pindrop it! Audio and Visual Deepfake Countermeasures for Robust Detection and Fine-Grained Localization. 13700-13706 - Ivan Kukanov
, Jun Wah Ng
:
KLASSify to Verify: Audio-Visual Deepfake Detection Using SSL-based Audio and Handcrafted Visual Features. 13707-13713 - Sebastiano Battiato
, Mirko Casu
, Francesco Guarnera
, Luca Guarnera
, Giovanni Puglisi
, Orazio Pontorno
, Claudio Vittorio Ragaglia
, Zahid Akhtar
:
Adversarial Attacks on Deepfake Detectors: A Challenge in the Era of AI-Generated Media (AADD-2025). 13714-13719 - Nicolas Göller
, Lukas Graner
, Raphael Antonius Frick
, Niklas Bunzel
:
Team RoMa @ AADD-2025: On the Generation of Transferable and Visually Imperceptible Adversarial Attacks Against Deepfake Detectors. 13720-13724 - Gaozheng Pei
, Ke Ma
, Dongpeng Zhang
, Chengzhi Sun
, Qianqian Xu
, Qingming Huang
:
A Unified Framework for Stealthy Adversarial Generation via Latent Optimization and Transferability Enhancement. 13725-13729 - Wonjune Seo
, Joonhyuk Baek
, Yeseong Jung
, Saerom Park:
MIG-COW: Transferable Adversarial Attacks on Deepfake Detectors via Gradient Decomposition. 13730-13736 - Yiheng Zhang
, Zhaofan Qiu
, Qi Cai
, Yehao Li
, Fuchen Long
, Yingwei Pan
, Ting Yao
, Tao Mei
:
Identity-Preserving Video Generation Challenge. 13737-13742 - Yuji Wang
, Moran Li
, Xiaobin Hu
, Ran Yi
, Jiangning Zhang
, Han Feng
, Weijian Cao
, Yabiao Wang
, Chengjie Wang
, Lizhuang Ma
:
Identity-Preserving Text-to-Video Generation Guided by Simple yet Effective Spatial-Temporal Decoupled Representations. 13743-13750 - Jiayi Gao
, Changcheng Hua
, Qingchao Chen
, Yuxin Peng
, Yang Liu
:
Identity-Preserving Text-to-Video Generation via Training-Free Prompt, Image, and Guidance Enhancement. 13751-13757 - Jiahao Xu
, Jianjie Luo
, Zhenguo Yang
:
Improving Identity Preservation in Video Generation with Multi-Branch Models. 13758-13765 - Jie Yang
, Shien Song
, Jin Chen
, Haoyuan Xie
, Han Qi
, Yifei Xue
, Yizhen Lao
:
ACM Multimedia 2025 Grand Challenge report for Image-to-Video Generation Model Acceleration. 13766-13770 - Daichi Sato
, Duc Minh Vo
, Khan Md. Anwarus Salam
, Hidenori Shoji
, Yuma Matsuoka
, Takara Taniguchi
, Kaito Baba
, Hideki Nakayama
:
LAVA Grand Challenge 2025: Benchmarking Japanese-English Document Understanding with Large Vision-Language Models. 13771-13776 - Haoxuan Li
, Wei Song
, Aofan Liu
, Peiwu Qin
:
AdaDocVQA: Adaptive Framework for Long Document Visual Question Answering in Low-Resource Settings. 13777-13783 - Ao Zhou
, Zebo Gu
, Tenghao Sun
, Jiawen Chen
, Mingsheng Tu
, Zifeng Cheng
, Yafeng Yin
, Zhiwei Jiang
, Qing Gu
:
Hierarchical Vision-Language Reasoning for Multimodal Multiple-Choice Question Answering. 13784-13790 - Mizuki Yamano
, Keito Fukuoka
, Hisashi Miyamori
:
Two-Stage Approach Using Pretrained Language Models for Question Answering on Japanese Document Images. 13791-13796 - Dong Chen
, Fei Gao
, Zhengqing Hu
, Xiaojun Chang
:
MIRAGE25: ACM MM25 Multimodal Interleaved Reasoning and Generation Challenge. 13797-13798 - Angelos Vlachos
, Giorgos Filandrianos
, Maria Lymperaiou
, Nikolaos Spanos
, Ilias Mitsouras
, Vasileios Karampinis
, Athanasios Voulodimos
:
Analyze-Prompt-Reason: A Collaborative Agent-Based Framework for Multi-Image Vision-Language Reasoning. 13799-13805 - Jun Yu
, Xilong Lu
, Cong Wang
, Qiang Ling
:
LVLM-MIR: Large Vision-Language Model with Parameter-Efficient Fine-Tuning for Multimodal Interleaved Reasoning. 13806-13812 - Takahiro Komamizu
, Marc A. Kastner
, Yasutomo Kawanishi
, Trung Thanh Nguyen
, Junan Chen
:
IntentVC 2025: The ACM Multimedia Grand Challenge on Intention-Oriented Controllable Video Captioning. 13813-13814 - Zhipeng Yu
, Qianqian Xu
, Yangbangyan Jiang
, Pinci Yang
, Qingming Huang
:
MGVC: MLLM-Guided Video Captioning for the IntentVC Challenge. 13815-13821 - Tianheng Qiu
, Jingchun Gao
, Jingyu Li
, Huiyi Leong
, Xuan Huang
, Xi Wang
, Xiaocheng Zhang
, Kele Xu
, Lan Zhang
:
IntentVCNet: Bridging Spatio-Temporal Gaps for Intention-Oriented Controllable Video Captioning. 13822-13829 - Jun Yu
, Xilong Lu
, Yunxiang Zhang
, Qiang Ling
:
CMA-VC: Large Vision-Language Model for Cross-Modal Alignment in Intention-Oriented Video Captioning. 13830-13836 - Zheng Lian
, Rui Liu
, Kele Xu
, Bin Liu
, Xuefei Liu
, Yazhou Zhang
, Xin Liu
, Yong Li
, Zebang Cheng
, Haolin Zuo
, Ziyang Ma
, Xiaojiang Peng
, Xie Chen
, Ya Li
, Erik Cambria
, Guoying Zhao
, Björn W. Schuller, Jianhua Tao
:
MER 2025: When Affective Computing Meets Large Language Models. 13837-13842 - Xuerui Cheng
, Feng Chen
, Jun Xie
, Kanokphan Lertniphonphan
, Yi Liu
, Zhepeng Wang
:
Personality Prediction via Multimodal Fusion with Sentiment Analysis Enhancement. 13843-13847 - Yuesheng Huang
, Jinming Liu
, Jiajia Chen
, Yihang Lin
, Yanmei Chen
, Jianwei Dong
:
Affective-CoT: Decomposing Multimodal Emotion Reasoning through a Hierarchical Cognitive Workflow. 13848-13855 - Yanjie Sun
, Wuyang Chen
, Yong Dou
:
UniEmotion: A Unified Framework for Multimodal Emotion Recognition with Iterative Consensus-based Training. 13856-13863 - Zhengqin Lai
, Zhilin Zhu
, Xiaopeng Hong
, Yaowei Wang
:
Agent-MER: A Cognitive Agent with Hierarchical Deliberation for Open-Vocabulary Multimodal Emotion Recognition. 13864-13871 - Xudong Han
, Kai Liu
, Yanlin Li
, Hao Li
, Zheng Wang
:
The ACM Multimedia 2025 Grand Challenge of Truthful and Responsible Multimodal Learning. 13872-13873 - Hoang Chu
, Huy Chu
, Tan-Minh Nguyen
, Son T. Luu
, Cuong Hoang
, Hiep Nguyen
, Vu Tran
, Le Minh Nguyen
:
DeepSIX at ACM MM 2025 Grand Challenge: Enhancing Context Text Processing for Multimodal Hallucination Detection and Fact Verification. 13874-13880 - Zijian Zhang
, Xuecheng Wu
, Danlei Huang
, Siyu Yan
, Chong Peng
, Xuezhi Cao
:
HKD4VLM: A Progressive Hybrid Knowledge Distillation Framework for Robust Multimodal Hallucination and Factuality Detection in VLMs. 13881-13887 - Shuoping Yang
, Jun Yu
:
Unified Dual-Strategy Framework for Multi-Task Visual Question Answering. 13888-13894 - Tianyi Zhang
, Tianhua Qi
, Antonis Koutsoumpis
, Yuan Zong
, Wenming Zheng
, Janneke K. Oostrom
, Djurre Holtrop
, Zhaojie Luo
, Reinout E. de Vries
:
Assessing Personality Traits and Interview Performance from Asynchronous Video Interviews. 13895-13900 - Jia Li
, Yichao He
, Jiacheng Xu
, Tianhao Luo
, Zhenzhen Hu
, Richang Hong
, Meng Wang
:
Traits Run Deep: Enhancing Personality Assessment via Psychology-Guided LLM Representations and Multimodal Apparent Behaviors. 13901-13908 - Jia Li
, Yang Wang
, Wenhao Qian
, Jialong Hu
, Zhenzhen Hu
, Richang Hong
, Meng Wang
:
Listening to the Unspoken: Exploring '365' Aspects of Multimodal Interview Performance Assessment. 13909-13916 - Longjiang Yang
, Cong Yu
, Chenxi Huang
, Fengyu Zhang
, Ran Liu
, Zhuofan Wen
, Shun Chen
, Hailiang Yao
, Bin Liu
, Zheng Lian
, Jianhua Tao
:
Enhancing Multimodal Personality Assessment with LLM-Augmented Hierarchical Fusion. 13917-13923 - Changzeng Fu
, Zelin Fu
, Qi Zhang
, Xinhe Kuang
, Jiacheng Dong
, Kaifeng Su
, Yikai Su
, Wenbo Shi
, Junfeng Yao
, Yuliang Zhao
, Shiqi Zhao
, Jiadong Wang
, Siyang Song
, Chaoran Liu
, Yuichiro Yoshikawa
, Björn W. Schuller, Hiroshi Ishiguro
:
The First MPDD Challenge: Multimodal Personality-aware Depression Detection. 13924-13929 - Fangyuan Liu
, Sirui Zhao
, Kang Yin
, Tong Xu
, Enhong Chen
:
DepFormer: A Unified Framework with Bimodal Collaborative Transformer for Depression Detection. 13930-13936 - Hanlei Shi
, Yu Liu
, Haoxun Li
, Yuxuan Ding
, Jiaxi Hu
, Leyuan Qu
, Taihao Li
:
HOPE: Hierarchical Fusion for Optimized and Personality-Aware Estimation of Depression. 13937-13943 - Yuyun Liu
, Kaifei Zhang
, Yinghao Ma
, Xiaolin Xu
, Tianhua Qi
, Wenming Zheng
, Cheng Lu
, Yuan Zong
:
Multi-Level Segment Fusion Based on Adaptive Time-Window Selection for Multimodal Personality-Aware Elderly Depression Detection. 13944-13950 - Xinqi Fan
, Jingting Li
, John See
, Moi Hoon Yap
, Wen-Huang Cheng
, Xiaobai Li
, Xiaopeng Hong
, Su-Jing Wang
, Adrian K. Davison
:
MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question Answering. 13951-13956 - Lingsi Zhu
, Yanjun Chi
, Jun Yu
, Gongpeng Zhao
, Yuefeng Zou
, Fengzhao Sun
, Xilong Lu
:
HierMEQA: A Relationship-Aware Hierarchical Framework for Consistent Micro-Expression Visual Question Answering. 13957-13963 - Zizheng Guo
, Bochao Zou
, Yinuo Jia
, Xiangyu Li
, Huimin Ma
:
Boosting Micro-Expression Analysis via Prior-Guided Video-Level Regression. 13964-13971 - Yujing Wang
, Ruotong Fang
, Xing Huang
, Zhiyuan Han
, Xiaoqing Lin
, Yuhao Shan
, Tong Chen
:
Emotion-Qwen-VL: A Fully Fine-Tuned Multimodal Large Language Model for Micro-Expression Visual Question Answering. 13972-13978 - Siyang Song
, Micol Spitale
, Xiangyu Kong
, Hengde Zhu
, Cheng Luo
, Cristina Palmero
, Germán Barquero
, Sergio Escalera
, Michel F. Valstar
, Mohamed Daoudi
, Tobias Baur
, Fabien Ringeval
, Andrew Howes
, Elisabeth André
, Hatice Gunes
:
REACT 2025: the Third Multiple Appropriate Facial Reaction Generation Challenge. 13979-13984 - Qirong Mao
, Qiwei Wu
, Na Liu
, Yakui Ding
, Lijian Gao
:
Scattering-Conditioned Diffusion Models for Multiple Appropriate Facial Reaction Generation. 13985-13991 - Jiajian Huang
, Zitong Yu
:
Multiple Appropriate Facial Reaction Generation Based on Multi-View Transformation of Speaker Video. 13992-13996 - Peng Wang
, Pujun Xue, Xiaofeng Liu
, Tongjuan Ji
:
Explaining Listener Reactions: Personality-Guided Facial Response Generation with Cross-Modal Attention. 13997-14003 - Han Zhang
, Hao Fei
, Hong Han
, Lizi Liao
, Erik Cambria
, Min Zhang
:
The ACM Multimedia 2025 Grand Challenge of Avatar-based Multimodal Empathetic Conversation. 14004-14005 - Ronghao Lin
, Shuai Shen
, Weipeng Hu
, Qiaolin He
, Aolin Xiong
, Li Huang
, Haifeng Hu
, Yap-Peng Tan
:
E3RG: Building Explicit Emotion-driven Empathetic Response Generation System with Multimodal Large Language Model. 14006-14013 - Keqi Chen, Wenxin Fu, Qihang Lu
, Zekai Sun
, Yizhong Geng, Yi Liu, Puyuan Guo
, Yingming Gao
, Ya Li:
EMO-Avatar: An LLM-Agent-Orchestrated Framework for Multimodal Emotional Support in Human Animation. 14014-14020 - Chenhao Dang
, Zeyuan Zhu
:
MERIA: Empathetic Response Generation via Parallel Disentanglement and Uncertainty-Gated Fusion. 14021-14027 - Duc-Tien Dang-Nguyen
, Morten Dahlback Langfeldt
, Henrik Brattli Vold
, Silje Førsund, Minh-Son Dao
, Sohail Ahmed Khan
, Kha-Luan Pham
, Marc Gallofré Ocaña
, Minh-Triet Tran
, Anh-Duy Tran
:
The 2025 Grand Challenge on Multimedia Verification: Foundations and Overview. 14028-14033 - Huy Hoan Le
, Van Sy Thinh Nguyen
, Thi Le Chi Dang
, Vo Thanh Khang Nguyen
, Truong Thanh Hung Nguyen
, Hung Cao
:
Multimedia Verification Through Multi-Agent Deep Research Multimodal Large Language Models. 14034-14040 - Minh-Anh Pham
, Anh-Tai Pham-Nguyen
, Anh-Duy Le
, Duc-Tuan Luu
, Thanh-Hai Tran
, Anh-Duy Tran
, Duc-Tien Dang-Nguyen
:
Ægis: AI-Enhanced OSINT for Multimedia Verification. 14041-14047 - Van-Hoang Phan
, Tung-Duong Le-Duc
, Long-Khanh Pham
, Anh-Thu Le
, Quynh-Huong Dinh-Nguyen
, Dang-Quan Vo
, Hoang-Quoc Nguyen-Son
, Anh-Duy Tran
, Dang Vu
, Minh-Son Dao
:
Fact-Checking at Scale: Multimodal AI for Authenticity and Context Verification in Online Media. 14048-14054 - Bo Wu
, Peiye Liu
, Qiushi Huang
, Zhaoyang Zeng
, Jia Wang
, Bei Liu
, Jiebo Luo
, Wen-Huang Cheng
:
SMPV: Social Media Prediction for Videos. 14055-14057 - Ao Zhou
, Mingsheng Tu
, Luping Wang
, Tenghao Sun
, Zifeng Cheng
, Yafeng Yin
, Zhiwei Jiang
, Qing Gu
:
Cross-Modal Prototype Augmentation and Dual-Grained Prompt Learning for Social Media Popularity Prediction. 14058-14065 - Yan Zhuang, Wei Bai, Yanru Zhang, Minhao Liu, Jiawen Deng, Fuji Ren:
FAME: Fusion-Aware Multi-modal Ensemble for Social Media Popularity Prediction. 14066-14072 - Wenzheng Hou
, Weixin Li
:
Modality-Aligned Hierarchical Attention Network for Multi-Modal Popularity Prediction on Social Media. 14073-14078 - Liliang Ye
, Yunyao Zhang
, Yafeng Wu
, Yi-Ping Phoebe Chen
, Junqing Yu
, Wei Yang
, Zikai Song
:
MVP: Winning Solution to SMP Challenge 2025 Video Track. 14079-14085 - Kele Xu
, Qisheng Xu
, Binli Luo
, Han Zhou
, Zengming Lin
, Hui Geng
, Xianhan Tan
:
Higher-Order Vision-Language Fusion for Video Popularity Prediction. 14086-14093 - Chia-Ming Lee
, Bo-Cheng Qiu
, Cheng-Jun Kang
, Yi-Hsuan Wu
, Jun-Lin Chen
, Yu-Fan Lin
, Yi-Shiuan Chou
, Chih-Chung Hsu
:
Anchoring Trends: Mitigating Social Media Popularity Prediction Drift via Feature Clustering and Expansion. 14094-14100 - Meng Luo
, Hao Fei
, Bobo Li
, Shengqiong Wu
, Qian Liu
, Soujanya Poria
, Erik Cambria
, Mong-Li Lee
, Wynne Hsu
:
The ACM Multimedia 2025 Grand Challenge of Multimodal Conversational Aspect-based Sentiment Analysis. 14101-14106 - Zhiqiang Gao
, Shihao Gao
, Zixing Zhang
, Yihao Guo
, Hongyu Chen
, Jing Han
:
Structured Prompting and LLM Ensembling for Multimodal Conversational Aspect-based Sentiment Analysis. 14107-14113 - Xinjing Liu
, Pengyue Lin
, Xinyu Tu
, Wenqi Jia
, Chen Jiang
, Ruifan Li
:
SDG-MLLM: Injecting Structured Dialogue Graphs into MLLM for Multimodal Conversational Aspect-Based Sentiment Analysis. 14114-14121 - Deyuan Chen
, Xiaocui Yang
, Shi Feng
, Zihan Cheng
, Daling Wang
, Yifei Zhang
:
A Two-Stage Full Fine-Tuning and LLM Post-processing Framework for MCABSA. 14122-14129 - Shiye Cao
, Maia Stiber
, Amama Mahmood
, Maria Teresa Parreira
, Wendy Ju
, Micol Spitale
, Hatice Gunes
, Chien-Ming Huang
:
ERR@HRI 2.0 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Conversations. 14130-14135 - Rutherford Agbeshi Patamia
, Ha Pham Thien Dinh
, Ming Liu
, Akansel Cosgun
:
Beyond Technical Failures: Multimodal Time-Series Modelling for Detecting Social Breakdowns and User Repair Attempts in Human-Robot Interaction. 14136-14142 - Xun Jiang
, Shuangle Li
, Chong Liu
, Xing Xu
:
Multimodal Time Series Alignment for Error Detection in Human Robot Interactions. 14143-14149 - Daksitha Senel Withanage Don
, Marius Funk
, Michal Balazia
, Huajian Qiu
, Shogo Okada
, François Brémond, Jan Alexandersson
, Andreas Bulling
, Elisabeth André
, Philipp Müller
:
MultiMediate '25: Cross-cultural Multi-domain Engagement Estimation. 14150-14155 - Jun Yu
, Xilong Lu
, Lingsi Zhu
, Qiang Ling
:
LVLM-HBA: Large Vision-Language Model with Cross-Modal Alignment for Human Behavior Analysis. 14156-14162 - Yuefeng Zou
, Hui Zhang
, Jun Yu
, Keda Lu
, Lingsi Zhu
, Fengzhao Sun
, Bo Wang
, Kun Yao
, Jianqing Sun
, Jiaen Liang
:
Heterogeneous Encoder Fusion with KAN Decoder for Group Engagement Modeling via 8× Sliding Pipelines. 14163-14169 - Yangchen Yu
, Yin Chen
, Jia Li
, Peng Jia
, Yu Zhang
, Li Dai
, Zhenzhen Hu
, Meng Wang
, Richang Hong
:
Generalizable Engagement Estimation in Conversation via Domain Prompting and Parallel Attention. 14170-14177 - Trong-Thuan Nguyen
, Viet-Tham Huynh
, Thao Thi Phuong Dao
, Mai-Khiem Tran
, Ha Nguyen Thi
, Tien To Vu Thuy
, Uyen Hanh Tran
, Tam V. Nguyen
, Minh-Triet Tran
, Thanh Dinh Le
:
ACM Multimedia Grand Challenge on ENT Endoscopy Analysis. 14178-14183 - Trong-Nhan Nguyen
, Luan L. M. Nguyen
, Phat-Dat To
, Tran-Quoc Duy Nguyen
, Anh-Huy Nguyen
, Tuan Pham-Dang
, Chu Lam Nguyen
, Duy V. M. Nguyen
:
HyMoENet: Mixture-of-Experts Enhanced CNN-Transformer Hybrid Framework for Classifying Anatomical Sites in Endoscopic ENT Images. 14184-14189 - Khoa Tran
, Linh Ly
, Duy Khanh Ho
, Ngoc Hoang Luong
:
Enhancing Endoscopic Image Retrieval via Self-Supervised Learning and Large VLM-Based Re-ranking. 14190-14196 - Y. Hop Nguyen
, Doan Anh Phan Huu
, Trung Thai Tran
, Nhat Nam Mai
, Van Toi Giap
, Thao Thi Phuong Dao
, Trung-Nghia Le
:
Multi-Level CLS Token Fusion for Contrastive Learning in Endoscopy Image Classification. 14197-14203 - Shreya Bansal
, Ruchi Bhatt
, Amanpreet Chander
, Rupinder Kaur
, Malya Singh
, Mohan Kankanhalli
, Abdulmotaleb El Saddik
, Mukesh Saini
:
GroMo25: ACM Multimedia 2025 Grand Challenge for Plant Growth Modeling with Multiview Images. 14204-14209 - Robin-Nico Kampa
, Fabian Deuser
, Konrad Habel
, Norbert Oswald
:
ViewSparsifier: Killing Redundancy in Multi-View Plant Phenotyping. 14210-14215 - Kun Li
, Dan Guo
, Xiaobai Li
, Haoyu Chen
, Pengyu Liu
, Fei Wang
, Jingjing Hu
, Guoying Zhao
, Meng Wang
:
MAC 2025: The 2nd Micro-Action Analysis Grand Challenge. 14216-14221 - Qiankun Li
, Qiupu Chen
, Huabao Chen
, Feng He
, Depeng Li
, Zhigang Zeng
:
Progressive Large-Scale Modeling via Temporal-Spatial Focus Connector for Micro-Action Recognition. 14222-14228 - Chuang Wang
, Weidong Chen
, Xu Cui
, Yiming Zhao
, Zhaobo Qi
, Pengqi Huang
, Xinyan Liu
, Weigang Zhang
:
Combatting Data Imbalance and Noise in Micro-Action Recognition. 14229-14235 - Zhichao Xia
, Yichi Zhang
, Yanjun Chi
, Lingsi Zhu
, Mohan Jing
, Jun Yu
:
Hierarchical Multi-Feature Extraction and Aggregation for Micro-Action Recognition. 14236-14243 - Thien-Phuc Tran
, Minh-Quang Nguyen
, Minh-Triet Tran
, Tam V. Nguyen
, Trong-Le Do
, Duy-Nam Ly
, Viet-Tham Huynh
, Khanh-Duy Le
, Mai-Khiem Tran
, Trung-Nghia Le
:
Event-Enriched Image Analysis Grand Challenge At ACM Multimedia 2025. 14244-14249 - Nam-Quan Nguyen
, Minh-Hoang Le
, Vinh-Toan Vong
, Minh-Triet Tran
:
ENRIC: EveNt-AwaRe Captioning with Image Retrieval via UnCertainty-Guided Re-ranking and Semantic Ensemble Reasoning. 14250-14256 - Dinh-Khoi Vo
, Van-Loc Nguyen
, Minh-Triet Tran
, Trung-Nghia Le
:
EVENT-Retriever: Event-Aware Multimodal Image Retrieval for Realistic Captions. 14257-14263 - Thinh-Phuc Nguyen
, Thanh-Hai Nguyen
, Gia-Huy Dinh
, Lam-Huy Nguyen
, Minh-Triet Tran
, Trung-Nghia Le
:
ReCap: Event-Aware Image Captioning with Article Retrieval and Semantic Gaussian Normalization. 14264-14270 - Luca Rossetto
, Werner Bailer
, Cathal Gurrin
, Duc-Tien Dang-Nguyen
, Klaus Schoeffmann
, Allie Tran
:
Overview of the First CASTLE Grand Challenge at ACM Multimedia 2025. 14271-14272 - Omar Shahbaz Khan
, Ujjwal Sharma
, Gonçalo Marcelino, Aaron Duane
, Stevan Rudinac
, Marcel Worring
, Björn Þór Jónsson
:
Interactive Retrieval System for Multi-Stream Collections: multiXview at CASTLE 2025 Interactive Grand Challenge. 14273-14279 - Quang-Linh Tran
, Hoang-Bao Le
, Thang-Long Nguyen-Ho
, Graham Healy
, Liting Zhou
, Allie Tran
:
Extending Lifelog Retrieval to Multi-stream Video Retrieval at the CASTLE Challenge 2025. 14280-14285
Workshop Summaries
- Valérie Gouet-Brunet
, Edgar Roman-Rangel
, Li Weng
:
SUMAC '25: 7th Workshop on analySis, Understanding and proMotion of heritAge Contents: Advances in Machine Learning, Signal Processing, Multimodal Techniques and Human-machine Interaction. 14286-14287 - Rainer Lienhart
, Thomas B. Moeslund
, Hideo Saito
:
8th ACM International Workshop on Multimedia Content Analysis in Sports (ACM MMSports'25). 14288-14290 - Aik Beng Ng
, Yethoven Tukimin
, Jeannie S. Lee
, Megani Rajendran
, Chek Tien Tan
, Indriyati Atmosukarto
:
Intelligent Immersification in the Metaverse: AI-Driven Immersive Multimedia. 14291-14292 - Wei Zhou
, Hadi Amirpour
, Li Yu
, Jungong Han
, Richang Hong
, Paul L. Rosin
:
MCHM25: Multimedia Computing for Health and Medicine. 14293-14295 - Wei Jiang
, Zhenghao Chen
, Dong Xu
:
(RichMediaGAI'25) 3rd International Workshop on Rich Media with Generative AI. 14296-14298 - Tai Tan Mai
, Allie Tran
, Quang-Linh Tran
, An Nguyen
, Hoang Nguyen
, Tho Quan, Duc-Tien Dang-Nguyen
, Cathal Gurrin
:
AIQAM'25: The 2nd ACM Workshop on AI-powered Question Answering Systems for Multimedia. 14299-14301 - Amit Kumar Jaiswal
, Thomas Mandl
, Gautam Kishore Shahi
, Durgesh Nandini
, Haiming Liu
:
DHOW '25: 2nd International Workshop on Diffusion of Harmful Content on Online Web. 14302-14304 - Taras Kucherenko
, Alice Delbosc
, Rajmund Nagy
, Laura B. Hensel
, Youngwoo Yoon
, Oya Çeliktutan
, Gustav Eje Henter
:
GENEA Workshop 2025: The 6th Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents. 14305-14307 - Sherzod Hakimov
, David Semedo
, Eric Müller-Budack
, Marc A. Kastner
, Takahiro Komamizu
:
MUWS 2025: The 4th International Workshop on Multimodal Human Understanding for the Web and Social Media. 14308-14310 - Ziyu Wei
, Luting Wang
, Chen Gao
, Hongliang Huang
, Jiaqi Liu
, Li Wen
, Si Liu
:
RoboSoft'25: The 1st International Workshop on Vision-Language in Soft Robot. 14311-14313 - Zheng Lian
, Shreya Ghosh
, Erik Cambria
, Zhixi Cai
, Guoying Zhao
, Abhinav Dhall
, Björn W. Schuller
, Roland Goecke
, Jianhua Tao
, Tom Gedeon
:
MRAC 2025: 3rd International Workshop on Multimodal, Generative and Responsible Affective Computing. 14314-14316 - Sebastiano Battiato
, Mirko Casu
, Francesco Guarnera
, Luca Guarnera
, Giovanni Puglisi
, Orazio Pontorno
, Claudio Vittorio Ragaglia
, Zahid Akhtar
:
(DFF '25) 1st Deepfake Forensics Workshop: Detection, Attribution, Recognition, and Adversarial Challenges in the Era of AI-Generated Media. 14317-14319 - Zheng Wang
, Qianqian Chen
, Yiyang Luo
, Zhiqiu Ye
, Shi Wei
, Hanwang Zhang
, Tat-Seng Chua
:
Large Generative Models Meet Multimodal Applications (LGM3A). 14320-14322 - Hao Fei
, Bobo Li
, Meng Luo
, Qian Liu
, Lizi Liao
, Fei Li
, Min Zhang
, Björn W. Schuller
, Mong-Li Lee
, Erik Cambria
:
CogMAEC'25: The 1st Workshop on Cognition-oriented Multimodal Affective and Empathetic Computing. 14323-14325 - Wei Gao
, Sam Kwong
, Zhu Li
, Shan Liu
, Ge Li
:
APP3DV'25: ACM Multimedia - International Workshop on Application-driven Point Cloud Processing and 3D Vision. 14326-14328 - Cheng Jin
, Mingli Song
, Rui Wang
, Xingjiao Wu
:
McGE '25: The 3rd International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice. 14329-14330 - Tiesong Zhao
, Qian Liu
, Zhisheng Yan
:
MSMA'2025: The 1st International Workshop on Multi-Sensorial Media and Applications. 14331-14332 - Irene Viola
, Silvia Rossi
, Marta Orduna
, Maria Torres Vega
:
IXR '25: 3rd International Workshop on Interactive eXtended Reality. 14333-14334 - Lipika Dey
, Marianna Obrist
, Stavroula G. Mougiakakou
:
MMFood'25: 1st International Workshop on Multi-modal Food Computing. 14335-14337
Tutorials
- Siru Zhong
, Xixuan Hao
, Hao Miao
, Yan Zhao
, Qingsong Wen
, Roger Zimmermann
, Yuxuan Liang
:
Multimodal Learning for Spatio-Temporal Data Mining. 14338-14339 - Wei Zhou
, Hadi Amirpour
:
Perceptual Visual Quality Assessment in Multimedia Communication. 14340-14341 - Sarmistha Das
, Akash Ghosh
, Sriparna Saha
, Koustava Goswami
, K. J. Joseph
:
Reasoning and Planning for Multimodal Large Language Models: A Multilingual and Cross-Domain Exploration. 14342-14343 - Qiang Sheng
, Peng Qi
, Tianyun Yang
, Yuyan Bu
, Wynne Hsu
, Mong-Li Lee
, Juan Cao
:
Combating Online Misinformation Videos: Characterization, Detection, and Prevention. 14344-14345 - Yicong Li
, Junbin Xiao
, Angela Yao
, Tat-Seng Chua
:
Video Question Answering and Beyond. 14346-14347 - Wei Gao
, Ge Li
:
AI-based Multimedia Data Compression: Perception Utility Optimization and Standardization. 14348-14349
Industrial Demonstrations and Expert Talks
- Jianquan Liu
, Balu Adsumilli
, Yukiko Yanagawa
, Haiwei Dong
:
An Innovative Industry Program on Multimedia in A New AI Era. 14351-14352 - Tomoya Sawada
:
How Generative AI Understands the Balance of Energy, Efficiency, and Human Experience. 14353 - Guan-Ming Su
:
Video Content Restoration in the Wild: Challenges and Opportunities. 14354 - Guoming Wang
:
MedAI Hub: A Multimodal Medical Data Platform with Evolutionary Image Enhancement and Graph-Driven Literature Retrieval. 14355 - Aleksandr Farseev
:
Will AI Make Agencies Obsolete? Rethinking the Future of Advertising. 14356 - Aleksandr Farseev
:
SOMIN: An Explainable AI and LLM Platform for Real-Time, Data-Driven Digital Marketing Strategy. 14357 - Wenbing Zhu
, Mingmin Chi
, Bo Peng
:
A Streamlined System for Multimodal Industrial Anomaly Detection via 2D and 3D Feature Fusion. 14358 - Goutham Vignesh
, Harikrishnan P. M.
, Siddartha Reddy
, Saisubramaniam Gopalakrishnan
, Vishal Vaddina
:
IDPFlow: A No-Code Agentic Framework for Multimodal Intelligent Document Processing. 14359-14360 - Roberto Iacoviello
, Alberto Ciprian
, Alberto Messina
, Maurizio Montagnuolo
, Davide Zappia
:
XReco Platform and RAI News Media Demonstrator. 14361 - Claudio Baecchi
, Matteo Bruni
, Fabio Clabot
, Marco Bertini
:
Real-time GenAI Solutions for Video Streaming in Low-bandwidth Settings. 14362-14363 - Yasunori Mochizuki
:
Solving Critical Real-World Business Challenges - NEC's Industrial Research Model in the AI Era. 14364 - Christoph Bregler
:
Media integrity and literacy in the age of GenAI & Deepfakes. 14365 - Benoit Huet
:
Advancing Lung Cancer Diagnosis with eyonis® LCS. 14366-14367 - Keisuke Nonaka
:
Research and Standardization Trends in Compression and Transmission Technologies for 3D Point Cloud. 14368 - Terumi Umematsu
:
To Advance People's Well-Being: Human health sensing, analysis, and applications. 14369 - Xin Li
:
Spark LLM and the Scientific Research it empowers: Practice and Thoughts. 14370 - Ting Yao
:
Multimodal Content Creation, Consumption and Distribution. 14371 - Yasuhiro Fujiwara
:
Toward Fast and Exact Machine Learning Platform for Big Data. 14372 - Maneesh Kumar Singh
:
Sovereign & Shared: Frugally Scalable Multilingual-Multimodal AI for Bharat. 14373 - Balu Adsumilli
, Jianle Chen
, In Suk Chong
, Yilin Wang
:
Google Industry Seminar: Video Processing in the New Age of AI. 14374-14375
Grand Challenges
- Longfeng Chen
, Zheng Xiao
, Juyuan Wang
, Zeyu Huang
, Yawen Zeng
, Jin Xu
:
HEAR: A Holistic Extraction and Agentic Reasoning Framework for Document Understanding. 14376-14382