32nd MM 2024: Melbourne, VIC, Australia
- Jianfei Cai, Mohan S. Kankanhalli, Balakrishnan Prabhakaran, Susanne Boll, Ramanathan Subramanian, Liang Zheng, Vivek K. Singh, Pablo César, Lexing Xie, Dong Xu:
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024 - 1 November 2024. ACM 2024, ISBN 979-8-4007-0686-8
Keynote Talks
- Pascale Fung:
From Assistants to Agents in the LLM Era. 1
- Benoit Huet:
Revolutionizing Lung Cancer Diagnostics with eyonis TM LCS: Cutting-edge AI/ML Technology-based SaMD for Enhanced Patient Care. 2-3
- Judy Kay:
Empowering People to Harness and Control their Multimodal Data in Scrutable User models. 4-5
- Jiebo Luo:
Large Multimodal Models as Social Multimedia Analysis Engines. 6-7
Oral Session 1: Large Language Models & Applications 1
- Haicheng Liao, Yongkang Li, Chengyue Wang, Yanchen Guan, Kahou Tam, Chunlin Tian, Li Li, Chengzhong Xu, Zhenning Li:
When, Where, and What? A Benchmark for Accident Anticipation and Localization with Large Language Models. 8-17
- Haonan Zheng, Xinyang Deng, Wen Jiang, Wenrui Li:
A Unified Understanding of Adversarial Vulnerability Regarding Unimodal Models and Vision-Language Pre-training Models. 18-27
- Xiang Fang, Wanlong Fang, Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou, Renfu Li, Zichuan Xu, Lixing Chen, Panpan Zheng, Yu Cheng:
Not All Inputs Are Valid: Towards Open-Set Video Moment Retrieval using Language. 28-37
- Huishan Ji, Qingyi Si, Zheng Lin, Weiping Wang:
Towards Flexible Evaluation for Generative Visual Question Answering. 38-47
- Jiaqi Zhu, Shaofeng Cai, Fang Deng, Beng Chin Ooi, Junran Wu:
Do LLMs Understand Visual Anomalies? Uncovering LLM's Capabilities in Zero-shot Anomaly Detection. 48-57
- Yudong Li, Xianxu Hou, Dezhi Zheng, Linlin Shen, Zhe Zhao:
FLIP-80M: 80 Million Visual-Linguistic Pairs for Facial Language-Image Pre-Training. 58-67
Oral Session 2: Large Language Models & Applications 2
- Esmée Henrieke Anne de Haas, Lik-Hang Lee, Yiming Huang, Carlos Bermejo, Pan Hui, Zijun Lin:
Towards Trustworthy MetaShopping: Studying Manipulative Audiovisual Designs in Virtual-Physical Commercial Platforms. 68-77
- Weiqi Li, Shijie Zhao, Bin Chen, Xinhua Cheng, Junlin Li, Li Zhang, Jian Zhang:
ResVR: Joint Rescaling and Viewport Rendering of Omnidirectional Images. 78-87
- Yunqiang Pei, Kaiyue Zhang, Hongrong Yang, Yong Tao, Qihang Tang, Jialei Tang, Guoqing Wang, Zhitao Liu, Ning Xie, Peng Wang, Yang Yang, Hengtao Shen:
Improving Interaction Comfort in Authoring Task in AR-HRI through Dynamic Dual-Layer Interaction Adjustment. 88-97
- Yang Lu, Junxian Li, Zhitong Cui, Jiapeng Hu, Yanna Lin, Shijian Luo:
Designing Spatial Visualization and Interactions of Immersive Sankey Diagram in Virtual Reality. 98-107
- Zhang Wan, Sheng Tang, Jiawei Wei, Ruize Zhang, Juan Cao:
DragEntity: Trajectory Guided Video Generation using Entity and Positional Relationships. 108-116
- Kento Shigyo, Yifan Cao, Kentaro Takahira, Mingming Fan, Huamin Qu:
VR-Mediated Cognitive Defusion: A Comparative Study for Managing Negative Thoughts. 117-126
Oral Session 3: Novel Multimedia Applications 1
- Yinxuan Gui, Bin Zhu, Jingjing Chen, Chong Wah Ngo, Yu-Gang Jiang:
Navigating Weight Prediction with Diet Diary. 127-136
- Feiyu Chen, Cong Xu, Qi Jia, Yihua Wang, Yuhan Liu, Haotian Zhang, Endong Wang:
Egocentric Vehicle Dense Video Captioning. 137-146
- Jinyue Chen, Lingyu Kong, Haoran Wei, Chenglong Liu, Zheng Ge, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang:
OneChart: Purify the Chart Structural Extraction via One Auxiliary Token. 147-155
- Jiawei Lin, Zhaoyun Jiang, Jiaqi Guo, Shizhao Sun, Ting Liu, Zijiang Yang, Jian-Guang Lou, Dongmei Zhang:
IconDM: Text-Guided Icon Set Expansion Using Diffusion Models. 156-165
- Haipeng Zhou, Hongqiu Wang, Tian Ye, Zhaohu Xing, Jun Ma, Ping Li, Qiong Wang, Lei Zhu:
Timeline and Boundary Guided Diffusion Network for Video Shadow Detection. 166-175
- Yichang Qu, Bing Li, Jie Huang, Feng Zhao:
Training Pansharpening Networks at Full Resolution Using Degenerate Invariance. 176-185
Oral Session 4: Graph and Diffusion Models
- Jielong Lu, Zhihao Wu, Zhaoliang Chen, Zhiling Cai, Shiping Wang:
Towards Multi-view Consistent Graph Diffusion. 186-195
- Liyuan Ma, Xueji Fang, Guo-Jun Qi:
Equilibrated Diffusion: Frequency-aware Textual Embedding for Equilibrated Image Customization. 196-204
- Weilun Feng, Chuanguang Yang, Zhulin An, Libo Huang, Boyu Diao, Fei Wang, Yongjun Xu:
Relational Diffusion Distillation for Efficient Image Generation. 205-213
- Hongjie Wu, Linchao He, Mingqin Zhang, Dongdong Chen, Kunming Luo, Mengting Luo, Jizhe Zhou, Hu Chen, Jiancheng Lv:
Diffusion Posterior Proximal Sampling for Image Restoration. 214-223
- Yiheng Huang, Hui Yang, Chuanchen Luo, Yuxi Wang, Shibiao Xu, Zhaoxiang Zhang, Man Zhang, Junran Peng:
StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework. 224-232
- Yichi Zhang, Zhuo Chen, Lingbing Guo, Yajing Xu, Wen Zhang, Huajun Chen:
Making Large Language Models Perform Better in Knowledge Graph Completion. 233-242
Oral Session 5: Multimodal Models and Applications
- Rishikesh Devanathan, Apoorva Singh, A. S. Poornash, Sriparna Saha:
Seeing Beyond Words: Multimodal Aspect-Level Complaint Detection in Ecommerce Videos. 243-252
- Hsiang-Hui Hung, Huu-Phu Do, Yung-Hui Li, Ching-Chun Huang:
TimeNeRF: Building Generalizable Neural Radiance Fields across Time from Few-Shot Input Views. 253-262
- Xiaoxuan Shen, Fenghua Yu, Yaqi Liu, Ruxia Liang, Qian Wan, Kai Yang, Jianwen Sun:
Revisiting Knowledge Tracing: A Simple and Powerful Model. 263-272
- Peiming Li, Ziyi Wang, Mengyuan Liu, Hong Liu, Chen Chen:
ClickDiff: Click to Induce Semantic Contact Map for Controllable Grasp Generation with Diffusion Models. 273-281
- Bochao Liu, Pengju Wang, Weijia Guo, Yong Li, Liansheng Zhuang, Weiping Wang, Shiming Ge:
Private Gradient Estimation is Useful for Generative Modeling. 282-290
- Ke Zhu, Liang Zhao, Zheng Ge, Xiangyu Zhang:
Self-Supervised Visual Preference Alignment. 291-300
Oral Session 6: Innovations in Medical Imaging and Physiological Measurement
- Yuxin Hong, Xiao Zhang, Xin Zhang, Joey Tianyi Zhou:
Evolution-aware VAriance (EVA) Coreset Selection for Medical Image Classification. 301-310
- Ruiqi Wang, Jinyang Huang, Jie Zhang, Xin Liu, Xiang Zhang, Zhi Liu, Peng Zhao, Sigui Chen, Xiao Sun:
FacialPulse: An Efficient RNN-based Depression Detection via Temporal Facial Landmarks. 311-320
- Wei Zhang, En Zhu, Juan Chen, YunPeng Li:
MDDR: Multi-modal Dual-Attention aggregation for Depression Recognition. 321-329
- Wei Qian, Kun Li, Dan Guo, Bin Hu, Meng Wang:
Cluster-Phys: Facial Clues Clustering Towards Efficient Remote Physiological Measurement. 330-339
- Zhenxi Song, Ruihan Qin, Huixia Ren, Zhen Liang, Yi Guo, Min Zhang, Zhiguo Zhang:
EEG-MACS: Manifold Attention and Confidence Stratification for EEG-based Cross-Center Brain Disease Diagnosis under Unreliable Annotations. 340-349
- Xueyuan Xu, Li Zhuo, Jinxin Lu, Xia Wu:
WSEL: EEG Feature Selection with Weighted Self-expression Learning for Incomplete Multi-dimensional Emotion Recognition. 350-359
Oral Session 7: Imaging, Computer Vision & Graphics
- Yuanbo Wen, Tao Gao, Ting Chen:
Unpaired Photo-realistic Image Deraining with Energy-informed Diffusion Model. 360-369
- Zeyu Li, Ruitong Gan, Chuanchen Luo, Yuxi Wang, Jiaheng Liu, Ziwei Zhu, Qing Li, Xucheng Yin, Man Zhang, Zhaoxiang Zhang, Junran Peng:
MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets. 370-379
- Xiao Han, Yiming Ren, Peishan Cong, Yujing Sun, Jingya Wang, Lan Xu, Yuexin Ma:
Gait Recognition in Large-scale Free Environment via Single LiDAR. 380-389
- Tang Tao, Longfei Gao, Guangrun Wang, Yixing Lao, Peng Chen, Hengshuang Zhao, Dayang Hao, Xiaodan Liang, Mathieu Salzmann, Kaicheng Yu:
LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields. 390-398
- Mu Chen, Zhedong Zheng, Yi Yang:
Transferring to Real-World Layouts: A Depth-aware Framework for Scene Adaptation. 399-408
- Yujian Mo, Yan Wu, Junqiao Zhao, Zhenjie Hou, Weiquan Huang, Yinghao Hu, Jijun Wang, Jun Yan:
Sparse Query Dense: Enhancing 3D Object Detection with Pseudo Points. 409-418
Oral Session 8: Multimodal Reasoning & Inference
- Changmeng Zheng, Dayong Liang, Wengyu Zhang, Xiaoyong Wei, Tat-Seng Chua, Qing Li:
A Picture Is Worth a Graph: A Blueprint Debate Paradigm for Multimodal Reasoning. 419-428
- Qian Guo, Xinyan Liang, Yuhua Qian, Zhihua Cui, Jie Wen:
A Progressive Skip Reasoning Fusion Method for Multi-Modal Classification. 429-437
- Wenxin Xu, Hexin Jiang, Xuefeng Liang:
Leveraging Knowledge of Modality Experts for Incomplete Multimodal Learning. 438-446
- Bo Xu, Junzhe Zheng, Jiayuan He, Yuxuan Sun, Hongfei Lin, Liang Zhao, Feng Xia:
Generating Multimodal Metaphorical Features for Meme Understanding. 447-455
- Junjie Shi, Caozhi Shang, Zhaobin Sun, Li Yu, Xin Yang, Zengqiang Yan:
PASSION: Towards Effective Incomplete Multi-Modal Medical Image Segmentation with Imbalanced Missing Rates. 456-465
- Mengze Li, Kairong Han, Jiahe Xu, Yueying Li, Tao Wu, Zhou Zhao, Jiaxu Miao, Shengyu Zhang, Jingyuan Chen:
Cross-modal Observation Hypothesis Inference. 466-475
Oral Session 9: Image, Video, and Multimedia Processing
- Jiyang Li, Lechao Cheng, Zhangye Wang, Tingting Mu, Jingxuan He:
LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field. 476-485
- Chaofeng Chen, Sensen Yang, Haoning Wu, Liang Liao, Zicheng Zhang, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin:
Q-Ground: Image Quality Grounding with Large Multi-modality Models. 486-495
- Cheng Ye, Weidong Chen, Jingyu Li, Lei Zhang, Zhendong Mao:
Dual-path Collaborative Generation Network for Emotional Video Captioning. 496-505
- Hu Lin, Chengjiang Long, Yifeng Fei, Qianchen Xia, Erwei Yin, Baocai Yin, Xin Yang:
Exploring Matching Rates: From Keypoint Selection to Camera Relocalization. 506-514
- Zhihong Zhu, Xuxin Cheng, Zhaorun Chen, Yuyan Chen, Yunyan Zhang, Xian Wu, Yefeng Zheng, Bowen Xing:
InMu-Net: Advancing Multi-modal Intent Detection via Information Bottleneck and Multi-sensory Processing. 515-524
- Chaoya Jiang, Hongrui Jia, Mengfan Dong, Wei Ye, Haiyang Xu, Ming Yan, Ji Zhang, Shikun Zhang:
Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models. 525-534
Oral Session 10: Speech and Audio in Multimedia Processing
- Zhongxu Wang, Yujia Wang, Mingzhu Li, Hua Huang:
ArtSpeech: Adaptive Text-to-Speech Synthesis with Articulatory Representations. 535-544
- Shuai Yu, Xiaoliang He, Ke Chen, Yi Yu:
HKDSME: Heterogeneous Knowledge Distillation for Semi-supervised Singing Melody Extraction Using Harmonic Supervision. 545-553
- Yixuan Zhou, Xiaoyu Qin, Zeyu Jin, Shuoyi Zhou, Shun Lei, Songtao Zhou, Zhiyong Wu, Jia Jia:
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling. 554-563
- Navonil Majumder, Chia-Yu Hung, Deepanway Ghosal, Wei-Ning Hsu, Rada Mihalcea, Soujanya Poria:
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization. 564-572
- Xihua Wang, Yuyue Wang, Yihan Wu, Ruihua Song, Xu Tan, Zehua Chen, Hongteng Xu, Guodong Sui:
TiVA: Time-Aligned Video-to-Audio Generation. 573-582
- Alejandro Galán-Cuenca, Jose J. Valero-Mas, Juan C. Martinez-Sevilla, Antonio Hidalgo-Centeno, Antonio Pertusa, Jorge Calvo-Zaragoza:
MUSCAT: A Multimodal mUSic Collection for Automatic Transcription of Real Recordings and Image Scores. 583-591
Oral Session 11: Emotion & Sentiment
- Jianing Zhao, Jingjing Wang, Yujie Jin, Jiamin Luo, Guodong Zhou:
Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanced Video Large Language Model. 592-601
- Daiqing Wu, Dongbao Yang, Yu Zhou, Can Ma:
Bridging Visual Affective Gap: Borrowing Textual Knowledge by Learning from Noisy Image-Text Pairs. 602-611
- Tan Yu, Jingjing Wang, Jiawen Wang, Jiamin Luo, Guodong Zhou:
Towards Emotion-enriched Text-to-Motion Generation via LLM-guided Limb-level Emotion Manipulating. 612-621
- Wenjie Zheng, Jianfei Yu, Rui Xia:
A Unimodal Valence-Arousal Driven Contrastive Learning Framework for Multimodal Multi-Label Emotion Recognition. 622-631
- Xinji Mai, Junxiong Lin, Haoran Wang, Zeng Tao, Yan Wang, Shaoqi Yan, Xuan Tong, Jiawen Yu, Boyang Wang, Ziheng Zhou, Qing Zhao, Shuyong Gao, Wenqiang Zhang:
All rivers run into the sea: Unified Modality Brain-Inspired Emotional Central Mechanism. 632-641
- Xin Li, Shangfei Wang, Xuandong Huang:
Temporal Enhancement for Video Affective Content Analysis. 642-650
Poster Session 1
- Pei He, Licheng Jiao, Lingling Li, Xu Liu, Fang Liu, Wenping Ma, Shuyuan Yang, Ronghua Shang:
Domain Generalization-Aware Uncertainty Introspective Learning for 3D Point Clouds Segmentation. 651-660
- Yi Ma, Peiqi Duan, Yuchen Hong, Chu Zhou, Yu Zhang, Jimmy S. J. Ren, Boxin Shi:
Color4E: Event Demosaicing for Full-color Event Guided Image Deblurring. 661-670
- Jiajie Zhu, Xia Du, Jizhe Zhou, Chi-Man Pun, Qizhen Xu, Xiaoyuan Liu:
DP-RAE: A Dual-Phase Merging Reversible Adversarial Example for Image Privacy Protection. 671-680
- Xinyi Zhang, Qinpeng Cui, Qiqi Bao, Wenming Yang, Qingmin Liao:
Geometry-Guided Diffusion Model with Masked Transformer for Robust Multi-View 3D Human Pose Estimation. 681-690
- Meiqi Cao, Rui Yan, Xiangbo Shu, Guangzhao Dai, Yazhou Yao, Guo-Sen Xie:
AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition. 691-700
- Junsheng Wang, Tiantian Gong, Yan Yan:
Partially Aligned Cross-modal Retrieval via Optimal Transport-based Prototype Alignment Learning. 701-709
- Hu Gao, Bowen Ma, Ying Zhang, Jingfan Yang, Jing Yang, Depeng Dang:
Learning Enriched Features via Selective State Spaces Model for Efficient Image Deblurring. 710-718
- Hangjun Che, Xinyu Pu, Deqiang Ouyang, Beibei Li:
Enhanced Tensorial Self-representation Subspace Learning for Incomplete Multi-view Clustering. 719-728
- Jian-Jun Qiao, Meng-Yu Duan, Xiao Wu, Yu-Pei Song:
CartoonNet: Cartoon Parsing with Semantic Consistency and Structure Correlation. 729-737
- Qianyu Guo, Jieji Ren, Haofen Wang, Tianxing Wu, Weifeng Ge, Wenqiang Zhang:
Visual-Language Collaborative Representation Network for Broad-Domain Few-Shot Image Classification. 738-747
- Wenzhuo Xu, Kai Chen, Ziyi Gao, Zhipeng Wei, Jingjing Chen, Yu-Gang Jiang:
Highly Transferable Diffusion-based Unrestricted Adversarial Attack on Pre-trained Vision-Language Models. 748-757
- Hongzhi Wang, Xiubo Liang, Tao Zhang, Yue Gu, Weidong Geng:
PSSD-Transformer: Powerful Sparse Spike-Driven Transformer for Image Semantic Segmentation. 758-767
- Zengsheng Kuang, Changxing Ding, Huan Yao:
Learning Context with Priors for 3D Interacting Hand-Object Pose Estimation. 768-777
- Yang Chen, Jingcai Guo, Tian He, Xiaocheng Lu, Ling Wang:
Fine-Grained Side Information Guided Dual-Prompts for Zero-Shot Skeleton Action Recognition. 778-786
- Shuo Zhang, Yupeng Zhai, Jilin Mei, Yu Hu:
FusionOcc: Multi-Modal Fusion for 3D Occupancy Prediction. 787-796
- Shaokun Wang, Yifan Yu, Yuhang He, Yihong Gong:
Enhancing Pre-trained ViTs for Downstream Task Adaptation: A Locality-Aware Prompt Learning Method. 797-806
- Fangming Cui, Xun Yang, Chao Wu, Liang Xiao, Xinmei Tian:
Advancing Prompt Learning through an External Layer. 807-816
- Hanzi Wang, Jiamin Ren, Yifeng Ding, Lei Ren, Huixing Jiang, Wei Chen, Fangxiang Feng, Xiaojie Wang:
Q-MoE: Connector for MLLMs with Text-Driven Routing. 817-825
- Guozhen Peng, Yunhong Wang, Yuwei Zhao, Shaoxiong Zhang, Annan Li:
GLGait: A Global-Local Temporal Receptive Field Network for Gait Recognition in the Wild. 826-835
- Qiang Wang, Yuning Cui, Yawen Li, Yaping Ruan, Ben Zhu, Wenqi Ren:
RFFNet: Towards Robust and Flexible Fusion for Low-Light Image Denoising. 836-845
- Minghe Gao, Shuang Chen, Liang Pang, Yuan Yao, Jisheng Dang, Wenqiao Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang, Tat-Seng Chua:
Fact: Teaching MLLMs with Faithful, Concise and Transferable Rationales. 846-855
- Yue Zhang, Parisa Kordjamshidi:
Narrowing the Gap between Vision and Action in Navigation. 856-865
- Zequn Zeng, Jianqiao Sun, Hao Zhang, Tiansheng Wen, Yudi Su, Yan Xie, Zhengjue Wang, Bo Chen:
HICEScore: A Hierarchical Metric for Image Captioning Evaluation. 866-875
- Chen Feng, Georgios Tzimiropoulos, Ioannis Patras:
CLIPCleaner: Cleaning Noisy Labels with CLIP. 876-885
- Haochen Zhao, Hui Meng, Deqian Yang, Xiaozheng Xie, Xiaoze Wu, Qingfeng Li, Jianwei Niu:
GuidedNet: Semi-Supervised Multi-Organ Segmentation via Labeled Data Guide Unlabeled Data. 886-895
- Kin-Chung Chan, Jun Xiao, Hana Lebeta Goshu, Kin-Man Lam:
Point Cloud Densification for 3D Gaussian Splatting from Sparse Input Views. 896-904
- Xiaorui Huang, Gen Luo, Chaoyang Zhu, Bo Tong, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji:
Deep Instruction Tuning for Segment Anything Model. 905-914
- Ziyi Wang, Yiming Rong, Deyang Jiang, Haoran Wu, Shiyu Zhou, Bo Xu:
CIEASR: Contextual Image-Enhanced Automatic Speech Recognition for Improved Homophone Discrimination. 915-924
- Jinxu Zhang, Yongqi Yu, Yu Zhang:
CREAM: Coarse-to-Fine Retrieval and Multi-modal Efficient Tuning for Document VQA. 925-934
- Hebaixu Wang, Hao Zhang, Xunpeng Yi, Xinyu Xiang, Leyuan Fang, Jiayi Ma:
TeRF: Text-driven and Region-aware Flexible Visible and Infrared Image Fusion. 935-944
- Ruonan Zhang, Ziwei Shang, Fengjuan Wang, Zhaoqilin Yang, Shan Cao, Yigang Cen, Gaoyun An:
Synergetic Prototype Learning Network for Unbiased Scene Graph Generation. 945-954
- Jiawei Zhu, Yishu Liu, Huanjia Zhu, Hui Lin, Yuncheng Jiang, Zheng Zhang, Bingzhi Chen:
Combating Visual Question Answering Hallucinations via Robust Multi-Space Co-Debias Learning. 955-964
- Qian Cao, Xu Chen, Ruihua Song, Xiting Wang, Xinting Huang, Yuchen Ren:
See or Guess: Counterfactually Regularized Image Captioning. 965-974
- Shuai Li, Fan Qi, Zixin Zhang, Changsheng Xu:
Cross-Modal Meta Consensus for Heterogeneous Federated Learning. 975-984
- Xiang He, Xiangxi Liu, Yang Li, Dongcheng Zhao, Guobin Shen, Qingqun Kong, Xin Yang, Yi Zeng:
CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization. 985-993
- Jiabao Guo, Huan Liu, Yizhi Luo, Xueli Hu, Hang Zou, Yuan Zhang, Hui Liu, Bo Zhao:
Style-conditional Prompt Token Learning for Generalizable Face Anti-spoofing. 994-1003
- Bowen Chen, Yun Sing Koh, Gillian Dobbie:
SSAT-Adapter: Enhancing Vision-Language Model Few-shot Learning with Auxiliary Tasks. 1004-1013
- Haoyu Tong, Xiaoyu Zhang, Yulin Jin, Jian Lou, Kai Wu, Xiaofeng Chen:
Balancing Generalization and Robustness in Adversarial Training via Steering through Clean and Adversarial Gradient Directions. 1014-1023
- Shuo Zheng, Yuanjie Dang, Peng Chen, Ruohong Huan, Dongdong Zhao, Ronghua Liang:
Saliency-Guided Fine-Grained Temporal Mask Learning for Few-Shot Action Recognition. 1024-1033
- Mengyin Liu, Chao Zhu, Shiqi Ren, Xu-Cheng Yin:
Unsupervised Multi-view Pedestrian Detection. 1034-1042
- Zhilin Huang, Yijie Yu, Ling Yang, Chujun Qin, Bing Zheng, Xiawu Zheng, Zikun Zhou, Yaowei Wang, Wenming Yang:
Motion-aware Latent Diffusion Models for Video Frame Interpolation. 1043-1052
- Zongxin Ye, Wenyu Li, Sidun Liu, Peng Qiao, Yong Dou:
AbsGS: Recovering Fine Details in 3D Gaussian Splatting. 1053-1061
- Ziming Wang, Boxiang Zhang, Ming Ma, Yue Wang, Taoli Du, Wenhui Li:
Multi-fineness Boundaries and the Shifted Ensemble-aware Encoding for Point Cloud Semantic Segmentation. 1062-1071
- Yubo Wang, Chaohu Liu, Yanqiu Qu, Haoyu Cao, Deqiang Jiang, Linli Xu:
Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models. 1072-1081
- Wenhao Li, Qiangchang Wang, Peng Zhao, Yilong Yin:
KNN Transformer with Pyramid Prompts for Few-Shot Learning. 1082-1091
- Lu Zhang, Ke Yan, Shouhong Ding:
AlignCLIP: Align Multi Domains of Texts Input for CLIP models with Object-IoU Loss. 1092-1100
- Pengfei Yue, Jianghang Lin, Shengchuan Zhang, Jie Hu, Yilin Lu, Hongwei Niu, Haixin Ding, Yan Zhang, Guannan Jiang, Liujuan Cao, Rongrong Ji:
Adaptive Selection based Referring Image Segmentation. 1101-1110
- Shanshan Wang, ALuSi, Xun Yang, Ke Xu, Huibin Tan, Xingyi Zhang:
Dual-stream Feature Augmentation for Domain Generalization. 1111-1119
- Yang Liu, Xiang Huang, Minghan Qin, Qinwei Lin, Haoqian Wang:
Animatable 3D Gaussian: Fast and High-Quality Reconstruction of Multiple Human Avatars. 1120-1129
- Wei Feng, Dongyuan Wei, Qianqian Wang, Bo Dong, Quanxue Gao:
Multi-View Clustering Based on Deep Non-negative Tensor Factorization. 1130-1138
- Aoqi Li, Saihui Hou, Chenye Wang, Qingyuan Cai, Yongzhen Huang:
AerialGait: Bridging Aerial and Ground Views for Gait Recognition. 1139-1147
- Zefan Zhang, Weiqi Zhang, Yanhui Li, Tian Bai:
Caption-Aware Multimodal Relation Extraction with Mutual Information Maximization. 1148-1157
- Xiaochen Li, Jian Cheng, Ziying Xia, Zichong Chen, Junhao Shi, Zhicheng Dong, Nyima Tashi:
TS-ILM: Class Incremental Learning for Online Action Detection. 1158-1167
- Yuxiang Cai, Yongheng Shang, Jianwei Yin:
MultiDAN: Unsupervised, Multistage, Multisource and Multitarget Domain Adaptation for Semantic Segmentation of Remote Sensing Images. 1168-1177
- Yu Tong, Weihai Lu, Zhe Zhao, Song Lai, Tong Shi:
MMDFND: Multi-modal Multi-Domain Fake News Detection. 1178-1186
- Minghang Zheng, Jiahua Zhang, Qingchao Chen, Yuxin Peng, Yang Liu:
ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding. 1187-1196
- Shilong Jia, Tingting Wu, Yingying Fang, Tieyong Zeng, Guixu Zhang, Zhi Li:
Purified Distillation: Bridging Domain Shift and Category Gap in Incremental Object Detection. 1197-1205
- Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Heng Tao Shen:
MPT: Multi-grained Prompt Tuning for Text-Video Retrieval. 1206-1214
- Ziwei Zheng, Zechuan Zhang, Yulin Wang, Shiji Song, Gao Huang, Le Yang:
Rethinking the Architecture Design for Efficient Generic Event Boundary Detection. 1215-1224
- Jinglun Li, Xinyu Zhou, Kaixun Jiang, Lingyi Hong, Pinxue Guo, Zhaoyu Chen, Weifeng Ge, Wenqiang Zhang:
TagOOD: A Novel Approach to Out-of-Distribution Detection via Vision-Language Representations and Class Center Learning. 1225-1234
- Zihan Cao, Xiao Wu, Liang-Jian Deng, Yu Zhong:
A Novel State Space Model with Local Enhancement and State Sharing for Image Fusion. 1235-1244
- Zhenyu Yang, Shengsheng Qian, Dizhan Xue, Jiahong Wu, Fan Yang, Weiming Dong, Changsheng Xu:
Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval. 1245-1254
- Zeyu Jin, Jia Jia, Qixin Wang, Kehan Li, Shuoyi Zhou, Songtao Zhou, Xiaoyu Qin, Zhiyong Wu:
SpeechCraft: A Fine-Grained Expressive Speech Dataset with Natural Language Description. 1255-1264
- Lihao Liu, Yanqi Cheng, Zhongying Deng, Shujun Wang, Dongdong Chen, Xiaowei Hu, Pietro Liò, Carola-Bibiane Schönlieb, Angelica E. Avilés-Rivero:
TrafficMOT: A Challenging Dataset for Multi-Object Tracking in Complex Traffic Scenarios. 1265-1273
- Jing Yang, Xiaowen Jiang, Yuan Gao, Laurence T. Yang, Jieming Yang:
Generalize to Fully Unseen Graphs: Learn Transferable Hyper-Relation Structures for Inductive Link Prediction. 1274-1282
- Panjun Liu, Jiacheng Li, Lizhi Wang, Zheng-Jun Zha, Zhiwei Xiong:
MLP Embedded Inverse Tone Mapping. 1283-1291
- Mingkai Lin, Wenzhong Li, Xiaobin Hong, Sanglu Lu:
Scalable Multi-Source Pre-training for Graph Neural Networks. 1292-1301
- Xiaole Zhao, Linze Li, Chengxing Xie, Xiaoming Zhang, Ting Jiang, Wenjie Lin, Shuaicheng Liu, Tianrui Li:
Efficient Single Image Super-Resolution with Entropy Attention and Receptive Field Augmentation. 1302-1310
- Minsu Kim, Jeong Hun Yeo, Se Jin Park, Hyeongseop Rha, Yong Man Ro:
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation. 1311-1320
- Shoutong Luo, Zhengxing Sun, Yi Wang, Yunhan Sun, Chendi Zhu:
LDCNet: Long-Distance Context Modeling for Large-Scale 3D Point Cloud Scene Semantic Segmentation. 1321-1330
- Yiming Cui, Liang Li, Jiehua Zhang, Chenggang Yan, Hongkui Wang, Shuai Wang, Heng Jin, Li Wu:
Stochastic Context Consistency Reasoning for Domain Adaptive Object Detection. 1331-1340
- Zhuoling Li, Yong Wang, Kaitong Li:
FewVS: A Vision-Semantics Integration Framework for Few-Shot Image Classification. 1341-1350
- Yuyan Bu, Qiang Sheng, Juan Cao, Peng Qi, Danding Wang, Jintao Li:
FakingRecipe: Detecting Fake News on Short Video Platforms from the Perspective of Creative Process. 1351-1360
- Subash Khanal, Eric Xing, Srikumar Sastry, Aayush Dhakal, Zhexiao Xiong, Adeel Ahmad, Nathan Jacobs:
PSM: Learning Probabilistic Embeddings for Multi-scale Zero-Shot Soundscape Mapping. 1361-1369
- Zizhao Wu, Haohan Li, Gongyi Chen, Zhou Yu, Xiaoling Gu, Yigang Wang:
3D Question Answering with Scene Graph Reasoning. 1370-1378
- Liang He, Hongke Wang, Zhen Wu, Jianbing Zhang, Xinyu Dai, Jiajun Chen:
Focus & Gating: A Multimodal Approach for Unveiling Relations in Noisy Social Media. 1379-1388
- Yuanchen Wu, Xiaoqiang Li, Jide Li, Kequan Yang, Pinpin Zhu, Shaohua Zhang:
DINO is Also a Semantic Guider: Exploiting Class-aware Affinity for Weakly Supervised Semantic Segmentation. 1389-1397
- Dongshuo Yin, Xueting Han, Bin Li, Hao Feng, Jing Bai:
Parameter-efficient is not Sufficient: Exploring Parameter, Memory, and Time Efficient Adapter Tuning for Dense Predictions. 1398-1406
- Rongwen Li, Haiyang Hu, Liang Du, Jiarong Chen, Bingbing Jiang, Peng Zhou:
One-Stage Fair Multi-View Spectral Clustering. 1407-1416
- Jingfan Tan, Hyunhee Park, Ying Zhang, Tao Wang, Kaihao Zhang, Xiangyu Kong, Pengwen Dai, Zikun Liu, Wenhan Luo:
Blind Face Video Restoration with Temporal Consistent Generative Prior and Degradation-Aware Prompt. 1417-1426
- Yinghui Sun, Xingfeng Li, Quansen Sun, Min-Ling Zhang, Zhenwen Ren:
Improved Weighted Tensor Schatten p-Norm for Fast Multi-view Graph Clustering. 1427-1436
- Xinjie Jiang, Chenxi Zheng, Xuemiao Xu, Bangzhen Liu, Weiying Zheng, Huaidong Zhang, Shengfeng He:
VrdONE: One-stage Video Visual Relation Detection. 1437-1446
- Chenxi Ma, Weimin Tan, Shili Zhou, Bo Yan:
Learning Cross-Spectral Prior for Image Super-Resolution. 1447-1455
- Dayu Hu, Suyuan Liu, Jun Wang, Junpu Zhang, Siwei Wang, Xingchen Hu, Xinzhong Zhu, Chang Tang, Xinwang Liu:
Reliable Attribute-missing Multi-view Clustering with Instance-level and feature-level Cooperative Imputation. 1456-1466
- Duc Dang Trung Tran, Byeongkeun Kang, Yeejin Lee:
MSTA3D: Multi-scale Twin-attention for 3D Instance Segmentation. 1467-1475
- Jingjing Hu, Dan Guo, Kun Li, Zhan Si, Xun Yang, Meng Wang:
Maskable Retentive Network for Video Moment Retrieval. 1476-1485
- Junming Hou, Zihan Cao, Naishan Zheng, Xuan Li, Xiaoyu Chen, Xinyang Liu, Xiaofeng Cong, Danfeng Hong, Man Zhou:
Linearly-evolved Transformer for Pan-sharpening. 1486-1494
- Zhenhao Yang, Xin Liu, Deqiang Ouyang, Guiduo Duan, Dongyang Zhang, Tao He, Yuan-Fang Li:
Towards Open-vocabulary HOI Detection with Calibrated Vision-language Models and Locality-aware Queries. 1495-1504
- Kang Zeng, Hao Shi, Jiacheng Lin, Siyu Li, Jintao Cheng, Kaiwei Wang, Zhiyong Li, Kailun Yang:
MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model. 1505-1513
- Tao Tang, Hong Liu, Yingxuan You, Ti Wang, Wenhao Li:
ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos. 1514-1523
- Xudong Lu, Yuqi Jiang, Haiwen Hong, Qi Sun, Cheng Zhuo:
DCAFuse: Dual-Branch Diffusion-CNN Complementary Feature Aggregation Network for Multi-Modality Image Fusion. 1524-1533
- Wenbin Zou, Hongxia Gao, Weipeng Yang, Tongtong Liu:
Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement. 1534-1543
- Junwei He, Qianqian Xu, Yangbangyan Jiang, Zitai Wang, Yuchen Sun, Qingming Huang:
HGOE: Hybrid External and Internal Graph Outlier Exposure for Graph Out-of-Distribution Detection. 1544-1553
- Ke Liang, Lingyuan Meng, Yue Liu, Meng Liu, Wei Wei, Suyuan Liu, Wenxuan Tu, Siwei Wang, Sihang Zhou, Xinwang Liu:
Simple Yet Effective: Structure Guided Pre-trained Transformer for Multi-modal Knowledge Graph Reasoning. 1554-1563
- Yuning Ding, Sifan Zhang, Shenglan Liu, Jinrong Zhang, Wenyue Chen, Haifei Duan, Bingcheng Dong, Tao Sun:
2M-AF: A Strong Multi-Modality Framework For Human Action Quality Assessment with Self-supervised Representation Learning. 1564-1572
- Liqiu Chen, Yuqing Huang, Hengyu Li, Zikun Zhou, Zhenyu He:
Simplifying Cross-modal Interaction via Modality-Shared Features for RGBT Tracking. 1573-1582
- Can Cui, Siteng Huang, Wenxuan Song, Pengxiang Ding, Min Zhang, Donglin Wang:
ProFD: Prompt-Guided Feature Disentangling for Occluded Person Re-Identification. 1583-1592
- Tianqi Wei, Zhi Chen, Zi Huang, Xin Yu:
Benchmarking In-the-Wild Multimodal Disease Recognition and A Versatile Baseline. 1593-1601
- Jiaming Lei, Lin Li, Chunping Wang, Jun Xiao, Long Chen:
Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer. 1602-1611
- Jinyong Wen:
Gaussian Mutual Information Maximization for Efficient Graph Self-Supervised Learning: Bridging Contrastive-based to Decorrelation-based. 1612-1621
- Haowei Kuang, Yiyang Ma, Wenhan Yang, Zongming Guo, Jiaying Liu:
Consistency Guided Diffusion Model with Neural Syntax for Perceptual Image Compression. 1622-1631
- Zhangchi Feng, Richong Zhang, Zhijie Nie:
Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives. 1632-1641
- Guanchen Ding, Lingbo Liu, Zhenzhong Chen, Changwen Chen:
Domain-Agnostic Crowd Counting via Uncertainty-Guided Style Diversity Augmentation. 1642-1651
- Cunhang Fan, Jingjing Zhang, Hongyu Zhang, Wang Xiang, Jianhua Tao, Xinhui Li, Jiangyan Yi, Dianbo Sui, Zhao Lv:
MSFNet: Multi-Scale Fusion Network for Brain-Controlled Speaker Extraction. 1652-1661
- Zhong Ji, Changxu Meng, Yan Zhang, Haoran Wang, Yanwei Pang, Jungong Han:
Eliminate Before Align: A Remote Sensing Image-Text Retrieval Framework with Keyword Explicit Reasoning. 1662-1671
- Jinyan Zhang, Mengyuan Liu, Hong Liu, Guoquan Wang, Wenhao Li:
APP: Adaptive Pose Pooling for 3D Human Pose Estimation from Videos. 1672-1681
- Jing Bi, Yunlong Tang, Luchuan Song, Ali Vosoughi, Nguyen Nguyen, Chenliang Xu:
EAGLE: Egocentric AGgregated Language-video Engine. 1682-1691
- Kai Yin, Jie Shen:
Expanded Convolutional Neural Network Based Look-Up Tables for High Efficient Single-Image Super-Resolution. 1692-1700
- Zheng Han, Xiaobin Zhu, Chun Yang, Hongyang Zhou, Jingyan Qin, Xu-Cheng Yin:
Exploring Stable Meta-Optimization Patterns via Differentiable Reinforcement Learning for Few-Shot Classification. 1701-1710
- Yixin Guo, Yu Liu, Jianghao Li, Weimin Wang, Qi Jia:
Unseen No More: Unlocking the Potential of CLIP for Generative Zero-shot HOI Detection. 1711-1720
- Jiangbin Zheng, Han Zhang, Qianqing Xu, An-Ping Zeng, Stan Z. Li:
MetaEnzyme: Meta Pan-Enzyme Learning for Task-Adaptive Redesign. 1721-1730
- Yiming Zhong, Xiaolin Zhang, Yao Zhao, Yunchao Wei:
DreamLCM: Towards High Quality Text-to-3D Generation via Latent Consistency Model. 1731-1740
- Anna Zhu, Ke Xiao, Bo Zhou, Runmin Wang:
Trust Prophet or Not? Taking a Further Verification Step toward Accurate Scene Text Recognition. 1741-1750
- Gongli Xi, Ye Tian, Mengyu Yang, Lanshan Zhang, Xirong Que, Wendong Wang:
Global Patch-wise Attention is Masterful Facilitator for Masked Image Modeling. 1751-1760
- Chenghao Deng, Haote Xu, Xiaolu Chen, Haodi Xu, Xiaotong Tu, Xinghao Ding, Yue Huang:
SimCLIP: Refining Image-Text Alignment with Simple Prompts for Zero-/Few-shot Anomaly Detection. 1761-1770
- Yuanhe Tian, Fei Xia, Yan Song:
Diffusion Networks with Task-Specific Noise Control for Radiology Report Generation. 1771-1780
- Yun Xing, Qing Guo, Xiaofeng Cao, Ivor W. Tsang, Lei Ma:
MetaRepair: Learning to Repair Deep Neural Networks from Repairing Experiences. 1781-1790
- Xingtao Wang, Xianqi Zhang, Wenxue Cui, Ruiqin Xiong, Xiaopeng Fan, Debin Zhao:
Mesh Denoising Using Filtering Coefficients Jointly Aware of Noise and Geometry. 1791-1799
- Yan Zhuang, Yanru Zhang, Zheng Hu, Xiaoyue Zhang, Jiawen Deng, Fuji Ren:
GLoMo: Global-Local Modal Fusion for Multimodal Sentiment Analysis. 1800-1809
- Yuhui Wu, Guoqing Wang, Zhiwen Wang, Yang Yang, Tianyu Li, Malu Zhang, Chongyi Li, Heng Tao Shen:
JoReS-Diff: Joint Retinex and Semantic Priors in Diffusion Model for Low-light Image Enhancement. 1810-1818
- Zichen Wen, Tianyi Wu, Yazhou Ren, Yawen Ling, Chenhang Cui, Xiaorong Pu, Lifang He:
Dual-Optimized Adaptive Graph Reconstruction for Multi-View Graph Clustering. 1819-1828 - Xiaobin Lu, Xiaobin Hu, Jun Luo, Ben Zhu, Yaping Ruan, Wenqi Ren:
3D Priors-Guided Diffusion for Blind Face Restoration. 1829-1838 - Hao Wu, Likun Zhang, Shucheng Li, Fengyuan Xu, Sheng Zhong:
CoAst: Validation-Free Contribution Assessment for Federated Learning based on Cross-Round Valuation. 1839-1847 - Kang Xia, Wenzhong Li, Yimiao Shao, Sanglu Lu:
Vi2ACT: Video-enhanced Cross-modal Co-learning with Representation Conditional Discriminator for Few-shot Human Activity Recognition. 1848-1856 - Seonggwan Ko, Yeong Jun Koh, Donghyeon Cho:
Reference-based Burst Super-resolution. 1857-1865 - Yi Zhang, Zhefeng Wang, Rui Hu, Xinyu Duan, Yi Zheng, Baoxing Huai, Jiarun Han, Jitao Sang:
Poisoning for Debiasing: Fair Recognition via Eliminating Bias Uncovered in Data Poisoning. 1866-1874 - Dizhan Xue, Shengsheng Qian, Changsheng Xu:
Few-Shot Multimodal Explanation for Visual Question Answering. 1875-1884 - Jingtao Wang, Zechao Li:
3DPCP-Net: A Lightweight Progressive 3D Correspondence Pruning Network for Accurate and Efficient Point Cloud Registration. 1885-1894 - Jiawei Ge, Jiuxin Cao, Xuelin Zhu, Xinyu Zhang, Chang Liu, Kun Wang, Bo Liu:
Consistencies are All You Need for Semi-supervised Vision-Language Tracking. 1895-1904 - Zhen Zou, Hu Yu, Jie Huang, Feng Zhao:
FreqMamba: Viewing Mamba from a Frequency Perspective for Image Deraining. 1905-1914 - Zhida Zhao, Jia Li, Lijun Wang, Yifan Wang, Huchuan Lu:
MaskMentor: Unlocking the Potential of Masked Self-Teaching for Missing Modality RGB-D Semantic Segmentation. 1915-1923 - Linli Yao, Yuanmeng Zhang, Ziheng Wang, Xinglin Hou, Tiezheng Ge, Yuning Jiang, Xu Sun, Qin Jin:
Edit As You Wish: Video Caption Editing with Multi-grained User Control. 1924-1933 - Wenlin Li, Yucheng Xu, Xiaoqing Zheng, Suoya Han, Jun Wang, Xiaobo Sun:
Dual Advancement of Representation Learning and Clustering for Sparse and Noisy Images. 1934-1942 - Zhiwei Hao, Zhongyu Xiao, Yong Luo, Jianyuan Guo, Jing Wang, Li Shen, Han Hu:
PrimKD: Primary Modality Guided Multimodal Fusion for RGB-D Semantic Segmentation. 1943-1951 - Kaixin Shen, Ruijie Quan, Linchao Zhu, Jun Xiao, Yi Yang:
Neural Interaction Energy for Multi-Agent Trajectory Prediction. 1952-1960 - Hao Gu, Jiangyan Yi, Chenglong Wang, Yong Ren, Jianhua Tao, Xinrui Yan, Yujie Chen, Xiaohui Zhang:
Utilizing Speaker Profiles for Impersonation Audio Detection. 1961-1970 - Zejun Li, Ye Wang, Mengfei Du, Qingwen Liu, Binhao Wu, Jiwen Zhang, Chengxing Zhou, Zhihao Fan, Jie Fu, Jingjing Chen, Zhongyu Wei, Xuanjing Huang:
ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks. 1971-1980 - Jiankang Chen, Ling Deng, Zhiyong Gan, Wei-Shi Zheng, Ruixuan Wang:
FodFoM: Fake Outlier Data by Foundation Models Creates Stronger Visual Out-of-Distribution Detector. 1981-1990 - Xudong Wang, Weihong Ren, Xi'ai Chen, Huijie Fan, Yandong Tang, Zhi Han:
Uni-YOLO: Vision-Language Model-Guided YOLO for Robust and Fast Universal Detection in the Open World. 1991-2000 - Junliu Zhong, Zhiyi Li, Dan Xiang, Maotang Han, Changsheng Li, Yanfen Gan:
A Lightweight Multi-domain Multi-attention Progressive Network for Single Image Deraining. 2001-2010 - Weijia Zhang, Dongnan Liu, Weidong Cai, Chao Ma:
Cross-View Consistency Regularisation for Knowledge Distillation. 2011-2020 - Zikai Song, Ying Tang, Run Luo, Lintao Ma, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang:
Autogenic Language Embedding for Coherent Point Tracking. 2021-2030 - Yuwen Pan, Rui Sun, Yuan Wang, Tianzhu Zhang, Yongdong Zhang:
Rethinking the Implicit Optimization Paradigm with Dual Alignments for Referring Remote Sensing Image Segmentation. 2031-2040 - Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Hao Li, Ming Tang, Jinqiao Wang:
FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization. 2041-2049 - Yi Lei, Huilin Zhu, Jingling Yuan, Guangli Xiang, Xian Zhong, Shengfeng He:
DenseTrack: Drone-Based Crowd Tracking via Density-Aware Motion-Appearance Synergy. 2050-2058 - Fengze Jiang, Shuling Wang, Xiaojin Gong:
Task-Conditional Adapter for Multi-Task Dense Prediction. 2059-2068 - Yitai Lin, Zhijie Wei, Wanfa Zhang, Xiping Lin, Yudi Dai, Chenglu Wen, Siqi Shen, Lan Xu, Cheng Wang:
HmPEAR: A Dataset for Human Pose Estimation and Action Recognition. 2069-2078 - Deji Zhao, Donghong Han, Ye Yuan, Bo Ning, Mengxiang Li, Zhongjiang He, Shuangyong Song:
AutoGraph: Enabling Visual Context via Graph Alignment in Open Domain Multi-Modal Dialogue Generation. 2079-2088 - Jiaxin Zhang, Yiqi Wang, Xihong Yang, Siwei Wang, Yu Feng, Yu Shi, Ruichao Ren, En Zhu, Xinwang Liu:
Test-Time Training on Graphs with Large Language Models (LLMs). 2089-2098 - Yujia Xiao, Xi Wang, Xu Tan, Lei He, Xinfa Zhu, Sheng Zhao, Tan Lee:
Contrastive Context-Speech Pretraining for Expressive Text-to-Speech Synthesis. 2099-2107 - Junyu Lin, Yan Zheng, Xinyue Chen, Yazhou Ren, Xiaorong Pu, Jing He:
Cross-view Contrastive Unification Guides Generative Pretraining for Molecular Property Prediction. 2108-2116 - Bo Yuan, Danpei Zhao, Zhuoran Liu, Wentao Li, Tian Li:
Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images. 2117-2126 - Shidi Chen, Lili Wei, Liqian Liang, Congyan Lang:
Joint Homophily and Heterophily Relational Knowledge Distillation for Efficient and Compact 3D Object Detection. 2127-2135 - Zhiwen Wang, Yuhui Wu, Zheng Wang, Jiwei Wei, Tianyu Li, Guoqing Wang, Yang Yang, Hengtao Shen:
Cascaded Adversarial Attack: Simultaneously Fooling Rain Removal and Semantic Segmentation Networks. 2136-2145 - Jiexuan Yan, Sheng Huang, Nankun Mu, Luwen Huangfu, Bo Liu:
Category-Prompt Refined Feature Learning for Long-Tailed Multi-Label Image Classification. 2146-2155 - Penglei Sun, Yaoxian Song, Xiang Liu, Xiaofei Yang, Qiang Wang, Tiefeng Li, Yang Yang, Xiaowen Chu:
3D Question Answering for City Scene Understanding. 2156-2165 - Qiuyu Kong, Jiangming Chen, Jie Jiang, Zanxi Ruan, Lai Kang:
Dual-Branch Fusion with Style Modulation for Cross-Domain Few-Shot Semantic Segmentation. 2166-2174 - Jiaqi Wang, Lu Lu, Mingmin Chi, Jian Chen:
MDR: Multi-stage Decoupled Relational Knowledge Distillation with Adaptive Stage Selection. 2175-2183 - Xiongjun Zhao, Zhengyu Liu, Fen Liu, Guanting Li, Yutao Dou, Shaoliang Peng:
Report-Concept Textual-Prompt Learning for Enhancing X-ray Diagnosis. 2184-2193 - Jianzhi Lu, Ruian He, Shili Zhou, Weimin Tan, Bo Yan:
FacialFlowNet: Advancing Facial Optical Flow Estimation with a Diverse Dataset and a Decomposed Model. 2194-2203 - Wei-Bang Jiang, Yu-Ting Lan, Bao-Liang Lu:
REmoNet: Reducing Emotional Label Noise via Multi-regularized Self-supervision. 2204-2213 - Shuxun Wang, Yunfei Lei, Ziqi Zhang, Wei Liu, Haowei Liu, Li Yang, Bing Li, Wenjuan Li, Jin Gao, Weiming Hu:
NFT1000: A Cross-Modal Dataset For Non-Fungible Token Retrieval. 2214-2222 - Haoyang Su, Wenzhe Du, Xiaoliang Wang, Cam-Tu Nguyen:
Sample Efficiency Matters: Training Multimodal Conversational Recommendation Systems in a Small Data Setting. 2223-2232 - Xincheng Ju, Dong Zhang, Suyang Zhu, Junhui Li, Shoushan Li, Guodong Zhou:
ECFCON: Emotion Consequence Forecasting in Conversations. 2233-2241 - Xiangbo Yin, Jiangming Shi, Yachao Zhang, Yang Lu, Zhizhong Zhang, Yuan Xie, Yanyun Qu:
Robust Pseudo-label Learning with Neighbor Relation for Unsupervised Visible-Infrared Person Re-Identification. 2242-2251 - Yubo Li, De Cheng, Chaowei Fang, Changzhe Jiao, Nannan Wang, Xinbo Gao:
Disentangling Identity Features from Interference Factors for Cloth-Changing Person Re-identification. 2252-2261 - Bing Wang, Shengsheng Wang, Changchun Li, Renchu Guan, Ximing Li:
Harmfully Manipulated Images Matter in Multimodal Misinformation Detection. 2262-2271 - Wuliang Huang, Yiqiang Chen, Xinlong Jiang, Chenlong Gao, Qian Chen, Teng Zhang, Bingjie Yan, Yifan Wang, Jianrong Yang:
Correlation-Driven Multi-Modality Graph Decomposition for Cross-Subject Emotion Recognition. 2272-2281 - Wenbin Wang, Liang Ding, Li Shen, Yong Luo, Han Hu, Dacheng Tao:
WisdoM: Improving Multimodal Sentiment Analysis by Fusing Contextual World Knowledge. 2282-2291 - Zhanpeng Chen, Zhihong Zhu, Wanshi Xu, Yunyan Zhang, Xian Wu, Yefeng Zheng:
Aspects are Anchors: Towards Multimodal Aspect-based Sentiment Analysis via Aspect-driven Alignment and Refinement. 2292-2300 - Haodong Chen, Haojian Huang, Junhao Dong, Mingzhe Zheng, Dian Shao:
FineCLIPER: Multi-modal Fine-grained CLIP for Dynamic Facial Expression Recognition with AdaptERs. 2301-2310 - Honghao Li, Lei Sang, Yi Zhang, Yiwen Zhang:
SimCEN: Simple Contrast-enhanced Network for CTR Prediction. 2311-2320 - Yuanyuan Shi, Yunan Li, Siyu Liang, Huizhou Chen, Qiguang Miao:
MGR-Dark: A Large Multimodal Video Dataset and RGB-IR Benchmark for Gesture Recognition in Darkness. 2321-2330 - Shuanglin Yan, Jun Liu, Neng Dong, Liyan Zhang, Jinhui Tang:
Prototypical Prompting for Text-to-image Person Re-identification. 2331-2340 - Kexiang Feng, Chuanmin Jia, Siwei Ma, Wen Gao:
Unifying Spike Perception and Prediction: A Compact Spike Representation Model Using Multi-scale Correlation. 2341-2349 - Feifei Zhang, Sijia Qu, Fan Shi, Changsheng Xu:
Overcoming the Pitfalls of Vision-Language Model for Image-Text Retrieval. 2350-2359 - Francesco Tonini, Nicola Dall'Asen, Lorenzo Vaquero, Cigdem Beyan, Elisa Ricci:
AL-GTD: Deep Active Learning for Gaze Target Detection. 2360-2369 - Yuxiang Zhou, Zhe Sun, Rui Liu, Yong Chen, Dell Zhang:
AVHash: Joint Audio-Visual Hashing for Video Retrieval. 2370-2378 - Xin Jiang, Hao Tang, Rui Yan, Jinhui Tang, Zechao Li:
DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines. 2379-2388 - Qian Li, Yucheng Zhou, Cheng Ji, Feihong Lu, Jianian Gong, Shangguang Wang, Jianxin Li:
Multi-Modal Inductive Framework for Text-Video Retrieval. 2389-2398 - Hancheng Zhu, Ju Shi, Zhiwen Shao, Rui Yao, Yong Zhou, Jiaqi Zhao, Leida Li:
Attribute-Driven Multimodal Hierarchical Prompts for Image Aesthetic Quality Assessment. 2399-2408 - Zeyu Xiao, Dachun Kai, Yueyi Zhang, Xiaoyan Sun, Zhiwei Xiong:
Asymmetric Event-Guided Video Super-Resolution. 2409-2418 - Yuanfeng Pan, Wenkang Su, Jiangqun Ni, Qingliang Liu, Yulin Zhang, Donghua Jiang:
Model-Based Non-Independent Distortion Cost Design for Effective JPEG Steganography. 2419-2427 - Xianghu Yue, Xueyi Zhang, Yiming Chen, Chengwei Zhang, Mingrui Lao, Huiping Zhuang, Xinyuan Qian, Haizhou Li:
MMAL: Multi-Modal Analytic Learning for Exemplar-Free Audio-Visual Class Incremental Tasks. 2428-2437 - Yuzheng Wang, Zhaoyu Chen, Jie Zhang, Dingkang Yang, Zuhao Ge, Yang Liu, Siao Liu, Yunquan Sun, Wenqiang Zhang, Lizhe Qi:
Sampling to Distill: Knowledge Transfer from Open-World Data. 2438-2447 - Xi Wu, Chuang Huang, Xinliu Liu, Fei Zhou, Zhenwen Ren:
Multiple Kernel Clustering with Shifted Laplacian on Grassmann Manifold. 2448-2456 - Guangyao Li, Yajun Jian, Yan Yan, Hanzi Wang:
GLATrack: Global and Local Awareness for Open-Vocabulary Multiple Object Tracking. 2457-2466 - Xuze Hao, Wenqian Ni, Xuhao Jiang, Weimin Tan, Bo Yan:
Addressing Imbalance for Class Incremental Learning in Medical Image Classification. 2467-2476 - Qiwei Li, Yuxin Peng, Jiahuan Zhou:
Progressive Prototype Evolving for Dual-Forgetting Mitigation in Non-Exemplar Online Continual Learning. 2477-2486 - Fengfan Zhou, Qianyu Zhou, Bangjie Yin, Hui Zheng, Xuequan Lu, Lizhuang Ma, Hefei Ling:
Rethinking Impersonation and Dodging Attacks on Face Recognition Systems. 2487-2496 - Xin Chen, Bin Wang, Jinzheng Jiang, Kunkun Zhang, Yongsheng Gao:
SDePR: Fine-Grained Leaf Image Retrieval with Structural Deep Patch Representation. 2497-2505 - Yuhan Liu, Qianxin Huang, Siqi Hui, Jingwen Fu, Sanping Zhou, Kangyi Wu, Pengna Li, Jinjun Wang:
Semantic-aware Representation Learning for Homography Estimation. 2506-2514 - Chen Hui, Haiqi Zhu, Shuya Yan, Shaohui Liu, Feng Jiang, Debin Zhao:
S2-CSNet: Scale-Aware Scalable Sampling Network for Image Compressive Sensing. 2515-2524 - Gangyan Zeng, Yuan Zhang, Jin Wei, Dongbao Yang, Peng Zhang, Yiwen Gao, Xugong Qin, Yu Zhou:
Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval. 2525-2534 - Hua Yu, Weiming Liu, Jiapeng Bai, Xu Gui, Yaqing Hou, Yew-Soon Ong, Qiang Zhang:
Towards Efficient and Diverse Generative Model for Unconditional Human Motion Synthesis. 2535-2544 - Dan Zeng, Yu Zhu, Shuiwang Li, Qijun Zhao, Qiaomu Shen, Bo Tang:
Towards Labeling-free Fine-grained Animal Pose Estimation. 2545-2553 - Rui Xie, Anlong Ming, Shuai He, Yi Xiao, Huadong Ma:
"Special Relativity" of Image Aesthetics Assessment: a Preliminary Empirical Perspective. 2554-2563 - Zhengwei Yin, Mingze Ma, Guixu Lin, Yinqiang Zheng:
Exploring Data Efficiency in Image Restoration: A Gaussian Denoising Case Study. 2564-2573 - Yuntao Wang, Jinpu Zhang, Ruonan Wei, Wenbo Gao, Yuehuan Wang:
MFRGN: Multi-scale Feature Representation Generalization Network for Ground-to-Aerial Geo-localization. 2574-2583 - Chang Wu, Guancheng Quan, Gang He, Xin-Quan Lai, Yunsong Li, Wenxin Yu, Xianmeng Lin, Cheng Yang:
QS-NeRV: Real-Time Quality-Scalable Decoding with Neural Representation for Videos. 2584-2592 - Xiaoyu Han, Shunyuan Zheng, Zonglin Li, Chenyang Wang, Xin Sun, Quanling Meng:
Shape-Guided Clothing Warping for Virtual Try-On. 2593-2602 - Richen Liu, Hansheng Wang, Hailong Wang, Siru Chen, Chufan Lai, Ayush Kumar, Siming Chen:
ScaleTraversal: Creating Multi-Scale Biomedical Animation with Limited Hardware Resources. 2603-2612 - Chenrui Wu, Haishuai Wang, Xiang Zhang, Zhen Fang, Jiajun Bu:
Spatio-temporal Heterogeneous Federated Learning for Time Series Classification with Multi-view Orthogonal Training. 2613-2622 - Yaopeng Peng, Milan Sonka, Danny Z. Chen:
Group Vision Transformer. 2623-2631 - Zhichao Yang, Leida Li, Pengfei Chen, Jinjian Wu, Weisheng Dong:
Semantics-Aware Image Aesthetics Assessment using Tag Matching and Contrastive Ranking. 2632-2641 - Pengcheng Zhang, Xiaohan Yu, Xiao Bai, Jin Zheng, Xin Ning:
Prompting Continual Person Search. 2642-2651 - Xiao Zhao, Xukun Zhang, Dingkang Yang, Mingyang Sun, Mingcheng Li, Shunli Wang, Lihua Zhang:
MaskBEV: Towards A Unified Framework for BEV Detection and Map Segmentation. 2652-2661 - Yong Yang, Aoqi Zhao, Shuying Huang, Xiaozheng Wang, Yajing Fan:
SCPSN: Spectral Clustering-based Pyramid Super-resolution Network for Hyperspectral Images. 2662-2670 - Xiangyu Chen, Yihao Liu, Yuandong Pu, Wenlong Zhang, Jiantao Zhou, Yu Qiao, Chao Dong:
Learning A Low-Level Vision Generalist via Visual Task Prompt. 2671-2680 - Wenxu Shi, Bochuan Zheng:
Alleviating the Equilibrium Challenge with Sample Virtual Labeling for Adversarial Domain Adaptation. 2681-2689 - Federico Espositi, Andrea Bonarini:
The Room: Design and Embodiment of Spaces as Social Beings. 2690-2699 - Chunjie Ma, Lina Du, Zan Gao, Li Zhuo, Meng Wang:
A Coarse to Fine Detection Method for Prohibited Object in X-ray Images Based on Progressive Transformer Decoder. 2700-2708 - Qizhi Xie, Kun Yuan, Yunpeng Qu, Mingda Wu, Ming Sun, Chao Zhou, Jihong Zhu:
QPT-V2: Masked Image Modeling Advances Visual Scoring. 2709-2718 - Shengguang Wu, Zhenglun Chen, Qi Su:
Knowledge-Aware Artifact Image Synthesis with LLM-Enhanced Prompting and Multi-Source Supervision. 2719-2728 - Yu Feng, Zhen Tian, Yifan Zhu, Zongfu Han, Haoran Luo, Guangwei Zhang, Meina Song:
CP-Prompt: Composition-Based Cross-modal Prompting for Domain-Incremental Continual Learning. 2729-2738 - Huixiang Wen, Shizong Yan, Shan Chang, Jie Xu, Hongzi Zhu, Yanting Zhang, Bo Li:
DepthCloak: Projecting Optical Camouflage Patches for Erroneous Monocular Depth Estimation of Vehicles. 2739-2747 - Keming Wu, Man Yao, Yuhong Chou, Xuerui Qiu, Rui Yang, Bo Xu, Guoqi Li:
RSC-SNN: Exploring the Trade-off Between Adversarial Robustness and Accuracy in Spiking Neural Networks via Randomized Smoothing Coding. 2748-2756 - Xueying Mao, Xiaoxiao Hu, Wanli Peng, Zhenliang Gan, Zhenxing Qian, Xinpeng Zhang, Sheng Li:
From Covert Hiding To Visual Editing: Robust Generative Video Steganography. 2757-2765 - Wu Ran, Peirong Ma, Zhiquan He, Hong Lu:
Rainmer: Learning Multi-view Representations for Comprehensive Image Deraining and Beyond. 2766-2775 - Haoxuan Li, Zhengmao Yang, Yunshan Ma, Yi Bin, Yang Yang, Tat-Seng Chua:
MM-Forecast: A Multimodal Approach to Temporal Event Forecasting with Large Language Models. 2776-2785 - Shuyuan Wen, Bingrui Hu, Wenchao Li:
CDEA: Context- and Detail-Enhanced Unsupervised Learning for Domain Adaptive Semantic Segmentation. 2786-2794 - Xitong Ling, Minxi Ouyang, Yizhi Wang, Xinrui Chen, Renao Yan, Hongbo Chu, Junru Cheng, Tian Guan, Sufang Tian, Xiaoping Liu, Yonghong He:
Agent Aggregator with Mask Denoise Mechanism for Histopathology Whole Slide Image Analysis. 2795-2803 - Kepeng Xu, Zijia Ma, Li Xu, Gang He, Yunsong Li, Wenxin Yu, Taichu Han, Cheng Yang:
An End-to-End Real-World Camera Imaging Pipeline. 2804-2813 - Lijian Yang, Weisheng Li, Yucheng Shu, Jian-Xun Mi, Yuping Huang, Bin Xiao:
ShiftMorph: A Fast and Robust Convolutional Neural Network for 3D Deformable Medical Image Registration. 2814-2823 - Ximing Wu, Kongyange Zhao, Xu Chen, Teng Liang:
Edge-assisted Real-time Dynamic 3D Point Cloud Rendering for Multi-party Mobile Virtual Reality. 2824-2832 - Nannan Yu, Tao Ma, Jiqing Zhang, Yuji Zhang, Qirui Bao, Xiaopeng Wei, Xin Yang:
Adaptive Vision Transformer for Event-Based Human Pose Estimation. 2833-2841 - Litian Zhang, Xiaoming Zhang, Chaozhuo Li, Ziyi Zhou, Jiacheng Liu, Feiran Huang, Xi Zhang:
Mitigating Social Hazards: Early Detection of Fake News via Diffusion-Guided Propagation Path Generation. 2842-2851 - Yuzhen Du, Teng Hu, Ran Yi, Lizhuang Ma:
LD-BFR: Vector-Quantization-Based Face Restoration Model with Latent Diffusion Enhancement. 2852-2860 - Jie Huang, Zhao-Min Chen, Xiaoqin Zhang, Yisu Ge, Lusi Ye, Guodao Zhang, Huiling Chen:
Label Decoupling and Reconstruction: A Two-Stage Training Framework for Long-tailed Multi-label Medical Image Recognition. 2861-2869 - Chengpei Xu, Hao Fu, Long Ma, Wenjing Jia, Chengqi Zhang, Feng Xia, Xiaoyu Ai, Binghao Li, Wenjie Zhang:
Seeing Text in the Dark: Algorithm and Benchmark. 2870-2878 - Ye Tian, Zhe Wang, Jianguo Sun, Liguo Zhang:
Time-Frequency Domain Fusion Enhancement for Audio Super-Resolution. 2879-2887 - Lei Liu, Li Liu, Yawen Cui:
Prior-free Balanced Replay: Uncertainty-guided Reservoir Sampling for Long-Tailed Continual Learning. 2888-2897 - Tianjiao Xu, Aoxuan Chen, Yuxi Zhao, Jinfei Gao, Tian Gan:
A Chinese Multimodal Social Video Dataset for Controversy Detection. 2898-2907 - Zhe Ji, Qiansiqi Hu, Yicheng Zheng, Liyao Xiang, Xinbing Wang:
A Principled Approach to Natural Language Watermarking. 2908-2916 - Hao Wu, Fan Xu, Chong Chen, Xian-Sheng Hua, Xiao Luo, Haixin Wang:
PastNet: Introducing Physical Inductive Biases for Spatio-temporal Video Prediction. 2917-2926 - Jiawei Yao, Yingxin Lai, Hongrui Kou, Tong Wu, Ruixi Liu:
QE-BEV: Query Evolution for Bird's Eye View Object Detection in Varied Contexts. 2927-2935 - Xiangrui Liu, Xinju Wu, Pingping Zhang, Shiqi Wang, Zhu Li, Sam Kwong:
CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting. 2936-2944 - Shengyu Hao, Wenhao Chai, Zhonghan Zhao, Meiqi Sun, Wendi Hu, Jieyang Zhou, Yixian Zhao, Qi Li, Yizhou Wang, Xi Li, Gaoang Wang:
Ego3DT: Tracking Every 3D Object in Ego-centric Videos. 2945-2954 - Junkang Liu, Fanhua Shang, Yuanyuan Liu, Hongying Liu, Yuangang Li, YunXiang Gong:
FedBCGD: Communication-Efficient Accelerated Block Coordinate Gradient Descent for Federated Learning. 2955-2963 - Yiran Cheng, Bintao He, Fa Zhang, Renmin Han:
Serial Section Microscopy Image Inpainting Guided by Axial Optical Flow. 2964-2972 - Han Fang, Kejiang Chen, Yupeng Qiu, Zehua Ma, Weiming Zhang, Ee-Chien Chang:
DERO: Diffusion-Model-Erasure Robust Watermarking. 2973-2981 - Yin Wang, Hao Lu, Ying-Cong Chen, Li Kuang, Mengchu Zhou, Shuiguang Deng:
rPPG-HiBa: Hierarchical Balanced Framework for Remote Physiological Measurement. 2982-2991 - Huan Chen, Tingfa Xu, Zhenxiang Chen, Peifu Liu, Huiyan Bai, Jianan Li:
Multi-scale Change-Aware Transformer for Remote Sensing Image Change Detection. 2992-3000 - Yinyin Peng, Yaofei Wang, Donghui Hu, Kejiang Chen, Xianjin Rong, Weiming Zhang:
LDStega: Practical and Robust Generative Image Steganography based on Latent Diffusion Models. 3001-3009 - Lei Lu, Yanyue Xie, Wei Jiang, Wei Wang, Xue Lin, Yanzhi Wang:
HybridFlow: Infusing Continuity into Masked Codebook for Extreme Low-Bitrate Image Compression. 3010-3018 - Linfei Li, Lin Zhang, Zhong Wang, Ying Shen:
GS3LAM: Gaussian Semantic Splatting SLAM. 3019-3027 - Shuang Wang, Pengyi Hao, Fuli Wu, Cong Bai:
Live on the Hump: Self Knowledge Distillation via Virtual Teacher-Students Mutual Learning. 3028-3036 - Xuhan Zhu, Yifei Xing, Ruiping Wang, Yaowei Wang, Xiangyuan Lan:
Calibration for Long-tailed Scene Graph Generation. 3037-3046 - Minjing Yu, Lingzhi Zeng, Xinxin Du, Jenny Sheng, Qiantian Liao, Yong-Jin Liu:
VisHanfu: An Interactive System for the Promotion of Hanfu Knowledge via Cross-Shaped Flat Structure. 3047-3055 - Xiuquan Du, Jiajia Chen, Xuejun Zhang:
CBNet: Cooperation-Based Weakly Supervised Polyp Detection. 3056-3064 - Zeyu Xiao, Zhihe Lu, Michael Bi Mi, Zhiwei Xiong, Xinchao Wang:
Unraveling Motion Uncertainty for Local Motion Deblurring. 3065-3074 - Yi Wang, Ningze Zhong, Minglin Chen, Longguang Wang, Yulan Guo:
Tangram-Splatting: Optimizing 3D Gaussian Splatting Through Tangram-inspired Shape Priors. 3075-3083 - Jiali Chen, Yi Cai, Ruohang Xu, Jiexin Wang, Jiayuan Xie, Qing Li:
Deconfounded Emotion Guidance Sticker Selection with Causal Inference. 3084-3093 - Zhijian Wu, Jun Li, Yang Hu, Dingjiang Huang:
Compacter: A Lightweight Transformer for Image Restoration. 3094-3103 - Xiuli Bi, Yang Hu, Bo Liu, Weisheng Li, Pamela C. Cosman, Bin Xiao:
PriFU: Capturing Task-Relevant Information Without Adversarial Learning. 3104-3112 - Zan Chen, Xiao Yu, Yuanjing Feng:
Connectivity-based Cerebrovascular Segmentation in Time-of-Flight Magnetic Resonance Angiography. 3113-3121 - Jiawei Chen, Dingkang Yang, Yue Jiang, Mingcheng Li, Jinjie Wei, Xiaolu Hou, Lihua Zhang:
Efficiency in Focus: LayerNorm as a Catalyst for Fine-tuning Medical Visual Language Models. 3122-3130 - Keke Tang, Zhensu Wang, Weilong Peng, Lujie Huang, Le Wang, Peican Zhu, Wenping Wang, Zhihong Tian:
SymAttack: Symmetry-aware Imperceptible Adversarial Attacks on 3D Point Clouds. 3131-3140 - Jie Liang, Rongjie Wang, Rui Peng, Zhe Zhang, Kaiqiang Xiong, Ronggang Wang:
High Fidelity Aggregated Planar Prior Assisted PatchMatch Multi-View Stereo. 3141-3150 - Tao Huang, Xinjia Ou, Huali Yang, Shengze Hu, Jing Geng, Junjie Hu, Zhuoran Xu:
Remembering is Not Applying: Interpretable Knowledge Tracing for Problem-solving Processes. 3151-3159 - Kien T. Pham, Jingye Chen, Qifeng Chen:
TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization. 3160-3169 - Lingyu Xiong, Xize Cheng, Jintao Tan, Xianjia Wu, Xiandong Li, Lei Zhu, Fei Ma, Minglei Li, Huang Xu, Zhihui Hu:
SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing. 3170-3179 - Changshuo Wang, Mingzhe Yu, Lei Wu, Lei Meng, Xiang Li, Xiangxu Meng:
InstantAS: Minimum Coverage Sampling for Arbitrary-Size Image Generation. 3180-3188 - Du Chen, Zhengqiang Zhang, Jie Liang, Lei Zhang:
SSL: A Self-similarity Loss for Improving Generative Image Super-resolution. 3189-3198 - Zhengze Xu, Mengting Chen, Zhao Wang, Linyu Xing, Zhonghua Zhai, Nong Sang, Jinsong Lan, Shuai Xiao, Changxin Gao:
Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos. 3199-3208 - Lixing Tan, Shuang Song, Kangneng Zhou, Chengbo Duan, Lanying Wang, Huayang Ren, Linlin Liu, Wei Zhang, Ruoxiu Xiao:
Multi-view X-ray Image Synthesis with Multiple Domain Disentanglement from CT Scans. 3209-3218 - Zecheng Wang, Xinye Li, Zhanyue Qin, Chunshan Li, Zhiying Tu, Dianhui Chu, Dianbo Sui:
Can We Debias Multimodal Large Language Models via Model Editing? 3219-3228 - Shuqi Dai, Ming-Yu Liu, Rafael Valle, Siddharth Gururani:
ExpressiveSinger: Multilingual and Multi-Style Score-based Singing Voice Synthesis with Expressive Performance Control. 3229-3238 - Dehao Ying, Fengchang Yu, Haihua Chen, Wei Lu:
DIG: Complex Layout Document Image Generation with Authentic-looking Text for Enhancing Layout Analysis. 3239-3247 - Shibo Hong, Xuhong Zhang, Tianyu Du, Sheng Cheng, Xun Wang, Jianwei Yin:
Cons2Plan: Vector Floorplan Generation from Various Conditions via a Learning Framework based on Conditional Diffusion Models. 3248-3256 - Qihe Pan, Zhen Zhao, Zicheng Wang, Sifan Long, Yiming Wu, Wei Ji, Haoran Liang, Ronghua Liang:
Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach. 3257-3265 - Xiaofeng Mao, Zhengkai Jiang, Qilin Wang, Chencan Fu, Jiangning Zhang, Jiafu Wu, Yabiao Wang, Chengjie Wang, Wei Li, Mingmin Chi:
MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation. 3266-3274 - Jihoon Lee, Yunhong Min, Hwidong Kim, Sangtae Ahn:
DAFT-GAN: Dual Affine Transformation Generative Adversarial Network for Text-Guided Image Inpainting. 3275-3283 - Boyong He, Yuxiang Ji, Zhuoyue Tan, Liaoni Wu:
Diffusion Domain Teacher: Diffusion Guided Domain Adaptive Object Detector. 3284-3293 - Weizhi Liu, Yue Li, Dongdong Lin, Hui Tian, Haizhou Li:
GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis. 3294-3302 - Feihong Lu, Weiqi Wang, Yangyifei Luo, Ziqin Zhu, Qingyun Sun, Baixuan Xu, Haochen Shi, Shiqi Gao, Qian Li, Yangqiu Song, Jianxin Li:
Miko: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery. 3303-3312 - Guojin Zhong, Yihu Guo, Jin Yuan, Qianjun Zhang, Weili Guan, Long Chen:
PROMOTE: Prior-Guided Diffusion Model with Global-Local Contrastive Learning for Exemplar-Based Image Translation. 3313-3322 - Xiangcheng Zhai, Yingqi Jie, Xueguang Xie, Aimin Hao, Na Jiang, Yang Gao:
ANFluid: Animate Natural Fluid Photos base on Physics-Aware Simulation and Dual-Flow Texture Learning. 3323-3331 - Shoubin Yu, Jacob Zhiyuan Fang, Jian Zheng, Gunnar A. Sigurdsson, Vicente Ordonez, Robinson Piramuthu, Mohit Bansal:
Zero-Shot Controllable Image-to-Video Animation via Motion Decomposition. 3332-3341 - Goirik Chakrabarty, Aditya Chandrasekar, Ramya Hebbalaguppe, Prathosh AP:
LoMOE: Localized Multi-Object Editing via Multi-Diffusion. 3342-3351 - Yuyan Chen, Songzhou Yan, Zhihong Zhu, Zhixu Li, Yanghua Xiao:
XMeCap: Meme Caption Generation with Sub-Image Adaptability. 3352-3361 - Zhenqiang Li, Jie Li, Yangjie Cao, Jiayi Wang, Runfeng Lv:
ImageBind3D: Image as Binding Step for Controllable 3D Generation. 3362-3371 - Pengxiang Cai, Zhiwei Liu, Guibo Zhu, Yunfang Niu, Jinqiao Wang:
Auto DragGAN: Editing the Generative Image Manifold in an Autoregressive Manner. 3372-3380 - Chengwei Zhang, Xueyi Zhang, Xianghu Yue, Mingrui Lao, Tao Jiang, Jiawei Wang, Fubo Zhang, Longyong Chen:
PD-Refiner: An Underlying Surface Inheritance Refiner with Adaptive Edge-Aware Supervision for Point Cloud Denoising. 3381-3390 - Yue Jiang, Yueming Lyu, Ziwen He, Bo Peng, Jing Dong:
Mitigating Social Biases in Text-to-Image Diffusion Models via Linguistic-Aligned Attention Guidance. 3391-3400 - Peng Zhou, Dunbo Cai, Yujian Du, Runqing Zhang, Bingbing Ni, Jie Qin, Ling Qian:
Edit3D: Elevating 3D Scene Editing with Attention-Driven Multi-Turn Interactivity. 3401-3410 - Ziyu Yao, Xuxin Cheng, Zhiqi Huang:
FD2Talk: Towards Generalized Talking Head Generation with Facial Decoupled Diffusion Model. 3411-3420 - Xiaomin Li, Xu Jia, Qinghe Wang, Haiwen Diao, Mengmeng Ge, Pengxiang Li, You He, Huchuan Lu:
MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models. 3421-3430 - Qi Xu, Yaxin Li, Xuanye Fang, Jiangrong Shen, Qiang Zhang, Gang Pan:
Reversing Structural Pattern Learning with Biologically Inspired Knowledge Distillation for Spiking Neural Networks. 3431-3439 - Xiaogang Wang, Yuhang Cheng, Ziyang Fan, Kai Xu:
Learning to Transfer Heterogeneous Translucent Materials from a 2D Image to 3D Models. 3440-3448 - Zonglin Lyu, Ming Li, Jianbo Jiao, Chen Chen:
Frame Interpolation with Consecutive Brownian Bridge Diffusion. 3449-3458 - Teng Hu, Jiangning Zhang, Ran Yi, Yating Wang, Jieyu Weng, Hongrui Huang, Yabiao Wang, Lizhuang Ma:
COMD: Training-free Video Motion Transfer With Camera-Object Motion Disentanglement. 3459-3468 - Yihao Liu, Feng Xue, Anlong Ming, Mingshuai Zhao, Huadong Ma, Nicu Sebe:
SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model. 3469-3478 - Qinfeng Li, Zhiqiang Shen, Zhenghan Qin, Yangfan Xie, Xuhong Zhang, Tianyu Du, Sheng Cheng, Xun Wang, Jianwei Yin:
TransLinkGuard: Safeguarding Transformer Models Against Model Stealing in Edge Deployment. 3479-3488 - Tao Wu, Mengze Li, Jingyuan Chen, Wei Ji, Wang Lin, Jinyang Gao, Kun Kuang, Zhou Zhao, Fei Wu:
Semantic Alignment for Multimodal Large Language Models. 3489-3498 - Wenxuan Yang, Weimin Tan, Yuqi Sun, Bo Yan:
A Medical Data-Effective Learning Benchmark for Highly Efficient Pre-training of Foundation Models. 3499-3508 - Jin Liu, Huaibo Huang, Jie Cao, Ran He:
ZePo: Zero-Shot Portrait Stylization with Faster Sampling. 3509-3518 - Yiding Li, Lingyun Yu, Li Wang, Hongtao Xie:
Control-Talker: A Rapid-Customization Talking Head Generation Method for Multi-Condition Control and High-Texture Enhancement. 3519-3527 - Zhaoyang Li, Zhu Teng, Baopeng Zhang, Jianping Fan:
Boosting Non-causal Semantic Elimination: An Unconventional Harnessing of LVM for Open-World Deepfake Interpretation. 3528-3537 - Zhihao Sun, Haipeng Fang, Juan Cao, Xinying Zhao, Danding Wang:
Rethinking Image Editing Detection in the Era of Generative AI Revolution. 3538-3547 - Hongyun Yu, Zhan Qu, Qihang Yu, Jianchuan Chen, Zhonghua Jiang, Zhiwen Chen, Shengyu Zhang, Jimin Xu, Fei Wu, Chengfei Lv, Gang Yu:
GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting. 3548-3557 - Xingqi Wang, Xiaoyuan Yi, Xing Xie, Jia Jia:
Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization. 3558-3567 - Weili Zeng, Yichao Yan, Qi Zhu, Zhuo Chen, Pengzhi Chu, Weiming Zhao, Xiaokang Yang:
Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting. 3568-3577 - Yi Liu, Chengjun Cai, Xiaoli Zhang, Xingliang Yuan, Cong Wang:
Arondight: Red Teaming Large Vision Language Models with Auto-generated Multi-modal Jailbreak Prompts. 3578-3586 - Yisu Liu, Jinyang An, Wanqian Zhang, Dayan Wu, Jingzi Gu, Zheng Lin, Weiping Wang:
Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization. 3587-3596 - Yiren Lu, Jing Ma, Yu Yin:
View-consistent Object Removal in Radiance Fields. 3597-3606 - Shaocong Long, Qianyu Zhou, Xiangtai Li, Xuequan Lu, Chenhao Ying, Yuan Luo, Lizhuang Ma, Shuicheng Yan:
DGMamba: Domain Generalization via Generalized State Space Model. 3607-3616 - Wangguandong Zheng, Haifeng Xia, Rui Chen, Libo Sun, Ming Shao, Siyu Xia, Zhengming Ding:
Sketch3D: Style-Consistent Guidance for Sketch-to-3D Generation. 3617-3626 - Ziyin Zhou, Ke Sun, Zhongxi Chen, Huafeng Kuang, Xiaoshuai Sun, Rongrong Ji:
StealthDiffusion: Towards Evading Diffusion Forensic Detection through Diffusion Model. 3627-3636 - Hong Chen, Xin Wang, Yipeng Zhang, Yuwei Zhou, Zeyang Zhang, Siao Tang, Wenwu Zhu:
DisenStudio: Customized Multi-Subject Text-to-Video Generation with Disentangled Spatial Control. 3637-3646 - Ziqi Yu, Jing Zhou, Zhongyun Bao, Gang Fu, Weilei He, Chao Liang, Chunxia Xiao:
CFDiffusion: Controllable Foreground Relighting in Image Compositing via Diffusion Model. 3647-3656 - Hao Wang, Shangwei Guo, Jialing He, Kangjie Chen, Shudong Zhang, Tianwei Zhang, Tao Xiang:
EvilEdit: Backdooring Text-to-Image Diffusion Models in One Second. 3657-3665 - Haiyan Jiang, Leiyu Song, Dongdong Weng, Zhe Sun, Huiying Li, Xiaonuo Dongye, Zhenliang Zhang:
In Situ 3D Scene Synthesis for Ubiquitous Embodied Interfaces. 3666-3675 - Haoning Wu, Xiele Wu, Chunyi Li, Zicheng Zhang, Chaofeng Chen, Xiaohong Liu, Guangtao Zhai, Weisi Lin:
T2I-Scorer: Quantitative Evaluation on Text-to-Image Generation via Fine-Tuned Large Multi-Modal Models. 3676-3685 - Shiwei Li, Yingyi Cheng, Haozhao Wang, Xing Tang, Shijie Xu, Weihong Luo, Yuhua Li, Dugang Liu, Xiuqiang He, Ruixuan Li:
Masked Random Noise for Communication-Efficient Federated Learning. 3686-3694 - Sa Yan, Nuowen Kan, Chenglin Li, Wenrui Dai, Junni Zou, Hongkai Xiong:
Task-Oriented Multi-Bitstream Optimization for Image Compression and Transmission via Optimal Transport. 3695-3703 - Tingting Li, Ziming Zhao, Jianwei Yin:
Minerva: Enhancing Quantum Network Performance for High-Fidelity Multimedia Transmission. 3704-3712 - Xiaotong Yu, Chang-Wen Chen:
Semantic-aware Next-Best-View for Multi-DoFs Mobile System in Search-and-Acquisition based Visual Perception. 3713-3721 - Yu Chen, Yanan Wu, Na Han, Xiaozhao Fang, Bingzhi Chen, Jie Wen:
Partial Multi-label Learning Based On Near-Far Neighborhood Label Enhancement And Nonlinear Guidance. 3722-3731 - Ruofan Jia, Weiying Xie, Jie Lei, Yunsong Li:
Adaptive Hierarchical Aggregation for Federated Object Detection. 3732-3740 - Liang Xie, Wei Gao, Huiming Zheng, Ge Li:
ROI-Guided Point Cloud Geometry Compression Towards Human and Machine Vision. 3741-3750
Oral Session 12: Human-centric and Interactive Multimedia
- Xiyu Wang, Yufei Wang, Satoshi Tsutsui, Weisi Lin, Bihan Wen, Alex C. Kot:
Evolving Storytelling: Benchmarks and Methods for New Character Customization with Diffusion Models. 3751-3760 - Shiyu Liu, Zibo Zhao, Yihao Zhi, Yiqun Zhao, Binbin Huang, Shuo Wang, Ruoyu Wang, Michael Xuan, Zhengxin Li, Shenghua Gao:
HeroMaker: Human-centric Video Editing with Motion Priors. 3761-3770 - Yunze Liu, Changxi Chen, Chenjing Ding, Li Yi:
PhysReaction: Physically Plausible Real-Time Humanoid Reaction Synthesis via Forward Dynamics Guided 4D Imitation. 3771-3780 - Wenxuan Wang, Haonan Bai, Jen-tse Huang, Yuxuan Wan, Youliang Yuan, Haoyi Qiu, Nanyun Peng, Michael R. Lyu:
New Job, New Gender? Measuring the Social Bias in Image Generation Models. 3781-3789 - Mengzhen Liu, Mengyu Wang, Henghui Ding, Yilong Xu, Yao Zhao, Yunchao Wei:
Segment Anything with Precise Interaction. 3790-3799 - Zhihua Xu, Tianshui Chen, Zhijing Yang, Chunmei Qing, Yukai Shi, Liang Lin:
Self-Supervised Emotion Representation Disentanglement for Speech-Preserving Facial Expression Manipulation. 3800-3808
Oral Session 13: Machine Learning for Multimedia
- Dongyu Xie, Chaofan Qiao, Lanyue Liang, Zhiwen Wang, Tianyu Li, Qiao Liu, Chongyi Li, Guoqing Wang, Yang Yang:
Generalizing ISP Model by Unsupervised Raw-to-raw Mapping. 3809-3817 - Yang Liu, Daizong Liu, Zongming Guo, Wei Hu:
Cross-Task Knowledge Transfer for Semi-supervised Joint 3D Grounding and Captioning. 3818-3827 - Yang Liu, Qianqian Xu, Peisong Wen, Siran Dai, Qingming Huang:
Not All Pairs are Equal: Hierarchical Learning for Average-Precision-Oriented Video Retrieval. 3828-3837 - Dongjie Fu, Xize Cheng, Xiaoda Yang, Hanting Wang, Zhou Zhao, Tao Jin:
Boosting Speech Recognition Robustness to Modality-Distortion with Contrast-Augmented Prompts. 3838-3847 - Xingyu Zhu, Beier Zhu, Yi Tan, Shuo Wang, Yanbin Hao, Hanwang Zhang:
Selective Vision-Language Subspace Projection for Few-shot CLIP. 3848-3857 - Jin Liu, Bo Wang, Chuanming Wang, Huiyuan Fu, Huadong Ma:
Learning Exposure Correction in Dynamic Scenes. 3858-3866
Oral Session 14: Multimodal Datasets, Models & Analytics
- Fuqiang Niu, Zebang Cheng, Xianghua Fu, Xiaojiang Peng, Genan Dai, Yin Chen, Hu Huang, Bowen Zhang:
Multimodal Multi-turn Conversation Stance Detection: A Challenge Dataset and Effective Model. 3867-3876 - Ruilin Yao, Shengwu Xiong, Yichen Zhao, Yi Rong:
Visual Grounding with Multi-modal Conditional Adaptation. 3877-3886 - Junhao Xu, Jingjing Chen, Xue Song, Feng Han, Haijun Shan, Yu-Gang Jiang:
Identity-Driven Multimedia Forgery Detection via Reference Assistance. 3887-3896 - Bowen Zhao, Tianhao Cheng, Yuejie Zhang, Ying Cheng, Rui Feng, Xiaobo Zhang:
CT2C-QA: Multimodal Question Answering over Chinese Text, Table and Chart. 3897-3906 - Zhanyu Wang, Longyue Wang, Zhen Zhao, Minghao Wu, Chenyang Lyu, Huayang Li, Deng Cai, Luping Zhou, Shuming Shi, Zhaopeng Tu:
GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation. 3907-3916 - Linmei Hu, Duokang Wang, Yiming Pan, Jifan Yu, Yingxia Shao, Chong Feng, Liqiang Nie:
NovaChart: A Large-scale Dataset towards Chart Understanding and Generation of Multimodal Large Language Models. 3917-3925
Oral Session 15: Video Applications
- Jiaxu Li, Songsong Yu, Yifan Wang, Lijun Wang, Huchuan Lu:
SelM: Selective Mechanism based Audio-Visual Segmentation. 3926-3935 - Yuqing Wang, Lei Meng, Haokai Ma, Yuqing Wang, Haibei Huang, Xiangxu Meng:
Modeling Event-level Causal Representation for Video Classification. 3936-3944 - Te Yang, Jian Jia, Bo Wang, Yanhua Cheng, Yan Li, Dongze Hao, Xipeng Cao, Quan Chen, Han Li, Peng Jiang, Xiangyu Zhu, Zhen Lei:
Spatiotemporal Fine-grained Video Description for Short Videos. 3945-3954 - Yili Li, Jing Yu, Keke Gai, Bang Liu, Gang Xiong, Qi Wu:
T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval. 3955-3963 - Haijie Yang, Zhenyu Zhang, Hao Tang, Jianjun Qian, Jian Yang:
ConsistentAvatar: Learning to Diffuse Fully Consistent Talking Head Avatar with Temporal Guidance. 3964-3973 - Zhiyu Zhang, Guo Lu, Huanxiong Liang, Zhengxue Cheng, Anni Tang, Li Song:
Rate-aware Compression for NeRF-based Volumetric Video. 3974-3983
Oral Session 16: Biological and Health Applications
- Jingxiong Li, Sunyi Zheng, Chenglu Zhu, Yuxuan Sun, Pingyi Chen, Zhongyi Shui, Yunlong Zhang, Honglin Li, Lin Yang:
PathUp: Patch-wise Timestep Tracking for Multi-class Large Pathology Image Synthesising Diffusion Model. 3984-3993 - Dian Xie, Peiang Zhao, Jiarui Zhang, Kangqi Wei, Xiaobao Ni, Jiong Xia:
BrainRAM: Cross-Modality Retrieval-Augmented Image Reconstruction from Human Brain Activity. 3994-4003 - Shuo Ma, Yingwei Zhang, Qiqi Zhang, Yiqiang Chen, Haoran Wang, Ziyu Jia:
SleepMG: Multimodal Generalizable Sleep Staging with Inter-modal Balance of Classification and Domain Discrimination. 4004-4013 - Zixuan Gong, Qi Zhang, Guangyin Bao, Lei Zhu, Yu Zhang, Ke Liu, Liang Hu, Duoqian Miao:
Lite-Mind: Towards Efficient and Robust Brain Representation Learning. 4014-4023 - Kun Dong, Jian Xue, Zehai Niu, Xing Lan, Ke Lu, Qingyuan Liu, Xiaoyu Qin:
Realistic Full-Body Motion Generation from Sparse Tracking with State Space Model. 4024-4033 - Usman Naseem, Adam G. Dunn, Matloob Khushi, Jinman Kim:
Vaccine Misinformation Detection in X using Cooperative Multimodal Framework. 4034-4042
Oral Session 17: Person Modeling and Tracking
- Shizong Yan, Huixiang Wen, Shan Chang, Hongzi Zhu, Luo Zhou:
Fooling 3D Face Recognition with One Single 2D Image. 4043-4052 - Fangyi Liu, Mang Ye, Bo Du:
Cloth-aware Augmentation for Cloth-generalized Person Re-identification. 4053-4062 - Zhiqi Pang, Lingling Zhao, Chunyu Wang:
Dual-Resolution Fusion Modeling for Unsupervised Cross-Resolution Person Re-Identification. 4063-4072 - Huilin Tian, Jingke Meng, Wei-Shi Zheng, Yuan-Ming Li, Junkai Yan, Yunong Zhang:
Loc4Plan: Locating Before Planning for Outdoor Vision and Language Navigation. 4073-4081 - Changcheng Xiao, Qiong Cao, Zhigang Luo, Long Lan:
MambaTrack: A Simple Baseline for Multiple Object Tracking with State Space Model. 4082-4091 - Ling Li, WenRui Yang, Xinchun Yu, Junliang Xing, Xiao-Ping Zhang:
Translating Motion to Notation: Hand Labanotation for Intuitive and Comprehensive Hand Movement Documentation. 4092-4100
Poster Session 2
- Xiang Gao, Jiaying Liu:
FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation. 4101-4109 - Wen Yin, Bin Benjamin Zhu, Yulai Xie, Pan Zhou, Dan Feng:
Backdoor Attacks on Bimodal Salient Object Detection with RGB-Thermal Data. 4110-4119 - Zhixiang Shen, Haolan He, Zhao Kang:
Balanced Multi-Relational Graph Clustering. 4120-4128 - Jiyuan Wang, Chunyu Lin, Lang Nie, Kang Liao, Shuwei Shao, Yao Zhao:
Digging into Contrastive Learning for Robust Depth Estimation with Diffusion Models. 4129-4137 - Zhuoxiao Chen, Zixin Wang, Yadan Luo, Sen Wang, Zi Huang:
DPO: Dual-Perturbation Optimization for Test-time Adaptation in 3D Object Detection. 4138-4147 - Xian Zhang, Haokun Wen, Jianlong Wu, Pengda Qin, Hui Xue', Liqiang Nie:
Differential-Perceptive and Retrieval-Augmented MLLM for Change Captioning. 4148-4157 - Bingyan Liu, Chengyu Wang, Jun Huang, Kui Jia:
Attentive Linguistic Tracking in Diffusion Models for Training-free Text-guided Image Editing. 4158-4166 - Changhao He, Hongyuan Zhu, Peng Hu, Xi Peng:
Robust Variational Contrastive Learning for Partially View-unaligned Clustering. 4167-4176 - Shengxin Chen, Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Guannan Jiang, Rongrong Ji:
QueryMatch: A Query-based Contrastive Learning Framework for Weakly Supervised Visual Grounding. 4177-4186 - Rui Liu, Yifan Hu, Yi Ren, Xiang Yin, Haizhou Li:
Generative Expressive Conversational Speech Synthesis. 4187-4196 - Zhien Dai, Zhaohui Tang, Hu Zhang, Can Tian, Mingjun Pan, Yongfang Xie:
Eglcr: Edge Structure Guidance and Scale Adaptive Attention for Iterative Stereo Matching. 4197-4206 - Humen Zhong, Zhibo Yang, Zhaohai Li, Peng Wang, Jun Tang, Wenqing Cheng, Cong Yao:
VL-Reader: Vision and Language Reconstructor is an Effective Scene Text Recognizer. 4207-4216 - Chaofan Gan, Yuanpeng Tu, Yuxi Li, Weiyao Lin:
DAC: 2D-3D Retrieval with Noisy Labels via Divide-and-Conquer Alignment and Correction. 4217-4226 - Zhenyu Hou, Junjun Guo:
Virtual Visual-Guided Domain-Shadow Fusion via Modal Exchanging for Domain-Specific Multi-Modal Neural Machine Translation. 4227-4235 - Yuxiang Yang, Lu Wen, Xinyi Zeng, Yuanyuan Xu, Xi Wu, Jiliu Zhou, Yan Wang:
Learning with Alignments: Tackling the Inter- and Intra-domain Shifts for Cross-multidomain Facial Expression Recognition. 4236-4245 - Shuhuang Chen, Dingjie Fu, Shiming Chen, Shuo Ye, Wenjin Hou, Xinge You:
Causal Visual-semantic Correlation for Zero-shot Learning. 4246-4255 - Patrick Steinert, Stefan Wagenpfeil, Ingo Frommholz, Matthias L. Hemmje:
256 Metaverse Records Dataset. 4256-4263 - Yifeng Xie, Zhihong Zhu, Xin Chen, Zhanpeng Chen, Zhiqi Huang:
MoBA: Mixture of Bi-directional Adapter for Multi-modal Sarcasm Detection. 4264-4272 - Jiulin Li, Mengyu Yang, Ye Tian, Lanshan Zhang, Yongchun Lu, Jice Liu, Wendong Wang:
WaveDN: A Wavelet-based Training-free Zero-shot Enhancement for Vision-Language Models. 4273-4282 - Runkai Zhao, Heng Wang, Weidong Cai:
LaneCMKT: Boosting Monocular 3D Lane Detection with Cross-Modal Knowledge Transfer. 4283-4291 - Wenju Sun, Qingyong Li, Siyu Zhang, Wen Wang, Yangli-ao Geng:
Incremental Learning via Robust Parameter Posterior Fusion. 4292-4301 - Tao Jin, Weicai Yan, Ye Wang, Sihang Cai, Qifan Shuai, Zhou Zhao:
Calibrating Prompt from History for Continual Vision-Language Retrieval and Grounding. 4302-4311 - Pengyue Lin, Ruifan Li, Yuzhe Ji, Zhihan Yu, Fangxiang Feng, Zhanyu Ma, Xiaojie Wang:
Triple Alignment Strategies for Zero-shot Phrase Grounding under Weak Supervision. 4312-4321 - Zhenni Yu, Xiaoqin Zhang, Li Zhao, Yi Bin, Guobao Xiao:
Exploring Deeper! Segment Anything Model with Depth Perception for Camouflaged Object Detection. 4322-4330 - Jiawei Wang, Da Cao, Shaofei Lu, Zhanchang Ma, Junbin Xiao, Tat-Seng Chua:
Causal-driven Large Language Models with Faithful Reasoning for Knowledge Question Answering. 4331-4340 - Zijian Yi, Ziming Zhao, Zhishu Shen, Tiehua Zhang:
Multimodal Fusion via Hypergraph Autoencoder and Contrastive Learning for Emotion Recognition in Conversation. 4341-4348 - Cheng Shen, Liquan Shen, Mengyao Li, Meng Yu:
EPL-UFLSID: Efficient Pseudo Labels-Driven Underwater Forward-Looking Sonar Images Object Detection. 4349-4357 - Shuiping Gou, Xin Wang, Xinlin Wang, Yunzhi Chen:
Interpretable Matching of Optical-SAR Image via Dynamically Conditioned Diffusion Models. 4358-4367 - Xiaohuan Ding, Yangrui Gong, Tianyi Shi, Zihang Huang, Gangwei Xu, Xin Yang:
Masked Snake Attention for Fundus Image Restoration with Vessel Preservation. 4368-4376 - Yajie Zhang, Zhi-An Huang, Zhiliang Hong, Songsong Wu, Jibin Wu, Kay Chen Tan:
Mixed Prototype Correction for Causal Inference in Medical Image Classification. 4377-4386 - Yi Zhang, Ke Yu, Angelica I. Avilés-Rivero, Jiyuan Jia, Yushun Tang, Zhihai He:
Training-Free Feature Reconstruction with Sparse Optimization for Vision-Language Models. 4387-4396 - Nan Wang, Zonglin Di, Houlin He, Qingchao Jiang, Xiaoxiao Li:
A Simple and Provable Approach for Learning on Noisy Labeled Medical Images. 4397-4405 - Mengmeng Sheng, Zeren Sun, Gensheng Pei, Tao Chen, Haonan Luo, Yazhou Yao:
Enhancing Robustness in Learning with Noisy Labels: An Asymmetric Co-Training Approach. 4406-4415 - Muquan Li, Dongyang Zhang, Tao He, Xiurui Xie, Yuan-Fang Li, Ke Qin:
Towards Effective Data-Free Knowledge Distillation via Diverse Diffusion Augmentation. 4416-4425 - Qiuhui Chen, Yi Hong:
SMART: Self-Weighted Multimodal Fusion for Diagnostics of Neurodegenerative Disorders. 4426-4435 - Taoyu Su, Jiawei Sheng, Shicheng Wang, Xinghua Zhang, Hongbo Xu, Tingwen Liu:
IBMEA: Exploring Variational Information Bottleneck for Multi-modal Entity Alignment. 4436-4445 - Zhijun Jia, Huaying Xue, Xiulian Peng, Yan Lu:
Convert and Speak: Zero-shot Accent Conversion with Minimum Supervision. 4446-4454 - Yihan Zhao, Wei Xi, Yuhang Cui, Gairui Bai, Xinhui Liu, Jizhong Zhao:
CoPL: Parameter-Efficient Collaborative Prompt Learning for Audio-Visual Tasks. 4455-4464 - Junbo Hu, Zhixin Li:
Distilled Cross-Combination Transformer for Image Captioning with Dual Refined Visual Features. 4465-4474 - Siyuan Xu, Guannan Li, Haofei Song, Jiansheng Wang, Yan Wang, Qingli Li:
GeNSeg-Net: A General Segmentation Framework for Any Nucleus in Immunohistochemistry Images. 4475-4484 - Ziyi Gao, Kai Chen, Zhipeng Wei, Tingshu Mou, Jingjing Chen, Zhiyu Tan, Hao Li, Yu-Gang Jiang:
ReToMe-VA: Recursive Token Merging for Video Diffusion-based Unrestricted Adversarial Attack. 4485-4494 - Kunyu Peng, David Schneider, Alina Roitberg, Kailun Yang, Jiaming Zhang, Chen Deng, Kaiyu Zhang, M. Saquib Sarfraz, Rainer Stiefelhagen:
Towards Video-based Activated Muscle Group Estimation in the Wild. 4495-4504 - Rui Xu, Gaolei Li, Changze Li, Zhaohui Yang, Yuchen Liu, Mingzhe Chen:
OSNeRF: On-demand Semantic Neural Radiance Fields for Fast and Robust 3D Object Reconstruction. 4505-4514 - Wenjie Li, Heng Guo, Xuannan Liu, Kongming Liang, Jiani Hu, Zhanyu Ma, Jun Guo:
Efficient Face Super-Resolution via Wavelet-based Feature Enhancement Network. 4515-4523 - Ruoxi Deng, Bin Yu, Jinxuan Lu, Caixia Zhou, Zhao-Min Chen, Jie Hu:
Advancing Semantic Edge Detection through Cross-Modal Knowledge Learning. 4524-4532 - Jiacheng Zhang, Jie Wu, Huafeng Kuang, Haiming Zhang, Yuxi Ren, Weifeng Chen, Manlin Zhang, Xuefeng Xiao, Guanbin Li:
TreeReward: Improve Diffusion Model via Tree-Structured Feedback Learning. 4533-4542 - Chaomin Shen, Yaomin Huang, Haokun Zhu, Jinsong Fan, Guixu Zhang:
Student-Oriented Teacher Knowledge Refinement for Knowledge Distillation. 4543-4552 - Yanshan Zhou, Pingrui Lai, Jiaqi Yu, Yingjie Xiong, Hua Yang:
Hydrodynamics-Informed Neural Network for Simulating Dense Crowd Motion Patterns. 4553-4561 - Zhidong Yu, Zhenbo Shi, Xiaoman Liu, Wei Yang:
PFFAA: Prototype-based Feature and Frequency Alteration Attack for Semantic Segmentation. 4562-4571 - Wenbo Huang, Jinghui Zhang, Xuwei Qian, Zhen Wu, Meng Wang, Lei Zhang:
SOAP: Enhancing Spatio-Temporal Relation and Motion Information Capturing for Few-Shot Action Recognition. 4572-4580 - Xiangyan Qu, Jing Yu, Keke Gai, Jiamin Zhuang, Yuanmin Tang, Gang Xiong, Gaopeng Gou, Qi Wu:
Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning. 4581-4590 - Weixiang Han, Chengjun Cai, Yu Guo, Jialiang Peng:
ERL-MR: Harnessing the Power of Euler Feature Representations for Balanced Multi-modal Learning. 4591-4600 - Luca Rossetto, Cristina Sarasua, Abraham Bernstein:
Estimating the Semantic Density of Visual Media. 4601-4609 - Shaokun Zhang, Yiran Wu, Zhonghua Zheng, Qingyun Wu, Chi Wang:
HyperTime: Hyperparameter Optimization for Combating Temporal Distribution Shifts. 4610-4619 - Xiaomeng Chu, Jiajun Deng, Guoliang You, Yifan Duan, Yao Li, Yanyong Zhang:
RayFormer: Improving Query-Based Multi-Camera 3D Object Detection via Ray-Centric Strategies. 4620-4629 - Yi Bin, Junrong Liao, Yujuan Ding, Haoxuan Li, Yang Yang, See-Kiong Ng, Heng Tao Shen:
Leveraging Weak Cross-Modal Guidance for Coherence Modelling via Iterative Learning. 4630-4639 - Chengyou Jia, Minnan Luo, Xiaojun Chang, Zhuohang Dang, Mingfei Han, Mengmeng Wang, Guang Dai, Sizhe Dang, Jingdong Wang:
Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition. 4640-4649 - Jialu Zhang, Xinyi Wang, Chenglin Yao, Jianfeng Ren, Xudong Jiang:
Visual-linguistic Cross-domain Feature Learning with Group Attention and Gamma-correct Gated Fusion for Extracting Commonsense Knowledge. 4650-4659 - Wenhan Wu, Ce Zheng, Zihao Yang, Chen Chen, Srijan Das, Aidong Lu:
Frequency Guidance Matters: Skeletal Action Recognition by Frequency-Aware Mixed Transformer. 4660-4669 - Xianwei Zhuang, Xuxin Cheng, Zhihong Zhu, Zhanpeng Chen, Hongxiang Li, Yuexian Zou:
Towards Multimodal-augmented Pre-trained Language Models via Self-balanced Expectation-Maximization Iteration. 4670-4679 - Hongze Zhu, Guoyang Xie, Chengbin Hou, Tao Dai, Can Gao, Jinbao Wang, Linlin Shen:
Towards High-resolution 3D Anomaly Detection via Group-Level Feature Contrastive Learning. 4680-4689 - Kaixiang Wang, Xiaojian Ding, Fan Yang:
Non-Overlapped Multi-View Weak-Label Learning Guided by Multiple Correlations. 4690-4698 - Xin Mei, Rui Mao, Xiaoyan Cai, Libin Yang, Erik Cambria:
Medical Report Generation via Multimodal Spatio-Temporal Fusion. 4699-4708 - Guofan Fan, Zekun Qi, Wenkai Shi, Kaisheng Ma:
Point-GCC: Universal Self-supervised 3D Scene Pre-training via Geometry-Color Contrast. 4709-4718 - Menghao Zhang, Jingyu Wang, Qi Qi, Pengfei Ren, Haifeng Sun, Zirui Zhuang, Huazheng Wang, Lei Zhang, Jianxin Liao:
Video Anomaly Detection via Progressive Learning of Multiple Proxy Tasks. 4719-4728 - Xingyu Zhang, Siyu Zhao, Zeen Song, Huijie Guo, Jianqi Zhang, Changwen Zheng, Wenwen Qiang:
Not All Frequencies Are Created Equal: Towards a Dynamic Fusion of Frequencies in Time-Series Forecasting. 4729-4737 - Shijie Chen, Junbao Zhuo, Xin Li, Haizhuang Liu, Rongquan Wang, Jiansheng Chen, Huimin Ma:
CMT: Co-training Mean-Teacher for Unsupervised Domain Adaptation on 3D Object Detection. 4738-4747 - Tianrui Pan, Jie Liu, Bohan Wang, Jie Tang, Gangshan Wu:
RAVSS: Robust Audio-Visual Speech Separation in Multi-Speaker Scenarios with Missing Visual Cues. 4748-4756 - Siqi Wang, Chao Liang, Yunfan Gao, Yang Liu, Jing Li, Haofen Wang:
Decoding Urban Industrial Complexity: Enhancing Knowledge-Driven Insights via IndustryScopeGPT. 4757-4765 - Yuanbin Fu, Jie Ying, Houlei Lv, Xiaojie Guo:
Semi-supervised Camouflaged Object Detection from Noisy Data. 4766-4775 - Bolei Chen, Jiaxu Kang, Ping Zhong, Yixiong Liang, Yu Sheng, Jianxin Wang:
Embodied Contrastive Learning with Geometric Consistency and Behavioral Awareness for Object Navigation. 4776-4785 - Jia-Li Yin, Menghao Chen, Jin Han, Bo-Hao Chen, Ximeng Liu:
Adversarial Example Quality Assessment: A Large-scale Dataset and Strong Baseline. 4786-4794 - Ye Jing, Xinpei Zhao:
DQ-Former: Querying Transformer with Dynamic Modality Priority for Cognitive-aligned Multimodal Emotion Recognition in Conversation. 4795-4804 - Xicong Wang, Huiyuan Fu, Jiaxuan Wang, Xin Wang, Heng Zhang, Huadong Ma:
Exploring in Extremely Dark: Low-Light Video Enhancement with Real Events. 4805-4813 - Qing Zhang, Haocheng Lv, Jie Liu, Zhiyun Chen, Jianyong Duan, Hao Wang, Li He, Mingying Xu:
An Entailment Tree Generation Approach for Multimodal Multi-Hop Question Answering with Mixture-of-Experts and Iterative Feedback Mechanism. 4814-4822 - Kangpeng Hu, Quansen Sun, Yinghui Sun, Tao Wang:
Interactive Segmentation by Considering First-Click Intentional Ambiguity. 4823-4831 - Leqi Shen, Sicheng Zhao, Yifeng Zhang, Hui Chen, Jundong Zhou, Pengzhang Liu, Yongjun Bao, Guiguang Ding:
Multi-Label Learning with Block Diagonal Labels. 4832-4840 - Wentao He, Jianfeng Ren, Ruibin Bai, Xudong Jiang:
Hierarchical Perceptual and Predictive Analogy-Inference Network for Abstract Visual Reasoning. 4841-4850 - Wenxi Li, Yuchen Guo, Jilai Zheng, Haozhe Lin, Chao Ma, Lu Fang, Xiaokang Yang:
SparseFormer: Detecting Objects in HRW Shots via Sparse Vision Transformer. 4851-4860 - Bo Liu, Zexin Lu, Yan Wang:
Towards Medical Vision-Language Contrastive Pre-training via Study-Oriented Semantic Exploration. 4861-4870 - Zihao Liu, Xiaoyu Wu, Shengjin Wang, Jiayao Qian:
Adaptively Building a Video-language Model for Video Captioning and Retrieval without Massive Video Pretraining. 4871-4880 - Wenhao Guo, Peng Lu, Xujun Peng, Zhaoran Zhao, Ji Qiu, Xiangtao Dong:
BCSCN: Reducing Domain Gap through Bézier Curve basis-based Sparse Coding Network for Single-Image Super-Resolution. 4881-4889 - Yi Tu, Chong Zhang, Ya Guo, Huan Chen, Jinyang Tang, Huijia Zhu, Qi Zhang:
UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents. 4890-4898 - Tao Ling, Siping Shi, Hao Wang, Chuang Hu, Dan Wang:
Federated Morozov Regularization for Shortcut Learning in Privacy Preserving Learning with Watermarked Image Data. 4899-4908 - Jinfu Liu, Chen Chen, Mengyuan Liu:
Multi-Modality Co-Learning for Efficient Skeleton-based Action Recognition. 4909-4918 - Zewen Du, Zhenjiang Hu, Guiyu Zhao, Ying Jin, Hongbin Ma:
LDA-AQU: Adaptive Query-guided Upsampling via Local Deformable Attention. 4919-4927 - Shichen Lu, Longteng Guo, Wenxuan Wang, Zijia Zhao, Tongtian Yue, Jing Liu, Si Liu:
Collaborative Training of Tiny-Large Vision Language Models. 4928-4937 - Xudong Zhou, Tianxiang Chen:
BSBP-RWKV: Background Suppression with Boundary Preservation for Efficient Medical Image Segmentation. 4938-4946 - Yuxing Zhang, Siyuan Meng, Chunchun Chen, Mengyao Peng, Hongyan Gu, Xinli Huang:
LinkThief: Combining Generalized Structure Knowledge with Node Similarity for Link Stealing Attack against GNN. 4947-4956 - Yeqing Shen, Shang Li, Kun Song:
Restoring Real-World Degraded Events Improves Deblurring Quality. 4957-4966 - Xiao Liang, Yanlei Zhang, Di Wang, Haodi Zhong, Ronghan Li, Quan Wang:
Divide and Conquer: Isolating Normal-Abnormal Attributes in Knowledge Graph-Enhanced Radiology Report Generation. 4967-4975 - Zhen Wang, Dongyuan Li, Guang Li, Ziqing Zhang, Renhe Jiang:
Multimodal Low-light Image Enhancement with Depth Information. 4976-4985 - Zishuo Wang, Wenhao Zhou, Jinglin Xu, Yuxin Peng:
SIA-OVD: Shape-Invariant Adapter for Bridging the Image-Region Gap in Open-Vocabulary Detection. 4986-4994 - Xu Han, Yuan Tang, Zhaoxuan Wang, Xianzhi Li:
Mamba3D: Enhancing Local Features for 3D Point Cloud Analysis via State Space Model. 4995-5004 - Wenqi Ren, Ruihao Xia, Meng Zheng, Ziyan Wu, Yang Tang, Nicu Sebe:
Cross-Class Domain Adaptive Semantic Segmentation with Visual Language Models. 5005-5014 - Xuefeng Yin, Chenyang Zhu, Shanglai Qu, Yuqi Li, Kai Xu, Baocai Yin, Xin Yang:
CSO: Constraint-Guided Space Optimization for Active Scene Mapping. 5015-5024 - Luoyi Sun, Xuenan Xu, Mengyue Wu, Weidi Xie:
Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning. 5025-5034 - Xinyue Liu, Jianyuan Wang, Biao Leng, Shuo Zhang:
Dual-Modeling Decouple Distillation for Unsupervised Anomaly Detection. 5035-5044 - Huimin Ma, Siwei Wang, Shengju Yu, Suyuan Liu, Junjie Huang, Huijun Wu, Xinwang Liu, En Zhu:
Automatic and Aligned Anchor Learning Strategy for Multi-View Clustering. 5045-5054 - Shengyang Sun, Jiashen Hua, Junyi Feng, Dongxu Wei, Baisheng Lai, Xiaojin Gong:
TDSD: Text-Driven Scene-Decoupled Weakly Supervised Video Anomaly Detection. 5055-5064 - Yang Xin, Yu Zhou, Jianmin Jiang:
RobustFace: Adaptive Mining of Noise and Hard Samples for Robust Face Recognitions. 5065-5073 - Xiang Ma, Xuemei Li, Lexin Fang, Caiming Zhang:
Bridging the Modality Gap: Dimension Information Alignment and Sparse Spatial Constraint for Image-Text Matching. 5074-5082 - Chunli Peng, Xuan Dong, Tiantian Cao, Zhengqing Li, Kun Dong, Weixin Li:
ReWiTe: Realistic Wide-angle and Telephoto Dual Camera Fusion Dataset via Beam Splitter Camera Rig. 5083-5091 - Yang Fang, Xuefeng Rao, Xinbo Gao, Weisheng Li, Zijian Min:
MTSNet: Joint Feature Adaptation and Enhancement for Text-Guided Multi-view Martian Terrain Segmentation. 5092-5101 - Le Jiang, Yan Huang, Lianxin Xie, Wen Xue, Cheng Liu, Si Wu, Hau-San Wong:
Hunting Blemishes: Language-guided High-fidelity Face Retouching Transformer with Limited Paired Data. 5102-5111 - Yijia Guo, Yuanxi Bai, Liwen Hu, Ziyi Guo, Mianzhi Liu, Yu Cai, Tiejun Huang, Lei Ma:
PRTGS: Precomputed Radiance Transfer of Gaussian Splats for Real-Time High-Quality Relighting. 5112-5120 - Mingcan Xiang, Jiaxun Tang, Qizheng Yang, Hui Guan, Tongping Liu:
AdapMTL: Adaptive Pruning Framework for Multitask Learning Model. 5121-5130 - Xinwei Zhang, Aishan Liu, Tianyuan Zhang, Siyuan Liang, Xianglong Liu:
Towards Robust Physical-world Backdoor Attacks on Lane Detection. 5131-5140 - Longtao Jiang, Min Wang, Zecheng Li, Yao Fang, Wengang Zhou, Houqiang Li:
SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval. 5141-5150 - Pinxue Guo, Wanyun Li, Hao Huang, Lingyi Hong, Xinyu Zhou, Zhaoyu Chen, Jinglun Li, Kaixun Jiang, Wei Zhang, Wenqiang Zhang:
X-Prompt: Multi-modal Visual Prompt for Video Object Segmentation. 5151-5160 - Ling Huang, Wenqian Dong, Song Xiao, Jiahui Qu, Yuanbo Yang, Yunsong Li:
Language-Guided Visual Prompt Compensation for Multi-Modal Remote Sensing Image Classification with Modality Absence. 5161-5170 - Zening Lin, Jiapeng Wang, Teng Li, Wenhui Liao, Dayi Huang, Longfei Xiong, Lianwen Jin:
PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction. 5171-5180 - Haojian Huang, Xiaozhen Qiao, Zhuo Chen, Haodong Chen, Bingyu Li, Zhe Sun, Mulin Chen, Xuelong Li:
CREST: Cross-modal Resonance through Evidential Deep Learning for Enhanced Zero-Shot Learning. 5181-5190 - Shuai Zhao, Yongkun Du, Zhineng Chen, Yu-Gang Jiang:
Decoder Pre-Training with only Text for Scene Text Recognition. 5191-5200 - Naibo Wang, Yuchen Deng, Wenjie Feng, Shichen Fan, Jianwei Yin, See-Kiong Ng:
One-Shot Sequential Federated Learning for Non-IID Data by Enhancing Local Model Diversity. 5201-5210 - Wendong Huang, Jinwu Hu, Xiuli Bi, Bin Xiao:
Anatomical Prior Guided Spatial Contrastive Learning for Few-Shot Medical Image Segmentation. 5211-5220 - Libo Long, Xiao Hu, Jochen Lang:
Learning to Handle Large Obstructions in Video Frame Interpolation. 5221-5229 - Hefei Huang, Xu Jia, Xinyu Zhang, Shengming Li, Huchuan Lu:
Event-Guided Rolling Shutter Correction with Time-Aware Cross-Attentions. 5230-5239 - Xibiao Wang, Hang Gao, Xindian Wei, Liang Peng, Rui Li, Cheng Liu, Si Wu, Hau-San Wong:
Contrastive Graph Distribution Alignment for Partially View-Aligned Clustering. 5240-5249 - Xudong Cai, Yongcai Wang, Lun Luo, Minhang Wang, Deying Li, Jintao Xu, Weihao Gu, Rui Ai:
PRISM: PRogressive dependency maxImization for Scale-invariant image Matching. 5250-5259 - Yang Du, Yuqi Liu, Qin Jin:
Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval. 5260-5269 - Wen Luo, Yu Xia, Tianshu Shen, Sujian Li:
Shapley Value-based Contrastive Alignment for Multimodal Information Extraction. 5270-5279 - Hao Yu, Xin Yang, Xin Gao, Yihui Feng, Hao Wang, Yan Kang, Tianrui Li:
Overcoming Spatial-Temporal Catastrophic Forgetting for Federated Class-Incremental Learning. 5280-5288 - Haibo Wang, Chenghang Lai, Yixuan Sun, Weifeng Ge:
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering. 5289-5298 - Shudong Huang, Hecheng Cai, Hao Dai, Wentao Feng, Jiancheng Lv:
Adaptive Instance-wise Multi-view Clustering. 5299-5307 - Ze Yuan, Jinyang Guo, Dakai An, Junran Wu, He Zhu, Jianhao Li, Xueyuan Chen, Ke Xu, Jiaheng Liu:
VRDistill: Vote Refinement Distillation for Efficient Indoor 3D Object Detection. 5308-5317 - Sunoh Kim, Daeho Um, Hyunjun Choi, Jin Young Choi:
Learnable Negative Proposals Using Dual-Signed Cross-Entropy Loss for Weakly Supervised Video Moment Localization. 5318-5327 - Yansong Qu, Shaohui Dai, Xinyang Li, Jianghang Lin, Liujuan Cao, Shengchuan Zhang, Rongrong Ji:
GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane. 5328-5337 - Huan Yao, Changxing Ding, Xuanda Xu, Zhifeng Lin:
Decoupling Heterogeneous Features for Robust 3D Interacting Hand Poses Estimation. 5338-5346 - Zhiyu Zhu, Zhibo Jin, Jiayu Zhang, Huaming Chen:
Enhancing Model Interpretability with Local Attribution over Global Exploration. 5347-5355 - Ruxue Yan, Wenya Guo, Xubo Liu, Xumeng Liu, Ying Zhang, Xiaojie Yuan:
Tracking-forced Referring Video Object Segmentation. 5356-5364 - Xin Zhang, Shenghua Zhong, Jianmin Jiang:
Effective Optimization of Root Selection Towards Improved Explanation of Deep Classifiers. 5365-5373 - Guangchen Shi, Wei Zhu, Yirui Wu, Danhuai Zhao, Kang Zheng, Tong Lu:
Few-shot Semantic Segmentation via Perceptual Attention and Spatial Control. 5374-5383 - Zibo Ma, Bo Zhang, Zheng Zhang, Wu Liu, Wufan Wang, Hui Gao, Wendong Wang:
ADDG: An Adaptive Domain Generalization Framework for Cross-Plane MRI Segmentation. 5384-5392 - Lixiang Ru, Xin Guo, Lei Yu, Yingying Zhang, Jiangwei Lao, Jian Wang, Jingdong Chen, Yansheng Li, Ming Yang:
Parameter-Efficient Complementary Expert Learning for Long-Tailed Visual Recognition. 5393-5402 - Tianyuan Zhang, Lu Wang, Hainan Li, Yisong Xiao, Siyuan Liang, Aishan Liu, Xianglong Liu, Dacheng Tao:
LanEvil: Benchmarking the Robustness of Lane Detection to Environmental Illusions. 5403-5412 - Xinyue Zhang, Tingjin Luo, Yueying Liu, Chenping Hou:
Imbalanced Multi-instance Multi-label Learning via Coding Ensemble and Adaptive Thresholds. 5413-5422 - Pengxu Chen, Huazhong Liu, Jihong Ding, Jiawen Luo, Peng Tan, Laurence T. Yang:
Holistic-CAM: Ultra-lucid and Sanity Preserving Visual Interpretation in Holistic Stage of CNNs. 5423-5431 - Yihao Wang, Meng Yang, Rui Cao:
Fine-grained Semantic Alignment with Transferred Person-SAM for Text-based Person Retrieval. 5432-5441 - Qijie Wang, Guandu Liu, Bin Wang:
CapS-Adapter: Caption-based MultiModal Adapter in Zero-Shot Classification. 5442-5450 - Rongyu Zhang, Zefan Cai, Huanrui Yang, Zidong Liu, Denis A. Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Baobao Chang, Yuan Du, Li Du, Shanghang Zhang:
VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness. 5451-5459 - Linhui Xiao, Xiaoshan Yang, Fang Peng, Yaowei Wang, Changsheng Xu:
HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding. 5460-5469 - Yunfeng Fan, Wenchao Xu, Haozhao Wang, Junhong Liu, Song Guo:
Detached and Interactive Multimodal Learning. 5470-5478 - Chenglong Zhang, Xinyan Liang, Peng Zhou, Zhaolong Ling, Yingwei Zhang, Xingyu Wu, Weiguo Sheng, Bingbing Jiang:
Scalable Multi-view Unsupervised Feature Selection with Structure Learning and Fusion. 5479-5488 - Chengyi Yang, Mingda Dong, Xiaoyue Zhang, Jiayin Qi, Aimin Zhou:
Introducing Common Null Space of Gradients for Gradient Projection Methods in Continual Learning. 5489-5497 - Masoumeh Zareapoor, Pourya Shamsolmoali, Huiyu Zhou, Yue Lu, Salvador García:
Fractional Correspondence Framework in Detection Transformer. 5498-5506 - Geuntaek Lim, Hyunwoo Kim, Joonsoo Kim, Yukyung Choi:
Probabilistic Vision-Language Representation for Weakly Supervised Temporal Action Localization. 5507-5516 - Xihong Yang, Erxue Min, Ke Liang, Yue Liu, Siwei Wang, Sihang Zhou, Huijun Wu, Xinwang Liu, En Zhu:
GraphLearner: Graph Node Clustering with Fully Learnable Augmentation. 5517-5526 - Hongqiu Wang, Wei Wang, Haipeng Zhou, Huihui Xu, Shaozhi Wu, Lei Zhu:
Language-Driven Interactive Shadow Detection. 5527-5536 - Jinyu Cai, Yunhe Zhang, Zhoumin Lu, Wenzhong Guo, See-Kiong Ng:
Towards Effective Federated Graph Anomaly Detection via Self-boosted Knowledge Distillation. 5537-5546 - Chaofan Huo, Ye Shi, Jingya Wang:
Monocular Human-Object Reconstruction in the Wild. 5547-5555 - Baoqi Gao, Daoxu Sheng, Lei Zhang, Qi Qi, Bo He, Zirui Zhuang, Jingyu Wang:
STAR-VP: Improving Long-term Viewport Prediction in 360° Videos via Space-aligned and Time-varying Fusion. 5556-5565 - Hu Gao, Jing Yang, Ying Zhang, Jingfan Yang, Bowen Ma, Depeng Dang:
Learning Optimal Combination Patterns for Lightweight Stereo Image Super-Resolution. 5566-5574 - Yifan Wang, Wuliang Huang, Lei Li, Chun Yuan:
Semantic Distillation from Neighborhood for Composed Image Retrieval. 5575-5583 - Zhentao He, Changqun Xia, Shengye Qiao, Jia Li:
Text-prompt Camouflaged Instance Segmentation with Graduated Camouflage Learning. 5584-5593 - Zuyu Zhang, Yan Li, Byung-Seok Shin:
Embracing Domain Gradient Conflicts: Domain Generalization Using Domain Gradient Equilibrium. 5594-5603 - Ting Zhe, Jing Zhang, Yongqian Li, Yong Luo, Han Hu, Dacheng Tao:
Multi-Granularity Hand Action Detection. 5604-5613 - Xingyuan Mao, Yuwen Liu, Lianyong Qi, Li Duan, Xiaolong Xu, Xuyun Zhang, Wanchun Dou, Amin Beheshti, Xiaokang Zhou:
Cluster-driven Personalized Federated Recommendation with Interest-aware Graph Convolution Network for Multimedia. 5614-5622 - Yuan Sun, Kaiming Liu, Yongxiang Li, Zhenwen Ren, Jian Dai, Dezhong Peng:
Distribution Consistency Guided Hashing for Cross-Modal Retrieval. 5623-5632 - Luanyuan Dai, Xiaoyu Du, Jinhui Tang:
TrGa: Reconsidering the Application of Graph Neural Networks in Two-View Correspondence Pruning. 5633-5642 - Han Jiang, Haoyu Tang, Ming Yan, Ji Zhang, Mingzhu Xu, Yupeng Hu, Jihua Zhu, Liqiang Nie:
Revisiting Unsupervised Temporal Action Localization: The Primacy of High-Quality Actionness and Pseudolabels. 5643-5652 - Yu Liao, Xinfeng Zhang, Rui Yang, Jianwei Tao, Bai Liu, Zhipeng Hu, Shuang Wang, Zeng Zhao:
Selection and Reconstruction of Key Locals: A Novel Specific Domain Image-Text Retrieval Method. 5653-5662 - Wei Yang, Qingchen Yang:
Multimodal-aware Multi-intention Learning for Recommendation. 5663-5672 - Liupeng Li, Yuhua Zheng, Shupeng Liu, Xiaoyin Xu, Taihao Li:
Domain Knowledge Enhanced Vision-Language Pretrained Model for Dynamic Facial Expression Recognition. 5673-5682 - Yuting Zhang, Zhao Zhang, Yiqing Wu, Ying Sun, Fuzhen Zhuang, Wenhui Yu, Lantao Hu, Han Li, Kun Gai, Zhulin An, Yongjun Xu:
Tag Tree-Guided Multi-grained Alignment for Multi-Domain Short Video Recommendation. 5683-5691 - Kai Shao, Rui Wang, Yixue Hao, Long Hu, Min Chen, Hans-Arno Jacobsen:
Multimodal Physiological Signals Representation Learning via Multiscale Contrasting for Depression Recognition. 5692-5701 - Xinyu Li, Wenqing Ye, Yueyi Zhang, Xiaoyan Sun:
GRACE: GRadient-based Active Learning with Curriculum Enhancement for Multimodal Sentiment Analysis. 5702-5711 - Yuchen Pan, Junjun Jiang, Kui Jiang, Xianming Liu:
Disentangled-Multimodal Privileged Knowledge Distillation for Depression Recognition with Incomplete Multimodal Data. 5712-5721 - Yuanyuan Liu, Yuxuan Huang, Shuyang Liu, Yibing Zhan, Zijing Chen, Zhe Chen:
Open-Set Video-based Facial Expression Recognition with Human Expression-sensitive Prompting. 5722-5731 - Aoqiang Zhu, Min Hu, Xiaohua Wang, Jiaoyun Yang, Yiming Tang, Fuji Ren:
KEBR: Knowledge Enhanced Self-Supervised Balanced Representation for Multimodal Sentiment Analysis. 5732-5741 - Zining Wang, Jinyang Guo, Ruihao Gong, Yang Yong, Aishan Liu, Yushi Huang, Jiaheng Liu, Xianglong Liu:
PTSBench: A Comprehensive Post-Training Sparsity Benchmark Towards Algorithms and Models. 5742-5751 - Longan Wang, Yang Qin, Yuan Sun, Dezhong Peng, Xi Peng, Peng Hu:
Robust Contrastive Cross-modal Hashing with Noisy Labels. 5752-5760 - Xiying Zheng, Yukang Zhang, Yang Lu, Hanzi Wang:
Semi-supervised Visible-Infrared Person Re-identification via Modality Unification and Confidence Guidance. 5761-5770 - Ziyang Zhou, Pinghui Wang, Zi Liang, Ruofei Zhang, Haitao Bai:
PAIR: Pre-denosing Augmented Image Retrieval Model for Defending Adversarial Patches. 5771-5779 - Daiqing Wu, Dongbao Yang, Yu Zhou, Can Ma:
Robust Multimodal Sentiment Analysis of Image-Text Pairs by Distribution-Based Feature Recovery and Fusion. 5780-5789 - Kunlun Xu, Haozhuo Zhang, Yu Li, Yuxin Peng, Jiahuan Zhou:
Mitigate Catastrophic Remembering via Continual Knowledge Purification for Noisy Lifelong Person Re-Identification. 5790-5799 - Wei Shen, Mang Ye, Wenke Huang:
Resisting Over-Smoothing in Graph Neural Networks via Dual-Dimensional Decoupling. 5800-5809 - Junlin Fang, Wenya Wang, Guosheng Lin, Fengmao Lv:
Sentiment-oriented Sarcasm Integration for Video Sentiment Analysis Enhancement with Sarcasm Assistance. 5810-5819 - Fanfan Wang, Heqing Ma, Xiangqing Shen, Jianfei Yu, Rui Xia:
Observe before Generate: Emotion-Cause aware Video Caption for Multimodal Emotion Cause Generation in Conversations. 5820-5828 - Yang Yang, Liyuan Cao, Haoyu Shi, Huaiwen Zhang:
Multi-Instance Multi-Label Learning for Text-motion Retrieval. 5829-5837 - Hongzu Su, Jingjing Li, Fengling Li, Ke Lu, Lei Zhu:
SOIL: Contrastive Second-Order Interest Learning for Multimodal Recommendation. 5838-5846 - Jiansong Qi, Yaping Huang, Ying Zhang, Sihui Zhang, Mei Tian, Yi Tian, Fanchao Meng, Lin Guan, Tianyi Chang:
Visual Question Answering Driven Eye Tracking Paradigm for Identifying Children with Autism Spectrum Disorder. 5847-5855 - Dongxiao He, Jinghan Zhang, Xiaobao Wang, Meng Ge, Zhiyong Feng, Longbiao Wang, Xiaoke Ma:
TUT4CRS: Time-aware User-preference Tracking for Conversational Recommendation System. 5856-5864 - Guoqing Yang, Zhiming Luo, Jianzhe Gao, Yingxin Lai, Kun Yang, Yifan He, Shaozi Li:
A Multilevel Guidance-Exploration Network and Behavior-Scene Matching Method for Human Behavior Anomaly Detection. 5865-5873 - Zekun Ai, Xiaotong Luo, Yanyun Qu, Yuan Xie:
SkipVSR: Adaptive Patch Routing for Video Super-Resolution with Inter-Frame Mask. 5874-5882 - Qianxin Huang, Siyao Peng, Xiaobo Shen, Yunhao Yuan, Shirui Pan:
Similarity Preserving Transformer Cross-Modal Hashing for Video-Text Retrieval. 5883-5891 - Wenxiao Zhang, Hossein Rahmani, Xun Yang, Jun Liu:
Reverse2Complete: Unpaired Multimodal Point Cloud Completion via Guided Diffusion. 5892-5901 - Yitong Sun, Yao Huang, Xingxing Wei:
Embodied Laser Attack: Leveraging Scene Priors to Achieve Agent-based Robust Non-contact Attacks. 5902-5910 - Yipo Huang, Xiangfei Sheng, Zhichao Yang, Quan Yuan, Zhichao Duan, Pengfei Chen, Leida Li, Weisi Lin, Guangming Shi:
AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception. 5911-5920 - Ji Qiu, Peng Lu, Xujun Peng, Wenhao Guo, Zhaoran Zhao, Xiangtao Dong:
Learning Realistic Sketching: A Dual-agent Reinforcement Learning Approach. 5921-5929 - Xiaobo Shen, Gaoyao Yu, Yinfan Chen, Xichen Yang, Yuhui Zheng:
Graph Convolutional Semi-Supervised Cross-Modal Hashing. 5930-5938 - Harry Cheng, Yangyang Guo, Tianyi Wang, Liqiang Nie, Mohan S. Kankanhalli:
Diffusion Facial Forgery Detection. 5939-5948 - Hengxing Liu, Mingjia Li, Xiaojie Guo:
Regional Attention For Shadow Removal. 5949-5957 - Hao Fang, Haoyuan Zhao, Jianxin Shi, Miao Zhang, Guanzhen Wu, Yi Ching Chou, Feng Wang, Jiangchuan Liu:
Robust Live Streaming over LEO Satellite Constellations: Measurement, Analysis, and Handover-Aware Adaptation. 5958-5966 - Qi Zang, Shuang Wang, Dong Zhao, Yang Hu, Dou Quan, Jinlong Li, Nicu Sebe, Zhun Zhong:
Generalized Source-Free Domain-adaptive Segmentation via Reliable Knowledge Propagation. 5967-5976 - Yunqiang Pei, Jialei Tang, Qihang Tang, Mingfeng Zha, Dongyu Xie, Guoqing Wang, Zhitao Liu, Ning Xie, Peng Wang, Yang Yang, Hengtao Shen:
Emotion Recognition in HMDs: A Multi-task Approach Using Physiological Signals and Occluded Faces. 5977-5986 - Xiaochao Pan, Jiawei Yao, Hongrui Kou, Tong Wu, Canran Xiao:
HarmonicNeRF: Geometry-Informed Synthetic View Augmentation for 3D Scene Reconstruction in Driving Scenarios. 5987-5996 - Guangyao Li, Henghui Du, Di Hu:
Boosting Audio Visual Question Answering via Key Semantic-Aware Cues. 5997-6005 - Jiongming Qin, Fei Luo, Tuo Cao, Wenju Xu, Chunxia Xiao:
HS-Surf: A Novel High-Frequency Surface Shell Radiance Field to Improve Large-Scale Scene Rendering. 6006-6014 - Gang Wu, Junjun Jiang, Kui Jiang, Xianming Liu:
Harmony in Diversity: Improving All-in-One Image Restoration via Multi-Task Collaboration. 6015-6023 - Meichen Liu, Shuting He, Songnan Lin, Bihan Wen:
Dual-head Genre-instance Transformer Network for Arbitrary Style Transfer. 6024-6032 - Yingjie Zhou, Zicheng Zhang, Wei Sun, Xiaohong Liu, Xiongkuo Min, Guangtao Zhai:
Subjective and Objective Quality-of-Experience Assessment for 3D Talking Heads. 6033-6042 - Zhi Zhou, Junke Zhu, Zhangjin Huang:
Gaussian Splatting with Neural Basis Extension. 6043-6052 - Zhenyu Zhang, Guangyao Chen, Yixiong Zou, Yuhua Li, Ruixuan Li:
Learning Unknowns from Unknowns: Diversified Negative Prototypes Generator for Few-shot Open-Set Recognition. 6053-6062 - Jinxiao Zhang, Runmin Dong, Juepeng Zheng, Mengxuan Chen, Lixian Zhang, Yi Zhao, Haohuan Fu:
Spatial-Temporal Context Model for Remote Sensing Imagery Compression. 6063-6072 - Weiying Xie, Mei Yuan, Jitao Ma, Yunsong Li:
Adaptive Pruning of Channel Spatial Dependability in Convolutional Neural Networks. 6073-6082 - Heng Fang, Sheng Huang, Wenhao Tang, Luwen Huangfu, Bo Liu:
SAM-MIL: A Spatial Contextual Aware Multiple Instance Learning Approach for Whole Slide Image Classification. 6083-6092 - Wenhao Shen, Wanqi Yin, Hao Wang, Chen Wei, Zhongang Cai, Lei Yang, Guosheng Lin:
HMR-Adapter: A Lightweight Adapter with Dual-Path Cross Augmentation for Expressive Human Mesh Recovery. 6093-6102 - Shalayiding Sirejiding, Bayram Bayramli, Yuxiang Lu, Yuwen Yang, Tamam Alsarhan, Hongtao Lu, Yue Ding:
Task-Interaction-Free Multi-Task Learning with Efficient Hierarchical Feature Representation. 6103-6112 - Yiyong Xiao, Kai Shu, Haoyi Zhang, Baohua Yin, Wai Seng Cheang, Haoyang Wang, Jiechao Gao:
EGGesture: Entropy-Guided Vector Quantized Variational AutoEncoder for Co-Speech Gesture Generation. 6113-6122 - Yuqi Sun, Qing Lin, Weimin Tan, Bo Yan:
Audio-Driven Identity Manipulation for Face Inpainting. 6123-6132 - Leilei Ma, Hongxing Xie, Lei Wang, Yanping Fu, Dengdi Sun, Haifeng Zhao:
Text-Region Matching for Multi-Label Image Recognition with Missing Labels. 6133-6142 - Zhengwei Yin, Guixu Lin, Mengshun Hu, Hao Zhang, Yinqiang Zheng:
FlexIR: Towards Flexible and Manipulable Image Restoration. 6143-6152 - Hamed Alimohammadzadeh, Shahram Ghandeharizadeh:
Swarical: An Integrated Hierarchical Approach to Localizing Flying Light Specks. 6153-6161 - Xiaowen Cai, Yunbo Tao, Daizong Liu, Pan Zhou, Xiaoye Qu, Jianfeng Dong, Keke Tang, Lichao Sun:
Frequency-Aware GAN for Imperceptible Transfer Attack on 3D Point Clouds. 6162-6171 - Mingjin Zhang, Shilong Liu, Yuanjun Ouyang, Jie Guo, Zhihong Tang, Yunsong Li:
Explore Hybrid Modeling for Moving Infrared Small Target Detection. 6172-6181 - Yuhui Quan, Xiaoheng Tan, Yan Huang, Yong Xu, Hui Ji:
Enhancing Underwater Images via Asymmetric Multi-Scale Invertible Networks. 6182-6191 - Lishuang Zhan, Enting Ying, Jiabao Gan, Shihui Guo, Boyu Gao, Yipeng Qin:
SATPose: Improving Monocular 3D Pose Estimation with Spatial-aware Ground Tactility. 6192-6201 - Hongjian Zhan, Yangfu Li, Yu-Jie Xiong, Umapada Pal, Yue Lu:
Free Lunch: Frame-level Contrastive Learning with Text Perceiver for Robust Scene Text Recognition in Lightweight Models. 6202-6211 - Xin Wang, Kai Chen, Xingjun Ma, Zhineng Chen, Jingjing Chen, Yu-Gang Jiang:
AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning. 6212-6221 - Xudong Lv, Zhiwei He, Yuxiang Yang, Jiahao Nie, Jing Zhang:
SAR-SLAM: Self-Attentive Rendering-based SLAM with Neural Point Cloud Encoding. 6222-6231 - Shao-Kui Zhang, Junkai Huang, Liang Yue, Jia-Tong Zhang, Jia-Hong Liu, Yu-Kun Lai, Song-Hai Zhang:
SceneExpander: Real-Time Scene Synthesis for Interactive Floor Plan Editing. 6232-6240 - Long Tian, Hongyi Zhao, Ruiying Lu, Rongrong Wang, Yujie Wu, Liming Wang, Xiongpeng He, Xiyang Liu:
FOCT: Few-shot Industrial Anomaly Detection with Foreground-aware Online Conditional Transport. 6241-6249 - Chuang Liu, Yichao Cao, Xiu Su, Haogang Zhu:
Universal Frequency Domain Perturbation for Single-Source Domain Generalization. 6250-6259 - Yushun Tang, Shuoshuo Chen, Jiyuan Jia, Yi Zhang, Zhihai He:
Domain-Conditioned Transformer for Fully Test-time Adaptation. 6260-6269 - Zhiru Wang, Shiyun Xie, Chengwei Pan, Guoping Wang:
SpecGaussian with Latent Features: A High-quality Modeling of the View-dependent Appearance for 3D Gaussian Splatting. 6270-6278 - Wencheng Han, Chen Zhang, Yang Zhou, Wentao Liu, Chen Qian, Chengzhong Xu, Jianbing Shen:
Prior Metadata-Driven RAW Reconstruction: Eliminating the Need for Per-Image Metadata. 6279-6287 - Fulin Luo, Yi Liu, Xiuwen Gong, Zhixiong Nan, Tan Guo:
EMVCC: Enhanced Multi-View Contrastive Clustering for Hyperspectral Images. 6288-6296 - Fan Nie, Jiangqun Ni, Jian Zhang, Bin Zhang, Weizhe Zhang:
FRADE: Forgery-aware Audio-distilled Multimodal Learning for Deepfake Detection. 6297-6306 - Siru Zhong, Xixuan Hao, Yibo Yan, Ying Zhang, Yangqiu Song, Yuxuan Liang:
UrbanCross: Enhancing Satellite Image-Text Retrieval with Cross-Domain Adaptation. 6307-6315 - Yuzhen Niu, Lifen Yang, Rui Xu, Yuezhou Li, Yuzhong Chen:
MiNet: Weakly-Supervised Camouflaged Object Detection through Mutual Interaction between Region and Edge Cues. 6316-6325 - Delong Zhang, Yi-Xing Peng, Xiao-Ming Wu, Ancong Wu, Weishi Zheng:
PixelFade: Privacy-preserving Person Re-identification with Noise-guided Progressive Replacement. 6326-6334 - Wei He, Xiang Li, Shengtian Xu, Yuzheng Chen, Chan-In Sio, Ge Lin Kan, Lik-Hang Lee:
MetaDragonBoat: Exploring Paddling Techniques of Virtual Dragon Boating in a Metaverse Campus. 6335-6344 - Yuxuan Lu, Jiahao Nie, Zhiwei He, Hongjie Gu, Xudong Lv:
VoxelTrack: Exploring Multi-level Voxel Representation for 3D Point Cloud Object Tracking. 6345-6354 - Yu Liu, Longhan Feng, Qi Jia, Zezheng Liu, Zi-Huang Cao:
Two Teachers Are Better Than One: Semi-supervised Elliptical Object Detection by Dual-Teacher Collaborative Guidance. 6355-6363 - Yao Luo, Ming Yang, Jinhui Tang:
Dual-view Pyramid Network for Video Frame Interpolation. 6364-6373 - Junxiong Lin, Zen Tao, Xuan Tong, Xinji Mai, Haoran Wang, Boyang Wang, Yan Wang, Qing Zhao, Jiawen Yu, Yuxuan Lin, Shaoqi Yan, Shuyong Gao, Wenqiang Zhang:
Suppressing Uncertainties in Degradation Estimation for Blind Super-Resolution. 6374-6383 - Wenxiao Zhang, Ziqi Wang, Li Xu, Xun Yang, Jun Liu:
Informative Point cloud Dataset Extraction for Classification via Gradient-based Points Moving. 6384-6393 - Jia-Hong Liu, Shao-Kui Zhang, Chuyue Zhang, Song-Hai Zhang:
Controllable Procedural Generation of Landscapes. 6394-6403 - Fangjian Liao, Xingxing Zou, Waikeung Wong:
Uni-DlLoRA: Style Fine-Tuning for Fashion Image Translation. 6404-6413 - Yusen Wang, Kaixuan Zhou, Wenxiao Zhang, Chunxia Xiao:
MegaSurf: Scalable Large Scene Neural Surface Reconstruction. 6414-6423 - Zherui Qiu, Chenqu Ren, Kaiwen Song, Xiaoyi Zeng, Leyuan Yang, Juyong Zhang:
Deformable NeRF using Recursively Subdivided Tetrahedra. 6424-6432 - Mamta, Gopendra Vikram Singh, Deepak Raju Kori, Asif Ekbal:
Aspect-Based Multimodal Mining: Unveiling Sentiments, Complaints, and Beyond in User-Generated Content. 6433-6442 - Zichen Liu, Yuxin Peng, Jiahuan Zhou:
InsVP: Efficient Instance Visual Prompting from Image Itself. 6443-6452 - Zidu Wang, Xiangyu Zhu, Jiang Yu, Tianshuo Zhang, Zhen Lei:
S2TD-Face: Reconstruct a Detailed 3D Face with Controllable Texture from a Single Sketch. 6453-6462 - Satoshi Kosugi:
Prompt-Guided Image-Adaptive Neural Implicit Lookup Tables for Interpretable Image Enhancement. 6463-6471 - Xun Jiang, Zhuoyuan Wei, Shenshen Li, Xing Xu, Jingkuan Song, Heng Tao Shen:
Counterfactually Augmented Event Matching for De-biased Temporal Sentence Grounding. 6472-6481 - Bingzhi Chen, Ruihan Liu, Yishu Liu, Xiaozhao Fang, Jiahui Pan, Guangming Lu, Zheng Zhang:
Stay Focused is All You Need for Adversarial Robustness. 6482-6491 - Zhi Zeng, Minnan Luo, Xiangzheng Kong, Huan Liu, Hao Guo, Hao Yang, Zihan Ma, Xiang Zhao:
Mitigating World Biases: A Multimodal Multi-View Debiasing Framework for Fake News Video Detection. 6492-6500 - Zibin Liu, Banglei Guan, Yang Shang, Shunkun Liang, Zhenbao Yu, Qifeng Yu:
Optical Flow-Guided 6DoF Object Pose Tracking with an Event Camera. 6501-6509 - Junran Wu, Xueyuan Chen, Shangzhe Li:
Uncovering Capabilities of Model Pruning in Graph Contrastive Learning. 6510-6519 - Zheng Wei, Yuzheng Chen, Wai Tong, Xuan Zong, Huamin Qu, Xian Xu, Lik-Hang Lee:
Hearing the Moment with MetaEcho! From Physical to Virtual in Synchronized Sound Recording. 6520-6529 - Cong Wang, Chengjin Yu, Jie Mu, Wei Wang:
PercepLIE: A New Path to Perceptual Low-Light Image Enhancement. 6530-6539 - Xin Cheng, Hao Wang, Jinwei Wang, Xiangyang Luo, Bin Ma:
Advancing Quantization Steps Estimation: A Two-Stream Network Approach for Enhancing Robustness. 6540-6548 - Mingjin Zhang, Longyi Li, Wenxuan Shi, Jie Guo, Yunsong Li, Xinbo Gao:
VmambaSCI: Dynamic Deep Unfolding Network with Mamba for Compressive Spectral Imaging. 6549-6558 - Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling:
Speech Reconstruction from Silent Lip and Tongue Articulation by Diffusion Models and Text-Guided Pseudo Target Generation. 6559-6568 - Junyuan Guo, Hao Tang, Teng Wang, Chao Wang:
R4D-planes: Remapping Planes For Novel View Synthesis and Self-Supervised Decoupling of Monocular Videos. 6569-6577 - Wu Chen, Hehe Fan, Qiuping Jiang, Chao Huang, Yi Yang:
Progressive Point Cloud Denoising with Cross-Stage Cross-Coder Adaptive Edge Graph Convolution Network. 6578-6587 - Mingyang Sun, Qipeng Yan, Zhuoer Liang, Dongliang Kou, Dingkang Yang, Ruisheng Yuan, Xiao Zhao, Mingcheng Li, Lihua Zhang:
IF-Garments: Reconstructing Your Intersection-Free Multi-Layered Garments from Monocular Videos. 6588-6597 - Bo Dong, Pichao Wang, Hao Luo, Fan Wang:
Adaptive Query Selection for Camouflaged Instance Segmentation. 6598-6606 - Yuxin Mao, Xuyang Shen, Jing Zhang, Zhen Qin, Jinxing Zhou, Mochu Xiang, Yiran Zhong, Yuchao Dai:
TAVGBench: Benchmarking Text to Audible-Video Generation. 6607-6616 - Yuan Tang, Xu Han, Xianzhi Li, Qiao Yu, Yixue Hao, Long Hu, Min Chen:
MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors. 6617-6626 - Guan Luo, Tian-Xing Xu, Ying-Tian Liu, Xiaoxiong Fan, Fang-Lue Zhang, Song-Hai Zhang:
3D Gaussian Editing with A Single Image. 6627-6636 - Zhenhong Sun, Junyan Wang, Zhiyu Tan, Daoyi Dong, Hailan Ma, Hao Li, Dong Gong:
EGGen: Image Generation with Multi-entity Prior Learning through Entity Guidance. 6637-6645 - Zhengzhong Kuang, Jianan Lu, Chenhui Hong, Haobin Huang, Suguo Zhu, Xiaowei Zhao, Jun Yu, Jianping Fan:
Latent Representation Reorganization for Face Privacy Protection. 6646-6655 - Wulin Xie, Xiaohuan Lu, Yadong Liu, Jiang Long, Bob Zhang, Shuping Zhao, Jie Wen:
Uncertainty-Aware Pseudo-Labeling and Dual Graph Driven Network for Incomplete Multi-View Multi-Label Classification. 6656-6665 - Mingzhao Yang, Shangchao Su, Bin Li, Xiangyang Xue:
FedDEO: Description-Enhanced One-Shot Federated Learning with Diffusion Models. 6666-6675 - Ruiyang Xia, Dawei Zhou, Decheng Liu, Lin Yuan, Shuodi Wang, Jie Li, Nannan Wang, Xinbo Gao:
Advancing Generalized Deepfake Detector with Forgery Perception Guidance. 6676-6685 - Hongye Hou, Xuehao Gao, Zhan Liu, Yang Yang:
Dig into Detailed Structures: Key Context Encoding and Semantic-based Decoding for Point Cloud Completion. 6686-6695 - Tao Liu, Feilong Chen, Shuai Fan, Chenpeng Du, Qi Chen, Xie Chen, Kai Yu:
AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding. 6696-6705 - Qi Chen, Wenjie Liu, Hu Ding:
A Novel Confidence Guided Training Method for Conditional GANs with Auxiliary Classifier. 6706-6714 - Yukang Lin, Haonan Han, Chaoqun Gong, Zunnan Xu, Yachao Zhang, Xiu Li:
Consistent123: One Image to Highly Consistent 3D Asset Using Case-Aware Diffusion Priors. 6715-6724 - Zhaoyu Zhang, Yang Hua, Guanxiong Sun, Hui Wang, Seán F. McLoone:
Improving the Training of the GANs with Limited Data via Dual Adaptive Noise Injection. 6725-6734 - Changgu Chen, Libing Yang, Xiaoyan Yang, Lianggangxu Chen, Gaoqi He, Changbo Wang, Yang Li:
FIND: Fine-tuning Initial Noise Distribution with Policy Optimization for Diffusion Models. 6735-6744 - Tianyi Lu, Xing Zhang, Jiaxi Gu, Renjing Pei, Songcen Xu, Xingjun Ma, Hang Xu, Zuxuan Wu:
Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models. 6745-6754 - Zhichao Liao, Fengyuan Piao, Di Huang, Xinghui Li, Yue Ma, Pingfa Feng, Heming Fang, Long Zeng:
Freehand Sketch Generation from Mechanical Components. 6755-6764 - Qishan Zhang, Shuangbing Wen, Tao Hu:
Audio Deepfake Detection with Self-Supervised XLS-R and SLS Classifier. 6765-6773 - Bohong Chen, Yumeng Li, Yao-Xiang Ding, Tianjia Shao, Kun Zhou:
Enabling Synergistic Full-Body Control in Prompt-Based Co-Speech Motion Generation. 6774-6783 - Xiangcheng Du, Zhao Zhou, Xingjiao Wu, Yanlong Wang, Zhuoyao Wang, Yingbin Zheng, Cheng Jin:
MultiColor: Image Colorization by Learning from Multiple Color Spaces. 6784-6792 - Haozhe Jia, Yan Li, Hengfei Cui, Di Xu, Yuwang Wang, Tao Yu:
DisControlFace: Adding Disentangled Control to Diffusion Autoencoder for One-shot Explicit Facial Image Editing. 6793-6802 - Lutao Jiang, Hangyu Li, Lin Wang:
A General Framework to Boost 3D GS Initialization for Text-to-3D Generation by Lexical Richness. 6803-6812 - Yiluo Wei, Gareth Tyson:
Understanding the Impact of AI-Generated Content on Social Media: The Pixiv Case. 6813-6822 - Ruiqi Zhang, Jie Chen:
Mesh-Centric Gaussian Splatting for Human Avatar Modelling with Real-time Dynamic Mesh Reconstruction. 6823-6832 - Bo Xiong, Changqing Su, Zihan Lin, Yanqin Chen, You Zhou, Zhen Cheng, Zhaofei Yu, Tiejun Huang:
Real-time Parameter Evaluation of High-speed Microfluidic Droplets using Continuous Spike Streams. 6833-6841 - Qi Mao, Lan Chen, Yuchao Gu, Zhen Fang, Mike Zheng Shou:
MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance. 6842-6850 - Guan-Yuan Chen, Von-Wun Soo:
Controllable Music Loops Generation with MIDI and Text via Multi-Stage Cross Attention and Instrument-Aware Reinforcement Learning. 6851-6859 - Weitian Zhang, Yichao Yan, Yunhui Liu, Xingdong Sheng, Xiaokang Yang:
E3Gen: Efficient, Expressive and Editable Avatars Generation. 6860-6869 - Haibo Yang, Yang Chen, Yingwei Pan, Ting Yao, Zhineng Chen, Chong-Wah Ngo, Tao Mei:
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models. 6870-6879 - Shuo Huang, Shikun Sun, Zixuan Wang, Xiaoyu Qin, Yanmin Xiong, Yuan Zhang, Pengfei Wan, Di Zhang, Jia Jia:
PlacidDreamer: Advancing Harmony in Text-to-3D Generation. 6880-6889 - Xiaodi Li:
Streamable Portrait Video Editing with Probabilistic Pixel Correspondence. 6890-6899 - Xuan Hai, Xin Liu, Yuan Tan, Gang Liu, Song Li, Weina Niu, Rui Zhou, Xiaokang Zhou:
What's the Real: A Novel Design Philosophy for Robust AI-Synthesized Voice Detection. 6900-6909 - Xiangyang Luo, Xin Zhang, Yifan Xie, Xinyi Tong, Weijiang Yu, Heng Chang, Fei Ma, Fei Richard Yu:
CodeSwap: Symmetrically Face Swapping Based on Prior Codebook. 6910-6919 - Ruofan Wang, Xingjun Ma, Hanxu Zhou, Chuanjun Ji, Guangnan Ye, Yu-Gang Jiang:
White-box Multimodal Jailbreaks Against Large Vision-Language Models. 6920-6928 - Anwen Hu, Yaya Shi, Haiyang Xu, Jiabo Ye, Qinghao Ye, Ming Yan, Chenliang Li, Qi Qian, Ji Zhang, Fei Huang:
mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model. 6929-6938 - Weifeng Chen, Tao Gu, Yuhao Xu, Arlene Chen:
Magic Clothing: Controllable Garment-Driven Image Synthesis. 6939-6948 - Yiluo Wei, Yiming Zhu, Pan Hui, Gareth Tyson:
Exploring the Use of Abusive Generative AI Models on Civitai. 6949-6958 - Xiuliang Duan, Dating Tan, Liangda Fang, Yuyu Zhou, Chaobo He, Ziliang Chen, Lusheng Wu, Guanliang Chen, Zhiguo Gong, Weiqi Luo, Quanlong Guan:
Reason-and-Execute Prompting: Enhancing Multi-Modal Large Language Models for Solving Geometry Questions. 6959-6968 - Weiye Xu, Min Wang, Wengang Zhou, Houqiang Li:
P-RAG: Progressive Retrieval Augmented Generation For Planning on Embodied Everyday Task. 6969-6978 - Wenjie Xuan, Yufei Xu, Shanshan Zhao, Chaoyue Wang, Juhua Liu, Bo Du, Dacheng Tao:
When ControlNet Meets Inexplicit Masks: A Case Study of ControlNet on its Contour-following Ability. 6979-6988 - Wenshuo Chen, Hongru Xiao, Erhang Zhang, Lijie Hu, Lei Wang, Mengyuan Liu, Chen Chen:
SATO: Stable Text-to-Motion Framework. 6989-6997 - Zhen Ye, Zeqian Ju, Haohe Liu, Xu Tan, Jianyi Chen, Yiwen Lu, Peiwen Sun, Jiahao Pan, Weizhen Bian, Shulin He, Wei Xue, Qifeng Liu, Yike Guo:
FlashSpeech: Efficient Zero-Shot Speech Synthesis. 6998-7007 - Huadai Liu, Rongjie Huang, Yang Liu, Hengyuan Cao, Jialei Wang, Xize Cheng, Siqi Zheng, Zhou Zhao:
AudioLCM: Efficient and High-Quality Text-to-Audio Generation with Minimal Inference Steps. 7008-7017 - Jiaxu Zhang, Xin Chen, Gang Yu, Zhigang Tu:
Generative Motion Stylization of Cross-structure Characters within Canonical Motion Space. 7018-7026 - Fengqi Liu, Hexiang Wang, Jingyu Gong, Ran Yi, Qianyu Zhou, Xuequan Lu, Jiangbo Lu, Lizhuang Ma:
Emphasizing Semantic Consistency of Salient Posture for Speech-Driven Gesture Generation. 7027-7035 - Tianyi Zheng, Cong Geng, Peng-Tao Jiang, Ben Wan, Hao Zhang, Jinwei Chen, Jia Wang, Bo Li:
Non-uniform Timestep Sampling: Towards Faster Diffusion Model Training. 7036-7045 - Miaoxin Ye, Saixing Zhou, Weiqi Luo, Shunquan Tan, Jiwu Huang:
GAN-based Symmetric Embedding Costs Adjustment for Enhancing Image Steganographic Security. 7046-7054 - Yaqi Li, Han Fang, Zerun Feng, Kaijing Ma, Chao Ban, Xianghao Zang, Lanxiang Zhou, Zhongjiang He, Jingyan Chen, Jiani Hu, Hao Sun, Huayu Zhang:
GOAL: Grounded text-to-image Synthesis with Joint Layout Alignment Tuning. 7055-7064 - Jinfeng Wei, Xiaofeng Zhang:
DOPRA: Decoding Over-accumulation Penalization and Re-allocation in Specific Weighting Layer. 7065-7074 - Yang Luo, Yiheng Zhang, Zhaofan Qiu, Ting Yao, Zhineng Chen, Yu-Gang Jiang, Tao Mei:
FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process. 7075-7084 - Wenquan Lu, Yufei Xu, Jing Zhang, Chaoyue Wang, Dacheng Tao:
HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting. 7085-7093 - Miao Liu, Jing Wang, Xinyuan Qian, Haizhou Li:
ListenFormer: Responsive Listening Head Generation with Non-autoregressive Transformers. 7094-7103 - Jie Hu, Jie Li, Yue Ma, Liujuan Cao, Songan Zhang, Wei Zhang, Guannan Jiang, Rongrong Ji:
Prompting to Adapt Foundational Segmentation Models. 7104-7112 - Zhiyuan Ma, Guoli Jia, Biqing Qi, Bowen Zhou:
Safe-SD: Safe and Traceable Stable Diffusion with Text Prompt Trigger for Invisible Generative Watermarking. 7113-7122 - Jin Sun, Xiaoshuang Shi, Zhiyuan Wang, Kaidi Xu, Heng Tao Shen, Xiaofeng Zhu:
Caterpillar: A Pure-MLP Architecture with Shifted-Pillars-Concatenation. 7123-7132 - Yuanbin Wang, Weilun Dai, Long Chan, Huanyu Zhou, Aixi Zhang, Si Liu:
GPD-VVTO: Preserving Garment Details in Video Virtual Try-On. 7133-7142 - Hengfei Wang, Zhongqun Zhang, Yihua Cheng, Hyung Jin Chang:
TextGaze: Gaze-Controllable Face Generation with Natural Language. 7143-7151 - Huiming Zheng, Wei Gao, Zhuozhen Yu, Tiesong Zhao, Ge Li:
ViewPCGC: View-Guided Learned Point Cloud Geometry Compression. 7152-7161 - Liyang He, Zhenya Huang, Chenglong Liu, Rui Li, Runze Wu, Qi Liu, Enhong Chen:
One-bit Deep Hashing: Towards Resource-Efficient Hashing Model with Binary Neural Network. 7162-7171 - Xinghao Wu, Xuefeng Liu, Jianwei Niu, Haolin Wang, Shaojie Tang, Guogang Zhu, Hao Su:
Decoupling General and Personalized Knowledge in Federated Learning via Additive and Low-rank Decomposition. 7172-7181 - Hengyi Wang, Weiying Xie, Jitao Ma, Daixun Li, Yunsong Li:
FedSLS: Exploring Federated Aggregation in Saliency Latent Space. 7182-7190 - Zhongchi Wang, Hailong Sun, Zhengyang Zhao:
FedEvalFair: A Privacy-Preserving and Statistically Grounded Federated Fairness Evaluation Framework. 7191-7199 - Weitao Tang, Jianqiang Li, Meijie Du, Die Hu, Qingyun Liu:
Zenith: Real-time Identification of DASH Encrypted Video Traffic with Distortion. 7200-7209 - Beizhang Guo, Juntao Bao, Baili Chai, Di Wu, Miao Hu:
Lumos: Optimizing Live 360-degree Video Upstreaming via Spatial-Temporal Integrated Neural Enhancement. 7210-7219 - Zhongnian Li, Meng Wei, Peng Ying, Tongfeng Sun, Xinzheng Xu:
Learning from Concealed Labels. 7220-7228 - Xiangxiang Dai, Zeyu Zhang, Peng Yang, Yuedong Xu, Xutong Liu, John C. S. Lui:
AxiomVision: Accuracy-Guaranteed Adaptive Visual Model Selection for Perspective-Aware Video Analytics. 7229-7238 - Shuo Wang, Yongcai Wang, Zhimin Xu, Yongyu Guo, Wanting Li, Zhe Huang, Xuewei Bai, Deying Li:
GSLAMOT: A Tracklet and Query Graph-based Simultaneous Locating, Mapping, and Multiple Object Tracking System. 7239-7248 - Yiyang Jiang, Wengyu Zhang, Xulu Zhang, Xiaoyong Wei, Chang Wen Chen, Qing Li:
Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval. 7249-7258
Oral Session 18: Fairness, Trust, Explainability & Interpretability in Multimedia
- Peiwen Sun, Honggang Zhang, Di Hu:
Unveiling and Mitigating Bias in Audio Visual Segmentation. 7259-7268 - Ying Liu, Lihong Liu, Cai Xu, Xiangyu Song, Ziyu Guan, Wei Zhao:
Dynamic Evidence Decoupling for Trusted Multi-view Learning. 7269-7277 - Wei Liu, Yufei Chen, Xiaodong Yue:
Building Trust in Decision with Conformalized Multi-view Deep Classification. 7278-7287 - Daoming Zong, Chaoyue Ding, Kaitao Chen:
Toward Explainable Physical Audiovisual Commonsense Reasoning. 7288-7297 - Jingjie Zeng, Zhihao Yang, Qi Yang, Liang Yang, Hongfei Lin:
Peeling Back the Layers: Interpreting the Storytelling of ViT. 7298-7306 - Chihaya Matsuhira, Marc A. Kastner, Takahiro Komamizu, Takatsugu Hirayama, Ichiro Ide:
Investigating Conceptual Blending of a Diffusion Model for Improving Nonword-to-Image Generation. 7307-7315
Oral Session 19: Multimodal Applications
- Minghui Wu, Chenxu Zhao, Anyang Su, Donglin Di, Tianyu Fu, Da An, Min He, Ya Gao, Meng Ma, Kun Yan, Ping Wang:
Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding. 7316-7325 - Yanglin Deng, Tianyang Xu, Chunyang Cheng, Xiao-Jun Wu, Josef Kittler:
MMDRFuse: Distilled Mini-Model with Dynamic Refresh for Multi-Modality Image Fusion. 7326-7335 - Ziyan Li, Jianfei Yu, Jia Yang, Wenya Wang, Li Yang, Rui Xia:
Generative Multimodal Data Augmentation for Low-Resource Multimodal Named Entity Recognition. 7336-7345 - Zhiqi Ge, Hongzhe Huang, Mingze Zhou, Juncheng Li, Guoming Wang, Siliang Tang, Yueting Zhuang:
WorldGPT: Empowering LLM as Multimodal World Model. 7346-7355 - Yiming Li, Zhifang Guo, Xiangdong Wang, Hong Liu:
Advancing Multi-grained Alignment for Contrastive Language-Audio Pre-training. 7356-7365 - Yingxuan Li, Ryota Hinami, Kiyoharu Aizawa, Yusuke Matsui:
Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion. 7366-7374
Oral Session 20: Datasets & Algorithms for Multimedia Analysis
- Chunyi Li, Haoning Wu, Hongkun Hao, Zicheng Zhang, Tengchuan Kou, Chaofeng Chen, Lei Bai, Xiaohong Liu, Weisi Lin, Guangtao Zhai:
G-Refine: A General Quality Refiner for Text-to-Image Generation. 7375-7384 - Wenqiang Xu, Wenrui Dai, Ziyang Zheng, Chenglin Li, Junni Zou, Hongkai Xiong:
Point Cloud Upsampling with Geometric Algebra Driven Inverse Heat Dissipation. 7385-7394 - Junyan Wu, Wei Lu, Xiangyang Luo, Rui Yang, Qian Wang, Xiaochun Cao:
Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization. 7395-7403 - Fujun Han, Peng Ye, Shukai Duan, Lidan Wang:
Ada-iD: Active Domain Adaptation for Intrusion Detection. 7404-7413 - Zhixi Cai, Shreya Ghosh, Aman Pankaj Adatia, Munawar Hayat, Abhinav Dhall, Tom Gedeon, Kalin Stefanov:
AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset. 7414-7423 - Rintaro Yanagi, Ren Togo, Takahiro Ogawa, Miki Haseyama:
DQG: Database Question Generation for Exact Text-based Image Retrieval. 7424-7433
Oral Session 21: Image Enhancement and Super-Resolution
- Tongshun Zhang, Pingping Liu, Ming Zhao, Haotian Lv:
DMFourLLIE: Dual-Stage and Multi-Branch Fourier Network for Low-Light Image Enhancement. 7434-7443 - Fei Gao, Yuhao Lin, Jiaqi Shi, Maoying Qiao, Nannan Wang:
AesMamba: Universal Image Aesthetic Assessment with State Space Models. 7444-7453 - Yi Dong, Yuxi Wang, Zheng Fang, Wenqi Ouyang, Xianhui Lin, Zhiqi Shen, Peiran Ren, Xuansong Xie, Qingming Huang:
MovingColor: Seamless Fusion of Fine-grained Video Color Enhancement. 7454-7463 - Ruibin Li, Jingcai Guo, Qihua Zhou, Song Guo:
FreePIH: Training-Free Painterly Image Harmonization with Diffusion Model. 7464-7473 - Qian Huang, Cheng Xu, Guiqing Li, Ziheng Wu, Shengxin Liu, Shengfeng He:
Portrait Shadow Removal via Self-Exemplar Illumination Equalization. 7474-7482 - Qiwen Zhu, Yanjie Wang, Shilv Cai, Liqun Chen, Jiahuan Zhou, Luxin Yan, Sheng Zhong, Xu Zou:
Perceptual-Distortion Balanced Image Super-Resolution is a Multi-Objective Optimization Problem. 7483-7492
Oral Session 22: Audio-visual Datasets and Applications
- Han Wang, Tan Rui Yang, Usman Naseem, Roy Ka-Wei Lee:
MultiHateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube and Bilibili. 7493-7502 - Jiale Yu, Baopeng Zhang, Zhu Teng, Jianping Fan:
OpenAVE: Moving towards Open Set Audio-Visual Event Localization. 7503-7512 - Xinfa Zhu, Wenjie Tian, Xinsheng Wang, Lei He, Yujia Xiao, Xi Wang, Xu Tan, Sheng Zhao, Lei Xie:
UniStyle: Unified Style Modeling for Speaking Style Captioning and Stylistic Speech Synthesis. 7513-7522 - Zhedong Zhang, Liang Li, Gaoxiang Cong, Haibing Yin, Yuhan Gao, Chenggang Yan, Anton van den Hengel, Yuankai Qi:
From Speaker to Dubber: Movie Dubbing with Prosody and Duration Consistency Learning. 7523-7532 - Ruohao Guo, Liao Qu, Dantong Niu, Yanyu Qi, Wenzhen Yue, Ji Shi, Bowei Xing, Xianghua Ying:
Open-Vocabulary Audio-Visual Semantic Segmentation. 7533-7541
Oral Session 23: Multimodal Learning and Recommendation Systems
- Hongcheng Li, Yucan Zhou, Xiaoyan Gu, Bo Li, Weiping Wang:
Diversified Semantic Distribution Matching for Dataset Distillation. 7542-7550 - Jinghao Zhang, Guofan Liu, Qiang Liu, Shu Wu, Liang Wang:
Modality-Balanced Learning for Multimedia Recommendation. 7551-7560 - Ziyi Ye, Jingtao Zhan, Qingyao Ai, Yiqun Liu, Maarten de Rijke, Christina Lioma, Tuukka Ruotsalo:
Query Augmentation with Brain Signals. 7561-7570 - Lei Shi, Jiapeng Yang, Pengtao Lv, Lu Yuan, Feifei Kou, Jia Luo, Mingying Xu:
Self-derived Knowledge Graph Contrastive Learning for Recommendation. 7571-7580 - Jiaye Lin, Qing Li, Guorui Xie, Zhongxu Guan, Yong Jiang, Ting Xu, Zhong Zhang, Peilin Zhao:
Mitigating Sample Selection Bias with Robust Domain Adaption in Multimedia Recommendation. 7581-7590 - Yangqin Jiang, Lianghao Xia, Wei Wei, Da Luo, Kangyi Lin, Chao Huang:
DiffMM: Multi-Modal Diffusion Model for Recommendation. 7591-7599
Oral Session 24: Novel Multimedia Applications 2
- Tongtong Feng, Xin Wang, Feilin Han, Leping Zhang, Wenwu Zhu:
U2UData: A Large-scale Cooperative Perception Dataset for Swarm UAVs Autonomous Flight. 7600-7608 - Chaoqun Niu, Dongdong Chen, Jizhe Zhou, Jian Wang, Xiang Luo, Quan-Hui Liu, Yuan Li, Jiancheng Lv:
Neural Boneprint: Person Identification from Bones Using Generative Contrastive Deep Learning. 7609-7618 - Xueli Hu, Huan Liu, Haocheng Yuan, Zhiyang Fu, Yizhi Luo, Ning Zhang, Hang Zou, Jianwen Gan, Yuan Zhang:
Fine-Grained Prompt Learning for Face Anti-Spoofing. 7619-7628 - Xiao Han, Yiming Ren, Yichen Yao, Yujing Sun, Yuexin Ma:
Towards Practical Human Motion Prediction with LiDAR Point Clouds. 7629-7638 - Haodong Hong, Sen Wang, Zi Huang, Qi Wu, Jiajun Liu:
Navigating Beyond Instructions: Vision-and-Language Navigation in Obstructed Environments. 7639-7648 - Minghe Gao, Juncheng Li, Hao Fei, Liang Pang, Wei Ji, Guoming Wang, Zheqi Lv, Wenqiao Zhang, Siliang Tang, Yueting Zhuang:
De-fine: Decomposing and Refining Visual Programs with Auto-Feedback. 7649-7657
Oral Session 25: Media and Communication Technologies
- Jingjing Liu, Youyi Zheng, Kun Zhou:
Virtual Agent Positioning Driven by Personal Characteristics. 7658-7666 - Meng Luo, Hao Fei, Bobo Li, Shengqiong Wu, Qian Liu, Soujanya Poria, Erik Cambria, Mong-Li Lee, Wynne Hsu:
PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis. 7667-7676 - Yawen Luo, Min Shi, Liao Shen, Yachuan Huang, Zixuan Ye, Juewen Peng, Zhiguo Cao:
Video Bokeh Rendering: Make Casual Videography Cinematic. 7677-7685 - Zhenyu Zhang, Guangyao Chen, Yixiong Zou, Zhimeng Huang, Yuhua Li, Ruixuan Li:
MICM: Rethinking Unsupervised Pretraining for Enhanced Few-shot Learning. 7686-7695 - Zejun Zhang, Xiao Zhu, Anlan Zhang, Feng Qian:
An In-depth Study of Bandwidth Allocation across Media Sources in Video Conferencing. 7696-7704 - Zixuan Yang, Yushu Zhang, Tao Wang, Zhongyun Hua, Zhihua Xia, Jian Weng:
Once-for-all: Efficient Visual Face Privacy Protection via Person-specific Veils. 7705-7713
Oral Session 26: Cultural Heritage & Media Analysis
- Shipeng Zhu, Hui Xue, Na Nie, Chenjie Zhu, Haiyue Liu, Pengfei Fang:
Reproducing the Past: A Dataset for Benchmarking Inscription Restoration. 7714-7723 - Jiao Pan, Liang Li, Hiroshi Yamaguchi, Kyoko Hasegawa, Fadjar Ibnu Thufail, Brahmantara, Xiaojuan Ban, Satoshi Tanaka:
Reconstructing, Understanding, and Analyzing Relief Type Cultural Heritage from a Single Old Photo. 7724-7733 - Yi Bin, Wenhao Shi, Yujuan Ding, Zhiqiang Hu, Zheng Wang, Yang Yang, See-Kiong Ng, Heng Tao Shen:
GalleryGPT: Analyzing Paintings with Large Multimodal Models. 7734-7743 - Jun Ma, Tuukka Ruotsalo:
Cognition-Supervised Saliency Detection: Contrasting EEG Signals and Visual Stimuli. 7744-7753 - Yizhang Liu, Weiwei Zhou, Yanping Li, Shengjie Zhao:
RoSe: Rotation-Invariant Sequence-Aware Consensus for Robust Correspondence Pruning. 7754-7763 - Yujia Wang, Fang-Lue Zhang, Neil A. Dodgson:
ScanTD: 360° Scanpath Prediction based on Time-Series Diffusion. 7764-7773
Oral Session 27: Security & Quality in Multimedia Systems
- Dunyun Chen, Xin Liao, Xiaoshuai Wu, Shiwei Chen:
SafePaint: Anti-forensic Image Inpainting with Domain Adaptation. 7774-7782 - Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Wei Sun, Chaofeng Chen, Xiongkuo Min, Xiaohong Liu, Weisi Lin, Guangtao Zhai:
LMM-PCQA: Assisting Point Cloud Quality Assessment with LMM. 7783-7792 - Tengchuan Kou, Xiaohong Liu, Zicheng Zhang, Chunyi Li, Haoning Wu, Xiongkuo Min, Guangtao Zhai, Ning Liu:
Subjective-Aligned Dataset and Metric for Text-to-Video Quality Assessment. 7793-7802 - Puyi Wang, Wei Sun, Zicheng Zhang, Jun Jia, Yanwei Jiang, Zhichao Zhang, Xiongkuo Min, Guangtao Zhai:
Large Multi-modality Model Assisted AI-Generated Image Quality Assessment. 7803-7812 - Xuemei Zhou, Irene Viola, Yunlu Chen, Jiahuan Pei, Pablo César:
Deciphering Perceptual Quality in Colored Point Cloud: Prioritizing Geometry or Texture Distortion? 7813-7822 - Desen Yuan, Lei Wang:
Dual-Criterion Quality Loss for Blind Image Quality Assessment. 7823-7832
Oral Session 28: Complex Scene Processing
- Zhe Huang, Shuo Wang, Yongcai Wang, Wanting Li, Deying Li, Lei Wang:
RoCo: Robust Cooperative Perception By Iterative Object Matching and Pose Adjustment. 7833-7842 - Shao-Kui Zhang, Hanxi Zhu, Xuebin Chen, Jinghuan Chen, Zhike Peng, Ziyang Chen, Yong-Liang Yang, Song-Hai Zhang:
ScenePhotographer: Object-Oriented Photography for Residential Scenes. 7843-7851 - Changli Wu, Yihang Liu, Jiayi Ji, Yiwei Ma, Haowei Wang, Gen Luo, Henghui Ding, Xiaoshuai Sun, Rongrong Ji:
3D-GRES: Generalized 3D Referring Expression Segmentation. 7852-7861 - Xuan Han, Yihao Zhao, Mingyu You:
Scene Diffusion: Text-driven Scene Image Synthesis Conditioning on a Single 3D Model. 7862-7870 - Jinbo Yan, Rui Peng, Luyang Tang, Ronggang Wang:
4D Gaussian Splatting with Scale-aware Residual Field and Adaptive Optimization for Real-time Rendering of Temporally Complex Dynamic Scenes. 7871-7880 - Hongtao Wu, Yijun Yang, Huihui Xu, Weiming Wang, Jinni Zhou, Lei Zhu:
RainMamba: Enhanced Locality Learning with State Space Models for Video Deraining. 7881-7890
Oral Session 29: Enhancements in Video Streaming and Compression
- Bo Wu, Tong Li, Cheng Luo, Xu Yan, Fuyu Wang, Xinle Du, Ke Xu:
Toward Timeliness-Enhanced Loss Recovery for Large-Scale Live Streaming. 7891-7899 - Fangtao Zhou, Xiaofeng Huang, Peng Zhang, Meng Wang, Zhao Wang, Yang Zhou, Haibing Yin:
Enhanced Screen Content Image Compression: A Synergistic Approach for Structural Fidelity and Text Integrity Preservation. 7900-7908 - Miao Zhang, Jiaxing Li, Haoyuan Zhao, Linfeng Shen, Jiangchuan Liu:
StarStream: Live Video Analytics over Space Networking. 7909-7917 - Pengqiang Bi, Yifei Zou, Mengbai Xiao, Dongxiao Yu, Yijun Li, Zhixiong Liu, Qun Xie:
LiteQUIC: Improving QoE of Video Streams by Reducing CPU Overhead of QUIC. 7918-7927 - Yili Jin, Xize Duan, Fangxin Wang, Xue Liu:
HeadsetOff: Enabling Photorealistic Video Conferencing on Economical VR Headsets. 7928-7936 - Zihan Zheng, Houqiang Zhong, Qiang Hu, Xiaoyun Zhang, Li Song, Ya Zhang, Yanfeng Wang:
HPC: Hierarchical Progressive Coding Framework for Volumetric Video. 7937-7946
Poster Session 3
- Lianghui Zhu, Junwei Zhou, Yan Liu, Xin Hao, Wenyu Liu, Xinggang Wang:
WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition. 7947-7956 - Xiangyu Sun, Joo Chan Lee, Daniel Rho, Jong Hwan Ko, Usman Ali, Eunbyung Park:
F-3DGS: Factorized Coordinates and Representations for 3D Gaussian Splatting. 7957-7965 - Sijing Wu, Yunhao Li, Yichao Yan, Huiyu Duan, Ziwei Liu, Guangtao Zhai:
MMHead: Towards Fine-grained Multi-modal 3D Facial Animation. 7966-7975 - Chunxiao Li, Shuyang Wang, Xuejing Kang, Anlong Ming:
Thinking Temporal Automatic White Balance: Datasets, Models and Benchmarks. 7976-7984 - Zhe Luo, Weina Fu, Shuai Liu, Saeed Anwar, Muhammad Saqib, Sambit Bakshi, Khan Muhammad:
Cefdet: Cognitive Effectiveness Network Based on Fuzzy Inference for Action Detection. 7985-7994 - Wencan Huang, Daizong Liu, Wei Hu:
Advancing 3D Object Grounding Beyond a Single 3D Scene. 7995-8004 - Bin Huang, Feng He, Qi Wang, Hong Chen, Guohao Li, Zhifan Feng, Xin Wang, Wenwu Zhu:
Neighbor Does Matter: Curriculum Global Positive-Negative Sampling for Vision-Language Pre-training. 8005-8014 - Haoyuan Jin, Xuesong Nie, Yunfeng Yan, Xi Chen, Zhihang Zhu, Donglian Qi:
Object-Level Pseudo-3D Lifting for Distance-Aware Tracking. 8015-8023 - Xinwei Liu, Xiaojun Jia, Yuan Xun, Siyuan Liang, Xiaochun Cao:
Multimodal Unlearnable Examples: Protecting Data against Multimodal Contrastive Learning. 8024-8033 - Ge Luo, Yuchen Ma, Manman Zhang, Junqiang Huang, Sheng Li, Zhenxing Qian, Xinpeng Zhang:
Engaging Live Video Comments Generation. 8034-8042 - Lu Chen, Qiangchang Wang, Zhaohui Li, Yilong Yin:
Hypergraph-guided Intra- and Inter-category Relation Modeling for Fine-grained Visual Recognition. 8043-8052 - Yuan Xie, Yichen Zhang, Yifang Yin, Sheng Zhang, Ying Zhang, Rajiv Ratn Shah, Roger Zimmermann, Guoqing Xiao:
Traj2Former: A Local Context-aware Snapshot and Sequential Dual Fusion Transformer for Trajectory Classification. 8053-8061 - Guilin Li, Mengdan Zhang, Xiawu Zheng, Peixian Chen, Zihan Wang, Yunhang Shen, Mingchen Zhuge, Chenglin Wu, Fei Chao, Ke Li, Xing Sun, Rongrong Ji:
Multimodal Inplace Prompt Tuning for Open-set Object Detection. 8062-8071 - Shengran Cheng, Chuhang Ma, Ye Pan:
StylizedFacePoint: Facial Landmark Detection for Stylized Characters. 8072-8080 - Sheng Zhang, Xi Yang:
Information Fusion with Knowledge Distillation for Fine-grained Remote Sensing Object Detection. 8081-8089 - Bowen Zhao, Qianqian Wang, Zhiqiang Tao, Wei Feng, Quanxue Gao:
DFMVC: Deep Fair Multi-view Clustering. 8090-8099 - Ruyu Liu, Zhengzhe Liu, Haoyu Zhang, Guodao Zhang, Jianhua Zhang, Sunbo, Weiguo Sheng, Xiufeng Liu, Yaochu Jin:
ColVO: Colonoscopic Visual Odometry Considering Geometric and Photometric Consistency. 8100-8109 - Xun Lin, Yi Yu, Zitong Yu, Ruohan Meng, Jiale Zhou, Ajian Liu, Yizhong Liu, Shuai Wang, Wenzhong Tang, Zhen Lei, Alex C. Kot:
HideMIA: Hidden Wavelet Mining for Privacy-Enhancing Medical Image Analysis. 8110-8119 - Shuyuan Liu, Jiawei Chen, Shouwei Ruan, Hang Su, Zhaoxia Yin:
Exploring the Robustness of Decision-Level Through Adversarial Attacks on LLM-Based Embodied Models. 8120-8128 - Jiahe Tian, Cai Yu, Xi Wang, Peng Chen, Zihao Xiao, Jizhong Han, Yesheng Chai:
Dynamic Mixed-Prototype Model for Incremental Deepfake Detection. 8129-8138 - Tianshan Liu, Kin-Man Lam, Bing-Kun Bao:
Label Text-aided Hierarchical Semantics Mining for Panoramic Activity Recognition. 8139-8148 - Xiaoda Yang, Xize Cheng, Dongjie Fu, Minghui Fang, Jialung Zuo, Shengpeng Ji, Zhou Zhao, Tao Jin:
SyncTalklip: Highly Synchronized Lip-Readable Speaker Generation with Multi-Task Learning. 8149-8158 - Jingjun Yi, Qi Bi, Hao Zheng, Haolan Zhan, Wei Ji, Yawen Huang, Yuexiang Li, Yefeng Zheng:
Learning Spectral-Decomposited Tokens for Domain Generalized Semantic Segmentation. 8159-8168 - Peng Yin, Xiaosu Zhu, Jingkuan Song, Lianli Gao, Heng Tao Shen:
SI-BiViT: Binarizing Vision Transformers with Spatial Interaction. 8169-8178 - Ao Li, Huijun Liu, Jinrong Sheng, Zhongming Chen, Yongxin Ge:
Efficient Dual-Confounding Eliminating for Weakly-supervised Temporal Action Localization. 8179-8188 - Xuri Ge, Junchen Fu, Fuhai Chen, Shan An, Nicu Sebe, Joemon M. Jose:
Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning. 8189-8198 - Jongbhin Woo, Hyeonggon Ryu, Youngjoon Jang, Jae-Won Cho, Joon Son Chung:
Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding. 8199-8208 - Jiali Chen, Xusen Hei, Yuqi Xue, Yuancheng Wei, Jiayuan Xie, Yi Cai, Qing Li:
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor. 8209-8218 - Yu-Pei Song, Yuantong Liu, Xiao Wu, Qi He, Zhaoquan Yuan, Ao Luo:
MagicCartoon: 3D Pose and Shape Estimation for Bipedal Cartoon Characters. 8219-8227 - Ajian Liu, Hui Ma, Junze Zheng, Haocheng Yuan, Xiaoyuan Yu, Yanyan Liang, Sergio Escalera, Jun Wan, Zhen Lei:
FM-CLIP: Flexible Modal CLIP for Face Anti-Spoofing. 8228-8237 - Jiaqi Guo, Lianli Gao, Junchen Zhu, Jiaxin Zhang, Siyang Li, Jingkuan Song:
MagicVFX: Visual Effects Synthesis in Just Minutes. 8238-8246 - Kangzheng Liu, Feng Zhao, Yu Yang, Guandong Xu:
DySarl: Dynamic Structure-Aware Representation Learning for Multimodal Knowledge Graph Reasoning. 8247-8256 - Weicai Yan, Ye Wang, Wang Lin, Zirun Guo, Zhou Zhao, Tao Jin:
Low-rank Prompt Interaction for Continual Vision-Language Retrieval. 8257-8266 - Jing Zhou, Ziqi Yu, Zhongyun Bao, Gang Fu, Weilei He, Chao Liang, Chunxia Xiao:
Foreground Harmonization and Shadow Generation for Composite Image. 8267-8276 - Zhen-Xiang Ma, Zhen-Duo Chen, Li-Jun Zhao, Zi-Chao Zhang, Tai Zheng, Xin Luo, Xin-Shun Xu:
Bi-directional Task-Guided Network for Few-Shot Fine-Grained Image Classification. 8277-8286 - Xiao He, Chang Tang, Xinwang Liu, Chuankun Li, Shan An, Zhenglai Li:
Heterogeneous Graph Guided Contrastive Learning for Spatially Resolved Transcriptomics Data. 8287-8295 - Yabing Wang, Le Wang, Qiang Zhou, Zhibin Wang, Hao Li, Gang Hua, Wei Tang:
Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval. 8296-8305 - Zhiwen Yang, Liang Li, Jiehua Zhang, Tingyu Wang, Yaoqi Sun, Chenggang Yan:
Domain Shared and Specific Prompt Learning for Incremental Monocular Depth Estimation. 8306-8315 - Shuting He, Henghui Ding:
RefMask3D: Language-Guided Transformer for 3D Referring Segmentation. 8316-8325 - Yunwei Bai, Bill Yang Cai, Ying Kiat Tan, Zangwei Zheng, Shiming Chen, Tsuhan Chen:
FSL-QuickBoost: Minimal-Cost Ensemble for Few-Shot Learning. 8326-8335 - Jinhui Pang, Changqing Lin, Xiaoshuai Hao, Rong Yin, Zixuan Wang, Zhihui Zhang, Jinglin He, Huang Tai Sheng:
FTF-ER: Feature-Topology Fusion-Based Experience Replay Method for Continual Graph Learning. 8336-8344 - Fengmao Lv, Changru Nie, Jianyang Zhang, Guowu Yang, Guosheng Lin, Xiao Wu, Tianrui Li:
Rethinking the Effect of Uninformative Class Name in Prompt Learning. 8345-8354 - Yuhan Wang, Mofei Song:
UniL: Point Cloud Novelty Detection through Multimodal Pre-training. 8355-8364 - Zeyu Xiao, Zhihe Lu, Xinchao Wang:
P-BiC: Ultra-High-Definition Image Moiré Patterns Removal via Patch Bilateral Compensation. 8365-8373 - Jing Yang, Shundong Yang, Yuan Gao, Jieming Yang, Laurence T. Yang:
Multimodal Contextual Interactions of Entities: A Modality Circular Fusion Approach for Link Prediction. 8374-8382 - Chaolei Tan, Zihang Lin, Junfu Pu, Zhongang Qi, Wei-Yi Pei, Zhi Qu, Yexin Wang, Ying Shan, Wei-Shi Zheng, Jian-Fang Hu:
SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses. 8383-8392 - Buyu Liu, Kai Wang, Yansong Liu, Jun Bao, Tingting Han, Jun Yu:
MVPbev: Multi-view Perspective Image Generation from BEV with Test-time Controllability and Generalizability. 8393-8401 - Junzhang Liu, Zhecan Wang, Hammad A. Ayyubi, Haoxuan You, Chris Thomas, Rui Sun, Shih-Fu Chang, Kai-Wei Chang:
Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions. 8402-8411 - Yingchun Wang, Jingcai Guo, Song Guo, Yi Liu, Jie Zhang, Weizhan Zhang:
SFP: Spurious Feature-Targeted Pruning for Out-of-Distribution Generalization. 8412-8420 - Yao Li, Jiajun Deng, Yuxuan Xiao, Yingjie Wang, Xiaomeng Chu, Jianmin Ji, Yanyong Zhang:
FARFusion V2: A Geometry-based Radar-Camera Fusion Method on the Ground for Roadside Far-Range 3D Object Detection. 8421-8430 - Fangdi Wang, Jiaqi Jin, Zhibin Dong, Xihong Yang, Yu Feng, Xinwang Liu, Xinzhong Zhu, Siwei Wang, Tianrui Liu, En Zhu:
View Gap Matters: Cross-view Topology and Information Decoupling for Multi-view Clustering. 8431-8440 - Wenjie Wei, Yu Liang, Ammar Belatreche, Yichen Xiao, Honglin Cao, Zhenbang Ren, Guoqing Wang, Malu Zhang, Yang Yang:
Q-SNNs: Quantized Spiking Neural Networks. 8441-8450 - Shihua Zhang, Jiayi Ma:
DiffGlue: Diffusion-Aided Image Feature Matching. 8451-8460 - Xueyang Li, Yu Song, Yunzhong Lou, Xiangdong Zhou:
CAD Translator: An Effective Drive for Text to 3D Parametric Computer-Aided Design Generative Modeling. 8461-8470 - Weichen Xu, Jian Cao, Tianhao Fu, Ruilong Ren, Zicong Hu, Xixin Cao, Xing Zhang:
Point Cloud Reconstruction Is Insufficient to Learn 3D Representations. 8471-8479 - Xiao Yu, Kejiang Chen, Kai Zeng, Han Fang, Zijin Yang, Xiuwei Shang, Yuang Qi, Weiming Zhang, Nenghai Yu:
SemGIR: Semantic-Guided Image Regeneration Based Method for AI-generated Image Detection and Attribution. 8480-8488 - Jiahua Xiao, Yang Liu, Shizhou Zhang, Xing Wei:
Bridging Fourier and Spatial-Spectral Domains for Hyperspectral Image Denoising. 8489-8497 - Heng Jia, Yunqiu Xu, Linchao Zhu, Guang Chen, Yufei Wang, Yi Yang:
MoS2: Mixture of Scale and Shift Experts for Text-Only Video Captioning. 8498-8507 - Qi Zhang, Chi Huang, Qian Zhang, Nan Li, Wei Feng:
Learning Geometry Consistent Neural Radiance Fields from Sparse and Unposed Views. 8508-8517 - Zihan Fang, Shide Du, Yuhong Chen, Shiping Wang:
Beyond the Known: Ambiguity-Aware Multi-view Learning. 8518-8526 - Jingchao Wang, Zhengnan Deng, Tongxu Lin, Wenyuan Li, Shaobin Ling, Junyu Lin:
Beyond Direct Relationships: Exploring Multi-Order Label Pair Dependencies for Knowledge Distillation. 8527-8535 - Yuhang Li, Jincen Jiang, Xiaosong Yang, Youdong Ding, Jian Jun Zhang:
Harmony Everything! Masked Autoencoders for Video Harmonization. 8536-8545 - Linfeng Tang, Yuxin Deng, Xunpeng Yi, Qinglong Yan, Yixuan Yuan, Jiayi Ma:
DRMF: Degradation-Robust Multi-Modal Image Fusion via Composable Diffusion Prior. 8546-8555 - Jintao Chen, Fan Wang, Shengye Pang, Siwei Tan, Mingshuai Chen, Tiancheng Zhao, Meng Xi, Jianwei Yin:
UniGM: Unifying Multiple Pre-trained Graph Models via Adaptive Knowledge Aggregation. 8556-8565 - Ziyue Wu, Junyu Gao, Changsheng Xu:
Open-Vocabulary Video Scene Graph Generation via Union-aware Semantic Alignment. 8566-8575 - Li Zheng, Boyu Chen, Hao Fei, Fei Li, Shengqiong Wu, Lizi Liao, Donghong Ji, Chong Teng:
Self-Adaptive Fine-grained Multi-modal Data Augmentation for Semi-supervised Muti-modal Coreference Resolution. 8576-8585 - Daqin Luo, Chengjian Feng, Yuxuan Nong, Yiqing Shen:
AutoM3L: An Automated Multimodal Machine Learning Framework with Large Language Models. 8586-8594 - Xu Zhang, Zhipeng Xie, Haiyang Yu, Qitong Wang, Peng Wang, Wei Wang:
Enhancing Adaptive Deep Networks for Image Classification via Uncertainty-aware Decision Fusion. 8595-8603 - Ran Wang, Hua Zuo, Zhen Fang, Jie Lu:
Towards Robustness Prompt Tuning with Fully Test-Time Adaptation for CLIP's Zero-Shot Generalization. 8604-8612 - Lijun Zhang, Wei Suo, Peng Wang, Yanning Zhang:
A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap. 8613-8622 - Haojie Wei, Jun Yuan, Rui Zhang, Quanyu Dai, Yueguo Chen:
MAJL: A Model-Agnostic Joint Learning Framework for Music Source Separation and Pitch Estimation. 8623-8632 - Binbin Xu, Jun Yin, Nan Zhang:
Graph based Consistency Learning for Contrastive Multi-View Clustering. 8633-8641 - Jiaxin Gao, Yaohua Liu:
Enhancing Images with Coupled Low-Resolution and Ultra-Dark Degradations: A Tri-level Learning Framework. 8642-8651 - Qian Qu, Xinhang Wan, Weixuan Liang, Jiyuan Liu, Yu Feng, Huiying Xu, Xinwang Liu, En Zhu:
A Lightweight Anchor-Based Incremental Framework for Multi-view Clustering. 8652-8661 - Yao Wu, Mingwei Xing, Yachao Zhang, Yuan Xie, Yanyun Qu:
CLIP2UDA: Making Frozen CLIP Reward Unsupervised Domain Adaptation in 3D Semantic Segmentation. 8662-8671 - Zongqian Wu, Yujing Liu, Mengmeng Zhan, Ping Hu, Xiaofeng Zhu:
Adaptive Multi-Modality Prompt Learning. 8672-8680 - Shiwei Zhang, Wei Ke, Shuai Liu, Xiaopeng Hong, Tong Zhang:
Boosting Semi-supervised Crowd Counting with Scale-based Active Learning. 8681-8690 - Yingjie Gao, Yanan Zhang, Ziyue Huang, Nanqing Liu, Di Huang:
PS-TTL: Prototype-based Soft-labels and Test-Time Learning for Few-shot Object Detection. 8691-8700 - Li Yuan, Yi Cai, Junsheng Huang:
Few-Shot Joint Multimodal Entity-Relation Extraction via Knowledge-Enhanced Cross-modal Prompt Model. 8701-8710 - Yijia Wang, Qianqian Xu, Yangbangyan Jiang, Siran Dai, Qingming Huang:
Regularized Contrastive Partial Multi-view Outlier Detection. 8711-8720 - Rui Liu, Mingjie Li, Shen Zhao, Ling Chen, Xiaojun Chang, Lina Yao:
In-Context Learning for Zero-shot Medical Report Generation. 8721-8730 - Guoliang Zou, Yangdong Ye, Tongji Chen, Shizhe Hu:
Learning Dual Enhanced Representation for Contrastive Multi-view Clustering. 8731-8739 - Yang Zhao, Gangwei Xu, Gang Wu:
Hybrid Cost Volume for Memory-Efficient Optical Flow. 8740-8749 - Xiao-Qian Liu, Minghui Liu, Zhen-Duo Chen, Xin Luo, Xin-Shun Xu:
Hierarchical Multi-label Learning for Incremental Multilingual Text Recognition. 8750-8758 - Yuzhuo Wang, Junwei He, Hongzhi Wang:
RHKH: Relational Hypergraph Neural Network for Link Prediction on N-ary Knowledge Hypergraph. 8759-8767 - Fengbo Lan, Chang Wen Chen:
Understanding and Tackling Scattering and Reflective Flare for Mobile Camera Systems. 8768-8776 - Ziyu Zhao, Pingping Cai, Canyu Zhang, Xiaoguang Li, Song Wang:
Crossmodal Few-shot 3D Point Cloud Semantic Segmentation via View Synthesis. 8777-8785 - Jinkai Zheng, Xinchen Liu, Boyue Zhang, Chenggang Yan, Jiyong Zhang, Wu Liu, Yongdong Zhang:
It Takes Two: Accurate Gait Recognition in the Wild via Cross-granularity Alignment. 8786-8794 - Kenan Huang, Junbao Zhuo, Shuhui Wang, Chi Su, Qingming Huang, Huimin Ma:
Unsupervised Image-to-Video Adaptation via Category-aware Flow Memory Bank and Realistic Video Generation. 8795-8804 - Lv Tang, Peng-Tao Jiang, Zhihao Shen, Hao Zhang, Jin-Wei Chen, Bo Li:
Chain of Visual Perception: Harnessing Multimodal Large Language Models for Zero-shot Camouflaged Object Detection. 8805-8814 - Xinyao Liao, Wei Wei, Dangyang Chen, Yuanyuan Fu:
UniQ: Unified Decoder with Task-specific Queries for Efficient Scene Graph Generation. 8815-8824 - Siyang Wang, Jinghao Zhang, Jie Huang, Feng Zhao:
Image-free Pre-training for Low-Level Vision. 8825-8834 - Jiacheng Ruan, Jingsheng Gao, Mingye Xie, Suncheng Xiang, Zefang Yu, Ting Liu, Yuzhuo Fu, Xiaoye Qu:
GIST: Improving Parameter Efficient Fine-Tuning via Knowledge Interaction. 8835-8844 - Xuechen Guo, Wenhao Chai, Shiyan Li, Gaoang Wang:
LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound. 8845-8854 - Xiao Han, Zhenduo Zhang, Yiling Wu, Xinfeng Zhang, Zhe Wu:
Event Traffic Forecasting with Sparse Multimodal Data. 8855-8864 - Wanru Xu, Zhenjiang Miao, Yi Tian, Yigang Cen, Lili Wan, Xiaole Ma:
Probabilistic Distillation Transformer: Modelling Uncertainties for Visual Abductive Reasoning. 8865-8873 - Shiye Wang, Changsheng Li, Jialin Tang, Xing Gong, Ye Yuan, Guoren Wang:
Importance-aware Shared Parameter Subspace Learning for Domain Incremental Learning. 8874-8883 - Chengshun Wang, Na Zhao:
GS2-GNeSF: Geometry-Semantics Synergy for Generalizable Neural Semantic Fields. 8884-8892 - Liang Du, Yukai Shi, Yan Chen, Peng Zhou, Yuhua Qian:
Fast and Scalable Incomplete Multi-View Clustering with Duality Optimal Graph Filtering. 8893-8902 - Zhilin He, Yawei Zhang, Jingchang Mu, Xiaoyue Gu, Tianhao Gu:
LiteGfm: A Lightweight Self-supervised Monocular Depth Estimation Framework for Artifacts Reduction via Guided Image Filtering. 8903-8912 - Chengyi Yang, Wentao Liu, Shisong Chen, Jiayin Qi, Aimin Zhou:
Generating Prompts in Latent Space for Rehearsal-free Continual Learning. 8913-8922 - Choubo Ding, Guansong Pang:
Improving Out-of-Distribution Detection with Disentangled Foreground and Background Features. 8923-8931 - Yi Lu, Shenghao Ren, Qiu Shen, Xun Cao:
Leveraging RGB-Pressure for Whole-body Human-to-Humanoid Motion Imitation. 8932-8941 - Li Zhang, Zean Han, Yan Zhong, Qiaojun Yu, Xingyu Wu, Xue Wang, Rujing Wang:
VoCAPTER: Voting-based Pose Tracking for Category-level Articulated Object via Inter-frame Priors. 8942-8951 - Jinpeng Yu, Binbin Huang, Yuxuan Zhang, Huaxia Li, Xu Tang, Shenghua Gao:
GeoFormer: Learning Point Cloud Completion with Tri-Plane Integrated Transformer. 8952-8961 - Sifan Wu, Haipeng Chen, Yifang Yin, Sihao Hu, Runyang Feng, Yingying Jiao, Ziqi Yang, Zhenguang Liu:
Joint-Motion Mutual Learning for Pose Estimation in Video. 8962-8971 - Jiaqi Wang, Pichao Wang, Yi Feng, Huafeng Liu, Chang Gao, Liping Jing:
Align2Concept: Language Guided Interpretable Image Recognition by Visual Prototype and Textual Concept Alignment. 8972-8981 - Siying Xiao, Mao Ye, Qichen He, Shuaifeng Li, Song Tang, Xiatian Zhu:
Adversarial Experts Model for Black-box Domain Adaptation. 8982-8991 - Yayun Wei, Lei Cao, Hao Li, Yilin Dong:
MB2C: Multimodal Bidirectional Cycle Consistency for Learning Robust Visual Neural Representations. 8992-9000 - Qiang Wang, Ke Yan, Shouhong Ding:
Bilateral Adaptive Cross-Modal Fusion Prompt Learning for CLIP. 9001-9009 - Yifei Gao, Jiaqi Wang, Zhiyu Lin, Jitao Sang:
AIGCs Confuse AI Too: Investigating and Explaining Synthetic Image-induced Hallucinations in Large Vision-Language Models. 9010-9018 - Haizhuang Liu, Junbao Zhuo, Chen Liang, Jiansheng Chen, Huimin Ma:
Affinity3D: Propagating Instance-Level Semantic Affinity for Zero-Shot Point Cloud Semantic Segmentation. 9019-9028 - Zhaojian Li, Bin Zhao, Yuan Yuan:
TAS: Personalized Text-guided Audio Spatialization. 9029-9037 - Congqi Cao, Yueran Zhang, Yating Yu, Qinyi Lv, Lingtong Min, Yanning Zhang:
Task-Adapter: Task-specific Adaptation of Image Models for Few-shot Action Recognition. 9038-9047 - Quanjiang Li, Tingjin Luo, Mingdie Jiang, Jiahui Liao, Zhangqi Jiang:
Deep Incomplete Multi-View Network Semi-Supervised Multi-Label Learning with Unbiased Loss. 9048-9056 - Xinyue Liu, Jiahui Wan, Linlin Zong, Bo Xu:
Conditional Diffusion Model for Open-ended Video Question Answering. 9057-9066 - Yulin He, Siqi Wang, Wei Chen, Tianci Xun, Yusong Tan:
Sniffing Threatening Open-World Objects in Autonomous Driving by Open-Vocabulary Models. 9067-9076 - Haosen Sun, Yiming Li, Xixiang Lyu, Jing Ma:
Learning from Distinction: Mitigating Backdoors Using a Low-Capacity Model. 9077-9086 - Shen Lin, Xiaoyu Zhang, Willy Susilo, Xiaofeng Chen, Jun Liu:
GDR-GMA: Machine Unlearning via Direction-Rectified and Magnitude-Adjusted Gradients. 9087-9095 - Timin Gao, Peixian Chen, Mengdan Zhang, Chaoyou Fu, Yunhang Shen, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Xing Sun, Liujuan Cao, Rongrong Ji:
Cantor: Inspiring Multimodal Chain-of-Thought of MLLM. 9096-9105 - Shijie Li, Yunbin Tu, Qingyuan Xiang, Zheng Li:
MAGIC: Rethinking Dynamic Convolution Design for Medical Image Segmentation. 9106-9115 - Chao Wang, Yang Zhou, Liangtian He, Fenglai Lin, Hongming Chen, Liang-Jian Deng:
Illumination Distribution Prior for Low-light Image Enhancement. 9116-9125 - Pinhan Fu, Xinyan Liang, Yuhua Qian, Qian Guo, Zhifang Wei, Wen Li:
CoMO-NAS: Core-Structures-Guided Multi-Objective Neural Architecture Search for Multi-Modal Classification. 9126-9135 - Yi Liu, Jiachen Li, Yanchun Ma, Qing Xie, Yongjian Liu:
HcaNet: Haze-concentration-aware Network for Real-scene Dehazing with Codebook Priors. 9136-9144 - Wenlong Liao, Sunyuan Qiang, Xianfei Li, Xiaolei Chen, Haoyu Wang, Yanyan Liang, Junchi Yan, Tao He, Pai Peng:
CalibRBEV: Multi-Camera Calibration via Reversed Bird's-eye-view Representations for Autonomous Driving. 9145-9154 - Md. Tanvir Islam, Nasir Rahim, Saeed Anwar, Muhammad Saqib, Sambit Bakshi, Khan Muhammad:
HazeSpace2M: A Dataset for Haze Aware Single Image Dehazing. 9155-9164 - Xiaojun Chen, Jimeng Lou, Wenxi Huang, Ting Wan, Qin Zhang, Min Yang:
ReCoS: A Novel Benchmark for Cross-Modal Image-Text Retrieval in Complex Real-Life Scenarios. 9165-9174 - Shicheng Yang, Xiaoxu Li, Dongliang Chang, Zhanyu Ma, Jing-Hao Xue:
Channel-Spatial Support-Query Cross-Attention for Fine-Grained Few-Shot Image Classification. 9175-9183 - Xiaorui Jiang, Zhongyi Ma, Yulin Fu, Yong Liao, Pengyuan Zhou:
Heterogeneity-Aware Federated Deep Multi-View Clustering towards Diverse Feature Representations. 9184-9193 - Jiyuan Zhang, Kang Chen, Shiyan Chen, Yajing Zheng, Tiejun Huang, Zhaofei Yu:
SpikeGS: 3D Gaussian Splatting from Spike Streams with High-Speed Camera Motion. 9194-9203 - Jiangyi Wang, Zhongyao Cheng, Na Zhao, Jun Cheng, Xulei Yang:
On-the-fly Point Feature Representation for Point Clouds Analysis. 9204-9213 - Kun Wang, Hao Liu, Lirong Jie, Zixu Li, Yupeng Hu, Liqiang Nie:
Explicit Granularity and Implicit Scale Correspondence Learning for Point-Supervised Video Moment Localization. 9214-9223 - Shaoqing Xu, Shengyin Jiang, Fang Li, Li Liu, Ziying Song, Bo Yang, Zhixin Yang:
SparseInteraction: Sparse Semantic Guidance for Radar and Camera 3D Object Detection. 9224-9233 - Mahiro Ukai, Shuhei Kurita, Atsushi Hashimoto, Yoshitaka Ushiku, Nakamasa Inoue:
AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering. 9234-9243 - Shengwei Zhao, Linhai Xu, Yuying Liu, Shaoyi Du:
Multi-grained Correspondence Learning of Audio-language Models for Few-shot Audio Recognition. 9244-9252 - Song Wu, Xiaoyu Wei, Xinyue Chen, Yazhou Ren, Jing He, Xiaorong Pu:
Cross-View Mutual Learning for Semi-Supervised Medical Image Segmentation. 9253-9261 - Yunshan Qi, Lin Zhu, Yifan Zhao, Nan Bao, Jia Li:
Deblurring Neural Radiance Fields with Event-driven Bundle Adjustment. 9262-9270 - Jingqiao Xiu, Mengze Li, Wei Ji, Jingyuan Chen, Hanbin Zhao, Shin'ichi Satoh, Roger Zimmermann:
Hierarchical Debiasing and Noisy Correction for Cross-domain Video Tube Retrieval. 9271-9280 - Wenyu Yin, Shuyuan Lin, Yang Lu, Hanzi Wang:
Diverse Consensuses Paired with Motion Estimation-Based Multi-Model Fitting. 9281-9290 - Andong Lu, Jiacong Zhao, Chenglong Li, Yun Xiao, Bin Luo:
Breaking Modality Gap in RGBT Tracking: Coupled Knowledge Distillation. 9291-9300 - Peng Wu, Xuerong Zhou, Guansong Pang, Zhiwei Yang, Qingsen Yan, Peng Wang, Yanning Zhang:
Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts. 9301-9310 - Pengfei Luo, Tong Xu, Che Liu, Suojuan Zhang, Linli Xu, Minglei Li, Enhong Chen:
Bridging Gaps in Content and Knowledge for Multimodal Entity Linking. 9311-9320 - Shiyu Tang, Zhaofan Luo, Yifan Wang, Lijun Wang, Huchuan Lu, Weibo Su, Libo Liu:
LOVD: Large-and-Open Vocabulary Object Detection. 9321-9329 - Cam-Van Thi Nguyen, The-Son Le, Anh-Tuan Mai, Duc-Trong Le:
Ada2I: Enhancing Modality Balance for Multimodal Conversational Emotion Recognition. 9330-9339 - Xinpeng Li, Teng Wang, Jian Zhao, Shuyi Mao, Jinbao Wang, Feng Zheng, Xiaojiang Peng, Xuelong Li:
Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer. 9340-9349 - Jingjia Huang, Jingyan Tu, Ge Meng, Yingying Wang, Yuhang Dong, Xiaotong Tu, Xinghao Ding, Yue Huang:
Efficient Perceiving Local Details via Adaptive Spatial-Frequency Information Integration for Multi-focus Image Fusion. 9350-9359 - Wonwoo Cho, Kangyeol Kim, Saemee Choi, Jaegul Choo:
Training Spatial-Frequency Visual Prompts and Probabilistic Clusters for Accurate Black-Box Transfer Learning. 9360-9368 - Ning Xu, Yifei Gao, Ting-Ting Zhang, Hongshuo Tian, An-An Liu:
Cross-Modal Coherence-Enhanced Feedback Prompting for News Captioning. 9369-9377 - Yuzhen Li, Zehang Deng, Yuxin Cao, Lihua Liu:
GRFormer: Grouped Residual Self-Attention for Lightweight Single Image Super-Resolution. 9378-9386 - Muxin Pu, Mei Kuan Lim, Chun Yong Chong:
Siformer: Feature-isolated Transformer for Efficient Skeleton-based Sign Language Recognition. 9387-9396 - Yue Duan, Zhangxuan Gu, Zhenzhe Ying, Lei Qi, Changhua Meng, Yinghuan Shi:
PC2: Pseudo-Classification Based Pseudo-Captioning for Noisy Correspondence Learning in Cross-Modal Retrieval. 9397-9406 - Wei Feng, Zhenwei Wu, Qianqian Wang, Bo Dong, Quanxue Gao:
Federated Fuzzy C-means with Schatten-p Norm Minimization. 9407-9416 - Tianjiao Wan, Kele Xu, Long Lan, Zijian Gao, Dawei Feng, Bo Ding, Huaimin Wang:
Tracing Training Progress: Dynamic Influence Based Selection for Active Learning. 9417-9425 - Ruohao Guo, Dantong Niu, Liao Qu, Yanyu Qi, Ji Shi, Wenzhen Yue, Bowei Xing, Taiyan Chen, Xianghua Ying:
Instance-Level Panoramic Audio-Visual Saliency Detection and Ranking. 9426-9434 - Shenglin Yin, Kelu Yao, Zhen Xiao, Jieyi Long:
Embracing Adaptation: An Effective Dynamic Defense Strategy Against Adversarial Examples. 9435-9444 - Zitong Huang, Ze Chen, Yuanze Li, Bowen Dong, Erjin Zhou, Yong Liu, Rick Siow Mong Goh, Chun-Mei Feng, Wangmeng Zuo:
Class Balance Matters to Active Class-Incremental Learning. 9445-9454 - Hao Zhang, Ee Yeo Keat, Basura Fernando:
RCA: Region Conditioned Adaptation for Visual Abductive Reasoning. 9455-9464 - Jian-Yu Jiang-Lin, Kang-Yang Huang, Ling Lo, Yi-Ning Huang, Terence Lin, Jhih-Ciang Wu, Hong-Han Shuai, Wen-Huang Cheng:
ReCorD: Reasoning and Correcting Diffusion for HOI Generation. 9465-9474 - Xiaze Zhang, Ziheng Ding, Qi Jing, Ying Cheng, Wenchao Ding, Rui Feng:
DeepPointMap2: Accurate and Robust LiDAR-Visual SLAM with Neural Descriptors. 9475-9484 - Hongyu Li, Tianrui Hui, Zihan Ding, Jing Zhang, Bin Ma, Xiaoming Wei, Jizhong Han, Si Liu:
Dynamic Prompting of Frozen Text-to-Image Diffusion Models for Panoptic Narrative Grounding. 9485-9494 - Hengde Zhu, Xiangyu Kong, Weicheng Xie, Xin Huang, Linlin Shen, Lu Liu, Hatice Gunes, Siyang Song:
PerFRDiff: Personalised Weight Editing for Multiple Appropriate Facial Reaction Generation. 9495-9504 - Shiqin Liu, Chaozhuo Li, Xi Zhang, Minjun Zhao, Yuanbo Xu, Jiajun Bu:
Deeply Fusing Semantics and Interactions for Item Representation Learning via Topology-driven Pre-training. 9505-9514 - Yongsen Zheng, Guohua Wang, Yang Liu, Liang Lin:
Diversity Matters: User-Centric Multi-Interest Learning for Conversational Movie Recommendation. 9515-9524 - Yuanchen Shi, Fang Kong:
Integrating Stickers into Multimodal Dialogue Summarization: A Novel Dataset and Approach for Enhancing Social Media Interaction. 9525-9534 - Andreea-Maria Oncescu, João F. Henriques, A. Sophia Koepke:
Dissecting Temporal Understanding in Text-to-Audio Retrieval. 9535-9543 - Yuhang Su, Wei Hu, Fan Zhang, Qiming Xu:
AMG-Embedding: A Self-Supervised Embedding Approach for Audio Identification. 9544-9553 - Xue Li, Jiong Yu, Ziyang Li, Hongchun Lu, Ruifeng Yuan:
Dr. CLIP: CLIP-Driven Universal Framework for Zero-Shot Sketch Image Retrieval. 9554-9562 - Yan Zhuang, Yanlu Cai, Weizhong Zhang, Cheng Jin:
Future Motion Dynamic Modeling via Hybrid Supervision for Multi-Person Motion Prediction Uncertainty Reduction. 9563-9572 - Yupeng Zhang, Shuqi Zheng, Ruize Han, Yuzhong Feng, Junhui Hou, Linqi Song, Wei Feng, Liang Wan:
Rethinking the One-shot Object Detection: Cross-Domain Object Search. 9573-9581 - Yuhan Wu, Xiyu Meng, Yang He, Junru Zhang, Haowen Zhang, Yabo Dong, Dongming Lu:
Multi-view Self-Supervised Contrastive Learning for Multivariate Time Series. 9582-9590 - Dongding Lin, Jian Wang, Chak Tou Leong, Wenjie Li:
SCREEN: A Benchmark for Situated Conversational Recommendation. 9591-9600 - Xiaowan Hu, Yiyi Chen, Yan Li, Minquan Wang, Haoqian Wang, Quan Chen, Han Li, Peng Jiang:
Spatiotemporal Graph Guided Multi-modal Network for Livestreaming Product Retrieval. 9601-9610 - Zheqi Lv, Shaoxuan He, Tianyu Zhan, Shengyu Zhang, Wenqiao Zhang, Jingyuan Chen, Zhou Zhao, Fei Wu:
Semantic Codebook Learning for Dynamic Recommendation Models. 9611-9620 - Geng Tu, Feng Xiong, Bin Liang, Hui Wang, Xi Zeng, Ruifeng Xu:
Multimodal Emotion Recognition Calibration in Conversations. 9621-9630 - Wuyou Xia, Shengzhe Liu, Rong Qin, Guoli Jia, Eunil Park, Jufeng Yang:
Perceive before Respond: Improving Sticker Response Selection by Emotion Distillation and Hard Mining. 9631-9640 - Yunshan Ma, Yingzhi He, Wenjun Zhong, Xiang Wang, Roger Zimmermann, Tat-Seng Chua:
CIRP: Cross-Item Relational Pre-training for Multimodal Product Bundling. 9641-9649 - Zixian Gao, Disen Hu, Xun Jiang, Huimin Lu, Heng Tao Shen, Xing Xu:
Enhanced Experts with Uncertainty-Aware Routing for Multimodal Sentiment Analysis. 9650-9659 - Zhenyang Li, Fan Liu, Yinwei Wei, Zhiyong Cheng, Liqiang Nie, Mohan S. Kankanhalli:
Attribute-driven Disentangled Representation Learning for Multimodal Recommendation. 9660-9669 - Ting Fu, Yu-Wei Zhan, Chong-Yu Zhang, Xin Luo, Zhen-Duo Chen, Yongxin Wang, Xun Yang, Xin-Shun Xu:
FedCAFE: Federated Cross-Modal Hashing with Adaptive Feature Enhancement. 9670-9679 - Feng Zhu, Xinxing Yang, Longfei Li, Jun Zhou:
An Active Masked Attention Framework for Many-to-Many Cross-Domain Recommendations. 9680-9689 - Zehao Qi, Ruixu Zhang, Xinyi Hu, Wenxuan Liu, Zheng Wang:
Predicting the Unseen: A Novel Dataset for Hidden Intention Localization in Pre-abnormal Analysis. 9690-9698 - Ding Wang, Wei Zhou, Songlin Hu:
Information Diffusion Prediction with Graph Neural Ordinary Differential Equation Network. 9699-9708 - Jian Chen, Wei Wang, Yuzhu Hu, Junxin Chen, Han Liu, Xiping Hu:
TGCA-PVT: Topic-Guided Context-Aware Pyramid Vision Transformer for Sticker Emotion Recognition. 9709-9718 - Rui Yang, Shuang Wang, Jianwei Tao, Yingping Han, Qiaoling Lin, Yanhe Guo, Biao Hou, Licheng Jiao:
Accurate and Lightweight Learning for Specific Domain Image-Text Retrieval. 9719-9728 - Xianbing Zhao, Lizhen Qu, Tao Feng, Jianfei Cai, Buzhou Tang:
Learning in Order! A Sequential Strategy to Learn Invariant Features for Multimodal Sentiment Analysis. 9729-9738 - Yutong Wang, Sidan Zhu, Hongteng Xu, Dixin Luo:
An Inverse Partial Optimal Transport Framework for Music-guided Trailer Generation. 9739-9748 - Haonan Zheng, Wen Jiang, Xinyang Deng, Wenrui Li:
Sample-agnostic Adversarial Perturbation for Vision-Language Pre-training Models. 9749-9758 - Jiade Chen, Jin Wang, Yunhui Shi, Nam Ling, Baocai Yin:
MVP-Net: Multi-View Depth Image Guided Cross-Modal Distillation Network for Point Cloud Upsampling. 9759-9768 - Zuoyan Zhao, Hui Xue, Pengfei Fang, Shipeng Zhu:
PEAN: A Diffusion-Based Prior-Enhanced Attention Network for Scene Text Image Super-Resolution. 9769-9778 - Yuzhi Huang, Chenxin Li, Zixu Lin, Hengyu Liu, Haote Xu, Yifan Liu, Yue Huang, Xinghao Ding, Xiaotong Tu, Yixuan Yuan:
P2SAM: Probabilistically Prompted SAMs Are Efficient Segmentator for Ambiguous Medical Images. 9779-9788 - Ran Yi, Haokun Zhu, Teng Hu, Yu-Kun Lai, Paul L. Rosin:
AesStyler: Aesthetic Guided Universal Style Transfer. 9789-9798 - Wenxuan Wang, Chenglei Wang, Huihui Qi, Menghao Ye, Xuelin Qian, Peng Wang, Yanning Zhang:
Sustainable Self-evolution Adversarial Training. 9799-9808 - Jian-Jun Qiao, Meng-Yu Duan, Xiao Wu, Wei Li:
CAPNet: Cartoon Animal Parsing with Spatial Learning and Structural Modeling. 9809-9817 - Xuanyu Zhang, Youmin Xu, Runyi Li, Jiwen Yu, Weiqi Li, Zhipei Xu, Jian Zhang:
V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection. 9818-9827 - Xian Zhong, Shengwang Hu, Wenxuan Liu, Wenxin Huang, Jianhao Ding, Zhaofei Yu, Tiejun Huang:
Towards Low-latency Event-based Visual Recognition with Hybrid Step-wise Distillation Spiking Neural Networks. 9828-9836 - Junqi Shi, Mingyi Jiang, Ming Lu, Tong Chen, Xun Cao, Zhan Ma:
HINER: Neural Representation for Hyperspectral Image. 9837-9846 - Yaqiang Wu, Zhen Xu, Yong Duan, Yanlai Wu, Qinghua Zheng, Hui Li, Xiaochen Hu, Lianwen Jin:
RDLNet: A Novel and Accurate Real-world Document Localization Method. 9847-9855 - Xiao Teng, Xingyu Shen, Kele Xu, Long Lan:
Enhancing Unsupervised Visible-Infrared Person Re-Identification with Bidirectional-Consistency Gradual Matching. 9856-9865 - Zhen Zhang, Jing Xiao, Liang Liao, Mi Wang:
RefScale: Multi-temporal Assisted Image Rescaling in Repetitive Observation Scenarios. 9866-9874 - Chaoxiang He, Xiaofan Bai, Xiaojing Ma, Bin B. Zhu, Pingyi Hu, Jiayun Fu, Hai Jin, Dongmei Zhang:
Towards Stricter Black-box Integrity Verification of Deep Neural Network Models. 9875-9884 - Peibin Chen, Xijin Zhang, Daniel Kang Du:
SimpliGuard: Robust Mesh Simplification In the Wild. 9885-9893 - Shixuan Gao, Pingping Zhang, Tianyu Yan, Huchuan Lu:
Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection. 9894-9903 - Panjun Duan, Yang Zhao, Yuan Chen, Wei Jia, Zhao Zhang, Ronggang Wang:
Blind Video Bit-Depth Expansion. 9904-9912 - Xiaoheng Tan, Jiabin Zhang, Yuhui Quan, Jing Li, Yajing Wu, Zilin Bian:
Highly Efficient No-reference 4K Video Quality Assessment with Full-Pixel Covering Sampling and Training Strategy. 9913-9922 - Yujia Wang, Zhongxu Wang, Hua Huang:
AutoSFX: Automatic Sound Effect Generation for Videos. 9923-9932 - Weiguang Zhang, Qiufeng Wang, Kaizhu Huang, Xiaowei Huang, Fengjun Guo, Xiaomeng Gu:
Document Registration: Towards Automated Labeling of Pixel-Level Alignment Between Warped-Flat Documents. 9933-9942 - Hao Yang, Min Wang, Zhengfei Yu, Zhi Zeng, Mingrui Lao, Yun Zhou:
Maximizing Feature Distribution Variance for Robust Neural Networks. 9943-9951 - Kai Han, Jin Wang, Yunhui Shi, Nam Ling, Baocai Yin:
D3U-Net: Dual-Domain Collaborative Optimization Deep Unfolding Network for Image Compressive Sensing. 9952-9960 - Jiangtong Zhu, Zhao Yang, Yinan Shi, Jianwu Fang, Jianru Xue:
IC-Mapper: Instance-Centric Spatio-Temporal Modeling for Online Vectorized Map Construction. 9961-9969 - Jianjun Xiang, Yuanjie Dang, Peng Chen, Ronghua Liang, Ruohong Huan, Nan Gao:
Semantic-Aware and Quality-Aware Interaction Network for Blind Video Quality Assessment. 9970-9979 - Zerui Zhang, Jun Yu, Liangxian Cui, Qiang Ling, Tianyu Liu:
Part-level Reconstruction for Self-Supervised Category-level 6D Object Pose Estimation with Coarse-to-Fine Correspondence Optimization. 9980-9988 - Yachun Mi, Yan Shu, Yu Li, Chen Hui, Puchao Zhou, Shaohui Liu:
CLiF-VQA: Enhancing Video Quality Assessment by Incorporating High-Level Semantic Information related to Human Feelings. 9989-9998 - Xuntao Liu, Yuzhou Yang, Haoyue Wang, Qichao Ying, Zhenxing Qian, Xinpeng Zhang, Sheng Li:
Multi-view Feature Extraction via Tunable Prompts is Enough for Image Manipulation Localization. 9999-10007 - Junfeng Yang, Jing Fu, Zhen Zhang, Limei Liu, Qin Li, Wei Zhang, Wenzhi Cao:
Align-IQA: Aligning Image Quality Assessment Models with Diverse Human Preferences via Customizable Guidance. 10008-10017 - Zehang Lin, Jiayuan Xie, Zhenguo Yang, Yi Yu, Qing Li:
Generalized News Event Discovery via Dynamic Augmentation and Entropy Optimization. 10018-10026 - Jiahao Cui, Wei Jiang, Zhan Peng, Zhiyu Pan, Zhiguo Cao:
Exposure Completing for Temporally Consistent Neural High Dynamic Range Video Rendering. 10027-10035 - Lei Han, Xuesong Zhang:
Scalable Super-Resolution Neural Operator. 10036-10045 - Ling Zhang, Yidong Ma, Zhi Jiang, Weilei He, Zhongyun Bao, Gang Fu, Wenju Xu, Chunxia Xiao:
HighlightRemover: Spatially Valid Pixel Learning for Image Specular Highlight Removal. 10046-10054 - Yuhang Zhou, Yushu Zhang, Leo Yu Zhang, Zhongyun Hua:
DERD: Data-free Adversarial Robustness Distillation through Self-adversarial Teacher Group. 10055-10064 - Shuman Zhuang, Sujia Huang, Wei Huang, Yuhong Chen, Zhihao Wu, Ximeng Liu:
Enhancing Multi-view Graph Neural Network with Cross-view Confluent Message Passing. 10065-10074 - Fu Rong, Wenjin Peng, Meng Lan, Qian Zhang, Lefei Zhang:
Driving Scene Understanding with Traffic Scene-Assisted Topology Graph Transformer. 10075-10084 - Chang'an Yi, Haotian Chen, Yifan Zhang, Yonghui Xu, Yan Zhou, Lizhen Cui:
From Question to Exploration: Can Classic Test-Time Adaptation Strategies Be Effectively Applied in Semantic Segmentation? 10085-10094 - Zehao Chen, Zhan Lu, De Ma, Huajin Tang, Xudong Jiang, Qian Zheng, Gang Pan:
Event-ID: Intrinsic Decomposition Using an Event Camera. 10095-10104 - Xu Zhang, Fan Ni, Guannan Dong, Aichun Zhu, Jianhui Wu, Mingcheng Ni, Hui Liu:
TVPR: Text-to-Video Person Retrieval and a New Benchmark. 10105-10113 - Haoyu Shi, Huaiwen Zhang:
Modal-Enhanced Semantic Modeling for Fine-Grained 3D Human Motion Retrieval. 10114-10123 - Hongyu Zhu, Sichu Liang, Wentao Hu, Fangqi Li, Ju Jia, Shi-Lin Wang:
Reliable Model Watermarking: Defending against Theft without Compromising on Evasion. 10124-10133 - Qian Qiao, Yu Xie, Jun Gao, Tianxiang Wu, Shaoyao Huang, Jiaqing Fan, Ziqiang Cao, Zili Wang, Yue Zhang:
DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training. 10134-10143 - Yi Liu, Xinyi Li, Wenjing Shuai:
3D Scene De-occlusion in Neural Radiance Fields: A Framework for Obstacle Removal and Realistic Inpainting. 10144-10153 - Xuannan Liu, Peipei Li, Huaibo Huang, Zekun Li, Xing Cui, Jiahao Liang, Lixiong Qin, Weihong Deng, Zhaofeng He:
FKA-Owl: Advancing Multimodal Fake News Detection through Knowledge-Augmented LVLMs. 10154-10163 - Yalan Qin, Li Qian:
Fast Elastic-Net Multi-view Clustering: A Geometric Interpretation Perspective. 10164-10172 - Xiaojiao Guo, Xuhang Chen, Shenghong Luo, Shuqiang Wang, Chi-Man Pun:
Dual-Hybrid Attention Network for Specular Highlight Removal. 10173-10181 - Yiyang Luo, Ke Lin, Chao Gu:
Context-Aware Indoor Point Cloud Object Generation through User Instructions. 10182-10190 - Zhangli Hu, Ye Chen, Zhongyin Zhao, Jinfan Liu, Bilian Ke, Bingbing Ni:
Towards Artist-Like Painting Agents with Multi-Granularity Semantic Alignment. 10191-10199 - Zixuan Wang, Jiayi Li, Xiaoyu Qin, Shikun Sun, Songtao Zhou, Jia Jia, Jiebo Luo:
DanceCamAnimator: Keyframe-Based Controllable 3D Dance Camera Synthesis. 10200-10209 - Sooho Kim, Soyeon Hong, Kyungsoo Park, Hyunsouk Cho, Kyung-Ah Sohn:
OmniStitch: Depth-Aware Stitching Framework for Omnidirectional Vision with Multiple Cameras. 10210-10219 - Kaijiang Li, Hao Li, Haining Li, Peisen Wang, Chunyi Guo, Wenfeng Jiang:
SIRLUT: Simulated Infrared Fusion Guided Image-adaptive 3D Lookup Tables for Lightweight Image Enhancement. 10220-10228 - Bolin Jiang, Yuqiu Xie, Jiawei Li, Naiqi Li, Bin Chen, Shu-Tao Xia:
IGSPAD: Inverting 3D Gaussian Splatting for Pose-agnostic Anomaly Detection. 10229-10237 - Guobiao Li, Sheng Li, Zhenxing Qian, Xinpeng Zhang:
Cover-separable Fixed Neural Network Steganography via Deep Generative Models. 10238-10247 - Baorui Ma, Yu-Shen Liu, Matthias Zwicker, Zhizhong Han:
Inferring 3D Occupancy Fields through Implicit Reasoning on Silhouette Images. 10248-10257 - Rui Li, Yishu Liu, Huafeng Li, Jinxing Li, Guangming Lu:
Prototype-Guided Dual-Transformer Reasoning for Video Individual Counting. 10258-10267 - Tao Wang, Yushu Zhang, Xiangli Xiao, Lin Yuan, Zhihua Xia, Jian Weng:
Make Privacy Renewable! Generating Privacy-Preserving Faces Supporting Cancelable Biometric Recognition. 10268-10276 - Green Rosh K. S, B. H. Pawan Prasad, Lokesh R. Boregowda, Kaushik Mitra:
R2SFD: Improving Single Image Reflection Removal using Semantic Feature Dictionary. 10277-10286 - Jiaming Shen, Kun Hu, Wei Bao, Chang Wen Chen, Zhiyong Wang:
Bridging the Gap: Sketch-Aware Interpolation Network for High-Quality Animation Sketch Inbetweening. 10287-10295 - Yanghao Su, Jie Zhang, Ting Xu, Tianwei Zhang, Weiming Zhang, Nenghai Yu:
Model X-ray: Detecting Backdoored Models via Decision Boundary. 10296-10305 - Lize Zhou, Xiaoqi Wang, Jian Xiong, Xianzhong Long, Hao Gao:
Towards Distortion-Debiased Blind Image Quality Assessment. 10306-10315 - Benhui Zhang, Junyu Gao, Yuan Yuan:
A Descriptive Basketball Highlight Dataset for Automatic Commentary Generation. 10316-10325 - Cong Wang, Liyan Wang, Jie Mu, Chengjin Yu, Wei Wang:
Progressive Local and Non-Local Interactive Networks with Deeply Discriminative Training for Image Deraining. 10326-10335 - Kaifang Yang, Xinrong Zhao, Yanchao Gong:
Semantic Aware Just Noticeable Differences for VVC Compressed Text Screen Content Images. 10336-10344 - Jiaxuan Wu, Zhengxian Wu, Yiming Xue, Juan Wen, Wanli Peng:
Generative Text Steganography with Large Language Model. 10345-10353 - Yuchen Wang, Xingyu Zhu, Guanhui Ye, Shiyao Zhang, Xuetao Wei:
Achieving Resolution-Agnostic DNN-based Image Watermarking: A Novel Perspective of Implicit Neural Representation. 10354-10362 - Renshu Gu, Jiajun Zhu, Yixuan Si, Fei Gao, Jiamin Xu, Gang Xu:
3D Human Pose Estimation from Multiple Dynamic Views via Single-view Pretraining with Procrustes Alignment. 10363-10372 - Yang Ding, Yi Dai, Xin Wang, Ling Feng, Lei Cao, Huijun Zhang:
Integrating Content-Semantics-World Knowledge to Detect Stress from Videos. 10373-10381 - Xintian Mao, Jiansheng Wang, Xingran Xie, Qingli Li, Yan Wang:
LoFormer: Local Frequency Transformer for Image Deblurring. 10382-10391 - Mingjin Zhang, Chi Zhang, Qiming Zhang, Yunsong Li, Xinbo Gao, Jing Zhang:
Unleashing the Power of Generic Segmentation Model: A Simple Baseline for Infrared Small Target Detection. 10392-10401 - Honglin Yuan, Shiyun Lai, Xingfeng Li, Jian Dai, Yuan Sun, Zhenwen Ren:
Robust Prototype Completion for Incomplete Multi-view Clustering. 10402-10411 - Changhao Peng, Wei Gao:
Laplacian Matrix Learning for Point Cloud Attribute Compression with Ternary Search-Based Adaptive Block Partition. 10412-10420 - Zhongwei Xuan, Zunjie Zhu, Shuai Wang, Haibing Yin, Hongkui Wang, Ming Lu:
Superpixel-based Efficient Sampling for Learning Neural Fields from Large Input. 10421-10430 - Zhaolin Wan, Qiushuang Yang, Zhiyang Li, Xiaopeng Fan, Wangmeng Zuo, Debin Zhao:
Dual-stream Perception-driven Blind Quality Assessment for Stereoscopic Omnidirectional Images. 10431-10439 - Weixuan Tang, Haoyu Yang, Yuan Rao, Zhili Zhou, Fei Peng:
Dig a Hole and Fill in Sand: Adversary and Hiding Decoupled Steganography. 10440-10448 - Bin Wang, Meishan Zhang, Hao Fei, Yu Zhao, Bobo Li, Shengqiong Wu, Wei Ji, Min Zhang:
SpeechEE: A Novel Benchmark for Speech Event Extraction. 10449-10458 - Shouyu Chen, Liang Hu, Tangwei Ye, Zhongyuan Lai, Qi Zhang, Ke Liu, Usman Naseem, Ke Sun, Nengjun Zhu:
VR-DiagNet: Medical Volumetric and Radiomic Diagnosis Networks with Interpretable Clinician-like Optimizing Visual Inspection. 10459-10467 - Minjing Yu, Delong Pang, Ziwen Kang, Zhiyao Sun, Tian Lv, Jenny Sheng, Ran Yi, Yu-Hui Wen, Yong-Jin Liu:
ECAvatar: 3D Avatar Facial Animation with Controllable Identity and Emotion. 10468-10476 - Zhenyu Bao, Guibiao Liao, Zhongyuan Zhao, Kanglin Liu, Qing Li, Guoping Qiu:
3D Reconstruction and Novel View Synthesis of Indoor Environments Based on a Dual Neural Radiance Field. 10477-10486 - Zimo Liu, Kangjun Liu, Mingyue Guo, Shiliang Zhang, Yaowei Wang:
CoTuning: A Large-Small Model Collaborating Distillation Framework for Better Model Generalization. 10487-10496 - Yanbin Deng, Zheng Li, Ning Xie, Wei Zhang:
PIMT: Physics-Based Interactive Motion Transition for Hybrid Character Animation. 10497-10505 - Kang Shen, Haifeng Xia, Guangxing Geng, Guangyue Geng, Siyu Xia, Zhengming Ding:
DEITalk: Speech-Driven 3D Facial Animation with Dynamic Emotional Intensity Modeling. 10506-10514 - Tianyi Wang, Mengxiao Huang, Harry Cheng, Xiao Zhang, Zhiqi Shen:
LampMark: Proactive Deepfake Detection via Training-Free Landmark Perceptual Watermarks. 10515-10524 - Lintao Dong, Wei Zhai, Zheng-Jun Zha:
UniDense: Unleashing Diffusion Models with Meta-Routers for Universal Few-Shot Dense Prediction. 10525-10534 - Henglei Lv, Jiayu Xiao, Liang Li:
Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization. 10535-10543 - Guoqing Zhu, Honghu Pan, Qiang Wang, Chao Tian, Chao Yang, Zhenyu He:
Data Generation Scheme for Thermal Modality with Edge-Guided Adversarial Conditional Diffusion Model. 10544-10553 - Qiao Li, Xiaomeng Fu, Xi Wang, Jin Liu, Xingyu Gao, Jiao Dai, Jizhong Han:
Unveiling Structural Memorization: Structural Membership Inference Attack for Text-to-Image Diffusion Models. 10554-10562 - Zhaoda Ye, Xinhan Zheng, Yang Liu, Yuxin Peng:
RelScene: A Benchmark and baseline for Spatial Relations in text-driven 3D Scene Generation. 10563-10571 - Shilong Tian, Hong Chen, Chengtao Lv, Yu Liu, Jinyang Guo, Xianglong Liu, Shengxi Li, Hao Yang, Tao Xie:
QVD: Post-training Quantization for Video Diffusion Models. 10572-10581 - Jingjing Xie, Yuxin Zhang, Mingbao Lin, Liujuan Cao, Rongrong Ji:
Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation. 10582-10591 - Pengfei Zhou, Fangxiang Feng, Guang Liu, Ruifan Li, Xiaojie Wang:
DiffHarmony++: Enhancing Image Harmonization with Harmony-VAE and Inverse Harmonization Model. 10592-10601 - Qi Xu, Xuanye Fang, Yaxin Li, Jiangrong Shen, De Ma, Yi Xu, Gang Pan:
RSNN: Recurrent Spiking Neural Networks for Dynamic Spatial-Temporal Information Processing. 10602-10610 - Wei Yang, Tengfei Huo, Zhiqiang Liu:
Enhancing Transformer-based Semantic Matching for Few-shot Learning through Weakly Contrastive Pre-training. 10611-10620 - Stanislav Frolov, Brian B. Moser, Sebastian Palacio, Andreas Dengel:
ObjBlur: A Curriculum Learning Approach With Progressive Object-Level Blurring for Improved Layout-to-Image Generation. 10621-10629 - Rongjie Huang, Yongqi Wang, Ruofan Hu, Xiaoshan Xu, Zhiqing Hong, Dongchao Yang, Xize Cheng, Zehan Wang, Ziyue Jiang, Zhenhui Ye, Luping Liu, Siqi Zheng, Zhou Zhao:
VoiceTuner: Self-Supervised Pre-training and Efficient Fine-tuning For Voice Generation. 10630-10639 - Yuran Wang, Zhijing Wan, Yansheng Qiu, Zheng Wang:
Devil is in Details: Locality-Aware 3D Abdominal CT Volume Generation for Self-Supervised Organ Segmentation. 10640-10648 - Minghui Li, Jiangxiong Wang, Hao Zhang, Ziqi Zhou, Shengshan Hu, Xiaobing Pei:
Transferable Adversarial Facial Images for Privacy Protection. 10649-10658 - Ming Tao, Bing-Kun Bao, Hao Tang, Yaowei Wang, Changsheng Xu:
CoIn: A Lightweight and Effective Framework for Story Visualization and Continuation. 10659-10668 - Xulu Zhang, Wengyu Zhang, Xiaoyong Wei, Jinlin Wu, Zhaoxiang Zhang, Zhen Lei, Qing Li:
Generative Active Learning for Image Synthesis Personalization. 10669-10677 - Zhijun Zhai, Zengmao Wang, Xiaoxiao Long, Kaixuan Zhou, Bo Du:
SAT3D: Image-driven Semantic Attribute Transfer in 3D. 10678-10687 - Zihan Huang, Xinyu Shi, Zecheng Hao, Tong Bu, Jianhao Ding, Zhaofei Yu, Tiejun Huang:
Towards High-performance Spiking Transformers from ANN to SNN Conversion. 10688-10697 - Jialiang Li, Haoyue Wang, Sheng Li, Zhenxing Qian, Xinpeng Zhang, Athanasios V. Vasilakos:
Are handcrafted filters helpful for attributing AI-generated images? 10698-10706 - Peng Ding, Jingyu Wu, Jun Kuang, Dan Ma, Xuezhi Cao, Xunliang Cai, Shi Chen, Jiajun Chen, Shujian Huang:
Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs. 10707-10715 - Shaodong Wang, Yunyang Ge, Liuhan Chen, Haiyang Zhou, Qian Wang, Xinhua Cheng, Li Yuan:
Prompt2Poster: Automatically Artistic Chinese Poster Creation from Prompt Only. 10716-10724 - Weijie Wang, Jichao Zhang, Chang Liu, Xia Li, Xingqian Xu, Humphrey Shi, Nicu Sebe, Bruno Lepri:
UVMap-ID: A Controllable and Personalized UV Map Generative Model. 10725-10734 - Tianshuo Peng, Zuchao Li, Lefei Zhang, Hai Zhao, Ping Wang, Bo Du:
Multi-modal Auto-regressive Modeling via Visual Tokens. 10735-10744 - Haining Wang, Na Li, Huijie Zhao, Yan Wen, Yi Su, Yuqiang Fang:
MappingFormer: Learning Cross-modal Feature Mapping for Visible-to-infrared Image Translation. 10745-10754 - Xiangping Zheng, Xiuxin Hao, Bo Wu, Xigang Bao, Xuan Zhang, Wei Li, Xun Liang:
A Sample-driven Selection Framework: Towards Graph Contrastive Networks with Reinforcement Learning. 10755-10764 - Peiyong Wang, Bohan Xiao, Qisheng He, Carri Glide-Hurst, Ming Dong:
Score-Based Image-to-Image Brownian Bridge. 10765-10773 - Tingfeng Cao, Junsheng Kong, Xue Zhao, Wenqing Yao, Junwei Ding, Jinhui Zhu, Jiandong Zhang:
Product2IMG: Prompt-Free E-commerce Product Background Generation with Diffusion Model and Self-Improved LMM. 10774-10783 - Zhenyu Xie, Haoye Dong, Yufei Gao, Zehua Ma, Xiaodan Liang:
DreamVTON: Customizing 3D Virtual Try-on with Personalized Diffusion Models. 10784-10793 - Chencan Fu, Yabiao Wang, Jiangning Zhang, Zhengkai Jiang, Xiaofeng Mao, Jiafu Wu, Weijian Cao, Chengjie Wang, Yanhao Ge, Yong Liu:
MambaGesture: Enhancing Co-Speech Gesture Generation with Mamba and Disentangled Multi-Modality Fusion. 10794-10803 - Wei Lou, Guanbin Li, Xiang Wan, Haofeng Li:
Multi-modal Denoising Diffusion Pre-training for Whole-Slide Image Classification. 10804-10813 - Xingyi Li, Yizheng Wu, Jun Cen, Juewen Peng, Kewei Wang, Ke Xian, Zhe Wang, Zhiguo Cao, Guosheng Lin:
iControl3D: An Interactive System for Controllable 3D Scene Generation. 10814-10823 - Yibin Wang, Weizhong Zhang, Jianwei Zheng, Cheng Jin:
PrimeComposer: Faster Progressively Combined Diffusion for Image Composition with Attention Steering. 10824-10832 - Jiancheng Huang, Mingfu Yan, Songyan Chen, Yi Huang, Shifeng Chen:
MagicFight: Personalized Martial Arts Combat Video Generation. 10833-10842 - Longfei Lu, Huachen Gao, Tao Dai, Yaohua Zha, Zhi Hou, Junta Wu, Shu-Tao Xia:
Large Point-to-Gaussian Model for Image-to-3D Generation. 10843-10852 - Mingzhen Sun, Weining Wang, Yanyuan Qiao, Jiahui Sun, Zihan Qin, Longteng Guo, Xinxin Zhu, Jing Liu:
MM-LDM: Multi-Modal Latent Diffusion Model for Sounding Video Generation. 10853-10861 - Ruowei Wang, Jiaqi Li, Dan Zeng, Xueqi Ma, Zixiang Xu, Jianwei Zhang, Qijun Zhao:
GenUDC: High Quality 3D Mesh Generation With Unsigned Dual Contouring Representation. 10862-10871 - Xiaopei Zhu, Peiyang Xu, Guanning Zeng, Yinpeng Dong, Xiaolin Hu:
Natural Language Induced Adversarial Images. 10872-10881 - Xin Lu, Chuanqing Zhuang, Zhengda Lu, Yiqun Wang, Jun Xiao:
FC-4DFS: Frequency-controlled Flexible 4D Facial Expression Synthesizing. 10882-10890 - Jiaxing Li, Hongbo Zhao, Yijun Wang, Jianxin Lin:
Towards Photorealistic Video Colorization via Gated Color-Guided Image Diffusion Models. 10891-10900 - Mengmeng Ge, Xu Jia, Takashi Isobe, Xiaomin Li, Qinghe Wang, Jing Mu, Dong Zhou, Li Wang, Huchuan Lu, Lu Tian, Ashish Sirasao, Emad Barsoum:
Customizing Text-to-Image Generation with Inverted Interaction. 10901-10909 - Yunqiu Xu, Linchao Zhu, Yi Yang:
GG-Editor: Locally Editing 3D Avatars with Multimodal Large Language Model Guidance. 10910-10919 - Xianqiang Lyu, Hui Liu, Junhui Hou:
RainyScape: Unsupervised Rainy Scene Reconstruction using Decoupled Neural Rendering. 10920-10929 - Jingyu Lin, Guiqin Zhao, Jing Xu, Guoli Wang, Zejin Wang, Antitza Dantcheva, Lan Du, Cunjian Chen:
DiffTV: Identity-Preserved Thermal-to-Visible Face Translation via Feature Alignment and Dual-Stage Conditions. 10930-10938 - Yifan Li, Yuhang Bai, Shuai Yang, Jiaying Liu:
COCO-LC: Colorfulness Controllable Language-based Colorization. 10939-10947 - Yiying Bao, Hao Zhou, Chao Peng, Chenyang Xu, Shuo Shi, Kecheng Cai:
Boundary-Aware Periodicity-based Sparsification Strategy for Ultra-Long Time Series Forecasting. 10948-10956 - Ziyi Dong, Yao Xiao, Pengxu Wei, Liang Lin:
Decoder-Only LLMs are Better Controllers for Diffusion Models. 10957-10965 - Zhenqi Dai, Ting Liu, Xingxing Zhang, Yunchao Wei, Yanning Zhang:
One-shot In-context Part Segmentation. 10966-10975 - Ziyang Yuan, Mingdeng Cao, Xintao Wang, Zhongang Qi, Chun Yuan, Ying Shan:
CustomNet: Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models. 10976-10984 - Kyusun Cho, Joungbin Lee, Heeji Yoon, Yeobin Hong, Jaehoon Ko, Sangjun Ahn, Seungryong Kim:
GaussianTalker: Real-Time Talking Head Synthesis with 3D Gaussian Splatting. 10985-10994 - Huanpeng Chu, Wei Wu, Chengjie Zang, Kun Yuan:
QNCD: Quantization Noise Correction for Diffusion Models. 10995-11003 - Dan Wang, Xinrui Cui:
InNeRF: Learning Interpretable Radiance Fields for Generalizable 3D Scene Representation and Rendering. 11004-11012 - Zhongyi Fan, Zixin Yin, Gang Li, Yibing Zhan, Heliang Zheng:
DreamBooth++: Boosting Subject-Driven Generation via Region-Level References Packing. 11013-11021 - Zhenghao Chen, Luping Zhou, Zhihao Hu, Dong Xu:
Group-aware Parameter-efficient Updating for Content-Adaptive Neural Video Compression. 11022-11031 - Lingfei Ren, Ruimin Hu, Zheng Wang, Yilin Xiao, Dengshi Li, Junhang Wu, Yilong Zang, Jinzhang Hu, Zijun Huang:
Heterophilic Graph Invariant Learning for Out-of-Distribution of Fraud Detection. 11032-11040 - Haicheng Liao, Haoyu Sun, Huanming Shen, Chengyue Wang, Chunlin Tian, KaHou Tam, Li Li, Chengzhong Xu, Zhenning Li:
CRASH: Crash Recognition and Anticipation System Harnessing with Context-Aware and Temporal Focus Attentions. 11041-11050 - Lehao Lin, Hong Kang, Xinyao Sun, Wei Cai:
SemNFT: A Semantically Enhanced Decentralized Middleware for Digital Asset Immortality. 11051-11059 - Guogang Zhu, Xuefeng Liu, Jianwei Niu, Shaojie Tang, Xinghao Wu, Jiayuan Zhang:
DualFed: Enjoying both Generalization and Personalization in Federated Learning via Hierachical Representations. 11060-11069 - Hui Zeng, Minrui Xu, Tongqing Zhou, Xinyi Wu, Jiawen Kang, Zhiping Cai, Dusit Niyato:
One-shot-but-not-degraded Federated Learning. 11070-11079 - Miao Cao, Lishun Wang, Huan Wang, Guoqing Wang, Xin Yuan:
Towards Real-time Video Compressive Sensing on Mobile Devices. 11080-11088 - Daheng Yin, Jianxin Shi, Miao Zhang, Zhaowu Huang, Jiangchuan Liu, Fang Dong:
FSVFG: Towards Immersive Full-Scene Volumetric Video Streaming with Adaptive Feature Grid. 11089-11098 - Huanhuan Zhang, Liu Zhuo, Haotian Li, Anfu Zhou, Chuanming Wang, Huadong Ma:
AraLive: Automatic Reward Adaption for Learning-based Live Video Streaming. 11099-11108 - Jun Dan, Weiming Liu, Mushui Liu, Chunfeng Xie, Shunjie Dong, Guofang Ma, Yanchao Tan, Jiazheng Xing:
HOGDA: Boosting Semi-supervised Graph Domain Adaptation via High-Order Structure-Guided Adaptive Feature Alignment. 11109-11118
Reproducibility
- Xin Jin, Longteng Jiang, Yihao Zhang, Lihua Lu, Xiaobo Gao, Boyan Dong:
Reproducibility Companion Paper: Aesthetics-Driven Virtual Time-Lapse Photography Generation. 11119-11122
Panel
- Zi Helen Huang, Phoebe Chen, Shuicheng Yan:
Generative AI in Multimedia: Challenges and Opportunities for Academic and Industrial Impact. 11123-11124
Industry Session
- Jianquan Liu, Balu Adsumilli, Yukiko Yanagawa, Haiwei Dong:
An Innovative Industry Program in A New Era of Multimedia with Generative AI. 11125-11126
Doctoral Symposium
- Wenmiao Hu:
Utilizing Very High-resolution Optical RGB Satellite Imagery in Geo-information Extraction for Fine-scale Map-making. 11127-11131 - Cheng Zhang:
Practical Deep Learning Models for QIM-based VoIP Steganalysis. 11132-11136
Brave New Ideas
- Jie An, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Kevin Lin, Zicheng Liu, Lijuan Wang, Jiebo Luo:
OpenLEAF: A Novel Benchmark for Open-Domain Interleaved Image-Text Generation. 11137-11145 - Carlos de la Torre-Ortiz, Tuukka Ruotsalo:
Perceptual Visual Similarity from EEG: Prediction and Image Generation. 11146-11155 - Yifeng Gao, Yuhua Sun, Xingjun Ma, Zuxuan Wu, Yu-Gang Jiang:
ModelLock: Locking Your Model With a Spell. 11156-11165 - Jiyi Zhang, Han Fang, Ee-Chien Chang:
Finding Input Data Domains of Image Classification Models with Hard-Label Black-Box Access. 11166-11174 - Yudong Zhang, Ruobing Xie, Jiansheng Chen, Xingwu Sun, Yu Wang:
PIP: Detecting Adversarial Examples in Large Vision-Language Models via Attention Patterns of Irrelevant Probe Questions. 11175-11183 - Taotao Zhou, Teng Xu, Dong Zhang, Yuyang Jiao, Peijun Xu, Yaoyu He, Lan Xu, Jingyi Yu:
Sophia-in-Audition: Virtual Production with a Robot Performer. 11184-11193
Open-Source
- Xiaodong Chen, Kunlang He, Wu Liu, Xinchen Liu, Zheng-Jun Zha, Tao Mei:
CLaM: An Open-Source Library for Performance Evaluation of Text-driven Human Motion Generation. 11194-11197 - Haodong Duan, Junming Yang, Yuxuan Qiao, Xinyu Fang, Lin Chen, Yuan Liu, Xiaoyi Dong, Yuhang Zang, Pan Zhang, Jiaqi Wang, Dahua Lin, Kai Chen:
VLMEvalKit: An Open-Source ToolKit for Evaluating Large Multi-Modality Models. 11198-11201 - Wei Gao, Huiming Zheng, Chenhao Zhang, Kaiyu Zheng, Zhuozhen Yu, Yuan Li, Hua Ye, Yongchi Zhang:
OpenDIC: An Open-Source Library and Performance Evaluation for Deep-learning-based Image Compression. 11202-11205 - Hung-Jui Guo, Hiranya Garbha Kumar, Minhas Kamal, Balakrishnan Prabhakaran:
Room2XR: Virtual Interactive Collaboration in Real-world Scenes. 11206-11209 - Jack Jansen, Thomas Röggla, Silvia Rossi, Irene Viola, Pablo César:
Open-Sourcing VR2Gather: A Collaborative Social VR System for Adaptive Multi-Party Real Time Communication. 11210-11213 - Joni Räsänen, Heikki Tampio, Alexandre Mercat, Jarno Vanne:
uvgComm: Open Software for Low-Latency Multi-party Video Communication. 11214-11217 - Tomás Soucek, Jakub Lokoc:
TransNet V2: An Effective Deep Network Architecture for Fast Shot Transition Detection. 11218-11221 - Jingyuan Tang, Yangang Cai, Xuesong Gao, Songlin Sun:
Generalized Sampling of Non-Local Textural Clues Multi-View Stereo Framework. 11222-11225 - Yuan Tong, Mengshun Hu, Zheng Wang:
NNVISR: Bring Neural Network Video Interpolation and Super Resolution into Video Processing Framework. 11226-11229 - Marko Viitanen, Joose Sainio, Kari Siivonen, Alexandre Mercat, Jarno Vanne:
uvg266: Open-Source VVC Intra Encoder. 11230-11233 - Liang Xie, Wei Gao:
LearningPCC: A PyTorch Library for Learning-Based Point Cloud Compression. 11234-11238 - Liang Xie, Wei Gao:
PCHMVision: An Open-Source Library of Point Cloud Compression for Human and Machine Vision. 11239-11243 - Feng Ye, Li Zhang, Chuanmin Jia:
Deep Video Compression with Scaled Hierarchical Bi-directional Motion Model. 11244-11247 - Hang Yuan, Wei Gao, Wenxu Gao:
OpenSEP: An Open Source Subjective Experiment Platform. 11248-11251
Technical Demonstrations
- Ansel Blume, Khanh Duy Nguyen, Zhenhailong Wang, Yangyi Chen, Michal Shlapentokh-Rothman, Xiaomeng Jin, Jeonghwan Kim, Zhen Zhu, Jiateng Liu, Kuan-Hao Huang, Mankeerat Sidhu, Xuanming Zhang, Vivian Liu, Raunak Sinha, Te-Lin Wu, Abhay Zala, Elias Stengel-Eskin, Da Yin, Yao Xiao, Utkarsh Mall, Zhou Yu, Kai-Wei Chang, Camille Cobb, Karrie Karahalios, Lydia B. Chilton, Mohit Bansal, Nanyun Peng, Carl Vondrick, Derek Hoiem, Heng Ji:
MIRACLE: An Online, Explainable Multimodal Interactive Concept Learning System. 11252-11254 - Difei Gao, Siyuan Hu, Zechen Bai, Qinghong Lin, Mike Zheng Shou:
AssistEditor: Multi-Agent Collaboration for GUI Workflow Automation in Video Creation. 11255-11257 - Feilin Han, Leping Zhang, Xin Wang, Ke-Ao Zhao, Ying Zhong, Ziyi Su, Tongtong Feng, Wenwu Zhu:
U2USim - A UAV Telepresence Simulation Platform with Multi-agent Sensing and Dynamic Environment. 11258-11260 - Zhanbin Hu, Xiaodong He, Renzhou Pan, Xianzhou Zeng, Chenming Fan, Qiang Zhu:
MAF-ID: Multi-Agent Framework for Interactive Dubbing through Deep Video Understanding. 11261-11263 - Xin Jin, Liaoruxing Zhang, Longteng Jiang, Dandan Li:
Unlimited Vision: Professional Composition by Yourself. 11264-11266 - Seongjean Kim, Jungwoo Huh, Yeseung Park, Jungsu Kim, Sanghoon Lee:
DanceMimic: Awaken Your Dancing Instinct through a Real-time Dance Imitation Capture System. 11267-11269 - Ying Ma, Xinyan Yang, Aiqi Wang, Jianglin Zeng, Shaofei Liu:
Video Editing Chatbot: Language-Driven Video Compositing System. 11270-11272 - Liangyu Wang, Yoko Yamakata, Ryoma Maeda, Kiyoharu Aizawa:
Measure and Improve Your Food: Ingredient Estimation Based Nutrition Calculator. 11273-11275 - Mingyuan Wu, Ruifan Ji, Haozhen Zheng, Jiaxi Li, Beitong Tian, Bo Chen, Ruixiao Zhang, Jacob Chakareski, Michael Zink, Ramesh K. Sitaraman, Klara Nahrstedt:
Scene Graph Driven Hybrid Interactive VR Teleconferencing. 11276-11278 - Yuning Wu, Jiatong Shi, Yifeng Yu, Yuxun Tang, Tao Qian, Yueqian Lin, Jionghao Han, Xinyi Bai, Shinji Watanabe, Qin Jin:
Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm. 11279-11281 - Shengzhou Yi, Junichiro Matsugami, Takuya Yamamoto, Toshihiko Yamasaki:
Enhancing Speaking and Slide Design Skills with Deep Learning: An Online Presentation Assessment System. 11282-11284
Tutorial Presentations
- Rahel Arnold, Werner Bailer, Ralph Gasser, Björn Þór Jónsson, Omar Shahbaz Khan, Heiko Schuldt, Florian Spiess, Lucia Vadicamo:
Multimedia Information Retrieval in XR. 11285-11286 - Niccolo Biondi, Simone Ricci, Federico Pernici, Alberto Del Bimbo:
Learning Backward Compatible Representations. 11287-11288 - Hao Fei, Xiangtai Li, Haotian Liu, Fuxiao Liu, Zhuosheng Zhang, Hanwang Zhang, Shuicheng Yan:
From Multimodal LLM to Human-level AI: Modality, Instruction, Reasoning and Beyond. 11289-11291 - Wei Gao, Ge Li:
Point Cloud Compression, Enhancement and Applications: From 3D Perception to Large Models. 11292-11293 - Soyeon Caren Han, Feiqi Cao, Josiah Poon, Roberto Navigli:
Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond. 11294-11295 - Xin Wang, Yuwei Zhou, Hong Chen, Wenwu Zhu:
Curriculum Learning for Multimedia in the Era of Large Language Models. 11296-11297 - Kaicheng Yu, Zhuang Shao, Siyuan Qi, Dongfang Liu:
Tutorial: Large Language-Vision Model in Society. 11298-11299 - Sicheng Zhao, Guoli Jia, Xiaopeng Hong, Yanyan Zhao, Jianhua Tao:
Label-Efficient Emotion and Sentiment Analysis. 11300-11301
Grand Challenges
- Yicheng Wu, Yutong Xie, Xiangde Luo, Qi Wu, Jianfei Cai:
Dataset, Challenge, and Evaluation for Tumor Segmentation Variability. 11302-11303 - Dan Guo, Xiaobai Li, Kun Li, Haoyu Chen, Jingjing Hu, Guoying Zhao, Yi Yang, Meng Wang:
MAC 2024: Micro-Action Analysis Grand Challenge. 11304-11305 - Jun Yu, Mohan Jing, Guopeng Zhao, Keda Lu, Yifan Wang, Feng Zhao, Jiaqing Sun, Qingsong Liu, Jiaen Liang:
End-to-end Spatio-Temporal Information Aggregation For Micro-Action Detection. 11306-11312 - Qiankun Li, Xiaolong Huang, Huabao Chen, Feng He, Qiupu Chen, Zengfu Wang:
Advancing Micro-Action Recognition with Multi-Auxiliary Heads and Hybrid Loss Optimization. 11313-11319 - Chen Wang, Xun Mei, Feng Zhang:
Instance-aware Fine-grained Micro-action Recognition. 11320-11326 - Fan Gong, Jialiang Chen, Jiajun Zhu, Qijian Bao, Fei Gao, Renshu Gu, Gang Xu:
Micro-Action Recognition via Hierarchical Fusion and Inference. 11327-11332 - Muhammad Saad Saeed, Shah Nawaz, Marta Moscati, Rohan Kumar Das, Muhammad Salman Tahir, Muhammad Zaigham Zaheer, Muhammad Irzam Liaqat, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf, Markus Schedl:
A Synopsis of FAME 2024 Challenge: Associating Faces with Voices in Multilingual Environments. 11333-11334 - Jiehui Tang, Xiaofei Wang, Zhen Xiao, Jiayi Liu, Xueliang Liu, Richang Hong:
Exploring Robust Face-Voice Matching in Multilingual Environments. 11335-11341 - Ruijie Tao, Zhan Shi, Yidi Jiang, Duc-Tuan Truong, Eng Siong Chng, Massimo Alioto, Haizhou Li:
Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization. 11342-11347 - Wuyang Chen, Yanjie Sun, Kele Xu, Yong Dou:
Contrastive Learning-based Chaining-Cluster for Multilingual Voice-Face Association. 11348-11354 - Zhixi Cai, Abhinav Dhall, Shreya Ghosh, Munawar Hayat, Dimitrios Kollias, Kalin Stefanov, Usman Tariq:
1M-Deepfakes Detection Challenge. 11355-11359 - Diego Pérez-Vieites, Juan José Moreira-Pérez, Ángel Aragón-Kifute, Raquel Román-Sarmiento, Rubén Castro-González:
Vigo: Audiovisual Fake Detection and Segment Localization. 11360-11364 - Yi Zhang, Changtao Miao, Man Luo, Jianshu Li, Wenzhong Deng, Weibin Yao, Zhe Li, Bingyu Hu, Weiwei Feng, Tao Gong, Qi Chu:
MFMS: Learning Modality-Fused and Modality-Specific Features for Deepfake Detection and Localization Tasks. 11365-11369 - Yifan Wang, Xuecheng Wu, Jia Zhang, Mohan Jing, Keda Lu, Jun Yu, Wen Su, Fang Gao, Qingsong Liu, Jianqing Sun, Jiaen Liang:
Building Robust Video-Level Deepfake Detection via Audio-Visual Local-Global Interactions. 11370-11376 - Philipp Müller, Michal Balazia, Tobias Baur, Michael Dietz, Alexander Heimerl, Anna Penzkofer, Dominik Schiller, François Brémond, Jan Alexandersson, Elisabeth André, Andreas Bulling:
MultiMediate'24: Multi-Domain Engagement Estimation. 11377-11382 - Deepak Kumar, Surbhi Madan, Pradeep Singh, Abhinav Dhall, Balasubramanian Raman:
Towards Engagement Prediction: A Cross-Modality Dual-Pipeline Approach using Visual and Audio Features. 11383-11389 - Fuyan Ma, Yiran He, Bin Sun, Shutao Li:
Less is More: Adaptive Feature Selection and Fusion for Eye Contact Detection. 11390-11396 - Jia Li, Yangchen Yu, Yin Chen, Yu Zhang, Peng Jia, Yunbo Xu, Ziqiang Li, Meng Wang, Richang Hong:
DAT: Dialogue-Aware Transformer with Modality-Group Fusion for Human Engagement Estimation. 11397-11403 - Yu Zhao, Hao Fei, Bobo Li, Meishan Zhang, Min Zhang:
The ACM Multimedia 2024 Visual Spatial Description Grand Challenge. 11404-11406 - Jun Yu, Yunxiang Zhang, Zerui Zhang, Zhao Yang, Gongpeng Zhao, Fengzhao Sun, Fanrui Zhang, Qingsong Liu, Jianqing Sun, Jiaen Liang, Yaohui Zhang:
RAG-Guided Large Language Models for Visual Spatial Description with Adaptive Hallucination Corrector. 11407-11413 - Jiabao Wang, Fang Gao, Jingfeng Tang, Shaodong Li, Hanbo Zheng, Shengheng Ma, Feng Shuang, Jun Yu:
A Method for Visual Spatial Description Based on Large Language Model Fine-tuning. 11414-11419 - Yizhang Jin, Jian Li, Jiangning Zhang, Jianlong Hu, Zhenye Gan, Xin Tan, Yong Liu, Yabiao Wang, Chengjie Wang, Lizhuang Ma:
LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description. 11420-11425 - Zhiqi Ge, Juncheng Li, Qifan Yu, Wei Zhou, Siliang Tang, Yueting Zhuang:
DEMON24: ACM MM24 Demonstrative Instruction Following Challenge. 11426-11428 - Xian Fu:
Enhancing Multimodal Large Language Models on Demonstrative Multi-Image Instructions. 11429-11434 - Jingyu Wei, Yi Su, Kele Xu, Lingbin Zeng, Bo Liu, Huaimin Wang:
Demonstrative Instruction Following in Multimodal LLMs via Integrating Low-Rank Adaptation with Ensemble Learning. 11435-11441 - Bo Wu, Peiye Liu, Qiushi Huang, Zhaoyang Zeng, Jia Wang, Bei Liu, Jiebo Luo, Wen-Huang Cheng:
SMP Challenge Summary: Social Media Prediction Challenge. 11442-11444 - Yu-Shi Lin, Anthony J. T. Lee:
MMF: Winning Solution to Social Media Popularity Prediction Challenge 2024. 11445-11449 - Wenhao Hu, Weilong Chen, Weimin Yuan, Yan Wang, Shimin Cai, Yanru Zhang:
Dual-Stream Pre-Training Transformer to Enhance Multimodal Learning for Social Media Prediction. 11450-11456 - Mingsheng Tu, Tianjiao Wan, Qisheng Xu, Xinhao Jiang, Kele Xu, Cheng Yang:
Higher-Order Vision-Language Alignment for Social Media Prediction. 11457-11463 - Chih-Chung Hsu, Chia-Ming Lee, Yu-Fan Lin, Yi-Shiuan Chou, Chih-Yu Jian, Chi-Han Tsai:
Revisiting Vision-Language Features Adaptation and Inconsistency for Social Media Popularity Prediction. 11464-11469 - Shien Song, Jie Yang, Jin Chen, Han Qi, Yifei Xue, Yizhen Lao, Yi Yu:
ACM Multimedia 2024 Grand Challenge Report for Artificial Intelligence Generated Image Detection. 11470-11471 - Huihui Fu:
Optimizing AIGC Image Detection: Strategies in Data Augmentation and Model Architecture. 11472-11474 - ShiHang Li, Haishan Wu, Biao Wang:
A Solution to ACMMM 2024 on Artificial Intelligence Generated Image Detection. 11475-11477 - Jin Chen:
Optimizing the Baseline Approach for the 2024 ACM Multimedia Grand Challenge in Artificial Intelligence Generated Image Detection. 11478-11481 - John See, Jingting Li, Adrian K. Davison, Gen-Bing Liong, Moi Hoon Yap, Wen-Huang Cheng, Xiaobai Li, Xiaopeng Hong, Su-Jing Wang:
MEGC2024: ACM Multimedia 2024 Facial Micro-Expression Grand Challenge. 11482-11483 - Jun Yu, Gongpeng Zhao, Yaohui Zhang, Peng He, Zerui Zhang, Zhao Yang, Qingsong Liu, Jianqing Sun, Jiaen Liang:
Temporal-Informative Adapters in VideoMAE V2 and Multi-Scale Feature Fusion for Micro-Expression Spotting-then-Recognize. 11484-11489 - Jun Yu, Yaohui Zhang, Gongpeng Zhao, Peng He, Zerui Zhang, Zhongpeng Cai, Qingsong Liu, Jianqing Sun, Jiaen Liang:
Micro-Expression Spotting Based on Optical Flow Feature with Boundary Calibration. 11490-11496 - Zhengye Zhang, Sirui Zhao, Xinglong Mao, Shifeng Liu, Hao Wang, Tong Xu, Enhong Chen:
A Multi-scale Feature Learning Network with Optical Flow Correction for Micro- and Macro-expression Spotting. 11497-11502 - Yuhong He, Wenchao Liu, Guangyu Wang, Lin Ma, Haifeng Li:
Enhancing Micro-Expression Analysis Performance by Effectively Addressing Data Imbalance. 11503-11507