31st ACM Multimedia 2023: Ottawa, ON, Canada
- Abdulmotaleb El-Saddik, Tao Mei, Rita Cucchiara, Marco Bertini, Diana Patricia Tobon Vallejo, Pradeep K. Atrey, M. Shamim Hossain:
Proceedings of the 31st ACM International Conference on Multimedia, MM 2023, Ottawa, ON, Canada, 29 October - 3 November 2023. ACM 2023
Keynote Talks
- Chang Wen Chen:
Internet of Video Things: Technical Challenges and Emerging Applications. 1-2
- Alejandro Jaimes:
Multimodal AI & LLMs for Peacekeeping and Emergency Response. 3-4
- Ralf Steinmetz:
Transition and Adaptability: The Cornerstone of Resilience in Future Networked Multimedia Systems and Beyond. 5-6
Oral Session I: Understanding Multimedia Content -- Media Interpretation
- Hao Shen, Zhong-Qiu Zhao, Yulun Zhang, Zhao Zhang:
Mutual Information-driven Triple Interaction Network for Efficient Image Dehazing. 7-16
- Yang Jiao, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang:
Suspected Objects Matter: Rethinking Model's Prediction for One-stage Visual Grounding. 17-26
- Sophyani Banaamwini Yussif, Ning Xie, Yang Yang, Heng Tao Shen:
Self-Relational Graph Convolution Network for Skeleton-Based Action Recognition. 27-36
- Qian Ning, Fangfang Wu, Weisheng Dong, Xin Li, Guangming Shi:
Exploring Correlations in Degraded Spatial Identity Features for Blind Face Restoration. 37-45
- Chuhao Zhou, Jinxing Li, Huafeng Li, Guangming Lu, Yong Xu, Min Zhang:
Video-based Visible-Infrared Person Re-Identification via Style Disturbance Defense and Dual Interaction. 46-55
- Wenmiao Hu, Yichen Zhang, Yuxuan Liang, Xianjing Han, Yifang Yin, Hannes Kruppa, See-Kiong Ng, Roger Zimmermann:
PetalView: Fine-grained Location and Orientation Extraction of Street-view Images via Cross-view Local Search. 56-66
- Haorui Wang, Yibo Hu, Yangfu Zhu, Jinsheng Qi, Bin Wu:
Shifted GCN-GAT and Cumulative-Transformer based Social Relation Recognition for Long Videos. 67-76
- Jilong Wang, Saihui Hou, Yan Huang, Chunshui Cao, Xu Liu, Yongzhen Huang, Liang Wang:
Causal Intervention for Sparse-View Gait Recognition. 77-85
- Digbalay Bose, Rajat Hebbar, Tiantian Feng, Krishna Somandepalli, Anfeng Xu, Shrikanth Narayanan:
MM-AU: Towards Multimodal Understanding of Advertisement Videos. 86-95
- Huiwei Lin, Shanshan Feng, Baoquan Zhang, Hongliang Qiao, Xutao Li, Yunming Ye:
UER: A Heuristic Bias Addressing Approach for Online Continual Learning. 96-104
- Peng Wu, Xiankai Lu, Jianbing Shen, Yilong Yin:
Clip Fusion with Bi-level Optimization for Human Mesh Reconstruction from Monocular Videos. 105-115
- Jinkai Zheng, Xinchen Liu, Shuai Wang, Lihao Wang, Chenggang Yan, Wu Liu:
Parsing is All You Need for Accurate Gait Recognition in the Wild. 116-124
- Dingyi Zhang, Yingming Li, Zhongfei Zhang:
Multi-Scale Similarity Aggregation for Dynamic Metric Learning. 125-134
- Yue Feng, Zhengye Zhang, Rong Quan, Limin Wang, Jie Qin:
RefineTAD: Learning Proposal-free Refinement for Temporal Action Detection. 135-143
- Zhenguang Liu, Xinyang Yu, Ruili Wang, Shuai Ye, Zhe Ma, Jianfeng Dong, Sifeng He, Feng Qian, Xiaobo Zhang, Roger Zimmermann, Lei Yang:
Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization. 144-152
- Dongbao Yang, Yu Zhou, Xiaopeng Hong, Aoting Zhang, Xin Wei, Linchengxi Zeng, Zhi Qiao, Weiping Wang:
Pseudo Object Replay and Mining for Incremental Object Detection. 153-162
- Shiqin Wang, Xin Xu, Xianzheng Ma, Kui Jiang, Zheng Wang:
Informative Classes Matter: Towards Unsupervised Domain Adaptive Nighttime Semantic Segmentation. 163-172
- Ye Tian, Mengyu Yang, Lanshan Zhang, Zhizhen Zhang, Yang Liu, Xiaohui Xie, Xirong Que, Wendong Wang:
View while Moving: Efficient Video Recognition in Long-untrimmed Videos. 173-183
- Yimin Deng, Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao:
PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion. 184-192
- Gege Shi, Xueyang Fu, Chengzhi Cao, Zheng-Jun Zha:
Alleviating Spatial Misalignment and Motion Interference for UAV-based Video Recognition. 193-202
- Yang Liu, Zhaoyang Xia, Mengyang Zhao, Donglai Wei, Yuzheng Wang, Siao Liu, Bobo Ju, Gaoyun Fang, Jing Liu, Liang Song:
Learning Causality-inspired Representation Consistency for Video Anomaly Detection. 203-212
- Dongyue Guo, Yi Lin, Xuehang You, Zhongping Yang, Jizhe Zhou, Bo Yang, Jianwei Zhang, Han Shi, Shasha Hu, Zheng Zhang:
M2ATS: A Real-world Multimodal Air Traffic Situation Benchmark Dataset and Beyond. 213-221
- Jianghu Lu, Shikun Li, Kexin Bao, Pengju Wang, Zhenxing Qian, Shiming Ge:
Federated Learning with Label-Masking Distillation. 222-232
- Lingxiao Lu, Jiangtong Li, Junyan Cao, Li Niu, Liqing Zhang:
Painterly Image Harmonization using Diffusion Model. 233-241
- Xingran Xie, Ting Jin, Boxiang Yun, Qingli Li, Yan Wang:
Exploring Hyperspectral Histopathology Image Segmentation from a Deformable Perspective. 242-251
- Runhua Jiang, Yahong Han:
Uncertainty-Aware Variate Decomposition for Self-supervised Blind Image Deblurring. 252-260
Oral Session II: Understanding Multimedia Content -- Multimodal Fusion and Embedding
- Chao Sun, Min Chen, Jialiang Cheng, Han Liang, Chuanbo Zhu, Jincai Chen:
SCLAV: Supervised Cross-modal Contrastive Learning for Audio-Visual Coding. 261-270
- Feng Lin, Kaiqiang Fu, Hao Luo, Ziyue Zhan, Zhibo Wang, Zhenguang Liu, Lorenzo Cavallaro, Kui Ren:
Cross-Modal and Multi-Attribute Face Recognition: A Benchmark. 271-279
- Ye Wang, Junyang Chen, Mengzhu Wang, Hao Li, Wei Wang, Houcheng Su, Zhihui Lai, Wei Wang, Zhenghan Chen:
A Closer Look at Classifier in Adversarial Domain Generalization. 280-289
- Mengzhu Wang, Jianlong Yuan, Zhibin Wang:
Mixture-of-Experts Learner for Single Long-Tailed Domain Generalization. 290-299
- Chao Zhang, Jingwen Wei, Bo Wang, Zechao Li, Chunlin Chen, Huaxiong Li:
Robust Spectral Embedding Completion Based Incomplete Multi-view Clustering. 300-308
- Jinhui Pang, Zixuan Wang, Jiliang Tang, Mingyan Xiao, Nan Yin:
SA-GDA: Spectral Augmentation for Graph Domain Adaptation. 309-318
- Xihong Yang, Cheng Tan, Yue Liu, Ke Liang, Siwei Wang, Sihang Zhou, Jun Xia, Stan Z. Li, Xinwang Liu, En Zhu:
CONVERT: Contrastive Graph Clustering with Reliable Augmentation. 319-327
- Jintian Ji, Songhe Feng:
High-order Complementarity Induced Fast Multi-View Clustering with Enhanced Tensor Rank Minimization. 328-336
- Xihong Yang, Jiaqi Jin, Siwei Wang, Ke Liang, Yue Liu, Yi Wen, Suyuan Liu, Sihang Zhou, Xinwang Liu, En Zhu:
DealMVC: Dual Contrastive Calibration for Multi-view Clustering. 337-346
- Junming Hou, Qi Cao, Ran Ran, Che Liu, Junling Li, Liang-Jian Deng:
Bidomain Modeling Paradigm for Pansharpening. 347-357
- Yingying Wang, Yunlong Lin, Ge Meng, Zhenqi Fu, Yuhang Dong, Linyu Fan, Hedeng Yu, Xinghao Ding, Yue Huang:
Learning High-frequency Feature Enhancement and Alignment for Pan-sharpening. 358-367
- Xingfeng Li, Yinghui Sun, Quansen Sun, Jia Dai, Zhenwen Ren:
Distribution Consistency based Fast Anchor Imputation for Incomplete Multi-view Clustering. 368-376
- Yushen Wei, Yang Liu, Hong Yan, Guanbin Li, Liang Lin:
Visual Causal Scene Refinement for Video Question Answering. 377-386
- Hongye Liu, Xianhai Xie, Yang Gao, Zhou Yu:
Parameter-Efficient Transfer Learning for Audio-Visual-Language Tasks. 387-396
- Xi Chen, Yun Xiong, Siqi Wang, Haofen Wang, Tao Sheng, Yao Zhang, Yu Ye:
ReCo: A Dataset for Residential Community Layout Planning. 397-405
- Runmin Cong, Hongyu Liu, Chen Zhang, Wei Zhang, Feng Zheng, Ran Song, Sam Kwong:
Point-aware Interaction and CNN-induced Refinement Network for RGB-D Salient Object Detection. 406-416
- Jinrong Cui, Yuting Li, Yulu Fu, Jie Wen:
Multi-view Self-Expressive Subspace Clustering Network. 417-425
- Jian Huang, Yanli Ji, Yang Yang, Heng Tao Shen:
Cross-modality Representation Interactive Learning for Multimodal Sentiment Analysis. 426-434
- Yixuan Ma, Xiaolin Zhang, Peng Zhang, Kun Zhan:
Entropy Neural Estimation for Graph Contrastive Learning. 435-443
- Liguo Zhang, Zilin Tian, Yunfei Long, Sizhao Li, Guisheng Yin:
Cross-modal and Cross-medium Adversarial Attack for Audio. 444-453
- Liang Peng, Xin Wang, Xiaofeng Zhu:
Unsupervised Multiplex Graph learning with Complementary and Consistent Information. 454-462
- Yixuan Wu, Jintai Chen, Jiahuan Yan, Yiheng Zhu, Danny Z. Chen, Jian Wu:
GCL: Gradient-Guided Contrastive Learning for Medical Image Segmentation with Multi-Perspective Meta Labels. 463-471
- Zhiying Jiang, Zengxi Zhang, Jinyuan Liu, Xin Fan, Risheng Liu:
Multi-Spectral Image Stitching via Spatial Graph Reasoning. 472-480
- Jiaming Zhuo, Can Cui, Kun Fu, Bingxin Niu, Dongxiao He, Yuanfang Guo, Zhen Wang, Chuan Wang, Xiaochun Cao, Liang Yang:
Propagation is All You Need: A New Framework for Representation Learning and Classifier Training on Graphs. 481-489
- Yao Wu, Mingwei Xing, Yachao Zhang, Yuan Xie, Jianping Fan, Zhongchao Shi, Yanyun Qu:
Cross-modal Unsupervised Domain Adaptation for 3D Semantic Segmentation via Bidirectional Fusion-then-Distillation. 490-498
Oral Session III: Understanding Multimedia Content -- Vision and Language
- Yinjie Zhao, Lichen Zhao, Qian Yu, Lu Sheng, Jing Zhang, Dong Xu:
Distortion-aware Transformer in 360° Salient Object Detection. 499-508
- Zixiao Wang, Hongtao Xie, Yuxin Wang, Jianjun Xu, Boqiang Zhang, Yongdong Zhang:
Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition. 509-518
- Bo Zou, Chao Yang, Chengbin Quan, Youjian Zhao:
SpaceCLIP: A Vision-Language Pretraining Framework With Spatial Reconstruction On Text. 519-528
- Xu Huang, Jin Liu, Zhizhong Zhang, Yuan Xie:
Improving Cross-Modal Recipe Retrieval with Component-Aware Prompted CLIP Embedding. 529-537
- Shuhan Kong, Liang Li, Beichen Zhang, Wenyu Wang, Bin Jiang, Chenggang Yan, Changhao Xu:
Dynamic Contrastive Learning with Pseudo-samples Intervention for Weakly Supervised Joint Video MR and HD. 538-546
- Zheng Yuan, Qiao Jin, Chuanqi Tan, Zhengyun Zhao, Hongyi Yuan, Fei Huang, Songfang Huang:
RAMM: Retrieval-augmented Biomedical Visual Question Answering with Multi-modal Pre-training. 547-556
- Xiao Wang, Yaoyu Li, Tian Gan, Zheng Zhang, Jingjing Lv, Liqiang Nie:
RTQ: Rethinking Video-language Understanding Based on Image-text Model. 557-566
- Shanshan Zhong, Zhongzhan Huang, Wushao Wen, Jinghui Qin, Liang Lin:
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models. 567-578
- Xin Dong, Rui Wang, Siyuan Liang, Aishan Liu, Lihua Jing:
Face Encryption via Frequency-Restricted Identity-Agnostic Attacks. 579-588
- Peipei Song, Dan Guo, Xun Yang, Shengeng Tang, Erkun Yang, Meng Wang:
Emotion-Prior Awareness Network for Emotional Video Captioning. 589-600
- Dong Liu, Qirong Mao, Lijian Gao, Qinghua Ren, Zhenghan Chen, Ming Dong:
TE-KWS: Text-Informed Speech Enhancement for Noise-Robust Keyword Spotting. 601-610
- Jiancheng Pan, Qing Ma, Cong Bai:
A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval. 611-620
- Nirmalendu Prakash, Han Wang, Nguyen-Khoi Hoang, Ming Shan Hee, Roy Ka-Wei Lee:
PromptMTopic: Unsupervised Multimodal Topic Modeling of Memes using Large Language Models. 621-631
- Yue Lv, Jinxi Xiang, Jun Zhang, Wenming Yang, Xiao Han, Wei Yang:
Dynamic Low-Rank Instance Adaptation for Universal Neural Image Compression. 632-642
- Leigang Qu, Shengqiong Wu, Hao Fei, Liqiang Nie, Tat-Seng Chua:
LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation. 643-654
- Yue Zhang, Suchen Wang, Shichao Kan, Zhenyu Weng, Yigang Cen, Yap-Peng Tan:
POAR: Towards Open Vocabulary Pedestrian Attribute Recognition. 655-665
- Shengshan Hu, Wei Liu, Minghui Li, Yechao Zhang, Xiaogeng Liu, Xianlong Wang, Leo Yu Zhang, Junhui Hou:
PointCRT: Detecting Backdoor in 3D Point Cloud via Corruption Robustness. 666-675
- Rui Qin, Ming Sun, Fangyuan Zhang, Xing Wen, Bin Wang:
Blind Image Super-resolution with Rich Texture-Aware Codebook. 676-687
- Zizhang Wu, Zhuozheng Li, Zhi-Gang Fan, Yunzhe Wu, Jian Pu, Xianzhi Li:
V2Depth: Monocular Depth Estimation via Feature-Level Virtual-View Simulation and Refinement. 688-697
- Kai Chen, Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang:
GCMA: Generative Cross-Modal Transferable Adversarial Attacks from Images to Videos. 698-708
- Lianyu Hu, Liqing Gao, Zekang Liu, Chi-Man Pun, Wei Feng:
AdaBrowse: Adaptive Video Browser for Efficient Continuous Sign Language Recognition. 709-718
- Lingfeng Li, Gangming Zhao, Yizhou Yu, Jinpeng Li:
Dynamic Triple Reweighting Network for Automatic Femoral Head Necrosis Diagnosis from Computed Tomography. 719-727
- Liu Liu, Jianming Du, Hao Wu, Xun Yang, Zhenguang Liu, Richang Hong, Meng Wang:
Category-Level Articulated Object 9D Pose Estimation via Reinforcement Learning. 728-736
- Qichao Ying, Jiaxin Liu, Sheng Li, Haisheng Xu, Zhenxing Qian, Xinpeng Zhang:
RetouchingFFHQ: A Large-scale Dataset for Fine-grained Face Retouching Detection. 737-746
- Xueyi Zhang, Chengwei Zhang, Tao Wang, Jun Tang, Songyang Lao, Haizhou Li:
Slow-Fast Time Parameter Aggregation Network for Class-Incremental Lip Reading. 747-756
- Yang Bai, Jingyao Wang, Min Cao, Chen Chen, Ziqiang Cao, Liqiang Nie, Min Zhang:
Text-based Person Search without Parallel Image-Text Data. 757-767
- Jiawei Liang, Siyuan Liang, Aishan Liu, Ke Ma, Jingzhi Li, Xiaochun Cao:
Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation. 768-778
- Sun'ao Liu, Yiheng Zhang, Zhaofan Qiu, Hongtao Xie, Yongdong Zhang, Ting Yao:
CARIS: Context-Aware Referring Image Segmentation. 779-788
- Shizhou Zhang, Qingchun Yang, De Cheng, Yinghui Xing, Guoqiang Liang, Peng Wang, Yanning Zhang:
Ground-to-Aerial Person Search: Benchmark Dataset and Approach. 789-799
- Fan Jiang, Zilei Wang:
Sparse Sharing Relation Network for Panoptic Driving Perception. 800-808
Oral Session IV: Engaging Users with Multimedia -- Emotional and Social Signals
- Daoming Zong, Chaoyue Ding, Baoxiang Li, Jiakui Li, Ken Zheng, Qunyan Zhou:
AcFormer: An Aligned and Compact Transformer for Multimodal Sentiment Analysis. 833-842
- Zeng Tao, Yan Wang, Zhaoyu Chen, Boyang Wang, Shaoqi Yan, Kaixun Jiang, Shuyong Gao, Wenqiang Zhang:
Freq-HD: An Interpretable Frequency-based High-Dynamics Affective Clip Selection Method for in-the-Wild Facial Expression Recognition in Videos. 843-852
- Peiguang Jing, Xianyi Liu, Ji Wang, Yinwei Wei, Liqiang Nie, Yuting Su:
StyleEDL: Style-Guided High-order Attention Network for Image Emotion Distribution Learning. 853-861
- Junjie Zhu, Bingjun Luo, Ao Sun, Jinghang Tan, Xibin Zhao, Yue Gao:
Variance-Aware Bi-Attention Expression Transformer for Open-Set Facial Expression Recognition in the Wild. 862-870
- Zixin Zhang, Fan Qi, Shuai Li, Changsheng Xu:
AffectFAL: Federated Active Affective Computing with Non-IID Data. 871-882
- Peiliang Gong, Ziyu Jia, Pengpai Wang, Yueying Zhou, Daoqiang Zhang:
ASTDF-Net: Attention-Based Spatial-Temporal Dual-Stream Fusion Network for EEG-Based Emotion Recognition. 883-892
Oral Session V: Engaging Users with Multimedia -- Multimedia Search and Recommendation
- Yishu Liu, Qingpeng Wu, Zheng Zhang, Jingyi Zhang, Guangming Lu:
Multi-Granularity Interactive Transformer Hashing for Cross-modal Retrieval. 893-902
- Wenjie Wang, Xinyu Lin, Liuhui Wang, Fuli Feng, Yinwei Wei, Tat-Seng Chua:
Equivariant Learning for Out-of-Distribution Cold-start Recommendation. 903-914
- Haokun Wen, Xian Zhang, Xuemeng Song, Yinwei Wei, Liqiang Nie:
Target-Guided Composed Image Retrieval. 915-923
- Haoxuan Li, Yi Bin, Junrong Liao, Yang Yang, Heng Tao Shen:
Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination. 924-934
- Xin Zhou, Zhiqi Shen:
A Tale of Two Graphs: Freezing and Denoising Graph Structures for Multimodal Recommendation. 935-943
- Guiwei Zhang, Yongfei Zhang, Zichang Tan:
ProtoHPE: Prototype-guided High-frequency Patch Enhancement for Visible-Infrared Person Re-identification. 944-954
- Wei Ji, Xiangyan Liu, An Zhang, Yinwei Wei, Yongxin Ni, Xiang Wang:
Online Distillation-enhanced Multi-modal Transformer for Sequential Recommendation. 955-965
- Junyang Chen, Jialong Wang, Zhijiang Dai, Huisi Wu, Mengzhu Wang, Qin Zhang, Huan Wang:
Zero-shot Micro-video Classification with Neural Variational Inference in Graph Prototype Network. 966-974
- Zhiguo Chen, Xun Jiang, Xing Xu, Zuo Cao, Yijun Mo, Heng Tao Shen:
Joint Searching and Grounding: Multi-Granularity Video Content Retrieval. 975-983
- Yuyuan Li, Chaochao Chen, Xiaolin Zheng, Yizhao Zhang, Zhongxuan Han, Dan Meng, Jun Wang:
Making Users Indistinguishable: Attribute-wise Unlearning in Recommender Systems. 984-994
- Dugang Liu, Yang Qiao, Xing Tang, Liang Chen, Xiuqiang He, Zhong Ming:
Prior-Guided Accuracy-Bias Tradeoff Learning for CTR Prediction in Multimedia Recommendation. 995-1003
- Haoyue Bai, Min Hou, Le Wu, Yonghui Yang, Kun Zhang, Richang Hong, Meng Wang:
GoRec: A Generative Cold-start Recommendation Framework. 1004-1012
- Jingzhi Li, Fengling Li, Lei Zhu, Hui Cui, Jingjing Li:
Prototype-guided Knowledge Transfer for Federated Unsupervised Cross-modal Hashing. 1013-1022
Oral Session VI: Engaging Users with Multimedia -- Interactions and Quality of Experience
- Shuai He, Anlong Ming, Shuntian Zheng, Haobin Zhong, Huadong Ma:
EAT: An Enhancer for Aesthetics-Oriented Transformers. 1023-1032
- Sicheng Yang, Zilin Wang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Qiaochu Huang, Lei Hao, Songcen Xu, Xiaofei Wu, Changpeng Yang, Zonghong Dai:
UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons. 1033-1044
- Haoning Wu, Erli Zhang, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin:
Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach. 1045-1054
- Guangming Zhu, Siyuan Wang, Qing Cheng, Kelong Wu, Hao Li, Liang Zhang:
Sketch Input Method Editor: A Comprehensive Dataset and Methodology for Systematic Input Recognition. 1055-1065
- Tengchuan Kou, Xiaohong Liu, Wei Sun, Jun Jia, Xiongkuo Min, Guangtao Zhai, Ning Liu:
StableVQA: A Deep No-Reference Quality Assessment Model for Video Stability. 1066-1076
- Jianjun Xiang, Yuanjie Dang, Peng Chen, Ronghua Liang, Ruohong Huan, Zhengyu Zhang:
Spatial-angular Quality-aware Representation Learning for Blind Light Field Image Quality Assessment. 1077-1087
- Yunlong Dong, Xiaohong Liu, Yixuan Gao, Xunchu Zhou, Tao Tan, Guangtao Zhai:
Light-VQA: A Multi-Dimensional Quality Assessment Model for Low-Light Video Enhancement. 1088-1097
- Kun Yuan, Zishang Kong, Chuanchuan Zheng, Ming Sun, Xing Wen:
Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment. 1098-1107
- Kaiyuan Hu, Haowen Yang, Yili Jin, Junhua Liu, Yongting Chen, Miao Zhang, Fangxin Wang:
Understanding User Behavior in Volumetric Video Watching: Dataset, Analysis and Prediction. 1108-1116
- Xiangfei Sheng, Leida Li, Pengfei Chen, Jinjian Wu, Weisheng Dong, Yuzhe Yang, Liwu Xu, Yaqian Li, Guangming Shi:
AesCLIP: Multi-Attribute Contrastive Learning for Image Aesthetics Assessment. 1117-1126
Oral Session VII: Engaging Users with Multimedia -- Metaverse, Art and Culture
- Zheng Wei, Xian Xu, Lik-Hang Lee, Wai Tong, Huamin Qu, Pan Hui:
Feeling Present! From Physical to Virtual Cinematography Lighting Education with Metashadow. 1127-1136
- Shao-Kui Zhang, Jia-Hong Liu, Yike Li, Tianyi Xiong, Ke-Xin Ren, Hongbo Fu, Song-Hai Zhang:
Automatic Generation of Commercial Scenes. 1137-1147
- Yang Chen, Yingwei Pan, Yehao Li, Ting Yao, Tao Mei:
Control3D: Towards Controllable Text-to-3D Generation. 1148-1156
- Yuqing Zhang, Zhou Fang, Xinyu Yang, Shengyu Zhang, Baoyi He, Huaiyong Dou, Junchi Yan, Yongquan Zhang, Fei Wu:
Reconnecting the Broken Civilization: Patchwork Integration of Fragments from Ancient Manuscripts. 1157-1166
Oral Session VIII: Engaging Users with Multimedia -- Multimedia Applications
- Zixin Wang, Yadan Luo, Zhi Chen, Sen Wang, Zi Huang:
Cal-SFDA: Source-Free Domain-adaptive Semantic Segmentation with Differentiable Expected Calibration Error. 1167-1178
- Runmin Cong, Mengyao Sun, Sanyi Zhang, Xiaofei Zhou, Wei Zhang, Yao Zhao:
Frequency Perception Network for Camouflaged Object Detection. 1179-1189
- Xiaoshuai Wu, Xin Liao, Bo Ou:
SepMark: Deep Separable Watermarking for Unified Source Tracing and Deepfake Detection. 1190-1201
- Runmin Cong, Yuchen Guan, Jinpeng Chen, Wei Zhang, Yao Zhao, Sam Kwong:
SDDNet: Style-guided Dual-layer Disentanglement Network for Shadow Detection. 1202-1211
- Hao Tan, Weichao Kong, Feng Zhang, Wenjin Qin, Jianjun Wang:
High-Order Tensor Recovery Coupling Multilayer Subspace Priori with Application in Video Restoration. 1212-1220
- Chen Wang, Jiadai Sun, Lina Liu, Chenming Wu, Zhelun Shen, Dayan Wu, Yuchao Dai, Liangjun Zhang:
Digging into Depth Priors for Outdoor Neural Radiance Fields. 1221-1230
- Fanrui Zhang, Jiawei Liu, Qiang Zhang, Esther Sun, Jingyi Xie, Zheng-Jun Zha:
ECENet: Explainable and Context-Enhanced Network for Muti-modal Fact verification. 1231-1240
- Baochen Xiong, Xiaoshan Yang, Yaguang Song, Yaowei Wang, Changsheng Xu:
Client-Adaptive Cross-Model Reconstruction Network for Modality-Incomplete Multimodal Federated Learning. 1241-1249
- Jinpeng Lin, Min Zhou, Ye Ma, Yifan Gao, Chenxi Fei, Yangjian Chen, Zhang Yu, Tiezheng Ge:
AutoPoster: A Highly Automatic and Content-aware Design System for Advertising Poster Generation. 1250-1260
- Gangyan Zeng, Yuan Zhang, Yu Zhou, Bo Fang, Guoqing Zhao, Xin Wei, Weiping Wang:
Filling in the Blank: Rationale-Augmented Prompt Tuning for TextVQA. 1261-1272
- Liuhan Chen, Yirou Wang, Yongyong Chen:
End-to-end XY Separation for Single Image Blind Deblurring. 1273-1282
- Junxian Chen, Ying Liu, Yiqi Liang, Dandan Long, Xiaolin He, Ruihui Li:
SD-Net: Spatially-Disentangled Point Cloud Completion Network. 1283-1293
- Jiawei Jiang, Yuchao Feng, Jiacheng Chen, Dongyan Guo, Jianwei Zheng:
Latent-space Unfolding for MRI Reconstruction. 1294-1302
- Hongpeng Lin, Ludan Ruan, Wenke Xia, Peiyu Liu, Jingyuan Wen, Yixin Xu, Di Hu, Ruihua Song, Wayne Xin Zhao, Qin Jin, Zhiwu Lu:
TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World. 1303-1313
- Pengteng Li, Ying He, F. Richard Yu, Pinhao Song, Dongfu Yin, Guang Zhou:
IGG: Improved Graph Generation for Domain Adaptive Object Detection. 1314-1324
- De Cheng, Lingfeng He, Nannan Wang, Shizhou Zhang, Zhen Wang, Xinbo Gao:
Efficient Bilateral Cross-Modality Cluster Matching for Unsupervised Visible-Infrared Person ReID. 1325-1333
- Xun Jiang, Zailei Zhou, Xing Xu, Yang Yang, Guoqing Wang, Heng Tao Shen:
Faster Video Moment Retrieval with Point-Level Supervision. 1334-1342
- Xianliang Huang, Jiajie Gou, Shuhang Chen, Zhizhou Zhong, Jihong Guan, Shuigeng Zhou:
IDDR-NGP: Incorporating Detectors for Distractors Removal with Instant Neural Radiance Field. 1343-1351
- Junzhe Zhang, Tong Chen, Dandan Ding, Zhan Ma:
G-PCC++: Enhanced Geometry-based Point Cloud Compression. 1352-1363
- Zhengcong Fei, Mingyuan Fan, Junshi Huang:
Gradient-Free Textual Inversion. 1364-1373
- Qiaosong Qi, Le Zhuo, Aixi Zhang, Yue Liao, Fei Fang, Si Liu, Shuicheng Yan:
DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation. 1374-1382
- Peihuan Huang, Gaofeng Cao, Fei Zhou, Guoping Qiu:
Video Inverse Tone Mapping Network with Luma and Chroma Mapping. 1383-1391
- Qi Jia, Xiaomei Feng, Yu Liu, Xin Fan, Longin Jan Latecki:
Learning Pixel-wise Alignment for Unsupervised Image Stitching. 1392-1400
- Han Yan, Haijun Zhang, Xiangyu Mu, Jicong Fan, Zhao Zhang:
FashionDiff: A Controllable Diffusion Model Using Pairwise Fashion Elements for Intelligent Design. 1401-1411
- Wei Yu, Qi Zhu, Naishan Zheng, Jie Huang, Man Zhou, Feng Zhao:
Learning Non-Uniform-Sampling for Ultra-High-Definition Image Enhancement. 1412-1421
- Haoxing Chen, Zhangxuan Gu, Yaohui Li, Jun Lan, Changhua Meng, Weiqiang Wang, Huaxiong Li:
Hierarchical Dynamic Image Harmonization. 1422-1430
- Sha Guo, Zhuo Chen, Yang Zhao, Ning Zhang, Xiaotong Li, Lingyu Duan:
Toward Scalable Image Feature Compression: A Content-Adaptive and Diffusion-Based Approach. 1431-1442
- Kaixun Jiang, Zhaoyu Chen, Xinyu Zhou, Jingyu Zhang, Lingyi Hong, Jiafeng Wang, Bo Li, Yan Wang, Wenqiang Zhang:
Towards Decision-based Sparse Attacks on Video Recognition. 1443-1454
- Mingqi Fang, Lingyun Yu, Hongtao Xie, Junqiang Wu, Zezheng Wang, Jiahong Li, Yongdong Zhang:
RAIRNet: Region-Aware Identity Rectification for Face Forgery Detection. 1455-1464
- Xiao He, Chang Tang, Xin Zou, Wei Zhang:
Multispectral Object Detection via Cross-Modal Conflict-Aware Learning. 1465-1474
- Huan Zheng, Zhao Zhang, Jicong Fan, Richang Hong, Yi Yang, Shuicheng Yan:
Decoupled Cross-Scale Cross-View Interaction for Stereo Image Enhancement in the Dark. 1475-1484
- Kexin Li, Zongxin Yang, Lei Chen, Yi Yang, Jun Xiao:
CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation. 1485-1494
- Zisong Chen, Chunyu Lin, Lang Nie, Zhijie Shen, Kang Liao, Yuanzhouhan Cao, Yao Zhao:
S-OmniMVS: Incorporating Sphere Geometry into Omnidirectional Stereo Matching. 1495-1503
- Yichen Zhang, Yifang Yin, Ying Zhang, Zhenguang Liu, Zheng Wang, Roger Zimmermann:
Prototypical Cross-domain Knowledge Transfer for Cervical Dysplasia Visual Inspection. 1504-1514
- Yuchen Sun, Qianqian Xu, Zitai Wang, Qingming Huang:
When Measures are Unreliable: Imperceptible Adversarial Perturbations toward Top-k Multi-Label Learning. 1515-1526
- Bowei Xu, Hao Chen, Zhan Ma:
Karma: Adaptive Video Streaming via Causal Sequence Modeling. 1527-1535
- Xinting Liao, Chaochao Chen, Weiming Liu, Pengyang Zhou, Huabin Zhu, Shuheng Shen, Weiqiang Wang, Mengling Hu, Yanchao Tan, Xiaolin Zheng:
Joint Local Relational Augmentation and Global Nash Equilibrium for Federated Learning with Non-IID Data. 1536-1545
- Jin Wang, Jiade Chen, Yunhui Shi, Nam Ling, Baocai Yin:
SSPU-Net: A Structure Sensitive Point Cloud Upsampling Network with Multi-Scale Spatial Refinement. 1546-1555
- Haoyue Wang, Sheng Li, Silu Cao, Rui Yang, Jishen Zeng, Zhenxing Qian, Xinpeng Zhang:
On Physically Occluded Fake Identity Document Detection. 1556-1564
- Deqi Li, Shi-Sheng Huang, Tianyu Shen, Hua Huang:
Dynamic View Synthesis with Spatio-Temporal Feature Warping from Sparse Views. 1565-1576
Oral Session IX: Engaging Users with Multimedia -- Social-good, Fairness and Transparency
- Shengfang Zhai, Yinpeng Dong, Qingni Shen, Shi Pu, Yuejian Fang, Hang Su:
Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning. 1577-1587
- Jingxuan Tan, Nan Zhong, Zhenxing Qian, Xinpeng Zhang, Sheng Li:
Deep Neural Network Watermarking against Model Extraction Attack. 1588-1597
- Yu Bai, Bo Zhang, Zheng Zhang, Wu Liu, Jinwen Li, Xiangyang Gong, Wendong Wang:
CoCa: A Connectivity-Aware Cascade Framework for Histology Gland Segmentation. 1598-1606
- Bo Zhang, Yunpeng Tan, Zheng Zhang, Wu Liu, Hui Gao, Zhijun Xi, Wendong Wang:
Factorized Omnidirectional Representation based Vision GNN for Anisotropic 3D Multimodal MR Image Segmentation. 1607-1615
- Rui Hu, Yahan Tu, Jitao Sang:
Echoes: Unsupervised Debiasing via Pseudo-bias Labeling in an Echo Chamber. 1616-1624
- Luxin Cai, Naiyue Chen, Yuanzhouhan Cao, Jiahuan He, Yidong Li:
FedCE: Personalized Federated Learning Method based on Clustering Ensembles. 1625-1633
Oral Session X: Multimedia systems -- Data Systems Management and Indexing
- Naoki Ono, Yusuke Matsui:
Relative NN-Descent: A Fast Index Construction for Graph-Based Approximate Nearest Neighbor Search. 1659-1667
- Cheng Xiong, Chuan Qin, Guorui Feng, Xinpeng Zhang:
Flexible and Secure Watermarking for Latent Diffusion Model. 1668-1676
- Rukai Wei, Yu Liu, Jingkuan Song, Heng Cui, Yanzhao Xie, Ke Zhou:
CHAIN: Exploring Global-Local Spatio-Temporal Information for Improved Self-Supervised Video Hashing. 1677-1688
Oral Session XI: Multimedia systems -- Systems and Middleware, Transport and Delivery
- Rui Lu, Lai Wei, Shuntao Zhu, Chuang Hu, Dan Wang:
Pagoda: Privacy Protection for Volumetric Video Streaming through Poisson Diffusion Model. 1689-1697
- Yuyang Leng, Renyuan Liu, Hongpeng Guo, Songqing Chen, Shuochao Yao:
ScaleFlow: Efficient Deep Vision Pipeline with Closed-Loop Scale-Adaptive Inference. 1698-1706
- Tianchi Huang, Rui-Xiao Zhang, Chenglei Wu, Lifeng Sun:
Optimizing Adaptive Video Streaming with Human Feedback. 1707-1718
Poster Session I: Understanding Multimedia Content -- Media Interpretation
- Hao Tang, Jun Liu, Shuanglin Yan, Rui Yan, Zechao Li, Jinhui Tang:
M3Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition. 1719-1728
- Chen Cheng, Jingkuan Song, Xiaosu Zhu, Junchen Zhu, Lianli Gao, Hengtao Shen:
CUCL: Codebook for Unsupervised Continual Learning. 1729-1737
- Yang Liu, Chen Chen, Can Wang, Xulin King, Mengyuan Liu:
Regress Before Construct: Regress Autoencoder for Point Cloud Self-supervised Learning. 1738-1749
- Bo Wang, Zhao Zhang, Suiyi Zhao, Haijun Zhang, Richang Hong, Meng Wang:
CropCap: Embedding Visual Cross-Partition Dependency for Image Captioning. 1750-1758
- Yanqi Wu, Xue Song, Jingjing Chen, Yu-Gang Jiang:
Generalizing Face Forgery Detection via Uncertainty Learning. 1759-1767
- Bingqing Zhang, Sen Wang, Yifan Liu, Brano Kusy, Xue Li, Jiajun Liu:
Object Detection Difficulty: Suppressing Over-aggregation for Faster and Better Video Object Detection. 1768-1778
- Yuanshen Guan, Ruikang Xu, Mingde Yao, Lizhi Wang, Zhiwei Xiong:
Mutual-Guided Dynamic Network for Image Fusion. 1779-1788
- Chenxi Xie, Changqun Xia, Tianshu Yu, Jia Li:
Frequency Representation Integration for Camouflaged Object Detection. 1789-1797
- Tao Wang, Lei Jin, Zhang Wang, Xiaojin Fan, Yu Cheng, Yinglei Teng, Junliang Xing, Jian Zhao:
DecenterNet: Bottom-Up Human Pose Estimation Via Decentralized Pose Representation. 1798-1808
- Jingyi Wang, Can Zhang, Jinfa Huang, Botao Ren, Zhidong Deng:
Improving Scene Graph Generation with Superpixel-Based Interaction Learning. 1809-1820
- Shifeng Xia, Lin Geng, Ningzhong Liu, Han Sun, Jie Qin:
Lifelong Scene Text Recognizer via Expert Modules. 1821-1830
- Zhen Ye, Wei Xue, Xu Tan, Jie Chen, Qifeng Liu, Yike Guo:
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model. 1831-1839
- Runhao Zeng, Qi Deng, Huixuan Xu, Shuaicheng Niu, Jian Chen:
Exploring Motion Cues for Video Test-Time Adaptation. 1840-1850
- Yan Shu, Wei Wang, Yu Zhou, Shaohui Liu, Aoting Zhang, Dongbao Yang, Weiping Wang:
Perceiving Ambiguity and Semantics without Recognition: An Efficient and Effective Ambiguous Scene Text Detector. 1851-1862
- Jiaming Chu, Lei Jin, Xiaojin Fan, Yinglei Teng, Yunchao Wei, Yuqiang Fang, Junliang Xing, Jian Zhao:
Single-Stage Multi-human Parsing via Point Sets and Center-based Offsets. 1863-1873
- Chengxiao Sun, Yan Xu, Jialun Pei, Haopeng Fang, He Tang:
Partitioned Saliency Ranking with Dense Pyramid Transformers. 1874-1883
- Jianbiao Mei, Yu Yang, Mengmeng Wang, Zizhang Li, Xiaojun Hou, Jongwon Ra, Laijian Li, Yong Liu:
CenterLPS: Segment Instances by Centers for LiDAR Panoptic Segmentation. 1884-1894
- Zhenhua Ning, Zhuotao Tian, Guangming Lu, Wenjie Pei:
Boosting Few-shot 3D Point Cloud Segmentation via Query-Guided Enhancement. 1895-1904
- Mu Chen, Zhedong Zheng, Yi Yang, Tat-Seng Chua:
PiPa: Pixel- and Patch-wise Self-supervised Learning for Domain Adaptative Semantic Segmentation. 1905-1914
- Xinyan Zu, Haiyang Yu, Bin Li, Xiangyang Xue:
Weakly-Supervised Text Instance Segmentation. 1915-1923
- Wenjie Xuan, Shanshan Zhao, Yu Yao, Juhua Liu, Tongliang Liu, Yixin Chen, Bo Du, Dacheng Tao:
PNT-Edge: Towards Robust Edge Detection with Noisy Labels by Learning Pixel-level Noise Transitions. 1924-1932
- Pan Gao, Haoyue Tian, Jie Qin:
Video Frame Interpolation with Flow Transformer. 1933-1942
- Xianghao Kong, Wentao Jiang, Jinrang Jia, Yifeng Shi, Runsheng Xu, Si Liu:
DUSA: Decoupled Unsupervised Sim2Real Adaptation for Vehicle-to-Everything Collaborative Perception. 1943-1954
- Ruiqi Zhang, Jie Chen, Qiang Wang:
Explicifying Neural Implicit Fields for Efficient Dynamic Human Avatar Modeling via a Neural Explicit Surface. 1955-1963
- Shili Zhou, Xuhao Jiang, Weimin Tan, Ruian He, Bo Yan:
MVFlow: Deep Optical Flow Estimation of Compressed Videos with Motion Vector Prior. 1964-1974
- Ri Cheng, Xuhao Jiang, Ruian He, Shili Zhou, Weimin Tan, Bo Yan:
Uncertainty-Guided Spatial Pruning Architecture for Efficient Frame Interpolation. 1975-1986
- Junshan Hu, Liansheng Zhuang, Weisong Dong, Shiming Ge, Shafei Wang:
Learning Generalized Representations for Open-Set Temporal Action Localization. 1987-1996
- Jie Gao, Bineng Zhong, Yan Chen:
Unambiguous Object Tracking by Exploiting Target Cues. 1997-2005
- Keran Wang, Hongtao Xie, Yuxin Wang, Dongming Zhang, Yadong Qu, Zuan Gao, Yongdong Zhang:
Masked Text Modeling: A Self-Supervised Pre-training Method for Scene Text Detection. 2006-2015
- Jiamin Chen, Jianlou Si, Naihao Liu, Yao Wu, Li Niu, Chen Qian:
Object Part Parsing with Hierarchical Dual Transformer. 2016-2024
- Xugong Qin, Pengyuan Lyu, Chengquan Zhang, Yu Zhou, Kun Yao, Peng Zhang, Hailun Lin, Weiping Wang:
Towards Robust Real-Time Scene Text Detection: From Semantic to Instance Representation Learning. 2025-2034
- Xiyao Ma, Shiqi Liu, Xiaoliang Xie, Xiao-Hu Zhou, Zengguang Hou, Xinkai Qu, Wenzheng Han, Ming Wang, Meng Song, Lin-Sen Zhang:
Towards Flexible and Universal: A Novel Endpoint-based Framework for Vessel Structural Information Extraction. 2035-2044
- Sejin Park, Taehyung Lee, Yeejin Lee, Byeongkeun Kang:
FDCNet: Feature Drift Compensation Network for Class-Incremental Weakly Supervised Object Localization. 2045-2053
- Meng Shen, Yanzuo Lu, Yanxu Hu, Andy J. Ma:
Collaborative Learning of Diverse Experts for Source-free Universal Domain Adaptation. 2054-2065
- Wentao Yang, Zhe Li, Dezhi Peng, Lianwen Jin, Mengchao He, Cong Yao:
Read Ten Lines at One Glance: Line-Aware Semi-Autoregressive Transformer for Multi-Line Handwritten Mathematical Expression Recognition. 2066-2077
- Kejun Lin, Zhixiang Wang, Zheng Wang, Yinqiang Zheng, Shin'ichi Satoh:
Beyond Domain Gap: Exploiting Subjectivity in Sketch-Based Person Retrieval. 2078-2089 - Ben Sha, Baopu Li, Tao Chen, Jiayuan Fan, Tao Sheng:
Rethinking Pseudo-Label-Based Unsupervised Person Re-ID with Hierarchical Prototype-based Graph. 2090-2100 - Kehua Guo, Rui Ding, Tian Qiu, Xiangyuan Zhu, Zheng Wu, Liwei Wang, Hui Fang:
Single Domain Generalization via Unsupervised Diversity Probe. 2101-2111 - Ruijin Liu, Ning Lu, Dapeng Chen, Cheng Li, Zejian Yuan, Wei Peng:
PBFormer: Capturing Complex Scene Text Shape with Polynomial Band Transformer. 2112-2120 - Houzhang Fang, Zikai Liao, Lu Wang, Qingshan Li, Yi Chang, Luxin Yan, Xuhua Wang:
DANet: Multi-scale UAV Target Detection with Dynamic Feature Perception and Scale-aware Knowledge Distillation. 2121-2130 - Bo Dong, Jialun Pei, Rongrong Gao, Tian-Zhu Xiang, Shuo Wang, Huan Xiong:
A Unified Query-based Paradigm for Camouflaged Instance Segmentation. 2131-2138 - Jialun Pei, Zhangjun Zhou, Yueming Jin, He Tang, Pheng-Ann Heng:
Unite-Divide-Unite: Joint Boosting Trunk and Structure for High-accuracy Dichotomous Image Segmentation. 2139-2147 - Yuxiang Cai, Meng Xi, Yongheng Shang, Jianwei Yin:
Exploring High-Correlation Source Domain Information for Multi-Source Domain Adaptation in Semantic Segmentation. 2148-2158 - Linfeng Tan, Jiangtong Li, Li Niu, Liqing Zhang:
Deep Image Harmonization in Dual Color Spaces. 2159-2167 - Wenyu Zhang, Xin Deng, Baojun Jia, Xingtong Yu, Yifan Chen, Jin Ma, Qing Ding, Xinming Zhang:
Pixel Adapter: A Graph-Based Post-Processing Approach for Scene Text Image Super-Resolution. 2168-2179 - Yanqi Bao, Yuxin Li, Jing Huo, Tianyu Ding, Xinyue Liang, Wenbin Li, Yang Gao:
Where and How: Mitigating Confusion in Neural Radiance Fields from Sparse Inputs. 2180-2188 - Hang Guo, Tao Dai, Mingyan Zhu, Guanghao Meng, Bin Chen, Zhi Wang, Shu-Tao Xia:
One-stage Low-resolution Text Recognition with High-resolution Knowledge Transfer. 2189-2198 - Muxin Liao, Shishun Tian, Yuhang Zhang, Guoguang Hua, Wenbin Zou, Xia Li:
Calibration-based Dual Prototypical Contrastive Learning Approach for Domain Generalization Semantic Segmentation. 2199-2210 - Wentian Xin, Qiguang Miao, Yi Liu, Ruyi Liu, Chi-Man Pun, Cheng Shi:
Skeleton MixFormer: Multivariate Topology Representation for Skeleton-based Action Recognition. 2211-2220 - Xiaojie Li, Shaowei He, Jianlong Wu, Yue Yu, Liqiang Nie, Min Zhang:
Mask Again: Masked Knowledge Distillation for Masked Video Modeling. 2221-2232 - Mingxuan Zhang, Xiao Wu, Zhaoquan Yuan, Qi He, Xiang Huang:
Human-Object-Object Interaction: Towards Human-Centric Complex Interaction Detection. 2233-2242 - Yilun Zhang, Yuqian Fu, Xingjun Ma, Lizhe Qi, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang:
On the Importance of Spatial Relations for Few-shot Action Recognition. 2243-2251 - Jiarui Yu, Haoran Li, Yanbin Hao, Bin Zhu, Tong Xu, Xiangnan He:
CgT-GAN: CLIP-guided Text GAN for Image Captioning. 2252-2263 - Xiaojie Li, Jianlong Wu, Shaowei He, Shuo Kang, Yue Yu, Liqiang Nie, Min Zhang:
Fine-grained Key-Value Memory Enhanced Predictor for Video Representation Learning. 2264-2274 - Ziyang Gong, Fuhao Li, Yupeng Deng, Wenjun Shen, Xianzheng Ma, Zhenming Ji, Nan Xia:
Train One, Generalize to All: Generalizable Semantic Segmentation from Single-Scene to All Adverse Scenes. 2275-2284 - Cheng Zhang, Yu Zhu, Qingsen Yan, Jinqiu Sun, Yanning Zhang:
All-in-one Multi-degradation Image Restoration Network via Hierarchical Degradation Representation. 2285-2293 - Ziyu Yang, Sucheng Ren, Zongwei Wu, Nanxuan Zhao, Junle Wang, Jing Qin, Shengfeng He:
NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos. 2294-2304 - Zengbin Wang, Saihui Hou, Man Zhang, Xu Liu, Chunshui Cao, Yongzhen Huang, Shibiao Xu:
LandmarkGait: Intrinsic Human Parsing for Gait Recognition. 2305-2314 - Wenjia Ren, Qingmin Liao, Zhijing Shao, Xiangru Lin, Xin Yue, Yu Zhang, Zongqing Lu:
Patchmatch Stereo++: Patchmatch Binocular Stereo with Continuous Disparity Optimization. 2315-2325 - Rui Wang, Cong Zou, Weizhong Zhang, Zixuan Zhu, Lihua Jing:
Consistency-aware Feature Learning for Hierarchical Fine-grained Visual Classification. 2326-2334 - Jun Yu, Peng He, Ziqi Peng:
FSR-Net: Deep Fourier Network for Shadow Removal. 2335-2343 - Tianwei Yu, Peng Chen, Yuanjie Dang, Ruohong Huan, Ronghua Liang:
Multi-Speed Global Contextual Subspace Matching for Few-Shot Action Recognition. 2344-2352 - Haonan Wang, Jie Liu, Jie Tang, Gangshan Wu:
Lightweight Super-Resolution Head for Human Pose Estimation. 2353-2361 - Yunkee Chae, Junghyun Koo, Sungho Lee, Kyogu Lee:
Exploiting Time-Frequency Conformers for Music Audio Enhancement. 2362-2370 - Jiaming Liu, Yue Wu, Maoguo Gong, Qiguang Miao, Wenping Ma, Cai Xu:
Exploring Dual Representations in Large-Scale Point Clouds: A Simple Weakly Supervised Semantic Segmentation Framework. 2371-2380 - Keke Chen, Xiangbo Shu, Guo-Sen Xie, Rui Yan, Jinhui Tang:
Foreground/Background-Masked Interaction Learning for Spatio-temporal Action Detection. 2381-2390 - Xin Wang, Benyuan Meng, Hong Chen, Yuan Meng, Ke Lv, Wenwu Zhu:
TIVA-KG: A Multimodal Knowledge Graph with Text, Image, Video and Audio. 2391-2399 - Wanqing Zhao, Yuta Nakashima, Haiyuan Chen, Noboru Babaguchi:
Enhancing Fake News Detection in Social Media via Label Propagation on Cross-modal Tweet Graph. 2400-2408 - Xingxing Yang, Jie Chen, Zaifeng Yang:
Cooperative Colorization: Exploring Latent Cross-Domain Priors for NIR Image Spectrum Translation. 2409-2417 - Yihao Huang, Liangru Sun, Qing Guo, Felix Juefei-Xu, Jiayi Zhu, Jincao Feng, Yang Liu, Geguang Pu:
ALA: Naturalness-aware Adversarial Lightness Attack. 2418-2426 - Liya Ji, Chan Ho Park, Zhefan Rao, Qifeng Chen:
Neural Image Popularity Assessment with Retrieval-augmented Transformer. 2427-2436 - Yanchao Liu, Xina Cheng, Takeshi Ikenaga:
A Figure Skating Jumping Dataset for Replay-Guided Action Quality Assessment. 2437-2445 - Yeying Jin, Beibei Lin, Wending Yan, Yuan Yuan, Wei Ye, Robby T. Tan:
Enhancing Visibility in Nighttime Haze Images Using Guided APSF and Gradient Adaptive Convolution. 2446-2457 - Xiang Li, Yandong Wen, Muqiao Yang, Jinglu Wang, Rita Singh, Bhiksha Raj:
Rethinking Voice-Face Correlation: A Geometry View. 2458-2467 - Baiang Li, Huan Zheng, Zhao Zhang, Yang Zhao, Zhongqiu Zhao, Haijun Zhang:
Dynamic Grouped Interaction Network for Low-Light Stereo Image Enhancement. 2468-2476 - Jiafu Wu, Jian Li, Jiangning Zhang, Boshen Zhang, Mingmin Chi, Yabiao Wang, Chengjie Wang:
PVG: Progressive Vision Graph for Vision Recognition. 2477-2486 - Chenyi Zhuang, Pan Gao, Aljosa Smolic:
StylePrompter: All Styles Need Is Attention. 2487-2497 - Pengling Zhang, Huibin Yan, Wenhui Wu, Shuoyao Wang:
Improving Federated Person Re-Identification through Feature-Aware Proximity and Aggregation. 2498-2506 - Xizhe Xue, Dongdong Yu, Lingqiao Liu, Yu Liu, Satoshi Tsutsui, Ying Li, Zehuan Yuan, Ping Song, Mike Zheng Shou:
Transformer-based Open-world Instance Segmentation with Cross-task Consistency Regularization. 2507-2515 - Dongliang Zhu, Ruimin Hu, Shengli Song, Xiang Guo, Xixi Li, Zheng Wang:
Cross-Illumination Video Anomaly Detection Benchmark. 2516-2525 - Yuanbin Fu, Xiaojie Guo:
Practical Edge Detection via Robust Collaborative Learning. 2526-2534 - Haoyi Xiu, Xin Liu, Weimin Wang, Kyoung-Sook Kim, Masashi Matsuoka:
MSECNet: Accurate and Robust Normal Estimation for 3D Point Clouds by Multi-Scale Edge Conditioning. 2535-2543 - Xiao Liu, Xiuya Shi, Lufei Chen, Linbo Qing, Chao Ren:
Efficient Parallel Multi-Scale Detail and Semantic Encoding Network for Lightweight Semantic Segmentation. 2544-2552 - Jiquan Zhong, Xiaolin Huang, Xiao Yu:
Multi-Frame Self-Supervised Depth Estimation with Multi-Scale Feature Fusion in Dynamic Scenes. 2553-2563 - Yudong Mao, Peilin Chen, Shurun Wang, Shiqi Wang, Dapeng Wu:
Peering into The Sketch: Ultra-Low Bitrate Face Compression for Joint Human and Machine Perception. 2564-2572 - Xiaodong Jin, Taiping Zhang:
MTSN: Multiscale Temporal Similarity Network for Temporal Action Localization. 2573-2581 - Guanzhou Ke, Yang Yu, Guoqing Chao, Xiaoli Wang, Chenyang Xu, Shengfeng He:
Disentangling Multi-view Representations Beyond Inductive Bias. 2582-2590 - Lei Zhao, Le Han, Min Yao, Nenggan Zheng:
Implicit Decouple Network for Efficient Pose Estimation. 2591-2599 - Zhenjie Chen, Hongsong Wang, Jie Gui:
Occluded Skeleton-Based Human Action Recognition with Dual Inhibition Training. 2625-2634 - Xujie Kang, Kanglin Liu, Jiang Duan, Yuanhao Gong, Guoping Qiu:
P2I-NET: Mapping Camera Pose to Image via Adversarial Learning for New View Synthesis in Real Indoor Environments. 2635-2643 - Wenpeng Xing, Jie Chen, Ka Chun Cheung, Simon See:
IRCasTRF: Inverse Rendering by Optimizing Cascaded Tensorial Radiance Fields, Lighting, and Materials From Multi-view Images. 2644-2653 - Zhiqi Yu, Jingjing Li, Zhekai Du, Fengling Li, Lei Zhu, Yang Yang:
Noise-Robust Continual Test-Time Domain Adaptation. 2654-2662 - Zeyu Wang, Fabien Colonnier, Jinghong Zheng, Jyotibdha Acharya, Wenyu Jiang, Kejie Huang:
TIRDet: Mono-Modality Thermal InfraRed Object Detection Based on Prior Thermal-To-Visible Translation. 2663-2672 - Junzhe Cai, Shuiyan Chen, Heng Li, Beihao Xia, Zimin Mao, Wei Yuan:
HARP: Let Object Detector Undergo Hyperplasia to Counter Adversarial Patches. 2673-2683 - Lei Xu, Rei Kawakami, Nakamasa Inoue:
Scale-space Tokenization for Improving the Robustness of Vision Transformers. 2684-2693 - Kosuke Mizufune, Shunsuke Tanaka, Toshihide Yukitake, Tatsushi Matsubayashi:
Margin MCC: Chance-Robust Metric for Video Boundary Detection with Allowed Margin. 2694-2703 - Liangchen Song, Xuan Gong, Helong Zhou, Jiajie Chen, Qian Zhang, David S. Doermann, Junsong Yuan:
Exploring the Knowledge Transferred by Response-Based Teacher-Student Distillation. 2704-2713 - Feng Gao, Jiaxu Leng, Ji Gan, Xinbo Gao:
Selecting Learnable Training Samples is All DETRs Need in Crowded Pedestrian Detection. 2714-2722 - Qiankun Li, Xiaolong Huang, Zhifan Wan, Lanqing Hu, Shuzhe Wu, Jie Zhang, Shiguang Shan, Zengfu Wang:
Data-Efficient Masked Video Modeling for Self-supervised Action Recognition. 2723-2733 - Teng Fu, Xiaocong Wang, Haiyang Yu, Ke Niu, Bin Li, Xiangyang Xue:
DeNoising-MOT: Towards Multiple Object Tracking with Severe Occlusions. 2734-2743 - Peiran Xu, Yadong Mu:
Co-Salient Object Detection with Semantic-Level Consensus Extraction and Dispersion. 2744-2755 - Xuenan Xu, Zhiling Zhang, Zelin Zhou, Pingyue Zhang, Zeyu Xie, Mengyue Wu, Kenny Q. Zhu:
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data. 2756-2764 - Bingyang Wang, Tanlin Li, Jiannan Wu, Yi Jiang, Huchuan Lu, You He:
A Simple Baseline for Open-World Tracking via Self-training. 2765-2774 - Yuxuan Zhao, Jin Ma, Zhongang Qi, Zehua Xie, Yu Luo, Qiusheng Kang, Ying Shan:
VTLayout: A Multi-Modal Approach for Video Text Layout. 2775-2784 - Rajat Hebbar, Digbalay Bose, Shrikanth Narayanan:
SEAR: Semantically-grounded Audio Representations. 2785-2794 - Zongyuan Yang, Baolin Liu, Yongping Xiong, Lan Yi, Guibin Wu, Xiaojun Tang, Ziqi Liu, Junjie Zhou, Xing Zhang:
DocDiff: Document Enhancement via Residual Diffusion Models. 2795-2806 - Boshen Xu, Sipeng Zheng, Qin Jin:
POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-view World. 2807-2816 - Sihan Ma, Qiong Cao, Hongwei Yi, Jing Zhang, Dacheng Tao:
GraMMaR: Ground-aware Motion Model for 3D Human Motion Reconstruction. 2817-2828 - Hui Lu, Xixin Wu, Zhiyong Wu, Helen Meng:
SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody. 2829-2837 - Xiaohan Wang, Yuehu Liu, Xinhang Song, Beibei Wang, Shuqiang Jiang:
Generating Explanations for Embodied Action Decision from Visual Observation. 2838-2846 - Jieteng Yao, Junjie Chen, Li Niu, Bin Sheng:
Scene-aware Human Pose Generation using Transformer. 2847-2855 - Wanying Zhang, Shen Zhao, Fanyang Meng, Songtao Wu, Mengyuan Liu:
Dynamic Compositional Graph Convolutional Network for Efficient Composite Human Motion Prediction. 2856-2864 - Jiaqi Li, Yiran Wang, Zihao Huang, Jinghong Zheng, Ke Xian, Zhiguo Cao, Jianming Zhang:
Diffusion-Augmented Depth Prediction with Sparse Annotations. 2865-2876 - Chunwei Wu, Guitao Cao, Yan Li, Xidong Xi, Wenming Cao, Hong Wang:
Chaos to Order: A Label Propagation Perspective on Source-Free Domain Adaptation. 2877-2887 - Lianggangxu Chen, Jiale Lu, Youqi Song, Changbo Wang, Gaoqi He:
Beware of Overcorrection: Scene-induced Commonsense Graph for Scene Graph Generation. 2888-2897 - Haiyang Yu, Xiaocong Wang, Ke Niu, Bin Li, Xiangyang Xue:
Scene Text Segmentation with Text-Focused Transformers. 2898-2907 - Liangwei Jiang, Jiaxin Chen, Di Huang, Yunhong Wang:
MIEP: Channel Pruning with Multi-granular Importance Estimation for Object Detection. 2908-2917
Poster Session II: Understanding Multimedia Content -- Multimodal Fusion and Embedding
- Shanshan Wang, Yiyang Chen, Zhenwei He, Xun Yang, Mengzhu Wang, Quanzeng You, Xingyi Zhang:
Disentangled Representation Learning with Causality for Unsupervised Domain Adaptation. 2918-2926 - Jie Wen, Gehui Xu, Chengliang Liu, Lunke Fei, Chao Huang, Wei Wang, Yong Xu:
Localized and Balanced Efficient Incomplete Multi-view Clustering. 2927-2935 - Mengzhu Wang, Junyang Chen, Huan Wang, Huisi Wu, Zhidan Liu, Qin Zhang:
Interpolation Normalization for Contrast Domain Generalization. 2936-2945 - Yujing Liu, Zongqian Wu, Zhengyu Lu, Guoqiu Wen, Junbo Ma, Guangquan Lu, Xiaofeng Zhu:
Multi-teacher Self-training for Semi-supervised Node Classification with Noisy Labels. 2946-2954 - Liang Yang, Jiayi Wang, Tingting Zhang, Dongxiao He, Chuan Wang, Yuanfang Guo, Xiaochun Cao, Bingxin Niu, Zhen Wang:
Long Short-Term Graph Memory Against Class-imbalanced Over-smoothing. 2955-2963 - Zitan Chen, Zhuang Qi, Xiao Cao, Xiangxian Li, Xiangxu Meng, Lei Meng:
Class-level Structural Relation Modeling and Smoothing for Visual Representation Learning. 2964-2972 - Shengkai Sun, Daizong Liu, Jianfeng Dong, Xiaoye Qu, Junyu Gao, Xun Yang, Xun Wang, Meng Wang:
Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding. 2973-2984 - Pan Mu, Zhiying Du, Jinyuan Liu, Cong Bai:
Little Strokes Fell Great Oaks: Boosting the Hierarchical Features for Multi-exposure Image Fusion. 2985-2993 - Jing Wang, Songhe Feng, Gengyu Lyu, Zhibin Gu:
Triple-Granularity Contrastive Learning for Deep Multi-View Subspace Clustering. 2994-3002 - Zhao Su, Yong Yang, Shuying Huang, Weiguo Wan, Wei Tu, Hangyuan Lu, Changjie Chen:
CTCP: Cross Transformer and CNN for Pansharpening. 3003-3011 - Yonghua Zhu, Zhenyun Deng, Yang Chen, Robert Amor, Michael Witbrock:
Chain of Propagation Prompting for Node Classification. 3012-3020 - Yi Wen, Suyuan Liu, Xinhang Wan, Siwei Wang, Ke Liang, Xinwang Liu, Xihong Yang, Pei Zhang:
Efficient Multi-View Graph Clustering with Local and Global Structure Preservation. 3021-3030 - Yi Wen, Siwei Wang, Ke Liang, Weixuan Liang, Xinhang Wan, Xinwang Liu, Suyuan Liu, Jiyuan Liu, En Zhu:
Scalable Incomplete Multi-View Clustering with Structure Alignment. 3031-3040 - Yi Bin, Haoxuan Li, Yahui Xu, Xing Xu, Yang Yang, Heng Tao Shen:
Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval. 3041-3050 - Cai Xu, Zehui Li, Ziyu Guan, Wei Zhao, Xiangyu Song, Yue Wu, Jianxin Li:
Unbalanced Multi-view Deep Learning. 3051-3059 - Shuping Zhao, Lunke Fei, Jie Wen, Bob Zhang, Pengyang Zhao:
Incomplete Multi-View Clustering with Regularized Hierarchical Graph. 3060-3068 - Man-Sheng Chen, Jia-Qi Lin, Chang-Dong Wang, Wu-Dong Xi, Dong Huang:
On Regularizing Multiple Clusterings for Ensemble Clustering by Graph Tensor Learning. 3069-3077 - Guixu Lin, Jin Han, Mingdeng Cao, Zhihang Zhong, Yinqiang Zheng:
Event-guided Frame Interpolation and Dynamic Range Expansion of Single Rolling Shutter Image. 3078-3088 - Peng Zhou, Liang Du:
Learnable Graph Filter for Multi-view Clustering. 3089-3098 - Zhuang Qi, Lei Meng, Zitan Chen, Han Hu, Hui Lin, Xiangxu Meng:
Cross-Silo Prototypical Calibration for Federated Learning with Non-IID Data. 3099-3107 - Hai Zhou, Zhe Xue, Ying Liu, Boang Li, Junping Du, Meiyu Liang, Yuankai Qi:
CALM: An Enhanced Encoding and Confidence Evaluating Framework for Trustworthy Multi-view Learning. 3108-3116 - Houlun Chen, Xin Wang, Xiaohan Lan, Hong Chen, Xuguang Duan, Jia Jia, Wenwu Zhu:
Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Grounding. 3117-3128 - Lei Liu, Chenglong Li, Yun Xiao, Jin Tang:
Quality-Aware RGBT Tracking via Supervised Reliability Learning and Weighted Residual Guidance. 3129-3137 - Yang Wang, Bo Dong, Yuji Zhang, Yunduo Zhou, Haiyang Mei, Ziqi Wei, Xin Yang:
Event-Enhanced Multi-Modal Spiking Neural Network for Dynamic Obstacle Avoidance. 3138-3148 - Yujun Ma, Benjia Zhou, Ruili Wang, Pichao Wang:
Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action and Gesture Recognition. 3149-3160 - Peng Zhao, Qiangchang Wang, Yilong Yin:
M3R: Masked Token Mixup and Cross-Modal Reconstruction for Zero-Shot Learning. 3161-3171 - Yicong Li, Xun Yang, An Zhang, Chun Feng, Xiang Wang, Tat-Seng Chua:
Redundancy-aware Transformer for Video Question Answering. 3172-3180 - Wanting Yin, Hongtao Xie, Lei Zhang, Jiannan Ge, Pandeng Li, Chuanbin Liu, Yongdong Zhang:
Frequency-based Zero-Shot Learning with Phase Augmentation. 3181-3189 - Shiyuan Yang, Xiaodong Chen, Jing Liao:
Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model. 3190-3199 - Fangjian Lin, Jianlong Yuan, Sitong Wu, Fan Wang, Zhibin Wang:
UniNeXt: Exploring A Unified Architecture for Vision Recognition. 3200-3208 - Junjie Wu, Chen Gong, Ziqiang Cao, Guohong Fu:
MCG-MNER: A Multi-Granularity Cross-Modality Generative Framework for Multimodal NER with Instruction. 3209-3218 - Siran Peng, Chenhao Guo, Xiao Wu, Liang-Jian Deng:
U2Net: A General Framework with Spatial-Spectral-Integrated Double U-Net for Image Fusion. 3219-3227 - Yansheng Qiu, Ziyuan Zhao, Hongdou Yao, Delin Chen, Zheng Wang:
Modal-aware Visual Prompting for Incomplete Multi-modal Brain Tumor Segmentation. 3228-3239 - Hui Tang, Xun Liang:
Where to Find Fascinating Inter-Graph Supervision: Imbalanced Graph Classification with Kernel Information Bottleneck. 3240-3249 - Wuyuan Xie, Kaimin Wang, Yakun Ju, Miaohui Wang:
pmBQA: Projection-based Blind Point Cloud Quality Assessment via Multimodal Learning. 3250-3258 - Zihao Zhang, Qianqian Wang, Zhiqiang Tao, Quanxue Gao, Wei Feng:
Dropping Pathways Towards Deep Multi-View Graph Subspace Clustering Networks. 3259-3267 - Penglei Wang, Danyang Wu, Rong Wang, Feiping Nie:
Multi-view Graph Clustering via Efficient Global-Local Spectral Embedding Fusion. 3268-3276 - Hao Wang, Zhi-Qi Cheng, Jingdong Sun, Xin Yang, Xiao Wu, Hongyang Chen, Yan Yang:
Debunking Free Fusion Myth: Online Multi-view Anomaly Detection with Disentangled Product-of-Experts Modeling. 3277-3286 - Yunlong Lin, Zhenqi Fu, Ge Meng, Yingying Wang, Yuhang Dong, Linyu Fan, Hedeng Yu, Xinghao Ding:
Domain-irrelevant Feature Learning for Generalizable Pan-sharpening. 3287-3296 - Qingwei Wang, Jinyu Yang, Xiaosheng Yu, Fangyi Wang, Peng Chen, Feng Zheng:
Depth-aided Camouflaged Object Detection. 3297-3306 - Wei Ji, Jingjing Li, Cheng Bian, Zhicheng Zhang, Li Cheng:
SemanticRT: A Large-Scale Dataset and Method for Robust Semantic Segmentation in Multispectral Images. 3307-3316 - Zhuo Chen, Jiaoyan Chen, Wen Zhang, Lingbing Guo, Yin Fang, Yufeng Huang, Yichi Zhang, Yuxia Geng, Jeff Z. Pan, Wenting Song, Huajun Chen:
MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality Hybrid. 3317-3327 - Jiayi Zhang, Weixin Li:
Multi-Modal and Multi-Scale Temporal Fusion Architecture Search for Audio-Visual Video Parsing. 3328-3336 - Jiaqi Li, Guilin Qi, Chuanyi Zhang, Yongrui Chen, Yiming Tan, Chenlong Xia, Ye Tian:
Incorporating Domain Knowledge Graph into Multimodal Movie Genre Classification with Self-Supervised Attention and Contrastive Learning. 3337-3345 - Yong Yang, Mengzhen Li, Shuying Huang, Hangyuan Lu, Wei Tu, Weiguo Wan:
Multi-scale Spatial-Spectral Attention Guided Fusion Network for Pansharpening. 3346-3354 - Xuehao Wang, Shuai Li, Chenglizhao Chen, Aimin Hao, Hong Qin:
Modality Profile - A New Critical Aspect to be Considered When Generating RGB-D Salient Object Detection Training Set. 3355-3364 - Meng Liu, Ke Liang, Dayu Hu, Hao Yu, Yue Liu, Lingyuan Meng, Wenxuan Tu, Sihang Zhou, Xinwang Liu:
TMac: Temporal Multi-Modal Graph Learning for Acoustic Event Classification. 3365-3374 - Mufeng Yao, Jiaqi Wang, Jinlong Peng, Mingmin Chi, Chao Liu:
FOLT: Fast Multiple Object Tracking from UAV-captured Videos Based on Optical Flow. 3375-3383 - Zihan Li, Yuan Zheng, Xiangde Luo, Dandan Shan, Qingqi Hong:
ScribbleVC: Scribble-supervised Medical Image Segmentation with Vision-Class Embedding. 3384-3393 - Jiaqing Fan, Tiankang Su, Kaihua Zhang, Bo Liu, Qingshan Liu:
Temporally Efficient Gabor Transformer for Unsupervised Video Object Segmentation. 3394-3402 - Haowei Wang, Jiji Tang, Jiayi Ji, Xiaoshuai Sun, Rongsheng Zhang, Yiwei Ma, Minda Zhao, Lincheng Li, Zeng Zhao, Tangjie Lv, Rongrong Ji:
Beyond First Impressions: Integrating Joint Multi-modal Cues for Comprehensive 3D Representation. 3403-3414 - Kongming Liang, Xinran Wang, Haiwen Zhang, Zhanyu Ma, Jun Guo:
Hierarchical Visual Attribute Learning in the Wild. 3415-3423 - Qiang Zhang, Jiawei Liu, Fanrui Zhang, Jingyi Xie, Zheng-Jun Zha:
Hierarchical Semantic Enhancement Network for Multimodal Fake News Detection. 3424-3433 - Meng Shen, Yizheng Huang, Jianxiong Yin, Heqing Zou, Deepu Rajan, Simon See:
Towards Balanced Active Learning for Multimodal Classification. 3434-3445 - Shiping Ge, Zhiwei Jiang, Yafeng Yin, Cong Wang, Zifeng Cheng, Qing Gu:
Learning Event-Specific Localization Preferences for Audio-Visual Event Localization. 3446-3454 - Zongwei Wu, Jingjing Wang, Zhuyun Zhou, Zhaochong An, Qiuping Jiang, Cédric Demonceaux, Guolei Sun, Radu Timofte:
Object Segmentation by Mining Cross-Modal Semantics. 3455-3464 - Wenxin Ni, Qianqian Xu, Yangbangyan Jiang, Zongsheng Cao, Xiaochun Cao, Qingming Huang:
PSNEA: Pseudo-Siamese Network for Entity Alignment between Multi-modal Knowledge Graphs. 3489-3497 - Xinyue Chen, Jie Xu, Yazhou Ren, Xiaorong Pu, Ce Zhu, Xiaofeng Zhu, Zhifeng Hao, Lifang He:
Federated Deep Multi-View Clustering with Global Self-Supervision. 3498-3506 - Sung Jin Um, Dongjin Kim, Jung Uk Kim:
Audio-Visual Spatial Integration and Recursive Attention for Robust Sound Source Localization. 3507-3516 - Fangming Zhong, Chenglong Chu, Zijie Zhu, Zhikui Chen:
Hypergraph-Enhanced Hashing for Unsupervised Cross-Modal Retrieval via Robust Similarity Guidance. 3517-3527 - Yue Liu, Ke Liang, Jun Xia, Xihong Yang, Sihang Zhou, Meng Liu, Xinwang Liu, Stan Z. Li:
Reinforcement Graph Clustering with Unknown Cluster Number. 3528-3537 - Jingyu Wu, Shi Chen, Shuyu Gan, Weijun Li, Changyuan Yang, Lingyun Sun:
Cultural Self-Adaptive Multimodal Gesture Generation Based on Multiple Culture Gesture Dataset. 3538-3549 - Xin Zou, Chang Tang, Xiao Zheng, Zhenglai Li, Xiao He, Shan An, Xinwang Liu:
DPNET: Dynamic Poly-attention Network for Trustworthy Multi-modal Classification. 3550-3559 - Zhihao Zhang, Yiwei Chen, Weizhan Zhang, Caixia Yan, Qinghua Zheng, Qi Wang, Wangdu Chen:
Tile Classification Based Viewport Prediction with Multi-modal Fusion Transformer. 3560-3568 - Jinda Lu, Shuo Wang, Xinyu Zhang, Yanbin Hao, Xiangnan He:
Semantic-based Selection, Synthesis, and Supervision for Few-shot Learning. 3569-3578 - Jinyong Wen, Shiming Xiang, Chunhong Pan:
Exploring Universal Principles for Graph Contrastive Learning: A Statistical Perspective. 3579-3589 - Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, Soujanya Poria:
Text-to-Audio Generation using Instruction Guided Latent Diffusion Model. 3590-3598 - Shangyu Xing, Fei Zhao, Zhen Wu, Chunhui Li, Jianbing Zhang, Xinyu Dai:
DRIN: Dynamic Relation Interactive Network for Multimodal Entity Linking. 3599-3608 - Shaokui Gu, Xu Yuan, Liang Zhao, Zhenjiao Liu, Yan Hu, Zhikui Chen:
MVCIR-net: Multi-view Clustering Information Reinforcement Network. 3609-3618 - Yixi Liu, Yuze Tan, Hongjie Wu, Shudong Huang, Yazhou Ren, Jiancheng Lv:
Preserving Local and Global Information: An Effective Metric-based Subspace Clustering. 3619-3627 - Jiaming Gu, Jingyu Zhang, Muyang Zhang, Weiliang Meng, Shibiao Xu, Jiguang Zhang, Xiaopeng Zhang:
FeaCo: Reaching Robust Feature-Level Consensus in Noisy Pose Conditions. 3628-3636 - Masayasu Muraoka, Bishwaranjan Bhattacharjee, Michele Merler, Graeme Blackwood, Yulong Li, Yang Zhao:
Cross-Lingual Transfer of Large Language Model by Visually-Derived Supervision Toward Low-Resource Languages. 3637-3646 - Jingyang Yuan, Xiao Luo, Yifang Qin, Zhengyang Mao, Wei Ju, Ming Zhang:
ALEX: Towards Effective Graph Transfer Learning with Noisy Labels. 3647-3656 - Chenwei Zhang, Yuxuan Hu, Min Yang, Chengming Li, Xiping Hu:
Skeletal Spatial-Temporal Semantics Guided Homogeneous-Heterogeneous Multimodal Network for Action Recognition. 3657-3666 - Zhong Chen, Zhizhong Zhang, Xin Tan, Yanyun Qu, Yuan Xie:
Unveiling the Power of CLIP in Unsupervised Visible-Infrared Person Re-Identification. 3667-3675 - Haowen Wang, Zhipeng Fan, Zhen Zhao, Zhengping Che, Zhiyuan Xu, Dong Liu, Feifei Feng, Yakun Huang, Xiuquan Qiao, Jian Tang:
DTF-Net: Category-Level Pose Estimation and Shape Reconstruction via Deformable Template Field. 3676-3685 - Yuechen Wang, Wengang Zhou, Zhenbo Lu, Houqiang Li:
Text-Only Training for Visual Storytelling. 3686-3695 - Zihao Zhang, Jie Wang, Yahong Han:
Saliency Prototype for RGB-D and RGB-T Salient Object Detection. 3696-3705 - Zhu Liu, Jinyuan Liu, Benzhuang Zhang, Long Ma, Xin Fan, Risheng Liu:
PAIF: Perception-Aware Infrared-Visible Image Fusion for Attack-Tolerant Semantic Segmentation. 3706-3714 - Baogui Xu, Chengjin Xu, Bing Su:
Cross-Modal Graph Attention Network for Entity Alignment. 3715-3723 - Yuwei Zhou, Xin Wang, Hong Chen, Xuguang Duan, Wenwu Zhu:
Intra- and Inter-Modal Curriculum for Multimodal Learning. 3724-3735 - Yaobin Zhang, Jianming Lv, Chen Liu, Hongmin Cai:
Graph based Spatial-temporal Fusion for Multi-modal Person Re-identification. 3736-3744 - Yuanbin Wang, Shaofei Huang, Yulu Gao, Zhen Wang, Rui Wang, Kehua Sheng, Bo Zhang, Si Liu:
Transferring CLIP's Knowledge into Zero-Shot Point Cloud Semantic Segmentation. 3745-3754 - Zhaojian Li, Bin Zhao, Yuan Yuan:
Bio-Inspired Audiovisual Multi-Representation Integration via Self-Supervised Learning. 3755-3764 - Junyin Wang, Chenghu Du, Hui Li, Shengwu Xiong:
DLFusion: Painting-Depth Augmenting-LiDAR for Multimodal Fusion 3D Object Detection. 3765-3776 - Wenna Wang, Tao Zhuo, Xiuwei Zhang, Mingjun Sun, Hanlin Yin, Yinghui Xing, Yanning Zhang:
Automatic Network Architecture Search for RGB-D Semantic Segmentation. 3777-3786 - Nuo Chen, Jin Xie, Jing Nie, Jiale Cao, Zhuang Shao, Yanwei Pang:
Attentive Alignment Network for Multispectral Pedestrian Detection. 3787-3795 - Dong Chen, Siliang Tang, Zijin Shen, Guoming Wang, Jun Xiao, Yueting Zhuang, Carl Yang:
FedAA: Using Non-sensitive Modalities to Improve Federated Learning while Preserving Image Privacy. 3796-3806 - Mengze Li, Haoyu Zhang, Juncheng Li, Zhou Zhao, Wenqiao Zhang, Shengyu Zhang, Shiliang Pu, Yueting Zhuang, Fei Wu:
Unsupervised Domain Adaptation for Video Object Grounding with Cascaded Debiasing Learning. 3807-3816 - Zhengyang Mao, Wei Ju, Yifang Qin, Xiao Luo, Ming Zhang:
RAHNet: Retrieval Augmented Hybrid Network for Long-tailed Graph Classification. 3817-3826 - Youngjoon Jang, Kyeongha Rho, Jong-Bin Woo, Hyeongkeun Lee, Jihwan Park, Youshin Lim, Byeong-Yeol Kim, Joon Son Chung:
That's What I Said: Fully-Controllable Talking Face Generation. 3827-3836 - Quanmin Liang, Xiawu Zheng, Kai Huang, Yan Zhang, Jie Chen, Yonghong Tian:
Event-Diffusion: Event-Based Image Reconstruction and Restoration with Diffusion Models. 3837-3846 - Han Fang, Zhifei Yang, Xianghao Zang, Chao Ban, Zhongjiang He, Hao Sun, Lanxiang Zhou:
Mask to Reconstruct: Cooperative Semantics Completion for Video-text Retrieval. 3847-3856 - Yixuan Ma, Kun Zhan:
Self-Contrastive Graph Diffusion Network. 3857-3865 - Yiyang Chen, Shanshan Zhao, Changxing Ding, Liyao Tang, Chaoyue Wang, Dacheng Tao:
Cross-modal & Cross-domain Learning for Unsupervised LiDAR Semantic Segmentation. 3866-3875 - Ren Wang, Haoliang Sun, Xiushan Nie, Yuxiu Lin, Xiaoming Xi, Yilong Yin:
Multi-View Representation Learning via View-Aware Modulation. 3876-3886 - Boxiang Yun, Xingran Xie, Qingli Li, Yan Wang:
Uni-Dual: A Generic Unified Dual-Task Medical Self-Supervised Learning Framework. 3887-3896 - Yifan Dong, Suhang Wu, Fandong Meng, Jie Zhou, Xiaoli Wang, Jianxin Lin, Jinsong Su:
Towards Better Multi-modal Keyphrase Generation via Visual Entity Enhancement and Multi-granularity Image Noise Filtering. 3897-3907 - Shilong Li, Boyu Qiao, Kun Li, Qianqian Lu, Meng Lin, Wei Zhou:
Multi-modal Social Bot Detection: Learning Homophilic and Heterophilic Connections Adaptively. 3908-3916 - Weibing Zhao, Haiming Zhang, Chaoda Zheng, Xu Yan, Shuguang Cui, Zhen Li:
CPU: Codebook Lookup Transformer with Knowledge Distillation for Point Cloud Upsampling. 3917-3925 - Mohit Tomar, Abhisek Tiwari, Tulika Saha, Sriparna Saha:
Your tone speaks louder than your face! Modality Order Infused Multi-modal Sarcasm Detection. 3926-3933 - Jieming Wang, Ziyan Li, Jianfei Yu, Li Yang, Rui Xia:
Fine-Grained Multimodal Named Entity Recognition and Grounding with a Generative Framework. 3934-3943 - Wei Liu, Xinlei Yang, Zhenhua Li, Feng Qian:
SkipStreaming: Pinpointing User-Perceived Redundancy in Correlated Web Video Streaming through the Lens of Scenes. 3944-3953 - Zhao Yang, Bing Su, Ji-Rong Wen:
Synthesizing Long-Term Human Motions with Diffusion Models via Coherent Sampling. 3954-3964 - Haichao Zhang, Yi Xu, Hongsheng Lu, Takayuki Shimizu, Yun Fu:
Layout Sequence Prediction From Noisy Mobile Modality. 3965-3974 - Chenyang Lyu, Wenxi Li, Tianbo Ji, Longyue Wang, Liting Zhou, Cathal Gurrin, Linyi Yang, Yi Yu, Yvette Graham, Jennifer Foster:
Graph-Based Video-Language Learning with Multi-Grained Audio-Visual Alignment. 3975-3984 - Meng Liu, Fenglei Zhang, Xin Luo, Fan Liu, Yinwei Wei, Liqiang Nie:
Advancing Video Question Answering with a Multi-modal and Multi-layer Question Enhancement Network. 3985-3993 - Wenrui Li, Xi-Le Zhao, Zhengyu Ma, Xingtao Wang, Xiaopeng Fan, Yonghong Tian:
Motion-Decoupled Spiking Transformer for Audio-Visual Zero-Shot Learning. 3994-4002 - Qianru Qiu, Xueting Wang, Mayu Otani:
Multimodal Color Recommendation in Vector Graphic Documents. 4003-4011 - Hengcan Shi, Munawar Hayat, Jianfei Cai:
Open-Vocabulary Object Detection via Scene Graph Discovery. 4012-4021 - Jushuo Chen, Feifei Dai, Xiaoyan Gu, Jiang Zhou, Bo Li, Weiping Wang:
Universal Domain Adaptive Network Embedding for Node Classification. 4022-4030 - Chenyu Yang, Mengxi Chen, Yanfeng Wang, Yu Wang:
Uncertainty-Guided End-to-End Audio-Visual Speaker Diarization for Far-Field Recordings. 4031-4041 - Tianyu Liu, Peng Zhang, Wei Huang, Yufei Zha, Tao You, Yanning Zhang:
Induction Network: Audio-Visual Modality Gap-Bridging for Self-Supervised Sound Source Localization. 4042-4052 - Yuhuan Lu, Bangchao Deng, Weijian Yu, Dingqi Yang:
HELIOS: Hyper-Relational Schema Modeling from Knowledge Graphs. 4053-4064 - Zhongfan Sun, Yongli Hu, Qingqing Gao, Huajie Jiang, Junbin Gao, Yanfeng Sun, Baocai Yin:
Breaking the Barrier Between Pre-training and Fine-tuning: A Hybrid Prompting Model for Knowledge-Based VQA. 4065-4073 - Ziteng Wen, Hai Xu, Chenyu Liu, Tao Guo, Jinshui Hu, Xuming He, Fengren Wang, Shun Lou, Haibo Fan:
OccluBEV: Occlusion Aware Spatiotemporal Modeling for Multi-view 3D Object Detection. 4074-4083
Poster Session III: Understanding Multimedia Content -- Vision and Language
- Xingyu Shen, Xiang Zhang, Xun Yang, Yibing Zhan, Long Lan, Jianfeng Dong, Hongzhou Wu:
Semantics-Enriched Cross-Modal Alignment for Complex-Query Video Moment Retrieval. 4109-4118 - Yun Liu, Zhongsheng Yan, Sixiang Chen, Tian Ye, Wenqi Ren, Erkang Chen:
NightHazeFormer: Single Nighttime Haze Removal Using Prior Query Transformer. 4119-4128 - Hua Li, Junyan Liang, Wenjie Li, Wenhui Wu:
FSNet: Frequency Domain Guided Superpixel Segmentation Network for Complex Scenes. 4129-4137 - Zhi Chen, Peng-Fei Zhang, Jingjing Li, Sen Wang, Zi Huang:
Zero-Shot Learning by Harnessing Adversarial Samples. 4138-4146 - Tian Ye, Sixiang Chen, Yun Liu, Wenhao Chai, Jinbin Bai, Wenbin Zou, Yunchen Zhang, Mingchao Jiang, Erkang Chen, Chenghao Xue:
Sequential Affinity Learning for Video Restoration. 4147-4156 - Yiwei Ma, Xiaoshuai Sun, Jiayi Ji, Guannan Jiang, Weilin Zhuang, Rongrong Ji:
Beat: Bi-directional One-to-Many Embedding Alignment for Text-based Person Retrieval. 4157-4168 - Rui Xu, Le Hui, Yuehui Han, Jianjun Qian, Jin Xie:
Transformer-based Point Cloud Generation Network. 4169-4177 - Jun Guo, Xingyu Zheng, Aishan Liu, Siyuan Liang, Yisong Xiao, Yichao Wu, Xianglong Liu:
Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks. 4178-4189 - Daizong Liu, Xiaoye Qu, Jianfeng Dong, Guoshun Nan, Pan Zhou, Zichuan Xu, Lixing Chen, He Yan, Yu Cheng:
Filling the Information Gap between Video and Query for Language-Driven Moment Retrieval. 4190-4199 - Zhibo Tian, Xiaolin Zhang, Peng Zhang, Kun Zhan:
Improving Semi-Supervised Semantic Segmentation with Dual-Level Siamese Structure Network. 4200-4208 - Jiarui Yang, Chuan Wang, Zeming Liu, Jiahong Wu, Dongsheng Wang, Liang Yang, Xiaochun Cao:
Focusing on Flexible Masks: A Novel Framework for Panoptic Scene Graph Generation with Relation Constraints. 4209-4218 - Chunyu Xie, Heng Cai, Jincheng Li, Fanjing Kong, Xiaoyu Wu, Jianfei Song, Henrique Morimitsu, Lin Yao, Dexin Wang, Xiangzheng Zhang, Dawei Leng, Baochang Zhang, Xiangyang Ji, Yafeng Deng:
CCMB: A Large-scale Chinese Cross-modal Benchmark. 4219-4227 - Sixiang Chen, Tian Ye, Yun Liu, Jinbin Bai, Haoyu Chen, Yunlong Lin, Jun Shi, Erkang Chen:
CPLFormer: Cross-scale Prototype Learning Transformer for Image Snow Removal. 4228-4239 - Xuan Yao, Junyu Gao, Mengyuan Chen, Changsheng Xu:
Video Entailment via Reaching a Structure-Aware Cross-modal Consensus. 4240-4249 - Cheng Chen, Yunqing Chen, Shuang Song, Jianan Wang, Huansheng Ning, Ruoxiu Xiao:
Cerebrovascular Segmentation in TOF-MRA with Topology Regularization Adversarial Model. 4250-4259 - Jiale Yu, Baopeng Zhang, Qirui Li, Haoyang Chen, Zhu Teng:
Hierarchical Reasoning Network with Contrastive Learning for Few-Shot Human-Object Interaction Recognition. 4260-4268 - Sixiang Chen, Tian Ye, Chenghao Xue, Haoyu Chen, Yun Liu, Erkang Chen, Lei Zhu:
Uncertainty-Driven Dynamic Degradation Perceiving and Background Modeling for Efficient Single Image Desnowing. 4269-4280 - Chenpeng Du, Qi Chen, Tianyu He, Xu Tan, Xie Chen, Kai Yu, Sheng Zhao, Jiang Bian:
DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder. 4281-4289 - Jiexin Wang, Yujie Zhou, Wenwen Qiang, Ying Ba, Bing Su, Ji-Rong Wen:
Spatio-Temporal Branching for Motion Prediction using Motion Increments. 4290-4299 - Zhenqian Wu, Yazhou Ren, Xiaorong Pu, Zhifeng Hao, Lifang He:
Generative Neutral Features-Disentangled Learning for Facial Expression Recognition. 4300-4308 - Tingting Wang, Yongxu Ye, Faming Fang, Guixu Zhang, Ming Xu:
Deep Algorithm Unrolling with Registration Embedding for Pansharpening. 4309-4318 - Huilin Zhu, Jingling Yuan, Xian Zhong, Zhengwei Yang, Zheng Wang, Shengfeng He:
DAOT: Domain-Agnostically Aligned Optimal Transport for Domain-Adaptive Crowd Counting. 4319-4329 - Wei Ji, Renjie Liang, Lizi Liao, Hao Fei, Fuli Feng:
Partial Annotation-based Video Moment Retrieval via Iterative Learning. 4330-4339 - Yirui Shen, Jingxuan Kang, Shuang Li, Zhenjie Yu, Shuigen Wang:
Style Transfer Meets Super-Resolution: Advancing Unpaired Infrared-to-Visible Image Translation with Detail Enhancement. 4340-4348 - Chongyang Zhao, Yuankai Qi, Qi Wu:
Mind the Gap: Improving Success Rate of Vision-and-Language Navigation by Revisiting Oracle Success Routes. 4349-4358 - Xinda Liu, Yaohui Zhu, Linhu Liu, Jiang Tian, Lili Wang:
Feature-Suppressed Contrast for Self-Supervised Food Pre-training. 4359-4367 - Yuchen Zhou, Guang Tan, Mengtang Li, Chao Gou:
Learning from Easy to Hard Pairs: Multi-step Reasoning Network for Human-Object Interaction Detection. 4368-4377 - Chengyang Fang, Jiangnan Li, Liang Li, Can Ma, Dayong Hu:
Separate and Locate: Rethink the Text in Text-based Visual Question Answering. 4378-4388 - Yunshi Lan, Xiang Li, Xin Liu, Yang Li, Wei Qin, Weining Qian:
Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts. 4389-4400 - Jie Xu, Shanshan Zhang, Jian Yang:
Adaptive Decoupled Pose Knowledge Distillation. 4401-4409 - Li Li, Chenwei Wang, You Qin, Wei Ji, Renjie Liang:
Biased-Predicate Annotation Identification via Unbiased Visual Predicate Representation. 4410-4420 - Huan Liu, Lu Zhang, Jihong Guan, Shuigeng Zhou:
Zero-Shot Object Detection by Semantics-Aware DETR with Adaptive Contrastive Loss. 4421-4430 - Tao Jin, Xize Cheng, Linjun Li, Wang Lin, Ye Wang, Zhou Zhao:
Rethinking Missing Modality Learning from a Decoding Perspective. 4431-4439 - Zhijin Ge, Fanhua Shang, Hongying Liu, Yuanyuan Liu, Liang Wan, Wei Feng, Xiaosen Wang:
Improving the Transferability of Adversarial Examples with Arbitrary Style Transfer. 4440-4449 - Xin Wang, Zihao Wu, Hong Chen, Xiaohan Lan, Wenwu Zhu:
Mixup-Augmented Temporally Debiased Video Grounding with Content-Location Disentanglement. 4450-4459 - Yaya Shi, Haowei Liu, Haiyang Xu, Zongyang Ma, Qinghao Ye, Anwen Hu, Ming Yan, Ji Zhang, Fei Huang, Chunfeng Yuan, Bing Li, Weiming Hu, Zheng-Jun Zha:
Learning Semantics-Grounded Vocabulary Representation for Video-Text Retrieval. 4460-4470 - Jiawei Li, Jiansheng Chen, Jinyuan Liu, Huimin Ma:
Learning a Graph Neural Network with Cross Modality Interaction for Image Fusion. 4471-4479 - Chaoya Jiang, Haiyang Xu, Wei Ye, Qinghao Ye, Chenliang Li, Ming Yan, Bin Bi, Shikun Zhang, Fei Huang, Ji Zhang:
COPA: Efficient Vision-Language Pre-training through Collaborative Object- and Patch-Text Alignment. 4480-4491 - Shuyu Yang, Yinan Zhou, Zhedong Zheng, Yaxiong Wang, Li Zhu, Yujiao Wu:
Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark. 4492-4501 - Shiwei Gan, Yafeng Yin, Zhiwei Jiang, Lei Xie, Sanglu Lu:
Towards Real-Time Sign Language Recognition and Translation on Edge Devices. 4502-4512 - Qiwei Li, Zuchao Li, Xiantao Cai, Bo Du, Hai Zhao:
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling. 4513-4523 - Shaokun Wang, Weiwei Shi, Yuhang He, Yifan Yu, Yihong Gong:
Non-Exemplar Class-Incremental Learning via Adaptive Old Class Reconstruction. 4524-4534 - Ruixiang Jiang, Lingbo Liu, Changwen Chen:
CLIP-Count: Towards Text-Guided Zero-Shot Object Counting. 4535-4545 - Fuxiang Yang, Tonghua Su, Xiang Zhou, Donglin Di, Zhongjie Wang, Songze Li:
Self-Supervised Cross-Language Scene Text Editing. 4546-4554 - Feng Chen, Jiajia Liu, Kaixiang Ji, Wang Ren, Jian Wang, Jingdong Chen:
Learning Implicit Entity-object Relations by Bidirectional Generative Alignment for Multimodal NER. 4555-4563 - Liang He, Hongke Wang, Yongchang Cao, Zhen Wu, Jianbing Zhang, Xinyu Dai:
MORE: A Multimodal Object-Entity Relation Extraction Dataset with a Benchmark Evaluation. 4564-4573 - Ziyue Wu, Junyu Gao, Changsheng Xu:
Weakly-supervised Video Scene Graph Generation via Unbiased Cross-modal Learning. 4574-4583 - Jiong Yin, Liang Li, Jiehua Zhang, Chenggang Yan, Lei Zhang, Zunjie Zhu:
Reducing Intrinsic and Extrinsic Data Biases for Moment Localization with Natural Language. 4584-4594 - Yaoming Wang, Yuchen Liu, Xiaopeng Zhang, Jin Li, Bowen Shi, Chenglin Li, Wenrui Dai, Hongkai Xiong, Qi Tian:
VioLET: Vision-Language Efficient Tuning with Collaborative Multi-modal Gradients. 4595-4605 - Junyi Zeng, Chong Bao, Rui Chen, Zilong Dong, Guofeng Zhang, Hujun Bao, Zhaopeng Cui:
Mirror-NeRF: Learning Neural Radiance Fields for Mirrors with Whitted-Style Ray Tracing. 4606-4615 - Hongbin Xu, Weitao Chen, Yang Liu, Zhipeng Zhou, Haihong Xiao, Baigui Sun, Xuansong Xie, Wenxiong Kang:
Semi-supervised Deep Multi-view Stereo. 4616-4625 - Chen Jiang, Hong Liu, Xuzheng Yu, Qing Wang, Yuan Cheng, Jia Xu, Zhongyi Liu, Qingpei Guo, Wei Chu, Ming Yang, Yuan Qi:
Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning. 4626-4636 - Tian Gan, Xiao Wang, Yan Sun, Jianlong Wu, Qingpei Guo, Liqiang Nie:
Temporal Sentence Grounding in Streaming Videos. 4637-4646 - Decheng Liu, Weizhao Yang, Chunlei Peng, Nannan Wang, Ruimin Hu, Xinbo Gao:
Modality-agnostic Augmented Multi-Collaboration Representation for Semi-supervised Heterogenous Face Recognition. 4647-4656 - Yifan Li, Yaochen Li, Wenneng Tang, Zhifeng Zhu, Jinhuo Yang, Yuehu Liu:
Swin-UNIT: Transformer-based GAN for High-resolution Unpaired Image Translation. 4657-4665 - Xiaoxiong Du, Jun Peng, Yiyi Zhou, Jinlu Zhang, Siting Chen, Guannan Jiang, Xiaoshuai Sun, Rongrong Ji:
PixelFace+: Towards Controllable Face Generation and Manipulation with Text Descriptions and Segmentation Masks. 4666-4677 - Jingzheng Li, Hailong Sun:
LiFT: Transfer Learning in Vision-Language Models for Downstream Adaptation and Generalization. 4678-4687 - Manman Zhang, Ge Luo, Yuchen Ma, Sheng Li, Zhenxing Qian, Xinpeng Zhang:
VCMaster: Generating Diverse and Fluent Live Video Comments Based on Multimodal Contexts. 4688-4696 - Fulong Ye, Yuxing Long, Fangxiang Feng, Xiaojie Wang:
Whether you can locate or not? Interactive Referring Expression Generation. 4697-4706 - Yiming Li, Xiaoshan Yang, Changsheng Xu:
Iterative Learning with Extra and Inner Knowledge for Long-tail Dynamic Scene Graph Generation. 4707-4715 - Jing Zhang, Yingshuai Xie, Xiaoqiang Liu:
Improving Image Captioning through Visual and Semantic Mutual Promotion. 4716-4724 - Minghao Zhu, Xiao Lin, Ronghao Dang, Chengju Liu, Qijun Chen:
Fine-Grained Spatiotemporal Motion Alignment for Contrastive Video Representation Learning. 4725-4736 - Zhuoling Li, Yong Wang:
Better Integrating Vision and Semantics for Improving Few-shot Classification. 4737-4746 - Mingrui Lao, Nan Pu, Yu Liu, Zhun Zhong, Erwin M. Bakker, Nicu Sebe, Michael S. Lew:
Multi-Domain Lifelong Visual Question Answering via Self-Critical Distillation. 4747-4758 - Xue Song, Jingjing Chen, Yu-Gang Jiang:
Relation Triplet Construction for Cross-modal Text-to-Video Retrieval. 4759-4767 - Shuyi Ouyang, Hongyi Wang, Ziwei Niu, Zhenjia Bai, Shiao Xie, Yingying Xu, Ruofeng Tong, Yen-Wei Chen, Lanfen Lin:
HSVLT: Hierarchical Scale-Aware Vision-Language Transformer for Multi-Label Image Classification. 4768-4777 - Haonan Zhang, Lianli Gao, Pengpeng Zeng, Alan Hanjalic, Heng Tao Shen:
Depth-Aware Sparse Transformer for Video-Language Learning. 4778-4787 - Chuanpeng Yang, Fuqing Zhu, Jizhong Han, Songlin Hu:
Invariant Meets Specific: A Scalable Harmful Memes Detection Framework. 4788-4797 - Wuyuan Xie, Miaohui Wang:
A Method of Micro-Geometric Details Preserving in Surface Reconstruction from Gradient. 4798-4806 - Wenhui Li, Yan Wang, Yuting Su, Lanjun Wang, Weizhi Nie, An-An Liu:
Progressive Positive Association Framework for Image and Text Retrieval. 4807-4815 - Fangzheng Tian, Sungchan Kim:
Globally-Robust Instance Identification and Locally-Accurate Keypoint Alignment for Multi-Person Pose Estimation. 4816-4827 - Kun Zhang, Lei Zhang, Bo Hu, Mengxiao Zhu, Zhendong Mao:
Unlocking the Power of Cross-Dimensional Semantic Dependency for Image-Text Matching. 4828-4837 - Zhiqing Chen, Yawei Luo, Jian Shao, Yi Yang, Chunping Wang, Lei Chen, Jun Xiao:
Dark Knowledge Balance Learning for Unbiased Scene Graph Generation. 4838-4847 - Yanbiao Ma, Licheng Jiao, Fang Liu, Shuyuan Yang, Xu Liu, Lingling Li:
Orthogonal Uncertainty Representation of Data Manifold for Robust Long-Tailed Learning. 4848-4857 - Rundong He, Rongxue Li, Zhongyi Han, Xihong Yang, Yilong Yin:
Topological Structure Learning for Weakly-Supervised Out-of-Distribution Detection. 4858-4866 - Weikang Wang, Jing Liu, Yuting Su, Weizhi Nie:
Efficient Spatio-Temporal Video Grounding with Semantic-Guided Feature Decomposition. 4867-4876 - Jiale Lu, Lianggangxu Chen, Youqi Song, Shaohui Lin, Changbo Wang, Gaoqi He:
Prior Knowledge-driven Dynamic Scene Graph Generation with Causal Inference. 4877-4885 - Junwen Chen, Jie Zhu, Yu Kong:
ATM: Action Temporality Modeling for Video Question Answering. 4886-4895 - Shaoxiang Guo, Qing Cai, Lin Qi, Junyu Dong:
CLIP-Hand3D: Exploiting 3D Hand Pose Estimation via Context-Aware Prompting. 4896-4907 - Ying Yang, Mulin Chen, Xuelong Li:
A Multitask Framework for Graffiti-to-Image Translation. 4908-4916 - Zihao Wang, Weichen Zhang, Weihong Bao, Fei Long, Chun Yuan:
Adaptive Contrastive Learning for Learning Robust Representations under Label Noise. 4917-4927 - Yunyi Xuan, Weijie Chen, Shicai Yang, Di Xie, Luojun Lin, Yueting Zhuang:
Distilling Vision-Language Foundation Models: A Data-Free Approach via Prompt Diversification. 4928-4938 - Yanzhe Chen, Huasong Zhong, Xiangteng He, Yuxin Peng, Lele Cheng:
Real20M: A Large-scale E-commerce Dataset for Cross-domain Retrieval. 4939-4948 - Dongsheng Xu, Wenye Zhao, Yi Cai, Qingbao Huang:
Zero-TextCap: Zero-shot Framework for Text-based Image Captioning. 4949-4957 - Zhaoxin Wang, Handing Wang, Cong Tian, Yaochu Jin:
Adversarial Training of Deep Neural Networks Guided by Texture and Structural Information. 4958-4967 - Xu Gu, Yuchong Sun, Feiyue Ni, Shizhe Chen, Xihua Wang, Ruihua Song, Boyuan Li, Xiang Cao:
TeViS: Translating Text Synopses to Video Storyboards. 4968-4979 - Nan Xi, Jingjing Meng, Junsong Yuan:
Chain-of-Look Prompting for Verb-centric Surgical Triplet Recognition in Endoscopic Videos. 5007-5016 - Wencan Huang, Daizong Liu, Wei Hu:
Dense Object Grounding in 3D Scenes. 5017-5026 - Xiaoxuan He, Siming Fu, Xinpeng Ding, Yuchen Cao, Hualiang Wang:
Uniformly Distributed Category Prototype-Guided Vision-Language Framework for Long-Tail Recognition. 5027-5037 - Kanzhi Cheng, Wenpo Song, Zheng Ma, Wenhao Zhu, Zixuan Zhu, Jianbing Zhang:
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model. 5038-5047 - Yue Wang, Jinlong Peng, Jiangning Zhang, Ran Yi, Liang Liu, Yabiao Wang, Chengjie Wang:
Toward High Quality Facial Representation Learning. 5048-5058 - Zikai Gao, Peng Qiao, Yong Dou:
HAAN: Human Action Aware Network for Multi-label Temporal Action Detection. 5059-5069 - Baoli Sun, Xinchen Ye, Zhihui Wang, Haojie Li, Zhiyong Wang:
Exploring Coarse-to-Fine Action Token Localization and Interaction for Fine-grained Video Action Recognition. 5070-5078 - Zhe Wang, Jiaoyan Guan, Mengping Yang, Ting Xiao, Ziqiu Chi:
Semantic-Aware Generator and Low-level Feature Augmentation for Few-shot Image Generation. 5079-5088 - Bowen Yuan, Sisi You, Bing-Kun Bao:
Self-PT: Adaptive Self-Prompt Tuning for Low-Resource Visual Question Answering. 5089-5098 - Ping Wang, Xin Yuan:
SAUNet: Spatial-Attention Unfolding Network for Image Compressive Sensing. 5099-5108 - Lin Deng, Yuzhong Zhong, Maoning Wang, Jianwei Zhang:
CONICA: A Contrastive Image Captioning Framework with Robust Similarity Learning. 5109-5119 - Zikang Liu, Sihan Chen, Longteng Guo, Handong Li, Xingjian He, Jing Liu:
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner. 5120-5131 - Jiali Chen, Zhenjun Guo, Jiayuan Xie, Yi Cai, Qing Li:
Deconfounded Visual Question Generation with Causal Inference. 5132-5142 - Jing Zhao, Heliang Zheng, Chaoyue Wang, Long Lan, Wanrong Huang, Wenjing Yang:
Null-text Guidance in Diffusion Models is Secretly a Cartoon-style Creator. 5143-5152 - Wenqing Wang, Kaifeng Gao, Yawei Luo, Tao Jiang, Fei Gao, Jian Shao, Jianwen Sun, Jun Xiao:
Triple Correlations-Guided Label Supplementation for Unbiased Video Scene Graph Generation. 5153-5163 - Shuo Yang, Zirui Shang, Xinxiao Wu:
Probability Distribution Based Frame-supervised Language-driven Action Localization. 5164-5173 - Yaoyuan Liang, Zhao Yang, Yansong Tang, Jiashuo Fan, Ziran Li, Jingang Wang, Philip H. S. Torr, Shao-Lun Huang:
LUNA: Language as Continuing Anchors for Referring Expression Comprehension. 5174-5184 - Xuming Hu, Junzhe Chen, Aiwei Liu, Shiao Meng, Lijie Wen, Philip S. Yu:
Prompt Me Up: Unleashing the Power of Alignments for Multimodal Entity and Relation Extraction. 5185-5194 - Xiao Liang, Di Wang, Quan Wang, Bo Wan, Lingling An, Lihuo He:
Language-Guided Visual Aggregation Network for Video Question Answering. 5195-5203 - Jue Chen, Huan Yuan, Jianchao Tan, Bin Chen, Chengru Song, Di Zhang:
Resource Constrained Model Compression via Minimax Optimization for Spiking Neural Networks. 5204-5213 - Huimin Huang, Yawen Huang, Shiao Xie, Lanfen Lin, Ruofeng Tong, Yen-Wei Chen, Yuexiang Li, Yefeng Zheng:
Semi-Supervised Convolutional Vision Transformer with Bi-Level Uncertainty Estimation for Medical Image Segmentation. 5214-5222 - Qian Yang, Qian Chen, Wen Wang, Baotian Hu, Min Zhang:
Enhancing Multi-modal Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation. 5223-5234 - Linbo Wang, Jing Wu, Xianyong Fang, Zhengyi Liu, Chenjie Cao, Yanwei Fu:
Local Consensus Enhanced Siamese Network with Reciprocal Loss for Two-view Correspondence Learning. 5235-5243 - Rui Cao, Ming Shan Hee, Adriel Kuek, Wen-Haw Chong, Roy Ka-Wei Lee, Jing Jiang:
Pro-Cap: Leveraging a Frozen Vision-Language Model for Hateful Meme Detection. 5244-5252 - Tiantian Gong, Guodong Du, Junsheng Wang, Yongkang Ding, Liyan Zhang:
Prototype-guided Cross-modal Completion and Alignment for Incomplete Text-based Person Re-identification. 5253-5261 - Yuanhao Zhai, Mingzhen Huang, Tianyu Luan, Lu Dong, Ifeoma Nwogu, Siwei Lyu, David S. Doermann, Junsong Yuan:
Language-guided Human Motion Synthesis with Atomic Actions. 5262-5271 - Yuan Zhang, Weihua Chen, Yichen Lu, Tao Huang, Xiuyu Sun, Jian Cao:
Avatar Knowledge Distillation: Self-ensemble Teacher Paradigm with Uncertainty. 5272-5280 - Yu Zhao, Hao Fei, Yixin Cao, Bobo Li, Meishan Zhang, Jianguo Wei, Min Zhang, Tat-Seng Chua:
Constructing Holistic Spatio-Temporal Scene Graph for Video Semantic Role Labeling. 5281-5291 - Ziqiao Peng, Yihao Luo, Yue Shi, Hao Xu, Xiangyu Zhu, Hongyan Liu, Jun He, Zhaoxin Fan:
SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces. 5292-5301 - Yujie Zhou, Wenwen Qiang, Anyi Rao, Ning Lin, Bing Su, Jiaqi Wang:
Zero-shot Skeleton-based Action Recognition via Mutual Information Estimation and Maximization. 5302-5310 - Guojin Zhong, Jin Yuan, Pan Wang, Kailun Yang, Weili Guan, Zhiyong Li:
Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment for Markup-to-Image Generation. 5311-5320 - Jiafeng Mao, Xueting Wang, Kiyoharu Aizawa:
Guided Image Synthesis via Initial Image Editing in Diffusion Model. 5321-5329 - Song Yang, Qiang Li, Wenhui Li, Min Liu, Xuanya Li, Anan Liu:
External Knowledge Dynamic Modeling for Image-text Retrieval. 5330-5338 - Qiang Wang, Junlong Du, Ke Yan, Shouhong Ding:
Seeing in Flowing: Adapting CLIP for Action Recognition with Motion Prompts Learning. 5339-5347 - Zhou Zhou, Jiahao Chao, Jiali Gong, Hongfan Gao, Zhenbing Zeng, Zhengfeng Yang:
Enhancing Real-Time Super Resolution with Partial Convolution and Efficient Variance Attention. 5348-5357 - Binyi Su, Hua Zhang, Zhong Zhou:
HSIC-based Moving Weight Averaging for Few-Shot Open-Set Object Detection. 5358-5369 - Zhihong Chen, Zilei Wang, Yixin Zhang:
Exploiting Low-confidence Pseudo-labels for Source-free Object Detection. 5370-5379 - Runnan Chen, Xinge Zhu, Nenglun Chen, Wei Li, Yuexin Ma, Ruigang Yang, Wenping Wang:
Bridging Language and Geometric Primitives for Zero-shot Point Cloud Segmentation. 5380-5388 - Yuehui Han, Jiaxin Chen, Jianjun Qian, Jin Xie:
Graph Spectral Perturbation for 3D Point Cloud Contrastive Learning. 5389-5398 - Jiahua Rao, Zifei Shan, Longpo Liu, Yao Zhou, Yuedong Yang:
Retrieval-based Knowledge Augmented Vision Language Pre-training. 5399-5409 - Yulin Jin, Xiaoyu Zhang, Jian Lou, Xiaofeng Chen:
ACQ: Few-shot Backdoor Defense via Activation Clipping and Quantizing. 5410-5418 - Yi Tang, Hiroshi Kawasaki, Takafumi Iwaguchi:
Underwater Image Enhancement by Transformer-based Diffusion Model with Non-uniform Sampling for Skip Strategy. 5419-5427 - Zhenghan Chen, Changzeng Fu, Ruoxue Wu, Ye Wang, Xunzhu Tang, Xiaoxuan Liang:
LGFat-RGCN: Faster Attention with Heterogeneous RGCN for Medical ICD Coding Generation. 5428-5435 - Jianlong Yuan, Jinchao Ge, Zhibin Wang, Yifan Liu:
Semi-supervised Semantic Segmentation with Mutual Knowledge Distillation. 5436-5444 - Tao Niu, Yihang Lou, Yinglei Teng, Jianzhong He, Yiding Liu:
Shift Pruning: Equivalent Weight Pruning for CNN via Differentiable Shift Operator. 5445-5454 - Shuman Fang, Shuai Liu, Jie Li, Guannan Jiang, Xianming Lin, Rongrong Ji:
Improving Human-Object Interaction Detection via Virtual Image Learning. 5455-5463 - Bo Zhang, Jian Wang, Hui Ma, Bo Xu, Hongfei Lin:
ZRIGF: An Innovative Multimodal Framework for Zero-Resource Image-Grounded Dialogue Generation. 5464-5473 - Borui Jiang, Yadong Mu:
Diffused Fourier Network for Video Action Segmentation. 5474-5483 - Rui Xu, Yong Luo, Han Hu, Bo Du, Jialie Shen, Yonggang Wen:
Rethinking the Localization in Weakly Supervised Object Localization. 5484-5494 - Jiahua Xiao, Yantao Ji, Xing Wei:
Hyperspectral Image Denoising with Spectrum Alignment. 5495-5503 - Zilin Du, Yunxin Li, Xu Guo, Yidan Sun, Boyang Li:
Training Multimedia Event Extraction With Generated Images and Captions. 5504-5513 - Xixi Nie, Bo Hu, Xinbo Gao, Leida Li, Xiaodan Zhang, Bin Xiao:
BMI-Net: A Brain-inspired Multimodal Interaction Network for Image Aesthetic Assessment. 5514-5522 - Sindhu B. Hegde, Rudrabha Mukhopadhyay, C. V. Jawahar, Vinay P. Namboodiri:
Towards Accurate Lip-to-Speech Synthesis in-the-Wild. 5523-5531 - Yicheng Song, Shuyong Gao, Haozhe Xing, Yiting Cheng, Yan Wang, Wenqiang Zhang:
Towards End-to-End Unsupervised Saliency Detection with Self-Supervised Top-Down Context. 5532-5541 - Hanbing Liu, Jun-Yan He, Zhi-Qi Cheng, Wangmeng Xiang, Qize Yang, Wenhao Chai, Gaoang Wang, Xu Bao, Bin Luo, Yifeng Geng, Xuansong Xie:
PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation. 5542-5551 - Chunhui Zhang, Xin Sun, Yiqian Yang, Li Liu, Qiong Liu, Xi Zhou, Yanfeng Wang:
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment. 5552-5561 - Nan Li, Pijian Li, Dongsheng Xu, Wenye Zhao, Yi Cai, Qingbao Huang:
Scene-text Oriented Visual Entailment: Task, Dataset and Solution. 5562-5571 - Songhe Deng, Wei Zhuo, Jinheng Xie, Linlin Shen:
QA-CLIMS: Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation. 5572-5583 - Lele Lv, Qing Liu, Shichao Kan, Yixiong Liang:
Confidence-Aware Contrastive Learning for Semantic Segmentation. 5584-5593 - Ao Wang, Hui Chen, Zijia Lin, Zixuan Ding, Pengzhang Liu, Yongjun Bao, Weipeng Yan, Guiguang Ding:
Hierarchical Prompt Learning Using CLIP for Multi-label Classification with Single Positive Labels. 5594-5604 - Wenrui Li, Zhengyu Ma, Liang-Jian Deng, Penghong Wang, Jinqiao Shi, Xiaopeng Fan:
Reservoir Computing Transformer for Image-Text Retrieval. 5605-5613 - Gege Qi, Yuefeng Chen, Xiaofeng Mao, Binyuan Hui, Xiaodan Li, Rong Zhang, Hui Xue:
Model Inversion Attack via Dynamic Memory Learning. 5614-5622 - Zhiming Hu, Angela Ning Ye, Salar Hosseini Khorasgani, Iqbal Mohomed:
AdaCLIP: Towards Pragmatic Multimodal Video Retrieval. 5623-5633 - Zhenyang Li, Yangyang Guo, Kejie Wang, Xiaolin Chen, Liqiang Nie, Mohan S. Kankanhalli:
Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR. 5634-5644 - Keyu Tu, Zilei Wang, Junjie Li, Yixin Zhang:
Semi-supervised Domain Adaptation via Joint Contrastive Learning with Sensitivity. 5645-5654 - Xinzi Cao, Xiawu Zheng, Yunhang Shen, Ke Li, Jie Chen, Yutong Lu, Yonghong Tian:
LocLoc: Low-level Cues and Local-area Guides for Weakly Supervised Object Localization. 5655-5664 - Cong-Duy Nguyen, The-Anh Vu-Le, Thong Nguyen, Tho Quan, Anh Tuan Luu:
Expand BERT Representation with Visual Information via Grounded Language Learning with Multimodal Partial Alignment. 5665-5673 - Zheng Ma, Mianzhi Pan, Wenhan Wu, Kanzhi Cheng, Jianbing Zhang, Shujian Huang, Jiajun Chen:
Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models. 5674-5685 - Zhe Li, Laurence T. Yang, Xin Nie, Bocheng Ren, Xianjun Deng:
Enhancing Sentence Representation with Visually-supervised Multimodal Pre-training. 5686-5695 - Longzheng Wang, Chuang Zhang, Hongbo Xu, Yongxiu Xu, Xiaohan Xu, Siqi Wang:
Cross-modal Contrastive Learning for Multimodal Fake News Detection. 5696-5704 - Dingyi Yang, Hongyu Chen, Xinglin Hou, Tiezheng Ge, Yuning Jiang, Qin Jin:
Visual Captioning at Will: Describing Images and Videos Guided by a Few Stylized Sentences. 5705-5715 - Yinuo Jing, Chunyu Wang, Ruxu Zhang, Kongming Liang, Zhanyu Ma:
Category-Specific Prompts for Animal Action Recognition with Pretrained Vision-Language Models. 5716-5724 - Rui Xu, Le Hui, Yuehui Han, Jianjun Qian, Jin Xie:
Scene Graph Masked Variational Autoencoders for 3D Scene Generation. 5725-5733 - Shuo Huang, Zongxin Yang, Liangting Li, Yi Yang, Jia Jia:
AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion. 5734-5745 - Xu Bao, Zhi-Qi Cheng, Jun-Yan He, Wangmeng Xiang, Chenyang Li, Jingdong Sun, Hanbing Liu, Wei Liu, Bin Luo, Yifeng Geng, Xuansong Xie:
KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration. 5746-5755 - Zhiyu Jin, Hanyang Yu, Chen Haul, Linxiang Wang, Zuobin Zhu, Qiu Shen, Xun Cao:
WormTrack: Dataset and Benchmark for Multi-Object Tracking in Worm Crowds. 5756-5763 - Jinglei Zhang, Tiancheng Lin, Yi Xu, Kai Chen, Rui Zhang:
Relational Contrastive Learning for Scene Text Recognition. 5764-5775 - Yiting Liu, Liang Li, Beichen Zhang, Shan Huang, Zheng-Jun Zha, Qingming Huang:
MaTCR: Modality-Aligned Thought Chain Reasoning for Multimodal Task-Oriented Dialogue Generation. 5776-5785 - Xiaoyu Li, Xiaoxue Chen, Zuming Huang, Lele Xie, Jingdong Chen, Ming Yang:
Fine-grained Pseudo Labels for Scene Text Recognition. 5786-5795 - Jiachen Sun, Mark Ibrahim, Melissa Hall, Ivan Evtimov, Z. Morley Mao, Cristian Canton-Ferrer, Caner Hazirbas:
VPA: Fully Test-Time Visual Prompt Adaptation. 5796-5806 - Haonan Shi, Wenwen Pan, Zhou Zhao, Mingmin Zhang, Fei Wu:
Unsupervised Domain Adaptation for Referring Semantic Segmentation. 5807-5818 - Guangming Shi, Xuyang Li, Xuemei Xie, Mingxuan Yu, Chengwei Rao, Jiakai Luo:
OCSKB: An Object Component Sketch Knowledge Base for Fast 6D Pose Estimation. 5819-5827 - Hongbo Sun, Xiangteng He, Jiahuan Zhou, Yuxin Peng:
Fine-Grained Visual Prompt Learning of Vision-Language Models for Image Recognition. 5828-5836
Poster Session IV: Engaging Users with Multimedia -- Emotional and Social Signals
- Teng Sun, Juntong Ni, Wenjie Wang, Liqiang Jing, Yinwei Wei, Liqiang Nie:
General Debiasing for Multimodal Sentiment Analysis. 5861-5869 - Tuukka Ruotsalo, Kalle Mäkelä, Michiel M. A. Spapé, Luis A. Leiva:
Feeling Positive? Predicting Emotional Image Similarity from Brain Signals. 5870-5878 - Tongjie Pan, Yalan Ye, Hecheng Cai, Shudong Huang, Yang Yang, Guoqing Wang:
Multimodal Physiological Signals Fusion for Online Emotion Recognition. 5879-5888 - Hanwei Liu, Huiling Cai, Qingcheng Lin, Xuefeng Li, Hui Xiao:
Learning from More: Combating Uncertainty Cross-multidomain for Facial Expression Recognition. 5889-5898 - Yizhuo Lu, Changde Du, Qiongyi Zhou, Dianpeng Wang, Huiguang He:
MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion. 5899-5908 - Jaeho Yoon, Jaewoo Park, Kensuke Wagata, Hojin Park, Andrew Beng Jin Teoh:
Pretrained Implicit-Ensemble Transformer for Open-Set Authentication on Multimodal Mobile Biometrics. 5909-5922 - Bobo Li, Hao Fei, Lizi Liao, Yu Zhao, Chong Teng, Tat-Seng Chua, Donghong Ji, Fei Li:
Revisiting Disentanglement and Fusion on Modality and Context in Conversational Multimodal Emotion Recognition. 5923-5934 - Yiwei Ru, Peipei Li, Muyi Sun, Yunlong Wang, Kunbo Zhang, Qi Li, Zhaofeng He, Zhenan Sun:
Sensing Micro-Motion Human Patterns using Multimodal mmRadar and Video Signal for Affective and Psychological Intelligence. 5935-5946 - Yunxiao Wang, Meng Liu, Zhe Li, Yupeng Hu, Xin Luo, Liqiang Nie:
Unlocking the Power of Multimodal Learning for Emotion Recognition in Conversation. 5947-5955 - Jiaxin Ye, Yujie Wei, Xin-Cheng Wen, Chenglong Ma, Zhizhong Huang, Kunhong Liu, Hongming Shan:
Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Recognition. 5956-5965 - Yuchen Liu, Haoyu Zhang, Shichao Liu, Xiang Yin, Zejun Ma, Qin Jin:
Emotionally Situated Text-to-Speech Synthesis in User-Agent Conversation. 5966-5974 - Wei-Bang Jiang, Xuan-Hao Liu, Wei-Long Zheng, Bao-Liang Lu:
Multimodal Adaptive Emotion Transformer with Flexible Modality Inputs on A Novel Dataset with Continuous Labels. 5975-5984 - Ming Jin, Jinpeng Li:
Graph to Grid: Learning Deep Representations for Multimodal Emotion Recognition. 5985-5993 - Shihao Zou, Xianying Huang, Xudong Shen:
Multimodal Prompt Transformer with Hybrid Contrastive Learning for Emotion Recognition in Conversation. 5994-6003 - Minh Tran, Yelin Kim, Che-Chun Su, Cheng-Hao Kuo, Mohammad Soleymani:
SAAML: A Framework for Semi-supervised Affective Adaptation via Metric Learning. 6004-6015 - Tian-Yu Xiang, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Hong-Jun Yang, Zhen-Qiu Feng, Mei-Jiang Gui, Hao Li, De-Xing Huang, Zeng-Guang Hou:
Learning Shared Semantic Information from Multimodal Bio-signals for Brain-Muscle Modulation Analysis. 6016-6024 - Xiaoyu Chen, Changde Du, Qiongyi Zhou, Huiguang He:
Auditory Attention Decoding with Task-Related Multi-View Contrastive Learning. 6025-6033 - Jicai Pan, Shangfei Wang:
Progressive Visual Content Understanding Network for Image Emotion Classification. 6034-6044 - Xiaocui Yang, Shi Feng, Daling Wang, Yifei Zhang, Soujanya Poria:
Few-shot Multimodal Sentiment Analysis Based on Multimodal Probabilistic Fusion Prompts. 6045-6053 - Zhouan Zhu, Chenguang Li, Jicai Pan, Xin Li, Yufei Xiao, Yanan Chang, Feiyi Zheng, Shangfei Wang:
MEDIC: A Multimodal Empathy Dataset in Counseling. 6054-6062 - Yan Li, Liang Zhang, Xiangyuan Lan, Dongmei Jiang:
Towards Adaptable Graph Representation Learning: An Adaptive Multi-Graph Contrastive Transformer. 6063-6071 - Luojun Lin, Zhifeng Shen, Jia-Li Yin, Qipeng Liu, Yuanlong Yu, Weijie Chen:
MetaFBP: Learning to Learn High-Order Predictor for Personalized Facial Beauty Prediction. 6072-6080 - Yayue Deng, Jinlong Xue, Fengping Wang, Yingming Gao, Ya Li:
CMCU-CSS: Enhancing Naturalness via Commonsense-based Multi-modal Context Understanding in Conversational Speech Synthesis. 6081-6089 - Sidharth Anand, Naresh Kumar Devulapally, Sreyasee Das Bhattacharjee, Junsong Yuan:
Multi-label Emotion Analysis in Conversation via Multimodal Knowledge Distillation. 6090-6100 - Zhihe Zhao, Dongdong Weng, Hanzhi Guo, Jing Hou, Jixiang Zhou:
Facial Auto Rigging from 4D Expressions via Skinning Decomposition. 6101-6109 - Licai Sun, Zheng Lian, Bin Liu, Jianhua Tao:
MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic Facial Expression Recognition. 6110-6121 - Yucheng Liu, Ziyu Jia, Haichao Wang:
EmotionKD: A Cross-Modal Knowledge Distillation Framework for Emotion Recognition Based on Physiological Signals. 6122-6131 - Zaijing Li, Ting-En Lin, Yuchuan Wu, Meng Liu, Fengxiao Tang, Ming Zhao, Yongbin Li:
UniSA: Unified Generative Framework for Sentiment Analysis. 6132-6142 - Yi Wu, Shangfei Wang, Yanan Chang:
Patch-Aware Representation Learning for Facial Expression Recognition. 6143-6151 - Hao Yu, Danielle A. Allessio, Will Lee, William Rebelsky, Frank Sylvia, Tom Murray, John J. Magee, Ivon Arroyo, Beverly P. Woolf, Sarah Adel Bargal, Margrit Betke:
COVES: A Cognitive-Affective Deep Model that Personalizes Math Problem Difficulty in Real Time and Improves Student Engagement with an Online Tutor. 6152-6160 - Ravikiran Parameshwara, Ibrahim Radwan, Akshay Asthana, Iman Abbasnejad, Ramanathan Subramanian, Roland Goecke:
Efficient Labelling of Affective Video Datasets via Few-Shot & Multi-Task Contrastive Learning. 6161-6170
Poster Session V: Engaging Users with Multimedia -- Multimedia Search and Recommendation
- Tianyu Chang, Xun Yang, Xin Luo, Wei Ji, Meng Wang:
Learning Style-Invariant Robust Representation for Generalizable Visual Instance Retrieval. 6171-6180 - Xiyue Gao, Zhuoqi Ma, Jiangtao Cui, Xiaofang Xia, Cai Xu:
Hierarchical Category-Enhanced Prototype Learning for Imbalanced Temporal Recommendation. 6181-6189 - Zhongxuan Han, Chaochao Chen, Xiaolin Zheng, Weiming Liu, Jun Wang, Wenjie Cheng, Yuyuan Li:
In-processing User Constrained Dominant Sets for User-Oriented Fairness in Recommender Systems. 6190-6201 - Shuanglin Yan, Neng Dong, Jun Liu, Liyan Zhang, Jinhui Tang:
Learning Comprehensive Representations with Richer Self for Text-to-Image Person Re-Identification. 6202-6211 - Huafeng Liu, Mingjie Zhou, Liping Jing, Michael K. Ng:
Doubly Intention Learning for Cold-start Recommendation with Uncertainty-aware Stochastic Meta Process. 6212-6222 - Hongru Liang, Jingyao Liu, Yuanxin Xiang, Jiachen Du, Lanjun Zhou, Shushen Pan, Wenqiang Lei:
DiVa: An Iterative Framework to Harvest More Diverse and Valid Labels from User Comments for Music. 6223-6233 - Zhenghong Lin, Yanchao Tan, Yunfei Zhan, Weiming Liu, Fan Wang, Chaochao Chen, Shiping Wang, Carl Yang:
Contrastive Intra- and Inter-Modality Generation for Enhancing Incomplete Multimedia Recommendation. 6234-6242 - Weiming Liu, Xiaolin Zheng, Chaochao Chen, Mengling Hu, Xinting Liao, Fan Wang, Yanchao Tan, Dan Meng, Jun Wang:
Differentially Private Sparse Mapping for Privacy-Preserving Cross Domain Recommendation. 6243-6252 - Zexian Yang, Dayan Wu, Wanqian Zhang, Bo Li, Weiping Wang:
Handling Label Uncertainty for Camera Incremental Person Re-Identification. 6253-6263 - Wenhui Li, Xinqi Su, Dan Song, Lanjun Wang, Kun Zhang, An-An Liu:
Towards Deconfounded Image-Text Matching with Causal Inference. 6264-6273 - Yu Shang, Chen Gao, Jiansheng Chen, Depeng Jin, Huimin Ma, Yong Li:
Enhancing Adversarial Robustness of Multi-modal Recommendation via Modality Balancing. 6274-6282 - Yang Zhang, Songhe Feng:
Enhancing Domain-Invariant Parts for Generalized Zero-Shot Learning. 6283-6291 - Shenshen Li, Xing Xu, Yang Yang, Fumin Shen, Yijun Mo, Yujie Li, Heng Tao Shen:
DCEL: Deep Cross-modal Evidential Learning for Text-Based Person Retrieval. 6292-6300 - Ying Li, Chunming Guan, Rui Cai, Ye Erwan, Ding Yuxiang, Jiaquan Gao:
Tran-GCN: Multi-label Pattern Image Retrieval via Transformer Driven Graph Convolutional Network. 6301-6310 - Ziqi Zhou, Shengshan Hu, Minghui Li, Hangtao Zhang, Yechao Zhang, Hai Jin:
AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning. 6311-6320 - Jiajie Su, Chaochao Chen, Zibin Lin, Xi Li, Weiming Liu, Xiaolin Zheng:
Personalized Behavior-Aware Transformer for Multi-Behavior Sequential Recommendation. 6321-6331 - Jinhu Lu, Guohao Sun, Xiu Fang, Jian Yang, Wei He:
A Contrastive Learning Framework for Dual-Target Cross-Domain Recommendation. 6332-6339 - Chong-Yu Zhang, Xin Luo, Yu-Wei Zhan, Peng-Fei Zhang, Zhen-Duo Chen, Yongxin Wang, Xun Yang, Xin-Shun Xu:
Self-Distillation Dual-Memory Online Hashing with Hash Centers for Streaming Data Retrieval. 6340-6349 - Zhenpeng Song, Qinliang Su, Jiayang Chen:
Unsupervised Hashing with Contrastive Learning by Exploiting Similarity Knowledge and Hidden Structure of Data. 6350-6358 - Xinfeng Dong, Longfei Han, Dingwen Zhang, Li Liu, Junwei Han, Huaxiang Zhang:
Giving Text More Imagination Space for Image-text Matching. 6359-6368 - Wei Yang, Zhengru Fang, Tianle Zhang, Shiguang Wu, Chi Lu:
Modal-aware Bias Constrained Contrastive Learning for Multimodal Recommendation. 6369-6378 - Wenshuo Zhao, Jingkuan Song, Shengming Yuan, Lianli Gao, Yang Yang, Hengtao Shen:
Precise Target-Oriented Attack against Deep Hashing-based Retrieval. 6379-6389 - Hao Wei, Shuhui Wang, Zhe Xue, Shengbo Chen, Qingming Huang:
Conversational Composed Retrieval with Iterative Sequence Refinement. 6390-6399 - Yulu Wang, Pengwen Dai, Xiaojun Jia, Zhitao Zeng, Rui Li, Xiaochun Cao:
Hi-SIGIR: Hierarchical Semantic-Guided Image-to-image Retrieval via Scene Graph. 6400-6409 - Shanshan Huang, Haoxuan Li, Qingsong Li, Chunyuan Zheng, Li Liu:
Pareto Invariant Representation Learning for Multimedia Recommendation. 6410-6419 - Jiaguo Yu, Yuming Shen, Haofeng Zhang:
Hashing One With All. 6420-6431 - Aozhu Chen, Ziyuan Wang, Chengbo Dong, Kaibin Tian, Ruixiang Zhao, Xun Liang, Zhanhui Kang, Xirong Li:
ChinaOpen: A Dataset for Open-world Multimodal Learning. 6432-6440 - Panwen Hu, Nan Xiao, Feifei Li, Yongquan Chen, Rui Huang:
A Reinforcement Learning-Based Automatic Video Editing Method Using Pre-trained Vision-Language Model. 6441-6450 - Jianyang Zhai, Xiawu Zheng, Chang-Dong Wang, Hui Li, Yonghong Tian:
Knowledge Prompt-tuning for Sequential Recommendation. 6451-6461 - Wenfeng Liu, Xudong Wang, Lei Tan, Yan Zhang, Pingyang Dai, Yongjian Wu, Rongrong Ji:
Learning Occlusion Disentanglement with Fine-grained Localization for Occluded Person Re-identification. 6462-6471 - He Zhang, Ying Sun, Weiyu Guo, Yafei Liu, Haonan Lu, Xiaodong Lin, Hui Xiong:
Interactive Interior Design Recommendation via Coarse-to-fine Multimodal Reinforcement Learning. 6472-6480 - Tinghui Zhu, Jingping Liu, Jiaqing Liang, Haiyun Jiang, Yanghua Xiao, Zongyu Wang, Rui Xie, Yunsen Xian:
Towards Visual Taxonomy Expansion. 6481-6490 - Wenzhe Du, Su Haoyang, Cam-Tu Nguyen, Jian Sun:
Enhancing Product Representation with Multi-form Interactions for Multimodal Conversational Recommendation. 6491-6500 - Yuan Sun, Dezhong Peng, Jian Dai, Zhenwen Ren:
Stepwise Refinement Short Hashing for Image Retrieval. 6501-6509 - Rui Yang, Shuang Wang, Huan Zhang, Siyuan Xu, Yanhe Guo, Xiutiao Ye, Biao Hou, Licheng Jiao:
Knowledge Decomposition and Replay: A Novel Cross-modal Image-Text Retrieval Continual Learning Method. 6510-6519 - Haiyang Xie, Zhengwei Yang, Huilin Zhu, Zheng Wang:
Striking a Balance: Unsupervised Cross-Domain Crowd Counting via Knowledge Diffusion. 6520-6529 - Hongzu Su, Jingjing Li, Fengling Li, Lei Zhu, Ke Lu, Yang Yang:
Task-Adversarial Adaptation for Multi-modal Recommendation. 6530-6538 - Zezhong Lv, Bing Su, Ji-Rong Wen:
Counterfactual Cross-modality Reasoning for Weakly Supervised Video Moment Localization. 6539-6547 - Jinpeng Wang, Ziyun Zeng, Yunxiao Wang, Yuting Wang, Xingyu Lu, Tianxiang Li, Jun Yuan, Rui Zhang, Hai-Tao Zheng, Shu-Tao Xia:
MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation. 6548-6557 - Xin Lu, Shikun Chen, Yichao Cao, Xin Zhou, Xiaobo Lu:
Attributes Grouping and Mining Hashing for Fine-Grained Image Retrieval. 6558-6566 - Fan Liu, Huilin Chen, Zhiyong Cheng, Liqiang Nie, Mohan S. Kankanhalli:
Semantic-Guided Feature Distillation for Multimodal Recommendation. 6567-6575 - Penghang Yu, Zhiyi Tan, Guanming Lu, Bing-Kun Bao:
Multi-View Graph Convolutional Network for Multimedia Recommendation. 6576-6585
Poster Session V: Engaging Users with Multimedia -- Summarization, Analytics, and Storytelling
- Yutong Wang, Hongteng Xu, Dixin Luo:
Self-supervised Video Summarization Guided by Semantic Inverse Optimal Transport. 6611-6622 - Zhuo Zhou, Wenxuan Liu, Danni Xu, Zheng Wang, Jian Zhao:
Uncovering the Unseen: Discover Hidden Intentions by Micro-Behavior Graph Reasoning. 6623-6633 - Jingqiu Li, Lanjun Wang, Jianlin He, Yongdong Zhang, Anan Liu:
Improving Rumor Detection by Class-based Adversarial Domain Adaptation. 6634-6642 - Peggy Tang, Kun Hu, Lei Zhang, Junbin Gao, Jiebo Luo, Zhiyong Wang:
TopicCAT: Unsupervised Topic-Guided Co-Attention Transformer for Extreme Multimodal Summarisation. 6643-6652 - Tao Yang, Fan Wang, Junfan Lin, Zhongang Qi, Yang Wu, Jing Xu, Ying Shan, Changwen Chen:
Toward Human Perception-Centric Video Thumbnail Generation. 6653-6664
Poster Session VI: Engaging Users with Multimedia -- Interactions and Quality of Experience
- Huimin Zeng, Weinong Wang, Xin Tao, Zhiwei Xiong, Yu-Wing Tai, Wenjie Pei:
Feature Decoupling-Recycling Network for Fast Interactive Segmentation. 6665-6675 - Shima Mohammadi, João Ascenso:
Predictive Sampling for Efficient Pairwise Subjective Image Quality Assessment. 6676-6684 - Quan Wang, Yanli Ren, Xinpeng Zhang, Guorui Feng:
Interactive Image Style Transfer Guided by Graffiti. 6685-6694 - Hongbo Liu, Mingda Wu, Kun Yuan, Ming Sun, Yansong Tang, Chuanchuan Zheng, Xing Wen, Xiu Li:
Ada-DQA: Adaptive Diverse Quality-aware Feature Acquisition for Video Quality Assessment. 6695-6704 - Lanjun Wang, Xinran Qiao, Yanwei Xie, Weizhi Nie, Yongdong Zhang, Anan Liu:
My Brother Helps Me: Node Injection Based Adversarial Attack on Social Bot Detection. 6705-6714 - Michela Testolina, Davi Lazzarotto, Rafael Rodrigues, Shima Mohammadi, João Ascenso, António M. G. Pinheiro, Touradj Ebrahimi:
On the Performance of Subjective Visual Quality Assessment Protocols for Nearly Visually Lossless Image Compression. 6715-6723 - Zan Gao, Xinglei Cui, Yibo Zhao, Tao Zhuo, Weili Guan, Meng Wang:
A Novel Temporal Channel Enhancement and Contextual Excavation Network for Temporal Action Localization. 6724-6733 - Jin Liu, Xi Wang, Xiaomeng Fu, Yesheng Chai, Cai Yu, Jiao Dai, Jizhong Han:
MFR-Net: Multi-faceted Responsive Listening Head Generation via Denoising Diffusion Model. 6734-6743 - Songlin Tang, Wenjie Pei, Xin Tao, Tanghui Jia, Guangming Lu, Yu-Wing Tai:
Scene-Generalizable Interactive Segmentation of Radiance Fields. 6744-6755 - Zhonghao Lin, Haihan Duan, Jiaye Li, Xinyao Sun, Wei Cai:
MetaCast: A Self-Driven Metaverse Announcer Architecture Based on Quality of Experience Evaluation Model. 6756-6764 - Wuyuan Xie, Shukang Wang, Rong Zhang, Miaohui Wang:
Visual Redundancy Removal of Composite Images via Multimodal Learning. 6765-6773 - Xiaoyu Ma, Chenxi Feng, Jiaojiao Wang, Qiang Lin, Suiyu Zhang, Jinchi Zhu, Xiaodiao Chen, Chang Liu, Dingguo Yu:
A Model-Agnostic Semantic-Quality Compatible Framework based on Self-Supervised Semantic Decoupling. 6774-6784 - Wei Xie, Haobo Jiang, Shuo Gu, Jin Xie:
Implicit Obstacle Map-driven Indoor Navigation Model for Robust Obstacle Avoidance. 6785-6793 - Hancheng Zhu, Zhiwen Shao, Yong Zhou, Guangcheng Wang, Pengfei Chen, Leida Li:
Personalized Image Aesthetics Assessment with Attribute-guided Fine-grained Feature Representation. 6794-6802 - Songtao Wang, Xiaoqi Wang, Hao Gao, Jian Xiong:
Non-Local Geometry and Color Gradient Aggregation Graph Model for No-Reference Point Cloud Quality Assessment. 6803-6810
Poster Session VII: Engaging Users with Multimedia -- Metaverse, Art and Culture
- Jionghao Wang, Ziyu Chen, Jun Ling, Rong Xie, Li Song:
360-Degree Panorama Generation from Few Unregistered NFoV Images. 6811-6821 - Haozhe Wu, Songtao Zhou, Jia Jia, Junliang Xing, Qi Wen, Xiang Wen:
Speech-Driven 3D Face Animation with Composite and Regional Facial Movements. 6822-6830 - Huiguo He, Tianfu Wang, Huan Yang, Jianlong Fu, Nicholas Jing Yuan, Jian Yin, Hongyang Chao, Qi Zhang:
Learning Profitable NFT Image Diffusions via Multiple Visual-Policy Guided Reinforcement Learning. 6831-6840 - Chaohui Yu, Qiang Zhou, Jingliang Li, Zhe Zhang, Zhibin Wang, Fan Wang:
Points-to-3D: Bridging the Gap between Sparse Points and Shape-Controllable Text-to-3D Generation. 6841-6850 - Ye Pan, Ruisi Zhang, Jingying Wang, Yu Ding, Kenny Mitchell:
Real-time Facial Animation for 3D Stylized Character with Emotion Dynamics. 6851-6859 - Haibo Yang, Yang Chen, Yingwei Pan, Ting Yao, Zhineng Chen, Tao Mei:
3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with 2D Diffusion Models. 6860-6868 - Xiaolei Diao, Daqian Shi, Jian Li, Lida Shi, Mingzhe Yue, Ruihua Qi, Chuntao Li, Hao Xu:
Toward Zero-shot Character Recognition: A Gold Standard Dataset with Radical-level Annotations. 6869-6877 - Haibo Chen, Lei Zhao, Jun Li, Jian Yang:
TSSAT: Two-Stage Statistics-Aware Transformation for Artistic Style Transfer. 6878-6887 - Jian-Jun Qiao, Jie Zhang, Xiao Wu, Yu-Pei Song, Wei Li:
CPNet: Cartoon Parsing with Pixel and Part Correlation. 6888-6897 - Liangchen Song, Liangliang Cao, Hongyu Xu, Kai Kang, Feng Tang, Junsong Yuan, Zhao Yang:
RoomDreamer: Text-Driven 3D Indoor Scene Synthesis with Coherent Geometry and Texture. 6898-6906 - Fengyuan Liu, Lingyun Yu, Hongtao Xie, Chuanbin Liu, Zhiguo Ding, Quanwei Yang, Yongdong Zhang:
High Fidelity Face Swapping via Semantics Disentanglement and Structure Enhancement. 6907-6917 - Zihao Huang, Min Shi, Chengxin Liu, Ke Xian, Zhiguo Cao:
SimHMR: A Simple Query-based Framework for Parameterized Human Mesh Reconstruction. 6918-6927 - Quan Wang, Sheng Li, Xinpeng Zhang, Guorui Feng:
Rethinking Neural Style Transfer: Generating Personalized and Watermarked Stylized Images. 6928-6937 - Xin Jin, Wu Zhou, Jinyu Wang, Duo Xu, Yongsen Zheng:
An Order-Complexity Aesthetic Assessment Model for Aesthetic-aware Music Recommendation. 6938-6947 - Jianwei Hu, Ningna Wang, Baorong Yang, Gang Chen, Xiaohu Guo, Bin Wang:
S3DS: Self-supervised Learning of 3D Skeletons from Single View Images. 6948-6958 - Kun Cheng, Mingrui Zhu, Nannan Wang, Guozhang Li, Xiaoyu Wang, Xinbo Gao:
Controllable Face Sketch-Photo Synthesis with Flexible Generative Priors. 6959-6968 - Xiaomeng Fu, Xi Wang, Jin Liu, Shuhui Wang, Jiao Dai, Jizhong Han:
CoP: Chain-of-Pose for Image Animation in Large Pose Changes. 6969-6977 - Sebin Lee, Daye Kim, Jungjin Lee:
The Effects of Viewing Formats and Song Genres on Audience Experiences in Virtual Avatar Concerts. 6978-6988 - Ray LC, Sijia Liu, Qiaosheng Lyu:
IN/ACTive: A Distance-Technology-Mediated Stage for Performer-Audience Telepresence and Environmental Control. 6989-6997 - Ruizhao Chen, Ye Pan, Zhigang Deng, Lili Wang, Lizhuang Ma:
Double Doodles: Sketching Animation in Immersive Environment With 3+6 DOFs Motion Gestures. 6998-7006 - Zhong Li, Liangchen Song, Zhang Chen, Xiangyu Du, Lele Chen, Junsong Yuan, Yi Xu:
Relit-NeuLF: Efficient Relighting and Novel View Synthesis via Neural 4D Light Field. 7007-7016
Poster Session VIII: Engaging Users with Multimedia -- Multimedia Applications
- Junbao Zhuo, Xingyu Zhao, Shuhao Cui, Qingming Huang, Shuhui Wang:
Adaptive Feature Swapping for Unsupervised Domain Adaptation. 7017-7028 - Xiaobo Shen, Yinfan Chen, Shirui Pan, Weiwei Liu, Yuhui Zheng:
Graph Convolutional Incomplete Multi-modal Hashing. 7029-7037 - Ruitao Chen, Guoyang Xie, Jiaqi Liu, Jinbao Wang, Ziqi Luo, Jinfan Wang, Feng Zheng:
EasyNet: An Easy Network for 3D Industrial Anomaly Detection. 7038-7046 - Yujuan Ding, P. Y. Mok, Yi Bin, Xun Yang, Zhiyong Cheng:
Modeling Multi-Relational Connectivity for Personalized Fashion Matching. 7047-7055 - Guancheng Chen, Xin Liu, Xing Xu, Yiu-Ming Cheung, Taihao Li:
Taking a Part for the Whole: An Archetype-agnostic Framework for Voice-Face Association. 7056-7064 - Kui Jiang, Wenxuan Liu, Zheng Wang, Xian Zhong, Junjun Jiang, Chia-Wen Lin:
DAWN: Direction-aware Attention Wavelet Network for Image Deraining. 7065-7074 - Honggu Liu, Xiaodan Li, Wenbo Zhou, Han Fang, Paolo Bestagini, Weiming Zhang, Yuefeng Chen, Stefano Tubaro, Nenghai Yu, Yuan He, Hui Xue:
BiFPro: A Bidirectional Facial-data Protection Framework against DeepFake. 7075-7084 - De Cheng, Xiaojian Huang, Nannan Wang, Lingfeng He, Zhihui Li, Xinbo Gao:
Unsupervised Visible-Infrared Person ReID by Collaborative Learning with Neighbor-Guided Label Refinement. 7085-7093 - Jinkang Guo, Zhibo Wan, Zhihan Lv:
Digital Twins Fuzzy System Based on Time Series Forecasting Model LFTformer. 7094-7100 - Harry Cheng, Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Mohan S. Kankanhalli:
Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration. 7101-7110 - Pan Mu, Hanning Xu, Zheyuan Liu, Zheng Wang, Sixian Chan, Cong Bai:
A Generalized Physical-knowledge-guided Dynamic Model for Underwater Image Enhancement. 7111-7120 - Yubin Wang, Huimin Yu, Yuming Yan, Shuyi Song, Biyang Liu, Yichong Lu:
Exploring Shape Embedding for Cloth-Changing Person Re-Identification via 2D-3D Correspondences. 7121-7130 - Chao Shuai, Jieming Zhong, Shuang Wu, Feng Lin, Zhibo Wang, Zhongjie Ba, Zhenguang Liu, Lorenzo Cavallaro, Kui Ren:
Locate and Verify: A Two-Stream Network for Improved Deepfake Detection. 7131-7142 - Yinyin Peng, Donghui Hu, Yaofei Wang, Kejiang Chen, Gang Pei, Weiming Zhang:
StegaDDPM: Generative Image Steganography based on Denoising Diffusion Probabilistic Model. 7143-7151 - Jianyang Shi, Haijun Zhang, Dongliang Zhou, Zhao Zhang:
Toward Intelligent Interactive Design: A Generation Framework Based on Cross-domain Fashion Elements. 7152-7163 - Danni Yang, Jiayi Ji, Xiaoshuai Sun, Haowei Wang, Yinan Li, Yiwei Ma, Rongrong Ji:
Semi-Supervised Panoptic Narrative Grounding. 7164-7174 - Jiahang Zhang, Lilang Lin, Jiaying Liu:
Prompted Contrast with Masked Motion Modeling: Towards Versatile 3D Action Representation Learning. 7175-7183 - Shuting Dong, Zhe Wu, Feng Lu, Chun Yuan:
Enhanced Image Deblurring: An Efficient Frequency Exploitation and Preservation Network. 7184-7193 - Han Yan, Haijun Zhang, Jie Hou, Jicong Fan, Zhao Zhang:
InspirNET: An Unsupervised Generative Adversarial Network with Controllable Fine-grained Texture Disentanglement for Fashion Generation. 7194-7204 - Yiwen Xu, Ruoyu Guo, Maurice Pagnucco, Yang Song:
Draw2Edit: Mask-Free Sketch-Guided Image Manipulation. 7205-7215 - Hongtao Wu, Yijun Yang, Haoyu Chen, Jingjing Ren, Lei Zhu:
Mask-Guided Progressive Network for Joint Raindrop and Rain Streak Removal in Videos. 7216-7225 - Ce Wang, Kun Shang, Haimiao Zhang, Shang Zhao, Dong Liang, S. Kevin Zhou:
Active CT Reconstruction with a Learned Sampling Policy. 7226-7235 - Yifan Gao, Jinpeng Lin, Min Zhou, Chuanbin Liu, Hongtao Xie, Tiezheng Ge, Yuning Jiang:
TextPainter: Multimodal Text Image Generation with Visual-harmony and Text-comprehension for Poster Design. 7236-7246 - Chang Liu, Lichen Wang, Yun Fu:
Rethinking Neighborhood Consistency Learning on Unsupervised Domain Adaptation. 7247-7254 - Zijun Deng, Xiangteng He, Yuxin Peng, Xiongwei Zhu, Lele Cheng:
MV-Diffusion: Motion-aware Video Diffusion Model. 7255-7263 - Yanglin Feng, Hongyuan Zhu, Dezhong Peng, Xi Peng, Peng Hu:
ROAD: Robust Unsupervised Domain Adaptation with Noisy Labels. 7264-7273 - Gaozhi Liu, Yichao Si, Zhenxing Qian, Xinpeng Zhang, Sheng Li, Wanli Peng:
WRAP: Watermarking Approach Robust Against Film-coating upon Printed Photographs. 7274-7282 - Baolong Liu, Tianyi Zheng, Peng Zheng, Daizong Liu, Xiaoye Qu, Junyu Gao, Jianfeng Dong, Xun Wang:
Lite-MKD: A Multi-modal Knowledge Distillation Framework for Lightweight Few-shot Action Recognition. 7283-7294 - Zijun Deng, Xiangteng He, Yuxin Peng:
Efficiency-optimized Video Diffusion Models. 7295-7303 - Gehui Li, Jinyuan Liu, Long Ma, Zhiying Jiang, Xin Fan, Risheng Liu:
Fearless Luminance Adaptation: A Macro-Micro-Hierarchical Transformer for Exposure Correction. 7304-7313 - Zengxi Zhang, Zhiying Jiang, Jinyuan Liu, Xin Fan, Risheng Liu:
WaterFlow: Heuristic Normalizing Flow for Underwater Image Enhancement and Beyond. 7314-7323 - Jinpu Zhang, Ziwen Li, Ruonan Wei, Yuehuan Wang:
Progressive Domain-style Translation for Nighttime Tracking. 7324-7334 - Guangyuan Li, Wei Xing, Lei Zhao, Zehua Lan, Zhanjie Zhang, Jiakai Sun, Haolin Yin, Huaizhong Lin, Zhijie Lin:
DuDoINet: Dual-Domain Implicit Network for Multi-Modality MR Image Arbitrary-scale Super-Resolution. 7335-7344 - Han Fang, Kejiang Chen, Yupeng Qiu, Jiayang Liu, Ke Xu, Chengfang Fang, Weiming Zhang, Ee-Chien Chang:
DeNoL: A Few-Shot-Sample-Based Decoupling Noise Layer for Cross-channel Watermarking Robustness. 7345-7353 - Luojun Lin, Zhifeng Shen, Zhishu Sun, Yuanlong Yu, Lei Zhang, Weijie Chen:
Parameter Exchange for Robust Dynamic Domain Generalization. 7354-7362 - Zichun Wang, Yulun Zhang, Debing Zhang, Ying Fu:
Recurrent Self-Supervised Video Denoising with Denser Receptive Field. 7363-7372 - Junxue Yang, Xin Liao:
Exploiting Fine-Grained DCT Representations for Hiding Image-Level Messages within JPEG Images. 7373-7382 - Qingshan Hou, Peng Cao, Jiaqi Wang, Xiaoli Liu, Jinzhu Yang, Osmar R. Zaïane:
A Reference-free Self-supervised Domain Adaptation Framework for Low-quality Fundus Image Enhancement. 7383-7393 - Wei Wan, Shengshan Hu, Minghui Li, Jianrong Lu, Longling Zhang, Leo Yu Zhang, Hai Jin:
A Four-Pronged Defense Against Byzantine Attacks in Federated Learning. 7394-7402 - Boyang Wang, Yan Wang, Qing Zhao, Junxiong Lin, Zeng Tao, Pinxue Guo, Zhaoyu Chen, Kaixun Jiang, Shaoqi Yan, Shuyong Gao, Wenqiang Zhang:
A Capture to Registration Framework for Realistic Image Super-Resolution in the Industry Environment. 7403-7412 - Xinhao Deng, Pingping Zhang, Wei Liu, Huchuan Lu:
Recurrent Multi-scale Transformer for High-Resolution Salient Object Detection. 7413-7423 - Yanru He, Kejiang Chen, Guoqiang Chen, Zehua Ma, Kui Zhang, Jie Zhang, Huanyu Bian, Han Fang, Weiming Zhang, Nenghai Yu:
ProTegO: Protect Text Content against OCR Extraction Attack. 7424-7434 - Chenxi Wang, Hongjun Wu, Zhi Jin:
FourLLIE: Boosting Low-Light Image Enhancement by Fourier Frequency Information. 7459-7469 - Teng Hu, Ran Yi, Haokun Zhu, Liang Liu, Jinlong Peng, Yabiao Wang, Chengjie Wang, Lizhuang Ma:
Stroke-based Neural Painting and Stylization with Dynamically Predicted Painting Region. 7470-7480 - Lingyi Hong, Wei Zhang, Shuyong Gao, Hong Lu, Wenqiang Zhang:
SimulFlow: Simultaneously Extracting Feature and Identifying Target for Unsupervised Video Object Segmentation. 7481-7490 - Yan Zhu, Junbao Zhuo, Bin Ma, Jiajia Geng, Xiaoming Wei, Xiaolin Wei, Shuhui Wang:
Orthogonal Temporal Interpolation for Zero-Shot Video Recognition. 7491-7501 - Jingcan Duan, Pei Zhang, Siwei Wang, Jingtao Hu, Hu Jin, Jiaxin Zhang, Haifang Zhou, Xinwang Liu:
Normality Learning-based Graph Anomaly Detection via Multi-Scale Contrastive Learning. 7502-7511 - Kangkang Zhou, Lijun Zhang, Feng Lu, Xiang-Dong Zhou, Yu Shi:
Efficient Hierarchical Multi-view Fusion Transformer for 3D Human Pose Estimation. 7512-7520 - Jiawei Wang, Zhanchang Ma, Da Cao, Yuquan Le, Junbin Xiao, Tat-Seng Chua:
Deconfounded Multimodal Learning for Spatio-temporal Video Grounding. 7521-7529 - Minyi Zhao, Shijie Xuyang, Jihong Guan, Shuigeng Zhou:
STIRER: A Unified Model for Low-Resolution Scene Text Image Recovery and Recognition. 7530-7539 - Jingwen Chen, Yingwei Pan, Ting Yao, Tao Mei:
ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors. 7540-7548 - Guangzhao Dai, Xiangbo Shu, Rui Yan, Peng Huang, Jinhui Tang:
Slowfast Diversity-aware Prototype Learning for Egocentric Action Recognition. 7549-7558 - Jiacheng Deng, Li Dong, Jiahao Chen, Diqun Yan, Rangding Wang, Dengpan Ye, Lingchen Zhao, Jinyu Tian:
Universal Defensive Underpainting Patch: Making Your Text Invisible to Optical Character Recognition. 7559-7568 - Zhiqing Hong, Chenye Cui, Rongjie Huang, Lichao Zhang, Jinglin Liu, Jinzheng He, Zhou Zhao:
UniSinger: Unified End-to-End Singing Voice Synthesis With Cross-Modality Information Matching. 7569-7579 - Yuqi Jiang, Chune Zhang, Shuo Jin, Jiao Liu, Jiapeng Wang:
CLG-INet: Coupled Local-Global Interactive Network for Image Restoration. 7580-7589 - Chen Liu, Peike Patrick Li, Xingqun Qi, Hu Zhang, Lincheng Li, Dadong Wang, Xin Yu:
Audio-Visual Segmentation by Exploring Cross-Modal Mutual Semantics. 7590-7598 - Junhong Gou, Siyu Sun, Jianfu Zhang, Jianlou Si, Chen Qian, Liqing Zhang:
Taming the Power of Diffusion Models for High-Quality Virtual Try-On with Appearance Flow. 7599-7607 - Xian Wang, Xiaoyu Mo, Lik-Hang Lee, Xiaoying Wei, Xiaofu Jin, Mingming Fan, Pan Hui:
Designing Loving-Kindness Meditation in Virtual Reality for Long-Distance Romantic Relationships. 7608-7617 - Wei Jiang, Jiayu Yang, Yongqi Zhai, Peirong Ning, Feng Gao, Ronggang Wang:
MLIC: Multi-Reference Entropy Model for Learned Image Compression. 7618-7627 - Mingrui Zhang, Ming Chen, Yan Zhou, Li Chen, Weihua Jian, Pengfei Wan:
Automatic Human Scene Interaction through Contact Estimation and Motion Adaptation. 7628-7637 - Gang Li, Xianzheng Ma, Zhao Wang, Hao Li, Qifei Zhang, Chao Wu:
When Masked Image Modeling Meets Source-free Unsupervised Domain Adaptation: Dual-Level Masked Network for Semantic Segmentation. 7638-7647 - Chuanming Wang, Huiyuan Fu, Huadong Ma:
Multi-Part Token Transformer with Dual Contrastive Learning for Fine-grained Image Classification. 7648-7656 - Junwei Zhao, Jianming Ye, Shiliang Zhang, Zhaofei Yu, Tiejun Huang:
Recognizing High-Speed Moving Objects with Spike Camera. 7657-7665 - Meiqi Cao, Rui Yan, Xiangbo Shu, Jiachao Zhang, Jinpeng Wang, Guo-Sen Xie:
MUP: Multi-granularity Unified Perception for Panoramic Activity Recognition. 7666-7675 - Hongyuan Wang, Lizhi Wang, Chang Chen, Xue Hu, Fenglong Song, Hua Huang:
Learning Spectral-wise Correlation for Spectral Super-Resolution: Where Similarity Meets Particularity. 7676-7685 - Kun Yang, Dingkang Yang, Jingyu Zhang, Hanqi Wang, Peng Sun, Liang Song:
What2comm: Towards Communication-efficient Collaborative Perception via Feature Decoupling. 7686-7695 - Hongjie Zhang, Yi Liu, Yali Wang, Limin Wang, Yu Qiao:
Learning Discriminative Feature Representation for Open Set Action Recognition. 7696-7705 - Shenghai Yuan, Jijia Chen, Jiaqi Li, Wenchao Jiang, Song Guo:
LHNet: A Low-cost Hybrid Network for Single Image Dehazing. 7706-7717 - Songlin Yang, Wei Wang, Jun Ling, Bo Peng, Xu Tan, Jing Dong:
Context-Aware Talking-Head Video Editing. 7718-7727 - Euihyeok Lee, Chulhong Min, Jaeseung Lee, Jin Yu, Seungwoo Kang:
GrooveMeter: Enabling Music Engagement-aware Apps by Detecting Reactions to Daily Music Listening via Earable Sensing. 7728-7736 - Xianghao Zang, Wei Gao, Ge Li, Han Fang, Chao Ban, Zhongjiang He, Hao Sun:
A Baseline Investigation: Transformer-based Cross-view Baseline for Text-based Person Search. 7737-7746 - Pengyuan Lyu, Weihong Ma, Hongyi Wang, Yuechen Yu, Chengquan Zhang, Kun Yao, Yang Xue, Jingdong Wang:
GridFormer: Towards Accurate Table Structure Recognition via Grid Prediction. 7747-7757 - Yingchi Liu, Zhu Liu, Long Ma, Jinyuan Liu, Xin Fan, Zhongxuan Luo, Risheng Liu:
Bilevel Generative Learning for Low-Light Vision. 7758-7766 - Maizhen Ning, Qiu-Feng Wang, Kaizhu Huang, Xiaowei Huang:
A Symbolic Characters Aware Model for Solving Geometry Problems. 7767-7775 - Haoyu Wang, Haozhe Wu, Junliang Xing, Jia Jia:
Versatile Face Animator: Driving Arbitrary 3D Facial Avatar in RGBD Space. 7776-7784 - Xinmin Qiu, Congying Han, Zicheng Zhang, Bonan Li, Tiande Guo, Xuecheng Nie:
DiffBFR: Bootstrapping Diffusion Model for Blind Face Restoration. 7785-7795 - Mingrui Lao, Nan Pu, Zhun Zhong, Nicu Sebe, Michael S. Lew:
FedVQA: Personalized Federated Visual Question Answering over Heterogeneous Scenes. 7796-7807 - Guangyao Li, Wenxuan Hou, Di Hu:
Progressive Spatio-temporal Perception for Audio-Visual Question Answering. 7808-7816 - Yusheng Guo, Nan Zhong, Zhenxing Qian, Xinpeng Zhang:
Physical Invisible Backdoor Based on Camera Imaging. 7817-7825 - Zhihao Li, Kexue Fu, Haoran Wang, Manning Wang:
PI-NeRF: A Partial-Invertible Neural Radiance Fields for Pose Estimation. 7826-7836 - Mengping Yang, Zhe Wang, Wenyi Feng, Qian Zhang, Ting Xiao:
Improving Few-shot Image Generation by Structural Discrimination and Textural Modulation. 7837-7848 - Zhicong Zheng, Xinfeng Li, Chen Yan, Xiaoyu Ji, Wenyuan Xu:
The Silent Manipulator: A Practical and Inaudible Backdoor Attack against Speech Recognition Systems. 7849-7858 - Joo Chan Lee, Daniel Rho, Jong Hwan Ko, Eunbyung Park:
FFNeRV: Flow-Guided Frame-Wise Neural Representations for Videos. 7859-7870 - Hongming Luo, Fei Zhou, Zehong Zhou, Kin-Man Lam, Guoping Qiu:
Restoration of Multiple Image Distortions using a Semi-dynamic Deep Neural Network. 7871-7880 - Dongliang Zhou, Haijun Zhang, Jianghong Ma, Jicong Fan, Zhao Zhang:
FCBoost-Net: A Generative Network for Synthesizing Multiple Collocated Outfits via Fashion Compatibility Boosting. 7881-7889 - Fanda Fan, Chaoxu Guo, Litong Gong, Biao Wang, Tiezheng Ge, Yuning Jiang, Chunjie Luo, Jianfeng Zhan:
Hierarchical Masked 3D Diffusion Model for Video Outpainting. 7890-7900 - Runhua Jiang, Yongge Liu, Boyuan Zhang, Xu Chen, Deng Li, Yahong Han:
OraclePoints: A Hybrid Neural Representation for Oracle Character. 7901-7911 - Yuke Li, Jingkuan Song, Hao Ni, Heng Tao Shen:
Style-Controllable Generalized Person Re-identification. 7912-7921 - Hengchang Guo, Qilong Zhang, Junwei Luo, Feng Guo, Wenbin Zhang, Xiaodong Su, Minglei Li:
Practical Deep Dispersed Watermarking with Synchronization and Fusion. 7922-7932 - Wenxue Cui, Xingtao Wang, Xiaopeng Fan, Shaohui Liu, Chen Ma, Debin Zhao:
G2-DUN: Gradient Guided Deep Unfolding Network for Image Compressive Sensing. 7933-7942 - Zicong Luo, Sheng Li, Guobiao Li, Zhenxing Qian, Xinpeng Zhang:
Securing Fixed Neural Network Steganography. 7943-7951 - Yong Liu, Hang Dong, Boyang Liang, Songwei Liu, Qingji Dong, Kai Chen, Fangmin Chen, Lean Fu, Fei Wang:
Unfolding Once is Enough: A Deployment-Friendly Transformer Unit for Super-Resolution. 7952-7960 - Kuan Tian, Yonghang Guan, Jinxi Xiang, Jun Zhang, Xiao Han, Wei Yang:
Towards Real-Time Neural Video Codec for Cross-Platform Application Using Calibration Information. 7961-7970 - Chen Tang, Kai Ouyang, Zenghao Chai, Yunpeng Bai, Yuan Meng, Zhi Wang, Wenwu Zhu:
SEAM: Searching Transferable Mixed-Precision Quantization Policy through Large Margin Regularization. 7971-7980 - Guangyuan Li, Wei Xing, Lei Zhao, Zehua Lan, Jiakai Sun, Zhanjie Zhang, Quanwei Zhang, Huaizhong Lin, Zhijie Lin:
Self-Reference Image Super-Resolution via Pre-trained Diffusion Large Model and Window Adjustable Transformer. 7981-7992 - Shuting Xia, Tingyu Fan, Yiling Xu, Jenq-Neng Hwang, Zhu Li:
Learning Dynamic Point Cloud Compression via Hierarchical Inter-frame Block Matching. 7993-8003 - Bingchen Gong, Yuehao Wang, Xiaoguang Han, Qi Dou:
RecolorNeRF: Layer Decomposed Radiance Fields for Efficient Color Editing of 3D Scenes. 8004-8015 - Jianwen Sun, Fenghua Yu, Sannyuya Liu, Yawei Luo, Ruxia Liang, Xiaoxuan Shen:
Adversarial Bootstrapped Question Representation Learning for Knowledge Tracing. 8016-8025 - Hongwei Ren, Yue Zhou, Haotian Fu, Yulong Huang, Renjing Xu, Bojun Cheng:
TTPOINT: A Tensorized Point Cloud Network for Lightweight Action Recognition with Event Cameras. 8026-8034 - Kun Pan, Yifang Yin, Yao Wei, Feng Lin, Zhongjie Ba, Zhenguang Liu, Zhibo Wang, Lorenzo Cavallaro, Kui Ren:
DFIL: Deepfake Incremental Learning by Exploiting Domain-invariant Forgery Clues. 8035-8046 - Lei Liu, Zhihao Hu, Zhenghao Chen, Dong Xu:
ICMH-Net: Neural Image Compression Towards both Machine Vision and Human Vision. 8047-8056 - Meng Li, Yibo Shi, Jing Wang, Yunqi Huang:
High Visual-Fidelity Learned Video Compression. 8057-8066 - Guanghui Zhang, Ke Liu, Mengbai Xiao, Bingshu Wang, Vaneet Aggarwal:
An Intelligent Learning Approach to Achieve Near-Second Low-Latency Live Video Streaming under Highly Fluctuating Networks. 8067-8075 - Weisong Zhao, Xiangyu Zhu, Zhixiang He, Xiaoyu Zhang, Zhen Lei:
Cross-Architecture Distillation for Face Recognition. 8076-8085 - Zhijian Wu, Jun Li, Dingjiang Huang:
Separable Modulation Network for Efficient Image Super-Resolution. 8086-8094 - Yulin Zhang, Jiangqun Ni, Wenkang Su, Xin Liao:
A Novel Deep Video Watermarking Framework with Enhanced Robustness to H.264/AVC Compression. 8095-8104 - Yue Yuan, Rundong He, Zhongyi Han, Yilong Yin:
LHAct: Rectifying Extremely Low and High Activations for Out-of-Distribution Detection. 8105-8113 - Jinshui Hu, Hao Wu, Mingjun Chen, Chenyu Liu, Jiajia Wu, Shi Yin, Baocai Yin, Bing Yin, Cong Liu, Jun Du, Lirong Dai:
Handwritten Chemical Structure Image to Structure-Specific Markup Using Random Conditional Guided Decoder. 8114-8124 - Ting Zhang, Nanfeng Jiang, Hongxin Wu, Keke Zhang, Yuzhen Niu, Tiesong Zhao:
HCSD-Net: Single Image Desnowing with Color Space Transformation. 8125-8133 - Wenhao Li, Guangyang Wu, Wenyi Wang, Peiran Ren, Xiaohong Liu:
FastLLVE: Real-Time Low-Light Video Enhancement with Intensity-Aware Look-Up Table. 8134-8144 - Yuyang Yin, Dejia Xu, Chuangchuang Tan, Ping Liu, Yao Zhao, Yunchao Wei:
CLE Diffusion: Controllable Light Enhancement Diffusion Model. 8145-8156 - Pengfei Zhou, Weiqing Min, Yang Zhang, Jiajun Song, Ying Jin, Shuqiang Jiang:
SeeDS: Semantic Separable Diffusion Synthesizer for Zero-shot Food Detection. 8157-8166 - Liao Shen, Xingyi Li, Huiqiang Sun, Juewen Peng, Ke Xian, Zhiguo Cao, Guosheng Lin:
Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from a Single Image. 8167-8175 - Yang Xiao, Bo Duan, Mingwei Sun, Jingwei Huang:
LocalPose: Object Pose Estimation with Local Geometry Guidance. 8176-8184 - Xianghao Jiao, Yaohua Liu, Jiaxin Gao, Xinyuan Chu, Xin Fan, Risheng Liu:
PEARL: Preprocessing Enhanced Adversarial Robust Learning of Image Deraining for Semantic Segmentation. 8185-8194 - Yuchao Feng, Yanyan Shao, Honghui Xu, Jinshan Xu, Jianwei Zheng:
A Lightweight Collective-attention Network for Change Detection. 8195-8203 - Mengyi Wang, Xinxin Zhang, Yongshun Gong, Yilong Yin:
Personalized Single Image Reflection Removal Network through Adaptive Cascade Refinement. 8204-8213 - Weiyu Sun, Xinyu Zhang, Hao Lu, Ying Chen, Yun Ge, Xiaolin Huang, Jie Yuan, Yingcong Chen:
Resolve Domain Conflicts for Generalizable Remote Physiological Measurement. 8214-8224 - Yang Wei, Bin Xiao, Xiuli Bi, Zhuoran Ma, Yang Liu, Zhuo Ma:
Secondary Labeling: A Novel Labeling Strategy for Image Manipulation Detection. 8225-8232 - Qingliang Liu, Jiangqun Ni, Xianglei Hu:
Robust Image Steganography against General Scaling Attacks. 8233-8241 - Han Kim, Chunggi Lee, Junsoo Lee, Dohyun Kim, Kwangjin Lee, Moohyun Oh, Daesik Kim:
FlatGAN: A Holistic Approach for Robust Flat-Coloring in High-Definition with Understanding Line Discontinuity. 8242-8250 - Lin Zhu, Yunlong Zheng, Mengyue Geng, Lizhi Wang, Hua Huang:
Recurrent Spike-based Image Restoration under General Illumination. 8251-8260 - Shiao Xie, Ziwei Niu, Huimin Huang, Hao Sun, Rui Qin, Yen-Wei Chen, Lanfen Lin:
IS2Net: Intra-domain Semantic and Inter-domain Style Enhancement for Semi-supervised Medical Domain Generalization. 8285-8293 - Junbao Zhuo, Xingyu Zhao, Shuhui Wang, Huimin Ma, Qingming Huang:
Synthesizing Videos from Images for Image-to-Video Adaptation. 8294-8303 - Didi Zhu, Yinchuan Li, Yunfeng Shao, Jianye Hao, Fei Wu, Kun Kuang, Jun Xiao, Chao Wu:
Generalized Universal Domain Adaptation with Generative Flow Networks. 8304-8315 - Dongxia Huang, Weiqi Luo, Peijia Zheng, Jiwu Huang:
Automatic Asymmetric Embedding Cost Learning via Generative Adversarial Networks. 8316-8326 - Wen Yang, Jinjian Wu, Leida Li, Weisheng Dong, Guangming Shi:
Event-based Motion Deblurring with Modality-Aware Decomposition and Recomposition. 8327-8335 - Cong Huang, Jiahao Li, Lei Chu, Dong Liu, Yan Lu:
Disentangle Propagation and Restoration for Efficient Video Recovery. 8336-8345 - Zhen Zhao, Meng Zhao, Ye Liu, Di Yin, Luping Zhou:
Entropy-based Optimization on Individual and Global Predictions for Semi-Supervised Learning. 8346-8355 - Chenxi Wang, Zhi Jin:
Brighten-and-Colorize: A Decoupled Network for Customized Low-Light Image Enhancement. 8356-8366 - Panda Pan, Yang Zhao, Yuan Chen, Wei Jia, Zhao Zhang, Ronggang Wang:
Cross-view Resolution and Frame Rate Joint Enhancement for Binocular Video. 8367-8375 - Fanfan Ye, Bingyi Lu, Liang Ma, Qiaoyong Zhong, Di Xie:
Up to Thousands-fold Storage Saving: Towards Efficient Data-Free Distillation of Large-Scale Visual Classifiers. 8376-8386 - Menglin Wang, Xiaojin Gong:
Learning Intra and Inter-Camera Invariance for Isolated Camera Supervised Person Re-identification. 8387-8395 - Mingzhi Lyu, Yi Huang, Adams Wai-Kin Kong:
Adversarial Attack for Robust Watermark Protection Against Inpainting-based and Blind Watermark Removers. 8396-8405 - Junhong Lin, Shufan Pei, Bing Chen, Nanfeng Jiang, Wei Gao, Tiesong Zhao:
LDRM: Degradation Rectify Model for Low-light Imaging via Color-Monochrome Cameras. 8406-8414 - Shang Chai, Liansheng Zhuang, Fengying Yan, Zihan Zhou:
Two-stage Content-Aware Layout Generation for Poster Designs. 8415-8423 - Zhenbo Shi, Wei Yang, Zhenbo Xu, Zhidong Yu, Liusheng Huang:
Reinforcement Learning-based Adversarial Attacks on Object Detectors using Reward Shaping. 8424-8432 - Zhengwentai Sun, Yanghong Zhou, Honghong He, P. Y. Mok:
SGDiff: A Style Guided Diffusion Model for Fashion Synthesis. 8433-8442 - Zhengyan Sheng, Yang Ai, Yan-Nian Chen, Zhen-Hua Ling:
Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment. 8443-8452 - Zeyu Ma, Ziqiang Zheng, Jiwei Wei, Xiaoyong Wei, Yang Yang, Heng Tao Shen:
Open-Scenario Domain Adaptive Object Detection in Autonomous Driving. 8453-8462 - Run Wang, Jixing Ren, Boheng Li, Tianyi She, Wenhui Zhang, Liming Fang, Jing Chen, Lina Wang:
Free Fine-tuning: A Plug-and-Play Watermarking Scheme for Deep Neural Networks. 8463-8474 - Dan Zeng, Shanchuan Hong, Shuiwang Li, Qiaomu Shen, Bo Tang:
Data-Scarce Animal Face Alignment via Bi-Directional Cross-Species Knowledge Transfer. 8475-8485 - Chaoning Zhang, Philipp Benz, Adil Karjauv, In So Kweon, Choong Seon Hong:
Simple Techniques are Sufficient for Boosting Adversarial Transferability. 8486-8494 - Yuhang Zhao, Shanchen Pang, Zhihan Lv, Sheng Miao:
Augmented Digital Twins for Predictive Automatic Regulation and Fault Alarm in Sewage Plant. 8495-8503 - Siyue Yao, Mingjie Sun, Bingliang Li, Fengyu Yang, Junle Wang, Ruimao Zhang:
Dance with You: The Diversity Controllable Dancer Generation via Diffusion Models. 8504-8514 - Ji Zhang, Xiao Wu, Zhi-Qi Cheng, Qi He, Wei Li:
Improving Anomaly Segmentation with Multi-Granularity Cross-Domain Alignment. 8515-8524 - Qiutang Qi, Haonan Cheng, Yang Wang, Long Ye, Shaobin Li:
RD-FGFS: A Rule-Data Hybrid Framework for Fine-Grained Footstep Sound Synthesis from Visual Guidance. 8525-8533 - Lihua Lu, Hui Wei, Xin Jin, Yihao Zhang, Boyan Dong, Longteng Jiang, Xiaohui Zhang, Ruyang Li, Yaqian Zhao:
Aesthetics-Driven Virtual Time-Lapse Photography Generation. 8534-8542 - Zhenghao Chen, Lucas Relic, Roberto Azevedo, Yang Zhang, Markus Gross, Dong Xu, Luping Zhou, Christopher Schroers:
Neural Video Compression with Spatio-Temporal Cross-Covariance Transformers. 8543-8551 - Xuan Hai, Xin Liu, Yuan Tan, Qingguo Zhou:
SiFDetectCracker: An Adversarial Attack Against Fake Voice Detection Based on Speaker-Irrelative Features. 8552-8560 - Leo Shan, Wenzhang Zhou, Grace Zhao:
Incremental Few Shot Semantic Segmentation via Class-agnostic Mask Proposal and Language-driven Classifier. 8561-8570 - Raghav Jain, Apoorva Singh, Vivek Kumar Gangwar, Sriparna Saha:
AbCoRD: Exploiting multimodal generative approach for Aspect-based Complaint and Rationale Detection. 8571-8579 - Davide Morelli, Alberto Baldrati, Giuseppe Cartella, Marcella Cornia, Marco Bertini, Rita Cucchiara:
LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On. 8580-8589 - Yanli Ji, Lingfeng Ye, Huili Huang, Lijing Mao, Yang Zhou, Lingling Gao:
Localization-assisted Uncertainty Score Disentanglement Network for Action Quality Assessment. 8590-8597 - Kaixun Jiang, Lingyi Hong, Zhaoyu Chen, Pinxue Guo, Zeng Tao, Yan Wang, Wenqiang Zhang:
Exploring the Adversarial Robustness of Video Object Segmentation via One-shot Adversarial Attacks. 8598-8607 - Chris Lenart, Pegah Ahadian, Yuxin Yang, Simon Suo, Ashton Corsello, Karl W. Kosko, Qiang Guan:
Gaze Analysis System for Immersive 360° Video for Preservice Teacher Education. 8608-8616 - Shuo Jin, Meiqin Liu, Chao Yao, Chunyu Lin, Yao Zhao:
Kernel Dimension Matters: To Activate Available Kernels for Real-time Video Super-Resolution. 8617-8625 - Jinlong Fan, Jing Zhang, Zhi Hou, Dacheng Tao:
AniPixel: Towards Animatable Pixel-Aligned Human Avatar. 8626-8634 - Qifeng Lin, Luojun Lin, Yuanlong Yu, Gang Fu:
A Multiple Prediction Mechanisms Ensemble for Complex Remote Sensing Scenes. 8635-8643 - Siyuan Huang, Bo Zhang, Botian Shi, Hongsheng Li, Yikang Li, Peng Gao:
SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification. 8644-8652 - Ziyu Feng, Zheming Xu, Haina Qin, Congyan Lang, Bing Li, Weihua Xiong:
SMM: Self-supervised Multi-Illumination Color Constancy Model with Multiple Pretext Tasks. 8653-8661 - Shukai Wu, Yuhang Yang, Shuchang Xu, Weiming Liu, Xiao Yan, Sanyuan Zhang:
FlexIcon: Flexible Icon Colorization via Guided Images and Palettes. 8662-8673
Poster Session IX: Engaging Users with Multimedia -- Social-good, Fairness and Transparency
- Yanzhen Ren, Hongcheng Zhu, Liming Zhai, Zongkun Sun, Rubing Shen, Lina Wang:
Who is Speaking Actually? Robust and Versatile Speaker Traceability for Voice Conversion. 8674-8685 - Liping Yi, Gang Wang, Xiaoguang Liu, Zhuan Shi, Han Yu:
FedGH: Heterogeneous Federated Learning with Generalized Global Header. 8686-8696 - Jinqian Chen, Jihua Zhu, Qinghai Zheng:
Towards Fast and Stable Federated Learning: Confronting Heterogeneity via Knowledge Anchor. 8697-8706 - Maosen Li, Xurong Li, Kun Yu, Cheng Deng, Heng Huang, Feng Mao, Hui Xue, Minghao Li:
Spatio-Temporal Catcher: A Self-Supervised Transformer for Deepfake Video Detection. 8707-8718 - Shide Du, Zihan Fang, Shiyang Lan, Yanchao Tan, Manuel Günther, Shiping Wang, Wenzhong Guo:
Bridging Trustworthiness and Open-World Learning: An Exploratory Neural Approach for Enhancing Interpretability, Generalization, and Robustness. 8719-8729 - Peini Guo, Hong Liu, Jianbing Wu, Guoquan Wang, Tao Wang:
Semantic-aware Consistency Network for Cloth-changing Person Re-Identification. 8730-8739 - Xiaotian Wang, Shuo Liang, Zhifu Zhao, Xinyu Cui, Kai Chen, Xuanhang Xu:
Adaptive Spatio-Temporal Directed Graph Neural Network for Parkinson's Detection using Vertical Ground Reaction Force. 8740-8748 - Rui Zhang, Hongxia Wang, Mingshan Du, Hanqing Liu, Yang Zhou, Qiang Zeng:
UMMAFormer: A Universal Multimodal-adaptive Transformer Framework for Temporal Forgery Localization. 8749-8759 - Xiangming Gu, Wei Zeng, Ye Wang:
Elucidate Gender Fairness in Singing Voice Transcription. 8760-8769 - Yuyan Bu, Qiang Sheng, Juan Cao, Peng Qi, Danding Wang, Jintao Li:
Combating Online Misinformation Videos: Characterization, Detection, and Future Directions. 8770-8780 - Yuxuan Zhang, Lei Liu, Li Liu:
Cuing Without Sharing: A Federated Cued Speech Recognition Framework via Mutual Knowledge Distillation. 8781-8789 - Yuxuan Tan, Yuanman Li, Limin Zeng, Jiaxiong Ye, Wei Wang, Xia Li:
Multi-scale Target-Aware Framework for Constrained Splicing Detection and Localization. 8790-8798 - Muyao Niu, Zhuoxiao Li, Yifan Zhan, Huy H. Nguyen, Isao Echizen, Yinqiang Zheng:
Physics-Based Adversarial Attack on Near-Infrared Human Detector for Nighttime Surveillance Camera Systems. 8799-8807 - Shengtao Lou, Buyu Liu, Jun Bao, Jiajun Ding, Jun Yu:
Follow-me: Deceiving Trackers with Fabricated Paths. 8808-8818 - Lu Wei, Bin Liu, Jiujun He, Manxue Zhang, Yi Huang:
Autistic Spectrum Disorders Diagnose with Graph Neural Networks. 8819-8827 - Hui Wei, Hanxun Yu, Kewei Zhang, Zhixiang Wang, Jianke Zhu, Zheng Wang:
Moiré Backdoor Attack (MBA): A Novel Trigger for Pedestrian Detectors in the Physical World. 8828-8838 - Wenxuan Liu, Tianyao He, Chen Gong, Ning Zhang, Hua Yang, Junchi Yan:
Fine-Grained Music Plagiarism Detection: Revealing Plagiarists through Bipartite Graph Matching and a Comprehensive Large-Scale Dataset. 8839-8848 - Kui Zhang, Hang Zhou, Jie Zhang, Qidong Huang, Weiming Zhang, Nenghai Yu:
Ada3Diff: Defending against 3D Adversarial Point Clouds via Adaptive Diffusion. 8849-8859 - Yi Zhang, Jitao Sang, Junyang Wang, Dongmei Jiang, Yaowei Wang:
Benign Shortcut for Debiasing: Fair Visual Recognition via Intervention with Shortcut Features. 8860-8868 - Zhihao Yue, Jun Xia, Zhiwei Ling, Ming Hu, Ting Wang, Xian Wei, Mingsong Chen:
Model-Contrastive Learning for Backdoor Elimination. 8869-8880 - Shucheng Li, Runchuan Wang, Hao Wu, Sheng Zhong, Fengyuan Xu:
SIEGE: Self-Supervised Incremental Deep Graph Learning for Ethereum Phishing Scam Detection. 8881-8890 - Jinzhang Hu, Ruimin Hu, Zheng Wang, Dengshi Li, Junhang Wu, Lingfei Ren, Yilong Zang, Zijun Huang, Mei Wang:
Collaborative Fraud Detection: How Collaboration Impacts Fraud Detection. 8891-8899 - Zixuan Ni, Longhui Wei, Jiacheng Li, Siliang Tang, Yueting Zhuang, Qi Tian:
Degeneration-Tuning: Using Scrambled Grid shield Unwanted Concepts from Stable Diffusion. 8900-8909 - Wan Jiang, Yunfeng Diao, He Wang, Jianxin Sun, Meng Wang, Richang Hong:
Unlearnable Examples Give a False Sense of Security: Piercing through Unexploitable Data with Learnable Examples. 8910-8921
Poster Session X: Multimedia systems -- Data Systems Management and Indexing
- Fei Shen, Xiangbo Shu, Xiaoyu Du, Jinhui Tang:
Pedestrian-specific Bipartite-aware Similarity Learning for Text-based Person Retrieval. 8922-8931 - Xingyu Zhao, Lei Qi, Yuexuan An, Xin Geng:
Generalizable Label Distribution Learning. 8932-8941 - Yan Jiang, Hongtao Xie, Lei Zhang, Pandeng Li, Dongming Zhang, Yongdong Zhang:
Dual Dynamic Proxy Hashing Network for Long-tailed Image Retrieval. 8942-8953 - Sensen Zhang, Xun Liang, Hui Tang, Zhenyu Guan:
Hybrid Interaction Temporal Knowledge Graph Embedding Based on Householder Transformations. 8954-8962 - Huaiwen Zhang, Yang Yang, Fan Qi, Shengsheng Qian, Changsheng Xu:
C2MR: Continual Cross-Modal Retrieval for Streaming Multi-modal Data. 8963-8974 - Yuanding Zhou, Xinran Li, Yaodong Fang, Chuan Qin:
When Perceptual Authentication Hashing Meets Neural Architecture Search. 8975-8983 - Xinbiao Gan, Jiaqi Guo, Peilin Guo, Guang Wu, Jiaqi Si, Songzhu Mei, Cong Liu, Tiejun Li:
GraphMedia: Communication-balanced Graph Searching for Billion-scale Social Media Access. 8984-8993
Poster Session XI: Multimedia systems -- Systems and Middleware
- Yunfei Long, Zhe Xue, Lingyang Chu, Tianlong Zhang, Junjiang Wu, Yu Zang, Junping Du:
FedCD: A Classifier Debiased Federated Learning Framework for Non-IID Data. 8994-9002 - Jangho Kim, Jayeon Yoo, Yeji Song, KiYoon Yoo, Nojun Kwak:
Finding Efficient Pruned Network via Refined Gradients for Pruned Weights. 9003-9011 - Jingzong Li, Yik Hong Cai, Libin Liu, Yu Mao, Chun Jason Xue, Hong Xu:
Moby: Empowering 2D Models for Efficient Point Cloud Analytics on the Edge. 9012-9021 - Lorenzo Catania, Dario Allegra:
NIF: A Fast Implicit Image Compression with Bottleneck Layers and Modulated Sinusoidal Activations. 9022-9031 - Wanting Li, Yongcai Wang, Yongyu Guo, Shuo Wang, Yu Shao, Xuewei Bai, Xudong Cai, Qiang Ye, Deying Li:
ColSLAM: A Versatile Collaborative SLAM System for Mobile Phones Using Point-Line Features and Map Caching. 9032-9041 - Xizhong Zhu, Guoqing Xiang, Peng Zhang, Huizhu Jia, Xiaodong Xie:
A Hardware-efficient Unified Motion Estimation for Video Coding. 9042-9050 - Yuxin Kong, Peng Yang, Yan Cheng:
Edge-Assisted On-Device Model Update for Video Analytics in Adverse Environments. 9051-9060 - Fangchen Ye, Jin Lin, Hongzhan Huang, Jianping Fan, Zhongchao Shi, Yuan Xie, Yanyun Qu:
Hardware-friendly Scalable Image Super Resolution with Progressive Structured Sparsity. 9061-9069 - Junteng Zhang, Tong Chen, Dandan Ding, Zhan Ma:
YOGA: Yet Another Geometry-based Point Cloud Compressor. 9070-9081 - Zhixiang Ye, Qinghao Hu, Tianli Zhao, Wangping Zhou, Jian Cheng:
MCUNeRF: Packing NeRF into an MCU with 1MB Memory. 9082-9092 - Jianwei Zheng, Changnan Xiao, Mingliang Li, Zhenhua Li, Feng Qian, Wei Liu, Xudong Wu:
ParliRobo: Participant Lightweight AI Robots for Massively Multiplayer Online Games (MMOGs). 9093-9102 - Guanyu Xu, Jiawei Hao, Li Shen, Han Hu, Yong Luo, Hui Lin, Jialie Shen:
LGViT: Dynamic Early Exiting for Accelerating Vision Transformer. 9103-9114 - Seyeon Kim, Kyungmin Bin, Donggyu Yang, Sangtae Ha, Song Chong, Kyunghan Lee:
ENTRO: Tackling the Encoding and Networking Trade-off in Offloaded Video Analytics. 9115-9123 - Sheng-Ming Tang, Yuan-Chun Sun, Cheng-Hsin Hsu:
A Blind Streaming System for Multi-client Online 6-DoF View Touring. 9124-9133 - Yizhen Yuan, Rui Kong, Shenghao Xie, Yuanchun Li, Yunxin Liu:
PatchBackdoor: Backdoor Attack against Deep Neural Networks without Model Modification. 9134-9142 - Shuoqian Wang, Mufeng Zhu, Na Li, Mengbai Xiao, Yao Liu:
VQBA: Visual-Quality-Driven Bit Allocation for Low-Latency Point Cloud Streaming. 9143-9151 - Zheming Yang, Wen Ji, Qi Guo, Zhi Wang:
JAVP: Joint-Aware Video Processing with Edge-Cloud Collaboration for DNN Inference. 9152-9160
Poster Session XII: Multimedia systems -- Transport and Delivery
- Yizong Wang, Dong Zhao, Huanhuan Zhang, Chenghao Huang, Teng Gao, Zixuan Guo, Liming Pang, Huadong Ma:
Hermes: Leveraging Implicit Inter-Frame Correlation for Bandwidth-Efficient Mobile Volumetric Video Streaming. 9185-9193 - Fulin Wang, Qing Li, Wanxin Shi, Gareth Tyson, Yong Jiang, Lianbo Ma, Peng Zhang, Yulong Lan, Zhicheng Li:
Reparo: QoE-Aware Live Video Streaming in Low-Rate Networks by Intelligent Frame Recovery. 9194-9204 - Hongbin Lin, Bolin Chen, Zhichen Zhang, Jielian Lin, Xu Wang, Tiesong Zhao:
DeepSVC: Deep Scalable Video Coding for Both Machine and Human Vision. 9205-9214 - Chaoyang Li, Rui-Xiao Zhang, Tianchi Huang, Lianchen Jia, Lifeng Sun:
Concerto: Client-server Orchestration for Real-Time Video Analytics. 9215-9223 - Mingxuan Yan, Yi Wang, Xuedou Xiao, Zhiqing Luo, Jianhua He, Wei Wang:
Think before You Leap: Content-Aware Low-Cost Edge-Assisted Video Semantic Segmentation. 9224-9233 - Haiping Wang, Zhenhua Yu, Ruixiao Zhang, Siping Tao, Hebin Yu, Shu Shi:
TwinStar: A Practical Multi-path Transmission Framework for Ultra-Low Latency Video Delivery. 9234-9242 - Sergi Fernández, Mario Montagud, David Rincón, Juame Moragues, Gianluca Cernigliaro:
Addressing Scalability for Real-time Multiuser Holo-portation: Introducing and Assessing a Multipoint Control Unit (MCU) for Volumetric Video. 9243-9251 - Zhichen Zhang, Bolin Chen, Hongbin Lin, Jielian Lin, Xu Wang, Tiesong Zhao:
ELFIC: A Learning-based Flexible Image Codec with Rate-Distortion-Complexity Optimization. 9252-9261 - Yueheng Li, Zicheng Zhang, Hao Chen, Zhan Ma:
Mamba: Bringing Multi-Dimensional ABR to WebRTC. 9262-9270 - Xiaodong Yang, Yiting Shao, Shan Liu, Thomas H. Li, Ge Li:
PDE-based Progressive Prediction Framework for Attribute Compression of 3D Point Clouds. 9271-9281
Brave New Ideas Session
- Zijie Ye, Jia Jia, Junliang Xing:
Semantics2Hands: Transferring Hand Motion Semantics between Avatars. 9282-9290 - Danni Xu, Shaojing Fan, Mohan S. Kankanhalli:
Combating Misinformation in the Era of Generative AI Models. 9291-9298 - Qi Yang, Marlo Ongpin, Sergey I. Nikolenko, Alfred Huang, Aleksandr Farseev:
Against Opacity: Explainable AI and Large Language Models for Effective Digital Advertising. 9299-9305 - Federico Betti, Jacopo Staiano, Lorenzo Baraldi, Lorenzo Baraldi, Rita Cucchiara, Nicu Sebe:
Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation. 9306-9312 - Junchen Zhu, Huan Yang, Huiguo He, Wenjing Wang, Zixi Tuo, Wen-Huang Cheng, Lianli Gao, Jingkuan Song, Jianlong Fu:
MovieFactory: Automatic Movie Creation from Text using Large Generative Models for Language and Images. 9313-9319 - Alexander Martin, Haitian Zheng, Jie An, Jiebo Luo:
Jurassic World Remake: Bringing Ancient Fossils Back to Life via Zero-Shot Long Image-to-Image Translation. 9320-9328 - Zihao Wu, Xin Wang, Hong Chen, Kaidong Li, Yi Han, Lifeng Sun, Wenwu Zhu:
Diff4Rec: Sequential Recommendation with Curriculum-scheduled Diffusion Augmentation. 9329-9335
Doctoral Symposium
- Ahmed Elhagry:
Text-to-Metaverse: Towards a Digital Twin-Enabled Multimodal Conditional Generative Metaverse. 9336-9339 - Tao Pu:
Video Scene Graph Generation with Spatial-Temporal Knowledge. 9340-9344 - Keke Zhang:
Limited-Reference Image Quality Assessment: Paradigms and Discussions. 9345-9349 - Ying Fang:
Haptic-aware Interaction: Design and Evaluation. 9350-9354 - Yuchen Yang:
Encoding and Decoding Narratives: Datafication and Alternative Access Models for Audiovisual Archives. 9355-9359 - Sandipan Sarma:
Zero-Shot Learning for Computer Vision Applications. 9360-9364
Technical Demonstrations
- Qinghao Ye, Haiyang Xu, Ming Yan, Chenlin Zhao, Junyang Wang, Xiaoshan Yang, Ji Zhang, Fei Huang, Jitao Sang, Changsheng Xu:
mPLUG-Octopus: The Versatile Assistant Empowered by A Modularized End-to-End Multimodal LLM. 9365-9367 - Zheng Zhang, Songling Chen, Mixiao Hou, Guangming Lu:
Multimodal Emotion Interaction and Visualization Platform. 9368-9370 - Junchen Zhu, Huan Yang, Wenjing Wang, Huiguo He, Zixi Tuo, Yongsheng Yu, Wen-Huang Cheng, Lianli Gao, Jingkuan Song, Jianlong Fu, Jiebo Luo:
MobileVidFactory: Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text. 9371-9373 - Djamahl Etchegaray, Yadan Luo, Zachary FitzChance, Anthony Southon, Jinjiang Zhong:
Open-RoadAtlas: Leveraging VLMs for Road Condition Survey with Real-Time Mobile Auditing. 9374-9375 - Yi Han, Kaidong Li, Zihan Song, Wei Feng, Xiang Cao, Shida Guo, Xin Wang, Xuguang Duan, Wenwu Zhu:
H2V4Sports: Real-Time Horizontal-to-Vertical Video Converter for Sports Lives via Fast Object Detection and Tracking. 9376-9378 - Mizuki Takenawa, Naoki Sugimoto, Leslie Wöhler, Satoshi Ikehata, Kiyoharu Aizawa:
360RVW: Fusing Real 360° Videos and Interactive Virtual Worlds. 9379-9381 - Yu-Hsuan Chen, Chen-Wei Fu, Wei-Lun Huang, Ming-Cong Su, Hsin-Yu Huang, Andrew Chen, Tse-Yu Pan:
SetterVision: Motion-based Tactical Training System for Volleyball Setters in Virtual Reality. 9382-9384 - Hao Wu, Yueyao Li, Yan Zhuang, Xinyao Sun, Wei Cai:
BranchClash: A Fully On-Chain Tower Defense Blockchain Game with New Collaboration Mechanism. 9385-9387 - Yuki Konishi, Panote Siriaraya, Da Li, Katsumi Tanaka, Yukiko Kawai, Shinsuke Nakajima:
Development of an Online Marathon System using Acoustic AR. 9388-9389 - Zhanbin Hu, Jianwu Wu, Danyang Gao, Yixu Zhou, Qiang Zhu:
CFTF: Controllable Fine-grained Text2Face and Its Human-in-the-loop Suspect Portraits Application. 9390-9392 - Zeyu Jin, Zixuan Wang, Qixin Wang, Jia Jia, Ye Bai, Yi Zhao, Hao Li, Xiaorui Wang:
HoloSinger: Semantics and Music Driven Motion Generation with Octahedral Holographic Projection. 9393-9395 - Dongkai Wang, Shiliang Zhang, Yaowei Wang, Yonghong Tian, Tiejun Huang, Wen Gao:
HumVis: Human-Centric Visual Analysis System. 9396-9398 - Yuya Moroto, Rintaro Yanagi, Naoki Ogawa, Kyohei Kamikawa, Keigo Sakurai, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama:
Personalized Content Recommender System via Non-verbal Interaction Using Face Mesh and Facial Expression. 9399-9401 - Ming Feng, Kele Xu, Hengxing Cai:
IFS-SED: Incremental Few-Shot Sound Event Detection Using Explicit Learning and Calibration. 9402-9404 - Qiuyun Zhang, Bin Guo, Lina Yao, Han Wang, Ying Zhang, Zhiwen Yu:
ALDA: An Adaptive Layout Design Assistant for Diverse Posters throughout the Design Process. 9405-9407 - Yang Chen, Jingwen Chen, Yingwei Pan, Xinmei Tian, Tao Mei:
3D Creation at Your Fingertips: From Text or Image to 3D Assets. 9408-9410 - Rintaro Yanagi, Atsushi Hashimoto, Naoya Chiba, Yoshitaka Ushiku:
Reference-based Dense Pose Estimation via Partial 3D Point Cloud Matching. 9411-9413 - Shanghua Gao, Zhijie Lin, Xingyu Xie, Pan Zhou, Ming-Ming Cheng, Shuicheng Yan:
EditAnything: Empowering Unparalleled Flexibility in Image Editing and Generation. 9414-9416 - Lorenzo Agnolucci, Alberto Baldrati, Marco Bertini, Alberto Del Bimbo:
Zero-Shot Image Retrieval with Human Feedback. 9417-9419
Grand Challenges
- Xin Zhang, Wen Xie, Ziqi Dai, Jun Rao, Haokun Wen, Xuan Luo, Meishan Zhang, Min Zhang:
Finetuning Language Models for Multimodal Question Answering. 9420-9424 - Ruizhe Li, Jiahao Guo, Mingxi Li, Zhengqian Wu, Chao Liang:
A Hierarchical Deep Video Understanding Method with Shot-Based Instance Search and Large Language Model. 9425-9429 - Shijian Mao, Wudong Xi, Lei Yu, Gaotian Lü, Xingxing Xing, Xingchen Zhou, Wei Wan:
Enhanced CatBoost with Stacking Features for Social Media Prediction. 9430-9435 - Zebang Cheng, Yuxiang Lin, Zhaoru Chen, Xiang Li, Shuyi Mao, Fan Zhang, Daijun Ding, Bowen Zhang, Xiaojiang Peng:
Semi-Supervised Multimodal Emotion Recognition with Expression MAE. 9436-9440 - Meng Liu, Yongqiang Li, Shuyan Zhai, Weili Guan, Liqiang Nie:
Towards Realistic Conversational Head Generation: A Comprehensive Framework for Lifelike Video Synthesis. 9441-9445 - Kangshuai Guo, Zhijian Xu, Shichao Luo, Feigao Wei, Yan Wang, Yanru Zhang:
Invisible Video Watermark Method Based on Maximum Voting and Probabilistic Superposition. 9446-9450 - Chih-Chung Hsu, Chia-Ming Lee, Xiu-Yu Hou, Chi-Han Tsai:
Gradient Boost Tree Network based on Extensive Feature Analysis for Popularity Prediction of Social Posts. 9451-9455 - Haoru Chen, Tianjiao Wan, Zhimin Lin, Kele Xu, Jin Wang, Huaimin Wang:
VTQAGen: BART-based Generative Model For Visual Text Question Answering. 9456-9461 - Xiaolu Chen, Weilong Chen, Chenghao Huang, Zhongjian Zhang, Lixin Duan, Yanru Zhang:
Double-Fine-Tuning Multi-Objective Vision-and-Language Transformer for Social Media Popularity Prediction. 9462-9466 - Nicolae-Catalin Ristea, Radu Tudor Ionescu:
Cascaded Cross-Modal Transformer for Request and Complaint Detection. 9467-9471 - Qiya Song, Renwei Dian, Bin Sun, Jie Xie, Shutao Li:
Multi-scale Conformer Fusion Network for Multi-participant Behavior Analysis. 9472-9476 - Dejan Porjazovski, Yaroslav Getman, Tamás Grósz, Mikko Kurimo:
Advancing Audio Emotion and Intent Recognition with Large Pre-Trained Models and Bayesian Inference. 9477-9481 - Yanjie Sun, Kele Xu, Chaorun Liu, Yong Dou, Kun Qian:
Automatic Audio Augmentation for Requests Sub-Challenge. 9482-9486 - Jun Yu, Mohan Jing, Weihao Liu, Tongxu Luo, Bingyuan Zhang, Keda Lu, Fangyu Lei, Jianqing Sun, Jiaen Liang:
Answer-Based Entity Extraction and Alignment for Visual Text Question Answering. 9487-9491 - Siddhant R. Viksit, Vinayak Abrol:
Multi-Layer Acoustic & Linguistic Feature Fusion for ComParE-23 Emotion and Requests Challenge. 9492-9495 - Jun Yu, Keda Lu, Mohan Jing, Ziqi Liang, Bingyuan Zhang, Jianqing Sun, Jiaen Liang:
Sliding Window Seq2seq Modeling for Engagement Estimation. 9496-9500 - Wenfeng Qin, Bochao Zou, Xin Li, Weiping Wang, Huimin Ma:
Micro-Expression Spotting with Face Alignment and Optical Flow. 9501-9505 - Cong Liang, Jiahe Wang, Haofan Zhang, Bing Tang, Junshan Huang, Shangfei Wang, Xiaoping Chen:
UniFaRN: Unified Transformer for Facial Reaction Generation. 9506-9510 - Payal Mohapatra, Akash Pandey, Yueyuan Sui, Qi Zhu:
Effect of Attention and Self-Supervised Speech Embeddings on Non-Semantic Speech Tasks. 9511-9515 - Kun Li, Dan Guo, Guoliang Chen, Feiyang Liu, Meng Wang:
Data Augmentation for Human Behavior Analysis in Multi-Person Conversations. 9516-9520 - Vu Ngoc Tu, Van Thong Huynh, Hyung-Jeong Yang, Soo-Hyung Kim, Shah Nawaz, Karthik Nandakumar, Muhammad Zaigham Zaheer:
DCTM: Dilated Convolutional Transformer Model for Multimodal Engagement Estimation in Conversation. 9521-9525 - Surbhi Madan, Rishabh Jain, Gulshan Sharma, Ramanathan Subramanian, Abhinav Dhall:
MAGIC-TBR: Multiview Attention Fusion for Transformer-based Bodily Behavior Recognition in Group Settings. 9526-9530 - Haotian Wang, Yuxuan Xi, Hang Chen, Jun Du, Yan Song, Qing Wang, Hengshun Zhou, Chenxi Wang, Jiefeng Ma, Pengfei Hu, Ya Jiang, Shi Cheng, Jie Zhang, Yuzhe Weng:
Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023. 9531-9535 - Ximi Hoque, Adamay Mann, Gulshan Sharma, Abhinav Dhall:
BEAMER: Behavioral Encoder to Generate Multiple Appropriate Facial Reactions. 9536-9540 - Jun Yu, Zhongpeng Cai, Shenshen Du, Xiaxin Shen, Lei Wang, Fang Gao:
Efficient Micro-Expression Spotting Based on Main Directional Mean Optical Flow Feature. 9541-9545 - Qifei Li, Yingming Gao, Ya Li:
Mining High-quality Samples from Raw Data and Majority Voting Method for Multimodal Emotion Recognition. 9546-9550 - Runze Liu, Yaqun Fang, Fan Yu, Ruiqi Tian, Tongwei Ren, Gangshan Wu:
Deep Video Understanding with Video-Language Model. 9551-9555 - Haifeng Chen, Chujia Guo, Yan Li, Peng Zhang, Dongmei Jiang:
Semi-Supervised Multimodal Emotion Recognition with Class-Balanced Pseudo-labeling. 9556-9560 - Jun Yu, Ji Zhao, Guochen Xie, Fengxin Chen, Ye Yu, Liang Peng, Minglei Li, Zonghong Dai:
Leveraging the Latent Diffusion Models for Offline Facial Multiple Appropriate Reactions Generation. 9561-9565 - Wei Dai:
Improvements on SadTalker-based Approach for ViCo Conversational Head Generation Challenge. 9566-9570 - Sunan Li, Hailun Lian, Cheng Lu, Yan Zhao, Chuangao Tang, Yuan Zong, Wenming Zheng:
Multimodal Emotion Recognition in Noisy Environment Based on Progressive Label Revision. 9571-9575 - Ke Xu, Kang Chen, Licai Sun, Zheng Lian, Bin Liu, Gong Chen, Haiyang Sun, Mingyu Xu, Jianhua Tao:
Integrating VideoMAE based model and Optical Flow for Micro- and Macro-expression Spotting. 9576-9580 - Zhigang Chang, Weitai Hu, Qing Yang, Shibao Zheng:
Hierarchical Semantic Perceptual Listener Head Video Generation: A High-performance Pipeline. 9581-9585 - Kangzhong Wang, MK Michael Cheung, Youqian Zhang, Chunxi Yang, Peter Q. Chen, Eugene Yujun Fu, Grace Ngai:
Unveiling Subtle Cues: Backchannel Detection Using Temporal Multimodal Attention Networks. 9586-9590 - Yuanxing Xu, Yuting Wei, Bin Wu:
Query-aware Long Video Localization and Relation Discrimination for Deep Video Understanding. 9591-9595 - Daoming Zong, Chaoyue Ding, Baoxiang Li, Dinghao Zhou, Jiakui Li, Ken Zheng, Qunyan Zhou:
Building Robust Multimodal Sentiment Recognition via a Simple yet Effective Multimodal Transformer. 9596-9600 - Chunxi Yang, Kangzhong Wang, Peter Q. Chen, MK Michael Cheung, Youqian Zhang, Eugene Yujun Fu, Grace Ngai:
MultiMediate 2023: Engagement Level Detection using Audio and Video Features. 9601-9605 - Keith Curtis, George Awad, Afzal Godil, Ian Soboroff:
The ACM Multimedia 2023 Deep Video Understanding Grand Challenge. 9606-9609 - Zheng Lian, Haiyang Sun, Licai Sun, Kang Chen, Mingyu Xu, Kexin Wang, Ke Xu, Yu He, Ying Li, Jinming Zhao, Ye Liu, Bin Liu, Jiangyan Yi, Meng Wang, Erik Cambria, Guoying Zhao, Björn W. Schuller, Jianhua Tao:
MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning. 9610-9614 - Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei:
Learning and Evaluating Human Preferences for Conversational Head Generation. 9615-9619 - Siyang Song, Micol Spitale, Cheng Luo, Germán Barquero, Cristina Palmero, Sergio Escalera, Michel F. Valstar, Tobias Baur, Fabien Ringeval, Elisabeth André, Hatice Gunes:
REACT2023: The First Multiple Appropriate Facial Reaction Generation Challenge. 9620-9624 - Adrian K. Davison, Jingting Li, Moi Hoon Yap, John See, Wen-Huang Cheng, Xiaobai Li, Xiaopeng Hong, Su-Jing Wang:
MEGC2023: ACM Multimedia 2023 ME Grand Challenge. 9625-9629 - Jin Chen, Yi Yu, Shien Song, Xinying Wang, Jie Yang, Yifei Xue, Yizhen Lao:
ACM Multimedia 2023 Grand Challenge Report: Invisible Video Watermark. 9630-9634 - Björn W. Schuller, Anton Batliner, Shahin Amiriparian, Alexander Barnhill, Maurice Gerczuk, Andreas Triantafyllopoulos, Alice E. Baird, Panagiotis Tzirakis, Chris Gagne, Alan S. Cowen, Nikola Lackovic, Marie-José Caraty, Claude Montacié:
The ACM Multimedia 2023 Computational Paralinguistics Challenge: Emotion Share & Requests. 9635-9639 - Philipp Müller, Michal Balazia, Tobias Baur, Michael Dietz, Alexander Heimerl, Dominik Schiller, Mohammed Guermal, Dominike Thomas, François Brémond, Jan Alexandersson, Elisabeth André, Andreas Bulling:
MultiMediate '23: Engagement Estimation and Bodily Behaviour Recognition in Social Interactions. 9640-9645 - Kang Chen, Tianli Zhao, Xiangqian Wu:
VTQA2023: ACM Multimedia 2023 Visual Text Question Answering Challenge. 9646-9650 - Bo Wu, Peiye Liu, Wen-Huang Cheng, Bei Liu, Zhaoyang Zeng, Jia Wang, Qiushi Huang, Jiebo Luo:
SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge. 9651-9655
Open Source Session
- Jiabei He, Yang Shen, Xiu-Shen Wei, Ye Wu:
Hawkeye: A PyTorch-based Library for Fine-Grained Image Recognition with Deep Learning. 9656-9659 - Hang Yuan, Wei Gao:
OpenFastVC: An Open Source Library for Video Coding Fast Algorithm Implementation. 9660-9663 - Lingxiao He, Xingyu Liao, Wu Liu, Xinchen Liu, Peng Cheng, Tao Mei:
FastReID: A Pytorch Toolbox for General Instance Re-identification. 9664-9667 - Daniele Malitesta, Giuseppe Gassi, Claudio Pomo, Tommaso Di Noia:
Ducho: A Unified Framework for the Extraction of Multimodal Features in Recommendation. 9668-9671 - Songlin Fan, Wei Gao:
Screen-based 3D Subjective Experiment Software. 9672-9675 - Max van Spengler, Philipp Wirth, Pascal Mettes:
HypLL: The Hyperbolic Learning Library. 9676-9679 - Gustavo Leticio, Lucas Pascotti Valem, Leonardo Tadeu Lopes, Daniel Carlos Guimarães Pedronette:
pyUDLF: A Python Framework for Unsupervised Distance Learning Tasks. 9680-9684 - Wei Gao, Shangkun Sun, Huiming Zheng, Yuyang Wu, Hua Ye, Yongchi Zhang:
OpenDMC: An Open-Source Library and Performance Evaluation for Deep-learning-based Multi-frame Compression. 9685-9688 - Ming Shan Hee, Aditi Kumaresan, Nguyen-Khoi Hoang, Nirmalendu Prakash, Rui Cao, Roy Ka-Wei Lee:
MATK: The Meme Analytical Tool Kit. 9689-9692 - Aaron Keesing, Yun Sing Koh, Vithya Yogarajan, Michael Witbrock:
Emotion Recognition ToolKit (ERTK): Standardising Tools For Emotion Recognition Research. 9693-9696
Tutorial Summaries
- Xu Tan:
Revisiting Learning Paradigms for Multimedia Data Generation. 9697-9699 - Debanjan Datta, Gerald Friedland:
Efficient Multimedia Computing: Unleashing the Power of AutoML. 9700-9701 - Xin Wang, Hong Chen, Wenwu Zhu:
Disentangled Representation Learning for Multimedia. 9702-9704 - Cem Sazara:
Diffusion Models in Generative AI. 9705-9706
Panel Summaries
- Irene Viola, Maria Torres Vega:
On the Impact of Interactive eXtended Reality: Challenges and Opportunities for Multimedia Research. 9707-9708 - Mohan S. Kankanhalli, Marcel Worring:
Panel: Multimodal Large Foundation Models. 9709
Workshop Summaries
- Hideo Saito, Thomas B. Moeslund, Rainer Lienhart:
MMSports '23: 6th International Workshop on Multimedia Content Analysis in Sports. 9710-9712 - Zheng Lian, Erik Cambria, Guoying Zhao, Björn W. Schuller, Jianhua Tao:
MRAC'23: 1st International Workshop on Multimodal and Responsible Affective Computing. 9713-9714 - Zhedong Zheng, Yujiao Shi, Tingyu Wang, Jun Liu, Jianwu Fang, Yunchao Wei, Tat-Seng Chua:
UAVM '23: 2023 Workshop on UAVs in Multimedia: Capturing the World from a New Perspective. 9715-9717 - Valérie Gouet-Brunet, Ronak Kosti, Li Weng:
SUMAC '23: 5th Workshop on the analySis, Understanding and proMotion of heritAge Contents: Advances in Machine Learning, Signal Processing, Multimodal Techniques and Human-machine Interaction. 9718-9720 - Cheng Jin, Liang He, Mingli Song, Rui Wang:
McGE '23: 1st International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice. 9721-9722 - Shahin Amiriparian, Lukas Christ, Andreas König, Alan Cowen, Eva-Maria Meßner, Erik Cambria, Björn W. Schuller:
MuSe 2023 Challenge: Multimodal Prediction of Mimicked Emotions, Cross-Cultural Humour, and Personalised Recognition of Affects. 9723-9725 - Stavroula G. Mougiakakou, Keiji Yanai, Dario Allegra:
MADiMa '23: 8th International Workshop on Multimedia Assisted Dietary Management. 9726-9727 - Irene Viola, Hadi Amirpour, Stephanie Arévalo Arboleda, Maria Torres Vega:
IXR '23: 2nd International Workshop on Interactive eXtended Reality. 9728-9730 - Mohan S. Kankanhalli, Ioannis (Yiannis) Patras, Jianquan Liu, Yongkang Wong, Takahiro Komamizu, Satoshi Yamazaki, Karen Stephen, Kajal Kansal:
NarSUM '23: The 2nd Workshop on User-Centric Narrative Summarization of Long Videos. 9731-9733 - Jingkuan Song, Wu Liu, Xinchen Liu, Dingwen Zhang, Chaowei Fang, Hongyuan Zhu, Wenbing Huang, John Smith, Xin Wang:
HCMA '23: 4th International Workshop on Human-Centric Multimedia Analysis. 9734-9735 - Adrian K. Davison, Jingting Li, Moi Hoon Yap, John See, Wen-Huang Cheng, Xiaobai Li, Xiaopeng Hong, Su-Jing Wang:
FME '23: 3rd Facial Micro-Expression Workshop. 9736-9738 - Wei Ji, Yinwei Wei, Zhedong Zheng, Hao Fei, Tat-Seng Chua:
Deep Multimodal Learning for Information Retrieval. 9739-9741 - Junxin Chen, Wei Wang, Gwanggil Jeon:
AMC-SME '23: 2023 Workshop on Advanced Multimedia Computing for Smart Manufacturing and Engineering. 9742-9743 - Zheng Wang, Cheng Long, Shihao Xu, Bingzheng Gan, Wei Shi, Zhao Cao, Tat-Seng Chua:
LGM3A '23: 1st Workshop on Large Generative Models Meet Multimodal Applications. 9744-9745