29th ACM Multimedia 2021: Virtual Event, China
- Heng Tao Shen, Yueting Zhuang, John R. Smith, Yang Yang, Pablo César, Florian Metze, Balakrishnan Prabhakaran:
MM '21: ACM Multimedia Conference, Virtual Event, China, October 20 - 24, 2021. ACM 2021, ISBN 978-1-4503-8651-7
Keynote Talks I&II
- Wen Gao:
Video Coding for Machine. 1 - H. V. Jagadish:
Semantic Media Conversion: Possibilities and Limits. 2
Session 1: Deep Learning for Multimedia-I
- Rong Zhang, Wei Li, Yiqun Zhang, Hong Zhang, Jinhui Yu, Ruigang Yang, Weiwei Xu:
Image Re-composition via Regional Content-Style Decoupling. 3-11 - Hao Huang, Shinjae Yoo, Chenxiao Xu:
Deep Clustering based on Bi-Space Association Learning. 12-21 - Seogkyu Jeon, Kibeom Hong, Pilhyeon Lee, Jewook Lee, Hyeran Byun:
Feature Stylization and Domain-aware Contrastive Learning for Domain Generalization. 22-31 - Qi Zhang, Xuesong Zhang, Baoping Li, Yuzhong Chen, Anlong Ming:
HDA-Net: Horizontal Deformable Attention Network for Stereo Matching. 32-40 - Zhaoyang Jia, Han Fang, Weiming Zhang:
MBRS: Enhancing Robustness of DNN-based Watermarking by Mini-Batch of Real and Simulated JPEG Compression. 41-49 - Ye Liu, Lei Zhu, Shunda Pei, Huazhu Fu, Jing Qin, Qing Zhang, Liang Wan, Wei Feng:
From Synthetic to Real: Image Dehazing Collaborating with Unlabeled Real Data. 50-58
Session 2: Deep Learning for Multimedia-II
- Jiangtong Li, Wentao Wang, Junjie Chen, Li Niu, Jianlou Si, Chen Qian, Liqing Zhang:
Video Semantic Segmentation via Sparse Temporal Transformer. 59-68 - Yingchen Yu, Fangneng Zhan, Rongliang Wu, Jianxiong Pan, Kaiwen Cui, Shijian Lu, Feiying Ma, Xuansong Xie, Chunyan Miao:
Diverse Image Inpainting with Bidirectional and Autoregressive Transformers. 69-78 - Hanbang Liang, Xianxu Hou, Linlin Shen:
SSFlow: Style-guided Neural Spline Flows for Face Image Manipulation. 79-87 - Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, Kota Yamaguchi:
Constrained Graphic Layout Generation via Latent Optimization. 88-96 - Xiaoya Zhang, Ling Zhou, Yong Li, Zhen Cui, Jin Xie, Jian Yang:
Transfer Vision Patterns for Multi-Task Pixel Learning. 97-106 - Yike Wu, Bo Zhang, Gang Yu, Weixi Zhang, Bin Wang, Tao Chen, Jiayuan Fan:
Object-aware Long-short-range Spatial Alignment for Few-Shot Fine-Grained Image Classification. 107-115
Session 3: Brave New Idea
- Yunan Zhu, Haichuan Ma, Jialun Peng, Dong Liu, Zhiwei Xiong:
Recycling Discriminator: Towards Opinion-Unaware Image Quality Assessment Using Wasserstein GAN. 116-125 - Liangchen Song, Sheng Liu, Celong Liu, Zhong Li, Yuqi Ding, Yi Xu, Junsong Yuan:
Learning Kinematic Formulas from Multiple View Videos. 126-134 - Pingyue Zhang, Mengyue Wu, Heinrich Dinkel, Kai Yu:
DEPA: Self-Supervised Audio Embedding for Depression Detection. 135-143 - Zhaodong Kang, Jianing Li, Lin Zhu, Yonghong Tian:
Retinomorphic Sensing: A Novel Paradigm for Future Multimedia Computing. 144-152 - Haihan Duan, Jiaye Li, Sizheng Fan, Zhonghao Lin, Xiao Wu, Wei Cai:
Metaverse for Social Good: A University Campus Prototype. 153-161
Session 4: Deep Learning for Multimedia-III
- Yueqi Xie, Ka Leong Cheng, Qifeng Chen:
Enhanced Invertible Encoding for Learned Image Compression. 162-170 - Shihao Zhou, Mengxi Jiang, Shanshan Cai, Yunqi Lei:
DC-GNet: Deep Mesh Relation Capturing Graph Convolution Network for 3D Human Shape Reconstruction. 171-180 - Xun Cai, Jiajing Chai, Yanbo Gao, Shuai Li, Bo Zhu:
Deep Marginal Fisher Analysis based CNN for Image Representation and Classification. 181-189 - Yuanzhouhan Cao, Yidong Li, Haokui Zhang, Chao Ren, Yifan Liu:
Learning Structure Affinity for Video Depth Estimation. 190-198 - Jingjing Jiang, Ziyi Liu, Yifan Liu, Zhixiong Nan, Nanning Zheng:
X-GGM: Graph Generative Modeling for Out-of-distribution Generalization in Visual Question Answering. 199-208 - Aichun Zhu, Zijie Wang, Yifeng Li, Xili Wan, Jing Jin, Tian Wang, Fangqiang Hu, Gang Hua:
DSSL: Deep Surroundings-person Separation Learning for Text-based Person Retrieval. 209-217
Session 5: Emerging Multimedia Applications-I
- David D. Nguyen, Surya Nepal, Salil S. Kanhere:
Diverse Multimedia Layout Generation with Multi Choice Learning. 218-226 - Liangchen Liu, Xi Yang, Nannan Wang, Xinbo Gao:
Viewing from Frequency Domain: A DCT-based Information Enhancement Network for Video Person Re-Identification. 227-235 - Yingqing He, Yazhou Xing, Tianjia Zhang, Qifeng Chen:
Unsupervised Portrait Shadow Removal via Generative Priors. 236-244 - Yi Huang, Xiaoshan Yang, Changsheng Xu:
Multimodal Global Relation Knowledge Distillation for Egocentric Action Anticipation. 245-254 - Liuan Wang, Li Sun, Mingjie Zhang, Huigang Zhang, Wang Ping, Rong Zhou, Jun Sun:
Exploring Pathologist Knowledge for Automatic Assessment of Breast Cancer Metastases in Whole-slide Image. 255-263 - Mingxing Duan, Kenli Li, Lingxi Xie, Qi Tian, Bin Xiao:
Towards Multiple Black-boxes Attack via Adversarial Example Generation Network. 264-272
Session 6: Emerging Multimedia Applications-II
- Hao Feng, Yuechen Wang, Wengang Zhou, Jiajun Deng, Houqiang Li:
DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction. 273-281 - Yiyang Gan, Ruize Han, Liqiang Yin, Wei Feng, Song Wang:
Self-supervised Multi-view Multi-Human Association and Tracking. 282-290 - Hongwei Xue, Bei Liu, Huan Yang, Jianlong Fu, Houqiang Li, Jiebo Luo:
Learning Fine-Grained Motion Embedding for Landscape Animation. 291-299 - Ying Li, Hongwei Zhou, Yeyu Yin, Jiaquan Gao:
Multi-label Pattern Image Retrieval via Attention Mechanism Driven Graph Convolutional Network. 300-308 - Na Zheng, Xuemeng Song, Qingying Niu, Xue Dong, Yibing Zhan, Liqiang Nie:
Collocation and Try-on Network: Whether an Outfit is Compatible. 309-317 - Rishabh Baghel, Abhishek Trivedi, Tejas Ravichandran, Ravi Kiran Sarvadevabhatla:
MeronymNet: A Hierarchical Model for Unified and Controllable Multi-Category Object Generation. 318-326
Session 7: Emerging Multimedia Applications-III
- Akash Gupta, Padmaja Jonnalagedda, Bir Bhanu, Amit K. Roy-Chowdhury:
Ada-VSR: Adaptive Video Super-Resolution with Meta-Learning. 327-336 - Minha Kim, Shahroz Tariq, Simon S. Woo:
CoReD: Generalizing Fake Media Detection with Continual Representation using Distillation. 337-346 - Xiaowen Ying, Xin Li, Mooi Choo Chuah:
SRNet: Spatial Relation Network for Efficient Single-stage Instance Segmentation in Videos. 347-356 - Zilong Shao, Siyang Song, Shashank Jaiswal, Linlin Shen, Michel F. Valstar, Hatice Gunes:
Personality Recognition by Modelling Person-specific Cognitive Processes using Graph Representation. 357-366 - Xiaopeng Guo, Zhijie Huang, Jie Gao, Mingyu Shang, Maojing Shu, Jun Sun:
Enhancing Knowledge Tracing via Adversarial Training. 367-375 - Gangyan Zeng, Yuan Zhang, Yu Zhou, Xiaomeng Yang:
Beyond OCR + VQA: Involving OCR into the Flow for Robust and Accurate TextVQA. 376-385
Poster Session 1
- Qing Guo, Xiaoguang Li, Felix Juefei-Xu, Hongkai Yu, Yang Liu, Song Wang:
JPGNet: Joint Predictive Filtering and Generative Network for Image Inpainting. 386-394 - Yihao Huang, Qing Guo, Felix Juefei-Xu, Lei Ma, Weikai Miao, Yang Liu, Geguang Pu:
AdvFilter: Predictive Perturbation-aware Filtering against Adversarial Attack via Multi-domain Learning. 395-403 - Zizheng Yan, Xianggang Yu, Yipeng Qin, Yushuang Wu, Xiaoguang Han, Shuguang Cui:
Pixel-level Intra-domain Adaptation for Semantic Segmentation. 404-413 - Xugong Qin, Yu Zhou, Youhui Guo, Dayan Wu, Zhihong Tian, Ning Jiang, Hongbin Wang, Weiping Wang:
Mask is All You Need: Rethinking Mask R-CNN for Dense and Arbitrary-Shaped Scene Text Detection. 414-423 - Chuanjun Zheng, Daming Shi, Yukun Liu:
Windowing Decomposition Convolutional Neural Network for Image Enhancement. 424-432 - Weiming Zhuang, Yonggang Wen, Shuai Zhang:
Joint Optimization in Edge-Cloud Continuum for Federated Unsupervised Person Re-identification. 433-441 - Zehai Niu, Ke Lu, Jian Xue, Haifeng Ma, Runchen Wei:
Multi-view 3D Smooth Human Pose Estimation based on Heatmap Filtering and Spatio-temporal Information. 442-450 - Yu-Ke Li, Pin Wang, Mang Ye, Ching-Yao Chan:
Imitative Learning for Multi-Person Action Forecasting. 451-459 - Ruikang Xu, Zeyu Xiao, Mingde Yao, Yueyi Zhang, Zhiwei Xiong:
Stereo Video Super-Resolution via Exploiting View-Temporal Correlations. 460-468 - Jiawei Zhao, Yifan Zhao, Jia Li:
M3TR: Multi-modal Multi-label Recognition with Transformer. 469-477 - Luchuan Song, Bin Liu, Guojun Yin, Xiaoyi Dong, Yufei Zhang, Jia-Xuan Bai:
TACR-Net: Editing on Deep Video and Voice Portraits. 478-486 - Yixiong Zou, Shanghang Zhang, Guangyao Chen, Yonghong Tian, Kurt Keutzer, José M. F. Moura:
Annotation-Efficient Untrimmed Video Action Recognition. 487-495 - Hsiao-Han Lu, Shao-En Weng, Ya-Fan Yen, Hong-Han Shuai, Wen-Huang Cheng:
Face-based Voice Conversion: Learning the Voice behind a Face. 496-505 - Xiongwei Wu, Xin Fu, Ying Liu, Ee-Peng Lim, Steven C. H. Hoi, Qianru Sun:
A Large-Scale Benchmark for Food Image Segmentation. 506-515 - Guowen Zhang, Pingping Zhang, Jinqing Qi, Huchuan Lu:
HAT: Hierarchical Aggregation Transformers for Person Re-identification. 516-525 - Qinglin Liu, Haozhe Xie, Shengping Zhang, Bineng Zhong, Rongrong Ji:
Long-Range Feature Propagating for Natural Image Matting. 526-534 - Ansheng You, Chenglin Zhou, Qixuan Zhang, Lan Xu:
Towards Controllable and Photorealistic Region-wise Image Manipulation. 535-543 - Zhuangzi Li, Ge Li, Thomas H. Li, Shan Liu, Wei Gao:
Information-Growth Attention Network for Image Super-Resolution. 544-552 - Jiale Li, Hang Dai, Ling Shao, Yong Ding:
Anchor-free 3D Single Stage Detector with Mask-Guided Attention for Point Cloud. 553-562 - Xin Gao, Zhenjiang Liu, Zunlei Feng, Chengji Shen, Kairi Ou, Haihong Tang, Mingli Song:
Shape Controllable Virtual Try-on for Underwear Models. 563-572 - Zhiwei Chen, Liujuan Cao, Yunhang Shen, Feihong Lian, Yongjian Wu, Rongrong Ji:
E2Net: Excitative-Expansile Learning for Weakly Supervised Object Localization. 573-581 - Jiahao Wang, Yunhong Wang, Sheng Liu, Annan Li:
Few-shot Fine-Grained Action Recognition via Bidirectional Attention and Contrastive Meta-Learning. 582-591 - Yi Tan, Yanbin Hao, Xiangnan He, Yinwei Wei, Xun Yang:
Selective Dependency Aggregation for Action Classification. 592-601 - Wenbo Hu, Changgong Zhang, Fangneng Zhan, Lei Zhang, Tien-Tsin Wong:
Conditional Directed Graph Convolution for 3D Human Pose Estimation. 602-611 - Gangming Zhao:
Cross Chest Graph for Disease Diagnosis with Structural Relational Reasoning. 612-620 - Qi Wen, Shuang Li, Bingfeng Han, Yi Yuan:
ZiGAN: Fine-grained Chinese Calligraphy Font Generation via a Few-shot Style Transfer Approach. 621-629 - Hao Wang, Guosheng Lin, Steven C. H. Hoi, Chunyan Miao:
Cycle-Consistent Inverse GAN for Text-to-Image Synthesis. 630-638 - Hu Wang, Peng Chen, Bohan Zhuang, Chunhua Shen:
Fully Quantized Image Super-Resolution Networks. 639-647 - Haonan Zhang, Longjun Liu, Hengyi Zhou, Wenxuan Hou, Hongbin Sun, Nanning Zheng:
AKECP: Adaptive Knowledge Extraction from Feature Maps for Fast and Efficient Channel Pruning. 648-657 - Qiangqiang Wu, Jia Wan, Antoni B. Chan:
Dynamic Momentum Adaptation for Zero-Shot Cross-Domain Crowd Counting. 658-666 - Miao Zhang, Tingwei Liu, Yongri Piao, Shunyu Yao, Huchuan Lu:
Auto-MSFNet: Search Multi-scale Fusion Network for Salient Object Detection. 667-676 - Shengqi Huang, Wanqi Yang, Lei Wang, Luping Zhou, Ming Yang:
Few-shot Unsupervised Domain Adaptation with Image-to-Class Sparse Similarity Encoding. 677-685 - Xuanhan Wang, Lianli Gao, Yan Dai, Yixuan Zhou, Jingkuan Song:
Semantic-aware Transfer with Instance-adaptive Parsing for Crowded Scenes Pose Estimation. 686-694 - Haoyu Zhang, Meng Liu, Zan Gao, Xiaoqiang Lei, Yinglong Wang, Liqiang Nie:
Multimodal Dialog System: Relational Graph-based Context-aware Question Understanding. 695-703 - Jingwei Liao, Yanli Liu, Guanyu Xing, Housheng Wei, Jueyu Chen, Songhua Xu:
Shadow Detection via Predicting the Confidence Maps of Shadow Detection Methods. 704-712 - Pengxiang Su, Zhenguang Liu, Shuang Wu, Lei Zhu, Yifang Yin, Xuanjing Shen:
Motion Prediction via Joint Dependency Modeling in Phase Space. 713-721 - Hao Su, Jianwei Niu, Xuefeng Liu, Qingfeng Li, Ji Wan, Mingliang Xu:
Q-Art Code: Generating Scanning-robust Art-style QR Codes by Deformable Convolution. 722-730 - Wenbo Zhang, Ge-Peng Ji, Zhuo Wang, Keren Fu, Qijun Zhao:
Depth Quality-Inspired Feature Manipulation for Efficient RGB-D Salient Object Detection. 731-740 - Yixiong Zou, Shanghang Zhang, Jianpeng Yu, Yonghong Tian, José M. F. Moura:
Revisiting Mid-Level Patterns for Cross-Domain Few-Shot Recognition. 741-749 - Yuqi Sun, Ri Cheng, Bo Yan, Shili Zhou:
Space-Angle Super-Resolution for Multi-View Images. 750-759 - Wei Wang, Junyu Gao, Changsheng Xu:
Weakly-Supervised Video Object Grounding via Stable Context Learning. 760-768 - Yukun Su, Guosheng Lin, Ruizhou Sun, Yun Hao, Qingyao Wu:
Modeling the Uncertainty for Self-supervised 3D Skeleton Action Representation Learning. 769-778 - Rongyun Mo, Yan Yan, Jing-Hao Xue, Si Chen, Hanzi Wang:
D³Net: Dual-Branch Disturbance Disentangling Network for Facial Expression Recognition. 779-787 - Yukang Zhang, Yan Yan, Yang Lu, Hanzi Wang:
Towards a Unified Middle Modality Learning for Visible-Infrared Person Re-Identification. 788-796 - Yuhao Cui, Zhou Yu, Chunqi Wang, Zhongzhou Zhao, Ji Zhang, Meng Wang, Jun Yu:
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration. 797-806 - Xuanxiang Lin, Ke Chen, Kui Jia:
Object Point Cloud Classification via Poly-Convolutional Architecture Search. 807-815 - Xiao Wang, Weirong Ye, Zhongang Qi, Xun Zhao, Guangge Wang, Ying Shan, Hanzi Wang:
Semantic-Guided Relation Propagation Network for Few-shot Action Recognition. 816-825 - Yunjie Ge, Qian Wang, Baolin Zheng, Xinlu Zhuang, Qi Li, Chao Shen, Cong Wang:
Anti-Distillation Backdoor Attacks: Backdoors Can Really Survive in Knowledge Distillation. 826-834 - Yinglu Liu, Mingcan Xiang, Hailin Shi, Tao Mei:
One-stage Context and Identity Hallucination Network. 835-843 - Zhi Chen, Yadan Luo, Sen Wang, Ruihong Qiu, Jingjing Li, Zi Huang:
Mitigating Generation Shifts for Generalized Zero-Shot Learning. 844-852 - Yuan Ji, Xu Jia, Huchuan Lu, Xiang Ruan:
Weakly-Supervised Temporal Action Localization via Cross-Stream Collaborative Learning. 853-861 - Cheng Chen, Jiayin Cai, Yao Hu, Xu Tang, Xinggang Wang, Chun Yuan, Xiang Bai, Song Bai:
Deep Interactive Video Inpainting: An Invisibility Cloak for Harry Potter. 862-870 - Chenchen Liu, Yadong Mu:
Searching Motion Graphs for Human Motion Synthesis. 871-879 - Hanbin Zhao, Xin Qin, Shihao Su, Yongjian Fu, Zibo Lin, Xi Li:
When Video Classification Meets Incremental Classes. 880-889 - Yulin He, Wei Chen, Zhengfa Liang, Dan Chen, Yusong Tan, Xin Luo, Chen Li, Yulan Guo:
Fast and Accurate Lane Detection via Frequency Domain Learning. 890-898 - Yifang Yin, Ying Zhang, Zhenguang Liu, Yuxuan Liang, Sheng Wang, Rajiv Ratn Shah, Roger Zimmermann:
Learning Multi-context Aware Location Representations from Large-scale Geotagged Images. 899-907 - Xiaojing Zhong, Zhonghua Wu, Taizhe Tan, Guosheng Lin, Qingyao Wu:
MV-TON: Memory-based Video Virtual Try-on network. 908-916 - Hao Zhang, Yanbin Hao, Chong-Wah Ngo:
Token Shift Transformer for Video Classification. 917-925 - Rui Wang, Jian Chen, Gang Yu, Li Sun, Changqian Yu, Changxin Gao, Nong Sang:
Attribute-specific Control Units in StyleGAN for Fine-grained Image Manipulation. 926-934 - Zhihao Peng, Hui Liu, Yuheng Jia, Junhui Hou:
Attention-driven Graph Clustering Network. 935-943 - Tianhao Fu, Yingying Li, Xiaoqing Ye, Xiao Tan, Hao Sun, Fumin Shen, Errui Ding:
Lifting the Veil of Frequency in Joint Segmentation and Depth Estimation. 944-952
Panel 1
- João Magalhães, Tat-Seng Chua, Tao Mei, Alan F. Smeaton:
The Next Generation Multimodal Conversational Search and Recommendation. 953-954
Session 8: Emerging Multimedia Applications-IV
- Guanze Liu, Yu Rong, Lu Sheng:
VoteHMR: Occlusion-Aware Voting Network for Robust 3D Human Mesh Recovery from Partial Point Clouds. 955-964 - Shao-Kui Zhang, Yi-Xiao Li, Yu He, Yong-Liang Yang, Song-Hai Zhang:
MageAdd: Real-Time Interaction Simulation for Scene Synthesis. 965-973 - Gaowen Liu, Hao Tang, Hugo Latapie, Jason J. Corso, Yan Yan:
Cross-View Exocentric to Egocentric Video Synthesis. 974-982 - Sachin Mehta, Amit Kumar, Fitsum A. Reda, Varun Nasery, Vikram Mulukutla, Rakesh Ranjan, Vikas Chandra:
EVRNet: Efficient Video Restoration on Edge Devices. 983-992 - Jingru Gan, Jinchang Luo, Haiwei Wang, Shuhui Wang, Wei He, Qingming Huang:
Multimodal Entity Linking: A New Dataset and A Baseline. 993-1001 - Xichu Ma, Ye Wang, Min-Yen Kan, Wee Sun Lee:
AI-Lyricist: Generating Music and Vocabulary Constrained Lyrics. 1002-1011
Session 9: Emotional and Social Signals in Multimedia
- Yingjie Chen, Diqi Chen, Yizhou Wang, Tao Wang, Yun Liang:
CaFGraph: Context-aware Facial Multi-graph Representation for Facial Action Unit Recognition. 1029-1037 - Jingwei Yan, Jingjing Wang, Qiang Li, Chunmao Wang, Shiliang Pu:
Self-Supervised Regional and Temporal Auxiliary Tasks for Facial Action Unit Recognition. 1038-1046 - Ziyu Jia, Youfang Lin, Jing Wang, Zhiyang Feng, Xiangheng Xie, Caijie Chen:
HetEmotionNet: Two-Stream Heterogeneous Graph Recurrent Neural Network for Multi-modal Emotion Recognition. 1047-1056 - Xu Yan, Li-Ming Zhao, Bao-Liang Lu:
Simplifying Multimodal Emotion Recognition with Single Eye Movement Modality. 1057-1063 - Feiyu Chen, Zhengxiao Sun, Deqiang Ouyang, Xueliang Liu, Jie Shao:
Learning What and When to Drop: Adaptive Multimodal and Contextual Dynamics for Emotion Recognition in Conversation. 1064-1073 - Fan Qi, Xiaoshan Yang, Changsheng Xu:
Zero-shot Video Emotion Recognition via Multimodal Protagonist-aware Transformer Network. 1074-1083
Session 10: Industrial Track
- Hao Liu, Xin Li, Bing Liu, Deqiang Jiang, Yinsong Liu, Bo Ren, Rongrong Ji:
Show, Read and Reason: Table Structure Recognition with Flexible Context Aggregator. 1084-1092 - Di Jin, Zhongang Qi, Yingmin Luo, Ying Shan:
TransFusion: Multi-Modal Fusion for Video Tag Inference via Translation-based Knowledge Embedding. 1093-1101 - Yiqing Hu, Yan Zheng, Xinghua Jiang, Hao Liu, Deqiang Jiang, Yinsong Liu, Bo Ren, Rongrong Ji:
RecycleNet: An Overlapped Text Instance Recovery Approach. 1102-1110 - Shan An, Guangfu Che, Jinghao Guo, Haogang Zhu, Junjie Ye, Fangru Zhou, Zhaoqi Zhu, Dong Wei, Aishan Liu, Wei Zhang:
ARShoe: Real-Time Augmented Reality Shoe Try-on System on Smartphones. 1111-1119 - Yongshun Gong, Jinfeng Yi, Dongdong Chen, Jian Zhang, Jiayu Zhou, Zhihua Zhou:
Inferring the Importance of Product Appearance with Semi-supervised Multi-modal Enhancement: A Step Towards the Screenless Retailing. 1120-1128 - Cheng Da, Yanhao Zhang, Yun Zheng, Pan Pan, Yinghui Xu, Chunhong Pan:
AsyNCE: Disentangling False-Positives for Weakly-Supervised Video Grounding. 1129-1137 - Yupan Huang, Hongwei Xue, Bei Liu, Yutong Lu:
Unifying Multimodal Transformer for Bi-directional Image and Text Generation. 1138-1147 - Lianghua Huang, Yu Liu, Xiangzeng Zhou, Ansheng You, Ming Li, Bin Wang, Yingya Zhang, Pan Pan, Yinghui Xu:
Once and for All: Self-supervised Multi-modal Co-training on One-billion Videos at Alibaba. 1148-1156 - Yuanfeng Song, Di Jiang, Xuefang Zhao, Qian Xu, Raymond Chi-Wing Wong, Lixin Fan, Qiang Yang:
L2RS: A Learning-to-Rescore Mechanism for Hybrid Speech Recognition. 1157-1166 - Avijit Shah, Topojoy Biswas, Sathish Ramadoss, Deven Santosh Shah:
Distantly Supervised Semantic Text Detection and Recognition for Broadcast Sports Videos Understanding. 1167-1175 - Xin Jin, Zhonglan Li, Ke Liu, Dongqing Zou, Xiaodong Li, Xingfan Zhu, Ziyin Zhou, Qilong Sun, Qingyu Liu:
Focusing on Persons: Colorizing Old Images Learning from Modern Historical Movies. 1176-1184 - Haotian Zhang, Allan D. Jepson, Iqbal Mohomed, Konstantinos G. Derpanis, Ran Zhang, Afsaneh Fazly:
Personalized Multi-modal Video Retrieval on Mobile Devices. 1185-1191 - Wei Zhang, Lingxiao He, Peng Chen, Xingyu Liao, Wu Liu, Qi Li, Zhenan Sun:
Boosting End-to-end Multi-Object Tracking and Person Search via Knowledge Distillation. 1192-1201 - Li Hu, Bang Zhang, Peng Zhang, Jinwei Qi, Jian Cao, Daiheng Gao, Haiming Zhao, Xiaoduan Feng, Qi Wang, Lian Zhuo, Pan Pan, Yinghui Xu:
A Virtual Character Generation and Animation System for E-Commerce Live Streaming. 1202-1211 - Peng Qi, Juan Cao, Xirong Li, Huan Liu, Qiang Sheng, Xiaoyue Mi, Qin He, Yongbiao Lv, Chenyang Guo, Yingchao Yu:
Improving Fake News Detection by Using an Entity-enhanced Framework to Fuse Diverse Multimodal Clues. 1212-1220
Session 11: Multimedia HCI and Quality of Experience
- Federico Vaccaro, Marco Bertini, Tiberio Uricchio, Alberto Del Bimbo:
Fast Video Visual Quality and Resolution Improvement using SR-UNet. 1221-1229 - Yujie Zhang, Qi Yang, Yiling Xu:
MS-GraphSIM: Inferring Point Cloud Quality via Multiscale Graph Similarity. 1230-1238 - Jia-Xuan Bai, Bin Liu, Luchuan Song:
I Know Your Keyboard Input: A Robust Keystroke Eavesdropper Based-on Acoustic Signals. 1239-1247 - Jiahua Xu, Jing Li, Xingguang Zhou, Wei Zhou, Baichao Wang, Zhibo Chen:
Perceptual Quality Assessment of Internet Videos. 1248-1257 - Jonathan Carlton, Andy Brown, Caroline Jay, John Keane:
Using Interaction Data to Predict Engagement with Interactive Media. 1258-1266 - Sun-Kyung Lee, Jong-Hwan Kim:
Air-Text: Air-Writing and Recognition System. 1267-1274
Session 12: Multimodal Analysis and Description-I
- Daxin Gu, Jia Li, Yu Zhang, Yonghong Tian:
How to Learn a Domain-Adaptive Event Simulator? 1275-1283 - Jinming Mu, Shuiping Gou, Shasha Mao, Shankui Zheng:
A Stepwise Matching Method for Multi-modal Image based on Cascaded Network. 1284-1292 - Naili Xing, Sai Ho Yeung, Chenghao Cai, Teck Khim Ng, Wei Wang, Kaiyuan Yang, Nan Yang, Meihui Zhang, Gang Chen, Beng Chin Ooi:
SINGA-Easy: An Easy-to-Use Framework for MultiModal Analysis. 1293-1302 - Wanxia Deng, Yawen Cui, Zhen Liu, Gangyao Kuang, Dewen Hu, Matti Pietikäinen, Li Liu:
Informative Class-Conditioned Feature Alignment for Unsupervised Domain Adaptation. 1303-1312 - Zhaoquan Yuan, Xiao Peng, Xiao Wu, Changsheng Xu:
Hierarchical Multi-Task Learning for Diagram Question Answering with Multi-Modal Transformer. 1313-1321 - Jianming Lv, Kaijie Liu, Shengfeng He:
Differentiated Learning for Multi-Modal Domain Adaptation. 1322-1330
Session 13: Multimodal Analysis and Description-II
- Yang Jiao, Zequn Jie, Weixin Luo, Jingjing Chen, Yu-Gang Jiang, Xiaolin Wei, Lin Ma:
Two-stage Visual Cues Enhancement Network for Referring Image Segmentation. 1331-1340 - Yongyong Chen, Shuqin Wang, Chong Peng, Guangming Lu, Yicong Zhou:
Partial Tubal Nuclear Norm Regularized Multi-view Learning. 1341-1349 - Yuxing Wang, Yawen Lu, Zhihua Xie, Guoyu Lu:
Deep Unsupervised 3D SfM Face Reconstruction Based on Massive Landmark Bundle Adjustment. 1350-1358 - Zhijie Lin, Zhou Zhao, Haoyuan Li, Jinglin Liu, Meng Zhang, Xingshan Zeng, Xiaofei He:
SimulLR: Simultaneous Lip Reading Transducer with Attention-Guided Adaptive Memory. 1359-1367 - Xiaoni Li, Yu Zhou, Yifei Zhang, Aoting Zhang, Wei Wang, Ning Jiang, Haiying Wu, Weiping Wang:
Dense Semantic Contrast for Self-Supervised Visual Representation Learning. 1368-1376 - Xingyu Wan, Sanping Zhou, Jinjun Wang, Rongye Meng:
Multiple Object Tracking by Trajectory Map Regression with Temporal Priors Embedding. 1377-1386
Session 14: Multimedia Cloud, Edge and Device Computing
- Omar Mossad, Khaled M. Diab, Ihab Amer, Mohamed Hefeeda:
DeepGame: Efficient Video Encoding for Cloud Gaming. 1387-1395 - Takumi Kimura, Takashi Matsubara, Kuniaki Uehara:
ChartPointFlow for Topology-Aware 3D Point Cloud Generation. 1396-1404 - Cheng Tan, Jun Xia, Lirong Wu, Stan Z. Li:
Co-learning: Learning from Noisy Labels with Self-supervision. 1405-1413 - Xu Lu, Lei Zhu, Li Liu, Liqiang Nie, Huaxiang Zhang:
Graph Convolutional Multi-modal Hashing for Flexible Multimedia Retrieval. 1414-1422 - Jianming Ye, Shiliang Zhang, Jingdong Wang:
Hybrid Network Compression via Meta-Learning. 1423-1431 - Hui Cui, Lei Zhu, Jingjing Li, Zhiyong Cheng, Zheng Zhang:
Two-pronged Strategy: Lightweight Augmented Graph Network Hashing for Scalable Image Retrieval. 1432-1440
Interactive Arts
- Wenli Jiang, Chong Cao:
Reconstruction: A Motion Driven Interactive Artwork Inspired by Chinese Shadow Puppet. 1441-1442 - Predrag K. Nikolic, Ruiyang Liu, Shengcheng Luo:
Syntropic Counterpoints: Metaphysics of The Machines. 1443-1445 - Castillo Clarence Fitzgerald Gumtang, Sourav S. Bhowmick:
Kandinsky Mobile: Abstract Art-Inspired Interactive Visualization of Social Discussions on Mobile Devices. 1446-1448 - Lyn Chao-ling Chen:
Sand Scope: An Interactive Installation for Revealing the Connection Between Mental Space and Life Space in a Microcosm of the World. 1449-1451 - Lin Wang, Zhonghao Lin, Wei Cai:
Heraclitus's Forest: An Interactive Artwork for Oral History. 1452-1453 - Aiden Kang, Liang Wang, Ziyu Zhou, Zhe Huang, Robert J. K. Jacob:
Affective Color Fields: Reimagining Rothkoesque Artwork as an Interactive Companion for Artistic Self-Expression. 1454-1455 - Youyang Hu, Chiao-Chi Chou, Chia-Wei Li:
Apercevoir: Bio Internet of Things Interactive System. 1456-1458
Poster Session 2
- Zheng Wang, Jingjing Chen, Yu-Gang Jiang:
Visual Co-Occurrence Alignment Learning for Weakly-Supervised Video Moment Retrieval. 1459-1468 - Shubao Liu, Ke-Yue Zhang, Taiping Yao, Mingwei Bi, Shouhong Ding, Jilin Li, Feiyue Huang, Lizhuang Ma:
Adaptive Normalized Representation Learning for Generalizable Face Anti-Spoofing. 1469-1477 - Haozhe Wu, Jia Jia, Haoyu Wang, Yishun Dou, Chao Duan, Qingshan Deng:
Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis. 1478-1486 - Zhongxing Ma, Yifan Zhao, Jia Li:
Pose-guided Inter- and Intra-part Relational Transformer for Occluded Person Re-Identification. 1487-1496 - Jiong Wang, Zhou Zhao, Weike Jin, Xinyu Duan, Zhen Lei, Baoxing Huai, Yiling Wu, Xiaofei He:
VLAD-VSA: Cross-Domain Face Presentation Attack Detection with Vocabulary Separation and Adaptation. 1497-1506 - Lu He, Qianyu Zhou, Xiangtai Li, Li Niu, Guangliang Cheng, Xiao Li, Wenxuan Liu, Yunhai Tong, Lizhuang Ma, Liqing Zhang:
End-to-End Video Object Detection with Spatial-Temporal Transformers. 1507-1516 - Peng-Fei Zhang, Jiasheng Duan, Zi Huang, Hongzhi Yin:
Joint-teaching: Learning to Refine Knowledge for Resource-constrained Unsupervised Cross-modal Retrieval. 1517-1525 - Zhi Chen, Xiaoqing Ye, Liang Du, Wei Yang, Liusheng Huang, Xiao Tan, Zhenbo Shi, Fumin Shen, Errui Ding:
AggNet for Self-supervised Monocular Depth Estimation: Go An Aggressive Step Further. 1526-1534 - Xiaotong Luo, Qiuyuan Liang, Ding Liu, Yanyun Qu:
Boosting Lightweight Single Image Super-resolution via Joint-distillation. 1535-1543 - Shaohao Lu, Yuqiao Xian, Ke Yan, Yi Hu, Xing Sun, Xiaowei Guo, Feiyue Huang, Wei-Shi Zheng:
Discriminator-free Generative Adversarial Attack. 1544-1552 - Zengqun Zhao, Qingshan Liu:
Former-DFER: Dynamic Facial Expression Recognition Transformer. 1553-1561 - Guanyue Li, Yi Liu, Xiwen Wei, Yang Zhang, Si Wu, Yong Xu, Hau-San Wong:
Discovering Density-Preserving Latent Space Walks in GANs for Semantic Image Transformations. 1562-1570 - Yiming Wu, Xintian Wu, Xi Li, Jian Tian:
MGH: Metadata Guided Hypergraph Modeling for Unsupervised Person Re-identification. 1571-1580 - Meng-Jiun Chiou, Henghui Ding, Hanshu Yan, Changhu Wang, Roger Zimmermann, Jiashi Feng:
Recovering the Unbiased Scene Graphs from the Biased Ones. 1581-1590 - Fa-Ting Hong, Jia-Chang Feng, Dan Xu, Ying Shan, Wei-Shi Zheng:
Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization. 1591-1599 - Risheng Liu, Zhu Liu, Jinyuan Liu, Xin Fan:
Searching a Hierarchically Aggregated Fusion Architecture for Fast Multi-Modality Image Fusion. 1600-1608 - Yu Yin, Joseph P. Robinson, Songyao Jiang, Yue Bai, Can Qin, Yun Fu:
SuperFront: From Low-resolution to High-resolution Frontal Face Synthesis. 1609-1617 - Chen Jiang, Kaiming Huang, Sifeng He, Xudong Yang, Wei Zhang, Xiaobo Zhang, Yuan Cheng, Lei Yang, Qing Wang, Furong Xu, Tan Pan, Wei Chu:
Learning Segment Similarity and Alignment in Large-Scale Content Based Video Retrieval. 1618-1626 - Tianshu Xie, Xuan Cheng, Xiaomin Wang, Minghui Liu, Jiali Deng, Tao Zhou, Ming Liu:
Cut-Thumbnail: A Novel Data Augmentation for Convolutional Neural Network. 1627-1635 - Sheng Li, Xun Zhu, Guorui Feng, Xinpeng Zhang, Zhenxing Qian:
Diffusing the Liveness Cues for Face Anti-spoofing. 1636-1644 - Da-Wei Zhou, Han-Jia Ye, De-Chuan Zhan:
Co-Transport for Class-Incremental Learning. 1645-1654 - Fida Mohammad Thoker, Hazel Doughty, Cees G. M. Snoek:
Skeleton-Contrastive 3D Action Representation Learning. 1655-1663 - Albin Vogel, Erik Kronberg, Niklas Carlsson:
Fast-forwarding, Rewinding, and Path Exploration in Interactive Branched Video Streaming. 1664-1672 - Yunzhong Hou, Liang Zheng:
Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation). 1673-1682 - Chang Liu, Lichen Wang, Kai Li, Yun Fu:
Domain Generalization via Feature Variation Decorrelation. 1683-1691 - Dong Jing, Shuo Zhang, Runmin Cong, Youfang Lin:
Occlusion-aware Bi-directional Guided Network for Light Field Salient Object Detection. 1692-1701 - Jiabo Ye, Xin Lin, Liang He, Dingbang Li, Qin Chen:
One-Stage Visual Grounding via Semantic-Aware Feature Filter. 1702-1711 - Chenyou Fan, Junjie Hu, Jianwei Huang:
Few-Shot Multi-Agent Perception. 1712-1720 - Bo Seok Shim, Yoo Seung Shin, Seong-Wook Park, Jong-Uk Hou:
SI3DP: Source Identification Challenges and Benchmark for Consumer-Level 3D Printer Forensics. 1721-1729 - Wen Wang, Yang Cao, Jing Zhang, Fengxiang He, Zheng-Jun Zha, Yonggang Wen, Dacheng Tao:
Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers. 1730-1738 - Tianyi Xie, Liucheng Liao, Cheng Bi, Benlai Tang, Xiang Yin, Jianfei Yang, Mingjie Wang, Jiali Yao, Yang Zhang, Zejun Ma:
Towards Realistic Visual Dubbing with Heterogeneous Sources. 1739-1747 - Qianqian Wang, Wei Xia, Zhiqiang Tao, Quanxue Gao, Xiaochun Cao:
Deep Self-Supervised t-SNE for Multi-modal Subspace Clustering. 1748-1755 - Xindi Shang, Zehuan Yuan, Anran Wang, Changhu Wang:
Multimodal Video Summarization via Time-Aware Transformers. 1756-1765 - Taichi Nishimura, Atsushi Hashimoto, Yoshitaka Ushiku, Hirotaka Kameko, Shinsuke Mori:
State-aware Video Procedural Captioning. 1766-1774 - Woo-Sung Choi, Minseok Kim, Marco A. Martínez Ramírez, Jaehwa Chung, Soonyoung Jung:
AMSS-Net: Audio Manipulation on User-Specified Sources with Textual Queries. 1775-1783 - Sitong Su, Lianli Gao, Junchen Zhu, Jie Shao, Jingkuan Song:
Fully Functional Image Manipulation Using Scene Graphs in A Bounding-Box Free Way. 1784-1792 - Xi Zhang, Feifei Zhang, Changsheng Xu:
Multi-Level Counterfactual Contrast for Visual Commonsense Reasoning. 1793-1802 - Zhiwei Hao, Yong Luo, Han Hu, Jianping An, Yonggang Wen:
Data-Free Ensemble Knowledge Distillation for Privacy-conscious Multimedia Model Compression. 1803-1811 - Haocong Rao, Xiping Hu, Jun Cheng, Bin Hu:
SM-SGE: A Self-Supervised Multi-Scale Skeleton Graph Encoding Framework for Person Re-Identification. 1812-1820 - Sohail Ahmed Khan, Hang Dai:
Video Transformer for Deepfake Detection with Incremental Learning. 1821-1828 - Jiahao Wang, Gang Pan, Di Sun, Jiawan Zhang:
Chinese Character Inpainting with Contextual Semantic Constraints. 1829-1837 - Ji Zhang, Jingkuan Song, Yazhou Yao, Lianli Gao:
Curriculum-Based Meta-learning. 1838-1846 - Haonan Qiu, Pan He, Shuchun Liu, Weiyuan Shao, Feiyun Zhang, Jiajun Wang, Liang He, Feng Wang:
Ego-Deliver: A Large-Scale Dataset For Egocentric Video Analysis. 1847-1855 - Ping-Han Chiang, Chi-Shen Chan, Shan-Hung Wu:
Adversarial Pixel Masking: A Defense against Physical Attacks for Pre-trained Object Detectors. 1856-1865 - Li Wang, Baoyu Fan, Zhenhua Guo, Yaqian Zhao, Runze Zhang, Rengang Li, Weifeng Gong, Endong Wang:
Knowledge-Supervised Learning: Knowledge Consensus Constraints for Person Re-Identification. 1866-1874 - Qingzhe Pan, Zhifu Zhao, Xuemei Xie, Jianan Li, Yuhan Cao, Guangming Shi:
View-normalized Skeleton Generation for Action Recognition. 1875-1883 - Zheyun Qin, Xiankai Lu, Xiushan Nie, Xiantong Zhen, Yilong Yin:
Learning Hierarchical Embedding for Video Instance Segmentation. 1884-1892 - Tianhao Zhang, Hung-Yu Tseng, Lu Jiang, Weilong Yang, Honglak Lee, Irfan Essa:
Text as Neural Operator: Image Manipulation by Text Instruction. 1893-1902 - Wenhao Wu, Yuxiang Zhao, Yanwu Xu, Xiao Tan, Dongliang He, Zhikang Zou, Jin Ye, Yingying Li, Mingde Yao, Zichao Dong, Yifeng Shi:
DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning. 1903-1911 - Yulin Li, Yuxi Qian, Yuechen Yu, Xiameng Qin, Chengquan Zhang, Yan Liu, Kun Yao, Junyu Han, Jingtuo Liu, Errui Ding:
StrucTexT: Structured Text Understanding with Multi-Modal Transformers. 1912-1920 - Yudong Chen, Sen Wang, Jianglin Lu, Zhi Chen, Zheng Zhang, Zi Huang:
Local Graph Convolutional Networks for Cross-Modal Hashing. 1921-1928 - Shenhao Cao, Qin Zou, Xiuqing Mao, Dengpan Ye, Zhongyuan Wang:
Metric Learning for Anti-Compression Facial Forgery Detection. 1929-1937 - Yaqi Xia, Yan Xia, Wei Li, Rui Song, Kailang Cao, Uwe Stilla:
ASFM-Net: Asymmetrical Siamese Feature Matching Network for Point Completion. 1938-1947 - Ding Ma, Xiangqian Wu:
Capsule-based Object Tracking with Natural Language Specification. 1948-1956 - Bicheng Dai, Kaisheng Wu, Tong Wu, Kai Li, Yanyun Qu, Yuan Xie, Yun Fu:
Faster-PPN: Towards Real-Time Semantic Segmentation with Dual Mutual Learning for Ultra-High Resolution Images. 1957-1965 - Nenglun Chen, Xingjia Pan, Runnan Chen, Lei Yang, Zhiwen Lin, Yuqiang Ren, Haolei Yuan, Xiaowei Guo, Feiyue Huang, Wenping Wang:
Distributed Attention for Grounded Image Captioning. 1966-1975 - Zhiwei Liu, Xiangyu Zhu, Lu Yang, Xiang Yan, Ming Tang, Zhen Lei, Guibo Zhu, Xuetao Feng, Yan Wang, Jinqiao Wang:
Multi-initialization Optimization Network for Accurate 3D Human Pose and Shape Estimation. 1976-1984 - Qinyan Dai, Juncheng Li, Qiaosi Yi, Faming Fang, Guixu Zhang:
Feedback Network for Mutually Boosted Stereo Image Super-Resolution and Disparity Estimation. 1985-1993 - Qijun Wang, Guodong Zheng:
Merging Multiple Template Matching Predictions in Intra Coding with Attentive Convolutional Neural Network. 1994-2001 - Hao Ni, Jingkuan Song, Xiaosu Zhu, Feng Zheng, Lianli Gao:
Camera-Agnostic Person Re-Identification via Adversarial Disentangling Learning. 2002-2010
Session 15: Best Paper Session
- Uttaran Bhattacharya, Elizabeth Childs, Nicholas Rewkowski, Dinesh Manocha:
Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning. 2027-2036 - Shangzhe Di, Zeren Jiang, Si Liu, Zhaokai Wang, Leyan Zhu, Zexin He, Hongming Liu, Shuicheng Yan:
Video Background Music Generation with Controllable Music Transformer. 2037-2045 - Zhi Qiao, Yu Zhou, Jin Wei, Wei Wang, Yuan Zhang, Ning Jiang, Hongbin Wang, Weiping Wang:
PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text Recognition. 2046-2055 - Abhishek Kumar, Tristan Braud, Lik Hang Lee, Pan Hui:
Theophany: Multimodal Speech Augmentation in Instantaneous Privacy Channels. 2056-2064 - You-Yang Hu, Yao Fu Jan, Kuan-Wei Tseng, You-Shin Tsai, Hung-Ming Sung, Jin-Yao Lin, Yi-Ping Hung:
aBio: Active Bi-Olfactory Display Using Subwoofers for Virtual Reality. 2065-2073
Poster Session 3
- Yunfei Guo, Wei Feng, Fei Yin, Tao Xue, Shuqi Mei, Cheng-Lin Liu:
Learning to Understand Traffic Signs. 2076-2084 - Yanyuan Qiao, Qi Chen, Chaorui Deng, Ning Ding, Yuankai Qi, Mingkui Tan, Xincheng Ren, Qi Wu:
R-GAN: Exploring Human-like Way for Reasonable Text-to-Image Synthesis via Generative Adversarial Networks. 2085-2093 - Chen Zhang, Runmin Cong, Qinwei Lin, Lin Ma, Feng Li, Yao Zhao, Sam Kwong:
Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection. 2094-2102 - Junda Wu, Tong Yu, Shuai Li:
Deconfounded and Explainable Interactive Vision-Language Retrieval of Complex Scenes. 2103-2111 - Junyong You:
Long Short-term Convolutional Transformer for No-Reference Video Quality Assessment. 2112-2120 - Baopu Li, Yanwen Fan, Zhihong Pan, Yuchen Bian, Gang Zhang:
Automatic Channel Pruning with Hyper-parameter Search and Dynamic Masking. 2121-2129 - Yue Zhao, Weizhi Nie, An-An Liu, Zan Gao, Yuting Su:
SVHAN: Sequential View Based Hierarchical Attention Network for 3D Shape Recognition. 2130-2138 - Jian Li, Bin Zhang, Yabiao Wang, Ying Tai, Zhenyu Zhang, Chengjie Wang, Jilin Li, Xiaoming Huang, Yili Xia:
ASFD: Automatic and Scalable Face Detector. 2139-2147 - Qi Tang, Runmin Cong, Ronghui Sheng, Lingzhi He, Dan Zhang, Yao Zhao, Sam Kwong:
BridgeNet: A Joint Learning Network of Depth Map Super-Resolution and Monocular Depth Estimation. 2148-2157 - Yuxi Li, Boshen Zhang, Jian Li, Yabiao Wang, Weiyao Lin, Chengjie Wang, Jilin Li, Feiyue Huang:
LSTC: Boosting Atomic Action Detection with Long-Short-Term Context. 2158-2166 - Taehun Kim, Hyemin Lee, Daijin Kim:
UACANet: Uncertainty Augmented Context Attention for Polyp Segmentation. 2167-2175 - Zhenquan Lin, Kailing Guo, Xiaofen Xing, Xiangmin Xu:
Weight Evolution: Improving Deep Neural Networks Training through Evolving Inferior Weight Values. 2176-2184 - Zhikang Zou, Xiaoye Qu, Pan Zhou, Shuangjie Xu, Xiaoqing Ye, Wenhao Wu, Jin Ye:
Coarse to Fine: Domain Adaptive Crowd Counting via Adversarial Scoring Network. 2185-2194 - Qiming Wu, Zhikang Zou, Pan Zhou, Xiaoqing Ye, Binghui Wang, Ang Li:
Towards Adversarial Patch Analysis and Certified Defense against Crowd Counting. 2195-2204 - Pengpeng Zeng, Lianli Gao, Xinyu Lyu, Shuaiqi Jing, Jingkuan Song:
Conceptual and Syntactical Cross-modal Alignment with Cross-level Consistency for Image-Text Matching. 2205-2213 - Yifan Zhao, Le Hui, Jin Xie:
SSPU-Net: Self-Supervised Point Cloud Upsampling via Differentiable Rendering. 2214-2223 - Anupam Sobti, Vaibhav Mavi, M. Balakrishnan, Chetan Arora:
VmAP: A Fair Metric for Video Object Detection. 2224-2232 - Mucong Ye, Jing Zhang, Jinpeng Ouyang, Ding Yuan:
Source Data-free Unsupervised Domain Adaptation for Semantic Segmentation. 2233-2242 - Wang Yin, Peng Lu, Zhaoran Zhao, Xujun Peng:
Yes, "Attention Is All You Need", for Exemplar based Colorization. 2243-2251 - Jiehua Zhang, Liang Li, Chenggang Yan, Yaoqi Sun, Tao Shen, Jiyong Zhang, Zhan Wang:
Heuristic Depth Estimation with Progressive Depth Reconstruction and Confidence-Aware Loss. 2252-2261 - Jingxian Sun, Lichao Zhang, Yufei Zha, Abel Gonzalez-Garcia, Peng Zhang, Wei Huang, Yanning Zhang:
Unsupervised Cross-Modal Distillation for Thermal Infrared Tracking. 2262-2270 - Kaiqi Dong, Wei Yang, Zhenbo Xu, Liusheng Huang, Zhidong Yu:
ABPNet: Adaptive Background Modeling for Generalized Few Shot Segmentation. 2271-2280 - Qingqing Wang, Liqiang Xiao, Yue Lu, Yaohui Jin, Hao He:
Towards Reasoning Ability in Scene Text Visual Question Answering. 2281-2289 - Jianxin Sun, Qi Li, Weining Wang, Jian Zhao, Zhenan Sun:
Multi-caption Text-to-Face Synthesis: Dataset and Algorithm. 2290-2298 - Weili Guan, Haokun Wen, Xuemeng Song, Chung-Hsing Yeh, Xiaojun Chang, Liqiang Nie:
Multimodal Compatibility Modeling via Exploring the Consistent and Complementary Correlations. 2299-2307 - Shudong Huang, Ivor W. Tsang, Zenglin Xu, Jiancheng Lv, Quanhui Liu:
CDD: Multi-view Subspace Clustering via Cross-view Diversity Detection. 2308-2316 - Yiqi Lin, Jinpeng Wang, Manlin Zhang, Andy J. Ma:
Learning Spatio-temporal Representation by Channel Aliasing Video Perception. 2317-2325 - Huanqian Yan, Xingxing Wei:
Efficient Sparse Attacks on Videos using Reinforcement Learning. 2326-2334 - Shengshan Hu, Yechao Zhang, Xiaogeng Liu, Leo Yu Zhang, Minghui Li, Hai Jin:
AdvHash: Set-to-set Targeted Attack on Deep Hashing with One Single Adversarial Patch. 2335-2343 - Dailan He, Yusheng Zhao, Junyu Luo, Tianrui Hui, Shaofei Huang, Aixi Zhang, Si Liu:
TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding. 2344-2352 - Qian He, Desen Zhou, Bo Wan, Xuming He:
Single Image 3D Object Estimation with Primitive Graph Networks. 2353-2361 - Yun Li, Chen Zhang, Shihao Han, Li Lyna Zhang, Baoqun Yin, Yunxin Liu, Mengwei Xu:
Boosting Mobile CNN Inference through Semantic Memory. 2362-2371 - Gil Shapira, Noga Levy, Ishay Goldin, Roy Josef Jevnisek:
Knowing When to Quit: Selective Cascaded Regression with Patch Attention for Real-Time Face Alignment. 2372-2380 - Jianjun Chen, Shancheng Fang, Hongtao Xie, Zheng-Jun Zha, Yue Hu, Jianlong Tan:
End-to-end Boundary Exploration for Weakly-supervised Semantic Segmentation. 2381-2390 - Xiangwen Deng, Junlin Zhu, Shangming Yang:
SFE-Net: EEG-based Emotion Recognition with Symmetrical Spatial Feature Extraction. 2391-2400 - Dian Jin, Long Ma, Risheng Liu, Xin Fan:
Bridging the Gap between Low-Light Scenes: Bilevel Learning for Fast Adaptation. 2401-2409 - Liangchen Song, Jialian Wu, Ming Yang, Qian Zhang, Yuan Li, Junsong Yuan:
Handling Difficult Labels for Multi-label Image Classification via Uncertainty Distillation. 2410-2419 - Chenxi Ma, Bo Yan, Weimin Tan, Xuhao Jiang:
Perception-Oriented Stereo Image Super-Resolution. 2420-2428 - Rongkai Zhang, Lanqing Guo, Siyu Huang, Bihan Wen:
ReLLIE: Deep Reinforcement Learning for Customized Low-Light Image Enhancement. 2429-2437 - Lingbo Yang, Zhanning Gao, Siwei Ma, Wen Gao:
Intrinsic Temporal Regularization for High-resolution Human Video Synthesis. 2438-2446 - Kit-Yung Lam, Lik Hang Lee, Pan Hui:
A2W: Context-Aware Recommendation System for Mobile Augmented Reality Web Browser. 2447-2455 - Changchong Sheng, Matti Pietikäinen, Qi Tian, Li Liu:
Cross-modal Self-Supervised Learning for Lip Reading: When Contrastive Learning meets Adversarial Training. 2456-2464 - Shentong Mo, Xin Miao:
OsGG-Net: One-step Graph Generation Network for Unbiased Head Pose Estimation. 2465-2473 - Xirong Li, Yang Zhou, Jie Wang, Hailan Lin, Jianchun Zhao, Dayong Ding, Weihong Yu, Youxin Chen:
Multi-Modal Multi-Instance Learning for Retinal Disease Recognition. 2474-2482 - Keyan Ding, Yi Liu, Xueyi Zou, Shiqi Wang, Kede Ma:
Locally Adaptive Structure and Texture Similarity for Image Quality Assessment. 2483-2491 - Yiyang Huang, Xuefeng Liang, Chaowei Fang:
CALLip: Lipreading using Contrastive and Attribute Learning. 2492-2500 - Yu Sugiyama, Keiji Yanai:
Cross-Modal Recipe Embeddings by Disentangling Recipe Contents and Dish Styles. 2501-2509 - Yu Zhou, Hongtao Xie, Shancheng Fang, Jing Wang, Zhengjun Zha, Yongdong Zhang:
TDI TextSpotter: Taking Data Imbalance into Account in Scene Text Spotting. 2510-2518 - Xuanyu Zhang, Qing Yang:
Position-Augmented Transformers with Entity-Aligned Mesh for TextVQA. 2519-2528 - Ye Deng, Siqi Hui, Sanping Zhou, Deyu Meng, Jinjun Wang:
Learning Contextual Transformer Network for Image Inpainting. 2529-2538 - Lei Ma, Jian Shi, Yanyun Chen:
Milliseconds Color Stippling. 2539-2548 - Longyao Liu, Bo Ma, Yulin Zhang, Xin Yi, Haozhi Li:
AFD-Net: Adaptive Fully-Dual Network for Few-Shot Object Detection. 2549-2557 - Meng Shen, Huaizheng Zhang, Yixin Cao, Fan Yang, Yonggang Wen:
Missing Data Imputation for Solar Yield Prediction using Temporal Multi-Modal Variational Auto-Encoder. 2558-2566 - Chenyi Lei, Shixian Luo, Yong Liu, Wanggui He, Jiamang Wang, Guoxin Wang, Haihong Tang, Chunyan Miao, Houqiang Li:
Understanding Chinese Video and Language via Contrastive Multimodal Pre-Training. 2567-2576 - Hongyu Li, Jia Li, Dong Zhao, Long Xu:
DehazeFlow: Multi-scale Conditional Flow Network for Single Image Dehazing. 2577-2585 - Huan Zheng, Zhao Zhang, Yang Wang, Zheng Zhang, Mingliang Xu, Yi Yang, Meng Wang:
GCM-Net: Towards Effective Global Context Modeling for Image Inpainting. 2586-2594 - Yufei Wang, Haoliang Li, Lap-Pui Chau, Alex C. Kot:
Embracing the Dark Knowledge: Domain Generalization Using Regularized Knowledge Distillation. 2595-2604 - Bingyu Hu, Zheng-Jun Zha, Jiawei Liu, Xierong Zhu, Hongtao Xie:
Cluster and Scatter: A Multi-grained Active Semi-supervised Learning Framework for Scalable Person Re-identification. 2605-2614 - Xinzhi Dong, Chengjiang Long, Wenju Xu, Chunxia Xiao:
Dual Graph Convolutional Networks with Transformer and Curriculum Learning for Image Captioning. 2615-2624 - Qilin Deng, Kai Wang, Minghao Zhao, Runze Wu, Yu Ding, Zhene Zou, Yue Shang, Jianrong Tao, Changjie Fan:
Build Your Own Bundle - A Neural Combinatorial Optimization Method. 2625-2633 - Changfeng Yu, Yi Chang, Yi Li, Xile Zhao, Luxin Yan:
Unsupervised Image Deraining: Optimization Model Driven Deep CNN. 2634-2642
Keynote Talks III&IV
- Cordelia Schmid:
Do you see what I see?: Large-scale Learning from Multimodal Videos. 2643 - Jingren Zhou:
Large-scale Multi-Modality Pretrained Models: Applications and Experiences. 2644
Session 17: Multimodal Fusion and Embedding-I
- Xiaoqi Zhao, Youwei Pang, Jiaxing Yang, Lihe Zhang, Huchuan Lu:
Multi-Source Fusion and Automatic Predictor Selection for Zero-Shot Video Object Segmentation. 2645-2653 - Changshu Liu, Liangjian Wen, Zhao Kang, Guangchun Luo, Ling Tian:
Self-supervised Consensus Representation Learning for Attributed Graph. 2654-2662 - Shuhui Qu, Yan Kang, Janghwan Lee:
Efficient Multi-Modal Fusion with Diversity Analysis. 2663-2670 - Yongming Wen, Yiquan Fang, Junhao Cai, Kimwa Tung, Hui Cheng:
GCCN: Geometric Constraint Co-attention Network for 6D Object Pose Estimation. 2671-2679 - Paul Pu Liang, Peter Wu, Ziyin Liu, Louis-Philippe Morency, Ruslan Salakhutdinov:
Cross-Modal Generalization: Learning in Low Resource Modalities via Meta-Alignment. 2680-2689 - Yikai Wang, Wenbing Huang, Bin Fang, Fuchun Sun, Chang Li:
Elastic Tactile Simulation Towards Tactile-Visual Perception. 2690-2698
Session 18: Multimodal Fusion and Embedding-II
- Zan Gao, Yuxiang Shao, Weili Guan, Meng Liu, Zhiyong Cheng, Shengyong Chen:
A Novel Patch Convolutional Neural Network for View-based 3D Model Retrieval. 2699-2707 - Xu Yan, Zhengcong Fei, Zekang Li, Shuhui Wang, Qingming Huang, Qi Tian:
Semi-Autoregressive Image Captioning. 2708-2716 - Yi Zhang, Xinwang Liu, Siwei Wang, Jiyuan Liu, Sisi Dai, En Zhu:
One-Stage Incomplete Multi-view Clustering via Late Fusion. 2717-2725 - Jiyuan Liu, Xinwang Liu, Yi Zhang, Pei Zhang, Wenxuan Tu, Siwei Wang, Sihang Zhou, Weixuan Liang, Siqi Wang, Yuexiang Yang:
Self-Representation Subspace Clustering for Incomplete Multi-view Data. 2726-2734 - Meng Wang, Sen Wang, Han Yang, Zheng Zhang, Xi Chen, Guilin Qi:
Is Visual Context Really Helpful for Knowledge Graph? A Representation Learning Perspective. 2735-2743 - Yushan Zhu, Huaixiao Zhao, Wen Zhang, Ganqiang Ye, Hui Chen, Ningyu Zhang, Huajun Chen:
Knowledge Perceived Multi-modal Pretraining in E-commerce. 2744-2752
Session 19: Video Program and Demo Session
- Yipeng Yu, Zirui Tu, Longyu Lu, Xiao Chen, Hui Zhan, Zixun Sun:
Text2Video: Automatic Video Generation Based on Text Scripts. 2753-2755 - Sen Yang, Qike Zhao, Lanxin Miao, Min Chen, Lianli Gao, Jingkuan Song, Weidong Le:
A System for Interactive and Intelligent AD Auxiliary Screening. 2756-2758 - Borun Xu, Biao Wang, Jiale Tao, Tiezheng Ge, Yuning Jiang, Wen Li, Lixin Duan:
Move As You Like: Image Animation in E-Commerce Scenario. 2759-2761 - Rinita Roy, Ruben Mayer, Hans-Arno Jacobsen:
MDMS: Music Data Matching System for Query Variant Retrieval. 2762-2764 - Mu Mu, Murtada Dohan:
Community Generated VR Painting using Eye Gaze. 2765-2767 - Yuki Tajima, Toshiharu Horiuchi, Gen Hattori:
Sync Glass: Virtual Pouring and Toasting Experience with Multimodal Presentation. 2768-2770 - Yanhao Zhang, Qiang Wang, Yun Zheng, Pan Pan, Yinghui Xu:
VideoDiscovery: An Automatic Short-Video Generation System for E-commerce Live-streaming. 2771-2773 - Yuanfeng Song, Xuefang Zhao, Di Jiang, Xiaoling Huang, Weiwei Zhao, Qian Xu, Raymond Chi-Wing Wong, Qiang Yang:
SmartSales: An AI-Powered Telemarketing Coaching System in FinTech. 2774-2776 - Yuanfeng Song, Di Jiang, Xuefang Zhao, Xiaoling Huang, Qian Xu, Raymond Chi-Wing Wong, Qiang Yang:
SmartMeeting: Automatic Meeting Transcription and Summarization for In-Person Conversations. 2777-2779 - Hao Lou, Heng Huang, Chaoen Xiao, Xin Jin:
Aesthetic Evaluation and Guidance for Mobile Photography. 2780-2782 - Wenyuan Xue, Siqi Cai, Wen Wang, Qingyong Li, Baosheng Yu, Yibing Zhan, Dacheng Tao:
A Question Answering System for Unstructured Table Images. 2783-2785 - Xujian Zhao, Chongwei Wang, Peiquan Jin, Hui Zhang, Chunming Yang, Bo Li:
Post2Story: Automatically Generating Storylines from Microblogging Platforms. 2786-2788 - Tong Shen, Jiawei Zuo, Fan Shi, Jin Zhang, Liqin Jiang, Meng Chen, Zhengchen Zhang, Wei Zhang, Xiaodong He, Tao Mei:
ViDA-MAN: Visual Dialog with Digital Humans. 2789-2791 - Yupan Huang, Bei Liu, Jianlong Fu, Yutong Lu:
A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation. 2792-2794 - Maxime Grandidier, Fabien Boucaud, Indira Thouvenin, Catherine Pelachaud:
Softly: Simulated Empathic Touch between an Agent and a Human. 2795-2797 - Akihisa Ishino, Yoko Yamakata, Hiroaki Karasawa, Kiyoharu Aizawa:
RecipeLog: Recipe Authoring App for Accurate Food Recording. 2798-2800 - Matthias Springstein, Stefanie Schneider, Javad Rahnama, Eyke Hüllermeier, Hubertus Kohle, Ralph Ewerth:
iART: A Search Engine for Art-Historical Images to Support Research in the Humanities. 2801-2803 - Jardenna Mohazzab, Abe Vos, Jonathan van Westendorp, Lucas Lageweg, Dylan Prins, Aritra Bhowmik:
ArtiVisual: A Platform to Generate and Compare Art. 2804-2806 - Ivona Najdenkoska, Jeroen den Boef, Thomas Schneider, Justo van der Werf, Reinier de Ridder, Fajar Fathurrahman, Marcel Worring:
GCNIllustrator: Illustrating the Effect of Hyperparameters on Graph Convolutional Networks. 2807-2809 - Noboru Yoshida, Jianquan Liu:
On-demand Action Detection System using Pose Information. 2810-2812 - Xian Zhao, Jiaming Zhang, Xiaowen Huang:
APF: An Adversarial Privacy-preserving Filter to Protect Portrait Information. 2813-2815 - Li Hu, Jinwei Qi, Bang Zhang, Pan Pan, Yinghui Xu:
Text-driven 3D Avatar Animation with Emotional and Expressive Behaviors. 2816-2818 - Xinyan Yang, Fei Hu, Long Ye:
Text to Scene: A System of Configurable 3D Indoor Scene Synthesis. 2819-2821 - Ruiqi Wang, Long Ye, Qin Zhang:
MovieREP: A New Movie Reproduction Framework for Film Soundtrack. 2822-2824
Session 20: Multimodal Fusion and Embedding-III
- Li Gao, Jing Zhang, Lefei Zhang, Dacheng Tao:
DSP: Dual Soft-Paste for Unsupervised Domain Adaptive Semantic Segmentation. 2825-2833 - Yu Lin, Jinghui Guo, Yang Gao, Yi-Fan Li, Zhuoyi Wang, Latifur Khan:
Generating Point Cloud from Single Image in The Few Shot Scenario. 2834-2842 - Yuqing Song, Shizhe Chen, Qin Jin, Wei Luo, Jun Xie, Fei Huang:
Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training. 2843-2852 - Yong Liu, Susen Yang, Chenyi Lei, Guoxin Wang, Haihong Tang, Juyong Zhang, Aixin Sun, Chunyan Miao:
Pre-training Graph Transformer with Multimodal Side Information for Recommendation. 2853-2861 - Minyoung Kim, Ricardo Guerrero, Vladimir Pavlovic:
Learning Disentangled Factors from Paired Data in Cross-Modal Retrieval: An Implicit Identifiable VAE Approach. 2862-2870 - Liang Peng, Shuangji Yang, Yi Bin, Guoqing Wang:
Progressive Graph Attention Network for Video Question Answering. 2871-2879
Session 21: Media Interpretation-I
- Tao Dai, Yalei Lv, Bin Chen, Zhi Wang, Zexuan Zhu, Shu-Tao Xia:
Mix-order Attention Networks for Image Restoration. 2880-2888 - Ji Zhang, Jian-Jun Qiao, Xiao Wu, Wei Li:
Vehicle Counting Network with Attention-based Mask Refinement and Spatial-awareness Block Loss. 2889-2898 - Zhiyang Chen, Yousong Zhu, Chaoyang Zhao, Guosheng Hu, Wei Zeng, Jinqiao Wang, Ming Tang:
DPT: Deformable Patch-based Transformer for Visual Recognition. 2899-2907 - Cairong Zhao, Shuyang Feng, Brian Nlong Zhao, Zhijun Ding, Jun Wu, Fumin Shen, Heng Tao Shen:
Scene Text Image Super-Resolution via Parallelly Contextual Attention Network. 2908-2917 - Mengyuan Ding, Shanshan Zhang, Jian Yang:
Improving Pedestrian Detection from a Long-tailed Domain Perspective. 2918-2926 - Xianyong Fang, Xiaohao He, Linbo Wang, Jianbing Shen:
Robust Shadow Detection by Exploring Effective Shadow Contexts. 2927-2935
Session 22: Doctoral Symposium
- Babak Taraghi:
End-to-end Quality of Experience Evaluation for HTTP Adaptive Streaming. 2936-2939 - Yutong Zhou:
Generative Adversarial Network for Text-to-Face Synthesis and Manipulation. 2940-2944 - Zhihang Ren:
GAN-aided Serial Dependence Study in Medical Image Perception. 2945-2949 - Ru Li:
Image Style Transfer with Generative Adversarial Networks. 2950-2954 - Yuhang Lu:
Annotation-Efficient Semantic Segmentation with Shape Prior Knowledge. 2955-2959 - Peng Dai:
Neural-based Rendering and Application. 2960-2963 - Shaoxiang Chen:
Towards Bridging Video and Language by Caption Generation and Sentence Localization. 2964-2968 - Pratibha Kumari:
Situational Anomaly Detection in Multimedia Data under Concept Drift. 2969-2973 - Guangzhi Wang:
Dynamic Knowledge Distillation with Cross-Modality Knowledge Transfer. 2974-2978
Session 23: Media Interpretation-II
- Peidong Liu, Zibin He, Xiyu Yan, Yong Jiang, Shu-Tao Xia, Feng Zheng, Maowei Hu:
WeClick: Weakly-Supervised Video Semantic Segmentation with Click Annotations. 2995-3004 - Jinhai Yang, Hua Yang, Lin Chen:
Towards Cross-Granularity Few-Shot Learning: Coarse-to-Fine Pseudo-Labeling with Visual-Semantic Meta-Embedding. 3005-3014 - Guoqing Wang, Changming Sun, Xing Xu, Jingjing Li, Zheng Wang, Zeyu Ma:
Disentangled Representation Learning and Enhancement Network for Single Image De-Raining. 3015-3023 - Lei Zhu, Zhaojing Luo, Wei Wang, Meihui Zhang, Gang Chen, Kaiping Zheng:
Towards Robust Cross-domain Image Understanding with Unsupervised Noise Removal. 3024-3033 - Zaid Khan, Yun Fu:
Exploiting BERT for Multimodal Target Sentiment Classification through Input Space Translation. 3034-3042 - Jingran Zhang, Xing Xu, Fumin Shen, Yazhou Yao, Jie Shao, Xiaofeng Zhu:
Video Representation Learning with Graph Contrastive Augmentation. 3043-3051
Poster Session 4
- Shipeng Yan, Jiale Zhou, Jiangwei Xie, Songyang Zhang, Xuming He:
An EM Framework for Online Incremental Learning of Semantic Segmentation. 3052-3060 - Shuang Li, Bingfeng Han, Zhenjie Yu, Chi Harold Liu, Kai Chen, Shuigen Wang:
I2V-GAN: Unpaired Infrared-to-Visible Video Translation. 3061-3069 - Zitai Wang, Qianqian Xu, Zhiyong Yang, Xiaochun Cao, Qingming Huang:
Implicit Feedbacks are Not Always Favorable: Iterative Relabeled One-Class Collaborative Filtering against Noisy Interactions. 3070-3078 - Dahu Shi, Xing Wei, Xiaodong Yu, Wenming Tan, Ye Ren, Shiliang Pu:
InsPose: Instance-Aware Networks for Single-Stage Multi-Person Pose Estimation. 3079-3087 - Lufan Ma, Tiancai Wang, Bin Dong, Jiangpeng Yan, Xiu Li, Xiangyu Zhang:
Implicit Feature Refinement for Instance Segmentation. 3088-3096 - Anwen Hu, Shizhe Chen, Qin Jin:
Question-controlled Text-aware Image Captioning. 3097-3105 - Yiwei Zhang, Toshihiko Yamasaki:
Style-Aware Image Recommendation for Social Media Marketing. 3106-3114 - He Li, Mang Ye, Bo Du:
WePerson: Learning a Generalized Re-identification Model from All-weather Virtual Data. 3115-3123 - Shuai Liu, Lu Zhang, Shuai Hao, Huchuan Lu, You He:
Polar Ray: A Single-stage Angle-free Detector for Oriented Object Detection in Aerial Images. 3124-3132 - Bi'an Du, Xiang Gao, Wei Hu, Xin Li:
Self-Contrastive Learning with Hard Negative Sampling for Self-supervised Point Cloud Learning. 3133-3142 - Yi Zhang, Sheng Huang, Fengtao Zhou:
Generally Boosting Few-Shot Learning with HandCrafted Features. 3143-3152 - Tianjun Zhang, Brian Nlong Zhao, Ying Shen, Xuan Shao, Lin Zhang, Yicong Zhou:
ROECS: A Robust Semi-direct Pipeline Towards Online Extrinsics Correction of the Surround-view System. 3153-3161 - Wen Qian, Zhiqun He, Silong Peng, Chen Chen, Wei Wu:
Pseudo Graph Convolutional Network for Vehicle ReID. 3162-3171 - Wencan Huang, Wenwen Pan, Zhou Zhao, Qi Tian:
Towards Fast and High-Quality Sign Language Production. 3172-3181 - Zhenzhong Kuang, Huigui Liu, Jun Yu, Aikui Tian, Lei Wang, Jianping Fan, Noboru Babaguchi:
Effective De-identification Generative Adversarial Network for Face Anonymization. 3182-3191 - Ricardo Guerrero, Hai Xuan Pham, Vladimir Pavlovic:
Cross-modal Retrieval and Synthesis (X-MRS): Closing the Modality Gap in Shared Subspace Learning. 3192-3201 - Jie Xiao, Dandan Zhan, Haoran Qi, Zhi Jin:
When Face Completion Meets Irregular Holes: An Attributes Guided Deep Inpainting Network. 3202-3210 - Zongmo Huang, Yazhou Ren, Xiaorong Pu, Lifang He:
Non-Linear Fusion for Self-Paced Multi-View Clustering. 3211-3219 - Pengzhan Sun, Bo Wu, Xunsong Li, Wen Li, Lixin Duan, Chuang Gan:
Counterfactual Debiasing Inference for Compositional Action Recognition. 3220-3228 - Yuhan Zhang, Bo Wu, Wen Li, Lixin Duan, Chuang Gan:
STST: Spatial-Temporal Specialized Transformer for Skeleton-based Action Recognition. 3229-3237 - Xinyu Liu, Baopu Li, Zhen Chen, Yixuan Yuan:
Exploring Gradient Flow Based Saliency for DNN Model Compression. 3238-3246 - Shengjie Chen, Zhenhua Guo, Bo Yuan:
An Adaptive Iterative Inpainting Method with More Information Exploration. 3247-3256 - Gonçalo Marcelino, David Semedo, André Mourão, Saverio G. Blasi, João Magalhães, Marta Mrak:
Assisting News Media Editors with Cohesive Visual Storylines. 3257-3265 - Yiqiang Zhao, Yiyao Zhou, Rui Chen, Bin Hu, Xiding Ai:
MM-Flow: Multi-modal Flow Network for Point Cloud Completion. 3266-3274 - Zhiliang Peng, Wei Huang, Zonghao Guo, Xiaosong Zhang, Jianbin Jiao, Qixiang Ye:
Long-tailed Distribution Adaptation. 3275-3282 - Kecheng Chen, Kun Long, Yazhou Ren, Jiayu Sun, Xiaorong Pu:
Lesion-Inspired Denoising Network: Connecting Medical Image Denoising and Lesion Detection. 3283-3292 - Fuming You, Jingjing Li, Lei Zhu, Zhi Chen, Zi Huang:
Domain Adaptive Semantic Segmentation without Source Data. 3293-3302 - Yuchen Yang, Min Wang, Wengang Zhou, Houqiang Li:
Cross-modal Joint Prediction and Alignment for Composed Query Image Retrieval. 3303-3311 - Yingjian Li, Yingnan Gao, Bingzhi Chen, Zheng Zhang, Lei Zhu, Guangming Lu:
JDMAN: Joint Discriminative and Mutual Adaptation Networks for Cross-Domain Facial Expression Recognition. 3312-3320 - Feifei Shao, Yawei Luo, Li Zhang, Lu Ye, Siliang Tang, Yi Yang, Jun Xiao:
Improving Weakly Supervised Object Localization via Causal Intervention. 3321-3329 - Xinhao Li, Jingjing Li, Lei Zhu, Guoqing Wang, Zi Huang:
Imbalanced Source-free Domain Adaptation. 3330-3339 - Zhekai Du, Jingjing Li, Ke Lu, Lei Zhu, Zi Huang:
Learning Transferrable and Interpretable Representations for Domain Generalization. 3340-3349 - Zhenyu Xie, Xujie Zhang, Fuwei Zhao, Haoye Dong, Michael C. Kampffmeyer, Haonan Yan, Xiaodan Liang:
WAS-VTON: Warping Architecture Search for Virtual Try-on Network. 3350-3359 - Yan-Jie Zhou, Shi-Qi Liu, Xiao-Liang Xie, Zeng-Guang Hou:
DFR-Net: A Novel Multi-Task Learning Network for Real-Time Multi-Instrument Segmentation. 3360-3369 - Mingrui Lao, Yanming Guo, Yu Liu, Wei Chen, Nan Pu, Michael S. Lew:
From Superficial to Deep: Language Bias driven Curriculum Learning for Visual Question Answering. 3370-3379 - Xun Gao, Yin Zhao, Jie Zhang, Longjun Cai:
Pairwise Emotional Relationship Recognition in Drama Videos: Dataset and Benchmark. 3380-3389 - Yingying Cheng, Fan Zhang, Gang Hu, Yiwen Wang, Hanhui Yang, Gong Zhang, Zhuo Cheng:
Block Popularity Prediction for Multimedia Storage Systems Using Spatial-Temporal-Sequential Neural Networks. 3390-3398 - Yang Chen, Yingwei Pan, Yu Wang, Ting Yao, Xinmei Tian, Tao Mei:
Transferrable Contrastive Learning for Visual Domain Adaptation. 3399-3408 - Rong-Cheng Tu, Xian-Ling Mao, Cihang Kong, Zihang Shao, Ze-Lin Li, Wei Wei, Heyan Huang:
Weighted Gaussian Loss based Hamming Hashing. 3409-3417 - Peng Lu, Gao Huang, Hangyu Lin, Wenming Yang, Guodong Guo, Yanwei Fu:
Domain-Aware SE Network for Sketch-based Image Retrieval with Multiplicative Euclidean Margin Softmax. 3418-3426 - Deyu Wang, Dongchao Wen, Wei Tao, Lingxiao Yin, Tse-Wei Chen, Tadayuki Ito, Kinya Osa, Masami Kato:
FTAFace: Context-enhanced Face Detector with Fine-grained Task Attention. 3427-3436 - Jingcheng Ni, Jie Qin, Di Huang:
Identity-aware Graph Memory Network for Action Detection. 3437-3445 - Wenkang Shan, Haopeng Lu, Shanshe Wang, Xinfeng Zhang, Wen Gao:
Improving Robustness and Accuracy via Relative Information Encoding in 3D Human Pose Estimation. 3446-3454 - Nan Zhong, Zhenxing Qian, Xinpeng Zhang:
Deep Neural Network Retrieval. 3455-3463 - Xingcai Wu, Yucheng Xie, Jiaqi Zeng, Zhenguo Yang, Yi Yu, Qing Li, Wenyin Liu:
Adversarial Learning with Mask Reconstruction for Text-Guided Image Inpainting. 3464-3472 - Zhihao Gu, Yang Chen, Taiping Yao, Shouhong Ding, Jilin Li, Feiyue Huang, Lizhuang Ma:
Spatiotemporal Inconsistency Learning for DeepFake Video Detection. 3473-3481 - Gian-Luca Savino, Jessé Moraes Braga, Johannes Schöning:
VeloCity: Using Voice Assistants for Cyclists to Provide Traffic Reports. 3482-3491 - Qiyu Dai, Shuai Yang, Wenjing Wang, Wei Xiang, Jiaying Liu:
Edit Like A Designer: Modeling Design Workflows for Unaligned Fashion Editing. 3492-3500 - Jizhizi Li, Sihan Ma, Jing Zhang, Dacheng Tao:
Privacy-Preserving Portrait Matting. 3501-3509 - Jiaxiang You, Yuanman Li, Jiantao Zhou, Zhongyun Hua, Weiwei Sun, Xia Li:
A Transformer based Approach for Image Manipulation Chain Detection. 3510-3517 - Peng Wu, Xiangteng He, Mingqian Tang, Yiliang Lv, Jing Liu:
HANet: Hierarchical Alignment Networks for Video-Text Retrieval. 3518-3527 - Mengjing Sun, Pei Zhang, Siwei Wang, Sihang Zhou, Wenxuan Tu, Xinwang Liu, En Zhu, Changjian Wang:
Scalable Multi-view Subspace Clustering with Unified Anchors. 3528-3536 - Tao Xiang, Ying Yang, Shangwei Guo, Hangcheng Liu, Hantao Liu:
PRNet: A Progressive Recovery Network for Revealing Perceptually Encrypted Images. 3537-3545 - Run Wang, Felix Juefei-Xu, Meng Luo, Yang Liu, Lina Wang:
FakeTagger: Robust Safeguards against DeepFake Dissemination via Provenance Tracking. 3546-3555 - Yang Bai, Junyan Wang, Yang Long, Bingzhang Hu, Yang Song, Maurice Pagnucco, Yu Guan:
Discriminative Latent Semantic Graph for Video Captioning. 3556-3564 - Qichao Ying, Zhenxing Qian, Hang Zhou, Haisheng Xu, Xinpeng Zhang, Siyi Li:
From Image to Imuge: Immunized Image Generation. 3565-3573 - Sravya Vardhani Shivapuja, Mansi Pradeep Khamkar, Divij Bajaj, Ganesh Ramakrishnan, Ravi Kiran Sarvadevabhatla:
Wisdom of (Binned) Crowds: A Bayesian Stratification Paradigm for Crowd Counting. 3574-3582 - Insoo Lee, Jinsung Lee, Kyunghan Lee, Dirk Grunwald, Sangtae Ha:
Demystifying Commercial Video Conferencing Applications. 3583-3591 - Han Hu, Sheng Cheng, Xinggong Zhang, Zongming Guo:
LightFEC: Network Adaptive FEC with a Lightweight Deep-Learning Approach. 3592-3600 - Yueming Lyu, Jing Dong, Bo Peng, Wei Wang, Tieniu Tan:
SOGAN: 3D-Aware Shadow and Occlusion Robust GAN for Makeup Transfer. 3601-3609
Reproducibility
- Yuqing Liao, Xinke Li, Zekun Tong, Yabang Zhao, Andrew Lim, Zhenzhong Kuang, Cise Midoglu:
Reproducibility Companion Paper: Campus3D: A Photogrammetry Point Cloud Benchmark for Outdoor Scene Hierarchical Understanding. 3610-3614 - Dingquan Li, Tingting Jiang, Ming Jiang, Vajira Lasantha Thambawita, Haoliang Wang:
Reproducibility Companion Paper: Norm-in-Norm Loss with Faster Convergence and Better Performance for Image Quality Assessment. 3615-3618 - Serhan Gül, Sebastian Bosse, Dimitri Podborski, Thomas Schierl, Cornelius Hellge, Marc A. Kastner, Jan Zahálka:
Reproducibility Companion Paper: Kalman Filter-Based Head Motion Prediction for Cloud-Based Mixed Reality. 3619-3621 - Jari Korhonen, Yicheng Su, Junyong You, Steven Hicks, Cise Midoglu:
Reproducibility Companion Paper: Blind Natural Video Quality Prediction via Statistical Temporal Features and Deep Spatial Features. 3622-3626 - Jakub Nawala, Lucjan Janowski, Bogdan Cmiel, Krzysztof Rusek, Marc A. Kastner, Jan Zahálka:
Reproducibility Companion Paper: Describing Subjective Experiment Consistency by p-Value P-P Plot. 3627-3629 - Li Tao, Xueting Wang, Toshihiko Yamasaki, Jingjing Chen, Steven Hicks:
Reproducibility Companion Paper: Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework. 3630-3632 - Fan Yu, Haonan Wang, Tongwei Ren, Jinhui Tang, Gangshan Wu, Jingjing Chen, Zhenzhong Kuang:
Reproducibility Companion Paper: Visual Relation of Interest Detection. 3633-3637 - Lijian Gao, Qirong Mao, Jingjing Chen, Ming Dong, Ratna Babu Chinnam, Lucile Sassatelli, Miguel Fabián Romero Rondón, Ujjwal Sharma:
Reproducibility Companion Paper: On Learning Disentangled Representation for Acoustic Event Detection. 3638-3641
Keynote Talks V&VI
- James C. Lester:
AI and the Future of Education. 3642 - Zhengyou Zhang:
Digital Human in an Integrated Physical-Digital World (IPhD). 3643
Session 24: Media Interpretation-III
- Wenhang Ge, Chunyan Pan, Ancong Wu, Hongwei Zheng, Wei-Shi Zheng:
Cross-Camera Feature Prediction for Intra-Camera Supervised Person Re-identification across Distant Scenes. 3644-3653 - Xindi Shang, Yicong Li, Junbin Xiao, Wei Ji, Tat-Seng Chua:
Video Visual Relation Detection via Iterative Inference. 3654-3663 - Jiahui Li, Kun Kuang, Lin Li, Long Chen, Songyang Zhang, Jian Shao, Jun Xiao:
Instance-wise or Class-wise? A Tale of Neighbor Shapley for Concept-based Explanation. 3664-3672 - Zheyu Zhang, Yurui Zhu, Xueyang Fu, Zhiwei Xiong, Zheng-Jun Zha, Feng Wu:
Multifocal Attention-Based Cross-Scale Network for Image De-raining. 3673-3681 - Dongyang Zhang, Changyu Li, Ning Xie, Guoqing Wang, Jie Shao:
PFFN: Progressive Feature Fusion Network for Lightweight Image Super-Resolution. 3682-3690 - Mengzhu Wang, Wei Wang, Baopu Li, Xiang Zhang, Long Lan, Huibin Tan, Tianyi Liang, Wei Yu, Zhigang Luo:
InterBN: Channel Fusion for Adversarial Unsupervised Domain Adaptation. 3691-3700
Session 25: Multimedia Art, Entertainment and Culture
- Shaozu Yuan, Ruixue Liu, Meng Chen, Baoyang Chen, Zhijie Qiu, Xiaodong He:
Learning to Compose Stylistic Calligraphy Artwork with Emotions. 3701-3709 - Athanasios Efthymiou, Stevan Rudinac, Monika Kackovic, Marcel Worring, Nachoem Wijnberg:
Graph Neural Networks for Knowledge Enhanced Visual Representation of Paintings. 3710-3719 - Mark David Hosale, Robert S. Allison, Jim Madsen, Marcus Gordon:
ArtScience and the ICECUBE LED Display [ILDm^3]. 3720-3727 - Guo Li, Baoliang Chen, Lingyu Zhu, Qingwen He, Hongfei Fan, Shiqi Wang:
PUGCQ: A Large Scale Dataset for Quality Assessment of Professional User-Generated Content. 3728-3736 - Yurui Ren, Yubo Wu, Thomas H. Li, Shan Liu, Ge Li:
Combining Attention with Flow for Person Image Synthesis. 3737-3745 - Shuang Wu, Zhenguang Liu, Shijian Lu, Li Cheng:
Dual Learning Music Composition and Dance Choreography. 3746-3754
Session 26: Open Source Competition
- Xin Liu, Jiancheng Li, Jiaqi Wang, Ziwei Liu:
MMFashion: An Open-Source Toolbox for Visual Fashion Analysis. 3755-3758 - Zihan Ding, Tianyang Yu, Hongming Zhang, Yanhua Huang, Guo Li, Quancheng Guo, Luo Mai, Hao Dong:
Efficient Reinforcement Learning Development with RLzoo. 3759-3762 - Yixiao Guo, Jiawei Liu, Guo Li, Luo Mai, Hao Dong:
Fast and Flexible Human Pose Estimation with HyperPose. 3763-3766 - Xuezhi Wang, Guanyu Gao:
SmartEye: An Open Source Framework for Real-Time Video Analytics with Edge-Cloud Collaboration. 3767-3770 - Tom Bartindale, Peter Chen, Harrison Marshall, Stanislav Pozdniakov, Dan Richardson:
ZoomSense: A Scalable Infrastructure for Augmenting Zoom. 3771-3774 - Jun Hu, Shengsheng Qian, Quan Fang, Youze Wang, Quan Zhao, Huaiwen Zhang, Changsheng Xu:
Efficient Graph Deep Learning in TensorFlow with tf_geometric. 3775-3778 - Jun Wang, Yinglu Liu, Yibo Hu, Hailin Shi, Tao Mei:
FaceX-Zoo: A PyTorch Toolbox for Face Recognition. 3779-3782 - Haoqi Fan, Tullie Murrell, Heng Wang, Kalyan Vasudev Alwala, Yanghao Li, Yilei Li, Bo Xiong, Nikhila Ravi, Meng Li, Haichuan Yang, Jitendra Malik, Ross B. Girshick, Matt Feiszli, Aaron Adcock, Wan-Yen Lo, Christoph Feichtenhofer:
PyTorchVideo: A Deep Learning Library for Video Understanding. 3783-3786 - Haocong Ying, Tie Liu, Mingxin Ai, Jiali Ding, Yuanyuan Shang:
AICoacher: A System Framework for Online Realtime Workout Coach. 3787-3790 - Zhanghui Kuang, Hongbin Sun, Zhizhong Li, Xiaoyu Yue, Tsui Hin Lin, Jianyong Chen, Huaqiang Wei, Yiqin Zhu, Tong Gao, Wenwei Zhang, Kai Chen, Wayne Zhang, Dahua Lin:
MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding. 3791-3794 - Adam Wieckowski, Christian Lehmann, Benjamin Bross, Detlev Marpe, Thibaud Biatek, Mickaël Raulet, Jean Le Feuvre:
A Complete End to End Open Source Toolchain for the Versatile Video Coding (VVC) Standard. 3795-3798 - Yehao Li, Yingwei Pan, Jingwen Chen, Ting Yao, Tao Mei:
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics. 3799-3802 - Luka Murn, Alan F. Smeaton, Marta Mrak:
Interpreting Super-Resolution CNNs for Sub-Pixel Motion Compensation in Video Coding. 3803-3806
Session 27: Multimedia Search and Recommendation-I
- Yi-Geng Hong, Hui-Chu Xiao, Wan-Lei Zhao:
Towards Accurate Localization by Instance Search. 3807-3815 - Rintaro Yanagi, Ren Togo, Takahiro Ogawa, Miki Haseyama:
Database-adaptive Re-ranking for Enhancing Cross-modal Image Retrieval. 3816-3825 - Ning Han, Jingjing Chen, Guangyi Xiao, Hao Zhang, Yawen Zeng, Hao Chen:
Fine-grained Cross-modal Alignment Network for Text-Video Retrieval. 3826-3834 - Jiwei Wei, Xing Xu, Zheng Wang, Guoqing Wang:
Meta Self-Paced Learning for Cross-Modal Matching. 3835-3843 - Ruihong Qiu, Sen Wang, Zhi Chen, Hongzhi Yin, Zi Huang:
CausalRec: Causal Inference for Visual Debiasing in Visually-Aware Recommendation. 3844-3852 - Haifeng Xia, Taotao Jing, Chen Chen, Zhengming Ding:
Semi-supervised Domain Adaptive Retrieval via Discriminative Hashing Learning. 3853-3861
Session 28: Multimedia Search and Recommendation-II
- Zhizhong Han, Xiyang Wang, Yu-Shen Liu, Matthias Zwicker:
Hierarchical View Predictor: Unsupervised 3D Global Feature Learning through Hierarchical Prediction among Unordered Views. 3862-3871 - Jinghao Zhang, Yanqiao Zhu, Qiang Liu, Shu Wu, Shuhui Wang, Liang Wang:
Mining Latent Structures for Multimedia Recommendation. 3872-3880 - Jiahao Xun, Shengyu Zhang, Zhou Zhao, Jieming Zhu, Qi Zhang, Jingjie Li, Xiuqiang He, Xiaofei He, Tat-Seng Chua, Fei Wu:
Why Do We Click: Visual Impression-aware News Recommendation. 3881-3890 - Jingzhi Li, Lutong Han, Ruoyu Chen, Hua Zhang, Bing Han, Lili Wang, Xiaochun Cao:
Identity-Preserving Face Anonymization via Adaptively Facial Attributes Obfuscation. 3891-3899 - Zhijian Hou, Chong-Wah Ngo, Wing Kwong Chan:
CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval. 3900-3908 - Qianxiu Hao, Qianqian Xu, Zhiyong Yang, Qingming Huang:
Learning Unified Embeddings for Recommendation via Meta-path Semantics. 3909-3917
Session 29: Music, Speech and Audio Processing in Multimedia
- Kin Wai Cheuk, Dorien Herremans, Li Su:
ReconVAT: A Semi-Supervised Automatic Music Transcription Framework for Low-Resource Real-World Data. 3918-3926 - Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li:
Is Someone Speaking?: Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection. 3927-3935 - Wei Tsung Lu, Meng-Hsuan Wu, Yuh-Ming Chiu, Li Su:
Actions Speak Louder than Listening: Evaluating Music Style Transfer based on Editing Experience. 3936-3944 - Rongjie Huang, Feiyang Chen, Yi Ren, Jinglin Liu, Chenye Cui, Zhou Zhao:
Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus. 3945-3954 - Hongyuan Zhu, Ye Niu, Di Fu, Hao Wang:
MusicBERT: A Self-supervised Learning of Music Representation. 3955-3963 - Yuanhang Zhang, Susan Liang, Shuang Yang, Xiao Liu, Zhongqin Wu, Shiguang Shan, Xilin Chen:
UniCon: Unified Context Network for Robust Active Speaker Detection. 3964-3972
Session 30: Multimedia Transport and Delivery
- Yakun Huang, Yuanwei Zhu, Xiuquan Qiao, Zhijie Tan, Boyuan Bai:
AITransfer: Progressive AI-powered Transmission for Real-Time Point Cloud Video Streaming. 3989-3997 - Tiesong Zhao, Jielian Lin, Yanjie Song, Xu Wang, Yuzhen Niu:
Game Theory-driven Rate Control for 360-Degree Video Coding. 3998-4006 - Lei Zhang, Yanyan Suo, Ximing Wu, Feng Wang, Yuchi Chen, Laizhong Cui, Jiangchuan Liu, Zhong Ming:
TBRA: Tiling and Bitrate Adaptation for Mobile 360-Degree Video Streaming. 4007-4015 - Wanxin Shi, Qing Li, Ruishan Zhang, Gengbiao Shen, Yong Jiang, Zhenhui Yuan, Gabriel-Miro Muntean:
QoE Ready to Respond: A QoE-aware MEC Selection Scheme for DASH-based Adaptive Video Streaming to Mobile Users. 4016-4024 - Pengfei Xiong, Yu Chen:
Hierarchical Fusion for Practical Ghost-free High Dynamic Range Imaging. 4025-4033 - Xindong Zhang, Hui Zeng, Lei Zhang:
Edge-oriented Convolution Block for Real-time Super Resolution on Mobile Devices. 4034-4043
Poster Session 5
- Hanyue Tu, Li Li, Wengang Zhou, Houqiang Li:
Semantic Scalable Image Compression with Cross-Layer Priors. 4044-4052 - Weidong Chen, Guorong Li, Xinfeng Zhang, Hongyang Yu, Shuhui Wang, Qingming Huang:
Cascade Cross-modal Attention Network for Video Actor and Action Segmentation from a Sentence. 4053-4062 - Chuanyi Zhang, Yazhou Yao, Xing Xu, Jie Shao, Jingkuan Song, Zechao Li, Zhenmin Tang:
Extracting Useful Knowledge from Noisy Web Images via Data Purification for Fine-Grained Recognition. 4063-4072 - Tianyu Su, Xuemeng Song, Na Zheng, Weili Guan, Yan Li, Liqiang Nie:
Complementary Factorization towards Outfit Compatibility Modeling. 4073-4081 - Xin Dong, Hao Liu, Weiwei Cai, Pengyuan Lv, Zekuan Yu:
Open Set Face Anti-Spoofing in Unseen Attacks. 4082-4090 - Yicong Li, Xun Yang, Xindi Shang, Tat-Seng Chua:
Interventional Video Relation Detection. 4091-4099 - Yuxi Xie, Danqing Huang, Jinpeng Wang, Chin-Yew Lin:
CanvasEmb: Learning Layout Representation with Large-scale Pre-training for Graphic Design. 4100-4108 - Yizhen Lao, Jie Yang, Xinying Wang, Jianxin Lin, Yu Cao, Shien Song:
Augmenting TV Shows via Uncalibrated Camera Small Motion Tracking in Dynamic Scene. 4109-4117 - Aoxiong Yin, Zhou Zhao, Jinglin Liu, Weike Jin, Meng Zhang, Xingshan Zeng, Xiaofei He:
SimulSLT: End-to-End Simultaneous Sign Language Translation. 4118-4127 - Hongshuo Tian, Ning Xu, An-An Liu, Chenggang Yan, Zhendong Mao, Quan Zhang, Yongdong Zhang:
Mask and Predict: Multi-step Reasoning for Scene Graph Generation. 4128-4136 - Shanmin Yang, Xiao Yang, Yi Lin, Peng Cheng, Yi Zhang, Jianwei Zhang:
Heterogeneous Face Recognition with Attention-guided Feature Disentangling. 4137-4145 - Yiqi Jiang, Weihua Chen, Xiuyu Sun, Xiaoyu Shi, Fan Wang, Hao Li:
Exploring the Quality of GAN Generated Images for Person Re-Identification. 4146-4155 - Chen Zhang, Siwei Wang, Jiyuan Liu, Sihang Zhou, Pei Zhang, Xinwang Liu, En Zhu, Changwang Zhang:
Multi-view Clustering via Deep Matrix Factorization and Partition Alignment. 4156-4164 - Zhen Han, Xiangteng He, Mingqian Tang, Yiliang Lv:
Video Similarity and Alignment Learning on Partial Video Copy Detection. 4165-4173 - Jinjian Wu, Yongxu Liu, Leida Li, Weisheng Dong, Guangming Shi:
No-Reference Video Quality Assessment with Heterogeneous Knowledge Ensemble. 4174-4182 - Carlos Bermejo Fernandez, Petteri Nurmi, Pan Hui:
Seeing is Believing?: Effects of Visualization on Smart Device Privacy Perceptions. 4183-4192 - Shuai Shao, Lei Xing, Yan Wang, Rui Xu, Chunyan Zhao, Yanjiang Wang, Baodi Liu:
MHFC: Multi-Head Feature Collaboration for Few-Shot Learning. 4193-4201 - Shuo Ma, Yanli Ji, Xing Xu, Xiaofeng Zhu:
Vision-guided Music Source Separation via a Fine-grained Cycle-Separation Network. 4202-4210 - Yuchen Yang, Ye Xiang, Shuaicheng Liu, Lifang Wu, Boxuan Zhao, Bing Zeng:
GLM-Net: Global and Local Motion Estimation via Task-Oriented Encoder-Decoder Structure. 4211-4219 - Katsuyuki Nakamura, Hiroki Ohashi, Mitsuhiro Okada:
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention. 4220-4229 - Jiguo Li, Chuanmin Jia, Xinfeng Zhang, Siwei Ma, Wen Gao:
Cross Modal Compression: Towards Human-comprehensible Semantic Compression. 4230-4238 - Yunqing Hu, Xuan Jin, Yin Zhang, Haiwen Hong, Jingfeng Zhang, Yuan He, Hui Xue:
RAMS-Trans: Recurrent Attention Multi-scale Transformer for Fine-grained Image Recognition. 4239-4248 - Jiechong Song, Bin Chen, Jian Zhang:
Memory-Augmented Deep Unfolding Network for Compressive Sensing. 4249-4258 - Lihao Jiang, Yi Wang, Qi Jia, Shengwei Xu, Yu Liu, Xin Fan, Haojie Li, Risheng Liu, Xinwei Xue, Ruili Wang:
Underwater Species Detection using Channel Sharpening Attention. 4259-4267 - Junyin Zhang, Yongxin Ge, Xinqian Gu, Boyu Hua, Tao Xiang:
Self-Supervised Pre-training on the Target Domain for Cross-Domain Person Re-identification. 4268-4276 - Lei Zhang, Leiting Chen, Chuan Zhou, Fan Yang, Xin Li:
Exploring Graph-Structured Semantics for Cross-Modal Retrieval. 4277-4286 - Lei Shen, Haolan Zhan, Xin Shen, Yonghao Song, Xiaofang Zhao:
Text is NOT Enough: Integrating Visual Impressions into Open-domain Dialogue Generation. 4287-4296 - Yang Li, Shiqi Wang, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Yue Wang:
Quality Assessment of End-to-End Learned Image Compression: The Benchmark and Objective Measure. 4297-4305 - Xiao Luo, Daqing Wu, Zeyu Ma, Chong Chen, Minghua Deng, Jianqiang Huang, Xian-Sheng Hua:
A Statistical Approach to Mining Semantic Similarity for Deep Unsupervised Hashing. 4306-4314 - Zi-Rong Jin, Liang-Jian Deng, Tian-Jing Zhang, Xiao-Xu Jin:
BAM: Bilateral Activation Mechanism for Image Fusion. 4315-4323 - Lei Wang, Piotr Koniusz:
Self-supervising Action Recognition by Statistical Moment and Subspace Descriptors. 4324-4333 - Tailin Chen, Desen Zhou, Jian Wang, Shidong Wang, Yu Guan, Xuming He, Errui Ding:
Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based Action Recognition. 4334-4342 - Weijie Li, Xinhang Song, Yubing Bai, Sixian Zhang, Shuqiang Jiang:
ION: Instance-level Object Navigation. 4343-4352 - Shiwei Gan, Yafeng Yin, Zhiwei Jiang, Lei Xie, Sanglu Lu:
Skeleton-Aware Neural Sign Language Translation. 4353-4361 - Srinivas Kruthiventi S. S, George Jose, Nitya Tandon, Rajesh Roshan Biswal, Aashish Kumar:
Fingerspelling Recognition in the Wild with Fixed-Query based Visual Attention. 4362-4370 - Qiongjie Cui, Huaijiang Sun, Yue Kong, Xiaoning Sun:
Deep Human Dynamics Prior. 4371-4379 - Jiangming Shi, Zixian Gao, Hao Liu, Zekuan Yu, Fengjun Li:
Exploiting Invariance of Mining Facial Landmarks. 4380-4389 - Jiaxiang Tang, Xiaokang Chen, Gang Zeng:
Joint Implicit Image Function for Guided Depth Super-Resolution. 4390-4399 - Ziqi Yuan, Wei Li, Hua Xu, Wenmeng Yu:
Transformer-based Feature Reconstruction Network for Robust Multimodal Sentiment Analysis. 4400-4407 - Jun Xiao, Qian Ye, Rui Zhao, Kin-Man Lam, Kao Wan:
Self-feature Learning: An Efficient Deep Lightweight Network for Image Super-resolution. 4408-4416 - Sebastian Szyller, Buse Gul Atli, Samuel Marchal, N. Asokan:
DAWN: Dynamic Adversarial Watermarking of Neural Networks. 4417-4425 - Jing Liang, Li Niu, Fengjun Guo, Teng Long, Liqing Zhang:
Visible Watermark Removal via Self-calibrated Localization and Background Refinement. 4426-4434 - Ruoxi Deng, Shengjun Liu, Jinxin Wang, Huibing Wang, Hanli Zhao, Xiaoqin Zhang:
Learning to Decode Contextual Information for Efficient Contour Detection. 4435-4443 - Yiguo Qiao, Licheng Jiao, Wenbin Li, Christian Richardt, Darren Cosker:
Fast, High-Quality Hierarchical Depth-Map Super-Resolution. 4444-4453 - Nan Xiang, Xiaosong Yang, Jian J. Zhang:
TsFPS: An Accurate and Flexible 6DoF Tracking System with Fiducial Platonic Solids. 4454-4462 - Xiao Wang, Zheng Wang, Wu Liu, Xin Xu, Jing Chen, Chia-Wen Lin:
Consistency-Constancy Bi-Knowledge Learning for Pedestrian Detection in Night Surveillance. 4463-4471 - Yudong Wang, Liang-Jian Deng, Tian-Jing Zhang, Xiao Wu:
SSconv: Explicit Spectral-to-Spatial Convolution for Pansharpening. 4472-4480 - Zhengyi Liu, Yuan Wang, Zhengzheng Tu, Yun Xiao, Bin Tang:
TriTransNet: RGB-D Salient Object Detection with a Triplet Transformer Embedding Network. 4481-4490 - Pu Li, Xiaobai Liu, Xiaohui Xie:
Learning Sample-Specific Policies for Sequential Image Augmentation. 4491-4500 - Wen Yang, Jinjian Wu, Leida Li, Weisheng Dong, Guangming Shi:
Image Quality Caption with Attentive and Recurrent Semantic Attractor Network. 4501-4509 - Weizhi Nie, Jiesi Li, Ning Xu, An-An Liu, Xuanya Li, Yongdong Zhang:
Triangle-Reward Reinforcement Learning: A Visual-Linguistic Semantic Alignment for Image Captioning. 4510-4518 - Huiyuan Fu, Changhao Tian, Xin Wang, Huadong Ma:
Stacked Semantically-Guided Learning for Image De-distortion. 4519-4527 - Yudong Han, Yangyang Guo, Jianhua Yin, Meng Liu, Yupeng Hu, Liqiang Nie:
Focal and Composed Vision-semantic Modeling for Visual Question Answering. 4528-4536 - Kecheng Zheng, Cuiling Lan, Wenjun Zeng, Jiawei Liu, Zhizheng Zhang, Zheng-Jun Zha:
Pose-Guided Feature Learning with Knowledge Distillation for Occluded Person Re-Identification. 4537-4545 - Jiayuan Xie, Yi Cai, Qingbao Huang, Tao Wang:
Multiple Objects-Aware Visual Question Generation. 4546-4554 - Chamara Madarasingha, Kanchana Thilakarathna:
VASTile: Viewport Adaptive Scalable 360-Degree Video Frame Tiling. 4555-4563 - Li Ding, Yongwei Wang, Xin Ding, Kaiwen Yuan, Ping Wang, Hua Huang, Z. Jane Wang:
Delving into Deep Image Prior for Adversarial Defense: A Novel Reconstruction-based Defense Framework. 4564-4572 - Yongrui Li, Shilian Wu, Jun Yu, Zengfu Wang:
Fine-Grained Language Identification in Scene Text Images. 4573-4581 - Dongjie Tang, Cathy Bao, Yong Yao, Chao Xie, Qiming Shi, Marc Mao, Randy Xu, Linsheng Li, Mohammad R. Haghighat, Zhengwei Qi, Haibing Guan:
CARE: Cloudified Android OSes on the Cloud Rendering. 4582-4590 - Shuangping Huang, Yu Luo, Zhenzhou Zhuang, Jin-Gang Yu, Mengchao He, Yongpan Wang:
Context-Aware Selective Label Smoothing for Calibrating Sequence Recognition Model. 4591-4599 - Chunbin Gu, Jiajun Bu, Zhen Zhang, Zhi Yu, Dongfang Ma, Wei Wang:
Image Search with Text Feedback by Deep Hierarchical Attention Mutual Information Maximization. 4600-4609
Panel 2
- Hayley Hung, Cathal Gurrin, Martha A. Larson, Hatice Gunes, Fabien Ringeval, Elisabeth André, Louis-Philippe Morency:
Social Signals and Multimedia: Past, Present, Future. 4610-4612
Session 31: Multimedia Telepresence and Virtual/Augmented Reality
- Xianqiang Lyu, Zhiyu Zhu, Mantang Guo, Jing Jin, Junhui Hou, Huanqiang Zeng:
Learning Spatial-angular Fusion for Compressive Light Field Imaging in a Cycle-consistent Framework. 4613-4621 - Jiale Li, Hang Dai, Ling Shao, Yong Ding:
From Voxel to Point: IoU-guided 3D Object Detection for Point Cloud with Voxel-to-Point Decoder. 4622-4631 - Jisheng Li, Yuze He, Jinghui Jiao, Yubin Hu, Yuxing Han, Jiangtao Wen:
Extending 6-DoF VR Experience Via Multi-Sphere Images Interpolation. 4632-4640 - Liao Wang, Ziyu Wang, Pei Lin, Yuheng Jiang, Xin Suo, Minye Wu, Lan Xu, Jingyi Yu:
iButter: Neural Interactive Bullet Time Generator for Human Free-viewpoint Rendering. 4641-4650 - Guoxing Sun, Xin Chen, Yizhang Chen, Anqi Pang, Pei Lin, Yuheng Jiang, Lan Xu, Jingyi Yu, Jingya Wang:
Neural Free-Viewpoint Performance Rendering under Complex Human-object Interactions. 4651-4660 - Hongkuan Shi, Zhiwei Wang, Jinxin Lv, Yilang Wang, Peng Zhang, Fei Zhu, Qiang Li:
Semi-supervised Learning via Improved Teacher-Student Network for Robust 3D Reconstruction of Stereo Endoscopic Image. 4661-4669
Session 32: Social Multimedia
- Qiang Hou, Weiqing Min, Jing Wang, Sujuan Hou, Yuanjie Zheng, Shuqiang Jiang:
FoodLogoDet-1500: A Dataset for Large-Scale Food Logo Detection via Multi-Scale Feature Decoupling Network. 4670-4679 - Jing Wang, Yuanjie Zheng, Jingqi Song, Sujuan Hou:
Cross-View Representation Learning for Multi-View Logo Classification with Information Bottleneck. 4680-4688 - Xiangjun Tang, Wenxin Sun, Yong-Liang Yang, Xiaogang Jin:
Parametric Reshaping of Portraits in Videos. 4689-4697 - Anshu Singh, Shaojing Fan, Mohan S. Kankanhalli:
Human Attributes Prediction under Privacy-preserving Conditions. 4698-4706 - Bin Liang, Chenwei Lou, Xiang Li, Lin Gui, Min Yang, Ruifeng Xu:
Multi-Modal Sarcasm Detection with Interactive In-Modal and Cross-Modal Graphs. 4707-4715 - Shiwei Wu, Joya Chen, Tong Xu, Liyi Chen, Lingfei Wu, Yao Hu, Enhong Chen:
Linking the Characters: Video-oriented Social Graph Generation via Hierarchical-cumulative GCN. 4716-4724
Session 33: Multimedia Grand Challenge
- Zhenzhi Wang, Zhimin Li, Liyu Wu, Jiangfeng Xiong, Qinglin Lu:
Overview of Tencent Multi-modal Ads Video Understanding. 4725-4729 - Haoxin Zhang, Zhimin Li, Qinglin Lu:
Better Learning Shot Boundary Detection via Multi-task. 4730-4734 - Xinqi Fan, Ali Raza Shahid, Hong Yan:
Facial Micro-Expression Generation based on Deep Motion Retargeting and Transfer Learning. 4735-4739 - Chao Zhou, Wenjun Wu, Dan Yang, Tianchi Huang, Liang Guo, Bing Yu:
Deadline and Priority-aware Congestion Control for Delay-sensitive Multimedia Streaming. 4740-4744 - Wang-Wang Yu, Jingwen Jiang, Yong-Jie Li:
LSSNet: A Two-stream Convolutional Neural Network for Spotting Macro- and Micro-expression in Long Videos. 4745-4749 - Chengbo Dong, Xinru Chen, Aozhu Chen, Fan Hu, Zihan Wang, Xirong Li:
Multi-Level Visual Representation with Semantic-Reinforced Learning for Video Captioning. 4750-4754 - Yi Zhang, Youjun Zhao, Yuhang Wen, Zixuan Tang, Xinhua Xu, Mengyuan Liu:
Facial Prior Based First Order Motion Model for Micro-expression Generation. 4755-4759 - Xuansheng Wu, Feichi Yang, Tong Zhou, Xinyue Lin:
Rethinking the Impacts of Overfitting and Feature Quality on Small-scale Video Classification. 4760-4764 - Fuxing Leng:
A Gradient Balancing Approach for Robust Logo Detection. 4765-4769 - Daya Guo, Zhaoyang Zeng:
Multi-modal Representation Learning for Video Advertisement Content Structuring. 4770-4774 - Haozhe Li:
Phoenix: Combining Highest-Profit First Scheduling and Responsive Congestion Control for Delay-sensitive Multimedia Transmission. 4775-4778 - Wei Ji, Yicong Li, Meng Wei, Xindi Shang, Junbin Xiao, Tongwei Ren, Tat-Seng Chua:
VidVRD 2021: The Third Grand Challenge on Video Relation Detection. 4779-4783 - Weipeng Xu, Ye Liu, Daquan Lin:
A Simple and Effective Baseline for Robust Logo Detection. 4784-4788 - Hang Chen, Xiao Li, Zefan Wang, Xiaolin Hu:
Robust Logo Detection in E-Commerce Images by Data Augmentation. 4789-4793 - Bo Yang, Jianming Wu, Zhiguang Zhou, Megumi Komiya, Koki Kishimoto, Jianfeng Xu, Keisuke Nonaka, Toshiharu Horiuchi, Satoshi Komorita, Gen Hattori, Sei Naito, Yasuhiro Takishima:
Facial Action Unit-based Deep Learning Framework for Spotting Macro- and Micro-expressions in Long Video Sequences. 4794-4798 - Liwei Jin, Haoyue Cheng, Su Xu, Wayne Wu, Limin Wang:
NJU MCG - Sensetime Team Submission to Pre-training for Video Understanding Challenge Track II. 4799-4802 - Yuhong He:
Research on Micro-Expression Spotting Method Based on Optical Flow Features. 4803-4807 - Hao Wu, Jiajie Wang, Yuanzhe Gu, Peisen Zhao, Zhonglin Zu:
A Solution to Multi-modal Ads Video Tagging Challenge. 4808-4812 - Yifan Xu, Sirui Zhao, Huaying Tang, Xinglong Mao, Tong Xu, Enhong Chen:
FAMGAN: Fine-grained AUs Modulation based Generative Adversarial Network for Micro-Expression Generation. 4813-4817 - Yiqing Huang, Hongwei Xue, Jiansheng Chen, Huimin Ma, Hongbing Ma:
Semantic Tag Augmented XlanV Model for Video Captioning. 4818-4822 - Qin Lin, Nuo Pang, Zhiying Hong:
Automated Multi-Modal Video Editing for Ads Video. 4823-4827 - Dongyuan Su, Laizhong Cui, Lei Zhang, Yanyan Suo, Yan Qiu:
Rate Adaptation and Block Scheduling for Delay-sensitive Multimedia Applications. 4828-4832 - Kaifeng Gao, Long Chen, Yifeng Huang, Jun Xiao:
Video Relation Detection via Tracklet based Visual Transformer. 4833-4837 - Chris Birmingham, Kalin Stefanov, Maja J. Mataric:
Group-Level Focus of Visual Attention for Improved Next Speaker Prediction. 4838-4842 - Zejia Weng, Lingchen Meng, Rui Wang, Zuxuan Wu, Yu-Gang Jiang:
A Multimodal Framework for Video Ads Understanding. 4843-4847 - Beibei Zhang, Fan Yu, Yanxin Gao, Tongwei Ren, Gangshan Wu:
Joint Learning for Relationship and Interaction Analysis in Video with Multimodal Feature Fusion. 4848-4852 - Sihan Chen, Xinxin Zhu, Dongze Hao, Wei Liu, Jiawei Liu, Zijia Zhao, Longteng Guo, Jing Liu:
MM21 Pre-training for Video Understanding Challenge: Video Captioning with Pretraining Techniques. 4853-4857 - Mingkang Tang, Zhanyu Wang, Zhenhua Liu, Fengyun Rao, Dian Li, Xiu Li:
CLIP4Caption: CLIP for Video Caption. 4858-4862 - Jie Zhang, Junjie Deng, Mowei Wang, Yong Cui, Wei Tsang Ooi, Jiangchuan Liu, Xinyu Zhang, Kai Zheng, Yi Li:
The ACM Multimedia 2021 Meet Deadline Requirements Grand Challenge. 4863-4867 - Vishal Anand, Raksha Ramesh, Boshen Jin, Ziyin Wang, Xiaoxiao Lei, Ching-Yung Lin:
MultiModal Language Modelling on Knowledge Graphs for Deep Video Understanding. 4868-4872 - Eugene Yujun Fu, Michael W. Ngai:
Using Motion Histories for Eye Contact Detection in Multiperson Group Conversations. 4873-4877 - Philipp Müller, Michael Dietz, Dominik Schiller, Dominike Thomas, Guanhua Zhang, Patrick Gebhard, Elisabeth André, Andreas Bulling:
MultiMediate: Multi-modal Group Behaviour Analysis for Artificial Mediation. 4878-4882
Session 34: Summarization, Analytics, and Storytelling
- Vinit Veerendraveer Singh, Shivanand Venkanna Sheshappanavar, Chandra Kambhamettu:
MeshNet++: A Network with a Face. 4883-4891 - Mengshi Qi, Jie Qin, Di Huang, Zhiqiang Shen, Yi Yang, Jiebo Luo:
Latent Memory-augmented Graph Transformer for Visual Storytelling. 4892-4901 - Shunli Wang, Dingkang Yang, Peng Zhai, Chixiao Chen, Lihua Zhang:
TSA-Net: Tube Self-Attention Network for Action Quality Assessment. 4902-4910 - Xiangpeng Li, Lianli Gao, Lei Zhao, Jingkuan Song:
Exploring Contextual-Aware Representation and Linguistic-Diverse Expression for Visual Dialog. 4911-4919 - Injung Lee, Hyunchul Kim, Byungjoo Lee:
Automated Playtesting with a Cognitive Model of Sensorimotor Coordination. 4920-4929 - Yifan Ren, Xing Xu, Fumin Shen, Yazhou Yao, Huimin Lu:
CAA: Candidate-Aware Aggregation for Temporal Action Detection. 4930-4938
Session 35: Vision and Language-I
- Zehui Chen, Chenhongyi Yang, Qiaofei Li, Feng Zhao, Zheng-Jun Zha, Feng Wu:
Disentangle Your Dense Object Detector. 4939-4948 - Lei Hu, Shaoli Huang, Shilei Wang, Wei Liu, Jifeng Ning:
Do We Really Need Frame-by-Frame Annotation Datasets for Object Tracking? 4949-4957 - Xu Chen, Chenqiang Gao, Feng Yang, Xiaohan Wang, Yi Yang, Yahong Han:
Video-to-Image Casting: A Flatting Method for Video Analysis. 4958-4966 - Zhirui Zhao, Changqun Xia, Chenxi Xie, Jia Li:
Complementary Trilateral Decoder for Fast and Accurate Salient Object Detection. 4967-4975 - Kedi Lyu, Zhenguang Liu, Shuang Wu, Haipeng Chen, Xuhong Zhang, Yuyu Yin:
Learning Human Motion Prediction via Stochastic Differential Equations. 4976-4984 - Ning Wang, Guangming Zhu, Liang Zhang, Peiyi Shen, Hongsheng Li, Cong Hua:
Spatio-Temporal Interaction Graph Parsing Networks for Human-Object Interaction Recognition. 4985-4993
Session 36: Vision and Language-II
- Xiang Guan, Guoqing Wang, Xing Xu, Yi Bin:
Learning Hierarchal Channel Attention for Fine-grained Visual Classification. 5011-5019 - Jiuniu Wang, Wenjia Xu, Qingzhong Wang, Antoni B. Chan:
Group-based Distinctive Image Captioning with Memory Attention. 5020-5028 - Lei Li, Chun Yuan:
VQMG: Hierarchical Vector Quantised and Multi-hops Graph Reasoning for Explicit Representation Learning. 5029-5037 - Minli Li, Peilin Zhao, Yifan Zhang, Shuaicheng Niu, Qingyao Wu, Mingkui Tan:
Structure-aware Mathematical Expression Recognition with Sequence-Level Modeling. 5038-5046 - Ying Cheng, Ruize Wang, Jiashuo Yu, Rui-Wei Zhao, Yuejie Zhang, Rui Feng:
Exploring Logical Reasoning for Referring Expression Comprehension. 5047-5055 - Zeliang Song, Xiaofei Zhou, Linhua Dong, Jianlong Tan, Li Guo:
Direction Relation Transformer for Image Captioning. 5056-5064
Session 37: Vision and Language-III
- Tao Jin, Zhou Zhao:
Contrastive Disentangled Meta-Learning for Signer-Independent Sign Language Translation. 5065-5073 - Zeming Liao, Qingbao Huang, Yu Liang, Mingyi Fu, Yi Cai, Qing Li:
Scene Graph with 3D Information for Change Captioning. 5074-5082 - Hongying Liu, Ruyi Luo, Fanhua Shang, Mantang Niu, Yuanyuan Liu:
Progressive Semantic Matching for Video-Text Retrieval. 5083-5091 - Qing Lin, Bo Yan, Weimin Tan:
Multimodal Asymmetric Dual Learning for Unsupervised Eyeglasses Removal. 5092-5100 - Dong An, Yuankai Qi, Yan Huang, Qi Wu, Liang Wang, Tieniu Tan:
Neighbor-view Enhanced Model for Vision and Language Navigation. 5101-5109 - Yi Bin, Xindi Shang, Bo Peng, Yujuan Ding, Tat-Seng Chua:
Multi-Perspective Video Captioning. 5110-5118
Poster Session 6
- Hui Wang, Dan Guo, Xian-Sheng Hua, Meng Wang:
Pairwise VLAD Interaction Network for Video Question Answering. 5119-5127 - Yunke Zhang, Chi Wang, Miaomiao Cui, Peiran Ren, Xuansong Xie, Xian-Sheng Hua, Hujun Bao, Qixing Huang, Weiwei Xu:
Attention-guided Temporally Coherent Video Object Matting. 5128-5137 - Roy Ka-Wei Lee, Rui Cao, Ziqing Fan, Jing Jiang, Wen-Haw Chong:
Disentangling Hate in Online Memes. 5138-5147 - Jiutao Yue, Haofeng Li, Pengxu Wei, Guanbin Li, Liang Lin:
Robust Real-World Image Super-Resolution against Adversarial Attacks. 5148-5157 - Chaoning Zhang, Adil Karjauv, Philipp Benz, In So Kweon:
Towards Robust Deep Hiding Under Non-Differentiable Distortions for Practical Blind Watermarking. 5158-5166 - Liuwu Li, Yuqi Bu, Yi Cai:
Bottom-Up and Bidirectional Alignment for Referring Expression Comprehension. 5167-5175 - Lingyun Zhang, Xiuxiu Bai, Yao Gao:
SalS-GAN: Spatially-Adaptive Latent Space in StyleGAN for Real Image Embedding. 5176-5184 - Xuri Ge, Fuhai Chen, Joemon M. Jose, Zhilong Ji, Zhongqin Wu, Xiao Liu:
Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval. 5185-5193 - Clinton Mo, Kun Hu, Shaohui Mei, Zebin Chen, Zhiyong Wang:
Keyframe Extraction from Motion Capture Sequences with Graph based Deep Reinforcement Learning. 5194-5202 - Lei Shi, Kai Shuang, Shijie Geng, Peng Gao, Zuohui Fu, Gerard de Melo, Yunpeng Chen, Sen Su:
Dense Contrastive Visual-Linguistic Pretraining. 5203-5212 - Weijiang Yu, Jian Liang, Lei Ji, Lu Li, Yuejian Fang, Nong Xiao, Nan Duan:
Hybrid Reasoning Network for Video-based Commonsense Captioning. 5213-5221 - Guibao Shen, Yingkui Zhang, Jialu Li, Mingqiang Wei, Qiong Wang, Guangyong Chen, Pheng-Ann Heng:
Learning Regularizer for Monocular Depth Estimation with Adversarial Guidance. 5222-5230 - Wenyu Zhang, Qing Ding, Jian Hu, Yi Ma, Mingzhe Lu:
Pixel-wise Graph Attention Networks for Person Re-identification. 5231-5238 - Xiaomeng Chu, Jiajun Deng, Yao Li, Zhenxun Yuan, Yanyong Zhang, Jianmin Ji, Yu Zhang:
Neighbor-Vote: Improving Monocular 3D Object Detection through Neighbor Distance Voting. 5239-5247 - Rui Ma, Hanxiao Luo, Qingbo Wu, King Ngi Ngan, Hongliang Li, Fanman Meng, Linfeng Xu:
Remember and Reuse: Cross-Task Blind Image Quality Assessment via Relevance-aware Incremental Learning. 5248-5256 - Yajun Gao, Tengfei Liang, Yi Jin, Xiaoyan Gu, Wu Liu, Yidong Li, Congyan Lang:
MSO: Multi-Feature Space Joint Optimization Network for RGB-Infrared Person Re-Identification. 5257-5265 - Wenxu Tao, Gangyi Jiang, Zhidi Jiang, Mei Yu:
Point Cloud Projection and Multi-Scale Feature Fusion Network Based Blind Quality Assessment for Colored Point Clouds. 5266-5272 - Guangjun Li, Yongxiong Wang, Fengting Zhu:
Multi-branch Channel-wise Enhancement Network for Fine-grained Visual Recognition. 5273-5280 - Bowei Zhu, Yong Liu:
General Approximate Cross Validation for Model Selection: Supervised, Semi-supervised and Pairwise Learning. 5281-5289 - Qian Ye, Jun Xiao, Kin-Man Lam, Takayuki Okatani:
Progressive and Selective Fusion Network for High Dynamic Range Imaging. 5290-5297 - Changmeng Zheng, Junhao Feng, Ze Fu, Yi Cai, Qing Li, Tao Wang:
Multimodal Relation Extraction with Efficient Graph Alignment. 5298-5306 - Jia Tan, Nan Ji, Haidong Xie, Xueshuang Xiang:
Legitimate Adversarial Patches: Evading Human Eyes and Detection Models in the Physical World. 5307-5315 - Xian Zhong, Shilei Zhao, Xiao Wang, Kui Jiang, Wenxuan Liu, Wenxin Huang, Zheng Wang:
Unsupervised Vehicle Search in the Wild: A New Benchmark. 5316-5325 - Yuqian Fu, Yanwei Fu, Yu-Gang Jiang:
Meta-FDMixup: Cross-Domain Few-Shot Learning Guided by Labeled Target Data. 5326-5334 - Jiliang Yan, Deming Zhai, Junjun Jiang, Xianming Liu:
Target-guided Adaptive Base Class Reweighting for Few-Shot Learning. 5335-5343 - Yunzhi Zhuge, Chunhua Shen:
Deep Reasoning Network for Few-shot Semantic Segmentation. 5344-5352 - Gangjian Zhang, Shikui Wei, Huaxin Pang, Yao Zhao:
Heterogeneous Feature Fusion and Cross-modal Alignment for Composed Image Retrieval. 5353-5362 - Guodun Li, Yuchen Zhai, Zehao Lin, Yin Zhang:
Similar Scenes Arouse Similar Emotions: Parallel Data Augmentation for Stylized Image Captioning. 5363-5372 - Danni Xu, Ruimin Hu, Zixiang Xiong, Zheng Wang, Linbo Luo, Dengshi Li:
Trajectory is not Enough: Hidden Following Detection. 5373-5381 - Yinwei Wei, Xiang Wang, Qi Li, Liqiang Nie, Yan Li, Xuanping Li, Tat-Seng Chua:
Contrastive Learning for Cold-Start Recommendation. 5382-5390 - Jixin Liu, Rui Chen, Shipeng An, Heng Zhang:
CG-GAN: Class-Attribute Guided Generative Adversarial Network for Old Photo Restoration. 5391-5399 - Zekun Zheng, Xiaodong Wang, Xinye Lin, Shaohe Lv:
Get The Best of the Three Worlds: Real-Time Neural Image Compression in a Non-GPU Environment. 5400-5409 - Ye Zheng, Xi Huang, Li Cui:
Visual Language Based Succinct Zero-Shot Object Detection. 5410-5418 - Bo Jiang, Pengfei Sun, Ziyan Zhang, Jin Tang, Bin Luo:
GAMnet: Robust Feature Matching via Graph Adversarial-Matching Network. 5419-5426 - Zhixiong Zeng, Ying Sun, Wenji Mao:
MCCN: Multimodal Coordinated Clustering Network for Large-Scale Cross-modal Retrieval. 5427-5435 - Yi Ma, Yongqi Zhai, Jiayu Yang, Chunhui Yang, Ronggang Wang:
AFEC: Adaptive Feature Extraction Modules for Learned Image Compression. 5436-5444 - Chengcheng Zhou, Zongqing Lu, Linge Li, Qiangyu Yan, Jing-Hao Xue:
How Video Super-Resolution and Frame Interpolation Mutually Benefit. 5445-5453 - Lingdong Wang, Mohammad H. Hajiesmaili, Ramesh K. Sitaraman:
FOCAS: Practical Video Super Resolution using Foveated Rendering. 5454-5462 - Xiangrong Zhang, Zelin Peng, Peng Zhu, Tianyang Zhang, Chen Li, Huiyu Zhou, Licheng Jiao:
Adaptive Affinity Loss and Erroneous Pseudo-Label Refinement for Weakly Supervised Semantic Segmentation. 5463-5472 - Jialin Tian, Xing Xu, Zheng Wang, Fumin Shen, Xin Liu:
Relationship-Preserving Knowledge Distillation for Zero-Shot Sketch Based Image Retrieval. 5473-5481 - Francesco Bongini, Lorenzo Berlincioni, Marco Bertini, Alberto Del Bimbo:
Partially Fake it Till you Make It: Mixing Real and Fake Thermal Images for Improved Object Detection. 5482-5490 - Tianshuo Xu, Yuhang Wu, Xiawu Zheng, Teng Xi, Gang Zhang, Errui Ding, Fei Chao, Rongrong Ji:
CDP: Towards Optimal Filter Pruning via Class-wise Discriminative Power. 5491-5500 - Tao Lu, Yuanzhi Wang, Yanduo Zhang, Yu Wang, Wei Liu, Zhongyuan Wang, Junjun Jiang:
Face Hallucination via Split-Attention in Split-Attention Network. 5501-5509 - Xianglong Feng, Yi Xie, Mengmei Ye, Zhongze Tang, Bo Yuan, Sheng Wei:
Fake Gradient: A Security and Privacy Protection Framework for DNN-based Image Classification. 5510-5518 - Zhihua Li, Xiang Deng, Xiaotian Li, Lijun Yin:
Integrating Semantic and Temporal Relationships in Facial Action Unit Detection. 5519-5527 - Md Fahim Faysal Khan, Nelson Daniel Troncoso Aldas, Abhishek Kumar, Siddharth Advani, Vijaykrishnan Narayanan:
Sparse to Dense Depth Completion using a Generative Adversarial Network with Intelligent Sampling Strategies. 5528-5536 - Siyan Xue, Shaobing Gao, Minjie Tan, Zhen He, Liangtian He:
How does Color Constancy Affect Target Recognition and Instance Segmentation? 5537-5545 - Xinyang Feng, Dongjin Song, Yuncong Chen, Zhengzhang Chen, Jingchao Ni, Haifeng Chen:
Convolutional Transformer based Dual Discriminator Generative Adversarial Networks for Video Anomaly Detection. 5546-5554 - Yuan Chang, Yisong Chen, Guoping Wang:
Salient Error Detection based Refinement for Wide-baseline Image Interpolation. 5555-5564 - Rui Li, Yiting Wang, Bao-Liang Lu:
A Multi-Domain Adaptive Graph Convolutional Network for EEG-based Emotion Recognition. 5565-5573 - Zhenhong Sun, Zhiyu Tan, Xiuyu Sun, Fangyi Zhang, Yichen Qian, Dongyang Li, Hao Li:
Interpolation Variable Rate Image Compression. 5574-5582 - Songhe Wang, Zheng Bao, Jingtong E:
Armor: A Benchmark for Meta-evaluation of Artificial Music. 5583-5590 - Haiwen Hong, Xuan Jin, Yin Zhang, Yunqing Hu, Jingfeng Zhang, Yuan He, Hui Xue:
DRDF: Determining the Importance of Different Multimodal Information with Dual-Router Dynamic Framework. 5591-5599 - Jianjie Luo, Yehao Li, Yingwei Pan, Ting Yao, Hongyang Chao, Tao Mei:
CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising. 5600-5608 - Jiaqing Xu, Haifeng Sun, Qi Qi, Jingyu Wang, Ce Ge, Lejian Zhang, Jianxin Liao:
DLA-Net for FG-SBIR: Dynamic Local Aligned Network for Fine-Grained Sketch-Based Image Retrieval. 5609-5618 - Qianxiu Hao, Qianqian Xu, Zhiyong Yang, Qingming Huang:
Pareto Optimality for Fairness-constrained Collaborative Filtering. 5619-5627 - Yan Gao, Qimeng Wang, Xu Tang, Haochen Wang, Fei Ding, Jing Li, Yao Hu:
Decoupled IoU Regression for Object Detection. 5628-5636 - Zhuofan Zong, Qianggang Cao, Biao Leng:
RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection. 5637-5645 - Minyi Zhao, Yi Xu, Shuigeng Zhou:
Recursive Fusion and Deformable Spatiotemporal Attention for Video Compression Artifact Reduction. 5646-5654 - Jan Zdenek, Hideki Nakayama:
JokerGAN: Memory-Efficient Model for Handwritten Text Generation with Text Line Awareness. 5655-5663
Tutorials
- Kede Ma, Yuming Fang:
Image Quality Assessment in the Modern Age. 5664-5666 - Xiaowen Huang, Jiaming Zhang, Yi Zhang, Xian Zhao, Jitao Sang:
Trustworthy Multimedia Analysis. 5667-5669 - Manjunath Iyer:
Multimedia Classifiers: Behind the Scenes. 5670-5672 - Jie Chen, Qixiang Ye, Xiaoshan Yang, S. Kevin Zhou, Xiaopeng Hong, Li Zhang:
Few-shot Learning for Multi-Modality Tasks. 5673-5674 - António M. G. Pinheiro:
Plenoptic Quality Assessment: The JPEG Pleno Experience. 5675-5677 - Xu Tan, Xiaobing Li:
A Tutorial on AI Music Composition. 5678-5680 - Xin Wang, Peng Cui, Wenwu Zhu:
Out-of-distribution Generalization and Its Applications for Multimedia. 5681-5682 - Guo Lu, Ren Yang, Shenlong Wang, Shan Liu, Radu Timofte:
Deep Learning for Visual Data Compression. 5683-5685
Workshop Summaries
- Aishan Liu, Xinyun Chen, Yingwei Li, Chaowei Xiao, Xun Yang, Xianglong Liu, Dawn Song, Dacheng Tao, Alan L. Yuille, Anima Anandkumar:
ADVM'21: 1st International Workshop on Adversarial Learning for Multimedia. 5686-5687 - Ricardo Guerrero, Michael Spranger, Shuqiang Jiang, Chong-Wah Ngo:
AIxFood'21: 3rd Workshop on AIxFood. 5688-5689 - Wu Liu, Xinchen Liu, Jingkuan Song, Dingwen Zhang, Wenbing Huang, Junbo Guo, John Smith:
HUMA'21: 2nd International Workshop on Human-centric Multimedia Analysis. 5690-5691 - Rainer Lienhart, Thomas B. Moeslund, Hideo Saito:
MMSports'21: 4th International Workshop on Multimedia Content Analysis in Sports. 5692-5693 - Valérie Gouet-Brunet, Margarita Khokhlova, Ronak Kosti, Li Weng:
SUMAC'21: 3rd Workshop on Structuring and Understanding of Multimedia heritAge Contents. 5694-5695 - Stevan Rudinac, Alessandro Bozzon, Tat-Seng Chua, Suzanne Little, Daniel Gatica-Perez, Kiyoharu Aizawa:
UrbanMM'21: 1st International Workshop on Multimedia Computing for Urban Data. 5696-5697 - Stefan Winkler, Weiling Chen, Abhinav Dhall, Pavel Korshunov:
ADGD'21: 1st Workshop on Synthetic Multimedia - Audiovisual Deepfake Generation and Detection. 5698-5699 - Jingting Li, Moi Hoon Yap, Wen-Huang Cheng, John See, Xiaopeng Hong, Xiaobai Li, Su-Jing Wang:
FME'21: 1st Workshop on Facial Micro-Expression: Advanced Techniques for Facial Expressions Generation and Spotting. 5700-5701 - João Magalhães, Alexander G. Hauptmann, Ricardo Gamelas Sousa, Carlos Santiago:
MuCAI'21: 2nd ACM Multimedia Workshop on Multimodal Conversational AI. 5702-5703 - Xiu-Shen Wei, Jufeng Yang, Han-Jia Ye, Jian Yang:
MULL'21: First International Workshop on Multimedia Understanding with Less Labeling. 5704-5705 - Lukas Stappen, Eva-Maria Meßner, Erik Cambria, Guoying Zhao, Björn W. Schuller:
MuSe 2021 Challenge: Multimodal Emotion, Sentiment, Physiological-Emotion, and Stress Detection. 5706-5707 - Teddy Furon, Jingen Liu, Yogesh S. Rawat, Wei Zhang, Qi Zhao:
Trustworthy AI'21: 1st International Workshop on Trustworthy AI for Multimedia Computing. 5708-5709 - Yueting Zhuang, Xing Tang, Guilin Wu, Yahong Han, Haihong Tang, Xiaobo Li, Xiaohan Wang, Baoming Yan, Bo Gao, Yi Yang:
WAB'21: 1st Workshop on Multimodal Product Identification in Livestreaming and WAB Challenge. 5710-5711