CVPR 2025: Nashville, TN, USA
- IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025, Nashville, TN, USA, June 11-15, 2025. Computer Vision Foundation / IEEE 2025, ISBN 979-8-3503-5300-6
Day 1: 2025-06-13
- Daniel Geng, Charles Herrmann, Junhwa Hur, Forrester Cole, Serena Zhang, Tobias Pfaff, Tatiana Lopez-Guevara, Yusuf Aytar, Michael Rubinstein, Chen Sun, Oliver Wang, Andrew Owens, Deqing Sun:
Motion Prompting: Controlling Video Generation with Motion Trajectories. 1-12 - Ryan D. Burgert, Yuancheng Xu, Wenqi Xian, Oliver Pilarski, Pascal Clausen, Mingming He, Li Ma, Yitong Deng, Lingxiao Li, Mohsen Mousavi, Michael S. Ryoo, Paul E. Debevec, Ning Yu:
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise. 13-23 - Pascal Chang, Sergio Sancho, Jingwei Tang, Markus Gross, Vinicius C. Azevedo:
LookingGlass: Generative Anamorphoses via Laplacian Pyramid Warping. 24-33 - Yifan Zhou, Zeqi Xiao, Shuai Yang, Xingang Pan:
Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space. 34-44 - Ziqi Pang, Tianyuan Zhang, Fujun Luan, Yunze Man, Hao Tan, Kai Zhang, William T. Freeman, Yu-Xiong Wang:
RandAR: Decoder-only Autoregressive Visual Generation in Random Orders. 45-55 - Pengfei Zhou, Xiaopeng Peng, Jiajun Song, Chuanhao Li, Zhaopan Xu, Yue Yang, Ziyao Guo, Hao Zhang, Yuqi Lin, Yefei He, Lirui Zhao, Shuo Liu, Tianhua Li, Yuxuan Xie, Xiaojun Chang, Yu Qiao, Wenqi Shao, Kaipeng Zhang:
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation. 56-66 - Faridoun Mehri, Mahdieh Soleymani Baghshah, Mohammad Taher Pilehvar:
LibraGrad: Balancing Gradient Flow for Universally Better Vision Transformer Attributions. 67-78 - Damien Teney, Liangze Jiang, Florin Gogianu, Ehsan Abbasnejad:
Do We Always Need the Simplicity Bias? Looking for Optimal Inductive Biases in the Wild. 79-90 - Matt Deitke, Christopher Clark, Sangho Lee, Rohun Tripathi, Yue Yang, Jae Sung Park, Mohammadreza Salehi, Niklas Muennighoff, Kyle Lo, Luca Soldaini, Jiasen Lu, Taira Anderson, Erin Bransom, Kiana Ehsani, Huong Ngo, Yen-Sung Chen, Ajay Patel, Mark Yatskar, Chris Callison-Burch, Andrew Head, Rose Hendrix, Favyen Bastani, Eli VanderBilt, Nathan Lambert, Yvonne Chou, Arnavi Chheda, Jenna Sparks, Sam Skjonsberg, Michael Schmitz, Aaron Sarnat, Byron Bischoff, Pete Walsh, Chris Newell, Piper Wolters, Tanmay Gupta, Kuo-Hao Zeng, Jon Borchardt, Dirk Groeneveld, Crystal Nam, Sophie Lebrecht, Caitlin Wittlif, Carissa Schoenick, Oscar Michel, Ranjay Krishna, Luca Weihs, Noah A. Smith, Hannaneh Hajishirzi, Ross B. Girshick, Ali Farhadi, Aniruddha Kembhavi:
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models. 91-104 - Xiao Guo, Xiufeng Song, Yue Zhang, Xiaohong Liu, Xiaoming Liu:
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector. 105-116 - Nick Stracke, Stefan Andreas Baumann, Kolja Bauer, Frank Fundel, Björn Ommer:
CleanDIFT: Diffusion Features without Noise. 117-127 - Meng Lou, Yizhou Yu:
OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels. 128-138 - Longyu Yang, Ping Hu, Shangbo Yuan, Lu Zhang, Jun Liu, Hengtao Shen, Xiaofeng Zhu:
Towards Explicit Geometry-Reflectance Collaboration for Generalized LiDAR Segmentation in Adverse Weather. 139-149 - Xiaoyi Liu, Hao Tang:
DiffFNO: Diffusion Fourier Neural Operator. 150-160 - Eric Kee, Adam Pikielny, Kevin Blackburn-Matzen, Marc Levoy:
Removing Reflections from RAW Photos. 161-171 - Zhedong Zhang, Liang Li, Chenggang Yan, Chunshan Liu, Anton van den Hengel, Yuankai Qi:
Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing. 172-182 - Hao Li, Ju Dai, Xin Zhao, Feng Zhou, Junjun Pan, Lei Li:
Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation. 183-192 - Xiaozhong Ji, Xiaobin Hu, Zhihong Xu, Junwei Zhu, Chuming Lin, Qingdong He, Jiangning Zhang, Donghao Luo, Yi Chen, Qin Lin, Qinglin Lu, Chengjie Wang:
Sonic: Shifting Focus to Global Audio Perception in Portrait Animation. 193-203 - Xuanchen Li, Jianyu Wang, Yuhao Cheng, Yikun Zeng, Xingyu Ren, Wenhan Zhu, Weiming Zhao, Yichao Yan:
Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture. 204-214 - Tim Büchner, Christoph Anders, Orlando Guntinas-Lichius, Joachim Denzler:
Electromyography-Informed Facial Expression Reconstruction for Physiological-Based Synthesis and Analysis. 215-227 - Mingtao Guo, Guanyu Xing, Yanli Liu:
High-Fidelity Relightable Monocular Portrait Animation with Lighting-Controllable Video Diffusion Model. 228-238 - Tuur Stuyck, Gene Wei-Chin Lin, Egor Larionov, Hsiao-Yu Chen, Aljaz Bozic, Nikolaos Sarafianos, Doug Roble:
Quaffure: Real-Time Quasi-Static Neural Hair Simulation. 239-249 - Wei-Qi Feng, Dong Han, Ze-Kang Zhou, Shunkai Li, Xiaoqiang Liu, Pengfei Wan, Di Zhang, Miao Wang:
GPAvatar: High-fidelity Head Avatars by Learning Efficient Gaussian Projections. 250-259 - Hongrui Cai, Yuting Xiao, Xuan Wang, Jiafei Li, Yudong Guo, Yanbo Fan, Shenghua Gao, Juyong Zhang:
HERA: Hybrid Explicit Representation for Ultra-Realistic Head Avatars. 260-270 - Jack R. Saunders, Charlie Hewitt, Yanan Jian, Marek Kowalski, Tadas Baltrusaitis, Yiye Chen, Darren Cosker, Virginia Estellers, Nicholas Gyde, Vinay P. Namboodiri, Benjamin E. Lundell:
GASP: Gaussian Avatars with Synthetic Priors. 271-280 - Rong Wang, Fabian Prada, Ziyan Wang, Zhongshi Jiang, Chengxiang Yin, Junxuan Li, Shunsuke Saito, Igor Santesteban, Javier Romero, Rohan Joshi, Hongdong Li, Jason M. Saragih, Yaser Sheikh:
FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images. 281-291 - Jingyu Zhuang, Di Kang, Linchao Bao, Liang Lin, Guanbin Li:
DAGSM: Disentangled Avatar Generation with GS-enhanced Mesh. 292-303 - Zedong Chu, Feng Xiong, Meiduo Liu, Jinzhi Zhang, Mingqi Shao, Zhaoxu Sun, Di Wang, Mu Xu:
HumanRig: Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset. 304-313 - Yuanyou Xu, Zongxin Yang, Yi Yang:
SKDream: Controllable Multi-view and 3D Generation with Arbitrary Skeletons. 314-325 - Xingchao Yang, Takafumi Taketomi, Yuki Endo, Yoshihiro Kanamori:
FreeUV: Ground-Truth-Free Realistic Facial UV Texture Recovery via Cross-Assembly Inference Strategy. 326-337 - Gangjian Zhang, Nanjie Yao, Shunsi Zhang, Hanfeng Zhao, Guoliang Pang, Jian Shu, Hao Wang:
MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction. 338-347 - Zichen Tang, Yuan Yao, Miaomiao Cui, Liefeng Bo, Hongyu Yang:
GaussianIP: Identity-Preserving Realistic 3D Human Generation via Human-Centric Diffusion Prior. 348-358 - Yingmao Miao, Zhanpeng Huang, Rui Han, Zibin Wang, Chenhao Lin, Chao Shen:
Shining Yourself: High-Fidelity Ornaments Virtual Try-on with Diffusion Model. 359-368 - Sumit Chaturvedi, Mengwei Ren, Yannick Hold-Geoffroy, Jingyuan Liu, Julie Dorsey, Zhixin Shu:
SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces. 369-379 - Junying Wang, Jingyuan Liu, Xin Sun, Krishna Kumar Singh, Zhixin Shu, He Zhang, Jimei Yang, Nanxuan Zhao, Tuanfeng Y. Wang, Simon S. Chen, Ulrich Neumann, Jae Shin Yoon:
Comprehensive Relighting: Generalizable and Consistent Monocular Human Relighting and Harmonization. 380-390 - Kenji Enomoto, Scott Cohen, Brian L. Price, T. J. Rhodes:
Polarized Color Screen Matting. 391-399 - Ning Ni, Libao Zhang:
SLVR: Super-Light Visual Reconstruction via Blueprint Controllable Convolutions and Exploring Feature Diversity Representation. 400-410 - Ping Wang, Lishun Wang, Gang Qu, Xiaodong Wang, Yulun Zhang, Xin Yuan:
Proximal Algorithm Unrolling: Flexible and Efficient Reconstruction Networks for Single-Pixel Imaging. 411-421 - Bojian Wu, Yifan Peng, Ruizhen Hu, Xiaowei Zhou:
Glossy Object Reconstruction with Cost-effective Polarized Acquisition. 422-431 - Wei Xu, Charles James Wagner, Junjie Luo, Qi Guo:
Blurry-Edges: Photon-Limited Depth Estimation from Defocused Boundaries. 432-441 - Xiaoyan Xing, Konrad Groh, Sezer Karaoglu, Theo Gevers, Anand Bhattad:
LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting. 442-452 - Chao Wang, Zhihao Xia, Thomas Leimkühler, Karol Myszkowski, Xuaner Zhang:
LEDiff: Latent Exposure Diffusion for HDR Generation. 453-464 - Chih-Hao Lin, Jia-Bin Huang, Zhengqin Li, Zhao Dong, Christian Richardt, Tuotuo Li, Michael Zollhöfer, Johannes Kopf, Shenlong Wang, Changil Kim:
IRIS: Inverse Rendering of Indoor Scenes from Low Dynamic Range Images. 465-474 - Hoon-Gyu Chung, Seokjun Choi, Seung-Hwan Baek:
Differentiable Inverse Rendering with Interpretable Basis BRDFs. 475-484 - Samuel Rota Bulò, Nemanja Bartolovic, Lorenzo Porzi, Peter Kontschieder:
Hardware-Rasterized Ray-Based Gaussian Splatting. 485-494 - Chun Gu, Xiaofei Wei, Li Zhang, Xiatian Zhu:
TensoFlow: Tensorial Flow-based Sampler for Inverse Rendering. 495-504 - Zhengqin Li, Dilin Wang, Ka Chen, Zhaoyang Lv, Thu Nguyen-Phuoc, Milim Lee, Jia-Bin Huang, Lei Xiao, Yufeng Zhu, Carl S. Marshall, Yuheng Ren, Richard A. Newcombe, Zhao Dong:
LIRM: Large Inverse Rendering Model for Progressive Reconstruction of Shape, Materials and View-dependent Radiance Fields. 505-517 - Yutao Feng, Xiang Feng, Yintong Shang, Ying Jiang, Chang Yu, Zeshun Zong, Tianjia Shao, Hongzhi Wu, Kun Zhou, Chenfanfu Jiang, Yin Yang:
Gaussian Splashing: Unified Particles for Versatile Motion Synthesis and Rendering. 518-529 - Aditya Chetan, Guandao Yang, Zichen Wang, Steve Marschner, Bharath Hariharan:
Accurate Differential Operators for Hybrid Neural Fields. 530-539 - Feixiang He, Jiangbei Yue, Jialin Zhu, Armin Seyfried, Dan Casas, Julien Pettré, He Wang:
Learning Extremely High Density Crowds as Active Matters. 540-550 - Bojun Xiong, Jialun Liu, Jiakui Hu, Chenming Wu, Jinbo Wu, Xing Liu, Chen Zhao, Errui Ding, Zhouhui Lian:
TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting. 551-561 - Guoxing Sun, Rishabh Dabral, Heming Zhu, Pascal Fua, Christian Theobalt, Marc Habermann:
Real-time Free-view Human Rendering from Sparse-view RGB Videos using Double Unprojected Textures. 562-573 - Zhipeng Huang, Wangbo Yu, Xinhua Cheng, ChengShu Zhao, Yunyang Ge, Mingyi Guo, Li Yuan, Yonghong Tian:
RoomPainter: View-Integrated Diffusion for Consistent Indoor Scene Texturing. 574-584 - Wei Cheng, Juncheng Mu, Xianfang Zeng, Xin Chen, Anqi Pang, Chi Zhang, Zhibin Wang, Bin Fu, Gang Yu, Ziwei Liu, Liang Pan:
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D. 585-594 - Qiao Yu, Xianzhi Li, Yuan Tang, Xu Han, Long Hu, Yixue Hao, Min Chen:
Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation. 595-604 - Nissim Maruani, Wang Yifan, Matthew Fisher, Pierre Alliez, Mathieu Desbrun:
ShapeShifter: 3D Variations Using Multiscale and Sparse Point-Voxel Diffusion. 605-617 - Daoyi Gao, Yawar Siddiqui, Lei Li, Angela Dai:
MeshArt: Generating Articulated Meshes with Structure-Guided Transformers. 618-627 - Aleksei Bokhovkin, Quan Meng, Shubham Tulsiani, Angela Dai:
SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation. 628-639 - Ziya Erkoç, Can Gümeli, Chaoyang Wang, Matthias Nießner, Angela Dai, Peter Wonka, Hsin-Ying Lee, Peiye Zhuang:
PrEditor3D: Fast and Precise 3D Shape Editing. 640-649 - Quan Meng, Lei Li, Matthias Nießner, Angela Dai:
LT3SD: Latent Trees for 3D Scene Diffusion. 650-660 - Yian Zhao, Wanshi Xu, Ruochong Zheng, Pengchong Qiao, Chang Liu, Jie Chen:
iSegMan: Interactive Segment-and-Manipulate 3D Gaussians. 661-670 - Jianxiong Shen, Yue Qian, Xiaohang Zhan:
LOD-GS: Achieving Levels of Detail using Scalable Gaussian Soup. 671-680 - Yifei Liu, Zhihang Zhong, Yifan Zhan, Sheng Xu, Xiao Sun:
MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks. 681-690 - Kun Yang, Yuxiang Liu, Zeyu Cui, Yu Liu, Maojun Zhang, Shen Yan, Qing Wang:
NTR-Gaussian: Nighttime Dynamic Thermal Reconstruction with 4D Gaussian Splatting Based on Thermodynamics. 691-700 - Yexing Xu, Longguang Wang, Minglin Chen, Sheng Ao, Li Li, Yulan Guo:
DropoutGS: Dropping Out Gaussians for Better Sparse-view Rendering. 701-710 - Yecong Wan, Mingwen Shao, Yuanshuo Cheng, Wangmeng Zuo:
S2Gaussian: Sparse-View Super-Resolution 3D Gaussian Splatting. 711-721 - Yihao Wang, Marcus Klasson, Matias Turkulainen, Shuzhe Wang, Juho Kannala, Arno Solin:
DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering. 722-732 - Zhihao Liu, Zhanglin Cheng, Naoto Yokoya:
Neural Hierarchical Decomposition for Single Image Plant Modeling. 733-742 - Xiang Li, Zixuan Huang, Anh Thai, James M. Rehg:
Symmetry Strikes Back: From Single-Image Symmetry Detection to 3D Generation. 743-752 - Zhao Dong, Ka Chen, Zhaoyang Lv, Hong-Xing Yu, Yunzhi Zhang, Cheng Zhang, Yufeng Zhu, Stephen Tian, Zhengqin Li, Geordie Moffatt, Sean Christofferson, James Fort, Xiaqing Pan, Mingfei Yan, Jiajun Wu, Carl Yuheng Ren, Richard A. Newcombe:
Digital Twin Catalog: A Large-Scale Photorealistic 3D Object Digital Twin Dataset. 753-763 - Vitor Guizilini, Muhammad Zubair Irshad, Dian Chen, Greg Shakhnarovich, Rares Ambrus:
Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion. 764-776 - Lingen Li, Zhaoyang Zhang, Yaowei Li, Jiale Xu, Wenbo Hu, Xiaoyu Li, Weihao Cheng, Jinwei Gu, Tianfan Xue, Ying Shan:
NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images. 777-787 - Jingyu Lin, Jiaqi Gu, Lubin Fan, Bojian Wu, Yujing Lou, Renjie Chen, Ligang Liu, Jieping Ye:
HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting. 788-797 - Hanwen Liang, Junli Cao, Vidit Goel, Guocheng Qian, Sergei Korolev, Demetri Terzopoulos, Konstantinos N. Plataniotis, Sergey Tulyakov, Jian Ren:
Wonderland: Navigating 3D Scenes from a Single Image. 798-810 - Zhen Lv, Yangqi Long, Congzhentao Huang, Cao Li, Chengfei Lv, Hao Ren, Dian Zheng:
SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input. 811-821 - Yunzhi Yan, Zhen Xu, Haotong Lin, Haian Jin, Haoyu Guo, Yida Wang, Kun Zhan, Xianpeng Lang, Hujun Bao, Xiaowei Zhou, Sida Peng:
StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models. 822-832 - Jiadong Tang, Yu Gao, Dianyi Yang, Liqi Yan, Yufeng Yue, Yi Yang:
DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery. 833-843 - Cong Ruan, Yuesong Wang, Tao Guan, Bin Zhang, Lili Ju:
IndoorGS: Geometric Cues Guided Gaussian Splatting for Indoor Scene Reconstruction. 844-853 - Xiaohao Xu, Feng Xue, Shibo Zhao, Yike Pan, Sebastian A. Scherer, Xiaonan Huang:
MAC-Ego3D: Multi-Agent Gaussian Consensus for Real-Time Collaborative Ego-Motion and Photorealistic 3D Reconstruction. 854-863 - Sangmin Kim, Seunguk Do, Jaesik Park:
ShowMak3r: Compositional TV Show Reconstruction. 864-874 - Qiang Hu, Zihan Zheng, Houqiang Zhong, Sihua Fu, Li Song, Xiaoyun Zhang, Guangtao Zhai, Yanfeng Wang:
4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video. 875-885 - Yiming Liang, Tianhan Xu, Yuta Kikuchi:
HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation. 886-895 - Siyuan Shen, Tianjia Shao, Kun Zhou, Chenfanfu Jiang, Yin Yang:
EnliveningGS: Active Locomotion of 3DGS. 896-905 - Hongye Cheng, Tianyu Wang, Guangsi Shi, Zexing Zhao, Yanwei Fu:
HOP: Heterogeneous Topology-based Multimodal Entanglement for Co-Speech Gesture Generation. 906-916 - Haolin Liu, Xiaohang Zhan, Zizheng Yan, Zhongjin Luo, Yuxin Wen, Xiaoguang Han:
Stable-SCore: A Stable Registration-based Framework for 3D Shape Correspondence. 917-928 - Bohan Yu, Jinxiu Liang, Zhuofeng Wang, Bin Fan, Art Subpa-Asa, Boxin Shi, Imari Sato:
Active Hyperspectral Imaging Using an Event Camera. 929-939 - Yaniv Benny, Lior Wolf:
SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception. 940-950 - Huan Zheng, Wencheng Han, Jianbing Shen:
Decoupling Fine Detail and Global Geometry for Compressed Depth Map Super-Resolution. 951-960 - Hongkai Lin, Dingkang Liang, Zhenghao Qi, Xiang Bai:
A Unified Image-Dense Annotation Generation Model for Underwater Scenes. 961-970 - Jianing Li, Yunjian Zhang, Haiqian Han, Xiangyang Ji:
Active Event-based Stereo Vision. 971-981 - Zidong Cao, Jinjing Zhu, Weiming Zhang, Hao Ai, Haotian Bai, Hengshuang Zhao, Lin Wang:
PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Mobius Spatial Augmentation. 982-992 - Xunzhi Zheng, Dan Xu:
Flow-NeRF: Joint Learning of Geometry, Poses, and Dense Flow within Unified Neural Representations. 993-1002 - Jiaxi Deng, Yushen Wang, Haitao Meng, Zuoxun Hou, Yi Chang, Gang Chen:
OmniStereo: Real-time Omnidireactional Depth Estimation with Multiview Fisheye Cameras. 1003-1012 - Luca Bartolomei, Fabio Tosi, Matteo Poggi, Stefano Mattoccia:
Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail. 1013-1027 - Luigi Piccinelli, Christos Sakaridis, Mattia Segù, Yung-Hsu Yang, Siyuan Li, Wim Abbeloos, Luc Van Gool:
UniK3D: Universal Camera Monocular 3D Estimation. 1028-1039 - Yihan Wang, Linfei Pan, Marc Pollefeys, Viktor Larsson:
Structure-from-Motion with a Non-Parametric Camera Model. 1040-1049 - Yohann Cabon, Lucas Stoffl, Leonid Antsfeld, Gabriela Csurka, Boris Chidlovskii, Jérôme Revaud, Vincent Leroy:
MUSt3R: Multi-view Network for Stereo 3D Reconstruction. 1050-1060 - Hana Bezalel, Dotan Ankri, Ruojin Cai, Hadar Averbuch-Elor:
Extreme Rotation Estimation in the Wild. 1061-1070 - Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy, Lourdes Agapito, Jérôme Revaud:
Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors. 1071-1081 - Maxime Pietrantoni, Gabriela Csurka, Torsten Sattler:
Gaussian Splatting Feature Fields for (Privacy-Preserving) Visual Localization. 1082-1092 - Jonathan Astermark, Anders Heyden, Viktor Larsson:
Dense Match Summarization for Faster Two-view Estimation. 1093-1102 - Honggyu An, Jin Hyeon Kim, Seonghoon Park, Jaewoo Jung, Jisang Han, Sunghwan Hong, Seungryong Kim:
Cross-View Completion Models are Zero-shot Correspondence Estimators. 1103-1115 - David Yifan Yao, Albert J. Zhai, Shenlong Wang:
Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video. 1116-1126 - Yuzhen Liu, Qiulei Dong:
EquiPose: Exploiting Permutation Equivariance for Relative Camera Pose Estimation. 1127-1137 - Krispin Wandel, Hesheng Wang:
SemAlign3D: Semantic Correspondence between RGB-Images through Aligning 3D Object-Class Representations. 1138-1147 - Yufu Wang, Yu Sun, Priyanka Patel, Kostas Daniilidis, Michael J. Black, Muhammed Kocabas:
PromptHMR: Promptable Human Mesh Recovery. 1148-1159 - Yalong Xu, Lin Zhao, Chen Gong, Guangyu Li, Di Wang, Nannan Wang:
DynPose: Largely Improving the Efficiency of Human Pose Estimation by a Simple Dynamic Framework. 1160-1169 - Huan Ren, Wenfei Yang, Shifeng Zhang, Tianzhu Zhang:
Rethinking Correspondence-based Category-Level Object Pose Estimation. 1170-1179 - Ming-Feng Li, Xin Yang, Fu-En Wang, Hritam Basak, Yuyin Sun, Shreekant Gayaka, Min Sun, Cheng-Hao Kuo:
UA-Pose: Uncertainty-Aware 6D Object Pose Estimation and Online Object Completion with Partial References. 1180-1189 - Bin Tan, Rui Yu, Yujun Shen, Nan Xue:
PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes. 1190-1199 - Xiuqiang Song, Li Jin, Zhengxian Zhang, Jiachen Li, Fan Zhong, Guofeng Zhang, Xueying Qin:
Prior-free 3D Object Tracking. 1200-1209 - Guiyu Zhao, Sheng Ao, Ye Zhang, Kai Xu, Yulan Guo:
Progressive Correspondence Regenerator for Robust 3D Registration. 1210-1219 - Amir Etefaghi Daryani, M. Usman Maqbool Bhutta, Byron Hernandez, Henry Medeiros:
CaMuViD: Calibration-Free Multi-View Detection. 1220-1229 - Théo Bodrito, Olivier Flasseur, Julien Mairal, Jean Ponce, Maud Langlois, Anne-Marie Lagrange:
A New Statistical Model of Star Speckles for Learning to Detect and Characterize Exoplanets in Direct Imaging Observations. 1230-1240 - Huy Nguyen, Kien Nguyen, Akila Pemasiri, Feng Liu, Sridha Sridharan, Clinton Fookes:
AG-VPReID: A Challenging Large-Scale Benchmark for Aerial-Ground Video-based Person Re-Identification. 1241-1251 - Shuo Wang, Wanting Li, Yongcai Wang, Zhaoxin Fan, Zhe Huang, Xudong Cai, Jian Zhao, Deying Li:
MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing. 1252-1262 - Hongyu Sun, Qiuhong Ke, Ming Cheng, Yongcai Wang, Deying Li, Chenhui Gou, Jianfei Cai:
Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Point Cloud Analysis. 1263-1275 - Zimo Wang, Cheng Wang, Taiki Yoshino, Sirui Tao, Ziyang Fu, Tzu-Mao Li:
HotSpot: Signed Distance Function Optimization with an Asymptotically Sufficient Condition. 1276-1286 - Yuanqi Li, Jingcheng Huang, Hongshen Wang, Peiyuan Lv, Yansong Liu, Jiuming Zheng, Jie Guo, Yanwen Guo:
High-quality Point Cloud Oriented Normal Estimation via Hybrid Angular and Euclidean Distance Encoding. 1287-1296 - Jiangbei Hu, Yanggeng Li, Fei Hou, Junhui Hou, Zhebin Zhang, Shengfa Wang, Na Lei, Ying He:
A Lightweight UDF Learning Framework for 3D Reconstruction Based on Local Shape Functions. 1297-1307 - An Li, Zhe Zhu, Mingqiang Wei:
GenPC: Zero-shot Point Cloud Completion via 3D Generative Priors. 1308-1318 - Ziyi Wang, Yanran Zhang, Jie Zhou, Jiwen Lu:
UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting. 1319-1329 - Ziyin Zeng, Mingyue Dong, Jian Zhou, Huan Qiu, Zhen Dong, Man Luo, Bijun Li:
DeepLA-Net: Very Deep Local Aggregation Networks for Point Cloud Analysis. 1330-1341 - Chengzhi Wu, Yuxin Wan, Hao Fu, Julius Pfrommer, Zeyun Zhong, Junwei Zheng, Jiaming Zhang, Jürgen Beyerer:
SAMBLE: Shape-Specific Point Cloud Sampling for an Optimal Trade-Off Between Local Detail and Global Uniformity. 1342-1352 - Jianan Ye, Weiguang Zhao, Xi Yang, Guangliang Cheng, Kaizhu Huang:
PO3AD: Predicting Point Offsets toward Better 3D Point Cloud Anomaly Detection. 1353-1362 - Shaocheng Yan, Yiming Wang, Kaiyan Zhao, Pengcheng Shi, Zhenjun Zhao, Yongjun Zhang, Jiayuan Li:
HeMoRa: Unsupervised Heuristic Consensus Sampling for Robust Point Cloud Registration. 1363-1373 - Zihui Zhang, Weisheng Dai, Hongtao Wen, Bo Yang:
LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds. 1374-1384 - Runmao Yao, Yi Du, Zhuoqun Chen, Haoze Zheng, Chen Wang:
AirRoom: Objects Matter in Room Reidentification. 1385-1394 - Fajwel Fogel, Yohann Perron, Nikola Besic, Laurent Saint-André, Agnès Pellissier-Tanon, Martin Schwartz, Thomas Boudras, Ibrahim Fayad, Alexandre d'Aspremont, Loïc Landrieu, Philippe Ciais:
Open-Canopy: Towards Very High Resolution Forest Monitoring. 1395-1406 - Xin Jin, Haisheng Su, Kai Liu, Cong Ma, Wei Wu, Fei Hui, Junchi Yan:
UniMamba: Unified Spatial-Channel Representation Learning with Group-Efficient Mamba for LiDAR-based 3D Object Detection. 1407-1417 - Qiming Xia, Wenkai Lin, Haoen Xiang, Xun Huang, Siheng Chen, Zhen Dong, Cheng Wang, Chenglu Wen:
Learning to Detect Objects from Multi-Agent LiDAR Scans without Manual Labels. 1418-1428 - R. D. Lin, Pengcheng Weng, Yinqiao Wang, Han Ding, Jinsong Han, Fei Wang:
HiLoTs: High-Low Temporal Sensitive Representation Learning for Semi-Supervised LiDAR Segmentation in Autonomous Driving. 1429-1438 - Zakaria Laskar, Tomas Vojir, Matej Grcic, Iaroslav Melekhov, Shankar Gangisetty, Juho Kannala, Jiri Matas, Giorgos Tolias, C. V. Jawahar:
A Dataset for Semantic Segmentation in the Presence of Unknowns. 1439-1448 - Ben Agro, Sergio Casas, Patrick Wang, Thomas Gilles, Raquel Urtasun:
MAD: Memory-Augmented Detection of 3D Objects. 1449-1460 - Cédric Vincent, Taehyoung Kim, Henri Meeß:
High Temporal Consistency through Semantic Similarity Propagation in Semi-Supervised Video Semantic Segmentation for Autonomous Flight. 1461-1471 - Lingdong Kong, Dongyue Lu, Xiang Xu, Lai Xing Ng, Wei Tsang Ooi, Benoit R. Cottereau:
EventFly: Event Camera Perception from Ground to the Sky. 1472-1484 - Tianchen Deng, Guole Shen, Chen Xun, Shenghai Yuan, Tongxin Jin, Hongming Shen, Yanbo Wang, Jingchuan Wang, Hesheng Wang, Danwei Wang, Weidong Chen:
MNE-SLAM: Multi-Agent Neural SLAM for Mobile Robots. 1485-1494 - Xin Ye, Burhaneddin Yaman, Sheng Cheng, Feng Tao, Abhirup Mallik, Liu Ren:
BEVDiffuser: Plug-and-Play Diffusion Model for BEV Denoising with Ground-Truth Guidance. 1495-1504 - Dubing Chen, Huan Zheng, Jin Fang, Xingping Dong, Xianfei Li, Wenlong Liao, Tao He, Pai Peng, Jianbing Shen:
Rethinking Temporal Fusion with a Unified Gradient Descent View for 3D Semantic Occupancy Prediction. 1505-1515 - Zhimin Liao, Ping Wei, Shuaijia Chen, Haoxuan Wang, Ziyang Ren:
STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction. 1516-1526 - Pan Yin, Kaiyu Li, Xiangyong Cao, Jing Yao, Lei Liu, Xueru Bai, Feng Zhou, Deyu Meng:
Towards Satellite Image Road Graph Extraction: A Global-Scale Dataset and A Novel Method. 1527-1537 - Chenxu Zhou, Lvchang Fu, Sida Peng, Yunzhi Yan, Zhanhua Zhang, Yong Chen, Jiazhi Xia, Xiaowei Zhou:
LiDAR-RT: Gaussian-based Ray Tracing for Dynamic LiDAR Re-simulation. 1538-1548 - Jingqiu Zhou, Lue Fan, Linjiang Huang, Xiaoyu Shi, Si Liu, Zhaoxiang Zhang, Hongsheng Li:
FlexDrive: Toward Trajectory Flexibility in Driving Scene Gaussian Splatting Reconstruction and Rendering. 1549-1558 - Chaojun Ni, Guosheng Zhao, Xiaofeng Wang, Zheng Zhu, Wenkang Qin, Guan Huang, Chen Liu, Yuyin Chen, Yida Wang, Xueyang Zhang, Yifei Zhan, Kun Zhan, Peng Jia, Xianpeng Lang, Xingang Wang, Wenjun Mei:
ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration. 1559-1569 - Shuhan Tan, John Lambert, Hong Jeon, Sakshum Kulshrestha, Yijing Bai, Jing Luo, Dragomir Anguelov, Mingxing Tan, Chiyu Max Jiang:
SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model. 1570-1580 - Ziyang Xie, Zhizheng Liu, Zhenghao Peng, Wayne Wu, Bolei Zhou:
Vid2Sim: Realistic and Interactive Simulation from Video for Urban Navigation. 1581-1591 - Yuchen Xia, Quan Yuan, Guiyang Luo, Xiaoyuan Fu, Yang Li, Xuanhan Zhu, Tianyou Luo, Siheng Chen, Jinglin Li:
One is Plenty: A Polymorphic Feature Interpreter for Immutable Heterogeneous Collaborative Perception. 1592-1601 - Zebin Xing, Xingyu Zhang, Yang Hu, Bo Jiang, Tong He, Qian Zhang, Xiaoxiao Long, Wei Yin:
GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving. 1602-1611 - Zikang Zhou, Hengjian Zhou, Haibo Hu, Zihao Wen, Jianping Wang, Yung-Hui Li, Yu-Kai Huang:
ModeSeq: Taming Sparse Multimodal Motion Prediction with Sequential Mode Modeling. 1612-1621 - Yichen Xie, Runsheng Xu, Tong He, Jyh-Jing Hwang, Katie Luo, Jingwei Ji, Hubert Lin, Letian Chen, Yiren Lu, Zhaoqi Leng, Dragomir Anguelov, Mingxing Tan:
S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation. 1622-1632 - Yifan Wang, Jian Zhao, Zhaoxin Fan, Xin Zhang, Xuecheng Wu, Yudian Zhang, Lei Jin, Xinyue Li, Gang Wang, Mengxi Jia, Ping Hu, Zheng Zhu, Xuelong Li:
JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems. 1633-1644 - Ruiqi Qiu, Jun Gong, Xinyu Zhang, Siqi Luo, Bowen Zhang, Yi Cen:
Adapting to Observation Length of Trajectory Prediction via Contrastive Learning. 1645-1654 - Dianze Li, Jianing Li, Xu Liu, Xiaopeng Fan, Yonghong Tian:
Asynchronous Collaborative Graph Representation for Frames and Events. 1655-1666 - Huangyue Yu, Baoxiong Jia, Yixin Chen, Yandan Yang, Puhao Li, Rongpeng Su, Jiaxin Li, Qing Li, Wei Liang, Song-Chun Zhu, Tengyu Liu, Siyuan Huang:
METASCENES: Towards Automated Replica Creation for Real-world 3D Scans. 1667-1679 - Dongyue Lu, Lingdong Kong, Tianxin Huang, Gim Hee Lee:
GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency. 1680-1690 - Chunlin Yu, Hanqing Wang, Ye Shi, Haoyang Luo, Sibei Yang, Jingyi Yu, Jingya Wang:
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model. 1691-1701 - Qingqing Zhao, Yao Lu, Moo Jin Kim, Zipeng Fu, Zhuoyang Zhang, Yecheng Wu, Zhaoshuo Li, Qianli Ma, Song Han, Chelsea Finn, Ankur Handa, Tsung-Yi Lin, Gordon Wetzstein, Ming-Yu Liu, Donglai Xiang:
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models. 1702-1713 - Zhenyu Wu, Yuheng Zhou, Xiuwei Xu, Ziwei Wang, Haibin Yan:
MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation. 1714-1723 - Yuheng Ji, Huajie Tan, Jiayu Shi, Xiaoshuai Hao, Yuan Zhang, Hengyuan Zhang, Pengwei Wang, Mengdi Zhao, Yao Mu, Pengju An, Xinda Xue, Qinghang Su, Huaihai Lyu, Xiaolong Zheng, Jiaming Liu, Zhongyuan Wang, Shanghang Zhang:
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete. 1724-1734 - Tianxing Chen, Yao Mu, Zhixuan Liang, Zanxin Chen, Shijia Peng, Qiangyu Chen, Mingkun Xu, Ruizhen Hu, Hongyuan Zhang, Xuelong Li, Ping Luo:
G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation. 1735-1744 - Zhixuan Liang, Yao Mu, Yixiao Wang, Tianxing Chen, Wenqi Shao, Wei Zhan, Masayoshi Tomizuka, Ping Luo, Mingyu Ding:
DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation. 1745-1755 - Guangyan Chen, Te Cui, Meiling Wang, Chengcai Yang, Mengxiao Hu, Haoyang Lu, Yao Mu, Zicai Peng, Tianxing Zhou, Xinran Jiang, Yi Yang, Yufeng Yue:
GraphMimic: Graph-to-Graphs Generative Modeling from Videos for Policy Learning. 1756-1768 - Yun Liu, Chengwen Zhang, Ruofan Xing, Bingda Tang, Bowen Yang, Li Yi:
CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement. 1769-1782 - Alpár Cseke, Shashank Tripathi, Sai Kumar Dwivedi, Arjun S. Lakshmipathy, Agniv Chatterjee, Michael J. Black, Dimitrios Tzionas:
PICO: Reconstructing 3D People In Contact with Objects. 1783-1794 - Yiming Dou, Wonseok Oh, Yuqing Luo, Antonio Loquercio, Andrew Owens:
Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes. 1795-1804 - Jinglei Zhang, Jiankang Deng, Chao Ma, Rolandos Alexandros Potamias:
HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos. 1805-1815 - Jeonghwan Kim, Jisoo Kim, Jeonghyeon Na, Hanbyul Joo:
ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions. 1816-1828 - Jing Gao, Ce Zheng, László A. Jeni, Zackory Erickson:
DiSRT-In-Bed: Diffusion-Based Sim-to-Real Transfer Framework for In-Bed Human Mesh Recovery. 1829-1838 - Songpengcheng Xia, Yu Zhang, Zhuo Su, Xiaozheng Zheng, Zheng Lv, Guidong Wang, Yongjie Zhang, Qi Wu, Lei Chu, Ling Pei:
EnvPoser: Environment-aware Realistic Human Motion Estimation from Sparse Observations with Uncertainty Modeling. 1839-1849 - Germán Barquero, Nadine Bertsch, Manojkumar Marramreddy, Carlos Chacón, Filippo Arcadu, Ferran Rigual, Nicky Sijia He, Cristina Palmero, Sergio Escalera, Yuting Ye, Robin Kips:
From Sparse Signal to Smooth Motion: Real-Time Motion Generation with Rolling Prediction Models. 1850-1860 - Dong Wei, Xiaoning Sun, Xizhan Gao, Shengxiang Hu, Huaijiang Sun:
ALIEN: Implicit Neural Representations for Human Motion Prediction under Arbitrary Latency. 1861-1870 - Cecilia Curreli, Dominik Muhle, Abhishek Saroha, Zhenzhang Ye, Riccardo Marin, Daniel Cremers:
Nonisotropic Gaussian Diffusion for Realistic 3D Human Motion Prediction. 1871-1882 - Jianwei Tang, Hong Yang, Tengyue Chen, Jianfang Hu:
Stochastic Human Motion Prediction with Memory of Action Transition and Action Characteristic. 1883-1893 - Jiayi Su, Youhe Feng, Zheng Li, Jinhua Song, Yangfan He, Botao Ren, Botian Xu:
ArtFormer: Controllable Generation of Diverse 3D Articulated Objects. 1894-1904 - Dian Shao, Mingfei Shi, Shengda Xu, Haodong Chen, Yongle Huang, Binglu Wang:
FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance. 1905-1916 - Ting-Hsuan Liao, Yi Zhou, Yu Shen, Chun-Hao Paul Huang, Saayan Mitra, Jia-Bin Huang, Uttaran Bhattacharya:
Shape My Moves: Text-Driven Shape-Aware Synthesis of Human Motions. 1917-1928 - Xuan Wang, Kai Ruan, Xing Zhang, Gaoang Wang:
AniMo: Species-Aware Model for Text-Driven Animal Motion Generation. 1929-1939 - Yifeng Ma, Jinwei Qi, Chaonan Ji, Peng Zhang, Bang Zhang, Zhidong Deng, Liefeng Bo:
Exploring Timeline Control for Facial Motion Generation. 1940-1950 - Ruineng Li, Daitao Xing, Huiming Sun, Yuanzhou Ha, Jinglin Shen, Chiuman Ho:
TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation. 1951-1961 - Inès Hyeonsu Kim, Seokju Cho, Jiahui Huang, Jung Yi, Joon-Young Lee, Seungryong Kim:
Exploring Temporally-Aware Features for Point Tracking. 1962-1972 - Yuhong Zhang, Guanlin Wu, Ling-Hao Chen, Zhuokai Zhao, Jing Lin, Xiaoke Jiang, Jiamin Wu, Zhuoheng Li, Hao (Frank) Yang, Haoqian Wang, Lei Zhang:
HumanMM: Global Human Motion Recovery from Multi-shot Videos. 1973-1983 - Daikun Liu, Lei Cheng, Teng Wang, Changyin Sun:
EDCFlow: Exploring Temporally Dense Difference Maps for Event-based Optical Flow Estimation. 1984-1993 - Zaoming Yan, Pengcheng Lei, Tingting Wang, Faming Fang, Junkang Zhang, Yaomin Huang, Haichuan Song:
Explicit Depth-Aware Blurry Video Frame Interpolation Guided by Differential Curves. 1994-2004 - Wenbo Hu, Xiangjun Gao, Xiaoyu Li, Sijie Zhao, Xiaodong Cun, Yong Zhang, Long Quan, Ying Shan:
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos. 2005-2015 - Baorui Ma, Huachen Gao, Haoge Deng, Zhengxiong Luo, Tiejun Huang, Lulu Tang, Xinlong Wang:
You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale. 2016-2029 - Karran Pandey, Yannick Hold-Geoffroy, Matheus Gadelha, Niloy J. Mitra, Karan Singh, Paul Guerrero:
Motion Modes: What Could Happen Next? 2030-2039 - Wonjoon Jin, Qi Dai, Chong Luo, Seung-Hwan Baek, Sunghyun Cho:
FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis. 2040-2049 - David Junhao Zhang, Roni Paiss, Shiran Zada, Nikhil Karnad, David E. Jacobs, Yael Pritch, Inbar Mosseri, Mike Zheng Shou, Neal Wadhwa, Nataniel Ruiz:
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning. 2050-2062 - Zhenghao Zhang, Junchao Liao, Menghao Li, Zuozhuo Dai, Bingxue Qiu, Siyu Zhu, Long Qin, Weizhi Wang:
Tora: Trajectory-oriented Diffusion Transformer for Video Generation. 2063-2073 - Shengzhi Wang, Yingkang Zhong, Jiangchuan Mu, Kai Wu, Mingliang Xiong, Wen Fang, Mingqing Liu, Hao Deng, Bin He, Gang Li, Qingwen Liu:
Align-A-Video: Deterministic Reward Tuning of Image Diffusion Models for Consistent Video Editing. 2074-2083 - Xin Jin, Simon Niklaus, Zhoutong Zhang, Zhihao Xia, Chunle Guo, Yuting Yang, Jiawen Chen, Chongyi Li:
Classic Video Denoising in a Machine Learning World: Robust, Fast, and Controllable. 2084-2093 - Yifan Bian, Chuanbo Tang, Li Li, Dong Liu:
Augmented Deep Contexts for Spatially Embedded Video Coding. 2094-2104 - Zihao Zhang, Haoran Chen, Haoyu Zhao, Guansong Lu, Yanwei Fu, Hang Xu, Zuxuan Wu:
EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation. 2105-2115 - Yuantong Zhang, Zhenzhong Chen:
Continuous Space-Time Video Resampling with Invertible Motion Steganography. 2116-2126 - Xingguang Zhang, Nicholas Chimitt, Xijun Wang, Yu Yuan, Stanley H. Chan:
Learning Phase Distortion with Selective State Space Models for Video Turbulence Mitigation. 2127-2138 - Yiran Xu, Taesung Park, Richard Zhang, Yang Zhou, Eli Shechtman, Feng Liu, Jia-Bin Huang, Difan Liu:
VideoGigaGAN: Towards Detail-rich Video Super-Resolution. 2139-2149 - Yunpeng Qu, Kun Yuan, Qizhi Xie, Ming Sun, Chao Zhou, Jian Wang:
KVQ: Boosting Video Quality Assessment via Saliency-guided Local Perception. 2150-2160 - Jianyi Wang, Zhijie Lin, Meng Wei, Yang Zhao, Ceyuan Yang, Chen Change Loy, Lu Jiang:
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration. 2161-2172 - Weichen Dai, Hexing Wu, Xiaoyang Weng, Yuxin Zheng, Yuhang Ming, Wanzeng Kong:
Multi-Modal Synergistic Implicit Image Enhancement for Efficient Optical Flow Estimation. 2173-2182 - Yutong Wang, Jiajie Teng, Jiajiong Cao, Yuming Li, Chenguang Ma, Hongteng Xu, Dixin Luo:
Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency. 2183-2193 - Xinan Xie, Qing Zhang, Wei-Shi Zheng:
Diffusion-based Event Generation for High-Quality Image Deblurring. 2194-2203 - Yanis Benidir, Nicolas Gonthier, Clément Mallet:
The Change You Want To Detect: Semantic Change Detection In Earth Observation With Hybrid Data Generation. 2204-2214 - Hyejin Oh, Woo-Shik Kim, Sangyoon Lee, YungKyung Park, Je-Won Kang:
Illumination Spectrum Estimation for Multispectral Images via Surface Reflectance Modeling and Spatial-Spectral Feature Generation. 2215-2225 - Jinyuan Liu, Bowei Zhang, Qingyun Mei, Xingyuan Li, Yang Zou, Zhiying Jiang, Long Ma, Risheng Liu, Xin Fan:
DCEvo: Discriminative Cross-Dimensional Evolutionary Learning for Infrared and Visible Image Fusion. 2226-2235 - Junming Hou, Xiaoyu Chen, Ran Ran, Xiaofeng Cong, Xinyang Liu, Jian Wei You, Liang-Jian Deng:
Binarized Neural Network for Multi-spectral Image Fusion. 2236-2245 - Haitao Wu, Qing Li, Changqing Zhang, Zhen He, Xiaomin Ying:
Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior. 2246-2257 - Jiuchen Chen, Xinyu Yan, Qizhi Xu, Kaiqi Li:
Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images. 2258-2268 - Woo Kyoung Han, Byeonghun Lee, Hyunmin Cho, Sunghoon Im, Kyong Hwan Jin:
Towards Lossless Implicit Neural Representation via Bit Plane Decomposition. 2269-2278 - Wei Long, Xingyu Zhou, Leheng Zhang, Shuhang Gu:
Progressive Focused Transformer for Single Image Super-Resolution. 2279-2288 - Yuxuan Jiang, Ho Man Kwan, Tianhao Peng, Ge Gao, Fan Zhang, Xiaoqing Zhu, Joel Sole, David Bull:
HIIF: Hierarchical Encoding based Implicit Image Function for Continuous Super-resolution. 2289-2299 - Yulu Bai, Jiahong Fu, Qi Xie, Deyu Meng:
A Regularization-Guided Equivariant Approach for Image Restoration. 2300-2310 - Fengjia Zhang, Samrudhdhi B. Rangrej, Tristan Aumentado-Armstrong, Afsaneh Fazly, Alex Levinshtein:
Augmenting Perceptual Super-Resolution via Image Quality Predictors. 2311-2322 - Tengyu Ma, Long Ma, Ziye Li, Yuetong Wang, Jinyuan Liu, Chengpei Xu, Risheng Liu:
Rethinking Reconstruction and Denoising in the Dark: New Perspective, General Architecture and Beyond. 2323-2332 - Lingchen Sun, Rongyuan Wu, Zhiyuan Ma, Shuaizheng Liu, Qiaosi Yi, Lei Zhang:
Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach. 2333-2343 - Xudong Li, Wenjie Nie, Yan Zhang, Runze Hu, Ke Li, Xiawu Zheng, Liujuan Cao:
Distilling Spatially-Heterogeneous Distortion Perception for Blind Image Quality Assessment. 2344-2354 - Jinho Jeong, Sangmin Han, Jinwoo Kim, Seon Joo Kim:
Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models. 2355-2365 - Guangqian Guo, Yong Guo, Xuehui Yu, Wenbo Li, Yaoxing Wang, Shan Gao:
Segment Any-Quality Images with Generative Latent Space Enhancement. 2366-2376 - Yuhan Wang, Suzhi Bi, Ying-Jun Angela Zhang, Xiaojun Yuan:
Traversing Distortion-Perception Tradeoff using a Single Score-Based Generative Model. 2377-2386 - Zhifu Tian, Tao Hu, Chaoyang Niu, Di Wu, Shu Wang:
Sampling Innovation-Based Adaptive Compressive Sensing. 2387-2397 - Tomer Garber, Tom Tirer:
Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond). 2398-2407 - Zhi Jiang, Jingbo Hu, Ling Zhang, Gang Fu, Chunxia Xiao:
Hierarchical Adaptive Filtering Network for Text Image Specular Highlight Removal. 2408-2417 - Yi Liu, Hao Zhou, Benlei Cui, Wenxiang Shang, Ran Lin:
Erase Diffusion: Empowering Object Removal Through Calibrating Diffusion Pathways. 2418-2427 - Yichi Zhang, Zhihao Duan, Yuning Huang, Fengqing Zhu:
Balanced Rate-Distortion Optimization in Learned Image Compression. 2428-2438 - Sora Kim, Sungho Suh, Minsik Lee:
RAD: Region-Aware Diffusion Models for Image Inpainting. 2439-2448 - Lucas Relic, Roberto Azevedo, Yang Zhang, Markus Gross, Christopher Schroers:
Bridging the Gap between Gaussian Diffusion Models and Universal Quantization for Image Compression. 2449-2458 - Haosen Yang, Adrian Bulat, Isma Hadji, Hai X. Pham, Xiatian Zhu, Georgios Tzimiropoulos, Brais Martínez:
FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion. 2459-2468 - Yuchuan Tian, Jing Han, Chengcheng Wang, Yuchen Liang, Chao Xu, Hanting Chen:
DiC: Rethinking Conv3x3 Designs in Diffusion Models. 2469-2478 - Yushu Wu, Zhixing Zhang, Yanyu Li, Yanwu Xu, Anil Kag, Yang Sui, Huseyin Coskun, Ke Ma, Aleksei Lebedev, Ju Hu, Dimitris N. Metaxas, Yanzhi Wang, Sergey Tulyakov, Jian Ren:
SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device. 2479-2490 - Zijian Zhou, Shikun Liu, Xiao Han, Haozhe Liu, Kam Woh Ng, Tian Xie, Yuren Cong, Hang Li, Mengmeng Xu, Juan-Manuel Pérez-Rúa, Aditya Patel, Tao Xiang, Miaojing Shi, Sen He:
Learning Flow Fields in Attention for Controllable Person Image Generation. 2491-2501 - Xiao Zhang, Ruoxi Jiang, Rebecca Willett, Michael Maire:
Nested Diffusion Models Using Hierarchical Latent Priors. 2502-2512 - Myunsoo Kim, Donghyeon Ki, Seong-Woong Shim, Byung-Jun Lee:
Adaptive Non-Uniform Timestep Sampling for Accelerating Diffusion Model Training. 2513-2522 - Nanye Ma, Shangyuan Tong, Haolin Jia, Hexiang Hu, Yu-Chuan Su, Mingda Zhang, Xuan Yang, Yandong Li, Tommi S. Jaakkola, Xuhui Jia, Saining Xie:
Scaling Inference Time Compute for Diffusion Models. 2523-2534 - Hermann Kumbong, Xian Liu, Tsung-Yi Lin, Ming-Yu Liu, Xihui Liu, Ziwei Liu, Daniel Y. Fu, Christopher Ré, David W. Romero:
HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation. 2535-2544 - Liao Qu, Huichao Zhang, Yiheng Liu, Xu Wang, Yi Jiang, Yiming Gao, Hu Ye, Daniel K. Du, Zehuan Yuan, Xinglong Wu:
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation. 2545-2555 - Subhadeep Koley, Tapas Kumar Dutta, Aneeshan Sain, Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Yi-Zhe Song:
SketchFusion: Learning Universal Sketch Features through Fusing Foundation Models. 2556-2567 - Roberto Henschel, Levon Khachatryan, Hayk Poghosyan, Daniil Hayrapetyan, Vahram Tadevosyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi:
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text. 2568-2577 - Hongjie Wang, Chih-Yao Ma, Yen-Cheng Liu, Ji Hou, Tao Xu, Jialiang Wang, Felix Juefei-Xu, Yaqiao Luo, Peizhao Zhang, Tingbo Hou, Peter Vajda, Niraj K. Jha, Xiaoliang Dai:
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity. 2578-2588 - Yukun Wang, Longguang Wang, Zhiyuan Ma, Qibin Hu, Kai Xu, Yulan Guo:
VideoDirector: Precise Video Editing via Text-to-Video Models. 2589-2598 - Dohun Lee, Bryan Sangwoo Kim, Geon Yeong Park, Jong Chul Ye:
VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide. 2599-2608 - Xi Wang, Robin Courant, Marc Christie, Vicky Kalogeiton:
AKiRa: Augmentation Kit on Rays for Optical Video Generation. 2609-2619 - Mingi Kwon, Shin seong Kim, Jaeseok Jeong, Yi Ting Hsiao, Youngjung Uh:
TCFG: Tangential Damping Classifier-free Guidance. 2620-2629 - Zixuan Ye, Huijuan Huang, Xintao Wang, Pengfei Wan, Di Zhang, Wenhan Luo:
StyleMaster: Stylize Your Video with Artistic Generation and Translation. 2630-2640 - Nadav Z. Cohen, Oron Nir, Ariel Shamir:
Conditional Balance: Improving Multi-Conditioning Trade-Offs in Image Generation. 2641-2650 - Yufan Ren, Zicong Jiang, Tong Zhang, Søren Forchhammer, Sabine Süsstrunk:
FDS: Frequency-Aware Denoising Score for Text-Guided Latent Diffusion Image Editing. 2651-2660 - Fengyi Fu, Lei Zhang, Mengqi Huang, Zhendong Mao:
FeedEdit: Text-Based Image Editing with Dynamic Feedback Regulation. 2661-2670 - Duong H. Le, Tuan Pham, Sangho Lee, Christopher Clark, Aniruddha Kembhavi, Stephan Mandt, Ranjay Krishna, Jiasen Lu:
One Diffusion to Generate Them All. 2671-2682 - Yanfeng Li, Ka-Hou Chan, Yue Sun, Chan-Tong Lam, Tong Tong, Zitong Yu, Keren Fu, Xiaohong Liu, Tao Tan:
MoEdit: On Learning Quantity Perception for Multi-object Image Editing. 2683-2693 - Yingjing Xu, Jie Kong, Jiazhi Wang, Xiao Pan, Bo Lin, Qiang Liu:
InsightEdit: Towards Better Instruction Following for Image Editing. 2694-2703 - Mingdeng Cao, Xuaner Zhang, Yinqiang Zheng, Zhihao Xia:
Instruction-based Image Manipulation by Watching How Things Move. 2704-2713 - Mushui Liu, Dong She, Jingxuan Pang, Qihan Huang, Jiacheng Ying, Wanggui He, Yuanlei Hou, Siming Fu:
TFCustom: Customized Image Generation with Time-Aware Frequency Feature Guidance. 2714-2723 - Edurne Bernal-Berdun, Ana Serrano, Belén Masiá, Matheus Gadelha, Yannick Hold-Geoffroy, Xin Sun, Diego Gutierrez:
PreciseCam: Precise Camera Control for Text-to-Image Generation. 2724-2733 - Jialuo Li, Wenhao Chai, Xingyu Fu, Haiyang Xu, Saining Xie:
Science-T2I: Addressing Scientific Illusions in Image Synthesis. 2734-2744 - Wataru Shimoda, Naoto Inoue, Daichi Haraguchi, Hayato Mitani, Seiichi Uchida, Kota Yamaguchi:
Type-R: Automatically Retouching Typos for Text-to-Image Generation. 2745-2754 - Qihao Liu, Xi Yin, Alan L. Yuille, Andrew Brown, Mannat Singh:
Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution. 2755-2765 - Chao Feng, Ziyang Chen, Aleksander Holynski, Alexei A. Efros, Andrew Owens:
GPS as a Control Signal for Image Generation. 2766-2778 - Zijie Li, Henry Li, Yichun Shi, Amir Barati Farimani, Yuval Kluger, Linjie Yang, Peng Wang:
Dual Diffusion for Unified Image Generation and Understanding. 2779-2790 - Rishubh Parihar, Vaibhav Agrawal, Sachidanand VS, Venkatesh Babu Radhakrishnan:
Compass Control: Multi Object Orientation Control for Text-to-Image Generation. 2791-2801 - Jiaxiu Jiang, Yabo Zhang, Kailai Feng, Xiaohe Wu, Wenbo Li, Renjing Pei, Fan Li, Wangmeng Zuo:
MC^2: Multi-concept Guidance for Customized Multi-concept Generation. 2802-2812 - Bin Wu, Wuxuan Shi, Jinqiao Wang, Mang Ye:
Synthetic Data is an Elegant GIFT for Continual Vision-Language Models. 2813-2823 - Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Nicu Sebe, Mubarak Shah:
Curriculum Direct Preference Optimization for Diffusion and Consistency Models. 2824-2834 - Rui Zhao, Weijia Mao, Mike Zheng Shou:
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles. 2835-2846 - Cong Xie, Han Zou, Ruiqi Yu, Yan Zhang, Zhenpeng Zhan:
SerialGen: Personalized Image Generation by First Standardization Then Personalization. 2847-2856 - Yuanbo Yang, Jiahao Shao, Xinyang Li, Yujun Shen, Andreas Geiger, Yiyi Liao:
Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation. 2857-2869 - Silin Gao, Sheryl Mathew, Li Mi, Sepideh Mamooler, Mengjie Zhao, Hiromi Wakaki, Yuki Mitsufuji, Syrielle Montariol, Antoine Bosselut:
VinaBench: Benchmark for Faithful and Consistent Visual Narratives. 2870-2879 - Bonan Li, Zicheng Zhang, Xingyi Yang, Xinchao Wang:
CoSER: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation. 2880-2890 - Zeqi Gu, Yin Cui, Zhaoshuo Li, Fangyin Wei, Yunhao Ge, Jinwei Gu, Ming-Yu Liu, Abe Davis, Yifan Ding:
ArtiScene: Language-Driven Artistic 3D Scene Generation Through Image Intermediary. 2891-2901 - Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou, Yi-Hao Peng, Sanjay Subramanian, Qinyue Tan, Maarten Sap, Alane Suhr, Daniel Fried, Graham Neubig, Trevor Darrell:
AutoPresent: Designing Structured Visuals from Scratch. 2902-2911 - Xi Wang, Hongzhen Li, Heng Fang, Yichen Peng, Haoran Xie, Xi Yang, Chuntao Li:
LineArt: A Knowledge-guided Training-free High-quality Appearance Transfer for Design Drawing with Diffusion Model. 2912-2923 - Siyuan Bian, Chenghao Xu, Yuliang Xiu, Artur Grigorev, Zhen Liu, Cewu Lu, Michael J. Black, Yao Feng:
ChatGarment: Garment Estimation, Generation and Editing via Large Language Models. 2924-2934 - Haobin Zhong, Shuai He, Anlong Ming, Huadong Ma:
Rethinking Personalized Aesthetics Assessment: Employing Physique Aesthetics Assessment as An Exemplification. 2935-2944 - Zirun Guo, Tao Jin:
ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation. 2945-2954 - Qianlong Xiang, Miao Zhang, Yuzhang Shang, Jianlong Wu, Yan Yan, Liqiang Nie:
DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture. 2955-2965 - Matan Rusanovsky, Shimon Malnick, Amir Jevnisek, Ohad Fried, Shai Avidan:
Memories of Forgotten Concepts. 2966-2975 - Basim Azam, Naveed Akhtar:
Plug-and-Play Interpretable Responsible Text-to-Image Generation via Dual-Space Multi-facet Concept Control. 2976-2985 - Yimeng Zhang, Tiancheng Zhi, Jing Liu, Shen Sang, Liming Jiang, Qing Yan, Sijia Liu, Linjie Luo:
ID-Patch: Robust ID Association for Group Photo Personalization. 2986-2996 - Hao Cheng, Erjia Xiao, Jiayan Yang, Jiahang Cao, Qiang Zhang, Jize Zhang, Kaidi Xu, Jindong Gu, Renjing Xu:
Not Just Text: Uncovering Vision Modality Typographic Threats in Image Generation Models. 2997-3007 - Xuanyu Zhang, Zecheng Tang, Zhipei Xu, Runyi Li, Youmin Xu, Bin Chen, Feng Gao, Jian Zhang:
OmniGuard: Hybrid Manipulation Localization via Augmented Versatile Deep Image Watermarking. 3008-3018 - Yiren Song, Pei Yang, Hai Ci, Mike Zheng Shou:
IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation. 3019-3028 - Mischa Dombrowski, Weitong Zhang, Sarah Cechnicka, Hadrien Reynaud, Bernhard Kainz:
Image Generation Diversity Issues and How to Tame Them. 3029-3039 - Tai D. Nguyen, Aref Azizpour, Matthew C. Stamm:
Forensic Self-Descriptions Are All You Need for Zero-Shot Detection, Open-Set Source Attribution, and Clustering of AI-generated Images. 3040-3050 - Jinwoo Kim, Sangmin Han, Jinho Jeong, Jiwoo Choi, Dongyeoung Kim, Seon Joo Kim:
ORIDa: Object-centric Real-world Image Composition Dataset. 3051-3060 - Dhananjaya Jayasundara, Sudarshan Rajagopalan, Yasiru Ranasinghe, Trac D. Tran, Vishal M. Patel:
SINR: Sparsity Driven Compressed Implicit Neural Representations. 3061-3070 - Tiago Novello, Diana Aldana, Andre Araujo, Luiz Velho:
Tuning the Frequencies: Robust Training for Sinusoidal Neural Networks. 3071-3080 - Yuki Kawana, Shintaro Shiba, Quan Kong, Norimasa Kobori:
GA3CE: Unconstrained 3D Gaze Estimation with Gaze-Aware 3D Context Encoding. 3081-3090 - Yunfeng Xiao, Xiaowei Bai, Baojun Chen, Hao Su, Hao He, Liang Xie, Erwei Yin:
De^2Gaze: Deformable and Decoupled Representation Learning for 3D Gaze Estimation. 3091-3100 - Tianyun Zhong, Chao Liang, Jianwen Jiang, Gaojie Lin, Jiaqi Yang, Zhou Zhao:
FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation. 3101-3110 - Juncheng Wang, Chao Xu, Cheng Yu, Lei Shang, Zhe Hu, Shujun Wang, Liefeng Bo:
Synchronized Video-to-Audio Generation via Mel Quantization-Continuum Decomposition. 3111-3120 - Inho Kim, Youngkil Song, Jicheol Park, Won Hwa Kim, Suha Kwak:
Improving Sound Source Localization with Joint Slot Attention on Image and Audio. 3121-3130 - Chen Liu, Liying Yang, Peike Li, Dadong Wang, Lincheng Li, Xin Yu:
Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics. 3131-3141 - Eitan Shaar, Ariel Shaulov, Gal Chechik, Lior Wolf:
Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds. 3142-3151 - Zitang Zhou, Ke Mei, Yu Lu, Tianyi Wang, Fengyun Rao:
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization. 3152-3162 - Sanchayan Santra, Vishal M. Chudasama, Pankaj Wasnik, Vineeth N. Balasubramanian:
Precise Event Spotting in Sports Videos: Solving Long-Range Dependency and Class Imbalance. 3163-3172 - Bingjie Gao, Xinyu Gao, Xiaoxue Wu, Yujie Zhou, Yu Qiao, Li Niu, Xinyuan Chen, Yaohui Wang:
The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation. 3173-3183 - Xin Yan, Yuxuan Cai, Qiuyue Wang, Yuan Zhou, Wenhao Huang, Huan Yang:
Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation. 3184-3194 - Zhengrong Yue, Shaobin Zhuang, Kunchang Li, Yanbo Ding, Yali Wang:
V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents. 3195-3205 - Huiyu Duan, Qiang Hu, Jiarui Wang, Liu Yang, Zitong Xu, Lu Liu, Xiongkuo Min, Chunlei Cai, Tianxiao Ye, Xiaoyun Zhang, Guangtao Zhai:
FineVQ: Fine-Grained User Generated Content Video Quality Assessment. 3206-3217 - Kevin Qinghong Lin, Mike Zheng Shou:
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary. 3218-3228 - Zicheng Zhang, Ziheng Jia, Haoning Wu, Chunyi Li, Zijian Chen, Yingjie Zhou, Wei Sun, Xiaohong Liu, Xiongkuo Min, Weisi Lin, Guangtao Zhai:
Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs. 3229-3239 - Wei Li, Bing Hu, Rui Shao, Leyang Shen, Liqiang Nie:
LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant. 3240-3251 - Kaixuan Wu, Xinde Li, Xinling Li, Chuanfei Hu, Guoliang Wu:
AVQACL: A Novel Benchmark for Audio-Visual Question Answering Continual Learning. 3252-3261 - Huabin Liu, Filip Ilievski, Cees G. M. Snoek:
Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning. 3262-3271 - Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin, Jaehong Yoon, Feng Cheng, Gedas Bertasius, Mohit Bansal:
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos. 3272-3283 - Haiyi Qiu, Minghe Gao, Long Qian, Kaihang Pan, Qifan Yu, Juncheng Li, Wenjie Wang, Siliang Tang, Yueting Zhuang, Tat-Seng Chua:
STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training. 3284-3294 - Kangsan Kim, Geon Park, Youngwan Lee, Woongyeong Yeo, Sung Ju Hwang:
VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding. 3295-3305 - Zhuoming Liu, Yiquan Li, Khoi Duc Nguyen, Yiwu Zhong, Yin Li:
PAVE: Patching and Adapting Video Large Language Models. 3306-3317 - Shuming Liu, Chen Zhao, Tianqi Xu, Bernard Ghanem:
BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding. 3318-3327 - Zhenpeng Huang, Xinhao Li, Jiaqi Li, Jing Wang, Xiangyu Zeng, Cheng Liang, Tao Wu, Xi Chen, Liang Li, Limin Wang:
Online Video Understanding: OVBench and VideoChat-Online. 3328-3338 - Gengyuan Zhang, Mang Ling Ada Fok, Jialu Ma, Yan Xia, Daniel Cremers, Philip Torr, Volker Tresp, Jindong Gu:
Localizing Events in Videos with Multimodal Queries. 3339-3351 - Junho Kim, Hyunjun Kim, Hosu Lee, Yong Man Ro:
SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis. 3352-3362 - Sheng Zhou, Junbin Xiao, Qingyun Li, Yicong Li, Xun Yang, Dan Guo, Meng Wang, Tat-Seng Chua, Angela Yao:
EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering. 3363-3373 - Felix Vogel, Walid Bousselham, Anna Kukleva, Nina Shvetsova, Hilde Kuehne:
VideoGEM: Training-free Action Grounding in Videos. 3374-3383 - Aaryan Garg, Akash Kumar, Yogesh S. Rawat:
STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding. 3384-3394 - Claudia Cuttano, Gabriele Trivigno, Gabriele Rosi, Carlo Masone, Giuseppe Averta:
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation. 3395-3405 - Nan Huang, Wenzhao Zheng, Chenfeng Xu, Kurt Keutzer, Shanghang Zhang, Angjoo Kanazawa, Qianqian Wang:
Segment Any Motion in Videos. 3406-3416 - Haiyang Mei, Pengyu Zhang, Mike Zheng Shou:
SAM-I2V: Upgrading SAM to Support Promptable Video Segmentation with Less than 0.2% Training Cost. 3417-3426 - Andrei Dumitriu, Florin Tatui, Florin Miron, Aakash Ralhan, Radu Tudor Ionescu, Radu Timofte:
RipVIS: Rip Currents Video Instance Segmentation Benchmark for Beach Monitoring and Safety. 3427-3437 - Olga Zatsarynna, Emad Bahrami, Yazan Abu Farha, Gianpiero Francesca, Juergen Gall:
MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Action Anticipation. 3438-3448 - Yilong Wang, Zilin Gao, Qilong Wang, Zhaofeng Chen, Peihua Li, Qinghua Hu:
TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition. 3449-3459 - Shaopeng Yang, Jilong Wang, Saihui Hou, Xu Liu, Chunshui Cao, Liang Wang, Yongzhen Huang:
Bridging Gait Recognition and Large Language Models Sequence Modeling. 3460-3469 - Lorenzo Mur-Labadia, Josechu Guerrero, Ruben Martinez-Cantin:
DIV-FF: Dynamic Image-Video Feature Fields For Environment Understanding in Egocentric Videos. 3470-3480 - Shengeng Tang, Jiayi He, Lechao Cheng, Jingjing Wu, Dan Guo, Richang Hong:
Discrete to Continuous: Generating Smooth Transition Poses from Sign Language Observations. 3481-3491 - Zezeng Li, Xiaoyu Du, Na Lei, Liming Chen, Weimin Wang:
NoPain: No-box Point Cloud Attack via Optimal Transport Singular Boundary. 3492-3502 - Li Lin, Santosh Santosh, Mingyang Wu, Xin Wang, Shu Hu:
AI-Face: A Million-Scale Demographically Annotated AI-Generated Face Dataset and Fairness Benchmark. 3503-3515 - Fengfan Zhou, Bangjie Yin, Hefei Ling, Qianyu Zhou, Wenxuan Wang:
Improving the Transferability of Adversarial Attacks on Face Recognition with Diverse Parameters Augmentation. 3516-3527 - Saeed Ebrahimi, Sahar Rahimi Malakshan, Ali Dabouei, Srinjoy Das, Jeremy M. Dawson, Nasser M. Nasrabadi:
GIF: Generative Inspiration for Face Recognition at Scale. 3528-3539 - Li Lun, Kunyu Feng, Qinglong Ni, Ling Liang, Yuan Wang, Ying Li, Dunshan Yu, Xiaoxin Cui:
Towards Effective and Sparse Adversarial Attack on Spiking Neural Networks via Breaking Invisible Surrogate Gradients. 3540-3551 - Ziqi Li, Tao Gao, Yisheng An, Ting Chen, Jing Zhang, Yuanbo Wen, Mengkun Liu, Qianxi Zhang:
Brain-Inspired Spiking Neural Networks for Energy-Efficient Object Detection. 3552-3562 - Tian Gao, Yu Zhang, Zhiyuan Zhang, Huajun Liu, Kaijie Yin, Chengzhong Xu, Hui Kong:
BHViT: Binarized Hybrid Vision Transformer. 3563-3572 - Zhenyu Cui, Jiahuan Zhou, Yuxin Peng:
DKC: Differentiated Knowledge Consolidation for Cloth-Hybrid Lifelong Person Re-identification. 3573-3582 - Jingwei Zhang, Anh Tien Nguyen, Xi Han, Vincent Quoc-Huy Trinh, Hong Qin, Dimitris Samaras, Mahdi S. Hosseini:
2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification. 3583-3592 - Jose Henrique Lima Marques, Jeffri Murrugarra-Llerena, Cláudio R. Jung:
GauCho: Gaussian Distributions with Cholesky Decomposition for Oriented Object Detection. 3593-3602 - Biplab Chandra Das, Viswanath Gopalakrishnan:
Camouflage Anything: Learning to Hide using Controlled Out-painting and Representation Engineering. 3603-3613 - Datao Tang, Xiangyong Cao, Xuan Wu, Jialin Li, Jing Yao, Xueru Bai, Dongsheng Jiang, Yin Li, Deyu Meng:
AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation. 3614-3624 - Zhe Shan, Yang Liu, Lei Zhou, Cheng Yan, Heng Wang, Xia Xie:
ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object. 3625-3635 - Phuc Nguyen, Minh Luu, Anh Tuan Tran, Cuong Pham, Khoi Nguyen:
Any3DIS: Class-Agnostic 3D Instance Segmentation by 2D Mask Tracking. 3636-3645 - Joey Wilson, Marcelino Almeida, Sachit Mahajan, Martin Labrie, Maani Ghaffari, Omid Ghasemalizadeh, Min Sun, Cheng-Hao Kuo, Arnab Sen:
POp-GS: Next Best View in 3D-Gaussian Splatting with P-Optimality. 3646-3655 - Runsong Zhu, Shi Qiu, Zhengzhe Liu, Ka-Hei Hui, Qianyi Wu, Pheng-Ann Heng, Chi-Wing Fu:
Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting. 3656-3665 - Wenxuan Guo, Xiuwei Xu, Ziwei Wang, Jianjiang Feng, Jie Zhou, Jiwen Lu:
Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding. 3666-3675 - Yijie Tang, Jiazhao Zhang, Yuqing Lan, Yulan Guo, Dezun Dong, Chenyang Zhu, Kai Xu:
OnlineAnySeg: Online Zero-Shot 3D Segmentation by Visual Foundation Model Guided 2D Mask Merging. 3676-3685 - Muzhi Zhu, Yuzhuo Tian, Hao Chen, Chunluan Zhou, Qingpei Guo, Yang Liu, Ming Yang, Chunhua Shen:
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories. 3686-3696 - Savya Khosla, Sethuraman TV, Alexander G. Schwing, Derek Hoiem:
RELOCATE: A Simple Training-Free Baseline for Visual Query Localization Using Region-Based Representations. 3697-3706 - Rong Li, Shijie Li, Lingdong Kong, Xulei Yang, Junwei Liang:
SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding. 3707-3717 - Zhenyang Liu, Yikai Wang, Sixiao Zheng, Tongying Pan, Longfei Liang, Yanwei Fu, Xiangyang Xue:
ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning. 3718-3727 - Ziyang Zhou, Pinghui Wang, Zi Liang, Haitao Bai, Ruofei Zhang:
Cross-Modal 3D Representation with Multi-View Images and Point Clouds. 3728-3739 - Austin Stone, Hagen Soltau, Robert Geirhos, Xi Yi, Ye Xia, Bingyi Cao, Kaifeng Chen, Abhijit Ogale, Jonathon Shlens:
Learning Visual Composition through Improved Semantic Guidance. 3740-3750 - Keyu Guo, Yongle Huang, Shijie Sun, Xiangyu Song, Mingtao Feng, Zedong Liu, Huansheng Song, Tiantian Wang, Jianxin Li, Naveed Akhtar, Ajmal Saeed Mian:
Beyond Human Perception: Understanding Multi-Object World from Monocular View. 3751-3760 - Hongyan Zhi, Peihao Chen, Junyan Li, Shuailei Ma, Xinyu Sun, Tianhang Xiang, Yinjie Lei, Mingkui Tan, Chuang Gan:
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences. 3761-3771 - Jiajun Deng, Tianyu He, Li Jiang, Tianyu Wang, Feras Dayoub, Ian D. Reid:
3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer. 3772-3782 - Benlin Liu, Yuhao Dong, Yiqin Wang, Zixian Ma, Yansong Tang, Luming Tang, Yongming Rao, Wei-Chiu Ma, Ranjay Krishna:
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model. 3783-3792 - Efstathios Karypidis, Ioannis Kakogeorgiou, Spyros Gidaris, Nikos Komodakis:
Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers. 3793-3803 - Weiming Ren, Huan Yang, Jie Min, Cong Wei, Wenhu Chen:
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation. 3804-3814 - Haoqiang Kang, Enna Sachdeva, Piyush Gupta, Sangjae Bae, Kwonjoon Lee:
GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks. 3815-3825 - Cheng Chen, Yunpeng Zhai, Yifan Zhao, Jinyang Gao, Bolin Ding, Jia Li:
Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning. 3826-3835 - Mahtab Bigverdi, Zelun Luo, Cheng-Yu Hsieh, Ethan Shen, Dongping Chen, Linda G. Shapiro, Ranjay Krishna:
Perception Tokens Enhance Visual Reasoning in Multimodal Language Models. 3836-3845 - Akhil Perincherry, Jacob Krantz, Stefan Lee:
Do Visual Imaginations Improve Vision-and-Language Navigation Agents? 3846-3855 - Fan Yang, Ru Zhen, Jianing Wang, Yanhao Zhang, Haoxiang Chen, Haonan Lu, Sicheng Zhao, Guiguang Ding:
HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator. 3856-3866 - Ailin Deng, Tri Cao, Zhirui Chen, Bryan Hooi:
Words or Vision: Do Vision-Language Models Have Blind Faith in Text? 3867-3876 - Christopher Chou, Lisa Dunlap, Koki Mashita, Krishna Mandal, Trevor Darrell, Ion Stoica, Joseph E. Gonzalez, Wei-Lin Chiang:
VisionArena: 230k Real World User-VLM Conversations with Preference Labels. 3877-3887 - Wen Yin, Yong Wang, Guiduo Duan, Dongyang Zhang, Xin Hu, Yuan-Fang Li, Tao He:
Knowledge-Aligned Counterfactual-Enhancement Diffusion Perception for Unsupervised Cross-Domain Visual Emotion Recognition. 3888-3898 - Yangyu Huang, Tianyi Gao, Haoran Xu, Qihao Zhao, Yang Song, Zhipeng Gui, Tengchao Lv, Hao Chen, Lei Cui, Scarlett Li, Furu Wei:
PEACE: Empowering Geologic Map Holistic Understanding with MLLMs. 3899-3908 - Chengyue Huang, Brisa Maneechotesuwan, Shivang Chopra, Zsolt Kira:
FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering. 3909-3918 - Miran Heo, Min-Hung Chen, De-An Huang, Sifei Liu, Subhashree Radhakrishnan, Seon Joo Kim, Yu-Chiang Frank Wang, Ryo Hachiuma:
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks. 3919-3930 - Wenbo Chen, Zhen Xu, Ruotao Xu, Si Wu, Hau-San Wong:
Task-aware Cross-modal Feature Refinement Transformer with Large Language Models for Visual Grounding. 3931-3941 - Yue Han, Jiangning Zhang, Junwei Zhu, Runze Hou, Xiaozhong Ji, Chuming Lin, Xiaobin Hu, Zhucun Xue, Yong Liu:
GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model. 3942-3951 - Yang Bai, Yucheng Ji, Min Cao, Jinqiao Wang, Mang Ye:
Chat-based Person Retrieval via Dialogue-Refined Cross-Modal Alignment. 3952-3962 - Ruotian Peng, Haiying He, Yake Wei, Yandong Wen, Di Hu:
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception. 3963-3973 - Likai Tian, Jian Zhao, Zechao Hu, Zhengwei Yang, Hao Li, Lei Jin, Zheng Wang, Xuelong Li:
CCIN: Compositional Conflict Identification and Neutralization for Composed Image Retrieval. 3974-3983 - You Li, Fan Ma, Yi Yang:
Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy. 3984-3993 - Chuong Huynh, Jinyu Yang, Ashish Tawari, Mubarak Shah, Son Tran, Raffay Hamid, Trishul Chilimbi, Abhinav Shrivastava:
CoLLM: A Large Language Model for Composed Image Retrieval. 3994-4004 - Zhenxing Zhang, Yaxiong Wang, Lechao Cheng, Zhun Zhong, Dan Guo, Meng Wang:
ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding. 4005-4014 - Yikun Liu, Yajie Zhang, Jiayin Cai, Xiaolong Jiang, Yao Hu, Jiangchao Yao, Yanfeng Wang, Weidi Xie:
LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant. 4015-4025 - Yuchen Duan, Zhe Chen, Yusong Hu, Weiyun Wang, Shenglong Ye, Botian Shi, Lewei Lu, Qibin Hou, Tong Lu, Hongsheng Li, Jifeng Dai, Wenhai Wang:
Docopilot: Improving Multimodal Models for Document-Level Understanding. 4026-4037 - Wenhui Liao, Jiapeng Wang, Hongliang Li, Chengyu Wang, Jun Huang, Lianwen Jin:
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding. 4038-4049 - Jeong Ryong Lee, Yejee Shin, Geonhui Son, Dosik Hwang:
Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between Text and Vision for Zero-Shot Image Captioning. 4050-4059 - Guotao Liang, Baoquan Zhang, Zhiyuan Wen, Junteng Zhao, Yunming Ye, Kola Ye, Yao He:
Towards Improved Text-Aligned Codebook Learning: Multi-Hierarchical Codebook-Text Alignment with Long Text. 4060-4069 - Hyungyu Choi, Young Kyun Jang, Chanho Eom:
GOAL: Global-local Object Alignment Learning. 4070-4079 - Anjia Cao, Xing Wei, Zhiheng Ma:
FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training. 4080-4090 - Shuokai Pan, Gerti Tuzi, Sudarshan Sreeram, Dibakar Gope:
Data-Free Group-Wise Fully Quantized Winograd Convolution via Learnable Scales. 4091-4100 - Yassine Ouali, Adrian Bulat, Alexandros Xenos, Anestis Zaganidis, Ioannis Maniadis Metaxas, Brais Martínez, Georgios Tzimiropoulos:
VladVA: Discriminative Fine-tuning of LVLMs. 4101-4111 - Tianyu Chen, Xingcheng Fu, Yisen Gao, Haodong Qian, Yuecen Wei, Kun Yan, Haoyi Zhou, Jianxin Li:
Galaxy Walker: Geometry-aware VLMs For Galaxy-scale Understanding. 4112-4121 - Zhijian Liu, Ligeng Zhu, Baifeng Shi, Zhuoyang Zhang, Yuming Lou, Shang Yang, Haocheng Xi, Shiyi Cao, Yuxian Gu, Dacheng Li, Xiuyu Li, Haotian Tang, Yunhao Fang, Yukang Chen, Cheng-Yu Hsieh, De-An Huang, An-Chieh Cheng, Jinyi Hu, Sifei Liu, Ranjay Krishna, Pavlo Molchanov, Jan Kautz, Hongxu Yin, Song Han, Yao Lu:
NVILA: Efficient Frontier Visual Language Models. 4122-4134 - Jing Bi, Junjia Guo, Yunlong Tang, Lianggong Bruce Wen, Zhang Liu, Bingjie Wang, Chenliang Xu:
Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach. 4135-4144 - Xudong Lu, Yinghao Chen, Cheng Chen, Hui Tan, Boheng Chen, Yina Xie, Rui Hu, Guanxin Tan, Renshou Wu, Yan Hu, Yi Zeng, Lei Wu, Liuyang Bian, Zhaoxiong Wang, Long Liu, Yanzhou Yang, Han Xiao, Aojun Zhou, Yafei Wen, Xiaoxin Chen, Shuai Ren, Hongsheng Li:
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices. 4145-4155 - Junyan Lin, Haoran Chen, Yue Fan, Yingqi Fan, Xin Jin, Hui Su, Jinlan Fu, Xiaoyu Shen:
Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices. 4156-4166 - Shiyao Li, Yingchun Hu, Xuefei Ning, Xihui Liu, Ke Hong, Xiaotao Jia, Xiuhong Li, Yaqi Yan, Pei Ran, Guohao Dai, Shengen Yan, Huazhong Yang, Yu Wang:
MBQ: Modality-Balanced Quantization for Large Vision-Language Models. 4167-4177 - Qianhan Feng, Wenshuo Li, Tong Lin, Xinghao Chen:
Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Large Model Enhancement. 4178-4188 - Xianwei Zhuang, Zhihong Zhu, Yuxin Xie, Liming Liang, Yuexian Zou:
VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification. 4189-4199 - Dokyoon Yoon, Youngsook Song, Woomyoung Park:
Stop Learning it all to Mitigate Visual Hallucination, Focus on the Hallucination Target. 4200-4208 - Junzhe Chen, Tianshu Zhang, Shiyu Huang, Yuwei Niu, Linfeng Zhang, Lijie Wen, Xuming Hu:
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models. 4209-4221 - Tobia Poppi, Tejaswi Kasarla, Pascal Mettes, Lorenzo Baraldi, Rita Cucchiara:
Hyperbolic Safety-Aware Vision-Language Models. 4222-4232 - Jin Wang, Chenghui Lv, Xian Li, Shichao Dong, Huadong Li, Kelu Yao, Chao Li, Wenqi Shao, Ping Luo:
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models. 4233-4245 - Haoyu Zhang, Yangyang Guo, Mohan S. Kankanhalli:
Joint Vision-Language Social Bias Removal for CLIP. 4246-4255 - Shin'ya Yamaguchi, Dewei Feng, Sekitoshi Kanai, Kazuki Adachi, Daiki Chijiwa:
Post-pre-training for Modality Alignment in Vision-Language Foundation Models. 4256-4266 - Karsten Roth, Zeynep Akata, Dima Damen, Ivana Balazevic, Olivier J. Hénaff:
Context-Aware Multimodal Pretraining. 4267-4279 - Yi Zhang, Yi-Xuan Deng, Meng-Hao Guo, Shi-Min Hu:
Adaptive Parameter Selection for Tuning Vision-Language Models. 4280-4290 - Yabin Wang, Zhiwu Huang, Xiaopeng Hong:
OpenSDI: Spotting Diffusion-Generated Images in the Open World. 4291-4301 - Jianyu Lai, Sixiang Chen, Yunlong Lin, Tian Ye, Yun Liu, Song Fei, Zhaohu Xing, Hongtao Wu, Weiming Wang, Lei Zhu:
SnowMaster: Comprehensive Real-world Image Desnowing via MLLM with Multi-Model Feedback Optimization. 4302-4312 - Kevin Miller, Aditya Gangrade, Samarth Mishra, Kate Saenko, Venkatesh Saligrama:
SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models. 4313-4321 - Zhaogeng Liu, Haozhen Zhang, Hualin Zhang, Xingchen Li, Wanli Shi, Bin Gu, Yi Chang:
Query Efficient Black-Box Visual Prompting with Subspace Learning. 4322-4331 - Xueyu Liu, Rui Wang, Yexin Lai, Guangze Shi, Feixue Shao, Fang Hao, Jianan Zhang, Jia Shen, Yongfei Wu, Wen Zheng:
Plug-and-Play PPO: An Adaptive Point Prompt Optimizer Making SAM Greater. 4332-4342 - Xueyi Ke, Satoshi Tsutsui, Yayun Zhang, Bihan Wen:
Discovering Hidden Visual Concepts Beyond Linguistic Input in Infant Learning. 4343-4352 - Li Ren, Chen Chen, Liqiang Wang, Kien A. Hua:
DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision Transformers. 4353-4363 - Wenlong Yu, Qilong Wang, Chuang Liu, Dong Li, Qinghua Hu:
CoE: Chain-of-Explanation via Automatic Visual Concept Circuit Description and Polysemanticity Quantification. 4364-4374 - Arpita Chowdhury, Dipanjyoti Paul, Zheda Mai, Jianyang Gu, Ziheng Zhang, Kazi Sajeed Mehrab, Elizabeth G. Campolongo, Daniel I. Rubenstein, Charles V. Stewart, Anuj Karpatne, Tanya Y. Berger-Wolf, Yu Su, Wei-Lun Chao:
Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis. 4375-4385 - Aaron Serianni, Tyler Zhu, Olga Russakovsky, Vikram V. Ramaswamy:
Attention IoU: Examining Biases in CelebA using Attention Maps. 4386-4397 - Guangda Ji, Silvan Weder, Francis Engelmann, Marc Pollefeys, Hermann Blum:
ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding. 4398-4407 - Andrey Gizdov, Shimon Ullman, Daniel Harari:
Seeing More with Less: Human-like Representations in Vision Models. 4408-4417 - Weiming Zhuang, Chen Chen, Zhizhong Li, Sina Sajadmanesh, Jingtao Li, Jiabo Huang, Vikash Sehwag, Vivek Sharma, Hirotaka Shinozaki, Felan Carlo Garcia, Yihao Zhan, Naohiro Adachi, Ryoji Eki, Michael Spranger, Peter Stone, Lingjuan Lyu:
Argus: A Compact and Versatile Foundation Model for Vision. 4418-4429 - Unki Park, Seongmoon Jeong, Youngchan Jang, Gyeong-Moon Park, Jong Hwan Ko:
Test-Time Fine-Tuning of Image Compression Models for Multi-Task Adaptability. 4430-4440 - Sofia Casarin, Sergio Escalera, Oswald Lanz:
L-SWAG: Layer-Sample Wise Activation with Gradients Information for Zero-Shot NAS on Vision Transformers. 4441-4451 - Zekang Yang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu:
NADER: Neural Architecture Design via Multi-Agent Collaboration. 4452-4461 - Minghao Fu, Hao Yu, Jie Shao, Junjie Zhou, Ke Zhu, Jianxin Wu:
Quantization without Tears. 4462-4472 - Hongjun Wang, Wonmin Byeon, Jiarui Xu, Jinwei Gu, Ka Chun Cheung, Xiaolong Wang, Kai Han, Jan Kautz, Sifei Liu:
Parallel Sequence Modeling via Generalized Spatial Propagation Network. 4473-4483 - Weihao Yu, Xinchao Wang:
MambaOut: Do We Really Need Mamba for Vision? 4484-4496 - Haoyang He, Jiangning Zhang, Yuxuan Cai, Hongxu Chen, Xiaobin Hu, Zhenye Gan, Yabiao Wang, Chengjie Wang, Yunsheng Wu, Lei Xie:
MobileMamba: Lightweight Multi-Receptive Visual Mamba Network. 4497-4507 - Hao Yu, Tangyu Jiang, Shuning Jia, Shannan Yan, Shunning Liu, Haolong Qian, Guanghao Li, Shuting Dong, Chun Yuan:
ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices. 4508-4517 - Yuwei Sun, Hideya Ochiai, Zhirong Wu, Stephen Lin, Ryota Kanai:
Associative Transformer. 4518-4527 - Jon Donnelly, Zhicheng Guo, Alina Jade Barnett, Hayden McTavish, Chaofan Chen, Cynthia Rudin:
Rashomon Sets for Prototypical-Part Networks: Editing Interpretable Models in Real-Time. 4528-4538 - Xin Lin, Chong Shi, Zuopeng Yang, Haojin Tang, Zhili Zhou:
SGC-Net: Stratified Granular Comparison Network for Open-Vocabulary HOI Detection. 4539-4549 - Kiet A. Nguyen, Adheesh Sunil Juvekar, Tianjiao Yu, Muntasir Wahed, Ismini Lourentzou:
CALICO: Part-Focused Semantic Co-Segmentation with Large Vision-Language Models. 4550-4561 - Zelin Peng, Zhengqin Xu, Zhilin Zeng, Changsong Wen, Yu Huang, Menglin Yang, Feilong Tang, Wei Shen:
Understanding Fine-tuning CLIP for Open-vocabulary Semantic Segmentation in Hyperbolic Space. 4562-4572 - Pei Wang, Zhaowei Cai, Hao Yang, Ashwin Swaminathan, R. Manmatha, Stefano Soatto:
Scaling up Image Segmentation across Data and Tasks. 4573-4583 - Amin Karimi, Charalambos Poullis:
DSV-LFS: Unifying LLM-Driven Semantic Cues with Visual Features for Robust Few-Shot Segmentation. 4584-4594 - Yuchen Zhu, Cheng Shi, Dingyou Wang, Jiajin Tang, Zhengxuan Wei, Yu Wu, Guanbin Li, Sibei Yang:
Rethinking Query-based Transformer for Continual Image Segmentation. 4595-4606 - Seun-An Choe, Keon-Hee Park, Jinwoo Choi, Gyeong-Moon Park:
Universal Domain Adaptation for Semantic Segmentation. 4607-4617 - Yuhan Liu, Yixiong Zou, Yuhua Li, Ruixuan Li:
The Devil is in Low-Level Features for Cross-Domain Few-Shot Segmentation. 4618-4627 - Yan Yang, Liyuan Pan, Dongxu Li, Liu Liu:
EZSR: Event-based Zero-Shot Recognition. 4628-4638 - Xianing Chen, Si Huo, Borui Jiang, Hailin Hu, Xinghao Chen:
Single Domain Generalization for Few-Shot Counting via Universal Representation Matching. 4639-4649 - Hao Tan, Zichang Tan, Jun Li, Ajian Liu, Jun Wan, Zhen Lei:
Recover and Match: Open-Vocabulary Multi-Label Recognition through Knowledge-Constrained Optimal Transport. 4650-4660 - Dongseob Kim, Hyunjung Shim:
Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification. 4661-4671 - Phi Vu Tran:
SimLTD: Simple Supervised and Semi-Supervised Long-Tailed Object Detection. 4672-4681 - Aming Wu, Cheng Deng:
Percept, Memory, and Imagine: World Feature Simulating for Open-Domain Unknown Object Detection. 4682-4691 - Marc-Antoine Lavoie, Anas Mahmoud, Steven L. Waslander:
Large Self-Supervised Models Bridge the Gap in Domain Adaptive Object Detection. 4692-4702 - Zhixiong Nan, Xianghong Li, Jifeng Dai, Tao Xiang:
MI-DETR: An Object Detection Model with Multi-time Inquiries Mechanism. 4703-4712 - Huixin Sun, Runqi Wang, Yanjing Li, Linlin Yang, Shaohui Lin, Xianbin Cao, Baochang Zhang:
SET: Spectral Enhancement for Tiny Object Detection. 4713-4723 - Wenxi Chen, Raymond A. Yeh, Shaoshuai Mou, Yan Gu:
Leveraging Perturbation Robustness to Enhance Out-of-Distribution Detection. 4724-4733 - Kaichen Yang, Junjie Cao, Zeyu Bai, Zhixun Su, Andrea Tagliasacchi:
PIAD: Pose and Illumination agnostic Anomaly Detection. 4734-4743 - Wenxin Ma, Xu Zhang, Qingsong Yao, Fenghe Tang, Chenxu Wu, Yingtai Li, Rui Yan, Zihang Jiang, S. Kevin Zhou:
AA-CLIP: Enhancing Zero-Shot Anomaly Detection via Anomaly-Aware CLIP. 4744-4754 - Ziming Huang, Xurui Li, Haotian Liu, Feng Xue, Yuzhe Wang, Yu Zhou:
AnomalyNCD: Towards Novel Anomaly Class Discovery in Industrial Scenarios. 4755-4765 - Xiaofan Li, Xin Tan, Zhuo Chen, Zhizhong Zhang, Ruixin Zhang, Rizen Guo, Guannan Jiang, Yulong Chen, Yanyun Qu, Lizhuang Ma, Yuan Xie:
One-for-More: Continual Diffusion Model for Anomaly Detection. 4766-4775 - Shibin Mei, Hang Wang, Bingbing Ni:
GeoMM: On Geodesic Perspective for Multi-modal Learning. 4776-4786 - Seonggon Kim, Juncheol Shin, Seung-taek Woo, Eunhyeok Park:
HOT: Hadamard-based Optimized Training. 4787-4796 - Zhiqiang Shen, Ammar Sherif, Zeyuan Yin, Shitong Shao:
DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation. 4797-4806 - Jiamu Zhang, Shaochen Zhong, Andrew Ye, Zirui Liu, Sebastian Zhao, Kaixiong Zhou, Li Li, Soo-Hyun Choi, Rui Chen, Xia Hu, Shuai Xu, Vipin Chaudhary:
Flexible Group Count Enables Hassle-Free Structured Pruning. 4807-4818 - Fu Feng, Yucheng Xie, Jing Wang, Xin Geng:
WAVE: Weight Templates for Adaptive Initialization of Variable-sized Models. 4819-4828 - Hongxu Chen, Zhen Wang, Runshi Li, Bowei Zhu, Long Chen:
IterIS: Iterative Inference-Solving Alignment for LoRA Merging. 4829-4838 - Qiang Wang, Xiang Song, Yuhang He, Jizhou Han, Chenhao Ding, Xinyuan Gao, Yihong Gong:
Boosting Domain Incremental Learning: Selecting the Optimal Parameters is All You Need. 4839-4849 - Yuhao Zhou, Yuxin Tian, Jindi Lv, Mingjia Shi, Yuanxi Li, Qing Ye, Shuhao Zhang, Jiancheng Lv:
Ferret: An Efficient Online Continual Learning Framework under Varying Memory Constraints. 4850-4861 - Xiaohan Zou, Wenchao Ma, Shu Zhao:
Learning Conditional Space-Time Prompt Distributions for Video Class-Incremental Learning. 4862-4873 - Hao Yu, Xin Yang, Le Zhang, Hanlin Gu, Tianrui Li, Lixin Fan, Qiang Yang:
Handling Spatial-Temporal Data Heterogeneity for Federated Continual Learning via Tail Anchor. 4874-4883 - Takuma Fukuda, Hiroshi Kera, Kazuhiko Kawamoto:
Adapter Merging with Centroid Prototype Mapping for Scalable Class-Incremental Learning. 4884-4893 - Guannan Lai, Yujie Li, Xiangkun Wang, Junbo Zhang, Tianrui Li, Xin Yang:
Order-Robust Class Incremental Learning: Graph-Driven Dynamic Similarity Grouping. 4894-4904 - Vaibhav Rathore, Shubhranil B, Saikat Dutta, Sarthak Mehrotra, Zsolt Kira, Biplab Banerjee:
When Domain Generalization meets Generalized Category Discovery: An Adaptive Task-Arithmetic Driven Approach. 4905-4915 - Yue Zhang, Mingyue Bin, Yuyang Zhang, Zhongyuan Wang, Zhen Han, Chao Liang:
Link-based Contrastive Learning for One-Shot Unsupervised Domain Adaptation. 4916-4926 - Weiming Liu, Jun Dan, Fan Wang, Xinting Liao, Junhao Dong, Hua Yu, Shunjie Dong, Lianyong Qi:
Distinguish Then Exploit: Source-free Open Set Domain Adaptation via Weight Barcode Estimation and Sparse Label Assignment. 4927-4938 - Shinnosuke Matsuo, Riku Togashi, Ryoma Bise, Seiichi Uchida, Masahiro Nomura:
Instance-wise Supervision-level Optimization in Active Learning. 4939-4947 - Sk Miraj Ahmed, Umit Yigit Basaran, Dripta S. Raychaudhuri, Arindam Dutta, Rohit Kundu, Fahim Faisal Niloy, Basak Guler, Amit K. Roy-Chowdhury:
Towards Source-Free Machine Unlearning. 4948-4957 - Taero Kim, Subeen Park, Sungjun Lim, Yonghan Jung, Krikamol Muandet, Kyungwoo Song:
Sufficient Invariant Learning for Distribution Shift. 4958-4967 - Zhiwei Ling, Yachen Chang, Hailiang Zhao, Xinkui Zhao, Kingsum Chow, Shuiguang Deng:
CADRef: Robust Out-of-Distribution Detection via Class-Aware Decoupled Relative Feature Leveraging. 4968-4977 - Zheng Wang, Zihui Wang, Zheng Wang, Xiaoliang Fan, Cheng Wang:
Federated Learning with Domain Shift Eraser. 4978-4987 - Run He, Kai Tong, Di Fang, Han Sun, Ziqian Zeng, Haoran Li, Tianyi Chen, Huiping Zhuang:
AFL: A Single-Round Analytic Approach for Federated Learning with Pre-trained Models. 4988-4998 - K. Naveen Kumar, Ranjeet Ranjan Jha, C. Krishna Mohan, Ravindra Babu Tallamraju:
Fortifying Federated Learning Towards Trustworthiness via Auditable Data Valuation and Verifiable Client Contribution. 4999-5009 - Tae-Young Lee, Sundong Park, Minwoo Jeon, Hyoseok Hwang, Gyeong-Moon Park:
ESC: Erasing Space Concept for Knowledge Deletion. 5010-5019 - Jiate Li, Meng Pang, Yun Dong, Binghui Wang:
Deterministic Certification of Graph Neural Networks against Graph Poisoning Attacks with Arbitrary Perturbations. 5020-5029 - Keke Tang, Chao Hou, Weilong Peng, Xiang Fang, Zhize Wu, Yongwei Nie, Wenping Wang, Zhihong Tian:
Simplification Is All You Need against Out-of-Distribution Overconfidence. 5030-5040 - Ping Guo, Cheng Gong, Xi Lin, Fei Liu, Zhichao Lu, Qingfu Zhang, Zhenkun Wang:
MOS-Attack: A Scalable Multi-objective Adversarial Attack Framework. 5041-5051 - Banglong Liu, Niuniu Qi, Xia Zeng, Lydia Dehbi, Zhengfeng Yang:
Automated Proof of Polynomial Inequalities via Reinforcement Learning. 5052-5060 - Haiming Xu, Qianqian Wang, Boyue Wang, Quanxue Gao:
Deep Fair Multi-View Clustering with Attention KAN. 5061-5070 - Yuzhuo Dai, Jiaqi Jin, Zhibin Dong, Siwei Wang, Xinwang Liu, En Zhu, Xihong Yang, Xinbiao Gan, Yu Feng:
Imputation-free and Alignment-free: Incomplete Multi-view Clustering Driven by Consensus Semantic Learning. 5071-5081 - Zijian Dong, Yilei Wu, Chongyao Chen, Yingtian Zou, Yichi Zhang, Juan Helen Zhou:
Improve Representation for Imbalanced Regression through Geometric Constraints. 5082-5091 - Shanglin Liu, Jianming Lv, Jingdan Kang, Huaidong Zhang, Zequan Liang, Shengfeng He:
MODfinity: Unsupervised Domain Adaptation with Multimodal Information Flow Intertwining. 5092-5101 - Yingxue Xu, Fengtao Zhou, Chenyu Zhao, Yihui Wang, Can Yang, Hao Chen:
Distilled Prompt Learning for Incomplete Multimodal Survival Prediction. 5102-5111 - Wei Li, Jiawei Jiang, Jie Wu, Kaihao Yu, Jianwei Zheng:
LMO: Linear Mamba Operator for MRI Reconstruction. 5112-5122 - Xiao Wang, Fuling Wang, Yuehang Li, Qingchuan Ma, Shiao Wang, Bo Jiang, Jin Tang:
CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset. 5123-5133 - Ying Chen, Guoan Wang, Yuanfeng Ji, Yanjun Li, Jin Ye, Tianbin Li, Ming Hu, Rongshan Yu, Yu Qiao, Junjun He:
SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding. 5134-5143 - Junxian Wu, Minheng Chen, Xinyi Ke, Tianwang Xun, Xiaoming Jiang, Hongyu Zhou, Lizhi Shao, Youyong Kong:
Learning Heterogeneous Tissues with Mixture of Experts for Gigapixel Whole Slide Images. 5144-5153 - Ziyuan Yang, Yingyu Chen, Zhiwen Wang, Hongming Shan, Yang Chen, Yi Zhang:
Patient-Level Anatomy Meets Scanning-Level Physics: Personalized Federated Low-Dose CT Denoising Empowered by Large Language Model. 5154-5163 - Shaohao Rui, Lingzhi Chen, Zhenyu Tang, Lilong Wang, Mianxin Liu, Shaoting Zhang, Xiaosong Wang:
Multi-modal Vision Pre-training for Medical Image Analysis. 5164-5174 - Qinghe Ma, Jian Zhang, Zekun Li, Lei Qi, Qian Yu, Yinghuan Shi:
Steady Progress Beats Stagnation: Mutual Aid of Foundation and Conventional Models in Mixed Domain Semi-Supervised Medical Image Segmentation. 5175-5185 - Tassilo Wald, Constantin Ulrich, Stanislav Lukyanenko, Andrei Goncharov, Alberto Paderno, Maximilian Miller, Leander Maerkisch, Paul F. Jaeger, Klaus H. Maier-Hein:
Revisiting MAE Pre-training for 3D Medical Image Segmentation. 5186-5196 - Feng Yu, Jiacheng Cao, Li Liu, Minghua Jiang:
SuperLightNet: Lightweight Parameter Aggregation Network for Multimodal Brain Tumor Segmentation. 5197-5206 - Jiongtong Hu, Wufeng Xue, Jun Cheng, Yingying Liu, Wei Zhuo, Dong Ni:
EchoONE: Segmenting Multiple Echocardiography Planes in One Model. 5207-5216 - Jinho Joo, Hyeseong Kim, Hyeyeon Won, Deukhee Lee, Taejoon Eo, Dosik Hwang:
AeSPa : Attention-guided Self-supervised Parallel Imaging for MRI Reconstruction. 5217-5226 - Xinxing Cheng, Tianyang Zhang, Wenqi Lu, Qingjie Meng, Alejandro F. Frangi, Jinming Duan:
SACB-Net: Spatial-awareness Convolutions for Medical Image Registration. 5227-5237 - Federico Bolelli, Kevin Marchesini, Niels van Nistelrooij, Luca Lumetti, Vittorio Pipoli, Elisa Ficarra, Shankeeth Vinayahalingam, Costantino Grana:
Segmenting Maxillofacial Structures in CBCT Volumes. 5238-5248 - Bowen Wen, Matthew Trepte, Joseph Aribido, Jan Kautz, Orazio Gallo, Stan Birchfield:
FoundationStereo: Zero-Shot Stereo Matching. 5249-5260 - Ruicheng Wang, Sicheng Xu, Cassie Dai, Jianfeng Xiang, Yu Deng, Xin Tong, Jiaolong Yang:
MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision. 5261-5271 - Haoyu Guo, He Zhu, Sida Peng, Haotong Lin, Yunzhi Yan, Tao Xie, Wenguan Wang, Xiaowei Zhou, Hujun Bao:
Multi-view Reconstruction via SfM-guided Monocular Depth Estimation. 5272-5282 - Zhenggang Tang, Yuchen Fan, Dilin Wang, Hongyu Xu, Rakesh Ranjan, Alexander G. Schwing, Zhicheng Yan:
MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds. 5283-5293 - Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, David Novotný:
VGGT: Visual Geometry Grounded Transformer. 5294-5306 - Weiyu Li, Jiarui Liu, Hongyu Yan, Rui Chen, Yixun Liang, Xuelin Chen, Ping Tan, Xiaoxiao Long:
CraftsMan3D: High-fidelity Mesh Generation with 3D Native Diffusion and Interactive Geometry Refiner. 5307-5317 - Felix Taubner, Ruihang Zhang, Mathieu Tuli, David B. Lindell:
CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models. 5318-5330 - Jacob Yeung, Andrew F. Luo, Gabriel Sarch, Margaret M. Henderson, Deva Ramanan, Michael J. Tarr:
Reanimating Images using Neural Representations of Dynamic Stimuli. 5331-5343 - Fangzhou Hong, Vladimir Guzov, Hyo Jin Kim, Yuting Ye, Richard A. Newcombe, Ziwei Liu, Lingni Ma:
EgoLM: Multi-Modal Language Model of Egocentric Motions. 5344-5354 - Yan Xia, Xiaowei Zhou, Etienne Vouga, Qixing Huang, Georgios Pavlakos:
Reconstructing Humans with a Biomechanically Accurate Skeleton. 5355-5365 - Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Francesc Moreno-Noguer:
MEGA: Masked Generative Autoencoder for Human Mesh Recovery. 5366-5378 - Liang Pan, Zeshi Yang, Zhiyang Dou, Wenjia Wang, Buzhen Huang, Bo Dai, Taku Komura, Jingbo Wang:
TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization. 5379-5391 - Laurie Bose, Jianing Chen, Piotr Dudek:
Descriptor-In-Pixel : Point-Feature Tracking For Pixel Processor Arrays. 5392-5400 - Anna Manasyan, Maximilian Seitzer, Filip Radovic, Georg Martius, Andrii Zadaianchuk:
Temporally Consistent Object-Centric Learning by Contrasting Slots. 5401-5411 - SuBeen Lee, WonJun Moon, Hyun Seok Seong, Jae-Pil Heo:
Temporal Alignment-Free Video Matching for Few-shot Action Recognition. 5412-5421 - Zhejun Zhang, Péter Karkus, Maximilian Igl, Wenhao Ding, Yuxiao Chen, Boris Ivanovic, Marco Pavone:
Closed-Loop Supervised Fine-Tuning of Tokenized Traffic Models. 5422-5432 - Otto Brookes, Maksim Kukushkin, Majid Mirmehdi, Colleen Stephens, Paula Dieguez, Thurston C. Hicks, Sorrel Jones, Kevin Lee, Maureen S. McCarthy, Amelia Meier, Emmanuelle Normand, Erin G. Wessling, Roman M. Wittig, Kevin Langergraber, Klaus Zuberbühler, Lukas Boesch, Thomas Schmid, Mimi Arandjelovic, Hjalmar S. Kühl, Tilo Burghardt:
The PanAf-FGBG Dataset: Understanding the Impact of Backgrounds in Wildlife Behaviour Recognition. 5433-5443 - Yichen Xiao, Shuai Wang, Dehao Zhang, Wenjie Wei, Yimeng Shan, Xiaoli Liu, Yulin Jiang, Malu Zhang:
Rethinking Spiking Self-Attention Mechanism: Implementing a-XNOR Similarity Calculation in Spiking Transformers. 5444-5454 - Kai Chen, Yunhao Gou, Runhui Huang, Zhili Liu, Daxin Tan, Jing Xu, Chunwei Wang, Yi Zhu, Yihan Zeng, Kuo Yang, Dingdong Wang, Kun Xiang, Haoyuan Li, Haoli Bai, Jianhua Han, Xiaohui Li, Weike Jin, Nian Xie, Yu Zhang, James T. Kwok, Hengshuang Zhao, Xiaodan Liang, Dit-Yan Yeung, Xiao Chen, Zhenguo Li, Wei Zhang, Qun Liu, Lanqing Hong, Lu Hou, Hang Xu:
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions. 5455-5466 - Xiumei Xie, Zikai Huang, Wenhao Xu, Peng Xiao, Xuemiao Xu, Huaidong Zhang:
Let's Chorus: Partner-aware Hybrid Song-Driven 3D Head Animation. 5467-5476 - Antoni Bigata Casademunt, Michal Stypulkowski, Rodrigo Mira, Stella Bounareli, Konstantinos Vougioukas, Zoe Landgraf, Nikita Drobyshev, Maciej Zieba, Stavros Petridis, Maja Pantic:
KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation. 5477-5488 - Rang Meng, Xingyu Zhang, Yuming Li, Chenguang Ma:
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation. 5489-5498 - Di Chang, Hongyi Xu, You Xie, Yipeng Gao, Zhengfei Kuang, Shengqu Cai, Chenxu Zhang, Guoxian Song, Chao Wang, Yichun Shi, Zeyuan Chen, Shijie Zhou, Linjie Luo, Gordon Wetzstein, Mohammad Soleymani:
X-Dyna: Expressive Dynamic Human Image Animation. 5499-5509 - Yiqun Mei, Mingming He, Li Ma, Julien Philip, Wenqi Xian, David M. George, Xueming Yu, Gabriel Dedic, Ahmet Levent Tasel, Ning Yu, Vishal M. Patel, Paul E. Debevec:
Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset. 5510-5522 - Shengjie Gong, Haojie Li, Jiapeng Tang, Dongming Hu, Shuangping Huang, Hao Chen, Tianshui Chen, Zhuoman Liu:
Monocular and Generalizable Gaussian Talking Head Animation. 5523-5534 - Jiawei Zhang, Zijian Wu, Zhiyang Liang, Yicheng Gong, Dongfang Hu, Yao Yao, Xun Cao, Hao Zhu:
FATE: Full-head Gaussian Avatar with Textural Editing from Monocular Video. 5535-5545 - Jiapeng Tang, Davide Davoli, Tobias Kirschstein, Liam Schoneveld, Matthias Nießner:
GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion. 5546-5558 - Chen Guo, Junxuan Li, Yash Kant, Yaser Sheikh, Shunsuke Saito, Chen Cao:
Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior. 5559-5570 - Yufan Wu, Xuanhong Chen, Wen Li, Shunran Jia, Hualiang Wei, Kairui Feng, Jialiang Chen, Yuhan Li, Ang He, Weimin Zhang, Bingbing Ni, Wenjun Zhang:
SinGS: Animatable Single-Image Human Gaussian Splats with Kinematic Priors. 5571-5580 - Suzhen Wang, Weijie Chen, Wei Zhang, Minda Zhao, Lincheng Li, Rongsheng Zhang, Zhipeng Hu, Xin Yu:
EasyCraft: A Robust and Efficient Framework for Automatic Avatar Crafting. 5581-5591 - Yuxin Yao, Zhi Deng, Junhui Hou:
RigGS: Rigging of 3D Gaussians for Modeling Articulated Objects in Videos. 5592-5601 - Yuxiang Mao, Zhenfeng Fan, ZhiJie Zhang, Zhiheng Zhang, Shihong Xia:
Learning Person-Specific Animatable Face Models from In-the-Wild Images via a Shared Base Model. 5602-5613 - Wooseok Jang, Youngjun Hong, Geonho Cha, Seungryong Kim:
ControlFace: Harnessing Facial Parametric Control for Face Rigging. 5614-5624 - Yifang Xu, Benxiang Zhai, Yunzhuo Sun, Ming Li, Yang Li, Sidan Du:
HiFi-Portrait: Zero-shot Identity-preserved Portrait Generation with High-fidelity Multi-face Fusion. 5625-5635 - Hyeongjin Nam, Donghwan Kim, Jeongtaek Oh, Kyoung Mu Lee:
DeClotH: Decomposable 3D Cloth and Human Body Reconstruction from a Single Image. 5636-5645 - Tengfei Xiao, Yue Wu, Yuelong Li, Can Qin, Maoguo Gong, Qiguang Miao, Wenping Ma:
Disentangled Pose and Appearance Guidance for Multi-Pose Generation. 5646-5655 - Yuanbo Wang, Zhaoxuan Zhang, Jiajin Qiu, Dilong Sun, Zhengyu Meng, Xiaopeng Wei, Xin Yang:
Touch2Shape: Touch-Conditioned 3D Diffusion for Shape Exploration and Reconstruction. 5656-5665 - Zhiheng Liu, Ka Leong Cheng, Xi Chen, Jie Xiao, Hao Ouyang, Kai Zhu, Yu Liu, Yujun Shen, Qifeng Chen, Ping Luo:
MangaNinja: Line Art Colorization with Precise Reference Following. 5666-5677 - Qingsen Yan, Yixu Feng, Cheng Zhang, Guansong Pang, Kangbiao Shi, Peng Wu, Wei Dong, Jinqiu Sun, Yanning Zhang:
HVI: A New Color Space for Low-light Image Enhancement. 5678-5687 - Tianfu Wang, Mingyang Xie, Haoming Cai, Sachin Shah, Christopher A. Metzler:
Flash-Split: 2D Reflection Removal with Flash Cues and Latent Diffusion Separation. 5688-5698 - Feiran Li, Haiyang Jiang, Daisuke Iso:
Noise Modeling in One Hour: Minimizing Preparation Efforts for Self-supervised Low-Light RAW Image Denoising. 5699-5708 - Hang Chen, Yin Xie, Xiaoxiu Peng, Lihu Sun, Wenkai Su, Xiaodong Yang, Chengming Liu:
Quad-Pixel Image Defocus Deblurring: A New Benchmark and Model. 5709-5719 - Jun Myeong Choi, Annie Wang, Pieter Peers, Anand Bhattad, Roni Sengupta:
ScribbleLight: Single Image Indoor Relighting with Scribbles. 5720-5731 - Xiulong Liu, Anurag Kumar, Paul Calamia, Sebastià Vicenc Amengual Garí, Calvin Murdock, Ishwarya Ananthabhotla, Philip W. Robinson, Eli Shlizerman, Vamsi Krishna Ithapu, Ruohan Gao:
Hearing Anywhere in Any Environment. 5732-5741 - Tao Xie, Xi Chen, Zhen Xu, Yiman Xie, Yudong Jin, Yujun Shen, Sida Peng, Hujun Bao, Xiaowei Zhou:
EnvGS: Modeling View-Dependent Appearance with Environment Gaussian. 5742-5751 - Kaiwen Jiang, Venkataram Sivaram, Cheng Peng, Ravi Ramamoorthi:
Geometry Field Splatting with Gaussian Surfels. 5752-5762 - Ishit Mehta, Manmohan Chandraker, Ravi Ramamoorthi:
Locally Orderless Images for Optimization in Differentiable Rendering. 5763-5772 - Junyong Choi, Min-Cheol Sagong, SeokYeong Lee, Seung-Won Jung, Ig-Jae Kim, Junghyun Cho:
Channel-wise Noise Scheduled Diffusion for Inverse Rendering in Indoor Scenes. 5773-5782 - Moritz Heep, Sven Behnke, Eduard Zell:
Feature-Preserving Mesh Decimation for Normal Integration. 5783-5792 - Xinran Yang, Donghao Ji, Yuanqi Li, Jie Guo, Yanwen Guo, Junyuan Xie:
SGCR: Spherical Gaussians for Efficient 3D Curve Reconstruction. 5793-5803 - Zeyi Xu, Jinfan Liu, Kuangxu Chen, Ye Chen, Zhangli Hu, Bingbing Ni:
AMR-Transformer: Enabling Efficient Long-range Interaction for Complex Neural Fluid Simulation. 5804-5813 - Jianhui Wang, Zhifei Yang, Yangfan He, Huixiong Zhang, Yuxuan Chen, Jingwei Huang:
MaRI: Material Retrieval Integration across Domains. 5814-5823 - Xiancheng Sun, Mai Xu, Shengxi Li, Senmao Ma, Xin Deng, Lai Jiang, Gang Shen:
Spherical Manifold Guided Diffusion Model for Panoramic Image Generation. 5824-5834 - Zilong Chen, Yikai Wang, Wenqiang Sun, Feng Wang, Yiwen Chen, Huaping Liu:
MeshGen: Generating PBR Textured Mesh with Render-Enhanced Auto-Encoder and Generative Data Augmentation. 5835-5848 - Soumyaratna Debnath, Ashish Tiwari, Kaustubh Sadekar, Shanmuganathan Raman:
RASP: Revisiting 3D Anamorphic Art for Shadow-Guided Packing of Irregular Objects. 5849-5858 - Jesus Zarzar, Tom Monnier, Roman Shapovalov, Andrea Vedaldi, David Novotný:
Twinner: Shining Light on Digital Twins in a Few Snaps. 5859-5869 - Jiantao Lin, Xin Yang, Meixi Chen, Yingjie Xu, Dongyu Yan, Leyi Wu, Xinli Xu, Lie Xu, Shunsi Zhang, Ying-Cong Chen:
Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation. 5870-5880 - Minghao Chen, Roman Shapovalov, Iro Laina, Tom Monnier, Jianyuan Wang, David Novotný, Andrea Vedaldi:
PartGen: Part-level 3D Generation and Reconstruction with Multi-view Diffusion Models. 5881-5892 - Tongyuan Bai, Wangyuanfan Bai, Dong Chen, Tieru Wu, Manyi Li, Rui Ma:
FreeScene: Mixed Graph Diffusion for 3D Scene Synthesis from Free Prompts. 5893-5903 - Bahri Batuhan Bilecen, Yigit Yalin, Ning Yu, Aysegul Dundar:
Reference-Based 3D-Aware Image Editing with Triplanes. 5904-5915 - Hong-Xing Yu, Haoyi Duan, Charles Herrmann, William T. Freeman, Jiajun Wu:
WonderWorld: Interactive 3D Scene Generation from a Single Image. 5916-5926 - Aashish Rai, Dilin Wang, Mihir Jain, Nikolaos Sarafianos, Kefan Chen, Srinath Sridhar, Aayush Prakash:
UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping. 5927-5937 - Youngdong Jang, Hyunje Park, Feng Yang, Heeju Ko, Euijin Choo, Sangpil Kim:
3D-GSW: 3D Gaussian Splatting for Robust Watermarking. 5938-5948 - Alex Hanson, Allen Tu, Vasu Singla, Mayuka Jayawardhana, Matthias Zwicker, Tom Goldstein:
PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting. 5949-5958 - Luca Savant Aira, Gabriele Facciolo, Thibaud Ehret:
Gaussian Splatting for Efficient Satellite Image Photogrammetry. 5959-5969 - Christopher Thomas Thirgood, Oscar Mendez, Erin Chao Ling, Jon Storey, Simon Hadfield:
HyperGS: Hyperspectral 3D Gaussian Splatting. 5970-5979 - Junjin Xiao, Qing Zhang, Yongwei Nie, Lei Zhu, Wei-Shi Zheng:
RoGSplat: Learning Robust Generalizable Human Gaussian Splatting from Sparse Multi-View Images. 5980-5990 - Jinfeng Liu, Lingtong Kong, Bo Li, Dan Xu:
GaussHDR: High Dynamic Range Gaussian Splatting via Learning Unified 3D and 2D Local Tone Mapping. 5991-6000 - Antoine Guédon, Tomoki Ichikawa, Kohei Yamashita, Ko Nishino:
MAtCha Gaussians: Atlas of Charts for High-Quality Geometry and Photorealism From Sparse Views. 6001-6011 - Haeyun Choi, Heemin Yang, Janghyeok Han, Sunghyun Cho:
Exploiting Deblurring Networks for Radiance Fields. 6012-6021 - Junfeng Ni, Yu Liu, Ruijie Lu, Zirui Zhou, Song-Chun Zhu, Yixin Chen, Siyuan Huang:
Decompositional Neural Scene Reconstruction with Generative Diffusion Prior. 6022-6033 - Mohammad Asim, Christopher Wewer, Thomas Wimmer, Bernt Schiele, Jan Eric Lenssen:
MET3R: Measuring Multi-View Consistency in Generated Images. 6034-6044 - Chenjie Cao, Chaohui Yu, Shang Liu, Fan Wang, Xiangyang Xue, Yanwei Fu:
MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model. 6045-6056 - Maxim V. Shugaev, Vincent Chen, Maxim Karrenbach, Kyle Ashley, Bridget Kennedy, Naresh P. Cuntoor:
ERUPT: Efficient Rendering with Unposed Patch Transformer. 6057-6067 - Ningli Xu, Rongjun Qin:
Satellite to GroundScape - Large-scale Consistent Ground View Generation from Satellite Views. 6068-6077 - Sibo Wu, Congrong Xu, Binbin Huang, Andreas Geiger, Anpei Chen:
GenFusion: Closing the Loop between Reconstruction and Generation via Videos. 6078-6088 - Shengjun Zhang, Jinzhao Li, Xin Fei, Hao Liu, Yueqi Duan:
Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model. 6089-6098 - Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Yuwei Fang, Kwot Sin Lee, Ivan Skorokhodov, Kfir Aberman, Jun-Yan Zhu, Ming-Hsuan Yang, Sergey Tulyakov:
Multi-subject Open-set Personalization in Video Generation. 6099-6110 - Haozhe Xie, Zhaoxi Chen, Fangzhou Hong, Ziwei Liu:
Generative Gaussian Splatting for Unbounded 3D City Generation. 6111-6120 - Xuanchi Ren, Tianchang Shen, Jiahui Huang, Huan Ling, Yifan Lu, Merlin Nimier-David, Thomas Müller, Alexander Keller, Sanja Fidler, Jun Gao:
GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control. 6121-6132 - Yingji Zhong, Zhihao Li, Dave Zhenyu Chen, Lanqing Hong, Dan Xu:
Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs. 6133-6143 - Jinxiu Liu, Shaoheng Lin, Yinxiao Li, Ming-Hsuan Yang:
DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes. 6144-6153 - Remy Sabathier, Niloy J. Mitra, David Novotný:
LIM: Large Interpolator Model for Dynamic Reconstruction. 6154-6164 - Jiahui Lei, Yijia Weng, Adam W. Harley, Leonidas J. Guibas, Kostas Daniilidis:
MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds. 6165-6177 - Boyuan Chen, Hanxiao Jiang, Shaowei Liu, Saurabh Gupta, Yunzhu Li, Hao Zhao, Shenlong Wang:
PhysGen3D: Crafting a Miniature Interactive World from a Single Image. 6178-6189 - Matthew Marchellus, Nadhira Noor, In Kyu Park:
Link to the Past: Temporal Propagation for Fast 3D Human Reconstruction from Monocular Video. 6190-6199 - Changan Chen, Juze Zhang, Shrinidhi K. Lakshmikanth, Yusu Fang, Ruizhi Shao, Gordon Wetzstein, Li Fei-Fei, Ehsan Adeli:
The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion. 6200-6211 - Zhenyu Zhou, Chengdong Dong, Ajay Kumar:
Towards Explainable and Unprecedented Accuracy in Matching Challenging Finger Crease Patterns. 6212-6221 - Yuhan Bao, Shaohua Gao, Wenyong Li, Kaiwei Wang:
One-Step Event-Driven High-Speed Autofocus. 6222-6230 - Yanchen Dong, Ruiqin Xiong, Xiaopeng Fan, Zhaofei Yu, Yonghong Tian, Tiejun Huang:
Self-Supervised Learning for Color Spike Camera Reconstruction. 6231-6240 - Kazuma Kitazawa, Takahito Aoto, Satoshi Ikehata, Tsuyoshi Takatani:
PS-EIP: Robust Photometric Stereo Based on Event Interval Profile. 6241-6251 - Yongfan Liu, Hyoukjun Kwon:
Efficient Depth Estimation for Unstable Stereo Camera Systems on AR Glasses. 6252-6261 - Jinhong Wang, Jian Liu, Dongqi Tang, Weiqiang Wang, Wentong Li, Danny Chen, Jintai Chen, Jian Wu:
Scalable Autoregressive Monocular Depth Estimation. 6262-6272 - Junda Cheng, Longliang Liu, Gangwei Xu, Xianqi Wang, Zhaoxing Zhang, Yong Deng, Jinliang Zang, Yurui Chen, Zhipeng Cai, Xin Yang:
MonSter: Marry Monodepth to Stereo Unleashes Power. 6273-6282 - Juhyung Choi, Jinnyeong Kim, Seokjun Choi, Jinwoo Lee, Samuel Brucker, Mario Bijelic, Felix Heide, Seung-Hwan Baek:
Dual Exposure Stereo for Extended Dynamic Range 3D Imaging. 6283-6293 - Kaining Zhang, Yuxin Deng, Jiayi Ma, Paolo Favaro:
Adapting Dense Matching for Homography Estimation with Grid-based Acceleration. 6294-6303 - Patrick Rim, Hyoungseob Park, Suchisrit Gangopadhyay, Ziyao Zeng, Younjoon Chung, Alex Wong:
ProtoDepth: Unsupervised Continual Depth Completion with Prototypes. 6304-6316 - Qitao Zhao, Amy Lin, Jeff Tan, Jason Y. Zhang, Deva Ramanan, Shubham Tulsiani:
DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion. 6317-6326 - Zeran Ke, Bin Tan, Xianwei Zheng, Yujun Shen, Tianfu Wu, Nan Xue:
ScaleLSD: Scalable Deep Line Segment Detection Streamlined. 6327-6336 - Dongki Jung, Jaehoon Choi, Yonghan Lee, Somi Jeong, Taejae Lee, Dinesh Manocha, Suyong Yeon:
EDM: Equirectangular Projection-Oriented Dense Kernelized Feature Matching. 6337-6347 - Yue Chen, Xingyu Chen, Anpei Chen, Gerard Pons-Moll, Yuliang Xiu:
Feat2GS: Probing Visual Foundation Models with Gaussian Splatting. 6348-6361 - Zimin Xia, Alexandre Alahi:
FG^2: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching. 6362-6372 - Philip Doldo, Derek Everett, Amol Khanna, André T. Nguyen, Edward Raff:
Stop Walking in Circles! Bailing Out Early in Projected Gradient Descent. 6373-6382 - Ayush Shrivastava, Andrew Owens:
Self-Supervised Spatial Correspondence Across Modalities. 6383-6393 - Gonglin Chen, Tianwen Fu, Haiwei Chen, Wenbin Teng, Hanyuan Xiao, Yajie Zhao:
RDD: Robust Feature Detector and Descriptor using Deformable Transformer. 6394-6403 - JongMin Lee, Sungjoo Yoo:
Dense-SfM: Structure from Motion with Dense Consistent Matching. 6404-6414 - Yuto Matsubara, Ko Nishino:
HeatFormer: A Neural Optimizer for Multiview Human Mesh Recovery. 6415-6424 - Ben Kaye, Tomas Jakab, Shangzhe Wu, Christian Rupprecht, Andrea Vedaldi:
DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction. 6425-6435 - Tuo Cao, Fei Luo, Jiongming Qin, Yu Jiang, Yusen Wang, Chunxia Xiao:
iG-6DoF: Model-free 6DoF Pose Estimation for Unseen Object via Iterative 3D Gaussian Splatting. 6436-6446 - Jaeguk Kim, Jaewoo Park, Keuntek Lee, Nam Ik Cho:
RefPose: Leveraging Reference Geometric Correspondences for Accurate 6D Pose Estimation of Unseen Objects. 6447-6456 - Mengya Liu, Siyuan Li, Ajad Chhatkuli, Prune Truong, Luc Van Gool, Federico Tombari:
One2Any: One-Reference 6D Pose Estimation for Any Object. 6457-6467 - Leonhard Sommer, Olaf Dünkel, Christian Theobalt, Adam Kortylewski:
Common3D: Self-Supervised Learning of 3D Morphable Models for Common Objects in Neural Feature Space. 6468-6479 - Burak Bekci, Nassir Navab, Federico Tombari, Mahdi Saleh:
ESCAPE: Equivariant Shape Completion via Anchor Point Encoding. 6480-6489 - Jiayang Ao, Yanbei Jiang, Qiuhong Ke, Krista A. Ehinger:
Open-World Amodal Appearance Completion. 6490-6499 - Chuanyu Sun, Jiqing Zhang, Yang Wang, Huilin Ge, Qianchen Xia, Baocai Yin, Xin Yang:
Exploring Historical Information for RGBE Visual Tracking with Mamba. 6500-6509 - Albert W. Reed, Connor Hashemi, Dennis Melamed, Nitesh Menon, Keigo Hirakawa, Scott McCloskey:
EBS-EKF: Accurate and High Frequency Event-based Star Tracking. 6510-6519 - Fanqi Pu, Yifan Wang, Jiru Deng, Wenming Yang:
MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors. 6520-6530 - Rishubh Parihar, Srinjay Sarkar, Sarthak Vora, Jogendra Nath Kundu, R. Venkatesh Babu:
MonoPlace3D: Learning 3D-Aware Object Placement for 3D Monocular Detection. 6531-6541 - Lu Sang, Zehranaz Canfes, Dongliang Cao, Riccardo Marin, Florian Bernard, Daniel Cremers:
4Deform: Neural Surface Deformation for Robust Shape Interpolation. 6542-6551 - Amine Ouasfi, Shubhendu Jena, Éric Marchand, Adnane Boukhayma:
Toward Robust Neural Reconstruction from Sparse Point Sets. 6552-6562 - Qirui Huang, Runze Zhang, Kangjun Liu, Minglun Gong, Hao Zhang, Hui Huang:
ArcPro: Architectural Programs for Structured 3D Abstraction of Sparse Points. 6563-6572 - Johan Edstedt, André Mateus, Alberto Jaenal:
ColabSfM: Collaborative Structure-from-Motion by Point Cloud Registration. 6573-6583 - Xu Han, Yuan Tang, Jinfeng Xu, Xianzhi Li:
MoST: Efficient Monarch Sparse Tuning for 3D Representation Learning. 6584-6594 - Liyan Chen, Gregory P. Meyer, Zaiwei Zhang, Eric M. Wolff, Paul Vernaza:
Flash3D: Super-scaling Point Transformers through Joint Hardware-Geometry Locality. 6595-6604 - Song Wang, Xiaolu Liu, Lingdong Kong, Jianyun Xu, Chunyong Hu, Gongfan Fang, Wentong Li, Jianke Zhu, Xinchao Wang:
PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning. 6605-6615 - Feifei Shao, Ping Liu, Zhao Wang, Yawei Luo, Hongwei Wang, Jun Xiao:
MICAS: Multi-grained In-Context Adaptive Sampling for 3D Point Cloud Processing. 6616-6626 - Chuanfu Shen, Rui Wang, Lixin Duan, Shiqi Yu:
LidarGait++: Learning Local Features and Size Awareness from LiDAR Point Clouds for 3D Gait Recognition. 6627-6636 - Yanlong Xu, Haoxuan Qu, Jun Liu, Wenxiao Zhang, Xun Yang:
CMMLoc: Advancing Text-to-PointCloud Localization with Cauchy-Mixture-Model Based Framework. 6637-6647 - Ethan Griffiths, Maryam Haghighat, Simon Denman, Clinton Fookes, Milad Ramezani:
HOTFormerLoc: Hierarchical Octree Transformer for Versatile Lidar Place Recognition Across Ground and Aerial Views. 6648-6658 - Yanqing Shen, Turcan Tuna, Marco Hutter, César Cadena, Nanning Zheng:
ForestLPR: LiDAR Place Recognition in Forests Attentioning Multiple BEV Density Images. 6659-6669 - Barza Nisar, Steven L. Waslander:
PSA-SSL: Pose and Size-aware Self-Supervised Learning on LiDAR Point Clouds. 6670-6679 - Wen Li, Chen Liu, Shangshu Yu, Dunqiang Liu, Yin Zhou, Siqi Shen, Chenglu Wen, Cheng Wang:
LightLoc: Learning Outdoor LiDAR Localization at Light Speed. 6680-6689 - Junsung Park, Hwijeong Lee, Inha Kang, Hyunjung Shim:
No Thing, Nothing: Highlighting Safety-Critical Classes for Robust LiDAR Semantic Segmentation in Adverse Weather. 6690-6699 - Van-Tin Luu, Yon-Lin Cai, Vu-Hoang Tran, Wei-Chen Chiu, Yi-Ting Chen, Ching-Chun Huang:
RC-AutoCalib: An End-to-End Radar-Camera Automatic Calibration Network. 6700-6709 - Ting Li, Mao Ye, Tianwen Wu, Nianxin Li, Shuaifeng Li, Song Tang, Luping Ji:
Pseudo Visible Feature Fine-Grained Fusion for Thermal Object Detection. 6710-6719 - Konyul Park, Yecheol Kim, Daehun Kim, Jun Won Choi:
Resilient Sensor Fusion Under Adverse Sensor Failures via Multi-Modal Expert Fusion. 6720-6729 - Chaocan Xue, Bineng Zhong, Qihua Liang, Yaozong Zheng, Ning Li, Yuanliang Xue, Shuxiang Song:
Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking. 6730-6740 - Vladimir Yugay, Theo Gevers, Martin R. Oswald:
MAGiC-SLAM: Multi-Agent Gaussian Globally Consistent SLAM. 6741-6750 - Zaipeng Duan, Chenxu Dang, Xuzhong Hu, Pei An, Junfeng Ding, Jie Zhan, YunBiao Xu, Jie Ma:
SDGOCC: Semantic and Depth-Guided Bird's-Eye View Transformation for 3D Multimodal Occupancy Prediction. 6751-6760 - Ziyue Zhu, Shenlong Wang, Jin Xie, Jiang-jiang Liu, Jingdong Wang, Jian Yang:
VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction. 6761-6771 - Sicheng Zuo, Wenzhao Zheng, Yuanhui Huang, Jie Zhou, Jiwen Lu:
GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction. 6772-6781 - Chensheng Peng, Chengwei Zhang, Yixiao Wang, Chenfeng Xu, Yichen Xie, Wenzhao Zheng, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan:
DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes. 6782-6791 - Runjian Chen, Wenqi Shao, Bo Zhang, Shaoshuai Shi, Li Jiang, Ping Luo:
JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data. 6792-6801 - Yifan Chang, Junjie Huang, Xiaofeng Wang, Yun Ye, Zhujin Liang, Yi Shan, Dalong Du, Xingang Wang:
Rethinking Lanes and Points in Complex Scenarios for Monocular 3D Lane Detection. 6802-6811 - Zehao Zhu, Yuliang Zou, Chiyu Max Jiang, Bo Sun, Vincent Casser, Xiukun Huang, Jiahao Wang, Zhenpei Yang, Ruiqi Gao, Leonidas J. Guibas, Mingxing Tan, Dragomir Anguelov:
SceneCrafter: Controllable Multi-View Driving Scene Editing. 6812-6822 - Xinyuan Chang, Maixuan Xue, Xinran Liu, Zheng Pan, Xing Wei:
Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map. 6823-6833 - Junhao Xu, Yanan Zhang, Zhi Cai, Di Huang:
CoSDH: Communication-Efficient Collaborative Perception via Supply-Demand Awareness and Intermediate-Late Hybridization. 6834-6843 - Yanhao Wu, Haoyang Zhang, Tianwei Lin, Lichao Huang, Shujie Luo, Rui Wu, Congpei Qiu, Wei Ke, Tong Zhang:
Generating Multimodal Driving Scenes via Next-Scene Prediction. 6844-6853 - Bozhou Zhang, Nan Song, Xin Jin, Li Zhang:
Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning. 6854-6863 - Wenzhuo Liu, Wenshuo Wang, Yicheng Qiao, Qiannan Guo, Jiayin Zhu, Pengfei Li, Zilong Chen, Huiming Yang, Zhiwei Li, Lening Wang, Tiao Tan, Huaping Liu:
MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception. 6864-6874 - Xinhao Liu, Jintong Li, Yicheng Jiang, Niranjan Sujay, Zhicheng Yang, Juexiao Zhang, John Abanes, Jing Zhang, Chen Feng:
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos. 6875-6885 - Mohamed Aghzal, Xiang Yue, Erion Plaku, Ziyu Yao:
Evaluating Vision-Language Models as Evaluators in Path Planning. 6886-6897 - Sheng Fan, Rui Liu, Wenguan Wang, Yi Yang:
Scene Map-based Prompt Tuning for Navigation Instruction Generation. 6898-6908 - Manon Dampfhoffer, Thomas Mesquida, Damien Joubert, Thomas Dalgaty, Pascal Vivet, Christoph Posch:
Graph Neural Network Combining Event Stream and Periodic Aggregation for Low-Latency Event-based Vision. 6909-6918 - Enshen Zhou, Qi Su, Cheng Chi, Zhizheng Zhang, Zhongyuan Wang, Tiejun Huang, Lu Sheng, He Wang:
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection. 6919-6929 - Raktim Gautam Goswami, Prashanth Krishnamurthy, Yann LeCun, Farshad Khorrami:
RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training. 6930-6939 - Weijie Zhou, Manli Tao, Chaoyang Zhao, Haiyun Guo, Honghui Dong, Ming Tang, Jinqiao Wang:
PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability. 6940-6949 - Ruihai Wu, Ziyu Zhu, Yuran Wang, Yue Chen, Jiarui Wang, Hao Dong:
GarmentPile: Point-Level Visual Affordance Guided Retrieval and Adaptation for Cluttered Garments Manipulation. 6950-6959 - Jiange Yang, Haoyi Zhu, Yating Wang, Gangshan Wu, Tong He, Limin Wang:
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning. 6960-6970 - Shijie Wu, Yihang Zhu, Yunao Huang, Kaizhen Zhu, Jiayuan Gu, Jingyi Yu, Ye Shi, Jingya Wang:
AffordDP: Generalizable Diffusion Policy with Transferable Affordance. 6971-6980 - Wenke Xia, Ruoxuan Feng, Dong Wang, Di Hu:
Phoenix: A Motion-based Self-Reflection Framework for Fine-grained Robotic Action Correction. 6981-6990 - Kailin Li, Puhao Li, Tengyu Liu, Yuyang Li, Siyuan Huang:
ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning. 6991-7003 - Mingju Gao, Yike Pan, Huan-ang Gao, Zongzheng Zhang, Wenyi Li, Hao Dong, Hao Tang, Li Yi, Hao Zhao:
PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model. 7004-7014 - Jinlu Zhang, Yixin Chen, Zan Wang, Jie Yang, Yizhou Wang, Siyuan Huang:
InteractAnything: Zero-shot Human Object Interaction Synthesis via LLM Feedback and Object Affordance Parsing. 7015-7025 - Aditya Prakash, Benjamin Lundell, Dmitry Andreychuk, David Forsyth, Saurabh Gupta, Harpreet Sawhney:
How Do I Do That? Synthesizing 3D Hand Motion and Contacts for Everyday Interactions. 7026-7036 - Yumeng Liu, Xiaoxiao Long, Zemin Yang, Yuan Liu, Marc Habermann, Christian Theobalt, Yuexin Ma, Wenping Wang:
EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild. 7037-7047 - Sirui Xu, Dongting Li, Yucheng Zhang, Xiyan Xu, Qi Long, Ziyin Wang, Yunzhi Lu, Shuchang Dong, Hezi Jiang, Akshat Gupta, Yu-Xiong Wang, Liang-Yan Gui:
InterAct: Advancing Large-Scale Versatile 3D Human-Object Interaction Generation. 7048-7060 - Prithviraj Banerjee, Sindi Shkodrani, Pierre Moulon, Shreyas Hampali, Shangchen Han, Fan Zhang, Linguang Zhang, Jade Fountain, Edward Miller, Selen Basol, Richard A. Newcombe, Robert Wang, Jakob Julian Engel, Tomas Hodan:
HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos. 7061-7071 - Brent Yi, Vickie Ye, Maya Zheng, Yunqi Li, Lea Müller, Georgios Pavlakos, Yi Ma, Jitendra Malik, Angjoo Kanazawa:
Estimating Body and Hand Motion in an Ego-sensed World. 7072-7084 - Huakun Liu, Hiroki Ota, Xin Wei, Yutaro Hirao, Monica Perusquía-Hernández, Hideaki Uchiyama, Kiyoshi Kiyokawa:
UMotion: Uncertainty-driven Human Motion Estimation from Inertial and Ultra-wideband Units. 7085-7094 - Jihyun Lee, Weipeng Xu, Alexander Richard, Shih-En Wei, Shunsuke Saito, Shaojie Bai, Te-Li Wang, Minhyuk Sung, Tae-Kyun Kim, Jason M. Saragih:
REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning. 7095-7104 - Xiaoning Sun, Dong Wei, Huaijiang Sun, Shengxiang Hu:
LAL: Enhancing 3D Human Motion Prediction with Latency-aware Auxiliary Learning. 7105-7114 - Daniel Etaat, Dvij Kalaria, Nima Rahmanian, S. Shankar Sastry:
LATTE-MV: Learning to Anticipate Table Tennis Hits from Monocular Videos. 7115-7124 - Sanjay Subramanian, Evonne Ng, Lea Müller, Dan Klein, Shiry Ginosar, Trevor Darrell:
Pose Priors from Language Models. 7125-7135 - Mingzhen Huang, Fu-Jen Chu, Bugra Tekin, Kevin J. Liang, Haoyu Ma, Weiyao Wang, Xingyu Chen, Pierre Gleize, Hongfei Xue, Siwei Lyu, Kris Kitani, Matt Feiszli, Hao Tang:
HOIGPT: Learning Long-Sequence Hand-Object Interaction with Language Models. 7136-7146 - Yuan Wang, Yali Li, Xiang Li, Shengjin Wang:
HSI-GPT: A General-Purpose Large Scene-Motion-Language Model for Human Scene Interaction. 7147-7157 - Seokhyeon Hong, Chaelin Kim, Serin Yoon, Junghyun Nam, Sihun Cha, Junyong Noh:
SALAD: Skeleton-aware Latent Diffusion for Text-driven Motion Generation and Editing. 7158-7168 - Yabiao Wang, Shuo Wang, Jiangning Zhang, Ke Fan, Jiafu Wu, Zhucun Xue, Yong Liu:
TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation. 7169-7178 - Qihang Fang, Chengcheng Tang, Bugra Tekin, Shugao Ma, Yanchao Yang:
HuMoCon: Concept Discovery for Human Motion Understanding. 7179-7190 - Jiayi Gao, Zijin Yin, Changcheng Hua, Yuxin Peng, Kongming Liang, Zhanyu Ma, Jun Guo, Yang Liu:
ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer. 7191-7200 - Mingzhe Guo, Weiping Tan, Wenyu Ran, Liping Jing, Zhipeng Zhang:
DreamTrack: Dreaming the Future for Multimodal Visual Object Tracking. 7201-7210 - Seokju Cho, Jiahui Huang, Seungryong Kim, Joon-Young Lee:
Seurat: From Moving Points to Depth. 7211-7221 - Jiaqi Li, Yiran Wang, Jinghong Zheng, Junrui Zhang, Liao Shen, Tianqi Liu, Zhiguo Cao:
CH3Depth: Efficient and Flexible Depth Foundation Model with Flow Matching. 7222-7232 - Bingxin Ke, Dominik Narnhofer, Shengyu Huang, Lei Ke, Torben Peters, Katerina Fragkiadaki, Anton Obukhov, Konrad Schindler:
Video Depth without Video Models. 7233-7243 - Wonyong Seo, Jihyong Oh, Munchurl Kim:
BiM-VFI: Bidirectional Motion Field-Guided Frame Interpolation for Video with Non-uniform Motions. 7244-7253 - Shiyi Liang, Yifan Bai, Yihong Gong, Xing Wei:
Autoregressive Sequential Pretraining for Visual Tracking. 7254-7264 - Yuyang Huang, Yabo Chen, Li Ding, Xiaopeng Zhang, Wenrui Dai, Junni Zou, Hongkai Xiong, Qi Tian:
IM-Zero: Instance-level Motion Controllable Video Generation in a Zero-shot Manner. 7265-7275 - Hyeonho Jeong, Chun-Hao P. Huang, Jong Chul Ye, Niloy J. Mitra, Duygu Ceylan:
Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation. 7276-7287 - Xin Ma, Yaohui Wang, Gengyun Jia, Xinyuan Chen, Tien-Tsin Wong, Yuan-Fang Li, Cunjian Chen:
Consistent and Controllable Image Animation with Motion Diffusion Models. 7288-7298 - Peiqing Yang, Shangchen Zhou, Jixin Zhao, Qingyi Tao, Chen Change Loy:
MatAnyone: Stable Video Matting with Consistent Memory Propagation. 7299-7308 - Zhongrui Yu, Martina Megaro-Boldini, Robert W. Sumner, Abdelaziz Djelouah:
Unboxed: Geometrically and Temporally Consistent Video Outpainting. 7309-7319 - Zhaoyi Tian, Feifeng Wang, Shiwei Wang, Zihao Zhou, Yao Zhu, Liquan Shen:
High Dynamic Range Video Compression: A Large-Scale Benchmark Dataset and A Learned Bit-depth Scalable Compression Algorithm. 7320-7330 - Wei Jiang, Junru Li, Kai Zhang, Li Zhang:
ECVC: Exploiting Non-Local Correlations in Multiple Frames for Contextual Video Compression. 7331-7341 - Gang He, Weiran Wang, Guancheng Quan, Shihao Wang, Dajiang Zhou, Yunsong Li:
RivuletMLP: An MLP-based Architecture for Efficient Compressed Video Quality Enhancement. 7342-7352 - Feng Liu, Shiwei Zhang, Xiaofeng Wang, Yujie Wei, Haonan Qiu, Yuzhong Zhao, Yingya Zhang, Qixiang Ye, Fang Wan:
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model. 7353-7363 - Mingzhen Sun, Weining Wang, Gen Li, Jiawei Liu, Jiahui Sun, Wanquan Feng, Shanshan Lao, SiYu Zhou, Qian He, Jing Liu:
AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion. 7364-7373 - Deyu Zhou, Quan Sun, Yuang Peng, Kun Yan, Runpei Dong, Duomin Wang, Zheng Ge, Nan Duan, Xiangyu Zhang:
Taming Teacher Forcing for Masked Autoregressive Video Generation. 7374-7384 - Shijun Shi, Jing Xu, Lijing Lu, Zhihang Li, Kai Hu:
Self-supervised ControlNet with Spatio-Temporal Mamba for Real-world Video Super-resolution. 7385-7395 - Zonghui Guo, Yingjie Liu, Jie Zhang, Haiyong Zheng, Shiguang Shan:
Face Forgery Video Detection via Temporal Forgery Cue Unraveling. 7396-7405 - Zhiyao Wang, Xu Chen, Chengming Xu, Junwei Zhu, Xiaobin Hu, Jiangning Zhang, Chengjie Wang, Yuqi Liu, Yiyi Zhou, Rongrong Ji:
SVFR: A Unified Framework for Generalized Video Face Restoration. 7406-7415 - Xin Zhang, Xue Yang, Yuxuan Li, Jian Yang, Ming-Ming Cheng, Xiang Li:
RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark. 7416-7426 - Minh Kha Do, Kang Han, Phu Lai, Khoa T. Phan, Wei Xiang:
RobSense: A Robust Multi-modal Foundation Model for Remote Sensing with Static, Temporal, and Incomplete Data Adaptability. 7427-7436 - Yuanye Liu, Jinyang Liu, Renwei Dian, Shutao Li:
A Selective Re-learning Mechanism for Hyperspectral Fusion Imaging. 7437-7446 - Jie Huang, Haorui Chen, Jiaxuan Ren, Siran Peng, Liangjian Deng:
A General Adaptive Dual-level Weighting Mechanism for Remote Sensing Pansharpening. 7447-7456 - Haowen Bai, Jiangshe Zhang, Zixiang Zhao, Yichen Wu, Lilun Deng, Yukun Cui, Tao Feng, Shuang Xu:
Task-driven Image Fusion with Learnable Fusion Loss. 7457-7468 - Guanglu Dong, Tianheng Zheng, Yuanzhouhan Cao, Linbo Qing, Chao Ren:
Channel Consistency Prior and Self-Reconstruction Strategy Based Unsupervised Image Deraining. 7469-7479 - Gehui Li, Bin Chen, Chen Zhao, Lei Zhang, Jian Zhang:
OSMamba: Omnidirectional Spectral Mamba with Dual-Domain Prior Generator for Exposure Correction. 7480-7490 - Boyun Li, Haiyu Zhao, Wenxin Wang, Peng Hu, Yuanbiao Gou, Xi Peng:
MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration. 7491-7501 - Yuhui Quan, Tianxiang Zheng, Zhiyuan Ma, Hui Ji:
Zero-Shot Blind-spot Image Denoising via Implicit Neural Sampling. 7502-7512 - Hang Xu, Jie Huang, Wei Yu, Jiangtong Tan, Zhen Zou, Feng Zhao:
Adaptive Dropout: Unleashing Dropout across Layers for Generalizable Image Super-Resolution. 7513-7523 - Haijin Zeng, Xiangming Wang, Yongyong Chen, Jingyong Su, Jie Liu:
Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks. 7524-7533 - Xingyuan Li, Zirui Wang, Yang Zou, Zhixin Chen, Jun Ma, Zhiying Jiang, Long Ma, Jinyuan Liu:
DifIISR: A Diffusion Model with Gradient Guidance for Infrared Image Super-Resolution. 7534-7544 - Haina Qin, Wenyang Luo, Libin Wang, Dandan Zheng, Jingdong Chen, Ming Yang, Bing Li, Weiming Hu:
Reversing Flow for Image Restoration. 7545-7558 - Siyang Wang, Naishan Zheng, Jie Huang, Feng Zhao:
Navigating Image Restoration with VAR's Distribution Alignment Prior. 7559-7569 - Chunyi Li, Yuan Tian, Xiaoyue Ling, Zicheng Zhang, Haodong Duan, Haoning Wu, Ziheng Jia, Xiaohong Liu, Xiongkuo Min, Guo Lu, Weisi Lin, Guangtao Zhai:
Image Quality Assessment: From Human to Machine Preference. 7570-7581 - Sidi Yang, Binxiao Huang, Yulun Zhang, Dahai Yu, Yujiu Yang, Ngai Wong:
DnLUT: Ultra-Efficient Color Image Denoising via Channel-Aware Lookup Tables. 7582-7591 - Jiamin Xu, Yuxin Zheng, Zelong Li, Chi Wang, Renshu Gu, Weiwei Xu, Gang Xu:
Detail-Preserving Latent Diffusion for Stable Shadow Removal. 7592-7602 - Haonan Zhao, Qingyang Liu, Xinhao Tao, Li Niu, Guangtao Zhai:
Shadow Generation Using Diffusion Model with Geometry Prior. 7603-7612 - Liangbin Xie, Daniil Pakhomov, Zhonghao Wang, Zongze Wu, Ziyan Chen, Yuqian Zhou, Haitian Zheng, Zhifei Zhang, Zhe Lin, Jiantao Zhou, Chao Dong:
TurboFill: Adapting Few-step Text-to-image Model for Fast Image Inpainting. 7613-7622 - Donghui Feng, Zhengxue Cheng, Shen Wang, Ronghua Wu, Hongwei Hu, Guo Lu, Li Song:
Linear Attention Modeling for Learned Image Compression. 7623-7632 - Hao Xu, Xiaolin Wu, Xi Zhang:
Multirate Neural Image Compression with Adaptive Lattice Vector Quantization. 7633-7642 - Jinrui Yang, Qing Liu, Yijun Li, Soo Ye Kim, Daniil Pakhomov, Mengwei Ren, Jianming Zhang, Zhe Lin, Cihang Xie, Yuyin Zhou:
Generative Image Layer Decomposition with Visual Effects. 7643-7653 - Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou, Yi-Zhe Song:
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training. 7654-7663 - Lianghui Zhu, Zilong Huang, Bencheng Liao, Jun Hao Liew, Hanshu Yan, Jiashi Feng, Xinggang Wang:
DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention. 7664-7674 - Lexington Allen Whalen, Zhenbang Du, Haoran You, Chaojian Li, Sixu Li, Yingyan Lin:
Early-Bird Diffusion: Investigating and Leveraging Timestep-Aware Early-Bird Tickets in Diffusion Models for Efficient Training. 7675-7684 - Yousef Yeganeh, Azade Farshad, Ioannis Charisiadis, Marta Hasny, Martin Hartenberger, Björn Ommer, Nassir Navab, Ehsan Adeli:
Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis. 7685-7695 - Jian Wang, Xin Lan, Jizhe Zhou, Yuxin Tian, Jiancheng Lv:
Style Quantization for Data-Efficient GAN Training. 7696-7706 - Yu Cao, Zengqun Zhao, Ioannis Patras, Shaogang Gong:
Temporal Score Analysis for Understanding and Correcting Diffusion Artifacts. 7707-7716 - Hoigi Seo, Wongi Jeong, Kyungryeol Lee, Se Young Chun:
Efficient Personalization of Quantized Diffusion Model without Backpropagation. 7717-7727 - Zhenyu Wang, Jianxi Huang, Zhida Sun, Yuanhao Gong, Daniel Cohen-Or, Min Lu:
Layered Image Vectorization via Semantic Simplification. 7728-7738 - Yiyang Ma, Xingchao Liu, Xiaokang Chen, Wen Liu, Chengyue Wu, Zhiyu Wu, Zizheng Pan, Zhenda Xie, Haowei Zhang, Xingkai Yu, Liang Zhao, Yisong Wang, Jiaying Liu, Chong Ruan:
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation. 7739-7751 - Hui Li, Mingwang Xu, Yun Zhan, Shan Mu, Jiaye Li, Kaihui Cheng, Yuxuan Chen, Tan Chen, Mao Ye, Jingdong Wang, Siyu Zhu:
OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation. 7752-7762 - Minghong Cai, Xiaodong Cun, Xiaoyu Li, Wenze Liu, Zhaoyang Zhang, Yong Zhang, Ying Shan, Xiangyu Yue:
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation. 7763-7772 - Shijie Wang, Samaneh Azadi, Rohit Girdhar, Saketh Rambhatla, Chen Sun, Xi Yin:
MotiF: Making Text Count in Image Animation with Motion Focal Loss. 7773-7783 - Zhengbo Zhang, Yuxi Zhou, Duo Peng, Joo-Hwee Lim, Zhigang Tu, De Wen Soh, Lin Geng Foo:
Visual Prompting for One-shot Controllable Video Editing without Inversion. 7784-7794 - Or Madar, Ohad Fried:
Tiled Diffusion. 7795-7804 - Lingjun Mao, Zineng Tang, Alane Suhr:
Evaluating Model Perception of Color Illusions in Photorealistic Scenes. 7805-7814 - Fatemeh Behrad, Tinne Tuytelaars, Johan Wagemans:
Charm: The Missing Piece in ViT Fine-Tuning for Image Aesthetic Assessment. 7815-7824 - Jamie Wynn, Zawar Qureshi, Jakub Powierza, Jamie Watson, Mohamed Sayed:
Morpheus: Text-Driven 3D Gaussian Splat Shape and Color Stylization. 7825-7836 - Hyelin Nam, Jaemin Kim, Dohun Lee, Jong Chul Ye:
Optical-Flow Guided Prompt Optimization for Coherent Video Generation. 7837-7846 - Ye Wang, Ruiqi Liu, Jiang Lin, Fei Liu, Zili Yi, Yilin Wang, Rui Ma:
OmniStyle: Filtering High Quality Style Transfer Data at Scale. 7847-7856 - Noam Rotstein, Gal Yona, Daniel Silver, Roy Velich, David Bensaïd, Ron Kimmel:
Pathways on the Image Manifold: Image Editing via Video Generation. 7857-7866 - Ziqi Cai, Shuchen Weng, Yifei Xia, Boxin Shi:
PhyS-EdiT: Physics-aware Semantic Image Editing with Text Description. 7867-7876 - Omri Avrahami, Or Patashnik, Ohad Fried, Egor Nemchinov, Kfir Aberman, Dani Lischinski, Daniel Cohen-Or:
Stable Flow: Vital Layers for Training-Free Image Editing. 7877-7888 - Daneul Kim, Jaeah Lee, Jaesik Park:
Improving Editability in Image Generation with Layer-wise Memory. 7889-7898 - Jiteng Mu, Nuno Vasconcelos, Xiaolong Wang:
EditAR: Unified Conditional Generation with Autoregressive Models. 7899-7909 - Vittorio Pippi, Fabio Quattrini, Silvia Cascianelli, Alessio Tonioni, Rita Cucchiara:
Zero-Shot Styled Text Image Generation, but Make It Autoregressive. 7910-7919 - Yu Yuan, Xijun Wang, Yichen Sheng, Prateek Chennuri, Xingguang Zhang, Stanley Chan:
Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis. 7920-7930 - Sean J. Liu, Nupur Kumari, Ariel Shamir, Jun-Yan Zhu:
Generative Photomontage. 7931-7941 - Han Yang, Chuanguang Yang, Qiuli Wang, Zhulin An, Weilun Feng, Libo Huang, Yongjun Xu:
Multi-party Collaborative Attention Control for Image Customization. 7942-7951 - Yifan Pu, Yiming Zhao, Zhicong Tang, Ruihong Yin, Haoxing Ye, Yuhui Yuan, Dong Chen, Jianmin Bao, Sirui Zhang, Yanbin Wang, Lin Liang, Lijuan Wang, Ji Li, Xiu Li, Zhouhui Lian, Gao Huang, Baining Guo:
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation. 7952-7962 - Lunhao Duan, Shanshan Zhao, Wenjun Yan, Yinglun Li, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Mingming Gong, Gui-Song Xia:
UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation. 7963-7973 - Jian Yang, Dacheng Yin, Yizhou Zhou, Fengyun Rao, Wei Zhai, Yang Cao, Zheng-Jun Zha:
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling. 7974-7985 - Chaehun Shin, Jooyoung Choi, Heeseung Kim, Sungroh Yoon:
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator. 7986-7996 - Jierun Chen, Dongting Hu, Xijie Huang, Huseyin Coskun, Arpit Sahni, Aarush Gupta, Anujraaj Goyal, Dishani Lahiri, Rajesh Singh, Yerlan Idelbayev, Junli Cao, Yanyu Li, Kwang-Ting Cheng, S.-H. Gary Chan, Mingming Gong, Sergey Tulyakov, Anil Kag, Yanwu Xu, Jian Ren:
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training. 7997-8008 - Runtao Liu, Haoyu Wu, Ziqiang Zheng, Chen Wei, Yingqing He, Renjie Pi, Qifeng Chen:
VideoDPO: Omni-Preference Alignment for Video Diffusion Generation. 8009-8019 - Meihua Dang, Anikait Singh, Linqi Zhou, Stefano Ermon, Jiaming Song:
Personalized Preference Fine-tuning of Diffusion Models. 8020-8030 - Jeeyung Kim, Erfan Esmaeili, Qiang Qiu:
Text Embedding is Not All You Need: Attention Control for Text-to-Image Semantic Alignment with Text Self-Attention Maps. 8031-8040 - SeungJu Cha, Kwanyoung Lee, Ye-Chan Kim, Hyunwoo Oh, Dong-Jin Kim:
VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness. 8041-8050 - Shuailei Ma, Kecheng Zheng, Ying Wei, Wei Wu, Fan Lu, Yifei Zhang, Chen-Wei Xie, Biao Gong, Jiapeng Zhu, Yujun Shen:
Learning Visual Generative Priors without Text. 8051-8061 - Gianni Franchi, Nacim Belkhir, Dat Nguyen Trong, Guoxuan Xia, Andrea Pilzer:
Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation. 8062-8072 - Wei Chen, Lin Li, Yongqi Yang, Bin Wen, Fan Yang, Tingting Gao, Yu Wu, Long Chen:
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation. 8073-8082 - Yifan Gao, Zihang Lin, Chuanbin Liu, Min Zhou, Tiezheng Ge, Bo Zheng, Hongtao Xie:
PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering. 8083-8093 - Gemma Canet Tarrés, Zhe Lin, Zhifei Zhang, He Zhang, Andrew Gilbert, John P. Collomosse, Soo Ye Kim:
Multitwine: Multi-Object Compositing with Text and Layout Control. 8094-8104 - Sankalp Sinha, Mohammad Sadil Khan, Muhammad Usama, Shino Sam, Didier Stricker, Sk Aziz Ali, Muhammad Zeshan Afzal:
MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation. 8105-8116 - HsiaoYuan Hsu, Yuxin Peng:
PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation. 8117-8127 - Jiawei Lin, Shizhao Sun, Danqing Huang, Ting Liu, Ji Li, Jiang Bian:
From Elements to Design: A Layered Approach for Automatic Graphic Design Composition. 8128-8137 - Kiyohiro Nakayama, Jan Ackermann, Timur Levent Kesdogan, Yang Zheng, Maria Korosteleva, Olga Sorkine-Hornung, Leonidas J. Guibas, Guandao Yang, Gordon Wetzstein:
AIpparel: A Multimodal Foundation Model for Digital Garments. 8138-8149 - Jing Lin, Yao Feng, Weiyang Liu, Michael J. Black:
ChatHuman: Chatting about 3D Humans with Tools. 8150-8161 - Akshay R. Kulkarni, Ge Yan, Chung-En Sun, Tuomas P. Oikarinen, Tsui-Wei Weng:
Interpretable Generative Models through Post-hoc Concept Bottlenecks. 8162-8171 - Lingyun Zhang, Yu Xie, Yanwei Fu, Ping Chen:
Concept Replacer: Replacing Sensitive Concepts in Diffusion Models via Precision Localization. 8172-8181 - Chen Chen, Daochang Liu, Mubarak Shah, Chang Xu:
Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models. 8182-8191 - Yingdong Shi, Changming Li, Yifan Wang, Yongxiang Zhao, Anqi Pang, Sibei Yang, Jingyi Yu, Kan Ren:
Dissecting and Mitigating Diffusion Bias via Mechanistic Interpretability. 8192-8202 - Sangwon Jang, June Suk Choi, Jaehyeong Jo, Kimin Lee, Sung Ju Hwang:
Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models. 8203-8212 - Zilan Wang, Junfeng Guo, Jiacheng Zhu, Yiming Li, Heng Huang, Muhao Chen, Zhengzhong Tu:
SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models. 8213-8224 - Gaozhi Liu, Silu Cao, Zhenxing Qian, Xinpeng Zhang, Sheng Li, Wanli Peng:
Watermarking One for All: A Robust Watermarking Scheme Against Partial Image Theft. 8225-8234 - Ali Salar, Qing Liu, Yingli Tian, Guoying Zhao:
Enhancing Facial Privacy Protection via Weakening Diffusion Purification. 8235-8244 - Jeongsoo Park, Andrew Owens:
Community Forensics: Using Thousands of Generators to Train Fake Image Detectors. 8245-8257 - Nan Zhong, Haoyu Chen, Yiran Xu, Zhenxing Qian, Xinpeng Zhang:
Beyond Generation: A Diffusion-based Low-level Feature Extractor for Detecting AI-generated Images. 8258-8268 - Jingwei Zhang, Mohammad Jalali, Cheuk Ting Li, Farzan Farnia:
Unveiling Differences in Generative Models: A Scalable Differential Clustering Approach. 8269-8278 - Dhananjaya Jayasundara, Heng Zhao, Demetrio Labate, Vishal M. Patel:
MIRE: Matched Implicit Neural Representations. 8279-8288 - Rui Wang, Shaocheng Jin, Ziheng Chen, Xiaoqing Luo, Xiao-Jun Wu:
Learning to Normalize on the SPD Manifold under Bures-Wasserstein Geometry. 8289-8298 - Sara A. Al-Emadi, Yin Yang, Ferda Ofli:
Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery. 8299-8309 - Michele Mazzamuto, Antonino Furnari, Yoichi Sato, Giovanni Maria Farinella:
Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities. 8310-8320 - Changchang Sun, Gaowen Liu, Charles Fleming, Yan Yan:
Enhancing Dance-to-Music Generation via Negative Conditioning Latent Diffusion Model. 8321-8330 - Mingfei Chen, Israel D. Gebru, Ishwarya Ananthabhotla, Christian Richardt, Dejan Markovic, Jake Sandakly, Steven Krenn, Todd Keebler, Eli Shlizerman, Alexander Richard:
SoundVista: Novel-View Ambient Sound Synthesis via Visual-Acoustic Binding. 8331-8341 - Sung Jin Um, Dongjin Kim, Sangmin Lee, Jung Uk Kim:
Object-aware Sound Source Localization via Audio-Visual Scene Understanding. 8342-8351 - Shaofei Huang, Rui Ling, Tianrui Hui, Hongyu Li, Xu Zhou, Shifeng Zhang, Si Liu, Richang Hong, Meng Wang:
Revisiting Audio-Visual Segmentation with Vision-Centric Transformer. 8352-8361 - Jinxing Zhou, Dan Guo, Ruohao Guo, Yuxin Mao, Jingjing Hu, Yiran Zhong, Xiaojun Chang, Meng Wang:
Towards Open-Vocabulary Audio-Visual Event Localization. 8362-8371 - Hanlin Wang, Zhan Tong, Kecheng Zheng, Yujun Shen, Limin Wang:
Contextual AD Narration with Interleaved Multimodal Sequence. 8372-8383 - Jiayuan Rao, Haoning Wu, Hao Jiang, Ya Zhang, Yanfeng Wang, Weidi Xie:
Towards Universal Soccer Video Understanding. 8384-8394 - S. P. Sharan, Minkyu Choi, Sahil Shah, Harsh Goel, Mohammad Omama, Sandeep Chinchali:
Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification. 8395-8405 - Kaiyue Sun, Kaiyi Huang, Xian Liu, Yue Wu, Zihan Xu, Zhenguo Li, Xihui Liu:
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation. 8406-8416 - Kangyi Wu, Pengna Li, Jingwen Fu, Yizhe Li, Yang Wu, Yuhan Liu, Jinjun Wang, Sanping Zhou:
Event-Equalized Dense Video Captioning. 8417-8427 - Qiuheng Wang, Yukai Shi, Jiarong Ou, Rui Chen, Ke Lin, Jiahao Wang, Boyuan Jiang, Haotian Yang, Mingwu Zheng, Xin Tao, Fei Yang, Pengfei Wan, Di Zhang:
Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content. 8428-8437 - Fida Mohammad Thoker, Letian Jiang, Chen Zhao, Bernard Ghanem:
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning. 8438-8449 - Wenyi Hong, Yean Cheng, Zhuoyi Yang, Weihan Wang, Lefan Wang, Xiaotao Gu, Shiyu Huang, Yuxiao Dong, Jie Tang:
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models. 8450-8460 - Ziyang Luo, Haoning Wu, Dongxu Li, Jing Ma, Mohan S. Kankanhalli, Junnan Li:
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation. 8461-8474 - Yilun Zhao, Haowei Zhang, Lujing Xie, Tongyan Hu, Guo Gan, Yitao Long, Zhiyuan Hu, Weiyuan Chen, Chuhan Li, Zhijian Xu, Chengye Wang, Ziyao Shangguan, Zhenwen Liang, Yixin Liu, Chen Zhao, Arman Cohan:
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding. 8475-8489 - Yunlong Tang, Junjia Guo, Hang Hua, Susan Liang, Mingqian Feng, Xinyang Li, Rui Mao, Chao Huang, Jing Bi, Zeliang Zhang, Pooyan Fazli, Chenliang Xu:
VidComposition: Can MLLMs Analyze Compositions in Compiled Videos? 8490-8500 - Yujie Lu, Yale Song, William Wang, Lorenzo Torresani, Tushar Nagarajan:
VITED: Video Temporal Evidence Distillation. 8501-8511 - Yudong Han, Qingpei Guo, Liyuan Pan, Liu Liu, Yu Guan, Ming Yang:
DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding. 8512-8522 - Yudi Shi, Shangzhe Di, Qirui Chen, Weidi Xie:
Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation. 8523-8533 - Yuanbin Man, Ying Huang, Chengming Zhang, Bingzhe Li, Wei Niu, Miao Yin:
AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction. 8534-8544 - Shehreen Azad, Vibhav Vineet, Yogesh Singh Rawat:
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding. 8545-8556 - Yue Wu, Zhaobo Qi, Junshu Sun, Yaowei Wang, Qingming Huang, Shuhui Wang:
Video Language Model Pretraining with Spatio-temporal Masking. 8557-8567 - Zhihang Liu, Chen-Wei Xie, Pandeng Li, Liming Zhao, Longxiang Tang, Yun Zheng, Chuanbin Liu, Hongtao Xie:
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models. 8568-8578 - Jinhui Ye, Zihan Wang, Haosen Sun, Keshigeyan Chandrasegaran, Zane Durante, Cristóbal Eyzaguirre, Yonatan Bisk, Juan Carlos Niebles, Ehsan Adeli, Li Fei-Fei, Jiajun Wu, Manling Li:
Re-thinking Temporal Search for Long-Form Video Understanding. 8579-8591 - Hongyu Li, Jinyu Chen, Ziyu Wei, Shaofei Huang, Tianrui Hui, Jialin Gao, Xiaoming Wei, Si Liu:
LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding. 8592-8603 - Yunseok Jang, Yeda Song, Sungryull Sohn, Lajanugen Logeswaran, Tiange Luo, Dong-Ki Kim, Kyunghoon Bae, Honglak Lee:
Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents. 8604-8614 - Xun Jiang, Zhiyi Huang, Xing Xu, Jingkuan Song, Fumin Shen, Heng Tao Shen:
PHGC: Procedural Heterogeneous Graph Completion for Natural Language Task Verification in Egocentric Videos. 8615-8624 - Andong Deng, Tongjia Chen, Shoubin Yu, Taojiannan Yang, Lincoln Spencer, Yapeng Tian, Ajmal Saeed Mian, Mohit Bansal, Chen Chen:
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level. 8625-8636 - Adnen Abdessaied, Anna Rohrbach, Marcus Rohrbach, Andreas Bulling:
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts. 8637-8647 - Rohith Peddi, Saurabh, Ayush Abhay Shrivastava, Parag Singla, Vibhav Gogate:
Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation. 8648-8657 - Lang Lin, Xueyang Yu, Ziqi Pang, Yu-Xiong Wang:
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation. 8658-8667 - Qin Liu, Jianfeng Wang, Zhengyuan Yang, Linjie Li, Kevin Lin, Marc Niethammer, Lijuan Wang:
LiVOS: Light Video Object Segmentation with Gated Linear Matching. 8668-8678 - Muchao Ye, Weiyang Liu, Pan He:
VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models. 8679-8688 - Yuzhi Huang, Chenxin Li, Haitao Zhang, Zixu Lin, Yunlong Lin, Hengyu Liu, Wuyang Li, Xinyu Liu, Jiechao Gao, Yue Huang, Xinghao Ding, Yixuan Yuan:
Track Any Anomalous Object: A Granular Video Anomaly Detection Pipeline. 8689-8699 - Zhanzhong Pang, Fadime Sener, Angela Yao:
Context-Enhanced Memory-Refined Transformer for Online Action Detection. 8700-8710 - Ziyi Liu, Yangcen Liu:
Bridge the Gap: From Weak to Full Supervision for Temporal Action Localization with PseudoFormer. 8711-8720 - Yang Chen, Jingcai Guo, Song Guo, Dacheng Tao:
Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition. 8721-8730 - Xinqi Liu, Li Zhou, Zikun Zhou, Jianqiu Chen, Zhenyu He:
MambaVLT: Time-Evolving Multimodal State Space Model for Vision-Language Tracking. 8731-8741 - Youngjoon Jang, Haran Raajesh, Liliane Momeni, Gül Varol, Andrew Zisserman:
Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues. 8742-8752 - Song Xia, Yi Yu, Wenhan Yang, Meiwen Ding, Zhuo Chen, Ling-Yu Duan, Alex C. Kot, Xudong Jiang:
Theoretical Insights in Model Inversion Robustness and Conditional Entropy Maximization for Collaborative Inference Systems. 8753-8763 - Shuaiwei Yuan, Junyu Dong, Yuezun Li:
Where the Devil Hides: Deepfake Detectors Can No Longer Be Trusted. 8764-8774 - Hossein Kashiani, Niloufar Alipour Talemi, Fatemeh Afghah:
FreqDebias: Towards Generalizable Deepfake Detection via Consistency-Driven Frequency Debiasing. 8775-8785 - Guocheng Qian, Kuan-Chieh Wang, Or Patashnik, Negin Heravi, Daniil Ostashev, Sergey Tulyakov, Daniel Cohen-Or, Kfir Aberman:
Omni-ID: Holistic Identity Representation Designed for Generative Tasks. 8786-8795 - Kang You, Ziling Wei, Jing Yan, Boning Zhang, Qinghai Guo, Yaoyu Zhang, Zhezhi He:
VISTREAM: Improving Computation Efficiency of Visual Streaming Perception via Law-of-Charge-Conservation Inspired Spiking Neural Network. 8796-8805 - Kairong Yu, Chengting Yu, Tianqing Zhang, Xiaochen Zhao, Shu Yang, Hongwei Wang, Qiang Zhang, Qi Xu:
Temporal Separation with Entropy Regularization for Knowledge Distillation in Spiking Neural Networks. 8806-8816 - Shiyang Zhou, Haijin Zeng, Yunfan Lu, Tong Shao, Ke Tang, Yongyong Chen, Jie Liu, Jingyong Su:
Binarized Mamba-Transformer for Lightweight Quad Bayer HybridEVS Demosaicing. 8817-8827 - Yan Jiang, Hao Yu, Xu Cheng, Haoyu Chen, Zhaodong Sun, Guoying Zhao:
From Laboratory to Real World: A New Benchmark Towards Privacy-Preserved Visible-Infrared Person Re-Identification. 8828-8837 - Leiye Liu, Miao Zhang, Jihao Yin, Tingwei Liu, Wei Ji, Yongri Piao, Huchuan Lu:
DefMamba: Deformable Visual State Space Model. 8838-8847 - Woojin Lee, Hyugjae Chang, Jaeho Moon, Jaehyup Lee, Munchurl Kim:
ABBSPO: Adaptive Bounding Box Scaling and Symmetric Prior based Orientation Prediction for Detecting Aerial Image Objects. 8848-8858 - Zhongyu Li, Xin Jin, Bo-Yuan Sun, Chun-Le Guo, Ming-Ming Cheng:
Towards RAW Object Detection in Diverse Conditions. 8859-8868 - Minshan Xie, Jian Lin, Hanyuan Liu, Chengze Li, Tien-Tsin Wong:
Advancing Manga Analysis: Comprehensive Segmentation Annotations for the Manga109 Dataset. 8869-8878 - Qian Deng, Le Hui, Jin Xie, Jian Yang:
Sketchy Bounding-box Supervision for 3D Instance Segmentation. 8879-8888 - Jiahao Lu, Jiacheng Deng:
Relation3D: Enhancing Relation Modeling for Point Cloud Instance Segmentation. 8889-8899 - Shuai Liu, Mingyue Cui, Boyang Li, Quanmin Liang, Tinghe Hong, Yunxiao Shan, Kai Huang:
FSHNet: Fully Sparse Hybrid Network for 3D Object Detection. 8900-8909 - Weijie Wei, Osman Ülger, Fatemeh Karimi Nejadasl, Theo Gevers, Martin R. Oswald:
3D-AVS: LiDAR-based 3D Auto-Vocabulary Segmentation. 8910-8920 - Chenyi Zhang, Ting Liu, Xiaochao Qu, Luoqi Liu, Yao Zhao, Yunchao Wei:
NTClick: Achieving Precise Interactive Segmentation With Noise-tolerant Clicks. 8921-8930 - Cong Wei, Yujie Zhong, Haoxian Tan, Yong Liu, Jie Hu, Dengjie Li, Zheng Zhao, Yujiu Yang:
HyperSeg: Hybrid Segmentation Assistant with Fine-grained Visual Perceiver. 8931-8941 - Chinedu Innocent Nwoye, Kareem Elgohary, Anvita Srinivas, Fauzan Zaid, Joël L. Lavanchy, Nicolas Padoy:
CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools. 8942-8952 - Jiawei Fu, Tiantian Zhang, Kai Chen, Qi Dou:
Hybrid Reciprocal Transformer with Triplet Feature Alignment for Scene Graph Generation. 8953-8963 - Brian Chao, Hung-Yu Tseng, Lorenzo Porzi, Chen Gao, Tuotuo Li, Qinbo Li, Ayush Saraf, Jia-Bin Huang, Johannes Kopf, Gordon Wetzstein, Changil Kim:
Textured Gaussians for Enhanced 3D Scene Appearance Modeling. 8964-8974 - Wei Deng, Mengshi Qi, Huadong Ma:
Global-Local Tree Search in VLMs for 3D Indoor Scene Generation. 8975-8984 - Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys, Daniel Barath, Iro Armeni:
CrossOver: 3D Scene Cross-Modal Alignment. 8985-8994 - Duo Zheng, Shijia Huang, Liwei Wang:
Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding. 8995-9006 - Xuewu Lin, Tianwei Lin, Lichao Huang, Hongyu Xie, Zhizhong Su:
BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence. 9007-9016 - Atharv Mahesh Mane, Dulanga Weerakoon, Vigneshwaran Subbaraju, Sougata Sen, Sanjay E. Sarma, Archan Misra:
Ges3ViG: Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding. 9017-9026 - Yuejiao Su, Yi Wang, Qiongyang Hu, Chuang Yang, Lap-Pui Chau:
ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction. 9027-9038 - Zaijing Li, Yuquan Xie, Rui Shao, Gongwei Chen, Dongmei Jiang, Liqiang Nie:
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy. 9039-9049 - Di Zhang, Jingdi Lei, Junxian Li, Xunzhi Wang, Yujie Liu, Zonglin Yang, Jiatong Li, Weida Wang, Suorong Yang, Jianbo Wu, Peng Ye, Wanli Ouyang, Dongzhan Zhou:
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning. 9050-9061 - Yuhao Dong, Zuyan Liu, Hai-Long Sun, Jingkang Yang, Winston Hu, Yongming Rao, Ziwei Liu:
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models. 9062-9072 - Jae Sung Park, Zixian Ma, Linjie Li, Chenhao Zheng, Cheng-Yu Hsieh, Ximing Lu, Khyathi Raghavi Chandu, Quan Kong, Norimasa Kobori, Ali Farhadi, Yejin Choi, Ranjay Krishna:
Synthetic Visual Genome. 9073-9086 - Yexin Liu, Zhengyang Liang, Yueze Wang, Xianfeng Wu, Feilong Tang, Muyang He, Jian Li, Zheng Liu, Harry Yang, Sernam Lim, Bo Zhao:
Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly. 9087-9097 - Geng Li, Jinglin Xu, Yunzhen Zhao, Yuxin Peng:
DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding. 9098-9108 - Vésteinn Snæbjarnarson, Kevin Du, Niklas Stoehr, Serge J. Belongie, Ryan Cotterell, Nico Lang, Stella Frank:
Taxonomy-Aware Evaluation of Vision-Language Models. 9109-9120 - Kartik Thakral, Tamar Glaser, Tal Hassner, Mayank Vatsa, Richa Singh:
Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models. 9121-9130 - Zhikai Li, Xuewen Liu, Dongrong Joe Fu, Jianquan Li, Qingyi Gu, Kurt Keutzer, Zhen Dong:
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences. 9131-9141 - Xuecheng Wu, Heli Sun, Yifan Wang, Jiayu Nie, Jie Zhang, Yabing Wang, Junxiao Xue, Liang He:
AVF-MAE++: Scaling Affective Video Facial Masked Autoencoders via Efficient Audio-Visual Self-Supervised Learning. 9142-9153 - Xiaoqin Wang, Xusen Ma, Xianxu Hou, Meidan Ding, Yudong Li, Junliang Chen, Wenting Chen, Xiaoyang Peng, Linlin Shen:
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs. 9154-9164 - Qihui Zhang, Munan Ning, Zheyuan Liu, Yue Huang, Shuo Yang, Yanbo Wang, Jiayi Ye, Xiao Chen, Yibing Song, Li Yuan:
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation. 9165-9174 - Silin Cheng, Yang Liu, Xinwei He, Sébastien Ourselin, Lei Tan, Gen Luo:
WeakMCN: Multi-task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation. 9175-9185 - Jiansheng Li, Xingxuan Zhang, Hao Zou, Yige Guo, Renzhe Xu, Yilong Liu, Chuzhao Zhu, Yue He, Peng Cui:
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts. 9186-9198 - Federico Cocchi, Nicholas Moratelli, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara:
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering. 9199-9209 - Daniel Samira, Edan Habler, Yuval Elovici, Asaf Shabtai:
Variance-Based Membership Inference Attacks Against Large-Scale Image Captioning Models. 9210-9219 - Jiayu Jiang, Changxing Ding, Wentao Tan, Junhong Wang, Jin Tao, Xiangmin Xu:
Modeling Thousands of Human Annotators for Generalizable Text-to-Image Person Re-identification. 9220-9230 - Huakai Lai, Guoxin Xiong, Huayu Mai, Xiang Liu, Tianzhu Zhang:
Rethinking Noisy Video-Text Retrieval via Relation-aware Alignment. 9231-9241 - Yiheng Li, Yang Yang, Zichang Tan, Huan Liu, Weihua Chen, Xu Zhou, Zhen Lei:
Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation. 9242-9252 - Davide Talon, Federico Girella, Ziyue Liu, Marco Cristani, Yiming Wang:
Seeing the Abstract: Translating the Abstract Language for Vision Language Models. 9253-9262 - Zengrong Lin, Zheng Wang, Tianwen Qian, Pan Mu, Sixian Chan, Cong Bai:
NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval. 9263-9273 - Xin Zhang, Yanzhao Zhang, Wen Xie, Mingxin Li, Ziqi Dai, Dingkun Long, Pengjun Xie, Meishan Zhang, Wenjie Li, Min Zhang:
Bridging Modalities: Improving Universal Multimodal Retrieval by Multimodal Large Language Models. 9274-9285 - Davide Caffagni, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara:
Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval. 9286-9295 - Qirui Jiao, Daoyuan Chen, Yilun Huang, Bolin Ding, Yaliang Li, Ying Shen:
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models. 9296-9307 - Reza Abbasi, Ali Nazari, Aminreza Sefid, Mohammadali Banayeeanzade, Mohammad Hossein Rohban, Mahdieh Soleymani Baghshah:
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation. 9308-9317 - Yifei Zhang, Chang Liu, Jin Wei, Xiaomeng Yang, Yu Zhou, Can Ma, Xiangyang Ji:
Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition. 9318-9328 - Dongliang Luo, Hanshen Zhu, Ziyang Zhang, Dingkang Liang, Xudong Xie, Yuliang Liu, Xiang Bai:
SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting. 9329-9338 - Seil Kang, Jinyeong Kim, Junhyeok Kim, Seong Jae Hwang:
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding. 9339-9350 - Teng Hu, Jiangning Zhang, Ran Yi, Jieyu Weng, Yabiao Wang, Xianfang Zeng, Zhucun Xue, Lizhuang Ma:
Improving Autoregressive Visual Generation with Cluster-Oriented Token Prediction. 9351-9360 - Hongrui Jia, Chaoya Jiang, Haiyang Xu, Wei Ye, Mengfan Dong, Ming Yan, Ji Zhang, Fei Huang, Shikun Zhang:
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization. 9361-9371 - Mohsen Gholami, Mohammad Akbari, Kevin Cannons, Yong Zhang:
CASP: Compression of Large Multimodal Models Based on Attention Sparsity. 9372-9381 - Hao Yin, Guangzong Si, Zilei Wang:
Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference. 9382-9391 - Saeed Ranjbar Alvar, Gursimran Singh, Mohammad Akbari, Yong Zhang:
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models. 9392-9401 - Longrong Yang, Dong Shen, Chaoxiang Cai, Kaibing Chen, Fan Yang, Tingting Gao, Di Zhang, Xi Li:
Libra-Merging: Importance-redundancy and Pruning-merging Trade-off for Acceleration Plug-in in Large Vision-Language Model. 9402-9412 - Yiyang Du, Xiaochen Wang, Chi Chen, Jiabo Ye, Yiru Wang, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Zhifang Sui, Maosong Sun, Yang Liu:
AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization. 9413-9422 - Zefeng Zhang, Hengzhu Tang, Jiawei Sheng, Zhenyu Zhang, Yiming Ren, Zhenyang Li, Dawei Yin, Duohe Ma, Tingwen Liu:
Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization. 9423-9433 - Mingyang Song, Xiaoye Qu, Jiawei Zhou, Yu Cheng:
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration. 9434-9444 - Yinan Liang, Ziwei Wang, Xiuwei Xu, Jie Zhou, Jiwen Lu:
EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models. 9445-9454 - Qizhou Chen, Chengyu Wang, Dakan Wang, Taolin Zhang, Wangyue Li, Xiaofeng He:
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts. 9455-9466 - Zuopeng Yang, Jiluan Fan, Anli Yan, Erdun Gao, Xin Lin, Tao Li, Kanghua Mo, Changyu Dong:
Distraction is All You Need for Multimodal Large Language Model Jailbreaking. 9467-9476 - Siyuan Liang, Jiawei Liang, Tianyu Pang, Chao Du, Aishan Liu, Mingli Zhu, Xiaochun Cao, Dacheng Tao:
Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift. 9477-9486 - Xin Luo, Xueming Fu, Zihang Jiang, S. Kevin Zhou:
ICP: Immediate Compensation Pruning for Mid-to-high Sparsity. 9487-9496 - Lianyu Wang, Meng Wang, Huazhu Fu, Daoqiang Zhang:
Vision-Language Model IP Protection via Prompt-based Learning. 9497-9506 - Xuan Wang, Xitong Gao, Dongping Liao, Tianrui Qin, Yu-liang Lu, Cheng-Zhong Xu:
A3: Few-shot Prompt Learning of Unlearnable Examples with Cross-Modal Adversarial Feature Alignment. 9507-9516 - Zequn Zeng, Yudi Su, Jianqiao Sun, Tiansheng Wen, Hao Zhang, Zhengjue Wang, Bo Chen, Hongwei Liu, Jiawei Ma:
Explaining Domain Shifts in Language: Concept Erasing for Interpretable Image Classification. 9517-9526 - Xueqing Wu, Yuheng Ding, Bingxuan Li, Pan Lu, Da Yin, Kai-Wei Chang, Nanyun Peng:
VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning. 9527-9537 - Qiyuan Dai, Sibei Yang:
Free on the Fly: Enhancing Flexibility in Test-Time Adaptation with Online EM. 9538-9548 - Mingjin Zhang, Xiaolong Li, Fei Gao, Jie Guo, Xinbo Gao, Jing Zhang:
SAIST: Segment Any Infrared Small Target Model Guided by Contrastive Language-Image Pretraining. 9549-9558 - Changsong Wen, Zelin Peng, Yu Huang, Xiaokang Yang, Wei Shen:
Domain Generalization in CLIP via Learning with Diverse Text Prompts. 9559-9569 - Junyi Wu, Yan Huang, Min Gao, Yuzhen Niu, Yuzhong Chen, Qiang Wu:
Enhanced Visual-Semantic Interaction with Tailored Prompts for Pedestrian Attribute Recognition. 9570-9579 - Zilin Xiao, Pavel Suma, Ayush Sachdeva, Hao-Jen Wang, Giorgos Kordopatis-Zilos, Giorgos Tolias, Vicente Ordonez:
LOCORE: Image Re-ranking with Long-Context Sequence Modeling. 9580-9590 - Jie Wang, Nana Yu, Zihao Zhang, Yahong Han:
Visual Consensus Prompting for Co-Salient Object Detection. 9591-9600 - Nuo Chen, Ming Jiang, Qi Zhao:
Explainable Saliency: Articulating Reasoning with Contextual Prioritization. 9601-9610 - Ziheng Zhang, Jianyang Gu, Arpita Chowdhury, Zheda Mai, David Carlyn, Tanya Y. Berger-Wolf, Yu Su, Wei-Lun Chao:
Finer-CAM: Spotting the Difference Reveals Finer Details for Visual Explanation. 9611-9620 - Junru Zhao, Tianqin Li, Dunhan Jiang, Shenghao Wu, Alan Ramirez, Tai Sing Lee:
Perceptual Inductive Bias Is What You Need Before Contrastive Learning. 9621-9630 - Baifeng Shi, Boyi Li, Han Cai, Yao Lu, Sifei Liu, Marco Pavone, Jan Kautz, Song Han, Trevor Darrell, Pavlo Molchanov, Hongxu Yin:
Scaling Vision Pre-Training to 4K Resolution. 9631-9640 - Enrico Fini, Mustafa Shukor, Xiujun Li, Philipp Dufter, Michal Klein, David Haldimann, Sai Aitharaju, Victor G. Turrisi da Costa, Louis Béthune, Zhe Gan, Alexander Toshev, Marcin Eichner, Moin Nabi, Yinfei Yang, Joshua M. Susskind, Alaaeldin El-Nouby:
Multimodal Autoregressive Pre-training of Large Vision Encoders. 9641-9654 - Tianran Chen, Jiarui Chen, Baoquan Zhang, Zhehao Yu, Shidong Chen, Rui Ye, Xutao Li, Yunming Ye:
Sensitivity-Aware Efficient Fine-Tuning via Compact Dynamic-Rank Adaptation. 9655-9664 - Long Zhou, Fereshteh Shakeri, Aymen Sadraoui, Mounir Kaaniche, Jean-Christophe Pesquet, Ismail Ben Ayed:
UNEM: UNrolled Generalized EM for Transductive Few-Shot Learning. 9665-9675 - Yunze Liu, Li Yi:
MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining. 9676-9685 - Zhuguanyu Wu, Jiayi Zhang, Jiaxin Chen, Jinyang Guo, Di Huang, Yunhong Wang:
APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers. 9686-9695 - Yoojin Jung, Byung Cheol Song:
Two is Better than One: Efficient Ensemble Defense for Robust and Compact Models. 9696-9706 - Zhaozhi Wang, Yue Liu, Yunjie Tian, Yunfan Liu, Yaowei Wang, Qixiang Ye:
Building Vision Models upon Heat Conduction. 9707-9717 - Ao Wang, Hui Chen, Zijia Lin, Jungong Han, Guiguang Ding:
LSNet: See Large, Focus Small. 9718-9729 - Nick Nikzad, Yi Liao, Yongsheng Gao, Jun Zhou:
SATA: Spatial Autocorrelation Token Analysis for Enhancing the Robustness of Vision Transformers. 9730-9739 - Benjamin Bergner, Christoph Lippert, Aravindh Mahendran:
Token Cropr: Faster ViTs for Quite a Few Tasks. 9740-9750 - Joshua Fixelle:
Hypergraph Vision Transformers: Images are More than Nodes, More than Edges. 9751-9761 - Zhijie Zhu, Lei Fan, Maurice Pagnucco, Yang Song:
Interpretable Image Classification via Non-parametric Part Prototype Learning. 9762-9771 - Fanding Huang, Jingyan Jiang, Qinting Jiang, Hebei Li, Faisal Nadeem Khan, Zhi Wang:
COSMIC: Clique-Oriented Semantic Multi-space Integration for Robust CLIP Test-Time Adaptation. 9772-9781 - Jiho Choi, Seonho Lee, Minhyun Lee, Seungho Lee, Hyunjung Shim:
Fine-Grained Image-Text Correspondence with Cost Aggregation for Open-Vocabulary Part Segmentation. 9782-9793 - Vladan Stojnic, Yannis Kalantidis, Jirí Matas, Giorgos Tolias:
LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation. 9794-9803 - Reza Qorbani, Gianluca Villani, Theodoros Panagiotakopoulos, Marc Botet Colomer, Linus Härenstam-Nielsen, Mattia Segù, Pier Luigi Dovesi, Jussi Karlgren, Daniel Cremers, Federico Tombari, Matteo Poggi:
Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation. 9804-9815 - Hritam Basak, Zhaozheng Yin:
SemiDAViL: Semi-supervised Domain Adaptation with Vision-Language Guidance for Semantic Segmentation. 9816-9828 - Xiangfeng Xu, Pinyi Zhang, Wenxuan Huang, Yunhang Shen, Haosheng Chen, Jingzhong Lin, Wei Li, Gaoqi He, Jiao Xie, Shaohui Lin:
Weakly Supervised Semantic Segmentation via Progressive Confidence Region Expansion. 9829-9838 - Hongmei Yin, Tingliang Feng, Fan Lyu, Fanhua Shang, Hongying Liu, Wei Feng, Liang Wan:
Beyond Background Shift: Rethinking Instance Replay in Continual Semantic Segmentation. 9839-9848 - Zhaoyang Li, Yuan Wang, Wangkai Li, Tianzhu Zhang, Xiang Liu:
Dual-Agent Optimization framework for Cross-Domain Few-Shot Segmentation. 9849-9859 - Dongyao Jiang, Haodong Jing, Yongqiang Ma, Nanning Zheng:
Beyond Image Classification: A Video Benchmark and Dual-Branch Hybrid Discrimination Framework for Compositional Zero-Shot Learning. 9860-9869 - Zeliang Zhang, Gaowen Liu, Charles Fleming, Ramana Rao Kompella, Chenliang Xu:
Targeted Forgetting of Image Subgroups in CLIP Models. 9870-9880 - Yiyang Chen, Tianyu Ding, Lei Wang, Jing Huo, Yang Gao, Wenbin Li:
Enhancing Few-Shot Class-Incremental Learning via Training-Free Bi-Level Modality Calibration. 9881-9890 - Yuanpei Liu, Zhenqi He, Kai Han:
Hyperbolic Category Discovery. 9891-9900 - Qianqian Shen, Yunhan Zhao, Nahyun Kwon, Jeeeun Kim, Yanan Li, Shu Kong:
Solving Instance Detection from an Open-World Perspective. 9901-9910 - Yun Zhu, Le Hui, Hang Yang, Jianjun Qian, Jin Xie, Jian Yang:
Learning Class Prototypes for Unified Sparse-Supervised 3D Object Detection. 9911-9920 - Boyong He, Yuxiang Ji, Qianwen Ye, Zhuoyue Tan, Liaoni Wu:
Generalized Diffusion Detector: Mining Robust Features from Diffusion Models for Domain-Generalized Detection. 9921-9932 - Chang-Bin Zhang, Yujie Zhong, Kai Han:
Mr. DETR: Instructive Multi-Route Training for Detection Transformers. 9933-9943 - Dennis Jacob, Chong Xiang, Prateek Mittal:
PatchDEMUX: A Certifiably Robust Framework for Multi-label Classifiers Against Adversarial Patches. 9944-9953 - Ramchandran Muthukumar, Ambar Pal, Jeremias Sulam, René Vidal:
Disentangling Safe and Unsafe Image Corruptions via Anisotropy and Locality. 9954-9963 - Kai Mao, Ping Wei, Yiyang Lian, Yangyang Wang, Nanning Zheng:
Beyond Single-Modal Boundary: Cross-Modal Anomaly Detection through Visual Prototype and Harmonization. 9964-9973 - Wei Luo, Yunkang Cao, Haiming Yao, Xiaotian Zhang, Jianan Lou, Yuqi Cheng, Weiming Shen, Wenyong Yu:
Exploring Intrinsic Normal Prototypes within a Single Image for Universal Anomaly Detection. 9974-9983 - Wenqiao Li, Bozhong Zheng, Xiaohao Xu, Jinye Gan, Fading Lu, Xiang Li, Na Ni, Zheng Tian, Xiaonan Huang, Shenghua Gao, Yingna Wu:
Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties. 9984-9993 - Shun Wei, Jielin Jiang, Xiaolong Xu:
UniNet: A Contrastive Learning-guided Unified Framework with Feature Selection for Anomaly Detection. 9994-10003 - Ruihan Xu, Haokui Zhang, Yaowei Wang, Wei Zeng, Shiliang Zhang:
NN-Former: Rethinking Graph Structure in Neural Architecture Representation. 10004-10014 - Minh-Tuan Tran, Trung Le, Xuan-May Le, Thanh-Toan Do, Dinh Q. Phung:
Enhancing Dataset Distillation via Non-Critical Region Refinement. 10015-10024 - Shu Yang, Chengting Yu, Lei Liu, Hanzhi Ma, Aili Wang, Erping Li:
Efficient ANN-Guided Distillation: Aligning Rate-based Features of Spiking Neural Networks through Hybrid Block-wise Replacement. 10025-10035 - Zesen Cheng, Hang Zhang, Kehan Li, Sicong Leng, Zhiqiang Hu, Fei Wu, Deli Zhao, Xin Li, Lidong Bing:
Breaking the Memory Barrier of Contrastive Loss via Tile-Based Strategy. 10036-10045 - Christoforos N. Spartalis, Theodoros Semertzidis, Efstratios Gavves, Petros Daras:
LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty. 10046-10055 - Yongqi Huang, Peng Ye, Chenyu Huang, Jianjian Cao, Lin Zhang, Baopu Li, Gang Yu, Tao Chen:
DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models. 10056-10066 - Xiaohan Qin, Xiaoxing Wang, Junchi Yan:
Towards Consistent Multi-Task Learning: Unlocking the Potential of Task-Specific Parameters. 10067-10076 - Hankyul Kang, Gregor Seifer, Donghyun Lee, Jongbin Ryu:
Do Your Best and Get Enough Rest for Continual Learning. 10077-10086 - Huiyi Wang, Haodong Lu, Lina Yao, Dong Gong:
Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning. 10087-10098 - Bowen Zheng, Da-Wei Zhou, Han-Jia Ye, De-Chuan Zhan:
Task-Agnostic Guided Feature Expansion for Class-Incremental Learning. 10099-10109 - Mohamad Hassan N C, Divyam Gupta, Mainak Singha, Sai Bhargav Rongali, Ankit Jha, Muhammad Haris Khan, Biplab Banerjee:
OSLoPrompt: Bridging Low-Supervision Challenges and Open-Set Domain Generalization in CLIP. 10110-10120 - Huitong Chen, Yu Wang, Yan Fan, Guosong Jiang, Qinghua Hu:
Reducing Class-wise Confusion for Incremental Learning with Disentangled Manifolds. 10121-10130 - Haopeng Sun, Yingwei Zhang, Lumin Xu, Sheng Jin, Ping Luo, Chen Qian, Wentao Liu, Yiqiang Chen:
Unsupervised Continual Domain Shift Learning with Multi-Prototype Modeling. 10131-10141 - Dexuan Zhang, Thomas Westfechtel, Tatsuya Harada:
A Theory of Learning Unified Model via Knowledge Integration from Label Space Varying Domains. 10142-10152 - Chen-Chen Zong, Sheng-Jun Huang:
Rethinking Epistemic and Aleatoric Uncertainty for Active Open-Set Annotation: An Energy-Based Approach. 10153-10162 - Tianxiang Yin, Ningzhong Liu, Han Sun:
Towards Cost-Effective Learning: A Synergy of Semi-Supervised and Active Learning. 10163-10172 - Yijie Liu, Xinyi Shang, Yiqun Zhang, Yang Lu, Chen Gong, Jing-Hao Xue, Hanzi Wang:
Mind the Gap: Confidence Discrepancy Can Guide Federated Semi-Supervised Learning Across Pseudo-Mismatch. 10173-10182 - Yuchuan Li, Jae-Mo Kang, Il-Min Kim:
Beyond Clean Training Data: A Versatile and Model-Agnostic Framework for Out-of-Distribution Detection with Contaminated Training Data. 10183-10192 - Shawn Li, Huixian Gong, Hao Dong, Tiankai Yang, Zhengzhong Tu, Yue Zhao:
DPU: Dynamic Prototype Updating for Multimodal Out-of-Distribution Detection. 10193-10202 - Luyuan Xie, Tianyu Luan, Wenyuan Cai, Guochen Yan, Zhaoyu Chen, Nan Xi, Yuejian Fang, Qingni Shen, Zhonghai Wu, Junsong Yuan:
dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis. 10203-10213 - Zhuoyao Wang, Fan Yi, Peizhu Gong, Caitou He, Cheng Jin, Weizhong Zhang:
Population Normalization for Federated Learning. 10214-10223 - Hasin Us Sami, Swapneel Sen, Amit K. Roy-Chowdhury, Srikanth V. Krishnamurthy, Basak Guler:
Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning. 10224-10234 - Jeonghwan Park, Niall McLaughlin, Ihsen Alouani:
Mind the Gap: Detecting Black-box Adversarial Attacks in the Making through Query Update Analysis. 10235-10243 - George I. Kamberov:
Doppelgangers and Adversarial Vulnerability. 10244-10254 - Wei Li, Pin-Yu Chen, Sijia Liu, Ren Wang:
PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection. 10255-10264 - Keyizhi Xu, Chi Zhang, Zhan Chen, Zhongyuan Wang, Chunxia Xiao, Chao Liang:
Rethinking the Adversarial Robustness of Multi-Exit Neural Networks in an Attack-Defense Game. 10265-10274 - Hanrui Zhao, Niuniu Qi, Mengxin Ren, Banglong Liu, Shuming Shi, Zhengfeng Yang:
Learning-enabled Polynomial Lyapunov Function Synthesis via High-Accuracy Counterexample-Guided Framework. 10275-10284 - Jing Wang, Songhe Feng, Kristoffer Knutsen Wickstrøm, Michael C. Kampffmeyer:
AdaptCMVC: Robust Adaption to Incremental Views in Continual Multi-view Clustering. 10285-10294 - Liang Chen, Zhe Xue, Yawen Li, Meiyu Liang, Yan Wang, Anton van den Hengel, Yuankai Qi:
Medusa: A Multi-Scale High-order Contrastive Dual-Diffusion Approach for Multi-View Clustering. 10295-10304 - David Mildenberger, Paul Hager, Daniel Rueckert, Martin J. Menten:
A Tale of Two Classes: Adapting Supervised Contrastive Learning to Binary Imbalanced Datasets. 10305-10314 - Xingxuan Zhang, Jiansheng Li, Wenjing Chu, Junjia Hai, Renzhe Xu, Yuqing Yang, Shikai Guan, Jiazheng Xu, Liping Jing, Peng Cui:
On the Out-Of-Distribution Generalization of Large Multimodal Models. 10315-10326 - Xingjian Li, Qiming Zhao, Neelesh Bisht, Mostofa Rafid Uddin, Jin Yu Kim, Bryan Zhang, Min Xu:
DiffCAM: Data-Driven Saliency Maps by Capturing Feature Differences. 10327-10337 - Renshuai Tao, Haoyu Wang, Yuzhe Guo, Hairong Chen, Li Zhang, Xianglong Liu, Yunchao Wei, Yao Zhao:
Dual-view X-ray Detection: Can AI Detect Prohibited Items from Dual-view X-ray Images like Humans? 10338-10347 - Kang Liu, Zhuoqi Ma, Xiaolu Kang, Yunan Li, Kun Xie, Zhicheng Jiao, Qiguang Miao:
Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation. 10348-10359 - Yuxuan Sun, Yixuan Si, Chenglu Zhu, Xuan Gong, Kai Zhang, Pingyi Chen, Ye Zhang, Zhongyi Shui, Tao Lin, Lin Yang:
CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology. 10360-10371 - Amaya Gallagher-Syed, Henry Senior, Omnia Alwazzan, Elena Pontarini, Michele Bombardieri, Costantino Pitzalis, Myles J. Lewis, Michael R. Barnes, Luca Rossi, Gregory G. Slabaugh:
BioX-CPath: Biologically-driven Explainable Diagnostics for Multistain IHC Computational Pathology. 10372-10383 - Junjie Zhou, Jiao Tang, Yingli Zuo, Peng Wan, Daoqiang Zhang, Wei Shao:
Robust Multimodal Survival Prediction with Conditional Latent Differentiation Variational AutoEncoder. 10384-10393 - Yuxin Li, Zihao Zhu, Yuxiang Zhang, Yifan Chen, Zhibin Yu:
Boost the Inference with Co-training: A Depth-guided Mutual Learning Framework for Semi-supervised Medical Polyp Segmentation. 10394-10403 - Suruchi Kumari, Pravendra Singh:
Annotation Ambiguity Aware Semi-Supervised Medical Image Segmentation. 10404-10413 - Shijie Chang, Xiaoqi Zhao, Lihe Zhang, Tiancheng Wang:
Unified Medical Lesion Segmentation via Self-referring Indicator. 10414-10424 - Lexin Fang, Yunyang Xu, Xiang Ma, Xuemei Li, Caiming Zhang:
Minding Fuzzy Regions: A Data-driven Alternating Learning Paradigm for Stable Lesion Segmentation. 10425-10434 - Md Mostafijur Rahman, Radu Marculescu:
EffiDec3D: An Optimized Decoder for High-Performance and Efficient 3D Medical Image Segmentation. 10435-10444 - Jiao Xu, Xin Chen, Lihe Zhang:
Learning Dynamic Collaborative Network for Semi-supervised 3D Vessel Segmentation. 10445-10454 - Peirong Liu, Ana Lawry Aguila, Juan Eugenio Iglesias:
Unraveling Normal Anatomy via Fluid-Driven Anomaly Randomization. 10455-10465 - Wensheng Cheng, Zhenghong Li, Jiaxiang Ren, Hyomin Jeong, Congwu Du, Yingtian Pan, Haibin Ling:
Blood Flow Speed Estimation with Optical Coherence Tomography Angiography Images. 10466-10475 - Shufan Xi, Zexian Liu, Junlin Chang, Hongyu Wu, Xiaogang Wang, Aimin Hao:
3D Dental Model Segmentation with Geometrical Boundary Preserving. 10476-10485
Day 2: 2025-06-14
- Zhengqi Li, Richard Tucker, Forrester Cole, Qianqian Wang, Linyi Jin, Vickie Ye, Angjoo Kanazawa, Aleksander Holynski, Noah Snavely:
MegaSaM: Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos. 10486-10496 - Linyi Jin, Richard Tucker, Zhengqi Li, David Fouhey, Noah Snavely, Aleksander Holynski:
Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos. 10497-10509 - Qianqian Wang, Yifei Zhang, Aleksander Holynski, Alexei A. Efros, Angjoo Kanazawa:
Continuous 3D Perception Model with Persistent State. 10510-10522 - Yiran Wang, Jiaqi Li, Chaoyi Hong, Ruibo Li, Liusheng Sun, Xiao Song, Zhe Wang, Zhiguo Cao, Guosheng Lin:
TacoDepth: Towards Efficient Radar-Camera Depth Estimation with One-stage Fusion. 10523-10533 - Anagh Malik, Benjamin Attal, Andrew Xie, Matthew O'Toole, David B. Lindell:
Neural Inverse Rendering from Propagating Light. 10534-10544 - Kaiyu Li, Ruixun Liu, Xiangyong Cao, Xueru Bai, Feng Zhou, Deyu Meng, Zhi Wang:
SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images. 10545-10556 - Ding Qi, Jian Li, Junyao Gao, Shuguang Dou, Ying Tai, Jianlong Hu, Bo Zhao, Yabiao Wang, Chengjie Wang, Cairong Zhao:
Towards Universal Dataset Distillation via Task-Driven Diffusion. 10557-10566 - Jingyi Xu, Siwei Tu, Weidong Yang, Ben Fei, Shuhao Li, Keyi Liu, Yeqi Luo, Lipeng Ma, Lei Bai:
IceDiff: High Resolution and High-Quality Arctic Sea Ice Forecasting with Generative Diffusion Prior. 10567-10576 - Kunyu Wang, Xueyang Fu, Xin Lu, Chengjie Ge, Chengzhi Cao, Wei Zhai, Zheng-Jun Zha:
Efficient Test-time Adaptive Object Detection via Sensitivity-Guided Pruning. 10577-10586 - Jiaxin Cai, Jingze Su, Qi Li, Wenjie Yang, Shu Wang, Tiesong Zhao, Shengfeng He, Wenxi Liu:
Keep the Balance: A Parameter-Efficient Symmetrical Framework for RGB+X Semantic Segmentation. 10587-10598 - Xinyu Tian, Shu Zou, Zhaoyuan Yang, Jing Zhang:
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models. 10599-10609 - Zhihe Yang, Xufang Luo, Dongqi Han, Yunjian Xu, Dongsheng Li:
Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key. 10610-10620 - Zicheng Zhang, Tengchuan Kou, Shushi Wang, Chunyi Li, Wei Sun, Wei Wang, Xiaoyu Li, Zongyu Wang, Xuezhi Cao, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai:
Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content. 10621-10631 - Jihan Yang, Shusheng Yang, Anjali W. Gupta, Rilyn Han, Li Fei-Fei, Saining Xie:
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces. 10632-10643 - Andrew Szot, Bogdan Mazoure, Omar Attia, Aleksei Timofeev, Harsh Agrawal, R. Devon Hjelm, Zhe Gan, Zsolt Kira, Alexander Toshev:
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons. 10644-10655 - Peiwen Lai, Weizhi Zhong, Yipeng Qin, Xiaohang Ren, Baoyuan Wang, Guanbin Li:
LLM-driven Multimodal and Multi-Identity Listening Head Generation. 10656-10666 - Yongming Zhu, Longhao Zhang, Zhengkun Rong, Tianshu Hu, Shuang Liang, Zhipeng Ge:
INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations. 10667-10677 - Jiazhi Guan, Kaisiyuan Wang, Zhiliang Xu, Quanwei Yang, Yasheng Sun, Shengyi He, Borong Liang, Yukang Cao, Yingying Li, Haocheng Feng, Errui Ding, Jingdong Wang, Youjian Zhao, Hang Zhou, Ziwei Liu:
AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers. 10678-10689 - Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Jun Zhou, Lin Gu:
InsTaG: Learning Personalized 3D Talking Head from Few-Second Video. 10690-10700 - Bohao Zhang, Xuejiao Wang, Changbo Wang, Gaoqi He:
Dynamic Stereotype Theory Induced Micro-expression Recognition with Oriented Deformation. 10701-10711 - Shengze Wang, Xueting Li, Chao Liu, Matthew A. Chan, Michael Stengel, Henry Fuchs, Shalini De Mello, Koki Nagano:
Coherent 3D Portrait Video Reconstruction via Triplane Fusion. 10712-10722 - Jianchuan Chen, Jingchuan Hu, Gaige Wang, Zhonghua Jiang, Tiansong Zhou, Zhiwen Chen, Chengfei Lv:
TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting. 10723-10734 - Wojciech Zielonka, Stephan J. Garbin, Alexandros Lattas, George Kopanas, Paulo F. U. Gotardo, Thabo Beeler, Justus Thies, Timo Bolkart:
Synthetic Prior for Few-Shot Drivable Head Avatar Inversion. 10735-10746 - Linzhou Li, Yumeng Li, Yanlin Weng, Youyi Zheng, Kun Zhou:
RGBAvatar: Reduced Gaussian Blendshapes for Online Modeling of Head Avatars. 10747-10757 - Hongyu Liu, Xuan Wang, Ziyu Wan, Yue Ma, Jingye Chen, Yanbo Fan, Yujun Shen, Yibing Song, Qifeng Chen:
AvatarArtist: Open-Domain 4D Avatarization. 10758-10769 - Dimitrios Gerogiannis, Foivos Paraperas Papantoniou, Rolandos Alexandros Potamias, Alexandros Lattas, Stefanos Zafeiriou:
Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance. 10770-10782 - Zhiyang Guo, Jinxu Xiang, Kai Ma, Wengang Zhou, Houqiang Li, Ran Zhang:
Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters. 10783-10792 - Tianyi Xie, Yiwei Zhao, Ying Jiang, Chenfanfu Jiang:
PhysAnimator: Physics-Guided Generative Cartoon Animation. 10793-10804 - Taewoong Kang, Sohyun Jeong, Hyojin Jang, Jaegul Choo:
Zero-Shot Head Swapping in Real-World Scenarios. 10805-10814 - Zhiyu Qu, Yunqi Miao, Zhensong Zhang, Jifei Song, Jiankang Deng, Yi-Zhe Song:
CaricatureBooth: Data-Free Interactive Caricature Generation in a Photo Booth. 10815-10824 - Kwan Yun, Chaelin Kim, Hangyeul Shin, Junyong Noh:
FFaceNeRF: Few-shot Face Editing in Neural Radiance Fields. 10825-10835 - Honghu Chen, Bo Peng, Yunfan Tao, Juyong Zhang:
D^3-Human: Dynamic Disentangled Digital Human from Monocular Video. 10836-10846 - Radu Alexandru Rosu, Keyu Wu, Yao Feng, Youyi Zheng, Michael J. Black:
DiffLocks: Generating 3D Hair from a Single Image using Diffusion Models. 10847-10857 - Hang Shao, Lei Luo, Jianjun Qian, Mengkai Yan, Shuo Chen, Jian Yang:
Remote Photoplethysmography in Real-World and Extreme Lighting Scenarios. 10858-10867 - Chen-Wei Chang, Cheng-De Fan, Chia-Che Chang, Yi-Chen Lo, Yu-Chee Tseng, Jiun-Long Huang, Yu-Lun Liu:
GCC: Generative Color Constancy via Diffusing a Color Checker. 10868-10878 - Daniel Feijoo, Juan C. Benito, Álvaro García, Marcos V. Conde:
DarkIR: Robust Low-Light Image Restoration. 10879-10889 - Mingde Yao, Menglu Wang, King-Man Tam, Lingen Li, Tianfan Xue, Jinwei Gu:
PolarFree: Polarization-based Reflection-Free Imaging. 10890-10899 - Benquan Wang, Ruyi An, Jin-Kyu So, Sergei Kurdiumov, Eng Aik Chan, Giorgio Adamo, Yuhan Peng, Yewen Li, Bo An:
OpticalNet: An Optical Imaging Dataset and Benchmark Beyond the Diffraction Limit. 10900-10912 - Liqun Chen, Yuxuan Li, Jun Dai, Jinwei Gu, Tianfan Xue:
A Physics-Informed Blur Learning Framework for Imaging Systems. 10913-10922 - Kevin Zhang, Jia-Bin Huang, Jose Echevarria, Stephen DiVerdi, Aaron Hertzmann:
MaDCoW: Marginal Distortion Correction for Wide-Angle Photography with Arbitrary Objects. 10923-10932 - Hadi Alzayer, Philipp Henzler, Jonathan T. Barron, Jia-Bin Huang, Pratul P. Srinivasan, Dor Verbin:
Generative Multiview Relighting for 3D Reconstruction under Extreme Illumination Variation. 10933-10942 - Chun Gu, Xiaofei Wei, Zixuan Zeng, Yuxuan Yao, Li Zhang:
IRGS: Inter-Reflective Gaussian Splatting with 2D Gaussian Ray Tracing. 10943-10952 - Chinmay Talegaonkar, Yash Belhe, Ravi Ramamoorthi, Nicholas Antipa:
Volumetrically Consistent 3D Gaussian Rasterization. 10953-10963 - Federico Lincetto, Gianluca Agresti, Mattia Rossi, Pietro Zanuttigh:
MultimodalStudio: A Heterogeneous Sensor Dataset and Framework for Neural Rendering across Multiple Imaging Modalities. 10964-10973 - Sean Wu, Shamik Basu, Tim Broedermann, Luc Van Gool, Christos Sakaridis:
PBR-NeRF: Inverse Rendering with Physics-Based Neural Fields. 10974-10984 - Haoyuan Wang, Zhenwei Wang, Xiaoxiao Long, Cheng Lin, Gerhard P. Hancke, Rynson W. H. Lau:
MAGE: Single Image to Material-Aware 3D via the Multi-View G-Buffer Estimation Model. 10985-10995 - Haolin Li, Jinyang Liu, Mario Sznaier, Octavia I. Camps:
3D-HGS: 3D Half-Gaussian Splatting. 10996-11005 - Junha Hyung, Kinam Kim, Susung Hong, Min-Jung Kim, Jaegul Choo:
Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling. 11006-11015 - Zhuoman Liu, Weicai Ye, Yan Luximon, Pengfei Wan, Di Zhang:
Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation. 11016-11025 - Briac Toussaint, Diego Thomas, Jean-Sébastien Franco:
ProbeSDF: Light Field Probes For Neural Surface Reconstruction. 11026-11035 - Zhiyuan Ma, Xinyue Liang, Rongyuan Wu, Xiangyu Zhu, Zhen Lei, Lei Zhang:
Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data. 11036-11050 - Fangyu Wu, Yuhao Chen:
FruitNinja: 3D Object Interior Texture Generation with Gaussian Splatting. 11051-11060 - Wang Zhao, Yan-Pei Cao, Jiale Xu, Yuejiang Dong, Ying Shan:
DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation. 11061-11072 - Cheng Chen, Jiacheng Wei, Tianrun Chen, Chi Zhang, Xiaofeng Yang, Shangzhan Zhang, Bingchen Yang, Chuan-Sheng Foo, Guosheng Lin, Qixing Huang, Fayao Liu:
CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images. 11073-11082 - Jinnan Chen, Lingting Zhu, Zeyu Hu, Shengju Qian, Yugang Chen, Xin Wang, Gim Hee Lee:
MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation. 11083-11092 - Haohan Weng, Zibo Zhao, Biwen Lei, Xianghui Yang, Jian Liu, Zeqiang Lai, Zhuo Chen, Yuhong Liu, Jie Jiang, Chunchao Guo, Tong Zhang, Shenghua Gao, C. L. Philip Chen:
Scaling Mesh Generation via Compressive Tokenization. 11093-11103 - Qitong Yang, Mingtao Feng, Zijie Wu, Weisheng Dong, Fangfang Wu, Yaonan Wang, Ajmal Mian:
Hierarchical Gaussian Mixture Model Splatting for Efficient and Part Controllable 3D Generation. 11104-11114 - Seonhwa Kim, Jiwon Kim, Soobin Park, Donghoon Ahn, Jiwon Kang, Seungryong Kim, Kyong Hwan Jin, Eunju Cha:
Identity-preserving Distillation Sampling by Fixed-Point Iterator. 11115-11124 - Martin Spitznagel, Jan Vaillant, Janis Keuper:
PhysicsGen: Can Generative Models Learn from Images to Predict Complex Physical Relations? 11125-11134 - Dong In Lee, Hyeongcheol Park, Jiyoung Seo, Eunbyung Park, Hyunje Park, Hadam Baek, Sangheon Shin, Sangmin Kim, Sangpil Kim:
EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting. 11135-11145 - Youyu Chen, Junjun Jiang, Kui Jiang, Xiao Tang, Zhihao Li, Xianming Liu, Yinyu Nie:
DashGaussian: Optimizing 3D Gaussian Splatting in 200 Seconds. 11146-11155 - Zhenqi Dai, Ting Liu, Yanning Zhang:
Efficient Decoupled Feature 3D Gaussian Splatting via Hierarchical Compression. 11156-11166 - Jiahui Zhang, Fangneng Zhan, Ling Shao, Shijian Lu:
SOGS: Second-Order Anchor for Advanced 3D Gaussian Splatting. 11167-11176 - Yuanjian Qiao, Mingwen Shao, Lingzhuang Meng, Kai Xu:
RestorGS: Depth-aware Gaussian Splatting for Efficient 3D Scene Restoration. 11177-11186 - Yufan Zhang, Yu Ji, Yu Guo, Jinwei Ye:
Seeing A 3D World in A Grain of Sand. 11187-11196 - Long Ma, Yuxin Feng, Yan Zhang, Jinyuan Liu, Weimin Wang, Guang-Yong Chen, Chengpei Xu, Zhuo Su:
CoA: Towards Real Image Dehazing via Compression-and-Adaptation. 11197-11206 - Yutong Liu, Wenming Weng, Yueyi Zhang, Zhiwei Xiong:
S2D-LFE: Sparse-to-Dense Light Field Event Generation. 11207-11216 - Li Fang, Hao Zhu, Longlong Chen, Fei Hu, Long Ye, Zhan Ma:
Depth-Guided Bundle Sampling for Efficient Generalizable Neural Radiance Field Reconstruction. 11217-11226 - Chin-Yang Lin, Chung-Ho Wu, Chang-Han Yeh, Shih-Han Yen, Cheng Sun, Yu-Lun Liu:
FrugalNeRF: Fast Convergence for Extreme Few-shot Novel View Synthesis without Learned Priors. 11227-11238 - Ankit Dhiman, Manan Shah, R. Venkatesh Babu:
MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World. 11239-11249 - Yuanxun Lu, Jingyang Zhang, Tian Fang, Jean-Daniel Nahmias, Yanghai Tsin, Long Quan, Xun Cao, Yao Yao, Shiwei Li:
Matrix3D: Large Photogrammetry Model All-in-One. 11250-11263 - Guibiao Liao, Qing Li, Zhenyu Bao, Guoping Qiu, Kanglin Liu:
SPC-GS: Gaussian Splatting with Semantic-Prompt Consistency for Indoor Open-World Free-view Synthesis from Sparse Inputs. 11264-11274 - Hyunho Ha, Lei Xiao, Christian Richardt, Thu Nguyen-Phuoc, Changil Kim, Min H. Kim, Douglas Lanman, Numair Khan:
Geometry-guided Online 3D Video Synthesis with Multi-View Temporal Consistency. 11275-11285 - Sheng Miao, Jiaxin Huang, Dongfeng Bai, Xu Yan, Hongyu Zhou, Yue Wang, Bingbing Liu, Andreas Geiger, Yiyi Liao:
EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis. 11286-11296 - Yuhan Wang, Fangzhou Hong, Shuai Yang, Liming Jiang, Wayne Wu, Chen Change Loy:
MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention. 11297-11306 - Jiang Wu, Rui Li, Yu Zhu, Rong Guo, Jinqiu Sun, Yanning Zhang:
Sparse2DGS: Geometry-Prioritized Gaussian Splatting for Surface Reconstruction from Sparse Views. 11307-11316 - Wenyuan Zhang, Emily Yue-ting Jia, Junsheng Zhou, Baorui Ma, Kanle Shi, Yu-Shen Liu, Zhizhong Han:
NeRFPrior: Learning Neural Radiance Field as a Prior for Indoor Scene Reconstruction. 11317-11327 - Mingjun Zheng, Long Sun, Jiangxin Dong, Jinshan Pan:
Efficient Video Super-Resolution for Real-time Rendering with Decoupled G-buffer Guidance. 11328-11337 - Sangwoon Kwak, Joonsoo Kim, Jun Young Jeong, Won-Sik Cheong, Jihyong Oh, Munchurl Kim:
MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting. 11338-11348 - Yuheng Jiang, Zhehao Shen, Chengcheng Guo, Yu Hong, Zhuo Su, Yingliang Zhang, Marc Habermann, Lan Xu:
RePerformer: Immersive Human-centric Volumetric Videos from Playback to Photoreal Reperformance. 11349-11360 - Miaowei Wang, Yibo Zhang, Weiwei Xu, Rui Ma, Changqing Zou, Daniel D. Morris:
DecoupledGaussian: Object-Scene Decoupling for Physics-Based Interaction. 11361-11372 - Navami Kairanda, Marc Habermann, Shanthika Naik, Christian Theobalt, Vladislav Golyanik:
Thin-Shell-SfT: Fine-Grained Monocular Non-rigid 3D Surface Tracking with Neural Deformation Fields. 11373-11383 - Xinjie Li, Ziyi Chen, Xinlu Yu, Iek-Heng Chu, Peng Chang, Jing Xiao:
Co-Speech Gesture Video Generation with Implicit Motion-Audio Entanglement. 11384-11394 - Natacha Kuete Meli, Vladislav Golyanik, Marcel Seelbach Benkner, Michael Moeller:
QuCOOP: A Versatile Framework for Solving Composite and Binary-Parametrised Problems on Quantum Annealers. 11395-11405 - Shashwath Bharadwaj, Ruangrawee Kitichotkul, Akshay Agarwal, Vivek K. Goyal:
Image Reconstruction from Readout-Multiplexed Single-Photon Detector Arrays. 11406-11415 - Yuanlin Wang, Yiyang Zhang, Ruiqin Xiong, Jing Zhao, Jian Zhang, Xiaopeng Fan, Tiejun Huang:
Spk2SRImgNet: Super-Resolve Dynamic Scene from Spike Stream via Motion Aligned Collaborative Filtering. 11416-11426 - Bohan Yu, Jin Han, Boxin Shi, Imari Sato:
EventPSR: Surface Normal and Reflectance Estimation from Photometric Stereo Using an Event Camera. 11427-11436 - Cheng Zhang, Haofei Xu, Qianyi Wu, Camilo Cruz Gambardella, Dinh Q. Phung, Jianfei Cai:
PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting. 11437-11447 - Xuan Shen, Weize Ma, Jing Liu, Changdi Yang, Rui Ding, Quanyi Wang, Henghui Ding, Wei Niu, Yanzhi Wang, Pu Zhao, Jun Lin, Jiuxiang Gu:
QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge. 11448-11460 - Jianhao Zheng, Zihan Zhu, Valentin Bieri, Marc Pollefeys, Songyou Peng, Iro Armeni:
WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments. 11461-11471 - Hoang Chuong Nguyen, Wei Mao, José M. Álvarez, Miaomiao Liu:
Joint Optimization of Neural Radiance Fields and Continuous Camera Motion from a Monocular Video. 11472-11481 - Jinnyeong Kim, Seung-Hwan Baek:
Pixel-aligned RGB-NIR Stereo Imaging and Dataset for Robot Vision. 11482-11492 - Sergio Izquierdo, Mohamed Sayed, Michael Firman, Guillermo Garcia-Hernando, Daniyar Turmukhambetov, Javier Civera, Oisin Mac Aodha, Gabriel J. Brostow, Jamie Watson:
MVSAnywhere: Zero-Shot Multi-View Stereo. 11493-11504 - Yaqing Ding, Viktor Kocur, Zuzana Berger Haladová, Qianliang Wu, Shen Cai, Jian Yang, Zuzana Kukelova:
Three-view Focal Length Recovery From Homographies. 11505-11514 - Ji Zhao, Banglei Guan, Zibin Liu, Laurent Kneip:
Full-DoF Egomotion Estimation for Event Cameras Using Geometric Solvers. 11515-11524 - Haifeng Wu, Shuhang Gu, Lixin Duan, Wen Li:
GeoDepth: From Point-to-Depth to Plane-to-Depth Modeling for Self-Supervised Monocular Depth Estimation. 11525-11535 - Xudong Jiang, Fangjinhua Wang, Silvano Galliani, Christoph Vogel, Marc Pollefeys:
R-SCoRe: Revisiting Scene Coordinate Regression for Robust Large-Scale Visual Localization. 11536-11546 - Ron Ferens, Yosi Keller:
HyperPose: Hypernetwork-Infused Camera Pose Localization and an Extended Cambridge Landmarks Dataset. 11547-11557 - Nicole Damblon, Marc Pollefeys, Daniel Barath:
Learning to Filter Outlier Edges in Global SfM. 11558-11568 - Max Kahl, Sebastian Stricker, Lisa Hutschenreiter, Florian Bernard, Carsten Rother, Bogdan Savchynskyy:
Towards Optimizing Large-Scale Multi-Graph Matching in Bioimaging. 11569-11578 - Qiyang Qian, Hansheng Chen, Masayoshi Tomizuka, Kurt Keutzer, Qianqian Wang, Chenfeng Xu:
Bridging Viewpoint Gaps: Geometric Reasoning Boosts Semantic Correspondence. 11579-11589 - Aviral Chharia, Wenbo Gou, Haoye Dong:
MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation. 11590-11599 - Chamuditha Jayanga Galappaththige, Jason Lai, Lloyd Windrim, Donald G. Dansereau, Niko Sünderhauf, Dimity Miller:
Multi-View Pose-Agnostic Change Localization with Zero Labels. 11600-11610 - Yihan Chen, Wenfei Yang, Huan Ren, Shifeng Zhang, Tianzhu Zhang, Feng Wu:
Structure-Aware Correspondence Learning for Relative Pose Estimation. 11611-11621 - Sungphill Moon, Hyeontae Son, Dongcheol Hur, Sangwook Kim:
Co-op: Correspondence-based Novel Object Pose Estimation. 11622-11632 - Taeyeop Lee, Bowen Wen, Minjun Kang, Gyuree Kang, In So Kweon, Kuk-Jin Yoon:
Any6D: Model-free 6D Pose Estimation of Novel Objects. 11633-11643 - Jingnan Shi, Rajat Talak, Harry Zhang, David Jin, Luca Carlone:
CRISP: Object Pose and Shape Estimation with Test-Time Adaptation. 11644-11653 - Jingshun Huang, Haitao Lin, Tianyu Wang, Yanwei Fu, Xiangyang Xue, Yi Zhu:
CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image. 11654-11664 - Yizheng Xie, Viktoria Ehm, Paul Roetzer, Nafie El Amrani, Maolin Gao, Florian Bernard, Daniel Cremers:
EchoMatch: Partial-to-Partial Shape Matching via Correspondence Reflection. 11665-11675 - Sayak Nag, Udita Ghosh, Calvin-Khang Ta, Sarosij Bose, Jiachen Li, Amit K. Roy-Chowdhury:
Conformal Prediction and MLLM aided Uncertainty Quantification in Scene Graph Generation. 11676-11686 - Kyujin Shim, Kangwook Ko, Yujin Yang, Changick Kim:
Focusing on Tracks for Online Multi-Object Tracking. 11687-11696 - Hyunseop Kim, Hyo-Jun Lee, Yonguk Lee, Jinu Lee, Hanul Kim, Yeong Jun Koh:
GRAE-3DMOT: Geometry Relation-Aware Encoder for Online 3D Multi-Object Tracking. 11697-11706 - Weizhuo Li, Yue Xi, Wenjing Jia, Zehao Zhang, Fei Li, Xiangzeng Liu, Qiguang Miao:
PointSR: Self-Regularized Point Supervision for Drone-View Object Detection. 11707-11716 - Sijie Wang, Rui She, Qiyu Kang, Siqi Li, Disheng Li, Tianyu Geng, Shangshu Yu, Wee Peng Tay:
Multi-Modal Aerial-Ground Cross-View Place Recognition with Neural ODEs. 11717-11728 - Huan Lei:
OffsetOPT: Explicit Surface Reconstruction without Normals. 11729-11738 - Chen Zhang, Wentao Wang, Ximeng Li, Xinyao Liao, Wanjuan Su, Wenbing Tao:
High-Fidelity Lightweight Mesh Reconstruction from Point Clouds. 11739-11748 - Zhaiyu Chen, Yuqing Wang, Liangliang Nan, Xiaoxiang Zhu:
Parametric Point Cloud Completion for Polygonal Surface Reconstruction. 11749-11758 - Aocheng Li, James Zimmer-Dauphinee, Rajesh Kalyanam, Ian Lindsay, Parker VanValkenburgh, Steven A. Wernke, Daniel G. Aliaga:
Self-Supervised Large Scale Point Cloud Completion for Archaeological Site Restoration. 11759-11768 - Kexue Fu, Mingzhi Yuan, Changwei Wang, Weiguang Pang, Jing Chi, Manning Wang, Longxiang Gao:
Dual Focus-Attention Transformer for Robust Point Cloud Registration. 11769-11778 - Changhao Peng:
Generalized Gaussian Entropy Model for Point Cloud Attribute Compression with Dynamic Likelihood Intervals. 11779-11788 - Dekai Zhu, Yan Di, Stefan Gavranovic, Slobodan Ilic:
SeaLion: Semantic Part-Aware Latent Point Diffusion Models for 3D Generation. 11789-11798 - Ali Bahri, Moslem Yazdanpanah, Mehrdad Noori, Sahar Dastani, Milad Cheraghalikhani, Gustavo Adolfo Vargas Hakim, David Osowiechi, Farzad Beizaee, Ismail Ben Ayed, Christian Desrosiers:
Spectral Informed Mamba for Robust Point Cloud Processing. 11799-11809 - Tanuj Sur, Samrat Mukherjee, Kaizer Rahaman, Subhasis Chaudhuri, Muhammad Haris Khan, Biplab Banerjee:
Hyperbolic Uncertainty-Aware Few-Shot Incremental Point Cloud Segmentation. 11810-11821 - Jianhui Zhang, Yizhi Luo, Zicheng Zhang, Xuecheng Nie, Bonan Li:
CamPoint: Boosting Point Cloud Segmentation with Virtual Camera. 11822-11832 - Radu Berdan, Beril Besbinar, Christoph Reinders, Junji Otsuka, Daisuke Iso:
ReRAW: RGB-to-RAW Image Reconstruction via Stratified Sampling for Efficient Object Detection on the Edge. 11833-11843 - Zhuochen Yu, Bijie Qiu, Andy W. H. Khong:
ViKIENet: Towards Efficient 3D Object Detection with Virtual Key Instance Enhanced Network. 11844-11853 - Hala Djeghim, Nathan Piasco, Moussâb Bennehar, Luis Roldão, Dzmitry Tsishkou, Désiré Sidibé:
ViiNeuS: Volumetric Initialization for Implicit Neural Surface Reconstruction of Urban Scenes with Limited Image Overlap. 11854-11863 - Jichun Zhao, Haiyong Jiang, Haoxuan Song, Jun Xiao, Dong Gong:
D^3CTTA: Domain-Dependent Decorrelation for Continual Test-Time Adaption of 3D LiDAR Segmentation. 11864-11874 - Alexey Nekrasov, Malcolm Burdorf, Stewart Worrall, Bastian Leibe, Julie Stephany Berrio Perez:
Spotting the Unexpected (STU): A 3D LiDAR Dataset for Anomaly Segmentation in Autonomous Driving. 11875-11885 - Daizong Liu, Wei Hu:
Seeing is Not Believing: Adversarial Natural Object Optimization for Hard-Label 3D Scene Attacks. 11886-11897 - Houzhang Fang, Xiaolin Wang, Zengyang Li, Lu Wang, Qingshan Li, Yi Chang, Luxin Yan:
Detection-Friendly Nonuniformity Correction: A Union Framework for Infrared UAV Target Detection. 11898-11907 - Shihang Du, Sanqing Qu, Tianhang Wang, Xudong Zhang, Yunwei Zhu, Jian Mao, Fan Lu, Qiao Lin, Guang Chen:
RCP-Bench: Benchmarking Robustness for Collaborative Perception Under Diverse Corruptions. 11908-11918 - Jiahui Fu, Yue Gong, Luting Wang, Shifeng Zhang, Xu Zhou, Si Liu:
Generative Map Priors for Collaborative BEV Semantic Segmentation. 11919-11928 - Xiyue Guo, Jiarui Hu, Junjie Hu, Hujun Bao, Guofeng Zhang:
SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion. 11929-11938 - Jongseong Bae, Junwoo Ha, Ha Young Kim:
Three Cars Approaching within 100m! Enhancing Distant Geometry by Tri-Axis Voxel Scanning for Camera-based Semantic Scene Completion. 11939-11948 - Heng Li, Yuenan Hou, Xiaohan Xing, Yuexin Ma, Xiao Sun, Yanyong Zhang:
OccMamba: Semantic Occupancy Prediction with State Space Models. 11949-11959 - Haoyi Jiang, Liu Liu, Tianheng Cheng, Xinjie Wang, Tianwei Lin, Zhizhong Su, Wenyu Liu, Xinggang Wang:
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding. 11960-11970 - Bohan Li, Jiazhe Guo, Hongsi Liu, Yingshuang Zou, Yikang Ding, Xiwu Chen, Hu Zhu, Feiyang Tan, Chi Zhang, Tiancai Wang, Shuchang Zhou, Li Zhang, Xiaojuan Qi, Hao Zhao, Mu Yang, Wenjun Zeng, Xin Jin:
UniScene: Unified Occupancy-centric Driving Scene Generation. 11971-11981 - Georg Hess, Carl Lindström, Maryam Fatemi, Christoffer Petersson, Lennart Svensson:
SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving. 11982-11992 - Katrin Renz, Long Chen, Elahe Arani, Oleg Sinavski:
SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment. 11993-12003 - Lue Fan, Hao Zhang, Qitai Wang, Hongsheng Li, Zhaoxiang Zhang:
FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes. 12004-12014 - Guosheng Zhao, Chaojun Ni, Xiaofeng Wang, Zheng Zhu, Xueyang Zhang, Yida Wang, Guan Huang, Xinze Chen, Boyuan Wang, Youyi Zhang, Wenjun Mei, Xingang Wang:
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation. 12015-12026 - Tai-Yu Pan, Sooyoung Jeon, Mengdi Fan, Jinsu Yoo, Zhenyang Feng, Mark E. Campbell, Kilian Q. Weinberger, Bharath Hariharan, Wei-Lun Chao:
Transfer Your Perspective: Controllable 3D Generation from Any Viewpoint in a Driving Scene. 12027-12036 - Bencheng Liao, Shaoyu Chen, Haoran Yin, Bo Jiang, Cheng Wang, Sixu Yan, Xinbang Zhang, Xiangyu Li, Ying Zhang, Qian Zhang, Xinggang Wang:
DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving. 12037-12047 - Zhiying Song, Lei Yang, Fuxi Wen, Jun Li:
TraF-Align: Trajectory-aware Feature Alignment for Asynchronous Multi-agent Perception. 12048-12057 - Yizhou Huang, Yihua Cheng, Kezhi Wang:
Trajectory Mamba: Efficient Attention-Mamba Forecasting Model Based on Selective SSM. 12058-12067 - Xuesong Chen, Linjiang Huang, Tao Ma, Rongyao Fang, Shaoshuai Shi, Hongsheng Li:
SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving. 12068-12077 - Xinshuai Song, Weixing Chen, Yang Liu, Weikai Chen, Guanbin Li, Liang Lin:
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method. 12078-12088 - Zhiyuan Zhang, Xiaofan Li, Zhihao Xu, Wenjie Peng, Zijian Zhou, Miaojing Shi, Shuangping Huang:
MPDrive: Improving Spatial Understanding with Marker-Based Prompt Learning for Autonomous Driving. 12089-12099 - Hao Ren, Yiming Zeng, Zetong Bi, Zhaoliang Wan, Junlong Huang, Hui Cheng:
Prior Does Matter: Visual Navigation via Denoising Diffusion Bridge Models. 12100-12110 - Steeven Janny, Hervé Poirier, Leonid Antsfeld, Guillaume Bono, Gianluca Monaci, Boris Chidlovskii, Francesco Giuliari, Alessio Del Bue, Christian Wolf:
Reasoning in Visual Navigation of End-to-end Trained Agents: A Dynamical Systems Approach. 12111-12121 - Shaofei Cai, Zihao Wang, Kewei Lian, Zhancun Mu, Xiaojian Ma, Anji Liu, Yitao Liang:
ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting. 12122-12131 - Can Zhang, Gim Hee Lee:
IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments. 12132-12142 - Xin Wen, Bingchen Zhao, Yilun Chen, Jiangmiao Pang, Xiaojuan Qi:
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning. 12143-12154 - Yanbang Li, Ziyang Gong, Haoyang Li, Xiaoqi Huang, Haolan Kang, Guangping Bai, Xianzheng Ma:
Robotic Visual Instruction. 12155-12165 - Sangmin Lee, Sungyong Park, Heewon Kim:
DynScene: Scalable Generation of Dynamic Robotic Manipulation Scenes for Embodied AI. 12166-12175 - Sen Wang, Le Wang, Sanping Zhou, Jingyi Tian, Jiayi Li, Haowen Sun, Wei Tang:
FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation. 12176-12186 - Ning Gao, Yilun Chen, Shuai Yang, Xinyi Chen, Yang Tian, Hao Li, Haifeng Huang, Hanqing Wang, Tai Wang, Jiangmiao Pang:
GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation. 12187-12198 - Wenbo Wang, Fangyun Wei, Lei Zhou, Xi Chen, Lin Luo, Xiaohan Yi, Yizhong Zhang, Yaobo Liang, Chang Xu, Yan Lu, Jiaolong Yang, Baining Guo:
UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping. 12199-12208 - Youxin Pang, Ruizhi Shao, Jiajun Zhang, Hanzhang Tu, Yun Liu, Boyao Zhou, Hongwen Zhang, Yebin Liu:
ManiVideo: Generating Hand-Object Manipulation Video with Dexterous and Generalizable Grasping. 12209-12219 - Shijian Jiang, Qi Ye, Rengan Xie, Yuchi Huo, Jiming Chen:
Hand-held Object Reconstruction from RGB Video with Dynamic Interaction. 12220-12230 - Yinqiao Wang, Hao Xu, Pheng-Ann Heng, Chi-Wing Fu:
UniHOPE: A Unified Approach for Hand-Only and Hand-Object Pose Estimation. 12231-12241 - Rolandos Alexandros Potamias, Jinglei Zhang, Jiankang Deng, Stefanos Zafeiriou:
WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild. 12242-12254 - Zhuoran Zhao, Linlin Yang, Pengzhan Sun, Pan Hui, Angela Yao:
Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation. 12255-12265 - Sirui Xu, Hung Yu Ling, Yu-Xiong Wang, Liang-Yan Gui:
InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions. 12266-12277 - Uyoung Jeong, Jonathan Freer, Seungryul Baek, Hyung Jin Chang, Kwang In Kim:
PoseBH: Prototypical Multi-Dataset Training Beyond Human Pose Estimation. 12278-12288 - Qingzheng Xu, Ru Cao, Xin Shen, Heming Du, Sen Wang, Xin Yu:
M3GYM: A Large-Scale Multimodal Multi-view Multi-person Pose Dataset for Fitness Activity Understanding in Real-world Settings. 12289-12300 - Mohammadhossein Bahari, Saeed Saadatnejad, Amirhossein Askari-Farsangi, Seyed-Mohsen Moosavi-Dezfooli, Alexandre Alahi:
Certified Human Trajectory Prediction. 12301-12311 - Ming Yan, Xincheng Lin, Yuhua Luo, Shuqi Fan, Yudi Dai, Qixin Zhong, Lincai Zhong, Yuexin Ma, Lan Xu, Chenglu Wen, Siqi Shen, Cheng Wang:
ClimbingCap: Multi-Modal Dataset and Method for Rock Climbing in World Coordinate. 12312-12323 - Hiromu Taketsugu, Takeru Oba, Takahiro Maeda, Shohei Nobuhara, Norimichi Ukita:
Physical Plausibility-aware Trajectory Prediction via Locomotion Embodiment. 12324-12334 - Ting Yu, Yi Lin, Jun Yu, Zhenyu Lou, Qiongjie Cui:
Vision-Guided Action: Enhancing 3D Human Motion Prediction with Gaze-informed Affordance in 3D Scenes. 12335-12346 - Dongyang Jin, Chao Fan, Jingzhe Ma, Jingkai Zhou, Weihua Chen, Shiqi Yu:
On Denoising Walking Videos for Gait Recognition. 12347-12357 - Ling-An Zeng, Guohong Huang, Yi-Lin Wei, Shengbo Gu, Yu-Ming Tang, Jingke Meng, Wei-Shi Zheng:
ChainHOI: Joint-based Kinematic Chain Modeling for Human-Object Interaction Generation. 12358-12369 - Tao Wang, Zhihua Wu, Qiaozhi He, Jiaming Chu, Ling Qian, Yu Cheng, Junliang Xing, Jian Zhao, Lei Jin:
StickMotion: Generating 3D Human Motions by Drawing a Stickman. 12370-12379 - Pablo Ruiz-Ponce, Germán Barquero, Cristina Palmero, Sergio Escalera, José García Rodríguez:
MixerMDM: Learnable Composition of Human Motion Diffusion Models. 12380-12390 - Boyuan Wang, Xiaofeng Wang, Chaojun Ni, Guosheng Zhao, Zhiqin Yang, Zheng Zhu, Muyang Zhang, Yukun Zhou, Xinze Chen, Guan Huang, Lihong Liu, Xingang Wang:
HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation. 12391-12401 - Neerja Thakkar, Tara Sadjadpour, Jathushan Rajasegaran, Shiry Ginosar, Jitendra Malik:
Poly-Autoregressive Prediction for Modeling Interactions. 12402-12412 - Baixuan Lv, Yaohua Zha, Tao Dai, Xue Yuerong, Ke Chen, Shu-Tao Xia:
Adapting Pre-trained 3D Models for Point Cloud Video Understanding via Cross-frame Spatio-temporal Perception. 12413-12422 - Jaeah Lee, Changwoon Choi, Young Min Kim, Jaesik Park:
Recovering Dynamic 3D Sketches from Videos. 12423-12432 - Jinxi Li, Ziyang Song, Siyuan Zhou, Bo Yang:
FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity. 12433-12443 - Chris Rockwell, Joseph Tung, Tsung-Yi Lin, Ming-Yu Liu, David F. Fouhey, Chen-Hsuan Lin:
Dynamic Camera Poses and Where to Find Them. 12444-12455 - Jingxi Chen, Brandon Y. Feng, Haoming Cai, Tianfu Wang, Levi Burner, Dehao Yuan, Cornelia Fermüller, Christopher A. Metzler, Yiannis Aloimonos:
Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation. 12456-12466 - Rick Akkerman, Haiwen Feng, Michael J. Black, Dimitrios Tzionas, Victoria Fernández Abrevaya:
InterDyn: Controllable Interactive Dynamics with Video Diffusion Models. 12467-12479 - Emanuele Aiello, Umberto Michieli, Diego Valsesia, Mete Ozay, Enrico Magli:
DreamCache: Finetuning-Free Lightweight Personalized Image Generation via Feature Caching. 12480-12489 - Hanlin Wang, Hao Ouyang, Qiuyu Wang, Wen Wang, Ka Leong Cheng, Qifeng Chen, Yujun Shen, Limin Wang:
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis. 12490-12500 - Xi Chen, Zhifei Zhang, He Zhang, Yuqian Zhou, Soo Ye Kim, Qing Liu, Yijun Li, Jianming Zhang, Nanxuan Zhao, Yilin Wang, Hui Ding, Zhe Lin, Hengshuang Zhao:
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics. 12501-12511 - Jie Tian, Xiaoye Qu, Zhenyi Lu, Wei Wei, Sichen Liu, Yu Cheng:
Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think. 12512-12521 - Yao-Chih Lee, Erika Lu, Sarah Rumbley, Michal Geyer, Jia-Bin Huang, Tali Dekel, Forrester Cole:
Generative Omnimatte: Learning to Decompose Video into Layers. 12522-12532 - Uri Gadot, Assaf Shocher, Shie Mannor, Gal Chechik, Assaf Hallak:
RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression. 12533-12542 - Zhaoyang Jia, Bin Li, Jiahao Li, Wenxuan Xie, Linfeng Qi, Houqiang Li, Yan Lu:
Towards Practical Real-Time Neural Video Compression. 12543-12552 - Chuanbo Tang, Zhuoyuan Li, Yifan Bian, Li Li, Dong Liu:
Neural Video Compression with Context Modulation. 12553-12563 - Zeyu Xiao, Xinchao Wang:
Event-based Video Super-Resolution via State Space Models. 12564-12574 - Shuaizhen Yao, Xiaoya Zhang, Xin Liu, Mengyi Liu, Zhen Cui:
STDD: Spatio-Temporal Dual Diffusion for Video Generation. 12575-12584 - Xiaofeng Mao, Zhengkai Jiang, Fu-Yun Wang, Jiangning Zhang, Hao Chen, Mingmin Chi, Yabiao Wang, Wenhan Luo:
OSV: One Step is Enough for High-Quality Image to Video Generation. 12585-12594 - Dongnan Gui, Xun Guo, Wengang Zhou, Yan Lu:
I2VGuard: Safeguarding Images against Misuse in Diffusion-based Image-to-Video Models. 12595-12604 - Zhaolin Wan, Han Qin, Zhiyang Li, Xiaopeng Fan, Wangmeng Zuo, Debin Zhao:
CASP: Consistency-aware Audio-induced Saliency Prediction Model for Omnidirectional Video. 12605-12614 - Zhiyuan Yan, Yandan Zhao, Shen Chen, Mingyi Guo, Xinghe Fu, Taiping Yao, Shouhong Ding, Yunsheng Wu, Li Yuan:
Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning. 12615-12625 - Jingkai Wang, Jue Gong, Lin Zhang, Zheng Chen, Xing Liu, Hong Gu, Yutong Liu, Yulun Zhang, Xiaokang Yang:
OSDFace: One-Step Diffusion Model for Face Restoration. 12626-12636 - Mengqiu Xu, Kaixin Chen, Heng Guo, Yixiang Huang, Ming Wu, Zhenwei Shi, Chuang Zhang, Jun Guo:
MFogHub: Bridging Multi-Regional and Multi-Satellite Data for Global Marine Fog Detection and Forecasting. 12637-12646 - Qi Zang, Dong Zhao, Shuang Wang, Dou Quan, Zhun Zhong:
Feature Spectrum Learning for Remote Sensing Change Detection. 12647-12657 - Yinghui Xing, Litao Qu, Shizhou Zhang, Di Xu, Yingkun Yang, Yanning Zhang:
Dual-Granularity Semantic Guided Sparse Routing Diffusion Model for General Pansharpening. 12658-12668 - Jin-Liang Xiao, Ting-Zhu Huang, Liang-Jian Deng, Guang Lin, Zihan Cao, Chao Li, Qibin Zhao:
Hyperspectral Pansharpening via Diffusion Models with Iteratively Zero-Shot Guidance. 12669-12678 - Yuchen Wang, Hongyuan Wang, Lizhi Wang, Xin Wang, Lin Zhu, Wanxuan Lu, Hua Huang:
Complementary Advantages: Exploiting Cross-Field Frequency Correlation for NIR-Assisted Image Denoising. 12679-12689 - Ning Ni, Libao Zhang:
Hazy Low-Quality Satellite Video Restoration Via Learning Optimal Joint Degradation Patterns and Continuous-Scale Super-Resolution Reconstruction. 12690-12699 - Jiayi Fu, Siyu Liu, Zikun Liu, Chun-Le Guo, Hyunhee Park, Ruiqi Wu, Guoqing Wang, Chongyi Li:
Iterative Predictor-Critic Code Decoding for Real-World Image Dehazing. 12700-12709 - Lingshun Kong, Jiangxin Dong, Jinhui Tang, Ming-Hsuan Yang, Jinshan Pan:
Efficient Visual State Space Model for Image Deblurring. 12710-12719 - Hanze Liu, Jiahong Fu, Qi Xie, Deyu Meng:
Rotation-Equivariant Self-Supervised Method in Image Denoising. 12720-12730 - Xuyi He, Yuhui Quan, Ruotao Xu, Hui Ji:
A Universal Scale-Adaptive Deformable Transformer for Image Restoration across Diverse Artifacts. 12731-12741 - Du Chen, Tianhe Wu, Kede Ma, Lei Zhang:
Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption. 12742-12752 - Eduard Zamfir, Zongwei Wu, Nancy Mehta, Yuedong Tan, Danda Pani Paudel, Yulun Zhang, Radu Timofte:
Complexity Experts are Task-Discriminative Learners for Any Image Restoration. 12753-12763 - Wenyang Luo, Haina Qin, Zewen Chen, Libin Wang, Dandan Zheng, Yuming Li, Yufan Liu, Bing Li, Weiming Hu:
Visual-Instructed Degradation Diffusion for All-in-One Image Restoration. 12764-12777 - Libo Zhu, Jianze Li, Haotong Qin, Wenbo Li, Yulun Zhang, Yong Guo, Xiaokang Yang:
PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution. 12778-12788 - Isma Hadji, Mehdi Noroozi, Victor Escorcia, Anestis Zaganidis, Brais Martínez, Georgios Tzimiropoulos:
Edge-SD-SR: Low Latency and Parameter Efficient On-device Super-Resolution with Stable Diffusion via Bidirectional Conditioning. 12789-12798 - Feiyang Shen, Hongping Gan:
HUNet: Homotopy Unfolding Network for Image Compressive Sensing. 12799-12808 - Dehong Kong, Fan Li, Zhixin Wang, Jiaqi Xu, Renjing Pei, Wenbo Li, Wenqi Ren:
Dual Prompting Image Restoration with Diffusion Transformers. 12809-12819 - Jiaming Liu, Qi Zheng, Zihao Liu, Yilian Zhong, Peiye Liu, Tao Liu, Shusong Xu, Yanheng Lu, Sicheng Li, Dimin Niu, Yibo Fan:
Frequency-Biased Synergistic Design for Image Compression and Compensation. 12820-12829 - Beilin Chu, Xuan Xu, Xin Wang, Yufei Zhang, Weike You, Linna Zhou:
FIRE: Robust Detection of Diffusion-Generated Images via Frequency-Guided Reconstruction Error. 12830-12839 - Huayuan Ye, Shenzhuo Zhang, Shiqi Jiang, Jing Liao, Shuhang Gu, Dejun Zheng, Changbo Wang, Chenhui Li:
Robust Message Embedding via Attention Flow-Based Steganography. 12840-12849 - Jingbo Lu, Leheng Zhang, Xingyu Zhou, Mu Li, Wen Li, Shuhang Gu:
Learned Image Compression with Dictionary-based Entropy Model. 12850-12859 - Weinan Jia, Mengqi Huang, Nan Chen, Lei Zhang, Zhendong Mao:
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation. 12860-12870 - Anubhav Jain, Yuya Kobayashi, Takashi Shibuya, Yuhta Takida, Nasir D. Memon, Julian Togelius, Yuki Mitsufuji:
Classifier-Free Guidance Inside the Attraction Basin May Cause Memorization. 12871-12879 - Lei Wang, Senmao Li, Fei Yang, Jianye Wang, Ziheng Zhang, Yuhan Liu, Yaxing Wang, Jian Yang:
Not All Parameters Matter: Masking Diffusion Models for Enhancing Generation Ability. 12880-12890 - Hui Zhang, Tingwei Gao, Jie Shao, Zuxuan Wu:
BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers. 12891-12900 - Xinyin Ma, Runpeng Yu, Songhua Liu, Gongfan Fang, Xinchao Wang:
Diffusion Model is Effectively Its Own Teacher. 12901-12911 - Zhiwei Jia, Yuesong Nan, Huixi Zhao, Gengdai Liu:
Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward. 12912-12922 - Xin Ding, Lei Yu, Xin Li, Zhijun Tu, Hanting Chen, Jie Hu, Zhibo Chen:
RaSS: Improving Denoising Diffusion Samplers with Reinforced Active Sampling Scheduler. 12923-12933 - Kai Wang, Mingjia Shi, Yukun Zhou, Zekai Li, Zhihang Yuan, Yuzhang Shang, Xiaojiang Peng, Hanwang Zhang, Yang You:
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training. 12934-12944 - Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran, Jitendra Malik:
Scaling Properties of Diffusion Models For Perceptual Tasks. 12945-12954 - Yuqing Wang, Shuhuai Ren, Zhijie Lin, Yujin Han, Haoyuan Guo, Zhenheng Yang, Difan Zou, Jiashi Feng, Xihui Liu:
Parallelized Autoregressive Visual Generation. 12955-12965 - Chengyue Wu, Xiaokang Chen, Zhiyu Wu, Yiyang Ma, Xingchao Liu, Zizheng Pan, Wen Liu, Zhenda Xie, Xingkai Yu, Chong Ruan, Ping Luo:
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation. 12966-12977 - Shenghai Yuan, Jinfa Huang, Xianyi He, Yunyang Ge, Yujun Shi, Liuhan Chen, Jiebo Luo, Li Yuan:
Identity-Preserving Text-to-Video Generation by Frequency Decomposition. 12978-12988 - Weixi Feng, Chao Liu, Sifei Liu, William Yang Wang, Arash Vahdat, Weili Nie:
BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations. 12989-12998 - Jiazi Bu, Pengyang Ling, Pan Zhang, Tong Wu, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang:
ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way. 12999-13008 - Yuwei Guo, Ceyuan Yang, Anyi Rao, Chenlin Meng, Omer Bar-Tal, Shuangrui Ding, Maneesh Agrawala, Dahua Lin, Bo Dai:
Keyframe-Guided Creative Video Inpainting. 13009-13020 - Jaerin Lee, Daniel Sungho Jung, Kanggeon Lee, Kyoung Mu Lee:
SemanticDraw: Towards Real-Time Interactive Content Creation from Image Diffusion Models. 13021-13030 - Ryugo Morita, Stanislav Frolov, Brian Bernhard Moser, Takahiro Shirakawa, Ko Watanabe, Andreas Dengel, Jinjia Zhou:
TKG-DM: Training-free Chroma Key Content Generation Diffusion Model. 13031-13040 - Ziheng Ouyang, Zhen Li, Qibin Hou:
K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs. 13041-13050 - Chunnan Shang, Zhizhong Wang, Hongwei Wang, Xiangming Meng:
SCSA: A Plug-and-Play Semantic Continuous-Sparse Attention for Arbitrary Semantic Style Transfer. 13051-13060 - Ta Ying Cheng, Prafull Sharma, Mark Boss, Varun Jampani:
MARBLE: Material Recomposition and Blending in CLIP-Space. 13061-13071 - Zichen Liu, Yue Yu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Wen Wang, Zhiheng Liu, Qifeng Chen, Yujun Shen:
MagicQuill: An Intelligent Interactive Image Editing System. 13072-13082 - Yusuf Dalva, Kavana Venkatesh, Pinar Yanardag:
FluxSpace: Disentangled Semantic Editing in Rectified Flow Models. 13083-13092 - Jun Zhou, Jiahao Li, Zunnan Xu, Hanhui Li, Yiji Cheng, Fa-Ting Hong, Qin Lin, Qinglin Lu, Xiaodan Liang:
FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model. 13093-13103 - Zhengyao Fang, Pengyuan Lyu, Jingjing Wu, Chengquan Zhang, Jun Yu, Guangming Lu, Wenjie Pei:
Recognition-Synergistic Scene Text Editing. 13104-13113 - Mengtian Li, Jinshu Chen, Wanquan Feng, Bingchuan Li, Fei Dai, Songtao Zhao, Qian He:
HyperLoRA: Parameter-Efficient Adaptive Generation for Portrait Synthesis. 13114-13123 - Atharva Sehgal, Patrick Yuan, Ziniu Hu, Yisong Yue, Jennifer J. Sun, Swarat Chaudhuri:
Self-Evolving Visual Concept Library using Vision-Language Critics. 13124-13134 - Zixuan Wang, Duo Peng, Feng Chen, Yuwei Yang, Yinjie Lei:
Training-free Dense-Aligned Diffusion Guidance for Modular Conditional Image Synthesis. 13135-13145 - Feng Liang, Haoyu Ma, Zecheng He, Tingbo Hou, Ji Hou, Kunpeng Li, Xiaoliang Dai, Felix Juefei-Xu, Samaneh Azadi, Animesh Sinha, Peizhao Zhang, Peter Vajda, Diana Marculescu:
Movie Weaver: Tuning-Free Multi-Concept Video Personalization with Anchored Prompts. 13146-13156 - Xixi Hu, Keyang Xu, Bo Liu, Qiang Liu, Hongliang Fei:
AMO Sampler: Enhancing Text Rendering with Overshooting. 13157-13166 - Shuya Yang, Shaozhe Hao, Yukang Cao, Kwan-Yee K. Wong:
ArtiFade: Learning to Generate High-quality Subject from Blemished Images. 13167-13177 - Shufan Li, Konstantinos Kallidromitis, Akash Gokul, Zichun Liao, Yusuke Kato, Kazuki Kozuka, Aditya Grover:
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows. 13178-13188 - Enis Simsar, Thomas Hofmann, Federico Tombari, Pinar Yanardag:
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models. 13189-13198 - Zhanhao Liang, Yuhui Yuan, Shuyang Gu, Bohan Chen, Tiankai Hang, Mingxi Cheng, Ji Li, Liang Zheng:
Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization. 13199-13208 - Harsh Rangwani, Aishwarya Agarwal, Kuldeep Kulkarni, R. Venkatesh Babu, Srikrishna Karanam:
Composing Parts for Expressive Object Generation. 13209-13219 - Xin Xie, Dong Gong:
DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling. 13220-13230 - Stefan Andreas Baumann, Felix Krause, Michael Neumayr, Nick Stracke, Melvin Sevi, Vincent Tao Hu, Björn Ommer:
Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions. 13231-13241 - Lital Binyamin, Yoad Tewel, Hilit Segev, Eran Hirsch, Royi Rassin, Gal Chechik:
Make It Count: Text-to-Image Generation with an Accurate Number of Objects. 13242-13251 - Feifei Li, Mi Zhang, Yiming Sun, Min Yang:
Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization. 13252-13262 - Mingcheng Li, Xiaolu Hou, Ziyang Liu, Dingkang Yang, Ziyun Qian, Jiawei Chen, Jinjie Wei, Yue Jiang, Qingyao Xu, Lihua Zhang:
MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation. 13263-13272 - Xiaoqian Shen, Mohamed Elhoseiny:
StoryGPT-V: Large Language Models as Consistent Story Visualizers. 13273-13283 - Chengyou Jia, Changliang Xia, Zhuohang Dang, Weijia Wu, Hangwei Qian, Minnan Luo:
ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting. 13284-13293 - Shitao Xiao, Yueze Wang, Junjie Zhou, Huaying Yuan, Xingrun Xing, Ruiran Yan, Chaofan Li, Shuting Wang, Tiejun Huang, Zheng Liu:
OmniGen: Unified Image Generation. 13294-13304 - Dmitry Petrov, Pradyumn Goyal, Divyansh Shivashok, Yuanming Tao, Melinos Averkiou, Evangelos Kalogerakis:
ShapeWords: Guiding Text-to-Image Synthesis with 3D Shape-Aware Prompts. 13305-13314 - Jingxuan Wei, Cheng Tan, Qi Chen, Gaowei Wu, Siyuan Li, Zhangyang Gao, Linzhuang Sun, Bihui Yu, Ruifeng Guo:
From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing. 13315-13325 - Shivam Duggal, Yushi Hu, Oscar Michel, Aniruddha Kembhavi, William T. Freeman, Noah A. Smith, Ranjay Krishna, Antonio Torralba, Ali Farhadi, Wei-Chiu Ma:
Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation. 13326-13336 - Ming Li, Jike Zhong, Tianle Chen, Yuxiang Lai, Konstantinos Psounis:
EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark. 13337-13349 - Haoyu Wang, Le Wang, Sanping Zhou, Jingyi Tian, Zheng Qin, Yabing Wang, Gang Hua, Wei Tang:
Towards Precise Embodied Dialogue Localization via Causality Guided Diffusion. 13350-13360 - Eunji Kim, Siwon Kim, Minjun Park, Rahim Entezari, Sungroh Yoon:
Rethinking Training for De-biasing Text-to-Image Generation: Unlocking the Potential of Stable Diffusion. 13361-13370 - Mengfei Xia, Nan Xue, Yujun Shen, Ran Yi, Tieliang Gong, Yong-Jin Liu:
Rectified Diffusion Guidance for Conditional Generation. 13371-13380 - Lijun Li, Zhelun Shi, Xuhao Hu, Bowen Dong, Yiran Qin, Xihui Liu, Lu Sheng, Jing Shao:
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation. 13381-13392 - Naveen George, Karthik Nandan Dasaraju, Rutheesh Reddy Chittepu, Konda Reddy Mopuri:
The Illusion of Unlearning: The Unstable Nature of Machine Unlearning in Text-to-Image Diffusion Models. 13393-13402 - Peter V. Sushko, Ayana Bharadwaj, Zhi Yang Lim, Vasily Ilin, Ben Caffee, Dongping Chen, Mohammadreza Salehi, Cheng-Yu Hsieh, Ranjay Krishna:
RealEdit: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations. 13403-13413 - Long Xu, Jiakai Wang, Haojie Hao, Haotong Qin, Jiejie Zhao, Xianglong Liu:
Harnessing Global-Local Collaborative Adversarial Perturbation for Anti-Customization. 13414-13423 - Haonan An, Guang Hua, Zhengru Fang, Guowen Xu, Susanto Rahardja, Yuguang Fang:
Decoder Gradient Shield: Provable and High-Fidelity Prevention of Gradient-Based Box-Free Watermark Removal. 13424-13433 - Yuan Gan, Jiaxu Miao, Yunze Wang, Yi Yang:
Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation. 13434-13444 - Zexi Jia, Chuanwei Huang, Yeshuang Zhu, Hongyan Fei, Xiaoyue Duan, Zhiqiang Yuan, Ying Deng, Jiapei Zhang, Jinchao Zhang, Jie Zhou:
Secret Lies in Color: Enhancing AI-Generated Images Detection with Color Distribution Analysis. 13445-13454 - Siyuan Cheng, Lingjuan Lyu, Zhenting Wang, Xiangyu Zhang, Vikash Sehwag:
CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI. 13455-13465 - Ian Huang, Yanan Bao, Karen Truong, Howard Zhou, Cordelia Schmid, Leonidas J. Guibas, Alireza Fathi:
FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement. 13466-13476 - Chamin Hewa Koneputugodage, Yizhak Ben-Shabat, Sameera Ramasinghe, Stephen Gould:
VI^3NR: Variance Informed Initialization for Implicit Neural Representations. 13477-13486 - Lo-Wei Tai, Ching-En Li, Cheng-Lin Chen, Chih-Jung Tsai, Hwann-Tzong Chen, Tyng-Luh Liu:
EigenGS Representation: From Eigenspace to Gaussian Image Space. 13487-13496 - Ruoyu Xue, Jingyi Xu, Sounak Mondal, Hieu Le, Gregory J. Zelinsky, Minh Hoai, Dimitris Samaras:
Few-shot Personalized Scanpath Prediction. 13497-13507 - Pierre Vuillecard, Jean-Marc Odobez:
Enhancing 3D Gaze Estimation in the Wild using Weak Supervision with Gaze Following Labels. 13508-13518 - Zhifeng Xie, Qile He, Youjia Zhu, Qiwei He, Mengtian Li:
FilmComposer: LLM-Driven Music Production for Silent Film Clips. 13519-13528 - Saksham Singh Kushwaha, Yapeng Tian:
VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation. 13529-13539 - Hyeonggon Ryu, Seongyu Kim, Joon Son Chung, Arda Senocak:
Seeing Speech and Sound: Distinguishing and Locating Audio Sources in Visual Scenes. 13540-13549 - Ruohao Guo, Xianghua Ying, Yaru Chen, Dantong Niu, Guangyao Li, Liao Qu, Yanyu Qi, Jinxing Zhou, Bowei Xing, Wenzhen Yue, Ji Shi, Qixun Wang, Peiliang Zhang, Buwen Liang:
Audio-Visual Instance Segmentation. 13550-13560 - Yung-Hsuan Lai, Janek Ebbers, Yu-Chiang Frank Wang, François G. Germain, Michael J. Jones, Moitreya Chatterjee:
UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing. 13561-13570 - Bo Fang, Wenhao Wu, Qiangqiang Wu, Yuxin Song, Antoni B. Chan:
DistinctAD: Distinctive Audio Description Generation in Contexts. 13571-13581 - Kumar Ashutosh, Tushar Nagarajan, Georgios Pavlakos, Kris Kitani, Kristen Grauman:
ExpertAF: Expert Actionable Feedback from Video. 13582-13594 - Rong Gao, Xin Liu, Zhuozhao Hu, Bohao Xing, Baiqiang Xia, Zitong Yu, Heikki Kälviäinen:
FSBench: A Figure Skating Benchmark for Advancing Artistic Sports Understanding. 13595-13605 - Yuying Ge, Yizhuo Li, Yixiao Ge, Ying Shan:
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation. 13606-13617 - Tianyi Xiong, Xiyao Wang, Dong Guo, Qinghao Ye, Haoqi Fan, Quanquan Gu, Heng Huang, Chunyuan Li:
LLaVA-Critic: Learning to Evaluate Multimodal Models. 13618-13628 - Yiping Wang, Xuehai He, Kuan Wang, Luyao Ma, Jianwei Yang, Shuohang Wang, Simon Shaolei Du, Yelong Shen:
Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation. 13629-13638 - Zihui Xue, Joungbin An, Xitong Yang, Kristen Grauman:
Progress-Aware Video Frame Captioning. 13639-13650 - Tengda Han, Dilara Gokay, Joseph Heyward, Chuhan Zhang, Daniel Zoran, Viorica Patraucean, João Carreira, Dima Damen, Andrew Zisserman:
Learning from Streaming Video with Orthogonal Gradients. 13651-13660 - Jungin Park, Jiyoung Lee, Kwanghoon Sohn:
Bootstrap Your Own Views: Masked Ego-Exo Modeling for Fine-grained View-invariant Video Representations. 13661-13670 - Bozheng Li, Yongliang Wu, Yi Lu, Jiashuo Yu, Licheng Tang, Jiawang Cao, Wenqing Zhu, Yuyang Sun, Jay Wu, Wenbo Zhu:
VEU-Bench: Towards Comprehensive Understanding of Video Editing. 13671-13680 - Hongyeob Kim, Inyoung Jung, Dayoon Suh, Youjia Zhang, Sangmin Lee, Sungeun Hong:
Question-Aware Gaussian Experts for Audio-Visual Question Answering. 13681-13690 - Junjie Zhou, Yan Shu, Bo Zhao, Boya Wu, Zhengyang Liang, Shitao Xiao, Minghao Qin, Xi Yang, Yongping Xiong, Bo Zhang, Tiejun Huang, Zheng Liu:
MLVU: Benchmarking Multi-task Long Video Understanding. 13691-13701 - Kai Hu, Feng Gao, Xiaohan Nie, Peng Zhou, Son Tran, Tal Neiman, Lingyun Wang, Mubarak Shah, Raffay Hamid, Bing Yin, Trishul Chilimbi:
M-LLM Based Video Frame Selection for Efficient Video Understanding. 13702-13712 - Minjoon Jung, Junbin Xiao, Byoung-Tak Zhang, Angela Yao:
On the Consistency of Video Large Language Models in Temporal Comprehension. 13713-13722 - Chaoyu Li, Eun Woo Im, Pooyan Fazli:
VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding. 13723-13733 - Anxhelo Diko, Tinghuai Wang, Wassim Swaileh, Shiyan Sun, Ioannis Patras:
ReWind: Understanding Long Videos with Instructed Learnable Memory. 13734-13743 - Kyungho Bae, Jinhyung Kim, Sihaeng Lee, Soonyoung Lee, Gunhee Lee, Jinwoo Choi:
MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations. 13744-13753 - Yongliang Wu, Xinting Hu, Yuyang Sun, Yizhou Zhou, Wenbo Zhu, Fengyun Rao, Bernt Schiele, Xu Yang:
Number it: Temporal Grounding Videos like Flipping Manga. 13754-13765 - Andong Deng, Zhongpai Gao, Anwesa Choudhuri, Benjamin Planche, Meng Zheng, Bin Wang, Terrence Chen, Chen Chen, Ziyan Wu:
Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding. 13766-13775 - Zichen Liu, Kunlun Xu, Bing Su, Xu Zou, Yuxin Peng, Jiahuan Zhou:
STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding. 13776-13786 - Enrico Pallotta, Sina Mokhtarzadeh Azar, Shuai Li, Olga Zatsarynna, Juergen Gall:
SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction. 13787-13797 - Hao Du, Bo Wu, Yan Lu, Zhendong Mao:
SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation. 13798-13809 - Jirui Tian, Jinrong Zhang, Shenglan Liu, Luhao Xu, Zhixiong Huang, Gao Huang:
DTOS: Dynamic Time Object Sensing with Large Multimodal Model. 13810-13820 - Hao Fang, Runmin Cong, Xiankai Lu, Xiaofei Zhou, Sam Kwong, Wei Zhang:
Decoupled Motion Expression Video Segmentation. 13821-13831 - Chong Zhou, Chenchen Zhu, Yunyang Xiong, Saksham Suri, Fanyi Xiao, Lemeng Wu, Raghuraman Krishnamoorthi, Bo Dai, Chen Change Loy, Vikas Chandra, Bilge Soran:
EdgeTAM: On-Device Track Anything Model. 13832-13842 - Huaxin Zhang, Xiaohao Xu, Xiang Wang, Jialong Zuo, Xiaonan Huang, Changxin Gao, Shanjun Zhang, Li Yu, Nong Sang:
Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity. 13843-13853 - Valentin Gabeff, Haozhe Qi, Brendan Flaherty, Gencer Sumbul, Alexander Mathis, Devis Tuia:
MammAlps: A Multi-view Video Behavior Monitoring Dataset of Wild Mammals in the Swiss Alps. 13854-13864 - Mengnan Liu, Le Wang, Sanping Zhou, Kun Xia, Xiaolong Sun, Gang Hua:
Boosting Point-Supervised Temporal Action Localization through Integrating Query Reformation and Optimal Transport. 13865-13875 - Anqi Zhu, Jingmin Zhu, James Bailey, Mingming Gong, Qiuhong Ke:
Semantic-guided Cross-Modal Prompt Learning for Skeleton-based Zero-shot Action Recognition. 13876-13885 - Hongkai Wei, Yang Yang, Shijie Sun, Mingtao Feng, Xiangyu Song, Qi Lei, Hongli Hu, Rong Wang, Huansheng Song, Naveed Akhtar, Ajmal Saeed Mian:
Mono3DVLT: Monocular-Video-Based 3D Visual Language Tracking. 13886-13896 - Manfred Georg, Garrett Tanzer, Esha Uboweja, Saad Hassan, Maximus Shengelia, Sam S. Sepah, Sean Forbes, Thad Starner:
FSboard: Over 3 Million Characters of ASL Fingerspelling Collected via Smartphones. 13897-13906 - Chanhui Lee, Yeonghwan Song, Jeany Son:
Data-free Universal Adversarial Perturbation with Pseudo-semantic Prior. 13907-13916 - Qian Wang, Chen Li, Yuchen Luo, Hefei Ling, Shijuan Huang, Ruoxi Jia, Ning Yu:
Detecting Adversarial Data Using Perturbation Forgery. 13917-13926 - Jikang Cheng, Zhiyuan Yan, Ying Zhang, Li Hao, Jiaxin Ai, Qin Zou, Chen Li, Zhongyuan Wang:
Stacking Brick by Brick: Aligned Feature Isolation for Incremental Face Forgery Detection. 13927-13936 - Minchul Kim, Dingqiang Ye, Yiyang Su, Feng Liu, Xiaoming Liu:
SapiensID: Foundation for Human Recognition. 13937-13947 - Donghyun Lee, Yuhang Li, Youngeun Kim, Shiting Xiao, Priyadarshini Panda:
Spiking Transformer with Spatial-Temporal Attention. 13948-13958 - Tianqing Zhang, Kairong Yu, Xian Zhong, Hongwei Wang, Qi Xu, Qiang Zhang:
STAA-SNN: Spatial-Temporal Attention Aggregator for Spiking Neural Networks. 13959-13969 - Soikat Hasan Ahmed, Jan Finkbeiner, Emre Neftci:
Efficient Event-Based Object Detection: A Hybrid Neural Network with Spatial and Temporal Attention. 13970-13979 - Xin Liang, Yogesh S. Rawat:
DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID. 13980-13989 - Minsu Kim, Seungryong Kim, Kwanghoon Sohn:
Mixture of Submodules for Domain Adaptive Person Search. 13990-14001 - Xiaofei Hui, Haoxuan Qu, Hossein Rahmani, Jun Liu:
An Image-like Diffusion Method for Human-Object Interaction Detection. 14002-14012 - Haoliang Meng, Xiaopeng Hong, Zhengqin Lai, Miao Shang:
Free Lunch Enhancements for Multi-modal Crowd Counting. 14013-14023 - Ruibin Li, Tao Yang, Song Guo, Lei Zhang:
RORem: Training a Robust Object Remover with Human-in-the-Loop. 14024-14035 - Hao Zhu, Yan Zhu, Jiayu Xiao, Tianxiang Xiao, Yike Ma, Yucheng Zhang, Feng Dai:
Exact: Exploring Space-Time Perceptive Clues for Weakly Supervised Satellite Image Time Series Semantic Segmentation. 14036-14045 - Chenxi Xie, Minghan Li, Hui Zeng, Jun Luo, Lei Zhang:
MaSS13K: A Matting-level Semantic Segmentation Benchmark. 14046-14056 - Wonseok Roh, Hwanhee Jung, Giljoo Nam, Dong In Lee, Hyeongcheol Park, Sang Ho Yoon, Jungseock Joo, Sangpil Kim:
Insightful Instance Features for 3D Instance Segmentation. 14057-14067 - Xinyu Zhao, Jun Xie, Shengzhe Chen, Jun Liu:
Convex Combination Star Shape Prior for Data-driven Image Semantic Segmentation. 14068-14077 - Haijie Li, Yanmin Wu, Jiarui Meng, Qiankun Gao, Zhiyao Zhang, Ronggang Wang, Jian Zhang:
InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception. 14078-14088 - Junha Lee, Chunghyun Park, Jaesung Choe, Yu-Chiang Frank Wang, Jan Kautz, Minsu Cho, Christopher B. Choy:
Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation. 14089-14101 - Xingchen Liu, Piyush Tayal, Jianyuan Wang, Jesus Zarzar, Tom Monnier, Konstantinos Tertikas, Jiali Duan, Antoine Toisoul, Jason Y. Zhang, Natalia Neverova, Andrea Vedaldi, Roman Shapovalov, David Novotný:
UnCommon Objects in 3D. 14102-14113 - Hongjia Zhai, Hai Li, Zhenzhe Li, Xiaokun Pan, Yijia He, Guofeng Zhang:
PanoGS: Gaussian-based Panoptic Segmentation for 3D Open Vocabulary Scene Understanding. 14114-14124 - Yan Wang, Baoxiong Jia, Ziyu Zhu, Siyuan Huang:
Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding. 14125-14136 - Kim Jun-Seong, GeonU Kim, Kim Yu-Ji, Yu-Chiang Frank Wang, Jaesung Choe, Tae-Hyun Oh:
Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration. 14137-14146 - Hanxun Yu, Wentong Li, Song Wang, Junbo Chen, Jianke Zhu:
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning. 14147-14157 - Shengqiong Wu, Hao Fei, Tat-Seng Chua:
Universal Scene Graph Generation. 14158-14168 - Jingzhou Luo, Yang Liu, Weixing Chen, Zhen Li, Yaowei Wang, Guanbin Li, Liang Lin:
DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering. 14169-14178 - Shijie Zhou, Hui Ren, Yijia Weng, Shuwang Zhang, Zhen Wang, Dejia Xu, Zhiwen Fan, Suya You, Zhangyang Wang, Leonidas J. Guibas, Achuta Kadambi:
Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields. 14179-14190 - Zihan Wang, Gim Hee Lee:
g3D-LF: Generalizable 3D-Language Feature Fields for Embodied Tasks. 14191-14202 - Jianwei Yang, Reuben Tan, Qianhui Wu, Ruijie Zheng, Baolin Peng, Yongyuan Liang, Yu Gu, Mu Cai, Seonghyeon Ye, Joel Jang, Yuquan Deng, Jianfeng Gao:
Magma: A Foundation Model for Multimodal AI Agents. 14203-14214 - Jing Zhu, Yuhang Zhou, Shengyi Qian, Zhongmou He, Tong Zhao, Neil Shah, Danai Koutra:
Mosaic of Modalities: A Comprehensive Benchmark for Multimodal Graph Learning. 14215-14224 - Zihao Zhang, Aming Wu, Yahong Han:
Style Evolving along Chain-of-Thought for Unknown-Domain Object Detection. 14225-14234 - Yuanze Lin, Yunsheng Li, Dongdong Chen, Weijian Xu, Ronald Clark, Philip Torr:
Olympus: A Universal Task Router for Computer Vision Tasks. 14235-14246 - Bardia Safaei, Faizan Siddiqui, Jiacong Xu, Vishal M. Patel, Shao-Yuan Lo:
Filter Images First, Generate Instructions Later: Pre-Instruction Data Selection for Visual Instruction Tuning. 14247-14256 - Ji Hyeok Jung, Eun Tae Kim, Seo Yeon Kim, Joo Ho Lee, Bumsoo Kim, Buru Chang:
Is 'Right' Right? Enhancing Object Orientation Understanding in Multimodal Large Language Models through Egocentric Instruction Tuning. 14257-14267 - Yunze Man, De-An Huang, Guilin Liu, Shiwei Sheng, Shilong Liu, Liang-Yan Gui, Jan Kautz, Yu-Xiong Wang, Zhiding Yu:
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought. 14268-14280 - Xuanbai Chen, Xiang Xu, Zhihua Li, Tianchen Zhao, Pietro Perona, Qin Zhang, Yifan Xing:
Model Diagnosis and Correction via Linguistic and Implicit Attribute Editing. 14281-14292 - Sébastien Piérard, Anaïs Halin, Anthony Cioppa, Adrien Deliège, Marc Van Droogenbroeck:
Foundations of the Theory of Performance-Based Ranking. 14293-14302 - Sagar Soni, Akshay Dudhane, Hiyam Debary, Mustansar Fiaz, Muhammad Akhtar Munir, Muhammad Sohail Danish, Paolo Fraccaro, Campbell D. Watson, Levente J. Klein, Fahad Shahbaz Khan, Salman H. Khan:
EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues. 14303-14313 - Yiyang Fang, Wenke Huang, Guancheng Wan, Kehua Su, Mang Ye:
EMOE: Modality-Specific Enhanced Dynamic Emotion Experts. 14314-14324 - Fengxiang Wang, Hongzhen Wang, Zonghao Guo, Di Wang, Yulin Wang, Mingshuo Chen, Qiang Ma, Long Lan, Wenjing Yang, Jing Zhang, Zhiyuan Liu, Maosong Sun:
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery? 14325-14336 - Erjian Guo, Zhen Zhao, Zicheng Wang, Tong Chen, Yunyi Liu, Luping Zhou:
DiN: Diffusion Model for Robust Medical VQA with Semantic Noisy Labels. 14337-14346 - Xiaofu Chen, Yaxin Luo, Gen Luo, Jiayi Ji, Henghui Ding, Yiyi Zhou:
DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension. 14347-14357 - Heng Yin, Yuqiang Ren, Ke Yan, Shouhong Ding, Yongtao Hao:
ROD-MLLM: Towards More Reliable Object Detection in Multimodal Large Language Models. 14358-14368 - Guofeng Mei, Wei Lin, Luigi Riz, Yujiao Wu, Fabio Poiesi, Yiming Wang:
PerLA: Perceptive 3D Language Assistant. 14369-14379 - Zhantao Yang, Ruili Feng, Keyu Yan, Huangji Wang, Zhicai Wang, Shangwen Zhu, Han Zhang, Jie Xiao, Pingyu Wu, Kai Zhu, Jixuan Chen, Chen-Wei Xie, Yue Yang, Hongyang Zhang, Yu Liu, Fan Cheng:
BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs. 14380-14389 - Yang Qin, Chao Chen, Zhihang Fu, Dezhong Peng, Xi Peng, Peng Hu:
Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification. 14390-14399 - Yuanmin Tang, Jue Zhang, Xiaoting Qin, Jing Yu, Gaopeng Gou, Gang Xiong, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Wu:
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval. 14400-14410 - Zhaoran Zhao, Peng Lu, Anran Zhang, Peipei Li, Xia Li, Xuannan Liu, Yang Hu, Shiyi Chen, Liwei Wang, Wenhao Guo:
Can Machines Understand Composition? Dataset and Benchmark for Photographic Image Composition Embedding and Understanding. 14411-14421 - Vishaal Udandarao, Nikhil Parthasarathy, Muhammad Ferjad Naeem, Talfan Evans, Samuel Albanie, Federico Tombari, Yongqin Xian, Alessio Tonioni, Olivier J. Hénaff:
Active Data Curation Effectively Distills Large-Scale Multimodal Models. 14422-14437 - Thao Nguyen, Krishna Kumar Singh, Jing Shi, Trung Bui, Yong Jae Lee, Yuheng Li:
Yo'Chameleon: Personalized Vision and Language Generation. 14438-14448 - Zi-Han Jiang, Chien-Wei Lin, Wei-Hua Li, Hsuan-Tung Liu, Yi-Ren Yeh, Chu-Song Chen:
Relation-Rich Visual Document Generator for Visual Information Extraction. 14449-14459 - Zining Wang, Tongkun Guan, Pei Fu, Chen Duan, Qianyi Jiang, Zhentao Guo, Shan Guo, Junfeng Luo, Wei Shen, Xiaokang Yang:
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding. 14460-14471 - Zhaoqing Zhu, Chuwei Luo, Zirui Shao, Feiyu Gao, Hangdi Xing, Qi Zheng, Ji Zhang:
A Simple yet Effective Layout Token in Large Language Models for Document Understanding. 14472-14482 - Zhiyuan You, Xin Cai, Jinjin Gu, Tianfan Xue, Chao Dong:
Teaching Large Language Models to Regress Accurate Image Quality Scores Using Score Distribution. 14483-14494 - Mothilal Asokan, Kebin Wu, Fatima Albreiki:
FineLIP: Extending CLIP's Reach via Fine-Grained Alignment with Longer Text Inputs. 14495-14504 - Lucas Morin, Valéry Weber, Ahmed Nassar, Gerhard Ingmar Meijer, Luc Van Gool, Yawei Li, Peter W. J. Staar:
MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures. 14505-14515 - Andrea Maracani, Savas Özkan, Sijun Cho, Hyowon Kim, Eunchung Noh, Jeongwon Min, Cho Jung Min, Dookun Park, Mete Ozay:
Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation. 14516-14526 - Xin Zhang, Robby T. Tan:
Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation. 14527-14537 - Haoran Hao, Jiaming Han, Changsheng Li, Yu-Feng Li, Xiangyu Yue:
RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models. 14538-14548 - Omri Kaduri, Shai Bagon, Tali Dekel:
What's in the Image? A Deep-Dive into the Vision of Vision Language Models. 14549-14558 - Chenxin Tao, Shiqian Su, Xizhou Zhu, Chenyu Zhang, Zhe Chen, Jiawen Liu, Wenhai Wang, Lewei Lu, Gao Huang, Yu Qiao, Jifeng Dai:
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding. 14559-14569 - Bo Tong, Bokai Lai, Yiyi Zhou, Gen Luo, Yunhang Shen, Ke Li, Xiaoshuai Sun, Rongrong Ji:
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression. 14570-14581 - Mohamed Dhouib, Davide Buscaldi, Sonia Vanier, Aymen Shabou:
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models. 14582-14592 - Long Xing, Qidong Huang, Xiaoyi Dong, Jiajie Lu, Pan Zhang, Yuhang Zang, Yuhang Cao, Conghui He, Jiaqi Wang, Feng Wu, Dahua Lin:
Conical Visual Concentration for Efficient Large Vision-Language Models. 14593-14603 - Le Zhang, Qian Yang, Aishwarya Agrawal:
Assessing and Learning Alignment of Unimodal Vision and Language Models. 14604-14614 - Ke Zhu, Yu Wang, Yanpeng Sun, Qiang Chen, Jiangjiang Liu, Gang Zhang, Jingdong Wang:
Continual SFT Matches Multimodal RLHF with Negative Supervision. 14615-14624 - Hao Yin, Guangzong Si, Zilei Wang:
ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models. 14625-14634 - Le Yang, Ziwei Zheng, Boxu Chen, Zhengyu Zhao, Chenhao Lin, Chao Shen:
Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection. 14635-14645 - Yuanchen Wu, Lu Zhang, Hang Yao, Junlong Du, Ke Yan, Shouhong Ding, Yunsheng Wu, Xiaoqiang Li:
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception. 14646-14656 - Zhenting Wang, Shuming Hu, Shiyu Zhao, Xiaowen Lin, Felix Juefei-Xu, Zhuowei Li, Ligong Han, Harihar Subramanyam, Li Chen, Jianfa Chen, Nan Jiang, Lingjuan Lyu, Shiqing Ma, Dimitris N. Metaxas, Ankit Jain:
MLLM-as-a-Judge for Image Safety without Human Labeling. 14657-14666 - Yuan-Hong Liao, Rafid Mahmood, Sanja Fidler, David Acuna:
Can Large Vision-Language Models Correct Semantic Grounding Errors By Themselves? 14667-14678 - Peng Xie, Yequan Bie, Jianda Mao, Yangqiu Song, Yang Wang, Hao Chen, Kani Chen:
Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks. 14679-14689 - Sanghwan Kim, Rui Xiao, Mariana-Iuliana Georgescu, Stephan Alaniz, Zeynep Akata:
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training. 14690-14700 - Ziliang Chen, Xin Huang, Xiaoxuan Fan, Keze Wang, Yuyu Zhou, Quanlong Guan, Liang Lin:
Reproducible Vision-Language Models Meet Concepts Out of Pre-Training. 14701-14711 - Chong Yu, Tao Chen, Zhongxue Gan:
Once-Tuning-Multiple-Variants: Tuning Once and Expanded as Multiple Vision-Language Model Variants. 14712-14722 - Shihan Wu, Ji Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Heng Tao Shen:
Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves. 14723-14732 - Qi Zhu, Jiangwei Lao, Deyi Ji, Junwei Luo, Kang Wu, Yingying Zhang, Lixiang Ru, Jian Wang, Jingdong Chen, Ming Yang, Dong Liu, Feng Zhao:
SkySense-O: Towards Open-World Remote Sensing Interpretation with Vision-Centric Visual-Language Modeling. 14733-14744 - Fusheng Hao, Fengxiang He, Fuxiang Wu, Tichao Wang, Chengqun Song, Jun Cheng:
Task-Aware Clustering for Prompting Vision-Language Models. 14745-14755 - Yuxin Fan, Junbiao Cui, Jiye Liang:
Learning Textual Prompts for Open-World Semi-Supervised Learning. 14756-14765 - Taha Koleilat, Hojat Asgariandehkordi, Hassan Rivaz, Yiming Xiao:
BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models. 14766-14776 - Giorgos Kordopatis-Zilos, Vladan Stojnic, Anna Manko, Pavel Suma, Nikolaos-Antonios Ypsilantis, Nikos Efthymiadis, Zakaria Laskar, Jiri Matas, Ondrej Chum, Giorgos Tolias:
ILIAS: Instance-Level Image retrieval At Scale. 14777-14787 - Vishwesh Nath, Wenqi Li, Dong Yang, Andriy Myronenko, Mingxin Zheng, Yao Lu, Zhijian Liu, Hongxu Yin, Yee Man Law, Yucheng Tang, Pengfei Guo, Can Zhao, Ziyue Xu, Yufan He, Stephanie A. Harmon, Benjamin Simon, Greg Heinrich, Stephen R. Aylward, Marc Edgar, Michael Zephyr, Pavlo Molchanov, Baris Turkbey, Holger Roth, Daguang Xu:
VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge. 14788-14798 - Tahira Kazimi, Ritika Allada, Pinar Yanardag:
Explaining in Diffusion: Explaining a Classifier with Diffusion Semantics. 14799-14809 - Bo Wang, Dingwei Tan, Yen-Ling Kuo, Zhaowei Sun, Jeremy M. Wolfe, Tat-Jen Cham, Mengmi Zhang:
Gazing at Rewards: Eye Movements as a Lens into Human and AI Decision-Making in Hybrid Visual Foraging. 14810-14823 - Junjie Wang, Bin Chen, Yulin Li, Bin Kang, Yichi Chen, Zhuotao Tian:
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception. 14824-14834 - Pedro Hermosilla, Christian Stippel, Leon Sick:
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding. 14835-14844 - Zheda Mai, Ping Zhang, Cheng-Hao Tu, Hong-You Chen, Quang-Huy Nguyen, Li Zhang, Wei-Lun Chao:
Lessons and Insights from a Unifying Study of Parameter-Efficient Fine-Tuning (PEFT) in Visual Recognition. 14845-14857 - Seungmin Baek, Soyul Lee, Hayeon Jo, Hyesong Choi, Dongbo Min:
TADFormer: Task-Adaptive Dynamic TransFormer for Efficient Multi-Task Learning. 14858-14868 - Xuan Cai, Renjie Pan, Hua Yang:
LoKi: Low-dimensional KAN for Efficient Fine-tuning Image Models. 14869-14880 - Ondrej Týbl, Lukás Neumann:
Training-free Neural Architecture Search through Variance of Knowledge of Deep Network Weights. 14881-14890 - Zhuguanyu Wu, Shihe Wang, Jiayi Zhang, Jiaxin Chen, Yunhong Wang:
FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation. 14891-14900 - Jiachen Zhu, Xinlei Chen, Kaiming He, Yann LeCun, Zhuang Liu:
Transformers without Normalization. 14901-14911 - Abdelrahman M. Shaker, Syed Talal Wasim, Salman H. Khan, Juergen Gall, Fahad Shahbaz Khan:
GroupMamba: Efficient Group-Based Visual State Space Model. 14912-14922 - Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim:
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality. 14923-14933 - Xiaoyong Lu, Songlin Du:
JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba. 14934-14943 - Feng Wang, Jiahao Wang, Sucheng Ren, Guoyizhe Wei, Jieru Mei, Wei Shao, Yuyin Zhou, Alan L. Yuille, Cihang Xie:
Mamba-Reg: Vision Mamba Also Needs Registers. 14944-14953 - Cheng Lei, Ao Li, Hu Yao, Ce Zhu, Le Zhang:
Rethinking Token Reduction with Parameter-Efficient Fine-Tuning in ViT for Pixel-Level Tasks. 14954-14964 - Rong Qin, Xin Liu, Xingyu Liu, Jiaxuan Liu, Jinglei Shi, Liang Lin, Jufeng Yang:
No Pains, More Gains: Recycling Sub-Salient Patches for Efficient High-Resolution Image Recognition. 14965-14975 - Lu Yu, Haoyu Han, Zhe Tao, Hantao Yao, Changsheng Xu:
Language Guided Concept Bottleneck Models for Interpretable Continual Learning. 14976-14986 - Shenghao Fu, Qize Yang, Qijie Mo, Junkai Yan, Xihan Wei, Jingke Meng, Xiaohua Xie, Wei-Shi Zheng:
LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models. 14987-14997 - Yongkang Li, Tianheng Cheng, Bin Feng, Wenyu Liu, Xinggang Wang:
Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation. 14998-15008 - Zelin Peng, Zhengqin Xu, Zhilin Zeng, Yu Huang, Yaoming Wang, Wei Shen:
Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation. 15009-15020 - Xiao-Hui Li, Fei Yin, Cheng-Lin Liu:
DocSAM: Unified Document Image Segmentation via Query Decomposition and Heterogeneous Mixed Learning. 15021-15032 - Chanyoung Kim, Dayun Ju, Woojung Han, Ming-Hsuan Yang, Seong Jae Hwang:
Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation. 15033-15042 - Dong Zhao, Jinlong Li, Shuang Wang, Mengyao Wu, Qi Zang, Nicu Sebe, Zhun Zhong:
FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation. 15043-15054 - Jian Wang, Tianhong Dai, Bingfeng Zhang, Siyue Yu, Eng Gee Lim, Jimin Xiao:
POT: Prototypical Optimal Transport for Weakly Supervised Semantic Segmentation. 15055-15064 - Thanh-Dat Truong, Utsav Prabhu, Bhiksha Raj, Jackson David Cothren, Khoa Luu:
FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding. 15065-15075 - Shifan Zhang, Hongzi Zhu, Yinan He, Minyi Guo, Ziyang Lou, Shan Chang:
WISNet: Pseudo Label Generation on Unbalanced and Patch Annotated Waste Images. 15076-15085 - Tian Liu, Huixin Zhang, Shubham Parashar, Shu Kong:
Few-Shot Recognition via Stage-Wise Retrieval-Augmented Finetuning. 15086-15097 - Marco Garosi, Alessandro Conti, Gaowen Liu, Elisa Ricci, Massimiliano Mancini:
Compositional Caching for Training-free Open-vocabulary Attribute Detection. 15098-15107 - Zilin Wang, Sangwoo Mo, Stella X. Yu, Sima Behpour, Liu Ren:
Open Ad-hoc Categorization with Contextualized Feature Learning. 15108-15117 - Zhengyuan Peng, Jinpeng Ma, Zhimin Sun, Ran Yi, Haichuan Song, Xin Tan, Lizhuang Ma:
MOS: Modeling Object-Scene Associations in Generalized Category Discovery. 15118-15128 - Mankeerat Sidhu, Hetarth Chopra, Ansel Blume, Jeonghwan Kim, Revanth Gangi Reddy, Heng Ji:
Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval. 15129-15138 - Konstantinos Panagiotis Alexandridis, Ismail Elezi, Jiankang Deng, Anh Nguyen, Shan Luo:
Fractal Calibration for Long-tailed Object Detection. 15139-15150 - Quentin Guimard, Moreno D'Incà, Massimiliano Mancini, Elisa Ricci:
Classifier-to-Bias: Toward Unsupervised Automatic Bias Detection for Visual Classifiers. 15151-15161 - Shihua Huang, Zhichao Lu, Xiaodong Cun, Yongjun Yu, Xiao Zhou, Xi Shen:
DEIM: DETR with Improved Matching for Fast Convergence. 15162-15171 - Songlong Xing, Zhengyu Zhao, Nicu Sebe:
CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP. 15172-15182 - Zhonghang Liu, Kun Zhou, Changshuo Wang, Wen-Yan Lin, Jiangbo Lu:
FlexUOD: The Answer to Real-world Unsupervised Image Outlier Detection. 15183-15193 - Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Ming Tang, Jinqiao Wang:
UniVAD: A Training-free Unified Model for Few-shot Visual Anomaly Detection. 15194-15203 - Jinjin Zhang, Guodong Wang, Yizhou Jin, Di Huang:
Towards Training-free Anomaly Detection with Vision and Language Foundation Models. 15204-15213 - Wenbing Zhu, Lidong Wang, Ziqing Zhou, Chengjie Wang, Yurui Pan, Ruoyi Zhang, Zhuhao Chen, Linjie Cheng, Bin-Bin Gao, Jiangning Zhang, Zhenye Gan, Yuxie Wang, Yulong Chen, Shuguang Qian, Mingmin Chi, Bo Peng, Lizhuang Ma:
Real-IAD D3: A Real-World 2D/Pseudo-3D/3D Dataset for Industrial Anomaly Detection. 15214-15223 - Sheng Wu, Yimi Wang, Xudong Liu, Yuguang Yang, Runqi Wang, Guodong Guo, David S. Doermann, Baochang Zhang:
DFM: Differentiable Feature Matching for Anomaly Detection. 15224-15233 - Xiaoyi Qu, David Aponte, Colby R. Banbury, Daniel P. Robinson, Tianyu Ding, Kazuhito Koishida, Ilya Zharkov, Tianyi Chen:
Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression. 15234-15244 - Xiao Cui, Yulei Qin, Wengang Zhou, Hongsheng Li, Houqiang Li:
OPTICAL: Leveraging Optimal Transport for Contribution Allocation in Dataset Distillation. 15245-15254 - Yushuai Sun, Zikun Zhou, Dongmei Jiang, Yaowei Wang, Jun Yu, Guangming Lu, Wenjie Pei:
Learning Compatible Multi-Prize Subnetworks for Asymmetric Retrieval. 15255-15264 - Biqing Qi, Fangyuan Li, Zhen Wang, Junqi Gao, Dong Li, Peng Ye, Bowen Zhou:
Less is More: Efficient Model Merging with Binary Task Switch. 15265-15274 - Carlos Garrido-Munoz, Jorge Calvo-Zaragoza:
On the Generalization of Handwritten Text Recognition Models. 15275-15286 - Tao Sun, Yuhao Huang, Li Shen, Kele Xu, Bao Wang:
Investigating the Role of Weight Decay in Enhancing Nonconvex SGD. 15287-15296 - Yusong Hu, Zichen Liang, Fei Yang, Qibin Hou, Xialei Liu, Ming-Ming Cheng:
KAC: Kolmogorov-Arnold Classifier for Continual Learning. 15297-15307 - Xuan Liu, Xiaobin Chang:
LoRA Subtraction for Drift-Resistant Space in Exemplar-Free Continual Learning. 15308-15318 - Chenggong Ni, Fan Lyu, Jiayao Tan, Fuyuan Hu, Rui Yao, Tao Zhou:
Maintaining Consistent Inter-Class Topology in Continual Test-Time Adaptation. 15319-15328 - Juntae Lee, Munawar Hayat, Sungrack Yun:
Tripartite Weight-Space Ensemble for Few-Shot Class-Incremental Learning. 15329-15338 - Seonghyeon Hwang, Minsu Kim, Steven Euijong Whang:
T-CIL: Temperature Scaling using Adversarial Perturbation for Calibration in Class-Incremental Learning. 15339-15348 - Aodi Li, Liansheng Zhuang, Xiao Long, Minghong Yao, Shafei Wang:
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes. 15349-15359 - Dong Kyu Cho, Inwoo Hwang, Sanghack Lee:
PEER Pressure: Model-to-Model Regularization for Single Source Domain Generalization. 15360-15370 - Marzi Heidari, Abdullah Alchihabi, Hao Yan, Yuhong Guo:
A Unified Framework for Heterogeneous Semi-supervised Learning. 15371-15380 - Bo Cheng, Jueqing Lu, Yuan Tian, Haifeng Zhao, Yi Chang, Lan Du:
CGMatch: A Different Perspective of Semi-supervised Learning. 15381-15391 - Yucong Dai, Shilin Gu, Ruidong Fan, Chao Xu, Chenping Hou:
Label Shift Meets Online Learning: Ensuring Consistent Adaptation with Universal Dynamic Regret. 15392-15401 - Zhuo Xu, Xiang Xiang, Yifan Liang:
Overcoming Shortcut Problem in VLM for Robust Out-of-Distribution Detection. 15402-15412 - Yuhang Liu, Wenjie Zhao, Yunhui Guo:
H2ST: Hierarchical Two-Sample Tests for Continual Out-of-Distribution Detection. 15413-15423 - Litian Liu, Yao Qin:
Detecting Out-of-Distribution Through the Lens of Neural Collapse. 15424-15433 - Chenhe Hao, Weiying Xie, Daixun Li, Haonan Qin, Hangyu Ye, Leyuan Fang, Yunsong Li:
FedCS: Coreset Selection for Federated Learning. 15434-15443 - Hao Zheng, Zhigang Hu, Liu Yang, Meiguang Zheng, Aikun Xu, Boyu Wang:
FedCALM: Conflict-aware Layer-wise Mitigation for Selective Aggregation in Deeper Personalized Federated Learning. 15444-15453 - Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong:
Model Poisoning Attacks to Federated Learning via Multi-Round Consistency. 15454-15463 - Zihan Tan, Guancheng Wan, Wenke Huang, He Li, Guibin Zhang, Carl Yang, Mang Ye:
FedSPA: Generalizable Federated Graph Learning under Homophily Heterogeneity. 15464-15475 - Yuhang Wang, Junkang Guo, Aolei Liu, Kaihao Wang, Zaitong Wu, Zhenyu Liu, Wenfei Yin, Jian Liu:
TAET: Two-Stage Adversarial Equalization Training on Long-Tailed Distributions. 15476-15485 - Weiwei Li, Junzhuo Liu, Yuanyuan Ren, Yuchen Zheng, Yahao Liu, Wen Li:
Let Samples Speak: Mitigating Spurious Correlation by Exploiting the Clusterness of Samples. 15486-15496 - Jinxu Lin, Linwei Tao, Minjing Dong, Chang Xu:
Uncertainty Weighted Gradients for Model Calibration. 15497-15507 - Wei Liu, Yufei Chen, Xiaodong Yue:
Enhancing Testing-Time Robustness for Trusted Multi-View Classification in the Wild. 15508-15517 - Zhibin Dong, Meng Liu, Siwei Wang, Ke Liang, Yi Zhang, Suyuan Liu, Jiaqi Jin, Xinwang Liu, En Zhu:
Enhanced then Progressive Fusion with View Graph for Multi-View Clustering. 15518-15527 - Zheming Xu, He Liu, Congyan Lang, Tao Wang, Yidong Li, Michael C. Kampffmeyer:
A Hubness Perspective on Representation Learning for Graph-Based Multi-View Clustering. 15528-15537 - Dileepa Pitawela, Gustavo Carneiro, Hsiang-Ting Chen:
CLOC: Contrastive Learning for Ordinal Classification with Multi-Margin N-pair Loss. 15538-15548 - Siyi Du, Xinzhe Luo, Declan P. O'Regan, Chen Qin:
STiL: Semi-supervised Tabular-Image Learning for Comprehensive Task-Relevant Information Exploration in Multimodal Classification. 15549-15559 - Jie Liu, Tiexin Qin, Hui Liu, Yilei Shi, Lichao Mou, Xiao Xiang Zhu, Shiqi Wang, Haoliang Li:
Q-PART: Quasi-Periodic Adaptive Regression with Test-time Training for Pediatric Left Ventricular Ejection Fraction Regression. 15560-15569 - Bingzhi Chen, Sisi Fu, Xiaocheng Fang, Jieyi Cai, Boya Zhang, Minhua Lu, Yishu Liu:
OralXrays-9: Towards Hospital-Scale Panoramic X-ray Anomaly Detection via Personalized Multi-Object Query-Aware Mining. 15570-15579 - Sang-Jun Park, Keun-Soo Heo, Dong-Hee Shin, Young-Han Son, Ji-Hye Oh, Tae-Eui Kam:
DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation. 15580-15589 - Zhengrui Guo, Conghao Xiong, Jiabo Ma, Qichen Sun, Lishuang Feng, Jinzhuo Wang, Hao Chen:
FOCUS: Knowledge-enhanced Adaptive Visual Compression for Few-shot Whole Slide Image Classification. 15590-15600 - Tingting Zheng, Kui Jiang, Yi Xiao, Sicheng Zhao, Hongxun Yao:
M3amba: Memory Mamba is All You Need for Whole Slide Image Classification. 15601-15610 - Aniruddha Ganguly, Debolina Chatterjee, Wentao Huang, Jie Zhang, Alisa Yurovsky, Travis Steele Johnson, Chao Chen:
MERGE: Multi-faceted Hierarchical Graph-based GNN for Gene Expression Prediction from Whole Slide Histopathology Images. 15611-15620 - Xingguo Lv, Xingbo Dong, Liwen Wang, Jiewen Yang, Lei Zhao, Bin Pu, Zhe Jin, Xuejun Li:
Test-Time Domain Generalization via Universe Learning: A Multi-Graph Matching Approach for Medical Image Segmentation. 15621-15631 - Zhenhui Ding, Guilian Chen, Qin Zhang, Huisi Wu, Jing Qin:
CSC-PA: Cross-image Semantic Correlation via Prototype Attentions for Single-network Semi-supervised Breast Tumor Segmentation. 15632-15641 - Yuan Guo, Jingyu Kong, Yu Wang, Yuping Duan:
Take the Bull by the Horns: Learning to Segment Hard Samples. 15642-15652 - Jie Mei, Chenyu Lin, Yu Qiu, Yaonan Wang, Hui Zhang, Ziyang Wang, Dong Dai:
Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images. 15653-15662 - Tianyi Liu, Haochuan Jiang, Kaizhu Huang:
KMD: Koopman Multi-modality Decomposition for Generalized Brain Tumor Segmentation under Incomplete Modalities. 15663-15671 - Kunpeng Qiu, Zhiqiang Gao, Zhiying Zhou, Mingjie Sun, Yongxin Guo:
Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation. 15672-15681 - Chun-Hung Wu, Shih-Hong Chen, Chih-Yao Hu, Hsin-Yu Wu, Kai-Hsin Chen, Yu-You Chen, Chih-Hai Su, Chih-Kuo Lee, Yu-Lun Liu:
DeNVeR: Deformable Neural Vessel Representations for Unsupervised Video Vessel Segmentation. 15682-15692 - Zhifeng Wang, Renjiao Yi, Xin Wen, Chenyang Zhu, Kai Xu:
VasTSD: Learning 3D Vascular Tree-state Space Diffusion Model for Angiography Synthesis. 15693-15702 - Jingfeng Yao, Bin Yang, Xinggang Wang:
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models. 15703-15712 - Kaiwen Zha, Lijun Yu, Alireza Fathi, David A. Ross, Cordelia Schmid, Dina Katabi, Xiuye Gu:
Language-Guided Image Tokenization for Generation. 15713-15722 - Qingyu Shi, Lu Qi, Jianzong Wu, Jinbin Bai, Jingbo Wang, Yunhai Tong, Xiangtai Li:
DreamRelation: Bridging Customization and Relation Generation. 15723-15732 - Jian Han, Jinlai Liu, Yi Jiang, Bin Yan, Yuqi Zhang, Zehuan Yuan, Bingyue Peng, Xiaobing Liu:
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis. 15733-15744 - Yeongmin Kim, Sotiris Anagnostidis, Yuming Du, Edgar Schönfeld, Jonas Kohler, Markos Georgopoulos, Albert Pumarola, Ali K. Thabet, Artsiom Sanakoyeu:
Autoregressive Distillation of Diffusion Transformers. 15745-15756 - Jingyi Tian, Le Wang, Sanping Zhou, Sen Wang, Jiayi Li, Haowen Sun, Wei Tang:
PDFactor: Learning Tri-Perspective View Policy Diffusion Field for Multi-Task Robotic Manipulation. 15757-15767 - Chan Hee Song, Valts Blukis, Jonathan Tremblay, Stephen Tyree, Yu Su, Stan Birchfield:
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics. 15768-15780 - Jieming Cui, Tengyu Liu, Ziyu Meng, Jiale Yu, Ran Song, Wei Zhang, Yixin Zhu, Siyuan Huang:
GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill. 15781-15790 - Amir Bar, Gaoyue Zhou, Danny Tran, Trevor Darrell, Yann LeCun:
Navigation World Models. 15791-15801 - Mi Luo, Zihui Xue, Alex Dimakis, Kristen Grauman:
Viewpoint Rosetta Stone: Unlocking Unpaired Ego-Exo Videos for View-invariant Representation Learning. 15802-15812 - Zhengxue Wang, Zhiqiang Yan, Jinshan Pan, Guangwei Gao, Kai Zhang, Jian Yang:
DORNet: A Degradation Oriented and Regularized Network for Blind Depth Super-Resolution. 15813-15822 - Bangyan Liao, Zhenjun Zhao, Haoang Li, Yi Zhou, Yingping Zeng, Hao Li, Peidong Liu:
Convex Relaxation for Robust Vanishing Point Estimation in Manhattan World. 15823-15832 - Yuhui Liu, Liangxun Ou, Qiang Fu, Hadi Amata, Wolfgang Heidrich, Yifan Peng:
Learned Binocular-Encoding Optics for RGBD Imaging Using Joint Stereo and Focus Cues. 15833-15842 - Juan Carlos Dibene, Enrique Dunn:
Camera Resection from Known Line Pencils and a Radially Distorted Scanline. 15843-15851 - Sotiris Nousias, Mian Wei, Howard Xiao, Maxx Wu, Shahmeer Athar, Kevin J. Wang, Anagh Malik, David A. Barmherzig, David B. Lindell, Kyros Kutulakos:
Opportunistic Single-Photon Time of Flight. 15852-15862 - Gaoxiang Cong, Jiadong Pan, Liang Li, Yuankai Qi, Yuxin Peng, Anton van den Hengel, Jian Yang, Qingming Huang:
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing. 15863-15873 - Ji-Hoon Kim, Jeongsoo Choi, Jaehun Kim, Chaeyoung Jung, Joon Son Chung:
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech. 15874-15884 - Yinuo Wang, Yanbo Fan, Xuan Wang, Yu Guo, Fei Wang:
Diffusion-based Realistic Listening Head Generation via Hybrid Motion Modeling. 15885-15895 - Enric Corona, Andrei Zanfir, Eduard Gabriel Bazavan, Nikos Kolotouros, Thiemo Alldieck, Cristian Sminchisescu:
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis. 15896-15908 - Zunnan Xu, Zhentao Yu, Zixiang Zhou, Jun Zhou, Xiaoyu Jin, Fa-Ting Hong, Xiaozhong Ji, Junwei Zhu, Chengfei Cai, Shiyu Tang, Qin Lin, Xiu Li, Qinglin Lu:
HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation. 15909-15919 - Jianwen Jiang, Gaojie Lin, Zhengkun Rong, Chao Liang, Yongming Zhu, Jiaqi Yang, Tianyun Zhong:
MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices. 15920-15929 - Wojciech Zielonka, Timo Bolkart, Thabo Beeler, Justus Thies:
Gaussian Eigen Models for Human Heads. 15930-15940 - Zhenglin Zhou, Fan Ma, Hehe Fan, Tat-Seng Chua:
Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion. 15941-15952 - Hyunsoo Cha, Inhee Lee, Hanbyul Joo:
PERSE: Personalized 3D Generative Avatars from A Single Portrait. 15953-15962 - Zihao Huang, Shoukang Hu, Guangcong Wang, Tianqi Liu, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu:
WildAvatar: Learning In-the-wild 3D Avatars from the Web. 15963-15975 - Hanxi Liu, Yifang Men, Zhouhui Lian:
Creating Your Editable 3D Photorealistic Avatar with Tetrahedron-constrained Gaussian Splatting. 15976-15986 - Hang Ye, Xiaoxuan Ma, Hai Ci, Wentao Zhu, Yizhou Wang:
FreeCloth: Free-form Generation Enhances Challenging Clothed Human Modeling. 15987-15997 - Chaoyue Song, Jianfeng Zhang, Xiu Li, Fan Yang, Yiwen Chen, Zhongcong Xu, Jun Hao Liew, Xiaoyang Guo, Fayao Liu, Jiashi Feng, Guosheng Lin:
MagicArticulate: Make Your 3D Models Articulation-Ready. 15998-16007 - Peng Li, Wangguandong Zheng, Yuan Liu, Tao Yu, Yangguang Li, Xingqun Qi, Xiaowei Chi, Siyu Xia, Yan-Pei Cao, Wei Xue, Wenhan Luo, Yike Guo:
PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion and Explicit Remeshing. 16008-16018 - Jiaqi Liu, Jichao Zhang, Paolo Rota, Nicu Sebe:
Multi-focal Conditioned Latent Diffusion for Person Image Synthesis. 16019-16028 - Nannan Zhang, Yijiang Li, Dong Du, Zheng Chong, Zhengwentai Sun, Jianhao Zeng, Yusheng Dai, Zhengyu Xie, Hairui Zhu, Xiaoguang Han:
Robust-MVTON: Learning Cross-Pose Feature Alignment and Fusion for Robust Multi-View Virtual Try-On. 16029-16039 - Yang Zheng, Menglei Chai, Delio Vicini, Yuxiao Zhou, Yinghao Xu, Leonidas J. Guibas, Gordon Wetzstein, Thabo Beeler:
GroomLight: Hybrid Inverse Rendering for Relightable Human Hair Appearance Modeling. 16040-16050 - Xingyu Ren, Jiankang Deng, Yuhao Cheng, Wenhan Zhu, Yichao Yan, Xiaokang Yang, Stefanos Zafeiriou, Chao Ma:
S^3-Face: SSS-Compliant Facial Reflectance Estimation via Diffusion Priors. 16051-16060 - Zhilv Yi, Xiao Lu, Hong Ding, Jingbo Hu, Zhi Jiang, Chunxia Xiao:
DL2G: Degradation-guided Local-to-Global Restoration for Eyeglass Reflection Removal. 16061-16070 - Yuxuan Gu, Haoxuan Wang, Pengyang Ling, Zhixiang Wei, Huaian Chen, Yi Jin, Enhong Chen:
Improving Visual and Downstream Performance of Low-Light Enhancer with Vision Foundation Models Collaboration. 16071-16080 - Shuangfan Zhou, Chu Zhou, Youwei Lyu, Heng Guo, Zhanyu Ma, Boxin Shi, Imari Sato:
PIDSR: Complementary Polarized Image Demosaicing and Super-Resolution. 16081-16090 - Zelin Li, Chenwei Wang, Zhaoke Huang, Yiming Ma, Cunming Zhao, Zhongying Zhao, Hong Yan:
Volume Tells: Dual Cycle-Consistent Diffusion for 3D Fluorescence Microscopy De-noising and Super-Resolution. 16091-16100 - Jungho Lee, Suhwan Cho, Taeoh Kim, Ho-Deok Jang, Minhyeok Lee, Geonho Cha, Dongyoon Wee, Dogyoon Lee, Sangyoun Lee:
CoCoGaussian: Leveraging Circle of Confusion for Gaussian Splatting from Defocused Images. 16101-16110 - Zixuan Chen, Yujin Wang, Xin Cai, Zhiyuan You, Zheming Lu, Fan Zhang, Shi Guo, Tianfan Xue:
UltraFusion: Ultra High Dynamic Imaging using Exposure Fusion. 16111-16121 - Xiaoyu Zhang, Weihong Pan, Chong Bao, Xiyu Zhang, Xiaojun Xiang, Hanqing Jiang, Hujun Bao:
LookCloser: Frequency-aware Radiance Field for Tiny-Detail Scene. 16122-16132 - Jiajun Tang, Fan Fei, Zhihao Li, Xiao Tang, Shiyong Liu, Youyu Chen, Binxiao Huang, Zhenyu Chen, Xiaofei Wu, Boxin Shi:
SpecTRe-GS: Modeling Highly Specular Surfaces with Reflected Nearby Objects by Tracing Rays in 3D Gaussian Splatting. 16133-16142 - Hanxiao Sun, Yupeng Gao, Jin Xie, Jian Yang, Beibei Wang:
SVG-IR: Spatially-Varying Gaussian Splatting for Inverse Rendering. 16143-16152 - Qiyu Dai, Xingyu Ni, Qianfan Shen, Wenzheng Chen, Baoquan Chen, Mengyu Chu:
RainyGS: Efficient Rain Synthesis with Physically-Based Gaussian Splatting. 16153-16162 - Ludwic Leonard, Nils Thürey, Rüdiger Westermann:
Light Transport-aware Diffusion Posterior Sampling for Single-View Reconstruction of 3D Volumes. 16163-16174 - Juan A. Rodríguez, Abhay Puri, Shubham Agarwal, Issam H. Laradji, Pau Rodríguez, Sai Rajeswar, David Vázquez, Christopher Pal, Marco Pedersoli:
StarVector: Generating Scalable Vector Graphics Code from Images and Text. 16175-16186 - Cheng Sun, Jaesung Choe, Charles Loop, Wei-Chiu Ma, Yu-Chiang Frank Wang:
Sparse Voxels Rasterization: Real-time High-fidelity Radiance Field Rendering. 16187-16196 - Minye Wu, Haizhao Dai, Kaixin Yao, Tinne Tuytelaars, Jingyi Yu:
BG-Triangle: Bezier Gaussian Triangle for 3D Vectorization and Rendering. 16197-16207 - Himangi Mittal, Peiye Zhuang, Hsin-Ying Lee, Shubham Tulsiani:
UniPhy: Learning a Unified Constitutive Model for Inverse Physics Simulation. 16208-16218 - Kaiwei Zhang, Dandan Zhu, Xiongkuo Min, Guangtao Zhai:
Mesh Mamba: A Unified State Space Model for Saliency Prediction in Non-Textured and Textured Meshes. 16219-16228 - Xiaoliang Ju, Hongsheng Li:
DirectTriGS: Triplane-based Gaussian Splatting Field Representation for 3D Generation. 16229-16239 - Mark Boss, Zixuan Huang, Aaryaman Vasishta, Varun Jampani:
SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement. 16240-16250 - Rui Chen, Jianfeng Zhang, Yixun Liang, Guan Luo, Weiyu Li, Jiarui Liu, Xiu Li, Xiaoxiao Long, Jiashi Feng, Ping Tan:
Dora: Sampling and Benchmarking for 3D Shape Variational Auto-Encoders. 16251-16261 - Suizhi Huang, Xingyi Yang, Hongtao Lu, Xinchao Wang:
Few-shot Implicit Function Generation via Equivariance. 16262-16272 - Amir Barda, Matheus Gadelha, Vladimir G. Kim, Noam Aigerman, Amit H. Bermano, Thibault Groueix:
Instant3dit: Multiview Inpainting for Fast Editing of 3D Objects. 16273-16282 - Sinisa Stekovic, Arslan Artykov, Stefan Ainetter, Mattia D'Urso, Friedrich Fraundorfer:
PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction. 16283-16292 - Susung Hong, Johanna Suvi Karras, Ricardo Martin-Brualla, Ira Kemelmacher-Shlizerman:
Perturb-and-Revise: Flexible 3D Editing with Generative Trajectories. 16293-16303 - Yufei Huang, Bangyan Liao, Yuqi Hu, Haitao Lin, Lirong Wu, Siyuan Li, Cheng Tan, Zicheng Liu, Yunfan Liu, Zelin Zang, Chang Yu, Zhen Lei:
DaCapo: Score Distillation as Stacked Bridge for Fast and High-quality 3D Editing. 16304-16313 - Takuhiro Kaneko:
Structure from Collision. 16314-16324 - Zixuan Chen, Guangcong Wang, Jiahao Zhu, Jianhuang Lai, Xiaohua Xie:
GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting. 16325-16335 - Hengyu Liu, Yuehao Wang, Chenxin Li, Ruisi Cai, Kevin Wang, Wuyang Li, Pavlo Molchanov, Peihao Wang, Zhangyang Wang:
FlexGS: Train Once, Deploy Everywhere with Many-in-One Flexible 3D Gaussian Splatting. 16336-16345 - You Shen, Zhipeng Zhang, Xinyang Li, Yansong Qu, Yu Lin, Shengchuan Zhang, Liujuan Cao:
Evolving High-Quality Rendering and Reconstruction in a Unified Framework with Contribution-Adaptive Regularization. 16346-16355 - Suyoung Lee, Jaeyoung Chung, Kihoon Kim, Jaeyoo Huh, Gunhee Lee, Minsoo Lee, Kyoung Mu Lee:
OmniSplat: Taming Feed-Forward 3D Gaussian Splatting for Omnidirectional Images with Editable Capabilities. 16356-16365 - Chung-Ho Wu, Yang-Jung Chen, Ying-Huan Chen, Jie-Ying Lee, Bo-Hsu Ke, Chun-Wei Tuan Mu, Yi-Chuan Huang, Chin-Yang Lin, Min-Hung Chen, Yen-Yu Lin, Yu-Lun Liu:
AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360deg Unbounded Scene Inpainting. 16366-16376 - Chong Bao, Xiyu Zhang, Zehao Yu, Jiale Shi, Guofeng Zhang, Songyou Peng, Zhaopeng Cui:
Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views. 16377-16387 - Nicole Meng, Caleb Manicke, Ronak Sahu, Caiwen Ding, Yingjie Lao:
Advancing Adversarial Robustness in GNeRFs: The IL2-NeRF Attack. 16388-16397 - Jiahe Li, Feiyu Wang, Xiaochao Qu, Chengjing Wu, Luoqi Liu, Ting Liu:
EVPGS: Enhanced View Prior Guidance for Splatting-based Extrapolated View Synthesis. 16398-16407 - Xiaoding Yuan, Shitao Tang, Kejie Li, Peng Wang:
CamFreeDiff: Camera-free Image to Panorama Generation with Diffusion Model. 16408-16417 - Yash Kant, Ethan Weber, Jin Kyu Kim, Rawal Khirodkar, Su Zhaoen, Julieta Martinez, Igor Gilitschenski, Shunsuke Saito, Timur M. Bagautdinov:
Pippo: High-Resolution Multi-View Humans from a Single Image. 16418-16429 - Yihang Luo, Shangchen Zhou, Yushi Lan, Xingang Pan, Chen Change Loy:
3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement. 16430-16440 - Hanwen Jiang, Zexiang Xu, Desai Xie, Ziwen Chen, Haian Jin, Fujun Luan, Zhixin Shu, Kai Zhang, Sai Bi, Xin Sun, Jiuxiang Gu, Qixing Huang, Georgios Pavlakos, Hao Tan:
MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data. 16441-16452 - Haofei Xu, Songyou Peng, Fangjinhua Wang, Hermann Blum, Daniel Barath, Andreas Geiger, Marc Pollefeys:
DepthSplat: Connecting Gaussian Splatting and Depth. 16453-16463 - Alex Trevithick, Roni Paiss, Philipp Henzler, Dor Verbin, Rundi Wu, Hadi Alzayer, Ruiqi Gao, Ben Poole, Jonathan T. Barron, Aleksander Holynski, Ravi Ramamoorthi, Pratul P. Srinivasan:
SimVS: Simulating World Inconsistencies for Robust View Synthesis. 16464-16474 - Hanyang Wang, Fangfu Liu, Jiawei Chi, Yueqi Duan:
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step. 16475-16485 - Liyan Chen, Huangying Zhan, Kevin Chen, Xiangyu Xu, Qingan Yan, Changjiang Cai, Yi Xu:
ActiveGAMER: Active GAussian Mapping through Efficient Rendering. 16486-16497 - Dongrui Dai, Yuxiang Xing:
EAP-GS: Efficient Augmentation of Pointcloud for 3D Gaussian Splatting in Few-shot Scene Reconstruction. 16498-16507 - Guoyu Lu:
Shading Meets Motion: Self-supervised Indoor 3D Reconstruction Via Simultaneous Shape-from-Shading and Structure-from-Motion. 16508-16519 - Jinbo Yan, Rui Peng, Zhiyan Wang, Luyang Tang, Jiayu Yang, Jie Liang, Jiahao Wu, Ronggang Wang:
Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting. 16520-16531 - Yiren Lu, Yunlai Zhou, Disheng Liu, Tuo Liang, Yu Yin:
BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting. 16532-16542 - Chengwei Zheng, Lixin Xue, Juan Zarate, Jie Song:
GauSTAR: Gaussian Surface Tracking and Reconstruction. 16543-16553 - Zhengxian Yang, Shi Pan, Shengqi Wang, Haoxiang Wang, Li Lin, Guanjun Li, Zhengqi Wen, Borong Lin, Jianhua Tao, Tao Yu:
ImViD: Immersive Volumetric Videos for Enhanced VR Engagement. 16554-16564 - Peter Kulits, Michael J. Black, Silvia Zuffi:
Reconstructing Animals and the Wild. 16565-16577 - Muhammad Hamza Mughal, Rishabh Dabral, Merel C. J. Scholman, Vera Demberg, Christian Theobalt:
Retrieving Semantics from the Deep: an RAG Solution for Gesture Synthesis. 16578-16588 - Suhyun Shin, Seungwoo Yoon, Ryota Maeda, Seung-Hwan Baek:
Dense Dispersed Structured Light for Hyperspectral 3D Imaging of Dynamic Scenes. 16589-16598 - Jongsung Lee, Harin Park, Byeong-Uk Lee, Kyungdon Joo:
HUSH: Holistic Panoramic 3D Scene Understanding using Spherical Harmonics. 16599-16608 - Kang Chen, Jiyuan Zhang, Zecheng Hao, Yajing Zheng, Tiejun Huang, Zhaofei Yu:
USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting. 16609-16618 - Xuan Zhu, Jijun Xiang, Xianqi Wang, Longliang Liu, Yu Wang, Hong Zhang, Fei Guo, Xin Yang:
SVDC: Consistent Direct Time-of-Flight Video Depth Completion with Frequency Selective Fusion. 16619-16628 - Nisha Varghese, A. N. Rajagopalan:
Sea-ing in Low-light. 16629-16640 - Jingyi Zhou, Peng Ye, Haoyu Zhang, Jiakang Yuan, Rao Qiang, Yangchenxu Liu, Wu Cailin, Feng Xu, Tao Chen:
Consistency-aware Self-Training for Iterative-based Stereo Matching. 16641-16650 - Yuzheng Liu, Siyan Dong, Shuzhe Wang, Yingda Yin, Yanchao Yang, Qingnan Fan, Baoquan Chen:
SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos. 16651-16662 - Diankun Wu, Fangfu Liu, Yi-Hsin Hung, Yue Qian, Xiaohang Zhan, Yueqi Duan:
4D-Fly: Fast 4D Reconstruction from a Single Monocular Video. 16663-16673 - Andrea Porfiri Dal Cin, Georgi Dikov, Jihong Ju, Mohsen Ghafoorian:
AnyMap: Learning a General Camera Model for Structure-from-Motion with Unknown Distortion in Dynamic Scenes. 16674-16684 - Junchen Yu, Si-Yuan Cao, Runmin Zhang, Chenghao Zhang, Zhu Yu, Shujie Chen, Bailin Yang, Hui-Liang Shen:
SSHNet: Unsupervised Cross-modal Homography Estimation via Problem Reformulation and Split Optimization. 16685-16694 - Riku Murai, Eric Dexheimer, Andrew J. Davison:
MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors. 16695-16705 - Yifan Yu, Shaohui Liu, Rémi Pautrat, Marc Pollefeys, Viktor Larsson:
Relative Pose Estimation through Affine Corrections of Monocular Depth Priors. 16706-16716 - Felix Wimbauer, Weirong Chen, Dominik Muhle, Christian Rupprecht, Daniel Cremers:
AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos. 16717-16727 - Yunxuan Li, Lei Fan, Xiaoying Xing, Jianxiong Zhou, Ying Wu:
GPVK-VL: Geometry-Preserving Virtual Keyframes for Visual Localization under Large Viewpoint Changes. 16728-16738 - Siyan Dong, Shuzhe Wang, Shaohui Liu, Lulu Cai, Qingnan Fan, Juho Kannala, Yanchao Yang:
Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization. 16739-16752 - Alan Baade, Changan Chen:
Self-Supervised Cross-View Correspondence with Predictive Cycle Consistency. 16753-16763 - Ruojin Cai, Jason Y. Zhang, Philipp Henzler, Zhengqi Li, Noah Snavely, Ricardo Martin-Brualla:
Can Generative Video Models Help Pose Estimation? 16764-16773 - Sven Elflein, Qunjie Zhou, Laura Leal-Taixé:
Light3R-SfM: Towards Feed-forward Structure-from-Motion. 16774-16784 - Yuguang Li, Ivaylo Boyadzhiev, Zixuan Liu, Linda G. Shapiro, Alex Colburn:
BADGR: Bundle Adjustment Diffusion Conditioned by Gradients for Wide-Baseline Floor Plan Reconstruction. 16785-16795 - Chi Su, Xiaoxuan Ma, Jiajun Su, Yizhou Wang:
SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens. 16796-16806 - Hongwei Zheng, Han Li, Wenrui Dai, Ziyang Zheng, Chenglin Li, Junni Zou, Hongkai Xiong:
HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation. 16807-16817 - Weijian Deng, Dylan Campbell, Chunyi Sun, Jiahao Zhang, Shubham Kanitkar, Matthew E. Shaffer, Stephen Gould:
Pos3R: 6D Pose Estimation for Unseen Objects Made Easy. 16818-16828 - Tao Tan, Qiulei Dong:
ONDA-Pose: Occlusion-Aware Neural Domain Adaptation for Self-Supervised 6D Object Pose Estimation. 16829-16838 - Junning Qiu, Minglei Lu, Fei Wang, Yu Guo, Yonggen Ling:
Leveraging Global Stereo Consistency for Category-Level Shape and 6D Pose Estimation from Stereo Images. 16839-16849 - Li Jin, Yujie Wang, Wenzheng Chen, Qiyu Dai, Qingzhe Gao, Xueying Qin, Baoquan Chen:
One-shot 3D Object Canonicalization based on Geometric and Semantic Consistency. 16850-16859 - Zixuan Huang, Mark Boss, Aaryaman Vasishta, James M. Rehg, Varun Jampani:
SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images. 16860-16870 - Wenrui Cai, Qingjie Liu, Yunhong Wang:
SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking. 16871-16881 - Haolin Qin, Tingfa Xu, Tianhao Li, Zhenxiang Chen, Tao Feng, Jianan Li:
MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking. 16882-16891 - Huijie Fan, Yu Qiao, Yihao Zhen, Tinghui Zhao, Baojie Fan, Qiang Wang:
All-Day Multi-Camera Multi-Target Tracking. 16892-16901 - Sunkyung Park, Jeongmin Lee, Dongjun Lee:
Shape Abstraction via Marching Differentiable Support Functions. 16902-16911 - Shaoming Li, Qing Cai, Songqi Kong, Runqing Tan, Heng Tong, Shiji Qiu, Yongguo Jiang, Zhi Liu:
MESC-3D: Mining Effective Semantic Cues for 3D Reconstruction from a Single Image. 16912-16921 - Xinjun Li, Wenfei Yang, Jiacheng Deng, Zhixin Cheng, Xu Zhou, Tianzhu Zhang:
Implicit Correspondence Learning for Image-to-Point Cloud Registration. 16922-16931 - Rao Fu, Jianmin Zheng, Liang Yu:
Consistent Normal Orientation for 3D Point Clouds via Least Squares on Delaunay Graph. 16932-16942 - Haobo Jiang, Jin Xie, Jian Yang, Liang Yu, Jianmin Zheng:
Zero-shot RGB-D Point Cloud Registration with Pre-trained Large Vision Model. 16943-16952 - Yi Du, Zhipeng Zhao, Shaoshu Su, Sharath Golluri, Haoze Zheng, Runmao Yao, Chen Wang:
SuperPC: A Single Diffusion Model for Point Cloud Completion, Upsampling, Denoising, and Colorization. 16953-16964 - Khanh Nguyen, Ghulam Mubashar Hassan, Ajmal Mian:
Occlusion-aware Text-Image-Point Cloud Pretraining for Open-World 3D Object Recognition. 16965-16975 - Yaohua Zha, Yanzi Wang, Hang Guo, Jinpeng Wang, Tao Dai, Bin Chen, Zhihao Ouyang, Xue Yuerong, Ke Chen, Shu-Tao Xia:
PMA: Towards Parameter-Efficient Point Cloud Understanding via Point Mamba Adapter. 16976-16986 - Boqian Zhang, Shen Yang, Hao Chen, Chao Yang, Jing Jia, Guang Jiang:
Point Cloud Upsampling Using Conditional Diffusion Module with Adaptive Noise Suppression. 16987-16996 - Zhaochong An, Guolei Sun, Yun Liu, Runjia Li, Junlin Han, Ender Konukoglu, Serge J. Belongie:
Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model. 16997-17007 - Yujun Liu, Ruisheng Wang, Shangfeng Huang, Guorong Cai:
EdgeDiff: Edge-aware Diffusion Network for Building Reconstruction from Point Clouds. 17008-17018 - Yang Wu, Yun Zhu, Kaihua Zhang, Jianjun Qian, Jin Xie, Jian Yang:
WeatherGen: A Unified Diverse Weather Generator for LiDAR Point Clouds via Spider Mamba Diffusion. 17019-17028 - Chenxu Dang, Zaipeng Duan, Pei An, Xinmin Zhang, Xuzhong Hu, Jie Ma:
FASTer: Focal token Acquiring-and-Scaling Transformer for Long-term 3D Objection Detection. 17029-17038 - Dusan Malic, Christian Fruhwirth-Reisinger, Samuel Schulter, Horst Possegger:
LiSu: A Dataset and Method for LiDAR Surface Normal Estimation. 17039-17049 - Yongshu Huang, Chen Liu, Minghang Zhu, Sheng Ao, Chenglu Wen, Cheng Wang:
DiffLO: Semantic-Aware LiDAR Odometry with Diffusion-Based Refinement. 17050-17059 - Duc-Hai Pham, Tung Do, Phong Nguyen, Binh-Son Hua, Khoi Nguyen, Rang Nguyen:
SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation. 17060-17069 - Haotong Lin, Sida Peng, Jingxiao Chen, Songyou Peng, Jiaming Sun, Minghuan Liu, Hujun Bao, Jiashi Feng, Xiaowei Zhou, Bingyi Kang:
Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation. 17070-17080 - Xiaomeng Chu, Jiajun Deng, Guoliang You, Yifan Duan, Houqiang Li, Yanyong Zhang:
RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion. 17081-17091 - Lei Lai, Zekai Yin, Eshed Ohn-Bar:
ZeroVO: Visual Odometry with Minimal Assumptions. 17092-17102 - You Wu, Xucheng Wang, Xiangyang Yang, Mengyuan Liu, Dan Zeng, Hengzhou Ye, Shuiwang Li:
Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking. 17103-17113 - Jesse J. Hagenaars, Yilun Wu, Federico Paredes-Vallés, Stein Stroobants, Guido C. H. E. de Croon:
On-Device Self-Supervised Learning of Low-Latency Monocular Depth from Only Events. 17114-17123 - Shu-Wei Lu, Yi-Hsuan Tsai, Yi-Ting Chen:
Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting. 17124-17133 - Gyeongrok Oh, Sungjune Kim, Heeju Ko, Hyung-gun Chi, Jinkyu Kim, Dongwook Lee, Daehyun Ji, Sungjoon Choi, Sujin Jang, Sangpil Kim:
3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation. 17134-17144 - Hyo-Jun Lee, Yeong Jun Koh, Hanul Kim, Hyunseop Kim, Yonguk Lee, Jinu Lee:
SOAP: Vision-Centric 3D Semantic Scene Completion with Scene-Adaptive Decoder and Occluded Region-Aware View Projection. 17145-17154 - Yancong Lin, Shiming Wang, Liangliang Nan, Julian F. P. Kooij, Holger Caesar:
VoteFlow: Enforcing Local Rigidity in Self-Supervised Scene Flow. 17155-17164 - Haiming Zhang, Wending Zhou, Yiyao Zhu, Xu Yan, Jiantao Gao, Dongfeng Bai, Yingjie Cai, Bingbing Liu, Shuguang Cui, Zhen Li:
VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving. 17165-17175 - Kuang Wu, Chuan Yang, Zhanbin Li:
InteractionMap: Improving Online Vectorized HDMap Construction with Interaction. 17176-17186 - Wei Wu, Xi Guo, Weixuan Tang, Tingxuan Huang, Chiyu Wang, Chenjing Ding:
DriveScape: High-Resolution Driving Video Generation by Multi-View Feature Fusion. 17187-17196 - Changsheng Lv, Mengshi Qi, Liang Liu, Huadong Ma:
T2SG: Traffic Topology Scene Graph for Topology Reasoning in Autonomous Driving. 17197-17206 - Luke Rowe, Roger Girgis, Anthony Gosselin, Liam Paull, Christopher Pal, Felix Heide:
Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments. 17207-17218 - Zhiwei Dong, Ran Ding, Wei Li, Peng Zhang, Guobin Tang, Jia Guo:
Leveraging SD Map to Augment HD Map-based Trajectory Prediction. 17219-17228 - Yi Yu, Weizhen Han, Libing Wu, Bingyi Liu, Enshu Wang, Zhuangzhuang Zhang:
Enduring, Efficient and Robust Trajectory Prediction Attack in Autonomous Driving via Optimization-Driven Multi-Frame Perturbation Framework. 17229-17238 - Dongkun Zhang, Jiaming Liang, Ke Guo, Sha Lu, Qi Wang, Rong Xiong, Zhenwei Miao, Yue Wang:
CarPlanner: Consistent Auto-regressive Trajectory Planning for Large-Scale Reinforcement Learning in Autonomous Driving. 17239-17248 - Wufei Ma, Luoxin Ye, Celso M. de Melo, Alan L. Yuille, Jieneng Chen:
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models. 17249-17260 - Zhenhua Xu, Yan Bai, Yujia Zhang, Zhuoling Li, Fei Xia, Kwan-Yee K. Wong, Jianqiang Wang, Hengshuang Zhao:
DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving. 17261-17270 - Ahmad Rahimi, Po-Chien Luan, Yuejiang Liu, Frano Rajic, Alexandre Alahi:
Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations. 17271-17281 - Yuxiang Fu, Qi Yan, Lele Wang, Ke Li, Renjie Liao:
MoFlow: One-Step Flow Matching for Human Trajectory Forecasting via Implicit Maximum Likelihood Estimation based Distillation. 17282-17293 - Yuncong Yang, Han Yang, Jiachen Zhou, Peihao Chen, Hongxin Zhang, Yilun Du, Chuang Gan:
3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning. 17294-17303 - Xingyu Chen, Zhuheng Song, Xiaoke Jiang, Yaoqing Hu, Junzhi Yu, Lei Zhang:
HandOS: 3D Hand Reconstruction in One Stage. 17304-17314 - Zifan Wang, Ziqing Chen, Junyu Chen, Jilong Wang, Yuxin Yang, Yunze Liu, Xueyi Liu, He Wang, Li Yi:
MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data. 17315-17325 - Yawen Shao, Wei Zhai, Yuhang Yang, Hongchen Luo, Yang Cao, Zheng-Jun Zha:
GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding. 17326-17336 - He Zhu, Quyu Kong, Kechun Xu, Xunlong Xia, Bing Deng, Jieping Ye, Rong Xiong, Yue Wang:
Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions. 17337-17346 - Yueru Jia, Jiaming Liu, Sixiang Chen, Chenyang Gu, Zhilue Wang, Longzan Luo, Xiaoqi Li, Pengwei Wang, Zhongyuan Wang, Renrui Zhang, Shanghang Zhang:
Lift3D Policy: Lifting 2D Foundation Models for Robust 3D Robotic Manipulation. 17347-17358 - Mingjie Pan, Jiyao Zhang, Tianshu Wu, Yinghao Zhao, Wenlong Gao, Hao Dong:
OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints. 17359-17369 - Tomoya Yoshida, Shuhei Kurita, Taichi Nishimura, Shinsuke Mori:
Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision. 17370-17382 - Yu Qi, Yuanchen Ju, Tianming Wei, Chi Chu, Lawson L. S. Wong, Huazhe Xu:
Two by Two: Learning Multi-Task Pairwise Objects Assembly for Generalizable Robot Manipulation. 17383-17393 - Qi Lv, Hao Li, Xiang Deng, Rui Shao, Yinchuan Li, Jianye Hao, Longxiang Gao, Michael Yu Wang, Liqiang Nie:
Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation. 17394-17404 - Shun Iwase, Muhammad Zubair Irshad, Katherine Liu, Vitor Guizilini, Robert Lee, Takuya Ikeda, Ayako Amma, Koichi Nishiwaki, Kris Kitani, Rares Ambrus, Sergey Zakharov:
ZeroGrasp: Zero-Shot Shape Reconstruction Enabled Robotic Grasping. 17405-17415 - Muchen Li, Sammy Christen, Chengde Wan, Yujun Cai, Renjie Liao, Leonid Sigal, Shugao Ma:
LatentHOI: On the Generalizable Hand Object Motion Generation with Latent Hand Diffusion. 17416-17425 - Boran Wen, Dingbang Huang, Zichen Zhang, Jiahong Zhou, Jianbin Deng, Jingyu Gong, Yulong Chen, Lizhuang Ma, Yong-Lu Li:
Reconstructing In-the-Wild Open-Vocabulary Human-Object Interactions. 17426-17436 - Jeongwan On, Kyeonghwan Gwak, Gunyoung Kang, Junuk Cha, Soohyun Hwang, Hyein Hwang, Seungryul Baek:
BIGS: Bimanual Category-agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting. 17437-17447 - Kefan Chen, Chaerin Min, Linguang Zhang, Shreyas Hampali, Cem Keskin, Srinath Sridhar:
FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation. 17448-17460 - Rao Fu, Dingxi Zhang, Alex Jiang, Wanjia Fu, Austin Funk, Daniel Ritchie, Srinath Sridhar:
GigaHands: A Massive Annotated Dataset of Bimanual Hand Activities. 17461-17474 - Buzhen Huang, Chen Li, Chongyang Xu, Dongyue Lu, Jinnan Chen, Yangang Wang, Gim Hee Lee:
Reconstructing Close Human Interaction with Appearance and Proxemics Reasoning. 17475-17485 - Jin Lyu, Tianyi Zhu, Yi Gu, Li Lin, Pujin Cheng, Yebin Liu, Xiaoying Tang, Liang An:
AniMer: Animal Pose and Shape Estimation Using Family Aware Transformer. 17486-17496 - Andrea Boscolo Camiletto, Jian Wang, Eduardo Alvarado, Rishabh Dabral, Thabo Beeler, Marc Habermann, Christian Theobalt:
FRAME: Floor-aligned Representation for Avatar Motion from Egocentric Video. 17497-17507 - Hyunjun Lee, Hyunsoo Lee, Sookwan Han:
SyncSDE: A Probabilistic Framework for Diffusion Synchronization. 17508-17517 - Jiaman Li, C. Karen Liu, Jiajun Wu:
Lifting Motion to the 3D World via 2D Diffusion. 17518-17528 - Kenkun Liu, Yurong Fu, Weihao Yuan, Jing Lin, Peihao Li, Xiaodong Gu, Lingteng Qiu, Haoqian Wang, Zilong Dong, Xiaoguang Han:
Motions as Queries: One-Stage Multi-Person Holistic Human Motion Capture. 17529-17539 - Yinhuai Wang, Qihan Zhao, Runyi Yu, Hok Wai Tsui, Ailing Zeng, Jing Lin, Zhengyi Luo, Jiwen Yu, Xiu Li, Qifeng Chen, Jian Zhang, Lei Zhang, Ping Tan:
SkillMimic: Learning Basketball Interaction Skills from Demonstrations. 17540-17549 - Yingying Fan, Quanwei Yang, Kaisiyuan Wang, Hang Zhou, Yingying Li, Haocheng Feng, Errui Ding, Yu Wu, Jingdong Wang:
Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model. 17550-17560 - Peishan Cong, Ziyi Wang, Yuexin Ma, Xiangyu Yue:
SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance. 17561-17570 - Xuan Li, Qianli Ma, Tsung-Yi Lin, Yongxin Chen, Chenfanfu Jiang, Ming-Yu Liu, Donglai Xiang:
Articulated Kinematics Distillation from Video Diffusion Models. 17571-17581 - Lei Li, Sen Jia, Jianhao Wang, Zhongyu Jiang, Feng Zhou, Ju Dai, Tianfang Zhang, Zongkai Wu, Jenq-Neng Hwang:
Human Motion Instruction Tuning. 17582-17591 - Jianrong Zhang, Hehe Fan, Yi Yang:
EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space. 17592-17602 - Chi-Pin Huang, Yen-Siang Wu, Hung-Kai Chung, Kai-Po Chang, Fu-En Yang, Yu-Chiang Frank Wang:
VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models. 17603-17612 - Kumar Ashutosh, Georgios Pavlakos, Kristen Grauman:
FIction: 4D Future Interaction Prediction from Video. 17613-17625 - Jiuming Liu, Jinru Han, Lihao Liu, Angelica I. Avilés-Rivero, Chaokang Jiang, Zhe Liu, Hesheng Wang:
Mamba4D: Efficient 4D Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models. 17626-17636 - Vadim Tschernezki, Diane Larlus, Iro Laina, Andrea Vedaldi:
Layered Motion Fusion: Lifting Motion Segmentation to 3D in Egocentric Videos. 17637-17648 - Haoyue Liu, Jinghan Xu, Yi Chang, Hanyu Zhou, Haozhi Zhao, Lin Wang, Luxin Yan:
TimeTracker: Event-based Continuous Point Tracking for Video Frame Interpolation with Non-linear Motion. 17649-17659 - Zhengfei Kuang, Tianyuan Zhang, Kai Zhang, Hao Tan, Sai Bi, Yiwei Hu, Zexiang Xu, Milos Hasan, Gordon Wetzstein, Fujun Luan:
Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors. 17660-17670 - Min-Wu Jeong, Chae-Eun Rhee:
LC-Mamba: Local and Continuous Mamba with Shifted Windows for Frame Interpolation. 17671-17681 - Xin Yu, Tianyu Wang, Soo Ye Kim, Paul Guerrero, Xi Chen, Qing Liu, Zhe Lin, Xiaojuan Qi:
ObjectMover: Generative Object Movement with Video Prior. 17682-17691 - Juil Koo, Paul Guerrero, Chun-Hao Paul Huang, Duygu Ceylan, Minhyuk Sung:
VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors. 17692-17701 - Karan Dalal, Daniel Koceja, Jiarui Xu, Yue Zhao, Shihao Han, Ka Chun Cheung, Jan Kautz, Yejin Choi, Yu Sun, Xiaolong Wang:
One-Minute Video Generation with Test-Time Training. 17702-17711 - Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang, Qing Liu, Zhifei Zhang, Joon-Young Lee, Yijun Li, Bei Yu, Zhe Lin, Soo Ye Kim, Jiaya Jia:
Generative Video Propagation. 17712-17722 - Chaoyang Wang, Peiye Zhuang, Tuan Duc Ngo, Willi Menapace, Aliaksandr Siarohin, Michael Vasilkovsky, Ivan Skorokhodov, Sergey Tulyakov, Peter Wonka, Hsin-Ying Lee:
4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion. 17723-17732 - Guodong Ding, Rongyu Chen, Angela Yao:
Condensing Action Segmentation Datasets via Generative Network Inversion. 17733-17742 - Muhammad Umar Karim Khan, Aaron Chadha, Mohammad Ashraful Anam, Yiannis Andreopoulos:
Perceptual Video Compression with Neural Wrapping. 17743-17754 - Shuoyan Wei, Feng Li, Shengeng Tang, Yao Zhao, Huihui Bai:
EvEnhancer: Empowering Effectiveness, Efficiency and Generalizability for Continuous Space-Time Video Super-Resolution with Events. 17755-17766 - Huimin Zeng, Jiacheng Li, Zhiwei Xiong:
Plug-and-Play Versatile Compressed Video Enhancement. 17767-17777 - Zongjian Li, Bin Lin, Yang Ye, Liuhan Chen, Xinhua Cheng, Shenghai Yuan, Li Yuan:
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model. 17778-17788 - Zhuoling Li, Hossein Rahmani, Qiuhong Ke, Jun Liu:
LongDiff: Training-Free Long Video Generation in One Go. 17789-17798 - Shian Du, Menghan Xia, Chang Liu, Xintao Wang, Jing Wang, Pengfei Wan, Di Zhang, Xiangyang Ji:
PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution. 17799-17809 - Henrique Morimitsu, Xiaobin Zhu, Roberto M. Cesar, Xiangyang Ji, Xu-Cheng Yin:
DPFlow: Adaptive Optical Flow Estimation with a Dual-Pyramid Framework. 17810-17820 - Lianxin Xie, Bingbing Zheng, Si Wu, Hau-San Wong:
Dynamic Content Prediction with Motion-aware Priors for Blind Face Video Restoration. 17821-17830 - Haoyan Gong, Zhenrong Zhang, Yuzheng Feng, Anh Nguyen, Hongbin Liu:
LP-Diff: Towards Improved Restoration of Real-World Degraded License Plate. 17831-17840 - Kenghong Lin, Baoquan Zhang, Demin Yu, Wenzhi Feng, Shidong Chen, Feifan Gao, Xutao Li, Yunming Ye:
AlphaPre: Amplitude-Phase Disentanglement Model for Precipitation Nowcasting. 17841-17850 - Yi Liu, Wengen Li, Jihong Guan, Shuigeng Zhou, Yichao Zhang:
Effective Cloud Removal for Remote Sensing Images by an Improved Mean-Reverting Denoising Model with Elucidated Design Space. 17851-17861 - Jian Zhu, He Wang, Yang Xu, Zebin Wu, Zhihui Wei:
Self-Learning Hyperspectral and Multispectral Image Fusion via Adaptive Residual Guided Subspace Diffusion Model. 17862-17871 - Xueyang Wang, Zhixin Zheng, Jiandong Shao, Yule Duan, Liang-Jian Deng:
Adaptive Rectangular Convolution for Remote Sensing Pansharpening. 17872-17881 - Guanyao Wu, Haoyu Liu, Hongming Fu, Yichuan Peng, Jinyuan Liu, Xin Fan, Risheng Liu:
Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond. 17882-17891 - Donggoo Jung, Daehyun Kim, Guanghui Wang, Tae Hyun Kim:
Exposure-slot: Exposure-centric Representations Learning with Slot-in-Slot Attention for Region-aware Exposure Correction. 17892-17901 - Xin Liu, Jie Liu, Jie Tang, Gangshan Wu:
CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution. 17902-17912 - Yubin Gu, Yuan Meng, Jiayi Ji, Xiaoshuai Sun:
ACL: Activating Capability of Linear Attention for Image Restoration. 17913-17923 - Tong Li, Lizhi Wang, Zhiyuan Xu, Lin Zhu, Wanxuan Lu, Hua Huang:
Positive2Negative: Breaking the Information-Lossy Barrier in Self-Supervised Single Image Denoising. 17924-17934 - Chen Zhao, Zhizhou Chen, Yunzhe Xu, Enxuan Gu, Jian Li, Zili Yi, Qian Wang, Jian Yang, Ying Tai:
From Zero to Detail: Deconstructing Ultra-High-Definition Image Restoration from Progressive Spectral Perspective. 17935-17946 - Muhammad Abdullah Jamal, Omid Mohareri:
Multi-Modal Contrastive Masked Autoencoders: A Two-Stage Progressive Pre-training Approach for RGBD Datasets. 17947-17957 - MinKyu Lee, Sangeek Hyun, Woojin Jun, Jae-Pil Heo:
Auto-Encoded Supervision for Perceptual Image Super-Resolution. 17958-17968 - I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu, Yuan-Chun Chiang, Sy-Yen Kuo, Ming-Hsuan Yang:
UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior. 17969-17979 - Leheng Zhang, Weiyi You, Kexuan Shi, Shuhang Gu:
Uncertainty-guided Perturbation for Image Super-Resolution Diffusion Model. 17980-17989 - Wenhao Shen, Mingliang Zhou, Yu Chen, Xuekai Wei, Yong Feng, Huayan Pu, Weijia Jia:
Image Quality Assessment: Investigating Causal Perceptual Effects with Abductive Counterfactual Inference. 17990-17999 - Chen Liao, Yan Shen, Dan Li, Zhongli Wang:
Using Powerful Prior Knowledge of Diffusion Model in Deep Unfolding Networks for Image Compressive Sensing. 18000-18010 - Zhiyuan Chen, Keyi Li, Yifan Jia, Le Ye, Yufei Ma:
Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition. 18011-18020 - Ping Chen, Xingpeng Zhang, Zhaoxiang Liu, Huan Hu, Xiang Liu, Kai Wang, Min Wang, Yanlin Qian, Shiguo Lian:
Optimizing for the Shortest Path in Denoising Diffusion Model. 18021-18030 - Kendong Liu, Zhiyu Zhu, Hui Liu, Junhui Hou:
Acc3D: Accelerating Single Image to 3D Diffusion Models via Edge Consistency Guided Score Distillation. 18031-18040 - Fanhu Zeng, Hao Tang, Yihua Shao, Siyu Chen, Ling Shao, Yan Wang:
MambaIC: State Space Models for High-Performance Learned Image Compression. 18041-18050 - Jinchang Xu, Shaokang Wang, Jintao Chen, Zhe Li, Peidong Jia, Fei Zhao, Guoqing Xiang, Zhijian Hao, Shanghang Zhang, Xiaodong Xie:
Decouple Distortion from Perception: Region Adaptive Diffusion for Extreme-low Bitrate Perception Image Compression. 18051-18061 - Emiel Hoogeboom, Thomas Mensink, Jonathan Heek, Kay Lamerigts, Ruiqi Gao, Tim Salimans:
Simpler Diffusion: 1.5 FID on ImageNet512 with Pixel-space Diffusion. 18062-18071 - Haoran You, Connelly Barnes, Yuqian Zhou, Yan Kang, Zhenbang Du, Wei Zhou, Lingzhi Zhang, Yotam Nitzan, Xiaoyang Liu, Zhe Lin, Eli Shechtman, Sohrab Amirghodsi, Yingyan Celine Lin:
Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers. 18072-18082 - Haipeng Fang, Sheng Tang, Juan Cao, Enshuo Zhang, Fan Tang, Tong-Yee Lee:
Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration. 18083-18092 - Longquan Dai, He Wang, Jinhui Tang:
NoiseCtrl: A Sampling-Algorithm-Agnostic Conditional Generation Method for Diffusion Models. 18093-18102 - Yunpeng Liu, Boxiao Liu, Yi Zhang, Xingzhong Hou, Guanglu Song, Yu Liu, Haihang You:
See Further When Clear: Curriculum Consistency Model. 18103-18112 - Huiyang Shao, Xin Xia, Yuhong Yang, Yuxi Ren, Xing Wang, Xuefeng Xiao:
RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories. 18113-18123 - Pingyu Wu, Kai Zhu, Yu Liu, Liming Zhao, Wei Zhai, Yang Cao, Zheng-Jun Zha:
Improved Video VAE for Latent Video Diffusion Model. 18124-18133 - Maosen Zhao, Pengtao Chen, Chong Yu, Yan Wen, Xudong Tan, Tao Chen:
Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning. 18134-18143 - Gongfan Fang, Kunjun Li, Xinyin Ma, Xinchao Wang:
TinyFusion: Diffusion Transformers Learned Shallow. 18144-18154 - Yuanyang Yin, Yaqi Zhao, Mingwu Zheng, Ke Lin, Jiarong Ou, Rui Chen, Victor Shea-Jay Huang, Jiahao Wang, Xin Tao, Pengfei Wan, Di Zhang, Baoqun Yin, Wentao Zhang, Kun Gai:
Towards Precise Scaling Laws for Video Diffusion Transformers. 18155-18165 - Kaibo Zhao, Liang Bao, Yufei Li, Xu Su, Ke Zhang, Xiaotian Qiao:
Less is More: Efficient Image Vectorization with Adaptive Parameterization. 18166-18175 - Mohd Hozaifa Khan, Ravi Kiran Sarvadevabhatla:
Sketchtopia: A Dataset and Foundational Agents for Benchmarking Asynchronous Multimodal Communication with Iconic Feedback. 18176-18186 - Yihao Meng, Hao Ouyang, Hanlin Wang, Qiuyu Wang, Wen Wang, Ka Leong Cheng, Zhiheng Liu, Yujun Shen, Huamin Qu:
AniDoc: Animation Creation Made Easier. 18187-18197 - Guy Yariv, Yuval Kirstain, Amit Zohar, Shelly Sheynin, Yaniv Taigman, Yossi Adi, Sagie Benaim, Adam Polyak:
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation. 18198-18208 - Tongtong Su, Chengyu Wang, Bingyan Liu, Jun Huang, Dongming Lu:
Encapsulated Composition of Text-to-Image and Text-to-Video Models for High-Quality Video Synthesis. 18209-18218 - Diljeet Jagpal, Xi Chen, Vinay P. Namboodiri:
EIDT-V: Exploiting Intersections in Diffusion Trajectories for Model-Agnostic, Zero-Shot, Training-Free Text-to-Video Generation. 18219-18228 - Luozhou Wang, Yijun Li, Zhifei Chen, Jui-Hsien Wang, Zhifei Zhang, He Zhang, Zhe Lin, Ying-Cong Chen:
TransPixeler: Advancing Text-to-Video Generation with Transparency. 18229-18239 - Xiang Gao, Shuai Yang, Jiaying Liu:
PTDiffusion: Free Lunch for Generating Optical Illusion Hidden Pictures with Phase-Transferred Diffusion Model. 18240-18249 - Hyunsoo Kim, Donghyun Kim, Suhyun Kim:
Difference Inversion: Interpolate and Isolate the Difference with Token Consistency for Image Analogy Generation. 18250-18259 - Ruojun Xu, Weijie Xi, Xiaodi Wang, Yongbo Mao, Zach Cheng:
StyleSSP: Sampling StartPoint Enhancement for Training-free Diffusion-based Method for Style Transfer. 18260-18269 - Yang Zhou, Xu Gao, Zichong Chen, Hui Huang:
Attention Distillation: A Unified Approach to Visual Characteristics Transfer. 18270-18280 - Jihun Park, Jongmin Gim, Kyoungmin Lee, Seunghun Lee, Sunghoon Im:
Style-Editor: Text-driven Object-centric Style Editing. 18281-18291 - Suho Ryu, Kihyun Kim, Eugene Baek, Dongsoo Shin, Joonseok Lee:
Towards Scalable Human-aligned Benchmark for Text-guided Image Editing. 18292-18301 - Weicheng Wang, Guoli Jia, Zhongqi Zhang, Liang Lin, Jufeng Yang:
PS-Diffusion: Photorealistic Subject-Driven Image Editing with Disentangled Control and Attention. 18302-18312 - Navve Wasserman, Noam Rotstein, Roy Ganz, Ron Kimmel:
Paint by Inpaint: Learning to Add Image Objects by Removing Them First. 18313-18324 - Jun Huang, Ting Liu, Yihang Wu, Xiaochao Qu, Luoqi Liu, Xiaolin Hu:
MTADiffusion: Mask Text Alignment Diffusion Model for Object Inpainting. 18325-18334 - Yizhe Tang, Zhimin Sun, Yuzhen Du, Ran Yi, Guangben Lu, Teng Hu, Luying Li, Lizhuang Ma, Fangyuan Zou:
ATA: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting. 18335-18345 - Bolin Lai, Felix Juefei-Xu, Miao Liu, Xiaoliang Dai, Nikhil Mehta, Chenguang Zhu, Zeyi Huang, James M. Rehg, Sangmin Lee, Ning Zhang, Tong Xiao:
Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation. 18346-18357 - Pu Cao, Feng Zhou, Lu Yang, Tianrui Huang, Qing Song:
Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation. 18358-18368 - Qihan Huang, Long Chan, Jinlong Liu, Wanggui He, Hao Jiang, Mingli Song, Jie Song:
PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation. 18369-18378 - Dong Liang, Jinyuan Jia, Yuhao Liu, Zhanghan Ke, Hongbo Fu, Rynson W. H. Lau:
VODiff: Controlling Object Visibility Order in Text-to-Image Generation. 18379-18389 - Yingying Deng, Xiangyu He, Fan Tang, Weiming Dong:
Z-Magic: Zero-shot Multiple Attributes Guided Image Creator. 18390-18400 - Woojung Han, Yeonkyung Lee, Chanyoung Kim, Kwanghyun Park, Seong Jae Hwang:
Spatial Transport Optimization by Repositioning Attention Map for Training-Free Text-to-Image Synthesis. 18401-18410 - Jiapeng Zhu, Ceyuan Yang, Kecheng Zheng, Yinghao Xu, Zifan Shi, Yifei Zhang, Qifeng Chen, Yujun Shen:
Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis. 18411-18423 - Lifu Wang, Daqing Liu, Xinchen Liu, Xiaodong He:
Scaling Down Text Encoders of Text-to-Image Diffusion Models. 18424-18433 - Shengqu Cai, Eric Ryan Chan, Yunzhi Zhang, Leonidas J. Guibas, Jiajun Wu, Gordon Wetzstein:
Diffusion Self-Distillation for Zero-Shot Customized Image Generation. 18434-18443 - Fu Feng, Yucheng Xie, Xu Yang, Jing Wang, Xin Geng:
Redefining in Dictionary: Towards an Enhanced Semantic Understanding of Creative Generation. 18444-18454 - Shulei Wang, Wang Lin, Hai Huang, Hanting Wang, Sihang Cai, WenKang Han, Tao Jin, Jingyuan Chen, Jiacheng Sun, Jieming Zhu, Zhou Zhao:
Towards Transformer-Based Aligned Generation with Self-Coherence Guidance. 18455-18464 - Kyungmin Lee, Xiahong Li, Qifei Wang, Junfeng He, Junjie Ke, Ming-Hsuan Yang, Irfan Essa, Jinwoo Shin, Feng Yang, Yinxiao Li:
Calibrated Multi-Preference Optimization for Aligning Diffusion Models. 18465-18475 - Keyu Tu, Mengqi Huang, Zhuowei Chen, Zhendong Mao:
A4A: Adapter for Adapter Transfer via All-for-All Mapping for Cross-Architecture Models. 18476-18485 - Xiaoying Xing, Avinab Saha, Junfeng He, Susan Hao, Paul Vicol, Moonkyung Ryu, Gang Li, Sahil Singla, Sarah Young, Yinxiao Li, Feng Yang, Deepak Ramachandran:
Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation. 18486-18496 - Leigang Qu, Haochuan Li, Wenjie Wang, Xiang Liu, Juncheng Li, Liqiang Nie, Tat-Seng Chua:
SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation. 18497-18508 - Chongjian Ge, Chenfeng Xu, Yuanfeng Ji, Chensheng Peng, Masayoshi Tomizuka, Ping Luo, Mingyu Ding, Varun Jampani, Wei Zhan:
CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians. 18509-18520 - Yiming Qin, Zhu Xu, Yang Liu:
Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation. 18521-18530 - Yidi Li, Jun Xiao, Zhengda Lu, Yiqun Wang, Haiyong Jiang:
Empowering Vector Graphics with Consistently Arbitrary Viewing and View-dependent Visibility. 18531-18540 - Chen Liang, Lianghua Huang, Jingwu Fang, Huanzhang Dou, Wei Wang, Zhi-Fan Wu, Yupeng Shi, Junge Zhang, Xin Zhao, Yu Liu:
IDEA-Bench: How Far are Generative Models from Professional Designing? 18541-18551 - Shalini Maiti, Lourdes Agapito, Filippos Kokkinos:
Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects. 18552-18562 - Jiahao Li, Weijian Ma, Xueyang Li, Yunzhong Lou, Guichun Zhou, Xiangdong Zhou:
CAD-Llama: Leveraging Large Language Models for Computer-Aided Design Parametric 3D Model Generation. 18563-18573 - Yunqi Gu, Ian Huang, Jihyeon Je, Guandao Yang, Leonidas J. Guibas:
BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing. 18574-18583 - Zhipeng Xu, De Cheng, Xinyang Jiang, Nannan Wang, Dongsheng Li, Xinbo Gao:
Adversarial Domain Prompt Tuning and Generation for Single Domain Generalization. 18584-18595 - Byung Hyun Lee, Sungjin Lim, Se Young Chun:
Localized Concept Erasure for Text-to-Image Diffusion Models Using Training-Free Gated Low-Rank Adaptation. 18596-18606 - Dohyun Kim, Sehwan Park, Geonhee Han, Seung Wook Kim, Paul Hongsuck Seo:
Random Conditioning for Diffusion Model Compression with Distillation. 18607-18618 - Reza Shirkavand, Peiran Yu, Shangqian Gao, Gowthami Somepalli, Tom Goldstein, Heng Huang:
Efficient Fine-Tuning and Concept Suppression for Pruned Diffusion Models. 18619-18629 - Jisu Nam, Soowon Son, Zhan Xu, Jing Shi, Difan Liu, Feng Liu, Seungryong Kim, Yang Zhou:
Visual Persona: Foundation Model for Full-Body Human Customization. 18630-18641 - Alexandra Gomez-Villa, Kai Wang, C. Alejandro Párraga, Bartlomiej Twardowski, Jesus Malo, Javier Vazquez-Corral, Joost van de Weijer:
The Art of Deception: Color Visual Illusions and Diffusion Models. 18642-18652 - Zhenguang Liu, Chao Shuai, Shaojing Fan, Ziping Dong, Jinwu Hu, Zhongjie Ba, Kui Ren:
Harnessing Frequency Spectrum Insights for Image Copyright Protection Against Diffusion Models. 18653-18662 - Haoyu Chen, Yunqiao Yang, Nan Zhong, Kede Ma:
Hiding Images in Diffusion Models by Editing Learned Score Functions. 18663-18673 - Jan Dubinski, Antoni Kowalczuk, Franziska Boenisch, Adam Dziedzic:
CDI: Copyrighted Data Identification in Diffusion Models. 18674-18684 - Fabrizio Guillaro, Giada Zingarini, Ben Usman, Avneesh Sud, Davide Cozzolino, Luisa Verdoliva:
A Bias-Free Training Paradigm for More General AI-generated Image Detection. 18685-18694 - Antonio Andrea Gargiulo, Donato Crisostomi, Maria Sofia Bucarelli, Simone Scardapane, Fabrizio Silvestri, Emanuele Rodolà:
Task Singular Vectors: Reducing Task Interference in Model Merging. 18695-18705 - Dimitrios Karageorgiou, Symeon Papadopoulos, Ioannis Kompatsiaris, Efstratios Gavves:
Any-Resolution AI-Generated Image Detection by Spectral Learning. 18706-18717 - Jaewoo Song, Daemin Park, Kanghyun Baek, Sangyub Lee, Jooyoung Choi, Eunji Kim, Sungroh Yoon:
DefectFill: Realistic Defect Generation with Inpainting Diffusion Model for Visual Inspection. 18718-18727 - Alexander Gielisse, Jan van Gemert:
End-to-End Implicit Neural Representations for Classification. 18728-18737 - Nathan Mankovich, Ignacio Santamaría, Gustau Camps-Valls, Tolga Birdal:
A Flag Decomposition for Hierarchical Datasets. 18738-18748 - Yiwei Bao, Zhiming Wang, Feng Lu:
GazeGene: Large-scale Synthetic Gaze Dataset with 3D Eyeball Annotations. 18749-18759 - Daosong Hu, Mingyue Cui, Kai Huang:
FIFA: Fine-grained Inter-frame Attention for Driver's Video Gaze Estimation. 18760-18769 - Ziyang Chen, Prem Seetharaman, Bryan C. Russell, Oriol Nieto, David Bourgin, Andrew Owens, Justin Salamon:
Video-Guided Foley Sound Generation with Multimodal Controls. 18770-18781 - Zeyue Tian, Zhaoyang Liu, Ruibin Yuan, Jiahao Pan, Qifeng Liu, Xu Tan, Qifeng Chen, Wei Xue, Yike Guo:
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling. 18782-18793 - Edson Araujo, Andrew Rouditchenko, Yuan Gong, Saurabhchand Bhati, Samuel Thomas, Brian Kingsbury, Leonid Karlinsky, Rogério Feris, James R. Glass, Hilde Kuehne:
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment. 18794-18803 - Henghui Du, Guangyao Li, Chang Zhou, Chunjie Zhang, Alan Zhao, Di Hu:
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation. 18804-18814 - Stefan Smeu, Dragos-Alexandru Boldisor, Dan Oneata, Elisabeta Oneata:
Circumventing Shortcuts in Audio-visual Deepfake Detection Datasets with Unsupervised Learning. 18815-18825 - Qiyao Xue, Xiangyu Yin, Boyuan Yang, Wei Gao:
PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation. 18826-18836 - Tianhao Qi, Jianlong Yuan, Wanquan Feng, Shancheng Fang, Jiawei Liu, SiYu Zhou, Qian He, Hongtao Xie, Yongdong Zhang:
Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation. 18837-18846 - Yijie Xu, Bolun Zheng, Wei Zhu, Hangjia Pan, Yuchen Yao, Ning Xu, Anan Liu, Quan Zhang, Chenggang Yan:
SMTPD: A New Benchmark for Temporal Prediction of Social Media Popularity. 18847-18857 - Hui Han, Siyuan Li, Jiaqi Chen, Yiwen Yuan, Yuling Wu, Yufan Deng, Chak Tou Leong, Hanwen Du, Junchen Fu, Youhua Li, Jie Zhang, Chi Zhang, Li-jia Li, Yongxin Ni:
Video-Bench: Human-Aligned Video Generation Benchmark. 18858-18868 - Jiarui Wang, Huiyu Duan, Guangtao Zhai, Juntong Wang, Xiongkuo Min:
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM. 18869-18880 - Niu Lian, Jun Li, Jinpeng Wang, Ruisheng Luo, Yaowei Wang, Shu-Tao Xia, Bin Chen:
AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing. 18881-18890 - Orr Zohar, Xiaohan Wang, Yann Dubois, Nikhil Mehta, Tong Xiao, Philippe Hansen-Estruch, Licheng Yu, Xiaofang Wang, Felix Juefei-Xu, Ning Zhang, Serena Yeung-Levy, Xide Xia:
Apollo: An Exploration of Video Understanding in Large Multimodal Models. 18891-18901 - Junbo Niu, Yifei Li, Ziyang Miao, Chunjiang Ge, Yuanhang Zhou, Qihao He, Xiaoyi Dong, Haodong Duan, Shuangrui Ding, Rui Qian, Pan Zhang, Yuhang Zang, Yuhang Cao, Conghui He, Jiaqi Wang:
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? 18902-18913 - Darshana Saravanan, Varun Gupta, Darshan Singh S, Zeeshan Khan, Vineet Gandhi, Makarand Tapaswi:
VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment. 18914-18924 - Yuxuan Wang, Yueqian Wang, Bo Chen, Tong Wu, Dongyan Zhao, Zilong Zheng:
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts. 18925-18935 - Ziyu Ma, Chenhui Gou, Hengcan Shi, Bin Sun, Shutao Li, Hamid Rezatofighi, Jianfei Cai:
DrVideo: Document Retrieval Based Long Video Understanding. 18936-18946 - Lucas Ventura, Antoine Yang, Cordelia Schmid, Gül Varol:
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs. 18947-18958 - Tiantian Geng, Jinrui Zhang, Qingni Wang, Teng Wang, Jinming Duan, Feng Zheng:
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos. 18959-18969 - Yuqian Yuan, Hang Zhang, Wentong Li, Zesen Cheng, Boqiang Zhang, Long Li, Xin Li, Deli Zhao, Wenqiao Zhang, Yueting Zhuang, Jianke Zhu, Lidong Bing:
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM. 18970-18980 - Min Jung Lee, Dayoung Gong, Minsu Cho:
Video Summarization with Large Language Models. 18981-18991 - Keda Tao, Can Qin, Haoxuan You, Yang Sui, Huan Wang:
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models. 18992-19001 - Chirag Parikh, Deepti Rawat, Rakshitha R. T, Tathagata Ghosh, Ravi Kiran Sarvadevabhatla:
RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives. 19002-19011 - Tanveer Hannan, Md Mohaiminul Islam, Jindong Gu, Thomas Seidl, Gedas Bertasius:
ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos. 19012-19022 - Ali Athar, Xueqing Deng, Liang-Chieh Chen:
ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation. 19023-19035 - Shehan Munasinghe, Hanan Gani, Wenqi Zhu, Jiale Cao, Eric P. Xing, Fahad Shahbaz Khan, Salman H. Khan:
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos. 19036-19046 - Yanjun Li, Zhaoyang Li, Honghui Chen, Lizhi Xu:
Unbiased Video Scene Graph Generation via Visual and Semantic Dual Debiasing. 19047-19056 - Hang Yin, Xiuwei Xu, Linqing Zhao, Ziwei Wang, Jie Zhou, Jiwen Lu:
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation. 19057-19066 - Feiyu Pan, Hao Fang, Fangkai Li, Yanyu Xu, Yawei Li, Luca Benini, Xiankai Lu:
Semantic and Sequential Alignment for Referring Video Object Segmentation. 19067-19076 - Yunxiang Fu, Meng Lou, Yizhou Yu:
SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation. 19077-19087 - Farzad Beizaee, Gregory A. Lodygensky, Christian Desrosiers, Jose Dolz:
Correcting Deviations from Normality: A Reformulated Diffusion Model for Multi-Class Unsupervised Anomaly Detection. 19088-19097 - Zhenghao Xing, Hao Chen, Binzhu Xie, Jiaqi Xu, Ziyu Guo, Xuemiao Xu, Jianye Hao, Chi-Wing Fu, Xiaowei Hu, Pheng-Ann Heng:
EchoTraffic: Enhancing Traffic Anomaly Understanding with Audio-Visual Insights. 19098-19108 - Han Hu, Wenli Du, Peng Liao, Bing Wang, Siyuan Fan:
Noise-Resistant Video Anomaly Detection via RGB Error-Guided Multiscale Predictive Coding and Dynamic Memory. 19109-19119 - Yuhan Shen, Ehsan Elhamifar:
Understanding Multi-Task Activities from Single-Task Videos. 19120-19131 - Mengmeng Wang, Zeyi Huang, Xiangjie Kong, Guojiang Shen, Guang Dai, Jingdong Wang, Yong Liu:
Action Detail Matters: Refining Video Recognition with Local Action Queries. 19132-19142 - Ziyu Yao, Xuxin Cheng, Zhiqi Huang, Lei Li:
CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model. 19143-19153 - Hongsong Wang, Xiaoyan Ma, Jidong Kuang, Jie Gui:
Heterogeneous Skeleton-Based Action Representation Learning. 19154-19164 - Xiaohai Li, Bineng Zhong, Qihua Liang, Zhiyi Mo, Jian Nong, Shuxiang Song:
Dynamic Updates for Language Adaptation in Visual-Language Tracking. 19165-19174 - Yu Guo, Weiquan Liu, Qingshan Xu, Shijun Zheng, Shujun Huang, Yu Zang, Siqi Shen, Chenglu Wen, Cheng Wang:
Boosting Adversarial Transferability through Augmentation in Hypothesis Space. 19175-19185 - Yuning Han, Bingyin Zhao, Rui Chu, Feng Luo, Biplab Sikdar, Yingjie Lao:
UIBDiffusion: Universal Imperceptible Backdoor Attack for Diffusion Models. 19186-19196 - Wei Ao, Vishnu Naresh Boddeti:
CryptoFace: End-to-End Encrypted Face Recognition. 19197-19206 - Xinjie Cui, Yuezun Li, Ao Luo, Jiaran Zhou, Junyu Dong:
Forensics Adapter: Adapting CLIP for Generalizable Face Forgery Detection. 19207-19217 - Haoran Wang, Xinji Mai, Zeng Tao, Xuan Tong, Junxiong Lin, Yan Wang, Jiawen Yu, Shaoqi Yan, Ziheng Zhou, Wenqiang Zhang:
D2SP: Dynamic Dual-Stage Purification Framework for Dual Noise Mitigation in Vision-based Affective Recognition. 19218-19229 - Tianyi Wang, Zichen Wang, Cong Wang, Yuanchao Shu, Ruilong Deng, Peng Cheng, Jiming Chen:
Can't Slow Me Down: Learning Robust and Hardware-Adaptive Object Detectors against Latency Attacks for Edge Devices. 19230-19240 - Wei Huang, Qinying Gu, Nanyang Ye:
Decision SpikeFormer: Spike-Driven Transformer for Decision Making. 19241-19250 - Zhiqi Pang, Junjie Wang, Lingling Zhao, Chunyu Wang:
Identity-Clothing Similarity Modeling for Unsupervised Clothing Change Person Re-Identification. 19251-19260 - Jinxi Yang, He Li, Bo Du, Mang Ye:
Cheb-GR: Rethinking K-nearest Neighbor Search in Re-ranking for Person Re-identification. 19261-19270 - Ji Du, Fangwei Hao, Mingyang Yu, Desheng Kong, Jiesheng Wu, Bin Wang, Jing Xu, Ping Li:
Shift the Lens: Environment-Aware Unsupervised Camouflaged Object Detection. 19271-19282 - Yi Yu, Botao Ren, Peiyuan Zhang, Mingxin Liu, Junwei Luo, Shaofeng Zhang, Feipeng Da, Junchi Yan, Xue Yang:
Point2RBox-v2: Rethinking Point-supervised Oriented Object Detection with Spatial Layout Among Instances. 19283-19293 - Hang Zhou, Xinxin Zuo, Rui Ma, Li Cheng:
BOOTPLACE: Bootstrapped Object Placement with Detection Transformers. 19294-19303 - Fangyun Wei, Jinjing Zhao, Kun Yan, Chang Xu:
Minimizing Labeled, Maximizing Unlabeled: An Image-Driven Approach for Video Instance Segmentation. 19304-19314 - Jiacheng Sun, Xinghong Zhou, Yiqiang Wu, Bin Zhu, Jiaxuan Lu, Yu Qin, Xiaomao Li:
PolarNeXt: Rethink Instance Segmentation with Polar Representation. 19315-19324 - Jihuai Zhao, Junbao Zhuo, Jiansheng Chen, Huimin Ma:
SAM2Object: Consolidating View Consistency via SAM2 for Zero-Shot 3D Instance Segmentation. 19325-19334 - Jiaxin Zhang, Junjun Jiang, Youyu Chen, Kui Jiang, Xianming Liu:
COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting. 19335-19344 - Bowen Yin, Jiao-Long Cao, Ming-Ming Cheng, Qibin Hou:
DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation. 19345-19355 - Chongkai Yu, Ting Liu, Anqi Li, Xiaochao Qu, Chengjing Wu, Luoqi Liu, Xiaolin Hu:
SAM-REF: Introducing Image-Prompt Synergy during Interaction for Detail Enhancement in the Segment Anything Model. 19356-19365 - Subhransu S. Bhattacharjee, Dylan Campbell, Rahul Shome:
Believing is Seeing: Unobserved Object Detection using Generative Models. 19366-19377 - Ege Özsoy, Chantal Pellegrini, Tobias Czempiel, Felix Tristram, Kun Yuan, David Bani-Harouni, Ulrich Eck, Benjamin Busam, Matthias Keicher, Nassir Navab:
MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments. 19378-19389 - Jinlong Li, Cristiano Saltori, Fabio Poiesi, Nicu Sebe:
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding. 19390-19400 - Chenyangguang Zhang, Alexandros Delitzas, Fangjinhua Wang, Ruida Zhang, Xiangyang Ji, Marc Pollefeys, Francis Engelmann:
Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces. 19401-19413 - Junsheng Wang, Nieqing Cao, Yan Ding, Mengying Xie, Fuqiang Gu, Chao Chen:
SKE-Layout: Spatial Knowledge Enhanced Layout Generation with LLMs. 19414-19423 - Hsiang-Wei Huang, Fu-Chen Chen, Wenhao Chai, Che-Chun Su, Lu Xia, Sanghun Jung, Cheng-Yen Yang, Jenq-Neng Hwang, Min Sun, Cheng-Hao Kuo:
Zero-shot 3D Question Answering via Voxel-based Dynamic Token Compression. 19424-19434 - Zhihao Yuan, Yibo Peng, Jinke Ren, Yinghong Liao, Yatong Han, Chun-Mei Feng, Hengshuang Zhao, Guanbin Li, Shuguang Cui, Zhen Li:
Empowering Large Language Models with 3D Situation Awareness. 19435-19445 - Damiano Marsili, Rohun Agrawal, Yisong Yue, Georgia Gkioxari:
Visual Agentic AI for Spatial Reasoning with a Dynamic API. 19446-19455 - Ziyi Bai, Hanxuan Li, Bin Fu, Chuyan Xiong, Ruiping Wang, Xilin Chen:
R2C: Mapping Room to Chessboard to Unlock LLM As Low-Level Action Planner. 19456-19466 - Yi Fang, Bowen Jin, Jiacheng Shen, Sirui Ding, Qiaoyu Tan, Jiawei Han:
GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs. 19467-19476 - Yuchen Sun, Shanhui Zhao, Tao Yu, Hao Wen, Samith Va, Mengwei Xu, Yuanchun Li, Chongyang Zhang:
GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration. 19477-19486 - Ximing Xing, Juncheng Hu, Guotao Liang, Jing Zhang, Dong Xu, Qian Yu:
Empowering LLMs to Understand and Generate Complex Vector Graphics. 19487-19497 - Kevin Qinghong Lin, Linjie Li, Difei Gao, Zhengyuan Yang, Shiwei Wu, Zechen Bai, Stan Weixian Lei, Lijuan Wang, Mike Zheng Shou:
ShowUI: One Vision-Language-Action Model for GUI Visual Agent. 19498-19508 - Xu Cao, Pranav Virupaksha, Wenqi Jia, Bolin Lai, Fiona Ryan, Sangmin Lee, James M. Rehg:
SocialGesture: Delving into Multi-person Gesture Understanding. 19509-19519 - Jun Gao, Yongqi Li, Ziqiang Cao, Wenjie Li:
Interleaved-Modal Chain-of-Thought. 19520-19529 - Guillaume Astruc, Nicolas Gonthier, Clément Mallet, Loïc Landrieu:
AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities. 19530-19540 - Peijie Wang, Zhong-Zhi Li, Fei Yin, Dekang Ran, Cheng-Lin Liu:
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts. 19541-19551 - James Burgess, Jeffrey J. Nirschl, Laura Bravo-Sánchez, Alejandro Lozano, Sanket Rajan Gupte, Jesus G. Galaz-Montoya, Yuhui Zhang, Yuchang Su, Disha Bhowmik, Zachary Coman, Sarina M. Hasan, Alexandra Johannesson, William D. Leineweber, Malvika G. Nair, Ridhi Yarlagadda, Connor Zuraski, Wah Chiu, Sarah Cohen, Jan N. Hansen, Manuel D. Leonetti, Chad Liu, Emma Lundberg, Serena Yeung-Levy:
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research. 19552-19564 - Ashmal Vayani, Dinura Dissanayake, Hasindri Watawana, Noor Ahsan, Nevasini Sasikumar, Omkar Thawakar, Henok Biadglign Ademtew, Yahya Hmaiti, Amandeep Kumar, Kartik Kuckreja, Mykola Maslych, Wafa Al Ghallabi, Mihail Minkov Mihaylov, Chao Qin, Abdelrahman M. Shaker, Mike Zhang, Mahardika Krisna Ihsani, Amiel Gian Esplana, Monil Gokani, Shachar Mirkin, Harsh Singh, Ashay Srivastava, Endre Hamerlik, Fathinah Asma Izzati, Fadillah Adamsyah Maani, Sebastian Cavada, Jenny Chim, Rohit Gupta, Sanjay Manjunath, Kamila Zhumakhanova, Feno Heriniaina Rabevohitra, Azril Hafizi Amirudin, Muhammad Ridzuan, Daniya Najiha Abdul Kareem, Ketan Pravin More, Kunyang Li, Pramesh Shakya, Muhammad Saad, Amirpouya Ghasemaghaei, Amirbek Djanibekov, Dilshod Azizov, Branislava Jankovic, Naman Bhatia, Alvaro Cabrera, Johan S. Obando-Ceron, Olympiah Otieno, Fabian Farestam, Muztoba Rabbani, Sanoojan Baliah, Santosh Sanjeev, Abduragim Shtanchaev, Maheen Fatima, Thao Nguyen, Amrin Kareem, Toluwani Aremu, Nathan Augusto Zacarias Xavier, Amit Bhatkal, Hawau Olamide Toyin, Aman Chadha, Hisham Cholakkal, Rao Muhammad Anwer, Michael Felsberg, Jorma Laaksonen, Thamar Solorio, Monojit Choudhury, Ivan Laptev, Mubarak Shah, Salman H. Khan, Fahad Shahbaz Khan:
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages. 19565-19575 - Ke Sun, Shen Chen, Taiping Yao, Ziyin Zhou, Jiayi Ji, Xiaoshuai Sun, Chia-Wen Lin, Rongrong Ji:
Towards General Visual-Linguistic Face Forgery Detection. 19576-19586 - Zhicheng Wang, Zhiyu Pan, Zhan Peng, Jian Cheng, Liwen Xiao, Wei Jiang, Zhiguo Cao:
Exploring Contextual Attribute Density in Referring Expression Counting. 19587-19596 - Wenlong Fang, Qiaofeng Wu, Jing Chen, Yun Xue:
Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering. 19597-19607 - Tianyu Huai, Jie Zhou, Xingjiao Wu, Qin Chen, Qingchun Bai, Ze Zhou, Liang He:
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering. 19608-19617 - Fan Lu, Wei Wu, Kecheng Zheng, Shuailei Ma, Biao Gong, Jiawei Liu, Wei Zhai, Yang Cao, Yujun Shen, Zheng-Jun Zha:
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning. 19618-19627 - Shuxian Li, Changhao He, Xiting Liu, Joey Tianyi Zhou, Xi Peng, Peng Hu:
Learning with Noisy Triplet Correspondence for Composed Image Retrieval. 19628-19637 - Eric Xing, Pranavi Kolouju, Robert Pless, Abby Stylianou, Nathan Jacobs:
ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval. 19638-19648 - Qiang Zou, Shuli Cheng, Jiayi Chen:
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval. 19649-19658 - Sungyeon Kim, Xinliang Zhu, Xiaofan Lin, Muhammet Bastan, Douglas Gray, Suha Kwak:
GENIUS: A Generative Framework for Universal Multimodal Search. 19659-19669 - Yingxin Lai, Cuijie Xu, Haitian Shi, Guoqing Yang, Xiaoning Li, Zhiming Luo, Shaozi Li:
Font-Agent: Enhancing Font Understanding with Large Language Models. 19670-19680 - Bin Wang, Fan Wu, Linke Ouyang, Zhuangcheng Gu, Rui Zhang, Renqiu Xia, Botian Shi, Bo Zhang, Conghui He:
Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching. 19681-19690 - Arun V. Reddy, Alexander Martin, Eugene Yang, Andrew Yates, Kate Sanders, Kenton Murray, Reno Kriz, Celso M. de Melo, Benjamin Van Durme, Rama Chellappa:
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval. 19691-19701 - Leqi Shen, Guoqiang Gong, Tianxiang Hao, Tao He, Yifeng Zhang, Pengzhang Liu, Sicheng Zhao, Jungong Han, Guiguang Ding:
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval. 19702-19712 - Siyuan Li, Luyuan Zhang, Zedong Wang, Juanxi Tian, Cheng Tan, Zicheng Liu, Chang Yu, Qingsong Xie, Haonan Lu, Haoqian Wang, Zhen Lei:
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization. 19713-19723 - Alejandro Lozano, Min Woo Sun, James Burgess, Liangyu Chen, Jeffrey J. Nirschl, Jeffrey Gu, Ivan Lopez, Josiah Aklilu, Anita Rau, Austin Wolfgang Katzer, Yuhui Zhang, Collin Chiu, Xiaohan Wang, Alfred Seunghoon Song, Robert Tibshirani, Serena Yeung-Levy:
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature. 19724-19735 - Xudong Wang, Xingyi Zhou, Alireza Fathi, Trevor Darrell, Cordelia Schmid:
Visual Lexicon: Rich Image Features in Language Space. 19736-19747 - Fiona Ryan, Josef Sivic, Fabian Caba Heilbron, Judy Hoffman, James M. Rehg, Bryan C. Russell:
Improving Personalized Search with Regularized Low-Rank Parameter Updates. 19748-19757 - Jingyi Xie, Jintao Yang, Zhunchen Luo, Yunbo Cao, Qiang Gao, Mengyuan Zhang, Wenpeng Hu:
AdaDARE-gamma: Balancing Stability and Plasticity in Multi-modal LLMs through Efficient Adaptation. 19758-19768 - Pavan Kumar Anasosalu Vasu, Fartash Faghri, Chun-Liang Li, Cem Koc, Nate True, Albert Antony, Gokula Santhanam, James Gabriel, Peter Grasch, Oncel Tuzel, Hadi Pouransari:
FastVLM: Efficient Vision Encoding for Vision Language Models. 19769-19780 - Zhi Zhang, Srishti Yadav, Fengze Han, Ekaterina Shutova:
Cross-modal Information Flow in Multimodal Large Language Models. 19781-19791 - Senqiao Yang, Yukang Chen, Zhuotao Tian, Chengyao Wang, Jingyao Li, Bei Yu, Jiaya Jia:
VisionZip: Longer is Better but Not Necessary in Vision Language Models. 19792-19802 - Cheng Yang, Yang Sui, Jinqi Xiao, Lingyi Huang, Yu Gong, Chendi Li, Jinghua Yan, Yu Bai, Ponnuswamy Sadayappan, Xia Hu, Bo Yuan:
TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model. 19803-19813 - Wangbo Zhao, Yizeng Han, Jiasheng Tang, Zhikai Li, Yibing Song, Kai Wang, Zhangyang Wang, Yang You:
A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs. 19814-19824 - Souhail Hadgi, Luca Moschella, Andrea Santilli, Diego Gomez, Qixing Huang, Emanuele Rodolà, Simone Melzi, Maks Ovsjanikov:
Escaping Plato's Cave: Towards the Alignment of 3D and Text Latent Spaces. 19825-19835 - Yahan Tu, Rui Hu, Jitao Sang:
ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models. 19836-19845 - Jiajun Cao, Yuan Zhang, Tao Huang, Ming Lu, Qizhe Zhang, Ruichuan An, Ningning Ma, Shanghang Zhang:
MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders. 19846-19856 - Jiazhen Liu, Yuhan Fu, Ruobing Xie, Runquan Xie, Xingwu Sun, Fengzong Lian, Zhanhui Kang, Xirong Li:
PhD: A ChatGPT-Prompted Visual Hallucination Evaluation Dataset. 19857-19866 - Yongting Zhang, Lu Chen, Guodong Zheng, Yifeng Gao, Rui Zheng, Jinlan Fu, Zhenfei Yin, Senjie Jin, Yu Qiao, Xuanjing Huang, Feng Zhao, Tao Gui, Jing Shao:
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Models. 19867-19878 - Yanbo Wang, Jiyang Guan, Jian Liang, Ran He:
Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models? 19879-19889 - Shuyang Hao, Bryan Hooi, Jun Liu, Kai-Wei Chang, Zi Huang, Yujun Cai:
Exploring Visual Vulnerabilities via Multi-Loss Adversarial Search for Jailbreaking Vision-Language Models. 19890-19899 - Jiaming Zhang, Junhong Ye, Xingjun Ma, Yige Li, Yunfan Yang, Yunhao Chen, Jitao Sang, Dit-Yan Yeung:
Anyattack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models. 19900-19909 - Xin Wang, Kai Chen, Jiaming Zhang, Jingjing Chen, Xingjun Ma:
TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models. 19910-19920 - Baoshun Tong, Hanjiang Lai, Yan Pan, Jian Yin:
On the Zero-shot Adversarial Robustness of Vision-Language Models: A Truly Zero-shot and Training-free Approach. 19921-19930 - Julio Silva-Rodríguez, Ismail Ben Ayed, Jose Dolz:
Conformal Prediction for Zero-Shot Models. 19931-19941 - Ashshak Sharifdeen, Muhammad Akhtar Munir, Sanoojan Baliah, Salman Khan, Muhammad Haris Khan:
O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models. 19942-19951 - Yicheng Chen, Xiangtai Li, Yining Li, Yanhong Zeng, Jianzong Wu, Xiangyu Zhao, Kai Chen:
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language. 19952-19962 - Bikang Pan, Qun Li, Xiaoying Tang, Wei Huang, Zhen Fang, Feng Liu, Jingya Wang, Jingyi Yu, Ye Shi:
NLPrompt: Noise-Label Prompt Learning for Vision-Language Models. 19963-19973 - Tung-Long Vuong, Hoang Phan, Vy Vo, Anh Bui, Thanh-Toan Do, Trung Le, Dinh Phung:
Preserving Clusters in Prompt Learning for Unsupervised Domain Adaptation. 19974-19984 - Tianyu Yu, Haoye Zhang, Qiming Li, Qixin Xu, Yuan Yao, Da Chen, Xiaoman Lu, Ganqu Cui, Yunkai Dang, Taiwen He, Xiaocheng Feng, Jun Song, Bo Zheng, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun:
RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness. 19985-19995 - Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr, Federico Tombari, Bernt Schiele:
Test-Time Visual In-Context Tuning. 19996-20005 - Pramit Saha, Felix Wagner, Divyanshu Mishra, Can Peng, Anshul Thakur, David A. Clifton, Konstantinos Kamnitsas, J. Alison Noble:
F^3OCUS - Federated Finetuning of Vision-Language Foundation Models with Optimal Client Layer Updating Strategy via Multi-objective Meta-Heuristics. 20006-20017 - Arne Grobrügge, Niklas Kühl, Gerhard Satzger, Philipp Spitzer:
Towards Human-Understandable Multi-Dimensional Concept Discovery. 20018-20027 - Jinhong Lin, Cheng-En Wu, Huanran Li, Jifan Zhang, Yu Hen Hu, Pedro Morgado:
From Prototypes to General Distributions: An Efficient Curriculum for Masked Image Modeling. 20028-20038 - Yancheng Cai, Fei Yin, Dounia Hammou, Rafal Mantiuk:
Do Computer Vision Foundation Models Learn the Low-level Characteristics of the Human Visual System? 20039-20048 - Duolikun Danier, Mehmet Aygün, Changjian Li, Hakan Bilen, Oisin Mac Aodha:
DepthCues: Evaluating Monocular Depth Perception in Large Vision Models. 20049-20059 - Zhaoqing Wang, Xiaobo Xia, Runnan Chen, Dongdong Yu, Changhu Wang, Mingming Gong, Tongliang Liu:
LaVin-DiT: Large Vision Diffusion Transformer. 20060-20070 - Dongshuo Yin, Leiyi Hu, Bin Li, Youqun Zhang, Xue Yang:
5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks. 20071-20081 - Uranik Berisha, Jens Mehnert, Alexandru Paul Condurache:
Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks. 20082-20091 - Lixu Wang, Bingqi Shang, Yi Li, Payal Mohapatra, Wei Dong, Xiao Wang, Qi Zhu:
Split Adaptation for Pre-trained Vision Transformers. 20092-20102 - Jialai Wang, Yuxiao Wu, Weiye Xu, Yating Huang, Chao Zhang, Zongpeng Li, Mingwei Xu, Zhenkai Liang:
Your Scale Factors are My Weapon: Targeted Bit-Flip Attacks on Vision Transformers via Scale Factor Manipulation. 20103-20112 - Xinglong Sun, Barath Lakshmanan, Maying Shen, Shiyi Lan, Jingde Chen, José M. Álvarez:
MDP: Multidimensional Vision Model Pruning with Latency Constraint. 20113-20123 - Fei Xie, Jiahao Nie, Yujin Tang, Wenkang Zhang, Hongshen Zhao:
Mamba-Adaptor: State Space Model Adaptor for Visual Recognition. 20124-20134 - Yuan Zhou, Qingshan Xu, Jiequan Cui, Junbao Zhou, Jing Zhang, Richang Hong, Hanwang Zhang:
CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction. 20135-20145 - Zichen Miao, Wei Chen, Qiang Qiu:
Coeff-Tuning: A Graph Filter Subspace View for Tuning Attention-Based Large Models. 20146-20157 - Caoshuo Li, Tanzhe Li, Xiaobin Hu, Donghao Luo, Taisong Jin:
DVHGNN: Multi-Scale Dilated Vision HGNN for Efficient Vision Recognition. 20158-20168 - Ruiheng Liu, Haozhe Chen, Boyao Zhao, Kejiang Chen, Weiming Zhang:
Graph-Embedded Structure-Aware Perceptual Hashing for Neural Network Protection and Piracy Detection. 20169-20178 - Yang Liu, Tianwei Zhang, Shi Gu:
Hybrid Concept Bottleneck Models. 20179-20189 - Sanghyun Kim, Deunsol Jung, Minsu Cho:
Locality-Aware Zero-Shot Human-Object Interaction Detection. 20190-20200 - Dianmo Sheng, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Tao Gong, Bin Liu, Jing Han, Wenbin Tu, Shengwei Xu, Nenghai Yu:
UNICL-SAM: Uncertainty-Driven In-Context Segmentation with Part Prototype Discovery. 20201-20211 - Zhengyang Wang, Tingliang Feng, Fan Lyu, Fanhua Shang, Wei Feng, Liang Wan:
Dual Semantic Guidance for Open Vocabulary Semantic Segmentation. 20212-20222 - Zhiwei Yang, Yucong Meng, Kexue Fu, Feilong Tang, Shuo Wang, Zhijian Song:
Exploring CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation. 20223-20232 - Chen-Yi Lu, Kasra Derakhshandeh, Somali Chaterji:
Improving Semi-Supervised Semantic Segmentation with Sliced-Wasserstein Feature Alignment and Uniformity. 20233-20243 - Zhongwen Zhang, Yuri Boykov:
Soft Self-labeling and Potts Relaxations for Weakly-supervised Segmentation. 20244-20253 - Zhaochen Liu, Limeng Qiao, Xiangxiang Chu, Lin Ma, Tingting Jiang:
Towards Efficient Foundation Model for Zero-shot Amodal Segmentation. 20254-20264 - Dongkai Wang, Jiang Duan, Liangjian Wen, Shiyu Xuan, Hao Chen, Shiliang Zhang:
Generalizable Object Keypoint Localization from Generative Priors. 20265-20274 - Huajie Jiang, Zhengxian Li, Xiaohan Yu, Yongli Hu, Baocai Yin, Jian Yang, Yuankai Qi:
Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning. 20275-20285 - Libiao Chen, Dong Nie, Junjun Pan, Jing Yan, Zhenyu Tang:
Generalized Zero-Shot Classification via Semantics-Free Inter-Class Feature Generation. 20286-20295 - Enguang Wang, Zhimao Peng, Zhengyuan Xie, Fei Yang, Xialei Liu, Ming-Ming Cheng:
GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery. 20296-20306 - Chang-Bin Zhang, Jinhong Ni, Yujie Zhong, Kai Han:
v-CLR: View-Consistent Learning for Open-World Instance Segmentation. 20307-20317 - Muli Yang, Gabriel James Goenawan, Huaiyuan Qin, Kai Han, Xi Peng, Yanhua Yang, Hongyuan Zhu:
Detecting Open World Objects via Partial Attribute Assignment. 20318-20328 - Jiangyi Wang, Na Zhao:
Uncertainty Meets Diversity: A Comprehensive Active Learning Framework for Indoor 3D Object Detection. 20329-20339 - Shizhou Zhang, Xueqiang Lv, Yinghui Xing, Qirui Wu, Di Xu, Yanning Zhang:
Revisiting Generative Replay for Class Incremental Object Detection. 20340-20349 - Yu Zhou, Dian Zheng, Qijie Mo, Renjie Lu, Kun-Yu Lin, Wei-Shi Zheng:
Decoupled Distillation to Erase: A General Unlearning Method for Any Class-centric Tasks. 20350-20359 - Mauricio Byrd Victorica, György Dán, Henrik Sandberg:
Saliuitl: Ensemble Salience Guided Recovery of Adversarial Patches against CNNs. 20360-20369 - Jiacong Xu, Shao-Yuan Lo, Bardia Safaei, Vishal M. Patel, Isht Dwivedi:
Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models. 20370-20382 - Mojtaba Nafez, Amirhossein Koochakian, Arad Maleki, Jafar Habibi, Mohammad Hossein Rohban:
PatchGuard: Adversarially Robust Anomaly Detection and Localization through Vision Transformers and Pseudo Anomalies. 20383-20394 - Ankan Bhunia, Changjian Li, Hakan Bilen:
Odd-One-Out: Anomaly Detection by Comparing with Neighbors. 20395-20404 - Jia Guo, Shuai Lu, Weihang Zhang, Fang Chen, Huiqi Li, Hongen Liao:
Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection. 20405-20415 - Fuyun Wang, Tong Zhang, Yuanzhi Wang, Yide Qiu, Xin Liu, Xu Guo, Zhen Cui:
Distribution Prototype Diffusion Learning for Open-set Supervised Anomaly Detection. 20416-20426 - Mohamed Afane, Gabrielle Ebbrecht, Ying Wang, Juntao Chen, Junaid Farooq:
ATP: Adaptive Threshold Pruning for Efficient Data Encoding in Quantum Neural Networks. 20427-20436 - Yanda Chen, Gongwei Chen, Miao Zhang, Weili Guan, Liqiang Nie:
Curriculum Coarse-to-Fine Selection for High-IPC Dataset Distillation. 20437-20446 - Byeongho Heo, Taekyung Kim, Sangdoo Yun, Dongyoon Han:
Masking meets Supervision: A Strong Learning Alliance. 20447-20457 - Qing Zhou, Junyu Gao, Qi Wang:
Scale Efficient Training for Large Datasets. 20458-20467 - Eliahu Horwitz, Bar Cavia, Jonathan Kahana, Yedid Hoshen:
Learning on Model Weights using Tree Experts. 20468-20478 - Sebastian Dziadzio, Vishaal Udandarao, Karsten Roth, Ameya Prabhu, Zeynep Akata, Samuel Albanie, Matthias Bethge:
How to Merge Your Multimodal Models Over Time? 20479-20491 - Xiaohan Qin, Xiaoxing Wang, Junchi Yan:
Revisiting Fairness in Multitask Learning: A Performance-Driven Approach for Variance Reduction. 20492-20501 - Sihao Liu, Yibo Yang, Xiaojie Li, David A. Clifton, Bernard Ghanem:
Enhancing Online Continual Learning with Plug-and-Play State Space Model and Class-Conditional Mixture of Discretization. 20502-20511 - Fei Ye, Adrian G. Bors:
Online Task-Free Continual Learning via Dynamic Expansionable Memory Distribution. 20512-20522 - Zijian Gao, Wangwang Jia, Xingxing Zhang, Dulan Zhou, Kele Xu, Dawei Feng, Yong Dou, Xinjun Mao, Huaimin Wang:
Knowledge Memorization and Rumination for Pre-trained Model-based Class-Incremental Learning. 20523-20533 - Arnav M. Das, Gantavya Bhatt, Lilly Kumari, Sahil Verma, Jeff A. Bilmes:
COBRA: COmBinatorial Retrieval Augmentation for Few-Shot Adaptation. 20534-20546 - Da-Wei Zhou, Zi-Wen Cai, Han-Jia Ye, Lijun Zhang, De-Chuan Zhan:
Dual Consolidation for Pre-Trained Model-Based Domain-Incremental Learning. 20547-20557 - Aristotelis Ballas, Christos Diou:
Gradient-Guided Annealing for Domain Generalization. 20558-20568 - Xiangyu Chang, Fahim Faisal Niloy, Sk Miraj Ahmed, Srikanth V. Krishnamurthy, Basak Guler, Ananthram Swami, Samet Oymak, Amit K. Roy-Chowdhury:
AdMiT: Adaptive Multi-Source Tuning in Dynamic Environments. 20569-20579 - Hassan Mahmood, Ehsan Elhamifar:
Compositional Targeted Multi-Label Universal Perturbations. 20580-20591 - Tianhao Ma, Han Chen, Juncheng Hu, Yungang Zhu, Ximing Li:
Forming Auxiliary High-confident Instance-level Loss to Promote Learning from Label Proportions. 20592-20601 - Jae Hyeon Park, Joo Hyeon Jeon, Jae Yun Lee, Sangyeon Ahn, Minhee Cha, Min Geol Kim, Hyeok Nam, Sung In Cho:
Dynamic Pseudo Labeling via Gradient Cutting for High-Low Entropy Exploration. 20602-20611 - Erik Wallin, Fredrik Kahl, Lars Hammarstrand:
ProHOC: Probabilistic Hierarchical Out-of-Distribution Classification via Multi-Depth Networks. 20612-20621 - Divya Shanmugam, Helen Lu, Swami Sankaranarayanan, John V. Guttag:
Test-time Augmentation Improves Efficiency in Conformal Prediction. 20622-20631 - Xiangtao Zhang, Sheng Li, Ao Li, Yipeng Liu, Fan Zhang, Ce Zhu, Le Zhang:
Subspace Constraint and Contribution Estimation for Heterogeneous Federated Learning. 20632-20642 - Gongxi Zhu, Donghao Li, Hanlin Gu, Yuan Yao, Lixin Fan, Yuxing Han:
FedMIA: An Effective Membership Inference Attack Exploiting "All for One" Principle in Federated Learning. 20643-20653 - Jiahao Xu, Zikai Zhang, Rui Hu:
Detecting Backdoor Attacks in Federated Learning via Direction Alignment Inspection. 20654-20664 - Fan Xing, Zhuo Tian, Xuefeng Fan, Xiaoyi Zhou:
RAEncoder: A Label-Free Reversible Adversarial Examples Encoder for Dataset Intellectual Property Protection. 20665-20674 - Sizai Hou, Songze Li, Duanyi Yao:
DeDe: Detecting Backdoor Samples for SSL Encoders via Decoders. 20675-20684 - Shixin Li, Chaoxiang He, Xiaojing Ma, Bin Benjamin Zhu, Shuo Wang, Hongsheng Hu, Dongmei Zhang, Linchen Yu:
Enhancing Adversarial Transferability with Checkpoints of a Single Model's Training. 20685-20694 - Yuan Xiao, Yuchen Chen, Shiqing Ma, Chunrong Fang, Tongtong Bai, Mingzheng Gu, Yuxin Cheng, Yanwei Chen, Zhenyu Chen:
Tightening Robustness Verification of MaxPool-based Neural Networks via Minimizing the Over-Approximation Zone. 20695-20705 - Quanjiang Li, Tingjin Luo, Jiahui Liao:
Theory-Inspired Deep Multi-View Multi-Label Learning with Incomplete Views and Noisy Labels. 20706-20715 - Baili Xiao, Zhibin Dong, Ke Liang, Suyuan Liu, Siwei Wang, Tianrui Liu, Xingchen Hu, En Zhu, Xinwang Liu:
EASEMVC: Efficient Dual Selection Mechanism for Deep Multi-View Clustering. 20716-20726 - Jiyuan Liu, Xinwang Liu, Chuankun Li, Xinhang Wan, Hao Tan, Yi Zhang, Weixuan Liang, Qian Qu, Yu Feng, Renxiang Guan, Ke Liang:
Large-scale Multi-view Tensor Clustering with Implicit Linear Kernels. 20727-20736 - Jungkyoo Shin, Bumsoo Kim, Eunwoo Kim:
Generative Modeling of Class Probability for Multi-Modal Representation Learning. 20737-20746 - Siyuan Duan, Yuan Sun, Dezhong Peng, Zheng Liu, Xiaomin Song, Peng Hu:
Fuzzy Multimodal Learning for Trusted Cross-modal Retrieval. 20747-20756 - Tal Zeevi, Ravid Shwartz-Ziv, Yann LeCun, Lawrence H. Staib, John A. Onofrey:
Rate-In: Information-Driven Adaptive Dropout Rates for Improved Inference-Time Uncertainty Estimation. 20757-20766 - Divya Velayudhan, Abdelfatah Hassan Ahmed, Mohamad Alansari, Neha Gour, Abderaouf Behouch, Taimur Hassan, Syed Talal Wasim, Nabil Maalej, Muzammal Naseer, Juergen Gall, Mohammed Bennamoun, Ernesto Damiani, Naoufel Werghi:
STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection. 20767-20777 - Yang Yue, Yulin Wang, Chenxin Tao, Pan Liu, Shiji Song, Gao Huang:
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning. 20778-20788 - Jianwei Zhao, Xin Li, Fan Yang, Qiang Zhai, Ao Luo, Yang Zhao, Hong Cheng, Huazhu Fu:
MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification. 20789-20799 - Xianrui Li, Yufei Cui, Jun Li, Antoni B. Chan:
Advancing Multiple Instance Learning with Continual Learning for Whole Slide Imaging. 20800-20809 - Hang Shi, Changxi Chi, Peng Wan, Daoqiang Zhang, Wei Shao:
Multi-modal Topology-embedded Graph Learning for Spatially Resolved Genes Prediction from Pathology Images with Prior Gene Similarity Information. 20810-20819 - Marcus Nordström, Atsuto Maki, Henrik Hult:
The Impact Label Noise and Choice of Threshold has on Cross-Entropy and Soft-Dice in Image Segmentation. 20820-20829 - Yunhe Gao, Di Liu, Zhuowei Li, Yunsheng Li, Dongdong Chen, Mu Zhou, Dimitris N. Metaxas:
Show and Segment: Universal Medical Image Segmentation via In-Context Learning. 20830-20840 - Junlong Cheng, Bin Fu, Jin Ye, Guoan Wang, Tianbin Li, Haoyu Wang, Ruoyu Li, He Yao, Junren Chen, Jingwen Li, Yanzhou Su, Min Zhu, Junjun He:
Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline. 20841-20851 - Yanfeng Zhou, Lingrui Li, Le Lu, Minfeng Xu:
nnWNet: Rethinking the Use of Transformers in Biomedical Image Segmentation and Calling for a Unified Evaluation Benchmark. 20852-20862 - Yufan He, Pengfei Guo, Yucheng Tang, Andriy Myronenko, Vishwesh Nath, Ziyue Xu, Dong Yang, Can Zhao, Benjamin Simon, Mason Belue, Stephanie A. Harmon, Baris Turkbey, Daguang Xu, Wenqi Li:
VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging. 20863-20873 - Bastian Wittmann, Yannick Wattenberg, Tamaz Amiranashvili, Suprosanna Shit, Bjoern H. Menze:
vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation. 20874-20884 - Zhuangzhuang Chen, Hualiang Wang, Chubin Ou, Xiaomeng Li:
MuTri: Multi-view Tri-alignment for OCT to OCTA 3D Image Translation. 20885-20894
Day 3: 2025-06-15
- Bingliang Zhang, Wenda Chu, Julius Berner, Chenlin Meng, Anima Anandkumar, Yang Song:
Improving Diffusion Inverse Problem Solving with Decoupled Noise Annealing. 20895-20905 - Zhendong Wang, Jianmin Bao, Shuyang Gu, Dong Chen, Wengang Zhou, Houqiang Li:
DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models. 20906-20915 - Lingjie Kong, Kai Wu, Chengming Xu, Xiaobin Hu, Wenhui Han, Jinlong Peng, Donghao Luo, Mengtian Li, Jiangning Zhang, Chengjie Wang, Yanwei Fu:
CustAny: Customizing Anything from A Single Example. 20916-20925 - Soobin Um, Jong Chul Ye:
Minority-Focused Text-to-Image Generation via Prompt Optimization. 20926-20936 - Andreas Müller, Denis Lukovnikov, Jonas Thietke, Asja Fischer, Erwin Quiring:
Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models. 20937-20946 - Hao Lin, Ke Wu, Jie Li, Jun Li, Wu-Jun Li:
UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming. 20947-20957 - Yanbiao Ma, Wei Dai, Wenke Huang, Jiayi Chen:
Geometric Knowledge-Guided Localized Global Distribution Alignment for Federated Learning. 20958-20968 - Kai Zhao, Zhihao Zhuang, Miao Zhang, Chenjuan Guo, Yang Shu, Bin Yang:
Enhancing Diversity for Data-free Quantization. 20969-20978 - Meilong Xu, Saumya Gupta, Xiaoling Hu, Chen Li, Shahira Abousamra, Dimitris Samaras, Prateek Prasanna, Chao Chen:
TopoCellGen: Generating Histopathology Cell Topology with a Diffusion Model. 20979-20989 - Aishik Konwer, Zhijian Yang, Erhan Bas, Cao Xiao, Prateek Prasanna, Parminder Bhatia, Taha A. Kass-Hout:
Enhancing SAM with Efficient Prompting and Preference Optimization for Semi-supervised Medical Image Segmentation. 20990-21000 - Junying Wang, Hongyuan Zhang, Yuan Yuan:
Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks. 21001-21010 - Shoichiro Takeda, Yasunori Akagi:
Gromov-Wasserstein Problem with Cyclic Symmetry. 21011-21020 - Runfeng Li, Mikhail Okunev, Zixuan Guo, Anh Ha Duong, Christian Richardt, Matthew O'Toole, James Tompkin:
Time of the Flight of the Gaussians: Optimizing Depth Indirectly in Dynamic Radiance Fields. 21021-21030 - Yiqing Liang, Abhishek Badki, Hang Su, James Tompkin, Orazio Gallo:
Zero-Shot Monocular Scene Flow Estimation in the Wild. 21031-21044 - Jialin Zhu, Jiangbei Yue, Feixiang He, He Wang:
3D Student Splatting and Scooping. 21045-21054 - Ziqiao Peng, Yanbo Fan, Haoyu Wu, Xuan Wang, Hongyan Liu, Jun He, Zhaoxin Fan:
DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations. 21055-21064 - Lee Chae-Yeon, Oh Hyun-Bin, Han EunGi, Sung-Bin Kim, Suekyeong Nam, Tae-Hyun Oh:
Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics. 21065-21074 - Dingcheng Zhen, Shunshun Yin, Shiyang Qin, Hou Yi, Ziwei Zhang, Siyuan Liu, Gan Qi, Ming Tao:
Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation. 21075-21085 - Jiahao Cui, Hui Li, Yun Zhan, Hanlin Shang, Kaihui Cheng, Yuqi Ma, Shan Mu, Hang Zhou, Jingdong Wang, Siyu Zhu:
Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer. 21086-21095 - Shuyuan Tu, Zhen Xing, Xintong Han, Zhi-Qi Cheng, Qi Dai, Chong Luo, Zuxuan Wu:
StableAnimator: High-Quality Identity-Preserving Human Image Animation. 21096-21106 - Yuan Li, Ziqian Bai, Feitong Tan, Zhaopeng Cui, Sean Fanello, Yinda Zhang:
IM-Portrait: Learning 3D-aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos. 21107-21116 - Yating Wang, Xuan Wang, Ran Yi, Yanbo Fan, Jichen Hu, Jingcheng Zhu, Lizhuang Ma:
3D Gaussian Head Avatars with Expressive Dynamic Appearances by Compact Tensorial Representations. 21117-21126 - Di Liu, Teng Deng, Giljoo Nam, Yu Rong, Stanislav Pidhorskyi, Junxuan Li, Jason M. Saragih, Dimitris N. Metaxas, Chen Cao:
LUCAS: Layered Universal Codec Avatars. 21127-21137 - Soohyun Lee, Seoyeon Kim, HeeKyung Lee, Won-Sik Jeong, Joo Ho Lee:
GeoAvatar: Geometrically-Consistent Multi-Person Avatar Reconstruction from Sparse Multi-View Videos. 21138-21147 - Lingteng Qiu, Shenhao Zhu, Qi Zuo, Xiaodong Gu, Yuan Dong, Junfei Zhang, Chao Xu, Zhe Li, Weihao Yuan, Liefeng Bo, Guanying Chen, Zilong Dong:
AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction. 21148-21158 - Zhichao Zhai, Guikun Chen, Wenguan Wang, Dong Zheng, Jun Xiao:
TAGA: Self-supervised Learning for Template-free Animatable Gaussian Articulated Model. 21159-21169 - Mingze Sun, Junhao Chen, Junting Dong, Yurun Chen, Xinyu Jiang, Shiwei Mao, Puhua Jiang, Jingbo Wang, Bo Dai, Ruqi Huang:
DRiVE: Diffusion-based Rigging Empowers Generation of Versatile and Expressive Characters. 21170-21180 - Yifang Men, Yuan Yao, Miaomiao Cui, Liefeng Bo:
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling. 21181-21191 - Satyajit Tourani, Siddharth Tourani, Arif Mahmood, Muhammad Haris Khan:
Unsupervised Discovery of Facial Landmarks and Head Pose. 21192-21202 - Yuxi Mi, Zhizhou Zhong, Yuge Huang, Qiuyang Yuan, Xuan Zhao, Jianqing Xu, Shouhong Ding, Shaoming Wang, Rizen Guo, Shuigeng Zhou:
Data Synthesis with Diverse Styles for Face Recognition via 3DMM-Guided Diffusion. 21203-21214 - Michelle Guo, Matt Jen-Yuan Chiang, Igor Santesteban, Nikolaos Sarafianos, Hsiao-Yu Chen, Oshri Halimi, Aljaz Bozic, Shunsuke Saito, Jiajun Wu, C. Karen Liu, Tuur Stuyck, Egor Larionov:
PGC: Physics-Based Gaussian Cloth from a Single Pose. 21215-21225 - Zeqing Wang, Qingyang Ma, Wentao Wan, Haojie Li, Keze Wang, Yonghong Tian:
Is this Generated Person Existed in Real-world? Fine-grained Detecting and Calibrating Abnormal Human-body. 21226-21237 - Nannan Li, Kevin J. Shih, Bryan A. Plummer:
Enhancing Virtual Try-On with Synthetic Pairs and Error-Aware Noise Scheduling. 21238-21247 - Yuanwei Liu, Hui Wei, Chengyu Jia, Ruqi Xiao, Weijian Ruan, Xingxing Wei, Joey Tianyi Zhou, Zheng Wang:
ProjAttacker: A Configurable Physical Adversarial Attack for Face Recognition via Projector. 21248-21257 - Yu-Cheng Chiu, Guan-Rong Chen, Zihao Chen, Yan-Tsung Peng:
ABC-Former: Auxiliary Bimodal Cross-domain Transformer with Interactive Channel Attention for White Balance. 21258-21266 - Rui Xu, Yuzhen Niu, Yuezhou Li, Huangbiao Xu, Wenxi Liu, Yuzhong Chen:
URWKV: Unified RWKV Model with Multi-state Perspective for Low-light Image Restoration. 21267-21276 - Guanzhou Lan, Qianli Ma, Yuqi Yang, Zhigang Wang, Dong Wang, Xuelong Li, Bin Zhao:
Efficient Diffusion as Low Light Enhancer. 21277-21286 - Hesong Li, Ziqi Wu, Ruiwen Shao, Tao Zhang, Ying Fu:
Noise Calibration and Spatial-Frequency Interactive Network for STEM Image Enhancement. 21287-21296 - Yujie Wang, Praneeth Chakravarthula, Baoquan Chen:
DOF-GS: Adjustable Depth-of-Field 3D Gaussian Splatting for Post-Capture Refocusing, Defocus Rendering and Blur Removal. 21297-21306 - Jingzhi Li, Zongwei Wu, Eduard Zamfir, Radu Timofte:
ReCap: Better Gaussian Relighting with Cross-Environment Captures. 21307-21316 - Yue Fan, Ningjing Fan, Ivan Skorokhodov, Oleg Voynov, Savva Ignatyev, Evgeny Burnaev, Peter Wonka, Yiqun Wang:
Factored-NeuS: Reconstructing Surfaces, Illumination, and Materials of Possibly Glossy Objects. 21317-21327 - Cheng-De Fan, Chen-Wei Chang, Yi-Ruei Liu, Jie-Ying Lee, Jiun-Long Huang, Yu-Chee Tseng, Yu-Lun Liu:
SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes. 21328-21338 - Xingyu Chen, Zihao Feng, Kun Qian, Xinyu Zhang:
Radio Frequency Ray Tracing with Neural Object Representation for Enhanced RF Modeling. 21339-21348 - You Wang, Li Fang, Hao Zhu, Fei Hu, Long Ye, Zhan Ma:
GoLF-NRT: Integrating Global Context and Local Geometry for Few-Shot View Synthesis. 21349-21359 - Jan Held, Renaud Vandeghen, Abdullah Hamdi, Adrien Deliège, Anthony Cioppa, Silvio Giancola, Andrea Vedaldi, Bernard Ghanem, Marc Van Droogenbroeck:
3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes. 21360-21369 - Stefano Esposito, Anpei Chen, Christian Reiser, Samuel Rota Bulò, Lorenzo Porzi, Katja Schwarz, Christian Richardt, Michael Zollhöfer, Peter Kontschieder, Andreas Geiger:
Volumetric Surfaces: Representing Fuzzy Geometries with Layered Meshes. 21370-21380 - Shu Wang, Yanbo Gao, Shuai Li, Chong Lv, Xun Cai, Chuankun Li, Hui Yuan, Jinglin Zhang:
MetricGrids: Arbitrary Nonlinear Approximation with Elementary Metric Grids based Implicit Neural Representation. 21381-21391 - Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang, Qi Zhang, Wenbo Hu, Chaopeng Zhang, Yao Yao, Ying Shan, Long Quan:
Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh. 21392-21402 - Dana Cohen-Bar, Daniel Cohen-Or, Gal Chechik, Yoni Kasten:
TriTex: Learning Texture from a Single Mesh via Triplane Semantic Features. 21403-21413 - Armin Shafiee Sarvestani, Sheyang Tang, Zhou Wang:
HybridMQA: Exploring Geometry-Texture Interactions for Colored Mesh Quality Assessment. 21414-21424 - Xiang Feng, Chang Yu, Zoubin Bi, Yintong Shang, Feng Gao, Hongzhi Wu, Kun Zhou, Chenfanfu Jiang, Yin Yang:
ARM: Appearance Reconstruction Model for Relightable 3D Generation. 21425-21437 - Jing Li, Yihang Fu, Falai Chen:
DTGBrepGen: A Novel B-rep Generative Model through Decoupling Topology and Geometry. 21438-21447 - Yuan Li, Cheng Lin, Yuan Liu, Xiaoxiao Long, Chenxu Zhang, Ningna Wang, Xin Li, Wenping Wang, Xiaohu Guo:
CADDreamer: CAD Object Generation from Single-view Images. 21448-21457 - Yiftach Edelstein, Or Patashnik, Dana Cohen-Bar, Lihi Zelnik-Manor:
Sharp-It: A Multi-view to Multi-view Diffusion Model for 3D Synthesis and Manipulation. 21458-21468 - Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, Jiaolong Yang:
Structured 3D Latents for Scalable and Versatile 3D Generation. 21469-21480 - Xingyi Yang, Songhua Liu, Xinchao Wang:
Hash3D: Training-free Acceleration for 3D Generation. 21481-21491 - Trong-Tung Nguyen, Quang Nguyen, Khoi Nguyen, Anh Tuan Tran, Cuong Pham:
SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion. 21492-21501 - Weiran Guang, Xiaoguang Gu, Mengqi Huang, Zhendong Mao:
Dragin3D: Image Editing by Dragging in 3D Space. 21502-21512 - Yi-Hua Huang, Ming-Xian Lin, Yang-Tian Sun, Ziyi Yang, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi:
Deformable Radial Kernel Splatting. 21513-21523 - Hyojun Go, Byeongjun Park, Jiho Jang, Jin-Young Kim, Soonwoo Kwon, Changick Kim:
SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis. 21524-21536 - Alex Hanson, Allen Tu, Geng Lin, Vasu Singla, Matthias Zwicker, Tom Goldstein:
Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives. 21537-21546 - Jinguang Tong, Xuesong Li, Fahira Afzal Maken, Sundaram Muthu, Lars Petersson, Chuong Nguyen, Hongdong Li:
GS-2DGS: Geometrically Supervised 2DGS for Reflective Object Reconstruction. 21547-21557 - Yiyang Shen, Kun Zhou, He Wang, Yin Yang, Tianjia Shao:
High-fidelity 3D Object Generation from Single Image with RGBN-Volume Gaussian Reconstruction Model. 21558-21569 - Yifan Liu, Keyu Fan, Weihao Yu, Chenxin Li, Hao Lu, Yixuan Yuan:
MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models. 21570-21579 - Han Zhou, Wei Dong, Jun Chen:
LITA-GS: Illumination-Agnostic Novel View Synthesis via Reference-Free 3D Gaussian Splatting and Physical Priors. 21580-21589 - Zheng Chen, Chenming Wu, Zhelun Shen, Chen Zhao, Weicai Ye, Haocheng Feng, Errui Ding, Song-Hai Zhang:
Splatter-360: Generalizable 360 Gaussian Splatting for Wide-baseline Panoramic Images. 21590-21599 - Hyunwoo Park, Gun Ryu, Wonjun Kim:
DropGaussian: Structural Regularization for Sparse-view Gaussian Splatting. 21600-21609 - Dian Zheng, Cheng Zhang, Xiao-Ming Wu, Cao Li, Chengfei Lv, Jian-Fang Hu, Wei-Shi Zheng:
Panorama Generation From NFoV Image Done Right. 21610-21619 - Yucheng Mao, Boyang Wang, Nilesh Kulkarni, Jeong Joon Park:
SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model. 21620-21630 - Hao Wen, Zehuan Huang, Yaohui Wang, Xinyuan Chen, Lu Sheng:
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion. 21631-21641 - Wenyuan Zhang, Yixiao Yang, Han Huang, Liang Han, Kanle Shi, Yu-Shen Liu, Zhizhong Han:
MonoInstance: Enhancing Monocular Priors via Multi-view Instance Alignment for Neural Rendering and Reconstruction. 21642-21653 - Bo Ji, Angela Yao:
SfM-Free 3D Gaussian Splatting via Hierarchical Training. 21654-21663 - Xiangyu Liu, Xiaomei Zhang, Zhiyuan Ma, Xiangyu Zhu, Zhen Lei:
MVBoost: Boost 3D Reconstruction with Multi-View Refinement. 21664-21673 - Khiem Vuong, Anurag Ghosh, Deva Ramanan, Srinivasa G. Narasimhan, Shubham Tulsiani:
AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis. 21674-21684 - Qihang Zhang, Shuangfei Zhai, Miguel Ángel Bautista Martin, Kevin Miao, Alexander Toshev, Joshua M. Susskind, Jiatao Gu:
World-consistent Video Diffusion with Explicit 3D Modeling. 21685-21695 - Haosen Yang, Chenhao Zhang, Wenqing Wang, Marco Volino, Adrian Hilton, Li Zhang, Xiatian Zhu:
Improving Gaussian Splatting with Localized Points Management. 21696-21705 - Sebastian Koch, Johanna Wald, Mirco Colosi, Narunas Vaskevicius, Pedro Hermosilla, Federico Tombari, Timo Ropinski:
RelationField: Relate Anything in Radiance Fields. 21706-21716 - Weikang Bian, Zhaoyang Huang, Xiaoyu Shi, Yijin Li, Fu-Yun Wang, Hongsheng Li:
GS-DiT: Advancing Video Generation with Dynamic 3D Gaussian Fields through Efficient Dense 3D Point Tracking. 21717-21727 - Ashish Kumar, A. N. Rajagopalan:
DynaMoDe-NeRF: Motion-aware Deblurring Neural Radiance Field for Dynamic Scenes. 21728-21738 - Seungjun Lee, Gim Hee Lee:
DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting. 21739-21749 - Yifan Wang, Peishan Yang, Zhen Xu, Jiaming Sun, Zhanhua Zhang, Yong Chen, Hujun Bao, Sida Peng, Xiaowei Zhou:
FreeTimeGS: Free Gaussian Primitives at Anytime Anywhere for Dynamic Scene Reconstruction. 21750-21760 - Hao Li, Sicheng Li, Xiang Gao, Abudouaihati Batuer, Lu Yu, Yiyi Liao:
GIFStream: 4D Gaussian-based Immersive Video with Feature Stream. 21761-21770 - Hongchi Xia, Entong Su, Marius Memmel, Arhan Jain, Raymond Yu, Numfor Mbiziwo-Tiapo, Ali Farhadi, Abhishek Gupta, Shenlong Wang, Wei-Chiu Ma:
DRAWER: Digital Reconstruction and Articulation With Environment Realism. 21771-21782 - Awais Nizamani, Hamid Laga, Guanjin Wang, Farid Boussaïd, Mohammed Bennamoun, Anuj Srivastava:
Dynamic Neural Surfaces for Elastic 4D Shape Representation and Analysis. 21783-21792 - Paul Roetzer, Viktoria Ehm, Daniel Cremers, Zorah Lähner, Florian Bernard:
Higher-Order Ratio Cycles for Fast and Globally Optimal Shape Matching. 21793-21803 - Ryota Maeda, Yunseong Moon, Seung-Hwan Baek:
Event Ellipsometer: Event-based Mueller-Matrix Video Imaging. 21804-21813 - Noah Stier, Alex Rich, Pradeep Sen, Tobias Höllerer:
AniGrad: Anisotropic Gradient-Adaptive Sampling for 3D Reconstruction From Monocular Video. 21814-21823 - Zetong Zhang, Manuel Kaufmann, Lixin Xue, Jie Song, Martin R. Oswald:
ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos. 21824-21835 - Hongtao Yu, Shaohui Song, Lihu Sun, Wenkai Su, Xiaodong Yang, Chengming Liu:
All-directional Disparity Estimation for Real-world QPD Images. 21836-21846 - Songsong Yu, Yuxin Chen, Zhongang Qi, Zeke Xie, Yifan Wang, Lijun Wang, Ying Shan, Huchuan Lu:
Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion. 21847-21856 - Hualie Jiang, Zhiqiang Lou, Laiyan Ding, Rui Xu, Minglang Tan, Wenjie Jiang, Rui Huang:
DEFOM-Stereo: Depth Foundation Model Based Stereo Matching. 21857-21867 - Marwane Hariat, Antoine Manzanera, David Filliat:
Improved Monocular Depth Prediction Using Distance Transform Over Pre-semantic Contours with Self-supervised Neural Networks. 21868-21879 - Weilong Yan, Ming Li, Haipeng Li, Shuwei Shao, Robby T. Tan:
Synthetic-to-Real Self-supervised Robust Depth Estimation via Learning with Motion and Structure Priors. 21880-21890 - Zador Pataki, Paul-Edouard Sarlin, Johannes L. Schönberger, Marc Pollefeys:
MP-SfM: Monocular Surface Priors for Robust Structure-from-Motion. 21891-21901 - Daniil Sinitsyn, Linus Härenstam-Nielsen, Daniel Cremers:
PRaDA: Projective Radial Distortion Averaging. 21902-21912 - Charalambos Tzamos, Viktor Kocur, Yaqing Ding, Daniel Barath, Zuzana Berger Haladová, Torsten Sattler, Zuzana Kukelova:
Practical Solutions to the Relative Pose of Three Calibrated Cameras. 21913-21923 - Jianing Yang, Alexander Sax, Kevin J. Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, Matt Feiszli:
Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass. 21924-21935 - Shangzhan Zhang, Jianyuan Wang, Yinghao Xu, Nan Xue, Christian Rupprecht, Xiaowei Zhou, Yujun Shen, Gordon Wetzstein:
FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views. 21936-21947 - Lea Müller, Hongsuk Choi, Anthony Zhang, Brent Yi, Jitendra Malik, Angjoo Kanazawa:
Reconstructing People, Places, and Cameras. 21948-21958 - Kai Luo, Hao Shi, Sheng Wu, Fei Teng, Mengfei Duan, Chang Huang, Yuhang Wang, Kaiwei Wang, Kailun Yang:
Omnidirectional Multi-Object Tracking. 21959-21969 - Jintao Zhang, Zimin Xia, Mingyue Dong, Shuhan Shen, Linwei Yue, Xianwei Zheng:
CoMatcher: Multi-View Collaborative Feature Matching. 21970-21980 - Wooju Lee, Juhye Park, Dasol Hong, Changki Sung, Youngwoo Seo, Dongwan Kang, Hyun Myung:
PIDLoc: Cross-View Pose Optimization Network Inspired by PID Controllers. 21981-21990 - Shengze Wang, Jiefeng Li, Tianye Li, Ye Yuan, Henry Fuchs, Koki Nagano, Shalini De Mello, Michael Stengel:
BLADE: Single-view Body Mesh Estimation through Accurate Depth Estimation. 21991-22000 - Wanhua Li, Renping Zhou, Jiawei Zhou, Yingwei Song, Johannes Herter, Minghan Qin, Gao Huang, Hanspeter Pfister:
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models. 22001-22011 - Gyeongjin Kang, Jisang Yoo, Jihyeon Park, Seungtae Nam, Hyeonsoo Im, Sangheon Shin, Sangpil Kim, Eunbyung Park:
SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting. 22012-22022 - Xingyu Liu, Gu Wang, Ruida Zhang, Chenyangguang Zhang, Federico Tombari, Xiangyang Ji:
UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image. 22023-22034 - Junjie Chen, Weilong Chen, Yifan Zuo, Yuming Fang:
Recurrent Feature Mining and Keypoint Mixup Padding for Category-Agnostic Pose Estimation. 22035-22044 - Qingyuan Wang, Rui Song, Jiaojiao Li, Kerui Cheng, David Ferstl, Yinlin Hu:
SCFlow2: Plug-and-Play Object Pose Refiner with Shape-Constraint Scene Flow. 22045-22054 - Ziqin Huang, Gu Wang, Chenyangguang Zhang, Ruida Zhang, Xiu Li, Xiangyang Ji:
GIVEPose: Gradual Intra-class Variation Elimination for RGB-based Category-Level Object Pose Estimation. 22055-22066 - Wen-Hsuan Chu, Lei Ke, Jianmeng Liu, Mingxiao Huo, Pavel Tokmakov, Katerina Fragkiadaki:
Robust Multi-Object 4D Generation for In-the-wild Videos. 22067-22077 - Guangzhao He, Chen Geng, Shangzhe Wu, Jiajun Wu:
Category-Agnostic Neural Object Rigging. 22078-22088 - Zekai Shao, Yufan Hu, Bin Fan, Hongmin Liu:
PURA: Parameter Update-Recovery Test-Time Adaption for RGB-T Tracking. 22089-22098 - Xinyu Xiang, Qinglong Yan, Hao Zhang, Jiayi Ma:
ACAttack: Adaptive Cross Attacking RGB-T Tracker via Multi-Modal Response Decoupling. 22099-22108 - Ahyun Seo, Minsu Cho:
Leveraging 3D Geometric Priors in 2D Rotation Symmetry Detection. 22109-22118 - Shining Wang, Yunlong Wang, Ruiqi Wu, Bingliang Jiao, Wenxuan Wang, Peng Wang:
SeCap: Self-Calibrating and Adaptive Prompts for Cross-view Person Re-Identification in Aerial-Ground Networks. 22119-22128 - Eric Hedlin, Munawar Hayat, Fatih Porikli, Kwang Moo Yi, Shweta Mahajan:
HyperNet Fields: Efficiently Training Hypernetworks without Ground Truth by Learning Weight Trajectories. 22129-22138 - Takeshi Noda, Chao Chen, Junsheng Zhou, Weiqi Zhang, Yu-Shen Liu, Zhizhong Han:
Learning Bijective Surface Parameterization for Inferring Signed Distance Functions from Sparse Point Clouds with Grid Deformation. 22139-22149 - Xinran Yang, Donghao Ji, Yuanqi Li, Junyuan Xie, Jie Guo, Yanwen Guo:
EdgeMovingNet: Edge-preserving Point Cloud Reconstruction via Joint Geometry Features. 22150-22160 - Lin Bie, Shouan Pan, Siqi Li, Yining Zhao, Yue Gao:
GraphI2P: Image-to-Point Cloud Registration with Exploring Pattern of Correspondence via Graph Learning. 22161-22171 - Kang You, Tong Chen, Dandan Ding, M. Salman Asif, Zhan Ma:
RENO: Real-Time Neural Compression for 3D LiDAR Point Clouds. 22172-22181 - Changshuo Wang, Shuting He, Xiang Fang, Jiawei Han, Zhonghang Liu, Xin Ning, Weijun Li, Prayag Tiwari:
Point Clouds Meets Physics: Dynamic Acoustic Field Fitting Network for Point Cloud Understanding. 22182-22192 - Xiaoyang Wu, Daniel DeTone, Duncan P. Frost, Tianwei Shen, Chris Xie, Nan Yang, Jakob J. Engel, Richard A. Newcombe, Hengshuang Zhao, Julian Straub:
Sonata: Self-Supervised Learning of Reliable Point Representations. 22193-22204 - Qi Zhang, Jibin Peng, Zhao Huang, Wei Feng, Di Lin:
Generative Hard Example Augmentation for Semantic Point Cloud Segmentation. 22205-22214 - Yuzhou Liu, Lingjie Zhu, Hanqiao Ye, Shangfeng Huang, Xiang Gao, Xianwei Zheng, Shuhan Shen:
BWFormer: Building Wireframe Reconstruction from Airborne LiDAR Point Cloud with Transformer. 22215-22224 - Justin Lazarow, David Griffiths, Gefen Kohavi, Francisco Crespo, Afshin Dehghan:
Cubify Anything: Scaling Indoor 3D Object Detection. 22225-22233 - Mohamed Abdelsamad, Michael Ulrich, Claudius Gläser, Abhinav Valada:
Multi-Scale Neighborhood Occupancy Masked Autoencoder for Self-Supervised Learning in LiDAR Point Clouds. 22234-22243 - Zhenxuan Zeng, Qiao Wu, Xiyu Zhang, Lin Yuanbo Wu, Pei An, Jiaqi Yang, Ji Wang, Peng Wang:
Unlocking Generalization Power in LiDAR Point Cloud Registration. 22244-22253 - Yingping Liang, Yutao Hu, Wenqi Shao, Ying Fu:
Distilling Monocular Foundation Model for Fine-grained Depth Completion. 22254-22265 - Hou-I Liu, Christine Wu, Jen-Hao Cheng, Wenhao Chai, Shian-Yun Wang, Gaowen Liu, Hugo Latapie, Jhih-Ciang Wu, Jenq-Neng Hwang, Hong-Han Shuai, Wen-Huang Cheng:
MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection. 22266-22275 - Yunfei Long, Abhinav Kumar, Xiaoming Liu, Daniel D. Morris:
RICCARDO: Radar Hit Prediction and Convolution for Camera-Radar 3D Object Detection. 22276-22285 - Xingyue Liu, Jiahao Qi, Chen Chen, Kangcheng Bin, Ping Zhong:
UCM-VeID V2: A Richer Dataset and A Pre-training Method for UAV Cross-Modality Vehicle Re-Identification. 22286-22295 - Yunshuang Yuan, Yan Xia, Daniel Cremers, Monika Sester:
SparseAlign: a Fully Sparse Framework for Cooperative Object Detection. 22296-22305 - Luke Chen, Junyao Wang, Trier Mortlock, Pramod P. Khargonekar, Mohammad Abdullah Al Faruque:
Hyperdimensional Uncertainty Quantification for Multimodal Uncertainty Fusion in Autonomous Vehicles Perception. 22306-22316 - Dongxu Wei, Zhiqi Li, Peidong Liu:
Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction. 22317-22327 - David T. Hoffmann, Syed Haseeb Raza, Hanqiu Jiang, Denis Tananaev, Steffen Klingenhoefer, Martin Meinke:
Floxels: Fast Unsupervised Voxel Based Scene Flow Estimation. 22328-22337 - Jingyi Xu, Xieyuanli Chen, Junyi Ma, Jiawei Huang, Jintao Xu, Yue Wang, Ling Pei:
Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting. 22338-22347 - Rui Gong, Kim-Hui Yap, Weide Liu, Xulei Yang, Jun Cheng:
Rectification-specific Supervision and Constrained Estimator for Online Stereo Rectification. 22348-22358 - Xiaolu Liu, Ruizi Yang, Song Wang, Wentong Li, Junbo Chen, Jianke Zhu:
Uncertainty-Instructed Structure Injection for Generalizable HD Map Construction. 22359-22368 - Yunlong Lin, Zixu Lin, Haoyu Chen, Panwang Pan, Chenxin Li, Sixiang Chen, Kairun Wen, Yeying Jin, Wenbo Li, Xinghao Ding:
JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration. 22369-22380 - Jingcheng Ni, Yuxin Guo, Yichen Liu, Rui Chen, Lewei Lu, Zehuan Wu:
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction. 22381-22391 - Ze Yang, Jingkang Wang, Haowei Zhang, Sivabalan Manivasagam, Yun Chen, Raquel Urtasun:
GenAssets: Generating in-the-wild 3D Assets in Latent Space. 22392-22403 - Mariam Hassan, Sebastian Stapf, Ahmad Rahimi, Pedro M. B. Rezende, Yasaman Haghighi, David Brüggemann, Isinsu Katircioglu, Lin Zhang, Xiaoran Chen, Suman Saha, Marco Cannici, Elie Aljalbout, Botao Ye, Xi Wang, Aram Davtyan, Mathieu Salzmann, Davide Scaramuzza, Marc Pollefeys, Paolo Favaro, Alexandre Alahi:
GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control. 22404-22415 - Inhwan Bae, Junoh Lee, Hae-Gon Jeon:
Continuous Locomotive Crowd Behavior Generation. 22416-22431 - Ziying Song, Caiyan Jia, Lin Liu, Hongyu Pan, Yongchang Zhang, Junming Wang, Xingyu Zhang, Shaoqing Xu, Lei Yang, Yadan Luo:
Don't Shake the Wheel: Momentum-Aware Planning in End-to-End Autonomous Driving. 22432-22441 - Shihao Wang, Zhiding Yu, Xiaohui Jiang, Shiyi Lan, Min Shi, Nadine Chang, Jan Kautz, Ying Li, José M. Álvarez:
OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning. 22442-22452 - Weizhen Wang, Chenda Duan, Zhenghao Peng, Yuxin Liu, Bolei Zhou:
Embodied Scene Understanding for Vision Language Models via MetaVQA. 22453-22464 - Kai Chen, Xiaodong Zhao, Yujie Huang, Guoyu Fang, Xiao Song, Ruiping Wang, Ziyuan Wang:
SocialMOIF: Multi-Order Intention Fusion for Pedestrian Trajectory Prediction. 22465-22475 - Guillem Capellera, Antonio Rubio, Luis Ferraz, Antonio Agudo:
Unified Uncertainty-Aware Diffusion for Multi-Agent Trajectory Modeling. 22476-22486 - Greg Heinrich, Mike Ranzinger, Hongxu Yin, Yao Lu, Jan Kautz, Andrew Tao, Bryan Catanzaro, Pavlo Molchanov:
RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models. 22487-22497 - Kwan-Yee Lin, Stella X. Yu:
Let Humanoids Hike! Integrative Skill Development on Complex Trails. 22498-22507 - Jinliang Zheng, Jianxiong Li, Dongxiu Liu, Yinan Zheng, Zhihao Wang, Zhonghong Ou, Yu Liu, Jingjing Liu, Ya-Qin Zhang, Xianyuan Zhan:
Universal Actions for Enhanced Embodied Foundation Models. 22508-22519 - Shibo Zhao, Sifan Zhou, Raphael Blanchard, Yuheng Qiu, Wenshan Wang, Sebastian A. Scherer:
Tartan IMU: A Light Foundation Model for Inertial Positioning in Robotics. 22520-22529 - Shengyi Qian, Kaichun Mo, Valts Blukis, David F. Fouhey, Dieter Fox, Ankit Goyal:
3D-MVP: 3D Multiview Pretraining for Manipulation. 22530-22539 - Haifeng Huang, Xinyi Chen, Yilun Chen, Hao Li, Xiaoshen Han, Zehan Wang, Tai Wang, Jiangmiao Pang, Zhou Zhao:
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors. 22540-22550 - Jiaming Zhou, Teli Ma, Kun-Yu Lin, Zifan Wang, Ronghe Qiu, Junwei Liang:
Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation. 22551-22561 - Quanyuan Ruan, Jiabao Lei, Wenhao Yuan, Yanglin Zhang, Dekun Lu, Guiliang Liu, Kui Jia:
Prof. Robot: Differentiable Robot Rendering Without Static and Self-Collisions. 22562-22572 - Yuanqi Yao, Siao Liu, Haoming Song, Delin Qu, Qizhi Chen, Yan Ding, Bin Zhao, Zhigang Wang, Xuelong Li, Dong Wang:
Think Small, Act Big: Primitive Prompt Learning for Lifelong Robot Manipulation. 22573-22583 - Yiming Zhong, Qi Jiang, Jingyi Yu, Yuexin Ma:
DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness. 22584-22594 - Yuxing Long, Jiyao Zhang, Mingjie Pan, Tianshu Wu, Taewhan Kim, Hao Dong:
CheckManual: A New Challenge and Benchmark for Manual-based Appliance Manipulation. 22595-22604 - Sai Kumar Dwivedi, Dimitrije Antic, Shashank Tripathi, Omid Taheri, Cordelia Schmid, Michael J. Black, Dimitrios Tzionas:
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models. 22605-22615 - Yujie Liang, Xiaobin Hu, Boyuan Jiang, Donghao Luo, Xu Peng, Kai Wu, Chengming Xu, Wenhui Han, Taisong Jin, Chengjie Wang, Rongrong Ji:
VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding. 22616-22626 - Kaixin Fan, Pengfei Ren, Jingyu Wang, Haifeng Sun, Qi Qi, Zirui Zhuang, Jianxin Liao:
Pose-Guided Temporal Enhancement for Robust Low-Resolution Hand Reconstruction. 22627-22637 - Li Zhang, Mingliang Xu, Jianan Wang, Qiaojun Yu, Lixin Yang, Yonglu Li, Cewu Lu, Rujing Wang, Liu Liu:
GaPT-DAR: Category-level Garments Pose Tracking via Integrated 2D Deformation and 3D Reconstruction. 22638-22647 - Dong Li, Wenqi Zhong, Wei Yu, Yingwei Pan, Dingwen Zhang, Ting Yao, Junwei Han, Tao Mei:
Pursuing Temporal-Consistent Video Virtual Try-On via Dynamic Pose Interaction. 22648-22657 - Shuhang Chen, Xianliang Huang, Zhizhou Zhong, Juhong Guan, Shuigeng Zhou:
A Focused Human Body Model for Accurate Anthropometric Measurements Extraction. 22658-22667 - Jian Wang, Rishabh Dabral, Diogo C. Luvizon, Zhe Cao, Lingjie Liu, Thabo Beeler, Christian Theobalt:
Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input. 22668-22679 - Reyhaneh HosseiniNejad, Megh Shukla, Saeed Saadatnejad, Mathieu Salzmann, Alexandre Alahi:
MotionMap: Representing Multimodality in Human Pose Forecasting. 22680-22689 - Bin Ji, Ye Pan, Zhimeng Liu, Shuai Tan, Xiaogang Jin, Xiaokang Yang:
POMP: Physics-constrainable Motion Generative Model through Phase Manifolds. 22690-22701 - Zhanbo Huang, Xiaoming Liu, Yu Kong:
H-MoRe: Learning Human-centric Motion Representation for Action Analysis. 22702-22713 - Mengqing Xue, Yifei Liu, Ling Guo, Shaoli Huang, Changxing Ding:
Guiding Human-Object Interactions with Rich Geometry and Relations. 22714-22723 - Hua Yu, Weiming Liu, Gui Xu, Yaqing Hou, Yew-Soon Ong, Qiang Zhang:
Deterministic-to-Stochastic Diverse Latent Feature Mapping for Human Motion Synthesis. 22724-22734 - Nan Jiang, Hongjie Li, Ziye Yuan, Zimo He, Yixin Chen, Tengyu Liu, Yixin Zhu, Siyuan Huang:
Dynamic Motion Blending for Versatile Motion Editing. 22735-22745 - Haonan Han, Xiangzuo Wu, Huan Liao, Zunnan Xu, Zhongyuan Hu, Ronghui Li, Yachao Zhang, Xiu Li:
AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward. 22746-22755 - Boeun Kim, Hea In Jeong, JungHoon Sung, Yihua Cheng, Jeongmin Lee, Ju Yong Chang, Sang-Il Choi, Younggeun Choi, Saim Shin, Jungho Kim, Hyung Jin Chang:
PersonaBooth: Personalized Text-to-Motion Generation. 22756-22765 - Hsin-Ping Huang, Yang Zhou, Jui-Hsien Wang, Difan Liu, Feng Liu, Ming-Hsuan Yang, Zhan Xu:
Move-in-2D: 2D-Conditioned Human Motion Generation. 22766-22775 - Longbin Ji, Lei Zhong, Pengfei Wei, Changjian Li:
PoseTraj: Pose-Aware Trajectory Control in Video Diffusion. 22776-22785 - Junhyeong Cho, Kim Youwang, Hunmin Yang, Tae-Hyun Oh:
Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild. 22786-22798 - Yung-Hao Yang, Zitang Sun, Taiki Fukiage, Shin'ya Nishida:
HuPerFlow: A Comprehensive Benchmark for Human vs. Machine Motion Estimation Comparison. 22799-22808 - Zihang Lai, Andrea Vedaldi:
Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better. 22809-22819 - Jiahao Lu, Tianyu Huang, Peng Li, Zhiyang Dou, Cheng Lin, Zhiming Cui, Zhen Dong, Sai-Kit Yeung, Wenping Wang, Yuan Liu:
Align3R: Aligned Monocular Depth Estimation for Dynamic Videos. 22820-22830 - Sili Chen, Hengkai Guo, Shengnan Zhu, Feihu Zhang, Zilong Huang, Jiashi Feng, Bingyi Kang:
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos. 22831-22840 - Jiahao Shao, Yuanbo Yang, Hongyu Zhou, Youmin Zhang, Yujun Shen, Vitor Guizilini, Yue Wang, Matteo Poggi, Yiyi Liao:
Learning Temporally Consistent Video Depth from Video Diffusion Priors. 22841-22852 - Huiwon Jang, Sihyun Yu, Jinwoo Shin, Pieter Abbeel, Younggyo Seo:
Efficient Long Video Tokenization via Coordinate-based Patch Reconstruction. 22853-22863 - Shuwei Shi, Biao Gong, Xi Chen, Dandan Zheng, Shuai Tan, Zizheng Yang, Yuyuan Li, Jingwen He, Kecheng Zheng, Jingdong Chen, Ming Yang, Yinqiang Zheng:
MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation. 22864-22874 - Sherwin Bahmani, Ivan Skorokhodov, Guocheng Qian, Aliaksandr Siarohin, Willi Menapace, Andrea Tagliasacchi, David B. Lindell, Sergey Tulyakov:
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers. 22875-22889 - Kaihua Chen, Deva Ramanan, Tarasha Khurana:
Using Diffusion Priors for Video Amodal Segmentation. 22890-22900 - Juan Luis Gonzalez, Xu Yao, Alex Whelan, Kyle Olszewski, Hyeongwoo Kim, Pablo Garrido:
VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing. 22901-22910 - Alexander Pondaven, Aliaksandr Siarohin, Sergey Tulyakov, Philip Torr, Fabio Pizzati:
Video Motion Transfer with Diffusion Transformers. 22911-22921 - Yuchi Wang, Junliang Guo, Xinyi Xie, Tianyu He, Xu Sun, Jiang Bian:
VidTwin: Video VAE with Decoupled Structure and Dynamics. 22922-22932 - Maria Pilligua, Danna Xue, Javier Vazquez-Corral:
HyperNVD: Accelerating Neural Video Decomposition via Hypernetworks. 22933-22942 - Yang Hai, Guo Wang, Tan Su, Wenjie Jiang, Yinlin Hu:
Hierarchical Flow Diffusion for Efficient Frame Interpolation. 22943-22952 - Ding Ding, Yueming Pan, Ruoyu Feng, Qi Dai, Kai Qiu, Jianmin Bao, Chong Luo, Zhenzhong Chen:
HomoGen: Enhanced Video Inpainting via Homography Propagation and Diffusion. 22953-22962 - Tianwei Yin, Qiang Zhang, Richard Zhang, William T. Freeman, Frédo Durand, Eli Shechtman, Xun Huang:
From Slow Bidirectional to Fast Autoregressive Video Diffusion Models. 22963-22974 - Shuyun Wang, Hu Zhang, Xin Shen, Dadong Wang, Xin Yu:
Blind Bitstream-corrupted Video Recovery via Metadata-guided Diffusion Model. 22975-22984 - Qian Wang, Abdelrahman Eldesokey, Mohit Mendiratta, Fangneng Zhan, Adam Kortylewski, Christian Theobalt, Peter Wonka:
VidSeg: Training-free Video Semantic Segmentation based on Diffusion Models. 22985-22994 - Yue-Hua Han, Tai-Ming Huang, Kai-Lung Hua, Jun-Cheng Chen:
Towards More General Video-based Deepfake Detection through Facial Component Guided Adaptation for Foundation Model. 22995-23005 - Zhenxuan Fang, Fangfang Wu, Tao Huang, Le Dong, Weisheng Dong, Xin Li, Guangming Shi:
Parameterized Blur Kernel Prior Learning for Local Motion Deblurring. 23006-23015 - Nicolas Dufour, Vicky Kalogeiton, David Picard, Loïc Landrieu:
Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation. 23016-23026 - Shasha Mao, Shiming Lu, Zhaolong Du, Licheng Jiao, Shuiping Gou, Luntian Mou, Xuequan Lu, Lin Xiong, Yimeng Zhang:
Cross-Rejective Open-Set SAR Image Registration. 23027-23036 - Zichen Tian, Yaoyao Liu, Qianru Sun:
Meta-Learning Hyperparameters for Parameter Efficient Fine-Tuning. 23037-23047 - Jingtao Li, Yingyi Liu, Xinyu Wang, Yunning Peng, Chen Sun, Shaoyu Wang, Zhendong Sun, Tian Ke, Xiao Jiang, Tangwei Lu, Anran Zhao, Yanfei Zhong:
HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery. 23048-23058 - Jiangwei Ren, Xingyu Jiang, Zizhuo Li, Dingkang Liang, Xin Zhou, Xiang Bai:
MINIMA: Modality Invariant Image Matching. 23059-23068 - Sungpyo Kim, Jeonghyeok Do, Jaehyup Lee, Munchurl Kim:
U-Know-DiffPAN: An Uncertainty-aware Knowledge Distillation Diffusion Framework with Details Enhancement for PAN-Sharpening. 23069-23079 - Xin Di, Long Peng, Peizhe Xia, Wenbo Li, Renjing Pei, Yang Cao, Yang Wang, Zheng-Jun Zha:
QMambaBSR: Burst Image Super-Resolution with Query State Space Model. 23080-23090 - Ruiyi Wang, Yushuo Zheng, Zicheng Zhang, Chunyi Li, Shuaicheng Liu, Guangtao Zhai, Xiaohong Liu:
Learning Hazing to Dehazing: Towards Realistic Haze Generation for Real-World Image Dehazing. 23091-23100 - Ze-Yu Mi, Yu-Bin Yang:
ADD: Attribution-Driven Data Augmentation Framework for Boosting Image Super-Resolution. 23101-23110 - Heemin Yang, Jaesung Rim, Seungyong Lee, Seung-Hwan Baek, Sunghyun Cho:
Gyro-based Neural Single Image Deblurring. 23111-23120 - Yidi Liu, Dong Li, Xueyang Fu, Xin Lu, Jie Huang, Zheng-Jun Zha:
UHD-processer: Unified UHD Image Restoration with Progressive Frequency Learning and Degradation-aware Prompts. 23121-23130 - Yuheng Xu, Shijie Yang, Xin Liu, Jie Liu, Jie Tang, Gangshan Wu:
AutoLUT: LUT-Based Image Super-Resolution with Automatic Sampling and Adaptive Residual Learning. 23131-23140 - Kangfu Mei, Hossein Talebi, Mojtaba Ardakani, Vishal M. Patel, Peyman Milanfar, Mauricio Delbracio:
The Power of Context: How Multimodality Improves Image Super-Resolution. 23141-23152 - Zongsheng Yue, Kang Liao, Chen Change Loy:
Arbitrary-steps Image Super-resolution via Diffusion Inversion. 23153-23163 - Anat Levin, Marina Alterman:
Understanding Multi-layered Transmission Matrices. 23164-23173 - Linwei Dong, Qingnan Fan, Yihong Guo, Zhonghao Wang, Qi Zhang, Jinwei Chen, Yawei Luo, Changqing Zou:
TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution. 23174-23184 - Matthieu Terris, Ulugbek S. Kamilov, Thomas Moreau:
FiRe: Fixed-points of Restoration Priors for Solving Inverse Problems. 23185-23194 - Junyuan Deng, Xinyi Wu, Yongxing Yang, Congchao Zhu, Song Wang, Zhenyao Wu:
Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration. 23195-23206 - Chong Wang, Lanqing Guo, Zixuan Fu, Siyuan Yang, Hao Cheng, Alex C. Kot, Bihan Wen:
Reconciling Stochastic and Deterministic Strategies for Zero-shot Image Restoration using Diffusion Model in Dual. 23207-23216 - Xinrui Wang, Lanqing Guo, Xiyu Wang, Siyu Huang, Bihan Wen:
SoftShadow: Leveraging Soft Masks for Penumbra-Aware Shadow Removal. 23217-23226 - Xingyu Qiu, Mengying Yang, Xinghua Ma, Fanding Li, Dong Liang, Gongning Luo, Wei Wang, Kuanquan Wang, Shuo Li:
Finding Local Diffusion Schrodinger Bridge using Kolmogorov-Arnold Network. 23227-23236 - Yikai Wang, Chenjie Cao, Junqiu Yu, Ke Fan, Xiangyang Xue, Yanwei Fu:
Towards Enhanced Image Inpainting: Mitigating Unwanted Object Insertion and Preserving Color Consistency. 23237-23248 - Zhe Zhang, Zhenzhong Chen, Shan Liu:
Fitted Neural Lossless Image Compression. 23249-23258 - Jona Ballé, Luca Versari, Emilien Dupont, Hyunjik Kim, Matthias Bauer:
Good, Cheap, and Fast: Overfitted Image Compression with Wasserstein Distortion. 23259-23268 - Xuewen Liu, Zhikai Li, Qingyi Gu:
CacheQuant: Comprehensively Accelerated Diffusion Models. 23269-23280 - Qianli Ma, Xuefei Ning, Dongrui Liu, Li Niu, Linfeng Zhang:
Decouple-Then-Merge: Finetune Diffusion Models as Multi-Task Learning. 23281-23291 - Youyuan Zhang, Zehua Liu, Zenan Li, Zhaoyu Li, James J. Clark, Xujie Si:
Decoupling Training-Free Guided Diffusion by ADMM. 23292-23302 - Mashrur M. Morshed, Vishnu Boddeti:
DiverseFlow: Sample-Efficient Diverse Mode Coverage in Flows. 23303-23312 - Junhyuk So, Jiwoong Shin, Chaeyeon Jang, Eunhyeok Park:
PCM : Picard Consistency Model for Fast Parallel Sampling of Diffusion Models. 23313-23322 - David McAllister, Matthew Tancik, Jiaming Song, Angjoo Kanazawa:
Decentralized Diffusion Models. 23323-23333 - Zigeng Chen, Xinyin Ma, Gongfan Fang, Xinchao Wang:
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient. 23334-23344 - Ye Chen, Zhangli Hu, Zhongyin Zhao, Yupeng Zhu, Yue Shi, Yuxuan Xiong, Bingbing Ni:
Easy-editable Image Vectorization with Multi-layer Multi-scale Distributed Visual Feature Embedding. 23345-23354 - Yael Vinker, Tamar Rott Shaham, Kristine Zheng, Alex Zhao, Judith E. Fan, Antonio Torralba:
SketchAgent: Language-Driven Sequential Sketch Generation. 23355-23368 - Xihua Wang, Ruihua Song, Chongxuan Li, Xin Cheng, Boyuan Li, Yihan Wu, Yuyue Wang, Hongteng Xu, Yunfeng Wang:
Animate and Sound an Image. 23369-23378 - Feng-Lin Liu, Hongbo Fu, Xintao Wang, Weicai Ye, Pengfei Wan, Di Zhang, Lin Gao:
SketchVideo: Sketch-based Video Generation and Editing. 23379-23390 - Dingkun Yan, Xinrui Wang, Zhuoru Li, Suguru Saito, Yusuke Iwasawa, Yutaka Matsuo, Jiaxian Guo:
Image Referenced Sketch Colorization Based on Animation Creation Workflow. 23391-23400 - Junyu Gao, Kunlin Yang, Xuan Yao, Yufan Hu:
Unity in Diversity: Video Editing via Gradient-Latent Purification. 23401-23411 - Zilyu Ye, Zhiyang Chen, Tiancheng Li, Zemin Huang, Weijian Luo, Guo-Jun Qi:
Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation. 23412-23422 - Ravishankar Evani, Deepu Rajan, Shangbo Mao:
Chebyshev Attention Depth Permutation Texture Network with Latent Texture Attribute Loss. 23423-23432 - Shuhao Zhang, Hui Kang, Yang Liu, Fang Mei, Hongjuan Li:
HSI: A Holistic Style Injector for Arbitrary Style Transfer. 23433-23442 - Mingkun Lei, Xue Song, Beier Zhu, Hao Wang, Chi Zhang:
StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements. 23443-23452 - Srikar Yellapragada, Alexandros Graikos, Kostas Triaridis, Prateek Prasanna, Rajarsi Gupta, Joel H. Saltz, Dimitris Samaras:
ZoomLDM: Latent Diffusion Model for Multi-scale Image Generation. 23453-23463 - Jinjin Zhang, Qiuyu Huang, Junjie Liu, Xiefan Guo, Di Huang:
Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models. 23464-23473 - Yoonjeon Kim, Soohyun Ryu, Yeonsung Jung, Hyunkoo Lee, Joowon Kim, June Yong Yang, Jaeryong Hwang, Eunho Yang:
Preserve or Modify? Context-Aware Evaluation for Balancing Preservation and Modification in Text-Guided Image Editing. 23474-23483 - Shanshan Huang, Haoxuan Li, Chunyuan Zheng, Lei Wang, Guorui Liao, Zhili Gong, Huayi Yang, Li Liu:
Visual Representation Learning through Causal Intervention for Controllable Image Editing. 23484-23493 - Wenhao Gu, Li Gu, Chingyee Yee Suen, Yang Wang:
MetaWriter: Personalized Handwritten Text Recognition Using Meta-Learned Prompt Tuning. 23494-23504 - Zihao Wang, Yuxiang Wei, Fan Li, Renjing Pei, Hang Xu, Wangmeng Zuo:
ACE: Anti-Editing Concept Erasure in Text-to-Image Models. 23505-23515 - Shoufa Chen, Chongjian Ge, Yuqi Zhang, Yida Zhang, Fengda Zhu, Hao Yang, Hongxiang Hao, Hui Wu, Zhichao Lai, Yifei Hu, Ting-Che Lin, Shilong Zhang, Fu Li, Chuan Li, Xing Wang, Yanghua Peng, Peize Sun, Ping Luo, Yi Jiang, Zehuan Yuan, Bingyue Peng, Xiaobing Liu:
Goku: Flow Based Video Generative Foundation Models. 23516-23527 - Weimin Qiu, Jieke Wang, Meng Tang:
Self-Cross Diffusion Guidance for Text-to-Image Synthesis of Similar Subjects. 23528-23538 - Chao Wang, Hehe Fan, Huichen Yang, Sarvnaz Karimi, Lina Yao, Yi Yang:
Adapting Text-to-Image Generation with Feature Difference Instruction for Generic Image Restoration. 23539-23550 - Sanghyeon Na, Yonggyu Kim, Hyunjoon Lee:
Boost Your Human Image Generation Model via Direct Preference Optimization. 23551-23562 - Senmao Li, Lei Wang, Kai Wang, Tao Liu, Jiehang Xie, Joost van de Weijer, Fahad Shahbaz Khan, Shiqi Yang, Yaxing Wang, Jian Yang:
One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models. 23563-23574 - Boming Miao, Chunxiao Li, Xiaoxiao Wang, Andi Zhang, Rui Sun, Zizhe Wang, Yao Zhu:
Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis. 23575-23584 - Jian Jin, Zhenbo Yu, Yang Shen, Zhenyong Fu, Jian Yang:
LaTexBlend: Scaling Multi-concept Customized Generation with Latent Textual Blending. 23585-23594 - Kyungmin Jo, Jooyeol Yun, Jaegul Choo:
Devil is in the Detail: Towards Injecting Fine Details of Image Prompt in Image Generation via Conflict-free Guidance and Stratified Attention. 23595-23603 - Zijing Hu, Fengda Zhang, Long Chen, Kun Kuang, Jiahui Li, Kaifeng Gao, Jun Xiao, Xin Wang, Wenwu Zhu:
Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards. 23604-23614 - Yuyang Peng, Shishi Xiao, Keming Wu, Qisheng Liao, Bohan Chen, Kevin Lin, Danqing Huang, Ji Li, Yuhui Yuan:
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation. 23615-23624 - Taeyoung Yun, Dinghuai Zhang, Jinkyoo Park, Ling Pan:
Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation. 23625-23635 - Xiaomin Li, Yixuan Liu, Takashi Isobe, Xu Jia, Qinpeng Cui, Dong Zhou, Dong Li, You He, Huchuan Lu, Zhongdao Wang, Emad Barsoum:
ReNeg: Learning Negative Embedding with Reward Guidance. 23636-23645 - Zehuan Huang, Yuan-Chen Guo, Xingqiao An, Yunhan Yang, Yangguang Li, Zi-Xin Zou, Ding Liang, Xihui Liu, Yan-Pei Cao, Lu Sheng:
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation. 23646-23657 - Yuchao Gu, Yipin Zhou, Yunfan Ye, Yixin Nie, Licheng Yu, Pingchuan Ma, Kevin Qinghong Lin, Mike Zheng Shou:
ROICtrl: Boosting Instance Control for Visual Generation. 23658-23667 - Hanzhe Hu, Tianwei Yin, Fujun Luan, Yiwei Hu, Hao Tan, Zexiang Xu, Sai Bi, Shubham Tulsiani, Kai Zhang:
Turbo3D: Ultra-fast Text-to-3D Generation. 23668-23678 - Zhipeng Huang, Shaobin Zhuang, Canmiao Fu, Binxin Yang, Ying Zhang, Chong Sun, Zhizheng Zhang, Yali Wang, Chen Li, Zheng-Jun Zha:
WeGen: A Unified Model for Interactive Multimodal Generation as We Chat. 23679-23689 - Ronghuan Wu, Wanchao Su, Jing Liao:
Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models. 23690-23700 - Sohan Patnaik, Rishabh Jain, Balaji Krishnamurthy, Mausoom Sarkar:
AesthetiQ: Enhancing Graphic Layout Design via Aesthetic-Aware Preference Alignment of Multi-modal Large Language Models. 23701-23711 - Feng Zhou, Ruiyang Liu, Chen Liu, Gaofeng He, Yong-Lu Li, Xiaogang Jin, Huamin Wang:
Design2GarmentCode: Turning Design Concepts to Tangible Garments Through Program Synthesis. 23712-23722 - Xinghui Li, Qichao Sun, Pengze Zhang, Fulong Ye, Zhichao Liao, Wanquan Feng, Songtao Zhao, Qian He:
AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models. 23723-23733 - Fernando Julio Cendra, Kai Han:
ICE: Intrinsic Concept Extraction from a Single Image via Diffusion Models. 23734-23743 - Sangwon Jung, Alex Oesterling, Claudio Mayrink Verdun, Sajani Vithana, Taesup Moon, Flávio P. Calmon:
Multi-Group Proportional Representations for Text-to-Image Models. 23744-23754 - Logan Frank, Jim Davis:
What Makes a Good Dataset for Knowledge Distillation? 23755-23764 - Koushik Srivatsan, Fahad Shamshad, Muzammal Naseer, Vishal M. Patel, Karthik Nandakumar:
STEREO: A Two-Stage Framework for Adversarially Robust Concept Erasing from Text-to-Image Diffusion Models. 23765-23774 - Xinting Hu, Haoran Wang, Jan Eric Lenssen, Bernt Schiele:
PersonaHOI: Effortlessly Improving Face Personalization in Human-Object Interaction Generation. 23775-23784 - Junxi Chen, Junhao Dong, Xiaohua Xie:
Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking. 23785-23794 - Won Jun Kim, Hyungjin Chung, Jaemin Kim, Sangmin Lee, Byeongsu Sim, Jong Chul Ye:
Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI. 23795-23805 - Hanhui Wang, Yihua Zhang, Ruizheng Bai, Yue Zhao, Sijia Liu, Zhengzhong Tu:
Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing. 23806-23816 - Yuechen Xie, Jie Song, Huiqiong Wang, Mingli Song:
Training Data Provenance Verification: Did Your Model Use Synthetic Data from My Generative Model for Training? 23817-23827 - Haifeng Zhang, Qinghui He, Xiuli Bi, Weisheng Li, Bo Liu, Bin Xiao:
Towards Universal AI-Generated Image Detection by Variational Information Bottleneck Network. 23828-23837 - Qi Bi, Jingjun Yi, Huimin Huang, Hao Zheng, Haolan Zhan, Yawen Huang, Yuexiang Li, Xian Wu, Yefeng Zheng:
NightAdapter: Learning a Frequency Adapter for Generalizable Night-time Scene Segmentation. 23838-23849 - Yongqi Yang, Zhihao Qian, Ye Zhu, Olga Russakovsky, Yu Wu:
D^3: Scaling Up Deepfake Detection by Learning from Discrepancy. 23850-23859 - Feng Yan, Xiaoheng Jiang, Yang Lu, Jiale Cao, Dong Chen, Mingliang Xu:
Wavelet and Prototype Augmented Query-based Transformer for Pixel-level Surface Defect Detection. 23860-23869 - Zhanqiang Guo, Jiamin Wu, Yonghao Song, Jiahui Bu, Weijian Mai, Qihao Zheng, Wanli Ouyang, Chunfeng Song:
Neuro-3D: Towards 3D Visual Decoding from EEG Signals. 23870-23880 - Sahar Dastani, Ali Bahri, Moslem Yazdanpanah, Mehrdad Noori, David Osowiechi, Gustavo Adolfo Vargas Hakim, Farzad Beizaee, Milad Cheraghalikhani, Arnab Kumar Mondal, Herve Lombaert, Christian Desrosiers:
Spectral State Space Model for Rotation-Invariant Visual Representation Learning. 23881-23890 - Yihua Cheng, Hengfei Wang, Zhongqun Zhang, Yang Yue, Boeun Kim, Feng Lu, Hyung Jin Chang:
3D Prior Is All You Need: Cross-Task Few-shot 2D Gaze Estimation. 23891-23900 - Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha, Omar Emara, Sam Pollard, Kranti Kumar Parida, Kaiting Liu, Prajwal Gatti, Siddhant Bansal, Kevin Flanagan, Jacob Chalk, Zhifan Zhu, Rhodri Guerrier, Fahd Abdelazim, Bin Zhu, Davide Moltisanti, Michael Wray, Hazel Doughty, Dima Damen:
HD-EPIC: A Highly-Detailed Egocentric Video Dataset. 23901-23913 - Fan Qi, Kunsheng Ma, Changsheng Xu:
Customized Condition Controllable Generation for Video Soundtrack. 23914-23924 - Chao Huang, Ruohan Gao, J. M. F. Tsang, Jan Kurcius, Cagdas Bilen, Chenliang Xu, Anurag Kumar, Sanjeel Parekh:
Learning to Highlight Audio by Watching Movies. 23925-23935 - Anna Min, Ziyang Chen, Hang Zhao, Andrew Owens:
Supervising Sound Localization by In-the-wild Egomotion. 23936-23946 - Abduljalil Radman, Jorma Laaksonen:
TSAM: Temporal SAM Augmented with Multimodal Prompts for Referring Audio-Visual Segmentation. 23947-23956 - Liang Liu, Shuaiyong Li, Yongqiang Zhu:
Audio-Visual Semantic Graph Network for Audio-Visual Event Localization. 23957-23966 - Huangbiao Xu, Xiao Ke, Huanqi Wu, Rui Xu, Yuezhou Li, Wenzhong Guo:
Language-Guided Audio-Visual Learning for Long-Term Sports Assessment. 23967-23977 - Shuai Tan, Biao Gong, Yutong Feng, Kecheng Zheng, Dandan Zheng, Shuwei Shi, Yujun Shen, Jingdong Chen, Ming Yang:
Mimir: Improving Video Diffusion Models for Precise Text Understanding. 23978-23988 - Ziyi Wu, Aliaksandr Siarohin, Willi Menapace, Ivan Skorokhodov, Yuwei Fang, Varnith Chordia, Igor Gilitschenski, Sergey Tulyakov:
Mind the Time: Temporally-Controlled Multi-Event Video Generation. 23989-24000 - Kun Liu, Qi Liu, Xinchen Liu, Jie Li, Yongdong Zhang, Jiebo Luo, Xiaodong He, Wu Liu:
HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation. 24001-24010 - Duowang Zhu, Xiaohu Huang, Haiyan Huang, Hao Zhou, Zhenfeng Shao:
Change3D: Revisiting Change Detection and Captioning from A Video Modeling Perspective. 24011-24022 - Darryl Ho, Samuel Madden:
DejaVid: Encoder-Agnostic Learned Temporal Matching for Video Classification. 24023-24032 - Yang Liu, Qianqian Xu, Peisong Wen, Siran Dai, Qingming Huang:
When the Future Becomes the Past: Taming Temporal Correspondence for Self-supervised Video Representation Learning. 24033-24044 - Rui Qian, Shuangrui Ding, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang:
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction. 24045-24055 - Haitong Liu, Kuofeng Gao, Yang Bai, Jinmin Li, Jinxiao Shan, Tao Dai, Shu-Tao Xia:
Protecting Your Video Content: Disrupting Automated Video-based LLM Annotations. 24056-24065 - Zijia Lu, A S. M. Iftekhar, Gaurav Mittal, Tianjian Meng, Xiawei Wang, Cheng Zhao, Rohith Kukkala, Ehsan Elhamifar, Mei Chen:
DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos. 24066-24076 - Chan Hur, Jeong-Hun Hong, Dong-hun Lee, Dabin Kang, Semin Myeong, Sang-hyo Park, Hyeyoung Park:
Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions. 24077-24086 - Weixing Chen, Yang Liu, Binglin Chen, Jiandong Su, Yongsen Zheng, Liang Lin:
Cross-modal Causal Relation Alignment for Video Question Grounding. 24087-24096 - Luca Zanella, Massimiliano Mancini, Willi Menapace, Sergey Tulyakov, Yiming Wang, Elisa Ricci:
Can Text-to-Video Generation help Video-Language Alignment? 24097-24107 - Chaoyou Fu, Yuhan Dai, Yongdong Luo, Lei Li, Shuhuai Ren, Renrui Zhang, Zihan Wang, Chenyu Zhou, Yunhang Shen, Mengdan Zhang, Peixian Chen, Yanwei Li, Shaohui Lin, Sirui Zhao, Ke Li, Tong Xu, Xiawu Zheng, Enhong Chen, Caifeng Shan, Ran He, Xing Sun:
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis. 24108-24118 - Jinhui Yi, Syed Talal Wasim, Yanan Luo, Muzammal Naseer, Juergen Gall:
Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models. 24119-24128 - Chiara Plizzari, Alessio Tonioni, Yongqin Xian, Achin Kulshrestha, Federico Tombari:
Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos. 24129-24138 - Quan Zhang, Jinwei Fang, Rui Yuan, Xi Tang, Yuxin Qi, Ke Zhang, Chun Yuan:
Weakly Supervised Temporal Action Localization via Dual-Prior Collaborative Learning Guided by Multimodal Large Language Models. 24139-24148 - Reno Kriz, Kate Sanders, David Etter, Kenton Murray, Cameron Carpenter, Hannah Recknor, Jimena Guallar-Blasco, Alexander Martin, Eugene Yang, Benjamin Van Durme:
MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval. 24149-24158 - Zijia Zhao, Yuqi Huo, Tongtian Yue, Longteng Guo, Haoyu Lu, Bingning Wang, Weipeng Chen, Jing Liu:
Efficient Motion-Aware Video MLLM. 24159-24168 - Zeyi Huang, Yuyang Ji, Xiaofang Wang, Nikhil Mehta, Tong Xiao, Donghyun Lee, Sigmund Vanvalkenburgh, Shengxin Zha, Bolin Lai, Licheng Yu, Ning Zhang, Yong Jae Lee, Miao Liu:
Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs. 24169-24179 - Jiawei Tan, Hongxing Wang, Junwu Weng, Jiaxin Li, Zhilong Ou, Kang Dang:
Anchor-Aware Similarity Cohesion in Target Frames Enables Predicting Temporal Moment Boundaries in 2D. 24180-24189 - Yisen Feng, Haoyu Zhang, Meng Liu, Weili Guan, Liqiang Nie:
Object-Shot Enhanced Grounding Network for Egocentric Video. 24190-24200 - Aditya Chinchure, Sahithya Ravi, Raymond T. Ng, Vered Shwartz, Boyang Li, Leonid Sigal:
Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events. 24201-24210 - Syed Ariff Syed Hesham, Yun Liu, Guolei Sun, Henghui Ding, Jing Yang, Ender Konukoglu, Xue Geng, Xudong Jiang:
Exploiting Temporal State Space Sharing for Video Semantic Segmentation. 24211-24221 - Jaewoo Jeong, Seohee Lee, Daehee Park, Giwon Lee, Kuk-Jin Yoon:
Multi-modal Knowledge Distillation-based Human Trajectory Forecasting. 24222-24233 - Mingqiao Ye, Seoung Wug Oh, Lei Ke, Joon-Young Lee:
EntitySAM: Segment Everything in Video. 24234-24243 - Md. Zarif Hossain, Ahmed Imteaj:
SLADE: Shielding against Dual Exploits in Large Vision-Language Models. 24244-24254 - Jovana Videnovic, Alan Lukezic, Matej Kristan:
A Distractor-Aware Memory for Visual Object Tracking with SAM2. 24255-24264 - Snehashis Majhi, Giacomo D'Amicantonio, Antitza Dantcheva, Quan Kong, Lorenzo Garattoni, Gianpiero Francesca, Egor Bondarev, François Brémond:
Just Dance with pi! A Poly-modal Inductor for Weakly-supervised Video Anomaly Detection. 24265-24274 - Kazi Sajeed Mehrab, M. Maruf, Arka Daw, Abhilash Neog, Harish Babu Manogaran, Mridul Khurana, Zhenyang Feng, Bahadir Altintas, Yasin Bakis, Elizabeth G. Campolongo, Matthew J. Thompson, Xiaojun Wang, Hilmar Lapp, Tanya Y. Berger-Wolf, Paula M. Mabee, Henry L. Bart Jr., Wei-Lun Chao, Wasila M. Dahdul, Anuj Karpatne:
Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images. 24275-24285 - Ho-Joong Kim, Yearang Lee, Jung-Ho Hong, Seong-Whan Lee:
DiGIT: Multi-Dilated Gated Encoder and Central-Adjacent Region Integrated Decoder for Temporal Action Detection Transformer. 24286-24296 - Dominick Reilly, Rajatsubhra Chakraborty, Arkaprava Sinha, Manish Kumar Govind, Pu Wang, François Brémond, Le Xue, Srijan Das:
LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living. 24297-24308 - Jianyang Xie, Yitian Zhao, Yanda Meng, He Zhao, Anh Nguyen, Yalin Zheng:
Are Spatial-Temporal Graph Convolution Networks for Human Action Recognition Over-Parameterized? 24309-24319 - Yuhao Li, Xinyue Chen, Hongkai Li, Xiaorong Pu, Peng Jin, Yazhou Ren:
VSNet: Focusing on the Linguistic Characteristics of Sign Language. 24320-24330 - Chun Tong Lei, Hon Ming Yam, Zhongliang Guo, Yifei Qian, Chun Pong Lau:
Instant Adversarial Purification with Adversarial Consistency Distillation. 24331-24340 - Huu Binh Ta, Duc Nguyen, Quyen Tran, Toan Tran, Tung Pham:
Low-Rank Adaptation in Multilinear Operator Networks for Security-Preserving Incremental Learning. 24341-24350 - Zhuowei Li, Tianchen Zhao, Xiang Xu, Zheng Zhang, Zhihua Li, Xuanbai Chen, Qin Zhang, Alessandro Bergamo, Anil K. Jain, Yifan Xing:
Optimal Transport-Guided Source-Free Adaptation for Face Anti-Spoofing. 24351-24363 - Gaojian Wang, Feng Lin, Tong Wu, Zhenguang Liu, Zhongjie Ba, Kui Ren:
FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning. 24364-24376 - Hangtao Zhang, Yichen Wang, Shihui Yan, Chenyu Zhu, Ziqi Zhou, Linshan Hou, Shengshan Hu, Minghui Li, Yanjun Zhang, Leo Yu Zhang:
Test-Time Backdoor Detection for Object Detection Models. 24377-24386 - Tong Bu, Maohua Li, Zhaofei Yu:
Inference-Scale Complexity in ANN-SNN Conversion for High-Performance and Low-Power Applications. 24387-24397 - Yufei Guo, Xiaode Liu, Yuanpei Chen, Weihang Peng, Yuhan Zhang, Zhe Ma:
Spiking Transformer: Introducing Accurate Addition-Only Spiking Self-Attention for Transformer. 24398-24408 - Chao Yuan, Guiwei Zhang, Changxiao Ma, Tianyi Zhang, Guanglin Niu:
From Poses to Identity: Training-Free Person Re-Identification via Feature Centralization. 24409-24418 - Keqi Chen, Vinkle Srivastav, Didier Mutter, Nicolas Padoy:
Learning from Synchronization: Self-Supervised Uncalibrated Multi-View Person Association in Challenging Scenes. 24419-24428 - Jiaqi Zhao, Zeyu Ding, Yong Zhou, Hancheng Zhu, Wen-Liang Du, Rui Yao:
ReDiffDet: Rotation-equivariant Diffusion Model for Oriented Object Detection. 24429-24439 - Maochen Yang, Zekun Li, Jian Zhang, Lei Qi, Yinghuan Shi:
Taste More, Taste Better: Diverse Data and Strong Model Boost Semi-Supervised Crowd Counting. 24440-24451 - Longtao Jiang, Zhendong Wang, Jianmin Bao, Wengang Zhou, Dongdong Chen, Lei Shi, Dong Chen, Houqiang Li:
SmartEraser: Remove Anything from Images using Masked-Region Guidance. 24452-24462 - Jae-Woo Kim, Ue-Hwan Kim:
Towards Generalizable Scene Change Detection. 24463-24473 - Weixiao Gao, Liangliang Nan, Hugo Ledoux:
SUM Parts: Benchmarking Part-Level Semantic Segmentation of Urban Meshes. 24474-24484 - Oliver Hahn, Christoph Reich, Nikita Araslanov, Daniel Cremers, Christian Rupprecht, Stefan Roth:
Scene-Centric Unsupervised Panoptic Segmentation. 24485-24495 - Hongyi Zeng, Wenxuan Liu, Tianhua Xia, Jinhui Chen, Ziyun Li, Sai Qian Zhang:
Foveated Instance Segmentation. 24496-24505 - Yushan Zhang, Aljosa Osep, Laura Leal-Taixé, Tim Meinhardt:
Zero-Shot 4D Lidar Panoptic Segmentation. 24506-24517 - Markus Karmann, Onay Urfalioglu:
Repurposing Stable Diffusion Attention for Training-Free Unsupervised Interactive Segmentation. 24518-24528 - Saad Lahlali, Sandra Kara, Hejer Ammar, Florian Chabot, Nicolas Granger, Hervé Le Borgne, Quoc-Cuong Pham:
Cross-Modal Distillation for 2D/3D Multi-Object Discovery from 2D Motion. 24529-24538 - Shengqiong Wu, Hao Fei, Jingkang Yang, Xiangtai Li, Juncheng Li, Hanwang Zhang, Tat-Seng Chua:
Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene. 24539-24549 - Jaime Corsetti, Francesco Giuliari, Alice Fasoli, Davide Boscaini, Fabio Poiesi:
Functionality Understanding and Segmentation in 3D Scenes. 24550-24559 - Jiaxin Shi, Mingyue Xiang, Hao Sun, Yixuan Huang, Zhi Weng:
Chain of Semantics Programming in 3D Gaussian Splatting Representation for 3D Vision Grounding. 24560-24569 - Jiangyong Huang, Baoxiong Jia, Yan Wang, Ziyu Zhu, Xiongkun Linghu, Qing Li, Song-Chun Zhu, Siyuan Huang:
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis. 24570-24581 - Qihang Peng, Henry Zheng, Gao Huang:
ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding. 24582-24592 - Ronghao Dang, Yuqian Yuan, Wenqi Zhang, Yifei Xin, Boqiang Zhang, Long Li, Liuyi Wang, Qinyang Zeng, Xin Li, Lidong Bing:
ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark. 24593-24602 - Filippo Ziliotto, Tommaso Campari, Luciano Serafini, Lamberto Ballan:
TANGO: Training-free Embodied AI Agents for Open-world Tasks. 24603-24613 - Xiangyuan Xue, Zeyu Lu, Di Huang, Zidong Wang, Wanli Ouyang, Lei Bai:
ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems. 24614-24624 - Yunzhi Zhang, Zizhang Li, Matt Zhou, Shangzhe Wu, Jiajun Wu:
The Scene Language: Representing Scenes with Programs, Words, and Embeddings. 24625-24634 - Yongshuo Zong, Qin Zhang, Dongsheng An, Zhihua Li, Xiang Xu, Linghan Xu, Zhuowen Tu, Yifan Xing, Onkar Dabeer:
Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels. 24635-24645 - Artemis Panagopoulou, Honglu Zhou, Silvio Savarese, Caiming Xiong, Chris Callison-Burch, Mark Yatskar, Juan Carlos Niebles:
ViUniT: Visual Unit Tests for More Robust Visual Programming. 24646-24656 - Lei Li, Yuancheng Wei, Zhihui Xie, Xuqing Yang, Yifan Song, Peiyi Wang, Chenxin An, Tianyu Liu, Sujian Li, Bill Yuchen Lin, Lingpeng Kong, Qi Liu:
VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models. 24657-24668 - Xingrui Wang, Wufei Ma, Tiezheng Zhang, Celso M. de Melo, Jieneng Chen, Alan L. Yuille:
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Mutimodal Models. 24669-24679 - Aayush Dhakal, Srikumar Sastry, Subash Khanal, Adeel Ahmad, Eric Xing, Nathan Jacobs:
RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings. 24680-24689 - Jingyuan Yang, Jiawei Feng, Weibin Luo, Dani Lischinski, Daniel Cohen-Or, Hui Huang:
EmoEdit: Evoking Emotions through Image Manipulation. 24690-24699 - Qu Yang, Qinghongya Shi, Tongxin Wang, Mang Ye:
Uncertain Multimodal Intention and Emotion Understanding in the Wild. 24700-24709 - Size Wu, Sheng Jin, Wenwei Zhang, Lumin Xu, Wentao Liu, Wei Li, Chen Change Loy:
F-LMM: Grounding Frozen Large Multimodal Models. 24710-24721 - Rui Qian, Xin Yin, Dejing Dou:
Reasoning to Attend: Try to Understand How Token Works. 24722-24731 - Yanyuan Chen, Dexuan Xu, Yu Huang, Songkun Zhan, Hanpin Wang, Dongxue Chen, Xueping Wang, Meikang Qiu, Hang Li:
MIMO: A Medical Vision Language Model with Visual Referring Multimodal Input and Pixel Grounding Multimodal Output. 24732-24741 - Yuzhong Zhao, Feng Liu, Yue Liu, Mingxiang Liao, Chen Gong, Qixiang Ye, Fang Wan:
DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution. 24742-24752 - Zhen Yang, Zhuo Tao, Qi Chen, Liang Li, Yuankai Qi, Anton van den Hengel, Qingming Huang:
Separation of Powers: On Segregating Knowledge from Observation in LLM-enabled Knowledge-based Visual Question Answering. 24753-24762 - Hang Hua, Qing Liu, Lingzhi Zhang, Jing Shi, Soo Ye Kim, Zhifei Zhang, Yilin Wang, Jianming Zhang, Zhe Lin, Jiebo Luo:
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity. 24763-24773 - Yan Li, Yifei Xing, Xiangyuan Lan, Xin Li, Haifeng Chen, Dongmei Jiang:
AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment. 24774-24784 - Yuanmin Tang, Jing Yu, Keke Gai, Jiamin Zhuang, Gang Xiong, Gaopeng Gou, Qi Wu:
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval. 24785-24795 - Bangbang Zhou, Zuan Gao, Zixiao Wang, Boqiang Zhang, Yuxin Wang, Zhineng Chen, Hongtao Xie:
SynTab-LLaVA: Enhancing Multimodal Table Understanding with Decoupled Synthesis. 24796-24806 - Daiqing Qi, Handong Zhao, Jing Shi, Simon Jenni, Yifei Fan, Franck Dernoncourt, Scott Cohen, Sheng Li:
The Photographer's Eye: Teaching Multimodal Large Language Models to See, and Critique Like Photographers. 24807-24816 - Jun Chen, Dannong Xu, Junjie Fei, Chun-Mei Feng, Mohamed Elhoseiny:
Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents. 24817-24826 - Ryota Tanaka, Taichi Iki, Taku Hasegawa, Kyosuke Nishida, Kuniko Saito, Jun Suzuki:
VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents. 24827-24837 - Linke Ouyang, Yuan Qu, Hongbin Zhou, Jiawei Zhu, Rui Zhang, Qunshu Lin, Bin Wang, Zhiyuan Zhao, Man Jiang, Xiaomeng Zhao, Jin Shi, Fan Wu, Pei Chu, Minghao Liu, Zhenxiang Li, Chao Xu, Bo Zhang, Botian Shi, Zhongying Tu, Conghui He:
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations. 24838-24848 - Haoxin Li, Boyang Li:
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data. 24849-24861 - Gensheng Pei, Tao Chen, Yujia Wang, Xinhao Cai, Xiangbo Shu, Tianfei Zhou, Yazhou Yao:
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection. 24862-24872 - Xugong Qin, Peng Zhang, Jun Jie Ou Yang, Gangyan Zeng, Yubo Li, Yuanyuan Wang, Wanqian Zhang, Pengwen Dai:
CLIP is Almost All You Need: Towards Parameter-Efficient Scene Text Retrieval without OCR. 24873-24883 - Rui Xiao, Sanghwan Kim, Mariana-Iuliana Georgescu, Zeynep Akata, Stephan Alaniz:
FLAIR: VLM with Fine-grained Language-informed Image Representations. 24884-24894 - Yuheng Feng, Changsong Wen, Zelin Peng, Li jiaye, Siyu Zhu:
Retaining Knowledge and Enhancing Long-Text Representations in CLIP through Dual-Teacher Distillation. 24895-24904 - Cijo Jose, Théo Moutakanni, Dahyun Kang, Federico Baldassarre, Timothée Darcet, Hu Xu, Daniel Li, Marc Szafraniec, Michaël Ramamonjisoa, Maxime Oquab, Oriane Siméoni, Huy V. Vo, Patrick Labatut, Piotr Bojanowski:
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment. 24905-24916 - Davide Berasi, Matteo Farina, Massimiliano Mancini, Elisa Ricci, Nicola Strisciuglio:
Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models. 24917-24927 - Jiuhai Chen, Jianwei Yang, Haiping Wu, Dianqi Li, Jianfeng Gao, Tianyi Zhou, Bin Xiao:
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion. 24928-24938 - Chenyu Yang, Xuan Dong, Xizhou Zhu, Weijie Su, Jiahao Wang, Hao Tian, Zhe Chen, Wenhai Wang, Lewei Lu, Jifeng Dai:
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models. 24939-24949 - Yaqi Zhao, Yuanyang Yin, Lin Li, Mingan Lin, Victor Shea-Jay Huang, Siwei Chen, Weipeng Chen, Baoqun Yin, Zenan Zhou, Wentao Zhang:
Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual Knowledge. 24950-24959 - Gen Luo, Xue Yang, Wenhan Dou, Zhaokai Wang, Jiawen Liu, Jifeng Dai, Yu Qiao, Xizhou Zhu:
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training. 24960-24971 - Xubing Ye, Yukang Gan, Yixiao Ge, Xiao-Ping Zhang, Yansong Tang:
ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models. 24972-24982 - Dominik Schnaus, Nikita Araslanov, Daniel Cremers:
It's a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data. 24983-24992 - Kun Zhang, Jingyu Li, Zhe Li, S. Kevin Zhou:
DH-Set: Improving Vision-Language Alignment with Diverse and Hybrid Set-Embeddings Learning. 24993-25003 - Zhangqi Jiang, Junkai Chen, Beier Zhu, Tingjin Luo, Yankun Shen, Xu Yang:
Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens. 25004-25014 - Yuncheng Guo, Xiaodong Gu:
MMRL: Multi-Modal Representation Learning for Vision-Language Models. 25015-25025 - Zixuan Hu, Yongxian Wei, Li Shen, Chun Yuan, Dacheng Tao:
LoRA Recycle: Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs. 25026-25037 - Soumya Suvra Ghosal, Souradip Chakraborty, Vaibhav Singh, Tianrui Guan, Mengdi Wang, Ahmad Beirami, Furong Huang, Alvaro Velasquez, Dinesh Manocha, Amrit Singh Bedi:
Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment. 25038-25049 - Yue Cao, Yun Xing, Jie Zhang, Di Lin, Tianwei Zhang, Ivor W. Tsang, Yang Liu, Qing Guo:
SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments. 25050-25059 - Zhaoyi Liu, Huan Zhang:
Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models. 25060-25070 - Yuchen Ren, Zhengyu Zhao, Chenhao Lin, Bo Yang, Lu Zhou, Zhe Liu, Chao Shen:
Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement. 25071-25080 - Jenny Schmalfuss, Nadine Chang, Vibashan VS, Maying Shen, Andrés Bruhn, José M. Álvarez:
PARC: A Quantitative Framework Uncovering the Symmetries within Vision Language Models. 25081-25091 - Yassir Bendou, Amine Ouasfi, Vincent Gripon, Adnane Boukhayma:
ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models. 25092-25102 - Maxime Zanella, Clément Fuchs, Christophe De Vleeschouwer, Ismail Ben Ayed:
Realistic Test-Time Adaptation of Vision-Language Models. 25103-25112 - Dengyang Jiang, Haoyu Wang, Lei Zhang, Wei Wei, Guang Dai, Mengmeng Wang, Jingdong Wang, Yanning Zhang:
Low-Biased General Annotated Dataset Generation. 25113-25123 - Chaoyang Li, Jianyang Qin, Jinhao Cui, Zeyu Liu, Ning Hu, Qing Liao:
Joint Scheduling of Causal Prompts and Tasks for Multi-Task Learning. 25124-25134 - Hairui Ren, Fan Tang, He Zhao, Zixuan Wang, Dandan Guo, Yi Chang:
Beyond Words: Augmenting Discriminative Richness via Diffusions in Unsupervised Prompt Learning. 25135-25144 - Xiangyan Qu, Gaopeng Gou, Jiamin Zhuang, Jing Yu, Kun Song, Qihao Wang, Yili Li, Gang Xiong:
ProAPO: Progressively Automatic Prompt Optimization for Visual Classification. 25145-25155 - Jinpeng Wang, Tianci Luo, Yaohua Zha, Yan Feng, Ruisheng Luo, Bin Chen, Tao Dai, Long Chen, Yaowei Wang, Shu-Tao Xia:
Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning. 25156-25165 - Jung-Ho Hong, Ho-Joong Kim, Kyu-Sung Jeon, Seong-Whan Lee:
Comprehensive Information Bottleneck for Unveiling Universal Attribution to Interpret Vision Transformers. 25166-25175 - Jungsoo Lee, Debasmit Das, Munawar Hayat, Sungha Choi, Kyuwoong Hwang, Fatih Porikli:
CustomKD: Customizing Large Vision Foundation for Edge Model Improvement via Knowledge Distillation. 25176-25186 - Debora Caldarola, Pietro Cagnasso, Barbara Caputo, Marco Ciccone:
Beyond Local Sharpness: Communication-Efficient Global Sharpness-aware Minimization for Federated Learning. 25187-25197 - Shunxin Wang, Raymond N. J. Veldhuis, Nicola Strisciuglio:
Do ImageNet-trained Models Learn Shortcuts? The Impact of Frequency Shortcuts on Generalization. 25198-25207 - Ningyuan Tang, Minghao Fu, Jianxin Wu:
Minimal Interaction Seperated Tuning: A New Paradigm for Visual Adaptation. 25208-25217 - Krishna Sri Ipsit Mantri, Carola-Bibiane Schönlieb, Bruno Ribeiro, Chaim Baskin, Moshe Eliasof:
DiTASK: Multi-Task Fine-Tuning with Diffeomorphic Transformations. 25218-25229 - Jian Meng, Ahmed Hassan, Li Yang, Deliang Fan, Jinwoo Shin, Jae-sun Seo:
Closest Neighbors are Harmful for Lightweight Masked Auto-encoders. 25230-25239 - Mengqiao Han, Liyuan Pan, Xiabi Liu:
GliaNet: Adaptive Neural Network Structure Learning with Glia-Driven. 25240-25249 - Quentin Bouniot, Ievgen Redko, Anton Mallasto, Charlotte Laclau, Oliver Struckmeier, Karol Arndt, Markus Heinonen, Ville Kyrki, Samuel Kaski:
From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport. 25250-25260 - Ali Hatamizadeh, Jan Kautz:
MambaVision: A Hybrid Mamba-Transformer Vision Backbone. 25261-25270 - Qihang Fan, Huaibo Huang, Ran He:
Breaking the Low-Rank Dilemma of Linear Attention. 25271-25280 - Dachong Li, Li Li, Zhuangzhuang Chen, Jianqiang Li:
ShiftwiseConv: Small Convolutional Kernel with Large Kernel Effect. 25281-25291 - Zelin Peng, Yu Huang, Zhengqin Xu, Feilong Tang, Ming Hu, Xiaokang Yang, Wei Shen:
Star with Bilinear Mapping. 25292-25302 - Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans, Narges Norouzi, Giuseppe Averta, Bastian Leibe, Gijs Dubbelman, Daan de Geus:
Your ViT is Secretly an Image Segmentation Model. 25303-25313 - Jiahao He, Keren Fu, Xiaohong Liu, Qijun Zhao:
Samba: A Unified Mamba-based Framework for General Salient Object Detection. 25314-25324 - Pei Geng, Jian Yang, Shanshan Zhang:
HORP: Human-Object Relation Priors Guided HOI Detection. 25325-25335 - Yifei Qian, Zhongliang Guo, Bowen Deng, Chun Tong Lei, Shuai Zhao, Chun Pong Lau, Xiaopeng Hong, Michael P. Pound:
T2ICount: Enhancing Cross-modal Understanding for Zero-Shot Counting. 25336-25345 - Ziyu Zhao, Xiaoguang Li, Lingjia Shi, Nasrin Imanpour, Song Wang:
DPSeg: Dual-Prompt Cost Volume Learning for Open-Vocabulary Semantic Segmentation. 25346-25356 - Srinivasa Rao Nandam, Sara Atito, Zhenhua Feng, Josef Kittler, Muhammad Awais:
Text Augmented Correlation Transformer For Few-shot Classification & Segmentation. 25357-25366 - Guoyu Yang, Yuan Wang, Daming Shi, Yanzhong Wang:
Golden Cudgel Network for Real-Time Semantic Segmentation. 25367-25376 - Hyeokjun Kweon, Kuk-Jin Yoon:
WISH: Weakly Supervised Instance Segmentation using Heterogeneous Labels. 25377-25387 - Can Küçüksözen, Yücel Yemez:
Hierarchical Compact Clustering Attention (COCA) for Unsupervised Object-Centric Learning. 25388-25398 - Mingfu Liang, Jiahuan Zhou, Xu Zou, Ying Wu:
Incremental Object Keypoint Learning. 25399-25410 - Shuo Li, Fang Liu, Zehua Hao, Xinyi Wang, Lingling Li, Xu Liu, Puhua Chen, Wenping Ma:
Logits DeConfusion with CLIP for Few-Shot Learning. 25411-25421 - Luyao Tang, Yuxuan Yuan, Chaoqi Chen, Zeyu Zhang, Yue Huang, Kun Zhang:
OCRT: Boosting Foundation Models in the Open World with Object-Concept-Relation Triad. 25422-25433 - Lei-Lei Ma, Shuo Xu, Ming-Kun Xie, Lei Wang, Dengdi Sun, Haifeng Zhao:
Correlative and Discriminative Label Grouping for Multi-Label Visual Prompt Tuning. 25434-25443 - Qiyuan Dai, Hanzhuo Huang, Yu Wu, Sibei Yang:
Adaptive Part Learning for Fine-Grained Generalized Category Discovery: A Plug-and-Play Enhancement. 25444-25453 - Xing Xi, Yangyang Huang, Ronghua Luo, Yu Qiu:
OW-OVD: Unified Open World and Open Vocabulary Object Detection. 25454-25464 - Haochen Li, Rui Zhang, Hantao Yao, Xin Zhang, Yifan Hao, Xinkai Song, Shaohui Peng, Yongwei Zhao, Chen Zhao, Yanjun Wu, Ling Li:
SEEN-DA: SEmantic ENtropy guided Domain-aware Attention for Domain Adaptive Object Detection. 25465-25475 - Zhaohu Xing, Lihao Liu, Yijun Yang, Hongqiu Wang, Tian Ye, Sixiang Chen, Wenxue Li, Guang Liu, Lei Zhu:
Detect Any Mirrors: Boosting Learning Reliability on Large-Scale Unlabeled Data with an Iterative Data Engine. 25476-25486 - Beier Zhu, Jiequan Cui, Hanwang Zhang, Chi Zhang:
Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness. 25487-25496 - Zhou Yang, Mingtao Feng, Tao Huang
, Fangfang Wu, Weisheng Dong, Xin Li, Guangming Shi:
Gain from Neighbors: Boosting Model Robustness in the Wild via Adversarial Perturbations Toward Neighboring Classes. 25497-25507 - Han Sun, Yunkang Cao
, Hao Dong, Olga Fink:
Unseen Visual Anomaly Generation. 25508-25517 - Lei Fan
, Dongdong Fan, Zhiguang Hu, Yiwen Ding, Donglin Di, Kai Yi, Maurice Pagnucco, Yang Song:
MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects. 25518-25527 - Shilhora Akshay, Niveditha Lakshmi Narasimhan, Jacob George, Vineeth N. Balasubramanian:
A Unified Latent Schrodinger Bridge Diffusion Model for Unsupervised Anomaly Detection and Localization. 25528-25538 - Yoon Gyo Jung, Jaewoo Park, Jaeho Yoon, Kuan-Chuan Peng, Wonchul Kim, Andrew Beng Jin Teoh, Octavia I. Camps:
TailedCore: Few-Shot Sampling for Unsupervised Long-Tail Noisy Anomaly Detection. 25539-25548 - Shubhang Bhatnagar, Narendra Ahuja:
Potential Field Based Deep Metric Learning. 25549-25559 - Yanghao Wang, Long Chen:
Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification. 25560-25569 - Shaobo Wang, Yicun Yang, Zhiyuan Liu, Chenghao Sun, Xuming Hu, Conghui He, Linfeng Zhang:
Dataset Distillation with Neural Characteristic Function: A Minmax Perspective. 25570-25580 - Wenliang Zhong, Haoyu Tang, Qinghai Zheng, Mingzhu Xu, Yupeng Hu, Weili Guan:
Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory. 25581-25589 - Felipe del Río, Alain Raymond-Saez, Daniel Florea, Rodrigo Toro Icarte, Julio Hurtado, Cristian Buc Calderon, Alvaro Soto:
Data Distributional Properties As Inductive Bias for Systematic Generalization. 25590-25601 - Seokju Yun, Seunghye Chae, Dongheon Lee, Youngmin Ro:
SoMA: Singular Value Decomposed Minor Components Adaptation for Domain Generalizable Representation Learning. 25602-25612 - Hao Zhu, Yifei Zhang, Junhao Dong, Piotr Koniusz:
BiLoRA: Almost-Orthogonal Parameter Spaces for Continual Learning. 25613-25622 - Haoyang Li, Liang Wang, Chao Wang, Jing Jiang, Yan Peng, Guodong Long:
DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models. 25623-25632 - Guowei Wang, Changxing Ding:
Effortless Active Labeling for Long-Term Test-Time Adaptation. 25633-25642 - Ye Liu, Meng Yang:
SEC-Prompt: SEmantic Complementary Prompting for Few-Shot Class-Incremental Learning. 25643-25656 - Li-Jun Zhao, Zhen-Duo Chen, Yongxin Wang, Xin Luo, Xin-Shun Xu:
Attraction Diminishing and Distributing for Few-Shot Class-Incremental Learning. 25657-25666 - Kai Fang, Anqi Zhang, Guangyu Gao, Jianbo Jiao, Chi Harold Liu, Yunchao Wei:
CoMBO: Conflict Mitigation via Branched Optimization for Class Incremental Segmentation. 25667-25676 - Sebastian Schmidt, Leonard Schenk, Leo Schwinn, Stephan Günnemann:
Joint Out-of-Distribution Filtering and Data Discovery Active Learning. 25677-25687 - Ronghang Zhu, Mengxuan Hu, Weiming Zhuang, Lingjuan Lyu, Xiang Yu, Sheng Li:
Revisiting Source-Free Domain Adaptation: Insights into Representativeness, Generalization, and Variety. 25688-25697 - Junyi Chai, Shenyu Lu, Xiaoqian Wang:
Identifying and Mitigating Spurious Correlation in Multi-Task Learning. 25698-25707 - Na Zheng, Xuemeng Song, Xue Dong, Aashish Nikhil Ghosh, Liqiang Nie, Roger Zimmermann:
Language-Assisted Debiasing and Smoothing for Foundation Model-Based Semi-Supervised Learning. 25708-25717 - Lilin Zhang, Chengpei Wu, Ning Yang:
Weakly Supervised Contrastive Adversarial Training for Learning Robust Features from Semi-supervised Data. 25718-25727 - Qi Chen, Hu Ding:
Dual Energy-Based Model with Open-World Uncertainty Estimation for Out-of-distribution Detection. 25728-25737 - Senyu Hou, Gaoxia Jiang, Jia Zhang, Shangrong Yang, Husheng Guo, Yaqing Guo, Wenjian Wang:
Directional Label Diffusion Model for Learning from Noisy Labels. 25738-25748 - Yunlu Yan, Huazhu Fu, Yuexiang Li, Jinheng Xie, Jun Ma, Guang Yang, Lei Zhu:
A Simple Data Augmentation for Feature Distribution Skewed Federated Learning. 25749-25758 - Yasser H. Khalil, Leo Maxime Brunswic, Soufiane Lamghari, Xu Li, Mahdi Beitollahi, Xi Chen:
NoT: Federated Unlearning via Weight Negation. 25759-25769 - Ye Li, Yanchao Zhao, Chengcheng Zhu, Jiale Zhang:
Infighting in the Dark: Multi-Label Backdoor Attack in Federated Learning. 25770-25779 - Dongyoon Yang, Jihu Lee, Yongdai Kim:
TAROT: Towards Essentially Domain-Invariant Robustness with Theoretical Justification. 25780-25789 - Hanrong Zhang, Zhenting Wang, Boheng Li, Fulin Lin, Tingxu Han, Mingyu Jin, Chenlu Zhan, Mengnan Du, Hongwei Wang, Shiqing Ma:
Invisible Backdoor Attack against Self-supervised Learning. 25790-25801 - Kaisheng Liang, Xuelong Dai, Yanjie Li, Dong Wang, Bin Xiao:
Improving Transferable Targeted Attacks with Feature Tuning Mixup. 25802-25811 - Han Liu, Peng Cui, Bingning Wang, Weipeng Chen, Yupeng Zhang, Jun Zhu, Xiaolin Hu:
Improving Accuracy and Calibration via Differentiated Deep Mutual Learning. 25812-25821 - Ren Wang, Haoliang Sun, Yuxiu Lin, Chuanhui Zuo, Yongshun Gong, Yilong Yin, Wenjia Meng:
SeqMvRL: A Sequential Fusion Framework for Multi-view Representation Learning. 25822-25831 - Bowen Zhao, Qianqian Wang, Zhengming Ding, Quanxue Gao:
Attribute-Missing Multi-view Graph Clustering. 25832-25841 - Thomas Dagès, Simon Weber, Ya-Wei Eileen Lin, Ronen Talmon, Daniel Cremers, Michael Lindenbaum, Alfred M. Bruckstein, Ron Kimmel:
Finsler Multi-Dimensional Scaling: Manifold Learning for Asymmetric Dimensionality Reduction and Embedding. 25842-25853 - Chengxiang Huang, Yake Wei, Zequn Yang, Di Hu:
Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition. 25854-25863 - Guanzhou Ke, Shengfeng He, Xiaoli Wang, Bo Wang, Guoqing Chao, Yuanyang Zhang, Yi Xie, Hexing Su:
Knowledge Bridger: Towards Training-Free Missing Modality Completion. 25864-25873 - Max Gutbrod, David Rauber, Danilo Weber Nunes, Christoph Palm:
OpenMIBOOD: Open Medical Imaging Benchmarks for Out-Of-Distribution Detection. 25874-25886 - Mariamma Antony, Rajiv Porana, Sahil M. Lathiya, Siva Teja Kakileti, Chiranjib Bhattacharyya:
CheXwhatsApp: A Dataset for Exploring Challenges in the Diagnosis of Chest X-rays through Mobile Devices. 25887-25896 - Hanbin Ko, Chang-Min Park:
Bringing CLIP to the Clinic: Dynamic Soft Labels and Negation-Aware Learning for Medical Analysis. 25897-25906 - Shahad Albastaki, Anabia Sohail, Iyyakutti Iyappan Ganapathi, Basit Alawode, Asim Khan, Sajid Javed, Naoufel Werghi, Mohammed Bennamoun, Arif Mahmood:
Multi-Resolution Pathology-Language Pre-training Model with Text-Guided Visual Representation. 25907-25919 - Tong Wang, Mingkang Wang, Zhongze Wang, Hongkai Wang, Qi Xu, Fengyu Cong, Hongming Xu:
ODA-GAN: Orthogonal Decoupling Alignment GAN Assisted by Weakly-supervised Learning for Virtual Immunohistochemistry Staining. 25920-25929 - Yisi Luo, Xile Zhao, Kai Ye, Deyu Meng:
STINR: Deciphering Spatial Transcriptomics via Implicit Neural Representation. 25930-25939 - Zheng Zhang, Guanchun Yin, Bo Zhang, Wu Liu, Xiuzhuang Zhou, Wendong Wang:
A Semantic Knowledge Complementarity based Decoupling Framework for Semi-supervised Class-imbalanced Medical Image Segmentation. 25940-25949 - Theodore Zhao, Sid Kiblawi, Naoto Usuyama, Ho Hin Lee, Sam Preston, Hoifung Poon, Mu Wei:
Boltzmann Attention Sampling for Image Analysis with Small Objects. 25950-25959 - Rong Qin, Xingyu Liu, Jinglei Shi, Liang Lin, Jufeng Yang:
Boosting the Dual-Stream Architecture in Ultra-High Resolution Segmentation with Resolution-Biased Uncertainty Estimation. 25960-25970 - Yankai Jiang, Peng Zhang, Donglin Yang, Yuan Tian, Hai Lin, Xiaosong Wang:
Advancing Generalizable Tumor Segmentation with Anomaly-Aware Open-Vocabulary Attention Maps and Frozen Foundation Diffusion Models. 25971-25981 - Zheyu Zhang, Yayuan Lu, Feipeng Ma, Yueyi Zhang, Huanjing Yue, Xiaoyan Sun:
Incomplete Multi-modal Brain Tumor Segmentation via Learnable Sorting State Space Model. 25982-25992 - Yang Yue, Yulin Wang, Haojun Jiang, Pan Liu, Shiji Song, Gao Huang:
EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance. 25993-26003 - Armeet Singh Jatyani, Jiayun Wang, Aditi Chandrashekar, Zihui Wu, Miguel Liu-Schiaffini, Bahareh Tolooshams, Anima Anandkumar:
A Unified Model for Compressed Sensing MRI Across Undersampling Patterns. 26004-26013 - Thomas Hastings Greer, Lin Tian, François-Xavier Vialard, Roland Kwitt, Raúl San José Estépar, Marc Niethammer:
CARL: A Framework for Equivariant Image Registration. 26014-26023 - Jay Zhangjie Wu, Yuxuan Zhang, Haithem Turki, Xuanchi Ren, Jun Gao, Mike Zheng Shou, Sanja Fidler, Zan Gojcic, Huan Ling:
DIFIX3D+: Improving 3D Reconstructions with Single-Step Diffusion Models. 26024-26035 - Qi Wu, Janick Martinez Esturo, Ashkan Mirzaei, Nicolas Moënne-Loccoz, Zan Gojcic:
3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting. 26036-26046 - Xinyi Zhang, Naiqi Li, Angela Dai:
DNF: Unconditional 4D Generation with Dictionary-based Neural Fields. 26047-26056 - Rundi Wu, Ruiqi Gao, Ben Poole, Alex Trevithick, Changxi Zheng, Jonathan T. Barron, Aleksander Holynski:
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models. 26057-26068 - Ruofan Liang, Zan Gojcic, Huan Ling, Jacob Munkberg, Jon Hasselgren, Chih-Hao Lin, Jun Gao, Alexander Keller, Nandita Vijaykumar, Sanja Fidler, Zian Wang:
Diffusion Renderer: Neural Inverse and Forward Rendering with Video Diffusion Models. 26069-26080 - Minhyeok Lee, Suhwan Cho, Jungho Lee, Sunghun Yang, Heeseung Choi, Ig-Jae Kim, Sangyoun Lee:
Effective SAM Combination for Open-Vocabulary Semantic Segmentation. 26081-26090 - Yue Gao, Hong-Xing Yu, Bo Zhu, Jiajun Wu:
FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video. 26091-26101 - Chen Geng, Yunzhi Zhang, Shangzhe Wu, Jiajun Wu:
Birth and Death of a Rose. 26102-26113 - Shangquan Sun, Wenqi Ren, Juxiang Zhou, Shu Wang, Jianhou Gan, Xiaochun Cao:
Semi-Supervised State-Space Model with Dynamic Stacking Filter for Real-World Video Deraining. 26114-26124 - Qifan Yu, Wei Chow, Zhongqi Yue, Kaihang Pan, Yang Wu, Xiaoyang Wan, Juncheng Li, Siliang Tang, Hanwang Zhang, Yueting Zhuang:
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea. 26125-26135 - Kaihang Pan, Wang Lin, Zhongqi Yue, Tenglong Ao, Liyu Jia, Wei Zhao, Juncheng Li, Siliang Tang, Hanwang Zhang:
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens. 26136-26146 - Feilong Tang, Chengzhi Liu, Zhongxing Xu, Ming Hu, Zile Huang, Haochen Xue, Ziyang Chen, Zelin Peng, Zhiwei Yang, Sijin Zhou, Wenxue Li, Yulong Li, Wenxuan Song, Shiyan Su, Wei Feng, Jionglong Su, Mingquan Lin, Yifan Peng, Xuelian Cheng, Imran Razzak, Zongyuan Ge:
Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding. 26147-26159 - Yan Shu, Zheng Liu, Peitian Zhang, Minghao Qin, Junjie Zhou, Zhengyang Liang, Tiejun Huang, Bo Zhao:
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding. 26160-26169 - Jian Liang, Wenke Huang, Guancheng Wan, Qu Yang, Mang Ye:
LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models. 26170-26180 - Songhao Han, Wei Huang, Hairong Shi, Le Zhuo, Xiu Su, Shifeng Zhang, Xu Zhou, Xiaojuan Qi, Yue Liao, Si Liu:
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection. 26181-26191 - Lan Wang, Yujia Chen, Du Tran, Vishnu Naresh Boddeti, Wen-Sheng Chu:
SEAL: Semantic Attention Learning for Long Video Representation. 26192-26201 - Boseung Jeong, Jicheol Park, Sungyeon Kim, Suha Kwak:
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval. 26202-26211 - Haotian Wang, Yuzhe Weng, Yueyan Li, Zilu Guo, Jun Du, Shutong Niu, Jiefeng Ma, Shan He, Xiaoyan Wu, Qiming Hu, Bing Yin, Cong Liu, Qingfeng Liu:
EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion. 26212-26221 - Huaize Liu, Wenzhang Sun, Donglin Di, Shibo Sun, Jiahui Yang, Changqing Zou, Hujun Bao:
MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation. 26222-26231 - Shuling Zhao, Fa-Ting Hong, Xiaoshui Huang, Dan Xu:
Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation. 26232-26241 - Yukang Lin, Hokit Fung, Jianjin Xu, Zeping Ren, Adela S. M. Lau, Guosheng Yin, Xiu Li:
MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation. 26242-26252 - Fa-Ting Hong, Zhan Xu, Haiyang Liu, Qinjie Lin, Luchuan Song, Zhixin Shu, Yang Zhou, Duygu Ceylan, Dan Xu:
Free-viewpoint Human Animation with Pose-correlated Reference Selection. 26253-26262 - Yuming Gu, Phong Tran, Yujian Zheng, Hongyi Xu, Heyuan Li, Adilbek Karmanov, Hao Li:
DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis. 26263-26273 - Cong Wang, Di Kang, Heyi Sun, Shen-Han Qian, Zixuan Wang, Linchao Bao, Song-Hai Zhang:
MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing. 26274-26284 - Dongbin Zhang, Yunfei Liu, Lijian Lin, Ye Zhu, Kangjie Chen, Minghan Qin, Yu Li, Haoqian Wang:
HRAvatar: High-Quality and Relightable Gaussian Head Avatar. 26285-26296 - Youyi Zhan, Tianjia Shao, Yin Yang, Kun Zhou:
Real-time High-fidelity Gaussian Human Avatars with Position-based Interpolation of Spatially Distributed MLPs. 26297-26307 - Yiyu Zhuang, Jiaxi Lv, Hao Wen, Qing Shuai, Ailing Zeng, Hao Zhu, Shifeng Chen, Yujiu Yang, Xun Cao, Wei Liu:
IDOL: Instant Photorealistic 3D Human Creation from a Single Image. 26308-26319 - Xueting Li, Ye Yuan, Shalini De Mello, Gilles Daviet, Jonathan Leaf, Miles Macklin, Jan Kautz, Umar Iqbal:
SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing. 26320-26330 - Hui En Pang, Shuai Liu, Zhongang Cai, Lei Yang, Tianwei Zhang, Ziwei Liu:
Disco4D: Disentangled 4D Human Generation and Animation from a Single Image. 26331-26344 - Yuze He, Yanning Zhou, Wang Zhao, Zhongkai Wu, Kaiwen Xiao, Wei Yang, Yong-Jin Liu, Xiao Han:
StdGEN: Semantic-Decomposed 3D Character Generation from Single Images. 26345-26355 - Philipp Flotho, Moritz Piening, Anna Kukleva, Gabriele Steidl:
T-FAKE: Synthesizing Thermal Images for Facial Landmarking. 26356-26366 - Jianlong Jin, Chenglong Zhao, Ruixin Zhang, Sheng Shang, Jianqing Xu, Jingyun Zhang, Shaoming Wang, Yang Zhao, Shouhong Ding, Wei Jia, Yunsheng Wu:
Diff-Palm: Realistic Palmprint Generation with Polynomial Creases and Intra-Class Variation Controllable Diffusion Models. 26367-26376 - Hanzhang Tu, Zhanfeng Liao, Boyao Zhou, Shunyuan Zheng, Xilong Zhou, Liuxin Zhang, QianYing Wang, Yebin Liu:
GBC-Splat: Generalizable Gaussian-Based Clothed Human Digitalization under Sparse RGB Cameras. 26377-26387 - Zijian He, Yuwei Ning, Yipeng Qin, Guangrun Wang, Sibei Yang, Liang Lin, Guanbin Li:
VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction. 26388-26398 - Xuanpu Zhang, Dan Song, Pengxin Zhan, Tianyu Chang, Jianhao Zeng, Qingguo Chen, Weihua Luo, An-An Liu:
BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training. 26399-26408 - Daisheng Jin, Jiangbei Hu, Baixin Xu, Yuxin Dai, Chen Qian, Ying He:
SFDM: Robust Decomposition of Geometry and Reflectance for Realistic Face Rendering from Sparse-view Images. 26409-26419 - Wenjun Wei, Yanlin Qian, Huaian Chen, Junkang Dai, Yi Jin:
Integral Fast Fourier Color Constancy. 26420-26429 - Hao Zhao, Mingjia Li, Qiming Hu, Xiaojie Guo:
Reversible Decoupling Network for Single Image Reflection Removal. 26430-26439 - Shouhang Zhu, Chenglin Li, Yuankun Jiang, Li Wei, Nuowen Kan, Ziyang Zheng, Wenrui Dai, Junni Zou, Hongkai Xiong:
Stabilizing and Accelerating Autofocus with Expert Trajectory Regularized Deep Reinforcement Learning. 26440-26450 - Jiayin Zhao, Zhenqi Fu, Tao Yu, Hui Qiao:
V2V3D: View-to-View Denoised 3D Reconstruction for Light Field Microscopy. 26451-26461 - Liao Shen, Tianqi Liu, Huiqiang Sun, Jiaqi Li, Zhiguo Cao, Wei Li, Chen Change Loy:
DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting. 26462-26471 - Ziteng Cui, Xuangeng Chu, Tatsuya Harada:
Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment. 26472-26482 - Youjia Zhang, Anpei Chen, Yumin Wan, Zikai Song, Junqing Yu, Yawei Luo, Wei Yang:
Ref-GS: Directional Factorization for 2D Gaussian Splatting. 26483-26492 - Chenhao Li, Taishi Ono, Takeshi Uemori, Sho Nitta, Hajime Mihara, Alexander Gatto, Hajime Nagahara, Yusuke Moriuchi:
NeISF++: Neural Incident Stokes Field for Polarized Inverse Rendering of Conductors and Dielectrics. 26493-26503 - Zhifei Chen, Tianshuo Xu, Wenhang Ge, Leyi Wu, Dongyu Yan, Jing He, Luozhou Wang, Lu Zeng, Shunsi Zhang, Ying-Cong Chen:
Uni-Renderer: Unifying Rendering and Inverse Rendering Via Dual Stream Diffusion. 26504-26513 - Zexin He, Tengfei Wang, Xin Huang, Xingang Pan, Ziwei Liu:
Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion. 26514-26524 - Jiahui Fan, Fujun Luan, Jian Yang, Milos Hasan, Beibei Wang:
RNG: Relightable Neural Gaussians. 26525-26534 - Bruno Galerne, Jianling Wang, Lara Raad, Jean-Michel Morel:
SGSST: Scaling Gaussian Splatting Style Transfer. 26535-26544 - Chuhao Chen, Zhiyang Dou, Chen Wang, Yiming Huang, Anjun Chen, Qiao Feng, Jiatao Gu, Lingjie Liu:
Vid2Sim: Generalizable, Video-based Reconstruction of Appearance, Geometry and Physics for Mesh-free Simulation. 26545-26555 - Xin Huang, Tengfei Wang, Ziwei Liu, Qing Wang:
Material Anything: Generating Materials for Any 3D Object via Diffusion. 26556-26565 - Jialun Liu, Jinbo Wu, Xiaobo Gao, Jiakui Hu, Bojun Xiong, Xing Liu, Chen Zhao, Hongbin Pei, Haocheng Feng, Yingying Li, Errui Ding, Jingdong Wang:
TexGarment: Consistent Garment UV Texture Generation via Efficient 3D Structure-Guided Diffusion Transformer. 26566-26575 - Zhaoxi Chen, Jiaxiang Tang, Yuhao Dong, Ziang Cao, Fangzhou Hong, Yushi Lan, Tengfei Wang, Haozhe Xie, Tong Wu, Shunsuke Saito, Liang Pan, Dahua Lin, Ziwei Liu:
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion. 26576-26586 - Hao Guo, Xiaoshui Huang, Jiacheng Hao, Yunpeng Bai, Hongping Gan, Yilei Shi:
BrepGiff: Lightweight Generation of Complex B-rep with 3D GAT Diffusion. 26587-26596 - Xinyu Gao, Ziyi Yang, Bingchen Gong, Xiaoguang Han, Sipeng Yang, Xiaogang Jin:
Towards Realistic Example-based Modeling via 3D Gaussian Stitching. 26597-26607 - Stefan Lionar, Jiabin Liang, Gim Hee Lee:
TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing. 26608-26617 - Yuezhi Yang, Qimin Chen, Vladimir G. Kim, Siddhartha Chaudhuri, Qixing Huang, Zhiqin Chen:
GenVDM: Generating Vector Displacement Maps From a Single Image. 26618-26629 - Kai He, Chin-Hsuan Wu, Igor Gilitschenski:
CTRL-D: Controllable Dynamic 3D Scene Editing with Personalized 2D Diffusion. 26630-26640 - Jiamin Wu, Kenkun Liu, Han Gao, Xiaoke Jiang, Yuan Yao, Lei Zhang:
LeanGaussian: Breaking Pixel or Point Cloud Correspondence in Modeling 3D Gaussians. 26641-26651 - Guofeng Feng, Siyan Chen, Rong Fu, Zimu Liao, Yi Wang, Tao Liu, Boni Hu, Linning Xu, Zhilin Pei, Hengjie Li, Xiuhong Li, Ninghui Sun, Xingcheng Zhang, Bo Dai:
FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering. 26652-26662 - Peihao Wang, Yuehao Wang, Dilin Wang, Sreyas Mohan, Zhiwen Fan, Lemeng Wu, Ruisi Cai, Yu-Ying Yeh, Zhangyang Wang, Qiang Liu, Rakesh Ranjan:
Steepest Descent Density Control for Compact 3D Gaussian Splatting. 26663-26672 - Yangming Zhang, Wenqi Jia, Wei Niu, Miao Yin:
GaussianSpa: An "Optimizing-Sparsifying" Simplification Framework for Compact and High-Quality 3D Gaussian Splatting. 26673-26682 - Seungtae Nam, Xiangyu Sun, Gyeongjin Kang, Younggeun Lee, Seungjun Oh, Eunbyung Park:
Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction. 26683-26693 - Zhihao Shi, Dong Huo, Yuhongze Zhou, Yan Min, Juwei Lu, Xinxin Zuo:
IMFine: 3D Inpainting via Geometry-guided Multi-view Refinement. 26694-26703 - Sheng-Yu Huang, Zi-Ting Chou, Yu-Chiang Frank Wang:
3D Gaussian Inpainting with Depth-Guided Cross-View Consistency. 26704-26713 - Xinpeng Liu, Zeyi Huang, Fumio Okura, Yasuyuki Matsushita:
HoGS: Unified Near and Far Object Reconstruction via Homogeneous Gaussian Splatting. 26714-26722 - Zilong Huang, Jun He, Junyan Ye, Lihan Jiang, Weijia Li, Yiping Chen, Ting Han:
Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration. 26723-26733 - Xiaoqian Ruan, Pei Yu, Dian Jia, Hyeonjeong Park, Peixi Xiong, Wei Tang:
Learning Partonomic 3D Reconstruction from Image Collections. 26734-26744 - Hanyang Kong, Xingyi Yang, Xinchao Wang:
Generative Sparse-View Gaussian Splatting. 26745-26755 - Noam Elata, Bahjat Kawar, Yaron Ostrovsky-Berman, Miriam Farber, Ron Sokolovsky:
Novel View Synthesis with Pixel-Space Diffusion Models. 26756-26766 - Ruijie Lu, Yixin Chen, Junfeng Ni, Baoxiong Jia, Yu Liu, Diwen Wan, Gang Zeng, Siyuan Huang:
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes. 26767-26778 - Youngkyoon Jang, Eduardo Pérez-Pellitero:
CoMapGS: Covisibility Map-based Gaussian Splatting for Sparse Novel View Synthesis. 26779-26788 - Lihan Jiang, Kerui Ren, Mulin Yu, Linning Xu, Junting Dong, Tao Lu, Feng Zhao, Dahua Lin, Bo Dai:
Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes. 26789-26799 - Yulong Zheng, Zicheng Jiang, Shengfeng He, Yandu Sun, Junyu Dong, Huaidong Zhang, Yong Du:
NexusGS: Sparse View Synthesis with Epipolar Depth Priors in 3D Gaussian Splatting. 26800-26809 - Yutao Tang, Yuxiang Guo, Deming Li, Cheng Peng:
SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction. 26810-26821 - Shangjin Zhai, Zhichao Ye, Jialin Liu, Weijian Xie, Jiaqi Hu, Zhen Peng, Hua Xue, Danpeng Chen, Xiaomeng Wang, Lei Yang, Nan Wang, Haomin Liu, Guofeng Zhang:
StarGen: A Spatiotemporal Autoregression Framework with Video Diffusion Model for Scalable and Controllable Scene Generation. 26822-26833 - Mingzhi Pei, Xu Cao, Xiangyi Wang, Heng Guo, Zhanyu Ma:
PMNI: Pose-free Multi-view Normal Integration for Reflective and Textureless Surface Reconstruction. 26834-26843 - Bingbing Hu, Yanyan Li, Rui Xie, Bo Xu, Haoye Dong, Junfeng Yao, Gim Hee Lee:
Learnable Infinite Taylor Gaussian for Dynamic View Rendering. 26844-26854 - JooHyun Kwon, Hanbyel Cho, Junmo Kim:
Efficient Dynamic Scene Editing via 4D Gaussian-based Static-Dynamic Separation. 26855-26865 - Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello, Jaeho Moon, Jihyong Oh, Munchurl Kim:
SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video. 26866-26875 - Toshiya Yura, Ashkan Mirzaei, Igor Gilitschenski:
EventSplat: 3D Gaussian Splatting from Moving Event Cameras for Real-time Rendering. 26876-26886 - Jianping Jiang, Weiye Xiao, Zhengyu Lin, Huaizhong Zhang, Tianxiang Ren, Yang Gao, Zhiqian Lin, Zhongang Cai, Lei Yang, Ziwei Liu:
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters. 26887-26898 - Aleksei Zhuravlev, Zorah Lähner, Vladislav Golyanik:
Denoising Functional Maps: Diffusion Models for Shape Correspondence. 26899-26909 - Ziyuan Qu, Zihao Zou, Vivek Boominathan, Praneeth Chakravarthula, Adithya Pediredla:
Event Fields: Capturing Light Fields at High Speed, Resolution, and Dynamic Range. 26910-26920 - Hidenobu Matsuki, Gwangbin Bae, Andrew J. Davison:
4DTAM: Non-Rigid Tracking and Mapping via Dynamic Surface Gaussians. 26921-26932 - Jian Huang, Chengrui Dong, Xuanhua Chen, Peidong Liu:
IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera. 26933-26942 - Zhiqiang Yan, Zhengxue Wang, Kun Wang, Jun Li, Jian Yang:
Completion as Enhancement: A Degradation-Aware Selective Image Guided Network for Depth Completion. 26943-26953 - Nikhil Behari, Aaron Young, Siddharth Somasundaram, Tzofi Klinghoffer, Akshat Dave, Ramesh Raskar:
Blurred LiDAR for Sharper 3D: Robust Handheld 3D Scanning with Diffuse LiDAR and RGB. 26954-26964 - Junjie Luo, John Mamish, Alan Fu, Thomas Concannon, Josiah D. Hester, Emma Alexander, Qi Guo:
Focal Split: Untethered Snapshot Depth from Differential Defocus. 26965-26974 - Mehdi Zayene, Jannik Endres, Albias Havolli, Charles Corbière, Salim Cherkaoui, Alexandre Kontouli, Alexandre Alahi:
HELVIPAD: A Real-World Dataset for Omnidirectional Stereo Depth Estimation. 26975-26984 - Pratheba Selvaraju, Victoria Fernández Abrevaya, Timo Bolkart, Rick Akkerman, Tianyu Ding, Faezeh Amjadi, Ilya Zharkov:
OFER: Occluded Face Expression Reconstruction. 26985-26995 - Yuliang Guo, Sparsh Garg, S. Mahdi H. Miangoleh, Xinyu Huang, Liu Ren:
Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera. 26996-27006 - Marvin Anas Hahn, Kathlén Kohn, Orlando Marigliano, Tomás Pajdla:
Order-One Rolling Shutter Cameras. 27007-27016 - Daniel Safari:
Matrix-Free Shared Intrinsics Bundle Adjustment. 27017-27026 - Jiachen Liu, Rui Yu, Sili Chen, Sharon X. Huang, Hengkai Guo:
Towards In-the-wild 3D Plane Reconstruction from a Single Image. 27027-27037 - Pengju Sun, Banglei Guan, Zhenbao Yu, Yang Shang, Qifeng Yu, Daniel Barath:
Learning Affine Correspondences by Integrating Geometric Constraints. 27038-27048 - Jianping Wu:
DiskVPS: Vanishing Point Detector via Hough Transform in a Disk Region. 27049-27058 - Zhiwei Huang, Hailin Yu, Yichun Shentu, Jin Yuan, Guofeng Zhang:
From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting. 27059-27069 - Thibaut Loiseau, Guillaume Bourmaud:
RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges. 27070-27080 - Fei Xue, Sven Elflein, Laura Leal-Taixé, Qunjie Zhou:
MATCHA: Towards Matching Anything. 27081-27091 - Junwei Zheng, Ruiping Liu, Yufan Chen, Zhenfang Chen, Kailun Yang, Jiaming Zhang, Rainer Stiefelhagen:
Scene-agnostic Pose Regression for Visual Localization. 27092-27102 - Xinyue Zhang, Zijia Dai, Wanting Xu, Laurent Kneip:
Simulator HC: Regression-based Online Simulation of Starting Problem-Solution Pairs for Homotopy Continuation in Geometric Vision. 27103-27112 - Shujuan Li, Yu-Shen Liu, Zhizhong Han:
GaussianUDF: Inferring Unsigned Distance Functions through 3D Gaussian Splatting. 27113-27123 - Miroslav Purkrábek, Jiri Matas:
ProbPose: A Probabilistic Approach to 2D Human Pose Estimation. 27124-27133 - Yunze Man, Yichen Sheng, Jianming Zhang, Liang-Yan Gui, Yu-Xiong Wang:
Floating No More: Object-Ground Reconstruction from a Single Image. 27134-27143 - Junfu Guo, Yu Xin, Gaoyi Liu, Kai Xu, Ligang Liu, Ruizhen Hu:
ArticulatedGS: Self-supervised Digital Twin Modeling of Articulated Objects using 3D Gaussian Splatting. 27144-27153 - Weihang Li, Hongli Xu, Junwen Huang, Hyunjun Jung, Peter KT Yu, Nassir Navab, Benjamin Busam:
GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation. 27154-27165 - Yuanbo Xiangli, Ruojin Cai, Hanyu Chen, Jeffrey Byrne, Noah Snavely:
Doppelgangers++: Improved Visual Disambiguation with Geometric 3D Features. 27166-27175 - Mengjie Xu, Yitao Zhu, Haotian Jiang, Jiaming Li, Zhenrong Shen, Sheng Wang, Haolin Huang, Xinyu Wang, Han Zhang, Qing Yang, Qian Wang:
MITracker: Multi-View Integration for Visual Object Tracking. 27176-27185 - Friedhelm Hamann, Daniel Gehrig, Filbert Febryanto, Kostas Daniilidis, Guillermo Gallego:
ETAP: Event-based Tracking of Any Point. 27186-27196 - Hoonhee Cho, Jae-Young Kang, Youngho Kim, Kuk-Jin Yoon:
Ev-3DOD: Pushing the Temporal Boundaries of 3D Object Detection with Event Cameras. 27197-27210 - Zechuan Li, Hongshan Yu, Yihao Ding, Jinhao Qiao, Basim Azam, Naveed Akhtar:
GO-N3RDet: Geometry Optimized NeRF-enhanced 3D Object Detector. 27211-27221 - Shin-Fang Ch'ng, Hemanth Saratchandran, Simon Lucey:
Preconditioners for the Stochastic Training of Neural Fields. 27222-27232 - Chenhui Shi, Fulin Tang, Ning An, Yihong Wu:
3D-SLNR: A Super Lightweight Neural Representation for Large-scale 3D Mapping. 27233-27242 - Guangshun Wei, Yuan Feng, Long Ma, Chen Wang, Yuanfeng Zhou, Changjian Li:
PCDreamer: Point Cloud Completion Through Multi-view Diffusion Priors. 27243-27253 - Zikuan Li, Honghua Chen, Yuecheng Wang, Sibo Wu, Mingqiang Wei, Jun Wang:
STAR-Edge: Structure-aware Local Spherical Curve Representation for Thin-walled Edge Extraction from Unstructured Point Clouds. 27254-27263 - Zhangquan Chen, Puhua Jiang, Ruqi Huang:
DV-Matcher: Deformation-based Non-rigid Point Cloud Matching Guided by Pre-trained Visual Features. 27264-27274 - Ruiqi Zhang, Hao Zhu, Jingyi Zhao, Qi Zhang, Xun Cao, Zhan Ma:
Mitigating Ambiguities in 3D Classification with Gaussian Splatting. 27275-27284 - Changfeng Ma, Ran Bi, Jie Guo, Chongjun Wang, Yanwen Guo:
Sparse Point Cloud Patches Rendering via Splitting 2D Gaussians. 27285-27294 - Jinfeng Xu, Xianzhi Li, Yuan Tang, Xu Han, Qiao Yu, Yixue Hao, Long Hu, Min Chen:
SASep: Saliency-Aware Structured Separation of Geometry and Feature for Open Set Learning on Point Clouds. 27295-27304 - Xinjie Wang, Yifan Zhang, Ting Liu, Xinpu Liu, Ke Xu, Jianwei Wan, Yulan Guo, Hanyun Wang:
TopNet: Transformer-Efficient Occupancy Prediction Network for Octree-Structured Point Cloud Geometry Compression. 27305-27314 - Qiang Li, Jian Ruan, Fanghao Wu, Yuchi Chen, Zhihua Wei, Wen Shen:
A Unified Approach to Interpreting Self-supervised Pre-training Methods for 3D Point Clouds via Interactions. 27315-27324 - Wentao Qu, Jing Wang, Yongshun Gong, Xiaoshui Huang, Liang Xiao:
An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models. 27325-27335 - Sifan Zhou, Zhihang Yuan, Dawei Yang, Xing Hu, Jian Qian, Ziyu Zhao:
PillarHist: A Quantization-aware Pillar Feature Encoder based on Height-aware Histogram. 27336-27345 - Yante Li, Hanwen Qi, Haoyu Chen, Xinlian Liang, Guoying Zhao:
Deep Change Monitoring: A Hyperbolic Representative Learning Framework and a Dataset for Long-term Fine-grained Tree Change Detection. 27346-27356 - Dusan Malic, Christian Fruhwirth-Reisinger, Samuel Schulter, Horst Possegger:
GBlobs: Explicit Local Structure via Gaussian Blobs for Improved Cross-Domain LiDAR-based 3D Object Detection. 27357-27367 - Xiang Xu, Lingdong Kong, Hui Shuai, Liang Pan, Ziwei Liu, Qingshan Liu:
LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes. 27368-27379 - Chuandong Liu, Xingxing Weng, Shuguo Jiang, Pengcheng Li, Lei Yu, Gui-Song Xia:
Exploring Scene Affinity for Semi-Supervised LiDAR Semantic Segmentation. 27380-27389 - Xun Huang, Jinlong Wang, Qiming Xia, Siheng Chen, Bisheng Yang, Xin Li, Cheng Wang, Chenglu Wen:
V2X-R: Cooperative LiDAR-4D Radar Fusion with Denoising Diffusion for 3D Object Detection. 27390-27400 - Jinhyung Park, Navyata Sanghvi, Hiroki Adachi, Yoshihisa Shibata, Shawn Hunt, Shinya Tanaka, Hironobu Fujiyoshi, Kris Kitani:
Leveraging Temporal Cues for Semi-Supervised Multi-View 3D Object Detection. 27401-27412 - Ziteng Xue, Mingzhe Guo, Heng Fan, Shihui Zhang, Zhipeng Zhang:
CorrBEV: Multi-View 3D Object Detection by Correlation Learning with Multi-modal Prototypes. 27413-27423 - Hermann Blum, Alessandro Mercurio, Joshua O'Reilly, Tim Engelbracht, Mihai Dusmanu, Marc Pollefeys, Zuria Bauer:
CroCoDL: Cross-device Collaborative Dataset for Localization. 27424-27434 - Tomás Soucek, Prajwal Gatti, Michael Wray, Ivan Laptev, Dima Damen, Josef Sivic:
ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions. 27435-27445 - Haisheng Su, Feixiang Song, Cong Ma, Wei Wu, Junchi Yan:
RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments. 27446-27455 - Christopher Diehl, Quinlan Sykora, Ben Agro, Thomas Gilles, Sergio Casas, Raquel Urtasun:
DIO: Decomposable Implicit 4D Occupancy-Flow World Model. 27456-27466 - Jonas Kälble, Sascha Wirges, Maxim Tatarchenko, Eddy Ilg:
EvOcc: Accurate Semantic Occupancy for Automated Driving Using Evidence Theory. 27467-27476 - Yuanhui Huang, Amonnut Thammatadatrakoon, Wenzhao Zheng, Yunpeng Zhang, Dalong Du, Jiwen Lu:
GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction. 27477-27486 - Su Sun, Cheng Zhao, Zhuoyang Sun, Yingjie Victor Chen, Mei Chen:
SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving. 27487-27496 - Hongbin Lin, Zilu Guo, Yifan Zhang, Shuaicheng Niu, Yafeng Li, Ruimao Zhang, Shuguang Cui, Zhen Li:
DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation. 27497-27507 - Halil Ibrahim Öztürk, Muhammet Esat Kalfaoglu, Ozsel Kilinc:
GLane3D: Detecting Lanes with Graph of 3D Keypoints. 27508-27518 - Yichong Lu, Yichi Cai, Shangzhan Zhang, Hongyu Zhou, Haoji Hu, Huimin Yu, Andreas Geiger, Yiyi Liao:
UrbanCAD: Towards Highly Controllable and Photorealistic 3D Vehicles for Urban Scene Simulation. 27519-27530 - Tianyi Yan, Dongming Wu, Wencheng Han, Junpeng Jiang, Xia Zhou, Kun Zhan, Cheng-Zhong Xu, Jianbing Shen:
DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation. 27531-27541 - Haohong Lin, Xin Huang, Tung Phan, David S. Hayden, Huan Zhang, Ding Zhao, Siddhartha S. Srinivasa, Eric M. Wolff, Hongge Chen:
Causal Composition Diffusion Model for Closed-loop Traffic Generation. 27542-27552 - Wayne Wu, Honglin He, Chaoyuan Zhang, Jack He, Seth Z. Zhao, Ran Gong, Quanyi Li, Bolei Zhou:
Towards Autonomous Micromobility through Scalable Urban Simulation. 27553-27563 - Kaouther Messaoud, Matthieu Cord, Alexandre Alahi:
Towards Generalizable Trajectory Prediction using Dual-Level Representation Learning and Adaptive Prompting. 27564-27574 - Deepti Hegde, Rajeev Yasarla, Hong Cai, Shizhong Han, Apratim Bhattacharyya, Shweta Mahajan, Litian Liu, Risheek Garrepalli, Vishal M. Patel, Fatih Porikli:
Distilling Multi-modal Large Language Models for Autonomous Driving. 27575-27585 - Mingfei Han, Liang Ma, Kamila Zhumakhanova, Ekaterina Radionova, Jingyi Zhang, Xiaojun Chang, Xiaodan Liang, Ivan Laptev:
RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation. 27586-27596 - Nedko Savov, Naser Kazemi, Mohammad Mahdi, Danda Pani Paudel, Xi Wang, Luc Van Gool:
Exploration-Driven Generative Interactive Environments. 27597-27607 - Chenjie Hao, Weyl Lu, Yifan Xu, Yubei Chen:
Neural Motion Simulator Pushing the Limit of World Models in Reinforcement Learning. 27608-27617 - Yuxuan Wang, Aming Wu, Muli Yang, Yukuan Min, Yihang Zhu, Cheng Deng:
Reasoning Mamba: Hypergraph-Guided Region Relation Calculating for Weakly Supervised Affordance Grounding. 27618-27627 - Jiong Lin, Lechen Zhang, Kwansoo Lee, Jialong Ning, Judah Goldfeder, Hod Lipson:
AutoURDF: Unsupervised Robot Modeling from Point Cloud Frames Using Cluster Registration. 27628-27637 - Xiaoqi Li, Jingyun Xu, Mingxu Zhang, Jiaming Liu, Yan Shen, Iaroslav Ponomarenko, Jiahui Xu, Liang Heng, Siyuan Huang, Shanghang Zhang, Hao Dong:
Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation. 27638-27648 - Yao Mu, Tianxing Chen, Zanxin Chen, Shijia Peng, Zhiqian Lan, Zeyu Gao, Zhixuan Liang, Qiaojun Yu, Yude Zou, Mingkun Xu, Lunkai Lin, Zhiqiang Xie, Mingyu Ding, Ping Luo:
RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins. 27649-27660 - Hanzhi Chen, Boyang Sun, Anran Zhang, Marc Pollefeys, Stefan Leutenegger:
VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation. 27661-27672 - Yitang Li, Mingxian Lin, Zhuo Lin, Yipeng Deng, Yue Cao, Li Yi:
Learning Physics-Based Full-Body Human Reaching and Grasping from Brief Walking References. 27673-27682 - Hongxiang Zhao, Xingchen Liu, Mutian Xu, Yiming Hao, Weikai Chen, Xiaoguang Han:
TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation. 27683-27693 - Wanyue Zhang, Rishabh Dabral, Vladislav Golyanik, Vasileios Choutas, Eduardo Alvarado, Thabo Beeler, Marc Habermann, Christian Theobalt:
BimArt: A Unified Approach for the Synthesis of 3D Bimanual Interaction with Articulated Objects. 27694-27705 - Zhenrong Wang, Qi Zheng, Sihan Ma, Maosheng Ye, Yibing Zhan, Dongjiang Li:
End-to-End HOI Reconstruction Transformer with Graph-based Encoding. 27706-27715 - Zhengdi Yu, Stefanos Zafeiriou, Tolga Birdal:
Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera. 27716-27726 - Yiming Zhao, Taein Kwon, Paul Streli, Marc Pollefeys, Christian Holz:
EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision. 27727-27738 - Ziyu Wu, Yufan Xiong, Mengting Niu, Fangting Xie, Quan Wan, Qijun Ying, Boyan Liu, Xiaohui Cai:
PI-HMR: Towards Robust In-bed Temporal Human Shape Reconstruction with Contact Pressure Sensing. 27739-27749 - Jaeho Choi, Soheil Hor, Shubo Yang, Amin Arbabian:
MVDoppler-Pose: Multi-Modal Multi-View mmWave Sensing for Long-Distance Self-Occluded Human Walking Pose Estimation. 27750-27759 - Shenghao Ren, Yi Lu, Jiayi Huang, Jiayi Zhao, He Zhang, Tao Yu, Qiu Shen, Xun Cao:
MotionPRO: Exploring the Role of Pressure in Human MoCap and Beyond. 27760-27770 - Yinghao Wu, Shihui Guo, Yipeng Qin:
MODA: Motion-Drift Augmentation for Inertial Human Motion Analysis. 27771-27781 - Xinpeng Liu, Junxuan Liang, Chenshuo Zhang, Zixuan Cai, Cewu Lu, Yong-Lu Li:
Homogeneous Dynamics Space for Heterogeneous Humans. 27782-27793 - Wei-Jin Huang, Yuan-Ming Li, Zhi-Wei Xia, Yu-Ming Tang, Kun-Yu Lin, Jian-Fang Hu, Wei-Shi Zheng:
Modeling Multiple Normal Action Representations for Error Detection in Procedural Tasks. 27794-27804 - Yiheng Li, Ruibing Hou, Hong Chang, Shiguang Shan, Xilin Chen:
UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing. 27805-27815 - Jiaqi Chen, Xiaoye Zhu, Yue Wang, Tianyang Liu, Xinhui Chen, Ying Chen, Chak Tou Leong, Yifei Ke, Joseph Liu, Yiwen Yuan, Julian J. McAuley, Li-jia Li:
Symbolic Representation for Any-to-Any Generative Tasks. 27816-27826 - Zhengyuan Li, Kai Cheng, Anindita Ghosh, Uttaran Bhattacharya, Liangyan Gui, Aniket Bera:
SimMotionEdit: Text-Based Human Motion Editing with Motion Similarity Prediction. 27827-27837 - Kwan Yun, Seokhyeon Hong, Chaelin Kim, Junyong Noh:
AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models. 27838-27848 - Bizhu Wu, Jinheng Xie, Keming Shen, Zhe Kong, Jianfeng Ren, Ruibin Bai, Rong Qu, Linlin Shen:
MG-MotionLLM: A Unified Framework for Motion Comprehension and Generation across Multiple Granularities. 27849-27858 - Zichong Meng, Yiming Xie, Xiaogang Peng, Zeyu Han, Huaizu Jiang:
Rethinking Diffusion for Text-Driven Human Motion Generation: Redundant Representations, Evaluation, and Masked Autoregression. 27859-27871 - Shunlin Lu, Jingbo Wang, Zeyu Lu, Ling-Hao Chen, Wenxun Dai, Junting Dong, Zhiyang Dou, Bo Dai, Ruimao Zhang:
ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model. 27872-27882 - Ruopeng Gao, Ji Qi, Limin Wang:
Multiple Object Tracking as ID Prediction. 27883-27893 - Libo Long, Xiao Hu, Jochen Lang:
Shape and Texture: What Influences Reliable Optical Flow Estimation? 27894-27903 - Hanyu Zhou, Haonan Wang, Haoyue Liu, Yuxing Duan, Yi Chang, Luxin Yan:
Bridge Frame and Event: Common Spatiotemporal Fusion for High-Dynamic Scene Optical Flow. 27904-27913 - Qiyao Gao, Peiqi Duan, Hanyue Lou, Minggui Teng, Ziqi Cai, Xu Chen, Boxin Shi:
Unified Reconstruction of Static and Dynamic Scenes from Events. 27914-27923 - Alejandro Castañeda Garcia, Jan Warchocki, Jan van Gemert, Daan Brinks, Nergis Tomen:
Learning Physics From Video: Unsupervised Physical Parameter Estimation for Continuous Dynamical Systems. 27924-27933 - Gene Chou, Kai Zhang, Sai Bi, Hao Tan, Zexiang Xu, Fujun Luan, Bharath Hariharan, Noah Snavely:
Generating 3D-Consistent Videos from Unposed Internet Photos. 27934-27945 - Guojun Lei, Chi Wang, Rong Zhang, Yikai Wang, Hong Li, Weiwei Xu:
AnimateAnything: Consistent and Controllable Animation for Video Generation. 27946-27956 - Zhongwei Zhang, Fuchen Long, Zhaofan Qiu, Yingwei Pan, Wu Liu, Ting Yao, Tao Mei:
MotionPro: A Precise Motion Controller for Image-to-Video Generation. 27957-27967 - Tianyi Zhu, Dongwei Ren, Qilong Wang, Xiaohe Wu, Wangmeng Zuo:
Generative Inbetweening through Frame-wise Conditions-Driven Video Generation. 27968-27978 - Jiangtong Tan, Hu Yu, Jie Huang, Jie Xiao, Feng Zhao:
FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis. 27979-27988 - Qingtao Yu, Jaskirat Singh, Zhaoyuan Yang, Peter Henry Tu, Jing Zhang, Hongdong Li, Richard I. Hartley, Dylan Campbell:
Probability Density Geodesics in Image Diffusion Latent Space. 27989-27998 - Alper Kayabasi, Anil Kumar Vadathya, Guha Balakrishnan, Vishwanath Saragadam:
Bias for Action: Video Implicit Neural Representations with Bias Modulation. 27999-28008 - Eunjin Kim, Hyeonjin Kim, Kyong Hwan Jin, Jaejun Yoo:
BF-STVSR: B-Splines and Fourier - Best Friends for High Fidelity Spatial-Temporal Video Super-Resolution. 28009-28018 - Chun Zhang, Heming Sun, Jiro Katto:
FLAVC: Learned Video Compression with Feature Level Attention. 28019-28028 - Lei Ke, Haohang Xu, Xuefei Ning, Yu Li, Jiajun Li, Haoling Li, Yuxuan Lin, Dongsheng Jiang, Yujiu Yang, Linfeng Zhang:
ProReflow: Progressive Reflow with Decomposed Velocity. 28029-28038 - Yudong Mao, Hao Luo, Zhiwei Zhong, Peilin Chen, Zhijiang Zhang, Shiqi Wang:
Making Old Film Great Again: Degradation-aware State Space Model for Old Film Restoration. 28039-28049 - Rohit Kundu, Hao Xiong, Vishal Mohanty, Athula Balachandran, Amit K. Roy-Chowdhury:
Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content. 28050-28060 - Duosheng Chen, Shihao Zhou, Jinshan Pan, Jinglei Shi, Lishen Qu, Jufeng Yang:
A Polarization-Aided Transformer for Image Deblurring via Motion Vector Decomposition. 28061-28070 - Siwei Tu, Ben Fei, Weidong Yang, Fenghua Ling, Hao Chen, Zili Liu, Kun Chen, Hang Fan, Wanli Ouyang, Lei Bai:
Satellite Observations Guided Diffusion Model for Accurate Meteorological States at Arbitrary Resolution. 28071-28080 - Zhuoran Du, Shaodi You, Cheng Cheng, Shikui Wei:
Automatic Spectral Calibration of Hyperspectral Images: Method, Dataset and Benchmark. 28081-28090 - Dabing Yu, Zheng Gao:
VolFormer: Explore More Comprehensive Cube Interaction for Hyperspectral Image Restoration and Beyond. 28091-28101 - Chunyang Cheng, Tianyang Xu, Zhenhua Feng, Xiaojun Wu, Zhangyong Tang, Hui Li, Zeyang Zhang, Sara Atito Ali, Muhammad Awais, Josef Kittler:
One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion. 28102-28112 - Xin Lu, Jie Xiao, Yurui Zhu, Xueyang Fu:
Continuous Adverse Weather Removal via Degradation-Aware Distillation. 28113-28123 - Hang Guo, Yong Guo, Yaohua Zha, Yulun Zhang, Wenbo Li, Tao Dai, Shu-Tao Xia, Yawei Li:
MambaIRv2: Attentive State Space Restoration. 28124-28133 - Kun Zhou, Xinyu Lin, Jiangbo Lu:
TSP-Mamba: The Travelling Salesman Problem Meets Mamba for Image Super-resolution and Beyond. 28134-28143 - Sudarshan Rajagopalan, Nithin Gopalakrishnan Nair, Jay N. Paranjape, Vishal M. Patel:
GenDeg: Diffusion-based Degradation Synthesis for Generalizable All-In-One Image Restoration. 28144-28154 - Brayan Monroy, Jorge Bacca, Julián Tachella:
Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise. 28155-28164 - Xiangpeng Tian, Xiangyu Liao, Xiao Liu, Meng Li, Chao Ren:
Degradation-Aware Feature Perturbation for All-in-One Image Restoration. 28165-28175 - Guanglu Dong, Xiangyu Liao, Mingyang Li, Guihuan Guo, Chao Ren:
Exploring Semantic Feature Discrimination for Perceptual Image Super-Resolution and Opinion-Unaware No-Reference Image Quality Assessment. 28176-28187 - Junyang Chen, Jinshan Pan, Jiangxin Dong:
FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution. 28188-28197 - Zhu Liu, Zijun Wang, Jinyuan Liu, Fanqi Meng, Long Ma, Risheng Liu:
DEAL: Data-Efficient Adversarial Learning for High-Quality Infrared Imaging. 28198-28207 - Bin Chen, Gehui Li, Rongyuan Wu, Xindong Zhang, Jie Chen, Jian Zhang, Lei Zhang:
Adversarial Diffusion Compression for Real-World Image Super-Resolution. 28208-28220 - Xiaoling Zhou, Zhemg Lee, Wei Ye, Rui Xie, Wenbo Zhang, Guanju Peng, Zongze Li, Shikun Zhang:
All-Optical Nonlinear Diffractive Deep Network for Ultrafast Image Denoising. 28221-28231 - Bohan Xiao, Peiyong Wang, Qisheng He, Ming Dong:
Deterministic Image-to-Image Translation via Denoising Brownian Bridge Models with Dual Approximators. 28232-28241 - Jiawan Li, Fei Zhou, Zhipeng Zhong, Jiongzhi Lin, Guoping Qiu:
Towards Smart Point-and-Shoot Photography. 28242-28251 - Tianyu Wang, Jianming Zhang, Haitian Zheng, Zhihong Ding, Scott Cohen, Zhe Lin, Wei Xiong, Chi-Wing Fu, Luis Figueroa, Soo Ye Kim:
MetaShadow: Object-Centered Shadow Detection, Removal, and Synthesis. 28252-28262 - Jing Wu, Trung Le, Munawar Hayat, Mehrtash Harandi:
Erasing Undesirable Influence in Diffusion Models. 28263-28273 - Yixing Zhu, Qing Zhang, Yitong Wang, Yongwei Nie, Wei-Shi Zheng:
EntityErasure: Erasing Entity Cleanly via Amodal Entity Segmentation and Completion. 28274-28283 - Ji Woo Hong, Tri Ton, Trung X. Pham, Gwanhyeong Koo, Sunjae Yoon, Chang D. Yoo:
ITA-MDT: Image-Timestep-Adaptive Masked Diffusion Transformer Framework for Image-Based Virtual Try-On. 28284-28294 - Matheus Souza, Yidan Zheng, Kaizhang Kang, Yogeshwar Nath Mishra, Qiang Fu, Wolfgang Heidrich:
Latent Space Imaging. 28295-28305 - Lei Chen, Yuan Meng, Chen Tang, Xinzhu Ma, Jingyan Jiang, Xin Wang, Zhi Wang, Wenwu Zhu:
Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers. 28306-28315 - Sotiris Anagnostidis, Gregor Bachmann, Yeongmin Kim, Jonas Kohler, Markos Georgopoulos, Artsiom Sanakoyeu, Yuming Du, Albert Pumarola, Ali K. Thabet, Edgar Schönfeld:
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute. 28316-28326 - Vishal Purohit, Matthew Repasky, Jianfeng Lu, Qiang Qiu, Yao Xie, Xiuyuan Cheng:
Consistency Posterior Sampling for Diverse Image Synthesis. 28327-28336 - Wenxin Su, Song Tang, Xiaofeng Liu, Xiaojing Yi, Mao Ye, Chunxiao Zu, Jiahao Li, Xiatian Zhu:
Domain Adaptive Diabetic Retinopathy Grading with Model Absence and Flowing Data. 28337-28346 - Johannes Schusterbauer, Ming Gui, Frank Fundel, Björn Ommer:
Diff2Flow: Training Flow Matching Models via Diffusion Model Alignment. 28347-28357 - Hao Chen, Ze Wang, Xiang Li, Ximeng Sun, Fangyi Chen, Jiang Liu, Jindong Wang, Bhiksha Raj, Zicheng Liu, Emad Barsoum:
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer. 28358-28370 - Yongwei Chen, Yushi Lan, Shangchen Zhou, Tengfei Wang, Xingang Pan:
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE. 28371-28382 - Aneeshan Sain, Subhajit Maity, Pinaki Nath Chowdhury, Subhadeep Koley, Ayan Kumar Bhunia, Yi-Zhe Song:
Sketch Down the FLOPs: Towards Efficient Networks for Human Sketch. 28383-28393 - Hmrishav Bandyopadhyay, Yi-Zhe Song:
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations. 28394-28404 - Ozgur Kara, Krishna Kumar Singh, Feng Liu, Duygu Ceylan, James M. Rehg, Tobias Hinz:
ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models. 28405-28415 - Shuchen Weng, Haojie Zheng, Peixuan Zhang, Yuchen Hong, Han Jiang, Si Li, Boxin Shi:
VIRES: Video Instance Repainting via Sketch and Text Guided Generation. 28416-28425 - Yixuan Zhu, Haolin Wang, Shilin Ma, Wenliang Zhao, Yansong Tang, Lei Chen, Jie Zhou:
FADE: Frequency-Aware Diffusion Model Factorization for Video Editing. 28426-28435 - Tongda Xu, Jiahao Li, Bin Li, Yan Wang, Ya-Qin Zhang, Yan Lu:
PICD: Versatile Perceptual Image Compression with Diffusion Rendering. 28436-28445 - Ka-Chun Shum, Binh-Son Hua, Duc Thanh Nguyen, Sai-Kit Yeung:
Color Alignment in Diffusion. 28446-28455 - Nam Anh Dinh, Itai Lang, Hyunwoo Kim, Oded Stein, Rana Hanocka:
Geometry in Style: 3D Stylization via Surface Normal Deformation. 28456-28467 - Hongda Liu, Longguang Wang, Ye Zhang, Ziru Yu, Yulan Guo:
SaMam: Style-aware State Space Model for Arbitrary Image Style Transfer. 28468-28478 - Pengcheng Xu, Boyuan Jiang, Xiaobin Hu, Donghao Luo, Qingdong He, Jiangning Zhang, Chengjie Wang, Yunsheng Wu, Charles Ling, Boyu Wang:
Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing. 28479-28489 - Toan Nguyen, Kien Do, Duc Kieu, Thin Nguyen:
h-Edit: Effective and Flexible Diffusion-Based Editing via Doob's h-Transform. 28490-28501 - Jinqi Luo, Tianjiao Ding, Kwan Ho Ryan Chan, Hancheng Min, Chris Callison-Burch, René Vidal:
Concept Lancet: Image Editing with Compositional Representation Transplant. 28502-28512 - Sherry X. Chen, Misha Sra, Pradeep Sen:
Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning. 28513-28522 - Tong Wang, Ting Liu, Xiaochao Qu, Chengjing Wu, Luoqi Liu, Xiaolin Hu:
GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing. 28523-28532 - Bin Xia, Yuechen Zhang, Jingyao Li, Chengyao Wang, Yitong Wang, Xinglong Wu, Bei Yu, Jiaya Jia:
DreamOmni: Unified Image Generation and Editing. 28533-28543 - Muhammad Shaheryar, Jong Taek Lee, Soon Ki Jung:
Black Hole-Driven Identity Absorbing in Diffusion Models. 28544-28554 - Yibin Wang, Weizhong Zhang, Honghui Xu, Cheng Jin:
DreamText: High Fidelity Scene Text Synthesis. 28555-28563 - Yasamin Medghalchi, Moein Heidari, Clayton Allard, Leonid Sigal, Ilker Hacihaliloglu:
Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attack on Breast Ultrasound Images. 28564-28574 - Andrew Z. Wang, Songwei Ge, Tero Karras, Ming-Yu Liu, Yogesh Balaji:
A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation. 28575-28585 - Bingda Tang, Boyang Zheng, Sayak Paul, Saining Xie:
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis. 28586-28595 - Vikash Sehwag, Xianghao Kong, Jingtao Li, Michael Spranger, Lingjuan Lyu:
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget. 28596-28608 - Jiyeon Han, Dahee Kwon, Gayoung Lee, Junho Kim, Jaesik Choi:
Enhancing Creative Generation on Stable Diffusion-based Models. 28609-28618 - Jungwoo Chae, Jiyoon Kim, Jaewoong Choi, Kyungyul Kim, Sangheum Hwang:
APT: Adaptive Personalized Training for Diffusion Models with Limited Data. 28619-28628 - Yunhong Lu, Qichao Wang, Hengyuan Cao, Xierui Wang, Xiaoyin Xu, Min Zhang:
InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment. 28629-28639 - Yuning Qiu, Andong Wang, Chao Li, Haonan Huang, Guoxu Zhou, Qibin Zhao:
STEPS: Sequential Probability Tensor Estimation for Text-to-Image Hard Prompt Search. 28640-28650 - Eduard Gabriel Poesina, Adriana Valentina Costache, Adrian-Gabriel Chifu, Josiane Mothe, Radu Tudor Ionescu:
PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction. 28651-28661 - Renrui Zhang, Chengzhuo Tong, Zhizheng Zhao, Ziyu Guo, Haoquan Zhang, Manyuan Zhang, Jiaming Liu, Peng Gao, Hongsheng Li:
Let's Verify and Reinforce Image Generation Step by Step. 28662-28672 - Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth:
GLASS: Guided Latent Slot Diffusion for Object-Centric Learning. 28673-28683 - Jianzong Wu, Chao Tang, Jingbo Wang, Yanhong Zeng, Xiangtai Li, Yunhai Tong:
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation. 28684-28693 - Haoyu Chen, Xiaojie Xu, Wenbo Li, Jingjing Ren, Tian Ye, Songhua Liu, Ying-Cong Chen, Lei Zhu, Xinchao Wang:
POSTA: A Go-to Framework for Customized Artistic Poster Generation. 28694-28704 - Zhaoxing Gan, Mengtian Li, Ruhua Chen, Zhongxia Ji, Sichen Guo, Huanling Hu, Guangnan Ye, Zuo Hu:
StageDesigner: Artistic Stage Generation for Scenography via Theater Scripts. 28705-28714 - Aditya Ganeshan, Thibault Groueix, Paul Guerrero, Radomír Mech, Matthew Fisher, Daniel Ritchie:
Pattern Analogies: Learning to Perform Programmatic Image Edits by Analogy. 28715-28725 - Shanshan Huang, Haoxuan Li, Chunyuan Zheng, Mingyuan Ge, Wei Gao, Lei Wang, Li Liu:
Text-Driven Fashion Image Editing with Compositional Concept Learning and Counterfactual Abduction. 28726-28735 - Yisol Choi, Sangkyung Kwak, Sihyun Yu, Hyungwon Choi, Jinwoo Shin:
Controllable Human Image Generation with Personalized Multi-Garments. 28736-28747 - Zengqun Zhao, Ziquan Liu, Yu Cao, Shaogang Gong, Ioannis Patras:
AIM-Fair: Advancing Algorithmic Fairness via Selectively Fine-Tuning Biased Models with Contextual Synthetic Data. 28748-28758 - Yuan Wang, Ouxiang Li, Tingting Mu, Yanbin Hao, Kuien Liu, Xiang Wang, Xiangnan He:
Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters. 28759-28768 - Jie Ren, Kangrui Chen, Yingqian Cui, Shenglai Zeng, Hui Liu, Yue Xing, Jiliang Tang, Lingjuan Lyu:
Six-CD: Benchmarking Concept Removals for Text-to-image Diffusion Models. 28769-28778 - Huayang Huang, Xiangye Jin, Jiaxu Miao, Yu Wu:
Implicit Bias Injection Attacks against Text-to-Image Diffusion Models. 28779-28789 - Zebin You, Xinyu Zhang, Hanzhong Guo, Jingdong Wang, Chongxuan Li:
Are Images Indistinguishable to Humans Also Indistinguishable to Classifiers? 28790-28800 - Namhyuk Ahn, KiYoon Yoo, Wonhyuk Ahn, Daesik Kim, Seung-Hun Nam:
Nearly Zero-Cost Protection Against Mimicry by Personalized Diffusion Models. 28801-28810 - Huan Teng, Yuhui Quan, Chengyu Wang, Jun Huang, Hui Ji:
Fingerprinting Denoising Diffusion Probabilistic Models. 28811-28820 - Haoyue Bai, Yiyou Sun, Wei Cheng, Haifeng Chen:
Where's the Liability in the Generative Era? Recovery-based Black-Box Detection of AI-Generated Content. 28821-28830 - Zhenglin Huang, Jinwei Hu, Xiangtai Li, Yiwei He, Xingyu Zhao, Bei Peng, Baoyuan Wu, Xiaowei Huang, Guangliang Cheng:
SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model. 28831-28841 - Anqi Liang, Ciprian A. Corneanu, Qianli Feng, Giorgio Giannone, Aleix Martinez:
Be More Specific: Evaluating Object-centric Realism in Synthetic Images. 28842-28851 - Reese Kneeland, Paul S. Scotti, Ghislain St-Yves, Jesse Breedlove, Kendrick N. Kay, Thomas Naselaris:
NSD-Imagery: A Benchmark Dataset for Extending fMRI Vision Decoding Methods to Mental Imagery. 28852-28862 - Nikola Zubic, Davide Scaramuzza:
GG-SSMs: Graph-Generating State Space Models. 28863-28873 - Fiona Ryan, Ajay Bati, Sangmin Lee, Daniel Bolya, Judy Hoffman, James M. Rehg:
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders. 28874-28884 - Jingkang Yang, Shuai Liu, Hongming Guo, Yuhao Dong, Xiamengwei Zhang, Sicheng Zhang, Pengyun Wang, Zitang Zhou, Binzhu Xie, Ziyue Wang, Bei Ouyang, Zhengyu Lin, Marco Cominelli, Zhongang Cai, Bo Li, Yuanhan Zhang, Peiyuan Zhang, Fangzhou Hong, Joerg Widmer, Francesco Gringoli, Lei Yang, Ziwei Liu:
EgoLife: Towards Egocentric Life Assistant. 28885-28900 - Ho Kei Cheng, Masato Ishii, Akio Hayakawa, Takashi Shibuya, Alexander G. Schwing, Yuki Mitsufuji:
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis. 28901-28911 - Shentong Mo, Yibing Song:
Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows. 28912-28921 - Chen Liu, Peike Li, Liying Yang, Dadong Wang, Lincheng Li, Xin Yu:
Robust Audio-Visual Segmentation via Audio-Guided Visual Convergent Alignment. 28922-28931 - Yuji Wang, Haoran Xu, Yong Liu, Jiaze Li, Yansong Tang:
SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes. 28932-28941 - Sihong Huang, Jiaxin Wu, Xiaoyong Wei, Yi Cai, Dongmei Jiang, Yaowei Wang:
Sound Bridge: Associating Egocentric and Exocentric Videos via Audio Cues. 28942-28951 - Yulu Pan, Ce Zhang, Gedas Bertasius:
BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation. 28952-28962 - Lehan Yang, Lu Qi, Xiangtai Li, Sheng Li, Varun Jampani, Ming-Hsuan Yang:
Unified Dense Prediction of Video Diffusion. 28963-28973 - Tiehan Fan, Kepan Nan, Rui Xie, Penghao Zhou, Zhenheng Yang, Chaoyou Fu, Xiang Li, Jian Yang, Ying Tai:
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption. 28974-28983 - Weijia Wu, Mingyu Liu, Zeyu Zhu, Xi Xia, Haoen Feng, Wen Wang, Kevin Qinghong Lin, Chunhua Shen, Mike Zheng Shou:
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation. 28984-28994 - Chenkai Zhang, Yiming Lei, Zeming Liu, Haitao Leng, Shaoguo Liu, Tingting Gao, Qingjie Liu, Yunhong Wang:
SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding. 28995-29004 - Mor Shpigel Nacson, Aviad Aberdam, Roy Ganz, Elad Ben-Avraham, Alona Golts, Yair Kittenplon, Shai Mazor, Ron Litman:
DocVLM: Make Your VLM an Efficient Reader. 29005-29015 - Sagnik Majumder, Tushar Nagarajan, Ziad Al-Halah, Reina Pradhan, Kristen Grauman:
Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos. 29016-29028 - Zhongwei Ren, Yunchao Wei, Xun Guo, Yao Zhao, Bingyi Kang, Jiashi Feng, Xiaojie Jin:
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos. 29029-29039 - Chris Dongjoo Kim, Jihwan Moon, Sangwoo Moon, Heeseung Yun, Sihaeng Lee, Aniruddha Kembhavi, Soonyoung Lee, Gunhee Kim, Sangho Lee, Christopher Clark:
ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams. 29040-29049 - Nina Shvetsova, Arsha Nagrani, Bernt Schiele, Hilde Kuehne, Christian Rupprecht:
Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks. 29050-29059 - Dahun Kim, A. J. Piergiovanni, Ganesh Satish Mallya, Anelia Angelova:
VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models. 29060-29070 - Shyamal Buch, Arsha Nagrani, Anurag Arnab, Cordelia Schmid:
Flexible Frame Selection for Efficient Video Reasoning. 29071-29082 - Joya Chen, Ziyun Zeng, Yiqi Lin, Wei Li, Zejun Ma, Mike Zheng Shou:
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale. 29083-29095 - Md Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang, Gedas Bertasius, Lorenzo Torresani:
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering. 29096-29107 - Yangliu Hu, Zikai Song, Na Feng, Yawei Luo, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang:
SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding. 29108-29117 - Xi Tang, Jihao Qiu, Lingxi Xie, Yunjie Tian, Jianbin Jiao, Qixiang Ye:
Adaptive Keyframe Sampling for Long Video Understanding. 29118-29128 - Haoxing Chen, Zizheng Huang, Yan Hong, Yanshuo Wang, Zhongcai Lyu, Zhuoer Xu, Jun Lan, Zhangxuan Gu:
Efficient Transfer Learning for Video-language Foundation Models. 29129-29138 - Shaoyu Liu, Jianing Li, Guanghui Zhao, Yunjian Zhang, Xin Meng, Fei Richard Yu, Xiangyang Ji, Ming Li:
EventGPT: Event Stream Understanding with Multimodal Large Language Models. 29139-29149 - Trong-Thuan Nguyen, Pha Nguyen, Jackson David Cothren, Alper Yilmaz, Khoa Luu:
HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation. 29150-29160 - Mu Chen, Liulei Li, Wenguan Wang, Yi Yang:
DiffVsgg: Diffusion-Driven Online Video Scene Graph Generation. 29161-29172 - Weitao Feng, Hang Zhou, Jing Liao, Li Cheng, Wenbo Zhou:
CASAGPT: Cuboid Arrangement and Scene Assembly for Interior Design. 29173-29182 - Sitong Gong, Yunzhi Zhuge, Lu Zhang, Zongxin Yang, Pingping Zhang, Huchuan Lu:
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation. 29183-29192 - Zixuan Chen, Jiaxin Li, Junxuan Liang, Liming Tan, Yejie Guo, Cewu Lu, Yong-Lu Li:
M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation. 29193-29202 - Fei Li, Wenxuan Liu, Jingjing Chen, Ruixu Zhang, Yuran Wang, Xian Zhong, Zheng Wang:
Anomize: Better Open Vocabulary Video Anomaly Detection. 29203-29212 - Chen Tang, Xinzhu Ma, Encheng Su, Xiufeng Song, Xiaohong Liu, Wei-Hong Li, Lei Bai, Wanli Ouyang, Xiangyu Yue:
UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines. 29213-29224 - Xiaoyong Chen, Yong Guo, Jiaming Liang, Sitong Zhuang, Runhao Zeng, Xiping Hu:
Temporal Action Detection Model Compression by Progressive Block Drop. 29225-29236 - Yuting Zhang, Hao Lu, Qingyong Hu, Yin Wang, Kaishen Yuan, Xin Liu, Kaishun Wu:
Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model. 29237-29247 - Hongda Liu, Yunfan Liu, Min Ren, Hao Wang, Yunlong Wang, Zhenan Sun:
Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition. 29248-29257 - Utkarsh Mall, Cheng Perng Phoo, Mia Chiquier, Bharath Hariharan, Kavita Bala, Carl Vondrick:
DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery. 29258-29267 - Gaozheng Pei, Shaojie Lyu, Gong Chen, Ke Ma, Qianqian Xu, Yingfei Sun, Qingming Huang:
Divide and Conquer: Heterogeneous Noise Integration for Diffusion-based Adversarial Purification. 29268-29277 - Xiaofan Bai, Shixin Li, Xiaojing Ma, Bin Benjamin Zhu, Dongmei Zhang, Linchen Yu:
SDBF: Steep-Decision-Boundary Fingerprinting for Hard-Label Tampering Detection of DNN Models. 29278-29287 - Ziang Li, Hongguang Zhang, Juan Wang, Meihui Chen, Hongxin Hu, Wenzhe Yi, Xiaoyang Xu, Mengda Yang, Chenjun Ma:
From Head to Tail: Efficient Black-box Model Inversion Attack via Long-tailed Learning. 29288-29298 - Meng Pang, Wenjun Zhang, Nanrun Zhou, Shengbo Chen, Hong Rao:
UMFN: Unified Multi-Domain Face Normalization for Joint Cross-domain Prototype Learning and Heterogeneous Face Recognition. 29299-29308 - Zeqi Zhu, Ibrahim Batuhan Akkaya, Luc Waeijen, Egor Bondarev, Arash Pourtaherian, Orlando Moreira:
MEET: Towards Memory-Efficient Temporal Sparse Deep Neural Networks. 29309-29320 - Xiao Wang, Yu Jin, Wentao Wu, Wei Zhang, Lin Zhu, Bo Jiang, Yonghong Tian:
Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset. 29321-29330 - Yi-Xing Peng, Yu-Ming Tang, Kun-Yu Lin, Qize Yang, Jingke Meng, Xihan Wei, Wei-Shi Zheng:
Person De-reidentification: A Variation-guided Identity Shift Modeling. 29331-29341 - Yu Mao, Jun Wang, Nan Guan, Chun Jason Xue:
WISE: A Framework for Gigapixel Whole-Slide-Image Lossless Compression. 29342-29351 - Runmin Jiang, Jackson Daggett, Shriya Pingulkar, Yizhou Zhao, Priyanshu Dhingra, Daniel Brown, Qifeng Wu, Xiangrui Zeng, Xingjian Li, Min Xu:
BOE-ViT: Boosting Orientation Estimation with Equivariance in Self-Supervised 3D Subtomogram Alignment. 29352-29362 - Wei Lin, Chenyang Zhao, Antoni B. Chan:
Point-to-Region Loss for Semi-Supervised Point-Based Crowd Counting. 29363-29373 - Shijia Zhao, Qiming Xia, Xusheng Guo, Pufan Zou, Maoji Zheng, Hai Wu, Chenglu Wen, Cheng Wang:
SP3D: Boosting Sparsely-Supervised 3D Object Detection via Accurate Cross-Modal Semantic Prompts. 29374-29384 - Wei-En Tai, Yu-Lin Shih, Cheng Sun, Yu-Chiang Frank Wang, Hwann-Tzong Chen:
Segment Anything, Even Occluded. 29385-29394 - Weiguang Zhao, Rui Zhang, Qiufeng Wang, Guangliang Cheng, Kaizhu Huang:
BFANet: Revisiting 3D Semantic Segmentation with Boundary Feature Analysis. 29395-29405 - Hui Liu, Chen Jia, Fan Shi, Xu Cheng, Shengyong Chen:
SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures. 29406-29416 - Zihan Lin, Zilei Wang, Xu Wang:
Towards Continual Universal Segmentation. 29417-29427 - Tanner Schmidt, Richard A. Newcombe:
Segment This Thing: Foveated Tokenization for Efficient Point-Prompted Segmentation. 29428-29437 - Jiyong Rao, Brian Nlong Zhao, Yu Wang:
Probabilistic Prompt Distribution Learning for Animal Pose Estimation. 29438-29447 - Wenhuan Huang, Yi Ji, Guiqian Zhu, Li Ying, Chunping Liu:
Navigating the Unseen: Zero-shot Scene Graph Generation via Capsule-Based Equivariant Features. 29448-29457 - Yun Chang, Leonor Fermoselle, Duy Ta, Bernadette Bucher, Luca Carlone, Jiuguang Wang:
ASHiTA: Automatic Scene-grounded HIerarchical Task Analysis. 29458-29468 - Fan-Yun Sun, Weiyu Liu, Siyi Gu, Dylan Lim, Goutam Bhat, Federico Tombari, Manling Li, Nick Haber, Jiajun Wu:
LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models. 29469-29478 - Jinchang Zhang, Guoyu Lu:
Vision-Language Embodiment for Monocular Depth Estimation. 29479-29489 - Zhiyuan Huang, Ziming Cheng, Junting Pan, Zhaohui Hou, Mingjie Zhan:
SpiritSight Agent: Advanced GUI Agent with One Look. 29490-29500 - Jianing Yang, Xuweiyi Chen, Nikhil Madaan, Madhavan Iyengar, Shengyi Qian, David F. Fouhey, Joyce Chai:
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination. 29501-29512 - Lizheng Zu, Lin Lin, Song Fu, Na Zhao, Pan Zhou:
Collaborative Tree Search for Enhancing Embodied Multi-Agent Collaboration. 29513-29522 - Aniket Didolkar, Andrii Zadaianchuk, Rabiul Awal, Maximilian Seitzer, Efstratios Gavves, Aishwarya Agrawal:
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning. 29523-29533 - Haoran Xu, Peixi Peng, Guang Tan, Yiqian Chang, Luntong Li, Yonghong Tian:
VLMs-Guided Representation Distillation for Efficient Vision-Based Reinforcement Learning. 29534-29544 - Byung-Kwan Lee, Ryo Hachiuma, Yu-Chiang Frank Wang, Yong Man Ro, Yueh-Hua Wu:
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models. 29545-29557 - Han Xiao, Yina Xie, Guanxin Tan, Yinghao Chen, Rui Hu, Ke Wang, Aojun Zhou, Hao Li, Hao Shao, Xudong Lu, Peng Gao, Yafei Wen, Xiaoxin Chen, Shuai Ren, Hongsheng Li:
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding. 29558-29568 - Yiqi Zhu, Ziyue Wang, Can Zhang, Peng Li, Yang Liu:
CoSpace: Benchmarking Continuous Space Perception Ability for Vision-Language Models. 29569-29579 - Yuhui Zhang, Yuchang Su, Yiming Liu, Xiaohan Wang, James Burgess, Elaine Sui, Chenyu Wang, Josiah Aklilu, Alejandro Lozano, Anjiang Wei, Ludwig Schmidt, Serena Yeung-Levy:
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation. 29580-29590 - Xuli Shen, Hua Cai, Weilin Shen, Qing Xu, Dingding Yu, Weifeng Ge, Xiangyang Xue:
CocoER: Aligning Multi-Level Feature by Competition and Coordination for Emotion Recognition. 29591-29600 - Wuyou Xia, Guoli Jia, Sicheng Zhao, Jufeng Yang:
Seek Common Ground While Reserving Differences: Semi-Supervised Image-Text Sentiment Recognition. 29601-29611 - Kumail Alhamoud, Shaden Alshammari, Yonglong Tian, Guohao Li, Philip H. S. Torr, Yoon Kim, Marzyeh Ghassemi:
Vision-Language Models Do Not Understand Negation. 29612-29622 - Yuanhao Zou, Zhaozheng Yin:
Alignment, Mining and Fusion: Representation Alignment with Hard Negative Mining and Selective Knowledge Fusion for Medical Visual Question Answering. 29623-29633 - Ting Liu, Siyuan Li:
Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation. 29634-29643 - Bo Zhou, Liulei Li, Yujia Wang, Huafeng Liu, Yazhou Yao, Wenguan Wang:
UNIALIGN: Scaling Multimodal Alignment within One Unified Model. 29644-29655 - Zehan Wang, Sashuai Zhou, Shaoxuan He, Haifeng Huang, Lihe Yang, Ziang Zhang, Xize Cheng, Shengpeng Ji, Tao Jin, Hengshuang Zhao, Zhou Zhao:
SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language. 29656-29666 - Andre Ye, Sebastin Santy, Jena D. Hwang, Amy X. Zhang, Ranjay Krishna:
Semantic and Expressive Variations in Image Captions Across Languages. 29667-29679 - Quanxing Zha, Xin Liu, Shu-Juan Peng, Yiu-ming Cheung, Xing Xu, Nannan Wang:
ReCon: Enhancing True Correspondence Discrimination through Relation Consistency for Robust Noisy Correspondence Learning. 29680-29689 - Lan Wang, Wei Ao, Vishnu Naresh Boddeti, Ser-Nam Lim:
Generative Zero-Shot Composed Image Retrieval. 29690-29700 - Yuhao Wang, Yongfeng Lv, Pingping Zhang, Huchuan Lu:
IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification. 29701-29710 - Ziwei Wang, Weizhi Chen, Leyang Yang, Sheng Zhou, Shengchu Zhao, Hanbei Zhan, Jiongchao Jin, Liangcheng Li, Zirui Shao, Jiajun Bu:
MP-GUI: Modality Perception with MLLMs for GUI Understanding. 29711-29721 - Hao Guo, Xugong Qin, Jun Jie Ou Yang, Peng Zhang, Gangyan Zeng, Yubo Li, Hailun Lin:
Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark. 29722-29732 - Yuhao Cui, Xinxing Zu, Wenhua Zhang, Zhongzhou Zhao, Jinyang Gao:
Incorporating Dense Knowledge Alignment into Unified Multimodal Representation Models. 29733-29743 - Ziyang Zhang, Yang Yu, Yucheng Chen, Xulei Yang, Si Yong Yeo:
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations. 29744-29755 - Wang Lin, Qingsong Wang, Yueying Feng, Shulei Wang, Tao Jin, Zhou Zhao, Fei Wu, Chang Yao, Jingyuan Chen:
Non-Natural Image Understanding with Advancing Frequency-based Vision Encoders. 29756-29766 - Hao Li, Changyao Tian, Jie Shao, Xizhou Zhu, Zhaokai Wang, Jinguo Zhu, Wenhan Dou, Xiao-Gang Wang, Hongsheng Li, Lewei Lu, Jifeng Dai:
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding. 29767-29779 - Shaoan Xie, Lingjing, Yujia Zheng, Yu Yao, Zeyu Tang, Eric P. Xing, Guangyi Chen, Kun Zhang:
SmartCLIP: Modular Vision-language Alignment with Identification Guarantees. 29780-29790 - Haicheng Wang, Chen Ju, Weixiong Lin, Shuai Xiao, Mengting Chen, Yixuan Huang, Chang Liu, Mingshuai Yao, Jinsong Lan, Ying Chen, Qingwen Liu, Yanfeng Wang:
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training. 29791-29802 - Fang Liu, Yuhao Liu, Ke Xu, Shuquan Ye, Gerhard Petrus Hancke, Rynson W. H. Lau:
Language-Guided Salient Object Ranking. 29803-29813 - Runhui Huang, Xinpeng Ding, Chunwei Wang, Jianhua Han, Yulong Liu, Hengshuang Zhao, Hang Xu, Lu Hou, Wei Zhang, Xiaodan Liang:
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models. 29814-29824 - Yuchu Jiang, Jiale Fu, Chenduo Hao, Xinting Hu, Yingzhe Peng, Xin Geng, Xu Yang:
Mimic In-Context Learning for Multimodal Tasks. 29825-29835 - Xubing Ye, Yukang Gan, Xiaoke Huang, Yixiao Ge, Yansong Tang:
VoCo-LLaMA: Towards Vision Compression with Large Language Models. 29836-29846 - Mayug Maniparambil, Raiymbek Akshulakov, Yasser Abdelaziz Dahou Djilali, Sanath Narayan, Ankit Singh, Noel E. O'Connor:
Harnessing Frozen Unimodal Encoders for Flexible Multimodal Alignment. 29847-29857 - Sudong Wang, Yunjian Zhang, Yao Zhu, Jianing Li, Zizhe Wang, Yanwei Liu, Xiangyang Ji:
Towards Understanding How Knowledge Evolves in Large Vision-Language Models. 29858-29868 - Shiyu Zhao, Zhenting Wang, Felix Juefei-Xu, Xide Xia, Miao Liu, Xiaofang Wang, Mingfu Liang, Ning Zhang, Dimitris N. Metaxas, Licheng Yu:
Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction. 29869-29879 - Ziang Yan, Zhilin Li, Yinan He, Chenting Wang, Kunchang Li, Xinhao Li, Xiangyu Zeng, Zilei Wang, Yali Wang, Yu Qiao, Limin Wang, Yi Wang:
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment. 29880-29892 - Eunkyu Park, Minyeong Kim, Gunhee Kim:
HalLoc: Token-level Localization of Hallucinations for Vision Language Models. 29893-29903 - Wei Suo, Lijun Zhang, Mengyang Sun, Lin Yuanbo Wu, Peng Wang, Yanning Zhang:
Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding. 29904-29914 - Wenbin An, Feng Tian, Sicong Leng, Jiahao Nie, Haonan Lin, Qianying Wang, Ping Chen, Xiaoqin Zhang, Shijian Lu:
Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention. 29915-29926 - Zenghui Yuan, Jiawen Shi, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun:
BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models. 29927-29936 - Joonhyun Jeong, Seyun Bae, Yeonsung Jung, Jaeryong Hwang, Eunho Yang:
Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy. 29937-29946 - Han Wang, Gang Wang, Huan Zhang:
Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks. 29947-29957 - Lijun Sheng, Jian Liang, Zilei Wang, Ran He:
R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning. 29958-29967 - Yuhang Yang, Jinhong Deng, Wen Li, Lixin Duan:
ResCLIP: Residual Attention for Training-free Dense Vision-language Inference. 29968-29978 - Jeonghyeon Kim, Sangheum Hwang:
Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations. 29979-29988 - Matteo Farina, Massimiliano Mancini, Giovanni Iacca, Elisa Ricci:
Rethinking Few-Shot Adaptation of Vision-Language Models in Two Stages. 29989-29998 - Lihua Zhou, Mao Ye, Shuaifeng Li, Nianxin Li, Xiatian Zhu, Lei Deng, Hongbin Liu, Zhen Lei:
Bayesian Test-Time Adaptation for Vision-Language Models. 29999-30009 - Seung Hyun Lee, Jijun Jiang, Yiran Xu, Zhuofang Li, Junjie Ke, Yinxiao Li, Junfeng He, Steven Hickson, Katie Datsenko, Sangpil Kim, Ming-Hsuan Yang, Irfan Essa, Feng Yang:
Cropper: Vision-Language Model for Image Cropping through In-Context Learning. 30010-30019 - Haoyuan Yang, Xiaoou Li, Jiaming Lv, Xianjun Cheng, Qilong Wang, Peihua Li:
ImagineFSL: Self-Supervised Pretraining Matters on Imagined Base Set for VLM-based Few-shot Learning. 30020-30031 - Chenyu Zhang, Kunlun Xu, Zichen Liu, Yuxin Peng, Jiahuan Zhou:
SCAP: Transductive Test-Time Adaptation via Supportive Clique-based Attribute Prompting. 30032-30041 - Ruoyu Chen, Siyuan Liang, Jingzhi Li, Shiming Liu, Maosen Li, Zhen Huang, Hua Zhang, Xiaochun Cao:
Interpreting Object-level Foundation Models via Visual Precision Search. 30042-30052 - Lintong Zhang, Kang Yin, Seong-Whan Lee:
Towards Fine-Grained Interpretability: Counterfactual Explanations for Misclassification with Saliency Partition. 30053-30062 - Itay Benou, Tammy Riklin Raviv:
Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models. 30063-30072 - Jinseong Jang, Chunfei Ma, Byeongwon Lee:
VL2Lite: Task-Specific Knowledge Distillation from Large Vision-Language Models to Lightweight Networks. 30073-30083 - Mert Bülent Sariyildiz, Philippe Weinzaepfel, Thomas Lucas, Pau de Jorge, Diane Larlus, Yannis Kalantidis:
DUNE: Distilling a Universal Encoder from Heterogeneous 2D and 3D Teachers. 30084-30094 - Xuweiyi Chen, Markus Marks, Zezhou Cheng:
Probing the Mid-level Vision Capabilities of Self-Supervised Learning. 30095-30105 - Seokil Ham, Hee-Seon Kim, Sangmin Woo, Changick Kim:
Parameter Efficient Mamba Tuning via Projector-targeted Diagonal-centric Linear Transformation. 30106-30115 - Jinqi Xiao, Shen Sang, Tiancheng Zhi, Jing Liu, Qing Yan, Linjie Luo, Bo Yuan:
COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection. 30116-30126 - Elad Amrani, Leonid Karlinsky, Alex M. Bronstein:
Sample- and Parameter-Efficient Auto-Regressive Image Models. 30127-30136 - Jeimin Jeon, Youngmin Oh, Junghyup Lee, Donghyeon Baek, Dohyung Kim, Chanho Eom, Bumsub Ham:
Subnet-Aware Dynamic Supernet Training for Neural Architecture Search. 30137-30146 - Sabbir Ahmed, Abdullah Al Arafat, Deniz Najafi, Akhlak Mahmood, Mamshad Nayeem Rizve, Mohaiminul Al Nahian, Ranyang Zhou, Shaahin Angizi, Adnan Siraj Rakin:
DeepCompress-ViT: Rethinking Model Compression to Enhance Efficiency of Vision Transformers at the Edge. 30147-30156 - Feng Wang, Timing Yang, Yaodong Yu, Sucheng Ren, Guoyizhe Wei, Angtian Wang, Wei Shao, Yuyin Zhou, Alan L. Yuille, Cihang Xie:
Adventurer: Optimizing Vision Mamba Architecture Designs for Efficiency. 30157-30166 - Yair Smadar, Assaf Hoogi:
Dynamic Group Normalization: Spatio-Temporal Adaptation to Evolving Data Statistics. 30167-30177 - Linwei Chen, Lin Gu, Liang Li, Chenggang Yan, Ying Fu:
Frequency Dynamic Convolution for Dense Image Prediction. 30178-30188 - Kwonyoung Kim, Jungin Park, Jin Kim, Hyeongjun Kwon, Kwanghoon Sohn:
Faster Parameter-Efficient Tuning with Token Redundancy Reduction. 30189-30198 - Yan Xie, Zequn Zeng, Hao Zhang, Yucheng Ding, Yi Wang, Zhengjue Wang, Bo Chen, Hongwei Liu:
Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models. 30199-30209 - Aishwarya Agarwal, Srikrishna Karanam, Vineet Gandhi:
TIDE: Training Locally Interpretable Domain Generalization Models Enables Test-time Correction. 30210-30220 - Zihang Lai:
Exploring Simple Open-Vocabulary Semantic Segmentation. 30221-30230 - Lanyun Zhu, Tianrun Chen, Qianxiong Xu, Xuanyi Liu, Deyi Ji, Haiyang Wu, De Wen Soh, Jun Liu:
POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation. 30231-30240 - Songsong Duan, Xi Yang, Nannan Wang:
Multi-Label Prototype Visual Spatial Search for Weakly Supervised Semantic Segmentation. 30241-30250 - Farchan Hakim Raswa, Chun-Shien Lu, Jia-Ching Wang:
HistoFS: Non-IID Histopathologic Whole Slide Image Classification via Federated Style Transfer with RoI-Preserving. 30251-30260 - Ziqian Yang, Xinqiao Zhao, Xiaolei Wang, Quan Zhang, Jimin Xiao:
FFR: Frequency Feature Rectification for Weakly Supervised Semantic Segmentation. 30261-30270 - Qingchen Tang, Lei Fan, Maurice Pagnucco, Yang Song:
Prototype-Based Image Prompting for Weakly Supervised Histopathological Image Segmentation. 30271-30280 - Pinzhuo Tian, Shengjie Yang, Hang Yu, Alex C. Kot:
Pay Attention to the Foreground in Object-Centric Learning. 30281-30290 - Jianyang Zhang, Qianli Luo, Guowu Yang, Wenjing Yang, Weide Liu, Guosheng Lin, Fengmao Lv:
Attribute-formed Class-specific Concept Space: Endowing Language Bottleneck Model with Better Interpretability and Scalability. 30291-30300 - Peng Wu, Xiankai Lu, Hao Hu, Yongqin Xian, Jianbing Shen, Wenguan Wang:
LOGICZSL: Exploring Logic-induced Representation for Compositional Zero-shot Learning. 30301-30311 - Xiaokun Li, Yaping Huang, Qingji Guan:
CLIP-driven Coarse-to-fine Semantic Guidance for Fine-grained Open-set Semi-supervised Learning. 30312-30321 - Wei Zhang, Baopeng Zhang, Zhu Teng, Wenxin Luo, Junnan Zou, Jianping Fan:
Less Attention is More: Prompt Transformer for Generalized Category Discovery. 30322-30331 - Shan Zhang, Yao Ni, Jinhao Du, Yuan Xue, Philip Torr, Piotr Koniusz, Anton van den Hengel:
Open-World Objectness Modeling Unifies Novel Object Detection. 30332-30342 - Zhenya Tian, Jun Xiao, Lupeng Liu, Haiyong Jiang:
Activating Sparse Part Concepts for 3D Class Incremental Learning. 30343-30353 - Xiang Song, Yuhang He, Jingyuan Li, Qiang Wang, Yihong Gong:
Learning Endogenous Attention for Incremental Object Detection. 30354-30364 - Weiqi Yan, Lvhai Chen, Huaijia Kou, Shengchuan Zhang, Yan Zhang, Liujuan Cao:
UCOD-DPL: Unsupervised Camouflaged Object Detection via Dynamic Pseudo-label Learning. 30365-30375 - Jinghao Bian, Mingtao Feng, Weisheng Dong, Fangfang Wu, Jianqiao Luo, Yaonan Wang, Guangming Shi:
Feature Information Driven Position Gaussian Distribution Estimation for Tiny Object Detection. 30376-30386 - Vishesh Kumar, Akshay Agarwal:
A Unified, Resilient, and Explainable Adversarial Patch Detector. 30387-30397 - Zhen Qu, Xian Tao, Xinyi Gong, Shichen Qu, Qiyu Chen, Zhengtao Zhang, Xingang Wang, Guiguang Ding:
Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection. 30398-30408 - Wenqiao Li, Yao Gu, Xintao Chen, Xiaohao Xu, Ming Hu, Xiaonan Huang, Yingna Wu:
Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection. 30409-30419 - Ying Jin, Jinlong Peng, Qingdong He, Teng Hu, Jiafu Wu, Hao Chen, Haoxuan Wang, Wenbing Zhu, Mingmin Chi, Jun Liu, Yabiao Wang:
Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation. 30420-30429 - Yusuke Matsui:
LotusFilter: Fast Diverse Nearest Neighbor Search via a Learned Cutoff Table. 30430-30439 - Haokun Chen, Hang Li, Yao Zhang, Jinhe Bi, Gengyuan Zhang, Yueqi Zhang, Philip Torr, Jindong Gu, Denis Krompass, Volker Tresp:
FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models. 30440-30450 - Kai Wang, Zekai Li, Zhi-Qi Cheng, Samir Khaki, Ahmad Sajedi, Ramakrishna Vedantam, Konstantinos N. Plataniotis, Alexander Hauptmann, Yang You:
Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios. 30451-30461 - Xinhao Zhong, Hao Fang, Bin Chen, Xulin Gu, Meikang Qiu, Shuhan Qi, Shu-Tao Xia:
Hierarchical Features Matter: A Deep Exploration of Progressive Parameterization Method for Dataset Distillation. 30462-30471 - Weixiang Zhang, Shuzhao Xie, Chengwei Ren, Siyi Xie, Chen Tang, Shijia Ge, Mingzi Wang, Zhi Wang:
EVOS: Efficient Implicit Neural Training via EVOlutionary Selector. 30472-30482 - Shizhen Zhao, Xin Wen, Jiahui Liu, Chuofan Ma, Chunfeng Yuan, Xiaojuan Qi:
Learning from Neighbors: Category Extrapolation for Long-Tail Learning. 30483-30492 - Anshul Nasery, Jonathan Hayase, Pang Wei Koh, Sewoong Oh:
PLeaS - Merging Models with Permutations and Least Squares. 30493-30502 - Jiayi Guo, Junhao Zhao, Chaoqun Du, Yulin Wang, Chunjiang Ge, Zanlin Ni, Shiji Song, Humphrey Shi, Gao Huang:
Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment. 30503-30513 - Ke Ma, Jiaqi Tang, Bin Guo, Fan Dang, Sicong Liu, Zhui Zhu, Lei Wu, Cheng Fang, Ying-Cong Chen, Zhiwen Yu, Yunhao Liu:
SURGEON: Memory-Adaptive Fully Test-Time Adaptation via Dynamic Activation Sparsity. 30514-30523 - Qiang Zhang, Mengsheng Zhao, Jiawei Liu, Fanrui Zhang, Yongchao Xu, Zheng-Jun Zha:
Hierarchical Knowledge Prompt Tuning for Multi-task Test-Time Adaptation. 30524-30533 - Jiangpeng He, Zhihao Duan, Fengqing Zhu:
CL-LoRA: Continual Low-Rank Adaptation for Rehearsal-Free Class-Incremental Learning. 30534-30544 - Jiashuo Li, Shaokun Wang, Bo Qian, Yuhang He, Xing Wei, Qiang Wang, Yihong Gong:
Dynamic Integration of Task-Specific Adapters for Class Incremental Learning. 30545-30555 - Yunlong Li, Xiabi Liu, Liyuan Pan, Yuchen Ren:
Task-Specific Gradient Adaptation for Few-Shot One-Class Classification. 30556-30565 - Peihua Deng, Jiehua Zhang, Xichun Sheng, Chenggang Yan, Yaoqi Sun, Ying Fu, Liang Li:
Multi-Granularity Class Prototype Topology Distillation for Class-Incremental Source-Free Unsupervised Domain Adaptation. 30566-30576 - Xiran Wang, Jian Zhang, Lei Qi, Yinghuan Shi:
Balanced Direction from Multifarious Choices: Arithmetic Meta-Learning for Domain Generalization. 30577-30587 - Yushan Lai, Guowen Li, Haoyuan Liang, Juepeng Zheng, Zhiyu Ye:
ADU: Adaptive Detection of Unknown Categories in Black-Box Domain Adaptation. 30588-30598 - Dongkwan Lee, Kyomin Hwang, Nojun Kwak:
Unlocking the Potential of Unlabeled Data in Semi-Supervised Domain Generalization. 30599-30608 - Zhenghao Zhao, Haoxuan Wang, Yuzhang Shang, Kai Wang, Yan Yan:
Distilling Long-tailed Datasets. 30609-30618 - Changkun Ye, Russell Tsuchida, Lars Petersson, Nick Barnes:
Open Set Label Shift with Test Time Out-of-Distribution Reference. 30619-30629 - Yifeng Yang, Lin Zhu, Zewen Sun, Hengyu Liu, Qinying Gu, Nanyang Ye:
OODD: Test-time Out-of-Distribution Detection with Dynamic Dictionary. 30630-30639 - Yifei Zhang, Hao Zhu, Alysa Ziying Tan, Dianzhi Yu, Longtao Huang, Han Yu:
pFedMxF: Personalized Federated Class-Incremental Learning with Mixture of Frequency Aggregation. 30640-30650 - Changlong Shi, He Zhao, Bingjie Zhang, Mingyuan Zhou, Dandan Guo, Yi Chang:
FedAWA: Adaptive Optimization of Aggregation Weights in Federated Learning Using Client Vectors. 30651-30660 - Zhengyi Zhong, Weidong Bao, Ji Wang, Shuai Zhang, Jingxuan Zhou, Lingjuan Lyu, Wei Yang Bryan Lim:
Unlearning through Knowledge Overwriting: Reversible Federated Unlearning via Selective Sparse Adapter. 30661-30670 - Yongli Xiang, Ziming Hong, Lina Yao, Dadong Wang, Tongliang Liu:
Jailbreaking the Non-Transferable Barrier via Test-Time Data Disguising. 30671-30681 - Zhaoyu Zhang, Yang Hua, Guanxiong Sun, Hui Wang, Seán F. McLoone:
Improving the Training of Data-Efficient GANs via Quality Aware Dynamic Discriminator Rejection Sampling. 30682-30691 - Ming Sun, Rui Wang, Zixuan Zhu, Lihua Jing, Yuanfang Guo:
EntropyMark: Towards More Harmless Backdoor Watermark via Entropy-based Constraint for Open-source Dataset Copyright Protection. 30692-30701 - Yong Xie, Weijie Zheng, Hanxun Huang, Guangnan Ye, Xingjun Ma:
Towards Million-Scale Adversarial Robustness Evaluation With Stronger Individual Attacks. 30702-30711 - Jiani Ni, He Zhao, Jintong Gao, Dandan Guo, Hongyuan Zha:
Balancing Two Classifiers via A Simplex ETF Structure for Model Calibration. 30712-30721 - Xu Yan, Jun Yin, Jie Wen:
Incomplete Multi-View Multi-label Learning via Disentangled Representation and Label Semantic Embedding. 30722-30731 - Yuan Sun, Yongxiang Li, Zhenwen Ren, Guiduo Duan, Dezhong Peng, Peng Hu:
ROLL: Robust Noisy Pseudo-label Learning for Multi-View Clustering with Noisy Correspondence. 30732-30741 - Rittwika Kansabanik, Adrian Barbu:
Feature Selection for Latent Factor Models. 30742-30751 - Jiahua Rao, Hanjing Lin, Leyu Chen, Jiancong Xie, Shuangjia Zheng, Yuedong Yang:
Multi-modal Contrastive Learning with Negative Sampling Calibration for Phenotypic Drug Discovery. 30752-30762 - Wanyi Chen, Zihua Zhao, Jiangchao Yao, Ya Zhang, Jiajun Bu, Haishuai Wang:
Multi-modal Medical Diagnosis via Large-small Model Collaboration. 30763-30773 - Yuan Tian, Kaiyuan Ji, Rongzhao Zhang, Yankai Jiang, Chunyi Li, Xiaosong Wang, Guangtao Zhai:
Towards All-in-One Medical Image Re-Identification. 30774-30786 - Alice Heiman, Xiaoman Zhang, Emma Chen, Sung Eun Kim, Pranav Rajpurkar:
FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray Report Generation Models. 30787-30796 - Ta Duc Huy, Sen Kim Tran, Phan Nguyen, Nguyen Hoang Tran, Tran Bao Sam, Anton van den Hengel, Zhibin Liao, Johan W. Verjans, Minh-Son To, Vu Minh Hieu Phan:
Interactive Medical Image Analysis with Concept-based Similarity Reasoning. 30797-30806 - Tim Lenz, Peter Neidlinger, Marta Ligero, Georg Wölflein, Marko van Treeck, Jakob Nikolas Kather:
Unsupervised Foundation Model-Agnostic Slide-Level Representation Learning. 30807-30817 - Jiuyang Dong, Junjun Jiang, Kui Jiang, Jiahan Li, Yongbing Zhang:
Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning. 30818-30828 - Junchao Zhu, Ruining Deng, Tianyuan Yao, Juming Xiong, Chongyu Qu, Junlin Guo, Siqi Lu, Mengmeng Yin, Yu Wang, Shilin Zhao, Haichun Yang, Yuankai Huo:
ASIGN: An Anatomy-aware Spatial Imputation Graphic Network for 3D Spatial Transcriptomics. 30829-30838 - Ming Hu, Jianfu Yin, Zhuangzhuang Ma, Jianheng Ma, Feiyu Zhu, Bingbing Wu, Ya Wen, Meng Wu, Cong Hu, Bingliang Hu, Quan Wang:
beta-FFT: Nonlinear Interpolation and Differentiated Training Strategies for Semi-Supervised Medical Image Segmentation. 30839-30849 - Maregu Assefa, Muzammal Naseer, Iyyakutti Iyappan Ganapathi, Syed Sadaf Ali, Mohamed L. Seghier, Naoufel Werghi:
DyCON: Dynamic Uncertainty-aware Consistency and Contrastive Learning for Semi-supervised Medical Image Segmentation. 30850-30860 - Saad Wazir, Daeyoung Kim:
Rethinking Decoder Design: Improving Biomarker Segmentation Using Depth-to-Space Restoration and Residual Linear Attention. 30861-30871 - Maximilian Rokuss, Yannick Kirchhoff, Seval Akbal, Balint Kovacs, Saikat Roy, Constantin Ulrich, Tassilo Wald, Lukas T. Rotkopf, Heinz-Peter Schlemmer, Klaus H. Maier-Hein:
LesionLocator: Zero-Shot Universal Tumor Segmentation and Tracking in 3D Whole-Body Imaging. 30872-30885 - Junjie Zhou, Shouju Wang, Yuxia Tang, Qi Zhu, Daoqiang Zhang, Wei Shao:
DAMM-Diffusion: Learning Divergence-Aware Multi-Modal Diffusion Model for Nanoparticles Distribution Prediction. 30886-30895 - Ziwei Zhao, Zhixing Zhang, Yuhang Liu, Zhao Zhang, Haojun Yu, Dong Wang, Liwei Wang:
DeformCL: Learning Deformable Centerline Representation for Vessel Extraction in 3D Medical Image. 30896-30905 - S. Mazdak Abulnaga, Andrew Hoopes, Neel Dey, Malte Hoffmann, Bruce Fischl, John V. Guttag, Adrian V. Dalca:
MultiMorph: On-demand Atlas Construction. 30906-30917 - Yejee Shin, Yeeun Lee, Hanbyol Jang, Geonhui Son, Hyeongyu Kim, Dosik Hwang:
Anatomical Consistency and Adaptive Prior-informed Transformation for Multi-contrast MR Image Synthesis via Diffusion Model. 30918-30927 - Thomas Walker, Salvatore Esposito, Daniel Rebain, Amir Vaxman, Arno Onken, Changjian Li, Oisin Mac Aodha:
CrossSDF: 3D Reconstruction of Thin Structures From Cross-Sections. 30928-30937