Institute of Automation, Chinese Academy of Sciences
Beijing, China

Stars
Official implementation of Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions (NeurIPS 2024 Datasets and Benchmarks Track, Spotlight).
Official implementation of WebVLN: Vision-and-Language Navigation on Websites
Everyday Object Disrupts Vision-and-Language Navigation Agent via Backdoor (VLN-ATT)
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
A human-annotated, fine-grained dataset for Vision-and-Language Navigation
Official implementation of Frequency-enhanced Data Augmentation for Vision-and-Language Navigation (NeurIPS 2023)
Inpaint anything using Segment Anything and inpainting models.
[TPAMI 2024] Official repo of "ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments"
Official implementation of Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation (CVPR'22 Oral).
Room-across-Room (RxR) is a large-scale, multilingual dataset for Vision-and-Language Navigation (VLN) in Matterport3D environments. It contains 126k navigation instructions in English, Hindi and T…
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
Code and Data of the CVPR 2022 paper: Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
Code for NeurIPS 2021 paper "Curriculum Learning for Vision-and-Language Navigation"
A curated list of Multimodal Related Research.
Collection of CVPR 2017-2024 papers, code, interpretations, and livestreams, curated by the 极市 (CVMart) team
Official implementation of History Aware Multimodal Transformer for Vision-and-Language Navigation (NeurIPS'21).
Recent Transformer-based computer vision works and related research.
Reading list for research topics in multimodal machine learning
PyTorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"
Know What and Know Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation
Reading list for research topics in embodied vision
Code of the CVPR 2021 Oral paper: A Recurrent Vision-and-Language BERT for Navigation
awesome grounding: A curated list of research papers in visual grounding
[ACM MM 2021 Oral] Official repo of "Neighbor-view Enhanced Model for Vision and Language Navigation"
PyTorch implementation of CVPR 2021 paper "MUST-GAN: Multi-level Statistics Transfer for Self-driven Person Image Generation"
Evaluation code for various unsupervised automated metrics for Natural Language Generation.
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)