-
autonomy_stack_go2 Public
Forked from jizhang-cmu/autonomy_stack_go2
Full Autonomy Stack for Unitree Go2
C++ Updated Mar 31, 2025 -
samurai Public
Forked from yangchris11/samurai
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
Python Apache License 2.0 Updated Mar 18, 2025 -
VILA Public
Forked from NVlabs/VILA
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
Python Apache License 2.0 Updated Jan 24, 2025 -
VLN-Survey-with-Foundation-Models Public
Forked from zhangyuejoslin/VLN-Survey-with-Foundation-Models
Updated Jan 8, 2025 -
RoboticsDiffusionTransformer Public
Forked from thu-ml/RoboticsDiffusionTransformer
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
Python MIT License Updated Dec 24, 2024 -
VLN-CE-Isaac Public
Forked from yang-zj1026/NaVILA-Bench
Vision-Language Navigation Benchmark in Isaac Lab
Python Other Updated Dec 20, 2024 -
legged-loco Public
Forked from yang-zj1026/legged-loco
Low-level locomotion policy training in Isaac Lab
Python MIT License Updated Dec 15, 2024 -
data-juicer Public
Forked from datajuicer/data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷 Providing higher-quality, richer, and more "digestible" data for large models!
Python Apache License 2.0 Updated Nov 8, 2024 -
CogVideo Public
Forked from zai-org/CogVideo
text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Python Apache License 2.0 Updated Oct 28, 2024 -
Show-o Public
Forked from showlab/Show-o
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
Python Apache License 2.0 Updated Oct 27, 2024 -
videophy Public
Forked from Hritikbansal/videophy
Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics
Python MIT License Updated Oct 11, 2024 -
VAR Public
Forked from FoundationVision/VAR
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-sim…
Python MIT License Updated Oct 6, 2024 -
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
Python MIT License Updated Sep 27, 2024 -
Open-MAGVIT2 Public
Forked from TencentARC/SEED-Voken
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
Python Apache License 2.0 Updated Sep 27, 2024 -
Open-Sora Public
Forked from hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
Python Apache License 2.0 Updated Aug 9, 2024 -
TextHawk Public
Forked from yuyq96/TextHawk
Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
Python Updated Apr 16, 2024 -
Vary Public
Forked from Ucas-HaoranWei/Vary
Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
Python Updated Dec 12, 2023 -
Monkey Public
Forked from Yuliang-Liu/Monkey
Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
Python MIT License Updated Dec 4, 2023 -
unilm Public
Forked from microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Python MIT License Updated Nov 29, 2023 -
Kosmos2.5 Public
Forked from kyegomez/Kosmos2.5
My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"
Python MIT License Updated Nov 27, 2023 -
nougat Public
Forked from facebookresearch/nougat
Implementation of Nougat Neural Optical Understanding for Academic Documents
Python MIT License Updated Nov 17, 2023 -
donut Public
Forked from clovaai/donut
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Python MIT License Updated Nov 15, 2023 -
Awesome-Open-Vocabulary-Object-Detection Public
Forked from witnessai/Awesome-Open-Vocabulary-Object-Detection
A curated list of papers, datasets and resources pertaining to open vocabulary object detection.
Updated Aug 21, 2023 -
sam-hq Public
Forked from SysCV/sam-hq
Segment Anything in High Quality
Python Apache License 2.0 Updated Aug 16, 2023 -
fc-clip Public
Forked from bytedance/fc-clip
This repo contains the code for our paper Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
Python Apache License 2.0 Updated Aug 15, 2023 -
LISA Public
Forked from dvlab-research/LISA
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
Python Apache License 2.0 Updated Aug 10, 2023 -
ONE-PEACE Public
Forked from OFA-Sys/ONE-PEACE
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Python Apache License 2.0 Updated Aug 10, 2023 -
DeOP Public
Forked from CongHan0808/DeOP
Open-vocabulary Semantic Segmentation
Python Updated Aug 3, 2023 -
Segment-Everything-Everywhere-All-At-Once Public
Forked from UX-Decoder/Segment-Everything-Everywhere-All-At-Once
Official implementation of the paper "Segment Everything Everywhere All at Once"
Python Apache License 2.0 Updated Jul 28, 2023 -
Grounded-Segment-Anything Public
Forked from IDEA-Research/Grounded-Segment-Anything
Grounded-SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
Jupyter Notebook Apache License 2.0 Updated Jul 25, 2023