Stars
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
Reference PyTorch implementation and models for DINOv3
Easily train a good VC model with voice data <= 10 mins!
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execution, while contributing back to the open-source community.
No fortress, purely open ground. OpenManus is Coming.
Train your AI self, amplify you, bridge the world
[ICCV 2025] Referring to any person or object given a natural language description. Code base for RexSeek and the HumanRef Benchmark
[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
PyTorch implementation of FractalGen https://arxiv.org/abs/2502.17437
[CVPR 2025] DEIM: DETR with Improved Matching for Fast Convergence
Official implementation of the WACV 2025 (Oral) paper RT-DETRv3: Real-time End-to-End Object Detection with Hierarchical Dense Positive Supervision.
Code and models for the paper "The effectiveness of MAE pre-pretraining for billion-scale pretraining" https://arxiv.org/abs/2303.13496
Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed
EVE Series: Encoder-Free Vision-Language Models from BAAI
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
[CVPR 2025 Highlight] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
[NeurIPS 2023] HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
PyTorch implementation of "SMITE: Segment Me In TimE" (ICLR 2025)