Stars
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
You like pytorch? You like micrograd? You love tinygrad! ❤️
Paper reading notes on Deep Learning and Machine Learning
[IEEE T-PAMI 2024] All you need for End-to-end Autonomous Driving
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
We write your reusable computer vision tools. 💜
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
A curated list of foundation models for vision and language tasks
Awesome papers & datasets specifically focused on long-term videos.
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
Computer Vision Annotation Tool (CVAT) is a leading platform for building high-quality visual datasets for vision AI. It offers open-source, cloud, and enterprise products, as well as labeling serv…
JARVIS, a system to connect LLMs with the ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
An open-source framework for training large multimodal models.
🎢 Creating and sharing simulation environments for embodied and synthetic data research
Create 🔥 videos with Stable Diffusion by exploring the latent space and morphing between text prompts
Visual tracking library based on PyTorch.
An ongoing paper list on new trends in 3D vision with deep learning
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
The Replica Dataset v1 as published in https://arxiv.org/abs/1906.05797.
A data generation pipeline for creating semi-realistic synthetic multi-object videos with rich annotations such as instance segmentation masks, depth maps, and optical flow.
PointTrack (ECCV 2020 Oral): Segment as Points for Efficient Online Multi-Object Tracking and Segmentation