Starred repositories
The first open-source harness builder for AI coding. Make AI coding deterministic and repeatable.
The operating layer for Claude Code + OpenAI Codex: persistent project memory, intent routing, safety hooks, cost telemetry, and parallel agent fleets.
The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
[IROS 2024] Representing 3D sparse map points and lines for camera relocalization; [IROS 2025] Improved 3D Point-Line Mapping Regression for Camera Relocalization
[AAAI 2025] Official implementation of "OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on"
🌊 [ECCV'24 Oral] MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.
ChatDev 2.0: Dev All through LLM-powered Multi-Agent Collaboration
[NeurIPS 2023] This repo contains the code for our paper Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
Official implementation for the ICCV 2023 paper "NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space"
A simple React one-page landing page template for startups/companies
An Open-source Framework for Data-centric, Self-evolving Autonomous Language Agents
An open-source visual programming environment for battle-testing prompts to LLMs.
A high-throughput and memory-efficient inference and serving engine for LLMs
[ICCV 2023] ProPainter: Improving Propagation and Transformer for Video Inpainting
Doppelgangers: Learning to Disambiguate Images of Similar Structures
Align 3D Point Cloud with Multi-modalities for Large Language Models
Code release for "Cut and Learn for Unsupervised Object Detection and Instance Segmentation" and "VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation"
Official implementation for ACM MM 2023 paper '360-Degree Panorama Generation from Few Unregistered NFoV Images'
DVIS: Decoupled Video Instance Segmentation Framework
[ICCV 2023] Official Implementation of "UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation"
FastGPT is a knowledge-based platform built on LLMs that offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, le…
Get up and running with Kimi-K2.6, GLM-5.1, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.
LightGlue: Local Feature Matching at Light Speed (ICCV 2023)