Stars
The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.
Hunt down social media accounts by username across social networks
A high-throughput and memory-efficient inference and serving engine for LLMs
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
The swiss army knife of lossless video/audio editing
We write your reusable computer vision tools. 💜
AIHawk aims to ease the job hunt by automating the job application process. Utilizing artificial intelligence, it enables users to apply for multiple jobs in a tailored way.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
SpotX patcher used for patching the desktop version of Spotify
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
AirLLM 70B inference with single 4GB GPU
Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
High-resolution models for human tasks.
DeepSeek-VL: Towards Real-World Vision-Language Understanding
A Python package for segmenting geospatial data with the Segment Anything Model (SAM)
D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement [ICLR 2025 Spotlight]
[NeurIPS 2025] YOLOv12: Attention-Centric Real-Time Object Detectors
Images to inference with no labeling (use foundation models to train supervised models).
Trackers gives you clean, modular re-implementations of leading multi-object tracking algorithms released under the permissive Apache 2.0 license. You combine them with any detection model you alre…
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
[ICLR 2025] From anything to mesh like human artists. Official impl. of "MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers"
An upside-down, fast, portable, and compact 3D printer
NVIDIA DeepStream SDK 8.0 / 7.1 / 7.0 / 6.4 / 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 implementation for YOLO models
Official code for "FeatUp: A Model-Agnostic Framework for Features at Any Resolution" ICLR 2024