Highlights
- Pro
Stars
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Official Code for DragGAN (SIGGRAPH 2023)
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
Generative Models by Stability AI
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
Official inference repo for FLUX.1 models
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Lets make video diffusion practical!
Wan: Open and Advanced Large-Scale Video Generative Models
An open source implementation of CLIP.
Generate 3D objects conditioned on text or images
Official implementation of AnimateDiff.
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
Python client for Baidu Yun (Personal Cloud Storage) 百度云/百度网盘Python客户端
Real-Time High-Resolution Background Matting
Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.