Starred repositories
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal domains, for both inference and training.
Production-ready platform for agentic workflow development.
Robust Speech Recognition via Large-Scale Weak Supervision
Real-time face swap and one-click video deepfake using only a single image
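A practical interactive interface for large language models such as GPT and GLM, specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins; project analysis and self-explanation for Python, C++, and other codebases; PDF/LaTeX paper translation and summarization; parallel queries to multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…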
A high-throughput and memory-efficient inference and serving engine for LLMs
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
High-Resolution Image Synthesis with Latent Diffusion Models
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
🚀🚀 "Large model": train a 26M-parameter GPT entirely from scratch in just 2 hours! 🌏
OpenMMLab Detection Toolbox and Benchmark
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
ncnn is a high-performance neural network inference framework optimized for the mobile platform
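A curated collection of open-source Chinese large language models, focusing on smaller models that can be deployed privately and are cheaper to train; covers base models, domain-specific fine-tuning and applications, datasets, and tutorials.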
⚡️HivisionIDPhotos: a lightweight and efficient AI tool and algorithm for making ID photos.
SGLang is a fast serving framework for large language models and vision language models.
Faster Whisper transcription with CTranslate2
Janus-Series: Unified Multimodal Understanding and Generation Models
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Development repository for the Triton language and compiler
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.