Starred repositories
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Robust Speech Recognition via Large-Scale Weak Supervision
Real-time face swap and one-click video deepfake with only a single image
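Provides a practical interactive interface for large language models (LLMs) such as GPT and GLM, with a particularly optimized experience for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins, analysis and self-explanation of Python, C++, and other projects, translation and summarization of PDF/LaTeX papers, parallel queries to multiple LLMs, and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…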
A high-throughput and memory-efficient inference and serving engine for LLMs
High-Resolution Image Synthesis with Latent Diffusion Models
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
🚀🚀 [LLM] Train a small 26M-parameter GPT completely from scratch in just 2 hours! 🌏
OpenMMLab Detection Toolbox and Benchmark
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
⚡️HivisionIDPhotos: a lightweight and efficient AI tool for making ID photos.
SGLang is a fast serving framework for large language models and vision language models.
Faster Whisper transcription with CTranslate2
Janus-Series: Unified Multimodal Understanding and Generation Models
Bringing Old Photos Back to Life (CVPR 2020 oral)
🤩 Easy-to-use global IM bot platform designed for the LLM era: a simple development platform for LLM-powered instant-messaging bots ⚡️ Bots for QQ / QQ Channels / Discord / LINE / WeChat (WeChat, WeCom) / Telegram / Feishu / DingTalk / Slack 🧩 Integrated with ChatGPT(GPT),…
Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
The official GitHub page for the survey paper "A Survey of Large Language Models".
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
A paper list of object detection using deep learning.
YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]
[CVPR 2024] Official repository for "MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model"
Enjoy the magic of Diffusion models!
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…