Starred repositories
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
A latent text-to-image diffusion model
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
12 Weeks, 24 Lessons, AI for All!
🔊 Text-Prompted Generative Audio Model
An introductory LLM tutorial for developers: the Chinese edition of Andrew Ng's large language model course series
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
A guidance language for controlling large language models.
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
[🔥updating ...] AI automated quantitative trading bot (fully local deployment). AI-powered Quantitative Investment Research Platform. 📃 online docs: https://ufund-me.github.io/Qbot ✨ qbot-mini: https://github.com/Charmve/iQuant
stable diffusion webui colab
StableLM: Stability AI Language Models
This repository contains the source code for the paper First Order Motion Model for Image Animation
A multi-voice TTS system trained with an emphasis on quality
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
High-Resolution Image Synthesis with Latent Diffusion Models
PyTorch code and models for the DINOv2 self-supervised learning method.
Foundational Models for State-of-the-Art Speech and Text Translation
LAVIS - A One-stop Library for Language-Vision Intelligence
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
FaceChain is a deep-learning toolchain for generating your Digital-Twin.
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Inpaint anything using Segment Anything and inpainting models.
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support.
Luotuo (骆驼): Open-source Chinese language models. Developed by 陈启源 @ Central China Normal University, 李鲁鲁 @ SenseTime, and 冷子昂 @ SenseTime
[CVPR 2022] Thin-Plate Spline Motion Model for Image Animation.
A denoising autoencoder + adversarial losses and attention mechanisms for face swapping.
serp-ai / bark-with-voice-clone
Forked from suno-ai/bark. 🔊 Text-prompted Generative Audio Model, with the ability to clone voices