Starred repositories
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
The official implementation of the paper “LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models”
[NeurIPS 2024] SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
This repo contains the code for our paper “An Image is Worth 32 Tokens for Reconstruction and Generation”
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
Automatically Update Text-to-speech (TTS) Papers Daily using GitHub Actions (Update Every 12 Hours)
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
An Open Source text-to-speech system built by inverting Whisper.
Machine learning metrics for distributed, scalable PyTorch applications.
The official implementation of HierSpeech++
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
Official PyTorch implementation of BigVGAN (ICLR 2023)
Vector (and Scalar) Quantization, in PyTorch
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
🔊 Text-Prompted Generative Audio Model
An Open-source Streaming High-fidelity Neural Audio Codec
A multi-voice TTS system trained with an emphasis on quality
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
PyTorch implementations of Generative Adversarial Networks.
⚡LLM Zoo is a project that provides data, models, and evaluation benchmarks for large language models.⚡
Official repo for consistency models.
This repository contains metadata for the WavCaps dataset and code for downstream tasks.
The simplest, fastest repository for training/finetuning medium-sized GPTs.