- Columbia University
- New York, US
Stars
MiniSora: A community that aims to explore the implementation path and future development direction of Sora.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
SALMONN: Speech Audio Language Music Open Neural Network
An Open-Sourced LLM-empowered Foundation TTS System
High-quality Text-to-Audio Generation with Efficient Diffusion Transformer
Encode and decode audio samples to/from compressed latent representations!
Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"
The open source code for SimpleSpeech series
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
Evaluation Protocol for Large-Scale Zero-Shot TTS Literature
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
[ACMMM'2024] Generative Expressive Conversational Speech Synthesis
Official PyTorch implementation of BigVGAN (ICLR 2023)
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
Foundational model for human-like, expressive TTS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
The official GitHub page for the survey paper "A Survey of Large Language Models".