yl4579
  • Columbia University
  • New York, US

Highlights

  • Pro

Brand new TTS solution

Python 14,719 1,127 Updated Nov 24, 2024

MiniSora: A community that aims to explore the implementation path and future development direction of Sora.

Python 1,220 151 Updated Oct 8, 2024

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 7,495 928 Updated Nov 27, 2024

SALMONN: Speech Audio Language Music Open Neural Network

Python 1,065 83 Updated Nov 21, 2024

An Open-Sourced LLM-empowered Foundation TTS System

Python 459 30 Updated Oct 17, 2024
Python 48 3 Updated Nov 22, 2024

LLM101n: Let's build a Storyteller

30,294 1,652 Updated Aug 1, 2024

High-quality Text-to-Audio Generation with Efficient Diffusion Transformer

Python 238 8 Updated Nov 12, 2024

Encode and decode audio samples to/from compressed latent representations!

Python 152 9 Updated Aug 16, 2024

Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"

Python 86 4 Updated Sep 19, 2024
Python 6,845 537 Updated Nov 23, 2024

The open source code for SimpleSpeech series

Python 112 6 Updated Oct 8, 2024

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,613 175 Updated Nov 14, 2024

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis

Python 100 10 Updated Nov 1, 2024

✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM

Python 969 59 Updated Oct 24, 2024

Evaluation Protocol for Large-Scale Zero-Shot TTS Literature

Python 66 7 Updated Sep 26, 2024

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python 104 5 Updated Oct 1, 2024

Audio Large Language Models

144 8 Updated Nov 26, 2024

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,257 84 Updated Aug 13, 2024

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,041 44 Updated Nov 19, 2024

[ACMMM'2024] Generative Expressive Conversational Speech Synthesis

28 2 Updated Oct 28, 2024

Official PyTorch implementation of BigVGAN (ICLR 2023)

Python 905 107 Updated Sep 5, 2024

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Python 99 4 Updated Sep 20, 2024

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Jupyter Notebook 7,660 748 Updated Jun 24, 2024

Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context

Python 183 11 Updated Sep 10, 2024

Foundational model for human-like, expressive TTS

Python 3,906 661 Updated Jul 30, 2024

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 36,156 4,124 Updated Nov 7, 2024

The official GitHub page for the survey paper "A Survey of Large Language Models".

Python 10,515 822 Updated Aug 20, 2024
Python 574 27 Updated Feb 15, 2024