Repositories starred by HeCheng0625

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

Python 861 41 Updated Oct 23, 2024

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python 1,867 127 Updated Oct 30, 2024

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 5,866 450 Updated Oct 29, 2024

The official implementation of the paper “LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models”

Python 106 1 Updated Oct 28, 2024

[NeurIPS 2024] SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

Python 42 1 Updated Jun 25, 2024

This repo contains the code for our paper “An Image is Worth 32 Tokens for Reconstruction and Generation”

Jupyter Notebook 444 17 Updated Oct 16, 2024

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Python 9,157 854 Updated Jul 1, 2024

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 5,931 635 Updated Oct 22, 2024

Automatically update text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)

Python 264 20 Updated Oct 30, 2024

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 35,044 4,273 Updated Aug 16, 2024

An Open Source text-to-speech system built by inverting Whisper.

Jupyter Notebook 3,915 212 Updated Jun 18, 2024

Machine learning metrics for distributed, scalable PyTorch applications.

Python 2,127 404 Updated Oct 29, 2024

The official implementation of HierSpeech++

Python 1,176 134 Updated Feb 20, 2024

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

Python 2,035 319 Updated Nov 14, 2023

Official PyTorch implementation of BigVGAN (ICLR 2023)

Python 870 100 Updated Sep 5, 2024

Vector (and Scalar) Quantization, in Pytorch

Python 2,555 204 Updated Oct 23, 2024

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…

Python 20,861 2,129 Updated Jul 18, 2024

🔊 Text-Prompted Generative Audio Model

Jupyter Notebook 35,900 4,224 Updated Aug 19, 2024

An Open-source Streaming High-fidelity Neural Audio Codec

Python 430 20 Updated Oct 28, 2024

A multi-voice TTS system trained with an emphasis on quality

Jupyter Notebook 13,137 1,814 Updated Aug 19, 2024

Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch

Python 1,276 99 Updated Sep 24, 2023

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Python 20,062 2,496 Updated Aug 15, 2024

Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)

Python 25,390 2,911 Updated Sep 2, 2024

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

Python 167,910 44,326 Updated Oct 30, 2024

PyTorch implementations of Generative Adversarial Networks.

Python 16,387 4,071 Updated Jun 18, 2024

⚡LLM Zoo is a project that provides data, models, and evaluation benchmark for large language models.⚡

Python 2,932 200 Updated Nov 26, 2023

Official repo for consistency models.

Python 6,126 412 Updated Mar 22, 2024

This repository contains metadata of the WavCaps dataset and code for downstream tasks.

Python 204 12 Updated Jul 25, 2024

Let us control diffusion models!

Python 30,212 2,721 Updated Feb 25, 2024

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 36,999 5,864 Updated Aug 19, 2024