Yuancheng0625 HeCheng0625

A PhD student at CUHK(SZ).

56 followers · 4 following

Starred repositories

deepseek-ai / Janus

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

Python · 861 stars · 41 forks · Updated Oct 23, 2024

THUDM / GLM-4-Voice

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python · 1,867 stars · 127 forks · Updated Oct 30, 2024

open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python · 5,866 stars · 450 forks · Updated Oct 29, 2024

opendatalab / LOKI

The official implementation of the paper “LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models”

Python · 106 stars · 1 fork · Updated Oct 28, 2024

amphionspace / SD-Eval

[NeurIPS 2024] SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

Python · 42 stars · 1 fork · Updated Jun 25, 2024

bytedance / 1d-tokenizer

This repo contains the code for our paper “An Image is Worth 32 Tokens for Reconstruction and Generation”

Jupyter Notebook · 444 stars · 17 forks · Updated Oct 16, 2024

karpathy / minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Python · 9,157 stars · 854 forks · Updated Jul 1, 2024

FunAudioLLM / CosyVoice

A multilingual large voice generation model with full-stack support for inference, training, and deployment.

Python · 5,931 stars · 635 forks · Updated Oct 22, 2024

liutaocode / TTS-arxiv-daily

Automatically updates text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)

Python · 264 stars · 20 forks · Updated Oct 30, 2024

coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python · 35,044 stars · 4,273 forks · Updated Aug 16, 2024

collabora / WhisperSpeech

An Open Source text-to-speech system built by inverting Whisper.

Jupyter Notebook · 3,915 stars · 212 forks · Updated Jun 18, 2024

Lightning-AI / torchmetrics

Machine learning metrics for distributed, scalable PyTorch applications.

Python · 2,127 stars · 404 forks · Updated Oct 29, 2024

sh-lee-prml / HierSpeechpp

The official implementation of HierSpeech++

Python · 1,176 stars · 134 forks · Updated Feb 20, 2024

lifeiteng / vall-e

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

Python · 2,035 stars · 319 forks · Updated Nov 14, 2023

NVIDIA / BigVGAN

Official PyTorch implementation of BigVGAN (ICLR 2023)

Python · 870 stars · 100 forks · Updated Sep 5, 2024

lucidrains / vector-quantize-pytorch

Vector (and Scalar) Quantization, in PyTorch

Python · 2,555 stars · 204 forks · Updated Oct 23, 2024

facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…

Python · 20,861 stars · 2,129 forks · Updated Jul 18, 2024

suno-ai / bark

🔊 Text-Prompted Generative Audio Model

Jupyter Notebook · 35,900 stars · 4,224 forks · Updated Aug 19, 2024

facebookresearch / AudioDec

An Open-source Streaming High-fidelity Neural Audio Codec

Python · 430 stars · 20 forks · Updated Oct 28, 2024

neonbjb / tortoise-tts

A multi-voice TTS system trained with an emphasis on quality

Jupyter Notebook · 13,137 stars · 1,814 forks · Updated Aug 19, 2024

lucidrains / naturalspeech2-pytorch

Implementation of NaturalSpeech 2, a zero-shot speech and singing synthesizer, in PyTorch

Python · 1,276 stars · 99 forks · Updated Sep 24, 2023

karpathy / minGPT

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Python · 20,062 stars · 2,496 forks · Updated Aug 15, 2024

Vision-CAIR / MiniGPT-4

Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)

Python · 25,390 stars · 2,911 forks · Updated Sep 2, 2024

Significant-Gravitas / AutoGPT

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

Python · 167,910 stars · 44,326 forks · Updated Oct 30, 2024