cpdu

Chenpeng Du cpdu

68 followers · 10 following

Stars

microsoft / VibeVoice

Open-Source Frontier Voice AI

Python · 42,360 stars · 4,847 forks · Updated Apr 24, 2026

athrowaway2021 / comix

Seamlessly download and de-drm comics and manga from Kindle in highest possible quality

Python · 92 stars · 25 forks · Updated Feb 3, 2024

X-LANCE / KWStreamingSearch

Python · 84 stars · 5 forks · Updated Jun 25, 2025

salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Jupyter Notebook · 11,210 stars · 1,105 forks · Updated Nov 18, 2024

haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python · 24,722 stars · 2,762 forks · Updated Aug 12, 2024

gpt-omni / mini-omni

open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming audio output conversational capabilities.

Python · 3,539 stars · 310 forks · Updated Nov 5, 2024

zai-org / GLM-4

GLM-4 series: Open Multilingual Multimodal Chat LMs | Open-source multilingual multimodal chat models

Python · 7,076 stars · 618 forks · Updated Jul 4, 2025

tianweiy / DMD2

(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis

Python · 1,312 stars · 71 forks · Updated Mar 5, 2025

QwenLM / Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Python · 1,895 stars · 145 forks · Updated Jul 5, 2024

metavoiceio / metavoice-src

Foundational model for human-like, expressive TTS

Python · 4,194 stars · 690 forks · Updated Jul 30, 2024

mlabonne / llm-course

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

78,730 stars · 9,159 forks · Updated Feb 5, 2026

open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python · 9,777 stars · 811 forks · Updated Mar 25, 2026

francislata / unicats

An unofficial implementation of "UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding".

Python · 26 stars · 1 fork · Updated Nov 4, 2023

sony / bigvsan

Pytorch implementation of BigVSAN

Python · 202 stars · 18 forks · Updated Dec 9, 2025

modelscope / FunCodec

FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.

Python · 443 stars · 34 forks · Updated Jan 25, 2024

descriptinc / descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python · 1,774 stars · 178 forks · Updated Jan 26, 2026

Plachtaa / VALL-E-X

An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io/vallex/

Python · 7,953 stars · 780 forks · Updated Feb 11, 2024

lifeiteng / vall-e

PyTorch implementation of VALL-E(Zero-Shot Text-To-Speech), Reproduced Demo https://lifeiteng.github.io/valle/index.html

Python · 2,207 stars · 333 forks · Updated Sep 10, 2025

lucidrains / voicebox-pytorch

Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch

Python · 683 stars · 55 forks · Updated Oct 1, 2024

sp-nitech / diffsptk

A differentiable version of SPTK

Python · 197 stars · 20 forks · Updated Mar 26, 2026

facebookresearch / seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook · 11,770 stars · 1,174 forks · Updated Apr 8, 2026

k2-fsa / icefall

Python · 1,404 stars · 409 forks · Updated Apr 12, 2026

SpeechifyInc / Meta-voicebox

Implementation of Meta-Voicebox : The first generative AI model for speech to generalize across tasks with state-of-the-art performance.

593 stars · 32 forks · Updated Jun 19, 2023

WelkinYang / WaveODE

An ODE-based generative neural vocoder using Rectified Flow

Python · 58 stars · 6 forks · Updated Apr 29, 2023

deepspeedai / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python · 42,198 stars · 4,807 forks · Updated Apr 24, 2026

hpcaitech / ColossalAI

Making large AI models cheaper, faster and more accessible

Python · 41,378 stars · 4,518 forks · Updated Apr 13, 2026

liusongxiang / Large-Audio-Models

Keep track of big models in audio domain, including speech, singing, music etc.

509 stars · 31 forks · Updated Sep 26, 2024

NVIDIA-NeMo / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python · 17,133 stars · 3,403 forks · Updated Apr 26, 2026

NVIDIA / BigVGAN

Official PyTorch implementation of BigVGAN (ICLR 2023)

Python · 1,207 stars · 145 forks · Updated Sep 5, 2024

yoyolicoris / music-spectrogram-diffusion-pytorch

Python · 88 stars · 6 forks · Updated Jan 29, 2023