Huu Tuong Tu huutuongtu

😀

Huh?

Strygwyr

16 followers · 62 following

Vietnam

Lists (16)

Stars

289 stars written in Python

zai-org / CogVideo

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 12,096 1,209 Updated Nov 4, 2025

PKU-YuanGroup / Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.

Python 12,067 1,070 Updated Oct 29, 2025

rushter / MLAlgorithms

Minimal and clean examples of machine learning algorithm implementations

Python 10,919 1,774 Updated Jun 15, 2025

modelscope / ms-swift

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…

Python 10,917 947 Updated Nov 6, 2025

speechbrain / speechbrain

A PyTorch-based Speech Toolkit

Python 10,736 1,596 Updated Nov 6, 2025

SparkAudio / Spark-TTS

Spark-TTS Inference Code

Python 10,682 1,139 Updated Apr 9, 2025

modelscope / DiffSynth-Studio

Enjoy the magic of Diffusion models!

Python 10,591 988 Updated Nov 6, 2025

lucidrains / denoising-diffusion-pytorch

Implementation of Denoising Diffusion Probabilistic Model in Pytorch

Python 10,136 1,226 Updated Aug 4, 2025

karpathy / minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Python 10,116 974 Updated Jul 1, 2024

triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Python 9,988 1,665 Updated Nov 6, 2025

open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 9,482 766 Updated May 27, 2025

arogozhnikov / einops

Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)

Python 9,262 386 Updated Aug 12, 2025

kyutai-labs / moshi

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 9,074 828 Updated Nov 3, 2025

GeeeekExplorer / nano-vllm

Nano vLLM

Python 8,379 1,022 Updated Nov 3, 2025

netease-youdao / EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

Python 8,365 733 Updated Aug 13, 2024

YaoFANGUK / video-subtitle-extractor

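Extracts hard-coded subtitles from videos and generates srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitles (hardsub) from videos and generating srt files.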

Python 8,002 829 Updated Aug 21, 2025

Plachtaa / VALL-E-X

An open source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/

Python 7,965 789 Updated Feb 11, 2024

jaywalnut310 / vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Python 7,725 1,382 Updated Dec 6, 2023

boson-ai / higgs-audio

Text-audio foundation model from Boson AI

Python 7,573 560 Updated Sep 15, 2025

FunAudioLLM / SenseVoice

Multilingual Voice Understanding Model

Python 6,886 641 Updated Aug 15, 2025

openai / consistency_models

Official repo for consistency models.

Python 6,432 434 Updated Mar 22, 2024

rtqichen / torchdiffeq

Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation.

Python 6,213 980 Updated Apr 4, 2025

yl4579 / StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Python 6,037 629 Updated Aug 10, 2024

bytedance / MegaTTS3

Python 6,014 463 Updated Aug 29, 2025

davidteather / TikTok-Api

The Unofficial TikTok API Wrapper In Python

Python 5,870 1,112 Updated Oct 14, 2025

canopyai / Orpheus-TTS

Towards Human-Sounding Speech

Python 5,699 484 Updated May 6, 2025

lucidrains / x-transformers

A concise but complete full-attention transformer with a set of promising experimental features from various papers

Python 5,668 485 Updated Nov 6, 2025

mindee / doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Python 5,589 590 Updated Nov 2, 2025

souzatharsis / podcastfy

An Open Source Python alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI

Python 5,586 648 Updated Oct 31, 2025

huggingface / parler-tts

Inference and training library for high-quality TTS models.

Python 5,463 581 Updated Dec 10, 2024

Lists (16)

Aligner

Audio Enhancement

DATASET

improve_model_architecture

Interactive AI

MDD

MLOPS

SE

Singing Voice

Speaker Diarization

Speech LLM

Speech quality assessment

Speech Separation

Speech Tokenizer

Tool

trader
