289 stars written in Python

Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 12,096 1,209 Updated Nov 4, 2025

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.

Python 12,067 1,070 Updated Oct 29, 2025

Minimal and clean examples of machine learning algorithm implementations

Python 10,919 1,774 Updated Jun 15, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…

Python 10,917 947 Updated Nov 6, 2025

A PyTorch-based Speech Toolkit

Python 10,736 1,596 Updated Nov 6, 2025

Spark-TTS Inference Code

Python 10,682 1,139 Updated Apr 9, 2025

Enjoy the magic of Diffusion models!

Python 10,591 988 Updated Nov 6, 2025

Implementation of Denoising Diffusion Probabilistic Model in PyTorch

Python 10,136 1,226 Updated Aug 4, 2025

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Python 10,116 974 Updated Jul 1, 2024

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Python 9,988 1,665 Updated Nov 6, 2025

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 9,482 766 Updated May 27, 2025

Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)

Python 9,262 386 Updated Aug 12, 2025

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 9,074 828 Updated Nov 3, 2025

Nano vLLM

Python 8,379 1,022 Updated Nov 3, 2025

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

Python 8,365 733 Updated Aug 13, 2024

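Extracts hard-coded subtitles from videos and generates srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitles (hardsub) from videos and generating srt files.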

Python 8,002 829 Updated Aug 21, 2025

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/

Python 7,965 789 Updated Feb 11, 2024

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Python 7,725 1,382 Updated Dec 6, 2023

Text-audio foundation model from Boson AI

Python 7,573 560 Updated Sep 15, 2025

Multilingual Voice Understanding Model

Python 6,886 641 Updated Aug 15, 2025

Official repo for consistency models.

Python 6,432 434 Updated Mar 22, 2024

Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation.

Python 6,213 980 Updated Apr 4, 2025

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Python 6,037 629 Updated Aug 10, 2024
Python 6,014 463 Updated Aug 29, 2025

The Unofficial TikTok API Wrapper In Python

Python 5,870 1,112 Updated Oct 14, 2025

Towards Human-Sounding Speech

Python 5,699 484 Updated May 6, 2025

A concise but complete full-attention transformer with a set of promising experimental features from various papers

Python 5,668 485 Updated Nov 6, 2025

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Python 5,589 590 Updated Nov 2, 2025

An Open Source Python alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI

Python 5,586 648 Updated Oct 31, 2025

Inference and training library for high-quality TTS models.

Python 5,463 581 Updated Dec 10, 2024