- Shanghai
- https://x.com/FeitengLi
- @FeitengLi
-
FSA/FST algorithms, differentiable, with PyTorch compatibility.
Cuda Apache License 2.0 Updated Dec 13, 2025
-
OmniSenseVoice Public
Omni SenseVoice: High-Speed Speech Recognition with word timestamps 🗣️🎯
-
VibeVoice Public
Forked from microsoft/VibeVoice
Open-Source Frontier Voice AI
Python MIT License Updated Dec 9, 2025
-
DeepPhonemizer Public
Forked from spring-media/DeepPhonemizer
Grapheme to phoneme conversion with deep learning.
-
sebbs Public
Forked from merlresearch/sebbs
Prediction of sound event bounding boxes (SEBBs)
-
NeMo Public
Forked from NVIDIA-NeMo/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
-
torch-audiomentations Public
Forked from iver56/torch-audiomentations
Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.
-
pysubs2 Public
Forked from tkarabela/pysubs2
A Python library for editing subtitle files
Python MIT License Updated Nov 16, 2025
-
am-cf-tunnel Public template
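Forked from amclubs/am-cf-tunnel
A script for the Cloudflare Workers and Pages platforms, adapted from EDtunnel. It automatically generates free VLESS and Trojan nodes, and the resulting configuration can be converted with an online converter for use in tools such as Clash, Singbox, and Quantumult X.
JavaScript Apache License 2.0 Updated Oct 20, 2025
-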
Wan2.2 Public
Forked from Wan-Video/Wan2.2
Wan: Open and Advanced Large-Scale Video Generative Models
-
DiffSynth-Studio Public
Forked from modelscope/DiffSynth-Studio
Enjoy the magic of Diffusion models!
Python Apache License 2.0 Updated Sep 19, 2025
-
vall-e Public
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
-
wenyan-mcp Public
Forked from caol64/wenyan-mcp
Wenyan MCP Server lets an AI automatically format Markdown articles and publish them to WeChat Official Accounts.
CSS Updated Aug 7, 2025
-
DiscoSeqSampler Public
Distributed Coordinated Sequence Sampler
-
flash-attention Public
Forked from Dao-AILab/flash-attention
Fast and memory-efficient exact attention
Python BSD 3-Clause "New" or "Revised" License Updated Jul 24, 2025
-
NotebookTTS Public
Text-To-Speech for NotebookLM
-
ZipVoice Public
Forked from k2-fsa/ZipVoice
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
Python Apache License 2.0 Updated Jul 18, 2025
-
Magic-TryOn Public
Forked from vivoCameraResearch/Magic-TryOn
MagicTryOn is a video virtual try-on framework based on a large-scale video diffusion Transformer.
Python Other Updated Jun 16, 2025
-
HunyuanVideo-Avatar Public
Forked from Tencent-Hunyuan/HunyuanVideo-Avatar
Python Other Updated Jun 9, 2025
-
descript-audio-codec Public
Forked from descriptinc/descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
-
lhotse Public
Forked from lhotse-speech/lhotse
Tools for handling speech data in machine learning projects.
-
Wan2.1 Public
Forked from Wan-Video/Wan2.1
Wan: Open and Advanced Large-Scale Video Generative Models
Python Apache License 2.0 Updated May 9, 2025
-
HunyuanCustom Public
Forked from Tencent-Hunyuan/HunyuanCustom
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Python Other Updated May 8, 2025
-
Aligner-SUPERB Public
Speech-To-Text forced-alignment Speech processing Universal PERformance Benchmark
-
fairseq Public
Forked from facebookresearch/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Python MIT License Updated Mar 1, 2025
-
audiossl Public
Forked from Audio-WestlakeU/audiossl
A library built for easier audio self-supervised training and downstream task evaluation