- Shanghai
- https://x.com/FeitengLi
- @FeitengLi
Stars
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
Multilingual TTS model with voice cloning and duration control, based on the T5Gemma encoder-decoder LLM
A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.
PyTorch implementation of ReaLchords, ReaLJam and GAPT: real-time music accompaniment systems with generative models trained via reinforcement learning
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters
Kaldi-compatible online fbank extractor without external dependencies
Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform
My attempt to improve the speed of the Newton-Schulz algorithm, starting from the Dion implementation.
WebCodecs is a flexible web API for encoding and decoding audio and video.
Foundational Model for Speech Recognition Tasks
Official inference repo for FLUX.2 models
The fastest way to create an HTML app
Text-to-text alignment algorithm for speech recognition error analysis.
🎥 Python and OpenCV-based scene cut/transition detection program & library.
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
QuantMind is an intelligent knowledge extraction and retrieval framework for quantitative finance.
Open-source reproducible benchmarks from Argmax
A free, open source, and extensible speech-to-text application that works completely offline.
Voice-to-text app for macOS to transcribe what you say to text almost instantly
Official Implementations for Paper - HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives
DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Data Pipeline, Models, and Benchmark for Omni-Captioner.
Official code for "DiaMoE-TTS: A Unified IPA-based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation"
LongLive: Real-time Interactive Long Video Generation
【Accepted by TPAMI】Human Motion Video Generation: A Survey (https://ieeexplore.ieee.org/document/11106267)
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference