Stars
Real-time face swap and one-click video deepfake with only a single image
A high-throughput and memory-efficient inference and serving engine for LLMs
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
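A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating SRT files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that covers subtitle region detection and subtitle text extraction.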
The Unofficial TikTok API Wrapper In Python
An Open Source Python alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI
PyTorch implementation of VALL-E (zero-shot text-to-speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
Audio generation using diffusion models, in PyTorch.
Omni SenseVoice: High-Speed Speech Recognition with word timestamps 🗣️🎯
AudioLDM training, finetuning, evaluation and inference.
Audio Codec Speech processing Universal PERformance Benchmark
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
Sequence alignment methods with helpers for PyTorch.