Stars
An Open Source Python alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI
📄 Configuration files that enhance Cursor AI editor experience with custom rules and behaviors
A high-throughput and memory-efficient inference and serving engine for LLMs
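Real-time face swap and one-click video deepfake with only a single image
Omni SenseVoice: High-Speed Speech Recognition with word timestamps 🗣️🎯
Hard-subtitle extraction from videos, generating srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitles (hardsub) from videos and generating srt files.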
The Unofficial TikTok API Wrapper In Python
Sequence alignment methods with helpers for PyTorch.
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
Audio Codec Speech processing Universal PERformance Benchmark
AudioLDM training, finetuning, evaluation and inference.
Audio generation using diffusion models, in PyTorch.
serp-ai / bark-with-voice-clone
Forked from suno-ai/bark. 🔊 Text-prompted Generative Audio Model, with the ability to clone voices
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus.
📝 Algorithms and data structures implemented in JavaScript with explanations and links to further readings