Stars
Soprano: Instant, Ultra-Realistic Text-to-Speech
SpeechPlus: Small LLM-Based Text-to-Speech Library 🚀
A multilingual large voice-generation model with full-stack support for inference, training, and deployment.
An open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skills, subagents, and a message gateway, it handles different levels of…
Hello World for agentic frameworks: minimal but runnable! LangGraph, Agno, AutoGen, Smolagents, OpenAI Agents, etc.
anan235 / dia-multilingual
Forked from nari-labs/dia
A TTS model capable of generating ultra-realistic dialogue in one pass.
Run Orpheus 3B Locally With LM Studio
NeMo text processing for ASR and TTS
Automatically create a crew and tasks for CrewAI
litagin02 / Style-Bert-VITS2
Forked from fishaudio/Bert-VITS2
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles.
Repository for a research project on audio watermarking
A lightweight end-to-end text-to-speech model
Download a YouTube video (or supply your own) and generate dual-language subtitles with OpenAI Whisper and a translation API (GPT)
This repo is a VITS fine-tuning pipeline for fast speaker-adaptation TTS and many-to-many voice conversion
[NO LONGER MAINTAINED] Command-line utility for auto-generating subtitles for any video file
Learn Python with Colaboratory (colab.research.google.com)
ManaTTS is the largest open Persian speech dataset, with 114+ hours of transcribed audio. It includes the data-collection pipeline and tools, and is suitable for training Persian text-to-speech models.
Turn PDFs and EPUBs into audiobooks, and subtitles or videos into dubbed videos (including translation), and more. For free. Pandrator uses local models, notably XTTS, including voice cloning (instant,…
A program to dub non-English media with modern AI speech synthesis, diarization, and voice cloning!
Noise removal/reduction for audio files in Python. Denoising is done with wavelets, and thresholding uses the VisuShrink technique.
A neural word aligner based on multilingual BERT
A Telegram bot that automatically reacts to posts in Telegram channels, groups, and private messages, built as a serverless application.✨
Fine-tune your VITS model starting from a pre-trained model