A mobile-first web app that reads academic papers aloud in a cloned voice, built for passive listening at moments when you want to engage your auditory attention while occupied with another task, e.g. long commutes or boring physical chores.
Professional local-first AI production pipeline for long-form narration. Clone voices and generate studio-grade audiobooks (M4B/MP3) using Coqui XTTS-v2, with support for Voxtral (cloud).
Voice Stack is the speech stack I built for my own homelab so I could keep ASR and TTS workloads close to the data I care about: mostly Bazarr in my media server, plus my OpenWebUI containers so I can literally talk to my AI.
VoxLibri: The Ultimate AI-Powered eBook to Audiobook Converter. 🎧📚 Transform any eBook into a high-quality audiobook with state-of-the-art neural Text-to-Speech (TTS) technology. Featuring voice cloning, multi-language support (English, Tamil, and more), and a sleek Streamlit UI for a premium narration experience.
A professional, high-fidelity AI Text-to-Speech (TTS) and Voice Cloning engine. Built using Coqui XTTS v2 for realistic, 100% offline vocal synthesis and custom voice cloning with privacy and performance at its core.
A locally running Turkish text-to-speech application developed with Coqui XTTS v2 and Gradio. Generate Turkish voiceovers with multiple speakers on your own computer.
A full-stack voice-to-voice AI agent built with a React.js frontend and a Flask backend. Users interact by voice; their speech is processed by an LLM (Google Gemini), and the response is spoken back.
Free voice cloning for creators using Coqui XTTS-v2 on Google Colab. Clone your voice with just a few minutes of audio. Complete guide to build your own notebook.
Virtual AI Assistant – A scalable, real-time AI-powered avatar platform with Live2D, WebSockets, LLM integration, and TTS, designed for immersive experiences and future AR/VR extensions.