Starred repositories
NeMo text processing for ASR and TTS
Lightning-Fast, On-Device, Multilingual TTS — running natively via ONNX.
Automatic mastering plugin for live streaming, podcasts, and internet radio.
Open Vision Agents by Stream. Build Vision Agents quickly with any model or video provider. Uses Stream's edge network for ultra-low latency.
An NVDA add-on that provides structured navigation (headings, lists, tables, etc.) in editable text areas for Markdown files.
Application designed to optimize, customize and enhance your Windows experience.
Official inference framework for 1-bit LLMs
A free, online learning platform to make quality education accessible for all.
Financial data platform for analysts, quants and AI agents.
SenseUI is an AI-powered web extension providing web design feedback for blind and visually impaired developers. Join our community as tester, developer or advisor to help advance this project and …
A TTS that fits in your CPU (and pocket)
A list of Python problems for beginner and intermediate developers.
A module to create readable `"multipart/form-data"` streams. Can be used to submit forms and file uploads to other web applications.
Simple configuration GUI for FlexASIO
A flexible universal ASIO driver that uses the PortAudio sound I/O library. Supports WASAPI (shared and exclusive), KS, DirectSound and MME.
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
An accessible, light-weight, cross-platform ebook and document reader.
GUI for a Vocal Remover that uses Deep Neural Networks.
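Bili23 Downloader is a cross-platform (Windows/Linux/macOS) Bilibili video downloader that can download user-uploaded videos, anime series, movies, and other types of Bilibili videos. It supports multithreaded acceleration and resumable downloads, and combines a graphical interface with zero-configuration operation for a fast, convenient download experience.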
A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
VITS2 backbone with multilingual BERT
Just 1 minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
C# binding for PortAudio supporting Linux, macOS, Windows, and iOS
SoulX-FlashTalk is the first 14B model to achieve sub-second start-up latency (0.87s) while maintaining a real-time throughput of 32 FPS on an 8xH800 node.
An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators.