Lists (30)
agent
alg
architecture
audio
backend
conditioning
diffusion
disentangle
flow
frontend
infra
language
llm
lora
manifold
ml_materials
mlops
MoE
monitoring_and_operation
music
optimization
personalization
quantization
reinforcement_learning
Scala
small_model
style_transfer
video
vision
web
Starred repositories
litagin02 / Style-Bert-VITS2
Forked from fishaudio/Bert-VITS2
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles.
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
Safe direct-style streaming, concurrency and resiliency for Scala on the JVM
Open-source speech AI models from KRAFTON, including Raon-Speech and Raon-SpeechChat for speech understanding, generation, and real-time full-duplex conversation.
Official implementation of AsymFlow, pi-Flow, GMFlow
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
First foundation ASR built for the real world - 7 atomic acoustic conditions, 54 compound scenarios, 2.6M samples, and up to ~30% gains over SOTA where every other model falls apart. **You'll come …
Multilingual speech understanding: ASR + emotion recognition + audio event detection. 50+ languages, 15x faster than Whisper, non-autoregressive.
A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone
Just 1 minute of voice data is enough to train a good TTS model (few-shot voice cloning).
Create stunning demos for free. Open-source, no subscriptions, no watermarks, and free for commercial use. An alternative to Screen Studio.
Warp is an agentic development environment, born out of the terminal.
High-Quality Voice Cloning TTS for 600+ Languages
A Java port of ratatui — build rich terminal UIs from Java
The first continuous diffusion language model that rivals discrete counterparts on standard language modeling benchmarks like LM1B and OpenWebText.
Hearth fire starter - incubator/dogfooding for Hearth-based macro libraries
🪨 why use many token when few token do trick — Claude Code skill that cuts 65% of tokens by talking like caveman
The open-source managed agents platform. Turn coding agents into real teammates — assign tasks, track progress, compound skills.
TriAttention — Efficient long reasoning with trigonometric KV cache compression. Enables OpenClaw local deployment on memory-constrained GPUs.
CLI tool for coding agents and developers to query the public API of any Maven JVM dependency — get symbol signatures, list packages, search by name, and inspect dependency trees. Powered by Coursi…