Stars
A simple, hands-free Python voice assistant that runs 100% locally. This script uses openwakeword for wakeword detection, webrtcvad for silence detection, OpenAI's Whisper for transcription, and Ol…
Implementations of select OEIS integer sequences in Python 3.
This is a foundational template for a 3D multiplayer game, developed in Godot Engine 4.6.
Local voice AI powered by Ollama, Kokoro, Nemotron STT, and LiveKit.
A complete voice AI starter for LiveKit Agents with Python.
Voice Activity Detector (VAD): low-latency, high-performance, and lightweight
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
A framework for building realtime voice AI agents 🤖🎙️📹
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…
A free, open source, and extensible speech-to-text application that works completely offline.
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.
Open-source framework for conversational voice AI agents
Open Source framework for voice and multimodal conversational AI
Full stack, modern web application template. Using FastAPI, React, SQLModel, PostgreSQL, Docker, GitHub Actions, automatic HTTPS and more.
RandomX, KawPow, CryptoNight and GhostRider unified CPU/GPU miner and RandomX benchmark
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Speech To Speech: an effort for an open-sourced and modular GPT-4o
Build local voice agents with open-source models
projectM - Cross-platform Music Visualization Library. Open-source and Milkdrop-compatible.
A high-throughput and memory-efficient inference and serving engine for LLMs
insanely-fast-whisper with support for AMD GPUs with ROCm 6.1 - 7.1
Robust Speech Recognition via Large-Scale Weak Supervision
ACE-Step: A Step Towards Music Generation Foundation Model
Installation script for AI applications using ROCm on Linux.
Desktop AI Assistant powered by GPT-5, GPT-4, o1, o3, Gemini, Claude, Ollama, DeepSeek, Perplexity, Grok, Bielik, chat, vision, voice, RAG, image and video generation, agents, tools, MCP, plugins, …