Stars
"ViMax: Agentic Video Generation (Director, Screenwriter, Producer, and Video Generator All-in-One)"
The retrieval layer for production AI systems. Lightning-fast (<10ms) search without vector databases. Built for browser, edge, on-device, and cloud.
A standalone desktop/smart TV overlay that translates system audio into 3D Sign Language animation in real time.
Open-source American English TTS model. 6 voices and a high-performance inference library for Apple Silicon.
[SIGGRAPH 2026] Pixal3D: Pixel-Aligned 3D Generation from Images
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight).
[ECCV 2026] Implementation of "Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length"
Self-hosted, real-time digital human agent platform. Build voice-first AI agents with WebRTC, persona memory, tools, RAG, and optional digital-human video.
FlashRT is a high-performance realtime inference engine for small-batch, latency-sensitive AI workloads. The flagship integration is production VLA control for Pi0, Pi0.5, GROOT N1.6, and Pi0-FAST.…
Browser-based text-to-speech powered by OmniVoice. Runs entirely locally via WebGPU and WebAssembly.
Open source video conferencing app powered by LiveKit. Built with Django and React.
Self-hosted DTLN noise suppression plugin for LiveKit Agents — no cloud API, no per-minute fees
Building a truly open-source multilingual TTS, including the dataset, covering more than 150 languages with voice cloning.
A framework for efficient model inference with omni-modality models
🎙️ VoxSherpa TTS Offline Neural Text-to-Speech Engine for Android ⚡ Sherpa-ONNX powered 🔊 Natural voice synthesis 📱 Fully offline processing 🚀 No cloud • No limits
Vietnamese TTS with instant voice cloning • On-device • Real-time CPU inference • 24 kHz audio quality • Converts text to Vietnamese speech
Automate the process of making money online.
VoXtream is a Full-Stream Zero-shot TTS model with Extremely Low Latency and Speaking Rate Control
This repository contains the official code for LPIPS-AttnWav2Lip. The paper has been accepted by the journal Speech Communication.
Detect Anything in Real Time: Real-time object detection using frontier object detection models.
The open-source app everyone uses to manage agents at work
SOTA industrial-grade Voice Activity Detection & Audio Event Detection, supporting 100+ languages and outperforming Silero-VAD, TEN-VAD, FunASR-VAD and WebRTC-VAD
Real-time voice-to-avatar interaction server combining the OpenAI Realtime API for conversational AI with an audio-to-expression model for synchronized avatar facial animation.
🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!