Stars
A free, open source, and extensible speech-to-text application that works completely offline.
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Paper2Agent is a multi-agent AI system that automatically transforms research papers into interactive AI agents with minimal human input.
Unofficial InstantDB Admin API client for Python.
This repository delivers end-to-end, code-first tutorials covering every layer of production-grade GenAI agents, guiding you from spark to scale with proven patterns and reusable blueprints for re…
A blazing fast AI Gateway with integrated guardrails. Route to 200+ LLMs, 50+ AI Guardrails with 1 fast & friendly API.
[SIGGRAPH Asia 2025] DreamO: A Unified Framework for Image Customization
AI Analytics and Knowledge Engine for RAG over large-scale, heterogeneous data. - The only MCP Server you'll ever need
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
Open Source Application for Advanced LLM + Diffusion Engineering: interact, train, fine-tune, and evaluate large language models on your own computer.
[ACM MM 2025] FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
Build Real-Time Knowledge Graphs for AI Agents
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
A TTS model capable of generating ultra-realistic dialogue in one pass.
[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning
Python tool for converting files and office documents to Markdown.
[ICCV 2025] 🔥🔥 UNO: A Universal Customization Method for Both Single and Multi-Subject Conditioning
davidbrowne17 / csm-streaming
Forked from SesameAILabs/csm
Realtime demo, Streaming and Finetuning code for CSM
A pipeline parallel training script for diffusion models.
A Conversational Speech Generation Model
A simple screen parsing tool towards pure vision based GUI agent
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.
[CVPR2025 Highlight] Video Generation Foundation Models: https://saiyan-world.github.io/goku/
Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
Official Implementation for "ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement" (ICCV 2021) https://arxiv.org/abs/2104.02699
The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data 🔥