Starred repositories
A feature-rich command-line audio/video downloader
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
Python tool for converting files and office documents to Markdown.
The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.
100+ AI Agent & RAG apps you can actually run — clone, customize, ship.
💫 Toolkit to help you get started with Spec-Driven Development
🚀🤖 Crawl4AI: Open-source LLM-friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
Get your documents ready for gen AI
The original local LLM interface. Text, vision, tool-calling, training. UI + API, 100% offline and private.
A community-supported supercharged document management system: scan, index and archive all your documents
Official inference framework for 1-bit LLMs
[EMNLP2025] "LightRAG: Simple and Fast Retrieval-Augmented Generation"
Convert PDF to markdown + JSON quickly with high accuracy
Intelligent automation and multi-agent orchestration for Claude Code
Official inference repo for FLUX.1 models
A Gemini 2.5 Flash-level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone
Multilingual large voice generation model providing full-stack inference, training and deployment capabilities.
A TTS model capable of generating ultra-realistic dialogue in one pass.
A collaborative note taking, wiki and documentation platform that scales. Built with Django and React.
A collection of projects designed to help developers quickly get started with building deployable applications using the Claude API
An open-source, privacy-focused alternative to NotebookLM for teams, with no data limits. Join our Discord: https://discord.gg/ejRNvftDp9
High-Resolution 3D Asset Generation with Large-Scale Hunyuan3D Diffusion Models.
Self-hosted video downloader for YouTube and other sites (web UI for youtube-dl / yt-dlp)
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.