Stars
Dependency-Aware Structural Retrieval for Massive Agent Skills
Let Skills Evolve Collectively with Agentic Evolver
Opinionated skills for AI coding agents to create stunning diagrams and visualizations directly in Markdown. These skills extend agent capabilities across diagram generation, data visualization, an…
MOSS-TTS-Nano is an open-source multilingual tiny speech generation model from MOSI.AI and the OpenMOSS team. With only 0.1B parameters, it is designed for realtime speech generation, can run direc…
JoyAI-Image is the unified multimodal foundation model for image understanding, text-to-image generation, and instruction-guided image editing.
CLI proxy that reduces LLM token consumption by 60-90% on common dev commands. Single Rust binary, zero dependencies
Official skills for the GLM family of models.
PixelSmile: Fine-grained facial expression editing with continuous control, reduced semantic entanglement, and strong identity preservation.
Multimodal OCR: Parse Anything from Documents
A fast, helpful, and open-source document parser
Covo-Audio is a 7B-parameter end-to-end large audio language model that directly processes continuous audio inputs and generates audio outputs within a single unified architecture.
QIE-Object-Remover-Bbox is an advanced, AI-powered image editing application specifically designed to perform precise object removal and background inpainting based on user-defined bounding box coo…
FireRed-Image-Edit is a powerful image editing foundation model achieving open-source state-of-the-art performance with precise instruction following, high-fidelity generation, superior identity co…
Speech recognition API service powered by FunASR and Qwen-ASR, supporting 52 languages, compatible with OpenAI API and Alibaba Cloud Speech API.
yuekaizhang / Fun-ASR-vllm
Forked from FunAudioLLM/Fun-ASR
Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.
[CVPR 2026] Official implementation of the paper "HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images".
🚀 AI Fully Automated Short Video Engine
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
[EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E2E Retrieval.
A unified and fully open-source framework for instruction-guided and reference-guided video editing using natural language.
Helios: Real Real-Time Long Video Generation Model
SoulX-FlashHead: A unified 1.3B-parameter framework designed for high-fidelity, infinite-length, and real-time streaming portrait video generation.
The library for web and native user interfaces.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞