Stars
MiniCPM-o 4.5: A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
An Open-Source Project to Unify Audio Processing and Generation
A ComfyUI custom node for 3D camera angle control. Provides an interactive Three.js viewport to adjust camera angles and outputs formatted prompt strings for multi-angle image generation.
Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.
Google Antigravity AI model quota monitoring plugin (Antigravity AI Model Quota Watching)
Professional Antigravity Account Manager & Switcher. One-click seamless account switching for Antigravity Tools. Built with Tauri v2 + React (Rust).
STEP-GUI: The top GUI agent solution in the galaxy. Developed by the StepFun-GELab team and powered by StepFun’s cutting-edge research capabilities.
Compress any video to a tiny file size.
Send files and folders anywhere in the world without storing them in the cloud: any size, any format, no accounts, no restrictions.
One-command vLLM installation for NVIDIA DGX Spark with Blackwell GB10 GPUs (sm_121 architecture)
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
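A NovelAI batch generation tool with a WebUI. Supports batch text-to-image, image-to-image, inpainting, director tools, character regions, character reference, wildcards, upscaling and denoising, metadata parsing and removal, reverse tag inference, image filtering, and plugin loading!
A powerful 3B-parameter, LLM-based audio editing model trained with reinforcement learning that excels at editing emotion, speaking style, and paralinguistics, and also offers robust zero-shot text-to-speech.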
Oh my tmux! My self-contained, pretty & versatile tmux configuration made with 💛🩷💙🖤❤️🤍
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
ACE-Step: A Step Towards Music Generation Foundation Model
A terminal-based dashboard for managing cron jobs locally and on servers.
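⭐AI-driven public opinion & trend monitor with multi-platform aggregation, RSS, and smart alerts. 🎯 Say goodbye to information overload: your AI assistant for public-opinion monitoring and trending-topic filtering! Aggregates trending topics from multiple platforms plus RSS subscriptions, with precise keyword filtering. AI translation and AI analysis briefings are pushed straight to your phone; also supports integration with the MCP architecture, empowering AI natural-language…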
Official implementation of WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance
My big collection of creative nano-banana use cases! Continuously updated!
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
[arXiv 2025] ObjFiller-3D: Consistent Multi-view 3D Inpainting via Video Diffusion Models
The open-source CapCut alternative
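"The Open-Source Large Model Usage Guide": a tutorial tailored for Chinese users on quickly fine-tuning (full-parameter/LoRA) and deploying open-source large language models (LLMs) and multimodal large language models (MLLMs), from China and abroad, in a Linux environment.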