Stars
📷 [CVPR'26] Camera-controlled text-to-video generation, now with intrinsics, distortion and orientation control!
FireRed-Image-Edit is a powerful image editing foundation model achieving open-source state-of-the-art performance with precise instruction following, high-fidelity generation, superior identity co…
MAMBO-G: A training-free, magnitude-aware adaptive guidance framework for accelerating Classifier-Free Guidance (CFG). Dynamically mitigates early-step overshoot in flow-matching models (SD3.5, Qwe…
An agentic skills framework & software development methodology that works.
A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
An Open-Source Project to Unify Audio Processing and Generation
A ComfyUI custom node for 3D camera angle control. Provides an interactive Three.js viewport to adjust camera angles and outputs formatted prompt strings for multi-angle image generation.
Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.
Google Antigravity AI model quota monitoring plugin (Antigravity AI Model Quota Watching)
Professional Antigravity Account Manager & Switcher. One-click seamless account switching for Antigravity Tools. Built with Tauri v2 + React (Rust). A professional Antigravity account management and switching tool, giving Antigravity one-click seamless account switch…
STEP-GUI: The top GUI agent solution in the galaxy. Developed by the StepFun-GELab team and powered by StepFun’s cutting-edge research capabilities.
Compress any video or image to a tiny size. 100% free & open-source. Available for Mac, Windows & Linux.
Send files and folders anywhere in the world without storing them in the cloud: any size, any format, no accounts, no restrictions.
One-command vLLM installation for NVIDIA DGX Spark with Blackwell GB10 GPUs (sm_121 architecture)
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
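A NovelAI batch generation tool with a WebUI. Supports batch text-to-image, image-to-image, inpainting, director tools, character regions, character reference, wildcards, upscaling and denoising, metadata parsing and erasure, reverse tag inference, image filtering, and plugin loading!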
A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech
Oh my tmux! My self-contained, pretty & versatile tmux configuration made with 💛🩷💙🖤❤️🤍
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
ACE-Step: A Step Towards Music Generation Foundation Model
A terminal-based dashboard for managing cron jobs locally and on servers.
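⭐AI-driven public opinion & trend monitor with multi-platform aggregation, RSS, and smart alerts. 🎯 Say goodbye to information overload: your AI public-opinion monitoring assistant and trending-topic filter! Aggregates trending topics from multiple platforms plus RSS subscriptions, with precise keyword filtering. AI-powered news filtering + AI translation + AI analysis briefings pushed straight to your phone; also supports integration with the MCP architecture…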
Official implementation of WorldForge [Code Released 🔥 & Accepted by CVPR 2026! 🎉]
My big collection of creative nano-banana ideas! Continuously updated!
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer