Stars
GELab: GUI Exploration Lab. One of the best GUI agent solutions in the galaxy, built by the StepFun-GELab team and powered by Step’s research capabilities.
Compress any video to a tiny file size.
Send files and folders anywhere in the world without storing them in the cloud - any size, any format, no accounts, no restrictions.
One-command vLLM installation for NVIDIA DGX Spark with Blackwell GB10 GPUs (sm_121 architecture)
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
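A NovelAI batch generation tool with a WebUI. Supports batch text-to-image, image-to-image, inpainting, director tools, character regions, character reference, wildcards, upscaling and denoising, metadata parsing and erasure, reverse tag inference, image filtering, and plugin loading!
A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech.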
Oh my tmux! My self-contained, pretty & versatile tmux configuration made with 💛🩷💙🖤❤️🤍
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
ACE-Step: A Step Towards Music Generation Foundation Model
A terminal-based dashboard for managing cron jobs locally and on servers.
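🎯 Say goodbye to information overload: AI helps you make sense of trending news, with simple public-opinion monitoring and analysis. Multi-platform trending-topic aggregation plus an MCP-based AI analysis tool. Monitors 35 platforms (Douyin, Zhihu, Bilibili, Wallstreetcn, Cailian Press, etc.), with smart filtering, automatic push, and conversational AI analysis (dig into the news in natural language with 13 tools, including trend tracking, sentiment analysis, and similarity search). Supports push via WeCom, personal WeChat, Feishu, DingTalk, Telegram, email, ntfy, bark, and slack; phone notifications within 1 minute, no need for…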
Official implementation of WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance
My big collection of creative nano-banana use cases! Continuously updated!
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
[arXiv 2025] ObjFiller-3D: Consistent Multi-view 3D Inpainting via Video Diffusion Models
The open-source CapCut alternative
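A NovelAI mass-production tool with a WebUI, implementing batch text-to-image; batch image-to-image; video repainting; tiled redraw; batch Vibe; batch inpainting; batch upscaling and denoising; batch auto-censoring; batch watermarking; batch upload to Pixiv; image filtering; batch erasing, restoring, or exporting generation info; prompt ("spell") parsing; multi-model prompt inference from images; ChatGPT; dynamic plugin loading; automatic rolling of art-style strings; batch Enhance; tag selector; …
"A Guide to Consuming Open-Source LLMs" (《开源大模型食用指南》): a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source large language models (LLMs) and multimodal large models (MLLMs) in a Linux environment.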
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
An AI-powered custom node for ComfyUI designed to enhance workflow automation and provide intelligent assistance
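An unofficial, UI-first https://bgm.tv app client for Android and iOS, built with React Native. An ad-free, hobby-driven, non-profit third-party bgm.tv client dedicated to ACG, for tracking what you watch in the style of Douban. Redesigned for mobile, with many built-in enhancements that are hard to implement on the web, plus extensive customization options. Currently adapted for…
Multilingual large voice generation model with full-stack inference, training, and deployment capabilities.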