Stars
Skills for Real Engineers. Straight from my .claude directory.
The ultimate training toolkit for finetuning diffusion models
Code release for 'Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs' (NeurIPS 2025)
Pioneering Automated GUI Interaction with Native Agents
Official PyTorch implementation of One-Minute Video Generation with Test-Time Training
[CVPR 2025] WildAvatar: Learning In-the-wild 3D Avatars from the Web
Wan: Open and Advanced Large-Scale Video Generative Models
Scalable and memory-optimized training of diffusion models
A high-throughput and memory-efficient inference and serving engine for LLMs
SGLang is a high-performance serving framework for large language models and multimodal models.
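This project aims to share the technical principles and hands-on experience behind large models (large-model engineering and real-world application deployment)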
[ICCV 2025, Oral] TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
No fortress, purely open ground. OpenManus is Coming.
[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents
SkyReels V1: The first and most advanced open-source human-centric video foundation model
LAVIS - A One-stop Library for Language-Vision Intelligence
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
HunyuanVideo: A Systematic Framework For Large Video Generation Model
NVIDIA Cosmos is an open platform of world models, datasets, and tools that enables developers to build Physical AI for robots, autonomous vehicles, smart infrastructure, and more.
A simple pip-installable Python tool to generate your HTML citation world map from your Google Scholar ID.
FleVRS: Towards Flexible Visual Relationship Segmentation, NeurIPS 2024
[ECCV 2024] Video Foundation Models & Data for Multimodal Understanding
The best OSS video generation models, created by Genmo
Agent S: an open agentic framework that uses computers like a human
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)