- Nanjing University
- Nanjing, China
- https://z-jiaming.github.io/
Starred repositories
Zotero MCP: Connects your Zotero research library with Claude and other AI assistants via the Model Context Protocol to discuss papers, get summaries, analyze citations, and more.
Helios: Real Real-Time Long Video Generation Model
A curated list of papers on reinforcement learning for video generation
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
Lightweight Image Video Action Generation Inference Framework
Official Implementation of "MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives"
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
Finetune HunyuanImage 3.0, an 80B unified understanding and generation model
A tool for running and customizing real-time, interactive generative AI pipelines and models
This repository contains a summary of knowledge cut-off dates for various large language models (LLMs), such as GPT, Claude, Gemini, Llama, and more.
SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation
HunyuanVideo-1.5: A leading lightweight video generation model
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
StreamDiffusion, Live Stream APP
[NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation
[ICLR 26 Oral] Stable Video Infinity: Infinite-Length Video Generation with Error Recycling
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related webs…
VideoNSA: Native Sparse Attention Scales Video Understanding
Official Repo for Self-Forcing++ High Quality Long Video Generation
HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.