Sun Yat-sen University
- kongzhecn.github.io
Stars
The AI PPT category killer, the most powerful PPT Skill ever!!! Uses GPT to generate lavish image-format PPT slides, then converts them into fully editable PPTX files.
Wan: Open and Advanced Large-Scale Video Generative Models
A toolkit for speaker diarization.
Official Implementation of LongLive-RAG: A general retrieval-augmented framework for long video generation.
JoyAI-Echo: Pushing the Frontier of Long Audio-Visual Generation
Official page of ImmerIris: A Large-Scale Dataset and Benchmark for Off-Axis and Unconstrained Iris Recognition in Immersive Applications.
"CLI-Anything: Making ALL Software Agent-Native" -- CLI-Hub: https://clianything.cc/
Implementation of Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players
A Minimal and Elegant Framework & Tutorial for Real-Time Interactive World Models
Codex skill for converting slide images, PDFs, and image-based PPTX files into editable PowerPoint decks.
Multimodal RL training framework for diffusion & omni models
Official Repo of "D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models"
A unified framework for easy reinforcement learning in Flow-Matching models
AI generates a real, editable PowerPoint from any document — native shapes & animations, speaker notes voiced as audio narration, and the option to follow your own .pptx template, not slide images …
Interactive World Model papers organized by core research challenges.
[ICML 2026] World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
A Curated List of Awesome Video World Models with AR Diffusion: Covering Algorithms, Applications, and Infrastructure, Aimed at Serving as a Comprehensive Resource for Researchers, Practitioners, a…
GPT-Image-2 API and Prompts
Official Implementation of MultiWorld: Scalable Multi-Agent Multi-View Video World Models
[ICLR 26 Oral] Stable Video Infinity: Infinite-Length Video Generation with Error Recycling
Helios: Real Real-Time Long Video Generation Model
ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works…
[ICML 2026] DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation
ReactMotion: Generating Reactive Listener Motions from Speaker Utterance
The official code of "Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling"