kongzhecn

Zhe Kong kongzhecn

94 followers · 24 following

Sun Yat-sen University
kongzhecn.github.io

Lists (5)

codebase

evaluation

post train

survey

tools

Stars

GordenSun / GordenSuperPPTSkills

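The terminator of the AI PPT track, the most powerful PPT Skill ever!!! It uses GPT to generate lavish image-format PPT slides, then converts them into fully editable PPTX files.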

Python · 1,128 stars · 105 forks · Updated Jun 7, 2026

Wan-Video / Wan2.2

Wan: Open and Advanced Large-Scale Video Generative Models

Python · 16,332 stars · 2,030 forks · Updated Mar 17, 2026

Tongyi-MAI / Z-Image

Python · 11,597 stars · 790 forks · Updated Feb 9, 2026

BUTSpeechFIT / DiariZen

A toolkit for speaker diarization.

Jupyter Notebook · 481 stars · 56 forks · Updated May 29, 2026

qixinhu11 / LongLive-RAG

Official Implementation of LongLive-RAG: A general retrieval-augmented framework for long video generation.

Python · 72 stars · Updated Jun 4, 2026

jd-opensource / JoyAI-Echo

JoyAI-Echo: Pushing the Frontier of Long Audio-Visual Generation

Python · 1,644 stars · 149 forks · Updated Jun 16, 2026

NiborPolaris / ImmerIris

Official page of ImmerIris: A Large-Scale Dataset and Benchmark for Off-Axis and Unconstrained Iris Recognition in Immersive Applications.

HTML · 30 stars · 1 fork · Updated Jun 9, 2026

HKUDS / CLI-Anything

"CLI-Anything: Making ALL Software Agent-Native" -- CLI-Hub: https://clianything.cc/

Python · 43,631 stars · 4,083 forks · Updated Jun 14, 2026

nv-tlabs / Gamma-World

Implementation of Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players

Python · 635 stars · 9 forks · Updated Jun 17, 2026

shengshu-ai / minWM

A Minimal and Elegant Framework & Tutorial for Real-Time Interactive World Models

Python · 625 stars · 12 forks · Updated Jun 15, 2026

ningzimu / image-to-editable-ppt-skill

Codex skill for converting slide images, PDFs, and image-based PPTX files into editable PowerPoint decks.

Python · 774 stars · 39 forks · Updated Jun 22, 2026

verl-project / verl-omni

Multimodal RL training framework for diffusion & omni models

Python · 402 stars · 58 forks · Updated Jun 22, 2026

vvvvvjdy / D-OPSD

Official Repo of "D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models"

Python · 256 stars · 7 forks · Updated May 22, 2026

X-GenGroup / Flow-Factory

A unified framework for easy reinforcement learning in Flow-Matching models

Python · 584 stars · 47 forks · Updated Jun 18, 2026

hugohe3 / ppt-master

AI generates a real, editable PowerPoint from any document — native shapes & animations, speaker notes voiced as audio narration, and the option to follow your own .pptx template, not slide images …

Python · 30,351 stars · 2,652 forks · Updated Jun 22, 2026

EasonTuT / Awesome-Interactive-World-Model

Interactive World Model papers organized by core research challenges.

Python · 241 stars · 8 forks · Updated Jun 19, 2026

microsoft / World-R1

[ICML 2026] World-R1: Reinforcing 3D Constraints for Text-to-Video Generation

Python · 392 stars · 15 forks · Updated Jun 3, 2026

gracezhao1997 / Awesome-Video-World-Models-with-AR-Diffusion

A Curated List of Awesome Video World Models with AR Diffusion: Covering Algorithms, Applications, and Infrastructure, Aimed at Serving as a Comprehensive Resource for Researchers, Practitioners, a…

TeX · 608 stars · 17 forks · Updated Jun 4, 2026

EvoLinkAI / awesome-gpt-image-2-API-and-Prompts

GPT-Image-2 API and Prompts

Python · 16,907 stars · 1,714 forks · Updated Jun 22, 2026

CIntellifusion / MultiWorld

Official Implementation of MultiWorld: Scalable Multi-Agent Multi-View Video World Models

Python · 231 stars · 12 forks · Updated May 12, 2026

vita-epfl / Stable-Video-Infinity

[ICLR 26 Oral] Stable Video Infinity: Infinite-Length Video Generation with Error Recycling

Python · 2,501 stars · 219 forks · Updated Jun 3, 2026

PKU-YuanGroup / Helios

Helios: Real Real-Time Long Video Generation Model

Python · 1,923 stars · 149 forks · Updated Jun 10, 2026

wanshuiyin / Auto-claude-code-research-in-sleep

ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works…

Python · 12,486 stars · 1,133 forks · Updated Jun 21, 2026

microsoft / VibeVoice

Open-Source Frontier Voice AI

Python · 49,545 stars · 5,530 forks · Updated May 6, 2026

GAIR-NLP / daVinci-MagiHuman

Python · 2,061 stars · 211 forks · Updated Apr 11, 2026

WeChatCV / Identity-as-Presence

Python · 15 stars · Updated Mar 30, 2026

Guoxu1233 / DreamID-Omni

[ICML 2026] DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation

Python · 268 stars · 17 forks · Updated May 22, 2026

awakening-ai / ReactMotion

ReactMotion: Generating Reactive Listener Motions from Speaker Utterance

Python · 105 stars · 2 forks · Updated Mar 30, 2026

fishaudio / fish-speech

SOTA Open Source TTS

Python · 30,899 stars · 2,636 forks · Updated Jun 9, 2026

HKUST-C4G / diffusion-rm

The official code of "Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling"

Python · 60 stars · 3 forks · Updated Feb 26, 2026