Australian Institute for Machine Learning (AIML)
- Adelaide, Australia
- https://gengzezhou.github.io/
- in/gengze-zhou-159095203
Starred repositories
[ICML 2025 Spotlight] Direct Discriminative Optimization: Supercharging Diffusion/Autoregressive with GAN-type Discrimination
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Official implementation of Rethinking Training Dynamics in Scale-wise Autoregressive Generation
EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models
[NeurIPS 2025] The official repository of "Sekai: A Video Dataset towards World Exploration"
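🎯 Say goodbye to information overload: AI helps you make sense of trending news, with simple public-opinion monitoring and analysis. A multi-platform trending-topic aggregator plus an MCP-based AI analysis tool. Monitors 35 platforms (Douyin, Zhihu, Bilibili, Wallstreetcn, Cailian Press, etc.) with smart filtering, automatic push, and conversational AI analysis (explore the news in natural language with 13 tools, including trend tracking, sentiment analysis, and similarity search). Supports push via WeCom, personal WeChat, Feishu, DingTalk, Telegram, email, ntfy, bark, and Slack; mobile notifications in 1 minute, no need for…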
Native Multimodal Models are World Learners
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
A simple state update rule to enhance length generalization for CUT3R
[CVPR 2025 Highlight] GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control
Tongyi Deep Research, the Leading Open-source Deep Research Agent
InternRobotics' open platform for building generalized navigation foundation models.
[TMLR 2024] repository for VLN with foundation models
🚀 Efficient implementations of state-of-the-art linear attention models
Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
TradingAgents: Multi-Agents LLM Financial Trading Framework
[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
🔥 Official impl. of "DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction"
Official code for the CVPR 2025 paper "Navigation World Models".
Code release for paper "Test-Time Training Done Right"
Official implementation of EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance
Awesome habitat top down map work 🤩
Official Implementation of Diffusion Step Annealing (DiSA) in Autoregressive Image Generation
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL