Tsinghua University
https://knightnemo.github.io
https://knightnemo.github.io/blog
Stars
A latent text-to-image diffusion model
Reference PyTorch implementation and models for DINOv3
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
An annotated implementation of the Transformer paper.
Flax is a neural network library for JAX that is designed for flexibility.
NVIDIA Isaac GR00T N1.6 - A Foundation Model for Generalist Robots.
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
[CVPR 2024] 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
Code for "GVHMR: World-Grounded Human Motion Recovery via Gravity-View Coordinates", SIGGRAPH Asia 2024
assistant tools for attention visualization in deep learning
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)
🍌 The official starter kit for the Nano Banana Hackathon. Clone this repo to get building fast!
Data preparation and loader for AMASS
The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
A PyTorch implementation of the vector quantized variational autoencoder (https://arxiv.org/abs/1711.00937)
RynnBrain: Open Embodied Foundation Models
Ego4d dataset repository. Download the dataset, visualize, extract features & example usage of the dataset
DreamGen: Nvidia GEAR Lab's initiative to solve the robotics data problem using world models
An open-source toolbox for fast sampling of diffusion models. Official implementations of our works published in ICML, NeurIPS, CVPR, J. Stat. Mech.
FlexTok: Resampling Images into 1D Token Sequences of Flexible Length
DelinQu / SimplerEnv-OpenVLA
Forked from simpler-env/SimplerEnv
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo, and OpenVLA) in simulation under common setups (e.g., Google Robot, WidowX+Bridge)
[ICML 2020] PyTorch Code for "One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control"
Official code for "What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks" (NeurIPS 2023)
Official codebase for the paper "How to build a consistency model: Learning flow maps via self-distillation" (NeurIPS 2025).
WorldArena: A Unified Benchmark for Evaluating Perception and Functional Utility of Embodied World Models
Implementation of the MetaController proposed in "Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning" from the Paradigms of Intelligence team at Google