Stars
Official implementation of the paper "Transfer between Modalities with MetaQueries"
Official repository for “PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss”
Predicting the generation FID of latent diffusion models using a variant of the reconstruction FID of the variational autoencoder.
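For reference, both generation FID and reconstruction FID reduce to the Fréchet distance between two Gaussians fitted to Inception feature statistics, which has a closed form. A minimal NumPy/SciPy sketch of that standard formula (the repo's specific rFID variant is its own; this is only the textbook quantity):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})."""
    diff = mu1 - mu2
    # Matrix square root of the covariance product; numerical error can
    # introduce a tiny imaginary component, which we discard.
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)
```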
Official code for "Latent Diffusion Models with Masked AutoEncoders" (LDMAE)
[AAAI 2026] Flowing Backwards: Improving Normalizing Flows via Reverse Representation Alignment
A PyTorch port of the code from "Jet: A Modern Transformer-Based Normalizing Flow"
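Jet's architecture is transformer-based; as a generic illustration of the change-of-variables bookkeeping any normalizing flow needs, here is a minimal affine coupling layer in PyTorch. This is an assumption-level sketch of the coupling-flow idea, not Jet's actual code:

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """Transform half the dimensions conditioned on the other half,
    tracking log|det J| so log p(x) = log p(z) + sum(log_det). Assumes dim is even."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim // 2, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),  # outputs per-dim (log_scale, shift)
        )

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=-1)
        log_s, t = self.net(x1).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)      # keep scales bounded for stability
        z2 = x2 * log_s.exp() + t
        log_det = log_s.sum(dim=-1)    # log-determinant of the Jacobian
        return torch.cat([x1, z2], dim=-1), log_det
```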
A library for efficient similarity search and clustering of dense vectors.
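That is the FAISS tagline; a minimal exact L2 nearest-neighbor search with its Python bindings looks like this (dimensions and data are placeholders):

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 64                                               # vector dimensionality
xb = np.random.rand(10_000, d).astype("float32")     # database vectors
xq = np.random.rand(5, d).astype("float32")          # query vectors

index = faiss.IndexFlatL2(d)   # exact L2 index, no training required
index.add(xb)                  # add database vectors
D, I = index.search(xq, 4)     # distances and ids of the 4 nearest neighbors
print(I.shape)                 # (5, 4)
```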
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
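Flow-GRPO's training specifics are in the paper; the group-relative advantage that gives GRPO its name is standard, though: each sample's reward is normalized against the group of samples drawn for the same prompt. An illustrative snippet, not the repo's API:

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) rewards for samples sharing a prompt.
    Returns advantages normalized within each prompt's group."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)
```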
[ICCV 2025] Official implementation of the paper: REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers
Towards Scalable Pre-training of Visual Tokenizers for Generation
The Official PyTorch Implementation of "LSGM: Score-based Generative Modeling in Latent Space" (NeurIPS 2021)
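LSGM trains a score model in a VAE's latent space; as a sketch of the underlying idea rather than NVIDIA's implementation, a generic denoising score-matching loss on latents can be written as:

```python
import torch

def dsm_loss(score_model, z0, sigma):
    """Denoising score matching on latents z0 at noise level sigma:
    train score_model(z0 + sigma*eps, sigma) to predict -eps / sigma.
    sigma may be a scalar or a per-sample tensor broadcastable to z0."""
    eps = torch.randn_like(z0)
    z_t = z0 + sigma * eps
    target = -eps / sigma
    return ((score_model(z_t, sigma) - target) ** 2).mean()
```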
This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025]
State-of-the-Art VQ-VAE from Gaussian VAE without Training!
Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"
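SiT trains with stochastic interpolants between data and noise. As a hedged illustration of the general construction (the textbook linear interpolant with a velocity-regression target, not SiT's implementation), one training step looks like:

```python
import torch

def linear_interpolant_loss(velocity_model, x0):
    """Flow-matching step with the linear interpolant x_t = (1-t)*x0 + t*eps;
    the regression target is the interpolant's time derivative, eps - x0."""
    eps = torch.randn_like(x0)
    t = torch.rand(x0.shape[0], *[1] * (x0.dim() - 1), device=x0.device)
    x_t = (1 - t) * x0 + t * eps
    v_target = eps - x0
    return ((velocity_model(x_t, t.flatten()) - v_target) ** 2).mean()
```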
Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation.
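That entry describes torchdiffeq; a minimal example of solving an ODE and backpropagating through it with the O(1)-memory adjoint solver (the dynamics MLP is an arbitrary placeholder):

```python
import torch
from torchdiffeq import odeint_adjoint as odeint  # adjoint-method backprop

class Dynamics(torch.nn.Module):
    """dy/dt = f(t, y) parameterized by a small MLP."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(),
                                       torch.nn.Linear(64, 2))
    def forward(self, t, y):
        return self.net(y)

func = Dynamics()
y0 = torch.randn(16, 2)           # batch of initial states
t = torch.linspace(0.0, 1.0, 10)  # integration times
ys = odeint(func, y0, t)          # (10, 16, 2) solution trajectory
ys[-1].sum().backward()           # gradients via the adjoint method
```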
Codec-aware perceptual super-resolution with a diffusion-based differentiable codec simulator (H.264/H.265/H.266).
A library of advanced deep time series models for general time series analysis.
LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence https://arxiv.org/abs/2509.03505
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
Privacy-first, on-device LLM SDK powered by llama.cpp: unified C API with Android/iOS/Flutter bindings, low-latency streaming, OpenAI-compatible chat/function calling, and multi-LoRA stacking.
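Since the SDK advertises an OpenAI-compatible chat API, a client can talk to it the same way as to any llama.cpp-style local server. A sketch using the `openai` Python package; the base URL, port, and model id below are assumptions, not part of the SDK:

```python
from openai import OpenAI

# Hypothetical local endpoint exposing an OpenAI-compatible /v1 API.
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="local-model",  # placeholder model id
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    stream=True,          # low-latency token streaming
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```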