University of Science and Technology of China
Stars
Create mermaid diagrams in image format on-the-fly.
Open source implementation of "A Self-Supervised Descriptor for Image Copy Detection" (SSCD).
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Google Gen AI Python SDK provides an interface for developers to integrate Google's generative models into their Python applications.
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
Convert PDF to markdown + JSON quickly with high accuracy
Zotero plugin to automatically move attachments and link them
[🚀 ICLR 2026 Oral] NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by the StepFun’s Multimodal Intelligence team.
Some tools to help move my notes from LogSeq to Obsidian
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
A collection of LogitsProcessors to customize and enhance LLM behavior for specific tasks.
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
Retrieve author and publication information from Google Scholar in a friendly, Pythonic way without having to worry about CAPTCHAs!
DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data (NeurIPS 2023 Spotlight); When Does Perceptual Alignment Benefit Vision Representations? (NeurIPS 2024)
Code for [CVPR 2024] VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
Video-Inpaint-Anything: This is the inference code for our paper CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility.
A generative world for general-purpose robotics & embodied AI learning.
Production-ready platform for agentic workflow development.
[NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"
Open Overleaf/ShareLaTeX projects in VS Code, with full collaboration support.
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340