- Tsinghua University
- Beijing
- https://liuyifan22.github.io
- in/yifan-liu-8a0374371
Stars
Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.5, DeepSeek, gpt-oss locally.
An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.
[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling
Gibson Environments: Real-World Perception for Embodied Agents
Just random useful things for LeRobot, LeKiwi, and SO-ARM100/101
Official code release for ConceptGraphs
A paper list for spatial reasoning
Submanifold sparse convolutional networks
An open source implementation of CLIP.
RelTR: Relation Transformer for Scene Graph Generation: https://arxiv.org/abs/2201.11460v2
This is a repository for listing papers on scene graph generation and application.
A batched implementation for efficient Qwen2.5-VL inference.
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.org/abs/2404.12390 [ECCV 2024]
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Voyager is an interactive RGBD video generation model conditioned on camera input, and supports real-time 3D reconstruction.
[NeurIPS 2025] Source code for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"
WentseChen / Verlog
Forked from verl-project/verl
Verlog: A Multi-turn RL framework for LLM agents
PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators
RL training scripts for learning an agent using ProcTHOR.
A curated list of 3D Vision papers relating to Robotics domain in the era of large models i.e. LLMs/VLMs, inspired by awesome-computer-vision, including papers, codes, and related websites
ICCV 2025 | TesserAct: Learning 4D Embodied World Models
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
liuyifan22 / LLaVA-UHD
Forked from thunlp/LLaVA-UHD
Co-first author of the paper: LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer