LaTeX Thesis Template for Tsinghua University

TeX 5,251 1,144 Updated Apr 4, 2026

Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.5, DeepSeek, gpt-oss locally.

Python 61,497 5,323 Updated Apr 14, 2026

An open-source implementation for fine-tuning Qwen-VL series by Alibaba Cloud.

Python 1,809 208 Updated Apr 10, 2026

[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling

Python 4,498 369 Updated Sep 26, 2025

Gibson Environments: Real-World Perception for Embodied Agents

C 939 149 Updated Apr 15, 2024

Just random useful things for LeRobot, LeKiwi, and SO-ARM100/101

Python 26 10 Updated Mar 12, 2026

Official code release for ConceptGraphs

Python 822 118 Updated Oct 16, 2025
Python 185 22 Updated Aug 20, 2024

A paper list for spatial reasoning

714 39 Updated Jan 19, 2026

Submanifold sparse convolutional networks

C++ 2,141 335 Updated Jan 9, 2024

An open source implementation of CLIP.

Python 13,685 1,276 Updated Apr 6, 2026

RelTR: Relation Transformer for Scene Graph Generation: https://arxiv.org/abs/2201.11460v2

Python 311 57 Updated Aug 20, 2024

This is a repository for listing papers on scene graph generation and application.

632 42 Updated Apr 9, 2026

A batched implementation for efficient Qwen2.5-VL inference.

Python 24 1 Updated Jul 16, 2025

This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.org/abs/2404.12390 [ECCV 2024]

Python 164 8 Updated Sep 27, 2025

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 18,944 2,420 Updated Apr 7, 2026

Voyager is an interactive RGBD video generation model conditioned on camera input, and supports real-time 3D reconstruction.

Python 1,535 159 Updated Dec 17, 2025

[NeurIPS 2025] Source code for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"

Python 144 4 Updated Nov 4, 2025

Verlog: A Multi-turn RL framework for LLM agents

Python 73 7 Updated Mar 27, 2026
Python 123 12 Updated Jul 22, 2025
Python 2 Updated Aug 7, 2025

PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators

Python 107 11 Updated Nov 21, 2024

RL training scripts for learning an agent using ProcTHOR.

Python 36 7 Updated Feb 18, 2025

A curated list of 3D vision papers related to robotics in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites

806 41 Updated Dec 17, 2025

ICCV 2025 | TesserAct: Learning 4D Embodied World Models

Python 384 18 Updated Aug 4, 2025

[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation

Python 7,904 808 Updated Mar 24, 2026
C++ 1 Updated Jun 22, 2023

Co-first author of the paper: LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer

Python 2 Updated Apr 20, 2025