Lists (24)
🤖 AI
💯 Algorithm
🔍 BigQuery
🔖
📎 CLIP / VLM
Data Mining
👁️‍🗨️ Vision
Game Bot
🧑‍💻 Git
🌐 GNN
👨 Personal Web Templates
💬 NLP
💻 nodesktop
JS, CSS
🧊 object-centric learning
📖 Open Vocabulary
🎑 Scene Graph
📜 Templates
⚙️ Setup, dotfile
🎇 Part Segmentation
⭐ Hetero GNN / CL
🖥️ Ubuntu
Visualization
VLM Bias
VLM Bias
🎲 Wordle
wordle
Starred repositories
Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning
Witness the aha moment of VLM with less than $3.
[NIPS'25 Spotlight] Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS
A friendly programming language from the future
Stay in flow while building with AI
A step-by-step reasoning framework for 3D scene understanding
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents [NeurIPS 2025 Spotlight]
Nav-R1: Reasoning and Navigation in Embodied Scenes
Official repo of "Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens"
Official PyTorch implementation of "Aligned but Stereotypical? The Hidden Influence of System Prompts on Social Bias in LVLM-Based Text-to-Image Models"
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
[NeurIPS 2025] We propose a first RL-based personalized image captioning framework with well-defined verifiable rewards.
Code for the paper "GRPO is Secretly a Process Reward Model": https://arxiv.org/abs/2509.21154
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
[Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Codes for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views
Code for 3D-LLM: Injecting the 3D World into Large Language Models
evelinehong / 3D-CLR-Official
Forked from zsh2000/3D-CLR
[CVPR 2023] Code for "3D Concept Learning and Reasoning from Multi-View Images"