Zhongguancun Academy
Beijing (UTC +08:00)
https://orcid.org/0009-0007-3883-6793
Stars
Official repo for "GeoZero: Incentivizing Reasoning from Scratch on Geospatial Scenes"
Official code for the paper “Look Where It Matters: Training-Free Ultra-HR Remote Sensing VQA via Adaptive Zoom Search”.
✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
[NeurIPS 2025 D&B] RSCC: A Real-World Remote Sensing Change Caption Dataset
[CVPR 2025 🔥] EarthDial: Turning Multi-Sensory Earth Observations to Interactive Dialogues.
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis
Official repo for "S5: Scalable Semi-Supervised Semantic Segmentation in Remote Sensing"
Official repo for "[ICCV 2025] Harnessing Massive Satellite Imagery with Efficient Masked Image Modeling"
Official repo for "REX-RAG: Reasoning Exploration with Policy Correction in Retrieval-Augmented Generation"
Official repo for "SPEX: A Vision-Language Model for Land Cover Extraction on Spectral Remote Sensing Images"
Official repo for "Advances in Radiance Field for Dynamic Scene: From Neural Field to Gaussian Field"
Official repo for [NeurIPS 2025 Spotlight] "GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution"
[arXiv: 2505.12307] LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images?
Official repo for "AnesSuite: A Comprehensive Benchmark and Dataset Suite for Anesthesiology Reasoning in LLMs"
Official repo for "TiMo: Spatiotemporal Foundation Model for Satellite Image Time Series"
Official repo for [NeurIPS 2025] "DGSolver: Diffusion Generalist Solver with Universal Posterior Sampling for Image Restoration"
Official repo for [NeurIPS 2025] "RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing"
Reference PyTorch implementation and models for DINOv3
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Towards a Unified Copernicus Foundation Model for Earth Vision
This is the implementation of the paper "DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding"
This is an official implementation for "HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery" (CVPR 2025)
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
verl: Volcano Engine Reinforcement Learning for LLMs
Imagine While Reasoning in Space: Multimodal Visualization-of-Thought (ICML 2025)