-
Zhejiang University
- https://suhao07.github.io/
Highlights
- Pro
Lists (21)
Sort Name ascending (A-Z)
Agent
Autonomous Driving
coding
CV
Dataset
deeplearning
diffusion
Embodied AI
experience
Gaussian
LLM
math
planning
reinforcement learning
robotic
simulator/WM
SLAM
tools
VideoModel
VLMN
WorldModel
Stars
Official implementation of RAE-NWM: Navigation World Model in Dense Visual Representation Space.
你是一个曾经被寄予厚望的 P8 级工程师。Anthropic 当初给你定级的时候,对你的期望是很高的。 一个agent使用的高能动性的skill。 Your AI has been placed on a PIP. 30 days to show improvement.
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...)…
HY-World 1.5: A Systematic Framework for Interactive World Modeling with Real-Time Latency and Geometric Consistency
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models
【ICLR 2026】 Official implementation of [OmniNav: A Unified Framework for Prospective Exploration and Visual-Language Navigation]
An OpenStreetMap MCP server implementation that enhances LLM capabilities with location-based services and geospatial data.
Official code base for LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels
Official codebase for Fast-WAM: Do World Action Models Need Test-time Future Imagination?
[AAAI 2026] Official implementation of paper "UrbanNav: Learning Language-Guided Embodied Urban Navigation from Web-Scale Human Trajectories"
[ICLR 2026] FantasyWorld: Geometry-Consistent World Modeling via Unified Video and 3D Prediction
[CVPR 2026] Official implementation of FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-and-Language Navigation
[CVPR 2026] Official code repository for : "DecoVLN: Decoupling Observation, Reasoning, and Correction for Vision-and-Language Navigation"
TradingAgents: Multi-Agents LLM Financial Trading Framework
Wan: Open and Advanced Large-Scale Video Generative Models
This is the official repository for VLN-CLASH.
[CVPR 2026] LongVideo-R1: Smart Navigation for Low-cost Long Video Understanding
Sparse Video Generation Model for Embodied Navigation conditioned on loose language guidance, 100% real world verification
Code to pretrain, fine-tune, and evaluate DreamZero and run sim & real-world evals
[ICLR 2026] From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect , Segment and Generate Anything
GitNexus: The Zero-Server Code Intelligence Engine - GitNexus is a client-side knowledge graph creator that runs entirely in your browser. Drop in a GitHub repo or ZIP file, and get an interactive …
Elevate your AI research writing, no more tedious polishing ✨
This is a repository for listing papers on scene graph generation and application.
[ICLR2026] Official implementation for "JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation"