
Hi there 👋

🔭 I’m currently working on visual perception, and my long-term goal is to build general foundation models.

⚡ Recently I've been focusing on vision-language models and unified visual models.

📫 If you're also interested in these topics, feel free to reach out!

Pinned

  1. Oryx-mllm/Oryx

     [ICLR 2025] MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution

     Python · 327 stars · 18 forks

  2. Insight-V

     [CVPR 2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

     Python · 229 stars · 5 forks

  3. Octopus

     [ECCV 2024] 🐙 Octopus, an embodied vision-language model trained with RLEF, excelling at embodied visual planning and programming.

     Python · 292 stars · 18 forks

  4. Ola-Omni/Ola

     Ola: Pushing the Frontiers of Omni-Modal Language Model

     Python · 374 stars · 16 forks

  5. EvolvingLMMs-Lab/lmms-eval

     One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

     Python · 3.3k stars · 418 forks

  6. open-compass/VLMEvalKit

     Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

     Python · 3.3k stars · 531 forks