Stars
[AAAI 2026] ✨ TSPO: Temporal Sampling Policy Optimization for Long-form Video Language Understanding
Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
[CVPR2025] FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
RepViT: Revisiting Mobile CNN From ViT Perspective [CVPR 2024] and RepViT-SAM: Towards Real-Time Segmenting Anything
New generation of CLIP with strong fine-grained discrimination capability, ICML 2025
FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models
[ECCV 2024] Official PyTorch implementation of TC-CLIP "Leveraging Temporal Contextualization for Video Action Recognition"
[ICLR 2024] FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition
Solve Visual Understanding with Reinforced VLMs
VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
official implementation of "Interpreting CLIP's Image Representation via Text-Based Decomposition"
[ACM MM 2024] Hierarchical Multimodal Fine-grained Modulation for Visual Grounding.
[ICML 2024] "Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models" (tmlr-group/WCA, forked from JinhaoLee/WCA)
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
[NeurIPS2024] - SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
[CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
[ICCV 2025] LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025
[ICLR 2025] MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
[ACM MM 2024] Cross-modal Resonance through Evidential Deep Learning for Enhanced Zero-Shot Learning
Question and Answer based on Anything.
Local model support for Microsoft's graphrag using ollama (llama3, mistral, gemma2, phi3) — LLM & embedding extraction