☘️
Researching for the happiness

Python 30 3 Updated Dec 23, 2025

A continuously updated project to track the latest progress in the field of multi-modal object tracking. This project focuses solely on single-object tracking.

Jupyter Notebook 914 50 Updated Dec 23, 2025

2026 AI/ML internship & new graduate job list updated daily

4,250 172 Updated Dec 23, 2025

Spatial Reasoning with Vision-Language Models

Python 30 1 Updated Nov 17, 2025

Official Repository for NeurIPS'25 Paper "Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task"

4 Updated Oct 18, 2025

[ECCV 2022] ByteTrack: Multi-Object Tracking by Associating Every Detection Box

Python 5,918 1,072 Updated Jun 19, 2024

Official implementation of RT-DETRv4: Painlessly Furthering Real-Time Object Detection with Vision Foundation Models

Python 178 14 Updated Nov 30, 2025

[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥

Python 4,640 543 Updated Dec 3, 2025

Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation (TIP 2024, ACM MM 2023)

Python 19 2 Updated Mar 13, 2024

Wan: Open and Advanced Large-Scale Video Generative Models

Python 13,019 1,519 Updated Dec 17, 2025

Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning"

Python 72 1 Updated Dec 5, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python 14,975 2,222 Updated Dec 15, 2025

Video-CoM: Interactive Video Reasoning via Chain of Manipulations

16 Updated Dec 1, 2025

Python 3 Updated Nov 19, 2025

HunyuanVideo-1.5: A leading lightweight video generation model

Python 2,107 102 Updated Dec 23, 2025

🔥An open-source survey of the latest video reasoning tasks, paradigms, and benchmarks.

108 4 Updated Dec 20, 2025

Official code for PRInTS: Rewarding Agents for Long-Horizon Information Seeking

Python 4 Updated Dec 10, 2025

Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Python 129 5 Updated Dec 17, 2025

**Deep Video Discovery (DVD)** is a deep-research style question answering agent designed for understanding extra-long videos.

Python 320 7 Updated Nov 3, 2025

This repository collects and organises state-of-the-art papers on spatial reasoning for Multimodal Vision–Language Models (MVLMs).

249 15 Updated Dec 15, 2025

Official implementation for the paper "How Can Objects Help Video-Language Understanding"

Python 6 Updated Aug 2, 2025

A step-by-step reasoning framework for 3D scene understanding

12 1 Updated Nov 7, 2025

SAM 3D Objects

Python 5,063 470 Updated Dec 16, 2025

GraphicBench: A Planning Benchmark for Graphic Design Generation with Language Agents

JavaScript 4 Updated Apr 17, 2025

Official code for EgoGazeVQA, accepted to NeurIPS D&B 2025

Python 8 Updated Oct 22, 2025

Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"

Python 127 7 Updated Dec 18, 2025

OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.

Jupyter Notebook 336 7 Updated Jun 1, 2025

The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"

Python 71 1 Updated Oct 15, 2025

Python 37 1 Updated Oct 20, 2025