- UNC Chapel Hill
- Chapel Hill, NC
- https://daeunni.github.io/
- https://daeun-computer-uneasy.tistory.com/
Stars
A continuously updated project to track the latest progress in the field of multi-modal object tracking. This project focuses solely on single-object tracking.
2026 AI/ML internship & new graduate job list updated daily
Official Repository for NeurIPS'25 Paper "Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task"
[ECCV 2022] ByteTrack: Multi-Object Tracking by Associating Every Detection Box
Official implementation of RT-DETRv4: Painlessly Furthering Real-Time Object Detection with Vision Foundation Models
[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥
Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation (TIP 2024, ACM MM 2023)
Wan: Open and Advanced Large-Scale Video Generative Models
Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning"
Video-CoM: Interactive Video Reasoning via Chain of Manipulations
HunyuanVideo-1.5: A leading lightweight video generation model
🔥An open-source survey of the latest video reasoning tasks, paradigms, and benchmarks.
Official code for PRInTS: Rewarding Agents for Long-Horizon Information Seeking
Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
**Deep Video Discovery (DVD)** is a deep-research style question answering agent designed for understanding extra-long videos.
This repository collects and organises state-of-the-art papers on spatial reasoning for Multimodal Vision-Language Models (MVLMs).
Official implementation for paper How Can Objects Help Video-Language Understanding
A step-by-step reasoning framework for 3D scene understanding
GraphicBench: A Planning Benchmark for Graphic Design Generation with Language Agents
Official code for EgoGazeVQA, accepted to NeurIPS D&B 2025
Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.
The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"