UNC Chapel Hill
Chapel Hill, NC
https://daeunni.github.io/
https://daeun-computer-uneasy.tistory.com/
Stars
[CVPR 2026] Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos
Official Python inference and LoRA trainer package for the LTX-2 audio+video generative model.
"Visual Prompt Selection for In-Context Learning Segmentation Framework"
[Awesome] 🔥🔥🔥 Latest Papers, Codes and Datasets on Streaming / Online Video Understanding
Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning
When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning
Track and Caption Any Motion: Query-Free Motion Discovery and Description in Videos
[CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga
This is a repository for organizing papers, codes and other resources related to Visual Reinforcement Learning.
Code for "StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos [CVPR 2026]"
Repository for NeurIPS 2025 Paper "Gaze-VLM: Bridging Gaze and VLMs via Attention Regularization for Egocentric Understanding"
[NeurIPS'25 Spotlight] ARM: Adaptive Reasoning Model
VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding
[NeurIPS 2025 spotlight] QFFT, Question-Free Fine-Tuning for Adaptive Reasoning
Strefer: Empowering Video LLMs with Space-Time Referring and Reasoning via Synthetic Instruction Data
A continuously updated project to track the latest progress in the field of multi-modal object tracking. This project focuses solely on single-object tracking.
2026 AI/ML internship & new graduate job list updated daily
[ICLR2026] Spatial Reasoning with Vision-Language Models
Official Repository for NeurIPS'25 Paper "Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task"
[ECCV 2022] ByteTrack: Multi-Object Tracking by Associating Every Detection Box
Official implementation of RT-DETRv4: Painlessly Furthering Real-Time Object Detection with Vision Foundation Models
[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥
Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation (TIP 2024, ACM MM 2023)
Wan: Open and Advanced Large-Scale Video Generative Models
[CVPR 2026] Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning"