talk2event/toolkit

Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras

About

Event cameras offer microsecond-level latency and robustness to motion blur, making them ideal for understanding dynamic environments. Yet, connecting these asynchronous streams to human language remains an open challenge. We introduce Talk2Event, the first large-scale benchmark for language-driven object grounding in event-based perception. Built from real-world driving data, we provide over 30,000 validated referring expressions, each enriched with four grounding attributes -- appearance, status, relation to viewer, and relation to other objects -- bridging spatial, temporal, and relational reasoning. To fully exploit these cues, we propose EventRefer, an attribute-aware grounding framework that dynamically fuses multi-attribute representations through a Mixture of Event-Attribute Experts (MoEE). Our method adapts to different modalities and scene dynamics, achieving consistent gains over state-of-the-art baselines in event-only, frame-only, and event-frame fusion settings. We hope our dataset and approach will establish a foundation for advancing multimodal, temporally-aware, and language-driven perception in real-world robotics and autonomy.
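To make the idea of fusing multi-attribute representations concrete, here is a minimal sketch of softmax-gated fusion over the four grounding attributes. It is an illustration only, not the EventRefer or MoEE implementation from this repository: the names `ATTRIBUTES`, `softmax`, and `fuse_attribute_features` are hypothetical, the feature vectors are toy values, and the gate logits are supplied by hand, whereas a real model would presumably predict them from the input.

```python
import math
from typing import Dict, List

# Hypothetical keys for the four grounding attributes named in the abstract.
ATTRIBUTES = ("appearance", "status", "relation_to_viewer", "relation_to_others")


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of gate logits."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_attribute_features(
    features: Dict[str, List[float]],
    gate_logits: Dict[str, float],
) -> List[float]:
    """Return the gate-weighted sum of the per-attribute feature vectors.

    Assumption: every attribute supplies a vector of the same length.
    """
    weights = softmax([gate_logits[name] for name in ATTRIBUTES])
    dim = len(features[ATTRIBUTES[0]])
    fused = [0.0] * dim
    for weight, name in zip(weights, ATTRIBUTES):
        for i, value in enumerate(features[name]):
            fused[i] += weight * value
    return fused


# Toy inputs: equal gate logits give each attribute the same weight.
toy_features = {
    "appearance": [1.0, 0.0],
    "status": [0.0, 1.0],
    "relation_to_viewer": [1.0, 1.0],
    "relation_to_others": [0.0, 0.0],
}
toy_logits = {name: 0.0 for name in ATTRIBUTES}
fused = fuse_attribute_features(toy_features, toy_logits)
```

Raising one attribute's logit shifts the fused vector toward that attribute's features, which is the sense in which a gate can adapt the fusion to different scenes or modalities.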

Benchmark Overview

[Figure: Talk2Event benchmark overview]

⚙️ Installation

For installation and environment setup details, refer to INSTALL.md.

♨️ Data Preparation

Refer to DATA_PREPAER.md for details on preparing the datasets.

🚀 Getting Started

To learn more about using this codebase, refer to GET_STARTED.md.

License

This work is released under the Apache License, Version 2.0, although some specific implementations in this codebase may be under other licenses. If you are using our code commercially, check the original repositories for their licensing terms.

About

[NeurIPS 2025 Spotlight] Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras
