
LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision

Paper · Demo · Dataset · Model · Code

Jiani Huang · Ziyang Li · Mayur Naik · Ser-Nam Lim

University of Pennsylvania · University of Central Florida

🔗 Follow-up Work

ESCA: Contextualizing Embodied Agents via Scene-Graph Generation

NeurIPS 2025 Spotlight · Code

Jiani Huang · Amish Sethi · Matthew Kuo · Mayank Keoliya · Neelay Velingker · JungHo Jung · Ser-Nam Lim · Ziyang Li · Mayur Naik

This follow-up work demonstrates applying LASER for scene-graph generation in embodied agent environments.

🎬 What does LASER do for you?

[Three demo clips: each shows an input video alongside the output annotated with its scene graph.]

LASER automatically detects objects, actions, and their relationships in videos.

📰 News

  • [2025.12.01] 🤗 We have released a Hugging Face demo!
  • [2025.10.28] 🎉 Our follow-up work ESCA, which demonstrates the use of the LASER model in an embodied environment, has been accepted as a NeurIPS 2025 Spotlight!
    Jiani Huang, Amish Sethi, Matthew Kuo, Mayank Keoliya, Neelay Velingker, JungHo Jung, Ziyang Li, Ser-Nam Lim, Mayur Naik
  • [2025.08.30] 🤗 We have open-sourced our scene-graph generation model
  • [2025.08.30] 📊 We have open-sourced our training data
  • [2025.03.02] ✨ LASER has been accepted to ICLR 2025!

📖 Overview

LASER addresses the challenge of learning comprehensive scene understanding from videos by integrating:

  • 🔍 Vision-Language Understanding: Uses CLIP-based models to learn visual-semantic representations of objects and their relationships
  • ⏱️ Temporal Reasoning: Employs Scallop logic programming for symbolic reasoning over temporal sequences
  • 🏷️ Weak Supervision: Learns from natural language descriptions converted to formal specifications using GPT
  • 🎯 Multi-modal Processing: Combines object detection (GroundingDINO), segmentation (SAM2), and relationship modeling

The framework is designed to work with minimal supervision, making it practical for real-world applications where fully annotated temporal scene graphs are expensive or infeasible to obtain.
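
To make this data flow concrete, here is a minimal, library-free Python sketch of the idea: neural modules score (subject, relation, object) triplets per frame, and a temporal specification derived from the caption is evaluated over the frame sequence. All names, relations, and scores below are hypothetical illustrations, not part of the LASER codebase.

# Illustrative sketch of the LASER data flow (hypothetical names and scores).
from dataclasses import dataclass

@dataclass
class Triplet:
    subject: str
    relation: str
    obj: str
    score: float  # probability assigned by the vision-language model

# Toy per-frame predictions for a clip described as
# "the person picks up the cup, then drinks from it"
frames = [
    [Triplet("person", "reach_for", "cup", 0.8)],
    [Triplet("person", "hold", "cup", 0.9)],
    [Triplet("person", "drink_from", "cup", 0.7)],
]

def holds(frame, subject, relation, obj):
    """Best score for a (subject, relation, object) triplet in one frame."""
    return max((t.score for t in frame
                if (t.subject, t.relation, t.obj) == (subject, relation, obj)),
               default=0.0)

def eventually_then(frames, first, second):
    """Toy temporal check: `first` holds in some frame, and `second` holds later."""
    best = 0.0
    for i, f in enumerate(frames):
        for g in frames[i + 1:]:
            best = max(best, min(holds(f, *first), holds(g, *second)))
    return best

spec_score = eventually_then(
    frames,
    ("person", "hold", "cup"),
    ("person", "drink_from", "cup"),
)
print(f"Specification satisfaction score: {spec_score:.2f}")  # 0.70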

✨ Key Features

  • 🔗 Spatial-Temporal Scene Graph Learning: Automatically discovers object relationships across time
  • 📝 Natural Language Specifications: Converts natural language descriptions to formal temporal logic specifications (STSL)
  • ⚖️ Contrastive Learning: Uses positive and negative examples for robust relationship learning (see the sketch after this list)
  • 📚 Multi-Dataset Support: Trained and evaluated on ESCA-video-87K and LLaVA-Video-178K datasets
  • 🚀 End-to-End Pipeline: Complete preprocessing, training, and evaluation workflow
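
As a rough illustration of the contrastive-learning feature above, the snippet below shows a generic margin-based objective that pushes the satisfaction scores of specifications derived from the true caption above those of sampled negatives. This is a sketch of the general idea only; LASER's actual loss, score definitions, and tensor shapes may differ.

# Generic margin-based contrastive objective over specification scores (sketch).
import torch
import torch.nn.functional as F

def contrastive_loss(pos_scores, neg_scores, margin=0.5):
    """Push positive-spec scores above negative-spec scores by a margin.

    pos_scores: satisfaction probabilities of specs derived from the true caption
    neg_scores: satisfaction probabilities of specs from sampled negative captions
    """
    return F.relu(margin - pos_scores.unsqueeze(1) + neg_scores.unsqueeze(0)).mean()

pos = torch.tensor([0.8, 0.7])       # e.g., two positive specs for a video
neg = torch.tensor([0.3, 0.6, 0.2])  # e.g., three sampled negative specs
print(contrastive_loss(pos, neg))    # tensor(0.1333)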

🛠️ Installation

Environment Setup

🏋️ Training Environment

# 1. Create environment
conda env create -f environments/laser_train_env.yml

# 2. Install dependencies (follow their respective instructions)
# - GroundingDINO: https://github.com/video-fm/GroundingDINO
# - Segment Anything 2: https://github.com/video-fm/video-sam2
# - Scallop: https://github.com/scallop-lang/scallop

# 3. Verify
python src/training/train_clip_distributed_restore.py
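
As an optional sanity check (not part of the repository), the following snippet reports which of the main dependencies are importable. It assumes the commonly used module names torch, clip, groundingdino, sam2, and scallopy; adjust the names if your installs of the linked repos expose different modules.

# Check that the main training dependencies are importable.
import importlib

for name in ["torch", "clip", "groundingdino", "sam2", "scallopy"]:
    try:
        importlib.import_module(name)
        print(f"[ok]   {name}")
    except ImportError as exc:
        print(f"[fail] {name}: {exc}")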

📊 Evaluation Environment

# Create environment and install the same dependencies as for training
conda env create -f environments/laser_eval_env.yml

# Verify by running the demo notebook: demo/inference.ipynb

Datasets

Training Dataset Downloading

Preprocessing

We have already preprocessed the required masks and labels for you, but if you want to generate your own dataset, please follow the instructions HERE.

Video Mask Processing

src/Preprocess/mask_generation.py

STSL Generation
  • Using GPT to generate JSON structures from the video captions. src/Preprocess/GPTSpecs_1.py
  • Parsing the generated structures to create STSL programs. src/Preprocess/GPTSpecs_2.py
  • Negative sample generation for contrastive learning (see the illustrative sketch after this list). src/Preprocess/NegativeSampler.py
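
For intuition, here is an illustrative caption-to-specification pair together with a perturbed negative sample. The structure, field names, and relation vocabulary below are entirely hypothetical; the actual STSL syntax is produced by the scripts above and may look quite different.

# Hypothetical example of a caption, a positive spec, and a negative sample.
caption = "A person picks up a cup and then drinks from it."

positive_spec = {
    "objects": ["person", "cup"],
    "events": [
        {"subject": "person", "relation": "pick_up", "object": "cup"},
        {"subject": "person", "relation": "drink_from", "object": "cup"},
    ],
    "temporal": "events[0] BEFORE events[1]",
}

# A negative sample might perturb the temporal order (or swap a relation).
negative_spec = dict(positive_spec, temporal="events[1] BEFORE events[0]")

print(positive_spec["temporal"])   # events[0] BEFORE events[1]
print(negative_spec["temporal"])   # events[1] BEFORE events[0]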

Common Questions

1. Question: My SAM2 shows post-processing issues

Answer: Ensure your CUDA toolkit and your PyTorch build target the same CUDA version.

Take CUDA 12.4 as an example: if you have sudo access, you can simply run sudo apt-get install cuda-toolkit-12-4. If not, follow the instructions below.

  • Download CUDA. You need to create an installation directory to install without sudo access.
    # Install CUDA 12.4 without sudo
    # Download CUDA installer
    wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_550.54.14_linux.run
    # Create installation directory
    mkdir -p ~/cuda-12.4
    # Run installer
    # Use $HOME (rather than ~) so the path expands correctly inside --toolkitpath=
    sh cuda_12.4.0_550.54.14_linux.run --toolkit --toolkitpath=$HOME/cuda-12.4 --defaultroot=$HOME/cuda-12.4 --no-opengl-libs --no-man-page --no-drm
  • Once you run the installer, a UI will appear. Accept the end user license agreement; the CUDA Installer menu will then be shown. Note: replace the install path shown in the screenshots with the installation directory you created. [screenshot: CUDA installer menu]
  • Uncheck the Driver section, which is checked by default. Navigate to Options using the arrow keys and press Enter. [screenshot: driver unchecked]
  • The Options menu will appear. Navigate to Toolkit Options. [screenshot: CUDA options menu]
  • In Toolkit Options, navigate to Change Toolkit Install Path and make sure the install path is the installation directory you created earlier. [screenshot: change toolkit install path]
  • After changing the toolkit install path, stay in the Toolkit Options menu and make sure to uncheck "Create symbolic link from /usr/local/cuda". Navigate to Done. [screenshot: toolkit options menu]
  • Navigate to Library install path and ensure that it is also set to the installation directory. [screenshot: library install path]
  • Navigate to Done. Then navigate to Install. After installing, set your environment variables.
 echo 'export PATH=$HOME/cuda-12.4/bin:$PATH' >> ~/.bashrc
 echo 'export LD_LIBRARY_PATH=$HOME/cuda-12.4/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
 source ~/.bashrc
  • Verify your installation.
nvcc --version
  • Install PyTorch support for CUDA 12.4
conda install pytorch=2.5.1 torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
  • Verify PyTorch and CUDA 12.4
# Verify that PyTorch sees the CUDA 12.4 toolkit
import torch
print(f"PyTorch: {torch.__version__}")
print(f"CUDA toolkit: {torch.version.cuda}")
print(f"CUDA available: {torch.cuda.is_available()}")

Contributing

Contributing Guidelines

  1. Create a GitHub issue outlining the piece of work. Solicit feedback from anyone who has recently contributed to the component of the repository you plan to contribute to, for example on the ESCA Slack. If you are adding a feature, please share a brief one-page Google document describing what you're adding and how you will implement it.
  2. Check out a branch from main; preferably name your branch [github username]/[brief description of contribution].
  3. Create a pull request that refers to the created GitHub issue in the commit message.
  • To link to the GitHub issue, simply add to your commit message:
    [what the PR does briefly] #[issue number]
    
    Then when you push your commit and create your pull request, GitHub will automatically link the commit back to the issue. Add more details in the pull request, and request reviews from anyone who has recently modified related code.
  4. After 1-2 approvals, merge your pull request.

📚 Citation

If you use LASER in your research, please cite:

@inproceedings{huang2025laser,
  title={LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision},
  author={Huang, Jiani and Li, Ziyang and Naik, Mayur and Lim, Ser-Nam},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2025}
}

About

This is a public version of LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision
