
GLASS: Guided Latent Slot Diffusion for Object-Centric Learning

Official repository of the CVPR 2025 paper
GLASS: Guided Latent Slot Diffusion for Object-Centric Learning

Krishnakant Singh, Simone Schaub-Meyer, and Stefan Roth
Visual Inference Lab, TU Darmstadt


Overview

GLASS introduces a diffusion-based framework for object-centric representation learning.
It integrates slot attention with a latent diffusion decoder to learn slot representations that generalize across visual tasks:

  • 🧠 Unsupervised Object Discovery
  • 🎨 Image Generation & Reconstruction
  • Compositional Image Generation

🔧 Dependencies

Python >= 3.11
PyTorch == 2.5.0
CUDA == 11.8

⚙️ Environment Setup

conda create -n glass python==3.11.10
conda activate glass

# Install PyTorch and CUDA
conda install pytorch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 pytorch-cuda=11.8 -c pytorch -c nvidia

# Install remaining dependencies
pip install -r requirements.txt
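
After installation, a quick sanity check can confirm that the active environment matches the versions listed under Dependencies. The following is a minimal sketch and not part of the repository; the function name `check_environment` is hypothetical, and it only reports what it finds instead of enforcing anything.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 11)):
    """Collect basic facts about the active environment without failing hard.

    Hypothetical helper, not shipped with GLASS.
    """
    report = {
        "python_version": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info[:2] >= min_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if report["torch_installed"]:
        # Imported lazily so the check also runs before PyTorch is installed.
        import torch

        report["torch_version"] = torch.__version__
        report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

With the setup above, one would expect `torch_version` to start with 2.5.0 and `cuda_build` to read 11.8; `cuda_available` additionally depends on the local driver and GPU.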
💾 Pretrained Models

Pretrained checkpoints from the paper are available here:
📥 Google Drive Folder

Please unzip the folder and place the models under a top-level directory named glass/.

πŸ–ΌοΈ Datasets

πŸš€ Evaluation

🧠 Object-Centric Segmentation

bash ./src/eval/scripts/coco/eval_oclf_metrics_coco.sh

This creates a metrics_coco.json file in the checkpoint folder.
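
To inspect the results afterwards, the file can be read with the standard library. This is a minimal sketch: the helper names `load_metrics` and `format_metrics` are hypothetical, and since the layout of metrics_coco.json is not documented here, the code only assumes that the top level is a JSON object and prints whatever entries it contains.

```python
import json
from pathlib import Path


def load_metrics(checkpoint_dir, filename="metrics_coco.json"):
    """Read the metrics file written into the checkpoint folder."""
    path = Path(checkpoint_dir) / filename
    with path.open() as f:
        return json.load(f)


def format_metrics(metrics):
    """Render top-level entries one per line; floats get four decimals."""
    lines = []
    for name, value in sorted(metrics.items()):
        if isinstance(value, float):
            lines.append(f"{name}: {value:.4f}")
        else:
            lines.append(f"{name}: {value}")
    return "\n".join(lines)
```

Usage would look like `print(format_metrics(load_metrics("path/to/checkpoint")))`, where the path is a placeholder for the checkpoint folder used by the evaluation script.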

🎨 Image Generation

bash ./src/eval/scripts/coco/eval_generation.sh

Compositional Generation

We provide a very crude implementation for generating compositional images.

bash ./src/eval/scripts/coco/eval_composition.sh
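
For convenience, the three evaluation scripts in this section can be run back to back from a small Python wrapper. This is a sketch and not part of the repository: `run_evaluations` is a hypothetical name, and it assumes the scripts are invoked from the repository root with no arguments, exactly as in the commands above.

```python
import subprocess
from pathlib import Path

# Script paths as given in this README, relative to the repository root.
EVAL_SCRIPTS = [
    "src/eval/scripts/coco/eval_oclf_metrics_coco.sh",
    "src/eval/scripts/coco/eval_generation.sh",
    "src/eval/scripts/coco/eval_composition.sh",
]


def run_evaluations(repo_root=".", scripts=EVAL_SCRIPTS, dry_run=False):
    """Run each script with bash from repo_root; stop at the first failure.

    Returns the list of commands so callers can log or inspect them.
    """
    commands = [["bash", f"./{script}"] for script in scripts]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(command, cwd=Path(repo_root), check=True)
    return commands
```

Calling `run_evaluations(dry_run=True)` only returns the commands without executing anything, which is a cheap way to confirm the paths before starting a long evaluation.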

📌 TODO

  • Release full training pipeline

📚 Citation

If you find this repository useful, please consider citing:

@inproceedings{singh2025glass,
  author    = {Krishnakant Singh and Simone Schaub-Meyer and Stefan Roth},
  title     = {GLASS: Guided Latent Slot Diffusion for Object-Centric Learning},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2025},
}

πŸ™ Acknowledgements

This repository builds upon
LSD: Latent Slot Diffusion and Dataset Diffusion. We thank the authors for open-sourcing their work.


📜 License

License: Apache 2.0


βœ‰οΈ Contact

Krishnakant Singh
πŸ“§ firstname.lastname@visinf.tu-darmstadt.de
🌐 https://visinf.github.io/glass

