Official PyTorch implementation of "SemLayoutDiff: Semantic Layout Diffusion for 3D Indoor Scene Generation"
Xiaohao Sun¹, Divyam Goel², Angel X. Chang¹,³
¹Simon Fraser University, ²CMU, ³Alberta Machine Intelligence Institute (Amii)
Project Website | Paper
- Overview
- Quick Start
- Pretrained Weights
- Dataset Preparation
- Model Training
- Scene Generation
- Scene Rendering
- Evaluation
- Citation
- Acknowledgements
SemLayoutDiff is a two-stage approach for generating realistic 3D indoor scenes conditioned on a room type and a room architecture mask. Our method combines semantic layout generation with attribute prediction to create complete 3D rooms.
Our approach consists of two main components:
- Semantic Layout Diffusion Network (SLDN): Generates 2D semantic layout maps from room type conditioning using a discrete diffusion model.
- Attribute Prediction Model (APM): Takes the generated 2D semantic layouts and predicts full 3D object attributes, including positions, orientations, and scales for each instance.
After these two stages, we perform object retrieval based on object category and size to construct the complete 3D scene.
- Clone the repository
git clone https://github.com/3dlg-hcvc/SemLayoutDiff.git
cd SemLayoutDiff
- Set up the environment
conda env create -f environment.yml
conda activate semlayoutdiff
# Install semlayoutdiff
pip install -e .
# Install ai2thor for libsg build
pip install --extra-index-url https://ai2thor-pypi.allenai.org ai2thor==0+8524eadda94df0ab2dbb2ef5a577e4d37c712897
We provide pretrained weights for both model components to enable quick scene generation without training from scratch.
Download Links:
Setup:
# Create checkpoints directory
mkdir -p checkpoints && cd checkpoints
# Download and extract pretrained weights
# SLDN weights
wget [SLDN_DOWNLOAD_LINK] -O sldn_checkpoints.tar.gz
tar -xf sldn_checkpoints.tar.gz
# APM weights
wget [APM_DOWNLOAD_LINK] -O apm_checkpoints.ckpt
The data requirements depend on your use case:
| Use Case | Required Data | Description |
|---|---|---|
| Scene Generation Only (using pretrained models) | threed_future_model_unified.json | 3D furniture model database for object retrieval |
| Training SLDN | unified_w_arch_120x120.npy | Preprocessed 2D semantic layouts for diffusion training |
| Training APM | unified_w_arch_3dfront/ | 3D scene data with semantic maps and object attributes |
For most users: Download our preprocessed datasets built on 3D-FRONT and 3D-FUTURE:
# Create datasets directory and download processed data
mkdir -p datasets && cd datasets
wget [DOWNLOAD_LINK] -O 3dfront_processed.tar.gz
tar -xzf 3dfront_processed.tar.gz --strip-components=1 && cd ..
Download Link: 3D-FRONT Processed Dataset
What's included:
- unified_w_arch_120x120.npy: SLDN training data (2D semantic layouts with room architecture)
- unified_w_arch_3dfront/: APM training dataset (3D scenes with object attributes and semantic maps)
- threed_future_model_unified.json: 3D-FUTURE model database (required for object retrieval during scene generation)
Note: Data splits and metadata are included in the repository under preprocess/metadata/ (no separate download needed).
Click to expand directory structure
datasets/
├── unified_w_arch_120x120.npy          # SLDN training data (required!)
├── unified_w_arch_3dfront/             # APM training dataset
│   └── train/val/test/                 # Scene data with .json/.png files
├── threed_future_model_unified.json    # 3D-FUTURE models
└── results/                            # Generated outputs
Getting Started with Custom Processing:
For detailed instructions on processing your own data from raw 3D-FRONT datasets, see preprocess/README.md.
Our approach consists of two main components that can be trained independently:
- Semantic Layout Diffusion Network (SLDN): Generates 2D semantic maps from room type conditioning
- Attribute Prediction Model (APM): Projects 2D semantic maps to full 3D object layouts with attributes
Our training uses YAML configuration files in the configs/ directory. For reproducing paper results, use:
- configs/sldn/mixed_con_unified_config.yaml for SLDN training
- configs/apm/unified_config.yaml for APM training
Train the semantic layout diffusion model on the 3D-FRONT dataset:
python scripts/train_sldn.py --config configs/sldn/mixed_con_unified_config.yaml
Key parameters include batch_size: 48, data_size: 120 for 120×120 semantic maps, and num_categories: 38 object types. The model uses room_type_condition: true and w_arch: true to enable conditioning on both room types and architectural elements.
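For convenience, the settings mentioned above are collected below as an illustrative sketch; check configs/sldn/mixed_con_unified_config.yaml for the exact key names and nesting.

```yaml
# Illustrative SLDN training settings (key names/nesting may differ in the actual config)
batch_size: 48             # default batch size
data_size: 120             # 120x120 semantic layout maps
num_categories: 38         # number of object categories
room_type_condition: true  # condition on room type
w_arch: true               # condition on architectural elements (doors/windows)
```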
Requirements:
- datasets/unified_w_arch_120x120.npy - Essential SLDN training data
- GPU with ≥16GB memory for the default batch size
Train the attribute prediction model to map semantic layouts to 3D scenes:
python scripts/train_apm.py --config-path=../configs/apm --config-name=unified_config
Requirements:
- datasets/unified_w_arch_3dfront/ - Essential APM training data
- datasets/threed_future_model_unified.json - 3D model database
Generate 2D semantic maps using the trained SLDN:
python scripts/sample_layout.py --config configs/sldn/sample_layout.yaml
Configure the script by editing sample_layout.yaml. Set room_type to unified and sample_room_type to 0, 1, or 2 for bedroom, dining room, and living room respectively. Use samples: 5 to generate 5 layouts and condition_type: "arch" for architecture conditioning. Set w_arch: true to include doors/windows and specify out_dir for the output location.
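As a reference, here is a minimal sketch of the fields mentioned above; the exact key names and nesting should be verified against the shipped sample_layout.yaml, and the output path is only an example.

```yaml
# Illustrative sample_layout.yaml settings (verify against the actual config)
room_type: unified        # use the unified SLDN model
sample_room_type: 0       # 0 = bedroom, 1 = dining room, 2 = living room
samples: 5                # number of layouts to generate
condition_type: "arch"    # condition on the room architecture mask
w_arch: true              # include doors/windows in the layout
out_dir: datasets/results/sampled_layouts   # example output location
```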
Convert semantic layouts to full 3D scenes with the APM:
python scripts/inference.py --config-path=../configs/apm --config-name=unified_config
Modify unified_config.yaml to set semantic_map_dir to your generated layouts directory, room_type to match the layout room type (bedroom/livingroom/diningroom), and output_dir for results. The script uses checkpoint_path to load the trained APM model.
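A minimal sketch of these inference-related fields is shown below, assuming the pretrained APM checkpoint downloaded earlier; the paths are examples and the exact key names/nesting may differ in unified_config.yaml.

```yaml
# Illustrative inference settings for configs/apm/unified_config.yaml (example paths)
checkpoint_path: checkpoints/apm_checkpoints.ckpt     # trained APM weights
semantic_map_dir: datasets/results/sampled_layouts    # layouts generated by sample_layout.py
room_type: bedroom                                     # bedroom / livingroom / diningroom
output_dir: datasets/results/generated_scenes          # where scene JSON files are written
```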
Output: Generated 3D scenes are saved as JSON files containing object positions, orientations, scales, categories, and 3D model IDs for furniture retrieval.
Render generated scenes using the SmartScenes ToolKit (SSTK) for visualization and quantitative evaluation.
- Install SSTK following the instructions in their repository
- Prepare 3D-FRONT metadata from sstk-metadata following their README setup instructions
NODE_BASE_URL=<your_root_path> ~/path/to/sstk/ssc/render-file.js \
--assetType scene \
--assetGroups 3dfModel,3dfTexture \
--input datasets/results/scenestate_files/your_scene.json \
--output_dir rendered_scenes \
--color_by objectType \
--index preprocess/metadata/semantic_color_index_livingdiningroom.csv \
--restrict_to_color_index \
--assetInfo '{"archOptions":{"includeWallHoleBoundingBoxes":true}}'
Parameters:
- --input: Path to generated scenestate JSON file
- --index: Semantic color mapping (use semantic_color_index_bedroom.csv for bedroom scenes)
- --output_dir: Directory for rendered outputs
- --assetInfo: Includes walls and doors in rendering with architectural options
The semantic color indices are provided in preprocess/metadata/ for different room types.
Export Scene as GLB File (click to expand)
Export generated scenes as GLB (GL Transmission Format Binary) files for use in 3D applications:
NODE_BASE_URL=<your_root_path> ~/path/to/sstk/ssc/export-mesh.js \
--assetType scene \
--assetGroups 3dfModel,3dfTexture \
--input datasets/results/scenestate_files/your_scene.json \
--input_type path \
--output_format glb \
--output_dir datasets \
--color_by objectType \
--index preprocess/metadata/semantic_color_index_livingdiningroom.csv \
--restrict_to_color_index \
--assetInfo '{"archOptions":{"includeWallHoleBoundingBoxes":true}}'
Parameters:
- --input: Path to generated scenestate JSON file
- --input_type: Set to path for file path input
- --output_format: Set to glb for GLB export format
- --output_dir: Directory for exported GLB files
- --index: Semantic color mapping (use semantic_color_index_bedroom.csv for bedroom scenes)
- --assetInfo: Includes walls and doors in the exported model with architectural options
The exported GLB files can be imported into Blender for better visualization.
Evaluate generated scenes using multiple quantitative metrics:
python scripts/eval.py --config configs/evaluation_config.yaml
The evaluation script computes four key metrics:
1. Out-of-Boundary (OOB) Rate: Measures the percentage of generated objects that are placed outside the room boundaries or overlap with walls/architectural elements.
2. Scene Classification Accuracy (SCA): Trains a CNN classifier on real scene renderings and tests it on generated scenes to measure visual realism and scene coherence.
3. Fréchet Inception Distance (FID): Computes FID and KID scores between real and generated scene renderings to measure visual quality and diversity.
4. KL Divergence: Measures the difference between object category distributions in real vs. generated scenes across 34 furniture types (beds, chairs, tables, etc.).
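For reference, the category KL divergence here is the standard discrete KL between the real and generated category frequency distributions over the 34 furniture categories (the exact direction and smoothing used by eval.py may differ):

$$
D_{\mathrm{KL}}(p_{\text{real}} \,\|\, p_{\text{gen}}) = \sum_{c=1}^{34} p_{\text{real}}(c)\,\log\frac{p_{\text{real}}(c)}{p_{\text{gen}}(c)}
$$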
For scene plausibility evaluation, please refer to SceneEval.
If you find this work useful, please consider citing our paper:
@article{sun2025semlayoutdiff,
title={{SemLayoutDiff}: Semantic Layout Generation with Diffusion Model for Indoor Scene Synthesis},
author={Xiaohao Sun and Divyam Goel and Angel X. Chang},
year={2025},
eprint={2508.18597},
archivePrefix={arXiv},
}
Thanks to Multinomial Diffusion, whose implementation forms the foundation of our diffusion models, and to BlenderProc-3DFront and DiffuScene, whose code our data processing builds on.
This work was funded in part by a CIFAR AI Chair and NSERC Discovery Grants, and enabled by support from the Digital Research Alliance of Canada and a CFI/BCKDF JELF. We thank Ivan Tam for help with running SceneEval; Yiming Zhang and Jiayi Liu for suggestions on figures; Derek Pun, Dongchen Yang, Xingguang Yan, and Manolis Savva for discussions, proofreading, and paper suggestions.