training

Training

Training scripts for the See-through layer decomposition models: LayerDiff, Marigold depth, transparent VAE, and body part segmentation.

This codebase produces the V3 model with 23 body-part tag training. To reproduce the results from the paper, refer to the v0.0.1 model.

Environment Setup

Uses the unified see_through conda env. See the root README for setup.

Optional extras

# DeepSpeed ZeRO for multi-GPU training (required for LayerDiff)
pip install -r requirements-training-deepspeed.txt

# Experiment tracking
pip install wandb

# 8-bit Adam optimizer (used with --use_8bit_adam in config)
pip install bitsandbytes

# FID / LPIPS / PSNR benchmarks (required by scripts/benchmark.py)
pip install torchmetrics

Data Preparation

Training data is prepared in three stages:

Extract Live2D model layers using CubismPartExtr — converts .moc3 model files into per-drawable RGBA images.
Parse and label the extracted layers using parse_live2d.py and the annotation UI — runs SAM segmentation and assigns body-part tags. See README_datapipeline.md for the full walkthrough.
Render training samples using scripts/data_pipeline.py — composites labeled layers onto background images with augmentation to produce the final training data.

Training data should be placed under workspace/datasets/ (gitignored). The sample list paths in the YAML configs (e.g. workspace/datasets/l2d_bodysamples_v3.txt) point to text files listing the training samples, one per line.

Testbed

Our training was conducted on 8x NVIDIA H200 GPUs. LayerDiff and LayerDiff 3D require multi-GPU training with DeepSpeed ZeRO-2; other models can be trained on a single GPU.

Training Pipelines

The two main model families each follow a three-stage pipeline: train a 2D model, convert its UNet weights to 3D, then fine-tune the 3D model.

Marigold depth (SD 1.5 scale):

train_marigold_depth.py  -->  cvt_marigold2d_to_3d.py  -->  train_marigold3d.py
    (2D depth model)          (UNet weight conversion)       (3D depth model)

LayerDiff (SDXL scale):

train_layerdiff.py  -->  cvt_layerdiff2d_to_3d.py  -->  train_layerdiff3d.py
   (2D layer model)       (UNet weight conversion)       (3D layer model)

Auxiliary models (single-stage, single-GPU):

Script	Purpose
`train/train_depth.py`	Depth Anything V2 adapter
`train/train_vae.py`	Transparent VAE encoder/decoder
`train/train_partseg.py`	SAM-HQ body part segmentation

Usage

Always run from the repository root:

cd /path/to/see-through
conda activate see_through

# Multi-GPU training with DeepSpeed (LayerDiff example)
accelerate launch --config_file training/configs/test_ddp_4gpu.json \
  training/train/train_layerdiff.py \
  --config training/configs/test_layerdiff.yaml

Accelerate config files are provided for 4-GPU (test_ddp_4gpu.json) and 8-GPU (ddp_bf16.json) setups. Adjust num_processes to match your GPU count.

Script Inventory

Training scripts (`train/`)

Script	Purpose
`train_layerdiff.py`	LayerDiff fine-tuning (SDXL, multi-GPU DeepSpeed)
`train_layerdiff3d.py`	LayerDiff 3D training (SDXL, multi-GPU DeepSpeed)
`train_marigold_depth.py`	Marigold 2D depth estimation
`train_marigold3d.py`	Marigold 3D depth
`train_partseg.py`	Body part segmentation (SAM-HQ)
`train_depth.py`	Depth Anything V2 adapter
`train_vae.py`	Transparent VAE encoder/decoder
`dataset_layerdiff.py`	LayerDiff dataset loader
`dataset_depth.py`	Depth dataset loader
`dataset_seg.py`	Segmentation dataset loader
`loss_depth.py`	Depth training losses
`loss_vae.py`	VAE training losses (LPIPS + ConvNeXt perceptual)
`loss_mask_samhq.py`	SAM-HQ mask losses
`eval_utils.py`	Evaluation utilities
`kepler.py`	Kepler codebook quantizer (VQ-VAE)
`benchmark.py`	In-training benchmark utilities

Utility scripts (`scripts/`)

Script	Purpose
`cvt_marigold2d_to_3d.py`	Convert Marigold 2D UNet weights to 3D
`cvt_layerdiff2d_to_3d.py`	Convert LayerDiff 2D UNet weights to 3D
`data_pipeline.py`	Training data rendering and augmentation
`benchmark.py`	FID / LPIPS / PSNR evaluation (requires `torchmetrics`)
`save_ckpt.py`	Checkpoint format conversion
`ckpt.py`	Checkpoint utilities
`hf.py`	HuggingFace Hub upload/download helpers

Metrics (`metrics/`)

Script	Purpose
`clip_score.py`	CLIP-based similarity scoring
`binary_dice_loss.py`	Binary Dice loss for segmentation

Configs (`configs/`)

Config	Purpose
`test_layerdiff.yaml`	LayerDiff training config
`test_layerdiff3d.yaml`	LayerDiff 3D training config
`test_marigold_depth.yaml`	Marigold 2D depth config
`test_marigold3d.yaml`	Marigold 3D depth config
`test_depth.yaml`	Depth Anything adapter config
`test_vae.yaml`	Transparent VAE config
`test_partseg.yaml`	Body part segmentation config
`finetune_layerdiff_iter2.yaml`	LayerDiff fine-tuning with multi-source data
`test_ddp_4gpu.json`	Accelerate config for 4-GPU DeepSpeed
`ddp_bf16.json`	Accelerate config for 8-GPU DeepSpeed

Name		Name	Last commit message	Last commit date
parent directory ..
configs		configs
metrics		metrics
scripts		scripts
tests		tests
train		train
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Training

Environment Setup

Optional extras

Data Preparation

Testbed

Training Pipelines

Usage

Script Inventory

Training scripts (`train/`)

Utility scripts (`scripts/`)

Metrics (`metrics/`)

Configs (`configs/`)

FilesExpand file tree

training

Directory actions

More options

Directory actions

More options

Latest commit

History

training

Folders and files

parent directory

README.md

Training

Environment Setup

Optional extras

Data Preparation

Testbed

Training Pipelines

Usage

Script Inventory

Training scripts (train/)

Utility scripts (scripts/)

Metrics (metrics/)

Configs (configs/)

Training scripts (`train/`)

Utility scripts (`scripts/`)

Metrics (`metrics/`)

Configs (`configs/`)