
Latent Diffusion for Continuous-Scale Super-Resolution of Remote-Sensing Images

Hanlin Wu*  Jiangwei Mo*  Xiaohui Sun  Jie Ma
* Equal contribution
Beijing Foreign Studies University

Paper | PDF


Overview

[Overview figure]

Key Highlights

  • Continuous-Scale SR: Supports arbitrary scale factors from 1× to 8× with a single model
  • Two-Stage Latent Diffusion: Efficient super-resolution in compressed latent space
  • Few-Step Inference: Only 4 diffusion steps required for high-quality results
  • Remote Sensing Optimized: Designed specifically for aerial and satellite imagery

Dependencies and Installation

  1. Clone repo
git clone https://github.com/MoooJianG/LDCSR.git
cd LDCSR
  2. Install dependencies
conda create -n LDCSR python=3.10
conda activate LDCSR
pip install -r requirement.txt

Usage

Dataset Preparation

We support AID, DOTA, and DIOR out of the box. Download the HR images from the official links below (or use your own data) and generate LR/HR pairs by bicubic down-sampling (a minimal down-sampling sketch follows the folder-structure example below).

Supported Datasets:

  • AID - Aerial Image Dataset
  • DOTA - Dataset for Object Detection in Aerial Images
  • DIOR - Detection In Optical Remote sensing images
# Example: split AID
python data/prepare_split.py --split_file AID_split.pkl --data_path dataset/RawAID --output_path dataset/AID

Custom datasets should replicate the following folder structure:

└── dataset
    └── YourData
        ├── Train
        │   ├── HR
        │   └── LR
        ├── Test
        └── Val
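
For custom data, the LR folders can be filled by bicubic down-sampling of the HR images, following the layout above. Below is a minimal sketch (assuming Pillow is installed); the exact filenames and cropping convention used by the repo's own scripts may differ.

# make_lr.py -- minimal bicubic down-sampling sketch; not the repo's official script
import os
from PIL import Image

def make_lr(hr_dir, lr_dir, scale=4):
    os.makedirs(lr_dir, exist_ok=True)
    for name in sorted(os.listdir(hr_dir)):
        hr = Image.open(os.path.join(hr_dir, name)).convert("RGB")
        w, h = hr.size
        hr = hr.crop((0, 0, w - w % scale, h - h % scale))  # make size divisible by the scale
        lr = hr.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)
        lr.save(os.path.join(lr_dir, name))

make_lr("dataset/YourData/Train/HR", "dataset/YourData/Train/LR", scale=4)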

Quick Start

Model Training on AID

# First-stage
python train.py --config configs/first_stage_kl_v6.yaml
# Second-stage
python train.py --config configs/second_stage_van_v4.yaml

Model Testing

python test.py --checkpoint path/to/checkpoint.ckpt --datasets AID --scales 4

Pretrained Models

Download pretrained models and dataset splits from Google Drive.

Checkpoints:

  • first_stage/ - First stage VAE model
  • second_stage/ - Second stage diffusion model

Dataset Splits:

  • AID_split.pkl - AID dataset split
  • DIOR_400test.pkl - DIOR test split
  • DOTA_400test.pkl - DOTA test split
  • NWPU_100test.pkl - NWPU test split

Method

LDCSR adopts a two-stage latent diffusion framework:

Stage 1 - Variational Autoencoder (VAE)

  • Encodes HR images into a compact 4-channel latent space (8× spatial compression)
  • GAPEncoder/GAPDecoder with scale-aware decoding for continuous-scale reconstruction
  • Trained with reconstruction loss, KL divergence, perceptual loss (LPIPS), and adversarial loss (discriminator)
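
For intuition, the bullets above amount to simple shape arithmetic: a 256×256 HR patch becomes a 32×32 latent with 4 channels. The stand-in encoder below only reproduces that shape reduction; it is not the repo's GAPEncoder.

# latent_shape_sketch.py -- shape check only; the real model is the GAPEncoder/GAPDecoder
import torch

hr = torch.randn(1, 3, 256, 256)      # an HR RGB patch
embed_dim = 4                         # latent channels (see Configuration below)

# three stride-2 convolutions give the same 8x spatial compression
encoder = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, 3, stride=2, padding=1),
    torch.nn.Conv2d(64, 128, 3, stride=2, padding=1),
    torch.nn.Conv2d(128, embed_dim, 3, stride=2, padding=1),
)
z = encoder(hr)
print(z.shape)  # torch.Size([1, 4, 32, 32]) -- diffusion operates on this latent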

Stage 2 - Latent Diffusion Model

  • Performs super-resolution directly in the latent space
  • UNet denoiser conditioned on LR features, the scale factor, and output-size embeddings
  • Only 4 diffusion steps for efficient inference
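
As a rough picture of what few-step sampling in latent space looks like, here is a deterministic DDIM-style loop with a stand-in denoiser. The conditioning interface, noise schedule, and sampler actually used by the repo may differ; this only illustrates the 4-step idea.

# few_step_sampling_sketch.py -- illustrative DDIM-style loop, not the repo's sampler
import torch

def denoiser(z_t, t, lr_feats, scale, out_size):
    """Stand-in for the conditional UNet: predicts the noise in z_t."""
    return torch.zeros_like(z_t)  # a real model would use all of the conditions

timesteps = 4
betas = torch.linspace(1e-4, 2e-2, timesteps)   # cf. linear_start / linear_end
alphas_cum = torch.cumprod(1.0 - betas, dim=0)

lr_feats = torch.randn(1, 4, 32, 32)   # features encoded from the LR input
scale, out_size = 4.0, (256, 256)      # continuous scale factor + target size
z = torch.randn(1, 4, 32, 32)          # start from pure noise in latent space

for t in reversed(range(timesteps)):   # only 4 iterations
    a_t = alphas_cum[t]
    a_prev = alphas_cum[t - 1] if t > 0 else torch.tensor(1.0)
    eps = denoiser(z, t, lr_feats, scale, out_size)
    z0 = (z - (1 - a_t).sqrt() * eps) / a_t.sqrt()      # predicted clean latent
    z = a_prev.sqrt() * z0 + (1 - a_prev).sqrt() * eps  # deterministic DDIM step
# z would then go to the scale-aware decoder to produce the HR image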

Results

Quantitative comparison on the AID dataset. The tables below show the ×4 results; see the paper for ×2 and ×8.

Fixed-Scale Methods:

Method   Scale   PSNR↑   LPIPS↓   FID↓
HAT-L    ×4      29.49   0.321    39.82
SR3      ×4      28.15   0.252    26.01
SPSR     ×4      25.95   0.185    20.94

Continuous-Scale Methods:

Method    Scale   PSNR↑   LPIPS↓   FID↓
LIIF      ×4      29.32   0.334    42.49
CiaoSR    ×4      29.64   0.316    38.77
LMF-LTE   ×4      29.62   0.318    39.07
Ours      ×4      27.19   0.174    18.37

Our method achieves the best perceptual quality (LPIPS, FID) among continuous-scale methods while maintaining competitive PSNR. See the paper for full results on AID, DOTA, and DIOR datasets.

Configuration

Key parameters in config files:

First Stage (configs/first_stage_kl_v6.yaml)

  • embed_dim: Latent space dimension (default: 4)
  • ch_mult: Channel multipliers for encoder/decoder
  • disc_start: Step to enable discriminator training

Second Stage (configs/second_stage_van_v4.yaml)

  • timesteps: Number of diffusion steps (default: 4)
  • linear_start/end: Noise schedule parameters
  • scale_by_std: Whether to normalize latent by std
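
Both config files are plain YAML, so the parameters above can be inspected or overridden programmatically before launching train.py. A small sketch follows; the nesting of these keys inside the files is an assumption, so check the actual structure first.

# inspect_config.py -- sketch for locating config values (requires PyYAML)
import yaml

with open("configs/second_stage_van_v4.yaml") as f:
    cfg = yaml.safe_load(f)

print(sorted(cfg))   # inspect the top-level keys first

# The exact nesting of `timesteps`, `linear_start`, etc. depends on the file;
# locate them with a quick recursive search before editing.
def find_key(node, key, path=""):
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                print(f"{path}/{k} = {v}")
            find_key(v, key, f"{path}/{k}")
    elif isinstance(node, list):
        for i, v in enumerate(node):
            find_key(v, key, f"{path}[{i}]")

find_key(cfg, "timesteps")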

Contact

If you have any questions or suggestions, feel free to contact me.

Email: 20220119004@bfsu.edu.cn

Citation

@ARTICLE{11006698,
  author={Wu, Hanlin and Mo, Jiangwei and Sun, Xiaohui and Ma, Jie},
  journal={IEEE Transactions on Geoscience and Remote Sensing}, 
  title={Latent Diffusion, Implicit Amplification: Efficient Continuous-Scale Super-Resolution for Remote Sensing Images}, 
  year={2025},
  volume={},
  number={},
  pages={1-1},
  keywords={Diffusion models;Training;Image synthesis;Noise reduction;Visualization;Decoding;Computational modeling;Remote sensing;Image reconstruction;Autoencoders;Remote sensing;super-resolution;latent diffusion;continuous-scale},
  doi={10.1109/TGRS.2025.3571290}}

License

This project is released under the MIT License.

Acknowledgements

This work builds upon several excellent open-source projects.
