ATR-UMMIR (Unmanned Multimodal Image Registration) is a large-scale dataset for multimodal image registration and matching in aerial scenarios. It focuses on aligning visible and infrared image pairs captured by UAVs under diverse real-world conditions. This dataset is designed to advance research in multimodal image alignment, fusion-based detection, and condition-aware visual understanding.
- Modalities: Aligned visible–infrared image pairs
- Scene Count: 15,000+ unique locations
- Total Images: 60,000+ (30k visible, 30k infrared)
- Resolution: 640×512 pixels
- Annotations:
  - Coarse-level manual alignment
  - Fine-level keypoints (for a subset)
  - Detailed condition labels (see below)
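The fine-level keypoint annotations pair matched locations across the two modalities, which makes a classical planar alignment step straightforward. Below is a minimal sketch, assuming the keypoints are available as (N, 2) arrays of matched visible/infrared pixel coordinates (the function name and data layout are illustrative, not the dataset's actual format), estimating a homography with the standard DLT algorithm:

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 homography H mapping src -> dst (DLT algorithm).

    src, dst: (N, 2) arrays of matched keypoints, N >= 4,
    no three points collinear.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the right-singular vector of the smallest singular value
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # normalize so H[2, 2] == 1

# Toy check: four corners of a 640x512 frame shifted by (+5, -3) pixels
src = np.array([[0, 0], [640, 0], [640, 512], [0, 512]], dtype=float)
dst = src + np.array([5.0, -3.0])
H = estimate_homography(src, dst)
# -> approximately [[1, 0, 5], [0, 1, -3], [0, 0, 1]]
```

With real cross-modal matches, a robust estimator such as RANSAC should wrap this fit, since visible–infrared correspondences typically contain outliers.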
To reflect real-world complexity, each image pair is annotated across six condition attributes:
- Altitude: 80 m–300 m (majority at 100–120 m)
- Camera Angle: 0° (nadir) to 75° (oblique), majority at 30°–45°
- Shooting Time: Day, night, dawn, morning, afternoon
- Weather: Sunny, cloudy, rainy, after-rain, foggy
- Illumination: Night, twilight, dim, normal, overexposed
- Scenario: 11 types, including urban, suburban, village, factory, road, and school
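Because every pair carries all six condition labels, subsets for condition-aware experiments can be selected with a simple filter. A sketch assuming a hypothetical per-pair dict schema (the field names and values here are illustrative, not the dataset's actual annotation format):

```python
# Hypothetical annotation records: one dict per visible-infrared pair
pairs = [
    {"id": "0001", "altitude_m": 110, "angle_deg": 35, "time": "day",
     "weather": "sunny", "illumination": "normal", "scenario": "urban"},
    {"id": "0002", "altitude_m": 120, "angle_deg": 40, "time": "night",
     "weather": "foggy", "illumination": "night", "scenario": "road"},
    {"id": "0003", "altitude_m": 90, "angle_deg": 70, "time": "dawn",
     "weather": "after-rain", "illumination": "twilight", "scenario": "village"},
]

def select(pairs, **conditions):
    """Return the pairs whose labels match every given condition value."""
    return [p for p in pairs if all(p.get(k) == v for k, v in conditions.items())]

night_foggy = select(pairs, time="night", weather="foggy")
# -> the single record with id "0002"
```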
This rich condition diversity enables robust evaluation of multimodal models under dynamic imaging environments.
[⬇ Dataset Download Link (link)]
Please fill in the download address or contact us for access.
Potential use cases include:
- Multimodal image registration and alignment
- Condition-aware image matching
- Cross-modality fusion and detection
- UAV-based remote sensing tasks
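As a quick registration baseline on roughly aligned pairs, FFT phase correlation recovers a global translation between the two modalities. This is a generic illustrative technique (not a method shipped with the dataset), sketched here on synthetic single-channel frames:

```python
import numpy as np

def phase_correlation_shift(ref, mov):
    """Recover the integer (dy, dx) such that mov == np.roll(ref, (dy, dx)),
    using FFT phase correlation (a common global-registration baseline)."""
    F1 = np.fft.fft2(ref)
    F2 = np.fft.fft2(mov)
    # Normalized cross-power spectrum: a pure phase ramp encoding the shift
    cross = np.conj(F1) * F2
    cross /= np.abs(cross) + 1e-12
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap shifts larger than half the frame into negative offsets
    if dy > ref.shape[0] // 2:
        dy -= ref.shape[0]
    if dx > ref.shape[1] // 2:
        dx -= ref.shape[1]
    return int(dy), int(dx)

# Synthetic check: translate a random frame by (7, -12) pixels
rng = np.random.default_rng(0)
ref = rng.random((128, 160))
mov = np.roll(ref, shift=(7, -12), axis=(0, 1))
dy, dx = phase_correlation_shift(ref, mov)
# -> (7, -12)
```

Phase correlation only models a global, circular translation; for the oblique, cross-modal pairs in this dataset it serves as a coarse initialization at best, to be refined by keypoint- or learning-based registration.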
If you use ATR-UMMIR in your research, please cite:
```bibtex
@misc{ATRUMMIR2025,
  title={ATR-UMMIR: A Multimodal UAV Image Matching Dataset under Diverse Conditions},
  author={Your Name and Others},
  year={2025},
  howpublished={\url{https://github.com/yourname/ATR-UMMIR}},
}
```