Official PyTorch implementation of Spatial Temporal Reasoning Models (STRMs) for vision-based localization.
STRMs transform first-person perspective (FPP) observations into global map perspective (GMP) and precise geographical coordinates, achieving GPS-level precision for autonomous navigation.
If you use this code in your research, please cite our paper:

```bibtex
@article{lui2025strms,
  title={STRMs: Spatial Temporal Reasoning Models for Vision-Based Localization Rivaling GPS Precision},
  author={Lui, Hin Wai and Krichmar, Jeffrey L.},
  journal={arXiv preprint arXiv:2503.07939},
  year={2025}
}
```

Paper: [arXiv:2503.07939](https://arxiv.org/abs/2503.07939)
Download the following from Google Drive:
- Datasets (Jackal and Tesla)
- Pretrained model weights
- Satellite images (`jackal_satellite.png` and `tesla_satellite.png`, required for generating Global Map Perspective (GMP) images)
```bash
pip install gdown
gdown --folder FOLDER_URL
```

Note: Download each file individually to avoid rate limits when downloading the entire dataset at once.
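If the folder download hits rate limits, the files can also be fetched one at a time with gdown's Python API. This is only a sketch: the Drive file IDs below are placeholders for the IDs of the shared files.

```python
# Optional alternative to the folder download: fetch files individually
# with gdown's Python API to sidestep rate limits. The file IDs are
# placeholders -- replace them with the IDs of the shared files; the same
# pattern applies to the dataset archives and pretrained weights.
import gdown

files = {
    "jackal_satellite.png": "JACKAL_SAT_FILE_ID",   # placeholder ID
    "tesla_satellite.png": "TESLA_SAT_FILE_ID",     # placeholder ID
}

for output, file_id in files.items():
    gdown.download(f"https://drive.google.com/uc?id={file_id}", output, quiet=False)
```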
After downloading, your directory should look like:
```
strm/
├── data/
│   ├── jackal/
│   └── tesla/
├── trained_weights/
│   └── models/
├── jackal_satellite.png   # Required for GMP image generation
└── tesla_satellite.png    # Required for GMP image generation
```
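As an optional sanity check (not one of the repository's scripts), you can confirm the expected paths exist before running anything:

```python
# Optional sanity check: confirm the downloaded files and folders are
# where the preprocessing and training scripts expect them.
from pathlib import Path

expected = [
    "data/jackal",
    "data/tesla",
    "trained_weights/models",
    "jackal_satellite.png",
    "tesla_satellite.png",
]

for rel in expected:
    status = "ok" if Path(rel).exists() else "MISSING"
    print(f"{status:8s} {rel}")
```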
Clone the repository and set up the environment:

```bash
git clone https://github.com/UCI-CARL/strm.git
cd strm
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
pip install git+https://gitlab.com/paloha/gpuMultiprocessing
```

Requirements:

- Python 3.8+
- PyTorch 2.0+
- CUDA-compatible GPU (recommended)
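A quick, generic way to confirm the installed PyTorch build and GPU visibility (not a repo script):

```python
# Quick environment check: PyTorch version and CUDA availability.
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```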
Preprocess the raw datasets:

```bash
python jackal_pre_process.py --source data/jackal
python tesla_pre_process.py --source data/tesla
```

Dataset caches are automatically built during training when needed. You can also build them manually:

```bash
python dataloader.py --source data/jackal --seq_len 24 --seq_delta 10
python dataloader.py --source data/tesla --seq_len 24 --seq_delta 10
```

Note: A new cache is created whenever you use different `seq_len` or `seq_delta` values.
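To make the two cache parameters concrete: each training sample is a window of `seq_len` frames spaced `seq_delta` seconds apart, so the cache contents depend on both values. The snippet below is a generic illustration of that windowing idea, not the repository's `dataloader.py` code.

```python
# Generic illustration of the windowing idea behind seq_len / seq_delta
# (an assumption about intent, not the actual dataloader.py code): pick
# seq_len frame indices whose timestamps are ~seq_delta seconds apart.
import bisect

def sequence_windows(timestamps, seq_len=24, seq_delta=10.0):
    """Yield index windows of seq_len frames spaced ~seq_delta seconds apart."""
    for start in range(len(timestamps)):
        window = [start]
        for k in range(1, seq_len):
            idx = bisect.bisect_left(timestamps, timestamps[start] + k * seq_delta)
            if idx >= len(timestamps):
                break
            window.append(idx)
        if len(window) == seq_len:
            yield window

# Example: frames logged once per second for five minutes.
timestamps = [float(t) for t in range(300)]
print(next(sequence_windows(timestamps)))  # [0, 10, 20, ..., 230]
```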
Train a model on each dataset:

```bash
python vae_train.py --model VAERNN --source data/jackal --seq_len 24 --seq_delta 10 --tag jackal_rnn
python vae_train.py --model VAETransformer --source data/tesla --seq_len 24 --seq_delta 10 --tag tesla_transformer --lr=8e-6
```

Note: Transformer models require a lower learning rate (8e-6) for optimal performance.
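If you want to queue several runs from one script, a plain loop over the documented CLI flags is enough. The configurations below simply mirror the two commands above.

```python
# Launch the two training runs above back to back. This only wraps the
# documented vae_train.py CLI flags from this README.
import subprocess

runs = [
    ["--model", "VAERNN", "--source", "data/jackal", "--tag", "jackal_rnn"],
    ["--model", "VAETransformer", "--source", "data/tesla",
     "--tag", "tesla_transformer", "--lr", "8e-6"],  # transformers use a lower lr
]

for extra in runs:
    cmd = ["python", "vae_train.py", "--seq_len", "24", "--seq_delta", "10", *extra]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```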
Test your setup with a small subset of data:
```bash
python vae_train.py --debug --model VAETransformer --source data/jackal
```

Evaluate models without retraining:
```bash
python vae_train.py --model VAERNN --source data/jackal \
    --load_path trained_weights/models/jackal/VAERNN --skip_train
```

Generate predictions on test data:
```bash
python vae_inference.py --model VAETransformer --source data/tesla \
    --load_path trained_weights/models/tesla/VAETransformer
```

Run a hyperparameter search:

```bash
python vae_hp_search.py --hp_search latent_size
```

For faster experimentation:

```bash
python vae_hp_search.py --hp_search model --dataset_ratio 0.1
```
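The sketch below illustrates the general idea of training on a random fraction of a dataset, which is presumably what `--dataset_ratio` controls; this is an assumption about the flag's intent, shown with a synthetic dataset rather than the repo's dataloader.

```python
# Generic illustration of training on a fraction of a dataset (the idea
# assumed behind --dataset_ratio), using a synthetic TensorDataset.
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

full = TensorDataset(torch.randn(1000, 8), torch.randn(1000, 2))
ratio = 0.1

g = torch.Generator().manual_seed(0)                    # reproducible subset
indices = torch.randperm(len(full), generator=g)[: int(ratio * len(full))]
loader = DataLoader(Subset(full, indices.tolist()), batch_size=32, shuffle=True)

print(f"Using {len(indices)} of {len(full)} samples")
```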
Generate all figures from the paper:

```bash
python plot_paths.py --source data/jackal
python plot_paths.py --source data/tesla
python lpc.py
python inference_time_plot.py
python recon_ablation.py
python vae_visualize.py --load_path trained_weights/models/jackal/VAERNN
```

Train and compare VIGOR models against STRM with statistical robustness:
```bash
# Full comparison with 3 seeds (train + evaluate)
python vigor_comparison.py --dataset jackal --seeds 1 2 3 --epochs 30

# For Tesla dataset
python vigor_comparison.py --dataset tesla --seeds 1 2 3 --epochs 30
```

For detailed usage, data format adaptations, and troubleshooting, see VIGOR_COMPARISON.md.
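After the per-seed runs finish, the usual way to report results across seeds is a mean and standard deviation. A minimal sketch, assuming you have collected one error value per seed (the metric name and numbers below are made up; substitute the values produced by `vigor_comparison.py`):

```python
# Summarize a metric across seeds as mean +/- sample standard deviation.
# The per-seed values are placeholders, not real results.
import statistics

localization_error_m = {1: 1.92, 2: 2.05, 3: 1.87}   # hypothetical per-seed errors

values = list(localization_error_m.values())
mean = statistics.mean(values)
std = statistics.stdev(values)
print(f"Localization error: {mean:.2f} +/- {std:.2f} m over {len(values)} seeds")
```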
Compare computational efficiency (GPU memory, parameters, inference time) against VIGOR and TransGeo:
```bash
python benchmark_models.py --dataset jackal --batch_size 32
```

Results are saved to:

- CSV: `trained_weights/vigor_vs_strm/{dataset}/benchmark_results.csv`
- LaTeX table: `trained_weights/vigor_vs_strm/{dataset}/benchmark_table.tex`
| Option | Description | Default |
|---|---|---|
| `--dataset` | Dataset to use (`jackal` or `tesla`) | Required |
| `--batch_size` | Batch size for benchmarking | 32 |
| `--num_warmup` | Number of warmup iterations | 10 |
| `--num_iterations` | Number of benchmark iterations | 100 |
| `--seed` | Model seed index to load | 0 |
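The warmup and iteration counts follow the standard GPU timing pattern: run a few untimed passes so kernels and the caching allocator are initialized, then time many iterations and average. The snippet below is a generic sketch of that pattern with a stand-in model, not the code in `benchmark_models.py`.

```python
# Standard GPU timing pattern behind --num_warmup / --num_iterations:
# untimed warmup passes, then synchronized, averaged timed passes.
# The model here is a stand-in, not one of the repo's architectures.
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(1024, 1024).to(device).eval()
x = torch.randn(32, 1024, device=device)

num_warmup, num_iterations = 10, 100

with torch.no_grad():
    for _ in range(num_warmup):          # warmup: kernel launches, allocator caching
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()         # make sure warmup work has finished

    start = time.perf_counter()
    for _ in range(num_iterations):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()         # wait for all timed work to finish
    elapsed = time.perf_counter() - start

print(f"{1000 * elapsed / num_iterations:.3f} ms per forward pass")
```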
The main training script (`vae_train.py`) supports many configuration options:

| Option | Description | Default |
|---|---|---|
| `--model` | Model architecture (`VAERNN`, `VAETransformer`, `VAEMultiscaleTransformer`) | `VAERNN` |
| `--seq_len` | Sequence length for temporal processing | 24 |
| `--seq_delta` | Time between frames in seconds | 10 |
| `--latent_size` | Size of latent space dimension | 256 |
| `--img_size` | Input image size | 224 |
| `--batch_size` | Batch size for training | 32 |
| `--epochs` | Number of training epochs | 100 |
| `--lr` | Learning rate | 1e-4 |
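To make the options concrete, here is a toy sketch of the data flow they parameterize: a sequence of `seq_len` FPP frames is encoded into a `latent_size`-dimensional code, from which a GMP view and 2D coordinates are produced. This is an illustration only, not the repository's model code; the layer choices are placeholders, and the decoded GMP view is kept tiny to keep the sketch light.

```python
# Toy illustration of the data flow parameterized by seq_len, img_size and
# latent_size. NOT the repository's model code; layers are placeholders.
import torch
import torch.nn as nn

class ToySTRM(nn.Module):
    def __init__(self, latent_size=256, gmp_size=28):
        super().__init__()
        self.gmp_size = gmp_size
        # Per-frame encoder: one FPP image -> feature vector
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=7, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, latent_size),
        )
        # Temporal module (a GRU here; the repo also offers Transformer variants)
        self.temporal = nn.GRU(latent_size, latent_size, batch_first=True)
        # Heads: decode a (tiny) GMP view and regress 2D coordinates
        self.gmp_head = nn.Linear(latent_size, 3 * gmp_size * gmp_size)
        self.coord_head = nn.Linear(latent_size, 2)

    def forward(self, fpp_seq):                          # (B, seq_len, 3, H, W)
        b, t = fpp_seq.shape[:2]
        feats = self.frame_encoder(fpp_seq.flatten(0, 1)).view(b, t, -1)
        _, h = self.temporal(feats)                      # summarize the sequence
        z = h[-1]                                        # (B, latent_size)
        gmp = self.gmp_head(z).view(b, 3, self.gmp_size, self.gmp_size)
        return gmp, self.coord_head(z)                   # GMP view, (x, y)

fpp = torch.randn(2, 24, 3, 224, 224)                    # a batch of FPP sequences
gmp, coords = ToySTRM()(fpp)
print(gmp.shape, coords.shape)                           # (2, 3, 28, 28) (2, 2)
```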
For a complete list of options:
```bash
python vae_train.py --help
```

This project is licensed under the MIT License - see the LICENSE file for details.
This work builds upon research in vision-based localization and spatial-temporal reasoning. We thank the authors of VIGOR and TransGeo for their open-source implementations.