Spatial Temporal Reasoning Models (STRMs)

Official PyTorch implementation of Spatial Temporal Reasoning Models (STRMs) for vision-based localization.

STRMs transform first-person perspective (FPP) observations into global map perspective (GMP) and precise geographical coordinates, achieving GPS-level precision for autonomous navigation.
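
Conceptually, the pipeline is a VAE-style encoder that compresses an FPP observation into a latent code, from which a decoder reconstructs the corresponding GMP view while a small regression head predicts latitude/longitude. The sketch below is purely illustrative and is not the repository's actual model code; every class and layer name in it is an assumption.

# Illustrative sketch only -- not the STRM implementation in this repository.
# All names (ToyVAELocalizer, gmp_decoder, coord_head, ...) are hypothetical.
import torch
import torch.nn as nn

class ToyVAELocalizer(nn.Module):
    def __init__(self, latent_size: int = 256):
        super().__init__()
        # Encoder: FPP image -> latent distribution parameters
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_mu = nn.Linear(64, latent_size)
        self.to_logvar = nn.Linear(64, latent_size)
        # Decoder head: latent -> flattened GMP image (toy 32x32 output here)
        self.gmp_decoder = nn.Linear(latent_size, 3 * 32 * 32)
        # Regression head: latent -> (latitude, longitude)
        self.coord_head = nn.Linear(latent_size, 2)

    def forward(self, fpp_image: torch.Tensor):
        h = self.encoder(fpp_image)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        gmp = self.gmp_decoder(z).view(-1, 3, 32, 32)
        coords = self.coord_head(z)
        return gmp, coords, mu, logvar

# Example: gmp, coords, mu, logvar = ToyVAELocalizer()(torch.randn(1, 3, 224, 224))

The actual models (see Training below) additionally process sequences of frames with RNN or Transformer backbones.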

Citation

If you use this code in your research, please cite our paper:

@article{lui2025strms,
  title={STRMs: Spatial Temporal Reasoning Models for Vision-Based Localization Rivaling GPS Precision},
  author={Lui, Hin Wai and Krichmar, Jeffrey L.},
  journal={arXiv preprint arXiv:2503.07939},
  year={2025}
}

Paper: arXiv:2503.07939

Download Data and Trained Weights

Download the following from Google Drive:

  • Datasets (Jackal and Tesla)
  • Pretrained model weights
  • Satellite images (jackal_satellite.png and tesla_satellite.png - required for generating Global Map Perspective (GMP) images)

Using gdown (Recommended)

pip install gdown
gdown --folder FOLDER_URL

Note: If you hit Google Drive rate limits when downloading the entire folder at once, download each file individually instead.
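
For example, individual files can be fetched with gdown's Python API. FILE_ID and the output path below are placeholders; substitute the Drive ID and destination of the file you need.

# Illustrative: download one file at a time via gdown's Python API.
# FILE_ID and the output filename are placeholders.
import gdown

url = "https://drive.google.com/uc?id=FILE_ID"
gdown.download(url, output="data/jackal_dataset.zip", quiet=False)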

Directory Structure

After downloading, your directory should look like:

strm/
├── data/
│   ├── jackal/
│   └── tesla/
├── trained_weights/
│   └── models/
├── jackal_satellite.png  # Required for GMP image generation
└── tesla_satellite.png   # Required for GMP image generation
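
A quick way to confirm the layout before training is a small existence check over the paths listed above (the check itself is just a suggestion, not part of the repository):

# Sanity check for the expected directory layout described above.
from pathlib import Path

required = [
    "data/jackal",
    "data/tesla",
    "trained_weights/models",
    "jackal_satellite.png",
    "tesla_satellite.png",
]
missing = [p for p in required if not Path(p).exists()]
print("All required paths found." if not missing else f"Missing: {missing}")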

Installation

Clone the Repository

git clone https://github.com/UCI-CARL/strm.git
cd strm

Create Virtual Environment

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install Dependencies

pip install -r requirements.txt

Install GPUMultiprocessing

pip install git+https://gitlab.com/paloha/gpuMultiprocessing

Requirements

  • Python 3.8+
  • PyTorch 2.0+
  • CUDA-compatible GPU (recommended)
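
To confirm the environment meets these requirements, a quick check using standard PyTorch calls is:

# Verify the Python environment: PyTorch version and GPU availability.
import sys
import torch

print(f"Python:  {sys.version.split()[0]}")            # should be 3.8+
print(f"PyTorch: {torch.__version__}")                 # should be 2.0+
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")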

Data Preprocessing

Preprocess Raw Data

python jackal_pre_process.py --source data/jackal
python tesla_pre_process.py --source data/tesla

Build Sequence Dataset Caches

Dataset caches are automatically built during training when needed. You can also build them manually:

python dataloader.py --source data/jackal --seq_len 24 --seq_delta 10
python dataloader.py --source data/tesla --seq_len 24 --seq_delta 10

Note: A separate cache is created for each combination of seq_len and seq_delta values.
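
If you plan to experiment with several sequence settings, you can pre-build the corresponding caches by looping over the dataloader.py command shown above. The (seq_len, seq_delta) combinations below are examples only.

# Pre-build sequence dataset caches for several settings using the documented CLI.
# The value combinations are examples, not recommendations.
import subprocess

for source in ["data/jackal", "data/tesla"]:
    for seq_len, seq_delta in [(24, 10)]:
        subprocess.run(
            ["python", "dataloader.py",
             "--source", source,
             "--seq_len", str(seq_len),
             "--seq_delta", str(seq_delta)],
            check=True,
        )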

Training

Train RNN Model on Jackal Dataset

python vae_train.py --model VAERNN --source data/jackal --seq_len 24 --seq_delta 10 --tag jackal_rnn

Train Transformer Model on Tesla Dataset

python vae_train.py --model VAETransformer --source data/tesla --seq_len 24 --seq_delta 10 --tag tesla_transformer --lr=8e-6

Note: Transformer models require a lower learning rate (8e-6) for optimal performance.

Quick Debug Mode

Test your setup with a small subset of data:

python vae_train.py --debug --model VAETransformer --source data/jackal

Evaluation

Evaluate Pretrained Models

Evaluate models without retraining:

python vae_train.py --model VAERNN --source data/jackal \
    --load_path trained_weights/models/jackal/VAERNN --skip_train

Run Inference

Generate predictions on test data:

python vae_inference.py --model VAETransformer --source data/tesla \
    --load_path trained_weights/models/tesla/VAETransformer
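
Once you have predicted and ground-truth coordinates, per-sample localization error in meters can be computed with the standard haversine formula. The arrays below are dummy placeholders; the output format of vae_inference.py is not specified here, so adapt the loading step accordingly.

# Illustrative post-processing: haversine distance (meters) between predicted
# and ground-truth (latitude, longitude) pairs. Input arrays are placeholders.
import numpy as np

def haversine_m(pred: np.ndarray, true: np.ndarray) -> np.ndarray:
    """pred, true: arrays of shape (N, 2) holding (lat, lon) in degrees."""
    R = 6371000.0  # mean Earth radius in meters
    lat1, lon1 = np.radians(pred[:, 0]), np.radians(pred[:, 1])
    lat2, lon2 = np.radians(true[:, 0]), np.radians(true[:, 1])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * R * np.arcsin(np.sqrt(a))

# Example with dummy coordinates:
pred = np.array([[33.6405, -117.8443]])
true = np.array([[33.6406, -117.8441]])
print(f"Median error: {np.median(haversine_m(pred, true)):.2f} m")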

Hyperparameter Search

Full Hyperparameter Search

python vae_hp_search.py --hp_search latent_size

Quick Search (10% of Dataset)

For faster experimentation:

python vae_hp_search.py --hp_search model --dataset_ratio 0.1

Reproducing Paper Figures

Generate all figures from the paper:

Figure 1: Dataset Collection Paths

python plot_paths.py --source data/jackal
python plot_paths.py --source data/tesla

Figure 3: Localization Performance Characteristics (LPC)

python lpc.py

Figure 4: Inference Time Comparison

python inference_time_plot.py

Figure 5: Reconstruction Ablation Analysis

python recon_ablation.py

Figure 6: Image Reconstruction Examples

python vae_visualize.py --load_path trained_weights/models/jackal/VAERNN

Baseline Comparisons

VIGOR Multi-Seed Comparison

Train and compare VIGOR models against STRM across multiple random seeds for statistical robustness:

# Full comparison with 3 seeds (train + evaluate)
python vigor_comparison.py --dataset jackal --seeds 1 2 3 --epochs 30

# For Tesla dataset
python vigor_comparison.py --dataset tesla --seeds 1 2 3 --epochs 30

For detailed usage, data format adaptations, and troubleshooting, see VIGOR_COMPARISON.md.

Computational Benchmarking

Compare computational efficiency (GPU memory, parameters, inference time) against VIGOR and TransGeo:

python benchmark_models.py --dataset jackal --batch_size 32

Results are saved to:

  • CSV: trained_weights/vigor_vs_strm/{dataset}/benchmark_results.csv
  • LaTeX table: trained_weights/vigor_vs_strm/{dataset}/benchmark_table.tex
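
For further analysis, the CSV can be loaded with pandas; the exact column names depend on benchmark_models.py and are not listed here.

# Load and inspect the benchmark results CSV (column names depend on the script).
import pandas as pd

df = pd.read_csv("trained_weights/vigor_vs_strm/jackal/benchmark_results.csv")
print(df.head())
print(df.describe(include="all"))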

Benchmark Options

Option              Description                        Default
--dataset           Dataset to use (jackal or tesla)   Required
--batch_size        Batch size for benchmarking        32
--num_warmup        Number of warmup iterations        10
--num_iterations    Number of benchmark iterations     100
--seed              Model seed index to load           0
Configuration Options

The main training script (vae_train.py) supports many configuration options:

Option           Description                                                              Default
--model          Model architecture (VAERNN, VAETransformer, VAEMultiscaleTransformer)   VAERNN
--seq_len        Sequence length for temporal processing                                  24
--seq_delta      Time between frames in seconds                                           10
--latent_size    Size of latent space dimension                                           256
--img_size       Input image size                                                         224
--batch_size     Batch size for training                                                  32
--epochs         Number of training epochs                                                100
--lr             Learning rate                                                            1e-4

For a complete list of options:

python vae_train.py --help

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

This work builds upon research in vision-based localization and spatial-temporal reasoning. We thank the authors of VIGOR and TransGeo for their open-source implementations.
