GitHub - allenai/olmoearth_pretrain: Earth system foundation model data, training, and eval

The OlmoEarth models are a flexible, multi-modal, spatio-temporal family of foundation models for Earth observation.

The OlmoEarth models are part of the OlmoEarth Platform, an end-to-end solution for scalable planetary intelligence that provides everything needed to go from raw data through R&D to fine-tuning and production deployment.

Installation

We recommend Python 3.12 and uv. To install dependencies with uv, run:

git clone git@github.com:allenai/olmoearth_pretrain.git
cd olmoearth_pretrain
uv sync --locked --all-groups --python 3.12
# only necessary for development
uv tool install pre-commit --with pre-commit-uv --force-reinstall

uv installs everything into a virtual environment. To keep using plain python commands, activate that environment with source .venv/bin/activate. Otherwise, replace python with uv run python.
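To confirm that a shell is using the recommended interpreter inside the virtual environment, a small check along these lines can help. This is a minimal sketch using only the standard library; the function name describe_environment is hypothetical and not part of this repository.

```python
import sys


def describe_environment(version_info=None, prefix=None, base_prefix=None):
    """Summarize whether the interpreter matches the recommended setup.

    The arguments default to the running interpreter's values; they are
    exposed only so the check can be exercised with fixed inputs.
    """
    version_info = sys.version_info if version_info is None else version_info
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return {
        "python": f"{version_info[0]}.{version_info[1]}",
        # The README recommends Python 3.12.
        "is_recommended_python": tuple(version_info[:2]) == (3, 12),
        # Inside a venv, sys.prefix differs from sys.base_prefix.
        "in_venv": prefix != base_prefix,
    }


if __name__ == "__main__":
    print(describe_environment())
```

Running it once with python and once with uv run python should show whether both resolve to the same environment.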

OlmoEarth is built using OLMo-core. OLMo-core's published Docker images contain all core and optional dependencies.

Model Summary

The OlmoEarth models are trained on three satellite modalities (Sentinel 2, Sentinel 1 and Landsat) and six derived maps (OpenStreetMap, WorldCover, USDA Cropland Data Layer, SRTM DEM, WRI Canopy Height Map, and WorldCereal).

Model Size	Weights	Encoder Params	Decoder Params
Nano	link	1.4M	800K
Tiny	link	6.2M	1.9M
Base	link	89M	30M
Large	link	308M	53M
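
For a quick comparison of the four sizes, the counts in the table can be combined into totals and encoder shares. This is an illustrative sketch only: the numbers are copied from the table (which are rounded), and the helper names are hypothetical.

```python
# (encoder, decoder) parameter counts copied from the table above.
PARAMS = {
    "Nano": (1_400_000, 800_000),
    "Tiny": (6_200_000, 1_900_000),
    "Base": (89_000_000, 30_000_000),
    "Large": (308_000_000, 53_000_000),
}


def total_params(size):
    """Return encoder + decoder parameters for a model size."""
    encoder, decoder = PARAMS[size]
    return encoder + decoder


def encoder_share(size):
    """Return the fraction of parameters that belong to the encoder."""
    encoder, decoder = PARAMS[size]
    return encoder / (encoder + decoder)


if __name__ == "__main__":
    for size in PARAMS:
        total_m = total_params(size) / 1e6
        print(f"{size}: {total_m:.1f}M total, {encoder_share(size):.0%} encoder")
```

Because the table values are rounded, the totals are approximate.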

Using OlmoEarth

InferenceQuickstart shows how to initialize the OlmoEarth model and apply it to a satellite image.

We also have several more in-depth tutorials for computing OlmoEarth embeddings and fine-tuning OlmoEarth on downstream tasks:

Additionally, olmoearth_projects has several examples of active OlmoEarth deployments.

Data Summary

Our pretraining dataset contains 285,288 samples from around the world, each covering a 2.56 km × 2.56 km region, although many samples contain only a subset of the timesteps and modalities.
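
To get a feel for these numbers, the sketch below works out the footprint of one sample and an upper bound on the total area covered. The 10 m pixel size is an assumption (it matches Sentinel-2's finest bands but is not stated here), and the total is only an upper bound because samples may overlap.

```python
SAMPLES = 285_288          # from the dataset description above
SIDE_KM = 2.56             # side length of one sample region
ASSUMED_M_PER_PIXEL = 10   # assumption: not stated in this README


def pixels_per_side(side_km=SIDE_KM, m_per_pixel=ASSUMED_M_PER_PIXEL):
    """Number of pixels along one side of a sample at the given resolution."""
    return round(side_km * 1000 / m_per_pixel)


def max_total_area_km2(samples=SAMPLES, side_km=SIDE_KM):
    """Upper bound on total area covered, ignoring any overlap between samples."""
    return samples * side_km ** 2


if __name__ == "__main__":
    n = pixels_per_side()
    print(f"One sample: {n} x {n} pixels at {ASSUMED_M_PER_PIXEL} m/pixel")
    print(f"Upper bound on coverage: {max_total_area_km2():,.0f} km^2")
```

Under these assumptions each sample is a 256 × 256 pixel tile, and the dataset covers at most about 1.87 million km².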

The distribution of the samples is available below:

The dataset can be downloaded here.

Detailed instructions on how to make your own pretraining dataset are available in the dataset README.

Training scripts

Detailed instructions on how to pretrain your own OlmoEarth model are available in Pretraining.md.

Evaluations

Detailed instructions on how to replicate our evaluations are available here:

License

This code is licensed under the OlmoEarth Artifact License.
