ClimData provides a unified interface for extracting climate data from multiple providers (MSWX, CMIP, POWER, DWD, HYRAS), computing extreme indices, and converting results to tabular form. The ClimData (or ClimateExtractor) class is central: it manages configuration, extraction, index computation, and common I/O.
- Provider-agnostic extraction (point / region / shapefile)
- Unit normalization via xclim
- Compute extreme indices using package indices
- Convert xarray Datasets → long-form pandas DataFrames
- Simple workflow runner for chained actions
- Create and activate a conda environment:
# create
conda create -n climdata python=3.11 -y
# activate
conda activate climdata- Install via pip (PyPI, if available) or from source:
# from PyPI
pip install climdata
# or from local source (editable)
git clone <repo-url>
cd climdata
pip install -e .Install optional extras as needed (e.g., xclim, shapely, hydra, dask):
pip install xarray xclim shapely hydra-core dask "pandas>=1.5"If you need the imputation functionality (gap filling with ML models), install PyTorch and related packages:
# Install PyTorch CPU version (recommended for most users)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# Install additional ML packages for imputation
pip install torch-cluster -f https://data.pyg.org/whl/torch-2.3.1+cpu.html
pip install pytorch-lightning torchmetrics lightning torchcde reformer-pytorch
pip install tensorflow darts sktime prophet tsfel tsfresh transformers timmFor GPU support, see PyTorch installation guide.
from climdata import ClimData # or from climdata.utils.wrapper_workflow import ClimateExtractor
overrides = [
"dataset=mswx",
"lat=52.5",
"lon=13.4",
"time_range.start_date=2014-01-01",
"time_range.end_date=2014-12-31",
"variables=[tasmin,tasmax,pr]",
"data_dir=/path/to/data",
"index=tn10p",
]
# initialize
extractor = ClimData(overrides=overrides)
# extract data (returns xarray.Dataset and updates internal state)
ds = extractor.extract()
# compute index (uses cfg.index)
ds_index = extractor.calc_index(ds)
# convert to long-form dataframe and save
df = extractor.to_dataframe(ds_index)
extractor.to_csv(df, filename="index.csv")Use run_workflow for multi-step sequences:
result = extractor.run_workflow(actions=["extract", "calc_index", "to_dataframe", "to_csv"])WorkflowResult contains produced dataset(s), dataframe(s), and filenames.
- See API docs under
docs/api/for detailed descriptions of ClimData/ClimateExtractor methods. - Examples and notebooks are under
examples/.
- Run tests and lint locally.
- Follow project coding and documentation conventions; submit PRs with tests.
If you use climdata in your research or projects, please cite it using the following formats:
@software{muduchuru2024climdata,
title={climdata: Automated Climate Data Extraction and Processing},
author={Muduchuru, Kaushik},
year={2024},
version={0.5.0},
url={https://github.com/Kaushikreddym/climdata},
note={Available at https://Kaushikreddym.github.io/climdata}
}Muduchuru, K. (2024). climdata: Automated climate data extraction and processing (v0.5.0). Retrieved from https://github.com/Kaushikreddym/climdata
Our repository includes a CITATION.cff file. GitHub will automatically show a "Cite this repository" button with ready-to-use citation formats.
DOI: https://doi.org/10.5281/zenodo.19554926
Zenodo Record: https://zenodo.org/record/19554926
For archival setup details, see Zenodo & DOI Guide.
Refer to the repository LICENSE file for terms.
-
Make sure
yqis installed:brew install yq # macOS # OR pip install yq
-
To see available variables for a specific dataset (for example
mswx), run:python download_location.py --cfg job | yq '.mappings.mswx.variables | keys'
- Supports multiple weather data providers
- Uses
xarrayfor robust gridded data extraction - Handles curvilinear and rectilinear grids
- Uses a Google Drive Service Account for secure downloads
- Easily reproducible runs using Hydra
MSWX (Multi-Source Weather) is released under the CC BY-NC 4.0 license. This means:
✅ Allowed uses:
- Academic research
- Non-profit scientific studies
- Personal projects
- Government or NGO applications (non-commercial)
❌ Not allowed:
- Commercial use or products
- For-profit services
To access MSWX data:
- Visit https://www.gloh2o.org/mswx/
- Submit a data request for non-commercial use
- Once approved, follow the Google Drive API setup below to configure climdata
⚠️ Important: By using MSWX via climdata, you agree to the CC BY-NC 4.0 license terms. Unauthorized commercial use is prohibited.
This project uses the Google Drive API with a Service Account to securely download weather data files from a shared Google Drive folder.
Follow these steps to set it up correctly:
- Go to Google Cloud Console.
- Click “Select Project” → “New Project”.
- Enter a project name (e.g.
WeatherDataDownloader). - Click “Create”.
- In the left sidebar, go to APIs & Services → Library.
- Search for “Google Drive API”.
- Click it, then click “Enable”.
- Go to IAM & Admin → Service Accounts.
- Click “Create Service Account”.
- Enter a name (e.g.
weather-downloader-sa). - Click “Create and Continue”. You can skip assigning roles for read-only Drive access.
- Click “Done” to finish.
- After creating the Service Account, click on its email address to open its details.
- Go to the “Keys” tab.
- Click “Add Key” → “Create new key” → choose
JSON→ click “Create”. - A
.jsonkey file will download automatically. Store it securely!
- Place the downloaded
.jsonkey in the conf folder with the name service.json.
-
Create a free account on the Copernicus Climate Data Store
-
Once logged in, go to your user profile
-
Click on the "Show API key" button
-
Create the file
~/.cdsapircwith the following content:url: https://cds.climate.copernicus.eu/api/v2 key: <your-api-key-here>
-
Make sure the file has the correct permissions:
chmod 600 ~/.cdsapirc