Welcome to the Air Quality Analysis (AQA) Library, an integrated framework designed to automate the download, processing, and harmonization of environmental data from multiple sources.
This project enables the combination of ARPA Lombardia ground data, Sentinel-5P satellite observations, and ERA5 reanalysis variables to generate daily, spatially consistent air quality indicators.
To execute this notebook successfully, you will need access to the following datasets:
- ARPA Lombardia (ground sensors): hourly pollutant concentration measurements (NO₂, SO₂, CO, O₃, PM10, PM2.5, etc.), including the idsensore, lat, lng, provincia, and timestamp columns.
- Sentinel-5P (satellite): atmospheric column densities for selected pollutants (NO2_column_number_density, O3_column_number_density, etc.), accessed through the Google Earth Engine API.
- ERA5 (reanalysis): meteorological parameters (temperature, pressure, wind speed, boundary layer height, radiation, precipitation, etc.) obtained from CMCC or the Copernicus Climate Data Store (CDS).
Main features:
- Automated download of ERA5, Sentinel-5P, and ARPA datasets.
- Data cleaning and normalization for all input sources.
- Temporal harmonization: hourly values averaged over the 12:00–15:00 window.
- Spatial merging of satellite, reanalysis, and sensor data.
- Integrated summary table for pollutant concentrations and weather variables.
- Visualization tools for pollutant maps and correlation analysis.
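The temporal harmonization step listed above can be sketched in pandas as follows. This is an illustration, not the notebook's code: the column names (idsensore, timestamp, value), the helper name midday_mean, and treating 12:00–15:00 as an inclusive range of hourly timestamps are all assumptions.

```python
import pandas as pd


def midday_mean(hourly: pd.DataFrame) -> pd.DataFrame:
    """Average hourly readings per sensor and day over 12:00-15:00.

    Assumes columns "idsensore", "timestamp", "value" and an inclusive window.
    """
    df = hourly.copy()
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    # Keep only readings whose hour is 12, 13, 14 or 15 (assumed inclusive).
    window = df[df["timestamp"].dt.hour.between(12, 15)]
    # Truncate timestamps to midnight so readings group by calendar day.
    window = window.assign(date=window["timestamp"].dt.normalize())
    return window.groupby(["idsensore", "date"], as_index=False)["value"].mean()


if __name__ == "__main__":
    hourly = pd.DataFrame({
        "idsensore": [1, 1, 1, 1, 1, 1],
        "timestamp": ["2024-01-10 11:00", "2024-01-10 12:00", "2024-01-10 13:00",
                      "2024-01-10 14:00", "2024-01-10 15:00", "2024-01-10 16:00"],
        "value": [100.0, 10.0, 20.0, 30.0, 40.0, 200.0],
    })
    print(midday_mean(hourly))  # one row: sensor 1, 2024-01-10, value 25.0
```

The 11:00 and 16:00 readings fall outside the window, so only the four midday values contribute to the daily mean.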
If using Windows, create the environment from environment.yml; if using a Mac, use nobuilds.yml instead:
conda env create -f environment.yml
conda activate AQA_DayRange

Alternatively, use a pip virtual environment:
python -m venv .venv
.venv\Scripts\activate # Windows
source .venv/bin/activate # macOS/Linux
pip install -r requirements.txt

All functions and the workflow are implemented in the Jupyter notebook:
AQA_DayRange.ipynb
This notebook contains:
- ARPA data access and cleaning.
- ERA5 and Sentinel-5P integration.
- Spatial grid generation and interpolation.
- Computation of current (curr_) and previous (prev_) daily means.
- Export of harmonized results to CSV and visualization of pollutant maps.
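One possible way to build the curr_ and prev_ columns is sketched below, assuming prev_ holds the same sensor's daily mean from the previous calendar day. The helper add_curr_prev and the idsensore/date column names are hypothetical, not taken from the notebook.

```python
import pandas as pd


def add_curr_prev(daily: pd.DataFrame, value_cols: list[str]) -> pd.DataFrame:
    """Attach curr_<col> and prev_<col> columns for each sensor.

    prev_<col> is the same sensor's value from the previous calendar day;
    it stays NaN when that day is missing.
    """
    df = daily.copy()
    df["date"] = pd.to_datetime(df["date"])
    curr = df.rename(columns={c: f"curr_{c}" for c in value_cols})
    prev = df[["idsensore", "date", *value_cols]].copy()
    # Shift dates forward one day: day d's value becomes "previous" for day d+1.
    prev["date"] = prev["date"] + pd.Timedelta(days=1)
    prev = prev.rename(columns={c: f"prev_{c}" for c in value_cols})
    return curr.merge(prev, on=["idsensore", "date"], how="left")


if __name__ == "__main__":
    daily = pd.DataFrame({
        "idsensore": [1, 1, 1],
        "date": ["2024-01-10", "2024-01-11", "2024-01-13"],
        "no2": [20.0, 30.0, 40.0],
    })
    print(add_curr_prev(daily, ["no2"]))
```

Joining on a date shifted by one day, rather than using groupby().shift(1), leaves prev_ empty when a day is missing (2024-01-12 above) instead of silently pulling in an older value.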
ARPA data: loads ARPA Lombardia datasets through the API and organizes the metadata for sensors, pollutants, and coordinates.
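The loading code that follows yields sensor metadata as a DataFrame and measurements as a list of JSON records. A minimal sketch of joining the two is shown here. It assumes each record carries idsensore, data (timestamp), and valore (measured value) fields, possibly as strings. These field names and the helper attach_metadata are assumptions to check against the actual API response.

```python
import pandas as pd


def attach_metadata(records: list[dict], meta: pd.DataFrame) -> pd.DataFrame:
    """Join raw measurement records with sensor metadata on idsensore.

    Assumed (hypothetical) record fields: "idsensore", "data" (timestamp),
    "valore" (measured value), all possibly returned as strings.
    """
    obs = pd.DataFrame.from_records(records)
    # Coerce strings to numbers/datetimes; unparseable entries become NaN/NaT.
    obs["idsensore"] = pd.to_numeric(obs["idsensore"], errors="coerce")
    obs["valore"] = pd.to_numeric(obs["valore"], errors="coerce")
    obs["timestamp"] = pd.to_datetime(obs["data"], errors="coerce")
    obs = obs.dropna(subset=["idsensore", "valore", "timestamp"])
    obs = obs.astype({"idsensore": "int64"})

    # Use the same integer key type on the metadata side.
    sensors = meta[["idsensore", "lat", "lng"]].copy()
    sensors["idsensore"] = pd.to_numeric(sensors["idsensore"], errors="coerce")
    sensors = sensors.dropna(subset=["idsensore"]).astype({"idsensore": "int64"})

    # Inner join: measurements from sensors absent from the metadata are dropped.
    return obs.merge(sensors, on="idsensore", how="inner")
```

Coercing the sensor ID to the same integer type on both sides avoids silent non-matches when one source delivers IDs as strings.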
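import pandas as pd
import requests
meta = pd.read_csv(meta_url).dropna(subset=["idsensore", "lat", "lng"])  # keep only sensors with an ID and coordinates
data = requests.get(data_url, headers=headers).json()  # meta_url, data_url, headers: ARPA endpoints and request headers

ERA5 data: downloads meteorological variables (temperature, pressure, wind speed, radiation, boundary layer height, etc.) through the CMCC or CDS API and converts them to a harmonized format.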
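As an illustration of the harmonized-format conversion, the sketch below assumes the downloaded ERA5 file has already been flattened into a table (for example with xarray's Dataset.to_dataframe()) holding the standard single-level short names t2m (2 m temperature, K), u10 and v10 (10 m wind components, m/s), and sp (surface pressure, Pa). The target units (°C, m/s, hPa) and the helper harmonize_era5 are assumptions, not the notebook's documented choices.

```python
import numpy as np
import pandas as pd


def harmonize_era5(df: pd.DataFrame) -> pd.DataFrame:
    """Convert selected ERA5 single-level fields to commonly used units."""
    out = df.copy()
    out["temperature_c"] = out["t2m"] - 273.15  # K -> °C
    # Wind speed is the magnitude of the (u10, v10) wind vector.
    out["wind_speed"] = np.hypot(out["u10"], out["v10"])
    out["pressure_hpa"] = out["sp"] / 100.0  # Pa -> hPa
    return out.drop(columns=["t2m", "u10", "v10", "sp"])


if __name__ == "__main__":
    era5 = pd.DataFrame({"t2m": [293.15], "u10": [3.0], "v10": [4.0], "sp": [101325.0]})
    print(harmonize_era5(era5))
```

Wind speed is derived from the hourly components before any time averaging: the magnitude of time-averaged u and v is never larger than the average of the hourly magnitudes, and is smaller when the wind direction varies.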
import cdsapi
c = cdsapi.Client()  # reads the API URL and key from ~/.cdsapirc
c.retrieve('reanalysis-era5-single-levels', {...}, 'era5_data.nc')

Sentinel-5P data: retrieves pollutant column densities (e.g., NO₂, O₃, CO, SO₂) through Google Earth Engine and scales the values for consistency with ground units.
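The Sentinel-5P Level-3 bands in Earth Engine (such as NO2_column_number_density) are expressed in mol/m². The exact scaling the notebook applies is not shown here. The sketch below assumes a conversion to molecules/cm², a unit often used for trace-gas columns: multiply by Avogadro's number, then by 10⁻⁴ to go from per-m² to per-cm². The function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mol (exact SI value)
M2_PER_CM2 = 1e-4  # 1 cm² = 1e-4 m²


def mol_m2_to_molec_cm2(value_mol_m2):
    """Convert a column density from mol/m² to molecules/cm².

    mol/m² * molecules/mol = molecules/m²; * (1e-4 m²/cm²) = molecules/cm².
    Works element-wise on floats, NumPy arrays, or pandas Series.
    """
    return value_mol_m2 * AVOGADRO * M2_PER_CM2


if __name__ == "__main__":
    print(mol_m2_to_molec_cm2(1e-4))
```

Because the function only multiplies, it can be applied directly to a whole column of satellite values once they are in a DataFrame.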
pollutants = {
"no2": {"collection": "COPERNICUS/S5P/OFFL/L3_NO2", "band": "NO2_column_number_density"}
}

Data merging: combines all datasets (ARPA, ERA5, Sentinel-5P) by matching coordinates and averaging over the area-of-interest (AOI) grid.
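The merge code below matches Latitude and Longitude values exactly, which only works if every source has first been placed on the same AOI grid. One way to do that is to snap each point to the center of its grid cell and average the values that share a cell. This sketch assumes a regular grid with a hypothetical 0.1° spacing anchored at multiples of the step. The notebook's actual AOI grid may differ, and snap_to_grid is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def snap_to_grid(df: pd.DataFrame, step: float = 0.1) -> pd.DataFrame:
    """Average all points falling in the same step x step degree cell.

    Latitude/Longitude are replaced by cell-center coordinates, so tables
    processed this way can be merged on exact coordinate equality.
    """
    out = df.copy()
    for col in ("Latitude", "Longitude"):
        # floor() gives the cell index, +0.5 moves to the cell center,
        # round() removes floating-point noise so equal centers compare equal.
        out[col] = ((np.floor(out[col] / step) + 0.5) * step).round(6)
    return out.groupby(["Latitude", "Longitude"], as_index=False).mean(numeric_only=True)


if __name__ == "__main__":
    pts = pd.DataFrame({
        "Latitude": [45.46, 45.47, 45.62],
        "Longitude": [9.19, 9.18, 9.12],
        "no2": [10.0, 20.0, 40.0],
    })
    print(snap_to_grid(pts))  # two cells: (45.45, 9.15) and (45.65, 9.15)
```

All numeric columns are averaged per cell, so non-measurement numeric columns (such as sensor IDs) should be dropped first. Points lying exactly on a cell boundary may be assigned to either neighbor because of floating-point rounding.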
import pandas as pd
summary = pd.merge(ground_df, sentinel_df, on=["Latitude", "Longitude"])
summary = pd.merge(summary, era5_df, on=["Latitude", "Longitude"])

Outputs:
- CSV results: results/ARPA_ERA_SP5-<date>.csv
- Summary tables: harmonized pollutant and meteorological data
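A minimal sketch of writing the merged table to the output path pattern listed above. It assumes <date> is an ISO YYYY-MM-DD string, since the notebook's actual date format is not specified here. The helper export_summary is hypothetical.

```python
from datetime import date
from pathlib import Path

import pandas as pd


def export_summary(summary: pd.DataFrame, day: date, out_dir: str = "results") -> Path:
    """Write the harmonized table to <out_dir>/ARPA_ERA_SP5-<YYYY-MM-DD>.csv."""
    path = Path(out_dir) / f"ARPA_ERA_SP5-{day.isoformat()}.csv"
    path.parent.mkdir(parents=True, exist_ok=True)  # create results/ if needed
    summary.to_csv(path, index=False)
    return path
```

For example, export_summary(summary, date(2024, 1, 10)) would write results/ARPA_ERA_SP5-2024-01-10.csv.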
AQA/
│
├── AQA_DayRange.ipynb # Main analysis notebook
├── environment.yml # Conda environment for Windows
├── nobuilds.yml # Conda environment for Mac
├── requirements.txt # pip dependencies
└── README.md # Project documentation
- Python (pandas, geopandas, numpy, matplotlib, xarray, requests)
- Google Earth Engine API
- Copernicus CDS / CMCC ERA5
- ARPA Lombardia Open Data API
- GeoPandas + Matplotlib for geospatial analysis
The data pipeline has been validated across multiple pollutants and date ranges.
This project is licensed under the MIT License.
See the LICENSE file for details.
Claudia Isabela Saud-Miño
Politecnico di Milano — Environmental & Geoinformatics Research
📧 Contact via GitHub