A targets pipeline for building SSURGO databases with DuckDB.
This R project provides a reproducible pipeline for processing and building SSURGO (Soil Survey Geographic Database) databases using the targets package and DuckDB.
- Reproducible Workflows: Built on the targets package for reliable, efficient, scalable data pipelines
- DuckDB Integration: Uses DuckDB for columnar storage and querying of spatial and tabular data
- R-based Pipeline: Written entirely in R, with the soilDB package handling data download and database creation
To get started, ensure you have R installed (4.0.0+), then clone this repository.
This project uses renv to manage a consistent, isolated set of package dependencies. When you open the project in R, the .Rprofile automatically activates renv, ensuring dependencies are loaded from the project-local library.
```sh
# Start R from the SSURGO/ directory so that .Rprofile activates renv
cd path/to/SSURGO
R
```

```r
# renv auto-activates; initialize the project library and discover dependencies from DESCRIPTION
renv::init()
```

Dependencies are managed in the project-local environment via renv. To add or remove a dependency, edit DESCRIPTION and run renv::install() or renv::remove(), which will also update your local lock file.
Once dependencies are set up, run SSURGO.R to generate the _targets.R pipeline file.
You can change which soil survey areas are included in the database by modifying the first four targets. The default setup builds a database covering all US states, but you can choose any subset of one or more states, or use any other method to create the ssas target (a character vector of soil survey area symbols).
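As an illustration of one "alternative method" for building ssas, the sketch below filters a vector of candidate area symbols down to a chosen set of states. The helper name select_ssas() and the candidate vector are hypothetical, not part of this project, and the sketch assumes area symbols follow the usual two-letter state prefix plus three digits (e.g. CA630).

```r
# Hypothetical helper: keep only the soil survey area symbols whose
# two-letter prefix matches one of the requested state abbreviations.
# Assumes symbols follow the "SSnnn" pattern, e.g. "CA630".
select_ssas <- function(areasymbols, states) {
  stopifnot(is.character(areasymbols), is.character(states))
  areasymbols <- toupper(trimws(areasymbols))
  # Drop anything that does not look like a survey area symbol
  valid <- grepl("^[A-Z]{2}[0-9]{3}$", areasymbols)
  prefix <- substr(areasymbols, 1, 2)
  sort(unique(areasymbols[valid & prefix %in% toupper(states)]))
}

# Illustrative input only; a real ssas target would get its candidates
# from whatever source the pipeline uses.
candidates <- c("CA630", "CA649", "NV755", "wi025", "US", "CA6")
ssas <- select_ssas(candidates, states = c("CA", "WI"))
print(ssas)
```

Because ssas is just a character vector, any function returning area symbols can be swapped in without changing the downstream targets.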
To generate the file, source the script from the SSURGO/ directory:

```r
source("SSURGO.R")
```

This project uses the targets package to manage the pipeline.
To run the workflow, make sure your working directory is the SSURGO/ folder containing _targets.R.

```r
# Load the targets package
library(targets)

# Visualize the pipeline as a dependency graph (DAG)
tar_visnetwork()

# Run the pipeline
tar_make()
```

The repository is organized as follows:

```
SSURGO/
|-- _targets.R    # Main targets pipeline configuration (generated by SSURGO.R)
|-- SSURGO.R      # Entry point; generates _targets.R via tar_script()
|-- R/            # Core R functions and wrappers
|-- man/          # Documentation files
|-- DESCRIPTION   # Package metadata
|-- NAMESPACE     # Package namespace
|-- README.md     # This file
```
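SSURGO.R writes the real _targets.R with tar_script(), and its contents are not reproduced in this README. As a rough mental model only, here is a minimal hypothetical _targets.R. The helpers download_ssurgo() and build_ssurgo_duckdb(), the area symbols, and the output file name are placeholders, not functions from this project or from soilDB.

```r
# _targets.R: hypothetical sketch, NOT the file generated by SSURGO.R
library(targets)

# Source helper functions from R/ (assumes the hypothetical helpers live there)
tar_source("R")

list(
  # Which soil survey areas to include (a character vector of area symbols)
  tar_target(ssas, c("CA630", "CA649")),
  # Data ingestion: download_ssurgo() is hypothetical and returns file paths
  tar_target(ssurgo_files, download_ssurgo(ssas), format = "file"),
  # Database building: build_ssurgo_duckdb() is hypothetical and returns
  # the path of the .duckdb file it writes
  tar_target(
    ssurgo_db,
    build_ssurgo_duckdb(ssurgo_files, "ssurgo.duckdb"),
    format = "file"
  )
)
```

With format = "file", targets tracks the returned paths by content hash, so tar_make() skips any target whose command and upstream inputs are unchanged.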
All runtime dependencies are declared in DESCRIPTION and managed by renv:
- targets: Workflow orchestration
- duckdb: In-process SQL database engine
- soilDB: SSURGO data download utilities
- sf: Spatial data handling
- Plus supporting packages (DBI, tarchetypes, geotargets)
R >= 4.0.0 is required. See DESCRIPTION for full details.
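To see which declared dependencies are still missing before running the pipeline, a base-R sketch like the following reads the Depends and Imports fields of DESCRIPTION. The function names description_deps() and check_deps() are hypothetical; renv itself remains the supported way to install packages.

```r
# Hypothetical helper: list packages named in the Depends/Imports fields
# of a DESCRIPTION file, dropping version requirements and R itself.
description_deps <- function(path = "DESCRIPTION",
                             fields = c("Depends", "Imports")) {
  dcf <- read.dcf(path, fields = fields)
  vals <- dcf[1, ]
  vals <- vals[!is.na(vals)]
  entries <- unlist(strsplit(vals, ","), use.names = FALSE)
  # Remove version specs such as "(>= 1.0.0)" and surrounding whitespace
  pkgs <- trimws(sub("\\(.*\\)", "", entries))
  unique(pkgs[nzchar(pkgs) & pkgs != "R"])
}

# TRUE for each package that is installed; system.file() returns ""
# for packages that are not, so nothing gets loaded
check_deps <- function(pkgs) {
  vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
}

if (file.exists("DESCRIPTION")) {
  installed <- check_deps(description_deps("DESCRIPTION"))
  print(names(installed)[!installed])  # packages still missing
}
```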
The pipeline runs in three stages:
- Data Ingestion: Download and prepare SSURGO data sources
- Database Building: Construct optimized DuckDB databases
- Output: Generate final database artifacts
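The database-building stage ultimately comes down to writing tables into a DuckDB file and querying them with SQL. The sketch below shows that pattern with DBI and duckdb on two tiny tables. The column names follow SSURGO conventions (legend.areasymbol, mapunit.mukey), but the function build_demo_db() and all values are made up for illustration; this is not how the project's own functions load SSURGO.

```r
library(DBI)
library(duckdb)

# Hypothetical illustration: write two small SSURGO-style tables into an
# on-disk DuckDB file, then run a join inside DuckDB.
build_demo_db <- function(dbdir) {
  con <- dbConnect(duckdb(), dbdir = dbdir)
  on.exit(dbDisconnect(con), add = TRUE)

  # Made-up example rows
  legend <- data.frame(lkey = c(1L, 2L), areasymbol = c("CA630", "WI025"))
  mapunit <- data.frame(
    mukey = c(101L, 102L, 201L),
    lkey = c(1L, 1L, 2L),
    musym = c("A", "B", "C")
  )
  dbWriteTable(con, "legend", legend, overwrite = TRUE)
  dbWriteTable(con, "mapunit", mapunit, overwrite = TRUE)

  # Count map units per survey area
  dbGetQuery(con, "
    SELECT l.areasymbol, COUNT(*) AS n_mapunits
    FROM mapunit m JOIN legend l ON m.lkey = l.lkey
    GROUP BY l.areasymbol
    ORDER BY l.areasymbol")
}

res <- build_demo_db(tempfile(fileext = ".duckdb"))
print(res)
```

Because the database lives in a single file, the resulting artifact can be tracked by targets with format = "file" and reopened later with dbConnect().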
This project uses renv to provide each user with an isolated, project-local R package library. The .Rprofile file automatically activates renv when you start R in this directory, so no manual setup is needed beyond initializing the project library once with renv::init().
Why this approach? Rather than pinning package versions in version control, each user maintains a local renv.lock file (listed in .gitignore). This allows:
- Flexibility: Teams can work with newer package versions if desired
- Isolation: This project's dependencies don't affect your other R work
- Simplicity: No need to manage global package state
Please raise any issues on the Issue Tracker.
This project is licensed under the terms specified in LICENSE.md.
Andrew G. Brown (@brownag)