blw-ofag-ufag/crops

RDF master and reference data about crops

This project addresses the challenge of fragmented agricultural crop data within the Swiss federal administration, where essential systems[1] all use separate, non-harmonized crop terminologies. This lack of a "single source of truth" creates significant integration hurdles for digital tools.

In this project, we propose a unified master data system for crops and crop-related objects. The repository implements a sustainable solution by using a dedicated RDF ontology (crops ontology) and a graph database on LINDAS. This approach first connects (or "maps") the various crop terms from the different systems, creating a unified, machine-readable master data system that can be queried centrally. This graph not only allows for complex queries across formerly siloed data but also provides the stable, versioned foundation for the long-term, step-by-step harmonization of crop data across the Swiss agricultural sector. Click here to search for crops in the graph.

Warning

This project is still a work in progress.

Data model

The general data model is documented here. Note that SHACL Play! reads the data from rdf/shape/data-model.ttl on the main branch. You may inspect the crop taxonomy/ontology using WebVOWL here or read its Turtle file here.

Note

You may find more information on the repository wiki.

Repository structure

  • /data: source data files
  • /docs: static HTML documents, published as a GitHub Pages site
  • /rdf: all RDF (Turtle) files
    • /data: tabular data
    • /ontology: core vocabulary, crop taxonomy
    • /processed: automatically generated Turtle files; do not edit them manually
    • /shape: dedicated files for SHACL shapes
  • /src: source code
  • /tests: pytest files

Run the data processing and LINDAS integration pipeline

The data integration pipeline uses the R and Python scripts in the /src folder. To run the entire pipeline:

  1. Add the following variables to a .env file:

    USER=lindas-foag
    PASSWORD=********
    GRAPH=https://lindas.admin.ch/foag/crops
    ENDPOINT=https://stardog.cluster.ldbar.ch/lindas
    EPPO=********
  2. Create and activate a virtual environment, then install the dependencies:

    python -m venv venv
    source venv/bin/activate  # On Windows use: venv\Scripts\activate
    pip install --upgrade pip
    pip install -r src/python/requirements.txt
  3. Run the ETL pipeline:

    bash src/pipeline.bash

    This pipeline

    1. validates syntax of all RDF turtle files,
    2. constructs a graph with inferred triples (based on the OWL ontology),
    3. validates the final graph against the SHACL data model and
    4. overwrites the named graph https://lindas.admin.ch/foag/crops on LINDAS.
  4. Make sure all tests pass with pytest:

    pytest tests/ -v
  5. Check out the results on LINDAS. (Here's an example entity.)
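For readers who want to script parts of this, here is a minimal Python sketch of two of the steps: reading the variables from step 1 and preparing the request that replaces the named graph (pipeline step 4). It is not the repository's pipeline code. It assumes the .env file holds plain KEY=VALUE lines and, more importantly, that ENDPOINT accepts SPARQL 1.1 Graph Store HTTP Protocol requests with HTTP Basic authentication; `parse_env` and `build_replace_request` are hypothetical names. The request is only built, never sent.

```python
# Hypothetical sketch, not the repository's pipeline code.
# Assumes ENDPOINT speaks the SPARQL 1.1 Graph Store HTTP Protocol with Basic auth.
import base64
import urllib.parse
import urllib.request

REQUIRED_KEYS = ("USER", "PASSWORD", "GRAPH", "ENDPOINT")


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip("\"'")
    missing = [k for k in REQUIRED_KEYS if not env.get(k)]
    if missing:
        raise KeyError("missing .env variables: " + ", ".join(missing))
    return env


def build_replace_request(env: dict, turtle: bytes) -> urllib.request.Request:
    """Build (but do not send) a PUT that would replace the named graph GRAPH."""
    query = urllib.parse.urlencode({"graph": env["GRAPH"]})
    credentials = f"{env['USER']}:{env['PASSWORD']}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        f"{env['ENDPOINT']}?{query}",
        data=turtle,
        method="PUT",  # Graph Store Protocol: PUT replaces the graph's contents
        headers={
            "Content-Type": "text/turtle",
            "Authorization": f"Basic {token}",
        },
    )
```

Sending it would be a matter of `urllib.request.urlopen(req)`. Because a Graph Store PUT replaces the whole graph, such a request should only be issued after the validation steps have passed.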

How can I use this data?

You can query the crop master data system using the SPARQL 1.1 Query Language. The following example query returns the URIs and German names of all cultivation types, i.e. all classes that are direct or indirect subclasses of :Cultivation:

PREFIX schema: <http://schema.org/>
PREFIX owl: <http://www.w3.org/2002/07/owl#> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX : <https://agriculture.ld.admin.ch/crops/>

SELECT ?name ?URI
FROM <https://lindas.admin.ch/foag/crops>
WHERE {
  ?URI a owl:Class ;
    schema:name ?name ;
    rdfs:subClassOf+ :Cultivation .
  FILTER(LANG(?name) = "de")
}
ORDER BY ?name
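The `rdfs:subClassOf+` property path follows one or more `rdfs:subClassOf` links, so the query returns indirect as well as direct subclasses of `:Cultivation`. The toy Python sketch below computes the set such a path matches over a plain list of (subclass, superclass) pairs. The function names and any class names used with it are hypothetical, not taken from the crops ontology; the triple store evaluates the path itself, so this is only for intuition.

```python
# Toy illustration of the set matched by `?c rdfs:subClassOf+ target`.
from collections import defaultdict, deque


def superclasses(pairs: list, start: str) -> set:
    """All classes reachable from `start` via one or more subClassOf links."""
    parents = defaultdict(set)
    for child, parent in pairs:
        parents[child].add(parent)
    seen = set()
    queue = deque(parents[start])
    while queue:  # breadth-first search over the subClassOf graph
        cls = queue.popleft()
        if cls in seen:
            continue  # already visited; also guards against cycles
        seen.add(cls)
        queue.extend(parents[cls])
    return seen


def subclasses_of(pairs: list, target: str) -> set:
    """All classes c for which `c rdfs:subClassOf+ target` holds."""
    classes = {cls for pair in pairs for cls in pair}
    return {cls for cls in classes if target in superclasses(pairs, cls)}
```

With the hypothetical pairs `("ArableCultivation", "Cultivation")` and `("WinterWheatCultivation", "ArableCultivation")`, `subclasses_of(pairs, "Cultivation")` returns both classes but not `Cultivation` itself. The `*` path (zero or more links) would additionally let `?URI` bind to `:Cultivation`.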

More examples are available in src/sparql/queries. Note that some of these examples are "parametrized" and thus can't be run without modifications.

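To retrieve data from LINDAS programmatically, send an HTTP request to the LINDAS SPARQL endpoint (https://lindas.admin.ch/query), passing the URL-encoded SPARQL query as the query parameter. The curl example below sends it as a GET request (-G); the SPARQL 1.1 Protocol also allows sending queries via POST.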

# define any query you want...
query="SELECT * FROM <https://lindas.admin.ch/foag/crops> WHERE { ?s ?p ?o }"
curl -G "https://lindas.admin.ch/query" \
     --data-urlencode "query=$query" \
     -H "Accept: application/json"

Depending on your integration needs, you can adjust the Accept header to retrieve the results in other formats, including JSON (application/json), XML (application/sparql-results+xml) or CSV (text/csv).
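The same request can be made from Python. The standard-library sketch below mirrors the curl call: it URL-encodes the query into a GET request, sets the Accept header and flattens the result. The function names are hypothetical, and `rows` assumes the response follows the W3C SPARQL 1.1 Query Results JSON Format (the `head`/`results`/`bindings` structure); that LINDAS returns exactly this for `application/json` is an assumption worth checking.

```python
# Sketch using only the Python standard library; function names are hypothetical.
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://lindas.admin.ch/query"


def build_query_request(query: str, accept: str = "application/json") -> urllib.request.Request:
    """GET request carrying the URL-encoded query, like `curl -G --data-urlencode`."""
    params = urllib.parse.urlencode({"query": query}, quote_via=urllib.parse.quote)
    return urllib.request.Request(f"{ENDPOINT}?{params}", headers={"Accept": accept})


def rows(results: dict) -> list:
    """Flatten SPARQL 1.1 JSON results into one {variable: value} dict per solution.

    Variables left unbound in a solution are absent from its binding and are skipped.
    """
    variables = results["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in results["results"]["bindings"]
    ]


def run_query(query: str) -> list:
    """Send the query and parse the JSON response (needs network access)."""
    with urllib.request.urlopen(build_query_request(query), timeout=60) as response:
        return rows(json.load(response))
```

Calling `run_query` with the cultivation query above should yield one dict per class with `name` and `URI` keys. For `text/csv`, `csv.DictReader` over the decoded response body would be the analogous parser.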

Footnotes

  1. For example, AGIS (direct payments), GRUD (fertilization), PSM registry (plant protection), ProVar (varieties), PGREL-NIS (gene bank) and others.