The LinkML schema provided in this repository is an extension of the DCAT Application Profile (DCAT-AP). It allows additional metadata to be provided for a dcat:Dataset in a very generic manner, such as:
- which kinds of entities or activities were evaluated,
- which kind of activity generated the dcat:Dataset,
- which kinds of instruments were used in the dataset-generating activity,
- in which surrounding (e.g. a laboratory) and according to which plan the dataset-generating activity took place,
- which kinds of qualitative and quantitative characteristics were attributed to the evaluated entity or activity and to the instruments used.
This extension is mainly based on the Starting Point Terms of the Provenance Ontology (PROV-O),
in that it makes the prov:wasGeneratedBy property of the Dataset class mandatory and specifies necessary properties for its expected range, the prov:Activity class.
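To make the effect of the mandatory provenance link concrete, here is a minimal sketch in plain Python. It is not part of the schema or its generated datamodel: the key name `was_generated_by` and the record layout are assumptions made for illustration, and real validation should be done with the LinkML validator against dcat_ap_plus.yaml.

```python
from typing import Any


def provenance_problems(dataset: dict[str, Any]) -> list[str]:
    """Return human-readable problems with a dataset's provenance link.

    Assumption: the dataset is a plain dict and the prov:wasGeneratedBy
    property is stored under the hypothetical key "was_generated_by".
    """
    activities = dataset.get("was_generated_by")
    if not activities:
        return ["was_generated_by is missing, but DCAT-AP+ makes it mandatory"]
    # Accept a single activity record as well as a list of records.
    if not isinstance(activities, list):
        activities = [activities]
    problems = []
    for index, activity in enumerate(activities):
        if not isinstance(activity, dict):
            problems.append(f"was_generated_by[{index}] is not an activity record")
    return problems


# A dataset without provenance is reported; one with an activity passes.
print(provenance_problems({"title": "Example dataset"}))
print(provenance_problems({"title": "Example dataset",
                           "was_generated_by": {"id": "ex:activity-1"}}))
```

The sketch only checks that an activity is present; the properties required on the activity itself are defined in the schema and are deliberately not reproduced here.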
The choice to use LinkML for extending DCAT-AP was based on the need to have different layers that cater to different domain-specific use cases. DCAT-AP+ serves as the basic layer for such extensions and is thus kept very generic. As the basis of ChemDCAT-AP, it also illustrates how its classes can be extended further for domain-specific needs.
DCAT-AP+ is developed in close collaboration between NFDI4Chem and NFDI4Cat and is intended to be further improved, extended and adapted by the whole NFDI community.
More detailed documentation is available at: https://nfdi-de.github.io/dcat-ap-plus
The JSON-LD serialization of the official DCAT-AP 3.0.0 SHACL shapes (dcat_ap_shacl.jsonld) was downloaded from the 3.0.0 release folder on the master branch of the DCAT-AP GitHub repository. The downloaded SHACL shapes were then processed by the dcat_ap_shacl_2_linkml.py script to generate two LinkML schemas:
- dcat_ap_linkml.yaml - an almost 1:1 translation of the DCAT-AP SHACL shapes to LinkML.
- dcat_ap_plus.yaml - the LinkML representation of DCAT-AP to which we added the additional constraints, classes and properties we need for our DCAT-AP+ extension.
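The general idea behind such a translation can be sketched as follows. This is not the logic of dcat_ap_shacl_2_linkml.py: the function name, the simplified key layout of the shape, and the example shape itself are assumptions for illustration, and the real JSON-LD file may be structured differently.

```python
from typing import Any


def property_shape_to_slot(shape: dict[str, Any]) -> dict[str, Any]:
    """Map one simplified SHACL property shape to a LinkML-style slot dict."""
    min_count = int(shape.get("sh:minCount", 0))
    max_count = shape.get("sh:maxCount")  # None means no upper bound
    slot: dict[str, Any] = {
        "slot_uri": shape["sh:path"]["@id"],
        # sh:minCount >= 1 corresponds to a required slot.
        "required": min_count >= 1,
        # No sh:maxCount, or a value above 1, corresponds to a multivalued slot.
        "multivalued": max_count is None or int(max_count) > 1,
    }
    target = shape.get("sh:class") or shape.get("sh:datatype")
    if target is not None:
        # Kept as an IRI here; a real translation would map it to a
        # LinkML class or type name.
        slot["range"] = target["@id"]
    return slot


# Hand-written illustrative shape, not copied from dcat_ap_shacl.jsonld.
example_shape = {
    "sh:path": {"@id": "dct:title"},
    "sh:minCount": 1,
}

print(property_shape_to_slot(example_shape))
```

The cardinality mapping (minimum count to `required`, maximum count to `multivalued`) is the part that carries over most directly between the two languages; ranges, node kinds and value lists need more careful handling.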
- docs/ - mkdocs-managed documentation
  - elements/ - generated schema documentation
- examples/ - Examples of using the schema
- project/ - project files (these files are auto-generated, do not edit)
- src/ - source files (edit these)
  - dcat_ap_plus
    - schema/ - LinkML schema (edit this)
    - datamodel/ - generated Python datamodel
- dcat_ap_plus
- tests/ - Python tests
  - data/ - Example data
See also the documentation of the template: https://github.com/linkml/linkml-project-copier?tab=readme-ov-file#prerequisites
- uv
  uv is a tool for managing Python projects and isolated Python-based applications. You will use it in your generated project to manage dependencies and build distribution files. Install uv by following its installation instructions.
  Note: Environments with a private PyPI repository may need extra configuration (example):
  export UV_DEFAULT_INDEX=https://nexus.example.com/repository/pypi-all/simple
- Copier
  Copier is a tool for generating projects based on a template (like this one!). It also allows re-configuring projects and keeping them updated when the original template changes. To insert dates into the template, Copier requires jinja2_time in the Copier environment. Install both with uv by running:
  uv tool install --with jinja2-time copier
- just
  The project contains a justfile with pre-defined complex commands. To execute these commands you need just as a command runner. Install it by running:
  uv tool install rust-just
To generate project artefacts run:
- just gen-project: generates all other representations
- just site: builds all artefacts but does not deploy the gh-pages
- just deploy: deploys the site
- just test: runs all tests
- just testdoc: locally builds docs and runs a test server
The project provides a pre-commit configuration with several code quality tools:
- yamllint — consistent formatting of schema YAML files
- ruff — formatting and linting of Python code
- codespell and typos — spell checkers
To use pre-commit, install it and activate it at the root of the repository:
uv tool install pre-commit --with pre-commit-uv
pre-commit install
Once installed, pre-commit will run checks automatically on every commit and reject commits that contain errors; it will also auto-correct several types of errors. The use of pre-commit-uv is optional but recommended as it accelerates the initial setup.
You can also run all checks manually at any time:
pre-commit run -a
The files dcat_ap_linkml.yaml and dcat_ap_plus.yaml are build artifacts, not source code.
Do not manually edit these YAML files to add classes or slots, change constraints, or edit their metadata.
All schema changes must be made by modifying the Python generation script (dcat_ap_shacl_2_linkml.py).
- To change the DCAT-AP base: Update the logic in parse_dcat_ap_shacl_shapes() (e.g., to handle new SHACL shapes).
- To change the DCAT-AP+ extension: Update the logic in build_dcatap_plus() (e.g., adding slots to extend_dataset()).
To ensure your changes are captured correctly and the dynamic versioning updates to the right commit hash, follow this exact sequence:
- Edit the Script: Make your changes to src/dcat_ap_plus/dcat_ap_shacl_2_linkml.py.
- Commit the Script:
  (This creates a new commit hash. The versioning system needs this commit to exist before it can calculate the new version.)
  git add src/dcat_ap_plus/dcat_ap_shacl_2_linkml.py
  git commit -m "feat: update generation logic for [your change]"
- Run the Generation Script:
  uv run python src/dcat_ap_plus/dcat_ap_shacl_2_linkml.py
  - Note: The script automatically runs uv sync at the start. This ensures the installed package metadata matches your new commit hash, allowing the script to inject the correct version (e.g., ...+g<new_commit_hash>) into the YAML.
  - This step generates/overwrites dcat_ap_linkml.yaml and dcat_ap_plus.yaml.
- Validate and Regenerate Derived Artifacts:
  Update the Python datamodel, documentation, and other artifacts from the newly versioned YAML:
  just gen-project _test-python _test-examples # Or specific commands for docs/tests as needed
- Commit the Generated Files:
  git add src/dcat_ap_plus/schema/*.yaml src/dcat_ap_plus/datamodel/ project/ docs/elements/
  git commit -m "chore: regenerate schema and artifacts with new version"
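Before committing the generated files, it can be useful to confirm that the version written into the YAML really points at the commit made for the script. The following is a minimal sketch of such a check, not a feature of the repository: the function names and the example version string are hypothetical, and it assumes only that the version contains a `+g<abbreviated hash>` suffix as described above.

```python
import re
from typing import Optional


def embedded_commit(version: str) -> Optional[str]:
    """Return the abbreviated commit hash that follows '+g' in a version string."""
    match = re.search(r"\+g([0-9a-f]{4,40})", version)
    return match.group(1) if match else None


def version_matches_commit(version: str, head_commit: str) -> bool:
    """Check that the hash embedded in the version is a prefix of the full commit hash."""
    short = embedded_commit(version)
    return short is not None and head_commit.startswith(short)


# Hypothetical values for illustration only.
example_version = "0.0.0.post1.dev0+g1a2b3c4"
example_head = "1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b"

print(version_matches_commit(example_version, example_head))
```

In practice the full hash can be obtained with `git rev-parse HEAD`, and the version string read from the generated schema file.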
Validate and test all: just test
Validate a single example dataset using LinkML's validator framework:
- Validate an example that conforms to the DCAT-AP+ extension (AnalysisDataset):
  uv run linkml validate tests/data/valid/AnalysisDataset-001.yaml -s src/dcat_ap_plus/schema/dcat_ap_plus.yaml -C AnalysisDataset
- Validate an example that conforms to the DCAT-AP+ extension (Dataset):
  uv run linkml validate tests/data/valid/Dataset-001.yaml -s src/dcat_ap_plus/schema/dcat_ap_plus.yaml -C Dataset
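When several example files need checking one by one, the validation commands can be assembled programmatically. The sketch below only builds and prints the commands. It assumes that the target class can be read from the file name prefix (as in the two examples above), which is a naming convention inferred from those examples rather than a documented rule; the helper names are hypothetical.

```python
import shlex
from pathlib import Path

SCHEMA = "src/dcat_ap_plus/schema/dcat_ap_plus.yaml"


def target_class(example: Path) -> str:
    """Infer the class from the file name, e.g. 'AnalysisDataset-001.yaml' -> 'AnalysisDataset'."""
    return example.stem.split("-", 1)[0]


def build_validate_command(example: Path) -> list[str]:
    """Assemble the linkml validate call for one example file."""
    return ["uv", "run", "linkml", "validate", example.as_posix(),
            "-s", SCHEMA, "-C", target_class(example)]


examples = [
    Path("tests/data/valid/AnalysisDataset-001.yaml"),
    Path("tests/data/valid/Dataset-001.yaml"),
]

for example in examples:
    # Print the command; pass the list to subprocess.run(..., check=True) to execute it.
    print(shlex.join(build_validate_command(example)))
```

Building argument lists instead of shell strings avoids quoting problems if a path ever contains spaces.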
To convert the test datasets of each DCAT-AP profile into a TTL graph run:
- Convert a domain-agnostic example of an analysis that conforms to the DCAT-AP extension:
  uv run linkml-convert -t ttl tests/data/valid/AnalysisDataset-001.yaml -s src/dcat_ap_plus/schema/dcat_ap_plus.yaml -P "_base=https://search.nfdi4chem.de/dataset/" -C AnalysisDataset
- Convert an NMR spectroscopy-specific example that conforms to the DCAT-AP extension:
  uv run linkml-convert -t ttl tests/data/valid/Dataset-001.yaml -s src/dcat_ap_plus/schema/dcat_ap_plus.yaml -P "_base=https://search.nfdi4chem.de/dataset/" -C Dataset
uv run mkdocs serve
rm -rf docs/elements/*.md && uv run gen-doc -d docs/elements src/dcat_ap_plus/schema/dcat_ap_plus.yaml
This work was funded by the German Research Foundation (DFG) through the projects:
- "NFDI4Cat - NFDI for Catalysis-Related Sciences" (DFG project no. 441926934) and
- "NFDI4Chem - NFDI for Chemistry" (DFG project no. 441958208)
within the National Research Data Infrastructure (NFDI) programme of the Joint Science Conference (GWK).
This project uses the template linkml-project-copier.