A small service that copies data to our central storage on demand and updates SciCat accordingly.
It listens on a RabbitMQ queue for work submitted by, e.g., the MLZ SciCat Ingestor[1].
It copies data from HAYFORK_SRC to HAYFORK_DEST using a so-called consumer and updates the relevant entries in the configured SciCat catalogue[2].
The container has to be run with the correct NFS mounts or similar.
Currently the following consumers are available:
- NFS
- S3 (under development)
The following environment variables can be set:
The consumer has to be selected:
- HAYFORK_CONSUMER: defines which consumer should be used
  - NFS: NFS or direct filesystem access in the container
  - S3: S3 API
The SciCat backend which should be updated can be configured using:
- SCICAT_HOST: URL under which the SciCat backend can be reached, e.g. http://localhost:3000.
- SCICAT_USER: User to use for SciCat login.
- SCICAT_PASSWORD: Password for the SciCat user.
RabbitMQ messaging is configured with the following settings:
- RMQ_HOST: RabbitMQ host to connect to.
- RMQ_PORT: Port to connect to RabbitMQ on.
- RMQ_USER: User for RabbitMQ authentication.
- RMQ_PASSWORD: Password for RabbitMQ authentication.
- RMQ_VHOST: (Optional) The vhost the queue is on, defaults to '/'.
- RMQ_QUEUE: The name of the queue to read messages from.
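As an illustration of how these settings fit together, the following minimal sketch reads them from the environment. It is not taken from the hayfork code: the names RabbitMQSettings and load_rmq_settings are hypothetical, and treating every variable except RMQ_VHOST as required is an assumption based on only RMQ_VHOST being marked optional above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RabbitMQSettings:
    # Hypothetical container for the RMQ_* variables documented above.
    host: str
    port: int
    user: str
    password: str
    queue: str
    vhost: str = "/"


def load_rmq_settings(env: Mapping[str, str] = os.environ) -> RabbitMQSettings:
    """Read the RMQ_* variables and fail early if a required one is missing."""
    # Assumption: everything except RMQ_VHOST is required.
    required = ("RMQ_HOST", "RMQ_PORT", "RMQ_USER", "RMQ_PASSWORD", "RMQ_QUEUE")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return RabbitMQSettings(
        host=env["RMQ_HOST"],
        port=int(env["RMQ_PORT"]),
        user=env["RMQ_USER"],
        password=env["RMQ_PASSWORD"],
        queue=env["RMQ_QUEUE"],
        # RMQ_VHOST is optional and falls back to '/'.
        vhost=env.get("RMQ_VHOST", "/"),
    )
```

Passing a plain dict instead of os.environ makes it easy to check a configuration before starting the container.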
Hayfork settings for all consumers:
- HAYFORK_SOURCE_FOLDER_REWRITE: (Optional) String in the form of "original_prefix:new_prefix".
  If defined and the prefix matches, the sourceFolder in SciCat is updated with the new value as soon as the first file is copied.
  Example: HAYFORK_SOURCE_FOLDER_REWRITE="/nicos/data:sans1"
  The data is written locally to /nicos/data/2026/P00016-08. After the data is copied to HAYFORK_DEST, the sourceFolder metadata field of the corresponding entry is rewritten to sans1/2026/P00016-08.
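The rewrite rule can be sketched roughly as follows. This is an illustration only, not the actual hayfork implementation: the function name rewrite_source_folder is hypothetical, and matching the prefix only on a path boundary (so that /nicos/database is not affected by a /nicos/data rule) is an assumption.

```python
def rewrite_source_folder(source_folder: str, rule: str) -> str:
    """Apply an "original_prefix:new_prefix" rule to a sourceFolder value."""
    # Split on the first colon only, so the new prefix may itself contain colons.
    original_prefix, new_prefix = rule.split(":", 1)
    # Assumption: the prefix must end on a path boundary to count as a match.
    boundary = original_prefix.rstrip("/") + "/"
    if source_folder == original_prefix or source_folder.startswith(boundary):
        return new_prefix + source_folder[len(original_prefix):]
    # No match: leave the value untouched.
    return source_folder


print(rewrite_source_folder("/nicos/data/2026/P00016-08", "/nicos/data:sans1"))
# prints sans1/2026/P00016-08
```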
NFS:
- HAYFORK_SRC: (Optional) The path in the container where files should be copied from, defaults to /data.
- HAYFORK_DEST: (Optional) The path where files should be written to, defaults to /dest.
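To show how the two paths relate, here is a minimal sketch of a filesystem copy that preserves the layout below HAYFORK_SRC. It is an assumption about the general approach, not the NFS consumer itself: copy_to_dest and its relative_path argument are hypothetical, and how a queue message identifies the files to copy is not covered here.

```python
import os
import shutil
from pathlib import Path


def copy_to_dest(relative_path: str, src_root=None, dest_root=None) -> Path:
    """Copy one file from the source tree to the same relative place in the destination tree."""
    # Fall back to the documented defaults /data and /dest.
    src_root = Path(src_root or os.environ.get("HAYFORK_SRC", "/data"))
    dest_root = Path(dest_root or os.environ.get("HAYFORK_DEST", "/dest"))
    source = src_root / relative_path
    target = dest_root / relative_path
    # Create missing parent directories in the destination.
    target.parent.mkdir(parents=True, exist_ok=True)
    # copy2 copies the content and tries to preserve timestamps and permissions.
    shutil.copy2(source, target)
    return target
```

With the defaults, copy_to_dest("2026/P00016-08/run1.dat") would copy /data/2026/P00016-08/run1.dat to /dest/2026/P00016-08/run1.dat; the file name is made up for the example.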
S3:
Currently under development.
The k8s submodule is used for the integration at MLZ and can be ignored for normal operation.
[1] mirrored publicly at https://github.com/mlz-cisx/scicat-ingestor