mlz-cisx/hayfork

Hayfork

Hayfork is a small service that copies data to our central storage on demand and updates SciCat accordingly. It listens to a RabbitMQ queue to receive work from, e.g., the MLZ SciCat Ingestor[1]. It copies data from HAYFORK_SRC to HAYFORK_DEST using a so-called consumer and updates the relevant entries in the configured SciCat catalogue[2].

The container must be run with the correct NFS mounts (or equivalent access to the source and destination storage).

Currently the following consumers are available:

  • NFS
  • S3 (under development)

Usage

The following environment variables can be set:

The consumer has to be selected:

  • HAYFORK_CONSUMER: defines which consumer should be used
    • NFS: NFS or direct filesystem access in the container
    • S3: S3 API

The SciCat backend which should be updated can be configured using:

  • SCICAT_HOST: URL under which the SciCat backend can be reached, e.g. http://localhost:3000.
  • SCICAT_USER: User to use for SciCat login.
  • SCICAT_PASSWORD: Password for the SciCat user.

RabbitMQ messaging is configured with the following settings:

  • RMQ_HOST: RabbitMQ host to connect to.
  • RMQ_PORT: Port to connect to RabbitMQ on.
  • RMQ_USER: User for RabbitMQ authentication.
  • RMQ_PASSWORD: Password for RabbitMQ authentication.
  • RMQ_VHOST: (Optional) The Vhost the queue is on, defaults to '/'.
  • RMQ_QUEUE: The name of the queue to read messages from.
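The RabbitMQ settings above could be collected as in the following minimal sketch. It is not Hayfork's actual code: the names RabbitSettings and load_rabbit_settings are hypothetical, and treating every variable except RMQ_VHOST as required is an assumption based on only RMQ_VHOST being marked optional. The sketch also shows one detail that is easy to get wrong: in an AMQP URI the default vhost '/' has to be percent-encoded as %2F.

```python
import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import quote


@dataclass(frozen=True)
class RabbitSettings:
    """Hypothetical container for the RMQ_* variables (not part of Hayfork)."""

    host: str
    port: int
    user: str
    password: str
    queue: str
    vhost: str = "/"

    def amqp_url(self) -> str:
        # The vhost is a single path segment in an AMQP URI, so "/" becomes "%2F".
        # User and password are percent-encoded as well.
        user = quote(self.user, safe="")
        password = quote(self.password, safe="")
        vhost = quote(self.vhost, safe="")
        return f"amqp://{user}:{password}@{self.host}:{self.port}/{vhost}"


def load_rabbit_settings(env: Mapping[str, str] = os.environ) -> RabbitSettings:
    # Assumption: everything except RMQ_VHOST must be set.
    required = ("RMQ_HOST", "RMQ_PORT", "RMQ_USER", "RMQ_PASSWORD", "RMQ_QUEUE")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return RabbitSettings(
        host=env["RMQ_HOST"],
        port=int(env["RMQ_PORT"]),
        user=env["RMQ_USER"],
        password=env["RMQ_PASSWORD"],
        queue=env["RMQ_QUEUE"],
        vhost=env.get("RMQ_VHOST", "/"),  # documented default
    )
```

Passing a plain dictionary instead of os.environ keeps the loader easy to test. The resulting URL can be handed to any AMQP client library that accepts connection URIs.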

Hayfork settings for all consumers:

  • HAYFORK_SOURCE_FOLDER_REWRITE: (Optional) String in the form "original_prefix:new_prefix". If defined and the prefix matches, the sourceFolder in SciCat is updated with the new value as soon as the first file is copied.

Example: HAYFORK_SOURCE_FOLDER_REWRITE="/nicos/data:sans1"

The data is written locally to /nicos/data/2026/P00016-08. After the data has been copied to HAYFORK_DEST, the sourceFolder metadata field of the corresponding entry is rewritten to sans1/2026/P00016-08.
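The prefix rewrite can be illustrated with a short sketch. The function name rewrite_source_folder is hypothetical, and the details are assumptions rather than Hayfork's implementation: the rule is split at the first colon, and the prefix is compared as a plain string.

```python
from typing import Optional


def rewrite_source_folder(source_folder: str, rule: Optional[str]) -> str:
    """Apply an "original_prefix:new_prefix" rule to a sourceFolder value."""
    if not rule:
        # Variable not set: leave sourceFolder untouched.
        return source_folder
    # Assumption: the original prefix contains no colon, so split at the first one.
    original_prefix, separator, new_prefix = rule.partition(":")
    if not separator or not original_prefix:
        raise ValueError(f"expected 'original_prefix:new_prefix', got {rule!r}")
    if source_folder.startswith(original_prefix):
        return new_prefix + source_folder[len(original_prefix):]
    # Prefix does not match: nothing to rewrite.
    return source_folder


print(rewrite_source_folder("/nicos/data/2026/P00016-08", "/nicos/data:sans1"))
# prints: sans1/2026/P00016-08
```

A plain string comparison would also match a folder such as /nicos/database. Whether Hayfork compares whole path components is not stated here.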

NFS:

  • HAYFORK_SRC: (Optional) The path in the container where files should be copied from, defaults to /data.
  • HAYFORK_DEST: (Optional) The path where files should be written to, defaults to /dest.
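As a rough illustration of what an NFS-style copy between the two paths might look like, the sketch below copies one file from the source root to the same relative location under the destination root. The function copy_file and its parameters are hypothetical. Only the variable names and their defaults (/data and /dest) come from the list above. Preserving the relative directory layout is an assumption.

```python
import os
import shutil
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def copy_file(
    relative_path: str,
    src_root: Optional[PathLike] = None,
    dest_root: Optional[PathLike] = None,
) -> Path:
    """Copy src_root/relative_path to dest_root/relative_path and return the target."""
    # Fall back to the environment variables and their documented defaults.
    src = Path(src_root or os.environ.get("HAYFORK_SRC", "/data"))
    dest = Path(dest_root or os.environ.get("HAYFORK_DEST", "/dest"))
    source = src / relative_path
    target = dest / relative_path
    # Create missing parent directories on the destination side.
    target.parent.mkdir(parents=True, exist_ok=True)
    # copy2 copies the content and tries to preserve file metadata such as timestamps.
    shutil.copy2(source, target)
    return target
```

For example, copy_file("2026/P00016-08/run_001.dat") would, with the defaults, copy /data/2026/P00016-08/run_001.dat to /dest/2026/P00016-08/run_001.dat (the file name is made up for illustration).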

S3:

Currently under development.

Note

The k8s submodule is used for the integration at MLZ and can be ignored for normal operation.

References

[1] MLZ SciCat Ingestor, mirrored publicly at https://github.com/mlz-cisx/scicat-ingestor

[2] SciCat backend, https://github.com/SciCatProject/backend
