Globus Genomics Galaxy

S3 + Kubernetes setup: Galaxy's job execution requires the job working directory, inputs, outputs, tools and indices to be available on the local filesystem. To use distributed compute resources, NFS normally has to be mounted on the workers so that these files are accessible. NFS has a limitation, however: it places a heavy network load on the head node when many workers run at the same time. We have developed a solution that addresses this issue and also lowers the cost.

S3 is low-cost storage that is durable and highly available. Data transfer within the same AWS region is free and fast, so we pay only for the data operations, and that cost is reasonable. This makes S3 well suited as the main storage and as a relay for distributing data among the workers. An S3 bucket can also be mounted as a file system, but the mount is only stable enough for read-only use, so we do not use it on the workers; it is mounted only on the head node, where users read output datasets.

Containers provide a clean environment for job execution, and Kubernetes is a great tool for running containers in the cloud. It comes with an auto scaler that scales the worker fleet up and down. The auto scaler can launch workers based on job requirements and can also diversify the instance types. This increases resource utilisation, and because we use spot instances to lower the cost, diversified instance types also lower the chance of job evictions.

Development: Galaxy's Kubernetes runner has been updated and a job execution script has been created to support this setup.

Pre job: On the head node, read the job execution files to get the job working directory path and sync that directory to S3. Get the input and output paths, upload the files to S3 if they do not already exist there, and create symlinks pointing to the files in the mounted S3 file system. Get the tool and index information. Replace the job execution command with a command that runs the job execution script with the required arguments.
Run job: In the container on the worker, the job execution script downloads the required working directory, inputs and outputs from S3. Tools are provided by NFS, and indices are also downloaded from S3. The script then executes the job. After the job finishes, it uploads the outputs and working directory back to S3 and cleans up the local data.
Post job: On the head node, sync the job working directory from S3 so that the job logs and exit code files are available on the local file system. Users can read the outputs because the S3 bucket is mounted on the head node.
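
The three phases above can be sketched as a sequence of AWS CLI calls. This is only an illustration under stated assumptions, not the repository's job execution script: the function names, the bucket name and the convention of reusing the local path as the S3 key are all hypothetical, and the existence check, symlink creation, index download and cleanup steps are left out for brevity. The sketch only builds the command lists; it does not run them.

```python
from pathlib import PurePosixPath
from typing import List


def s3_uri(bucket: str, local_path: str) -> str:
    """Map an absolute local path to an S3 URI (assumed layout: key == path)."""
    key = PurePosixPath(local_path).as_posix().lstrip("/")
    return f"s3://{bucket}/{key}"


def sync_cmd(src: str, dst: str) -> List[str]:
    """Build an `aws s3 sync` call for a directory."""
    return ["aws", "s3", "sync", src, dst]


def cp_cmd(src: str, dst: str) -> List[str]:
    """Build an `aws s3 cp` call for a single file."""
    return ["aws", "s3", "cp", src, dst]


def pre_job(bucket: str, working_dir: str, inputs: List[str]) -> List[List[str]]:
    """Head node: push the working directory and the inputs to S3."""
    cmds = [sync_cmd(working_dir, s3_uri(bucket, working_dir))]
    cmds += [cp_cmd(path, s3_uri(bucket, path)) for path in inputs]
    return cmds


def run_job(bucket: str, working_dir: str, inputs: List[str],
            outputs: List[str], tool_cmd: str) -> List[List[str]]:
    """Worker container: pull data, run the tool, push the results back."""
    cmds = [sync_cmd(s3_uri(bucket, working_dir), working_dir)]
    cmds += [cp_cmd(s3_uri(bucket, path), path) for path in inputs]
    cmds.append(["sh", "-c", tool_cmd])  # the original Galaxy job command
    cmds += [cp_cmd(path, s3_uri(bucket, path)) for path in outputs]
    cmds.append(sync_cmd(working_dir, s3_uri(bucket, working_dir)))
    return cmds


def post_job(bucket: str, working_dir: str) -> List[List[str]]:
    """Head node: pull logs and exit code files back from S3."""
    return [sync_cmd(s3_uri(bucket, working_dir), working_dir)]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` on a machine where the AWS CLI is installed and has access to the bucket.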

To use this setup, configure the tool to use the Kubernetes runner in job_conf.xml, that is, choose one of the k8s destinations depending on the job's resource requirements. Provide the S3 bucket information in galaxy.yml, and make sure the head node and the workers have access to the bucket, for example through IAM roles.
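
As a rough illustration of matching a job to a destination by its resource requirements, the sketch below picks the smallest destination that fits. The destination ids and their core and memory limits are hypothetical; the real ids are whatever k8s destinations are defined in job_conf.xml, and this function is not part of Galaxy's API.

```python
# Hypothetical destinations: (id, max cores, max memory in GiB), smallest first.
DESTINATIONS = [
    ("k8s_small", 2, 8),
    ("k8s_medium", 8, 32),
    ("k8s_large", 32, 128),
]


def choose_destination(cores: int, mem_gib: int) -> str:
    """Return the id of the smallest destination that satisfies the request."""
    for dest_id, max_cores, max_mem_gib in DESTINATIONS:
        if cores <= max_cores and mem_gib <= max_mem_gib:
            return dest_id
    raise ValueError(f"no destination fits {cores} cores / {mem_gib} GiB")


if __name__ == "__main__":
    print(choose_destination(4, 16))
```

Picking the smallest fitting destination keeps requests close to what a job needs, which is in line with the resource utilisation goal described earlier.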

