Conversation

@alecgunny

My team and I have been augmenting pycondor's functionality in some of our projects with APIs for more Pythonic interaction with submitted jobs. We'll continue to develop them as submodules on my fork, adding more features and tests, but I wanted to submit them as a PR now to start a discussion with the pycondor team about whether something like this might be useful and, if so, how it might need to change to fit their vision. The idea is essentially:

import time

import pycondor
from pycondor.cluster import JobStatus

job = pycondor.Job('myjob', 'myscript.py', queue=5)
job.build()

# submit_job now returns a JobCluster object containing a list of
# Process objects representing each individual job
cluster = job.submit_job()

print(cluster.id)
assert len(cluster.procs) == 5
assert cluster.procs[0].id == f"{cluster.id}.000"

# wait for all the jobs to finish, polling periodically:
while not cluster.check_status(JobStatus.COMPLETED, how="all"):
    if cluster.check_status([JobStatus.HELD, JobStatus.FAILED, JobStatus.CANCELLED], how="any"):
        # remove the cluster before raising, since code after `raise` never runs
        cluster.rm()
        raise ValueError("Something went wrong!")
    time.sleep(1)

There are also some utility functions for getting the cluster-internal IPs of running jobs, which was sort of the impetus for all of this to begin with: deploying ephemeral distributed services.
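
To make that use case a bit more concrete, here's a rough sketch of how it could look, continuing from the example above. The `get_ips` method name and its return value are just placeholders for discussion, not necessarily the API that currently exists on the fork:

# hypothetical sketch: `get_ips` and its return shape are placeholders,
# not necessarily what's implemented on the fork
while not cluster.check_status(JobStatus.RUNNING, how="all"):
    time.sleep(1)

# map each proc to the cluster-internal IP of the node it's running on,
# e.g. to hand worker addresses to an ephemeral distributed service
ips = cluster.get_ips()  # e.g. {"1234.000": "10.0.0.12", ...}
for proc_id, ip in ips.items():
    print(f"Process {proc_id} running at {ip}")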

Curious to get folks' thoughts, happy to talk more about this whenever as well.
