Closed
Labels
P1 (Issue that should be fixed within a few weeks), bug (Something that is supposed to be working; but isn't), core-clusters (For launching and managing Ray clusters/jobs/kubernetes)
Description
What happened + What you expected to happen
If two jobs are submitted with the same ID, the second one fails with an unfriendly internal error (although the error does hint at the root cause). Even if the first job would have succeeded, its status is overwritten with the failed status of the second job.
❯ ray job submit --submission-id blah3 --no-wait -- echo hi again & ray job submit --submission-id blah3 --no-wait -- echo hi again
❯ ray job status blah3
Job submission server address: http://127.0.0.1:8265
------------------
Job 'blah3' failed
------------------
Status message: Failed to start Job Supervisor actor: The name _ray_internal_job_actor_blah3 (namespace=SUPERVISOR_ACTOR_RAY_NAMESPACE) is already taken. Please use a different name or get the existing actor using ray.get_actor('_ray_internal_job_actor_blah3', namespace='SUPERVISOR_ACTOR_RAY_NAMESPACE').
I would expect that (1) the first command succeeds and its status reflects that, and (2) the second fails with "RuntimeError: Job blah3 already exists." This already happens if the first command is given a second or so to run and update its internal JobInfo, but it should also happen when the commands are issued immediately one after the other.
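The expected behavior amounts to reserving the submission ID atomically (checking for and inserting the ID under a single lock) before any supervisor actor is started. The following is a minimal, hypothetical in-memory sketch of that idea using Python's threading module. JobInfoStore, put_if_absent, submit_job, and submit_concurrently are invented names for illustration, not Ray's actual internals or API, and the real fix may look quite different.

```python
import threading


class JobInfoStore:
    """Hypothetical in-memory job table (not Ray's real implementation)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._jobs = {}

    def put_if_absent(self, submission_id, status):
        # Check and insert under a single lock, so two concurrent
        # submissions can never both see the ID as unused.
        with self._lock:
            if submission_id in self._jobs:
                return False
            self._jobs[submission_id] = status
            return True

    def get_status(self, submission_id):
        with self._lock:
            return self._jobs.get(submission_id)


def submit_job(store, submission_id):
    # Hypothetical: reserve the ID first. A duplicate fails here, before any
    # supervisor actor would be created, and never touches the first job's status.
    if not store.put_if_absent(submission_id, "PENDING"):
        raise RuntimeError(f"Job {submission_id} already exists.")
    return submission_id


def submit_concurrently(store, submission_id, n=2):
    """Start n submissions of the same ID at (roughly) the same moment."""
    barrier = threading.Barrier(n)
    outcomes = []
    outcomes_lock = threading.Lock()

    def worker():
        barrier.wait()  # release all threads together to provoke the race
        try:
            submit_job(store, submission_id)
            outcome = "submitted"
        except RuntimeError as exc:
            outcome = str(exc)
        with outcomes_lock:
            outcomes.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcomes


store = JobInfoStore()
outcomes = submit_concurrently(store, "blah3")
print(sorted(outcomes))
print(store.get_status("blah3"))
```

Whichever thread acquires the lock first gets the ID. Every other caller receives the "already exists" error, and the winner's status stays PENDING instead of being overwritten with a failure. This run prints ['Job blah3 already exists.', 'submitted'] followed by PENDING.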
Versions / Dependencies
master, macOS, Python 3.8
Reproduction script
See the commands above.
Issue Severity
Medium: It is a significant difficulty but I can work around it.