
When a job deployed to a container group fails to connect or loses its connection, the job status should be "ERROR" #4909

@kdelee

ISSUE TYPE
  • Bug Report
SUMMARY

This is a known issue with #4189 and will be resolved in a follow-up PR.

A job can be deployed to a container group and then fail after the pod is up. This happens in two ways. It may never be able to connect, for example because there is no rsync in the container. It may also lose the connection, for example because the pod exits early or is killed. In both cases the job ends with status "failed" instead of "error", even though the error is recorded in result_traceback.
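
This is a minimal sketch of the status mapping the report asks for. It assumes the job runner can tell a playbook failure apart from a failure to reach or keep the pod. `FailureCause` and `final_status` are hypothetical names used only for illustration. They are not AWX code.

```python
from enum import Enum


class FailureCause(Enum):
    """Hypothetical categories for why a container-group job stopped."""
    PLAYBOOK_FAILED = "playbook_failed"  # the playbook ran and exited non-zero
    CONNECT_FAILED = "connect_failed"    # never reached the pod (e.g. no rsync in the container)
    CONNECTION_LOST = "connection_lost"  # the pod exited early or was killed mid-run


def final_status(cause: FailureCause) -> str:
    """Return the job status this issue expects for a given failure cause.

    "failed" is reserved for the playbook itself failing. Problems reaching
    or keeping the pod are infrastructure errors, so they map to "error".
    """
    if cause is FailureCause.PLAYBOOK_FAILED:
        return "failed"
    return "error"


for cause in FailureCause:
    print(cause.value, "->", final_status(cause))
```

Under this sketch, only a genuine playbook failure yields "failed". Both container-group cases described above yield "error".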

STEPS TO REPRODUCE
  1. Create a Container Group and override the container args to "sleep 5" so the container exits too quickly.
  2. Create a job template that uses our test playbooks to run sleep.yml for 45 seconds.
  3. Assign the Container Group to the job template (JT).
  4. Launch the job.
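
Step 1 can be sketched as the Python dict below. It assumes the Container Group's pod spec override field accepts YAML, and JSON output is valid YAML. The structure follows the Kubernetes Pod schema. The namespace, container name, and image are placeholders, not AWX defaults.

```python
import json

# Hypothetical pod spec override for step 1. Namespace, container name,
# and image are placeholders; only the args value comes from the issue.
pod_spec_override = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"namespace": "default"},
    "spec": {
        "containers": [
            {
                "name": "worker",
                "image": "ansible/ansible-runner",
                # Exit after 5 seconds, long before the 45-second sleep.yml run ends.
                "args": ["sleep", "5"],
            }
        ]
    },
}

# JSON is valid YAML, so this text can be pasted into a YAML override field.
print(json.dumps(pod_spec_override, indent=2))
```
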
EXPECTED RESULTS

The job fails with status "error" and a traceback.

ACTUAL RESULTS

The job fails with status "failed" and a traceback.
