ISSUE TYPE
SUMMARY
This is a known issue with #4189 that will be resolved in a follow-up PR.
When a job deployed to a container group fails after the pod has been created, either because it was never able to connect (for example, no rsync in the container) or because it lost the connection (the pod exits prematurely or is killed), the job ends in state "failed" instead of "error", even though the error is recorded in result_traceback.
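The distinction being asked for can be sketched roughly as below. This is only an illustration of the expected behavior, not the actual code from #4189: `RunOutcome`, its fields, and `classify` are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunOutcome:
    """Hypothetical summary of one job run on a container group pod."""
    connected: bool             # False if we never reached the pod (e.g. no rsync)
    connection_lost: bool       # True if the pod exited or was killed mid-run
    playbook_rc: Optional[int]  # None if the playbook never reported an exit code
    traceback: str = ""


def classify(outcome: RunOutcome) -> str:
    """Return the job state this issue expects for a given outcome."""
    # Infrastructure problems are not the playbook's fault, so they map to "error".
    if not outcome.connected or outcome.connection_lost:
        return "error"
    # The playbook ran to completion but reported a failure.
    if outcome.playbook_rc != 0:
        return "failed"
    return "successful"
```

Under these assumptions, a pod that exits early would give `classify(RunOutcome(True, True, None, "..."))`, which returns "error" rather than "failed".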
STEPS TO REPRODUCE
- Create a Container Group, but override the container args to be sleep 5 so the pod exits too quickly
- Create a job template that runs sleep.yml from our test playbooks for 45 seconds
- Assign the Container Group to the JT
- Launch the job
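For the first step, the pod spec override might look something like the sketch below, written as a Python dict and dumped to JSON for pasting into the override field. The image name and namespace are placeholders and the exact set of fields is an assumption; the only part taken from this report is setting the container args to sleep 5.

```python
import json

# Hypothetical pod spec override; only the "args" value comes from this report.
pod_spec_override = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"namespace": "default"},  # placeholder namespace
    "spec": {
        "containers": [
            {
                "image": "example/runner-image",  # placeholder image name
                "tty": True,
                "stdin": True,
                # The container exits after 5 seconds, well before the
                # 45 second sleep.yml run can finish.
                "args": ["sleep", "5"],
            }
        ]
    },
}

override_json = json.dumps(pod_spec_override, indent=2)
print(override_json)
```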
EXPECTED RESULTS
The job fails with state "error" and a traceback in result_traceback.
ACTUAL RESULTS
The job fails with state "failed" and a traceback in result_traceback.