Skip to content

Conversation

@kevin85421
Copy link
Member

@kevin85421 kevin85421 commented Mar 9, 2024

Why are these changes needed?

#1956 (comment)

This PR removes ConfigMap from the YAML. In this doc, we need to run kubectl create -f ray-job.kueue-toy-sample.yaml multiple times. If the YAML includes a ConfigMap, users may see a warning message indicating that the ConfigMap already exists. Removing the ConfigMap improves the UX.

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@kevin85421 kevin85421 marked this pull request as ready for review March 9, 2024 02:15
@kevin85421 kevin85421 requested a review from jjyao March 9, 2024 02:15
@kevin85421
Copy link
Member Author

cc @andrewsykim

# The alternative value is "HTTPMode", indicating that KubeRay will submit the Ray job by sending an HTTP request to the RayCluster.
# submissionMode: "K8sJobMode"
entrypoint: python /home/ray/samples/sample_code.py
entrypoint: python -c "import time; [print(i) or time.sleep(1) for i in range(600)]"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the example used here: https://docs.ray.io/en/master/cluster/kubernetes/getting-started/raycluster-quick-start.html#method-1-execute-a-ray-job-in-the-head-pod

Can this be

entrypoint: python -c "import ray; ray.init(); print(ray.cluster_resources())"

I think an example that uses ray library will be better, even if it's very simple

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ray.cluster_resources() example is not suitable for showcasing gang scheduling. In this doc, we create two RayJob custom resources and ensure that no Pods belonging to the second RayJob will be created because the ClusterQueue doesn't have enough resources. If the RayJob finishes very quickly, it is possible that users may not be able to observe this behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewsykim does it make sense to you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense. This was more of a general comment that we don't need ConfigMap for most examples. ray-job.sample.yaml for example can probably inline the example right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ray.cluster_resources() example is not suitable for showcasing gang scheduling.

Actually, even for this example, can we run ray.cluster_resources() in between each sleep?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated 6093435

@andrewsykim
Copy link
Member

LGTM, my other comments are not blocking

@kevin85421 kevin85421 merged commit 8c9c732 into ray-project:master Mar 11, 2024
kevin85421 added a commit to kevin85421/kuberay that referenced this pull request Mar 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants