
@kevin85421 (Member) commented Mar 26, 2023

Why are these changes needed?

Ray can be configured to use TLS on its gRPC channels (ref). However, following that documentation requires knowledge of TLS, HTTPS, and basic cryptography in order to create correct public-private key pairs, self-signed certificates, and a certificate authority (CA).
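For reference, Ray reads its TLS settings from environment variables: `RAY_USE_TLS` (set to 1 to enable TLS), plus `RAY_TLS_SERVER_CERT`, `RAY_TLS_SERVER_KEY`, and `RAY_TLS_CA_CERT`, which point at the server certificate, its private key, and the CA certificate. The POSIX shell sketch below checks that these are set and readable before `ray start` runs. The function name `check_ray_tls_env` is hypothetical and not part of Ray or KubeRay, and the check does not validate the certificate contents themselves.

```shell
# Sketch (hypothetical helper, not part of Ray or KubeRay): verify that the
# environment variables Ray reads for TLS are set and point at readable files.
check_ray_tls_env() {
  # RAY_USE_TLS must be 1 for Ray to enable TLS; treat "unset" as 0.
  if [ "${RAY_USE_TLS:-0}" != "1" ]; then
    echo "RAY_USE_TLS is not set to 1" >&2
    return 1
  fi
  for var in RAY_TLS_SERVER_CERT RAY_TLS_SERVER_KEY RAY_TLS_CA_CERT; do
    # Indirect expansion through eval keeps the function POSIX sh compatible.
    eval "file=\${$var:-}"
    if [ -z "$file" ] || [ ! -r "$file" ]; then
      echo "$var is unset or not a readable file: '$file'" >&2
      return 1
    fi
  done
  echo "TLS environment looks complete"
}
```

Calling `check_ray_tls_env || exit 1` at the top of a container's start command would make a misconfigured Pod fail fast instead of starting Ray without working TLS.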

In KubeRay, users also need to know how the head and worker Pods communicate with each other:

  • The head uses each worker's POD_IP to communicate with that worker.
  • Workers use the fully qualified domain name of the head service (i.e., FQ_RAY_IP) to communicate with the head.

Hence, the head's certificate must cover FQ_RAY_IP, and each worker's certificate must cover its own POD_IP.
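The certificate rule above can be sketched as follows, assuming certificates are generated with OpenSSL from a config file whose `subjectAltName = @alt_names` line points at an `[alt_names]` section. The helper `write_alt_names` and its argument order are hypothetical. Listing POD_IP in the head's certificate as well is a conservative choice of this sketch, not something the PR states.

```shell
# Sketch (hypothetical helper): print the OpenSSL [alt_names] section that a
# Pod's certificate needs, based on the Pod's role.
#   $1: node type, "head" or "worker"
#   $2: POD_IP of this Pod (the head dials workers at this address)
#   $3: FQ_RAY_IP (workers dial the head at this name); used only for "head"
write_alt_names() {
  node_type=$1
  pod_ip=$2
  fq_ray_ip=$3
  echo "[alt_names]"
  echo "DNS.1 = localhost"
  echo "IP.1 = 127.0.0.1"
  # Every Pod lists its own IP so that peers dialing it by POD_IP can verify it.
  echo "IP.2 = ${pod_ip}"
  if [ "$node_type" = "head" ]; then
    # Workers reach the head through the head service's fully qualified name.
    echo "DNS.2 = ${fq_ray_ip}"
  fi
}
```

For example, `write_alt_names worker "$POD_IP" "$FQ_RAY_IP"` prints no `DNS.2` line, because nothing dials a worker by the head service's name. Only the head's certificate needs FQ_RAY_IP.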

Configuring TLS in KubeRay can therefore be challenging. To ease this process, this PR provides detailed instructions and an example YAML file that simplify setting up and configuring TLS.

Credit: This is a joint effort. Over the last two weeks, @YQ-Wang and I have been working closely together on this and other issues, synchronizing almost every day.

Related issue number

Closes #889

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(
# Create a Kubernetes cluster
kind create cluster --image=kindest/node:v1.23.0

# Build docker image (path: kuberay/ray-operator/)
make docker-image

# Load the image into the Kind cluster
kind load docker-image controller:latest

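# Install the operator with the new image (the KubeRay Helm repository must be added first)
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm install kuberay-operator kuberay/kuberay-operator --version 0.4.0 --set image.repository=controller,image.tag=latest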

# Create a RayCluster with TLS
kubectl apply -f ray-cluster.tls.yaml
  • head Pod's log: [screenshot]
  • worker Pod's log: [screenshot]

Verify the TLS authentication (Step 4 in the doc)

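# Log in to the worker Pod (the selector assumes KubeRay's ray.io/node-type Pod label)
export WORKER_POD=$(kubectl get pods -l ray.io/node-type=worker -o jsonpath='{.items[0].metadata.name}')
kubectl exec -it ${WORKER_POD} -- bash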

# Since the head Pod has the certificate of $FQ_RAY_IP, the connection to the head Pod
# will be established successfully, and the exit code of the ray health-check command
# should be 0.
ray health-check --address $FQ_RAY_IP:6379
echo $? # 0

# Since the head Pod does not have the certificate of $RAY_IP, the connection will fail and
# an error message similar to the following will be displayed: "Peer name
# raycluster-tls-head-svc is not in peer certificate".
ray health-check --address $RAY_IP:6379
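Why does the second command fail? Judging by the error message, $RAY_IP holds only the short service name (`raycluster-tls-head-svc`), while the head's certificate lists the fully qualified name. gRPC's hostname check requires the dialed name to appear among the certificate's Subject Alternative Names. The sketch below mirrors only the exact-DNS-match part of that check, since real verification also handles wildcard and IP entries. `san_covers_host` is a hypothetical helper, and the example values assume a RayCluster named `raycluster-tls` in the `default` namespace.

```shell
# Sketch (hypothetical helper): succeed if a host name appears as an exact DNS
# entry in a SAN list formatted the way `openssl x509 -noout -text` prints it,
# e.g. "DNS:localhost, DNS:raycluster-tls-head-svc.default.svc.cluster.local".
# Wildcard SANs and IP entries are intentionally not handled.
san_covers_host() {
  host=$1
  san_list=$2
  # Put one SAN entry per line, trim leading spaces, then look for an exact match.
  printf '%s\n' "$san_list" | tr ',' '\n' | sed 's/^ *//' | grep -qxF "DNS:${host}"
}

sans="DNS:localhost, DNS:raycluster-tls-head-svc.default.svc.cluster.local, IP Address:127.0.0.1"
san_covers_host raycluster-tls-head-svc.default.svc.cluster.local "$sans" && echo "FQ_RAY_IP: covered"
san_covers_host raycluster-tls-head-svc "$sans" || echo "RAY_IP: not covered"
```

The SAN list of a real certificate can be printed with `openssl x509 -in "$RAY_TLS_SERVER_CERT" -noout -text`.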
  • Verification result: [screenshot]

@kevin85421 changed the title from "[WIP] TLS authentication" to "[Feature] TLS authentication" on Mar 27, 2023
@kevin85421 marked this pull request as ready for review on March 27, 2023 05:09
@YQ-Wang (Contributor) left a comment

nit: We can also mention that the gencert scripts can be prebaked into the docker container so the configMap approach is optional. Otherwise lgtm.

@architkulkarni (Contributor) left a comment

Looks good to me! Some minor comments/questions, but they shouldn't block the PR.

kevin85421 and others added 6 commits March 27, 2023 18:55
Co-authored-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
Signed-off-by: Kai-Hsun Chen <kaihsun@apache.org>
Co-authored-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>
Signed-off-by: Kai-Hsun Chen <kaihsun@apache.org>
@kevin85421 (Member, Author)

nit: We can also mention that the gencert scripts can be prebaked into the docker container so the configMap approach is optional. Otherwise lgtm.

Updated 376c9b5.

@kevin85421 merged commit ca6d792 into ray-project:master on Mar 27, 2023
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023

Development

Successfully merging this pull request may close these issues.

[Bug] RAY_USE_TLS in Kuberay

4 participants