Skip to content

Conversation

@rueian
Copy link
Collaborator

@rueian rueian commented Oct 1, 2023

Why are these changes needed?

After introducing the redis cleanup step as a finalizer of a fault-tolerant enable RayCluster in #1412, we still want to keep eyes on the effectiveness of the cleanup step by an E2E test.

Therefore, in this PR, I made the following changes to the clean_up phase of the E2E test for RayCluster:

  1. Instead of using kubectl delete -f first, we only delete the target RayCluster first to leave the redis alive.
  2. In the converge loop, we also check there are no more pods labeled with ray.io/node-type=redis-cleanup.
  3. After the converge loop, we use redis-cli DBSIZE to test the redis is empty or not. If not, we throw an Exception.
  4. Finally, we do the kubectl delete -f to remove redis related resources.

Also, #1459 will be fixed because of these changes.

Discussions

Currently, I have hard-coded the redis deployment name as redis and its secret name as redis-password-secret since it seems there is no better choice without more code changes. So, I would like to discuss which direction we can make improvements.

Hi @kevin85421, should we avoid these hard-coded names by querying the original yaml? Currently, the original yaml is not stored in the CREvent. Or should we create another pod with the same env during the E2E test to check if the redis has been cleaned?

Related issue number

#1412
#1422
#1459

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants