@rueian (Collaborator) commented Jan 18, 2025

Why are these changes needed?

Follows up on #2696 (comment) and resolves #2764.

  • Add e2e tests to check if Redis is empty at the end of each GCS FT test.
  • Fix Redis cleanup job to accept empty passwords.

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

Signed-off-by: Rueian <rueiancsie@gmail.com>
Signed-off-by: Rueian <rueiancsie@gmail.com>
@rueian changed the title from "Redis e2e cleanup check" to "[GCS FT] Redis e2e cleanup check" Jan 18, 2025
_, err := t.Client().Core().CoreV1().Pods(namespace).Apply(
t.Ctx(),
appsv1ac.Deployment("redis", namespace).
WithSpec(appsv1ac.DeploymentSpec().
Collaborator Author

There is no need to use a Deployment for Redis in tests; a plain Pod is enough.

if output = checkDBSize(); output == "0" {
return
}
time.Sleep(time.Second)
Collaborator Author

Wait for the cleanup job.

@rueian marked this pull request as ready for review January 18, 2025 20:55
)
assert.NoError(t.T(), err)

checkDBSize := func() string {
Member

Use ExecPodCmd instead?

Collaborator Author

👍


return strings.TrimSpace(stdout.String() + stderr.String())
}
return func() {
Member

Maybe only return checkDBSize and use Eventually in raycluster_gcsft_test.go?

Collaborator Author

👍

"redis_address = redis_address if '://' in redis_address else 'redis://' + redis_address; " +
"parsed = urlparse(redis_address); " +
- "sys.exit(1) if not cleanup_redis_storage(host=parsed.hostname, port=parsed.port, password=os.getenv('REDIS_PASSWORD', parsed.password), use_ssl=parsed.scheme=='rediss', storage_namespace=os.getenv('RAY_external_storage_namespace')) else None\"",
+ "sys.exit(1) if not cleanup_redis_storage(host=parsed.hostname, port=parsed.port, password=os.getenv('REDIS_PASSWORD', parsed.password or ''), use_ssl=parsed.scheme=='rediss', storage_namespace=os.getenv('RAY_external_storage_namespace')) else None\"",
Member

Would you mind explaining the change?

Collaborator Author

Without the change, when REDIS_PASSWORD is unset and the Redis address contains no password, parsed.password is None, so we pass None to the password parameter, resulting in this error:
image

namespace := test.NewTestNamespace()

- deployRedis(test, namespace.Name, tc.redisPassword)
+ defer deployRedis(test, namespace.Name, tc.redisPassword)()
Member

What will the stack trace look like if the check in defer fails? If it is not easy to read, maybe we can return the cleanup function and call it explicitly with Eventually at the end of the test logic?

Collaborator Author

This is the trace with the defer approach:

=== RUN   TestGcsFaultToleranceAnnotations/GCS_FT_without_redis_password
Modified Ray Image to: rayproject/ray:2.40.0-aarch64 for ARM chips
    raycluster_gcsft_test.go:211: Created RayCluster test-ns-72s65/raycluster-gcsft successfully
    raycluster_gcsft_test.go:213: Waiting for RayCluster test-ns-72s65/raycluster-gcsft to become ready
    raycluster_gcsft_test.go:217: Verifying environment variables on Head Pod
    support.go:253: failed cleanup redis expect 0 but got: 3
    test.go:110: Retrieving Pod Container test-ns-72s65/redis/redis logs
    test.go:98: Creating ephemeral output directory as KUBERAY_TEST_OUTPUT_DIR env variable is unset
    test.go:101: Output directory has been created at: /var/folders/sw/cyfhnvns2hv82r3fj1sgwsb00000gn/T/TestGcsFaultToleranceAnnotationsGCS_FT_without_redis_password2730163886/001
--- FAIL: TestGcsFaultToleranceAnnotations/GCS_FT_without_redis_password (48.50s)

This is the trace with ExecPodCmd + Eventually:

=== RUN   TestGcsFaultToleranceAnnotations/GCS_FT_without_redis_password
Modified Ray Image to: rayproject/ray:2.40.0-aarch64 for ARM chips
    raycluster_gcsft_test.go:214: Created RayCluster test-ns-s257l/raycluster-gcsft successfully
    raycluster_gcsft_test.go:216: Waiting for RayCluster test-ns-s257l/raycluster-gcsft to become ready
    raycluster_gcsft_test.go:220: Verifying environment variables on Head Pod
    core.go:88: Executing command: [redis-cli --no-auth-warning DBSIZE]
    core.go:101: Command stdout: 3
    core.go:102: Command stderr: 
    ...
    core.go:88: Executing command: [redis-cli --no-auth-warning DBSIZE]
    core.go:101: Command stdout: 3
    core.go:102: Command stderr: 
    core.go:88: Executing command: [redis-cli --no-auth-warning DBSIZE]
    core.go:101: Command stdout: 3
    core.go:102: Command stderr: 
    raycluster_gcsft_test.go:236: 
        Timed out after 30.072s.
        Expected
            <string>: 3
        to be equivalent to
            <string>: 0
    test.go:110: Retrieving Pod Container test-ns-s257l/redis/redis logs
    test.go:98: Creating ephemeral output directory as KUBERAY_TEST_OUTPUT_DIR env variable is unset
    test.go:101: Output directory has been created at: /var/folders/sw/cyfhnvns2hv82r3fj1sgwsb00000gn/T/TestGcsFaultToleranceAnnotationsGCS_FT_without_redis_password315262234/001
--- FAIL: TestGcsFaultToleranceAnnotations/GCS_FT_without_redis_password (51.40s)
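
For readers less familiar with the `defer deployRedis(...)()` form discussed in this thread, the sketch below shows the evaluation order it relies on: the helper call runs immediately, and only the function it returns is deferred. `setup` and `runTest` are hypothetical stand-ins, not the actual kuberay helpers.

```go
package main

import "fmt"

// setup is a hypothetical stand-in for a helper like deployRedis: it does its
// work right away and returns a function that performs the final check.
func setup(log *[]string) func() {
	*log = append(*log, "setup")
	return func() {
		*log = append(*log, "cleanup check")
	}
}

// runTest records the order of events when using `defer setup(...)()`.
func runTest() []string {
	var log []string
	func() {
		// setup(&log) is evaluated here, immediately; only the returned
		// function is deferred until the enclosing function returns.
		defer setup(&log)()
		log = append(log, "test body")
	}()
	return log
}

func main() {
	fmt.Println(runTest()) // prints "[setup test body cleanup check]"
}
```

Because the deferred check runs only as the test function returns, a failure there is reported from the helper's own file and line (as with `support.go:253` in the first trace), whereas an explicit `Eventually` call reports from the test file itself.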

@kevin85421 self-assigned this Jan 20, 2025
@rueian force-pushed the redis-e2e-cleanup-check branch from 8b02924 to b73254d January 20, 2025 02:12
Signed-off-by: Rueian <rueiancsie@gmail.com>
@rueian force-pushed the redis-e2e-cleanup-check branch from b73254d to 8f9a030 January 20, 2025 02:18
@kevin85421 merged commit a81ea81 into ray-project:master Jan 20, 2025
24 checks passed
win5923 pushed a commit to win5923/kuberay that referenced this pull request Jan 20, 2025
win5923 pushed a commit to win5923/kuberay that referenced this pull request Jan 20, 2025
Ygnas pushed a commit to Ygnas/kuberay that referenced this pull request Mar 20, 2025

Development

Successfully merging this pull request may close these issues.

[Feature] Check Redis cleanup in Golang e2e tests