Search before asking
- I searched the issues and found no similar issues.
KubeRay Component
ray-operator, Others
What happened + What you expected to happen
In ray-service.sample.yaml, the serveConfigV2 defines deployments for MangoStand, OrangeStand, and PearStand.
- If two of these deployments are removed (e.g., keeping only MangoStand and FruitMarket), running kubectl get rayservice shows the state as WaitForServeDeploymentReady, and the service does not reach a ready state; see the inspection commands below.
- However, if only one deployment is removed (e.g., keeping two of the three), the service works as expected.
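To see where the service gets stuck, a few standard inspection commands can be used. This is only a sketch: it assumes the sample's metadata.name rayservice-sample and a KubeRay operator installed as a Deployment named kuberay-operator (the operator name depends on how it was installed).

# Show the RayService status, conditions, and recent events.
kubectl describe rayservice rayservice-sample

# Dump the full status block, including per-application Serve statuses.
kubectl get rayservice rayservice-sample -o yaml

# Check the operator logs for Serve-related reconcile errors (operator Deployment name is an assumption).
kubectl logs deployment/kuberay-operator | grep -i serve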
Reproduction script
1. Edit the ray-service.sample.yaml file to remove two of the three deployments in serveConfigV2 (e.g., keep only MangoStand).
2. Apply the updated file with kubectl apply -f ray-service.sample.yaml.
3. Run kubectl get rayservice and observe the status remaining in WaitForServeDeploymentReady (see the commands and the full edited manifest below).
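For convenience, the reproduction steps as shell commands. This is a sketch: it assumes the edited manifest is saved as ray-service.sample.yaml in the current directory and that the RayService keeps the sample name rayservice-sample.

# Apply the edited manifest.
kubectl apply -f ray-service.sample.yaml

# Watch the RayService status; it stays in WaitForServeDeploymentReady.
kubectl get rayservice rayservice-sample -w

The full edited manifest used for this reproduction follows.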
# Make sure to increase resource requests and limits before using this example in production.
# For examples with more realistic resource configuration, see
# ray-cluster.complete.large.yaml and
# ray-cluster.autoscaler.large.yaml.
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: rayservice-sample
spec:
  # serveConfigV2 takes a yaml multi-line scalar, which should be a Ray Serve multi-application config. See https://docs.ray.io/en/latest/serve/multi-app.html.
  serveConfigV2: |
    applications:
      - name: fruit_app
        import_path: fruit.deployment_graph
        route_prefix: /fruit
        runtime_env:
          working_dir: "https://github.com/ray-project/test_dag/archive/78b4a5da38796123d9f9ffff59bab2792a043e95.zip"
        deployments:
          - name: MangoStand
            num_replicas: 2
            max_replicas_per_node: 1
            user_config:
              price: 3
            ray_actor_options:
              num_cpus: 0.1
          - name: FruitMarket
            num_replicas: 1
            ray_actor_options:
              num_cpus: 0.1
      - name: math_app
        import_path: conditional_dag.serve_dag
        route_prefix: /calc
        runtime_env:
          working_dir: "https://github.com/ray-project/test_dag/archive/78b4a5da38796123d9f9ffff59bab2792a043e95.zip"
        deployments:
          - name: Adder
            num_replicas: 1
            user_config:
              increment: 3
            ray_actor_options:
              num_cpus: 0.1
          - name: Multiplier
            num_replicas: 1
            user_config:
              factor: 5
            ray_actor_options:
              num_cpus: 0.1
          - name: Router
            num_replicas: 1
  rayClusterConfig:
    rayVersion: '2.9.0' # should match the Ray version in the image of the containers
    ######################headGroupSpecs#################################
    # Ray head pod template.
    headGroupSpec:
      # The `rayStartParams` are used to configure the `ray start` command.
      # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
      # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
      rayStartParams:
        dashboard-host: '0.0.0.0'
      # Pod template
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.9.0
              resources:
                limits:
                  cpu: 2
                  memory: 4Gi
                requests:
                  cpu: 2
                  memory: 4Gi
              ports:
                - containerPort: 6379
                  name: gcs-server
                - containerPort: 8265 # Ray dashboard
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 8000
                  name: serve
    workerGroupSpecs:
      # the pod replicas in this group typed worker
      - replicas: 1
        minReplicas: 1
        maxReplicas: 5
        # logical group name, for this called small-group, also can be functional
        groupName: small-group
        # The `rayStartParams` are used to configure the `ray start` command.
        # See https://github.com/ray-project/kuberay/blob/master/docs/guidance/rayStartParams.md for the default settings of `rayStartParams` in KubeRay.
        # See https://docs.ray.io/en/latest/cluster/cli.html#ray-start for all available options in `rayStartParams`.
        rayStartParams: {}
        # Pod template
        template:
          spec:
            containers:
              - name: ray-worker # must consist of lower case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name', or '123-abc')
                image: rayproject/ray:2.9.0
                lifecycle:
                  preStop:
                    exec:
                      command: ["/bin/sh","-c","ray stop"]
                resources:
                  limits:
                    cpu: "1"
                    memory: "2Gi"
                  requests:
                    cpu: "500m"
                    memory: "2Gi"
Anything else
We need to investigate why removing two deployments causes the issue while removing only one deployment does not. It seems like there might be a threshold or configuration issue in serveConfigV2.
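One way to narrow this down is to apply the failing (two deployments removed) and working (one deployment removed) variants in turn and compare what the controller records for each Serve application. A sketch, assuming the per-application statuses are exposed under .status.activeServiceStatus (the exact field path may vary by KubeRay version):

# Capture the recorded application statuses for the failing variant.
kubectl get rayservice rayservice-sample -o jsonpath='{.status.activeServiceStatus}' > two-removed-status.txt

# Re-apply the variant with only one deployment removed, wait for reconciliation, then capture again.
kubectl get rayservice rayservice-sample -o jsonpath='{.status.activeServiceStatus}' > one-removed-status.txt

# Diff the two snapshots to see which deployments never become healthy.
diff two-removed-status.txt one-removed-status.txt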
Are you willing to submit a PR?
- Yes I am willing to submit a PR!