
Live Control Plane Migration (CPM) or CPM with zero downtime #10686

@acumino

How to categorize this issue?

/area open-source
/kind enhancement

What would you like to be added:
Currently, a shoot control plane migration causes temporary downtime for the shoot cluster, because etcd must be backed up and deleted before it is restored in the new seed cluster. During this time, the API server and all other control plane components are also taken down. The workload within the shoot cluster keeps running, but it cannot be reconciled, scaled, or updated, so users experience downtime because the control plane is unavailable.

We would like to support live Control Plane Migration (CPM), so that migrations happen without downtime for the API server and therefore without downtime for users. The shoot cluster would remain fully operational, with the control plane continuously available to users.

We (@acumino, @shafeeqes, and @ary1992) conducted a POC on this, and it is feasible to implement. More details can be found here.

Why is this needed:
Prevent downtime during control plane migration, enabling support for more use cases and scenarios, such as 'seed draining/shoot evictions' or streamlined seed cluster deletions.

Labels

area/open-source: Open Source (community, enablement, contributions, conferences, CNCF, etc.) related
kind/enhancement: Enhancement, improvement, extension
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
