Skip to content

Conversation

@AndrewSirenko
Copy link
Contributor

@AndrewSirenko AndrewSirenko commented Jul 17, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Enhances observability of VolumeAttributesClass Feature by adding additional PVCollector metrics bound/unbound PVCs with labels for namespace, storage_class, volume_attributes_class.

With these additional labels, cluster operators can alarm on pv_collector_bound_pvc_count to detect PVCs that are not able to bind. With the additional StorageClass and VolumeAttributesClass name labels, cluster operators can more easily check if a storage object misconfiguration is the cause of these binding issues.

Which issue(s) this PR fixes:

Fixes #119302

Special notes for your reviewer:

Notes from July 22 SIG Implementation meeting:

@gnufied @jsafrane @AndrewSirenko @xing-yang @msau42 were all present and aligned on the following:

  • We will add new labels to existing PV collector metric. All stakeholders agreed to this. @gnufied @jsafrane especially wanted some kind of metric on PV Collector. The downstream impact of changing metric by adding labels is better than additional metric with confusing name, especially because metric is ALPHA.
  • Metrics should also be added to kuberentes/kube-state-metrics in the near future. We are not bound by code freeze 24th deadline for that PR. @msau42 especially thinks metric should exist in kube-state-metrics.

Does this PR introduce a user-facing change?

Added `storage_class` and `volume_attributes_class` labels to `pv_collector_bound_pvc_count` and `pv_collector_unbound_pvc_count` metrics.    

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/issues/3751

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 17, 2024
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jul 17, 2024
@k8s-ci-robot
Copy link
Contributor

Welcome @AndrewSirenko!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Copy link
Contributor

Hi @AndrewSirenko. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 17, 2024
@k8s-ci-robot k8s-ci-robot added area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/test sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 17, 2024
@xing-yang
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 17, 2024
@msau42
Copy link
Member

msau42 commented Jul 18, 2024

BTW I think if this is a draft then the presubmit tests won't run

unboundPVKey = "unbound_pv_count"
boundPVCKey = "bound_pvc_count"
unboundPVCKey = "unbound_pvc_count"
boundPVCWithVACKey = "bound_pvc_experimental_count" // TODO Q: Is there better name?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better name? bound_pvc_with_vac_count?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need "experimental" in the name? We'll have to remove it when it is no longer experimental and it becomes a new metric name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a new label to the existing metrics instead of new metrics?

Another question is why we need to add those metrics in k/k repo? Is it possible to add some changes for kube_persistentvolumeclaim_info in kuberentes/kube-state-metrics. I don't know the historical context.

Above metrics in kuberentes/kube-state-metrics repo is better than k/k.

/sig instrumentation

@mengjiao-liu Can you also take another look?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll have to remove it when it is no longer experimental and it becomes a new metric name.

Can we add a new label to the existing metrics instead of new metrics?

@jsafrane @gnufied and I were discussing add labels to existing metrics only once VAC is GA, to prevent new time-series being created every time VolumeAttributesClass feature gate is flipped (because adding/removing labels breaks downstream consumers of metric). pv_collector_bound_pvc_with_vac_count name works for me if we stick with metrics in k/k.

Another question is why we need to add those metrics in k/k repo? Is it possible to add some changes for kube_persistentvolumeclaim_info in kuberentes/kube-state-metrics.

I will look into kuberentes/kube-state-metrics metrics repo, thank you @carlory. Added metrics to k/k because it is where alpha KEP pointed, but this might be in kuberentes/kube-state-metrics instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes from July 22 SIG Implementation meeting on ^^:

@gnufied @jsafrane @AndrewSirenko @xing-yang @msau42 were all present and aligned on the following:

  1. We will add new label to existing PV collector metric. I will edit PR Today. All stakeholders agreed to this. @gnufied @jsafrane especially wanted some kind of metric on PV Collector.
  2. Metrics should also be added to kuberentes/kube-state-metrics. We are not bound by code freeze 24th deadline for that PR. @msau42 especially thinks metric should exist in kube-state-metrics.

CC @carlory

@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Jul 18, 2024
})

// Test for pv controller metrics, concretely: bound/unbound pv/pvc count.
ginkgo.Describe("PVController", func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those e2e tests only run on some special cloud providers.

e2eskipper.SkipUnlessProviderIs("gce", "gke", "aws")

@jsafrane @msau42 why the e2e tests depend on these provider? Is it possible to use kind cluster and hostpath-csi-driver instead?

And @AndrewSirenko which job can run the new e2e tests?

All the e2e tests in this file are skipped by pull-kubernetes-e2e-gce-csi-serial, pull-kubernetes-e2e-gce-storage-snapshot and pull-kubernetes-e2e-gce-storage-slow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably now. But for awhile IIRC - mainly it was because of dynamic provisioning and delayed binding etc.

@AndrewSirenko
Copy link
Contributor Author

/test pull-kubernetes-unit

@jsafrane
Copy link
Member

/test pull-kubernetes-e2e-gce
the cluster did not come up

@carlory
Copy link
Member

carlory commented Jul 23, 2024

LGTM for code changes, but new tests are still skipped.

image

Is it possible to merge it before code freeze. can we make the e2e tests improved before test freeze?

cc @msau42 @jsafrane

@gnufied
Copy link
Member

gnufied commented Jul 23, 2024

We can fix e2e with test freeze deadline. the code changes look fine to me.
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 23, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: e8aeb97bab6443bef5cb22a1ee17c0fc69d31940

@AndrewSirenko
Copy link
Contributor Author

/unhold

New e2e tests passed on local EC2 kOps cluster.

@msau42
Copy link
Member

msau42 commented Jul 23, 2024

/approve

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 23, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AndrewSirenko, msau42

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 23, 2024
@AndrewSirenko
Copy link
Contributor Author

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jul 23, 2024
@msau42
Copy link
Member

msau42 commented Jul 23, 2024

/milestone v1.31

@k8s-ci-robot k8s-ci-robot added this to the v1.31 milestone Jul 23, 2024
@k8s-ci-robot k8s-ci-robot merged commit 16c2ad5 into kubernetes:master Jul 23, 2024
akhilerm pushed a commit to akhilerm/kubernetes that referenced this pull request Aug 4, 2024
…tesClass Feature (kubernetes#126166)

* Add labels to PVCollector bound/unbound PVC metrics

* fixup! Add labels to PVCollector bound/unbound PVC metrics

* wip: Fix 'Unknown
    Decorator'

* fixup! Add labels to PVCollector bound/unbound PVC metrics
hungnguyen243 pushed a commit to hungnguyen243/kubernetes that referenced this pull request Aug 16, 2024
…tesClass Feature (kubernetes#126166)

* Add labels to PVCollector bound/unbound PVC metrics

* fixup! Add labels to PVCollector bound/unbound PVC metrics

* wip: Fix 'Unknown
    Decorator'

* fixup! Add labels to PVCollector bound/unbound PVC metrics
SoulPancake pushed a commit to SoulPancake/k8s that referenced this pull request Sep 11, 2024
…tesClass Feature (kubernetes#126166)

* Add labels to PVCollector bound/unbound PVC metrics

* fixup! Add labels to PVCollector bound/unbound PVC metrics

* wip: Fix 'Unknown
    Decorator'

* fixup! Add labels to PVCollector bound/unbound PVC metrics
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

1.30 - [kep-3751] Metrics Change

7 participants