Conversation

@divyansh42
Member

@divyansh42 divyansh42 commented Jul 11, 2025

Changes

This PR adds two metrics:

  • Storage latency tracking, measuring the time from run
    completion to successful database storage. This does not track
    runs that fail to get stored in the database.
    Supported tags are: kind (pipelinerun/taskrun), namespace.

  • A runs_not_stored_count metric, which tracks the
    number of runs (PipelineRuns/TaskRuns) that are deleted before
    being stored in the database.

/kind feature

Submitter Checklist

These are the criteria that every PR should meet; please check them off as you review them:

  • Has Docs included if any changes are user facing
  • Has Tests included if any functionality added or changed
  • Tested your changes locally (if this is a code change)
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including functionality, content, code)
  • Has a kind label. You can add a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • Release notes block below has been updated with any user-facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings)
  • Release notes contain the string "action required" if the change requires additional action from users switching to the new release

Release Notes

feat: add storage latency and runs not stored count metrics for PipelineRuns and TaskRuns

@tekton-robot tekton-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Jul 11, 2025
@tekton-robot tekton-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jul 11, 2025
@divyansh42
Member Author

Output from the testing:

curl -s http://localhost:9090/metrics | grep "storage"
# HELP watcher_run_storage_latency_seconds time from run completion to successful storage
# TYPE watcher_run_storage_latency_seconds histogram
watcher_run_storage_latency_seconds_bucket{kind="pipelinerun",namespace="default",pipeline="sum-three-pipeline",task="",le="0.1"} 0
watcher_run_storage_latency_seconds_bucket{kind="pipelinerun",namespace="default",pipeline="sum-three-pipeline",task="",le="0.5"} 0
watcher_run_storage_latency_seconds_bucket{kind="pipelinerun",namespace="default",pipeline="sum-three-pipeline",task="",le="1"} 0
watcher_run_storage_latency_seconds_bucket{kind="pipelinerun",namespace="default",pipeline="sum-three-pipeline",task="",le="2"} 0
watcher_run_storage_latency_seconds_bucket{kind="pipelinerun",namespace="default",pipeline="sum-three-pipeline",task="",le="5"} 0
watcher_run_storage_latency_seconds_bucket{kind="pipelinerun",namespace="default",pipeline="sum-three-pipeline",task="",le="10"} 0
watcher_run_storage_latency_seconds_bucket{kind="pipelinerun",namespace="default",pipeline="sum-three-pipeline",task="",le="30"} 1
watcher_run_storage_latency_seconds_bucket{kind="pipelinerun",namespace="default",pipeline="sum-three-pipeline",task="",le="60"} 1
watcher_run_storage_latency_seconds_bucket{kind="pipelinerun",namespace="default",pipeline="sum-three-pipeline",task="",le="120"} 1
watcher_run_storage_latency_seconds_bucket{kind="pipelinerun",namespace="default",pipeline="sum-three-pipeline",task="",le="300"} 1
watcher_run_storage_latency_seconds_bucket{kind="pipelinerun",namespace="default",pipeline="sum-three-pipeline",task="",le="600"} 1
watcher_run_storage_latency_seconds_bucket{kind="pipelinerun",namespace="default",pipeline="sum-three-pipeline",task="",le="1800"} 1
watcher_run_storage_latency_seconds_bucket{kind="pipelinerun",namespace="default",pipeline="sum-three-pipeline",task="",le="+Inf"} 1
watcher_run_storage_latency_seconds_sum{kind="pipelinerun",namespace="default",pipeline="sum-three-pipeline",task=""} 15
watcher_run_storage_latency_seconds_count{kind="pipelinerun",namespace="default",pipeline="sum-three-pipeline",task=""} 1
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="first-add",le="0.1"} 0
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="first-add",le="0.5"} 0
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="first-add",le="1"} 0
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="first-add",le="2"} 0
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="first-add",le="5"} 1
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="first-add",le="10"} 1
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="first-add",le="30"} 1
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="first-add",le="60"} 1
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="first-add",le="120"} 1
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="first-add",le="300"} 1
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="first-add",le="600"} 1
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="first-add",le="1800"} 1
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="first-add",le="+Inf"} 1
watcher_run_storage_latency_seconds_sum{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="first-add"} 2
watcher_run_storage_latency_seconds_count{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="first-add"} 1
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="second-add",le="0.1"} 0
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="second-add",le="0.5"} 0
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="second-add",le="1"} 0
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="second-add",le="2"} 0
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="second-add",le="5"} 1
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="second-add",le="10"} 1
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="second-add",le="30"} 1
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="second-add",le="60"} 1
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="second-add",le="120"} 1
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="second-add",le="300"} 1
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="second-add",le="600"} 1
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="second-add",le="1800"} 1
watcher_run_storage_latency_seconds_bucket{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="second-add",le="+Inf"} 1
watcher_run_storage_latency_seconds_sum{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="second-add"} 3
watcher_run_storage_latency_seconds_count{kind="taskrun",namespace="default",pipeline="sum-three-pipeline",task="second-add"} 1
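The cumulative bucket semantics in the output above (each `le` bucket counts all observations at or below its bound, so a 15s observation lands in `le="30"` and every larger bucket) can be illustrated with a small stdlib-only Go sketch; the bucket bounds are taken from the output:

```go
package main

import "fmt"

// bounds mirrors the le buckets in the watcher_run_storage_latency_seconds output.
var bounds = []float64{0.1, 0.5, 1, 2, 5, 10, 30, 60, 120, 300, 600, 1800}

// observe increments every cumulative bucket whose upper bound is >= v,
// which is how Prometheus histograms expose their _bucket series.
// The implicit +Inf bucket is the final element of counts.
func observe(counts []uint64, v float64) {
	for i, b := range bounds {
		if v <= b {
			counts[i]++
		}
	}
	counts[len(bounds)]++
}

func main() {
	counts := make([]uint64, len(bounds)+1)
	observe(counts, 15) // the 15s PipelineRun latency from the output above
	for i, b := range bounds {
		fmt.Printf("le=%g %d\n", b, counts[i])
	}
	fmt.Printf("le=+Inf %d\n", counts[len(bounds)])
}
```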

@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-results-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/metrics/metrics.go Do not exist 61.8%
pkg/pipelinerunmetrics/metrics.go 69.1% 64.9% -4.2
pkg/taskrunmetrics/metrics.go 74.3% 71.1% -3.3
pkg/watcher/reconciler/dynamic/dynamic.go 50.7% 49.3% -1.4
pkg/watcher/reconciler/pipelinerun/reconciler.go 77.5% 73.7% -3.8
pkg/watcher/reconciler/taskrun/reconciler.go 66.1% 62.3% -3.8

@divyansh42 divyansh42 force-pushed the add-latency-metrics branch from 23fc41d to e22e87a Compare July 11, 2025 09:29
@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-results-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/metrics/metrics.go Do not exist 61.8%
pkg/pipelinerunmetrics/metrics.go 69.1% 64.9% -4.2
pkg/taskrunmetrics/metrics.go 74.3% 71.1% -3.3
pkg/watcher/reconciler/dynamic/dynamic.go 50.7% 49.3% -1.4
pkg/watcher/reconciler/pipelinerun/reconciler.go 77.5% 73.7% -3.8
pkg/watcher/reconciler/taskrun/reconciler.go 66.1% 62.3% -3.8

@divyansh42
Member Author

/retest

@divyansh42
Member Author

/cc @enarha @aThorp96 @khrm
Can you please take a look?

@aThorp96 I have rebased on top of your PR #1064

@tekton-robot tekton-robot requested a review from enarha July 11, 2025 10:05
@tekton-robot

@divyansh42: GitHub didn't allow me to request PR reviews from the following users: aThorp96.

Note that only tektoncd members and repo collaborators can review this PR, and authors cannot review their own PRs.

Details

In response to this:

/cc @enarha @aThorp96 @khrm
Can you please take a look?

@aThorp96 I have rebased on top of your PR #1064

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot requested a review from khrm July 11, 2025 10:05
@divyansh42 divyansh42 force-pushed the add-latency-metrics branch from e22e87a to 72d7f5d Compare July 11, 2025 10:09
@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-results-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/metrics/metrics.go Do not exist 61.8%
pkg/pipelinerunmetrics/metrics.go 69.1% 64.9% -4.2
pkg/taskrunmetrics/metrics.go 74.3% 71.1% -3.3
pkg/watcher/reconciler/dynamic/dynamic.go 50.7% 49.3% -1.4
pkg/watcher/reconciler/pipelinerun/reconciler.go 77.5% 73.7% -3.8
pkg/watcher/reconciler/taskrun/reconciler.go 66.1% 62.3% -3.8

@divyansh42 divyansh42 force-pushed the add-latency-metrics branch from 72d7f5d to 9c0d74c Compare July 11, 2025 10:44
@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-results-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/metrics/metrics.go Do not exist 61.8%
pkg/pipelinerunmetrics/metrics.go 69.1% 64.9% -4.2
pkg/taskrunmetrics/metrics.go 74.3% 71.1% -3.3
pkg/watcher/reconciler/dynamic/dynamic.go 50.7% 49.3% -1.4
pkg/watcher/reconciler/pipelinerun/reconciler.go 77.5% 73.7% -3.8
pkg/watcher/reconciler/taskrun/reconciler.go 66.1% 62.3% -3.8

@divyansh42 divyansh42 force-pushed the add-latency-metrics branch 2 times, most recently from e34c72d to fe521aa Compare July 11, 2025 11:44
@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-results-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/metrics/extractors.go Do not exist 93.1%
pkg/metrics/metrics.go Do not exist 53.4%
pkg/pipelinerunmetrics/metrics.go 69.1% 64.9% -4.2
pkg/taskrunmetrics/metrics.go 74.3% 71.1% -3.3
pkg/watcher/reconciler/dynamic/dynamic.go 50.7% 49.3% -1.4
pkg/watcher/reconciler/pipelinerun/reconciler.go 77.5% 73.7% -3.8
pkg/watcher/reconciler/taskrun/reconciler.go 66.1% 62.3% -3.8

@divyansh42 divyansh42 force-pushed the add-latency-metrics branch from fe521aa to 68eb24d Compare July 14, 2025 06:49
@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-results-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/metrics/extractors.go Do not exist 93.1%
pkg/metrics/metrics.go Do not exist 53.4%
pkg/pipelinerunmetrics/metrics.go 69.1% 64.9% -4.2
pkg/taskrunmetrics/metrics.go 74.3% 71.1% -3.3
pkg/watcher/reconciler/dynamic/dynamic.go 50.7% 49.3% -1.4
pkg/watcher/reconciler/pipelinerun/reconciler.go 77.5% 73.7% -3.8
pkg/watcher/reconciler/taskrun/reconciler.go 66.1% 62.3% -3.8

@khrm khrm requested a review from Copilot July 14, 2025 11:03

This comment was marked as outdated.

@divyansh42 divyansh42 force-pushed the add-latency-metrics branch from 68eb24d to 4423576 Compare July 14, 2025 17:21
@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-results-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/metrics/extractors.go Do not exist 100.0%
pkg/metrics/metrics.go Do not exist 53.4%
pkg/pipelinerunmetrics/metrics.go 69.1% 64.9% -4.2
pkg/taskrunmetrics/metrics.go 74.3% 71.1% -3.3
pkg/watcher/reconciler/dynamic/dynamic.go 50.7% 49.3% -1.4
pkg/watcher/reconciler/pipelinerun/reconciler.go 77.5% 73.7% -3.8
pkg/watcher/reconciler/taskrun/reconciler.go 66.1% 62.3% -3.8

@divyansh42 divyansh42 force-pushed the add-latency-metrics branch from 4423576 to 92b7a82 Compare July 14, 2025 17:31
@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-results-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/metrics/extractors.go Do not exist 100.0%
pkg/metrics/metrics.go Do not exist 53.4%
pkg/pipelinerunmetrics/metrics.go 69.1% 64.9% -4.2
pkg/taskrunmetrics/metrics.go 74.3% 71.1% -3.3
pkg/watcher/reconciler/dynamic/dynamic.go 50.7% 49.3% -1.4
pkg/watcher/reconciler/pipelinerun/reconciler.go 77.5% 73.7% -3.8
pkg/watcher/reconciler/taskrun/reconciler.go 66.1% 62.3% -3.8

@divyansh42 divyansh42 force-pushed the add-latency-metrics branch from d222da6 to 101d1a5 Compare July 29, 2025 08:04
@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-results-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/metrics/metrics.go Do not exist 62.5%
pkg/pipelinerunmetrics/metrics.go 69.1% 67.3% -1.8
pkg/taskrunmetrics/metrics.go 74.3% 73.0% -1.4
pkg/watcher/reconciler/dynamic/dynamic.go 50.7% 50.0% -0.7
pkg/watcher/reconciler/pipelinerun/reconciler.go 77.5% 70.0% -7.5
pkg/watcher/reconciler/taskrun/reconciler.go 66.1% 58.5% -7.6

@divyansh42 divyansh42 force-pushed the add-latency-metrics branch from 101d1a5 to 71470c6 Compare July 30, 2025 11:30
@tekton-robot

The following is the coverage report on the affected files.
Say /test pull-tekton-results-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/metrics/metrics.go Do not exist 62.5%
pkg/pipelinerunmetrics/metrics.go 69.1% 67.3% -1.8
pkg/taskrunmetrics/metrics.go 74.3% 73.0% -1.4
pkg/watcher/reconciler/dynamic/dynamic.go 50.7% 50.0% -0.7
pkg/watcher/reconciler/pipelinerun/reconciler.go 77.5% 70.0% -7.5
pkg/watcher/reconciler/taskrun/reconciler.go 66.1% 58.5% -7.6

@khrm
Contributor

khrm commented Oct 14, 2025

@divyansh42 This requires a rebase. Let's merge this.

@divyansh42 divyansh42 force-pushed the add-latency-metrics branch 2 times, most recently from 1c58af7 to 09d2cb1 Compare October 24, 2025 06:57
@khrm
Contributor

khrm commented Oct 27, 2025

It's still showing a conflict. @divyansh42

@tekton-robot tekton-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 27, 2025
aThorp96 and others added 2 commits October 28, 2025 15:59
Add storage latency tracking to measure time from run
completion to successful database storage. This doesn't track
if the runs fail to get stored in the database.
Supported tags are: kind (pipelinerun/taskrun), namespace, pipeline, task.
pipeline and task denote the names of the pipeline and task respectively.

Signed-off-by: divyansh42 <diagrawa@redhat.com>
@tekton-robot tekton-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 28, 2025
@divyansh42
Member Author

Need to fix the unit tests; something went wrong with the rebase

@enarha
Contributor

enarha commented Oct 29, 2025

/approve

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 29, 2025
Member

@aThorp96 aThorp96 left a comment


This suggestion may be out of scope for this PR but lmk what you think

If we could determine that we failed to delete finalizers on the last storage attempt then we could decrement the metric perhaps?

We can check if the finalizers failed to be updated by listing those events for the pipelinerun:

		events, err := r.kubeClientSet.CoreV1().Events(pr.GetNamespace()).List(ctx, metav1.ListOptions{
			FieldSelector: "reason=FinalizerUpdateFailed,involvedObject.uid=" + string(pr.GetUID()),
		})

However, there are still two elements missing for this strategy to work:

  1. We can't currently tell if that event was emitted by TektonResults. In order to do that we'd have to set an AgentName in the ControllerOptions when creating the controller (e.g. AgentName: "tekton-results-pipelinerun-controller"), and then reference the agent name in the field selector: reason=FinalizerUpdateFailed,reportingComponent=tekton-results-pipelinerun-controller,involvedObject.uid=" + string(pr.GetUID())
  2. Since the FinalizerUpdateFailed event can also be emitted when setting the finalizer, we need to ensure we're only checking for events which occur after the pipelinerun was completed.

Putting it all together I think this becomes a lot more resilient to retries with something like this:

// pipelinerun/controller.go

	impl := pipelinerunreconciler.NewImpl(ctx, c, func(_ *controller.Impl) controller.Options {
		return controller.Options{
			// This results pipelinerun reconciler shouldn't mutate the pipelinerun's status.
			SkipStatusUpdates: true,
			ConfigStore:       configStore,
			FinalizerName:     "results.tekton.dev/pipelinerun",
			AgentName:        "tekton-results-pipelinerun-controller",
		}
	})

// pipelinerun/reconciler.go
		events, _ := r.kubeClientSet.CoreV1().Events(pr.GetNamespace()).List(ctx, metav1.ListOptions{
			FieldSelector: "reason=FinalizerUpdateFailed,reportingComponent=tekton-results-pipelinerun-controller,involvedObject.uid=" + string(pr.GetUID()),
		})
		if slices.ContainsFunc(events.Items, func(event corev1.Event) bool { return event.LastTimestamp.After(pr.Status.CompletionTime.Time) }) {
			if err := metrics.CountRunNotStored(ctx, pr.Namespace, "PipelineRun"); err != nil {
				logging.FromContext(ctx).Errorf("error counting PipelineRun as not stored: %w", err)
			}
		}

| `watcher_taskrun_delete_duration_seconds_[bucket, sum, count]` | The duration taken to delete the TaskRun since completion | Histogram/LastValue(Gauge) | `*pipeline`=&lt;pipeline_name&gt; <br> `status`=&lt;status&gt; <br> `*task`=&lt;task_name&gt; <br> `*taskrun`=&lt;taskrun_name&gt;<br> `namespace`=&lt;pipelineruns-taskruns-namespace&gt; | experimental |
| `watcher_pipelinerun_delete_count` | The total count of deleted PipelineRuns | Counter | `status`=&lt;status&gt; <br> `namespace`=&lt;pipelinerun-namespace&gt; | experimental |
| `watcher_taskrun_delete_count` | The total count of deleted TaskRuns | Counter | `status`=&lt;status&gt; <br> `namespace`=&lt;pipelinerun-namespace&gt; | experimental |
| `watcher_run_storage_latency_seconds_[bucket, sum, count]` | The duration between run completion and successful storage | Histogram | `kind`=&lt;pipelinerun\|taskrun&gt; <br> `namespace`=&lt;run_namespace&gt; | experimental |
Member


Should we include a note about the expected behavior of watcher_run_storage_latency_seconds when the Run is stored before it is complete?

Member Author


Yes, we can do that.
I'll add a note.

Member Author


I have added a note.

Comment on lines +202 to +204
if err := metrics.CountRunNotStored(ctx, pr.Namespace, "PipelineRun"); err != nil {
logging.FromContext(ctx).Errorf("error counting PipelineRun as not stored: %w", err)
}
Member


I wish we had a way to only run this if the client reconciler succeeded :/

Maybe we could have a TEP for adding support for a PostReconcile hook on the PipelineRunReconciler interface

Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think we should add that.

@divyansh42
Member Author

This suggestion may be out of scope for this PR but lmk what you think

If we could determine that we failed to delete finalizers on the last storage attempt then we could decrement the metric perhaps?

We can check if the finalizers failed to be updated by listing those events for the pipelinerun:

		events, err := r.kubeClientSet.CoreV1().Events(pr.GetNamespace()).List(ctx, metav1.ListOptions{
			FieldSelector: "reason=FinalizerUpdateFailed,involvedObject.uid=" + string(pr.GetUID()),
		})

However, there are still two elements missing for this strategy to work:

  1. We can't currently tell if that event was emitted by TektonResults. In order to do that we'd have to set an AgentName in the ControllerOptions when creating the controller (e.g. AgentName: "tekton-results-pipelinerun-controller"), and then reference the agent name in the field selector: reason=FinalizerUpdateFailed,reportingComponent=tekton-results-pipelinerun-controller,involvedObject.uid=" + string(pr.GetUID())
  2. Since the FinalizerUpdateFailed event can also be emitted when setting the finalizer, we need to ensure we're only checking for events which occur after the pipelinerun was completed.

Putting it all together I think this becomes a lot more resilient to retries with something like this:

// pipelinerun/controller.go

	impl := pipelinerunreconciler.NewImpl(ctx, c, func(_ *controller.Impl) controller.Options {
		return controller.Options{
			// This results pipelinerun reconciler shouldn't mutate the pipelinerun's status.
			SkipStatusUpdates: true,
			ConfigStore:       configStore,
			FinalizerName:     "results.tekton.dev/pipelinerun",
			AgentName:        "tekton-results-pipelinerun-controller",
		}
	})

// pipelinerun/reconciler.go
		events, _ := r.kubeClientSet.CoreV1().Events(pr.GetNamespace()).List(ctx, metav1.ListOptions{
			FieldSelector: "reason=FinalizerUpdateFailed,reportingComponent=tekton-results-pipelinerun-controller,involvedObject.uid=" + string(pr.GetUID()),
		})
		if slices.ContainsFunc(events.Items, func(event corev1.Event) bool { return event.LastTimestamp.After(pr.Status.CompletionTime.Time) }) {
			if err := metrics.CountRunNotStored(ctx, pr.Namespace, "PipelineRun"); err != nil {
				logging.FromContext(ctx).Errorf("error counting PipelineRun as not stored: %w", err)
			}
		}

Thanks @aThorp96 for the detailed explanation. I think this approach makes sense.
Should we do this in another PR so that the reviews can be managed properly? We have already added a note regarding the deviation in the number.

@khrm @enarha Please let us know your thoughts on this.

Signed-off-by: divyansh42 <diagrawa@redhat.com>
Member

@aThorp96 aThorp96 left a comment


/lgtm

@tekton-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aThorp96, enarha

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@aThorp96
Member

aThorp96 commented Nov 3, 2025

Thanks @aThorp96 for the detailed explanation. I think this approach makes sense.
Should we do this in another PR so that reviews and all can be managed properly? We have already added a note regarding the deviation in the number.

@divyansh42 I am not opposed to that being handled in a separate PR

Contributor

@khrm khrm left a comment


/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 4, 2025
@tekton-robot tekton-robot merged commit 1112953 into tektoncd:main Nov 4, 2025
10 checks passed