Integration test for Prometheus receiver of metric types histogram and summary #909
Conversation
Force-pushed from bd16a1f to f0948e4.
Overall this looks pretty good to me, but I had a few questions. I'll do one more quick scan afterwards.
integration_test/ops_agent_test.go
Why was this retry needed again? I don't remember anymore - maybe we should comment on it here for future viewers of this test.
The retry was needed because the Python http server started with nohup would randomly fail. The Python http server also differs across distros and Python versions, which made this test unstable. I have changed it to use a Go http server (since we can already set up a Go env in the VM) with systemd (to make sure the server starts, and logs to journalctl_output.txt for better debugging). From my testing, this looks much more reliable than the previous approach.
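For illustration, a minimal sketch of that kind of file server, assuming a placeholder directory and port rather than the PR's actual values:

```go
package main

import (
	"log"
	"net/http"
)

func main() {
	// Serve every file in the directory, so the agent (or curl) can fetch
	// e.g. http://localhost:8000/data for the Prometheus text payload.
	// The directory and port are placeholders, not the PR's real values.
	http.Handle("/", http.FileServer(http.Dir("/opt/prometheus-data")))
	log.Fatal(http.ListenAndServe(":8000", nil))
}
```

Running a static binary like this under systemd (instead of nohup) is what provides the reliable startup and the journalctl logging mentioned above.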
integration_test/ops_agent_test.go
Is this file really JSON? I thought it was a Prometheus metrics output file no?
Good catch! Renamed to just data.
integration_test/ops_agent_test.go
I don't completely understand why these are all 0 values. They all do have some values, right? (In your example above, they have a total of 10 points.) Shouldn't this have the first values? I see that in step 2 they get the delta between the second and the first set of values, but in this case the delta between the first value and 0 is dropped?
If this is expected, please let me know why. I may be missing something
Synced offline: the cumulative points (histogram, and the count & sum of summary) get normalized when exporting to GCM. The first point of a cumulative time series is not sent; instead, it is subtracted from all following points. That's why, even though the first step's points have values, the agent scrapes them multiple times and they are all received as 0 values. Added comments pointing to the normalization of the cumulative points.
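To make that concrete, here is a hedged Go sketch of the normalization; the type and function names are illustrative, not the exporter's actual API:

```go
package main

import "fmt"

// cumulativePoint is an illustrative stand-in for one scraped cumulative
// value (a counter, a histogram bucket count, or a summary's sum/count).
type cumulativePoint struct {
	value float64
}

// normalize models the behavior described above: the first point of a
// cumulative series is not sent; it is subtracted from all later points.
func normalize(series []cumulativePoint) []cumulativePoint {
	if len(series) == 0 {
		return nil
	}
	start := series[0].value
	out := make([]cumulativePoint, 0, len(series)-1)
	for _, p := range series[1:] {
		out = append(out, cumulativePoint{value: p.value - start})
	}
	return out
}

func main() {
	// Step one exposes 10; repeated scrapes see 10, 10, 10, so every
	// exported point is 0 until the underlying values change.
	fmt.Println(normalize([]cumulativePoint{{10}, {10}, {10}})) // [{0} {0}]
}
```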
This makes sense to me. I think we should at least manually verify that this is actually what PromQL users expect once they migrate off of Prometheus. We should be doing what GMP does on GKE, or what Prometheus does by default.
integration_test/ops_agent_test.go
Is this always zero? Should it not be set to what was found before step 4?
Also looks like we are already testing for Step 2 summary points below. Is this needed?
Together with WaitForMetricWithCondition, this check has been removed and replaced with a sleep(3m).
integration_test/ops_agent_test.go
Same question for these values as for the histogram. It feels like the first points are basically just dropped?
Summary's count and sum are cumulative and get normalized as well.
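For reference, a generic summary in the Prometheus text format (illustrative metric name and values, not the PR's actual data file); the quantile series are point-in-time values, while _sum and _count are the cumulative parts subject to the normalization above:

```
# TYPE request_duration_seconds summary
request_duration_seconds{quantile="0.5"} 0.05
request_duration_seconds{quantile="0.9"} 0.12
request_duration_seconds_sum 0.8
request_duration_seconds_count 10
```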
integration_test/ops_agent_test.go
[super ultra nit]: This is a pretty massive test. I wonder if we can pull some of this into a separate method for the setup and then have two tests: one for histogram and one for summary. That way, if one of them breaks, debugging and fixing may be easier.
Good suggestion! I have broken this into two separate tests, and it looks cleaner this way.
integration_test/ops_agent_test.go
Was it not possible to use the extraFilters param instead of adding a condition after we wait for the metric?
Asking to see if we can push the query for this condition to the server and not do it client side.
WaitForMetricWithCondition was used to make sure step-two points had been received before checking their values. I have since found that those step-two points normally get received pretty quickly (with the agent's 10s scrape interval, it usually takes another couple of seconds before they become available), so I have replaced this with a sleep(3m), and we don't have to keep retrying here if something is wrong with the step-two points.
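A sketch of what that replacement amounts to; the function name is hypothetical, and only the 3-minute wait and the 10s scrape interval come from this thread:

```go
package integration

import "time"

// waitForStepTwoPoints illustrates the fixed wait described above.
func waitForStepTwoPoints() {
	// The agent scrapes every 10s and step-two points usually land a few
	// seconds later, so a fixed 3-minute wait replaces the conditional
	// retry (WaitForMetricWithCondition) with a large safety margin.
	time.Sleep(3 * time.Minute)
}
```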
Force-pushed from e84f738 to 4a0d0ad.
Force-pushed from 3df2c20 to 527bd34.
Force-pushed from 527bd34 to 111f12f.
The new http server is very cool. Can we use it for TestPrometheusMetricsWithJSONExporter too?
Overall this looks pretty solid; I'm just not sure yet whether we want the normalization that the exporter enables by default, or whether we want to toggle that off.
Doesn't the test curl /data?
The http server makes all files in a folder available, in case we need to host/query multiple files. Currently the test curls the {dir}/data file to get the metrics.
integration_test/ops_agent_test.go
This is much nicer!
Will use this same method for TestPrometheusMetricsWithJSONExporter as well!
integration_test/ops_agent_test.go
Other than step_one and step_two, everything here looks the same as in the summary test. Maybe we can pull this setup out into a helper method and reuse it in both places. That will also simplify changes made to the http server and setup later on.
Done! Now the top-level test functions only specify step one and step two's files & checks against expected values, and the helper function has all the common steps.
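The shape of that refactor might look like the following sketch; all names and signatures are assumptions for illustration, not the PR's actual helpers:

```go
package integration_test

import "testing"

// runPrometheusStepsTest is a hypothetical helper holding the common steps:
// start the file server with the step-one file at /data, let the agent
// scrape it, swap in the step-two file, wait, then run per-test checks.
func runPrometheusStepsTest(t *testing.T, stepOneFile, stepTwoFile string,
	checkStepTwo func(t *testing.T)) {
	t.Helper()
	// ... common setup and file swapping elided in this sketch ...
	checkStepTwo(t)
}

func TestPrometheusHistogram(t *testing.T) {
	runPrometheusStepsTest(t, "histogram_step_one", "histogram_step_two",
		func(t *testing.T) {
			// assert histogram deltas against the expected values
		})
}

func TestPrometheusSummary(t *testing.T) {
	runPrometheusStepsTest(t, "summary_step_one", "summary_step_two",
		func(t *testing.T) {
			// assert summary count/sum deltas against the expected values
		})
}
```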
Commits:
…m and summary
Refactor the test to show steps
Add comments to the steps and expected values
Fix typo
Fix typo
Force-pushed from 5f21d30 to 22eaa73.
Description
Add an integration test case for the Prometheus receiver, covering the histogram and summary metric types.
Related issue
b/253060243
How has this been tested?
Ran the test manually with unexpected values to make sure the test catches them.
Ran the test in the PR.