Try getting sudo access before running tests #986

jefferbrecht · 2022-11-17T14:39:00Z

Description

sles-12 failed in the nightlies because of the sudo issue, for which we had previously added a retry policy (5 attempts, 5 seconds apart). This change moves the retry to a more general place that runs before any tests, and retries a generic sudo command instead: once that succeeds, later testing code should no longer have trouble running sudo commands. The retry count is also greatly increased, from 5 to 60.

Related issue

b/259122953

How has this been tested?

Tested sles-12-sp2-sap locally, though this is a flake to begin with so we'll have to assume it's working until it fails again.

Checklist:

Unit tests
- Unit tests do not apply.
- Unit tests have been added/modified and passed for this PR.
Integration tests
- Integration tests do not apply.
- Integration tests have been added/modified and passed for this PR.
Documentation
- This PR introduces no user visible changes.
- This PR introduces user visible changes and the corresponding documentation change has been made.
Minor version bump
- This PR introduces no new features.
- This PR introduces new features, and there is a separate PR to bump the minor version since the last release already.
- This PR bumps the version.

franciscovalentecastro · 2022-11-17T15:06:56Z

integration_test/gce/gce_testing.go

+		// TODO(b/259122953): wait until sudo is ready
+		backoffPolicy := backoff.WithContext(backoff.WithMaxRetries(backoff.NewConstantBackOff(slesStartupSudoDelay), slesStartupSudoMaxAttempts), ctx)
+		err := backoff.Retry(func() error {
+			_, err := RunRemotely(ctx, logger, vm, "", "sudo ls /root")


Is there a specific reason to choose sudo ls /root as the sudo command? I like it and I think it's simple enough. My only concern is that something weird could happen, e.g. the /root folder taking a while to be created even after is-system-running succeeds.

I wanted a built-in non-destructive command that's guaranteed to only be runnable with sudo, and sudo ls /root was just the first thing that came to mind. I think /root gets created pretty early during initialization, but even if there's a delay after is-system-running passes I think that's actually a good thing -- the goal is to block on as much startup initialization as possible.

I suppose I could change it to sudo stat /etc/sudoers.d/google_sudoers, which would more directly accomplish the goal of "make sure sudo works", but I'm not sure it matters much either way.

Ok! Understood, I'm fine with either of the commands you mentioned (sudo ls /root or sudo stat /etc/sudoers.d/google_sudoers). LGTM!

Try getting sudo access before running tests

32b1324

jefferbrecht marked this pull request as ready for review November 17, 2022 14:39

franciscovalentecastro reviewed Nov 17, 2022

franciscovalentecastro approved these changes Nov 17, 2022

Merge branch 'master' into jefferbrecht-sles12-sudo

143deef

jefferbrecht merged commit 0093562 into master Nov 17, 2022

jefferbrecht deleted the jefferbrecht-sles12-sudo branch November 17, 2022 16:38
