Try getting sudo access before running tests #986
// TODO(b/259122953): wait until sudo is ready
backoffPolicy := backoff.WithContext(backoff.WithMaxRetries(backoff.NewConstantBackOff(slesStartupSudoDelay), slesStartupSudoMaxAttempts), ctx)
err := backoff.Retry(func() error {
	_, err := RunRemotely(ctx, logger, vm, "", "sudo ls /root")
Is there a specific reason to choose sudo ls /root as the sudo command? I like it and I think it's simple enough. My only concern is that something weird could happen, e.g. the /root folder takes a while to be created even after is-system-running succeeds.
I wanted a built-in non-destructive command that's guaranteed to only be runnable with sudo, and sudo ls /root was just the first thing that came to mind. I think /root gets created pretty early during initialization, but even if there's a delay after is-system-running passes I think that's actually a good thing -- the goal is to block on as much startup initialization as possible.
I suppose I could change it to sudo stat /etc/sudoers.d/google_sudoers, which would more directly accomplish the goal of "make sure sudo works", but I'm not sure it matters much either way.
Ok! Understood, I'm fine with either of the commands you mentioned (sudo ls /root or sudo stat /etc/sudoers.d/google_sudoers). LGTM!
Description
sles-12 failed in the nightlies because of the sudo issue, for which we had previously added a retry policy (5 attempts, 5 seconds apart). This change moves the retry to a more general place, before any tests run, and probes with a general sudo command instead: once that command succeeds, later testing code should no longer have trouble running sudo commands. The retry count is also greatly increased, from 5 to 60.
Related issue
b/259122953
How has this been tested?
Tested sles-12-sp2-sap locally, though this is a flake to begin with so we'll have to assume it's working until it fails again.