Skip to content

Conversation

@qingling128
Copy link
Contributor

@qingling128 qingling128 commented Jun 23, 2021

Implements b/191888008.

@qingling128 qingling128 requested a review from davidbtucker June 23, 2021 23:41
@qingling128
Copy link
Contributor Author

Still fixing some tests on the Windows side, but the rough idea is ready for review.

@punya
Copy link

punya commented Jun 23, 2021

What's the user experience like when there's a syntax error (malformed YAML or something that doesn't pass validation)? Does it point to the specific file that has the error?

(Asking because we were discussing this very thing in OTel SIG today.)

@qingling128
Copy link
Contributor Author

qingling128 commented Jun 23, 2021

What's the user experience like when there's a syntax error (malformed YAML or something that doesn't pass validation)? Does it point to the specific file that has the error?

(Asking because we were discussing this very thing in OTel SIG today.)

Currently it writes out the built-in config and the merged config to a folder /etc/google-cloud-ops-agent/debugging (Detailed naming is TBD). The agent does not load configs from these two files. It's purely an output from the agent for debugging purpose. The errors can happen in 2 phases:

  1. When the user config gets merged with the built-in config. If things fails here, users get an error complaining the merge did not succeeded with the error (e.g. not a valid yaml; trying to merge a struct that does not match the original type).
  2. When the merged config is not valid, that is handled by the ops agent config validation logic. This validation will emit error messages that calls out which receiver/processor/exporter and which parameters are problematic: e.g. https://github.com/GoogleCloudPlatform/ops-agent/blob/master/confgenerator/testdata/invalid/linux/logging-receiver_files_type_unsupported_parameter_listen_host/golden_error. Users can trace back to the config by receiver/processor/exporter ID.

When in the future we need to support more config files (right now user config is all in one file), we'll need to add additional logic to map the receiver/processor/exporter ID to a certain file, and include that file path in the error message.

@qingling128 qingling128 marked this pull request as draft June 24, 2021 00:02
@qingling128 qingling128 changed the title Split conf to built-in conf and user conf and use Viper to merge them. Split conf to built-in conf and user conf and merge them. Jun 28, 2021
@qingling128 qingling128 force-pushed the lingshi-viper branch 2 times, most recently from 79864d6 to 106334a Compare June 28, 2021 22:07
@qingling128 qingling128 marked this pull request as ready for review June 28, 2021 22:08
Copy link
Contributor Author

@qingling128 qingling128 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ready for review now.

There are a few TODOs as seen in the code that I'm still cleaning up on the side, while this PR is being reviewed. But it's not a GA blocker, so if the PR is ready to ship before I get there, I will send a followup PR for them instead.

@qingling128
Copy link
Contributor Author

All unit and integration tests passed for this PR at commite1162fd563a2de1508e811050a5aed6436b2045b except a known / unrelated issue on CentOS 8: b/191888008#comment3

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are these not the same case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code to handle them are exactly the same. Just wanna document the subtle choices we made:

  1. Difference between an empty list and nil
  2. For a non-empty list, we only overrides the whole thing instead of trying to append to the existing list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the "difference". Appending to an empty list is the same as replacing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't merging this at the pipeline level just like the other object types produce exactly the same behavior in fewer lines of code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of cases like:

$ cat testdata/valid/windows/metrics-turn_off_iis/input.yaml

metrics:
  service:
    pipelines:
      default_pipeline:
        receivers: [hostmetrics,mssql]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how that is different. That looks like it would merge the same either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's at the pipeline level, users would have to specify the full default_pipeline (receivers, processors, exporters), right?

Copy link
Contributor Author

@qingling128 qingling128 Jun 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sample case: https://github.com/GoogleCloudPlatform/ops-agent/blob/1f661798d2c136c156ec3fd45c429269fc00cfc5/confgenerator/testdata/valid/linux/metrics-exclude_metrics_by_prefixes/input.yaml

metrics:
  processors:
    metrics_filter:
      type: exclude_metrics
      metrics_pattern:
      - agent.googleapis.com/processes/*
      - agent.googleapis.com/cpu/*
  service:
    pipelines:
      default_pipeline:
        processors: [metrics_filter]

The alternative is for users to always repeat receivers:

metrics:
  processors:
    metrics_filter:
      type: exclude_metrics
      metrics_pattern:
      - agent.googleapis.com/processes/*
      - agent.googleapis.com/cpu/*
  service:
    pipelines:
      default_pipeline:
        receivers: [hostmetrics]
        processors: [metrics_filter]

@qingling128 qingling128 force-pushed the lingshi-viper branch 8 times, most recently from b61fecd to 1f66179 Compare June 30, 2021 06:38
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did discuss moving these into testdata and making these files symlinks to the testdata ones, to remove the special case for update_golden.

* Move string to struct.
* Clean up mergeConfigs
* Add a test for overrides default pipeline's processors
* Add a test for deleted config file corner case.
* Rename no_config to all-user_config_file_deleted and remove special logic
* Make built-in-conf.yaml and merged-conf.yaml file locations non configurable.
* Remove defaultConfig param from platformConfig and combine default-config.yaml and windows-default-config.yaml
* Rename WriteConfigFile to writeConfigFile
Copy link
Contributor

@igorpeshansky igorpeshansky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :shipit:

@qingling128 qingling128 merged commit e3a5f3a into master Jun 30, 2021
@qingling128 qingling128 deleted the lingshi-viper branch June 30, 2021 20:53
if err != nil {
return err
}
uc, err := ParseUnifiedConfig(data, hostInfo.OS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this change disabled config validation on Linux. Fixed in #133.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants