-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added input plugin for VMware vSphere. #4141
Conversation
Fixed bug with missing newlines in Wavefront plugin.
…r console with error messages.
Reorganized code
changed Interval properties to durations
Can you add some example output to the README? https://github.com/influxdata/telegraf/pull/4141/files#r213154334 |
Added sample output. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried out the simulator, I think it will work well for integration testing for now. I did run into a couple issues:
I put in a bogus address https://localhost:1234/sdk
, but there was nothing printed. We should return an error either from gather and start if the connection cannot be established.
I fixed the address but still didn't see any error, simulator logged remote error: tls: bad certificate
. This should also display an error if possible., after setting insecure_skip_verify
everything is working great.
vsphere_host_disk,disk=/var/folders/rf/txwdm4pj409f70wnkdlp7sz80000gq/T/govcsim-DC0-LocalDS_0-367088371@folder-5,esxhostname=DC0_H0,host=host.example.com,moid=host-19,os=Mac,source=DC0_H0,vcenter=localhost:8989 write_average=2635i,read_average=30i 1535660339000000000 | ||
vsphere_host_mem,esxhostname=DC0_H0,host=host.example.com,moid=host-19,os=Mac,source=DC0_H0,vcenter=localhost:8989 usage_average=98.5 1535660339000000000 | ||
vsphere_host_net,esxhostname=DC0_H0,host=host.example.com,moid=host-19,os=Mac,source=DC0_H0,vcenter=localhost:8989 usage_average=1887i,bytesRx_average=662i,bytesTx_average=251i 1535660339000000000 | ||
vsphere_host_net,esxhostname=DC0_H0,host=host.example.com,interface=vmnic0,moid=host-19,os=Mac,source=DC0_H0,vcenter=localhost:8989 usage_average=1481i,bytesTx_average=899i,bytesRx_average=992i 1535660339000000000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two lines above, L335:336, is one of them meant to be an aggregate metric of all interfaces on the host? If so it would be ideal if we could, at least by default, collect only the total usage or the per interface metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what the [host|vm|cluster|datastore]_instances flag does. If you set it to false, you'll get the behavior you're asking for. By default, they're set to "true", except for datastores. This reflects some kind of "best practice" of what people usually want to collect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we should add support for collecting only the instance metrics? If you collect instance metrics then there is less reason to store the totals since it can be computed at query time. Perhaps the instance variables could be something like: `vm_metric_mode = "instance|aggregate|all".
One thing that is a moderate issue with metrics tagged like this is selecting only the instance without the totals. For example say we have these metrics:
vsphere_host_net,interface=vmnic0 bytesTx_average=1i
vsphere_host_net,interface=vmnic1 bytesTx_average=3i
vsphere_host_net bytesTx_average=2i
To average only the instance versions you need to know the secret method to remove series containing the total aggregation:
select mean(bytesTx_average) from vsphere_host_net where interface =~ /./
What we usually do is try to rename the field by appending the aggregation type, the new field name ensures the values do not get aggregated together:
vsphere_host_net,interface=vmnic0 bytesTx_average=1i
vsphere_host_net,interface=vmnic1 bytesTx_average=3i
vsphere_host_net bytesTx_average_average=2i
So that would be very ugly, maybe something like this would be better?:
vsphere_host_net,interface=vmnic0 bytesTx_average=1i
vsphere_host_net,interface=vmnic1 bytesTx_average=3i
vsphere_host_net total_bytesTx_average=2i
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point that it's hard to search to things that are missing a certain point tag. The problem is that it's tricky to reliably determine whether a metric in vCenter will ever show up with an instance metric, so you'd have to add the "total" or "average" to anything that's not a metric on an instance. You never know if that same metric is going to show up in the future with an instance. @puckpuck what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E.g. cpu.blablabla
doesn't have an instance metric in vSphere version N, so all metrics are reported as just cpu.blablabla
. In version N+1, the metric is given an instance and all of a sudden cpu.blablabla
becomes cpu.blablabla.total
. I really don't like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that would be a downside, you don't want to switch your dashboards because you change this minor setting. It seems to me that the best solution would be to collect only one type or the other, at least by default. Don't have to worry about excluding data from the query if you don't have it, and I think having both forms is redundant anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the best solution is to always emit the instance tag and send it as instance-total
for the aggregate metrics (i.e. metrics without an instance tag).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be similar to the way e.g. input.cpu does it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That will work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Commit coming shortly.
vsphere_host_cpu,cpu=1,esxhostname=DC0_H0,host=host.example.com,moid=host-19,os=Mac,source=DC0_H0,vcenter=localhost:8989 coreUtilization_average=25.92,utilization_average=18.72,used_summation=39790i,usage_average=40.42,idle_summation=69457i 1535660339000000000 | ||
vsphere_host_net,clustername=DC0_C0,esxhostname=DC0_C0_H0,host=host.example.com,interface=vmnic0,moid=host-30,os=Mac,source=DC0_C0_H0,vcenter=localhost:8989 usage_average=1246i,bytesTx_average=673i,bytesRx_average=781i 1535660339000000000 | ||
vsphere_host_cpu,clustername=DC0_C0,esxhostname=DC0_C0_H0,host=host.example.com,moid=host-30,os=Mac,source=DC0_C0_H0,vcenter=localhost:8989 coreUtilization_average=33.8,idle_summation=77121i,ready_summation=15857i,readiness_average=0.39,used_summation=29554i,costop_summation=2i,wait_summation=4338417i,utilization_average=17.87,latency_average=0.44,usage_average=28.78 1535660339000000000 | ||
vsphere_host_cpu,clustername=DC0_C0,cpu=0,esxhostname=DC0_C0_H0,host=host.example.com,moid=host-30,os=Mac,source=DC0_C0_H0,vcenter=localhost:8989 idle_summation=86610i,coreUtilization_average=34.36,utilization_average=19.03,used_summation=28766i,usage_average=23.72 1535660339000000000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you describe the relationship between cluster, esxhostname, source, moid, and vcenter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A vCenter has clusters, clusters have esxhosts (the physical machines) identified by the esxhostname
, hosts have vms. The source
field is the name of whatever object is collected (vm, host, cluster or datastore). A moid
is the unique internal id of any object in a vCenter. It is sometimes useful for uniquely identifying an object when e.g. it has been renamed.
Do you think we need this in the README? My assumption was that most people using thisa plugin would already be familiar with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(MOID stands for Managed Object ID and is a well-known property of vSphere resources)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we need this in the README?
No probably not, I agree that anyone monitoring will be more up to speed than I am.
## Sample output | ||
|
||
``` | ||
vsphere_vm_cpu,esxhostname=DC0_H0,guest=other,host=host.example.com,moid=vm-35,os=Mac,source=DC0_H0_VM0,vcenter=localhost:8989,vmname=DC0_H0_VM0 run_summation=2608i,ready_summation=129i,usage_average=5.01,used_summation=2134i,demand_average=326i 1535660299000000000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a couple collections, lets remove it down to just one interval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Will check on that.
"virtualDisk.totalReadLatency.average", | ||
"virtualDisk.totalWriteLatency.average", | ||
"virtualDisk.write.average", | ||
"virtualDisk.writeOIO.latest", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the metrics in the default config are not documented in METRICS.md
plugins/inputs/vsphere/README.MD
Outdated
# metrics_per_query = 256 | ||
|
||
## number of go routines to use for collection and discovery of objects and metrics | ||
# collect_concurrency = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see collect_concurrency being used, do we still need it?
plugins/inputs/vsphere/README.MD
Outdated
|
||
## number of go routines to use for collection and discovery of objects and metrics | ||
# collect_concurrency = 1 | ||
# discover_concurrency = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The resource that is scarce is connections, so I propose we rename this max_discover_connections
.
BTW, did you see they added an option for this to the transport in Go 1.11 (MaxConnsPerHost)? Should be pretty helpful, but we still target Go 1.9 so we can't use it yet.
|
||
For a detailed list of commonly available metrics, please refer to [METRICS.MD](METRICS.MD) | ||
|
||
## Tags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will probably merge the Tags section with Measurements & Fields when I merge the PR, unless you do it first :)
Check EXAMPLE_README.md for the latest style.
@prydin We are hoping to release 1.8.0-rc1 on Wednesday afternoon, can you ping me once any final tweaks are in? |
@prydin Please confirm that you aren't outputting any empty tag values. On the old version I'm using, I just found that a lot of my points were being blocked in wavefront due to an empty cluster tag, e.g. cluster="". |
I will commit my final changes today. A question for @danielnelson : My fork has a ton of commits. Do you squash them as you pull them in or you want me to do anything? |
@randallt Thanks for the note. I went through the code and fixed an issue related to what you mentioned. It was possible for the plugin to send empty cluster tags under certain circumstances. Not anymore. |
@danielnelson please hold off on this for a few hours. Need to check something. |
Sorry about that. We are back in business. |
Merged! 🍻 |
closes #1420
Added full support for vSphere monitoring. Supports vms, hosts, clusters and datastores. Allows filtering of metrics and resources. Written by Pontus Rydin, VMware and Pierre Tessier, VMware.
Required for all PRs: