Added input plugin for VMware vSphere. #4141

prydin · 2018-05-12T00:03:39Z

Added full support for vSphere monitoring. Supports VMs, hosts, clusters, and datastores. Allows filtering of metrics and resources. Written by Pontus Rydin, VMware and Pierre Tessier, VMware.

Required for all PRs:

[ X ] Signed CLA.
[ X ] Associated README.md updated.
[ X ] Has appropriate unit tests.

danielnelson · 2018-08-30T19:15:05Z

Can you add some example output to the README? https://github.com/influxdata/telegraf/pull/4141/files#r213154334

prydin · 2018-08-30T20:29:38Z

Added sample output.

danielnelson

Tried out the simulator, I think it will work well for integration testing for now. I did run into a couple issues:

I put in a bogus address, https://localhost:1234/sdk, but nothing was printed. We should return an error from either Gather or Start if the connection cannot be established.

I fixed the address but still didn't see any error; the simulator logged remote error: tls: bad certificate. This should also display an error if possible. After setting insecure_skip_verify, everything is working great.

plugins/inputs/vsphere/client.go

danielnelson · 2018-09-06T00:01:24Z

plugins/inputs/vsphere/README.MD

+vsphere_host_disk,disk=/var/folders/rf/txwdm4pj409f70wnkdlp7sz80000gq/T/govcsim-DC0-LocalDS_0-367088371@folder-5,esxhostname=DC0_H0,host=host.example.com,moid=host-19,os=Mac,source=DC0_H0,vcenter=localhost:8989 write_average=2635i,read_average=30i 1535660339000000000
+vsphere_host_mem,esxhostname=DC0_H0,host=host.example.com,moid=host-19,os=Mac,source=DC0_H0,vcenter=localhost:8989 usage_average=98.5 1535660339000000000
+vsphere_host_net,esxhostname=DC0_H0,host=host.example.com,moid=host-19,os=Mac,source=DC0_H0,vcenter=localhost:8989 usage_average=1887i,bytesRx_average=662i,bytesTx_average=251i 1535660339000000000
+vsphere_host_net,esxhostname=DC0_H0,host=host.example.com,interface=vmnic0,moid=host-19,os=Mac,source=DC0_H0,vcenter=localhost:8989 usage_average=1481i,bytesTx_average=899i,bytesRx_average=992i 1535660339000000000


Of the two lines above (L335:336), is one of them meant to be an aggregate metric of all interfaces on the host? If so, it would be ideal if we could, at least by default, collect only the total usage or only the per-interface metrics.

That's what the [host|vm|cluster|datastore]_instances flag does. If you set it to false, you'll get the behavior you're asking for. By default, they're set to "true", except for datastores. This reflects some kind of "best practice" of what people usually want to collect.

Do you think we should add support for collecting only the instance metrics? If you collect instance metrics then there is less reason to store the totals, since they can be computed at query time. Perhaps the instance variables could be something like: `vm_metric_mode = "instance|aggregate|all"`.

One thing that is a moderate issue with metrics tagged like this is selecting only the instance without the totals. For example say we have these metrics:

vsphere_host_net,interface=vmnic0 bytesTx_average=1i
vsphere_host_net,interface=vmnic1 bytesTx_average=3i
vsphere_host_net bytesTx_average=2i

To average only the instance versions you need to know the secret method to remove series containing the total aggregation:

select mean(bytesTx_average) from vsphere_host_net where interface =~ /./

What we usually do is rename the field by appending the aggregation type; the new field name ensures the values do not get aggregated together:

vsphere_host_net,interface=vmnic0 bytesTx_average=1i
vsphere_host_net,interface=vmnic1 bytesTx_average=3i
vsphere_host_net bytesTx_average_average=2i

That would be very ugly, though; maybe something like this would be better:

vsphere_host_net,interface=vmnic0 bytesTx_average=1i
vsphere_host_net,interface=vmnic1 bytesTx_average=3i
vsphere_host_net total_bytesTx_average=2i

I see your point that it's hard to search for things that are missing a certain point tag. The problem is that it's tricky to reliably determine whether a metric in vCenter will ever show up with an instance, so you'd have to add the "total" or "average" to anything that's not a metric on an instance. You never know if that same metric is going to show up in the future with an instance. @puckpuck what do you think?

E.g. cpu.blablabla doesn't have an instance metric in vSphere version N, so all metrics are reported as just cpu.blablabla. In version N+1, the metric is given an instance and all of a sudden cpu.blablabla becomes cpu.blablabla.total. I really don't like that.

Yeah, that would be a downside, you don't want to switch your dashboards because you change this minor setting. It seems to me that the best solution would be to collect only one type or the other, at least by default. Don't have to worry about excluding data from the query if you don't have it, and I think having both forms is redundant anyway.

I think the best solution is to always emit the instance tag and send it as instance-total for the aggregate metrics (i.e. metrics without an instance tag).

That would be similar to the way e.g. input.cpu does it.

That will work.

Ok. Commit coming shortly.
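The agreed approach can be sketched as below. This is an illustration only: `withInstanceTag` and the `instanceTotal` constant are hypothetical names, and the actual commit may differ. The idea is that every point carries the instance tag, and aggregate points (those with no instance) get the value `instance-total`:

```go
package main

// instanceTotal is the tag value proposed in the thread above for aggregate
// metrics, i.e. metrics that vCenter reports without an instance.
const instanceTotal = "instance-total"

// withInstanceTag is a hypothetical helper. It returns a copy of tags with
// tagName always set: to the instance name when there is one, and to
// "instance-total" when the metric is an aggregate.
func withInstanceTag(tags map[string]string, tagName, instance string) map[string]string {
	out := make(map[string]string, len(tags)+1)
	for k, v := range tags {
		out[k] = v
	}
	if instance == "" {
		instance = instanceTotal
	}
	out[tagName] = instance
	return out
}
```

With the tag always present, the per-interface series can be selected with an ordinary inequality such as `where interface != 'instance-total'` rather than the regex match shown earlier, and field names stay the same if a metric later gains instances.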

danielnelson · 2018-09-06T00:04:01Z

plugins/inputs/vsphere/README.MD

+vsphere_host_cpu,cpu=1,esxhostname=DC0_H0,host=host.example.com,moid=host-19,os=Mac,source=DC0_H0,vcenter=localhost:8989 coreUtilization_average=25.92,utilization_average=18.72,used_summation=39790i,usage_average=40.42,idle_summation=69457i 1535660339000000000
+vsphere_host_net,clustername=DC0_C0,esxhostname=DC0_C0_H0,host=host.example.com,interface=vmnic0,moid=host-30,os=Mac,source=DC0_C0_H0,vcenter=localhost:8989 usage_average=1246i,bytesTx_average=673i,bytesRx_average=781i 1535660339000000000
+vsphere_host_cpu,clustername=DC0_C0,esxhostname=DC0_C0_H0,host=host.example.com,moid=host-30,os=Mac,source=DC0_C0_H0,vcenter=localhost:8989 coreUtilization_average=33.8,idle_summation=77121i,ready_summation=15857i,readiness_average=0.39,used_summation=29554i,costop_summation=2i,wait_summation=4338417i,utilization_average=17.87,latency_average=0.44,usage_average=28.78 1535660339000000000
+vsphere_host_cpu,clustername=DC0_C0,cpu=0,esxhostname=DC0_C0_H0,host=host.example.com,moid=host-30,os=Mac,source=DC0_C0_H0,vcenter=localhost:8989 idle_summation=86610i,coreUtilization_average=34.36,utilization_average=19.03,used_summation=28766i,usage_average=23.72 1535660339000000000


Can you describe the relationship between cluster, esxhostname, source, moid, and vcenter?

A vCenter has clusters, clusters have esxhosts (the physical machines) identified by the esxhostname, hosts have vms. The source field is the name of whatever object is collected (vm, host, cluster or datastore). A moid is the unique internal id of any object in a vCenter. It is sometimes useful for uniquely identifying an object when e.g. it has been renamed.

Do you think we need this in the README? My assumption was that most people using this plugin would already be familiar with this.

(MOID stands for Managed Object ID and is a well-known property of vSphere resources)

Do you think we need this in the README?

No probably not, I agree that anyone monitoring will be more up to speed than I am.
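For readers less familiar with vSphere, the relationship described above can be sketched as follows. The `object` type and `tagsFor` function are hypothetical and heavily simplified; they only illustrate how the tags in the sample output relate to the inventory hierarchy (vCenter, cluster, host, VM), not how the plugin is implemented:

```go
package main

// object is a hypothetical, simplified view of a vSphere inventory object.
type object struct {
	Name   string  // display name; reported as the "source" tag
	MOID   string  // Managed Object ID, e.g. "host-19"; stable across renames
	Kind   string  // "vm", "host", "cluster" or "datastore"
	Parent *object // VM -> host -> cluster; nil at the top of the chain
}

// tagsFor builds the identifying tags for one collected object by walking up
// its parents. A tag is only added when the corresponding ancestor exists, so
// a standalone host gets no clustername tag at all.
func tagsFor(vcenter string, o *object) map[string]string {
	tags := map[string]string{
		"vcenter": vcenter,
		"source":  o.Name,
		"moid":    o.MOID,
	}
	for p := o; p != nil; p = p.Parent {
		switch p.Kind {
		case "vm":
			tags["vmname"] = p.Name
		case "host":
			tags["esxhostname"] = p.Name
		case "cluster":
			tags["clustername"] = p.Name
		}
	}
	return tags
}
```

For a host `DC0_C0_H0` with MOID `host-30` inside cluster `DC0_C0`, this yields the same `clustername`, `esxhostname`, `source`, `moid` and `vcenter` tags as the sample lines quoted in this thread.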

danielnelson · 2018-09-06T00:11:33Z

plugins/inputs/vsphere/README.MD

+## Sample output
+
+```
+vsphere_vm_cpu,esxhostname=DC0_H0,guest=other,host=host.example.com,moid=vm-35,os=Mac,source=DC0_H0_VM0,vcenter=localhost:8989,vmname=DC0_H0_VM0 run_summation=2608i,ready_summation=129i,usage_average=5.01,used_summation=2134i,demand_average=326i 1535660299000000000


I think this is a couple of collections; let's trim it down to just one interval.

OK. Will check on that.

danielnelson · 2018-09-06T00:13:33Z

plugins/inputs/vsphere/README.MD

+    "virtualDisk.totalReadLatency.average",
+    "virtualDisk.totalWriteLatency.average",
+    "virtualDisk.write.average",
+    "virtualDisk.writeOIO.latest",


Some of the metrics in the default config are not documented in METRICS.md

plugins/inputs/vsphere/README.MD

danielnelson · 2018-09-06T00:19:02Z

plugins/inputs/vsphere/README.MD

+  # metrics_per_query = 256
+
+  ## number of go routines to use for collection and discovery of objects and metrics
+  # collect_concurrency = 1


I don't see collect_concurrency being used, do we still need it?

danielnelson · 2018-09-06T00:19:44Z

plugins/inputs/vsphere/README.MD

+
+  ## number of go routines to use for collection and discovery of objects and metrics
+  # collect_concurrency = 1
+  # discover_concurrency = 1


The resource that is scarce is connections, so I propose we rename this max_discover_connections.

BTW, did you see they added an option for this to the transport in Go 1.11 (MaxConnsPerHost)? Should be pretty helpful, but we still target Go 1.9 so we can't use it yet.

danielnelson · 2018-09-06T00:23:33Z

plugins/inputs/vsphere/README.MD

+
+For a detailed list of commonly available metrics, please refer to [METRICS.MD](METRICS.MD)
+
+## Tags


I will probably merge the Tags section with Measurements & Fields when I merge the PR, unless you do it first :)

Check EXAMPLE_README.md for the latest style.

danielnelson · 2018-09-11T00:49:59Z

@prydin We are hoping to release 1.8.0-rc1 on Wednesday afternoon, can you ping me once any final tweaks are in?

randallt · 2018-09-11T13:27:26Z

@prydin Please confirm that you aren't outputting any empty tag values. On the old version I'm using, I just found that a lot of my points were being blocked in wavefront due to an empty cluster tag, e.g. cluster="".

prydin · 2018-09-11T16:04:53Z

I will commit my final changes today. A question for @danielnelson: my fork has a ton of commits. Do you squash them as you pull them in, or do you want me to do anything?

prydin · 2018-09-11T17:44:43Z

@randallt Thanks for the note. I went through the code and fixed an issue related to what you mentioned. It was possible for the plugin to send empty cluster tags under certain circumstances. Not anymore.
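One defensive way to guarantee this regardless of where a tag comes from is to filter the tag set just before a point is emitted. The helper below is a hypothetical sketch, not the fix that was committed:

```go
package main

// dropEmptyTags is a hypothetical helper that returns a copy of tags without
// any entries whose value is the empty string, since some outputs reject
// points carrying tags such as cluster="".
func dropEmptyTags(tags map[string]string) map[string]string {
	out := make(map[string]string, len(tags))
	for k, v := range tags {
		if v != "" {
			out[k] = v
		}
	}
	return out
}
```

Omitting the tag entirely, rather than substituting a placeholder, keeps standalone hosts distinguishable from hosts in a cluster without inventing a cluster name.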

prydin · 2018-09-11T20:37:56Z

@danielnelson please hold off on this for a few hours. Need to check something.

prydin · 2018-09-11T21:42:33Z

Sorry about that. We are back in business.

danielnelson · 2018-09-11T21:57:43Z

Merged! 🍻

prydin and others added 30 commits September 30, 2017 16:07

Added vsphere input plugin.

2c8ffa9

First "working" version of vsphere input plugin

037983c

Optimizations of vSphere plugin.

2ca7cf4

Fixed bug with missing newlines in Wavefront plugin.

Performance improvements

3f2e1b9

Use logrus instead of built-in log

4985053

Performance improvments.

36a2826

Merge branch 'master' into master

1aa1400

Fixed premature destroy of performance manager

0460ba8

Merge remote-tracking branch 'origin/master'

26a104c

Removed call to PerformanceManager.Destroy since it floods the vCente…

c9b1adf

…r console with error messages.

Include name of parent in output.

748961a

Reorganized code. Add parent as tag.

80c1bae

Merge pull request #1 from prydin/pontus-reorg

50a7c94

Reorganized code

Merge branch 'master' of https://github.com/influxdata/telegraf

63c4a71

Merge branch 'master' of https://github.com/prydin/telegraf

dd804c3

changed Interval properties to durations

a0f084b

Merge pull request #2 from prydin/pierre

ffed322

changed Interval properties to durations

Started on concurrent object discovery. Not finished yet.

313c89d

use sync.WaitGroup to manage async

f8355a2

datastore, and instance context

db90cff

Run object discovery in the background.

14ff78d

added gather_<category> properties

efac9b9

initial readme

bad3279

Initial commit of include/exclude. Basic testing done.

61d37c8

Moved object ID maps to Endpoint.

ac72aca

readme and datastore source name

09e3c51

typo

c8b616b

gauges and grammar

8a11e37

parrallel on vSphere.init

213241b

refactored endpoint, connection timeout

e2c211e

Added sample output

fa79c85

puckpuck and others added 5 commits September 1, 2018 09:28

updated readme and sample config

a4f5990

force_discover_on_init default to false

e2e2040

Set ForceDiscoverOnInit to true in test to avoid false positives

7370f23

Merge branch 'master' of https://github.com/prydin/telegraf

38b4ada

Fixed typo in default config

81f5e84

danielnelson reviewed Sep 6, 2018

Better error handling

238e015

danielnelson mentioned this pull request Sep 7, 2018

Add VSphere input plugin #2682

Closed

3 tasks

danielnelson added this to the 1.8.0 milestone Sep 7, 2018

RC1: Final tweaks and added datacenter tags

99d1111

RC1.1: Fixed merge issue

c06edee

prydin closed this Sep 11, 2018

prydin reopened this Sep 11, 2018

danielnelson approved these changes Sep 11, 2018

danielnelson merged commit 5f3c331 into influxdata:master Sep 11, 2018

rgitzel pushed a commit to rgitzel/telegraf that referenced this pull request Oct 17, 2018

Add input plugin for VMware vSphere (influxdata#4141)

923f32e

otherpirate pushed a commit to otherpirate/telegraf that referenced this pull request Mar 15, 2019

Add input plugin for VMware vSphere (influxdata#4141)

fd0b0f0

otherpirate pushed a commit to otherpirate/telegraf that referenced this pull request Mar 15, 2019

Add input plugin for VMware vSphere (influxdata#4141)

76ec483

dupondje pushed a commit to dupondje/telegraf that referenced this pull request Apr 22, 2019

Add input plugin for VMware vSphere (influxdata#4141)

b61c9f2

athoune pushed a commit to bearstech/telegraf that referenced this pull request Apr 17, 2020

Add input plugin for VMware vSphere (influxdata#4141)

7abcfcc
