Kafka monitoring #4819
Comments
@andmarios could you share some insight on the following?
|
We have a lot of interest here, but no sponsor. Can someone assume the role so we can move forward? |
|
Hi @Cricket007, the responsibilities are mentioned in the OP. Would you be interested in helping? I received the following from @PanosJee regarding how other vendors monitor Kafka (I think it's enough to get us started at least):
|
I have limited familiarity with Go, but I currently use Datadog and New Relic and plan on taking the Confluent admin certification sometime soon, so I'm happy to help where I can.
Each of these links is primarily about server-side monitoring. Is that a good starting point, or should client-side metrics be tracked as well? |
I see that all of them use a self-written JMX fetcher. New Relic's: https://github.com/newrelic/nrjmx |
Interested to see the monitoring here! I am not a Kafka expert, so I may not have much insight, but I would monitor the rate of message production, the rate of consumption, and the count of not-yet-consumed messages. :) |
Those all already exist as JMX metrics |
@OneCricketeer Thanks! |
Additional metrics of interest would be under-replicated partitions, offline partitions, follower time, fetch time, leader election rate, consumer lag, etc. |
Consumer lag would be measured in the consumers themselves, unless you use a tool like Burrow. The rest are available in JMX. |
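For readers less familiar with the term, consumer lag per partition is usually taken to be the partition's log end offset minus the consumer group's committed offset. The following is only a minimal sketch of that arithmetic, not how Burrow or any Kafka client computes it internally; the function name `consumer_lag` and the sample offsets are hypothetical, and real offsets would have to come from a Kafka client or a tool such as Burrow.

```python
from typing import Dict, Tuple

# A (topic, partition) pair; a hypothetical alias used only in this sketch.
TopicPartition = Tuple[str, int]


def consumer_lag(
    end_offsets: Dict[TopicPartition, int],
    committed_offsets: Dict[TopicPartition, int],
) -> Dict[TopicPartition, int]:
    """Return per-partition lag as (log end offset - committed offset).

    Partitions without a committed offset are skipped, because their lag
    is not well defined. Negative differences are clamped to zero: the two
    offsets are usually sampled at slightly different times, so the
    committed offset can briefly appear to be ahead of the end offset.
    """
    lag: Dict[TopicPartition, int] = {}
    for tp, end in end_offsets.items():
        committed = committed_offsets.get(tp)
        if committed is None:
            continue
        lag[tp] = max(end - committed, 0)
    return lag


# Hypothetical sample data, for illustration only.
sample_end = {("orders", 0): 120, ("orders", 1): 80}
sample_committed = {("orders", 0): 100, ("orders", 1): 85}
sample_lag = consumer_lag(sample_end, sample_committed)
total_lag = sum(sample_lag.values())
```

Summing the per-partition values gives a group-level total, which is the kind of number a lag chart would typically plot.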
Hi all, JMX is one thing, but you might know that there are now some monitoring tools that are quite "standard" for monitoring different parts of a Kafka cluster. I am mainly talking about the LinkedIn tools Burrow and Cruise Control. Both of them expose very nice APIs that can provide everything we need to properly monitor a Kafka cluster (https://github.com/linkedin/cruise-control/wiki/REST-APIs#get-requests, https://github.com/linkedin/Burrow/wiki/HTTP-Endpoint). I would like to see netdata automatically monitor our Kafka clusters through these APIs. I don't have time to work on it at the moment, but if somebody is interested I can help and test. Best, |
Hey @jrevillard, Thanks for chiming in! If you could guide us in structuring the available metrics into charts that make sense, that would be a huge help for us. We can do the development alright, although @ilyam8 will have more information regarding our priorities. The hard part is finding people such as yourself who are willing to guide us through the available metrics, select the ones that we care about, and structure them into sensible charts. What do you think? |
Hi @odyslam, As I said, I'm willing to help if I can... but I apologize in advance if I'm not always very responsive, because I really have a lot going on at work. Concerning Cruise Control, you can see here the different metrics that they already have:
So here are the endpoints that need to be queried:
All those endpoints can return JSON data if you specify the What would you need then, some sample output of the different endpoints ? Best, |
Thanks @jrevillard for this detailed message. We perfectly understand, so please do take your time in our communication. Ilya is aware of this and should reply shortly in this issue; he is our integration engineer and will have more information regarding prioritisation and bandwidth. In any case, it's super helpful and reassuring that we have a user who is willing to guide us towards the correct data sources and charts. Intimate knowledge of the collector's object has always been the toughest challenge. I will continue monitoring this issue. See you soon and a happy new year :) Best, |
Hi @jrevillard, thanks for your willingness to help ❤️ Monitoring an Apache Kafka cluster via Cruise Control looks very good.
Yes, sample output would be very good to have. Ideally we need to set up a local Apache Kafka cluster with Cruise Control for developing the collector. |
Worth mentioning (again) that much of the data exposed via Burrow/Cruise Control can be gathered via JMX |
@OneCricketeer Perhaps, @ilyam8, we could approach this by creating a JMX collector/helper function and using that to gather Apache Kafka metrics, a two-birds-with-one-stone sort of thing. Thoughts? |
I suspect a JMX collector should be written in Java, in which I have zero experience (I am not 100% sure, because I have no clear understanding of what having a JMX collector means). There is netdata-java-plugin (written by a contributor, not really maintained); perhaps it is related to this issue. |
See the Datadog implementation:
https://github.com/DataDog/datadog-agent/blob/e070027e0253884d6d2608d5f532e9c8825c599f/pkg/collector/check/jmx.go
On Mon, 18 Jan 2021 at 1:10 PM, Ilya Mashchenko wrote:
I suspect a JMX collector should be written in Java, in which I have zero experience (I am not 100% sure, because I have no clear understanding of what having a JMX collector means). There is netdata-java-plugin <https://github.com/simonnagl/netdata-java-orchestrator> (written by a contributor, not really maintained); perhaps it is related to this issue.
|
@PanosJee thanks for sharing it. If I understand it right, both Datadog and New Relic have some jmxfetcher agent (a tool for extracting data out of any application exposing a JMX interface) written in Java: That Go collector gathers data from it (and perhaps does some additional work, like filtering). |
Let me clarify that Cruise Control is nice for monitoring the external state of Kafka, but it can't give internal state like heap usage, or the granularity of data that JMX can provide (depending on the polling interval). If you're limited to a REST API, then Jolokia is another option, but it sounds like you want to steer away from that (#364) |
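To make the Jolokia option a little more concrete, here is a minimal sketch of what reading Kafka broker MBeans over Jolokia's JSON protocol could look like. It only builds a bulk "read" request body and parses a response list; no HTTP call is made. The MBean names are the broker metrics documented by Apache Kafka, while the helper names (`build_read_requests`, `parse_responses`), the metric keys, and the sample response values are hypothetical, and it assumes a Jolokia agent is attached to the broker JVM.

```python
import json
from typing import Any, Dict, List

# Kafka broker MBeans for two of the metrics mentioned in this thread.
# The dictionary keys are hypothetical chart names chosen for this sketch.
KAFKA_MBEANS: Dict[str, str] = {
    "under_replicated_partitions":
        "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions",
    "offline_partitions":
        "kafka.controller:type=KafkaController,name=OfflinePartitionsCount",
}


def build_read_requests(mbeans: Dict[str, str],
                        attribute: str = "Value") -> List[Dict[str, str]]:
    """Build a Jolokia bulk read request: one 'read' operation per MBean."""
    return [
        {"type": "read", "mbean": mbean, "attribute": attribute}
        for mbean in mbeans.values()
    ]


def parse_responses(mbeans: Dict[str, str],
                    responses: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Map bulk responses back to metric names.

    Jolokia answers a bulk request with a list in the same order as the
    request; entries whose status is not 200 are skipped.
    """
    metrics: Dict[str, Any] = {}
    for name, response in zip(mbeans.keys(), responses):
        if response.get("status") == 200:
            metrics[name] = response["value"]
    return metrics


# The body that would be POSTed to the agent's endpoint.
payload = json.dumps(build_read_requests(KAFKA_MBEANS))

# Hypothetical responses, shaped like Jolokia's, for illustration only.
sample_responses = [
    {"status": 200, "value": 0},
    {"status": 404, "error": "MBean not found"},
]
sample_metrics = parse_responses(KAFKA_MBEANS, sample_responses)
```

The second sample entry fails on purpose: `OfflinePartitionsCount` is only meaningful on the active controller, so a collector would need to tolerate missing values per broker.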
Hey @jrevillard, Thanks for coming back to this. We are in the process of defining the Roadmap and new features. When we have new information regarding our bandwidth and priorities, we will come back to this thread. We believe in developing alongside users, but due to the vastness of this project, we have to be ruthless in our prioritization. Hopefully, we will have more information soon! |
This issue has been mentioned on the Netdata Community. There might be relevant details there: https://community.netdata.cloud/t/creating-dynamic-charts-within-plugin-get-data-method/957/4 |
Hello guys, Sorry for the delay in bringing updates. After doing some initial research, I am glad to report that we looked into different possibilities and have started working on Kafka again.
My current conclusion is that we will work firstly with a collector to monitor Kafka using Best regards! |
I currently have metrics fetched into Prometheus and a dashboard in Grafana to show us the information. I am happy to share what I can if anyone needs it. |
Will we be able to monitor topic sizes and disk utilization by topic using this plugin? |
We are willing to support monitoring Kafka with netdata.
To get started, we need a user who will act as a sponsor during the implementation.
They will assist us with the following:
We would also appreciate getting votes 👍 for this issue from the community, to understand the interest in the particular collector.