Single member stats for existing metrics collector #82
Hi @JohnStarich, I think the current version doesn't cover this, and it's a good idea to collect the current member's statistics. I have been planning to implement this solution in an upcoming release. That's quite easy. But I'm willing to backport the above solution to v0.3.
Thank you! I'm really happy to hear that. I would like to hear more about your use case, if it's possible. 😄
Thanks for your quick reply! My bad for the late response, last week was pretty busy for me 😅
Aha, good to know, thank you. Good to hear about the new stats work. If you need a hand, let me know!
I agree, that behavior does sound pretty buggy – probably best left in the past 😄
Of course! We're dark-launching with Olric soon to help us provide better failure modes for multi-region aggregation. Essentially we need to resolve certain data without all regions being healthy, so using a distributed cache seems like the right approach. We're going to use one DMap cluster per region, with read-through caching when querying a region. It might be interesting to use a multi-region DMap, but I imagine the latency could be too high for our purposes. Since it's a global and highly available service, it's a bit difficult to keep all data in sync across every instance of the service via normal memcaches, but we don't want the operational overhead of a separate database. I really liked Olric's embedded & distributed approach, similar to Hazelcast's, which is where I first heard about the pattern. Next, we'll be trying out Olric distributed caches for places where one might normally use a Redis cluster; in this case, caching read-only data just above the database layer. We're excited to see how it performs!
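For context, the read-through part might look roughly like the sketch below against the v0.3-era embedded API (NewDMap, Get, PutEx). RegionCache, queryRegion, and the TTL are illustrative assumptions rather than actual project code:

```go
package cache

import (
	"context"
	"time"

	"github.com/buraksezer/olric"
)

// RegionCache wraps a DMap with a read-through lookup into a single region.
// queryRegion is a hypothetical stand-in for the per-region query path.
type RegionCache struct {
	dm          *olric.DMap
	queryRegion func(ctx context.Context, key string) (interface{}, error)
	ttl         time.Duration
}

// Get returns the cached value for key, falling back to the region on a miss
// and populating the DMap with a TTL so entries eventually refresh.
func (c *RegionCache) Get(ctx context.Context, key string) (interface{}, error) {
	value, err := c.dm.Get(key)
	if err == nil {
		return value, nil // cache hit
	}
	if err != olric.ErrKeyNotFound {
		return nil, err
	}
	value, err = c.queryRegion(ctx, key)
	if err != nil {
		return nil, err
	}
	if putErr := c.dm.PutEx(key, value, c.ttl); putErr != nil {
		// Serve the fresh value even if caching it fails.
		return value, nil
	}
	return value, nil
}
```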
Hi @JohnStarich, I have just released v0.3.7. It includes the new stats work. I have also fixed some dependency problems.
It's great to hear that. I'm always here for questions, bug reports, and anything else about Olric. Keep in touch!
Thanks @buraksezer! The exported structs are much easier to use. One quick question: are the Partitions assumed to be entirely member-local now? If so, could we aggregate the SlabInfo across all partitions returned from Stats to get an idea of how much memory this instance is using?
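For illustration, that aggregation might look roughly like the sketch below. The DMaps, SlabInfo, Allocated, and Inuse names are assumptions and may not match the released structs exactly:

```go
import "github.com/buraksezer/olric/stats"

// approximateMemoryUsage sums slab usage across the member-local partitions
// returned by Stats. Field names here (DMaps, SlabInfo, Allocated, Inuse) are
// assumptions and may differ between Olric versions.
func approximateMemoryUsage(s stats.Stats) (allocated, inuse int) {
	for _, part := range s.Partitions {
		for _, dm := range part.DMaps {
			allocated += dm.SlabInfo.Allocated
			inuse += dm.SlabInfo.Inuse
		}
	}
	return allocated, inuse
}
```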
Thinking about how we'd index into the stats maps, maybe it'd be useful to define a type alias for member IDs and use it in the map types? I could open a PR for this if you like:

```go
// MemberID <description>
type MemberID = uint64

// Stats includes some metadata information about the cluster.
// Each node adds everything it knows about the cluster.
type Stats struct {
	Cmdline            []string
	ReleaseVersion     string
	Runtime            Runtime
	ClusterCoordinator Member
	Member             Member               // The current member
	Partitions         map[uint64]Partition // Includes metadata for this member's partitions.
	Backups            map[MemberID]Partition
	ClusterMembers     map[MemberID]Member
}
```
Yes, it is. Partitions is entirely member-local. In your case, if you call Stats on a member, you can sum the SlabInfo across the returned partitions to estimate that member's memory usage.
You are absolutely right, it's very confusing. We should use type aliases, and you should feel free to open a PR for fixes and improvements. In addition to MemberID, a PartitionID alias would make sense for the partition maps:

```go
// MemberID <description>
type MemberID = uint64
...
Partitions:     make(map[PartitionID]stats.Partition),
Backups:        make(map[PartitionID]stats.Partition),
ClusterMembers: make(map[MemberID]stats.Member),
...
```

Thank you!
Oh my! I agree, there should be an option to disable "stopping the world". We also have separate monitoring of memory usage, so skipping it in an embedded Olric instance is a good idea. We could probably make a change like this backward-compatible with either a new function that takes a struct of options, or a variadic options pattern. I usually prefer the struct pattern since it's simpler and it's easier to find documentation for all the options; on the other hand, variadic options keep everything in the same function signature. For example:

```go
// add struct options
func (db *Olric) Stats() (stats.Stats, error)

type StatOptions struct {
	SkipRuntime bool
}

func (db *Olric) StatsOptions(opts StatOptions) (stats.Stats, error)

/////////////////////////
// OR use variadic options
type statsConfig struct {
	collectRuntime bool
}

type statsOption func(*statsConfig)

func SkipRuntime(cfg *statsConfig) {
	cfg.collectRuntime = false
}

func (db *Olric) Stats(options ...statsOption) (stats.Stats, error) {
	cfg := statsConfig{collectRuntime: true}
	for _, opt := range options {
		opt(&cfg)
	}
	// ...
}

// called like this
olr.Stats(olric.SkipRuntime)
```
You've got it. I'll see if I can get to this soon. 👍
I'm totally fine with the variadic version of Stats. The protocol is derived from the memcached binary protocol. It's tiny and easy to modify, I think. See this:

```go
req := protocol.NewSystemMessage(protocol.OpStats)
req.SetExtra(protocol.StatsExtra{
	SkipRuntime: true,
})
```

We need to define the StatsExtra type and handle it in statsOperation:

```go
func (db *Olric) statsOperation(w, r protocol.EncodeDecoder) {
	extra, ok := r.Extra().(protocol.StatsExtra)
	if !ok {
		// Impossible
	}
	if extra.SkipRuntime {
		// Skip calling runtime.ReadMemStats
	}
	s := db.stats()
	value, err := msgpack.Marshal(s)
	if err != nil {
		neterrors.ErrorResponse(w, err)
		return
	}
	w.SetStatus(protocol.StatusOK)
	w.SetValue(value)
}
```

It's relatively easy. But I can handle this case, if you don't have time to deal with it.
Thanks in advance. I have just merged the PR.
Thanks @buraksezer! Sorry for my delay; lots of things came up this month and I lost track of this one 😅 For v0.4, which is still in beta, should we skip runtime stats by default and allow opt-in? Stop-the-world behavior seems like one of those things that should be off by default to avoid performance issues. I'll start on the work for v0.4 assuming we skip by default at first, but I can course-correct if needed.
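If we do flip the default, the opt-in direction might look roughly like this under the variadic pattern discussed above. CollectRuntime is an illustrative name, not necessarily the final API:

```go
// statsConfig mirrors the variadic-options sketch above, but with runtime
// stats disabled unless the caller opts in.
type statsConfig struct {
	collectRuntime bool
}

type StatsOption func(*statsConfig)

// CollectRuntime opts back in to runtime.ReadMemStats collection.
func CollectRuntime() StatsOption {
	return func(cfg *statsConfig) {
		cfg.collectRuntime = true
	}
}

// Callers that still want runtime stats would ask for them explicitly:
//	s, err := db.Stats(olric.CollectRuntime())
```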
After digging in a bit, I have a couple of questions for you when you have a moment:
@buraksezer Guessing these were the answers, based on the merge 😉
Update on the v0.3 backport: I think we can probably wait for v0.4. We've mitigated for now by significantly increasing the metrics collection interval. I didn't observe any negative effects at the lower interval, but we increased it just to be safe. Maybe we can keep this issue open until v0.4 leaves beta?
@JohnStarich are you still using Olric, and if so, did you end up finding a good solution for exposing Prometheus metrics? |
Hey @derekperkins, we are indeed. Our solution thus far has been to poll the Stats API on the embedded instance and export the results as Prometheus metrics. We later took that a step further and now emit metrics on every key-value Get and Put operation for more fine-grained measurements like op duration, byte size, and errors.
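As a rough sketch of that kind of instrumentation (the DMap interface is simplified here and the metric name is made up, so treat it as illustrative rather than our exact code):

```go
import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// opDuration records per-operation latency, labeled by operation and outcome.
var opDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name: "olric_dmap_op_duration_seconds",
		Help: "Duration of Olric DMap operations.",
	},
	[]string{"op", "status"},
)

func init() { prometheus.MustRegister(opDuration) }

// dmap is a simplified view of the DMap methods being wrapped.
type dmap interface {
	Get(key string) (interface{}, error)
	Put(key string, value interface{}) error
}

// instrumentedDMap forwards calls to the underlying DMap and observes timings.
type instrumentedDMap struct{ dm dmap }

func (i instrumentedDMap) Get(key string) (interface{}, error) {
	start := time.Now()
	value, err := i.dm.Get(key)
	opDuration.WithLabelValues("get", statusOf(err)).Observe(time.Since(start).Seconds())
	return value, err
}

func (i instrumentedDMap) Put(key string, value interface{}) error {
	start := time.Now()
	err := i.dm.Put(key, value)
	opDuration.WithLabelValues("put", statusOf(err)).Observe(time.Since(start).Seconds())
	return err
}

func statusOf(err error) string {
	if err != nil {
		return "error"
	}
	return "ok"
}
```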
Closing this issue due to inactivity. |
Is there a way to programmatically identify the current embedded server within Stats? Or, more simply, to collect only the current member's stats?
I'm attempting to connect Prometheus metrics to Olric so that each instance only reports on itself, not on all members. Since Prometheus is typically set up to scrape all of my Kubernetes pods, I don't want the extra noise of each pod reporting the state of the world.
Perhaps we can expose only the current member's stats in a new method on the server instance? Alternatively, could the current member's auto-generated ID or name be exposed somewhere to filter the results of Stats()?
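For example, if the current member's identity were exposed, the filtering could be as simple as the sketch below. The Member, ClusterMembers, and Name fields are assumptions based on later discussion in this thread and may differ between Olric versions:

```go
import "github.com/buraksezer/olric/stats"

// localMemberOnly keeps only the current member's entry from cluster-wide
// stats. Field names are assumptions and may not match the released structs.
func localMemberOnly(s stats.Stats) (uint64, stats.Member) {
	for id, member := range s.ClusterMembers {
		if member.Name == s.Member.Name {
			return id, member // id and name could become Prometheus labels
		}
	}
	return 0, s.Member
}
```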
Thanks for such a great project, by the way! My team and I look forward to using this in the coming days. 😄
Related: #32, #57