
@ndealmeida

Hi there, this is my first contribution to this project! Let me know your thoughts.

Context

Inspired by Cerebro, I want to create a Grafana dashboard to visualize shard distribution, state, and movements over time. This helps during scale-up periods and when diagnosing imbalances. I tried using count(elasticsearch_indices_shards_docs) by (index, node), but that query didn't give me a proper node name (I got a node ID such as 4n6k9UN4SZG-n3MjOAnszA instead), nor did it distinguish primaries from replicas or expose shard states.

Changes

  • Fix: add the missing elasticsearch_node_shards_total entry to metrics.md
  • Refactor: simplify shards.go to make it more consistent with other files such as indices.go
  • Feat: add the elasticsearch_node_shards_state metric with the following encoding (chosen arbitrarily):
      0=unassigned
      10=primary started
      11=primary initializing
      12=primary relocating
      20=replica started
      21=replica initializing
      22=replica relocating
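To make the encoding concrete, here is a minimal Go sketch of how a shard's role and state could be mapped to these values. It assumes the shard data comes from Elasticsearch's _cat/shards?format=json endpoint (fields index, shard, prirep, state, node); the catShard type, the shardStateValue function and the sample response are hypothetical and not necessarily how shards.go implements it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// catShard mirrors a subset of the fields returned by _cat/shards?format=json.
// Which fields shards.go actually decodes is an assumption here.
type catShard struct {
	Index  string `json:"index"`
	Shard  string `json:"shard"`
	PriRep string `json:"prirep"` // "p" for primary, "r" for replica
	State  string `json:"state"`  // STARTED, INITIALIZING, RELOCATING, UNASSIGNED
	Node   string `json:"node"`   // node name; null (left empty) for unassigned shards
}

// shardStateValue maps a shard to the PR's encoding:
// 0 = unassigned, 1x = primary, 2x = replica, where the last digit is
// 0 = started, 1 = initializing, 2 = relocating.
// It returns ok=false for roles or states the encoding does not cover.
func shardStateValue(s catShard) (float64, bool) {
	if s.State == "UNASSIGNED" {
		return 0, true
	}
	var base float64
	switch s.PriRep {
	case "p":
		base = 10
	case "r":
		base = 20
	default:
		return 0, false
	}
	switch s.State {
	case "STARTED":
		return base, true
	case "INITIALIZING":
		return base + 1, true
	case "RELOCATING":
		return base + 2, true
	default:
		return 0, false
	}
}

func main() {
	// Hypothetical sample response; real output contains more columns.
	raw := `[
	  {"index":"logs","shard":"0","prirep":"p","state":"STARTED","node":"es-node-1"},
	  {"index":"logs","shard":"0","prirep":"r","state":"INITIALIZING","node":"es-node-2"},
	  {"index":"logs","shard":"1","prirep":"r","state":"UNASSIGNED","node":null}
	]`
	var shards []catShard
	if err := json.Unmarshal([]byte(raw), &shards); err != nil {
		panic(err)
	}
	for _, s := range shards {
		if v, ok := shardStateValue(s); ok {
			fmt.Printf("%s/%s %s node=%q -> %g\n", s.Index, s.Shard, s.PriRep, s.Node, v)
		}
	}
}
```

Keeping the role in the tens digit and the state in the units digit means a single threshold-based color scheme can separate primaries (10-12) from replicas (20-22) while still showing transitional states.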

This lets me build a matrix in Grafana with nodes as rows and indices as columns, coloring each cell by the metric's value. The metric is flexible: it carries more labels, yet its cardinality stays low because the set of possible values is limited.
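The pivot such a panel performs can be sketched in Go as an illustration. It assumes each elasticsearch_node_shards_state series carries node and index labels (names assumed here) and that unassigned shards have an empty node; the sample type and buildMatrix helper are hypothetical. In practice Grafana does this grouping itself.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is a simplified, hypothetical view of one elasticsearch_node_shards_state
// series; Node and Index stand in for assumed "node" and "index" labels.
type sample struct {
	Node  string
	Index string
	Value int
}

// buildMatrix pivots samples into rows (nodes) and columns (indices).
// A node can hold several shards of the same index, so each cell keeps
// every state code, sorted ascending. Unassigned shards (empty node)
// are grouped under a synthetic "(unassigned)" row.
func buildMatrix(samples []sample) (rows, cols []string, cells map[string]map[string][]int) {
	cells = make(map[string]map[string][]int)
	seenCol := make(map[string]bool)
	for _, s := range samples {
		node := s.Node
		if node == "" {
			node = "(unassigned)"
		}
		if cells[node] == nil {
			cells[node] = make(map[string][]int)
			rows = append(rows, node)
		}
		cells[node][s.Index] = append(cells[node][s.Index], s.Value)
		if !seenCol[s.Index] {
			seenCol[s.Index] = true
			cols = append(cols, s.Index)
		}
	}
	sort.Strings(rows)
	sort.Strings(cols)
	for _, byIndex := range cells {
		for _, codes := range byIndex {
			sort.Ints(codes) // sorts in place; the map keeps the same slice
		}
	}
	return rows, cols, cells
}

func main() {
	// Made-up values using the PR's encoding (10 = primary started, etc.).
	rows, cols, cells := buildMatrix([]sample{
		{"es-node-1", "logs", 10},
		{"es-node-1", "logs", 20},
		{"es-node-2", "logs", 21},
		{"es-node-2", "metrics", 12},
		{"", "metrics", 0},
	})
	fmt.Printf("%-14s", "node")
	for _, c := range cols {
		fmt.Printf("%-10s", c)
	}
	fmt.Println()
	for _, r := range rows {
		fmt.Printf("%-14s", r)
		for _, c := range cols {
			parts := []string{}
			for _, v := range cells[r][c] {
				parts = append(parts, fmt.Sprint(v))
			}
			cell := strings.Join(parts, ",")
			if cell == "" {
				cell = "-" // no shard of this index on this node
			}
			fmt.Printf("%-10s", cell)
		}
		fmt.Println()
	}
}
```

Each cell keeps every code rather than a single value because a node can host several shards of the same index; a dashboard would typically reduce them to one value, for example with a max aggregation.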

Signed-off-by: Neemias Almeida <neemias.junior@gmail.com>
@ndealmeida force-pushed the feat/node-shards-state-metric branch from edebf07 to 2a3f356 on November 21, 2025 12:18

pjmpsu commented Nov 21, 2025

We are also looking to replace Cerebro, given its current level of maintenance activity. Would you mind sharing the dashboard you have come up with so far?

@ndealmeida closed this Nov 24, 2025
@ndealmeida reopened this Nov 24, 2025

ndealmeida commented Nov 24, 2025

Hi @pjmpsu, sure, I can share the chart with the community. It would also be nice if #1100 got merged, so we could correlate shard state with the total number of docs per shard.
