Skip to content

Conversation

tsaarni
Copy link
Member

@tsaarni tsaarni commented Oct 6, 2025

Commit Message: This change adds endpoint to retrieve TCMalloc stats. For details, see https://google.github.io/tcmalloc/stats.html.

Additional Description:

TCMalloc provides detailed statistics that can be useful for troubleshooting memory-related issues. For example, in several github issues about overload manager, users have been advised to patch Envoy in order to access these statistics to exclude possible heap fragmentation.

This pull request proposes adding a new admin interface endpoint /memory/tcmalloc that exposes the raw TCMalloc statistics, as generated by the TCMalloc implementation(s).

See attached examples.
google-tcmalloc-dump.txt
gperftools-tcmalloc-dump.txt
These dumps do not contain any sensitive information.

Risk Level: low
Testing: A new test case
Docs Changes: Endpoint added in Administration interface document.
Release Notes: Added /memory/tcmalloc admin endpoint that provides TCMalloc memory statistics.
Fixes #41375

@tsaarni
Copy link
Member Author

tsaarni commented Oct 6, 2025

Hi @nezdolik while troubleshooting some overload manager issues, I noticed cases where this could help users avoid unnecessary recompilation. What are your thoughts?

This change adds endpoint to retrieve TCMalloc stats.
For details, see https://google.github.io/tcmalloc/stats.html

Signed-off-by: Tero Saarni <tero.saarni@est.tech>
@tsaarni tsaarni force-pushed the tcmalloc-stats-endpoint branch from 6d9a983 to 0f42a06 Compare October 6, 2025 13:26
@tsaarni
Copy link
Member Author

tsaarni commented Oct 6, 2025

The asan, msan and tsan tests are failing

[ RUN      ] IpVersions/AdminInstanceTest.MemoryTcmalloc/IPv6
test/server/admin/server_info_handler_test.cc:71: Failure
Value of: output
Expected: has substring "Bytes in use by application"
  Actual: ""
Stack trace:
  0x560a7a245111: Envoy::Server::AdminInstanceTest_MemoryTcmalloc_Test::TestBody()
  0x560a8057101d: testing::internal::HandleSehExceptionsInMethodIfSupported<>()
  0x560a80538deb: testing::internal::HandleExceptionsInMethodIfSupported<>()
  0x560a80513c89: testing::Test::Run()
  0x560a80514abd: testing::TestInfo::Run()

I guess tcmalloc is not really used in these builds and maybe that is why it returns empty string. I can disable this test for sanitizer builds. For reference, AdminInstanceTest.Memory which exercises /memory allows zero values likely for this reason, but maybe it does not make much sense for stats dump test to allow empty dump content. Maybe at that point the whole test could be removed as it is pretty trivial...

@kyessenov
Copy link
Contributor

LGTM and I think this is very useful. Can you put if defined(TCMALLOC) around the content test? I think that would be enough since production builds use TCMALLOC.

Signed-off-by: Tero Saarni <tero.saarni@gmail.com>
Signed-off-by: Tero Saarni <tero.saarni@gmail.com>
@ramaraochavali
Copy link
Contributor

+1 This is super useful. Thank you so much @tsaarni for adding this

Copy link
Member

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this!

/wait

buffer.resize(strlen(buffer.c_str()));
return buffer;
#else
return "";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, I think the admin endpoint should return 501 (Not implemented) and some text saying envoy wasn't built with tcmalloc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Signed-off-by: Tero Saarni <tero.saarni@gmail.com>
@yanavlasov
Copy link
Contributor

@tsaarni coverage degradation in source/common/memory looks real. https://storage.googleapis.com/envoy-cncf-pr/7f9d628/coverage/source/common/memory/index.html

Can you see if the test coverage can be improved so CI can pass, please?

/wait

Signed-off-by: Tero Saarni <tero.saarni@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add admin endpoint for exposing TCMalloc statistics
5 participants