Description
Currently, the login-failures cache is unbounded, and its entries never seem to expire. Although each entry is very small, the cache grows in memory without bound, similar to offline sessions. For offline sessions, a passivation feature was added in #26998. This issue causes nowhere near as much memory pressure as offline sessions, because a large installation has far fewer cache entries of this type than login sessions.
Still, it seems reasonable to have configuration parameters here so that the cache doesn't grow without bound over time. Over a period of less than 2 weeks, we have seen more than 40k entries in this cache per node in our cluster; with the default 2 owners, that is roughly 60k entries in the cluster in total. In an attack scenario like credential stuffing, this cache could grow much larger, which in the worst case might threaten cluster stability by taking up large amounts of memory in Infinispan.
Discussion
No response
Motivation
This is a possible stability improvement to make sure there are no caches in Keycloak that can take up unbounded amounts of heap space.
Details
Currently, the entries seem to have neither an expiration nor a max-idle time in https://github.com/keycloak/keycloak/blob/main/model/infinispan/src/main/java/org/keycloak/models/sessions/infinispan/util/SessionTimeouts.java#L206-L219. Backwards compatibility reasons are mentioned in the code, but maybe these can be re-evaluated. Looking at how the number of cache entries develops in our monitoring dashboards, there do not seem to be background cleaner threads removing login failures (anymore?). If there were, we would expect to see a decrease during the off-hours at night, but all we see is a slower or zero increase during those times.
A max-idle time seems like the most reasonable solution here. Simply deleting a login failure entry because it is old may let a user in even if they are currently being brute-forced, just because the initial login failure was some time ago. If a login failure entry has not been touched for several hours, there seems to be no good reason to keep it in memory anymore.
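To illustrate the difference between the two eviction policies, here is a minimal plain-Java sketch. It does not use Keycloak or Infinispan code: `MaxIdleSketch`, `FailureCache`, `Entry`, and the 4-hour thresholds are hypothetical names and values chosen only for this example, and as a simplification an entry is "touched" only when a new failure is recorded.

```java
import java.util.HashMap;
import java.util.Map;

class MaxIdleSketch {
    static final long HOUR = 3_600_000L;

    /** Hypothetical stand-in for a login-failure entry, not Keycloak's actual class. */
    static final class Entry {
        final long createdAt;
        long lastTouchedAt;
        int failures = 1;

        Entry(long now) {
            createdAt = now;
            lastTouchedAt = now;
        }
    }

    /** Toy cache with an optional lifespan and an optional max-idle time (-1 disables either). */
    static final class FailureCache {
        private final Map<String, Entry> entries = new HashMap<>();
        private final long lifespanMillis;
        private final long maxIdleMillis;

        FailureCache(long lifespanMillis, long maxIdleMillis) {
            this.lifespanMillis = lifespanMillis;
            this.maxIdleMillis = maxIdleMillis;
        }

        /** Records a failed login and refreshes the entry's last-touched timestamp. */
        void recordFailure(String userId, long now) {
            Entry e = entries.get(userId);
            if (e == null) {
                entries.put(userId, new Entry(now));
            } else {
                e.failures++;
                e.lastTouchedAt = now;
            }
        }

        /** Removes entries that exceeded the lifespan (age) or the max-idle time (time since last touch). */
        void evictExpired(long now) {
            entries.values().removeIf(e ->
                    (lifespanMillis >= 0 && now - e.createdAt >= lifespanMillis)
                            || (maxIdleMillis >= 0 && now - e.lastTouchedAt >= maxIdleMillis));
        }

        int failures(String userId) {
            Entry e = entries.get(userId);
            return e == null ? 0 : e.failures;
        }

        int size() {
            return entries.size();
        }
    }

    public static void main(String[] args) {
        FailureCache lifespanOnly = new FailureCache(4 * HOUR, -1);
        FailureCache maxIdleOnly = new FailureCache(-1, 4 * HOUR);

        for (FailureCache cache : new FailureCache[] {lifespanOnly, maxIdleOnly}) {
            // "victim" is attacked continuously: one failure per hour from t=0h to t=5h.
            for (int h = 0; h <= 5; h++) {
                cache.recordFailure("victim", h * HOUR);
            }
            // "stale" failed once at t=0h and was never touched again.
            cache.recordFailure("stale", 0);
            cache.evictExpired(5 * HOUR);
        }

        System.out.println("lifespan: victim failures = " + lifespanOnly.failures("victim"));
        System.out.println("max-idle: victim failures = " + maxIdleOnly.failures("victim"));
        System.out.println("max-idle: entries left    = " + maxIdleOnly.size());
    }
}
```

In this sketch, the age-based policy drops the entry of the user who is still under attack, while the max-idle policy keeps it and only drops the entry that has not been touched for more than 4 hours.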