Composite client role mappings endpoint is slow and degrades under concurrency with many client roles

### Before reporting an issue

- [x] I have read and understood the above terms for submitting issues, and I understand that my issue may be closed without action if I do not follow them.

### Area

admin/api

### Describe the bug

We use a single realm with one client where client roles represent fine-grained permissions tied to individual resources (e.g., per-site or per-entity access controls). As resources are onboarded, new client roles are created — we currently have ~2,400 client roles and this count grows continuously. Users are assigned ~20 direct roles, roughly a dozen of which are composite roles each bundling ~16-17 leaf roles, resulting in ~212 effective roles per user. Our services rely on the Admin API endpoint `GET /admin/realms/{realm}/users/{user-id}/role-mappings/clients/{client-id}/composite` to resolve a user's effective client role mappings.

The `getCompositeClientRoleMappings()` implementation iterates over **all** client roles (not just the user's assigned roles) and calls `user.hasRole()` on each one. `hasRole()` in turn delegates to `KeycloakModelUtils.searchFor()`, which recursively expands composite roles using a fresh `HashSet` on every invocation — there is no memoization across calls. This produces O(C × M × D) complexity, where C is the total number of client roles, M is the number of the user's direct role mappings, and D is the composite expansion depth. In our case, this means roughly 800,000 recursive role-containment checks per single API call.
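To make the cost model above concrete, here is a minimal, self-contained sketch of the per-role pattern. It is a simplified model with plain strings, not Keycloak code: the class `PerRoleCheckModel`, its methods, and the synthetic role names are all hypothetical. The data set is built to resemble ours (2,400 client roles, 12 composites of 16 leaves each, 8 directly assigned leaves), which gives 12 + 192 + 8 = 212 effective roles by construction.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the per-role check; names do not come from Keycloak.
class PerRoleCheckModel {
    static long visits = 0; // counts every node touched during containment checks

    // Depth-first search for 'target' at or below 'role'; 'visited' guards against cycles.
    static boolean searchFor(String target, String role,
                             Map<String, Set<String>> composites, Set<String> visited) {
        visits++;
        if (role.equals(target)) return true;
        if (!visited.add(role)) return false;
        for (String child : composites.getOrDefault(role, Set.of())) {
            if (searchFor(target, child, composites, visited)) return true;
        }
        return false;
    }

    // One containment check: walks the direct roles, allocating a fresh set per search.
    static boolean hasRole(Collection<String> direct, String target,
                           Map<String, Set<String>> composites) {
        for (String role : direct) {
            if (searchFor(target, role, composites, new HashSet<>())) return true;
        }
        return false;
    }

    // The pattern described above: test every client role, assigned or not.
    static List<String> effectiveByScanningAll(List<String> allClientRoles,
                                               Collection<String> direct,
                                               Map<String, Set<String>> composites) {
        List<String> effective = new ArrayList<>();
        for (String candidate : allClientRoles) {
            if (hasRole(direct, candidate, composites)) effective.add(candidate);
        }
        return effective;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> composites = new HashMap<>();
        List<String> all = new ArrayList<>();
        List<String> direct = new ArrayList<>();
        for (int i = 0; i < 2388; i++) all.add("leaf-" + i);
        for (int c = 0; c < 12; c++) {
            Set<String> children = new HashSet<>();
            for (int j = 0; j < 16; j++) children.add("leaf-" + (c * 16 + j));
            composites.put("comp-" + c, children);
            all.add("comp-" + c);
            direct.add("comp-" + c);
        }
        for (int i = 192; i < 200; i++) direct.add("leaf-" + i); // 8 direct leaf roles
        List<String> effective = effectiveByScanningAll(all, direct, composites);
        System.out.println(effective.size() + " effective roles, " + visits + " nodes visited");
    }
}
```

In this model, every client role the user does not hold forces a full walk of all direct roles and their children, so the visit count grows with the total number of client roles even though the result set stays the same size.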

Under concurrency, this causes severe latency degradation. A single request completes in ~1s, but under moderate load (10-60 parallel requests), response times spike to 7-23s due to CPU saturation and GC pressure from the large number of short-lived `HashSet` allocations. The database is not the bottleneck — we observe zero DB queries during these requests and a 99.96% Infinispan cache hit ratio. The problem is purely algorithmic: the work is proportional to the total number of client roles in the realm rather than the number of roles assigned to the user, and it will get progressively worse as more client roles are added.

**Related issues:**
- #12885 — "Performance problem with big number of roles" (closed 2024-03). Same symptom on the same `/composite` endpoint (11,000 roles, 15-22s). Closed after PR #24012 improved role *listing*, but the `user.hasRole()` per-role loop in `getCompositeClientRoleMappings()` was never addressed.
- #13146 — "Optimize composite role evaluation" PR (closed without merge). Introduced `expandCompositeRoles()` utility but never applied it to this endpoint.
- #12888 — "Improve querying of composite roles" (closed 2026-01). Storage-level improvements for composite role queries.
- #11796 — "Avoid loading entire composite role collection" (closed 2026-01). JPA collection loading optimization.
- #42190 — "Keycloak performance and composite roles" (open discussion). Composite role performance remains an active community concern.

### Version

26.5.5

### Regression

- [ ] The issue is a regression

### Expected behavior

The endpoint should return effective client role mappings in time proportional to the user's role count, not the total number of client roles in the realm. `RoleUtils.expandCompositeRoles()` already exists in the codebase and performs BFS expansion in O(M × D) — it should be used here instead of iterating all client roles with `user.hasRole()`.
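As an illustration of the expected behavior, the following sketch shows a breadth-first expansion that starts from the user's direct roles and touches each reachable role at most once. It is a plain-Java model of the idea only, not the source of `RoleUtils.expandCompositeRoles()`; the class `ExpandOnceSketch`, the `expand` method, and the role names are hypothetical, and the model covers direct role mappings only (group-inherited roles are outside its scope).

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of a one-time composite expansion; not Keycloak code.
class ExpandOnceSketch {
    // Breadth-first expansion: a role is enqueued only the first time it is seen,
    // so the work is bounded by the roles reachable from 'direct'.
    static Set<String> expand(Collection<String> direct, Map<String, Set<String>> composites) {
        Set<String> effective = new LinkedHashSet<>(direct);
        Deque<String> queue = new ArrayDeque<>(effective);
        while (!queue.isEmpty()) {
            String role = queue.poll();
            for (String child : composites.getOrDefault(role, Set.of())) {
                if (effective.add(child)) queue.add(child); // add() is false for roles already seen
            }
        }
        return effective;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> composites = Map.of(
                "site-admin", Set.of("site-read", "site-write", "auditor"),
                "auditor", Set.of("site-read", "log-read"));
        Set<String> effective = expand(List.of("site-admin", "billing-read"), composites);
        System.out.println(new TreeSet<>(effective));
    }
}
```

Because the seen-set is shared across the whole expansion, roles the user cannot reach are never examined, and cyclic composites terminate without extra handling.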

### Actual behavior

`ClientRoleMappingsResource.getCompositeClientRoleMappings()` (lines 130-145 in `services/src/main/java/org/keycloak/services/resources/admin/ClientRoleMappingsResource.java`) iterates all client roles via `client.getRolesStream()` and filters with `user.hasRole()`. Each `hasRole()` call triggers recursive composite expansion through `KeycloakModelUtils.searchFor()` with a new `HashSet` per invocation.

Current code:
```java
Stream<RoleModel> roles = client.getRolesStream();   // ALL client roles (2,400+)
return roles.filter(user::hasRole).map(toBriefRepresentation);  // hasRole() per role
```

Proposed fix using existing `RoleUtils.expandCompositeRoles()`:
```java
Set<RoleModel> directRoles = user.getRoleMappingsStream().collect(Collectors.toSet());
Set<RoleModel> effectiveRoles = RoleUtils.expandCompositeRoles(directRoles);
return effectiveRoles.stream()
        .filter(r -> r.isClientRole() && r.getContainerId().equals(client.getId()))
        .map(toBriefRepresentation);
```

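This changes the algorithm from O(C × M × D) to O(M × D + E), where E is the number of effective roles produced by the expansion (~212 in our case), eliminating ~800,000 recursive checks per request. The cost no longer depends on C, the total number of client roles.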

**Validated results (patched image deployed to a production-like environment, 6 replicas):**

| Scenario | Before | After | Improvement |
|----------|--------|-------|-------------|
| Sequential (warm cache) | 1.009s | 0.115s | 8.8x |
| 60 concurrent avg | 6.822s | 0.604s | 11x |
| 60 concurrent p95 | 8.794s | 0.663s | 13x |
| Sustained 60 rps / 30s avg | ~7-23s | 0.144s | 50-160x |

Responses are byte-for-byte identical between patched and unpatched versions (212 roles, same IDs and content).

### How to Reproduce?

1. Create a realm with a client containing ~2,400+ roles
2. Create ~12 composite roles, each bundling ~16-17 leaf client roles
3. Assign ~20 direct roles (mix of composite and leaf) to a user, resulting in ~212 effective roles
4. Call `GET /admin/realms/{realm}/users/{user-id}/role-mappings/clients/{client-id}/composite`
5. Observe ~1s latency for a single request
6. Send 60 concurrent requests to the same endpoint
7. Observe average latency of 6-7s, p95 of 8-9s

The latency scales with the total number of client roles, not the user's role count. Adding more client roles (even unrelated to the user) increases per-request latency.
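Steps 4-7 can be driven with a small client such as the sketch below, which uses the JDK's `java.net.http.HttpClient`. It is one possible harness and not the tool used for the numbers in this report; the class `CompositeEndpointLoadSketch`, its methods, and the command-line arguments (full endpoint URL and an admin bearer token) are assumptions for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical load harness; URL and token are supplied by the caller.
class CompositeEndpointLoadSketch {
    // Nearest-rank percentile over an ascending-sorted array of latencies.
    static long percentile(long[] sortedMillis, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sortedMillis.length);
        return sortedMillis[Math.max(0, rank - 1)];
    }

    // Sends 'parallel' requests at once and returns their latencies in ms, sorted ascending.
    static long[] fire(HttpClient client, HttpRequest request, int parallel) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallel);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < parallel; i++) {
                futures.add(pool.submit(() -> {
                    long start = System.nanoTime();
                    client.send(request, HttpResponse.BodyHandlers.discarding());
                    return (System.nanoTime() - start) / 1_000_000;
                }));
            }
            long[] millis = new long[parallel];
            for (int i = 0; i < parallel; i++) millis[i] = futures.get(i).get();
            Arrays.sort(millis);
            return millis;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: <endpoint-url> <admin-bearer-token>");
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(args[0]))
                .header("Authorization", "Bearer " + args[1])
                .timeout(Duration.ofSeconds(60))
                .GET()
                .build();
        long[] ms = fire(client, request, 60); // 60 parallel requests, as in step 6
        System.out.printf("avg=%.0f ms, p95=%d ms%n",
                Arrays.stream(ms).average().orElse(0), percentile(ms, 95));
    }
}
```

Running it once before and once after adding unrelated client roles to the same client is a simple way to check the scaling claim above.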

### Anything else?

We have a patched image deployed and validated in our environment. We plan to submit a PR with the fix and integration tests. The fix uses `RoleUtils.expandCompositeRoles()` which already exists in the codebase (introduced via work on composite role optimization) but was never applied to this specific endpoint.

Note: this issue was investigated and the fix developed with assistance from Claude (Anthropic AI). The algorithmic analysis, patching, deployment, and load test validation were performed by a human engineer.

Composite client role mappings endpoint is slow and degrades under concurrency with many client roles #47157