[KEYCLOAK-7950] - Fixes user pagination when using filtering users members of groups#5576
Conversation
1e3037f to
40b742e
Compare
There was a problem hiding this comment.
Any specific reason this branch uses different order and comparison function than the ones above?
There was a problem hiding this comment.
This does look like an n+1 query to me. Wouldn't it be possible to fetch the users directly inside the above query?
There was a problem hiding this comment.
I find the whole query generation code extremely hard to read and understand. Wouldn't it be possible to at least add a comment sketching the generated query? I do prefer readable code over comments any day but this is too much for me or I did not have enough coffee, yet... :)
40b742e to
32bad1f
Compare
|
@sschu, regarding the usage of the JPA Criteria API, you can check the generated SQL by setting Thanks for the feedback, I was expecting to hear from you that it worked so I could refactor the code a bit. I think that now we should perform slightly better as we are returning only |
6c42a71 to
fe59ce5
Compare
|
We just tested this with our target setup of 250K users in 220 groups with permissions enabled. This again times out. I think what is missing in the initial retrieval of the ids is the pagination. In our setup around 60k user ids will be returned for the biggest group (which is Germany, in our case:)). If this would be limited to the given page size, it should be much more efficient. Two minor things: |
fe59ce5 to
e49f1d3
Compare
|
@sschu, after some more testing, I found that the generated query was reading user table twice. If you could check again my changes, I appreciate. I'm going to run some more tests here now using a dataset similar to yours. I'm using 150k users, very few users per groups, and without timeouts. |
|
@sschu btw, you are sending first/max result parameters when querying users, right? Or are you fetching everything ? |
6a134f1 to
b456efe
Compare
hmlnarik
left a comment
There was a problem hiding this comment.
The built SELECT as well as internal representations of objects have to be optimized.
There was a problem hiding this comment.
We want Hibernate to provide values of the fillters as parameters, hence values should be Objects, not Strings.
There was a problem hiding this comment.
There was a problem hiding this comment.
I believe we should.
The current changes in UsersResources are beyond the API contract already, since according to the JavaDoc documentation, the attributes can only contain first, last, username, and email, and for these the type would stay String.
So if we are to fix this issue, we need to change the API contract anyway. Hence my take is to change the signature to make it compile time rather than runtime issue, add a big fat note into migration guide that this change takes place and that the attributes can contain other attribute names than what has been specified in org.keycloak.storage.user.UserQueryProvider.searchForUser javadocs, and that the code needs to check value type as well.
@stianst Do you agree with this change? Could you please comment here?
There was a problem hiding this comment.
I would say that support additional attributes is probably better than changing the method signature. At the worst case, people will just ignore any unknown/unnecessary attribute. If we wanted to really restrict the attributes that can go into the map, we could probably just have them as arguments instead of entries in a map.
Note that in this PR, I'm not really changing any behavior but using Criteria API (which I'm still evaluating).
There was a problem hiding this comment.
Continuing discussion below.
There was a problem hiding this comment.
UserEntity should be returned, not String
There was a problem hiding this comment.
queryBuilder.select(root);
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Not needed, since we're already selecting UserModels due to changes on L708
There was a problem hiding this comment.
Here the costly WHERE id IN (SELECT DISTINCT user.ID ...) is created. We only need to WHERE EXISTS (SELECT 1 FROM UserGroupMembership ...) which is cheaper.
That needs to be optimized. See https://docs.oracle.com/javaee/6/api/javax/persistence/criteria/CriteriaBuilder.html#exists(javax.persistence.criteria.Subquery)
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
There was a problem hiding this comment.
List should be passed in directly, not JSON serialization.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
There was a problem hiding this comment.
The realm.getGroups() call is very expensive on the DB. In our scenario it times out (3061 groups in the DB currently). It fetches all Groups with associated attributes, group roles and child groups as n+1 selects. This is nearly 10K queries. Ideally, only IDs of groups with permissions enabled would be fetched. Isn't there a way to identify groups with permissions enabled directly in the database using some kind of join?
There was a problem hiding this comment.
I was a bit curious why the lazy loading of the groups is triggered in the first place because loading the 3061 group entities is not a problem. As it turns out the call to realm.getGroups() calls RealmCacheSession.getGroupById() for every group which in turn creates a CachedGroup for every Group. This has the side effect of triggering the lazy loading for all groups. In this case I guess it is really better to bypass the cache if this is possible. I can imagine there are more cases like this and a more general solution would be necessary to not trigger lazy loading of large collections by populating the cache. @hmlnarik I guess this is your domain?
There was a problem hiding this comment.
@sschu, I did something different. Groups are obtained based on the existence of a resource with a Group type. I also tried to remove N+1 queries when retrieving resources and policies from the database.
Initial tests using a database with 7500 groups show a significant improvement (each group with permission enabled) in response time. You should see a delay during the first load, but it should not hang. Wondering if you can test the new changes and give some feedback ...
In order to test my changes, you will need to execute the following UPDATE:
update keycloak.RESOURCE_SERVER_RESOURCE set type = 'Group' where name like 'group.resource.%'
If we agree about these initial changes, the command above will be part of the migration.
There was a problem hiding this comment.
I just retested this with our current dataset. I can confirm that
a) Retrieving the group resources is as efficient as it can get on the DB looking for resources of type "Group", i.e. no N+1 triggering of lazy-loaded associations.
b) The caching works so the second and all follow-up requests are reasonably fast.
c) However, evaluating evaluating the permissions on the first request still takes quite some time ~ 2 minutes with ca. 200 groups. I can see that for every group 11 queries are generated, that's 2K+ DB queries in our case. To me, it looks like they are partly redundant, i.e. I suspect some data is fetched because lazy loading is triggered by caching and not because the data is actually needed in the computation. The responsible code is in the generic authz part. Not sure it is possible to somehow eager-load and cache all necessary entities and if that is faster in the end. Currently, these are the queries per resource:
Hibernate:
select
policyenti0_.ID as ID1_66_0_,
policyenti4_.ID as ID1_66_1_,
policyenti0_.DECISION_STRATEGY as DECISION2_66_0_,
policyenti0_.DESCRIPTION as DESCRIPT3_66_0_,
policyenti0_.LOGIC as LOGIC4_66_0_,
policyenti0_.NAME as NAME5_66_0_,
policyenti0_.OWNER as OWNER6_66_0_,
policyenti0_.RESOURCE_SERVER_ID as RESOURCE8_66_0_,
policyenti0_.TYPE as TYPE7_66_0_,
config5_.POLICY_ID as POLICY_I1_46_1__,
config5_.VALUE as VALUE2_46_1__,
config5_.NAME as NAME3_1__,
policyenti4_.DECISION_STRATEGY as DECISION2_66_1_,
policyenti4_.DESCRIPTION as DESCRIPT3_66_1_,
policyenti4_.LOGIC as LOGIC4_66_1_,
policyenti4_.NAME as NAME5_66_1_,
policyenti4_.OWNER as OWNER6_66_1_,
policyenti4_.RESOURCE_SERVER_ID as RESOURCE8_66_1_,
policyenti4_.TYPE as TYPE7_66_1_,
associated3_.POLICY_ID as POLICY_I1_1_0__,
associated3_.ASSOCIATED_POLICY_ID as ASSOCIAT2_1_0__,
config5_.POLICY_ID as POLICY_I1_46_1__,
config5_.VALUE as VALUE2_46_1__,
config5_.NAME as NAME3_1__
from
RESOURCE_SERVER_POLICY policyenti0_
inner join
RESOURCE_POLICY resources1_
on policyenti0_.ID=resources1_.POLICY_ID
inner join
RESOURCE_SERVER_RESOURCE resourceen2_
on resources1_.RESOURCE_ID=resourceen2_.ID
inner join
ASSOCIATED_POLICY associated3_
on policyenti0_.ID=associated3_.POLICY_ID
inner join
RESOURCE_SERVER_POLICY policyenti4_
on associated3_.ASSOCIATED_POLICY_ID=policyenti4_.ID
left outer join
POLICY_CONFIG config5_
on policyenti0_.ID=config5_.POLICY_ID
where
policyenti0_.RESOURCE_SERVER_ID=?
and resourceen2_.RESOURCE_SERVER_ID=?
and resourceen2_.ID=?
Hibernate:
select
config0_.POLICY_ID as POLICY_I1_46_0_,
config0_.VALUE as VALUE2_46_0_,
config0_.NAME as NAME3_0_
from
POLICY_CONFIG config0_
where
config0_.POLICY_ID=?
Hibernate:
select
associated0_.POLICY_ID as POLICY_I1_1_0_,
associated0_.ASSOCIATED_POLICY_ID as ASSOCIAT2_1_0_,
policyenti1_.ID as ID1_66_1_,
policyenti1_.DECISION_STRATEGY as DECISION2_66_1_,
policyenti1_.DESCRIPTION as DESCRIPT3_66_1_,
policyenti1_.LOGIC as LOGIC4_66_1_,
policyenti1_.NAME as NAME5_66_1_,
policyenti1_.OWNER as OWNER6_66_1_,
policyenti1_.RESOURCE_SERVER_ID as RESOURCE8_66_1_,
policyenti1_.TYPE as TYPE7_66_1_
from
ASSOCIATED_POLICY associated0_
inner join
RESOURCE_SERVER_POLICY policyenti1_
on associated0_.ASSOCIATED_POLICY_ID=policyenti1_.ID
where
associated0_.POLICY_ID=?
Hibernate:
select
resources0_.POLICY_ID as POLICY_I1_62_0_,
resources0_.RESOURCE_ID as RESOURCE2_62_0_,
resourceen1_.ID as ID1_67_1_,
resourceen1_.DISPLAY_NAME as DISPLAY_2_67_1_,
resourceen1_.ICON_URI as ICON_URI3_67_1_,
resourceen1_.NAME as NAME4_67_1_,
resourceen1_.OWNER as OWNER5_67_1_,
resourceen1_.OWNER_MANAGED_ACCESS as OWNER_MA6_67_1_,
resourceen1_.RESOURCE_SERVER_ID as RESOURCE8_67_1_,
resourceen1_.TYPE as TYPE7_67_1_
from
RESOURCE_POLICY resources0_
inner join
RESOURCE_SERVER_RESOURCE resourceen1_
on resources0_.RESOURCE_ID=resourceen1_.ID
where
resources0_.POLICY_ID=?
Hibernate:
select
scopes0_.POLICY_ID as POLICY_I1_71_0_,
scopes0_.SCOPE_ID as SCOPE_ID2_71_0_,
scopeentit1_.ID as ID1_68_1_,
scopeentit1_.DISPLAY_NAME as DISPLAY_2_68_1_,
scopeentit1_.ICON_URI as ICON_URI3_68_1_,
scopeentit1_.NAME as NAME4_68_1_,
scopeentit1_.RESOURCE_SERVER_ID as RESOURCE5_68_1_
from
SCOPE_POLICY scopes0_
inner join
RESOURCE_SERVER_SCOPE scopeentit1_
on scopes0_.SCOPE_ID=scopeentit1_.ID
where
scopes0_.POLICY_ID=?
Hibernate:
select
roleentity0_.ID as ID1_42_0_,
roleentity0_.CLIENT as CLIENT7_42_0_,
roleentity0_.CLIENT_REALM_CONSTRAINT as CLIENT_R2_42_0_,
roleentity0_.CLIENT_ROLE as CLIENT_R3_42_0_,
roleentity0_.DESCRIPTION as DESCRIPT4_42_0_,
roleentity0_.NAME as NAME5_42_0_,
roleentity0_.REALM as REALM8_42_0_,
roleentity0_.REALM_ID as REALM_ID6_42_0_
from
KEYCLOAK_ROLE roleentity0_
where
roleentity0_.ID=?
Hibernate:
select
compositer0_.COMPOSITE as COMPOSIT1_19_0_,
compositer0_.CHILD_ROLE as CHILD_RO2_19_0_,
roleentity1_.ID as ID1_42_1_,
roleentity1_.CLIENT as CLIENT7_42_1_,
roleentity1_.CLIENT_REALM_CONSTRAINT as CLIENT_R2_42_1_,
roleentity1_.CLIENT_ROLE as CLIENT_R3_42_1_,
roleentity1_.DESCRIPTION as DESCRIPT4_42_1_,
roleentity1_.NAME as NAME5_42_1_,
roleentity1_.REALM as REALM8_42_1_,
roleentity1_.REALM_ID as REALM_ID6_42_1_
from
COMPOSITE_ROLE compositer0_
inner join
KEYCLOAK_ROLE roleentity1_
on compositer0_.CHILD_ROLE=roleentity1_.ID
where
compositer0_.COMPOSITE=?
Hibernate:
select
roleentity0_.ID as col_0_0_
from
KEYCLOAK_ROLE roleentity0_
where
roleentity0_.CLIENT_ROLE=0
and roleentity0_.NAME=?
and roleentity0_.REALM=?
Hibernate:
select
scopes0_.POLICY_ID as POLICY_I1_71_0_,
scopes0_.SCOPE_ID as SCOPE_ID2_71_0_,
scopeentit1_.ID as ID1_68_1_,
scopeentit1_.DISPLAY_NAME as DISPLAY_2_68_1_,
scopeentit1_.ICON_URI as ICON_URI3_68_1_,
scopeentit1_.NAME as NAME4_68_1_,
scopeentit1_.RESOURCE_SERVER_ID as RESOURCE5_68_1_
from
SCOPE_POLICY scopes0_
inner join
RESOURCE_SERVER_SCOPE scopeentit1_
on scopes0_.SCOPE_ID=scopeentit1_.ID
where
scopes0_.POLICY_ID=?
Hibernate:
select
scopes0_.POLICY_ID as POLICY_I1_71_0_,
scopes0_.SCOPE_ID as SCOPE_ID2_71_0_,
scopeentit1_.ID as ID1_68_1_,
scopeentit1_.DISPLAY_NAME as DISPLAY_2_68_1_,
scopeentit1_.ICON_URI as ICON_URI3_68_1_,
scopeentit1_.NAME as NAME4_68_1_,
scopeentit1_.RESOURCE_SERVER_ID as RESOURCE5_68_1_
from
SCOPE_POLICY scopes0_
inner join
RESOURCE_SERVER_SCOPE scopeentit1_
on scopes0_.SCOPE_ID=scopeentit1_.ID
where
scopes0_.POLICY_ID=?
Hibernate:
select
scopes0_.POLICY_ID as POLICY_I1_71_0_,
scopes0_.SCOPE_ID as SCOPE_ID2_71_0_,
scopeentit1_.ID as ID1_68_1_,
scopeentit1_.DISPLAY_NAME as DISPLAY_2_68_1_,
scopeentit1_.ICON_URI as ICON_URI3_68_1_,
scopeentit1_.NAME as NAME4_68_1_,
scopeentit1_.RESOURCE_SERVER_ID as RESOURCE5_68_1_
from
SCOPE_POLICY scopes0_
inner join
RESOURCE_SERVER_SCOPE scopeentit1_
on scopes0_.SCOPE_ID=scopeentit1_.ID
where
scopes0_.POLICY_ID=?
There was a problem hiding this comment.
Regarding #3, I also tried to remove unnecessary fetches when caching entries and only keep what is going to be used during the processing, I'll check again any other place we can improve.
Will let you know once I have something. I think the difference between your test environment and mine is that most of the permissions I have for groups have no policies. Will change that.
Regarding your dataset, you have almost 4k groups but only ~300 of them have permissions enabled, right?
There was a problem hiding this comment.
@sschu, I think I was able to make things better. I know we still have a lot of room for improvements but that would require reviewing the functionality more deeply in order to optimize how fine-grained permissions in admin are being done.
In summary, my latest changes (after your last comment) improved the time taken to load users during the first request. I think you should experience a slight improvement in response time. I found other places we could avoid N+1 queries and improvements to fine-grained admin code to make better use of our authz services API.
In a nutshell, I think I'm happier now with this initial set of changes to fine-grained permissions in admin, especially regarding user/groups permissions. Please, let me know how the latest changes work for you. I hope you get a better experience than my last round of changes :)
There was a problem hiding this comment.
Currently, we have 3K groups with 220 having permissions enabled. Target is 60K groups with 220 having permissions enabled. Every permissions has a different role-based policy attached targeting a role that says "may_manage_group_x".
Your last changes improved things a bit as you said. We are down to 9 db queries per group. 8 to go... :)
5e779bd to
ebd1c73
Compare
|
User storage provider is a supported API so we need to find a way to make these changes backwards compatible. Existing user storage provider implementations should continue to work on RH-SSO 7.3. I haven't yet had time to review this fully so a bit fuzzy on what is actually being changed here. |
stianst
left a comment
There was a problem hiding this comment.
We can not break a supported API, so any breaking changes will have to be removed or done in such a way that existing user storage providers continue to work as is, but may need updating to add new capabilities. Now, thinking about that in this context if existing user storage providers works as they do today, but they need to make updates to make user pagination work when filtering users members of groups that should be fine.
6f04fc6 to
7e9f1a4
Compare
|
I like the change to Criteria API since it leads to better portable code (my previous comments still apply though). That said, the changes in behaviour are on L241 and L248. The Since updating the Shouldn't |
Isn't adding more allowed attributes a compatible change or am I missing something here? I cannot really imagine somebody is relying on certain attributes being rejected... |
|
@sschu Unfortunately, not. Since JavaDoc says only these attributes are valid in I am not arguing how correct such an implementation is from engineering perspective, and that it fails basic sanity checks. The point is that it is a valid supported implementation, and has to work even after this change. There are options to overcome this for the current PR to stay conservative. We need to implement one. |
There was a problem hiding this comment.
This triggers the lazy-loading of all user associations (attributes, credentials) so the effect of the briefRepresentation-change from @hmlnarik in L252 is lost. The loading is triggered in JpaUserProvider.java L784 calling UserModel getUserById(String id, RealmModel realm) that caches the user.
There was a problem hiding this comment.
+1 That part should be removed completely (see the related comment in JpaUserProvider).
7e9f1a4 to
bdc42c5
Compare
|
@sschu, PR updated with latest changes. I managed to review the cache strategy for some entities as well as how we are fetching data from the database. Basically, these changes involve user, groups, policies, and resources. You should not see some of those N+1 queries anymore. The initial request is faster as well as the subsequent requests since data is being cached right after it is fetched from the database. The new changes also provide the necessary migration logic to update the resources related to groups with the |
| <databaseChangeLog xmlns="http://www.liquibase.org/xml/ns/dbchangelog" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.1.xsd"> | ||
|
|
||
| <changeSet author="psilva@redhat.com" id="4.6.0-KEYCLOAK-7950"> | ||
| <customChange class="org.keycloak.connections.jpa.updater.liquibase.custom.SetTypeToGroupAdminResources"/> |
There was a problem hiding this comment.
In this case, http://www.liquibase.org/documentation/changes/update.html would be a better candidate since it would make it to manual migration script.
5e1fe90 to
0769675
Compare
hmlnarik
left a comment
There was a problem hiding this comment.
Thanks for the changes, @pedroigor, great job! From my perspective, the PR can go in. I'm not merging this though, this I left to @stianst once he is happy with this PR too.
The optimization requested in JpaUserProvider can go in a separate PR.
0172da2 to
096edc0
Compare
|
@pedroigor The Jenkins CI failure looks relevant, could you please check? |
096edc0 to
cfddb18
Compare
|
@keycloak-ci-bot test |
|
@hmlnarik Job is scheduled |
|
@keycloak-ci-bot test |
|
@hmlnarik Job is scheduled |
There was a problem hiding this comment.
This applies to the rest of the lazy loaders as well. Maybe worth adding a logic to DefaultLazyLoader (a Producer in constructor) for case of source == null.
e.g.
this.subGroups = new DefaultLazyLoader<>(Collections::emptySet, source -> source.getSubGroups().stream().map(GroupModel::getId).collect(Collectors.toSet()));
hmlnarik
left a comment
There was a problem hiding this comment.
NPE can be thrown from most places where DefaultLazyLoader is used.
|
@hmlnarik, yeah. I was trying to check whether or not that could actually fix the issue. Put that logic in the loaded is better. |
cfddb18 to
850d96e
Compare
|
@keycloak-ci-bot test |
|
@pedroigor Job is scheduled |
|
@hmlnarik, jenkins build looks fine now. |
I'm happy if @hmlnarik is happy so dismissing my review
No description provided.