Skip to content

fix: adds affinity and other scheduling to the operator#29977

Merged
vmuzikar merged 1 commit into
keycloak:mainfrom
shawkins:iss29258
Jul 3, 2024
Merged

fix: adds affinity and other scheduling to the operator#29977
vmuzikar merged 1 commit into
keycloak:mainfrom
shawkins:iss29258

Conversation

@shawkins

Copy link
Copy Markdown
Contributor

closes: #29258

Draft of a scheduling spec with the options that are most likely to be requested - affinity, topologyspread, priorityclass, and tolerations.

The handling is similar to other constructs - if the user has populated the field in the unsupported spec, that takes presedence.

A new label, app.kubernetes.io/component, is added to distinguish the server pods.

The default node anti-affinity is added if no other affinity has been specified. This mechanism may not have great usability. It could be better to have a withDefaults field on the scheduling spec.

Similarly I'm proposing to go a step further and handle the default affinity or spread - depending upon whether an external cache is in use. Issues with doing this:

  • cache configuration is currently done though additional properties. However we do have a cache spec where we could have first-class properties.
  • the cache configuration could be baked into the image, which we currently don't know how to account for. At worst the user would need to supply their own spread constraint(s).

@ahus1 @vmuzikar @abelmatos WDYT?

@keycloak-github-bot keycloak-github-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unreported flaky test detected, please review

@keycloak-github-bot

Copy link
Copy Markdown

Unreported flaky test detected

If the flaky tests below are affected by the changes, please review and update the changes accordingly. Otherwise, a maintainer should report the flaky tests prior to merging the PR.

org.keycloak.testsuite.broker.KcOidcBrokerTest#testPostBrokerLoginFlowWithOTP_bruteForceEnabled

Keycloak CI - Java Distribution IT (windows-latest - temurin - 19)

java.lang.AssertionError: expected:<Invalid authenticator code.> but was:<null>
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.failNotEquals(Assert.java:835)
	at org.junit.Assert.assertEquals(Assert.java:120)
	at org.junit.Assert.assertEquals(Assert.java:146)
...

Report flaky test

@ahus1

ahus1 commented Jun 3, 2024

Copy link
Copy Markdown
Member

Thank you for this PR. What I'm reading here is that you're about to expose the Kubernetes constructs as is. When doing this, we'll offer a non-opinionated way to schedule Pods, which is IMHO ok as we don't seem to have an opinionated and agreed-upon solution.

Eventually, more coarse grained options might emerge as we learn about how this can be used in different environments. I'd say even then exposing this as a supported feature makes sense as we won't be able to foresee any possible scenario.

Even if the deployment uses an external Infinispan, there are still the embedded Infinispans for now, and therefore I'd advise the Operator not to take any caching configuration into account - at least for now. @pruivo is working an a PR which is planned for KC26 which would allow running Keycloak without its embedded distributed caches, but this is still some time in the future.

So for now, I'd say

  • have a default node anti-affinity,
  • ignore the caching options, and
  • expose the scheduling as is.

cc: @ryanemerson

@shawkins

shawkins commented Jun 3, 2024

Copy link
Copy Markdown
Contributor Author

@ahus1 I'm fine with removing the defaults around spread constraints and moving that to a topic for the scaling guide.

@ahus1

ahus1 commented Jun 3, 2024

Copy link
Copy Markdown
Member

I'm fine with removing the defaults around spread constraints and moving that to a topic for the scaling guide.

No, please add a default best-effort spread. I'm just against making the default settings dependent on the cache configuration.

@shawkins

shawkins commented Jun 3, 2024

Copy link
Copy Markdown
Contributor Author

No, please add a default best-effort spread. I'm just against making the default settings dependent on the cache configuration.

The default spread is just an affinity for the same zone. The pr has been updated with that.

Should just need some docs.

@vmuzikar WDYT?

@keycloak-github-bot keycloak-github-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unreported flaky test detected, please review

@keycloak-github-bot

Copy link
Copy Markdown

Unreported flaky test detected

If the flaky tests below are affected by the changes, please review and update the changes accordingly. Otherwise, a maintainer should report the flaky tests prior to merging the PR.

org.keycloak.testsuite.oauth.TokenIntrospectionTest#testUnsupportedToken

Keycloak CI - Base IT (6)

java.lang.NullPointerException: Cannot invoke "com.fasterxml.jackson.databind.JsonNode.asBoolean()" because the return value of "com.fasterxml.jackson.databind.JsonNode.get(String)" is null
	at org.keycloak.testsuite.oauth.TokenIntrospectionTest.testUnsupportedToken(TokenIntrospectionTest.java:313)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
...

Report flaky test

@vmuzikar vmuzikar left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shawkins Thank you for the changes!

@shawkins shawkins marked this pull request as ready for review June 25, 2024 10:52
@shawkins shawkins requested review from a team as code owners June 25, 2024 10:52
@shawkins

Copy link
Copy Markdown
Contributor Author

@vmuzikar added the docs, should be fully ready for review now.

@mabartos mabartos left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shawkins Great work! I've just put a little brain dump below. And there's also some changes conflict.

nit: I also like when we have test-covered the CR serialization as they're running a few ms and may bring additional test coverage - but in this case the PodTemplate test should be sufficient enough.

Comment thread docs/guides/operator/advanced-configuration.adoc Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't be good for users to also briefly explain why we want to prevent the stretched clusters? When we support the affinity configuration, we should mention the cons of the different AZ and the embedded Infinispan, correct? We will avoid some future confusion when users use different affinity settings.

@shawkins WDYT?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a little more on this. I'd defer to @ryanemerson if we want a more detailed explanation.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what we have for now is fine. I think we need a dedicated guide explaining different architectures, stretch vs non-stretch etc, as it's a complex topic.

Comment thread docs/documentation/upgrading/topics/changes/changes-26_0_0.adoc Outdated
Comment thread docs/guides/operator/advanced-configuration.adoc Outdated
@shawkins

Copy link
Copy Markdown
Contributor Author

@ryanemerson @ahus1 @vmuzikar in the docs should we call out specifically recommending an affinity for the same zone as the database?

@ahus1

ahus1 commented Jun 28, 2024

Copy link
Copy Markdown
Member

@shawkins

should we call out specifically recommending an affinity for the same zone as the database?

The good thing about pod affinity is that it is now (1) supported by the Operator and no longer part of the unsupported pod template and (2) we have a reasonable default.

Deploying Keycloak on the same region as the database seems to be a good idea, still I didn't tested this how much it actually impacts Keycloak. And if the database is running outside of Kubernetes, you would start coding the zone manually in the affinity rules, and not use the "try to run all pods in the same zone but I don't care which".

So I'd say it is probably good enough for now.

Once we work with this a bit more, I could see that we provide some predefined scheduling templates to hide the complexity and to promote some well-tested and well-understood scheduling methods. This would then be a follow-up PR.

@ryanemerson

Copy link
Copy Markdown
Contributor

+1 to Alexander's comment. I think that more advanced/opinionated setting examples should exist as part of the Keycloak HA guide or similar, where we explain in detail the trade-offs of different architectures and how they apply to specific use-cases.

@mabartos mabartos left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shawkins Just last minor thing about the example formatting, otherwise LGTM

Comment thread docs/guides/operator/advanced-configuration.adoc Outdated
@shawkins

shawkins commented Jul 2, 2024

Copy link
Copy Markdown
Contributor Author

@shawkins Just last minor thing about the example formatting, otherwise LGTM

Should be good now.

@vmuzikar vmuzikar left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added one minor comment. LGTM otherwise.

Comment thread docs/guides/operator/advanced-configuration.adoc Outdated
closes: keycloak#29258

Signed-off-by: Steve Hawkins <shawkins@redhat.com>

@vmuzikar vmuzikar left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shawkins LGTM, thank you.

@vmuzikar

vmuzikar commented Jul 3, 2024

Copy link
Copy Markdown
Contributor

@mabartos I believe your comments has been resolved. I'm overriding your review request so we can merge. Once you're back, let me know if you have any further concerns, we can address them as a follow-up.

@vmuzikar vmuzikar dismissed mabartos’s stale review July 3, 2024 18:06

Review comments has been resolved

@vmuzikar vmuzikar merged commit a7ae90c into keycloak:main Jul 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support pod affinity settings in the Keycloak Operator

5 participants