Skip to content

Inkless 4.2 sync 4.2.1#649

Open
gqmelo wants to merge 73 commits into
inkless-4.2from
inkless-4.2-sync-4.2.1
Open

Inkless 4.2 sync 4.2.1#649
gqmelo wants to merge 73 commits into
inkless-4.2from
inkless-4.2-sync-4.2.1

Conversation

@gqmelo

@gqmelo gqmelo commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

clolov and others added 30 commits February 16, 2026 11:39
…ker provisioning (#21394)

## Summary

Fixes bugs where `--jdk-version` and `--jdk-arch` parameters were
ignored during system test worker provisioning, and refactors
`vagrant/base.sh` to support flexible JDK versions without code changes.

---

## Problem

The Vagrant provisioning script (`vagrant/base.sh`) had two bugs that
caused JDK version parameters to be ignored:

| Bug | Problem |
|-----|---------|
| **#1: `--jdk-version` ignored** | `JDK_FULL` was hardcoded to
`17-linux-x64`, so passing `--jdk-version 25` still downloaded JDK 17 |
| **#2: `--jdk-arch` ignored** | Architecture parameter was passed but
never used in the S3 download URL |

---

## Solution

- Validate `JDK_MAJOR` and `JDK_ARCH` input parameters with regex
- Dynamically construct `JDK_FULL` from `JDK_MAJOR` and `JDK_ARCH`
- Update S3 path to use `/jdk/` subdirectory
- Add logging for debugging

---

## Changes

### `vagrant/base.sh`

| Change | Description |
|--------|-------------|
| **Input validation** | Added regex validation for `JDK_MAJOR` and
`JDK_ARCH` with sensible defaults |
| **Dynamic construction** | `JDK_FULL` is now constructed from
`JDK_MAJOR` and `JDK_ARCH` if not explicitly provided |
| **Updated S3 path** | Changed URL from
`/kafka-packages/jdk-{version}.tar.gz` to
`/kafka-packages/jdk/jdk-{version}.tar.gz` |
| **Logging** | Added debug output for JDK configuration |
| **Backward compatibility** | Vagrantfile can still pass `JDK_FULL`
directly; the script validates and uses it if valid |

---

## S3 Path Change

### Old Path
```
s3://kafka-packages/jdk-{version}.tar.gz
```

### New Path
```
s3://kafka-packages/jdk/jdk-{version}.tar.gz
```

### Available JDKs in `s3://kafka-packages/jdk/`

| File | Version | Architecture |
|------|---------|--------------|
| `jdk-7u80-linux-x64.tar.gz` | 7u80 | x64 |
| `jdk-8u144-linux-x64.tar.gz` | 8u144 | x64 |
| `jdk-8u161-linux-x64.tar.gz` | 8u161 | x64 |
| `jdk-8u171-linux-x64.tar.gz` | 8u171 | x64 |
| `jdk-8u191-linux-x64.tar.gz` | 8u191 | x64 |
| `jdk-8u202-linux-x64.tar.gz` | 8u202 | x64 |
| `jdk-11.0.2-linux-x64.tar.gz` | 11.0.2 | x64 |
| `jdk-17-linux-x64.tar.gz` | 17 | x64 |
| `jdk-18.0.2-linux-x64.tar.gz` | 18.0.2 | x64 |
| `jdk-21.0.1-linux-x64.tar.gz` | 21.0.1 | x64 |
| `jdk-21.0.1-linux-aarch64.tar.gz` | 21.0.1 | aarch64 |
| `jdk-25-linux-x64.tar.gz` | 25 | x64 |
| `jdk-25-linux-aarch64.tar.gz` | 25 | aarch64 |
| `jdk-25.0.1-linux-x64.tar.gz` | 25.0.1 | x64 |
| `jdk-25.0.1-linux-aarch64.tar.gz` | 25.0.1 | aarch64 |
| `jdk-25.0.2-linux-x64.tar.gz` | 25.0.2 | x64 |
| `jdk-25.0.2-linux-aarch64.tar.gz` | 25.0.2 | aarch64 |

---

## Future JDK Releases

> **IMPORTANT: No code changes required for future Java major/minor
releases!**

The validation regex supports all version formats:
- **Major versions**: `17`, `25`, `26`
- **Minor versions**: `25.0.1`, `25.0.2`, `26.0.1`
- **Legacy format**: `8u144`, `8u202`

### Adding New JDK Versions

To add support for a new JDK version (e.g., JDK 26, 25.0.3):

1. Download the JDK tarball from Oracle/Adoptium
2. Rename to follow naming convention:
`jdk-{VERSION}-linux-{ARCH}.tar.gz`
3. Upload to S3: `aws s3 cp jdk-{VERSION}-linux-{ARCH}.tar.gz
s3://kafka-packages/jdk/`
4. Use in tests: `--jdk-version {VERSION} --jdk-arch {ARCH}`

No modifications to `base.sh` or any other scripts are needed.

---

## Benefits

| Before | After |
|--------|-------|
| `--jdk-version` ignored | ✅ Correctly uses specified version |
| `--jdk-arch` ignored | ✅ Correctly uses specified architecture |
| Only major version support | ✅ Full version support (e.g., `25.0.2`) |
| Code change needed for new JDK | ✅ Just upload to S3 and pass version
|

---

## Testing

Tested with different JDK versions to confirm the fix works correctly:

| Test | JDK_MAJOR | Expected | Actual | Result | Test Report |
|------|-----------|----------|--------|--------|-------------|
| JDK 17 | `17` | javac 17.0.4 | javac 17.0.4 | ✅ |
| JDK 25 | `25` | javac 25.0.2 | javac 25.0.2 | ✅ |


---

## Backward Compatibility

- **Vagrantfile**: Continues to work as before
- **Existing workflows**: Default behavior unchanged (JDK 17 on x64
architecture)
- **No breaking changes**: All existing configurations continue to work

---

 Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
…(#21452)

Upgrade Jetty from 12.0.22 to 12.0.32 to address

GHSA-mmxm-8w33-wc4h
(MadeYouReset HTTP/2 DoS, CVSS 7.7 HIGH).

  Note that CVE-2025-5115 only affects the
`org.eclipse.jetty.http2:jetty-http2-common` module. Kafka does not
depend on this module — its embedded Jetty servers (Connect RestServer
and Trogdor JsonRestServer) only use HTTP/1.1 via `ServerConnector`
without any                     `HTTP2ServerConnectionFactory`
configuration. As such, the attack vector is not applicable. This
upgrade from 12.0.22 to 12.0.32 is to keep the dependency up to date.

4.0: apache/kafka#21462
4.1: apache/kafka#21461

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…character (#21471)

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Andrew Schofield
 <aschofield@confluent.io>, Sushant Mahajan <smahajan@confluent.io>
…#21448) (#21501)

Cherry-pick of apache/kafka#21448

Changed the name of method that work only with initialized tasks(not
pending) to better reflect their purpose.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Lucas Brutschy
 <lbrutschy@confluent.io>

Conflicts:
streams/src/main/java/org/apache/kafka/streams/processor/internals/TaskManager.java
streams/src/test/java/org/apache/kafka/streams/processor/internals/TaskManagerTest.java
…incompatibility (#21559)

Jetty 12.0.30+ introduced SLF4J 2.x fluent API usage
(`Logger.atDebug()`) which causes `NoSuchMethodError` at runtime since
Kafka still uses SLF4J 1.7.x. Downgrade to 12.0.25 which includes the
CVE-2025-5115 fix without the SLF4J 2.x incompatibility.

The issue was discovered and discussed in
apache/kafka#21452 (comment).

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…tbeat (#21526)

When the endpoint information epoch known to a member is outdated, the
broker recomputes the endpoint-to-partitions mapping for all members
from scratch. This is expensive because it happens for each member
individually as they catch up to the new epoch.

This commit adds a per-member cache for endpoint-to-partitions results
in StreamsGroup, keyed by member ID. Cache entries are explicitly
invalidated when a member changes, when a member is  removed, or when
the configured topology changes.

The computation and endpoint building logic is moved from
GroupMetadataManager into StreamsGroup. This keeps the cache management
colocated with the member lifecycle methods that need to invalidate it
(updateMember, removeMember, setConfiguredTopology), and also avoids the
overhead of the Collections.unmodifiableMap wrapper returned by the
public members() accessor.

Tests cover cache population, retrieval, and invalidation on task
change, member removal, topology change, and preservation when tasks are
unchanged.

Reviewers: Matthias J. Sax <matthias@confluent.io>
…lag on failed LIST_OFFSETS calls (#21596)

Cherry-pick of #21457 to 4.2.

Updates the `ClassicKafkaConsumer` to clear out the `SubscriptionState`
`endOffsetRequested` flag if the `LIST_OFFSETS` call fails.

Reviewers: Andrew Schofield <aschofield@confluent.io>
Add a note to the upgrading docs for users with clusters of fewer than 3
brokers. Share groups introduce a new internal topic whose default
configuration assumes 3 or more brokers.

This will be cherry-picked back to the 4.2 branch for inclusion in
4.2.1.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>
jackson-annotations has a different versioning starting from 2.20 - 2.20
instead of 2.20.x (no patch digit), so I separated it from the other
jackson dependencies.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
fix Jackson core vulnerability reported in

1. GHSA-72hv-8253-57qq
2. https://www.miggo.io/vulnerability-database/cve/GHSA-72hv-8253-57qq

Issue report is at https://issues.apache.org/jira/browse/KAFKA-20241

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Mickael Maison <mickael.maison@gmail.com>
…roup exists (#21641)

During replay of the __consumer_offsets topic, offset commit records can
appear before streams group records, for example after log compaction.
When OffsetMetadataManager replays an offset commit for a group that
doesn't exist yet, it automatically creates a simple ClassicGroup to
hold the offsets. When the streams group records are subsequently
replayed, getOrMaybeCreatePersistedStreamsGroup fails with "Group X is
not a streams group" because it does not handle simple classic groups.

This adds handling for simple classic groups in
getOrMaybeCreatePersistedStreamsGroup, matching the existing pattern in
getOrMaybeCreatePersistedConsumerGroup. Simple classic groups have no
backing records in __consumer_offsets and can safely be replaced.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Chia-Ping Tsai
 <chia7712@gmail.com>
… migration in 4.2.0 (#21638)

Due to a critical bug in the offline migration code (KAFKA-20254), we
recommend against doing migrations from classic to streams groups in
4.2.0. This adds a note to both the main upgrade guide and the Kafka
Streams upgrade guide, advising users to upgrade to a later release that
includes the fix before attempting the migration.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Christo Lolov
 <lolovc@amazon.com>
…(#21685)

Fix internal documentation links that were using the old path format
(`/documentation/streams/...`) and `.html` extensions, which no longer
resolve correctly in the new docs site structure. Also reorder the
KAFKA-20254 migration warning note in the Streams upgrade guide to
appear after the GA announcement paragraph for better readability.

Reviewers: Matthias J. Sax <matthias@confluent.io>
…ager (#21729)

Log messages in

`org.apache.kafka.controller.ReplicationControlManager#validateAlterPartitionData`
are misleading and are not coherent with the other log messages (where
"current" indicate the controller state coming from
PartitionRegistration).

Reviewers: Josep Prat <josep.prat@aiven.io>
…eout (#21619)

Set pendingRpc to false when controller registration times out. If we do
not set this, controllers cannot send ControllerRegistrationRequests
thereafter. Instead, subsequent calls to
maybeSendControllerRegistration() will always log:
"maybeSendControllerRegistration: waiting for the previous RPC to
complete." The asserts on L294 and L300 fail when pendingRpc does not
get set in onTimeout.

Previously, RegistrationResponseHandler callbacks were invoked from the
NodeToControllerRequestThread. Instead, these callbacks should append an
event to the ControllerRegistrationManger event queue.

Added testRetransmitRegistrationAfterTimeout. This test times out a
controller registration, and checks that the registration manager
channel manager's request queue state is as expected.

Reviewers: José Armando García Sancio <jsancio@apache.org>
…#21782)

[KIP-1228](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1228%3A+Add+Transaction+Version+to+WriteTxnMarkersRequest)
added a WriteTxnMarkersRequest v2 with a TransactionVersion field.
However, TransactionMarkerChannelManager creates its NetworkClient with
discoverBrokerVersions=false, which disables API version negotiation
with peer brokers. Without version discovery, the ApiVersions cache is
never populated — apiVersions.get(nodeId) returns null in
NetworkClient.doSend(), causing it to fall through to
builder.latestAllowedVersion() which blindly uses the highest version
the sending broker knows about rather than negotiating a mutually
supported version.  In this fix we enable discovery and also fix the
system test that could've caught this issue earlier. The system test was
run locally and I verified that it failed without the fix and passed
with the fix.

Reviewers: Artem Livshits <alivshits@confluent.io>, David Jacot
 <djacot@confluent.io>
…eased if request is invalid (#21740)

If an exception is thrown within
kafka.network.Processor#processCompletedReceives close the receive
(return the buffer to the memory pool) if it has not been returned
already.  Buffer may have been returned when successfully creating the
RequestChannel.Request if the api did not require DelayedAllocation

(cherry picked from commit b9882a0020d6553cf51cab673a8ba3fb48bbddac)
Fixes a potential NPE in `ShareFetch`. I have hit it once during
testing, but only once. Fix in progress on trunk and will include in
4.3.0. This is worth putting into 4.2.1.

Reviewers: Nilesh Kumar <nileshkumar3@gmail.com>, aliehsaeedii
 <asaeedi@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
Cherry picking
*

apache/kafka@3e9ae03
*

apache/kafka@6eebb86
For 4.2.1 cut

Reviewers: Andrew Schofield <aschofield@confluent.io>

---------

Co-authored-by: Chirag Wadhwa <cwadhwa@confluent.io>
Co-authored-by: majialong <majialoong@gmail.com>
…group. (#21239)

*What*

- Currently if a consumer/share-consumer calls `close()` before it has
joined a group, then the heartbeat on close will be sent with `epoch` =
-1 and the broker would return "`GroupIdNotFoundException`".
- This was causing couple of tests in `ShareConsumerTest` to be flaky if
the heartbeat to join the group was sent with `epoch` = -1.
- Since this can occur in real scenarios as well, it would be better to
tolerate this exception while we are leaving the group so that the
consumer can close cleanly.

Reviewers: Andrew Schofield <aschofield@confluent.io>
…701)

    Fix to call the shareMembershipMgr reconcile when processing a share
    poll event (not the consumerMembershipManager)

    No changes in logic because maybeReconcile is implemented in the parent
    class AbstractMembershipMgr, but it's confusing and could lead to errors
    if ever we override the maybeReconcile.

    Reviewers: Andrew Schofield <aschofield@confluent.io>
…vent timeout issue (#21235)

In extreme situations, the existing throttling mechanism in share
consumer limits the consumer to processing only a single record at a
time, which can intermittently cause `testComplexShareConsumer` to time
out. I use a known number of records to make the test reliable.

Reviewers: Andrew Schofield <aschofield@confluent.io>
When a share consumer is initially joining a group but not yet fetching
records, the poll loop has no records to wait for. Each iteration of the
loop was creating an instance of `SharePollEvent` without considering
whether an in-flight event existed. As a result, many events could
briefly queue up for no good reason, only to be drained once the
consumer successfully joined the group.

The PR follows a similar design as used in the `AsyncKafkaConsumer` for
managing its poll event, without that code's extra complication of
validating the position.

Reviewers: TaiJuWu <tjwu1217@gmail.com>, Lianet Magrans
 <lmagrans@confluent.io>
… repartition topics (#21817)

This PR fixes `StreamsGroupCommand` to include repartition source
topics when retrieving committed offsets for streams groups.   Offset
resets and deletions are still limited exclusively to source topics.

Testing - unit tests:
- testGetCommittedOffsetsIncludesRepartitionTopics: Verifies that
repartition topics are included while changelog and output topics are
excluded    - testGetCommittedOffsetsWithMultipleSubtopologies: Verifies
correct behavior across multiple subtopologies

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
(cherry picked from commit 4ce6c0467f10b26cf18be7ef2a64be2eda0d13a7)
…n and broker restart (#21642)

This adds integration tests to GroupCoordinatorIntegrationTest that
verify the group coordinator loads correctly after compaction and broker
restart when a group has been upgraded from the classic protocol to the
consumer or streams protocol.

For each protocol upgrade path (classic-to-consumer and
classic-to-streams), there are two test variants. The first variant
compacts __consumer_offsets but retains the GroupMetadata tombstone,
verifying that replaying tombstones after compaction works correctly.
The second variant forces tombstone removal by running two compaction
passes with delete.retention.ms=0 and min.cleanable.dirty.ratio=0. This
reproduces the KAFKA-20254 scenario: after the tombstone is removed, the
classic group's offset commit records precede the consumer/streams group
records in the log. During replay, these offset commits create a simple
classic group, which the consumer/streams group replay logic must handle
via the isSimpleGroup() fix in

getOrMaybeCreatePersistedConsumerGroup/getOrMaybeCreatePersistedStreamsGroup.

In the tombstone-removed variants, the upgraded group avoids committing
offsets so that the classic group's offset commits are not overwritten
during compaction and survive at their original early position in the
log, naturally triggering the bug without the fix.

Reviewers: David Jacot <djacot@confluent.io>, Matthias J. Sax
 <matthias@confluent.io>
(cherry picked from commit b9eae3388cb8145c3e43130f66526927f804a749)
…tMetrics$MetricGroup.close() (#21717)

The PR fixed an NPE when connectorStatusMetrics unregistering task. The
issue will happen if a connector has more than 1 task on one worker and
these tasks fail to start. For example, if the DNS cant resolve producer
URL, tasks will fail to build. If these tasks failing to start,
ConnectorStatusMetricsGroup will unregister them. After first task
unregistered , connectorStatusMetrics removed connector. When second
task calls `recordTaskRemoved()`, connectorStatusMetrics is empty, so
`connectorStatusMetrics.get(connectorTaskId.connector()).close();` will
throw NPE.  Like in method `recordTaskAdded()`, this fix checks
connectorStatusMetrics at the beginning of method `recordTaskRemoved()`.

Co-authored-by: Fan Yang <fan365@hotmail.com>

Reviewers: Ding <isDing_L@163.com>, Chia-Ping Tsai <chia7712@gmail.com>
(cherry picked from commit eb0a7358c453782f07c860fe94f8e5e1c8ef7f6f)
…tch (#21944)

Cherry-picking (#21464)

Fix to ensure consumer poll/committed do not fail fatally on retriable
UNKNOWN_TOPIC_ID partition-level error when fetching committed offsets.

This issue didn't happen before using topic IDs on offset fetch, because
when fetching with topic names the GC returns no error for unknown
partitions (just includes offset -1). But with topic IDs (OffsetFetch
v10+), if a topic is not known to the GC, the OffsetFetch response
includes UNKNOWN_TOPIC_ID as error.

The fix is to let poll/committed retry while there is time. If the error
does not recover, fetch offsets still returns gracefully (offsets for
the partitions that have, null for the partitions that were deleted).
This keeps the contract of the committed API, and ensures poll uses the
committed offsets available (it will use partition offsets for
partitions that still need positions)

Added integration tests that fail without the fix, to show the gaps.

Reviewers: David Jacot <djacot@confluent.io>, Nikita Shupletsov
 <nikita@shupletsov.ca>
Upgrade Jetty from 12.0.25 to 12.0.34.

Jetty 12.0.34 has removed all dependencies on the SLF4J 2.x API,
resolving the previous incompatibility with Kafka's SLF4J 1.7.x usage.
This also fixes CVE-2025-11143 in jetty-http.

This can be cherry-picked to 4.3 and 4.2. A separate PR for 4.1: #21940.

Reviewers: PoAn Yang <payang@apache.org>
(cherry picked from commit 55a7c2fa74112a50afd1ced0df952460b8f8fd79)
FrankYang0529 and others added 23 commits April 18, 2026 20:00
…lready. (#21586)

System tests that use VerifiableConsumer are flaky because
VerifiableConsumer isn't shutting down on request in certain situations.
There can be a race condition in the commitSync method, as the future
that we set as the active task to the wakeupTrigger can be already
completed by the time we are setting it. Which leads to the wakeup
request never being fulfilled.  Added a check if the task we are
receiving in setActiveTask was triggered when we complete it
exceptionally.

Also added additional logging when a shutdown is requested to make
debugging easier.

Reviewers: Kirk True <ktrue@confluent.io>, Bill Bejeck
 <bbejeck@apache.org>
(cherry picked from commit fdece9c358f3c60a83f086d8adac9749d0e45fba)
Adding 4.2.0 to system tests as a post-release step

Reviewers: Matthias J. Sax <mjsax@apache.org>
(cherry picked from commit ae02ee9769ebadde78f2dead91a119ac22b61855)
cherry picked from commit 2759ea0dfade37f5ead2554dab4bd47572b5db18

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Federico Valeri
 <fedevaleri@gmail.com>, Ken Huang <s7133700@gmail.com>

Co-authored-by: Murali Basani <muralidhar.basani@aiven.io>
…(#22090)

Fail to run docker_rc_release.yml cause of actions are not in allowed
list. Update related actions.

https: //github.com/apache/kafka/actions/runs/24604244002
https:
//github.com/apache/infrastructure-actions/blob/main/approved_patterns.yml

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

Signed-off-by: PoAn Yang <payang@apache.org>
(cherry picked from commit a09f7eda3f33693a92c0fe9b9faf845abfbffb5d)
…n (#22130)

As explianed in KAFKA-20505, there can be a deadlock when future is
completed for the request where next set of actions tries to attain lock
on purgatory (checkAndComplete/trigger waiting requests). As the lock
might not always be released hence a deadlock can happen. The PR moves
such futures out of the lock.

I have also reviewed other future completions and doesn't seems we need
other changes.

I have tested using franz-go Kafka test and can't reproduce the issues
in 160 continuous runs. Earlier the issue was reproducible between 20-50
consecutive runs.

```
=== Run 160 ===
=== RUN   TestShareGroupETL
=== PAUSE TestShareGroupETL
=== CONT  TestShareGroupETL
[09:59:38.788 1][INFO] producing to a new topic for the first time,
fetching metadata to learn its partitions; topic:
f2c35a17978f41d684a36ab8297dd873138cb3c982f095a7d0e74d7a8589411e
...
...
[09:59:43.94 3][INFO] immediate metadata update triggered; why: forced
load because we are producing to a topic for the first time
[09:59:43.947 3][INFO] done waiting for metadata for new topic; topic:
6ce53f956e91c6b106843ff7aa728451f290cbd9064cefc8dc73ee2bcadef7b9
    share_test.go:507: level 1 phase 2: adding consumers after 122923
consumed
[09:59:44.225 3][INFO] flushing
[09:59:44.225 4][INFO] immediate metadata update triggered; why:
querying metadata for consumer initialization
...
...
[09:59:44.226 5][INFO] beginning to manage the share group lifecycle;
group: 0045957d65fdd11a7bd0dfbc9293c3fd935da4bb359df6fa56aca8dcc119ff35
[09:59:44.226 10][INFO] beginning to manage the share group lifecycle;
group: 0045957d65fdd11a7bd0dfbc9293c3fd935da4bb359df6fa56aca8dcc119ff35
[09:59:44.227 3][INFO] leaving share group; group:
0045957d65fdd11a7bd0dfbc9293c3fd935da4bb359df6fa56aca8dcc119ff35,
member_id: LP4lpqzQjAm-QxQdCRkSXA==
[09:59:44.227 7][INFO] assigning share partitions; group:
0045957d65fdd11a7bd0dfbc9293c3fd935da4bb359df6fa56aca8dcc119ff35,
assignments: map[]
[09:59:44.227 4][INFO] assigning share partitions; group:
0045957d65fdd11a7bd0dfbc9293c3fd935da4bb359df6fa56aca8dcc119ff35,
assignments: map[]
...
...
[09:59:49.232 7][INFO] assigning share partitions; group:
0045957d65fdd11a7bd0dfbc9293c3fd935da4bb359df6fa56aca8dcc119ff35,
assignments:
map[f2c35a17978f41d684a36ab8297dd873138cb3c982f095a7d0e74d7a8589411e:[1
2]]
[09:59:49.232 6][INFO] assigning share partitions; group:
0045957d65fdd11a7bd0dfbc9293c3fd935da4bb359df6fa56aca8dcc119ff35,
assignments:
map[f2c35a17978f41d684a36ab8297dd873138cb3c982f095a7d0e74d7a8589411e:[1]]
...
...
[09:59:49.466 18][INFO] immediate metadata update triggered; why:
querying metadata for consumer initialization
[09:59:49.466 13][INFO] immediate metadata update triggered; why:
querying metadata for consumer initialization
[09:59:49.466 15][INFO] immediate metadata update triggered; why:
querying metadata for consumer initialization
[09:59:49.467 16][INFO] beginning to manage the share group lifecycle;
group: d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6
[09:59:49.467 14][INFO] beginning to manage the share group lifecycle;
group: d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6
[09:59:49.467 11][INFO] beginning to manage the share group lifecycle;
group: d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6
...
...
[09:59:49.467 15][INFO] beginning to manage the share group lifecycle;
group: d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6
[09:59:49.468 13][INFO] assigning share partitions; group:
0045957d65fdd11a7bd0dfbc9293c3fd935da4bb359df6fa56aca8dcc119ff35,
assignments: map[]
[09:59:49.469 17][INFO] assigning share partitions; group:
d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6,
assignments: map[]
...
...
[09:59:54.472 18][INFO] assigning share partitions; group:
d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6,
assignments:
map[5da63960cb4fe97d887d575a73dfbddc51a2eb8071d119b3a5ba5a2b0d87bc7e:[1]]
[09:59:54.485 14][INFO] assigning share partitions; group:
d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6,
assignments:
map[6ce53f956e91c6b106843ff7aa728451f290cbd9064cefc8dc73ee2bcadef7b9:[0]]
...
...
[09:59:54.494 12][INFO] metadata update triggered; why: reload trigger
due to produce topic still not known
[09:59:54.495 12][INFO] producer id initialization success; id: 3524,
epoch: 0
[09:59:54.5 13][INFO] producing to a new topic for the first time,
fetching metadata to learn its partitions; topic:
6ce53f956e91c6b106843ff7aa728451f290cbd9064cefc8dc73ee2bcadef7b9
[09:59:54.5 13][INFO] immediate metadata update triggered; why: forced
load because we are producing to a topic for the first time
...
...
[09:59:54.525 11][INFO] leaving share group; group:
d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6,
member_id: _4Jk9faoDdUlGMlSR9zKmg==
    share_test.go:605: level 2 rebalance 1: killing l2-c1 after 169339
consumed
[09:59:55.101 14][INFO] flushing
[09:59:55.101 19][INFO] immediate metadata update triggered; why:
querying metadata for consumer initialization
[09:59:55.102 19][INFO] beginning to manage the share group lifecycle;
group: d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6
[09:59:55.103 14][INFO] leaving share group; group:
d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6,
member_id: wJibgxG934tiAuqCPloF_w==
[09:59:55.107 19][INFO] assigning share partitions; group:
d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6,
assignments: map[]
    share_test.go:619: level 2 rebalance 2: killing l2-c3 after 375726
consumed
[09:59:55.401 18][INFO] flushing
...
...
[10:00:00.915 20][INFO] leaving share group; group:
d90b64695fc163edc7c1052c412cd0ca4d4dd1696badab2b5a9b5a1e81e000c6,
member_id: HjzG4-5QwncfYfr8pSmEUQ==
    share_test.go:377: level 1: 499900 unique keys, 500624 total
accepts, 500624 produced, 724 duplicates, 35614 redelivered, max dc 3,
consumed 532987
    share_test.go:377: level 2: 499900 unique keys, 501513 total
accepts, 501513 produced, 1613 duplicates, 20272 redelivered, max dc 2,
consumed 518049
    share_test.go:704: level 1: 100 purely rejected, 35614 redelivered
    share_test.go:60: deleting topic
f2c35a17978f41d684a36ab8297dd873138cb3c982f095a7d0e74d7a8589411e
    share_test.go:61: deleting topic
f7e388a2de7ef0814328f9186e8c4b73b1f2437490e1b98730af9fb17ee74175
    share_test.go:62: deleting topic
5da63960cb4fe97d887d575a73dfbddc51a2eb8071d119b3a5ba5a2b0d87bc7e
    share_test.go:63: deleting topic
6ce53f956e91c6b106843ff7aa728451f290cbd9064cefc8dc73ee2bcadef7b9
    share_test.go:64: deleting topic
7e74eb054cbb02e0de5da8a8018115dc01094496222039f323841770b11b8a12
    share_test.go:65: deleting topic
4b8c44d4071cd22272ae9ac694342faa3404bd10b479fe88874bdef4a8a4276d
--- PASS: TestShareGroupETL (22.73s)
PASS
ok      github.com/twmb/franz-go/pkg/kgo        22.926s
```

Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai
 <chia7712@gmail.com>, PoAn Yang <payang@apache.org>
When upgrading from Kafka 3.x to 4.0, the metadata log may contain
dynamic configurations that were removed in 4.0 (e.g.,
message.format.version per KIP-724). These removed configs cause
InvalidConfigurationException when users attempt to modify any
configuration, because validation checks all existing configs including
the removed ones.

Adds filtering to prevent unsupported or invalid configurations from
being applied during metadata replay. The filtering is implemented using
a SupportedConfigChecker interface that is injected via dependency
injection through Builder patterns. When a ConfigRecord is replayed, the
checker validates whether the configuration name is supported for the
given resource type. Unsupported configurations are silently ignored
during replay, ensuring that only valid configurations enter the
in-memory state.

The SupportedConfigChecker interface provides a default TRUE
implementation that accepts all configurations. The actual filtering
logic is implemented by DefaultSupportedConfigChecker, which maintains a
whitelist of valid configuration names per resource type (TOPIC,
CLIENT_METRICS, GROUP) based on the actual config definitions. The
filtering occurs in both ConfigurationDelta#replay and
ConfigurationControlManager#replay methods.

Added unit tests to ensure:
- Removed configurations are filtered during the replay operations
- Only supported configurations appear in the resulting metadata images
- The filtering works correctly for all resource types (TOPIC, BROKER,
CLIENT_METRICS, GROUP)
- DefaultSupportedConfigChecker correctly identifies supported vs
unsupported configurations for each resource type

Reviewers: José Armando García Sancio <jsancio@apache.org>, Jun Rao
<junrao@apache.org>, Alyssa Huang <ahuang@confluent.io>, Kevin Wu
<kevin.wu2412@gmail.com>, Andrew Grant <agrant@confluent.io>

(cherry picked from commit a35d6492fbf8068cdb025419178434cbae3a991b)
… defined (#22216)

Kafka can incorrectly resolve the advertised listener for controllers if
it is not specified. For controller configurations that specify the
controller.quorum.voters but not the advertised.listener, Kafka can
deduce the advertise listener for the default listener. In this cases,
Kafka will automatically set the advertised listener for the default
listener to the endpoint specified in controller.quorum.voters.

Reviewers: José Armando García Sancio <jsancio@apache.org>
Co-authored-by: José Armando García Sancio <jsancio@apache.org>
Fixes NPE when starting Kafka Connect 4.2.0 with listener-prefixed SSL
configs.

JIRA: https://issues.apache.org/jira/browse/KAFKA-20572

This is a regression introduced on
apache/kafka#20334

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…trict constraint after 4.2.1 sync

Upstream updated jackson to 2.21.2, but wiremock 3.9.2 depends on
jackson-dataformat-yaml 2.18.0

Currently there is no newer version of wiremock that uses jackson
2.21.2, so we need to resolve this conflict.
…r change

Upstream 4.2.1 changed ConfigDef to silently de-duplicate LIST config values
(with a warning log) instead of throwing a ConfigException. The inkless tests
for CLASSIC_REMOTE_STORAGE_FORCE_EXCLUDE_TOPIC_REGEXES_CONFIG and
DISKLESS_FORCE_INCLUDE_TOPIC_REGEXES_CONFIG were asserting the old throw
behavior, which no longer applies.

See:

apache/kafka@a1861ad
@gqmelo gqmelo force-pushed the inkless-4.2-sync-4.2.1 branch 4 times, most recently from bffe454 to f773230 Compare June 16, 2026 10:50
@gqmelo gqmelo marked this pull request as ready for review June 16, 2026 11:12
A new verifyVersionConsistency gradle task was added by upstream and
that fails with our Inkless versioning:

61b8b29#diff-49a96e7eea8a94af862798a45174e6ac43eb4f8b4bd40759b5da63ba31ec3ef7R4073

This tweaks the task to accept our versions containing the inkless
suffix.
@gqmelo gqmelo force-pushed the inkless-4.2-sync-4.2.1 branch from f773230 to 80b7cb3 Compare June 16, 2026 11:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.