Skip to content

Conversation

@def-
Copy link

@def- def- commented Nov 3, 2025

No description provided.

emasab and others added 30 commits August 5, 2024 11:29
…4794)

and add build checks with different configurations
fallback happens when any topic to fetch has a zero topic id,
that can happen if the cluster has a Fetch version that is greater or equal to 13 but an inter broker protocol version that is less than 2.8
add missing consumer metrics described in the KIP:

* consumer.coordinator.rebalance.latency.avg
* consumer.coordinator.rebalance.latency.max
* consumer.coordinator.rebalance.latency.total
* consumer.fetch.manager.fetch.latency.avg
* consumer.fetch.manager.fetch.latency.max
* consumer.poll.idle.ratio.avg
* consumer.coordinator.commit.latency.avg
* consumer.coordinator.commit.latency.max

additionally:

* add unit tests for all the metrics
* add integrations tests with the producer or consumer while they're active
* configurable group initial rebalance delay ms to make integration tests reusable with both producer and consumer
---------
Co-authored-by: mahajanadhitya <[email protected]>
Co-authored-by: Anchit Jain <[email protected]>
Co-authored-by: Emanuele Sabellico <[email protected]>
also fix macOS build issue
---------

Co-authored-by: Emanuele Sabellico <[email protected]>
…tions (confluentinc#4845)

* electLeaders Api 1st draft

* Minor change

* Elect Leaders API final

* minor change

* Added a function

* Minor Change

* removed binary file from pr

* Style fix

* Resolve documentation errors

* resolved build errors

* first round of comments

* name fixes

* second round of comments

* memory deallocation fix

* Changelog changes addition

* changes

* variable naming change

* third round comments

* grammar change

* latest changes

* additional formatting

* documentation changes

* additional comments

* one more round comments

* election type count

* name change election type cnt

* new comments

* removing election type check

* removing merge conflicts

* changelog.md changes

* new changes

* Fixing failing tests

* Moved top level error code from response to event

* changed topic partition result structure

* style fix

* remove null partitions check

* requested changes

* introduction md formatting

* requested changes

* partitions cnt print

---------

Co-authored-by: Pranav Rathi <[email protected]>
from external API and avoid exposing internal destructor
…fluentinc#4871)

AK 2.7 is the first version implementing Fetch 12, before that it shouldn't fallback to v12, neither check if topic IDs are supported.
…#4800)

when potential topic partitions are less than number of members.
…nfluentinc#4754)

as replicas reply with "NOT_LEADER_OR_FOLLOWER" when using the replica id, Java clients sends requests to the leader too.


Co-authored-by: Kyle Phelps <[email protected]>
…a instances (confluentinc#4724)

Circular dependencies from a partition fetch queue message to  the same partition blocked the destroy of an instance, that happened in case the partition was removed from the cluster while it was being consumed. Solved by purging internal partition queue, after being stopped and removed, to allow reference count to reach zero and trigger a destroy.

Purging internal fetch queue on removing the partition only for the consumer.
* Security upgrade for OpenSSL and Curl, CVEs fixed:

OpenSSL
- CVE-2024-2511
- CVE-2024-4603
- CVE-2024-4741
- CVE-2024-5535
- CVE-2024-6119

CURL
- CVE-2024-8096
- CVE-2024-7264
- CVE-2024-6874
- CVE-2024-6197

* Fix for curl configure failure caused by
curl/curl#14373
)

must be equal to the server sent nonce, that already contains the client side nonce. librdkafka was incorrectly concatenating the client side nonce again, leading to this fix being made on AK side, released in 3.8.1, with endsWith instead of equals.
apache/kafka@0a00456
except the Style check job because it needs clang format 10
Fix to inherit javac path, needed by test 0098
Mock handler implementation
Rename current consumer protocol from generic to classic
Mock handler with automatic or manual assignment
More consumer group metadata getters
Test helpers
Configurable session timeout and HB interval
Fix mock handler ListOffsets response
LeaderEpoch instead of CurrentLeaderEpoch
Integration tests passing with AK trunk
Improve documentation and KIP 848 specific mock tests
Add mock tests for unknown topic id in metadata request and partial reconciliation
Make test 0147 more reliable
Fix test 0106 after HB timeout change
Exclude test case with AK trunk
Rename rd_kafka_buf_write_tags to rd_kafka_buf_write_tags_empty
Trivup 0.12.5 can run a KafkaCluster directly with KRaft and AK trunk
Trivup 0.12.6 build with a specific commit
Trivup 0.12.7 with fixes for AK 3.8.0 and Py 3.12
New version of trivup 0.12.7 to fix an issue with apache/kafka#16464 on AK > 3.8.0
Static group membership mock tests
Move test 0147 to a different PR
Disable interactive "needsrestart" prompt
* test_read_file can read binary files too
* Trivup 0.12.8
* Read certificate CA chain when set using a configuration setter with PEM format. Test that CA with untrusted chain fails authentication.
* Test untrusted certificate signed with an intermediate CA
* Remove private key and duplicate certs from pem client certificate
* Print logs sent as events
* Trivup now already inheriths the environment in interactive mode
* Use namespace to avoid conflicts on TestEventCb
…h with client certificate chain (confluentinc#4900)

Failing test: expect the error code that is received when no certificate is sent instead of the one received when it's sent but not trusted.
Client cert callback to check if trusted certificate authorities match with client certificate chain.
Log a warning when client certificate isn't sent


---------

Co-authored-by: trnguyencflt <[email protected]>
… leader epoch (confluentinc#4901)

Failing tests including for confluentinc#4796 and confluentinc#4804
Closes confluentinc#4796 and confluentinc#4804
CHANGELOG
Fix for the correct expected RPC code in test 0139
Apply same fix to metadata update operation too
Don't change rktp state to active when there's no leader but wait it's available to validate it
Comment about excluded -1 value
An incorrect assumption is made that libssl is built with support for
the (now-deprecated) ENGINE API if it is provided by OpenSSL >= 1.1.0 or
LibreSSL. OPENSSL_NO_ENGINE is defined by OpenSSL and all of its forks
if the ENGINE API was disabled at compile-time - ensure that the
definition of OPENSSL_NO_ENGINE is taken into account when using ENGINE
features.
…and client is using SASL authentication only (confluentinc#4936)

without any client certificate set
emasab and others added 22 commits August 25, 2025 18:19
…e with big-endian architectures (confluentinc#5183)

* Fix compression types read issue in GetTelemetrySubscriptions response for big-endian architectures
* Decrease allocated buffer size in `rd_kafka_PushTelemetryRequest` and explicitly cast the enum
…roupHeartbeat not updating member epoch in a case (confluentinc#4672)

[KIP-848] Fixed a condition where error was being raised in commit due to old error in the topic partition
[KIP-848] Fix discarding heartbeat response without epoch update when leaving during inflight HB
Re-bootstrap is now triggered only after metadata.recovery.rebootstrap.trigger.ms
have passed since first metadata refresh request after last successful
metadata response. The calculation was since last successful metadata response
so it's possible it did overlap with the periodic topic.metadata.refresh.interval.ms
and cause a re-bootstrap even if not needed.
…m them (confluentinc#4931)

* Fetched committed offsets should be validated
before starting to consume from it.
Failing test and mock handler implementation
for returning the committed offset leader epoch
instead of current leader epoch.

* Validate the offsets before starting to fetch assigned partitions

* Add more test cases for partition assignment
offset validation

* Fix for test 0139 subtest `do_test_store_offset_without_leader_epoch` . When fetching an offset it returns the leader epoch used when committing, not the current
leader epoch.
Given the mock cluster fix the test needs to be changed.

* Fix test `0139` subtest `do_test_list_offsets_leader_change`:
use cloned partition list for listing offsets, to avoid the fake leader epoch is then used for validation when assigning.

Fix ListOffsets mock handler for logging the correct returned leader epoch.

* Changelog entry

* Reduce number of tests in quick mode

* Add a new fetch state when finishing validating and starting to seek after a truncation,
to avoid a second repeated validation and possibly duplicated messages.

* Increase single test timeout

* Fix to leave the group in `rd_kafka_cgrp_incr_unassign_done` if terminate was requested, as done in `rd_kafka_cgrp_unassign_done` and `rd_kafka_cgrp_consumer_incr_unassign_done`

* Mock cluster, set the group as empty when last member leaves
instead of triggering a rebalance

* Test 0139 with mock cluster marked as local.
Doesn't delete topic if tests are local only as it's
possible there's no cluster to connect to
and it speeds up completing the test

* Resume the partition before fetch start or before validation
* Revert setting timeout to infinity

* style fix

* Changelog change

* Changelog changes

* Changelog change
* Fix flakyness test 0085
* Errors that cause a refresh coordinator
like NOT_COORDINATOR during an offset fetch
should not be propagated to the application.
…al promotions (confluentinc#5191)

* Pipeline improvements about machine types and auto-cancel
* Use cached docker image for integration tests, style checks, docs build
* vcpkg cache
* msys2 cache
* Upgrade macOS agents
…fluentinc#5155)

* Implementation of OAUTHBEARER/OIDC metadata based authentication, initially supporting the Azure UAMI method.
* Tests with trivup 0.14.0 supporting metadata based authentications
* Add documentation and changelog entry
* Rename `azure` value to `azure_imds` and replace UAMI that is the identity with IMDS that is the authentication service
* Extract authentication URL and rename internal function and enums
* Changes to name the configuration property "query" instead of "params" as in other implementations and to make it optional if the default endpoint is overridden.
…odes (confluentinc#5194)

* Add test cases for new OffsetCommit and OffsetFetch Error Codes
* Testcase for discarding the member epoch in a consumer group heartbeat response when leaving with an inflight HB
confluentinc#5214)

* Changelog changes and some modification to the KIP-848 migration guide
* Add that KIP-848 is not enabled by default and other PR comments
* Downgrade min supported OSX version to 13
* Version upgrade to v2.12.1
@def- def- requested review from DAlperin, bkirwi and teskje November 3, 2025 19:40
Copy link

@teskje teskje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing this, can we reset master to the upstream master and Cherry-Pick our commits on top? I fear otherwise we will hopelessly lose track of the changes we made.

@bkirwi
Copy link

bkirwi commented Nov 3, 2025

Unfortunately, this also doesn't build. :/ It looks like one of the commits we've reverted in our fork has been built upon so that future changes will also need to be unpicked.

It's probably worthwhile to go through the small number of extra commits we have and triage individually, but that will take longer...

@teskje
Copy link

teskje commented Nov 3, 2025

I discussed that revert with @petrosagg some time ago and it's worth trying to upgrade without it.

For context, this was added to fix unexplained memory errors. Two relevant Slack threads I could find:

Since the reason for the memory issues is unexplained, there is a chance that they have been fixed in current librdkafka versions, or have been fixed by other random changes we made.

@def-
Copy link
Author

def- commented Nov 3, 2025

In that case I think all changes we have (confluentinc/librdkafka@master...MaterializeInc:librdkafka:master) are already included in upstream librdkafka or supplanted by a better change, so we could just switch our rust-rdkafka to use the upstream librdkafka. Edit: MaterializeInc/rust-rdkafka#35

@def- def- closed this Nov 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.