transactions: add info about preventing OOM #1425
base: main
Changes from 1 commit
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -270,6 +270,27 @@ Redpanda’s default configuration supports exactly-once processing. To preserve | |
|
|
||
| To help avoid common pitfalls and optimize performance, consider the following when configuring transactional workloads in Redpanda: | ||
|
|
||
| === Tune producer ID limits | ||
|
|
||
| For production environments with heavy producer usage, configure xref:reference:properties/cluster-properties.adoc#max_concurrent_producer_ids[`max_concurrent_producer_ids`] to prevent out-of-memory (OOM) crashes. The default unlimited value can lead to unbounded memory growth, especially with transactions or idempotent producers. | ||
|
||
|
|
||
| Calculate an appropriate value based on your expected concurrent producers: | ||
|
Contributor
Calculate an appropriate value? |
||
|
|
||
| * **Lower bound**: `kafka_connections_max` ÷ `number_of_shards` (assumes each producer connects to only one shard) | ||
paulohtb6 marked this conversation as resolved.
|
||
| * **Upper bound**: `topic_partitions_per_shard` × `kafka_connections_max` (assumes producers connect to all shards) | ||
paulohtb6 marked this conversation as resolved.
|
||
| * **Recommended starting point**: Use a value between these bounds, considering your application's produce patterns | ||
paulohtb6 marked this conversation as resolved.
|
||
|
|
||
| Applications with wide fan-out patterns (producers writing to many partitions across multiple shards) need values closer to the upper bound. | ||
paulohtb6 marked this conversation as resolved.
|
||
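The two bounds proposed above can be worked through with a short calculation. This is a minimal sketch, not part of the PR: the function name and the example numbers are hypothetical, and rounding the lower bound up is an assumption made here so that the result is never smaller than the exact quotient.

```python
import math
from typing import Tuple


def producer_id_limit_bounds(
    kafka_connections_max: int,
    number_of_shards: int,
    topic_partitions_per_shard: int,
) -> Tuple[int, int]:
    """Return (lower, upper) bounds for max_concurrent_producer_ids.

    lower: kafka_connections_max / number_of_shards
           (each producer connects to only one shard; rounded up, an assumption)
    upper: topic_partitions_per_shard * kafka_connections_max
           (producers connect to all shards)
    """
    if min(kafka_connections_max, number_of_shards, topic_partitions_per_shard) <= 0:
        raise ValueError("all inputs must be positive integers")
    lower = math.ceil(kafka_connections_max / number_of_shards)
    upper = topic_partitions_per_shard * kafka_connections_max
    return lower, upper


# Hypothetical cluster: 4000 allowed connections, 8 shards, 1000 partitions per shard.
lower, upper = producer_id_limit_bounds(4000, 8, 1000)
print(f"lower bound: {lower}, upper bound: {upper}")
```

With these example inputs the range spans several orders of magnitude, which is why the produce pattern (narrow versus wide fan-out) matters when picking a value inside it.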
|
|
||
| Monitor these metrics to determine if the limit is being reached: | ||
|
|
||
| * xref:reference:internal-metrics-reference.adoc#vectorized_cluster_producer_state_manager_evicted_producers[`vectorized_cluster_producer_state_manager_evicted_producers`]: Number of evicted producers (should be 0 in steady state) | ||
| * xref:reference:internal-metrics-reference.adoc#vectorized_cluster_producer_state_manager_producer_manager_total_active_producers[`vectorized_cluster_producer_state_manager_producer_manager_total_active_producers`]: Current number of active producers per shard | ||
|
|
||
| If `evicted_producers` > 0, the shard is exceeding the configured limit. For applications with long-running transactions, ensure xref:reference:properties/cluster-properties.adoc#transactional_id_expiration_ms[`transactional_id_expiration_ms`] accommodates your typical transaction lifetime to avoid premature producer ID expiration. | ||
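One way to act on the check above is to scan a metrics scrape for shards whose eviction count is above zero. The sketch below is illustrative only: the two metric names come from the text, but the sample payload, the `shard` label, and the helper names are assumptions, and the parser handles only simple `name{labels} value` lines without timestamps.

```python
from collections import defaultdict

EVICTED = "vectorized_cluster_producer_state_manager_evicted_producers"
ACTIVE = (
    "vectorized_cluster_producer_state_manager_"
    "producer_manager_total_active_producers"
)

# Hypothetical scrape output; real label sets may differ.
SAMPLE = """
# TYPE vectorized_cluster_producer_state_manager_evicted_producers counter
vectorized_cluster_producer_state_manager_evicted_producers{shard="0"} 0
vectorized_cluster_producer_state_manager_evicted_producers{shard="1"} 12
vectorized_cluster_producer_state_manager_producer_manager_total_active_producers{shard="0"} 310
vectorized_cluster_producer_state_manager_producer_manager_total_active_producers{shard="1"} 500
"""


def sum_by_shard(metrics_text, metric_name):
    """Sum the values of one metric, grouped by the (assumed) shard label."""
    totals = defaultdict(float)
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_and_labels, _, value = line.rpartition(" ")
        name, _, labels = name_and_labels.partition("{")
        if name != metric_name:
            continue
        shard = "unknown"
        for pair in labels.rstrip("}").split(","):
            key, _, val = pair.partition("=")
            if key.strip() == "shard":
                shard = val.strip().strip('"')
        totals[shard] += float(value)
    return dict(totals)


def shards_evicting(metrics_text):
    """Return the shards whose evicted-producer count is greater than zero."""
    evicted = sum_by_shard(metrics_text, EVICTED)
    return sorted(shard for shard, count in evicted.items() if count > 0)


print("shards evicting producers:", shards_evicting(SAMPLE))
print("active producers per shard:", sum_by_shard(SAMPLE, ACTIVE))
```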
|
|
||
| === Configure transaction timeouts and limits | ||
|
|
||
| * If a consumer is configured to use the `read_committed` isolation level, it can only process successfully committed transactions. As a result, an ongoing transaction with a large timeout that becomes stuck could prevent the consumer from processing other committed transactions. | ||
| + | ||
| To avoid this, don't set the transaction timeout client setting (`transaction.timeout.ms` in the Kafka Java client implementation) to a value that is too high. The longer the timeout, the longer consumers may be blocked. | ||
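The advice above can be enforced when the producer configuration is built. The following is a sketch under stated assumptions: only `transaction.timeout.ms` is named in the text; the other keys are standard Kafka client property names, while the broker address, the helper name, and the `max_acceptable_block_ms` budget are hypothetical. No client library is imported; the resulting dictionary would be passed to whichever client you use.

```python
def transactional_producer_config(
    transactional_id: str,
    transaction_timeout_ms: int,
    max_acceptable_block_ms: int,
) -> dict:
    """Build producer settings, rejecting timeouts above an application budget.

    max_acceptable_block_ms is a hypothetical application-level limit on how
    long read_committed consumers may wait behind a stuck transaction.
    """
    if transaction_timeout_ms <= 0:
        raise ValueError("transaction_timeout_ms must be positive")
    if transaction_timeout_ms > max_acceptable_block_ms:
        raise ValueError(
            f"transaction.timeout.ms={transaction_timeout_ms} exceeds the "
            f"acceptable consumer block time of {max_acceptable_block_ms} ms"
        )
    return {
        "bootstrap.servers": "localhost:9092",  # hypothetical address
        "transactional.id": transactional_id,
        "enable.idempotence": True,
        "transaction.timeout.ms": transaction_timeout_ms,
    }


config = transactional_producer_config(
    transactional_id="orders-writer-1",  # hypothetical ID
    transaction_timeout_ms=60_000,
    max_acceptable_block_ms=120_000,
)
print(config)
```

Failing fast at configuration time is a design choice for this sketch; an application could equally log a warning and clamp the value.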
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4066,7 +4066,9 @@ For a compacted topic, the maximum time a message remains ineligible for compact | |
|
|
||
| === max_concurrent_producer_ids | ||
|
|
||
| Maximum number of active producer sessions. When the threshold is passed, Redpanda terminates old sessions. When an idle producer corresponding to the terminated session wakes up and produces, its message batches are rejected, and an out of order sequence error is emitted. Consumers don't affect this setting. | ||
| Maximum number of active producer sessions per shard. Each shard tracks producer IDs using an LRU (Least Recently Used) eviction policy. When the configured limit is exceeded, the least recently used producer IDs are evicted from the cache. | ||
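To make the eviction behavior concrete, here is a toy model of a per-shard cache with a least-recently-used policy. It is a sketch of the general LRU technique only, not Redpanda's implementation; the class and method names are hypothetical.

```python
from collections import OrderedDict


class ProducerIdCache:
    """Toy per-shard cache that evicts the least recently used producer ID."""

    def __init__(self, limit: int):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self._entries = OrderedDict()  # oldest entry first, newest last
        self.evicted_total = 0

    def touch(self, producer_id: int) -> list:
        """Record activity for producer_id and return any IDs evicted by it."""
        self._entries[producer_id] = True
        self._entries.move_to_end(producer_id)  # mark as most recently used
        evicted = []
        while len(self._entries) > self.limit:
            oldest, _ = self._entries.popitem(last=False)
            evicted.append(oldest)
        self.evicted_total += len(evicted)
        return evicted

    def active(self) -> list:
        """Return tracked IDs ordered from least to most recently used."""
        return list(self._entries)


cache = ProducerIdCache(limit=2)
cache.touch(1)
cache.touch(2)
cache.touch(1)           # producer 1 is now the most recently used
print(cache.touch(3))    # producer 2 is the least recently used, so it is evicted
print(cache.active())
```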
|
|
||
| IMPORTANT: The default value is unlimited, which can lead to unbounded memory growth and out-of-memory (OOM) crashes in production environments with heavy producer usage, especially when using transactions or idempotent producers. It is strongly recommended to set a reasonable limit in production deployments. See xref:develop:transactions.adoc#tune-producer-id-limits[Tune producer ID limits] to determine an appropriate value based on your workload. | ||
|
|
||
| *Requires restart:* No | ||
|
|
||
|
|
@@ -4078,6 +4080,12 @@ Maximum number of active producer sessions. When the threshold is passed, Redpan | |
|
|
||
| *Default:* `18446744073709551615` | ||
|
|
||
| **Related topics**: | ||
|
|
||
| - xref:develop:transactions.adoc#tune-producer-id-limits[Tune producer ID limits] | ||
| - xref:reference:properties/cluster-properties.adoc#transactional_id_expiration_ms[`transactional_id_expiration_ms`] | ||
| - xref:manage:monitoring.adoc[Monitor Redpanda] | ||
|
|
||
| --- | ||
|
|
||
| === max_in_flight_pandaproxy_requests_per_shard | ||
|
Contributor
This property appears twice in a row. Delete one. |
||
|
|
@@ -6350,6 +6358,8 @@ The maximum allowed timeout for transactions. If a client-requested transaction | |
|
|
||
| Expiration time of producer IDs. Measured starting from the time of the last write until now for a given ID. | ||
|
|
||
| Producer IDs are automatically removed from memory when they expire, which helps manage memory usage. However, this natural cleanup may not be sufficient for workloads with high producer churn rates. For applications with long-running transactions, ensure this value accommodates your typical transaction lifetime to avoid premature producer ID expiration. | ||
|
|
||
| *Unit:* milliseconds | ||
|
|
||
| *Requires restart:* No | ||
|
|
@@ -6362,6 +6372,11 @@ Expiration time of producer IDs. Measured starting from the time of the last wri | |
|
|
||
| *Default:* `604800000` (10080 min) | ||
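The unit conversion behind the default can be checked with a few lines of arithmetic. This is a small sketch: the default value comes from the text, while the helper name and the example durations are hypothetical, and comparing a transaction lifetime directly against the expiration is a simplification of the "time since last write" rule.

```python
from datetime import timedelta

DEFAULT_TRANSACTIONAL_ID_EXPIRATION_MS = 604_800_000

default_expiration = timedelta(milliseconds=DEFAULT_TRANSACTIONAL_ID_EXPIRATION_MS)
print(default_expiration.total_seconds() / 60)  # 10080.0 minutes
print(default_expiration.days)                  # 7 days


def expiration_accommodates(
    transaction_lifetime: timedelta,
    expiration_ms: int = DEFAULT_TRANSACTIONAL_ID_EXPIRATION_MS,
) -> bool:
    """Return True if the expiration window is longer than the given lifetime."""
    return transaction_lifetime < timedelta(milliseconds=expiration_ms)


# Hypothetical workloads: a 2-hour transaction fits, a 10-day one does not.
print(expiration_accommodates(timedelta(hours=2)))
print(expiration_accommodates(timedelta(days=10)))
```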
|
|
||
| **Related topics**: | ||
|
|
||
| - xref:develop:transactions.adoc#tune-producer-id-limits[Tune producer ID limits] | ||
| - xref:reference:properties/cluster-properties.adoc#max_concurrent_producer_ids[`max_concurrent_producer_ids`] | ||
|
|
||
| --- | ||
|
|
||
|
|
||
|
|
||