diff --git a/modules/develop/pages/transactions.adoc b/modules/develop/pages/transactions.adoc
index c60c54606f..ac2f4ff7bd 100644
--- a/modules/develop/pages/transactions.adoc
+++ b/modules/develop/pages/transactions.adoc
@@ -270,6 +270,27 @@ Redpanda’s default configuration supports exactly-once processing. To preserve
 
 To help avoid common pitfalls and optimize performance, consider the following when configuring transactional workloads in Redpanda:
 
+=== Tune producer ID limits
+
+For production environments with heavy producer usage, consider using xref:reference:properties/cluster-properties.adoc#max_concurrent_producer_ids[`max_concurrent_producer_ids`] to prevent out-of-memory (OOM) crashes. The default unlimited value can lead to unbounded memory growth, especially with transactions or idempotent producers.
+
+Calculate an appropriate value based on your expected concurrent producers:
+
+* **Lower bound**: `kafka_connections_max` ÷ `number_of_shards` (based on the assumption that each producer connects to only one shard)
+* **Upper bound**: `topic_partitions_per_shard` × `kafka_connections_max` (based on the assumption that producers connect to all shards)
+* **Recommended starting point**: Use a value between these upper and lower bounds, considering your application's produce patterns
+
+Applications with wide fan-out patterns (producers writing to many partitions across multiple shards) require values closer to the upper bound.
+
+Monitor these metrics to determine if the limit is being reached:
+
+* xref:reference:internal-metrics-reference.adoc#vectorized_cluster_producer_state_manager_evicted_producers[`vectorized_cluster_producer_state_manager_evicted_producers`]: Number of evicted producers (should be 0 in steady state)
+* xref:reference:internal-metrics-reference.adoc#vectorized_cluster_producer_state_manager_producer_manager_total_active_producers[`vectorized_cluster_producer_state_manager_producer_manager_total_active_producers`]: Current number of active producers per shard
+
+If `evicted_producers` > 0, the shard is exceeding the configured limit. For applications with long-running transactions, ensure xref:reference:properties/cluster-properties.adoc#transactional_id_expiration_ms[`transactional_id_expiration_ms`] accommodates your typical transaction lifetime to avoid premature producer ID expiration.
+
+=== Configure transaction timeouts and limits
+
 * If a consumer is configured to use the read_committed isolation level, it can only process successfully committed transactions. As a result, an ongoing transaction with a large timeout that becomes stuck could prevent the consumer from processing other committed transactions.
 +
 To avoid this, don't set the transaction timeout client setting (`transaction.timeout.ms` in the Kafka Java client implementation) to a value that is too high. The longer the timeout, the longer consumers may be blocked.
diff --git a/modules/reference/pages/properties/cluster-properties.adoc b/modules/reference/pages/properties/cluster-properties.adoc
index 0dd8017630..413e46f78d 100644
--- a/modules/reference/pages/properties/cluster-properties.adoc
+++ b/modules/reference/pages/properties/cluster-properties.adoc
@@ -4066,7 +4066,9 @@ For a compacted topic, the maximum time a message remains ineligible for compact
 
 === max_concurrent_producer_ids
 
-Maximum number of active producer sessions. When the threshold is passed, Redpanda terminates old sessions. When an idle producer corresponding to the terminated session wakes up and produces, its message batches are rejected, and an out of order sequence error is emitted. Consumers don't affect this setting.
+Maximum number of active producer sessions per shard. Each shard tracks producer IDs using an LRU (Least Recently Used) eviction policy. When the configured limit is exceeded, the least recently used producer IDs are evicted from the cache.
+
+IMPORTANT: The default value is unlimited, which can lead to unbounded memory growth and out-of-memory (OOM) crashes in production environments with heavy producer usage, especially when using transactions or idempotent producers. It is strongly recommended to set a reasonable limit in production deployments. See xref:develop:transactions.adoc#tune-producer-id-limits[Tune producer ID limits] to determine an appropriate value based on your workload.
 
 *Requires restart:* No
 
@@ -4078,6 +4080,12 @@ Maximum number of active producer sessions. When the threshold is passed, Redpan
 
 *Default:* `18446744073709551615`
 
+**Related topics**:
+
+- xref:develop:transactions.adoc#tune-producer-id-limits[Tune producer ID limits]
+- xref:reference:properties/cluster-properties.adoc#transactional_id_expiration_ms[`transactional_id_expiration_ms`]
+- xref:manage:monitoring.adoc[Monitor Redpanda]
+
 ---
 
 === max_in_flight_pandaproxy_requests_per_shard
@@ -6350,6 +6358,8 @@ The maximum allowed timeout for transactions. If a client-requested transaction
 
 Expiration time of producer IDs. Measured starting from the time of the last write until now for a given ID.
 
+Producer IDs are automatically removed from memory when they expire, which helps manage memory usage. However, this natural cleanup may not be sufficient for workloads with high producer churn rates. For applications with long-running transactions, ensure this value accommodates your typical transaction lifetime to avoid premature producer ID expiration.
+
 *Unit:* milliseconds
 
 *Requires restart:* No
@@ -6362,6 +6372,11 @@ Expiration time of producer IDs. Measured starting from the time of the last wri
 
 *Default:* `604800000` (10080 min)
 
+**Related topics**:
+
+- xref:develop:transactions.adoc#tune-producer-id-limits[Tune producer ID limits]
+- xref:reference:properties/cluster-properties.adoc#max_concurrent_producer_ids[`max_concurrent_producer_ids`]
+
 ---
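
The lower and upper bounds given under "Tune producer ID limits" in this diff can be worked through as a minimal sketch. This is not part of the change itself. The parameter names mirror the placeholders used in the added text, the function name `producer_id_limit_bounds` and the sample values are hypothetical, and rounding the lower bound up to a whole number is an assumption the diff does not state.

```python
from typing import Tuple


def producer_id_limit_bounds(
    kafka_connections_max: int,
    number_of_shards: int,
    topic_partitions_per_shard: int,
) -> Tuple[int, int]:
    """Return (lower, upper) candidate bounds for max_concurrent_producer_ids.

    Hypothetical helper: it only restates the two formulas from the diff.
    """
    if min(kafka_connections_max, number_of_shards, topic_partitions_per_shard) <= 0:
        raise ValueError("all inputs must be positive integers")

    # Lower bound: assumes each producer connects to only one shard.
    # Ceiling division (an assumption) so the result is never rounded to zero.
    lower = -(-kafka_connections_max // number_of_shards)

    # Upper bound: assumes producers connect to all shards.
    upper = topic_partitions_per_shard * kafka_connections_max

    return lower, upper


# Hypothetical cluster: 10,000 allowed connections, 8 shards,
# 50 topic partitions per shard.
lower, upper = producer_id_limit_bounds(10_000, 8, 50)
print(f"lower bound: {lower}, upper bound: {upper}")
```

With these sample inputs the lower bound is 1250 and the upper bound is 500000. Per the added text, workloads with wide fan-out would start closer to the upper figure, and the eviction metric would then show whether the chosen value is too low.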