Slow performance with Kafka input module #9813

Open
uranru opened this issue Jan 9, 2025 · 6 comments

Comments

@uranru

uranru commented Jan 9, 2025

Bug Report

Describe the bug
I want to consume messages from a Kafka cluster, process them, and forward them.
For now I am testing with the stdout output, and throughput is very low:
about 20,000 messages per minute, which is far too slow for my message volume.
I tried different buffer settings and rdkafka.X settings,
but the throughput stays the same.
Fluent Bit runs in a Docker container.
Memory and CPU usage are both low; the container is practically idle.
How can I increase performance?

To Reproduce

  • Example log message if applicable:
{
	"host": {
		"name": "int-queues",
		"domain": "int"
	},
	"tags": [
		"internal"
	],
	"@timestamp": "2025-01-09T16:04:52.082Z",
	"log": {
		"name": "debug-trn-sync",
		"level": "debug",
		"pid": "2825983",
		"dir": "log",
		"ip": "local-script",
		"time": "2025-01-09T16:04:51.445Z"
	},
	"app": {
		"name": "backend",
		"group": "web",
		"environment": "prod"
	},
	"@version": "1"
}
  • Steps to reproduce the problem:

Expected behavior

Screenshots

Your Environment

  • Version used:
[2025/01/09 18:52:30] [ info] [fluent bit] version=3.2.4, commit=5b0ff04120, pid=1
[2025/01/09 18:52:30] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2025/01/09 18:52:30] [ info] [simd    ] disabled
[2025/01/09 18:52:30] [ info] [cmetrics] version=0.9.9
[2025/01/09 18:52:30] [ info] [ctraces ] version=0.5.7
  • Configuration:
[SERVICE]
    flush        1
    http_server  on
    http_port    2020
    health_check on              # /api/v1/health
    #daemon       Off
    log_level    info
    #log_level    debug
    #Buffer_max_size 1024Mb
    #Buffer_chunk_size 50Mb
    #storage.max_chunks_up 1000

[INPUT]
    name            fluentbit_metrics
    tag             metrics.internal
    scrape_interval 5

[INPUT]
    Name          kafka
    Tag           kafka
    threaded      true
    brokers       nd-kafka-n01.int:9092,nd-kafka-n02.int:9092,nd-kafka-n03.int:9092
    Mem_Buf_Limit 256MB
    topics        int.web.backend
    poll_ms       100
    format        json
    group_id      fluent

    #rdkafka.queued.max.messages.kbytes 262144  # 256MB
    #rdkafka.fetch.message.max.bytes   10485760
    #rdkafka.max.partition.fetch.bytes 10485760
    #rdkafka.fetch.max.bytes           524288000
    #rdkafka.fetch.min.bytes 16384
    rdkafka.sasl.mechanism     PLAIN
    rdkafka.security.protocol  SASL_SSL
    rdkafka.sasl.username      kafkaclient
    rdkafka.sasl.password      {{ lookup('hashi_vault', 'secret=data/nomad/kafka:kafka_pass_kafkaclient') }}
    rdkafka.ssl.ca.location    /etc/certs/ca.cer

[OUTPUT]
    Name        stdout
    Match       kafka

[OUTPUT]
    Match        metrics.internal
    Name        prometheus_remote_write
    Host        victoria.stream.service.int
    Port        8428
    Uri         /api/v1/write

[OUTPUT]
    name            prometheus_exporter
    match           metrics.internal
    host            0.0.0.0
    port            2021
  • Environment name and version (e.g. Kubernetes? What version?):
    I use https://hub.docker.com/r/bitnami/fluent-bit
  • Server type and version:
  • Operating System and version:
    22.04.5 LTS (GNU/Linux 5.15.0-130-generic x86_64)
  • Filters and plugins:

Additional context
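
One way to put a number on the input rate, offered as a minimal sketch rather than a definitive tool: the configuration above enables `http_server` on port 2020, so Fluent Bit's monitoring endpoint `/api/v1/metrics` can be sampled twice and the `records` counter of the Kafka input differenced. The instance name `kafka.0`, the host `localhost`, and all helper names below are assumptions; adjust them to the actual deployment.

```python
import json
import time
import urllib.request


def records_per_second(first, second, elapsed_s, input_name="kafka.0"):
    """Return the record rate of one input between two metrics snapshots.

    `first` and `second` are parsed /api/v1/metrics documents.
    The default instance name "kafka.0" is an assumption.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    before = first["input"][input_name]["records"]
    after = second["input"][input_name]["records"]
    return (after - before) / elapsed_s


def fetch_metrics(base_url="http://localhost:2020"):
    """Fetch and parse the JSON metrics document (base_url is an assumption)."""
    with urllib.request.urlopen(base_url + "/api/v1/metrics", timeout=5) as resp:
        return json.load(resp)


def measure(interval_s=10.0):
    """Sample the endpoint twice and return the observed records per second."""
    first = fetch_metrics()
    started = time.monotonic()
    time.sleep(interval_s)
    second = fetch_metrics()
    return records_per_second(first, second, time.monotonic() - started)
```

Calling `measure()` inside the container (or against the published port) yields a rate that can be compared directly with the 20,000 messages per minute reported above, which is roughly 333 records per second.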

@uranru
Author

uranru commented Jan 21, 2025

Maybe this is related to #8030.

@hafkaM

hafkaM commented Mar 2, 2025

I have a similar problem. When I use the Kafka input plugin to consume data from Kafka, I can't get beyond about 1,000 events per second (EPS) per partition. With 6 partitions I reach around 5-6K EPS, with 15 partitions around 15K EPS, and so on.
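
The scaling described above can be written down as a rough back-of-the-envelope model. This is only a sketch: it takes the roughly 1000 EPS per partition ceiling reported in this comment as an empirical input, not as a documented property of the plugin, and the function names are made up for illustration.

```python
import math

# Per-partition ceiling observed in this comment (an observation, not a documented limit).
OBSERVED_EPS_PER_PARTITION = 1000


def expected_eps(partitions, eps_per_partition=OBSERVED_EPS_PER_PARTITION):
    """Throughput expected if the rate scales linearly with the partition count."""
    return partitions * eps_per_partition


def partitions_needed(target_eps, eps_per_partition=OBSERVED_EPS_PER_PARTITION):
    """Smallest partition count that would reach target_eps under the same model."""
    return math.ceil(target_eps / eps_per_partition)
```

Under this model, `expected_eps(15)` gives 15000, matching the 15-partition figure, and a hypothetical target of 50,000 EPS would need `partitions_needed(50000)`, that is, 50 partitions.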

When I tried Fluentd, it behaved well, although it used more resources due to the Ruby overhead.

However, I would prefer to use Fluent Bit in the landing zone because I only need to move data from one Kafka instance to another without any major transformations.

@hafkaM

hafkaM commented Mar 2, 2025

I've also tried various buffer settings and different rdkafka parameter configurations, but I've never been able to exceed approximately 1000 EPS per partition...

@hafkaM

hafkaM commented Apr 2, 2025

Hello
I installed the new 4.0 Fluent Bit image and tested the performance of the Kafka input plugin. I thought this had already been resolved, see:
in_kafka: optimize poll timeout handling for threaded and main event loop modes #10122

However, the performance is still terrible: just a few hundred events per second across 8 partitions.
So, is the low-EPS issue actually resolved or not?

From the load graphs, it’s clear that the CPU is barely doing anything, while the Kafka queue is growing rapidly.
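
To make "the queue is growing" measurable, consumer lag can be computed per partition as the log end offset minus the group's committed offset, then sampled twice to get a growth rate. The sketch below assumes the offsets were collected elsewhere (for example with Kafka's consumer-group tooling); the function names are hypothetical and the numbers in the usage note are invented for illustration.

```python
def total_lag(end_offsets, committed_offsets):
    """Sum per-partition lag.

    Both arguments map partition id -> offset and must cover the same partitions.
    Negative differences are clamped to zero.
    """
    return sum(
        max(end_offsets[p] - committed_offsets[p], 0)
        for p in end_offsets
    )


def lag_growth_per_second(lag_before, lag_after, elapsed_s):
    """Positive result: producers outpace the consumer by that many events per second."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return (lag_after - lag_before) / elapsed_s
```

For example, with end offsets `{0: 1500, 1: 2000}` and committed offsets `{0: 1000, 1: 1800}`, `total_lag` returns 700. If a second sample 60 seconds later shows a lag of 6700, `lag_growth_per_second(700, 6700, 60)` returns 100.0, meaning the consumer falls behind by about 100 events every second.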

@nareshku
Contributor

Have you tried setting poll_timeout_ms?
Without this setting, the plugin falls back to its previous behavior.

@hafkaM

hafkaM commented Apr 11, 2025

Below is my current configuration.
I’ve experimented with different rdkafka settings, but the throughput is still low, and we’re seeing increasing lag on the topic.

[INPUT]
    Name kafka
    brokers test-1:9093,test-2:9093,test-3:9093
    topics test-applog-raw
    group_id test-fluent1
    rdkafka.sasl.mechanism SCRAM-SHA-256
    rdkafka.sasl.username ${FBIT_KAFKA_READER_USERNAME}
    rdkafka.sasl.password ${FBIT_KAFKA_READER_PASSWORD}
    rdkafka.security.protocol SASL_PLAINTEXT
    poll_ms 100
    poll_timeout_ms 1000
    Buffer_Max_Size 50M
    rdkafka.auto.offset.reset latest
    format none
    threaded true

[OUTPUT]
    Name null
    Match *

I can reconfigure the test environment and provide the resulting metrics for further analysis.
M.

3 participants