Description
Bug Report
Describe the bug
I have deployed a fluent-bit via a Deployment whose only job is to gather kubernetes_events and output them somewhere.
This fluent-bit seems to have an issue where its CPU usage goes to 1 (100% of one core), sometimes for a few minutes and sometimes for multiple hours.
The Deployment only has a CPU request of >1 and no limit set, and the node has a lot of spare CPU capacity (32-core system).
My other fluent-bits which are gathering logs and outputting to the same output do not seem to have this issue.
There are no custom parsers in custom_parsers.conf.
I use the fluent-bit Helm chart with these values:
kind: Deployment
autoscaling:
  vpa:
    enabled: true
config:
  hotReload:
    enabled: true
  inputs: |
    [Input]
        Name kubernetes_events
        db /var/sync/db
        kube_retention_time 15m
        Tag k8s-events
  customParsers: ""
  filters: ""
  outputs: |
    [Output]
        Name forward
        Match k8s-events
        Retry_Limit 5
        Host my-external-fluentd-hostname
        Port 15000
extraVolumes:
  - name: sync
    persistentVolumeClaim:
      claimName: fluent-bit-k8s-events-sync
extraVolumeMounts:
  - name: sync
    mountPath: /var/sync
image:
  tag: 3.2.4
rbac:
  create: true
  eventsAccess: true
replicaCount: 1
serviceMonitor:
  enabled: true
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1
    maxSurge: 1
I can also see on the node that it is fluent-bit itself causing the cpu usage and not the config watcher or hot-reload mechanism:
# ps aux | grep fluent
root 3751 0.0 0.0 1226304 2164 ? Ssl 09:50 0:00 /fluent-bit/bin/fluent-bit-watcher
root 3778 0.1 0.1 125872 19508 ? Sl 09:50 0:02 /fluent-bit/bin/fluent-bit --enable-hot-reload -c /fluent-bit/etc/fluent-bit.conf
root 54668 99.6 0.5 295496 92072 ? Ssl 10:14 5:25 /fluent-bit/bin/fluent-bit --workdir=/fluent-bit/etc --config=/fluent-bit/etc/conf/fluent-bit.conf
To Reproduce
- Run fluent-bit with the given chart values for some days (make sure to create a PVC named fluent-bit-k8s-events-sync first)
- Observe CPU usage
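For anyone reproducing this, a minimal sketch of the PVC mentioned above. Only the name is taken from the chart values; the access mode and the storage size are assumptions, and storageClassName is left out so that the cluster default would apply.

```yaml
# Hypothetical PVC backing the kubernetes_events db at /var/sync.
# Only metadata.name comes from the chart values above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fluent-bit-k8s-events-sync
spec:
  accessModes:
    - ReadWriteOnce    # assumption: one replica, one writer
  resources:
    requests:
      storage: 1Gi     # assumption: placeholder size
```

It has to live in the same namespace as the Helm release so that the claimName under extraVolumes resolves.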
Expected behavior
CPU usage should correlate with the number of events produced.
Your Environment
- Version used: 3.2.4
- Configuration: as shown above; manually create a PVC named fluent-bit-k8s-events-sync that can be used for the db sync
- Environment name and version (e.g. Kubernetes? What version?): Kubernetes - AWS EKS v1.31.1-eks-1b3e656
- Server type and version: AWS EC2 Instance
- Operating System and version: Bottlerocket OS 1.29.0 (aws-k8s-1.31)
- Filters and plugins: none
Additional context
It seems that fluent-bit is still processing events and writing them to the output, but I haven't checked whether they are complete.
I see this behavior across all our clusters, except those where the output is running inside the same cluster (the output's hostname is an internal Kubernetes service in this case).