Elevated Memory Utilization (v2.18.0) #2867

@christensenjairus

What happened:
Memory usage on v2.18.0 is elevated: at peak utilization it is more than double what v2.17.0 used. In my configuration, v2.17.0 was never OOMKilled with a 512MiB limit, whereas v2.18.0 gets OOMKilled even with a 1GiB limit.

What you expected to happen:
Memory utilization to be consistent between versions

How to reproduce it (as minimally and precisely as possible):
Deployment Spec:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "14"
    meta.helm.sh/release-name: victoria-metrics
    meta.helm.sh/release-namespace: monitoring
  labels:
    app.kubernetes.io/component: metrics
    app.kubernetes.io/instance: victoria-metrics
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/part-of: kube-state-metrics
    app.kubernetes.io/version: 2.18.0
    helm.sh/chart: kube-state-metrics-6.3.0
    helm.toolkit.fluxcd.io/name: victoria-metrics
    helm.toolkit.fluxcd.io/namespace: monitoring
  name: victoria-metrics-kube-state-metrics
  namespace: monitoring
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: victoria-metrics
      app.kubernetes.io/name: kube-state-metrics
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app.kubernetes.io/component: metrics
        app.kubernetes.io/instance: victoria-metrics
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/part-of: kube-state-metrics
        app.kubernetes.io/version: 2.18.0
        helm.sh/chart: kube-state-metrics-6.3.0
    spec:
      automountServiceAccountToken: true
      containers:
      - args:
        - --port=8080
        - --resources=certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,leases,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments
        - --custom-resource-state-config-file=/etc/customresourcestate/config.yaml
        image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.18.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resizePolicy:
        - resourceName: memory
          restartPolicy: NotRequired
        - resourceName: cpu
          restartPolicy: NotRequired
        resources:
          limits:
            memory: 512Mi
          requests:
            cpu: 25m
            memory: 256Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/customresourcestate
          name: customresourcestate-config
          readOnly: true
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: harbor-pull-secret
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 65534
        runAsGroup: 65534
        runAsNonRoot: true
        runAsUser: 65534
        seccompProfile:
          type: RuntimeDefault
      serviceAccount: victoria-metrics-kube-state-metrics
      serviceAccountName: victoria-metrics-kube-state-metrics
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          name: victoria-metrics-kube-state-metrics-customresourcestate-config
        name: customresourcestate-config

Custom Resource State ConfigMap (used for VPA metrics)

apiVersion: v1
data:
  config.yaml: |
    kind: CustomResourceStateMetrics
    spec:
      resources:
      - groupVersionKind:
          group: autoscaling.k8s.io
          kind: VerticalPodAutoscaler
          version: v1
        labelsFromPath:
          namespace:
          - metadata
          - namespace
          target_api_version:
          - spec
          - targetRef
          - apiVersion
          target_kind:
          - spec
          - targetRef
          - kind
          target_name:
          - spec
          - targetRef
          - name
          verticalpodautoscaler:
          - metadata
          - name
        metrics:
        - each:
            info:
              labelsFromPath:
                name:
                - metadata
                - name
            type: Info
          help: VPA container recommendations. Kubernetes labels converted to Prometheus
            labels
          name: verticalpodautoscaler_labels
        - commonLabels:
            resource: memory
            unit: byte
          each:
            gauge:
              labelsFromPath:
                container:
                - containerName
              path:
              - status
              - recommendation
              - containerRecommendations
              valueFrom:
              - target
              - memory
            type: Gauge
          help: VPA container recommendations for memory. Target resources the VerticalPodAutoscaler
            recommends for the container.
          name: verticalpodautoscaler_status_recommendation_containerrecommendations_target
        - commonLabels:
            resource: memory
            unit: byte
          each:
            gauge:
              labelsFromPath:
                container:
                - containerName
              path:
              - status
              - recommendation
              - containerRecommendations
              valueFrom:
              - lowerBound
              - memory
            type: Gauge
          help: VPA container recommendations for memory. Minimum resources the container
            can use before the VerticalPodAutoscaler updater evicts it
          name: verticalpodautoscaler_status_recommendation_containerrecommendations_lowerbound
        - commonLabels:
            resource: memory
            unit: byte
          each:
            gauge:
              labelsFromPath:
                container:
                - containerName
              path:
              - status
              - recommendation
              - containerRecommendations
              valueFrom:
              - upperBound
              - memory
            type: Gauge
          help: VPA container recommendations for memory. Maximum resources the container
            can use before the VerticalPodAutoscaler updater evicts it
          name: verticalpodautoscaler_status_recommendation_containerrecommendations_upperbound
        - commonLabels:
            resource: memory
            unit: byte
          each:
            gauge:
              labelsFromPath:
                container:
                - containerName
              path:
              - status
              - recommendation
              - containerRecommendations
              valueFrom:
              - uncappedTarget
              - memory
            type: Gauge
          help: VPA container recommendations for memory. Target resources the VerticalPodAutoscaler
            recommends for the container ignoring bounds
          name: verticalpodautoscaler_status_recommendation_containerrecommendations_uncappedtarget
        - commonLabels:
            resource: cpu
            unit: core
          each:
            gauge:
              labelsFromPath:
                container:
                - containerName
              path:
              - status
              - recommendation
              - containerRecommendations
              valueFrom:
              - target
              - cpu
            type: Gauge
          help: VPA container recommendations for cpu. Target resources the VerticalPodAutoscaler
            recommends for the container.
          name: verticalpodautoscaler_status_recommendation_containerrecommendations_target
        - commonLabels:
            resource: cpu
            unit: core
          each:
            gauge:
              labelsFromPath:
                container:
                - containerName
              path:
              - status
              - recommendation
              - containerRecommendations
              valueFrom:
              - lowerBound
              - cpu
            type: Gauge
          help: VPA container recommendations for cpu. Minimum resources the container
            can use before the VerticalPodAutoscaler updater evicts it
          name: verticalpodautoscaler_status_recommendation_containerrecommendations_lowerbound
        - commonLabels:
            resource: cpu
            unit: core
          each:
            gauge:
              labelsFromPath:
                container:
                - containerName
              path:
              - status
              - recommendation
              - containerRecommendations
              valueFrom:
              - upperBound
              - cpu
            type: Gauge
          help: VPA container recommendations for cpu. Maximum resources the container
            can use before the VerticalPodAutoscaler updater evicts it
          name: verticalpodautoscaler_status_recommendation_containerrecommendations_upperbound
        - commonLabels:
            resource: cpu
            unit: core
          each:
            gauge:
              labelsFromPath:
                container:
                - containerName
              path:
              - status
              - recommendation
              - containerRecommendations
              valueFrom:
              - uncappedTarget
              - cpu
            type: Gauge
          help: VPA container recommendations for cpu. Target resources the VerticalPodAutoscaler
            recommends for the container ignoring bounds
          name: verticalpodautoscaler_status_recommendation_containerrecommendations_uncappedtarget
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: victoria-metrics
    meta.helm.sh/release-namespace: monitoring
  labels:
    app.kubernetes.io/component: metrics
    app.kubernetes.io/instance: victoria-metrics
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/part-of: kube-state-metrics
    app.kubernetes.io/version: 2.18.0
    helm.sh/chart: kube-state-metrics-6.3.0
    helm.toolkit.fluxcd.io/name: victoria-metrics
    helm.toolkit.fluxcd.io/namespace: monitoring
  name: victoria-metrics-kube-state-metrics-customresourcestate-config
  namespace: monitoring
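
For readers unfamiliar with the custom resource state config above, here is a rough Python sketch of how one gauge entry (`path`, `valueFrom`, `labelsFromPath`, `commonLabels`) might map onto a VerticalPodAutoscaler object. This is not kube-state-metrics code. The helper names `walk` and `gauge_samples` and the sample VPA object are hypothetical, and value conversion (for example turning a quantity string into a number) and the resource-level labels are deliberately not modeled.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple


def walk(obj: Any, path: List[str]) -> Optional[Any]:
    """Follow a list of keys into nested dicts; return None if any key is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def gauge_samples(
    resource: Dict[str, Any],
    path: List[str],
    value_from: List[str],
    labels_from_path: Dict[str, List[str]],
    common_labels: Dict[str, str],
) -> Iterator[Tuple[Dict[str, str], Any]]:
    """Yield (labels, raw_value) for each element of the list found at `path`."""
    elements = walk(resource, path) or []
    for element in elements:
        value = walk(element, value_from)
        if value is None:
            continue  # skip elements that lack the requested field
        labels = dict(common_labels)
        for label, label_path in labels_from_path.items():
            found = walk(element, label_path)
            if found is not None:
                labels[label] = str(found)
        yield labels, value


# Hypothetical VPA object, trimmed to the fields the config reads.
vpa = {
    "metadata": {"name": "example-vpa", "namespace": "default"},
    "status": {
        "recommendation": {
            "containerRecommendations": [
                {"containerName": "app", "target": {"cpu": "25m", "memory": "262144k"}},
                {"containerName": "sidecar", "target": {"cpu": "10m"}},
            ]
        }
    },
}

samples = list(
    gauge_samples(
        vpa,
        path=["status", "recommendation", "containerRecommendations"],
        value_from=["target", "memory"],
        labels_from_path={"container": ["containerName"]},
        common_labels={"resource": "memory", "unit": "byte"},
    )
)
for labels, value in samples:
    print(labels, value)
```

In this sketch each container recommendation that has the requested field yields one sample, so the number of series from this config scales with the number of VPA objects and containers in the cluster.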

Anything else we need to know?:
It doesn't OOM immediately; memory grows steadily over time, which suggests a memory leak.

Here's a memory usage graph of the 12 hours I had version 2.18.0 deployed.

[Image: memory usage graph]
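
To put a number on "grows over time", a growth rate can be estimated from sampled memory readings with a least-squares fit. The following is a minimal sketch only: the function names are hypothetical, and the readings are made up for illustration rather than taken from the graph or from this cluster.

```python
from typing import List, Tuple


def growth_rate(samples: List[Tuple[float, float]]) -> float:
    """Least-squares slope of (hours, MiB) samples, in MiB per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one timestamp")
    return num / den


def hours_until_limit(current_mib: float, limit_mib: float, rate: float) -> float:
    """Hours until the limit is reached, assuming a constant growth rate."""
    if rate <= 0:
        return float("inf")
    return max(0.0, (limit_mib - current_mib) / rate)


# Made-up readings (hours since start, working-set MiB), for illustration only.
readings = [(0.0, 200.0), (3.0, 350.0), (6.0, 500.0), (9.0, 650.0)]
rate = growth_rate(readings)
eta = hours_until_limit(readings[-1][1], 1024.0, rate)
print(f"{rate:.1f} MiB/h, about {eta:.1f} h until a 1024 MiB limit")
```

A roughly constant positive slope that persists after the initial cache fill would be consistent with a leak, while a curve that flattens out would point to a higher steady-state footprint instead.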

Environment:

  • kube-state-metrics version: 2.18.0
  • Kubernetes version (use kubectl version): 1.35.0
  • Cloud provider or hardware configuration: Self-hosted, kubeadm cluster
  • Other info:

Metadata

Assignees

No one assigned

    Labels

    kind/bug: Categorizes issue or PR as related to a bug.
    needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.

    Type

    No type

    Projects

    Status

    Needs Triage

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
