This repository was archived by the owner on Jan 29, 2025. It is now read-only.

Missing Metrics #580

@gkramer

Hey guys,

Wondering if someone could assist with an issue I'm having with BigGraphite (BG). It currently receives a large number of metrics but appears to drop a noticeable proportion of them at random; this was highlighted when looking at metrics from Apache Spark, which show frequent one-minute gaps every hour.
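
For anyone reproducing the check, something like the following counts the per-minute gaps in a series pulled from the render API (a sketch only; the Graphite endpoint and the Spark metric path are placeholders, not our real names):

    # Count gaps in a metric over the last hour via the Graphite render API.
    # GRAPHITE and TARGET below are placeholder values.
    import requests

    GRAPHITE = "http://graphite.internal"        # placeholder endpoint
    TARGET = "spark.driver.jvm.heap.used"        # hypothetical Spark metric path

    resp = requests.get(
        f"{GRAPHITE}/render",
        params={"target": TARGET, "from": "-1h", "format": "json"},
        timeout=30,
    )
    resp.raise_for_status()

    for series in resp.json():
        points = series["datapoints"]            # list of [value, timestamp] pairs
        missing = [ts for value, ts in points if value is None]
        print(f"{series['target']}: {len(missing)}/{len(points)} datapoints missing")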

Infrastructure Setup:

  • Within EKS (1.20)
  • Internal AWS NLB
  • Traffic flow: NLB -> Carbon container -> {Elasticsearch + Cassandra} (see the check sketched after the process list below)
  • Carbon: running inside an upstream Alpine container
  • Process list (ps):
    1 root 0:00 {entrypoint} /bin/sh /entrypoint
    49 root 0:00 runsvdir -P /etc/service
    51 root 0:00 runsv bg-carbon
    52 root 0:03 runsv brubeck
    53 root 0:00 runsv carbon
    54 root 0:00 runsv carbon-aggregator
    55 root 0:03 runsv carbon-relay
    56 root 0:03 runsv collectd
    57 root 0:00 runsv cron
    58 root 0:00 runsv go-carbon
    59 root 0:00 runsv graphite
    60 root 0:00 runsv nginx
    61 root 0:03 runsv redis
    62 root 0:00 runsv statsd
    63 root 0:00 tee -a /var/log/carbon.log
    65 root 0:00 tee -a /var/log/carbon-relay.log
    68 root 0:00 tee -a /var/log/statsd.log
    69 root 0:01 {gunicorn} /opt/graphite/bin/python3 /opt/graphite/bin/gunicorn wsgi --pythonpath=/opt/graphite/webapp/graphite --preload --threads=1 --worker-class=sync --workers=4 --limit-request-line=0 --max-requests=1000 --timeout=65 --bind=0.0
    70 root 0:09 {node} statsd /opt/statsd/config/tcp.js
    71 root 0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
    76 root 0:00 /usr/sbin/crond -f
    79 nginx 0:00 nginx: worker process
    80 nginx 0:00 nginx: worker process
    81 nginx 0:00 nginx: worker process
    82 nginx 0:00 nginx: worker process
    85 root 0:35 tee -a /var/log/bg-carbon.log
    86 root 45:27 /opt/graphite/bin/python3 /opt/graphite/bin/bg-carbon-cache start --nodaemon --debug
    88 root 0:00 tee -a /var/log/carbon-aggregator.log
    156 root 0:41 {gunicorn} /opt/graphite/bin/python3 /opt/graphite/bin/gunicorn wsgi --pythonpath=/opt/graphite/webapp/graphite --preload --threads=1 --worker-class=sync --workers=4 --limit-request-line=0 --max-requests=1000 --timeout=65 --bind=0.0
    157 root 0:49 {gunicorn} /opt/graphite/bin/python3 /opt/graphite/bin/gunicorn wsgi --pythonpath=/opt/graphite/webapp/graphite --preload --threads=1 --worker-class=sync --workers=4 --limit-request-line=0 --max-requests=1000 --timeout=65 --bind=0.0
    158 root 0:46 {gunicorn} /opt/graphite/bin/python3 /opt/graphite/bin/gunicorn wsgi --pythonpath=/opt/graphite/webapp/graphite --preload --threads=1 --worker-class=sync --workers=4 --limit-request-line=0 --max-requests=1000 --timeout=65 --bind=0.0
    159 root 0:47 {gunicorn} /opt/graphite/bin/python3 /opt/graphite/bin/gunicorn wsgi --pythonpath=/opt/graphite/webapp/graphite --preload --threads=1 --worker-class=sync --workers=4 --limit-request-line=0 --max-requests=1000 --timeout=65 --bind=0.0
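
To sanity-check the NLB -> relay -> cache path above, something along these lines compares carbon's self-reported counters over the last hour (a sketch, assuming bg-carbon still publishes carbon's standard carbon.agents.*.metricsReceived / committedPoints series; the Graphite endpoint is a placeholder):

    # Compare points received by the cache against points committed to storage.
    # The series names assume carbon's default self-instrumentation is enabled.
    import requests

    GRAPHITE = "http://graphite.internal"  # placeholder endpoint

    def last_hour_sum(target):
        resp = requests.get(
            f"{GRAPHITE}/render",
            params={"target": f"sumSeries({target})", "from": "-1h", "format": "json"},
            timeout=30,
        )
        resp.raise_for_status()
        series = resp.json()
        if not series:
            return 0.0
        return sum(v for v, _ in series[0]["datapoints"] if v is not None)

    received = last_hour_sum("carbon.agents.*.metricsReceived")
    committed = last_hour_sum("carbon.agents.*.committedPoints")
    print(f"received={received:.0f} committed={committed:.0f}")
    # A large, persistent gap between the two would suggest points are being
    # dropped or queued inside the cache rather than lost on the network.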

I can see traffic coming into the interface (tcpdump/tcpflow), and bg-carbon.log contains entries referencing 'cache query', but there are almost no datapoint log lines for the Spark metrics.
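
As a further check, a synthetic datapoint can be pushed straight at the plaintext listener and then queried back (a sketch; the host name is a placeholder and port 2003 is only the carbon default, the relay here may listen elsewhere):

    # Send one datapoint using the carbon plaintext protocol:
    #   <metric.path> <value> <timestamp>\n
    import socket
    import time

    CARBON_HOST = "carbon.internal"   # placeholder address of the carbon container
    CARBON_PORT = 2003                # carbon's default plaintext port; may differ here
    METRIC = "debug.bg.test_datapoint"

    line = f"{METRIC} 1 {int(time.time())}\n"

    with socket.create_connection((CARBON_HOST, CARBON_PORT), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))

    # Afterwards, query /render?target=debug.bg.test_datapoint&from=-5min to see
    # whether the point made it through the NLB -> relay -> cache path.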

Any assistance in troubleshooting would be greatly appreciated!
