Skip to content

Prometheus middleware improvements#627

Open
stat1c-void wants to merge 6 commits into
Bogdanp:masterfrom
stat1c-void:prometheus_changes
Open

Prometheus middleware improvements#627
stat1c-void wants to merge 6 commits into
Bogdanp:masterfrom
stat1c-void:prometheus_changes

Conversation

@stat1c-void

Copy link
Copy Markdown

A number of Prometheus middleware improvements:

  • canonical queue names in metrics
  • livesum for a gauge that was seemingly missing
  • unique PROMETHEUS_MULTIPROC_DIR per top-level process by default
  • made prometheus metrics test a bit more stable (several tries)

Have successfully run pytest --benchmark-skip on my local python 3.10.14.

Some details below.

Canonical queue names

If queue names aren't canonical, then something like this happens to metrics:

dramatiq_delayed_messages_inprogress{..., pid="7", queue_name="costs"} -1
dramatiq_delayed_messages_inprogress{..., pid="7", queue_name="costs.DQ"} 1
dramatiq_delayed_messages_inprogress{..., pid="7", queue_name="costs"} -4
dramatiq_delayed_messages_inprogress{..., pid="7", queue_name="costs.DQ"} 4
dramatiq_delayed_messages_inprogress{..., pid="7", queue_name="default.DQ"} 2
dramatiq_delayed_messages_inprogress{..., pid="6", queue_name="default"} -1
dramatiq_delayed_messages_inprogress{..., pid="6", queue_name="default.DQ"} 2

Unique PROMETHEUS_MULTIPROC_DIR

Previously, Prometheus mutli-proc exporting would interfere between multiple Dramatiq instances on the same machine (because the prom DB path would be the same across different instances). Actually, would interfere even across restarts of the same Dramatiq instance (and even in containerized environments, when containers are just restarted, and not recreated).

In containerized env, top-level PID value is likely to be the same across runs, but with auto-cleanup it can be managed.

Metrics test

Noticed that test_prometheus_middleware_exposes_metrics was sometimes failing due to timings - metrics endpoint wasn't ready in 1 second. Should be a bit more stable now.

@Bogdanp

Bogdanp commented May 30, 2024

Copy link
Copy Markdown
Owner

I’m traveling right now and won’t be able to do a full review for a few weeks, but the casing of the environment variables should remain unchanged, otherwise this would be a breaking change.

@stat1c-void

Copy link
Copy Markdown
Author

Do you think it would be acceptable to use uppercase DRAMATIQ_PROM_DB and fallback to dramatiq_prom_db if uppercase one is missing?

And what's your opinion on PROMETHEUS_MULTIPROC_DIR? I've looked up prom python client code - lowercase variant is deprecated for 3 years now.
https://github.com/prometheus/client_python/blob/09a5ae30602a7a81f6174dae4ba08b93ee7feed2/prometheus_client/multiprocess.py#L27

And I'll try to look at gevent tests. Slightly off-topic, but is tox config up-to-date? I've been having trouble trying to run tox - errors out on tests with something about "conflicting factors".

@Bogdanp

Bogdanp commented May 31, 2024

Copy link
Copy Markdown
Owner

Do you think it would be acceptable to use uppercase DRAMATIQ_PROM_DB and fallback to dramatiq_prom_db if uppercase one is missing?

And what's your opinion on PROMETHEUS_MULTIPROC_DIR? I've looked up prom python client code - lowercase variant is deprecated for 3 years now. https://github.com/prometheus/client_python/blob/09a5ae30602a7a81f6174dae4ba08b93ee7feed2/prometheus_client/multiprocess.py#L27

And I'll try to look at gevent tests. Slightly off-topic, but is tox config up-to-date? I've been having trouble trying to run tox - errors out on tests with something about "conflicting factors".

Let’s use fallbacks for both environment vars. The tox config is probably out of date at the moment since it’s no longer used in CI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants