Add instrument tracing #149

zxiong · 2023-11-27T06:48:02Z

Add common instrument tracing function so that it can be used by pubtools-* projects

for more information, see https://pre-commit.ci

codecov · 2023-11-27T06:52:26Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (faa0e80) 98.66% compared to head (ea0eeee) 99.36%.
Report is 5 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #149      +/-   ##
==========================================
+ Coverage   98.66%   99.36%   +0.69%     
==========================================
  Files           4        6       +2     
  Lines          75      157      +82     
==========================================
+ Hits           74      156      +82     
  Misses          1        1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

rohanpm · 2023-11-28T00:58:23Z

pubtools/tracing.py

@@ -0,0 +1,123 @@
+"""Instrument Pub task functions.


Can you please revise the docs and the environment variable names to not be specific to pub? This code should work even when running outside of pub.

rohanpm · 2023-11-28T01:01:17Z

pubtools/tracing.py

+log = logging.getLogger(__name__)
+
+
+class TracingWrapper:


There's no delineation here between what's intended as public API, and what's considered just an implementation detail.

As far as I can tell, only the instrument_func decorator should be public API. Everything else should be hidden.

Can you please follow the same setup as other modules in order to achieve this? Check how pubtools/pluggy.py works for example. Basically it means that:

this file should be moved to pubtools/_impl/tracing.py and it can contain anything you want

in pubtools/tracing.py you should import only what's intended as public API - instrument_func I guess - and put it into __all__ .

Updated as your suggestion.

rohanpm · 2023-11-28T01:04:42Z

pubtools/tracing.py

+        return cls.__instance
+
+
+def instrument_func(span_name=None, args_to_attr=False):


Can you please ensure that this appears in online docs at https://release-engineering.github.io/pubtools/ ?

I think you'll have to write a whole new page for this, similar to the hooks page which exists now. The page can contain the docs for this function and also general info about how to enable this (e.g. how to set the relevant env vars).

Added docs for it.

rohanpm · 2023-11-28T01:05:19Z

pubtools/tracing.py

+
+    Args:
+        span_name: str
+            Span name.


Could you please mention that it's OK to omit this in order to use the function's name as the span name?

rohanpm · 2023-11-28T01:21:47Z

pubtools/tracing.py

+        if TracingWrapper.__instance is None:
+            log.info("Creating TracingWrapper instance")
+            cls.__instance = super().__new__(cls)
+            otlp_exporter = OTLPSpanExporter(


I don't think that the exporter part of this should be hardcoded to use only this type of exporter.

Correct me if I'm wrong, but I think the intended design is that you can set up all your tracing code, and then choose different exporters for different scenarios without having to change the tracing code?

e.g. in https://opentelemetry.io/docs/instrumentation/python/exporters/#console-exporter it refers to using a console exporter which seems to be built in to the main otel library. It says it's good for "development and debugging tasks" and I can imagine that indeed being quite useful for us.

For instance, if I'm working on pubtools-foo and I'm adding instrumentation code, it would be really nice if I can somehow make it use the console exporter when I run commands locally, but then use the HTTP exporter when deployed in Pub. I wouldn't want to be forced to deploy some kind of OTLP/HTTP receiver locally just to test my OTEL-related changes during local development.

I think you should consider using a hook to create the exporter. You can look at existing get_cert_key_paths hook to get an idea what's needed across pubtools & pub for that.

So, it could be something like this:

@hookspec(firstresult=True) def otel_exporter(): """Return an OTEL exporter, used by OTEL instrumentation. If OTEL tracing is enabled and this hook is not implemented, a default `ConsoleSpanExporter` will be used. """

...then in Pub, implement the otel_exporter hook to return an OTLPSpanExporter.

That should allow all the OTEL related code to work even in local development and tests while exporting to console, and work in Pub by using OTLP/HTTP.

Added this hook.

rohanpm · 2023-11-28T01:25:44Z

pubtools/tracing.py

+            attributes = {
+                "function_name": func.__qualname__,
+            }
+            if args_to_attr:


I would rather this is moved up to the top before the PUB_OTEL_TRACING check, even though the result is not used when it's false.

The reason is that there's no guarantee the arguments can be converted to string successfully. I think we can say it's a bug to use args_to_attr=True in that case, but the problem is that the bug will not be detected unless PUB_OTEL_TRACING is set. So, this sets up a trap where somebody adds @instrument_func on some new code and it seems to work fine when they test locally with PUB_OTEL_TRACING not set, but as soon as it's deployed into production where PUB_OTEL_TRACING is set, it crashes.

By moving up this attribute calculation to the beginning, we ensure bugs in the attribute calculation will be detected immediately even if tracing is disabled.

Updated as your suggestion.

rohanpm · 2023-11-28T01:27:57Z

tests/test_instrument_tracing.py

+def test_instrument_func_multiple_threads():
+    trace_id = "cefb2b8db35d5f3c0dfdf79d5aab1451"
+    span_id = "1f2bb7927f140744"
+    os.environ["PUB_OTEL_TRACING"] = "true"


If you change os.environ directly, it's not reverted at the end of the test, so this test might influence later tests run by the same process.

Please use pytest monkeypatch fixture instead. If you monkeypatch.setenv, whatever you set is automatically rolled back at the end of the test.

Updated those tests to use monkeypatch

for more information, see https://pre-commit.ci

- Revise the docs and the env-vars names to not be specific to pub. - Hide instrument_func implementation. - Use a hook to create the exporter instead of hardcode. - Adjust function args to span attributes. - Use pytest monkeypatch fixture instead of changing os.environ in unit test.

for more information, see https://pre-commit.ci

rohanpm · 2023-12-05T00:28:50Z

docs/tracing.rst

+
+
+Usage
+.....................


I believe the number of characters in these header lines need to match exactly the line above in order for docs to render properly and without warnings.

So, for example, since "Usage" is 5 characters long this needs to be:

Suggested change

.....................

.....

rohanpm · 2023-12-05T00:29:34Z

docs/tracing.rst

+Following environment variables are used in the module:
+
+- ``OTEL_TRACING``: set ``true`` to enable instrument tracing, otherwise tracing is disabled.
+- ``OTEL_SERVICE_NAME``: required, set the value of the service.name resource attribute. It's


At the moment I don't quite understand this, can you give me an example? (Just telling me here and not putting into the docs.) For example would we be setting this to "pub" ?

"pub-hub" for Pub hub, and "pubd" (or pub-worker) for Pub worker. It can be as a filter field in field condition.

rohanpm · 2023-12-05T00:29:48Z

docs/tracing.rst

+
+
+OTEL Exporter
+~~~~~~~~~~~~~~


Again, not perfectly aligned with the header text.

rohanpm · 2023-12-05T00:34:37Z

pubtools/_impl/tracing.py

+            log.info("Creating TracingWrapper instance")
+            cls.__instance = super().__new__(cls)
+            exporter = (
+                pm.hook.otel_exporter()


I think this should be called only once in case construction of the exporter is expensive or has some side-effects.

rohanpm · 2023-12-05T00:40:33Z

docs/tracing.rst

+Instrument tracing for functions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Before instrument tracing for functions, :class:`TracingWrapper` should be called once in the service, which


Could we please remove this requirement? It seems like an ugly implementation detail. The user doesn't have to actually do anything with the class except instantiate it, right?

Actually, I'm reading TracingWrapper right now and I can't understand why an instance of the class is even needed. There is never any reference to TracingWrapper.__instance anywhere. Is the point of that code just to do some one-time initialization and there is no technical reason for it to be a class (as opposed to e.g. just a plain function)?

Anyway, I would very much like us to find some way to avoid this and to do this automatically for the user.

Right now, when does the initialization need to happen? Is it necessary before @instrument_func or is it just necessary before one of the instrumented functions is invoked? I would suggest that instrument_func itself can arrange for this initialization to happen once, when needed, and don't require the caller to care about this.

I do see one usage of TracingWrapper in the change, which is that this happens in some tests:

tracing_wrapper.processor.force_flush()

Would you anticipate that other projects would ever need to do something like that? If so, that's a reason why it would have to remain in the API.

If it needs to remain in the API because projects would need to use API like the above, here is a different suggestion how to handle it:

There should be a public class named something like TraceContext which provides access to the processor (and whatever other otel classes that projects might need to access).

There should be a public function named something like get_trace_context() which returns the single global trace context, creating it if necessary; or returning an object which does nothing if tracing is disabled.

The instrument_func decorator, rather than being a standalone function, should be a member of TraceContext. The way of using it downstream would then be something like:

from pubtools.tracing import get_trace_context ctx = get_trace_context() class Whatever: @ctx.instrument_func(...) def some_method_i_want_to_instrument(self): ...

Note that it forces the caller to obtain a TraceContext before they can instrument anything, so it's not possible to forget to initialize the context.

I do see one usage of TracingWrapper in the change, which is that this happens in some tests:

tracing_wrapper.processor.force_flush()

Would you anticipate that other projects would ever need to do something like that? If so, that's a reason why it would have to remain in the API.

tracing_wrapper.processor.force_flush() will be used in pubd to make sure all trace data to be sent to exporters before a pub task exits.

Besides that racing_wrapper.provider will be used when enabling RequestsInstrumentor() in pubd.
RequestsInstrumentor().instrument(tracer_provider=tracing_wrapper.provider)

rohanpm · 2023-12-05T00:42:35Z

requirements.txt

@@ -1,2 +1,5 @@
 pluggy
 setuptools
+opentelemetry-api
+opentelemetry-sdk
+opentelemetry-exporter-otlp


I think this dependency can be removed now, because we now expect that the implementer of the otel_exporter hook (e.g. pub) is the one which declares the dependency on this (or on whatever exporter it wants to use).

Removed opentelemetry-exporter-otlp.

rohanpm · 2023-12-05T00:43:50Z

pubtools/_impl/tracing.py

+            exporter = (
+                pm.hook.otel_exporter()
+                if pm.hook.otel_exporter()
+                else ConsoleSpanExporter()


Because I've never run this, I have no idea what the console exporter does. Have you tried it yet? Does it seem like a useful default when running locally?

Yes, I tried it in my local env. This exporter can print out all spans in the trace in the console, for example:

{ "name": "test_span", "context": { "trace_id": "0x57eabe45b97e63ba6d734f1b9b751daf", "span_id": "0xcb886a22023940bc", "trace_state": "[]" }, "kind": "SpanKind.INTERNAL", "parent_id": null, "start_time": "2023-12-05T01:40:50.679578Z", "end_time": "2023-12-05T01:40:50.679598Z", "status": { "status_code": "UNSET" }, "attributes": {}, "events": [], "links": [], "resource": { "attributes": { "service.name": "client-local-test" }, "schema_url": "" } }
The benefit of console exporter is easy to setup. But, If we'd like to see the relationships between spans clearly, it's not straightforward.

For now, I am more using OLTP exporter to send trace data a local jaeger which provides web UI to see the trace in my local test env.

for more information, see https://pre-commit.ci

rohanpm · 2023-12-05T05:29:57Z

pubtools/_impl/tracing.py

+            cls.__instance = super().__new__(cls)
+            exporter = pm.hook.otel_exporter()
+            if not exporter:
+                exporter = ConsoleSpanExporter()


You should be able to cheat and write it like this if you don't want to bother with explicitly covering the branch codecov is complaining about:

exporter = pm.hook.otel_exporter() or ConsoleSpanExporter()

rohanpm · 2023-12-05T05:38:56Z

pubtools/_impl/tracing.py

+
+def get_trace_wrapper():
+    """return a global trace wrapper instance"""
+    return TracingWrapper()


There is no special case here when tracing is disabled. Isn't it going to lead to a crash? You've implemented __new__ so it'll return None in that case.

Then won't the following happen?

from pubtools.tracing import get_trace_wrapper tw = get_trace_wrapper() @tw.instrument_func(...) # CRASH! When tracing is disabled, tw is None so this can't work. def some_func(...): ....

That's why I said it'd have to return an object which "does nothing" (i.e. instrument_func exists but has no effect) in the case of tracing being disabled.

This is not necessarily just about instrument_func either. You said that you expect downstream code to access tw.processor, that would also crash if tw is None. Will everywhere accessing tw.processor need to first check if tw is None, or maybe check if tw.processor is None? It would be nice to come up with some better way to handle it because that will lead to bugs where people write code which only works when tracing is enabled, or vice-versa.

One option would be not to expose processor at all, but instead on the TracingWrapper class you implement the specific methods that you expect downstream to need (e.g. have a TracingWrapper.flush() method and have it do nothing if tracing is disabled).

By the way, I'm not familiar with the otel python library at all, but it wouldn't surprise me if it already has some way to make it gracefully do nothing. If so it would be cleaner to use that than to have to introduce None vs non-None code paths.

It did crash. I will fix it.

tests/tracing/test_instrument_tracing.py

for more information, see https://pre-commit.ci

rohanpm · 2023-12-05T20:37:50Z

pubtools/_impl/tracing.py

+            if os.getenv("OTEL_TRACING", "").lower() == "true":
+                log.info("Creating TracingWrapper instance")
+                exporter = pm.hook.otel_exporter() or ConsoleSpanExporter()
+                cls.provider = TracerProvider(


Is there a particular reason that you set these on the class object rather than the instance just created?

I suppose that it will work, but it's odd and it's effectively creating two levels of singleton. In the same way that TracingWrapper.instance is global, so is TracingWrapper.provider, TracingWrapper.processor, TracingWrapper.tracer. It's questionable then why a class is even needed if there is never any instance-specific data.

This should all work but I think it would be way cleaner if it were written like the below. Just implement TracingWrapper as a plain old class, no overriding __new__, and then implement get_trace_wrapper() so it returns an existing copy of it.

TRACE_WRAPPER = None def get_trace_wrapper(): """return a global trace wrapper instance""" if TRACE_WRAPPER is None: TRACE_WRAPPER = TracingWrapper() return TRACE_WRAPPER # TracingWrapper is implemented "normally" class TracingWrapper: def __init__(self): # These should be private. We want callers to only use our # public methods e.g. `force_flush` so we can handle the `tracing disabled` case # for them. self._processor = None self._provider = None self._tracer = None if os.getenv("OTEL_TRACING", "").lower() == "true": # OK, actually initialize self._processor etc to functioning objects now self._processor = ... def force_flush(self): """Flush trace data into OTEL collectors""" if self._processor: self._processor.force_flush()

Personally I think that's a lot easier to understand and deal with, and it means that tests in this repo can just write TracingWrapper() and have it work as expected - creating a clean new instance of the object.

Also, we haven't discussed it before, but the singleton initialization would need to use a lock if there is any chance that multiple threads might be doing the initialization at the same time.

Is there a particular reason that you set these on the class object rather than the instance just created?

No special reason. Just thought that TracingWrapper is only needed to instanced once in the pubd/pub task job and referred it to other projects, so use singleton. Implementing TracingWrapper as a plain old class looks better, let me update it.

for more information, see https://pre-commit.ci

rohanpm · 2023-12-06T02:19:36Z

pubtools/_impl/tracing.py

+                        "{}={}".format(k, v) for k, v in kwargs.items()
+                    )
+
+                if not os.getenv("OTEL_TRACING", "").lower() == "true":


Minor: it's probably a good idea to do this env var check only once, in the TracingWrapper constructor, and store the result as self._enabled or similar.

Otherwise we have to worry about what happens if you have the env var unset, construct a TracingWrapper, instrument functions, then set the env var and call the instrumented functions. It'll be kind of half enabled, half disabled at that point.

rohanpm · 2023-12-06T02:43:10Z

I think the pre-commit failure you're running into can be fixed by adding this into tox.ini:

[isort]
profile = black

for more information, see https://pre-commit.ci

Add instrument tracing

a5ce0f7

zxiong requested review from rohanpm, rbikar and querti as code owners November 27, 2023 06:48

[pre-commit.ci] auto fixes from pre-commit.com hooks

64f6887

for more information, see https://pre-commit.ci

zxiong requested a review from lubomir November 27, 2023 06:48

zxiong mentioned this pull request Nov 27, 2023

Instrument tracing for container push release-engineering/pubtools-quay#201

Merged

midnightercz approved these changes Nov 27, 2023

View reviewed changes

zxiong added 3 commits November 27, 2023 16:09

Fix tests

c8e3d4d

Using mock env-vars instead of updating

cba169c

Improve instrument tracing tests

261be6c

rohanpm requested changes Nov 28, 2023

View reviewed changes

zxiong and others added 7 commits November 28, 2023 10:06

Fix test coverage

b44b400

[pre-commit.ci] auto fixes from pre-commit.com hooks

b6da0e5

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

e9b5221

for more information, see https://pre-commit.ci

Add docs for OTEL instrument tracing

5947846

Fix tests

320e87f

Fix some typos

62b9922

zxiong requested a review from rohanpm December 4, 2023 10:31

rohanpm requested changes Dec 5, 2023

View reviewed changes

zxiong and others added 2 commits December 5, 2023 12:36

Refactor tracing code and fix some docs

0da956d

[pre-commit.ci] auto fixes from pre-commit.com hooks

f641afa

for more information, see https://pre-commit.ci

rohanpm requested changes Dec 5, 2023

View reviewed changes

rohanpm reviewed Dec 5, 2023

View reviewed changes

tests/tracing/test_instrument_tracing.py Show resolved Hide resolved

zxiong and others added 4 commits December 5, 2023 15:22

Fix TracingWrapper null issue

5c9c198

[pre-commit.ci] auto fixes from pre-commit.com hooks

e1695ff

for more information, see https://pre-commit.ci

Make TracingWrapper able to be wiped out

f707016

[pre-commit.ci] auto fixes from pre-commit.com hooks

c577c48

for more information, see https://pre-commit.ci

rohanpm reviewed Dec 5, 2023

View reviewed changes

zxiong and others added 2 commits December 6, 2023 09:58

Using plain class for TracingWrapper

87488e0

[pre-commit.ci] auto fixes from pre-commit.com hooks

ca57d6f

for more information, see https://pre-commit.ci

rohanpm approved these changes Dec 6, 2023

View reviewed changes

zxiong and others added 2 commits December 6, 2023 12:45

Improve OTEL tracing check and fix isort check

3b355d3

[pre-commit.ci] auto fixes from pre-commit.com hooks

ea0eeee

for more information, see https://pre-commit.ci

rohanpm approved these changes Dec 6, 2023

View reviewed changes

zxiong merged commit 3ba55b8 into release-engineering:main Dec 6, 2023
6 checks passed

		return cls.__instance


		def instrument_func(span_name=None, args_to_attr=False):



		Usage
		.....................

Add instrument tracing #149

Add instrument tracing #149

Conversation

zxiong commented Nov 27, 2023

codecov bot commented Nov 27, 2023 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rohanpm commented Dec 6, 2023

codecov bot commented Nov 27, 2023 •

edited

Loading