feat: add more spans to opentelemetry plugin #12686
base: master
Signed-off-by: Nic <[email protected]>
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the [email protected] list. Thank you for your contributions.
This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
8d48eeb to 359ddc4
end

inject_core_spans(ctx, api_ctx, conf)
span:set_attributes(attr.int("http.status_code", upstream_status))
Use http.response.status_code to comply with the OTEL semantic conventions.
https://opentelemetry.io/docs/specs/semconv/registry/attributes/http/
-> net.host.name: Str(127.0.0.1)
-> http.method: Str(GET)
-> http.scheme: Str(http)
-> http.target: Str(/anything)
-> http.user_agent: Str(curl/7.64.1)
-> http.target: Str(/headers)
-> http.user_agent: Str(curl/8.16.0)
-> apisix.route_id: Str(otel-tracing-route)
-> apisix.route_name: Empty()
-> http.route: Str(/anything)
-> http.route: Str(/headers)
-> http.status_code: Int(200)
Ditto. Check them again against the conventions: https://opentelemetry.io/docs/specs/semconv/registry/attributes/http/
For previously added attributes that do not comply with the convention, you may retain them, but please add a new attribute that complies with the convention, because those old attributes may have been marked as deprecated.
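A minimal sketch of what emitting both names could look like. The `attr` table here is a hypothetical stand-in so the snippet runs on its own; only the call shape `attr.int(name, value)` is taken from the diff under review, and `status_attributes` is an illustrative helper name, not something in the PR.

```lua
-- Hypothetical stand-in for the plugin's attribute helper (assumption):
-- only the call shape attr.int(name, value) comes from the diff.
local attr = {}
function attr.int(name, value)
    return { key = name, value = value }
end

-- Build both the legacy attribute and the convention-compliant one,
-- so existing consumers keep working while new ones can use the new name.
local function status_attributes(status)
    return {
        attr.int("http.status_code", status),           -- legacy name, retained
        attr.int("http.response.status_code", status),  -- name suggested in the review
    }
end

local attrs = status_attributes(200)
for _, a in ipairs(attrs) do
    print(a.key, a.value)
end
```

The resulting list could then be passed to `span:set_attributes(...)` in place of the single legacy attribute, assuming that method accepts multiple attributes.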
-> net.host.name: Str(127.0.0.1)
-> http.method: Str(GET)
-> http.scheme: Str(http)
-> http.target: Str(/headers)
-> http.user_agent: Str(curl/8.16.0)
-> apisix.route_id: Str(otel-tracing-route)
-> apisix.route_name: Empty()
-> http.route: Str(/headers)
-> http.status_code: Int(200)
ditto
@@ -177,9 +179,10 @@ function _M.match_and_set(api_ctx, match_only, alt_sni)
            -- with it sometimes
            core.log.error("failed to find any SSL certificate by SNI: ", sni)
        end
        tracer.finish(api_ctx.ngx_ctx, tracer.status.ERROR, "failed match SNI")
        return false
    end

    tracer.finish(api_ctx.ngx_ctx)
Strictly speaking, this API design is still strange: many internal details are hidden within the stack data structure, which is opaque to developers.
If any piece of code begins with tracer.start but fails to conclude with tracer.finish, the entire trace will be wrong. The unclosed span will encompass all subsequent spans, which is not the intended behavior.
If we examine OTEL implementations in other major programming languages, we find that many of them follow this pattern:
Use the tracer's method to create a span, then obtain an instance of that span. After internal operations complete, use the span instance's methods to actively terminate the specified span. The ctx will be used to track the current span, ensuring that parent-child relationships are correctly maintained when creating child spans.
-- context-based
local parent_span = tracer.start(ctx, "parent")
local child_span = tracer.start(ctx, "child")
child_span:finish()  -- `end` is a reserved word in Lua, so the method is named finish here
parent_span:finish()
ref: https://opentelemetry.io/docs/languages/go/instrumentation/#create-nested-spans
ref: https://opentelemetry.io/docs/languages/php/instrumentation/#create-nested-spans
ref: https://opentelemetry.io/docs/languages/js/instrumentation/#create-nested-spans
ref: https://opentelemetry.io/docs/languages/java/api/#span
ref: https://opentelemetry.io/docs/languages/cpp/instrumentation/#create-nested-spans
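The context-based pattern can be sketched in plain Lua as follows. This is a toy model under stated assumptions, not APISIX's or opentelemetry-lua's API: `tracer.start`, `Span:finish`, and the `ctx.current` field are hypothetical names chosen for illustration, and `finish` is used because `end` is a reserved word in Lua.

```lua
-- Toy span type (assumption): each span remembers its parent and its context.
local Span = {}
Span.__index = Span

local tracer = {}

-- Start a span whose parent is whatever span ctx currently tracks,
-- then make the new span the current one.
function tracer.start(ctx, name)
    local span = setmetatable({
        name = name,
        parent = ctx.current,
        ctx = ctx,
        ended = false,
    }, Span)
    ctx.current = span
    return span
end

-- Finish this specific span, then move ctx.current up to the nearest
-- ancestor that is still open.
function Span:finish()
    self.ended = true
    local cur = self.ctx.current
    while cur and cur.ended do
        cur = cur.parent
    end
    self.ctx.current = cur
end

local ctx = {}
local parent_span = tracer.start(ctx, "parent")
local child_span = tracer.start(ctx, "child")
child_span:finish()
local sibling_span = tracer.start(ctx, "sibling")
sibling_span:finish()
parent_span:finish()

print(child_span.parent.name, sibling_span.parent.name)
```

Because the caller holds each span instance and ends it explicitly, the code that starts a span is also the code that decides when it ends; the context only records which span is current.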
Alternatively, explicitly pass the parent span instance to the child span to achieve precise hierarchical control.
-- pass span directly
local parent_span = tracer.start("parent")
local child_span = tracer.start("child", parent_span)
child_span:finish()  -- `end` is a reserved word in Lua, so the method is named finish here
parent_span:finish()
Note that Rust code, as in the reference below, typically terminates spans automatically via drop when the span variable goes out of scope.
ref: https://docs.rs/tracing/latest/tracing/span/index.html#span-relationships
None of the OTEL libraries for these languages organize and manage the entire span stack using only global state. Each one terminates spans by returning span instances and requiring developers to explicitly call span.end() when operations conclude.
Our current approach requires any developer writing trace code to be familiar with the stack mechanism we've created and to have complete clarity about where their spans appear in the stack, whether parent spans exist, whether they've closed as expected, or any other state. Otherwise, they cannot write correct code. Therefore, I disagree with the current API design.
It's all too easy to make mistakes like the one below.
tracer.start("parent")
tracer.start("child")
tracer.finish()
-- missing last tracer.finish
The result is incorrect output rather than an obvious failure: whatever you do, calling finish_all will still terminate the span, but the parent-child relationships and timestamps for all subsequent spans will be incorrect. This won't trigger any obvious error messages, making it extremely difficult for users to debug.
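The failure mode can be reproduced with a toy stack-based tracer. This is a simplified model written for illustration, assuming an implicit stack behind `tracer.start` and `tracer.finish`; it is not the code in this PR, and the `stack` and `finished` tables are hypothetical.

```lua
-- Toy model of an implicit span stack (assumption, not the PR's code).
local stack, finished = {}, {}
local tracer = {}

-- The new span's parent is whatever happens to be on top of the stack.
function tracer.start(name)
    local top = stack[#stack]
    local span = { name = name, parent = top and top.name or nil }
    stack[#stack + 1] = span
    return span
end

-- Finish whatever span is on top of the stack; the caller cannot say which.
function tracer.finish()
    local span = table.remove(stack)
    if span then
        finished[#finished + 1] = span
    end
end

tracer.start("parent")
tracer.start("child")
tracer.finish()  -- closes "child"
-- missing tracer.finish() for "parent"

-- Later, unrelated code starts what it believes is a top-level span.
local unrelated = tracer.start("unrelated")
tracer.finish()

print(unrelated.parent)  -- prints "parent"
```

No error is raised anywhere: the unrelated span is silently attached to the leaked parent, which matches the behavior described above.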
I have already pointed this out in #12686 (comment), and @membphis has clearly reiterated this point through images.

Description
Run Jaeger locally by following https://www.jaegertracing.io/docs/2.11/getting-started/#all-in-one
New Features
tracing configuration option to enable detailed observability.
Tests
Checklist