Member

@nic-6443 nic-6443 commented Oct 19, 2025

Description

Run Jaeger locally by following https://www.jaegertracing.io/docs/2.11/getting-started/#all-in-one

  • New Features

    • Added comprehensive distributed tracing across request lifecycle (SSL/SNI, access, header/body filter, upstream, and logging).
    • Enhanced OpenTelemetry integration with improved span propagation, context management, and per-request span lifecycle.
    • New tracing configuration option to enable detailed observability.
  • Tests

    • Added OpenTelemetry plugin tracing test suite validating span generation and propagation.

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

@Revolyssup Revolyssup marked this pull request as ready for review October 23, 2025 14:21
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. enhancement New feature or request labels Oct 23, 2025
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Oct 23, 2025
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels Oct 23, 2025
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Oct 24, 2025
@github-actions

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Dec 29, 2025
@github-actions

This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Jan 26, 2026
@AlinsRan AlinsRan reopened this Jan 29, 2026
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Jan 29, 2026
@AlinsRan AlinsRan marked this pull request as draft January 29, 2026 02:00
@AlinsRan AlinsRan marked this pull request as ready for review January 29, 2026 05:46
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels Jan 29, 2026
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Jan 29, 2026
@github-actions github-actions bot removed the stale label Jan 29, 2026
end

inject_core_spans(ctx, api_ctx, conf)
span:set_attributes(attr.int("http.status_code", upstream_status))
Contributor

Use http.response.status_code to comply with the OTEL semantic conventions.

https://opentelemetry.io/docs/specs/semconv/registry/attributes/http/

Comment on lines 305 to 313
  -> net.host.name: Str(127.0.0.1)
  -> http.method: Str(GET)
  -> http.scheme: Str(http)
- -> http.target: Str(/anything)
- -> http.user_agent: Str(curl/7.64.1)
+ -> http.target: Str(/headers)
+ -> http.user_agent: Str(curl/8.16.0)
  -> apisix.route_id: Str(otel-tracing-route)
  -> apisix.route_name: Empty()
- -> http.route: Str(/anything)
+ -> http.route: Str(/headers)
  -> http.status_code: Int(200)
Contributor

@bzp2010 bzp2010 Jan 29, 2026

Ditto. Please check them again against the conventions: https://opentelemetry.io/docs/specs/semconv/registry/attributes/http/

Previously added attributes that do not comply with the conventions may be retained, but please add a new, compliant attribute alongside each of them, because the old attributes may have been marked as deprecated.
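A minimal sketch of that "keep the old key, add the compliant one" approach. The helper name `with_semconv_aliases` and the plain key/value table are assumptions for illustration and are not part of the plugin; the key mapping follows the renames listed in the OTEL HTTP semantic conventions.

```lua
-- Deprecated HTTP attribute keys and their current semantic-convention names.
-- http.target is handled separately because it splits into url.path/url.query.
local RENAMED = {
    ["http.method"]      = "http.request.method",
    ["http.status_code"] = "http.response.status_code",
    ["http.scheme"]      = "url.scheme",
    ["http.user_agent"]  = "user_agent.original",
    ["net.host.name"]    = "server.address",
}

-- Hypothetical helper: returns a new table holding the original keys plus
-- their convention-compliant aliases. Existing keys are never overwritten.
local function with_semconv_aliases(attrs)
    local out = {}
    for key, value in pairs(attrs) do
        out[key] = value
        local new_key = RENAMED[key]
        if new_key and attrs[new_key] == nil then
            out[new_key] = value
        end
    end

    local target = attrs["http.target"]
    if target then
        -- Split "/path?query" into its two parts; the query may be absent.
        local path, query = target:match("^([^?]*)%??(.*)$")
        if out["url.path"] == nil then
            out["url.path"] = path
        end
        if query ~= "" and out["url.query"] == nil then
            out["url.query"] = query
        end
    end
    return out
end

local attrs = with_semconv_aliases({
    ["http.method"] = "GET",
    ["http.status_code"] = 200,
    ["http.target"] = "/headers?a=1",
    ["http.route"] = "/headers",
})
```

With this shape, existing dashboards that query `http.status_code` keep working while new consumers can rely on `http.response.status_code`.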

Comment on lines +304 to +312
-> net.host.name: Str(127.0.0.1)
-> http.method: Str(GET)
-> http.scheme: Str(http)
-> http.target: Str(/headers)
-> http.user_agent: Str(curl/8.16.0)
-> apisix.route_id: Str(otel-tracing-route)
-> apisix.route_name: Empty()
-> http.route: Str(/headers)
-> http.status_code: Int(200)
Contributor

ditto

Comment on lines 173 to 185
@@ -177,9 +179,10 @@ function _M.match_and_set(api_ctx, match_only, alt_sni)
-- with it sometimes
core.log.error("failed to find any SSL certificate by SNI: ", sni)
end
tracer.finish(api_ctx.ngx_ctx, tracer.status.ERROR, "failed match SNI")
return false
end

tracer.finish(api_ctx.ngx_ctx)
Contributor

@bzp2010 bzp2010 Jan 29, 2026

Strictly speaking, this API design is still strange: many internal details are hidden inside the stack data structure, which is opaque to developers.
If any section of code calls tracer.start but never reaches the matching tracer.finish, the rest of the trace is corrupted. The unclosed span becomes the parent of every subsequent span, which is not the intended behavior.

If we examine OTEL implementations in other major programming languages, we find that many of them follow this pattern:

Use the tracer's method to create a span, then obtain an instance of that span. After internal operations complete, use the span instance's methods to actively terminate the specified span. The ctx will be used to track the current span, ensuring that parent-child relationships are correctly maintained when creating child spans.

-- context-based
local parent_span = tracer.start(ctx, "parent")

local child_span = tracer.start(ctx, "child")

-- `end` is a reserved word in Lua, so the method is spelled finish() here
child_span:finish()

parent_span:finish()

ref: https://opentelemetry.io/docs/languages/go/instrumentation/#create-nested-spans
ref: https://opentelemetry.io/docs/languages/php/instrumentation/#create-nested-spans
ref: https://opentelemetry.io/docs/languages/js/instrumentation/#create-nested-spans
ref: https://opentelemetry.io/docs/languages/java/api/#span
ref: https://opentelemetry.io/docs/languages/cpp/instrumentation/#create-nested-spans
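A minimal sketch of how such a context-based API could look in Lua. Everything here (the `Span` table, `ctx.current_span`, the `finish` method) is a hypothetical illustration and not the tracer in this PR; the method is named `finish` because `end` is a reserved word in Lua.

```lua
local Span = {}
Span.__index = Span

local tracer = {}

-- Creates a span whose parent is whatever span ctx currently tracks,
-- then makes the new span the current one.
function tracer.start(ctx, name)
    local span = setmetatable({
        name = name,
        parent = ctx.current_span,  -- nil for a root span
        ctx = ctx,
        ended = false,
    }, Span)
    ctx.current_span = span
    return span
end

-- Ends this specific span. ctx falls back to the nearest ancestor that is
-- still open, but only if this span is the one ctx currently tracks.
function Span:finish()
    if self.ended then
        return
    end
    self.ended = true
    if self.ctx.current_span == self then
        local ancestor = self.parent
        while ancestor and ancestor.ended do
            ancestor = ancestor.parent
        end
        self.ctx.current_span = ancestor
    end
end

local ctx = {}
local parent_span = tracer.start(ctx, "parent")
local child_span = tracer.start(ctx, "child")
child_span:finish()
parent_span:finish()
```

Because each span is ended through its own handle, the caller never needs to know how deep the span sits or what else is open.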

Alternatively, explicitly pass the parent span instance to the child span to achieve precise hierarchical control.

-- pass span directly
local parent_span = tracer.start("parent")

local child_span = tracer.start("child", parent_span)

-- `end` is a reserved word in Lua, so the method is spelled finish() here
child_span:finish()

parent_span:finish()
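The explicit-parent variant can be sketched the same way. The names below are again hypothetical, and the example deliberately forgets one `finish` call to show that the mistake stays local to that span.

```lua
local Span = {}
Span.__index = Span

function Span:finish()
    self.ended = true
end

local tracer = {}

-- parent is optional; omit it to create a root span.
function tracer.start(name, parent)
    return setmetatable({ name = name, parent = parent, ended = false }, Span)
end

local parent_span = tracer.start("parent")
local child_span = tracer.start("child", parent_span)
-- child_span:finish() is deliberately forgotten here

local sibling_span = tracer.start("sibling", parent_span)
sibling_span:finish()
parent_span:finish()
```

Here `sibling_span` still points at `parent_span`, since the hierarchy comes from the argument rather than from shared state; only `child_span` itself is left open.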

Note that the Rust code referenced below typically ends spans automatically: the span is closed via drop when the variable holding the span instance goes out of scope.

ref: https://docs.rs/tracing/latest/tracing/span/index.html#span-relationships

None of the OTEL libraries for these languages organize and manage the entire span stack using only global state. Each one terminates spans by returning span instances and requiring developers to explicitly call span.end() when operations conclude.

Our current approach requires any developer writing trace code to be familiar with the stack mechanism we've created and to have complete clarity about where their spans appear in the stack, whether parent spans exist, whether they've closed as expected, or any other state. Otherwise, they cannot write correct code. Therefore, I disagree with the current API design.

It's all too easy to make mistakes like the one below.

tracer.start("parent")

tracer.start("child")

tracer.finish()

-- missing last tracer.finish

The result is incorrect output rather than a visible failure: finish_all will eventually terminate the dangling span no matter what, but the parent-child relationships and timestamps of all subsequent spans will be wrong. No obvious error message is produced, which makes this extremely difficult for users to debug.
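The failure mode can be reproduced with a toy model of a stack-based tracer. This is a simplified assumption about how such a stack behaves, not the actual implementation in this PR.

```lua
-- Toy stack-based tracer: start pushes, finish pops the top of the stack.
local stack, finished = {}, {}
local tracer = {}

function tracer.start(name)
    local top = stack[#stack]
    stack[#stack + 1] = { name = name, parent = top and top.name or nil }
end

function tracer.finish()
    finished[#finished + 1] = table.remove(stack)
end

tracer.start("parent")
tracer.start("child")
tracer.finish()             -- closes "child"
-- missing tracer.finish() for "parent"

tracer.start("next_phase")  -- intended to be a root span
tracer.finish()

for _, span in ipairs(finished) do
    print(span.name, span.parent)
end
-- child       parent
-- next_phase  parent
```

`next_phase` is silently reported as a child of `parent`, and `parent` is still sitting on the stack with no end time, yet nothing in the code raises an error.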


I have already pointed this out in #12686 (comment), and @membphis has clearly reiterated this point through images.


Labels

enhancement New feature or request size:XXL This PR changes 1000+ lines, ignoring generated files.

5 participants