
feat: add more spans to opentelemetry plugin #12686

Merged
nic-6443 merged 55 commits into apache:master from nic-6443:nic/opentelemetry on Feb 7, 2026

@nic-6443 (Member) commented Oct 19, 2025

Description

Run Jaeger locally by following https://www.jaegertracing.io/docs/2.11/getting-started/#all-in-one

  • New Features

    • Added comprehensive distributed tracing across request lifecycle (SSL/SNI, access, header/body filter, upstream, and logging).
    • Enhanced OpenTelemetry integration with improved span propagation, context management, and per-request span lifecycle.
    • New apisix.tracing configuration option to enable comprehensive request lifecycle tracing.
  • Tests

    • Added OpenTelemetry plugin tracing test suite validating span generation and propagation.

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

@Revolyssup Revolyssup marked this pull request as ready for review October 23, 2025 14:21
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. enhancement New feature or request labels Oct 23, 2025
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Oct 23, 2025
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels Oct 23, 2025
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Oct 24, 2025
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels Feb 3, 2026
Copilot AI left a comment

Pull request overview

Adds a new “comprehensive request lifecycle tracing” mode for the OpenTelemetry integration, instrumenting more APISIX internals and documenting/testing the resulting spans.

Changes:

  • Introduces a core tracing utility (apisix/tracer.lua) and span model (apisix/utils/span.lua) to capture APISIX-internal spans across phases.
  • Instruments multiple request lifecycle points (SSL/SNI matching, DNS resolve, plugin/global-rule execution, phases) and injects them into OpenTelemetry export.
  • Adds apisix.tracing config flag, updates EN/ZH docs, and adds a new tracing-focused test.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| t/plugin/opentelemetry6.t | New test covering additional spans being emitted and exported. |
| docs/zh/latest/plugins/opentelemetry.md | Documents enabling comprehensive lifecycle tracing and shows updated collector output. |
| docs/en/latest/plugins/opentelemetry.md | Same as ZH docs, for English readers. |
| conf/config.yaml.example | Adds apisix.tracing example + explanation. |
| apisix/cli/config.lua | Adds default apisix.tracing = false in CLI default config. |
| apisix/tracer.lua | New internal tracer used to create/finish APISIX core spans. |
| apisix/utils/span.lua | New span container type used by the internal tracer. |
| apisix/init.lua | Starts/finishes internal spans across HTTP + SSL phases; finishes all spans in log phase and releases tracing state. |
| apisix/plugin.lua | Wraps plugin execution + global rules with spans. |
| apisix/utils/upstream.lua | Adds a resolve_dns span around domain resolution. |
| apisix/ssl/router/radixtree_sni.lua | Adds a sni_radixtree_match span around SNI routing. |
| apisix/secret.lua | Adds a fetch_secret span around secret retrieval. |
| apisix/plugins/opentelemetry.lua | Adds attributes + injects internal APISIX spans into OpenTelemetry export in log phase. |
| apisix/core/response.lua | Attempts to mark spans as error on >= 400 exits. |
Comments suppressed due to low confidence (1)

apisix/secret.lua:157

  • A span is started (span = tracer.start(...)), but on the pcall(require, ...) failure path the function returns without finishing the span, which leaks an unfinished span into ngx.ctx.tracing. Finish the span with an error status before returning.
    local span = tracer.start(ngx.ctx, "fetch_secret", tracer.kind.client)
    local ok, sm = pcall(require, "apisix.secret." .. opts.manager)
    if not ok then
        return nil, "no secret manager: " .. opts.manager
    end
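A minimal sketch of the fix this comment asks for, runnable in plain Lua outside OpenResty. The tracer table below is a hypothetical stand-in: its finish(ctx, span, status, message) helper and the status strings are assumptions, not the actual apisix/tracer.lua API, and the secret manager's get(conf, key) method is likewise assumed. The point is only that every return path, including the pcall(require, ...) failure, closes the span with an error status first.

```lua
-- Hypothetical stand-in for apisix/tracer.lua, reduced to what this sketch
-- needs; the real module's finish API and status values may differ.
local tracer = {
    kind = { client = "client" },
    status = { OK = "OK", ERROR = "ERROR" },
}

function tracer.start(ctx, name, kind)
    ctx.tracing = ctx.tracing or { spans = {} }
    local span = { name = name, kind = kind, finished = false }
    table.insert(ctx.tracing.spans, span)
    return span
end

-- Assumed helper: records the final status and message on the span.
function tracer.finish(ctx, span, status, message)
    span.finished = true
    span.status = status or tracer.status.OK
    span.message = message
end

-- Sketch of the fetch path with the span closed on every return.
local function fetch_secret(ctx, opts)
    local span = tracer.start(ctx, "fetch_secret", tracer.kind.client)
    local ok, sm = pcall(require, "apisix.secret." .. opts.manager)
    if not ok then
        local err = "no secret manager: " .. opts.manager
        -- Close the span with ERROR before the early return so it does not
        -- linger as an open child in ctx.tracing.
        tracer.finish(ctx, span, tracer.status.ERROR, err)
        return nil, err
    end

    -- sm.get(conf, key) is an assumed manager interface for this sketch.
    local value, err = sm.get(opts.conf, opts.key)
    if not value then
        tracer.finish(ctx, span, tracer.status.ERROR, err)
        return nil, err
    end

    tracer.finish(ctx, span, tracer.status.OK)
    return value
end

-- Usage: a preloaded demo manager succeeds; an unknown manager fails.
package.preload["apisix.secret.demo"] = function()
    return { get = function(_, key) return "value-of-" .. key end }
end

local ctx = {}
print(fetch_secret(ctx, { manager = "demo", key = "token" }))
print(fetch_secret(ctx, { manager = "missing" }))
```

Both calls leave a finished span in ctx.tracing.spans: the first with OK, the second with ERROR and the "no secret manager" message.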

Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

apisix/secret.lua:157

  • A span is started before pcall(require, ...), but when require fails this function returns without finishing the span. That leaves an open child span in ngx.ctx.tracing and can skew parent/child tracking until finish_all runs. Finish the span on this error path as well (and set ERROR status/message).
    local span = tracer.start(ngx.ctx, "fetch_secret", tracer.kind.client)
    local ok, sm = pcall(require, "apisix.secret." .. opts.manager)
    if not ok then
        return nil, "no secret manager: " .. opts.manager
    end
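The parent/child skew described in this comment can be illustrated with a hypothetical tracer that picks each new span's parent from a stack of open spans. That stack model is an assumption made for this sketch; the merged apisix/tracer.lua may track parents differently. Leaving fetch_secret open makes the next span (resolve_dns) attach to it instead of to access, and only a log-phase finish_all eventually closes it.

```lua
-- Hypothetical tracer that chooses each new span's parent from a stack of
-- open spans; an assumption for illustration, not apisix/tracer.lua itself.
local tracer = {}

function tracer.start(ctx, name)
    ctx.tracing = ctx.tracing or { spans = {}, stack = {} }
    local t = ctx.tracing
    local parent = t.stack[#t.stack] -- innermost span still open
    local span = { name = name, parent = parent and parent.name, finished = false }
    table.insert(t.spans, span)
    table.insert(t.stack, span)
    return span
end

function tracer.finish(ctx, span, status)
    local stack = ctx.tracing.stack
    for i = #stack, 1, -1 do
        if stack[i] == span then
            table.remove(stack, i)
            break
        end
    end
    span.finished = true
    span.status = status or "OK"
end

-- Safety net in the spirit of a log-phase finish_all: close leftovers.
-- Marking them "UNSET" is an assumption of this sketch.
function tracer.finish_all(ctx)
    for _, span in ipairs(ctx.tracing.spans) do
        if not span.finished then
            span.finished = true
            span.status = "UNSET"
        end
    end
    ctx.tracing.stack = {}
end

local ctx = {}
local access = tracer.start(ctx, "access")
local secret = tracer.start(ctx, "fetch_secret")
-- Early return without tracer.finish(ctx, secret): the reviewed bug.
local dns = tracer.start(ctx, "resolve_dns")
print(dns.parent) --> fetch_secret (the intended parent was "access")

tracer.finish(ctx, dns)
tracer.finish(ctx, access)
tracer.finish_all(ctx) -- fetch_secret is only closed here
```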

@bzp2010 (Contributor) left a comment

I have nothing new to add.

I've already expressed my views twice; this would be the third time.

A span is an instance. How Lua represents it (a metatable or anything else) doesn't matter to me.

We can always finish a span like this:

local span = tracer.start()

span:finish(api_ctx.ngx_ctx or ngx.ctx, status, message)

-- `api_ctx.ngx_ctx` can even be omitted, since `ngx.ctx` is always available wherever `api_ctx` is accessible.

Attaching a shared metatable to each span table, so that methods receive the span instance via self, is enough to achieve internally the same result as the current approach.
I don't understand why the API design insists on calling stateless functions on the tracer. I've already said this three times, and I don't want to repeat it again.
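A minimal sketch of the method-style API described here: every span shares one metatable, so span:finish(ctx, status, message) receives the span through self instead of passing it back to the tracer. All names and fields are illustrative assumptions; this is not the API that was merged.

```lua
-- Hypothetical method-style span API, sketched from the review comment;
-- this is NOT the API merged in apisix/tracer.lua.
local Span = {}
Span.__index = Span -- every span shares this metatable

local tracer = { status = { OK = "OK", ERROR = "ERROR" } }

function tracer.start(ctx, name, kind)
    ctx.tracing = ctx.tracing or { spans = {}, open = 0 }
    local span = setmetatable({ name = name, kind = kind, finished = false }, Span)
    table.insert(ctx.tracing.spans, span)
    ctx.tracing.open = ctx.tracing.open + 1
    return span
end

-- The span arrives as self, so callers never hand it back to the tracer.
function Span:finish(ctx, status, message)
    -- Under OpenResty, fall back to ngx.ctx as the reviewer suggests;
    -- outside OpenResty, ngx is nil and ctx must be passed explicitly.
    ctx = ctx or (ngx and ngx.ctx)
    if self.finished then
        return -- finishing twice is a no-op
    end
    self.finished = true
    self.status = status or tracer.status.OK
    self.message = message
    ctx.tracing.open = ctx.tracing.open - 1
end

-- Usage
local ctx = {}
local span = tracer.start(ctx, "fetch_secret", "client")
span:finish(ctx, tracer.status.ERROR, "no secret manager: vault")
print(span.name, span.status, ctx.tracing.open)
```

The cost is one setmetatable call per span; in exchange, code that only holds the span (not the tracer) can still close it.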

  1. #12686 (comment)
  2. #12686 (comment)

Throughout the entire process, the PR never addressed my concerns regarding the API design.

I'm not sure how these design conflicts arose; keeping the API compatible with other popular implementations shouldn't be difficult, and I see no external evidence supporting this API design.

Since this API is not what I expected, I won't approve this PR. However, I won't block the merge either; you'll need approval from other maintainers.

@nic-6443 nic-6443 force-pushed the nic/opentelemetry branch 2 times, most recently from 73eb186 to 83be609 February 7, 2026 05:49
Signed-off-by: Nic <[email protected]>
@nic-6443 nic-6443 requested a review from membphis February 7, 2026 07:39
@nic-6443 nic-6443 merged commit afda194 into apache:master Feb 7, 2026
25 of 26 checks passed
@nic-6443 nic-6443 deleted the nic/opentelemetry branch February 7, 2026 10:46

Labels

enhancement New feature or request size:XL This PR changes 500-999 lines, ignoring generated files.

5 participants