Add per-turn index to GenAI span names for agent/model requests #5687

@sipa-echo-ngbm

Description

I’d like to propose adding a per-turn index to the GenAI span naming emitted by PydanticAI agent/model tracing, similar in spirit to the per-LLM-turn spans described in Promptfoo’s tracing documentation. Today the model spans are emitted with names like chat {gen_ai.request.model} and the corresponding semantic attributes include gen_ai.operation.name="chat" and gen_ai.request.model="gpt-4.1-mini" (or similar), which is useful, but it does not expose the turn number in a way that is easy to assert against during evaluation.

My use case is evaluation with Promptfoo on top of the OpenTelemetry traces emitted by a PydanticAI agent run. I need to verify not just that the correct tools were called, but that the agent batched tool calls in the correct turn order. For example, in a workflow where the agent must paginate sequentially and then fan out tool calls in parallel, it is important to distinguish:

  1. a single LLM turn that emits multiple tool calls in parallel, versus
  2. multiple LLM turns that emit tools sequentially across generations.
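To make the distinction concrete, here is a minimal sketch of the kind of inference an evaluator has to do today. It assumes a hypothetical flattened `SpanRecord` (name plus start time) rather than any real exporter type, assumes tool spans are named with an `execute_tool ` prefix, and attributes each tool span to the most recent `chat` span by start time. None of that is guaranteed by PydanticAI; it is only meant to show the shape of the problem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanRecord:
    """Hypothetical flattened span: just a name and a start timestamp."""
    name: str
    start_ns: int


def tools_per_turn(spans, chat_prefix="chat ", tool_prefix="execute_tool "):
    """Group tool names under the model turn that precedes them in time.

    The prefixes are assumptions about span naming, not a documented contract.
    """
    turns = []
    for span in sorted(spans, key=lambda s: s.start_ns):
        if span.name.startswith(chat_prefix):
            turns.append([])  # a new model request starts a new turn
        elif span.name.startswith(tool_prefix) and turns:
            turns[-1].append(span.name[len(tool_prefix):])
    return turns


# Case 1: one turn fans out two tool calls, then a final turn answers.
parallel = [
    SpanRecord("chat gpt-4.1-mini", 0),
    SpanRecord("execute_tool fetch_a", 10),
    SpanRecord("execute_tool fetch_b", 11),
    SpanRecord("chat gpt-4.1-mini", 20),
]

# Case 2: two turns each emit one tool call, then a final turn answers.
sequential = [
    SpanRecord("chat gpt-4.1-mini", 0),
    SpanRecord("execute_tool fetch_a", 10),
    SpanRecord("chat gpt-4.1-mini", 20),
    SpanRecord("execute_tool fetch_b", 30),
    SpanRecord("chat gpt-4.1-mini", 40),
]

print(tools_per_turn(parallel))    # [['fetch_a', 'fetch_b'], []]
print(tools_per_turn(sequential))  # [['fetch_a'], ['fetch_b'], []]
```

Both traces contain the same tool spans and identically named `chat` spans, so the only way to tell them apart is by ordering or parent-child structure, which is the indirect inference this issue is trying to avoid.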

Right now, that distinction is hard to make reliably from the existing trace shape: the model span name carries only the operation and the model, the agent span is named separately, and neither identifies which turn a request belongs to. In practice, the trace is great for observability, but it is not sufficient for clean automated assertions about turn structure when tool execution order matters. I ended up needing custom wrapper spans in my provider just to create a stable span name for counting turns, which feels like a workaround for something that could be built into the framework itself.

I’d like to suggest that PydanticAI emit an incrementing turn index on every model request, for example by extending the current span naming or attributes to include something like:

  • span name: gen_ai.operation.name + gen_ai.operation.turn + gen_ai.request.model
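As a rough illustration of the naming scheme above, the following sketch builds the span name and attributes for successive model requests. The helper `make_turn_namer` is hypothetical and not part of PydanticAI; `gen_ai.operation.turn` is the attribute proposed in this issue, not an existing semantic convention; and the space separator and 1-based start are my assumptions, since the proposal does not fix either.

```python
from itertools import count


def make_turn_namer(operation: str, model: str, start: int = 1):
    """Return a callable that yields (span_name, attributes) per model request.

    Hypothetical helper: the turn counter is scoped to one agent run, so a
    new namer would be created for each run.
    """
    counter = count(start)

    def next_span():
        turn = next(counter)
        attributes = {
            "gen_ai.operation.name": operation,
            "gen_ai.operation.turn": turn,  # proposed attribute, not standardized
            "gen_ai.request.model": model,
        }
        # Assumed format: "<operation> <turn> <model>", space-separated.
        return f"{operation} {turn} {model}", attributes

    return next_span


namer = make_turn_namer("chat", "gpt-4.1-mini")
first_name, first_attrs = namer()
second_name, second_attrs = namer()

print(first_name)   # chat 1 gpt-4.1-mini
print(second_name)  # chat 2 gpt-4.1-mini
```

Exposing the index as an attribute as well as in the name would let evaluators filter on it directly without parsing span names, though either form alone would cover my use case.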

That would make trace-based evaluation much cleaner because it would let tools like Promptfoo distinguish turn boundaries directly from the trace, instead of inferring them indirectly from parent-child spans or from model-specific span titles. It would also make it easier to assert sequential versus parallel tool calling behavior, which is a common requirement for latency-sensitive agent workflows.

Labels: OpenTelemetry, evals, feature, pydanty:feature
