Add per-turn index to GenAI span names for agent/model requests

### Description

I’d like to propose adding a per-turn index to the GenAI span naming emitted by PydanticAI agent/model tracing, similar in spirit to the per-LLM-turn spans described in Promptfoo’s tracing documentation. Today the model spans are emitted with names like `chat {gen_ai.request.model}` and the corresponding semantic attributes include `gen_ai.operation.name="chat"` and `gen_ai.request.model="gpt-4.1-mini"` (or similar), which is useful, but it does not expose the turn number in a way that is easy to assert against during evaluation. 

My use case is evaluation with Promptfoo on top of the OpenTelemetry traces emitted by a PydanticAI agent run. I need to verify not just that the correct tools were called, but also that the agent batched its tool calls in the correct turn order. For example, in a workflow where the agent must paginate sequentially and then fan out tool calls in parallel, it is important to distinguish:  
1. a single LLM turn that emits multiple tool calls in parallel, versus  
2. multiple LLM turns that emit tools sequentially across generations. 

Right now, that distinction is hard to make reliably from the existing trace shape because the visible model span name is model-based and the agent span name is separate. In practice, the trace is great for observability, but it is not sufficient for clean automated assertions about turn structure when tool execution order matters. I ended up needing custom wrapper spans in my provider just to create a stable span name for counting turns, which feels like a workaround for something that could be built into the framework itself. 

I’d like to suggest that PydanticAI emit an incrementing turn index on every model request, for example by extending the current span naming or attributes to include something like:  
- span name: `gen_ai.operation.name + gen_ai.operation.turn + gen_ai.request.model`, where `gen_ai.operation.turn` is the proposed new per-run turn index  
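
To make the proposal concrete, here is a minimal sketch of how such a name could be built. It is illustrative only and not PydanticAI code: `TurnCounter` and `span_for` are hypothetical names, `gen_ai.operation.turn` is the attribute proposed in this issue rather than an existing convention, and the 1-based index and space-separated name format are assumptions.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Any


@dataclass
class TurnCounter:
    """Hypothetical per-agent-run counter (not part of PydanticAI)."""

    # Assumption: turns are numbered from 1 within a single agent run.
    _next: count = field(default_factory=lambda: count(1))

    def span_for(self, operation: str, model: str) -> tuple[str, dict[str, Any]]:
        """Return the span name and attributes for the next model request."""
        turn = next(self._next)
        attributes = {
            "gen_ai.operation.name": operation,
            "gen_ai.operation.turn": turn,  # proposed attribute, does not exist today
            "gen_ai.request.model": model,
        }
        # Assumed format: "<operation> <turn> <model>".
        return f"{operation} {turn} {model}", attributes


counter = TurnCounter()
first_name, first_attrs = counter.span_for("chat", "gpt-4.1-mini")
second_name, second_attrs = counter.span_for("chat", "gpt-4.1-mini")
print(first_name)
print(second_name)
```

Even if the span name itself stayed unchanged, exposing the index as an attribute alone would be enough for my use case.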

That would make trace-based evaluation much cleaner because it would let tools like Promptfoo distinguish turn boundaries directly from the trace, instead of inferring them indirectly from parent-child spans or from model-specific span titles. It would also make it easier to assert sequential versus parallel tool calling behavior, which is a common requirement for latency-sensitive agent workflows. 
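
As a rough illustration of the kind of assertion this would enable, the sketch below groups tool calls by the turn that preceded them. It works on a simplified, hypothetical `SpanRecord` rather than real OpenTelemetry objects, assumes the proposed `gen_ai.operation.turn` attribute exists on model spans, assumes tool spans carry `gen_ai.tool.name`, and assumes tool spans start after the model span that requested them. The tool names are made up.

```python
from dataclasses import dataclass, field
from typing import Any

TURN_ATTR = "gen_ai.operation.turn"  # proposed in this issue, does not exist today
TOOL_ATTR = "gen_ai.tool.name"       # assumed to be present on tool spans


@dataclass
class SpanRecord:
    """Simplified stand-in for an exported span (hypothetical type)."""

    name: str
    start_ns: int
    attributes: dict[str, Any] = field(default_factory=dict)


def tools_by_turn(spans: list[SpanRecord]) -> dict[int, list[str]]:
    """Map each turn index to the tool names that started after it."""
    grouped: dict[int, list[str]] = {}
    current_turn = None
    for span in sorted(spans, key=lambda s: s.start_ns):
        if TURN_ATTR in span.attributes:
            # A model request opens a new turn.
            current_turn = int(span.attributes[TURN_ATTR])
            grouped.setdefault(current_turn, [])
        elif TOOL_ATTR in span.attributes and current_turn is not None:
            # A tool span belongs to the most recent turn.
            grouped[current_turn].append(str(span.attributes[TOOL_ATTR]))
    return grouped


trace = [
    SpanRecord("chat 1 gpt-4.1-mini", 0, {TURN_ATTR: 1}),
    SpanRecord("running tool", 10, {TOOL_ATTR: "list_pages"}),
    SpanRecord("chat 2 gpt-4.1-mini", 20, {TURN_ATTR: 2}),
    SpanRecord("running tool", 30, {TOOL_ATTR: "fetch_page"}),
    SpanRecord("running tool", 31, {TOOL_ATTR: "fetch_page"}),
]
print(tools_by_turn(trace))
```

With this shape, "turn 2 fanned out two `fetch_page` calls in parallel" becomes a direct check on the grouped result instead of an inference from span hierarchy.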

### References

- Promptfoo per-LLM-turn spans documentation: <https://github.com/promptfoo/promptfoo/blob/feat/claude-agent-sdk-turn-spans/site/docs/tracing.md#per-llm-turn-spans>
- PydanticAI issue about span titles: <https://github.com/pydantic/pydantic-ai/issues/2925>
- PydanticAI issue about updating OpenTelemetry span and attribute names: <https://github.com/pydantic/pydantic-ai/issues/2964>
- OpenTelemetry GenAI span conventions: <https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/> 
- PydanticAI instrumented model docs: <https://pydantic.dev/docs/ai/api/models/instrumented/>
- PydanticAI span-based evaluation docs: <https://pydantic.dev/docs/ai/evals/evaluators/span-based/>
