Refactor: Centralize multi-step tool loop into StepLoop; gateways become single-turn adapters #349
Conversation
Force-pushed from 46b1fc1 to c6cb2d6.
Anthropic's thinking blocks include a cryptographic signature that must be replayed verbatim in tool-use continuations. The old gateway preserved this by replaying raw API content; the orchestrator round-trips through typed DTOs, so the signature needs to be explicitly modeled.

Added `reasoningSignature` to `ToolCall`, following the existing `reasoningId`/`reasoningSummary` pattern. The Anthropic gateway now extracts the thinking text + signature and attaches them to tool calls. `MapsMessages` emits the full thinking block with signature on replay.

Also fixed a pre-existing gap in `DatabaseConversationStore` — `reasoningId`, `reasoningSummary`, and the new `reasoningSignature` were being written to the DB but never read back on load.

Longer term, reasoning could be modeled as standalone content parts on `AssistantMessage` with a provider metadata bag: keeping core types generic while letting each provider carry opaque data like signatures through the round-trip. Happy to take that on as a follow-up.
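The replay requirement above can be sketched roughly as follows. The `ToolCall` shape and the `mapAssistantToolTurn` helper are hypothetical stand-ins for the package's actual DTO and `MapsMessages` logic, shown only to illustrate why the signature must round-trip unchanged:

```php
<?php
// Hypothetical sketch: DTO and array shapes are assumptions for illustration,
// not the package's real classes.

final class ToolCall
{
    public function __construct(
        public string $id,
        public string $name,
        public array $arguments,
        public ?string $reasoningText = null,
        public ?string $reasoningSignature = null, // Anthropic thinking signature
    ) {}
}

// When replaying an assistant turn that contained a thinking block, Anthropic
// expects the original block (text + signature) verbatim, before the tool_use block.
function mapAssistantToolTurn(ToolCall $call): array
{
    $content = [];

    if ($call->reasoningText !== null && $call->reasoningSignature !== null) {
        $content[] = [
            'type' => 'thinking',
            'thinking' => $call->reasoningText,
            'signature' => $call->reasoningSignature, // must round-trip unchanged
        ];
    }

    $content[] = [
        'type' => 'tool_use',
        'id' => $call->id,
        'name' => $call->name,
        'input' => $call->arguments,
    ];

    return ['role' => 'assistant', 'content' => $content];
}
```

This is also why persisting the signature matters: if the store drops it on load, the replayed thinking block fails provider-side validation.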
Force-pushed from bf02e78 to 7922b07.
Force-pushed from 724e6d8 to 2eb0604.
…e for LLM interactions

- Updated `TextGateway` interface to replace `TextResponse` with `SingleTurnResponse` for `generateText` and `streamText` methods.
- Implemented `SingleTurnResponse` in `FakeTextGateway` and `OpenAiGateway`, ensuring proper handling of tool calls and responses.
- Introduced `TextOrchestrator` to manage multi-step tool loops and streamline response generation.
- Removed deprecated `onToolInvocation` methods and adjusted tool invocation handling.
- Enhanced parsing logic in OpenAi and Prism gateways to accommodate the new response structure.
- Added tests to validate new functionality and ensure backward compatibility.

…port

- Added `previousResponseId` parameter to `generateText` and `streamText` methods in `TextGateway` and its implementations.
- Introduced `buildContinuationBody` method in `BuildsTextRequests` to create lightweight continuation requests.
- Updated `TextOrchestrator` to manage `previousResponseId` for multi-step interactions.
- Enhanced response handling in `OpenAiGateway` and other classes to accommodate the new stateful continuation logic.
- Added serialization for tool result outputs to streamline API requests.
- Updated `SingleTurnResponse` and `StreamEnd` to include the response ID for better tracking of interactions.

- Simplified the tool name comparison by removing the `method_exists` check for `'name'`, directly using `class_basename` for matching.
- This change enhances code readability and maintains functionality.

… stateful continuation support

- Updated variable usage to enhance clarity and maintainability by replacing direct calls with a single variable for the last result.
- Adjusted the `buildFinalResponse` method to use the last result for structured data handling.
- Removed the raw data parameter from `SingleTurnResponse` to simplify the response structure.

…d streamline response handling

- Updated `generateText` and `streamText` methods to include the `previousResponseId` parameter for stateful interactions.
- Replaced `TextResponse` with `SingleTurnResponse` in response parsing and handling across relevant classes.
- Simplified response processing logic by removing unnecessary parameters and enhancing clarity.
- Adjusted tool invocation handling to improve maintainability and performance in streaming responses.

- Introduced handling for server tool blocks to improve event emission during text streaming.
- Updated tool call filtering to exclude synthetic structured output, ensuring accurate tool call tracking.
- Adjusted finish reason logic to correctly identify stop conditions based on real tool calls.
- Enhanced response parsing to accommodate structured data extraction when applicable.

- Removed unused parameters `tools` and `schema` from `streamText` and `processTextStream` methods to simplify method signatures and improve clarity.
- Updated relevant gateway implementations to reflect these changes, enhancing maintainability.

- Introduced handling for reasoning text and signatures in text streaming and response parsing.
- Updated `processTextStream` to capture and emit reasoning details during tool calls.
- Enhanced `mapAssistantMessage` to include reasoning text and signature in assistant messages.
- Added methods to extract and attach reasoning data to tool calls for improved replay accuracy.

…nse and streamline response handling

- Updated `generateText` and `streamText` methods to include the `previousResponseId` parameter for stateful interactions.
- Replaced `TextResponse` with `SingleTurnResponse` in response parsing and handling across relevant classes.
- Simplified method signatures by removing unused parameters and enhancing clarity.
- Enhanced text streaming processes to improve event emission and tool call tracking.
- Adjusted response processing logic to accommodate the new stateful continuation support.

…d continuation support

- Introduced `buildContinuationBody` method to streamline the creation of continuation requests using `previous_response_id`.
- Enhanced `mergeSharedResponsesRequestOptions` to consolidate shared options for both initial and continuation requests.
- Updated `generateText` and `streamText` methods to use the continuation logic, ensuring consistent request structure.
- Added tests to validate the correct handling of tool choices and options in follow-up requests.

…o support provider execution status
`StepLoop` pairs with the existing `Step` DTO it produces and reads more concretely than "Orchestrator". `SingleTurnResponse` is an internal contract between gateways and the step loop, so it belongs under `Laravel\Ai\Gateway` alongside its consumer rather than under `Laravel\Ai\Responses` where user-facing response types live. No behavior change.
…tract

These gateways landed on 0.x with the pre-refactor shape (per-gateway tool loop, recursive `continueWithToolResults`, `TextResponse` return type). Convert them to implement the new `TextGateway` contract: one LLM call per invocation, returning `SingleTurnResponse`; the multi-step loop is handled by `StepLoop` in the provider trait.

Behavioral quirks preserved:

- Ollama: force `FinishReason::ToolCalls` when `tool_calls` are populated regardless of `done_reason`, since real Ollama responses can report `"stop"` alongside tool calls.
- OpenRouter: treat `finish_reason` `"error"` with an inline error payload as a stream `Error` event.
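The Ollama quirk above can be sketched minimally. The `resolveFinishReason` helper and the enum are illustrative assumptions, not the gateway's real types; the point is only that the presence of tool calls wins over the reported `done_reason`:

```php
<?php
// Sketch, assuming hypothetical names: real Ollama responses can report
// done_reason "stop" even while tool_calls are populated.

enum FinishReason: string
{
    case Stop = 'stop';
    case ToolCalls = 'tool_calls';
}

function resolveFinishReason(array $response): FinishReason
{
    // Presence of tool calls overrides whatever done_reason the API reported.
    if (! empty($response['message']['tool_calls'])) {
        return FinishReason::ToolCalls;
    }

    return FinishReason::Stop;
}
```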
Force-pushed from 2eb0604 to 78ec5d3.
Just rebased over #409. Thought it was a nice example of what this refactor helps with: that fix needed changes in `BuildsTextRequests`, `continueWithToolResults`, and `handleStreamingToolCalls` within each affected gateway. With the centralized step loop, only `BuildsTextRequests` needs the fix per gateway, since continuation just calls the same request builder again. Fewer places to patch within each gateway for future fixes like this one 🙂
… and tool call behavior
Summary
This PR extracts the multi-step tool loop from individual gateways into a central `StepLoop` class, making gateways thin single-turn adapters. The net result is ~3,600 lines of duplicated orchestration code replaced by ~850 lines of shared logic and leaner gateways.

- `StepLoop` (src/Gateway/StepLoop.php) — new central class that owns the multi-step tool loop (LLM call → tool execution → re-prompt) for both `generate` and `stream` flows. Named after the `Step` DTO it produces on each iteration.
- `SingleTurnResponse` (src/Gateway/SingleTurnResponse.php) — lean DTO representing one LLM turn, returned by all gateways. Lives under `Laravel\Ai\Gateway` (not `Responses`) because it's an internal gateway↔loop contract, never exposed to userland.
- `ParsesTextResponses` and `HandlesTextStreaming` traits drop recursive multi-step logic entirely.
- `previous_response_id` threading for OpenAI / xAI — `StepLoop` passes an opaque provider response ID between turns so providers that support stateful continuation can use it instead of replaying the full conversation.
- `providerExecuted` flag on `ToolCall` — Anthropic's `server_tool_use` blocks (web_search, web_fetch, code_execution, advisor) are now tagged when parsed and re-emitted as `server_tool_use` on replay. The step loop skips local execution for them via the existing `findTool()` → null path.
- `providerOptions` correctly persisted in continuation requests (addresses the fix from Fix OpenAI strict tool parameters and persist providerOptions in tool loops #340).

A note on scope
This is a large architectural change and my first PR to this repo, so understandably there may not be trust built up yet. I'm completely fine with this serving as a reference implementation that maintainers can review, cherry-pick from, or rewrite as they see fit.
I raised the proposal in #347 after noticing the same orchestration logic being duplicated across new gateway PRs (#309, #311). @pushpak1300 mentioned wanting to get migrated off Prism first before making arch changes — this PR is compatible with that direction since it doesn't remove Prism, it just makes `PrismGateway` a thinner adapter with `withMaxSteps(1)`. I've also been contributing to Prism and working on client-executed tools and tool approval there (prism-php/prism#932), which is what originally motivated this refactor — those features would only need to be implemented once in the step loop rather than in every gateway.

Happy to iterate on any feedback.
Motivation
Before this change, every gateway independently implemented the same recursive multi-step tool orchestration — `ParsesTextResponses` had `processResponse` → `continueWithToolResults` → `processResponse` loops, and `HandlesTextStreaming` had parallel `handleStreamingToolCalls` → `processTextStream` recursion. This meant every cross-cutting feature — `server_tool_use` replay, OpenAI's `previous_response_id` continuation, or Anthropic's reasoning-signature pass-through — had to be thought through in each gateway's bespoke loop.

With orchestration centralized, a new gateway only needs to implement single-turn LLM communication — one `generateText()` and one `streamText()` method. The surface area becomes small enough that providers could realistically be maintained as community packages: anyone could add support for a new LLM by implementing the single-turn contract without understanding or replicating the orchestration logic.

Changes
- `TextGateway` contract: `generateText()` returns `SingleTurnResponse`, accepts `?string $previousResponseId`; `onToolInvocation()` removed
- `StepLoop` (new): multi-step loop, `previousResponseId` threading, streaming path
- `SingleTurnResponse` (new)
- `ToolCall`: `providerExecuted: bool` flag for server-side tool invocations (Anthropic server_tool_use)
- `OpenAiGateway`: `ParsesTextResponses` drops ~270 lines; `BuildsTextRequests` handles `previous_response_id`
- `AnthropicGateway`: `ParsesTextResponses` drops ~260 lines; `HandlesTextStreaming` drops ~220 lines; `providerExecuted` re-emits `server_tool_use` blocks on replay
- `GroqGateway`: `ParsesTextResponses` drops ~280 lines; `HandlesTextStreaming` drops ~190 lines
- `GeminiGateway`: `ParsesTextResponses` drops ~210 lines; `HandlesTextStreaming` drops ~150 lines
- `MistralGateway`: `ParsesTextResponses` drops ~250 lines; `HandlesTextStreaming` drops ~160 lines
- `XaiGateway`: `ParsesTextResponses` drops ~250 lines; `BuildsTextRequests` handles `previous_response_id`
- `DeepSeekGateway`
- `OllamaGateway`: force `FinishReason::ToolCalls` when `tool_calls` are populated regardless of `done_reason`, since Ollama can return `"stop"` alongside tool calls
- `OpenRouterGateway`: `finish_reason: "error"` with inline error payload → emit stream `Error` event
- `PrismGateway`: `withMaxSteps(1)`
- `FakeTextGateway`
- `GeneratesText` / `StreamsText`: use `StepLoop` and delegate; tool-invocation events wired via callbacks
- `StreamEnd`: `?string $responseId` for provider ID propagation through the streaming path
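The single-turn contract and the loop it feeds can be sketched roughly like this. All names, fields, and signatures here are assumptions for illustration, not the package's real API — the point is the division of labor: gateways make one LLM call, `StepLoop` owns iteration, tool execution, and `previousResponseId` threading:

```php
<?php
// Sketch of the centralized generate-path loop, under assumed names.

final class SingleTurnResponse
{
    public function __construct(
        public string $text,
        public array $toolCalls = [],       // pending tool calls from this turn
        public ?string $responseId = null,  // opaque provider ID for stateful continuation
    ) {}
}

final class StepLoop
{
    public function __construct(
        // fn (array $messages, ?string $previousResponseId): SingleTurnResponse
        private \Closure $gateway,
        private int $maxSteps = 8,
    ) {}

    public function run(array $messages): SingleTurnResponse
    {
        $previousResponseId = null;

        for ($step = 0; $step < $this->maxSteps; $step++) {
            // One LLM call per iteration: the gateway is a single-turn adapter.
            $turn = ($this->gateway)($messages, $previousResponseId);

            if ($turn->toolCalls === []) {
                return $turn; // no pending tools, so the loop is done
            }

            // Execute tools locally and append results for the next turn.
            foreach ($turn->toolCalls as $call) {
                $messages[] = ['role' => 'tool', 'content' => $this->executeTool($call)];
            }

            // Thread the provider response ID so stateful providers (OpenAI/xAI)
            // can continue without replaying the full conversation.
            $previousResponseId = $turn->responseId;
        }

        throw new \RuntimeException('Tool loop exceeded max steps');
    }

    private function executeTool(array $call): string
    {
        // Stand-in for real tool dispatch.
        return 'result:'.$call['name'];
    }
}
```

Under this split, cross-cutting behavior (skipping provider-executed tools, step limits, continuation state) lives in one place, and a fake gateway for tests is just a closure.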