feat: Add GPT-5 support; harden OpenAI client init, token handling, and response parsing #491
This PR introduces several interrelated improvements to model handling, OpenAI client initialization, token accounting, and response parsing, preparing the codebase for GPT-5-style models and more robust runtime behavior.
## Key changes

### Add `ModelType.GPT_5`
**File:** `typing.py`
- Registers `GPT_5` in the model enum so the rest of the codebase can target `gpt-5` explicitly (see the sketch below).
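A minimal sketch of the enum change; the neighboring members shown here are illustrative, and only `GPT_5` is the new entry:

```python
from enum import Enum

class ModelType(Enum):
    # existing members elided; values below are illustrative
    GPT_3_5_TURBO = "gpt-3.5-turbo"
    GPT_4 = "gpt-4"
    GPT_5 = "gpt-5"  # new: lets the rest of the codebase target gpt-5 explicitly
```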
### Teach token counters and limits about GPT-5
**File:** `utils.py`
- Include `ModelType.GPT_5` where the code routes message counting to the OpenAI-style token counter.
- Add a token-limit mapping for GPT-5 (the same high limit as other large models, 128k). Both changes are sketched after this list.
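A sketch of both `utils.py` changes, assuming a `num_tokens_from_messages`-style router and a token-limit map; identifiers and the non-GPT-5 limits are illustrative, and the stand-in enum mirrors `typing.py`:

```python
import tiktoken
from enum import Enum

class ModelType(Enum):  # stand-in for the enum defined in typing.py
    GPT_3_5_TURBO = "gpt-3.5-turbo"
    GPT_4 = "gpt-4"
    GPT_5 = "gpt-5"

# Token-limit lookup; GPT-5 gets the same high limit as other large models.
MODEL_TOKEN_LIMITS = {
    ModelType.GPT_3_5_TURBO: 16_384,  # illustrative
    ModelType.GPT_4: 8_192,           # illustrative
    ModelType.GPT_5: 128_000,         # new entry (128k)
}

OPENAI_STYLE_MODELS = {ModelType.GPT_3_5_TURBO, ModelType.GPT_4, ModelType.GPT_5}

def num_tokens_from_messages(messages, model: ModelType) -> int:
    """Route OpenAI-style models (now including GPT-5) to the tiktoken counter."""
    if model in OPENAI_STYLE_MODELS:
        try:
            encoding = tiktoken.encoding_for_model(model.value)
        except KeyError:
            encoding = tiktoken.get_encoding("cl100k_base")  # safe fallback
        total = 0
        for message in messages:
            total += 4  # rough per-message overhead (OpenAI's usual heuristic)
            for value in message.values():
                total += len(encoding.encode(str(value)))
        return total
    raise NotImplementedError(f"Token counting not implemented for {model}")
```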
### Harden OpenAI client initialization and token-completion logic
**File:** `model_backend.py`
- Fall back to the `cl100k_base` tiktoken encoding when `tiktoken.encoding_for_model` fails (robust to unknown/edge model names).
- Wrap OpenAI client creation in try/except and surface clearer errors on failure.
- Normalize token-limit map values (use large safe defaults for newer models).
- For non-GPT-5 models: compute a safe `max_tokens` for completions and cap it with a conservative `safe_cap` (2048) to avoid large unintended completions.
- For GPT-5 models: avoid sending legacy token/`max_tokens` parameters and instead send a minimal, GPT-5-friendly request body (use `response_format` and an `extra_body` map for model-specific flags like `verbosity`/`reasoning_effort`). This avoids sending incompatible fields.
- Add defensive logging via `log_visualize` to help debug what token parameters are being used.
- Keep the block that handles older OpenAI Python client paths and similarly wrap and log failure cases.

The request-composition logic is sketched below.
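A sketch of the hardened backend logic. `safe_cap`, `response_format`, and the idea of an `extra_body` map come from this description; helper names, the `BASE_URL` lookup, and the exact flag values are illustrative:

```python
import os
import tiktoken
from openai import OpenAI

def safe_encoding(model_name: str):
    """Fall back to cl100k_base when tiktoken doesn't recognize the model."""
    try:
        return tiktoken.encoding_for_model(model_name)
    except KeyError:
        return tiktoken.get_encoding("cl100k_base")

def make_client() -> OpenAI:
    """Wrap client creation so misconfiguration surfaces a clear error."""
    try:
        return OpenAI(
            api_key=os.environ["OPENAI_API_KEY"],
            base_url=os.environ.get("BASE_URL") or None,
        )
    except Exception as exc:
        raise RuntimeError(f"OpenAI client initialization failed: {exc}") from exc

def build_request(model_name: str, messages: list,
                  token_limit: int, prompt_tokens: int) -> dict:
    """Compose per-model kwargs for chat.completions.create()."""
    kwargs = {"model": model_name, "messages": messages}
    if model_name.startswith("gpt-5"):
        # GPT-5-style endpoints: no legacy max_tokens; model-specific flags
        # ride in extra_body (keys and values here are examples only).
        kwargs["response_format"] = {"type": "text"}
        kwargs["extra_body"] = {"verbosity": "medium",
                                "reasoning_effort": "medium"}
    else:
        # Older models: remaining budget, capped by a conservative safe_cap.
        safe_cap = 2048
        remaining = max(token_limit - prompt_tokens, 1)
        kwargs["max_tokens"] = min(remaining, safe_cap)
    return kwargs
```

`client.chat.completions.create(**build_request(...))` would then be the single call site, with `log_visualize` reporting which token parameters were chosen.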
### Make `web_spider` robust to a missing OpenAI client
**File:** `web_spider.py`
- Wrap client initialization in try/except and set `client = None` on failure with a warning.
- Short-circuit `modal_trans` if the client is not available and return an empty string, avoiding crashes in background web-spider flows when the OpenAI client can't be initialized (see the sketch below).
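A sketch of the guard, with `modal_trans` reduced to its essentials; the model name and prompt shape are placeholders:

```python
import logging
from openai import OpenAI

try:
    client = OpenAI()  # reads OPENAI_API_KEY (and any base URL) from the environment
except Exception as exc:
    logging.warning("OpenAI client unavailable; web spider disabled: %s", exc)
    client = None

def modal_trans(task_description: str) -> str:
    if client is None:
        return ""  # short-circuit: never crash background spider flows
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model
        messages=[{"role": "user", "content": task_description}],
    )
    return response.choices[0].message.content or ""
```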
### Improve ChatAgent message handling for varied OpenAI response shapes
**File:** `chat_agent.py`
- More defensive parsing of both new OpenAI SDK objects and older dict-style responses (sketched below):
  - Avoid `KeyError`/`AttributeError` by using `getattr`/`get` with sensible defaults.
  - Strip or ignore fields the local `ChatMessage` doesn't support (e.g., extra annotations, reasoning fields) while passing through supported extras: `refusal`, `audio`, `function_call`, `tool_calls`.
  - Ensure empty/`None` content doesn't cause failures and preserve useful compatible fields.
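A sketch of the defensive parsing; `ChatMessage` construction is elided, and the normalizer below just builds the kwargs it would receive:

```python
SUPPORTED_EXTRAS = ("refusal", "audio", "function_call", "tool_calls")

def parse_choice_message(choice) -> dict:
    """Normalize one response choice from either the new SDK
    (attribute-style objects) or older dict-style payloads."""
    if isinstance(choice, dict):                       # old-style payload
        msg = choice.get("message", {}) or {}
        content = msg.get("content") or ""             # tolerate None content
        role = msg.get("role", "assistant")
        extras = {k: msg.get(k) for k in SUPPORTED_EXTRAS
                  if msg.get(k) is not None}
    else:                                              # new SDK object
        msg = getattr(choice, "message", None)
        content = getattr(msg, "content", None) or ""
        role = getattr(msg, "role", "assistant")
        extras = {k: getattr(msg, k) for k in SUPPORTED_EXTRAS
                  if getattr(msg, k, None) is not None}
    # Unsupported fields (annotations, reasoning traces, ...) are simply
    # never copied, so ChatMessage never sees keys it can't handle.
    return {"role": role, "content": content, **extras}
```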
## Why this change
- Adds explicit support for GPT-5 and prepares request/response handling for model differences (fields, behaviors).
- Improves runtime robustness when a model isn't known to tiktoken or when OpenAI client initialization fails (useful on machines without the correct environment variables or when using alternative base URLs).
- Prevents sending incompatible payload fields to GPT-5-style endpoints and reduces the risk of large, unbounded completions for older models by applying a conservative cap.
- Prevents crashes in background utilities (`web_spider`) when the OpenAI client can't be created.
- Makes ChatAgent resilient to payload-shape differences across OpenAI SDK versions and model responses.
## Migration notes / compatibility
- `GPT_5` is added to the `ModelType` enum; ensure any config or code that references model names is updated if necessary.
- The changes intentionally avoid sending `max_tokens` and other legacy ChatGPT config fields to GPT-5-style models. If you have workflows that expect those fields to control completion length for GPT-5, you may need to adapt them to the new model-specific control parameters (the `extra_body` keys used here are examples).
- The code falls back to the `cl100k_base` encoding when tiktoken cannot resolve a model name; that may change token counts slightly compared with model-specific encodings, but is preferable to failing outright.
## Testing notes
Smoke-test ideas:
- Run unit or integration flows that call `OpenAIModel.run()` with:
  - a known older model (e.g., `gpt-3.5-turbo`), confirming completions still arrive and `max_tokens` is capped;
  - a new `gpt-5` model string (if available in your environment), confirming the request is sent without legacy token fields.
- Start the app with no `OPENAI_API_KEY`, or with an invalid `BASE_URL`, and confirm `web_spider` doesn't crash and prints the warning.
- Trigger ChatAgent flows and simulate both new SDK-style responses and older dict responses to ensure no exceptions occur and that `ChatMessage` objects contain the expected subset of fields (a test sketch follows).
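For the last item, a hedged test sketch that exercises the normalizer shown earlier (`parse_choice_message`); adapt the names to the real `chat_agent` internals:

```python
from types import SimpleNamespace

def test_parses_old_and_new_shapes():
    old_style = {"message": {"role": "assistant", "content": None,
                             "tool_calls": [{"id": "call_1"}]}}
    new_style = SimpleNamespace(
        message=SimpleNamespace(role="assistant", content="hi",
                                refusal=None, audio=None,
                                function_call=None, tool_calls=None))
    parsed_old = parse_choice_message(old_style)
    parsed_new = parse_choice_message(new_style)
    assert parsed_old["content"] == ""       # None content is tolerated
    assert parsed_old["tool_calls"]          # supported extra passes through
    assert parsed_new == {"role": "assistant", "content": "hi"}
```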
## Files changed (high level)
- `typing.py` — add GPT-5 enum
- `utils.py` — include GPT-5 in token counting and token-limit lookup
- `model_backend.py` — robust encoding fallback, client-init errors handled, GPT-5-friendly request composition, safe completion caps, logging
- `web_spider.py` — safe OpenAI client init and runtime guard
- `chat_agent.py` — defensive response parsing for OpenAI outputs
## Checklist
- [x] Add GPT-5 to model types
- [x] Add token handling and token-limit awareness for GPT-5
- [x] Harden OpenAI client init and fallback encoding
- [x] Send GPT-5-compatible request body (avoid legacy fields)
- [x] Add defensive parsing in ChatAgent for new/old OpenAI responses
- [x] Protect `web_spider` from client initialization failures