
feat: Support OpenAI reasoning effort configuration for O1/O3 models #1045

Open · wants to merge 4 commits into main
Conversation


@da-moon commented Feb 3, 2025

Reason for This Change

OpenAI's O1 and O3 models support configuring reasoning effort to control the
model's behavior. This change implements and improves support for reasoning
effort configuration through model name suffixes, making it easier to adjust
model behavior without modifying code.

Description of Changes

crates/goose/src/providers/formats/openai.rs

  1. Core Functionality:

    • Added reasoning effort extraction from model name suffixes (e.g.,
      o3-mini-high → high)
    • Implemented default "medium" reasoning effort when not specified
    • Ensured proper model name handling for both O-family and non-O-family
      models
    • Added proper validation for o1-mini model (blocked as unsupported)
  2. Model Name Processing:

    • Improved model name handling with separate logic for O-family models
    • Added robust parsing of model name suffixes
    • Maintained original model names for non-O family models
    • Implemented clean separation of model name and reasoning effort
  3. Documentation and Code Structure:

    • Added comprehensive module-level documentation explaining all features
    • Added detailed function-level documentation with examples
    • Split large functions into smaller, well-documented components
    • Improved code organization with clear sections
    • Added type annotations for better error handling clarity
  4. Test Coverage:

    • Added TestModelConfig struct for easier test case creation
    • Added assert_request helper for consistent test validation
    • Split tests into smaller, focused test functions
    • Added comprehensive test cases for:
      • O1 and O3 model configurations
      • Default and custom reasoning efforts
      • Non-O family model handling
      • Edge cases and error conditions
    • Improved test naming and organization
    • Added specific tests for model name handling
    • Added float comparison with appropriate epsilon values
  5. Request Generation:

    • Updated token field usage for O family models
    • Removed temperature support for O1/O3 models
    • Maintained backward compatibility for non-O family models
    • Improved error messages for unsupported configurations
  6. Code Quality Improvements:

    • Extracted common test setup into helper functions
    • Added clear separation between test categories
    • Improved error message clarity
    • Enhanced code readability with better variable names
    • Added proper documentation for all public functions
    • Implemented proper type annotations throughout
  7. Internal Refactoring:

    • Split the message formatting logic into smaller functions
    • Added helper functions for processing different content types
    • Improved error handling with proper type annotations
    • Enhanced code organization with logical grouping

These changes provide a robust and well-documented implementation of the
reasoning effort feature while improving the overall code quality and
maintainability of the OpenAI format module.

Validation Criteria

Test Run

~/tmp/goose> cargo test --package goose --lib providers::formats::openai -- --nocapture
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.19s
     Running unittests src/lib.rs (target/debug/deps/goose-411d7bfe04e9d139)

running 24 tests
test providers::formats::openai::tests::test_create_request_o1_reasoning_effort ... ok
test providers::formats::openai::tests::test_create_request_o3_reasoning_effort ... ok
test providers::formats::openai::tests::test_format_tools ... ok
test providers::formats::openai::tests::test_format_messages ... ok
test providers::formats::openai::tests::test_format_tools_empty ... ok
test providers::formats::openai::tests::test_gpt35_turbo_config ... ok
test providers::formats::openai::tests::test_format_tools_duplicate ... ok
test providers::formats::openai::tests::test_gpt4_standard_config ... ok
test providers::formats::openai::tests::test_gpt4_with_version_suffix ... ok
test providers::formats::openai::tests::test_non_o_family_with_high_suffix ... ok
test providers::formats::openai::tests::test_non_o_family_with_low_suffix ... ok
test providers::formats::openai::tests::test_o1_default_reasoning_effort ... ok
test providers::formats::openai::tests::test_o1_mini_not_supported ... ok
test providers::formats::openai::tests::test_o1_custom_reasoning_effort ... ok
test providers::formats::openai::tests::test_non_o_family_with_medium_suffix ... ok
test providers::formats::openai::tests::test_o3_custom_reasoning_effort ... ok
test providers::formats::openai::tests::test_response_to_message_text ... ok
test providers::formats::openai::tests::test_o3_default_reasoning_effort ... ok
test providers::formats::openai::tests::test_o3_invalid_suffix_defaults_to_medium ... ok
test providers::formats::openai::tests::test_response_to_message_invalid_func_name ... ok
test providers::formats::openai::tests::test_response_to_message_json_decode_error ... ok
test providers::formats::openai::tests::test_response_to_message_valid_toolrequest ... ok
test providers::formats::openai::tests::test_format_messages_multiple_content ... ok
test providers::formats::openai::tests::test_format_messages_complex ... ok

test result: ok. 24 passed; 0 failed; 0 ignored; 0 measured; 65 filtered out; finished in 0.01s

This commit adds support for configuring reasoning effort in OpenAI O1
and O3 models through model name suffixes. The reasoning effort can be
set to low, medium, or high (e.g., o3-mini-high) and defaults to medium
if not specified.

Signed-off-by: da-moon <[email protected]>
- Extract model name and reasoning effort handling into separate logic
- Split large test functions into smaller atomic units
- Add TestModelConfig struct and assert_request helper for test reuse
- Fix temperature type handling (f32 vs f64)
- Improve code organization and readability with clear sections
- Add comprehensive test coverage for all model name scenarios

Signed-off-by: da-moon <[email protected]>
add comprehensive documentation
to OpenAI format module

The changes include:
- Added detailed module-level documentation explaining key features
- Added function-level documentation with examples for all major functions
- Split large functions into smaller, well-documented components
- Improved code organization and readability through better function structure
- Added type annotations for error handling clarity

Signed-off-by: da-moon <[email protected]>
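Assuming the parsed suffix is forwarded as OpenAI's `reasoning_effort` chat-completions parameter (a reasonable reading of the PR, though the exact field it emits is not shown here), a request built for `o3-mini-high` would look roughly like:

```json
{
  "model": "o3-mini",
  "messages": [{ "role": "user", "content": "..." }],
  "reasoning_effort": "high",
  "max_completion_tokens": 4096
}
```

Note the absence of a `temperature` key (the PR removes temperature support for O1/O3 models) and the use of a completion-token field rather than `max_tokens` (the PR says only "updated token field usage"; `max_completion_tokens` is the name OpenAI's API uses, assumed here).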
@michaelneale (Collaborator)

oh this is really cool - the openai formats are used by a few providers so we want to be careful with that, but I like this.

@michaelneale (Collaborator)

@da-moon looks like it needs some `cargo fmt`, but there are some compile errors too; if you push again I think it will trigger CI

@da-moon (Author) commented Feb 4, 2025

I am pretty sure compilation worked, as I actually used all 3 modes of o3-mini and ran a bunch of end-to-end tests and goose sessions. I might have made some breaking changes at one point; that's why the build is failing. Let me look into it and see why it is failing.

Sadly, the models were not as fine-tuned on tool calls as I expected them to be, and Sonnet was still better.

@da-moon (Author) commented Feb 4, 2025

I will work on handling formatting and dealing with the build issue tomorrow morning.

@michaelneale what are your thoughts on a PR that adds an option to turn prompt caching on or off?

@da-moon (Author) commented Feb 4, 2025

@michaelneale what are your thoughts on creating a Docker image for building the binary, so that there is a unified build environment in both CI and on local machines? I can open a different PR for that and submit it by EOD tomorrow

- Make format functions public (format_text_content, format_tool_request, etc.)
- Fix TextContent annotations field to use Option<Annotations>
- Fix Message created field to use i64 instead of Option
- Add missing imports to doc tests
- Fix formatting

Signed-off-by: da-moon <[email protected]>
@da-moon (Author) commented Feb 4, 2025

@michaelneale The compilation issue was due to a failing doctest; it should be fixed now.

  • Library Unit Tests
~/tmp/goose> cargo test --package goose --lib providers::formats::openai -- --nocapture
   Compiling goose v1.0.4 (/home/damoon/tmp/goose/crates/goose)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 11.60s
     Running unittests src/lib.rs (target/debug/deps/goose-411d7bfe04e9d139)

running 24 tests
test providers::formats::openai::tests::test_format_tools ... ok
test providers::formats::openai::tests::test_gpt35_turbo_config ... ok
test providers::formats::openai::tests::test_format_messages ... ok
test providers::formats::openai::tests::test_format_tools_empty ... ok
test providers::formats::openai::tests::test_gpt4_standard_config ... ok
test providers::formats::openai::tests::test_non_o_family_with_low_suffix ... ok
test providers::formats::openai::tests::test_create_request_o1_reasoning_effort ... ok
test providers::formats::openai::tests::test_non_o_family_with_medium_suffix ... ok
test providers::formats::openai::tests::test_o1_custom_reasoning_effort ... ok
test providers::formats::openai::tests::test_create_request_o3_reasoning_effort ... ok
test providers::formats::openai::tests::test_format_tools_duplicate ... ok
test providers::formats::openai::tests::test_o1_default_reasoning_effort ... ok
test providers::formats::openai::tests::test_non_o_family_with_high_suffix ... ok
test providers::formats::openai::tests::test_o3_custom_reasoning_effort ... ok
test providers::formats::openai::tests::test_o1_mini_not_supported ... ok
test providers::formats::openai::tests::test_gpt4_with_version_suffix ... ok
test providers::formats::openai::tests::test_o3_default_reasoning_effort ... ok
test providers::formats::openai::tests::test_response_to_message_text ... ok
test providers::formats::openai::tests::test_o3_invalid_suffix_defaults_to_medium ... ok
test providers::formats::openai::tests::test_response_to_message_json_decode_error ... ok
test providers::formats::openai::tests::test_response_to_message_invalid_func_name ... ok
test providers::formats::openai::tests::test_response_to_message_valid_toolrequest ... ok
test providers::formats::openai::tests::test_format_messages_multiple_content ... ok
test providers::formats::openai::tests::test_format_messages_complex ... ok
  • Doc Tests
~/tmp/goose> cargo test --doc -- --test-threads=1
   Compiling serial_test v3.2.0
   Compiling ciborium v0.2.2
   Compiling goose v1.0.4 (/home/damoon/tmp/goose/crates/goose)
   Compiling criterion v0.5.1
    Finished `test` profile [unoptimized + debuginfo] target(s) in 3.37s
   Doc-tests goose

running 10 tests
test crates/goose/src/config/base.rs - config::base::Config (line 68) - compile ... ok
test crates/goose/src/providers/formats/openai.rs - providers::formats::openai::create_request (line 561) ... ok
test crates/goose/src/providers/formats/openai.rs - providers::formats::openai::format_messages (line 277) ... ok
test crates/goose/src/providers/formats/openai.rs - providers::formats::openai::format_text_content (line 40) ... ok
test crates/goose/src/providers/formats/openai.rs - providers::formats::openai::format_tool_request (line 71) ... ok
test crates/goose/src/providers/formats/openai.rs - providers::formats::openai::format_tool_response (line 190) ... ok
test crates/goose/src/providers/formats/openai.rs - providers::formats::openai::format_tools (line 352) ... ok
test crates/goose/src/providers/formats/openai.rs - providers::formats::openai::get_usage (line 503) ... ok
test crates/goose/src/providers/formats/openai.rs - providers::formats::openai::process_tool_content (line 134) ... ok
test crates/goose/src/providers/formats/openai.rs - providers::formats::openai::response_to_message (line 413) ... ok

test result: ok. 10 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 43.00s

   Doc-tests goose_mcp

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests mcp_client

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests mcp_core

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests mcp_macros

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests mcp_server

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
