
feat: Support OpenAI reasoning effort configuration for O1/O3 models #1045

Open · wants to merge 4 commits into main
Conversation


@da-moon commented Feb 3, 2025

Reason for This Change

OpenAI's O1 and O3 models support configuring reasoning effort to control the
model's behavior. This change implements and improves support for reasoning
effort configuration through model name suffixes, making it easier to adjust
model behavior without modifying code.

Description of Changes

crates/goose/src/providers/formats/openai.rs

  1. Core Functionality:

    • Added reasoning effort extraction from model name suffixes (e.g.,
      o3-mini-high → high)
    • Implemented default "medium" reasoning effort when not specified
    • Ensured proper model name handling for both O-family and non-O-family
      models
    • Added proper validation for o1-mini model (blocked as unsupported)
  2. Model Name Processing:

    • Improved model name handling with separate logic for O-family models
    • Added robust parsing of model name suffixes
    • Maintained original model names for non-O family models
    • Implemented clean separation of model name and reasoning effort
  3. Documentation and Code Structure:

    • Added comprehensive module-level documentation explaining all features
    • Added detailed function-level documentation with examples
    • Split large functions into smaller, well-documented components
    • Improved code organization with clear sections
    • Added type annotations for better error handling clarity
  4. Test Coverage:

    • Added TestModelConfig struct for easier test case creation
    • Added assert_request helper for consistent test validation
    • Split tests into smaller, focused test functions
    • Added comprehensive test cases for:
      • O1 and O3 model configurations
      • Default and custom reasoning efforts
      • Non-O family model handling
      • Edge cases and error conditions
    • Improved test naming and organization
    • Added specific tests for model name handling
    • Added float comparison with appropriate epsilon values
  5. Request Generation:

    • Updated token field usage for O family models
    • Removed temperature support for O1/O3 models
    • Maintained backward compatibility for non-O family models
    • Improved error messages for unsupported configurations
  6. Code Quality Improvements:

    • Extracted common test setup into helper functions
    • Added clear separation between test categories
    • Improved error message clarity
    • Enhanced code readability with better variable names
    • Added proper documentation for all public functions
    • Implemented proper type annotations throughout
  7. Internal Refactoring:

    • Split the message formatting logic into smaller functions
    • Added helper functions for processing different content types
    • Improved error handling with proper type annotations
    • Enhanced code organization with logical grouping

These changes provide a robust and well-documented implementation of the
reasoning effort feature while improving the overall code quality and
maintainability of the OpenAI format module.

Validation Criteria

Test Run

~/tmp/goose> cargo test --package goose --lib providers::formats::openai -- --nocapture
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.19s
     Running unittests src/lib.rs (target/debug/deps/goose-411d7bfe04e9d139)

running 24 tests
test providers::formats::openai::tests::test_create_request_o1_reasoning_effort ... ok
test providers::formats::openai::tests::test_create_request_o3_reasoning_effort ... ok
test providers::formats::openai::tests::test_format_tools ... ok
test providers::formats::openai::tests::test_format_messages ... ok
test providers::formats::openai::tests::test_format_tools_empty ... ok
test providers::formats::openai::tests::test_gpt35_turbo_config ... ok
test providers::formats::openai::tests::test_format_tools_duplicate ... ok
test providers::formats::openai::tests::test_gpt4_standard_config ... ok
test providers::formats::openai::tests::test_gpt4_with_version_suffix ... ok
test providers::formats::openai::tests::test_non_o_family_with_high_suffix ... ok
test providers::formats::openai::tests::test_non_o_family_with_low_suffix ... ok
test providers::formats::openai::tests::test_o1_default_reasoning_effort ... ok
test providers::formats::openai::tests::test_o1_mini_not_supported ... ok
test providers::formats::openai::tests::test_o1_custom_reasoning_effort ... ok
test providers::formats::openai::tests::test_non_o_family_with_medium_suffix ... ok
test providers::formats::openai::tests::test_o3_custom_reasoning_effort ... ok
test providers::formats::openai::tests::test_response_to_message_text ... ok
test providers::formats::openai::tests::test_o3_default_reasoning_effort ... ok
test providers::formats::openai::tests::test_o3_invalid_suffix_defaults_to_medium ... ok
test providers::formats::openai::tests::test_response_to_message_invalid_func_name ... ok
test providers::formats::openai::tests::test_response_to_message_json_decode_error ... ok
test providers::formats::openai::tests::test_response_to_message_valid_toolrequest ... ok
test providers::formats::openai::tests::test_format_messages_multiple_content ... ok
test providers::formats::openai::tests::test_format_messages_complex ... ok

test result: ok. 24 passed; 0 failed; 0 ignored; 0 measured; 65 filtered out; finished in 0.01s

This commit adds support for configuring reasoning effort in OpenAI O1
and O3 models through model name suffixes. The reasoning effort can be
set to low, medium, or high (e.g., o3-mini-high) and defaults to medium
if not specified.

Signed-off-by: da-moon <[email protected]>
- Extract model name and reasoning effort handling into separate logic
- Split large test functions into smaller atomic units
- Add TestModelConfig struct and assert_request helper for test reuse
- Fix temperature type handling (f32 vs f64)
- Improve code organization and readability with clear sections
- Add comprehensive test coverage for all model name scenarios

Signed-off-by: da-moon <[email protected]>
add comprehensive documentation
to OpenAI format module

The changes include:
- Added detailed module-level documentation explaining key features
- Added function-level documentation with examples for all major functions
- Split large functions into smaller, well-documented components
- Improved code organization and readability through better function structure
- Added type annotations for error handling clarity

Signed-off-by: da-moon <[email protected]>
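Assuming the parsed suffix is forwarded as OpenAI's `reasoning_effort` chat-completions parameter (a reasonable reading of the PR, though the exact field it emits is not shown here), a request built for `o3-mini-high` would look roughly like:

```json
{
  "model": "o3-mini",
  "messages": [{ "role": "user", "content": "..." }],
  "reasoning_effort": "high",
  "max_completion_tokens": 4096
}
```

Note the absence of a `temperature` key (the PR removes temperature support for O1/O3 models) and the use of a completion-token field rather than `max_tokens` (the PR says only "updated token field usage"; `max_completion_tokens` is the name OpenAI's API uses, assumed here).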
@michaelneale (Collaborator)

oh this is really cool - the openai formats are used by a few providers so we want to be careful with that, but I like this.

@michaelneale (Collaborator)

@da-moon looks like it needs some `cargo fmt`, but there are some compile errors too; if you push again I think it will trigger CI

@da-moon (Author) commented Feb 4, 2025

I am pretty sure compilation worked, as I actually used all 3 modes of o3-mini and ran a bunch of end-to-end tests and goose sessions. I might have made some breaking changes at one point; that's why the build is failing. Let me look into it and see why it is failing.

Sadly, the models were not as fine-tuned on tool calls as I expected them to be, and Sonnet was still better.

@da-moon (Author) commented Feb 4, 2025

I will work on handling formatting and dealing with the build issue tomorrow morning.

@michaelneale what are your thoughts on a PR that adds an option to turn prompt caching on or off?

@da-moon (Author) commented Feb 4, 2025

@michaelneale what are your thoughts on creating a Docker image for building the binary, so that there is a unified build environment in both CI and on local machines? I can open a different PR for that and submit it by EOD tomorrow

- Make format functions public (format_text_content, format_tool_request, etc.)
- Fix TextContent annotations field to use Option<Annotations>
- Fix Message created field to use i64 instead of Option
- Add missing imports to doc tests
- Fix formatting

Signed-off-by: da-moon <[email protected]>
@da-moon (Author) commented Feb 4, 2025

@michaelneale The compilation issue was due to a failing doctest; it should be fixed now.

  • Library Unit Tests
~/tmp/goose> cargo test --package goose --lib providers::formats::openai -- --nocapture
   Compiling goose v1.0.4 (/home/damoon/tmp/goose/crates/goose)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 11.60s
     Running unittests src/lib.rs (target/debug/deps/goose-411d7bfe04e9d139)

running 24 tests
test providers::formats::openai::tests::test_format_tools ... ok
test providers::formats::openai::tests::test_gpt35_turbo_config ... ok
test providers::formats::openai::tests::test_format_messages ... ok
test providers::formats::openai::tests::test_format_tools_empty ... ok
test providers::formats::openai::tests::test_gpt4_standard_config ... ok
test providers::formats::openai::tests::test_non_o_family_with_low_suffix ... ok
test providers::formats::openai::tests::test_create_request_o1_reasoning_effort ... ok
test providers::formats::openai::tests::test_non_o_family_with_medium_suffix ... ok
test providers::formats::openai::tests::test_o1_custom_reasoning_effort ... ok
test providers::formats::openai::tests::test_create_request_o3_reasoning_effort ... ok
test providers::formats::openai::tests::test_format_tools_duplicate ... ok
test providers::formats::openai::tests::test_o1_default_reasoning_effort ... ok
test providers::formats::openai::tests::test_non_o_family_with_high_suffix ... ok
test providers::formats::openai::tests::test_o3_custom_reasoning_effort ... ok
test providers::formats::openai::tests::test_o1_mini_not_supported ... ok
test providers::formats::openai::tests::test_gpt4_with_version_suffix ... ok
test providers::formats::openai::tests::test_o3_default_reasoning_effort ... ok
test providers::formats::openai::tests::test_response_to_message_text ... ok
test providers::formats::openai::tests::test_o3_invalid_suffix_defaults_to_medium ... ok
test providers::formats::openai::tests::test_response_to_message_json_decode_error ... ok
test providers::formats::openai::tests::test_response_to_message_invalid_func_name ... ok
test providers::formats::openai::tests::test_response_to_message_valid_toolrequest ... ok
test providers::formats::openai::tests::test_format_messages_multiple_content ... ok
test providers::formats::openai::tests::test_format_messages_complex ... ok
  • Doc Tests
~/tmp/goose> cargo test --doc -- --test-threads=1
   Compiling serial_test v3.2.0
   Compiling ciborium v0.2.2
   Compiling goose v1.0.4 (/home/damoon/tmp/goose/crates/goose)
   Compiling criterion v0.5.1
    Finished `test` profile [unoptimized + debuginfo] target(s) in 3.37s
   Doc-tests goose

running 10 tests
test crates/goose/src/config/base.rs - config::base::Config (line 68) - compile ... ok
test crates/goose/src/providers/formats/openai.rs - providers::formats::openai::create_request (line 561) ... ok
test crates/goose/src/providers/formats/openai.rs - providers::formats::openai::format_messages (line 277) ... ok
test crates/goose/src/providers/formats/openai.rs - providers::formats::openai::format_text_content (line 40) ... ok
test crates/goose/src/providers/formats/openai.rs - providers::formats::openai::format_tool_request (line 71) ... ok
test crates/goose/src/providers/formats/openai.rs - providers::formats::openai::format_tool_response (line 190) ... ok
test crates/goose/src/providers/formats/openai.rs - providers::formats::openai::format_tools (line 352) ... ok
test crates/goose/src/providers/formats/openai.rs - providers::formats::openai::get_usage (line 503) ... ok
test crates/goose/src/providers/formats/openai.rs - providers::formats::openai::process_tool_content (line 134) ... ok
test crates/goose/src/providers/formats/openai.rs - providers::formats::openai::response_to_message (line 413) ... ok

test result: ok. 10 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 43.00s

   Doc-tests goose_mcp

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests mcp_client

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests mcp_core

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests mcp_macros

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests mcp_server

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
