Skip to content

Conversation

hschawe
Copy link
Collaborator

@hschawe hschawe commented Aug 22, 2025

Jira: https://jira.mongodb.org/browse/EAI-1187

Changes

  • Let user know that we searched after fetch_page fallback
  • Also added a couple new prompt adherence eval cases for the new instruction
  • Summary of the prompt changes & why they're needed:
    1. OpenAI's GPT-4.1 prompting guide recommends to put all tool instructions in the tool descriptions, and I saw a notable increase across all metrics after doing this (for fetch_page and prompt adherence eval cases)
    2. There were still some issues with selecting the right tool & formatting the responses, so I put the "coordination" instructions in the system prompt, which improved prompt adherence & the tool call metrics
    3. Repeating instructions greatly helps the LLM follow instructions when it comes to tool instructions. I tested out some search fallback changes (which did not make it into this PR) and the fetch_page assumption instructions (which did) and in both cases having an extra line of instruction really helped the model use the tools correctly

Notes

  • Evaluation runs for this change compared to main
  • The LLM isn't always great at always giving the fallback response instruction in only the {fallback_to_search} case. Sometimes when fetch_page has to truncate the page, the LLM will sometimes say it couldn't use the page, even if it didn't call the search_tool.

@hschawe hschawe marked this pull request as ready for review August 22, 2025 19:07
Copy link
Collaborator

@mongodben mongodben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small request for change

@hschawe hschawe requested a review from mongodben August 25, 2025 20:17
Copy link
Collaborator

@mongodben mongodben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great prompt engineering work here!

1 small suggestion

@mongodben
Copy link
Collaborator

  1. OpenAI's GPT-4.1 prompting guide recommends to put all tool instructions in the tool descriptions, and I saw a notable increase across all metrics after doing this (for fetch_page and prompt adherence eval cases)

great, in the prompting guide and evals i trust.

interesting that this is the optimal pattern now. it actually used to be that the API had a hard cap on how long the description could be, and you were supposed to put all this info in the system prompt. wonder when it changed.

  1. There were still some issues with selecting the right tool & formatting the responses, so I put the "coordination" instructions in the system prompt, which improved prompt adherence & the tool call metrics

sounds good

  1. Repeating instructions greatly helps the LLM follow instructions when it comes to tool instructions. I tested out some search fallback changes (which did not make it into this PR) and the fetch_page assumption instructions (which did) and in both cases having an extra line of instruction really helped the model use the tools correctly

👍

  • The LLM isn't always great at always giving the fallback response instruction in only the {fallback_to_search} case. Sometimes when fetch_page has to truncate the page, the LLM will sometimes say it couldn't use the page, even if it didn't call the search_tool.

sorry i dont really understand what you mean here. can you explain a bit more?

and, is this an issue that we should address? if so, with what priority? should there be a follow up ticket?

@hschawe
Copy link
Collaborator Author

hschawe commented Aug 28, 2025

  • The LLM isn't always great at always giving the fallback response instruction in only the {fallback_to_search} case. Sometimes when fetch_page has to truncate the page, the LLM will sometimes say it couldn't use the page, even if it didn't call the search_tool.

sorry i dont really understand what you mean here. can you explain a bit more?

and, is this an issue that we should address? if so, with what priority? should there be a follow up ticket?

when fetch_page is used on a long page (>150,000 characters), the page is truncated and we do an on-page search for relevant content. the LLM knows that this search is happening, so sometimes - not always - it will give the fallback disclaimer, even though it didn't fall back to the search_content tool.

i don't think this is that big of an issue especially since it doesn't affect how the LLM uses the tools, but it would be good to dedicate some time to this on a separate ticket (EAI-1288)

@hschawe hschawe merged commit a00268b into main Aug 28, 2025
2 checks passed
@hschawe hschawe deleted the EAI-1187 branch August 28, 2025 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants