feat: Add Exa search tools integration #3736
base: main
| Returns: | ||
| The search results with text content. | ||
| """ | ||
| response = await self.client.search_and_contents( |
https://github.com/exa-labs/exa-py/blob/0d7aca32161fde2eca59c5759ca17a064426050c/exa_py/api.py#L1094 suggests this method should not be used anymore :)
Good catch sir! I've updated to use the new search() and find_similar() APIs instead of the deprecated search_and_contents() and find_similar_and_contents(). Also bumped the dependency to exa-py>=2.0.0 which is required for the new API.
| results: list[ExaSearchResult] = [] | ||
| for result in response.results: | ||
| results.append( | ||
| ExaSearchResult( |
Do we need these extra types or could we use the dataclasses from the SDK? Or is this mostly about limiting the fields we return to the LLM?
Yes, this is intentional to limit the fields returned to the LLM. The SDK dataclasses include many additional fields (score, highlights, extra metadata) that would bloat the context. The TypedDicts only include the most useful fields (title, url, text, author, published_date), which helps with token efficiency - especially important since search results can be large.
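A minimal sketch of the field-trimming idea described above. `SdkResult` is a hypothetical stand-in for the SDK's result dataclass (the real exa-py class is not reproduced here), `to_llm_result` is an illustrative helper name, and the field types are assumptions based on the fields named in this thread.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import TypedDict


class ExaSearchResult(TypedDict):
    """Only the fields worth sending to the LLM (types are assumed)."""

    title: str
    url: str
    text: str
    author: str | None
    published_date: str | None


@dataclass
class SdkResult:
    """Hypothetical stand-in for an SDK result that carries extra fields."""

    title: str | None
    url: str
    text: str | None
    author: str | None = None
    published_date: str | None = None
    score: float | None = None
    highlights: list[str] | None = None


def to_llm_result(result: SdkResult) -> ExaSearchResult:
    # Copy the useful fields; score and highlights are deliberately dropped.
    return ExaSearchResult(
        title=result.title or '',
        url=result.url,
        text=result.text or '',
        author=result.author,
        published_date=result.published_date,
    )
```

Because a `TypedDict` is a plain `dict` at runtime, the trimmed result serializes without any extra conversion step.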
| text=result.text or '', | ||
| ) | ||
| ) | ||
| return results |
I'd prefer to use list comprehensions: return [ExaSearchResult(...) for result in response.results]
Done! Refactored all result building to use list comprehensions.
| You can get one by signing up at [https://dashboard.exa.ai](https://dashboard.exa.ai). | ||
| """ | ||
| return Tool[Any]( | ||
| ExaAnswerTool(client=AsyncExa(api_key=api_key)).__call__, |
If a user adds multiple of these tools to their agent and the model may call them in parallel, is it inefficient at all to use a different client for each?
Since we're exposing multiple tools here, I suggest we also create an ExaToolset (https://ai.pydantic.dev/toolsets/, see LangChainToolset for an example) that takes a single api_key, and can then be configured on which tools to return
Good point - addressed this with the new ExaToolset class. For users who want multiple Exa tools, ExaToolset shares a single client across all tools. The individual factory functions are kept for simple single-tool use cases.
- Takes a single api_key and creates a shared AsyncExa client
- Supports num_results and max_characters configuration
- Has include_* flags to select which tools to include
- Follows the same pattern as LangChainToolset
Example usage:
toolset = ExaToolset(api_key, max_characters=1000)
agent = Agent('openai:gpt-4o', toolsets=[toolset])
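The shared-client design can be sketched roughly as follows. This is not the PR's implementation: `FakeClient` and `FakeTool` are hypothetical stand-ins for `AsyncExa` and the real tool objects, `build_exa_tools` is an illustrative name, and every flag and tool name other than `include_search` and `exa_search` is assumed.

```python
from __future__ import annotations

from dataclasses import dataclass


class FakeClient:
    """Hypothetical stand-in for AsyncExa; it only stores the API key."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key


@dataclass
class FakeTool:
    """Hypothetical stand-in for a tool bound to a client."""

    name: str
    client: FakeClient


def build_exa_tools(
    api_key: str,
    *,
    include_search: bool = True,
    include_find_similar: bool = True,  # assumed flag name
    include_get_contents: bool = True,  # assumed flag name
    include_answer: bool = True,  # assumed flag name
) -> list[FakeTool]:
    # One client is created here and shared by every selected tool.
    client = FakeClient(api_key)
    selected = [
        ('exa_search', include_search),
        ('exa_find_similar', include_find_similar),
        ('exa_get_contents', include_get_contents),
        ('exa_answer', include_answer),
    ]
    return [FakeTool(name, client) for name, enabled in selected if enabled]
```

Calling `build_exa_tools('key', include_answer=False)` yields three tools that all reference the same client object, which is the property the toolset is meant to guarantee.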
@10ishq Looks like I reviewed just as you were pushing in some changes 😄 Have a look at my comments, which may still be relevant, and let me know when it's ready for another round
Hey @DouweM! Thanks for the review - all comments addressed:
Ready for another round!
@10ishq I'll do another code review tomorrow (busy day, sorry about that), but please have a look at the failing linting and tests! The timeouts are just CI being flaky, but the other failures seem real.
| Tool[Any]( # type: ignore[reportUnknownArgumentType] | ||
| ExaSearchTool(client=client, num_results=num_results, max_characters=max_characters).__call__, | ||
| name='exa_search', | ||
| description='Searches Exa for the given query and returns the results with content. Exa is a neural search engine that finds high-quality, relevant results.', |
Let's dedupe this from the definitions above. Maybe the tool generator functions can accept a client?
Good call! Refactored the factory functions (exa_search_tool, etc.) to accept an optional client parameter. Now ExaToolset just calls the factory functions with a shared client:
if include_search:
    tools.append(exa_search_tool(client=client, num_results=num_results, ...))
This also enables advanced users to share a client across tools if they want.
docs/common-tools.md
Outdated
| - [`exa_search_tool`][pydantic_ai.common_tools.exa.exa_search_tool]: Search the web with various search types (auto, keyword, neural, fast, deep) | ||
| - [`exa_find_similar_tool`][pydantic_ai.common_tools.exa.exa_find_similar_tool]: Find pages similar to a given URL | ||
| - [`exa_get_contents_tool`][pydantic_ai.common_tools.exa.exa_get_contents_tool]: Get full text content from URLs | ||
| - [`exa_answer_tool`][pydantic_ai.common_tools.exa.exa_answer_tool]: Get AI-powered answers with citations |
Let's move this list into Usage
Done! Moved the tools list to the start of the Usage section, right after "You can use Exa tools individually or as a toolset."
…ntents() to find_similar() API
- Factory functions now accept optional client parameter for sharing
- ExaToolset reuses factory functions instead of duplicating Tool creation
- Docs show include_* parameters in ExaToolset example
- Moved Available Tools list into Usage section
59f998f to e4e7547
| - [`exa_search_tool`][pydantic_ai.common_tools.exa.exa_search_tool]: Search the web with various search types (auto, keyword, neural, fast, deep) | ||
| - [`exa_find_similar_tool`][pydantic_ai.common_tools.exa.exa_find_similar_tool]: Find pages similar to a given URL | ||
| - [`exa_get_contents_tool`][pydantic_ai.common_tools.exa.exa_get_contents_tool]: Get full text content from URLs | ||
| - [`exa_answer_tool`][pydantic_ai.common_tools.exa.exa_answer_tool]: Get AI-powered answers with citations |
To make these links work we need to update docs/api/common_tools.md
Done! Added ::: pydantic_ai.common_tools.exa to docs/api/common_tools.md so the mkdocs cross-references resolve correctly.
| title=result.title or '', # pyright: ignore[reportUnknownMemberType,reportUnknownArgumentType] | ||
| text=result.text or '', # pyright: ignore[reportUnknownMemberType,reportUnknownArgumentType] | ||
| author=result.author, # pyright: ignore[reportUnknownMemberType,reportUnknownArgumentType] | ||
| published_date=result.published_date, # pyright: ignore[reportUnknownMemberType,reportUnknownArgumentType] |
Why do we need all these ignores here and below, but not above?
The exa-py SDK has incomplete type annotations. Specifically:
- client.search() returns SearchResponse[Result] - properly typed, no ignores needed
- client.get_contents() returns SearchResponse[Unknown] - the generic type is Unknown, so accessing result.title, result.text etc. triggers pyright errors
This is an issue with the SDK itself. I verified by checking the SDK source - get_contents doesn't have the same generic type as search. The ignores are only where the SDK types are incomplete.
| """ | ||
| if client is None: | ||
| if api_key is None: | ||
| raise ValueError('Either api_key or client must be provided') |
Can we add @overloads to require one or the other at the type checking level?
Done! Added @overload decorators to all factory functions (exa_search_tool, exa_find_similar_tool, exa_get_contents_tool, exa_answer_tool). Now the type checker enforces that you must provide either api_key or client, but not both/neither:
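A rough sketch of this overload pattern, not the PR's actual code: `FakeClient` is a hypothetical stand-in for `AsyncExa`, and `resolve_client` is an illustrative helper name. The two overloads let a type checker reject calls that pass both arguments or neither, while the runtime check still guards untyped callers.

```python
from __future__ import annotations

from typing import overload


class FakeClient:
    """Hypothetical stand-in for AsyncExa; it only stores the API key."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key


@overload
def resolve_client(api_key: str) -> FakeClient: ...
@overload
def resolve_client(*, client: FakeClient) -> FakeClient: ...


def resolve_client(
    api_key: str | None = None, *, client: FakeClient | None = None
) -> FakeClient:
    # Type checkers only see the two overloads above; this body is the
    # single runtime implementation behind both of them.
    if client is None:
        if api_key is None:
            raise ValueError('Either api_key or client must be provided')
        client = FakeClient(api_key)
    return client
```

With these signatures, `resolve_client('key')` and `resolve_client(client=shared)` type-check, whereas `resolve_client()` is flagged statically and also raises `ValueError` at runtime.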
Update on Coverage fix
Why this approach:
Changes:
Please lmk your thoughts!
Summary
Adds 4 new Exa search tools to pydantic_ai.common_tools:
- exa_search_tool - Neural web search
- exa_find_similar_tool - Find similar pages
- exa_get_contents_tool - Extract URL content
- exa_answer_tool - AI-powered answers with citations
Changes
- pydantic_ai_slim/pydantic_ai/common_tools/exa.py (305 lines)
- pydantic_ai_slim/pyproject.toml with exa = ["exa-py>=1.12.0"]
- docs/install.md with Exa documentation
Testing
All 4 tools tested successfully with live Exa API:
Documentation
duckduckgo.py and tavily.py