Introduce support for agent skills #3780
base: main
Conversation
Pull request overview
This pull request introduces a comprehensive Skills System to Pydantic AI, enabling agents to dynamically discover and utilize modular skill packages. The implementation follows Anthropic's Agent Skills patterns and provides progressive disclosure of capabilities through a standardized toolset interface.
Key Changes:
- Added `SkillsToolset` with four core tools: `list_skills()`, `load_skill()`, `read_skill_resource()`, and `run_skill_script()`
- Introduced skill discovery from the filesystem, with YAML frontmatter parsing for metadata
- Implemented security measures, including path traversal prevention and script execution timeouts
- Added PyYAML as a required dependency of `pydantic-ai-slim` for parsing skill metadata
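The security measures listed above (path traversal prevention and script execution timeouts) could be implemented roughly as below. This is a minimal, self-contained sketch of the idea, not the PR's actual code; the function names and signatures are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def resolve_skill_path(skill_root: Path, relative: str) -> Path:
    """Resolve a resource path and reject traversal outside the skill root."""
    candidate = (skill_root / relative).resolve()
    root = skill_root.resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise ValueError(f'path escapes skill directory: {relative}')
    return candidate


def run_script(script: Path, args: list[str], timeout: int = 30) -> str:
    """Run a skill script in a subprocess with a hard timeout."""
    result = subprocess.run(
        [sys.executable, str(script), *args],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout
```

The key point is resolving the candidate path *before* the containment check, so `..` segments and symlinks cannot escape the skill directory.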
Reviewed changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| `uv.lock` | Added PyYAML 6.0+ dependency lock for YAML frontmatter parsing |
| `pydantic_ai_slim/pyproject.toml` | Added PyYAML dependency to project requirements |
| `tests/test_skills.py` | Comprehensive test suite covering skill discovery, parsing, validation, and toolset integration (900 lines) |
| `pydantic_ai_slim/pydantic_ai/toolsets/skills/_types.py` | Type definitions for the Skill, SkillMetadata, SkillResource, and SkillScript dataclasses |
| `pydantic_ai_slim/pydantic_ai/toolsets/skills/_exceptions.py` | Custom exception classes for skill operations |
| `pydantic_ai_slim/pydantic_ai/toolsets/skills/_discovery.py` | Skill discovery, YAML parsing, and validation logic |
| `pydantic_ai_slim/pydantic_ai/toolsets/skills/_toolset.py` | Main SkillsToolset implementation with tool registration and execution |
| `pydantic_ai_slim/pydantic_ai/toolsets/skills/__init__.py` | Module exports and documentation |
| `pydantic_ai_slim/pydantic_ai/toolsets/__init__.py` | Added Skills toolset exports to the main toolsets module |
| `pydantic_ai_slim/pydantic_ai/__init__.py` | Added SkillsToolset to the main package exports |
| `mkdocs.yml` | Added the skills.md documentation page to navigation |
| `examples/pydantic_ai_examples/skills_agent.py` | Example demonstrating Skills integration with an agent |
| `examples/pydantic_ai_examples/skills/pydanticai-docs/SKILL.md` | Example skill providing Pydantic AI framework documentation |
| `examples/pydantic_ai_examples/skills/arxiv-search/scripts/arxiv_search.py` | Example Python script for arXiv paper search |
| `examples/pydantic_ai_examples/skills/arxiv-search/SKILL.md` | Example skill for searching the arXiv repository |
| `docs/skills.md` | Comprehensive documentation covering skill creation, usage patterns, and API reference (535 lines) |
| `docs/api/toolsets.md` | API documentation updates for Skills toolset types and functions |
@DouweM here is the PR to introduce agent skills natively in Pydantic AI. I did some more refactoring to align with the Pydantic AI codebase.

Question: to effectively use agent skills, their definitions must be added to the system prompt. Currently, the developer experience is:

```python
from pydantic_ai import Agent, SkillsToolset

# Initialize the skills toolset with skill directories
skills_toolset = SkillsToolset(directories=["./skills"])

# Create an agent with skills
agent = Agent(
    model='openai:gpt-4o',
    instructions="You are a helpful research assistant.",
    toolsets=[skills_toolset],
)

# The developer must explicitly add the skills system prompt using our helper function
@agent.system_prompt
async def add_skills_to_system_prompt() -> str:
    return skills_toolset.get_skills_system_prompt()

# Use the agent - skill tools are automatically available
result = await agent.run(
    "What are the last 3 papers on arXiv about machine learning?"
)
print(result.output)
```

I suggest adding an optional
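The `get_skills_system_prompt()` helper referenced in the snippet above presumably renders discovered skill metadata into a prompt fragment. A hedged, standalone sketch of what such a helper might do (the dataclass and wording here are illustrative stand-ins, not the PR's actual implementation):

```python
from dataclasses import dataclass


@dataclass
class SkillMetadata:
    # Minimal stand-in for the PR's SkillMetadata dataclass.
    name: str
    description: str


def get_skills_system_prompt(skills: list[SkillMetadata]) -> str:
    """Render discovered skills as a system-prompt fragment."""
    lines = ['# Skills', '', 'The following skills are available:']
    for skill in skills:
        lines.append(f'- **{skill.name}**: {skill.description}')
    lines.append('')
    lines.append('Use load_skill(name) to read a skill before using it.')
    return '\n'.join(lines)
```

This is the "progressive disclosure" pattern: the prompt only carries names and one-line descriptions, and the model pulls in full instructions on demand via the toolset.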
Co-authored-by: Copilot <[email protected]>
@DougTrajano Good idea, call it

Note that we also have an

I'll give the PR a proper review on Monday, thanks for working on this!
Copilot reviewed 16 out of 17 changed files in this pull request and generated 10 comments.
DouweM left a comment
Couldn't help myself and left a few quick comments ahead of a full review next week :) Main point is that I'd like this to be less hard-coded to use the local environment.
```python
def __init__(
    self,
    directories: list[str | Path],
```
I don't want this to be hard-coded to having skills on the local file system. Can we support programmatically passing in skills?
The Anthropic API may be useful for seeing how they represent skills, since they obviously can't actually do local file reads.
Hmm, good suggestion. Maybe we should create a SkillSource concept that interacts with SkillsToolset via standard methods such as discover(), read_resource(), etc.
This allows us to have LocalSkillSource, RemoteSkillSource, etc. It also allows developers to implement their own logic.
It will take some time, but I believe I can give it to you as a Christmas gift. 🎅🏼
Just a thought from the peanut gallery, appreciate all the work getting done on this PR.
I think there will really need to be the ability to use multiple sources of skills, because workflows that mix and match those sources can be pretty common for prompts referencing multiple skills. If you look at the skills Anthropic released out of the box, they could be pretty useful pieces of boilerplate that are happily referenced from a hosted source (currently the Claude API).
If my prompt is "can you update a report using the custom-business-report skill and then create a pdf using the pdf skill", the PDF skill is useful boilerplate that I don't want to pull and package alongside my custom skill. What I don't quite understand yet, and what worries me, is how effective an agent is going to be if it needs to go back and forth between a tool-calling implementation of skills (Pydantic AI) and a filesystem implementation (Anthropic/Claude native) in a single loop.
Do you anticipate references in SKILL.md to just be paths to the reference files like in Anthropic's implementation? Or would it be "use read_skill_resource tool to read /path/to/resource"? It would be great if you could just list relative paths like in Anthropic's and the agent just somehow knew it was working with a filesystem implementation or a tool-calling implementation and act appropriately. It would mean skills are fully portable between the two implementations without modification.
> Do you anticipate references in SKILL.md to just be paths to the reference files like in Anthropic's implementation? Or would it be "use the `read_skill_resource` tool to read /path/to/resource"? It would be great if you could just list relative paths like in Anthropic's, and the agent just somehow knew whether it was working with a filesystem implementation or a tool-calling implementation and acted appropriately. It would mean skills are fully portable between the two implementations without modification.
Check my proposal in this comment #3780 (comment)
Essentially, we will implement an interface for SkillsSource that handles the logic for reading resources when the SkillsToolset requires them. With this approach, we should be able to support file paths or even URLs. Developers can also define custom logic by modifying read_skill_resource() as needed.
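The `SkillsSource` interface proposed above could look roughly like the sketch below. The protocol's method names follow the comment; the `LocalSkillsSource` layout (one subdirectory per skill, each with a SKILL.md) matches the examples in this PR, but the implementation itself is hypothetical.

```python
from pathlib import Path
from typing import Protocol


class SkillsSource(Protocol):
    """Proposed interface: any source of skills the toolset can query."""

    def discover(self) -> list[str]: ...
    def read_resource(self, skill_name: str, path: str) -> str: ...


class LocalSkillsSource:
    """Filesystem-backed source: one subdirectory per skill."""

    def __init__(self, root: Path):
        self.root = root

    def discover(self) -> list[str]:
        # A directory counts as a skill if it contains a SKILL.md.
        return sorted(p.name for p in self.root.iterdir()
                      if (p / 'SKILL.md').is_file())

    def read_resource(self, skill_name: str, path: str) -> str:
        return (self.root / skill_name / path).read_text()
```

A `RemoteSkillsSource` would implement the same two methods against an HTTP API, which is what makes relative paths in SKILL.md portable: the toolset resolves them through whichever source owns the skill.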
This sounds like a great approach. Having fully portable skills that work across different skill-supporting clients is important. The filesystem implementation is effectively the standard because that's how this started, but a tool-based implementation offers the benefit of remotely hosted (centralized) skill repositories. Things get messy if we have to tweak the SKILL.md and supporting resources when moving between a filesystem client and a tools client... it's gotta be the same SKILL.md for both. Appreciate the work here!
> It would be great if you could just list relative paths like in Anthropic's and the agent just somehow knew it was working with a filesystem implementation or a tool-calling implementation and act appropriately.

Note that the filesystem implementation looks like a tool-calling implementation to the LLM as well: reading a file is calling a tool like `read_file(...)` or `run_sh('cat ...')`. So if we give our skill tools names and descriptions that the model will understand are equivalent (like `read_skill_file`), it should do just as well.

> It would mean skills are fully portable between the two implementations without modification.

I agree that should be the case!
> it needs to go back and forth between a tool-calling implementation of skills (Pydantic AI) and a filesystem implementation (Anthropic/Claude native) in a single loop.
@jbnitorum Can you elaborate on what you mean here, and what the specific thing is that you're concerned about Pydantic AI handling well?
Do you mean that during a single agent run, some skills should be handled by Pydantic AI and some by Anthropic? I imagine the model would get very confused having two competing sources of, and tools around, skills. I imagine it would be more like #3212, where Pydantic AI would decide based on model capabilities whether to use the framework or the API implementation.
Yes, that was what I was implying, but taking a step back I think it's better solved with a different approach.
I was highlighting the usefulness of some of the built-in skills and thinking about it conceptually through more of a Claude Web/Desktop lens, where all enabled skills get exposed to the system prompt natively. When accessing via the API, though, any skills available in the agent run must be explicitly defined, even the built-in ones. So the simple built-in file-type skills aren't just "magically" always there like in the chat clients.
Given that an explicit definition needs to be present anyway, it seems much simpler to just grab the necessary built-in skills off GitHub at runtime and treat them like any other local skill, as opposed to trying to orchestrate something where some of the skills come from Pydantic AI and others come from calls to the Anthropic API. I think it's a pretty safe bet that any built-in skills will continue to be open-sourced on GitHub.
A helper function to dig skills out of repos would work, used like below. I'll leave it to @DougTrajano if he has a more elegant solution; maybe build something into SkillSource itself for common remote storage types: GitHub, S3, etc.
```python
import sys

from pydantic_ai.toolsets.skills import GitHubSkill, SkillsSource, SkillsToolset

skills_source = SkillsSource(
    sources=[GitHubSkill(repo='anthropics/skills', path='skills/pdf', dest='local_skills/pdf', commit='abc123')],
    executor=sys.executable,
)
skills_toolset = SkillsToolset(sources=[skills_source])
```

```python
    validate: bool = True,
    id: str | None = None,
    script_timeout: int = 30,
    python_executable: str | Path | None = None,
```
I think if this is None, we should NOT run scripts, as running them on the local system is potentially dangerous and requires the user to know what they're doing.
It could also be useful to let the user provide a function to execute the tool (instead of a Python path?) so that they can plug in some other (possibly remote, possibly sandboxed) execution environment.
Ok, good point. I think we should design it as a pluggable system with three main components:

- `SkillsToolset`: Interface between the agent and the skills source.
- `SkillsSource`: Previously `directories`; it will be an interface that we can extend for `LocalSkillsSource`, `RemoteSkillsSource`, etc.
  - It provides standard methods for `SkillsToolset` to interact with, such as `list_skills()`, `load_skill()`, etc.
- `SkillScriptExecutor`: An execution environment representation (can be a local Python path as today, or a remote or sandboxed environment) associated with a given `SkillsSource`.
  - The `SkillScriptExecutor` receives the script invocation from `SkillsToolset.run_skill_script()` and executes it.
  - Limitation: we will not support different `SkillScriptExecutor`s within the same `SkillsSource`. I don't think it's required, and it increases complexity because we would need to define it inside each skill. I don't want to overwhelm what is just a few Markdown files and Python scripts.
    - If you require multiple `SkillScriptExecutor` instances within the same `SkillsSource`, consider reorganizing your skills: splitting them into two separate `SkillsSource` objects lets you define two different execution environments.

What do you think?
```mermaid
flowchart TB
    subgraph Agent["Agent Runtime"]
        A[Agent] --> ST[SkillsToolset]
    end
    subgraph Toolset["SkillsToolset"]
        ST --> |"delegates to appropriate source"| Router{Route by skill_name}
    end
    subgraph Source1["LocalSkillsSource"]
        Router --> LSS[LocalSkillsSource]
        LSS --> |"list_skills()"| FileSystem1[(File System)]
        LSS --> |"load_skill()"| FileSystem1
        LSS --> |"read_skill_resource()"| FileSystem1
        LSS --> |"run_skill_script()"| LSE[LocalSkillScriptExecutor]
        LSE --> Sub[anyio.run_process]
    end
    subgraph Source2["RemoteSkillsSource"]
        Router --> RSS[RemoteSkillsSource]
        RSS --> |"list_skills()"| API[(Remote API)]
        RSS --> |"load_skill()"| API
        RSS --> |"read_skill_resource()"| API
        RSS --> |"run_skill_script()"| RSE[RemoteSkillScriptExecutor]
        RSE --> RAPI[Remote Execution API]
    end
    subgraph Source3["CustomSkillSource"]
        Router --> CSS[CustomSkillsSource]
        CSS --> |"run_skill_script()"| SBE[SandboxedExecutor]
        SBE --> Container[Docker/WASM]
    end
```
That mostly makes sense, but I have a few questions that might be cleared up by just seeing the code that makes this work the way you envision :)

- Besides skill sources (which I agree are useful for local and remote URL-based discovery), I'd expect to also be able to build my own `Skill` objects and pass those in somehow.
- Why is the executor tied to the skill source, and does each skill source possibly have a separate executor? I see the source and the executor as orthogonal: two separate things pluggable into the toolset, but otherwise not tied together. But I may be missing something.
- Do we need a full executor class, or could it be a simple callable taking the script to be run?
- If the scripts need an execution environment that is stable across runs, we should pass the `RunContext` into the callable so it can use `deps` to read things like `container_id`.
- I wonder if we could design the executors to also work with Programmatic Tool Calling (i.e. CodeAct, Code Mode), which similarly needs a pluggable execution environment, but also needs a way to inject code context (i.e. Python functions that will be available) and let the code return tool calls back to the agent.
Hey buddy, here are my comments:

1. Let me implement this in some way.

2. By tying the execution environment to the `SkillsSource`, we can support multiple execution environments without requiring a Pydantic-specific definition of agent skills. Suppose I have a local folder named `./skills` with these two skills: pdf and xlsx.
   - The easiest way to use them is to just read the entire `./skills` folder:

```python
import sys

from pydantic_ai.toolsets.skills import SkillsSource, SkillsToolset

skills_source = SkillsSource(
    sources=["./skills"],
    executor=sys.executable,  # default to using the current Python interpreter
)
skills_toolset = SkillsToolset(sources=[skills_source])
# Both the pdf and xlsx skills are available within skills_toolset,
# in the same execution environment.
```

   - However, suppose I want to execute the pdf skill in a different environment (e.g., a Docker container with specific libraries installed). In that case, I just need to separate the skills into two sources. This does not require any change to the skill files:

```python
import sys

from pydantic_ai.toolsets.skills import SkillsSource, SkillsToolset

def pdf_executor(skill, script, args):
    # Custom logic to execute pdf skill scripts
    ...

pdf_source = SkillsSource(
    sources=["./skills/pdf"],
    executor=pdf_executor,
)
xlsx_source = SkillsSource(
    sources=["./skills/xlsx"],
    executor=sys.executable,  # default to using the current Python interpreter
)
skills_toolset = SkillsToolset(sources=[pdf_source, xlsx_source])
# pdf skill scripts are executed using pdf_executor,
# while xlsx skill scripts use the current Python interpreter.
```

3. We will provide an executor protocol and two classes: `LocalSkillExecutor` and `CallableSkillScriptExecutor`. The `SkillsToolset` will accept either an implementation of the `SkillScriptExecutor` protocol, such as `LocalSkillExecutor`, or a simple callable/function with specific arguments (skill, script, and args), which we will then wrap in a `CallableSkillScriptExecutor`.

4. I think advanced use cases may require some dependencies (secrets, reusable HTTP clients, etc.), so I will add the context to the executor.

5. We should be able to accomplish this by creating a new executor class. I am currently developing the proposal above and will commit it soon. I believe it's better to handle this in a separate PR by extending `SkillScriptExecutor`.
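The `CallableSkillScriptExecutor` wrapper from point 3, combined with the earlier review suggestion that a missing executor should mean "scripts disabled", could be sketched as follows. All names come from the proposal above; the bodies are hypothetical illustrations, not the PR's code.

```python
from typing import Callable, Protocol


class SkillScriptExecutor(Protocol):
    def run(self, skill: str, script: str, args: list[str]) -> str: ...


class CallableSkillScriptExecutor:
    """Adapts a plain (skill, script, args) function to the executor protocol."""

    def __init__(self, fn: Callable[[str, str, list[str]], str]):
        self._fn = fn

    def run(self, skill: str, script: str, args: list[str]) -> str:
        return self._fn(skill, script, args)


def run_skill_script(executor, skill: str, script: str, args: list[str]) -> str:
    if executor is None:
        # Per the review: no configured executor means script execution
        # is disabled rather than silently falling back to the local system.
        raise RuntimeError('script execution is disabled: no executor configured')
    return executor.run(skill, script, args)
```

The adapter keeps the toolset's internals uniform (everything goes through `.run(...)`) while letting users pass either a full executor object or a bare function.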
```python
    return ''

lines = [
    '# Skills',
```
Default prompts should live in constants at the top of the file and should be overridable, either through the constructor or at least through the in-development #3656.
Perfect, it's ready for your review.
Force-pushed from fa37900 to 852df1c
Perfect, it's done. I also updated the
Just for reference: the PR that introduced Agent Skills in VS Code: microsoft/vscode#278445
This pull request introduces a new "Skills" system to Pydantic AI, enabling modular, progressive skill discovery and execution for agents. The main changes include the addition of the `SkillsToolset` and related types, updates to documentation and examples to demonstrate skill usage, and new skill definitions and scripts for practical use.

Skills Toolset Integration
- Added `SkillsToolset` and supporting types (`Skill`, `SkillMetadata`, `SkillResource`, `SkillScript`, etc.) to the main package exports, enabling agents to discover and use skills dynamically.
- Added `pydantic_ai/toolsets/skills/__init__.py` with documentation and examples for building and managing agent skills.

Documentation and API Reference Updates
- Updated the API reference (`docs/api/toolsets.md`) and navigation (`mkdocs.yml`) to include the new Skills toolset and its members, ensuring clear guidance for users.
- Added an example skill (`pydanticai-docs/SKILL.md`) detailing framework features and usage patterns.

Skill Example Implementation
- Added an arXiv search skill (`arxiv-search/SKILL.md` and `arxiv_search.py`), including usage instructions, argument descriptions, and output formatting.

Agent Example with Skills
- Added an example agent (`skills_agent.py`) demonstrating how to create an agent with skills, list available skills, load instructions, and execute skill scripts.

These changes collectively enable agents to leverage domain-specific skills, improve extensibility, and provide clear documentation and examples to help users get started.