Conversation
Implement two AbstractCapability subclasses for agent file system access and shell command execution, with comprehensive configuration, security controls, and 100% test coverage. FileSystem provides read_file, write_file, edit_file, list_directory, and search_files tools scoped to a configurable root directory with path traversal prevention and glob-based allow/deny filtering. Shell provides a run_command tool with anyio-based subprocess execution, timeout support, output truncation, and command allow/deny lists. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rsist_cwd FileSystem: - Add create_directory(path) tool that creates directories with parents - Add find_files(pattern, path) tool for glob-based file name search Shell: - Separate stdout and stderr in output with [stdout]/[stderr] labels - Add persist_cwd parameter to track cd commands across calls All new features have 100% test coverage (113 tests total). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…GC warnings - Add `await proc.aclose()` in finally block to properly clean up subprocess resources after command execution - Add pytest filterwarnings to suppress Python 3.10-specific asyncio subprocess transport warnings (ResourceWarning, PytestUnraisableExceptionWarning) that occur during GC even after proper cleanup. Fixed in CPython 3.11+. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| for file_path in files: | ||
| if not file_path.is_file(): | ||
| continue | ||
| # Skip hidden files/directories | ||
| try: | ||
| rel_parts = file_path.relative_to(self._root).parts | ||
| except ValueError: # pragma: no cover | ||
| continue | ||
| if any(part.startswith('.') for part in rel_parts): | ||
| continue | ||
| try: | ||
| raw = file_path.read_bytes() | ||
| except OSError: | ||
| continue | ||
| # Skip binary files | ||
| if b'\x00' in raw[:8192]: | ||
| continue | ||
| text = raw.decode('utf-8', errors='replace') | ||
| rel_path = str(file_path.relative_to(self._root)) | ||
| for line_num, line in enumerate(text.splitlines(), start=1): | ||
| if compiled.search(line): | ||
| results.append(f'{rel_path}:{line_num}:{line}') |
There was a problem hiding this comment.
🔴 search_files, find_files, and list_directory bypass denied_patterns/allowed_patterns access controls
search_files, find_files, and list_directory only call check_access() on the top-level directory path argument (e.g., '.'), not on the individual files they iterate over. This means an agent can read the contents of files matching denied_patterns (or not matching allowed_patterns) by using search_files to grep through them, find_files to discover them, or list_directory to see them listed.
Concrete exploit with denied_patterns=['*.env']
With FileSystem(root_dir=tmp, denied_patterns=['*.env']), calling read_file('secret.env') is correctly blocked. However:
search_files('API_KEY')reads all files viarglob('*')atfilesystem.py:244without checking access, exposing the content ofsecret.envfind_files('*.env')listssecret.envviaresolved.glob(pattern)atfilesystem.py:302without checking accesslist_directory('.')revealssecret.envin the listing atfilesystem.py:215without filtering
Each file's relative path should be checked with check_access() before reading or listing.
Prompt for agents
The search_files, find_files, and list_directory methods all iterate over files without checking each individual file against the denied_patterns/allowed_patterns access controls. Only the top-level directory path is checked via safe_resolve/check_access.
To fix this, add a per-file check_access call in each method's iteration loop. For search_files (filesystem.py:246-267), before reading each file, compute the relative path and call self.check_access(rel_path), catching PermissionError to skip denied files. Apply the same pattern to find_files (filesystem.py:302-308) and list_directory (filesystem.py:215-224).
The check_access method already works with relative path strings and fnmatch, so the fix is straightforward: compute rel_path = str(file_path.relative_to(self._root)) and wrap the file processing in a try/except PermissionError block, or pre-check with check_access before processing.
Was this helpful? React with 👍 or 👎 to provide feedback.
| try: | ||
| tokens = shlex.split(command) | ||
| except ValueError: | ||
| # If shlex can't parse it, fall through and let the shell handle it | ||
| return | ||
| if not tokens: | ||
| return | ||
| executable = tokens[0] | ||
|
|
||
| if self.denied_commands and executable in self.denied_commands: | ||
| raise PermissionError(f'Command {executable!r} is denied.') | ||
| if self.allowed_commands and executable not in self.allowed_commands: | ||
| raise PermissionError(f'Command {executable!r} is not in the allowed list.') |
There was a problem hiding this comment.
🔴 check_command only validates first token, allowing shell metacharacter bypass of allow/deny lists
check_command uses shlex.split() to tokenize the command and only checks the first token against allowed_commands/denied_commands. Since shlex.split does not recognize shell metacharacters (;, &&, ||, |) as command separators, an attacker can chain a denied command after an allowed one. For example, with denied_commands=['rm'], the command echo hello; rm -rf / has first token 'echo' (since shlex produces ['echo', 'hello;', 'rm', '-rf', '/']), bypassing the deny check while the shell executes both echo and rm.
Prompt for agents
The check_command method in shell.py:79-91 only validates the first token from shlex.split, but shlex does not split on shell metacharacters like ;, &&, ||, |, backticks, or $(). This means denied commands can be trivially executed by chaining them: 'echo hi; rm -rf /' passes the check because the first token is 'echo'.
Possible approaches to fix:
1. Before shlex parsing, split the command on shell metacharacters (;, &&, ||, |) and validate each sub-command's first token independently.
2. Use shlex with punctuation_chars=True to properly tokenize shell operators, then identify and validate each command segment.
3. Document this as a known limitation and note that allow/deny lists provide best-effort protection only against direct invocation, not shell-level command chaining.
Approach 1 or 2 would involve iterating over all command segments in the pipeline/chain and checking each executable against the allow/deny lists.
Was this helpful? React with 👍 or 👎 to provide feedback.
| elif target.startswith('~'): | ||
| new_cwd = Path.home() / target[2:] # skip "~/" |
There was a problem hiding this comment.
🟡 _update_cwd incorrectly handles ~username paths by stripping first 2 chars instead of checking for ~/
In _update_cwd at shell.py:133-134, the elif target.startswith('~') branch handles all tilde paths that aren't exactly '~' by doing target[2:] (comment says "skip ~/"). This is only correct for paths starting with ~/. For a path like ~a, target[2:] yields '', causing Path.home() / '' = Path.home(), so cd ~a incorrectly changes to the home directory if it exists. For ~user/path, target[2:] yields er/path, producing a wrong path. The condition should be target.startswith('~/') to only handle home-relative paths.
| elif target.startswith('~'): | |
| new_cwd = Path.home() / target[2:] # skip "~/" | |
| elif target.startswith('~/'): | |
| new_cwd = Path.home() / target[2:] # skip "~/" |
Was this helpful? React with 👍 or 👎 to provide feedback.
Audit vs prior art: FileSystem + ShellWorth adding now:
Follow-up opportunities:
|
|
I see that file editing now relies on text replacement. In pydantic-ai-backend, we also support hashline - https://github.com/vstorm-co/pydantic-ai-backend/blob/main/src/pydantic_ai_backends/hashline.py Here's an article on the topic: https://blog.can.ac/2026/02/12/the-harness-problem/ |
Summary
FileSystemcapability withread_file,write_file,edit_file,list_directory, andsearch_filestools, scoped to a configurable root directory with path traversal prevention and glob-based allow/deny filteringShellcapability withrun_commandtool usinganyio-based subprocess execution, timeout support, output truncation, and command allow/deny listsAbstractCapabilitysubclasses that provide tools viaget_toolset()usingFunctionToolsetCloses #25, closes #26.
Test plan
pytest)coverage run -m pytest && coverage report)pyrightstrict mode, 0 errors)ruff checkandruff format --check)🤖 Generated with Claude Code