
feat: Workspace toolkit + GitHubContextProvider (HITL, edit-via-PR) #7683

Open
ashpreetbedi wants to merge 15 commits into main from feat/workspace-tools


@ashpreetbedi ashpreetbedi commented Apr 25, 2026

Summary

This PR ships two related primitives that release together as 2.6.2:

  1. Workspace — a polished local-machine toolkit at libs/agno/agno/tools/workspace.py that gives agents read/list/search/write/edit/move/delete/shell access to a root directory tree, with destructive operations gated by Agno's built-in human-in-the-loop confirmation by default.
  2. GitHubContextProvider — a context provider at libs/agno/agno/context/github/ that gives agents navigation + edit-via-PR access to a Git repository hosted on GitHub. Two-tool surface (query_<id> / update_<id>) with read/write sub-agents bound to different toolsets, mirroring DatabaseContextProvider.

They ship together because the update_<id> path of GitHubContextProvider builds on Workspace for file ops inside per-task worktrees.


1. Workspace toolkit

This closes the polish gap with the Claude Agent SDK tab on the homepage. The new homepage snippet for the Agno SDK tab now reads:

tools=[Workspace(
    ".",
    allowed=["read", "list", "search"],
    confirm=["write", "edit", "delete", "shell"],
)]

allowed= and confirm= are mutually exclusive partitions of short aliases: names in allowed auto-pass, names in confirm require approval. Aliases translate internally to descriptive method names (read_file, write_file, …) so the LLM tool spec stays self-explanatory.

What's in it

  • 8 method pairs (sync + async): read_file, list_files, search_content, write_file, edit_file, move_file, delete_file, run_command.
  • Line-numbered read_file output (cat -n style) — chunk reads preserve actual file line numbers so the agent can chain into edit_file precisely.
  • edit_file with replace_all=False — unique-or-fail by default; flip for renames.
  • Rich list_files entries ({path, type, size}) with optional recursive=True, max_depth=3 for tree-style exploration.
  • Atomic write_file (writes .tmp then os.replace).
  • run_command strips ANSI before tailing output (saves tokens on npm/pip/etc.).
  • Path-scoping enforcement via inherited Toolkit._check_path. Explicitly NOT a process sandbox — the docstring spells this out and points at Daytona for untrusted code.
  • Opt-in require_read_before_write=True blocks writes/edits/moves/deletes on existing files until the agent has read them this session. Catches the "agent hallucinated the file's contents" bug class.
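
The read-before-write rule in the last bullet can be illustrated with a small stand-alone sketch. This is not the code in workspace.py: the class name ReadBeforeWriteGuard and its methods are hypothetical, and the only behavior taken from this PR is "existing files must be read first; newly created files skip the check".

```python
import tempfile
from pathlib import Path
from typing import List, Set


class ReadBeforeWriteGuard:
    """Hypothetical helper: remembers which files were read this session."""

    def __init__(self) -> None:
        self._read_paths: Set[Path] = set()

    def mark_read(self, path: Path) -> None:
        # Store the resolved path so "./a.txt" and "a.txt" count as the same file.
        self._read_paths.add(path.resolve())

    def check_write(self, path: Path) -> None:
        resolved = path.resolve()
        # A file that does not exist yet has no contents to misremember, so it passes.
        if resolved.exists() and resolved not in self._read_paths:
            raise PermissionError(f"read {path.name} before modifying it")


def demo_guard() -> List[str]:
    events: List[str] = []
    with tempfile.TemporaryDirectory() as tmp:
        guard = ReadBeforeWriteGuard()
        existing = Path(tmp) / "notes.txt"
        existing.write_text("hello\n")
        try:
            guard.check_write(existing)  # not read yet, so this is refused
        except PermissionError:
            events.append("blocked")
        guard.mark_read(existing)
        guard.check_write(existing)  # allowed once the file has been read
        events.append("allowed after read")
        guard.check_write(Path(tmp) / "new.txt")  # new file, no read required
        events.append("new file allowed")
    return events
```

In this sketch the same check would guard edit, move, and delete as well as write; how the real toolkit wires it into each method is not shown here.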

Permission model

  • A name in allowed runs silently.
  • A name in confirm requires user approval (Agno's requires_confirmation_tools HITL — surfaces as approval cards in the AgentOS UI; pause/resume in code).
  • A name in neither isn't registered — the LLM doesn't see it.
  • A name in both raises ValueError.
  • Default (both None) = reads in allowed, writes in confirm.
  • Type guard: confirm=True or allowed="read" raise a clear TypeError instead of confusing alias errors.
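
The rules above can be sketched as a small resolver. This is an illustration only, not the _resolve_partitions implementation: the function name resolve_partitions, the alias-to-method table (inferred from the alias and method lists in this PR), and the choice to treat a single missing argument as an empty list are all assumptions.

```python
from typing import List, Optional, Tuple

# Assumed alias -> method mapping, inferred from the names listed in this PR.
ALIASES = {
    "read": "read_file",
    "list": "list_files",
    "search": "search_content",
    "write": "write_file",
    "edit": "edit_file",
    "move": "move_file",
    "delete": "delete_file",
    "shell": "run_command",
}
READ_ALIASES = ["read", "list", "search"]
WRITE_ALIASES = ["write", "edit", "move", "delete", "shell"]


def resolve_partitions(
    allowed: Optional[List[str]] = None,
    confirm: Optional[List[str]] = None,
) -> Tuple[List[str], List[str]]:
    """Return (auto-pass method names, approval-required method names)."""
    # Type guard first: a bare string would otherwise be iterated per character.
    for label, value in (("allowed", allowed), ("confirm", confirm)):
        if value is not None and not isinstance(value, list):
            raise TypeError(f"{label} must be a list of aliases, got {type(value).__name__}")
    if allowed is None and confirm is None:
        allowed, confirm = READ_ALIASES, WRITE_ALIASES  # safe default partition
    allowed = allowed or []  # assumption: a single missing side means "empty"
    confirm = confirm or []
    for alias in allowed + confirm:
        if alias not in ALIASES:
            raise ValueError(f"unknown alias {alias!r}")
    overlap = set(allowed) & set(confirm)
    if overlap:
        raise ValueError(f"aliases in both allowed and confirm: {sorted(overlap)}")
    # Aliases named in neither list are simply never registered.
    return [ALIASES[a] for a in allowed], [ALIASES[a] for a in confirm]
```

Calling resolve_partitions(["read"], ["shell"]) in this sketch yields (["read_file"], ["run_command"]), so every other method stays invisible to the LLM.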

Cookbook + docs reorg

  • cookbook/91_tools/workspace_tools/basic_usage.py, with_confirmation.py, README, TEST_LOG.
  • cookbook/99_docs/home/ → cookbook/99_docs/index/ (homepage SDK tabs preserved with git mv); the if __name__ block is dropped, so the runnable path is now fastapi dev <file>.py.
  • cookbook/99_docs/first-agent/workbench.py: runnable copy of the new "Build Your First Agent" snippet (19 lines, Workspace(".") with default safe partition + enable_agentic_memory=True).

2. GitHubContextProvider

Read + write access to a GitHub repository cloned into a local working directory (typically a Docker volume). Mirrors DatabaseContextProvider's read/write split:

  • query_<id>(question) — natural-language reads against the checkout. Backed by a sub-agent with read-only Workspace + GitReadTools (log/diff/show/blame/branches).
  • update_<id>(instruction) — natural-language writes that end in a pull request. Backed by a sub-agent with full Workspace + GitWriteTools (status/add/commit/push/gh pr create/gh pr view), scoped to a per-session worktree.

What's in it

  • libs/agno/agno/context/github/provider.py: GitHubContextProvider with asetup() (clone or fetch+pull, idempotent), aclose() (best-effort worktree cleanup), status() (<repo>@<branch>:<sha>), and the read/write sub-agent split.
  • libs/agno/agno/context/github/tools.py: GitReadTools (5 read ops) and GitWriteTools (6 write/PR ops).
  • libs/agno/agno/context/github/__init__.py — exports the provider, the toolkits, and the default instructions.
  • libs/agno/tests/unit/context/test_github_provider.py — 32 tests, no network. Uses a local bare git repo as the fake remote and stubs gh via a shell script.
  • cookbook/12_context/12_github.py — read demo against agno-agi/agno (always runs); write demo opens a PR if GITHUB_WRITE_REPO is set.

Key behaviors

  • Worktree-per-task (Coda's pattern): each session gets its own <workdir>/worktrees/<task>/ worktree on a <prefix>/<task> branch. Parallel update_<id> calls in different sessions don't collide. Cached by run_context.session_id; ephemeral teardown when no session_id is propagated.
  • Branch-prefix safety: every git push and gh pr create validates the active branch matches <pr_branch_prefix>/*. Default prefix is agno. The agent cannot push to the default branch — that's the tripwire that keeps this safe to expose.
  • PAT auth: github_token kwarg or GITHUB_TOKEN env. Embedded into the clone URL so subsequent pushes inherit auth without further setup. gh calls receive the token via GH_TOKEN / GITHUB_TOKEN in the subprocess env.
  • gh CLI dependency: create_pull_request and pr_status shell out to gh. Missing gh is a clear runtime error, not a silent constructor failure.
  • mode=tools returns the read-only flat surface only (Workspace(allowed=READ_TOOLS) + GitReadTools). Writes need the per-session worktree, so they require mode=default (two-tool surface).
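
The branch-prefix tripwire described above amounts to a string check on the active branch before any push or PR creation. A minimal sketch, assuming a hypothetical helper named assert_agent_branch (the real check lives in GitWriteTools and may differ):

```python
def assert_agent_branch(branch: str, prefix: str = "agno") -> str:
    """Return the branch if it matches <prefix>/<something>, else raise."""
    expected = prefix.rstrip("/") + "/"
    # Require the trailing slash so "agnostic/x" does not pass for prefix "agno",
    # and require a non-empty task part after the slash.
    if not branch.startswith(expected) or len(branch) == len(expected):
        raise PermissionError(
            f"refusing to push {branch!r}: branch must match {expected}*"
        )
    return branch
```

In practice the branch name would come from something like git rev-parse --abbrev-ref HEAD inside the worktree. Because main and any other unprefixed branch fail the check, the default branch stays out of reach.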

Out of scope (deferred)

  • Multi-repo per provider instance. One provider = one repo.
  • GitHub App / OAuth auth. PAT only.
  • Direct push to default branch (refused by branch-prefix safety).
  • Forking workflow. Assumes push access to the source repo.
  • Mid-task conflict resolution.
  • Updating Scout's contexts.py — caller-side, separate.
  • Updating docs/ — separate doc PR for 2.6.2.

Type of change

  • New feature
  • Improvement (homepage + first-agent guide DX)
  • Bug fix
  • Breaking change
  • Model update
  • Other

FileTools, ShellTools, and LocalFileSystemTools are untouched — no deprecation, no breaking change for existing users.


Checklist

  • Code complies with style guidelines
  • Ran format/validation scripts (./scripts/format.sh and ./scripts/validate.sh)
  • Self-review completed
  • Documentation updated (comments, docstrings)
  • Examples and guides: cookbook examples included for both Workspace and GitHubContextProvider; homepage + first-agent docs snippets updated (docs repo PR is separate)
  • Tested in clean environment (cookbook smoke runs against gpt-5.4 end-to-end)
  • Tests added/updated (61 new workspace tests + 32 new GitHub provider tests; existing FileTools / context tests still green)

Duplicate and AI-Generated PR Check

  • I have searched existing open pull requests and confirmed that no other PR already addresses this issue
  • If a similar PR exists, I have explained below why this PR is a better approach
  • Check if this PR was entirely AI-generated (by Copilot, Claude Code, Cursor, etc.)

Additional Notes

Verified end-to-end:

  • All workspace + context unit tests green (164 total: 61 workspace, 32 github, 71 context regression).
  • ./scripts/format.sh and ./scripts/validate.sh clean for new files (pre-existing mypy errors in unrelated slack.py / drive.py / sql.py are out of scope).
  • cookbook/91_tools/workspace_tools/basic_usage.py ran end-to-end against gpt-5.4 — agent called read_file → write_file → list_files.
  • cookbook/91_tools/workspace_tools/with_confirmation.py ran end-to-end — read_file ran silent, edit_file paused, requirement.confirm() resumed cleanly, edit applied.

Out of scope (Workspace, captured in .context/workspace_tools_design.md for future sprints):

  • multi_edit — atomic batched edits to one file (Claude Code parity).
  • Background processes — run_command(background=True) + command_output(handle) + kill_command(handle).
  • Dynamic confirm predicate (callable instead of static list) — touches Agno's Function layer.
  • additional_roots=[...] — extend root scope to multiple dirs.
  • LSP integration (Mastra has it; heavyweight).
  • Lifecycle hooks (PreToolUse / PostToolUse / Stop / SessionStart / SessionEnd).

Docs repo: the matching changes to index.mdx (Agno SDK tab) and first-agent.mdx (Build Your First Agent guide) live in the docs repo and will be committed there separately.

ashpreetbedi and others added 12 commits April 23, 2026 13:54
Add AgentOS examples used in the docs welcome page, one per
framework — Agno SDK, Claude Agent SDK, DSPy, LangGraph. Each
file is self-contained and runnable via `python <file>.py`.

Also add the required demo deps (claude-agent-sdk, langgraph,
langchain-openai, dspy) to libs/agno/pyproject.toml so the demo
venv can run all four.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Adds a polished local-machine toolkit alongside FileTools, ShellTools, and
LocalFileSystemTools. Combines read/write/edit/delete/search/shell into one
cohesive surface, sandboxed to a configurable base_dir, with destructive
operations requiring user approval by default through Toolkit's existing
requires_confirmation_tools mechanism.

The constructor exposes mutually-exclusive allowed_tools (auto-pass) and
confirm_tools (approval-required) lists. The snippet on the docs homepage now
mirrors the Claude Agent SDK tab's polish (visible sandbox + visible permission
story) in Agno-native form.

- New: libs/agno/agno/tools/workspace.py (7 sync + 7 async methods)
- New: libs/agno/tests/unit/tools/test_workspace.py (38 tests, all passing)
- New: cookbook/91_tools/workspace_tools/ (basic_usage, with_confirmation, README, TEST_LOG)
- Switched: cookbook/99_docs/home/agno_agent.py to WorkspaceTools
- FileTools, ShellTools, LocalFileSystemTools untouched (no deprecation)

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
… init

Snippet-readability pass on the toolkit that landed in the previous commit:

- Class: WorkspaceTools -> Workspace
- First param: base_dir -> root (positional, so Workspace(".") works)
- allowed_tools / confirm_tools now accept short aliases instead of full
  method names. The toolkit translates aliases -> method names internally,
  so the LLM tool spec keeps the descriptive names (read_file, write_file,
  list_files, ...) — only the developer-facing snippet is shortened.

Aliases: read, list, search, write, edit, delete, shell.
Method names (and signatures) are unchanged.

Net effect on the homepage snippet:

    tools=[Workspace(
        ".",
        allowed_tools=["read", "list", "search"],
        confirm_tools=["write", "edit", "delete", "shell"],
    )]

Tests updated; 41 unit tests green (added 3 covering positional root,
default-cwd root, and full-name-rejected-as-alias).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…st, recursive, move, atomic, ANSI, read-before-write)

Tier 1 — behavior tightening (no API additions):

- read_file output is now line-numbered (cat -n style). Numbers reflect actual
  file lines so the agent can chain into edit_file precisely. Chunked reads
  preserve correct numbering relative to the source file.
- edit_file gains replace_all=False. Default behavior (unique-or-fail) is
  unchanged; replace_all=True replaces every occurrence and reports the count.
  Multi-match error message now mentions the flag.
- list_files entries are now {path, type, size} dicts instead of bare paths
  so the LLM can decide what to read without a second call.
- run_command strips ANSI escape sequences (color codes, cursor moves) from
  output before tailing — saves tokens on npm/pip/etc. CLI output.
- read_file "file too long" hints now mention search_content as an
  alternative to start_line/end_line chunking.
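
A rough sketch of three of the Tier 1 behaviors, written against plain strings rather than the toolkit itself. The function names, the cat -n column width, and the regex (which covers CSI sequences only) are assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# CSI escape sequences (colors, cursor moves): ESC [ parameters intermediates final.
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def number_lines(text: str, start_line: int = 1, end_line: Optional[int] = None) -> str:
    """cat -n style output; numbers are the real file line numbers, even for chunks."""
    lines = text.splitlines()
    end = len(lines) if end_line is None else min(end_line, len(lines))
    return "\n".join(f"{n:>6}\t{lines[n - 1]}" for n in range(start_line, end + 1))


def replace_text(content: str, old: str, new: str, replace_all: bool = False) -> Tuple[str, int]:
    """Unique-or-fail replacement; replace_all=True replaces every match."""
    if not old:
        raise ValueError("old text must not be empty")
    count = content.count(old)
    if count == 0:
        raise ValueError("old text not found")
    if count > 1 and not replace_all:
        raise ValueError(f"{count} matches; pass replace_all=True to replace all of them")
    return content.replace(old, new), count


def strip_ansi_tail(output: str, max_lines: int = 50) -> str:
    """Drop escape sequences first, then keep only the last max_lines lines."""
    clean = ANSI_CSI.sub("", output)
    return "\n".join(clean.splitlines()[-max_lines:])
```

Because number_lines keeps the source numbering, a chunked read of lines 2 to 3 still reports them as 2 and 3, which lets an agent quote exact text back into an edit.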

Tier 2 — small additions:

- list_files gains recursive=False and max_depth=3 params. tree -L semantics:
  max_depth=1 returns only the immediate children of the search root.
- New move_file / amove_file (alias "move"). Both src and dst sandbox-checked.
  Refuses to clobber existing dst unless overwrite=True. Added to WRITE_TOOLS.
- write_file is now atomic — writes to <file>.tmp, then os.replace into place.
  A crash mid-write can't leave a partially-written target.
- New opt-in require_read_before_write=False constructor flag. When True,
  blocks write/edit/move/delete on existing files until the agent has read
  them this session. Catches the "agent hallucinated the file's contents"
  bug class. Newly-created files skip the check.
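
The atomic write described above (temporary file, then os.replace) can be sketched as follows. The helper name atomic_write is hypothetical, and the fsync call and cleanup on failure are extras added for the sketch rather than details confirmed by this PR.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Tuple


def atomic_write(target: Path, content: str) -> None:
    """Write to <file>.tmp next to the target, then swap it into place."""
    tmp = target.with_name(target.name + ".tmp")
    try:
        with open(tmp, "w", encoding="utf-8") as fh:
            fh.write(content)
            fh.flush()
            os.fsync(fh.fileno())  # assumption: flush to disk before the swap
        # os.replace overwrites the destination in one step on the same filesystem.
        os.replace(tmp, target)
    except BaseException:
        tmp.unlink(missing_ok=True)  # do not leave a stray .tmp behind
        raise


def demo_atomic_write() -> Tuple[str, List[str]]:
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "config.json"
        target.write_text("old")
        atomic_write(target, "new")
        return target.read_text(), sorted(p.name for p in Path(tmp).iterdir())
```

Keeping the temporary file in the same directory as the target matters: a rename across filesystems is not atomic.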

Tests: 41 → 59 (+18 new, ~6 updated for new output shapes). FileTools tests
unchanged (no regressions).

Homepage snippet unchanged at 4 confirm tools — `move` is documented in the
toolkit README and discoverable, but not enabled by default in the demo.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…ed agno_assist

Reorganize cookbook/99_docs/ to mirror the docs site layout:

- cookbook/99_docs/home/  → cookbook/99_docs/index/  (homepage SDK tabs)
- new: cookbook/99_docs/first-agent/agno_assist.py — runnable copy of the
  snippet shown in docs/first-agent.mdx, switched from MCPTools to
  Workspace(".") and trimmed to 18 lines (no markdown=True). Defaults give
  the agent the full read/list/search/write/edit/move/delete/shell surface
  with safe-by-default confirmation on destructive ops.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
- Rename file: cookbook/99_docs/first-agent/agno_assist.py → workbench.py
- Rename agent: "Agno Assist" → "Workbench" (pairs with Workspace as the
  thing that *works in* the workspace)
- Add enable_agentic_memory=True so the agent can remember things across
  sessions, not just within the conversation history window. 19 lines total.

Mirrors the updated docs/first-agent.mdx (in the docs repo).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
"Sandbox" implies OS-level isolation guarantees (process namespaces, syscall
filtering, network blocking) that Workspace doesn't deliver. What it actually
does is path-scoping: paths must resolve under root, shell commands run with
cwd=root. The agent can still read env vars, hit the network via shell, and
use anything else the host process can.

Removed "sandbox/sandboxed" wording across the toolkit docstring, README,
test comments, design doc, and the cookbook workbench file. Where the word
remains, it now appears as an explicit disclaimer pointing to Daytona for
real sandboxing.
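
To make the distinction concrete, path-scoping boils down to a check like the one below. This is a sketch of the idea only; the function name resolve_under_root is hypothetical and the inherited Toolkit._check_path may behave differently.

```python
import tempfile
from pathlib import Path
from typing import List


def resolve_under_root(root: Path, user_path: str) -> Path:
    """Resolve user_path against root and refuse anything that escapes it."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the comparison.
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"{user_path!r} resolves outside {root}")
    return candidate


def demo_scoping() -> List[str]:
    results: List[str] = []
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for attempt in ("src/app.py", "../outside.txt", "/etc/passwd"):
            try:
                resolve_under_root(root, attempt)
                results.append("ok")
            except PermissionError:
                results.append("refused")
    return results
```

A check of this kind constrains file arguments only. It does nothing about what a shell command run with cwd=root can reach, which is why the toolkit no longer calls itself a sandbox.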

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Removes the trailing main-gate from all four index/ tab files. The runnable
path is now 'fastapi dev <file>.py', matching the convention already used
in cookbook/99_docs/first-agent/workbench.py and the docs first-agent guide.

3 fewer lines per file, snippet ends cleanly at 'app = agent_os.get_app()'.
Mirrors the matching change in docs/index.mdx.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Drops the _tools suffix on the two partition kwargs:

  Workspace.allowed_tools=[...]   →  Workspace.allowed=[...]
  Workspace.confirm_tools=[...]   →  Workspace.confirm=[...]

The strings inside the lists ("read", "write", ...) already make the meaning
self-evident, and the shorter names save real estate on the homepage snippet
where every character matters. Our partition semantics already differ from
Claude SDK's allowed_tools (theirs = whitelist; ours = auto-pass subset
mutually exclusive with confirm), so the rename also reduces a false-friend
collision.

Adds an isinstance(list) check in _resolve_partitions so confirm=True or
allowed="read" raise a clear TypeError instead of a confusing alias error
(e.g. "unknown alias 'r', 'e', 'a', 'd'" from set('read')).

Sweep: workspace.py constructor + docstring + error messages, test_workspace
(38 occurrences + 2 new TypeError tests, 61 tests total), README, basic_usage,
TEST_LOG, design doc, and the homepage cookbook agno_agent.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Was bare 'set'; now Set[Path] for proper type checking.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@ashpreetbedi ashpreetbedi requested a review from a team as a code owner April 25, 2026 16:09
ashpreetbedi and others added 3 commits April 26, 2026 10:56
Mirrors DatabaseContextProvider's read/write split with two sub-agents:
- query_<id>: read-only Workspace + GitReadTools over the main checkout
- update_<id>: full Workspace + GitWriteTools scoped to a per-session
  worktree at <workdir>/worktrees/<task>/

Every write task ends in a PR on a <prefix>/<task> branch the human
reviews and merges. Branch-prefix safety on git_push and gh pr create
keeps the agent from pushing to the default branch.

- Auth: PAT via github_token kwarg or GITHUB_TOKEN env, embedded into
  the clone URL so subsequent pushes inherit it; gh receives it via
  GH_TOKEN/GITHUB_TOKEN in the subprocess env.
- Worktree-per-task (Coda's pattern): parallel update_<id> calls in
  different sessions don't collide. Cached by run_context.session_id
  (synthetic ephemeral fallback for stateless callers).
- gh CLI dependency: create_pull_request + pr_status shell out to gh.
  Missing gh is a clear runtime error, not a constructor failure.
- mode=tools returns the read-only flat surface only; writes need the
  sub-agent split, so they require mode=default (two-tool surface).
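
For readers unfamiliar with git worktrees, the per-task layout above corresponds roughly to the commands built below. This sketch only constructs argument lists and runs nothing; the function names, the separate checkout parameter, and the origin/main base ref are assumptions, not the provider's actual code.

```python
from pathlib import Path
from typing import List


def worktree_add_argv(
    checkout: Path,
    workdir: Path,
    task: str,
    prefix: str = "agno",
    base: str = "origin/main",  # assumed base ref for the new branch
) -> List[str]:
    """argv for: git worktree add -b <prefix>/<task> <workdir>/worktrees/<task> <base>"""
    worktree_path = workdir / "worktrees" / task
    branch = f"{prefix}/{task}"
    return [
        "git", "-C", str(checkout),
        "worktree", "add", "-b", branch, str(worktree_path), base,
    ]


def worktree_remove_argv(checkout: Path, workdir: Path, task: str) -> List[str]:
    """argv for best-effort teardown of the same worktree."""
    worktree_path = workdir / "worktrees" / task
    return ["git", "-C", str(checkout), "worktree", "remove", "--force", str(worktree_path)]
```

Each list can be passed to subprocess.run. Since every task gets its own directory and its own branch, two sessions never share an index or a checked-out branch.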

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
32 unit tests, no network. The fake remote is a bare git repo seeded
with one commit on main; gh is stubbed via a shell script that echoes
a fake PR URL.

Coverage:
- repo URL parsing (owner/name, https, .git suffix)
- task name sanitization (path/branch-safe, length cap, uuid fallback)
- asetup: fresh clone, idempotent, dirty-tree warning without failure
- token sourcing: kwarg wins, env fallback, stays None when neither set
- mode resolution: default returns query+update, tools returns read-only
- worktree lifecycle: created on first update, reused per session_id,
  ephemeral teardown when session_id is absent, cleaned up by aclose
- branch-prefix safety: git_push and create_pull_request both refuse
  non-prefixed branches
- path-escape: GitWriteTools rejects task_workdir outside workdir
- per-call author identity stamps Agno <[email protected]> without global
  git config
- gh integration: create_pull_request returns the URL, pr_status returns
  parsed JSON, missing gh produces a clear error
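
Two of the covered behaviors, repo URL parsing and task name sanitization, are easy to picture with a sketch. The function names, the 40-character cap, and the task- fallback format are assumptions for illustration; the tests in this PR define the real rules.

```python
import re
import uuid
from typing import Tuple


def parse_repo(repo: str) -> Tuple[str, str]:
    """Accept 'owner/name' or an https GitHub URL, with or without a .git suffix."""
    cleaned = re.sub(r"^https://github\.com/", "", repo.strip()).rstrip("/")
    if cleaned.endswith(".git"):
        cleaned = cleaned[: -len(".git")]
    parts = cleaned.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"cannot parse repo {repo!r}")
    return parts[0], parts[1]


def sanitize_task_name(raw: str, max_len: int = 40) -> str:
    """Lowercase slug that is safe in both a path segment and a branch name."""
    # Every run of characters outside [a-z0-9] becomes one hyphen, so "/", "..",
    # spaces, and punctuation cannot survive into the path or the branch.
    slug = re.sub(r"[^a-z0-9]+", "-", raw.lower()).strip("-")
    slug = slug[:max_len].strip("-")
    # Nothing usable left: fall back to a random name instead of failing.
    return slug or f"task-{uuid.uuid4().hex[:8]}"
```

With these rules, "Fix: README typo!" becomes fix-readme-typo, and an input made only of punctuation falls back to a generated task-<hex> name.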

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Single combined file (matches the per-provider numbered convention in
12_context/, e.g. 05_slack.py): the read prompt always runs against
agno-agi/agno; the write prompt runs only when GITHUB_WRITE_REPO is
set, so a casual python cookbook/12_context/12_github.py never opens
a PR against a real repo.

Also updates the cookbook README to list the new provider and demo.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@ashpreetbedi ashpreetbedi changed the title from "feat: add Workspace toolkit for local file + shell ops with HITL" to "feat: Workspace toolkit + GitHubContextProvider (HITL, edit-via-PR)" on Apr 26, 2026