SafeRL-Lab · chauncygu · May 10, 2026 · May 10, 2026
diff --git a/README.md b/README.md
@@ -42,7 +42,8 @@ Other install methods: [pip install](#alternative-install-with-pip) | [uv instal
 ## 🔥🔥🔥 News (Pacific Time)
 
 
-- May 10, 2026 (latest): **Web Chat UI fixes — slash commands no longer reply twice; `--web --model X` actually applies the model.** Two related issues that surfaced when wiring a self-hosted vLLM endpoint into the Chat UI. (1) **Issue #111 — slash commands duplicated in Chat UI but not in terminal.** `web/api.py:handle_slash_sync` was both returning events inline in the HTTP response **and** broadcasting the same events to the WS subscribers of the same client; `chat.js` then iterated `data.events` AND fired `_handleEvent` from `ws.onmessage`, rendering every reply twice. Same bug in `handle_slash_stream` for SSE-streamed long commands (`/brainstorm`, `/worker`, `/agent`, `/plan`). Both helpers now deliver events through a single channel — HTTP/SSE only — so `_handleEvent` runs exactly once per event. Background-thread events (sentinel flows, agent runs) are unaffected: by the time the worker thread emits, `_broadcast` is already restored to the live WS broadcaster in `finally`. (2) **`--web --model X` was silently ignored.** The CLI override branch only ran in the interactive-REPL path; the `if args.web:` branch loaded config straight from disk and started the server, so `python cheetahclaws.py --web --model custom/qwen2.5-72b` would happily boot but every request handler reloaded `~/.cheetahclaws/config.json` with the previous model name (e.g. `gemma-4-31B-it`), producing a confusing `404: model does not exist` against the new endpoint. Fix: `cheetahclaws.py` now persists `args.model` to config before calling `start_web_server`, matching the documented behavior; `provider:model` → `provider/model` normalization is identical to the REPL path. User-side guide: [`docs/guides/web-ui.md`](docs/guides/web-ui.md) (Troubleshooting + Architecture notes updated).
+- May 10, 2026 (latest): **Web Chat UI session organization — folders, drag-drop, batch ops, resizable sidebar, ChatGPT-style active-folder context.** Built on top of the slash-fix branch the same day. (1) **Folders.** New `folders` table (per-user, name unique), `chat_sessions.folder_id` nullable FK with `ON DELETE SET NULL` semantics enforced at the repo layer (SQLite `PRAGMA foreign_keys` is off in this engine). Light-touch migration runs in `init_db()`: a `PRAGMA table_info` probe adds the column to existing DBs in place — no Alembic, no manual steps for upgraders. New endpoints: `GET/POST /api/folders`, `PATCH/DELETE /api/folders/{id}`, `PATCH /api/sessions/{id}/folder` (body `{folder_id: int|null}`); deleting a folder preserves its sessions and reparents them to "Ungrouped". (2) **Drag-and-drop + Move-to context menu.** Session items are HTML5-draggable; folder rows and the Ungrouped header are drop targets with `drop-target` highlight. Right-click on a session shows a flat `Move to:` section listing every folder, plus `(Ungrouped)` when applicable and `+ New folder…` for create-and-move in one shot. Right-click on a folder header offers Rename / Delete (with a `confirm()` that spells out "sessions become Ungrouped — they are NOT deleted"). (3) **Active-folder context (ChatGPT-style).** Click a folder name (not the disclosure arrow) to "enter" that folder — the row gets accent highlighting and the topbar grows a `Chat · in <Folder>` breadcrumb. While a folder is active, **`+ New` and direct-typing auto-create both drop the new session into that folder**, mirroring how OpenAI Projects scope new chats. Switching to any session syncs active-folder to that session's folder so the breadcrumb stays honest. State persists across reloads via `localStorage` (`cc-active-folder`); deleted folders auto-clear. (4) **Batch select.** "Select" button in the sidebar header enters a checkbox mode with a footer action bar: count, Select all (respects the search filter), Delete (single confirm with total-message count), Export (single combined Markdown download with one `## Session: <title>` block per session, `chats-N-sessions.md`), Cancel. Right-click context menu is suppressed in select mode to avoid mode confusion. (5) **Resizable sidebar.** 4-px drag handle between sidebar and main pane (mouse + touch); width clamped to 200–600 px and persisted to `localStorage` (`cc-sidebar-w`). Double-click resets to default. Hidden under `@media (max-width: 768px)` so the mobile drawer keeps its swipe behavior. **Tests:** +10 new in `test_web_api.py` (folder CRUD, duplicate-409, move, delete-preserves, cross-user isolation, list includes `folder_id`, batch delete, batch delete cross-user, batch export, batch export empty 400) — full file at 31 tests, all passing, zero regressions on the existing 21. User-side guide: [`docs/guides/web-ui.md`](docs/guides/web-ui.md) (Layout / Session management / Folders / HTTP API all updated).
+- May 10, 2026: **Web Chat UI fixes — slash commands no longer reply twice; `--web --model X` actually applies the model.** Two related issues that surfaced when wiring a self-hosted vLLM endpoint into the Chat UI. (1) **Issue #111 — slash commands duplicated in Chat UI but not in terminal.** `web/api.py:handle_slash_sync` was both returning events inline in the HTTP response **and** broadcasting the same events to the WS subscribers of the same client; `chat.js` then iterated `data.events` AND fired `_handleEvent` from `ws.onmessage`, rendering every reply twice. Same bug in `handle_slash_stream` for SSE-streamed long commands (`/brainstorm`, `/worker`, `/agent`, `/plan`). Both helpers now deliver events through a single channel — HTTP/SSE only — so `_handleEvent` runs exactly once per event. Background-thread events (sentinel flows, agent runs) are unaffected: by the time the worker thread emits, `_broadcast` is already restored to the live WS broadcaster in `finally`. (2) **`--web --model X` was silently ignored.** The CLI override branch only ran in the interactive-REPL path; the `if args.web:` branch loaded config straight from disk and started the server, so `python cheetahclaws.py --web --model custom/qwen2.5-72b` would happily boot but every request handler reloaded `~/.cheetahclaws/config.json` with the previous model name (e.g. `gemma-4-31B-it`), producing a confusing `404: model does not exist` against the new endpoint. Fix: `cheetahclaws.py` now persists `args.model` to config before calling `start_web_server`, matching the documented behavior; `provider:model` → `provider/model` normalization is identical to the REPL path. User-side guide: [`docs/guides/web-ui.md`](docs/guides/web-ui.md) (Troubleshooting + Architecture notes updated).
 - May 10, 2026: **Small-context local models survive large workloads — 4-part fix: ctx cap, auto-fanout, stagnation-stop, output paths under `~/.cheetahclaws/`.** Repro that motivated the work: running `/agent → 1 (Research Assistant)` on a 6.6 MB PDF (`AutoRedTeamer.pdf` — ~70k tokens of extracted text) with `custom/qwen2.5-72b` (32k ctx). Old behavior: 400 BadRequest "context length 32768"; the agent_runner kept polling the template every 2 s; the model produced **1500+ identical "task complete" summaries** before anything stopped it. New behavior, four cooperating layers: (1) **Per-model context-window registry + dynamic max_tokens cap** (`providers._MODEL_CONTEXT_LIMITS` + `get_model_context_window` + `dynamic_cap_max_tokens`) — covers Qwen 2.5/3, Llama 3.x, Mistral/Mixtral, Phi, Gemma, DeepSeek local variants; `_fetch_custom_model_limit` now backfills `PROVIDERS["custom"]["context_limit"]` so compaction sees the live `/v1/models` value; per-call shrink based on actual prompt size keeps `input + output + 1024 safety ≤ ctx`. `compaction.get_context_limit` gains an optional `config` arg so custom-endpoint detection works on the very first turn. (2) **Auto-fanout for oversize tool outputs** (`multi_agent/fanout.py`) — when a single tool result (Read on a huge PDF, Grep over a giant tree, WebFetch of a long article) exceeds 0.4 × ctx_window, split into chunks at paragraph boundaries with token-overlap, dispatch parallel sub-LLM map calls (one per chunk, default cap 5 subagents), merge with a single reduce call; substitutes the merged summary in conversation history instead of letting the next API call overflow. Hooked at the tool-result append site in `agent.py`; transparent UX prints `[Auto-fanout: <Tool> returned ~N chars (>threshold) → dispatching K parallel sub-summaries]`. Configurable: `auto_fanout_enabled` / `_threshold` / `_max_subagents` / `_chunk_overlap_tokens`. (3) **Stagnation-stop in `agent_runner.py`** — when the model emits the same summary N iterations in a row (default 3, whitespace/case-normalized), stop the loop with a clear notification instead of burning thousands of API calls; configurable via `auto_agent_dup_summary_limit` (0 disables). (4) **Agent output paths under `~/.cheetahclaws/`** — `/agent` wizard now resolves relative output filenames (e.g. `research_notes.md`) to absolute paths under `~/.cheetahclaws/agents/<name>/output/` instead of CWD; `AgentRunner` exposes `runner.output_dir`, eagerly mkdir'd; Summary block + post-start info show the resolved path in green; absolute paths pass through unchanged. **Tests:** +47 new (fanout 23, ctx cap 18, dup-stop 13, output paths 8). **Full suite: 2139 passing, zero regressions.** User-side guide: [`docs/guides/extensions.md`](docs/guides/extensions.md).
 - May 9, 2026: **`fix/agentic-on-every-model` branch — make every model produce useful work, and make `/brainstorm` an actual debate.** A single coordinated branch (9 commits, 269 new tests, zero regressions) that lands on weak / non-Claude models specifically. **Prompts:** new `prompts/overlays/qwen.md` overlay for qwen / qwq families plus an explore-first section in `default.md` so any model walks a directory before asking the user to name a file. **Runtime:** `agent.py` auto-nudge (one-shot, when user message contains an absolute path but the model replies text-only); read-only tool dedup (Read/Glob/Grep/WebFetch/WebSearch with identical args within a turn → 2nd call short-circuited, model gets a `[deduped]` reminder); KeyError-on-empty-args hardening in tool dispatch (`Write({}) → KeyError: 'file_path'` is now a friendly "missing required parameter" error the model can self-correct from). **Providers:** new `nim` provider (build.nvidia.com free tier, 10-model curated chain) invoked as `nim/<vendor>/<model>`, with 429 cascade fallback (cap 3 swaps/turn, gated to NIM only). **`/brainstorm` overhaul:** real lead moderator (`--lead <model>`) does opening (sets agenda + bans filler) → personas debate in N rounds (`--rounds N`, default 2) → lead probes after each round → lead synthesizes a structured master plan inline (no main-agent Read needed); round 2+ is **adversarial cross-examination** — every persona MUST quote another agent's claim and attack it with a falsifiable counter, "agree-and-extend" is forbidden, lead probes any dodge. New `--models a,b,c` flag distributes different models per persona for epistemic diversity. **`/monitor` + `/research` stability:** `/subscribe` no longer truncates multi-word topics ("Agent OS Benchmark" used to become "Agent"); aggregator no longer deadlocks on a hung source after `as_completed` timeout; REPL Ctrl+C during a slow slash command cancels just that command instead of killing the whole process. Branch: `fix/agentic-on-every-model`. User-side guide: [`docs/guides/brainstorm.md`](docs/guides/brainstorm.md).
 - May 8, 2026: **Agent-OS layer (`cc_kernel/`) reaches v1.0 — 27 RFCs shipped, 1771 tests passing, zero regressions on the legacy REPL/bridges path.** 
@@ -932,8 +933,11 @@ On first visit to `http://localhost:<port>/chat`, the UI routes you to a **regis
 |---------|---------|
 | **Streaming chat** | WebSocket for live prompts + SSE for long-running slash commands |
 | **Persistent history** | Every session + message lives in SQLite (`~/.cheetahclaws/web.db`). Server restart does not lose state. |
-| **Sidebar session management** | Title auto-titled from first user message, relative time ("12m ago"), message count, busy dot, client-side search, right-click menu (Rename / Export Markdown / Delete) |
-| **Cross-user isolation** | Each user only sees their own sessions — enforced at DB query and in-memory cache |
+| **Sidebar session management** | Title auto-generated from the first user message, relative time ("12m ago"), message count, busy dot, client-side search, right-click menu (Rename / Export Markdown / Move to / Delete) |
+| **Folders + ChatGPT-style Projects** | `+ Folder` button creates per-user folders; drag a session onto a folder header (or right-click → Move to ▸) to file it; click a folder name to "enter" — `+ New` and direct-typing then auto-drop the new session into that folder, with a `Chat · in <Folder>` topbar breadcrumb. Deleting a folder reparents its sessions to "Ungrouped" rather than deleting them. |
+| **Batch operations** | "Select" button enters multi-select mode (checkboxes, Select all respects the search filter); a footer action bar batch-deletes (single confirm + total-message count) or batch-exports as a single combined Markdown (`chats-N-sessions.md`). |
+| **Resizable sidebar** | Drag the 4-px divider between the sidebar and the chat pane (200–600 px clamp); double-click resets; width persists across reloads. |
+| **Cross-user isolation** | Each user only sees their own sessions and folders — enforced at DB query and in-memory cache |
 | **Tool cards** | Collapsible cards show tool name, inputs, outputs, status (running / done / denied) |
 | **Permission approval** | Inline Allow / Deny buttons |
 | **45+ slash commands** | `/status`, `/model`, `/brainstorm`, `/ssj`, `/plan`, `/telegram`, `/wechat`, `/slack`, `/voice`, `/image`, etc. |
@@ -955,6 +959,9 @@ Browser ──→ /chat                ──→ 9 JS modules load from /static/
         ──→ /api/prompt (POST)   ──→ persists to SQLite, fans events out
         ──→ /api/events (WS)     ──→ real-time text_chunk / tool_* / permission_*
         ──→ /api/sessions/*      ──→ list / get / rename / delete / export
+                                       + batch_delete / batch_export
+                                       + {id}/folder (move to folder)
+        ──→ /api/folders         ──→ list / create / rename / delete folders
 
         ──→ /                     ──→ xterm.js PTY (password-gated)
         ──→ /health               ──→ { ok, db, uptime_s }        (unauthenticated)
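
Reviewer note: the folder migration described in the news entry (a `PRAGMA table_info` probe in `init_db()`, plus `ON DELETE SET NULL` emulated at the repo layer because `PRAGMA foreign_keys` is off) could look roughly like the sketch below. This is not the project's code. The column layout, the `UNIQUE (user_id, name)` constraint, and the `delete_folder` helper are assumptions made for illustration; only the table names, `folder_id`, and `init_db` come from the diff.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create tables if missing and add folder_id to pre-existing databases."""
    # Hypothetical minimal schema; the real tables carry more columns.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chat_sessions ("
        "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, title TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS folders ("
        "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, name TEXT NOT NULL, "
        "UNIQUE (user_id, name))"  # assumed form of "per-user, name unique"
    )
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(chat_sessions)")}
    if "folder_id" not in columns:
        # Nullable column, so existing rows simply start out ungrouped.
        conn.execute("ALTER TABLE chat_sessions ADD COLUMN folder_id INTEGER")
    conn.commit()


def delete_folder(conn: sqlite3.Connection, user_id: int, folder_id: int) -> bool:
    """Delete one folder, emulating ON DELETE SET NULL by hand.

    Returns True if a folder owned by user_id was removed.
    """
    with conn:  # both statements commit together, or roll back together
        conn.execute(
            "UPDATE chat_sessions SET folder_id = NULL "
            "WHERE user_id = ? AND folder_id = ?",
            (user_id, folder_id),
        )
        cur = conn.execute(
            "DELETE FROM folders WHERE id = ? AND user_id = ?",
            (folder_id, user_id),
        )
    return cur.rowcount == 1
```

Filtering both statements by `user_id` means a request for another user's folder changes nothing, which is one way to get the cross-user isolation the new tests check for. Calling `init_db` twice is harmless because the probe finds the column on the second run.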