A persistent, tool-wielding Claude agent with Discord and web UI interfaces. Built to be genuinely cheap to run — not as an afterthought, but as a core design constraint baked into every layer of the architecture.
Galadriel just grew a memory palace. Not a vector-DB-as-a-service. Not a paid tier. A local, embedded, verbatim store of everything she has ever written — searchable by meaning, not just keywords — with zero Anthropic tokens spent on retrieval.
The integration is built on MemPalace, an independent local-first memory library. MemPalace does the real work (storage, embeddings, knowledge graph, temporal reasoning, compression). This harness adds the wrappers that expose it to the agent as 10 new tools (14 total, up from 4) and wires it into the lifecycle — conversations are archived before /new clears them, daily logs are mined at goodnight, and a compact wake-up snapshot rides in the dynamic block so she walks into every session with her own continuity.
Why this is the headline change:
| Problem before | Solution now |
|---|---|
Verbatim history was lost at /new or compaction |
Everything is archived to the palace before it's cleared |
| Recall of facts older than today meant grepping daily logs | Semantic search across every config, log, and archived conversation |
| "What did we decide about X?" drained API budget (big context re-reads) | Zero tokens — all retrieval runs locally in ChromaDB + SQLite |
| No structured facts — everything was prose | Knowledge graph with temporal triples: subject --[predicate]--> object, with validity windows |
| No sense of self across sessions | Diary in her own voice; L0 wake-up snapshot injected into every turn |
Measured impact (14 consecutive API calls on a deployed instance):
| Metric | Value |
|---|---|
| Cache hit ratio (post-integration) | 86.5% |
| Total-input token savings vs. no caching | 71.2% |
| Palace lookup cost per search | 0 tokens — ChromaDB query runs locally |
| Palace lookup cost for a 5-hop KG timeline | 0 tokens — SQLite traversal runs locally |
| Estimated annual overhead of the integration | ~$95/year (additional) |
| Drawers indexed on a real deployment | 706 across 7 rooms + 8 halls |
| Tools added | 10 (palace_search, palace_add_drawer, palace_wake_up, palace_taxonomy, palace_kg_add/query/invalidate/timeline, palace_diary_write/read) |
The 90% cache-read discount remains intact. Adding MemPalace costs ~1.5 percentage points of cache hit ratio (10 extra tool schemas in the tools-layer cache + a ~800-token wake-up snapshot in the dynamic block) and the rest is measured, bounded, and dial-backable (PALACE_WAKE_UP_INJECT=0).
What this means in practice:
- Short term (within a session): The agent can pull back a verbatim quote from a conversation three weeks ago — no re-reading of logs, no "I don't have that context." One tool call, zero tokens, the exact words you said.
- Long term (across months): The knowledge graph preserves history. When a fact changes, the old triple gets a
valid_todate and the new one goes in — so "what was the max_tokens setting last October?" and "what is it now?" both resolve correctly. Nothing is overwritten, only superseded. - On relational questions: Graph traversal ("everything ever said about the payment service," "every decision involving the scheduler," "the full timeline of the Polly voice choice") resolves as one KG call against the local SQLite store. The kind of query that, done naively through conversation history, would cost you real money — or just fail outright because the context has long since been compacted away.
Read on for the metaphor system (wings, rooms, drawers, halls) and the caching details that make this affordable in the first place.
MemPalace organizes memory the way a human would organize a library, and the agent uses exactly the same words.
| Metaphor | What it is | Example |
|---|---|---|
| Drawer | A single chunk of content — the atomic unit. ~200–1000 tokens, a verbatim slice of something the agent (or you) wrote. | One paragraph of a daily log. One decision note. One archived Discord exchange. |
| Room | A folder-based grouping of drawers. Every drawer belongs to exactly one room. | room=memory (daily logs), room=harness (her own code), room=tower (the web UI), room=discord_bot, room=cmd, room=configuration, room=general. |
| Wing | The top-level namespace. Usually one per agent. | wing=agent is the default. |
| Hall | A keyword-based, auto-classified topic that cross-cuts rooms. A drawer about a bug in harness code lives in room=harness AND hall=problems. |
hall=decisions, hall=problems, hall=milestones. |
Why this matters: rooms let you say "look only in the code area", halls let you say "look only at things tagged as problems", and you can compose both. A search like palace_search("retry logic", room="harness", hall="problems", k=10) reads as "give me bug-tagged content from the code room" — which is exactly how a human would ask a librarian.
The agent's diary is a separate wing — her own journal, written at end-of-session, read at wake-up. Her own voice to her future self, not mixed with operational logs.
The knowledge graph sits alongside the drawers. Where drawers are prose, the KG is relational: claude-opus-4-6 --[supports]--> prompt_caching with valid_from=2025-07-10. When a fact changes you don't delete the old triple, you invalidate it. History is preserved; the timeline is queryable.
The library is MemPalace. All credit for the storage layer, the embedding pipeline, the knowledge graph, the AAAK compression dialect, and the wake-up generation belongs to the MemPalace team. This harness is a consumer — it adds the Python wrappers, the tool schemas, and the lifecycle hooks (archive-before-clear, mine-at-goodnight, inject-at-wake-up) that expose the library to a running Claude agent.
# 1. Install (mempalace is in requirements.txt)
pip install -r requirements.txt
# 2. Copy the room layout template
cp mempalace.yaml.example mempalace.yaml
# 3. Initialize palace storage (defaults to ~/.mempalace/)
mempalace init
# 4. Seed the palace with everything you've got
mempalace mine .That's it. The harness picks it up automatically on next start. palace_search works immediately; the wake-up snapshot appears in the next API call.
| Variable | Default | Purpose |
|---|---|---|
MEMPALACE_PATH |
~/.mempalace/palace |
Where the palace lives on disk. Read by MemPalace itself. |
PALACE_ARCHIVE_ROOT |
~/.mempalace/archive |
Where archived conversations + pre-compaction tool_results land before being mined. |
PALACE_WAKE_UP_FILE |
~/.mempalace/wake_up.md |
Cached wake-up snapshot. |
PALACE_WAKE_UP_INJECT |
1 |
Set to 0 to disable the wake-up injection into the dynamic block (recovers a small amount of per-call token overhead if budget is tight). |
Here is a fact that most Claude API users don't know about: cached tokens cost 90% less than regular input tokens. Not 10% less. Not 20% less. Ninety percent. It's in the Anthropic docs, but the majority of people building with the API leave this entirely on the table.
The math is brutal in your favour. Every API call you make, Claude processes your system prompt from scratch — your personality definition, your memory files, your tool schemas — and you pay full price for every token, every time. With prompt caching, after the first call, all of that context reads at $0.30/MTok instead of $3/MTok (on Sonnet). That's the same intelligence, the same context, for a tenth of the cost. On a long-running personal agent with a rich system prompt, this is not a rounding error. It changes the economics entirely.
Galadriel exploits this with three cache breakpoints, stacked deliberately:
| Cache layer | What it covers | Behaviour |
|---|---|---|
| Tool definitions | All 14 tool schemas (4 core + 10 palace) | Cached once at startup, never re-sent |
| Stable system block | Personality + memory + identity files | Marked cache_control: ephemeral; hits at ~100% after first call |
| Trailing message history | The growing conversation | Attached per-call; cache hit rate rises every turn |
The stable block alone — your SOUL.md, MEMORY.md, identity files — is typically 4 000–8 000 tokens. On a warm cache, those tokens cost $0.08–$0.30/MTok instead of $0.80–$3.00/MTok depending on model. That's your biggest fixed overhead per call, cut by 90%, on every single turn of the conversation.
Anthropic's own benchmarks show latency dropping by up to 85% on long prompts with caching engaged. A 100K-token context that took 11.5 seconds drops to 2.4 seconds. For a persistent agent that carries memory across sessions, this is the difference between a tool that feels alive and one that grinds.
Compaction finishes the job. The /compact command uses Claude Haiku — the cheapest model in the family — to summarize old tool results in your conversation history. A 60-message session bloated with verbose shell output compresses to 20% of its token count, for a fraction of a cent. Haiku handles the summarization; Opus handles the thinking.
Use /status in Discord at any time to watch live token numbers — input, cache_read, cache_write, output — for the last API call.
Prompt caching has a minimum prefix length before it engages. If your stable block is too short, the API silently skips caching entirely — you get no error, no warning, just a cache_read=0 in every log line and a bill that looks exactly like the naive approach.
| Model | Minimum to activate caching |
|---|---|
| Claude Opus (any version) | 4,096 tokens (~16 KB of text) |
| Claude Haiku 4.5 | 4,096 tokens |
| Claude Sonnet 4.6 | 2,048 tokens |
| Claude Sonnet 4.5 / 4 | 1,024 tokens |
Out of the box, config/SOUL.md + config/MEMORY.md together are roughly 500–800 tokens. That is well below the Opus threshold. Caching will not engage until you cross it.
The fix: fill in config/CONTEXT.md. Drop your project's architecture, goals, key file paths, known quirks, and current status into it. Any *.md file you place in config/ is automatically loaded into the stable cache block — so adding content there is all it takes. A reasonably filled CONTEXT.md (1–2 pages of project notes) will push the total above 4K tokens and keep it there.
Once you're over the threshold, verify it's working:
journalctl -u galadriel -f # or check your terminal outputLook for lines like:
Tokens | input=60 cache_read=5800 cache_write=0 output=240
cache_read climbing and cache_write near zero after the first call = caching is engaged and you're paying 10 cents on the dollar for that context. If cache_read stays at 0, add more content to config/CONTEXT.md. See CACHING.md for the full breakdown and a worked cost example.
Sonnet users: your minimum is only 2,048 tokens, so SOUL.md + MEMORY.md alone may be enough. But filling CONTEXT.md is still worthwhile — the agent has your project context without needing tool calls to find it.
This project's CLAUDE.md embeds the Andrej Karpathy coding guidelines — four principles distilled from Karpathy's observations on how LLMs fail as coding assistants when left to their own instincts.
Karpathy's insight is that LLMs have a systematic failure mode: they over-build. Given any instruction, they add abstraction layers that weren't asked for, refactor adjacent code that wasn't broken, invent "flexibility" that will never be used, and generate 200 lines when 40 would suffice. The guidelines are a direct antidote to that tendency:
1. Think Before Coding — State assumptions explicitly. If multiple interpretations exist, surface them — don't pick silently. If something is unclear, stop and ask rather than confidently building the wrong thing.
2. Simplicity First — Minimum code that solves the problem, nothing speculative. No unrequested features. No abstractions for single-use code. No error handling for impossible scenarios. If it could be 50 lines, make it 50 lines.
3. Surgical Changes — Touch only what the task requires. Don't improve adjacent code. Don't refactor things that aren't broken. Match existing style. When your changes make something obsolete, remove it — but leave pre-existing dead code alone.
4. Goal-Driven Execution — Transform vague tasks into verifiable goals. "Fix the bug" becomes "write a test that reproduces it, then make it pass." Clear success criteria let the agent loop independently to completion rather than guessing when it's done.
These aren't abstract ideals — they are mechanically enforced via the CLAUDE.md file that Claude Code (and Galadriel, when asked to modify her own harness) reads before every task. The result is fewer rewrites, smaller diffs, and changes that trace directly to what was asked. For a codebase that runs as a persistent service you actually depend on, this matters.
- Discord gateway — DMs, channel mentions, or a dedicated channel; gated by user ID
- Web UI (Tower) — local chat interface and dashboard at
localhost:8080 - Tool use — 14 tools: shell execution, file read/write, memory logging, and 10 MemPalace tools (semantic search, knowledge graph, diary, taxonomy); all async, non-blocking
- Persistent verbatim memory — local MemPalace integration with wings/rooms/halls/drawers, zero-token retrieval, archive-before-clear on
/new, goodnight mine of daily logs, wake-up snapshot in the dynamic block - Safety tiers — green (auto), yellow (notify), red (Discord reaction approval required)
- Scheduler — morning briefing, goodnight, configurable heartbeat
- Job watcher — monitors
/tmp/galadriel-jobs/*.donemarkers and reports completions - Compaction — Haiku-powered context compression on demand (archives verbatim tool_results to the palace before summarizing)
- Three-layer prompt caching — automatically managed, always active
# 1. Clone
git clone https://github.com/avasol/galadriel-public.git
cd galadriel-public
# 2. Install (includes mempalace — dependency of the memory palace)
pip install -r requirements.txt
# 3. Configure
cp .env.example .env
# Edit .env — set ANTHROPIC_API_KEY at minimum
# 4. (Optional but recommended) Seed the memory palace
cp mempalace.yaml.example mempalace.yaml
mempalace init # creates ~/.mempalace/
mempalace mine . # indexes this repo into the palace
# 5. Run
python main.pyTower-only mode: Omit DISCORD_BOT_TOKEN — the harness runs with just the web UI on port 8080.
Full mode: Set both ANTHROPIC_API_KEY and DISCORD_BOT_TOKEN.
Skipping step 4? That's fine — the harness runs normally and palace tools just return [palace unavailable] until you seed. You can do it any time.
main.py Entry point — wires all components, starts Discord + Tower
harness/
agent.py Core agent loop: Anthropic API, tool use, cache management
memory.py Stable + dynamic system prompt blocks; daily memory logs
tools.py 14 tools: run_shell, read_file, write_file, memory_log + 10 palace_*
palace.py MemPalace wrapper: search, archive, wake-up, KG, diary, taxonomy
safety.py Command classification (green / yellow / red)
compaction.py Haiku-powered context compression (archives to palace first)
scheduler.py Morning briefing, goodnight (mines daily logs), heartbeat
job_watcher.py Background job completion notifications
error_humanizer.py Readable Anthropic API error mapping
discord_bot/
bot.py Discord gateway, approval buttons, slash + prefix commands
tower/
app.py Flask dashboard + REST API
templates/ Tower UI HTML
static/ CSS
config/
SOUL.md Agent personality and values (your main customization point)
MEMORY.md Long-term memory (agent-maintained)
CONTEXT.md Your project context — fill this in to activate Opus caching
TOOLS.md Palace tool reference + decision matrix (read by agent on every call)
visions/ Optional per-project context files
memory/ Daily logs — auto-generated, gitignored
mempalace.yaml.example Room-structure template for `mempalace init` (copy to mempalace.yaml)
~/.mempalace/ Palace storage (created by `mempalace init`) — overridable via MEMPALACE_PATH
config/SOUL.md contains Galadriel's complete identity — the Cyber-Elf persona, her values, her voice, her continuity instructions. This is not a placeholder. Clone the repo, set your API key, and she's alive. You don't need to touch SOUL.md to get started.
When you're ready to make her your own: edit the name, rewrite the vibe, change the metaphors. The harness is fully persona-agnostic — SOUL.md is just a Markdown file. Some people have replaced her entirely with a stoic Roman general, a dry British detective, a no-nonsense SRE. It works because the character lives in the file, not in the code.
config/MEMORY.md is her operational memory: your name, your infrastructure, your constraints. The agent can update it herself during a session using the write_file tool. Here's what a real deployment looks like:
## About Your User
- User Name: Lord Isildur ← what she calls you, every message
- Authorized Discord ID: 123456789012345678
## Infrastructure
- Server: EC2 t4g.medium, eu-north-1
- Working Dir: /opt/galadriel
- Python Venv: /home/ubuntu/.venv
- Model: claude-opus-4-6
## Operational Notes
- AWS_PROFILE must be blank when using instance role
- Git remote: https://github.com/you/galadriel-public.gitFill in your real values and she'll orient herself correctly from the first message of every session.
config/CONTEXT.md is where you describe what you're building. It loads into the stable cache block alongside SOUL.md and MEMORY.md, so Galadriel always has your project's architecture, goals, and known quirks available without needing tool calls to find them. It's also what pushes the stable block over the Opus cache minimum — see the warning above.
| Command | Description |
|---|---|
/new |
Archive conversation to the palace, then start fresh |
/compact |
Compress history with Haiku (archives verbatim tool_results to the palace first) — reports token reduction |
/status |
Model, memory usage, last API token breakdown, scheduler state |
| Command | Description |
|---|---|
!status |
Same as /status |
!clear |
Archive to palace, then clear history for this channel |
!new |
Same as !clear — archive then fresh start |
!compact |
Compress history (with palace archive of long tool_results) |
| Input | Behaviour |
|---|---|
rest / rest. / rest! |
Disable heartbeat; agent acknowledges |
All shell commands are classified before the agent executes them:
| Tier | Behaviour | Examples |
|---|---|---|
| 🟢 Green | Auto-execute | ls, git status, aws s3 ls, cat, python3 script.py |
| 🟡 Yellow | Notify, proceed | git push, pip install, sudo systemctl, sam deploy |
| 🔴 Red | Discord reaction required (✅/❌, 30s timeout → denied) | rm, IAM changes, CloudFormation mutations, shutdown |
Unknown commands default to yellow. Red commands denied by timeout or ❌ are never executed.
| Event | Default time | Condition |
|---|---|---|
| Morning briefing | 09:10 CET | Workdays (Mon–Fri) |
| Goodnight | 21:00 CET | Daily; disables heartbeat |
| Heartbeat | Every 10 min | When enabled; off by default |
See .env.example for the full list with inline documentation.
| Variable | Required | Description |
|---|---|---|
ANTHROPIC_API_KEY |
Yes | Claude API key |
DISCORD_BOT_TOKEN |
No | Enables Discord gateway |
DISCORD_AUTHORIZED_USER_ID |
No | Only this Discord user ID can interact |
DISCORD_CHANNEL_ID |
No | Guild channel for conversation |
TOWER_HOST |
No | Tower bind address (default: 127.0.0.1) |
TOWER_PORT |
No | Tower port (default: 8080) |
TOWER_SECRET_KEY |
No | Flask session secret — change this |
AGENT_MODEL |
No | Claude model (default: claude-opus-4-6) |
AGENT_MAX_TOKENS |
No | Max output tokens per call (default: 8192) |
MEMPALACE_PATH |
No | Palace directory — read by the MemPalace library itself (default: ~/.mempalace/palace) |
PALACE_ARCHIVE_ROOT |
No | Where archived conversations + pre-compaction tool_results land before mining (default: ~/.mempalace/archive) |
PALACE_WAKE_UP_FILE |
No | Cached wake-up snapshot path (default: ~/.mempalace/wake_up.md) |
PALACE_WAKE_UP_INJECT |
No | Set to 0 to disable injection of the wake-up snapshot into the dynamic system-prompt block (default: 1 — enabled) |
Before running on a public server, read this.
Tower UI has no authentication. It's designed to run on 127.0.0.1 and be accessed via SSH tunnel. Binding it to 0.0.0.0 on a server with an open port gives anyone who can reach that port full agent access — which includes shell execution.
Access Tower over SSH tunnel:
ssh -L 8080:localhost:8080 user@host— keepTOWER_HOST=127.0.0.1.
Discord is the secure interface. Authorization is enforced by DISCORD_AUTHORIZED_USER_ID. Only messages from that user ID are processed. Unauthorized users get "I do not know you, stranger."
run_shell is unrestricted. The agent can execute any command the process user can run. The safety tier system classifies and gates commands, but it's defense-in-depth, not a sandbox. Run the harness as a low-privilege user on a dedicated machine or VM.
read_file and write_file have no path restrictions. The agent can read any file the process can access. This is intentional for a personal assistant that needs to operate freely on your system.
Debug prompt dumps are excluded from git (.gitignore covers debug/prompts/). If you re-enable them, be aware they contain your full system prompt including personality and memory files.
A silent dataloss path was identified and closed. Previously, if an agent response ran over the max_tokens ceiling three times in a row, the harness trimmed the conversation twice (dropping messages from the front) and then hard-reset it — without archiving the dropped content to the palace. The archive-before-clear contract established in 1.12 for /new and /compact didn't extend to this recovery path. A runaway output cascade could eat an entire channel's verbatim history.
Four changes in harness/agent.py and one in config/SOUL.md close this:
-
Archive-before-recovery. At the first
max_tokensretry, before any trim or reset fires, the current message list is snapshotted and queued viaasyncio.create_task(palace.archive_conversation(...))with a channel tag ofmax_tokens_<channel_id>. One archive per cascade covers both subsequent trims and a possible hard reset. Fire-and-forget — recovery is never blocked by the mine. -
Output-ceiling early warning. A new
_maybe_warn_output_ceilingfires the existingcontext_warning_callbackwhen two consecutive responses come within 100 tokens ofmax_tokens. Gives the user a chance to/compactor steer toward brevity before the third strike starts the cascade. Silent no-op if no callback is wired up. Streak resets on any response that comes in comfortably below the ceiling. -
Post-recovery advisory. When a cascade archives + trims/resets, the archive tag is recorded per-channel. On every subsequent
respond()call in that channel (until it's genuinely cleared via/new), a[SYSTEM:POST-RECOVERY-ADVISORY]block is appended to the system prompt telling the model the archive tag so it canpalace_searchif the user references missing history. The reset message itself also advertises that the prior exchange was preserved in the palace. -
Concision principle in
SOUL.md. A new "Favour the scalpel" line in the Vibe section soft-caps runaway prose at the persona level. "A 2000-token response almost always hides a 400-token answer." Lead with the answer, stop when it's said.
All changes are additive and gracefully degrade. If MemPalace isn't installed, the archive step silently no-ops (the trim/reset still happens so the conversation can continue). If the context_warning_callback isn't wired up, the output-ceiling warning is silent. The harness still works without any of the Palace integration.
10 new tools, 14 total. The agent now has a local semantic memory palace (MemPalace) wired into the harness as first-class tools: palace_search, palace_add_drawer, palace_wake_up, palace_taxonomy, palace_kg_add / kg_query / kg_invalidate / kg_timeline, palace_diary_write / diary_read. All retrieval runs locally in ChromaDB + SQLite — zero Anthropic tokens spent on any palace operation, including multi-hop knowledge-graph traversals that would otherwise cost real money through conversation history.
Lifecycle hooks. /new, !new, and !clear now archive the conversation to the palace before clearing it (via a new GaladrielAgent.pop_and_archive_history()), so nothing is lost at the moment of wipe. Goodnight (21:00 CET) fires palace.archive_daily_logs() so today's log becomes searchable overnight. /compact and context compaction file verbatim tool_results to the palace before they're replaced with Haiku summaries.
Wake-up injection. A compact L0+L1 snapshot (~800 tokens, cached to ~/.mempalace/wake_up.md by a subprocess that keeps chromadb out of the main process) rides in the dynamic system-prompt block on every API call. Disable with PALACE_WAKE_UP_INJECT=0 if you want to dial back per-call overhead.
Cache impact, measured. 14 consecutive calls on a real deployment: 86.5% cache hit ratio, 71.2% total-input token savings vs. no caching. The 90% cache-read discount is intact — integration costs ~1.5 percentage points of cache hit ratio (one extra wake-up snapshot in dynamic, 10 more tool schemas in the tools-layer cache). Estimated annual overhead: ~$95.
Graceful degradation. If MemPalace isn't installed, all palace tools return [palace unavailable] at dispatch time; the rest of the harness runs normally. Upgrade path is pip install mempalace>=3.3.2,<3.4 + mempalace init + mempalace mine ..
Palace Protocol codified in SOUL.md — 5 non-negotiable rules: verify before speaking, say "let me check" when unsure, diary at session-end, invalidate-then-add when facts change. See config/TOOLS.md for the full decision matrix (memory_log vs palace_add_drawer vs palace_kg_add vs palace_diary_write).
All credit for the underlying memory system goes to the MemPalace team. This release is the harness integration; MemPalace is the engine.
Buttons replace reactions. Red-tier command approvals now render as Discord UI buttons (discord.ui.View) instead of ✅/❌ reactions. The "1/1" counter artifact from the bot's own seed reactions is gone, buttons disable on click to prevent double-submits, and the resolved message shows a proper greyed-out state. Also noticeably better on mobile — tap targets beat emoji-picker fiddling.
Dedup concurrent approvals. When Claude re-emits the same run_shell tool_use (typically after a max_tokens retry), subsequent callers now attach to the in-flight Future instead of spawning a second bubble. One bubble, one click, every caller gets the same answer. Fixes the "⏰ Timed out (denied)" message that could appear for a command which had already been approved and executed successfully. The resolved bubble also annotates dedup hits — (merged 2 requests) etc — so it's visible when the path fires.
iOS screenshot support. Discord's content_type header is unreliable on iOS — screenshots arrive labelled image/jpeg even when the bytes are PNG. Anthropic's API validates the actual format and returned a 400, breaking image upload on mobile. The harness now sniffs magic bytes (PNG, JPEG, GIF, WEBP) and uses the real type. Discord's header is treated as a hint, not truth.
Image retention by user turn. /compact strips image blocks from any message older than the last 3 user turns, independent of total message count. Previously images only aged out once they fell behind the "last 20 messages" cutoff, which could span many turns when tool use was involved. Three exchanges in, the base64 blob is usually moot — stop paying to carry it.
Humanized API errors. Instead of dumping raw exception repr to Discord (Error code: 400 — {'type': 'error', ...}), common Anthropic API exceptions are now mapped to short, readable explanations: timeouts, rate limits, auth failures, overloaded 529s, bad-request details, model-not-found hints. Unknown errors still fall through unchanged. Server logs continue to capture the full traceback for forensics.
MIT
