This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Otterbot is a Telegram chatbot that serves as a board game assistant for the NCS MA boardgames group. It has two primary functions:
- Research: Downloads and stores game rules/documentation from the web
- Q&A: Answers user questions about game rules using RAG (retrieval-augmented generation)
otterbot/
├── bot/ # Telegram bot code
│ ├── main.py # Bot entry point
│ ├── otterrouter.py # Message routing and intent handling
│ ├── tools.py # Research, Query, and Games List tools
│ ├── webapp.py # WebApp button utilities
│ ├── utils.py # Message formatting helpers
│ ├── db/ # Database layer
│ ├── llms/ # OpenAI integration
│ └── datasources/ # FAISS vector search
├── api/ # FastAPI web server
│ ├── server.py # API endpoints
│ ├── render.py # HTML rendering
│ ├── templates/ # HTML templates
│ └── static/ # CSS and assets
├── storage/ # Game files and databases
├── assets/ # Static assets (images, etc.)
└── scripts/ # Utility scripts
# Install dependencies
poetry install
# Activate virtual environment (if not using poetry shell)
source venv/bin/activateRecommended: Start both services together
# Start both Telegram bot and FastAPI server with one command
bash scripts/start.sh
# This script:
# - Starts FastAPI on 0.0.0.0:8000 (api/server.py)
# - Starts Telegram bot (bot/main.py)
# - Tracks PIDs for both processes
# - Handles cleanup on Ctrl+C (kills both processes)Alternative: Start services separately
# Terminal 1: Start the Telegram bot
python3 bot/main.py
# Terminal 2: Start the FastAPI file server (for browsing stored files)
uvicorn api.server:app --host 0.0.0.0 --port 8000 --reloadImportant: The FastAPI server must bind to 0.0.0.0 (not 127.0.0.1) to be accessible from external browsers. Links sent by the bot (e.g., https://otterbot.space/games/1/files) require the API server to be running and accessible.
# Format and lint code (applies fixes)
sh scripts/lint.sh
# Check only (no fixes)
sh scripts/lint.sh . --check
# Individual tools
ruff format . # Format code
ruff check . --fix # Lint and fix
mypy . # Type checkingThe bot uses Telegram's WebApp feature to display game files in a native in-app interface:
Components:
- WebApp Utilities (bot/webapp.py): Reusable functions for creating WebApp buttons
create_game_files_button(game_id, game_name)- Button to view files for a specific gamecreate_games_library_button()- Button to browse all games
- Research workflow (bot/otterrouter.py:110-115): Sends WebApp button after researching
- Games list (bot/otterrouter.py:72-90): Displays WebApp buttons (2 per row) for all ready games
- Display: FastAPI (api/server.py) serves beautiful HTML pages within Telegram's WebApp viewer
- Templates: HTML in api/templates/game_files.html
- Styles: Responsive CSS in api/static/css/styles.css (mobile-optimized, 2 files per row on phones)
User Experience:
- User researches a game → Bot sends "📂 View [Game] Files" button
- User taps button → WebApp opens in-app showing all downloaded files
- User can view PDFs, HTML pages, and external links without leaving Telegram
Message handling (bot/otterrouter.py:20 otterhandler)
- All Telegram messages are filtered through one handler
- Only responds to messages mentioning "otter" in the first 32 chars (or private chats)
- Routes to either Research or Query workflow based on AI intent classification using GPT-4o-mini structured output
Research workflow (bot/tools.py:377 ResearchTool.research)
- BGG URL Discovery: Try BGG XML API (with
exact=1parameter), fallback to Google search if 401 (authentication required) - Parallel Fetch (3 concurrent tasks):
- Web research: OpenAI Responses API finds 20-30 authoritative sources
- BGG metadata: Fetch actual BGG page HTML, extract difficulty & player count from real content (8000 chars + JSON-LD)
- YouTube tutorial: Use YouTube Data API v3 to find best tutorial (scored by views, channel quality, relevance)
- YouTube Validation: Validate video URL with oEmbed API, fallback to Google search if needed
- Download PDFs and HTML pages, extract text from HTML
- Store files in
storage/games/<game-id>/ - Create FAISS vector index from all text chunks
- Generate game description using GPT-4o-mini from source summaries
- Save all metadata (BGG URL, YouTube link, difficulty, player count, description)
- Send response with description, metadata, WebApp button
Query workflow (bot/tools.py:626 QueryTool.answer)
- Game Identification: Extract game name via LLM structured output + fuzzy matching, fallback to chat history
- Context Retrieval: If game researched → FAISS semantic search for top 5 relevant chunks
- Hybrid Answer: OpenAI Responses API combines internal docs + web search for comprehensive answers
- Source Attribution: Show internal file citations + disclaimer if game not researched yet
- All answers end with 🦦
games: Core game records with status tracking
slug: URL-safe identifier derived from namestatus: created → researching → readystore_dir: Local path for downloaded files
game_sources: Downloaded/linked resources per game
source_type: pdf|html|link|video|txt|otherlocal_path: Where file is stored (if downloaded)
chat_log: Telegram conversation history
- Stores user/assistant/system messages
game_slug: Tagged game for context inference- Used to infer which game user is asking about
Web Research: Uses OpenAI Responses API (NOT Chat Completions)
client.responses.create()withweb_searchtool- Returns structured JSON with source URLs
- Prompt in bot/llms/prompt.py:31
WEB_RESEARCH_PROMPT
Q&A: Uses Responses API with web_search
client.responses.create()withweb_searchtool- Combines internal FAISS results with web search
- Prompt in bot/llms/prompt.py:104
WEB_SEARCH_QA_PROMPT
BGG Metadata: Fetches actual BGG page HTML
- Direct HTTP request to BGG URL (not web search)
- Extracts JSON-LD structured data + visible text (8000 chars)
- GPT-4o-mini parses difficulty score & player count from REAL page content
- If page inaccessible (404/timeout) → removes BGG link entirely
- Prompt in bot/llms/prompt.py:124
BGG_METADATA_EXTRACTION_PROMPT
YouTube Tutorial Search: Uses YouTube Data API v3 (NOT LLM)
googleapiclient.discovery.build('youtube', 'v3')- Searches with 3 query variations: "how to play", "board game rules", "learn to play"
- Scores videos by: view count (log scale, max 50pts), quality channels (+30pts), title relevance (+20pts), game name match (+15pts), like ratio (max 10pts)
- Quality channels: Watch It Played, JonGetsGames, Shut Up & Sit Down, The Rules Girl, Rodney Smith, etc.
- Validates final URL with YouTube oEmbed API before returning
- Falls back to Google search if API fails
Description Generation: Uses Chat Completions API
client.chat.completions.create()with GPT-4o-mini- Generates 2-3 sentence game descriptions from source summaries
- Prompt in bot/llms/prompt.py:64
GAME_DESCRIPTION_PROMPT
The FastAPI server (api/server.py) provides:
- JSON API endpoints for programmatic access
- Beautiful HTML interface for browsing game files (WebApp-ready)
Key endpoints:
GET /games- List all games (JSON)GET /games/{game_id}- Get game details (JSON)GET /games/{game_id}/files- Browse game files (HTML by default, add?format=jsonfor JSON)GET /files/{game_id}/{filename}- Serve static files (PDFs, HTML, etc.)
HTML Interface Features:
- Mobile-optimized: 2 files per row on phones, responsive grid on tablets/desktop
- Clean separation: HTML templates (api/templates/) and CSS (api/static/css/)
- Rendering: api/render.py handles HTML generation
- Files grouped by type (PDFs, Web Pages, External Links)
- PDF preview thumbnails embedded in cards
- Hover animations and modern UI
- Badges showing downloaded vs. external files
- Direct links to view files and original sources
- python-telegram-bot: Telegram bot framework
- openai: For Responses API (web research) and Chat API (Q&A)
- google-api-python-client: YouTube Data API v3 client
- google-auth-oauthlib: OAuth for Google APIs
- google-auth-httplib2: HTTP transport for Google APIs
- beautifulsoup4: HTML parsing and text extraction
- fastapi: Web server for browsing stored docs with beautiful HTML interface
- uvicorn: ASGI server for FastAPI
- faiss-cpu: Vector database for semantic search
- sqlite3: Built-in, no external DB required
The DB class uses singleton pattern with thread-safe initialization. Always instantiate as db = DB() - you'll get the same instance.
When user asks a question without naming the game, system checks:
- Explicit game names extracted via LLM structured output
- Fuzzy matching against available games
- Recent chat context (tagged game_id in chat_log)
storage/
games/
<game-id>/ # Uses numeric game ID, not slug
page.html # Downloaded HTML
page.txt # Extracted text from HTML
rulebook.pdf # Downloaded PDFs
...
datasources/
<game-id>/ # FAISS vector indices per game
index.faiss
metadata.pkl
Note: Storage directories use numeric game IDs (e.g., storage/games/1/) for simplicity and to avoid issues with special characters in game names.
Environment variables (create bot/.env file):
OTTER_BOT_TOKEN=<telegram-bot-token> # Required - from @BotFather
OPENAI_API_KEY=<openai-key> # Required - for LLM and embeddings
YOUTUBE_API_KEY=<youtube-api-key> # Required - from Google Cloud Console (YouTube Data API v3)
DATABASE_NAME=otterbot # Optional, defaults to "database"
STORAGE_DIR=storage # Optional, defaults to "storage"
API_BASE_URL=https://otterbot.space # Required - public URL for file links in bot messagesCritical:
.envfile is inbot/.env(not root)API_BASE_URLshould be your public-facing URL (e.g.,https://otterbot.space), notlocalhost. The bot sends links like{API_BASE_URL}/games/1/filesto users in Telegram.YOUTUBE_API_KEYis required for tutorial search. Get it from Google Cloud Console → APIs & Services → Credentials → Create API Key → Enable YouTube Data API v3
Symptom: Telegram API errors about "terminated by other getUpdates request"
Solution: Only run one bot instance at a time. Check for existing processes:
ps aux | grep "python.*main.py" | grep -v grepKill any existing instances before starting a new one.
Symptom: Messages fail with "Can't parse entities: unsupported start tag"
Solution: Telegram's HTML parser only supports: <b>, <i>, <a>, <code>, <pre>. The bot uses bot/utils.py:md_to_html() to convert markdown to Telegram-compatible HTML. Supports both **bold** and *bold* patterns. Never use <br> tags - use newlines instead.
Symptom: BGG XML API returns 401 authentication errors
Solution: BoardGameGeek now requires authentication tokens (as of late 2024). The bot automatically falls back to Google search for BGG URLs. This is expected behavior and works reliably. See logs with [BGG] and [Google BGG] prefixes.
Symptom: No YouTube tutorial shown after research
Solution: The bot uses YouTube Data API v3 with smart scoring (view count, channel quality, relevance). If no suitable tutorial exists, it will show none. Check logs with [YouTube API] prefix to see what was found and why. Fallback to Google search happens automatically if API fails.
Symptom: Can't access http://your-server:8000 from browser
Solution: Ensure uvicorn binds to 0.0.0.0 (not 127.0.0.1):
uvicorn app.api:app --host 0.0.0.0 --port 8000No test suite currently exists. When adding tests:
- Mock
DBby passing a test SQLite connection toDB(conn=...) - Mock OpenAI calls in
llms/openai.py - Use test fixtures for sample HTML/PDF content