OoriData · uogbuji · Apr 30, 2026
diff --git a/.claude/skills/python/SKILL.md b/.claude/skills/python/SKILL.md
@@ -0,0 +1,86 @@
+---
+name: python-backend
+description: Python 3.12+ backend and library development — packaging, testing, pyproject.toml, uv, hatchling, asyncio, and repo hygiene. Use when working on Python services, CLIs, or libraries.
+---
+
+# Python Backend Development
+
+## Purpose
+Follow this skill for Python 3.12+ backend and library work: packaging, project structure, testing, and repository hygiene.
+
+## Default rules
+- Single quotes throughout, including triple-quoted strings.
+- Absolute imports; 120-char lines; moderate comments.
+- `uv` for installs; `uv pip install -U .` for real package validation.
+- Hatchling build system; no `setuptools`, no `setup.py`.
+- No editable installs for libraries.
+- `asyncio` for I/O-bound work; multiprocessing for CPU-bound.
+- `fire` for CLI args; `structlog` for logging; `httpx` for HTTP; `pytest` for tests.
+- `tenacity` for retries; `rich` for terminal output.
+- No `langchain` unless explicitly requested.
+- Dataclasses over Pydantic; keep abstractions proportionate to the task.
+
+## Workflow
+1. Read the repo's `CLAUDE.md` / `AGENTS.md` first.
+2. Check `pyproject.toml` — follow its build and test commands.
+3. Prefer small, deterministic changes.
+4. Validate with `pytest` or a targeted run.
+5. Report any assumptions or unresolved ambiguity.
+
+## Packaging
+- Library code lives under `pylib/`.
+- Use `[tool.hatch.build.targets.wheel]` with `only-include = ['pylib']`.
+- Map `pylib` to the package name in `[tool.hatch.build.sources]`.
+- Export CLIs through `[project.scripts]` with a `main()` entry point in each module.
+
+## If the task is unclear
+Ask for the repo type (library vs service), runtime target, and whether strict installability or editable installs are acceptable.
+
+
+## Full conventions
+
+Additional context for AI tools & coding agents
+
+- Python 3.12+ code, unless otherwise specified
+- Python code uses single outer quotes, including triple single quotes for e.g. docstrings
+- prefer absolute imports to relative imports
+- Use a decent amount of comments
+  - not *too* many, just enough that anybody familiar with the code can use them as a reference point. Not meant to teach somebody new every intricacy of the code, just help keep the savvy reader oriented.
+- if it saves a line, put a comment after a line rather than above it
+  - use the standard two spaces before the comment character, eg. `CODE  # COMMENT`
+- Try to stick to 120 characters per line
+  - if one of those comments would break this guideline, just put that comment above the line instead, as is standard convention
+- If there is a pyproject.toml in place, use it as a reference for builds, installs, etc. The basic packaging and dev preference, including if you have to supply your own pyproject.toml, is as follows:
+  - Prefer hatchling build system over setuptools, poetry, etc. Avoid setuptools as much as possible. No setup.py.
+  - Reusable Python code modules are developed in the `pylib` folder, and installed using e.g. `uv pip install -U .`, which includes proper mapping to Python library package namespace via `tool.hatch.build.sources`. The `__init__.py` and other modules in the top-level package go directly in `pylib`, though submodules can use subdirectories, e.g. `pylib/a/b` becomes `installed_library_name.a.b`. Ultimately this will mean the installed package is importable as `from installed_library_name.etc import …`
+  - Use `[tool.hatch.build.targets.wheel]` with `only-include = ["pylib"]` to ensure the pylib directory structure gets included properly in the wheel, avoiding the duplication issue that can occur with sources mapping
+  - Yes this means editable and "dev mode" environments are NOT desirable, nor are shenanigans adding pylib to `sys.path`. Layer-efficient dockerization is an option if that's needed.
+  - The ethos is to always develop keeping things properly installable. No dev mode shortcuts. Substantive modification to library code requires e.g. `uv pip install -U .` each time.
+  - Note: This avoidance of editable installs can be relaxed for non-library code, such as demos or main app launch scripts (e.g. webapp back ends)
+  - If it's a CLI provided as part of a library, though, it should still use proper installation via `[project.scripts]` entry points (e.g., `ooriscout = 'ooriscout.cli.scout:main'`), which creates console scripts that work correctly after `uv pip install -U .`. The CLI module lives in `pylib/cli/` and exposes a `main()` function that uses fire to handle command-line arguments. 
+- **Debugging package issues**: When modules aren't importing correctly after installation, check:
+  - That you are in the correct virtualenv (you may have to ask the developer)
+  - Package structure in site-packages (e.g., `ls -la /path/to/site-packages/package_name/`)
+- Use uv, but pay attention to the above
+  - Again always use `uv pip install -U .` for full installation, never editable installs (`pip install -e`). This ensures proper testing of the actual distribution.
+- Use async (e.g. asyncio) wherever it makes sense. Avoid multithreading, though multiprocessing is OK: multiprocessing for CPU-bound concurrency, and asyncio for I/O-bound, cooperative work.
+- Be pythonic. Avoid e.g. complex abstract class hierarchies for the sake of them, though classes are also fine in many usage patterns. We love dictionaries, dynamic dispatch, etc.
+  - I don't consider Pydantic very Pythonic, so we can tolerate it if need be (e.g. we're using a toolkit that strictly works with Pydantic), but otherwise, simple dataclasses are better.
+- Type hints are OK in moderation, but avoid absolutely littering the code with them.
+  - No excess imports & symbols, e.g. use `type | None` rather than `Optional[type]`
+- use iterator patterns as much as practical. Also functional programming approaches, including partials (currying) and decorators
+- Preferred tools:
+  - Logging: structlog
+  - Retries on failure: tenacity
+  - CLI argument processing: fire—avoid argparse except for truly trivial usage
+  - CLI formatting: rich
+  - HTTP client: httpx (async)
+  - HTML/XML parsing: selectolax (though for now we're using html5-modern as the base implementation for our html5 features)
+  - Browser-like Web crawling/scraping: Python playwright (with playwright_stealth if needed)
+  - pytest, as well as pytest-mock, pytest-httpx, pytest-asyncio
+  - rapidfuzz for fuzzy text matching
+- AVOID the following unless explicitly requested or otherwise unavoidable:
+  - langchain
+
+- Once again PREFER SINGLE QUOTES
+
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,20 @@
+<!-- Generated by oori-seed-repo. Compatible with Claude Code, OpenCode, GitHub Copilot, Antigravity, and Aider. -->
+# WordLoom
+
+<!-- BEGIN MANAGED:agent-preamble -->
+- Source of truth is the code and git history, not assistant memory.
+- Read existing code before modifying; prefer targeted, minimal changes.
+- Validate changes with tests before reporting completion.
+- Ask before making destructive or hard-to-reverse changes.
+<!-- END MANAGED:agent-preamble -->
+
+## Project type: python
+
+<!-- BEGIN MANAGED:python-core -->
+For Python library/backend work, load `.claude/skills/python/SKILL.md` — covers conventions, packaging, testing, and tooling.
+
+<!-- END MANAGED:python-core -->
+
+## Local context
+
+<!-- Add project-specific notes here. This section is preserved through `oori-sync-repo` updates. -->
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,24 @@
+<!-- Generated by oori-seed-repo. Local overrides are preserved through syncs. -->
+# WordLoom — Agent Instructions
+
+<!-- BEGIN MANAGED:agent-preamble -->
+- Source of truth is the code and git history, not assistant memory.
+- Load only the skills and snippets needed for the task at hand.
+- Prefer small, deterministic changes; validate with tests before reporting done.
+- Ask before making destructive or hard-to-reverse changes.
+<!-- END MANAGED:agent-preamble -->
+
+## Project type: python
+
+<!-- BEGIN MANAGED:python-core -->
+For Python library/backend work, load `.claude/skills/python/SKILL.md` — covers conventions, packaging, testing, and tooling.
+
+<!-- END MANAGED:python-core -->
+
+## Skills
+
+Skills are in `.claude/skills/`. Load a skill's `SKILL.md` when the task matches its description.
+
+## Local overrides
+
+<!-- Add repo-specific instructions below. This section is never overwritten by `oori-sync-repo`. -->
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -94,26 +94,21 @@ WordLoom/
 ├── pylib/              # Source code (becomes 'wordloom' package)
 │   ├── __init__.py
 │   ├── __about__.py    # Version info
-│   └── wordloom.py     # Main implementation
+│   ├── wordloom.py     # Core implementation
+│   └── ext/            # Opt-in extensions (loaded only when features= requests them)
+│       ├── __init__.py
+│       └── file_includes.py  # file-inclusion extension
 ├── resources/          # Bundled resources
 │   └── wordloom/
 │       └── sample.toml
 ├── test/               # Tests
 │   ├── test_basics.py
-│   ├── test_i18n_integration.py
-│   └── test_openai_integration.py
+│   ├── test_i18n.py
+│   ├── test_openai.py
+│   └── test_file_inclusion.py
 ├── pyproject.toml      # Project config
+├── implementation.md   # Library internals and extension docs
 └── README.md
-
-When installed, becomes:
-site-packages/
-└── wordloom/
-    ├── __init__.py
-    ├── __about__.py
-    ├── wordloom.py
-    └── resources/
-        └── wordloom/
-            └── sample.toml
 ```
 
 When installed, becomes:
@@ -124,6 +119,9 @@ site-packages/
     ├── __init__.py
     ├── __about__.py
     ├── wordloom.py
+    ├── ext/
+    │   ├── __init__.py
+    │   └── file_includes.py
     └── resources/
         └── wordloom/
             └── sample.toml
@@ -134,7 +132,8 @@ site-packages/
 - `pylib/__about__.py` - Version number (update for releases)
 - `pyproject.toml` - Dependencies, metadata, build config
 - `resources/wordloom/sample.toml` - Sample file used by tests
-- `README.md` - Main documentation
+- `README.md` - User-facing documentation
+- `implementation.md` - Library internals, `load()` API reference, extension docs
 - `wordloom_spec.md` - Format specification (CC BY 4.0)
 
 # Publishing a Release

diff --git a/README.md b/README.md
@@ -107,7 +107,7 @@ This is an under-considered area in AI prompting. When dealing with multiple lan
 
 # Contributing
 
-Contributions welcome! We're interested in feedback from the community about what works and what doesn't in real-world usage. To get help with the code implementation, read [CONTRIBUTING.md](CONTRIBUTING.md).
+Contributions welcome! We're interested in feedback from the community about what works and what doesn't in real-world usage. To get help with the code implementation, or to learn about our packaging approach, read [CONTRIBUTING.md](CONTRIBUTING.md).
 
 # License
 

diff --git a/agent-control.toml b/agent-control.toml
@@ -0,0 +1,15 @@
+# Agent control config for WordLoom. Managed by oori_coding_control.
+
+[project]
+name = 'WordLoom'
+kind = 'python'
+control = 'https://github.com/OoriData/coding-agent-control'
+
+[paths]
+claude = 'CLAUDE.md'
+agents = 'AGENTS.md'
+config = 'agent-control.toml'
+
+[managed]
+# Skills installed in this repo (updated by oori-seed-repo)
+skills = ['python']
diff --git a/implementation.md b/implementation.md
@@ -0,0 +1,174 @@
+# WordLoom — Python Implementation
+
+This document covers the internals of the Python library, including the core
+data model, the `load()` API, and available opt-in extensions.
+
+---
+
+## Core data model: `language_item`
+
+`language_item` is a `str` subclass.  Every item parsed from a loom file is
+one of these.  Converting to `str` gives the default-language text, which
+means items drop into any `str.format()` call naturally.
+
+Key attributes:
+
+| Attribute | Type | Description |
+|---|---|---|
+| `lang` | `str` | Default language code (BCP 47) |
+| `altlang` | `dict[str, str]` | Alternate-language texts keyed by language code |
+| `meta` | `dict` | Raw metadata from the TOML table (non-reserved keys) |
+| `markers` | `list \| None` | Template variable names declared with `_m` |
+| `file_bindings` | `dict[str, str]` | Resolved file/dir/glob inclusions (empty when the feature is not active) |
+
+### `in_lang(lang)`
+
+Returns the alternate-language text for `lang`, or `None` if not present.
+
+### `render(**kwargs)`
+
+Formats the template text by merging `file_bindings` with any runtime
+`kwargs` (runtime values win on collision), then calling `str.format`.
+
+```python
+prompt.render(extra='value')
+# equivalent to: str(prompt).format(**{**prompt.file_bindings, 'extra': 'value'})
+```
+
+When `file_bindings` is empty (feature disabled), this is a transparent
+wrapper around `str.format`.
+
+### `clone(**overrides)`
+
+Returns a new `language_item` with selective attribute overrides.
+`file_bindings` is preserved unless explicitly replaced.
+
+---
+
+## `load()` — reading a loom file
+
+```python
+wordloom.load(fp_or_str, lang='en', preserve_key=False, features=None, base_dir=None)
+```
+
+Returns a `dict` mapping each TOML key (and its default-language text) to a
+`language_item`.  Only items whose `lang` (or the file-level default `lang`)
+matches the requested `lang` are included.
+
+### Input forms
+
+| Type passed | Behaviour |
+|---|---|
+| `pathlib.Path` | Opened as a file; parent directory used as loom base |
+| `str` that resolves to an existing file | Opened as a file; parent directory used as loom base |
+| `str` with no matching file | Treated as raw TOML content |
+| `bytes` | Treated as raw TOML content |
+| File-like object from `open()` | Read directly; `.name` used to detect loom base |
+
+### Parameters
+
+`lang` — language to select (default: `'en'`).
+
+`preserve_key` — if `True`, the TOML key name is stored in `meta['_key']`.
+
+`features` — a `set` or `dict` enabling optional extensions.  A set entry or
+a truthy dict value activates that feature.  Example:
+
+```python
+loom = wordloom.load(Path('prompts.toml'), features={'file-inclusion'})
+# or equivalently:
+loom = wordloom.load(Path('prompts.toml'), features={'file-inclusion': True})
+```
+
+`base_dir` — override the auto-detected loom base directory.  Useful when
+loading from a `bytes` or in-memory string with extensions that need path
+resolution.
+
+---
+
+## Extension: `file-inclusion`
+
+**Module**: `wordloom.ext.file_includes`  
+**Feature key**: `'file-inclusion'`
+
+This extension interprets metadata values that carry a scheme prefix as
+references to external content, and resolves them at load time.
+
+**Warning:** The security model prevents path traversal, but it cannot protect against malicious *content* inside included files. If file contents are user-influenced or come from untrusted sources, they could inject instructions into your prompts. Only include files you trust, or inspect/strip their content before loading.
+
+
+### TOML syntax
+
+```toml
+[my_prompt]
+_ = """
+Analyse the following documents:
+
+{corpus}
+"""
+_m = ["corpus"]
+corpus = "dir:documents"
+```
+
+Any metadata key (non-`_`, non-`lang`) whose string value begins with one of
+the three schemes below is treated as a file reference.  All other metadata
+values pass through unmodified.
+
+| Scheme | Example value | Resolves to |
+|---|---|---|
+| `file:<rel-path>` | `file:context/background.txt` | UTF-8 content of that file |
+| `dir:<rel-path>` | `dir:analysis` | All UTF-8 files under that directory, concatenated with `=== relative/path ===` headers |
+| `glob:<pattern>` | `glob:notes/**/*.md` | All UTF-8 files matching the glob, same concatenation format |
+
+Paths are always **relative to the directory containing the loom TOML file**.
+
+### Accessing resolved content
+
+```python
+from pathlib import Path
+import wordloom
+
+loom = wordloom.load(Path('prompts.toml'), features={'file-inclusion'})
+
+prompt = loom['my_prompt']
+
+# Inspect what was resolved
+print(prompt.file_bindings)  # {'corpus': '=== doc1.txt ===\n...'}
+
+# Format the template — file_bindings are applied automatically
+result = prompt.render()
+
+# Supply additional runtime values; they override file_bindings on collision
+result = prompt.render(extra_context='additional info')
+```
+
+The raw metadata values (`"dir:documents"` etc.) remain in `prompt.meta`
+unchanged — `file_bindings` holds only the resolved content.
+
+### Security model
+
+The extension enforces that all resolved paths stay within the loom base
+directory:
+
+- Absolute paths (`file:/etc/passwd`) → `ValueError`
+- Traversal escapes (`file:../../secret`) → `ValueError`
+- `glob:` patterns with `..` segments → `ValueError`
+- Missing `file:` target → `FileNotFoundError`
+- Missing `dir:` target → `NotADirectoryError`
+
+For `dir:` and `glob:` scans:
+- Files larger than 2 MB are silently skipped
+- Non-UTF-8 files are silently skipped
+- Hidden paths (any component starting with `.`) are silently skipped
+
+### Requiring a base directory
+
+The extension needs to know where the loom file lives.  It is auto-detected
+when you pass a `Path`, a path string, or an `open()` handle.  When loading
+from raw bytes or an in-memory string, set `base_dir` explicitly:
+
+```python
+loom = wordloom.load(toml_bytes, features={'file-inclusion'}, base_dir='/path/to/loom-dir')
+```
+
+Without a base directory, the feature raises `ValueError` at load time.
diff --git a/pylib/ext/__init__.py b/pylib/ext/__init__.py
@@ -0,0 +1,2 @@
+# SPDX-FileCopyrightText: 2023-present Oori Data <[email protected]>
+# SPDX-License-Identifier: Apache-2.0