86 changes: 86 additions & 0 deletions .claude/skills/python/SKILL.md
@@ -0,0 +1,86 @@
---
name: python-backend
description: Python 3.12+ backend and library development — packaging, testing, pyproject.toml, uv, hatchling, asyncio, and repo hygiene. Use when working on Python services, CLIs, or libraries.
---

# Python Backend Development

## Purpose
Follow this skill for Python 3.12+ backend and library work: packaging, project structure, testing, and repository hygiene.

## Default rules
- Single quotes throughout, including triple-quoted strings.
- Absolute imports; 120-char lines; moderate comments.
- `uv` for installs; `uv pip install -U .` for real package validation.
- Hatchling build system; no `setuptools`, no `setup.py`.
- No editable installs for libraries.
- `asyncio` for I/O-bound work; multiprocessing for CPU-bound.
- `fire` for CLI args; `structlog` for logging; `httpx` for HTTP; `pytest` for tests.
- `tenacity` for retries; `rich` for terminal output.
- No `langchain` unless explicitly requested.
- Dataclasses over Pydantic; keep abstractions proportionate to the task.
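
The asyncio rule for I/O-bound work can be illustrated with a minimal sketch. It assumes a hypothetical `fetch()` coroutine standing in for a real `httpx.AsyncClient` request (simulated with `asyncio.sleep` so it runs without network access); `fetch_all()` and its `limit` parameter are likewise illustrative names.

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    '''Hypothetical stand-in for an I/O call such as an httpx request.'''
    await asyncio.sleep(delay)  # yields to the event loop while "waiting" on I/O
    return f'{name}: done'


async def fetch_all(jobs: dict[str, float], limit: int = 5) -> list[str]:
    '''Run all jobs concurrently, with at most `limit` in flight at once.'''
    sem = asyncio.Semaphore(limit)

    async def bounded(name: str, delay: float) -> str:
        async with sem:  # simple back-pressure on concurrent requests
            return await fetch(name, delay)

    return await asyncio.gather(*(bounded(n, d) for n, d in jobs.items()))  # results keep input order


if __name__ == '__main__':
    print(asyncio.run(fetch_all({'a': 0.1, 'b': 0.2, 'c': 0.1})))
```

CPU-bound work would instead go to a process pool (e.g. `concurrent.futures.ProcessPoolExecutor`) rather than threads, per the rule above.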

## Workflow
1. Read the repo's `CLAUDE.md` / `AGENTS.md` first.
2. Check `pyproject.toml` — follow its build and test commands.
3. Prefer small, deterministic changes.
4. Validate with `pytest` or a targeted run.
5. Report any assumptions or unresolved ambiguity.

## Packaging
- Library code lives under `pylib/`.
- Use `[tool.hatch.build.targets.wheel]` with `only-include = ['pylib']`.
- Map `pylib` to the package name in `[tool.hatch.build.sources]`.
- Export CLIs through `[project.scripts]` with a `main()` entry point in each module.

## If the task is unclear
Ask for the repo type (library vs service), runtime target, and whether strict installability or editable installs are acceptable.


## Full conventions

Additional context for AI tools & coding agents

- Python 3.12+ code, unless otherwise specified
- Python code uses single outer quotes, including triple single quotes for e.g. docstrings
- Prefer absolute imports to relative imports
- Use a decent amount of comments
  - Not *too* many, just enough that anybody familiar with the code can use them as a reference point. They are not meant to teach a newcomer every intricacy of the code, just to keep the savvy reader oriented.
  - If it saves a line, put a comment after a line rather than above it
  - Use the standard two spaces before the comment character, e.g. `CODE  # COMMENT`
- Try to stick to 120 characters per line
  - If one of those trailing comments would break this guideline, put that comment above the line instead, as is standard convention
- If there is a pyproject.toml in place, use it as a reference for builds, installs, etc. The basic packaging and dev preference, including if you have to supply your own pyproject.toml, is as follows:
  - Prefer the hatchling build system over setuptools, poetry, etc. Avoid setuptools as much as possible. No setup.py.
  - Reusable Python code modules are developed in the `pylib` folder and installed using e.g. `uv pip install -U .`, which includes proper mapping to the Python library package namespace via `tool.hatch.build.sources`. The `__init__.py` and other modules in the top-level package go directly in `pylib`, though submodules can use subdirectories, e.g. `pylib/a/b` becomes `installed_library_name.a.b`. Ultimately this means the installed package is importable as `from installed_library_name.etc import …`
  - Use `[tool.hatch.build.targets.wheel]` with `only-include = ["pylib"]` to ensure the pylib directory structure is included properly in the wheel, avoiding the duplication issue that can occur with sources mapping
  - Yes, this means editable and "dev mode" environments are NOT desirable, nor are shenanigans that add pylib to `sys.path`. Layer-efficient dockerization is an option if that's needed.
  - The ethos is to always develop keeping things properly installable. No dev mode shortcuts. Substantive modification to library code requires e.g. `uv pip install -U .` each time.
  - Note: this avoidance of editable installs can be relaxed for non-library code, such as demos or main app launch scripts (e.g. webapp back ends)
  - If it's a CLI provided as part of a library, though, it should still use proper installation via `[project.scripts]` entry points (e.g., `ooriscout = 'ooriscout.cli.scout:main'`), which create console scripts that work correctly after `uv pip install -U .`. The CLI module lives in `pylib/cli/` and exposes a `main()` function that uses fire to handle command-line arguments.
  - **Debugging package issues**: when modules aren't importing correctly after installation, check:
    - That you are in the correct virtualenv (you may have to ask the developer)
    - Package structure in site-packages (e.g., `ls -la /path/to/site-packages/package_name/`)
- Use uv, but pay attention to the above
  - Again, always use `uv pip install -U .` for full installation, never editable installs (`pip install -e`). This ensures proper testing of the actual distribution.
- Use async (e.g. asyncio) wherever it makes sense. Avoid multithreading, though multiprocessing is OK: multiprocessing for CPU-bound concurrency, asyncio for I/O-bound and cooperative work.
- Be pythonic. Avoid e.g. complex abstract class hierarchies for their own sake, though classes are fine in many usage patterns. We love dictionaries, dynamic dispatch, etc.
  - I don't consider Pydantic very Pythonic, so we can tolerate it if need be (e.g. we're using a toolkit that strictly works with Pydantic), but otherwise simple dataclasses are better.
- Type hints are OK in moderation, but avoid absolutely littering the code with them.
- No excess imports & symbols, e.g. use `type | None` rather than `Optional[type]`
- Use iterator patterns as much as practical, along with functional programming approaches, including partials (currying) and decorators
- Preferred tools:
  - Logging: structlog
  - Retries on failure: tenacity
  - CLI argument processing: fire (avoid argparse except for truly trivial usage)
  - CLI formatting: rich
  - HTTP client: httpx (async)
  - HTML/XML parsing: selectolax (though for now we're using html5-modern as the base implementation for our html5 features)
  - Browser-like web crawling/scraping: Python playwright (with playwright_stealth if needed)
  - Testing: pytest, as well as pytest-mock, pytest-httpx, pytest-asyncio
  - Fuzzy text matching: rapidfuzz
- AVOID the following unless explicitly requested or otherwise unavoidable:
  - langchain

- Once again, PREFER SINGLE QUOTES

20 changes: 20 additions & 0 deletions AGENTS.md
@@ -0,0 +1,20 @@
<!-- Generated by oori-seed-repo. Compatible with Claude Code, OpenCode, GitHub Copilot, Antigravity, and Aider. -->
# WordLoom

<!-- BEGIN MANAGED:agent-preamble -->
- Source of truth is the code and git history, not assistant memory.
- Read existing code before modifying; prefer targeted, minimal changes.
- Validate changes with tests before reporting completion.
- Ask before making destructive or hard-to-reverse changes.
<!-- END MANAGED:agent-preamble -->

## Project type: python

<!-- BEGIN MANAGED:python-core -->
For Python library/backend work, load `.claude/skills/python/SKILL.md` — covers conventions, packaging, testing, and tooling.

<!-- END MANAGED:python-core -->

## Local context

<!-- Add project-specific notes here. This section is preserved through `oori-sync-repo` updates. -->
24 changes: 24 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,24 @@
<!-- Generated by oori-seed-repo. Local overrides are preserved through syncs. -->
# WordLoom — Agent Instructions

<!-- BEGIN MANAGED:agent-preamble -->
- Source of truth is the code and git history, not assistant memory.
- Load only the skills and snippets needed for the task at hand.
- Prefer small, deterministic changes; validate with tests before reporting done.
- Ask before making destructive or hard-to-reverse changes.
<!-- END MANAGED:agent-preamble -->

## Project type: python

<!-- BEGIN MANAGED:python-core -->
For Python library/backend work, load `.claude/skills/python/SKILL.md` — covers conventions, packaging, testing, and tooling.

<!-- END MANAGED:python-core -->

## Skills

Skills are in `.claude/skills/`. Load a skill's `SKILL.md` when the task matches its description.

## Local overrides

<!-- Add repo-specific instructions below. This section is never overwritten by `oori-sync-repo`. -->
27 changes: 13 additions & 14 deletions CONTRIBUTING.md
@@ -94,26 +94,21 @@ WordLoom/
 ├── pylib/ # Source code (becomes 'wordloom' package)
 │   ├── __init__.py
 │   ├── __about__.py # Version info
-│   └── wordloom.py # Main implementation
+│   ├── wordloom.py # Core implementation
+│   └── ext/ # Opt-in extensions (loaded only when features= requests them)
+│       ├── __init__.py
+│       └── file_includes.py # file-inclusion extension
 ├── resources/ # Bundled resources
 │   └── wordloom/
 │       └── sample.toml
 ├── test/ # Tests
 │   ├── test_basics.py
-│   ├── test_i18n_integration.py
-│   └── test_openai_integration.py
+│   ├── test_i18n.py
+│   ├── test_openai.py
+│   └── test_file_inclusion.py
 ├── pyproject.toml # Project config
+├── implementation.md # Library internals and extension docs
 └── README.md
-
-When installed, becomes:
-site-packages/
-└── wordloom/
-    ├── __init__.py
-    ├── __about__.py
-    ├── wordloom.py
-    └── resources/
-        └── wordloom/
-            └── sample.toml
 ```
 
 When installed, becomes:
@@ -124,6 +119,9 @@ site-packages/
     ├── __init__.py
     ├── __about__.py
     ├── wordloom.py
+    ├── ext/
+    │   ├── __init__.py
+    │   └── file_includes.py
     └── resources/
         └── wordloom/
             └── sample.toml
@@ -134,7 +132,8 @@ site-packages/
 - `pylib/__about__.py` - Version number (update for releases)
 - `pyproject.toml` - Dependencies, metadata, build config
 - `resources/wordloom/sample.toml` - Sample file used by tests
-- `README.md` - Main documentation
+- `README.md` - User-facing documentation
+- `implementation.md` - Library internals, `load()` API reference, extension docs
 - `wordloom_spec.md` - Format specification (CC BY 4.0)
 
 # Publishing a Release
2 changes: 1 addition & 1 deletion README.md
@@ -107,7 +107,7 @@ This is an under-considered area in AI prompting. When dealing with multiple lan

 # Contributing
 
-Contributions welcome! We're interested in feedback from the community about what works and what doesn't in real-world usage. To get help with the code implementation, read [CONTRIBUTING.md](CONTRIBUTING.md).
+Contributions welcome! We're interested in feedback from the community about what works and what doesn't in real-world usage. To get help with the code implementation, or to learn about our packaging approach, read [CONTRIBUTING.md](CONTRIBUTING.md).
 
 # License
 
15 changes: 15 additions & 0 deletions agent-control.toml
@@ -0,0 +1,15 @@
# Agent control config for WordLoom. Managed by oori_coding_control.

[project]
name = 'WordLoom'
kind = 'python'
control = 'https://github.com/OoriData/coding-agent-control'

[paths]
claude = 'CLAUDE.md'
agents = 'AGENTS.md'
config = 'agent-control.toml'

[managed]
# Skills installed in this repo (updated by oori-seed-repo)
skills = ['python']
174 changes: 174 additions & 0 deletions implementation.md
@@ -0,0 +1,174 @@
# WordLoom — Python Implementation

This document covers the internals of the Python library, including the core
data model, the `load()` API, and available opt-in extensions.

---

## Core data model: `language_item`

`language_item` is a `str` subclass. Every item parsed from a loom file is
one of these. Converting to `str` gives the default-language text, which
means items drop into any `str.format()` call naturally.

Key attributes:

| Attribute | Type | Description |
|---|---|---|
| `lang` | `str` | Default language code (BCP 47) |
| `altlang` | `dict[str, str]` | Alternate-language texts keyed by language code |
| `meta` | `dict` | Raw metadata from the TOML table (non-reserved keys) |
| `markers` | `list \| None` | Template variable names declared with `_m` |
| `file_bindings` | `dict[str, str]` | Resolved file/dir/glob inclusions (empty when the feature is not active) |

### `in_lang(lang)`

Returns the alternate-language text for `lang`, or `None` if not present.

### `render(**kwargs)`

Formats the template text by merging `file_bindings` with any runtime
`kwargs` (runtime values win on collision), then calling `str.format`.

```python
prompt.render(extra='value')
# equivalent to: str(prompt).format(**{**prompt.file_bindings, 'extra': 'value'})
```

When `file_bindings` is empty (feature disabled), this is a transparent
wrapper around `str.format`.

### `clone(**overrides)`

Returns a new `language_item` with selective attribute overrides.
`file_bindings` is preserved unless explicitly replaced.
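
A minimal sketch of the behaviour described in this section, built only from what is documented here. `language_item_sketch` is a hypothetical stand-in rather than the library's implementation; in particular, its constructor signature is an assumption.

```python
class language_item_sketch(str):
    '''Hypothetical stand-in for language_item, built only from the behaviour documented above.'''

    def __new__(cls, text, lang='en', altlang=None, meta=None, markers=None, file_bindings=None):
        obj = super().__new__(cls, text)  # the str value is the default-language text
        obj.lang = lang
        obj.altlang = altlang or {}
        obj.meta = meta or {}
        obj.markers = markers
        obj.file_bindings = file_bindings or {}
        return obj

    def in_lang(self, lang):
        return self.altlang.get(lang)  # None when no alternate text exists

    def render(self, **kwargs):
        return str(self).format(**{**self.file_bindings, **kwargs})  # runtime kwargs win on collision

    def clone(self, **overrides):
        attrs = {'text': str(self), 'lang': self.lang, 'altlang': self.altlang, 'meta': self.meta,
                 'markers': self.markers, 'file_bindings': self.file_bindings}
        attrs.update(overrides)  # file_bindings carried over unless explicitly replaced
        return type(self)(**attrs)
```

Subclassing `str` is what lets an item drop straight into `str.format()` or string concatenation while still carrying its metadata.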

---

## `load()` — reading a loom file

```python
wordloom.load(fp_or_str, lang='en', preserve_key=False, features=None, base_dir=None)
```

Returns a `dict` mapping each TOML key (and its default-language text) to a
`language_item`. Only items whose `lang` (or the file-level default `lang`)
matches the requested `lang` are included.

### Input forms

| Type passed | Behaviour |
|---|---|
| `pathlib.Path` | Opened as a file; parent directory used as loom base |
| `str` that resolves to an existing file | Opened as a file; parent directory used as loom base |
| `str` with no matching file | Treated as raw TOML content |
| `bytes` | Treated as raw TOML content |
| File-like object from `open()` | Read directly; `.name` used to detect loom base |

### Parameters

`lang` — language to select (default: `'en'`).

`preserve_key` — if `True`, the TOML key name is stored in `meta['_key']`.

`features` — a `set` or `dict` enabling optional extensions. A set entry or
a truthy dict value activates that feature. Example:

```python
loom = wordloom.load(Path('prompts.toml'), features={'file-inclusion'})
# or equivalently:
loom = wordloom.load(Path('prompts.toml'), features={'file-inclusion': True})
```
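
The set-or-dict rule for `features` amounts to a small normalisation step. The sketch below is illustrative only; `active_features` is a hypothetical helper name.

```python
def active_features(features) -> set[str]:
    '''Hypothetical helper: normalise `features` to the set of enabled feature names.'''
    if not features:
        return set()  # None, empty set, or empty dict
    if isinstance(features, dict):
        return {name for name, enabled in features.items() if enabled}  # truthy values only
    return set(features)  # set (or other iterable) of names
```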

`base_dir` — override the auto-detected loom base directory. Useful when
loading from a `bytes` or in-memory string with extensions that need path
resolution.

---

## Extension: `file-inclusion`

**Module**: `wordloom.ext.file_includes`

**Feature key**: `'file-inclusion'`

This extension interprets metadata values that carry a scheme prefix as
references to external content, and resolves them at load time.

**Warning:** The security model prevents path traversal, but it cannot protect against malicious *content* inside included files. If file contents are user-influenced or come from untrusted sources, they could inject instructions into your prompts. Only include files you trust, or inspect/strip their content before loading.


### TOML syntax

```toml
[my_prompt]
_ = """
Analyse the following documents:

{corpus}
"""
_m = ["corpus"]
corpus = "dir:documents"
```

Any metadata key (non-`_`, non-`lang`) whose string value begins with one of
the three schemes below is treated as a file reference. All other metadata
values pass through unmodified.

| Scheme | Example value | Resolves to |
|---|---|---|
| `file:<rel-path>` | `file:context/background.txt` | UTF-8 content of that file |
| `dir:<rel-path>` | `dir:analysis` | All UTF-8 files under that directory, concatenated with `=== relative/path ===` headers |
| `glob:<pattern>` | `glob:notes/**/*.md` | All UTF-8 files matching the glob, same concatenation format |

Paths are always **relative to the directory containing the loom TOML file**.

### Accessing resolved content

```python
from pathlib import Path
import wordloom

loom = wordloom.load(Path('prompts.toml'), features={'file-inclusion'})

prompt = loom['my_prompt']

# Inspect what was resolved
print(prompt.file_bindings)  # {'corpus': '=== doc1.txt ===\n...'}

# Format the template — file_bindings are applied automatically
result = prompt.render()

# Supply additional runtime values; they override file_bindings on collision
result = prompt.render(extra_context='additional info')
```

The raw metadata values (`"dir:documents"` etc.) remain in `prompt.meta`
unchanged — `file_bindings` holds only the resolved content.

### Security model

The extension enforces that all resolved paths stay within the loom base
directory:

- Absolute paths (`file:/etc/passwd`) → `ValueError`
- Traversal escapes (`file:../../secret`) → `ValueError`
- `glob:` patterns with `..` segments → `ValueError`
- Missing `file:` target → `FileNotFoundError`
- Missing `dir:` target → `NotADirectoryError`
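
One way to enforce these rules is to resolve every candidate path and require it to stay under the resolved base directory. This is a hypothetical sketch (`resolve_inside`, `read_file_ref`, and `check_glob_pattern` are illustrative names) using `pathlib.Path.resolve` and `Path.is_relative_to` (Python 3.9+); it is not the extension's actual code, and rejecting absolute glob patterns is an extra assumption.

```python
from pathlib import Path


def resolve_inside(base_dir, rel: str) -> Path:
    '''Resolve `rel` under base_dir, refusing absolute paths and anything that escapes it.'''
    base = Path(base_dir).resolve()
    if Path(rel).is_absolute():
        raise ValueError(f'absolute path not allowed: {rel}')
    target = (base / rel).resolve()  # collapses `..` segments and follows symlinks
    if not target.is_relative_to(base):
        raise ValueError(f'path escapes the loom directory: {rel}')
    return target


def read_file_ref(base_dir, rel: str) -> str:
    '''Resolve a `file:` reference and return its UTF-8 text.'''
    target = resolve_inside(base_dir, rel)
    if not target.is_file():
        raise FileNotFoundError(str(target))
    return target.read_text(encoding='utf-8')


def check_glob_pattern(pattern: str) -> str:
    '''Reject glob patterns that are absolute or contain `..` segments, before globbing.'''
    if Path(pattern).is_absolute() or '..' in Path(pattern).parts:
        raise ValueError(f'unsafe glob pattern: {pattern}')
    return pattern
```

Resolving before checking means a symlink inside the loom directory that points outside it is also rejected.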

For `dir:` and `glob:` scans:
- Files larger than 2 MB are silently skipped
- Non-UTF-8 files are silently skipped
- Hidden paths (any component starting with `.`) are silently skipped
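
The scan rules and the `=== relative/path ===` concatenation format might be implemented along these lines. This sketch is hypothetical: sorting files for deterministic output, joining sections with a single newline, making `glob:` headers relative to the loom base, and reading "larger than 2 MB" as more than 2 * 1024 * 1024 bytes are all assumptions. Apart from the `..` check on glob patterns, it omits the path containment checks listed above.

```python
from pathlib import Path

MAX_BYTES = 2 * 1024 * 1024  # assumed reading of "larger than 2 MB"


def concat_files(root: Path, paths) -> str:
    '''Join readable files as `=== relative/path ===` sections, skipping per the rules above.'''
    sections = []
    for path in sorted(paths):
        rel = path.relative_to(root)
        if any(part.startswith('.') for part in rel.parts):
            continue  # hidden file or directory
        if not path.is_file() or path.stat().st_size > MAX_BYTES:
            continue  # directories and oversized files
        try:
            text = path.read_text(encoding='utf-8')
        except UnicodeDecodeError:
            continue  # not UTF-8
        sections.append(f'=== {rel.as_posix()} ===\n{text}')
    return '\n'.join(sections)


def scan_dir(base_dir, rel: str) -> str:
    root = Path(base_dir) / rel
    if not root.is_dir():
        raise NotADirectoryError(str(root))
    return concat_files(root, root.rglob('*'))  # headers relative to the scanned directory


def scan_glob(base_dir, pattern: str) -> str:
    if '..' in Path(pattern).parts:
        raise ValueError(f'unsafe glob pattern: {pattern}')
    base = Path(base_dir)
    return concat_files(base, base.glob(pattern))  # headers relative to the loom base
```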

### Requiring a base directory

The extension needs to know where the loom file lives. It is auto-detected
when you pass a `Path`, a path string, or an `open()` handle. When loading
from raw bytes or an in-memory string, set `base_dir` explicitly:

```python
loom = wordloom.load(toml_bytes, features={'file-inclusion'}, base_dir='/path/to/loom-dir')
```

Without a base directory, the feature raises `ValueError` at load time.
2 changes: 2 additions & 0 deletions pylib/ext/__init__.py
@@ -0,0 +1,2 @@
# SPDX-FileCopyrightText: 2023-present Oori Data <[email protected]>
# SPDX-License-Identifier: Apache-2.0