feat: registry URL scheme/host CI gate + http allowlist (F-11-01)#95

Merged
kurtseifried merged 1 commit into main from fix/registry-url-scheme-gate on Jun 21, 2026
@kurtseifried

Collaborator

Finding

F-11-01 — the registry is the URL trust root for every resolver, but the only automated content gate (validate-registry-schema.py) is structural; a poisoned-but-well-formed url (javascript:, attacker host, …) passed it and auto-propagated to production KV.

Fix

New blocking CI step scripts/validate-urls.py (wired into validate-registry.yml): every url / url_template / *_url / wikipedia value must be https, unless the host is in scripts/http-exception-allowlist.txt; javascript:/data:/file:/blob:/vbscript: are always rejected. Templated values are checked on their literal scheme+host prefix (a {placeholder} can't supply/move the host).
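
The rule above can be sketched in a few lines of Python. This is an illustration only, not the contents of scripts/validate-urls.py: the function name `check_url`, the reason strings, and the choice to reject a placeholder that appears before the host is terminated are all assumptions made for the example.

```python
import re
from typing import Optional, Set
from urllib.parse import urlsplit

# Schemes named in the PR as always rejected.
DANGEROUS_SCHEMES = {"javascript", "data", "file", "blob", "vbscript"}


def check_url(value: str, http_allowlist: Set[str]) -> Optional[str]:
    """Hypothetical checker: return None if the value passes, else a reason."""
    has_placeholder = "{" in value
    # Only the literal text before the first {placeholder} is trusted.
    literal = value.split("{", 1)[0].strip()
    scheme, colon, rest = literal.partition(":")
    scheme = scheme.lower()
    if not colon:
        return "no literal scheme"
    if scheme in DANGEROUS_SCHEMES:
        return "dangerous scheme"
    if scheme not in ("http", "https"):
        return "unsupported scheme"
    if not rest.startswith("//"):
        return "missing authority"
    # The host must end (at '/', '?' or '#') inside the literal prefix,
    # so that a placeholder cannot supply or extend it.
    parts = re.split(r"[/?#]", rest[2:], maxsplit=1)
    if has_placeholder and len(parts) == 1:
        return "placeholder inside host"
    try:
        host = urlsplit(f"{scheme}://{parts[0]}").hostname
    except ValueError:
        return "malformed host"
    if not host:
        return "empty host"
    if scheme == "http" and host not in http_allowlist:
        return "http host not allowlisted"
    return None


allow = {"veriscommunity.net"}
print(check_url("https://example.com/{id}", allow))
print(check_url("http://veriscommunity.net/", allow))
print(check_url("javascript:alert(1)", allow))
print(check_url("https://{host}/path", allow))
```

The first two calls pass and the last two return a reason. Checking the scheme before anything else keeps the dangerous-scheme rejection independent of the allowlist, which matches the "always rejected" wording above.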

Policy (owner decision): both http and https supported, https preferred, http only via the curated allowlist (a new http URL fails CI until upgraded or explicitly allowlisted).

Seeded the allowlist with the 3 verified http-only hosts: veriscommunity.net (https cert mismatch), www.pentest-standard.org (https protocol error), www.planalto.gov.br (live gov site, blocks non-browser UAs). Removed the dead http://www.vapidlabs.com/misc/policy.html (hard 404, already _broken-annotated) from registry/disclosure/com/me.json.
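
For illustration, an allowlist of this kind could be loaded as below. The file format is an assumption (one host per line, with `#` comments); the PR does not show the actual layout of scripts/http-exception-allowlist.txt, and `parse_allowlist` is a hypothetical helper name.

```python
from typing import Set


def parse_allowlist(text: str) -> Set[str]:
    """Hypothetical loader: one host per line, '#' starts a comment (assumed format)."""
    hosts = set()
    for line in text.splitlines():
        # Drop any trailing comment, then normalise whitespace and case.
        entry = line.split("#", 1)[0].strip().lower()
        if entry:
            hosts.add(entry)
    return hosts


sample = """\
# hosts verified as http-only
veriscommunity.net        # https cert mismatch
www.pentest-standard.org  # https protocol error
www.planalto.gov.br       # blocks non-browser UAs
"""
print(sorted(parse_allowlist(sample)))
```

Keeping the reason for each exception next to the host, as in the sample, makes it easier to revisit an entry once the site fixes its https setup.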

Verification

  • validate-urls.py runs GREEN against all 2028 registry files with the allowlist + dead-link removal.
  • validate-registry-schema.py still green (schema unchanged).
  • 6 offline unit tests (scripts/test_validate_urls.py) cover the rejection paths the all-clean registry never exercises (dangerous schemes, non-allowlisted http, placeholder-scheme, template prefix).
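
Running the gate over every registry file implies walking each JSON document for the URL-bearing keys listed under Fix. A minimal sketch of that walk follows; `is_url_key` and `iter_url_values` are hypothetical names, and the real script may select fields differently.

```python
import json
from typing import Any, Iterator, Tuple


def is_url_key(key: str) -> bool:
    # Key names taken from the PR text: url, url_template, *_url, wikipedia.
    return key in ("url", "url_template", "wikipedia") or key.endswith("_url")


def iter_url_values(node: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (json_path, value) for every string stored under a URL-bearing key."""
    if isinstance(node, dict):
        for key, child in node.items():
            child_path = f"{path}.{key}"
            if is_url_key(key) and isinstance(child, str):
                yield child_path, child
            else:
                yield from iter_url_values(child, child_path)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from iter_url_values(child, f"{path}[{index}]")


# Hand-built entry for the example; not taken from the registry.
doc = json.loads(
    '{"name": "x", "url": "https://a.example/",'
    ' "links": [{"policy_url": "http://b.example/"}]}'
)
for where, value in iter_url_values(doc):
    print(where, value)
```

Reporting the JSON path alongside each value lets a CI failure point at the exact field rather than only the file.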

Deferred

Schema additionalProperties:false tightening (F-11-03) → follow-up; needs a field-enumeration sweep across all entries to avoid rejecting valid data. Pairs with the publish-gate (F-12-04) PR next.

🤖 Generated with Claude Code

Adds scripts/validate-urls.py as a BLOCKING check in validate-registry.yml:
every url / url_template / *_url / wikipedia value across registry/**/*.json
must be https, unless the host is in scripts/http-exception-allowlist.txt;
javascript:/data:/file:/blob:/vbscript: are always rejected. Closes the gap
where a poisoned-but-well-formed URL passed the structural-only schema check
and auto-propagated to production KV.

Policy (owner decision): both http and https supported, https preferred, http
allowed only via the curated allowlist. Seeded with 3 verified http-only hosts
(veriscommunity.net, www.pentest-standard.org, www.planalto.gov.br). Removes
the dead http://www.vapidlabs.com/misc/policy.html entry (hard 404, already
_broken-annotated) from registry/disclosure/com/me.json.

Verified GREEN against all 2028 registry files; 6 offline rejection-path unit
tests (scripts/test_validate_urls.py). Schema additionalProperties tightening
(F-11-03) deferred to a follow-up (needs a field-enumeration sweep).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@kurtseifried kurtseifried merged commit 6e49cb5 into main Jun 21, 2026
2 checks passed
@kurtseifried kurtseifried deleted the fix/registry-url-scheme-gate branch June 21, 2026 05:29