
Commit 49e32e5

raymondk and claude authored
feat: per-skill content hash for change detection + differential autosync (#205)
* feat: publish per-skill content hash in .well-known/skills/index.json

Add a `hash` field ("sha256:<hex>") to each skill entry so consumers can detect which skills changed from a single index fetch, without downloading and hashing every file. The hash is computed over the skill's served files (path + per-file sha256, sorted) — order-independent and rename-sensitive. `files` stays a string array; the field is purely additive, so already-deployed sync scripts are unaffected.

Includes the design doc documenting the hash contract. This is the publishing half; the differential-sync consumer upgrade in autosync-ic-skills is a follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(autosync-ic-skills): differential sync keyed off per-skill hash

Replace the blind full-mirror with a differential sync. The script fetches index.json once, compares each skill's published `hash` against a {name: hash} manifest (.ic-managed.json), and re-downloads only changed or new skills — pruning removed ones. Unchanged skills are skipped with no per-file downloads, and a no-op sync is silent.

Falls back to re-downloading any skill the server publishes no `hash` for, keeps cached skills on network/jq failure, retains the old hash on a failed download so the next run retries, and transparently migrates the legacy bare-array manifest format.

The script is now shipped as an attached file (scripts/sync-ic-skills.sh) and the installer fetches it via curl for byte-exact delivery instead of transcribing an inline block. On change it emits a SessionStart JSON object (systemMessage + additionalContext) so the summary surfaces in the Claude Code UI and Claude's context.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Make sure skills get reloaded on change

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent d4ebad9 commit 49e32e5

5 files changed

Lines changed: 226 additions & 69 deletions


README.md

Lines changed: 28 additions & 0 deletions
````diff
@@ -73,6 +73,34 @@ The files are plain markdown — paste into any system prompt, rules file, or co
 | Skill index | [`llms.txt`](https://skills.internetcomputer.org/llms.txt) | All skills with descriptions and discovery links |
 | Skill page | [`/skills/{name}/`](https://skills.internetcomputer.org/skills/ckbtc/) | Pre-rendered skill page for humans |
 
+### Change detection — the `hash` field
+
+Each skill entry in [`index.json`](https://skills.internetcomputer.org/.well-known/skills/index.json) carries a `hash`:
+
+```jsonc
+{
+  "name": "asset-canister",
+  "url": "https://.../asset-canister/SKILL.md",
+  "files": ["SKILL.md"],
+  "hash": "sha256:f3ee5a3e…" // per-skill aggregate content hash
+}
+```
+
+**What it is.** A `sha256:<hex>` digest over all of the skill's served files. It is
+computed from each file's path plus the sha256 of its bytes, sorted by path — so it
+changes whenever any file in the skill changes (including `references/` and `scripts/`
+files), and is sensitive to renames. It is **not** tied to a git commit; it is a pure
+content hash of what the server actually serves.
+
+**What it's for.** Detecting *which* skills changed from a single fetch of `index.json`,
+without downloading and hashing every file yourself. Store the `{name: hash}` map, and on
+the next fetch re-download only the skills whose `hash` differs (or are new), and prune
+those no longer listed. This is the basis for the differential sync in the
+[`autosync-ic-skills`](skills/autosync-ic-skills/SKILL.md) skill.
+
+**What it's not for.** It is not a version number or changelog signal — it carries no
+ordering or human meaning, only equality. Compare hashes for equality; do not parse them.
+
 ## Evaluations
 
 Each skill can have an evaluation file at `evaluations/<skill-name>.json` that tests whether agents produce correct output with the skill loaded. Evals compare agent output with and without the skill, using an LLM judge to score expected behaviors.
````
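
The consumer logic the README describes (store `{name: hash}`, re-download what differs or is new, prune what is no longer listed) fits in a small pure function. The TypeScript below is a minimal sketch, not the shipped Bash script: `IndexEntry`, `Manifest`, and `planSync` are hypothetical names, the always-re-download rule for skills without a `hash` comes from the commit message, and the sketch leaves out the script's extra check that a skipped skill's directory still exists on disk.

```typescript
// Hypothetical shape of one index.json entry; only the fields used here.
interface IndexEntry {
  name: string;
  files: string[];
  hash?: string; // may be absent if the server does not publish it
}

// Stored {name: hash} map from the previous sync.
type Manifest = Record<string, string>;

interface SyncPlan {
  download: string[]; // new, changed, or hashless skills
  skip: string[]; // published hash equals the stored hash
  prune: string[]; // previously managed, no longer listed
}

function planSync(index: IndexEntry[], manifest: Manifest): SyncPlan {
  const plan: SyncPlan = { download: [], skip: [], prune: [] };
  const listed = new Set(index.map((e) => e.name));

  for (const entry of index) {
    const published = entry.hash ?? "";
    const stored = Object.prototype.hasOwnProperty.call(manifest, entry.name)
      ? manifest[entry.name]
      : "";
    // Equality only: hashes are opaque. An empty published hash never
    // counts as a match, so hashless skills are always re-downloaded.
    if (published !== "" && published === stored) {
      plan.skip.push(entry.name);
    } else {
      plan.download.push(entry.name);
    }
  }

  for (const name of Object.keys(manifest)) {
    if (!listed.has(name)) plan.prune.push(name);
  }
  return plan;
}
```

For example, with a stored manifest of `{ motoko: "sha256:aa", "old-skill": "sha256:ee" }` and an index that lists `motoko` at `sha256:aa` plus a new `ckbtc`, the plan skips `motoko`, downloads `ckbtc`, and prunes `old-skill`.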

skills/autosync-ic-skills/SKILL.md

Lines changed: 37 additions & 68 deletions
````diff
@@ -18,13 +18,17 @@ needs this link again — the installed `SessionStart` hook does the work from t
 
 ## What you will create
 
-1. `.claude/sync-ic-skills.sh` — mirrors the live skill index into `.claude/skills/`.
+1. `.claude/sync-ic-skills.sh` — a **differential** sync script that mirrors the live
+   skill index into `.claude/skills/`.
 2. A `SessionStart` hook in `.claude/settings.json` that runs that script.
 3. An immediate first run, so skills are present right away.
 
-The sync is a **mirror**: it always re-downloads the current skills, so it picks up
-new skills, updated versions of existing skills, and removals — with no version
-metadata required on the server side.
+The script is a **differential mirror**. It fetches the discovery index once and
+compares each skill's published `hash` against a stored manifest, re-downloading only
+the skills that actually changed (and pruning ones removed upstream). Unchanged skills
+are skipped with no per-file downloads, and the script stays silent unless something
+changed. If the server does not publish a `hash` for a skill, the script falls back to
+re-downloading it every run, so it remains correct either way.
 
 ## Important: tell the user what to expect
 
@@ -60,70 +64,34 @@ command -v jq >/dev/null 2>&1 && echo "jq: ok" || echo "jq: MISSING"
 (it exits cleanly with a warning when `jq` is absent), and they can install `jq`
 later and the next session will sync.
 
-## Step 1 — Write the sync script
+## Step 1 — Download the sync script
 
-Create `.claude/sync-ic-skills.sh` with **exactly** this content:
+The script is published as a file alongside this skill, so you fetch it verbatim rather
+than transcribing it (this guarantees byte-exact content). Create the `.claude` directory
+and download it:
 
 ```bash
-#!/usr/bin/env bash
-# sync-ic-skills.sh — mirror the latest Internet Computer skills into .claude/skills/
-# Idempotent and offline-safe. Only skills this script installed are ever pruned,
-# so your own local skills are never touched.
-set -euo pipefail
-
-BASE="https://skills.internetcomputer.org/.well-known/skills"
-INDEX_URL="$BASE/index.json"
-DEST=".claude/skills"
-MANIFEST="$DEST/.ic-managed.json" # tracks which skills this script manages
-
-mkdir -p "$DEST"
-
-# --- Fetch the index. On any network failure, keep cached skills and exit cleanly. ---
-TMP_INDEX="$(mktemp)"
-trap 'rm -f "$TMP_INDEX"' EXIT
-if ! curl -fsSL --max-time 20 "$INDEX_URL" -o "$TMP_INDEX"; then
-  echo "[ic-skills] could not reach $INDEX_URL — keeping cached skills" >&2
-  exit 0
-fi
-
-# --- jq is required to parse the index. If absent, warn and exit without failing. ---
-if ! command -v jq >/dev/null 2>&1; then
-  echo "[ic-skills] 'jq' not found — install jq to enable IC skill sync" >&2
-  exit 0
-fi
-
-NEW_NAMES="$(jq -r '.skills[].name' "$TMP_INDEX")"
-
-# --- Prune: drop previously-managed skills that are no longer in the index. ---
-if [ -f "$MANIFEST" ]; then
-  while IFS= read -r old; do
-    [ -n "$old" ] || continue
-    if ! grep -qxF "$old" <<<"$NEW_NAMES"; then
-      rm -rf "${DEST:?}/$old"
-      echo "[ic-skills] pruned removed skill: $old" >&2
-    fi
-  done < <(jq -r '.[]?' "$MANIFEST" 2>/dev/null || true)
-fi
-
-# --- Download every skill's files (overwrite == always latest). ---
-jq -c '.skills[]' "$TMP_INDEX" | while IFS= read -r entry; do
-  name="$(jq -r '.name' <<<"$entry")"
-  [ -n "$name" ] && [ "$name" != "null" ] || continue
-  mkdir -p "$DEST/$name"
-  while IFS= read -r f; do
-    [ -n "$f" ] || continue
-    mkdir -p "$(dirname "$DEST/$name/$f")" # files may live in subdirs (e.g. scripts/)
-    if ! curl -fsSL --max-time 20 "$BASE/$name/$f" -o "$DEST/$name/$f"; then
-      echo "[ic-skills] warning: failed to fetch $name/$f" >&2
-    fi
-  done < <(jq -r '.files[]?' <<<"$entry")
-done
-
-# --- Record managed skill names for the next prune pass. ---
-jq '[.skills[].name]' "$TMP_INDEX" > "$MANIFEST"
-echo "[ic-skills] synced $(jq '.skills | length' "$TMP_INDEX") Internet Computer skills into $DEST" >&2
+mkdir -p .claude
+curl -fsSL https://skills.internetcomputer.org/.well-known/skills/autosync-ic-skills/scripts/sync-ic-skills.sh \
+  -o .claude/sync-ic-skills.sh
 ```
 
+Do **not** hand-write or paraphrase the script — always fetch the published copy so the
+sync logic stays correct as it is updated upstream.
+
+**What the script does** (for the user's awareness):
+
+- Fetches `https://skills.internetcomputer.org/.well-known/skills/index.json` once.
+- For each skill, compares the published `hash` against `.claude/skills/.ic-managed.json`
+  (a `{ "<skill>": "<hash>" }` manifest of skills it manages) and re-downloads only the
+  skills whose hash changed or are new.
+- Prunes skills it previously installed that are no longer in the index.
+- Prints a one-line `added / updated / removed` summary only when something changed;
+  otherwise it is silent.
+- Degrades gracefully: exits cleanly (keeping cached skills) if the network is down or
+  `jq` is missing, and falls back to re-downloading skills the server publishes no
+  `hash` for.
+
 ## Step 2 — Register the SessionStart hook (idempotently)
 
 Add a `SessionStart` hook to `.claude/settings.json` that runs the script.
@@ -164,7 +132,7 @@ bash .claude/sync-ic-skills.sh
 
 - Confirm `.claude/skills/` now contains skill directories (e.g. `motoko`,
   `asset-canister`, `internet-identity`, …) each with a `SKILL.md`.
-- Confirm `.claude/skills/.ic-managed.json` lists the synced skill names.
+- Confirm `.claude/skills/.ic-managed.json` maps each synced skill name to its hash.
 - Tell the user: how many skills were installed, that the `SessionStart` hook is in
   place, and that they'll be prompted to trust the hook before it auto-runs next
   session. From then on, their IC skills refresh automatically every session.
@@ -173,10 +141,11 @@ bash .claude/sync-ic-skills.sh
 
 - **Safe to re-run.** Re-invoking this skill or the script is idempotent: the hook is
   not duplicated, and only skills tracked in `.ic-managed.json` are ever pruned.
-- **No server-side versioning needed.** Because the script re-mirrors current content,
-  it captures new skills, new versions, and removals automatically. If the index later
-  adds `sha256`/`version` fields, the script can be upgraded to a differential sync,
-  but that is not required for correctness.
+- **Differential by hash.** The script keys off the per-skill `hash` field in the
+  discovery index, so a normal session that touches nothing downloads only `index.json`
+  and exits silently. Skills are re-downloaded only when their hash changes. Migrating
+  from an older version of this script (whose manifest was a bare name array) is handled
+  automatically on the next run.
 - **Optional mid-session refresh.** For very long-running sessions, the user can also
   run `bash .claude/sync-ic-skills.sh` manually, or schedule it (e.g. via `/loop` or a
   cron routine) — but the SessionStart hook covers the normal case.
````
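
The legacy-manifest migration mentioned above amounts to reading either manifest shape into one `{name: hash}` map; the script does this with `jq` inside `managed_names` and `stored_hash`. Below is a hypothetical TypeScript equivalent (`normalizeManifest` and `parseManifest` are illustrative names, not part of the repository). As in the script, a legacy entry gets an empty hash, which never matches a published one, so those skills are re-downloaded once and tracked by hash from then on.

```typescript
// Hypothetical in-memory form of .ic-managed.json.
type Manifest = Record<string, string>;

// Accepts both the legacy bare array of names and the current {name: hash} object.
function normalizeManifest(raw: unknown): Manifest {
  const out: Manifest = {};
  if (Array.isArray(raw)) {
    // Legacy format: no hash is known, so record "" to force one re-download.
    for (const name of raw) {
      if (typeof name === "string" && name !== "") out[name] = "";
    }
  } else if (raw !== null && typeof raw === "object") {
    // Current format: {"<skill>": "<hash>"}; non-string values count as unknown.
    for (const [name, hash] of Object.entries(raw as Record<string, unknown>)) {
      out[name] = typeof hash === "string" ? hash : "";
    }
  }
  return out;
}

// A missing or unparsable manifest is treated as "nothing managed yet".
function parseManifest(text: string | undefined): Manifest {
  if (text === undefined) return {};
  try {
    return normalizeManifest(JSON.parse(text));
  } catch {
    return {};
  }
}
```

Treating a corrupt manifest as empty mirrors the script's `2>/dev/null || true` fallbacks: the worst case is one full re-download, never a failed session start.
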
Lines changed: 133 additions & 0 deletions
```diff
@@ -0,0 +1,133 @@
+#!/usr/bin/env bash
+# sync-ic-skills.sh — mirror the latest Internet Computer skills into .claude/skills/
+#
+# Differential sync: fetches the discovery index once and re-downloads only the
+# skills whose published `hash` changed (or are new). Skills already at the current
+# hash are skipped entirely — no per-file downloads. Prints a one-line summary only
+# when something actually changed.
+#
+# Idempotent and offline-safe. Only skills this script installed are ever pruned,
+# so your own local skills are never touched.
+set -euo pipefail
+
+BASE="https://skills.internetcomputer.org/.well-known/skills"
+INDEX_URL="$BASE/index.json"
+DEST=".claude/skills"
+MANIFEST="$DEST/.ic-managed.json" # { "<skill>": "<hash>" } of skills this script manages
+
+mkdir -p "$DEST"
+
+# --- Temp files. NEW_MANIFEST is built up as we go, then swapped in atomically. ---
+TMP_INDEX="$(mktemp)"
+NEW_MANIFEST="$(mktemp)"
+trap 'rm -f "$TMP_INDEX" "$NEW_MANIFEST"' EXIT
+
+# --- Fetch the index. On any network failure, keep cached skills and exit cleanly. ---
+if ! curl -fsSL --max-time 20 "$INDEX_URL" -o "$TMP_INDEX"; then
+  echo "[autosync-ic-skills] could not reach $INDEX_URL — keeping cached skills" >&2
+  exit 0
+fi
+
+# --- jq is required to parse the index. If absent, warn and exit without failing. ---
+if ! command -v jq >/dev/null 2>&1; then
+  echo "[autosync-ic-skills] 'jq' not found — install jq to enable IC skill sync" >&2
+  exit 0
+fi
+
+# --- Previously-managed skill names. Supports the legacy manifest format
+# (a bare array of names, no hashes) as well as the current object form. ---
+managed_names() {
+  [ -f "$MANIFEST" ] || return 0
+  jq -r 'if type == "object" then keys[] elif type == "array" then .[] else empty end' \
+    "$MANIFEST" 2>/dev/null || true
+}
+
+# --- Stored hash for a skill, or empty if unknown (new skill, or legacy manifest). ---
+stored_hash() {
+  [ -f "$MANIFEST" ] || return 0
+  jq -r --arg n "$1" 'if type == "object" then (.[$n] // "") else "" end' \
+    "$MANIFEST" 2>/dev/null || true
+}
+
+# --- Append a name->hash pair to the new manifest being built. ---
+record() {
+  local tmp; tmp="$(mktemp)"
+  jq --arg n "$1" --arg h "$2" '.[$n] = $h' "$NEW_MANIFEST" > "$tmp" && mv "$tmp" "$NEW_MANIFEST"
+}
+
+NEW_NAMES="$(jq -r '.skills[].name' "$TMP_INDEX")"
+MANAGED="$(managed_names)"
+echo '{}' > "$NEW_MANIFEST"
+
+# --- Prune: drop previously-managed skills that are no longer in the index. ---
+removed=0
+while IFS= read -r old; do
+  [ -n "$old" ] || continue
+  if ! grep -qxF "$old" <<<"$NEW_NAMES"; then
+    rm -rf "${DEST:?}/$old"
+    removed=$((removed + 1))
+    echo "[autosync-ic-skills] removed: $old" >&2
+  fi
+done <<<"$MANAGED"
+
+# --- Sync: download only skills whose hash changed (new / hashless always download). ---
+added=0; updated=0; unchanged=0
+while IFS= read -r entry; do
+  name="$(jq -r '.name' <<<"$entry")"
+  [ -n "$name" ] && [ "$name" != "null" ] || continue
+  new_hash="$(jq -r '.hash // ""' <<<"$entry")"
+  old_hash="$(stored_hash "$name")"
+
+  # Skip when the hash is known, unchanged, and the files are already on disk.
+  if [ -n "$new_hash" ] && [ "$new_hash" = "$old_hash" ] && [ -d "$DEST/$name" ]; then
+    unchanged=$((unchanged + 1))
+    record "$name" "$new_hash"
+    continue
+  fi
+
+  # Otherwise (re)download every file for this skill.
+  ok=1
+  mkdir -p "$DEST/$name"
+  while IFS= read -r f; do
+    [ -n "$f" ] || continue
+    mkdir -p "$(dirname "$DEST/$name/$f")" # files may live in subdirs (e.g. scripts/)
+    if ! curl -fsSL --max-time 20 "$BASE/$name/$f" -o "$DEST/$name/$f"; then
+      echo "[autosync-ic-skills] warning: failed to fetch $name/$f" >&2
+      ok=0
+    fi
+  done < <(jq -r '.files[]?' <<<"$entry")
+
+  if [ "$ok" -eq 1 ]; then
+    # Record the new hash so the next run can skip this skill. A hashless server
+    # records an empty hash, which never equals new_hash -> always re-downloads.
+    record "$name" "$new_hash"
+    if grep -qxF "$name" <<<"$MANAGED"; then
+      updated=$((updated + 1))
+    else
+      added=$((added + 1))
+    fi
+  else
+    # Download incomplete: keep the old hash so the next run retries this skill.
+    record "$name" "$old_hash"
+  fi
+done < <(jq -c '.skills[]' "$TMP_INDEX")
+
+# --- Swap in the updated manifest. ---
+mv "$NEW_MANIFEST" "$MANIFEST"
+
+# --- Report only when something changed; stay silent on a no-op sync. ---
+# SessionStart hook stdout/stderr is NOT shown in the Claude Code UI — only JSON
+# fields are surfaced. We emit a single JSON object on stdout:
+# - systemMessage -> rendered to the USER as a visible system notice
+# - additionalContext -> injected into Claude's context so it can mention it too
+if [ $((added + updated + removed)) -gt 0 ]; then
+  summary="[autosync-ic-skills] ${added} added, ${updated} updated, ${removed} removed (${unchanged} unchanged) in $DEST"
+  jq -n --arg msg "$summary" '{
+    systemMessage: $msg,
+    hookSpecificOutput: {
+      reloadSkills: true,
+      hookEventName: "SessionStart",
+      additionalContext: $msg
+    }
+  }'
+fi
```
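
The bookkeeping at the end of the sync loop (which hash to record, how to count added versus updated, and when to print) can be restated as a pure function, which makes the retry rule easy to check. The sketch below is a hypothetical TypeScript restatement of that part of the script: `Outcome`, `SyncResult`, `finalize`, and `hookOutput` are made-up names, the per-skill flags stand in for the script's `-d` check, `ok` variable, and `grep` against `$MANAGED`, and the JSON shape is copied from the script's final `jq -n` call rather than from Claude Code documentation.

```typescript
// Hypothetical per-skill outcome of one pass through the sync loop.
interface Outcome {
  name: string;
  wasManaged: boolean; // listed in the previous manifest
  skipped: boolean; // hash unchanged and directory present
  downloadOk: boolean; // every file fetched (ignored when skipped)
  newHash: string; // published hash, "" if absent
  oldHash: string; // stored hash, "" if unknown
}

interface SyncResult {
  manifest: Record<string, string>;
  added: number;
  updated: number;
  unchanged: number;
}

function finalize(outcomes: Outcome[]): SyncResult {
  const r: SyncResult = { manifest: {}, added: 0, updated: 0, unchanged: 0 };
  for (const o of outcomes) {
    if (o.skipped) {
      r.unchanged++;
      r.manifest[o.name] = o.newHash;
    } else if (o.downloadOk) {
      r.manifest[o.name] = o.newHash;
      if (o.wasManaged) {
        r.updated++;
      } else {
        r.added++;
      }
    } else {
      // Keep the old hash, as the script does; when it differs from the
      // published hash, the next run sees a mismatch and retries.
      r.manifest[o.name] = o.oldHash;
    }
  }
  return r;
}

// Returns the SessionStart JSON only when something changed, else null (silent).
function hookOutput(r: SyncResult, removed: number, dest: string): string | null {
  if (r.added + r.updated + removed === 0) return null;
  const msg = `[autosync-ic-skills] ${r.added} added, ${r.updated} updated, ${removed} removed (${r.unchanged} unchanged) in ${dest}`;
  return JSON.stringify({
    systemMessage: msg,
    hookSpecificOutput: {
      reloadSkills: true,
      hookEventName: "SessionStart",
      additionalContext: msg,
    },
  });
}
```

Note that a failed download counts as neither added nor updated, so a run where every fetch fails (and nothing was pruned) stays silent.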

src/lib/skills.ts

Lines changed: 26 additions & 0 deletions
```diff
@@ -5,6 +5,7 @@
 import { getCollection, type CollectionEntry } from 'astro:content';
 import fs from 'node:fs/promises';
 import path from 'node:path';
+import crypto from 'node:crypto';
 import { execFile } from 'node:child_process';
 import { promisify } from 'node:util';
 
@@ -117,6 +118,31 @@ export async function getSkillFiles(skill: Skill): Promise<string[]> {
   return ['SKILL.md', ...allFiles.filter((f) => f !== 'SKILL.md').sort()];
 }
 
+/**
+ * Per-skill aggregate content hash, published in .well-known/skills/index.json so
+ * consumers can detect which skills changed without downloading every file.
+ *
+ * Returns "sha256:<hex>" over the skill's files. The input is built from each served
+ * file (the same set getSkillFiles returns) sorted by path, contributing:
+ *   <relative-path> "\n" <sha256-hex of file bytes> "\n"
+ * Hashing path + per-file digest (rather than concatenating raw bytes) makes the
+ * result order-independent and sensitive to renames. The hash definition is part of
+ * the public contract — consumers key off it — so it must stay stable.
+ */
+export async function getSkillHash(skill: Skill): Promise<string> {
+  const rel = skill.filePath ?? `skills/${skill.id}/SKILL.md`;
+  const skillDir = path.dirname(path.resolve(process.cwd(), rel));
+  const files = (await getSkillFiles(skill)).slice().sort();
+
+  const agg = crypto.createHash('sha256');
+  for (const f of files) {
+    const bytes = await fs.readFile(path.join(skillDir, f));
+    const fileHash = crypto.createHash('sha256').update(bytes).digest('hex');
+    agg.update(`${f}\n${fileHash}\n`);
+  }
+  return `sha256:${agg.digest('hex')}`;
+}
+
 export interface SkillFileEntry {
   name: string;
   path: string;
```
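
To see the hash contract in isolation, here is a hypothetical in-memory restatement of `getSkillHash`: `aggregateHash` and `SkillFiles` are illustrative names, and the function takes a `Map` of relative path to bytes instead of reading the skill directory, but it feeds the aggregate digest the same `<path>\n<hex>\n` records in sorted-path order. In this construction it is the sort that removes any dependence on input order; including the path in each record is what makes a rename change the hash.

```typescript
import { createHash } from "node:crypto";

// Hypothetical in-memory input: relative path -> file bytes.
type SkillFiles = Map<string, Uint8Array>;

// Same record format as getSkillHash: for each path in default sort order,
// feed "<path>\n<sha256 hex of bytes>\n" into a single aggregate sha256.
function aggregateHash(files: SkillFiles): string {
  const agg = createHash("sha256");
  for (const p of Array.from(files.keys()).sort()) {
    const fileHash = createHash("sha256").update(files.get(p)!).digest("hex");
    agg.update(`${p}\n${fileHash}\n`);
  }
  return `sha256:${agg.digest("hex")}`;
}
```

Because `getSkillHash` uses JavaScript's default `sort()`, which compares UTF-16 code units, an independent reimplementation in another language would have to reproduce that ordering to get the same digest. Consumers that only compare published hashes for equality, as the README recommends, never need to.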

src/pages/.well-known/skills/index.json.ts

Lines changed: 2 additions & 1 deletion
```diff
@@ -3,7 +3,7 @@
 // https://github.com/cloudflare/agent-skills-discovery-rfc
 import type { APIRoute } from 'astro';
 import { absUrl } from '../../../lib/site';
-import { getAllSkills, getSkillFiles } from '../../../lib/skills';
+import { getAllSkills, getSkillFiles, getSkillHash } from '../../../lib/skills';
 
 export const GET: APIRoute = async () => {
   const skills = await getAllSkills();
@@ -15,6 +15,7 @@ export const GET: APIRoute = async () => {
         description: s.data.description,
         url: absUrl(`/.well-known/skills/${s.data.name}/SKILL.md`),
         files: await getSkillFiles(s),
+        hash: await getSkillHash(s),
       })),
     ),
   };
```
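
Because the route now emits `hash` next to `files`, a client reading the index should treat `hash` as optional: the commit message describes the field as purely additive, and the autosync script already falls back when it is missing. A minimal TypeScript reader in that spirit follows; `SkillIndexEntry`, `isStringArray`, and `readSkillIndex` are hypothetical names, and only the field names come from the route and the README example.

```typescript
// Hypothetical consumer-side type for one entry of .well-known/skills/index.json.
interface SkillIndexEntry {
  name: string;
  description?: string;
  url: string;
  files: string[];
  hash?: string; // additive field: absent on servers that do not publish it
}

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

// Parses the index, skipping malformed entries rather than rejecting the whole file.
function readSkillIndex(text: string): SkillIndexEntry[] {
  const doc: unknown = JSON.parse(text);
  if (doc === null || typeof doc !== "object") return [];
  const skills = (doc as { skills?: unknown }).skills;
  if (!Array.isArray(skills)) return [];

  const out: SkillIndexEntry[] = [];
  for (const raw of skills) {
    if (raw === null || typeof raw !== "object") continue;
    const { name, description, url, files, hash } = raw as Record<string, unknown>;
    if (typeof name !== "string" || typeof url !== "string" || !isStringArray(files)) continue;
    const entry: SkillIndexEntry = { name, url, files };
    if (typeof description === "string") entry.description = description;
    // Keep the hash opaque: store it for equality checks, never parse it.
    if (typeof hash === "string") entry.hash = hash;
    out.push(entry);
  }
  return out;
}
```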
