Experimental Rust CLI for reading CKAN metadata repositories, benchmarking parse performance, and producing optional catalog/search sidecar files for CKAN-Linux.
The project is intentionally read-only. It does not replace CKAN's resolver, repository update path, install logic, or registry writes; it parses metadata into comparison and sidecar outputs that CKAN-Linux can consume as an optional browse/search acceleration layer.
- Reads CKAN metadata from .zip, .tar.gz/.tgz, or extracted directories.
- Parses .ckan modules in parallel.
- Reports archive, metadata, field coverage, relationship, and timing counts.
- Emits per-module summaries as terminal tables, JSON arrays, or JSON lines.
- Selects the latest version per identifier using CKAN-ish version ordering.
- Exports stable package summaries for compatibility comparison.
- Builds an optional CKAN-Linux catalog/search sidecar index with module rows, reverse relationship edges, download counts, and provider mappings.
- Downloads live CKAN-meta archives and maintains extracted metadata caches.
- Searches identifiers, names, versions, relationship targets, unresolved references, and reverse relationships.
- Compares two metadata sources by normalized per-module fingerprints.
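As a rough illustration of latest-per-identifier selection, the sketch below assumes a simplified Debian-style ordering. An optional numeric epoch comes before ":". The rest of the string is then compared as alternating non-digit runs (compared as plain strings) and digit runs (compared numerically). This is an assumption about what "CKAN-ish" ordering covers, not the crate's actual comparator. Real CKAN versions have more edge cases, and every function name here is hypothetical.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

/// Split an optional numeric "epoch:" prefix; a missing or non-numeric epoch counts as 0.
fn split_epoch(v: &str) -> (u64, &str) {
    match v.split_once(':') {
        Some((e, rest)) => match e.parse::<u64>() {
            Ok(n) => (n, rest),
            Err(_) => (0, v),
        },
        None => (0, v),
    }
}

/// Split off the longest prefix whose characters all satisfy `pred`.
fn take_run(s: &str, pred: impl Fn(char) -> bool) -> (&str, &str) {
    let end = s.find(|c: char| !pred(c)).unwrap_or(s.len());
    s.split_at(end)
}

/// Compare digit runs numerically without overflow: strip leading zeros, then
/// a longer run is larger, and equal lengths compare lexically.
fn compare_numeric(a: &str, b: &str) -> Ordering {
    let (a, b) = (a.trim_start_matches('0'), b.trim_start_matches('0'));
    a.len().cmp(&b.len()).then_with(|| a.cmp(b))
}

/// Simplified version comparison (assumption, not CKAN's exact rules).
fn compare_versions(a: &str, b: &str) -> Ordering {
    let (epoch_a, mut rest_a) = split_epoch(a);
    let (epoch_b, mut rest_b) = split_epoch(b);
    if epoch_a != epoch_b {
        return epoch_a.cmp(&epoch_b);
    }
    while !rest_a.is_empty() || !rest_b.is_empty() {
        // Non-digit run first, then the digit run that follows it.
        let (text_a, tail_a) = take_run(rest_a, |c| !c.is_ascii_digit());
        let (text_b, tail_b) = take_run(rest_b, |c| !c.is_ascii_digit());
        let (num_a, next_a) = take_run(tail_a, |c| c.is_ascii_digit());
        let (num_b, next_b) = take_run(tail_b, |c| c.is_ascii_digit());
        let ord = text_a.cmp(text_b).then_with(|| compare_numeric(num_a, num_b));
        if ord != Ordering::Equal {
            return ord;
        }
        rest_a = next_a;
        rest_b = next_b;
    }
    Ordering::Equal
}

/// Keep the highest version seen for each identifier.
fn latest_per_identifier<'a>(modules: &[(&'a str, &'a str)]) -> HashMap<&'a str, &'a str> {
    let mut latest: HashMap<&str, &str> = HashMap::new();
    for &(id, version) in modules {
        latest
            .entry(id)
            .and_modify(|best| {
                if compare_versions(version, *best) == Ordering::Greater {
                    *best = version;
                }
            })
            .or_insert(version);
    }
    latest
}
```

Comparing digit runs numerically is what makes "1.10" sort above "1.9", which a plain string comparison would get wrong.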
Parsed fields currently include identifiers, names, versions, spec versions,
abstracts/descriptions, authors, licenses, kinds, release dates, download sizes,
download resources, install stanzas, KSP compatibility fields, and resolver
relationship buckets (depends, recommends, suggests, conflicts,
provides).
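The reverse-relationship and unresolved-reference features can be pictured with a small std-only sketch. The Module struct and the (bucket, target) pairs are illustrative assumptions, not the crate's types. So is the rule that a target counts as resolved when it matches any identifier or provided name. The real unresolved command may, for example, treat conflicts differently.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified module record (not the crate's real type).
struct Module {
    identifier: String,
    provides: Vec<String>,
    /// (bucket, target) pairs, e.g. ("depends", "AstronomersVisualPack").
    relationships: Vec<(String, String)>,
}

/// Map each relationship target to the (source identifier, bucket) pairs that reference it.
fn reverse_edges(modules: &[Module]) -> BTreeMap<String, Vec<(String, String)>> {
    let mut edges: BTreeMap<String, Vec<(String, String)>> = BTreeMap::new();
    for m in modules {
        for (bucket, target) in &m.relationships {
            edges
                .entry(target.clone())
                .or_default()
                .push((m.identifier.clone(), bucket.clone()));
        }
    }
    edges
}

/// Targets that match neither a known identifier nor a provided name (assumed rule).
fn unresolved_targets(modules: &[Module]) -> BTreeSet<String> {
    let known: BTreeSet<&str> = modules
        .iter()
        .flat_map(|m| {
            std::iter::once(m.identifier.as_str()).chain(m.provides.iter().map(String::as_str))
        })
        .collect();
    modules
        .iter()
        .flat_map(|m| m.relationships.iter().map(|(_, target)| target))
        .filter(|target| !known.contains(target.as_str()))
        .cloned()
        .collect()
}
```

Counting provided names as "known" matters for virtual names such as AVP-Textures, which a module can provide without any module having that identifier.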
Requirements:
- Rust stable toolchain
- Network access only for fetch, sync, or the live benchmark scripts
Build from the repository:
cargo build --release
target/release/ckan-meta-rs --help
Or run directly during development:
cargo run -- --help
Download the live CKAN metadata archive, extract a cache, and export JSON lines:
cargo run -- sync \
--archive data/CKAN-meta-master.zip \
--cache-dir data/CKAN-meta-cache \
--export data/modules.jsonl \
--json-lines
Run common analysis commands against the extracted cache:
cargo run -- parse data/CKAN-meta-cache
cargo run -- bench data/CKAN-meta-cache --runs 20 --warmups 3
cargo run -- latest data/CKAN-meta-cache --limit 20
cargo run -- find data/CKAN-meta-cache Astronomer --limit 20
cargo run -- inspect data/CKAN-meta-cache AstronomersVisualPack --latest
cargo run -- relation-stats data/CKAN-meta-cache --limit 20
Build the optional CKAN-Linux sidecar index:
cargo run -- catalog-index data/CKAN-meta-cache \
--output data/catalog-index.json \
--latest-only \
--pretty
cargo run -- validate-catalog-index data/catalog-index.json
To use the generated index with CKAN-Linux, either point the app at it:
CKAN_CATALOG_INDEX_PATH=/path/to/catalog-index.json ckan-linux
Or symlink it into CKAN's app data directory:
mkdir -p ~/.local/share/CKAN
ln -s /path/to/catalog-index.json ~/.local/share/CKAN/catalog-index-latest.json
CKAN-Linux does not require this repo. If no valid sidecar index is configured, CKAN-Linux falls back to CKAN's normal registry/repository cache path.
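A hypothetical sketch of that lookup follows. Only the environment variable name and the symlink location come from this README. The following are all assumptions:

- The precedence: environment variable first, then the app-data path, then fallback.
- The existence-only check.
- The function itself.

CKAN-Linux may resolve the path differently and also validates the file's contents.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical resolution order: explicit environment variable first, then the
/// app-data symlink location. None means "no sidecar; use CKAN's normal path".
fn resolve_catalog_index(
    env_path: Option<&str>,
    home: Option<&Path>,
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    // Assumption: a set-but-missing env path falls through to the default location.
    if let Some(raw) = env_path.filter(|p| !p.is_empty()) {
        let candidate = PathBuf::from(raw);
        if exists(&candidate) {
            return Some(candidate);
        }
    }
    let default = home?.join(".local/share/CKAN/catalog-index-latest.json");
    // Only existence is checked here; validating the JSON itself is out of scope.
    exists(&default).then_some(default)
}
```

In a real program the inputs could come from std::env::var("CKAN_CATALOG_INDEX_PATH").ok().as_deref(), std::env::var_os("HOME").map(PathBuf::from).as_deref(), and a check such as |p: &Path| p.is_file(). Injecting the existence check keeps the sketch testable without touching the filesystem.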
Commands:
parse                   Parse a source and report counts/timing
bench                   Repeated parse benchmark with warmups
modules                 Emit parsed module summaries
latest                  Emit latest module summary per identifier
export                  Write stable package JSON or JSON lines
catalog-index           Write optional CKAN-Linux catalog/search sidecar JSON
validate-catalog-index  Validate a catalog sidecar index
validate-export         Validate an exported summary file
fetch                   Download a CKAN metadata archive
sync                    Download, extract, and optionally export
cache                   Extract relevant metadata into a cache directory
find                    Search by identifier, name, or version
relations               Show modules that reference a relationship target
relation-stats          Count common relationship targets
unresolved              Find missing relationship targets
inspect                 Inspect a module and reverse relationships
compare                 Compare two metadata sources
completions             Generate shell completions
Most commands accept a CKAN-meta .zip, .tar.gz/.tgz, or extracted metadata
directory.
See docs/commands.md for detailed command examples.
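The compare command's "normalized per-module fingerprints" could look roughly like the sketch below. List fields are collected into sets so ordering and duplicates do not change the hash, and modules are keyed by identifier plus version. The Summary struct mirrors fields from the example module summary further down. Its shape, the keying, and the use of DefaultHasher are all assumptions. DefaultHasher output is not guaranteed stable across Rust releases, so these fingerprints are only meaningful within one process.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, BTreeSet};
use std::hash::{Hash, Hasher};

/// Hypothetical summary shape modeled on the example module summary.
#[derive(Clone)]
struct Summary {
    identifier: String,
    version: String,
    dependency_names: Vec<String>,
    conflict_names: Vec<String>,
    provided_names: Vec<String>,
}

/// Collect a list into a sorted, de-duplicated set so order and repeats are ignored.
fn normalized(names: &[String]) -> BTreeSet<&str> {
    names.iter().map(String::as_str).collect()
}

/// Hash the normalized fields into one fingerprint (stable only within one process).
fn fingerprint(s: &Summary) -> u64 {
    let mut h = DefaultHasher::new();
    s.identifier.hash(&mut h);
    s.version.hash(&mut h);
    normalized(&s.dependency_names).hash(&mut h);
    normalized(&s.conflict_names).hash(&mut h);
    normalized(&s.provided_names).hash(&mut h);
    h.finish()
}

/// Key each module by "identifier version" and map it to its fingerprint.
fn index(modules: &[Summary]) -> BTreeMap<String, u64> {
    modules
        .iter()
        .map(|s| (format!("{} {}", s.identifier, s.version), fingerprint(s)))
        .collect()
}

#[derive(Debug, Default)]
struct Diff {
    only_left: Vec<String>,
    only_right: Vec<String>,
    changed: Vec<String>,
}

/// Report modules present on only one side, or present on both with different fingerprints.
fn compare_sources(left: &[Summary], right: &[Summary]) -> Diff {
    let (l, r) = (index(left), index(right));
    let mut diff = Diff::default();
    for (key, fp) in &l {
        match r.get(key) {
            None => diff.only_left.push(key.clone()),
            Some(other) if other != fp => diff.changed.push(key.clone()),
            Some(_) => {}
        }
    }
    diff.only_right = r.keys().filter(|k| !l.contains_key(*k)).cloned().collect();
    diff
}
```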
Terminal reports are meant for quick inspection:
Archive: data/CKAN-meta-master.zip
Type: zip
CKAN metadata entries: 29858
Parsed modules: 29858
Unique identifiers: 3497
Parse errors: 0
Timing statistics:
read min=432ms avg=440.50ms max=465ms total=8810ms
parse min=35ms avg=39.35ms max=48ms total=787ms
total min=470ms avg=482.65ms max=505ms total=9653ms
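The summary lines are consistent with per-phase statistics over 20 measured runs. Each total equals 20 times its average: for example, 440.50 ms times 20 is 8810 ms, and 39.35 ms times 20 is 787 ms. The sketch below shows that arithmetic for one phase. It assumes warmup samples are recorded first and excluded from the statistics. The Stats type and the summarize function are hypothetical.

```rust
use std::time::Duration;

/// Per-phase statistics over the measured runs (hypothetical type).
#[derive(Debug, PartialEq)]
struct Stats {
    min: Duration,
    avg: Duration,
    max: Duration,
    total: Duration,
}

/// Drop the first `warmups` samples, then compute min/avg/max/total.
/// Returns None when no measured runs remain.
fn summarize(samples: &[Duration], warmups: usize) -> Option<Stats> {
    let measured = samples.get(warmups..)?;
    let min = *measured.iter().min()?;
    let max = *measured.iter().max()?;
    let total: Duration = measured.iter().sum();
    Some(Stats { min, avg: total / measured.len() as u32, max, total })
}
```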
JSON output is available on report-style commands with --json. Module lists can
be emitted as pretty JSON arrays with --json or newline-delimited JSON with
--json-lines.
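The sketch below illustrates the difference between the two module-list formats using only the standard library. It writes the same rows either as one JSON array or as newline-delimited JSON, with one complete object per line, which is easy to stream and append. The two-field object is a hypothetical subset of the real module summary. The hand-rolled escaping stands in for whatever serializer the crate actually uses.

```rust
/// Escape a string as a JSON string literal (quotes, backslashes, control characters).
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// One summary object on a single line (hypothetical two-field subset).
fn object(identifier: &str, version: &str) -> String {
    format!("{{\"identifier\":{},\"version\":{}}}", json_str(identifier), json_str(version))
}

/// Emit rows either as newline-delimited JSON or as a single JSON array.
fn emit(rows: &[(&str, &str)], json_lines: bool) -> String {
    let objects: Vec<String> = rows.iter().map(|(id, v)| object(id, v)).collect();
    if json_lines {
        objects.iter().map(|o| format!("{o}\n")).collect()
    } else {
        format!("[\n  {}\n]\n", objects.join(",\n  "))
    }
}
```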
Example module summary:
{
"identifier": "AVP-4kTextures",
"version": "v1.13",
"dependency_names": ["AstronomersVisualPack"],
"conflict_names": ["AVP-Textures"],
"provided_names": ["AVP-Textures"]
}
Run the local verification script:
scripts/smoke.sh
The smoke script runs formatting checks, tests, Clippy, a release build, and fixture-based CLI checks when CKAN-Linux test fixtures are available.
Equivalent core checks:
cargo fmt -- --check
cargo test --locked
cargo clippy --locked -- -D warnings
cargo build --release --locked
Fetch and benchmark the live metadata repository:
scripts/fetch-live-meta.sh
cargo build --release
target/release/ckan-meta-rs bench data/CKAN-meta-master.zip --runs 20 --warmups 3
unzip -q data/CKAN-meta-master.zip -d data
target/release/ckan-meta-rs bench data/CKAN-meta-master --runs 20 --warmups 3
Or run the bundled comparison script:
scripts/bench-live-meta.sh
On the current live metadata set, JSON parsing is not the main bottleneck. Zip reading and decompression dominate end-to-end time, while scanning an extracted cache is substantially faster. In the sample zip report above, reading averaged 440.50 ms per run versus 39.35 ms for parsing.
The current sidecar workflow is:
- Download CKAN-meta normally.
- Maintain a persistent extracted metadata cache.
- Scan the extracted cache in parallel.
- Produce compact JSON sidecars for optional CKAN-Linux catalog/search loading.
CKAN-Linux consumes the sidecar only when a valid index is configured and falls back to CKAN's normal registry/repository cache path otherwise.
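The "scan the extracted cache in parallel" step can be approximated with the standard library alone. The approach is to walk the cache for .ckan files, split the list into chunks, and read and parse each chunk on a scoped thread. This is a sketch under that assumption; the crate's own implementation may use a different mechanism. The parse callback is a placeholder for real JSON parsing, and both function names are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;

/// Recursively collect `.ckan` files under `root`.
fn collect_ckan_files(root: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(root)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_ckan_files(&path, out)?;
        } else if path.extension().is_some_and(|e| e == "ckan") {
            out.push(path);
        }
    }
    Ok(())
}

/// Read and parse files on up to `threads` scoped workers, preserving input order.
/// `parse` stands in for real JSON parsing of one metadata file.
fn scan_parallel<T: Send>(
    files: &[PathBuf],
    threads: usize,
    parse: impl Fn(&str) -> T + Sync,
) -> Vec<io::Result<T>> {
    let chunk = files.len().div_ceil(threads.max(1)).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = files
            .chunks(chunk)
            .map(|part| {
                let parse = &parse;
                s.spawn(move || {
                    part.iter()
                        .map(|p| fs::read_to_string(p).map(|text| parse(&text)))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        // Join in spawn order so results line up with `files`.
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}
```

Because an extracted cache needs no per-entry decompression, each worker only pays for plain file reads and parsing. That is one plausible reason the cache scan beats reading the zip.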
See docs/findings.md for benchmark numbers and integration notes.
- src/: CLI, archive readers, parser, output, and export validation code.
- scripts/: live metadata fetch, benchmark, and smoke scripts.
- docs/commands.md: command reference and common workflows.
- docs/findings.md: benchmark results and integration notes.