
Add versioning freeze & compatibility workflows #123

Open
HendrikBorgelt wants to merge 3 commits into nfdi-de:main from HendrikBorgelt:version_freezing_and_w3id_import_fix

@HendrikBorgelt
Collaborator

This PR should implement version freezing for chem-dcat-ap's LinkML schema, as well as warning messages and compatibility checks for existing versions.

The scripts were developed and tested in two separate test repos:

The scripts should be "plug and play" and should use the correct w3id import URIs.

@dalito, I still need to double-check and review the code again. However, since you might be more familiar with some of the just and uv commands, it would be nice if you could look over the scripts.

Add a versioning-freeze pipeline and compatibility checks for chem-dcat-ap. Introduces a README (.github/README_for_versioning.md), new workflows (.github/workflows/handle-upstream-release.yaml, .github/workflows/check-schema-compatibility.yaml), and helper scripts (scripts/freeze_imports.py, scripts/should_update_latest.py, scripts/check_compatibility.py). Update deploy-docs.yaml to run a release-time freeze, conditionally promote the 'latest' alias using should_update_latest.py, and adjust action/setup versions. The changes automate freezing upstream dcat-ap-plus imports at release, open freeze PRs on upstream releases, and publish a weekly compatibility badge and matrix to gh-pages to detect and notify about broken upstream dependencies.
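The PR text does not spell out the rule used by should_update_latest.py. As a minimal sketch, assuming the intent is that a release only takes the "latest" alias when it is at least as new as every previously deployed release (so a patch to an older release line cannot displace a newer one), the check could look like this. The function names and the plain `vMAJOR.MINOR.PATCH` tag format are assumptions, not taken from the PR.

```python
import re
from typing import Optional, Tuple

# Hypothetical rule: NOT the PR's should_update_latest.py, just one plausible policy
# for deciding whether a newly tagged release may take over the "latest" alias.
SEMVER = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def parse_version(tag: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) for tags like 'v0.3.0', or None for anything else."""
    match = SEMVER.fullmatch(tag.strip())
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def should_update_latest(new_tag: str, deployed_tags: list) -> bool:
    """Promote 'latest' only if new_tag is at least as new as every deployed release."""
    new = parse_version(new_tag)
    if new is None:
        return False  # pre-releases and non-semver tags never become 'latest' here
    released = [v for v in map(parse_version, deployed_tags) if v is not None]
    return all(new >= v for v in released)


if __name__ == "__main__":
    # A patch for the older v0.2 line must not steal 'latest' from v0.3.0.
    print(should_update_latest("v0.2.1", ["v0.1.0", "v0.3.0"]))
    print(should_update_latest("v0.4.0", ["v0.1.0", "v0.3.0", "dev"]))
```

Comparing tuples gives numeric ordering, so `v0.10.0` correctly sorts after `v0.9.0`, which a plain string comparison would get wrong.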
@dalito
Collaborator

dalito commented Mar 16, 2026

Could you add a description of what exactly the scripts/CI jobs should do, or specify the exact outcomes you expect? That would help me give more concrete feedback.

Collaborator Author


@dalito, sorry for the AI slop in this file, but I think this is the document you would want to read.

If you'd like a more concise document, I'll write it this evening.

@HendrikBorgelt
Collaborator Author

The main idea is that released versions of chem-dcat-ap should not reference dcat-ap+ "latest", but rather pinned dependencies based on the last compatible version:

[image]

To achieve this, the deploy-docs.yaml workflow was extended, and a Python script, freeze_imports.py, was implemented to replace "latest" with the specific version (here v0.1.0).

Since this tooling would be of limited value if we were not notified when changes in the parent repo require us to freeze the child repo (here chem-dcat-ap), I also implemented a GitHub Action, check-schema-compatibility.yaml, which checks for compatibility issues and automatically opens a PR reminding us to freeze and publish the latest version that is still compatible with the old DCAT-AP+ version.
See here: HendrikBorgelt/test_chemDCAT_ap_versioning_freeze#5

I also implemented a small badge
[image]
as well as a compatibility chart
[image]
(https://hendrikborgelt.github.io/test_chemDCAT_ap_versioning_freeze/compatibility.html),
both generated with the script check_compatibility.py.

@StroemPhi
Member

While writing the docs, I realized that only material_entities_ap.yaml actually needs to import DCAT-AP+, since it is imported by chemical_entities_ap.yaml, which in turn is imported by chemical_reaction_ap.yaml and chem_dcat_ap.yaml. So we could delete this import statement from https://github.com/nfdi-de/chem-dcat-ap/blob/main/src/chem_dcat_ap/schema/chemical_reaction_ap.yaml#L55 and https://github.com/nfdi-de/chem-dcat-ap/blob/main/src/chem_dcat_ap/schema/chem_dcat_ap.yaml#L61 to simplify things.

What do you think?

@HendrikBorgelt
Collaborator Author

Sounds good. Right now there are still minor issues with my code, and it might have some bugs, because I still missed some use cases: the current tooling always assumes that you want to be compatible with the "latest" schema at the time of publishing, and therefore only replaces "latest" with the schema version, which is of course not always intended.

I would like David's general opinion on whether this is going in the right direction for what I want to do, or if I should implement it with other tools in mind.

@markdoerr
Collaborator

@HendrikBorgelt, I will check whether we can use the semver tools (or similar) to set the new versions automatically. I do this for many of my files (TOML, YAML), and it works quite smoothly.

Instead of polling every day, consider an event-based mechanism that updates the chem-dcat-ap repo when the dcat-ap-plus repo changes. Such a mechanism must exist.

@markdoerr
Collaborator

@HendrikBorgelt,
here is the documentation about the events:
github workflow events

This fixes some minor issues regarding the correct w3id import statements and implements generation of the compatibility .md file.
See the new issue post for more details.
@HendrikBorgelt
Collaborator Author

> @HendrikBorgelt, here is the documentation about the events: github workflow events

Hi Mark.
I intentionally don't want to use a communication-based approach for now. Moving away from polling means implementing communication between the parent and the child repo, which requires registering every child repo with the parent repo. That becomes a maintenance nightmare if we assume that multiple subprofiles will be established in the near future.

@HendrikBorgelt
Copy link
Copy Markdown
Collaborator Author

Updated Issue description for this PR:

Implement versioned sub-module imports and downstream freeze pipeline

Problem statement

chem-dcat-ap defines several sub-modules — chemical_entities_ap, chemical_reaction_ap, etc. — that are currently referenced using bare local imports:

imports:
  - chemical_entities_ap
  - chemical_reaction_ap

Bare imports carry no version information, so downstream schemas cannot determine which version of a sub-module was in use when a class was defined. The same problem applies to the upstream dcat-ap-plus dependency: without a pinned version in the dcatapplus: prefix URI, the deployed schema silently drifts whenever dcat-ap-plus releases a new version.


The inheritance chain

dcat-ap-plus  →  chem-dcat-ap  →  coremeta4cat
                               →  dcat-ap+labactions
                               →  visualization tool
                               →  ckan ?

Each arrow is a schema import. Unversioned imports anywhere in this chain make it impossible to reproduce validation results deterministically, or for downstream schemas to declare a stable dependency.


What the freeze pipeline delivers

At release time, CI transforms the development-form schemas into fully version-pinned release artifacts.

Source (on main, development form):

prefixes:
  dcatapplus: https://w3id.org/nfdi-de/dcat-ap-plus/v0.3.0/
  chemdcatap: https://w3id.org/nfdi-de/dcat-ap-plus/chemistry/
  chemical_entities_ap: https://w3id.org/nfdi-de/dcat-ap-plus/chemistry/entity/
  chemical_reaction_ap: https://w3id.org/nfdi-de/dcat-ap-plus/chemistry/reaction/

imports:
  - dcatapplus:schema/dcat_ap_plus
  - chemical_entities_ap
  - chemical_reaction_ap

Released artifact (on GitHub Pages / w3id.org, at /chemistry/v0.3.0/schema/):

prefixes:
  dcatapplus: https://w3id.org/nfdi-de/dcat-ap-plus/v0.3.0/
  chemdcatap: https://w3id.org/nfdi-de/dcat-ap-plus/chemistry/v0.3.0/
  chemical_entities_ap: https://w3id.org/nfdi-de/dcat-ap-plus/chemistry/entity/v0.3.0/
  chemical_reaction_ap: https://w3id.org/nfdi-de/dcat-ap-plus/chemistry/reaction/v0.3.0/

imports:
  - dcatapplus:schema/dcat_ap_plus
  - chemical_entities_ap:schema/chemical_entities_ap
  - chemical_reaction_ap:schema/chemical_reaction_ap

Each sub-schema import resolves via its own prefix to the correct versioned w3id path.

Note: id: fields are not versioned. The schemas are used as SHACL shape definitions
rather than OWL ontologies; the id: field does not determine shape node URIs (those come
from the default_prefix, i.e. the chemdcatap: prefix value, which is versioned). Version
identity is already expressed by the schema's version: attribute and the versioned prefixes.

The dcatapplus: prefix is already pinned on main (via the handle-upstream-release
workflow; see below). The chemdcatap:, per-module prefixes, and bare sub-module imports
are all frozen at release time.
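The release-time transformation above can be sketched in Python on an already-parsed schema dictionary (for example, what PyYAML's `safe_load` would return). This is a hypothetical illustration, not the code of scripts/freeze_imports.py. It assumes that every prefix whose URI starts with the chemistry base URI belongs to chem-dcat-ap and must be versioned, that prefix values are plain strings, and that `dcatapplus:` is left alone because it is already pinned on main. Running it twice yields the same result, which is the idempotence Phase 2 relies on.

```python
import copy

# Hypothetical sketch of the release-time freeze; NOT the PR's scripts/freeze_imports.py.
# Assumption: every prefix whose URI starts with OWN_BASE belongs to chem-dcat-ap and is
# versioned; all other prefixes (dcatapplus:, linkml:, ...) are left untouched.
OWN_BASE = "https://w3id.org/nfdi-de/dcat-ap-plus/chemistry/"


def freeze_schema(schema: dict, version: str, own_base: str = OWN_BASE) -> dict:
    """Return a copy of a parsed LinkML schema with version-pinned own prefixes and imports."""
    frozen = copy.deepcopy(schema)
    prefixes = frozen.setdefault("prefixes", {})

    # 1. Version the schema's own namespaces, skipping any that are already frozen.
    for name, uri in list(prefixes.items()):
        if uri.startswith(own_base) and not uri.endswith(f"/{version}/"):
            prefixes[name] = f"{uri.rstrip('/')}/{version}/"

    # 2. Turn bare local imports into CURIEs that resolve through those prefixes.
    frozen_imports = []
    for entry in frozen.get("imports", []):
        if ":" in entry:
            frozen_imports.append(entry)  # already a CURIE, e.g. dcatapplus:schema/dcat_ap_plus
        elif entry in prefixes:
            frozen_imports.append(f"{entry}:schema/{entry}")  # module declares its own prefix
        else:
            frozen_imports.append(f"chemdcatap:schema/{entry}")  # fallback, as in the test repo
    frozen["imports"] = frozen_imports
    return frozen


if __name__ == "__main__":
    source = {
        "prefixes": {
            "dcatapplus": "https://w3id.org/nfdi-de/dcat-ap-plus/v0.3.0/",
            "chemdcatap": "https://w3id.org/nfdi-de/dcat-ap-plus/chemistry/",
            "chemical_entities_ap": "https://w3id.org/nfdi-de/dcat-ap-plus/chemistry/entity/",
            "chemical_reaction_ap": "https://w3id.org/nfdi-de/dcat-ap-plus/chemistry/reaction/",
        },
        "imports": ["dcatapplus:schema/dcat_ap_plus", "chemical_entities_ap", "chemical_reaction_ap"],
    }
    frozen = freeze_schema(source, "v0.3.0")
    print(frozen["prefixes"]["chemdcatap"])  # https://w3id.org/nfdi-de/dcat-ap-plus/chemistry/v0.3.0/
    print(frozen["imports"][1])  # chemical_entities_ap:schema/chemical_entities_ap
    assert freeze_schema(frozen, "v0.3.0") == frozen  # idempotent
```

Working on a deep copy keeps the development-form dictionary intact, so the same parsed source can be reused for the local gen-doc run.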

How the test repo differs

The test repo (test_chemDCAT_ap_versioning_freeze) hosts all sub-schemas under a single
GitHub Pages root without w3id redirects. There is only one chemdcatap: prefix, and all
sub-schemas are deployed at /{version}/schema/. No per-module prefix declarations exist in
the source schemas, so all bare imports fall back to chemdcatap:schema/name. The
--sub-module-base flag is therefore not used in the test repo workflow.
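For the test repo, the same idea reduces to a single versioned prefix on the GitHub Pages root. A minimal sketch under that assumption; the helper names are hypothetical, and the Pages root is the test-repo URL linked earlier in this thread. How LinkML appends a file extension when it resolves the expanded URL is deliberately not modelled here.

```python
# Hypothetical test-repo variant: a single chemdcatap: prefix on the GitHub Pages root
# and no per-module prefixes, so every bare import falls back to chemdcatap:schema/<name>.
PAGES_ROOT = "https://hendrikborgelt.github.io/test_chemDCAT_ap_versioning_freeze/"


def pages_prefix(version: str, pages_root: str = PAGES_ROOT) -> str:
    """Versioned chemdcatap: prefix value for a release deployed under /{version}/."""
    return f"{pages_root.rstrip('/')}/{version}/"


def fallback_imports(imports: list) -> list:
    """Leave CURIE imports alone; map bare names to chemdcatap:schema/<name>."""
    return [entry if ":" in entry else f"chemdcatap:schema/{entry}" for entry in imports]


def expand_curie(curie: str, prefixes: dict) -> str:
    """Expand a CURIE like chemdcatap:schema/x using the prefix map (no file suffix added)."""
    prefix, local_part = curie.split(":", 1)
    return prefixes[prefix] + local_part


if __name__ == "__main__":
    prefixes = {"chemdcatap": pages_prefix("v0.1.0")}
    for curie in fallback_imports(["chemical_entities_ap"]):
        print(curie, "->", expand_curie(curie, prefixes))
```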

Example of a frozen test-repo artifact:


Two-phase pipeline

Phase 1 — build-docs job (runs on every release tag push):

Since we want to make sure the schema is technically valid, we should run the regular tests before the schema is transferred to GitHub Pages. However, testing against the published schemas would be a chicken-and-egg problem: the frozen imports point to schemas on GitHub Pages, but those schemas are only published after validation has passed. We therefore test locally. We can assume that freezing the import statements does not cause any issues, so the correctness of the schema can be tested with local imports.

  1. gen-doc runs on raw schemas; bare local imports resolve against adjacent files — no network needed.
  2. Freeze step transforms the working copy in place:
    • chemdcatap: prefix → versioned w3id URL (GitHub Pages URL in the test repo)
    • Sub-schema own-namespace prefixes → versioned w3id URLs (GitHub Pages URLs in the test repo)
    • Bare local imports → chemdcatap:schema/X CURIE form
    • Copies frozen files to docs/schema/ to replace any stale copies made by the gen-doc step.
  3. mike deploy publishes the frozen schemas to gh-pages.
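The last part of step 2, overwriting stale gen-doc copies with the frozen files, amounts to a plain file copy. A sketch, assuming the sources live in a flat schema directory (src/chem_dcat_ap/schema/ in chem-dcat-ap) and the target is docs/schema/; the function name is hypothetical:

```python
import shutil
from pathlib import Path


def copy_frozen_schemas(schema_dir: Path, docs_schema_dir: Path) -> list:
    """Copy every frozen *.yaml from schema_dir into docs_schema_dir, overwriting stale copies.

    Hypothetical helper; the paths are assumptions (e.g. src/chem_dcat_ap/schema -> docs/schema).
    """
    docs_schema_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in sorted(schema_dir.glob("*.yaml")):
        target = docs_schema_dir / source.name
        shutil.copy2(source, target)  # replaces whatever gen-doc wrote earlier
        copied.append(target.name)
    return copied
```

Copying after the freeze, rather than before gen-doc, is what guarantees that mike publishes the versioned files and not the development-form ones.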

Phase 2 — post-deploy-validate job (runs after build-docs, tags only):

Since we don't want to introduce errors when freezing the import statements, a post-publishing validation is executed here. This is done on a "test" branch, into which the frozen schemas are loaded and tested against the respective w3id artifacts now in place on GitHub Pages.

  1. Polls gh-pages until the new version is accessible (30 s interval, 20 attempts).
  2. Re-applies the identical freeze (idempotent — already frozen from Phase 1).
  3. Validates every frozen schema with gen-yaml — sub-schema CURIEs resolve against w3id / gh-pages, which now carries consistently versioned prefixes.
  4. Pushes frozen schemas to reference branch schema-release/{tag} (always pushed, even on failure, for maintainer inspection).
  5. Opens a GitHub Issue on validation failure.
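The polling in step 1 (30 s interval, 20 attempts, so roughly ten minutes at most) can be sketched with the standard library alone. The probe and sleep functions are injectable so the loop can be tested without network access. The function names, and which URL is probed (for instance the new version's folder under https://nfdi-de.github.io/chem-dcat-ap/), are assumptions:

```python
import time
import urllib.request
from typing import Callable


def url_is_live(url: str, timeout: float = 10.0) -> bool:
    """Return True if a GET request on url succeeds with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
        return False


def wait_until_deployed(
    url: str,
    attempts: int = 20,
    interval: float = 30.0,
    probe: Callable[[str], bool] = url_is_live,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll url until it is reachable and return the successful attempt number.

    Raises TimeoutError if the URL is still unreachable after `attempts` tries.
    """
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(interval)  # no pointless wait after the final attempt
    raise TimeoutError(f"{url} still unreachable after {attempts} attempts")
```

Raising instead of returning False lets the CI step fail loudly, which fits the "always push the reference branch, then open an issue" handling in steps 4 and 5.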

New workflows

Three new or significantly changed workflow files are introduced.

deploy-docs.yaml (modified)

Adds the two-phase freeze pipeline described above to the existing build-docs job, and introduces the new post-deploy-validate job that runs after every successful release deploy.

handle-upstream-release.yaml (new)

Runs daily at 08:00 UTC. Checks whether dcat-ap-plus has published a new latest version since the source schemas were last pinned.

What it does:

  1. Fetches versions.json from the dcat-ap-plus GitHub Pages site and reads the version tagged latest.
  2. Reads the current dcatapplus: prefix value from the source schemas to determine the currently pinned version.
  3. Skip guard A: if the source is already pinned to the upstream latest, stops silently.
  4. Skip guard B: if an open PR for this exact version already exists, stops silently.
  5. Creates branch freeze/dcatapplus-{version}, commits the updated dcatapplus: prefix value, and opens a PR with a CI checklist.
  6. Posts a notice comment on every other open PR so contributors are aware of the upstream change.
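Steps 1 to 5 boil down to three small decisions: which version upstream tags as latest, which version is currently pinned, and whether a skip guard applies. A sketch, assuming versions.json has the list format written by mike (entries with `version`, `title` and `aliases` keys) and modelling guard B as a lookup of existing PR branch names; all function names are hypothetical:

```python
import json
import re
from typing import Optional, Set

# Assumption: versions.json is the list written by mike, e.g.
# [{"version": "v0.4.0", "title": "v0.4.0", "aliases": ["latest"]}, ...]
PINNED_PREFIX = re.compile(r"https://w3id\.org/nfdi-de/dcat-ap-plus/(v[^/]+)/")


def upstream_latest(versions_json: str) -> Optional[str]:
    """Return the version that carries the 'latest' alias, or None if none does."""
    for entry in json.loads(versions_json):
        if "latest" in entry.get("aliases", []):
            return entry["version"]
    return None


def pinned_version(dcatapplus_prefix: str) -> Optional[str]:
    """Extract e.g. 'v0.3.0' from a pinned dcatapplus: prefix URI; None if it is not pinned."""
    match = PINNED_PREFIX.fullmatch(dcatapplus_prefix)
    return match.group(1) if match else None


def plan_freeze(latest: Optional[str], pinned: Optional[str], open_pr_branches: Set[str]) -> Optional[str]:
    """Return the branch to create, or None when a skip guard applies."""
    if latest is None or latest == pinned:
        return None  # skip guard A: already pinned to upstream latest (or nothing to pin to)
    branch = f"freeze/dcatapplus-{latest}"
    if branch in open_pr_branches:
        return None  # skip guard B: a PR for this exact version is already open
    return branch
```

Keeping these decisions as pure functions leaves only the HTTP fetch, the git commit and the PR creation to the workflow itself.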

Why we are not notified by dcat-ap-plus: The workflow polls the existing public versions.json file. It requires no secrets, webhooks, or permissions beyond the default GITHUB_TOKEN scoped to this repository. This keeps the change fully self-contained in chem-dcat-ap.

Why polling rather than a push event from dcat-ap-plus: A push-based approach would require adding a downstream-dispatch step to the dcat-ap-plus release workflow and granting it a cross-repository secret. Polling from the child repo avoids any changes to the parent repo and needs no cross-repo permissions. The daily schedule introduces at most a 24-hour lag, which is acceptable.

One-time repository setup required:
Settings → Actions → General → "Allow GitHub Actions to create and approve pull requests" must be enabled.

check-schema-compatibility.yaml (new)

Runs weekly on Mondays at 06:00 UTC. For every deployed version of chem-dcat-ap, fetches the frozen dcatapplus: version from the published schema and checks whether that upstream version is still accessible on GitHub Pages.

Why a compatibility page: After release, the upstream schema it depends on could theoretically be removed or moved (e.g. if dcat-ap-plus deletes old versions from gh-pages). Downstream consumers of older chem-dcat-ap versions would then face broken imports without any visible warning. The compatibility matrix and badge give maintainers and users an at-a-glance view of which released versions are still fully resolvable.
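The per-version check can be sketched as follows, assuming the published schema text for each version has already been downloaded and that the `dcatapplus:` prefix appears unquoted on its own line in the prefixes block. Reachability is passed in as a function (for example, a wrapper around an HTTP GET) so the matrix logic stays testable; the names are hypothetical and not those of check_compatibility.py:

```python
import re
from typing import Callable, Dict, Optional

# Assumption: the published YAML declares the upstream prefix on its own line, e.g.
#   dcatapplus: https://w3id.org/nfdi-de/dcat-ap-plus/v0.3.0/
DCATAPPLUS_LINE = re.compile(r"^\s*dcatapplus:\s*(\S+)\s*$", re.MULTILINE)


def frozen_upstream(schema_text: str) -> Optional[str]:
    """Return the dcatapplus: prefix URI found in a published schema, if any."""
    match = DCATAPPLUS_LINE.search(schema_text)
    return match.group(1) if match else None


def compatibility_matrix(
    published: Dict[str, str], is_live: Callable[[str], bool]
) -> Dict[str, dict]:
    """Map each chem-dcat-ap version to its frozen upstream URI and whether it still resolves."""
    rows = {}
    for version, schema_text in sorted(published.items()):
        upstream = frozen_upstream(schema_text)
        rows[version] = {
            "upstream": upstream,
            "resolvable": upstream is not None and is_live(upstream),
        }
    return rows
```

Import lines such as `- dcatapplus:schema/dcat_ap_plus` are not matched, because the pattern requires the key to be the first non-blank token on the line.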

Outputs:

  • badge.json — shields.io endpoint badge pushed to the gh-pages root; embed in the README:
    ![schema deps](https://img.shields.io/endpoint?url=https://nfdi-de.github.io/chem-dcat-ap/badge.json)
  • compatibility.html — standalone HTML matrix pushed to the gh-pages root.
  • compatibility.md — MkDocs-compatible markdown committed to main; rebuilt into the docs site by the subsequent deploy-docs run.
  • GitHub Issue with label schema-dep-stale opened when latest becomes stale; closed automatically when it resolves.
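badge.json follows the shields.io endpoint format: a JSON object with `schemaVersion` set to 1 plus `label`, `message` and `color`. A sketch of how compatibility rows might be collapsed into that payload, assuming each row carries a `resolvable` flag; the label mirrors the alt text above, while the messages and colors are assumptions:

```python
import json
from typing import Dict


def badge_payload(rows: Dict[str, dict], label: str = "schema deps") -> dict:
    """Collapse compatibility rows into a shields.io endpoint badge payload."""
    stale = [version for version, row in rows.items() if not row["resolvable"]]
    if stale:
        message, color = f"{len(stale)} stale", "red"
    else:
        message, color = "all resolvable", "brightgreen"
    return {"schemaVersion": 1, "label": label, "message": message, "color": color}


if __name__ == "__main__":
    rows = {"v0.1.0": {"resolvable": False}, "v0.2.0": {"resolvable": True}}
    # The workflow would write this string to badge.json on the gh-pages root.
    print(json.dumps(badge_payload(rows)))
```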

Compatibility table: tracking freeze-action success

The compatibility table includes a Freeze validated column showing whether the post-deploy-validate job succeeded for each released version.

How it works: At the end of every post-deploy-validate run (success or failure), the workflow writes a freeze-status.json file to the gh-pages root. Each key is a version string; each value records validated (true/false) and a timestamp. check_compatibility.py fetches this file at the start of each weekly check and uses it to populate the column. Versions released before this feature was added show (no data).
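A sketch of both sides of freeze-status.json, the writer at the end of post-deploy-validate and the reader in the weekly check, assuming the per-version value is an object with `validated` and `timestamp` keys (the exact key names and the yes/no rendering are assumptions):

```python
import json
from datetime import datetime, timezone
from typing import Optional


def record_freeze_status(
    status_json: str, version: str, validated: bool, now: Optional[datetime] = None
) -> str:
    """Add or overwrite one version's entry in freeze-status.json and return the new JSON text."""
    status = json.loads(status_json) if status_json.strip() else {}
    stamp = (now or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    status[version] = {"validated": validated, "timestamp": stamp}
    return json.dumps(status, indent=2, sort_keys=True)


def freeze_column(status: dict, version: str) -> str:
    """Render the 'Freeze validated' cell; versions without an entry predate the feature."""
    entry = status.get(version)
    if entry is None:
        return "(no data)"
    return "yes" if entry["validated"] else "no"
```

Rewriting the whole file on every run keeps the format trivial; re-running a release's validation simply overwrites that version's entry.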

This is particularly useful if a release is tagged but validation fails silently — maintainers can see at a glance that the frozen schemas for that version should be inspected.


Implementation steps

See for_direct_implementation_at_chemdcat_ap/README.md for the complete guide. In summary:

  1. Add scripts/freeze_imports.py, scripts/check_compatibility.py, and scripts/should_update_latest.py.
  2. Enable "Allow GitHub Actions to create and approve pull requests" in repository settings.
  3. Replace .github/workflows/deploy-docs.yaml with the version in workflows/.
  4. Add .github/workflows/handle-upstream-release.yaml from workflows/.
  5. Add .github/workflows/check-schema-compatibility.yaml from workflows/.

Why not always-versioned imports on main

Keeping versioned imports directly in source would break local development: gen-doc and linkml-run-examples would always resolve to a previously-deployed version rather than the local file being edited. It also creates a chicken-and-egg problem for new sub-modules (a URL cannot exist before the first deploy). Bare imports on main with CI-managed versioning at release time gives the best of both worlds.


Future: PyPI-based dependency management

The current polling and auto-PR approach for upstream releases is a pragmatic solution given that dcat-ap-plus is distributed as a GitHub Pages site rather than a PyPI package. If dcat-ap-plus were published to PyPI in the future, this workflow could be replaced by standard dependency-management tooling such as Dependabot or a bump-my-version action — which handle version pinning, PR creation, and CI checks natively. The current polling approach is not intended as a permanent architecture, but is the most practical option until PyPI publication is feasible.


4 participants