diff --git a/docs/requirements/process_overview.rst b/docs/requirements/process_overview.rst index 64e120eb..4b660a2e 100644 --- a/docs/requirements/process_overview.rst +++ b/docs/requirements/process_overview.rst @@ -5,6 +5,23 @@ Process Requirements Overview =============================== .. needtable:: - :types: tool_req - :columns: satisfies as "Process Requirement" ;id as "Tool Requirement";implemented;source_code_link + :columns: satisfies_back as "Tool Requirement"; id as "Process Requirement";tags :style: table + + r = {} + + all = {} + for need in needs: + all[need['id']] = need + + for need in needs: + if any(tag in need['tags'] for tag in ["done_automation", "prio_1_automation", "prio_2_automation", "prio_3_automation"]): + if not "change_management" in need['tags']: + # Filter out change management related requirements + r[need['id']] = need + + for tool_req in needs.filter_types(['tool_req']): + for process_req_id in tool_req['satisfies']: + r[process_req_id] = all[process_req_id] + + results = r.values() diff --git a/docs/requirements/requirements.rst b/docs/requirements/requirements.rst index 942943af..b9c97db8 100644 --- a/docs/requirements/requirements.rst +++ b/docs/requirements/requirements.rst @@ -532,6 +532,9 @@ Architecture Attributes PROCESS_gd_req__req__linkage_fulfill :parent_covered: YES + .. note:: + TODO: link targets not clear + Docs-as-Code shall enforce that linking via the ``fulfils`` attribute follows defined rules. Allowed source and target combinations are defined in the following table: @@ -980,3 +983,16 @@ Overview of Tool to Process Requirements .. needextend:: c.this_doc() and type == 'tool_req' and not status :status: valid + +.. tool_req:: Metamodel + :id: tool_req__docs_metamodel + :tags: metamodel + :implemented: YES + + Docs-as-Code shall provide a metamodel for definining config in a `metamodel.yaml` in the source code repository. + + .. note:: "satisfied by" is something like "used by" or "required by". + +.. needextend:: "metamodel.yaml" in source_code_link + :+satisfies: tool_req__docs_metamodel + :+tags: config diff --git a/integration.md b/integration.md new file mode 100644 index 00000000..ad81d26e --- /dev/null +++ b/integration.md @@ -0,0 +1,317 @@ +# Integration Testing in a Distributed Monolith + +*RFC – Working draft. High-level overview of how our projects typically integrate +(reflecting practices used in several codebases). Assumptions and trade-offs are noted; +please flag gaps or disagreeable sections so we can iterate. Easiest way for feedback is +face to face!* + +Teams often split what is functionally a single system across many repositories. Each +repository can show a green build while the assembled system is already broken. This +article looks at how to bring system-level feedback earlier when you work that way. This +article does not argue for pull requests, trunk-based development, or continuous +integration itself. Those are well covered elsewhere. It also does not look into any +specific tools or implementations for achieving these practices - except for providing a +GitHub based example. + +The context here assumes three things: you develop through pull requests with required +checks; you have multiple interdependent repositories that ship together; and you either +have or will create a central integration repository used only for orchestration. If any +of those are absent you will need to establish them first; the rest of the discussion +builds on them. + +## Where Problems Usually Appear +An interface change (for example a renamed field in a shared schema) is updated in two +direct consumers. Their pull requests pass. Another consumer several repositories away +still depends on the old interface and only fails once the whole set of changes reaches +main and a later integration run executes. The defect was present early but only visible +late. Investigation now needs cross-repo log hunting instead of a quick fix while the +change was still in flight. + +Running full end-to-end environments on every pull request is rarely affordable. +Coordinated multi-repository changes are then handled informally through ad-hoc +ordering: “merge yours after mine”. Late detection raises cost and makes regression +origins harder to locate. + +## Core Concepts +We model the integrated system as an explicit set of (component, commit) pairs captured +in a manifest. Manifests are derived deterministically from events: a single pull +request, a coordinated group of pull requests, or a post-merge refresh. A curated fast +subset of integration tests provides pre-merge feedback; a deeper suite runs after +merge. Passing suites produce a recorded manifest (“known good”). Coordinated +multi-repository change is treated as a first-class case—we validate the set as a unit +rather than relying on merge ordering. + +Terminology (brief): +* Component – repository that participates in the assembled product (e.g. service API + repo, shared library). +* Fast subset – curated integration tests finishing in single-digit minutes (protocol + seams, migration boundaries, adapters). +* Tuple – mapping of component names to commit SHAs for one integrated build (e.g. { + users: a1c3f9d, billing: 9e02b4c }). +* Known good – tuple + metadata (timestamp, suite, manifest hash) stored for later + reproduction. + +History & context: classic continuous integration assumed a single codebase; splitting +one system across repositories reintroduces coordination issues CI was intended to +remove. This adapts familiar CI principles (frequent integration, fast feedback, +reproducibility) to a multi-repository boundary. The central integration repository is a +neutral place to define participating components, build manifests, hold +integration-specific helpers (overrides, fixtures, seam tests), and persist known-good +records. It should not contain business logic; keeping it lean reduces accidental +coupling and simplifies review. + +## Integration Workflows +We use three recurring workflows: a single pull request, a coordinated subset when +multiple pull requests must land together, and a post-merge fuller suite. Each produces +a manifest, runs an appropriate depth of tests, and may record the tuple if successful. + +### Visual Overview +```mermaid +flowchart TB + subgraph COMP[Component Repos] + pr[PR opened / updated
<event>]:::event --> comp_ci[Component tests]:::step + + trigger1[Merge to main
<event>]:::event + end + + subgraph INT[Integration Repo] + comp_ci --> |dispatch|detect_changeset[Detect multi repository PRs]:::step + knownGood[(Known good store)]:::artifact + + %% PR + detect_changeset --> buildMan[Build PR/PRs manifest using PR/PRs SHA + known good others]:::step + knownGood --> buildMan + buildMan --> runSubset[Run fast subset of integration tests]:::step + runSubset --> prFeedback[Provide Feedback in PR / all PRs]:::step + + %% Post-merge / scheduled full suite + trigger1 -->|dispatch| fullMan[Build full manifest from latest mains of all repos]:::step + trigger2[schedule
<event>]:::event --> fullMan + fullMan --> fullSuite[Run full integration test suite]:::step + fullSuite --> fullPass{Full suite pass?}:::decision + fullPass -->|Yes| knownGood + fullPass -->|No| issue["Create Issue
(or a more clever automated bisect solution)"]:::red + end +``` +*High-level flow of integration workflows. Known good store feeds manifest construction for single and coordinated paths; full test suite success updates the store.* + +### Single Pull Request +When a pull request opens or updates, its repository runs its normal fast tests. The +integration repository is also triggered with the repository name, pull request number, +and head SHA. It builds a manifest using that SHA for the changed component and the last +known-good SHAs for others, then runs the curated fast subset. The result is reported +back to the pull request. The manifest and logs are stored even when failing so a +developer can reproduce locally. + +The subset is explicit rather than dynamically inferred. Tests in it should fail quickly +when contracts or shared schemas drift. If the list grows until it is slow it will +either be disabled or ignored; regular curation keeps it useful. + +### Coordinated Multi-Repository Subset +Some changes require multiple repositories to move together (for example a schema +evolution, a cross-cutting refactor, a protocol tightening). We mark related pull +requests using a stable mechanism such as a common label (e.g. changeset:feature-x). The +integration workflow discovers all open pull requests sharing the label, builds a +manifest from their head SHAs, and runs the same fast subset. A unified status is posted +back to each pull request. None merge until the coordinated set is green. This removes +informal merge ordering as a coordination mechanism. + +### Post-Merge Full Suite +After merges we run a deeper suite. Some teams trigger on every push to main; others run +on a schedule (hourly seems to be a common practice). Per-merge runs localise failures +but cost more; batched runs save resources but expand the search space when problems +appear. When the suite fails, retaining the manifest lets you bisect between the last +known-good tuple and the current manifest (using a scripted search across the changed +SHAs if multiple components advanced). On success we append a record for the tuple with +a manifest hash and timing data. + +### Manifests +Manifests are minimal documents describing the composition. They allow reconstruction of +the integrated system later. + +Single pull request example: +``` +pr: 482 +component_under_test: + name: docs-as-code + repo: eclipse-score/docs-as-code + sha: 6bc901f2 +others: + - name: component-a + repo: eclipse-score/component-a + ref: 34985hf8 # based on last known-good + - name: component-b + repo: eclipse-score/component-b + ref: a4fd56re # based on last known-good +subset: pr_fast +timestamp: 2025-08-13T12:14:03Z +``` + +Coordinated example: +``` +components_under_test: + - name: users-service + repo: eclipse-score/users-service + branch: feature/new_email_index + ref: a57hrdfg + pr: 16 + - name: auth-service + repo: eclipse-score/auth-service + branch: feature/lenient-token-parser + ref: q928d46b75 + pr: 150 +others: + - name: billing-service + repo: eclipse-score/billing-service + ref: a4fd56re # based on last known-good +subset: pr_fast +changeset: feature-x +``` + +Large configuration belongs elsewhere; manifests should stay readable and diffable. + +## Example: GitHub Actions (Conceptual) +*Conceptual outline; not yet implemented here.* + +Trigger from a component repository: +``` +name: integration-pr +on: [pull_request] +jobs: + dispatch: + runs-on: ubuntu-latest + steps: + - name: Dispatch to integration repo + uses: peter-evans/repository-dispatch@v3 + with: + token: ${{ secrets.INTEGRATION_TRIGGER_TOKEN }} + repository: eclipse-score/reference_integration + event-type: pr-integration + client-payload: >- + {"repo":"${{ github.repository }}","pr":"${{ github.event.pull_request.number }}","sha":"${{ github.sha }}"} +``` + +Integration repository receiver (subset): +``` +on: + repository_dispatch: + types: [pr-integration] +jobs: + pr-fast-subset: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Parse payload + run: echo '${{ toJson(github.event.client_payload) }}' > payload.json + + - name: Materialize composition + run: gen_pr_manifest.py last_known_good.yaml payload.json > manifest.pr.yaml + + - name: Render MODULE overrides + run: render_overrides.py manifest.pr.yaml > MODULE.override.bzl + + - name: Bazel test (subset) + run: bazel test //integration/subset:pr_fast --override_module_files=MODULE.override.bzl + + - name: Store manifest & results + uses: actions/upload-artifact@v4 + with: + name: pr-subset-${{ github.run_id }} + path: | + manifest.pr.yaml + bazel-testlogs/**/test.log +``` + +Post-merge full suite: +``` +on: + schedule: [{cron: "15 * * * *"}] + repository_dispatch: + types: [component-merged] +jobs: + full-suite: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Generate new last_known_good.yaml + run: update_last_known_good.py last_known_good.yaml > last_known_good.yaml + + - name: Bazel test (full) + run: bazel test //integration/full:all --test_tag_filters=-flaky + + - name: Persist known-good tuple (on success) + if: success() + run: | + git add last_known_good.yaml + git commit -m "update known good" + + - name: Upload artifacts + uses: actions/upload-artifact@v4 + with: + name: full-${{ github.run_id }} + path: | + bazel-testlogs/**/test.log +``` + +### Recording Known-Good Tuples +Known-good records are stored append-only. +``` +[ + { + "timestamp": "2025-08-13T12:55:10Z", + "tuple": { + "docs-as-code": "6bc901f2", + "component-a": "91c0d4e1", + "component-b": "a44f0cd9" + }, + "manifest_sha256": "4c9b7f...", + "suite": "full", + "duration_s": 742 + } +] +``` +Persisting enables reproduction (attach manifest to a defect), audit (what exactly +passed before a release), gating (choose any known-good tuple), and comparison (diff +manifests to isolate drift) without relying on (rather fragile) links to unique runs in +your CI system. + +## Operating It +**Curating the fast subset:** Tests should fail quickly when public seams change. Keep +the list explicit (e.g. //integration/subset:pr_fast). Remove redundant tests and +quarantine flaky ones; review periodically (monthly or after significant interface +churn) to preserve signal. + +**Handling failures:** For a failing pull request subset: inspect manifest + log; +reproduce locally with a script consuming the manifest. For a failing coordinated set: +treat all related pull requests as atomic. For a failing post-merge full suite: bisect +between the last known-good tuple and current manifest (script permutations if multiple +repositories changed) to narrow cause. Distinguish real regressions from test fragility. + +**Trade-offs and choices:** Manifests + SHAs avoid tag noise and keep validation close +to heads. Two tiers (subset + full) offer a clear mental model; add more only with +evidence. A central orchestration repository centralises caching, secrets, and audit +history. + +**Practical notes:** Cache builds to stabilise subset runtime. Hash manifests (e.g. +SHA-256) for concise references. Expose an endpoint or badge showing the latest known +good. Generate overrides; do not hand-edit ephemeral files. Optionally lint the subset +target for allowed directories. + +**Avoiding pitfalls:** Diff-based dynamic test selection often misses schema or contract +drift. Ad-hoc manual edits to integration config reduce reproducibility. Merge ordering +as coordination defers detection to the last merge. + +**Signs it is working:** Interface breakage is caught pre-merge. Coordinated change sets +show unified status. Multi-repository regressions are localised rapidly using stored +manifests. + +## Summary +By expressing the integrated system as explicit manifests, curating a fast integration +subset for pull requests, and running a deeper post-merge suite, you move discovery of +cross-repository breakage earlier while keeping costs predictable. Each successful run +leaves a reproducible record, making release selection and debugging straightforward. +The approach lets a distributed codebase behave operationally like a single one. + +*Further reading:* Continuous Integration (Fowler), Continuous Delivery (Humble & +Farley), trunk-based development resources.