73 changes: 48 additions & 25 deletions DISCREPANCIES.md
Expand Up @@ -3,11 +3,15 @@
When the CLAUDE.md plan and the two CSA whitepapers disagree, the whitepapers win.
This file logs every conflict found during implementation so Jim can confirm the resolution.

> **2026-06-12 update:** the **published** versions of both papers (June 2026, now in
> `docs/methodology/`) were reviewed against this list. D2 and D5 are resolved by the
> published text + Jim's OQ answers; D1, D3, D4 remain genuinely open; D6 is new.

---

## D1. Letter-grade band gaps
## D1. Letter-grade band gaps — ⚠️ STILL OPEN (now in the published paper)

**Source:** Concept Paper states `A: 900–1000 · B: 800–890 · C: 700–790 · D: 600–690 · F: 0–590`
**Source:** Concept Paper (published June 2026, §9.2) still states `A: 900–1000 · B: 800–890 · C: 700–790 · D: 600–690 · F: 0–590`

**Problem:** The bands leave a nine-point gap between adjacent tiers (890→900, 790→800, 690→700, 590→600). Scores of 891–899, 791–799, 691–699, and 591–599 fall into no band.

Expand All @@ -18,56 +22,75 @@ This file logs every conflict found during implementation so Jim can confirm the
- D: 600–699
- F: ≤ 599

**Needs Jim's confirmation:** Yes — confirm this is the intended interpretation before the grade logic ships.
**Decision (Jim, 2026-06-13):** Keep the contiguous-band interpretation in the platform and **communicate it to partners in writing** via the prototype status memo (`partner-kit/PROTOTYPE_STATUS_MEMO.md` §5c), so no partner is surprised by how a 891–899 / 991+ score grades. Recommend a paper erratum at V2 spec finalization. No code change needed — `consensus.compute_grade` already implements contiguous bands.
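
A minimal sketch of the contiguous-band interpretation, for illustration only. It is not the actual `consensus.compute_grade`, whose signature is not shown here. The name `grade_for` is hypothetical, integer scores are assumed, and the A/B/C floors are inferred from the D and F rows plus the paper's band starts.

```python
def grade_for(score_1000: int) -> str:
    """Map a 0-1000 display score to a letter grade using contiguous bands."""
    if not 0 <= score_1000 <= 1000:
        raise ValueError(f"score out of range: {score_1000}")
    # Each floor is inclusive, so 891-899 grades as B instead of falling in a gap.
    # Floors for A, B and C are assumptions inferred from the contiguous reading.
    for floor, letter in ((900, "A"), (800, "B"), (700, "C"), (600, "D")):
        if score_1000 >= floor:
            return letter
    return "F"
```

Under this reading a score of 895 grades as B and 599 grades as F, which is the behavior the status memo needs to spell out for partners.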

---

## D2. Score polarity (TRS vs. display score)

**Source:** The *Scoring Methodology* paper defines TRS ∈ [0,1] where **higher = worse** (more risk). The *Concept Paper* defines a `Resilience Score = e^(−α·r)` and letter-grade bands (A 900–1000) where **higher = better**.
## D2. Score polarity (TRS vs. display score) — ✅ RESOLVED (OQ#1 + published papers)

**Problem:** These two conventions are in direct conflict. A raw TRS of 0.9 is "nearly unacceptable risk" in the Scoring Methodology paper, but maps to a 0–1000 display score of 100 (F grade) — which is correct but non-obvious without the transform. The UI must never mix the two.
**Source:** The published *Scoring Methodology* (§3.2) defines TRS ∈ [0,1] where **higher = worse**. The published *Concept Paper* (§8) describes "the Total Risk Score (TRS), ranging from 0 (the model failed on every attack) to 1000 (the model never failed)" — i.e. **0–1000, higher = better**, under the same name. See also D6.

**Resolution applied (per CLAUDE.md §2.3):**
**Resolution applied (per OQ#1, confirmed 2026-06-05):**
- Store `trs` (raw, 0–1, higher = worse) in all DB rows.
- Derive `score_1000 = round(1000 × (1 − trs))` (0–1000, higher = better) for all public display.
- The API returns both; the UI only renders `score_1000` and letter grade.
- TRS action bands (§2.3) are shown only on the Methodology page as an educational explainer, not on leaderboard rows.
- TRS action bands are shown only on the Methodology page as an educational explainer.
- The submission API enforces a **polarity consistency gate**: `trs` is cross-checked
against the weighted pillar composite (tolerance ±0.25) and gross mismatches are
rejected with a polarity hint (`tool/app/crud.py`).

**Needs Jim's confirmation:** Yes — polarity choice is Open Question #1.
**Status:** Resolved and mechanically enforced. No further action.
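
The transform and the polarity gate can be sketched as follows. This is an illustration under stated assumptions, not the code in `tool/app/crud.py`: the function names are hypothetical, and the pillar weights default to equal because the real weights are not given in this file.

```python
from typing import Mapping, Optional


def score_1000(trs: float) -> int:
    """Derive the public display score (higher = better) from raw TRS (higher = worse)."""
    if not 0.0 <= trs <= 1.0:
        raise ValueError(f"trs must be in [0, 1], got {trs}")
    return round(1000 * (1 - trs))


def polarity_consistent(
    trs: float,
    pillar_scores: Mapping[str, float],
    weights: Optional[Mapping[str, float]] = None,
    tolerance: float = 0.25,
) -> bool:
    """Check that trs roughly agrees with the risk implied by the pillar scores."""
    if weights is None:
        # Assumption: equal weights; the real pillar weights are not shown here.
        weights = {name: 1.0 for name in pillar_scores}
    total_weight = sum(weights[name] for name in pillar_scores)
    composite = sum(weights[n] * s for n, s in pillar_scores.items()) / total_weight
    implied_trs = 1 - composite / 1000  # pillars are 0-1000, higher = better
    return abs(trs - implied_trs) <= tolerance
```

A submission that reports `trs = 0.83` alongside pillar scores in the 800s would fail this check, which is the typical signature of a partner sending the resilience value in the risk field.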

---

## D3. Dynamic penalty formula application scope
## D3. Submission depth: pillar summaries vs. test-case detail — ✅ RESOLVED (Jim, 2026-06-13)

**Source:** The Scoring Methodology paper defines the dynamic penalty `W_adj = W_tc × e^(α×ASR)` at the individual test-case level for TRS computation. The Concept Paper's `Resilience = e^(−α×r)` applies at the per-pillar level.

**Source:** The Scoring Methodology paper defines the dynamic penalty `W_adj = W_tc × e^(α×ASR)` at the individual test-case level for TRS computation. The Concept Paper's `Resilience = e^(−α×r)` appears to apply at the per-pillar level.
**Problem:** It was unclear whether scanners must submit test-case-level data or only per-pillar summaries.

**Problem:** It is unclear whether the exponential penalty is applied at test-case granularity (fine-grained, requires full test-case data) or at pillar summary level (coarse, only requires per-pillar ASR). Partners submitting only summary-level pillar scores cannot reproduce the test-case-level calculation.
**Decision (Jim):** *"Summaries are sufficient to drive user traffic to partners, but access to details for quality control and bug detection is preferred."*

**Resolution applied:** The `ScanSubmission` entity stores per-pillar scores as the canonical atomic unit for consensus math. `TestCaseResult` rows are optional depth for evidence drill-down. The consensus formula uses pillar scores, not raw test cases. The test-case penalty is the scanner's internal concern.
**Resolution applied:**
- The six `pillar_scores` remain the **canonical, required, and sufficient** input — they alone drive consensus, the leaderboard, and partner routing. The test-case penalty math is the scanner's internal concern.
- Test-case detail is now an **optional, requested** passthrough: `test_case_results[]` on `POST /api/submissions` (and batch), persisted to the `test_case_results` table, **never entering consensus math**, and exposed to CSA QC via `GET /api/admin/submissions/{id}/test-cases`. This realizes the "preferred for QC and bug detection" half of the decision technically.
- Documented in PARTNER_GUIDE §4 ("summaries drive the score, detail drives quality") and the partner submission schema.
- *(This separation is also why the D2 polarity gate uses a generous ±0.25 tolerance — partner TRS legitimately differs from the naive pillar composite.)*

**Needs Jim's confirmation:** Confirm whether CSA requires scanners to submit test-case-level data or only pillar summaries.
**Status:** Resolved and implemented.

---

## D4. α (alpha) parameter value
## D4. α (alpha) parameter value — 🤝 PARTNER DISCUSSION ITEM (Jim, 2026-06-13)

**Source:** The published Concept Paper §9.1: α "set in V1 to 15. The α parameter is preserved for backward compatibility; tuning per service type is open for review during V2 spec finalization."

**Problem:** The MethodologyVersion entity has `alpha_by_service_type{}` but no canonical V2 values are locked.

**Resolution applied:** Seeded with `{"model": 15, "mcp_server": 15, "agent": 15}` (V1 carryover) in the synthetic data, marked provisional.

**Decision (Jim):** Carry as an **item for discussion with the scanner partners** — the partners run the engines that produce the ASR distributions α reshapes, so per-service-type tuning should be set with their input rather than unilaterally. Listed as an open item in the partner status memo (`partner-kit/PROTOTYPE_STATUS_MEMO.md`). Lands at V2 spec finalization.
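
To make the partner discussion concrete, the two formulas quoted under D3 can be evaluated directly. This is a small sketch: the function names are hypothetical, and `r` and `ASR` are assumed to be fractions in [0, 1], which this file does not state explicitly.

```python
import math


def resilience(r: float, alpha: float = 15.0) -> float:
    """Concept Paper form: Resilience = e^(-alpha * r)."""
    return math.exp(-alpha * r)


def adjusted_weight(w_tc: float, asr: float, alpha: float = 15.0) -> float:
    """Scoring Methodology form: W_adj = W_tc * e^(alpha * ASR)."""
    return w_tc * math.exp(alpha * asr)


if __name__ == "__main__":
    # Compare the provisional alpha = 15 with a hypothetical lower value.
    for alpha in (15.0, 5.0):
        row = ", ".join(f"r={r:.2f}: {resilience(r, alpha):.3f}" for r in (0.0, 0.05, 0.10, 0.20))
        print(f"alpha={alpha:>4}: {row}")
```

Running the script prints how quickly resilience decays as `r` grows for each α. Because the curve is steep at α = 15, small differences between partner ASR distributions can move scores substantially, which is the reason for tuning it with partner input.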

---

**Source:** CLAUDE.md §2.3 notes "V1 used 15; per-service-type tuning is open for V2." The Scoring Methodology paper does not specify a V2 α value.
## D5. Confidence Index vs. Coverage Index naming — ✅ RESOLVED (OQ#3)

**Problem:** The MethodologyVersion entity has `alpha_by_service_type{}` but no canonical values are locked for V2.
**Source:** Concept Paper §9.3 flags the naming refinement but does **not** commit it ("deferred to specification finalization").

**Resolution applied:** Seeded with `{"model": 15, "mcp_server": 15, "agent": 15}` (V1 carryover) in the synthetic data. Marked as provisional.
**Resolution applied (per OQ#3, confirmed 2026-06-05):** "Confidence Index", C = N, no saturation curve. Used consistently in all prototype code and copy.

**Needs Jim's confirmation:** What are the V2 α values per service type?
**Status:** Resolved for launch. The published paper keeps the door open for V2.1 — revisit at the amendment cycle.

---

## D5. Confidence Index vs. Coverage Index naming
## D6. "TRS" means two different things across the two published papers — 🆕 NEW (found 2026-06-12)

**Source:** Concept Paper §9 introduces "Confidence Index" but the CLAUDE.md notes it "may be relabeled 'Coverage Index' / 'Breadth.'"
**Source:**
- *Scoring Methodology* (published June 2026, §1 + §3.2): "TRS ∈ [0, 1] where zero indicates no failure, one indicates total failure across all attacks" — **risk polarity**.
- *Concept Paper* (published June 2026, §8): "a single numerical score, the Total Risk Score (TRS), ranging from 0 (the model failed on every attack) to 1000 (the model never failed)" — **resilience polarity, different scale**.

**Problem:** Naming is not finalized. Prototypes use "Confidence Index" throughout.
**Problem:** The same term ("Total Risk Score / TRS") is published with opposite polarity and different scales in the two authoritative documents. Anyone implementing from the papers alone could build either. This is the root cause of D2.

**Resolution applied:** Using "Confidence Index" in all prototype code and copy, with a TODO token `<!-- TODO: confirm CI vs Coverage Index naming (OQ#3) -->` in templates.
**Resolution applied:** The site/API/partner-guide vocabulary is now strictly: **"TRS" = the Scoring Methodology's [0,1] risk score (internal + API field `trs`)**, and **"RiskRubric Score" = the 0–1000 higher-is-safer display number** (what the Concept Paper §8 calls "TRS"). The PARTNER_GUIDE's polarity contract (§4) and the API's polarity gate enforce it mechanically.

**Needs Jim's confirmation:** Open Question #3.
**Needs Jim's confirmation:** Recommend an erratum to the Concept Paper §8 (call the 0–1000 number "RiskRubric Score", reserve "TRS" for the [0,1] risk score) before partner engineering teams read both papers side by side.
52 changes: 39 additions & 13 deletions docs/SYSTEM_OVERVIEW.md
Expand Up @@ -238,7 +238,7 @@ riskrubric-v2/

Base URL (production): `https://riskrubric.ai` (or `http://localhost:8006` in Docker)

All responses are JSON. All public GET endpoints require no authentication. POST endpoints require an `X-Api-Key` header (scanner key or admin key, depending on endpoint).
All responses are JSON. All public GET endpoints require no authentication. Partner endpoints require the scanner API key as `Authorization: Bearer rrk_…` (or legacy `X-Scanner-Key`); admin endpoints require `X-Admin-Key`. Full partner-facing documentation: [`partner-kit/PARTNER_GUIDE.md`](../partner-kit/PARTNER_GUIDE.md).

### Public read endpoints

Expand Down Expand Up @@ -336,13 +336,13 @@ Machine-readable `methodology.schema.json` (same file as the downloadable artifa

Returns an array of service+consensus+submissions objects for up to 4 services. Used by the Compare & Diverge prototype.

### Partner submission endpoint
### Partner submission endpoints

#### `POST /api/submissions`

**Auth:** `X-Api-Key: <scanner-api-key>`
**Auth:** `Authorization: Bearer rrk_live_…` (scanner key; `X-Scanner-Key` also accepted)

**Request body:**
**Request body** (full schema: `partner-kit/submission.schema.json`):
```json
{
"service_id": "svc-001",
Expand All @@ -356,42 +356,68 @@ Returns an array of service+consensus+submissions objects for up to 4 services.
"safety": 835,
"excessive_agency": 820
},
"idempotency_key": "svc-001-2026-07-15-run1",
"scan_started_at": "2026-07-15T08:00:00Z",
"scan_completed_at": "2026-07-15T11:30:00Z",
"coi_disclosed": false,
"coi_note": null,
"evidence_uri": "https://pointguardai.com/evidence/run-20260715",
"reproducibility_runs": 3,
"native_categories": {"optional": "scanner-native taxonomy passthrough"},
"category_mapping_version": "partner-mapping-v1",
"engine_version": "scanner-2.4.1",
"is_synthetic": false
}
```

**Validation gates (any failure → `status: rejected`):**
**Validation gates (any failure → stored as `rejected`, returned as `422`):**
1. All six pillar scores present and in [0, 1000]
2. TRS in [0, 1]
3. `methodology_version_id` is a known, published version
4. Scanner's `covered_service_types` includes the target service's type
5. COI: if scanner org affiliates the service vendor and `coi_disclosed != true` → reject
5. COI: if scanner org/affiliate matches the service vendor and `coi_disclosed != true` → reject
6. Polarity consistency: `trs` vs. weighted pillar composite within ±0.25

**Response:** `201 Created` with the created `ScanSubmission` (status: `received`). A rejected submission returns `201` with `status: rejected` and `validation_errors` populated.
Advisory warnings (stored, never auto-reject): `reproducibility_runs < 2`, missing `scan_completed_at`.
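
The range gates (1 and 2) and the advisory warnings can be sketched as a single pre-flight check that a partner might run before submitting. This is a hedged illustration: `check_submission` and its message strings are hypothetical, and gates 3 to 6 are omitted because they need server-side data.

```python
from typing import Any


def check_submission(payload: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for the locally checkable gates."""
    errors: list[str] = []
    warnings: list[str] = []

    # Gate 1: six pillar scores, each in [0, 1000].
    pillars = payload.get("pillar_scores") or {}
    if len(pillars) != 6:
        errors.append(f"expected 6 pillar scores, got {len(pillars)}")
    for name, value in pillars.items():
        if not isinstance(value, (int, float)) or not 0 <= value <= 1000:
            errors.append(f"pillar '{name}' out of range [0, 1000]")

    # Gate 2: raw TRS in [0, 1].
    trs = payload.get("trs")
    if not isinstance(trs, (int, float)) or not 0 <= trs <= 1:
        errors.append("trs must be a number in [0, 1]")

    # Advisory warnings: recorded, never a reason to reject.
    if (payload.get("reproducibility_runs") or 0) < 2:
        warnings.append("reproducibility_runs < 2")
    if not payload.get("scan_completed_at"):
        warnings.append("missing scan_completed_at")

    return errors, warnings
```

Keeping errors and warnings in separate lists mirrors the split described above: only the first list would lead to a `422`.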

**Responses:** `201` (received) · `200` + `X-Idempotent-Replay: true` (idempotency-key replay) · `422` (rejected — body carries machine-readable `validation_errors`; the rejected record is retained append-only for audit).
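
A partner client could build the request and interpret the three documented outcomes roughly as below. This is a sketch using only the standard library. The helper names are hypothetical, the key value is a placeholder, and nothing here is taken from the partner kit's own client code.

```python
import json
import urllib.request
from typing import Any, Mapping


def build_submission_request(base_url: str, api_key: str, payload: Mapping[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST /api/submissions request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/submissions",
        data=json.dumps(dict(payload)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def interpret_response(status: int, headers: Mapping[str, str]) -> str:
    """Map the documented status codes to an outcome label."""
    # A plain dict is case-sensitive; real HTTP header objects usually are not.
    if status == 201:
        return "received"
    if status == 200 and headers.get("X-Idempotent-Replay") == "true":
        return "replayed"
    if status == 422:
        return "rejected"  # body carries validation_errors
    return "unexpected"
```

Sending the request with `urllib.request.urlopen` raises `urllib.error.HTTPError` for a `422`, so a real client would catch that exception and read `validation_errors` from its body.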

#### `POST /api/submissions/batch`

Bulk ingestion — up to **500 items per call**, processed independently; always `200` with index-aligned per-item results (`received`/`rejected`/`replayed`/`error` + errors/warnings). Built for large partner backfills (e.g. PointGuard's MCP-server corpus).
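
For a backfill larger than the per-call limit, a client needs to split its corpus into batches of at most 500. A minimal sketch, assuming the items are already valid submission dictionaries (`chunk_submissions` is a hypothetical helper name):

```python
from typing import Iterator, Sequence

MAX_BATCH_ITEMS = 500  # per-call limit stated for POST /api/submissions/batch


def chunk_submissions(items: Sequence[dict], size: int = MAX_BATCH_ITEMS) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

Because per-item results are index-aligned, result `i` of batch `k` corresponds to item `k * 500 + i` of the original list, which lets a client retry only the failed items.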

#### `GET /api/submissions` · `GET /api/submissions/{id}`

Scanner-scoped status polling (each scanner sees only its own submissions).

#### Webhooks

Scanners with a registered `webhook_url` receive `submission.status_changed` POSTs on every transition, HMAC-SHA256 signed (`X-RiskRubric-Signature`). Best-effort delivery; polling is the source of truth.
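
On the receiving side, a scanner would verify the signature before trusting a webhook. The sketch below assumes the header carries the lowercase hex HMAC-SHA256 digest of the raw request body, keyed with a shared secret. That encoding is an assumption, not something this document specifies, and the function names are hypothetical.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Compute the hex HMAC-SHA256 digest of the raw body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, header_value: str) -> bool:
    """Compare the X-RiskRubric-Signature value against the expected digest."""
    expected = sign_body(secret, body)
    # compare_digest runs in constant time, which avoids leaking match length.
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))
```

Verification must use the exact bytes received, before any JSON parsing or re-serialization, since whitespace changes alter the digest.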

### Admin endpoints (CSA staff only)

**Auth:** `X-Api-Key: <admin-key>`
**Auth:** `X-Admin-Key: <admin-key>`

| Method | Path | Action |
|---|---|---|
| `POST` | `/api/admin/submissions/{id}/validate` | Move `received` → `validated` |
| `POST` | `/api/admin/submissions/{id}/publish` | Move `validated` → `published`; triggers consensus recompute |
| `POST` | `/api/admin/submissions/{id}/reject` | Move any → `rejected` with reason |
| `POST` | `/api/admin/submissions/{id}/reject` | Reject with reason (pre-publication only) |
| `POST` | `/api/admin/submissions/{id}/withdraw` | Withdraw; published submissions leave consensus immediately |
| `POST` | `/api/admin/submissions/{id}/dispute` | Move `published` → `disputed` (stub; full workflow TBD) |
| `POST` | `/api/admin/scanners/{slug}/suspend` | Suspend scanner; its published scores leave consensus (auto-recompute all affected services) |
| `POST` | `/api/admin/scanners/{slug}/reinstate` | Reinstate; scores re-enter consensus |
| `POST` | `/api/admin/scanners/{slug}/keys` | Issue/rotate the scanner's API key (raw key shown once) |
| `GET` | `/api/admin/audit?entity_type=&entity_id=&limit=` | Audit log query |
| `POST` | `/api/admin/consensus/recompute/{service_id}` | Manual consensus refresh for a service |

### Submission status lifecycle
### Submission status lifecycle (enforced state machine — illegal transitions → 409)

```
received ──(admin validate)──► validated ──(admin publish)──► published
│ │
└──(admin reject)──► rejected (admin dispute)──► disputed
received ──(validate)──► validated ──(publish)──► published ──(dispute)──► disputed
│ │ │ │
└──(reject)──► rejected ◄┘ (withdraw)──► withdrawn ◄──(withdraw)
[terminal — resubmit instead] [recomputes consensus if was published]
```
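
One way to enforce such a lifecycle is a lookup table of allowed transitions. The sketch below encodes one reading of the diagram: reject from `received` or `validated`, withdraw from `published` or `disputed`, and both `rejected` and `withdrawn` terminal. The diagram's whitespace is ambiguous, so the exact edge set is an assumption, and the names are hypothetical.

```python
ALLOWED_TRANSITIONS: dict[str, set[str]] = {
    "received": {"validated", "rejected"},
    "validated": {"published", "rejected"},
    "published": {"disputed", "withdrawn"},
    "disputed": {"withdrawn"},
    "rejected": set(),   # terminal: resubmit instead
    "withdrawn": set(),  # assumed terminal
}


class IllegalTransition(Exception):
    """Raised for a transition the state machine does not allow."""

    http_status = 409


def transition(current: str, target: str) -> str:
    """Return the new status, or raise IllegalTransition (mapped to HTTP 409)."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise IllegalTransition(f"illegal transition: {current} -> {target}")
    return target
```

A handler would call `transition` before persisting the change and, when the previous status was `published`, trigger the consensus recompute.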

---
Expand Down
Binary file not shown.
Binary file added docs/methodology/riskrubric-v2-concept-paper.pdf
Binary file not shown.
1 change: 0 additions & 1 deletion methodology.schema.json
Expand Up @@ -6,7 +6,6 @@
"version": "2.0.0",
"released_at": "2026-07-29",
"steward": "Cloud Security Alliance (CSA) / CSAI Foundation",
"is_synthetic": true,

"service_types": [
{
Expand Down