🐛 RBAC: Conditional permissions are removed on restart and cannot be re-created #9429

@MaciekStrzelbicki

Workspace

rbac

📜 Description

During RBAC conditional policy reload, reconciliation applies removals before additions.
When Catalog permission metadata is temporarily unavailable, the add phase fails with metadata retrieval errors and 503 responses.

As a result, previously valid conditional policies are removed and not restored, causing loss of effective Catalog read conditions for non-admin users.
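The failure mode described above can be illustrated with a small in-memory model. This is only a sketch: `Condition`, `MetadataLookup`, and `removeThenAdd` are hypothetical names, not the RBAC plugin's actual code, and the real metadata lookup is an asynchronous HTTP call rather than a synchronous function. The only assumption carried over from the report is the ordering: removals are applied before additions are validated.

```typescript
// Hypothetical model of the reported ordering; not the RBAC plugin's real API.
type Condition = { id: string; pluginId: string; permission: string };

// Stand-in for the Catalog permission metadata lookup (synchronous here for brevity).
type MetadataLookup = (pluginId: string) => string[];

function removeThenAdd(
  store: Map<string, Condition>,
  desired: Condition[],
  lookup: MetadataLookup,
): void {
  // Removal phase runs first and mutates live state immediately.
  store.clear();

  // Add phase depends on metadata; a 503 here aborts with the store already emptied.
  for (const c of desired) {
    const known = lookup(c.pluginId);
    if (!known.includes(c.permission)) {
      throw new Error(`Unknown permission ${c.permission}`);
    }
    store.set(c.id, c);
  }
}

const allUsersRead: Condition = {
  id: "all-users-catalog-read",
  pluginId: "catalog",
  permission: "catalog.entity.read",
};

const liveStore = new Map<string, Condition>([[allUsersRead.id, allUsersRead]]);

// Simulates the temporary 503 from the metadata endpoint.
const unavailable: MetadataLookup = () => {
  throw new Error("Unable to get permission list for plugin catalog");
};

try {
  removeThenAdd(liveStore, [allUsersRead], unavailable);
} catch (err) {
  console.log((err as Error).message);
}

console.log(liveStore.size); // 0: the condition was removed and never restored
```

In this model the store stays empty until some later reconciliation succeeds, which matches the observed loss of the all-users condition.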

👍 Expected behavior

Conditional policy reconciliation should be non-destructive.
If the metadata lookup fails or returns a temporary 503, existing conditions must remain unchanged.
No condition should be removed unless replacement/addition has completed successfully.
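One way to get this behavior is to validate every addition against the metadata first and only touch the live state once all of them have passed. The sketch below assumes the same simplified in-memory model as the rest of this report; `Condition`, `MetadataLookup`, and `stagedReconcile` are hypothetical names rather than existing plugin functions, and a real implementation would be asynchronous and would likely wrap the swap in a database transaction.

```typescript
// Hypothetical sketch of a non-destructive reload; not the RBAC plugin's real API.
type Condition = { id: string; pluginId: string; permission: string };

// Stand-in for the Catalog permission metadata lookup (synchronous here for brevity).
type MetadataLookup = (pluginId: string) => string[];

// Returns true if the store was updated, false if it was left untouched.
function stagedReconcile(
  store: Map<string, Condition>,
  desired: Condition[],
  lookup: MetadataLookup,
): boolean {
  // Phase 1: resolve metadata and validate all additions without mutating the store.
  const staged = new Map<string, Condition>();
  try {
    for (const c of desired) {
      const known = lookup(c.pluginId);
      if (!known.includes(c.permission)) {
        throw new Error(`Unknown permission ${c.permission}`);
      }
      staged.set(c.id, c);
    }
  } catch {
    // Metadata unavailable (for example a 503) or invalid: keep existing conditions.
    return false;
  }

  // Phase 2: every addition is known to be valid, so the swap is now safe.
  store.clear();
  staged.forEach((c, id) => store.set(id, c));
  return true;
}

const baseline: Condition = {
  id: "all-users-catalog-read",
  pluginId: "catalog",
  permission: "catalog.entity.read",
};

const safeStore = new Map<string, Condition>([[baseline.id, baseline]]);

// Simulates the temporary 503 from the metadata endpoint.
const failingLookup: MetadataLookup = () => {
  throw new Error("Unable to get permission list for plugin catalog");
};
const workingLookup: MetadataLookup = () => ["catalog.entity.read"];

const appliedDuringOutage = stagedReconcile(safeStore, [baseline], failingLookup);
const appliedAfterRecovery = stagedReconcile(safeStore, [baseline], workingLookup);

console.log(appliedDuringOutage, safeStore.has(baseline.id));  // false true
console.log(appliedAfterRecovery, safeStore.has(baseline.id)); // true true
```

The design choice is that a failed reload becomes a no-op that can be retried later, instead of leaving authorization state partially applied.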

👎 Actual Behavior with Screenshots

Observed during the PROD release window:

  1. Runtime logs show permission metadata errors, including:
    Unable to get permission list for plugin catalog

  2. The Catalog permission metadata endpoint returns a temporary 503 Service Unavailable.

  3. The conditional Catalog read policy for all-users is removed and not re-created.

  4. Non-admin users lose their expected effective Catalog read access.

👟 Reproduction steps

  1. Configure an RBAC conditional policy for Catalog read (including a baseline all-users condition).
  2. Trigger an RBAC conditional policy reload/reconciliation (for example, during a restart or release).
  3. Introduce or hit temporary Catalog permission metadata unavailability (503).
  4. Observe the add-phase failure in the logs:
    Unable to get permission list for plugin catalog
  5. Verify that the condition removals were applied but the additions were not.
  6. Confirm that non-admin Catalog read behavior has regressed.

📃 Provide the context for the Bug.

Platform: Backstage
Plugin area: RBAC + Catalog permission metadata
Incident window: 2026-06-05 (PROD release window)
Observed symptom: temporary 503 on metadata endpoint during conditional reconciliation
User impact: non-admin users lost effective Catalog read conditions

Why this is critical:
This is a destructive failure mode in security policy reconciliation. A transient dependency outage (503) should not permanently alter effective authorization state.

👀 Have you spent some time to check if this bug has been raised before?

  • I checked and didn't find a similar issue

🏢 Have you read the Code of Conduct?

Are you willing to submit PR?

None
