Add design for managing Kopia repositories via BSL and new BSLR #1827

Open
wants to merge 1 commit into
base: master
from

Conversation

mpryc
Contributor

@mpryc mpryc commented Jul 9, 2025

This design introduces the BackupStorageLocationRepository (BSLR) as a new custom resource that models and manages Kopia repositories on a per-BSL basis.

Why the changes were made

Propose a new design that provides a clear separation between BSL configuration and repository state, and enables early provisioning of Kopia repositories before backups run. In the future, this will allow a BSL Server to be built on top of BSLR.

How to test the changes made

Read the design.

@openshift-ci openshift-ci bot requested review from kaovilai and sseago July 9, 2025 14:07

openshift-ci bot commented Jul 9, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mpryc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 9, 2025
Contributor

@sseago sseago left a comment

One thing that isn't clear from the design doc is what changes may be needed upstream to allow BackupRepositories to be managed/created from outside Velero. I don't think Velero has any pluggability here currently, but if you want two different VM backups in the same namespace and BSL to use a different kopia repository, then I would think some velero-level integration would be required.


## Background

The current architecture of OADP tightly couples each BackupStorageLocation (BSL) with a single Kopia repository. This repository is provisioned and controlled entirely by Velero’s core components. While this setup is adequate for standard backup scenarios, it introduces significant limitations when more flexible or granular configurations are needed.
Contributor

kopia repos are scoped to BSL+namespace, not just BSL.

Contributor Author

@sseago true, will update.


## Goals

* **Support Multiple Repository Instances per BSL**
Contributor

"Support Multiple Repository Instances per BSL in the same namespace"

@weshayutin
Contributor

woot! very excited about this one, thank you @mpryc

@mpryc
Contributor Author

mpryc commented Jul 9, 2025

One thing that isn't clear from the design doc is what changes may be needed upstream to allow BackupRepositories to be managed/created from outside Velero. I don't think Velero has any pluggability here currently, but if you want two different VM backups in the same namespace and BSL to use a different kopia repository, then I would think some velero-level integration would be required.

@sseago currently, the design does not include any pluggability within Velero itself. The main goal is to enable managing Kopia repositories independently from Velero by using OADP. Specifically, OADP will handle initializing and managing Kopia repositories through functions that interact with the Velero codebase via the OADP controller. This creates an abstraction layer allowing users to access and manage Kopia server repositories outside of Velero. Another goal is to give access to Kopia repositories outside of Velero, for example to retrieve files with the Kopia CLI.

@sseago
Contributor

sseago commented Jul 9, 2025

@mpryc How do we keep velero and OADP from stepping on each other if they're both trying to manage kopia repos? i.e. if we have 2 VMs in the same namespace and they need to be stored in different kopia repos, how does that work if Velero creates one repo and uses it for both? I have a feeling I'm missing something here.

@kaovilai
Member

kaovilai commented Jul 9, 2025

Thank you for this proposal! After analyzing the design and comparing it with Velero's current BackupRepository implementation, I have some findings and questions that need clarification.

Current Understanding

Based on the clarification that "the design does not include any pluggability within Velero itself" and that OADP will manage Kopia repositories independently:

  1. BSLR is OADP-only: The BackupStorageLocationRepository (BSLR) CR is managed entirely by OADP, not Velero
  2. Velero remains unchanged: Velero continues using its existing BackupRepository model
  3. External access goal: Primary objective is enabling direct Kopia CLI access to repositories outside of Velero

Current Velero Architecture

In Velero today, BackupRepository and BackupStorageLocation have a many-to-one relationship:

graph TB
    subgraph "Current Velero Architecture"
        BSL["BackupStorageLocation"]
        
        BR1["BackupRepository: ns1-default-kopia"]
        BR2["BackupRepository: ns2-default-restic"]
        BR3["BackupRepository: ns3-default-kopia"]
        
        BR1 -->|References| BSL
        BR2 -->|References| BSL
        BR3 -->|References| BSL
        
        BR1 -.->|Stores data in| S3[("S3/Azure/GCS Bucket")]
        BR2 -.->|Stores data in| S3
        BR3 -.->|Stores data in| S3
        
        BSL -->|Points to| S3
    end
    
    style BSL fill:#1976d2,stroke:#0d47a1,stroke-width:2px,color:#fff
    style BR1 fill:#f57c00,stroke:#e65100,stroke-width:2px,color:#fff
    style BR2 fill:#f57c00,stroke:#e65100,stroke-width:2px,color:#fff
    style BR3 fill:#f57c00,stroke:#e65100,stroke-width:2px,color:#fff
    style S3 fill:#616161,stroke:#424242,stroke-width:2px,color:#fff

Key characteristics:

  • Repository naming is deterministic: {namespace}-{bsl-name}-{repo-type}
  • All repositories sharing a BSL use the same credentials
  • Repositories are created on-demand when pod volume backups are needed
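As a minimal sketch of the naming pattern listed above, the Go snippet below composes a repository name from its three parts. `repoName` is a hypothetical helper written for illustration only; it is not taken from the Velero codebase, and the real implementation may differ in detail.

```go
package main

import "fmt"

// repoName is a hypothetical helper (not Velero's actual code).
// It composes the {namespace}-{bsl-name}-{repo-type} pattern
// described in the list above.
func repoName(namespace, bslName, repoType string) string {
	return fmt.Sprintf("%s-%s-%s", namespace, bslName, repoType)
}

func main() {
	// Matches the first repository shown in the diagram above.
	fmt.Println(repoName("ns1", "default", "kopia")) // prints "ns1-default-kopia"
}
```

Because the name depends only on namespace, BSL name, and repository type, nothing about the individual workload takes part in selecting the repository.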

Proposed OADP BSLR Architecture

Based on my understanding, BSLR operates as a parallel system:

graph TB
    subgraph "OADP Layer"
        OADP["OADP Controller"]
        BSLR1["BSLR: vm-backups"]
        BSLR2["BSLR: database-backups"]
        
        OADP -->|Manages| BSLR1
        OADP -->|Manages| BSLR2
        
        OADP -->|Initializes| KopiaRepos[("Independent Kopia Repositories")]
    end
    
    subgraph "Velero Layer - Unchanged"
        Velero["Velero"]
        BSL["BackupStorageLocation"]
        BR["BackupRepository"]
        
        Velero -->|Uses| BSL
        Velero -->|Creates| BR
        BR -->|References| BSL
    end
    
    subgraph "External Access"
        KopiaCLI["Kopia CLI"]
        KopiaCLI -.->|Direct access| KopiaRepos
    end
    
    OADP -.->|Interacts with| Velero
    BSLR1 -.->|Maps to| BSL
    BSLR2 -.->|Maps to| BSL
    
    style OADP fill:#00796b,stroke:#004d40,stroke-width:2px,color:#fff
    style BSLR1 fill:#388e3c,stroke:#1b5e20,stroke-width:2px,color:#fff
    style BSLR2 fill:#388e3c,stroke:#1b5e20,stroke-width:2px,color:#fff
    style Velero fill:#1976d2,stroke:#0d47a1,stroke-width:2px,color:#fff
    style BSL fill:#1565c0,stroke:#0d47a1,stroke-width:2px,color:#fff
    style BR fill:#f57c00,stroke:#e65100,stroke-width:2px,color:#fff
    style KopiaCLI fill:#5d4037,stroke:#3e2723,stroke-width:2px,color:#fff
    style KopiaRepos fill:#424242,stroke:#212121,stroke-width:2px,color:#fff

Critical Questions

1. Data Flow and Integration

How does OADP intercept or redirect the actual backup data flow? Since Velero will continue creating its own BackupRepository objects and managing backups as usual, how does OADP ensure that data goes to the appropriate BSLR-managed repository instead?

2. Namespace Conflict Resolution

As @sseago pointed out: If two VMs in the same namespace need different Kopia repositories, but Velero creates one repository per namespace-BSL combination, how does OADP handle this conflict?

For example:

  • VM1 in namespace "prod" needs repository with encryption key A
  • VM2 in namespace "prod" needs repository with encryption key B
  • Velero would create a single repository: prod-default-kopia
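To make the conflict concrete, here is a small Go sketch that groups workloads by a namespace-and-BSL repository key and reports the keys shared by more than one workload. The `workload` type, `repoKey`, and `sharedRepos` are hypothetical names used only for this illustration, and the key format simply follows the naming described earlier in this comment.

```go
package main

import "fmt"

// workload is a hypothetical description of something to back up.
type workload struct {
	Name      string
	Namespace string
	BSL       string
}

// repoKey follows the namespace-BSL scoping described in this comment.
// The workload name plays no part in the key.
func repoKey(w workload) string {
	return fmt.Sprintf("%s-%s-kopia", w.Namespace, w.BSL)
}

// sharedRepos groups workload names by repository key and keeps
// only the keys that are used by more than one workload.
func sharedRepos(ws []workload) map[string][]string {
	byKey := map[string][]string{}
	for _, w := range ws {
		k := repoKey(w)
		byKey[k] = append(byKey[k], w.Name)
	}
	for k, names := range byKey {
		if len(names) < 2 {
			delete(byKey, k) // deleting during range is safe in Go
		}
	}
	return byKey
}

func main() {
	shared := sharedRepos([]workload{
		{Name: "vm1", Namespace: "prod", BSL: "default"},
		{Name: "vm2", Namespace: "prod", BSL: "default"},
	})
	fmt.Println(shared) // prints "map[prod-default-kopia:[vm1 vm2]]"
}
```

Under this keying, vm1 and vm2 cannot be given different encryption keys, because both resolve to the same repository.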

3. Repository Initialization Race Condition

Who initializes the Kopia repositories - OADP or Velero? If both systems try to initialize repositories in the same storage location, how are conflicts avoided?

4. Backup Operation Flow

During velero backup create, what actually happens? Here's what's unclear:

sequenceDiagram
    participant User
    participant Velero
    participant OADP
    participant Storage

    User->>Velero: velero backup create
    Velero->>Velero: Create Backup CR
    Note over Velero: Creates PodVolumeBackup CRs
    
    rect rgb(200, 50, 50)
        Note right of Velero: UNCLEAR: How does OADP<br/>intercept this flow?
        Velero-->>OADP: ???
        OADP-->>Storage: Use BSLR repository?
    end
    
    alt Current Understanding
        Velero->>Storage: Writes to namespace-based repo
    else Proposed BSLR
        OADP->>Storage: Writes to workload-based repo
    end

5. Repository Mapping

How does OADP map between:

  • Velero's namespace-based repository model (namespace-bsl-kopia)
  • BSLR's workload-based repository model (e.g., vm-backups, database-backups)

6. Credential Management

If BSLR repositories have independent credentials, but Velero is still performing the actual backup operations, how are the BSLR credentials injected into the backup process?

Suggested Clarifications

  1. Add a detailed sequence diagram showing:

    • Which component creates which CRs
    • How OADP intercepts or redirects the backup flow
    • Where repository selection/routing happens
    • How credentials are handled
  2. Clarify the integration mechanism:

    • Does OADP use a webhook to intercept PodVolumeBackup creation?
    • Does OADP modify Velero's repository initialization?
    • Is there a new controller that watches for Velero resources?
  3. Explain conflict resolution:

    • What happens when both Velero and OADP try to manage the same storage location?
    • How are repository naming conflicts resolved?

This would help address the "stepping on each other" concern and clarify how these two systems can coexist.

Summary

The core architectural question is: How do OADP and Velero coexist without conflicts when they have fundamentally different repository models?

  • Velero: One repository per namespace per BSL
  • OADP BSLR: Multiple repositories per BSL with workload-based selection

Understanding this interaction is crucial for evaluating the design's feasibility and implementation approach.

This design introduces the BackupStorageLocationRepository (BSLR) as a new custom
resource that models and manages Kopia repositories on a per-BSL basis.

Signed-off-by: Michal Pryc <[email protected]>
@mpryc
Contributor Author

mpryc commented Jul 15, 2025

@mpryc How do we keep velero and OADP from stepping on each other if they're both trying to manage kopia repos?

Velero and the new OADP BSLR mechanism avoid stepping on each other by clearly separating responsibility through the use of default vs non-default BackupStorageLocationRepository (BSLR) objects.

Velero continues to manage the default BSLR (associated with the default BSL), while OADP only manages non-default BSLRs. In those cases, BSLR acts as a pointer to the Kopia repository, but OADP takes over orchestration and lifecycle management.

This separation is handled explicitly in the logic:

https://github.com/openshift/oadp-operator/pull/1827/files#diff-4f6749d4d0b57189a920b3e595dd90eece13a21614684c04ab1384e9e859f128R86

and

https://github.com/openshift/oadp-operator/pull/1827/files#diff-4f6749d4d0b57189a920b3e595dd90eece13a21614684c04ab1384e9e859f128R97
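As a rough sketch of that split (not the code from the linked diff), the Go snippet below decides which component is responsible for a BSLR based on whether it is the default one. The `bslr` struct and its `Default` field are assumptions made for illustration; they are not the CRD schema from the design.

```go
package main

import "fmt"

// bslr is a hypothetical, simplified view of a
// BackupStorageLocationRepository. The field names are assumptions.
type bslr struct {
	Name    string
	Default bool
}

// owner returns which component is responsible for the repository:
// Velero for the default BSLR, OADP for every non-default one.
func owner(r bslr) string {
	if r.Default {
		return "velero"
	}
	return "oadp"
}

// oadpShouldReconcile reports whether an OADP controller following
// this rule would act on the given BSLR.
func oadpShouldReconcile(r bslr) bool {
	return owner(r) == "oadp"
}

func main() {
	for _, r := range []bslr{
		{Name: "default", Default: true},
		{Name: "vm-backups", Default: false},
	} {
		fmt.Printf("%s -> %s\n", r.Name, owner(r))
	}
}
```

With a single rule like this, each repository has exactly one owner, which is what keeps the two systems from managing the same repository.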

i.e. if we have 2 VMs in the same namespace and they need to be stored in different kopia repos, how does that work if Velero creates one repo and uses it for both? I have a feeling I'm missing something here.

Velero itself will not create or manage the Kopia repositories for the above use case in this design — that responsibility lies with the OADP controller, via the new BackupStorageLocationRepository (BSLR) custom resource.

So to address your example: if you have 2 VMs in the same namespace and want them to be stored in separate Kopia repositories, you would define 2 separate BSLRs (each tied to the same or different BSL, depending on your setup). Each VM would then reference its own BSLS — ensuring separation at the repository level. The BSLS is actually another layer which is covered in the design: #1830

Another possibility is to have one BSLR with one BSLS and users specified within the BSLS. Each VM user gets their own credentials, and the Kopia repository separates access to the backups (snapshots) for them.


openshift-ci bot commented Jul 15, 2025

@mpryc: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@mpryc
Contributor Author

mpryc commented Jul 15, 2025

@kaovilai Thank you for your detailed review. I've added an additional design proposal that may address some of your concerns and complements the current BSLR design: #1830

This is a separate design, but it is intended to work in tandem with the current BSLR approach.

Below are answers to your questions. Because your review was outside of the code I will copy-paste some of them inline:

Current Understanding

All three points are correct. Your understanding matches the design, which suggests the design is self-explanatory on these points.

Current Velero Architecture
Repository naming is deterministic: {namespace}-{bsl-name}-{repo-type}

First of all, let's drop restic out of the equation, as it is not relevant to current or future OADP, nor to the BSLR (as per the design).

The current Velero architecture as described is pretty much correct. The BSLR will be similar to BackupRepository, but created and managed by the OADP controller. We do not want to reuse Velero's CR for this parallel mechanism.

It would also be possible to use BackupRepository and Velero as the management mechanism and drop the BSLR altogether, but that would require the BSLS (from another design) to point to a BR, with its own naming mechanism so that the two do not step on each other. In that case, repository password management per BR would also need to be included.

Proposed OADP BSLR Architecture

Yes, you are correct: it is very much a parallel mechanism, similar to BR.

Critical Questions
How does OADP intercept or redirect the actual backup data flow? Since Velero will continue creating its own BackupRepository objects and managing backups as usual, how does OADP ensure that data goes to the appropriate BSLR-managed repository instead?

OADP does not redirect or intercept the actual backup data flow managed by Velero. Instead, it introduces the option for cluster administrators to explicitly enable the BackupStorageLocationServer (BSLS) for a given BackupStorageLocationRepository (BSLR), as outlined in the BSLS design proposal.

Velero continues to manage its own backup lifecycle as usual, and any BSLR-enabled repository is managed in parallel by OADP. OADP does not override or influence which repository Velero writes to.

Administrators could optionally configure access control (ACL) to allow read-only access to a repository created by Velero - for example, enabling users to restore specific files without allowing them to write backups. However, such ACL support is outside the scope of the current design.

In short, it is up to the cluster administrator to enable BSLS for selected BSLRs and manage access credentials accordingly.

  2. Namespace Conflict Resolution
    This is precisely the motivation for introducing the BackupStorageLocationRepository (BSLR) and the BackupStorageLocationServer (BSLS) in OADP. In the standard Velero model, repository scoping is tightly coupled to the namespace and BSL name. This makes it impossible for multiple VMs in the same namespace to use different repositories.

With the new OADP design:

  • Velero is no longer responsible for creating Kopia repositories specified in the BSLR object.
  • The OADP controller creates and manages repositories via BSLR objects.
  • Each VM can be associated with its own BSLR, which is not constrained by Velero's namespace-BSL binding.
  • Each BSLR can have its own BSLS (via Add design for the Backup Storage Location Server #1830), and each BSLS can have multiple users.

For example:

  • VM1 in prod can be backed up to a BSLS 'prod' that is using BSLR 'prod' and user vm1.
  • VM2 in dev can be backed up to 'another' BSLS using BSLR 'another'.
  • VM3 in prod can be backed up to BSLS 'prod' that is using BSLR 'prod' and user vm3.
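The example mapping above can be written down as a small lookup table. This is only a sketch in Go: the `vmKey` and `target` types, and the helper names, are hypothetical and do not come from the BSLR or BSLS designs. No user is named for VM2 in the example, so that field is left empty here.

```go
package main

import "fmt"

// vmKey identifies a VM by namespace and name (hypothetical type).
type vmKey struct {
	Namespace string
	Name      string
}

// target names the BSLS, the BSLR behind it, and the BSLS user
// a VM backs up with (hypothetical type).
type target struct {
	BSLS string
	BSLR string
	User string
}

// exampleTargets encodes the three VMs from the example above.
func exampleTargets() map[vmKey]target {
	return map[vmKey]target{
		{Namespace: "prod", Name: "vm1"}: {BSLS: "prod", BSLR: "prod", User: "vm1"},
		{Namespace: "dev", Name: "vm2"}:  {BSLS: "another", BSLR: "another"}, // no user named in the example
		{Namespace: "prod", Name: "vm3"}: {BSLS: "prod", BSLR: "prod", User: "vm3"},
	}
}

// sameRepoDifferentUser reports whether two targets share a BSLR
// but are separated by distinct BSLS users.
func sameRepoDifferentUser(a, b target) bool {
	return a.BSLR == b.BSLR && a.User != b.User
}

func main() {
	ts := exampleTargets()
	vm1 := ts[vmKey{Namespace: "prod", Name: "vm1"}]
	vm3 := ts[vmKey{Namespace: "prod", Name: "vm3"}]
	fmt.Println(sameRepoDifferentUser(vm1, vm3)) // prints "true"
}
```

VM1 and VM3 live in the same namespace and share one repository, yet each backs up under its own user, which is the separation the namespace-based model cannot express.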

Velero will still perform its own backup and restore operations, but only against the repositories that Velero itself created, which are represented by BackupRepository objects. OADP provisions the repositories referenced by the BSLR mechanism and makes them available for backup and restore via the BSLS proxy mechanism. Velero doesn't directly know about the BSLRs - OADP configures and controls access to them, resolving the namespace-level conflict externally.

  3. Repository Initialization Race Condition
    This was answered above: OADP manages the BSLR repositories, and Velero manages the BackupRepository (BR) ones.
  4. Backup Operation Flow
    During velero backup create, what actually happens? Here's what's unclear.
    Nothing changes. This is a parallel mechanism that is unrelated to Velero backups.
  5. Repository Mapping

OADP does not rely on Velero’s namespace-based repository model (namespace-bsl-kopia) for repository management. Instead, it decouples repository creation and selection from Velero entirely by introducing a new model:

  6. Credential Management
    Not relevant. OADP manages credentials from the BSLR object, unless we decide to drop BSLR altogether and use BR instead, which would let Velero manage all the repositories, including those exposed by BSLS.

@weshayutin
Contributor

cool cool, interested in seeing the updates we discussed today. Thank you @mpryc really cool work here!

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.
4 participants