Add KEP-12513: Introduce PVC-based artifact storage for Kubeflow Pipe… #12515

base: master

Conversation
…lines Signed-off-by: Helber Belmiro <[email protected]>
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Here are some thoughts, but I still need to read the full document. That is very far away from V1 artifact passing via PVC and would violate zero-overhead namespaces. Just imagine a cluster with 1000 namespaces for scalability; then you add 1000 permanent pods, which is massive overhead. In V1 we just accepted that the artifacts would not be in the UI instead. A PVC for all namespaces also sounds scary. Why should we offer such a security nightmare in the first place? That would break the namespace isolation contract. Maybe I am missing something, and I very much admire that you spent the time on the KEP, but there might be some fundamental problems as mentioned above regarding security and scalability that we did not have in the KFP V1 implementation. But as I said, some of my statements could be wrong, and that is just my initial assessment without checking it thoroughly. What I also do not understand is that by default we ship SeaweedFS and you do not have to configure object storage yourself. So where does this make it easier for beginners? I think it is actually more difficult.
Signed-off-by: Helber Belmiro <[email protected]>
Signed-off-by: Helber Belmiro <[email protected]>
There will be two modes: central and namespace-local. Users should choose whichever mode fits their scenario, or not use the filesystem storage at all if it doesn't make sense for them. The idea is not to replace the existing storage solutions, but to add a new alternative.
I'm not sure I'm following. Please correct me if I'm wrong, but the existing s3 solutions already break the namespace isolation contract once we have one instance for all namespaces. With the namespace-local mode proposed here we can achieve complete namespace isolation, which we don't have today.
Thanks for clarifying regarding SeaweedFS. I updated the motivation with that in mind. You can see the specific commit here. Taking a look at the Goals and Non-Goals may clarify some things before going deep into the proposal.
…pace configuration. Signed-off-by: Helber Belmiro <[email protected]>
That is not the case anymore. SeaweedFS is now multi-tenant, with ACLs and credentials per namespace, as of KFP release 2.15. We even have tests that verify the namespace isolation of SeaweedFS. Therefore I doubt that the central mode is needed at all. It just adds complexity and decreases security.
> #### Story 1: User Running Pipelines on Local Kubernetes
>
> As a user with KFP on kind/minikube/k3s, I want my pipeline artifacts to automatically use the local cluster's default `StorageClass` via the `kfp-artifacts://` scheme, so that I can develop pipelines offline without any storage configuration.
>
> **Acceptance Criteria:**
>
> - KFP works out-of-the-box with filesystem storage on local clusters
> - Artifacts are stored using the `kfp-artifacts://` URI scheme
> - No S3/GCS credentials required
> - Artifact viewing in UI works seamlessly
That is already covered by the default SeaweedFS with zero effort. No storage configuration is needed; SeaweedFS just uses the default StorageClass. It already satisfies:

- KFP works out-of-the-box with filesystem storage on local clusters
- No S3/GCS credentials required
- Artifact viewing in UI works seamlessly

So I recommend removing this story.
I agree. This user story is already covered. The KEP just proposes a new way to do this.
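For context, the zero-configuration PVC this story relies on could be as simple as the sketch below; the name and size are illustrative assumptions, and omitting `storageClassName` makes Kubernetes bind it to the cluster's default `StorageClass`, which is the behavior the story depends on.

```yaml
# Hypothetical PVC a local KFP install could create for artifact storage.
# Leaving storageClassName unset binds it to the cluster's default StorageClass.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kfp-artifacts          # illustrative name, not defined by the KEP
  namespace: kubeflow
spec:
  accessModes:
    - ReadWriteOnce            # RWO is the default access mode proposed in the KEP
  resources:
    requests:
      storage: 20Gi            # illustrative size
```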
> #### Story 2: Operator Deploying KFP Without External Object Storage
>
> As an operator for a Kubeflow distribution, I want to deploy KFP with filesystem storage so that I don't need to productize and support a separate object storage system.
>
> **Acceptance Criteria:**
>
> - Single configuration option to enable filesystem storage
> - Artifact handling is part of KFP (no separate object storage component)
> - Storage automatically provisioned via PVCs
> - Backup/restore follows standard Kubernetes PVC procedures
That is already covered by the default SeaweedFS with zero effort. No storage configuration is needed; SeaweedFS just uses the default StorageClass. It already satisfies:

- Artifact handling is part of KFP (no separate object storage component)
- Storage automatically provisioned via PVCs
- Backup/restore follows standard Kubernetes PVC procedures

So I recommend removing this story.
I think this user story still applies as it's about the admin not wanting to maintain an object storage service, which is valid for use cases where the organization wants on-premise storage (no AWS access) but doesn't have an enterprise license/support for SeaweedFS.
> #### Story 3: Operator Configuring Storage Class and Size
>
> As an operator, I want to configure KFP to use a specific StorageClass and PVC size instead of defaults, so that I can match storage performance and capacity to my workload requirements.
>
> **Acceptance Criteria:**
>
> - Can specify `StorageClass` in KFP configuration
> - Can set PVC size limits (global configuration)
> - Storage quotas enforced via Kubernetes `ResourceQuotas`
> - Clear error messages when storage limits are reached
> - Can choose between RWO and RWX access modes based on needs
That is already covered by the default SeaweedFS with zero effort. You can already decide which StorageClass is used for SeaweedFS. It already satisfies:

- Can specify `StorageClass` in KFP configuration
- Can set PVC size limits (global configuration)
- Clear error messages when storage limits are reached
- Can choose between RWO and RWX access modes based on needs

So I recommend removing this story.

The only thing remaining is "Storage quotas enforced via Kubernetes `ResourceQuotas`". We can set it per bucket already, but I am not sure about per folder. So it is partially solved, and fully solved if you create one bucket per namespace. This item can be moved to another story.
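For reference, per-namespace caps on PVC-backed artifact storage can already be expressed with a standard Kubernetes `ResourceQuota`; the values below are illustrative, not taken from the KEP.

```yaml
# Illustrative ResourceQuota capping total PVC storage and PVC count in a team namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: artifact-storage-quota   # illustrative name
  namespace: team-a
spec:
  hard:
    requests.storage: 100Gi      # total storage all PVCs in this namespace may request
    persistentvolumeclaims: "5"  # maximum number of PVCs in this namespace
```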
> #### Story 5: Operator Deploying Multi-Tenant KFP with Namespace Isolation
>
> As an operator, I want to deploy KFP in namespace-local mode where each namespace annotated with `pipelines.kubeflow.org/enabled=true` gets its own artifact server pod and dedicated PVC, so that Team A's artifacts in namespace `team-a` are physically isolated from Team B's artifacts in namespace `team-b`.
>
> **Acceptance Criteria:**
>
> - Each namespace with `pipelines.kubeflow.org/enabled=true` annotation gets its own artifact server deployment
> - Each namespace gets its own dedicated PVC (no shared storage)
> - Artifact server in `team-a` namespace cannot access PVC in `team-b` namespace
> - Users can only access artifacts in namespaces they have RBAC permissions for (via `SubjectAccessReview`)
> - Physical isolation verified: deleting `team-a` namespace doesn't affect `team-b`'s artifacts
This story has been made completely obsolete by the multi-tenant default SeaweedFS. It would just be more complicated and break zero-overhead namespaces. It's all done already in a secure manner and without massive overhead from additional artifact servers per namespace. Just imagine 1000 namespaces and 1000 extra pods when idle.
> #### Story 6: Operator Preferring KFP-Native Storage
>
> As an operator in a regulated environment (e.g., healthcare, finance), I want to deploy KFP with filesystem storage using an encrypted `StorageClass` (e.g., `encrypted-gp3`), so that artifact handling stays within the KFP codebase and I don't need to include a separate object storage system in my security audits.
>
> **Acceptance Criteria:**
>
> - All artifacts stored on PVCs within the cluster
> - KFP configuration uses `Filesystem.Type: "pvc"` with encrypted `StorageClass`
> - `SubjectAccessReview` validates all artifact access requests
> - Encryption at rest provided by the configured `StorageClass` (e.g., `encrypted-gp3`)
> - No separate object storage component to audit
This story is also obsolete because you can just make the SeaweedFS PVC encrypted.
I agree. I think we can remove this.
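For reference, encryption at rest in either approach comes from the `StorageClass` the PVC references; a minimal sketch, assuming the AWS EBS CSI driver (the `encrypted-gp3` name mirrors the example in the quoted story and is not prescribed anywhere):

```yaml
# Illustrative encrypted StorageClass (AWS EBS CSI driver assumed); any PVC that
# references it, including the SeaweedFS PVC, gets encryption at rest.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: encrypted-gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
```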
> #### Story 7: Operator Running KFP on Storage-Constrained Infrastructure
>
> As an operator with limited storage budget (only 1TB total available), I want to deploy KFP in central mode with a single 500GB PVC shared across 10 team namespaces, so that all teams can run pipelines without each needing their own 100GB PVC (which would require 1TB total).
>
> **Acceptance Criteria:**
>
> - Central mode configured with: `DeploymentMode: "central"`, `Size: "500Gi"`
> - Single PVC created in `kubeflow` namespace mounted by one artifact server
> - All 10 teams' artifacts stored in `/artifacts/<namespace>/` directories on same PVC
> - Teams can run pipelines concurrently without storage allocation failures
> - No per-namespace storage limits (trade-off of central mode - shared PVC means no per-team quotas)
That is also fully covered by the default multi-tenant SeaweedFS: just one single PVC in the kubeflow namespace.
> #### Story 8: Operator Scaling High-Throughput Model Training Platform
>
> As an operator supporting 100+ concurrent pipeline runs with multi-GB model checkpoints, I want to deploy KFP in central mode with RWX storage (e.g., NFS/CephFS) and multiple artifact server replicas behind a load balancer, so that artifact upload/download operations can scale horizontally without bottlenecks.
>
> **Acceptance Criteria:**
>
> - Can deploy artifact servers in both central and namespace-local modes
> - Artifact servers stream large files without loading into memory
> - Can use high-performance `StorageClasses` (e.g., SSD-backed)
> - Horizontal scaling possible in central mode
> - Direct pod-to-pod communication in namespace-local mode reduces latency
That is also made obsolete by merging #12391, i.e. a distributed SeaweedFS.
> #### Story 9: Operator with Mixed Isolation Requirements
>
> As an operator, I want to deploy KFP in central mode by default, but configure specific namespaces (e.g., `team-finance`) to use namespace-local mode for stricter isolation, so that most teams share the simple central server while sensitive teams get dedicated resources.
>
> **Acceptance Criteria:**
>
> - Global deployment mode set to `central` (default)
> - Specific namespaces can override to `namespaced` via their `kfp-launcher` ConfigMap
> - Teams using central mode share the central artifact server
> - Teams with namespace-local override get their own artifact server and PVC
> - UI correctly routes artifact requests based on each namespace's deployment mode
> - No cluster-wide restart needed to change a namespace's mode
Suggested addition:

> #### Story 9: Operator with enterprise isolation requirements and fast local storage for giant models
>
> I want to have, as with `data_passing_method()` in KFP v1, a way to have a per-pipeline or per-namespace PVC backed by very fast local storage, so that I do not need to upload and download large models to and from S3 for each step.
That is a security nightmare I would not want to support or recommend to any enterprise. It is just worse than the default multi-tenant SeaweedFS. I added a typical use case I have seen so far.
> ##### Mode 1: Central Artifact Server (Default)
>
> A single artifact server in the main KFP namespace serves all namespaces, configured via `ObjectStoreConfig.ArtifactServer.DeploymentMode: "central"`.
>
> **Central Mode Characteristics:**
>
> - **Single PVC** with directory structure: `/artifacts/<namespace>/<pipeline>/<run-id>/<node-id>/<artifact-name>`
> - **Authorization**: Uses `SubjectAccessReview` to verify namespace access
> - **Best for**: Simple deployments, single-user setups, small teams
> - **Advantages**: Simple setup, single storage location, easy backup
> - **Limitations**: All namespaces share same PVC and storage quota
For the reasons mentioned above and below, this mode has been made fully obsolete by the default multi-tenant SeaweedFS. Furthermore, it is a security nightmare and adds unnecessary complexity. There is zero benefit in implementing this.
> ##### Mode 2: Namespace-Local Artifact Servers
>
> Each namespace runs its own artifact server, configured via `ObjectStoreConfig.ArtifactServer.DeploymentMode: "namespaced"`.
>
> **Namespace-Local Mode Characteristics:**
>
> - **PVC per namespace**: Complete storage isolation
> - **Direct access**: Clients connect directly to namespace servers (no proxying)
> - **Authorization**: Natural isolation (each server only accesses its namespace's PVC)
> - **Best for**: Large multi-tenant deployments, strict isolation requirements
> - **Advantages**: True multi-tenancy, per-namespace scaling, independent quotas
> - **Deployment**: Proactive initialization when namespace has `pipelines.kubeflow.org/enabled=true` annotation
We already have multi-tenancy from the default SeaweedFS. The question is how we can avoid the per-namespace overhead and not break zero-overhead namespaces when idle. This was already working well in KFP v1. Just porting the KFP v1 `data_passing_method()` to KFP v2 would be enough.
> ##### Mixed Mode Support
>
> For multi-tenant deployments with varying isolation requirements, administrators can configure different deployment modes per namespace using the `kfp-launcher` ConfigMap:
>
> ```yaml
> apiVersion: v1
> kind: ConfigMap
> metadata:
>   name: kfp-launcher
>   namespace: team-requiring-isolation
> data:
>   defaultPipelineRoot: "kfp-artifacts://team-requiring-isolation"
>   artifactServer: |
>     deploymentMode: namespaced
> ```
>
> The `artifactServer` key contains a YAML block that mirrors the global `ObjectStoreConfig.ArtifactServer` structure, allowing consistent configuration patterns across global and per-namespace settings.
>
> This enables scenarios where:
>
> - Most namespaces use the simpler central mode (global default)
> - Specific namespaces requiring strict isolation use namespace-local mode
> - Teams can be migrated between modes without cluster-wide changes
>
> #### Request Routing
>
> Based on the configured mode (global default or per-namespace override from `kfp-launcher` ConfigMap), artifact URIs are resolved differently:
>
> **Central Mode:**
>
> ```text
> Client
>   │
>   │ GET kfp-artifacts://<namespace>/...
>   ▼
> KFP API Server (central)
>   │
>   │ Authorization check (SubjectAccessReview)
>   ▼
> Serve from /artifacts/<namespace>/...
> ```
As explained above, the central mode is dangerous and worse than the default multi-tenant SeaweedFS.
> ###### Central Mode Architecture
>
> ```text
> ┌──────────────────────────────────────┐
> │               KFP SDK                │
> └──────────────────┬───────────────────┘
>                    │ pipeline_root: "kfp-artifacts://..."
>                    ▼
> ┌──────────────────────────────────────┐
> │            Pipeline Spec             │
> └──────────────────┬───────────────────┘
>                    │
>                    ▼
> ┌──────────────────────────────────────┐
> │               Compiler               │
> │    (generates artifact API calls)    │
> └──────────────────┬───────────────────┘
>                    │
>                    ▼
> ┌──────────────────────────────────────┐
> │                Driver                │
> │  (validates artifact server exists)  │
> └──────────────────┬───────────────────┘
>                    │
>                    ▼
> ┌──────────────────────────────────────┐
> │               Launcher               │
> │     (uploads/downloads via API)      │
> └──────────────────┬───────────────────┘
>                    │ API calls
>                    ▼
> ┌──────────────────────────────────────┐
> │       Central Artifact Server        │
> │        (namespace: kubeflow)         │
> │                                      │
> │  • SubjectAccessReview               │
> │  • Mounts central PVC                │
> │  • Serves all namespaces             │
> └──────────────────┬───────────────────┘
>                    │ mounts
>                    ▼
> ┌──────────────────────────────────────┐
> │ Central PVC (kfp-artifacts-central)  │
> │   /artifacts/                        │
> │   ├── ns1/                           │
> │   ├── ns2/                           │
> │   └── ns3/                           │
> └──────────────────────────────────────┘
> ```
As explained above, the central mode is dangerous and worse than the default multi-tenant SeaweedFS.
/hold
Are you saying that SeaweedFS uses one PVC per namespace (like the namespace-local mode in the proposal)?
No, it uses hard multi-tenant, separated S3 storage that is accessed directly by the ml-pipeline-ui by default; no proxy needed. See also https://github.com/kubeflow/manifests#architecture for my diagram.
This is also covered in the new release and blog posts.
> This KEP proposes adding filesystem-based storage as an alternative artifact storage backend for Kubeflow Pipelines v2. While KFP currently ships with S3-compatible storage by default, some deployments prefer not to depend on a separate object storage system. This proposal introduces filesystem storage as an additional option where artifact handling is integrated into KFP itself, eliminating the need for an external object storage component.
>
> The filesystem backend will primarily use `PersistentVolumeClaim` (PVC) based storage in Kubernetes environments, providing namespace-isolated storage using Kubernetes native `PersistentVolumes`. However, the design is flexible enough to support other filesystem backends (e.g., local filesystem for development). Users can specify any access mode value which KFP will pass through to Kubernetes without validation (e.g., `ReadWriteMany` for parallel task execution across nodes, `ReadWriteOnce` for single node access, etc.), with RWO as the default if not specified. The actual behavior depends on what the underlying storage class supports. Existing pipelines will work without modification unless they contain hardcoded S3/object storage paths.
This paragraph implies the PVCs are mounted to the pods "which KFP will pass through to Kubernetes without validation". The following paragraph contradicts that.
> | **URI Scheme** | `s3://`, `gs://`, `minio://` | `kfp-artifacts://` |
> | **Architecture** | Separate object storage service | KFP-native artifact server |
> | **Required Knowledge** | S3 concepts (buckets, endpoints, regions) | Kubernetes concepts (PVCs, StorageClasses) |
> | **Multi-tenancy** | Shared storage (single instance) | Per-namespace PVCs (in namespace-local mode) |
Per-namespace PVCs should be optional. In other words, default to the central artifact server, respecting multi-tenancy, but allow individual namespaces to use a different solution (e.g. S3 or another artifact server).
> ```text
> Driver (detects "kfp-artifacts://")
>   │
>   ▼
> Artifact Server (mounts PVC)
> ```
Wouldn't the PVC always be mounted?
> ### Key Components
>
> - **Storage**: Kubernetes `PersistentVolumeClaims` (one per namespace)
> - **URI Format**: `kfp-artifacts://<namespace>/<pipeline>/<run-id>/<node-id>/<artifact-name>`
Could you be more specific on <pipeline>? Is it a name or ID? Is it a pipeline or pipeline version?
Also, what is ?
> A dedicated endpoint provides filesystem storage configuration and routing:
>
> ```http
> GET /apis/v2beta1/filesystem-storage/config
> ```
I don't think we need this. The artifact URI should contain the hostname of the artifact server that was used and it can be parsed directly to know how to route the request.
> **This proposal does not aim to replace existing object storage solutions.** S3-compatible storage remains fully supported and recommended for most production workloads. Instead, this KEP provides an additional option for deployments where a simpler, KFP-native artifact storage solution is preferred.
>
> While KFP currently ships with S3-compatible storage by default, this still requires deploying and maintaining a separate object storage service. For some deployment scenarios, this additional component may not be desired.
This also helps with running KFP locally, outside of Kubernetes.
> While KFP currently ships with S3-compatible storage by default, this still requires deploying and maintaining a separate object storage service. For some deployment scenarios, this additional component may not be desired.
>
> ### Reduced External Dependencies
I don't think this section adds a lot and could be consolidated in the Motivation section.
> Many enterprises and Kubeflow distributions prefer not to have additional external dependencies. With filesystem storage:
>
> - No separate object storage project to productize and support
Also include that this will directly leverage Kubernetes RBAC, aligned with existing permission mechanisms used by other parts of KFP. This makes onboarding and provisioning new namespaces simpler.
> ### Enterprise Considerations
>
> Many enterprises and Kubeflow distributions prefer not to have additional external dependencies. With filesystem storage:
Another aspect is that you'll automatically get namespace isolation of artifacts on the central artifact server through namespace-aware paths and Kubernetes RBAC.
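For reference, the per-request authorization check discussed here maps onto a standard Kubernetes `SubjectAccessReview`; the resource attributes below are an assumption about what a central artifact server might check, since the KEP does not pin down the exact resource.

```yaml
# Illustrative SubjectAccessReview a central artifact server could create to verify
# that the requesting user may read artifacts in the namespace parsed from the path.
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  user: alice@example.com          # identity taken from the incoming request (assumed)
  resourceAttributes:
    namespace: team-a              # namespace segment of the artifact path
    verb: get
    group: pipelines.kubeflow.org  # assumed API group for the check
    resource: runs                 # assumed resource; not specified by the KEP
```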
> In namespace-local mode, each namespace gets its own dedicated artifact server and PVC. This provides:
>
> - **Storage isolation**: Each team's artifacts are physically separated in their own PVC
> - **Independent scaling**: Teams can scale their artifact server horizontally (with RWX storage) and size their PVC based on workload requirements
Suggested change:

> - **Independent scaling**: Teams can scale their artifact server horizontally and size their PVC based on workload requirements
> ### Per-Namespace Isolation and Scaling
>
> In namespace-local mode, each namespace gets its own dedicated artifact server and PVC. This provides:
I don't think we should have a specific mode.
Essentially, the default behavior when using the KFP artifact server would be to use the central/default instance, but every namespace can override this configuration using the kfp-launcher ConfigMap or equivalent.
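As an illustration of that override path, a namespace could point its pipeline root somewhere else entirely through its `kfp-launcher` ConfigMap; this sketch only uses the existing `defaultPipelineRoot` key, and any credentials or provider settings the S3 root needs are assumed to be configured separately.

```yaml
# Hypothetical per-namespace override: team-b keeps using S3 instead of the
# central/default KFP artifact server. Only defaultPipelineRoot is shown;
# provider credentials are configured elsewhere.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kfp-launcher
  namespace: team-b
data:
  defaultPipelineRoot: "s3://team-b-artifacts/pipelines"
```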
> **When to use filesystem storage:**
>
> - Deployments where eliminating object storage dependency is preferred
> - Environments where Kubeflow distributions or platform providers prefer not to support additional storage systems
Suggested change:

> - Environments where Kubeflow distributions or platform providers do not offer a fully supported object store solution
> - Deployments where eliminating object storage dependency is preferred
> - Environments where Kubeflow distributions or platform providers prefer not to support additional storage systems
> - Multi-tenant deployments requiring per-namespace storage isolation, scaling, and quotas
I think this can all be achieved through S3.
> Based on the user story "As a user, I want to provision Kubeflow Pipelines with just a PVC for artifact storage so that I can quickly get started", this KEP aims to:
>
> 1. **Add filesystem storage as an additional backend option** alongside S3-compatible and Google Cloud Storage, primarily using PVC but not limited to it
What do you mean by "but not limited to it"?
> 1. **Add filesystem storage as an additional backend option** alongside S3-compatible and Google Cloud Storage, primarily using PVC but not limited to it
> 2. **Enable zero-configuration storage** for experimentation use cases - a KFP server can be installed with just a PVC for artifact storage
> 3. **Provide namespace-isolated artifact storage** with proper subject access review guards in multi-user mode
> 4. **Allow any Kubernetes access mode to be configured** - KFP passes through the configuration to Kubernetes (RWO default)
What does this mean?
> 4. **Allow any Kubernetes access mode to be configured** - KFP passes through the configuration to Kubernetes (RWO default)
> 5. **Support existing pipelines** that use KFP's standard artifact types (Dataset, Model, etc.) - pipelines work unchanged with the new filesystem backend
> 6. **Match existing artifact persistence behavior** - artifacts persist indefinitely until explicitly deleted (no automatic cleanup)
> 7. **Enable separate scaling of artifact serving** through an artifacts-only KFP instance with `--artifacts-only` flag
We should also support deploying the artifact server as a DaemonSet (ensuring a pod is on every Kubernetes node) and set the Service to internalTrafficPolicy: Local to keep all artifact traffic local to the Kubernetes node.
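A minimal sketch of that deployment shape, assuming a hypothetical `kfp-artifact-server` image, port, and labels; the points being illustrated are only the DaemonSet kind and `internalTrafficPolicy: Local` on the Service.

```yaml
# Illustrative DaemonSet + Service that keeps artifact traffic on the local node.
# Image, port, and labels are placeholders, not defined by the KEP.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kfp-artifact-server
  namespace: kubeflow
spec:
  selector:
    matchLabels:
      app: kfp-artifact-server
  template:
    metadata:
      labels:
        app: kfp-artifact-server
    spec:
      containers:
        - name: artifact-server
          image: ghcr.io/example/kfp-artifact-server:latest  # placeholder image
          ports:
            - containerPort: 8443
---
apiVersion: v1
kind: Service
metadata:
  name: kfp-artifact-server
  namespace: kubeflow
spec:
  selector:
    app: kfp-artifact-server
  ports:
    - port: 8443
      targetPort: 8443
  internalTrafficPolicy: Local   # only route to the artifact server pod on the same node
```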
> This KEP proposes adding a new artifact storage backend that uses filesystem storage (primarily Kubernetes `PersistentVolumeClaims`) instead of object storage. The implementation will:
>
> 1. Create one PVC per namespace for artifact storage
We should not require this as the central artifact server should be namespace aware (namespace in the path and subject access review based on the namespace in the path).
> This KEP proposes adding a new artifact storage backend that uses filesystem storage (primarily Kubernetes `PersistentVolumeClaims`) instead of object storage. The implementation will:
>
> 1. Create one PVC per namespace for artifact storage
> 2. Use configurable access mode with sensible defaults (RWO)
What does this mean?
> 1. Create one PVC per namespace for artifact storage
> 2. Use configurable access mode with sensible defaults (RWO)
> 3. Organize artifacts in a filesystem hierarchy within the PVC
Suggested change:

> 3. Organize artifacts in a filesystem hierarchy within the PVC that is namespace aware
> 1. Create one PVC per namespace for artifact storage
> 2. Use configurable access mode with sensible defaults (RWO)
> 3. Organize artifacts in a filesystem hierarchy within the PVC
> 4. Provide transparent access through the existing KFP artifact APIs with new `kfp-artifacts://` URI scheme
Do we have "KFP artifact APIs"? Perhaps this is referring to @HumairAK 's MLMD removal PR.
> 2. Use configurable access mode with sensible defaults (RWO)
> 3. Organize artifacts in a filesystem hierarchy within the PVC
> 4. Provide transparent access through the existing KFP artifact APIs with new `kfp-artifacts://` URI scheme
> 5. Maintain compatibility with existing pipeline definitions that don't have hardcoded storage paths
When is this ever the case?
> #### Story 3: Operator Configuring Storage Class and Size
>
> As an operator, I want to configure KFP to use a specific StorageClass and PVC size instead of defaults, so that I can match storage performance and capacity to my workload requirements.
This is implied whenever you use a PVC, so if the user story is about making artifact storage configurable per namespace (e.g. use a dedicated artifact server or use S3 in this namespace), then I think this can be removed.
> #### Story 4: User Migrating from S3 to Filesystem Storage
>
> As a user with existing pipelines containing components that call `boto3.upload_file()` directly, I want KFP system artifacts to use `kfp-artifacts://` with PVC storage while my custom components continue accessing S3, so that I can migrate incrementally without rewriting all components at once.
I don't think this user story is relevant. That's just a user's custom Python code.
> #### Story 5: Operator Deploying Multi-Tenant KFP with Namespace Isolation
>
> As an operator, I want to deploy KFP in namespace-local mode where each namespace annotated with `pipelines.kubeflow.org/enabled=true` gets its own artifact server pod and dedicated PVC, so that Team A's artifacts in namespace `team-a` are physically isolated from Team B's artifacts in namespace `team-b`.
Like I said, we don't need the concept of modes. We just need a central artifact server and allow each namespace to override to use a different solution (e.g. dedicated artifact server or S3) through the kfp-launcher ConfigMap.
An admin can still opt in to this by configuring each provisioned namespace, but this largely defeats the spirit of the KEP of making administration easier (and potentially cheaper if enterprise licensing is required) with fewer components involved.
> 6. Support separate scaling of artifact serving through artifacts-only instances
> 7. Update the UI to seamlessly handle artifact downloads from filesystem storage
>
> ### User Stories
I think we only need 2 user stories:
- As an admin with a requirement to have storage on-premise, I want a simple artifact storage solution for KFP artifacts without having to maintain a separate service for artifacts due to administrative overhead or enterprise licensing costs.
- As an admin using the KFP artifact server, I want the ability to override a namespace's artifact configuration to use alternative storage such as S3 or a dedicated artifact server.

Resolves: #12513