Add seqera:// data-links support to nf-tower filesystem#7070

Open
jorgee wants to merge 7 commits into master from 260422-seqera-datalinks-fs

Contributor

jorgee commented Apr 24, 2026

Summary

Extends the seqera:// NIO filesystem in nf-tower with a second resource type, data-links. Paths of the form seqera://<org>/<ws>/data-links/<provider>/<name>/<sub-path> resolve to files and directories inside Platform-managed data-links (S3/GCS/Azure buckets or prefixes).
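To make the path shape concrete, here is a minimal sketch of splitting a full data-link URI into its segments. The class `DataLinkPathSketch` and its field names are hypothetical and are not the PR's `SeqeraPath`; the sketch only handles paths that already reach the data-link name, whereas the real filesystem also resolves shallower paths.

```java
// Hypothetical helper for illustration only; not part of nf-tower.
final class DataLinkPathSketch {
    final String org;
    final String workspace;
    final String provider;
    final String name;
    final String subPath; // empty when the path points at the data-link root

    private DataLinkPathSketch(String org, String workspace, String provider,
                               String name, String subPath) {
        this.org = org;
        this.workspace = workspace;
        this.provider = provider;
        this.name = name;
        this.subPath = subPath;
    }

    static DataLinkPathSketch parse(String uri) {
        String prefix = "seqera://";
        if (!uri.startsWith(prefix)) {
            throw new IllegalArgumentException("Not a seqera:// URI: " + uri);
        }
        // A limit of 6 keeps the sub-path, slashes included, in the last element.
        String[] parts = uri.substring(prefix.length()).split("/", 6);
        if (parts.length < 5 || !"data-links".equals(parts[2])) {
            throw new IllegalArgumentException("Not a full data-link path: " + uri);
        }
        String subPath = parts.length == 6 ? parts[5] : "";
        return new DataLinkPathSketch(parts[0], parts[1], parts[3], parts[4], subPath);
    }
}
```

For example, `DataLinkPathSketch.parse("seqera://my-org/my-ws/data-links/aws/my-bucket/path/to/file.txt")` yields provider `aws`, name `my-bucket`, and sub-path `path/to/file.txt` (the org, workspace, and bucket names here are made up).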

Listings and attribute queries go through the Platform's /data-links/{id}/browse[/path] endpoints; byte reads go through pre-signed URLs returned by /data-links/{id}/generate-download-url and fetched with a plain JDK HttpClient. Only the Seqera access token is required: no AWS/GCP/Azure credentials are needed, and no cloud SDK dependency is introduced.
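As a rough illustration of the byte-read step, the following sketch fetches a pre-signed URL with the JDK `java.net.http.HttpClient`. The class and method names are hypothetical, the URL is assumed to have been obtained from the generate-download-url endpoint beforehand, and treating anything other than HTTP 200 as a failure is a simplification of this sketch, not a statement about the PR's code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical sketch; names are illustrative and not taken from the PR.
final class PresignedDownloadSketch {

    // A plain GET with no Authorization header: the pre-signed URL is
    // expected to carry its own signature in the query string.
    static HttpRequest buildRequest(String presignedUrl) {
        return HttpRequest.newBuilder(URI.create(presignedUrl)).GET().build();
    }

    // Opens the object as a stream; the caller is responsible for closing it.
    static InputStream open(HttpClient client, String presignedUrl)
            throws IOException, InterruptedException {
        HttpResponse<InputStream> response =
                client.send(buildRequest(presignedUrl), HttpResponse.BodyHandlers.ofInputStream());
        if (response.statusCode() != 200) {
            response.body().close();
            throw new IOException("Download failed with HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

A caller would pass `HttpClient.newHttpClient()` together with the URL string returned by the Platform. Because no cloud SDK is involved, the only failure modes visible here are the HTTP status and the usual I/O errors.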

As part of this change, the existing dataset-specific logic in SeqeraFileSystemProvider, SeqeraFileSystem, and SeqeraPath is extracted into a real ResourceTypeHandler abstraction; DatasetsResourceHandler and DataLinksResourceHandler are the two implementations. The generic fs/ classes become resource-type-agnostic for depth ≥ 3 (enforced by ResourceTypeAbstractionTest).

Design artifacts: spec.md, plan.md, ADR.

Highlights

  • Path shape: seqera://<org>/<ws>/data-links/<provider>/<name>/<sub-path>. Provider segments are the lowercase DataLinkProvider.toString() value (aws, google, azure, …).
  • Generic lazy pagination via PagedIterable<T>: a single shared abstraction backs both the workspace data-link list (offset paginated) and data-link content browse (token paginated). The first page is fetched eagerly so IOException surfaces at the call site, not at the first Iterator.hasNext(). Two named static fetchers (DataLinkListFetcher, DataLinkContentFetcher) own their own cursor state.
  • Reliable file-vs-directory detection: readAttributes on a sub-path lists the path's parent directory and finds the entry by name; the entry's type (FILE/FOLDER) is the authoritative signal, and a missing entry → NoSuchFileException. The /browse/{path} response shape alone does not reliably distinguish file/directory/missing paths.
  • Per-path attribute caching: listings attach SeqeraFileAttributes to each emitted SeqeraPath; the provider also writes resolved attributes back onto the path after a fresh read. Subsequent readAttributes calls on the same path instance hit the cache (zero API calls).
  • Single-call data-link resolution: getDataLink(ws, provider, name) issues a combined keyword search (<name> provider:<provider>) so the server returns at most one match. The lookup is @Memoized, including null misses.
  • Cached user-id on the filesystem: SeqeraFileSystem holds the TowerClient directly and exposes getUserId() cached for the lifetime of the FS — the token doesn't change during a pipeline run. User/workspace lookup is shared infrastructure across resource types, not a dataset-client method.
  • credentialsId forwarding: when DataLinkDto.credentials is non-empty, the first credential's id is forwarded as the credentialsId query parameter on browse and download-URL requests.
  • Error mapping: 401 → AbortOperationException; 403 → AccessDeniedException; 404 → NoSuchFileException. Consistent with the dataset client.
  • 369 unit tests pass (Spock + Mock(TowerClient)). The pre-existing dataset tests are unchanged and continue to pass.
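The lazy-pagination idea from the highlights can be sketched as follows. This is not the PR's `PagedIterable<T>`; the `PageFetcher` interface, the "empty page means exhausted" convention, and the wrapping of later failures in `UncheckedIOException` are assumptions made for the sketch. What it does show is the eager first fetch, which lets an `IOException` surface where the iterable is created rather than at the first `hasNext()`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of lazy pagination with an eagerly fetched first page.
final class PagedIterableSketch<T> implements Iterable<T> {

    // The fetcher owns its cursor (an offset or a page token) and returns
    // an empty list once there is nothing left (assumed convention).
    interface PageFetcher<T> {
        List<T> nextPage() throws IOException;
    }

    private final PageFetcher<T> fetcher;
    private final List<T> firstPage;

    PagedIterableSketch(PageFetcher<T> fetcher) throws IOException {
        this.fetcher = fetcher;
        this.firstPage = fetcher.nextPage(); // eager: errors surface at the call site
    }

    // Single-pass: the fetcher's cursor is shared, so iterate only once.
    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private Iterator<T> page = firstPage.iterator();
            private boolean exhausted = firstPage.isEmpty();

            @Override
            public boolean hasNext() {
                while (!page.hasNext() && !exhausted) {
                    try {
                        List<T> next = fetcher.nextPage();
                        exhausted = next.isEmpty();
                        page = next.iterator();
                    } catch (IOException e) {
                        // Iterator methods cannot throw checked exceptions.
                        throw new UncheckedIOException(e);
                    }
                }
                return page.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return page.next();
            }
        };
    }
}
```

An offset-paginated fetcher would keep an integer offset in a field, and a token-paginated one would keep the last page token; either fits behind the same single-method interface.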

Requirements / prerequisites

⚠️ Platform permission: the Seqera Platform user whose access token is used to run the pipeline must have a Maintain role (or higher) on the workspace. Lower roles (e.g. View) cannot list/browse data-links through the Platform API and will see AccessDeniedException on any seqera://<org>/<ws>/data-links/... path.
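The status-to-exception mapping that produces this AccessDeniedException can be sketched roughly as below. `ErrorMappingSketch` and `AbortStandIn` are hypothetical: `AbortStandIn` stands in for Nextflow's AbortOperationException so that the sketch compiles on its own, and the generic fallback for other 4xx/5xx codes is an assumption rather than something the PR states.

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.NoSuchFileException;

// Hypothetical sketch of the 401/403/404 mapping; not the PR's code.
final class ErrorMappingSketch {

    // Stand-in for Nextflow's AbortOperationException (assumption for self-containment).
    static final class AbortStandIn extends RuntimeException {
        AbortStandIn(String message) {
            super(message);
        }
    }

    static void checkStatus(int status, String path) throws IOException {
        switch (status) {
            case 401:
                throw new AbortStandIn("Access token rejected while accessing " + path);
            case 403:
                throw new AccessDeniedException(path);
            case 404:
                throw new NoSuchFileException(path);
            default:
                // Assumed fallback: any other error status becomes a plain IOException.
                if (status >= 400) {
                    throw new IOException("Unexpected HTTP " + status + " for " + path);
                }
        }
    }
}
```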

  • nf-tower plugin must be enabled with tower.accessToken / TOWER_ACCESS_TOKEN.

Known limitations

  • Signed URL expiration is not handled transparently. Very long reads that outlive the URL's validity window surface as IOException; Nextflow task retry handles recovery.
  • No per-item last-modified exposed by the Platform browse API. SeqeraFileAttributes.lastModifiedTime() returns Instant.EPOCH for data-link entries.
  • Read-only in this iteration. Write operations raise UnsupportedOperationException. The Platform's /data-links/{id}/upload endpoints are a natural future extension point.
  • No data-link write, rename, delete, or management operations (create/update/delete the data-link entity itself).
  • Single Platform endpoint per JVM (unchanged from the dataset feature).

Test plan

  • ./gradlew :plugins:nf-tower:test — all 369 tests pass (verified locally)
  • ./gradlew :plugins:nf-tower:dependencies --configuration runtimeClasspath shows no new cloud-SDK artifacts (no aws-sdk, google-cloud-storage, azure-*)
  • Manual: nextflow fs ls seqera://<org>/<ws>/data-links/* lists providers
  • Manual: nextflow fs ls seqera://<org>/<ws>/data-links/<provider>/* lists data-link names
  • Manual: nextflow fs ls seqera://<org>/<ws>/data-links/<provider>/<name>/* lists top-level bucket entries
  • Manual: nextflow fs stat seqera://<org>/<ws>/data-links/<provider>/<name>/<file> reports is directory: false and the correct size
  • Manual: nextflow fs stat seqera://<org>/<ws>/data-links/<provider>/<name>/<dir> reports is directory: true
  • Manual: nextflow fs stat seqera://<org>/<ws>/data-links/<provider>/<name>/<missing> raises NoSuchFileException
  • Integration test: pipeline reads a file inside a data-link via file('seqera://…/data-links/<provider>/<name>/path/to/file') using only TOWER_ACCESS_TOKEN
  • Manual: verify that a Platform user with a View role (below Maintain) receives a clear AccessDeniedException
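The three fs stat cases in the test plan (file, directory, missing) follow from the parent-listing lookup described under Highlights. A minimal sketch of that lookup, assuming the parent directory has already been listed into memory, might look like this; `ParentListingSketch`, `Entry`, and `resolve` are hypothetical names, and only the FILE/FOLDER entry types come from the description.

```java
import java.nio.file.NoSuchFileException;
import java.util.List;

// Hypothetical sketch: decide file vs. directory from the parent's listing.
final class ParentListingSketch {

    enum EntryType { FILE, FOLDER }

    static final class Entry {
        final String name;
        final EntryType type;
        final long size;

        Entry(String name, EntryType type, long size) {
            this.name = name;
            this.type = type;
            this.size = size;
        }
    }

    // parentListing stands for the result of browsing the path's parent directory.
    static Entry resolve(List<Entry> parentListing, String path) throws NoSuchFileException {
        String trimmed = path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
        String name = trimmed.substring(trimmed.lastIndexOf('/') + 1);
        for (Entry entry : parentListing) {
            if (entry.name.equals(name)) {
                return entry; // the entry's type is the authoritative signal
            }
        }
        // No entry with that name in the parent: the path does not exist.
        throw new NoSuchFileException(path);
    }
}
```

The cost of this approach is one listing of the parent per uncached lookup, which is where the per-path attribute cache described above helps.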

netlify Bot commented Apr 24, 2026

Deploy Preview for nextflow-docs-staging ready!

Name Link
🔨 Latest commit 953429e
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/69f1f906ed564f00088dd91a
😎 Deploy Preview https://deploy-preview-7070--nextflow-docs-staging.netlify.app

jorgee marked this pull request as ready for review April 29, 2026 13:19