diff --git a/adr/20260422-seqera-datalinks-filesystem.md b/adr/20260422-seqera-datalinks-filesystem.md
new file mode 100644
index 0000000000..dc290f26b6
--- /dev/null
+++ b/adr/20260422-seqera-datalinks-filesystem.md
@@ -0,0 +1,211 @@
# NIO Filesystem Support for Seqera Platform Data-Links

- Authors: Jorge Ejarque
- Status: draft
- Date: 2026-04-22
- Tags: nio, filesystem, seqera, data-links, nf-tower

Technical Story: Extend the `seqera://` NIO filesystem (introduced by [20260310-seqera-dataset-filesystem](20260310-seqera-dataset-filesystem.md)) to address files and directories inside Seqera Platform data-links without requiring cloud-provider credentials or SDK integration.

## Summary

Add a second resource type (`data-links`) to the existing `seqera://` filesystem in the `nf-tower` plugin. Paths of the form `seqera://<org>/<workspace>/data-links/<provider>/<name>/<path>` resolve to files and directories inside a Platform-managed data-link. Listings and attribute queries are served by the Platform's `/data-links/{id}/browse[/path]` endpoints; byte reads go through pre-signed URLs returned by `/data-links/{id}/generate-download-url` and fetched with a plain JDK `HttpClient`. Only the Seqera access token is required — no AWS/GCP/Azure credentials, no cloud SDK dependency.

As part of this change, the existing dataset-specific logic in `SeqeraFileSystemProvider`, `SeqeraFileSystem`, and `SeqeraPath` is extracted into a `ResourceTypeHandler` abstraction, so the two resource types coexist behind a common contract.

## Problem Statement

The dataset filesystem ships `seqera://` URI support for Platform datasets, but datasets are only one of several file-like resources users manage on the Platform. Data-links are the most common — they reference a cloud bucket or prefix (S3, GCS, Azure Blob) with potentially large, nested content. Today, a pipeline that needs to read a file inside a data-link must:

1. Look up the data-link's underlying URI outside Nextflow.
2.
Configure cloud credentials in the compute environment (AWS access keys, GCP service account, Azure SAS, etc.). +3. Reference the object by its cloud URI. + +This is friction the Platform already solves: data-links are scoped, ACL-controlled entities, and the Platform knows how to broker access to their content. A `seqera://` URI for a path inside a data-link would let pipelines consume Platform-managed data with only the Seqera access token — no cloud SDK, no credential sprawl. + +## Goals or Decision Drivers + +- Native `seqera://` access to files and directories inside Platform data-links, at arbitrary depth. +- Zero cloud-provider credential configuration — the Seqera access token is the only auth surface. +- No new runtime dependency on cloud SDKs (`aws-sdk`, `google-cloud-storage`, `azure-*`). +- Reuse of existing nf-tower plugin infrastructure — `TowerClient` for HTTP + auth + retry, tower-api DTOs for wire types. +- Introduce a `ResourceTypeHandler` abstraction so the dataset and data-link behaviors share one filesystem without leaking into each other. +- Preserve Platform-side access control for listings and metadata (not just reads). + +## Non-goals + +- Write operations to data-links (upload). The Platform's `POST /data-links/{id}/multipart-upload` is a future hook; it is not implemented in this iteration. +- Data-link management (create/update/delete the data-link entity itself). +- Transparent pre-signed URL renewal when a URL expires mid-stream — failures surface as `IOException` and Nextflow task retry handles them. +- Browse-result caching within a run. +- Fusion integration — Fusion has its own data-link access path. + +## Considered Options + +### Option 1: Platform-brokered credentials + cloud SDK delegation + +For each read, call the Platform to obtain short-lived AWS/GCP/Azure credentials scoped to the data-link, then use the existing `nf-amazon` / `nf-google` / `nf-azure` providers for the actual I/O. 
- Good, because cloud providers handle streaming, range reads, and multi-part reads efficiently.
- Bad, because it requires `nf-tower` to depend on (or coordinate with) three cloud plugins.
- Bad, because credential plumbing across plugin boundaries is complex — each cloud plugin has its own credential object model.
- Bad, because it adds failure modes around credential refresh windows crossing long reads.

### Option 2: Pre-signed URL + direct HTTPS fetch

Call the Platform's `GET /data-links/{id}/generate-download-url?filePath=<path>` endpoint to obtain a pre-signed URL; stream bytes through a standalone HTTPS client.

- Good, because there is no cloud SDK dependency — all I/O is generic HTTPS.
- Good, because the Platform is the only credential surface (user token goes in, signed URL comes out; credentials never cross our process boundary as a distinct object).
- Good, because it uniformly supports every provider the Platform supports — now and in the future — with no per-provider code.
- Good, because the existing `TowerClient` handles the Platform-side call (`/generate-download-url`) with retry/backoff, and the cloud-side fetch is a one-shot HTTPS GET through a standalone `java.net.http.HttpClient`.
- Bad, because pre-signed URLs have time windows; a very long read can outlive its URL. Acceptable: Nextflow task retry handles the failure.
- Bad, because range reads / multi-part reads are not implemented in this iteration. Acceptable: datasets are already single-shot reads and the pattern matches.

### Option 3: Proxy all bytes through the Platform

Route all reads through a Platform endpoint that streams content back to the client from the underlying cloud.

- Good, because the Platform sees and can log every byte.
- Bad, because it imposes Platform bandwidth/egress cost on every pipeline byte.
- Bad, because no such endpoint is offered — `/download` returns a URL, not bytes.

## Pros and Cons of the Options

See above.
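To make the Option 2 shape concrete, the two HTTP requests differ only in who carries credentials. This is an illustrative Java sketch, not the plugin's actual code: the class, method names, and endpoint string are hypothetical, and the real client routes the Platform-side call through `TowerClient` rather than building the request by hand.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class PresignedReadSketch {

    // Platform-side call: authenticated with the Seqera access token only.
    static HttpRequest downloadUrlRequest(String endpoint, String dataLinkId,
                                          long workspaceId, String filePath, String token) {
        String url = endpoint + "/data-links/" + dataLinkId
                + "/generate-download-url?workspaceId=" + workspaceId
                + "&filePath=" + URLEncoder.encode(filePath, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Bearer " + token)
                .GET()
                .build();
    }

    // Cloud-side fetch: the pre-signed URL already embeds the authorization,
    // so this request carries no Seqera header at all.
    static HttpRequest presignedFetch(String presignedUrl) {
        return HttpRequest.newBuilder(URI.create(presignedUrl)).GET().build();
    }
}
```

The asymmetry is the whole point of the option: the user token never reaches the cloud backend, and no cloud credential ever reaches the plugin.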
## Solution or decision outcome

Option 2 — pre-signed URL + direct HTTPS fetch. The plugin calls the Platform's `/generate-download-url` endpoint through `TowerClient.sendApiRequest()` to obtain a pre-signed URL, then fetches that URL with a plain JDK `HttpClient` (no Seqera `Authorization` header). The plugin never touches a cloud SDK and never holds a long-lived cloud credential.

Extend the `fs/` package with a real `ResourceTypeHandler` abstraction. Extract the existing dataset logic into a `DatasetsResourceHandler`. Add `DataLinksResourceHandler` as the second implementation.

## Rationale & discussion

### Path Hierarchy

The `seqera://` path gains a second resource-type branch:

```
seqera://                              → ROOT          (directory, depth 0)
 └── <org>/                            → ORGANIZATION  (directory, depth 1)
     └── <workspace>/                  → WORKSPACE     (directory, depth 2)
         ├── datasets/                 → RESOURCE TYPE (directory, depth 3)
         │   └── <name>[@<version>]    → DATASET       (file, depth 4)
         └── data-links/               → RESOURCE TYPE (directory, depth 3)
             └── <provider>/           → PROVIDER      (directory, depth 4)
                 └── <name>/           → DATA-LINK     (directory, depth 5)
                     └── <path>/…/     → CONTENT       (directory or file, depth 6+)
```

Three structural differences from datasets:

1. **Two identity segments** (`<provider>/<name>`) instead of one (`<name>`). Provider disambiguation is required because a workspace can host two data-links with the same name on different clouds.
2. **Arbitrary sub-path depth** below the data-link root. Each segment is a folder or file inside the underlying bucket.
3. **No version pinning** — data-link content is not versioned by the Platform. Content is always "current".

`ResourceTypeHandler.getIdentitySegmentCount()` encodes the difference: 1 for datasets, 2 for data-links. `SeqeraPath` treats everything after the identity segments as the handler-owned sub-path.
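The identity-segment rule can be sketched as a small parser. This is an illustrative Java sketch under stated assumptions, not the actual `SeqeraPath` implementation; the class and record names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class SeqeraPathSketch {

    // Parsed view: org/workspace/resourceType are fixed depths 0-2;
    // the handler owns the identity segments and the remaining sub-path.
    record Parsed(String org, String workspace, String resourceType,
                  List<String> identity, List<String> subPath) {}

    // identitySegments: 1 for "datasets", 2 for "data-links"
    static Parsed parse(String uri, int identitySegments) {
        if (!uri.startsWith("seqera://"))
            throw new IllegalArgumentException(uri);
        String[] seg = uri.substring("seqera://".length()).split("/");
        List<String> all = Arrays.asList(seg);
        return new Parsed(seg[0], seg[1], seg[2],
                all.subList(3, 3 + identitySegments),
                all.subList(3 + identitySegments, all.size()));
    }
}
```

With `identitySegments = 2`, everything after `<provider>/<name>` flows to the data-link handler as its sub-path; with `1`, a dataset path has no sub-path at all.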
### Component Structure

```
plugins/nf-tower/src/main/io/seqera/tower/plugin/
├── fs/                              ← generic NIO layer (refactored)
│   ├── SeqeraFileSystemProvider     ← dispatches by resourceType to handler
│   ├── SeqeraFileSystem             ← org/ws cache + handler registry
│   ├── SeqeraPath                   ← generic segment list (identity + sub-path)
│   ├── SeqeraFileAttributes         ← plain (isDir, size, lastModified) holder
│   ├── SeqeraPathFactory            ← unchanged
│   ├── DatasetInputStream           ← unchanged
│   ├── ResourceTypeHandler          ← NEW interface
│   └── handler/
│       ├── DatasetsResourceHandler  ← NEW — dataset logic extracted here
│       └── DataLinksResourceHandler ← NEW
├── dataset/
│   └── SeqeraDatasetClient          ← unchanged
└── datalink/                        ← NEW
    └── SeqeraDataLinkClient         ← typed client over TowerClient
                                       returns io.seqera.tower.model.* directly
```

No plugin-local DTO classes are introduced. `DataLinkDto`, `DataLinkContentResponse`, `DataLinkItem`, `DataLinkDownloadUrlResponse`, `DataLinkProvider` and related types are reused from `io.seqera:tower-api:1.121.0`.

### `ResourceTypeHandler` contract

```
interface ResourceTypeHandler {
    String getResourceType()            // "datasets" / "data-links"
    int getIdentitySegmentCount()       // 1 / 2
    Iterable<SeqeraPath> list(SeqeraPath dir) throws IOException
    SeqeraFileAttributes readAttributes(SeqeraPath p) throws IOException
    InputStream newInputStream(SeqeraPath p) throws IOException
    void checkAccess(SeqeraPath p, AccessMode... modes) throws IOException
}
```

`SeqeraFileSystemProvider` owns dispatch at depth ≥ 3. Depth 0–2 (root/org/workspace) remains in `SeqeraFileSystem`, shared across all handlers. At depth 3 (the workspace listing returns the resource-type children), the handler registry is enumerated — `datasets` and `data-links` are the two entries today, added automatically by the provider at `newFileSystem()` time.
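The registry-and-dispatch shape described above can be sketched minimally. This is a hypothetical illustration, not the plugin code: the real registry lives in `SeqeraFileSystem` and the dispatch in `SeqeraFileSystemProvider`, and the real handler interface carries the full NIO surface rather than one method.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class HandlerRegistrySketch {

    // One-method stand-in for the real ResourceTypeHandler contract.
    interface Handler { String resourceType(); }

    private final Map<String, Handler> handlers = new LinkedHashMap<>();

    // Called once per handler at filesystem construction time.
    void register(Handler h) { handlers.put(h.resourceType(), h); }

    // Depth-3 children of a workspace = the registered resource-type names.
    Set<String> workspaceChildren() { return handlers.keySet(); }

    // Depth >= 3 dispatch: the resource-type path segment selects the handler.
    Handler dispatch(String resourceType) {
        Handler h = handlers.get(resourceType);
        if (h == null)
            throw new IllegalArgumentException("Unknown resource type: " + resourceType);
        return h;
    }
}
```

Adding a third resource type is then one more `register()` call plus a new handler class, which is the extension story the ADR intends.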
### API Usage Summary (Data-Links)

| NIO operation | Platform endpoint | Notes |
| --- | --- | --- |
| enumerate providers in workspace (depth-3 listing) | `GET /data-links?workspaceId=X&max=100&offset=O` | offset pagination via lazy `Iterator<DataLinkDto>` |
| resolve one data-link by (provider, name) | `GET /data-links?workspaceId=X&search=<name>&max=100&offset=O` | server-side filter by name; short-circuit on first provider match; `@Memoized` |
| `newDirectoryStream(dir)` at data-link root | `GET /data-links/{id}/browse?workspaceId=X[&credentialsId=C]` | lazy `PagedDataLinkContent` — token pagination via `nextPageToken` |
| `newDirectoryStream(dir)` at a sub-path | `GET /data-links/{id}/browse/{path}?workspaceId=X[&credentialsId=C]` | same; slashes in `{path}` are preserved |
| `readAttributes(path)` inside a data-link | same as above (first page only) | short-circuited when `path.cachedAttributes` was set by a prior listing |
| `newInputStream(file)` | `GET /data-links/{id}/generate-download-url?workspaceId=X&filePath=<path>[&credentialsId=C]` | parse `DataLinkDownloadUrlResponse.url`; fetch with plain JDK `HttpClient` (no Seqera auth header — the URL is signed for the cloud backend) |

`credentialsId` is forwarded when `DataLinkDto.credentials` is non-empty (using the first entry's `id`); omitted otherwise.

### Key Design Decisions

1. **TowerClient delegation for Platform calls**: `SeqeraDataLinkClient` routes all Seqera API calls (list, content, download-URL) through `TowerClient.sendApiRequest()`, sharing authentication state with the dataset client. The pre-signed URL itself is fetched directly with a plain JDK `HttpClient` — no Seqera headers are sent to the cloud backend.

2.
**Pre-signed URLs, not credential brokering**: the Platform returns a URL that already has the auth embedded. No AWS/GCP/Azure SDK is imported; no credential object crosses the plugin boundary. This is the single biggest simplification relative to a "get creds, hand to cloud plugin" approach. + +3. **No per-stream URL renewal**: if a signed URL expires mid-read, the HTTP connection errors and the `InputStream` surfaces an `IOException`. Nextflow task retry handles the failure as it does for any other transient read failure. The plugin does not implement transparent re-issuance. + +4. **Provider disambiguation in the path**: the data-link identity is `(workspace, provider, name)` on the Platform side. The path segment layout mirrors this to avoid ambiguity when names collide across providers. + +5. **Reuse tower-api DTOs**: every wire type is an `io.seqera.tower.model.*` class already on the plugin's classpath via `tower-api:1.121.0`. No parallel plugin-local DTOs. + +6. **Handler registry at construction, not via PF4J**: handlers are instantiated in `SeqeraFileSystemProvider.newFileSystem()`. Adding a third resource type is a code change to this plugin, identical in shape to the dataset/data-link pair. No extension-point protocol is introduced — YAGNI. + +7. **`readAttributes` is single-target**: because `GET /data-links/{id}/browse/{path}` accepts both directory and file paths, a file-level `readAttributes` is one API call — not a parent browse plus filter. No N+1 problem; no browse cache needed. + +8. **Read-only stance preserved**: `SeqeraFileSystem.isReadOnly()` remains `true`. Write operations on data-links raise `UnsupportedOperationException`. The `/data-links/{id}/upload` endpoints are a future extension point. + +9. **Listings stream lazily**: paginated Platform responses are exposed as lazy iterators rather than eagerly-materialized lists. `listDataLinks` is an `Iterator` that fetches offsets on demand. 
`getContent` returns a `PagedDataLinkContent` that loads the first page eagerly (for `readAttributes`) and paginates further only as the iterator advances. Handler `list()` returns `Iterable`, flowed through `DirectoryStream` without full materialization. + +10. **Per-path attribute cache, not a global cache**: listings attach `SeqeraFileAttributes` to each emitted `SeqeraPath` via `resolveWithAttributes(name, attrs)`. A follow-up `readAttributes(child)` returns the cached value with zero API calls. Paths parsed from raw URIs (no prior listing) fall back to the live browse endpoint. No global browse-result or URL cache is maintained. + +11. **`credentialsId` forwarding**: when a data-link exposes credentials in its `DataLinkDto.credentials` list, the plugin forwards the first credential's `id` as the `credentialsId` query parameter on browse and download-URL calls. When the list is empty, the parameter is omitted and the Platform falls back to its default resolution. + +### Refactor Delivered by This Change + +Adding a second resource type requires a shared abstraction in the `fs/` package so the two behaviors do not collide: + +- The `ResourceTypeHandler` interface is introduced. +- All dataset-specific logic previously inlined in `SeqeraFileSystemProvider`, `SeqeraFileSystem`, and `SeqeraPath` is moved to a new `DatasetsResourceHandler`. +- `DataLinksResourceHandler` is added alongside it, implementing the same interface. +- The generic classes (`SeqeraFileSystemProvider`, `SeqeraFileSystem`, `SeqeraPath`) become resource-type-agnostic for depth ≥ 3 — they dispatch to handlers and carry no knowledge of either resource's semantics. + +The existing dataset test suite continues to pass unchanged; every dataset code path is routed through `DatasetsResourceHandler` without behavioral change. 
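The lazy offset-pagination pattern from decision 9 can be sketched as follows. This is a simplified, hypothetical illustration: it terminates on the first empty page only, whereas the real `DataLinkListIterator` also short-circuits on the server-reported `totalSize`, and the page fetcher here stands in for one `TowerClient` HTTP call per page.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

public class OffsetPagingSketch<T> implements Iterator<T> {

    private final IntFunction<List<T>> fetchPage; // offset -> next page; empty list = exhausted
    private Iterator<T> current = Collections.emptyIterator();
    private int offset = 0;
    private boolean exhausted = false;

    OffsetPagingSketch(IntFunction<List<T>> fetchPage) {
        this.fetchPage = fetchPage;
    }

    @Override
    public boolean hasNext() {
        // Fetch pages only when the current one is drained; never buffer more
        // than one page at a time.
        while (!current.hasNext()) {
            if (exhausted) return false;
            List<T> page = fetchPage.apply(offset);
            offset += page.size();
            if (page.isEmpty()) exhausted = true;
            current = page.iterator();
        }
        return true;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return current.next();
    }
}
```

A caller that stops after the first few items never triggers the later page fetches, which is the memory and latency property the ADR is after.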
+ +### Limitations + +- **No write support for data-links in this iteration.** Upload paths must continue to use Fusion or direct cloud-SDK access until a follow-up adds the `/data-links/{id}/upload` handler. +- **Signed URL expiration is not handled transparently.** Very long reads may outlive the URL's validity window. +- **Per-item last-modified is not exposed by the Platform browse API.** `SeqeraFileAttributes.lastModifiedTime()` reports `Instant.EPOCH` for data-link entries until the Platform surfaces this metadata. +- **Single endpoint per JVM** (unchanged from dataset ADR): concurrent access to multiple Platform endpoints in one JVM is not supported. + +## Links + +- [Spec](../specs/260422-seqera-datalinks-fs/spec.md) +- Extends [20260310-seqera-dataset-filesystem](20260310-seqera-dataset-filesystem.md) + +## More information + +- [Seqera Platform OpenAPI spec](https://cloud.seqera.io/openapi/seqera-api-latest.yml) — `/data-links` endpoints. +- [What is an ADR and why should you use them](https://github.com/thomvaill/log4brains/tree/master#-what-is-an-adr-and-why-should-you-use-them) diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/PagedDataLinkContent.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/PagedDataLinkContent.groovy new file mode 100644 index 0000000000..e5c5359740 --- /dev/null +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/PagedDataLinkContent.groovy @@ -0,0 +1,104 @@ +/* + * Copyright 2013-2026, Seqera Labs + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package io.seqera.tower.plugin.datalink

import groovy.transform.CompileStatic
import io.seqera.tower.model.DataLinkItem

/**
 * Lazy, paginated view over a data-link's content.
 *
 * The first page is fetched eagerly by the producer so callers can inspect
 * {@link #getOriginalPath()} and {@link #getFirstPage()} without triggering
 * additional HTTP calls. Iterating yields items from the first page followed
 * by subsequent pages fetched on demand via the injected page fetcher.
 */
@CompileStatic
class PagedDataLinkContent implements Iterable<DataLinkItem> {

    /**
     * Opaque page fetcher. Given a {@code nextPageToken}, returns the next page
     * as a map with keys {@code objects} ({@code List<DataLinkItem>}) and
     * {@code nextPageToken} ({@code String}, null if no more pages).
     */
    static interface PageFetcher {
        Map fetch(String nextPageToken) throws IOException
    }

    private final String originalPath
    private final List<DataLinkItem> firstPage
    private final String firstPageNextToken
    private final PageFetcher pageFetcher

    PagedDataLinkContent(String originalPath,
                         List<DataLinkItem> firstPage,
                         String firstPageNextToken,
                         PageFetcher pageFetcher) {
        this.originalPath = originalPath
        this.firstPage = firstPage ?: Collections.<DataLinkItem>emptyList()
        this.firstPageNextToken = firstPageNextToken
        this.pageFetcher = pageFetcher
    }

    String getOriginalPath() { originalPath }

    /** First page, loaded eagerly — bounded in size by the server's page size. */
    List<DataLinkItem> getFirstPage() { Collections.unmodifiableList(firstPage) }

    boolean isEmpty() { firstPage.isEmpty() && !firstPageNextToken }

    @Override
    Iterator<DataLinkItem> iterator() {
        return new PagedIterator(firstPage, firstPageNextToken, pageFetcher)
    }

    /** Lazy iterator that paginates on demand.
 */
    @CompileStatic
    private static class PagedIterator implements Iterator<DataLinkItem> {
        private Iterator<DataLinkItem> current
        private String nextToken
        private final PageFetcher fetcher

        PagedIterator(List<DataLinkItem> firstPage, String firstPageNextToken, PageFetcher fetcher) {
            this.current = firstPage.iterator()
            this.nextToken = firstPageNextToken
            this.fetcher = fetcher
        }

        @Override
        boolean hasNext() {
            while (!current.hasNext()) {
                if (!nextToken) return false
                try {
                    final page = fetcher.fetch(nextToken)
                    final items = (page?.objects ?: []) as List<DataLinkItem>
                    current = items.iterator()
                    nextToken = page?.nextPageToken as String
                } catch (IOException e) {
                    throw new UncheckedIOException(e)
                }
            }
            return true
        }

        @Override
        DataLinkItem next() {
            if (!hasNext()) throw new NoSuchElementException()
            return current.next()
        }
    }
}
diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy
new file mode 100644
index 0000000000..5f9041b726
--- /dev/null
+++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy
@@ -0,0 +1,296 @@
/*
 * Copyright 2013-2026, Seqera Labs
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
+ */ + +package io.seqera.tower.plugin.datalink + +import groovy.transform.Memoized + +import java.nio.file.AccessDeniedException +import java.nio.file.NoSuchFileException + +import groovy.json.JsonSlurper +import groovy.transform.CompileStatic +import groovy.util.logging.Slf4j +import io.seqera.tower.model.DataLinkCredentials +import io.seqera.tower.model.DataLinkDownloadUrlResponse +import io.seqera.tower.model.DataLinkDto +import io.seqera.tower.model.DataLinkItem +import io.seqera.tower.model.DataLinkItemType +import io.seqera.tower.model.DataLinkProvider +import io.seqera.tower.plugin.TowerClient +import nextflow.exception.AbortOperationException + +/** + * Typed client for Seqera Platform data-link API endpoints. + * + * Paginated endpoints return lazy iterators so callers don't materialize the + * full result set in memory — only the current page is buffered at any time. + */ +@Slf4j +@CompileStatic +class SeqeraDataLinkClient { + + private static final int LIST_PAGE_SIZE = 100 + + private final TowerClient towerClient + + SeqeraDataLinkClient(TowerClient towerClient) { + this.towerClient = towerClient + } + + private String getEndpoint() { towerClient.endpoint } + + /** + * Lazy iterator over every data-link in the workspace. + * Pages are fetched from {@code GET /data-links?workspaceId=&max=&offset=} + * on demand as the iterator advances. + */ + Iterator listDataLinks(long workspaceId) { + return new DataLinkListIterator(towerClient, endpoint, workspaceId, LIST_PAGE_SIZE) + } + + /** + * Distinct provider identifiers present in the workspace, sorted. + * The returned set is unmodifiable; memoized per workspace. 
+ */ + @Memoized + Set getDataLinkProviders(long workspaceId) { + final providers = new TreeSet() + final Iterator it = listDataLinks(workspaceId) + while (it.hasNext()) { + final p = it.next().provider?.toString() + if (p) providers.add(p) + } + return Collections.unmodifiableSet(providers) + } + + /** + * Resolve a data-link by {@code (provider, name)} in the given workspace. + * Iterates the API's list endpoint lazily (server-side filtered by {@code name}) + * and short-circuits on first match. + * + * Memoized per {@code (workspaceId, provider, name)} tuple. Note: Groovy's + * {@code @Memoized} caches successful returns only — a path that repeatedly + * references a non-existent data-link re-runs the search each time. + */ + @Memoized + DataLinkDto getDataLink(long workspaceId, String provider, String name) { + final Iterator it = new DataLinkListIterator(towerClient, endpoint, workspaceId, LIST_PAGE_SIZE, name) + while( it.hasNext() ) { + final dl = it.next() + if( dl.provider?.toString() == provider ) + return dl + } + throw new NoSuchFileException( + "seqera://.../data-links/${provider}/${name}", + null, + "Data-link '${name}' not found for provider '${provider}' in workspace '$workspaceId'") + } + + /** + * Browse the content of a data-link. + * The first page is fetched eagerly to populate metadata ({@code originalPath}, + * first-page items). Subsequent pages are fetched on demand as the returned + * {@link PagedDataLinkContent} is iterated. + * + * Endpoints: {@code GET /data-links/{id}/browse} (root) and + * {@code GET /data-links/{id}/browse/{path}} (sub-path). + * + * @param credentialsId optional data-link credentials identifier (from + * {@code DataLinkDto.credentials[0].id}); forwarded as a query param when set. + */ + PagedDataLinkContent getContent(String dataLinkId, String subPath, long workspaceId, String credentialsId = null) { + final pathSegment = subPath ? 
'/' + encodePath(subPath) : '' + final baseUrl = "${endpoint}/data-links/${encodePath(dataLinkId)}/browse${pathSegment}" + final page = fetchBrowsePage(baseUrl, workspaceId, credentialsId, null) + final firstItems = page.objects + final firstToken = page.nextPageToken + final originalPath = page.originalPath + final fetcher = new PagedDataLinkContent.PageFetcher() { + @Override + Map fetch(String token) throws IOException { + final next = fetchBrowsePage(baseUrl, workspaceId, credentialsId, token) + return [objects: next.objects, nextPageToken: next.nextPageToken] as Map + } + } + return new PagedDataLinkContent(originalPath, firstItems, firstToken, fetcher) + } + + /** {@code GET /data-links/{id}/generate-download-url?workspaceId=&filePath=[&credentialsId=]} */ + DataLinkDownloadUrlResponse getDownloadUrl(String dataLinkId, String subPath, long workspaceId, String credentialsId = null) { + String url = "${endpoint}/data-links/${encodePath(dataLinkId)}/generate-download-url?workspaceId=${workspaceId}&filePath=${encodeQuery(subPath ?: '')}" + if (credentialsId) url += "&credentialsId=${encodeQuery(credentialsId)}" + log.debug "Getting downloadURL: GET $url" + final resp = towerClient.sendApiRequest(url) + checkFsResponse(resp, url) + final json = new JsonSlurper().parseText(resp.message) as Map + final out = new DataLinkDownloadUrlResponse() + out.url = json.url as String + return out + } + + // ---- page-fetching helpers ---- + + /** Fetch one browse page and normalize it into a {@link BrowsePage}. 
*/ + private BrowsePage fetchBrowsePage(String baseUrl, long workspaceId, String credentialsId, String nextPageToken) { + String url = "${baseUrl}?workspaceId=${workspaceId}" + if (credentialsId) url += "&credentialsId=${encodeQuery(credentialsId)}" + if (nextPageToken) url += "&nextPageToken=${encodeQuery(nextPageToken)}" + log.debug "Fetching Browse page GET $url" + final resp = towerClient.sendApiRequest(url) + checkFsResponse(resp, url) + final json = new JsonSlurper().parseText(resp.message) as Map + final items = (json.objects as List)?.collect { Map m -> mapItem(m) } ?: Collections.emptyList() + return new BrowsePage(json.originalPath as String, items, json.nextPageToken as String) + } + + @CompileStatic + private static class BrowsePage { + final String originalPath + final List objects + final String nextPageToken + + BrowsePage(String originalPath, List objects, String nextPageToken) { + this.originalPath = originalPath + this.objects = objects + this.nextPageToken = nextPageToken + } + } + + /** Lazy iterator for the {@code /data-links} list endpoint (offset pagination). 
*/ + @CompileStatic + private static class DataLinkListIterator implements Iterator { + private final TowerClient towerClient + private final String endpoint + private final long workspaceId + private final int pageSize + private final String search + + private Iterator current = Collections.emptyIterator() + private int offset = 0 + private long total = -1L // -1 = unknown; set only when the server reports totalSize + private boolean exhausted = false + + DataLinkListIterator(TowerClient towerClient, String endpoint, long workspaceId, int pageSize, String search = null) { + this.towerClient = towerClient + this.endpoint = endpoint + this.workspaceId = workspaceId + this.pageSize = pageSize + this.search = search + } + + @Override + boolean hasNext() { + while (!current.hasNext()) { + if (exhausted) return false + fetchNextPage() + } + return true + } + + @Override + DataLinkDto next() { + if (!hasNext()) throw new NoSuchElementException() + return current.next() + } + + private void fetchNextPage() { + final url = "${endpoint}/data-links?workspaceId=${workspaceId}&max=${pageSize}&offset=${offset}${search ? 
'&search='+ encodeQuery(search) :''}" + log.debug "Fetching next list of datalinks: GET $url" + final resp = towerClient.sendApiRequest(url) + checkFsResponse(resp, url) + final json = new JsonSlurper().parseText(resp.message) as Map + final items = (json.dataLinks as List)?.collect { Map m -> mapDataLink(m) } ?: Collections.emptyList() + current = items.iterator() + offset += items.size() + // Record the server-reported total only if present (null/missing → leave as -1 and + // rely on an empty-page response to signal exhaustion) + if (total < 0 && json.totalSize != null) total = json.totalSize as Long + // Exhausted when: this page is empty, OR we've reached the known total + if (items.isEmpty() || (total >= 0 && offset >= total)) exhausted = true + } + } + + // ---- encoding / error mapping ---- + + /** URL-encode a path value while preserving {@code /} as path separators. */ + private static String encodePath(String s) { + new URI(null, null, s ?: '', null).rawPath ?: '' + } + + /** URL-encode a value intended for use as a query-string value. 
*/ + private static String encodeQuery(String s) { + URLEncoder.encode(s ?: '', 'UTF-8') + } + + private static void checkFsResponse(TowerClient.Response resp, String url) { + if (!resp.error) return + final code = resp.code + if (code == 401) + throw new AbortOperationException("Seqera authentication failed — check tower.accessToken or TOWER_ACCESS_TOKEN") + if (code == 403) + throw new AccessDeniedException(url, null, "Forbidden — check workspace permissions") + if (code == 404) + throw new NoSuchFileException(url) + throw new IOException("Seqera API error: HTTP ${code} for ${url}") + } + + private static DataLinkDto mapDataLink(Map m) { + final dto = new DataLinkDto() + dto.id = m.id as String + dto.name = m.name as String + dto.description = m.description as String + dto.resourceRef = m.resourceRef as String + if (m.provider) dto.provider = parseProvider(m.provider as String) + dto.region = m.region as String + final credList = m.credentials as List + if (credList) dto.credentials = credList.collect { Map c -> mapCredentials(c) } + return dto + } + + private static DataLinkCredentials mapCredentials(Map m) { + final c = new DataLinkCredentials() + c.id = m.id as String + c.name = m.name as String + if (m.provider) c.provider = parseProvider(m.provider as String) + return c + } + + private static DataLinkItem mapItem(Map m) { + final it = new DataLinkItem() + it.name = m.name as String + if (m.type) it.type = parseItemType(m.type as String) + it.size = (m.size as Long) ?: 0L + it.mimeType = m.mimeType as String + return it + } + + private static DataLinkProvider parseProvider(String value) { + try { + return DataLinkProvider.fromValue(value) + } catch (Throwable ignored) { + return DataLinkProvider.values().find { DataLinkProvider p -> p.toString() == value } + } + } + + private static DataLinkItemType parseItemType(String value) { + try { + return DataLinkItemType.fromValue(value) + } catch (Throwable ignored) { + return DataLinkItemType.values().find { 
DataLinkItemType t -> t.toString() == value } + } + } +} diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/ResourceTypeHandler.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/ResourceTypeHandler.groovy new file mode 100644 index 0000000000..55f0f35a64 --- /dev/null +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/ResourceTypeHandler.groovy @@ -0,0 +1,56 @@ +/* + * Copyright 2013-2026, Seqera Labs + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package io.seqera.tower.plugin.fs + +import java.nio.file.AccessMode +import java.nio.file.Path + +/** + * Strategy owning the semantics of one depth-3 path segment under {@code seqera://}. + * Registered in {@link SeqeraFileSystem} at filesystem construction. + * + * Implementations own their resource's API client, caches, and interpretation of + * trail segments beyond {@code resourceType}. The generic NIO layer does not look + * inside the trail. + */ +interface ResourceTypeHandler { + + /** e.g. {@code "datasets"} or {@code "data-links"}. Must match the depth-3 path segment. */ + String getResourceType() + + /** + * List entries at the given directory path. Caller has verified depth ≥ 3. + * Returning an {@link Iterable} lets implementations stream large listings + * without materializing them in memory. + */ + Iterable list(SeqeraPath dir) throws IOException + + /** Return attributes for any path at depth ≥ 3 owned by this handler. 
*/ + SeqeraFileAttributes readAttributes(SeqeraPath path) throws IOException + + /** + * Open a read stream for a leaf path. Throw {@link IllegalArgumentException} + * if the path is a directory or not otherwise addressable as a file. + */ + InputStream newInputStream(SeqeraPath path) throws IOException + + /** + * Verify the path exists and requested modes are satisfiable. READ is allowed; + * WRITE/EXECUTE throw {@link java.nio.file.AccessDeniedException}. + */ + void checkAccess(SeqeraPath path, AccessMode... modes) throws IOException +} diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileAttributes.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileAttributes.groovy index 3246bd35e2..abc35d044b 100644 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileAttributes.groovy +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileAttributes.groovy @@ -21,67 +21,57 @@ import java.nio.file.attribute.FileTime import java.time.Instant import groovy.transform.CompileStatic -import io.seqera.tower.model.DatasetDto /** * {@link BasicFileAttributes} for {@code seqera://} paths. - * For depth < 4 (directory paths): {@code isDirectory=true}, {@code size=0}. - * For depth 4 (dataset file paths): {@code isRegularFile=true}, timestamps from {@link DatasetDto}. * - * @author Seqera Labs + * Resource-type agnostic: virtual directories use the {@code (boolean isDir)} + * constructor; file-like entries use the explicit {@code (size, lastMod, created, key)} + * constructor. Handlers build instances using whatever metadata the underlying + * resource exposes. */ @CompileStatic class SeqeraFileAttributes implements BasicFileAttributes { private final boolean directory - private final DatasetDto dataset + private final long size + private final Instant lastModified + private final Instant created + private final Object fileKey - /** Construct attributes for a virtual directory (depth 0–3). 
*/ + /** Construct attributes for a virtual directory. */ SeqeraFileAttributes(boolean isDir) { this.directory = isDir - this.dataset = null + this.size = 0L + this.lastModified = Instant.EPOCH + this.created = Instant.EPOCH + this.fileKey = null } - /** Construct attributes for a dataset file (depth 4). */ - SeqeraFileAttributes(DatasetDto dataset) { + /** Construct attributes for a regular file with explicit metadata. */ + SeqeraFileAttributes(long size, Instant lastModified, Instant created, Object fileKey) { this.directory = false - this.dataset = dataset + this.size = size >= 0 ? size : 0L + this.lastModified = lastModified ?: Instant.EPOCH + this.created = created ?: Instant.EPOCH + this.fileKey = fileKey } - @Override - FileTime lastModifiedTime() { - if (dataset?.lastUpdated) { - return FileTime.from(dataset.lastUpdated.toInstant()) - } - return FileTime.from(Instant.EPOCH) - } + @Override FileTime lastModifiedTime() { FileTime.from(lastModified) } - @Override - FileTime lastAccessTime() { lastModifiedTime() } + @Override FileTime lastAccessTime() { FileTime.from(lastModified) } - @Override - FileTime creationTime() { - if (dataset?.dateCreated) { - return FileTime.from(dataset.dateCreated.toInstant()) - } - return FileTime.from(Instant.EPOCH) - } + @Override FileTime creationTime() { FileTime.from(created) } - @Override - boolean isRegularFile() { !directory } + @Override boolean isRegularFile() { !directory } - @Override - boolean isDirectory() { directory } + @Override boolean isDirectory() { directory } - @Override - boolean isSymbolicLink() { false } + @Override boolean isSymbolicLink() { false } - @Override - boolean isOther() { false } + @Override boolean isOther() { false } - @Override - long size() { 0L } + @Override long size() { size } - @Override - Object fileKey() { dataset?.id } + @Override Object fileKey() { fileKey } } diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystem.groovy 
b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystem.groovy index 4f639facd6..3ee200f425 100644 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystem.groovy +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystem.groovy @@ -27,48 +27,52 @@ import java.nio.file.spi.FileSystemProvider import groovy.transform.CompileStatic import groovy.util.logging.Slf4j -import io.seqera.tower.model.DatasetDto -import io.seqera.tower.model.DatasetVersionDto import io.seqera.tower.model.OrgAndWorkspaceDto import io.seqera.tower.plugin.dataset.SeqeraDatasetClient /** * FileSystem instance for the {@code seqera://} scheme. - * One instance per (endpoint + credentials) pair, cached by {@link SeqeraFileSystemProvider}. + * One instance per {@link SeqeraFileSystemProvider}. * - * Lazily populates org/workspace/dataset caches on first access. - * Cache is invalidated on dataset write operations. - * - * @author Seqera Labs + * Resource-type-agnostic: the filesystem owns the org/workspace cache (shared across + * resource types) and a registry of {@link ResourceTypeHandler}s. Each handler owns + * its own API client and resource-specific caches. 
 */
@Slf4j
@CompileStatic
class SeqeraFileSystem extends FileSystem {
private final SeqeraFileSystemProvider provider0
- final SeqeraDatasetClient client
+ private SeqeraDatasetClient orgWorkspaceClient
/** orgName → orgId */
private final Map<String, Long> orgCache = new LinkedHashMap<>()
/** "orgName/workspaceName" → workspaceId */
private final Map<String, Long> workspaceCache = new LinkedHashMap<>()
- /** workspaceId → list of DatasetDto */
- private final Map<Long, List<DatasetDto>> datasetCache = new LinkedHashMap<>()
- /** datasetId → list of DatasetVersionDto */
- private final Map<String, List<DatasetVersionDto>> versionCache = new LinkedHashMap<>()
+
+ /** resourceType → handler */
+ private final Map<String, ResourceTypeHandler> handlers = new LinkedHashMap<>()
private volatile boolean orgWorkspaceCacheLoaded = false
- SeqeraFileSystem(SeqeraFileSystemProvider provider, SeqeraDatasetClient client) {
+ SeqeraFileSystem(SeqeraFileSystemProvider provider) {
this.provider0 = provider
- this.client = client
+ }
+
+ /**
+ * Attach the dataset client used for user-info / workspaces lookup. The org/workspace
+ * listing uses dataset endpoints today ({@code GET /user-info}, {@code GET /user/{id}/workspaces});
+ * keeping the client on the filesystem avoids duplicating it across handlers.
+ */
+ void setOrgWorkspaceClient(SeqeraDatasetClient client) {
+ this.orgWorkspaceClient = client
}
@Override
FileSystemProvider provider() { provider0 }
@Override
- void close() { /* no-op: platform API connection is stateless */ }
+ void close() { /* no-op */ }
@Override
boolean isOpen() { true }
@@ -111,16 +115,14 @@ class SeqeraFileSystem extends FileSystem {
throw new UnsupportedOperationException("WatchService not supported by seqera:// filesystem")
}
- // ---- cache management ----
+ // ---- org/workspace cache (shared across handlers) ----
- /**
- * Ensure the org/workspace cache is populated. Thread-safe: loads at most once.
- * Calls GET /user-info then GET /user/{userId}/workspaces.
- */ synchronized void loadOrgWorkspaceCache() { if (orgWorkspaceCacheLoaded) return + if (!orgWorkspaceClient) + throw new IllegalStateException("SeqeraFileSystem has no orgWorkspaceClient attached") log.debug "Loading Seqera org/workspace cache" - final entries = client.listUserWorkspacesAndOrgs(client.getUserId()) + final entries = orgWorkspaceClient.listUserWorkspacesAndOrgs(orgWorkspaceClient.getUserId()) for (OrgAndWorkspaceDto entry : entries) { if (entry.orgName) orgCache.put(entry.orgName, entry.orgId) @@ -130,28 +132,18 @@ class SeqeraFileSystem extends FileSystem { orgWorkspaceCacheLoaded = true } - /** - * @return distinct org names visible to the authenticated user - */ synchronized Set listOrgNames() { loadOrgWorkspaceCache() return Collections.unmodifiableSet(orgCache.keySet()) } - /** - * @return workspace names for the given org - */ synchronized List listWorkspaceNames(String org) { loadOrgWorkspaceCache() return workspaceCache.keySet() - .findAll { String k -> k.startsWith("${org}/") } - .collect { String k -> k.substring(org.length() + 1) } + .findAll { String k -> k.startsWith("${org}/") } + .collect { String k -> k.substring(org.length() + 1) } } - /** - * Resolve a workspace ID by org and workspace name. - * @throws NoSuchFileException if the org or workspace is not in the cache - */ synchronized long resolveWorkspaceId(String org, String workspace) throws NoSuchFileException { loadOrgWorkspaceCache() final key = "${org}/${workspace}" as String @@ -161,54 +153,17 @@ class SeqeraFileSystem extends FileSystem { return id } - /** - * Return datasets for the given workspace, populating the cache on first access. 
- */ - synchronized List resolveDatasets(long workspaceId) { - List cached = datasetCache.get(workspaceId) - if (cached == null) { - cached = client.listDatasets(workspaceId) - datasetCache.put(workspaceId, cached) - } - return cached - } + // ---- handler registry ---- - /** - * Invalidate the dataset and version caches for a workspace (call after a write operation). - */ - synchronized void invalidateDatasetCache(long workspaceId) { - // Remove version caches for all datasets in this workspace - final datasets = datasetCache.get(workspaceId) - if (datasets) { - for (DatasetDto ds : datasets) { - versionCache.remove(ds.id) - } - } - datasetCache.remove(workspaceId) + synchronized void registerHandler(ResourceTypeHandler handler) { + handlers.put(handler.resourceType, handler) } - /** - * Resolve a DatasetDto by name within a workspace. - * @throws NoSuchFileException if no dataset with the given name exists - */ - synchronized DatasetDto resolveDataset(long workspaceId, String name) throws NoSuchFileException { - final datasets = resolveDatasets(workspaceId) - return datasets.find { DatasetDto d -> d.name == name } + synchronized ResourceTypeHandler getHandler(String resourceType) { + handlers.get(resourceType) } - /** - * Return versions for the given dataset, populating the cache on first access. - * Note: the version cache is only invalidated when the workspace dataset cache is invalidated - * (e.g. after a write operation). Versions published externally during a pipeline run will not - * be visible until the cache is cleared. 
- */ - synchronized List resolveVersions(String datasetId, long workspaceId) { - List cached = versionCache.get(datasetId) - if (cached == null) { - cached = client.listVersions(datasetId, workspaceId) - versionCache.put(datasetId, cached) - } - return cached + synchronized Set getResourceTypes() { + Collections.unmodifiableSet(new LinkedHashSet(handlers.keySet())) } - } diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy index bed6667dd3..19d965633d 100644 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy @@ -20,6 +20,7 @@ import java.nio.channels.SeekableByteChannel import java.nio.file.AccessDeniedException import java.nio.file.AccessMode import java.nio.file.CopyOption +import java.nio.file.DirectoryIteratorException import java.nio.file.DirectoryStream import java.nio.file.FileStore import java.nio.file.FileSystem @@ -27,9 +28,7 @@ import java.nio.file.FileSystemAlreadyExistsException import java.nio.file.FileSystemNotFoundException import java.nio.file.Files import java.nio.file.LinkOption -import java.nio.file.DirectoryIteratorException import java.nio.file.NoSuchFileException -import java.nio.file.NotDirectoryException import java.nio.file.OpenOption import java.nio.file.Path import java.nio.file.ProviderMismatchException @@ -41,23 +40,19 @@ import java.nio.file.spi.FileSystemProvider import groovy.transform.CompileStatic import groovy.util.logging.Slf4j -import io.seqera.tower.model.DatasetDto -import io.seqera.tower.model.DatasetVersionDto import io.seqera.tower.plugin.TowerClient import io.seqera.tower.plugin.TowerFactory +import io.seqera.tower.plugin.datalink.SeqeraDataLinkClient import io.seqera.tower.plugin.dataset.SeqeraDatasetClient +import 
io.seqera.tower.plugin.fs.handler.DataLinksResourceHandler +import io.seqera.tower.plugin.fs.handler.DatasetsResourceHandler /** - * NIO {@link FileSystemProvider} for the {@code seqera://} scheme. - * Registered via {@code META-INF/services/java.nio.file.spi.FileSystemProvider}. - * - * Enables Nextflow pipelines to read Seqera Platform datasets as ordinary file paths: - * {@code seqera:////datasets/} - * - * Follows the {@code LinFileSystemProvider} pattern for structure. - * Write support follows the {@code AzFileSystemProvider} buffered-upload pattern. + * NIO {@link FileSystemProvider} for the {@code seqera://} scheme. Registered via + * {@code META-INF/services/java.nio.file.spi.FileSystemProvider}. * - * @author Seqera Labs + * Generic for depth ≥ 3: dispatches to a {@link ResourceTypeHandler} selected by + * {@code SeqeraPath.resourceType}. The handlers own all resource-specific logic. */ @Slf4j @CompileStatic @@ -65,24 +60,26 @@ class SeqeraFileSystemProvider extends FileSystemProvider { public static final String SCHEME = 'seqera' - /** Single filesystem instance — TowerClient is a singleton per session */ private volatile SeqeraFileSystem fileSystem @Override String getScheme() { SCHEME } - // ---- FileSystem lifecycle ---- + // ---- lifecycle ---- @Override synchronized FileSystem newFileSystem(URI uri, Map env) throws IOException { checkScheme(uri) if (fileSystem) throw new FileSystemAlreadyExistsException("File system `seqera://` already exists") - final TowerClient towerClient = TowerFactory.client() - if (!towerClient) - throw new IllegalStateException("File system `seqera://` requires the Seqera Platform access token to be provided - use `tower.accessToken` config option or TOWER_ACCESS_TOKEN env variable") - final client = new SeqeraDatasetClient(towerClient) - fileSystem = new SeqeraFileSystem(this, client) + final TowerClient tc = TowerFactory.client() + if (!tc) + throw new IllegalStateException("File system `seqera://` requires the Seqera 
Platform access token — use `tower.accessToken` config option or TOWER_ACCESS_TOKEN env variable") + final datasetClient = new SeqeraDatasetClient(tc) + fileSystem = new SeqeraFileSystem(this) + fileSystem.setOrgWorkspaceClient(datasetClient) + fileSystem.registerHandler(new DatasetsResourceHandler(fileSystem, datasetClient)) + fileSystem.registerHandler(new DataLinksResourceHandler(fileSystem, new SeqeraDataLinkClient(tc))) return fileSystem } @@ -95,10 +92,7 @@ class SeqeraFileSystemProvider extends FileSystemProvider { synchronized SeqeraFileSystem getOrCreateFileSystem(URI uri, Map env) { checkScheme(uri) - if (!fileSystem) { - final envMap = env ?: Collections.emptyMap() - newFileSystem(uri, envMap as Map) - } + if (!fileSystem) newFileSystem(uri, env ?: Collections.emptyMap()) return fileSystem } @@ -108,51 +102,46 @@ class SeqeraFileSystemProvider extends FileSystemProvider { return new SeqeraPath(fs, uri.toString()) } - // ---- Read operations ---- + // ---- read ---- @Override InputStream newInputStream(Path path, OpenOption... 
options) throws IOException {
final sp = toSeqeraPath(path)
- if (sp.depth() != 4)
- throw new IllegalArgumentException("Operation `newInputStream` requires a dataset path (depth 4): $path")
+ if (sp.depth() < 3)
+ throw new IllegalArgumentException("newInputStream requires a leaf path: $path")
final fs = sp.getFileSystem() as SeqeraFileSystem
- final workspaceId = fs.resolveWorkspaceId(sp.org, sp.workspace)
- final dataset = fs.resolveDataset(workspaceId, sp.datasetName)
- if (!dataset)
- throw new NoSuchFileException(sp.toString(), null, "Dataset '${sp.datasetName}' not found in workspace $sp.workspace")
- final version = resolveVersion(fs, dataset, sp)
- log.debug "Downloading dataset '${sp.datasetName}' version ${version.version} (${version.fileName}) from workspace $workspaceId"
- return fs.client.downloadDataset(dataset.id, String.valueOf(version.version), version.fileName, dataset.workspaceId)
+ final h = fs.getHandler(sp.resourceType)
+ if (!h)
+ throw new NoSuchFileException(path.toString(), null, "Unsupported resource type: ${sp.resourceType}")
+ return h.newInputStream(sp)
}
@Override
SeekableByteChannel newByteChannel(Path path, Set<? extends OpenOption> options, FileAttribute<?>... attrs) throws IOException {
if (options?.contains(StandardOpenOption.WRITE) || options?.contains(StandardOpenOption.APPEND))
- throw new UnsupportedOperationException("File system `seqera://` is read-only")
- final inputStream = newInputStream(path)
- return new DatasetInputStream(inputStream)
+ throw new UnsupportedOperationException("seqera:// filesystem is read-only")
+ return new DatasetInputStream(newInputStream(path))
}
- // ---- Metadata ----
+ // ---- attributes ----
@Override
<A extends BasicFileAttributes> A readAttributes(Path path, Class<A> type, LinkOption... options) throws IOException {
if (!BasicFileAttributes.isAssignableFrom(type))
throw new UnsupportedOperationException("Attribute type not supported: $type")
final sp = toSeqeraPath(path)
+ if (sp.cachedAttributes)
+ return (A) sp.cachedAttributes
final fs = sp.getFileSystem() as SeqeraFileSystem
final d = sp.depth()
- if (d < 4) {
- // Virtual directory — validate the path exists (throws NoSuchFileException if not)
- validateDirectoryExists(fs, sp)
+ if (d < 3) {
+ validateSharedDirectoryExists(fs, sp)
return (A) new SeqeraFileAttributes(true)
}
- // Dataset file
- final workspaceId = fs.resolveWorkspaceId(sp.org, sp.workspace)
- final dataset = fs.resolveDataset(workspaceId, sp.datasetName)
- if (!dataset)
- throw new NoSuchFileException(sp.toString(), null, "Dataset '${sp.datasetName}' not found in workspace $sp.workspace")
- return (A) new SeqeraFileAttributes(dataset)
+ final h = fs.getHandler(sp.resourceType)
+ if (!h)
+ throw new NoSuchFileException(path.toString(), null, "Unsupported resource type: ${sp.resourceType}")
+ return (A) h.readAttributes(sp)
}
@Override
@@ -160,7 +149,7 @@ class SeqeraFileSystemProvider extends FileSystemProvider {
throw new UnsupportedOperationException("Operation `readAttributes(String)` not supported by `seqera://` file system")
}
- // ---- Access check ----
+ // ---- access ----
@Override
void checkAccess(Path path, AccessMode...
modes) throws IOException {
@@ -169,73 +158,114 @@
if (m == AccessMode.WRITE || m == AccessMode.EXECUTE)
throw new AccessDeniedException(path.toString(), null, "seqera:// filesystem is read-only")
}
- // For READ, verify the path resolves without throwing NoSuchFileException
- if (sp.depth() >= 1) {
- final fs = sp.getFileSystem() as SeqeraFileSystem
- if (sp.depth() == 1) {
- fs.loadOrgWorkspaceCache()
- if (!fs.listOrgNames().contains(sp.org))
- throw new NoSuchFileException(path.toString(), null, "Organisation not found")
- } else {
- fs.resolveWorkspaceId(sp.org, sp.workspace)
- }
+ final d = sp.depth()
+ if (d == 0) return
+ if (d < 3) {
+ validateSharedDirectoryExists(sp.getFileSystem() as SeqeraFileSystem, sp)
+ return
}
+ final fs = sp.getFileSystem() as SeqeraFileSystem
+ final h = fs.getHandler(sp.resourceType)
+ if (!h)
+ throw new NoSuchFileException(path.toString(), null, "Unsupported resource type: ${sp.resourceType}")
+ h.checkAccess(sp, modes)
}
- // ---- Directory stream ----
+ // ---- directory stream ----
@Override
DirectoryStream<Path> newDirectoryStream(Path dir, DirectoryStream.Filter<? super Path> filter) throws IOException {
final sp = toSeqeraPath(dir)
final fs = sp.getFileSystem() as SeqeraFileSystem
final d = sp.depth()
- List<Path> entries
+ Iterable<Path> entries
if (d == 0) {
- // Root: list distinct org names
fs.loadOrgWorkspaceCache()
entries = fs.listOrgNames().collect { String org -> sp.resolve(org) as Path }
} else if (d == 1) {
- // Org: list workspace names
fs.loadOrgWorkspaceCache()
entries = fs.listWorkspaceNames(sp.org).collect { String ws -> sp.resolve(ws) as Path }
} else if (d == 2) {
- // Workspace: static resource types
- entries = ['datasets'].collect { String rt -> sp.resolve(rt) as Path }
- } else if (d == 3) {
- // Resource type directory: list dataset names
- final workspaceId = fs.resolveWorkspaceId(sp.org, sp.workspace)
- entries = fs.resolveDatasets(workspaceId).collect { DatasetDto ds ->
- sp.resolve(ds.name) as Path
- }
+ fs.resolveWorkspaceId(sp.org, sp.workspace)
+ entries = fs.getResourceTypes().collect { String rt -> sp.resolve(rt) as Path }
} else {
- throw new NotDirectoryException(dir.toString())
+ final h = fs.getHandler(sp.resourceType)
+ if (!h)
+ throw new NoSuchFileException(dir.toString(), null, "Unsupported resource type: ${sp.resourceType}")
+ entries = h.list(sp)
}
- final filtered = filter ? entries.findAll { Path p ->
- try { filter.accept(p) }
- catch (IOException e) { throw new DirectoryIteratorException(e) }
- } : entries
-
+ final source = entries
return new DirectoryStream<Path>() {
- @Override Iterator<Path> iterator() { filtered.iterator() }
+ private boolean iteratorCalled = false
+ @Override
+ Iterator<Path> iterator() {
+ // NIO contract: DirectoryStream.iterator() may be called at most once.
+ // For data-link listings a second iteration would also re-fetch pages 2+
+ // (needlessly doubling API calls), so enforcing the contract is a win.
+ if (iteratorCalled)
+ throw new IllegalStateException("DirectoryStream.iterator() may be called at most once")
+ iteratorCalled = true
+ final inner = source.iterator()
+ if (!filter) return inner
+ return new FilteredIterator<Path>(inner, filter)
+ }
@Override void close() {}
}
}
- // ---- Copy ----
+ /** Lazy filtering iterator: calls the filter as each element is consumed. */
+ @CompileStatic
+ private static class FilteredIterator<T> implements Iterator<T> {
+ private final Iterator<T> inner
+ private final DirectoryStream.Filter<? super T> filter
+ private T buffered
+ private boolean hasBuffered = false
+
+ FilteredIterator(Iterator<T> inner, DirectoryStream.Filter<? super T> filter) {
+ this.inner = inner
+ this.filter = filter
+ }
+
+ @Override
+ boolean hasNext() {
+ while (!hasBuffered && inner.hasNext()) {
+ final candidate = inner.next()
+ try {
+ if (filter.accept(candidate)) {
+ buffered = candidate
+ hasBuffered = true
+ }
+ } catch (IOException e) {
+ throw new DirectoryIteratorException(e)
+ }
+ }
+ return hasBuffered
+ }
+
+ @Override
+ T next() {
+ if (!hasNext()) throw new NoSuchElementException()
+ final out = buffered
+ buffered = null
+ hasBuffered = false
+ return out
+ }
+ }
+
+ // ---- copy ----
@Override
void copy(Path source, Path target, CopyOption... options) throws IOException {
toSeqeraPath(source)
if (target instanceof SeqeraPath)
throw new UnsupportedOperationException("seqera:// filesystem is read-only")
- // cross-provider (seqera → local): stream to target
try (final InputStream is = newInputStream(source)) {
Files.copy(is, target, options)
}
}
- // ---- Unsupported mutations ----
+ // ---- unsupported mutations ----
@Override
void move(Path source, Path target, CopyOption...
options) { @@ -252,7 +282,7 @@ class SeqeraFileSystemProvider extends FileSystemProvider { throw new UnsupportedOperationException("createDirectory() not supported by seqera:// filesystem") } - // ---- Misc ---- + // ---- misc ---- @Override boolean isSameFile(Path path, Path path2) throws IOException { @@ -277,11 +307,10 @@ class SeqeraFileSystemProvider extends FileSystemProvider { throw new UnsupportedOperationException("setAttribute() not supported by seqera:// filesystem") } - // ---- private helpers ---- + // ---- helpers ---- private static SeqeraPath toSeqeraPath(Path path) { - if (path !instanceof SeqeraPath) - throw new ProviderMismatchException() + if (path !instanceof SeqeraPath) throw new ProviderMismatchException() return (SeqeraPath) path } @@ -290,36 +319,13 @@ class SeqeraFileSystemProvider extends FileSystemProvider { throw new IllegalArgumentException("Not a seqera:// URI: $uri") } - private static void validateDirectoryExists(SeqeraFileSystem fs, SeqeraPath sp) throws NoSuchFileException { + private static void validateSharedDirectoryExists(SeqeraFileSystem fs, SeqeraPath sp) throws NoSuchFileException { final d = sp.depth() if (d == 0) return - // Depth 1+: ensure org/workspace cache is loaded fs.loadOrgWorkspaceCache() if (d >= 1 && !fs.listOrgNames().contains(sp.org)) throw new NoSuchFileException("seqera://${sp.org}", null, "Organisation not found") if (d >= 2) fs.resolveWorkspaceId(sp.org, sp.workspace) - if (d >= 3 && sp.resourceType != 'datasets') - throw new NoSuchFileException("seqera://${sp.org}/${sp.workspace}/${sp.resourceType}", null, "Unsupported resource type") - } - - private static DatasetVersionDto resolveVersion(SeqeraFileSystem fs, DatasetDto dataset, SeqeraPath sp) throws IOException { - final pinnedVersion = sp.version - final versions = fs.resolveVersions(dataset.id, dataset.workspaceId) - if (versions.isEmpty()) - throw new NoSuchFileException(sp.toString(), null, "No versions available for dataset '${dataset.name}'") - 
if (pinnedVersion) { - final found = versions.find { DatasetVersionDto v -> String.valueOf(v.version) == pinnedVersion } - if (!found) - throw new NoSuchFileException(sp.toString(), null, "Version '${pinnedVersion}' not found for dataset '${dataset.name}'") - return found - } - // Latest non-disabled version - final latest = versions.findAll { DatasetVersionDto v -> !v.disabled } - .max { DatasetVersionDto v -> v.version } - if (!latest) - throw new NoSuchFileException(sp.toString(), null, "No enabled versions for dataset '${dataset.name}'") - return latest } - } diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraPath.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraPath.groovy index af2b39165c..2da5dcfdde 100644 --- a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraPath.groovy +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraPath.groovy @@ -30,211 +30,190 @@ import groovy.transform.CompileStatic /** * {@link Path} implementation for the {@code seqera://} scheme. * - * Path hierarchy: + * Path shape: *
- *   depth 0  seqera://                                   (root — directory)
- *   depth 1  seqera://<org>                              (org — directory)
- *   depth 2  seqera://<org>/<workspace>                  (workspace — directory)
- *   depth 3  seqera://<org>/<workspace>/datasets          (resource type — directory)
- *   depth 4  seqera://<org>/<workspace>/datasets/<name>  (dataset file)
- *            seqera://<org>/<workspace>/datasets/<name@ver>  (pinned version)
+ *   seqera://                          depth 0 — root
+ *   seqera://<org>                     depth 1
+ *   seqera://<org>/<ws>                depth 2
+ *   seqera://<org>/<ws>/<type>         depth 3 — resource type
+ *   seqera://<org>/<ws>/<type>/...     depth 4+ — handler-owned trail
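+ *
+ *   For example, a file inside a data-link (hypothetical org, workspace,
+ *   and data-link names, shown for illustration only):
+ *   seqera://my-org/my-workspace/data-links/raw-reads/sample1/R1.fastq.gz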
  * 
* - * @author Seqera Labs + * Resource-type-agnostic for depth ≥ 3: segments after {@code resourceType} are + * exposed as {@link #getTrail()} and interpreted by the matching + * {@link ResourceTypeHandler}. */ @CompileStatic class SeqeraPath implements Path { - /** URI scheme */ public static final String SCHEME = 'seqera' public static final String PROTOCOL = "${SCHEME}://" public static final String SEPARATOR = '/' private final SeqeraFileSystem fs - /** path segments in order: [org, workspace, resourceType, datasetName] — null for missing levels */ private final String org private final String workspace private final String resourceType - private final String datasetName - /** version string extracted from {@code @version} suffix; null when not pinned */ - private final String version - /** - * Raw relative path string — non-null only for relative {@code SeqeraPath} instances - * created by {@link #relativize(Path)}. When non-null, {@link #fs} is {@code null} - * and all segment fields are {@code null}. - */ + private final List trail + /** non-null only for relative paths produced by {@link #relativize(Path)} */ private final String relPath - /** - * Parse a {@code seqera://} URI string into a SeqeraPath. - * The URI authority is the org; path segments are workspace, resourceType, datasetName. - * The last segment may contain a {@code @version} suffix. + * Optional attributes attached when this path was produced by a directory listing, + * so {@code readAttributes()} can return them without a follow-up API call. + * Not part of the URI — does not affect {@link #equals}, {@link #hashCode}, + * {@link #toString}, {@link #toUri}, or propagation via {@link #resolve} / {@link #getParent}. */ + private final SeqeraFileAttributes cachedAttributes + + /** Parse a {@code seqera://} URI string. 
     */
     SeqeraPath(SeqeraFileSystem fs, String uriString) {
         this.fs = fs
         this.relPath = null
-        if (!uriString.startsWith("${SCHEME}://"))
+        this.cachedAttributes = null
+        if (!uriString.startsWith(PROTOCOL))
             throw new InvalidPathException(uriString, "Not a seqera:// URI")
-        // strip scheme: seqera://rest
-        final withoutScheme = uriString.substring("${SCHEME}://".length())
-        // split on '/'
-        final parts = withoutScheme.split('/', -1).toList().findAll { it != null } as List<String>
-        // parts[0]=org, parts[1]=workspace, parts[2]=resourceType, parts[3]=datasetName[@version]
-        this.org = parts.size() > 0 && parts[0] ? parts[0] : null
-        this.workspace = parts.size() > 1 && parts[1] ? parts[1] : null
+        final withoutScheme = uriString.substring(PROTOCOL.length())
+        final parts = withoutScheme.split('/', -1).toList().findAll { String s -> s != null } as List<String>
+        this.org = parts.size() > 0 && parts[0] ? parts[0] : null
+        this.workspace = parts.size() > 1 && parts[1] ? parts[1] : null
         this.resourceType = parts.size() > 2 && parts[2] ? parts[2] : null
-        if (parts.size() > 3 && parts[3]) {
-            final last = parts[3]
-            final atIdx = last.lastIndexOf('@')
-            if (atIdx > 0) {
-                this.datasetName = last.substring(0, atIdx)
-                this.version = last.substring(atIdx + 1)
-            } else {
-                this.datasetName = last
-                this.version = null
-            }
-        } else {
-            this.datasetName = null
-            this.version = null
-        }
+        // Trail: drop any empty segments (trailing slash, accidental double-slashes)
+        final List<String> tail = parts.size() > 3
+            ? parts.subList(3, parts.size()).findAll { String s -> s } as List<String>
+            : new ArrayList<String>()
+        this.trail = Collections.unmodifiableList(tail)
         validatePath(uriString)
     }

-    /** Internal constructor for programmatic absolute path creation */
-    SeqeraPath(SeqeraFileSystem fs, String org, String workspace, String resourceType, String datasetName, String version) {
+    /** Programmatic absolute-path constructor. */
+    SeqeraPath(SeqeraFileSystem fs, String org, String workspace, String resourceType, List<String> trail) {
+        this(fs, org, workspace, resourceType, trail, null)
+    }
+
+    /** Programmatic absolute-path constructor with pre-resolved attributes. */
+    SeqeraPath(SeqeraFileSystem fs, String org, String workspace, String resourceType, List<String> trail, SeqeraFileAttributes cachedAttributes) {
         this.fs = fs
         this.relPath = null
         this.org = org
         this.workspace = workspace
         this.resourceType = resourceType
-        this.datasetName = datasetName
-        this.version = version
+        this.trail = trail != null
+            ? Collections.unmodifiableList(new ArrayList<String>(trail))
+            : Collections.emptyList()
+        this.cachedAttributes = cachedAttributes
         validatePath(null)
     }

-    /**
-     * Constructor for relative paths produced by {@link #relativize(Path)}.
-     * The {@code relPath} is a slash-separated string of the differing path segments.
-     * All segment fields are {@code null}; {@link #isAbsolute()} returns {@code false}.
-     */
+    /** Relative path produced only by {@link #relativize(Path)}. */
    SeqeraPath(String relPath) {
        this.fs = null
        this.relPath = relPath ?: ''
        this.org = null
        this.workspace = null
        this.resourceType = null
-        this.datasetName = null
-        this.version = null
+        this.trail = Collections.emptyList()
+        this.cachedAttributes = null
    }

-    /**
-     * Validate structural integrity: deeper segments require all shallower ones,
-     * and no segment may contain {@code /}.
-     *
-     * @param original original URI string used in error messages (null → derive from fields)
-     * @throws InvalidPathException if the path is malformed
-     */
    private void validatePath(String original) {
        final label = original ?: rawPath()
-        if (datasetName && !workspace)
-            throw new InvalidPathException(label, "Dataset path requires a workspace segment")
+        if (trail && !resourceType)
+            throw new InvalidPathException(label, "Trail segments require a resource-type segment")
        if (resourceType && !workspace)
            throw new InvalidPathException(label, "Resource type requires a workspace segment")
        if (workspace && !org)
            throw new InvalidPathException(label, "Workspace requires an org segment")
-        // Segments from URI parsing never contain '/', but guard the internal constructor too
        if (org?.contains('/'))
            throw new InvalidPathException(label, "Org name cannot contain '/'")
        if (workspace?.contains('/'))
            throw new InvalidPathException(label, "Workspace name cannot contain '/'")
        if (resourceType?.contains('/'))
            throw new InvalidPathException(label, "Resource type cannot contain '/'")
-        if (datasetName?.contains('/'))
-            throw new InvalidPathException(label, "Dataset name cannot contain '/'")
+        for (String t : trail) {
+            if (t == null || t.isEmpty())
+                throw new InvalidPathException(label, "Path segments cannot be empty")
+            if (t.contains('/'))
+                throw new InvalidPathException(label, "Path segments cannot contain '/'")
+        }
+    }
+
+    private String rawPath() {
+        final sb = new StringBuilder(PROTOCOL)
+        if (org) sb.append(org)
+        if (workspace) sb.append('/').append(workspace)
+        if (resourceType) sb.append('/').append(resourceType)
+        for (String t : trail) sb.append('/').append(t)
+        return sb.toString()
    }

-    /** Return a list of name component strings (works for both absolute and relative paths). */
    private List<String> nameComponents() {
        if (isAbsolute()) {
            final d = depth()
-            final result = new ArrayList<String>(d)
+            final out = new ArrayList<String>(d)
            for (int i = 0; i < d; i++)
-                result.add(getName(i).toString())
-            return result
+                out.add(getName(i).toString())
+            return out
        }
        if (!relPath)
            return Collections.emptyList()
        return relPath.split('/').toList().findAll { String s -> s } as List<String>
    }

-    /** Build a raw path string from the current fields, for use in exception messages. */
-    private String rawPath() {
-        final sb = new StringBuilder("${SCHEME}://")
-        if (org) sb.append(org)
-        if (workspace) sb.append('/').append(workspace)
-        if (resourceType) sb.append('/').append(resourceType)
-        if (datasetName) {
-            sb.append('/').append(datasetName)
-            if (version) sb.append('@').append(version)
-        }
-        return sb.toString()
-    }
-
-    // ---- path component accessors ----
+    // ---- accessors ----

    String getOrg() { org }
    String getWorkspace() { workspace }
    String getResourceType() { resourceType }
-    String getDatasetName() { datasetName }
-    String getVersion() { version }
+    List<String> getTrail() { trail }
+    SeqeraFileAttributes getCachedAttributes() { cachedAttributes }

    /**
-     * Path depth: 0=root, 1=org, 2=workspace, 3=resourceType, 4=dataset file.
+     * Resolve a child segment and attach the given attributes to the resulting path.
+     * Used by directory-listing code paths so follow-up {@code readAttributes()} calls
+     * don't re-fetch information that was already available from the listing response.
*/ + SeqeraPath resolveWithAttributes(String segment, SeqeraFileAttributes attrs) { + final child = resolve(segment) as SeqeraPath + return new SeqeraPath(child.fs, child.org, child.workspace, child.resourceType, child.trail, attrs) + } + int depth() { - if (datasetName) return 4 - if (resourceType) return 3 + if (resourceType) return 3 + trail.size() if (workspace) return 2 if (org) return 1 return 0 } - boolean isDirectory() { depth() < 4 } - boolean isRegularFile() { depth() == 4 } - // ---- Path API ---- - @Override - FileSystem getFileSystem() { fs } - - @Override - boolean isAbsolute() { fs != null } + @Override FileSystem getFileSystem() { fs } + @Override boolean isAbsolute() { fs != null } @Override - Path getRoot() { new SeqeraPath(fs, null, null, null, null, null) } + Path getRoot() { new SeqeraPath(fs, null, null, null, null) } @Override Path getFileName() { final d = depth() if (d == 0) return null - final name = d == 4 ? (version ? "${datasetName}@${version}" : datasetName) - : d == 3 ? resourceType - : d == 2 ? 
workspace - : org - return new SeqeraPath( name as String) + if (d >= 4) return new SeqeraPath(trail[trail.size() - 1]) + if (d == 3) return new SeqeraPath(resourceType) + if (d == 2) return new SeqeraPath(workspace) + return new SeqeraPath(org) } @Override Path getParent() { final d = depth() if (d == 0) return null - if (d == 1) return new SeqeraPath(fs, null, null, null, null, null) - if (d == 2) return new SeqeraPath(fs, org, null, null, null, null) - if (d == 3) return new SeqeraPath(fs, org, workspace, null, null, null) - return new SeqeraPath(fs, org, workspace, resourceType, null, null) + if (d == 1) return new SeqeraPath(fs, null, null, null, null) + if (d == 2) return new SeqeraPath(fs, org, null, null, null) + if (d == 3) return new SeqeraPath(fs, org, workspace, null, null) + // d >= 4: drop last trail segment + final newTrail = trail.subList(0, trail.size() - 1) + return new SeqeraPath(fs, org, workspace, resourceType, newTrail) } - @Override - int getNameCount() { depth() } + @Override int getNameCount() { depth() } @Override Path getName(int index) { @@ -244,7 +223,7 @@ class SeqeraPath implements Path { if (index == 0) return new SeqeraPath(org) if (index == 1) return new SeqeraPath(workspace) if (index == 2) return new SeqeraPath(resourceType) - return new SeqeraPath((version ? 
"${datasetName}@${version}" : datasetName) as String) + return new SeqeraPath(trail[index - 3]) } @Override @@ -254,19 +233,14 @@ class SeqeraPath implements Path { @Override boolean startsWith(Path other) { - if (other !instanceof SeqeraPath) - return false + if (other !instanceof SeqeraPath) return false final that = (SeqeraPath) other - if (this.isAbsolute() != that.isAbsolute()) - return false - final thisNames = this.nameComponents() - final thatNames = that.nameComponents() - if (thatNames.size() > thisNames.size()) - return false - for (int i = 0; i < thatNames.size(); i++) { - if (thisNames[i] != thatNames[i]) - return false - } + if (this.isAbsolute() != that.isAbsolute()) return false + final mine = nameComponents() + final theirs = that.nameComponents() + if (theirs.size() > mine.size()) return false + for (int i = 0; i < theirs.size(); i++) + if (mine[i] != theirs[i]) return false return true } @@ -276,27 +250,20 @@ class SeqeraPath implements Path { try { final Path p = SeqeraPath.isSeqeraUri(other) ? 
new SeqeraPath(fs, other) : new SeqeraPath(other) return startsWith(p) - } catch (Exception ignored) { - return false - } + } catch (Exception ignored) { return false } } @Override boolean endsWith(Path other) { - if (other !instanceof SeqeraPath) - return false + if (other !instanceof SeqeraPath) return false final that = (SeqeraPath) other - if (that.isAbsolute()) - return this.equals(that) - final thisNames = this.nameComponents() - final thatNames = that.nameComponents() - if (thatNames.isEmpty() || thatNames.size() > thisNames.size()) - return false - final offset = thisNames.size() - thatNames.size() - for (int i = 0; i < thatNames.size(); i++) { - if (thisNames[offset + i] != thatNames[i]) - return false - } + if (that.isAbsolute()) return this.equals(that) + final mine = nameComponents() + final theirs = that.nameComponents() + if (theirs.isEmpty() || theirs.size() > mine.size()) return false + final offset = mine.size() - theirs.size() + for (int i = 0; i < theirs.size(); i++) + if (mine[offset + i] != theirs[i]) return false return true } @@ -306,20 +273,16 @@ class SeqeraPath implements Path { try { final Path p = SeqeraPath.isSeqeraUri(other) ? 
new SeqeraPath(fs, other) : new SeqeraPath(other) return endsWith(p) - } catch (Exception ignored) { - return false - } + } catch (Exception ignored) { return false } } - @Override - Path normalize() { this } + @Override Path normalize() { this } @Override Path resolve(Path other) { if (other instanceof SeqeraPath) { final that = (SeqeraPath) other if (that.isAbsolute()) return that - // Relative SeqeraPath: resolve each segment of relPath against this return resolve(that.relPath) } return resolve(other.toString()) @@ -328,34 +291,24 @@ class SeqeraPath implements Path { @Override Path resolve(String segment) { if (!segment) return this - // Absolute seqera:// URI — parse and return directly if (segment.startsWith(PROTOCOL)) return new SeqeraPath(fs, segment) - // Strip a single leading slash final stripped = segment.startsWith(SEPARATOR) ? segment.substring(1) : segment if (!stripped) return this - // Multi-segment: split and resolve one segment at a time final segs = stripped.split(SEPARATOR, -1).findAll { String s -> s } as List SeqeraPath result = this - for (String seg : segs) { - result = result.resolveOne(seg) - } + for (String seg : segs) result = result.resolveOne(seg) return result } - /** Resolve a single (non-empty, slash-free) segment against this path. 
*/ private SeqeraPath resolveOne(String seg) { final d = depth() - if (d == 0) return new SeqeraPath(fs, seg, null, null, null, null) - if (d == 1) return new SeqeraPath(fs, org, seg, null, null, null) - if (d == 2) return new SeqeraPath(fs, org, workspace, seg, null, null) - if (d == 3) { - final atIdx = seg.lastIndexOf('@') - if (atIdx > 0) - return new SeqeraPath(fs, org, workspace, resourceType, seg.substring(0, atIdx), seg.substring(atIdx + 1)) - return new SeqeraPath(fs, org, workspace, resourceType, seg, null) - } - throw new IllegalStateException("Cannot resolve a path segment on a depth-4 path: $this") + if (d == 0) return new SeqeraPath(fs, seg, null, null, null) + if (d == 1) return new SeqeraPath(fs, org, seg, null, null) + if (d == 2) return new SeqeraPath(fs, org, workspace, seg, null) + final newTrail = new ArrayList(trail) + newTrail.add(seg) + return new SeqeraPath(fs, org, workspace, resourceType, newTrail) } @Override @@ -372,47 +325,36 @@ class SeqeraPath implements Path { @Override Path relativize(Path other) { - if (other !instanceof SeqeraPath) - throw new ProviderMismatchException() + if (other !instanceof SeqeraPath) throw new ProviderMismatchException() final that = (SeqeraPath) other if (!this.isAbsolute() || !that.isAbsolute()) throw new IllegalArgumentException("Both paths must be absolute to relativize: ${this} vs ${other}") - final thisNames = this.nameComponents() - final thatNames = that.nameComponents() - // Find common prefix length + final mine = this.nameComponents() + final theirs = that.nameComponents() int common = 0 - while (common < thisNames.size() && common < thatNames.size() - && thisNames[common] == thatNames[common]) - common++ - // Build ".." 
for each remaining segment in this, then append remaining segments of other + while (common < mine.size() && common < theirs.size() && mine[common] == theirs[common]) common++ final parts = new ArrayList() - for (int i = common; i < thisNames.size(); i++) - parts.add('..') - for (int i = common; i < thatNames.size(); i++) - parts.add(thatNames[i]) + for (int i = common; i < mine.size(); i++) parts.add('..') + for (int i = common; i < theirs.size(); i++) parts.add(theirs[i]) return new SeqeraPath(parts.join(SEPARATOR)) } @Override URI toUri() { - // Build path component for depth >= 2 String uriPath = null if (workspace) { final segments = [workspace] if (resourceType) segments.add(resourceType) - if (datasetName) segments.add(version ? "${datasetName}@${version}" as String : datasetName) + for (String t : trail) segments.add(t) uriPath = '/' + segments.join('/') } - // new URI(scheme, authority, path, query, fragment) avoids URI.create() pitfalls for edge cases return new URI(SCHEME, org ?: '', uriPath, null, null) } @Override String toString() { if (!isAbsolute()) return relPath - // Return the canonical human-readable representation - final d = depth() - if (d == 0) return "${SCHEME}://" + if (depth() == 0) return PROTOCOL return toUri().toString() } @@ -423,38 +365,30 @@ class SeqeraPath implements Path { return this } - @Override - Path toRealPath(LinkOption... options) { this } + @Override Path toRealPath(LinkOption... options) { this } @Override - File toFile() { - throw new UnsupportedOperationException("toFile() not supported for seqera:// paths") - } + File toFile() { throw new UnsupportedOperationException("toFile() not supported for seqera:// paths") } @Override - WatchKey register(WatchService watcher, WatchEvent.Kind[] events, WatchEvent.Modifier... modifiers) { + WatchKey register(WatchService w, WatchEvent.Kind[] e, WatchEvent.Modifier... 
m) { throw new UnsupportedOperationException("WatchService not supported by seqera:// paths") } @Override - WatchKey register(WatchService watcher, WatchEvent.Kind... events) { + WatchKey register(WatchService w, WatchEvent.Kind... e) { throw new UnsupportedOperationException("WatchService not supported by seqera:// paths") } @Override Iterator iterator() { final d = depth() - final List parts = new ArrayList<>(d) - for (int i = 0; i < d; i++) { - parts.add(getName(i)) - } - return parts.iterator() + final out = new ArrayList(d) + for (int i = 0; i < d; i++) out.add(getName(i)) + return out.iterator() } - @Override - int compareTo(Path other) { - return toString().compareTo(other.toString()) - } + @Override int compareTo(Path other) { toString().compareTo(other.toString()) } @Override boolean equals(Object obj) { @@ -463,23 +397,17 @@ class SeqeraPath implements Path { return toString() == obj.toString() } - @Override - int hashCode() { toString().hashCode() } + @Override int hashCode() { toString().hashCode() } static URI asUri(String path) { - if( !path ) - throw new IllegalArgumentException("Missing 'path' argument") - if( !path.startsWith(PROTOCOL) ) + if (!path) throw new IllegalArgumentException("Missing 'path' argument") + if (!path.startsWith(PROTOCOL)) throw new IllegalArgumentException("Invalid Seqera file system path URI - it must start with '${PROTOCOL}' prefix - offending value: $path") - if( path.startsWith(PROTOCOL + SEPARATOR) && path.length() > PROTOCOL.length() + 1 ) + if (path.startsWith(PROTOCOL + SEPARATOR) && path.length() > PROTOCOL.length() + 1) throw new IllegalArgumentException("Invalid Seqera file system path URI - make sure the scheme prefix does not contain more than two slash characters or a query in the root '/' - offending value: $path") - - //URI strings like seqera://./something are converted to seqera://something - if( path.startsWith(PROTOCOL + './') ) { + if (path.startsWith(PROTOCOL + './')) path = PROTOCOL + 
path.substring(PROTOCOL.length() + 2) - } - - if( path == PROTOCOL || path == PROTOCOL + '.') //Empty path case + if (path == PROTOCOL || path == PROTOCOL + '.') return new URI(PROTOCOL + '/') return new URI(path) } diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy new file mode 100644 index 0000000000..8917b0740a --- /dev/null +++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy @@ -0,0 +1,237 @@ +/* + * Copyright 2013-2026, Seqera Labs + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+
+package io.seqera.tower.plugin.fs.handler
+
+import java.net.http.HttpClient
+import java.net.http.HttpRequest
+import java.net.http.HttpResponse
+import java.nio.file.AccessDeniedException
+import java.nio.file.AccessMode
+import java.nio.file.NoSuchFileException
+import java.nio.file.Path
+import java.time.Duration
+import java.time.Instant
+
+import groovy.transform.CompileStatic
+import groovy.util.logging.Slf4j
+import io.seqera.tower.model.DataLinkDto
+import io.seqera.tower.model.DataLinkItem
+import io.seqera.tower.model.DataLinkItemType
+import io.seqera.tower.plugin.datalink.PagedDataLinkContent
+import io.seqera.tower.plugin.datalink.SeqeraDataLinkClient
+import io.seqera.tower.plugin.fs.ResourceTypeHandler
+import io.seqera.tower.plugin.fs.SeqeraFileAttributes
+import io.seqera.tower.plugin.fs.SeqeraFileSystem
+import io.seqera.tower.plugin.fs.SeqeraPath
+
+/**
+ * {@link ResourceTypeHandler} for the {@code data-links} resource type.
+ *
+ * Listings and attribute queries go through the Seqera Platform API; file reads
+ * use a pre-signed URL obtained from {@code /generate-download-url} and fetched
+ * with a plain JDK {@link HttpClient} — the Seqera {@code Authorization} header
+ * must not be sent to the cloud-backed URL.
+ *
+ * Data-link list and directory content are streamed lazily to avoid materializing
+ * potentially large result sets in memory.
+ */
+@Slf4j
+@CompileStatic
+class DataLinksResourceHandler implements ResourceTypeHandler {
+
+    public static final String TYPE = 'data-links'
+
+    private final SeqeraFileSystem fs
+    private final SeqeraDataLinkClient client
+    private final HttpClient httpClient
+
+    DataLinksResourceHandler(SeqeraFileSystem fs, SeqeraDataLinkClient client) {
+        this(fs, client, HttpClient.newBuilder()
+            .connectTimeout(Duration.ofSeconds(10))
+            .followRedirects(HttpClient.Redirect.NORMAL)
+            .build())
+    }
+
+    /** Test-only constructor to inject a mock {@link HttpClient}. */
+    DataLinksResourceHandler(SeqeraFileSystem fs, SeqeraDataLinkClient client, HttpClient httpClient) {
+        this.fs = fs
+        this.client = client
+        this.httpClient = httpClient
+    }
+
+    @Override
+    String getResourceType() { TYPE }
+
+    @Override
+    Iterable<Path> list(SeqeraPath dir) throws IOException {
+        final workspaceId = fs.resolveWorkspaceId(dir.org, dir.workspace)
+        final trail = dir.trail
+        if (trail.isEmpty()) {
+            // data-links/ → distinct providers in use (sorted). Iterate the stream,
+            // collect distinct provider names — small output.
+            final providers = client.getDataLinkProviders(workspaceId)
+            return providers.collect { String p -> dir.resolve(p) as Path }
+        }
+        if (trail.size() == 1) {
+            // data-links/<provider> → sorted data-link names for that provider
+            final prov = trail[0]
+            final names = new TreeSet<String>()
+            final Iterator<DataLinkDto> it = client.listDataLinks(workspaceId)
+            while (it.hasNext()) {
+                final dl = it.next()
+                if (dl.provider?.toString() == prov) names.add(dl.name)
+            }
+            if (names.isEmpty())
+                throw new NoSuchFileException(dir.toString(), null, "No data-links for provider '$prov' in workspace '${dir.workspace}'")
+            return names.collect { String n -> dir.resolve(n) as Path }
+        }
+        // trail.size() >= 2 — browse inside a specific data-link.
+        // Content can be very large, so we stream it lazily.
+        final dl = client.getDataLink(workspaceId, trail[0], trail[1])
+        final subPath = trail.size() > 2 ? trail.subList(2, trail.size()).join('/') : ''
+        log.debug("Listing files for $dl.name path $subPath")
+        final content = client.getContent(dl.id, subPath, workspaceId, credentialsIdOf(dl))
+        return new PathMappingIterable(content, dir)
+    }
+
+    @Override
+    SeqeraFileAttributes readAttributes(SeqeraPath p) throws IOException {
+        // Short-circuit: attributes attached when this path was produced by a listing
+        if (p.cachedAttributes) return p.cachedAttributes
+        final workspaceId = fs.resolveWorkspaceId(p.org, p.workspace)
+        final trail = p.trail
+        if (trail.isEmpty()) {
+            // data-links/ — always a directory
+            return new SeqeraFileAttributes(true)
+        }
+        if (trail.size() == 1) {
+            // data-links/<provider> — validate the provider has at least one data-link
+            final providers = client.getDataLinkProviders(workspaceId)
+            if (!providers.contains(trail[0]))
+                throw new NoSuchFileException(p.toString(), null, "No data-links for provider '${trail[0]}' in workspace '${p.workspace}'")
+            return new SeqeraFileAttributes(true)
+        }
+        final dl = client.getDataLink(workspaceId, trail[0], trail[1])
+        if (trail.size() == 2) return new SeqeraFileAttributes(true) // data-link root
+        final subPath = trail.subList(2, trail.size()).join('/')
+        log.debug("Reading attributes for $p")
+        final content = client.getContent(dl.id, subPath, workspaceId, credentialsIdOf(dl))
+        return attributesFor(content, subPath, p)
+    }
+
+    @Override
+    InputStream newInputStream(SeqeraPath p) throws IOException {
+        if (p.trail.size() < 3)
+            throw new IllegalArgumentException("newInputStream requires a file path inside a data-link: $p")
+        final workspaceId = fs.resolveWorkspaceId(p.org, p.workspace)
+        final dl = client.getDataLink(workspaceId, p.trail[0], p.trail[1])
+        final subPath = p.trail.subList(2, p.trail.size()).join('/')
+        final urlResp = client.getDownloadUrl(dl.id, subPath, workspaceId, credentialsIdOf(dl))
+        if (!urlResp.url)
+            throw new NoSuchFileException(p.toString(), null, "Platform returned no download URL")
+        return fetchSignedUrl(urlResp.url)
+    }
+
+    /** First credentials entry on the data-link (or null if none). */
+    private static String credentialsIdOf(DataLinkDto dl) {
+        final creds = dl?.credentials
+        return (creds && !creds.isEmpty()) ? creds[0].id : null
+    }
+
+    @Override
+    void checkAccess(SeqeraPath p, AccessMode... modes) throws IOException {
+        for (AccessMode m : modes) {
+            if (m == AccessMode.WRITE || m == AccessMode.EXECUTE)
+                throw new AccessDeniedException(p.toString(), null, "seqera:// data-links are read-only")
+        }
+        readAttributes(p)
+    }
+
+    // ---- private helpers ----
+
+    /**
+     * Decide whether {@code subPath} refers to a file or a directory by inspecting
+     * only the first page of the content response — never paginates further.
+     */
+    private static SeqeraFileAttributes attributesFor(PagedDataLinkContent content, String subPath, SeqeraPath pathForErrors) throws NoSuchFileException {
+        final firstPage = content.firstPage
+        final lastSeg = subPath.contains('/') ? subPath.substring(subPath.lastIndexOf('/') + 1) : subPath
+        // Single-file response: one FILE item whose name matches the final segment
+        final single = firstPage.find { DataLinkItem it -> it.name == lastSeg && it.type == DataLinkItemType.FILE }
+        if (single)
+            return new SeqeraFileAttributes(single.size ?: 0L, Instant.EPOCH, Instant.EPOCH, pathForErrors.toString())
+        // If there are children, this is a directory listing
+        if (!firstPage.isEmpty()) return new SeqeraFileAttributes(true)
+        // No items AND no originalPath → path does not exist
+        if (!content.originalPath)
+            throw new NoSuchFileException(pathForErrors.toString(), null, "Path not found inside data-link")
+        return new SeqeraFileAttributes(true)
+    }
+
+    private InputStream fetchSignedUrl(String url) throws IOException {
+        final req = HttpRequest.newBuilder(URI.create(url))
+            .timeout(Duration.ofMinutes(5))
+            .GET()
+            .build()
+        try {
+            final HttpResponse<InputStream> resp = httpClient.send(req, HttpResponse.BodyHandlers.ofInputStream())
+            final status = resp.statusCode()
+            if (status >= 200 && status < 300) return resp.body()
+            try { resp.body()?.close() } catch (Throwable ignored) {}
+            throw new IOException("Signed URL fetch failed: HTTP $status for $url")
+        } catch (InterruptedException e) {
+            Thread.currentThread().interrupt()
+            throw new IOException("Interrupted while fetching signed URL", e)
+        }
+    }
+
+    /**
+     * Lazy {@link Iterable} that maps each {@link DataLinkItem} from a
+     * {@link PagedDataLinkContent} to a child {@link SeqeraPath} under
+     * {@code parent}. Each produced path carries cached attributes built from the
+     * item, so a follow-up {@code readAttributes()} call does not re-browse the
+     * Platform. Pages are fetched on demand as the iterator advances.
+     */
+    @CompileStatic
+    private static class PathMappingIterable implements Iterable<Path> {
+        private final PagedDataLinkContent content
+        private final SeqeraPath parent
+
+        PathMappingIterable(PagedDataLinkContent content, SeqeraPath parent) {
+            this.content = content
+            this.parent = parent
+        }
+
+        @Override
+        Iterator<Path> iterator() {
+            final Iterator<DataLinkItem> inner = content.iterator()
+            return new Iterator<Path>() {
+                @Override boolean hasNext() { inner.hasNext() }
+                @Override Path next() {
+                    final item = inner.next()
+                    return parent.resolveWithAttributes(item.name, attributesFor(item)) as Path
+                }
+            }
+        }
+
+        private static SeqeraFileAttributes attributesFor(DataLinkItem item) {
+            if (item.type == DataLinkItemType.FILE)
+                return new SeqeraFileAttributes(item.size ?: 0L, Instant.EPOCH, Instant.EPOCH, item.name)
+            return new SeqeraFileAttributes(true)
+        }
+    }
+}
diff --git a/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandler.groovy b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandler.groovy
new file mode 100644
index 0000000000..ef9c50785e
--- /dev/null
+++ b/plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandler.groovy
@@ -0,0 +1,173 @@
+/*
+ * Copyright 2013-2026,
Seqera Labs
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package io.seqera.tower.plugin.fs.handler
+
+import java.nio.file.AccessDeniedException
+import java.nio.file.AccessMode
+import java.nio.file.NoSuchFileException
+import java.nio.file.Path
+import java.time.Instant
+
+import groovy.transform.CompileStatic
+import groovy.util.logging.Slf4j
+import io.seqera.tower.model.DatasetDto
+import io.seqera.tower.model.DatasetVersionDto
+import io.seqera.tower.plugin.dataset.SeqeraDatasetClient
+import io.seqera.tower.plugin.fs.ResourceTypeHandler
+import io.seqera.tower.plugin.fs.SeqeraFileAttributes
+import io.seqera.tower.plugin.fs.SeqeraFileSystem
+import io.seqera.tower.plugin.fs.SeqeraPath
+
+/**
+ * {@link ResourceTypeHandler} for the {@code datasets} resource type.
+ * Owns its own dataset/version caches and {@code @version} suffix parsing.
+ */
+@Slf4j
+@CompileStatic
+class DatasetsResourceHandler implements ResourceTypeHandler {
+
+    public static final String TYPE = 'datasets'
+
+    private final SeqeraFileSystem fs
+    private final SeqeraDatasetClient client
+
+    /** workspaceId → list of DatasetDto */
+    private final Map<Long, List<DatasetDto>> datasetCache = new LinkedHashMap<>()
+    /** datasetId → list of DatasetVersionDto */
+    private final Map<String, List<DatasetVersionDto>> versionCache = new LinkedHashMap<>()
+
+    DatasetsResourceHandler(SeqeraFileSystem fs, SeqeraDatasetClient client) {
+        this.fs = fs
+        this.client = client
+    }
+
+    @Override
+    String getResourceType() { TYPE }
+
+    @Override
+    Iterable<Path> list(SeqeraPath dir) throws IOException {
+        final d = dir.depth()
+        if (d == 3) {
+            final workspaceId = fs.resolveWorkspaceId(dir.org, dir.workspace)
+            return resolveDatasets(workspaceId).collect { DatasetDto ds ->
+                dir.resolveWithAttributes(ds.name, attributesFor(ds)) as Path
+            }
+        }
+        throw new IllegalArgumentException("datasets handler does not list depth $d paths: $dir")
+    }
+
+    @Override
+    SeqeraFileAttributes readAttributes(SeqeraPath p) throws IOException {
+        // Short-circuit: attributes attached when this path was produced by a listing
+        if (p.cachedAttributes) return p.cachedAttributes
+        final d = p.depth()
+        if (d == 3) {
+            fs.resolveWorkspaceId(p.org, p.workspace) // validates
+            return new SeqeraFileAttributes(true)
+        }
+        if (d != 4)
+            throw new NoSuchFileException(p.toString(), null, "Invalid dataset path depth: $d")
+        final names = parseNameAndVersion(p.trail[0])
+        final workspaceId = fs.resolveWorkspaceId(p.org, p.workspace)
+        final dataset = resolveDataset(workspaceId, names[0])
+        if (!dataset)
+            throw new NoSuchFileException(p.toString(), null, "Dataset '${names[0]}' not found in workspace ${p.workspace}")
+        return attributesFor(dataset)
+    }
+
+    private static SeqeraFileAttributes attributesFor(DatasetDto ds) {
+        return new SeqeraFileAttributes(
+            0L,
+            ds.lastUpdated?.toInstant() ?: Instant.EPOCH,
+            ds.dateCreated?.toInstant() ?: Instant.EPOCH,
+            ds.id)
+    }
+
+    @Override
+    InputStream newInputStream(SeqeraPath p) throws IOException {
+        if (p.depth() != 4)
+            throw new IllegalArgumentException("Operation `newInputStream` requires a dataset path (depth 4): $p")
+        final names = parseNameAndVersion(p.trail[0])
+        final workspaceId = fs.resolveWorkspaceId(p.org, p.workspace)
+        final dataset = resolveDataset(workspaceId, names[0])
+        if (!dataset)
+            throw new NoSuchFileException(p.toString(), null, "Dataset '${names[0]}' not found in workspace ${p.workspace}")
+        final version = resolveVersion(dataset, names[1], p)
+        log.debug "Downloading dataset '${names[0]}' version ${version.version} (${version.fileName}) from workspace $workspaceId"
+        return client.downloadDataset(dataset.id, String.valueOf(version.version), version.fileName, dataset.workspaceId)
+    }
+
+    @Override
+    void checkAccess(SeqeraPath p, AccessMode... modes) throws IOException {
+        for (AccessMode m : modes) {
+            if (m == AccessMode.WRITE || m == AccessMode.EXECUTE)
+                throw new AccessDeniedException(p.toString(), null, "seqera:// datasets are read-only")
+        }
+        readAttributes(p)
+    }
+
+    // ---- helpers ----
+
+    /**
+     * Split a trail segment into {@code [name, version]}. Version is {@code null} when
+     * the segment does not contain an {@code @}.
+     */
+    private static String[] parseNameAndVersion(String segment) {
+        final atIdx = segment.lastIndexOf('@')
+        if (atIdx > 0)
+            return [segment.substring(0, atIdx), segment.substring(atIdx + 1)] as String[]
+        return [segment, null] as String[]
+    }
+
+    private synchronized List<DatasetDto> resolveDatasets(long workspaceId) {
+        def cached = datasetCache.get(workspaceId)
+        if (cached == null) {
+            cached = client.listDatasets(workspaceId)
+            datasetCache.put(workspaceId, cached)
+        }
+        return cached
+    }
+
+    private synchronized DatasetDto resolveDataset(long workspaceId, String name) {
+        return resolveDatasets(workspaceId).find { DatasetDto d -> d.name == name }
+    }
+
+    private synchronized List<DatasetVersionDto> resolveVersions(String datasetId, long workspaceId) {
+        def cached = versionCache.get(datasetId)
+        if (cached == null) {
+            cached = client.listVersions(datasetId, workspaceId)
+            versionCache.put(datasetId, cached)
+        }
+        return cached
+    }
+
+    private DatasetVersionDto resolveVersion(DatasetDto dataset, String pinnedVersion, SeqeraPath p) throws IOException {
+        final versions = resolveVersions(dataset.id, dataset.workspaceId)
+        if (versions.isEmpty())
+            throw new NoSuchFileException(p.toString(), null, "No versions available for dataset '${dataset.name}'")
+        if (pinnedVersion) {
+            final found = versions.find { DatasetVersionDto v -> String.valueOf(v.version) == pinnedVersion }
+            if (!found)
+                throw new NoSuchFileException(p.toString(), null, "Version '$pinnedVersion' not found for dataset '${dataset.name}'")
+            return found
+        }
+        final latest = versions.findAll { DatasetVersionDto v -> !v.disabled }.max { DatasetVersionDto v -> v.version }
+        if (!latest)
+            throw new NoSuchFileException(p.toString(), null, "No enabled versions for dataset '${dataset.name}'")
+        return latest
+    }
+}
diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/datalink/SeqeraDataLinkClientTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/datalink/SeqeraDataLinkClientTest.groovy
new file mode 100644
index 0000000000..3b5f8ef798
--- /dev/null
+++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/datalink/SeqeraDataLinkClientTest.groovy
@@ -0,0 +1,372 @@
+/*
+ * Copyright 2013-2026, Seqera Labs
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package io.seqera.tower.plugin.datalink
+
+import java.nio.file.AccessDeniedException
+import java.nio.file.NoSuchFileException
+
+import groovy.json.JsonOutput
+import io.seqera.tower.model.DataLinkDto
+import io.seqera.tower.plugin.TowerClient
+import nextflow.exception.AbortOperationException
+import spock.lang.Specification
+
+/**
+ * Tests for {@link SeqeraDataLinkClient} using a spy {@link TowerClient}.
+ */
+class SeqeraDataLinkClientTest extends Specification {
+
+    private static final String EP = 'https://api.example.com'
+
+    private TowerClient tower() {
+        def tc = Spy(TowerClient)
+        tc.@endpoint = EP
+        return tc
+    }
+
+    private static TowerClient.Response ok(String body) { new TowerClient.Response(200, body) }
+    private static TowerClient.Response err(int code) { new TowerClient.Response(code, "error $code") }
+
+    private static List<DataLinkDto> drain(Iterator<DataLinkDto> it) {
+        final out = new ArrayList<DataLinkDto>()
+        while (it.hasNext()) out.add(it.next())
+        return out
+    }
+
+    // ---- listDataLinks ----
+
+    def "listDataLinks yields DTOs lazily for a single page"() {
+        given:
+        def body = JsonOutput.toJson([dataLinks: [
+            [id: 'dl-1', name: 'inputs', provider: 'aws', resourceRef: 's3://bucket/'],
+            [id: 'dl-2', name: 'archive', provider: 'google', resourceRef: 'gs://bucket/']
+        ], totalSize: 2])
+        def tc = tower()
+        tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0") >> ok(body)
+        def client = new SeqeraDataLinkClient(tc)
+
+        when:
+        def list = drain(client.listDataLinks(10L))
+
+        then:
+        list.size() == 2
+        list[0].id == 'dl-1'
+        list[1].provider.toString() == 'google'
+    }
+
+    def "listDataLinks exhausts pagination across multiple pages"() {
+        given:
+        def p1 = JsonOutput.toJson([dataLinks: [[id: 'dl-1', name: 'a', provider: 'aws']], totalSize: 3])
+        def p2 = JsonOutput.toJson([dataLinks: [[id: 'dl-2', name: 'b', provider: 'aws']], totalSize: 3])
+        def p3 = JsonOutput.toJson([dataLinks: [[id: 'dl-3', name: 'c', provider: 'aws']], totalSize: 3])
+        def tc = tower()
+        tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0") >> ok(p1)
+        tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=1") >> ok(p2)
+        tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=2") >> ok(p3)
+        def client = new SeqeraDataLinkClient(tc)
+
+        when:
+        def list = drain(client.listDataLinks(10L))
+
+        then:
+        list*.id == ['dl-1', 'dl-2', 'dl-3']
+    }
+
+    def "listDataLinks short-circuits — only fetches enough pages to satisfy the consumer"() {
+        given:
+        def p1 = JsonOutput.toJson([dataLinks: [[id: 'dl-1', name: 'a', provider: 'aws']], totalSize: 5])
+        def tc = tower()
+        def client = new SeqeraDataLinkClient(tc)
+
+        when:
+        def it = client.listDataLinks(10L)
+        def first = it.next()
+
+        then:
+        1 * tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0") >> ok(p1)
+        0 * tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=1")
+        first.id == 'dl-1'
+    }
+
+    def "listDataLinks returns empty iterator when workspace has none"() {
+        given:
+        def body = JsonOutput.toJson([dataLinks: [], totalSize: 0])
+        def tc = tower()
+        tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0") >> ok(body)
+        def client = new SeqeraDataLinkClient(tc)
+
+        expect:
+        !client.listDataLinks(10L).hasNext()
+    }
+
+    // ---- getDataLink ----
+
+    def "getDataLink uses server-side search filter and returns first matching provider"() {
+        given:
+        def body = JsonOutput.toJson([dataLinks: [
+            [id: 'dl-1', name: 'inputs', provider: 'google'],
+            [id: 'dl-2', name: 'inputs', provider: 'aws']
+        ], totalSize: 2])
+        def tc = tower()
+        tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0&search=inputs") >> ok(body)
+        def client = new SeqeraDataLinkClient(tc)
+
+        when:
+        def dl = client.getDataLink(10L, 'aws', 'inputs')
+
+        then:
+        dl.id == 'dl-2'
+        dl.provider.toString() == 'aws'
+    }
+
+    def "getDataLink throws NoSuchFileException when no matching (provider, name) is found"() {
+        given:
+        def body = JsonOutput.toJson([dataLinks: [
+            [id: 'dl-1', name: 'inputs', provider: 'google']
+        ], totalSize: 1])
+        def tc = tower()
+        tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0&search=inputs") >> ok(body)
+        def client = new SeqeraDataLinkClient(tc)
+
+        when:
+        client.getDataLink(10L, 'aws', 'inputs')
+
+        then:
+        thrown(NoSuchFileException)
+    }
+
+    def "getDataLink memoizes successful lookups"() {
given: + def body = JsonOutput.toJson([dataLinks: [ + [id: 'dl-1', name: 'inputs', provider: 'aws'] + ], totalSize: 1]) + def tc = tower() + def client = new SeqeraDataLinkClient(tc) + + when: + def a = client.getDataLink(10L, 'aws', 'inputs') + def b = client.getDataLink(10L, 'aws', 'inputs') + + then: + 1 * tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0&search=inputs") >> ok(body) + a.is(b) + } + + // ---- getDataLinkProviders ---- + + def "getDataLinkProviders returns distinct sorted providers across all pages"() { + given: + def p1 = JsonOutput.toJson([dataLinks: [ + [id: 'dl-1', name: 'a', provider: 'aws'], + [id: 'dl-2', name: 'b', provider: 'google'] + ], totalSize: 3]) + def p2 = JsonOutput.toJson([dataLinks: [ + [id: 'dl-3', name: 'c', provider: 'aws'] + ], totalSize: 3]) + def tc = tower() + tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0") >> ok(p1) + tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=2") >> ok(p2) + def client = new SeqeraDataLinkClient(tc) + + when: + def providers = client.getDataLinkProviders(10L) + + then: + providers as List == ['aws', 'google'] + } + + def "getDataLinkProviders memoizes the result"() { + given: + def body = JsonOutput.toJson([dataLinks: [ + [id: 'dl-1', name: 'a', provider: 'aws'] + ], totalSize: 1]) + def tc = tower() + def client = new SeqeraDataLinkClient(tc) + + when: + def a = client.getDataLinkProviders(10L) + def b = client.getDataLinkProviders(10L) + + then: + 1 * tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0") >> ok(body) + a.is(b) + } + + def "getDataLinkProviders returns an unmodifiable Set"() { + given: + def body = JsonOutput.toJson([dataLinks: [ + [id: 'dl-1', name: 'a', provider: 'aws'] + ], totalSize: 1]) + def tc = tower() + tc.sendApiRequest(_) >> ok(body) + def client = new SeqeraDataLinkClient(tc) + + when: + client.getDataLinkProviders(10L).add('gcs') + + then: + thrown(UnsupportedOperationException) + } + + // ---- 
listDataLinks pagination robustness ---- + + def "listDataLinks keeps paginating when totalSize is absent until an empty page"() { + given: + def p1 = JsonOutput.toJson([dataLinks: [[id: 'dl-1', name: 'a', provider: 'aws']]]) // no totalSize + def p2 = JsonOutput.toJson([dataLinks: [[id: 'dl-2', name: 'b', provider: 'aws']]]) // no totalSize + def p3 = JsonOutput.toJson([dataLinks: []]) // empty page → exhausted + def tc = tower() + tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0") >> ok(p1) + tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=1") >> ok(p2) + tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=2") >> ok(p3) + def client = new SeqeraDataLinkClient(tc) + + when: + def list = drain(client.listDataLinks(10L)) + + then: + list*.id == ['dl-1', 'dl-2'] + } + + // ---- getContent ---- + + def "getContent on a sub-path uses /browse/{path}"() { + given: + def body = JsonOutput.toJson([ + originalPath: 'reads/', + objects: [ + [name: 'a.fq', type: 'FILE', size: 123, mimeType: 'application/gzip'], + [name: 'b.fq', type: 'FILE', size: 456, mimeType: 'application/gzip'] + ]]) + def tc = tower() + tc.sendApiRequest("${EP}/data-links/dl-1/browse/reads/?workspaceId=10") >> ok(body) + def client = new SeqeraDataLinkClient(tc) + + when: + def resp = client.getContent('dl-1', 'reads/', 10L) + + then: + resp.originalPath == 'reads/' + resp.firstPage.size() == 2 + resp.firstPage[0].name == 'a.fq' + resp.firstPage[0].size == 123L + } + + def "getContent at the data-link root uses /browse (no path)"() { + given: + def body = JsonOutput.toJson([originalPath: '', objects: [[name: 'a', type: 'FILE', size: 1]]]) + def tc = tower() + tc.sendApiRequest("${EP}/data-links/dl-1/browse?workspaceId=10") >> ok(body) + def client = new SeqeraDataLinkClient(tc) + + when: + def resp = client.getContent('dl-1', null, 10L) + + then: + resp.firstPage*.name == ['a'] + } + + def "getContent iterator lazily paginates across pages"() { + given: 
+ def p1 = JsonOutput.toJson([originalPath: '', objects: [[name: 'a', type: 'FILE', size: 1]], nextPageToken: 'T2']) + def p2 = JsonOutput.toJson([originalPath: '', objects: [[name: 'b', type: 'FILE', size: 2]]]) + def tc = tower() + def client = new SeqeraDataLinkClient(tc) + + when: 'the caller iterates the full stream' + def resp = client.getContent('dl-1', null, 10L) + def names = resp.collect { it.name } + + then: 'first page fetched eagerly; second page fetched only when iterator advances past page 1' + 1 * tc.sendApiRequest("${EP}/data-links/dl-1/browse?workspaceId=10") >> ok(p1) + 1 * tc.sendApiRequest("${EP}/data-links/dl-1/browse?workspaceId=10&nextPageToken=T2") >> ok(p2) + names == ['a', 'b'] + } + + def "getContent does not fetch page 2 if the caller only consumes the first page"() { + given: + def p1 = JsonOutput.toJson([originalPath: '', objects: [[name: 'a', type: 'FILE', size: 1]], nextPageToken: 'T2']) + def tc = tower() + def client = new SeqeraDataLinkClient(tc) + + when: 'caller only reads firstPage metadata without iterating' + def resp = client.getContent('dl-1', null, 10L) + def first = resp.firstPage + + then: + 1 * tc.sendApiRequest("${EP}/data-links/dl-1/browse?workspaceId=10") >> ok(p1) + 0 * tc.sendApiRequest("${EP}/data-links/dl-1/browse?workspaceId=10&nextPageToken=T2") + first*.name == ['a'] + } + + // ---- getDownloadUrl ---- + + def "getDownloadUrl returns the signed URL from /generate-download-url"() { + given: + def tc = tower() + def expectedUrl = "${EP}/data-links/dl-1/generate-download-url?workspaceId=10&filePath=" + URLEncoder.encode('reads/a.fq', 'UTF-8') + tc.sendApiRequest(expectedUrl) >> ok(JsonOutput.toJson([url: 'https://signed'])) + def client = new SeqeraDataLinkClient(tc) + + when: + def resp = client.getDownloadUrl('dl-1', 'reads/a.fq', 10L) + + then: + resp.url == 'https://signed' + } + + // ---- error mapping ---- + + def "401 throws AbortOperationException"() { + given: + def tc = tower() + tc.sendApiRequest(_) 
>> err(401) + def client = new SeqeraDataLinkClient(tc) + + when: + drain(client.listDataLinks(10L)) + + then: + thrown(AbortOperationException) + } + + def "403 throws AccessDeniedException"() { + given: + def tc = tower() + tc.sendApiRequest(_) >> err(403) + def client = new SeqeraDataLinkClient(tc) + + when: + client.getContent('dl-1', '', 10L) + + then: + thrown(AccessDeniedException) + } + + def "404 throws NoSuchFileException"() { + given: + def tc = tower() + tc.sendApiRequest(_) >> err(404) + def client = new SeqeraDataLinkClient(tc) + + when: + client.getDownloadUrl('dl-1', 'missing', 10L) + + then: + thrown(NoSuchFileException) + } +} diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/ResourceTypeAbstractionTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/ResourceTypeAbstractionTest.groovy new file mode 100644 index 0000000000..ed1958b52b --- /dev/null +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/ResourceTypeAbstractionTest.groovy @@ -0,0 +1,89 @@ +/* + * Copyright 2013-2026, Seqera Labs + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package io.seqera.tower.plugin.fs + +import spock.lang.Specification + +/** + * Guards that the generic NIO layer does not reach into resource-type-specific packages. + * + * {@link SeqeraPath}, {@link SeqeraFileSystem}, {@link SeqeraFileAttributes} must not + * depend on {@code dataset/}, {@code datalink/}, or {@code fs/handler/}. 
Dispatch goes + * through the {@link ResourceTypeHandler} interface. + * + * {@link SeqeraFileSystemProvider} is the dispatch point: it wires handlers and routes + * calls to them, so it is *expected* to import the handler packages. The guard only + * applies to the generic classes above. + */ +class ResourceTypeAbstractionTest extends Specification { + + static final Class[] GENERIC_CLASSES = [SeqeraPath, SeqeraFileSystem, SeqeraFileAttributes] + + private static File srcRoot() { + // Gradle test cwd may be the plugin module dir or the repo root. + final candidates = [ + 'src/main/io/seqera/tower/plugin/fs', + 'plugins/nf-tower/src/main/io/seqera/tower/plugin/fs' + ] + for (String c : candidates) { + final f = new File(c) + if (f.isDirectory()) return f + } + throw new IllegalStateException("Cannot locate plugin source directory from ${new File('.').absolutePath}") + } + + static final File SRC_ROOT = srcRoot() + + def "generic fs classes do not import resource-type-specific packages"() { + expect: + GENERIC_CLASSES.each { Class c -> + final src = new File(SRC_ROOT, "${c.simpleName}.groovy").text + assert !src.contains('io.seqera.tower.plugin.datalink.'), "${c.simpleName} must not import datalink package" + assert !src.contains('io.seqera.tower.plugin.fs.handler.'), "${c.simpleName} must not import handler package" + assert !src.contains('DataLink'), "${c.simpleName} must not reference data-link types" + assert !src.contains('DatasetDto'), "${c.simpleName} must not reference DatasetDto" + assert !src.contains('DatasetVersionDto'), "${c.simpleName} must not reference DatasetVersionDto" + } + } + + def "generic fs classes do not carry resource-type string literals"() { + expect: + GENERIC_CLASSES.each { Class c -> + final src = new File(SRC_ROOT, "${c.simpleName}.groovy").text + assert !src.contains("'datasets'"), "${c.simpleName} must not hard-code the 'datasets' resource type" + assert !src.contains('"datasets"'), "${c.simpleName} must not hard-code the 'datasets' 
resource type" + assert !src.contains("'data-links'"), "${c.simpleName} must not hard-code the 'data-links' resource type" + assert !src.contains('"data-links"'), "${c.simpleName} must not hard-code the 'data-links' resource type" + } + } + + def "both handlers implement the ResourceTypeHandler interface"() { + expect: + ResourceTypeHandler.isAssignableFrom(io.seqera.tower.plugin.fs.handler.DatasetsResourceHandler) + ResourceTypeHandler.isAssignableFrom(io.seqera.tower.plugin.fs.handler.DataLinksResourceHandler) + } + + def "handlers do not reference each other's resource type"() { + expect: + final datasetSrc = new File(SRC_ROOT, 'handler/DatasetsResourceHandler.groovy').text + final dataLinkSrc = new File(SRC_ROOT, 'handler/DataLinksResourceHandler.groovy').text + !datasetSrc.contains('DataLink') + !datasetSrc.contains('datalink') + !dataLinkSrc.contains('DatasetDto') + !dataLinkSrc.contains('DatasetVersionDto') + } +} diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy index 7a698c1d5a..83dd83f5be 100644 --- a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy @@ -53,7 +53,10 @@ class SeqeraFileSystemProviderTest extends Specification { private SeqeraFileSystem buildFs(TowerClient tc) { final client = new SeqeraDatasetClient(tc) final provider = new SeqeraFileSystemProvider() - return new SeqeraFileSystem(provider, client) + final fs = new SeqeraFileSystem(provider) + fs.setOrgWorkspaceClient(client) + fs.registerHandler(new io.seqera.tower.plugin.fs.handler.DatasetsResourceHandler(fs, client)) + return fs } private static String userInfoJson() { @@ -240,16 +243,18 @@ class SeqeraFileSystemProviderTest extends Specification { entries[0].toString() == 'seqera://acme/research' } - def 
"newDirectoryStream on workspace returns datasets resource type"() { + def "newDirectoryStream on workspace returns registered resource types"() { given: def tc = spyTower() + tc.sendApiRequest("${ENDPOINT}/user-info") >> ok(userInfoJson()) + tc.sendApiRequest("${ENDPOINT}/user/42/workspaces") >> ok(workspacesJson()) final fs = buildFs(tc) final wsPath = new SeqeraPath(fs, 'seqera://acme/research') when: def entries = fs.provider().newDirectoryStream(wsPath, null).toList() - then: + then: 'only datasets is registered by this test helper; data-links registration happens in the production provider' entries.size() == 1 entries[0].toString() == 'seqera://acme/research/datasets' } @@ -397,9 +402,8 @@ class SeqeraFileSystemProviderTest extends Specification { def "newFileSystem throws FileSystemAlreadyExistsException when filesystem exists"() { given: 'a provider with an existing filesystem' - def tc = spyTower() def provider = new SeqeraFileSystemProvider() - def fs = new SeqeraFileSystem(provider, new SeqeraDatasetClient(tc)) + def fs = new SeqeraFileSystem(provider) provider.@fileSystem = fs when: @@ -408,4 +412,76 @@ class SeqeraFileSystemProviderTest extends Specification { then: thrown(FileSystemAlreadyExistsException) } + + // ---- handler dispatch ---- + + def "newDirectoryStream at workspace enumerates registered handlers (datasets + data-links)"() { + given: + def tc = spyTower() + tc.sendApiRequest("${ENDPOINT}/user-info") >> ok(userInfoJson()) + tc.sendApiRequest("${ENDPOINT}/user/42/workspaces") >> ok(workspacesJson()) + def datasetClient = new SeqeraDatasetClient(tc) + def fs = new SeqeraFileSystem(new SeqeraFileSystemProvider()) + fs.setOrgWorkspaceClient(datasetClient) + fs.registerHandler(new io.seqera.tower.plugin.fs.handler.DatasetsResourceHandler(fs, datasetClient)) + fs.registerHandler(new io.seqera.tower.plugin.fs.handler.DataLinksResourceHandler(fs, new io.seqera.tower.plugin.datalink.SeqeraDataLinkClient(tc))) + def wsPath = new SeqeraPath(fs, 
'seqera://acme/research') + + when: + def entries = fs.provider().newDirectoryStream(wsPath, null).toList() + + then: + entries*.toString().sort() == [ + 'seqera://acme/research/data-links', + 'seqera://acme/research/datasets' + ] + } + + def "newInputStream on an unsupported resource type throws NoSuchFileException"() { + given: + def tc = spyTower() + tc.sendApiRequest("${ENDPOINT}/user-info") >> ok(userInfoJson()) + tc.sendApiRequest("${ENDPOINT}/user/42/workspaces") >> ok(workspacesJson()) + def fs = buildFs(tc) + def path = new SeqeraPath(fs, 'seqera://acme/research/unknown-type/foo') + + when: + fs.provider().newInputStream(path) + + then: + def ex = thrown(NoSuchFileException) + ex.reason?.contains('Unsupported resource type') + } + + def "readAttributes short-circuits when the SeqeraPath carries cachedAttributes (no API call)"() { + given: 'a provider with a fresh filesystem and a path carrying pre-resolved attrs' + def tc = spyTower() + def fs = buildFs(tc) + def attrs = new SeqeraFileAttributes(999L, java.time.Instant.EPOCH, java.time.Instant.EPOCH, 'cached-key') + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples').resolveWithAttributes('nested', attrs) + + when: + def got = fs.provider().readAttributes(path, java.nio.file.attribute.BasicFileAttributes) + + then: 'no workspace-cache load and no dataset/browse API calls were issued' + 0 * tc.sendApiRequest(_) + got === attrs + } + + def "newDirectoryStream.iterator() throws IllegalStateException on a second call"() { + given: + def tc = spyTower() + tc.sendApiRequest("${ENDPOINT}/user-info") >> ok(userInfoJson()) + tc.sendApiRequest("${ENDPOINT}/user/42/workspaces") >> ok(workspacesJson()) + def fs = buildFs(tc) + def wsPath = new SeqeraPath(fs, 'seqera://acme/research') + def stream = fs.provider().newDirectoryStream(wsPath, null) + + when: + stream.iterator() + stream.iterator() + + then: + thrown(IllegalStateException) + } } diff --git 
a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemTest.groovy index c7db27a534..b600177e69 100644 --- a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemTest.groovy +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemTest.groovy @@ -24,7 +24,8 @@ import io.seqera.tower.plugin.dataset.SeqeraDatasetClient import spock.lang.Specification /** - * Tests for {@link SeqeraFileSystem} caching and workspace resolution using a mock {@link TowerClient}. + * Tests for {@link SeqeraFileSystem} org/workspace cache and handler registry. + * Resource-specific caches (datasets, data-links) are tested against their handlers. */ class SeqeraFileSystemTest extends Specification { @@ -53,7 +54,9 @@ class SeqeraFileSystemTest extends Specification { } private SeqeraFileSystem buildFs(TowerClient tc) { - new SeqeraFileSystem(new SeqeraFileSystemProvider(), new SeqeraDatasetClient(tc)) + final fs = new SeqeraFileSystem(new SeqeraFileSystemProvider()) + fs.setOrgWorkspaceClient(new SeqeraDatasetClient(tc)) + return fs } // ---- cache loading ---- @@ -147,58 +150,21 @@ class SeqeraFileSystemTest extends Specification { thrown(NoSuchFileException) } - // ---- dataset cache ---- + // ---- handler registry ---- - def "resolveDatasets populates cache and returns datasets"() { + def "registerHandler stores and looks up by resource type"() { given: - def tc = spyTower() - tc.sendApiRequest("${ENDPOINT}/datasets?workspaceId=10") >> - ok(JsonOutput.toJson([datasets: [ - [id: 'ds-1', name: 'samples', version: 1L, mediaType: 'text/csv', - dateCreated: '2024-01-01T00:00:00Z', lastUpdated: '2024-01-02T00:00:00Z'] - ], totalSize: 1])) - final fs = buildFs(tc) - - when: - def datasets = fs.resolveDatasets(10L) - - then: - datasets.size() == 1 - datasets[0].name == 'samples' - } - - def "resolveDatasets returns cached result on second call without extra API 
request"() { - given: - def tc = spyTower() - final datasetsJson = JsonOutput.toJson([datasets: [ - [id: 'ds-1', name: 'samples', version: 1L, mediaType: 'text/csv', - dateCreated: '2024-01-01T00:00:00Z', lastUpdated: '2024-01-02T00:00:00Z'] - ], totalSize: 1]) - final fs = buildFs(tc) - - when: - fs.resolveDatasets(10L) - fs.resolveDatasets(10L) - - then: - 1 * tc.sendApiRequest("${ENDPOINT}/datasets?workspaceId=10") >> ok(datasetsJson) - } - - def "invalidateDatasetCache forces re-fetch on next resolveDatasets call"() { - given: - def tc = spyTower() - final datasetsJson = JsonOutput.toJson([datasets: [ - [id: 'ds-1', name: 'samples', version: 1L, mediaType: 'text/csv', - dateCreated: '2024-01-01T00:00:00Z', lastUpdated: '2024-01-02T00:00:00Z'] - ], totalSize: 1]) - final fs = buildFs(tc) + def fs = new SeqeraFileSystem(new SeqeraFileSystemProvider()) + def handler = Mock(ResourceTypeHandler) { + getResourceType() >> 'datasets' + } when: - fs.resolveDatasets(10L) - fs.invalidateDatasetCache(10L) - fs.resolveDatasets(10L) + fs.registerHandler(handler) then: - 2 * tc.sendApiRequest("${ENDPOINT}/datasets?workspaceId=10") >> ok(datasetsJson) + fs.getHandler('datasets') === handler + fs.getHandler('unknown') == null + fs.getResourceTypes() == ['datasets'] as Set } } diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraPathTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraPathTest.groovy index 69a5e26915..6183465f7d 100644 --- a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraPathTest.groovy +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraPathTest.groovy @@ -16,7 +16,6 @@ package io.seqera.tower.plugin.fs -import io.seqera.tower.plugin.dataset.SeqeraDatasetClient import spock.lang.Specification /** @@ -26,10 +25,11 @@ class SeqeraPathTest extends Specification { private SeqeraFileSystem mockFs() { def provider = new SeqeraFileSystemProvider() - def client = Mock(SeqeraDatasetClient) - return new 
SeqeraFileSystem(provider, client) + return new SeqeraFileSystem(provider) } + // ---- depth / segment accessors ---- + def "depth 0 - root path"() { given: def fs = mockFs() @@ -37,10 +37,10 @@ class SeqeraPathTest extends Specification { expect: path.depth() == 0 - path.isDirectory() - !path.isRegularFile() path.org == null path.workspace == null + path.resourceType == null + path.trail == [] } def "depth 1 - org path"() { @@ -50,8 +50,6 @@ class SeqeraPathTest extends Specification { expect: path.depth() == 1 - path.isDirectory() - !path.isRegularFile() path.org == 'acme' path.workspace == null } @@ -63,7 +61,6 @@ class SeqeraPathTest extends Specification { expect: path.depth() == 2 - path.isDirectory() path.org == 'acme' path.workspace == 'research' path.resourceType == null @@ -76,40 +73,58 @@ class SeqeraPathTest extends Specification { expect: path.depth() == 3 - path.isDirectory() path.org == 'acme' path.workspace == 'research' path.resourceType == 'datasets' - path.datasetName == null + path.trail == [] } - def "depth 4 - dataset file path"() { + def "depth 4 - dataset trail segment is raw (handler parses @version)"() { given: def fs = mockFs() def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples') expect: path.depth() == 4 - !path.isDirectory() - path.isRegularFile() - path.org == 'acme' - path.workspace == 'research' path.resourceType == 'datasets' - path.datasetName == 'samples' - path.version == null + path.trail == ['samples'] } - def "depth 4 - dataset with pinned version"() { + def "dataset with @version suffix stays raw in trail"() { given: def fs = mockFs() def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples@2') + expect: + // Path is resource-type-agnostic — no @version parsing here. 
+ path.depth() == 4 + path.trail == ['samples@2'] + } + + def "data-link path with provider, name, and sub-path"() { + given: + def fs = mockFs() + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/reads/sample.fq.gz') + + expect: + path.depth() == 7 + path.resourceType == 'data-links' + path.trail == ['AWS', 'inputs', 'reads', 'sample.fq.gz'] + } + + def "data-link path at provider level"() { + given: + def fs = mockFs() + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS') + expect: path.depth() == 4 - path.datasetName == 'samples' - path.version == '2' + path.resourceType == 'data-links' + path.trail == ['AWS'] } + // ---- toUri / toString ---- + def "toUri round-trip - no version"() { given: def fs = mockFs() @@ -121,7 +136,7 @@ class SeqeraPathTest extends Specification { path.toString() == uri } - def "toUri round-trip - with version"() { + def "toUri round-trip - dataset with @version"() { given: def fs = mockFs() def uri = 'seqera://acme/research/datasets/samples@2' @@ -131,6 +146,18 @@ class SeqeraPathTest extends Specification { path.toUri().toString() == uri } + def "toUri round-trip - deep data-link path"() { + given: + def fs = mockFs() + def uri = 'seqera://acme/research/data-links/AWS/inputs/reads/sample.fq.gz' + def path = new SeqeraPath(fs, uri) + + expect: + path.toUri().toString() == uri + } + + // ---- getParent ---- + def "getParent - depth 4 returns depth 3"() { given: def fs = mockFs() @@ -144,6 +171,19 @@ class SeqeraPathTest extends Specification { (parent as SeqeraPath).depth() == 3 } + def "getParent - depth 7 returns depth 6 (drops one trail segment)"() { + given: + def fs = mockFs() + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/reads/s.fq.gz') + + when: + def parent = path.getParent() as SeqeraPath + + then: + parent.trail == ['AWS', 'inputs', 'reads'] + parent.depth() == 6 + } + def "getParent - depth 3 returns depth 2"() { given: def fs = mockFs() @@ -171,6 
+211,8 @@ class SeqeraPathTest extends Specification { path.getParent() == null } + // ---- resolve ---- + def "resolve - appends segment to workspace"() { given: def fs = mockFs() @@ -196,19 +238,52 @@ class SeqeraPathTest extends Specification { resolved.toString() == 'seqera://acme/research/datasets/my-dataset' } - def "resolve - dataset name with version"() { + def "resolve - dataset name with @version preserved as raw trail segment"() { given: def fs = mockFs() def path = new SeqeraPath(fs, 'seqera://acme/research/datasets') when: - def resolved = path.resolve('samples@3') + def resolved = path.resolve('samples@3') as SeqeraPath then: - (resolved as SeqeraPath).datasetName == 'samples' - (resolved as SeqeraPath).version == '3' + resolved.trail == ['samples@3'] + } + + def "resolve - appends nested data-link path segment"() { + given: + def fs = mockFs() + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs') + + when: + def child = path.resolve('reads') as SeqeraPath + + then: + child.trail == ['AWS', 'inputs', 'reads'] + } + + def "resolve with multi-segment string builds correct path"() { + given: + def fs = mockFs() + def base = new SeqeraPath(fs, 'seqera://acme/research') + + expect: + base.resolve('datasets/samples').toString() == 'seqera://acme/research/datasets/samples' + base.resolve('datasets').toString() == 'seqera://acme/research/datasets' } + def "resolve with absolute seqera URI returns that URI"() { + given: + def fs = mockFs() + def base = new SeqeraPath(fs, 'seqera://acme/research') + def absolute = 'seqera://other/ws/datasets/report' + + expect: + base.resolve(absolute).toString() == absolute + } + + // ---- equality / hashCode ---- + def "equality and hashCode"() { given: def fs = mockFs() @@ -222,7 +297,7 @@ class SeqeraPathTest extends Specification { p1 != p3 } - def "isAbsolute always true"() { + def "isAbsolute true when fs attached"() { given: def fs = mockFs() @@ -239,6 +314,7 @@ class SeqeraPathTest extends 
Specification { new SeqeraPath(fs, 'seqera://').nameCount == 0 new SeqeraPath(fs, 'seqera://acme').nameCount == 1 new SeqeraPath(fs, 'seqera://acme/research/datasets/samples').nameCount == 4 + new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/reads/a.fq').nameCount == 7 } // ---- relativize ---- @@ -291,28 +367,6 @@ class SeqeraPathTest extends Specification { 'seqera://acme/research/datasets/samples' | 'seqera://acme/research/datasets/other' | '../other' } - // ---- multi-segment resolve ---- - - def "resolve with multi-segment string builds correct path"() { - given: - def fs = mockFs() - def base = new SeqeraPath(fs, 'seqera://acme/research') - - expect: - base.resolve('datasets/samples').toString() == 'seqera://acme/research/datasets/samples' - base.resolve('datasets').toString() == 'seqera://acme/research/datasets' - } - - def "resolve with absolute seqera URI returns that URI"() { - given: - def fs = mockFs() - def base = new SeqeraPath(fs, 'seqera://acme/research') - def absolute = 'seqera://other/ws/datasets/report' - - expect: - base.resolve(absolute).toString() == absolute - } - def "isAbsolute is false for relative paths produced by relativize"() { given: def fs = mockFs() @@ -337,6 +391,7 @@ class SeqeraPathTest extends Specification { new SeqeraPath(fs, 'seqera://acme/research/datasets').getFileName().toString() == 'datasets' new SeqeraPath(fs, 'seqera://acme/research/datasets/samples').getFileName().toString() == 'samples' new SeqeraPath(fs, 'seqera://acme/research/datasets/samples@2').getFileName().toString() == 'samples@2' + new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/reads/a.fq').getFileName().toString() == 'a.fq' } def "getFileName is not absolute (uses relative constructor)"() { @@ -365,9 +420,7 @@ class SeqeraPathTest extends Specification { def "asUri - path starting with dot has dot stripped"() { expect: - // seqera://. 
→ strips dot → seqera:// → hits empty-path case → seqera:/// SeqeraPath.asUri('seqera://.').toString() == 'seqera:///' - // seqera://./foo/bar → strips dot only (substring from index 10) → seqera:///foo/bar SeqeraPath.asUri('seqera://./foo/bar').toString() == 'seqera://foo/bar' } @@ -518,4 +571,83 @@ class SeqeraPathTest extends Specification { then: parts == ['acme'] } + + // ---- trailing slash / accidental double-slash tolerance ---- + + def "trailing slash on resource-type directory is ignored"() { + given: + def fs = mockFs() + + when: + def p = new SeqeraPath(fs, 'seqera://acme/research/datasets/') + + then: + p.depth() == 3 + p.resourceType == 'datasets' + p.trail == [] + } + + def "trailing slash on data-link directory is ignored"() { + given: + def fs = mockFs() + + when: + def p = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/') + + then: + p.depth() == 5 + p.trail == ['aws', 'inputs'] + } + + def "accidental double-slash inside the trail is collapsed"() { + given: + def fs = mockFs() + + when: + def p = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs//reads/a.fq') + + then: + p.trail == ['aws', 'inputs', 'reads', 'a.fq'] + } + + // ---- cached attributes ---- + + def "cachedAttributes is null by default and preserved by resolveWithAttributes"() { + given: + def fs = mockFs() + def parent = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs') + def attrs = new SeqeraFileAttributes(42L, java.time.Instant.EPOCH, java.time.Instant.EPOCH, 'k') + + when: + def child = parent.resolveWithAttributes('reads', attrs) + + then: + parent.cachedAttributes == null + child.cachedAttributes === attrs + child.toString() == 'seqera://acme/research/data-links/aws/inputs/reads' + } + + def "cachedAttributes does not affect equals/hashCode"() { + given: + def fs = mockFs() + def attrs = new SeqeraFileAttributes(true) + def withAttrs = new SeqeraPath(fs, 'seqera://acme/research/datasets').resolveWithAttributes('samples', attrs) 
+ def withoutAttrs = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples') + + expect: + withAttrs == withoutAttrs + withAttrs.hashCode() == withoutAttrs.hashCode() + } + + def "iterator on deep data-link path returns all segments"() { + given: + def fs = mockFs() + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/reads/a.fq') + + when: + def parts = path.iterator().collect { it.toString() } + + then: + parts == ['acme', 'research', 'data-links', 'AWS', 'inputs', 'reads', 'a.fq'] + } } diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy new file mode 100644 index 0000000000..dab9b880f1 --- /dev/null +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy @@ -0,0 +1,418 @@ +/* + * Copyright 2013-2026, Seqera Labs + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package io.seqera.tower.plugin.fs.handler + +import java.net.http.HttpClient +import java.net.http.HttpResponse +import java.nio.file.AccessMode +import java.nio.file.NoSuchFileException +import java.nio.file.Path + +import io.seqera.tower.model.DataLinkCredentials +import io.seqera.tower.model.DataLinkDto +import io.seqera.tower.model.DataLinkItem +import io.seqera.tower.model.DataLinkItemType +import io.seqera.tower.model.DataLinkProvider +import io.seqera.tower.model.DataLinkDownloadUrlResponse +import io.seqera.tower.plugin.datalink.PagedDataLinkContent +import io.seqera.tower.plugin.datalink.SeqeraDataLinkClient +import io.seqera.tower.plugin.fs.SeqeraFileSystem +import io.seqera.tower.plugin.fs.SeqeraPath +import spock.lang.Specification + +class DataLinksResourceHandlerTest extends Specification { + + private SeqeraFileSystem fs = Mock(SeqeraFileSystem) + private SeqeraDataLinkClient client = Mock(SeqeraDataLinkClient) + private HttpClient http = Mock(HttpClient) + private DataLinksResourceHandler handler = new DataLinksResourceHandler(fs, client, http) + + private static DataLinkDto dl(String id, String name, DataLinkProvider p, String credId = null) { + def d = new DataLinkDto() + d.id = id; d.name = name; d.provider = p + if (credId) { + def c = new DataLinkCredentials(); c.id = credId + d.credentials = [c] + } + return d + } + + private static DataLinkItem item(String name, DataLinkItemType t, long size) { + def i = new DataLinkItem(); i.name = name; i.type = t; i.size = size; return i + } + + private static PagedDataLinkContent pagedContent(List items, String originalPath = null) { + return new PagedDataLinkContent(originalPath, items, null, new PagedDataLinkContent.PageFetcher() { + Map fetch(String t) { throw new IllegalStateException('no more pages') } + }) + } + + private static Iterator iter(List list) { list.iterator() } + + private static List asList(Iterable iterable) { + final out = new ArrayList() + for (Path p : iterable) out.add(p) + 
return out + } + + // ===================================================== + // newInputStream — MVP + // ===================================================== + + def "newInputStream resolves (provider,name,subPath) and streams the signed URL"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/reads/a.fq') + def signedBody = new ByteArrayInputStream('data'.bytes) + def httpResp = Mock(HttpResponse) { + statusCode() >> 200 + body() >> signedBody + } + def urlResp = new DataLinkDownloadUrlResponse(); urlResp.url = 'https://signed/a' + + when: + def stream = handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L + 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS) + 1 * client.getDownloadUrl('dl-1', 'reads/a.fq', 10L, null) >> urlResp + 1 * http.send(_, _) >> httpResp + stream === signedBody + } + + def "newInputStream forwards credentialsId from the data-link's credentials"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/reads/a.fq') + def urlResp = new DataLinkDownloadUrlResponse(); urlResp.url = 'https://signed/a' + def httpResp = Mock(HttpResponse) { + statusCode() >> 200 + body() >> new ByteArrayInputStream('x'.bytes) + } + + when: + handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS, 'cred-42') + 1 * client.getDownloadUrl('dl-1', 'reads/a.fq', 10L, 'cred-42') >> urlResp + 1 * http.send(_, _) >> httpResp + } + + def "newInputStream throws NoSuchFileException when data-link unknown"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/unknown/reads/a.fq') + + when: + handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L + 1 * client.getDataLink(10L, 'aws', 'unknown') >> { throw new NoSuchFileException("data-link 
not found") } + thrown(NoSuchFileException) + } + + def "newInputStream requires trail.size >= 3 (file path, not the data-link root itself)"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs') + + when: + handler.newInputStream(path) + + then: + thrown(IllegalArgumentException) + } + + def "newInputStream surfaces signed-URL HTTP failure as IOException"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/reads/a.fq') + def urlResp = new DataLinkDownloadUrlResponse(); urlResp.url = 'https://signed/a' + def httpResp = Mock(HttpResponse) { + statusCode() >> 403 + body() >> new ByteArrayInputStream(new byte[0]) + } + + when: + handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS) + 1 * client.getDownloadUrl('dl-1', 'reads/a.fq', 10L, null) >> urlResp + 1 * http.send(_, _) >> httpResp + thrown(IOException) + } + + // ===================================================== + // list — US2 browse + // ===================================================== + + def "list at data-links/ returns distinct providers in use"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links') + + when: + def paths = asList(handler.list(path)) + + then: + 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L + 1 * client.getDataLinkProviders(10L) >> (['aws', 'google'] as Set) + paths*.toString() == [ + 'seqera://acme/research/data-links/aws', + 'seqera://acme/research/data-links/google' + ] + } + + def "list at data-links// returns data-link names for that provider"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws') + + when: + def paths = asList(handler.list(path)) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> iter([ + dl('dl-1', 'inputs', DataLinkProvider.AWS), + dl('dl-2', 'archive', 
DataLinkProvider.AWS), + dl('dl-3', 'onGcs', DataLinkProvider.GOOGLE) + ]) + paths*.toString() == [ + 'seqera://acme/research/data-links/aws/archive', + 'seqera://acme/research/data-links/aws/inputs' + ] + } + + def "list at data-link root returns top-level objects with cached attributes"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs') + + when: + def paths = asList(handler.list(path)) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS) + 1 * client.getContent('dl-1', '', 10L, null) >> pagedContent([ + item('reads', DataLinkItemType.FOLDER, 0), + item('samplesheet.csv', DataLinkItemType.FILE, 42) + ]) + paths*.toString() == [ + 'seqera://acme/research/data-links/aws/inputs/reads', + 'seqera://acme/research/data-links/aws/inputs/samplesheet.csv' + ] + // Attributes attached without follow-up API calls + (paths[0] as SeqeraPath).cachedAttributes.directory + (paths[1] as SeqeraPath).cachedAttributes.regularFile + (paths[1] as SeqeraPath).cachedAttributes.size() == 42L + } + + def "list forwards credentialsId to getContent when data-link has credentials"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs') + + when: + asList(handler.list(path)) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS, 'cred-42') + 1 * client.getContent('dl-1', '', 10L, 'cred-42') >> pagedContent([]) + } + + def "list at deep sub-path browses the correct sub-path"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/reads') + + when: + def paths = asList(handler.list(path)) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS) + 1 * client.getContent('dl-1', 'reads', 10L, null) >> pagedContent([ + 
item('a.fq', DataLinkItemType.FILE, 1), + item('b.fq', DataLinkItemType.FILE, 2) + ]) + paths*.toString() == [ + 'seqera://acme/research/data-links/aws/inputs/reads/a.fq', + 'seqera://acme/research/data-links/aws/inputs/reads/b.fq' + ] + } + + // ===================================================== + // readAttributes + // ===================================================== + + def "readAttributes at data-links/ resource-type dir reports directory"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links') + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + attr.directory + !attr.regularFile + } + + def "readAttributes at data-links// reports directory when provider exists"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws') + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.getDataLinkProviders(10L) >> (['aws', 'google'] as Set) + attr.directory + } + + def "readAttributes at data-links// throws when the provider has no data-links"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/azure') + + when: + handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.getDataLinkProviders(10L) >> (['aws'] as Set) + def ex = thrown(NoSuchFileException) + ex.reason?.contains("No data-links for provider 'azure'") + } + + def "readAttributes at data-link root reports directory"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs') + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS) + attr.directory + } + + def "readAttributes on a file sub-path reports file with size"() { + given: + def path = new SeqeraPath(fs, 
'seqera://acme/research/data-links/aws/inputs/reads/a.fq') + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS) + 1 * client.getContent('dl-1', 'reads/a.fq', 10L, null) >> pagedContent([ + item('a.fq', DataLinkItemType.FILE, 123) + ]) + attr.regularFile + attr.size() == 123L + } + + def "readAttributes on a directory sub-path reports directory"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/reads') + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS) + 1 * client.getContent('dl-1', 'reads', 10L, null) >> pagedContent( + [item('a.fq', DataLinkItemType.FILE, 1), item('b.fq', DataLinkItemType.FILE, 2)], + 'reads/') + attr.directory + } + + def "readAttributes short-circuits when path has cached attributes (no API call)"() { + given: + def attrs = new io.seqera.tower.plugin.fs.SeqeraFileAttributes(99L, java.time.Instant.EPOCH, java.time.Instant.EPOCH, 'key') + def parent = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/reads') + def path = parent.resolveWithAttributes('a.fq', attrs) + + when: + def got = handler.readAttributes(path) + + then: + 0 * fs.resolveWorkspaceId(_, _) + 0 * client.getDataLink(_, _, _) + 0 * client.getContent(_, _, _, _) + got === attrs + } + + // ===================================================== + // error paths — US3 + // ===================================================== + + def "list at data-links// throws when no data-links for that provider"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/azure') + + when: + asList(handler.list(path)) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> iter([dl('dl-1', 'x', 
DataLinkProvider.AWS)]) + def ex = thrown(NoSuchFileException) + ex.reason?.toLowerCase()?.contains('no data-links') + } + + def "unknown data-link under a known provider throws"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/ghost/a.fq') + + when: + asList(handler.list(path)) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.getDataLink(10L, 'aws', 'ghost') >> { throw new NoSuchFileException("not found") } + thrown(NoSuchFileException) + } + + def "missing sub-path inside a data-link surfaces as NoSuchFileException"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/does/not/exist') + + when: + handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.getDataLink(10L, 'aws', 'inputs') >> dl('dl-1', 'inputs', DataLinkProvider.AWS) + 1 * client.getContent('dl-1', 'does/not/exist', 10L, null) >> pagedContent([]) + thrown(NoSuchFileException) + } + + def "checkAccess with WRITE is rejected"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/a.fq') + + when: + handler.checkAccess(path, AccessMode.WRITE) + + then: + thrown(java.nio.file.AccessDeniedException) + } +} diff --git a/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandlerTest.groovy b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandlerTest.groovy new file mode 100644 index 0000000000..752091c2d6 --- /dev/null +++ b/plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandlerTest.groovy @@ -0,0 +1,233 @@ +/* + * Copyright 2013-2026, Seqera Labs + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package io.seqera.tower.plugin.fs.handler + +import java.nio.file.AccessMode +import java.nio.file.NoSuchFileException + +import io.seqera.tower.model.DatasetDto +import io.seqera.tower.model.DatasetVersionDto +import io.seqera.tower.plugin.dataset.SeqeraDatasetClient +import io.seqera.tower.plugin.fs.SeqeraFileSystem +import io.seqera.tower.plugin.fs.SeqeraPath +import spock.lang.Specification + +class DatasetsResourceHandlerTest extends Specification { + + def fs = Mock(SeqeraFileSystem) + def client = Mock(SeqeraDatasetClient) + def handler = new DatasetsResourceHandler(fs, client) + + private static DatasetDto ds(String id, String name, long wsId = 10L) { + def d = new DatasetDto() + d.id = id; d.name = name; d.workspaceId = wsId + return d + } + + private static DatasetVersionDto ver(String dsId, long v, String file, boolean disabled = false) { + def dv = new DatasetVersionDto() + dv.datasetId = dsId; dv.version = v; dv.fileName = file; dv.disabled = disabled + return dv + } + + def "getResourceType returns 'datasets'"() { + expect: + handler.resourceType == 'datasets' + } + + def "list at depth 3 returns one path per dataset"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets') + + when: + def paths = handler.list(path) + + then: + 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L + 1 * client.listDatasets(10L) >> [ds('d1', 'one'), ds('d2', 'two')] + paths*.toString() == [ + 'seqera://acme/research/datasets/one', + 'seqera://acme/research/datasets/two' + ] + } + + def "list result is cached across 
calls"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets') + + when: + handler.list(path) + handler.list(path) + + then: + 2 * fs.resolveWorkspaceId('acme', 'research') >> 10L + 1 * client.listDatasets(10L) >> [ds('d1', 'one')] + } + + def "newInputStream resolves latest non-disabled version when no pin"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples') + def dataset = ds('d1', 'samples') + def v1 = ver('d1', 1, 'a.csv') + def v2 = ver('d1', 2, 'b.csv') + def content = new ByteArrayInputStream('x'.bytes) + + when: + def stream = handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L + 1 * client.listDatasets(10L) >> [dataset] + 1 * client.listVersions('d1', 10L) >> [v1, v2] + 1 * client.downloadDataset('d1', '2', 'b.csv', 10L) >> content + stream === content + } + + def "newInputStream honors @version pin"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples@1') + def dataset = ds('d1', 'samples') + def v1 = ver('d1', 1, 'a.csv') + def v2 = ver('d1', 2, 'b.csv') + + when: + handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDatasets(10L) >> [dataset] + 1 * client.listVersions('d1', 10L) >> [v1, v2] + 1 * client.downloadDataset('d1', '1', 'a.csv', 10L) >> new ByteArrayInputStream('x'.bytes) + } + + def "newInputStream throws NoSuchFileException when dataset is missing"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/ghost') + + when: + handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L + 1 * client.listDatasets(10L) >> [ds('d1', 'samples')] + thrown(NoSuchFileException) + } + + def "newInputStream throws when pinned version is unknown"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples@99') + def dataset = ds('d1', 'samples') + def v1 = ver('d1', 1, 'a.csv') + + when: + 
handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDatasets(10L) >> [dataset] + 1 * client.listVersions('d1', 10L) >> [v1] + thrown(NoSuchFileException) + } + + def "newInputStream falls back to latest when pinned version is omitted"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples') + def dataset = ds('d1', 'samples') + def enabled = ver('d1', 3, 'c.csv', false) + def disabled = ver('d1', 4, 'd.csv', true) + + when: + handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDatasets(10L) >> [dataset] + 1 * client.listVersions('d1', 10L) >> [enabled, disabled] + 1 * client.downloadDataset('d1', '3', 'c.csv', 10L) >> new ByteArrayInputStream('x'.bytes) + } + + def "readAttributes at depth 3 reports directory"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets') + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + attr.directory + !attr.regularFile + } + + def "readAttributes at depth 4 returns file attributes"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples') + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDatasets(10L) >> [ds('d1', 'samples')] + attr.regularFile + !attr.directory + attr.fileKey() == 'd1' + } + + def "list attaches cached attributes to every child path"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets') + def now = java.time.OffsetDateTime.parse('2026-03-01T12:00:00Z') + def d = ds('d1', 'samples'); d.dateCreated = now; d.lastUpdated = now + + when: + def paths = handler.list(path).toList() + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDatasets(10L) >> [d] + def cached = (paths[0] as SeqeraPath).cachedAttributes + cached != null + cached.regularFile + cached.fileKey() == 'd1' + } + + def 
"readAttributes short-circuits when the path has cached attributes"() { + given: + def attrs = new io.seqera.tower.plugin.fs.SeqeraFileAttributes(0L, java.time.Instant.EPOCH, java.time.Instant.EPOCH, 'key') + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets').resolveWithAttributes('samples', attrs) + + when: + def got = handler.readAttributes(path) + + then: + 0 * fs.resolveWorkspaceId(_, _) + 0 * client.listDatasets(_) + got === attrs + } + + def "checkAccess rejects WRITE"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples') + + when: + handler.checkAccess(path, AccessMode.WRITE) + + then: + thrown(java.nio.file.AccessDeniedException) + } +} diff --git a/specs/260422-seqera-datalinks-fs/plan.md b/specs/260422-seqera-datalinks-fs/plan.md new file mode 100644 index 0000000000..3f76fc89d2 --- /dev/null +++ b/specs/260422-seqera-datalinks-fs/plan.md @@ -0,0 +1,313 @@ +# Implementation Plan: Seqera NIO Filesystem Support for Platform Data-Links + +**Branch**: `260422-seqera-datalinks-fs` | **Date**: 2026-04-23 | **Spec**: [spec.md](spec.md) | **ADR**: [20260422-seqera-datalinks-filesystem](../../adr/20260422-seqera-datalinks-filesystem.md) + +--- + +## Summary + +Extend the `nf-tower` plugin's `seqera://` NIO filesystem (shipped in [260310-seqera-dataset-fs](../260310-seqera-dataset-fs/spec.md)) with a second resource type, `data-links`. Paths of the form `seqera:////data-links///` resolve to entries inside Platform-managed data-links (S3/GCS/Azure buckets or prefixes). Listings and attribute queries hit the Platform's `/data-links/{id}/browse[/path]` endpoints; byte reads go through a pre-signed URL returned by `/data-links/{id}/generate-download-url` and fetched with a plain JDK HTTP client — no cloud SDK dependency. + +As part of this change, extract a real `ResourceTypeHandler` abstraction from the existing dataset logic. 
`DatasetsResourceHandler` and `DataLinksResourceHandler` are parallel implementations; the generic classes (`SeqeraFileSystemProvider`, `SeqeraFileSystem`, `SeqeraPath`, `SeqeraFileAttributes`) become resource-type-agnostic for depth ≥ 3. + +--- + +## Technical Context + +**Language/Version**: Groovy 4.0.29, targeting Java 17 runtime +**Primary Dependencies**: `io.seqera:tower-api:1.121.0` (existing), `io.seqera:lib-httpx:2.1.0` (existing via TowerClient) +**Storage**: None (in-memory org/workspace + data-link list caches on `SeqeraFileSystem`) +**Testing**: Spock Framework with `Mock(TowerClient)` + `JsonOutput` fixtures — matches the dataset test style +**Target Platform**: nf-tower plugin; runs wherever Nextflow runs +**Performance Goals**: Listing any hierarchy level in ≤ 5s for workspaces with up to 500 data-links (SC-003) +**Constraints**: Read-only in this iteration; no cloud SDK on classpath (SC-006) +**Scale/Scope**: Single plugin change; no new plugin, no core module changes + +--- + +## Constitution Check + +| Principle | Status | Notes | +|-----------|--------|-------| +| I. Modular Architecture | ✅ PASS | Feature lives in `nf-tower`; no core module changes | +| II. Test-Driven Quality | ✅ PASS | Spock unit tests for client, handler, refactored path; reuses existing `Mock(TowerClient)` pattern | +| III. Dataflow Programming Model | ✅ PASS | NIO `InputStream`; no dataflow model changes | +| IV. Apache 2.0 License | ✅ PASS | All new files must include Apache 2.0 header | +| V. DCO Sign-off | ✅ PASS | All commits use `git commit -s` | +| VI. Semantic Versioning | ✅ PASS | VERSION bump and changelog entry both deferred to release time | +| VII. Groovy Idioms | ✅ PASS | `@CompileStatic`, follow existing `fs/` patterns | + +No violations. 
+ +--- + +## Project Structure + +### Documentation (this feature) + +```text +specs/260422-seqera-datalinks-fs/ +├── spec.md ← feature spec (exists) +├── plan.md ← this file +└── tasks.md ← task checklist +``` + +### Source Code (nf-tower plugin) + +```text +plugins/nf-tower/ +└── src/ (VERSION and changelog.txt updated at release time, not in this feature) + ├── main/io/seqera/tower/plugin/ + │ ├── fs/ + │ │ ├── ResourceTypeHandler.groovy ← NEW (interface; list returns Iterable) + │ │ ├── SeqeraFileSystemProvider.groovy ← refactored (dispatch by handler; lazy filter iterator) + │ │ ├── SeqeraFileSystem.groovy ← refactored (handler registry; no dataset caches) + │ │ ├── SeqeraPath.groovy ← refactored (trail segments, cachedAttributes, resolveWithAttributes) + │ │ ├── SeqeraFileAttributes.groovy ← refactored (isDir, size, lastMod, created, fileKey) + │ │ ├── SeqeraPathFactory.groovy ← unchanged + │ │ ├── DatasetInputStream.groovy ← unchanged + │ │ └── handler/ + │ │ ├── DatasetsResourceHandler.groovy ← NEW (extracted; owns dataset caches; parses @version) + │ │ └── DataLinksResourceHandler.groovy ← NEW + │ ├── dataset/ + │ │ └── SeqeraDatasetClient.groovy ← unchanged + │ └── datalink/ ← NEW package + │ ├── SeqeraDataLinkClient.groovy ← NEW (typed client; returns iterators and PagedDataLinkContent) + │ └── PagedDataLinkContent.groovy ← NEW (lazy pagination view over data-link content) + └── test/io/seqera/tower/plugin/ + ├── fs/ + │ ├── SeqeraPathTest.groovy ← extended (sub-path cases, cachedAttributes, trailing slash) + │ ├── SeqeraFileSystemTest.groovy ← extended (handler registry) + │ ├── SeqeraFileSystemProviderTest.groovy ← extended (data-link dispatch specs) + │ ├── ResourceTypeAbstractionTest.groovy ← NEW (architectural guard) + │ └── handler/ + │ ├── DatasetsResourceHandlerTest.groovy ← NEW (caches, attr short-circuit) + │ └── DataLinksResourceHandlerTest.groovy ← NEW (cache, credentialsId, paged listings) + └── datalink/ + └── 
SeqeraDataLinkClientTest.groovy   ← NEW (pagination, endpoint URLs, error mapping)
+```
+
+**Structure decision**: Parallel `datalink/` package mirrors the existing `dataset/` package. Handlers live in `fs/handler/` so the generic NIO classes in `fs/` remain resource-type-agnostic. All wire DTOs are reused from `io.seqera.tower.model.*` — no plugin-local DTO classes. `PagedDataLinkContent` is a plugin-local service type (not a DTO) that wraps lazy pagination over `DataLinkItem` streams.
+
+---
+
+## Phase 0: Research Notes
+
+### Tower-API DTOs (confirmed via `javap`)
+
+All reused from `io.seqera:tower-api:1.121.0` (already on the classpath):
+
+| DTO | Fields used here |
+|---|---|
+| `DataLinkDto` | `id: String`, `name: String`, `provider: DataLinkProvider`, `resourceRef: String`, `credentials: List<DataLinkCredentials>` |
+| `DataLinkCredentials` | `id: String`, `name: String`, `provider: DataLinkProvider` |
+| `DataLinkProvider` (enum) | `AWS`, `GOOGLE`, `AZURE`, `AZURE_ENTRA`, `AZURE_CLOUD`, `SEQERACOMPUTE`, `S3`. `toString()` returns the **lowercase** enum value (e.g. `"aws"`, `"google"`) — this is what appears in user-visible paths. |
+| `DataLinksListResponse` | `dataLinks: List<DataLinkDto>`, `totalSize: Long` |
+| `DataLinkContentResponse` | `originalPath: String`, `objects: List<DataLinkItem>`, `nextPageToken: String` |
+| `DataLinkItem` | `type: DataLinkItemType`, `name: String`, `size: Long`, `mimeType: String` — no last-modified field |
+| `DataLinkItemType` (enum) | `FOLDER`, `FILE` |
+| `DataLinkDownloadUrlResponse` | `url: String` |
+
+**Attribute consequence**: `DataLinkItem` does not expose a last-modified timestamp. `SeqeraFileAttributes.lastModifiedTime()` for data-link paths returns `FileTime.from(Instant.EPOCH)`. Spec assumption and FR-005 remain satisfied — we return a valid `FileTime`; the absence of real data is a Platform-API limitation.
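To make the attribute consequence concrete, here is a minimal Java sketch of how browse items could back `BasicFileAttributes` with epoch timestamps. The class name, constructor shape, and the `fileKey` choice are illustrative assumptions, not the actual `SeqeraFileAttributes` implementation.

```java
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;
import java.time.Instant;

// Hypothetical sketch: attributes derived from a data-link browse item.
// DataLinkItem carries no last-modified field, so every timestamp falls
// back to the epoch — a valid FileTime, just not real data.
class EpochBackedAttributes implements BasicFileAttributes {

    private final boolean directory;  // FOLDER vs FILE from DataLinkItemType
    private final long size;          // DataLinkItem.size (0 for folders)
    private final String fileKey;     // e.g. dataLinkId + '/' + subPath

    EpochBackedAttributes(boolean directory, long size, String fileKey) {
        this.directory = directory;
        this.size = size;
        this.fileKey = fileKey;
    }

    @Override public FileTime lastModifiedTime() { return FileTime.from(Instant.EPOCH); }
    @Override public FileTime lastAccessTime()   { return FileTime.from(Instant.EPOCH); }
    @Override public FileTime creationTime()     { return FileTime.from(Instant.EPOCH); }
    @Override public boolean isRegularFile()     { return !directory; }
    @Override public boolean isDirectory()       { return directory; }
    @Override public boolean isSymbolicLink()    { return false; }
    @Override public boolean isOther()           { return false; }
    @Override public long size()                 { return size; }
    @Override public Object fileKey()            { return fileKey; }
}
```

Handlers would attach an instance like this to each listed child via `resolveWithAttributes`, which is what lets `readAttributes` short-circuit without a follow-up API call.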
+
+### Platform endpoints (confirmed from OpenAPI)
+
+| Operation | Endpoint | Notes |
+|---|---|---|
+| List data-links in workspace | `GET /data-links?workspaceId={workspaceId}&max={max}&offset={offset}` | Offset pagination. `totalSize` = full count; `max=100` per page. Optional `&search={term}` used by `getDataLink` for server-side pre-filter. |
+| Browse root of a data-link | `GET /data-links/{id}/browse?workspaceId={workspaceId}` | Token pagination via `nextPageToken`. Optional `credentialsId`. |
+| Browse a sub-path | `GET /data-links/{id}/browse/{path}?workspaceId={workspaceId}` | Same response and pagination as the root variant. The `{path}` segment preserves `/` as path separators. |
+| Pre-signed download URL | `GET /data-links/{id}/generate-download-url?workspaceId={workspaceId}&filePath={filePath}` | Returns `DataLinkDownloadUrlResponse.url`. Optional `credentialsId`. |
+
+`credentialsId` is taken from `DataLinkDto.credentials[0].id` when the list is non-empty, otherwise the query parameter is omitted.
+
+### Signed-URL fetch
+
+The signed URL is **not** a Seqera endpoint; it points at S3/GCS/Azure with auth baked into the query string. It must be fetched **without** the Seqera `Authorization` header (a pre-signed request that also carries an `Authorization` header presents two conflicting auth mechanisms and is rejected by AWS SigV4). Use a standalone `java.net.http.HttpClient` inside `DataLinksResourceHandler` for this fetch. Do **not** use `TowerClient.sendStreamingRequest`, which would add Seqera auth headers.
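The fetch described above can be sketched in a few lines of Java. This is an illustrative shape, not the final handler code: the class and method names are assumptions, and only the two properties that matter are shown — the request carries no headers at all (so no `Authorization` to conflict with the query-string signature), and any non-200 status surfaces as `IOException`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical sketch of the signed-URL fetch with the bare JDK client.
class SignedUrlFetch {

    private static final HttpClient HTTP = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    // The signed URL already carries its auth in the query string, so the
    // request deliberately sets no headers — in particular no Authorization.
    static HttpRequest buildRequest(String signedUrl) {
        return HttpRequest.newBuilder(URI.create(signedUrl)).GET().build();
    }

    // Streams the object; a non-200 status becomes IOException, which the
    // surrounding task-retry machinery is expected to handle.
    static InputStream open(String signedUrl) throws IOException, InterruptedException {
        HttpResponse<InputStream> resp =
                HTTP.send(buildRequest(signedUrl), HttpResponse.BodyHandlers.ofInputStream());
        if (resp.statusCode() != 200)
            throw new IOException("signed URL fetch failed: HTTP " + resp.statusCode());
        return resp.body();
    }
}
```

Note that the `HttpClient` adds its own transport headers (`Host`, `User-Agent`) only at send time; the built request itself stays header-free, which is the property the tests above exercise.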
+
+### SeqeraPath refactor shape
+
+Replace the six typed fields (`org`, `workspace`, `resourceType`, `datasetName`, `version`, `relPath`) with:
+
+- `org: String` (or null)
+- `workspace: String` (or null)
+- `resourceType: String` (or null)
+- `trail: List<String>` (possibly empty) — the segments after `resourceType`
+- `relPath: String` (for relative paths; mutually exclusive with absolute segments)
+- `cachedAttributes: SeqeraFileAttributes` (nullable) — set only by handlers when this path is produced by a listing, so subsequent `readAttributes` calls skip the API
+
+`trail` is opaque to `SeqeraPath` — handlers interpret it. Trail segments are stored verbatim (including any `@version` suffix for datasets); interpretation is the handler's responsibility. Concrete interpretations:
+
+- **Dataset** (`resourceType = "datasets"`): `trail.size() == 0` → resource-type dir; `trail.size() == 1` → dataset file. The trail segment may carry an `@version` suffix (e.g. `samples@2`); `DatasetsResourceHandler.parseNameAndVersion` splits it internally.
+- **Data-link** (`resourceType = "data-links"`): `trail.size() == 0` → resource-type dir; `trail.size() == 1` → provider dir; `trail.size() == 2` → data-link root dir; `trail.size() ≥ 3` → entry inside the data-link (directory or file, per `readAttributes`).
+
+`depth()` becomes `3 + trail.size()` when `resourceType` is set, else the count of non-null identity fields.
+
+`SeqeraPath` tolerates trailing slashes and accidental double-slashes in the URI (empty trail segments are filtered at parse time). `cachedAttributes` is ignored by `equals`/`hashCode`/`toString`/`toUri`/`resolve`/`getParent`; a new method `resolveWithAttributes(String segment, SeqeraFileAttributes attrs)` produces a child path carrying the given attrs.
+
+### Existing tests to preserve
+
+Running `./gradlew :plugins:nf-tower:test --tests 'io.seqera.tower.plugin.fs.*'` and `... --tests 'io.seqera.tower.plugin.dataset.*'` must continue to pass throughout the refactor.
The dataset behavior does not change; only the class that implements it does.
+
+---
+
+## Phase 1: Design & Contracts
+
+### `ResourceTypeHandler` interface
+
+```groovy
+interface ResourceTypeHandler {
+    /** the depth-3 segment this handler owns, e.g. "datasets" or "data-links" */
+    String getResourceType()
+
+    /**
+     * List entries at the given directory path. Caller has verified depth >= 3.
+     * Returning Iterable lets implementations stream large listings without
+     * materializing them in memory.
+     */
+    Iterable<SeqeraPath> list(SeqeraPath dir) throws IOException
+
+    /** return BasicFileAttributes for any path at depth >= 3 owned by this handler */
+    SeqeraFileAttributes readAttributes(SeqeraPath path) throws IOException
+
+    /** open a read stream for a leaf path; throw if the path is a directory */
+    InputStream newInputStream(SeqeraPath path) throws IOException
+
+    /** verify the path exists and modes are satisfiable; READ allowed, WRITE/EXECUTE rejected */
+    void checkAccess(SeqeraPath path, AccessMode... modes) throws IOException
+}
+```
+
+Handlers build each child path via `parent.resolveWithAttributes(segmentName, attrs)` so subsequent `readAttributes` calls short-circuit when the same path is used.
+
+### `SeqeraDataLinkClient` contract
+
+```groovy
+class SeqeraDataLinkClient {
+    SeqeraDataLinkClient(TowerClient towerClient)
+
+    /**
+     * Lazy iterator over every data-link in the workspace. Pages fetched on demand
+     * via GET /data-links?workspaceId=&max=100&offset=.
+     */
+    Iterator<DataLinkDto> listDataLinks(long workspaceId)
+
+    /**
+     * Server-side-filtered resolution of a single data-link by (provider, name).
+     * Iterates /data-links with &search=, short-circuits on first match;
+     * result is @Memoized per (workspaceId, provider, name).
+     * Throws NoSuchFileException if not found.
+     */
+    DataLinkDto getDataLink(long workspaceId, String provider, String name)
+
+    /** Distinct provider identifiers present in the workspace (sorted). */
+    Set<String> getDataLinkProviders(long workspaceId)
+
+    /**
+     * Lazy paginated view over /data-links/{id}/browse[/{path}].
+     * The returned PagedDataLinkContent loads the first page eagerly and paginates
+     * subsequent pages as its iterator advances.
+     */
+    PagedDataLinkContent getContent(String dataLinkId, String subPath, long workspaceId, String credentialsId = null)
+
+    /** GET /data-links/{id}/generate-download-url?filePath=[&credentialsId=] */
+    DataLinkDownloadUrlResponse getDownloadUrl(String dataLinkId, String subPath, long workspaceId, String credentialsId = null)
+}
+```
+
+All endpoints translate 401/403/404/5xx through the same `checkFsResponse` pattern used in `SeqeraDatasetClient`. The `credentialsId` parameter is forwarded as a query-string value when non-null; the handler sources it from `DataLinkDto.credentials[0].id`.
+
+### `PagedDataLinkContent` contract
+
+```groovy
+class PagedDataLinkContent implements Iterable<DataLinkItem> {
+    /** Page fetcher: fetch(null) -> first page; fetch(token) -> next page. */
+    static interface PageFetcher {
+        Map fetch(String nextPageToken) throws IOException
+        // returns: {objects: List<DataLinkItem>, nextPageToken: String, originalPath: String (first page only)}
+    }
+
+    PagedDataLinkContent(String originalPath, List<DataLinkItem> firstPage, String firstPageNextToken, PageFetcher pageFetcher)
+
+    String getOriginalPath()
+    List<DataLinkItem> getFirstPage()    // eager, already loaded at construction
+    boolean isEmpty()
+    Iterator<DataLinkItem> iterator()    // yields first-page items, then paginates lazily
+}
+```
+
+### `SeqeraFileSystem` handler registry
+
+```groovy
+class SeqeraFileSystem extends FileSystem {
+    // existing org/workspace state (unchanged)
+    private final Map<String, ResourceTypeHandler> handlers = new LinkedHashMap<>()
+
+    void registerHandler(ResourceTypeHandler h) { handlers.put(h.resourceType, h) }
+    ResourceTypeHandler getHandler(String type) { handlers.get(type) }
+    Set<String> getResourceTypes() { Collections.unmodifiableSet(handlers.keySet()) }
+}
+```
+
+`SeqeraFileSystemProvider.newFileSystem()` registers both handlers after constructing the filesystem:
+
+```groovy
+fs.registerHandler(new DatasetsResourceHandler(fs, new SeqeraDatasetClient(towerClient)))
+fs.registerHandler(new DataLinksResourceHandler(fs, new SeqeraDataLinkClient(towerClient)))
+```
+
+### Dispatch in `SeqeraFileSystemProvider`
+
+- Depth 0–2: handled directly (root/org/workspace) — uses `SeqeraFileSystem`'s org/workspace cache as before.
+- Depth 2 listing: returns **`fs.getResourceTypes()`** as child paths (replaces the hard-coded `['datasets']`).
+- Depth ≥ 3: dispatch to `fs.getHandler(sp.resourceType)`; unknown type → `NoSuchFileException("Unsupported resource type: ${sp.resourceType}")`.
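As an illustration of the registry-plus-dispatch shape above, here is a self-contained Java sketch (the real classes are Groovy; `HandlerRegistry` is an invented stand-in for the `SeqeraFileSystem`/provider pair):

```java
import java.nio.file.NoSuchFileException;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Illustrative stand-in for the resource-type contract in this design. */
interface ResourceTypeHandler {
    // the depth-3 segment this handler owns, e.g. "datasets" or "data-links"
    String getResourceType();
}

final class HandlerRegistry {
    // insertion order preserved so depth-2 listings are stable
    private final Map<String, ResourceTypeHandler> handlers = new LinkedHashMap<>();

    void register(ResourceTypeHandler h) {
        handlers.put(h.getResourceType(), h);
    }

    /** Depth-2 listing: names come from the registry, not a hard-coded list. */
    Set<String> resourceTypes() {
        return Collections.unmodifiableSet(handlers.keySet());
    }

    /** Depth >= 3: route to the owning handler, or fail as the provider does. */
    ResourceTypeHandler dispatch(String resourceType) throws NoSuchFileException {
        ResourceTypeHandler h = handlers.get(resourceType);
        if (h == null)
            throw new NoSuchFileException("Unsupported resource type: " + resourceType);
        return h;
    }
}
```

Adding a hypothetical third resource type would then be a new `register(...)` call plus one handler class, with no change to the dispatch logic.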
+
+### `SeqeraFileAttributes` refactor
+
+Replace the `DatasetDto`-coupled constructor with two constructors:
+
+```groovy
+/** directory */
+SeqeraFileAttributes(boolean isDir)
+/** file with explicit metadata */
+SeqeraFileAttributes(long size, Instant lastModified, Instant created, Object fileKey)
+```
+
+Internal fields become `(directory, size, lastModified, created, fileKey)`. `DatasetsResourceHandler` constructs the file variant from `DatasetDto`; `DataLinksResourceHandler` constructs it from `DataLinkItem`. The previous `SeqeraFileAttributes(DatasetDto)` and `SeqeraFileAttributes(boolean)` call sites are updated.
+
+### `DataLinksResourceHandler` behaviors
+
+| Path shape | Method | Implementation |
+|---|---|---|
+| `data-links/` (trail=[]) | `list` | `client.getDataLinkProviders(ws)` → distinct providers (sorted); emit child paths `data-links/<provider>/` |
+| `data-links/<provider>/` (trail=[p]) | `list` | stream `client.listDataLinks(ws)`; collect names where provider matches; emit child paths; `NoSuchFileException` if none match |
+| `data-links/<provider>/<name>/` (trail=[p,n]) | `list` | `client.getDataLink(ws, p, n)` → `dl`; `client.getContent(dl.id, "", ws, credentialsIdOf(dl))` → wrap items as `Iterable<SeqeraPath>` carrying cached `SeqeraFileAttributes` |
+| `data-links/<provider>/<name>/<sub-path>/…` (trail ≥ 3) | `list` | same as above with `subPath = trail[2..].join('/')` |
+| any depth ≥ 3 | `readAttributes` | short-circuit if `p.cachedAttributes` is set; else: data-link-root → directory; deeper → `getContent(id, sub, ws, credentialsIdOf(dl)).firstPage`; if one item matching the last segment with `type = FILE`, return file attrs (size from item); otherwise → directory |
+| leaf file | `newInputStream` | `client.getDataLink(ws, p, n)` → `dl`; `client.getDownloadUrl(dl.id, sub, ws, credentialsIdOf(dl))`; open a plain JDK `HttpClient.send(..., BodyHandlers.ofInputStream())` against `response.url`; return body stream |
+
+`credentialsIdOf(dl)` returns `dl.credentials[0].id` when non-empty, else `null` (query parameter
omitted).
+
+Provider segment canonicalization: the path segment is the `DataLinkProvider` enum's `toString()` — lowercase (e.g. `aws`, `google`, `azure`). A path with an unknown provider segment fails via `client.getDataLink(...)` → `NoSuchFileException`.
+
+Listings populate cached attributes on each emitted `SeqeraPath` (via `parent.resolveWithAttributes(name, attrs)`) so a follow-up `readAttributes(child)` returns immediately with zero API calls. Attributes come directly from each `DataLinkItem`: file → `(size, Instant.EPOCH, Instant.EPOCH, item.name)`; folder → `SeqeraFileAttributes(true)`.
+
+### Data-link identity resolution
+
+`client.getDataLink(workspaceId, provider, name)` iterates `/data-links?search=` (server-side pre-filter) and returns the first entry whose `provider.toString() == providerSegment`. Memoized via `@Memoized` per `(workspaceId, provider, name)` — repeated handler calls within a run hit the memoization cache. The handler does NOT maintain its own in-memory map cache of data-links; the client-level streaming iterator plus the memoized lookup replaces it.
+
+---
+
+## Phase 2: Deliverables
+
+The detailed task list is in [tasks.md](tasks.md). Phases in execution order:
+
+1. **Refactor (foundational)** — extract `ResourceTypeHandler`, `DatasetsResourceHandler`; generalize `SeqeraPath`, `SeqeraFileSystem`, `SeqeraFileSystemProvider`, `SeqeraFileAttributes`. Existing dataset tests must pass unchanged.
+2. **Data-link API client** — implement `SeqeraDataLinkClient` with pagination and error mapping. Unit tests with `Mock(TowerClient)`.
+3. **US1 — Read file inside a data-link** — implement `DataLinksResourceHandler.newInputStream` and register the handler.
+4. **US2 — Browse hierarchy** — implement `list` and `readAttributes`; workspace listing enumerates handlers.
+5. **US3 — Error paths** — explicit tests for unknown provider, unknown data-link, missing sub-path, 401/403 mapping.
+6.
**US4 — Extensibility validation** — architectural check that generic classes stay resource-type-agnostic; end-to-end spec exercising both handlers in one fs.
+7. **Final verification** — full test suite green; no cloud SDK added to classpath. VERSION bump and changelog entry happen at release time, not in this feature.
+
+Each task in `tasks.md` specifies exact file paths, exact code changes, exact test commands, and a commit step.
diff --git a/specs/260422-seqera-datalinks-fs/spec.md b/specs/260422-seqera-datalinks-fs/spec.md
new file mode 100644
index 0000000000..43b782d69a
--- /dev/null
+++ b/specs/260422-seqera-datalinks-fs/spec.md
@@ -0,0 +1,189 @@
+# Feature Specification: Seqera NIO Filesystem Support for Platform Data-Links
+
+**Feature Branch**: `260422-seqera-datalinks-fs`
+**Created**: 2026-04-22
+**Status**: Draft
+**Depends on**: [260310-seqera-dataset-fs](../260310-seqera-dataset-fs/spec.md)
+**Input**: User description: "I want to extend the seqera NIO filesystem to include the seqera platform data-links using this url template `seqera://<org>/<workspace>/data-links/...`. The seqera platform API is https://cloud.seqera.io/openapi/seqera-api-latest.yml"
+
+## Clarifications
+
+### Session 2026-04-22
+
+- Q: Scope of data-link support — list-only, full Platform-driven traversal, or hybrid (Platform-driven listing + cloud-driven I/O)? → A: Hybrid. Listing and attributes go through the Seqera Platform; byte-level I/O is delegated, but via pre-signed URLs the Platform returns, so no cloud provider SDK integration is required.
+- Q: How are credentials handled for the cloud object storage behind a data-link? → A: Platform-brokered. The user provides only the `tower.accessToken`. The Platform returns a short-lived pre-signed URL for each read; no cloud credentials cross the plugin boundary.
+- Q: Read-only, or read + write for data-links in this iteration? → A: Read-only.
Write (upload) support may be added later; the architecture does not preclude it but the feature is not in scope.
+- Q: Path hierarchy — does the data-link identity segment include the provider? → A: Yes. `data-links/<provider>/<name>/...`. Names are not globally unique within a workspace (the same name may exist on two different providers), so the provider segment is required to disambiguate and mirrors the Platform UI's provider-grouped data explorer.
+- Q: How deep can a data-link path go? → A: Arbitrary depth below the data-link root. Each segment after `<name>` is an entry inside the underlying bucket/prefix — a directory or file, resolved via the Platform browse API.
+- Q: How should the existing dataset filesystem code be extended to accommodate data-links? → A: Introduce a true resource-type abstraction (`ResourceTypeHandler`). The current dataset-specific logic in `SeqeraFileSystemProvider`, `SeqeraFileSystem`, and `SeqeraPath` is extracted into a `DatasetsResourceHandler`; data-links are added as a parallel `DataLinksResourceHandler`. The core path/filesystem/provider classes become resource-type-agnostic.
+- Q: How should the listing vs I/O boundary work? → A: Listing (`newDirectoryStream`) and attributes (`readAttributes`) are resolved via the Platform's browse endpoints (`GET /data-links/{id}/browse` for the data-link root and `GET /data-links/{id}/browse/{path}` for sub-paths). Downloads (`newInputStream`) go through `GET /data-links/{id}/generate-download-url?filePath=` to obtain a pre-signed URL, which is then fetched with a plain JDK `HttpClient` (no Seqera auth header on the cloud-backed URL). No cloud SDK is used.
+- Q: Which DTOs are introduced by this feature? → A: None. All types are reused from the `io.seqera:tower-api:1.121.0` dependency (`DataLinkDto`, `DataLinkItem`, `DataLinkProvider`, `DataLinkCredentials`, `DataLinkContentResponse`, `DataLinkDownloadUrlResponse`, etc.).
A plugin-local `PagedDataLinkContent` holder class wraps the eager-first-page + lazy-pagination behavior but holds only tower-api types.
+- Q: Is browse-per-file supported by the Platform API? → A: Yes. `GET /data-links/{id}/browse/{path}` works for both directories and files, so `readAttributes` on any path is a single targeted call — no parent-browse-and-filter, no N+1 problem.
+- Q: How are paginated Platform responses returned to callers? → A: Streaming. The workspace data-link list (`GET /data-links`) returns an `Iterator<DataLinkDto>` that fetches offsets on demand. The browse endpoint returns a `PagedDataLinkContent` that loads the first page eagerly (so `readAttributes` can inspect it without iterating) and fetches subsequent pages lazily as the iterator advances. The handler layer exposes `Iterable<SeqeraPath>` to the NIO `DirectoryStream`; no full materialization of listings in memory.
+- Q: What convenience methods does the client expose on top of the raw list endpoint? → A: Two memoized helpers — `getDataLink(ws, provider, name)` uses the server-side `&search=` filter and returns the first match (throws `NoSuchFileException` on miss); `getDataLinkProviders(ws)` returns the sorted set of distinct providers present in the workspace. Both are memoized per-arguments within a single `SeqeraDataLinkClient` instance.
+- Q: How are attributes discovered after a listing? → A: When `newDirectoryStream` yields a child path, the handler attaches the per-item attributes (size for files, directory marker for folders) to the `SeqeraPath` via an optional cache field. A subsequent `readAttributes` on that path returns the cached value without any additional Platform API call. Paths parsed from URIs (no prior listing) fall back to the live browse endpoint.
+- Q: How are cloud credentials for the underlying bucket/prefix selected? → A: The Platform's `DataLinkDto.credentials` list associates one or more credential records with a data-link.
The plugin forwards the first credential's ID as the `credentialsId` query parameter on browse and download-URL requests, when present. If the data-link has no associated credentials, the parameter is omitted and the Platform uses its default resolution.
+- Q: Which provider-segment value appears in user-visible paths? → A: The lowercase value of the `DataLinkProvider` enum, as exposed by its `toString()` (e.g. `aws`, `google`, `azure`). This matches the Platform UI.
+- Q: What happens if the pre-signed URL expires during a long read? → A: The underlying HTTP connection errors out with an `IOException`. The plugin does not transparently re-issue URLs; Nextflow's task retry handles the failure as it already does for other transient I/O errors.
+
+## User Scenarios & Testing *(mandatory)*
+
+### User Story 1 — Use a File Inside a Data-Link as Pipeline Input (Priority: P1)
+
+A Nextflow pipeline developer has registered an S3 bucket or a GCS prefix as a Seqera Platform data-link (e.g. `inputs` on AWS). They want to reference a file inside that data-link as a pipeline input using a `seqera://` path, without configuring cloud credentials separately and without any manual pre-download step.
+
+**Why this priority**: This is the core value proposition of the feature. All other stories build on the ability to resolve and read a file inside a data-link by path.
+
+**Independent Test**: Write a pipeline that sets an input channel to `seqera://<org>/<workspace>/data-links/<provider>/<name>/<file>` and verify the pipeline task receives the correct file content, using only the Seqera access token for authentication.
+
+**Acceptance Scenarios**:
+
+1. **Given** a data-link named `inputs` registered with provider `aws` in workspace `acme/research`, pointing at `s3://my-bucket/data/`, **When** a pipeline references `seqera://acme/research/data-links/aws/inputs/reads/sample1.fq.gz`, **Then** the pipeline task receives the byte content of `s3://my-bucket/data/reads/sample1.fq.gz` transparently.
+2.
**Given** a data-link on `google` and one on `azure` in the same workspace, **When** both are referenced from the same pipeline, **Then** each path resolves independently and content is streamed correctly from the respective provider. +3. **Given** a Seqera access token is configured (`tower.accessToken` or `TOWER_ACCESS_TOKEN`), **When** a data-link path is accessed, **Then** no additional cloud credentials (AWS, GCP, Azure) are required from the user. + +--- + +### User Story 2 — Browse the Data-Link Hierarchy (Priority: P2) + +A pipeline developer wants to navigate the data-link namespace at any level — list providers in a workspace, list data-links within a provider, and browse into the content tree of a specific data-link — using ordinary directory listing operations. + +**Why this priority**: Hierarchical listing supports discoverability and dynamic pipeline construction. Without it, users must know the exact path in advance. + +**Independent Test**: List each level of the hierarchy and verify the correct child entries appear. + +**Acceptance Scenarios**: + +1. **Given** a workspace `acme/research` has data-links on both AWS and GCS, **When** a user lists `seqera://acme/research/data-links/`, **Then** `aws` and `google` are returned as directory entries (only providers in use appear). +2. **Given** the `aws` provider has two data-links `inputs` and `archive`, **When** a user lists `seqera://acme/research/data-links/aws/`, **Then** both names are returned as directory entries. +3. **Given** a data-link `inputs` contains a folder `reads/` with files `a.fq.gz` and `b.fq.gz`, **When** a user lists `seqera://acme/research/data-links/aws/inputs/reads/`, **Then** both files appear as file entries with correct size and last-modified metadata. +4. **Given** a data-link root is listed, **When** the data-link is empty, **Then** an empty result is returned without errors. +5. 
**Given** a user lacks access to a workspace, **When** they attempt to list any `data-links/` path within it, **Then** a clear access-denied error is returned without leaking internal details. +6. **Given** `readAttributes` is called directly on a file path (no prior listing), **When** the path exists, **Then** a single Platform API call returns the file attributes (no parent-directory scan). + +--- + +### User Story 3 — Receive Meaningful Errors for Invalid or Inaccessible Paths (Priority: P3) + +A pipeline developer mistypes a data-link name, uses a provider segment that has no registered data-links, or references a path that no longer exists inside a data-link. They receive a clear, actionable error that helps them fix the problem. + +**Why this priority**: Error handling is essential for usability but delivers no new functionality on its own. Good errors prevent support escalations. + +**Independent Test**: Reference invalid data-link paths and verify the error messages identify the problem (unknown provider, unknown data-link name, path not found inside data-link, authentication failure) without generic or cryptic failures. + +**Acceptance Scenarios**: + +1. **Given** a workspace has no data-links on the `azure` provider, **When** a pipeline references `seqera://.../data-links/azure/anything`, **Then** a `NoSuchFileException` is raised with a message indicating no data-links exist for provider `azure` in the workspace. +2. **Given** a data-link name that does not exist under a provider, **When** a pipeline attempts to read any path within it, **Then** the error identifies the missing data-link by name and path. +3. **Given** a valid data-link but a path that does not exist inside it, **When** a pipeline attempts to read it, **Then** a `NoSuchFileException` is raised with a message including the sub-path. +4. 
**Given** invalid or expired credentials, **When** any data-link path is accessed, **Then** an authentication error is reported with guidance to reconfigure `tower.accessToken` / `TOWER_ACCESS_TOKEN`.
+5. **Given** a resource-type segment that is neither `datasets` nor `data-links`, **When** used, **Then** a clear "unsupported resource type" error is returned (unchanged from dataset feature).
+
+---
+
+### User Story 4 — Extensible Resource-Type Architecture (Priority: P4)
+
+A Nextflow or Seqera engineer wants the filesystem's resource-type abstraction to be real and exercised by more than one resource type, so that adding future Seqera-managed resources requires isolated, scoped changes rather than cross-cutting refactors.
+
+**Why this priority**: Shipping a second resource type is the right moment to introduce the shared abstraction — with two concrete consumers in place, the interface is validated in practice. This story captures the refactor as a first-class deliverable alongside the new data-link functionality.
+
+**Independent Test**: A code review confirms that (a) `SeqeraFileSystemProvider`, `SeqeraFileSystem`, and `SeqeraPath` contain no dataset- or data-link-specific branching for depth ≥ 3; (b) adding a hypothetical third resource type is a new `ResourceTypeHandler` implementation with no changes to the generic path/provider/filesystem classes; (c) both `DatasetsResourceHandler` and `DataLinksResourceHandler` implement the same interface without leakage of each other's concepts.
+
+**Acceptance Scenarios**:
+
+1. **Given** the refactored filesystem, **When** `seqera://<org>/<workspace>/` is listed, **Then** the resource-type entries (`datasets`, `data-links`) are enumerated from the handler registry rather than a hard-coded list.
+2. **Given** the refactored `SeqeraPath`, **When** any path shape valid for either resource type is parsed, **Then** parsing succeeds without requiring the path class to know which resource type owns the depth-4+ segments.
+3.
**Given** a new handler is registered in `SeqeraFileSystem`, **When** paths with its resource-type segment are resolved, **Then** dispatch reaches it without modifying existing handlers. +4. **Given** both existing resource types, **When** a path from one is accessed, **Then** the other handler's code is never executed. + +--- + +### Edge Cases + +- What happens when a data-link's underlying bucket/prefix has been revoked on the provider side but the data-link still exists in Seqera? The Platform surfaces an error which is propagated as `IOException`. +- What happens when a data-link has thousands of entries at one level? The Platform browse endpoint's pagination (if any) must be exhausted; initial implementation pages through using whatever cursor token the response exposes. +- What happens when a user has a provider name containing characters the path class rejects? Provider identifiers come from the Platform API verbatim (`DataLinkProvider` enum values); they are valid path segments by construction. +- What happens when the same data-link name exists for two providers (e.g. `inputs` on AWS and `inputs` on GCS)? Both are addressable at distinct paths: `data-links/aws/inputs/...` and `data-links/google/inputs/...`. The provider segment disambiguates. +- What happens when a pre-signed URL expires mid-read? The HTTP read fails with `IOException`. The plugin does not transparently re-issue URLs; Nextflow task retry handles the failure. +- What happens when the transient Platform API is unavailable? Same as the dataset feature — `TowerClient`'s retry/backoff is reused; exhaustion raises `IOException`. +- What happens when a data-link's listing contains entries whose names include `/`? The browse response is expected to return name segments, not paths; any entry whose name contains `/` is rejected with a descriptive error (indicates a provider data issue). +- What happens when a data-link is accessed concurrently from many pipeline tasks? 
All reads are independent signed-URL fetches; no shared state beyond the (read-only) cached data-link list.
+- How is pagination of `GET /data-links` handled if a workspace has more data-links than fit in a single response page? Implementation must exhaust pages before caching.
+
+## Requirements *(mandatory)*
+
+### Functional Requirements
+
+- **FR-001**: System MUST accept paths in the format `seqera://<org>/<workspace>/data-links/<provider>/<name>/<path>` where `<path>` is zero or more segments addressing a directory or file inside the data-link.
+- **FR-002**: System MUST read file content addressed by a data-link path transparently, requiring only the existing `tower.accessToken` / `TOWER_ACCESS_TOKEN` configuration — no cloud-provider credentials.
+- **FR-003**: System MUST perform listing and attribute queries via the Seqera Platform browse endpoints (`GET /data-links/{id}/browse` for the data-link root and `GET /data-links/{id}/browse/{path}` for sub-paths), and stream file content via pre-signed URLs returned from `GET /data-links/{id}/generate-download-url?filePath=`.
+- **FR-004**: System MUST support hierarchical directory listing:
+  - `seqera://<org>/<workspace>/` → directory; entries include `datasets` and `data-links` (enumerated from the handler registry).
+  - `seqera://<org>/<workspace>/data-links/` → directory; entries are distinct provider identifiers present in the workspace.
+  - `seqera://<org>/<workspace>/data-links/<provider>/` → directory; entries are data-link names under that provider.
+  - `seqera://<org>/<workspace>/data-links/<provider>/<name>/` → directory; entries are the top-level items in the data-link.
+  - `seqera://<org>/<workspace>/data-links/<provider>/<name>/<sub-path>/` → directory; entries are the children at that sub-path.
+  - `seqera://<org>/<workspace>/data-links/<provider>/<name>/<sub-path>/<file>` → file.
+- **FR-005**: System MUST return correct `BasicFileAttributes` — `isDirectory`, `isRegularFile`, `size`, `lastModifiedTime`, `creationTime` — for any path inside a data-link. When a path was produced by a prior `newDirectoryStream` listing, its attributes MUST be returned from the listing response without a follow-up API call.
Paths parsed from a URI (no prior listing) MUST source attributes from the Platform's browse endpoint for that specific path. +- **FR-006**: System MUST treat data-link paths as read-only in this iteration. Any write-like operation (`newByteChannel` with `WRITE`/`APPEND`, `copy` with a data-link as target, `delete`, `createDirectory`, `move`) MUST fail with `UnsupportedOperationException` or `AccessDeniedException`, consistent with the dataset feature's read-only stance. +- **FR-007**: System MUST produce clear, actionable error messages distinguishing: unknown org/workspace, unknown provider, unknown data-link name, missing sub-path, unsupported resource type, authentication failure, and transient Platform errors. +- **FR-008**: System MUST NOT depend on `nf-amazon`, `nf-google`, or `nf-azure`. All cloud I/O is reduced to a single HTTPS fetch of a pre-signed URL. The signed URL is fetched with a plain JDK `HttpClient` — NOT through `TowerClient`, since the URL is addressed to the cloud backend and must not carry the Seqera `Authorization` header. +- **FR-009**: System MUST reuse DTOs from `io.seqera:tower-api:1.121.0` (`DataLinkDto`, `DataLinkContentResponse`, `DataLinkItem`, `DataLinkDownloadUrlResponse`, `DataLinkCredentials`, `DataLinkProvider`, etc.) without introducing parallel plugin-local DTOs. A plugin-local `PagedDataLinkContent` service type is permitted as a lazy-pagination wrapper around tower-api types. +- **FR-010**: System MUST refactor the existing `fs/` package to introduce a `ResourceTypeHandler` interface. `DatasetsResourceHandler` MUST encapsulate all dataset-specific behavior previously inlined in `SeqeraFileSystemProvider` / `SeqeraFileSystem` / `SeqeraPath`. `DataLinksResourceHandler` MUST implement the same interface. 
+- **FR-011**: After the refactor, the classes `SeqeraPath`, `SeqeraFileSystem`, and `SeqeraFileSystemProvider` MUST contain no dataset- or data-link-specific logic for depth ≥ 3; all such logic MUST live in the respective handler.
+- **FR-012**: `SeqeraPath` MUST parse and represent arbitrary sub-paths below depth 4 for resource types that support them (data-links). Datasets continue to reject sub-paths beyond depth 4.
+- **FR-013**: The filesystem MUST reuse the existing `TowerClient` retry/backoff for all Platform API calls. No new retry logic is introduced.
+- **FR-014**: Transient failure of a pre-signed URL fetch mid-stream MUST surface as `IOException`; Nextflow task retry handles the recovery. The plugin MUST NOT re-issue URLs transparently within a single `InputStream`.
+- **FR-015**: System MUST NOT maintain a global or per-run cache of browse-result pages or pre-signed URLs. A cheap per-path attribute cache lives on each `SeqeraPath` instance returned by a listing (file size / directory flag captured from the listing item); this cache is scoped to the lifetime of that path object and is not shared across paths.
+- **FR-016**: Paginated Platform responses MUST be exposed to callers as lazy iterators — callers consume pages only as elements are requested. The workspace data-link list (`GET /data-links?workspaceId=X&max=&offset=`) MUST be returned as an `Iterator<DataLinkDto>`; the data-link content endpoint (`GET /data-links/{id}/browse[/path]`) MUST be returned as a `PagedDataLinkContent` view backed by a lazy iterator of `DataLinkItem`.
+- **FR-017**: System MUST forward the data-link's associated credentials identifier to the Platform when one is available. When `DataLinkDto.credentials` is non-empty, the first entry's `id` MUST be passed as the `credentialsId` query parameter on browse (`GET /data-links/{id}/browse[/path]`) and download-URL (`GET /data-links/{id}/generate-download-url`) requests.
When the list is empty, the parameter MUST be omitted so the Platform applies its default resolution.
+
+### Key Entities
+
+- **Data-Link**: A Seqera Platform entity referencing a bucket or prefix on a cloud provider (S3, GCS, Azure Blob, etc.). Addressed by `(workspaceId, provider, name)`; content is browsed and read through Platform API calls. Represented in the path as `data-links/<provider>/<name>/`.
+- **Data-Link Provider**: A Platform-defined identifier for the cloud backend (`DataLinkProvider` enum values, e.g. `aws`, `google`, `azure`). Used as a path segment to disambiguate data-links with the same name on different providers.
+- **Data-Link Entry**: An item inside a data-link — a file or folder — returned by the browse API. Has a name, type (`FILE`/`FOLDER`), size, and MIME type. The Platform's browse response does not currently expose a per-item last-modified timestamp, so that attribute is reported as epoch.
+- **Resource-Type Handler**: A pluggable strategy that owns the semantics of one depth-3 path segment (`datasets`, `data-links`, …). Exposes listing, attribute, read, and access-check operations to the generic filesystem.
+- **Seqera Path (data-link variant)**: The URI `seqera://<org>/<workspace>/data-links/<provider>/<name>[/<sub-path>/…]`. All segments up to and including `<name>` form the data-link identity; the remainder is the sub-path within the data-link.
+
+## Success Criteria *(mandatory)*
+
+### Measurable Outcomes
+
+- **SC-001**: A pipeline developer can reference a file inside a Seqera data-link via `seqera://` path, and the pipeline runs successfully using only the Seqera access token — no cloud credentials, no manual pre-download step.
+- **SC-002**: 100% of existing Nextflow file operations that work on cloud-hosted files (read, iterate lines, pass as channel input) work identically when the file is referenced via a `seqera://` data-link path.
+- **SC-003**: Listing any level of the data-link hierarchy completes in under 5 seconds for workspaces with up to 500 data-links and data-links containing up to 1,000 entries at a single level. +- **SC-004**: Invalid or inaccessible data-link paths produce error messages that allow a developer to identify and fix the problem without consulting external documentation in 90% of cases (measured by user testing or code review of error-text coverage). +- **SC-005**: The refactored `fs/` package passes a code review confirming that (a) no resource-type-specific logic remains in `SeqeraPath`, `SeqeraFileSystem`, or `SeqeraFileSystemProvider`; (b) the two handlers share no code paths for depth ≥ 3; (c) the dataset tests pass unchanged after routing through `DatasetsResourceHandler`. +- **SC-006**: The plugin's runtime classpath for this feature gains no new cloud-SDK dependency (no `aws-sdk`, no `google-cloud-storage`, no `azure-*` artifacts introduced by this change). + +## Assumptions + +- Authentication reuses the existing nf-tower plugin credential mechanism (Seqera access token); no new auth configuration is required from users. +- The `GET /data-links/{id}/browse` endpoint (root) and `GET /data-links/{id}/browse/{path}` endpoint (sub-path) work for both directory and file paths. When the path points at a file, the response's `objects` array contains the single file entry; when it points at a directory, the array enumerates children. Both endpoints page via `nextPageToken`. +- The `GET /data-links/{id}/generate-download-url?filePath=` endpoint returns a pre-signed URL valid for long enough to complete a typical file read. The plugin does not extend this window. +- The signed URL points at the underlying cloud object (S3 / GCS / Azure). Fetching it does NOT go through `TowerClient`; it uses a plain JDK `HttpClient` so the Seqera `Authorization` header is not sent to the cloud backend (which would be rejected by AWS SigV4 and similar schemes). 
+- Data-link provider identifiers returned by the Platform (`DataLinkProvider`) are safe as path segments and are emitted in lowercase by `toString()` (e.g. `aws`, `google`, `azure`). User-visible paths use this lowercase form. +- The tower-api artifact (`io.seqera:tower-api:1.121.0`) already available on the plugin classpath exposes all DTOs required (`DataLinkDto`, `DataLinkContentResponse`, `DataLinkItem`, `DataLinkDownloadUrlResponse`, `DataLinkCredentials`, `DataLinkProvider`, etc.). +- Data-link writes, renames, deletes, and management operations (create, update, delete the data-link entity itself) are **out of scope** for this iteration. +- Browse and download-URL Platform API calls reuse `TowerClient.sendApiRequest`, inheriting the existing retry/backoff policy. The cloud-side signed-URL fetch is a one-shot JDK HTTP GET with no additional retry layer beyond Nextflow task retry. +- Data-link listings may be paginated; the plugin exposes them as lazy iterators and only fetches the pages the caller consumes. A caller that reads just the first page of a browse response pays exactly one HTTP call. +- No local caching across pipeline runs. Nextflow's standard task staging handles intra-run caching. +- Paths are case-sensitive — matches the Platform API and the dataset filesystem. +- The dataset feature's read-only filesystem stance (`isReadOnly()=true`) is preserved; data-link writes are deferred to a future iteration. + +## Dependencies + +- Seqera platform API (data-links endpoints: `/data-links`, `/data-links/{id}`, `/data-links/{id}/browse`, `/data-links/{id}/browse/{path}`, `/data-links/{id}/generate-download-url`) must be accessible from the compute environment where the pipeline runs. +- nf-tower plugin must be enabled and configured with a valid `tower.accessToken` / `TOWER_ACCESS_TOKEN`. +- The Seqera account must have at least read access to the target workspace and data-link. 
+- The existing dataset filesystem (`260310-seqera-dataset-fs`) must be merged — this feature builds on its classes and refactors them. + +## Out of Scope + +- Write operations (upload) to data-links — the Platform's `POST /data-links/{id}/multipart-upload` endpoint is a natural future hook but is not implemented here. +- Data-link management operations (create, update, delete the data-link entity itself). +- Transparent pre-signed URL renewal mid-stream. +- Local caching across pipeline runs. +- Browse-result caching within a run. +- Fusion integration (Fusion has its own data-link access path; this feature is for direct NIO access). diff --git a/specs/260422-seqera-datalinks-fs/tasks.md b/specs/260422-seqera-datalinks-fs/tasks.md new file mode 100644 index 0000000000..3acf79fbec --- /dev/null +++ b/specs/260422-seqera-datalinks-fs/tasks.md @@ -0,0 +1,2475 @@ +# Tasks: Seqera NIO Filesystem Support for Platform Data-Links + +**Branch**: `260422-seqera-datalinks-fs` | **Spec**: [spec.md](spec.md) | **Plan**: [plan.md](plan.md) + +> **Status**: This task checklist was the initial implementation recipe. The shipped code diverges in several specifics discovered during integration — the canonical design now lives in [spec.md](spec.md) and [plan.md](plan.md). Notable refinements vs the tasks below: +> +> - Endpoints: browse uses `/data-links/{id}/browse` and `/data-links/{id}/browse/{path}` (not `/content`); signed URLs come from `/data-links/{id}/generate-download-url?filePath=…` (not `/download`). +> - Pagination is exposed as lazy iterators: `listDataLinks` → `Iterator`; `getContent` → `PagedDataLinkContent` (eager first page, lazy successors). +> - `ResourceTypeHandler.list` returns `Iterable` (not `List`) so directory streams never materialize. +> - `SeqeraPath` gained `cachedAttributes` and `resolveWithAttributes(name, attrs)` so `readAttributes` on a child of a listing short-circuits without an API call. 
Dataset `@version` parsing moved into `DatasetsResourceHandler`. +> - Data-link `credentialsId` is forwarded on browse and download-URL calls from `DataLinkDto.credentials[0].id`. +> - `SeqeraDataLinkClient` adds `getDataLink(ws, provider, name)` (memoized, server-side `search=` filter) and `getDataLinkProviders(ws)`. + +> **For agentic workers**: execute tasks in order. Each task is self-contained and ends with a commit step. Do not skip TDD steps — write the test first, watch it fail, then make it pass. All commits use `git commit -s`. + +Tests use Spock with `Mock(TowerClient)` + `groovy.json.JsonOutput` fixtures — matching the style of `SeqeraDatasetClientTest` and `SeqeraFileSystemProviderTest`. No WireMock, no real HTTP. + +Legend: +- **[P]**: can be done in parallel with the previous task (different files, no dependency) +- **[Story]**: which user story from the spec +- Exact file paths are relative to repo root + +--- + +## Phase 1: Foundational Refactor (blocks all US) + +**Purpose**: Extract `ResourceTypeHandler`, move dataset-specific logic out of the generic classes. Existing dataset tests must pass unchanged at the end of this phase. + +### T001 — Generalize `SeqeraFileAttributes` + +**Files:** +- Modify: `plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileAttributes.groovy` + +- [ ] **Step 1: Replace the DatasetDto-coupled class with a generic one** + +Replace the entire class body (everything between `class SeqeraFileAttributes implements BasicFileAttributes {` and the final `}`) with: + +```groovy +@CompileStatic +class SeqeraFileAttributes implements BasicFileAttributes { + + private final boolean directory + private final long size + private final Instant lastModified + private final Instant created + private final Object fileKey + + /** Construct attributes for a virtual directory (any depth). 
*/ + SeqeraFileAttributes(boolean isDir) { + this.directory = isDir + this.size = 0L + this.lastModified = Instant.EPOCH + this.created = Instant.EPOCH + this.fileKey = null + } + + /** Construct attributes for a regular file. */ + SeqeraFileAttributes(long size, Instant lastModified, Instant created, Object fileKey) { + this.directory = false + this.size = size ?: 0L + this.lastModified = lastModified ?: Instant.EPOCH + this.created = created ?: Instant.EPOCH + this.fileKey = fileKey + } + + @Override FileTime lastModifiedTime() { FileTime.from(lastModified) } + @Override FileTime lastAccessTime() { FileTime.from(lastModified) } + @Override FileTime creationTime() { FileTime.from(created) } + @Override boolean isRegularFile() { !directory } + @Override boolean isDirectory() { directory } + @Override boolean isSymbolicLink() { false } + @Override boolean isOther() { false } + @Override long size() { size } + @Override Object fileKey() { fileKey } +} +``` + +- [ ] **Step 2: Drop the now-unused `DatasetDto` import** + +In the same file, remove `import io.seqera.tower.model.DatasetDto`. Keep `import java.time.Instant`. + +- [ ] **Step 3: Compile the plugin** + +Run: `./gradlew :plugins:nf-tower:compileGroovy` +Expected: `BUILD FAILED` with errors in `SeqeraFileSystemProvider.groovy` (calls `new SeqeraFileAttributes(dataset)`) — we will fix that when we extract the dataset handler. Compile errors here are expected at this step only; leave them for T004. 
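Before continuing, it is worth pinning down the epoch-fallback contract this refactor preserves: any timestamp the backend does not supply must surface as `Instant.EPOCH`, never as `null`, because NIO callers such as `Files.getLastModifiedTime` expect a valid `FileTime`. A minimal standalone Java sketch of that null-coalescing rule (class and method names are hypothetical, not part of the plugin):

```java
import java.nio.file.attribute.FileTime;
import java.time.Instant;

// Hypothetical mirror of the null-coalescing contract in the generalized
// SeqeraFileAttributes constructor: a missing timestamp is reported as the
// Unix epoch rather than null, so NIO attribute views always get a FileTime.
class AttrDefaults {

    static FileTime toFileTime(Instant value) {
        // equivalent to the Groovy Elvis expression `value ?: Instant.EPOCH`
        return FileTime.from(value != null ? value : Instant.EPOCH);
    }

    public static void main(String[] args) {
        System.out.println(toFileTime(null));
        System.out.println(toFileTime(Instant.ofEpochSecond(1_745_280_000L)));
    }
}
```

The Groovy constructor expresses the same fallback with `lastModified ?: Instant.EPOCH`; the directory constructor hard-codes the epoch because the browse API exposes no folder timestamps.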
+ +- [ ] **Step 4: Commit** + +```bash +git add plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileAttributes.groovy +git commit -s -m "refactor(nf-tower): generalize SeqeraFileAttributes (no DatasetDto coupling)" +``` + +### T002 — Add `ResourceTypeHandler` interface + +**Files:** +- Create: `plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/ResourceTypeHandler.groovy` + +- [ ] **Step 1: Write the interface** + +```groovy +/* + * Copyright 2013-2026, Seqera Labs + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package io.seqera.tower.plugin.fs + +import java.nio.file.AccessMode +import java.nio.file.Path + +/** + * Strategy owning the semantics of one depth-3 path segment under {@code seqera://}. + * Registered in {@link SeqeraFileSystem} at filesystem construction. + */ +interface ResourceTypeHandler { + + /** e.g. {@code "datasets"} or {@code "data-links"}. Must match the depth-3 path segment. */ + String getResourceType() + + /** List entries at the given directory path. Caller has verified depth ≥ 3 and {@code dir.isDirectory()}. */ + List<Path> list(SeqeraPath dir) throws IOException + + /** Return attributes for any path at depth ≥ 3 owned by this handler. */ + SeqeraFileAttributes readAttributes(SeqeraPath path) throws IOException + + /** Open a read stream for a leaf path. Throw {@link IllegalArgumentException} if the path is a directory. 
*/ + InputStream newInputStream(SeqeraPath path) throws IOException + + /** Verify the path exists and requested modes are satisfiable. READ is allowed; WRITE/EXECUTE throw {@link java.nio.file.AccessDeniedException}. */ + void checkAccess(SeqeraPath path, AccessMode... modes) throws IOException +} +``` + +- [ ] **Step 2: Compile (the other existing errors from T001 still stand — expected)** + +Run: `./gradlew :plugins:nf-tower:compileGroovy` +Expected: same errors as T001 (in `SeqeraFileSystemProvider.groovy`), no new errors from the new file. + +- [ ] **Step 3: Commit** + +```bash +git add plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/ResourceTypeHandler.groovy +git commit -s -m "refactor(nf-tower): add ResourceTypeHandler interface" +``` + +### T003 — Refactor `SeqeraPath` to use generic `trail` segments + +**Files:** +- Modify: `plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraPath.groovy` +- Modify: `plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraPathTest.groovy` + +- [ ] **Step 1: Read the current `SeqeraPath` tests to understand coverage** + +Open `plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraPathTest.groovy`. Note the existing cases you must continue to pass (URI parsing, `toUri()` round-trip, `getParent`, `resolve`, `@version`, `equals`/`hashCode`). 
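The depth arithmetic those existing cases rely on extends directly to data-link paths: segments one through three are `org`, `workspace`, and `resourceType`, and everything after is the handler-owned trail, so `depth() == 3 + trail.size()`. A standalone Java sketch of that split (names hypothetical, not the plugin API):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the SeqeraPath parse: strip the scheme prefix,
// split on '/', and treat everything after the first three segments as the
// handler-owned trail.
class TrailDemo {

    static List<String> segments(String uri) {
        String rest = uri.substring("seqera://".length());
        return Arrays.asList(rest.split("/"));
    }

    public static void main(String[] args) {
        List<String> s = segments("seqera://acme/research/data-links/aws/inputs/reads/s.fq.gz");
        List<String> trail = s.subList(3, s.size()); // [aws, inputs, reads, s.fq.gz]
        int depth = 3 + trail.size();                // 7, matching the new Spock case
        System.out.println(trail + " depth=" + depth);
    }
}
```

The same arithmetic explains the parent walk in the tests: dropping the last trail segment takes depth 7 to depth 6.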
+ +- [ ] **Step 2: Write failing tests for the new depth-≥5 cases** + +Append to the end of the class body (before the final `}`): + +```groovy + def "parse data-link path with provider and name"() { + when: + def p = new SeqeraPath(Mock(SeqeraFileSystem), 'seqera://acme/research/data-links/aws/inputs') + + then: + p.org == 'acme' + p.workspace == 'research' + p.resourceType == 'data-links' + p.trail == ['aws', 'inputs'] + p.depth() == 5 + } + + def "parse data-link path with nested sub-path"() { + when: + def p = new SeqeraPath(Mock(SeqeraFileSystem), 'seqera://acme/research/data-links/aws/inputs/reads/sample1.fq.gz') + + then: + p.trail == ['aws', 'inputs', 'reads', 'sample1.fq.gz'] + p.depth() == 7 + p.isRegularFile() == false // handler decides; generic class reports directory-by-default for trail != 1 dataset + } + + def "getParent walks up one trail segment for deep data-link paths"() { + given: + def fs = Mock(SeqeraFileSystem) + def p = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs/reads/s.fq.gz') + + when: + def parent = p.getParent() as SeqeraPath + + then: + parent.trail == ['aws', 'inputs', 'reads'] + parent.depth() == 6 + } + + def "resolve appends one segment to trail"() { + given: + def fs = Mock(SeqeraFileSystem) + def p = new SeqeraPath(fs, 'seqera://acme/research/data-links/aws/inputs') + + when: + def child = p.resolve('reads') as SeqeraPath + + then: + child.trail == ['aws', 'inputs', 'reads'] + } + + def "toUri round-trip for deep data-link path"() { + given: + def fs = Mock(SeqeraFileSystem) + def original = 'seqera://acme/research/data-links/aws/inputs/reads/sample.fq.gz' + + when: + def p = new SeqeraPath(fs, original) + + then: + p.toUri().toString() == original + } + + def "dataset version pinning preserved after refactor"() { + given: + def fs = Mock(SeqeraFileSystem) + + when: + def p = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples@3') + + then: + p.resourceType == 'datasets' + p.trail == ['samples'] + 
p.version == '3' + p.datasetName == 'samples' + p.depth() == 4 + } +``` + +Run: `./gradlew :plugins:nf-tower:test --tests 'io.seqera.tower.plugin.fs.SeqeraPathTest' -i` +Expected: these new tests fail (methods `trail`, `version`, `datasetName` don't behave right yet, or `depth()` returns 4 for data-link paths because the current parse truncates). + +- [ ] **Step 3: Rewrite `SeqeraPath` to use generic trail segments** + +Replace `SeqeraPath.groovy` entirely with: + +```groovy +/* + * Copyright 2013-2026, Seqera Labs + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package io.seqera.tower.plugin.fs + +import java.nio.file.FileSystem +import java.nio.file.InvalidPathException +import java.nio.file.LinkOption +import java.nio.file.Path +import java.nio.file.ProviderMismatchException +import java.nio.file.WatchEvent +import java.nio.file.WatchKey +import java.nio.file.WatchService + +import groovy.transform.CompileStatic + +/** + * {@link Path} implementation for the {@code seqera://} scheme. + * + * Path shape: + *
+ *   seqera://                          depth 0 — root
+ *   seqera://<org>                     depth 1
+ *   seqera://<org>/<ws>                depth 2
+ *   seqera://<org>/<ws>/<type>         depth 3 — resource type
+ *   seqera://<org>/<ws>/<type>/...     depth 4+ — handler-owned trail
+ *
+ * + * The generic class is resource-type-agnostic for depth ≥ 3: segments after + * {@code resourceType} are exposed as {@link #getTrail()} for the matching + * {@link ResourceTypeHandler} to interpret. + * + * The dataset convention (single trail segment, optional {@code @version} suffix) + * is preserved via {@link #getDatasetName()} and {@link #getVersion()} accessors. + */ +@CompileStatic +class SeqeraPath implements Path { + + public static final String SCHEME = 'seqera' + public static final String PROTOCOL = "${SCHEME}://" + public static final String SEPARATOR = '/' + + private final SeqeraFileSystem fs + private final String org + private final String workspace + private final String resourceType + private final List trail + private final String version + private final String relPath + + SeqeraPath(SeqeraFileSystem fs, String uriString) { + this.fs = fs + this.relPath = null + if (!uriString.startsWith(PROTOCOL)) + throw new InvalidPathException(uriString, "Not a seqera:// URI") + final withoutScheme = uriString.substring(PROTOCOL.length()) + final parts = withoutScheme.split('/', -1).toList().findAll { String s -> s != null } as List + this.org = parts.size() > 0 && parts[0] ? parts[0] : null + this.workspace = parts.size() > 1 && parts[1] ? parts[1] : null + this.resourceType = parts.size() > 2 && parts[2] ? parts[2] : null + final List tail = parts.size() > 3 ? new ArrayList(parts.subList(3, parts.size())) : new ArrayList() + // For datasets: strip "@version" from the last trail segment if present. + if (this.resourceType == 'datasets' && tail.size() == 1) { + final last = tail[0] + final atIdx = last.lastIndexOf('@') + if (atIdx > 0) { + tail[0] = last.substring(0, atIdx) + this.version = last.substring(atIdx + 1) + } else { + this.version = null + } + } else { + this.version = null + } + this.trail = Collections.unmodifiableList(tail) + validatePath(uriString) + } + + /** Programmatic absolute-path constructor. 
*/ + SeqeraPath(SeqeraFileSystem fs, String org, String workspace, String resourceType, List trail, String version) { + this.fs = fs + this.relPath = null + this.org = org + this.workspace = workspace + this.resourceType = resourceType + this.trail = trail != null ? Collections.unmodifiableList(new ArrayList(trail)) : Collections.emptyList() + this.version = version + validatePath(null) + } + + /** Relative path, produced only by {@link #relativize(Path)}. */ + SeqeraPath(String relPath) { + this.fs = null + this.relPath = relPath ?: '' + this.org = null + this.workspace = null + this.resourceType = null + this.trail = Collections.emptyList() + this.version = null + } + + private void validatePath(String original) { + final label = original ?: rawPath() + if (trail && !resourceType) + throw new InvalidPathException(label, "Trail segments require a resource-type segment") + if (resourceType && !workspace) + throw new InvalidPathException(label, "Resource type requires a workspace segment") + if (workspace && !org) + throw new InvalidPathException(label, "Workspace requires an org segment") + if (org?.contains('/')) + throw new InvalidPathException(label, "Org name cannot contain '/'") + if (workspace?.contains('/')) + throw new InvalidPathException(label, "Workspace name cannot contain '/'") + if (resourceType?.contains('/')) + throw new InvalidPathException(label, "Resource type cannot contain '/'") + for (String t : trail) { + if (t == null || t.isEmpty()) + throw new InvalidPathException(label, "Path segments cannot be empty") + if (t.contains('/')) + throw new InvalidPathException(label, "Path segments cannot contain '/'") + } + // Datasets accept at most one trail segment + if (resourceType == 'datasets' && trail.size() > 1) + throw new InvalidPathException(label, "Dataset paths cannot have sub-paths beyond the dataset name") + } + + private String rawPath() { + final sb = new StringBuilder(PROTOCOL) + if (org) sb.append(org) + if (workspace) 
sb.append('/').append(workspace) + if (resourceType) sb.append('/').append(resourceType) + for (int i = 0; i < trail.size(); i++) { + sb.append('/') + if (i == trail.size() - 1 && version) + sb.append(trail[i]).append('@').append(version) + else + sb.append(trail[i]) + } + return sb.toString() + } + + private List nameComponents() { + if (isAbsolute()) { + final d = depth() + final out = new ArrayList(d) + for (int i = 0; i < d; i++) + out.add(getName(i).toString()) + return out + } + if (!relPath) return Collections.emptyList() + return relPath.split('/').toList().findAll { String s -> s } as List + } + + // ---- accessors ---- + + String getOrg() { org } + String getWorkspace() { workspace } + String getResourceType() { resourceType } + List getTrail() { trail } + String getVersion() { version } + + /** Backwards-compat: dataset name is the single trail segment when resourceType=='datasets'. */ + String getDatasetName() { + (resourceType == 'datasets' && trail.size() == 1) ? trail[0] : null + } + + int depth() { + if (resourceType) return 3 + trail.size() + if (workspace) return 2 + if (org) return 1 + return 0 + } + + boolean isDirectory() { + // Dataset leaf at depth 4 is a file; all other shapes are directory-by-default. + // Handlers override this interpretation for data-link sub-paths via readAttributes. + !(resourceType == 'datasets' && trail.size() == 1) + } + + boolean isRegularFile() { !isDirectory() } + + // ---- Path API ---- + + @Override FileSystem getFileSystem() { fs } + @Override boolean isAbsolute() { fs != null } + + @Override + Path getRoot() { new SeqeraPath(fs, null, null, null, null, null) } + + @Override + Path getFileName() { + final d = depth() + if (d == 0) return null + if (d >= 4) { + final last = trail[trail.size() - 1] + return new SeqeraPath((d == 4 && version) ? 
"${last}@${version}" as String : last) + } + if (d == 3) return new SeqeraPath(resourceType) + if (d == 2) return new SeqeraPath(workspace) + return new SeqeraPath(org) + } + + @Override + Path getParent() { + final d = depth() + if (d == 0) return null + if (d == 1) return new SeqeraPath(fs, null, null, null, null, null) + if (d == 2) return new SeqeraPath(fs, org, null, null, null, null) + if (d == 3) return new SeqeraPath(fs, org, workspace, null, null, null) + // d >= 4: drop last trail segment + final newTrail = trail.subList(0, trail.size() - 1) + return new SeqeraPath(fs, org, workspace, resourceType, newTrail, null) + } + + @Override int getNameCount() { depth() } + + @Override + Path getName(int index) { + final d = depth() + if (index < 0 || index >= d) + throw new IllegalArgumentException("Index out of range: $index") + if (index == 0) return new SeqeraPath(org) + if (index == 1) return new SeqeraPath(workspace) + if (index == 2) return new SeqeraPath(resourceType) + final trailIdx = index - 3 + final seg = trail[trailIdx] + // Only the last segment of a depth-4 dataset path carries the version suffix + if (trailIdx == trail.size() - 1 && version && resourceType == 'datasets') + return new SeqeraPath("${seg}@${version}" as String) + return new SeqeraPath(seg) + } + + @Override + Path subpath(int beginIndex, int endIndex) { + throw new UnsupportedOperationException("subpath not supported by seqera:// paths") + } + + @Override + boolean startsWith(Path other) { + if (other !instanceof SeqeraPath) return false + final that = (SeqeraPath) other + if (this.isAbsolute() != that.isAbsolute()) return false + final mine = nameComponents() + final theirs = that.nameComponents() + if (theirs.size() > mine.size()) return false + for (int i = 0; i < theirs.size(); i++) + if (mine[i] != theirs[i]) return false + return true + } + + @Override + boolean startsWith(String other) { + if (!other) return false + try { + final p = SeqeraPath.isSeqeraUri(other) ? 
new SeqeraPath(fs, other) : new SeqeraPath(other) + return startsWith(p) + } catch (Exception ignored) { return false } + } + + @Override + boolean endsWith(Path other) { + if (other !instanceof SeqeraPath) return false + final that = (SeqeraPath) other + if (that.isAbsolute()) return this.equals(that) + final mine = nameComponents() + final theirs = that.nameComponents() + if (theirs.isEmpty() || theirs.size() > mine.size()) return false + final offset = mine.size() - theirs.size() + for (int i = 0; i < theirs.size(); i++) + if (mine[offset + i] != theirs[i]) return false + return true + } + + @Override + boolean endsWith(String other) { + if (!other) return false + try { + final p = SeqeraPath.isSeqeraUri(other) ? new SeqeraPath(fs, other) : new SeqeraPath(other) + return endsWith(p) + } catch (Exception ignored) { return false } + } + + @Override Path normalize() { this } + + @Override + Path resolve(Path other) { + if (other instanceof SeqeraPath) { + final that = (SeqeraPath) other + if (that.isAbsolute()) return that + return resolve(that.relPath) + } + return resolve(other.toString()) + } + + @Override + Path resolve(String segment) { + if (!segment) return this + if (segment.startsWith(PROTOCOL)) + return new SeqeraPath(fs, segment) + final stripped = segment.startsWith(SEPARATOR) ? 
segment.substring(1) : segment + if (!stripped) return this + final segs = stripped.split(SEPARATOR, -1).findAll { String s -> s } as List + SeqeraPath result = this + for (String seg : segs) result = result.resolveOne(seg) + return result + } + + private SeqeraPath resolveOne(String seg) { + final d = depth() + if (d == 0) return new SeqeraPath(fs, seg, null, null, null, null) + if (d == 1) return new SeqeraPath(fs, org, seg, null, null, null) + if (d == 2) return new SeqeraPath(fs, org, workspace, seg, null, null) + // d >= 3: append to trail (with @version parsing only for dataset-shaped paths at depth 3→4) + if (d == 3 && resourceType == 'datasets') { + final atIdx = seg.lastIndexOf('@') + if (atIdx > 0) + return new SeqeraPath(fs, org, workspace, resourceType, [seg.substring(0, atIdx)], seg.substring(atIdx + 1)) + } + final newTrail = new ArrayList(trail) + newTrail.add(seg) + return new SeqeraPath(fs, org, workspace, resourceType, newTrail, null) + } + + @Override + Path resolveSibling(Path other) { + final parent = getParent() + return parent != null ? parent.resolve(other) : other + } + + @Override + Path resolveSibling(String other) { + final parent = getParent() + return parent != null ? 
parent.resolve(other) : new SeqeraPath(fs, other) + } + + @Override + Path relativize(Path other) { + if (other !instanceof SeqeraPath) throw new ProviderMismatchException() + final that = (SeqeraPath) other + if (!this.isAbsolute() || !that.isAbsolute()) + throw new IllegalArgumentException("Both paths must be absolute to relativize: ${this} vs ${other}") + final mine = this.nameComponents() + final theirs = that.nameComponents() + int common = 0 + while (common < mine.size() && common < theirs.size() && mine[common] == theirs[common]) common++ + final parts = new ArrayList() + for (int i = common; i < mine.size(); i++) parts.add('..') + for (int i = common; i < theirs.size(); i++) parts.add(theirs[i]) + return new SeqeraPath(parts.join(SEPARATOR)) + } + + @Override + URI toUri() { + String uriPath = null + if (workspace) { + final segments = [workspace] + if (resourceType) segments.add(resourceType) + for (int i = 0; i < trail.size(); i++) { + final t = trail[i] + if (i == trail.size() - 1 && version && resourceType == 'datasets') + segments.add("${t}@${version}" as String) + else + segments.add(t) + } + uriPath = '/' + segments.join('/') + } + return new URI(SCHEME, org ?: '', uriPath, null, null) + } + + @Override + String toString() { + if (!isAbsolute()) return relPath + if (depth() == 0) return PROTOCOL + return toUri().toString() + } + + @Override + Path toAbsolutePath() { + if (!isAbsolute()) + throw new IllegalStateException("Cannot convert relative SeqeraPath to absolute — no default directory context") + return this + } + + @Override Path toRealPath(LinkOption... options) { this } + + @Override + File toFile() { throw new UnsupportedOperationException("toFile() not supported for seqera:// paths") } + + @Override + WatchKey register(WatchService w, WatchEvent.Kind[] e, WatchEvent.Modifier... 
m) { + throw new UnsupportedOperationException("WatchService not supported by seqera:// paths") + } + + @Override + WatchKey register(WatchService w, WatchEvent.Kind... e) { + throw new UnsupportedOperationException("WatchService not supported by seqera:// paths") + } + + @Override + Iterator iterator() { + final d = depth() + final out = new ArrayList(d) + for (int i = 0; i < d; i++) out.add(getName(i)) + return out.iterator() + } + + @Override int compareTo(Path other) { toString().compareTo(other.toString()) } + + @Override + boolean equals(Object obj) { + if (obj == this) return true + if (obj !instanceof SeqeraPath) return false + return toString() == obj.toString() + } + + @Override int hashCode() { toString().hashCode() } + + static URI asUri(String path) { + if (!path) throw new IllegalArgumentException("Missing 'path' argument") + if (!path.startsWith(PROTOCOL)) + throw new IllegalArgumentException("Invalid Seqera file system path URI - it must start with '${PROTOCOL}' prefix - offending value: $path") + if (path.startsWith(PROTOCOL + SEPARATOR) && path.length() > PROTOCOL.length() + 1) + throw new IllegalArgumentException("Invalid Seqera file system path URI - make sure the scheme prefix does not contain more than two slash characters or a query in the root '/' - offending value: $path") + if (path.startsWith(PROTOCOL + './')) + path = PROTOCOL + path.substring(PROTOCOL.length() + 2) + if (path == PROTOCOL || path == PROTOCOL + '.') + return new URI(PROTOCOL + '/') + return new URI(path) + } + + static boolean isSeqeraUri(String path) { + return path && path.startsWith(PROTOCOL) + } +} +``` + +- [ ] **Step 4: Run the `SeqeraPathTest` — new and existing cases must pass** + +Run: `./gradlew :plugins:nf-tower:test --tests 'io.seqera.tower.plugin.fs.SeqeraPathTest' -i` +Expected: all tests pass. If any existing test fails, it signals a refactor regression — fix before continuing. 
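The `relativize` implementation in the rewritten class is a plain common-prefix walk: skip shared leading segments, emit one `..` per remaining base segment, then append the target's remainder. A standalone Java sketch of the same algorithm (hypothetical class name, not the plugin code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Same common-prefix algorithm used by SeqeraPath.relativize, reduced to
// plain segment lists so it can be exercised in isolation.
class RelativizeDemo {

    static String relativize(List<String> base, List<String> target) {
        int common = 0;
        while (common < base.size() && common < target.size()
                && base.get(common).equals(target.get(common)))
            common++;
        List<String> parts = new ArrayList<>();
        for (int i = common; i < base.size(); i++) parts.add("..");
        parts.addAll(target.subList(common, target.size()));
        return String.join("/", parts);
    }

    public static void main(String[] args) {
        List<String> base = Arrays.asList("acme", "research", "data-links", "aws", "inputs");
        List<String> target = Arrays.asList("acme", "research", "data-links", "aws", "outputs", "x.txt");
        System.out.println(relativize(base, target)); // ../outputs/x.txt
    }
}
```

Note the plugin method additionally requires both paths to be absolute before comparing segments; this sketch assumes that precondition has already been checked.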
+ +- [ ] **Step 5: Commit** + +```bash +git add plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraPath.groovy \ + plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraPathTest.groovy +git commit -s -m "refactor(nf-tower): generalize SeqeraPath with trail segments for multi-resource support" +``` + +### T004 — Extract `DatasetsResourceHandler` + +**Files:** +- Create: `plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandler.groovy` +- Create: `plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandlerTest.groovy` + +- [ ] **Step 1: Write the handler — move logic from current `SeqeraFileSystemProvider`** + +The source of the logic is the current `newInputStream`, `readAttributes`, `newDirectoryStream` (depth 3 only), and `checkAccess` branches in `SeqeraFileSystemProvider.groovy` that handle `datasets`. Preserve behavior byte-for-byte. + +```groovy +/* + * Copyright 2013-2026, Seqera Labs + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package io.seqera.tower.plugin.fs.handler + +import java.nio.file.AccessDeniedException +import java.nio.file.AccessMode +import java.nio.file.NoSuchFileException +import java.nio.file.Path +import java.time.Instant + +import groovy.transform.CompileStatic +import groovy.util.logging.Slf4j +import io.seqera.tower.model.DatasetDto +import io.seqera.tower.model.DatasetVersionDto +import io.seqera.tower.plugin.dataset.SeqeraDatasetClient +import io.seqera.tower.plugin.fs.ResourceTypeHandler +import io.seqera.tower.plugin.fs.SeqeraFileAttributes +import io.seqera.tower.plugin.fs.SeqeraFileSystem +import io.seqera.tower.plugin.fs.SeqeraPath + +/** + * {@link ResourceTypeHandler} for {@code datasets} resource type. + * All logic previously inlined in {@link SeqeraFileSystemProvider} for dataset paths lives here. + */ +@Slf4j +@CompileStatic +class DatasetsResourceHandler implements ResourceTypeHandler { + + public static final String TYPE = 'datasets' + + private final SeqeraFileSystem fs + private final SeqeraDatasetClient client + + DatasetsResourceHandler(SeqeraFileSystem fs, SeqeraDatasetClient client) { + this.fs = fs + this.client = client + } + + @Override + String getResourceType() { TYPE } + + @Override + List list(SeqeraPath dir) throws IOException { + final d = dir.depth() + if (d == 3) { + final workspaceId = fs.resolveWorkspaceId(dir.org, dir.workspace) + final datasets = fs.resolveDatasets(workspaceId) + return datasets.collect { DatasetDto ds -> dir.resolve(ds.name) as Path } + } + throw new IllegalArgumentException("datasets handler cannot list depth $d paths: $dir") + } + + @Override + SeqeraFileAttributes readAttributes(SeqeraPath p) throws IOException { + final d = p.depth() + if (d == 3) { + // resource-type dir — validate workspace + fs.resolveWorkspaceId(p.org, p.workspace) + return new SeqeraFileAttributes(true) + } + if (d != 4) + throw new NoSuchFileException(p.toString(), null, "Invalid dataset path depth: $d") + final workspaceId = 
fs.resolveWorkspaceId(p.org, p.workspace) + final dataset = fs.resolveDataset(workspaceId, p.datasetName) + if (!dataset) + throw new NoSuchFileException(p.toString(), null, "Dataset '${p.datasetName}' not found in workspace ${p.workspace}") + return new SeqeraFileAttributes( + 0L, + dataset.lastUpdated?.toInstant() ?: Instant.EPOCH, + dataset.dateCreated?.toInstant() ?: Instant.EPOCH, + dataset.id ) + } + + @Override + InputStream newInputStream(SeqeraPath p) throws IOException { + if (p.depth() != 4) + throw new IllegalArgumentException("Operation `newInputStream` requires a dataset path (depth 4): $p") + final workspaceId = fs.resolveWorkspaceId(p.org, p.workspace) + final dataset = fs.resolveDataset(workspaceId, p.datasetName) + if (!dataset) + throw new NoSuchFileException(p.toString(), null, "Dataset '${p.datasetName}' not found in workspace ${p.workspace}") + final version = resolveVersion(dataset, p) + log.debug "Downloading dataset '${p.datasetName}' version ${version.version} (${version.fileName}) from workspace $workspaceId" + return client.downloadDataset(dataset.id, String.valueOf(version.version), version.fileName, dataset.workspaceId) + } + + @Override + void checkAccess(SeqeraPath p, AccessMode... 
modes) throws IOException { + for (AccessMode m : modes) { + if (m == AccessMode.WRITE || m == AccessMode.EXECUTE) + throw new AccessDeniedException(p.toString(), null, "seqera:// datasets are read-only") + } + // READ: make sure the dataset resolves + readAttributes(p) + } + + private DatasetVersionDto resolveVersion(DatasetDto dataset, SeqeraPath p) throws IOException { + final pinned = p.version + final versions = fs.resolveVersions(dataset.id, dataset.workspaceId) + if (versions.isEmpty()) + throw new NoSuchFileException(p.toString(), null, "No versions available for dataset '${dataset.name}'") + if (pinned) { + final found = versions.find { DatasetVersionDto v -> String.valueOf(v.version) == pinned } + if (!found) + throw new NoSuchFileException(p.toString(), null, "Version '$pinned' not found for dataset '${dataset.name}'") + return found + } + final latest = versions.findAll { DatasetVersionDto v -> !v.disabled }.max { DatasetVersionDto v -> v.version } + if (!latest) + throw new NoSuchFileException(p.toString(), null, "No enabled versions for dataset '${dataset.name}'") + return latest + } +} +``` + +- [ ] **Step 2: Write Spock tests for the handler** + +```groovy +/* + * Copyright 2013-2026, Seqera Labs + * (Apache 2.0) + */ +package io.seqera.tower.plugin.fs.handler + +import java.nio.file.AccessMode +import java.nio.file.NoSuchFileException + +import io.seqera.tower.model.DatasetDto +import io.seqera.tower.model.DatasetVersionDto +import io.seqera.tower.plugin.dataset.SeqeraDatasetClient +import io.seqera.tower.plugin.fs.SeqeraFileSystem +import io.seqera.tower.plugin.fs.SeqeraPath +import spock.lang.Specification + +class DatasetsResourceHandlerTest extends Specification { + + def fs = Mock(SeqeraFileSystem) + def client = Mock(SeqeraDatasetClient) + def handler = new DatasetsResourceHandler(fs, client) + + def "getResourceType returns 'datasets'"() { + expect: + handler.resourceType == 'datasets' + } + + def "list at depth 3 returns one path per 
dataset"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets') + def ds1 = new DatasetDto(id: 'd1', name: 'one') + def ds2 = new DatasetDto(id: 'd2', name: 'two') + + when: + def paths = handler.list(path) + + then: + 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L + 1 * fs.resolveDatasets(10L) >> [ds1, ds2] + paths*.toString() == ['seqera://acme/research/datasets/one', 'seqera://acme/research/datasets/two'] + } + + def "newInputStream resolves latest non-disabled version when no pin"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples') + def ds = new DatasetDto(id: 'd1', name: 'samples', workspaceId: 10L) + def v1 = new DatasetVersionDto(datasetId: 'd1', version: 1L, fileName: 'a.csv', disabled: false) + def v2 = new DatasetVersionDto(datasetId: 'd1', version: 2L, fileName: 'b.csv', disabled: false) + def content = new ByteArrayInputStream('x'.bytes) + + when: + def stream = handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L + 1 * fs.resolveDataset(10L, 'samples') >> ds + 1 * fs.resolveVersions('d1', 10L) >> [v1, v2] + 1 * client.downloadDataset('d1', '2', 'b.csv', 10L) >> content + stream === content + } + + def "newInputStream honors @version pin"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples@1') + def ds = new DatasetDto(id: 'd1', name: 'samples', workspaceId: 10L) + def v1 = new DatasetVersionDto(datasetId: 'd1', version: 1L, fileName: 'a.csv', disabled: false) + def v2 = new DatasetVersionDto(datasetId: 'd1', version: 2L, fileName: 'b.csv', disabled: false) + + when: + handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * fs.resolveDataset(10L, 'samples') >> ds + 1 * fs.resolveVersions('d1', 10L) >> [v1, v2] + 1 * client.downloadDataset('d1', '1', 'a.csv', 10L) >> new ByteArrayInputStream('x'.bytes) + } + + def "newInputStream throws NoSuchFileException when dataset is missing"() { 
+ given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/ghost') + + when: + handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L + 1 * fs.resolveDataset(10L, 'ghost') >> null + thrown(NoSuchFileException) + } + + def "checkAccess rejects WRITE"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/datasets/samples') + + when: + handler.checkAccess(path, AccessMode.WRITE) + + then: + thrown(java.nio.file.AccessDeniedException) + } +} +``` + +- [ ] **Step 3: Run the handler tests** + +Run: `./gradlew :plugins:nf-tower:test --tests 'io.seqera.tower.plugin.fs.handler.DatasetsResourceHandlerTest' -i` +Expected: all pass (after T005–T006 make the provider compile; until then the top-level compile failure blocks this — continue to T005). + +- [ ] **Step 4: Commit** + +```bash +git add plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandler.groovy \ + plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandlerTest.groovy +git commit -s -m "refactor(nf-tower): extract DatasetsResourceHandler from SeqeraFileSystemProvider" +``` + +### T005 — Add handler registry to `SeqeraFileSystem` + +**Files:** +- Modify: `plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystem.groovy` + +- [ ] **Step 1: Add the registry field and accessors** + +Insert after the existing cache field declarations (e.g. 
after the `versionCache` line): + +```groovy + private final Map handlers = new LinkedHashMap<>() +``` + +Then insert immediately before the closing `}` of the class: + +```groovy + // ---- handler registry ---- + + synchronized void registerHandler(ResourceTypeHandler handler) { + handlers.put(handler.resourceType, handler) + } + + synchronized ResourceTypeHandler getHandler(String resourceType) { + handlers.get(resourceType) + } + + synchronized Set getResourceTypes() { + Collections.unmodifiableSet(new LinkedHashSet(handlers.keySet())) + } +``` + +- [ ] **Step 2: Compile** + +Run: `./gradlew :plugins:nf-tower:compileGroovy` +Expected: compile errors in `SeqeraFileSystemProvider.groovy` remain; no new errors from `SeqeraFileSystem`. + +- [ ] **Step 3: Commit** + +```bash +git add plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystem.groovy +git commit -s -m "refactor(nf-tower): add ResourceTypeHandler registry to SeqeraFileSystem" +``` + +### T006 — Refactor `SeqeraFileSystemProvider` to dispatch via handlers + +**Files:** +- Modify: `plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy` + +- [ ] **Step 1: Replace the provider body** + +Replace the class body with the following (the class shell, package, imports are adjusted): + +```groovy +/* + * Copyright 2013-2026, Seqera Labs + * (Apache 2.0 header unchanged) + */ + +package io.seqera.tower.plugin.fs + +import java.nio.channels.SeekableByteChannel +import java.nio.file.AccessDeniedException +import java.nio.file.AccessMode +import java.nio.file.CopyOption +import java.nio.file.DirectoryIteratorException +import java.nio.file.DirectoryStream +import java.nio.file.FileStore +import java.nio.file.FileSystem +import java.nio.file.FileSystemAlreadyExistsException +import java.nio.file.FileSystemNotFoundException +import java.nio.file.Files +import java.nio.file.LinkOption +import java.nio.file.NoSuchFileException +import java.nio.file.NotDirectoryException +import 
java.nio.file.OpenOption +import java.nio.file.Path +import java.nio.file.ProviderMismatchException +import java.nio.file.StandardOpenOption +import java.nio.file.attribute.BasicFileAttributes +import java.nio.file.attribute.FileAttribute +import java.nio.file.attribute.FileAttributeView +import java.nio.file.spi.FileSystemProvider + +import groovy.transform.CompileStatic +import groovy.util.logging.Slf4j +import io.seqera.tower.plugin.TowerClient +import io.seqera.tower.plugin.TowerFactory +import io.seqera.tower.plugin.dataset.SeqeraDatasetClient +import io.seqera.tower.plugin.datalink.SeqeraDataLinkClient +import io.seqera.tower.plugin.fs.handler.DatasetsResourceHandler +import io.seqera.tower.plugin.fs.handler.DataLinksResourceHandler + +@Slf4j +@CompileStatic +class SeqeraFileSystemProvider extends FileSystemProvider { + + public static final String SCHEME = 'seqera' + + private volatile SeqeraFileSystem fileSystem + + @Override String getScheme() { SCHEME } + + @Override + synchronized FileSystem newFileSystem(URI uri, Map env) throws IOException { + checkScheme(uri) + if (fileSystem) + throw new FileSystemAlreadyExistsException("File system `seqera://` already exists") + final TowerClient tc = TowerFactory.client() + if (!tc) + throw new IllegalStateException("File system `seqera://` requires the Seqera Platform access token — use `tower.accessToken` or TOWER_ACCESS_TOKEN") + final datasetClient = new SeqeraDatasetClient(tc) + fileSystem = new SeqeraFileSystem(this, datasetClient) + fileSystem.registerHandler(new DatasetsResourceHandler(fileSystem, datasetClient)) + fileSystem.registerHandler(new DataLinksResourceHandler(fileSystem, new SeqeraDataLinkClient(tc))) + return fileSystem + } + + @Override + synchronized FileSystem getFileSystem(URI uri) { + checkScheme(uri) + if (!fileSystem) throw new FileSystemNotFoundException("No seqera:// filesystem has been created yet") + return fileSystem + } + + synchronized SeqeraFileSystem getOrCreateFileSystem(URI 
uri, Map env) { + checkScheme(uri) + if (!fileSystem) newFileSystem(uri, env ?: Collections.emptyMap()) + return fileSystem + } + + @Override + SeqeraPath getPath(URI uri) { + final fs = getOrCreateFileSystem(uri, Collections.emptyMap()) + return new SeqeraPath(fs, uri.toString()) + } + + // ---- read ---- + + @Override + InputStream newInputStream(Path path, OpenOption... options) throws IOException { + final sp = toSeqeraPath(path) + if (sp.depth() < 3) + throw new IllegalArgumentException("newInputStream requires a leaf path: $path") + final fs = sp.getFileSystem() as SeqeraFileSystem + final h = fs.getHandler(sp.resourceType) + if (!h) throw new NoSuchFileException(path.toString(), null, "Unsupported resource type: ${sp.resourceType}") + return h.newInputStream(sp) + } + + @Override + SeekableByteChannel newByteChannel(Path path, Set options, FileAttribute... attrs) throws IOException { + if (options?.contains(StandardOpenOption.WRITE) || options?.contains(StandardOpenOption.APPEND)) + throw new UnsupportedOperationException("seqera:// filesystem is read-only") + return new DatasetInputStream(newInputStream(path)) + } + + // ---- attributes ---- + + @Override +
<A extends BasicFileAttributes> A readAttributes(Path path, Class<A> type, LinkOption... options) throws IOException {
+        if (!BasicFileAttributes.isAssignableFrom(type))
+            throw new UnsupportedOperationException("Attribute type not supported: $type")
+        final sp = toSeqeraPath(path)
+        final fs = sp.getFileSystem() as SeqeraFileSystem
+        final d = sp.depth()
+        if (d < 3) {
+            validateSharedDirectoryExists(fs, sp)
+            return (A) new SeqeraFileAttributes(true)
+        }
+        final h = fs.getHandler(sp.resourceType)
+        if (!h) throw new NoSuchFileException(path.toString(), null, "Unsupported resource type: ${sp.resourceType}")
+        return (A) h.readAttributes(sp)
+    }
+
+    @Override
+    Map<String, Object> readAttributes(Path path, String attributes, LinkOption... options) throws IOException {
+        throw new UnsupportedOperationException("readAttributes(String) not supported by `seqera://` filesystem")
+    }
+
+    // ---- access ----
+
+    @Override
+    void checkAccess(Path path, AccessMode... modes) throws IOException {
+        final sp = toSeqeraPath(path)
+        for (AccessMode m : modes) {
+            if (m == AccessMode.WRITE || m == AccessMode.EXECUTE)
+                throw new AccessDeniedException(path.toString(), null, "seqera:// filesystem is read-only")
+        }
+        final d = sp.depth()
+        if (d == 0) return
+        if (d < 3) {
+            validateSharedDirectoryExists(sp.getFileSystem() as SeqeraFileSystem, sp)
+            return
+        }
+        final fs = sp.getFileSystem() as SeqeraFileSystem
+        final h = fs.getHandler(sp.resourceType)
+        if (!h) throw new NoSuchFileException(path.toString(), null, "Unsupported resource type: ${sp.resourceType}")
+        h.checkAccess(sp, modes)
+    }
+
+    // ---- directory stream ----
+
+    @Override
+    DirectoryStream<Path> newDirectoryStream(Path dir, DirectoryStream.Filter<? super Path> filter) throws IOException {
+        final sp = toSeqeraPath(dir)
+        final fs = sp.getFileSystem() as SeqeraFileSystem
+        final d = sp.depth()
+        List<Path> entries
+        if (d == 0) {
+            fs.loadOrgWorkspaceCache()
+            entries = fs.listOrgNames().collect { String org -> sp.resolve(org) as Path }
+        } else if (d == 1) {
+            fs.loadOrgWorkspaceCache()
+            entries = fs.listWorkspaceNames(sp.org).collect { String ws -> sp.resolve(ws) as Path }
+        } else if (d == 2) {
+            fs.resolveWorkspaceId(sp.org, sp.workspace) // validates existence
+            entries = fs.getResourceTypes().collect { String rt -> sp.resolve(rt) as Path }
+        } else {
+            final h = fs.getHandler(sp.resourceType)
+            if (!h) throw new NoSuchFileException(dir.toString(), null, "Unsupported resource type: ${sp.resourceType}")
+            entries = h.list(sp)
+        }
+
+        final filtered = filter ? entries.findAll { Path p ->
+            try { filter.accept(p) }
+            catch (IOException e) { throw new DirectoryIteratorException(e) }
+        } : entries
+
+        return new DirectoryStream<Path>() {
+            @Override Iterator<Path> iterator() { filtered.iterator() }
+            @Override void close() {}
+        }
+    }
+
+    // ---- copy ----
+
+    @Override
+    void copy(Path source, Path target, CopyOption... options) throws IOException {
+        toSeqeraPath(source)
+        if (target instanceof SeqeraPath)
+            throw new UnsupportedOperationException("seqera:// filesystem is read-only")
+        try (final InputStream is = newInputStream(source)) {
+            Files.copy(is, target, options)
+        }
+    }
+
+    // ---- unsupported mutations ----
+
+    @Override void move(Path s, Path t, CopyOption... o) { throw new UnsupportedOperationException("move() not supported") }
+    @Override void delete(Path p) { throw new UnsupportedOperationException("delete() not supported") }
+    @Override void createDirectory(Path d, FileAttribute<?>... a) { throw new UnsupportedOperationException("createDirectory() not supported") }
+    @Override boolean isSameFile(Path a, Path b) { a == b }
+    @Override boolean isHidden(Path p) { false }
+    @Override FileStore getFileStore(Path p) { throw new UnsupportedOperationException("getFileStore() not supported") }
+    @Override <V extends FileAttributeView> V getFileAttributeView(Path p, Class<V> t, LinkOption... o) { null }
+    @Override void setAttribute(Path p, String a, Object v, LinkOption... o) { throw new UnsupportedOperationException("setAttribute() not supported") }
+
+    // ---- helpers ----
+
+    private static SeqeraPath toSeqeraPath(Path path) {
+        if (path !instanceof SeqeraPath) throw new ProviderMismatchException()
+        return (SeqeraPath) path
+    }
+
+    private static void checkScheme(URI uri) {
+        if (uri.scheme?.toLowerCase() != SCHEME)
+            throw new IllegalArgumentException("Not a seqera:// URI: $uri")
+    }
+
+    private static void validateSharedDirectoryExists(SeqeraFileSystem fs, SeqeraPath sp) throws NoSuchFileException {
+        final d = sp.depth()
+        if (d == 0) return
+        fs.loadOrgWorkspaceCache()
+        if (d >= 1 && !fs.listOrgNames().contains(sp.org))
+            throw new NoSuchFileException("seqera://${sp.org}", null, "Organisation not found")
+        if (d >= 2) fs.resolveWorkspaceId(sp.org, sp.workspace)
+    }
+}
+```
+
+- [ ] **Step 2: Compile**
+
+Run: `./gradlew :plugins:nf-tower:compileGroovy`
+Expected: the only remaining compile failures are the two unresolved imports — `io.seqera.tower.plugin.datalink.SeqeraDataLinkClient` and `io.seqera.tower.plugin.fs.handler.DataLinksResourceHandler`. Stubs for both are added in the next step, and the real implementations land in T007 and T009; continue without committing yet.
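
For reference while wiring the dispatch: the `ResourceTypeHandler` contract that both handlers implement is assumed to have roughly this shape — inferred from the call sites in this plan (`getHandler(...)` returning an object exposing `list`, `readAttributes`, `newInputStream`, and `checkAccess`) — so the authoritative definition in the repository may differ in detail:

```groovy
package io.seqera.tower.plugin.fs

import java.nio.file.AccessMode
import java.nio.file.Path

/**
 * Contract implemented by each seqera:// resource type (datasets, data-links).
 * Sketch inferred from the handler implementations in this plan; the real
 * interface may declare additional members.
 */
interface ResourceTypeHandler {
    /** Path segment identifying the resource type, e.g. 'datasets' or 'data-links'. */
    String getResourceType()
    /** Child entries of a directory path under this resource type. */
    List<Path> list(SeqeraPath dir) throws IOException
    /** Basic file attributes for a path under this resource type. */
    SeqeraFileAttributes readAttributes(SeqeraPath path) throws IOException
    /** Read-only byte stream over a leaf file. */
    InputStream newInputStream(SeqeraPath path) throws IOException
    /** Validate the requested access modes; WRITE and EXECUTE must be denied. */
    void checkAccess(SeqeraPath path, AccessMode... modes) throws IOException
}
```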
+ +- [ ] **Step 3: Temporarily stub the missing imports so compile passes and existing tests can run** + +Create a minimal stub at `plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy`: + +```groovy +package io.seqera.tower.plugin.datalink + +import groovy.transform.CompileStatic +import io.seqera.tower.plugin.TowerClient + +@CompileStatic +class SeqeraDataLinkClient { + SeqeraDataLinkClient(TowerClient tc) {} +} +``` + +Create a minimal stub at `plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy`: + +```groovy +package io.seqera.tower.plugin.fs.handler + +import java.nio.file.AccessMode +import java.nio.file.Path +import groovy.transform.CompileStatic +import io.seqera.tower.plugin.datalink.SeqeraDataLinkClient +import io.seqera.tower.plugin.fs.ResourceTypeHandler +import io.seqera.tower.plugin.fs.SeqeraFileAttributes +import io.seqera.tower.plugin.fs.SeqeraFileSystem +import io.seqera.tower.plugin.fs.SeqeraPath + +@CompileStatic +class DataLinksResourceHandler implements ResourceTypeHandler { + DataLinksResourceHandler(SeqeraFileSystem fs, SeqeraDataLinkClient client) {} + @Override String getResourceType() { 'data-links' } + @Override List list(SeqeraPath dir) { throw new UnsupportedOperationException('stub') } + @Override SeqeraFileAttributes readAttributes(SeqeraPath path) { throw new UnsupportedOperationException('stub') } + @Override InputStream newInputStream(SeqeraPath path) { throw new UnsupportedOperationException('stub') } + @Override void checkAccess(SeqeraPath path, AccessMode... modes) { throw new UnsupportedOperationException('stub') } +} +``` + +- [ ] **Step 4: Compile and run the existing nf-tower tests** + +Run: `./gradlew :plugins:nf-tower:test -i` +Expected: all existing tests pass — including `SeqeraDatasetClientTest`, `SeqeraFileSystemTest`, `SeqeraPathTest`, `SeqeraFileSystemProviderTest`, `DatasetsResourceHandlerTest` from T004. 
Any failure is a refactor regression and must be fixed before committing. + +- [ ] **Step 5: Commit** + +```bash +git add plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystemProvider.groovy \ + plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy \ + plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy +git commit -s -m "refactor(nf-tower): SeqeraFileSystemProvider dispatches to ResourceTypeHandler; add data-link stubs" +``` + +**Checkpoint**: Refactor done. Dataset behavior is unchanged, routed through `DatasetsResourceHandler`. Data-link stubs exist but throw `UnsupportedOperationException`. + +--- + +## Phase 2: Data-Link API Client + +### T007 — Implement `SeqeraDataLinkClient` + +**Files:** +- Modify: `plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy` + +- [ ] **Step 1: Replace the stub with the real client** + +```groovy +/* + * Copyright 2013-2026, Seqera Labs + * (Apache 2.0) + */ + +package io.seqera.tower.plugin.datalink + +import java.nio.file.AccessDeniedException +import java.nio.file.NoSuchFileException + +import groovy.json.JsonSlurper +import groovy.transform.CompileStatic +import groovy.util.logging.Slf4j +import io.seqera.tower.model.DataLinkContentResponse +import io.seqera.tower.model.DataLinkDownloadUrlResponse +import io.seqera.tower.model.DataLinkDto +import io.seqera.tower.model.DataLinkItem +import io.seqera.tower.model.DataLinkItemType +import io.seqera.tower.model.DataLinkProvider +import io.seqera.tower.plugin.TowerClient +import io.seqera.tower.plugin.exception.ForbiddenException +import io.seqera.tower.plugin.exception.NotFoundException +import io.seqera.tower.plugin.exception.UnauthorizedException +import nextflow.exception.AbortOperationException + +/** + * Typed client for Seqera Platform data-link API endpoints. + * Delegates HTTP execution to {@link TowerClient#sendApiRequest}. 
+ */ +@Slf4j +@CompileStatic +class SeqeraDataLinkClient { + + private static final int LIST_PAGE_SIZE = 100 + + private final TowerClient towerClient + + SeqeraDataLinkClient(TowerClient tc) { this.towerClient = tc } + + private String getEndpoint() { towerClient.endpoint } + + /** + * GET /data-links?workspaceId={ws}&max={n}&offset={o} + * Exhausts pagination and returns all data-links in the workspace. + */ + List listDataLinks(long workspaceId) { + final out = new ArrayList() + int offset = 0 + while (true) { + final url = "${endpoint}/data-links?workspaceId=${workspaceId}&max=${LIST_PAGE_SIZE}&offset=${offset}" + log.debug "SeqeraDataLinkClient GET $url" + final resp = towerClient.sendApiRequest(url) + checkFsResponse(resp, url) + final json = new JsonSlurper().parseText(resp.message) as Map + final items = json.dataLinks as List + if (items) { + for (Map m : items) out.add(mapDataLink(m)) + offset += items.size() + } + final total = (json.totalSize as Long) ?: 0L + if (!items || offset >= total) break + } + return out + } + + /** + * GET /data-links/{id}/content?workspaceId={ws}&path={sub}&nextPageToken={tok} + * Works for directories and single files. Exhausts {@code nextPageToken}. + * Returns a synthesised {@link DataLinkContentResponse} with concatenated objects. 
+ */ + DataLinkContentResponse getContent(String dataLinkId, String subPath, long workspaceId) { + final out = new DataLinkContentResponse() + out.objects = new ArrayList() + String token = null + while (true) { + String url = "${endpoint}/data-links/${dataLinkId}/content?workspaceId=${workspaceId}" + if (subPath) url += "&path=${encode(subPath)}" + if (token) url += "&nextPageToken=${encode(token)}" + log.debug "SeqeraDataLinkClient GET $url" + final resp = towerClient.sendApiRequest(url) + checkFsResponse(resp, url) + final json = new JsonSlurper().parseText(resp.message) as Map + if (out.originalPath == null) out.originalPath = json.originalPath as String + final items = json.objects as List + if (items) for (Map m : items) out.objects.add(mapItem(m)) + token = json.nextPageToken as String + if (!token) break + } + return out + } + + /** GET /data-links/{id}/download?workspaceId={ws}&path={sub} */ + DataLinkDownloadUrlResponse getDownloadUrl(String dataLinkId, String subPath, long workspaceId) { + final url = "${endpoint}/data-links/${dataLinkId}/download?workspaceId=${workspaceId}&path=${encode(subPath ?: '')}" + log.debug "SeqeraDataLinkClient GET $url" + final resp = towerClient.sendApiRequest(url) + checkFsResponse(resp, url) + final json = new JsonSlurper().parseText(resp.message) as Map + final out = new DataLinkDownloadUrlResponse() + out.url = json.url as String + return out + } + + // ---- helpers ---- + + private static String encode(String s) { + new URI(null, null, s, null).rawPath + } + + private static void checkFsResponse(TowerClient.Response resp, String url) { + if (!resp.error) return + final code = resp.code + if (code == 401) + throw new AbortOperationException("Seqera authentication failed — check tower.accessToken or TOWER_ACCESS_TOKEN") + if (code == 403) + throw new AccessDeniedException(url, null, "Forbidden — check workspace permissions") + if (code == 404) + throw new NoSuchFileException(url) + throw new IOException("Seqera API error: 
HTTP ${code} for ${url}") + } + + private static DataLinkDto mapDataLink(Map m) { + final dto = new DataLinkDto() + dto.id = m.id as String + dto.name = m.name as String + dto.description = m.description as String + dto.resourceRef = m.resourceRef as String + if (m.provider) dto.provider = DataLinkProvider.fromValue(m.provider as String) + dto.region = m.region as String + return dto + } + + private static DataLinkItem mapItem(Map m) { + final it = new DataLinkItem() + it.name = m.name as String + if (m.type) it.type = DataLinkItemType.fromValue(m.type as String) + it.size = (m.size as Long) ?: 0L + it.mimeType = m.mimeType as String + return it + } +} +``` + +- [ ] **Step 2: Verify `DataLinkProvider.fromValue` and `DataLinkItemType.fromValue` exist** + +Run: `javap -p /home/jorgee/IdeaProjects/nextflow/plugins/nf-tower/build/target/libs/tower-api-1.121.0.jar | grep -A2 'class io.seqera.tower.model.DataLinkProvider' | head -20` — or just proceed. These `fromValue` methods are standard on generated Micronaut enums; if not present the compile step will tell us. + +- [ ] **Step 3: Compile** + +Run: `./gradlew :plugins:nf-tower:compileGroovy` +Expected: success. If `fromValue` is missing, fall back to `DataLinkProvider.values().find { it.toString() == m.provider }`. 
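
For orientation, the wire shape that `getContent` iterates over is assumed to look like the illustrative payload below — the field names are exactly the keys the client reads via `JsonSlurper`, while the values (and the `FOLDER` type label) are hypothetical. A non-empty `nextPageToken` triggers a follow-up request carrying that token; its absence terminates the loop:

```json
{
  "originalPath": "reads/",
  "objects": [
    { "name": "sample_1.fq", "type": "FILE",   "size": 123456, "mimeType": "application/gzip" },
    { "name": "subdir",      "type": "FOLDER", "size": 0 }
  ],
  "nextPageToken": "T2"
}
```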
+ +- [ ] **Step 4: Commit** + +```bash +git add plugins/nf-tower/src/main/io/seqera/tower/plugin/datalink/SeqeraDataLinkClient.groovy +git commit -s -m "feat(nf-tower): add SeqeraDataLinkClient with pagination and error mapping" +``` + +### T008 — Unit tests for `SeqeraDataLinkClient` + +**Files:** +- Create: `plugins/nf-tower/src/test/io/seqera/tower/plugin/datalink/SeqeraDataLinkClientTest.groovy` + +- [ ] **Step 1: Write the Spock spec** + +```groovy +/* + * Copyright 2013-2026, Seqera Labs + * (Apache 2.0) + */ +package io.seqera.tower.plugin.datalink + +import java.nio.file.AccessDeniedException +import java.nio.file.NoSuchFileException + +import groovy.json.JsonOutput +import io.seqera.tower.plugin.TowerClient +import nextflow.exception.AbortOperationException +import spock.lang.Specification + +class SeqeraDataLinkClientTest extends Specification { + + private static final String EP = 'https://api.example.com' + + private TowerClient tower() { + def tc = Spy(TowerClient) + tc.@endpoint = EP + return tc + } + + private static TowerClient.Response ok(String body) { new TowerClient.Response(200, body) } + private static TowerClient.Response err(int code) { new TowerClient.Response(code, "error $code") } + + // ---- listDataLinks ---- + + def "listDataLinks returns parsed DTOs for a single page"() { + given: + def body = JsonOutput.toJson([dataLinks: [ + [id: 'dl-1', name: 'inputs', provider: 'aws', resourceRef: 's3://bucket/'], + [id: 'dl-2', name: 'archive', provider: 'google', resourceRef: 'gs://bucket/'] + ], totalSize: 2]) + def tc = tower() + tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0") >> ok(body) + def client = new SeqeraDataLinkClient(tc) + + when: + def list = client.listDataLinks(10L) + + then: + list.size() == 2 + list[0].id == 'dl-1' + list[0].name == 'inputs' + list[1].provider.toString() == 'google' + } + + def "listDataLinks exhausts pagination"() { + given: + def page1 = JsonOutput.toJson([dataLinks: [[id: 'dl-1', name: 
'a', provider: 'aws']], totalSize: 3]) + def page2 = JsonOutput.toJson([dataLinks: [[id: 'dl-2', name: 'b', provider: 'aws']], totalSize: 3]) + def page3 = JsonOutput.toJson([dataLinks: [[id: 'dl-3', name: 'c', provider: 'aws']], totalSize: 3]) + def tc = tower() + tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=0") >> ok(page1) + tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=1") >> ok(page2) + tc.sendApiRequest("${EP}/data-links?workspaceId=10&max=100&offset=2") >> ok(page3) + def client = new SeqeraDataLinkClient(tc) + + when: + def list = client.listDataLinks(10L) + + then: + list*.id == ['dl-1', 'dl-2', 'dl-3'] + } + + // ---- getContent ---- + + def "getContent single page returns parsed items"() { + given: + def body = JsonOutput.toJson([ + originalPath: 'reads/', + objects: [ + [name: 'a.fq', type: 'FILE', size: 123, mimeType: 'application/gzip'], + [name: 'b.fq', type: 'FILE', size: 456, mimeType: 'application/gzip'] + ]]) + def tc = tower() + tc.sendApiRequest("${EP}/data-links/dl-1/content?workspaceId=10&path=reads/") >> ok(body) + def client = new SeqeraDataLinkClient(tc) + + when: + def resp = client.getContent('dl-1', 'reads/', 10L) + + then: + resp.objects.size() == 2 + resp.objects[0].name == 'a.fq' + resp.objects[0].size == 123L + resp.objects[0].type.toString() == 'FILE' + } + + def "getContent follows nextPageToken"() { + given: + def p1 = JsonOutput.toJson([originalPath: '', objects: [[name: 'a', type: 'FILE', size: 1]], nextPageToken: 'T2']) + def p2 = JsonOutput.toJson([originalPath: '', objects: [[name: 'b', type: 'FILE', size: 2]]]) + def tc = tower() + tc.sendApiRequest("${EP}/data-links/dl-1/content?workspaceId=10") >> ok(p1) + tc.sendApiRequest("${EP}/data-links/dl-1/content?workspaceId=10&nextPageToken=T2") >> ok(p2) + def client = new SeqeraDataLinkClient(tc) + + when: + def resp = client.getContent('dl-1', null, 10L) + + then: + resp.objects*.name == ['a', 'b'] + } + + // ---- getDownloadUrl ---- + 
+    def "getDownloadUrl returns the signed URL"() {
+        given:
+        def tc = tower()
+        tc.sendApiRequest("${EP}/data-links/dl-1/download?workspaceId=10&path=reads/a.fq") >> ok(JsonOutput.toJson([url: 'https://signed']))
+        def client = new SeqeraDataLinkClient(tc)
+
+        when:
+        def resp = client.getDownloadUrl('dl-1', 'reads/a.fq', 10L)
+
+        then:
+        resp.url == 'https://signed'
+    }
+
+    // ---- error mapping ----
+
+    def "401 throws AbortOperationException"() {
+        given:
+        def tc = tower()
+        tc.sendApiRequest(_) >> err(401)
+        def client = new SeqeraDataLinkClient(tc)
+
+        when:
+        client.listDataLinks(10L)
+
+        then:
+        thrown(AbortOperationException)
+    }
+
+    def "403 throws AccessDeniedException"() {
+        given:
+        def tc = tower()
+        tc.sendApiRequest(_) >> err(403)
+        def client = new SeqeraDataLinkClient(tc)
+
+        when:
+        client.getContent('dl-1', '', 10L)
+
+        then:
+        thrown(AccessDeniedException)
+    }
+
+    def "404 throws NoSuchFileException"() {
+        given:
+        def tc = tower()
+        tc.sendApiRequest(_) >> err(404)
+        def client = new SeqeraDataLinkClient(tc)
+
+        when:
+        client.getDownloadUrl('dl-1', 'missing', 10L)
+
+        then:
+        thrown(NoSuchFileException)
+    }
+}
+```
+
+- [ ] **Step 2: Run the tests**
+
+Run: `./gradlew :plugins:nf-tower:test --tests 'io.seqera.tower.plugin.datalink.SeqeraDataLinkClientTest' -i`
+Expected: all pass.
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add plugins/nf-tower/src/test/io/seqera/tower/plugin/datalink/SeqeraDataLinkClientTest.groovy
+git commit -s -m "test(nf-tower): SeqeraDataLinkClient unit tests — pagination, parse, error mapping"
+```
+
+---
+
+## Phase 3: User Story 1 — Read a File Inside a Data-Link (P1) 🎯 MVP
+
+**Goal**: `file('seqera://<org>/<workspace>/data-links/<provider>/<data-link-name>/<path-to-file>')` returns an `InputStream` over the file content.
+
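
With US1 in place, end-to-end usage from a pipeline is expected to look like the sketch below. The org (`acme`), workspace (`research`), provider segment (`aws`), data-link name (`inputs`), and file path are all hypothetical placeholders; the only required configuration is the Platform token (`tower.accessToken` or `TOWER_ACCESS_TOKEN`) — no cloud credentials:

```groovy
// main.nf — read a file inside a Platform data-link (path segments are hypothetical)
params.reads = 'seqera://acme/research/data-links/aws/inputs/reads/sample_1.fq'

workflow {
    // Resolved by SeqeraFileSystemProvider: listings and attributes go through
    // the Platform API, bytes through the brokered pre-signed URL
    def reads = file(params.reads)
    println "staging ${reads.name}"
}
```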
+ +### T009 [US1] — Implement `DataLinksResourceHandler` (real, replacing the stub) + +**Files:** +- Modify: `plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy` + +- [ ] **Step 1: Replace the stub with the full handler** + +```groovy +/* + * Copyright 2013-2026, Seqera Labs + * (Apache 2.0) + */ + +package io.seqera.tower.plugin.fs.handler + +import java.net.http.HttpClient +import java.net.http.HttpRequest +import java.net.http.HttpResponse +import java.nio.file.AccessDeniedException +import java.nio.file.AccessMode +import java.nio.file.NoSuchFileException +import java.nio.file.Path +import java.time.Duration +import java.time.Instant + +import groovy.transform.CompileStatic +import groovy.util.logging.Slf4j +import io.seqera.tower.model.DataLinkContentResponse +import io.seqera.tower.model.DataLinkDto +import io.seqera.tower.model.DataLinkItem +import io.seqera.tower.model.DataLinkItemType +import io.seqera.tower.plugin.datalink.SeqeraDataLinkClient +import io.seqera.tower.plugin.fs.ResourceTypeHandler +import io.seqera.tower.plugin.fs.SeqeraFileAttributes +import io.seqera.tower.plugin.fs.SeqeraFileSystem +import io.seqera.tower.plugin.fs.SeqeraPath + +/** + * {@link ResourceTypeHandler} for {@code data-links} resource type. + * Listings and attributes go through the Seqera Platform API; file reads go through + * a pre-signed URL fetched with a plain JDK {@link HttpClient} — the signed URL is + * addressed to the cloud backend and the Seqera {@code Authorization} header must not be sent. 
+ */
+@Slf4j
+@CompileStatic
+class DataLinksResourceHandler implements ResourceTypeHandler {
+
+    public static final String TYPE = 'data-links'
+
+    private final SeqeraFileSystem fs
+    private final SeqeraDataLinkClient client
+    private final HttpClient httpClient
+
+    /** workspaceId → data-link list */
+    private final Map<Long, List<DataLinkDto>> dataLinkCache = new LinkedHashMap<>()
+
+    DataLinksResourceHandler(SeqeraFileSystem fs, SeqeraDataLinkClient client) {
+        this(fs, client, HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(10)).build())
+    }
+
+    /** Test-only constructor to inject a mock {@link HttpClient}. */
+    DataLinksResourceHandler(SeqeraFileSystem fs, SeqeraDataLinkClient client, HttpClient httpClient) {
+        this.fs = fs
+        this.client = client
+        this.httpClient = httpClient
+    }
+
+    @Override String getResourceType() { TYPE }
+
+    @Override
+    List<Path> list(SeqeraPath dir) throws IOException {
+        final workspaceId = fs.resolveWorkspaceId(dir.org, dir.workspace)
+        final trail = dir.trail
+        if (trail.isEmpty()) {
+            // List distinct providers in use
+            final providers = resolveDataLinks(workspaceId)
+                .collect { DataLinkDto dl -> dl.provider?.toString() }
+                .findAll { String p -> p }
+                .toSet()
+            return providers.toList().sort().collect { String p -> dir.resolve(p) as Path }
+        }
+        if (trail.size() == 1) {
+            // List data-link names under the given provider
+            final prov = trail[0]
+            final names = resolveDataLinks(workspaceId)
+                .findAll { DataLinkDto dl -> dl.provider?.toString() == prov }
+                .collect { DataLinkDto dl -> dl.name }
+                .sort()
+            if (names.isEmpty())
+                throw new NoSuchFileException(dir.toString(), null, "No data-links for provider '$prov' in workspace '${dir.workspace}'")
+            return names.collect { String n -> dir.resolve(n) as Path }
+        }
+        // trail.size() >= 2 — browse inside the data-link
+        final dl = resolveDataLink(workspaceId, trail[0], trail[1])
+        final subPath = trail.size() > 2 ?
trail.subList(2, trail.size()).join('/') : '' + final resp = client.getContent(dl.id, subPath, workspaceId) + return (resp.objects ?: []).collect { DataLinkItem it -> dir.resolve(it.name) as Path } + } + + @Override + SeqeraFileAttributes readAttributes(SeqeraPath p) throws IOException { + final workspaceId = fs.resolveWorkspaceId(p.org, p.workspace) + final trail = p.trail + // data-links/ dir itself, provider dir, data-link root — all directories + if (trail.size() < 2) return new SeqeraFileAttributes(true) + final dl = resolveDataLink(workspaceId, trail[0], trail[1]) + if (trail.size() == 2) return new SeqeraFileAttributes(true) // data-link root + final subPath = trail.subList(2, trail.size()).join('/') + final resp = client.getContent(dl.id, subPath, workspaceId) + return attributesFor(resp, subPath, p) + } + + @Override + InputStream newInputStream(SeqeraPath p) throws IOException { + if (p.trail.size() < 3) + throw new IllegalArgumentException("newInputStream requires a file path inside a data-link: $p") + final workspaceId = fs.resolveWorkspaceId(p.org, p.workspace) + final dl = resolveDataLink(workspaceId, p.trail[0], p.trail[1]) + final subPath = p.trail.subList(2, p.trail.size()).join('/') + final urlResp = client.getDownloadUrl(dl.id, subPath, workspaceId) + if (!urlResp.url) + throw new NoSuchFileException(p.toString(), null, "Platform returned no download URL") + return fetchSignedUrl(urlResp.url) + } + + @Override + void checkAccess(SeqeraPath p, AccessMode... 
modes) throws IOException {
+        for (AccessMode m : modes) {
+            if (m == AccessMode.WRITE || m == AccessMode.EXECUTE)
+                throw new AccessDeniedException(p.toString(), null, "seqera:// data-links are read-only")
+        }
+        // READ: rely on readAttributes to validate existence
+        readAttributes(p)
+    }
+
+    // ---- private ----
+
+    private synchronized List<DataLinkDto> resolveDataLinks(long workspaceId) {
+        def cached = dataLinkCache.get(workspaceId)
+        if (cached == null) {
+            cached = client.listDataLinks(workspaceId)
+            dataLinkCache.put(workspaceId, cached)
+        }
+        return cached
+    }
+
+    private DataLinkDto resolveDataLink(long workspaceId, String provider, String name) throws NoSuchFileException {
+        final found = resolveDataLinks(workspaceId).find { DataLinkDto dl ->
+            dl.provider?.toString() == provider && dl.name == name
+        }
+        if (!found)
+            throw new NoSuchFileException(
+                "seqera://.../data-links/${provider}/${name}",
+                null,
+                "Data-link '${name}' not found for provider '${provider}' in workspace '$workspaceId'")
+        return found
+    }
+
+    private SeqeraFileAttributes attributesFor(DataLinkContentResponse resp, String subPath, SeqeraPath pathForErrors) throws NoSuchFileException {
+        final items = resp.objects ?: []
+        // Single-file content response: one object whose name matches the final segment
+        final lastSeg = subPath.contains('/') ? 
subPath.substring(subPath.lastIndexOf('/') + 1) : subPath
+        final single = items.find { DataLinkItem it -> it.name == lastSeg && it.type == DataLinkItemType.FILE }
+        if (single)
+            return new SeqeraFileAttributes(single.size ?: 0L, Instant.EPOCH, Instant.EPOCH, pathForErrors.toString())
+        // Otherwise treat as a directory (content response with multiple children or zero)
+        // If there are no children AND no originalPath, the path does not exist
+        if (items.isEmpty() && !resp.originalPath)
+            throw new NoSuchFileException(pathForErrors.toString(), null, "Path not found inside data-link")
+        return new SeqeraFileAttributes(true)
+    }
+
+    private InputStream fetchSignedUrl(String url) throws IOException {
+        final req = HttpRequest.newBuilder(URI.create(url))
+            .timeout(Duration.ofMinutes(5))
+            .GET()
+            .build()
+        try {
+            final HttpResponse<InputStream> resp = httpClient.send(req, HttpResponse.BodyHandlers.ofInputStream())
+            final status = resp.statusCode()
+            if (status >= 200 && status < 300) return resp.body()
+            resp.body()?.close()
+            throw new IOException("Signed URL fetch failed: HTTP $status for $url")
+        } catch (InterruptedException e) {
+            Thread.currentThread().interrupt()
+            throw new IOException("Interrupted while fetching signed URL", e)
+        }
+    }
+}
+```
+
+- [ ] **Step 2: Compile**
+
+Run: `./gradlew :plugins:nf-tower:compileGroovy`
+Expected: success. 
+ +- [ ] **Step 3: Commit** + +```bash +git add plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandler.groovy +git commit -s -m "feat(nf-tower): implement DataLinksResourceHandler (list, readAttributes, newInputStream)" +``` + +### T010 [US1] — Unit tests for `DataLinksResourceHandler.newInputStream` + +**Files:** +- Create: `plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy` + +- [ ] **Step 1: Write the spec — MVP scenarios for newInputStream** + +```groovy +/* + * Copyright 2013-2026, Seqera Labs + * (Apache 2.0) + */ +package io.seqera.tower.plugin.fs.handler + +import java.net.http.HttpClient +import java.net.http.HttpResponse +import java.nio.file.NoSuchFileException + +import io.seqera.tower.model.DataLinkContentResponse +import io.seqera.tower.model.DataLinkDownloadUrlResponse +import io.seqera.tower.model.DataLinkDto +import io.seqera.tower.model.DataLinkItem +import io.seqera.tower.model.DataLinkItemType +import io.seqera.tower.model.DataLinkProvider +import io.seqera.tower.plugin.datalink.SeqeraDataLinkClient +import io.seqera.tower.plugin.fs.SeqeraFileSystem +import io.seqera.tower.plugin.fs.SeqeraPath +import spock.lang.Specification + +class DataLinksResourceHandlerTest extends Specification { + + private SeqeraFileSystem fs = Mock(SeqeraFileSystem) + private SeqeraDataLinkClient client = Mock(SeqeraDataLinkClient) + private HttpClient http = Mock(HttpClient) + private DataLinksResourceHandler handler = new DataLinksResourceHandler(fs, client, http) + + private DataLinkDto dl(String id, String name, DataLinkProvider p) { + def d = new DataLinkDto(); d.id = id; d.name = name; d.provider = p; return d + } + private DataLinkItem item(String name, DataLinkItemType t, long size) { + def i = new DataLinkItem(); i.name = name; i.type = t; i.size = size; return i + } + + // ---- newInputStream ---- + + def "newInputStream resolves (provider,name,subPath) and streams the signed URL"() { 
+ given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/reads/a.fq') + def signedBody = new ByteArrayInputStream('data'.bytes) + def httpResp = Mock(HttpResponse) { + statusCode() >> 200 + body() >> signedBody + } + def urlResp = new DataLinkDownloadUrlResponse(); urlResp.url = 'https://signed/a' + + when: + def stream = handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L + 1 * client.listDataLinks(10L) >> [dl('dl-1', 'inputs', DataLinkProvider.AWS)] + 1 * client.getDownloadUrl('dl-1', 'reads/a.fq', 10L) >> urlResp + 1 * http.send(_, _) >> httpResp + stream === signedBody + } + + def "newInputStream throws NoSuchFileException when data-link unknown"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/unknown/reads/a.fq') + + when: + handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId('acme', 'research') >> 10L + 1 * client.listDataLinks(10L) >> [dl('dl-1', 'inputs', DataLinkProvider.AWS)] + thrown(NoSuchFileException) + } + + def "newInputStream requires trail.size >= 3 (file path, not the data-link root itself)"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs') + + when: + handler.newInputStream(path) + + then: + thrown(IllegalArgumentException) + } + + def "newInputStream wraps signed-URL HTTP 403 as IOException"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/reads/a.fq') + def urlResp = new DataLinkDownloadUrlResponse(); urlResp.url = 'https://signed/a' + def httpResp = Mock(HttpResponse) { + statusCode() >> 403 + body() >> new ByteArrayInputStream(new byte[0]) + } + + when: + handler.newInputStream(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> [dl('dl-1', 'inputs', DataLinkProvider.AWS)] + 1 * client.getDownloadUrl('dl-1', 'reads/a.fq', 10L) >> urlResp + 1 * http.send(_, _) >> httpResp + thrown(IOException) + } 
+}
+```
+
+- [ ] **Step 2: Run the tests**
+
+Run: `./gradlew :plugins:nf-tower:test --tests 'io.seqera.tower.plugin.fs.handler.DataLinksResourceHandlerTest' -i`
+Expected: all pass.
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy
+git commit -s -m "test(nf-tower): DataLinksResourceHandler.newInputStream unit tests"
+```
+
+**Checkpoint**: US1 complete — file reads through data-link paths work end to end in unit tests.
+
+---
+
+## Phase 4: User Story 2 — Browse Data-Link Hierarchy (P2)
+
+### T011 [US2] — List & readAttributes tests for DataLinksResourceHandler
+
+**Files:**
+- Modify: `plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy`
+
+- [ ] **Step 1: Append list / readAttributes tests**
+
+Add the following specs to `DataLinksResourceHandlerTest`:
+
+```groovy
+    // ---- list: depth 3 (data-links/) ----
+
+    def "list at data-links/ returns distinct providers in use"() {
+        given:
+        def path = new SeqeraPath(fs, 'seqera://acme/research/data-links')
+
+        when:
+        def paths = handler.list(path)
+
+        then:
+        1 * fs.resolveWorkspaceId('acme', 'research') >> 10L
+        1 * client.listDataLinks(10L) >> [
+            dl('dl-1', 'a', DataLinkProvider.AWS),
+            dl('dl-2', 'b', DataLinkProvider.GOOGLE),
+            dl('dl-3', 'c', DataLinkProvider.AWS)
+        ]
+        paths*.toString().sort() == [
+            'seqera://acme/research/data-links/AWS',
+            'seqera://acme/research/data-links/GOOGLE'
+        ]
+    }
+
+    // ---- list: depth 4 (data-links/<provider>/) ----
+
+    def "list at data-links/<provider>/ returns data-link names for that provider"() {
+        given:
+        def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS')
+
+        when:
+        def paths = handler.list(path)
+
+        then:
+        1 * fs.resolveWorkspaceId(_, _) >> 10L
+        1 * client.listDataLinks(10L) >> [
+            dl('dl-1', 'inputs', DataLinkProvider.AWS),
+            dl('dl-2', 'archive', DataLinkProvider.AWS),
+            dl('dl-3', 'onGcs', DataLinkProvider.GOOGLE)
+        ]
+        paths*.toString() == [
+            'seqera://acme/research/data-links/AWS/archive',
+            'seqera://acme/research/data-links/AWS/inputs'
+        ]
+    }
+
+    def "list at data-links/<provider>/ throws when no data-links for that provider"() {
+        given:
+        def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AZURE')
+
+        when:
+        handler.list(path)
+
+        then:
+        1 * fs.resolveWorkspaceId(_, _) >> 10L
+        1 * client.listDataLinks(10L) >> [dl('dl-1', 'x', DataLinkProvider.AWS)]
+        thrown(NoSuchFileException)
+    }
+
+    // ---- list: depth 5 (data-link root) ----
+
+    def "list at data-link root returns top-level objects"() {
+        given:
+        def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs')
+        def content = new DataLinkContentResponse()
+        content.objects = [item('reads', DataLinkItemType.FOLDER, 0), item('samplesheet.csv', DataLinkItemType.FILE, 42)]
+
+        when:
+        def paths = handler.list(path)
+
+        then:
+        1 * fs.resolveWorkspaceId(_, _) >> 10L
+        1 * client.listDataLinks(10L) >> [dl('dl-1', 'inputs', DataLinkProvider.AWS)]
+        1 * client.getContent('dl-1', '', 10L) >> content
+        paths*.toString() == [
+            'seqera://acme/research/data-links/AWS/inputs/reads',
+            'seqera://acme/research/data-links/AWS/inputs/samplesheet.csv'
+        ]
+    }
+
+    // ---- list: depth 6+ (nested sub-path) ----
+
+    def "list at deep sub-path browses the correct sub-path"() {
+        given:
+        def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/reads')
+        def content = new DataLinkContentResponse()
+        content.objects = [item('a.fq', DataLinkItemType.FILE, 1), item('b.fq', DataLinkItemType.FILE, 2)]
+
+        when:
+        def paths = handler.list(path)
+
+        then:
+        1 * fs.resolveWorkspaceId(_, _) >> 10L
+        1 * client.listDataLinks(10L) >> [dl('dl-1', 'inputs', DataLinkProvider.AWS)]
+        1 * client.getContent('dl-1', 'reads', 10L) >> content
+        paths*.toString() == [
+            'seqera://acme/research/data-links/AWS/inputs/reads/a.fq',
+            'seqera://acme/research/data-links/AWS/inputs/reads/b.fq'
+        ]
+    }
+
+    // ---- readAttributes 
---- + + def "readAttributes at data-links/ resource-type dir reports directory"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links') + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + attr.directory + !attr.regularFile + } + + def "readAttributes at data-link root reports directory"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs') + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> [dl('dl-1', 'inputs', DataLinkProvider.AWS)] + attr.directory + } + + def "readAttributes on a file sub-path reports file with size"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/reads/a.fq') + def content = new DataLinkContentResponse() + content.objects = [item('a.fq', DataLinkItemType.FILE, 123)] + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> [dl('dl-1', 'inputs', DataLinkProvider.AWS)] + 1 * client.getContent('dl-1', 'reads/a.fq', 10L) >> content + attr.regularFile + attr.size() == 123L + } + + def "readAttributes on a directory sub-path reports directory"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/reads') + def content = new DataLinkContentResponse() + content.originalPath = 'reads/' + content.objects = [item('a.fq', DataLinkItemType.FILE, 1), item('b.fq', DataLinkItemType.FILE, 2)] + + when: + def attr = handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> [dl('dl-1', 'inputs', DataLinkProvider.AWS)] + 1 * client.getContent('dl-1', 'reads', 10L) >> content + attr.directory + } +``` + +- [ ] **Step 2: Run the tests** + +Run: `./gradlew :plugins:nf-tower:test --tests 'io.seqera.tower.plugin.fs.handler.DataLinksResourceHandlerTest' 
-i` +Expected: all pass. + +- [ ] **Step 3: Commit** + +```bash +git add plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy +git commit -s -m "test(nf-tower): DataLinksResourceHandler list & readAttributes unit tests" +``` + +### T012 [US2] — Provider-level browsing test (workspace listing enumerates handlers) + +**Files:** +- Modify: `plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy` + +- [ ] **Step 1: Add a spec confirming `datasets` AND `data-links` appear when listing a workspace** + +Append to the test class (before the final `}`): + +```groovy + def "listing a workspace enumerates both registered resource types"() { + given: + def tc = spyTower() + tc.sendApiRequest("${ENDPOINT}/user-info") >> ok(userInfoJson()) + tc.sendApiRequest("${ENDPOINT}/user/42/workspaces") >> ok(workspacesJson()) + + final provider = new SeqeraFileSystemProvider() + provider.newFileSystem(URI.create('seqera://'), [:]) + final fs = provider.getFileSystem(URI.create('seqera://')) as SeqeraFileSystem + final wsPath = new SeqeraPath(fs, 'seqera://acme/research') + + when: + final List entries = [] + provider.newDirectoryStream(wsPath, null).withCloseable { s -> s.iterator().each { entries.add(it) } } + + then: + entries*.toString().sort() == [ + 'seqera://acme/research/data-links', + 'seqera://acme/research/datasets' + ] + } +``` + +- [ ] **Step 2: Run the test** + +Run: `./gradlew :plugins:nf-tower:test --tests 'io.seqera.tower.plugin.fs.SeqeraFileSystemProviderTest' -i` +Expected: all pass — includes both the new spec and the existing dataset-related specs. + +- [ ] **Step 3: Commit** + +```bash +git add plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy +git commit -s -m "test(nf-tower): workspace listing enumerates both datasets and data-links" +``` + +**Checkpoint**: US2 complete — browsing works at every depth. 
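
With browsing and reads both in place, a pipeline can point directly at data-link content. A hedged sketch of what this enables — the `acme/research` and `inputs` names are the hypothetical unit-test fixtures from above, not a real workspace, and the only credential needed is the Seqera access token:

```groovy
// Hypothetical workspace and data-link names, mirroring the test fixtures above.
params.samplesheet = 'seqera://acme/research/data-links/AWS/inputs/samplesheet.csv'

workflow {
    // Resolved through the seqera:// NIO provider — no AWS credentials required
    Channel.fromPath(params.samplesheet).view()
}
```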
+ +--- + +## Phase 5: User Story 3 — Meaningful Errors (P3) + +### T013 [US3] — Error-mapping tests for data-link paths + +**Files:** +- Modify: `plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy` + +- [ ] **Step 1: Append error-path specs** + +```groovy + def "unknown data-link under a known provider throws with clear message"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/ghost/a.fq') + + when: + handler.list(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> [dl('dl-1', 'inputs', DataLinkProvider.AWS)] + def ex = thrown(NoSuchFileException) + ex.message.toLowerCase().contains('not found') || ex.reason?.toLowerCase()?.contains('not found') + } + + def "missing sub-path inside a data-link surfaces as NoSuchFileException"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/does/not/exist') + def empty = new DataLinkContentResponse() + empty.originalPath = null + empty.objects = [] + + when: + handler.readAttributes(path) + + then: + 1 * fs.resolveWorkspaceId(_, _) >> 10L + 1 * client.listDataLinks(10L) >> [dl('dl-1', 'inputs', DataLinkProvider.AWS)] + 1 * client.getContent('dl-1', 'does/not/exist', 10L) >> empty + thrown(NoSuchFileException) + } + + def "checkAccess with WRITE is rejected"() { + given: + def path = new SeqeraPath(fs, 'seqera://acme/research/data-links/AWS/inputs/a.fq') + + when: + handler.checkAccess(path, java.nio.file.AccessMode.WRITE) + + then: + thrown(java.nio.file.AccessDeniedException) + } +``` + +- [ ] **Step 2: Run the tests** + +Run: `./gradlew :plugins:nf-tower:test --tests 'io.seqera.tower.plugin.fs.handler.DataLinksResourceHandlerTest' -i` +Expected: all pass. 
+ +- [ ] **Step 3: Commit** + +```bash +git add plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/handler/DataLinksResourceHandlerTest.groovy +git commit -s -m "test(nf-tower): data-link error paths — unknown link, missing sub-path, write-rejected" +``` + +### T014 [US3] — Unsupported-resource-type error via the provider + +**Files:** +- Modify: `plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy` + +- [ ] **Step 1: Add a provider-level dispatch test** + +Append: + +```groovy + def "newInputStream on an unsupported resource type throws NoSuchFileException"() { + given: + def tc = spyTower() + tc.sendApiRequest("${ENDPOINT}/user-info") >> ok(userInfoJson()) + tc.sendApiRequest("${ENDPOINT}/user/42/workspaces") >> ok(workspacesJson()) + def provider = new SeqeraFileSystemProvider() + provider.newFileSystem(URI.create('seqera://'), [:]) + def fs = provider.getFileSystem(URI.create('seqera://')) as SeqeraFileSystem + def path = new SeqeraPath(fs, 'seqera://acme/research/unknown-type/foo') + + when: + provider.newInputStream(path) + + then: + def ex = thrown(NoSuchFileException) + ex.message.contains('Unsupported resource type') || ex.reason?.contains('Unsupported resource type') + } +``` + +- [ ] **Step 2: Run the tests** + +Run: `./gradlew :plugins:nf-tower:test --tests 'io.seqera.tower.plugin.fs.SeqeraFileSystemProviderTest' -i` +Expected: all pass. + +- [ ] **Step 3: Commit** + +```bash +git add plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/SeqeraFileSystemProviderTest.groovy +git commit -s -m "test(nf-tower): unsupported resource type surfaces NoSuchFileException" +``` + +**Checkpoint**: US3 complete — all error paths produce clear, type-specific exceptions. 
+
+---
+
+## Phase 6: User Story 4 — Extensibility Validation (P4)
+
+### T015 [US4] — Architectural guard test
+
+**Files:**
+- Create: `plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/ResourceTypeAbstractionTest.groovy`
+
+- [ ] **Step 1: Write a guard test**
+
+```groovy
+/*
+ * Copyright 2013-2026, Seqera Labs
+ * (Apache 2.0)
+ */
+package io.seqera.tower.plugin.fs
+
+import spock.lang.Specification
+
+/**
+ * Guards that the generic NIO layer does not reach into resource-type-specific packages.
+ * {@link SeqeraPath}, {@link SeqeraFileSystem}, and {@link SeqeraFileAttributes} must not
+ * depend on {@code dataset/}, {@code datalink/}, or {@code fs/handler/}. Only
+ * {@link SeqeraFileSystemProvider} may touch handlers — and only to wire them up behind
+ * the {@link ResourceTypeHandler} interface, which is where all dispatch lives.
+ */
+class ResourceTypeAbstractionTest extends Specification {
+
+    static final Class[] GENERIC_CLASSES = [SeqeraPath, SeqeraFileSystem, SeqeraFileAttributes]
+
+    def "generic fs classes do not import resource-type-specific packages"() {
+        expect:
+        GENERIC_CLASSES.each { Class c ->
+            final src = new File("plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/${c.simpleName}.groovy").text
+            assert !src.contains('io.seqera.tower.plugin.dataset.'), "${c.simpleName} must not import dataset package"
+            assert !src.contains('io.seqera.tower.plugin.datalink.'), "${c.simpleName} must not import datalink package"
+            assert !src.contains('io.seqera.tower.plugin.fs.handler.'), "${c.simpleName} must not import handler package"
+            assert !src.contains('DataLink'), "${c.simpleName} must not reference data-link types"
+            assert !src.contains('DatasetDto'), "${c.simpleName} must not reference DatasetDto"
+        }
+    }
+
+    def "both handlers implement the ResourceTypeHandler interface"() {
+        expect:
+        ResourceTypeHandler.isAssignableFrom(io.seqera.tower.plugin.fs.handler.DatasetsResourceHandler)
+        ResourceTypeHandler.isAssignableFrom(io.seqera.tower.plugin.fs.handler.DataLinksResourceHandler)
+    }
+}
+```
+
+- [ ] **Step 2: Run**
+
+Run: `./gradlew :plugins:nf-tower:test --tests 'io.seqera.tower.plugin.fs.ResourceTypeAbstractionTest' -i`
+Expected: all pass. Note that the guard matches raw source text, so the `'DatasetDto'` check trips on `SeqeraFileSystem.groovy` even when the only reference is the tower-api DTO import (`io.seqera.tower.model.DatasetDto`) backing a lingering cache field — the package-prefix checks alone would let that through. The dataset-specific caches belong in `DatasetsResourceHandler`, so if the guard fails here, complete the refactor below rather than weakening the test.
+
+**Important refactor note**: if the guard test fails on `SeqeraFileSystem.groovy`, perform this sub-step:
+
+- Remove `datasetCache`, `versionCache`, `resolveDatasets`, `resolveDataset`, `resolveVersions`, and `invalidateDatasetCache` from `SeqeraFileSystem.groovy`, along with their imports (`io.seqera.tower.model.DatasetDto`, `io.seqera.tower.model.DatasetVersionDto`).
+- Move the same caches and methods into `DatasetsResourceHandler.groovy` as private fields and synchronized methods.
+- Update `DatasetsResourceHandler` to call its own cache methods instead of `fs.resolveDatasets(...)` and friends.
+- Re-run the guard test to confirm. 
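
The relocated cache keeps a simple shape: a synchronized, lazily populated per-workspace lookup with explicit invalidation. A minimal self-contained sketch of that pattern — plain Java with hypothetical names; the real Groovy handler methods wrap `TowerClient` calls instead of a generic loader:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongFunction;

/** Sketch of the per-workspace cache moved into the handler (hypothetical names). */
class WorkspaceCache<T> {

    private final Map<Long, List<T>> cache = new LinkedHashMap<>();
    private final LongFunction<List<T>> loader;   // stands in for e.g. a listDatasets(workspaceId) call

    WorkspaceCache(LongFunction<List<T>> loader) {
        this.loader = loader;
    }

    /** Load on first access; later calls for the same workspace hit the cache. */
    synchronized List<T> resolve(long workspaceId) {
        return cache.computeIfAbsent(workspaceId, loader::apply);
    }

    /** Drop one workspace entry so the next resolve() refetches from the Platform. */
    synchronized void invalidate(long workspaceId) {
        cache.remove(workspaceId);
    }
}
```

Keeping this state inside the handler, rather than in `SeqeraFileSystem`, is what lets the generic classes stay resource-type-agnostic — exactly the boundary the guard test enforces.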
+ +- [ ] **Step 3: Commit** + +```bash +git add plugins/nf-tower/src/test/io/seqera/tower/plugin/fs/ResourceTypeAbstractionTest.groovy \ + plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/SeqeraFileSystem.groovy \ + plugins/nf-tower/src/main/io/seqera/tower/plugin/fs/handler/DatasetsResourceHandler.groovy +git commit -s -m "test(nf-tower): enforce resource-type-agnostic boundary in generic fs classes" +``` + +**Checkpoint**: US4 complete — abstraction is validated by automated guard. + +--- + +## Phase 7: Final Verification + +**Note**: both the `plugins/nf-tower/VERSION` bump and `changelog.txt` entries are handled at release time by the repo's release process (see `CLAUDE.md § Release process`), not per-feature. This phase only verifies the build is green. + +### T016 — Final full test run + +- [ ] **Step 1: Run the full nf-tower test suite** + +Run: `./gradlew :plugins:nf-tower:test -i` +Expected: all tests pass — dataset, data-link, path, provider, filesystem, abstraction guard. + +- [ ] **Step 2: Run the full plugin build** + +Run: `./gradlew :plugins:nf-tower:build` +Expected: success. + +- [ ] **Step 3: Confirm no cloud-SDK dependencies were introduced** + +Run: +``` +./gradlew :plugins:nf-tower:dependencies --configuration runtimeClasspath | grep -iE 'aws-sdk|google-cloud|azure-' || echo 'OK: no cloud SDKs on classpath' +``` +Expected output ends with `OK: no cloud SDKs on classpath`. (SC-006) + +- [ ] **Step 4: Summary — nothing to commit; just confirm build-green at HEAD.** + +--- + +## Appendix — Task Dependency Graph + +``` +T001 ─┐ +T002 ─┼──► T003 ──► T004 ──► T005 ──► T006 ──► T007 ──► T008 + │ │ + │ └──► T009 ──► T010 + │ │ + │ ├──► T011 + │ ├──► T012 + │ ├──► T013 + │ └──► T014 + │ │ + │ └──► T015 + │ │ + └──────────────────────────────────────────────────────────────────────────────────┴──► T016 +```