Add seqera:// data-links support to nf-tower filesystem #7070
Open
Signed-off-by: jorgee <[email protected]>
…ing cache Signed-off-by: jorgee <[email protected]>
Summary

Extends the `seqera://` NIO filesystem in `nf-tower` with a second resource type, `data-links`. Paths of the form `seqera://<org>/<ws>/data-links/<provider>/<name>/<sub-path>` resolve to files and directories inside Platform-managed data-links (S3/GCS/Azure buckets or prefixes).

Listings and attribute queries go through the Platform's `/data-links/{id}/browse[/path]` endpoints; byte reads go through pre-signed URLs returned by `/data-links/{id}/generate-download-url` and fetched with a plain JDK `HttpClient`. Only the Seqera access token is required: no AWS/GCP/Azure credentials are needed, and no cloud SDK dependency is introduced.

As part of this change, the existing dataset-specific logic in `SeqeraFileSystemProvider`, `SeqeraFileSystem`, and `SeqeraPath` is extracted into a real `ResourceTypeHandler` abstraction; `DatasetsResourceHandler` and `DataLinksResourceHandler` are the two implementations. The generic `fs/` classes become resource-type-agnostic for depth ≥ 3 (enforced by `ResourceTypeAbstractionTest`).

Design artifacts: spec.md, plan.md, ADR.
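
For illustration only, the path layout described in the summary can be split into its segments as sketched below. This is not the code from the PR: the class `DataLinkPathSketch`, the record `DataLinkRef`, and the method `parse` are hypothetical names, and the sketch only handles fully qualified data-link paths (it ignores the shallower listing depths and the datasets resource type).

```java
import java.util.Optional;

// Hypothetical sketch, not the nf-tower implementation.
public final class DataLinkPathSketch {

    /** Segments of seqera://<org>/<ws>/data-links/<provider>/<name>/<sub-path>. */
    public record DataLinkRef(String org, String workspace, String provider,
                              String name, String subPath) {}

    private static final String SCHEME_PREFIX = "seqera://";
    private static final String RESOURCE_TYPE = "data-links";

    private DataLinkPathSketch() {}

    /** Returns the parsed segments, or empty if the URI is not a data-link path. */
    public static Optional<DataLinkRef> parse(String uri) {
        if (uri == null || !uri.startsWith(SCHEME_PREFIX)) {
            return Optional.empty();
        }
        String rest = uri.substring(SCHEME_PREFIX.length());
        // Limit 6: org, workspace, resource type, provider, name, remaining sub-path.
        String[] parts = rest.split("/", 6);
        if (parts.length < 5 || !RESOURCE_TYPE.equals(parts[2])) {
            return Optional.empty();
        }
        // The five leading segments must all be present and non-empty.
        for (int i = 0; i < 5; i++) {
            if (parts[i].isEmpty()) {
                return Optional.empty();
            }
        }
        // An absent or empty sub-path denotes the data-link root.
        String subPath = parts.length == 6 ? parts[5] : "";
        return Optional.of(new DataLinkRef(parts[0], parts[1], parts[3], parts[4], subPath));
    }
}
```

With these assumptions, `parse("seqera://acme/research/data-links/aws/my-bucket/path/to/file.txt")` yields provider `aws`, name `my-bucket`, and sub-path `path/to/file.txt`; the organisation, workspace, and bucket names here are made up.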
Highlights

- `seqera://<org>/<ws>/data-links/<provider>/<name>/<sub-path>`. Provider segments are the lowercase `DataLinkProvider.toString()` value (`aws`, `google`, `azure`, …).
- `PagedIterable<T>`: a single shared abstraction backs both the workspace data-link list (offset paginated) and data-link content browse (token paginated). The first page is fetched eagerly so `IOException` surfaces at the call site, not at the first `Iterator.hasNext()`. Two named static fetchers (`DataLinkListFetcher`, `DataLinkContentFetcher`) own their own cursor state.
- `readAttributes` on a sub-path lists the path's parent directory and finds the entry by name; the entry's `type` (`FILE`/`FOLDER`) is the authoritative signal, and a missing entry → `NoSuchFileException`. The `/browse/{path}` response shape alone does not reliably distinguish file/directory/missing paths.
- … `SeqeraFileAttributes` to each emitted `SeqeraPath`; the provider also writes resolved attributes back onto the path after a fresh read. Subsequent `readAttributes` calls on the same path instance hit the cache (zero API calls).
- `getDataLink(ws, provider, name)` issues a combined keyword search (`<name> provider:<provider>`) so the server returns at most one match.
- … `@Memoized`, including `null` misses.
- `SeqeraFileSystem` holds the `TowerClient` directly and exposes `getUserId()`, cached for the lifetime of the FS, since the token doesn't change during a pipeline run. User/workspace lookup is shared infrastructure across resource types, not a dataset-client method.
- `credentialsId` forwarding: when `DataLinkDto.credentials` is non-empty, the first credential's `id` is forwarded as the `credentialsId` query parameter on browse and download-URL requests.
- … → `AbortOperationException`; 403 → `AccessDeniedException`; 404 → `NoSuchFileException`. Consistent with the dataset client.
- … `Mock(TowerClient)`). The pre-existing dataset tests are unchanged and continue to pass.

Requirements / prerequisites

- `nf-tower` plugin must be enabled with `tower.accessToken` / `TOWER_ACCESS_TOKEN`.

Known limitations

- … `IOException`; Nextflow task retry handles recovery.
- `SeqeraFileAttributes.lastModifiedTime()` returns `Instant.EPOCH` for data-link entries.
- … `UnsupportedOperationException`. The Platform's `/data-links/{id}/upload` endpoints are a natural future extension point.

Test plan

- `./gradlew :plugins:nf-tower:test`: all 369 tests pass (verified locally)
- `./gradlew :plugins:nf-tower:dependencies --configuration runtimeClasspath` shows no new cloud-SDK artifacts (no `aws-sdk`, `google-cloud-storage`, `azure-*`)
- `nextflow fs ls seqera://<org>/<ws>/data-links/*` lists providers
- `nextflow fs ls seqera://<org>/<ws>/data-links/<provider>/*` lists data-link names
- `nextflow fs ls seqera://<org>/<ws>/data-links/<provider>/<name>/*` lists top-level bucket entries
- `nextflow fs stat seqera://<org>/<ws>/data-links/<provider>/<name>/<file>` reports `is directory: false` and the correct `size`
- `nextflow fs stat seqera://<org>/<ws>/data-links/<provider>/<name>/<dir>` reports `is directory: true`
- `nextflow fs stat seqera://<org>/<ws>/data-links/<provider>/<name>/<missing>` raises `NoSuchFileException`
- … `file('seqera://…/data-links/<provider>/<name>/path/to/file')` using only `TOWER_ACCESS_TOKEN`
- … `AccessDeniedException`
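
The eager-first-page behaviour described for `PagedIterable<T>` under Highlights can be sketched roughly as follows. This is an illustration under assumptions, not the class from the PR: `PagedIterableSketch`, `Page`, and `PageFetcher` are hypothetical names, and the real fetchers may expose a different contract. The constructor fetches the first page so that an `IOException` is thrown at the call site; later pages are fetched lazily, and their failures are wrapped in `UncheckedIOException` because `Iterator` methods cannot throw checked exceptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of an iterable that loads its first page eagerly.
// Single-pass: the fetcher owns the cursor, so iterate only once.
public final class PagedIterableSketch<T> implements Iterable<T> {

    /** One page of results plus a flag saying whether more pages may follow. */
    public record Page<T>(List<T> items, boolean hasMore) {}

    /** Each call returns the next page; the fetcher tracks its own offset or token. */
    public interface PageFetcher<T> {
        Page<T> nextPage() throws IOException;
    }

    private final PageFetcher<T> fetcher;
    private final Page<T> firstPage;

    public PagedIterableSketch(PageFetcher<T> fetcher) throws IOException {
        this.fetcher = fetcher;
        // Eager fetch: connectivity or permission errors surface here.
        this.firstPage = fetcher.nextPage();
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private Page<T> page = firstPage;
            private Iterator<T> current = firstPage.items().iterator();

            @Override
            public boolean hasNext() {
                // Skip over empty pages until an item is found or pages run out.
                while (!current.hasNext() && page.hasMore()) {
                    try {
                        page = fetcher.nextPage();
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                    current = page.items().iterator();
                }
                return current.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }
}
```

In this sketch an offset-paginated fetcher and a token-paginated fetcher differ only in the cursor they keep between `nextPage()` calls, which is one way a single iterable could back both listing styles.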