
Conversation


@frcroth frcroth commented Sep 10, 2025

Description

  • If configured, uploaded datasets are forwarded to S3 after unzipping
  • layersToLink now references the source dataset by datasetId (this introduces api version 11)
  • All newly uploaded datasets are virtual (meaning that the on-disk datasource-properties.json is removed and the database entry is the source of truth)
  • layersToLink no longer creates symlinks but only creates entries in the DB, matching the behavior of uploadToPaths
  • Raw uploaded data is backed up inside the .trash folder for debugging purposes; it gets cleared regularly anyway

Steps to test:

  • Configure S3 uploads with appropriate credentials in application.conf (see the config sketch after this list)
  • Upload a dataset
  • View that dataset
  • Also test that the local-filesystem case did not break
  • Test uploading a zipped and a non-zipped dataset
  • Test tif upload with yarn enable-jobs and a worker
  • Test upload from libs (ideally with layersToLink)
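
For reference, a minimal application.conf sketch for these settings, assuming the s3Upload block nests under datastore as the review notes indicate; hostnames and bucket names are placeholders:

```hocon
datastore {
  s3Upload {
    enabled = true
    # Prefix / name of a credential in the dataVaults section to use for uploads.
    credentialName = "s3://example-objectstorage.com/my-bucket/uploads/"
    endpoint = "https://example-objectstorage.com"
    bucketName = "my-bucket"
    objectKeyPrefix = "uploads"
  }
}
```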

TODOs:

  • Handle uploading larger files with multipart upload
  • Fix uploaded files having "aws-chunked" encoding
  • Push updated application.conf without credentials
  • Test with more datasets
  • Do not upload files that are not referenced in layers (e.g., datasource-properties.json files)
  • Use UPath for everything
  • Is there still a case where uploaded datasets should not be virtual? → No!
  • Make all uploaded datasets virtual
  • Unify the layersToLink mechanism
  • API versioning (layersToLink adapter)
  • Re-test with S3
  • Delete local data or move it to .trash?
  • Make sure both old and new convert_to_wkw variants are supported

Issues:


  • Added changelog entry (create a $PR_NUMBER.md file in unreleased_changes or use ./tools/create-changelog-entry.py)
  • Added migration guide entry if applicable (edit the same file as for the changelog)
  • Removed dev-only changes
  • Considered common edge cases
  • Needs datastore update after deployment


coderabbitai bot commented Sep 10, 2025

📝 Walkthrough

Refactors upload and linking flows to be datasetId-centric, adds S3 upload configuration and AWS transfer-manager dependency, replaces symlink-based layer linking with a new LayerToLinkService, introduces Slack failure notifications for finishUpload, bumps API to v12, and updates controllers, services, routes, models, and frontend accordingly.

Changes

Cohort / File(s): Summary of changes
WK remote controller
app/controllers/WKRemoteDataStoreController.scala
Switched reserve/report/delete to datasetId (ObjectId); replaced local layer validation with layerToLinkService.validateLayerToLink; adjusted constructor dependencies and JSON shapes; removed legacy imports.
Dataset & controllers
app/controllers/DatasetController.scala, app/models/dataset/ComposeService.scala, app/models/dataset/Dataset.scala
Replaced createVirtualDataset with createAndSetUpDataset (folderId → ObjectId, added isVirtual), added makeVirtual, injected UsedStorageService, and added post-create tracking calls.
Dataset services & linking
app/models/dataset/DatasetService.scala, app/models/dataset/DatasetUploadToPathsService.scala, app/models/dataset/LayerToLinkService.scala
Added LayerToLinkService (validate/resolve/link layers), delegated layer linking, removed symlink helpers, refactored dataset creation/tracking and constructor dependencies.
Upload service & models
webknossos-datastore/.../services/uploading/UploadService.scala, webknossos-datastore/.../services/uploading/DatasetSymlinkService.scala (removed)
Persist upload→datasetId in Redis; added LinkedLayerIdentifier, ReportDatasetUploadParameters, UploadInformation models; updated reserve/finish/cancel flows to datasetId-centric; added S3 transfer-manager scaffolding and relocation; removed symlink service.
Datastore controllers & remote client
webknossos-datastore/.../controllers/DataSourceController.scala, webknossos-datastore/.../services/DSRemoteWebknossosClient.scala
Replaced DataSource terminology with datasetId across reserve/finish/cancel/delete flows; added slack hook for failed finishUpload; updated remote client signatures to accept ObjectId and parameter object for reportUpload.
Access & tokens
webknossos-datastore/.../services/AccessTokenService.scala
Changed UserAccessRequest factory surface (added administrateDataSources/readDataSources, removed deleteDataSource/writeDataSource); removed assertUserAccess helper.
Datastore config & deps
conf/application.conf, webknossos-datastore/.../DataStoreConfig.scala, project/Dependencies.scala, unreleased_changes/8912.md
Added datastore.s3Upload config and mapping; centralized AWS SDK version and added s3-transfer-manager dependency; docs for S3 upload config.
Datastore internals & helpers
webknossos-datastore/.../datavault/S3DataVault.scala, webknossos-datastore/.../services/DataSourceService.scala, webknossos-datastore/.../models/datasource/DataLayer.scala, webknossos-datastore/.../helpers/DirectoryConstants.scala, webknossos-datastore/.../models/datasource/DataSourceStatus.scala
Exposed hostBucketFromUri publicly; added resolvePathsInNewBasePath; added attachmentMapping to StaticLayer.mapped and propagated attachments; added unpackedDir = ".unpacked"; added DataSourceStatus.notImportedYet.
Legacy API & adapters
webknossos-datastore/.../controllers/DSLegacyApiController.scala, webknossos-datastore/conf/datastore.versioned.routes
Added DSLegacyApiController.reserveUploadV11() and legacy payload types (LegacyReserveUploadInformation, LegacyLinkedLayerIdentifier) to adapt old clients to new datasetId-based flow.
Routing & API versioning
conf/webknossos.latest.routes, conf/webknossos.versioned.routes, webknossos-tracingstore/conf/tracingstore.versioned.routes, util/src/main/scala/com/scalableminds/util/mvc/ApiVersioning.scala
reportDatasetUpload route now expects datasetId; added v12 entries; bumped CURRENT_API_VERSION to 12; added versioned datastore reserveUpload mappings.
Frontend
frontend/javascripts/admin/dataset/dataset_upload_view.tsx
Propagates needsConversion into reserve upload payload.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes

Possibly related PRs

Suggested reviewers

  • MichaelBuessemeyer
  • frcroth

Poem

I nibbled code and hopped with glee,
Datasets now travel by their ID.
Carrots to S3, no symlink tangle,
Slack warns my whiskers when uploads mangle.
v12's a hop — a rabbit's giggle and jiggle. 🥕🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Out of Scope Changes Check (⚠️ Warning): Several changes, such as introducing analytics tracking and Mailchimp tagging in DatasetService.trackNewDataset and altering ComposeService’s layer mapping logic, are not described in the linked issues and introduce side effects outside the stated objectives. Resolution: consider extracting the analytics and Mailchimp integration and the ComposeService mapping changes into a separate refactoring pull request, or provide justification for how they align with the linked issue objectives.
✅ Passed checks (4 passed)
  • Title Check: The title “Upload datasets to S3” concisely highlights the primary new feature of forwarding uploaded datasets to S3, avoiding unnecessary detail while remaining specific to the main change implemented in this pull request.
  • Linked Issues Check: The pull request fully implements the objectives of the linked issues: it adds S3 upload support with configuration and SDK integration (#8415), injects Slack notifications for failed finishUpload flows (#8934), replaces legacy layer identifiers with datasetId and introduces API versioning (#8893), and removes filesystem symlinks in favor of database-backed layer linking (#8749).
  • Description Check: The description clearly outlines the S3 upload feature, layersToLink API versioning and behavior changes, virtual dataset semantics, symlink removal, data backup handling, and testing steps, all of which directly correspond to the implemented code changes and linked issues.
  • Docstring Coverage: No functions found in the changes. Docstring coverage check skipped.

@frcroth frcroth self-assigned this Sep 10, 2025
@frcroth frcroth changed the title from “Implement rudimentary upload of datasets to S3” to “Upload datasets to S3” Sep 10, 2025
@frcroth frcroth requested a review from fm3 September 12, 2025 10:05
@frcroth frcroth marked this pull request as ready for review September 12, 2025 10:05

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (7)
unreleased_changes/8912.md (1)

7-17: Specify language for fenced config block to satisfy markdownlint and improve readability.

Use hocon for syntax highlighting:

```hocon
s3Upload {
    enabled = true
    # Use the prefix / name of a credential in the dataVaults section here to use it for uploads.
    credentialName = "s3://fsn1.your-objectstorage.com/webknossos-test/upload-tests/"
    endpoint = "https://fsn1.your-objectstorage.com"
    bucketName = "webknossos-test"
    objectKeyPrefix = "upload-tests"
  }
```

webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1)

70-71: Include S3Upload and DataVaults in children for consistency with nested config traversal/logging.

Prevents omissions in config dumping/validation that rely on children.


```diff
-    val children = List(WebKnossos, WatchFileSystem, Cache, AdHocMesh, Redis, AgglomerateSkeleton)
+    val children = List(WebKnossos, WatchFileSystem, Cache, AdHocMesh, Redis, AgglomerateSkeleton, DataVaults, S3Upload)
```
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)

525-540: Surface partial-failure details from TransferManager.

You already check failedTransfers.isEmpty, which is good. Consider also logging the individual failures (key and exception) to aid debugging large uploads, as sketched below.
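
A minimal sketch of such logging, assuming AWS SDK v2 transfer-manager types and a scala-logging logger; the object and method names are illustrative only:

```scala
import com.typesafe.scalalogging.LazyLogging
import software.amazon.awssdk.transfer.s3.model.CompletedDirectoryUpload

object FailedTransferLogging extends LazyLogging {
  // Log every failed transfer (source path + exception) instead of only
  // checking completed.failedTransfers().isEmpty.
  def logFailedTransfers(completed: CompletedDirectoryUpload): Unit =
    completed.failedTransfers().forEach { failed =>
      // request().source() is the local file that failed to upload
      logger.warn(s"S3 upload failed for ${failed.request().source()}", failed.exception())
    }
}
```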


818-823: Exclude the S3 staging root from orphan cleanup.

cleanUpOrphanUploadsForOrga will treat the .cloud directory itself as an “orphan” and delete/recreate it, adding churn. Filter it out.

Apply:

```diff
-        uploadDirs <- PathUtils.listDirectories(orgaUploadingDir, silent = false).toFox
-        _ <- Fox.serialCombined(uploadDirs) { uploadDir =>
+        uploadDirs <- PathUtils.listDirectories(orgaUploadingDir, silent = false).toFox
+        uploadDirsFiltered = uploadDirs.filterNot(_.getFileName.toString == uploadToS3Dir)
+        _ <- Fox.serialCombined(uploadDirsFiltered) { uploadDir =>
```

398-422: S3 path update + cleanup sequence: add best-effort rollback.

If updateDataSource fails after a successful S3 upload, the local unpack dir is already deleted. Consider best-effort cleanup (e.g., delete uploaded S3 prefix) or tag the dataset with a “failed upload” signal to enable ops cleanup.


254-266: Path traversal check could be more robust.

startsWith("../") on a relativized path is brittle. Prefer normalizing and ensuring the target stays within uploadDir.

Apply:

```diff
-  private def isOutsideUploadDir(uploadDir: Path, filePath: String): Boolean =
-    uploadDir.relativize(uploadDir.resolve(filePath)).startsWith("../")
+  private def isOutsideUploadDir(uploadDir: Path, filePath: String): Boolean = {
+    val target = uploadDir.resolve(filePath).normalize()
+    !target.startsWith(uploadDir)
+  }
```

826-853: Orphan upload cleanup at startup: OK; consider periodic metrics.

You may want a counter for deleted orphans to surface drift.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between abcdb07 and 5c64063.

📒 Files selected for processing (12)
  • app/controllers/WKRemoteDataStoreController.scala (1 hunks)
  • app/models/dataset/DatasetService.scala (2 hunks)
  • conf/application.conf (1 hunks)
  • conf/webknossos.latest.routes (1 hunks)
  • project/Dependencies.scala (2 hunks)
  • unreleased_changes/8912.md (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (4 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DirectoryConstants.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (13 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1)
util/src/main/scala/com/scalableminds/util/tools/ConfigReader.scala (1)
  • get (23-26)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala (3)
  • isEmpty (21-21)
  • DatasetLayerAttachments (13-22)
  • DatasetLayerAttachments (24-27)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2)
  • DataVaultService (21-30)
  • DataVaultService (32-69)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
  • mapped (411-511)
util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (2)
  • toString (38-38)
  • toMagLiteral (40-43)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (3)
  • coordinateTransformations (89-89)
  • defaultViewConfiguration (101-101)
  • attachments (107-107)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (8)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (3)
  • PathUtils (14-14)
  • ensureDirectoryBox (124-132)
  • deleteDirectoryRecursively (186-193)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4)
  • DataStoreConfig (10-79)
  • Datastore (17-71)
  • S3Upload (63-69)
  • DataVaults (60-62)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
  • CredentialConfigReader (8-64)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataStoreRedisStore.scala (1)
  • DataStoreRedisStore (8-12)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala (1)
  • RemoteSourceDescriptorService (20-177)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RedisTemporaryStore.scala (2)
  • insert (55-66)
  • remove (82-87)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1)
  • prependAllPaths (369-418)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2)
  • updateDataSource (139-147)
  • getCredential (194-205)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (3)
  • DataStoreConfig (10-79)
  • Datastore (17-71)
  • S3Upload (63-69)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
  • isKnownUpload (175-178)
  • reserveUpload (192-230)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
  • reserveDataSourceUpload (130-139)
🪛 markdownlint-cli2 (0.17.2)
unreleased_changes/8912.md

8-8: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (19)
conf/webknossos.latest.routes (1)

290-290: Whitespace-only change — OK.

No functional impact.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DirectoryConstants.scala (1)

7-7: Constant addition looks good.

Consistent with hidden-dot convention for internal dirs.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1)

63-69: Config group addition — OK.

Fields align with application.conf.

app/models/dataset/DatasetService.scala (2)

101-102: Propagation of isVirtual — OK.

Correctly forwards to createDataset.


91-93: Signature change verified — call sites updated.

DatasetService.scala now requires isVirtual; the only invocation found (app/controllers/WKRemoteDataStoreController.scala) passes uploadInfo.isVirtual.getOrElse(false).

project/Dependencies.scala (1)

8-8: Pinning awsVersion and adding s3-transfer-manager — artifacts available.
Verified software.amazon.awssdk:s3 and software.amazon.awssdk:s3-transfer-manager v2.32.24 are present on Maven Central (HTTP 200).

app/controllers/WKRemoteDataStoreController.scala (1)

88-94: createPreliminaryDataset now accepts isVirtual; callsites verified

Signature in app/models/dataset/DatasetService.scala includes isVirtual: Boolean; the sole callsite found (app/controllers/WKRemoteDataStoreController.scala) passes uploadInfo.requireUniqueName.getOrElse(false) and uploadInfo.isVirtual.getOrElse(false), preserving the legacy default false — no other usages found.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)

415-507: Attachment path rewrites included in mapped(): LGTM.

Adding attachmentMapping and propagating it across all concrete copies is correct and unlocks URI rewrites for attachments alongside mags.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2)

70-74: Constructor wiring for DataStoreConfig is fine.

No issues; enables controller-level config decisions.


147-149: Manual uploads: forcing isVirtual = false is appropriate.

Matches the “on-disk” manual workflow.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (9)

384-386: Back-compat: datasetId missing in Redis for older reservations.

getDatasetIdByUploadId(uploadId) will fail for uploads reserved before this PR. Add a fallback (e.g., resolve via WK by directoryName) to let those finish.

Would you like me to add a DSRemoteWebknossosClient.findDatasetId(...) fallback here?


454-462: Path allowlist check for explored datasources: good guard.

Ensures URIs after exploration are permissible.


183-186: Staging dir for S3 uploads: OK.

Using ${uploadingDir}/$uploadToS3Dir/$uploadId isolates S3 prep; no functional issues spotted.

Also applies to: 586-599


531-539: Content-encoding workaround comment.

The checksum setting workaround for “aws-chunked” is pragmatic. Please confirm with a small integration test that uploaded objects are readable without special headers.
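
A minimal integration-style check, assuming AWS SDK v2's synchronous client; bucket and key are placeholders for an object uploaded through this flow:

```scala
import software.amazon.awssdk.services.s3.S3Client
import software.amazon.awssdk.services.s3.model.HeadObjectRequest

// HEAD the uploaded object and assert it was not stored with
// "aws-chunked" content encoding.
def assertNoAwsChunkedEncoding(s3: S3Client, bucket: String, key: String): Unit = {
  val head = s3.headObject(HeadObjectRequest.builder().bucket(bucket).key(key).build())
  val encoding = Option(head.contentEncoding())
  assert(!encoding.contains("aws-chunked"), s"$key stored with Content-Encoding $encoding")
}
```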


269-279: Chunk handling and bounds checks: solid.

Includes size guard via totalFileSizeInBytes and idempotent chunk writes.

Also applies to: 315-336


409-417: Delete local staging after S3 push: good.

Saves disk and avoids double-storage.


621-641: Layer linking rename switch is clean.

Covers all concrete layer types.


760-789: ZIP vs. raw layout handling is careful.

Truncating common prefixes and excluding known layer prefixes avoids extra nesting.
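
To illustrate the idea, a simplified sketch of wrapper-directory detection after unzipping; this is not the PR's actual code, and the names and layer-prefix rule are assumptions:

```scala
// If every unzipped entry lives under the same top-level directory and that
// directory is not itself a layer, treat it as a wrapper to strip, avoiding
// an extra nesting level.
def wrapperPrefix(entryPaths: Seq[String], layerNames: Set[String]): Option[String] = {
  val topLevelDirs = entryPaths.flatMap(_.split("/").headOption).distinct
  topLevelDirs match {
    case Seq(dir) if !layerNames.contains(dir) => Some(dir + "/")
    case _                                     => None
  }
}

// wrapperPrefix(Seq("myDs/color/1/z0.wkw", "myDs/datasource-properties.json"), Set("color"))
//   returns Some("myDs/"); entries would then be relocated one level up.
```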


855-861: Helper to parse JSON from Redis: OK.

Minimal and typed.


@fm3 fm3 left a comment


Thanks for diving into this! I had a first look and added some comments.

@normanrz maybe you would like to have a look as well, especially at some of the design decisions? Does this fit with our multi-bucket setup? Is it ok that we can’t show datastore-to-s3 upload progress to the user?

I did not yet do any testing.

Note that we’ll have to see what is merged first, this or #8844 – if the latter, we could use UPath stuff here as well.

Also, I guess the worker has to be adapted to write the output of the convert_to_wkw job to S3 as well. But maybe that can be deployed separately from this change?


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (2)

162-173: Public helper should handle regional virtual-hosted endpoints (bucket.s3.us-xyz-1.amazonaws.com).

hostBucketFromUri fails for regional virtual-hosted style; making it public risks propagating this limitation. Extract bucket using the first occurrence of ".s3" and normalize path handling.

Apply:

```diff
-  def hostBucketFromUri(uri: URI): Option[String] = {
-    val host = uri.getHost
-    if (isShortStyle(uri)) { // assume host is omitted from uri, shortcut form s3://bucket/key
-      Some(host)
-    } else if (isVirtualHostedStyle(uri)) {
-      Some(host.substring(0, host.length - ".s3.amazonaws.com".length))
-    } else if (isPathStyle(uri)) {
-      Some(uri.getPath.substring(1).split("/")(0))
-    } else {
-      None
-    }
-  }
+  def hostBucketFromUri(uri: URI): Option[String] = {
+    val host = Option(uri.getHost).getOrElse("")
+    if (isShortStyle(uri)) { // s3://bucket/key
+      Some(host)
+    } else if (isVirtualHostedStyle(uri)) { // bucket.s3[.region].amazonaws.com
+      val idx = host.indexOf(".s3")
+      if (idx > 0) Some(host.substring(0, idx)) else None
+    } else if (isPathStyle(uri)) { // s3[.region].amazonaws.com/bucket/key OR non-AWS host
+      Some(uri.getPath.stripPrefix("/").split("/")(0))
+    } else None
+  }
```

175-186: Broaden endpoint detection for AWS styles; escape regex dots.

Covers regional and dualstack variants; current endsWith(".s3.amazonaws.com") is too narrow.

Apply:

```diff
-  private def isVirtualHostedStyle(uri: URI): Boolean =
-    uri.getHost.endsWith(".s3.amazonaws.com")
+  private def isVirtualHostedStyle(uri: URI): Boolean =
+    uri.getHost.matches(""".+\.s3([.-][\w\-]+)?\.amazonaws\.com""")
 
-  private def isPathStyle(uri: URI): Boolean =
-    uri.getHost.matches("s3(.[\\w\\-_]+)?.amazonaws.com") ||
-      (!uri.getHost.contains("amazonaws.com") && uri.getHost.contains("."))
+  private def isPathStyle(uri: URI): Boolean =
+    uri.getHost.matches("""s3([.-][\w\-]+)?\.amazonaws\.com""") ||
+      (!uri.getHost.contains("amazonaws.com") && uri.getHost.contains("."))
```
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (4)

186-188: Exclude hidden staging dir from orphan cleanup.

.cloud under uploading/ can be mistaken for an orphan and deleted by cleanUpOrphanUploadsForOrga. Skip dot-prefixed dirs in that sweep.

Proposed check inside cleanUpOrphanUploadsForOrga before trying to delete:

  • if uploadDir.getFileName.toString.startsWith("."), continue.

510-523: Bucket/endpoint derivation assumptions are brittle.

  • s3UploadBucketOpt depends on credentialName being an S3 URI containing a bucket segment.

  • s3UploadEndpoint forces HTTPS; some S3-compatible stores are HTTP-only.

  • Validate credentialName format at startup and fail fast with a clear message (see the sketch after this list).

  • Consider reusing S3DataVault.getAmazonS3Client (with injected WSClient) for consistent endpoint/region handling, or make protocol configurable.
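
A sketch of such a fail-fast check, assuming credentialName is an S3-style URI whose first path segment is the bucket (as in the example config); the function name and error messages are illustrative:

```scala
import java.net.URI
import scala.util.Try

// Validate datastore.s3Upload.credentialName at startup: must be an s3:// URI
// with a non-empty bucket segment. Returns the bucket name on success.
def validateS3CredentialName(credentialName: String): Either[String, String] =
  for {
    uri <- Try(new URI(credentialName)).toEither.left.map(e => s"Not a valid URI: ${e.getMessage}")
    _ <- Either.cond(uri.getScheme == "s3", (), s"Expected s3:// scheme, got ${uri.getScheme}")
    bucket <- Option(uri.getPath)
      .map(_.stripPrefix("/"))
      .flatMap(_.split("/").headOption)
      .filter(_.nonEmpty)
      .toRight("No bucket segment found in path")
  } yield bucket

// validateS3CredentialName("s3://fsn1.your-objectstorage.com/webknossos-test/upload-tests/")
//   returns Right("webknossos-test")
```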


524-541: Region is hardcoded to us-east-1.

This can break signing for AWS regional endpoints. Either parse from host (like S3DataVault) or make it configurable.

Short-term: add a datastore.s3Upload.region with default us-east-1. Long-term: parse via AwsHostNameUtils.parseSigningRegion.
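
A minimal sketch of the short-term option, assuming a new (hypothetical) datastore.s3Upload.region config key read into an Option[String]:

```scala
import software.amazon.awssdk.regions.Region
import scala.util.Try

// Fall back to us-east-1 when the configured region is absent or blank.
def resolveRegion(configuredRegion: Option[String]): Region =
  configuredRegion.flatMap(r => Try(Region.of(r)).toOption).getOrElse(Region.US_EAST_1)

// Usage at client construction time (regionFromConfig is an assumed config read):
//   S3AsyncClient.builder()...region(resolveRegion(regionFromConfig))...
```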


575-594: Edge case: depth limit may keep unused files.

maxDepth = 10 may miss deep trees, leaving unreferenced files to be uploaded.

  • Consider unlimited traversal or a higher bound, or compute neededPaths first and filter by prefix without full recursion.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cc0509d and acc56ba.

📒 Files selected for processing (5)
  • conf/application.conf (1 hunks)
  • unreleased_changes/8912.md (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (12 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala
  • conf/application.conf
🧰 Additional context used
🧬 Code graph analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4)
  • DataStoreConfig (10-77)
  • Datastore (17-69)
  • S3Upload (63-67)
  • DataVaults (60-62)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5)
  • S3DataVault (45-149)
  • S3DataVault (151-244)
  • getCredential (140-140)
  • hostBucketFromUri (162-173)
  • create (152-160)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/UnfinishedUpload.scala (2)
  • UnfinishedUpload (7-16)
  • UnfinishedUpload (18-20)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4)
  • DSRemoteWebknossosClient (57-225)
  • updateDataSource (139-147)
  • reportDataSource (84-89)
  • getCredential (194-205)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
  • CredentialConfigReader (8-64)
🪛 markdownlint-cli2 (0.17.2)
unreleased_changes/8912.md

8-8: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: frontend-tests
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests

@frcroth frcroth removed their assignment Sep 17, 2025
@fm3 fm3 self-assigned this Sep 22, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)

511-531: Consider making S3 client initialization more robust

The current S3 client initialization could be improved to handle credential failures more gracefully.

Following up on the previous comment about credentials, consider using Box for better error handling:

```diff
-  private lazy val s3ClientBox: Box[S3AsyncClient] = for {
-    accessKeyId <- Box(s3UploadCredentialsOpt.map(_._1))
-    secretAccessKey <- Box(s3UploadCredentialsOpt.map(_._2))
-  } yield
+  private lazy val s3ClientBox: Box[S3AsyncClient] = 
+    s3UploadCredentialsOpt match {
+      case Some((accessKeyId, secretAccessKey)) if accessKeyId.nonEmpty && secretAccessKey.nonEmpty =>
+        Full(
           S3AsyncClient
             .builder()
             .credentialsProvider(
               StaticCredentialsProvider.create(
                 AwsBasicCredentials.builder.accessKeyId(accessKeyId).secretAccessKey(secretAccessKey).build()
               ))
             .crossRegionAccessEnabled(true)
             .forcePathStyle(true)
             .endpointOverride(s3UploadEndpoint)
             .region(Region.US_EAST_1)
             // Disabling checksum calculation prevents files being stored with Content Encoding "aws-chunked".
             .requestChecksumCalculation(RequestChecksumCalculation.WHEN_REQUIRED)
             .build()
+        )
+      case _ =>
+        Failure(s"S3 upload enabled but no valid credentials found for ${dataStoreConfig.Datastore.S3Upload.credentialName}")
+    }

   private lazy val transferManagerBox: Box[S3TransferManager] = for {
     client <- s3ClientBox
   } yield S3TransferManager.builder().s3Client(client).build()
```

Also, consider making the S3 region configurable instead of hardcoding Region.US_EAST_1.


562-580: Consider optimizing file filtering logic

The getNonReferencedFiles method could be improved for clarity and efficiency.

Consider extracting the implicit path construction logic:

```diff
 private def getNonReferencedFiles(dataDir: Path, dataSource: DataSource): Fox[List[Path]] =
   for {
     usableDataSource <- dataSource.toUsable.toFox ?~> "Data source is not usable"
-    explicitPaths: Set[Path] = usableDataSource.dataLayers
+    implicitPaths: Set[Path] = usableDataSource.dataLayers
       .flatMap(layer =>
         layer.mags.map(mag =>
           mag.path match {
             case Some(_) => None
             case None    => Some(dataDir.resolve(List(layer.name, mag.mag.toMagLiteral(true)).mkString("/")))
         }))
       .flatten
       .toSet
-    neededPaths = usableDataSource.dataLayers
+    explicitPaths = usableDataSource.dataLayers
       .flatMap(layer => layer.allExplicitPaths)
       .map(dataDir.resolve)
-      .toSet ++ explicitPaths
+      .toSet
+    neededPaths = explicitPaths ++ implicitPaths
     allFiles <- PathUtils.listFilesRecursive(dataDir, silent = true, maxDepth = 10).toFox
     filesToDelete = allFiles.filterNot(file => neededPaths.exists(neededPath => file.startsWith(neededPath)))
   } yield filesToDelete
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between acc56ba and 36533df.

📒 Files selected for processing (10)
  • app/controllers/WKRemoteDataStoreController.scala (1 hunks)
  • app/models/dataset/DatasetService.scala (1 hunks)
  • conf/application.conf (1 hunks)
  • project/Dependencies.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (3 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (3 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (12 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • conf/application.conf
  • app/controllers/WKRemoteDataStoreController.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala
  • app/models/dataset/DatasetService.scala
  • project/Dependencies.scala
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-23T08:51:57.756Z
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala:170-173
Timestamp: 2025-04-23T08:51:57.756Z
Learning: In the webknossos codebase, classes extending `FoxImplicits` have access to an implicit conversion from `Option[A]` to `Fox[A]`, where `None` is converted to an empty Fox that fails gracefully in for-comprehensions.

Applied to files:

  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
🧬 Code graph analysis (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (3)
  • DataStoreConfig (11-78)
  • Datastore (18-70)
  • S3Upload (64-68)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
  • isKnownUpload (164-167)
  • reserveUpload (181-219)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
  • reserveDataSourceUpload (129-138)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (3)
  • coordinateTransformations (85-85)
  • attachments (103-103)
  • defaultViewConfiguration (97-97)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/PathSchemes.scala (1)
  • PathSchemes (3-9)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
  • mapped (102-131)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (4)
  • PathUtils (14-14)
  • ensureDirectoryBox (124-132)
  • deleteDirectoryRecursively (186-193)
  • listFilesRecursive (104-108)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4)
  • DataStoreConfig (11-78)
  • Datastore (18-70)
  • S3Upload (64-68)
  • DataVaults (61-63)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5)
  • S3DataVault (46-155)
  • S3DataVault (157-250)
  • getCredential (143-143)
  • hostBucketFromUri (168-179)
  • create (158-166)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4)
  • DSRemoteWebknossosClient (56-222)
  • updateDataSource (138-144)
  • reportDataSource (83-88)
  • getCredential (191-202)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
  • CredentialConfigReader (8-64)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests
🔇 Additional comments (7)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)

102-129: LGTM! Well-structured attachment mapping support

The addition of the attachmentMapping parameter with a default identity function is clean and maintains backward compatibility. The implementation correctly applies the mapping to both StaticColorLayer and StaticSegmentationLayer branches.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (4)

345-414: Persist and use isVirtual from reservation to ensure consistency

The finishUpload method currently determines S3 vs. local storage using dataStoreConfig.Datastore.S3Upload.enabled at Line 353. This could lead to inconsistent behavior if the configuration changes between reservation and finish.


390-395: Replace with Instant.logSince for consistency

Use the utility method for consistent timing logs.


387-406: Add cleanup on S3 upload failure

If the S3 upload fails between lines 393-405, local temporary data remains and the pre-created dataset may linger without proper cleanup.


489-510: Fail fast if S3 credentials are missing

The lazy initialization of s3UploadCredentialsOpt at line 489 uses getOrElse(("", "")) pattern which silently creates a client with empty credentials, leading to confusing runtime failures.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)

99-105: Good: isVirtual set from config at reservation time

Setting shouldBeVirtual once and enforcing it via request.body.copy(isVirtual = Some(shouldBeVirtual)) is the right approach. The controller correctly captures the configuration state at reservation time.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1)

254-302: URI construction with null scheme creates invalid relative paths

The replaceUri function creates URIs with null schemes when prepending paths, which produces relative path strings that may not be properly handled by S3 clients and could cause path resolution issues.

Apply this refactor to properly handle URI construction and path normalization:

```diff
-  // Prepend newBasePath to all (relative) paths in mags and attachments of the data source.
-  def prependAllPaths(dataSource: DataSource, newBasePath: String): Fox[DataSource] = {
-    val replaceUri = (uri: URI) => {
-      val isRelativeFilePath = (uri.getScheme == null || uri.getScheme.isEmpty || uri.getScheme == PathSchemes.schemeFile) && !uri.isAbsolute
-      uri.getPath match {
-        case pathStr if isRelativeFilePath =>
-          new URI(uri.getScheme,
-                  uri.getUserInfo,
-                  uri.getHost,
-                  uri.getPort,
-                  newBasePath + pathStr,
-                  uri.getQuery,
-                  uri.getFragment)
-        case _ => uri
-      }
-    }
+  // Prepend newBasePath to all relative paths in mags and attachments of the data source.
+  def prependAllPaths(dataSource: DataSource, newBasePath: String): Fox[DataSource] = {
+    def ensureTrailingSlash(s: String): String = 
+      if (s.endsWith("/")) s else s + "/"
+    
+    def stripLeadingSlash(s: String): String = 
+      if (s.startsWith("/")) s.drop(1) else s
+    
+    val baseUri = new URI(ensureTrailingSlash(newBasePath))
+    
+    def prependToPath(pathStr: String): String = 
+      baseUri.resolve(stripLeadingSlash(pathStr)).toString
+    
+    def replaceUri(uri: URI): URI = {
+      val scheme = Option(uri.getScheme).getOrElse("")
+      // Only rewrite relative, non-opaque URIs (no scheme or file scheme)
+      if (scheme.isEmpty || scheme == PathSchemes.schemeFile) {
+        new URI(prependToPath(uri.getPath))
+      } else {
+        uri
+      }
+    }

    dataSource.toUsable match {
      case Some(usableDataSource) =>
        val updatedDataLayers = usableDataSource.dataLayers.map {
          case layerWithMagLocators: StaticLayer =>
            layerWithMagLocators.mapped(
              magMapping = mag =>
                mag.path match {
-                  case Some(pathStr) => mag.copy(path = Some(replaceUri(new URI(pathStr)).toString))
+                  case Some(pathStr) => mag.copy(path = Some(prependToPath(pathStr)))
                  // If the mag does not have a path, it is an implicit path, we need to make it explicit.
                  case _ =>
                    mag.copy(
                      path = Some(
-                        new URI(newBasePath)
-                          .resolve(List(layerWithMagLocators.name, mag.mag.toMagLiteral(true)).mkString("/"))
-                          .toString))
+                        baseUri.resolve(List(layerWithMagLocators.name, mag.mag.toMagLiteral(true)).mkString("/")).toString))
              },
              attachmentMapping = attachment =>
                DataLayerAttachments(
                  attachment.meshes.map(a => a.copy(path = replaceUri(a.path))),
                  attachment.agglomerates.map(a => a.copy(path = replaceUri(a.path))),
                  attachment.segmentIndex.map(a => a.copy(path = replaceUri(a.path))),
                  attachment.connectomes.map(a => a.copy(path = replaceUri(a.path))),
                  attachment.cumsum.map(a => a.copy(path = replaceUri(a.path)))
              )
            )
          case layer => layer
        }
        Fox.successful(usableDataSource.copy(dataLayers = updatedDataLayers))
      case None =>
        Fox.failure("Cannot replace paths of unusable datasource")
    }
  }
```


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
app/controllers/WKRemoteDataStoreController.scala (2)

207-224: Await dataset deletion and storage refresh.

Deletion is created as a Fox but not sequenced; effects may not run and errors won’t propagate. Use <- to await.

```diff
-                _ = datasetDAO
-                  .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0)
-                  .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset))
-              } yield ()
+                _ <- datasetDAO
+                      .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0)
+                      .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset))
+              } yield ()
```

269-279: Re-evaluate ignoring updates for virtual datasets.

With S3 uploads, datasets remain virtual but still need their DataSource persisted. Either keep the DB update in reportDatasetUpload (preferred) or allow this endpoint to update virtual datasets too.

```diff
-          _ <- Fox.runIf(!dataset.isVirtual)(
-            datasetDAO.updateDataSource(datasetId,
-                                        name,
-                                        request.body.hashCode(),
-                                        request.body,
-                                        isUsable = request.body.toUsable.isDefined)(GlobalAccessContext))
+          _ <- datasetDAO.updateDataSource(
+                 datasetId,
+                 name,
+                 request.body.hashCode(),
+                 request.body,
+                 isUsable = request.body.toUsable.isDefined
+               )(GlobalAccessContext)
```

If you keep the persistence in reportDatasetUpload, you can leave this route as-is but ensure no client relies on it for S3 uploads.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)

329-339: Fix nested Fox and ensure cleanup triggers on size violation.

Using map returns Fox[Fox[Unit]] and doesn’t sequence cleanup/failure. Use flatMap and propagate the error.

```diff
-  private def assertWithinRequestedFileSizeAndCleanUpOtherwise(uploadDir: Path, uploadId: String): Fox[Unit] =
+  private def assertWithinRequestedFileSizeAndCleanUpOtherwise(uploadDir: Path, uploadId: String): Fox[Unit] =
     for {
       totalFileSizeInBytesOpt <- runningUploadMetadataStore.find(redisKeyForTotalFileSizeInBytes(uploadId))
-      _ <- Fox.runOptional(totalFileSizeInBytesOpt) { maxFileSize =>
-        tryo(FileUtils.sizeOfDirectoryAsBigInteger(uploadDir.toFile).longValue).toFox.map(actualFileSize =>
-          if (actualFileSize > maxFileSize.toLong) {
-            cleanUpDatasetExceedingSize(uploadDir, uploadId)
-            Fox.failure(s"Uploaded dataset exceeds the maximum allowed size of $maxFileSize bytes")
-          } else Fox.successful(()))
-      }
+      _ <- Fox.runOptional(totalFileSizeInBytesOpt) { maxFileSize =>
+        for {
+          actualFileSize <- tryo(FileUtils.sizeOfDirectoryAsBigInteger(uploadDir.toFile).longValue).toFox
+          _ <- if (actualFileSize > maxFileSize.toLong) {
+                 cleanUpDatasetExceedingSize(uploadDir, uploadId)
+                   .flatMap(_ =>
+                     Fox.failure(s"Uploaded dataset exceeds the maximum allowed size of $maxFileSize bytes"))
+               } else Fox.successful(())
+        } yield ()
+      }
     } yield ()
```

232-241: Guard against empty filePathRaw before charAt(0).

If the path is empty, charAt(0) throws. Use nonEmpty && startsWith("/").

```diff
-      filePathRaw = uploadFileId.split("/").tail.mkString("/")
-      filePath = if (filePathRaw.charAt(0) == '/') filePathRaw.drop(1) else filePathRaw
+      filePathRaw = uploadFileId.split("/").tail.mkString("/")
+      filePath = if (filePathRaw.nonEmpty && filePathRaw.charAt(0) == '/') filePathRaw.drop(1) else filePathRaw
```
🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)

414-426: Honor reserved “isVirtual/uploadToS3” flag instead of live config.

This branch uses dataStoreConfig.Datastore.S3Upload.enabled. Persist and read the reservation’s isVirtual/uploadToS3 to avoid inconsistency if config flips.

Proposed shape:

  • Persist isVirtual at reserve time (e.g., redisKeyForIsVirtual(uploadId)).
  • Read it in finish and pass uploadToS3: Boolean into moveUnpackedToTarget.
  • Prefer stored flag; fallback to config only if missing.

I can provide a concrete patch if you confirm the intended key naming in Redis.
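
A minimal sketch of this shape; the Redis key name follows the surrounding redisKeyFor... helpers but is an assumption:

```scala
// Hypothetical helper mirroring the existing redisKeyFor... convention.
private def redisKeyForIsVirtual(uploadId: String): String =
  s"upload___${uploadId}___isVirtual"

// At reserve time:
//   runningUploadMetadataStore.insert(redisKeyForIsVirtual(uploadId), isVirtual.toString)
//
// At finish time, prefer the stored flag and fall back to config only if missing:
//   for {
//     storedOpt <- runningUploadMetadataStore.find(redisKeyForIsVirtual(uploadId))
//     uploadToS3 = storedOpt.map(_.toBoolean).getOrElse(dataStoreConfig.Datastore.S3Upload.enabled)
//   } yield uploadToS3
```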

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2fbe524 and 9e04d83.

📒 Files selected for processing (4)
  • app/controllers/DatasetController.scala (5 hunks)
  • app/controllers/WKRemoteDataStoreController.scala (6 hunks)
  • app/models/dataset/DatasetService.scala (4 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (18 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
app/controllers/DatasetController.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
  • LinkedLayerIdentifier (80-80)
  • LinkedLayerIdentifier (82-84)
app/models/storage/UsedStorageService.scala (2)
  • UsedStorageService (21-103)
  • refreshStorageReportForDataset (89-103)
app/models/dataset/DatasetService.scala (1)
  • trackNewDataset (558-570)
app/models/dataset/DatasetService.scala (8)
app/models/user/MultiUser.scala (1)
  • MultiUserDAO (39-181)
app/mail/MailchimpClient.scala (2)
  • MailchimpClient (17-82)
  • tagUser (45-54)
app/models/analytics/AnalyticsService.scala (3)
  • AnalyticsService (22-97)
  • track (38-46)
  • isSuperUser (103-108)
app/models/analytics/AnalyticsEvent.scala (1)
  • UploadDatasetEvent (205-218)
app/telemetry/SlackNotificationService.scala (2)
  • SlackNotificationService (11-59)
  • info (30-36)
app/utils/WkConf.scala (1)
  • WkConf (14-296)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (2)
  • UnusableDataSource (45-62)
  • UnusableDataSource (64-66)
app/models/dataset/DataStore.scala (1)
  • findOneByName (118-125)
app/controllers/WKRemoteDataStoreController.scala (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
  • deleteDataset (138-143)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (6)
  • LinkedLayerIdentifier (80-80)
  • LinkedLayerIdentifier (82-84)
  • ReportDatasetUploadParameters (69-74)
  • ReportDatasetUploadParameters (75-78)
  • ReserveAdditionalInformation (63-63)
  • ReserveAdditionalInformation (64-67)
util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
  • validateJson (204-209)
app/models/dataset/DataStore.scala (1)
  • validateAccess (84-91)
app/security/WebknossosBearerTokenAuthenticatorService.scala (1)
  • userForToken (76-84)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (10)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4)
  • DataStoreConfig (11-78)
  • Datastore (18-70)
  • S3Upload (64-68)
  • DataVaults (61-63)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5)
  • S3DataVault (46-155)
  • S3DataVault (157-250)
  • getCredential (143-143)
  • hostBucketFromUri (168-179)
  • create (158-166)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/explore/ExploreLocalLayerService.scala (1)
  • ExploreLocalLayerService (21-152)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
  • UPath (54-96)
  • fromLocalPath (80-80)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/UnfinishedUpload.scala (2)
  • UnfinishedUpload (7-16)
  • UnfinishedUpload (18-20)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4)
  • DSRemoteWebknossosClient (57-216)
  • reportUpload (98-105)
  • getCredential (185-196)
  • deleteDataset (138-143)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2)
  • DataSourceService (25-307)
  • resolvePathsInNewBasePath (253-275)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
  • CredentialConfigReader (8-64)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2)
  • finishUpload (185-196)
  • deleteOnDisk (390-414)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
  • deleteOnDisk (28-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: backend-tests
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests
🔇 Additional comments (7)
app/models/dataset/DatasetService.scala (2)

88-97: Preliminary datasets as virtual: OK.

Creating preliminary datasets as virtual aligns with S3 uploads and later activation via report path.


558-569: Good tracking side-effects placement.

Slack, analytics, and Mailchimp tagging gated on needsConversion look correct.

app/controllers/DatasetController.scala (2)

650-665: Nice: finalize “upload to paths” updates storage and tracking.

Refreshing used storage and calling trackNewDataset after marking usable is appropriate.


140-148: Injecting UsedStorageService here is appropriate.

The controller’s responsibilities are clear and stay thin.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)

520-531: Preserve scheme and port for custom S3 endpoints.

Hardcoding https and omitting the port breaks MinIO and non‑default endpoints.

```diff
-  private lazy val s3UploadEndpoint: URI = {
-    val credentialUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
-    new URI(
-      "https",
-      null,
-      credentialUri.getHost,
-      -1,
-      null,
-      null,
-      null
-    )
-  }
+  private lazy val s3UploadEndpoint: URI =
+    new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
```

305-307: Fix log typo.

“datsetId” → “datasetId”.

```diff
-              s"Error receiving chunk $currentChunkNumber for uploadId $uploadId (datsetId $datasetId): ${e.getMessage}"
+              s"Error receiving chunk $currentChunkNumber for uploadId $uploadId (datasetId $datasetId): ${e.getMessage}"
```
app/controllers/WKRemoteDataStoreController.scala (1)

133-151: Persist final DataSource and mark dataset usable in reportDatasetUpload (no‑conversion path).

Currently nothing is written to the DB, so virtual datasets never become usable and paths aren’t stored. Update the dataset when needsConversion is false and require dataSourceOpt.

Apply this diff:

```diff
-  def reportDatasetUpload(name: String,
-                          key: String,
-                          token: String,
-                          datasetId: ObjectId): Action[ReportDatasetUploadParameters] =
-    Action.async(validateJson[ReportDatasetUploadParameters]) { implicit request =>
-      dataStoreService.validateAccess(name, key) { _ =>
+  def reportDatasetUpload(name: String,
+                          key: String,
+                          token: String,
+                          datasetId: ObjectId): Action[ReportDatasetUploadParameters] =
+    Action.async(validateJson[ReportDatasetUploadParameters]) { implicit request =>
+      dataStoreService.validateAccess(name, key) { dataStore =>
         for {
           user <- bearerTokenService.userForToken(token)
           dataset <- datasetDAO.findOne(datasetId)(GlobalAccessContext) ?~> Messages("dataset.notFound", datasetId) ~> NOT_FOUND
-          _ <- Fox.runIf(!request.body.needsConversion)(usedStorageService.refreshStorageReportForDataset(dataset))
+          // Persist final DataSource and mark usable when no conversion is needed
+          _ <- Fox.runIf(!request.body.needsConversion) {
+            for {
+              dataSource <- request.body.dataSourceOpt.toFox ?~> "reportUpload.dataSourceMissing"
+              _ <- datasetDAO.updateDataSource(
+                     datasetId,
+                     dataStore.name,
+                     dataSource.hashCode(),
+                     dataSource,
+                     isUsable = dataSource.toUsable.isDefined
+                   )(GlobalAccessContext)
+            } yield ()
+          }
           _ <- datasetService.trackNewDataset(dataset,
                                               user,
                                               request.body.needsConversion,
                                               request.body.datasetSizeBytes,
                                               viaAddRoute = false)
+          _ <- Fox.runIf(!request.body.needsConversion)(usedStorageService.refreshStorageReportForDataset(dataset))
           // TODO update dataset in db with layersToLink
         } yield Ok
       }
    }
```


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9e04d83 and d09c5f1.

📒 Files selected for processing (5)
  • conf/webknossos.versioned.routes (2 hunks)
  • util/src/main/scala/com/scalableminds/util/mvc/ApiVersioning.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/LegacyController.scala (3 hunks)
  • webknossos-datastore/conf/datastore.versioned.routes (6 hunks)
  • webknossos-tracingstore/conf/tracingstore.versioned.routes (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/LegacyController.scala
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: frontend-tests
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
🔇 Additional comments (3)
conf/webknossos.versioned.routes (1)

7-22: Routing bump to v12 looks correct.

The changelog note and the new /v12/ mapping both line up with the datasetId-based upload flow and keep newest routes prioritized.

webknossos-tracingstore/conf/tracingstore.versioned.routes (1)

3-3: Confirm ApiVersioning trait update

Line [3]: The header reminds us this list must stay aligned with com.scalableminds.util.mvc.ApiVersioning. Please double-check that v12 is now reported there (and in any other version registry) so clients get a consistent version story.

util/src/main/scala/com/scalableminds/util/mvc/ApiVersioning.scala (1)

8-8: API version bump looks good.

Thanks for keeping the versioning metadata in sync with the new API surface introduced in this PR.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (5)
app/controllers/DatasetController.scala (1)

236-241: Track dataset creation after addVirtualDataset

Hooking analytics and notifications here is good. Consider viaAddRoute = true for add routes to reflect origin in Slack text.

-        _ <- datasetService.trackNewDataset(dataset,
-                                            user,
-                                            needsConversion = false,
-                                            datasetSizeBytes = 0,
-                                            viaAddRoute = false)
+        _ <- datasetService.trackNewDataset(dataset,
+                                            user,
+                                            needsConversion = false,
+                                            datasetSizeBytes = 0,
+                                            viaAddRoute = true)
app/models/dataset/LayerToLinkService.scala (1)

22-31: Validate layer existence early

Fail fast if the referenced layer does not exist in the source dataset. This catches mistakes at reservation time.

   def validateLayerToLink(layerIdentifier: LinkedLayerIdentifier,
                           requestingUser: User)(implicit ec: ExecutionContext, m: MessagesProvider): Fox[Unit] =
     for {
       dataset <- datasetDAO.findOne(layerIdentifier.datasetId)(AuthorizedAccessContext(requestingUser)) ?~> Messages(
         "dataset.notFound",
         layerIdentifier.datasetId) ~> NOT_FOUND
       isTeamManagerOrAdmin <- userService.isTeamManagerOrAdminOfOrg(requestingUser, dataset._organization)
       _ <- Fox.fromBool(isTeamManagerOrAdmin || requestingUser.isDatasetManager || dataset.isPublic) ?~> "dataset.upload.linkRestricted"
+      _ <- datasetService
+             .usableDataSourceFor(dataset)
+             .flatMap(uds => uds.dataLayers.find(_.name == layerIdentifier.layerName).toFox ?~> "dataset.layerToLink.layerNotFound")
     } yield ()
app/models/dataset/DatasetService.scala (1)

559-568: Slack message content – nit

Optional: include dataset name in the message for easier triage.

-      _ = slackNotificationService.info(s"Dataset added $addLabel$superUserLabel",
-                                        s"For organization: ${organization.name}. <$resultLink|Result>")
+      _ = slackNotificationService.info(s"Dataset added $addLabel$superUserLabel",
+                                        s"For organization: ${organization.name}, dataset: $datasetId. <$resultLink|Result>")
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)

539-555: Region hardcoded to US_EAST_1

Make region configurable (or infer from endpoint/credential) to support non-US regions.

-        .region(Region.US_EAST_1)
+        .region(
+          Option(System.getProperty("WK_S3_REGION"))
+            .flatMap(r => Try(Region.of(r)).toOption)
+            .getOrElse(Region.US_EAST_1)
+        )

Outside this file, consider adding a DataStoreConfig.Datastore.S3Upload.region option and read it here.
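
A sketch of that config-driven variant; the regionOpt accessor is hypothetical (it would correspond to the suggested new config key), while Region.of and Region.US_EAST_1 are real AWS SDK v2 API:

import scala.util.Try
import software.amazon.awssdk.regions.Region

// Falls back to US_EAST_1 when the assumed datastore.s3Upload.region key is absent or invalid
private lazy val s3UploadRegion: Region =
  dataStoreConfig.Datastore.S3Upload.regionOpt // hypothetical Option[String] config accessor
    .flatMap(raw => Try(Region.of(raw)).toOption)
    .getOrElse(Region.US_EAST_1)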


823-829: Remove persisted isVirtual on cleanup

If you persist isVirtual, also delete it in removeFromRedis.

       _ <- runningUploadMetadataStore.remove(redisKeyForDataSourceId(uploadId))
       _ <- runningUploadMetadataStore.remove(redisKeyForDatasetId(uploadId))
+      _ <- runningUploadMetadataStore.remove(redisKeyForIsVirtual(uploadId))
       _ <- runningUploadMetadataStore.remove(redisKeyForLinkedLayerIdentifier(uploadId))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d09c5f1 and 0f9e165.

📒 Files selected for processing (11)
  • app/controllers/DatasetController.scala (7 hunks)
  • app/controllers/WKRemoteDataStoreController.scala (5 hunks)
  • app/models/dataset/ComposeService.scala (1 hunks)
  • app/models/dataset/DatasetService.scala (4 hunks)
  • app/models/dataset/DatasetUploadToPathsService.scala (3 hunks)
  • app/models/dataset/LayerToLinkService.scala (1 hunks)
  • conf/application.conf (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (7 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (18 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/slacknotification/DSSlackNotificationService.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • conf/application.conf
🧰 Additional context used
🧬 Code graph analysis (9)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/slacknotification/DSSlackNotificationService.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/slacknotification/SlackClient.scala (1)
  • warn (18-21)
app/controllers/DatasetController.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
  • LinkedLayerIdentifier (80-80)
  • LinkedLayerIdentifier (82-84)
app/models/storage/UsedStorageService.scala (2)
  • UsedStorageService (21-103)
  • refreshStorageReportForDataset (89-103)
app/models/dataset/DatasetService.scala (1)
  • trackNewDataset (547-559)
app/models/dataset/LayerToLinkService.scala (5)
util/src/main/scala/com/scalableminds/util/accesscontext/AccessContext.scala (2)
  • AuthorizedAccessContext (19-21)
  • DBAccessContext (31-37)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
  • LinkedLayerIdentifier (80-80)
  • LinkedLayerIdentifier (82-84)
app/models/user/UserService.scala (2)
  • UserService (32-435)
  • isTeamManagerOrAdminOfOrg (321-326)
app/models/dataset/DatasetService.scala (1)
  • usableDataSourceFor (383-389)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (3)
  • toFox (14-14)
  • successful (53-56)
  • failure (58-62)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (8)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (3)
  • PathUtils (14-14)
  • deleteDirectoryRecursively (186-193)
  • ensureDirectoryBox (124-132)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4)
  • DataStoreConfig (11-78)
  • Datastore (18-70)
  • S3Upload (64-68)
  • DataVaults (61-63)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5)
  • S3DataVault (46-155)
  • S3DataVault (157-250)
  • getCredential (143-143)
  • hostBucketFromUri (168-179)
  • create (158-166)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/explore/ExploreLocalLayerService.scala (1)
  • ExploreLocalLayerService (21-152)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
  • UPath (54-96)
  • fromLocalPath (80-80)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/UnfinishedUpload.scala (2)
  • UnfinishedUpload (7-16)
  • UnfinishedUpload (18-20)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4)
  • DSRemoteWebknossosClient (57-216)
  • reportUpload (98-105)
  • getCredential (185-196)
  • deleteDataset (138-143)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
  • CredentialConfigReader (8-64)
app/models/dataset/DatasetUploadToPathsService.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
  • LinkedLayerIdentifier (80-80)
  • LinkedLayerIdentifier (82-84)
app/controllers/DatasetController.scala (2)
  • ReserveDatasetUploadToPathsRequest (59-67)
  • ReserveDatasetUploadToPathsRequest (69-71)
app/models/dataset/LayerToLinkService.scala (1)
  • addLayersToLinkToDataSource (32-42)
app/controllers/WKRemoteDataStoreController.scala (10)
util/src/main/scala/com/scalableminds/util/accesscontext/AccessContext.scala (3)
  • AuthorizedAccessContext (19-21)
  • DBAccessContext (31-37)
  • GlobalAccessContext (25-29)
util/src/main/scala/com/scalableminds/util/objectid/ObjectId.scala (2)
  • ObjectId (11-13)
  • ObjectId (15-66)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (4)
  • DataSource (31-43)
  • UnusableDataSource (45-62)
  • UnusableDataSource (64-66)
  • isUsable (20-20)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSourceStatus.scala (1)
  • DataSourceStatus (3-12)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
  • ReportDatasetUploadParameters (69-74)
  • ReportDatasetUploadParameters (75-78)
app/models/dataset/LayerToLinkService.scala (2)
  • validateLayerToLink (22-32)
  • addLayersToLinkToDataSource (32-42)
app/models/dataset/DatasetService.scala (2)
  • createVirtualDataset (88-110)
  • trackNewDataset (547-559)
util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
  • validateJson (204-209)
app/security/WebknossosBearerTokenAuthenticatorService.scala (1)
  • userForToken (76-84)
app/models/dataset/Dataset.scala (3)
  • findOne (203-210)
  • updateDataSource (633-655)
  • deleteDataset (702-730)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/MappingProvider.scala (3)
  • exploreMappings (32-43)
  • MappingProvider (11-24)
  • MappingProvider (26-44)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (2)
  • UnusableDataSource (45-62)
  • UnusableDataSource (64-66)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSourceStatus.scala (1)
  • DataSourceStatus (3-12)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
  • mapped (104-133)
util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1)
  • toMagLiteral (40-43)
app/models/dataset/DatasetService.scala (8)
app/models/user/MultiUser.scala (1)
  • MultiUserDAO (39-181)
app/mail/MailchimpClient.scala (2)
  • MailchimpClient (17-82)
  • tagUser (45-54)
app/mail/MailchimpTag.scala (1)
  • MailchimpTag (5-22)
app/models/analytics/AnalyticsService.scala (2)
  • AnalyticsService (22-97)
  • track (38-46)
app/models/analytics/AnalyticsEvent.scala (1)
  • UploadDatasetEvent (205-218)
app/telemetry/SlackNotificationService.scala (2)
  • SlackNotificationService (11-59)
  • info (30-36)
app/utils/WkConf.scala (1)
  • WkConf (14-296)
app/models/dataset/DataStore.scala (1)
  • findOneByName (118-125)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (5)
  • webknossos (70-72)
  • UserAccessRequest (30-30)
  • UserAccessRequest (31-72)
  • writeDataset (52-55)
  • deleteDataset (49-52)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/slacknotification/DSSlackNotificationService.scala (2)
  • DSSlackNotificationService (8-31)
  • noticeFailedFinishUpload (26-31)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2)
  • reserveDataSourceUpload (123-132)
  • deleteDataset (138-143)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (6)
  • reserveUpload (167-205)
  • getDatasetIdByUploadId (164-167)
  • extractDatasetUploadId (156-156)
  • finishUpload (341-385)
  • cancelUpload (313-327)
  • isKnownUpload (153-156)
util/src/main/scala/com/scalableminds/util/requestlogging/RequestLogging.scala (1)
  • log (55-62)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: backend-tests
  • GitHub Check: build-smoketest-push
🔇 Additional comments (24)
app/models/dataset/ComposeService.scala (1)

51-52: Folder ID type change looks good.

Switching to ObjectId keeps createVirtualDataset aligned with the new API surface. No issues spotted.

app/models/dataset/DatasetUploadToPathsService.scala (1)

41-84: Delegating to LayerToLinkService looks correct.

The new dependency injection and the replacement of the inlined linking logic with layerToLinkService.addLayersToLinkToDataSource keep the flow coherent with the rest of the refactor.
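
A minimal sketch of what that delegated call plausibly does, based on the merge and duplicate-name behavior described later in this review; Fox.serialCombined, resolveLayerToLink, and the exact signature are assumptions:

def addLayersToLinkToDataSource(dataSource: UsableDataSource, layersToLink: Seq[LinkedLayerIdentifier])(
    implicit ctx: DBAccessContext,
    ec: ExecutionContext): Fox[UsableDataSource] =
  for {
    // Resolve each identifier to a concrete layer of the (access-checked) source dataset
    resolvedLayers <- Fox.serialCombined(layersToLink.toList)(resolveLayerToLink) // assumed helper
    allLayers = dataSource.dataLayers ++ resolvedLayers
    // Reject the merge if two layers would share a name afterwards
    _ <- Fox.fromBool(allLayers.map(_.name).distinct.length == allLayers.length) ?~> "dataset.duplicateLayerNames"
  } yield dataSource.copy(dataLayers = allLayers)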

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (4)

97-105: New reservation flow is consistent.

Fetching ReserveAdditionalInformation once and passing it through to UploadService.reserveUpload keeps the data flow sound.


153-169: Chunk-upload auth change looks good.

Ensuring the caller has writeDataset on the resolved dataset ID before accepting chunks aligns with the dataset-centric permissions.


200-214: Cancel flow now guards on dataset ID—looks good.

Pulling the dataset ID first and switching access control to deleteDataset matches the dataset-based authorisation model.


399-411: Remote delete even when not on disk makes sense.

Calling deleteDataset on WK irrespective of local presence keeps remote state tidy; the log message clarifies the local situation.

app/controllers/DatasetController.scala (4)

141-142: Good: usedStorageService injected where needed

Wiring UsedStorageService into the controller enables post-upload report refreshes in one place.


120-124: API type change: folderId is now ObjectId

This is a breaking change for clients sending folderId as string. Confirm all callers (FE, datastore) and route versions are adapted.


209-215: Passing folderIdOpt directly is correct

The explore+add flow now feeds Option[ObjectId] end-to-end. Looks good.


656-665: Finish upload-to-paths: refresh usage before tracking – LGTM

The order (mark usable -> refresh report -> track) is sensible.

app/models/dataset/LayerToLinkService.scala (2)

32-41: Merging linked layers and duplicate-name check – LGTM

Combining resolved layers with existing ones and guarding against duplicates is correct.


42-57: Resolution logic is sound

Uses Authorized/DB contexts correctly, copies layer data immutably, and supports both color/segmentation.

app/models/dataset/DatasetService.scala (3)

88-108: createVirtualDataset: type and access updates – LGTM

  • Accepting DataSource (usable/unusable) is flexible.
  • Asserting folder write access and using folder fallback is correct.

103-108: Use withUpdatedId on DataSource – LGTM

Ensures directoryName + organization are bound to the new dataset id.


547-558: New tracking hook is appropriate

Encapsulating Slack/Mailchimp/Analytics side-effects is clean. No blocking on analytics is good.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (7)

46-61: ReserveUploadInformation: new fields – LGTM

Carrying layersToLink/requireUniqueName/isVirtual is aligned with the new flow.


63-67: ReserveAdditionalInformation: LGTM

Dataset id + directory name is all the datastore needs.


69-78: ReportDatasetUploadParameters shape – LGTM

Encodes the dataset-centric report with optional dataSource and links.


167-203: Write isVirtual to Redis during reserve()

Persist the decision made at reservation. Use it later in finish().

       newDataSourceId = DataSourceId(reserveUploadAdditionalInfo.directoryName, reserveUploadInfo.organization)
@@
       _ <- runningUploadMetadataStore.insert(
         redisKeyForDataSourceId(reserveUploadInfo.uploadId),
         Json.stringify(Json.toJson(newDataSourceId))
       )
+      // Persist isVirtual decision for consistent finish behavior
+      _ <- {
+        val isVirtual = reserveUploadInfo.isVirtual.getOrElse(dataStoreConfig.Datastore.S3Upload.enabled)
+        runningUploadMetadataStore.insert(redisKeyForIsVirtual(reserveUploadInfo.uploadId), isVirtual.toString)
+      }
@@
       _ = logger.info(
         f"Reserving dataset upload ${reserveUploadInfo.uploadId} for dataset ${reserveUploadAdditionalInfo.newDatasetId} ($newDataSourceId)...")

140-146: Persist isVirtual alongside upload metadata

Currently, you don’t store the isVirtual decision at reserve time. If config flips between reserve/finish or a restart occurs, uploads can finalize inconsistently.

Apply:

   private def redisKeyForUploadId(datasourceId: DataSourceId): String =
     s"upload___${Json.stringify(Json.toJson(datasourceId))}___datasourceId"
+  private def redisKeyForIsVirtual(uploadId: String): String =
+    s"upload___${uploadId}___isVirtual"

403-444: moveUnpackedToTarget: parameterize S3 decision and use it instead of live config

Use the computed uploadToS3 boolean, not the current configuration, to avoid inconsistency.

-  private def moveUnpackedToTarget(unpackedDir: Path,
-                                   needsConversion: Boolean,
-                                   datasetId: ObjectId,
-                                   dataSourceId: DataSourceId): Fox[Option[UsableDataSource]] =
+  private def moveUnpackedToTarget(unpackedDir: Path,
+                                   needsConversion: Boolean,
+                                   uploadToS3: Boolean,
+                                   datasetId: ObjectId,
+                                   dataSourceId: DataSourceId): Fox[Option[UsableDataSource]] =
@@
-        newBasePath <- if (dataStoreConfig.Datastore.S3Upload.enabled) {
+        newBasePath <- if (uploadToS3) {
           for {
             s3UploadBucket <- s3UploadBucketOpt.toFox
             beforeS3Upload = Instant.now
             s3ObjectKey = s"${dataStoreConfig.Datastore.S3Upload.objectKeyPrefix}/${dataSourceId.organizationId}/${dataSourceId.directoryName}/"
             _ <- uploadDirectoryToS3(unpackedDir, s3UploadBucket, s3ObjectKey)
             _ = Instant.logSince(beforeS3Upload,
                                  s"Forwarding of uploaded of dataset $datasetId ($dataSourceId) to S3",
                                  logger)
-            endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost
+            endPointHost = s3UploadEndpoint.getHost
             newBasePath <- UPath.fromString(s"s3://$endPointHost/$s3UploadBucket/$s3ObjectKey").toFox
           } yield newBasePath
         } else {

523-537: Endpoint override drops scheme/port; preserve original endpoint

Construct s3UploadEndpoint from the configured URI to keep http/https and custom port (e.g., MinIO).

-  private lazy val s3UploadEndpoint: URI = {
-    val credentialUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
-    new URI(
-      "https",
-      null,
-      credentialUri.getHost,
-      -1,
-      null,
-      null,
-      null
-    )
-  }
+  private lazy val s3UploadEndpoint: URI =
+    new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
app/controllers/WKRemoteDataStoreController.scala (2)

61-90: Reserve dataset: validation and prelim creation – LGTM

  • Validates linked layers with access checks.
  • Creates a virtual dataset with an unusable placeholder, returning ids to the datastore.

206-219: Delete dataset: refresh used storage after delete – LGTM

Marks deleted (or hard-deletes) and refreshes usage; good cleanup flow.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
app/models/dataset/ComposeService.scala (1)

62-70: Reintroduce the StaticLayer guard to prevent MatchError

mapped is implemented via a this match { case StaticColorLayer | StaticSegmentationLayer => ... }, so calling it on a non-static DataLayer (e.g., EditableMappingLayer) now blows up with a MatchError. The earlier version short-circuited those cases; this change regresses into a runtime crash. Let’s keep the defensive pattern match.

-      applyCoordinateTransformations = (cOpt: Option[List[CoordinateTransformation]]) =>
-        cOpt match {
-          case Some(c) => Some(c ++ composeLayer.transformations.toList)
-          case None    => Some(composeLayer.transformations.toList)
-      }
-      editedLayer = layer.mapped(name = composeLayer.newName,
-                                 coordinateTransformations =
-                                   applyCoordinateTransformations(layer.coordinateTransformations))
-    } yield editedLayer
+      applyCoordinateTransformations = (cOpt: Option[List[CoordinateTransformation]]) =>
+        cOpt match {
+          case Some(c) => Some(c ++ composeLayer.transformations.toList)
+          case None    => Some(composeLayer.transformations.toList)
+        }
+      editedLayer <- layer match {
+        case staticLayer: StaticLayer =>
+          Fox.successful(
+            staticLayer.mapped(
+              name = composeLayer.newName,
+              coordinateTransformations =
+                applyCoordinateTransformations(staticLayer.coordinateTransformations)
+            )
+          )
+        case _ =>
+          Fox.failure("Only static layers can be composed")
+      }
+    } yield editedLayer
app/models/dataset/DatasetUploadToPathsService.scala (1)

228-233: Bug: reserveAttachmentUploadToPath regenerates directoryName and ignores uploadToPathsInfixOpt
File: app/models/dataset/DatasetUploadToPathsService.scala Lines ~211–218

It calls generateDirectoryName on dataset.directoryName instead of reusing the stored name, and omits uploadToPathsInfixOpt, so attachments land under a different path than layers.

Apply this diff to mirror addPathsToDatasource:
-      newDirectoryName = datasetService.generateDirectoryName(dataset.directoryName, dataset._id)
-      datasetPath = uploadToPathsPrefix / dataset._organization / newDirectoryName
+      orgaDir = uploadToPathsPrefix / dataset._organization
+      datasetParent = uploadToPathsInfixOpt.map(infix => orgaDir / infix).getOrElse(orgaDir)
+      datasetPath = datasetParent / dataset.directoryName
       attachmentPath = generateAttachmentPath(parameters.attachmentName,
                                               parameters.attachmentDataformat,
                                               parameters.attachmentType,
                                               datasetPath / parameters.layerName)
🧹 Nitpick comments (2)
app/models/dataset/DatasetUploadToPathsService.scala (2)

139-142: Normalize requested path prefix before whitelist check.

If clients pass a logically identical but differently-normalized UPath (e.g., relative vs absolute), this equality test may reject it.

Apply this diff:

-        case Some(requested) =>
-          if (configuredPrefixes.contains(requested)) Full(requested)
+        case Some(requested) =>
+          val requestedAbs = requested.toAbsolute
+          if (configuredPrefixes.contains(requestedAbs)) Full(requestedAbs)
           else Failure("Requested path prefix is not in list of configured ones.")

146-156: DRY: centralize dataset parent path construction (optional).

Both addPathsToDatasource and reserveAttachmentUploadToPath compute the same orgaDir/infix logic. Extract a helper to reduce drift.

Add this helper:

private def datasetParentFor(prefix: UPath, organizationId: String): UPath = {
  val orgaDir = prefix / organizationId
  uploadToPathsInfixOpt.map(infix => orgaDir / infix).getOrElse(orgaDir)
}

Then use:

  • In addPathsToDatasource: replace orgaDir/datasetParent with datasetParentFor(uploadToPathsPrefix, organizationId)
  • In reserveAttachmentUploadToPath: replace with datasetParentFor(uploadToPathsPrefix, dataset._organization)

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f9e165 and a509ceb.

📒 Files selected for processing (4)
  • app/models/dataset/ComposeService.scala (2 hunks)
  • app/models/dataset/DatasetUploadToPathsService.scala (3 hunks)
  • conf/application.conf (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • conf/application.conf
🧰 Additional context used
🧬 Code graph analysis (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (2)
  • webknossos (70-72)
  • validateAccessFromTokenContext (87-94)
util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
  • validateJson (204-209)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (1)
  • fromFuture (42-53)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
  • getDatasetId (206-216)
app/models/dataset/ComposeService.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
  • mapped (104-133)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (1)
  • coordinateTransformations (85-85)
app/models/dataset/DatasetUploadToPathsService.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
  • LinkedLayerIdentifier (80-80)
  • LinkedLayerIdentifier (82-84)
app/models/dataset/LayerToLinkService.scala (1)
  • addLayersToLinkToDataSource (32-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: backend-tests
  • GitHub Check: frontend-tests
  • GitHub Check: build-smoketest-push
🔇 Additional comments (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (1)

133-143: Legacy manual uploads must set isVirtual = false

Leaving isVirtual as None makes manual uploads inherit the new default (true), so data that lives on disk is marked virtual and never served correctly. Explicitly pass Some(false) here.

-              Some(request.body.requireUniqueName),
-              None
+              Some(request.body.requireUniqueName),
+              Some(false)

app/models/dataset/DatasetUploadToPathsService.scala (4)

23-23: Good: import LinkedLayerIdentifier from uploading module.

This aligns with the new public surface in the datastore uploading package.


26-28: Good: switched to public request types.

Using ReserveDatasetUploadToPaths(ForPreliminary)Request from controllers is consistent with API versioning plans.


41-41: Good: delegate layer linking via injected LayerToLinkService.

Cleaner separation of concerns; avoids duplicating resolve/link logic here.


67-69: Good: delegate layersToLink handling to LayerToLinkService.

This ensures duplicate-name checks and resolution are centralized.

Please confirm i18n keys used inside LayerToLinkService ("dataset.layerToLink.failed", "dataset.duplicateLayerNames") exist in messages files for all supported locales.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)

330-341: Bug: use flatMap instead of map to avoid nested Fox and ensure cleanup runs

Current map returns Fox[Fox[Unit]] and likely skips size enforcement or fails type inference. Use flatMap.

-      _ <- Fox.runOptional(totalFileSizeInBytesOpt) { maxFileSize =>
-        tryo(FileUtils.sizeOfDirectoryAsBigInteger(uploadDir.toFile).longValue).toFox.map(actualFileSize =>
-          if (actualFileSize > maxFileSize.toLong) {
-            cleanUpDatasetExceedingSize(uploadDir, uploadId)
-            Fox.failure(s"Uploaded dataset exceeds the maximum allowed size of $maxFileSize bytes")
-          } else Fox.successful(()))
-      }
+      _ <- Fox.runOptional(totalFileSizeInBytesOpt) { maxFileSize =>
+        tryo(FileUtils.sizeOfDirectoryAsBigInteger(uploadDir.toFile).longValue).toFox.flatMap { actualFileSize =>
+          if (actualFileSize > maxFileSize.toLong) {
+            cleanUpDatasetExceedingSize(uploadDir, uploadId)
+          } else Fox.successful(())
+        }
+      }
frontend/javascripts/admin/dataset/dataset_upload_view.tsx (1)

334-347: Add needsConversion to ReserveUploadInformation TS type and backend
The ReserveUploadInformation type in frontend/javascripts/admin/rest_api.ts (currently at lines 1136–1146) must include a needsConversion?: boolean field, and the corresponding backend DTO needs to accept it and treat it as optional.
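
On the backend side that means an optional field on the DTO so older clients that omit it still deserialize. A trimmed-down sketch of the ReserveUploadInformation case class referenced in this review (the real one spans lines 46-59 and carries more fields; Json.format is assumed to be how its companion already derives the format):

import play.api.libs.json.{Json, OFormat}

case class ReserveUploadInformation(
    uploadId: String,
    organization: String,
    layersToLink: Option[List[LinkedLayerIdentifier]],
    requireUniqueName: Option[Boolean],
    isVirtual: Option[Boolean],
    needsConversion: Option[Boolean] // new, optional: absent means "not specified"
)

object ReserveUploadInformation {
  implicit val jsonFormat: OFormat[ReserveUploadInformation] = Json.format[ReserveUploadInformation]
}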

🧹 Nitpick comments (5)
app/models/dataset/DatasetService.scala (2)

88-99: Unfinished uploads query: add guardrails for scale and configurability.

  • Consider making the 14‑day lookback configurable (e.g., WkConf) and optionally add a hard LIMIT to avoid large result sets.
  • Ensure there is a supporting compound index matching the filters: (uploaderId, organizationId, isActive, status, created DESC).

100-121: createAndSetUpDataset: good access checks; avoid double‑write for folder if possible.

  • You create with root folder and then immediately update to folderIdWithFallback. To reduce writes and transient states, consider passing the intended folder into createDataset (e.g., new optional param initialFolderId: Option[ObjectId] = None defaulting to org root).
  • The updated DataSourceId with (directoryName, organizationId) looks correct for virtual datasets.
app/controllers/WKRemoteDataStoreController.scala (1)

76-90: isVirtual toggling vs. needsConversion—confirm intent

You set isVirtual = !needsConversion. That matches the note about convert_to_wkw requiring non‑virtual prelim datasets, but conflicts with “all newly uploaded datasets are virtual” as a general rule. Please confirm this exception is intended and documented in the changelog/API notes.

app/controllers/DatasetController.scala (1)

208-215: Marking explore-and-add datasets virtual is correct; consider analytics hook

Setting isVirtual = true here is aligned with “all newly uploaded datasets are virtual.” Optionally, also call trackNewDataset with viaAddRoute = true to keep analytics/Slack consistent across creation paths.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (1)

76-86: Two DSRemoteWebknossosClient injections — consolidate

Both remoteWebknossosClient and dsRemoteWebknossosClient are DSRemoteWebknossosClient. Use a single instance to reduce confusion and wiring complexity.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a509ceb and 32ead33.

📒 Files selected for processing (9)
  • app/controllers/DatasetController.scala (7 hunks)
  • app/controllers/WKRemoteDataStoreController.scala (5 hunks)
  • app/models/dataset/ComposeService.scala (2 hunks)
  • app/models/dataset/Dataset.scala (1 hunks)
  • app/models/dataset/DatasetService.scala (4 hunks)
  • app/models/dataset/DatasetUploadToPathsService.scala (4 hunks)
  • frontend/javascripts/admin/dataset/dataset_upload_view.tsx (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (4 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (18 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • app/models/dataset/ComposeService.scala
🧰 Additional context used
🧬 Code graph analysis (7)
app/models/dataset/Dataset.scala (2)
app/utils/sql/SecuredSQLDAO.scala (1)
  • assertUpdateAccess (36-48)
app/utils/sql/SqlInterpolation.scala (2)
  • q (20-39)
  • asUpdate (74-74)
app/controllers/WKRemoteDataStoreController.scala (9)
util/src/main/scala/com/scalableminds/util/accesscontext/AccessContext.scala (3)
  • AuthorizedAccessContext (19-21)
  • DBAccessContext (31-37)
  • GlobalAccessContext (25-29)
util/src/main/scala/com/scalableminds/util/objectid/ObjectId.scala (2)
  • ObjectId (11-13)
  • ObjectId (15-66)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (4)
  • DataSource (31-43)
  • UnusableDataSource (45-62)
  • UnusableDataSource (64-66)
  • isUsable (20-20)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSourceStatus.scala (1)
  • DataSourceStatus (3-12)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (6)
  • ReportDatasetUploadParameters (70-75)
  • ReportDatasetUploadParameters (76-79)
  • ReserveAdditionalInformation (64-64)
  • ReserveAdditionalInformation (65-68)
  • ReserveUploadInformation (46-59)
  • ReserveUploadInformation (60-62)
app/models/dataset/LayerToLinkService.scala (2)
  • validateLayerToLink (22-32)
  • addLayersToLinkToDataSource (32-42)
app/models/dataset/DatasetService.scala (2)
  • createAndSetUpDataset (100-123)
  • trackNewDataset (548-560)
util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
  • validateJson (204-209)
app/models/dataset/Dataset.scala (3)
  • findOne (203-210)
  • updateDataSource (633-655)
  • deleteDataset (708-736)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (2)
  • webknossos (70-72)
  • validateAccessFromTokenContext (87-94)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (4)
  • LinkedLayerIdentifier (81-81)
  • LinkedLayerIdentifier (83-85)
  • ReserveUploadInformation (46-59)
  • ReserveUploadInformation (60-62)
util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
  • validateJson (204-209)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (1)
  • fromFuture (42-53)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
  • getDatasetId (206-216)
app/models/dataset/DatasetUploadToPathsService.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
  • LinkedLayerIdentifier (81-81)
  • LinkedLayerIdentifier (83-85)
app/models/dataset/LayerToLinkService.scala (1)
  • addLayersToLinkToDataSource (32-42)
app/models/dataset/Dataset.scala (1)
  • makeVirtual (666-672)
app/models/dataset/DatasetService.scala (8)
app/models/user/MultiUser.scala (1)
  • MultiUserDAO (39-181)
app/mail/MailchimpClient.scala (2)
  • MailchimpClient (17-82)
  • tagUser (45-54)
app/mail/MailchimpTag.scala (1)
  • MailchimpTag (5-22)
app/models/analytics/AnalyticsService.scala (2)
  • AnalyticsService (22-97)
  • track (38-46)
app/models/analytics/AnalyticsEvent.scala (1)
  • UploadDatasetEvent (205-218)
app/telemetry/SlackNotificationService.scala (2)
  • SlackNotificationService (11-59)
  • info (30-36)
app/utils/WkConf.scala (1)
  • WkConf (14-296)
app/models/dataset/DataStore.scala (1)
  • findOneByName (118-125)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4)
  • DataStoreConfig (11-78)
  • Datastore (18-70)
  • S3Upload (64-68)
  • DataVaults (61-63)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5)
  • S3DataVault (46-155)
  • S3DataVault (157-250)
  • getCredential (143-143)
  • hostBucketFromUri (168-179)
  • create (158-166)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
  • UPath (54-96)
  • fromLocalPath (80-80)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4)
  • DSRemoteWebknossosClient (57-216)
  • reportUpload (98-105)
  • getCredential (185-196)
  • deleteDataset (138-143)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2)
  • DataSourceService (25-303)
  • resolvePathsInNewBasePath (253-269)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
  • CredentialConfigReader (8-64)
app/controllers/DatasetController.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
  • LinkedLayerIdentifier (81-81)
  • LinkedLayerIdentifier (83-85)
app/models/storage/UsedStorageService.scala (2)
  • UsedStorageService (21-103)
  • refreshStorageReportForDataset (89-103)
app/models/dataset/DatasetService.scala (2)
  • createAndSetUpDataset (100-123)
  • trackNewDataset (548-560)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: backend-tests
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests
🔇 Additional comments (15)
app/models/dataset/DatasetService.scala (3)

24-24: LGTM: imports match new telemetry/analytics dependencies.

No issues spotted with these additions.

Also applies to: 26-27, 32-33


47-56: Refined search to locate manual new DatasetService invocations
Please rerun and check for any instantiations that need updating:

#!/bin/bash
rg -nP 'new\s+DatasetService\s*\(' --type scala -C2

If this still yields no results, manually verify any Guice modules or test setups that instantiate DatasetService.


560-570: Gate Slack logging by config and recover failures

  • If conf.SlackNotifications.uri is empty/None, return Fox.successful(()) to skip DB reads.
  • Wrap the inner for in .orElse(Fox.successful(())) (or use a recover) so Slack client errors don’t bubble up; see the sketch after this list.
  • Confirm that conf.SlackNotifications.uri yields an empty string or None when unconfigured.
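
A minimal sketch of both points combined, assuming conf.SlackNotifications.uri is an empty String when unconfigured and using the Fox combinators named above (the DAO and service names mirror the surrounding review context rather than the exact source):

private def notifySlackAboutNewDataset(dataset: Dataset, resultLink: String): Fox[Unit] =
  if (conf.SlackNotifications.uri.isEmpty)
    Fox.successful(()) // Slack not configured: skip the DB lookups entirely
  else
    (for {
      organization <- organizationDAO.findOne(dataset._organization)(GlobalAccessContext)
      _ = slackNotificationService.info("Dataset added", s"For organization: ${organization.name}. <$resultLink|Result>")
    } yield ()).orElse(Fox.successful(())) // a Slack failure must not fail the upload route
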
app/models/dataset/DatasetUploadToPathsService.scala (1)

67-69: Delegating layer linking to LayerToLinkService looks good

Centralizing linking and duplicate-name checks here is cleaner and reduces duplication. LGTM.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3)

404-445: Respect configured endpoint object; avoid re-parsing credentialName

Use the normalized s3UploadEndpoint (which should preserve scheme/port) when building S3 base path.

-            endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost
-            newBasePath <- UPath.fromString(s"s3://$endPointHost/$s3UploadBucket/$s3ObjectKey").toFox
+            endPointHost = s3UploadEndpoint.getAuthority // includes host[:port]
+            newBasePath <- UPath.fromString(s"s3://$endPointHost/$s3UploadBucket/$s3ObjectKey").toFox

527-538: Preserve scheme and port for custom S3 endpoints

Hardcoding https and dropping the port breaks MinIO and non‑standard endpoints. Keep original URI.

-  private lazy val s3UploadEndpoint: URI = {
-    val credentialUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
-    new URI(
-      "https",
-      null,
-      credentialUri.getHost,
-      -1,
-      null,
-      null,
-      null
-    )
-  }
+  private lazy val s3UploadEndpoint: URI =
+    new URI(dataStoreConfig.Datastore.S3Upload.credentialName)

342-386: Persist and honor isVirtual across finish; don’t rely on live config

Finish currently decides S3 vs local via dataStoreConfig.Datastore.S3Upload.enabled. Persist the chosen isVirtual at reserve and read it here for consistent behavior on retries/restarts.

@@
   private def redisKeyForUploadId(datasourceId: DataSourceId): String =
     s"upload___${Json.stringify(Json.toJson(datasourceId))}___datasourceId"
+  private def redisKeyForIsVirtual(uploadId: String): String =
+    s"upload___${uploadId}___isVirtual"
@@ def reserveUpload(...)
-      _ <- runningUploadMetadataStore.insert(
+      _ <- runningUploadMetadataStore.insert(
         redisKeyForDataSourceId(reserveUploadInfo.uploadId),
         Json.stringify(Json.toJson(newDataSourceId))
       )
+      _ <- runningUploadMetadataStore.insert(
+        redisKeyForIsVirtual(reserveUploadInfo.uploadId),
+        reserveUploadInfo.isVirtual
+          .orElse(reserveUploadInfo.needsConversion.map(nc => !nc)) // fallback: virtual if no conversion
+          .getOrElse(dataStoreConfig.Datastore.S3Upload.enabled)
+          .toString
+      )
@@ def finishUpload(...)
-      dataSourceWithAbsolutePathsOpt <- moveUnpackedToTarget(unpackToDir, needsConversion, datasetId, dataSourceId)
+      isVirtualOpt <- runningUploadMetadataStore.find(redisKeyForIsVirtual(uploadId)).map(_.map(_.toBoolean))
+      uploadToS3 = isVirtualOpt.getOrElse(dataStoreConfig.Datastore.S3Upload.enabled)
+      dataSourceWithAbsolutePathsOpt <- moveUnpackedToTarget(unpackToDir, needsConversion, datasetId, dataSourceId, uploadToS3)
@@ def cleanUpUploadedDataset(...)
-    removeFromRedis(uploadId)
+    removeFromRedis(uploadId)
@@ def removeFromRedis(uploadId: String)
       _ <- runningUploadMetadataStore.remove(redisKeyForDataSourceId(uploadId))
+      _ <- runningUploadMetadataStore.remove(redisKeyForIsVirtual(uploadId))

And adapt moveUnpackedToTarget signature to accept uploadToS3 and use it instead of reading config.

app/controllers/WKRemoteDataStoreController.scala (1)

128-154: Require dataSource when needsConversion == false (persist and mark usable)

Without this, non‑conversion uploads can leave datasets unusable if dataSourceOpt is omitted. Enforce presence and persist.

       for {
         user <- bearerTokenService.userForToken(token)
         dataset <- datasetDAO.findOne(datasetId)(GlobalAccessContext) ?~> Messages("dataset.notFound", datasetId) ~> NOT_FOUND
+        _ <- Fox.fromBool(request.body.needsConversion || request.body.dataSourceOpt.isDefined) ?~>
+             "reportUpload.dataSourceMissing"
         _ <- Fox.runIf(!request.body.needsConversion)(usedStorageService.refreshStorageReportForDataset(dataset))
         _ <- datasetService.trackNewDataset(dataset,
                                             user,
                                             request.body.needsConversion,
                                             request.body.datasetSizeBytes,
                                             viaAddRoute = false)
         dataSourceWithLinkedLayersOpt <- Fox.runOptional(request.body.dataSourceOpt) {
           implicit val ctx: DBAccessContext = AuthorizedAccessContext(user)
           layerToLinkService.addLayersToLinkToDataSource(_, request.body.layersToLink)
         }
app/controllers/DatasetController.scala (4)

18-18: Consolidate LinkedLayerIdentifier source — good change

Switching to the shared type from uploading services avoids duplication and keeps JSON formats consistent. No action needed.


120-120: API shape change: folderId now ObjectId

DataSourceRegistrationInfo.folderId moved from Option[String] to Option[ObjectId]. Verify all callers (FE and any scripts) send a hex ObjectId and not a plain folder path/string. If older clients exist, provide a small adapter at the boundary.
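
One way such a boundary adapter could look, as a sketch: a lenient play-json Reads that still accepts the old string shape. ObjectId.fromStringSync is a hypothetical parsing helper returning Option[ObjectId]; substitute whatever parser ObjectId actually offers:

import play.api.libs.json._

implicit val lenientFolderIdReads: Reads[ObjectId] = Reads {
  case JsString(raw) =>
    ObjectId.fromStringSync(raw) // hypothetical: Option[ObjectId]
      .map(JsSuccess(_))
      .getOrElse(JsError("folderId.invalid"))
  case other => JsError(s"folderId: expected hex string, got $other")
}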


230-237: createAndSetUpDataset usage looks correct

Passing folderId as ObjectId and setting isVirtual = true matches the new model.


141-141: Make storage‐report refresh best‐effort and retain 200 OK

Wrap the per‐dataset refresh so transient 4xx/5xx errors won’t fail the route after the dataset is marked usable:

-        _ <- usedStorageService.refreshStorageReportForDataset(dataset)
+        // best‐effort: don’t fail the route if storage‐report refresh fails
+        _ <- usedStorageService
+               .refreshStorageReportForDataset(dataset)
+               .recover { case _ => () }

Ensure the failure path of finishUploadToPaths still triggers the Slack notification required by #8934.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (3)

93-117: reserveUploadV11 mapping looks good; keep isVirtual unset

Forwarding requireUniqueName and leaving isVirtual/needsConversion as None is correct for non-manual flows. No changes needed.


119-124: Use datasetDirectoryName when resolving legacy links (prevents 404s)

getDatasetId expects the directory name. Fall back to dataSetName only if datasetDirectoryName is absent. This was flagged earlier and still applies.

Apply:

-      datasetId <- remoteWebknossosClient.getDatasetId(legacyLayerToLink.getOrganizationId,
-                                                       legacyLayerToLink.dataSetName)
+      directoryName = legacyLayerToLink.datasetDirectoryName.getOrElse(legacyLayerToLink.dataSetName)
+      datasetId <- remoteWebknossosClient.getDatasetId(legacyLayerToLink.getOrganizationId, directoryName)

Also add a minimal guard so we fail fast if both organizationId and organizationName are missing:

-  private def adaptLayerToLink(legacyLayerToLink: LegacyLinkedLayerIdentifier): Fox[LinkedLayerIdentifier] =
-    for {
+  private def adaptLayerToLink(legacyLayerToLink: LegacyLinkedLayerIdentifier): Fox[LinkedLayerIdentifier] =
+    for {
+      _ <- Fox.fromBool(legacyLayerToLink.organizationId.isDefined || legacyLayerToLink.organizationName.isDefined) ?~>
+            "organization.missing"
       datasetId <- ...

140-146: Set isVirtual = Some(false) for manual uploads (legacy path)

Manual disk uploads should explicitly mark isVirtual = false to avoid relying on defaults.

Apply:

-              Some(request.body.requireUniqueName),
-              None,
+              Some(request.body.requireUniqueName),
+              Some(false),
               needsConversion = None

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)

171-207: Persist and consume isVirtual across reserve/finish; also clean up on move failures

Config flips can misroute data (S3 vs local) and break resumability. Persist isVirtual on reserve, read it at finish, and pass it into moveUnpackedToTarget. Also wrap the move step with cleanup-on-failure, like unpack/post-process. Finally, remove the persisted flag on cleanup.

Apply:

@@
   private def redisKeyForDatasetId(uploadId: String): String =
     s"upload___${uploadId}___datasetId"
+  private def redisKeyForIsVirtual(uploadId: String): String =
+    s"upload___${uploadId}___isVirtual"
@@ def reserveUpload(reserveUploadInfo: ReserveUploadInformation,
       _ <- runningUploadMetadataStore.insert(
         redisKeyForDatasetId(reserveUploadInfo.uploadId),
         Json.stringify(Json.toJson(reserveUploadAdditionalInfo.newDatasetId))
       )
+      _ <- runningUploadMetadataStore.insert(
+        redisKeyForIsVirtual(reserveUploadInfo.uploadId),
+        reserveUploadInfo.isVirtual.getOrElse(dataStoreConfig.Datastore.S3Upload.enabled).toString
+      )
@@ def finishUpload(uploadInformation: UploadInformation)
-      dataSourceWithAbsolutePathsOpt <- moveUnpackedToTarget(unpackToDir, needsConversion, datasetId, dataSourceId)
+      // Decide target storage based on persisted isVirtual (fallback to live config)
+      isVirtualOpt <- runningUploadMetadataStore.find(redisKeyForIsVirtual(uploadId)).map(_.map(_.toBoolean))
+      uploadToS3 = isVirtualOpt.getOrElse(dataStoreConfig.Datastore.S3Upload.enabled)
+      moveResult <- moveUnpackedToTarget(unpackToDir, needsConversion, datasetId, dataSourceId, uploadToS3).shiftBox
+      _ <- cleanUpOnFailure(moveResult, datasetId, dataSourceId, needsConversion, label = s"moving dataset to target storage")
+      dataSourceWithAbsolutePathsOpt <- moveResult.toFox
@@
   private def moveUnpackedToTarget(unpackedDir: Path,
                                    needsConversion: Boolean,
-                                   datasetId: ObjectId,
-                                   dataSourceId: DataSourceId): Fox[Option[UsableDataSource]] =
+                                   datasetId: ObjectId,
+                                   dataSourceId: DataSourceId,
+                                   uploadToS3: Boolean): Fox[Option[UsableDataSource]] =
@@
-        newBasePath <- if (dataStoreConfig.Datastore.S3Upload.enabled) {
+        newBasePath <- if (uploadToS3) {
@@
-            endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost
+            endPointHost = s3UploadEndpoint.getHost
             newBasePath <- UPath.fromString(s"s3://$endPointHost/$s3UploadBucket/$s3ObjectKey").toFox
           } yield newBasePath
         } else {
@@ def removeFromRedis(uploadId: String): Fox[Unit] =
       dataSourceId <- getDataSourceIdByUploadId(uploadId)
       _ <- runningUploadMetadataStore.remove(redisKeyForDataSourceId(uploadId))
       _ <- runningUploadMetadataStore.remove(redisKeyForDatasetId(uploadId))
+      _ <- runningUploadMetadataStore.remove(redisKeyForIsVirtual(uploadId))
       _ <- runningUploadMetadataStore.remove(redisKeyForLinkedLayerIdentifier(uploadId))

Also applies to: 349-392, 410-451, 820-840


337-347: Cleanup on size-exceed must actually run; current call is dropped

The call to cleanUpDatasetExceedingSize is not sequenced and won’t execute before returning a failure.

Apply:

-      _ <- Fox.runOptional(totalFileSizeInBytesOpt) { maxFileSize =>
-        tryo(FileUtils.sizeOfDirectoryAsBigInteger(uploadDir.toFile).longValue).toFox.map(actualFileSize =>
-          if (actualFileSize > maxFileSize.toLong) {
-            cleanUpDatasetExceedingSize(uploadDir, uploadId)
-            Fox.failure(s"Uploaded dataset exceeds the maximum allowed size of $maxFileSize bytes")
-          } else Fox.successful(()))
-      }
+      _ <- Fox.runOptional(totalFileSizeInBytesOpt) { maxFileSize =>
+        tryo(FileUtils.sizeOfDirectoryAsBigInteger(uploadDir.toFile).longValue).toFox.flatMap { actualFileSize =>
+          if (actualFileSize > maxFileSize.toLong)
+            cleanUpDatasetExceedingSize(uploadDir, uploadId)
+              .flatMap(_ => Fox.failure(s"Uploaded dataset exceeds the maximum allowed size of $maxFileSize bytes"))
+          else Fox.successful(())
+        }
+      }
app/controllers/WKRemoteDataStoreController.scala (1)

210-227: Await DB delete and used-storage refresh

Current code builds a Future chain and discards it; side-effects may never run.

Apply:

-                _ = datasetDAO
-                  .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0)
-                  .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset))
+                _ <- datasetDAO
+                  .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0)
+                  .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 32ead33 and b48966d.

📒 Files selected for processing (3)
  • app/controllers/DatasetController.scala (7 hunks)
  • app/controllers/WKRemoteDataStoreController.scala (5 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (19 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
app/controllers/WKRemoteDataStoreController.scala (8)
util/src/main/scala/com/scalableminds/util/accesscontext/AccessContext.scala (3)
  • AuthorizedAccessContext (19-21)
  • DBAccessContext (31-37)
  • GlobalAccessContext (25-29)
util/src/main/scala/com/scalableminds/util/objectid/ObjectId.scala (2)
  • ObjectId (11-13)
  • ObjectId (15-66)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (4)
  • DataSource (31-43)
  • UnusableDataSource (45-62)
  • UnusableDataSource (64-66)
  • isUsable (20-20)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSourceStatus.scala (1)
  • DataSourceStatus (3-12)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
  • ReportDatasetUploadParameters (70-75)
  • ReportDatasetUploadParameters (76-79)
app/models/dataset/LayerToLinkService.scala (2)
  • validateLayerToLink (22-32)
  • addLayersToLinkToDataSource (32-42)
app/models/dataset/DatasetService.scala (2)
  • createAndSetUpDataset (100-123)
  • trackNewDataset (548-560)
app/models/dataset/Dataset.scala (3)
  • findOne (203-210)
  • updateDataSource (633-655)
  • deleteDataset (708-736)
app/controllers/DatasetController.scala (2)
app/models/storage/UsedStorageService.scala (2)
  • UsedStorageService (21-103)
  • refreshStorageReportForDataset (89-103)
app/models/dataset/DatasetService.scala (2)
  • createAndSetUpDataset (100-123)
  • trackNewDataset (548-560)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (8)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (4)
  • PathUtils (14-14)
  • deleteDirectoryRecursively (186-193)
  • ensureDirectoryBox (124-132)
  • listFilesRecursive (104-108)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (2)
  • DataStoreConfig (11-78)
  • S3Upload (64-68)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5)
  • S3DataVault (46-155)
  • S3DataVault (157-250)
  • getCredential (143-143)
  • hostBucketFromUri (168-179)
  • create (158-166)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
  • UPath (54-96)
  • fromLocalPath (80-80)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/UnfinishedUpload.scala (2)
  • UnfinishedUpload (7-16)
  • UnfinishedUpload (18-20)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4)
  • DSRemoteWebknossosClient (57-216)
  • reportUpload (98-105)
  • getCredential (185-196)
  • deleteDataset (138-143)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
  • CredentialConfigReader (8-64)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)
  • deleteOnDisk (28-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: backend-tests
  • GitHub Check: frontend-tests
  • GitHub Check: build-smoketest-push
🔇 Additional comments (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)

431-439: Use normalized endpoint host for S3 base path

Don’t re-parse the config string here; use s3UploadEndpoint to ensure scheme/host/port are consistent.

Apply:

-            endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost
+            endPointHost = s3UploadEndpoint.getHost

533-544: Preserve scheme/port when building S3 endpoint

Hardcoding https and dropping port breaks MinIO/custom endpoints.

Apply:

-  private lazy val s3UploadEndpoint: URI = {
-    val credentialUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
-    new URI(
-      "https",
-      null,
-      credentialUri.getHost,
-      -1,
-      null,
-      null,
-      null
-    )
-  }
+  private lazy val s3UploadEndpoint: URI =
+    new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
app/controllers/DatasetController.scala (1)

230-243: trackNewDataset not awaited and viaAddRoute should be true for Add route

Current code doesn’t await side-effects (Slack/analytics) and mislabels the source.

Apply:

-        )
-        _ = datasetService.trackNewDataset(dataset,
-                                           user,
-                                           needsConversion = false,
-                                           datasetSizeBytes = 0,
-                                           viaAddRoute = false)
+        )
+        _ <- datasetService.trackNewDataset(dataset,
+                                            user,
+                                            needsConversion = false,
+                                            datasetSizeBytes = 0,
+                                            viaAddRoute = true)
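
For background on the awaiting concern, a minimal sketch of why the binding operator matters in a Fox for-comprehension (the stand-in `track` is illustrative, not the actual service code; Fox.successful is shown in this codebase, Fox.failure is assumed to exist alongside it):

```scala
import com.scalableminds.util.tools.Fox
import scala.concurrent.ExecutionContext.Implicits.global

// Illustrative stand-in for a side-effecting call such as trackNewDataset.
def track(ok: Boolean): Fox[Unit] =
  if (ok) Fox.successful(()) else Fox.failure("tracking failed")

def report(ok: Boolean): Fox[String] =
  for {
    _ <- track(ok) // awaited; a failure short-circuits and surfaces to the caller
    // _ = track(ok) // would start the call but neither await it nor propagate its failure
  } yield "done"
```
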
app/controllers/WKRemoteDataStoreController.scala (1)

125-154: Require dataSource when needsConversion == false; also await tracking

Without enforcing dataSource for non-conversion uploads, datasets remain unusable. Also, trackNewDataset must be awaited.

Apply:

   def reportDatasetUpload(name: String,
                           key: String,
                           token: String,
                           datasetId: ObjectId): Action[ReportDatasetUploadParameters] =
     Action.async(validateJson[ReportDatasetUploadParameters]) { implicit request =>
       dataStoreService.validateAccess(name, key) { _ =>
         for {
           user <- bearerTokenService.userForToken(token)
           dataset <- datasetDAO.findOne(datasetId)(GlobalAccessContext) ?~> Messages("dataset.notFound", datasetId) ~> NOT_FOUND
-          _ <- Fox.runIf(!request.body.needsConversion)(usedStorageService.refreshStorageReportForDataset(dataset))
-          _ = datasetService.trackNewDataset(dataset,
-                                             user,
-                                             request.body.needsConversion,
-                                             request.body.datasetSizeBytes,
-                                             viaAddRoute = false)
+          _ <- Fox.fromBool(request.body.needsConversion || request.body.dataSourceOpt.isDefined) ?~>
+               "reportUpload.dataSourceMissing"
+          _ <- Fox.runIf(!request.body.needsConversion)(usedStorageService.refreshStorageReportForDataset(dataset))
+          _ <- datasetService.trackNewDataset(dataset,
+                                              user,
+                                              request.body.needsConversion,
+                                              request.body.datasetSizeBytes,
+                                              viaAddRoute = false)
           dataSourceWithLinkedLayersOpt <- Fox.runOptional(request.body.dataSourceOpt) {
             implicit val ctx: DBAccessContext = AuthorizedAccessContext(user)
             layerToLinkService.addLayersToLinkToDataSource(_, request.body.layersToLink)
           }
           _ <- Fox.runOptional(dataSourceWithLinkedLayersOpt) { dataSource =>
             logger.info(s"Updating dataset $datasetId in database after upload reported from datastore $name.")
             datasetDAO.updateDataSource(datasetId,
                                         dataset._dataStore,
                                         dataSource.hashCode(),
                                         dataSource,
                                         isUsable = true)(GlobalAccessContext)
           }
         } yield Ok
       }
     }

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b48966d and 4bfb9a2.

📒 Files selected for processing (3)
  • app/controllers/WKRemoteDataStoreController.scala (6 hunks)
  • app/models/dataset/Dataset.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
app/models/dataset/Dataset.scala (2)
app/utils/sql/SecuredSQLDAO.scala (2)
  • existingCollectionName (16-16)
  • assertUpdateAccess (36-48)
app/utils/sql/SqlInterpolation.scala (2)
  • q (20-39)
  • asUpdate (74-74)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (2)
  • webknossos (70-72)
  • validateAccessFromTokenContext (87-94)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (4)
  • LinkedLayerIdentifier (81-81)
  • LinkedLayerIdentifier (83-85)
  • ReserveUploadInformation (46-59)
  • ReserveUploadInformation (60-62)
util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
  • validateJson (204-209)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (2)
  • fromFuture (42-53)
  • successful (53-56)
util/src/main/scala/com/scalableminds/util/objectid/ObjectId.scala (1)
  • fromStringSync (22-26)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)
  • getDatasetId (206-216)
app/controllers/WKRemoteDataStoreController.scala (11)
util/src/main/scala/com/scalableminds/util/accesscontext/AccessContext.scala (3)
  • AuthorizedAccessContext (19-21)
  • DBAccessContext (31-37)
  • GlobalAccessContext (25-29)
util/src/main/scala/com/scalableminds/util/objectid/ObjectId.scala (2)
  • ObjectId (11-13)
  • ObjectId (15-66)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (4)
  • DataSource (31-43)
  • UnusableDataSource (45-62)
  • UnusableDataSource (64-66)
  • isUsable (20-20)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSourceStatus.scala (1)
  • DataSourceStatus (3-12)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
  • ReportDatasetUploadParameters (70-75)
  • ReportDatasetUploadParameters (76-79)
app/models/storage/UsedStorageService.scala (2)
  • UsedStorageService (21-103)
  • refreshStorageReportForDataset (89-103)
app/models/dataset/LayerToLinkService.scala (2)
  • validateLayerToLink (22-32)
  • addLayersToLinkToDataSource (32-42)
app/models/dataset/DatasetService.scala (2)
  • createAndSetUpDataset (100-123)
  • trackNewDataset (548-560)
util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1)
  • validateJson (204-209)
app/security/WebknossosBearerTokenAuthenticatorService.scala (1)
  • userForToken (76-84)
app/models/dataset/Dataset.scala (3)
  • findOne (203-210)
  • updateDataSource (634-656)
  • deleteDataset (709-737)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-smoketest-push
  • GitHub Check: backend-tests
  • GitHub Check: frontend-tests
🔇 Additional comments (2)
app/controllers/WKRemoteDataStoreController.scala (1)

140-151: Re-introduce the dataSource guard for non-conversion uploads.

If needsConversion is false and dataSourceOpt is missing, we leave the dataset backed by the placeholder UnusableDataSource, so it remains unusable despite reporting success. The previous guard prevented this, but it’s gone now. Please add it back before processing optional linkage.

         for {
           user <- bearerTokenService.userForToken(token)
           dataset <- datasetDAO.findOne(datasetId)(GlobalAccessContext) ?~> Messages("dataset.notFound", datasetId) ~> NOT_FOUND
+          _ <- Fox.fromBool(request.body.needsConversion || request.body.dataSourceOpt.isDefined) ?~>
+               "reportUpload.dataSourceMissing"
           _ <- Fox.runIf(!request.body.needsConversion)(usedStorageService.refreshStorageReportForDataset(dataset))
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (1)

127-130: Prefer datasetDirectoryName when available.

remoteWebknossosClient.getDatasetId expects the directory name. When clients supply datasetDirectoryName, we must use it, otherwise legacy uploads break as soon as friendly names differ from directory names.

-          remoteWebknossosClient.getDatasetId(legacyLayerToLink.getOrganizationId, legacyLayerToLink.dataSetName)
+          val directoryName =
+            legacyLayerToLink.datasetDirectoryName.getOrElse(legacyLayerToLink.dataSetName)
+          remoteWebknossosClient.getDatasetId(legacyLayerToLink.getOrganizationId, directoryName)

Contributor

@MichaelBuessemeyer MichaelBuessemeyer left a comment

Awesome, thanks for implementing this. Including @frcroth :)

Here is my code feedback / questions. Testing is TODO

for {
  organization <- organizationDAO.findOne(user._organization)(GlobalAccessContext)
  multiUser <- multiUserDAO.findOne(user._multiUser)(GlobalAccessContext)
  resultLink = s"${conf.Http.uri}/datasets/$datasetId"
Contributor

@MichaelBuessemeyer MichaelBuessemeyer Oct 2, 2025

The proper link format should be "authority/datasets/&lt;ds-name&gt;-&lt;ds-id&gt;" if I am not mistaken. The frontend understands the id-only format and auto-converts it to the usual one, but maybe we still want to use the "authority/datasets/&lt;ds-name&gt;-&lt;ds-id&gt;" format here

Member

If the id-only format is supported now, we will probably have to keep supporting it later anyway. I actually think it's the clean variant, and the name-prefixed one is only "sugar" to make the URIs more readable where needed. I'd rather not add the name here, because we'd have to be careful that it stays a valid URI etc. 🤔
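
For illustration, the two link variants side by side (hypothetical helpers, not code from this PR):

```scala
// Hypothetical helpers contrasting the two link formats discussed above.
def idOnlyLink(baseUri: String, datasetId: String): String =
  s"$baseUri/datasets/$datasetId"

def namePrefixedLink(baseUri: String, datasetName: String, datasetId: String): String = {
  // The name part must stay a valid URI segment, which is the sanitization concern raised above.
  val safeName = datasetName.replaceAll("[^A-Za-z0-9._-]", "-")
  s"$baseUri/datasets/$safeName-$datasetId"
}
```
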

case S3AccessKeyCredential(credentialName, accessKeyId, secretAccessKey, _, _)
    if dataStoreConfig.Datastore.S3Upload.credentialName == credentialName =>
  (accessKeyId, secretAccessKey)
}
Contributor

So the same s3 instance needs to be registered twice in the config? Once for datavault access and once to register it as an upload target?

Member

Correct. The credentials are taken from the same source, though; only the upload target is given again. It can include a prefix to direct uploads to specific storage buckets (this may change regularly as the first ones fill up).
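
To make the pairing concrete, a hypothetical application.conf fragment (bucket URI and keys invented; the exact dataVaults credential schema is defined by CredentialConfigReader, so treat the field names below as assumptions):

```hocon
# Hypothetical sketch: the same URI appears once as a dataVault credential
# and once as the upload target.
datastore.dataVaults.credentials = [
  {
    type = "S3AccessKey"                      # assumed type tag
    name = "s3://my-bucket/uploads"           # matched against s3Upload.credentialName
    identifier = "ACCESS_KEY_ID"              # assumed field name
    secret = "SECRET_ACCESS_KEY"              # assumed field name
  }
]
datastore.s3Upload {
  enabled = true
  credentialName = "s3://my-bucket/uploads"   # same URI, optionally with a prefix
  objectKeyPrefix = "webknossos-uploads"
}
```
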

@fm3 fm3 requested a review from MichaelBuessemeyer October 2, 2025 12:41
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (7)
app/models/dataset/Dataset.scala (1)

667-672: Verify that worker flows use GlobalAccessContext when calling makeVirtual.

This method asserts update access, which could block convert_to_wkw worker paths if they don't use GlobalAccessContext. Confirm that all call sites (especially worker flows) either have update access or use GlobalAccessContext.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)

188-198: Verify that Slack notification is triggered only on failures.

The log(Some(slackNotificationService.noticeFailedFinishUpload)) wrapper may trigger notifications even on successful 200 OK responses. Confirm that the notification is only sent on failures.
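
The pattern being asked about, sketched with plain Futures (the actual code routes through the RequestLogging.log helper and DSSlackNotificationService; this only shows the failure-only shape):

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.Failure

// Attach a failure-only side effect without altering the returned result.
def notifyOnFailure[A](result: Future[A])(notify: Throwable => Unit)(
    implicit ec: ExecutionContext): Future[A] = {
  result.onComplete {
    case Failure(e) => notify(e) // fires only when finishUpload actually failed
    case _          => ()        // a successful 200 OK stays silent
  }
  result
}
```
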

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)

352-393: Verify cleanup on moveUnpackedToTarget failure and config consistency.

Two concerns:

  1. If moveUnpackedToTarget fails, the local unpackToDir may not be cleaned up. Consider wrapping the move result in cleanUpOnFailure similar to unpack/post-processing steps.
  2. The method uses dataStoreConfig.Datastore.S3Upload.enabled directly instead of reading a persisted isVirtual flag from Redis. If the config changes between reserve and finish, the behavior may be inconsistent. Consider persisting isVirtual at reserve time and reading it here.

410-451: Verify cleanup on S3 upload failure.

If the S3 upload (line 432) or path rewrite (line 446) fails, the local unpackedDir may not be cleaned up. Consider wrapping the S3 branch in a recovery handler that deletes the local directory and calls remoteWebknossosClient.deleteDataset(datasetId) on failure.


546-562: Fail fast on missing S3 credentials.

The current code extracts accessKeyId and secretAccessKey as Box[Option[String]], which can succeed with None values and create an invalid S3 client. Validate s3UploadCredentialsOpt before building the client and fail with a clear error if missing.
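
One way the fail-fast check could look, assuming the credentials resolve to an Option of key pairs (the exact types in UploadService may differ):

```scala
// Illustrative: validate credentials once, before constructing the S3 client.
private def s3CredentialsOrFail(s3UploadCredentialsOpt: Option[(String, String)])(
    implicit ec: ExecutionContext): Fox[(String, String)] =
  s3UploadCredentialsOpt match {
    case Some((accessKeyId, secretAccessKey)) =>
      Fox.successful((accessKeyId, secretAccessKey))
    case None =>
      Fox.failure("S3 upload is enabled but no matching credential was found in the dataVaults config")
  }
```
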


533-544: Preserve scheme and port in s3UploadEndpoint.

The endpoint URI construction hardcodes "https" and port -1, which will break non-HTTPS or non-default-port S3 endpoints (e.g., MinIO). Preserve the original scheme and port from credentialName.
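
A sketch of a scheme- and port-preserving construction, mirroring the existing code shown above (the https fallback is an assumption for URIs configured without a scheme):

```scala
import java.net.URI

private lazy val s3UploadEndpoint: URI = {
  val credentialUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
  new URI(
    Option(credentialUri.getScheme).getOrElse("https"), // keeps http for MinIO-style endpoints
    null,
    credentialUri.getHost,
    credentialUri.getPort, // stays -1 only when the URI itself has no explicit port
    null,
    null,
    null
  )
}
```
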


436-437: Use s3UploadEndpoint for consistent host extraction.

Line 436 extracts the host directly from the credentialName URI, but s3UploadEndpoint (lines 533-544) normalizes the URI. For consistency, use s3UploadEndpoint.getHost instead of re-parsing credentialName.

Apply this diff:

-            endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost
+            endPointHost = s3UploadEndpoint.getHost
🧹 Nitpick comments (1)
unreleased_changes/8912.md (1)

8-15: Add language identifier to the code block.

The code block should specify a language for proper syntax highlighting and to satisfy markdown linting rules (MD040).

Based on static analysis hints.

Apply this diff:

-```
+```hocon
 s3Upload {
   enabled = false
   # Use the name of a credential in the dataVaults section here to use it for uploads.
   credentialName = "s3://example/uri/prefix"
   objectKeyPrefix = "webknossos-uploads"
 }


📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4bfb9a225cf601fccc3c5f89bc57de92217dcb12 and 2aebc3497affc6de63adcb7921f52bb5f7970706.

📒 Files selected for processing (6)
  • app/models/dataset/ComposeService.scala (2 hunks)
  • app/models/dataset/Dataset.scala (2 hunks)
  • unreleased_changes/8912.md (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (7 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (19 hunks)

🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-28T14:18:04.368Z

Learnt from: frcroth
PR: #8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshFileService.scala:161-166
Timestamp: 2025-04-28T14:18:04.368Z
Learning: In Scala for-comprehensions with the Fox error handling monad, Fox.fromBool() expressions should use the <- binding operator instead of the = assignment operator to properly propagate error conditions. Using = will cause validation failures to be silently ignored.

Applied to files:
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala

🧬 Code graph analysis (4)
app/models/dataset/Dataset.scala (3)
app/models/project/Project.scala (1)
  • findOneByNameAndOrganization (115-125)
app/utils/sql/SecuredSQLDAO.scala (3)
  • readAccessQuery (23-36)
  • existingCollectionName (16-16)
  • assertUpdateAccess (36-48)
app/utils/sql/SqlInterpolation.scala (2)
  • q (20-39)
  • asUpdate (74-74)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (6)
  • webknossos (70-72)
  • validateAccessFromTokenContext (87-94)
  • UserAccessRequest (30-30)
  • UserAccessRequest (31-72)
  • writeDataset (52-55)
  • deleteDataset (49-52)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/slacknotification/DSSlackNotificationService.scala (2)
  • DSSlackNotificationService (8-31)
  • noticeFailedFinishUpload (26-31)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2)
  • reserveDataSourceUpload (123-132)
  • deleteDataset (138-143)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (6)
  • reserveUpload (171-212)
  • getDatasetIdByUploadId (168-171)
  • extractDatasetUploadId (157-157)
  • finishUpload (352-393)
  • cancelUpload (321-335)
  • isKnownUpload (154-157)
util/src/main/scala/com/scalableminds/util/requestlogging/RequestLogging.scala (1)
  • log (55-62)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/slacknotification/SlackClient.scala (1)
  • warn (18-21)
app/models/dataset/ComposeService.scala (3)
app/models/dataset/DatasetService.scala (1)
  • createAndSetUpDataset (100-123)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
  • mapped (104-133)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (1)
  • coordinateTransformations (85-85)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (7)
util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2)
  • PathUtils (14-14)
  • deleteDirectoryRecursively (186-193)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4)
  • DataStoreConfig (11-78)
  • Datastore (18-70)
  • S3Upload (64-68)
  • DataVaults (61-63)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5)
  • S3DataVault (46-182)
  • S3DataVault (184-277)
  • getCredential (170-170)
  • hostBucketFromUri (195-206)
  • create (185-193)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2)
  • UPath (54-96)
  • fromLocalPath (80-80)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/UnfinishedUpload.scala (2)
  • UnfinishedUpload (7-16)
  • UnfinishedUpload (18-20)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4)
  • DSRemoteWebknossosClient (57-216)
  • reportUpload (98-105)
  • getCredential (185-196)
  • deleteDataset (138-143)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1)
  • CredentialConfigReader (8-64)
🪛 markdownlint-cli2 (0.18.1)
unreleased_changes/8912.md

8-8: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-smoketest-push
  • GitHub Check: frontend-tests

🔇 Additional comments (19)
app/models/dataset/ComposeService.scala (2)

50-55: LGTM! Virtual dataset creation aligns with PR objectives.

The refactor from createVirtualDataset to createAndSetUpDataset with isVirtual = true correctly enforces the virtual dataset model. The targetFolderId type change to ObjectId is consistent with the method signature and improves type safety.

71-73: LGTM! Simplified layer editing using the polymorphic mapped method.

The refactor eliminates explicit pattern matching by leveraging the StaticLayer.mapped method, which internally handles both StaticColorLayer and StaticSegmentationLayer types. This reduces code duplication and improves maintainability while preserving the coordinate transformation logic.

app/models/dataset/Dataset.scala (1)

446-458: LGTM: Backwards-compatible lookup by directoryName.

The method now looks up datasets by directoryName instead of name, which aligns with the shift to datasetId-centric operations. The comment correctly explains that this is for backwards compatibility with legacy links.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (5)

36-36: LGTM: Slack notification service injected.

The DSSlackNotificationService is properly injected for use in failure logging.

Also applies to: 67-67

100-103: LGTM: Reserve flow updated to use for-comprehension.

The reserve flow now awaits reserveUploadAdditionalInfo before calling uploadService.reserveUpload, which is more readable and correct.

153-155: LGTM: Chunk upload flow updated to datasetId.

The chunk upload and test chunk flows now correctly use getDatasetIdByUploadId and UserAccessRequest.writeDataset(datasetId).

Also applies to: 176-177

202-214: LGTM: Cancel upload flow updated to datasetId.

The cancel upload flow now correctly uses getDatasetIdByUploadId and UserAccessRequest.deleteDataset(datasetId).

405-410: LGTM: Delete on disk flow updated to datasetId.

The delete on disk flow now correctly uses dsRemoteWebknossosClient.deleteDataset(datasetId) and the log message references datasetId.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (1)

195-195: LGTM: hostBucketFromUri made public for S3 upload support.

The visibility change allows UploadService and other modules to extract bucket names from S3 URIs during the upload workflow.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (10)

46-101: LGTM: Data models updated for datasetId-centric flow.

The new and updated data models correctly reflect the shift to datasetId-based operations and S3 upload support. JSON formats are properly defined.

103-108: LGTM: DataStoreConfig dependency added.

The DataStoreConfig dependency is correctly injected to support S3 upload configuration.

143-144: LGTM: Redis key helper added for datasetId.

The redisKeyForDatasetId helper is consistent with existing Redis key patterns.

159-164: LGTM: Upload directory helpers updated.

The upload directory helpers now support the backup feature by creating backup directories in the .trash folder.

168-169: LGTM: DatasetId retrieval method added.

The getDatasetIdByUploadId method is consistent with the existing getDataSourceIdByUploadId pattern.

171-210: LGTM: reserveUpload updated for datasetId and layer linking.

The method now correctly validates that needsConversion and layersToLink are not both set, and stores the datasetId in Redis.
---

`263-319`: **LGTM: handleUploadChunk updated to datasetId.**

The chunk upload handler now correctly uses `getDatasetIdByUploadId` and includes `datasetId` in log messages.

---

`321-335`: **LGTM: cancelUpload updated to datasetId.**

The cancel upload handler now correctly uses `getDatasetIdByUploadId` and includes `datasetId` in log messages.

---

`602-628`: **LGTM: Cleanup logic includes dataset deletion.**

The cleanup logic correctly deletes the local directory and calls `remoteWebknossosClient.deleteDataset(datasetId)` on failure.

---

`802-804`: **LGTM: Raw uploaded data backed up for debugging.**

The backup feature correctly copies the raw uploaded data to a `.trash` directory before processing.

</blockquote></details>

</blockquote></details>

</details>
