Upload datasets to S3 #8912
base: master
Conversation
📝 Walkthrough

Refactors upload and linking flows to be datasetId-centric, adds S3 upload configuration and an AWS transfer-manager dependency, replaces symlink-based layer linking with a new LayerToLinkService, introduces Slack failure notifications for finishUpload, bumps the API to v12, and updates controllers, services, routes, models, and the frontend accordingly.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60–90 minutes
Pre-merge checks: ❌ 1 warning, ✅ 4 passed.
Actionable comments posted: 5
🧹 Nitpick comments (7)
unreleased_changes/8912.md (1)
`7-17`: Specify a language for the fenced config block to satisfy markdownlint and improve readability. Use `hocon` for syntax highlighting:

```hocon
s3Upload {
  enabled = true
  # Use the prefix / name of a credential in the dataVaults section here to use it for uploads.
  credentialName = "s3://fsn1.your-objectstorage.com/webknossos-test/upload-tests/"
  endpoint = "https://fsn1.your-objectstorage.com"
  bucketName = "webknossos-test"
  objectKeyPrefix = "upload-tests"
}
```

webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1)

`70-71`: Include S3Upload and DataVaults in `children` for consistency with nested config traversal/logging. This prevents omissions in config dumping/validation that rely on `children`.

```diff
- val children = List(WebKnossos, WatchFileSystem, Cache, AdHocMesh, Redis, AgglomerateSkeleton)
+ val children = List(WebKnossos, WatchFileSystem, Cache, AdHocMesh, Redis, AgglomerateSkeleton, DataVaults, S3Upload)
```
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)
`525-540`: Surface partial-failure details from TransferManager. Checking `failedTransfers.isEmpty` is good; consider also logging the individual failures (key and exception) to aid debugging large uploads.
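A sketch of what that could look like, assuming the upload goes through the SDK's transfer manager and yields a `CompletedDirectoryUpload` (the surrounding `logger` is assumed to be in scope, e.g. via LazyLogging):

```scala
import software.amazon.awssdk.transfer.s3.model.CompletedDirectoryUpload

import scala.jdk.CollectionConverters._

// Hypothetical helper: log each failed file transfer with its target key and cause.
def logFailedTransfers(completed: CompletedDirectoryUpload, uploadId: String): Unit =
  completed.failedTransfers().asScala.foreach { failed =>
    val key = failed.request().putObjectRequest().key()
    logger.warn(s"S3 upload $uploadId: transfer of object $key failed", failed.exception())
  }
```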
`818-823`: Exclude the S3 staging root from orphan cleanup. `cleanUpOrphanUploadsForOrga` will treat the `.cloud` directory itself as an "orphan" and delete/recreate it, adding churn. Filter it out. Apply:

```diff
-      uploadDirs <- PathUtils.listDirectories(orgaUploadingDir, silent = false).toFox
-      _ <- Fox.serialCombined(uploadDirs) { uploadDir =>
+      uploadDirs <- PathUtils.listDirectories(orgaUploadingDir, silent = false).toFox
+      uploadDirsFiltered = uploadDirs.filterNot(_.getFileName.toString == uploadToS3Dir)
+      _ <- Fox.serialCombined(uploadDirsFiltered) { uploadDir =>
```

`398-422`: S3 path update + cleanup sequence: add best-effort rollback. If `updateDataSource` fails after a successful S3 upload, the local unpack dir is already deleted. Consider best-effort cleanup (e.g., deleting the uploaded S3 prefix) or tagging the dataset with a "failed upload" signal to enable ops cleanup.
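For illustration, a best-effort prefix deletion could look like the following sketch. It uses the synchronous `S3Client` for brevity (the PR itself wires an async client), and `logger` is assumed to be in scope:

```scala
import software.amazon.awssdk.services.s3.S3Client
import software.amazon.awssdk.services.s3.model.{Delete, DeleteObjectsRequest, ListObjectsV2Request, ObjectIdentifier}

import scala.jdk.CollectionConverters._

// Hypothetical rollback helper: delete all objects under the uploaded prefix.
// Errors are swallowed and only logged, since this is best-effort cleanup.
def deletePrefixBestEffort(s3: S3Client, bucket: String, prefix: String): Unit =
  try {
    val pages = s3.listObjectsV2Paginator(ListObjectsV2Request.builder().bucket(bucket).prefix(prefix).build())
    pages.asScala.foreach { page =>
      val keys = page.contents().asScala.map(o => ObjectIdentifier.builder().key(o.key()).build())
      if (keys.nonEmpty)
        s3.deleteObjects(
          DeleteObjectsRequest.builder().bucket(bucket).delete(Delete.builder().objects(keys.asJava).build()).build())
    }
  } catch { case e: Exception => logger.warn(s"Best-effort S3 cleanup of $prefix failed", e) }
```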
`254-266`: Path traversal check could be more robust. `startsWith("../")` on a relativized path is brittle. Prefer normalizing and ensuring the target stays within `uploadDir`. Apply:

```diff
-  private def isOutsideUploadDir(uploadDir: Path, filePath: String): Boolean =
-    uploadDir.relativize(uploadDir.resolve(filePath)).startsWith("../")
+  private def isOutsideUploadDir(uploadDir: Path, filePath: String): Boolean = {
+    val target = uploadDir.resolve(filePath).normalize()
+    !target.startsWith(uploadDir)
+  }
```
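For illustration, how the normalized check behaves (values illustrative, assuming `uploadDir = /srv/uploads/abc`):

```scala
import java.nio.file.Paths

val uploadDir = Paths.get("/srv/uploads/abc")
// Stays inside the upload dir: evaluates to true.
uploadDir.resolve("color/1/z0.wkw").normalize().startsWith(uploadDir)
// "../../etc/passwd" normalizes to /srv/etc/passwd, which escapes the dir: evaluates to false.
uploadDir.resolve("../../etc/passwd").normalize().startsWith(uploadDir)
```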
`826-853`: Orphan upload cleanup at startup: OK; consider periodic metrics. You may want a counter for deleted orphans to surface drift.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- app/controllers/WKRemoteDataStoreController.scala (1 hunks)
- app/models/dataset/DatasetService.scala (2 hunks)
- conf/application.conf (1 hunks)
- conf/webknossos.latest.routes (1 hunks)
- project/Dependencies.scala (2 hunks)
- unreleased_changes/8912.md (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (4 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DirectoryConstants.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (13 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1)
- util/src/main/scala/com/scalableminds/util/tools/ConfigReader.scala (1): get (23-26)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (4)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DatasetLayerAttachments.scala (3): isEmpty (21-21), DatasetLayerAttachments (13-22), DatasetLayerAttachments (24-27)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataVaultService.scala (2): DataVaultService (21-30), DataVaultService (32-69)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1): mapped (411-511)
- util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (2): toString (38-38), toMagLiteral (40-43)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (3): coordinateTransformations (89-89), defaultViewConfiguration (101-101), attachments (107-107)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (8)
- util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (3): PathUtils (14-14), ensureDirectoryBox (124-132), deleteDirectoryRecursively (186-193)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4): DataStoreConfig (10-79), Datastore (17-71), S3Upload (63-69), DataVaults (60-62)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1): CredentialConfigReader (8-64)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/DataStoreRedisStore.scala (1): DataStoreRedisStore (8-12)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RemoteSourceDescriptorService.scala (1): RemoteSourceDescriptorService (20-177)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/RedisTemporaryStore.scala (2): insert (55-66), remove (82-87)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1): prependAllPaths (369-418)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2): updateDataSource (139-147), getCredential (194-205)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (3)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (3): DataStoreConfig (10-79), Datastore (17-71), S3Upload (63-69)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2): isKnownUpload (175-178), reserveUpload (192-230)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1): reserveDataSourceUpload (130-139)
🪛 markdownlint-cli2 (0.17.2)
unreleased_changes/8912.md
8-8: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (19)
conf/webknossos.latest.routes (1)

`290-290`: Whitespace-only change — OK. No functional impact.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DirectoryConstants.scala (1)

`7-7`: Constant addition looks good. Consistent with the hidden-dot convention for internal dirs.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1)

`63-69`: Config group addition — OK. Fields align with application.conf.

app/models/dataset/DatasetService.scala (2)

`101-102`: Propagation of isVirtual — OK. Correctly forwards to createDataset.

`91-93`: Signature change verified — call sites updated. DatasetService.scala now requires isVirtual; the only invocation found (app/controllers/WKRemoteDataStoreController.scala) passes uploadInfo.isVirtual.getOrElse(false).

project/Dependencies.scala (1)

`8-8`: Pinning awsVersion and adding s3-transfer-manager — artifacts available. Verified software.amazon.awssdk:s3 and software.amazon.awssdk:s3-transfer-manager v2.32.24 are present on Maven Central (HTTP 200).

app/controllers/WKRemoteDataStoreController.scala (1)

`88-94`: createPreliminaryDataset now accepts isVirtual; call sites verified. The signature in app/models/dataset/DatasetService.scala includes isVirtual: Boolean; the sole call site found (app/controllers/WKRemoteDataStoreController.scala) passes uploadInfo.requireUniqueName.getOrElse(false) and uploadInfo.isVirtual.getOrElse(false), preserving the legacy default false — no other usages found.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)

`415-507`: Attachment path rewrites included in mapped(): LGTM. Adding `attachmentMapping` and propagating it across all concrete copies is correct and unlocks URI rewrites for attachments alongside mags.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2)

`70-74`: Constructor wiring for DataStoreConfig is fine. No issues; enables controller-level config decisions.

`147-149`: Manual uploads: forcing isVirtual = false is appropriate. Matches the "on-disk" manual workflow.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (9)
`384-386`: Back-compat: datasetId missing in Redis for older reservations. `getDatasetIdByUploadId(uploadId)` will fail for uploads reserved before this PR. Add a fallback (e.g., resolving via WK by directoryName) to let those finish.

Would you like me to add a `DSRemoteWebknossosClient.findDatasetId(...)` fallback here?

`454-462`: Path allowlist check for explored datasources: good guard. Ensures URIs after exploration are permissible.

`183-186`: Staging dir for S3 uploads: OK. Using `${uploadingDir}/$uploadToS3Dir/$uploadId` isolates S3 prep; no functional issues spotted.

Also applies to: 586-599

`531-539`: Content-encoding workaround comment. The checksum-setting workaround for "aws-chunked" is pragmatic. Please confirm with a small integration test that uploaded objects are readable without special headers.
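Such a check could be as small as the following sketch, assuming an `S3AsyncClient` configured like the upload client (bucket and key are illustrative):

```scala
import software.amazon.awssdk.services.s3.S3AsyncClient
import software.amazon.awssdk.services.s3.model.HeadObjectRequest

// Hypothetical check: after uploading, the object should not carry the "aws-chunked" content encoding.
def assertNoChunkedEncoding(s3: S3AsyncClient, bucket: String, key: String): Unit = {
  val head = s3.headObject(HeadObjectRequest.builder().bucket(bucket).key(key).build()).join()
  val encoding = Option(head.contentEncoding())
  assert(!encoding.contains("aws-chunked"), s"Object $key unexpectedly stored with Content-Encoding $encoding")
}
```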
`269-279`: Chunk handling and bounds checks: solid. Includes a size guard via `totalFileSizeInBytes` and idempotent chunk writes.

Also applies to: 315-336

`409-417`: Delete local staging after S3 push: good. Saves disk and avoids double-storage.

`621-641`: Layer linking rename switch is clean. Covers all concrete layer types.

`760-789`: ZIP vs. raw layout handling is careful. Truncating common prefixes and excluding known layer prefixes avoids extra nesting.

`855-861`: Helper to parse JSON from Redis: OK. Minimal and typed.
Thanks for diving into this! I had a first look and added some comments.
@normanrz maybe you would like to have a look as well, especially at some of the design decisions? Does this fit with our multi-bucket setup? Is it ok that we can’t show datastore-to-s3 upload progress to the user?
I did not yet do any testing.
Note that we’ll have to see what is merged first, this or #8844 – if the latter, we could use UPath stuff here as well.
Also, I guess the worker has to be adapted to write the output of the convert_to_wkw job to S3 as well. But maybe that can be deployed separately from this change?
Actionable comments posted: 3
🧹 Nitpick comments (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (2)
`162-173`: Public helper should handle regional virtual-hosted endpoints (bucket.s3.us-xyz-1.amazonaws.com). `hostBucketFromUri` fails for the regional virtual-hosted style; making it public risks propagating this limitation. Extract the bucket using the first occurrence of ".s3" and normalize path handling. Apply:

```diff
-  def hostBucketFromUri(uri: URI): Option[String] = {
-    val host = uri.getHost
-    if (isShortStyle(uri)) { // assume host is omitted from uri, shortcut form s3://bucket/key
-      Some(host)
-    } else if (isVirtualHostedStyle(uri)) {
-      Some(host.substring(0, host.length - ".s3.amazonaws.com".length))
-    } else if (isPathStyle(uri)) {
-      Some(uri.getPath.substring(1).split("/")(0))
-    } else {
-      None
-    }
-  }
+  def hostBucketFromUri(uri: URI): Option[String] = {
+    val host = Option(uri.getHost).getOrElse("")
+    if (isShortStyle(uri)) { // s3://bucket/key
+      Some(host)
+    } else if (isVirtualHostedStyle(uri)) { // bucket.s3[.region].amazonaws.com
+      val idx = host.indexOf(".s3")
+      if (idx > 0) Some(host.substring(0, idx)) else None
+    } else if (isPathStyle(uri)) { // s3[.region].amazonaws.com/bucket/key OR non-AWS host
+      Some(uri.getPath.stripPrefix("/").split("/")(0))
+    } else None
+  }
```

`175-186`: Broaden endpoint detection for AWS styles; escape the regex dots. Covers regional and dualstack variants; the current `endsWith(".s3.amazonaws.com")` is too narrow. Apply:

```diff
-  private def isVirtualHostedStyle(uri: URI): Boolean =
-    uri.getHost.endsWith(".s3.amazonaws.com")
+  private def isVirtualHostedStyle(uri: URI): Boolean =
+    uri.getHost.matches(""".+\.s3([.-][\w\-]+)?\.amazonaws\.com""")

-  private def isPathStyle(uri: URI): Boolean =
-    uri.getHost.matches("s3(.[\\w\\-_]+)?.amazonaws.com") ||
-      (!uri.getHost.contains("amazonaws.com") && uri.getHost.contains("."))
+  private def isPathStyle(uri: URI): Boolean =
+    uri.getHost.matches("""s3([.-][\w\-]+)?\.amazonaws\.com""") ||
+      (!uri.getHost.contains("amazonaws.com") && uri.getHost.contains("."))
```

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (4)
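For reference, the three URI styles the suggested variant should accept, with the bucket each yields (illustrative values):

```scala
import java.net.URI

hostBucketFromUri(new URI("s3://my-bucket/some/key"))                                  // Some("my-bucket"), short style
hostBucketFromUri(new URI("https://my-bucket.s3.eu-central-1.amazonaws.com/some/key")) // Some("my-bucket"), virtual-hosted style
hostBucketFromUri(new URI("https://s3.eu-central-1.amazonaws.com/my-bucket/some/key")) // Some("my-bucket"), path style
```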
`186-188`: Exclude the hidden staging dir from orphan cleanup. `.cloud` under `uploading/` can be mistaken for an orphan and deleted by `cleanUpOrphanUploadsForOrga`. Skip dot-prefixed dirs in that sweep.

Proposed check inside `cleanUpOrphanUploadsForOrga` before trying to delete: if `uploadDir.getFileName.toString.startsWith(".")`, continue.
`510-523`: Bucket/endpoint derivation assumptions are brittle. `s3UploadBucketOpt` depends on `credentialName` being an S3 URI containing a bucket segment, and `s3UploadEndpoint` forces HTTPS; some S3-compatible stores are HTTP-only. Validate the `credentialName` format at startup and fail fast with a clear message. Consider reusing `S3DataVault.getAmazonS3Client` (with an injected WSClient) for consistent endpoint/region handling, or make the protocol configurable.
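A minimal sketch of such a fail-fast validation, assuming `credentialName` is expected to be an s3:// URI whose host is the endpoint and whose first path segment is the bucket (helper name and error wording are illustrative):

```scala
import java.net.URI
import scala.util.{Failure, Success, Try}

// Hypothetical startup check: parse credentialName once and fail with a clear message if malformed.
def validateCredentialName(credentialName: String): Unit =
  Try(new URI(credentialName)) match {
    case Success(uri) if uri.getScheme == "s3" && Option(uri.getHost).exists(_.nonEmpty) =>
      () // looks like s3://<endpoint-host>/<bucket>/<prefix>
    case Success(_) =>
      throw new IllegalArgumentException(
        s"datastore.s3Upload.credentialName must be an s3:// URI with a host, got: $credentialName")
    case Failure(e) =>
      throw new IllegalArgumentException(s"datastore.s3Upload.credentialName is not a valid URI: $credentialName", e)
  }
```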
`524-541`: Region is hardcoded to us-east-1. This can break signing for AWS regional endpoints. Either parse it from the host (like S3DataVault) or make it configurable.

Short-term: add a datastore.s3Upload.region with default `us-east-1`. Long-term: parse it via `AwsHostNameUtils.parseSigningRegion`.
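A sketch of the long-term variant; it assumes the AWS SDK v2 helper `software.amazon.awssdk.awscore.util.AwsHostNameUtils.parseSigningRegion` (awscore module) is available on the classpath:

```scala
import software.amazon.awssdk.awscore.util.AwsHostNameUtils
import software.amazon.awssdk.regions.Region

// Hypothetical: derive the signing region from the endpoint host, defaulting to us-east-1.
def signingRegionFor(endpointHost: String): Region =
  AwsHostNameUtils.parseSigningRegion(endpointHost, "s3").orElse(Region.US_EAST_1)
```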
`575-594`: Edge case: the depth limit may keep unused files. `maxDepth = 10` may miss deep trees, leaving unreferenced files to be uploaded.

- Consider unlimited traversal or a higher bound, or compute `neededPaths` first and filter by prefix without full recursion (see the sketch below).
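One way the prefix-filtering variant could look, as a sketch using `Files.walkFileTree` to skip whole subtrees that are needed anyway (method name illustrative; `neededPaths` as computed in the existing code):

```scala
import java.nio.file.attribute.BasicFileAttributes
import java.nio.file.{FileVisitResult, Files, Path, SimpleFileVisitor}
import scala.collection.mutable.ListBuffer

// Hypothetical: collect files not under any needed path, without a fixed depth bound.
def collectNonReferencedFiles(dataDir: Path, neededPaths: Set[Path]): List[Path] = {
  val result = ListBuffer.empty[Path]
  Files.walkFileTree(
    dataDir,
    new SimpleFileVisitor[Path] {
      override def preVisitDirectory(dir: Path, attrs: BasicFileAttributes): FileVisitResult =
        // Skip whole subtrees that are needed anyway; they will be uploaded, not deleted.
        if (neededPaths.exists(p => dir.startsWith(p))) FileVisitResult.SKIP_SUBTREE else FileVisitResult.CONTINUE
      override def visitFile(file: Path, attrs: BasicFileAttributes): FileVisitResult = {
        if (!neededPaths.exists(p => file.startsWith(p))) result += file
        FileVisitResult.CONTINUE
      }
    }
  )
  result.toList
}
```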
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- conf/application.conf (1 hunks)
- unreleased_changes/8912.md (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (12 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala
- conf/application.conf
🧰 Additional context used
🧬 Code graph analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4): DataStoreConfig (10-77), Datastore (17-69), S3Upload (63-67), DataVaults (60-62)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5): S3DataVault (45-149), S3DataVault (151-244), getCredential (140-140), hostBucketFromUri (162-173), create (152-160)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/UnfinishedUpload.scala (2): UnfinishedUpload (7-16), UnfinishedUpload (18-20)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4): DSRemoteWebknossosClient (57-225), updateDataSource (139-147), reportDataSource (84-89), getCredential (194-205)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1): CredentialConfigReader (8-64)
🪛 markdownlint-cli2 (0.17.2)
unreleased_changes/8912.md
8-8: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: frontend-tests
- GitHub Check: build-smoketest-push
- GitHub Check: backend-tests
Actionable comments posted: 0
🧹 Nitpick comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
`511-531`: Consider making S3 client initialization more robust. The current S3 client initialization could handle credential failures more gracefully. Following up on the previous comment about credentials, consider using `Box` for better error handling:

```diff
-  private lazy val s3ClientBox: Box[S3AsyncClient] = for {
-    accessKeyId <- Box(s3UploadCredentialsOpt.map(_._1))
-    secretAccessKey <- Box(s3UploadCredentialsOpt.map(_._2))
-  } yield
-    S3AsyncClient
-      .builder()
-      .credentialsProvider(
-        StaticCredentialsProvider.create(
-          AwsBasicCredentials.builder.accessKeyId(accessKeyId).secretAccessKey(secretAccessKey).build()
-        ))
-      .crossRegionAccessEnabled(true)
-      .forcePathStyle(true)
-      .endpointOverride(s3UploadEndpoint)
-      .region(Region.US_EAST_1)
-      // Disabling checksum calculation prevents files being stored with Content Encoding "aws-chunked".
-      .requestChecksumCalculation(RequestChecksumCalculation.WHEN_REQUIRED)
-      .build()
+  private lazy val s3ClientBox: Box[S3AsyncClient] =
+    s3UploadCredentialsOpt match {
+      case Some((accessKeyId, secretAccessKey)) if accessKeyId.nonEmpty && secretAccessKey.nonEmpty =>
+        Full(
+          S3AsyncClient
+            .builder()
+            .credentialsProvider(
+              StaticCredentialsProvider.create(
+                AwsBasicCredentials.builder.accessKeyId(accessKeyId).secretAccessKey(secretAccessKey).build()
+              ))
+            .crossRegionAccessEnabled(true)
+            .forcePathStyle(true)
+            .endpointOverride(s3UploadEndpoint)
+            .region(Region.US_EAST_1)
+            // Disabling checksum calculation prevents files being stored with Content Encoding "aws-chunked".
+            .requestChecksumCalculation(RequestChecksumCalculation.WHEN_REQUIRED)
+            .build()
+        )
+      case _ =>
+        Failure(s"S3 upload enabled but no valid credentials found for ${dataStoreConfig.Datastore.S3Upload.credentialName}")
+    }

   private lazy val transferManagerBox: Box[S3TransferManager] = for {
     client <- s3ClientBox
   } yield S3TransferManager.builder().s3Client(client).build()
```

Also, consider making the S3 region configurable instead of hardcoding `Region.US_EAST_1`.
`562-580`: Consider optimizing the file-filtering logic. The `getNonReferencedFiles` method could be improved for clarity and efficiency. Consider extracting the implicit-path construction logic:

```diff
   private def getNonReferencedFiles(dataDir: Path, dataSource: DataSource): Fox[List[Path]] =
     for {
       usableDataSource <- dataSource.toUsable.toFox ?~> "Data source is not usable"
-      explicitPaths: Set[Path] = usableDataSource.dataLayers
+      implicitPaths: Set[Path] = usableDataSource.dataLayers
         .flatMap(layer =>
           layer.mags.map(mag =>
             mag.path match {
               case Some(_) => None
               case None    => Some(dataDir.resolve(List(layer.name, mag.mag.toMagLiteral(true)).mkString("/")))
           }))
         .flatten
         .toSet
-      neededPaths = usableDataSource.dataLayers
+      explicitPaths = usableDataSource.dataLayers
         .flatMap(layer => layer.allExplicitPaths)
         .map(dataDir.resolve)
-        .toSet ++ explicitPaths
+        .toSet
+      neededPaths = explicitPaths ++ implicitPaths
       allFiles <- PathUtils.listFilesRecursive(dataDir, silent = true, maxDepth = 10).toFox
       filesToDelete = allFiles.filterNot(file => neededPaths.exists(neededPath => file.startsWith(neededPath)))
     } yield filesToDelete
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- app/controllers/WKRemoteDataStoreController.scala (1 hunks)
- app/models/dataset/DatasetService.scala (1 hunks)
- conf/application.conf (1 hunks)
- project/Dependencies.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (3 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (3 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (12 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- conf/application.conf
- app/controllers/WKRemoteDataStoreController.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala
- app/models/dataset/DatasetService.scala
- project/Dependencies.scala
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-23T08:51:57.756Z
Learnt from: frcroth
PR: scalableminds/webknossos#8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/MeshFileService.scala:170-173
Timestamp: 2025-04-23T08:51:57.756Z
Learning: In the webknossos codebase, classes extending `FoxImplicits` have access to an implicit conversion from `Option[A]` to `Fox[A]`, where `None` is converted to an empty Fox that fails gracefully in for-comprehensions.
Applied to files:
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala
🧬 Code graph analysis (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (3)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (3): DataStoreConfig (11-78), Datastore (18-70), S3Upload (64-68)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2): isKnownUpload (164-167), reserveUpload (181-219)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1): reserveDataSourceUpload (129-138)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (3): coordinateTransformations (85-85), attachments (103-103), defaultViewConfiguration (97-97)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/PathSchemes.scala (1): PathSchemes (3-9)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1): mapped (102-131)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)
- util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (4): PathUtils (14-14), ensureDirectoryBox (124-132), deleteDirectoryRecursively (186-193), listFilesRecursive (104-108)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4): DataStoreConfig (11-78), Datastore (18-70), S3Upload (64-68), DataVaults (61-63)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5): S3DataVault (46-155), S3DataVault (157-250), getCredential (143-143), hostBucketFromUri (168-179), create (158-166)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4): DSRemoteWebknossosClient (56-222), updateDataSource (138-144), reportDataSource (83-88), getCredential (191-202)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1): CredentialConfigReader (8-64)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-smoketest-push
- GitHub Check: frontend-tests
🔇 Additional comments (7)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)

`102-129`: LGTM! Well-structured attachment mapping support. The addition of the `attachmentMapping` parameter with a default identity function is clean and maintains backward compatibility. The implementation correctly applies the mapping to both the StaticColorLayer and StaticSegmentationLayer branches.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (4)
`345-414`: Persist and use isVirtual from the reservation to ensure consistency. The `finishUpload` method currently decides between S3 and local storage using `dataStoreConfig.Datastore.S3Upload.enabled` at line 353. This can lead to inconsistent behavior if the configuration changes between reservation and finish.

`390-395`: Replace with Instant.logSince for consistency. Use the utility method for consistent timing logs.

`387-406`: Add cleanup on S3 upload failure. If the S3 upload fails between lines 393-405, local temporary data remains and the pre-created dataset may linger without proper cleanup.

`489-510`: Fail fast if S3 credentials are missing. The lazy initialization of `s3UploadCredentialsOpt` at line 489 uses a `getOrElse(("", ""))` pattern that silently creates a client with empty credentials, leading to confusing runtime failures.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
`99-105`: Good: isVirtual set from config at reservation time. Setting `shouldBeVirtual` once and enforcing it via `request.body.copy(isVirtual = Some(shouldBeVirtual))` is the right approach. The controller correctly captures the configuration state at reservation time.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1)

`254-302`: URI construction with a null scheme creates invalid relative paths. The `replaceUri` function creates URIs with null schemes when prepending paths, which produces relative path strings that may not be handled properly by S3 clients and could cause path-resolution issues. Apply this refactor to handle URI construction and path normalization properly:

```diff
-  // Prepend newBasePath to all (relative) paths in mags and attachments of the data source.
-  def prependAllPaths(dataSource: DataSource, newBasePath: String): Fox[DataSource] = {
-    val replaceUri = (uri: URI) => {
-      val isRelativeFilePath = (uri.getScheme == null || uri.getScheme.isEmpty || uri.getScheme == PathSchemes.schemeFile) && !uri.isAbsolute
-      uri.getPath match {
-        case pathStr if isRelativeFilePath =>
-          new URI(uri.getScheme,
-                  uri.getUserInfo,
-                  uri.getHost,
-                  uri.getPort,
-                  newBasePath + pathStr,
-                  uri.getQuery,
-                  uri.getFragment)
-        case _ => uri
-      }
-    }
+  // Prepend newBasePath to all relative paths in mags and attachments of the data source.
+  def prependAllPaths(dataSource: DataSource, newBasePath: String): Fox[DataSource] = {
+    def ensureTrailingSlash(s: String): String =
+      if (s.endsWith("/")) s else s + "/"
+
+    def stripLeadingSlash(s: String): String =
+      if (s.startsWith("/")) s.drop(1) else s
+
+    val baseUri = new URI(ensureTrailingSlash(newBasePath))
+
+    def prependToPath(pathStr: String): String =
+      baseUri.resolve(stripLeadingSlash(pathStr)).toString
+
+    def replaceUri(uri: URI): URI = {
+      val scheme = Option(uri.getScheme).getOrElse("")
+      // Only rewrite relative, non-opaque URIs (no scheme or file scheme)
+      if (scheme.isEmpty || scheme == PathSchemes.schemeFile) {
+        new URI(prependToPath(uri.getPath))
+      } else {
+        uri
+      }
+    }

     dataSource.toUsable match {
       case Some(usableDataSource) =>
         val updatedDataLayers = usableDataSource.dataLayers.map {
           case layerWithMagLocators: StaticLayer =>
             layerWithMagLocators.mapped(
               magMapping = mag =>
                 mag.path match {
-                  case Some(pathStr) => mag.copy(path = Some(replaceUri(new URI(pathStr)).toString))
+                  case Some(pathStr) => mag.copy(path = Some(prependToPath(pathStr)))
                   // If the mag does not have a path, it is an implicit path, we need to make it explicit.
                   case _ =>
                     mag.copy(
                       path = Some(
-                        new URI(newBasePath)
-                          .resolve(List(layerWithMagLocators.name, mag.mag.toMagLiteral(true)).mkString("/"))
-                          .toString))
+                        baseUri
+                          .resolve(List(layerWithMagLocators.name, mag.mag.toMagLiteral(true)).mkString("/"))
+                          .toString))
               },
               attachmentMapping = attachment =>
                 DataLayerAttachments(
                   attachment.meshes.map(a => a.copy(path = replaceUri(a.path))),
                   attachment.agglomerates.map(a => a.copy(path = replaceUri(a.path))),
                   attachment.segmentIndex.map(a => a.copy(path = replaceUri(a.path))),
                   attachment.connectomes.map(a => a.copy(path = replaceUri(a.path))),
                   attachment.cumsum.map(a => a.copy(path = replaceUri(a.path)))
               )
             )
           case layer => layer
         }
         Fox.successful(usableDataSource.copy(dataLayers = updatedDataLayers))
       case None => Fox.failure("Cannot replace paths of unusable datasource")
     }
   }
```
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
app/controllers/WKRemoteDataStoreController.scala (2)
`207-224`: Await dataset deletion and storage refresh. Deletion is created as a Fox but not sequenced; effects may not run and errors won't propagate. Use `<-` to await.

```diff
-      _ = datasetDAO
-        .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0)
-        .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset))
-    } yield ()
+      _ <- datasetDAO
+        .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0)
+        .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset))
+    } yield ()
```
`269-279`: Re-evaluate ignoring updates for virtual datasets. With S3 uploads, datasets remain virtual but still need their DataSource persisted. Either keep the DB update in reportDatasetUpload (preferred) or allow this endpoint to update virtual datasets too.

```diff
-      _ <- Fox.runIf(!dataset.isVirtual)(
-        datasetDAO.updateDataSource(datasetId,
-                                    name,
-                                    request.body.hashCode(),
-                                    request.body,
-                                    isUsable = request.body.toUsable.isDefined)(GlobalAccessContext))
+      _ <- datasetDAO.updateDataSource(
+        datasetId,
+        name,
+        request.body.hashCode(),
+        request.body,
+        isUsable = request.body.toUsable.isDefined
+      )(GlobalAccessContext)
```

If you keep the persistence in reportDatasetUpload, you can leave this route as-is but ensure no client relies on it for S3 uploads.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)

`329-339`: Fix nested Fox and ensure cleanup triggers on size violation. Using `map` returns `Fox[Fox[Unit]]` and doesn't sequence cleanup/failure. Use `flatMap` and propagate the error.

```diff
   private def assertWithinRequestedFileSizeAndCleanUpOtherwise(uploadDir: Path, uploadId: String): Fox[Unit] =
     for {
       totalFileSizeInBytesOpt <- runningUploadMetadataStore.find(redisKeyForTotalFileSizeInBytes(uploadId))
-      _ <- Fox.runOptional(totalFileSizeInBytesOpt) { maxFileSize =>
-        tryo(FileUtils.sizeOfDirectoryAsBigInteger(uploadDir.toFile).longValue).toFox.map(actualFileSize =>
-          if (actualFileSize > maxFileSize.toLong) {
-            cleanUpDatasetExceedingSize(uploadDir, uploadId)
-            Fox.failure(s"Uploaded dataset exceeds the maximum allowed size of $maxFileSize bytes")
-          } else Fox.successful(()))
-      }
+      _ <- Fox.runOptional(totalFileSizeInBytesOpt) { maxFileSize =>
+        for {
+          actualFileSize <- tryo(FileUtils.sizeOfDirectoryAsBigInteger(uploadDir.toFile).longValue).toFox
+          _ <- if (actualFileSize > maxFileSize.toLong) {
+            cleanUpDatasetExceedingSize(uploadDir, uploadId)
+              .flatMap(_ => Fox.failure(s"Uploaded dataset exceeds the maximum allowed size of $maxFileSize bytes"))
+          } else Fox.successful(())
+        } yield ()
+      }
     } yield ()
```
`232-241`: Guard against empty filePathRaw before charAt(0). If the path is empty, `charAt(0)` throws. Use `nonEmpty && startsWith("/")`.

```diff
-      filePathRaw = uploadFileId.split("/").tail.mkString("/")
-      filePath = if (filePathRaw.charAt(0) == '/') filePathRaw.drop(1) else filePathRaw
+      filePathRaw = uploadFileId.split("/").tail.mkString("/")
+      filePath = if (filePathRaw.nonEmpty && filePathRaw.charAt(0) == '/') filePathRaw.drop(1) else filePathRaw
```
🧹 Nitpick comments (1)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)

`414-426`: Honor the reserved "isVirtual/uploadToS3" flag instead of live config. This branch uses `dataStoreConfig.Datastore.S3Upload.enabled`. Persist and read the reservation's isVirtual/uploadToS3 to avoid inconsistency if the config flips. Proposed shape (see the sketch after this list):

- Persist isVirtual at reserve time (e.g., `redisKeyForIsVirtual(uploadId)`).
- Read it in finish and pass `uploadToS3: Boolean` into `moveUnpackedToTarget`.
- Prefer the stored flag; fall back to config only if missing.

I can provide a concrete patch if you confirm the intended key naming in Redis.
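A minimal sketch of that shape, following the existing redisKeyFor… helper pattern in UploadService; the key name `redisKeyForIsVirtual` and its exact format are the proposal above, not existing code:

```scala
// Hypothetical key helper, mirroring the other redisKeyFor… helpers (exact format assumed):
private def redisKeyForIsVirtual(uploadId: String): String =
  s"upload___${uploadId}___isVirtual"

// Reserve time: persist the decision with the other reservation metadata.
//   _ <- runningUploadMetadataStore.insert(redisKeyForIsVirtual(uploadId), isVirtual.toString)

// Finish time: prefer the stored flag; fall back to live config only if it is missing.
//   storedOpt <- runningUploadMetadataStore.find(redisKeyForIsVirtual(uploadId))
//   uploadToS3 = storedOpt.map(_.toBoolean).getOrElse(dataStoreConfig.Datastore.S3Upload.enabled)
```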
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- app/controllers/DatasetController.scala (5 hunks)
- app/controllers/WKRemoteDataStoreController.scala (6 hunks)
- app/models/dataset/DatasetService.scala (4 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (18 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
app/controllers/DatasetController.scala (3)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2): LinkedLayerIdentifier (80-80), LinkedLayerIdentifier (82-84)
- app/models/storage/UsedStorageService.scala (2): UsedStorageService (21-103), refreshStorageReportForDataset (89-103)
- app/models/dataset/DatasetService.scala (1): trackNewDataset (558-570)

app/models/dataset/DatasetService.scala (8)
- app/models/user/MultiUser.scala (1): MultiUserDAO (39-181)
- app/mail/MailchimpClient.scala (2): MailchimpClient (17-82), tagUser (45-54)
- app/models/analytics/AnalyticsService.scala (3): AnalyticsService (22-97), track (38-46), isSuperUser (103-108)
- app/models/analytics/AnalyticsEvent.scala (1): UploadDatasetEvent (205-218)
- app/telemetry/SlackNotificationService.scala (2): SlackNotificationService (11-59), info (30-36)
- app/utils/WkConf.scala (1): WkConf (14-296)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (2): UnusableDataSource (45-62), UnusableDataSource (64-66)
- app/models/dataset/DataStore.scala (1): findOneByName (118-125)

app/controllers/WKRemoteDataStoreController.scala (5)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1): deleteDataset (138-143)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (6): LinkedLayerIdentifier (80-80), LinkedLayerIdentifier (82-84), ReportDatasetUploadParameters (69-74), ReportDatasetUploadParameters (75-78), ReserveAdditionalInformation (63-63), ReserveAdditionalInformation (64-67)
- util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1): validateJson (204-209)
- app/models/dataset/DataStore.scala (1): validateAccess (84-91)
- app/security/WebknossosBearerTokenAuthenticatorService.scala (1): userForToken (76-84)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (10)
- util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (3): PathUtils (14-14), ensureDirectoryBox (124-132), deleteDirectoryRecursively (186-193)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4): DataStoreConfig (11-78), Datastore (18-70), S3Upload (64-68), DataVaults (61-63)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5): S3DataVault (46-155), S3DataVault (157-250), getCredential (143-143), hostBucketFromUri (168-179), create (158-166)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/explore/ExploreLocalLayerService.scala (1): ExploreLocalLayerService (21-152)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2): UPath (54-96), fromLocalPath (80-80)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/UnfinishedUpload.scala (2): UnfinishedUpload (7-16), UnfinishedUpload (18-20)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4): DSRemoteWebknossosClient (57-216), reportUpload (98-105), getCredential (185-196), deleteDataset (138-143)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2): DataSourceService (25-307), resolvePathsInNewBasePath (253-275)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1): CredentialConfigReader (8-64)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (2): finishUpload (185-196), deleteOnDisk (390-414)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1): deleteOnDisk (28-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: backend-tests
- GitHub Check: build-smoketest-push
- GitHub Check: frontend-tests
🔇 Additional comments (7)

app/models/dataset/DatasetService.scala (2)

`88-97`: Preliminary datasets as virtual: OK. Creating preliminary datasets as virtual aligns with S3 uploads and later activation via the report path.

`558-569`: Good tracking side-effects placement. Slack, analytics, and Mailchimp tagging gated on needsConversion look correct.
app/controllers/DatasetController.scala (2)

`650-665`: Nice: finalizing "upload to paths" updates storage and tracking. Refreshing used storage and calling trackNewDataset after marking usable is appropriate.

`140-148`: Injecting UsedStorageService here is appropriate. The controller's responsibilities are clear and stay thin.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)

`520-531`: Preserve scheme and port for custom S3 endpoints. Hardcoding https and omitting the port breaks MinIO and non-default endpoints.

```diff
-  private lazy val s3UploadEndpoint: URI = {
-    val credentialUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
-    new URI(
-      "https",
-      null,
-      credentialUri.getHost,
-      -1,
-      null,
-      null,
-      null
-    )
-  }
+  private lazy val s3UploadEndpoint: URI =
+    new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
```
`305-307`: Fix log typo: "datsetId" → "datasetId".

```diff
-        s"Error receiving chunk $currentChunkNumber for uploadId $uploadId (datsetId $datasetId): ${e.getMessage}"
+        s"Error receiving chunk $currentChunkNumber for uploadId $uploadId (datasetId $datasetId): ${e.getMessage}"
```

app/controllers/WKRemoteDataStoreController.scala (1)
`133-151`: Persist the final DataSource and mark the dataset usable in reportDatasetUpload (no-conversion path). Currently nothing is written to the DB, so virtual datasets never become usable and paths aren't stored. Update the dataset when needsConversion is false and require dataSourceOpt. Apply this diff:

```diff
-  def reportDatasetUpload(name: String,
-                          key: String,
-                          token: String,
-                          datasetId: ObjectId): Action[ReportDatasetUploadParameters] =
-    Action.async(validateJson[ReportDatasetUploadParameters]) { implicit request =>
-      dataStoreService.validateAccess(name, key) { _ =>
+  def reportDatasetUpload(name: String,
+                          key: String,
+                          token: String,
+                          datasetId: ObjectId): Action[ReportDatasetUploadParameters] =
+    Action.async(validateJson[ReportDatasetUploadParameters]) { implicit request =>
+      dataStoreService.validateAccess(name, key) { dataStore =>
         for {
           user <- bearerTokenService.userForToken(token)
           dataset <- datasetDAO.findOne(datasetId)(GlobalAccessContext) ?~> Messages("dataset.notFound", datasetId) ~> NOT_FOUND
-          _ <- Fox.runIf(!request.body.needsConversion)(usedStorageService.refreshStorageReportForDataset(dataset))
+          // Persist final DataSource and mark usable when no conversion is needed
+          _ <- Fox.runIf(!request.body.needsConversion) {
+            for {
+              dataSource <- request.body.dataSourceOpt.toFox ?~> "reportUpload.dataSourceMissing"
+              _ <- datasetDAO.updateDataSource(
+                datasetId,
+                dataStore.name,
+                dataSource.hashCode(),
+                dataSource,
+                isUsable = dataSource.toUsable.isDefined
+              )(GlobalAccessContext)
+            } yield ()
+          }
           _ <- datasetService.trackNewDataset(dataset,
                                               user,
                                               request.body.needsConversion,
                                               request.body.datasetSizeBytes,
                                               viaAddRoute = false)
+          _ <- Fox.runIf(!request.body.needsConversion)(usedStorageService.refreshStorageReportForDataset(dataset))
           // TODO update dataset in db with layersToLink
         } yield Ok
       }
     }
```
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- conf/webknossos.versioned.routes (2 hunks)
- util/src/main/scala/com/scalableminds/util/mvc/ApiVersioning.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/LegacyController.scala (3 hunks)
- webknossos-datastore/conf/datastore.versioned.routes (6 hunks)
- webknossos-tracingstore/conf/tracingstore.versioned.routes (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/LegacyController.scala
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: frontend-tests
- GitHub Check: build-smoketest-push
- GitHub Check: backend-tests
🔇 Additional comments (3)
conf/webknossos.versioned.routes (1)

`7-22`: Routing bump to v12 looks correct. The changelog note and the new `/v12/` mapping both line up with the datasetId-based upload flow and keep the newest routes prioritized.

webknossos-tracingstore/conf/tracingstore.versioned.routes (1)

`3-3`: Confirm the ApiVersioning trait update. The header reminds us this list must stay aligned with `com.scalableminds.util.mvc.ApiVersioning`. Please double-check that v12 is now reported there (and in any other version registry) so clients get a consistent version story.

util/src/main/scala/com/scalableminds/util/mvc/ApiVersioning.scala (1)

`8-8`: API version bump looks good. Thanks for keeping the versioning metadata in sync with the new API surface introduced in this PR.
Actionable comments posted: 5
🧹 Nitpick comments (5)
app/controllers/DatasetController.scala (1)
`236-241`: Track dataset creation after addVirtualDataset. Hooking analytics and notifications here is good. Consider viaAddRoute = true for add routes to reflect the origin in the Slack text.

```diff
-      _ <- datasetService.trackNewDataset(dataset,
-                                          user,
-                                          needsConversion = false,
-                                          datasetSizeBytes = 0,
-                                          viaAddRoute = false)
+      _ <- datasetService.trackNewDataset(dataset,
+                                          user,
+                                          needsConversion = false,
+                                          datasetSizeBytes = 0,
+                                          viaAddRoute = true)
```

app/models/dataset/LayerToLinkService.scala (1)
`22-31`: Validate layer existence early. Fail fast if the referenced layer does not exist in the source dataset; this catches mistakes at reservation time.

```diff
   def validateLayerToLink(layerIdentifier: LinkedLayerIdentifier, requestingUser: User)(implicit ec: ExecutionContext,
                                                                                         m: MessagesProvider): Fox[Unit] =
     for {
       dataset <- datasetDAO.findOne(layerIdentifier.datasetId)(AuthorizedAccessContext(requestingUser)) ?~> Messages(
         "dataset.notFound",
         layerIdentifier.datasetId) ~> NOT_FOUND
       isTeamManagerOrAdmin <- userService.isTeamManagerOrAdminOfOrg(requestingUser, dataset._organization)
       _ <- Fox.fromBool(isTeamManagerOrAdmin || requestingUser.isDatasetManager || dataset.isPublic) ?~> "dataset.upload.linkRestricted"
+      _ <- datasetService
+        .usableDataSourceFor(dataset)
+        .flatMap(uds => uds.dataLayers.find(_.name == layerIdentifier.layerName).toFox ?~> "dataset.layerToLink.layerNotFound")
     } yield ()
```

app/models/dataset/DatasetService.scala (1)
`559-568`: Slack message content (nit). Optional: include the dataset name in the message for easier triage.

```diff
-      _ = slackNotificationService.info(s"Dataset added $addLabel$superUserLabel",
-                                        s"For organization: ${organization.name}. <$resultLink|Result>")
+      _ = slackNotificationService.info(s"Dataset added $addLabel$superUserLabel",
+                                        s"For organization: ${organization.name}, dataset: $datasetId. <$resultLink|Result>")
```

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)
`539-555`: Region hardcoded to US_EAST_1. Make the region configurable (or infer it from the endpoint/credential) to support non-US regions.

```diff
-      .region(Region.US_EAST_1)
+      .region(
+        Option(System.getProperty("WK_S3_REGION"))
+          .flatMap(r => Try(Region.of(r)).toOption)
+          .getOrElse(Region.US_EAST_1)
+      )
```

Outside this file, consider adding a DataStoreConfig.Datastore.S3Upload.region option and reading it here.
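A sketch of what that option could look like, mirroring the existing `S3Upload` fields in DataStoreConfig; the `region` key and its default are the suggestion above, not existing code:

```scala
// Hypothetical extension of the existing S3Upload config group (field set abridged):
object S3Upload {
  val enabled: Boolean = get[Boolean]("datastore.s3Upload.enabled")
  val credentialName: String = get[String]("datastore.s3Upload.credentialName")
  // New: signing region for the upload client; default to "us-east-1" in application.conf.
  val region: String = get[String]("datastore.s3Upload.region")
}
```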
`823-829`: Remove persisted isVirtual on cleanup. If you persist isVirtual, also delete it in removeFromRedis.

```diff
       _ <- runningUploadMetadataStore.remove(redisKeyForDataSourceId(uploadId))
       _ <- runningUploadMetadataStore.remove(redisKeyForDatasetId(uploadId))
+      _ <- runningUploadMetadataStore.remove(redisKeyForIsVirtual(uploadId))
       _ <- runningUploadMetadataStore.remove(redisKeyForLinkedLayerIdentifier(uploadId))
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- app/controllers/DatasetController.scala (7 hunks)
- app/controllers/WKRemoteDataStoreController.scala (5 hunks)
- app/models/dataset/ComposeService.scala (1 hunks)
- app/models/dataset/DatasetService.scala (4 hunks)
- app/models/dataset/DatasetUploadToPathsService.scala (3 hunks)
- app/models/dataset/LayerToLinkService.scala (1 hunks)
- conf/application.conf (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (7 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (18 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/slacknotification/DSSlackNotificationService.scala (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- conf/application.conf
🧰 Additional context used
🧬 Code graph analysis (9)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/slacknotification/DSSlackNotificationService.scala (1)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/slacknotification/SlackClient.scala (1): warn (18-21)

app/controllers/DatasetController.scala (3)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2): LinkedLayerIdentifier (80-80), LinkedLayerIdentifier (82-84)
- app/models/storage/UsedStorageService.scala (2): UsedStorageService (21-103), refreshStorageReportForDataset (89-103)
- app/models/dataset/DatasetService.scala (1): trackNewDataset (547-559)

app/models/dataset/LayerToLinkService.scala (5)
- util/src/main/scala/com/scalableminds/util/accesscontext/AccessContext.scala (2): AuthorizedAccessContext (19-21), DBAccessContext (31-37)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2): LinkedLayerIdentifier (80-80), LinkedLayerIdentifier (82-84)
- app/models/user/UserService.scala (2): UserService (32-435), isTeamManagerOrAdminOfOrg (321-326)
- app/models/dataset/DatasetService.scala (1): usableDataSourceFor (383-389)
- util/src/main/scala/com/scalableminds/util/tools/Fox.scala (3): toFox (14-14), successful (53-56), failure (58-62)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (8)
- util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (3): PathUtils (14-14), deleteDirectoryRecursively (186-193), ensureDirectoryBox (124-132)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4): DataStoreConfig (11-78), Datastore (18-70), S3Upload (64-68), DataVaults (61-63)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5): S3DataVault (46-155), S3DataVault (157-250), getCredential (143-143), hostBucketFromUri (168-179), create (158-166)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/explore/ExploreLocalLayerService.scala (1): ExploreLocalLayerService (21-152)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2): UPath (54-96), fromLocalPath (80-80)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/UnfinishedUpload.scala (2): UnfinishedUpload (7-16), UnfinishedUpload (18-20)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4): DSRemoteWebknossosClient (57-216), reportUpload (98-105), getCredential (185-196), deleteDataset (138-143)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1): CredentialConfigReader (8-64)

app/models/dataset/DatasetUploadToPathsService.scala (3)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2): LinkedLayerIdentifier (80-80), LinkedLayerIdentifier (82-84)
- app/controllers/DatasetController.scala (2): ReserveDatasetUploadToPathsRequest (59-67), ReserveDatasetUploadToPathsRequest (69-71)
- app/models/dataset/LayerToLinkService.scala (1): addLayersToLinkToDataSource (32-42)

app/controllers/WKRemoteDataStoreController.scala (10)
- util/src/main/scala/com/scalableminds/util/accesscontext/AccessContext.scala (3): AuthorizedAccessContext (19-21), DBAccessContext (31-37), GlobalAccessContext (25-29)
- util/src/main/scala/com/scalableminds/util/objectid/ObjectId.scala (2): ObjectId (11-13), ObjectId (15-66)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (4): DataSource (31-43), UnusableDataSource (45-62), UnusableDataSource (64-66), isUsable (20-20)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSourceStatus.scala (1): DataSourceStatus (3-12)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2): ReportDatasetUploadParameters (69-74), ReportDatasetUploadParameters (75-78)
- app/models/dataset/LayerToLinkService.scala (2): validateLayerToLink (22-32), addLayersToLinkToDataSource (32-42)
- app/models/dataset/DatasetService.scala (2): createVirtualDataset (88-110), trackNewDataset (547-559)
- util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1): validateJson (204-209)
- app/security/WebknossosBearerTokenAuthenticatorService.scala (1): userForToken (76-84)
- app/models/dataset/Dataset.scala (3): findOne (203-210), updateDataSource (633-655), deleteDataset (702-730)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (5)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/MappingProvider.scala (3): exploreMappings (32-43), MappingProvider (11-24), MappingProvider (26-44)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (2): UnusableDataSource (45-62), UnusableDataSource (64-66)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSourceStatus.scala (1): DataSourceStatus (3-12)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1): mapped (104-133)
- util/src/main/scala/com/scalableminds/util/geometry/Vec3Int.scala (1): toMagLiteral (40-43)

app/models/dataset/DatasetService.scala (8)
- app/models/user/MultiUser.scala (1): MultiUserDAO (39-181)
- app/mail/MailchimpClient.scala (2): MailchimpClient (17-82), tagUser (45-54)
- app/mail/MailchimpTag.scala (1): MailchimpTag (5-22)
- app/models/analytics/AnalyticsService.scala (2): AnalyticsService (22-97), track (38-46)
- app/models/analytics/AnalyticsEvent.scala (1): UploadDatasetEvent (205-218)
- app/telemetry/SlackNotificationService.scala (2): SlackNotificationService (11-59), info (30-36)
- app/utils/WkConf.scala (1): WkConf (14-296)
- app/models/dataset/DataStore.scala (1): findOneByName (118-125)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (5)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (5): webknossos (70-72), UserAccessRequest (30-30), UserAccessRequest (31-72), writeDataset (52-55), deleteDataset (49-52)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/slacknotification/DSSlackNotificationService.scala (2): DSSlackNotificationService (8-31), noticeFailedFinishUpload (26-31)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2): reserveDataSourceUpload (123-132), deleteDataset (138-143)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (6): reserveUpload (167-205), getDatasetIdByUploadId (164-167), extractDatasetUploadId (156-156), finishUpload (341-385), cancelUpload (313-327), isKnownUpload (153-156)
- util/src/main/scala/com/scalableminds/util/requestlogging/RequestLogging.scala (1): log (55-62)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: backend-tests
- GitHub Check: build-smoketest-push
🔇 Additional comments (24)
app/models/dataset/ComposeService.scala (1)

51-52: Folder ID type change looks good.
Switching to `ObjectId` keeps `createVirtualDataset` aligned with the new API surface. No issues spotted.

app/models/dataset/DatasetUploadToPathsService.scala (1)

41-84: Delegating to `LayerToLinkService` looks correct.
The new dependency injection and the replacement of the inlined linking logic with `layerToLinkService.addLayersToLinkToDataSource` keep the flow coherent with the rest of the refactor.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (4)

97-105: New reservation flow is consistent.
Fetching `ReserveAdditionalInformation` once and passing it through to `UploadService.reserveUpload` keeps the data flow sound.

153-169: Chunk-upload auth change looks good.
Ensuring the caller has `writeDataset` on the resolved dataset ID before accepting chunks aligns with the dataset-centric permissions.

200-214: Cancel flow now guards on dataset ID; looks good.
Pulling the dataset ID first and switching access control to `deleteDataset` matches the dataset-based authorisation model.

399-411: Remote delete even when not on disk makes sense.
Calling `deleteDataset` on WK irrespective of local presence keeps remote state tidy; the log message clarifies the local situation.

app/controllers/DatasetController.scala (4)
141-142: Good: usedStorageService injected where needed.
Wiring UsedStorageService into the controller enables post-upload report refreshes in one place.

120-124: API type change: folderId is now ObjectId.
This is a breaking change for clients sending folderId as string. Confirm all callers (FE, datastore) and route versions are adapted.

209-215: Passing folderIdOpt directly is correct.
The explore+add flow now feeds Option[ObjectId] end-to-end. Looks good.

656-665: Finish upload-to-paths: refresh usage before tracking – LGTM.
The order (mark usable -> refresh report -> track) is sensible.
app/models/dataset/LayerToLinkService.scala (2)

32-41: Merging linked layers and duplicate-name check – LGTM.
Combining resolved layers with existing ones and guarding against duplicates is correct.

42-57: Resolution logic is sound.
Uses Authorized/DB contexts correctly, copies layer data immutably, and supports both color/segmentation.

app/models/dataset/DatasetService.scala (3)

88-108: createVirtualDataset: type and access updates – LGTM.
- Accepting DataSource (usable/unusable) is flexible.
- Asserting folder write access and using folder fallback is correct.

103-108: Use withUpdatedId on DataSource – LGTM.
Ensures directoryName + organization are bound to the new dataset id.

547-558: New tracking hook is appropriate.
Encapsulating Slack/Mailchimp/Analytics side-effects is clean. No blocking on analytics is good.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (7)

46-61: ReserveUploadInformation: new fields – LGTM.
Carrying layersToLink/requireUniqueName/isVirtual is aligned with the new flow.

63-67: ReserveAdditionalInformation: LGTM.
Dataset id + directory name is all the datastore needs.

69-78: ReportDatasetUploadParameters shape – LGTM.
Encodes the dataset-centric report with optional dataSource and links.

167-203: Write isVirtual to Redis during reserve().
Persist the decision made at reservation. Use it later in finish().

```diff
   newDataSourceId = DataSourceId(reserveUploadAdditionalInfo.directoryName, reserveUploadInfo.organization)
@@
   _ <- runningUploadMetadataStore.insert(
     redisKeyForDataSourceId(reserveUploadInfo.uploadId),
     Json.stringify(Json.toJson(newDataSourceId))
   )
+  // Persist isVirtual decision for consistent finish behavior
+  _ <- {
+    val isVirtual = reserveUploadInfo.isVirtual.getOrElse(dataStoreConfig.Datastore.S3Upload.enabled)
+    runningUploadMetadataStore.insert(redisKeyForIsVirtual(reserveUploadInfo.uploadId), isVirtual.toString)
+  }
@@
   _ = logger.info(
     f"Reserving dataset upload ${reserveUploadInfo.uploadId} for dataset ${reserveUploadAdditionalInfo.newDatasetId} ($newDataSourceId)...")
```

140-146: Persist isVirtual alongside upload metadata.
Currently, you don't store the isVirtual decision at reserve time. If config flips between reserve/finish or a restart occurs, uploads can finalize inconsistently.
Apply:

```diff
   private def redisKeyForUploadId(datasourceId: DataSourceId): String =
     s"upload___${Json.stringify(Json.toJson(datasourceId))}___datasourceId"
+  private def redisKeyForIsVirtual(uploadId: String): String =
+    s"upload___${uploadId}___isVirtual"
```

403-444: moveUnpackedToTarget: parameterize the S3 decision and use it instead of live config.
Use the computed uploadToS3 boolean, not the current configuration, to avoid inconsistency.

```diff
-  private def moveUnpackedToTarget(unpackedDir: Path,
-                                   needsConversion: Boolean,
-                                   datasetId: ObjectId,
-                                   dataSourceId: DataSourceId): Fox[Option[UsableDataSource]] =
+  private def moveUnpackedToTarget(unpackedDir: Path,
+                                   needsConversion: Boolean,
+                                   uploadToS3: Boolean,
+                                   datasetId: ObjectId,
+                                   dataSourceId: DataSourceId): Fox[Option[UsableDataSource]] =
@@
-      newBasePath <- if (dataStoreConfig.Datastore.S3Upload.enabled) {
+      newBasePath <- if (uploadToS3) {
         for {
           s3UploadBucket <- s3UploadBucketOpt.toFox
           beforeS3Upload = Instant.now
           s3ObjectKey = s"${dataStoreConfig.Datastore.S3Upload.objectKeyPrefix}/${dataSourceId.organizationId}/${dataSourceId.directoryName}/"
           _ <- uploadDirectoryToS3(unpackedDir, s3UploadBucket, s3ObjectKey)
           _ = Instant.logSince(beforeS3Upload, s"Forwarding of uploaded of dataset $datasetId ($dataSourceId) to S3", logger)
-          endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost
+          endPointHost = s3UploadEndpoint.getHost
           newBasePath <- UPath.fromString(s"s3://$endPointHost/$s3UploadBucket/$s3ObjectKey").toFox
         } yield newBasePath
       } else {
```

523-537: Endpoint override drops scheme/port; preserve the original endpoint.
Construct s3UploadEndpoint from the configured URI to keep http/https and custom ports (e.g., MinIO).

```diff
-  private lazy val s3UploadEndpoint: URI = {
-    val credentialUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
-    new URI(
-      "https",
-      null,
-      credentialUri.getHost,
-      -1,
-      null,
-      null,
-      null
-    )
-  }
+  private lazy val s3UploadEndpoint: URI =
+    new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
```

app/controllers/WKRemoteDataStoreController.scala (2)
61-90: Reserve dataset: validation and prelim creation – LGTM.
- Validates linked layers with access checks.
- Creates a virtual dataset with an unusable placeholder, returning ids to the datastore.

206-219: Delete dataset: refresh used storage after delete – LGTM.
Marks deleted (or hard-deletes) and refreshes usage; good cleanup flow.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
app/models/dataset/ComposeService.scala (1)

62-70: Reintroduce the StaticLayer guard to prevent MatchError.
`mapped` is implemented via a `this match { case StaticColorLayer | StaticSegmentationLayer => ... }`, so calling it on a non-static `DataLayer` (e.g., `EditableMappingLayer`) now blows up with a `MatchError`. The earlier version short-circuited those cases; this change regresses into a runtime crash. Let's keep the defensive pattern match.

```diff
-      applyCoordinateTransformations = (cOpt: Option[List[CoordinateTransformation]]) =>
-        cOpt match {
-          case Some(c) => Some(c ++ composeLayer.transformations.toList)
-          case None    => Some(composeLayer.transformations.toList)
-        }
-      editedLayer = layer.mapped(name = composeLayer.newName,
-                                 coordinateTransformations =
-                                   applyCoordinateTransformations(layer.coordinateTransformations))
-    } yield editedLayer
+      applyCoordinateTransformations = (cOpt: Option[List[CoordinateTransformation]]) =>
+        cOpt match {
+          case Some(c) => Some(c ++ composeLayer.transformations.toList)
+          case None    => Some(composeLayer.transformations.toList)
+        }
+      editedLayer <- layer match {
+        case staticLayer: StaticLayer =>
+          Fox.successful(
+            staticLayer.mapped(
+              name = composeLayer.newName,
+              coordinateTransformations =
+                applyCoordinateTransformations(staticLayer.coordinateTransformations)
+            )
+          )
+        case _ =>
+          Fox.failure("Only static layers can be composed")
+      }
+    } yield editedLayer
```

app/models/dataset/DatasetUploadToPathsService.scala (1)
228-233: Bug: reserveAttachmentUploadToPath regenerates directoryName and ignores uploadToPathsInfixOpt.
File: app/models/dataset/DatasetUploadToPathsService.scala, lines ~211-218.
It calls `generateDirectoryName` on `dataset.directoryName` instead of reusing the stored name, and omits `uploadToPathsInfixOpt`, so attachments land under a different path than layers.
Apply this diff to mirror `addPathsToDatasource`:

```diff
-      newDirectoryName = datasetService.generateDirectoryName(dataset.directoryName, dataset._id)
-      datasetPath = uploadToPathsPrefix / dataset._organization / newDirectoryName
+      orgaDir = uploadToPathsPrefix / dataset._organization
+      datasetParent = uploadToPathsInfixOpt.map(infix => orgaDir / infix).getOrElse(orgaDir)
+      datasetPath = datasetParent / dataset.directoryName
       attachmentPath = generateAttachmentPath(parameters.attachmentName,
                                               parameters.attachmentDataformat,
                                               parameters.attachmentType,
                                               datasetPath / parameters.layerName)
```
🧹 Nitpick comments (2)
app/models/dataset/DatasetUploadToPathsService.scala (2)

139-142: Normalize the requested path prefix before the whitelist check.
If clients pass a logically identical but differently-normalized UPath (e.g., relative vs absolute), this equality test may reject it.
Apply this diff:

```diff
-      case Some(requested) =>
-        if (configuredPrefixes.contains(requested)) Full(requested)
+      case Some(requested) =>
+        val requestedAbs = requested.toAbsolute
+        if (configuredPrefixes.contains(requestedAbs)) Full(requestedAbs)
         else Failure("Requested path prefix is not in list of configured ones.")
```

146-156: DRY: centralize dataset parent path construction (optional).
Both addPathsToDatasource and reserveAttachmentUploadToPath compute the same orgaDir/infix logic. Extract a helper to reduce drift.
Add this helper:

```scala
private def datasetParentFor(prefix: UPath, organizationId: String): UPath = {
  val orgaDir = prefix / organizationId
  uploadToPathsInfixOpt.map(infix => orgaDir / infix).getOrElse(orgaDir)
}
```

Then use:
- In addPathsToDatasource: replace orgaDir/datasetParent with datasetParentFor(uploadToPathsPrefix, organizationId)
- In reserveAttachmentUploadToPath: replace with datasetParentFor(uploadToPathsPrefix, dataset._organization)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 0f9e16546404424413b4b020b1aca1af14b3eb02 and a509ceba4f160452bb55a00bb5117b1a8c36eb8d.
📒 Files selected for processing (4)
- app/models/dataset/ComposeService.scala (2 hunks)
- app/models/dataset/DatasetUploadToPathsService.scala (3 hunks)
- conf/application.conf (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- conf/application.conf
🧰 Additional context used
🧬 Code graph analysis (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (4)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (2): `webknossos` (70-72), `validateAccessFromTokenContext` (87-94)
- util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1): `validateJson` (204-209)
- util/src/main/scala/com/scalableminds/util/tools/Fox.scala (1): `fromFuture` (42-53)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1): `getDatasetId` (206-216)
app/models/dataset/ComposeService.scala (2)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1): `mapped` (104-133)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (1): `coordinateTransformations` (85-85)
app/models/dataset/DatasetUploadToPathsService.scala (2)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2): `LinkedLayerIdentifier` (80-80), `LinkedLayerIdentifier` (82-84)
- app/models/dataset/LayerToLinkService.scala (1): `addLayersToLinkToDataSource` (32-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: backend-tests
- GitHub Check: frontend-tests
- GitHub Check: build-smoketest-push
🔇 Additional comments (5)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (1)

133-143: Legacy manual uploads must set `isVirtual = false`.
Leaving `isVirtual` as `None` makes manual uploads inherit the new default (`true`), so data that lives on disk is marked virtual and never served correctly. Explicitly pass `Some(false)` here.

```diff
-        Some(request.body.requireUniqueName),
-        None
+        Some(request.body.requireUniqueName),
+        Some(false)
```
app/models/dataset/DatasetUploadToPathsService.scala (4)

23-23: Good: import LinkedLayerIdentifier from the uploading module.
This aligns with the new public surface in the datastore uploading package.

26-28: Good: switched to public request types.
Using ReserveDatasetUploadToPaths(ForPreliminary)Request from controllers is consistent with API versioning plans.

41-41: Good: delegate layer linking via the injected LayerToLinkService.
Cleaner separation of concerns; avoids duplicating resolve/link logic here.

67-69: Good: delegate layersToLink handling to LayerToLinkService.
This ensures duplicate-name checks and resolution are centralized.
Please confirm the i18n keys used inside LayerToLinkService ("dataset.layerToLink.failed", "dataset.duplicateLayerNames") exist in the messages files for all supported locales.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1)

330-341: Bug: use flatMap instead of map to avoid nested Fox and ensure cleanup runs.
The current map returns Fox[Fox[Unit]] and likely skips size enforcement or fails type inference. Use flatMap.

```diff
-    _ <- Fox.runOptional(totalFileSizeInBytesOpt) { maxFileSize =>
-      tryo(FileUtils.sizeOfDirectoryAsBigInteger(uploadDir.toFile).longValue).toFox.map(actualFileSize =>
-        if (actualFileSize > maxFileSize.toLong) {
-          cleanUpDatasetExceedingSize(uploadDir, uploadId)
-          Fox.failure(s"Uploaded dataset exceeds the maximum allowed size of $maxFileSize bytes")
-        } else Fox.successful(()))
-    }
+    _ <- Fox.runOptional(totalFileSizeInBytesOpt) { maxFileSize =>
+      tryo(FileUtils.sizeOfDirectoryAsBigInteger(uploadDir.toFile).longValue).toFox.flatMap { actualFileSize =>
+        if (actualFileSize > maxFileSize.toLong) {
+          cleanUpDatasetExceedingSize(uploadDir, uploadId)
+        } else Fox.successful(())
+      }
+    }
```

frontend/javascripts/admin/dataset/dataset_upload_view.tsx (1)

334-347: Add `needsConversion` to the `ReserveUploadInformation` TS type and backend.
The `ReserveUploadInformation` type in `frontend/javascripts/admin/rest_api.ts` (currently at lines 1136-1146) must include a `needsConversion?: boolean` field, and the corresponding backend DTO needs to accept and handle it optional-safely.
🧹 Nitpick comments (5)
app/models/dataset/DatasetService.scala (2)

88-99: Unfinished uploads query: add guardrails for scale and configurability.
- Consider making the 14-day lookback configurable (e.g., WkConf) and optionally add a hard `LIMIT` to avoid large result sets (see the sketch after this list).
- Ensure there is a supporting compound index matching the filters: (uploaderId, organizationId, isActive, status, created DESC).
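A rough illustration of both guardrails; the config key, cutoff computation, and column names are assumptions for the sketch, not the project's actual schema:

```scala
import scala.concurrent.duration._

// Hypothetical bounded variant of the unfinished-uploads query: the lookback
// comes from config instead of a hardcoded 14 days, the cutoff is passed as a
// bind parameter, and a hard LIMIT caps the result set.
val lookback: FiniteDuration = 14.days // would be read from a new WkConf entry
val cutoff = new java.sql.Timestamp(System.currentTimeMillis() - lookback.toMillis)
val maxResults = 100
val query =
  q"""SELECT _id FROM webknossos.datasets
      WHERE _uploader = $uploaderId AND _organization = $organizationId
        AND isActive AND created >= $cutoff
      ORDER BY created DESC
      LIMIT $maxResults"""
```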
100-121: createAndSetUpDataset: good access checks; avoid a double write for the folder if possible.
- You create with the root folder and then immediately update to `folderIdWithFallback`. To reduce writes and transient states, consider passing the intended folder into createDataset (e.g., a new optional param `initialFolderId: Option[ObjectId] = None` defaulting to the org root; sketched below).
- The updated DataSourceId with (directoryName, organizationId) looks correct for virtual datasets.
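For illustration, the suggested signature could look like this; the parameter name, default, and overall shape are hypothetical:

```scala
// Hypothetical extension of createDataset: the caller passes the target folder
// once, so no follow-up folder update is needed. Defaults preserve current behavior.
def createDataset(dataStore: DataStore,
                  datasetId: ObjectId,
                  datasetName: String,
                  dataSource: DataSource,
                  initialFolderId: Option[ObjectId] = None, // None => organization root folder
                  isVirtual: Boolean = true): Fox[Dataset] = ???
```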
app/controllers/WKRemoteDataStoreController.scala (1)

76-90: isVirtual toggling vs. needsConversion: confirm intent.
You set isVirtual = !needsConversion. That matches the note about convert_to_wkw requiring non-virtual prelim datasets, but conflicts with "all newly uploaded datasets are virtual" as a general rule. Please confirm this exception is intended and documented in the changelog/API notes.

app/controllers/DatasetController.scala (1)

208-215: Marking explore-and-add datasets virtual is correct; consider an analytics hook.
Setting isVirtual = true here is aligned with "all newly uploaded datasets are virtual." Optionally, also call trackNewDataset with viaAddRoute = true to keep analytics/Slack consistent across creation paths.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (1)

76-86: Two DSRemoteWebknossosClient injections: consolidate.
Both remoteWebknossosClient and dsRemoteWebknossosClient are DSRemoteWebknossosClient. Use a single instance to reduce confusion and wiring complexity.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- app/controllers/DatasetController.scala (7 hunks)
- app/controllers/WKRemoteDataStoreController.scala (5 hunks)
- app/models/dataset/ComposeService.scala (2 hunks)
- app/models/dataset/Dataset.scala (1 hunks)
- app/models/dataset/DatasetService.scala (4 hunks)
- app/models/dataset/DatasetUploadToPathsService.scala (3 hunks)
- frontend/javascripts/admin/dataset/dataset_upload_view.tsx (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (4 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (18 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- app/models/dataset/ComposeService.scala
🧰 Additional context used
🧬 Code graph analysis (7)
app/models/dataset/Dataset.scala (2)
- app/utils/sql/SecuredSQLDAO.scala (1): `assertUpdateAccess` (36-48)
- app/utils/sql/SqlInterpolation.scala (2): `q` (20-39), `asUpdate` (74-74)

app/controllers/WKRemoteDataStoreController.scala (9)
- util/src/main/scala/com/scalableminds/util/accesscontext/AccessContext.scala (3): `AuthorizedAccessContext` (19-21), `DBAccessContext` (31-37), `GlobalAccessContext` (25-29)
- util/src/main/scala/com/scalableminds/util/objectid/ObjectId.scala (2): `ObjectId` (11-13), `ObjectId` (15-66)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (4): `DataSource` (31-43), `UnusableDataSource` (45-62), `UnusableDataSource` (64-66), `isUsable` (20-20)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSourceStatus.scala (1): `DataSourceStatus` (3-12)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (6): `ReportDatasetUploadParameters` (70-75), `ReportDatasetUploadParameters` (76-79), `ReserveAdditionalInformation` (64-64), `ReserveAdditionalInformation` (65-68), `ReserveUploadInformation` (46-59), `ReserveUploadInformation` (60-62)
- app/models/dataset/LayerToLinkService.scala (2): `validateLayerToLink` (22-32), `addLayersToLinkToDataSource` (32-42)
- app/models/dataset/DatasetService.scala (2): `createAndSetUpDataset` (100-123), `trackNewDataset` (548-560)
- util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1): `validateJson` (204-209)
- app/models/dataset/Dataset.scala (3): `findOne` (203-210), `updateDataSource` (633-655), `deleteDataset` (708-736)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (5)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (2): `webknossos` (70-72), `validateAccessFromTokenContext` (87-94)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (4): `LinkedLayerIdentifier` (81-81), `LinkedLayerIdentifier` (83-85), `ReserveUploadInformation` (46-59), `ReserveUploadInformation` (60-62)
- util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1): `validateJson` (204-209)
- util/src/main/scala/com/scalableminds/util/tools/Fox.scala (1): `fromFuture` (42-53)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1): `getDatasetId` (206-216)

app/models/dataset/DatasetUploadToPathsService.scala (3)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2): `LinkedLayerIdentifier` (81-81), `LinkedLayerIdentifier` (83-85)
- app/models/dataset/LayerToLinkService.scala (1): `addLayersToLinkToDataSource` (32-42)
- app/models/dataset/Dataset.scala (1): `makeVirtual` (666-672)

app/models/dataset/DatasetService.scala (8)
- app/models/user/MultiUser.scala (1): `MultiUserDAO` (39-181)
- app/mail/MailchimpClient.scala (2): `MailchimpClient` (17-82), `tagUser` (45-54)
- app/mail/MailchimpTag.scala (1): `MailchimpTag` (5-22)
- app/models/analytics/AnalyticsService.scala (2): `AnalyticsService` (22-97), `track` (38-46)
- app/models/analytics/AnalyticsEvent.scala (1): `UploadDatasetEvent` (205-218)
- app/telemetry/SlackNotificationService.scala (2): `SlackNotificationService` (11-59), `info` (30-36)
- app/utils/WkConf.scala (1): `WkConf` (14-296)
- app/models/dataset/DataStore.scala (1): `findOneByName` (118-125)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (6)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4): `DataStoreConfig` (11-78), `Datastore` (18-70), `S3Upload` (64-68), `DataVaults` (61-63)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5): `S3DataVault` (46-155), `S3DataVault` (157-250), `getCredential` (143-143), `hostBucketFromUri` (168-179), `create` (158-166)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2): `UPath` (54-96), `fromLocalPath` (80-80)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4): `DSRemoteWebknossosClient` (57-216), `reportUpload` (98-105), `getCredential` (185-196), `deleteDataset` (138-143)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (2): `DataSourceService` (25-303), `resolvePathsInNewBasePath` (253-269)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1): `CredentialConfigReader` (8-64)

app/controllers/DatasetController.scala (3)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2): `LinkedLayerIdentifier` (81-81), `LinkedLayerIdentifier` (83-85)
- app/models/storage/UsedStorageService.scala (2): `UsedStorageService` (21-103), `refreshStorageReportForDataset` (89-103)
- app/models/dataset/DatasetService.scala (2): `createAndSetUpDataset` (100-123), `trackNewDataset` (548-560)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: backend-tests
- GitHub Check: build-smoketest-push
- GitHub Check: frontend-tests
🔇 Additional comments (15)
app/models/dataset/DatasetService.scala (3)

24-24: LGTM: imports match the new telemetry/analytics dependencies.
No issues spotted with these additions.
Also applies to: 26-27, 32-33

47-56: Refine the search to locate manual `new DatasetService` invocations.
Please rerun and check for any instantiations that need updating:

```bash
#!/bin/bash
rg -nP 'new\s+DatasetService\s*\(' --type scala -C2
```

If this still yields no results, manually verify any Guice modules or test setups that instantiate `DatasetService`.

560-570: Gate Slack logging by config and recover failures (a sketch follows this list).
- If `conf.SlackNotifications.uri` is empty/None, return `Fox.successful(())` to skip the DB reads.
- Wrap the inner `for` in `.orElse(Fox.successful(()))` (or use a `recover`) so Slack client errors don't bubble up.
- Confirm that `conf.SlackNotifications.uri` yields an empty string or `None` when unconfigured.
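A minimal sketch of the suggested guard, using only `Fox.successful` and the `.orElse` recovery mentioned above; the helper name and the `slackUriOpt` parameter are invented for illustration:

```scala
import scala.concurrent.ExecutionContext

// Run a Slack-related side effect only when a webhook URI is configured,
// and swallow its failures so they never fail the calling route.
def bestEffortSlackNotify(slackUriOpt: Option[String])(notify: => Fox[Unit])(
    implicit ec: ExecutionContext): Fox[Unit] =
  slackUriOpt match {
    case None | Some("") => Fox.successful(()) // unconfigured: skip DB reads and the HTTP call
    case Some(_)         => notify.orElse(Fox.successful(())) // recover on Slack/DB errors
  }
```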
app/models/dataset/DatasetUploadToPathsService.scala (1)

67-69: Delegating layer linking to LayerToLinkService looks good.
Centralizing linking and duplicate-name checks here is cleaner and reduces duplication. LGTM.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3)

404-445: Respect the configured endpoint object; avoid re-parsing credentialName.
Use the normalized s3UploadEndpoint (which should preserve scheme/port) when building the S3 base path.

```diff
-          endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost
-          newBasePath <- UPath.fromString(s"s3://$endPointHost/$s3UploadBucket/$s3ObjectKey").toFox
+          val endPointHost = s3UploadEndpoint.getAuthority // includes host[:port]
+          newBasePath <- UPath.fromString(s"s3://$endPointHost/$s3UploadBucket/$s3ObjectKey").toFox
```

527-538: Preserve scheme and port for custom S3 endpoints.
Hardcoding https and dropping the port breaks MinIO and non-standard endpoints. Keep the original URI.

```diff
-  private lazy val s3UploadEndpoint: URI = {
-    val credentialUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
-    new URI(
-      "https",
-      null,
-      credentialUri.getHost,
-      -1,
-      null,
-      null,
-      null
-    )
-  }
+  private lazy val s3UploadEndpoint: URI =
+    new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
```

342-386: Persist and honor isVirtual across finish; don't rely on live config.
Finish currently decides S3 vs local via dataStoreConfig.Datastore.S3Upload.enabled. Persist the chosen isVirtual at reserve and read it here for consistent behavior on retries/restarts.

```diff
@@ private def redisKeyForUploadId(datasourceId: DataSourceId): String =
     s"upload___${Json.stringify(Json.toJson(datasourceId))}___datasourceId"
+  private def redisKeyForIsVirtual(uploadId: String): String =
+    s"upload___${uploadId}___isVirtual"
@@ def reserveUpload(...)
     _ <- runningUploadMetadataStore.insert(
       redisKeyForDataSourceId(reserveUploadInfo.uploadId),
       Json.stringify(Json.toJson(newDataSourceId))
     )
+    _ <- runningUploadMetadataStore.insert(
+      redisKeyForIsVirtual(reserveUploadInfo.uploadId),
+      reserveUploadInfo.isVirtual
+        .orElse(reserveUploadInfo.needsConversion.map(nc => !nc)) // fallback: virtual if no conversion
+        .getOrElse(dataStoreConfig.Datastore.S3Upload.enabled)
+        .toString
+    )
@@ def finishUpload(...)
-    dataSourceWithAbsolutePathsOpt <- moveUnpackedToTarget(unpackToDir, needsConversion, datasetId, dataSourceId)
+    isVirtualOpt <- runningUploadMetadataStore.find(redisKeyForIsVirtual(uploadId)).map(_.map(_.toBoolean))
+    uploadToS3 = isVirtualOpt.getOrElse(dataStoreConfig.Datastore.S3Upload.enabled)
+    dataSourceWithAbsolutePathsOpt <- moveUnpackedToTarget(unpackToDir, needsConversion, datasetId, dataSourceId, uploadToS3)
@@ def removeFromRedis(uploadId: String)
     _ <- runningUploadMetadataStore.remove(redisKeyForDataSourceId(uploadId))
+    _ <- runningUploadMetadataStore.remove(redisKeyForIsVirtual(uploadId))
     _ <- runningUploadMetadataStore.remove(redisKeyForLinkedLayerIdentifier(uploadId))
```

And adapt moveUnpackedToTarget's signature to accept uploadToS3 and use it instead of reading config.

app/controllers/WKRemoteDataStoreController.scala (1)

128-154: Require dataSource when needsConversion == false (persist and mark usable).
Without this, non-conversion uploads can leave datasets unusable if dataSourceOpt is omitted. Enforce presence and persist.

```diff
     for {
       user <- bearerTokenService.userForToken(token)
       dataset <- datasetDAO.findOne(datasetId)(GlobalAccessContext) ?~> Messages("dataset.notFound", datasetId) ~> NOT_FOUND
+      _ <- Fox.fromBool(request.body.needsConversion || request.body.dataSourceOpt.isDefined) ?~>
+        "reportUpload.dataSourceMissing"
       _ <- Fox.runIf(!request.body.needsConversion)(usedStorageService.refreshStorageReportForDataset(dataset))
```

app/controllers/DatasetController.scala (4)

18-18: Consolidate the LinkedLayerIdentifier source: good change.
Switching to the shared type from uploading services avoids duplication and keeps JSON formats consistent. No action needed.

120-120: API shape change: folderId is now ObjectId.
DataSourceRegistrationInfo.folderId moved from Option[String] to Option[ObjectId]. Verify all callers (FE and any scripts) send a hex ObjectId and not a plain folder path/string. If older clients exist, provide a small adapter at the boundary.

230-237: createAndSetUpDataset usage looks correct.
Passing folderId as ObjectId and setting isVirtual = true matches the new model.

141-141: Make the storage-report refresh best-effort and retain 200 OK.
Wrap the per-dataset refresh so transient 4xx/5xx errors won't fail the route after the dataset is marked usable:

```diff
-      _ <- usedStorageService.refreshStorageReportForDataset(dataset)
+      // best-effort: don't fail the route if the storage-report refresh fails
+      _ <- usedStorageService
+        .refreshStorageReportForDataset(dataset)
+        .recover { case _ => () }
```

Ensure the failure path of finishUploadToPaths still triggers the Slack notification required by #8934.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (3)

93-117: reserveUploadV11 mapping looks good; keep isVirtual unset.
Forwarding requireUniqueName and leaving isVirtual/needsConversion as None is correct for non-manual flows. No changes needed.

119-124: Use datasetDirectoryName when resolving legacy links (prevents 404s).
getDatasetId expects the directory name. Fall back to dataSetName only if datasetDirectoryName is absent. This was flagged earlier and still applies.
Apply:

```diff
-      datasetId <- remoteWebknossosClient.getDatasetId(legacyLayerToLink.getOrganizationId,
-                                                       legacyLayerToLink.dataSetName)
+      val directoryName =
+        legacyLayerToLink.datasetDirectoryName.getOrElse(legacyLayerToLink.dataSetName)
+      datasetId <- remoteWebknossosClient.getDatasetId(legacyLayerToLink.getOrganizationId, directoryName)
```

Also add a minimal guard so we fail fast if both organizationId and organizationName are missing:

```diff
-  private def adaptLayerToLink(legacyLayerToLink: LegacyLinkedLayerIdentifier): Fox[LinkedLayerIdentifier] =
-    for {
+  private def adaptLayerToLink(legacyLayerToLink: LegacyLinkedLayerIdentifier): Fox[LinkedLayerIdentifier] =
+    for {
+      _ <- Fox.fromBool(legacyLayerToLink.organizationId.isDefined || legacyLayerToLink.organizationName.isDefined)
+        ?~> "organization.missing"
       datasetId <- ...
```

140-146: Set isVirtual = Some(false) for manual uploads (legacy path).
Manual disk uploads should explicitly mark isVirtual = false to avoid relying on defaults.
Apply:

```diff
-      Some(request.body.requireUniqueName),
-      None,
+      Some(request.body.requireUniqueName),
+      Some(false),
       needsConversion = None
```
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)

171-207: Persist and consume isVirtual across reserve/finish; also clean up on move failures.
Config flips can misroute data (S3 vs local) and break resumability. Persist isVirtual on reserve, read it at finish, and pass it into moveUnpackedToTarget. Also wrap the move step with cleanup-on-failure, like unpack/post-process. Finally, remove the persisted flag on cleanup.
Apply:

```diff
@@ private def redisKeyForDatasetId(uploadId: String): String =
     s"upload___${uploadId}___datasetId"
+  private def redisKeyForIsVirtual(uploadId: String): String =
+    s"upload___${uploadId}___isVirtual"
@@ def reserveUpload(reserveUploadInfo: ReserveUploadInformation,
     _ <- runningUploadMetadataStore.insert(
       redisKeyForDatasetId(reserveUploadInfo.uploadId),
       Json.stringify(Json.toJson(reserveUploadAdditionalInfo.newDatasetId))
     )
+    _ <- runningUploadMetadataStore.insert(
+      redisKeyForIsVirtual(reserveUploadInfo.uploadId),
+      reserveUploadInfo.isVirtual.getOrElse(dataStoreConfig.Datastore.S3Upload.enabled).toString
+    )
@@ def finishUpload(uploadInformation: UploadInformation)
-    dataSourceWithAbsolutePathsOpt <- moveUnpackedToTarget(unpackToDir, needsConversion, datasetId, dataSourceId)
+    // Decide target storage based on persisted isVirtual (fallback to live config)
+    isVirtualOpt <- runningUploadMetadataStore.find(redisKeyForIsVirtual(uploadId)).map(_.map(_.toBoolean))
+    uploadToS3 = isVirtualOpt.getOrElse(dataStoreConfig.Datastore.S3Upload.enabled)
+    moveResult <- moveUnpackedToTarget(unpackToDir, needsConversion, datasetId, dataSourceId, uploadToS3).shiftBox
+    _ <- cleanUpOnFailure(moveResult, datasetId, dataSourceId, needsConversion, label = s"moving dataset to target storage")
+    dataSourceWithAbsolutePathsOpt <- moveResult.toFox
@@ private def moveUnpackedToTarget(unpackedDir: Path, needsConversion: Boolean,
-    datasetId: ObjectId,
-    dataSourceId: DataSourceId): Fox[Option[UsableDataSource]] =
+    datasetId: ObjectId,
+    dataSourceId: DataSourceId,
+    uploadToS3: Boolean): Fox[Option[UsableDataSource]] =
@@
-      newBasePath <- if (dataStoreConfig.Datastore.S3Upload.enabled) {
+      newBasePath <- if (uploadToS3) {
@@
-          endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost
+          endPointHost = s3UploadEndpoint.getHost
           newBasePath <- UPath.fromString(s"s3://$endPointHost/$s3UploadBucket/$s3ObjectKey").toFox
         } yield newBasePath
       } else {
@@ def removeFromRedis(uploadId: String): Fox[Unit] =
     dataSourceId <- getDataSourceIdByUploadId(uploadId)
     _ <- runningUploadMetadataStore.remove(redisKeyForDataSourceId(uploadId))
     _ <- runningUploadMetadataStore.remove(redisKeyForDatasetId(uploadId))
+    _ <- runningUploadMetadataStore.remove(redisKeyForIsVirtual(uploadId))
     _ <- runningUploadMetadataStore.remove(redisKeyForLinkedLayerIdentifier(uploadId))
```

Also applies to: 349-392, 410-451, 820-840

337-347: Cleanup on size-exceed must actually run; the current call is dropped.
The call to cleanUpDatasetExceedingSize is not sequenced and won't execute before returning a failure.
Apply:

```diff
-    _ <- Fox.runOptional(totalFileSizeInBytesOpt) { maxFileSize =>
-      tryo(FileUtils.sizeOfDirectoryAsBigInteger(uploadDir.toFile).longValue).toFox.map(actualFileSize =>
-        if (actualFileSize > maxFileSize.toLong) {
-          cleanUpDatasetExceedingSize(uploadDir, uploadId)
-          Fox.failure(s"Uploaded dataset exceeds the maximum allowed size of $maxFileSize bytes")
-        } else Fox.successful(()))
-    }
+    _ <- Fox.runOptional(totalFileSizeInBytesOpt) { maxFileSize =>
+      tryo(FileUtils.sizeOfDirectoryAsBigInteger(uploadDir.toFile).longValue).toFox.flatMap { actualFileSize =>
+        if (actualFileSize > maxFileSize.toLong)
+          cleanUpDatasetExceedingSize(uploadDir, uploadId)
+            .flatMap(_ => Fox.failure(s"Uploaded dataset exceeds the maximum allowed size of $maxFileSize bytes"))
+        else Fox.successful(())
+      }
+    }
```
210-227
: Await DB delete and used-storage refreshCurrent code builds a Future chain and discards it; side-effects may never run.
Apply:
- _ = datasetDAO - .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0) - .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset)) + _ <- datasetDAO + .deleteDataset(dataset._id, onlyMarkAsDeleted = annotationCount > 0) + .flatMap(_ => usedStorageService.refreshStorageReportForDataset(dataset))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- app/controllers/DatasetController.scala (7 hunks)
- app/controllers/WKRemoteDataStoreController.scala (5 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (19 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
app/controllers/WKRemoteDataStoreController.scala (8)
- util/src/main/scala/com/scalableminds/util/accesscontext/AccessContext.scala (3): `AuthorizedAccessContext` (19-21), `DBAccessContext` (31-37), `GlobalAccessContext` (25-29)
- util/src/main/scala/com/scalableminds/util/objectid/ObjectId.scala (2): `ObjectId` (11-13), `ObjectId` (15-66)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (4): `DataSource` (31-43), `UnusableDataSource` (45-62), `UnusableDataSource` (64-66), `isUsable` (20-20)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSourceStatus.scala (1): `DataSourceStatus` (3-12)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2): `ReportDatasetUploadParameters` (70-75), `ReportDatasetUploadParameters` (76-79)
- app/models/dataset/LayerToLinkService.scala (2): `validateLayerToLink` (22-32), `addLayersToLinkToDataSource` (32-42)
- app/models/dataset/DatasetService.scala (2): `createAndSetUpDataset` (100-123), `trackNewDataset` (548-560)
- app/models/dataset/Dataset.scala (3): `findOne` (203-210), `updateDataSource` (633-655), `deleteDataset` (708-736)

app/controllers/DatasetController.scala (2)
- app/models/storage/UsedStorageService.scala (2): `UsedStorageService` (21-103), `refreshStorageReportForDataset` (89-103)
- app/models/dataset/DatasetService.scala (2): `createAndSetUpDataset` (100-123), `trackNewDataset` (548-560)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (8)
- util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (4): `PathUtils` (14-14), `deleteDirectoryRecursively` (186-193), `ensureDirectoryBox` (124-132), `listFilesRecursive` (104-108)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (2): `DataStoreConfig` (11-78), `S3Upload` (64-68)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5): `S3DataVault` (46-155), `S3DataVault` (157-250), `getCredential` (143-143), `hostBucketFromUri` (168-179), `create` (158-166)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2): `UPath` (54-96), `fromLocalPath` (80-80)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/UnfinishedUpload.scala (2): `UnfinishedUpload` (7-16), `UnfinishedUpload` (18-20)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4): `DSRemoteWebknossosClient` (57-216), `reportUpload` (98-105), `getCredential` (185-196), `deleteDataset` (138-143)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1): `CredentialConfigReader` (8-64)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1): `deleteOnDisk` (28-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: backend-tests
- GitHub Check: frontend-tests
- GitHub Check: build-smoketest-push
🔇 Additional comments (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2)

431-439: Use the normalized endpoint host for the S3 base path.
Don't re-parse the config string here; use s3UploadEndpoint to ensure scheme/host/port are consistent.
Apply:

```diff
-          endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost
+          endPointHost = s3UploadEndpoint.getHost
```

533-544: Preserve scheme/port when building the S3 endpoint.
Hardcoding https and dropping the port breaks MinIO/custom endpoints.
Apply:

```diff
-  private lazy val s3UploadEndpoint: URI = {
-    val credentialUri = new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
-    new URI(
-      "https",
-      null,
-      credentialUri.getHost,
-      -1,
-      null,
-      null,
-      null
-    )
-  }
+  private lazy val s3UploadEndpoint: URI =
+    new URI(dataStoreConfig.Datastore.S3Upload.credentialName)
```

app/controllers/DatasetController.scala (1)

230-243: trackNewDataset is not awaited, and viaAddRoute should be true for the Add route.
The current code doesn't await side-effects (Slack/analytics) and mislabels the source.
Apply:

```diff
-      )
-      _ = datasetService.trackNewDataset(dataset,
-                                         user,
-                                         needsConversion = false,
-                                         datasetSizeBytes = 0,
-                                         viaAddRoute = false)
+      )
+      _ <- datasetService.trackNewDataset(dataset,
+                                          user,
+                                          needsConversion = false,
+                                          datasetSizeBytes = 0,
+                                          viaAddRoute = true)
```

app/controllers/WKRemoteDataStoreController.scala (1)

125-154: Require dataSource when needsConversion == false; also await tracking.
Without enforcing dataSource for non-conversion uploads, datasets remain unusable. Also, trackNewDataset must be awaited.
Apply:

```diff
 def reportDatasetUpload(name: String, key: String, token: String, datasetId: ObjectId): Action[ReportDatasetUploadParameters] =
   Action.async(validateJson[ReportDatasetUploadParameters]) { implicit request =>
     dataStoreService.validateAccess(name, key) { _ =>
       for {
         user <- bearerTokenService.userForToken(token)
         dataset <- datasetDAO.findOne(datasetId)(GlobalAccessContext) ?~> Messages("dataset.notFound", datasetId) ~> NOT_FOUND
+        _ <- Fox.fromBool(request.body.needsConversion || request.body.dataSourceOpt.isDefined) ?~>
+          "reportUpload.dataSourceMissing"
         _ <- Fox.runIf(!request.body.needsConversion)(usedStorageService.refreshStorageReportForDataset(dataset))
-        _ = datasetService.trackNewDataset(dataset, user, request.body.needsConversion, request.body.datasetSizeBytes, viaAddRoute = false)
+        _ <- datasetService.trackNewDataset(dataset, user, request.body.needsConversion, request.body.datasetSizeBytes, viaAddRoute = false)
         dataSourceWithLinkedLayersOpt <- Fox.runOptional(request.body.dataSourceOpt) {
           implicit val ctx: DBAccessContext = AuthorizedAccessContext(user)
           layerToLinkService.addLayersToLinkToDataSource(_, request.body.layersToLink)
         }
         _ <- Fox.runOptional(dataSourceWithLinkedLayersOpt) { dataSource =>
           logger.info(s"Updating dataset $datasetId in database after upload reported from datastore $name.")
           datasetDAO.updateDataSource(datasetId, dataset._dataStore, dataSource.hashCode(), dataSource, isUsable = true)(GlobalAccessContext)
         }
       } yield Ok
     }
   }
```
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- app/controllers/WKRemoteDataStoreController.scala (6 hunks)
- app/models/dataset/Dataset.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
app/models/dataset/Dataset.scala (2)
- app/utils/sql/SecuredSQLDAO.scala (2): `existingCollectionName` (16-16), `assertUpdateAccess` (36-48)
- app/utils/sql/SqlInterpolation.scala (2): `q` (20-39), `asUpdate` (74-74)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (6)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (2): `webknossos` (70-72), `validateAccessFromTokenContext` (87-94)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (4): `LinkedLayerIdentifier` (81-81), `LinkedLayerIdentifier` (83-85), `ReserveUploadInformation` (46-59), `ReserveUploadInformation` (60-62)
- util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1): `validateJson` (204-209)
- util/src/main/scala/com/scalableminds/util/tools/Fox.scala (2): `fromFuture` (42-53), `successful` (53-56)
- util/src/main/scala/com/scalableminds/util/objectid/ObjectId.scala (1): `fromStringSync` (22-26)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1): `getDatasetId` (206-216)

app/controllers/WKRemoteDataStoreController.scala (11)
- util/src/main/scala/com/scalableminds/util/accesscontext/AccessContext.scala (3): `AuthorizedAccessContext` (19-21), `DBAccessContext` (31-37), `GlobalAccessContext` (25-29)
- util/src/main/scala/com/scalableminds/util/objectid/ObjectId.scala (2): `ObjectId` (11-13), `ObjectId` (15-66)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSource.scala (4): `DataSource` (31-43), `UnusableDataSource` (45-62), `UnusableDataSource` (64-66), `isUsable` (20-20)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataSourceStatus.scala (1): `DataSourceStatus` (3-12)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2): `ReportDatasetUploadParameters` (70-75), `ReportDatasetUploadParameters` (76-79)
- app/models/storage/UsedStorageService.scala (2): `UsedStorageService` (21-103), `refreshStorageReportForDataset` (89-103)
- app/models/dataset/LayerToLinkService.scala (2): `validateLayerToLink` (22-32), `addLayersToLinkToDataSource` (32-42)
- app/models/dataset/DatasetService.scala (2): `createAndSetUpDataset` (100-123), `trackNewDataset` (548-560)
- util/src/main/scala/com/scalableminds/util/mvc/ExtendedController.scala (1): `validateJson` (204-209)
- app/security/WebknossosBearerTokenAuthenticatorService.scala (1): `userForToken` (76-84)
- app/models/dataset/Dataset.scala (3): `findOne` (203-210), `updateDataSource` (634-656), `deleteDataset` (709-737)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-smoketest-push
- GitHub Check: backend-tests
- GitHub Check: frontend-tests
🔇 Additional comments (2)
app/controllers/WKRemoteDataStoreController.scala (1)

140-151: Re-introduce the `dataSource` guard for non-conversion uploads.
If `needsConversion` is false and `dataSourceOpt` is missing, we leave the dataset backed by the placeholder `UnusableDataSource`, so it remains unusable despite reporting success. The previous guard prevented this, but it's gone now. Please add it back before processing optional linkage.

```diff
     for {
       user <- bearerTokenService.userForToken(token)
       dataset <- datasetDAO.findOne(datasetId)(GlobalAccessContext) ?~> Messages("dataset.notFound", datasetId) ~> NOT_FOUND
+      _ <- Fox.fromBool(request.body.needsConversion || request.body.dataSourceOpt.isDefined) ?~>
+        "reportUpload.dataSourceMissing"
       _ <- Fox.runIf(!request.body.needsConversion)(usedStorageService.refreshStorageReportForDataset(dataset))
```

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DSLegacyApiController.scala (1)

127-130: Prefer `datasetDirectoryName` when available.
`remoteWebknossosClient.getDatasetId` expects the directory name. When clients supply `datasetDirectoryName`, we must use it, otherwise legacy uploads break as soon as friendly names differ from directory names.

```diff
-      remoteWebknossosClient.getDatasetId(legacyLayerToLink.getOrganizationId, legacyLayerToLink.dataSetName)
+      val directoryName =
+        legacyLayerToLink.datasetDirectoryName.getOrElse(legacyLayerToLink.dataSetName)
+      remoteWebknossosClient.getDatasetId(legacyLayerToLink.getOrganizationId, directoryName)
```
Awesome, thanks for implementing this. Including @frcroth :)
Here is my code feedback / questions. Testing is TODO
```scala
for {
  organization <- organizationDAO.findOne(user._organization)(GlobalAccessContext)
  multiUser <- multiUserDAO.findOne(user._multiUser)(GlobalAccessContext)
  resultLink = s"${conf.Http.uri}/datasets/$datasetId"
```
The proper link format should be "authority/datasets/<ds name>-<ds id>" if I am not mistaken. The frontend understands the id-only format and auto-converts it to the usual one, but maybe we still want to use the "authority/datasets/<ds name>-<ds id>" format here.
If the id-only format is supported now, we will probably have to support it later. I actually think it’s the clean variant, and the prepended name one is only “sugar” to make the URIs more readable where needed. I’d rather not add the name here because we’d have to be careful that it stays a valid URI etc 🤔
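For illustration, the two forms under discussion (host, dataset name, and id are invented):

```text
https://webknossos.example.org/datasets/my-dataset-66f2a9c3b4e1   # name-prefixed "sugar" form
https://webknossos.example.org/datasets/66f2a9c3b4e1              # id-only form
```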
```scala
case S3AccessKeyCredential(credentialName, accessKeyId, secretAccessKey, _, _)
    if dataStoreConfig.Datastore.S3Upload.credentialName == credentialName =>
  (accessKeyId, secretAccessKey)
}
```
So the same S3 instance needs to be registered twice in the config? Once for dataVaults access and once to register it as an upload target?
Correct. The credentials are taken from the same source, though; only the upload target is given again. It can include a prefix to direct the upload to specific storage buckets (this may change regularly as the first ones fill up).
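To make the pairing concrete, a hypothetical config sketch: the dataVaults field names, endpoint, and bucket are invented here, and only illustrate that `credentialName` references the same URI prefix that is registered as a data-vault credential:

```hocon
# Invented values: one credential entry, referenced by the upload target.
dataVaults.credentials = [
  {
    name = "s3://fsn1.example-objectstorage.com/webknossos-bucket-1"
    accessKeyId = "..."
    secretAccessKey = "..."
  }
]
s3Upload {
  enabled = true
  # Same URI prefix as the dataVaults entry; may point at a specific bucket.
  credentialName = "s3://fsn1.example-objectstorage.com/webknossos-bucket-1"
  objectKeyPrefix = "webknossos-uploads"
}
```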
Actionable comments posted: 0
♻️ Duplicate comments (7)
app/models/dataset/Dataset.scala (1)

667-672: Verify that worker flows use GlobalAccessContext when calling makeVirtual.
This method asserts update access, which could block convert_to_wkw worker paths if they don't use GlobalAccessContext. Confirm that all call sites (especially worker flows) either have update access or use GlobalAccessContext; a sketch of the worker-side call follows.
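A minimal sketch, assuming the background job has no user-bound context (the surrounding method shape is invented):

```scala
// Background jobs have no user, so they must opt into GlobalAccessContext
// explicitly; assertUpdateAccess passes under it by design.
def markConvertedDatasetVirtual(datasetDAO: DatasetDAO, datasetId: ObjectId)(
    implicit ec: ExecutionContext): Fox[Unit] = {
  implicit val ctx: DBAccessContext = GlobalAccessContext
  datasetDAO.makeVirtual(datasetId)
}
```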
webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)

188-198: Verify that the Slack notification is triggered only on failures.
The `log(Some(slackNotificationService.noticeFailedFinishUpload))` wrapper may trigger notifications even on successful 200 OK responses. Confirm that the notification is only sent on failures.
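One way to make the failure-only behavior explicit, sketched with the `shiftBox` pattern used elsewhere in this PR; the helper name is invented and this is not the actual `log` implementation:

```scala
// Inspect the result Box and notify only on Failure; successes pass through untouched.
def notifyOnFailureOnly[T](fox: Fox[T])(notify: String => Unit)(
    implicit ec: ExecutionContext): Fox[T] =
  for {
    resultBox <- fox.shiftBox
    _ = resultBox match {
      case f: Failure => notify(f.msg) // only failures reach Slack
      case _          => ()
    }
    result <- resultBox.toFox
  } yield result
```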
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (5)

352-393: Verify cleanup on moveUnpackedToTarget failure and config consistency.
Two concerns:
- If `moveUnpackedToTarget` fails, the local `unpackToDir` may not be cleaned up. Consider wrapping the move result in `cleanUpOnFailure` similar to the unpack/post-processing steps.
- The method uses `dataStoreConfig.Datastore.S3Upload.enabled` directly instead of reading a persisted `isVirtual` flag from Redis. If the config changes between reserve and finish, the behavior may be inconsistent. Consider persisting `isVirtual` at reserve time and reading it here.
410-451: Verify cleanup on S3 upload failure.
If the S3 upload (line 432) or path rewrite (line 446) fails, the local `unpackedDir` may not be cleaned up. Consider wrapping the S3 branch in a recovery handler that deletes the local directory and calls `remoteWebknossosClient.deleteDataset(datasetId)` on failure; a sketch follows.
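Sketched below under the assumption that `shiftBox`, `PathUtils.deleteDirectoryRecursively`, and `remoteWebknossosClient.deleteDataset` behave as used elsewhere in this PR; the wrapper name is invented:

```scala
// Best-effort rollback: on failure, drop the local unpack dir and the
// preliminary dataset on the WK side, then re-surface the original error.
private def withUploadRollback[T](unpackedDir: Path, datasetId: ObjectId)(action: Fox[T])(
    implicit ec: ExecutionContext): Fox[T] =
  for {
    resultBox <- action.shiftBox
    _ <- resultBox match {
      case _: Failure =>
        PathUtils.deleteDirectoryRecursively(unpackedDir) // best-effort local cleanup
        remoteWebknossosClient.deleteDataset(datasetId)   // drop the preliminary dataset
      case _ => Fox.successful(())
    }
    result <- resultBox.toFox
  } yield result
```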
546-562: Fail fast on missing S3 credentials.
The current code extracts `accessKeyId` and `secretAccessKey` as `Box[Option[String]]`, which can succeed with `None` values and create an invalid S3 client. Validate `s3UploadCredentialsOpt` before building the client and fail with a clear error if missing, e.g. as sketched below.
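A fail-fast variant might look like this; it assumes `s3UploadCredentialsOpt: Option[(String, String)]` as the validated shape, which differs from the current code:

```scala
// Resolve credentials once, failing with a clear message instead of
// silently building a client with empty keys.
private def s3UploadCredentialsOrFail: Fox[(String, String)] =
  s3UploadCredentialsOpt match {
    case Some((accessKeyId, secretAccessKey)) =>
      Fox.successful((accessKeyId, secretAccessKey))
    case None =>
      Fox.failure(
        s"No S3 credentials found for credentialName ${dataStoreConfig.Datastore.S3Upload.credentialName}")
  }
```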
533-544: Preserve scheme and port in s3UploadEndpoint.
The endpoint URI construction hardcodes `"https"` and port `-1`, which will break non-HTTPS or non-default-port S3 endpoints (e.g., MinIO). Preserve the original scheme and port from `credentialName`.

436-437: Use s3UploadEndpoint for consistent host extraction.
Line 436 extracts the host directly from the `credentialName` URI, but `s3UploadEndpoint` (lines 533-544) normalizes the URI. For consistency, use `s3UploadEndpoint.getHost` instead of re-parsing `credentialName`.
Apply this diff:

```diff
-          endPointHost = new URI(dataStoreConfig.Datastore.S3Upload.credentialName).getHost
+          endPointHost = s3UploadEndpoint.getHost
```
🧹 Nitpick comments (1)
unreleased_changes/8912.md (1)

8-15: Add a language identifier to the code block.
The code block should specify a language for proper syntax highlighting and to satisfy markdown linting rules (MD040). Based on static analysis hints.
Apply this diff:
-``` +```hocon s3Upload { enabled = false # Use the name of a credential in the dataVaults section here to use it for uploads. credentialName = "s3://example/uri/prefix" objectKeyPrefix = "webknossos-uploads" }</blockquote></details> </blockquote></details> <details> <summary>📜 Review details</summary> **Configuration used**: CodeRabbit UI **Review profile**: CHILL **Plan**: Pro <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between 4bfb9a225cf601fccc3c5f89bc57de92217dcb12 and 2aebc3497affc6de63adcb7921f52bb5f7970706. </details> <details> <summary>📒 Files selected for processing (6)</summary> * `app/models/dataset/ComposeService.scala` (2 hunks) * `app/models/dataset/Dataset.scala` (2 hunks) * `unreleased_changes/8912.md` (1 hunks) * `webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala` (7 hunks) * `webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala` (1 hunks) * `webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala` (19 hunks) </details> <details> <summary>🧰 Additional context used</summary> <details> <summary>🧠 Learnings (1)</summary> <details> <summary>📚 Learning: 2025-04-28T14:18:04.368Z</summary>
Learnt from: frcroth
PR: #8236
File: webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/mesh/NeuroglancerPrecomputedMeshFileService.scala:161-166
Timestamp: 2025-04-28T14:18:04.368Z
Learning: In Scala for-comprehensions with the Fox error-handling monad, `Fox.fromBool()` expressions should use the `<-` binding operator instead of the `=` assignment operator to properly propagate error conditions. Using `=` will cause validation failures to be silently ignored.
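The same pitfall can be shown with the standard library's `Either`, which stands in here for the webknossos-internal Fox type; the function names are illustrative:

```scala
def validate(ok: Boolean): Either[String, Unit] =
  if (ok) Right(()) else Left("validation failed")

// Bound with <-: a Left short-circuits the comprehension, as intended.
def checked(ok: Boolean): Either[String, Int] =
  for {
    _ <- validate(ok)
  } yield 42

// Bound with =: the Either is built and immediately discarded, so a failing
// validation is silently ignored and Right(42) is returned anyway.
def unchecked(ok: Boolean): Either[String, Int] =
  for {
    base <- validate(true).map(_ => 0) // some prior successful step
    _ = validate(ok)                   // result dropped: Left never propagates
  } yield base + 42
```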
**Applied to files:**

* `webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala`

🧬 Code graph analysis (4)

app/models/dataset/Dataset.scala (3)

* app/models/project/Project.scala (1): `findOneByNameAndOrganization` (115-125)
* app/utils/sql/SecuredSQLDAO.scala (3): `readAccessQuery` (23-36), `existingCollectionName` (16-16), `assertUpdateAccess` (36-48)
* app/utils/sql/SqlInterpolation.scala (2): `q` (20-39), `asUpdate` (74-74)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (6)

* webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AccessTokenService.scala (6): `webknossos` (70-72), `validateAccessFromTokenContext` (87-94), `UserAccessRequest` (30-30), `UserAccessRequest` (31-72), `writeDataset` (52-55), `deleteDataset` (49-52)
* webknossos-datastore/app/com/scalableminds/webknossos/datastore/slacknotification/DSSlackNotificationService.scala (2): `DSSlackNotificationService` (8-31), `noticeFailedFinishUpload` (26-31)
* webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2): `reserveDataSourceUpload` (123-132), `deleteDataset` (138-143)
* webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (6): `reserveUpload` (171-212), `getDatasetIdByUploadId` (168-171), `extractDatasetUploadId` (157-157), `finishUpload` (352-393), `cancelUpload` (321-335), `isKnownUpload` (154-157)
* util/src/main/scala/com/scalableminds/util/requestlogging/RequestLogging.scala (1): `log` (55-62)
* webknossos-datastore/app/com/scalableminds/webknossos/datastore/slacknotification/SlackClient.scala (1): `warn` (18-21)

app/models/dataset/ComposeService.scala (3)

* app/models/dataset/DatasetService.scala (1): `createAndSetUpDataset` (100-123)
* webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1): `mapped` (104-133)
* webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (1): `coordinateTransformations` (85-85)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (7)

* util/src/main/scala/com/scalableminds/util/io/PathUtils.scala (2): `PathUtils` (14-14), `deleteDirectoryRecursively` (186-193)
* webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreConfig.scala (4): `DataStoreConfig` (11-78), `Datastore` (18-70), `S3Upload` (64-68), `DataVaults` (61-63)
* webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (5): `S3DataVault` (46-182), `S3DataVault` (184-277), `getCredential` (170-170), `hostBucketFromUri` (195-206), `create` (185-193)
* webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/UPath.scala (2): `UPath` (54-96), `fromLocalPath` (80-80)
* webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/UnfinishedUpload.scala (2): `UnfinishedUpload` (7-16), `UnfinishedUpload` (18-20)
* webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (4): `DSRemoteWebknossosClient` (57-216), `reportUpload` (98-105), `getCredential` (185-196), `deleteDataset` (138-143)
* webknossos-datastore/app/com/scalableminds/webknossos/datastore/storage/CredentialConfigReader.scala (1): `CredentialConfigReader` (8-64)

🪛 markdownlint-cli2 (0.18.1)

unreleased_changes/8912.md

8-8: Fenced code blocks should have a language specified (MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

* GitHub Check: build-smoketest-push
* GitHub Check: frontend-tests

🔇 Additional comments (19)

app/models/dataset/ComposeService.scala (2)

`50-55`: **LGTM! Virtual dataset creation aligns with PR objectives.**

The refactor from `createVirtualDataset` to `createAndSetUpDataset` with `isVirtual = true` correctly enforces the virtual dataset model. The `targetFolderId` type change to `ObjectId` is consistent with the method signature and improves type safety.

`71-73`: **LGTM! Simplified layer editing using polymorphic `mapped` method.**

The refactor eliminates explicit pattern matching by leveraging the `StaticLayer.mapped` method, which internally handles both `StaticColorLayer` and `StaticSegmentationLayer` types. This reduces code duplication and improves maintainability while preserving the coordinate transformation logic.

app/models/dataset/Dataset.scala (1)

`446-458`: **LGTM: Backwards-compatible lookup by directoryName.**

The method now looks up datasets by `directoryName` instead of `name`, which aligns with the shift to datasetId-centric operations. The comment correctly explains that this is for backwards compatibility with legacy links.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (5)

`36-36`: **LGTM: Slack notification service injected.**

The DSSlackNotificationService is properly injected for use in failure logging. Also applies to: 67-67

`100-103`: **LGTM: Reserve flow updated to use for-comprehension.**

The reserve flow now awaits `reserveUploadAdditionalInfo` before calling `uploadService.reserveUpload`, which is more readable and correct.
`153-155`: **LGTM: Chunk upload flow updated to datasetId.**

The chunk upload and test chunk flows now correctly use `getDatasetIdByUploadId` and `UserAccessRequest.writeDataset(datasetId)`. Also applies to: 176-177

`202-214`: **LGTM: Cancel upload flow updated to datasetId.**

The cancel upload flow now correctly uses `getDatasetIdByUploadId` and `UserAccessRequest.deleteDataset(datasetId)`.

`405-410`: **LGTM: Delete on disk flow updated to datasetId.**

The delete on disk flow now correctly uses `dsRemoteWebknossosClient.deleteDataset(datasetId)` and the log message references `datasetId`.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/datavault/S3DataVault.scala (1)

`195-195`: **LGTM: hostBucketFromUri made public for S3 upload support.**

The visibility change allows `UploadService` and other modules to extract bucket names from S3 URIs during the upload workflow.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (10)

`46-101`: **LGTM: Data models updated for datasetId-centric flow.**

The new and updated data models correctly reflect the shift to datasetId-based operations and S3 upload support. JSON formats are properly defined.

`103-108`: **LGTM: DataStoreConfig dependency added.**

The `DataStoreConfig` dependency is correctly injected to support S3 upload configuration.

`143-144`: **LGTM: Redis key helper added for datasetId.**

The `redisKeyForDatasetId` helper is consistent with existing Redis key patterns.

`159-164`: **LGTM: Upload directory helpers updated.**

The upload directory helpers now support the backup feature by creating backup directories in the `.trash` folder.

`168-169`: **LGTM: DatasetId retrieval method added.**

The `getDatasetIdByUploadId` method is consistent with the existing `getDataSourceIdByUploadId` pattern.

`171-210`: **LGTM: reserveUpload updated for datasetId and layer linking.**

The method now correctly validates that `needsConversion` and `layersToLink` are not both set, and stores the `datasetId` in Redis.

`263-319`: **LGTM: handleUploadChunk updated to datasetId.**

The chunk upload handler now correctly uses `getDatasetIdByUploadId` and includes `datasetId` in log messages.

`321-335`: **LGTM: cancelUpload updated to datasetId.**

The cancel upload handler now correctly uses `getDatasetIdByUploadId` and includes `datasetId` in log messages.

`602-628`: **LGTM: Cleanup logic includes dataset deletion.**

The cleanup logic correctly deletes the local directory and calls `remoteWebknossosClient.deleteDataset(datasetId)` on failure.

`802-804`: **LGTM: Raw uploaded data backed up for debugging.**

The backup feature correctly copies the raw uploaded data to a `.trash` directory before processing.
Description

Steps to test:

- `yarn enable-jobs` and a worker

TODOs:

Issues:

- Added changelog entry (create a `$PR_NUMBER.md` file in `unreleased_changes` or use `./tools/create-changelog-entry.py`)