When deleting layers, move referenced data #8437

Merged
merged 25 commits on Apr 14, 2025

Conversation

frcroth
Member

@frcroth frcroth commented Mar 10, 2025

URL of deployed dev instance (used for testing):

  • https://___.webknossos.xyz

Steps to test:

  • For all scenarios, create datasets and make sure they have been scanned and reported to WK.
  • Layers (all of this should work for fully linked layers, so it should not matter whether the layer itself is a symlink or whether all mags in the layer are symlinks). Note that for locally explored layers you need local non-WKW data.
    • DS 1 has Layer A with local data, DS 2 has Layer B with a symlink to A. Delete DS 1. Layer B now has local data.
    • DS 1 has Layer A with local data, DS 2 has Layer B with a symlink to A, DS 3 has Layer C with a symlink to A. Delete DS 1. Layer B (without loss of generality) has local data, Layer C has a symlink to B.
    • DS 1 has Layer A with local data, DS 2 has Layer B locally explored (file:// protocol) with a link to A. Delete DS 1. Layer B now exists physically with local data; there is no remote link anymore.
    • DS 1 has Layer A with local data, DS 2 has Layer B locally explored with a link to A, DS 3 has Layer C with a symlink to A. Delete DS 1. It is currently arbitrary which dataset is selected to store the local data: if DS 2 is selected, it becomes a regular dataset and C is a symlink to B; otherwise DS 2 now symlinks to C. In either case, DS 2 is no longer a remote dataset with the file protocol.
    • DS 1 has Layer A with local data, DS 2 has Layer B, which exists physically, but every mag in Layer B is a symlink to the corresponding mag in Layer A. Delete DS 1. Layer B now has local data.
  • Mags
    • DS 1 has Layer A with local data. DS 2 has Layer B where one mag is a symlink to a mag of A. Delete DS 1. That mag has now been moved to B.
    • DS 1 has Layer A with local data. DS 2 has Layer B where one mag is a symlink to a mag M of A. DS 3 has Layer C where one mag is a symlink to mag M. Delete DS 1. Mag M has been moved to B or C; the other now has a symlink to that location.

Notes

When a layer A symlinks to another layer B that in turn links to a layer C, and you remove layer B, the symlink at layer A breaks. This is because all symlinks are fully resolved for storage in the DB, so A's real path is C and B's real path is C; when B is removed, there is no knowledge of the link from A. I hope this works for now. Solving this properly would probably require a dedicated table just for paths, because link chains of every length would need to be resolved.
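For illustration, here is a minimal standalone sketch (plain java.nio, not code from this PR) of why the chain collapses once links are fully resolved: both A and B resolve to C, and only the unresolved single-hop target still records that A pointed at B.

import java.nio.file.{Files, Path}

object SymlinkChainDemo {
  def main(args: Array[String]): Unit = {
    val dir: Path = Files.createTempDirectory("symlink-chain-demo")
    val c = Files.createDirectory(dir.resolve("C"))        // layer C: real data
    val b = Files.createSymbolicLink(dir.resolve("B"), c)  // layer B -> C
    val a = Files.createSymbolicLink(dir.resolve("A"), b)  // layer A -> B

    // Full resolution (what ends up stored as the "real path") collapses the chain:
    println(a.toRealPath())            // .../C
    println(b.toRealPath())            // .../C
    // Only the unresolved one-hop target still knows that A pointed at B:
    println(Files.readSymbolicLink(a)) // .../B
  }
}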

Another current problem is locally explored datasets (with file://). Because we can't rewrite symlinks there, I think we need to rewrite the datasource properties in this case. Done
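Roughly speaking, the rewrite means a mag entry in the locally explored dataset's datasource-properties.json changes from something like the first line below to the second (schema heavily simplified, path made up); with no explicit path, the usual layer/mag directory resolution is used to access the data.

{ "mag": [1, 1, 1], "path": "file:///data/binaryData/sample_organization/DS1/color/1" }
{ "mag": [1, 1, 1] }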

Right now everything assumes the usual directory structure, with layers and mags named in the standard way. Anything else will cause trouble.
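For reference, the assumed layout looks roughly like this (layer and mag names taken from the test script further down; deviations from this dataset/layer/mag nesting are what will cause trouble):

sample_organization/
  DS1/
    datasource-properties.json
    color/
      1/
      2-2-1/
      4-4-1/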

TODOs:

All of these cases need to be handled:

  • Layer with symlinks to it
    • One layer is a symlink -> move the layer there
    • Multiple layers with symlinks -> move the layer to one of them, relink everything else
    • (Multiple) layers with symlinks, but also a layer where only a single mag symlinks to it
  • Mag with symlinks to it
  • Locally explored datasets

Issues:


(Please delete unneeded items, merge only when none are left open)

Contributor

coderabbitai bot commented Mar 10, 2025

📝 Walkthrough

Walkthrough

This pull request introduces changes to ensure that when a dataset or layer is deleted, any referenced layers are preserved by relocating them instead of being permanently removed. The documentation and routing have been updated accordingly. Unused case classes were removed from controllers, and new methods were added for path retrieval and symlink management. In the models and services layers, data types and method signatures have been updated to support the new relocation behavior. Additionally, significant enhancements were made to the dataset deletion process in the datastore, including robust symlink handling and client integrations.

Changes

File(s) — Change Summary

  • CHANGELOG.unreleased.md, docs/datasets/settings.md, conf/...latest.routes — Updated deletion documentation and added a new GET route for retrieving dataset paths.
  • app/controllers/DatasetController.scala, app/controllers/WKRemoteDataStoreController.scala — Removed the unused MagLinkInfo; added the new getPaths method and updated controller dependencies.
  • app/models/dataset/Dataset.scala, app/models/dataset/DatasetService.scala — Replaced DatasetMagInfo with DataSourceMagRow/DataSourceMagInfo, updating method signatures and filtering logic for dataset paths.
  • webknossos-datastore/app/com/scalableminds/.../DatasetDeleter.scala, .../MagLinkInfo.scala, .../BinaryDataService.scala, .../DSRemoteWebknossosClient.scala, .../UploadService.scala, .../DataSourceController.scala, .../DataSourceService.scala — Enhanced deletion process to relocate referenced layers safely, added robust symlink management through new helper methods, and updated client integrations by modifying trait usage and parameter visibility.

Assessment against linked issues

Objective Addressed Explanation
Delete layers that are referenced by other datasets (#8279)

Possibly related PRs

Suggested labels

backend, refactoring

Poem

I hopped through code with eyes so bright,
Fixing symlinks deep into the night.
Layers now saved, securely aligned,
No dataset lost, all neatly confined.
With every hop, the bugs take flight –
A jolly rabbit sings in pure delight!
🐇


@frcroth frcroth changed the title from "Handle symlinks when deleting datasets/layers" to "When deleting layers, move referenced data" on Mar 17, 2025
@frcroth frcroth marked this pull request as ready for review March 19, 2025 10:47
@frcroth frcroth requested a review from fm3 March 19, 2025 10:48
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (6)
docs/datasets/settings.md (1)

74-74: Documentation update matches changelog

The added documentation clearly explains the behavior when deleting datasets with referenced layers, providing necessary context for users.

There's a minor grammatical issue with the comma placement.

-When other datasets reference layers from this dataset, WEBKNOSSOS will try to move these layers to the dataset they are referenced in, so that it can still be accessed.
+When other datasets reference layers from this dataset, WEBKNOSSOS will try to move these layers to the dataset they are referenced in so that it can still be accessed.
🧰 Tools
🪛 LanguageTool

[typographical] ~74-~74: The conjunction “so that” does not have a comma in front.
Context: ...rs to the dataset they are referenced in, so that it can still be accessed. ![Dataset Ed...

(SO_THAT_UNNECESSARY_COMMA)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (5)

42-59: Robust trash-move method.
Method gracefully handles non-existing paths and relocates existing datasets to the trash location. Consider concurrency checks if multiple deletions target the same dataset simultaneously.


72-92: Symlink handling logic in moveSymlinks.
Overall approach to fetch and handle symlinks is sound. Verify concurrency if multiple calls run in parallel.


105-109: Relativizing symlink paths.
Implementation is standard. Ensure cross-filesystem edge cases are documented or handled.


118-149: updateDatasourceProperties modifies local JSON.
Deletion-then-write approach can risk partial updates if an exception occurs between these operations. Consider fail-safe or transactional writes if feasible.


230-276: handleLayerSymlinks fallback logic.
Falls back to moving individual mags if no fully linked layer is detected. Code is large; a dedicated helper class or modular approach may increase readability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b9f63bf and 41ba8ff.

📒 Files selected for processing (15)
  • CHANGELOG.unreleased.md (1 hunks)
  • app/controllers/DatasetController.scala (0 hunks)
  • app/controllers/WKRemoteDataStoreController.scala (4 hunks)
  • app/models/dataset/Dataset.scala (4 hunks)
  • app/models/dataset/DatasetService.scala (2 hunks)
  • conf/webknossos.latest.routes (1 hunks)
  • docs/datasets/settings.md (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/MagLinkInfo.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataServiceHolder.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingService.scala (1 hunks)
  • webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala (1 hunks)
💤 Files with no reviewable changes (1)
  • app/controllers/DatasetController.scala
🧰 Additional context used
🧬 Code Definitions (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1) (1)
  • DSRemoteWebknossosClient (56-195)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceRepository.scala (1) (1)
  • DataSourceRepository (15-60)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DataSourceService.scala (1) (1)
  • DataSourceService (30-338)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (1) (1)
  • remoteWKClient (152-152)
app/controllers/WKRemoteDataStoreController.scala (1)
app/models/dataset/Dataset.scala (3) (3)
  • findOne (200-207)
  • findOneByDirectoryNameAndOrganization (408-421)
  • findAllForDataset (897-907)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/MagLinkInfo.scala (1)
app/models/dataset/Dataset.scala (1) (1)
  • dataSourceId (89-89)
🪛 LanguageTool
docs/datasets/settings.md

[typographical] ~74-~74: The conjunction “so that” does not have a comma in front.
Context: ...rs to the dataset they are referenced in, so that it can still be accessed. ![Dataset Ed...

(SO_THAT_UNNECESSARY_COMMA)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build
🔇 Additional comments (33)
CHANGELOG.unreleased.md (1)

21-21: Concise and descriptive changelog entry

The changelog entry clearly explains the improved behavior for deleting datasets/layers, which aligns well with the PR objectives.

conf/webknossos.latest.routes (1)

112-112:

✅ Verification successful

New route for accessing directory paths

This new endpoint appears to support the layer reference management functionality, allowing retrieval of paths for specific organization/directory combinations in a datastore.


🏁 Script executed:

#!/bin/bash
# Check if there's a matching controller implementation for this new route
rg -A 3 -B 3 "def getPaths" --glob "*.scala"

Length of output: 1233


Verified new endpoint for directory path access

I confirmed that the route in conf/webknossos.latest.routes now correctly maps to the getPaths method in app/controllers/WKRemoteDataStoreController.scala. The shell script output verifies that the getPaths controller method (accepting name, key, organizationId, and directoryName as parameters) exists and is properly set up. This endpoint supports the intended layer reference management functionality for retrieving directory paths by organization and directory.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala (1)

72-72:

✅ Verification successful

Updated BinaryDataService constructor with additional parameter

The constructor call has been updated to include an additional None parameter, likely corresponding to a new remoteWKClient parameter in the BinaryDataService class. This change supports the enhanced functionality for handling referenced layers.


🏁 Script executed:

#!/bin/bash
# Check the signature of the BinaryDataService class to confirm parameter alignment
rg -A 10 "class BinaryDataService" --glob "*.scala"

Length of output: 3750


Verified Update: BinaryDataService Constructor Parameter

The instantiation in VolumeTracingService.scala now correctly includes an additional None value that matches the new remoteWKClient: Option[DSRemoteWebknossosClient] parameter in BinaryDataService. This change is aligned with the updated constructor signature (as verified in webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala) and supports the enhanced functionality for handling referenced layers.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (1)

123-127: New method for fetching layer paths looks good

This new fetchPaths method allows retrieving linked layer path information from WebKnossos, which is essential for the PR's goal of handling referenced data during layer deletion.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingService.scala (1)

106-106: Updated constructor call to accommodate new parameter

The BinaryDataService constructor now receives an additional None parameter, which likely represents the optional remoteWKClient needed for the new symlink resolution functionality.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (3)

20-24: Added import for DSRemoteWebknossosClient

Good improvement to explicitly import the DSRemoteWebknossosClient class.


113-114: Added remoteWebknossosClient as dependency

The UploadService now receives the DSRemoteWebknossosClient as a constructor parameter, enabling it to interact with the remote WebKnossos server during upload operations.


152-152: Implemented remoteWKClient method

This override correctly provides access to the injected remoteWebknossosClient, enabling the DatasetDeleter trait to access it when needed for resolving symlinks during layer deletions.

app/models/dataset/DatasetService.scala (3)

7-7: Updated import to include DataSourceMagInfo

The import now correctly references DataSourceMagInfo, which is used in the updated getPathsForDataLayer method.


360-361: Changed return type to use DataSourceMagInfo

The method signature has been updated to return DataSourceMagInfo instead of DatasetMagInfo, which likely provides more comprehensive information needed for the proper handling of symlinked layers.


369-370: Improved filtering logic for linked paths

The filtering logic now excludes paths from the same data source rather than exact path matches. This is a key improvement for the PR's goal of properly handling referenced data during layer deletion, as it ensures that only references to data in other datasets are considered for preservation.

app/controllers/WKRemoteDataStoreController.scala (3)

8-8: Good addition of necessary import.

The imported LayerMagLinkInfo and MagLinkInfo types are required for the new getPaths method implementation.


51-51: Appropriate dependency injection.

Adding the datasetLayerDAO parameter is necessary to fetch layer information in the new getPaths method.


240-255: Well-structured implementation of the getPaths endpoint.

The method correctly:

  1. Validates datastore access
  2. Retrieves organization and dataset
  3. Fetches all layers for the dataset
  4. Gets paths for each layer
  5. Returns structured data in JSON format

This endpoint will be essential for handling layer symlinks when deleting datasets, as described in the PR objectives.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (1)

25-26: Good addition of remote client parameter.

Adding the remoteWKClient parameter provides the BinaryDataService with the capability to communicate with remote WebKnossos instances, which will be necessary for managing symlinked data across datasets when deleting layers.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataServiceHolder.scala (2)

22-26: Properly updated dependency injection.

The constructor parameters have been neatly reformatted, and the new remoteWebknossosClient parameter has been correctly added to support the functionality required for layer deletion with symlink preservation.


49-50: Correct instantiation with new parameter.

The binaryDataService instantiation is properly updated to include the new remoteWebknossosClient parameter, ensuring the service has all required dependencies.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/MagLinkInfo.scala (3)

7-16: Well-designed data model for source information.

The DataSourceMagInfo class effectively encapsulates all necessary information about a data source, including:

  • Data source identifier
  • Layer name
  • Magnification vector
  • Path information
  • Local data status

The JSON format implementation will facilitate serialization for API responses.


18-22: Clear representation of mag links.

The MagLinkInfo class provides a clean structure to represent relationships between a primary mag and its linked mags, which is essential for tracking symlinks between layers.


24-28: Effective layer organization structure.

The LayerMagLinkInfo class appropriately groups multiple MagLinkInfo objects by layer name, providing a hierarchical structure that will be useful for the getPaths endpoint response.
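For orientation, the shape described by these three classes is roughly the following (reconstructed from this review with stub types; the actual source may differ in field names and details):

// Stubs standing in for the real webknossos types, only to keep the sketch self-contained
case class DataSourceId(directoryName: String, organizationId: String)
case class Vec3Int(x: Int, y: Int, z: Int)

// Sketch of the MagLinkInfo.scala data model (not the verbatim source)
case class DataSourceMagInfo(dataSourceId: DataSourceId,   // dataset the mag belongs to
                             dataLayerName: String,        // layer inside that dataset
                             mag: Vec3Int,                 // magnification vector, e.g. 2-2-1
                             path: Option[String],         // path as recorded in the properties
                             realPath: Option[String],     // fully resolved path on disk
                             hasLocalData: Boolean)        // whether the data physically lives here

case class MagLinkInfo(mag: DataSourceMagInfo, linkedMags: Seq[DataSourceMagInfo])

case class LayerMagLinkInfo(layerName: String, magLinkInfos: Seq[MagLinkInfo])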

app/models/dataset/Dataset.scala (6)

8-8: Import usage approved.
This new import for DataSourceMagInfo is consistent with the added logic in the file.


805-814: Comprehensive transformation logic.
rowsToMagInfos neatly converts DB rows into typed domain objects. No issues.


815-823: Query alignment for findPathsForDatasetAndDatalayer.
The joined queries and data-mapping approach seem correct. Confirm $columns includes the columns consumed by DataSourceMagRow.


825-832: Similar query approach for findAllByRealPath.
Implementation matches the pattern in findPathsForDatasetAndDatalayer. Looks good.


35-35:

✅ Verification successful

New slick import for custom row parsing.
Ensure all raw queries align column order with the DataSourceMagRow fields to avoid runtime mismatches.


🏁 Script executed:

#!/bin/bash
# Verify all SELECT column orders match DataSourceMagRow constructor usage
rg -A5 'SELECT \$columns,' 

Length of output: 1130


Slick Import Verified & Column Order Confirmed

The new slick.jdbc.GetResult import is correctly implemented for custom row parsing. Verification confirms that the raw SELECT queries (e.g., those using $columns, _organization, directoryName) align with the DataSourceMagRow fields. Please ensure that any future modifications maintain this column order to avoid runtime mismatches.


793-804:

✅ Verification successful

Check column ordering in GetResultDataSourceMagRow.
Verify that the selected columns strictly match the constructor parameters for DataSourceMagRow. Otherwise, misaligned columns may lead to runtime errors.


🏁 Script executed:

#!/bin/bash
# Verify the SQL queries that populate DataSourceMagRow and confirm they match the constructor's field order
rg -A8 'DataSourceMagRow('

Length of output: 113


Attention: Additional Verification Needed for GetResultDataSourceMagRow Column Alignment

The initial verification command failed due to an unescaped parenthesis in the regex. To properly verify that the SQL queries and the constructor parameters for DataSourceMagRow are aligned, please run the following additional shell script:

This script uses an escaped parenthesis (\() to search for instances where DataSourceMagRow is invoked, and it also locates the class definition to verify the constructor’s parameter order. Once you review the outputs, confirm that the column order in the GetResultDataSourceMagRow implementation strictly matches the order of parameters in the DataSourceMagRow constructor.


🏁 Script executed:

#!/bin/bash
echo "Checking instantiations in app/models/dataset/Dataset.scala:"
rg -A10 "DataSourceMagRow\(" app/models/dataset/Dataset.scala

echo "---------"
echo "Looking for the definition of DataSourceMagRow (its constructor signature):"
rg -n "case class DataSourceMagRow" .

Length of output: 1511


Column ordering in GetResultDataSourceMagRow verified.
The verification confirms that the order of the parameters provided in the DataSourceMagRow instantiation exactly matches the constructor signature declared in the case class (i.e., _dataset, dataLayerName, mag, path, realPath, hasLocalData, _organization, and directoryName). No changes are required.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (7)

2-9: New imports introduced.
All newly imported classes and traits appear relevant for symlink handling, JSON parsing, and remote data source interaction.


11-12: Liftweb imports for Box usage.
Using tryo and Full maintains consistency with the existing code style. No concerns here.


64-68: Symlink move precedes trash deletion.
If moveSymlinks fails, the dataset is not deleted. Verify this ordering is intentional.


70-70: Remote client accessor.
The optional remoteWKClient is straightforward. No immediate issues.


93-103: getFullyLinkedLayers synergy.
Nicely checks for all local mags and an intersection of linked layers. Straightforward logic.


111-117: getMagPath location builder.
Clear path construction for data source references. Looks fine.


150-165: updateMagSymlinks single-step logic.
Deletes any existing link before recreating it. This is functional, but be aware of concurrency edge cases.

Member

@fm3 fm3 left a comment


Wow, quite the complex logic 🙈 But it does seem justified, I don’t see easy ways to simplify.

I’m a little fearful of exceptions, could you have a look which of the used file functions can throw exceptions and add tryos there?

Also, maybe it makes sense to first check for write access so that a readable error message can be given. Also, what happens if for permission reasons this resymlinking does not happen. Can the deletion still go forward? What is shown to the user?

I also added a few small comments on the code.

Did no testing yet, will do that in the next round.

Comment on lines 51 to 53
logger.info(s"Deleting dataset by moving it from $dataSourcePath to $targetPath${if (reason.isDefined)
s" because ${reason.getOrElse("")}"
else "..."}")
Member

Suggested change
logger.info(s"Deleting dataset by moving it from $dataSourcePath to $targetPath${if (reason.isDefined)
s" because ${reason.getOrElse("")}"
else "..."}")
logger.info(s"Deleting dataset by moving it from $dataSourcePath to $targetPath${reason.map(r => s"because $r").getOrElse("...")}")

Member Author

I don't think I wrote this but happy to change 😄

case dl => dl
})
// Write properties back
Files.delete(propertiesPath)
Member

might throw an exception, maybe we should tryo this too.

@frcroth
Member Author

frcroth commented Mar 24, 2025

Wow, quite the complex logic 🙈 But it does seem justified, I don’t see easy ways to simplify.

I’m a little fearful of exceptions, could you have a look which of the used file functions can throw exceptions and add tryos there?

I added the suggested tryo for the updateDatasourceProperties deletion; everything else should be covered by the big tryo in handleLayerSymlinks.

Also, maybe it makes sense to first check for write access so that a readable error message can be given. Also, what happens if for permission reasons this resymlinking does not happen. Can the deletion still go forward? What is shown to the user?

Added an exception before deleting stuff.

@frcroth frcroth requested a review from fm3 March 24, 2025 16:27
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (4)

112-144: Robust datasource properties update with error handling.

The method properly handles updating datasource properties, including checks for file existence and JSON validation. Error handling with tryo is used for the deletion operation.

Consider wrapping the JSON write operation in a tryo as well:

-              case Full(_) => JsonHelper.jsonToFile(propertiesPath, updatedDataSource)
+              case Full(_) => tryo(JsonHelper.jsonToFile(propertiesPath, updatedDataSource))

146-166: Comprehensive mag symlink update with permission checking.

The method handles various cases for updating mag symlinks, including permission checking, existence verification, and special handling for locally explored datasets.

Consider wrapping the Files.createSymbolicLink operations in tryo blocks to handle potential failures:

-      Files.createSymbolicLink(linkedMagPath, relativizeSymlinkPath(targetMagPath, linkedMagPath))
+      tryo(Files.createSymbolicLink(linkedMagPath, relativizeSymlinkPath(targetMagPath, linkedMagPath))) match {
+        case Full(_) => // success
+        case e => logger.warn(s"Failed to create symlink at $linkedMagPath: $e")
+      }

168-240: Complex but logically sound layer migration implementation.

The moveLayer method correctly handles moving layers and updating all related symlinks with appropriate permission checks, deletion of existing symlinks, and recreation of symlinks to the new location.

The comment on line 177 raises a good question about selection criteria. Consider adding a more definitive explanation or implementing a more deterministic selection process for the target layer.


176-178: Target selection strategy could be improved.

Currently, an arbitrary target is selected from the fully linked layers. Consider implementing a more deterministic selection strategy based on specific criteria (e.g., most recent, most referenced, etc.).

-    // Selection of the first one is arbitrary, is there anything to distinguish between them?
-    val target = fullLayerLinks.head
+    // Select target based on criteria like most recent access or layer with most references
+    val target = selectTargetLayer(fullLayerLinks)

A selectTargetLayer method could implement logic to choose the most appropriate target based on metadata or other criteria.
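A minimal, self-contained sketch of such a selector (illustrative only; the DataSourceId stub stands in for the real type) that simply orders the candidates so repeated runs pick the same target:

case class DataSourceId(directoryName: String, organizationId: String) // stub for the real type

def selectTargetLayer(candidates: Seq[(DataSourceId, String)]): (DataSourceId, String) =
  candidates.minBy { case (id, layerName) => (id.organizationId, id.directoryName, layerName) }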

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ab0ab84 and 6572e32.

📒 Files selected for processing (2)
  • CHANGELOG.unreleased.md (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • CHANGELOG.unreleased.md
🔇 Additional comments (9)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (9)

2-13: Well-organized imports with necessary additions.

The imports have been properly organized to support the new functionality for handling symlinks and JSON processing.


43-58: New moveToTrash method handles dataset deletion correctly.

The method properly handles moving datasets to the trash directory, with good error handling and logging. It checks if the dataset exists before attempting deletion and appropriately handles both cases.


64-67: Refactored deletion process with symlink handling.

The deleteOnDisk method now uses a clean for-comprehension to combine moveSymlinks and moveToTrash operations, ensuring that symlinks are handled before the dataset is moved to trash.


70-70: Abstract method for remote client access.

This abstract method declaration ensures implementing classes will provide access to the remote WebKnossos client, allowing for proper dependency injection.


87-97: Clear logic for identifying fully linked layers.

The method effectively determines which layers are completely linked through all magnifications, with good comments explaining the logic.


99-103: Efficient path relativization for symlinks.

The method properly converts absolute paths to relative paths for symlink creation, ensuring symlinks remain valid regardless of the absolute path to the data directory.


105-111: Clear mag path construction.

The method constructs paths to magnification directories in a clear, readable manner using appropriate path resolution.


242-297: Comprehensive layer symlink handling with excellent error handling.

The method orchestrates the handling of symlinks for layers during deletion, with proper error handling using tryo. The comments explaining different cases for magnifications without local data are very helpful.


74-85:

❓ Verification inconclusive

Comprehensive symlink migration coordination.

The moveSymlinks method effectively coordinates retrieving paths, handling layer symlinks, and updating datasource properties.

Verify all symlink operations are handled within error boundaries:


🏁 Script executed:

#!/bin/bash
# Check if the handling of symlink operations is properly wrapped in error handling
rg -n "tryo" --context 5 DatasetDeleter.scala

Length of output: 106


Update: Verify Correct File Path & Error Handling in Symlink Migration

It appears the previous verification script referenced the wrong file path (i.e. using "DatasetDeleter.scala" directly). Please ensure that the error boundary handling (using constructs such as the custom operator ?~>) in the moveSymlinks method is correctly applied in the file located at webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala. For clarity, please verify the following:

  • The file exists at the expected location.
  • Error handling for symlink operations (e.g. via Fox.combined(exceptionBoxes.map(Fox.box2Fox)) ?~> "Failed to move symlinks") is in place.
  • Confirm that any internal wrappers (if intended) properly catch and relay errors during the symlink migration.

Once you’ve validated these points—preferably by running the search on the correct file path or manually checking the file—please update the verification accordingly.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (7)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (7)

53-54: Missing space in log message.

There should be a space between the path and the reason in the log message for better readability.

-          s"Deleting dataset by moving it from $dataSourcePath to $targetPath${reason.map(r => s"because $r").getOrElse("...")}")
+          s"Deleting dataset by moving it from $dataSourcePath to $targetPath ${reason.map(r => s"because $r").getOrElse("...")}")

75-86: Consider adding comprehensive logging for symlink operations.

The symlink handling is complex and critical to data integrity. Consider adding more detailed logging to trace each step of the process, especially for debugging purposes.

 private def moveSymlinks(organizationId: String, datasetName: String)(implicit ec: ExecutionContext) =
   for {
     dataSourceId <- Fox.successful(DataSourceId(datasetName, organizationId))
+    _ = logger.info(s"Starting symlink handling for dataset: $datasetName in organization: $organizationId")
     layersAndLinkedMags <- remoteWebknossosClient.fetchPaths(dataSourceId)
+    _ = logger.info(s"Found ${layersAndLinkedMags.size} layers with linked mags to process")
     exceptionBoxes = layersAndLinkedMags.map(layerMagLinkInfo =>
       handleLayerSymlinks(dataSourceId, layerMagLinkInfo.layerName, layerMagLinkInfo.magLinkInfos.toList))
     _ <- Fox.combined(exceptionBoxes.map(Fox.box2Fox)) ?~> "Failed to move symlinks"
     affectedDataSources = layersAndLinkedMags
       .flatMap(_.magLinkInfos.map(m => m.linkedMags.map(_.dataSourceId)))
       .flatten
+    _ = logger.info(s"Updating properties for ${affectedDataSources.size} affected data sources")
     _ <- updateDatasourceProperties(affectedDataSources)
+    _ = logger.info(s"Successfully completed symlink handling for dataset: $datasetName")
   } yield ()

88-98: Add documentation to clarify the purpose of getFullyLinkedLayers.

This method performs a complex operation to find layers that fully link to all mags. Adding documentation will help future developers understand its purpose and logic.

+  /**
+   * Identifies layers that fully link to all mags in the list.
+   * A fully linked layer is one where:
+   * 1. All mags have local data
+   * 2. There's at least one dataset/layer combination that links to every mag
+   *
+   * @param linkedMags List of mag link information
+   * @return Option containing sequences of (DataSourceId, layerName) pairs that fully link to all mags, or None if no such layers exist
+   */
   private def getFullyLinkedLayers(linkedMags: List[MagLinkInfo]): Option[Seq[(DataSourceId, String)]] = {
     val allMagsLocal = linkedMags.forall(_.mag.hasLocalData)
     val allLinkedDatasetLayers = linkedMags.map(_.linkedMags.map(lm => (lm.dataSourceId, lm.dataLayerName)))
     // Get combinations of datasourceId, layerName that link to EVERY mag
     val linkedToByAllMags = allLinkedDatasetLayers.reduce((a, b) => a.intersect(b))
     if (allMagsLocal && linkedToByAllMags.nonEmpty) {
       Some(linkedToByAllMags)
     } else {
       None
     }
   }

100-104: Simplify the relativizeSymlinkPath method.

The method can be simplified by removing the intermediate variable and directly returning the result.

 private def relativizeSymlinkPath(targetPath: Path, originPath: Path): Path = {
   val absoluteTargetPath = targetPath.toAbsolutePath
-  val relativeTargetPath = originPath.getParent.toAbsolutePath.relativize(absoluteTargetPath)
-  relativeTargetPath
+  originPath.getParent.toAbsolutePath.relativize(absoluteTargetPath)
 }

169-246: Add high-level documentation for the complex moveLayer method.

This method handles multiple complex scenarios and would benefit from clear documentation that explains its purpose and the different cases it handles.

+  /**
+   * Moves a layer from one dataset to another when it's fully linked by other datasets.
+   * This operation involves:
+   * 1. Identifying a target dataset/layer to move the source layer to
+   * 2. Moving the physical layer directory to the target location
+   * 3. Updating all symlinks that were pointing to the original layer
+   * 4. Handling individual mag symlinks for partially linked layers
+   *
+   * @param sourceDataSource The data source ID of the layer being moved
+   * @param sourceLayer The name of the layer being moved
+   * @param fullLayerLinks List of dataset/layer pairs that fully link to this layer
+   * @param layerMags Information about mags in this layer and their links
+   */
   private def moveLayer(sourceDataSource: DataSourceId,
                         sourceLayer: String,
                         fullLayerLinks: Seq[(DataSourceId, String)],
                         layerMags: List[MagLinkInfo]): Unit = {

191-192: Improve log message clarity.

The log message could be more clear about the purpose of the operation being performed.

-      s"Found complete symlinks to layer; Moving layer $sourceLayer from $sourceDataSource to $moveToDataSource/$moveToDataLayer")
+      s"Found complete symlinks to layer. Moving layer $sourceLayer from dataset $sourceDataSource to dataset $moveToDataSource layer $moveToDataLayer to preserve data")

248-303: Consider breaking down handleLayerSymlinks into smaller functions.

This method is quite complex and handles multiple scenarios. Breaking it down into smaller, focused functions would improve readability and maintainability.

Consider refactoring this large method into smaller helper methods:

  1. A method for the fully linked layer case
  2. A method for handling individual mag symlinks
  3. A method specifically for the local data movement

This would make the code more maintainable and easier to test.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6572e32 and 8d14be3.

📒 Files selected for processing (2)
  • util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (2 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (2)
util/src/main/scala/com/scalableminds/util/tools/Fox.scala (1)
  • failure (67-71)
util/src/main/scala/com/scalableminds/util/time/Instant.scala (1)
  • toString (15-15)
🔇 Additional comments (6)
util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (2)

95-99: LGTM: Well-structured new method for exception formatting.

The addition of the firstException method provides a clean way to extract and format the exception message from a Failure instance. This follows the established pattern of helper methods in this trait and will improve error reporting by displaying exception information clearly.


117-118: Enhanced error reporting with improved message structure.

The change to include the first exception at the beginning of the formatted message provides better context when debugging failures. This will be particularly useful when handling errors during the layer deletion process mentioned in the PR objectives, making it easier to diagnose issues with symlink operations and data movement.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (4)

65-68: LGTM: Good error handling in the deletion process.

The for-comprehension with appropriate error messages ensures that symlink handling failures are properly reported before attempting to move the dataset to trash.


147-167: LGTM: Good permission checks before symlink operations.

The method correctly checks write permissions before attempting to modify symlinks, which will help prevent partial operations and provide clear error messages to users.


2-10: LGTM: Good import organization.

The imports are well organized, clearly separating the different types of dependencies needed for the symlink handling functionality.


75-145:

❓ Verification inconclusive

Verify proper error recovery in symlink handling.

The symlink handling involves multiple filesystem operations. Ensure that the system can recover properly if some operations fail in the middle of the process, particularly in cases of permission issues or network failures when dealing with remote resources.


🏁 Script executed:

#!/bin/bash
# Check for exception handling patterns in the codebase

# Look for similar error handling patterns in other parts of the codebase
echo "Checking for tryo usage patterns in other deletion operations:"
rg "tryo.*\{.*delete" --type scala

echo "Checking for recovery mechanisms in similar operations:"
rg "recover|fallback|onFailure" --type scala -A 2 -B 2

Length of output: 65883


Action: Confirm and Document Error Recovery in Symlink Handling

The implementation in DatasetDeleter.scala leverages the Fox monad (with constructs like ?~> and tryo) consistent with error handling patterns across the codebase. However, please verify that:

  • Partial Operation Rollback: In the moveSymlinks method, if some filesystem or remote operations (e.g., moving symlinks or deleting the dataset properties file) succeed while subsequent ones fail, the system does not end up in an inconsistent state. If not already handled, consider adding a rollback mechanism or documenting that partial changes are acceptable.

  • Robust Error Wrapping: Ensure that all critical filesystem operations (such as Files.delete) and remote resource calls are adequately wrapped to capture exceptions related to permission issues or network failures.

If the current error recovery is by design, please add documentation clarifying that these failure paths have been considered. Otherwise, adjustments may be needed to manage partial failures more gracefully.

Comment on lines +123 to +139
JsonHelper.validatedJsonFromFile[DataSource](propertiesPath, dataBaseDir) match {
case Full(dataSource) =>
val updatedDataSource = dataSource.copy(dataLayers = dataSource.dataLayers.map {
case dl: DataLayerWithMagLocators =>
if (dl.mags.forall(_.path.exists(_.startsWith(s"${DataVaultService.schemeFile}://")))) {
// Setting path to None means using resolution of layer/mag directories to access data
dl.mapped(magMapping = _.copy(path = None))
} else {
dl
}
case dl => dl
})
// Write properties back
tryo(Files.delete(propertiesPath)) match {
case Full(_) => JsonHelper.jsonToFile(propertiesPath, updatedDataSource)
case e => e
}
Contributor

💡 Verification agent

❓ Verification inconclusive

Add error handling for JSON parsing failures.

The method handles errors when deleting files but doesn't provide detailed error logging for JSON parsing failures. Consider adding more specific error handling for this case.


🏁 Script executed:

#!/bin/bash
# Check if there are any JSON related failures in the logs

# Look for JSON parsing errors in the logs
rg -i "json.*error|failed.*json|invalid.*json" --type log

Length of output: 208


Action Required: Enhance JSON Parsing Error Handling

  • In DatasetDeleter.scala (lines 123–139), the code only handles file deletion errors but does not account for JSON parsing failures from the call to JsonHelper.validatedJsonFromFile.

  • Please add a dedicated case in the match expression (e.g., a Failure branch) that logs detailed error messages when JSON parsing fails. For example:

    JsonHelper.validatedJsonFromFile[DataSource](propertiesPath, dataBaseDir) match {
      case Full(dataSource) =>
        // … existing processing …
      case Failure(msg, exception, chain) =>
        logger.error(s"JSON parsing failed for file $propertiesPath: $msg", exception)
        // Consider handling the failure further, e.g., returning an error response or safe fallback
      case _ =>
        logger.error(s"Unexpected error during JSON parsing for file $propertiesPath.")
    }
  • Note: Our initial log search using rg did not yield any JSON parsing errors (possibly due to filtering issues), so please verify manually that any JSON parsing failures in your runtime or production environment are captured and logged appropriately.

@frcroth
Member Author

frcroth commented Mar 31, 2025

Place this script into sample_organization to generate test datasets for the scenarios (locally explored layers are not covered). Pass as argument a dataset that has a color layer and mags 1, 2-2-1, and 4-4-1 (I have used l4_sample with only the color layer and reduced mags to make copying faster). A usage example follows after the script.

#!/bin/bash

set -e  # Exit on error

if [ "$#" -ne 1 ]; then
    echo "Usage: $0 <dataset_directory>"
    exit 1
fi

SOURCE_DIR="$1"
BASE_DIR="$(dirname "$SOURCE_DIR")"

# Function to copy dataset while maintaining symlinks
copy_test_dataset() {
    local src="$1"
    local dest="$2"
    mkdir -p "$dest"
    cp -a "$src"/. "$dest"  # Preserve symlinks and structure
    cp "$src/datasource-properties.json" "$dest/"  # Copy properties file
}

# Test Case 1: DS1 has color with local data, DS2 has color symlinked to DS1
TC1_DS1="$BASE_DIR/TC1_DS1"
TC1_DS2="$BASE_DIR/TC1_DS2"
copy_test_dataset "$SOURCE_DIR" "$TC1_DS1"
mkdir -p "$TC1_DS2"  # Ensure parent directory exists
ln -s "../TC1_DS1/color" "$TC1_DS2/color"
cp "$TC1_DS1/datasource-properties.json" "$TC1_DS2/"  # Copy properties file

# Test Case 2: TC2_DS1 has color with local data, TC2_DS2 and TC2_DS3 each have color symlinked to TC2_DS1
TC2_DS1="$BASE_DIR/TC2_DS1"
TC2_DS2="$BASE_DIR/TC2_DS2"
TC2_DS3="$BASE_DIR/TC2_DS3"
copy_test_dataset "$SOURCE_DIR" "$TC2_DS1"
mkdir -p "$TC2_DS3"  # Ensure parent directory exists
ln -s "../TC2_DS1/color" "$TC2_DS3/color"
cp "$TC2_DS1/datasource-properties.json" "$TC2_DS3/"  # Copy properties file
mkdir -p "$TC2_DS2"  # Ensure parent directory exists
ln -s "../TC2_DS1/color" "$TC2_DS2/color"
cp "$TC2_DS1/datasource-properties.json" "$TC2_DS2/"  # Copy properties file

# Test Case 3: TC3_DS1 has color with local data, TC3_DS2 has color with mags symlinked to TC3_DS1
TC3_DS1="$BASE_DIR/TC3_DS1"
TC3_DS2="$BASE_DIR/TC3_DS2"
copy_test_dataset "$SOURCE_DIR" "$TC3_DS1"
mkdir -p "$TC3_DS2/color"
for mag in "$TC3_DS1/color/"*; do
    ln -s "../../$mag" "$TC3_DS2/color/$(basename "$mag")"
done
cp "$TC3_DS1/datasource-properties.json" "$TC3_DS2/"  # Copy properties file

# Test Case 4: TC4_DS1 has color with local data, TC4_DS2 has one mag symlinked to TC4_DS1
TC4_DS1="$BASE_DIR/TC4_DS1"
TC4_DS2="$BASE_DIR/TC4_DS2"
copy_test_dataset "$SOURCE_DIR" "$TC4_DS1"
mkdir -p "$TC4_DS2/color"
ln -s "../../$TC4_DS1/color/1" "$TC4_DS2/color/1"
cp -r "$TC4_DS1/color/2-2-1" "$TC4_DS2/color/2-2-1"
cp -r "$TC4_DS1/color/4-4-1" "$TC4_DS2/color/4-4-1"
cp "$TC4_DS1/datasource-properties.json" "$TC4_DS2/"  # Copy properties file

# Test Case 5: TC5_DS1 has color with local data, TC5_DS2 and TC5_DS3 both symlink the same mag
TC5_DS1="$BASE_DIR/TC5_DS1"
TC5_DS2="$BASE_DIR/TC5_DS2"
TC5_DS3="$BASE_DIR/TC5_DS3"
copy_test_dataset "$SOURCE_DIR" "$TC5_DS1"
mkdir -p "$TC5_DS2/color" "$TC5_DS3/color"
ln -s "../../$TC5_DS1/color/1" "$TC5_DS2/color/1"
cp -r "$TC5_DS1/color/2-2-1" "$TC5_DS2/color/2-2-1"
cp -r "$TC5_DS1/color/4-4-1" "$TC5_DS2/color/4-4-1"
ln -s "../../$TC5_DS1/color/1" "$TC5_DS3/color/1"
cp -r "$TC5_DS1/color/2-2-1" "$TC5_DS3/color/2-2-1"
cp -r "$TC5_DS1/color/4-4-1" "$TC5_DS3/color/4-4-1"
cp "$TC5_DS1/datasource-properties.json" "$TC5_DS2/"  # Copy properties file
cp "$TC5_DS1/datasource-properties.json" "$TC5_DS3/"  # Copy properties file

echo "Test datasets created in $BASE_DIR"
echo "To test deletion scenarios, remove the DS1 datasets"

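A possible invocation, assuming the script is saved as create_test_datasets.sh inside the sample_organization directory of the datastore's binary data folder (adjust paths to your setup):

cd binaryData/sample_organization
chmod +x create_test_datasets.sh
./create_test_datasets.sh l4_sample   # a dataset with a color layer and mags 1, 2-2-1, 4-4-1
# afterwards, delete the TC*_DS1 datasets in WEBKNOSSOS to exercise the scenarios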

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
app/controllers/WKRemoteDataStoreController.scala (1)

240-255: Consider handling potential error cases more explicitly.

While using the Fox monad provides error handling, consider adding explicit error messaging for the layer retrieval operation to improve debugging and user feedback.

     Action.async { implicit request =>
       dataStoreService.validateAccess(name, key) { _ =>
         for {
           organization <- organizationDAO.findOne(organizationId)(GlobalAccessContext)
           dataset <- datasetDAO.findOneByDirectoryNameAndOrganization(directoryName, organization._id)(
             GlobalAccessContext)
-          layers <- datasetLayerDAO.findAllForDataset(dataset._id)
+          layers <- datasetLayerDAO.findAllForDataset(dataset._id) ?~> 
+            s"Failed to retrieve layers for dataset ${dataset.name} (${dataset._id})"
           magsAndLinkedMags <- Fox.serialCombined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))
           magLinkInfos = magsAndLinkedMags.map(_.map { case (mag, linkedMags) => MagLinkInfo(mag, linkedMags) })
           layersAndMagLinkInfos = layers.zip(magLinkInfos).map {
             case (layer, magLinkInfo) => LayerMagLinkInfo(layer.name, magLinkInfo)
           }
         } yield Ok(Json.toJson(layersAndMagLinkInfos))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d026b65 and cbb431d.

📒 Files selected for processing (5)
  • CHANGELOG.unreleased.md (1 hunks)
  • app/controllers/WKRemoteDataStoreController.scala (4 hunks)
  • conf/webknossos.latest.routes (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • conf/webknossos.latest.routes
  • CHANGELOG.unreleased.md
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build
🔇 Additional comments (3)
app/controllers/WKRemoteDataStoreController.scala (3)

8-8: LGTM: Required import for new functionality.

The import for LayerMagLinkInfo and MagLinkInfo is necessary to support the return type of the new getPaths method.


38-38: Good addition of explicit duration import.

The import scala.concurrent.duration.DurationInt enables the 30 seconds syntax used on line 203, making the code more readable and ensuring the implicit conversion is explicitly imported.


51-51: LGTM: Required dependency for new functionality.

Adding the datasetLayerDAO dependency to the controller is necessary for retrieving layer information in the new getPaths method, maintaining the dependency injection pattern used throughout the controller.

Comment on lines +240 to +255
def getPaths(name: String, key: String, organizationId: String, directoryName: String): Action[AnyContent] =
Action.async { implicit request =>
dataStoreService.validateAccess(name, key) { _ =>
for {
organization <- organizationDAO.findOne(organizationId)(GlobalAccessContext)
dataset <- datasetDAO.findOneByDirectoryNameAndOrganization(directoryName, organization._id)(
GlobalAccessContext)
layers <- datasetLayerDAO.findAllForDataset(dataset._id)
magsAndLinkedMags <- Fox.serialCombined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))
magLinkInfos = magsAndLinkedMags.map(_.map { case (mag, linkedMags) => MagLinkInfo(mag, linkedMags) })
layersAndMagLinkInfos = layers.zip(magLinkInfos).map {
case (layer, magLinkInfo) => LayerMagLinkInfo(layer.name, magLinkInfo)
}
} yield Ok(Json.toJson(layersAndMagLinkInfos))
}
}
Contributor

💡 Verification agent

❓ Verification inconclusive

Ensure proper test coverage for new endpoint.

This new endpoint is critical for managing layer symlinks during deletion operations. Ensure comprehensive test coverage exists to verify its behavior with various dataset and layer configurations.


🏁 Script executed:

#!/bin/bash
# Check if tests exist for this new endpoint
rg -A 3 -B 3 "getPaths.*Action" --type scala test/
rg -A 3 -B 3 "LayerMagLinkInfo" --type scala test/

Length of output: 103


Action Required: Add Comprehensive Tests for the getPaths Endpoint

It appears that no tests referencing the new getPaths endpoint or its usage of LayerMagLinkInfo were found in the test suite. Since this endpoint is critical for managing layer symlinks during deletion operations, please ensure that comprehensive tests are added to verify the behavior with various dataset and layer configurations.

  • File: app/controllers/WKRemoteDataStoreController.scala (lines 240-255)
  • Areas to Cover:
    • Validate that the access control (validateAccess) correctly gates the endpoint.
    • Ensure that retrievals from organizationDAO, datasetDAO, and datasetLayerDAO behave as expected.
    • Test the complete mapping logic that produces LayerMagLinkInfo with varying dataset and layer scenarios.
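A minimal outline of such a test, as a sketch only — it assumes scalatestplus-play is available and uses a placeholder route, key, and dataset, so the actual URL, authentication, and fixtures would need to match the real setup:

import org.scalatestplus.play.PlaySpec
import org.scalatestplus.play.guice.GuiceOneAppPerSuite
import play.api.test.FakeRequest
import play.api.test.Helpers._

class GetPathsEndpointSpec extends PlaySpec with GuiceOneAppPerSuite {
  "getPaths" should {
    "return LayerMagLinkInfo entries for a reported dataset" in {
      // Route and key are placeholders; a real test also needs a seeded organization and dataset.
      val request = FakeRequest(GET, "/api/datastores/localhost/datasources/sample_organization/test-ds/paths?key=secret-key")
      val result  = route(app, request).get
      status(result) mustBe OK
      (contentAsJson(result) \\ "layerName") must not be empty
    }
  }
}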

💡 Verification agent

🧩 Analysis chain

Add documentation and consider performance optimizations.

The implementation of getPaths is well-structured and follows the existing patterns in the controller. However, there are a few improvements to consider:

  1. Add method documentation explaining its purpose and usage context
  2. The Fox.serialCombined performs operations sequentially, which could be a performance bottleneck with many layers - consider if these operations could be parallelized
  3. Ensure comprehensive error handling for all database operations
-  def getPaths(name: String, key: String, organizationId: String, directoryName: String): Action[AnyContent] =
+  /**
+   * Retrieves paths for all layers in a dataset.
+   * Used to support layer symlink management when deleting layers.
+   *
+   * @param name Datastore name
+   * @param key Datastore access key
+   * @param organizationId ID of the organization owning the dataset
+   * @param directoryName Directory name of the dataset
+   * @return JSON response with LayerMagLinkInfo objects containing layer paths
+   */
+  def getPaths(name: String, key: String, organizationId: String, directoryName: String): Action[AnyContent] =
     Action.async { implicit request =>
       dataStoreService.validateAccess(name, key) { _ =>
         for {
           organization <- organizationDAO.findOne(organizationId)(GlobalAccessContext)
           dataset <- datasetDAO.findOneByDirectoryNameAndOrganization(directoryName, organization._id)(
             GlobalAccessContext)
           layers <- datasetLayerDAO.findAllForDataset(dataset._id)
-          magsAndLinkedMags <- Fox.serialCombined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))
+          // Perform path lookups in parallel for better performance with many layers
+          magsAndLinkedMags <- Fox.combined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))
           magLinkInfos = magsAndLinkedMags.map(_.map { case (mag, linkedMags) => MagLinkInfo(mag, linkedMags) })
           layersAndMagLinkInfos = layers.zip(magLinkInfos).map {
             case (layer, magLinkInfo) => LayerMagLinkInfo(layer.name, magLinkInfo)
           }
         } yield Ok(Json.toJson(layersAndMagLinkInfos))
       }
     }

Consider verifying if parallel execution using Fox.combined instead of Fox.serialCombined is appropriate for this case:


🏁 Script executed:

#!/bin/bash
# Check implementation of Fox.serialCombined vs Fox.combined to understand differences
rg -A 5 "def serialCombined" --type scala
rg -A 5 "def combined" --type scala

# Look for similar patterns in the codebase
rg -B 2 -A 2 "Fox\.combined" --type scala

Length of output: 23365


Attention: Required changes on getPaths method for parallel execution

The updated method documentation is a good improvement, and parallelizing the data layer lookups can boost performance. However, based on our investigation:

  • Parallel Execution: The current diff replaces the sequential call

    Fox.serialCombined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))

    with

    Fox.combined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))

    However, the definition of Fox.combined in our codebase expects a sequence of Fox[T] values (i.e. a single-argument list). To safely execute these operations in parallel, please refactor the call by mapping the function over the layers before combining the results. For example, update the call to:

    magsAndLinkedMags <- Fox.combined(layers.map(l => datasetService.getPathsForDataLayer(dataset._id, l.name)))

    This change ensures that each lookup is executed concurrently according to Fox.combined’s signature.

  • Error Handling: While the core logic remains sound, please verify that comprehensive error handling is applied to all database operations to avoid unhandled failures.

Please incorporate these changes to resolve the concerns.

@fm3 fm3 left a comment (Member)

Works well, thanks for the setup script!

I like the verbose logging for this stuff, that way we can trace where anything was moved.

I also provoked some errors and I think the handling is ok. (I would prefer smaller tryo units over these big ones, to benefit more from error chaining, but it’s ok like this for the moment)

I added a few more code comments.
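Regarding the smaller tryo units, a rough sketch of what that could look like (hypothetical helper; it assumes Lift's Box combinators, which this code already uses via tryo/Full/Failure):

import java.nio.file.{Files, Path}
import net.liftweb.common.Box
import net.liftweb.util.Helpers.tryo

// Each file operation gets its own tryo plus a message, so a failure names the exact step
// instead of surfacing one large opaque block.
def relinkLayer(linkPath: Path, targetPath: Path): Box[Unit] =
  for {
    _ <- tryo(Files.deleteIfExists(linkPath)) ?~! s"Could not delete existing link at $linkPath"
    _ <- tryo(Files.createSymbolicLink(linkPath, targetPath)) ?~! s"Could not create symlink at $linkPath"
  } yield ()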

@@ -108,7 +114,8 @@ trait Formatter {
}

val serverTimeMsg = if (includeTime) s"[Server Time ${Instant.now}] " else ""
serverTimeMsg + formatOneFailure(failure) + formatStackTrace(failure) + formatNextChain(failure.chain)
serverTimeMsg + firstException(failure) + formatOneFailure(failure) + formatStackTrace(failure) + formatNextChain(
Member

what happens if we have includeStackTraces=true? Will the exception class be logged twice?

Member Author

Yes

"java.lang.Error: this is bad: problem Stack trace: java.lang.Error: this is bad
	at <init>(<console>:1)
	at <init>(<console>:6)
	at .<clinit>(<console>:1)
	at .lzycompute(<synthetic>:6)
	at .$print(<synthetic>:5)
	at $print(<synthetic>)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:678)
	at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1027)
	at scala...
>....

Should the firstException only be logged when includeStackTraces = false?

Member

I mean it’s not very important, but seems like this would be nicer, yes :)

// Get combinations of datasourceId, layerName that link to EVERY mag
val linkedToByAllMags = allLinkedDatasetLayers.reduce((a, b) => a.intersect(b))
if (allMagsLocal && linkedToByAllMags.nonEmpty) {
Some(linkedToByAllMags)
Member

How about encoding this with an empty sequence rather than the Option? Here we have a nonEmpty check and later we use .head.
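A small sketch of that encoding (reusing the names from the snippet above; the surrounding method is assumed):

// Returning an empty Seq instead of Option[Seq] lets the caller branch exactly once.
val linkedToByAllMags: Seq[(DataSourceId, String)] =
  if (allLinkedDatasetLayers.isEmpty) Seq.empty
  else allLinkedDatasetLayers.reduce(_ intersect _)
val fullyLinkedLayers = if (allMagsLocal) linkedToByAllMags else Seq.empty

fullyLinkedLayers.headOption match {
  case Some(target) => // move the layer to `target`, relink the rest
  case None         => // nothing fully linked, fall back to per-mag handling
}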

.resolve(dataSourceId.organizationId)
.resolve(dataSourceId.directoryName)
.resolve(layerName)
.resolve(magToDelete.mag.toMagLiteral(true))
Member

Should probably also try with allowScalar=false. If both variants don’t exist, throw exception.
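A rough sketch of that check (reusing dataBaseDir, dataSourceId, layerName, and magToDelete from the snippet above):

import java.nio.file.{Files, NoSuchFileException}

// Try both directory-name variants for the mag ("2" as well as "2-2-2") and fail loudly
// if neither exists on disk.
val layerPath = dataBaseDir
  .resolve(dataSourceId.organizationId)
  .resolve(dataSourceId.directoryName)
  .resolve(layerName)
val magPathCandidates = Seq(
  layerPath.resolve(magToDelete.mag.toMagLiteral(true)),  // allowScalar = true, e.g. "2"
  layerPath.resolve(magToDelete.mag.toMagLiteral(false))  // allowScalar = false, e.g. "2-2-2"
)
val magPath = magPathCandidates
  .find(Files.exists(_))
  .getOrElse(throw new NoSuchFileException(s"No mag directory found, tried: ${magPathCandidates.mkString(", ")}"))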

Member

Also this looks like duplication with the new getMagPath function

Member Author

Removed the duplication. The different possible mag paths make other parts more difficult as well; I will look into it.

Member Author

Handled in 14f8a82. This adds another dimension to test.

.resolve(layerName)
.resolve(magToDelete.mag.toMagLiteral(true))
// Select an arbitrary linked mag to move to
val target = magLinkInfo.linkedMags.head
Member

another .head which would lead to non-readable exception. maybe add nonEmpty assertion

Member Author

This is in a block that is already guarded by
if (magLinkInfo.linkedMags.nonEmpty) {

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (2)

184-184: Re-check fullLayerLinks emptiness in moveLayer for robustness.

As mentioned in a past review comment, the code assumes that fullLayerLinks is non-empty, but it would be more robust to verify this within the method.


123-141: ⚠️ Potential issue

Improve JSON parsing error handling.

The current implementation only handles the Full case explicitly when parsing JSON, but doesn't provide detailed error handling for parsing failures.

Add explicit error handling for JSON parsing failures:

        JsonHelper.validatedJsonFromFile[DataSource](propertiesPath, dataBaseDir) match {
          case Full(dataSource) =>
            val updatedDataSource = dataSource.copy(dataLayers = dataSource.dataLayers.map {
              case dl: DataLayerWithMagLocators =>
                if (dl.mags.forall(_.path.exists(_.startsWith(s"${DataVaultService.schemeFile}://")))) {
                  // Setting path to None means using resolution of layer/mag directories to access data
                  dl.mapped(magMapping = _.copy(path = None))
                } else {
                  dl
                }
              case dl => dl
            })
            // Write properties back
            tryo(Files.delete(propertiesPath)) match {
              case Full(_) => JsonHelper.jsonToFile(propertiesPath, updatedDataSource)
              case e       => e
            }
-          case _ => Full(())
+          case net.liftweb.common.Failure(msg, exception, _) =>
+            logger.error(s"JSON parsing failed for file $propertiesPath: $msg", exception)
+            Fox.failure(s"Failed to parse JSON from $propertiesPath: $msg")
+          case net.liftweb.common.Empty =>
+            logger.warn(s"Empty result when reading JSON from $propertiesPath")
+            Full(())
+          case unexpected =>
+            logger.error(s"Unexpected result type when reading JSON from $propertiesPath: $unexpected")
+            Full(())
        }
🧹 Nitpick comments (3)
util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (1)

91-96: Improved exception reporting but may cause duplication

The change to return exception.toString when stack traces are disabled is a good improvement as it provides more context in the logs. However, when includeStackTraces is true, the exception class and message will likely appear twice in the output: once from exception.toString and again at the beginning of the stack trace.

As discussed in the previous comments between fm3 and frcroth, this duplication could be avoided.

Consider this alternative implementation to avoid duplication:

case Full(exception) =>
  if (includeStackTraces)
    s" Stack trace: ${TextUtils.stackTraceAsString(exception)} "
  else
    exception.toString

You could also consider a more comprehensive refactor:

case Full(exception) =>
  if (includeStackTraces) {
    s" Stack trace: ${TextUtils.stackTraceAsString(exception)} "
  } else {
    exception.toString
  }
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (2)

44-59: Implementation of moveToTrash looks good with minor formatting issue.

The method implementation correctly handles both cases where the dataset exists and doesn't exist, with appropriate logging. The trash directory creation and path construction logic is sound.

I noticed a small formatting issue in the log message. Consider adding a space before "because" for better readability:

-          s"Deleting dataset by moving it from $dataSourcePath to $targetPath${reason.map(r => s"because $r").getOrElse("...")}")
+          s"Deleting dataset by moving it from $dataSourcePath to $targetPath${reason.map(r => s" because $r").getOrElse("...")}")

75-86: Improve error handling in moveSymlinks method.

The method effectively orchestrates symlink handling, but could benefit from more detailed error reporting.

Consider providing more specific error messages for individual layer failures:

-      exceptionBoxes = layersAndLinkedMags.map(layerMagLinkInfo =>
-        handleLayerSymlinks(dataSourceId, layerMagLinkInfo.layerName, layerMagLinkInfo.magLinkInfos.toList))
-      _ <- Fox.combined(exceptionBoxes.map(Fox.box2Fox)) ?~> "Failed to move symlinks"
+      exceptionBoxes = layersAndLinkedMags.map(layerMagLinkInfo => {
+        val result = handleLayerSymlinks(dataSourceId, layerMagLinkInfo.layerName, layerMagLinkInfo.magLinkInfos.toList)
+        result.map(_ => layerMagLinkInfo.layerName).getOrElse(s"Failed for layer: ${layerMagLinkInfo.layerName}")
+      })
+      _ <- Fox.combined(exceptionBoxes.map(name => 
+        if (name.startsWith("Failed")) Fox.failure(name) else Fox.successful(name)
+      )) ?~> "Failed to move symlinks"

Additionally, lines 83-84 could be more concisely written:

-      affectedDataSources = layersAndLinkedMags
-        .flatMap(_.magLinkInfos.map(m => m.linkedMags.map(_.dataSourceId)))
-        .flatten
+      affectedDataSources = layersAndLinkedMags
+        .flatMap(_.magLinkInfos.flatMap(m => m.linkedMags.map(_.dataSourceId)))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f6c5123 and 30f0d0f.

📒 Files selected for processing (2)
  • util/src/main/scala/com/scalableminds/util/mvc/Formatter.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (4)
util/src/main/scala/com/scalableminds/util/tools/JsonHelper.scala (2)
  • JsonHelper (18-139)
  • validatedJsonFromFile (31-34)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2)
  • DSRemoteWebknossosClient (56-195)
  • fetchPaths (123-129)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/requests/Cuboid.scala (1)
  • mag (46-46)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
  • mapped (329-403)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build
🔇 Additional comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (2)

65-69: Good use of for-comprehension for sequential operations.

The refactoring to use a for-comprehension makes the code more readable and maintainable. The sequence of operations ensures that symlinks are properly handled before deletion.


254-304: Good implementation of handleLayerSymlinks with comprehensive error handling.

The method effectively manages the logic for handling layer symlinks with excellent comments explaining edge cases. The tryo wrapper provides good top-level exception handling.

Comment on lines 169 to 252
  private def moveLayer(sourceDataSource: DataSourceId,
                        sourceLayer: String,
                        fullLayerLinks: Seq[(DataSourceId, String)],
                        layerMags: List[MagLinkInfo]): Unit = {
    // Move layer on disk
    val layerPath =
      dataBaseDir.resolve(sourceDataSource.organizationId).resolve(sourceDataSource.directoryName).resolve(sourceLayer)

    if (fullLayerLinks.isEmpty) {
      throw new IllegalArgumentException(
        s"Cannot move layer $sourceLayer from $sourceDataSource, no fully linked layers provided!")
    }

    // Select one of the fully linked layers as target to move layer to
    // Selection of the first one is arbitrary, is there anything to distinguish between them?
    val target = fullLayerLinks.head
    val moveToDataSource = target._1
    val moveToDataLayer = target._2
    val targetPath = dataBaseDir
      .resolve(moveToDataSource.organizationId)
      .resolve(moveToDataSource.directoryName)
      .resolve(moveToDataLayer)

    // Before deleting, check write permissions at targetPath
    if (!Files.isWritable(targetPath.getParent)) {
      throw new Exception(s"Cannot move layer $sourceLayer to $targetPath, no write permissions!")
    }

    logger.info(
      s"Found complete symlinks to layer; Moving layer $sourceLayer from $sourceDataSource to $moveToDataSource/$moveToDataLayer")
    if (Files.exists(targetPath) && Files.isSymbolicLink(targetPath)) {
      Files.delete(targetPath)
    }
    if (Files.exists(targetPath) && Files.isDirectory(targetPath)) {
      // This happens when the fully linked layer consists of mag symlinks. The directory exists and is full of symlinked mags.
      // We need to delete the directory before moving the layer.
      FileUtils.deleteDirectory(targetPath.toFile)
    }
    Files.move(layerPath, targetPath)

    // All symlinks are now broken, we need to recreate them
    // There may be more layers that are "fully linked", where we need to add only one symlink

    fullLayerLinks.tail.foreach { linkedLayer =>
      val linkedLayerPath =
        dataBaseDir.resolve(linkedLayer._1.organizationId).resolve(linkedLayer._1.directoryName).resolve(linkedLayer._2)
      // Before deleting, check write permissions at linkedLayerPath
      if (!Files.isWritable(linkedLayerPath.getParent)) {
        throw new Exception(s"Cannot move layer $sourceLayer to $targetPath, no write permissions!")
      }
      if (Files.exists(linkedLayerPath) || Files.isSymbolicLink(linkedLayerPath)) {
        // Two cases exist here: 1. The layer is a regular directory where each mag is a symlink
        // 2. The layer is a symlink to the other layer itself.
        // We can handle both by deleting the layer and creating a new symlink.
        Files.delete(linkedLayerPath)
        logger.info(
          s"Deleting existing symlink at $linkedLayerPath linking to $sourceDataSource/$sourceLayer, creating new symlink")
        Files.createSymbolicLink(linkedLayerPath, relativizeSymlinkPath(targetPath, linkedLayerPath))
      } else {
        if (!Files.exists(linkedLayerPath)) {
          // This happens when the layer is a locally explored dataset, where the path is directly written into the properties
          // and no layer directory actually exists.
          Files.createSymbolicLink(linkedLayerPath, relativizeSymlinkPath(targetPath, linkedLayerPath))
        } else {
          // This should not happen, since we got the info from WK that a layer exists here
          logger.warn(s"Trying to recreate symlink at layer $linkedLayerPath, but it does not exist!")
        }
      }
    }

    // For every mag that linked to this layer, we need to update the symlink
    // We need to discard the already handled mags (fully linked layers)

    layerMags.foreach { magLinkInfo =>
      val mag = magLinkInfo.mag
      val newMagPath = targetPath.resolve(mag.mag.toMagLiteral(true))
      magLinkInfo.linkedMags
        .filter(linkedMag => !fullLayerLinks.contains((linkedMag.dataSourceId, linkedMag.dataLayerName))) // Filter out mags that are fully linked layers, we already handled them
        .foreach { linkedMag =>
          updateMagSymlinks(newMagPath, linkedMag)
        }
    }

  }
Contributor

⚠️ Potential issue

Fix error message in moveLayer method.

The error message on line 217 references the wrong path variable.

Fix the error message to reference the correct path variable:

      // Before deleting, check write permissions at linkedLayerPath
      if (!Files.isWritable(linkedLayerPath.getParent)) {
-        throw new Exception(s"Cannot move layer $sourceLayer to $targetPath, no write permissions!")
+        throw new Exception(s"Cannot create symlink at $linkedLayerPath, no write permissions!")
      }

Also, consider adding additional error handling for file operations similar to the suggestions for updateMagSymlinks.

Additionally, consider checking if fullLayerLinks is empty at the beginning of the method call to avoid throwing exceptions inside the method:

  private def moveLayer(sourceDataSource: DataSourceId,
                        sourceLayer: String,
                        fullLayerLinks: Seq[(DataSourceId, String)],
                        layerMags: List[MagLinkInfo]): Unit = {
+    require(fullLayerLinks.nonEmpty, 
+      s"Cannot move layer $sourceLayer from $sourceDataSource, no fully linked layers provided!")
+      
    // Move layer on disk
    val layerPath =
      dataBaseDir.resolve(sourceDataSource.organizationId).resolve(sourceDataSource.directoryName).resolve(sourceLayer)

-    if (fullLayerLinks.isEmpty) {
-      throw new IllegalArgumentException(
-        s"Cannot move layer $sourceLayer from $sourceDataSource, no fully linked layers provided!")
-    }

    // Select one of the fully linked layers as target to move layer to
    // Selection of the first one is arbitrary, is there anything to distinguish between them?
    val target = fullLayerLinks.head

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (3)

222-224: ⚠️ Potential issue

Fix error message in moveLayer method.

The error message references the wrong path variable.

Fix the error message to reference the correct path variable:

      // Before deleting, check write permissions at linkedLayerPath
      if (!Files.isWritable(linkedLayerPath.getParent)) {
-        throw new Exception(s"Cannot move layer $sourceLayer to $targetPath, no write permissions!")
+        throw new Exception(s"Cannot create symlink at $linkedLayerPath, no write permissions!")
      }

88-98: ⚠️ Potential issue

Add guard against empty collections in getFullyLinkedLayers.

The method uses reduce which will throw an exception if allLinkedDatasetLayers is empty.

Add a guard to prevent exceptions with empty collections:

  private def getFullyLinkedLayers(linkedMags: List[MagLinkInfo]): Seq[(DataSourceId, String)] = {
    val allMagsLocal = linkedMags.forall(_.mag.hasLocalData)
    val allLinkedDatasetLayers = linkedMags.map(_.linkedMags.map(lm => (lm.dataSourceId, lm.dataLayerName)))
    // Get combinations of datasourceId, layerName that link to EVERY mag
-    val linkedToByAllMags = allLinkedDatasetLayers.reduce((a, b) => a.intersect(b))
+    val linkedToByAllMags = if (allLinkedDatasetLayers.isEmpty) Seq() 
+                           else allLinkedDatasetLayers.reduce((a, b) => a.intersect(b))
    if (allMagsLocal && linkedToByAllMags.nonEmpty) {
      linkedToByAllMags
    } else {
      Seq()
    }
  }

149-173: 🛠️ Refactor suggestion

Add exception handling to updateMagSymlinks.

This method performs file operations that could fail with exceptions, but doesn't have comprehensive error handling.

Wrap file operations in tryo blocks to handle potential exceptions:

  private def updateMagSymlinks(targetMagPath: Path, linkedMag: DataSourceMagInfo): Unit = {
    val linkedMagPaths = getPossibleMagPaths(dataBaseDir, linkedMag)
    // Before deleting, check write permissions at linkedMagPath
    if (!Files.isWritable(linkedMagPaths.head.getParent)) {
      throw new Exception(s"Cannot update symlink at ${linkedMagPaths.head}, no write permissions!")
    }
    val existingLinkedMagPath = linkedMagPaths.find(p => Files.exists(p) || Files.isSymbolicLink(p))

    existingLinkedMagPath match {
      case Some(linkedMagPath) =>
-        Files.delete(linkedMagPath)
+        tryo(Files.delete(linkedMagPath)).getOrElse {
+          logger.error(s"Failed to delete existing symlink at $linkedMagPath")
+          throw new Exception(s"Failed to delete existing symlink at $linkedMagPath")
+        }
        logger.info(s"Deleting symlink and recreating it at $linkedMagPath")
-        Files.createSymbolicLink(linkedMagPath, relativizeSymlinkPath(targetMagPath, linkedMagPath))
+        tryo(Files.createSymbolicLink(linkedMagPath, relativizeSymlinkPath(targetMagPath, linkedMagPath))).getOrElse {
+          logger.error(s"Failed to create symlink at $linkedMagPath pointing to $targetMagPath")
+          throw new Exception(s"Failed to create symlink at $linkedMagPath pointing to $targetMagPath")
+        }
      case None =>
        val linkedMagPath = linkedMagPaths.head
        if (!Files.exists(linkedMagPath) && linkedMag.path == linkedMag.realPath) {
          // This is the case for locally explored datasets
          // Since locally explored datasets are always fully linked layers when explored, this case can
          // only happen if one of the mags was manually edited in the properties file.
-          Files.createSymbolicLink(linkedMagPath, relativizeSymlinkPath(targetMagPath, linkedMagPath))
+          tryo(Files.createSymbolicLink(linkedMagPath, relativizeSymlinkPath(targetMagPath, linkedMagPath))).getOrElse {
+            logger.error(s"Failed to create symlink for locally explored dataset at $linkedMagPath pointing to $targetMagPath")
+            throw new Exception(s"Failed to create symlink for locally explored dataset at $linkedMagPath pointing to $targetMagPath")
+          }
        } else {
          logger.warn(s"Trying to recreate symlink at mag $linkedMagPath, but it does not exist!")
        }
    }
  }
🧹 Nitpick comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)

183-186: Consider moving the check for empty fullLayerLinks.

This check should ideally be at the beginning of the method to fail early.

Consider refactoring to check early in the method:

  private def moveLayer(sourceDataSource: DataSourceId,
                        sourceLayer: String,
                        fullLayerLinks: Seq[(DataSourceId, String)],
                        layerMags: List[MagLinkInfo]): Unit = {
+    if (fullLayerLinks.isEmpty) {
+      throw new IllegalArgumentException(
+        s"Cannot move layer $sourceLayer from $sourceDataSource, no fully linked layers provided!")
+    }
    // Move layer on disk
    val layerPath =
      dataBaseDir.resolve(sourceDataSource.organizationId).resolve(sourceDataSource.directoryName).resolve(sourceLayer)

-    if (fullLayerLinks.isEmpty) {
-      throw new IllegalArgumentException(
-        s"Cannot move layer $sourceLayer from $sourceDataSource, no fully linked layers provided!")
-    }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 30f0d0f and 1711604.

📒 Files selected for processing (4)
  • CHANGELOG.unreleased.md (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (0 hunks)
💤 Files with no reviewable changes (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala
🚧 Files skipped from review as they are similar to previous changes (2)
  • CHANGELOG.unreleased.md
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
🧰 Additional context used
🧬 Code Graph Analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (4)
util/src/main/scala/com/scalableminds/util/tools/JsonHelper.scala (3)
  • JsonHelper (18-139)
  • validatedJsonFromFile (31-34)
  • jsonToFile (20-25)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2)
  • DSRemoteWebknossosClient (56-195)
  • fetchPaths (123-129)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/requests/Cuboid.scala (1)
  • mag (46-46)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
  • mapped (329-403)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build
🔇 Additional comments (9)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (9)

2-10: Imports added to support symlink handling functionality.

The new imports support handling of JSON file operations, data source models, and remote WebKnossos client integration - all necessary for the symlink relocation feature.


44-59: Method extraction improves code organization.

Good refactoring to extract the moveToTrash functionality into a separate method. This improves readability and separates concerns appropriately. The logging is informative and the handling of non-existent datasets is graceful.


65-69: Added symlink handling before trashing datasets.

The code now properly handles symlinks before moving datasets to trash, ensuring that referenced data is preserved.


71-71: Abstract method for remote client access.

This abstract method provides a clean way for implementations to supply the remote client dependency.


75-86: Comprehensive symlink movement orchestration.

This method effectively orchestrates the symlink handling process by fetching paths, processing each layer's symlinks, and updating affected data sources. The error messages are clear and helpful for debugging.


100-104: Efficient path relativization for symlinks.

The method correctly calculates relative paths to ensure symlinks work properly across different directory structures.


106-113: Utility method for resolving mag paths.

The method handles both scalar and non-scalar mag path formats, providing flexibility for different mag representations.


265-322: Comprehensive error handling with tryo.

Good use of tryo to catch exceptions in the complex symlink handling process. The logic handles both fully linked layers and incomplete symlinks appropriately.


115-147:

❓ Verification inconclusive

Add error handling for JSON parsing failures.

The method handles errors when deleting files but doesn't provide detailed error logging for JSON parsing failures.

Enhance the error handling for JSON parsing:

  JsonHelper.validatedJsonFromFile[DataSource](propertiesPath, dataBaseDir) match {
    case Full(dataSource) =>
      // ... existing processing ...
+    case Failure(msg, exception, chain) =>
+      logger.error(s"JSON parsing failed for file $propertiesPath: $msg", exception)
+      Full(())  // Return empty success to continue with other files
    case _ => Full(())
  }

🏁 Script executed:

#!/bin/bash
# Check if there are any JSON related failures in the logs

# Look for JSON parsing errors in the logs
rg -i "json.*error|failed.*json|invalid.*json" --type log

Length of output: 208


Action: Enhance JSON Parsing Error Handling

We should extend error handling in the JSON parsing logic by explicitly matching on failure cases and logging detailed errors. Although our initial log search didn't show any JSON-related errors (likely due to filtering issues rather than a true absence of failures), it's important to proactively handle and log any JSON parsing issues. Please update the code as follows:

  JsonHelper.validatedJsonFromFile[DataSource](propertiesPath, dataBaseDir) match {
    case Full(dataSource) =>
      // ... existing processing ...
+    case Failure(msg, exception, chain) =>
+      logger.error(s"JSON parsing failed for file $propertiesPath: $msg", exception)
+      Full(())  // Continue processing other files gracefully
    case _ => Full(())
  }

Please manually verify that error handling works as expected for JSON parsing failures.

@frcroth frcroth requested a review from fm3 April 9, 2025 12:00
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (1)

224-226: ⚠️ Potential issue

Fix incorrect error message in moveLayer method

The error message references targetPath instead of linkedLayerPath, making the message confusing and inaccurate.

// Before deleting, check write permissions at linkedLayerPath
if (!Files.isWritable(linkedLayerPath.getParent)) {
-  throw new Exception(s"Cannot move layer $sourceLayer to $targetPath, no write permissions!")
+  throw new Exception(s"Cannot create symlink at $linkedLayerPath, no write permissions!")
}
🧹 Nitpick comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (2)

102-106: Optimize relativizeSymlinkPath method

The method can be simplified by removing the intermediate variable on line 105.

private def relativizeSymlinkPath(targetPath: Path, originPath: Path): Path = {
  val absoluteTargetPath = targetPath.toAbsolutePath
-  val relativeTargetPath = originPath.getParent.toAbsolutePath.relativize(absoluteTargetPath)
-  relativeTargetPath
+  originPath.getParent.toAbsolutePath.relativize(absoluteTargetPath)
}
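For illustration, what this relativization yields for a typical organization/dataset/layer layout (made-up paths):

import java.nio.file.Paths

val target = Paths.get("/srv/binaryData/sample_org/ds1/color") // where the data now lives
val origin = Paths.get("/srv/binaryData/sample_org/ds2/color") // where the symlink is created
val relative = origin.getParent.toAbsolutePath.relativize(target.toAbsolutePath)
// relative == "../ds1/color", so the link stays valid even if the base directory is mounted at a different absolute path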

177-265: Consider refactoring moveLayer method for better maintainability

The moveLayer method is complex with multiple responsibilities. Consider breaking it down into smaller, focused methods to improve readability and maintainability.

The method could be split into:

  1. A method to validate preconditions (lines 185-203)
  2. A method to move the layer (lines 204-216)
  3. A method to update symlinks for linked layers (lines 217-245)
  4. A method to update symlinks for linked mags (lines 246-263)

This would make the code easier to understand, test, and maintain.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1711604 and 6f7e6bd.

📒 Files selected for processing (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (3)
util/src/main/scala/com/scalableminds/util/tools/JsonHelper.scala (2)
  • JsonHelper (18-139)
  • validatedJsonFromFile (31-34)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2)
  • DSRemoteWebknossosClient (56-195)
  • fetchPaths (123-129)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
  • mapped (329-403)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: circleci_build
🔇 Additional comments (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (6)

44-59: LGTM! Clean implementation of the moveToTrash method.

The method correctly handles both cases where the dataset exists and doesn't exist, properly creating the trash directory only when needed and providing clear logging.


71-71: Add abstract method for remoteWebknossosClient

Good addition of an abstract method that allows implementations to provide their own client instance. This creates a clean dependency injection point.


75-86: Well-structured method for handling symlinks during deletion

The method effectively orchestrates the symlink update process by fetching paths, handling layer symlinks, and updating datasource properties. The error handling is appropriate with the use of the ?~> operator to provide clear error messages.


88-100: Improved getFullyLinkedLayers with empty collection handling

The method now safely handles empty collections before applying reduce, addressing the previous review comment. The logic for determining fully linked layers is clear and well-commented.


108-115: LGTM! Comprehensive mag path generation

The method correctly handles both scalar and non-scalar variants of the mag path, addressing a previous review comment.


267-324: Good use of error handling with tryo block

This method properly uses a tryo block to handle exceptions that might occur during symlink manipulation, addressing previous review comments about error handling.

Comment on lines +151 to +175
  private def updateMagSymlinks(targetMagPath: Path, linkedMag: DataSourceMagInfo): Unit = {
    val linkedMagPaths = getPossibleMagPaths(dataBaseDir, linkedMag)
    // Before deleting, check write permissions at linkedMagPath
    if (!Files.isWritable(linkedMagPaths.head.getParent)) {
      throw new Exception(s"Cannot update symlink at ${linkedMagPaths.head}, no write permissions!")
    }
    val existingLinkedMagPath = linkedMagPaths.find(p => Files.exists(p) || Files.isSymbolicLink(p))

    existingLinkedMagPath match {
      case Some(linkedMagPath) =>
        Files.delete(linkedMagPath)
        logger.info(s"Deleting symlink and recreating it at $linkedMagPath")
        Files.createSymbolicLink(linkedMagPath, relativizeSymlinkPath(targetMagPath, linkedMagPath))
      case None =>
        val linkedMagPath = linkedMagPaths.head
        if (!Files.exists(linkedMagPath) && linkedMag.path == linkedMag.realPath) {
          // This is the case for locally explored datasets
          // Since locally explored datasets are always fully linked layers when explored, this case can
          // only happen if one of the mags was manually edited in the properties file.
          Files.createSymbolicLink(linkedMagPath, relativizeSymlinkPath(targetMagPath, linkedMagPath))
        } else {
          logger.warn(s"Trying to recreate symlink at mag $linkedMagPath, but it does not exist!")
        }
    }
  }
Contributor

⚠️ Potential issue

Add error handling to file operations in updateMagSymlinks

The method performs file operations without try-catch blocks or error handling, which could lead to unhandled exceptions.

existingLinkedMagPath match {
  case Some(linkedMagPath) =>
-    Files.delete(linkedMagPath)
+    tryo(Files.delete(linkedMagPath)).getOrElse {
+      logger.error(s"Failed to delete existing symlink at $linkedMagPath")
+      throw new Exception(s"Failed to delete existing symlink at $linkedMagPath")
+    }
    logger.info(s"Deleting symlink and recreating it at $linkedMagPath")
-    Files.createSymbolicLink(linkedMagPath, relativizeSymlinkPath(targetMagPath, linkedMagPath))
+    tryo(Files.createSymbolicLink(linkedMagPath, relativizeSymlinkPath(targetMagPath, linkedMagPath))).getOrElse {
+      logger.error(s"Failed to create symlink at $linkedMagPath pointing to $targetMagPath")
+      throw new Exception(s"Failed to create symlink at $linkedMagPath pointing to $targetMagPath")
+    }
  case None =>
    val linkedMagPath = linkedMagPaths.head
    if (!Files.exists(linkedMagPath) && linkedMag.path == linkedMag.realPath) {
      // This is the case for locally explored datasets
-      Files.createSymbolicLink(linkedMagPath, relativizeSymlinkPath(targetMagPath, linkedMagPath))
+      tryo(Files.createSymbolicLink(linkedMagPath, relativizeSymlinkPath(targetMagPath, linkedMagPath))).getOrElse {
+        logger.error(s"Failed to create symlink for locally explored dataset at $linkedMagPath pointing to $targetMagPath")
+        throw new Exception(s"Failed to create symlink for locally explored dataset at $linkedMagPath pointing to $targetMagPath")
+      }
    } else {
      logger.warn(s"Trying to recreate symlink at mag $linkedMagPath, but it does not exist!")
    }
}

Comment on lines +117 to +149
  private def updateDatasourceProperties(dataSourceIds: List[DataSourceId])(
      implicit ec: ExecutionContext): Fox[List[Unit]] =
    // We need to update locally explored datasets, since they now may have symlinks where previously they only had the
    // path property set.
    Fox.serialCombined(dataSourceIds)(dataSourceId => {
      val propertiesPath = dataBaseDir
        .resolve(dataSourceId.organizationId)
        .resolve(dataSourceId.directoryName)
        .resolve(GenericDataSource.FILENAME_DATASOURCE_PROPERTIES_JSON)
      if (Files.exists(propertiesPath)) {
        JsonHelper.validatedJsonFromFile[DataSource](propertiesPath, dataBaseDir) match {
          case Full(dataSource) =>
            val updatedDataSource = dataSource.copy(dataLayers = dataSource.dataLayers.map {
              case dl: DataLayerWithMagLocators =>
                if (dl.mags.forall(_.path.exists(_.startsWith(s"${DataVaultService.schemeFile}://")))) {
                  // Setting path to None means using resolution of layer/mag directories to access data
                  dl.mapped(magMapping = _.copy(path = None))
                } else {
                  dl
                }
              case dl => dl
            })
            // Write properties back
            tryo(Files.delete(propertiesPath)) match {
              case Full(_) => JsonHelper.jsonToFile(propertiesPath, updatedDataSource)
              case e       => e
            }
          case _ => Full(())
        }
      } else {
        Full(())
      }
    })
Contributor

⚠️ Potential issue

Enhance JSON parsing error handling

The method only handles successful JSON parsing (Full case) and file deletion errors but doesn't provide detailed handling for JSON parsing failures.

JsonHelper.validatedJsonFromFile[DataSource](propertiesPath, dataBaseDir) match {
  case Full(dataSource) =>
    val updatedDataSource = dataSource.copy(dataLayers = dataSource.dataLayers.map {
      case dl: DataLayerWithMagLocators =>
        if (dl.mags.forall(_.path.exists(_.startsWith(s"${DataVaultService.schemeFile}://")))) {
          // Setting path to None means using resolution of layer/mag directories to access data
          dl.mapped(magMapping = _.copy(path = None))
        } else {
          dl
        }
      case dl => dl
    })
    // Write properties back
    tryo(Files.delete(propertiesPath)) match {
      case Full(_) => JsonHelper.jsonToFile(propertiesPath, updatedDataSource)
      case e       => e
    }
+  case Failure(msg, exception, chain) =>
+    logger.error(s"JSON parsing failed for file $propertiesPath: $msg", exception)
+    Full(())
  case _ => 
+    logger.warn(s"Unexpected result when parsing JSON from $propertiesPath")
    Full(())
}
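One thing to double-check with this pattern (the earlier version of this suggestion spells it out): Failure here needs to be Lift's net.liftweb.common.Failure, not scala.util.Failure. A standalone illustration of the three Box cases:

import net.liftweb.common.{Empty, Failure, Full}
import net.liftweb.util.Helpers.tryo

val parsed = tryo("not-a-number".toInt) // yields a Failure wrapping the NumberFormatException
parsed match {
  case Full(value)                => println(s"parsed: $value")
  case Failure(msg, exception, _) => println(s"failed: $msg (${exception.map(_.toString).openOr("no exception")})")
  case Empty                      => println("no result")
}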

Contributor
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
app/controllers/WKRemoteDataStoreController.scala (1)

244-259: 🛠️ Refactor suggestion

Add documentation and consider optimizing the getPaths method.

The implementation retrieves path information for data layers correctly, but lacks documentation and could benefit from performance optimization.

Consider adding method documentation and implementing parallel processing:

+/**
+ * Retrieves paths for all layers in a dataset.
+ * Used to support layer symlink management when deleting layers.
+ *
+ * @param name Datastore name
+ * @param key Datastore access key
+ * @param organizationId ID of the organization owning the dataset
+ * @param directoryName Directory name of the dataset
+ * @return JSON response with LayerMagLinkInfo objects containing layer paths
+ */
 def getPaths(name: String, key: String, organizationId: String, directoryName: String): Action[AnyContent] =
   Action.async { implicit request =>
     dataStoreService.validateAccess(name, key) { _ =>
       for {
         organization <- organizationDAO.findOne(organizationId)(GlobalAccessContext)
         dataset <- datasetDAO.findOneByDirectoryNameAndOrganization(directoryName, organization._id)(
           GlobalAccessContext)
         layers <- datasetLayerDAO.findAllForDataset(dataset._id)
-        magsAndLinkedMags <- Fox.serialCombined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))
+        magsAndLinkedMags <- Fox.combined(layers.map(l => datasetService.getPathsForDataLayer(dataset._id, l.name)))
         magLinkInfos = magsAndLinkedMags.map(_.map { case (mag, linkedMags) => MagLinkInfo(mag, linkedMags) })
         layersAndMagLinkInfos = layers.zip(magLinkInfos).map {
           case (layer, magLinkInfo) => LayerMagLinkInfo(layer.name, magLinkInfo)
         }
       } yield Ok(Json.toJson(layersAndMagLinkInfos))
     }
   }

This change:

  1. Adds proper documentation explaining the endpoint's purpose
  2. Changes from sequential Fox.serialCombined to parallel Fox.combined for better performance with multiple layers (see the sketch after this list)
  3. Uses the correct mapping pattern required for Fox.combined as noted in the previous review
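To make point 2 concrete, here is a minimal sketch of the two call shapes; `layers`, `processLayer`, and `PathInfo` are illustrative stand-ins, and only the `Fox.serialCombined`/`Fox.combined` signatures are taken from the diff above.

// A minimal sketch; `layers`, `processLayer`, and `PathInfo` are hypothetical stand-ins.
// Sequential: serialCombined builds and awaits one Fox per element, in order.
val sequentialResult: Fox[List[PathInfo]] =
  Fox.serialCombined(layers)(layer => processLayer(layer))
// Concurrent: the Fox instances are created eagerly via map and then combined,
// so their underlying Futures can run at the same time.
val concurrentResult: Fox[List[PathInfo]] =
  Fox.combined(layers.map(layer => processLayer(layer)))

The trade-off is that serialCombined preserves the order of side effects, while combined trades that for throughput; for a read-only lookup per layer the concurrent variant is the safer performance win.
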
🧹 Nitpick comments (2)
CHANGELOG.unreleased.md (2)

21-21: Clarify the relocation description in the deletion entry.
The changelog entry on line 21 clearly indicates the new behavior for dataset/layer deletion. However, the phrase “moved there” could be ambiguous. It might be clearer to state “moved to the referencing dataset(s)” to specify the destination of the relocated layers.


29-29: Refine the phrasing for the fixed issue entry.
On line 29, the current wording “Fixed that it was possible to trigger the find largest segment id job on layers…” is a bit awkward. Consider rephrasing it to:
“Fixed an issue where the find largest segment id job could be triggered on layers that are not stored as segmentation layers.”
This wording improves clarity and readability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6f7e6bd and d808612.

📒 Files selected for processing (5)
  • CHANGELOG.unreleased.md (1 hunks)
  • app/controllers/WKRemoteDataStoreController.scala (4 hunks)
  • app/models/dataset/DatasetService.scala (2 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1 hunks)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/uploading/UploadService.scala
🧰 Additional context used
🧬 Code Graph Analysis (2)
app/models/dataset/DatasetService.scala (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/MagLinkInfo.scala (2)
  • DataSourceMagInfo (7-12)
  • DataSourceMagInfo (14-16)
app/models/dataset/Dataset.scala (3)
  • findPathsForDatasetAndDatalayer (808-818)
  • findAllByRealPath (818-827)
  • dataSourceId (89-89)
app/controllers/WKRemoteDataStoreController.scala (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/MagLinkInfo.scala (4)
  • LayerMagLinkInfo (24-24)
  • LayerMagLinkInfo (26-28)
  • MagLinkInfo (18-18)
  • MagLinkInfo (20-22)
app/models/dataset/Dataset.scala (3)
  • findOne (200-207)
  • findOneByDirectoryNameAndOrganization (408-421)
  • findAllForDataset (890-900)
app/models/dataset/DatasetService.scala (1)
  • getPathsForDataLayer (371-386)
🪛 LanguageTool
CHANGELOG.unreleased.md

[grammar] ~22-~22: The usual collocation for “returned” is “to”, not “in”.
Context: ...new dataset's id and directory name are returned in the response. [#8476](https://github.co...

(RETURN_IN_THE)


[duplication] ~27-~27: Possible typo: you repeated a word.
Context: ...calableminds/webknossos/pull/8147) ### Fixed - Fixed the alignment of the button that allows...

(ENGLISH_WORD_REPEAT_RULE)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: frontend-tests
  • GitHub Check: backend-tests
  • GitHub Check: build-test-e2e-deploy
🔇 Additional comments (7)
app/models/dataset/DatasetService.scala (4)

7-7: Verified import of DataSourceMagInfo.

The import for DataSourceMagInfo correctly supports the type change in the getPathsForDataLayer method.


364-370: Excellent documentation addressing previous review feedback.

The added documentation explains the function's purpose and return value structure clearly, which addresses the previous comment about explaining the nested list in the return type.


371-372: Type change from DatasetMagInfo to DataSourceMagInfo.

The signature update correctly reflects the transition to the new DataSourceMagInfo type that's used throughout the codebase for better handling of data sources and their relationships.


380-380: Improved filtering logic for path information.

The updated filter condition is more precise, filtering by dataSourceId instead of the entire object. This change better handles the symlink resolution required for layer deletion operations.
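Roughly, the filter now has the following shape (variable names are illustrative, not the method's actual code):

// Compare only the stable identifier instead of whole-object equality, so mags that
// differ in other fields still count as belonging to the dataset being deleted.
val magsInOtherDatasets = allMagInfos.filter(_.dataSourceId != dataSourceIdToDelete)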

app/controllers/WKRemoteDataStoreController.scala (3)

8-8: Verified import for LayerMagLinkInfo and MagLinkInfo.

The import statement correctly includes the classes needed for the new getPaths endpoint.


51-51: Added datasetLayerDAO to constructor.

The constructor now includes the necessary dependency for fetching layer data in the new getPaths method.


244-259:

❓ Verification inconclusive

Add comprehensive tests for this critical endpoint.

This endpoint is crucial for managing layer symlinks during deletion operations, but appears to lack dedicated tests.

Please verify test coverage by running:

Consider adding tests that verify:

  1. Access validation works correctly
  2. Proper retrieval of layers and mag infos
  3. Correct formatting of the response
  4. Error handling for missing datasets/orgs

🏁 Script executed:

#!/bin/bash
# Search for tests that verify the getPaths endpoint
echo "Searching for existing tests for getPaths endpoint..."
rg -A 3 -B 3 "getPaths" --type scala test/
rg -A 3 -B 3 "LayerMagLinkInfo" --type scala test/

echo "Checking for tests of the underlying getPathsForDataLayer method..."
rg -A 3 -B 3 "getPathsForDataLayer" --type scala test/

Length of output: 410


Critical: No automated tests detected for the getPaths endpoint in WKRemoteDataStoreController.scala

The current verification indicates that there are no tests covering this endpoint. Given its crucial role in managing layer symlinks during deletion operations, please add comprehensive tests that ensure:

  • Access Control: Verify that access validation works correctly with valid and invalid credentials.
  • Data Retrieval: Ensure that layers and their associated mag info are retrieved accurately.
  • Response Formatting: Confirm that the JSON response is formatted as expected.
  • Error Handling: Properly handle scenarios when the organization, dataset, or corresponding data layers are missing.

Since the initial search did not yield any test results, please manually verify the absence of tests and consider adding the necessary ones.

Member
@fm3 fm3 left a comment

Alright let’s go :) Please wait with merging until the current CI problems are resolved, hopefully today.

Contributor
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (3)

177-269: Fix error message in moveLayer method.

The error message on line 225 references the wrong path variable.

// Before deleting, check write permissions at linkedLayerPath
if (!Files.isWritable(linkedLayerPath.getParent)) {
-  throw new Exception(s"Cannot move layer $sourceLayer to $targetPath, no write permissions!")
+  throw new Exception(s"Cannot create symlink at $linkedLayerPath, no write permissions!")
}

Additionally, the moveLayer method is quite complex and handles many edge cases. While it's comprehensive, it would benefit from:

  1. Adding error handling for file operations similar to the suggestions for updateMagSymlinks (a minimal helper sketch follows this list)
  2. Adding more granular logging for better traceability
  3. Breaking up the method into smaller functions with clear responsibilities
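A minimal sketch of the first point, assuming Lift's `tryo` and `Box` error chaining as used elsewhere in this file; the helper name and messages are invented for illustration, not part of the PR:

import java.nio.file.{Files, Path}
import net.liftweb.common.Box
import net.liftweb.util.Helpers.tryo

// Hypothetical helper: wrap a single file operation in tryo and attach a
// descriptive message, so callers can sequence Boxes instead of throwing.
private def safeFileOp[T](description: String)(op: => T): Box[T] =
  tryo(op) ?~! s"File operation failed: $description"

// Usage sketch inside moveLayer-style code:
// for {
//   _ <- safeFileOp(s"delete symlink $linkedLayerPath")(Files.delete(linkedLayerPath))
//   _ <- safeFileOp(s"create symlink $linkedLayerPath -> $targetPath")(
//          Files.createSymbolicLink(linkedLayerPath, relativizeSymlinkPath(targetPath, linkedLayerPath)))
// } yield ()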

151-175: ⚠️ Potential issue

Add error handling to file operations in updateMagSymlinks.

The method performs file operations without proper error handling, which could lead to unhandled exceptions.

existingLinkedMagPath match {
  case Some(linkedMagPath) =>
-    Files.delete(linkedMagPath)
+    tryo(Files.delete(linkedMagPath)).getOrElse {
+      logger.error(s"Failed to delete existing symlink at $linkedMagPath")
+      throw new Exception(s"Failed to delete existing symlink at $linkedMagPath")
+    }
    logger.info(s"Deleting symlink and recreating it at $linkedMagPath")
-    Files.createSymbolicLink(linkedMagPath, relativizeSymlinkPath(targetMagPath, linkedMagPath))
+    tryo(Files.createSymbolicLink(linkedMagPath, relativizeSymlinkPath(targetMagPath, linkedMagPath))).getOrElse {
+      logger.error(s"Failed to create symlink at $linkedMagPath pointing to $targetMagPath")
+      throw new Exception(s"Failed to create symlink at $linkedMagPath pointing to $targetMagPath")
+    }
  case None =>
    val linkedMagPath = linkedMagPaths.head
    if (!Files.exists(linkedMagPath) && linkedMag.path == linkedMag.realPath) {
      // This is the case for locally explored datasets
-      Files.createSymbolicLink(linkedMagPath, relativizeSymlinkPath(targetMagPath, linkedMagPath))
+      tryo(Files.createSymbolicLink(linkedMagPath, relativizeSymlinkPath(targetMagPath, linkedMagPath))).getOrElse {
+        logger.error(s"Failed to create symlink for locally explored dataset at $linkedMagPath pointing to $targetMagPath")
+        throw new Exception(s"Failed to create symlink for locally explored dataset at $linkedMagPath pointing to $targetMagPath")
+      }
    } else {
      logger.warn(s"Trying to recreate symlink at mag $linkedMagPath, but it does not exist!")
    }
}

117-149: ⚠️ Potential issue

Enhance JSON parsing error handling in updateDatasourceProperties.

The method needs more specific error handling for JSON parsing failures, as noted in a previous review.

JsonHelper.validatedJsonFromFile[DataSource](propertiesPath, dataBaseDir) match {
  case Full(dataSource) =>
    val updatedDataSource = dataSource.copy(dataLayers = dataSource.dataLayers.map {
      case dl: DataLayerWithMagLocators =>
        if (dl.mags.forall(_.path.exists(_.startsWith(s"${DataVaultService.schemeFile}://")))) {
          // Setting path to None means using resolution of layer/mag directories to access data
          dl.mapped(magMapping = _.copy(path = None))
        } else {
          dl
        }
      case dl => dl
    })
    // Write properties back
    tryo(Files.delete(propertiesPath)) match {
      case Full(_) => JsonHelper.jsonToFile(propertiesPath, updatedDataSource)
      case e       => e
    }
+  case Failure(msg, exception, chain) =>
+    logger.error(s"JSON parsing failed for file $propertiesPath: $msg", exception)
+    Full(())
-  case _ => Full(())
+  case _ => 
+    logger.warn(s"Unexpected result when parsing JSON from $propertiesPath")
+    Full(())
}
🧹 Nitpick comments (2)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (2)

102-106: The relativizeSymlinkPath method is correct but could be simplified.

The method correctly relativizes paths but could be more concise.

private def relativizeSymlinkPath(targetPath: Path, originPath: Path): Path = {
-  val absoluteTargetPath = targetPath.toAbsolutePath
-  val relativeTargetPath = originPath.getParent.toAbsolutePath.relativize(absoluteTargetPath)
-  relativeTargetPath
+  originPath.getParent.toAbsolutePath.relativize(targetPath.toAbsolutePath)
}

271-328: Comprehensive error handling in handleLayerSymlinks.

Good use of tryo to wrap the entire operation in error handling. However, consider:

  1. Add more targeted error handling for specific operations rather than wrapping the entire block
  2. Add more detailed logging at key decision points to aid in debugging
  3. Consider restructuring the deeply nested if/else blocks for better readability

For example:

private def handleLayerSymlinks(dataSourceId: DataSourceId,
                                layerName: String,
                                linkedMags: List[MagLinkInfo]): Box[Unit] =
  tryo {
    val fullyLinkedLayers = getFullyLinkedLayers(linkedMags)
+    logger.debug(s"Found ${fullyLinkedLayers.size} fully linked layers for $dataSourceId/$layerName")
    if (fullyLinkedLayers.nonEmpty) {
      moveLayer(dataSourceId, layerName, fullyLinkedLayers, linkedMags)
    } else {
      logger.info(s"Found incomplete symlinks to layer; Moving mags from $dataSourceId to other datasets")
      linkedMags.foreach { magLinkInfo =>
        // Consider extracting this block to a separate private method
        val magToDelete = magLinkInfo.mag
        if (magLinkInfo.linkedMags.nonEmpty) {
          if (magToDelete.hasLocalData) {
            // ... existing code ...
          } else {
            // ... existing code ...
          }
        }
      }
    }
  }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d808612 and 4776358.

📒 Files selected for processing (1)
  • webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (3)
util/src/main/scala/com/scalableminds/util/tools/JsonHelper.scala (2)
  • JsonHelper (18-139)
  • validatedJsonFromFile (31-34)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DSRemoteWebknossosClient.scala (2)
  • DSRemoteWebknossosClient (56-195)
  • fetchPaths (123-129)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1)
  • mapped (329-403)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build-test-e2e-deploy
  • GitHub Check: backend-tests
  • GitHub Check: frontend-tests
🔇 Additional comments (6)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/DatasetDeleter.scala (6)

2-9: Import organization looks good.

The new imports are well-organized and appropriately grouped, supporting the new functionality for symlink handling and JSON operations.


44-69: Implementation of moveToTrash is clean and effective.

The moveToTrash method effectively isolates the trash directory creation and dataset relocation logic, providing better separation of concerns. The method appropriately handles the case when the dataset doesn't exist by logging and continuing successfully.
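For readers without the diff open, a simplified sketch of the behaviour described here; the trash location, naming scheme, and signature are assumptions, not the PR's actual code:

import java.nio.file.{Files, Path}

// Sketch: move the dataset directory into a trash folder; if it does not exist on
// disk, log and treat the deletion as already done.
def moveToTrashSketch(dataBaseDir: Path, organizationId: String, datasetName: String): Unit = {
  val datasetDir = dataBaseDir.resolve(organizationId).resolve(datasetName)
  if (Files.exists(datasetDir)) {
    val trashDir = dataBaseDir.resolve(".trash").resolve(organizationId)
    Files.createDirectories(trashDir)
    // Timestamp suffix so repeated deletions of a same-named dataset do not collide.
    Files.move(datasetDir, trashDir.resolve(s"$datasetName-${System.currentTimeMillis()}"))
  } else {
    println(s"Dataset $organizationId/$datasetName has no local directory, nothing to move to trash")
  }
}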


65-69: Good use of for-comprehension for sequential operations.

Using a for-comprehension here nicely sequences the symlink handling before dataset deletion, ensuring referenced data is preserved. The error message for failing to remake symlinks is clear and helpful.
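In outline (helper names follow the surrounding discussion; the trait's real code may differ):

// Sequencing sketch: symlinks are rewritten first, so the data they point to
// is preserved before the dataset directory itself is moved to trash.
for {
  _ <- moveSymlinks(organizationId, datasetName) ?~> "Failed to remake symlinks of referencing datasets"
  _ <- moveToTrash(organizationId, datasetName)
} yield ()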


71-71: Abstract method for remote client is appropriate.

Adding the abstract remoteWebknossosClient method to the trait allows for dependency injection while maintaining encapsulation, which is a good design choice.
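The pattern amounts to the following shape (simplified; apart from remoteWebknossosClient and DSRemoteWebknossosClient, the names here are hypothetical):

import javax.inject.Inject

// The trait only declares the dependency ...
trait DatasetDeleterLike {
  def remoteWebknossosClient: DSRemoteWebknossosClient
}

// ... and the concrete, injected service provides it.
class SomeDatastoreService @Inject()(val remoteWebknossosClient: DSRemoteWebknossosClient)
    extends DatasetDeleterLike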


88-100: Good enhancement of getFullyLinkedLayers to handle empty collections.

The method correctly checks for empty collections before using reduce(), which would otherwise throw an exception. This was previously identified in a review and has been properly addressed.
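The underlying pattern, as a simplified stand-in (not the method's real body): using reduceOption, or an explicit isEmpty check, avoids the UnsupportedOperationException that reduce throws on an empty collection.

// Find data sources whose layers link to every mag of the layer being deleted:
// intersect the per-mag sets of linking data sources, tolerating the empty case.
def fullyLinkingDataSources(linkedMags: List[MagLinkInfo]): Set[DataSourceId] =
  linkedMags
    .map(_.linkedMags.map(_.dataSourceId).toSet)
    .reduceOption(_ intersect _)
    .getOrElse(Set.empty)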


108-115: Comprehensive approach to handle multiple possible mag paths.

The method considers both scalar and non-scalar mag paths, which is a robust approach to handling different possible mag configurations.
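Scalar versus non-scalar here presumably refers to the two directory naming conventions for mags, e.g. "2" versus "2-2-2"; a sketch of the lookup idea (signature and names are illustrative, not the PR's code):

import java.nio.file.{Files, Path}

// Try both possible directory names for a mag and return whichever exists.
def findMagDir(layerDir: Path, x: Int, y: Int, z: Int): Option[Path] = {
  val fullName = s"$x-$y-$z"                                   // e.g. "2-2-2"
  val scalarName = if (x == y && y == z) Some(s"$x") else None // e.g. "2", only if isotropic
  (scalarName.toSeq :+ fullName).map(layerDir.resolve).find(Files.exists(_))
}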

Comment on lines +75 to +86
private def moveSymlinks(organizationId: String, datasetName: String)(implicit ec: ExecutionContext) =
for {
dataSourceId <- Fox.successful(DataSourceId(datasetName, organizationId))
layersAndLinkedMags <- remoteWebknossosClient.fetchPaths(dataSourceId)
exceptionBoxes = layersAndLinkedMags.map(layerMagLinkInfo =>
handleLayerSymlinks(dataSourceId, layerMagLinkInfo.layerName, layerMagLinkInfo.magLinkInfos.toList))
_ <- Fox.combined(exceptionBoxes.map(Fox.box2Fox)) ?~> "Failed to move symlinks"
affectedDataSources = layersAndLinkedMags
.flatMap(_.magLinkInfos.map(m => m.linkedMags.map(_.dataSourceId)))
.flatten
_ <- updateDatasourceProperties(affectedDataSources)
} yield ()

🛠️ Refactor suggestion

The moveSymlinks implementation is comprehensive but needs error handling enhancement.

While the implementation correctly fetches and processes linked mags, there's room for improvement in error handling, particularly for file operations.

Consider enhancing the error handling by:

  1. Using more descriptive error messages in the Fox failures to help with debugging
  2. Adding logging when operations succeed to improve traceability
private def moveSymlinks(organizationId: String, datasetName: String)(implicit ec: ExecutionContext) =
  for {
    dataSourceId <- Fox.successful(DataSourceId(datasetName, organizationId))
    layersAndLinkedMags <- remoteWebknossosClient.fetchPaths(dataSourceId)
+    _ <- Fox.successful(logger.info(s"Found ${layersAndLinkedMags.size} layers with linked mags to process for $dataSourceId"))
    exceptionBoxes = layersAndLinkedMags.map(layerMagLinkInfo =>
      handleLayerSymlinks(dataSourceId, layerMagLinkInfo.layerName, layerMagLinkInfo.magLinkInfos.toList))
-    _ <- Fox.combined(exceptionBoxes.map(Fox.box2Fox)) ?~> "Failed to move symlinks"
+    _ <- Fox.combined(exceptionBoxes.map(Fox.box2Fox)) ?~> s"Failed to move symlinks for dataset $datasetName in organization $organizationId"
    affectedDataSources = layersAndLinkedMags
      .flatMap(_.magLinkInfos.map(m => m.linkedMags.map(_.dataSourceId)))
      .flatten
+    _ <- Fox.successful(logger.info(s"Updating datasource properties for ${affectedDataSources.size} affected datasources"))
    _ <- updateDatasourceProperties(affectedDataSources)
  } yield ()
📝 Committable suggestion

Suggested change
private def moveSymlinks(organizationId: String, datasetName: String)(implicit ec: ExecutionContext) =
for {
dataSourceId <- Fox.successful(DataSourceId(datasetName, organizationId))
layersAndLinkedMags <- remoteWebknossosClient.fetchPaths(dataSourceId)
_ <- Fox.successful(logger.info(s"Found ${layersAndLinkedMags.size} layers with linked mags to process for $dataSourceId"))
exceptionBoxes = layersAndLinkedMags.map(layerMagLinkInfo =>
handleLayerSymlinks(dataSourceId, layerMagLinkInfo.layerName, layerMagLinkInfo.magLinkInfos.toList))
_ <- Fox.combined(exceptionBoxes.map(Fox.box2Fox)) ?~> s"Failed to move symlinks for dataset $datasetName in organization $organizationId"
affectedDataSources = layersAndLinkedMags
.flatMap(_.magLinkInfos.map(m => m.linkedMags.map(_.dataSourceId)))
.flatten
_ <- Fox.successful(logger.info(s"Updating datasource properties for ${affectedDataSources.size} affected datasources"))
_ <- updateDatasourceProperties(affectedDataSources)
} yield ()

@frcroth frcroth merged commit 56d6e6e into master Apr 14, 2025
5 checks passed
@frcroth frcroth deleted the move-symlinks-on-dataset-delete branch April 14, 2025 11:45
Successfully merging this pull request may close these issues.

Delete layers that are referenced by other datasets