
When deleting layers, move referenced data #8437


Merged · 25 commits · Apr 14, 2025

Commits
2552b5c
Handle simplest case, one layer with symlink to deleted layer
frcroth Mar 10, 2025
7edc95a
Handle other layers that also link to removed dataset
frcroth Mar 12, 2025
9f8b0fe
Move and relink mags
frcroth Mar 12, 2025
b718bed
Rewrite datasource properties for locally explored layers
frcroth Mar 17, 2025
2be93a5
Fix handling of mag, update changelog
frcroth Mar 19, 2025
2437fc3
Merge branch 'master' into move-symlinks-on-dataset-delete
frcroth Mar 19, 2025
f6edac4
Revert changes to application.conf
frcroth Mar 19, 2025
41ba8ff
Remove code duplication
frcroth Mar 19, 2025
6c54a51
Add logs whenever something is deleted
frcroth Mar 19, 2025
ab0ab84
Make remoteWkClient non optional
frcroth Mar 24, 2025
6d45d08
Throw exceptions when write permission is not given
frcroth Mar 24, 2025
6572e32
Merge branch 'master' into move-symlinks-on-dataset-delete
frcroth Mar 24, 2025
8d14be3
Handle deletion of fully linked layer with all mags separately linked
frcroth Mar 26, 2025
d026b65
fix format
frcroth Mar 26, 2025
1f59a64
Merge branch 'master' into move-symlinks-on-dataset-delete
frcroth Mar 31, 2025
cbb431d
Merge branch 'master' into move-symlinks-on-dataset-delete
frcroth Mar 31, 2025
b3de4af
Merge branch 'master' into move-symlinks-on-dataset-delete
fm3 Apr 7, 2025
f6c5123
improve error msg on realpath scan
fm3 Apr 7, 2025
30f0d0f
Apply suggestions from code review
frcroth Apr 9, 2025
14f8a82
Handle non-compact mag paths
frcroth Apr 9, 2025
1711604
Merge branch 'master' into move-symlinks-on-dataset-delete
frcroth Apr 9, 2025
6f7e6bd
Prevent exception on empty reduce
frcroth Apr 9, 2025
d808612
Merge branch 'master' into move-symlinks-on-dataset-delete
frcroth Apr 14, 2025
e104ab2
Merge branch 'master' into move-symlinks-on-dataset-delete
frcroth Apr 14, 2025
4776358
Handle edge case of fully linked layer with symlinked mags and not mo…
frcroth Apr 14, 2025
5 changes: 2 additions & 3 deletions CHANGELOG.unreleased.md
@@ -18,17 +18,16 @@ For upgrade instructions, please check the [migration guide](MIGRATIONS.released
- New config option `datastore.dataVaults.credentials` allows admins to set up global credentials for remote dataset loading. [#8509](https://github.com/scalableminds/webknossos/pull/8509)

### Changed
- When deleting a dataset / layer, layers that are referenced in other datasets are moved there instead of being deleted. [#8437](https://github.com/scalableminds/webknossos/pull/8437/)
- Added a parameter to the reserve manual upload route allowing to make the request fail if the name is already taken. Moreover, the new dataset's id and directory name are returned in the response. [#8476](https://github.com/scalableminds/webknossos/pull/8476)
- The skeleton tool can no longer be activated if the skeleton layer is invisible. [#8501](https://github.com/scalableminds/webknossos/pull/8501)
- Improved speed of mesh rendering and mouse interaction in 3D viewport. [#8106](https://github.com/scalableminds/webknossos/pull/8106)

- Numbered docker image now use different and larger numbers. [#8147](https://github.com/scalableminds/webknossos/pull/8147)

### Fixed
- Fixed the alignment of the button that allows restricting floodfill operations to a bounding box. [#8388](https://github.com/scalableminds/webknossos/pull/8388)
- Fixed that is was possible to trigger the find largest segment id job on layers which are not stored as segmentation layers on the server. [#8503](https://github.com/scalableminds/webknossos/pull/8503)
- Fixed that it was possible to trigger the find largest segment id job on layers which are not stored as segmentation layers on the server. [#8503](https://github.com/scalableminds/webknossos/pull/8503)
- Fixed a rare and subtle bug related to volume annotation and undo/redo. [#7506](https://github.com/scalableminds/webknossos/pull/7506)

- Fixed a bug where segment statistics would sometimes be wrong in case of an on-disk segmentation fallback layer with segment index file. [#8460](https://github.com/scalableminds/webknossos/pull/8460)
- Fixed a bug where sometimes outdated segment statistics would be displayed. [#8460](https://github.com/scalableminds/webknossos/pull/8460)
- Fixed a bug where outbound zarr streaming would contain a typo in the zarr header dimension_separator field. [#8510](https://github.com/scalableminds/webknossos/pull/8510)
6 changes: 0 additions & 6 deletions app/controllers/DatasetController.scala
@@ -71,12 +71,6 @@ object SegmentAnythingMaskParameters {
implicit val jsonFormat: Format[SegmentAnythingMaskParameters] = Json.format[SegmentAnythingMaskParameters]
}

case class MagLinkInfo(mag: DatasetMagInfo, linkedMags: Seq[DatasetMagInfo])

object MagLinkInfo {
implicit val jsonFormat: Format[MagLinkInfo] = Json.format[MagLinkInfo]
}

class DatasetController @Inject()(userService: UserService,
                                  userDAO: UserDAO,
                                  datasetService: DatasetService,
21 changes: 20 additions & 1 deletion app/controllers/WKRemoteDataStoreController.scala
@@ -5,6 +5,7 @@ import com.scalableminds.util.objectid.ObjectId
import com.scalableminds.util.time.Instant
import com.scalableminds.util.tools.Fox
import com.scalableminds.webknossos.datastore.controllers.JobExportProperties
import com.scalableminds.webknossos.datastore.helpers.{LayerMagLinkInfo, MagLinkInfo}
import com.scalableminds.webknossos.datastore.models.UnfinishedUpload
import com.scalableminds.webknossos.datastore.models.datasource.DataSourceId
import com.scalableminds.webknossos.datastore.models.datasource.inbox.{InboxDataSourceLike => InboxDataSource}
@@ -33,8 +34,8 @@ import play.api.mvc.{Action, AnyContent, PlayBodyParsers}
import security.{WebknossosBearerTokenAuthenticatorService, WkSilhouetteEnvironment}
import telemetry.SlackNotificationService
import utils.WkConf
import scala.concurrent.duration.DurationInt

import scala.concurrent.duration.DurationInt
import javax.inject.Inject
import scala.concurrent.{ExecutionContext, Future}

@@ -47,6 +48,7 @@ class WKRemoteDataStoreController @Inject()(
    organizationDAO: OrganizationDAO,
    usedStorageService: UsedStorageService,
    datasetDAO: DatasetDAO,
    datasetLayerDAO: DatasetLayerDAO,
    userDAO: UserDAO,
    folderDAO: FolderDAO,
    teamDAO: TeamDAO,
@@ -239,6 +241,23 @@
}
}

  def getPaths(name: String, key: String, organizationId: String, directoryName: String): Action[AnyContent] =
    Action.async { implicit request =>
      dataStoreService.validateAccess(name, key) { _ =>
        for {
          organization <- organizationDAO.findOne(organizationId)(GlobalAccessContext)
          dataset <- datasetDAO.findOneByDirectoryNameAndOrganization(directoryName, organization._id)(
            GlobalAccessContext)
          layers <- datasetLayerDAO.findAllForDataset(dataset._id)
          magsAndLinkedMags <- Fox.serialCombined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))
          magLinkInfos = magsAndLinkedMags.map(_.map { case (mag, linkedMags) => MagLinkInfo(mag, linkedMags) })
          layersAndMagLinkInfos = layers.zip(magLinkInfos).map {
            case (layer, magLinkInfo) => LayerMagLinkInfo(layer.name, magLinkInfo)
          }
        } yield Ok(Json.toJson(layersAndMagLinkInfos))
      }
    }
Comment on lines +244 to +259

💡 Verification agent · ❓ Verification inconclusive
Ensure proper test coverage for new endpoint.

This new endpoint is critical for managing layer symlinks during deletion operations. Ensure comprehensive test coverage exists to verify its behavior with various dataset and layer configurations.


🏁 Script executed:

```bash
#!/bin/bash
# Check if tests exist for this new endpoint
rg -A 3 -B 3 "getPaths.*Action" --type scala test/
rg -A 3 -B 3 "LayerMagLinkInfo" --type scala test/
```

Length of output: 103


Action Required: Add Comprehensive Tests for the getPaths Endpoint

It appears that no tests referencing the new getPaths endpoint or its usage of LayerMagLinkInfo were found in the test suite. Since this endpoint is critical for managing layer symlinks during deletion operations, please ensure that comprehensive tests are added to verify the behavior with various dataset and layer configurations.

  • File: app/controllers/WKRemoteDataStoreController.scala (lines 240-255)
  • Areas to Cover:
    • Validate that the access control (validateAccess) correctly gates the endpoint.
    • Ensure that retrievals from organizationDAO, datasetDAO, and datasetLayerDAO behave as expected.
    • Test the complete mapping logic that produces LayerMagLinkInfo with varying dataset and layer scenarios.
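
A minimal sketch of what such a test could look like, assuming a scalatestplus-play harness with a Guice test application; the organization and dataset names and the keys below are placeholders, not the project's actual fixtures:

```scala
import org.scalatestplus.play.PlaySpec
import org.scalatestplus.play.guice.GuiceOneAppPerSuite
import play.api.test.FakeRequest
import play.api.test.Helpers._

class GetPathsSpec extends PlaySpec with GuiceOneAppPerSuite {

  "WKRemoteDataStoreController.getPaths" must {

    "reject requests with an invalid datastore key" in {
      // validateAccess should gate the endpoint before any DAO lookups run
      val request = FakeRequest(GET,
        "/datastores/localhost/datasources/sample_organization/test-dataset/paths?key=wrongKey")
      val result  = route(app, request).get
      status(result) must not be OK
    }

    "return one LayerMagLinkInfo entry per layer of an existing dataset" in {
      val request = FakeRequest(GET,
        "/datastores/localhost/datasources/sample_organization/test-dataset/paths?key=validKey")
      val result  = route(app, request).get
      status(result) mustBe OK
      // each serialized entry carries a layerName and its magLinkInfos
      (contentAsJson(result) \\ "layerName") must not be empty
    }
  }
}
```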

💡 Verification agent

🧩 Analysis chain

Add documentation and consider performance optimizations.

The implementation of getPaths is well-structured and follows the existing patterns in the controller. However, there are a few improvements to consider:

  1. Add method documentation explaining its purpose and usage context
  2. The Fox.serialCombined performs operations sequentially, which could be a performance bottleneck with many layers - consider if these operations could be parallelized
  3. Ensure comprehensive error handling for all database operations
```diff
-  def getPaths(name: String, key: String, organizationId: String, directoryName: String): Action[AnyContent] =
+  /**
+   * Retrieves paths for all layers in a dataset.
+   * Used to support layer symlink management when deleting layers.
+   *
+   * @param name Datastore name
+   * @param key Datastore access key
+   * @param organizationId ID of the organization owning the dataset
+   * @param directoryName Directory name of the dataset
+   * @return JSON response with LayerMagLinkInfo objects containing layer paths
+   */
+  def getPaths(name: String, key: String, organizationId: String, directoryName: String): Action[AnyContent] =
     Action.async { implicit request =>
       dataStoreService.validateAccess(name, key) { _ =>
         for {
           organization <- organizationDAO.findOne(organizationId)(GlobalAccessContext)
           dataset <- datasetDAO.findOneByDirectoryNameAndOrganization(directoryName, organization._id)(
             GlobalAccessContext)
           layers <- datasetLayerDAO.findAllForDataset(dataset._id)
-          magsAndLinkedMags <- Fox.serialCombined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))
+          // Perform path lookups in parallel for better performance with many layers
+          magsAndLinkedMags <- Fox.combined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))
           magLinkInfos = magsAndLinkedMags.map(_.map { case (mag, linkedMags) => MagLinkInfo(mag, linkedMags) })
           layersAndMagLinkInfos = layers.zip(magLinkInfos).map {
             case (layer, magLinkInfo) => LayerMagLinkInfo(layer.name, magLinkInfo)
           }
         } yield Ok(Json.toJson(layersAndMagLinkInfos))
       }
     }
```

Consider verifying if parallel execution using Fox.combined instead of Fox.serialCombined is appropriate for this case:


🏁 Script executed:

```bash
#!/bin/bash
# Check implementation of Fox.serialCombined vs Fox.combined to understand differences
rg -A 5 "def serialCombined" --type scala
rg -A 5 "def combined" --type scala

# Look for similar patterns in the codebase
rg -B 2 -A 2 "Fox\.combined" --type scala
```

Length of output: 23365


Attention: Required changes to the getPaths method for parallel execution

The updated method documentation is a good improvement, and parallelizing the data layer lookups can boost performance. However, based on our investigation:

  • Parallel Execution: The current diff replaces the sequential call

    ```scala
    Fox.serialCombined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))
    ```

    with

    ```scala
    Fox.combined(layers)(l => datasetService.getPathsForDataLayer(dataset._id, l.name))
    ```

    However, the definition of Fox.combined in our codebase expects a sequence of Fox[T] values (i.e. a single-argument list). To safely execute these operations in parallel, please refactor the call by mapping the function over the layers before combining the results. For example, update the call to:

    ```scala
    magsAndLinkedMags <- Fox.combined(layers.map(l => datasetService.getPathsForDataLayer(dataset._id, l.name)))
    ```

    This change ensures that each lookup is executed concurrently according to Fox.combined's signature.

  • Error Handling: While the core logic remains sound, please verify that comprehensive error handling is applied to all database operations to avoid unhandled failures.

Please incorporate these changes to resolve the concerns.
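
To make the signature distinction concrete, here is a simplified sketch inferred from the comment above; these are stand-ins, not the actual definitions in the Fox utility:

```scala
// Sketch: inferred, simplified shapes of the two combinators (assumptions, not the real Fox API).
trait FoxCombinatorsSketch {
  // Takes the collection plus a function; runs f one element at a time,
  // so each lookup starts only after the previous one has completed.
  def serialCombined[A, B](as: List[A])(f: A => Fox[B]): Fox[List[B]]

  // Takes Fox values that were already created; since creating a Fox starts
  // its underlying work, the lookups can proceed concurrently.
  def combined[A](foxes: Seq[Fox[A]]): Fox[List[A]]
}
```

This is why the parallel variant must map first: `layers.map(l => ...)` creates all the `Fox` values up front, whereas `Fox.serialCombined(layers)(l => ...)` creates each one only after its predecessor finishes.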


  def deleteDataset(name: String, key: String): Action[JsValue] = Action.async(parse.json) { implicit request =>
    dataStoreService.validateAccess(name, key) { _ =>
      for {
69 changes: 44 additions & 25 deletions app/models/dataset/Dataset.scala
@@ -5,6 +5,7 @@ import com.scalableminds.util.geometry.{BoundingBox, Vec3Double, Vec3Int}
import com.scalableminds.util.objectid.ObjectId
import com.scalableminds.util.time.Instant
import com.scalableminds.util.tools.{Fox, FoxImplicits, JsonHelper}
import com.scalableminds.webknossos.datastore.helpers.DataSourceMagInfo
import com.scalableminds.webknossos.datastore.models.{LengthUnit, VoxelSize}
import com.scalableminds.webknossos.datastore.models.datasource.DatasetViewConfiguration.DatasetViewConfiguration
import com.scalableminds.webknossos.datastore.models.datasource.LayerViewConfiguration.LayerViewConfiguration
Expand All @@ -31,6 +32,7 @@ import play.api.i18n.{Messages, MessagesProvider}
import play.api.libs.json._
import play.utils.UriEncoding
import slick.dbio.DBIO
import slick.jdbc.GetResult
import slick.jdbc.PostgresProfile.api._
import slick.jdbc.TransactionIsolation.Serializable
import slick.lifted.Rep
@@ -709,22 +711,21 @@ class DatasetDAO @Inject()(sqlClient: SqlClient, datasetLayerDAO: DatasetLayerDA
}
}

case class DatasetMagInfo(datasetId: ObjectId,
                          dataLayerName: String,
                          mag: Vec3Int,
                          path: Option[String],
                          realPath: Option[String],
                          hasLocalData: Boolean)

case class MagWithPaths(layerName: String,
                        mag: Vec3Int,
                        path: Option[String],
                        realPath: Option[String],
                        hasLocalData: Boolean)

object DatasetMagInfo {
  implicit val jsonFormat: Format[DatasetMagInfo] = Json.format[DatasetMagInfo]
}
case class DataSourceMagRow(_dataset: ObjectId,
                            dataLayerName: String,
                            mag: String,
                            path: Option[String],
                            realPath: Option[String],
                            hasLocalData: Boolean,
                            _organization: String,
                            directoryName: String)

class DatasetMagsDAO @Inject()(sqlClient: SqlClient)(implicit ec: ExecutionContext)
    extends SQLDAO[MagWithPaths, DatasetMagsRow, DatasetMags](sqlClient) {
  protected val collection = DatasetMags
@@ -782,27 +783,45 @@
)
} yield ()

def findPathsForDatasetAndDatalayer(datasetId: ObjectId, dataLayerName: String): Fox[List[DatasetMagInfo]] =
implicit def GetResultDataSourceMagRow: GetResult[DataSourceMagRow] =
GetResult(
r =>
DataSourceMagRow(ObjectId(r.nextString()),
r.nextString(),
r.nextString(),
r.nextStringOption(),
r.nextStringOption(),
r.nextBoolean(),
r.nextString(),
r.nextString()))

private def rowsToMagInfos(rows: Vector[DataSourceMagRow]): Fox[List[DataSourceMagInfo]] =
for {
rows <- run(q"""SELECT $columns
mags <- Fox.serialCombined(rows.toList)(r => parseMag(r.mag))
dataSources = rows.map(row => DataSourceId(row.directoryName, row._organization))
magInfos = rows.toList.zip(mags).zip(dataSources).map {
case ((row, mag), dataSource) =>
DataSourceMagInfo(dataSource, row.dataLayerName, mag, row.path, row.realPath, row.hasLocalData)
}
} yield magInfos

def findPathsForDatasetAndDatalayer(datasetId: ObjectId, dataLayerName: String): Fox[List[DataSourceMagInfo]] =
for {
rows <- run(q"""SELECT $columns, _organization, directoryName
FROM webknossos.dataset_mags
INNER JOIN webknossos.datasets ON webknossos.dataset_mags._dataset = webknossos.datasets._id
WHERE _dataset = $datasetId
AND dataLayerName = $dataLayerName""".as[DatasetMagsRow])
magInfos <- Fox.combined(rows.toList.map(parse))
datasetMagInfos = magInfos.map(magInfo =>
DatasetMagInfo(datasetId, magInfo.layerName, magInfo.mag, magInfo.path, magInfo.realPath, magInfo.hasLocalData))
} yield datasetMagInfos
AND dataLayerName = $dataLayerName""".as[DataSourceMagRow])
magInfos <- rowsToMagInfos(rows)
} yield magInfos

def findAllByRealPath(realPath: String): Fox[List[DatasetMagInfo]] =
def findAllByRealPath(realPath: String): Fox[List[DataSourceMagInfo]] =
for {
rows <- run(q"""SELECT $columns
rows <- run(q"""SELECT $columns, _organization, directoryName
FROM webknossos.dataset_mags
WHERE realPath = $realPath""".as[DatasetMagsRow])
mags <- Fox.serialCombined(rows.toList)(r => parseMag(r.mag))
magInfos = rows.toList.zip(mags).map {
case (row, mag) =>
DatasetMagInfo(ObjectId(row._Dataset), row.datalayername, mag, row.path, row.realpath, row.haslocaldata)
}
INNER JOIN webknossos.datasets ON webknossos.dataset_mags._dataset = webknossos.datasets._id
WHERE realPath = $realPath""".as[DataSourceMagRow])
magInfos <- rowsToMagInfos(rows)
} yield magInfos

}
14 changes: 12 additions & 2 deletions app/models/dataset/DatasetService.scala
@@ -4,6 +4,7 @@ import com.scalableminds.util.accesscontext.{DBAccessContext, GlobalAccessContex
import com.scalableminds.util.objectid.ObjectId
import com.scalableminds.util.time.Instant
import com.scalableminds.util.tools.{Fox, FoxImplicits}
import com.scalableminds.webknossos.datastore.helpers.DataSourceMagInfo
import com.scalableminds.webknossos.datastore.models.datasource.inbox.{
UnusableDataSource,
InboxDataSourceLike => InboxDataSource
@@ -360,15 +361,24 @@
_ <- Fox.serialCombined(pathInfos)(updateRealPath)
} yield ()

def getPathsForDataLayer(datasetId: ObjectId, layerName: String): Fox[List[(DatasetMagInfo, List[DatasetMagInfo])]] =
/**
* Returns a list of tuples, where the first element is the magInfo and the second element is a list of all magInfos
* that share the same realPath but have a different dataSourceId. For each mag in the data layer there is one tuple.
* @param datasetId id of the dataset
* @param layerName name of the layer in the dataset
* @return
*/
def getPathsForDataLayer(datasetId: ObjectId,
layerName: String): Fox[List[(DataSourceMagInfo, List[DataSourceMagInfo])]] =
for {
magInfos <- datasetMagsDAO.findPathsForDatasetAndDatalayer(datasetId, layerName)
magInfosAndLinkedMags <- Fox.serialCombined(magInfos)(magInfo =>
magInfo.realPath match {
case Some(realPath) =>
for {
pathInfos <- datasetMagsDAO.findAllByRealPath(realPath)
} yield (magInfo, pathInfos.filter(!_.equals(magInfo)))
filteredPathInfos = pathInfos.filter(_.dataSourceId != magInfo.dataSourceId)
} yield (magInfo, filteredPathInfos)
case None => Fox.successful((magInfo, List()))
})
} yield magInfosAndLinkedMags
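
For reference, the controller shown earlier consumes this result by pairing each tuple into a `MagLinkInfo`; a condensed sketch of that step, with `datasetId` and `layerName` standing in for the controller's values:

```scala
// Condensed from WKRemoteDataStoreController.getPaths above: wrap each
// (mag, linkedMags) tuple into a MagLinkInfo so it can be serialized to JSON.
for {
  magsAndLinkedMags <- datasetService.getPathsForDataLayer(datasetId, layerName)
  magLinkInfos = magsAndLinkedMags.map { case (mag, linkedMags) => MagLinkInfo(mag, linkedMags) }
} yield Json.toJson(magLinkInfos)
```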
1 change: 1 addition & 0 deletions conf/webknossos.latest.routes
@@ -110,6 +110,7 @@ GET /datastores
PUT /datastores/:name/datasource controllers.WKRemoteDataStoreController.updateOne(name: String, key: String)
PUT /datastores/:name/datasources controllers.WKRemoteDataStoreController.updateAll(name: String, key: String)
PUT /datastores/:name/datasources/paths controllers.WKRemoteDataStoreController.updatePaths(name: String, key: String)
GET /datastores/:name/datasources/:organizationId/:directoryName/paths controllers.WKRemoteDataStoreController.getPaths(name: String, key: String, organizationId: String, directoryName: String)
PATCH /datastores/:name/status controllers.WKRemoteDataStoreController.statusUpdate(name: String, key: String)
POST /datastores/:name/reserveUpload controllers.WKRemoteDataStoreController.reserveDatasetUpload(name: String, key: String, token: String)
GET /datastores/:name/getUnfinishedUploadsForUser controllers.WKRemoteDataStoreController.getUnfinishedUploadsForUser(name: String, key: String, token: String, organizationName: String)
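
For illustration, a datastore-side client could call the new route roughly like this; a sketch using Play's `WSClient`, where the base URI, datastore name, and key wiring are assumptions rather than the project's actual plumbing:

```scala
import play.api.libs.ws.WSClient

import scala.concurrent.{ExecutionContext, Future}

// Sketch: fetch the per-layer path/link info for a dataset from WEBKNOSSOS.
// webknossosUri, datastoreName, and datastoreKey are assumed to be configured elsewhere.
class WkPathsClient(ws: WSClient, webknossosUri: String, datastoreName: String, datastoreKey: String)(
    implicit ec: ExecutionContext) {

  def fetchPaths(organizationId: String, directoryName: String): Future[String] =
    ws.url(s"$webknossosUri/datastores/$datastoreName/datasources/$organizationId/$directoryName/paths")
      .addQueryStringParameters("key" -> datastoreKey)
      .get()
      .map(_.body) // JSON array of LayerMagLinkInfo objects, as produced by getPaths above
}
```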
2 changes: 2 additions & 0 deletions docs/datasets/settings.md
@@ -71,4 +71,6 @@ You don't have to set complete *View Configurations* in either option, as WEBKNO

Offers an option to delete a dataset and completely remove it from WEBKNOSSOS. Be careful, this cannot be undone!

When other datasets reference layers from this dataset, WEBKNOSSOS will try to move these layers to the dataset they are referenced in, so that they can still be accessed.

![Dataset Editing: Delete Tab](../images/delete_tab.jpeg)
@@ -88,8 +88,12 @@ trait Formatter {

def formatStackTrace(failure: Failure) =
failure.exception match {
case Full(exception) if includeStackTraces => s" Stack trace: ${TextUtils.stackTraceAsString(exception)} "
case _ => ""
case Full(exception) =>
if (includeStackTraces)
s" Stack trace: ${TextUtils.stackTraceAsString(exception)} "
else
exception.toString
case _ => ""
}

def formatNextChain(chainBox: Box[Failure]): String = chainBox match {
@@ -495,7 +495,7 @@ class DataSourceController @Inject()(
val dataSourceId = DataSourceId(datasetDirectoryName, organizationId)
accessTokenService.validateAccessFromTokenContext(UserAccessRequest.deleteDataSource(dataSourceId)) {
for {
_ <- binaryDataServiceHolder.binaryDataService.deleteOnDisk(
_ <- dataSourceService.deleteOnDisk(
organizationId,
datasetDirectoryName,
reason = Some("the user wants to delete the dataset")) ?~> "dataset.delete.failed"