Use multi-get; native bucket scanner in segment stats computation #8469
📝 Walkthrough

This pull request introduces a variety of changes across the codebase. A new changelog entry highlights performance improvements for segment statistics. Methods and parameters have been updated to accept broader sequence types, and error logging services have been refactored and replaced with more specific implementations. New methods have been added for handling multiple bucket data requests, and JNI interfaces have been extended with functions for voxel counting and bounding box extension. Dependency injections and method signatures in both datastore and tracing store components have been updated accordingly.
Actionable comments posted: 0
🧹 Nitpick comments (6)
CHANGELOG.unreleased.md (1)
17-17: Ensure consistent and clear documentation of performance improvements

The changelog entry added on line 17 clearly documents the new performance improvements for segment statistics (volume and bounding box in the context menu) and references PR [#8469]. Please verify that the terminology used here aligns with the implementation in SegmentStatistics.scala and related components. If desired, consider adding a brief note about the use of the multi-get feature or native bucket scanner to further highlight the technical improvements.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentStatisticsService.scala (1)
60-88: Implemented multi-position bucket data retrieval.

The new getDataForBucketPositions method efficiently processes multiple bucket positions in a single request, aligning with the PR objective of implementing multi-get functionality. The method properly handles both editable mapping and standard volume tracing cases.

Two minor optimization opportunities:

- The .toList conversion at line 78 may be unnecessary if WebknossosDataRequest accepts Seq.
- Consider adding a check for empty bucketPositions to potentially short-circuit the processing.

```diff
 private def getDataForBucketPositions(annotationId: String, tracingId: String)(
     bucketPositions: Seq[Vec3Int],
     mag: Vec3Int,
     additionalCoordinates: Option[Seq[AdditionalCoordinate]])(
     implicit tc: TokenContext,
     ec: ExecutionContext): Fox[(Seq[Box[Array[Byte]]], ElementClass.Value)] =
   for {
     tracing <- annotationService.findVolume(annotationId, tracingId) ?~> "tracing.notFound"
+    _ <- if (bucketPositions.isEmpty) Fox.successful(()) else Fox.successful(())
     dataRequests = bucketPositions.map { position =>
       WebknossosDataRequest(
         position = position * mag * DataLayer.bucketLength,
         mag = mag,
         cubeSize = DataLayer.bucketLength,
         fourBit = Some(false),
         applyAgglomerate = None,
         version = None,
         additionalCoordinates = additionalCoordinates
       )
-    }.toList
+    }
     bucketDataBoxes <- if (tracing.getHasEditableMapping) {
       val mappingLayer = annotationService.editableMappingLayer(annotationId, tracingId, tracing)
       editableMappingService.volumeDataBucketBoxes(mappingLayer, dataRequests)
     } else
       volumeTracingService.dataBucketBoxes(annotationId, tracingId, tracing, dataRequests, includeFallbackDataIfAvailable = true)
   } yield (bucketDataBoxes, elementClassFromProto(tracing.elementClass))
```

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/BucketProvider.scala (1)
13-17: Good implementation of the loadMultiple method

The method provides a default implementation for loading multiple data read instructions by processing each instruction serially with Fox.serialSequenceBox. This creates a foundation that concrete providers can override with more optimized implementations.

Consider adding a comment indicating this is a default implementation that can be overridden by bucket providers with more efficient batch loading capabilities.

```diff
 def loadMultiple(readInstructions: Seq[DataReadInstruction])(implicit ec: ExecutionContext,
                                                              tc: TokenContext): Fox[Seq[Box[Array[Byte]]]] =
+  // Default implementation that processes instructions serially.
+  // Concrete providers can override this with more efficient batch loading.
   Fox.serialSequenceBox(readInstructions) { readInstruction =>
     load(readInstruction)
   }
```

webknossos-jni/src/bucketScanner.cpp (3)
81-85: Informative error messages.

Throwing descriptive runtime exceptions in both catch blocks with details about the native function helps with debugging. Make sure these messages don't leak sensitive or excessive information, especially in production logs.
90-116: countSegmentVoxels function logic is correct and performance is acceptable.

- The loop and counter increment seem correct for counting occurrences of segmentId.
- Consider an early release of bucketBytes before returning to neatly ensure the JNI memory is freed. Currently, memory is only released on exceptions.

```diff
   ...
   try {
     ...
+    // release bytes here before returning
+    env->ReleaseByteArrayElements(bucketBytesJavaArray, bucketBytes, 0);
     return segmentVoxelCount;
   } catch (const std::exception &e) {
   ...
```
119-160: extendSegmentBoundingBox is methodically implemented.

- The triple-nested loop is logically correct for scanning each voxel in a bucket.
- Updates to the bounding box are consistent with typical min/max bounding logic.
- As with countSegmentVoxels, consider releasing bucketBytes prior to returning from the try block to avoid relying solely on the catch blocks.

```diff
   ...
   try {
     ...
+    env->ReleaseByteArrayElements(bucketBytesJavaArray, bucketBytes, 0);
     return resultAsJIntArray;
   } catch (const std::exception &e) {
   ...
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
- CHANGELOG.unreleased.md (1 hunks)
- util/src/main/scala/com/scalableminds/util/tools/Fox.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/BucketProvider.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/NativeBucketScanner.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/SegmentStatistics.scala (4 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/requests/DataServiceRequests.scala (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (5 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataServiceHolder.scala (2 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetErrorLoggingService.scala (4 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/SegmentIndexFileService.scala (5 hunks)
- webknossos-jni/src/bucketScanner.cpp (2 hunks)
- webknossos-jni/src/include/com_scalableminds_webknossos_datastore_helpers_NativeBucketScanner.h (1 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TracingStoreModule.scala (2 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingService.scala (4 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/TSDatasetErrorLoggingService.scala (1 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentStatisticsService.scala (4 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (2 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala (3 hunks)
🧰 Additional context used
🧬 Code Definitions (8)
webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/requests/DataServiceRequests.scala (1)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (1): DataLayer (277-325)

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingService.scala (3)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/TSDatasetErrorLoggingService.scala (1): TSDatasetErrorLoggingService (11-16)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/requests/DataServiceRequests.scala (1): DataServiceDataRequest (17-24)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (1): handleMultipleBucketRequests (56-101)

webknossos-jni/src/include/com_scalableminds_webknossos_datastore_helpers_NativeBucketScanner.h (1)
- webknossos-jni/src/bucketScanner.cpp (2): jlong (90-116), jintArray (119-160)

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala (3)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/TemporaryTracingService.scala (1): isTemporaryTracing (100-103)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/requests/DataServiceRequests.scala (1): DataServiceDataRequest (17-24)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (1): handleMultipleBucketRequests (56-101)

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentStatisticsService.scala (4)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/NativeBucketScanner.scala (1): NativeBucketScanner (5-28)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingService.scala (2): EditableMappingService (90-524), volumeDataBucketBoxes (207-216)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingLayer.scala (2): tracingId (110-110), version (108-108)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala (1): dataBucketBoxes (510-528)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetErrorLoggingService.scala (2)
- util/src/main/scala/com/scalableminds/util/tools/Fox.scala (10): flatMap (321-331), flatMap (407-407), foreach (334-337), foreach (409-409), Fox (58-277), Fox (279-415), successful (62-65), map (318-321), map (405-405), toFox (377-377)
- app/mail/MailchimpTicker.scala (1): withErrorLogging (39-46)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (5)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/requests/DataServiceRequests.scala (2): isSingleBucket (23-23), DataReadInstruction (26-34)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/layers/N5DataLayers.scala (1): bucketProvider (16-21)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetErrorLoggingService.scala (2): withErrorLoggingMultiple (48-60), withErrorLogging (60-86)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/BucketProvider.scala (1): loadMultiple (13-18)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/AgglomerateService.scala (1): applyAgglomerate (50-98)

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/SegmentStatistics.scala (2)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentStatisticsService.scala (2): getBucketPositions (90-111), getDataForBucketPositions (60-90)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/SegmentIndexFileService.scala (4): getBucketPositions (174-190), getBucketPositions (190-201), getDataForBucketPositions (156-174), getDataForBucketPositions (231-258)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: circleci_build
🔇 Additional comments (51)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentStatisticsService.scala (5)

7-9: Imports updated for native bucket scanning.

The imports have been updated to include NativeBucketScanner, SegmentStatistics, and WebknossosDataRequest. These changes align with the PR objective of replacing Scala-based voxel counting with a native bucket scanner and supporting multi-get functionality.

13-13: Changed Box import to Lift's implementation.

The import for Box has been changed from Scala's implementation to Lift's. This change supports the multi-get functionality, as Lift's Box is used by the underlying data retrieval methods.

26-27: Added native bucket scanner.

Adding the NativeBucketScanner as a lazy val is a good implementation choice. This ensures the native scanner is only initialized when needed and aligns with the PR objective of replacing Scala-based voxel counting with a more efficient native implementation.

41-41: Updated method call to use batch bucket data retrieval.

The method call has been updated to use the new getDataForBucketPositions method instead of the previous single-position method. This change enables batch processing of bucket positions, which should lead to the expected 10x performance improvement mentioned in the PR objectives.

57-57: Updated method call to use batch bucket data retrieval for bounding box calculation.

Similar to the volume calculation update, the bounding box calculation now uses the batch processing method getDataForBucketPositions. This consistency ensures that both volume and bounding box calculations benefit from the performance improvements.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/SegmentStatistics.scala (7)
6-9: Imports look fine.

These additional imports appear valid and are used in the code below (e.g., tryo from net.liftweb.common.Box). No concerns here.
24-25: Introduction of the bucketScanner trait member.

Defining bucketScanner as a protected member is a clear way to encapsulate the native scanning logic. Ensure proper initialization in concrete classes or mixins.
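For illustration, the initialization this note asks for can be a single line, mirroring what the review above says VolumeSegmentStatisticsService does (a sketch; the no-arg constructor is an assumption):

```scala
// Lazy, so the native library behind NativeBucketScanner is only loaded on first use.
protected lazy val bucketScanner = new NativeBucketScanner()
```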
26-34: Check large-scale data handling.

Using a batch-oriented parameter (getDataForBucketPositions) for volume calculation is beneficial for performance. However, if this needs to handle very large sets of buckets, confirm that memory usage and performance remain acceptable.
39-49: Ensure sequential execution is appropriate.

This volume-counting step runs serially via Fox.serialCombined. Verify that the expected runtime is acceptable for large bucket lists. If parallelization is desired, consider verifying concurrency safety first, as the current approach is strictly sequential.
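As a minimal sketch of what this serial step looks like, assuming the Fox.serialCombined and Box types named in this review (exact signatures in Fox.scala may differ):

```scala
import com.scalableminds.util.tools.Fox
import net.liftweb.common.{Box, Full}
import scala.concurrent.ExecutionContext

// Sums per-bucket voxel counts one bucket at a time. Fox.serialCombined runs the
// given function strictly sequentially, which is why concurrency safety is not a
// concern in the current implementation.
def countVoxelsSerially(bucketBoxes: Seq[Box[Array[Byte]]],
                        segmentId: Long,
                        bytesPerElement: Int,
                        isSigned: Boolean)(implicit ec: ExecutionContext): Fox[Long] =
  Fox.serialCombined(bucketBoxes) {
    case Full(bytes) =>
      Fox.successful(bucketScanner.countSegmentVoxels(bytes, bytesPerElement, isSigned, segmentId))
    case _ =>
      Fox.successful(0L) // simplified: Empty (out-of-range) buckets contribute no voxels
  }.map(_.sum)
```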
53-62: Bounding box signature update looks consistent.

The new parameters and approach for retrieving data in batches mirror the volume calculation changes. This is consistent and should help unify the data retrieval logic.
72-81: Bucket-by-bucket bounding box extension.

Serially processing each bucket with extendBoundingBoxByData is straightforward. No immediate concerns.
109-134: Potential concurrency consideration.

extendBoundingBoxByData updates a shared mutable structure (boundingBoxMutable). Because it is invoked within Fox.serialCombined, it is safe in serial. If a parallel approach is ever adopted, this would need synchronization or a refactor.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/TSDatasetErrorLoggingService.scala (1)
1-17: New dataset error logging service.

This class cleanly extends DatasetErrorLoggingService. Marking applicationHealthService as None is acceptable for now, but if future health checks are needed, consider providing a dedicated service object. No issues otherwise.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/requests/DataServiceRequests.scala (1)
22-24: Helper method for single-bucket detection.

The logic to verify that the cuboid's size matches DataLayer.bucketLength is straightforward. This should help differentiate single-bucket from bulk requests.
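A hedged reconstruction of what the helper plausibly looks like; the cuboid field names are assumptions, and only the DataLayer.bucketLength comparison is taken from the review:

```scala
// Hypothetical shape of the helper: a request covers a single bucket iff its
// cuboid is exactly one bucket edge length (DataLayer.bucketLength) per dimension.
def isSingleBucket: Boolean =
  cuboid.width == DataLayer.bucketLength &&
    cuboid.height == DataLayer.bucketLength &&
    cuboid.depth == DataLayer.bucketLength
```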
webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/BucketProvider.scala (1)

6-6: Added import for Box from net.liftweb.common

This import is necessary for the new loadMultiple method's return type.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TracingStoreModule.scala (2)
9-9: Updated import statement to include TSDatasetErrorLoggingService

Consolidated the import statement to include both VolumeTracingService and TSDatasetErrorLoggingService from the same package.

29-29: Added eager singleton binding for TSDatasetErrorLoggingService

The new binding registers TSDatasetErrorLoggingService as an eager singleton, ensuring it is initialized at application startup. This aligns with the PR objective of enhancing error logging for bucket loading in the tracing store.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)
46-46: Updated parameter type to DSDatasetErrorLoggingService

Changed the injected service type from what was likely DatasetErrorLoggingService to the more specific DSDatasetErrorLoggingService. This is part of the larger effort to make the error logging service mandatory and enhance error handling capabilities.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (2)
16-16: Added import for Box from net.liftweb.common

This import is necessary for the new loadMultiple method's return type.
39-48: ✅ Verification successful

Optimized implementation of loadMultiple

This override of the loadMultiple method from the BucketProvider trait provides a more efficient implementation for loading multiple buckets simultaneously. It checks for empty requests first, then extracts the version from the first instruction and uses it for all buckets. This implementation likely leverages the Multi-Get feature from FossilDB mentioned in the PR objectives, which should significantly improve performance.

Verify that in practical use, all read instructions in a batch would have the same version, since this implementation uses only the version from the first instruction.

🏁 Script executed:

```bash
#!/bin/bash
# Check if DataReadInstruction objects typically have the same version when processed in batches
# Search for places where multiple DataReadInstruction objects are created
rg -A 5 "Seq\[DataReadInstruction\]" --include="*.scala"
# Check for any code that might set different versions
rg -A 3 "DataReadInstruction.*version" --include="*.scala"
```

Length of output: 1003

Below is a new shell script using corrected flags to re-run the search for any indications that batches of read instructions might have differing versions:

Attention: Verify version consistency in the loadMultiple method

The optimized implementation of loadMultiple in VolumeTracingLayer.scala improves performance by leveraging FossilDB's multi-get feature; however, it always extracts the version from only the first read instruction. Please verify that, in practical usage, all DataReadInstruction instances in a batch share the same version to avoid potential issues.

- File: webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (Lines 39-48)
- Action: Double-check through manual code review or tests that no code paths create a batch of read instructions with differing versions.

🏁 Script executed:

```bash
#!/bin/bash
# Search for occurrences of multiple DataReadInstruction objects in Scala files and check their version settings
rg -A 5 "Seq\[DataReadInstruction\]" -g "*.scala"
rg -A 3 "DataReadInstruction.*version" -g "*.scala"
```

Length of output: 3645

Optimized implementation verified; please ensure all batches maintain version consistency.

After verifying across the codebase, we did not find any indication that batches of DataReadInstruction instances are created with differing versions. The relevant searches in both VolumeTracingLayer.scala and the related datastore files consistently show version retrieval based on the settings (e.g., request.settings.version in BinaryDataService.scala) and no alternative logic for setting distinct versions within a single batch. Therefore, the design decision to extract the version only from the first instruction in loadMultiple appears valid under current usage.

- File: webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (Lines 39-48)
- Action: Although current evidence indicates that all read instructions in a batch share the same version, maintain vigilance in future changes to ensure that batch creation does not introduce version mismatches.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (1)
28-28: Service migration from DatasetErrorLoggingService to DSDatasetErrorLoggingService

The binding has been updated to use the datastore-specific DSDatasetErrorLoggingService implementation instead of the generic DatasetErrorLoggingService. This aligns with the PR objective of enhancing error logging for bucket loading operations.

util/src/main/scala/com/scalableminds/util/tools/Fox.scala (2)
90-91: Method signature enhancement to accept broader sequence types

The method signature has been improved to accept any Seq[A] instead of only List[A], increasing the flexibility of this utility method without changing its behavior.
102-102: Necessary conversion to maintain compatibility

The implementation now converts the input sequence to a list using l.toList to ensure compatibility with the existing runNext method implementation. This is the correct approach when broadening the input parameter type.
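A sketch of the broadened signature, under the assumption that runNext (referenced above) still recurses over a List and only the outer parameter type changes:

```scala
// Before: def serialCombined[A, B](l: List[A])(f: A => Fox[B]): Fox[List[B]]
// After: any Seq is accepted; the toList conversion keeps runNext untouched.
def serialCombined[A, B](l: Seq[A])(f: A => Fox[B])(implicit ec: ExecutionContext): Fox[List[B]] =
  runNext(l.toList, Nil)(f) // runNext: the existing List-based accumulator helper in Fox.scala
```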
webknossos-jni/src/include/com_scalableminds_webknossos_datastore_helpers_NativeBucketScanner.h (2)

18-24: New JNI method for segment voxel counting

Added a native method declaration for counting voxels within a specific segment. This aligns with the PR objective of improving performance for segment statistics calculation by implementing native code optimizations.

26-32: New JNI method for bounding box calculation

Added a native method declaration for extending segment bounding boxes. This complements the voxel counting functionality and is part of the performance optimization effort mentioned in the PR description.
webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/NativeBucketScanner.scala (2)
9-12: Native implementation for segment voxel counting

This new native method corresponds to the JNI declaration and provides efficient counting of voxels belonging to a specific segment ID. It replaces the previous Scala-based voxel counting implementation for better performance.

14-27: Native implementation for bounding box extension

This complex native method efficiently calculates and extends a segment's bounding box by analyzing voxel data. The method signature properly handles all necessary parameters, including bucket metadata, segment ID, and existing bounding box coordinates.
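For reference, a hedged sketch of these Scala-side declarations, inferred from the C++ signatures quoted later in this thread (parameter names and order are assumptions):

```scala
class NativeBucketScanner {
  // Counts the voxels in bucketBytes whose value equals segmentId.
  @native def countSegmentVoxels(bucketBytes: Array[Byte],
                                 bytesPerElement: Int,
                                 isSigned: Boolean,
                                 segmentId: Long): Long

  // Returns the six bbox values (minX, minY, minZ, maxX, maxY, maxZ) after
  // extending the existing bounding box by all voxels of segmentId in this bucket.
  @native def extendSegmentBoundingBox(bucketBytes: Array[Byte],
                                       bytesPerElement: Int,
                                       isSigned: Boolean,
                                       bucketLength: Int,
                                       segmentId: Long,
                                       bucketTopLeftX: Int,
                                       bucketTopLeftY: Int,
                                       bucketTopLeftZ: Int,
                                       existingBBoxTopLeftX: Int,
                                       existingBBoxTopLeftY: Int,
                                       existingBBoxTopLeftZ: Int,
                                       existingBBoxBottomRightX: Int,
                                       existingBBoxBottomRightY: Int,
                                       existingBBoxBottomRightZ: Int): Array[Int]
}
```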
webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataServiceHolder.scala (2)
26-26: Constructor parameter type change to a more specific implementation

The parameter type has been changed from DatasetErrorLoggingService to DSDatasetErrorLoggingService, indicating a move to a more specific implementation for the datastore context.

49-49: Parameter no longer wrapped in Option

The datasetErrorLoggingService is now passed directly to the binaryDataService constructor without being wrapped in an Option, making it a required parameter rather than an optional one. This aligns with the PR objective of enhancing error logging for bucket loading.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala (3)
44-44: Added datasetErrorLoggingService dependency

Added the TSDatasetErrorLoggingService as a constructor parameter, making error logging services available to this class. This aligns with the PR objective of enhancing error logging for bucket loading.

73-73: Updated BinaryDataService instantiation

The binaryDataService instantiation now includes the datasetErrorLoggingService parameter instead of None, ensuring error logging is properly integrated into the binary data service.

510-527: Added new method for optimized bucket data retrieval

The new dataBucketBoxes method provides an optimized way to retrieve bucket data by leveraging the handleMultipleBucketRequests method from binaryDataService. This implementation supports the PR's goal of achieving approximately a 10x speedup for segment volume and bounding box requests.

The method follows the same pattern as the existing data method but returns raw bucket data boxes instead of processed data, allowing for more efficient processing of multiple bucket requests in a single operation.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingService.scala (4)

25-25: Updated import to include TSDatasetErrorLoggingService

Added an import for TSDatasetErrorLoggingService to support the new constructor parameter.

91-91: Added datasetErrorLoggingService dependency

Added TSDatasetErrorLoggingService as a constructor parameter, consistent with the changes in other services to make error logging mandatory.

107-107: Updated BinaryDataService instantiation

The binaryDataService instantiation now includes the datasetErrorLoggingService parameter instead of None, ensuring error logging is properly integrated.

207-216: Added volumeDataBucketBoxes method for optimized bucket data retrieval

Implemented a new method that uses handleMultipleBucketRequests from binaryDataService to efficiently retrieve bucket data in bulk. This is consistent with the similar method added to VolumeTracingService and supports the performance improvements described in the PR objectives.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/SegmentIndexFileService.scala (6)
9-10: Updated imports for native bucket scanner integration

Modified imports to include NativeBucketScanner and update the ElementClass import path, supporting the transition from Scala-based voxel counting to the native bucket scanner mentioned in the PR objectives.

16-16: Updated import path for consistency

Modified the import path for consistency with other changes related to the native bucket scanner integration.

37-38: Added NativeBucketScanner instance

Initialized a new protected bucketScanner instance of NativeBucketScanner, which will be used for the optimized segment statistics computation described in the PR objectives.

124-132: Refactored getSegmentVolume for improved performance

The method has been refactored to use calculateSegmentVolume with the new getDataForBucketPositions method, taking advantage of the multi-get feature described in the PR objectives for an approximately 10x speedup.

147-147: Updated getSegmentBoundingBox to use multi-get

Updated getSegmentBoundingBox to use the new getDataForBucketPositions method instead of the previous single-bucket implementation, enabling performance improvements for bounding box requests.

156-172: Replaced getTypedDataForBucketPosition with an optimized implementation

The previous method has been replaced with getDataForBucketPositions, which now accepts multiple bucket positions and returns both the bucket data and the element class. This change supports the bulk processing capability needed for the performance improvements described in the PR objectives.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetErrorLoggingService.scala (4)
11-20: Extracting the applicationHealthService into a separate trait field looks good.

By making applicationHealthService optional and part of the trait, you allow implementations to decide how (and whether) to push errors. This promotes flexibility and reusability of error-logging behavior.

48-58: Implementation of withErrorLoggingMultiple is consistent with single-item error logging.

The iterative application of withErrorLogging to each Box[Array[Byte]] ensures that all errors are captured. This is a logical extension of the existing withErrorLogging pattern. The method's overall structure appears clean and error-friendly.

75-75: Ensure potential side effects are intentional.

applicationHealthService.foreach(_.pushError(e)) is a key side effect. Verify that calling it here, specifically with e from InternalError, aligns with your broader health/error reporting requirements.

88-94: Concrete implementation of DSDatasetErrorLoggingService is straightforward.

Injecting dsApplicationHealthService and returning it as Some(...) at runtime clarifies error push behavior. This class offers a clean pattern for other, similar services to extend the base trait.

webknossos-jni/src/bucketScanner.cpp (1)
5-5: Use of <vector> ensures efficient collection handling.

Including <vector> is appropriate for storing intermediate bounding box data. No concerns here.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (4)
3-3: New imports and updated constructor dependencies.

- The import collections.SequenceUtils usage suggests you're introducing a new utility for handling sequences; verify it's needed in this file.
- Explicitly providing datasetErrorLoggingService ensures all calls can leverage consistent logging.

Also applies to: 15-15, 26-26
56-99: handleMultipleBucketRequests logic promotes succinct retrieval of multiple buckets (a sketch of the flow follows this list).

- The check requests.forall(_.isSingleBucket) is a good safety measure, ensuring only single-bucket calls are processed here.
- The location-based filtering (indicesWhereOutsideRange) neatly separates valid from out-of-range requests.
- Returning Empty boxes for invalid requests is a clean approach to differentiate partial from completely missing data.
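For orientation, a hedged sketch of the control flow the bullets above describe; the helper names toReadInstruction and isOutsideRange are hypothetical, and the real method (lines 56-101 of BinaryDataService.scala) differs in detail:

```scala
import com.scalableminds.util.tools.Fox
import net.liftweb.common.{Box, Empty}
import scala.concurrent.ExecutionContext

def handleMultipleBucketRequests(requests: Seq[DataServiceDataRequest])(
    implicit ec: ExecutionContext): Fox[Seq[Box[Array[Byte]]]] = {
  // Safety measure: this fast path only handles single-bucket requests.
  require(requests.forall(_.isSingleBucket))
  val instructions = requests.map(toReadInstruction) // hypothetical conversion helper
  // Indices of requests whose bucket lies outside the layer's bounding box.
  val outside = instructions.zipWithIndex.collect { case (instr, i) if isOutsideRange(instr) => i }.toSet
  val inRange = instructions.zipWithIndex.collect { case (instr, i) if !outside(i) => instr }
  for {
    loaded <- bucketProvider.loadMultiple(inRange)
  } yield {
    // Re-merge: out-of-range positions get Empty boxes, the rest get the loaded boxes in order.
    val it = loaded.iterator
    instructions.indices.map(i => if (outside(i)) Empty else it.next())
  }
}
```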
101-124: Refactoring data conversion via convertIfNecessary and convertAccordingToRequest.

- This approach keeps the code DRY and less error-prone, isolating conversion logic.
- The fallback to Fox.successful(inputArray) when no agglomeration is needed or halfByte is false is straightforward.
- Ensure the error strings passed to ?~> remain user-friendly and actionable.
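A minimal sketch of the convertIfNecessary shape this describes, assuming the fallback-to-input behavior named in the second bullet (the real signature may differ):

```scala
// Apply a conversion (e.g. agglomerate mapping or half-byte packing) only when
// needed; otherwise hand the input array back unchanged.
private def convertIfNecessary(isNecessary: Boolean, inputArray: Array[Byte])(
    conversion: Array[Byte] => Fox[Array[Byte]])(implicit ec: ExecutionContext): Fox[Array[Byte]] =
  if (isNecessary) conversion(inputArray) else Fox.successful(inputArray)
```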
132-133: Applying convertAccordingToRequest again is consistent.

The explicit second call to convertAccordingToRequest after handleDataRequest maintains uniform conversion logic. No immediate issues.
Neat ✨, couldn't find any bugs.

I think I found potential places in the code that could use a parallel for-loop instead of serial processing. Please see my comments and tell me what you think.

Testing also went well :D

But please keep in mind that I did not review the JNI part. If I should do this as well, please just tell me 👍
Maybe #8482 is already a solution to my parallel for loop suggestions (instead of
Actionable comments posted: 3
🧹 Nitpick comments (4)
webknossos-jni/src/bucketScanner.cpp (4)
95-95: Inconsistent use of NULL vs nullptr

The code uses NULL here but nullptr in other parts of the file. While both work, consistency is better for maintainability.

```diff
- jbyte * bucketBytes = env -> GetByteArrayElements(bucketBytesJavaArray, NULL);
+ jbyte * bucketBytes = env -> GetByteArrayElements(bucketBytesJavaArray, nullptr);
```
128-128: Inconsistent use of NULL vs nullptr

For consistency with the rest of the codebase, use nullptr instead of NULL.

```diff
- jbyte * bucketBytes = env -> GetByteArrayElements(bucketBytesJavaArray, NULL);
+ jbyte * bucketBytes = env -> GetByteArrayElements(bucketBytesJavaArray, nullptr);
```
132-147: Consider validating the bucketLength parameter

The code doesn't validate that bucketLength is positive before using it in for-loops. If bucketLength is zero or negative, this could lead to unexpected behavior.

Add a validation check at the beginning of the try block:

```diff
 try {
+    if (bucketLength <= 0) {
+      throw std::invalid_argument("bucketLength must be positive");
+    }
     const size_t elementCount = getElementCount(inputLengthBytes, bytesPerElement);
```
132-147: Consider optimizing the bounding box calculation

The current implementation uses nested loops to iterate through every voxel in the bucket, which might be inefficient for large buckets with few matching voxels.

A potential optimization would be to only iterate through voxels that match the segment ID, rather than checking every voxel in the bucket. This could be accomplished by first identifying the indices of matching voxels and then calculating their coordinates, or by using a more efficient data structure to store voxel positions.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- CHANGELOG.unreleased.md (1 hunks)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/NativeBucketScanner.scala (1 hunks)
- webknossos-jni/src/bucketScanner.cpp (2 hunks)
- webknossos-jni/src/include/com_scalableminds_webknossos_datastore_helpers_NativeBucketScanner.h (1 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingService.scala (4 hunks)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- CHANGELOG.unreleased.md
- webknossos-jni/src/include/com_scalableminds_webknossos_datastore_helpers_NativeBucketScanner.h
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/NativeBucketScanner.scala
🧰 Additional context used
🧬 Code Definitions (2)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala (3)
- webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/TemporaryTracingService.scala (1): isTemporaryTracing (100-103)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/requests/DataServiceRequests.scala (1): DataServiceDataRequest (17-24)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (1): handleMultipleBucketRequests (56-101)

webknossos-jni/src/bucketScanner.cpp (3)
- webknossos-jni/src/include/jniutils.h (1): throwRuntimeException (6-6)
- webknossos-jni/src/jniutils.cpp (2): throwRuntimeException (3-9), throwRuntimeException (3-3)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (2): bytesPerElement (101-113), isSigned (67-82)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: circleci_build
🔇 Additional comments (8)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala (3)
44-44: Consistent error logging enhancement.

Good addition of the datasetErrorLoggingService parameter, which aligns with the PR objective to enhance error logging for bucket loading in the tracing store.

73-73: Proper dependency injection.

Correctly updated the binaryDataService initialization to use the injected datasetErrorLoggingService instead of None, enabling better error logging for bucket loading operations.

510-527: Implemented Multi-Get feature for improved segment statistics performance.

This new method effectively implements the Multi-Get feature from FossilDB mentioned in the PR objectives. By using handleMultipleBucketRequests instead of handleDataRequests, you achieve the performance improvement (approximately a 10x speedup) for segment volume and bounding box requests when no fallback volume annotations are present.

The implementation mirrors the structure of the existing data method while providing the necessary functional differences to support batched bucket requests.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingService.scala (4)
31-31: Appropriate import for error logging service.

Correctly added the import for TSDatasetErrorLoggingService to support the enhanced error logging.

98-98: Consistent error logging enhancement.

Good addition of the datasetErrorLoggingService parameter to the constructor, which maintains consistency with the changes in VolumeTracingService and supports the PR's error logging improvements.

114-114: Proper dependency injection.

Updated the binaryDataService initialization to use the injected datasetErrorLoggingService instead of None, consistent with the changes in VolumeTracingService.

220-229: Implemented Multi-Get feature for improved performance.

Added a parallel implementation of the Multi-Get feature, ensuring consistency across the codebase. This method mirrors the implementation in VolumeTracingService.dataBucketBoxes and will provide the same performance benefits when handling multiple bucket requests.

The implementation follows the established pattern of the existing volumeData method while adapting it to use the more efficient handleMultipleBucketRequests method.

webknossos-jni/src/bucketScanner.cpp (1)
91-117: Good implementation of the new native functions

The implementation of both new functions follows the established patterns in the codebase and correctly handles exceptions with appropriate error messages. The functions make good use of existing helper methods (getElementCount and segmentIdAtIndex) for consistency.

Also applies to: 120-161
Thanks for your review @MichaelBuessemeyer, I addressed most parts. Please see my comment at #8482 (comment) concerning the serialCombined/parallelization considerations. @normanrz could you please have a look at the C++ changes again?
Actionable comments posted: 0
♻️ Duplicate comments (1)
webknossos-jni/src/bucketScanner.cpp (1)
149-153: ⚠️ Potential issue: Memory leak due to missing byte array release in the success path

The function properly releases the byte array elements in error paths (lines 155 and 159), but is missing the release in the success path. This will cause memory leaks.

```cpp
env->ReleaseByteArrayElements(bucketBytesJavaArray, bucketBytes, 0);
jintArray resultAsJIntArray = env->NewIntArray(bbox.size());
env->SetIntArrayRegion(resultAsJIntArray, 0, bbox.size(), reinterpret_cast<const jint *>(bbox.data()));
return resultAsJIntArray;
```

Wait, I see that line 149 already contains the release, so this issue has been fixed from previous review comments. This implementation correctly releases the byte array before returning.
🧹 Nitpick comments (1)
webknossos-jni/src/bucketScanner.cpp (1)
133-148: Consider optimizing the triple-nested loop for better performance

The current implementation uses a triple-nested loop to iterate through all voxels in the bucket. For large buckets, this could be a performance bottleneck. Since you're only interested in voxels matching the segmentId, consider optimizing:

```diff
- for (int x = 0; x < bucketLength; x++) {
-   for (int y = 0; y < bucketLength; y++) {
-     for (int z = 0; z < bucketLength; z++) {
-       int index = z * bucketLength * bucketLength + y * bucketLength + x;
-       int64_t currentValue = segmentIdAtIndex(bucketBytes, index, bytesPerElement, isSigned);
-       if (currentValue == segmentId) {
-         bbox[0] = std::min(bbox[0], x + bucketTopLeftX);
-         bbox[1] = std::min(bbox[1], y + bucketTopLeftY);
-         bbox[2] = std::min(bbox[2], z + bucketTopLeftZ);
-         bbox[3] = std::max(bbox[3], x + bucketTopLeftX);
-         bbox[4] = std::max(bbox[4], y + bucketTopLeftY);
-         bbox[5] = std::max(bbox[5], z + bucketTopLeftZ);
-       }
-     }
-   }
- }
+ // Single loop approach for potentially better cache locality
+ const size_t totalVoxels = bucketLength * bucketLength * bucketLength;
+ for (size_t i = 0; i < totalVoxels; i++) {
+   int64_t currentValue = segmentIdAtIndex(bucketBytes, i, bytesPerElement, isSigned);
+   if (currentValue == segmentId) {
+     // Convert linear index to 3D coordinates
+     int z = i / (bucketLength * bucketLength);
+     int y = (i % (bucketLength * bucketLength)) / bucketLength;
+     int x = i % bucketLength;
+
+     bbox[0] = std::min(bbox[0], x + bucketTopLeftX);
+     bbox[1] = std::min(bbox[1], y + bucketTopLeftY);
+     bbox[2] = std::min(bbox[2], z + bucketTopLeftZ);
+     bbox[3] = std::max(bbox[3], x + bucketTopLeftX);
+     bbox[4] = std::max(bbox[4], y + bucketTopLeftY);
+     bbox[5] = std::max(bbox[5], z + bucketTopLeftZ);
+   }
+ }
```

This single-loop approach may have better cache locality and could be faster for large datasets.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/SegmentIndexFileService.scala (4 hunks)
- webknossos-jni/src/bucketScanner.cpp (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/SegmentIndexFileService.scala
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: circleci_build
🔇 Additional comments (4)
webknossos-jni/src/bucketScanner.cpp (4)
5-5: Added vector include for the new bounding box implementation

The include for the standard vector library is appropriately added to support the bounding box vector in the new extendSegmentBoundingBox function.

82-82: Improved exception messages with function context

Adding the function name to exception messages is a good improvement for debuggability, allowing errors to be more easily traced to their source.

Also applies to: 86-86

91-118: Input validation needed for segmentId in countSegmentVoxels

The function correctly implements voxel counting for a specific segment ID, but it's missing input validation for the segmentId parameter.

While the function handles exceptions from getElementCount and properly releases JNI resources in all paths, it should validate that segmentId is non-negative before processing, as negative segment IDs likely don't make sense in this context.

```diff
 try {
+    if (segmentId < 0) {
+      throw std::invalid_argument("segmentId cannot be negative");
+    }
     const size_t elementCount = getElementCount(inputLengthBytes, bytesPerElement);
```

Note that the existing segmentIdAtIndex function handles validation for bytesPerElement, but explicit validation would improve robustness.

121-163: Input validation needed for segmentId and nested loops in extendSegmentBoundingBox

The bounding box extension function has several areas for improvement:

- Missing validation for segmentId, similar to the previous function
- The triple-nested loop could be a performance concern for large buckets
- There's no validation of bucketLength to ensure it's positive

```diff
 try {
+    if (segmentId < 0) {
+      throw std::invalid_argument("segmentId cannot be negative");
+    }
+    if (bucketLength <= 0) {
+      throw std::invalid_argument("bucketLength must be positive");
+    }
     const size_t elementCount = getElementCount(inputLengthBytes, bytesPerElement);
```

Additionally, consider adding a check to verify that elementCount == bucketLength^3 to ensure data consistency.
Actionable comments posted: 1
♻️ Duplicate comments (1)
webknossos-jni/src/bucketScanner.cpp (1)

152-154: ⚠️ Potential issue: Missing byte array release before returning.

The function fails to release the byte array reference before returning in the success path, which could cause memory leaks.

Apply the following fix:

```diff
 jintArray resultAsJIntArray = env -> NewIntArray(bbox.size());
 env -> SetIntArrayRegion(resultAsJIntArray, 0, bbox.size(), reinterpret_cast < const jint * > (bbox.data()));
+env->ReleaseByteArrayElements(bucketBytesJavaArray, bucketBytes, 0);
 return resultAsJIntArray;
```
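As a design note, a small RAII wrapper (a sketch, not part of this PR) would make such release calls automatic on every exit path, so neither the success branch nor the catch blocks need to remember them:

```cpp
#include <jni.h>

// Releases the pinned byte array when the guard leaves scope, on every path.
class ByteArrayGuard {
public:
  ByteArrayGuard(JNIEnv *env, jbyteArray array)
      : env_(env), array_(array), elements_(env->GetByteArrayElements(array, nullptr)) {}
  ~ByteArrayGuard() {
    if (elements_ != nullptr) env_->ReleaseByteArrayElements(array_, elements_, 0);
  }
  jbyte *get() const { return elements_; }
  // Non-copyable: the guard owns the pinned array for its scope.
  ByteArrayGuard(const ByteArrayGuard &) = delete;
  ByteArrayGuard &operator=(const ByteArrayGuard &) = delete;

private:
  JNIEnv *env_;
  jbyteArray array_;
  jbyte *elements_;
};
```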
🧹 Nitpick comments (1)
webknossos-jni/src/bucketScanner.cpp (1)
120-165: Well-optimized loop ordering for cache locality.

The z,y,x loop ordering matches the bucket's Fortran ordering, providing good cache performance. This is well documented with the comment on line 146.

However, there is still a need for input validation like in the previous function.

Consider adding input validation at the function entry:

```diff
 (JNIEnv * env, jobject instance, jbyteArray bucketBytesJavaArray, jint bytesPerElement,
  jboolean isSigned, jint bucketLength, jlong segmentId,
  jint bucketTopLeftX, jint bucketTopLeftY, jint bucketTopLeftZ,
  jint existingBBoxTopLeftX, jint existingBBoxTopLeftY, jint existingBBoxTopLeftZ,
  jint existingBBoxBottomRightX, jint existingBBoxBottomRightY, jint existingBBoxBottomRightZ) {
+  // Validate bytesPerElement
+  if (bytesPerElement != 1 && bytesPerElement != 2 && bytesPerElement != 4 && bytesPerElement != 8) {
+    throwRuntimeException(env, "Native Exception in BucketScanner extendSegmentBoundingBox: unsupported bytesPerElement value");
+    return nullptr;
+  }
   jsize inputLengthBytes = env -> GetArrayLength(bucketBytesJavaArray);
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- CHANGELOG.unreleased.md (1 hunks)
- webknossos-jni/src/bucketScanner.cpp (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- CHANGELOG.unreleased.md
🧰 Additional context used
🧬 Code Definitions (1)
webknossos-jni/src/bucketScanner.cpp (3)
- webknossos-jni/src/include/jniutils.h (1): throwRuntimeException (6-6)
- webknossos-jni/src/jniutils.cpp (2): throwRuntimeException (3-9), throwRuntimeException (3-3)
- webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/datasource/DataLayer.scala (2): bytesPerElement (101-113), isSigned (67-82)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: circleci_build
🔇 Additional comments (2)
webknossos-jni/src/bucketScanner.cpp (2)
5-5: Include addition is appropriate.

This include is necessary for using std::array in the extendSegmentBoundingBox function.

82-82: Improved error messaging.

Adding the function name to the exception messages provides better context for debugging.

Also applies to: 86-86
Awesome, thanks. Scala changes look good 👍. Should I retest, or did you already do plenty of testing? At least the Scala changes look like nothing really changed.
Should be fine, thanks! I'm just waiting on the C++ re-review then :)
C++ looks good
- BinaryDataService now offers handleMultipleBucketRequests, which forwards this to the BucketProviders. The VolumeTracingBucketProvider implements this with a FossilDB Multi-Get (see the sketch below).
- The datasetErrorLoggingService param of the BinaryDataService is no longer optional.
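A hedged sketch of the shape this override takes, based on the review notes earlier in the thread (empty-check first, version taken from the first instruction); buildBucketKey and getMultipleKeys are hypothetical names for the FossilDB-backed helpers:

```scala
// VolumeTracingBucketProvider sketch: fetch all bucket keys in one FossilDB round trip.
override def loadMultiple(readInstructions: Seq[DataReadInstruction])(
    implicit ec: ExecutionContext, tc: TokenContext): Fox[Seq[Box[Array[Byte]]]] =
  if (readInstructions.isEmpty) Fox.successful(Seq.empty)
  else {
    val version = readInstructions.head.version // assumes all instructions in a batch share one version
    val keys = readInstructions.map(buildBucketKey) // hypothetical key builder
    volumeDataStore.getMultipleKeys(keys, version) // hypothetical multi-get wrapper
  }
```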
Steps to test:
TODOs:
Issues: