
Use multi-get; native bucket scanner in segment stats computation #8469


Merged
fm3 merged 67 commits into master from volume-stats-perf on Apr 8, 2025

Conversation

@fm3 fm3 (Member) commented Mar 26, 2025

  • BinaryDataService now offers handleMultipleBucketRequests, which forwards batched single-bucket requests to the BucketProviders. The VolumeTracingBucketProvider implements this with a FossilDB Multi-Get (a sketch of the idea follows this list).
  • The segment stats computation now uses that.
  • Also, instead of Scala-based voxel counting, it now uses the nativeBucketScanner (two new functions were implemented there).
  • I’m getting about 10x speedup for segment volume and bbox requests with this (on no-fallback volume annotations).
  • Added bucket loading error logging also for the tracingstore (datasetErrorLoggingService param of the BinaryDataService is no longer optional).
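
A minimal sketch of the multi-get idea, under assumed names (fossilDbClient.multiGet and buildBucketKey are hypothetical stand-ins for the tracingstore's actual FossilDB client call and key construction):

    // Sketch only: batch all single-bucket reads into one FossilDB round trip.
    override def loadMultiple(readInstructions: Seq[DataReadInstruction])(
        implicit ec: ExecutionContext,
        tc: TokenContext): Fox[Seq[Box[Array[Byte]]]] =
      if (readInstructions.isEmpty) Fox.successful(Seq.empty)
      else
        // one Multi-Get round trip instead of one get per bucket
        fossilDbClient.multiGet(readInstructions.map(buildBucketKey))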

Steps to test:

TODOs:

  • Seems broken for volumes with fallback segmentation, I’m always getting 0
  • Also implement for BoundingBox request
  • Also implement for editable mapping volume layers
  • Clean up abstractions
  • Re-add error logging and mag/layer checks to handleMultipleBucketRequests
  • Also use multi-get if a multi-bucket area was requested (e.g. for ad-hoc meshing)?

Issues:


  • Updated changelog
  • Removed dev-only changes like prints and application.conf edits
  • Considered common edge cases
  • Needs datastore update after deployment

@fm3 fm3 self-assigned this Mar 26, 2025
coderabbitai bot (Contributor) commented Mar 26, 2025

📝 Walkthrough

This pull request introduces a variety of changes across the codebase. A new changelog entry highlights performance improvements for segment statistics. Methods and parameters have been updated to accept broader sequence types, and error logging services have been refactored and replaced with more specific implementations. New methods have been added for handling multiple bucket data requests, and JNI interfaces have been extended with functions for voxel counting and bounding box extension. Dependency injections and method signatures in both datastore and tracing store components have been updated accordingly.

Changes

File(s) and change summary:

  • CHANGELOG.unreleased.md: Added a new entry under "Added" detailing performance improvements for segment statistics (volume and bounding box enhancements in the context menu).
  • util/.../Fox.scala: Changed the serialSequenceBox parameter type from List[A] to Seq[A], converting via toList internally.
  • webknossos-datastore/.../DataStoreModule.scala, .../DataSourceController.scala, .../BinaryDataService.scala, .../BinaryDataServiceHolder.scala, .../DatasetErrorLoggingService.scala: Replaced bindings and parameter types from DatasetErrorLoggingService to DSDatasetErrorLoggingService; refactored the error logging service into a trait with a concrete implementation.
  • webknossos-datastore/.../BucketProvider.scala, .../SegmentStatistics.scala, .../SegmentIndexFileService.scala, .../DataServiceRequests.scala: Added a new loadMultiple method to handle sequences of data read instructions; updated SegmentStatistics methods to use batch data retrieval and modified method names/signatures accordingly; added isSingleBucket in data service requests.
  • webknossos-datastore/.../NativeBucketScanner.scala, webknossos-jni/src/bucketScanner.cpp, webknossos-jni/src/include/.../NativeBucketScanner.h: Introduced new native methods for counting segment voxels and extending segment bounding boxes, with improved error handling and updated exception messages.
  • webknossos-tracingstore/.../TracingStoreModule.scala, .../EditableMappingService.scala, .../tracings/volume/TSDatasetErrorLoggingService.scala, .../tracings/volume/VolumeSegmentStatisticsService.scala, .../tracings/volume/VolumeTracingLayer.scala, .../tracings/volume/VolumeTracingService.scala: Added new bindings for TSDatasetErrorLoggingService and updated dependency injections; modified constructors to inject error logging services; introduced new methods (volumeDataBucketBoxes, loadMultiple, dataBucketBoxes) to handle batch bucket requests and streamline data retrieval.

Suggested labels

enhancement

Suggested reviewers

  • normanrz

Poem

I hopped through lines of code so light,
Tweaking buckets and boxes with delight.
New methods bloom like carrots in the field,
Error logs and sequences have been healed.
With every update, my whiskers twitch in glee—
A rabbit’s joy in a world of clarity!
🥕🐇


@fm3 fm3 changed the title use multi-get; native bucket scanner in segment stats computation WIP use multi-get; native bucket scanner in segment stats computation Mar 26, 2025
Base automatically changed from segment-index-perf to master April 3, 2025 06:26
@fm3 fm3 changed the title WIP use multi-get; native bucket scanner in segment stats computation Use multi-get; native bucket scanner in segment stats computation Apr 3, 2025
@fm3 fm3 marked this pull request as ready for review April 3, 2025 08:58
@fm3 fm3 requested a review from MichaelBuessemeyer April 3, 2025 08:58
@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (6)
CHANGELOG.unreleased.md (1)

17-17: Ensure Consistent and Clear Documentation of Performance Improvements

The changelog entry added on line 17 clearly documents the new performance improvements for segment statistics (volume and bounding box in context menu) and references PR [#8469]. Please verify that the terminology used here aligns with the implementations in the SegmentStatistics.scala and related components. If desired, consider adding a brief note about the use of the Multi-Get feature or native bucket scanner to further highlight the technical improvements.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentStatisticsService.scala (1)

60-88: Implemented multi-position bucket data retrieval.

The new getDataForBucketPositions method efficiently processes multiple bucket positions in a single request, aligning with the PR objective of implementing multi-get functionality. The method properly handles both editable mapping and standard volume tracing cases.

Two minor optimization opportunities:

  1. The .toList conversion at line 78 may be unnecessary if WebknossosDataRequest accepts Seq.
  2. Consider adding a check for empty bucketPositions to potentially short-circuit the processing.
private def getDataForBucketPositions(annotationId: String, tracingId: String)(
    bucketPositions: Seq[Vec3Int],
    mag: Vec3Int,
    additionalCoordinates: Option[Seq[AdditionalCoordinate]])(
    implicit tc: TokenContext,
    ec: ExecutionContext): Fox[(Seq[Box[Array[Byte]]], ElementClass.Value)] =
  for {
    tracing <- annotationService.findVolume(annotationId, tracingId) ?~> "tracing.notFound"
+   // illustrative: short-circuit here when bucketPositions.isEmpty instead of issuing an empty request
    dataRequests = bucketPositions.map { position =>
      WebknossosDataRequest(
        position = position * mag * DataLayer.bucketLength,
        mag = mag,
        cubeSize = DataLayer.bucketLength,
        fourBit = Some(false),
        applyAgglomerate = None,
        version = None,
        additionalCoordinates = additionalCoordinates
      )
-   }.toList
+   }
    bucketDataBoxes <- if (tracing.getHasEditableMapping) {
      val mappingLayer = annotationService.editableMappingLayer(annotationId, tracingId, tracing)
      editableMappingService.volumeDataBucketBoxes(mappingLayer, dataRequests)
    } else
      volumeTracingService.dataBucketBoxes(annotationId,
                                           tracingId,
                                           tracing,
                                           dataRequests,
                                           includeFallbackDataIfAvailable = true)
  } yield (bucketDataBoxes, elementClassFromProto(tracing.elementClass))
webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/BucketProvider.scala (1)

13-17: Good implementation of the loadMultiple method

The method provides a default implementation for loading multiple data read instructions by processing each instruction serially with Fox.serialSequenceBox. This creates a foundation that concrete providers can override with more optimized implementations.

Consider adding a comment indicating this is a default implementation that can be overridden by bucket providers with more efficient batch loading capabilities.

  def loadMultiple(readInstructions: Seq[DataReadInstruction])(implicit ec: ExecutionContext,
                                                               tc: TokenContext): Fox[Seq[Box[Array[Byte]]]] =
+   // Default implementation that processes instructions serially.
+   // Concrete providers can override this with more efficient batch loading.
    Fox.serialSequenceBox(readInstructions) { readInstruction =>
      load(readInstruction)
    }
webknossos-jni/src/bucketScanner.cpp (3)

81-85: Informative error messages.

Throwing descriptive runtime exceptions in both catch blocks with details about the native function helps with debugging. Make sure these messages don't leak sensitive or excessive information, especially in production logs.


90-116: countSegmentVoxels function logic is correct and performance is acceptable.

  1. The loop and counter increment seem correct for counting occurrences of segmentId.
  2. Consider early release of bucketBytes before returning to neatly ensure the JNI memory is freed. Currently, memory is only released on exceptions.
 ...
 try {
   ...
+  // release bytes here before returning
+  env->ReleaseByteArrayElements(bucketBytesJavaArray, bucketBytes, 0);
   return segmentVoxelCount;
 } catch (const std::exception &e) {
 ...

119-160: extendSegmentBoundingBox is methodically implemented.

  1. The triple-nested loop is logically correct for scanning each voxel in a bucket.
  2. Updates to the bounding box are consistent with typical min/max bounding logic.
  3. Similar to countSegmentVoxels, consider releasing bucketBytes prior to returning from the try block to avoid relying solely on catch blocks.
...
 try {
   ...
+  env->ReleaseByteArrayElements(bucketBytesJavaArray, bucketBytes, 0);
   return resultAsJIntArray;
 } catch (const std::exception &e) {
...
🔇 Additional comments (51)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeSegmentStatisticsService.scala (5)

7-9: Imports updated for native bucket scanning.

The imports have been updated to include NativeBucketScanner, SegmentStatistics, and WebknossosDataRequest. These changes align with the PR objective of replacing Scala-based voxel counting with a native bucket scanner and supporting multi-get functionality.


13-13: Changed Box import to Lift's implementation.

The import for Box has been changed from Scala's implementation to Lift's. This change supports the multi-get functionality as Lift's Box is used by the underlying data retrieval methods.


26-27: Added native bucket scanner.

Addition of the NativeBucketScanner as a lazy val is a good implementation choice. This ensures the native scanner is only initialized when needed and aligns with the PR objective of replacing Scala-based voxel counting with a more efficient native implementation.


41-41: Updated method call to use batch bucket data retrieval.

The method call has been updated to use the new getDataForBucketPositions method instead of the previous single-position method. This change enables batch processing of bucket positions, which should lead to the expected 10x performance improvement mentioned in the PR objectives.


57-57: Updated method call to use batch bucket data retrieval for bounding box calculation.

Similar to the volume calculation update, the bounding box calculation now uses the batch processing method getDataForBucketPositions. This consistency ensures that both volume and bounding box calculations benefit from the performance improvements.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/SegmentStatistics.scala (7)

6-9: Imports look fine.
These additional imports appear valid and are used in the code below (e.g., tryo from net.liftweb.common.Box). No concerns here.


24-25: Introduction of bucketScanner trait.
Defining bucketScanner as a protected member is a clear way to encapsulate the native scanning logic. Ensure proper initialization in concrete classes or mixins.


26-34: Check large-scale data handling.
Using a batch-oriented parameter (getDataForBucketPositions) for volume calculation is beneficial for performance. However, if this needs to handle very large sets of buckets, confirm that memory usage and performance remain acceptable.


39-49: Ensure sequential execution is appropriate.
This volume-counting step runs serially via Fox.serialCombined. Verify that the expected runtime is acceptable for large bucket lists. If parallelization is desired, consider verifying concurrency safety first, as the current approach looks strictly sequential.
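
For reference, a sketch of the serial traversal pattern under discussion; Fox.serialCombined is the utility named above, while scanBucket is a hypothetical wrapper around the native scanner call:

    import net.liftweb.common.{Box, Full}

    // Each bucket is processed only after the previous Fox completes (strictly sequential).
    def countVoxelsSerially(bucketBoxes: List[Box[Array[Byte]]], segmentId: Long)(
        implicit ec: ExecutionContext): Fox[Long] =
      for {
        perBucketCounts <- Fox.serialCombined(bucketBoxes) {
          case Full(bytes) => Fox.successful(scanBucket(bytes, segmentId)) // count matching voxels
          case _           => Fox.successful(0L)                           // missing bucket contributes zero
        }
      } yield perBucketCounts.sum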


53-62: Bounding box signature update looks consistent.
The new parameters and approach for retrieving data in batches mirror the volume calculation changes. This is consistent and should help unify the data retrieval logic.


72-81: Bucket-by-bucket bounding box extension.
Serially processing each bucket with extendBoundingBoxByData is straightforward. No immediate concerns.


109-134: Potential concurrency consideration.
extendBoundingBoxByData updates a shared mutable structure (boundingBoxMutable). Because it's invoked within Fox.serialCombined, it should be safe in serial. If there's ever a parallel approach, this would need synchronization or a refactor.
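
For illustration, the min/max update the comment refers to might look like this (field names assumed; safe only because execution is serial):

    // Mutable bbox as [minX, minY, minZ, maxX, maxY, maxZ], mutated from within
    // Fox.serialCombined, which processes one bucket at a time.
    val boundingBoxMutable = scala.collection.mutable.ArrayBuffer(
      Int.MaxValue, Int.MaxValue, Int.MaxValue, Int.MinValue, Int.MinValue, Int.MinValue)

    def extendBy(x: Int, y: Int, z: Int): Unit = {
      boundingBoxMutable(0) = math.min(boundingBoxMutable(0), x)
      boundingBoxMutable(1) = math.min(boundingBoxMutable(1), y)
      boundingBoxMutable(2) = math.min(boundingBoxMutable(2), z)
      boundingBoxMutable(3) = math.max(boundingBoxMutable(3), x)
      boundingBoxMutable(4) = math.max(boundingBoxMutable(4), y)
      boundingBoxMutable(5) = math.max(boundingBoxMutable(5), z)
    }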

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/TSDatasetErrorLoggingService.scala (1)

1-17: New dataset error logging service.
This class cleanly extends DatasetErrorLoggingService. Marking applicationHealthService as None is acceptable for now, but if future health checks are needed, consider providing a dedicated service object. No issues otherwise.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/models/requests/DataServiceRequests.scala (1)

22-24: Helper method for single-bucket detection.
The logic to verify if the cuboid's size matches DataLayer.bucketLength is straightforward. This should help differentiate single-bucket vs bulk requests.
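
A plausible shape for that helper, assuming the request exposes its cuboid dimensions (exact field names may differ):

    // True iff the requested cuboid spans exactly one bucket edge length per dimension.
    def isSingleBucket: Boolean =
      cuboid.width == DataLayer.bucketLength &&
        cuboid.height == DataLayer.bucketLength &&
        cuboid.depth == DataLayer.bucketLength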

webknossos-datastore/app/com/scalableminds/webknossos/datastore/dataformats/BucketProvider.scala (1)

6-6: Added import for Box from net.liftweb.common

This import is necessary for the new loadMultiple method's return type.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/TracingStoreModule.scala (2)

9-9: Updated import statement to include TSDatasetErrorLoggingService

Consolidated import statement to include both VolumeTracingService and TSDatasetErrorLoggingService from the same package.


29-29: Added eager singleton binding for TSDatasetErrorLoggingService

The new binding registers TSDatasetErrorLoggingService as an eager singleton, ensuring it's initialized at application startup. This aligns with the PR objective of enhancing error logging for bucket loading in the tracing store.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/controllers/DataSourceController.scala (1)

46-46: Updated parameter type to DSDatasetErrorLoggingService

Changed the injected service type from what was likely DatasetErrorLoggingService to the more specific DSDatasetErrorLoggingService. This is part of the larger effort to make the error logging service mandatory and enhance error handling capabilities.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (2)

16-16: Added import for Box from net.liftweb.common

This import is necessary for the new loadMultiple method's return type.


39-48:

✅ Verification successful

Optimized implementation of loadMultiple

This override of the loadMultiple method from the BucketProvider trait provides a more efficient implementation for loading multiple buckets simultaneously. It checks for empty requests first, then extracts the version from the first instruction and uses it for all buckets. This implementation likely leverages the Multi-Get feature from FossilDB mentioned in the PR objectives, which should significantly improve performance.

Verify that in practical use, all read instructions in a batch would have the same version, since this implementation uses only the version from the first instruction.


🏁 Script executed:

#!/bin/bash
# Check if DataReadInstruction objects typically have the same version when processed in batches

# Search for places where multiple DataReadInstruction objects are created
rg -A 5 "Seq\[DataReadInstruction\]" --include="*.scala"

# Check for any code that might set different versions
rg -A 3 "DataReadInstruction.*version" --include="*.scala"

Length of output: 1003


Below is a new shell script using corrected flags to re-run the search for any indications that batches of read instructions might have differing versions:

Attention: Verify version consistency in loadMultiple method

The optimized implementation of loadMultiple in VolumeTracingLayer.scala improves performance by leveraging FossilDB’s multi-get feature; however, it always extracts the version from only the first read instruction. Please verify that, in practical usage, all DataReadInstruction instances in a batch share the same version to avoid potential issues.

  • File: webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (Lines 39–48)
  • Action: Double-check through manual code review or tests that no code paths create a batch of read instructions with differing versions.

🏁 Script executed:

#!/bin/bash
# Search for occurrences of multiple DataReadInstruction objects in Scala files and check their version settings
rg -A 5 "Seq\[DataReadInstruction\]" -g "*.scala"
rg -A 3 "DataReadInstruction.*version" -g "*.scala"

Length of output: 3645


Optimized implementation verified—please ensure all batches maintain version consistency

After verifying across the codebase, we did not find any indication that batches of DataReadInstruction instances are created with differing versions. The relevant searches in both the VolumeTracingLayer.scala and the related datastore files consistently show version retrieval based on the settings (e.g., request.settings.version in BinaryDataService.scala) and no alternative logic for setting distinct versions within a single batch. Therefore, the design decision to extract the version only from the first instruction in loadMultiple appears valid under current usage.

  • File: webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingLayer.scala (Lines 39–48)
  • Action: Although current evidence indicates that all read instructions in a batch share the same version, maintain vigilance in future changes to ensure that batch creation does not introduce version mismatches.
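
A defensive variant of that pattern, sketched with assumed field names (version as Option[Long]; loadBucketsWithVersion is a hypothetical helper around the actual multi-get call), would fail loudly if a batch ever mixed versions:

    val distinctVersions = readInstructions.map(_.version).distinct
    if (distinctVersions.length > 1)
      Fox.failure("loadMultiple called with mixed versions")
    else
      loadBucketsWithVersion(readInstructions, distinctVersions.headOption.flatten)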
webknossos-datastore/app/com/scalableminds/webknossos/datastore/DataStoreModule.scala (1)

28-28: Service migration from DatasetErrorLoggingService to DSDatasetErrorLoggingService

The binding has been updated to use the datastore-specific DSDatasetErrorLoggingService implementation instead of the generic DatasetErrorLoggingService. This aligns with the PR objective of enhancing error logging for bucket loading operations.

util/src/main/scala/com/scalableminds/util/tools/Fox.scala (2)

90-91: Method signature enhancement to accept broader sequence types

The method signature has been improved to accept any Seq[A] instead of only List[A], increasing the flexibility of this utility method without changing its behavior.


102-102: Necessary conversion to maintain compatibility

The implementation now converts the input sequence to a list using l.toList to ensure compatibility with the existing runNext method implementation. This is the correct approach when broadening the input parameter type.
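
Sketched from the description above (return type assumed from the Box-collecting semantics of the name):

    // Accepts any Seq now; converts once to List for the recursive runNext helper.
    def serialSequenceBox[A, B](l: Seq[A])(f: A => Fox[B])(
        implicit ec: ExecutionContext): Future[List[Box[B]]] =
      runNext(l.toList, Nil)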

webknossos-jni/src/include/com_scalableminds_webknossos_datastore_helpers_NativeBucketScanner.h (2)

18-24: New JNI method for segment voxel counting

Added a native method declaration for counting voxels within a specific segment. This aligns with the PR objective of improving performance for segment statistics calculation by implementing native code optimizations.


26-32: New JNI method for bounding box calculation

Added a native method declaration for extending segment bounding boxes. This complements the voxel counting functionality and is part of the performance optimization effort mentioned in the PR description.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/helpers/NativeBucketScanner.scala (2)

9-12: Native implementation for segment voxel counting

This new native method corresponds to the JNI declaration and provides efficient counting of voxels belonging to a specific segment ID. This replaces the previous Scala-based voxel counting implementation for better performance.


14-27: Native implementation for bounding box extension

This complex native method efficiently calculates and extends a segment's bounding box by analyzing voxel data. The method signature properly handles all necessary parameters including bucket metadata, segment ID, and existing bounding box coordinates.
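
For reference, the Scala-side declaration presumably mirrors the JNI signature quoted later in the thread; parameter names follow the C++ side, and the result encodes [minX, minY, minZ, maxX, maxY, maxZ]:

    @native def extendSegmentBoundingBox(bucketBytes: Array[Byte],
                                         bytesPerElement: Int,
                                         isSigned: Boolean,
                                         bucketLength: Int,
                                         segmentId: Long,
                                         bucketTopLeftX: Int,
                                         bucketTopLeftY: Int,
                                         bucketTopLeftZ: Int,
                                         existingBBoxTopLeftX: Int,
                                         existingBBoxTopLeftY: Int,
                                         existingBBoxTopLeftZ: Int,
                                         existingBBoxBottomRightX: Int,
                                         existingBBoxBottomRightY: Int,
                                         existingBBoxBottomRightZ: Int): Array[Int]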

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataServiceHolder.scala (2)

26-26: Constructor parameter type change to more specific implementation

The parameter type has been changed from DatasetErrorLoggingService to DSDatasetErrorLoggingService, indicating a move to a more specific implementation for the datastore context.


49-49: Parameter no longer wrapped in Option

The datasetErrorLoggingService is now passed directly to the binaryDataService constructor without being wrapped in an Option, making it a required parameter rather than optional. This aligns with the PR objective of enhancing error logging for bucket loading.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala (3)

44-44: Added datasetErrorLoggingService dependency

Added the TSDatasetErrorLoggingService as a constructor parameter, making error logging services available to this class. This aligns with the PR objective of enhancing error logging for bucket loading.


73-73: Updated BinaryDataService instantiation

The binaryDataService instantiation now includes the datasetErrorLoggingService parameter instead of None, ensuring error logging is properly integrated into the binary data service.


510-527: Added new method for optimized bucket data retrieval

The new dataBucketBoxes method provides an optimized way to retrieve bucket data by leveraging the handleMultipleBucketRequests method from binaryDataService. This implementation supports the PR's goal of achieving approximately a 10x speedup for segment volume and bounding box requests.

The method follows the same pattern as the existing data method but returns raw bucket data boxes instead of processed data, allowing for more efficient processing of multiple bucket requests in a single operation.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingService.scala (4)

25-25: Updated import to include TSDatasetErrorLoggingService

Added import for TSDatasetErrorLoggingService to support the new constructor parameter.


91-91: Added datasetErrorLoggingService dependency

Added TSDatasetErrorLoggingService as a constructor parameter, consistent with the changes in other services to make error logging mandatory.


107-107: Updated BinaryDataService instantiation

The binaryDataService instantiation now includes the datasetErrorLoggingService parameter instead of None, ensuring error logging is properly integrated.


207-216: Added volumeDataBucketBoxes method for optimized bucket data retrieval

Implemented a new method that uses handleMultipleBucketRequests from binaryDataService to efficiently retrieve bucket data in bulk. This is consistent with the similar method added to VolumeTracingService and supports the performance improvements described in the PR objectives.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/SegmentIndexFileService.scala (6)

9-10: Updated imports for native bucket scanner integration

Modified imports to include NativeBucketScanner and update the ElementClass import path, supporting the transition from Scala-based voxel counting to the native bucket scanner mentioned in the PR objectives.


16-16: Updated import path for consistency

Modified the import path for consistency with other changes related to the native bucket scanner integration.


37-38: Added NativeBucketScanner instance

Initialized a new protected bucketScanner instance of NativeBucketScanner, which will be used for the optimized segment statistics computation described in the PR objectives.


124-132: Refactored getSegmentVolume for improved performance

The method has been refactored to use calculateSegmentVolume with the new getDataForBucketPositions method, taking advantage of the multi-get feature described in the PR objectives for approximately 10x speedup.


147-147: Updated getSegmentBoundingBox to use multi-get

Updated getSegmentBoundingBox to use the new getDataForBucketPositions method instead of the previous single-bucket implementation, enabling performance improvements for bounding box requests.


156-172: Replaced getTypedDataForBucketPosition with optimized implementation

The previous method has been replaced with getDataForBucketPositions, which now accepts multiple bucket positions and returns both the bucket data and element class. This change supports the bulk processing capability needed for the performance improvements described in the PR objectives.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/DatasetErrorLoggingService.scala (4)

11-20: Extracting the applicationHealthService into a separate trait field looks good.

By making applicationHealthService optional and part of the trait, you allow implementations to decide how (and whether) to push errors. This promotes flexibility and reusability of error-logging behavior.


48-58: Implementation of withErrorLoggingMultiple is consistent with single-item error logging.

The iterative application of withErrorLogging to each Box[Array[Byte]] ensures that all errors are captured. This is a logical extension of the existing withErrorLogging pattern. The method's overall structure appears clean and error-friendly.
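
A hedged sketch of that extension (exact signatures assumed; withErrorLogging is the single-result method described above):

    // Applies the existing single-result logger to each box in turn, then passes the boxes through.
    def withErrorLoggingMultiple(dataSourceId: DataSourceId,
                                 label: String,
                                 resultBoxes: Seq[Box[Array[Byte]]])(
        implicit ec: ExecutionContext): Seq[Box[Array[Byte]]] = {
      resultBoxes.foreach(box => withErrorLogging(dataSourceId, label, box.toFox))
      resultBoxes
    }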


75-75: Ensure potential side effects are intentional.

applicationHealthService.foreach(_.pushError(e)) is a key side effect. Verify that calling it here, specifically with e from InternalError, aligns with your broader health/error reporting requirements.


88-94: Concrete implementation of DSDatasetErrorLoggingService is straightforward.

Injecting dsApplicationHealthService and returning it as Some(...) at runtime clarifies error push behavior. This class offers a clean pattern for other, similar services to extend the base trait.

webknossos-jni/src/bucketScanner.cpp (1)

5-5: Use of <vector> ensures efficient collection handling.

Including <vector> is appropriate for storing intermediate bounding box data. No concerns here.

webknossos-datastore/app/com/scalableminds/webknossos/datastore/services/BinaryDataService.scala (4)

3-3: New imports and updated constructor dependencies.

  1. import collections.SequenceUtils usage suggests you’re introducing a new utility for handling sequences – verify it’s needed in this file.
  2. Explicitly providing datasetErrorLoggingService ensures all calls can leverage consistent logging.

Also applies to: 15-15, 26-26


56-99: handleMultipleBucketRequests logic promotes succinct retrieval of multiple buckets.

  1. The check requests.forall(_.isSingleBucket) is a good safety measure, ensuring only single-bucket calls are processed here.
  2. The location-based filtering (indicesWhereOutsideRange) neatly separates valid from out-of-range requests.
  3. Returning Empty boxes for invalid requests is a clean approach to differentiate partial vs. completely missing data.
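
Taken together, a caller-side sketch of how these guarantees are consumed (the return type is assumed to be one Box per request, positionally aligned with the input):

    import net.liftweb.common.Full

    for {
      bucketBoxes <- binaryDataService.handleMultipleBucketRequests(requests)
    } yield bucketBoxes.zip(requests).map {
      case (Full(bytes), _) => bytes                // successfully loaded bucket
      case (_, _)           => Array.emptyByteArray // Empty (out of range/missing) or Failure
    }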

101-124: Refactoring data conversion via convertIfNecessary and convertAccordingToRequest.

  1. This approach keeps the code DRY and less error-prone, isolating conversion logic.
  2. The fallback to Fox.successful(inputArray) when no agglomeration is needed or halfByte is false is straightforward.
  3. Ensure the error strings passed to ?~> remain user-friendly and actionable.
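
The guard in point 2, sketched with assumed parameter names:

    // Runs the conversion only when needed; otherwise hands the input back untouched.
    def convertIfNecessary(isNecessary: Boolean,
                           inputArray: Array[Byte],
                           conversion: Array[Byte] => Fox[Array[Byte]])(
        implicit ec: ExecutionContext): Fox[Array[Byte]] =
      if (isNecessary) conversion(inputArray) else Fox.successful(inputArray)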

132-133: Applying convertAccordingToRequest again is consistent.

The explicit second call to convertAccordingToRequest after handleDataRequest maintains uniform conversion logic. No immediate issues.

@MichaelBuessemeyer MichaelBuessemeyer (Contributor) left a comment

Neat ✨, I couldn't find any bugs.

I think I found potential places in the code that could use a parallel for-loop instead of serial processing. Please see my comments and tell me what you think.

Testing also went well :D

But please keep in mind that I did not review the JNI part. If I should do this as well, please just tell me 👍

@MichaelBuessemeyer MichaelBuessemeyer (Contributor) commented

Maybe #8482 already addresses my parallel-for-loop suggestions (replacing Fox.serialCombined), as it introduces the FossilDBPutBuffer. But there might still be places in the code where Fox.serialCombined blocks instead of processing each element in a parallel for-loop.

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 3

🧹 Nitpick comments (4)
webknossos-jni/src/bucketScanner.cpp (4)

95-95: Inconsistent use of NULL vs nullptr

The code uses NULL here but nullptr in other parts of the file. While both work, consistency is better for maintainability.

-    jbyte * bucketBytes = env -> GetByteArrayElements(bucketBytesJavaArray, NULL);
+    jbyte * bucketBytes = env -> GetByteArrayElements(bucketBytesJavaArray, nullptr);

128-128: Inconsistent use of NULL vs nullptr

For consistency with the rest of the codebase, use nullptr instead of NULL.

-    jbyte * bucketBytes = env -> GetByteArrayElements(bucketBytesJavaArray, NULL);
+    jbyte * bucketBytes = env -> GetByteArrayElements(bucketBytesJavaArray, nullptr);

132-147: Consider validating bucketLength parameter

The code doesn't validate that bucketLength is positive before using it in for-loops. If bucketLength is zero or negative, this could lead to unexpected behavior.

Add a validation check at the beginning of the try block:

    try {
+       if (bucketLength <= 0) {
+           throw std::invalid_argument("bucketLength must be positive");
+       }

        const size_t elementCount = getElementCount(inputLengthBytes, bytesPerElement);

132-147: Consider optimizing the bounding box calculation

The current implementation uses nested loops to iterate through every voxel in the bucket, which might be inefficient for large buckets with few matching voxels.

A potential optimization would be to only iterate through voxels that match the segment ID, rather than checking every voxel in the bucket. This could be accomplished by first identifying the indices of matching voxels and then calculating their coordinates, or by using a more efficient data structure to store voxel positions.

🔇 Additional comments (8)
webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/volume/VolumeTracingService.scala (3)

44-44: Consistent error logging enhancement.

Good addition of the datasetErrorLoggingService parameter, which aligns with the PR objective to enhance error logging for bucket loading in the tracing store.


73-73: Proper dependency injection.

Correctly updated the binaryDataService initialization to use the injected datasetErrorLoggingService instead of None, enabling better error logging for bucket loading operations.


510-527: Implemented Multi-Get feature for improved segment statistics performance.

This new method effectively implements the Multi-Get feature from FossilDB mentioned in the PR objectives. By using handleMultipleBucketRequests instead of handleDataRequests, you achieve the performance improvement (approximately 10x speedup) for segment volume and bounding box requests when no fallback volume annotations are present.

The implementation mirrors the structure of the existing data method while providing the necessary functionality differences to support batched bucket requests.

webknossos-tracingstore/app/com/scalableminds/webknossos/tracingstore/tracings/editablemapping/EditableMappingService.scala (4)

31-31: Appropriate import for error logging service.

Correctly added the import for TSDatasetErrorLoggingService to support the enhanced error logging.


98-98: Consistent error logging enhancement.

Good addition of the datasetErrorLoggingService parameter to the constructor, which maintains consistency with the changes in VolumeTracingService and supports the PR's error logging improvements.


114-114: Proper dependency injection.

Updated the binaryDataService initialization to use the injected datasetErrorLoggingService instead of None, consistent with the changes in VolumeTracingService.


220-229: Implemented Multi-Get feature for improved performance.

Added a parallel implementation of the Multi-Get feature, ensuring consistency across the codebase. This method mirrors the implementation in VolumeTracingService.dataBucketBoxes and will provide the same performance benefits when handling multiple bucket requests.

The implementation follows the established pattern of the existing volumeData method while adapting it to use the more efficient handleMultipleBucketRequests method.

webknossos-jni/src/bucketScanner.cpp (1)

91-117: Good implementation of new native functions

The implementation of both new functions follows the established patterns in the codebase and correctly handles exceptions with appropriate error messages. The functions make good use of existing helper methods (getElementCount and segmentIdAtIndex) for consistency.

Also applies to: 120-161

@fm3 fm3 (Member, Author) commented Apr 7, 2025

Thanks for your review @MichaelBuessemeyer, I addressed most parts. Please see my comment at #8482 (comment) concerning the serialCombined/parallelization considerations.

@normanrz could you please have a look at the c++ changes again?

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
webknossos-jni/src/bucketScanner.cpp (1)

149-153: ⚠️ Potential issue (since resolved)

Memory leak due to missing byte array release in the success path

A previous review flagged that the function released the byte array elements in the error paths (lines 155 and 159) but not before returning successfully, which would leak memory. Line 149 now contains the release, so the implementation correctly frees the byte array before returning:

        env->ReleaseByteArrayElements(bucketBytesJavaArray, bucketBytes, 0);
        jintArray resultAsJIntArray = env -> NewIntArray(bbox.size());
        env -> SetIntArrayRegion(resultAsJIntArray, 0, bbox.size(), reinterpret_cast < const jint * > (bbox.data()));

        return resultAsJIntArray;

🧹 Nitpick comments (1)
webknossos-jni/src/bucketScanner.cpp (1)

133-148: Consider optimizing the triple-nested loop for better performance

The current implementation uses a triple-nested loop to iterate through all voxels in the bucket. For large buckets, this could be a performance bottleneck. Since you're only interested in voxels matching the segmentId, consider optimizing:

-       for (int x = 0; x < bucketLength; x++) {
-           for (int y = 0; y < bucketLength; y++) {
-               for (int z = 0; z < bucketLength; z++) {
-                   int index = z * bucketLength * bucketLength + y * bucketLength + x;
-                   int64_t currentValue = segmentIdAtIndex(bucketBytes, index, bytesPerElement, isSigned);
-                   if (currentValue == segmentId) {
-                       bbox[0] = std::min(bbox[0], x + bucketTopLeftX);
-                       bbox[1] = std::min(bbox[1], y + bucketTopLeftY);
-                       bbox[2] = std::min(bbox[2], z + bucketTopLeftZ);
-                       bbox[3] = std::max(bbox[3], x + bucketTopLeftX);
-                       bbox[4] = std::max(bbox[4], y + bucketTopLeftY);
-                       bbox[5] = std::max(bbox[5], z + bucketTopLeftZ);
-                   }
-               }
-           }
-       }
+       // Single loop approach for potentially better cache locality
+       const size_t totalVoxels = bucketLength * bucketLength * bucketLength;
+       for (size_t i = 0; i < totalVoxels; i++) {
+           int64_t currentValue = segmentIdAtIndex(bucketBytes, i, bytesPerElement, isSigned);
+           if (currentValue == segmentId) {
+               // Convert linear index to 3D coordinates
+               int z = i / (bucketLength * bucketLength);
+               int y = (i % (bucketLength * bucketLength)) / bucketLength;
+               int x = i % bucketLength;
+               
+               bbox[0] = std::min(bbox[0], x + bucketTopLeftX);
+               bbox[1] = std::min(bbox[1], y + bucketTopLeftY);
+               bbox[2] = std::min(bbox[2], z + bucketTopLeftZ);
+               bbox[3] = std::max(bbox[3], x + bucketTopLeftX);
+               bbox[4] = std::max(bbox[4], y + bucketTopLeftY);
+               bbox[5] = std::max(bbox[5], z + bucketTopLeftZ);
+           }
+       }

This single-loop approach may have better cache locality and could be faster for large datasets.

🔇 Additional comments (4)
webknossos-jni/src/bucketScanner.cpp (4)

5-5: Added vector include for the new bounding box implementation

The include for the standard vector library is appropriately added to support the bounding box vector in the new extendSegmentBoundingBox function.


82-82: Improved exception messages with function context

Adding the function name to exception messages is a good improvement for debuggability, allowing errors to be more easily traced to their source.

Also applies to: 86-86


91-118: Input validation needed for segmentId in countSegmentVoxels

The function correctly implements voxel counting for a specific segment ID, but it's missing input validation for the segmentId parameter.

While the function handles exceptions from getElementCount and properly releases JNI resources in all paths, it should validate that segmentId is non-negative before processing, as negative segment IDs likely don't make sense in this context.

    try {
+       if (segmentId < 0) {
+           throw std::invalid_argument("segmentId cannot be negative");
+       }

        const size_t elementCount = getElementCount(inputLengthBytes, bytesPerElement);

Note that the existing segmentIdAtIndex function handles validation for bytesPerElement, but explicit validation would improve robustness.


121-163: Input validation needed for segmentId and nested loops in extendSegmentBoundingBox

The bounding box extension function has several areas for improvement:

  1. Missing validation for segmentId similarly to the previous function
  2. The triple-nested loop could be a performance concern for large buckets
  3. There's no validation of bucketLength to ensure it's positive
    try {
+       if (segmentId < 0) {
+           throw std::invalid_argument("segmentId cannot be negative");
+       }
+       if (bucketLength <= 0) {
+           throw std::invalid_argument("bucketLength must be positive");
+       }

        const size_t elementCount = getElementCount(inputLengthBytes, bytesPerElement);

Additionally, consider adding a check to verify that elementCount == bucketLength^3 to ensure data consistency.

@fm3 fm3 requested a review from normanrz April 8, 2025 07:44
@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
webknossos-jni/src/bucketScanner.cpp (1)

152-154: ⚠️ Potential issue

Missing byte array release before returning.

The function fails to release the byte array reference before returning in the success path, which could cause memory leaks.

Apply the following fix:

        jintArray resultAsJIntArray = env -> NewIntArray(bbox.size());
        env -> SetIntArrayRegion(resultAsJIntArray, 0, bbox.size(), reinterpret_cast < const jint * > (bbox.data()));
+       env->ReleaseByteArrayElements(bucketBytesJavaArray, bucketBytes, 0);

        return resultAsJIntArray;
🧹 Nitpick comments (1)
webknossos-jni/src/bucketScanner.cpp (1)

120-165: Well-optimized loop ordering for cache locality.

The z,y,x loop ordering matches the bucket's Fortran (x-fastest) memory ordering, providing good cache performance. This is documented by the comment on line 146.

However, there's still a need for input validation like in the previous function.

Consider adding input validation at the function entry:

    (JNIEnv * env, jobject instance, jbyteArray bucketBytesJavaArray, jint bytesPerElement, jboolean isSigned, jint bucketLength, jlong segmentId,
      jint bucketTopLeftX, jint bucketTopLeftY, jint bucketTopLeftZ,
      jint existingBBoxTopLeftX, jint existingBBoxTopLeftY, jint existingBBoxTopLeftZ,
      jint existingBBoxBottomRightX, jint existingBBoxBottomRightY, jint existingBBoxBottomRightZ) {

+   // Validate bytesPerElement
+   if (bytesPerElement != 1 && bytesPerElement != 2 && bytesPerElement != 4 && bytesPerElement != 8) {
+       throwRuntimeException(env, "Native Exception in BucketScanner extendSegmentBoundingBox: unsupported bytesPerElement value");
+       return nullptr;
+   }

    jsize inputLengthBytes = env -> GetArrayLength(bucketBytesJavaArray);
🔇 Additional comments (2)
webknossos-jni/src/bucketScanner.cpp (2)

5-5: Include addition is appropriate.

This include is necessary for using std::array in the extendSegmentBoundingBox function.


82-82: Improved error messaging.

Adding the function name to the exception messages provides better context for debugging.

Also applies to: 86-86

@MichaelBuessemeyer MichaelBuessemeyer (Contributor) left a comment

Awesome, thanks. Scala changes look good 👍. Should I retest, or did you already do plenty of testing? At least the Scala changes look like nothing really changed.

@fm3 fm3 (Member, Author) commented Apr 8, 2025

Should be fine, thanks! I’m just waiting on the cpp re-review then :)

@normanrz normanrz (Member) left a comment

c++ looks good

@fm3 fm3 merged commit 5a00690 into master Apr 8, 2025
3 checks passed
@fm3 fm3 deleted the volume-stats-perf branch April 8, 2025 08:41