
Conversation


@JoshuaSiraj JoshuaSiraj commented Nov 25, 2025

Summary by CodeRabbit

  • New Features

    • Global dry-run mode (CLI flag) to simulate runs: skips actual file writes and directory creation while still returning resolved paths and updating indexes.
    • DICOM metadata extraction now includes additional patient fields: sex, birth date, age, ethnic group, weight, size, and clinical history.
  • Behavior Changes

    • In dry-run, missing output files no longer raise errors; save operations are skipped but still reported.



coderabbitai bot commented Nov 25, 2025

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (4)
  • .github/actions/pytest/action.yml is excluded by none and included by none
  • .github/workflows/main.yml is excluded by none and included by none
  • pixi.lock is excluded by !**/*.lock and included by none
  • pixi.toml is excluded by none and included by none

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters; for example, including `**/dist/**` overrides the default block on the `dist` directory by removing the pattern from both lists.

You can disable this status message by setting `reviews.review_status` to `false` in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

Adds a CLI-accessible dry-run mode, propagated through Autopipeline, SampleOutput, and the writers, that skips actual filesystem writes while still resolving paths and updating indexes; expands DICOM metadata extraction with seven additional patient-related tags. process_one_sample now errors on missing outputs only when not in dry-run mode.
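
As a rough illustration of the forwarding chain described above, the sketch below shows how a dry_run flag typically threads from the pipeline object down to a writer. All class names, fields, and signatures here are simplified assumptions for illustration, not the actual imgtools API.

from dataclasses import dataclass


@dataclass
class WriterSketch:
    root_directory: str
    dry_run: bool = False
    create_dirs: bool = True


@dataclass
class SampleOutputSketch:
    directory: str
    dry_run: bool = False

    def build_writer(self) -> WriterSketch:
        # Forward the flag and suppress directory creation while dry-running.
        return WriterSketch(
            root_directory=self.directory,
            dry_run=self.dry_run,
            create_dirs=not self.dry_run,
        )


@dataclass
class AutopipelineSketch:
    output_directory: str
    dry_run: bool = False

    def make_output(self) -> SampleOutputSketch:
        return SampleOutputSketch(self.output_directory, dry_run=self.dry_run)

With this shape, AutopipelineSketch("out", dry_run=True).make_output().build_writer() yields a writer that is told both to skip writes and to skip directory creation.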

Changes

  • CLI & Autopipeline (src/imgtools/cli/autopipeline.py, src/imgtools/autopipeline.py)
    Adds a --dry-run / -d CLI flag and a dry_run: bool = False parameter; forwards dry_run into Autopipeline construction and pipeline execution; process_one_sample now raises on missing outputs only when dry_run is false.
  • SampleOutput & writer base (src/imgtools/io/sample_output.py, src/imgtools/io/writers/abstract_base_writer.py)
    Adds a dry_run: bool field to SampleOutput and AbstractBaseWriter; SampleOutput forwards dry_run to its writers and sets create_dirs=not self.dry_run; AbstractBaseWriter short-circuits path resolution and save when dry_run is true, and documents the behavior.
  • Writer implementations (src/imgtools/io/writers/nifti_writer.py, src/imgtools/io/writers/numpy_writer.py)
    Save logic is gated by self.dry_run: when true, actual I/O is skipped but paths are still computed and the index is updated; non-dry-run behavior is preserved, including existence checks and error handling.
  • DICOM metadata extraction (src/imgtools/dicom/dicom_metadata/extractor_base.py)
    Extends ModalityMetadataExtractor.base_tags with PatientSex, PatientBirthDate, PatientAge, EthnicGroup, PatientWeight, PatientSize, and AdditionalPatientHistory.
  • Snapshots / Tests (tests/snapshots/...)
    Test snapshots updated to include the newly extracted DICOM patient fields (PatientAge, PatientSex, PatientWeight, PatientSize, AdditionalPatientHistory) across multiple datasets.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Verify dry_run is forwarded to every writer construction path and honored in model_post_init.
  • Confirm index update and replace_existing logic in nifti_writer.py/numpy_writer.py remains correct when writes are skipped.
  • Check create_dirs=not self.dry_run does not break callers expecting directories to exist.
  • Review added PHI DICOM tags in extractor_base.py for privacy/compliance and test updates for correctness.

Suggested labels

hackathon

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly summarizes the two main changes in the PR: adding dry-run functionality to autopipeline and including sensitive DICOM tags in metadata extraction.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required threshold of 80.00%.


@JoshuaSiraj JoshuaSiraj requested a review from strixy16 November 25, 2025 21:43

codecov bot commented Nov 25, 2025

Codecov Report

❌ Patch coverage is 33.33333% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.58%. Comparing base (9e73f82) to head (a913545).
⚠️ Report is 1 commit behind head on main.

Files with missing lines (patch coverage):
  • src/imgtools/io/writers/numpy_writer.py: 0.00%, 16 lines missing ⚠️
  • src/imgtools/io/writers/abstract_base_writer.py: 33.33%, 4 lines missing ⚠️
  • src/imgtools/autopipeline.py: 0.00%, 1 line missing ⚠️
  • src/imgtools/io/sample_output.py: 0.00%, 1 line missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #430      +/-   ##
==========================================
- Coverage   54.60%   54.58%   -0.02%     
==========================================
  Files          66       66              
  Lines        4318     4325       +7     
==========================================
+ Hits         2358     2361       +3     
- Misses       1960     1964       +4     

☔ View full report in Codecov by Sentry.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/imgtools/io/writers/numpy_writer.py (1)

74-79: dry_run currently masks invalid data types in NumPyWriter.save

With the new if not self.dry_run: guard, the else branch that raises NumpyWriterValidationError for unsupported data types is never executed in dry‑run mode. That means invalid inputs will “succeed” in dry‑run (path + index entry) but later fail only when running for real, which makes misconfiguration harder to catch.

You can keep dry_run semantics while still validating input types by moving the type check outside the dry_run gate:

     def save(
         self,
         data: np.ndarray | sitk.Image | dict[str, np.ndarray | sitk.Image],
         **kwargs: object,
@@
-        out_path = self.resolve_path(**kwargs)
-
-        if not self.dry_run:
-            if isinstance(data, (np.ndarray, sitk.Image)):
-                # Single image or array
-                array, metadata = self._to_numpy(data)
-                np.savez_compressed(out_path, image_array=array, **metadata)
-            elif isinstance(data, dict):
-                # Multiple images or arrays
-                arrays = {}
-                metadata = {}
-                for key, value in data.items():
-                    array, meta = self._to_numpy(value)
-                    arrays[key] = array
-                    for meta_key, meta_value in meta.items():
-                        metadata[f"{key}_{meta_key}"] = meta_value
-                if self.compressed:
-                    np.savez_compressed(
-                        out_path, allow_pickle=False, **arrays, **metadata
-                    )
-                else:
-                    np.savez(out_path, allow_pickle=False, **arrays, **metadata)
-            else:
-                raise NumpyWriterValidationError(
-                    "Data must be a NumPy array, SimpleITK image, or a dictionary of these types."
-                )
+        out_path = self.resolve_path(**kwargs)
+
+        # Always validate input type, even in dry_run mode.
+        if not isinstance(data, (np.ndarray, sitk.Image, dict)):
+            raise NumpyWriterValidationError(
+                "Data must be a NumPy array, SimpleITK image, or a dictionary of these types."
+            )
+
+        if not self.dry_run:
+            if isinstance(data, (np.ndarray, sitk.Image)):
+                # Single image or array
+                array, metadata = self._to_numpy(data)
+                np.savez_compressed(out_path, image_array=array, **metadata)
+            elif isinstance(data, dict):
+                # Multiple images or arrays
+                arrays: dict[str, np.ndarray] = {}
+                metadata: dict[str, object] = {}
+                for key, value in data.items():
+                    array, meta = self._to_numpy(value)
+                    arrays[key] = array
+                    for meta_key, meta_value in meta.items():
+                        metadata[f"{key}_{meta_key}"] = meta_value
+                if self.compressed:
+                    np.savez_compressed(
+                        out_path, allow_pickle=False, **arrays, **metadata
+                    )
+                else:
+                    np.savez(out_path, allow_pickle=False, **arrays, **metadata)
@@
-        self.add_to_index(
+        self.add_to_index(
             out_path,
             include_all_context=True,
             filepath_column="path",
             replace_existing=True,
         )

(Separately, the TRY003 hint about the long error message is purely stylistic; you can leave it as‑is or move the message string into the exception class/docstring if you prefer.)

Also applies to: 98-125

🧹 Nitpick comments (4)
src/imgtools/dicom/dicom_metadata/extractor_base.py (1)

105-159: New PHI tags will now flow through all metadata paths

Adding PatientSex, PatientBirthDate, PatientAge, EthnicGroup, PatientWeight, PatientSize, and AdditionalPatientHistory to base_tags means these PHI fields will be extracted for every modality and can end up in indexes, CSVs, and logs. Please double‑check that all downstream consumers (index writing, report generation, logging, export) are intended and configured to handle PHI safely (e.g., anonymization, access controls, redaction when exporting).
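
To make the downstream-handling concern concrete, here is a small, hypothetical helper that masks the newly extracted patient tags before metadata is exported. The tag names come from this PR, but the constant and function are illustrative only and not part of imgtools.

# The tag names below match the fields added in this PR; the constant and
# helper themselves are hypothetical examples of downstream PHI handling.
PATIENT_PHI_TAGS = {
    "PatientSex",
    "PatientBirthDate",
    "PatientAge",
    "EthnicGroup",
    "PatientWeight",
    "PatientSize",
    "AdditionalPatientHistory",
}


def redact_phi(metadata: dict[str, str], placeholder: str = "REDACTED") -> dict[str, str]:
    """Return a copy of extracted metadata with PHI fields masked."""
    return {
        key: placeholder if key in PATIENT_PHI_TAGS else value
        for key, value in metadata.items()
    }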

src/imgtools/io/writers/abstract_base_writer.py (1)

624-655: ExampleWriter.save respects dry_run but still resolves paths and updates index

The added if not self.dry_run: around the file write aligns with the base class contract: dry‑run skips actual writes while still resolving the path and updating the index. Just be aware that directories and the index file are still created via __post_init__ and add_to_index, so this is a “no data files written” dry run, not a completely side‑effect‑free mode.

src/imgtools/io/sample_output.py (1)

134-138: Dry‑run wiring is correct; consider whether root dir creation is desired

Passing dry_run=self.dry_run and create_dirs=not self.dry_run into NIFTIWriter ensures that, in dry‑run mode, the writer doesn’t create per‑file directories or touch data files, while still updating the index and returning resolved paths.

Note that the directory validator still calls validate_directory(..., create=True), so the top‑level output directory will be created even in dry‑run mode. If you want a completely non‑creating dry run, you may eventually want to make the validator conditional on dry_run as well, or document that dry_run only applies to individual file creation, not the root directory or index.

Also applies to: 141-150

src/imgtools/cli/autopipeline.py (1)

168-175: CLI dry‑run flag is wired correctly; consider clarifying its semantics

The --dry-run / -d flag is correctly threaded through the autopipeline function into Autopipeline(dry_run=...), so end‑to‑end behavior will track the flag as intended.

If you want to avoid confusion, you might tweak the option help text to reflect the actual behavior (e.g., “Run the pipeline without writing output data files; still generates index and summary reports”) so users don’t expect a completely side‑effect‑free run.

Also applies to: 179-197, 255-275
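
For reference, a minimal click-based version of such a flag, using the clarified help text suggested above, might look like the sketch below. The command body is a placeholder rather than the real autopipeline entry point, and the assumption that the CLI is built on click comes from the src/imgtools/cli layout rather than the diff itself.

import click


@click.command()
@click.option(
    "--dry-run",
    "-d",
    is_flag=True,
    default=False,
    help=(
        "Run the pipeline without writing output data files; "
        "index and summary reports are still generated."
    ),
)
def autopipeline(dry_run: bool) -> None:
    # Placeholder body: the real command would construct Autopipeline(dry_run=dry_run).
    click.echo(f"dry_run={dry_run}")


if __name__ == "__main__":
    autopipeline()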

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d02ced9 and 54dac55.

⛔ Files ignored due to path filters (1)
  • pixi.lock is excluded by !**/*.lock and included by none
📒 Files selected for processing (7)
  • src/imgtools/autopipeline.py (4 hunks)
  • src/imgtools/cli/autopipeline.py (3 hunks)
  • src/imgtools/dicom/dicom_metadata/extractor_base.py (1 hunks)
  • src/imgtools/io/sample_output.py (2 hunks)
  • src/imgtools/io/writers/abstract_base_writer.py (5 hunks)
  • src/imgtools/io/writers/nifti_writer.py (1 hunks)
  • src/imgtools/io/writers/numpy_writer.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for compliance with PEP 8 and PEP 257 (docstring conventions). Ensure the following:
  • Variables and functions follow meaningful naming conventions.
  • Docstrings are present, accurate, and align with the implementation.
  • Code is efficient and avoids redundancy while adhering to DRY principles.
  • Consider suggestions to enhance readability and maintainability.
  • Highlight any potential performance issues, edge cases, or logical errors.
  • Ensure all imported libraries are used and necessary.

Files:

  • src/imgtools/cli/autopipeline.py
  • src/imgtools/io/writers/numpy_writer.py
  • src/imgtools/io/sample_output.py
  • src/imgtools/dicom/dicom_metadata/extractor_base.py
  • src/imgtools/autopipeline.py
  • src/imgtools/io/writers/abstract_base_writer.py
  • src/imgtools/io/writers/nifti_writer.py
🧬 Code graph analysis (2)
src/imgtools/io/sample_output.py (1)
src/imgtools/io/sample_input.py (1)
  • default (228-230)
src/imgtools/io/writers/nifti_writer.py (1)
src/imgtools/io/writers/abstract_base_writer.py (1)
  • ExistingFileMode (57-79)
🪛 Ruff (0.14.5)
src/imgtools/io/writers/numpy_writer.py

121-123: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Unit-Tests (windows-latest, py311, public)
  • GitHub Check: Unit-Tests (macos-latest, py312, public)
  • GitHub Check: Unit-Tests (windows-latest, py310, public)
  • GitHub Check: Integration-Tests (ubuntu-latest, py313, public)
  • GitHub Check: Integration-Tests (windows-latest, py310, public)
  • GitHub Check: Integration-Tests (windows-latest, py313, public)
  • GitHub Check: Integration-Tests (ubuntu-latest, py310, public)
  • GitHub Check: Integration-Tests (macos-latest, py313, public)
  • GitHub Check: Integration-Tests (macos-latest, py310, public)
🔇 Additional comments (2)
src/imgtools/io/writers/nifti_writer.py (1)

135-225: Dry‑run gate around NIfTI I/O is well‑placed

The if not self.dry_run: block cleanly isolates the existence check and sitk.WriteImage call from the rest of the method, while still resolving the path and updating the index. Once AbstractBaseWriter.resolve_path is updated to avoid unlinking files in dry‑run mode, this writer’s behavior will match the documented semantics.
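
The gate described above roughly follows the pattern sketched below. The function is a simplified stand-in for the writer method (no existing_file_mode handling, no index update), intended only to show where the dry-run check sits relative to the SimpleITK write.

from pathlib import Path

import SimpleITK as sitk


def save_image(image: sitk.Image, out_path: Path, dry_run: bool = False) -> Path:
    """Write the image unless dry_run is set; always return the resolved path."""
    if not dry_run:
        out_path.parent.mkdir(parents=True, exist_ok=True)
        # Simplified: the real writer consults ExistingFileMode before overwriting.
        sitk.WriteImage(image, str(out_path))
    # In dry-run mode the path is still returned (and would still be indexed).
    return out_path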

src/imgtools/autopipeline.py (1)

259-269: dry_run propagation in Autopipeline is consistent and avoids false failures

Forwarding dry_run into SampleOutput and relaxing the “No output files were saved” check when sample_output.dry_run is True gives a sensible behavior: dry‑run still executes the full pipeline, populates the index, and generates reports, but doesn’t treat “no outputs written” as a hard error.

Combined with the writer‑level dry_run gates (once resolve_path is fixed to avoid unlinking in dry_run), this offers a coherent end‑to‑end dry‑run mode without affecting existing non‑dry‑run behavior.

Also applies to: 308-309, 346-347, 360-366, 465-472
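
The relaxed check amounts to something like the sketch below; the function and exception are illustrative stand-ins, not the actual process_one_sample code.

def check_outputs(saved_paths: list[str], dry_run: bool) -> None:
    """Raise only when a real run produced no outputs."""
    if not saved_paths and not dry_run:
        raise RuntimeError("No output files were saved for this sample.")
    # A dry run legitimately produces no files, so an empty list is not an error.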

@JoshuaSiraj JoshuaSiraj requested a review from skim2257 November 26, 2025 15:43
@JoshuaSiraj JoshuaSiraj linked an issue Nov 26, 2025 that may be closed by this pull request

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
src/imgtools/io/writers/abstract_base_writer.py (1)

356-368: dry_run still allows destructive operations via preview_path (and directory manipulation)

The early if self.dry_run: return out_path in resolve_path nicely fixes the earlier issue where OVERWRITE could unlink files during a dry‑run. However:

  • preview_path still calls out_path.unlink() in ExistingFileMode.OVERWRITE regardless of self.dry_run, so a caller using dry_run=True + OVERWRITE + preview_path() can still delete existing outputs.
  • Additionally, resolve_path will still create parent directories for new paths (and __exit__ can delete an empty root directory) even when dry_run is True, which partially contradicts the “skip actual file I/O operations” wording.

If the intent is “no filesystem mutations at all in dry‑run,” you may want to:

  • Guard the out_path.unlink() in preview_path behind not self.dry_run.
  • Consider skipping new directory creation in resolve_path and the empty‑directory cleanup in __exit__ when dry_run is True.

If instead the goal is only “no data‑file writes, but directory housekeeping is OK,” it would still be worth updating the class‑level docs for dry_run to call that out, and to mention that preview_path remains destructive in OVERWRITE mode. This will make the contract much clearer to future users of the API.

Also applies to: 439-460
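
One way to implement the suggested guard is sketched below. The enum and function are simplified stand-ins for ExistingFileMode and preview_path, ignoring the other modes and the index bookkeeping.

from enum import Enum, auto
from pathlib import Path


class FileMode(Enum):
    OVERWRITE = auto()
    SKIP = auto()
    FAIL = auto()


def preview_path(out_path: Path, mode: FileMode, dry_run: bool) -> Path:
    if out_path.exists() and mode is FileMode.OVERWRITE and not dry_run:
        # Only delete the existing output when we are actually going to rewrite it.
        out_path.unlink()
    return out_path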

🧹 Nitpick comments (2)
src/imgtools/io/writers/abstract_base_writer.py (2)

129-132: Align dry_run documentation with resolve_path behaviour (especially FAIL mode)

The new dry_run attribute and the abstract save doc clearly state that dry‑run “skips actual file I/O operations but still write[s] to the index and return[s] the resolved path,” which is good from a readability standpoint. However, there’s now a subtle mismatch between the resolve_path docstring and implementation:

  • The resolve_path docstring still says it “only raises FileExistsError if the file already exists and the mode is set to FAIL.”
  • With the new if self.dry_run: return out_path branch, a FileExistsError is no longer raised in FAIL mode when dry_run is True; the call quietly succeeds and returns the path.

For maintainability, it would be good to make this explicit one way or the other:

  • Either keep the old semantics and let FAIL still raise even in dry‑run (so dry‑run simulates failures as well as writes), or
  • Intentionally document that in dry‑run, resolve_path never raises on existing files regardless of existing_file_mode, because it’s meant as a read‑only preview.

Right now, a reader relying on the docstring would expect FileExistsError in dry‑run + FAIL, but won’t get it. Clarifying this in the docs (or adjusting the branch) will avoid surprises for users of the abstract base.

Also applies to: 159-159, 220-235, 346-368
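
If the first option (FAIL still raises during a dry run) is chosen, the ordering could look like the sketch below. The string-valued mode and bare function are simplifications of the real existing_file_mode enum and resolve_path method.

from pathlib import Path


def resolve_path(out_path: Path, existing_file_mode: str, dry_run: bool) -> Path:
    if out_path.exists() and existing_file_mode == "FAIL":
        # Checked before the dry-run short-circuit so that a dry run predicts
        # the same failure a real run would hit.
        raise FileExistsError(out_path)
    if dry_run:
        return out_path
    out_path.parent.mkdir(parents=True, exist_ok=True)
    return out_path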


633-657: ExampleWriter.save dry_run handling is clear; minor doc/style nits

The ExampleWriter.save implementation looks good:

  • It now respects self.dry_run by skipping the actual file write while still calling add_to_index, which matches the abstract save contract.
  • Using replace_existing=output_path.exists() continues to behave sensibly in both normal and dry‑run modes.

Two minor readability / style tweaks to consider:

  • Line 653: PEP 8 prefers two spaces before an inline comment, e.g. if self.dry_run: # ..., to keep comments visually separated from code.
  • Since this class is the “demo” implementation, you might add a brief line to the docstring noting that dry_run=True will skip writing the file but still update the index. That would make the example self‑documenting for downstream users reading this concrete writer.

Neither change affects behaviour, but they would make the example slightly clearer and more consistent.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 54dac55 and 8c90996.

📒 Files selected for processing (1)
  • src/imgtools/io/writers/abstract_base_writer.py (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for compliance with PEP 8 and PEP 257 (docstring conventions). Ensure the following:
  • Variables and functions follow meaningful naming conventions.
  • Docstrings are present, accurate, and align with the implementation.
  • Code is efficient and avoids redundancy while adhering to DRY principles.
  • Consider suggestions to enhance readability and maintainability.
  • Highlight any potential performance issues, edge cases, or logical errors.
  • Ensure all imported libraries are used and necessary.

Files:

  • src/imgtools/io/writers/abstract_base_writer.py
🧬 Code graph analysis (1)
src/imgtools/io/writers/abstract_base_writer.py (1)
src/imgtools/io/sample_output.py (1)
  • default (159-166)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Integration-Tests (windows-latest, py313, public)
  • GitHub Check: Integration-Tests (macos-latest, py313, public)
  • GitHub Check: Integration-Tests (windows-latest, py310, public)
  • GitHub Check: Integration-Tests (ubuntu-latest, py310, public)
  • GitHub Check: Integration-Tests (macos-latest, py310, public)
  • GitHub Check: Integration-Tests (ubuntu-latest, py313, public)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/imgtools/io/sample_output.py (1)

134-150: Surface dry_run clearly in docs and consider aligning directory creation with dry-run intent

The dry_run flag being threaded into NIFTIWriter and create_dirs=not self.dry_run looks good and keeps the behavior localized.

Two small maintainability/readability suggestions:

  • The SampleOutput class docstring’s Attributes section doesn’t list dry_run. Adding it there will keep the public configuration surface self-documenting for users reading just this model.
  • Even when dry_run=True, validate_directory is still called with create=True, so instantiating SampleOutput may create the root output directory. If the intended contract of dry-run is “no filesystem changes at all,” consider gating this as well (e.g., create=not dry_run) or at least documenting that only subdirectory creation is suppressed via create_dirs. This will make the behavior less surprising for callers relying on dry-run.
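
A hedged sketch of the second suggestion, assuming SampleOutput is a Pydantic v2 model (the earlier mention of model_post_init suggests this); the class and validator below are hypothetical and sidestep the real validate_directory helper.

from pathlib import Path

from pydantic import BaseModel, model_validator


class SampleOutputSketch(BaseModel):
    directory: Path
    dry_run: bool = False

    @model_validator(mode="after")
    def _create_root_if_real_run(self) -> "SampleOutputSketch":
        # Only create the root output directory for real runs; a dry run
        # should leave the filesystem untouched.
        if not self.dry_run:
            self.directory.mkdir(parents=True, exist_ok=True)
        return self
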
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c90996 and 37df848.

📒 Files selected for processing (3)
  • src/imgtools/io/sample_output.py (2 hunks)
  • src/imgtools/io/writers/abstract_base_writer.py (6 hunks)
  • src/imgtools/io/writers/numpy_writer.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/imgtools/io/writers/abstract_base_writer.py
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for compliance with PEP 8 and PEP 257 (docstring conventions). Ensure the following:
  • Variables and functions follow meaningful naming conventions.
  • Docstrings are present, accurate, and align with the implementation.
  • Code is efficient and avoids redundancy while adhering to DRY principles.
  • Consider suggestions to enhance readability and maintainability.
  • Highlight any potential performance issues, edge cases, or logical errors.
  • Ensure all imported libraries are used and necessary.

Files:

  • src/imgtools/io/sample_output.py
  • src/imgtools/io/writers/numpy_writer.py
🧬 Code graph analysis (2)
src/imgtools/io/sample_output.py (1)
src/imgtools/io/sample_input.py (1)
  • default (228-230)
src/imgtools/io/writers/numpy_writer.py (1)
src/imgtools/vizualize/visualizer2.py (1)
  • array (176-177)
🪛 Ruff (0.14.5)
src/imgtools/io/writers/numpy_writer.py

123-125: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Integration-Tests (windows-latest, py313, public)
  • GitHub Check: Integration-Tests (macos-latest, py310, public)
  • GitHub Check: Integration-Tests (macos-latest, py313, public)
  • GitHub Check: Integration-Tests (windows-latest, py310, public)
  • GitHub Check: Integration-Tests (ubuntu-latest, py313, public)
  • GitHub Check: Linting
  • GitHub Check: Integration-Tests (ubuntu-latest, py310, public)
🔇 Additional comments (1)
src/imgtools/io/writers/numpy_writer.py (1)

85-87: Docstring update for **kwargs improves clarity

Documenting **kwargs as additional context for filename generation is helpful and keeps the save API self-explanatory for callers.

Comment on lines +100 to +125
        if not self.dry_run:
            if isinstance(data, (np.ndarray, sitk.Image)):
                # Single image or array
                array, metadata = self._to_numpy(data)
                np.savez_compressed(out_path, image_array=array, **metadata)
            elif isinstance(data, dict):
                # Multiple images or arrays
                arrays = {}
                metadata = {}
                for key, value in data.items():
                    array, meta = self._to_numpy(value)
                    arrays[key] = array
                    for meta_key, meta_value in meta.items():
                        metadata[f"{key}_{meta_key}"] = meta_value
                if self.compressed:
                    np.savez_compressed(
                        out_path, allow_pickle=False, **arrays, **metadata
                    )
                else:
                    np.savez(out_path, allow_pickle=False, **arrays, **metadata)
            else:
                raise NumpyWriterValidationError(
                    "Data must be a NumPy array, SimpleITK image, or a dictionary of these types."
                )

⚠️ Potential issue | 🟠 Major

Preserve data validation behavior under dry_run and optionally centralize error message

The dry-run guard cleanly skips actual filesystem writes, but it currently also skips all type validation:

  • The isinstance checks and _to_numpy calls only run when not self.dry_run.
  • If dry_run=True and data is an unsupported type, no NumpyWriterValidationError is raised; instead, the method quietly indexes a path and returns it. That makes it easy for a dry-run to “succeed” while a real run would fail at write time.

To keep dry-run useful for catching pipeline issues while still avoiding I/O, consider validating data independently of the write, e.g.:

-        out_path = self.resolve_path(**kwargs)
-
-        if not self.dry_run:
-            if isinstance(data, (np.ndarray, sitk.Image)):
+        out_path = self.resolve_path(**kwargs)
+
+        # Validate input type regardless of dry-run to surface misuses early.
+        if isinstance(data, (np.ndarray, sitk.Image)):
+            to_save = ("single", data)
+        elif isinstance(data, dict):
+            to_save = ("dict", data)
+        else:
+            raise NumpyWriterValidationError(
+                "Data must be a NumPy array, SimpleITK image, or a dictionary of these types."
+            )
+
+        if not self.dry_run:
+            kind, payload = to_save
+            if kind == "single":
+                data_single = payload
                 # Single image or array
-                array, metadata = self._to_numpy(data)
+                array, metadata = self._to_numpy(data_single)
                 np.savez_compressed(out_path, image_array=array, **metadata)
-            elif isinstance(data, dict):
+            else:  # dict
                 # Multiple images or arrays
-                arrays = {}
-                metadata = {}
-                for key, value in data.items():
+                arrays: dict[str, np.ndarray] = {}
+                metadata: dict[str, object] = {}
+                for key, value in payload.items():
                 ...
-            else:
-                raise NumpyWriterValidationError(
-                    "Data must be a NumPy array, SimpleITK image, or a dictionary of these types."
-                )

This keeps validation semantics identical between normal and dry-run modes while still avoiding writes.

Optionally, to satisfy TRY003 and improve reuse, you could move the long error message into a class-level constant on NumpyWriterValidationError or NumPyWriter and reference it from both _to_numpy and save, but that’s more of a style/readability tweak than a functional requirement.

Committable suggestion skipped: line range outside the PR's diff.


🤖 Prompt for AI Agents
In src/imgtools/io/writers/numpy_writer.py around lines 100 to 125, the current
dry-run branch skips all type validation so unsupported input types never raise
NumpyWriterValidationError during dry runs; change the flow to always validate
input (perform isinstance checks and call self._to_numpy or a dedicated
validation helper) before checking self.dry_run, then only skip the actual
np.savez/np.savez_compressed when dry_run is True; optionally extract the long
error message into a class-level constant on NumpyWriter or
NumpyWriterValidationError and reuse it in both _to_numpy and save to avoid
duplication.


@skim2257 skim2257 left a comment


looks great. just would love clarification on one CLI output

Collaborator


"no output files, check the directory" sounds a bit confusing? what are they checking?

Collaborator Author


Can you send the full output



Development

Successfully merging this pull request may close these issues.

Create simple index file without saving files
