feat: add dry run functionality to autopipeline and add sensitive dicom tags to metadata #430

JoshuaSiraj · 2025-11-25T21:43:10Z

Summary by CodeRabbit

New Features
- Global dry-run mode (CLI flag) to simulate runs: skips actual file writes and directory creation while still returning resolved paths and updating indexes.
- DICOM metadata extraction now includes additional patient fields: sex, birth date, age, ethnic group, weight, size, and clinical history.
Behavior Changes
- In dry-run, missing output files no longer raise errors; save operations are skipped but still reported.



coderabbitai · 2025-11-25T21:43:20Z

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (4)

.github/actions/pytest/action.yml is excluded by none and included by none
.github/workflows/main.yml is excluded by none and included by none
pixi.lock is excluded by !**/*.lock and included by none
pixi.toml is excluded by none and included by none

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including **/dist/** will override the default block on the dist directory, by removing the pattern from both the lists.

📝 Walkthrough

Walkthrough

Adds a CLI-accessible dry-run mode propagated through Autopipeline, SampleOutput, and writers to skip actual filesystem writes while still resolving paths and updating indexes; expands DICOM metadata extraction to include seven additional patient-related tags. process_one_sample now raises on missing outputs only when not in dry-run mode.

Changes

Cohort / File(s)	Summary
CLI & Autopipeline `src/imgtools/cli/autopipeline.py`, `src/imgtools/autopipeline.py`	Adds `--dry-run / -d` CLI flag and `dry_run: bool = False` parameter; forwards `dry_run` into `Autopipeline` construction and pipeline execution; `process_one_sample` conditionalizes error on missing outputs by `dry_run`.
SampleOutput & writer base `src/imgtools/io/sample_output.py`, `src/imgtools/io/writers/abstract_base_writer.py`	Adds `dry_run: bool` field to `SampleOutput` and `AbstractBaseWriter`; `SampleOutput` forwards `dry_run` to writers and sets `create_dirs=not self.dry_run`; `AbstractBaseWriter` short-circuits path resolution/save when `dry_run` is true and documents behavior.
Writer implementations `src/imgtools/io/writers/nifti_writer.py`, `src/imgtools/io/writers/numpy_writer.py`	Writer save logic gated by `self.dry_run`: when true, skip actual I/O but still compute/reserve paths and update the index; non-dry-run behavior preserved, including existence checks and error handling.
DICOM metadata extraction `src/imgtools/dicom/dicom_metadata/extractor_base.py`	Extends `ModalityMetadataExtractor.base_tags` with: `PatientSex`, `PatientBirthDate`, `PatientAge`, `EthnicGroup`, `PatientWeight`, `PatientSize`, `AdditionalPatientHistory`.
Snapshots / Tests `tests/snapshots/...`	Test snapshots updated to include newly extracted DICOM patient fields (PatientAge, PatientSex, PatientWeight, PatientSize, AdditionalPatientHistory) across multiple datasets.
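A minimal sketch of the forwarding described in the table, showing the wiring in isolation. It assumes plain dataclasses in place of the pydantic `SampleOutput` model and the real writer classes; `SketchWriter`, `SketchSampleOutput`, and `make_writer` are hypothetical names, not imgtools APIs.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SketchWriter:
    """Hypothetical stand-in for a writer such as NIFTIWriter."""

    root_directory: Path
    filename_format: str
    create_dirs: bool = True
    dry_run: bool = False
    index: list[str] = field(default_factory=list)

    def save(self, payload: str, **context: str) -> Path:
        # The path is always resolved, even in dry-run mode.
        out_path = self.root_directory / self.filename_format.format(**context)
        if not self.dry_run:
            if self.create_dirs:
                out_path.parent.mkdir(parents=True, exist_ok=True)
            out_path.write_text(payload)
        # The index is updated in both modes.
        self.index.append(str(out_path))
        return out_path


@dataclass
class SketchSampleOutput:
    """Hypothetical stand-in for SampleOutput; only the dry-run wiring is shown."""

    directory: Path
    dry_run: bool = False

    def make_writer(self, filename_format: str) -> SketchWriter:
        # Forward dry_run and disable directory creation when it is set,
        # mirroring create_dirs=not self.dry_run from the PR.
        return SketchWriter(
            root_directory=self.directory,
            filename_format=filename_format,
            create_dirs=not self.dry_run,
            dry_run=self.dry_run,
        )
```

In dry-run mode the writer returns and indexes the same path a real run would use, but nothing appears on disk.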

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Verify dry_run is forwarded to every writer construction path and honored in model_post_init.
Confirm index update and replace_existing logic in nifti_writer.py/numpy_writer.py remains correct when writes are skipped.
Check create_dirs=not self.dry_run does not break callers expecting directories to exist.
Review added PHI DICOM tags in extractor_base.py for privacy/compliance and test updates for correctness.

Suggested labels

hackathon

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the two main changes in the PR: adding dry-run functionality to autopipeline and including sensitive DICOM tags in metadata extraction.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

codecov · 2025-11-25T21:44:16Z

Codecov Report

❌ Patch coverage is 33.33333% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.58%. Comparing base (9e73f82) to head (a913545).
⚠️ Report is 1 commit behind head on main.

Files with missing lines	Patch %	Lines
src/imgtools/io/writers/numpy_writer.py	0.00%	16 Missing ⚠️
src/imgtools/io/writers/abstract_base_writer.py	33.33%	4 Missing ⚠️
src/imgtools/autopipeline.py	0.00%	1 Missing ⚠️
src/imgtools/io/sample_output.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #430      +/-   ##
==========================================
- Coverage   54.60%   54.58%   -0.02%     
==========================================
  Files          66       66              
  Lines        4318     4325       +7     
==========================================
+ Hits         2358     2361       +3     
- Misses       1960     1964       +4

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/imgtools/io/writers/numpy_writer.py (1)

74-79: dry_run currently masks invalid data types in NumPyWriter.save

With the new if not self.dry_run: guard, the else branch that raises NumpyWriterValidationError for unsupported data types is never executed in dry‑run mode. That means invalid inputs will “succeed” in dry‑run (path + index entry) but later fail only when running for real, which makes misconfiguration harder to catch.

You can keep dry_run semantics while still validating input types by moving the type check outside the dry_run gate:

     def save(
         self,
         data: np.ndarray | sitk.Image | dict[str, np.ndarray | sitk.Image],
         **kwargs: object,
@@
-        out_path = self.resolve_path(**kwargs)
-
-        if not self.dry_run:
-            if isinstance(data, (np.ndarray, sitk.Image)):
-                # Single image or array
-                array, metadata = self._to_numpy(data)
-                np.savez_compressed(out_path, image_array=array, **metadata)
-            elif isinstance(data, dict):
-                # Multiple images or arrays
-                arrays = {}
-                metadata = {}
-                for key, value in data.items():
-                    array, meta = self._to_numpy(value)
-                    arrays[key] = array
-                    for meta_key, meta_value in meta.items():
-                        metadata[f"{key}_{meta_key}"] = meta_value
-                if self.compressed:
-                    np.savez_compressed(
-                        out_path, allow_pickle=False, **arrays, **metadata
-                    )
-                else:
-                    np.savez(out_path, allow_pickle=False, **arrays, **metadata)
-            else:
-                raise NumpyWriterValidationError(
-                    "Data must be a NumPy array, SimpleITK image, or a dictionary of these types."
-                )
+        out_path = self.resolve_path(**kwargs)
+
+        # Always validate input type, even in dry_run mode.
+        if not isinstance(data, (np.ndarray, sitk.Image, dict)):
+            raise NumpyWriterValidationError(
+                "Data must be a NumPy array, SimpleITK image, or a dictionary of these types."
+            )
+
+        if not self.dry_run:
+            if isinstance(data, (np.ndarray, sitk.Image)):
+                # Single image or array
+                array, metadata = self._to_numpy(data)
+                np.savez_compressed(out_path, image_array=array, **metadata)
+            elif isinstance(data, dict):
+                # Multiple images or arrays
+                arrays: dict[str, np.ndarray] = {}
+                metadata: dict[str, object] = {}
+                for key, value in data.items():
+                    array, meta = self._to_numpy(value)
+                    arrays[key] = array
+                    for meta_key, meta_value in meta.items():
+                        metadata[f"{key}_{meta_key}"] = meta_value
+                if self.compressed:
+                    np.savez_compressed(
+                        out_path, allow_pickle=False, **arrays, **metadata
+                    )
+                else:
+                    np.savez(out_path, allow_pickle=False, **arrays, **metadata)
@@
-        self.add_to_index(
+        self.add_to_index(
             out_path,
             include_all_context=True,
             filepath_column="path",
             replace_existing=True,
         )

(Separately, the TRY003 hint about the long error message is purely stylistic; you can leave it as‑is or move the message string into the exception class/docstring if you prefer.)

Also applies to: 98-125
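A dependency-light sketch of the ordering this comment suggests: validate first, then gate only the write on `dry_run`. It uses NumPy alone (SimpleITK images are left out), and `save_arrays` plus the local `NumpyWriterValidationError` are hypothetical stand-ins rather than the imgtools classes. It also checks dict values, which goes slightly beyond the suggested diff.

```python
from pathlib import Path

import numpy as np


class NumpyWriterValidationError(TypeError):
    """Hypothetical stand-in for the imgtools exception of the same name."""


def save_arrays(data, out_path: Path, dry_run: bool = False) -> Path:
    """Validate `data` first, then write only when not in dry-run mode."""
    # Validation runs in both modes, so a dry run surfaces bad inputs too.
    if isinstance(data, np.ndarray):
        arrays = {"image_array": data}
    elif isinstance(data, dict) and all(
        isinstance(value, np.ndarray) for value in data.values()
    ):
        arrays = dict(data)
    else:
        raise NumpyWriterValidationError(
            "Data must be a NumPy array or a dictionary of NumPy arrays."
        )

    # Only the actual I/O is gated on dry_run.
    if not dry_run:
        np.savez_compressed(out_path, **arrays)
    return out_path
```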

🧹 Nitpick comments (4)

src/imgtools/dicom/dicom_metadata/extractor_base.py (1)

105-159: New PHI tags will now flow through all metadata paths

Adding PatientSex, PatientBirthDate, PatientAge, EthnicGroup, PatientWeight, PatientSize, and AdditionalPatientHistory to base_tags means these PHI fields will be extracted for every modality and can end up in indexes, CSVs, and logs. Please double‑check that all downstream consumers (index writing, report generation, logging, export) are intended and configured to handle PHI safely (e.g., anonymization, access controls, redaction when exporting).
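One way a downstream exporter could act on this comment is to redact the new fields before writing shareable CSVs. The helper below is hypothetical (not part of imgtools); the tag names are the seven added in this PR, and replacing values rather than dropping columns is just one possible design choice.

```python
# The seven patient tags this PR adds to ModalityMetadataExtractor.base_tags.
PHI_TAGS = frozenset(
    {
        "PatientSex",
        "PatientBirthDate",
        "PatientAge",
        "EthnicGroup",
        "PatientWeight",
        "PatientSize",
        "AdditionalPatientHistory",
    }
)


def redact_for_export(row: dict, placeholder: str = "REDACTED") -> dict:
    """Return a copy of a metadata row with PHI values replaced (hypothetical helper)."""
    # Keeping the keys preserves the column layout of exported tables.
    return {key: placeholder if key in PHI_TAGS else value for key, value in row.items()}
```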

src/imgtools/io/writers/abstract_base_writer.py (1)

624-655: ExampleWriter.save respects dry_run but still resolves paths and updates index

The added if not self.dry_run: around the file write aligns with the base class contract: dry‑run skips actual writes while still resolving the path and updating the index. Just be aware that directories and the index file are still created via __post_init__ and add_to_index, so this is a “no data files written” dry run, not a completely side‑effect‑free mode.

src/imgtools/io/sample_output.py (1)

134-138: Dry‑run wiring is correct; consider whether root dir creation is desired

Passing dry_run=self.dry_run and create_dirs=not self.dry_run into NIFTIWriter ensures that, in dry‑run mode, the writer doesn’t create per‑file directories or touch data files, while still updating the index and returning resolved paths.

Note that the directory validator still calls validate_directory(..., create=True), so the top‑level output directory will be created even in dry‑run mode. If you want a completely non‑creating dry run, you may eventually want to make the validator conditional on dry_run as well, or document that dry_run only applies to individual file creation, not the root directory or index.

Also applies to: 141-150
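If a fully non-creating dry run is the goal, the root-directory check could take its create decision from `dry_run`. The real signature of `validate_directory` is not shown in this thread, so the sketch uses a hypothetical `prepare_output_root` built on `pathlib` alone.

```python
from pathlib import Path


def prepare_output_root(directory: Path, dry_run: bool) -> Path:
    """Hypothetical directory check that only creates the root on real runs."""
    directory = Path(directory)
    # The type check is read-only, so it runs in both modes.
    if directory.exists() and not directory.is_dir():
        raise NotADirectoryError(f"{directory} exists and is not a directory")
    if not dry_run:
        directory.mkdir(parents=True, exist_ok=True)
    return directory
```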

src/imgtools/cli/autopipeline.py (1)

168-175: CLI dry‑run flag is wired correctly; consider clarifying its semantics

The --dry-run / -d flag is correctly threaded through the autopipeline function into Autopipeline(dry_run=...), so end‑to‑end behavior will track the flag as intended.

If you want to avoid confusion, you might tweak the option help text to reflect the actual behavior (e.g., “Run the pipeline without writing output data files; still generates index and summary reports”) so users don’t expect a completely side‑effect‑free run.

Also applies to: 179-197, 255-275
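The suggested help text could be attached to the flag as below. `argparse` is only a stand-in here (the thread does not show which CLI framework `src/imgtools/cli/autopipeline.py` uses), and the positional argument names are hypothetical.

```python
import argparse

DRY_RUN_HELP = (
    "Run the pipeline without writing output data files; "
    "still generates index and summary reports."
)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="autopipeline-sketch")
    parser.add_argument("input_directory")
    parser.add_argument("output_directory")
    # Both spellings map to one boolean destination that defaults to False.
    parser.add_argument(
        "-d", "--dry-run", dest="dry_run", action="store_true", help=DRY_RUN_HELP
    )
    return parser
```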

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d02ced9 and 54dac55.

⛔ Files ignored due to path filters (1)

pixi.lock is excluded by !**/*.lock and included by none

📒 Files selected for processing (7)

src/imgtools/autopipeline.py (4 hunks)
src/imgtools/cli/autopipeline.py (3 hunks)
src/imgtools/dicom/dicom_metadata/extractor_base.py (1 hunks)
src/imgtools/io/sample_output.py (2 hunks)
src/imgtools/io/writers/abstract_base_writer.py (5 hunks)
src/imgtools/io/writers/nifti_writer.py (1 hunks)
src/imgtools/io/writers/numpy_writer.py (2 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

src/**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for compliance with PEP 8 and PEP 257 (docstring conventions). Ensure the following: - Variables and functions follow meaningful naming conventions. - Docstrings are present, accurate, and align with the implementation. - Code is efficient and avoids redundancy while adhering to DRY principles. - Consider suggestions to enhance readability and maintainability. - Highlight any potential performance issues, edge cases, or logical errors. - Ensure all imported libraries are used and necessary.

Files:

src/imgtools/cli/autopipeline.py
src/imgtools/io/writers/numpy_writer.py
src/imgtools/io/sample_output.py
src/imgtools/dicom/dicom_metadata/extractor_base.py
src/imgtools/autopipeline.py
src/imgtools/io/writers/abstract_base_writer.py
src/imgtools/io/writers/nifti_writer.py

🧬 Code graph analysis (2)

src/imgtools/io/sample_output.py (1)

src/imgtools/io/sample_input.py (1)

default (228-230)

src/imgtools/io/writers/nifti_writer.py (1)

src/imgtools/io/writers/abstract_base_writer.py (1)

ExistingFileMode (57-79)

🪛 Ruff (0.14.5)

src/imgtools/io/writers/numpy_writer.py

121-123: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)

GitHub Check: Unit-Tests (windows-latest, py311, public)
GitHub Check: Unit-Tests (macos-latest, py312, public)
GitHub Check: Unit-Tests (windows-latest, py310, public)
GitHub Check: Integration-Tests (ubuntu-latest, py313, public)
GitHub Check: Integration-Tests (windows-latest, py310, public)
GitHub Check: Integration-Tests (windows-latest, py313, public)
GitHub Check: Integration-Tests (ubuntu-latest, py310, public)
GitHub Check: Integration-Tests (macos-latest, py313, public)
GitHub Check: Integration-Tests (macos-latest, py310, public)

🔇 Additional comments (2)

src/imgtools/io/writers/nifti_writer.py (1)

135-225: Dry‑run gate around NIfTI I/O is well‑placed

The if not self.dry_run: block cleanly isolates the existence check and sitk.WriteImage call from the rest of the method, while still resolving the path and updating the index. Once AbstractBaseWriter.resolve_path is updated to avoid unlinking files in dry‑run mode, this writer’s behavior will match the documented semantics.

src/imgtools/autopipeline.py (1)

259-269: dry_run propagation in Autopipeline is consistent and avoids false failures

Forwarding dry_run into SampleOutput and relaxing the “No output files were saved” check when sample_output.dry_run is True gives a sensible behavior: dry‑run still executes the full pipeline, populates the index, and generates reports, but doesn’t treat “no outputs written” as a hard error.

Combined with the writer‑level dry_run gates (once resolve_path is fixed to avoid unlinking in dry_run), this offers a coherent end‑to‑end dry‑run mode without affecting existing non‑dry‑run behavior.

Also applies to: 308-309, 346-347, 360-366, 465-472
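The relaxed check described here reduces to a single condition. This sketch assumes the pipeline has a list of resolved output paths per sample; `check_saved_outputs` and `NoOutputsSavedError` are hypothetical names, and the real code may raise a different exception type.

```python
from pathlib import Path


class NoOutputsSavedError(RuntimeError):
    """Hypothetical error type; the real pipeline's exception may differ."""


def check_saved_outputs(resolved_paths: list[Path], dry_run: bool) -> list[Path]:
    """Return the resolved paths, raising only if a real run wrote no files."""
    written = [path for path in resolved_paths if path.exists()]
    # In dry-run mode nothing is written, so an empty result is expected.
    if not written and not dry_run:
        raise NoOutputsSavedError("No output files were saved")
    return resolved_paths
```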

src/imgtools/io/writers/abstract_base_writer.py

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

src/imgtools/io/writers/abstract_base_writer.py (1)

356-368: dry_run still allows destructive operations via preview_path (and directory manipulation)

The early if self.dry_run: return out_path in resolve_path nicely fixes the earlier issue where OVERWRITE could unlink files during a dry‑run. However:

preview_path still calls out_path.unlink() in ExistingFileMode.OVERWRITE regardless of self.dry_run, so a caller using dry_run=True + OVERWRITE + preview_path() can still delete existing outputs.

Additionally, resolve_path will still create parent directories for new paths (and __exit__ can delete an empty root directory) even when dry_run is True, which partially contradicts the “skip actual file I/O operations” wording.

If the intent is “no filesystem mutations at all in dry‑run,” you may want to:

Guard the out_path.unlink() in preview_path behind not self.dry_run.

Consider skipping new directory creation in resolve_path and the empty‑directory cleanup in __exit__ when dry_run is True.

If instead the goal is only “no data‑file writes, but directory housekeeping is OK,” it would still be worth updating the class‑level docs for dry_run to call that out, and to mention that preview_path remains destructive in OVERWRITE mode. This will make the contract much clearer to future users of the API.

Also applies to: 439-460
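A sketch of the first suggestion, guarding the unlink() in preview_path behind not dry_run. The enum mirrors only the two ExistingFileMode members named in this thread, and the function is a simplified, hypothetical stand-in for the real method.

```python
from enum import Enum
from pathlib import Path


class ExistingFileMode(Enum):
    """Stand-in with only the two modes named in the review."""

    OVERWRITE = "overwrite"
    FAIL = "fail"


def preview_path(out_path: Path, mode: ExistingFileMode, dry_run: bool) -> Path:
    """Return the target path; delete an existing file only on real runs."""
    if out_path.exists() and mode is ExistingFileMode.OVERWRITE and not dry_run:
        # Destructive step: skipped entirely when dry_run is True.
        out_path.unlink()
    return out_path
```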

🧹 Nitpick comments (2)

src/imgtools/io/writers/abstract_base_writer.py (2)

129-132: Align dry_run documentation with resolve_path behaviour (especially FAIL mode)

The new dry_run attribute and the abstract save doc clearly state that dry‑run “skips actual file I/O operations but still write[s] to the index and return[s] the resolved path,” which is good from a readability standpoint. However, there’s now a subtle mismatch between the resolve_path docstring and implementation:

The resolve_path docstring still says it “only raises FileExistsError if the file already exists and the mode is set to FAIL.”

With the new if self.dry_run: return out_path branch, a FileExistsError is no longer raised in FAIL mode when dry_run is True; the call quietly succeeds and returns the path.

For maintainability, it would be good to make this explicit one way or the other:

Either keep the old semantics and let FAIL still raise even in dry‑run (so dry‑run simulates failures as well as writes), or

Intentionally document that in dry‑run, resolve_path never raises on existing files regardless of existing_file_mode, because it’s meant as a read‑only preview.

Right now, a reader relying on the docstring would expect FileExistsError in dry‑run + FAIL, but won’t get it. Clarifying this in the docs (or adjusting the branch) will avoid surprises for users of the abstract base.

Also applies to: 159-159, 220-235, 346-368
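If the first option (dry-run also simulates failures) is chosen, the fix is mostly about ordering: run the read-only FAIL check before the if self.dry_run: return out_path shortcut. The function and enum below are hypothetical simplifications, not the real resolve_path.

```python
from enum import Enum
from pathlib import Path


class ExistingFileMode(Enum):
    OVERWRITE = "overwrite"
    FAIL = "fail"


def resolve_path_sketch(out_path: Path, mode: ExistingFileMode, dry_run: bool) -> Path:
    # Read-only check first, so dry runs report the same conflicts as real runs.
    if out_path.exists() and mode is ExistingFileMode.FAIL:
        raise FileExistsError(f"File {out_path} already exists.")
    # Everything after this point may mutate the filesystem.
    if dry_run:
        return out_path
    if out_path.exists() and mode is ExistingFileMode.OVERWRITE:
        out_path.unlink()
    return out_path
```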

633-657: ExampleWriter.save dry_run handling is clear; minor doc/style nits

The ExampleWriter.save implementation looks good:

It now respects self.dry_run by skipping the actual file write while still calling add_to_index, which matches the abstract save contract.

Using replace_existing=output_path.exists() continues to behave sensibly in both normal and dry‑run modes.

Two minor readability / style tweaks to consider:

Line 653: PEP 8 asks for at least two spaces before an inline comment (e.g. if self.dry_run:  # ...), to keep comments visually separated from code.

Since this class is the “demo” implementation, you might add a brief line to the docstring noting that dry_run=True will skip writing the file but still update the index. That would make the example self‑documenting for downstream users reading this concrete writer.

Neither change affects behaviour, but they would make the example slightly clearer and more consistent.
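A self-documenting demo writer along these lines might look like the following. ExampleTextWriter is a hypothetical, simplified stand-in for ExampleWriter (it keeps its index in memory rather than in a file); note the docstring line about dry_run and the two spaces before the inline comment.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ExampleTextWriter:
    """Write text files and record each path in an in-memory index.

    When ``dry_run`` is True, ``save`` skips writing the file but still
    resolves the path and updates the index.
    """

    root_directory: Path
    dry_run: bool = False
    index: list[str] = field(default_factory=list)

    def save(self, content: str, filename: str) -> Path:
        output_path = self.root_directory / filename
        if not self.dry_run:  # only real runs touch the filesystem
            output_path.write_text(content)
        self.index.append(str(output_path))
        return output_path
```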

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 54dac55 and 8c90996.

📒 Files selected for processing (1)

src/imgtools/io/writers/abstract_base_writer.py (6 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

src/**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for compliance with PEP 8 and PEP 257 (docstring conventions). Ensure the following: - Variables and functions follow meaningful naming conventions. - Docstrings are present, accurate, and align with the implementation. - Code is efficient and avoids redundancy while adhering to DRY principles. - Consider suggestions to enhance readability and maintainability. - Highlight any potential performance issues, edge cases, or logical errors. - Ensure all imported libraries are used and necessary.

Files:

src/imgtools/io/writers/abstract_base_writer.py

🧬 Code graph analysis (1)

src/imgtools/io/writers/abstract_base_writer.py (1)

src/imgtools/io/sample_output.py (1)

default (159-166)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: Integration-Tests (windows-latest, py313, public)
GitHub Check: Integration-Tests (macos-latest, py313, public)
GitHub Check: Integration-Tests (windows-latest, py310, public)
GitHub Check: Integration-Tests (ubuntu-latest, py310, public)
GitHub Check: Integration-Tests (macos-latest, py310, public)
GitHub Check: Integration-Tests (ubuntu-latest, py313, public)

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/imgtools/io/sample_output.py (1)

134-150: Surface dry_run clearly in docs and consider aligning directory creation with dry-run intent

The dry_run flag being threaded into NIFTIWriter and create_dirs=not self.dry_run looks good and keeps the behavior localized.

Two small maintainability/readability suggestions:

The SampleOutput class docstring’s Attributes section doesn’t list dry_run. Adding it there will keep the public configuration surface self-documenting for users reading just this model.

Even when dry_run=True, validate_directory is still called with create=True, so instantiating SampleOutput may create the root output directory. If the intended contract of dry-run is “no filesystem changes at all,” consider gating this as well (e.g., create=not dry_run) or at least documenting that only subdirectory creation is suppressed via create_dirs. This will make the behavior less surprising for callers relying on dry-run.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c90996 and 37df848.

📒 Files selected for processing (3)

src/imgtools/io/sample_output.py (2 hunks)
src/imgtools/io/writers/abstract_base_writer.py (6 hunks)
src/imgtools/io/writers/numpy_writer.py (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/imgtools/io/writers/abstract_base_writer.py

🧰 Additional context used

📓 Path-based instructions (1)

src/**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for compliance with PEP 8 and PEP 257 (docstring conventions). Ensure the following: - Variables and functions follow meaningful naming conventions. - Docstrings are present, accurate, and align with the implementation. - Code is efficient and avoids redundancy while adhering to DRY principles. - Consider suggestions to enhance readability and maintainability. - Highlight any potential performance issues, edge cases, or logical errors. - Ensure all imported libraries are used and necessary.

Files:

src/imgtools/io/sample_output.py
src/imgtools/io/writers/numpy_writer.py

🧬 Code graph analysis (2)

src/imgtools/io/sample_output.py (1)

src/imgtools/io/sample_input.py (1)

default (228-230)

src/imgtools/io/writers/numpy_writer.py (1)

src/imgtools/vizualize/visualizer2.py (1)

array (176-177)

🪛 Ruff (0.14.5)

src/imgtools/io/writers/numpy_writer.py

123-125: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: Integration-Tests (windows-latest, py313, public)
GitHub Check: Integration-Tests (macos-latest, py310, public)
GitHub Check: Integration-Tests (macos-latest, py313, public)
GitHub Check: Integration-Tests (windows-latest, py310, public)
GitHub Check: Integration-Tests (ubuntu-latest, py313, public)
GitHub Check: Linting
GitHub Check: Integration-Tests (ubuntu-latest, py310, public)

🔇 Additional comments (1)

src/imgtools/io/writers/numpy_writer.py (1)

85-87: Docstring update for **kwargs improves clarity

Documenting **kwargs as additional context for filename generation is helpful and keeps the save API self-explanatory for callers.

coderabbitai · 2025-11-26T15:59:52Z

src/imgtools/io/writers/numpy_writer.py

+        if not self.dry_run:
+            if isinstance(data, (np.ndarray, sitk.Image)):
+                # Single image or array
+                array, metadata = self._to_numpy(data)
+                np.savez_compressed(out_path, image_array=array, **metadata)
+            elif isinstance(data, dict):
+                # Multiple images or arrays
+                arrays = {}
+                metadata = {}
+                for key, value in data.items():
+                    array, meta = self._to_numpy(value)
+                    arrays[key] = array
+                    for meta_key, meta_value in meta.items():
+                        metadata[f"{key}_{meta_key}"] = meta_value
+                if self.compressed:
+                    np.savez_compressed(
+                        out_path, allow_pickle=False, **arrays, **metadata
+                    )
+                else:
+                    np.savez(
+                        out_path, allow_pickle=False, **arrays, **metadata
+                    )
            else:
-                np.savez(out_path, allow_pickle=False, **arrays, **metadata)
-        else:
-            raise NumpyWriterValidationError(
-                "Data must be a NumPy array, SimpleITK image, or a dictionary of these types."
-            )
+                raise NumpyWriterValidationError(
+                    "Data must be a NumPy array, SimpleITK image, or a dictionary of these types."
+                )


⚠️ Potential issue | 🟠 Major

Preserve data validation behavior under dry_run and optionally centralize error message

The dry-run guard cleanly skips actual filesystem writes, but it currently also skips all type validation:

The isinstance checks and _to_numpy calls only run when not self.dry_run.

If dry_run=True and data is an unsupported type, no NumpyWriterValidationError is raised; instead, the method quietly indexes a path and returns it. That makes it easy for a dry-run to “succeed” while a real run would fail at write time.

To keep dry-run useful for catching pipeline issues while still avoiding I/O, consider validating data independently of the write, e.g.:

-        out_path = self.resolve_path(**kwargs)
-
-        if not self.dry_run:
-            if isinstance(data, (np.ndarray, sitk.Image)):
+        out_path = self.resolve_path(**kwargs)
+
+        # Validate input type regardless of dry-run to surface misuses early.
+        if isinstance(data, (np.ndarray, sitk.Image)):
+            to_save = ("single", data)
+        elif isinstance(data, dict):
+            to_save = ("dict", data)
+        else:
+            raise NumpyWriterValidationError(
+                "Data must be a NumPy array, SimpleITK image, or a dictionary of these types."
+            )
+
+        if not self.dry_run:
+            kind, payload = to_save
+            if kind == "single":
+                data_single = payload
                 # Single image or array
-                array, metadata = self._to_numpy(data)
+                array, metadata = self._to_numpy(data_single)
                 np.savez_compressed(out_path, image_array=array, **metadata)
-            elif isinstance(data, dict):
+            else:  # dict
                 # Multiple images or arrays
-                arrays = {}
-                metadata = {}
-                for key, value in data.items():
+                arrays: dict[str, np.ndarray] = {}
+                metadata: dict[str, object] = {}
+                for key, value in payload.items():
 ...
-            else:
-                raise NumpyWriterValidationError(
-                    "Data must be a NumPy array, SimpleITK image, or a dictionary of these types."
-                )

This keeps validation semantics identical between normal and dry-run modes while still avoiding writes.

Optionally, to satisfy TRY003 and improve reuse, you could move the long error message into a class-level constant on NumpyWriterValidationError or NumPyWriter and reference it from both _to_numpy and save, but that’s more of a style/readability tweak than a functional requirement.

Committable suggestion skipped: line range outside the PR's diff.
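The optional TRY003 clean-up could look like this: the long message lives on the exception class, so raise sites stay short. The base class, the DEFAULT_MESSAGE attribute, and the check_supported helper are assumptions; the thread does not show how NumpyWriterValidationError is currently defined.

```python
from typing import Optional


class NumpyWriterValidationError(TypeError):
    """Raised when data passed to the NumPy writer has an unsupported type."""

    DEFAULT_MESSAGE = (
        "Data must be a NumPy array, SimpleITK image, or a dictionary of these types."
    )

    def __init__(self, message: Optional[str] = None) -> None:
        # Fall back to the shared message so call sites can raise without arguments.
        super().__init__(message or self.DEFAULT_MESSAGE)


def check_supported(data: object, supported: tuple[type, ...]) -> None:
    """Hypothetical shared check that save() and _to_numpy() could both call."""
    if not isinstance(data, supported):
        raise NumpyWriterValidationError()
```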

🧰 Tools

🪛 Ruff (0.14.5)

123-125: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents

In src/imgtools/io/writers/numpy_writer.py around lines 100 to 125, the current dry-run branch skips all type validation so unsupported input types never raise NumpyWriterValidationError during dry runs; change the flow to always validate input (perform isinstance checks and call self._to_numpy or a dedicated validation helper) before checking self.dry_run, then only skip the actual np.savez/np.savez_compressed when dry_run is True; optionally extract the long error message into a class-level constant on NumpyWriter or NumpyWriterValidationError and reuse it in both _to_numpy and save to avoid duplication.

src/imgtools/dicom/dicom_metadata/extractor_base.py

src/imgtools/io/sample_output.py

Co-authored-by: Katy Scott <[email protected]>

skim2257

looks great. just would love clarification on one CLI output

skim2257 · 2025-12-02T19:33:05Z

src/imgtools/autopipeline.py

"no output files, check the directory" sounds a bit confusing? what are they checking?

Can you send the full output

feat: add dry run functionality to autopipeline and add sensitive dicom tags to metadata

54dac55

JoshuaSiraj requested a review from strixy16 November 25, 2025 21:43

coderabbitai bot reviewed Nov 25, 2025

src/imgtools/io/writers/abstract_base_writer.py

chore: bypass existing file logic when dry_run=True

8c90996

JoshuaSiraj requested a review from skim2257 November 26, 2025 15:43

JoshuaSiraj linked an issue Nov 26, 2025 that may be closed by this pull request

Create simple index file without saving files #416

Open

coderabbitai bot reviewed Nov 26, 2025

chore: ruff format

37df848

coderabbitai bot reviewed Nov 26, 2025

strixy16 reviewed Nov 26, 2025

src/imgtools/dicom/dicom_metadata/extractor_base.py

strixy16 requested changes Nov 26, 2025

src/imgtools/io/sample_output.py (outdated)

Update src/imgtools/io/sample_output.py

c8a1c13

Co-authored-by: Katy Scott <[email protected]>

strixy16 approved these changes Nov 26, 2025

skim2257 reviewed Dec 2, 2025

feat: updated snapshots for integration testing

ce5f490

coderabbitai bot added the hackathon label Dec 10, 2025

JoshuaSiraj and others added 12 commits December 10, 2025 14:58

chore: update lock file

47567a8

chore update lock

4f06061

build: updated pixi lock

8cf0f35

build: updated lockfile from linux

70cc33d

build: pixi lock update with pixi 0.61.0

86493d7

chore: update setup-pixi and pixi versions in main.yml for actions

f14f3e1

chore: update action.yaml

9507a17

chore: pixi lock with version 0.61

64958ad

chore: update pixi lock

2204176

chore: try removing pixi caching

1ecc5ba

chore

91ef21a

chore

f5e79b2

JoshuaSiraj and others added 5 commits January 5, 2026 11:27

chore

5e69eca

chore: try updating log level

c1957ab

chore: turn of locking

de9c9f8

Merge remote-tracking branch 'origin/main' into JoshuaSiraj/feat/dry_run

42b8369

build: updated pixi lock

a913545

3 participants
