Read tiles directly from the WSI during embedding via cucim's batched read_region, eliminating the tar creation step. Super tiles (8x8, 4x4, 2x2 blocks) reduce the number of read calls. A custom batch sampler (adaptive_batching) optionally aligns DataLoader batches to super tile boundaries to avoid redundant reads.

New config options under tiling:
- on_the_fly (default true): read from WSI instead of tar
- gpu_decode (default false): experimental GPU JPEG decoding
- adaptive_batching (default false): vary batch size to match super tiles

Co-Authored-By: Claude Opus 4.6 <[email protected]>
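The super-tile idea above can be sketched in a few lines: tiles on the tile grid are bucketed into fixed-size blocks so one `read_region` call can serve the whole block. The function name, the `(col, row)` coordinate convention, and the standalone grouping step are illustrative, not slide2vec's actual API.

```python
# Hypothetical sketch of super-tile grouping: assign each (col, row) tile on the
# tile grid to an 8x8 super tile, so up to 64 tiles share one read call.
from collections import defaultdict

def group_into_super_tiles(tile_coords, block=8):
    """Map (col, row) tile-grid coordinates to super-tile buckets."""
    groups = defaultdict(list)
    for col, row in tile_coords:
        groups[(col // block, row // block)].append((col, row))
    return dict(groups)

coords = [(c, r) for c in range(16) for r in range(16)]  # 256 tiles
groups = group_into_super_tiles(coords)
# 256 tiles collapse into 4 super tiles -> 4 reads instead of 256
```

With 8x8 blocks on a dense grid this divides the number of read calls by up to 64; the 4x4 and 2x2 block sizes trade fewer saved reads for smaller decoded regions.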
- remove dead try/except around gpu_decode dict assignment
- remove unused variable in read_batch
- update docstring to mention 2x2 blocks
- fix on_the_fly default mismatch in from_config fallback (False → True)
- use np.isin instead of set for DDP tile filtering

Co-Authored-By: Claude Opus 4.6 <[email protected]>
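The np.isin change in the last bullet amounts to a vectorized membership test over the full index array instead of a Python-level set lookup per element. The array contents below are invented for illustration.

```python
# Illustrative: select this DDP rank's tiles with np.isin, which builds the
# boolean mask in one vectorized call rather than looping over a Python set.
import numpy as np

all_tiles = np.arange(10)
rank_tiles = np.array([1, 4, 7])        # indices assigned to this rank
mask = np.isin(all_tiles, rank_tiles)   # boolean mask over all_tiles
selected = all_tiles[mask]
# selected -> array([1, 4, 7])
```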
The flat arrays (tile_to_st, tile_crop_x, tile_crop_y) are the only lookup structures used in the hot path. The per-super-tile duplicates were never read. Co-Authored-By: Claude Opus 4.6 <[email protected]>
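A minimal sketch of the flat-array lookup described above: given a global tile index, the hot path needs only three O(1) array reads plus one slice. The array names come from the commit message, but the values, tile size, and helper function here are invented for a toy layout of 4 tiles across 2 super tiles.

```python
# Toy hot-path lookup using the flat arrays named in the commit message.
import numpy as np

tile_to_st = np.array([0, 0, 1, 1])       # tile index -> super-tile index
tile_crop_x = np.array([0, 256, 0, 256])  # x offset of tile inside its super tile
tile_crop_y = np.array([0, 0, 0, 0])      # y offset
TILE = 256

def crop_for_tile(i, super_tile_pixels):
    """Three O(1) array reads, then one slice into the decoded super tile."""
    st = tile_to_st[i]
    x, y = tile_crop_x[i], tile_crop_y[i]
    return super_tile_pixels[st][y:y + TILE, x:x + TILE]

# Two decoded 256x512 super tiles (zeros and ones stand in for pixel data).
super_tiles = [np.zeros((256, 512)), np.ones((256, 512))]
crop = crop_for_tile(3, super_tiles)   # tile 3 lives in super tile 1 at x=256
```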
PyTurboJPEG>=2 requires libjpeg-turbo 3.0+. Ubuntu 22.04 ships 2.x via libturbojpeg0-dev. Build libjpeg-turbo 3.1.0 from source in the build stage and copy the shared libs to the runtime stage. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Pulls in hs2p[cucim] (cucim-cu12, cupy-cuda12x, nvidia-nvimgcodec-cu12) and PyTurboJPEG. Co-Authored-By: Claude Opus 4.6 <[email protected]>
_WSDTarReadPlan now carries tile_indices directly, so _build_supertile_index no longer needs to reconstruct member coords via coord_to_index — replaced with iter(plan.tile_indices). Also bumps hs2p requirement to >=2.4.1 which introduced the top-down super tile grouping and the tile_indices field. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
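The simplification above can be illustrated with a stripped-down stand-in for the plan object (the dataclass below is hypothetical and mirrors only the `tile_indices` field named in the commit): with the indices carried directly, downstream code iterates them as-is instead of reverse-mapping member coordinates through a `coord_to_index` dict.

```python
# Simplified stand-in for hs2p's _WSDTarReadPlan, showing only the
# tile_indices field this commit relies on.
from dataclasses import dataclass

@dataclass
class WSDTarReadPlan:
    tile_indices: tuple

plan = WSDTarReadPlan(tile_indices=(3, 7, 11))
members = list(iter(plan.tile_indices))   # no coord_to_index reconstruction
```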
Threads hs2p's jpeg_backend parameter (turbojpeg/pil) through PreprocessingConfig → _tile_slides so callers can pick the JPEG encoder used during tar extraction. Defaults to "turbojpeg". Sets jpeg_backend="pil" in test_output_consistency so the tar path matches the PIL-encoded ground truth fixtures. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
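The backend switch amounts to a simple dispatch on the threaded-through value. Only the "turbojpeg"/"pil" option names and the "turbojpeg" default come from the commit message; the encoder helpers below are placeholders, not real PyTurboJPEG or PIL calls.

```python
# Illustrative dispatch on the jpeg_backend value; the helpers are stand-ins.
def _encode_turbojpeg(rgb_bytes):
    return b"turbojpeg:" + rgb_bytes   # placeholder for a TurboJPEG encode

def _encode_pil(rgb_bytes):
    return b"pil:" + rgb_bytes         # placeholder for a PIL JPEG save

def encode_tile(rgb_bytes, jpeg_backend="turbojpeg"):
    if jpeg_backend == "turbojpeg":
        return _encode_turbojpeg(rgb_bytes)
    if jpeg_backend == "pil":
        return _encode_pil(rgb_bytes)
    raise ValueError(f"unknown jpeg_backend: {jpeg_backend!r}")
```

Pinning `jpeg_backend="pil"` in the consistency test makes sense because JPEG is lossy and different encoders produce different bytes, so the tar path must use the same encoder as the fixtures it is compared against.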
Adds num_cucim_workers (default 4) to PreprocessingConfig. In the on-the-fly path, DataLoader num_workers is auto-derived as cpu_count // num_cucim_workers instead of reusing speed.num_workers, avoiding oversubscription when both are large. A log message is emitted when the computed value differs from speed.num_workers. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
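The derivation is a one-liner; a sketch under stated assumptions (the function name is illustrative, `cpus` is passed explicitly here where the real code would consult the CPU count, and the clamp to at least one worker is my addition, not taken from the commit):

```python
# DataLoader workers derived from CPU count and num_cucim_workers (default 4),
# instead of reusing speed.num_workers, to avoid oversubscription.
def derive_dataloader_workers(cpus, num_cucim_workers=4):
    return max(1, cpus // num_cucim_workers)

derive_dataloader_workers(32)  # 8 loader workers alongside 4 cucim workers
```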
Adds num_preprocessing_workers to ExecutionOptions, read from speed.num_preprocessing_workers (fallback speed.num_workers). _tile_slides now uses num_preprocessing_workers for hs2p, while execution.num_workers (speed.num_dataloader_workers) remains the DataLoader knob for the tar embedding path. Default yaml updated with explicit num_preprocessing_workers and num_dataloader_workers keys. Backward-compatible via num_workers fallback in from_config. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
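The backward-compatible fallback described above can be sketched as a two-level key lookup. The config is a plain dict here for illustration, and the final default of 1 is my assumption, not taken from the commit.

```python
# Prefer the new speed.num_preprocessing_workers key; fall back to the legacy
# speed.num_workers key so old configs keep working.
def resolve_preprocessing_workers(speed_cfg):
    return speed_cfg.get("num_preprocessing_workers",
                         speed_cfg.get("num_workers", 1))
```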
Summary

- Add `tiling.save_tiles` and `TarTileReader`, while keeping HS2P coordinate artifacts as the canonical tiling metadata
- Rename `tiles_*` coordinate files to `coordinates_*`, update docs, and add focused regression coverage for the new tile-store path

Validation
- `COVERAGE_FILE=/tmp/slide2vec-region-final.coverage ~/Code/venv/slide2vec/bin/python -m pytest -q tests/test_dependency_split.py tests/test_regression_inference.py tests/test_progress.py -k 'region_batch_preprocessor or region_models or serialize_execution or uses_batched_loader_knobs or batch_timing or run_forward_pass or dependency_split'`
- `COVERAGE_FILE=/tmp/slide2vec-backend-split.coverage ~/Code/venv/slide2vec/bin/python -m pytest -q tests/test_regression_core.py tests/test_regression_inference.py -k 'execution_options_from_config or execution_options_with_output_dir or serialize_execution or build_hs2p_configs or resolve_embedding_backend or uses_batched_loader_knobs'`
- `COVERAGE_FILE=/tmp/slide2vec-push.coverage ~/Code/venv/slide2vec/bin/python -m pytest -q tests/test_hs2p_package_cutover.py -k 'load_process_df_requires_hs2p_process_list_columns'`
- `~/Code/venv/slide2vec/bin/python -m py_compile slide2vec/api.py slide2vec/data/tile_store.py slide2vec/data/dataset.py slide2vec/inference.py slide2vec/utils/tiling_io.py tests/test_tile_store.py`

Notes
- `tests/test_tile_store.py` was added for the tar-backed reader, but this local venv is missing PIL, so I did not run that file here.
- `.claude/settings.local.json` remains local and was not included in the branch.