Skip to content

Conversation

@deto
Copy link

@deto deto commented Nov 21, 2025

This addresses issue #61

See issue for details on the motivation for this fix.

I am using this dev version to run some jobs (so I know it runs), but I'd like to be able to run the tests. However, the tests seem to run on some sort of local s3 mirror? Happy to run the tests if you could provide instructions on how.

Summary by CodeRabbit

  • Refactor
    • Improved directory detection for S3 storage operations through real-time validation instead of cached state.
    • Enhanced error handling to better distinguish between missing objects and other failures.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 21, 2025

📝 Walkthrough

Walkthrough

Removed cached directory-state tracking in the S3 storage plugin, replacing _is_dir initialization with real-time directory detection via sentinel file checks. The new is_dir() method determines directory status by checking if the local path is a directory, then attempting to load a .snakemake_timestamp sentinel object, using the result to infer S3 directory existence.

Changes

Cohort / File(s) Change Summary
Directory state tracking refactor
snakemake_storage_plugin_s3/__init__.py
Removed _is_dir cache field and its initialization in __post_init__() and store_object(). Refactored is_dir() to perform real-time checks: returns True for local directories, attempts sentinel file load via s3obj(".snakemake_timestamp"), returns True on success, False on 404, and re-raises other exceptions. Updated error handling to treat 404 as non-existence indicator.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant is_dir
    participant local_path
    participant S3

    Caller->>is_dir: is_dir()
    is_dir->>local_path: check if local_path() is directory
    alt local_path is directory
        local_path-->>is_dir: True
        is_dir-->>Caller: return True
    else local_path is not directory
        local_path-->>is_dir: False
        is_dir->>S3: attempt load s3obj(".snakemake_timestamp")
        alt sentinel object exists
            S3-->>is_dir: success
            is_dir-->>Caller: return True
        else 404 error
            S3-->>is_dir: 404
            is_dir-->>Caller: return False
        else other exception
            S3-->>is_dir: exception
            is_dir-->>Caller: re-raise exception
        end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Core function refactor: is_dir() logic significantly changed with new sentinel-file-based detection replacing cached approach; requires verification of all execution paths
  • Error handling changes: 404 interpretation and re-raise logic must be validated against S3 client behavior
  • Cache removal impact: Ensure removal of _is_dir mutations in store_object() and __post_init__() doesn't introduce race conditions or missed directory detections in concurrent scenarios
  • Behavioral implications: Real-time sentinel file checks may have latency or consistency implications depending on S3 backend; test edge cases where sentinel files may not exist or be stale

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: removing _is_dir caching and switching to .snakemake-timestamp sentinel detection, which aligns with the file changes.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
snakemake_storage_plugin_s3/__init__.py (1)

327-341: is_dir logic looks good; consider a small cleanup of control flow and exception re‑raise

The new directory detection via .snakemake_timestamp is consistent with Snakemake’s directory semantics and fits well with how exists(), mtime(), and size() use is_dir().

There are two small, non‑blocking cleanups you may want to make:

  1. The final return False is unreachable, since all paths in the try/except either return or raise.
  2. Re‑raising with raise instead of raise e preserves the original traceback and addresses the TRY201 hint.

You can address both (and the TRY300 hint about return in try) with a minimal refactor:

 def is_dir(self):
-        if self.local_path().is_dir():
-            return True
-
-        try:
-            self.s3obj(".snakemake_timestamp").load()
-            return True
-        except botocore.exceptions.ClientError as e:
-            err_code = e.response["Error"]["Code"]
-            if err_code == "404":
-                return False
-            else:
-                raise e
-
-        return False
+        if self.local_path().is_dir():
+            return True
+
+        try:
+            self.s3obj(".snakemake_timestamp").load()
+        except botocore.exceptions.ClientError as e:
+            err_code = e.response["Error"]["Code"]
+            if err_code == "404":
+                return False
+            raise
+        return True

This keeps behavior identical while simplifying the flow and aligning with Ruff’s suggestions.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d067f40 and 53a098a.

📒 Files selected for processing (1)
  • snakemake_storage_plugin_s3/__init__.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • snakemake_storage_plugin_s3/__init__.py
🪛 Ruff (0.14.5)
snakemake_storage_plugin_s3/__init__.py

333-333: Consider moving this statement to an else block

(TRY300)


339-339: Use raise without specifying exception name

Remove exception name

(TRY201)

@deto deto changed the title Remove _is_dir caching, use .snakemake-timestamp instead fix: Remove _is_dir caching, use .snakemake-timestamp instead Nov 21, 2025
@deto
Copy link
Author

deto commented Nov 21, 2025

And....now I'm seeing there are two other pull requests both to fix the same issue!

I think I like how #60 did it as it preserves the caching. I'll leave this up, though, as a reference for an alternate fix to be closed once one of the other PRs is merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant