Fix corrupted data before processing + auto recovery #861
Open
aliel wants to merge 16 commits into main from aliel-fix-corrupted-data
16 commits
a5f110a Fix corrupted data before processing (aliel)
2a63c80 move the fix on the correct location (message processing) (aliel)
0e7b8f1 Fix corrupted storage cache: atomic writes + hash verification + reco… (aliel)
4d3eb99 Scope SHA-256 cache check to message content, fix regressions + revie… (aliel)
dfdc0a6 Fix test: use valid CIDv0 for IPFS cache test fixture (aliel)
e33e4ee Extract JSON corruption recovery into _recover_cached_content method (aliel)
f08ceaa Refactor: encapsulate corruption recovery, keep JSON parsing in caller (aliel)
a35f663 Address reviewer nits: docstring and u0000 comment (aliel)
8812331 isort fix (aliel)
0e7a5b2 Document test dependency on _fetch_content_from_network verification (aliel)
b64deb1 Fix unused type: ignore in test_get_content (aliel)
3263c1b Add IPFS JSON recovery + inline regression tests; fix comment typo (aliel)
5d1861c Review: rename recovery vars, fix IPFS test, add cleanup comment (aliel)
4dda544 Remove redundant str() casts, fix verify_spy comment wording and clar… (aliel)
9b113b4 Fix test regressions: IPFS cache assertion and inline schema guard (aliel)
f018e14 Move SHA-256 cache check to repair tool behind --repair-native-storage (aliel)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,5 @@ | ||
| import asyncio | ||
| import os | ||
| from pathlib import Path | ||
| from typing import AsyncIterable, Optional, Union | ||
|
|
||
|
|
@@ -43,7 +45,66 @@ async def _read_iterator(): | |
|
|
||
| async def write(self, filename: str, content: bytes): | ||
| file_path = self.folder / filename | ||
| file_path.write_bytes(content) | ||
| temp_path = self.folder / f"{filename}.tmp" | ||
|
|
||
| # Run blocking syscalls (os.open, os.fsync, os.replace) off the event loop. | ||
| await asyncio.to_thread(self._write_durably, temp_path, file_path, content) | ||
|
|
||
| @staticmethod | ||
| def _write_durably(temp_path: Path, file_path: Path, content: bytes) -> None: | ||
| """Atomically and durably write ``content`` to ``file_path``. | ||
|
|
||
| Steps: | ||
| 1. Write bytes to ``temp_path`` (same directory as ``file_path``). | ||
| 2. fsync the file descriptor so data and file metadata hit the disk. | ||
| 3. Atomically rename via ``os.replace`` (POSIX-atomic on same FS). | ||
| 4. Best-effort fsync of the parent directory so the rename is durable | ||
| across kernel crashes (POSIX-only; silently skipped on Windows). | ||
|
|
||
| On any exception (including post-rename errors in the directory-fsync | ||
| section), the temp file is removed best-effort and the exception is | ||
| re-raised. The target file is never touched until the rename succeeds, | ||
| so crashes leave either the old content or none. | ||
| """ | ||
| fd = os.open( | ||
| temp_path, | ||
| os.O_WRONLY | os.O_CREAT | os.O_TRUNC, | ||
| 0o644, | ||
| ) | ||
| try: | ||
| try: | ||
| view = memoryview(content) | ||
| written = 0 | ||
| while written < len(view): | ||
| written += os.write(fd, view[written:]) | ||
| os.fsync(fd) | ||
| finally: | ||
| os.close(fd) | ||
|
|
||
| os.replace(temp_path, file_path) | ||
|
|
||
| # Best-effort directory fsync — makes the rename durable. | ||
| # os.O_DIRECTORY is POSIX-only (AttributeError on Windows); | ||
| # some filesystems/VMs also raise OSError — both are silently skipped. | ||
| try: | ||
| dir_fd = os.open(file_path.parent, os.O_DIRECTORY) | ||
| except (AttributeError, OSError): | ||
| return | ||
| try: | ||
| os.fsync(dir_fd) | ||
| except OSError: | ||
| pass | ||
| finally: | ||
| os.close(dir_fd) | ||
|
|
||
| except Exception: | ||
| try: | ||
| # temp_path may already be gone if os.replace succeeded before | ||
| # the exception (e.g. an error in the dir-fsync section). | ||
| temp_path.unlink(missing_ok=True) | ||
| except OSError: | ||
| pass | ||
| raise | ||
|
Comment on lines +100 to +107
Collaborator
Cleanup of the temporary directory should be left to the caller: either declare the temporary file in this function or perform the cleanup in the caller.
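One possible reading of this suggestion is that the function which removes the temp file should also be the one that creates it. The sketch below illustrates that shape only; it is not code from this PR. The name `write_durably` and the use of `tempfile.mkstemp` are assumptions made for illustration, and the directory fsync step is left out for brevity.

```python
import os
import tempfile
from pathlib import Path


def write_durably(file_path: Path, content: bytes) -> None:
    """Hypothetical variant: the temp file is declared and cleaned up here."""
    # mkstemp creates a uniquely named file in the target directory, so the
    # later os.replace stays on the same filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=file_path.parent, prefix=f"{file_path.name}.", suffix=".tmp"
    )
    temp_path = Path(tmp_name)
    try:
        try:
            view = memoryview(content)
            written = 0
            while written < len(view):
                written += os.write(fd, view[written:])
            os.fsync(fd)
        finally:
            os.close(fd)
        os.replace(temp_path, file_path)
    except Exception:
        # The function that created the temp file is also the one removing it.
        temp_path.unlink(missing_ok=True)
        raise
```

Note that `mkstemp` creates the file with mode `0o600`, whereas the diff opens it with `0o644`; a real change would need to decide which permissions are wanted. A unique temp name also avoids two concurrent writes to the same `filename` sharing one `.tmp` path.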
||
|
|
||
| async def delete(self, filename: str): | ||
| file_path = self.folder / filename | ||
|
|
||
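The `while written < len(view)` loop in `_write_durably` exists because `os.write` may write fewer bytes than requested. The following is a small isolated sketch of that loop, not code from the PR: `write_all`, its injectable `write` parameter, and `demo_short_writes` are hypothetical names introduced so the short-write case can be simulated.

```python
import os
import tempfile
from typing import Callable, Tuple


def write_all(
    fd: int, content: bytes, write: Callable[[int, memoryview], int] = os.write
) -> int:
    """Write every byte of ``content`` to ``fd``, retrying after short writes."""
    view = memoryview(content)
    written = 0
    while written < len(view):
        # ``write`` returns how many bytes it accepted, possibly fewer than given.
        written += write(fd, view[written:])
    return written


def demo_short_writes() -> Tuple[bytes, int]:
    """Simulate a writer that accepts at most 3 bytes per call."""
    calls = 0

    def capped_write(fd: int, data: memoryview) -> int:
        nonlocal calls
        calls += 1
        return os.write(fd, data[:3])

    fd, path = tempfile.mkstemp()
    try:
        write_all(fd, b"hello world", write=capped_write)
    finally:
        os.close(fd)
    try:
        with open(path, "rb") as handle:
            return handle.read(), calls
    finally:
        os.unlink(path)
```

With an 11-byte payload and a 3-byte cap, the loop needs four calls (3 + 3 + 3 + 2) before all bytes are on the file descriptor; a single unchecked `os.write` would have stopped after the first three.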
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,123 @@ | ||
| """Tests for FileSystemStorageEngine's durable atomic write. | ||
|
|
||
| These tests cover the invariants provided by ``write()``: | ||
| - Content on disk matches what was written (happy path). | ||
| - A temp file is used and atomically renamed. | ||
| - ``os.fsync`` is called on the file fd before ``os.replace``. | ||
| - Failures during rename leave the target unchanged and clean up the temp file. | ||
| """ | ||
|
|
||
| import os | ||
| from pathlib import Path | ||
| from unittest.mock import patch | ||
|
|
||
| import pytest | ||
|
|
||
| from aleph.services.storage.fileystem_engine import FileSystemStorageEngine | ||
|
|
||
|
|
||
| @pytest.mark.asyncio | ||
| async def test_write_produces_final_file(tmp_path: Path): | ||
| engine = FileSystemStorageEngine(folder=tmp_path) | ||
| await engine.write(filename="abc", content=b"hello") | ||
|
|
||
| final = tmp_path / "abc" | ||
| assert final.is_file() | ||
| assert final.read_bytes() == b"hello" | ||
| assert not (tmp_path / "abc.tmp").exists() | ||
|
|
||
|
|
||
| @pytest.mark.asyncio | ||
| async def test_write_uses_temp_file_then_rename(tmp_path: Path): | ||
| engine = FileSystemStorageEngine(folder=tmp_path) | ||
|
|
||
| observed = {} | ||
| real_replace = os.replace | ||
|
|
||
| def spy_replace(src, dst): | ||
| observed["src_name"] = os.path.basename(src) | ||
| observed["src_exists_before"] = os.path.exists(src) | ||
| observed["dst_missing_before"] = not os.path.exists(dst) | ||
| return real_replace(src, dst) | ||
|
|
||
| with patch( | ||
| "aleph.services.storage.fileystem_engine.os.replace", | ||
| side_effect=spy_replace, | ||
| ): | ||
| await engine.write(filename="abc", content=b"hello") | ||
|
|
||
| assert observed["src_name"] == "abc.tmp" | ||
| assert observed["src_exists_before"] is True | ||
| assert observed["dst_missing_before"] is True | ||
| assert (tmp_path / "abc").read_bytes() == b"hello" | ||
|
|
||
|
|
||
| @pytest.mark.asyncio | ||
| async def test_write_fsyncs_before_rename(tmp_path: Path): | ||
| engine = FileSystemStorageEngine(folder=tmp_path) | ||
|
|
||
| call_order = [] | ||
| real_fsync = os.fsync | ||
| real_replace = os.replace | ||
|
|
||
| def tracking_fsync(fd): | ||
| call_order.append("fsync") | ||
| return real_fsync(fd) | ||
|
|
||
| def tracking_replace(src, dst): | ||
| call_order.append("replace") | ||
| return real_replace(src, dst) | ||
|
|
||
| with ( | ||
| patch( | ||
| "aleph.services.storage.fileystem_engine.os.fsync", | ||
| side_effect=tracking_fsync, | ||
| ), | ||
| patch( | ||
| "aleph.services.storage.fileystem_engine.os.replace", | ||
| side_effect=tracking_replace, | ||
| ), | ||
| ): | ||
| await engine.write(filename="abc", content=b"hello") | ||
|
|
||
| # The file-fd fsync must occur before the atomic rename. | ||
| first_replace = call_order.index("replace") | ||
| assert "fsync" in call_order[:first_replace] | ||
|
|
||
|
|
||
| @pytest.mark.asyncio | ||
| async def test_write_failure_cleans_up_temp_and_preserves_target(tmp_path: Path): | ||
| engine = FileSystemStorageEngine(folder=tmp_path) | ||
|
|
||
| # Pre-seed the target to confirm it survives a failed write. | ||
| target = tmp_path / "abc" | ||
| target.write_bytes(b"original") | ||
|
|
||
| def boom(src, dst): | ||
| raise OSError("simulated rename failure") | ||
|
|
||
| with patch( | ||
| "aleph.services.storage.fileystem_engine.os.replace", | ||
| side_effect=boom, | ||
| ): | ||
| with pytest.raises(OSError): | ||
| await engine.write(filename="abc", content=b"new-content") | ||
|
|
||
| assert target.read_bytes() == b"original" | ||
| assert not (tmp_path / "abc.tmp").exists() | ||
|
|
||
|
|
||
| @pytest.mark.asyncio | ||
| async def test_read_missing_returns_none(tmp_path: Path): | ||
| engine = FileSystemStorageEngine(folder=tmp_path) | ||
| assert await engine.read("missing") is None | ||
|
|
||
|
|
||
| @pytest.mark.asyncio | ||
| async def test_overwrite_replaces_content(tmp_path: Path): | ||
| engine = FileSystemStorageEngine(folder=tmp_path) | ||
| await engine.write(filename="abc", content=b"v1") | ||
| await engine.write(filename="abc", content=b"v2") | ||
|
|
||
| assert (tmp_path / "abc").read_bytes() == b"v2" | ||
| assert not (tmp_path / "abc.tmp").exists() |
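The failure test above simulates an error in `os.replace`. A similar check could cover a failure at the `os.fsync` step, which happens before the rename. The sketch below is self-contained and uses a simplified stand-in, `write_durably`, rather than the real engine; both that function and `check_fsync_failure_preserves_target` are hypothetical, and the directory fsync is omitted.

```python
import os
from pathlib import Path
from unittest.mock import patch


def write_durably(temp_path: Path, file_path: Path, content: bytes) -> None:
    """Simplified stand-in for the engine's write path (no directory fsync)."""
    fd = os.open(temp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        try:
            view = memoryview(content)
            written = 0
            while written < len(view):
                written += os.write(fd, view[written:])
            os.fsync(fd)
        finally:
            os.close(fd)
        os.replace(temp_path, file_path)
    except Exception:
        temp_path.unlink(missing_ok=True)
        raise


def check_fsync_failure_preserves_target(folder: Path) -> bool:
    """Return True if a failed fsync leaves the target intact and no temp file."""
    target = folder / "abc"
    target.write_bytes(b"original")
    temp = folder / "abc.tmp"
    with patch("os.fsync", side_effect=OSError("simulated fsync failure")):
        try:
            write_durably(temp, target, b"new-content")
        except OSError:
            pass
        else:
            return False  # the error should have propagated
    return target.read_bytes() == b"original" and not temp.exists()
```

In the actual test module this would presumably be an async pytest test that patches `aleph.services.storage.fileystem_engine.os.fsync`, mirroring the patch targets already used above, and calls `engine.write` instead of the stand-in.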