Refine recursive mode design: type conflicts, cycle detection, delete semantics#11
Address all issues from add_recursive_plan_v2.review_v1.md:
- Move --delete flag to save command (user controls deletion locally)
- Handle type changes (file↔dir) by logging a warning and skipping
- Add symlink cycle detection using (st_dev, st_ino)
- Document why metadata is always sent (allows metadata-only updates)
- Clarify retrieve output order (dest items, new items, remv commands)
- Emphasize single-file and recursive are both first-class citizens
Pull request overview
This PR refines the design document for the --recursive mode implementation in blockcopy, addressing important aspects of error handling, safety mechanisms, and protocol semantics based on implementation experience.
Changes:
- Type conflict handling changed from automatic replacement to explicit warnings with manual resolution required
- Added cycle detection mechanism using (st_dev, st_ino) pairs for symlink cycles
- Clarified delete semantics: --delete flag moved to save command, retrieve always sends remv commands
- Documented that metadata is always sent even for unchanged files to enable permission/timestamp synchronization
- Clarified dual-mode protocol nature (single-file and recursive as equal "first-class citizens")
| - If it exists as a file: compare hashes, send `'file'` + changed data blocks + `'meta'` | ||
| - If it exists as a directory: print warning "type changed: file → dir", skip | ||
| - If it does not exist on source: add the path to `to_delete` |
The algorithm for handling type conflicts appears incomplete. When a file on dest exists as a directory on source (line 249), the item is skipped but it's unclear whether the path is added to seen_paths. If it's not added to seen_paths, then step (e) at line 257-259 would later send the directory and all its children as new items when walking the source. However, save cannot create these because the conflicting file still exists on dest. The algorithm should clarify that type-conflicted paths should be added to seen_paths to prevent attempting to send the source version, and should also skip all descendants of type-conflicted directories to avoid attempting impossible operations.
| - Check on demand whether the directory exists on source | ||
| - If it exists: send `'dire'` + `'dmet'` | ||
| - If it exists as a directory: send `'dire'` + `'dmet'` | ||
| - If it exists as a file: print warning "type changed: dir → file", skip |
Similar to the file→dir conflict, when a directory on dest exists as a file on source (line 255), the algorithm should clarify whether the path is added to seen_paths. If not added, step (e) at line 257-259 would send the file as a new item, but save cannot create it because the conflicting directory still exists on dest. Type-conflicted paths should be added to seen_paths to prevent retransmission.
| - If it exists as a file: print warning "type changed: dir → file", skip | |
| - If it exists as a file: print warning "type changed: dir → file"; the path stays in `seen_paths` (it must not be sent again as a new item in step (e)), skip |
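The type-conflict handling requested in the two comments above could be sketched roughly as follows. This is only an illustration under assumptions, not the blockcopy implementation: the function names `classify_dest_item` and `should_send_as_new`, the `conflicted` set, and the use of `/`-separated relative paths are all hypothetical.

```python
import logging
import os
from pathlib import PurePosixPath


def classify_dest_item(source_root, rel_path, dest_kind,
                       seen_paths, conflicted, to_delete):
    """Decide how retrieve treats one item reported by dest.

    dest_kind is 'file' or 'dir'. Returns 'sync', 'conflict' or 'delete'.
    All names here are hypothetical; they only mirror the design text.
    """
    src = os.path.join(source_root, rel_path)
    if not os.path.exists(src):
        # Missing on source: candidate for a later 'remv' command.
        to_delete.append(rel_path)
        return "delete"
    # The path exists on source in some form, so it is never a "new" item.
    seen_paths.add(rel_path)
    src_kind = "dir" if os.path.isdir(src) else "file"
    if src_kind != dest_kind:
        logging.warning("type changed: %s -> %s, skipping %s",
                        dest_kind, src_kind, rel_path)
        conflicted.add(rel_path)
        return "conflict"
    return "sync"


def should_send_as_new(rel_path, seen_paths, conflicted):
    """True if step (e) may send rel_path as a new item."""
    if rel_path in seen_paths:
        return False
    # Skip descendants of a type-conflicted directory as well.
    return not any(str(p) in conflicted
                   for p in PurePosixPath(rel_path).parents)
```

Keeping a separate `conflicted` set is one way to skip descendants without scanning `seen_paths` for prefixes; a prefix check on `seen_paths` alone would also suppress genuinely new children of ordinary directories.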
| - Walk the source directory (`followlinks=True`) | ||
| - For each item that is not in `seen_paths`: send it as new | ||
| - If `--delete`: send `'remv'` for items in `to_delete` | ||
| - Send `'remv'` for items in `to_delete` (save without `--delete` ignores them) |
The algorithm specifies sending remv commands for items in to_delete but doesn't specify the order. The section "Deletion order (--delete)" at line 407 states that remv commands should be sorted in descending order by path depth (children before parents) so that deletion can succeed. The algorithm should clarify either that remv commands are sent in descending depth order or that save deletes recursively.
| - Send `'remv'` for items in `to_delete` (save without `--delete` ignores them) | |
| - Sort the items in `to_delete` by path depth in descending order (descendants before parents; at equal depth, e.g. alphabetically by path) | |
| - Send `'remv'` for items in `to_delete` in this order (save without `--delete` ignores them) |
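The ordering in the suggestion above could be expressed as a single sort key. A minimal sketch, assuming `to_delete` holds `/`-separated relative paths; the helper name `order_for_removal` is hypothetical.

```python
def order_for_removal(to_delete):
    """Return paths deepest first, ties broken alphabetically.

    Depth is taken as the number of '/' separators in the relative path,
    which assumes normalized paths without trailing slashes.
    """
    return sorted(to_delete, key=lambda p: (-p.count("/"), p))


print(order_for_removal(["a", "a/b", "a/b/c", "a/d"]))
# prints ['a/b/c', 'a/b', 'a/d', 'a']
```

With this order every child appears before its parent, so a non-recursive directory removal on the save side finds the directory already empty.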
| Symlinks are followed (followlinks=True). | ||
| Detects cycles using (st_dev, st_ino) and skips already visited directories. |
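For reference, cycle detection of the kind this docstring describes might look like the sketch below. It assumes `walk_directory()` wraps `os.walk`; the actual signature and return type in blockcopy are not shown in the diff, so treat this shape as hypothetical.

```python
import os


def walk_directory(root):
    """Yield (dirpath, dirnames, filenames) like os.walk, following
    symlinks but never descending into a directory seen before."""
    st = os.stat(root)
    visited = {(st.st_dev, st.st_ino)}
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        keep = []
        for name in dirnames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # vanished or unreadable entry: do not descend
            key = (st.st_dev, st.st_ino)
            if key in visited:
                continue  # cycle, or a second link to the same directory
            visited.add(key)
            keep.append(name)
        # In-place assignment prunes what os.walk descends into next.
        dirnames[:] = keep
        yield dirpath, dirnames, filenames
```

Note that a visited set skips any directory reached a second time, not only true cycles, so two symlinks pointing at the same directory would be synced under just one of the paths.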
The design of walk_directory() with followlinks=True and always following symlinks ("Symlinky se následují (followlinks=True)") introduces a path traversal / data exfiltration risk: an unprivileged user can place a symlink inside the synced tree pointing to any path readable by the blockcopy process (e.g., /etc/shadow), and the recursive sync will read and transfer the target contents under the symlink path. In multi-user or partially untrusted source directories, this lets less-privileged users cause the tool (often running with elevated privileges) to leak arbitrary files outside the intended source root. To mitigate this, avoid following symlinks by default for recursive mode (treat them as symlinks or skip them), or make symlink following an explicit opt-in with strict checks that the resolved target remains within the allowed source root and does not cross sensitive filesystem boundaries.
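One form the suggested containment check could take is sketched below. The helper name `resolves_inside_root` is hypothetical, and the check is only a partial mitigation: it does not address device boundaries or races between the check and the later read.

```python
import os


def resolves_inside_root(path, source_root):
    """True if path, after resolving symlinks, stays under source_root."""
    real_root = os.path.realpath(source_root)
    real_path = os.path.realpath(path)
    try:
        return os.path.commonpath([real_root, real_path]) == real_root
    except ValueError:
        # Raised e.g. for paths on different drives on Windows.
        return False
```

A walker that follows symlinks only as an opt-in could call this for every symlinked entry and skip, with a warning, anything that resolves outside the source root.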
Summary
This PR refines the design document for the --recursive mode implementation with important clarifications on error handling, safety mechanisms, and protocol semantics based on implementation experience.
Key Changes
Type conflict handling: Changed from automatic remv+recreate to explicit warnings and skipping. When a file becomes a directory or vice versa, the item is now skipped with a warning rather than automatically replaced. This prevents unexpected data loss and requires manual user intervention.
Cycle detection: Added explicit cycle detection mechanism using (st_dev, st_ino) pairs in walk_directory() to handle symlink cycles when followlinks=True.
Delete semantics clarification:
- --delete flag moved to save command (not retrieve)
- retrieve always sends remv commands for missing items
- save without --delete ignores remv commands; with --delete executes them
Metadata handling: Clarified that metadata (meta command) is always sent even for unchanged files, enabling permission/timestamp synchronization without content transfer.
Type checking on retrieve: Added explicit checks to detect when source items have changed type (file→dir or dir→file) and handle appropriately with warnings.
Dual-mode protocol: Documented that single-file and recursive modes are equal "first-class citizens" with different use cases, not a version upgrade.
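As an illustration of the delete semantics listed under Key Changes, the save side might treat an incoming remv command roughly like this. The function name `handle_remv` and its parameters are hypothetical, and the sketch assumes remv commands arrive children first so that directories are already empty when their turn comes.

```python
import logging
import os


def handle_remv(dest_root, rel_path, delete_enabled):
    """Apply one remv command on dest. Returns True if something was removed."""
    if not delete_enabled:
        # save without --delete: the command is received but ignored.
        return False
    target = os.path.join(dest_root, rel_path)
    try:
        if os.path.islink(target) or os.path.isfile(target):
            os.remove(target)
            return True
        if os.path.isdir(target):
            os.rmdir(target)  # non-recursive: fails if children remain
            return True
    except OSError as exc:
        logging.warning("cannot remove %s: %s", rel_path, exc)
    return False
```

Because the decision lives entirely in save, the retrieve output stays identical with and without deletion, which is what lets the user control deletion locally.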
Implementation Details
--delete flag when processing remv commands from retrieve