Skip to content

Refine recursive mode design: type conflicts, cycle detection, delete semantics#11

Open
messa wants to merge 1 commit intomainfrom
claude/merge-recursive-plan-review-R8g6L
Open

Refine recursive mode design: type conflicts, cycle detection, delete semantics#11
messa wants to merge 1 commit intomainfrom
claude/merge-recursive-plan-review-R8g6L

Conversation

@messa
Copy link
Copy Markdown
Owner

@messa messa commented Jan 26, 2026

Summary

This PR refines the design document for the --recursive mode implementation with important clarifications on error handling, safety mechanisms, and protocol semantics based on implementation experience.

Key Changes

  • Type conflict handling: Changed from automatic remv+recreate to explicit warnings and skipping. When a file becomes a directory or vice versa, the item is now skipped with a warning rather than automatically replaced. This prevents unexpected data loss and requires manual user intervention.

  • Cycle detection: Added explicit cycle detection mechanism using (st_dev, st_ino) pairs in walk_directory() to handle symlink cycles when followlinks=True.

  • Delete semantics clarification:

    • Moved --delete flag to save command (not retrieve)
    • retrieve always sends remv commands for missing items
    • save without --delete ignores remv commands; with --delete executes them
    • This gives users control at the destination side where they typically run the pipeline
  • Metadata handling: Clarified that metadata (meta command) is always sent even for unchanged files, enabling permission/timestamp synchronization without content transfer.

  • Type checking on retrieve: Added explicit checks to detect when source items have changed type (file→dir or dir→file) and handle appropriately with warnings.

  • Dual-mode protocol: Documented that single-file and recursive modes are equal "first-class citizens" with different use cases, not a version upgrade.

Implementation Details

  • Retrieve now validates item types before processing: files must exist as files, directories as directories
  • Save respects --delete flag when processing remv commands from retrieve
  • Metadata overhead (~60-70 bytes per file) is acceptable for typical use cases
  • Deterministic ordering maintained: checksum sends items alphabetically, retrieve responds in same order then adds new items, finally sends delete commands

Address all issues from add_recursive_plan_v2.review_v1.md:

- Move --delete flag to save command (user controls deletion locally)
- Handle type changes (file↔dir) by logging warning and skipping
- Add symlink cycle detection using (st_dev, st_ino)
- Document why metadata is always sent (allows metadata-only updates)
- Clarify retrieve output order (dest items, new items, remv commands)
- Emphasize single-file and recursive are both first-class citizens
Copilot AI review requested due to automatic review settings January 26, 2026 15:57
@messa messa self-assigned this Jan 26, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refines the design document for the --recursive mode implementation in blockcopy, addressing important aspects of error handling, safety mechanisms, and protocol semantics based on implementation experience.

Changes:

  • Type conflict handling changed from automatic replacement to explicit warnings with manual resolution required
  • Added cycle detection mechanism using (st_dev, st_ino) pairs for symlink cycles
  • Clarified delete semantics: --delete flag moved to save command, retrieve always sends remv commands
  • Documented that metadata is always sent even for unchanged files to enable permission/timestamp synchronization
  • Clarified dual-mode protocol nature (single-file and recursive as equal "first-class citizens")

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread add_recursive_plan_v2.md
Comment on lines +248 to 250
- Pokud existuje jako soubor: porovnat hashe, poslat `'file'` + změněné datové bloky + `'meta'`
- Pokud existuje jako adresář: vypsat warning "type changed: file → dir", přeskočit
- Pokud neexistuje na source: přidat cestu do `to_delete`
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The algorithm for handling type conflicts appears incomplete. When a file on dest exists as a directory on source (line 249), the item is skipped but it's unclear whether the path is added to seen_paths. If it's not added to seen_paths, then step (e) at line 257-259 would later send the directory and all its children as new items when walking the source. However, save cannot create these because the conflicting file still exists on dest. The algorithm should clarify that type-conflicted paths should be added to seen_paths to prevent attempting to send the source version, and should also skip all descendants of type-conflicted directories to avoid attempting impossible operations.

Copilot uses AI. Check for mistakes.
Comment thread add_recursive_plan_v2.md
- Zkontrolovat on-demand, jestli adresář existuje na source
- Pokud existuje: poslat `'dire'` + `'dmet'`
- Pokud existuje jako adresář: poslat `'dire'` + `'dmet'`
- Pokud existuje jako soubor: vypsat warning "type changed: dir → file", přeskočit
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the file→dir conflict, when a directory on dest exists as a file on source (line 255), the algorithm should clarify whether the path is added to seen_paths. If not added, step (e) at line 257-259 would send the file as a new item, but save cannot create it because the conflicting directory still exists on dest. Type-conflicted paths should be added to seen_paths to prevent retransmission.

Suggested change
- Pokud existuje jako soubor: vypsat warning "type changed: dir → file", přeskočit
- Pokud existuje jako soubor: vypsat warning "type changed: dir → file", cesta zůstává v `seen_paths` (nesmí být znovu poslána jako nová položka ve kroku (e)), přeskočit

Copilot uses AI. Check for mistakes.
Comment thread add_recursive_plan_v2.md
- Projít source adresář (`followlinks=True`)
- Pro každou položku která není v `seen_paths`: poslat jako novou
- Pokud `--delete`: poslat `'remv'` pro položky v `to_delete`
- Poslat `'remv'` pro položky v `to_delete` (save bez `--delete` je ignoruje)
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The algorithm specifies sending remv commands for items in to_delete but doesn't specify the order. Section "Pořadí mazání (--delete)" at line 407 states that remv commands should be sorted in descending order by path depth (children before parents) to enable proper deletion. The algorithm should clarify that remv commands are sent in descending depth order, or that they use recursive deletion.

Suggested change
- Poslat `'remv'` pro položky v `to_delete` (save bez `--delete` je ignoruje)
- Seřadit položky v `to_delete` podle hloubky cesty sestupně (potomci před rodiči, při stejné hloubce např. abecedně podle cesty)
- Poslat `'remv'` pro položky v `to_delete` v tomto pořadí (save bez `--delete` je ignoruje)

Copilot uses AI. Check for mistakes.
Comment thread add_recursive_plan_v2.md
Comment on lines 443 to +444
Symlinky se následují (followlinks=True).
Detekuje cykly pomocí (st_dev, st_ino) a přeskočí navštívené adresáře.
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The design of walk_directory() with followlinks=True and always following symlinks ("Symlinky se následují (followlinks=True)") introduces a path traversal / data exfiltration risk: an unprivileged user can place a symlink inside the synced tree pointing to any path readable by the blockcopy process (e.g., /etc/shadow), and the recursive sync will read and transfer the target contents under the symlink path. In multi-user or partially untrusted source directories, this lets less-privileged users cause the tool (often running with elevated privileges) to leak arbitrary files outside the intended source root. To mitigate this, avoid following symlinks by default for recursive mode (treat them as symlinks or skip them), or make symlink following an explicit opt-in with strict checks that the resolved target remains within the allowed source root and does not cross sensitive filesystem boundaries.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants