feat: operator doppelgänger protection with slot-based detection (#692)
Closes #627
Implements **Operator Doppelgänger Protection** to detect when multiple instances of the same operator are running, preventing QBFT protocol violations.
### Problem
If two operator instances run simultaneously:
- Both participate in QBFT consensus
- Both send messages with the same operator ID
- QBFT receives conflicting votes from one operator ID
- Protocol correctness breaks (QBFT expects one vote per operator per round)
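
To make the one-vote-per-operator-per-round assumption concrete, here is a minimal sketch of how a duplicate instance shows up as conflicting votes. `VoteLog`, `OperatorId`, `Round`, and the string-valued votes are hypothetical names for illustration only; this is not Anchor's QBFT implementation.

```rust
use std::collections::HashMap;

// Hypothetical aliases; the real codebase likely uses dedicated newtypes.
type OperatorId = u64;
type Round = u64;

/// Illustrative log of the first vote seen per (operator, round).
#[derive(Default)]
struct VoteLog {
    seen: HashMap<(OperatorId, Round), String>,
}

impl VoteLog {
    /// Records a vote. Returns `false` if the same operator already voted
    /// for a different value in this round, which is what two live
    /// instances sharing one operator ID can produce.
    fn record(&mut self, operator: OperatorId, round: Round, value: &str) -> bool {
        match self.seen.get(&(operator, round)) {
            // A repeated identical vote is harmless; a different value conflicts.
            Some(previous) => previous.as_str() == value,
            None => {
                self.seen.insert((operator, round), value.to_owned());
                true
            }
        }
    }
}
```

With two instances of operator 1 running, `record(1, 0, "A")` followed by `record(1, 0, "B")` returns `false` on the second call.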
### Solution
**Slot-Based Detection**: Records the startup slot and monitors the network for messages with our operator ID that reference slots after startup.
| Phase | Outgoing Messages | Detection Logic |
|------------|-------------------|----------------------------------------------------------|
| Monitoring | ❌ BLOCKED | Messages with slot > startup_slot → Twin detected |
| Completed | ✅ SENT | No longer checking |
**Why Slot Comparison Works:**
- All outgoing messages blocked during monitoring
- Messages with `slot ≤ startup_slot`: Ignored (our own old messages)
- Messages with `slot > startup_slot`: Twin detected (we didn't send them)
- No race conditions possible since we never compete with twins
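
The comparison rules above can be sketched as a single pure function. The names `classify`, `Observation`, and the plain `u64` slot and operator ID types are assumptions made for illustration, not the identifiers used in this change.

```rust
// Hypothetical alias; a real implementation would likely use a `Slot` newtype.
type Slot = u64;

#[derive(Debug, PartialEq, Eq)]
enum Observation {
    /// Message was signed by a different operator ID.
    NotOurs,
    /// Our operator ID with `slot <= startup_slot`: sent before this instance started.
    OwnOldMessage,
    /// Our operator ID with `slot > startup_slot`: another instance is live.
    TwinDetected,
}

/// Classifies an observed message during the monitoring phase.
/// This is only sound because outgoing messages are blocked while
/// monitoring, so nothing newer than `startup_slot` can be our own.
fn classify(
    our_operator_id: u64,
    startup_slot: Slot,
    signer_id: u64,
    message_slot: Slot,
) -> Observation {
    if signer_id != our_operator_id {
        Observation::NotOurs
    } else if message_slot > startup_slot {
        Observation::TwinDetected
    } else {
        Observation::OwnOldMessage
    }
}
```

Note the boundary: a message for exactly `startup_slot` is treated as an old message, which is what lets a restart within the same slot pass without a false positive.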
**State Management:**
- Simple AtomicBool for lock-free monitoring state
- No grace period needed (slot comparison handles message filtering)
- Detection active immediately on startup
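
As a rough sketch of the lock-free state described above, a single `AtomicBool` can gate outgoing messages. `DoppelgangerState` and its method names are hypothetical, and the memory orderings are one reasonable choice rather than the ones used in the actual change.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical holder for the monitoring flag.
struct DoppelgangerState {
    monitoring: AtomicBool,
}

impl DoppelgangerState {
    /// Monitoring starts immediately when protection is enabled;
    /// when disabled, the state starts in the completed phase.
    fn new(enabled: bool) -> Self {
        Self { monitoring: AtomicBool::new(enabled) }
    }

    /// Outgoing messages are blocked for as long as we are monitoring.
    fn can_send(&self) -> bool {
        !self.monitoring.load(Ordering::Acquire)
    }

    /// Ends the monitoring phase. Returns `true` only for the call that
    /// actually flipped the flag, so completion is handled exactly once.
    fn complete_monitoring(&self) -> bool {
        self.monitoring.swap(false, Ordering::AcqRel)
    }
}
```

Because the flag only ever moves from `true` to `false`, no lock or additional state is needed to keep readers consistent.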
**Twin Detection:**
- Monitors single-signer QBFT and partial signature messages
- Twin detected → logs detailed context → graceful shutdown
- Handles edge cases: restart in same slot, network delays, clock skew
### Benefits Over Time-Based Approaches
- **Faster Startup**: No 411-second grace period wait
- **More Reliable**: Slot comparison is deterministic, unaffected by network delays
- **Simpler**: Fewer states, cleaner logic with lock-free operations
### Configuration
This feature is **opt-in** to avoid imposing a monitoring delay on all operators:
- `--operator-dg`: Enable protection (default: false)
- `--operator-dg-wait-epochs`: Monitoring duration in epochs (default: 2, ~768 seconds)
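
The ~768-second figure follows from Ethereum mainnet timing (32 slots per epoch, 12 seconds per slot). A small sketch of the arithmetic, where `monitoring_duration` is a hypothetical helper and the constants assume mainnet parameters:

```rust
use std::time::Duration;

// Mainnet timing parameters; other networks may differ.
const SLOTS_PER_EPOCH: u64 = 32;
const SECONDS_PER_SLOT: u64 = 12;

/// Converts a wait expressed in epochs into wall-clock time.
fn monitoring_duration(wait_epochs: u64) -> Duration {
    Duration::from_secs(wait_epochs * SLOTS_PER_EPOCH * SECONDS_PER_SLOT)
}
```

For the default of 2 epochs this gives 2 × 32 × 12 = 768 seconds, just under 13 minutes.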
**Rationale for Opt-In:**
- QBFT's Byzantine tolerance already handles some operator duplication scenarios
- No clear path from operator duplication to validator slashing (operators don't directly sign beacon chain messages)
- Provides operational discipline rather than security guarantees
- The ~13-minute startup delay (768 seconds at the default of 2 epochs) may not be acceptable for all deployment scenarios
- Operators can enable when operational safety is prioritized
### Testing
8 comprehensive tests covering slot-based detection, state management, and edge cases. Manual testing verified all configuration scenarios work correctly.
Co-Authored-By: diego <[email protected]>
CLAUDE.md: 36 additions & 1 deletion
@@ -225,6 +225,16 @@ When contributing to Anchor, follow these Rust best practices:
 6. **Simplicity First**: Always choose the simplest solution that elegantly solves the problem, follows existing patterns, maintains performance, and uses basic constructs over complex data structures
 7. **Check Requirements First**: Before implementing or creating anything (PRs, commits, code), always read and follow existing templates, guidelines, and requirements in the codebase
 
+### Architecture and Design
+
+1. **Question Intermediaries**: If data flows A → B with no transformation, question why A → intermediate → B exists. Each layer should provide clear value (logging, transformation, validation, etc.). Ask: "What problem does this solve that direct communication doesn't?"
+
+2. **Separation Through Interfaces, Not Layers**: Clean boundaries come from well-defined APIs, not intermediary components. A component receiving a `Sender<Event>` achieves separation without needing forwarding tasks or wrapper channels.
+
+3. **Simplification is Always Valid**: Refactoring working code for simplicity is encouraged. Question architectural decisions even after tests pass. Fewer lines and fewer components often indicates better design.
+
+4. **Challenge Complexity**: Every abstraction should justify its existence. "We might need it later" or "it provides separation" aren't sufficient reasons. Complexity must solve specific, current problems.
+
 ### Specific Guidelines
 
 1. **Naming**:
@@ -409,9 +419,34 @@ When writing PR descriptions, follow these guidelines for maintainable and revie
 
 - **Keep "Proposed Changes" section high-level** - focus on what components were changed and why
 - **Avoid line-by-line documentation** - reviewers can see specific changes in the diff
-- **Use component-level summaries** rather than file-by-file breakdowns
+- **Use component-level summaries** rather than file-by-file breakdowns
 - **Emphasize the principles** being applied and operational impact
 - **Be concise but complete** - provide context without overwhelming detail
+- **Don't mention implementation details** - avoid specifying exact files, line numbers, or function names
+- **Don't state the obvious** - don't mention that tests pass (CI will verify this)
+- **Avoid redundancy** - don't repeat information already in the title or commit message
+- **Focus on the "why"** - explain the motivation and impact, not the mechanics