refactor: move retry event data query into retry flow #654
Merged
alexbouchardd (Contributor) approved these changes on Jan 23, 2026 and left a comment:
Good refactor 👍
04c3ebd to
9277a3f
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Extends TestRetryDelivery to verify that the manual retry API publishes a DeliveryTask with complete event data to deliveryMQ. This ensures the deliverymq handler receives full event data for manual retries, consistent with the scheduled retry flow which fetches event data in the retry scheduler. Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add unit and e2e tests verifying that retries are not lost when the retry scheduler queries logstore before the event has been persisted.

Test scenario:
1. Initial delivery fails, retry is scheduled
2. Retry scheduler queries logstore for event data
3. Event is not yet persisted (logmq batching delay)
4. Retry should remain in queue and be reprocessed later

Tests added:
- TestRetryScheduler_RaceCondition_EventNotYetPersisted (unit test)
- TestE2E_Regression_RetryRaceCondition (e2e test)

Also adds:
- RetryVisibilityTimeoutSeconds config option
- WithRetryVisibilityTimeout scheduler option
- mockDelayedEventGetter for simulating delayed persistence

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Return error instead of nil so the retry message stays in queue and will be reprocessed after the visibility timeout. Co-Authored-By: Claude Opus 4.5 <[email protected]>
eb51fdb to
b701223
alexluong added a commit that referenced this pull request on Jan 30, 2026
* refactor: normalize logstore driver interface
  - Rename interface methods: InsertManyDeliveryEvent -> InsertMany, ListDeliveryEvent -> ListDelivery, RetrieveDeliveryEvent -> RetrieveDelivery
  - Rename request types: ListDeliveryEventRequest -> ListDeliveryRequest, etc.
  - Add DeliveryRecord type for query results with Event and Delivery
  - Update memlogstore, pglogstore, chlogstore implementations
  - Update all API handlers and tests to use new interface
  - Remove DeliveryEventID field from Delivery struct
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* refactor: introduce LogEntry message type for logmq
* refactor: move batch processor from builder to logmq package
* refactor: rename RetryMessage to RetryTask
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* refactor: introduce DeliveryTask and update message queue flow
  - Add DeliveryTask struct with IdempotencyKey() and RetryID() methods
  - Update deliverymq to publish/consume DeliveryTask instead of DeliveryEvent
  - Update publishmq to create and enqueue DeliveryTask
  - Update RetryTask to convert to DeliveryTask
  - Update API handlers, eventtracer, alert, emetrics to use DeliveryTask
  - Fix Delivery fields (TenantID, Attempt, Manual) not being set before logging
  - Add :manual suffix to idempotency key for manual retries
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* refactor: remove DeliveryEvent type and legacy API handlers
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* docs: update README and comments to reflect DeliveryEvent removal
  - Update chlogstore/README.md method names and SQL examples
  - Update pglogstore/README.md method names
  - Update tracer_test.go comment to reference DeliveryTask
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* refactor: remove delivery_event_id column and legacy API docs
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* docs: generate config
* refactor: change InsertMany to accept []*LogEntry
  Preserves Event-Delivery pairing through the insert flow, eliminating the need for eventMap reconstruction in ClickHouse implementation.
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* refactor: move retry event data query into retry flow (#654)
  * refactor: add event fetching to retry scheduler
    Co-Authored-By: Claude Opus 4.5 <[email protected]>
  * fix: align mock eventGetter with logstore behavior
    Co-Authored-By: Claude Opus 4.5 <[email protected]>
  * refactor: remove logStore from messagehandler
    Co-Authored-By: Claude Opus 4.5 <[email protected]>
  * chore: remove dead eventGetter code from messagehandler tests
    Co-Authored-By: Claude Opus 4.5 <[email protected]>
  * test: verify manual retry publishes full event data
    Extends TestRetryDelivery to verify that the manual retry API publishes a DeliveryTask with complete event data to deliveryMQ. This ensures the deliverymq handler receives full event data for manual retries, consistent with the scheduled retry flow which fetches event data in the retry scheduler.
    Co-Authored-By: Claude Opus 4.5 <[email protected]>
  * test: add failing tests for retry race condition
    Add unit and e2e tests verifying that retries are not lost when the retry scheduler queries logstore before the event has been persisted.
    Test scenario:
    1. Initial delivery fails, retry is scheduled
    2. Retry scheduler queries logstore for event data
    3. Event is not yet persisted (logmq batching delay)
    4. Retry should remain in queue and be reprocessed later
    Tests added:
    - TestRetryScheduler_RaceCondition_EventNotYetPersisted (unit test)
    - TestE2E_Regression_RetryRaceCondition (e2e test)
    Also adds:
    - RetryVisibilityTimeoutSeconds config option
    - WithRetryVisibilityTimeout scheduler option
    - mockDelayedEventGetter for simulating delayed persistence
    Co-Authored-By: Claude Opus 4.5 <[email protected]>
  * fix: return error when event not found in logstore during retry
    Return error instead of nil so the retry message stays in queue and will be reprocessed after the visibility timeout.
    Co-Authored-By: Claude Opus 4.5 <[email protected]>
  * test: improve flaky tests
  * chore: dev yaml
  * chore: `make test` skip cmd/e2e by default
  ---------
  Co-authored-by: Claude Opus 4.5 <[email protected]>
* test: redis testcontainer flakiness
* refactor: rename Delivery to Attempt in core + API layer
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* test: rename Delivery → Attempt in all test files
  Update test files to use the new Attempt naming:
  - logstore/drivertest/*.go: AttemptFactory, ListAttempt, RetrieveAttempt
  - deliverymq/*_test.go: AttemptStatus*, entry.Attempt
  - apirouter/*_test.go: AttemptFactory, /attempts API paths
  - logmq/batchprocessor_test.go: AttemptFactory, Attempt struct fields
  All unit tests pass (1665 tests).
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* chore: add database migrations for Delivery → Attempt rename
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* docs: rename Delivery to Attempt in OpenAPI spec
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* feat: rename Delivery to Attempt in UI components
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* chore: fix Delivery → Attempt rename inconsistencies
  - Revert attempt_metadata → delivery_metadata in OpenAPI (matches Go code)
  - Fix config: delivery_prefix → attempt_prefix, example att → atm
  - Fix variable names: deliveryID → attemptID, attErr → atmErr
  - Fix test names: TestListDeliveries → TestListAttempts, etc.
  - Update doc links for renamed API endpoints
  - Update comments and test descriptions
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* chore: remove redundant inline comments
  Remove comments that simply restate what the next line of code does, such as "// Create client" before NewClient() or "// Check if exists" before .Exists(). Preserves all meaningful comments including godoc, section headers, WHY explanations, and unit clarifications.
  Co-Authored-By: Claude Opus 4.5 <[email protected]>
* chore: add destination-scoped attempts routes, rename attempt_number, update UI name
* chore: opanapi.yaml
* test: e2e tests for attempt_number
* chore: rename portal attempt route
---------
Co-authored-by: Claude Opus 4.5 <[email protected]>
Moves event fetching from the delivery flow to the retry flow.
What changed
Why this design
Scheduled retry tasks only store event IDs (not full payloads) to keep queue messages small. Previously, the delivery handler had to detect retries via `event.Time.IsZero()` and conditionally fetch event data. Moving this to the retry scheduler is cleaner because:
Both entry points now consistently provide full event data to the delivery queue: