@Kamalpannu

Summary

This PR fixes the Dead Letter Queue (DLQ) workflow in ChronoMongoDatastore:

  • Properly preserves _id when moving tasks to and from the DLQ.
  • Updates task status to PENDING when redriving from DLQ.
  • Ensures main collection reflects the correct task state after redrive.
  • All associated unit tests for DLQ now pass.

Related Issue

Closes #35

Verification

  • Added failed tasks to DLQ.
  • Redrove tasks back to the main queue.
  • Verified status updates to PENDING.
  • Confirmed DLQ is empty after redrive.
  • Ran vitest and all 4 tests pass.

Notes

  • No breaking changes; backward compatible.
  • Workflow now fully aligned with expected DLQ behavior.

Collaborator

@darrenpicard25 left a comment

Hello Kamalpannu

Thank you so much for contributing to the chrono project. The team and I will look at your PR soon. In the meantime, if you could update a few things, it would be greatly appreciated and would help speed up the process:

  1. Linting/formatting seems to be slightly off. Could you ensure you have Biome set up properly and are running the appropriate lint/format commands?
  2. Could you ensure your PR description includes the reason why you would like this feature? Is it to address issue #35?
  3. You appear to have corrected some spelling mistakes, which is great, but some updates introduce new spelling mistakes. Could you spend a minute looking them over to make sure all spelling changes are correct?

If you can address these issues before the actual code review, it will greatly speed up the process. If not, we will probably add comments for these changes in the review when we get to it.

Thank you

Collaborator

@darrenpicard25 left a comment

Hello Kamalpannu

While we appreciate you spending the time to put up this PR to improve Chrono, I do not believe we can proceed with it.

The approach taken in this PR does not make sense: it is effectively equivalent to an infinite retry policy (which is maybe something we can do), and it would not even work because of duplicate key constraints.

It is clear that the code was not tested, as it would fail while also blocking the processing of any other task.

Let us know if you would like to discuss further, but we cannot proceed with this PR as it is now, and a new approach needs to be thought of.

/** The maximum time a task handler can take to complete before it will be considered timed out @default 5000ms */
taskHandlerTimeoutMs?: number;
/** The maximum number of retries for a task handler, before task is marked as failed. @default 5 */
/** The maximum number of retries for a tasak handler, before task is marked as failed. @default 5 */
Collaborator

Suggested change
/** The maximum number of retries for a tasak handler, before task is marked as failed. @default 5 */
/** The maximum number of retries for a task handler, before task is marked as failed. @default 5 */

);

this.stopRequested = true;

Collaborator

Removing this empty line should produce a linting warning.

timestamp: new Date(),
});
}
}
Collaborator

Slightly confused by this:

  1. You are blocking the processing loop for 60 seconds, meaning no new tasks can be claimed while this flow is being hit.
  2. redriveFromDlq grabs all "tasks" in the DLQ, yet you only emit a single TASK_RETRY_SCHEDULED event despite the possibility of many more being "re-scheduled".
  3. This logic adds a task document to the DLQ collection, queries it, and inserts an identical document into the main collection, but does not update the "old" document in any way. And because you have now bypassed the await this.datastore.fail call, the original document will be re-queried after the timeout interval.
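
Points 2 and 3 could be addressed roughly along the following lines. This is only a minimal in-memory sketch, not Chrono's implementation: `Task`, `RetryScheduledEvent`, and `redriveAll` are hypothetical names, and a plain array and `Map` stand in for the DLQ and main collections. It drains the DLQ rather than copying from it, overwrites the existing entry for each task, and emits one TASK_RETRY_SCHEDULED event per redriven task.

```typescript
// Hypothetical shapes for illustration only; not Chrono's actual types.
type Status = 'PENDING' | 'FAILED';

interface Task {
  id: string;
  status: Status;
  retryCount: number;
}

interface RetryScheduledEvent {
  type: 'TASK_RETRY_SCHEDULED';
  taskId: string;
  timestamp: Date;
}

// Drains the DLQ into the main store and emits one event per redriven task.
// Returns the number of tasks that were redriven.
function redriveAll(
  dlq: Task[],
  main: Map<string, Task>,
  emit: (event: RetryScheduledEvent) => void,
): number {
  // splice(0) removes every element from the DLQ array and returns them.
  const tasks = dlq.splice(0);

  for (const task of tasks) {
    // Overwrite the existing entry (same id) instead of adding a second copy.
    main.set(task.id, { ...task, status: 'PENDING' });
    emit({ type: 'TASK_RETRY_SCHEDULED', taskId: task.id, timestamp: new Date() });
  }

  return tasks.length;
}
```

Because the function does no sleeping or waiting of its own, a caller could invoke it outside the processing loop, which would also sidestep point 1.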

claimedAt: undefined,
lastExecutedAt: new Date(),
_id: task._id, // make sure original _id is kept
});
Collaborator

I kind of mentioned it in the other comment, but you are just re-inserting another document into the main collection. You are even still propagating the retryCount, meaning this process will kick off again on a single error.
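
To illustrate the retryCount concern, here is a small sketch. It assumes a task is dead-lettered once `retryCount` reaches the maximum (the default of 5 comes from the config comment in this PR; the exact comparison is an assumption). `RetryState`, `isExhausted`, `recordFailure`, and `prepareForRedrive` are hypothetical helpers, not part of Chrono.

```typescript
// Hypothetical helpers; only retryCount and the default of 5 come from this PR.
interface RetryState {
  retryCount: number;
}

const MAX_RETRIES = 5;

// True once a task has used up its retries (assumed rule: retryCount >= max).
function isExhausted(task: RetryState, maxRetries: number = MAX_RETRIES): boolean {
  return task.retryCount >= maxRetries;
}

// Records one failed attempt by returning a copy with retryCount incremented.
function recordFailure<T extends RetryState>(task: T): T {
  return { ...task, retryCount: task.retryCount + 1 };
}

// Returns a copy of the task with a fresh retry budget for redrive.
function prepareForRedrive<T extends RetryState>(task: T): T {
  return { ...task, retryCount: 0 };
}
```

Under these assumptions, a task redriven with its old retryCount of 5 is exhausted again after a single failure, whereas one passed through `prepareForRedrive` gets its full retry budget back.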

throw new Error('DLQ collection name is not set');
}
return database.collection<TaskDocument<TaskKind, TaskMapping[TaskKind]>>(this.config.dlqCollectionName);
}*/
Collaborator

Was this meant to be deleted?

*/
async redriveFromDlq<TaskKind extends keyof TaskMapping>(): Promise<void> {
const database = await this.getDatabase();
const dlqName = this.config.dlqCollectionName ?? 'chrono-tasks-dlq';
Collaborator

Defaults should be decided in the constructor, i.e. we can make dlqCollectionName required.
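
A rough sketch of what that might look like: the optional input is resolved once in the constructor so every method can treat dlqCollectionName as a required string. `DatastoreConfigInput`, `ResolvedDatastoreConfig`, and `ExampleDatastore` are hypothetical names used for illustration; only dlqCollectionName and the 'chrono-tasks-dlq' fallback come from this PR.

```typescript
// Hypothetical config types; only dlqCollectionName appears in the PR.
interface DatastoreConfigInput {
  dlqCollectionName?: string;
}

interface ResolvedDatastoreConfig {
  dlqCollectionName: string; // required once the constructor has run
}

// Fallback value taken from the redriveFromDlq code under review.
const DEFAULT_DLQ_COLLECTION_NAME = 'chrono-tasks-dlq';

class ExampleDatastore {
  private readonly config: ResolvedDatastoreConfig;

  constructor(input: DatastoreConfigInput = {}) {
    // Resolve defaults in one place so no method needs its own `??` fallback.
    this.config = {
      dlqCollectionName: input.dlqCollectionName ?? DEFAULT_DLQ_COLLECTION_NAME,
    };
  }

  getDlqCollectionName(): string {
    return this.config.dlqCollectionName;
  }
}
```

With this shape, code such as redriveFromDlq would read the resolved name directly instead of repeating the default inline.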

package.json Outdated
},
"dependencies": {
"@neofinancial/chrono": "^0.5.1",
"mongodb-memory-server": "^10.2.3"
Collaborator

Not sure why these dependencies were added.

status: TaskStatus.PENDING,
claimedAt: undefined,
lastExecutedAt: new Date(),
_id: task._id, // make sure original _id is kept
Collaborator

This will error. You cannot have two documents in the main collection with the same _id.
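
The constraint can be shown with a small in-memory stand-in. MongoDB enforces a unique _id per collection; the `InMemoryCollection` class below is a hypothetical model of that rule rather than the MongoDB driver API, and `TaskDocument` here is a simplified shape for illustration. It shows why re-inserting with the original _id fails while updating the existing document in place does not.

```typescript
// Simplified, hypothetical document shape for illustration.
interface TaskDocument {
  _id: string;
  status: 'PENDING' | 'FAILED';
}

// In-memory model of a collection with a unique _id constraint.
class InMemoryCollection {
  private readonly docs = new Map<string, TaskDocument>();

  // Rejects a second document with an _id that is already present.
  insertOne(doc: TaskDocument): void {
    if (this.docs.has(doc._id)) {
      throw new Error(`duplicate key: _id ${doc._id} already exists`);
    }
    this.docs.set(doc._id, { ...doc });
  }

  // Updates the existing document in place; returns false if it is missing.
  updateStatus(id: string, status: TaskDocument['status']): boolean {
    const existing = this.docs.get(id);
    if (!existing) {
      return false;
    }
    this.docs.set(id, { ...existing, status });
    return true;
  }

  findOne(id: string): TaskDocument | undefined {
    return this.docs.get(id);
  }
}
```

In this model, a redrive that calls `updateStatus(task._id, 'PENDING')` succeeds, whereas a second `insertOne` with the same _id throws.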

@Kamalpannu
Author

Kamalpannu commented Oct 30, 2025 via email

@Kamalpannu
Author

Hi @darrenpicard25 and team,
I’ve implemented the suggestions from the earlier review — including the retry limit and _id preservation fix.
Just wanted to confirm if you had a chance to review the latest commits or if there’s a preferred approach I should align with for the DLQ redrive logic.
I’d be happy to refactor it again if there’s a new design direction planned for Chrono.

Thanks again for your time and guidance earlier — it helped me understand the system’s workflow much better!

Development

Successfully merging this pull request may close these issues.

Implement redrive tasks functionality
