-
Notifications
You must be signed in to change notification settings - Fork 2.2k
feat: add telemetry queue persistence system #7133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Implement TelemetryQueueManager for persistent event storage - Add QueuedTelemetryClient base class with retry logic - Update PostHogTelemetryClient to use queuing system - Store queue per-workspace to avoid conflicts - Add exponential backoff retry (1s to 60s max) - Events persist for 24 hours before expiring - Queue limited to 100 events to manage file size - Add comprehensive tests for queue functionality - Disable PostHog's internal queue for better control This ensures telemetry events are not lost during network outages or server downtime, with events persisted to disk and retried automatically when connectivity is restored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution! I've reviewed the telemetry queue persistence implementation and found several issues that need attention. The overall architecture is solid, but there are some critical issues around production logging and potential race conditions that should be addressed.
|
||
// Force immediate flush to detect network errors | ||
// This will throw if there's a network issue | ||
await this.client.flush() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The flush() call could fail for reasons other than network issues. Could we add more specific error handling here to distinguish between different failure types? This would help with debugging and potentially allow for different retry strategies based on the error type.
/** | ||
* Persist queue to disk | ||
*/ | ||
private async persistQueue(): Promise<void> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The persistQueue() method writes to disk asynchronously without awaiting. Could this cause race conditions if multiple rapid enqueue operations occur? Consider using a write queue or debouncing mechanism to prevent potential data corruption.
public updateTelemetryState(didUserOptIn: boolean): void { | ||
this.telemetryEnabled = didUserOptIn | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we define a proper interface for the test client instead of using 'any' type? This would improve type safety and make the tests more maintainable.
}) | ||
|
||
describe("getRetryDelay", () => { | ||
it("should calculate exponential backoff correctly", () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed there's no test for the scenario where the queue file is corrupted (invalid JSON). The loadQueue method handles this case, but it would be good to have explicit test coverage.
- Make console.log statements conditional based on debug flag - Add singleton reset method for TelemetryQueueManager - Fix race condition in persistQueue with debouncing and promise tracking - Improve error handling in PostHogTelemetryClient to differentiate error types - Make retry interval configurable in QueuedTelemetryClient - Update tests to reflect 24-hour event expiration instead of retry-based removal - Add test for handling corrupted JSON queue files - Fix test expectations for retry count and event filtering
- Move pendingPersist flag clearing inside persistQueue() method - Implement loop-based draining to handle concurrent persist requests - Add comprehensive tests for concurrent operations - Ensures no telemetry events are lost during rapid enqueue operations The previous implementation had a lost-notification bug where the pendingPersist flag was cleared in the setImmediate callback before calling persistQueue(). This could cause events enqueued during an in-flight persist to remain unpersisted. The fix implements Option A: clearing the flag inside persistQueue() and using a while loop to drain all pending requests, ensuring any enqueue that happens during a persist operation triggers another persist pass immediately after.
- Remove unused _isNetworkError variable in PostHogTelemetryClient - Add TelemetryQueueManager.resetInstance() call in extension deactivation to prevent memory leaks and stale data when switching workspaces
- Removed all console.info and console.log statements that were used for debugging - Kept console.error statements for actual error reporting - Fixed ESLint warnings for unused error variables - Telemetry debug logs are now completely removed to prevent console spam
Summary
This PR implements a telemetry queuing system to ensure telemetry events are not lost during network outages or server downtime.
Problem
Currently, the Roo Code Extension sends telemetry using a "fire and forget" approach. When telemetry events fail to send (due to network issues, server downtime, or other connectivity problems), they are simply lost with no retry mechanism.
Solution
Key Components:
TelemetryQueueManager - Manages persistent event storage
QueuedTelemetryClient - Base class for telemetry clients
Updated PostHogTelemetryClient
Features:
Testing:
Changes
How to Test
The queue file is stored at:
Important
Introduces a telemetry queuing system with persistent storage and retry logic to handle failed telemetry events due to network issues.
TelemetryQueueManager
for persistent event storage and retry logic with exponential backoff inTelemetryQueueManager.ts
.QueuedTelemetryClient
as a base class for telemetry clients to handle automatic queuing and retry inQueuedTelemetryClient.ts
.PostHogTelemetryClient
to extendQueuedTelemetryClient
, disabling PostHog's internal queue and handling retries inPostHogTelemetryClient.ts
.TelemetryQueueManager
inTelemetryQueueManager.test.ts
.QueuedTelemetryClient
retry logic inQueuedTelemetryClient.test.ts
.PostHogTelemetryClient
tests inPostHogTelemetryClient.test.ts
.index.ts
.extension.ts
to register the new telemetry client with debug mode support.This description was created by
for a897849. You can customize this summary. It will automatically update as commits are pushed.