improv(webapp): Add new table for optimized logs search #3036
Contributor:
🚩 CANCELLED status SPANs are unreachable by any level filter.
(Refers to lines 298-301)

Author (@mpcgrid):
Will be figured out in the future after some alpha testing in prod. That status is not used much, and we might not want to filter by it.
internal-packages/clickhouse/schema/016_add_task_events_search_v1.sql (hunk @@ -0,0 +1,62 @@):

```sql
-- +goose Up
CREATE TABLE IF NOT EXISTS trigger_dev.task_events_search_v1
(
    environment_id String,
    organization_id String,
    project_id String,
    triggered_timestamp DateTime64(9) CODEC(Delta(8), ZSTD(1)),
    trace_id String CODEC(ZSTD(1)),
    span_id String CODEC(ZSTD(1)),
    run_id String CODEC(ZSTD(1)),
    task_identifier String CODEC(ZSTD(1)),
    start_time DateTime64(9) CODEC(Delta(8), ZSTD(1)),
    inserted_at DateTime64(3),
    message String CODEC(ZSTD(1)),
    kind LowCardinality(String) CODEC(ZSTD(1)),
    status LowCardinality(String) CODEC(ZSTD(1)),
    duration UInt64 CODEC(ZSTD(1)),
    parent_span_id String CODEC(ZSTD(1)),
    attributes_text String CODEC(ZSTD(1)),

    INDEX idx_run_id run_id TYPE bloom_filter(0.001) GRANULARITY 1,
    INDEX idx_message_text_search lower(message) TYPE ngrambf_v1(3, 32768, 2, 0) GRANULARITY 1,
    INDEX idx_attributes_text_search lower(attributes_text) TYPE ngrambf_v1(3, 32768, 2, 0) GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY toDate(triggered_timestamp)
ORDER BY (organization_id, environment_id, triggered_timestamp)
```

Suggested change:

```diff
 ENGINE = MergeTree
 PARTITION BY toDate(triggered_timestamp)
-ORDER BY (organization_id, environment_id, triggered_timestamp)
+ORDER BY (organization_id, environment_id, triggered_timestamp, trace_id)
```
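The hunk above shows only the `-- +goose Up` half of the migration; Goose migrations in this repo pair it with a Down marker. A minimal sketch of what that half would look like (the actual Down section is not shown in this hunk, so the exact statement is an assumption):

```sql
-- +goose Down
-- Hypothetical rollback; the PR's actual Down section is not visible in this diff.
DROP TABLE IF EXISTS trigger_dev.task_events_search_v1;
```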
🤖 Prompt for AI Agents
In `internal-packages/clickhouse/schema/016_add_task_events_search_v1.sql` around lines 25-27: the table's MergeTree ORDER BY lacks the cursor tiebreaker trace_id used by LogsListPresenter, causing non-deterministic pagination. Update the table definition (the ENGINE = MergeTree block and its ORDER BY clause) to append trace_id to the existing ORDER BY tuple (organization_id, environment_id, triggered_timestamp) so the ordering becomes (organization_id, environment_id, triggered_timestamp, trace_id), then rebuild or ALTER the table accordingly to ensure deterministic cursor pagination.
Author (@mpcgrid):
They will also have the same trace_id; duplication will be handled at the span level.
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Check the LogsListPresenter implementation to understand cursor pagination logic
cat apps/webapp/app/presenters/v3/LogsListPresenter.server.ts

# Check the materialized view definition to understand triggered_timestamp computation
cat internal-packages/clickhouse/schema/016_add_task_events_search_v1.sql | grep -A 5 "triggered_timestamp"

# Search for any code that groups or filters by trace_id in relation to triggered_timestamp
rg -n -C3 "triggered_timestamp.*trace_id|trace_id.*triggered_timestamp" --type=ts
```

@mpcgrid I don't think that's necessarily the case. The triggered_timestamp is computed as start_time + duration for each individual event/span in the materialized view:

```sql
fromUnixTimestamp64Nano(toUnixTimestamp64Nano(start_time) + toInt64(duration)) AS triggered_timestamp
```

This means triggered_timestamp represents the end time of each event. Multiple events from different traces can theoretically have the same end time, even at nanosecond precision:
- Event A from Trace X: starts at T, duration 100ms → ends at T+100ms
- Event B from Trace Y: starts at T+50ms, duration 50ms → ends at T+100ms
Both events would have identical triggered_timestamp values but different trace_id values.
Since the cursor pagination in LogsListPresenter uses the tuple (environment_id, triggered_timestamp, trace_id) for ordering and comparison (line 367, 376), but the table's ORDER BY only includes (organization_id, environment_id, triggered_timestamp), ClickHouse won't guarantee a deterministic order for rows sharing the same triggered_timestamp. This could lead to pagination inconsistencies.
Adding trace_id to the table's ORDER BY ensures the physical row order matches the cursor's comparison semantics, making pagination deterministic.
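The cursor comparison described above can be sketched as a keyset-pagination predicate. This is an illustrative query, not the presenter's actual SQL; the parameter names and LIMIT are hypothetical:

```sql
-- Next page relative to a (triggered_timestamp, trace_id) cursor.
-- If trace_id is absent from the table's ORDER BY, rows that tie on
-- triggered_timestamp have no guaranteed physical order, so consecutive
-- pages can skip or repeat those rows.
SELECT trace_id, span_id, message, triggered_timestamp
FROM trigger_dev.task_events_search_v1
WHERE environment_id = {env:String}
  AND (triggered_timestamp, trace_id) < ({cursor_ts:DateTime64(9)}, {cursor_trace:String})
ORDER BY triggered_timestamp DESC, trace_id DESC
LIMIT 50;
```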
🚩 triggered_timestamp = start_time + duration means log events and span events sort differently than before
The MV computes triggered_timestamp as fromUnixTimestamp64Nano(toUnixTimestamp64Nano(start_time) + toInt64(duration)) at internal-packages/clickhouse/schema/016_add_task_events_search_v1.sql:52. For log events (duration=0), triggered_timestamp = start_time. For spans (duration>0), triggered_timestamp = end_time. The old ordering used start_time directly (toUnixTimestamp(start_time) DESC). This means spans now sort by their completion time rather than start time. This is a deliberate semantic change — logs appear in the order they "completed" rather than started — but could surprise users who expect chronological start-time ordering, especially for long-running spans that may appear far from their actual start in the log timeline.
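The ordering change flagged above can be illustrated with a hypothetical comparison query (the old ordering expression is quoted from the comment; everything else is an illustrative sketch):

```sql
-- Old behavior: events surface at their start time.
--   ORDER BY toUnixTimestamp(start_time) DESC
-- New behavior: events surface at their end time, since
--   triggered_timestamp = start_time + duration.
-- A span that starts at 12:00 and runs for 10 minutes now sorts
-- next to events from 12:10, far from its start in the timeline.
SELECT message, start_time, triggered_timestamp
FROM trigger_dev.task_events_search_v1
ORDER BY triggered_timestamp DESC
LIMIT 50;
```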
Author (@mpcgrid):
Intended.
🚩 MV's status != 'PARTIAL' is broader than the old SPAN-only PARTIAL filter
The old code had queryBuilder.where("NOT (kind = 'SPAN' AND status = 'PARTIAL')") which only excluded PARTIAL status for SPAN kind events. The new MV at 016_add_task_events_search_v1.sql:56 uses AND status != 'PARTIAL' which excludes ALL event kinds with PARTIAL status. If there are non-SPAN events that legitimately have PARTIAL status and should appear in logs, they would now be silently excluded at the MV ingestion level with no way to recover them without backfilling. This is likely intentional cleanup, but worth confirming no other event kinds use PARTIAL status.
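The two filters can be put side by side (the WHERE clauses are quoted from the comment above; the surrounding statements are illustrative sketches, not the actual presenter or MV queries):

```sql
-- Old presenter-side filter: PARTIAL excluded only for SPAN events.
SELECT count() FROM trigger_dev.task_events_v2
WHERE NOT (kind = 'SPAN' AND status = 'PARTIAL');

-- New MV-side filter: PARTIAL excluded for every kind, at ingestion time.
-- A non-SPAN event with status = 'PARTIAL' (if any exist) is never
-- written to task_events_search_v1 and cannot be recovered without a backfill.
SELECT count() FROM trigger_dev.task_events_v2
WHERE status != 'PARTIAL';
```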
🚩 Partition pruning with inserted_at is ineffective on the new search table

The LogsListPresenter applies inserted_at >= ... and inserted_at <= ... filters (lines 269-271, 282-284). On the old task_events_v2 table these were effective for partition pruning, since it used PARTITION BY toDate(inserted_at) (010_add_task_events_v2.sql:48). However, the new task_events_search_v1 table uses PARTITION BY toDate(triggered_timestamp) (016_add_task_events_search_v1.sql:26). The inserted_at filter will still execute correctly but won't help prune partitions, making it dead weight. For effective partition pruning, the time-range filter should target triggered_timestamp instead. The start_time filters that are also applied are close to triggered_timestamp (since triggered_timestamp = start_time + duration) but won't trigger partition pruning either, since they reference a different column. This won't cause incorrect results, but it may degrade query performance for large datasets. (Refers to lines 266-289)
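A hedged sketch of a filter shape the new table could prune on, assuming the presenter's time range can be expressed as triggered_timestamp bounds (parameter names are hypothetical):

```sql
-- Prunes partitions: the predicate targets the partition key column,
-- so ClickHouse can skip whole toDate(triggered_timestamp) partitions.
SELECT count()
FROM trigger_dev.task_events_search_v1
WHERE organization_id = {org:String}
  AND environment_id = {env:String}
  AND triggered_timestamp >= {from:DateTime64(9)}
  AND triggered_timestamp <  {to:DateTime64(9)};

-- Does not prune: inserted_at is not part of this table's partition key,
-- so the same range expressed on inserted_at scans every partition.
```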