39 changes: 15 additions & 24 deletions apps/webapp/app/presenters/v3/LogsListPresenter.server.ts
🚩 Partition pruning with inserted_at is ineffective on the new search table

The LogsListPresenter applies inserted_at >= ... and inserted_at <= ... filters (lines 269-271, 282-284). On the old task_events_v2 table, these were effective for partition pruning since it used PARTITION BY toDate(inserted_at) (010_add_task_events_v2.sql:48). However, the new task_events_search_v1 table uses PARTITION BY toDate(triggered_timestamp) (016_add_task_events_search_v1.sql:26). The inserted_at filter will still execute correctly but won't help prune partitions, making it dead weight. For effective partition pruning, the time range filter should target triggered_timestamp instead. The start_time filters that are also applied are close to triggered_timestamp (since triggered_timestamp = start_time + duration) but won't trigger partition pruning either since they reference a different column. This won't cause incorrect results but may degrade query performance for large datasets.

(Refers to lines 266-289)
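One way to regain pruning (a sketch only — `applyPruningWindow`, the parameter names, and the one-day slack are all assumptions, not the repo's API) is to translate the user's time window into `triggered_timestamp` bounds. Since `triggered_timestamp = start_time + duration >= start_time`, the lower bound carries over directly; the upper bound needs padding by the longest span duration expected:

```typescript
// Hypothetical helper: express a [fromMs, toMs] window as triggered_timestamp
// bounds so ClickHouse can prune toDate(triggered_timestamp) partitions.
function applyPruningWindow(
  queryBuilder: { where(clause: string, params?: Record<string, unknown>): void },
  fromMs: number,
  toMs: number
): void {
  // triggered_timestamp >= start_time, so start_time >= from implies triggered_timestamp >= from.
  queryBuilder.where("triggered_timestamp >= fromUnixTimestamp64Milli({from: Int64})", {
    from: fromMs,
  });
  // Pad the upper bound by an assumed maximum span duration (1 day here) so that
  // long-running spans that start inside the window are not pruned away.
  const maxSpanDurationMs = 24 * 60 * 60 * 1000;
  queryBuilder.where("triggered_timestamp <= fromUnixTimestamp64Milli({to: Int64})", {
    to: toMs + maxSpanDurationMs,
  });
}
```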


🚩 CANCELLED status SPANs are unreachable by any level filter

The excluded_statuses parameter at LogsListPresenter.server.ts:300 always excludes ['ERROR', 'CANCELLED'] from the kinds branch of every level. This means:

  • INFO filter: kind IN ('LOG_INFO', 'LOG_LOG', 'SPAN') AND status NOT IN ('ERROR', 'CANCELLED') — excludes CANCELLED SPANs
  • ERROR filter: status IN ('ERROR') — doesn't match CANCELLED
  • No other level claims them

kindToLevel('SPAN', 'CANCELLED') at logUtils.ts:73-94 returns INFO (since CANCELLED isn't checked before the switch). So a SPAN with CANCELLED status is classified as INFO but excluded from the INFO filter. These rows would appear in the unfiltered view but become invisible once any level filter is applied. This may be acceptable since CANCELLED is a terminal state not typically surfaced in logs, but it's an inconsistency worth being aware of.
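The gap can be modeled in a few lines (`matchesLevel` is an illustrative name, not repo code):

```typescript
// Minimal model of the two filter branches described above.
type Level = "INFO" | "ERROR";

function matchesLevel(level: Level, kind: string, status: string): boolean {
  if (level === "ERROR") {
    // ERROR branch: status-only match; CANCELLED is not included.
    return status === "ERROR";
  }
  // INFO branch: kind match minus the always-excluded statuses.
  const infoKinds = ["LOG_INFO", "LOG_LOG", "SPAN"];
  const excludedStatuses = ["ERROR", "CANCELLED"];
  return infoKinds.includes(kind) && !excludedStatuses.includes(status);
}

// A CANCELLED SPAN matches neither branch, so it disappears under any level filter:
// matchesLevel("INFO", "SPAN", "CANCELLED")  -> false
// matchesLevel("ERROR", "SPAN", "CANCELLED") -> false
```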

(Refers to lines 298-301)


@mpcgrid mpcgrid Feb 12, 2026


This will be figured out later, after some alpha testing in prod. This status is rarely used, and we might not want to filter by it.

@@ -84,13 +84,13 @@ export type LogsListAppliedFilters = LogsList["filters"];
// Cursor is a base64 encoded JSON of the pagination keys
type LogCursor = {
environmentId: string;
unixTimestamp: number;
triggeredTimestamp: string; // DateTime64(9) string
traceId: string;
};

const LogCursorSchema = z.object({
environmentId: z.string(),
unixTimestamp: z.number(),
triggeredTimestamp: z.string(),
traceId: z.string(),
});

@@ -252,7 +252,7 @@ export class LogsListPresenter extends BasePresenter {
);
}

const queryBuilder = this.clickhouse.taskEventsV2.logsListQueryBuilder();
const queryBuilder = this.clickhouse.taskEventsSearch.logsListQueryBuilder();

queryBuilder.where("environment_id = {environmentId: String}", {
environmentId,
@@ -350,38 +350,30 @@
}

// Debug logs are available only to admins
if (includeDebugLogs === false) {
queryBuilder.where("kind NOT IN {debugKinds: Array(String)}", {
debugKinds: ["DEBUG_EVENT"],
});

queryBuilder.where("NOT ((kind = 'LOG_INFO') AND (attributes_text = '{}'))");
}

queryBuilder.where("kind NOT IN {debugSpans: Array(String)}", {
debugSpans: ["SPAN", "ANCESTOR_OVERRIDE", "SPAN_EVENT"],
});

// kindCondition += ` `;
// params["excluded_statuses"] = ["SPAN", "ANCESTOR_OVERRIDE", "SPAN_EVENT"];

// if (includeDebugLogs === false) {
// queryBuilder.where("kind NOT IN {debugKinds: Array(String)}", {
// debugKinds: ["DEBUG_EVENT"],
// });
//
// queryBuilder.where("NOT ((kind = 'LOG_INFO') AND (attributes_text = '{}'))");
// }

queryBuilder.where("NOT (kind = 'SPAN' AND status = 'PARTIAL')");
// SPAN, ANCESTOR_OVERRIDE, SPAN_EVENT kinds are already filtered out by the materialized view

// Cursor pagination
const decodedCursor = cursor ? decodeCursor(cursor) : null;
if (decodedCursor) {
queryBuilder.where(
"(environment_id, toUnixTimestamp(start_time), trace_id) < ({cursorEnvId: String}, {cursorUnixTimestamp: Int64}, {cursorTraceId: String})",
"(environment_id, triggered_timestamp, trace_id) < ({cursorEnvId: String}, {cursorTriggeredTimestamp: String}, {cursorTraceId: String})",
{
cursorEnvId: decodedCursor.environmentId,
cursorUnixTimestamp: decodedCursor.unixTimestamp,
cursorTriggeredTimestamp: decodedCursor.triggeredTimestamp,
cursorTraceId: decodedCursor.traceId,
}
);
}

queryBuilder.orderBy("environment_id DESC, toUnixTimestamp(start_time) DESC, trace_id DESC");
queryBuilder.orderBy("environment_id DESC, triggered_timestamp DESC, trace_id DESC");
// Limit + 1 to check if there are more results
queryBuilder.limit(pageSize + 1);

@@ -399,10 +391,9 @@
let nextCursor: string | undefined;
if (hasMore && logs.length > 0) {
const lastLog = logs[logs.length - 1];
const unixTimestamp = Math.floor(new Date(lastLog.start_time).getTime() / 1000);
nextCursor = encodeCursor({
environmentId,
unixTimestamp,
triggeredTimestamp: lastLog.triggered_timestamp,
traceId: lastLog.trace_id,
});
}
@@ -63,7 +63,7 @@ export const loader = async ({ request, params }: LoaderFunctionArgs) => {
}

// Query ClickHouse for related spans in the same trace
const queryBuilder = clickhouseClient.taskEventsV2.logsListQueryBuilder();
const queryBuilder = clickhouseClient.taskEventsSearch.logsListQueryBuilder();

queryBuilder.where("environment_id = {environmentId: String}", {
environmentId: environment.id,
@@ -0,0 +1,62 @@
-- +goose Up
CREATE TABLE IF NOT EXISTS trigger_dev.task_events_search_v1
(
environment_id String,
organization_id String,
project_id String,
triggered_timestamp DateTime64(9) CODEC(Delta(8), ZSTD(1)),
trace_id String CODEC(ZSTD(1)),
span_id String CODEC(ZSTD(1)),
run_id String CODEC(ZSTD(1)),
task_identifier String CODEC(ZSTD(1)),
start_time DateTime64(9) CODEC(Delta(8), ZSTD(1)),
inserted_at DateTime64(3),
message String CODEC(ZSTD(1)),
kind LowCardinality(String) CODEC(ZSTD(1)),
status LowCardinality(String) CODEC(ZSTD(1)),
duration UInt64 CODEC(ZSTD(1)),
parent_span_id String CODEC(ZSTD(1)),
attributes_text String CODEC(ZSTD(1)),

INDEX idx_run_id run_id TYPE bloom_filter(0.001) GRANULARITY 1,
INDEX idx_message_text_search lower(message) TYPE ngrambf_v1(3, 32768, 2, 0) GRANULARITY 1,
INDEX idx_attributes_text_search lower(attributes_text) TYPE ngrambf_v1(3, 32768, 2, 0) GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY toDate(triggered_timestamp)
ORDER BY (organization_id, environment_id, triggered_timestamp)
@coderabbitai coderabbitai bot Feb 12, 2026

⚠️ Potential issue | 🟠 Major

Cursor pagination tiebreaker (trace_id) is not in the table's ORDER BY.

The table ORDER BY is (organization_id, environment_id, triggered_timestamp), but LogsListPresenter paginates with the tuple (environment_id, triggered_timestamp, trace_id). Since trace_id is absent from the ordering key, rows sharing the same triggered_timestamp have no deterministic order — the cursor comparison can skip or duplicate rows across pages.

Consider adding trace_id to the ORDER BY:

Proposed fix
-ORDER BY (organization_id, environment_id, triggered_timestamp)
+ORDER BY (organization_id, environment_id, triggered_timestamp, trace_id)

@mpcgrid mpcgrid Feb 12, 2026

They will also have the same trace_id; duplication will be handled at the span level.


🏁 Script executed:

#!/bin/bash

# Check the LogsListPresenter implementation to understand cursor pagination logic
cat apps/webapp/app/presenters/v3/LogsListPresenter.server.ts

# Check the materialized view definition to understand triggered_timestamp computation
cat internal-packages/clickhouse/schema/016_add_task_events_search_v1.sql | grep -A 5 "triggered_timestamp"

# Search for any code that groups or filters by trace_id in relation to triggered_timestamp
rg -n -C3 "triggered_timestamp.*trace_id|trace_id.*triggered_timestamp" --type=ts



@mpcgrid I don't think that's necessarily the case. The triggered_timestamp is computed as start_time + duration for each individual event/span in the materialized view:

fromUnixTimestamp64Nano(toUnixTimestamp64Nano(start_time) + toInt64(duration)) AS triggered_timestamp

This means triggered_timestamp represents the end time of each event. Multiple events from different traces can theoretically have the same end time, even at nanosecond precision:

  • Event A from Trace X: starts at T, duration 100ms → ends at T+100ms
  • Event B from Trace Y: starts at T+50ms, duration 50ms → ends at T+100ms

Both events would have identical triggered_timestamp values but different trace_id values.

Since the cursor pagination in LogsListPresenter uses the tuple (environment_id, triggered_timestamp, trace_id) for ordering and comparison (line 367, 376), but the table's ORDER BY only includes (organization_id, environment_id, triggered_timestamp), ClickHouse won't guarantee a deterministic order for rows sharing the same triggered_timestamp. This could lead to pagination inconsistencies.

Adding trace_id to the table's ORDER BY ensures the physical row order matches the cursor's comparison semantics, making pagination deterministic.
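The collision is easy to reproduce numerically (an illustrative sketch with made-up values):

```typescript
// Two events from different traces whose end times coincide exactly.
const MS_IN_NS = 1_000_000n;

// Event A from Trace X: starts at T, runs 100ms.
const eventA = { traceId: "X", startNs: 0n, durationNs: 100n * MS_IN_NS };
// Event B from Trace Y: starts at T+50ms, runs 50ms.
const eventB = { traceId: "Y", startNs: 50n * MS_IN_NS, durationNs: 50n * MS_IN_NS };

// triggered_timestamp = start_time + duration, as in the materialized view.
const triggeredA = eventA.startNs + eventA.durationNs;
const triggeredB = eventB.startNs + eventB.durationNs;

// Identical sort keys, different trace ids — without a tiebreaker in ORDER BY,
// their relative order across pages is not deterministic.
```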



--Right now we have maximum retention of up to 30 days based on plan.
--We put a logical limit for now, the 90 DAY TTL is just a backup
--This might need to be updated for longer retention periods
TTL toDateTime(triggered_timestamp) + INTERVAL 90 DAY
SETTINGS ttl_only_drop_parts = 1;

CREATE MATERIALIZED VIEW IF NOT EXISTS trigger_dev.task_events_search_mv_v1
TO trigger_dev.task_events_search_v1 AS
SELECT
environment_id,
organization_id,
project_id,
trace_id,
span_id,
run_id,
task_identifier,
start_time,
inserted_at,
message,
kind,
status,
duration,
parent_span_id,
toJSONString(attributes) AS attributes_text,
fromUnixTimestamp64Nano(toUnixTimestamp64Nano(start_time) + toInt64(duration)) AS triggered_timestamp

🚩 triggered_timestamp = start_time + duration means log events and span events sort differently than before

The MV computes triggered_timestamp as fromUnixTimestamp64Nano(toUnixTimestamp64Nano(start_time) + toInt64(duration)) at internal-packages/clickhouse/schema/016_add_task_events_search_v1.sql:52. For log events (duration=0), triggered_timestamp = start_time. For spans (duration>0), triggered_timestamp = end_time. The old ordering used start_time directly (toUnixTimestamp(start_time) DESC). This means spans now sort by their completion time rather than start time. This is a deliberate semantic change — logs appear in the order they "completed" rather than started — but could surprise users who expect chronological start-time ordering, especially for long-running spans that may appear far from their actual start in the log timeline.
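The reordering effect can be seen with two illustrative entries (a sketch; the values are made up):

```typescript
// triggered_timestamp = start_time + duration, modeled in nanoseconds.
const toTriggered = (startNs: bigint, durationNs: bigint): bigint => startNs + durationNs;

const logEntry = toTriggered(1_000n, 0n); // logs: duration 0, triggered === start (1000)
const longSpan = toTriggered(0n, 2_000n); // spans: triggered === end time (2000)

// Under the old start_time ordering the span (start 0) sorts before the log (start 1000);
// under triggered_timestamp ordering the span (end 2000) sorts after it.
```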


Contributor Author

intended

FROM trigger_dev.task_events_v2
WHERE
kind != 'DEBUG_EVENT'
AND status != 'PARTIAL'

🚩 MV's status != 'PARTIAL' is broader than the old SPAN-only PARTIAL filter

The old code had queryBuilder.where("NOT (kind = 'SPAN' AND status = 'PARTIAL')") which only excluded PARTIAL status for SPAN kind events. The new MV at 016_add_task_events_search_v1.sql:56 uses AND status != 'PARTIAL' which excludes ALL event kinds with PARTIAL status. If there are non-SPAN events that legitimately have PARTIAL status and should appear in logs, they would now be silently excluded at the MV ingestion level with no way to recover them without backfilling. This is likely intentional cleanup, but worth confirming no other event kinds use PARTIAL status.
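The widening is visible when the two predicates are placed side by side (an illustrative model, not repo code):

```typescript
// Old query-time filter: only SPAN rows with PARTIAL status were excluded.
const oldFilter = (kind: string, status: string): boolean =>
  !(kind === "SPAN" && status === "PARTIAL");

// New MV ingestion filter: any PARTIAL row is dropped, regardless of kind.
const newFilter = (_kind: string, status: string): boolean => status !== "PARTIAL";

// A hypothetical non-SPAN PARTIAL event survives the old filter but not the new one:
// oldFilter("LOG_INFO", "PARTIAL") -> true
// newFilter("LOG_INFO", "PARTIAL") -> false
```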


AND NOT (kind = 'SPAN_EVENT' AND attributes_text = '{}')
AND message != 'trigger.dev/start';

-- +goose Down
DROP VIEW IF EXISTS trigger_dev.task_events_search_mv_v1;
DROP TABLE IF EXISTS trigger_dev.task_events_search_v1;
9 changes: 7 additions & 2 deletions internal-packages/clickhouse/src/index.ts
@@ -23,8 +23,8 @@ import {
getTraceSummaryQueryBuilderV2,
insertTaskEvents,
insertTaskEventsV2,
getLogsListQueryBuilderV2,
getLogDetailQueryBuilderV2,
getLogsSearchListQueryBuilder,
} from "./taskEvents.js";
import { Logger, type LogLevel } from "@trigger.dev/core/logger";
import type { Agent as HttpAgent } from "http";
@@ -220,8 +220,13 @@ export class ClickHouse {
traceSummaryQueryBuilder: getTraceSummaryQueryBuilderV2(this.reader),
traceDetailedSummaryQueryBuilder: getTraceDetailedSummaryQueryBuilderV2(this.reader),
spanDetailsQueryBuilder: getSpanDetailsQueryBuilderV2(this.reader),
logsListQueryBuilder: getLogsListQueryBuilderV2(this.reader, this.logsQuerySettings?.list),
logDetailQueryBuilder: getLogDetailQueryBuilderV2(this.reader, this.logsQuerySettings?.detail),
};
}

get taskEventsSearch() {
return {
logsListQueryBuilder: getLogsSearchListQueryBuilder(this.reader, this.logsQuerySettings?.list),
};
}
}
24 changes: 15 additions & 9 deletions internal-packages/clickhouse/src/taskEvents.ts
@@ -231,11 +231,12 @@ export function getSpanDetailsQueryBuilderV2(
});
}


// ============================================================================
// Logs List Query Builders (for aggregated logs page)
// Search Table Query Builders (for logs page, using task_events_search_v1)
// ============================================================================

export const LogsListResult = z.object({
export const LogsSearchListResult = z.object({
environment_id: z.string(),
organization_id: z.string(),
project_id: z.string(),
@@ -250,14 +250,18 @@
status: z.string(),
duration: z.number().or(z.string()),
attributes_text: z.string(),
triggered_timestamp: z.string(),
});

export type LogsListResult = z.output<typeof LogsListResult>;
export type LogsSearchListResult = z.output<typeof LogsSearchListResult>;

export function getLogsListQueryBuilderV2(ch: ClickhouseReader, settings?: ClickHouseSettings) {
return ch.queryBuilderFast<LogsListResult>({
name: "getLogsList",
table: "trigger_dev.task_events_v2",
export function getLogsSearchListQueryBuilder(
ch: ClickhouseReader,
settings?: ClickHouseSettings
) {
return ch.queryBuilderFast<LogsSearchListResult>({
name: "getLogsSearchList",
table: "trigger_dev.task_events_search_v1",
columns: [
"environment_id",
"organization_id",
@@ -268,11 +268,12 @@
"trace_id",
"span_id",
"parent_span_id",
{ name: "message", expression: "LEFT(message, 512)" },
{ name: "message", expression: "LEFT(message, 256)" },
"kind",
"status",
"duration",
"attributes_text"
"attributes_text",
"triggered_timestamp",
],
settings,
});
26 changes: 25 additions & 1 deletion references/seed/src/trigger/spanSpammer.ts
@@ -1,4 +1,4 @@
import { logger, task, wait } from "@trigger.dev/sdk/v3";
import { logger, task, wait, metadata } from "@trigger.dev/sdk/v3";

const CONFIG = {
delayBetweenBatchesSeconds: 0.2,
@@ -14,6 +14,22 @@ export const SpanSpammerTask = task({
const context = { payload, ctx };
let logCount = 0;

// 10s trace with events every 2s
await logger.trace("10s-span", async () => {
const totalSeconds = 10;
const intervalSeconds = 2;
const totalEvents = totalSeconds / intervalSeconds;

logger.info("Starting 10s span", context);

for (let i = 1; i <= totalEvents; i++) {
await wait.for({ seconds: intervalSeconds });
logger.info(`Inner event ${i}/${totalEvents} at ${i * intervalSeconds}s`, context);
}

logger.info("Completed 30s span", context);
});

logger.info("Starting span spammer task", context);
logger.warn("This will generate a lot of logs", context);

@@ -36,6 +52,14 @@ export const SpanSpammerTask = task({
emitBatch("This is a test log!!! Log number: ");
}

metadata.parent.set("childStatus", "running");
metadata.parent.increment("completedChildren", 1);

// Update the root run's metadata (top-level run in the chain)
metadata.root.set("deepChildStatus", "done");
metadata.root.append("completedTasks", "child-task");


logger.info("Completed span spammer task", context);
return { message: `Created ${logCount} logs` };
},