fix(parser): keep last occurrence of streaming message id within file (#110) #155

Open
ousamabenyounes wants to merge 1 commit into getagentseal:main from ousamabenyounes:fix/parser-streaming-dedup
ousamabenyounes commented Apr 25, 2026

Summary

Fixes #110.

Claude Code writes the same message.id multiple times to the session JSONL as a response streams in. Only the final write carries the tool_use blocks (MCP servers, Agent, EnterPlanMode, …) and the authoritative token counts. The existing dedup in groupIntoTurns keeps the first occurrence by id, so for every streamed turn:

  • MCP tool calls disappear from the breakdown
  • tool_use blocks for Agent / EnterPlanMode / Bash are dropped
  • Token counts are slightly understated
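
To make the failure mode concrete, here is a minimal sketch of keep-first dedup applied to two streamed writes of one message. The `Entry` shape, the `keepFirstById` helper, and the tool name `mcp__example__search` are assumptions for illustration; the real session JSONL and `groupIntoTurns` carry more fields and logic.

```typescript
// Hypothetical, simplified entry shape (not the project's real types).
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; name: string };

interface Entry {
  timestamp: string;
  message: { id: string; content: ContentBlock[]; usage: { output_tokens: number } };
}

// Keep-first dedup by message id, mirroring the policy described above.
function keepFirstById(entries: Entry[]): Entry[] {
  const seen = new Set<string>();
  const out: Entry[] = [];
  for (const e of entries) {
    if (seen.has(e.message.id)) continue; // later streaming writes are discarded
    seen.add(e.message.id);
    out.push(e);
  }
  return out;
}

// Two writes of the same message: an early empty one and the final one.
const streamed: Entry[] = [
  {
    timestamp: "2026-04-25T10:00:00.000Z",
    message: { id: "msg_1", content: [], usage: { output_tokens: 1 } },
  },
  {
    timestamp: "2026-04-25T10:00:02.000Z",
    message: {
      id: "msg_1",
      content: [{ type: "tool_use", name: "mcp__example__search" }],
      usage: { output_tokens: 42 },
    },
  },
];

const kept = keepFirstById(streamed);

// Count the tool_use blocks that survive dedup.
let toolUseCount = 0;
for (const e of kept) {
  for (const b of e.message.content) {
    if (b.type === "tool_use") toolUseCount++;
  }
}

console.log(kept.length, toolUseCount, kept[0].message.usage.output_tokens); // prints: 1 0 1
```

The single surviving entry is the early write, so the tool call is gone and the token count reflects the partial usage rather than the final one.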

Fix

Add a within-file pre-pass dedupeStreamingMessageIds in parseSessionFile that keeps the last occurrence of each message.id. The cross-file dedup in groupIntoTurns (against seenMsgIds) is correct for what it does — avoiding double-counting when the same session appears under multiple project dirs — and stays keep-first-seen.

The two concerns are now handled separately:

| Where | Concern | Policy |
|---|---|---|
| parseSessionFile (new) | Streaming writes within one file | keep last |
| groupIntoTurns (unchanged) | Same id seen across multiple files | keep first |
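
A within-file keep-last pre-pass could look roughly like the sketch below. This is not the code from `src/parser.ts`: the `Entry` shape and the name `dedupeKeepLast` are hypothetical, and placing the kept entry at the position of its last occurrence is an assumption about ordering.

```typescript
// Hypothetical, simplified entry shape (not the project's real types).
interface Entry {
  type: "user" | "assistant";
  timestamp: string;
  message?: { id?: string; content?: unknown[] };
}

// Keep only the last occurrence of each message id within one file.
// Entries without an id (for example user entries) pass through untouched.
function dedupeKeepLast(entries: Entry[]): Entry[] {
  // First pass: remember the index of the last write for every id.
  const lastIdxById = new Map<string, number>();
  entries.forEach((e, i) => {
    const id = e.message?.id;
    if (id !== undefined) lastIdxById.set(id, i);
  });

  // Second pass: keep id-less entries and the final write of each id.
  return entries.filter((e, i) => {
    const id = e.message?.id;
    return id === undefined || lastIdxById.get(id) === i;
  });
}

const fileEntries: Entry[] = [
  { type: "user", timestamp: "2026-04-25T10:00:00.000Z" },
  { type: "assistant", timestamp: "2026-04-25T10:00:01.000Z", message: { id: "msg_1", content: [] } },
  { type: "assistant", timestamp: "2026-04-25T10:00:03.000Z", message: { id: "msg_1", content: [{ type: "tool_use" }] } },
  { type: "assistant", timestamp: "2026-04-25T10:00:05.000Z", message: { id: "msg_2", content: [] } },
];

const deduped = dedupeKeepLast(fileEntries);
console.log(deduped.length); // prints: 3
```

Using a `Map` rather than a plain object avoids bracket-assigning onto a `{}`-initialised map, which matches the semgrep guard mentioned under Verification.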

Verification

  • Baseline: 355 pass, 0 fail
  • Post-fix: 359 pass, 0 fail (4 new tests, 0 regressions)
  • New test tests/parser-streaming-dedup.test.ts fails on the unfixed code (verified by stashing the fix and re-running)
  • npx tsc --noEmit clean
  • No bracket-assign on {}-init maps in src/parser.ts or src/providers/ (semgrep guard)

Files changed

| File | Change |
|---|---|
| src/parser.ts | new exported dedupeStreamingMessageIds; called from parseSessionFile before groupIntoTurns |
| tests/parser-streaming-dedup.test.ts | 4 new tests |

Vibe Coded by Ousama Ben Younes
Developed With Ora Studio (Claude Code)

…getagentseal#110)

Claude Code streams an assistant response across several JSONL writes that share
the same `message.id`: an early `message_start` (empty content), optional mid-
stream updates, and a final `message_stop` carrying the `tool_use` blocks plus
authoritative usage. `groupIntoTurns` deduplicates by id keeping the FIRST
occurrence, so `tool_use` (MCP, Agent, EnterPlanMode, …) and final token counts
in the last entry were silently dropped.

Add a within-file pre-pass `dedupeStreamingMessageIds` in `parseSessionFile`
that keeps the LAST occurrence of each message id. Cross-file dedup against
`seenMsgIds` in `groupIntoTurns` stays keep-first-seen (it serves a different
purpose: avoiding double-counting when the same session appears under several
project dirs).

Adds `tests/parser-streaming-dedup.test.ts` covering streaming dedup, mixed
user/assistant entries, no-id passthrough, and ordering between distinct ids.

Co-Authored-By: Ora Studio <[email protected]>
@Qodo-Free-For-OSS
Hi, dedupeStreamingMessageIds keeps only the last occurrence of each message.id, which makes parseApiCall/groupIntoTurns use the later streaming update’s timestamp (typically message_stop) as the call timestamp. This can incorrectly include/exclude turns in dateRange filtering and shift day bucketing near midnight because the code explicitly uses the first assistant call timestamp as the cost-incurrence time.

Severity: action required | Category: correctness

How to fix: Preserve first timestamp per id

Agent prompt to fix - you can give this to your LLM of choice:

Issue description

dedupeStreamingMessageIds() keeps the last entry for a given message.id, which changes the assistant call timestamp downstream to the later message_stop timestamp. This conflicts with parseSessionFile date bucketing/filtering logic that uses the first assistant call timestamp as the cost-incurrence timestamp.

Issue Context

The goal (keeping final tool_use blocks and authoritative usage) is correct, but timestamp semantics matter for dateRange filtering and day bucketing.
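
As a small illustration of why the timestamp choice matters, the sketch below buckets the first and final write of one streamed message by UTC day and checks them against a date range. The helpers `utcDayKey` and `inRange` are hypothetical; how the parser actually buckets days (time zone, range bounds) is an assumption here.

```typescript
// Hypothetical helper: bucket an ISO timestamp by its UTC calendar day.
function utcDayKey(isoTimestamp: string): string {
  return new Date(isoTimestamp).toISOString().slice(0, 10);
}

// Hypothetical helper: half-open range check [start, end).
function inRange(isoTimestamp: string, start: string, end: string): boolean {
  const t = Date.parse(isoTimestamp);
  return t >= Date.parse(start) && t < Date.parse(end);
}

// One message streamed across midnight UTC.
const firstWrite = "2026-04-25T23:59:58.000Z";
const finalWrite = "2026-04-26T00:00:03.000Z";

console.log(utcDayKey(firstWrite)); // prints: 2026-04-25
console.log(utcDayKey(finalWrite)); // prints: 2026-04-26

// A filter for April 25 includes the turn only when the first timestamp is used.
const dayStart = "2026-04-25T00:00:00.000Z";
const dayEnd = "2026-04-26T00:00:00.000Z";
console.log(inRange(firstWrite, dayStart, dayEnd)); // prints: true
console.log(inRange(finalWrite, dayStart, dayEnd)); // prints: false
```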

Fix Focus Areas

  • src/parser.ts[122-142]
  • src/parser.ts[285-330]

Suggested fix

Update dedupeStreamingMessageIds to keep the last entry’s content/usage, but preserve the earliest timestamp seen for that message.id:

  • First pass: track both lastIdxById and firstTimestampById (or earliest timestamp string) for each id.
  • Second pass: when you are about to push the kept (last) entry for an id, if firstTimestampById exists and differs from entry.timestamp, push a shallow clone { ...entry, timestamp: firstTimestamp } (avoid mutating input in-place).
  • Update/add a unit test to assert that the kept entry retains the first timestamp for bucketing, while still retaining the final tool_use/usage from the last update.
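
The two-pass approach suggested above might be sketched as follows. The `Entry` shape and the function name `dedupeKeepLastWithFirstTimestamp` are hypothetical, and treating the first-seen timestamp as the earliest assumes entries appear in chronological order within a file.

```typescript
// Hypothetical, simplified entry shape (not the project's real types).
interface Entry {
  timestamp: string;
  message?: { id?: string; content?: unknown[] };
}

function dedupeKeepLastWithFirstTimestamp(entries: Entry[]): Entry[] {
  const lastIdxById = new Map<string, number>();
  const firstTimestampById = new Map<string, string>();

  // First pass: record the last index and the first-seen timestamp per id.
  entries.forEach((e, i) => {
    const id = e.message?.id;
    if (id === undefined) return;
    lastIdxById.set(id, i);
    if (!firstTimestampById.has(id)) firstTimestampById.set(id, e.timestamp);
  });

  // Second pass: emit id-less entries as-is and the last write of each id,
  // cloned with the first timestamp so the input is never mutated.
  const out: Entry[] = [];
  entries.forEach((e, i) => {
    const id = e.message?.id;
    if (id === undefined) {
      out.push(e);
      return;
    }
    if (lastIdxById.get(id) !== i) return;
    const first = firstTimestampById.get(id);
    out.push(first !== undefined && first !== e.timestamp ? { ...e, timestamp: first } : e);
  });
  return out;
}

const writes: Entry[] = [
  { timestamp: "2026-04-25T23:59:58.000Z", message: { id: "msg_1", content: [] } },
  { timestamp: "2026-04-26T00:00:03.000Z", message: { id: "msg_1", content: [{ type: "tool_use" }] } },
];

const merged = dedupeKeepLastWithFirstTimestamp(writes);
console.log(merged.length, merged[0].timestamp, merged[0].message?.content?.length);
// prints: 1 2026-04-25T23:59:58.000Z 1
```

The shallow clone keeps the final content and usage objects by reference while replacing only `timestamp`, so the original last entry in `writes` still carries its own timestamp.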

Found by Qodo code review

Development

Successfully merging this pull request may close these issues.

MCP tool calls and tool_use blocks missing from dashboard due to streaming write deduplication