feat: add session record disk logging (RecordLogger) #953
Code Review
This pull request introduces a disk-based JSONL logger for session records, enabling per-session logging to a configurable directory. Key changes include the implementation of the RecordLogger class, integration into the session registry, and a new CLI argument to enable the feature. Feedback focuses on performance and efficiency: the logging call within the asynchronous chat_completions function performs blocking I/O and should be offloaded to a thread pool to prevent event loop degradation. Additionally, using Pydantic's model_dump_json() is recommended for more efficient serialization of session records.
        response=response,
    )
    session.append_record(record)
    registry.log_record(session_id, record)
The registry.log_record call performs blocking synchronous I/O (writing to disk and flushing) within an async function. This will block the event loop and degrade performance for all concurrent requests. Consider offloading this to a thread pool.
Suggested change:
- registry.log_record(session_id, record)
+ import asyncio
+ await asyncio.to_thread(registry.log_record, session_id, record)
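A minimal sketch of the offloading suggested above, for illustration only. `log_record` here is a hypothetical stand-in that sleeps to simulate a blocking write and flush, and the `ticker` coroutine is also made up: it just counts how often the event loop gets to run while the write is in flight.

```python
import asyncio
import time


def log_record(session_id: str, record: dict) -> None:
    """Hypothetical stand-in for registry.log_record: a blocking disk write."""
    time.sleep(0.05)  # simulates write + flush latency


async def handler(session_id: str, record: dict) -> None:
    # Run the blocking call in the default thread pool so the event loop
    # stays free to serve other requests in the meantime.
    await asyncio.to_thread(log_record, session_id, record)


async def main() -> int:
    ticks = 0

    async def ticker() -> None:
        # Counts event-loop iterations while the handler is waiting.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.005)
            ticks += 1

    task = asyncio.create_task(ticker())
    await handler("s1", {"msg": "hi"})
    task.cancel()
    return ticks
```

If `handler` called `log_record` directly instead, `ticker` would not advance at all during the write. Whether that matters in practice depends on how slow the disk is, which is the open question in this thread.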
I don't think we need asyncio here; record logging should be very lightweight? Not sure.
    def log_record(self, session_id: str, record: SessionRecord) -> None:
        try:
            handle = self._get_handle(session_id)
            handle.write(json.dumps(record.model_dump(), default=str) + "\n")
Since SessionRecord is a Pydantic model, using model_dump_json() is more efficient than manually dumping to a dict and then to JSON. It also handles common types like datetime or UUID more gracefully.
Suggested change:
- handle.write(json.dumps(record.model_dump(), default=str) + "\n")
+ handle.write(record.model_dump_json() + "\n")
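To make the "more gracefully" point concrete, here is a small stdlib-only sketch of what `default=str` does to values that `json` cannot encode natively. The dict is a hypothetical stand-in for `record.model_dump()`; the actual fields of `SessionRecord` are not shown in this PR.

```python
import json
import uuid
from datetime import datetime

# Hypothetical stand-in for record.model_dump(): in Python mode the dump
# keeps datetime and UUID objects rather than converting them to strings.
record = {
    "session_id": uuid.UUID("12345678-1234-5678-1234-567812345678"),
    "created_at": datetime(2024, 1, 2, 3, 4, 5),
}

# default=str is called for every value json cannot encode on its own.
line = json.dumps(record, default=str)
decoded = json.loads(line)

# str(datetime) uses a space separator, not the ISO 8601 "T".
print(decoded["created_at"])
```

With `default=str` the timestamp is written as `2024-01-02 03:04:05`, and any other unsupported type silently falls back to its `str()` form. As far as I know, Pydantic v2's `model_dump_json()` emits ISO 8601 timestamps instead, so switching also changes the on-disk format slightly.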
        """Enqueue a session-close event (flushes and closes the file handle)."""
        self._queue.put(("close", session_id, None))

    def close_all(self) -> None:
close_all is not called anywhere other than in the tests?
Do you want to register close_all with atexit?
e.g.
    import atexit

    def __init__(self, log_dir: str):
        ...
        self._thread.start()
        atexit.register(self.close_all)
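A runnable sketch of the same idea, under the assumption that `close_all` may end up being called both explicitly (tests) and by the `atexit` hook, so it should tolerate repeated calls. `HandleCache` is a hypothetical stand-in for the file-handle bookkeeping, not the PR's `RecordLogger`.

```python
import atexit
import os
from typing import Dict, TextIO


class HandleCache:
    """Hypothetical stand-in for RecordLogger's file-handle bookkeeping."""

    def __init__(self, log_dir: str) -> None:
        self._log_dir = log_dir
        self._handles: Dict[str, TextIO] = {}
        # Flush and close open files at interpreter exit even if nobody
        # calls close_all() explicitly.
        atexit.register(self.close_all)

    def write(self, session_id: str, line: str) -> None:
        handle = self._handles.get(session_id)
        if handle is None:
            path = os.path.join(self._log_dir, f"{session_id}.jsonl")
            handle = self._handles[session_id] = open(path, "a", encoding="utf-8")
        handle.write(line + "\n")

    def close_all(self) -> None:
        # Safe to call more than once: the dict is empty after the first call.
        while self._handles:
            _, handle = self._handles.popitem()
            handle.close()
```

One thing to keep in mind: `atexit.register(self.close_all)` holds a reference to the instance until exit, which is fine for a process-wide singleton but would keep short-lived loggers alive.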
    callers (typically async request handlers) never block on file operations.
    Records are flushed immediately so that partial sessions are preserved
    if the process crashes.
    """
Can you refactor this file? I think the logic is good but a little hard to follow.
Okay, let me look at it in more detail.
Add a JSONL-based RecordLogger that writes each SessionRecord to a per-session file under `--session-record-log-dir`. Records are flushed immediately so partial sessions survive crashes. Integrate RecordLogger into SessionRegistry: records are written on each chat completion and file handles are closed when sessions are deleted. Made-with: Cursor
Move serialization and disk I/O off the event loop into a dedicated daemon thread backed by a SimpleQueue, so log_record() and close_session() return immediately without blocking async handlers. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
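The design described in this commit can be sketched roughly as follows. This is a simplified illustration, not the PR's code: `QueueLogger`, the `"write"` and `"stop"` event kinds, and the use of plain dicts instead of `SessionRecord` are all assumptions. Only the `("close", session_id, None)` tuple shape is taken from the diff.

```python
import json
import os
import threading
from queue import SimpleQueue
from typing import Dict, Optional, TextIO, Tuple

Event = Tuple[str, str, Optional[dict]]  # (kind, session_id, payload)


class QueueLogger:
    """Hypothetical sketch of a queue-backed JSONL writer."""

    def __init__(self, log_dir: str) -> None:
        os.makedirs(log_dir, exist_ok=True)
        self._log_dir = log_dir
        self._queue: "SimpleQueue[Event]" = SimpleQueue()
        self._handles: Dict[str, TextIO] = {}
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def log_record(self, session_id: str, record: dict) -> None:
        self._queue.put(("write", session_id, record))  # returns immediately

    def close_session(self, session_id: str) -> None:
        self._queue.put(("close", session_id, None))

    def close_all(self) -> None:
        # Ask the writer to drain, close every handle, and exit; then wait.
        self._queue.put(("stop", "", None))
        self._thread.join()

    def _run(self) -> None:
        # Only this thread touches file handles, so no lock is needed.
        while True:
            kind, session_id, payload = self._queue.get()
            if kind == "write":
                handle = self._handles.get(session_id)
                if handle is None:
                    path = os.path.join(self._log_dir, f"{session_id}.jsonl")
                    handle = self._handles[session_id] = open(path, "a", encoding="utf-8")
                handle.write(json.dumps(payload, default=str) + "\n")
                handle.flush()  # keep partial sessions on disk
            elif kind == "close":
                handle = self._handles.pop(session_id, None)
                if handle is not None:
                    handle.close()
            else:  # "stop"
                for handle in self._handles.values():
                    handle.close()
                self._handles.clear()
                return
```

Because the queue is FIFO and a single thread consumes it, records for a session land on disk in the order they were enqueued, and a close event cannot overtake earlier writes.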
Remove the standalone --session-record-log-dir argument and instead
automatically set it to {dump_details}/session_records when
--dump-details is provided, consistent with rollout_data and train_data.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
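For readers skimming the history, the derivation described in this commit might look something like the sketch below. `parse_args` and the attribute name `session_record_log_dir` are assumptions for illustration; only `--dump-details` and the `session_records` subdirectory come from the commit message.

```python
import argparse
import os
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("--dump-details", type=str, default=None)
    args = parser.parse_args(argv)
    # Derived setting: stays None (logging disabled) unless --dump-details
    # is provided, in which case records go to a fixed subdirectory.
    args.session_record_log_dir = (
        os.path.join(args.dump_details, "session_records")
        if args.dump_details is not None
        else None
    )
    return args
```

Deriving the path this way means there is one fewer flag to keep in sync, at the cost of not being able to enable record logging on its own.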
02449ea to 69eee6a
Summary
- Add `RecordLogger` class that writes `SessionRecord` objects as one-JSON-per-line to per-session files under `--session-record-log-dir`
- Integrate into `SessionRegistry`: records are written on each chat completion and file handles are closed when sessions are deleted
- Add `--session-record-log-dir` CLI argument (disabled when not set)

Test plan
- [...] `--session-record-log-dir` is set

Made with Cursor