Log: switch both log files to JSONL (one JSON object per line)#31

Merged: DaLuSt merged 1 commit into main from claude/log-switch-to-jsonl-7Wp79 on Feb 27, 2026

DaLuSt (Owner) commented Feb 21, 2026

Summary

Replaces all pipe/comma-separated log writes with json.dumps() so each entry is a complete, self-describing JSON object on a single line.

Why JSONL

| Problem with current format | How JSONL fixes it |
| --- | --- |
| Post titles / comment bodies containing , or \| break parsing | JSON string escaping handles any character |
| Multi-line comment bodies corrupt line count | json.dumps escapes \n automatically |
| Two different formats (pipe for comments, comma for posts) | One unified format for both files |
| Missing fields differ between files | Every entry carries all relevant fields |
| Raw datetime object printed (includes microseconds, +00:00) | ISO 8601 UTC with explicit Z suffix |
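
To make the escaping and timestamp rows concrete, here is a minimal sketch. The helper name `to_iso_z` and the sample body are assumptions for illustration, not code taken from this PR.

```python
import json
from datetime import datetime, timezone


def to_iso_z(dt):
    """Format an aware datetime as ISO 8601 UTC, whole seconds, Z suffix.

    Hypothetical helper; a naive datetime would be treated as local time
    by astimezone(), so callers should pass timezone-aware values.
    """
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# A body that would break a pipe- or comma-separated format.
body = "first line, with a comma | and a pipe\nsecond line"
line = json.dumps({"body": body})

# The serialized entry contains an escaped \n, not a real line break,
# so it still occupies exactly one line in the log file.
print(line)

raw = datetime(2024, 1, 15, 10, 30, 0, 123456, tzinfo=timezone.utc)
stamp = to_iso_z(raw)
print(stamp)  # 2024-01-15T10:30:00Z
```

Parsing the line back with `json.loads` returns the original body unchanged, including the newline.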

Fields in every comment entry

deleted_at, created_at, id (fullname), subreddit, score, permalink, body, source

Fields in every post entry

deleted_at, created_at, id (fullname), subreddit, score, title, permalink, num_comments, source
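
One way to keep every entry on the field lists above is to build entries through a small checker. This is a sketch only: `COMMENT_FIELDS`, `POST_FIELDS`, and `make_entry` are hypothetical names, and the sample post values are made up.

```python
import json

# Field names and order as listed in this PR description.
COMMENT_FIELDS = ("deleted_at", "created_at", "id", "subreddit",
                  "score", "permalink", "body", "source")
POST_FIELDS = ("deleted_at", "created_at", "id", "subreddit", "score",
               "title", "permalink", "num_comments", "source")


def make_entry(fields, **values):
    """Return a dict with exactly `fields`, in order; reject missing or extra keys."""
    missing = [f for f in fields if f not in values]
    extra = [k for k in values if k not in fields]
    if missing or extra:
        raise ValueError(f"missing={missing} extra={extra}")
    return {f: values[f] for f in fields}


# Made-up sample values for illustration.
post = make_entry(
    POST_FIELDS,
    deleted_at="2024-01-15T10:30:00Z",
    created_at="2023-06-01T12:00:00Z",
    id="t3_sample",
    subreddit="python",
    score=5,
    title="A title, with | separators",
    permalink="https://reddit.com/r/python/comments/sample/",
    num_comments=3,
    source="ci",
)
post_line = json.dumps(post)  # one JSONL line, ready to append to the log
```

Raising on a missing key, rather than silently writing a partial entry, is a design choice of this sketch; the scripts in the PR may handle absent values differently.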

Example

{"deleted_at": "2024-01-15T10:30:00Z", "created_at": "2023-06-01T12:00:00Z", "id": "t1_abc123", "subreddit": "python", "score": -2, "permalink": "https://reddit.com/r/python/comments/xyz/...", "body": "The comment text", "source": "ci"}
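
Reading the log back becomes a line-by-line `json.loads`. The sketch below parses the example entry above; `read_jsonl` is a hypothetical helper, not part of the changed files.

```python
import json


def read_jsonl(lines):
    """Yield one dict per non-blank line of a JSONL log."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# The example entry from this PR description, as a single line.
sample = (
    '{"deleted_at": "2024-01-15T10:30:00Z", "created_at": "2023-06-01T12:00:00Z", '
    '"id": "t1_abc123", "subreddit": "python", "score": -2, '
    '"permalink": "https://reddit.com/r/python/comments/xyz/...", '
    '"body": "The comment text", "source": "ci"}'
)

entries = list(read_jsonl([sample, ""]))

# Fields are addressed by name, so filtering needs no column positions.
negative_ids = [e["id"] for e in entries if e["score"] < 0]
```

In real use the `lines` argument would be an open file object, since iterating a text file yields one line at a time.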

Files changed

commentCleaner.py (modes 1/2/3), PostCleaner.py, weekly_cleanup.py, web/app.py

https://claude.ai/code/session_014HLhrFtCVRFnEyfRexiy3d

DaLuSt force-pushed the claude/log-switch-to-jsonl-7Wp79 branch from 3bc198b to 5175584 on February 21, 2026 22:17
Replace pipe/comma-separated text log lines with structured JSON objects
(one per line) in commentCleaner.py, PostCleaner.py, weekly_cleanup.py,
and web/app.py. Each entry now includes deleted_at, created_at (ISO 8601),
id (Reddit fullname), subreddit, score, permalink, body/title,
num_comments (posts only), and source field identifying the origin script/mode.

https://claude.ai/code/session_014HLhrFtCVRFnEyfRexiy3d
DaLuSt force-pushed the claude/log-switch-to-jsonl-7Wp79 branch from 5175584 to 42b28a9 on February 27, 2026 09:02
DaLuSt merged commit 57dda8d into main on Feb 27, 2026
0 of 4 checks passed
DaLuSt deleted the claude/log-switch-to-jsonl-7Wp79 branch on February 27, 2026 16:16
2 participants