feat: GraphQL-based scrapeLikedTweets with JSONL output by nj-io · Pull Request #20 · nirholas/XActions

nj-io · 2026-04-06T01:47:19Z

Summary

Replaces the DOM-based x_get_likes handler with a GraphQL API scraper that uses cursor-based pagination. The DOM approach was capped at ~25 tweets because of X's viewport virtualization.
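For readers unfamiliar with cursor-based pagination, here is a minimal sketch of the loop shape. It is not the PR's code: `collectLikes`, `fetchPage`, `pauseMs` and the `{ tweets, nextCursor }` page shape are hypothetical names chosen for illustration, and the network call is injected so the loop can be exercised without X.

```javascript
// Hypothetical sketch: names and page shape are assumptions, not the PR's API.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// fetchPage(cursor) is assumed to resolve to { tweets: [...], nextCursor: string | null }.
async function collectLikes(fetchPage, { limit = 50, pauseMs = () => 0 } = {}) {
  const collected = [];
  let cursor = null; // null means "first page"

  while (collected.length < limit) {
    const { tweets, nextCursor } = await fetchPage(cursor);
    collected.push(...tweets);

    // Stop on an empty page, a missing cursor, or a cursor that did not advance.
    if (tweets.length === 0 || !nextCursor || nextCursor === cursor) break;

    cursor = nextCursor;
    await sleep(pauseMs()); // pause between pages
  }

  return collected.slice(0, limit);
}
```

Because each request carries the cursor returned by the previous one, the loop does not depend on what the page currently renders, which is why the ~25 tweet DOM cap does not apply.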

Performance

Approach	50 tweets	200 tweets
DOM scraping (old)	~8 min, capped at 25	impossible
GraphQL API (new)	14s	49s

Features

GraphQL Likes API with cursor pagination — no DOM scraping, no scroll limits
JSONL output to ~/.xactions/exports/likes-{user}-{ts}.jsonl — progress survives crashes, memory stays bounded
Rich data via parseTweetResult: text, media (images + video URLs), X Articles, cards, external URLs, engagement stats
Timestamp filtering: from/to params with early exit on reverse chronological data
Auth check on navigation — fails fast on expired cookies
Human-like delays between API pages (2-5s)
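The from/to filtering with early exit can be sketched as below. This is an illustration under assumptions, not the PR's implementation: `filterWindow` and the `createdAt` field are hypothetical, bounds are passed as epoch milliseconds, and the input is assumed to arrive newest-first as the feature list describes.

```javascript
// Hypothetical sketch: function name and tweet field names are assumptions.
// tweets must be ordered newest-first; fromMs/toMs are epoch milliseconds.
function filterWindow(tweets, { fromMs = -Infinity, toMs = Infinity } = {}) {
  const kept = [];
  for (const tweet of tweets) {
    const t = Date.parse(tweet.createdAt);
    if (t > toMs) continue; // newer than the window: skip, keep scanning
    if (t < fromMs) {
      // Older than the window. With newest-first data everything after this
      // is older still, so the caller can stop paginating.
      return { kept, done: true };
    }
    kept.push(tweet);
  }
  return { kept, done: false };
}
```

The `done` flag is what lets a pagination loop stop requesting further pages once the lower bound has been crossed.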

Return shape

{
  "file": "~/.xactions/exports/likes-fitdegen-2026-04-05T13-00-17-915Z.jsonl",
  "count": 200,
  "username": "fitdegen",
  "dateRange": { "from": "2026-03-29T10:12:39.000Z", "to": "2026-03-07T23:36:30.000Z" }
}
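The file field above points at a JSONL export. A minimal sketch of incremental JSONL writing follows, assuming one JSON object per line and an append after each page. The helper names (`exportPath`, `appendJsonl`, `readJsonl`) are hypothetical, and the base directory is passed in explicitly because Node does not expand `~` on its own.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: build a file name in the likes-{user}-{ts}.jsonl pattern,
// using an ISO timestamp with ':' and '.' replaced by '-'.
function exportPath(baseDir, username, now = new Date()) {
  const ts = now.toISOString().replace(/[:.]/g, '-');
  return path.join(baseDir, `likes-${username}-${ts}.jsonl`);
}

// Hypothetical helper: append one page of tweets, one JSON object per line.
// Appending per page keeps earlier pages on disk if a later page fails.
function appendJsonl(filePath, tweets) {
  if (tweets.length === 0) return;
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  const lines = tweets.map((t) => JSON.stringify(t)).join('\n') + '\n';
  fs.appendFileSync(filePath, lines);
}

// Hypothetical helper: read the export back as an array of objects.
function readJsonl(filePath) {
  return fs
    .readFileSync(filePath, 'utf8')
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}
```

Since only the current page is held in memory before being appended, memory use does not grow with the total number of likes exported.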

Architecture

scrapeLikedTweets() in src/scrapers/twitter/index.js
Uses parseTweetResult from shared helpers (feat: rewrite scrapeThread to use TweetDetail GraphQL API #17)
Resolves numeric userId via UserByScreenName GraphQL endpoint
Paginates via Likes GraphQL endpoint (data.user.result.timeline.timeline.instructions)
Wired through scrapers/index.js → local-tools.js → server.js
Removed from xeepyTools, old handler deleted
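To illustrate the pagination step, here is a rough sketch of pulling tweets and the next cursor out of a Likes response. Only the path data.user.result.timeline.timeline.instructions comes from this PR; the entry layout (`entries`, `content.cursorType`, `content.value`, `content.itemContent.tweet_results.result`) is an assumption about X's undocumented response shape and may differ or change, and `extractPage` is a hypothetical name.

```javascript
// Hypothetical sketch: the entry-level field names are assumptions about an
// undocumented API and should be checked against a real response.
function extractPage(response) {
  const instructions =
    response?.data?.user?.result?.timeline?.timeline?.instructions ?? [];

  const tweets = [];
  let nextCursor = null;

  for (const instruction of instructions) {
    for (const entry of instruction.entries ?? []) {
      const content = entry.content ?? {};
      if (content.cursorType === 'Bottom') {
        nextCursor = content.value; // cursor to request the next (older) page
      } else if (content.itemContent?.tweet_results?.result) {
        tweets.push(content.itemContent.tweet_results.result);
      }
    }
  }

  return { tweets, nextCursor };
}
```

Each raw result would then be passed to parseTweetResult; an empty `tweets` array or a null `nextCursor` signals that there is nothing further to fetch.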

Depends on

feat: rewrite scrapeThread to use TweetDetail GraphQL API #17 — shared helpers (parseTweetResult, checkAuth, randomDelay)

Supersedes

Closed feat: enhanced scrapeLikedTweets scraper with rich data extraction #13, feat: enhanced scrapeLikedTweets with rich data and timestamp filtering #19

Test plan

50 likes in 14s with full rich data
200 likes in 49s, covers 3+ weeks of history
from param stops pagination early
to param skips newer tweets
JSONL file written incrementally (verified with wc -l during run)
Invalid date throws clear error
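The last check, rejecting an invalid date with a clear error, could look roughly like this. `parseDateParam` is a hypothetical helper and the error wording is illustrative; the PR does not show its actual validation code.

```javascript
// Hypothetical sketch: validate a from/to parameter and return epoch milliseconds.
// Returns null when the parameter is omitted.
function parseDateParam(value, name) {
  if (value === undefined || value === null) return null;
  const ms = value instanceof Date ? value.getTime() : Date.parse(value);
  if (Number.isNaN(ms)) {
    throw new Error(
      `Invalid "${name}" date: ${String(value)} (expected an ISO 8601 string)`
    );
  }
  return ms;
}
```

Validating before the first request means a typo in from or to fails immediately rather than after a full scrape.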

🤖 Generated with Claude Code

Replace DOM-based thread scraping with direct GraphQL API calls. X doesn't render self-reply threads as article elements in the DOM, causing empty results, especially for high-engagement tweets.

The new approach:
- Calls TweetDetail GraphQL API from the page context using session cookies
- Gets full_text (no truncation, no "Show more" needed)
- note_tweet support for long-form posts
- Filters to self-reply chain only (author replying to themselves)
- Chronological sorting

Also introduces shared helpers for future use by scrapePost:
- fetchTweetDetail(): GraphQL API caller
- parseTweetResult(): rich data extraction (text, media, article, card, external URLs, engagement stats)
- parseThreadFromEntries(): thread chain detection
- extractEntries(), unwrapResult(), getScreenName()

Fixes:
- screen_name moved from user.legacy to user.core in X's GraphQL schema
- Self-replies missing from API response for viral tweets (2000+ replies) now handled gracefully (returns available tweets)

Supersedes nirholas#12 which patches the DOM approach; this replaces it entirely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
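The self-reply filtering and chronological sorting described in this commit can be sketched as follows. This is not parseThreadFromEntries itself: `selfReplyChain` and the flat tweet fields (`id`, `authorId`, `inReplyToId`, `createdAt`) are hypothetical, standing in for whatever the parsed results actually expose.

```javascript
// Hypothetical sketch: field names are assumptions, not the PR's data model.
// Returns the root tweet plus every reply by the same author whose parent is
// already in the chain, ordered oldest-first.
function selfReplyChain(tweets, rootId) {
  const root = tweets.find((t) => t.id === rootId);
  if (!root) return [];

  // Sort oldest-first so a parent is always visited before its replies.
  const sorted = [...tweets].sort(
    (a, b) => Date.parse(a.createdAt) - Date.parse(b.createdAt)
  );

  const chainIds = new Set([rootId]);
  const chain = [];
  for (const t of sorted) {
    if (t.id === rootId) {
      chain.push(t);
    } else if (t.authorId === root.authorId && chainIds.has(t.inReplyToId)) {
      chainIds.add(t.id);
      chain.push(t);
    }
  }
  return chain;
}
```

If the API omits some self-replies, as the commit notes can happen for viral tweets, this approach simply returns the part of the chain it can connect.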

- Replace uniform randomDelay (1-3s) with log-normal distribution (2-7s base + 8% distraction spikes of 8-20s)
- Add checkAuth() guard after page navigation; fails fast on expired cookies
- Add randomDelay before each fetchTweetDetail API call to simulate human browsing between tweet reads
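A rough sketch of a log-normal delay with occasional spikes, matching the ranges quoted in this commit, is shown below. Only the 2-7s base range, the 8% spike rate and the 8-20s spike range come from the commit message; the `mu` and `sigma` values, the clamping, the Box-Muller sampling and the name `sampleDelayMs` are assumptions made for illustration.

```javascript
// Hypothetical sketch: distribution parameters are illustrative assumptions.
// rng is injectable so the function can be tested deterministically.
function sampleDelayMs(rng = Math.random) {
  // 8% of the time, simulate a "distraction" pause of 8-20s (uniform).
  if (rng() < 0.08) {
    return 8000 + rng() * 12000;
  }

  // Box-Muller transform: turn two uniform samples into one standard normal.
  const u1 = 1 - rng(); // in (0, 1], avoids log(0)
  const u2 = rng();
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);

  // Exponentiate to get a log-normal sample, then clamp to the 2-7s base range.
  const mu = Math.log(3500); // assumed median of 3.5s
  const sigma = 0.35;        // assumed spread
  const ms = Math.exp(mu + sigma * z);
  return Math.min(7000, Math.max(2000, ms));
}
```

Compared with a uniform delay, a log-normal draw clusters most waits near the median while still producing occasional longer pauses.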

… filtering

Replace the broken DOM-based x_get_likes with a proper scraper using the Likes GraphQL API (cursor-based pagination).

- 50 tweets in ~14s, 200 in ~49s (was capped at ~25 with DOM scraping)
- Rich data via parseTweetResult (text, media, articles, cards, URLs, engagement)
- JSONL output to ~/.xactions/exports/; progress survives crashes
- from/to timestamp filters with early exit on reverse chronological data
- Removes x_get_likes from xeepyTools, routes through local-tools.js

Depends on: nirholas#17 (shared helpers: parseTweetResult, checkAuth, randomDelay)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

vercel · 2026-04-06T01:47:24Z

@nj-io is attempting to deploy a commit to the kaivocmenirehtacgmailcom's projects Team on Vercel.

A member of the Team first needs to authorize it.

nj-io · 2026-04-07T05:56:21Z

Superseded — resubmitting as clean PRs from current codebase.

nj-io and others added 3 commits April 5, 2026 09:01

nj-io requested a review from nirholas as a code owner April 6, 2026 01:47

nj-io closed this Apr 7, 2026

nj-io wants to merge 3 commits into nirholas:main from nj-io:feat/graphql-likes
