feat: GraphQL-based scrapeThread, scrapePost, x_read_post, and shared infrastructure #23
Open
nj-io wants to merge 1 commit into nirholas:main
Rewrites scrapeThread and adds scrapePost using X's TweetDetail GraphQL
API instead of DOM scraping. Introduces shared infrastructure for all
GraphQL-based scrapers.
New tools:
- x_read_post: read any tweet with full rich data, recursive QT resolution
Shared helpers:
- fetchTweetDetail: GraphQL API caller with retry/backoff on rate limits
- parseTweetResult: rich data extraction (text, media, articles, cards,
URLs, engagement)
- parseThreadFromEntries: self-reply thread chain detection
- checkAuth: post-navigation auth guard
- randomDelay: log-normal distribution with distraction spikes
- newTab: per-call tab isolation (shared browser, separate pages)
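The retry/backoff behaviour listed for fetchTweetDetail can be pictured with a minimal sketch like the one below. It is not the PR's code: `withRateLimitRetry`, its option names, and the use of HTTP 429 as the rate-limit signal are assumptions made for illustration, and the API call itself is injected as a plain function.

```javascript
// Hypothetical sketch: retry an async call with exponential backoff when
// the response signals a rate limit. The real fetchTweetDetail may differ.
async function withRateLimitRetry(callApi, options = {}) {
  const { maxAttempts = 4, baseDelayMs = 1000, sleep = defaultSleep } = options;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await callApi();
    // Assumption: rate limits surface as HTTP 429 (Too Many Requests).
    if (response.status !== 429) return response;
    if (attempt === maxAttempts) break;
    // Exponential backoff: base, 2x base, 4x base, ...
    await sleep(baseDelayMs * 2 ** (attempt - 1));
  }
  throw new Error(`Rate limited after ${maxAttempts} attempts`);
}

// Promise-based sleep used when no custom `sleep` is injected.
function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

Injecting `sleep` keeps the helper testable without real waiting; a caller would typically pass only `callApi` and accept the defaults.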
scrapeThread rewrite:
- Uses GraphQL API instead of DOM scraping
- Gets full_text (no truncation), note_tweet support
- screen_name from user.core (X moved it from user.legacy)
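To make the field changes above concrete, here is a hedged sketch of reading the author handle and untruncated text from a TweetDetail-style result. The function name `extractAuthorAndText` and the exact object paths are assumptions about the response shape, not the PR's parser; the fallback to user.legacy is added here for illustration only.

```javascript
// Hypothetical sketch: pull author and text out of a TweetDetail-style
// result object. All property paths below are assumed, not confirmed.
function extractAuthorAndText(result) {
  const user = result?.core?.user_results?.result ?? {};
  // Prefer the newer location (user.core), fall back to the older one.
  const screenName = user.core?.screen_name ?? user.legacy?.screen_name ?? null;
  // Long posts are assumed to carry their body in note_tweet;
  // otherwise use legacy.full_text.
  const noteText = result?.note_tweet?.note_tweet_results?.result?.text;
  const text = noteText ?? result?.legacy?.full_text ?? "";
  return { screenName, text };
}
```

Optional chaining keeps the extractor from throwing when a field is missing, so a schema move shows up as a `null` handle rather than a crash.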
scrapePost:
- Handles single posts and threads
- Recursive quote tweet resolution (up to 5 levels)
- Each tweet: text, media, articles, cards, external URLs, engagement
- Error surfacing: returns { thread: [], error: "..." } on failure
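The depth-limited quote resolution and the error-surfacing return shape might look roughly like the sketch below. `resolveQuotes`, `readPost`, the `fetchTweet` loader, and the `{ id, text, quotedId }` record are hypothetical names chosen for the example; only the 5-level limit and the `{ thread: [], error }` shape come from the description above.

```javascript
const MAX_QUOTE_DEPTH = 5;

// `fetchTweet(id)` is a hypothetical async loader that resolves to
// { id, text, quotedId } or throws on failure.
async function resolveQuotes(id, fetchTweet, depth = 0) {
  const tweet = await fetchTweet(id);
  const resolved = { id: tweet.id, text: tweet.text, quoted: null };
  // Follow the quoted tweet only while under the depth limit.
  if (tweet.quotedId != null && depth < MAX_QUOTE_DEPTH) {
    resolved.quoted = await resolveQuotes(tweet.quotedId, fetchTweet, depth + 1);
  }
  return resolved;
}

// Wrapper that reports failures in the return value instead of throwing.
async function readPost(id, fetchTweet) {
  try {
    return { thread: [await resolveQuotes(id, fetchTweet)] };
  } catch (err) {
    return { thread: [], error: err.message };
  }
}
```

Returning the error as data lets a tool caller distinguish "empty thread" from "request failed" without wrapping every call in its own try/catch.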
Multi-tab isolation:
- x_read_post and x_get_thread each create their own browser tab
- Tabs share cookies/auth, don't conflict on concurrent calls
- 60s default timeout per tab
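One way to picture the per-call tab isolation is a wrapper that opens a tab, applies the timeout, runs the work, and always closes the tab. This is a sketch under assumptions: `withTab` is a hypothetical name, and the `browser` object is assumed to be Puppeteer-style (`newPage()`, `page.setDefaultTimeout()`, `page.close()`); the PR's newTab helper may be structured differently.

```javascript
// Hypothetical sketch: run one tool call in its own tab and always close it.
// Assumes a Puppeteer-style browser; pages from the same browser share
// cookies, so auth carries over to each new tab.
async function withTab(browser, task, timeoutMs = 60_000) {
  const page = await browser.newPage();
  page.setDefaultTimeout(timeoutMs); // 60s default, as described above
  try {
    return await task(page);
  } finally {
    // Close the tab even when the task throws.
    await page.close();
  }
}
```

Because each call owns its page, two concurrent calls never navigate the same tab out from under each other.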
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
nj-io added a commit to nj-io/XActions that referenced this pull request on Apr 7, 2026
Two new tools for scraping and deeply reading liked tweets.
x_get_likes (fast GraphQL-based likes index):
- Likes GraphQL API with cursor pagination (50 in 14s, 200 in 49s)
- JSONL output to ~/.xactions/exports/
- from/to timestamp filtering with early exit
- Rich data via parseTweetResult
x_discover_likes (interleaved fetch + deep read):
- Fetches likes via API, deep-reads each via scrapePost
- Human-like pacing: 3-8s between pages, 2-5s before reads, 5-15s after
- Produces two JSONL files: likes index + deep reads
- ~38s per tweet average
Both use multi-tab isolation (newTab) for concurrent safety.
Removes x_get_likes from xeepyTools, deletes old DOM handler.
Depends on: nirholas#23 (shared infrastructure)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
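The pacing ranges in this commit (for example 3-8s between pages) could be drawn from a log-normal distribution with occasional distraction spikes, in the spirit of the randomDelay helper. The sketch below is only an illustration: `pickDelayMs`, `sampleStandardNormal`, and the `sigma`, `spikeChance`, and `spikeFactor` values are assumptions, not parameters taken from the PR.

```javascript
// Standard normal sample via the Box-Muller transform.
function sampleStandardNormal(random = Math.random) {
  const u1 = 1 - random(); // in (0, 1], avoids log(0)
  const u2 = random();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Hypothetical delay picker: log-normal body clamped to [minMs, maxMs],
// with an occasional longer "distraction" pause.
function pickDelayMs(minMs, maxMs, options = {}) {
  const { sigma = 0.4, spikeChance = 0.05, spikeFactor = 3, random = Math.random } = options;
  // Centre the log-normal on the geometric mean of the range.
  const median = Math.sqrt(minMs * maxMs);
  const sample = median * Math.exp(sigma * sampleStandardNormal(random));
  const clamped = Math.min(maxMs, Math.max(minMs, sample));
  // With probability spikeChance, stretch the pause by spikeFactor.
  return random() < spikeChance ? clamped * spikeFactor : clamped;
}
```

A call such as `pickDelayMs(3000, 8000)` would then stand in for the "3-8s between pages" pause, with most values clustering below the midpoint and a small share of much longer waits.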
Summary
Rewrites scrapeThread and adds scrapePost/x_read_post using X's TweetDetail GraphQL API. Introduces shared helpers and multi-tab browser isolation for all GraphQL-based scrapers.
Why GraphQL instead of DOM scraping
- screen_name moved from user.legacy to user.core in X's GraphQL schema
- full_text (no truncation, no "Show more" clicking)
New tool: x_read_post
Give it any tweet URL. Returns:
Shared helpers
- fetchTweetDetail
- parseTweetResult
- parseThreadFromEntries (in_reply_to_status_id_str)
- checkAuth
- randomDelay
- newTab
Multi-tab isolation
Each tool call (x_read_post, x_get_thread) creates its own browser tab. Tabs share cookies/auth (same browser) but don't conflict. Concurrent calls from different Claude Code sessions are safe. Tabs auto-close after the call.
Supersedes
Test plan
- { thread: [], error: "..." } on failure
🤖 Generated with Claude Code