feat: enhanced scrapeLikedTweets with rich data and timestamp filtering #19
Closed
nj-io wants to merge 2 commits into nirholas:main from
Replace the broken xeepy-based x_get_likes handler with a proper scraper in src/scrapers/twitter/index.js, following the same pattern as scrapeBookmarks.

Rich data per tweet:
- text (with "Show more" expansion), author, handle, timestamp, link
- images (attributed to correct author by handle matching)
- quoted tweets (detected via multiple UserAvatar-Container elements)
- X Articles (title, description, cover image via article-cover-image)
- link cards (via card.wrapper)
- engagement stats (replies, retweets, likes, views from role="group")

Timestamp filtering:
- from: only include likes from this date onward, stops scrolling early when older tweets are reached (reverse chronological optimization)
- to: only include likes up to this date, skips newer but keeps scrolling
- Works in conjunction with limit

Bug fixes:
- "Show more" clicks one at a time (X re-renders DOM after each click)
- Auth check after navigation; fails fast on expired cookies
- Scroll-based pagination with deduplication
- Removes x_get_likes from xeepyTools, routes through local-tools.js

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
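The from/to/limit behaviour described in this commit message can be sketched as a pure function over one batch of scraped tweets. This is only an illustration, not the code in the PR: `filterByTimestamp`, its return shape, and the sample data are hypothetical, and it assumes tweets arrive newest-first, as the "reverse chronological optimization" note implies.

```javascript
// Hypothetical helper (not the PR's code): applies from/to/limit to tweets
// that are assumed to arrive newest-first.
function filterByTimestamp(tweets, { from, to, limit = Infinity } = {}) {
  const fromMs = from ? new Date(from).getTime() : -Infinity;
  const toMs = to ? new Date(to).getTime() : Infinity;
  if (Number.isNaN(fromMs) || Number.isNaN(toMs)) {
    throw new RangeError('from/to must be parseable by new Date()');
  }

  const kept = [];
  let reachedOlder = false;

  for (const tweet of tweets) {
    const ts = new Date(tweet.timestamp).getTime();
    if (Number.isNaN(ts)) continue; // no usable timestamp: skip this tweet
    if (ts > toMs) continue;        // newer than `to`: skip, but keep going
    if (ts < fromMs) {              // older than `from`: nothing later can match
      reachedOlder = true;
      break;
    }
    kept.push(tweet);
    if (kept.length >= limit) break;
  }

  // `done` tells the caller it can stop scrolling.
  return { kept, done: reachedOlder || kept.length >= limit };
}

// Sample data, newest first (made up for illustration).
const sample = [
  { link: '/a/status/3', timestamp: '2026-03-10T12:00:00.000Z' },
  { link: '/a/status/2', timestamp: '2026-03-05T12:00:00.000Z' },
  { link: '/a/status/1', timestamp: '2026-02-20T12:00:00.000Z' },
];
const result = filterByTimestamp(sample, { from: '2026-03-01', to: '2026-03-07' });
console.log(result.kept.map((t) => t.link), result.done);
```

One detail worth keeping in mind when passing dates: a date-only ISO string such as "2026-03-01" is parsed as UTC midnight, while parsing of other string formats is implementation-defined, so the boundary may shift by the local UTC offset depending on the format used.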
@nj-io is attempting to deploy a commit to the kaivocmenirehtacgmailcom's projects Team on Vercel. A member of the Team first needs to authorize it.
- Expand viewport to 2400px height before scrolling (default 800px only fits ~1 tweet, causing X's virtualization to never render more)
- Restore viewport to 800px after scraping
- Wait for initial tweet selector before entering scroll loop
- Scroll by window.innerHeight instead of fixed 1200px
- MutationObserver-based wait for DOM changes after each scroll
- Progressive backoff on empty scrolls (2-4s base + 1-1.5s per miss)
- Increase empty scroll tolerance from 5 to 8

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
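The progressive backoff in this commit could look roughly like the sketch below. It is one reading of "2-4s base + 1-1.5s per miss", not the PR's implementation: `backoffDelayMs`, `shouldStop`, and the injectable `random` parameter are hypothetical names introduced here so the arithmetic can be checked deterministically.

```javascript
// Tolerance named in the commit message (raised from 5 to 8).
const MAX_EMPTY_SCROLLS = 8;

// Hypothetical helper: wait time before the next scroll, given how many
// consecutive scrolls produced no new tweets. `random` is injectable for tests.
function backoffDelayMs(emptyScrolls, random = Math.random) {
  const base = 2000 + random() * 2000;   // 2-4 s base wait
  const perMiss = 1000 + random() * 500; // 1-1.5 s added per consecutive miss
  return Math.round(base + emptyScrolls * perMiss);
}

// Hypothetical helper: give up once the tolerance is reached.
function shouldStop(emptyScrolls) {
  return emptyScrolls >= MAX_EMPTY_SCROLLS;
}

// With the random source pinned to 0, three misses give 2000 + 3 * 1000 ms.
console.log(backoffDelayMs(3, () => 0)); // 5000
```

Under these assumptions the wait after the seventh consecutive miss falls somewhere between 9 and 14.5 seconds, so a slow-loading page gets progressively more time before the scraper gives up.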
Author: Superseded. Rewrote to use the GraphQL API instead of DOM scraping. See new PR.
Summary
Replaces the broken xeepy-based x_get_likes handler with a proper scrapeLikedTweets() scraper, following the same pattern as scrapeBookmarks(). Supersedes #13 with timestamp filtering and auth checks added.

Rich data per tweet
- text: with "Show more" expansion
- author, handle: User-Name + first a[href]
- timestamp, link: time[datetime], first /status/ link
- images: a[href*="/photo/"], attributed to correct author by handle matching
- quotedTweet: detected via multiple UserAvatar-Container-* elements
- article: article-cover-image + nextElementSibling for title/description
- card: card.wrapper for link previews
- replies, retweets, likes, views: role="group" aria-label

Timestamp filtering (new)
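- from: only include likes from this date onward; stops scrolling early when older tweets are reached
- to: only include likes up to this date; skips newer tweets but keeps scrolling
- limit: works in conjunction with from/to; caps total results

Accepts any format new Date() understands: "2026-03-01", "March 1, 2026", ISO timestamps, etc.

Architecture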
- scrapeLikedTweets() in src/scrapers/twitter/index.js
- src/scrapers/index.js
- x_get_likes() in src/mcp/local-tools.js
- x_get_likes removed from the xeepyTools array, old executeXeepyTool handler deleted

Bug fixes
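Among the bug fixes named in the first commit message is scroll-based pagination with deduplication. A minimal sketch of that idea follows; `mergeBatch` is a hypothetical helper, and keying on each tweet's link is an assumption about what uniquely identifies a tweet, not something the PR states.

```javascript
// Hypothetical helper: merges one scroll batch into the running result set.
// Tweets are keyed on `link`, which is assumed to be unique per tweet.
function mergeBatch(seen, batch) {
  let added = 0;
  for (const tweet of batch) {
    if (!tweet.link || seen.has(tweet.link)) continue; // duplicate or unusable
    seen.set(tweet.link, tweet);
    added += 1;
  }
  return added; // 0 means this scroll surfaced nothing new
}

// A virtualized timeline re-renders rows, so consecutive batches overlap.
const seen = new Map();
const firstAdded = mergeBatch(seen, [{ link: '/a/status/1' }, { link: '/b/status/2' }]);
const secondAdded = mergeBatch(seen, [{ link: '/b/status/2' }, { link: '/c/status/3' }]);
console.log(firstAdded, secondAdded, seen.size); // 2 1 3
```

A return value of 0 is a natural signal for counting an "empty scroll", and because a Map preserves insertion order, the results stay in the order they were first seen.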
Relation to other PRs
- scrapeLikedTweets portion of "Encrypted DM reader, batch profiles, liked tweets, more human-like delays" #7. More comprehensive (rich data, timestamp filtering, auth check).

Test plan
- x_get_likes returns rich data with quote tweets, articles, cards, engagement stats
- from param stops scrolling early when passing the target date
- to param skips newer tweets but keeps scrolling
- limit + from work together

🤖 Generated with Claude Code