feat: enhanced scrapeLikedTweets with rich data and timestamp filtering #19

Closed
nj-io wants to merge 2 commits into nirholas:main from nj-io:feat/enhanced-liked-tweets-v2

nj-io commented Apr 5, 2026

Summary

Replaces the broken xeepy-based x_get_likes handler with a proper scrapeLikedTweets() scraper, following the same pattern as scrapeBookmarks(). Supersedes #13 with timestamp filtering and auth checks added.

Rich data per tweet

| Field | Source |
| --- | --- |
| text | Full text with "Show more" expansion |
| author, handle | User-Name + first a[href] |
| timestamp, link | time[datetime], first /status/ link |
| images | a[href*="/photo/"] attributed to correct author by handle matching |
| quotedTweet | Detected via multiple UserAvatar-Container-* elements |
| article | article-cover-image + nextElementSibling for title/description |
| card | card.wrapper for link previews |
| replies, retweets, likes, views | Parsed from role="group" aria-label |
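
As an illustration of the last row, here is a minimal sketch of turning a role="group" aria-label into engagement counts. The label wording used here ("12 replies, 3 reposts, ...") is an assumption about what X renders, and `parseEngagement` is a hypothetical name rather than the helper in this PR.

```javascript
// Hypothetical helper: parses an aria-label such as
// "12 replies, 3 reposts, 1,045 likes, 20 bookmarks, 98,765 views".
// The exact label wording is an assumption and may differ on the live site.
function parseEngagement(ariaLabel) {
  const stats = { replies: 0, retweets: 0, likes: 0, views: 0 };

  // Map singular/plural label words to the output field names.
  const keys = {
    reply: 'replies', replies: 'replies',
    repost: 'retweets', reposts: 'retweets',
    retweet: 'retweets', retweets: 'retweets',
    like: 'likes', likes: 'likes',
    view: 'views', views: 'views',
  };

  // A count must start with a digit and may contain thousands separators.
  const pattern = /(\d[\d,]*)\s+([a-z]+)/gi;
  for (const [, count, word] of (ariaLabel || '').matchAll(pattern)) {
    const key = keys[word.toLowerCase()];
    if (key) stats[key] = Number(count.replace(/,/g, ''));
  }
  return stats;
}
```

Unrecognized words (for example a bookmarks count) are ignored, and a missing or empty label yields all zeros rather than throwing.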

Timestamp filtering (new)

| Param | Effect |
| --- | --- |
| from | Only include likes from this date onward. Stops scrolling early when older tweets are reached (reverse chronological optimization). |
| to | Only include likes up to this date. Skips newer tweets but keeps scrolling to reach the target window. |
| limit | Works in conjunction with from/to; caps total results. |

Accepts any format new Date() understands: "2026-03-01", "March 1, 2026", ISO timestamps, etc.
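
The filtering rules above can be sketched roughly as follows. This is an illustration only: the function names are hypothetical, and it assumes the scraped tweets arrive newest-first, which is what the early-stop optimization relies on.

```javascript
// Hypothetical helpers; names are illustrative and not taken from this PR.

// Parse an optional date parameter, failing loudly on unparseable input.
function parseDateParam(value, name) {
  if (value === undefined || value === null) return null;
  const date = new Date(value);
  if (Number.isNaN(date.getTime())) {
    throw new Error(`Invalid "${name}" date: ${value}`);
  }
  return date;
}

// Decide what to do with one tweet, given the parsed window bounds.
function classifyTweet(timestamp, fromDate, toDate) {
  const time = new Date(timestamp);
  if (toDate && time > toDate) return 'skip';     // newer than the window: keep scrolling
  if (fromDate && time < fromDate) return 'stop'; // older than the window: stop early
  return 'include';
}

// Apply from/to/limit to a newest-first list of tweets.
function filterLikes(tweets, { from, to, limit = Infinity } = {}) {
  const fromDate = parseDateParam(from, 'from');
  const toDate = parseDateParam(to, 'to');
  const results = [];
  for (const tweet of tweets) {
    if (results.length >= limit) break;
    const verdict = classifyTweet(tweet.timestamp, fromDate, toDate);
    if (verdict === 'stop') break;
    if (verdict === 'include') results.push(tweet);
  }
  return results;
}
```

In a real scraper the 'stop' verdict would end the scroll loop rather than a plain array iteration, but the decision logic is the same.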

Architecture

  • scrapeLikedTweets() in src/scrapers/twitter/index.js
  • Exported through src/scrapers/index.js
  • Wrapped as x_get_likes() in src/mcp/local-tools.js
  • x_get_likes removed from the xeepyTools array; old executeXeepyTool handler deleted

Bug fixes

  • "Show more" clicks one at a time — X re-renders DOM after each click
  • Auth check after navigation — fails fast on expired cookies
  • Article URL construction — only for direct articles, not quoted tweet articles
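
The first fix can be illustrated with a small loop that re-queries for the next button after every click instead of collecting all buttons up front. This is a sketch under assumptions: `findNextButton` and `waitForRender` are hypothetical callbacks wrapping whatever browser automation calls the scraper uses, and the click cap is an arbitrary safety limit.

```javascript
// Hypothetical sketch: expand "Show more" buttons one at a time.
// findNextButton(): resolves to an object with an async click(), or null when none remain.
// waitForRender(): resolves once the page has settled after a click.
async function expandShowMore(findNextButton, waitForRender, maxClicks = 50) {
  let clicks = 0;
  while (clicks < maxClicks) {
    // Re-query on every iteration: handles obtained before a click may be
    // stale because the DOM is re-rendered after each expansion.
    const button = await findNextButton();
    if (!button) break;
    await button.click();
    await waitForRender();
    clicks += 1;
  }
  return clicks;
}
```

Returning the click count makes it easy to log how many tweets were expanded; the cap prevents an endless loop if a button never disappears.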

Relation to other PRs

Test plan

  • x_get_likes returns rich data with quote tweets, articles, cards, engagement stats
  • from param stops scrolling early when passing the target date
  • to param skips newer tweets but keeps scrolling
  • limit + from work together
  • Invalid date string throws clear error
  • Expired cookie throws auth error instead of scraping empty pages
  • "Show more" expansion works for truncated tweets

🤖 Generated with Claude Code

Replace the broken xeepy-based x_get_likes handler with a proper scraper
in src/scrapers/twitter/index.js, following the same pattern as
scrapeBookmarks.

Rich data per tweet:
- text (with "Show more" expansion), author, handle, timestamp, link
- images (attributed to correct author by handle matching)
- quoted tweets (detected via multiple UserAvatar-Container elements)
- X Articles (title, description, cover image via article-cover-image)
- link cards (via card.wrapper)
- engagement stats (replies, retweets, likes, views from role="group")

Timestamp filtering:
- from: only include likes from this date onward, stops scrolling early
  when older tweets are reached (reverse chronological optimization)
- to: only include likes up to this date, skips newer but keeps scrolling
- Works in conjunction with limit

Bug fixes:
- "Show more" clicks one at a time (X re-renders DOM after each click)
- Auth check after navigation — fails fast on expired cookies
- Scroll-based pagination with deduplication
- Removes x_get_likes from xeepyTools, routes through local-tools.js

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
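
A rough sketch of the deduplication step mentioned in the commit message: because consecutive scrolls re-read tweets that are still on screen, each batch is merged into a map keyed by tweet link. Keying on the link and the `mergeBatch` name are assumptions made for illustration, not details confirmed by the commit.

```javascript
// Hypothetical helper: merge one scroll's worth of tweets into `seen`
// (a Map keyed by tweet link) and report how many were new.
function mergeBatch(seen, batch) {
  let added = 0;
  for (const tweet of batch) {
    // Skip entries without a link and anything already collected.
    if (!tweet.link || seen.has(tweet.link)) continue;
    seen.set(tweet.link, tweet);
    added += 1;
  }
  return added;
}
```

A return value of 0 signals an empty scroll, which a scroll loop can count toward its give-up threshold. A Map keeps insertion order, so `[...seen.values()]` yields tweets in the order they were first encountered.
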
nj-io requested a review from nirholas as a code owner on April 5, 2026, 11:01

vercel bot commented Apr 5, 2026

@nj-io is attempting to deploy a commit to the kaivocmenirehtacgmailcom's projects Team on Vercel.

A member of the Team first needs to authorize it.

- Expand viewport to 2400px height before scrolling (default 800px
  only fits ~1 tweet, causing X's virtualization to never render more)
- Restore viewport to 800px after scraping
- Wait for initial tweet selector before entering scroll loop
- Scroll by window.innerHeight instead of fixed 1200px
- MutationObserver-based wait for DOM changes after each scroll
- Progressive backoff on empty scrolls (2-4s base + 1-1.5s per miss)
- Increase empty scroll tolerance from 5 to 8

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
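
One possible reading of the backoff described in this commit, sketched as a pure function: a random 2-4 s base wait plus a random 1-1.5 s for each consecutive empty scroll, giving up after 8 misses. The function names and the exact way the two terms combine are assumptions; the commit message only gives the ranges.

```javascript
// Hypothetical sketch of the progressive backoff on empty scrolls.
// `random` is injectable so the delay can be tested deterministically.
const MAX_EMPTY_SCROLLS = 8; // tolerance stated in the commit message

function emptyScrollDelayMs(misses, random = Math.random) {
  const base = 2000 + random() * 2000;   // 2-4 s base wait
  const perMiss = 1000 + random() * 500; // 1-1.5 s per consecutive empty scroll
  return Math.round(base + misses * perMiss);
}

function shouldGiveUp(misses) {
  return misses >= MAX_EMPTY_SCROLLS;
}
```

With this reading, the wait before the eighth and final retry falls somewhere between 9 s and 14.5 s (7 prior misses), so a stalled feed costs about a minute in total before the scraper stops.
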

nj-io commented Apr 6, 2026

Superseded — rewrote to use GraphQL API instead of DOM scraping. See new PR.
