fix(xiaohongshu+rednote/search): fall back to href-based note cards when section.note-item class is dropped #1507
Merged
jackwener merged 1 commit on May 13, 2026
3736a65 to 37c423c
Pull request overview
This PR hardens the xiaohongshu search DOM extraction logic so it can still detect and extract note cards when XHS drops the legacy section.note-item class (as reported in #1506), and improves title extraction for the new “bare <section>” render shape.
Changes:
- Adds a fallback note-card detection strategy based on `<section>` elements that contain `/search_result/` or `/explore/` links (used in wait, scroll-until, and extraction paths).
- Adds a title fallback that reads the first `<span>` inside the detail link when class-based title selectors don't match.
- Updates extraction loop control flow to work with the new fallback card collection logic.
Comment on lines +22 to +24

    const findNoteCard = () => document.querySelector(
      'section.note-item, section:has(a[href*="/search_result/"]), section:has(a[href*="/explore/"])'
    );
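The `a[href*="..."]` parts of this selector are substring tests on the raw `href` attribute value. A minimal sketch of the same predicate in plain JavaScript, which could be handy for checking hrefs without a DOM; `looksLikeNoteHref` and `NOTE_PATH_MARKERS` are illustrative names and are not part of this PR:

```javascript
// Illustrative helper (assumption, not from the PR): mirrors the two
// a[href*="..."] substring checks used by the fallback selector.
const NOTE_PATH_MARKERS = ['/search_result/', '/explore/'];

function looksLikeNoteHref(href) {
  // Attribute selectors compare against the attribute string as written,
  // so this takes the value of getAttribute('href'), not a resolved URL.
  if (typeof href !== 'string') return false;
  return NOTE_PATH_MARKERS.some((marker) => href.includes(marker));
}
```

Like `[href*=...]`, `String.prototype.includes` is a case-sensitive substring match, so `/user/profile/...` links are rejected while both relative and absolute note links are accepted.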
Comment on lines 185 to +235

    @@ -184,20 +219,29 @@ export function buildSearchExtractJs(webHost) {
         const authorLinkEl = el.querySelector('a.author, a[href*="/user/profile/"]');

         const url = normalizeUrl(detailLinkEl?.getAttribute('href') || '');
    -    if (!url) return;
    +    if (!url) continue;

         const key = url;
    -    if (seen.has(key)) return;
    +    if (seen.has(key)) continue;
         seen.add(key);

         // Fallback title: the new bare-section render keeps the note caption
         // inside the search_result anchor's first span, not in a class-named
         // .title element. Pull from there when the class-based pick is empty.
         let title = cleanText(titleEl?.textContent || '');
         if (!title) {
           const captionSpan = detailLinkEl?.querySelector('span');
           title = cleanText(captionSpan?.textContent || '');
         }
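For reference, the loop shape in this hunk can be reduced to a self-contained sketch. Everything here is an assumption for illustration: `extractRows` is a made-up name, `cleanText` and `normalizeUrl` are simplified stand-ins for the adapter's helpers (whose real bodies are not shown in this PR), and the `.title, .note-title` and detail-link selectors are guesses based on the PR description rather than the file itself.

```javascript
// Hypothetical stand-ins for the adapter's helpers (real implementations not shown here).
const cleanText = (s) => String(s ?? '').replace(/\s+/g, ' ').trim();
const normalizeUrl = (href) => String(href || '').trim();

// Sketch of the extraction loop: `cards` may be an array, a NodeList, or a Set.
function extractRows(cards) {
  const seen = new Set();
  const rows = [];
  for (const el of cards) {
    // Assumed detail-link selector; the real one lives in search.js.
    const detailLinkEl = el.querySelector('a[href*="/search_result/"], a[href*="/explore/"]');
    const url = normalizeUrl(detailLinkEl?.getAttribute('href') || '');
    if (!url) continue;           // skip cards without a note link
    if (seen.has(url)) continue;  // skip duplicate notes
    seen.add(url);

    // Class-based title first, then the first <span> inside the detail link.
    const titleEl = el.querySelector('.title, .note-title');
    let title = cleanText(titleEl?.textContent);
    if (!title) {
      title = cleanText(detailLinkEl?.querySelector('span')?.textContent);
    }
    rows.push({ rank: rows.length + 1, title, url });
  }
  return rows;
}
```

Using `for...of` with `continue` keeps the same skip semantics for any iterable, which matters once the card collection can be something other than a NodeList.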
Comment on lines +110 to +116

    if (classMatches.length > 0) return classMatches;
    const sections = new Set();
    for (const a of document.querySelectorAll('a[href*="/search_result/"], a[href*="/explore/"]')) {
      const section = a.closest('section');
      if (section) sections.add(section);
    }
    return sections;
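A standalone sketch of this class-first, href-fallback collection, written so it can be exercised outside a browser. `collectNoteCards` and its `root` parameter are illustrative names (assumptions, not the adapter's API), and returning an array from both branches is a choice made for this sketch, not something the PR is confirmed to do:

```javascript
// Sketch only: `root` is any object exposing querySelectorAll (e.g. `document`).
function collectNoteCards(root) {
  // Legacy render: cards carry the note-item class.
  const classMatches = Array.from(root.querySelectorAll('section.note-item'));
  if (classMatches.length > 0) return classMatches;

  // Fallback render: bare <section> elements that wrap a note link.
  // The Set de-duplicates sections containing more than one matching anchor.
  const sections = new Set();
  const links = root.querySelectorAll('a[href*="/search_result/"], a[href*="/explore/"]');
  for (const a of links) {
    const section = a.closest('section');
    if (section) sections.add(section);
  }
  // Assumption for this sketch: normalize to an array so callers can use .length.
  return Array.from(sections);
}
```

In a page context this would be called as `collectNoteCards(document)`; passing the root in explicitly is what lets a test substitute a stub.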
37c423c to 8573130
970c2dc to 94876f5
…hen `section.note-item` class is dropped (jackwener#1506)

Issue jackwener#1506 reports `opencli xiaohongshu search` returning `[]` even though the page visibly has results. Trace evidence: xhs ships a render variant where each note card is a bare `<section>` (no `note-item` class), so the three `section.note-item` selectors in this file all match zero elements.

Three call sites in the shared search IIFEs now use the same defensive selector strategy: try the legacy `section.note-item` class first, then fall back to any `<section>` that wraps a `/search_result/...` or `/explore/...` link. The change is in the xiaohongshu file so the rednote adapter (which imports `buildSearchExtractJs` and `buildScrollUntilJs` from here) picks it up automatically.

Extraction-side title selector also gets a fallback: when no `.title` / `.note-title` element matches, read the first `<span>` inside the search-result link, which is where the bare-section render puts the caption per the trace.

## Verification

`npx vitest run --project adapter clis/xiaohongshu/`: 105/105 green (existing test suite unchanged, passes on both legacy and fallback paths).

Live verify on rednote (same code path, account-safe):

```
$ opencli rednote search "美食" --limit 3 -f json
[ {rank:1, title:"在朋友家吃过一次..."}, {rank:2, title:"我的15💰晚餐..."}, {rank:3, title:"干净饮食🫛..."} ]
```

Legacy `section.note-item` path is exercised here (rednote still renders the class) and returns identical row shape to before the fix, confirming no regression on the working path.

Live verify on xiaohongshu cannot be performed here (no logged-in xhs session on the test machine; xhs account-ban risk per the project's operational guidance). The fix is structural: the new `<section>` shape the issue reporter traced is reachable through the fallback, and the existing test fixture keeps the legacy path green.

`npx tsc --noEmit` clean. `npm run build` 815 manifest entries unchanged shape. `silent-column-drop` / `typed-error-lint` baselines unchanged.

Closes jackwener#1506
Refs jackwener#1500
94876f5 to 36b6f29
Owner
Deep review complete, merging ✅.
First-principles review
Nitpick (does not block merge): two copies of …
Merge: squash →
Description
Issue #1506 reports `opencli xiaohongshu search` returning `[]` even though the search results page visibly has notes. Trace evidence in the issue: xhs ships a render variant where each note card is a bare `<section>` with no `note-item` class, so the three `section.note-item` selectors in `clis/xiaohongshu/search.js` all match zero elements.

The three call sites in the shared search IIFEs (`WAIT_FOR_CONTENT_JS`, `buildScrollUntilJs`, `buildSearchExtractJs`) now use the same defensive strategy: try the legacy `section.note-item` class first, then fall back to any `<section>` that wraps a `/search_result/...` or `/explore/...` link. The change is in the xiaohongshu file so the rednote adapter (which imports `buildSearchExtractJs` and `buildScrollUntilJs` from here) picks it up automatically.

Extraction-side title selector also gains a fallback: when no `.title` / `.note-title` element matches, read the first `<span>` inside the search-result link, which is where the bare-section render puts the caption per the trace.

Related issue: #1506. Also incidentally addresses the `EMPTY_RESULT` symptom reported in #1500 (autofix's "page.evaluate envelope" diagnosis is incorrect; rednote uses the same code path and returns proper arrays without unwrapping).

Type of Change
Checklist
Documentation (if adding/modifying an adapter)
- `docs/adapters/` (if new adapter)
- `docs/adapters/index.md` table (if new adapter)
- `docs/.vitepress/config.mts` (if new adapter)
- `README.md` / `README.zh-CN.md` when command discoverability changed
- `CliError` subclasses instead of raw `Error`

(Selector logic only. Command surface and error contract unchanged.)
Screenshots / Output
`npx vitest run --project adapter clis/xiaohongshu/`: 105/105 green (existing test suite unchanged, passes on both legacy and fallback paths).

Live verify on rednote (same imported code path, account-safe to test):

Legacy `section.note-item` path is exercised here (rednote still renders the class). The fallback is dormant on rednote but selectable through the same `:has()` query if rednote's DOM follows xhs's lead.

Live verify on xiaohongshu was not performed: the test machine has no logged-in xhs session, and the project's operational guidance flags xhs as account-ban-sensitive. The fix is structural; the bare `<section>` shape the issue reporter traced is reachable through the new fallback, and the existing tests keep the legacy path green.

`npx tsc --noEmit` clean. `npm run build` 815 manifest entries, unchanged shape. `silent-column-drop` / `typed-error-lint` baselines unchanged.

Closes #1506
Refs #1500