Skip to content

fix(xiaohongshu+rednote/search): fall back to href-based note cards when section.note-item class is dropped #1507

Merged: jackwener merged 1 commit into jackwener:main from Benjamin-eecs:fix/xhs-search-selector-fallback on May 13, 2026.

@Benjamin-eecs (Contributor)

Description

Issue #1506 reports opencli xiaohongshu search returning [] even though the search results page visibly has notes. Trace evidence in the issue: xhs ships a render variant where each note card is a bare <section> with no note-item class, so the three section.note-item selectors in clis/xiaohongshu/search.js all match zero elements.

The three call sites in the shared search IIFEs (WAIT_FOR_CONTENT_JS, buildScrollUntilJs, buildSearchExtractJs) now use the same defensive strategy: try the legacy section.note-item class first, then fall back to any <section> that wraps a /search_result/... or /explore/... link. The change is in the xiaohongshu file so the rednote adapter (which imports buildSearchExtractJs and buildScrollUntilJs from here) picks it up automatically.
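A minimal sketch of that class-first, href-fallback strategy, assuming the card collector can be written as a plain function. `collectNoteCards` is the name the review uses; taking `root` as a parameter (so the sketch runs against a stub instead of a live `document`) and always returning an Array are choices made here, not necessarily the merged code, which reads `document` directly and returns a NodeList or a Set.

```javascript
// Sketch only: `root` is anything with querySelectorAll (e.g. `document`).
// Returns an Array in both branches so callers see one container type.
function collectNoteCards(root) {
  // 1. Legacy render: cards still carry the note-item class.
  const classMatches = root.querySelectorAll('section.note-item');
  if (classMatches.length > 0) return Array.from(classMatches);

  // 2. Bare-section render: start from every note link and climb to the
  //    enclosing <section>. The Set collapses cards holding several links.
  const sections = new Set();
  for (const a of root.querySelectorAll('a[href*="/search_result/"], a[href*="/explore/"]')) {
    const section = a.closest('section');
    if (section) sections.add(section);
  }
  return Array.from(sections);
}
```

Routing both branches through `Array.from` means callers can use `for...of`, `.length` or index access without caring which branch produced the cards.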

The extraction-side title selector also gains a fallback: when no .title / .note-title element matches, it reads the first <span> inside the search-result link, which is where the bare-section render puts the caption according to the trace.
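The two-level title pick can be sketched as follows. `pickTitle` is a hypothetical name, `cleanText` here is a stand-in whitespace normaliser (the adapter's own helper may behave differently), and combining `.title` and `.note-title` into one selector is an assumption about how the class-based lookup is written.

```javascript
// Hypothetical stand-in for the adapter's text normaliser: collapse runs of
// whitespace to one space and trim the ends.
function cleanText(s) {
  return String(s).replace(/\s+/g, ' ').trim();
}

// `card` is a note-card element; `detailLink` is its /search_result/ or
// /explore/ anchor (may be null).
function pickTitle(card, detailLink) {
  // Level 1: class-named title element (legacy render).
  const titleEl = card.querySelector('.title, .note-title');
  let title = cleanText(titleEl?.textContent || '');
  // Level 2: first <span> inside the detail link (bare-section render).
  if (!title) {
    const captionSpan = detailLink?.querySelector('span');
    title = cleanText(captionSpan?.textContent || '');
  }
  return title;
}
```

Because the second lookup is scoped to the detail link, spans elsewhere in the card (author, likes) are never considered.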

Related issue: #1506. Also incidentally addresses the EMPTY_RESULT symptom reported in #1500 (autofix's "page.evaluate envelope" diagnosis is incorrect; rednote uses the same code path and returns proper arrays without unwrapping).

Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 🌐 New site adapter
  • 📝 Documentation
  • ♻️ Refactor
  • 🔧 CI / build / tooling

Checklist

  • I ran the checks relevant to this PR
  • I updated tests or docs if needed
  • I included output or screenshots when useful

Documentation (if adding/modifying an adapter)

  • Added doc page under docs/adapters/ (if new adapter)
  • Updated docs/adapters/index.md table (if new adapter)
  • Updated sidebar in docs/.vitepress/config.mts (if new adapter)
  • Updated README.md / README.zh-CN.md when command discoverability changed
  • Used positional args for the command's primary subject unless a named flag is clearly better
  • Normalized expected adapter failures to CliError subclasses instead of raw Error

(Selector logic only. Command surface and error contract unchanged.)

Screenshots / Output

npx vitest run --project adapter clis/xiaohongshu/: 105/105 green (existing test suite unchanged, passes on both legacy and fallback paths).

Live verify on rednote (same imported code path, account-safe to test):

```
$ opencli rednote search "美食" --limit 3 -f json
[
  { "rank": 1, "title": "在朋友家吃过一次,被惊艳到了!",      "author": "小明教做菜",   "likes": "2344", ... },
  { "rank": 2, "title": "我的15💰晚餐|一碗南瓜焖饭",         "author": "下班吃点啥",   "likes": "2万",  ... },
  { "rank": 3, "title": "干净饮食🫛牛肉炒口蘑豌豆米饭拼盘🍳", "author": "豆豆子",       "likes": "1.2万", ... }
]
```

The legacy section.note-item path is exercised here (rednote still renders the class). The fallback is dormant on rednote, but the same :has() query would pick up bare sections if rednote's DOM follows xhs's lead.

Live verify on xiaohongshu was not performed: the test machine has no logged-in xhs session, and the project's operational guidance flags xhs as account-ban-sensitive. The fix is structural; the bare <section> shape the issue reporter traced is reachable through the new fallback, and the existing tests keep the legacy path green.

npx tsc --noEmit: clean. npm run build: 815 manifest entries, unchanged shape. silent-column-drop / typed-error-lint baselines: unchanged.

Closes #1506
Refs #1500

Copilot AI review requested due to automatic review settings (May 12, 2026 12:42)
Benjamin-eecs force-pushed the fix/xhs-search-selector-fallback branch from 3736a65 to 37c423c (May 12, 2026 12:48)

Copilot AI left a comment

Pull request overview

This PR hardens the xiaohongshu search DOM extraction logic so it can still detect and extract note cards when XHS drops the legacy section.note-item class (as reported in #1506), and improves title extraction for the new “bare <section>” render shape.

Changes:

  • Adds a fallback note-card detection strategy based on <section> elements that contain /search_result/ or /explore/ links (used in wait, scroll-until, and extraction paths).
  • Adds a title fallback that reads the first <span> inside the detail link when class-based title selectors don’t match.
  • Updates extraction loop control flow to work with the new fallback card collection logic.

Comment on lines +22 to +24
```js
const findNoteCard = () => document.querySelector(
  'section.note-item, section:has(a[href*="/search_result/"]), section:has(a[href*="/explore/"])'
);
```
Comment on lines 185 to +235
```diff
@@ -184,20 +219,29 @@ export function buildSearchExtractJs(webHost) {
 const authorLinkEl = el.querySelector('a.author, a[href*="/user/profile/"]');

 const url = normalizeUrl(detailLinkEl?.getAttribute('href') || '');
-if (!url) return;
+if (!url) continue;

 const key = url;
-if (seen.has(key)) return;
+if (seen.has(key)) continue;
 seen.add(key);

+// Fallback title: the new bare-section render keeps the note caption
+// inside the search_result anchor's first span, not in a class-named
+// .title element. Pull from there when the class-based pick is empty.
+let title = cleanText(titleEl?.textContent || '');
+if (!title) {
+  const captionSpan = detailLinkEl?.querySelector('span');
+  title = cleanText(captionSpan?.textContent || '');
+}
```
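Inside a `.forEach` callback, `return` only ends the current callback, so it already behaved like `continue`; the rewrite keeps that skip-this-card behaviour while giving the loop one shape for any iterable container. A sketch of the dedupe loop, assuming plain objects with an `href` field stand in for each card's detail link; `normalizeUrl` (keyed by origin + path against an `https://` base built from the host), `extractRows` and the row shape are hypothetical stand-ins, not the adapter's code.

```javascript
// Hypothetical normaliser: resolves a relative href against the site host
// and keys the note by origin + path (query string dropped).
function normalizeUrl(href, webHost) {
  if (!href) return '';
  try {
    const u = new URL(href, `https://${webHost}`);
    return u.origin + u.pathname;
  } catch {
    return '';
  }
}

// Dedupe loop in the for...of + continue shape this PR switches to.
// `cards` may be an Array, a NodeList or a Set: for...of walks all three.
function extractRows(cards, webHost) {
  const rows = [];
  const seen = new Set();
  for (const card of cards) {
    const url = normalizeUrl(card.href, webHost);
    if (!url) continue;          // no usable note link: skip this card
    if (seen.has(url)) continue; // note already emitted: skip the duplicate
    seen.add(url);
    rows.push({ rank: rows.length + 1, url });
  }
  return rows;
}
```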
Comment on lines +110 to +116
```js
if (classMatches.length > 0) return classMatches;
const sections = new Set();
for (const a of document.querySelectorAll('a[href*="/search_result/"], a[href*="/explore/"]')) {
  const section = a.closest('section');
  if (section) sections.add(section);
}
return sections;
```
Benjamin-eecs force-pushed the fix/xhs-search-selector-fallback branch from 37c423c to 8573130 (May 12, 2026 14:10)
Benjamin-eecs changed the title from "fix(xiaohongshu+rednote/search): fall back to href-based note cards when section.note-item class is dropped (#1506)" to "fix(xiaohongshu+rednote/search): fall back to href-based note cards when section.note-item class is dropped" (May 12, 2026)
Benjamin-eecs force-pushed the fix/xhs-search-selector-fallback branch 2 times, most recently from 970c2dc to 94876f5 (May 13, 2026 09:48)
…hen `section.note-item` class is dropped (jackwener#1506)

Issue jackwener#1506 reports `opencli xiaohongshu search` returning `[]` even though
the page visibly has results. Trace evidence: xhs ships a render variant
where each note card is a bare `<section>` (no `note-item` class), so
the three `section.note-item` selectors in this file all match zero
elements.

Three call sites in the shared search IIFEs now use the same defensive
selector strategy: try the legacy `section.note-item` class first, then
fall back to any `<section>` that wraps a `/search_result/...` or
`/explore/...` link. The change is in the xiaohongshu file so the
rednote adapter (which imports `buildSearchExtractJs` and
`buildScrollUntilJs` from here) picks it up automatically.

The extraction-side title selector also gets a fallback: when no
`.title` / `.note-title` element matches, it reads the first `<span>`
inside the search-result link, which is where the bare-section render
puts the caption per the trace.

## Verification

`npx vitest run --project adapter clis/xiaohongshu/`: 105/105 green
(existing test suite unchanged, passes on both legacy and fallback paths).

Live verify on rednote (same code path, account-safe):

```
$ opencli rednote search "美食" --limit 3 -f json
[ {rank:1, title:"在朋友家吃过一次..."}, {rank:2, title:"我的15💰晚餐..."}, {rank:3, title:"干净饮食🫛..."} ]
```

Legacy `section.note-item` path is exercised here (rednote still renders
the class) and returns identical row shape to before the fix, confirming
no regression on the working path.

Live verify on xiaohongshu cannot be performed here (no logged-in xhs
session on the test machine; xhs account-ban risk per the project's
operational guidance). The fix is structural: the new `<section>` shape
the issue reporter traced is reachable through the fallback, and the
existing test fixture keeps the legacy path green.

`npx tsc --noEmit` clean. `npm run build`: 815 manifest entries, unchanged
shape. `silent-column-drop` / `typed-error-lint` baselines unchanged.

Closes jackwener#1506
Refs jackwener#1500
Benjamin-eecs force-pushed the fix/xhs-search-selector-fallback branch from 94876f5 to 36b6f29 (May 13, 2026 10:02)
jackwener merged commit 2babed8 into jackwener:main (May 13, 2026)
11 checks passed
@jackwener (Owner)

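In-depth review complete; merging ✅.

First-principles review

  • The trigger is well established: issue #1506 ([BUG] xiaohongshu search returns empty array due to outdated DOM selectors) captured xhs dropping the section.note-item class in the wild. A defensive selector fallback is the right fix.
  • Consistent fallback across all three selector layers: WAIT_FOR_CONTENT_JS / buildScrollUntilJs.collectNoteCards / buildSearchExtractJs.collectNoteCards all follow the same rule ("class first → fallback to section:has(a[href*=/search_result/]) / section:has(a[href*=/explore/])"); no call site was missed.
  • .forEach(el => return) → for (const el of ...) { continue }: a necessary rewrite, because collectNoteCards() now returns either a NodeList or a Set. forEach works on a Set, but the early-exit semantics of return differ; switching to for...of + continue unifies the exit semantics across both container types and introduces no silent skip.
  • Title fallback: in the bare <section> render, the note caption sits in the first <span> of the search_result anchor. When titleEl?.textContent || '' is empty, falling back to detailLinkEl?.querySelector('span') is a reasonable second-level fallback; it will not mistake the author or likes for the title (the lookup is pinned to the detail link).
  • el.classList?.contains optional chain: on the fallback path a <section> does not necessarily expose a classList getter, so the ?. guards against a TypeError.
  • No new audit violations: typed-error-lint 189/189, silent-column-drop 103/103. Tests 125/125 (xiaohongshu + rednote).

Nitpick (non-blocking)

The two inline collectNoteCards closures are identical (19 lines × 2). With ≥2 siblings they are a candidate for extraction into utils.js, but the lazy-extract threshold set in #1391 is "extract only when a third sibling appears", so two inline copies are within the acceptable limit. Left as a non-blocking follow-up.

Merged: squash → 2babed84e9539cb5c6f3389858e2d55b378ba102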


Development

Successfully merging this pull request may close these issues.

[BUG] xiaohongshu search returns empty array due to outdated DOM selectors

3 participants