# fix(meta-tags): add workaround for Preflight-Audit #1216
## Conversation
…sageGroupId to sendMessage in sqs.js
…comment for loading scrape result paths
…/spacecat-shared-scrape-client to 2.1.0
(conflicts: package-lock.json, package.json, src/metatags/handler.js, test/audits/metatags.test.js, test/common/async-job-runner.test.js)
…at-implement-scrapeClient-stepped-audits (conflicts: package-lock.json, package.json)
…at-implement-scrapeClient-stepped-audits
This PR will trigger a minor release when merged.
…at-implement-scrapeClient-stepped-audits
## [1.186.1](v1.186.0...v1.186.1) (2025-09-23)

### Bug Fixes

* **meta-tags:** add workaround for Preflight-Audit ([#1216](#1216)) ([8918323](8918323))
🎉 This PR is included in version 1.186.1 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
```js
// Transform URLs into scrape.json paths and combine them into a Set
const topPagePaths = topPages.map((page) => getScrapeJsonPath(page.url, siteId));
const includedUrlPaths = includedURLs.map((url) => getScrapeJsonPath(url, siteId));
```
Since this code is now removed, how are we incorporating the includedUrls in the meta-tags audit?
We use all URLs that were successfully scraped. In the submitForScraping step, the included URLs are sent together with the top pages to the scraper (see here). Only the URLs that were submitted by this audit are returned (no old scrapes or other pages that were not submitted).
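For illustration, a minimal, self-contained sketch of that merging step, using made-up data and a hypothetical `buildUrlsToScrape` helper (not the actual handler):

```js
// Sketch only: combine the configured included URLs with the top pages before the
// list is handed to the scraper, so only these URLs come back as scrape results.
function buildUrlsToScrape(topPages, includedURLs) {
  // De-duplicate so a URL that is both a top page and an included URL is scraped once
  const urls = new Set([...topPages.map((page) => page.url), ...includedURLs]);
  return [...urls].map((url) => ({ url }));
}

// Example with made-up data
const topPages = [{ url: 'https://example.com/' }, { url: 'https://example.com/products' }];
const includedURLs = ['https://example.com/landing', 'https://example.com/products'];
console.log(buildUrlsToScrape(topPages, includedURLs));
// -> [ { url: 'https://example.com/' },
//      { url: 'https://example.com/products' },
//      { url: 'https://example.com/landing' } ]
```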
```js
const includedURLs = await site?.getConfig()?.getIncludedURLs('meta-tags') || [];

// Transform URLs into scrape.json paths and combine them into a Set
const topPagePaths = topPages.map((page) => getScrapeJsonPath(page.url, siteId));
```
Since this code is now removed, will the scrapeResultPaths coming from the context ensure that it only contains the latest top-pages scrapes, and no older scrapes (from previous top-pages imports)?
scrapeResultPaths will only contain URLs that were originally submitted for scraping by this audit. The new scrapeClient workflow will ensure that.
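For illustration only, a sketch of consuming those results; the exact shape of scrapeResultPaths on the context (here assumed to be a Map of page URL to scrape.json object key) is an assumption based on this thread, not a confirmed API:

```js
// Sketch: iterate the scrape results provided on the audit context.
function listScrapesToProcess(context) {
  const { scrapeResultPaths, log } = context;
  const paths = [];
  for (const [url, objectKey] of scrapeResultPaths) {
    // Every entry corresponds to a URL this audit submitted in submitForScraping,
    // so stale scrapes from earlier top-pages imports never show up here.
    log.info(`Found scrape result for ${url} at ${objectKey}`);
    paths.push(objectKey);
  }
  return paths;
}

// Example with a stubbed context
const context = {
  scrapeResultPaths: new Map([
    ['https://example.com/', 'scrapes/site-id/scrape.json'],
  ]),
  log: console,
};
console.log(listScrapesToProcess(context));
```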
```js
const pageUrl = object.finalUrl ? new URL(object.finalUrl).pathname
  : new URL(url).pathname;
// handling for homepage
if (pageUrl === '') {
```
This check was added because the home page had no path (it was empty). Will the page URL computed above be '/' for the home page now?
`new URL(url).pathname` will resolve to `'/'` for the home page.
Note: as soon as all audits are updated to use the ScrapeClient, the preflight audit can be updated too and the workarounds can be removed.
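For reference, a quick check of that behaviour (runnable in Node or a browser console):

```js
// The WHATWG URL API always returns at least '/' as the pathname
console.log(new URL('https://example.com').pathname);       // '/'
console.log(new URL('https://example.com/about').pathname); // '/about'

// So with the finalUrl/url fallback from the snippet above, the home page yields '/'
// rather than '', and the pageUrl === '' workaround no longer triggers.
```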