Summary
analyze-webpage.js uses a hardcoded Chromium executable path and a fragile browser
configuration that can be blocked by sites that detect headless browsers (e.g. HTTP/2
rejection, bot fingerprinting).
Changes needed
- Replace hardcoded executablePath (/ms-playwright/chromium-1208/chrome-linux/chrome) with dynamic Chrome-first auto-detection: try channel: 'chrome' first (real TLS fingerprint), fall back to bundled Chromium silently. The hardcoded path breaks if the Chromium version changes and doesn't work outside Docker.
- Add --disable-http2 to browser args to prevent ERR_HTTP2_PROTOCOL_ERROR from servers that reject HTTP/2 connections from headless browsers.
- Switch navigation to domcontentloaded (60s timeout + 5s settle) instead of networkidle. Many sites never reach networkidle and the script times out unnecessarily.
- Add explicit timeout: 60000 to the screenshot call to prevent failures on large pages.
- Align browser context config with run-bulk-import.js: Chrome 131 UA, realistic sec-ch-ua headers, locale, timezone, ignoreHTTPSErrors.
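
The changes listed above could look roughly like the following. This is a minimal sketch, not the actual patch: the helper names (launchBrowser, capturePage) and the CONTEXT_OPTIONS constant are hypothetical, and the user agent, sec-ch-ua, locale, and timezone values are illustrative placeholders that should be copied from run-bulk-import.js rather than from here. The Playwright chromium object is passed in as a parameter so the helpers stay easy to test.

```javascript
// Sketch only. Helper names and CONTEXT_OPTIONS values are assumptions,
// not taken from analyze-webpage.js or run-bulk-import.js.

const LAUNCH_ARGS = ['--disable-http2']; // avoids ERR_HTTP2_PROTOCOL_ERROR

// Placeholder context settings; the real values should mirror run-bulk-import.js.
const CONTEXT_OPTIONS = {
  userAgent:
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
    '(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
  locale: 'en-US',                 // assumed
  timezoneId: 'America/New_York',  // assumed
  ignoreHTTPSErrors: true,
  extraHTTPHeaders: {
    // assumed header value for a Chrome 131 profile
    'sec-ch-ua': '"Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"',
  },
};

// Try the installed Chrome channel first, then Playwright's bundled Chromium.
// No executablePath is set, so nothing depends on a versioned install path.
async function launchBrowser(chromium) {
  const candidates = [
    { channel: 'chrome', headless: true, args: LAUNCH_ARGS },
    { headless: true, args: LAUNCH_ARGS }, // bundled Chromium
  ];
  let lastError;
  for (const options of candidates) {
    try {
      return await chromium.launch(options);
    } catch (err) {
      lastError = err; // fall through silently to the next candidate
    }
  }
  throw lastError;
}

// Navigate on domcontentloaded, wait a fixed settle period, then screenshot
// with an explicit timeout.
async function capturePage(page, url, screenshotPath) {
  await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 60000 });
  await page.waitForTimeout(5000); // 5s settle
  await page.screenshot({ path: screenshotPath, timeout: 60000 });
}

// Possible usage (requires the playwright package):
//   const { chromium } = require('playwright');
//   const browser = await launchBrowser(chromium);
//   const context = await browser.newContext(CONTEXT_OPTIONS);
//   const page = await context.newPage();
//   await capturePage(page, 'https://example.com', 'page.png');
//   await browser.close();
```

Ordering the launch candidates in an array keeps the fallback silent while still surfacing the last error if neither Chrome nor the bundled Chromium can start.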