Releases: psarno/fetchmd
v0.2.0 - Initial Release
Release Notes - v0.2.0
Overview
fetchmd is a command-line utility that fetches web content and returns clean Markdown for developers and LLM agents. It needs no API keys, and most pages require no browser automation setup.
Features
- Two-stage extraction pipeline - Defuddle for static content (blogs, docs, reference pages), optional Playwright rendering for JavaScript-heavy SPAs
- Configurable thresholds - Set minimum content length and maximum output size to suit your needs
- Context-aware output - Truncates at paragraph boundaries and appends markers so downstream consumers (LLM agents) know when content was cut
- Stage visibility - Optional --stage flag outputs which extraction method succeeded, useful for debugging
- LLM-friendly - No server, no protocol, no setup. Any shell-based agent can call it directly after install
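The paragraph-aligned truncation listed above can be illustrated with a minimal Python sketch. This is not fetchmd's actual implementation: the function name `truncate_at_paragraph`, the use of a blank line as the paragraph boundary, and the marker text are all assumptions made for illustration.

```python
# Hypothetical marker text; the real marker fetchmd appends is not documented here.
TRUNCATION_MARKER = "\n\n[content truncated]"


def truncate_at_paragraph(markdown: str, max_chars: int) -> str:
    """Cut the body to at most max_chars, preferring a paragraph boundary."""
    # A limit of 0 disables truncation, mirroring the documented "0 to disable".
    if max_chars <= 0 or len(markdown) <= max_chars:
        return markdown

    head = markdown[:max_chars]
    # Assumption: paragraphs are separated by a blank line.
    cut = head.rfind("\n\n")
    if cut > 0:
        # Drop the partial paragraph so the output ends on a complete one.
        head = head[:cut]

    # The marker tells a downstream consumer that content was cut.
    return head.rstrip() + TRUNCATION_MARKER
```

Cutting at the last blank line inside the limit means a consumer never sees a paragraph that stops mid-sentence, at the cost of returning somewhat less than the limit allows.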
Technical Details
- Core extraction: Defuddle 0.15.0 (standardizes Markdown, handles code blocks, tables, footnotes)
- Optional JavaScript rendering: Playwright (headless Chromium, network-idle wait strategy, 20s timeout per page)
- HTTP handling: 15-second timeout, Mozilla user agent, proper error reporting to stderr
- Node.js requirement: 18.0.0 or higher
- Zero external credentials: Works on any network, no API keys or tokens needed
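One plausible reading of how the two stages and the minimum-length threshold fit together is sketched below in Python. The control flow is an assumption, not a description of fetchmd's source: `extract_with_fallback` and the two extractor callables are hypothetical stand-ins for Defuddle and Playwright, and only the stage names and the default of 200 characters come from these notes.

```python
from typing import Callable, Optional, Tuple

# A stage takes a URL and returns Markdown, or None if it produced nothing.
Extractor = Callable[[str], Optional[str]]


def extract_with_fallback(
    url: str,
    static_stage: Extractor,
    rendered_stage: Optional[Extractor] = None,
    min_length: int = 200,
) -> Tuple[str, str]:
    """Return (stage_name, markdown) from the first stage that yields enough text."""
    # Stage 1: static extraction, which suits blogs, docs, and reference pages.
    text = static_stage(url)
    if text is not None and len(text) >= min_length:
        return "defuddle", text

    # Stage 2: optional rendering for JavaScript-heavy pages.
    # Passing rendered_stage=None models --no-spa or Playwright not being installed.
    if rendered_stage is not None:
        rendered = rendered_stage(url)
        if rendered is not None and len(rendered) >= min_length:
            return "playwright", rendered

    raise ValueError(f"no stage produced at least {min_length} characters for {url}")
```

Under this reading, a short result from the static stage is treated as a sign that the page needs rendering, which is why the threshold is checked before falling back.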
Command-Line Options
--min-length N   Minimum characters to accept (default: 200)
--max-chars N    Truncate output, paragraph-aligned (default: 50000, 0 to disable)
--no-spa         Skip Playwright even if installed
--stage          Prefix output with extraction stage (defuddle or playwright)
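For an agent that drives the tool from Python, the options above could be assembled into a command line roughly as follows. This is a sketch under stated assumptions: the helper names are hypothetical, and passing the URL as the first positional argument to a `fetchmd` executable is assumed rather than documented in these notes. Only the four flags come from the list above.

```python
import subprocess
from typing import List, Optional


def build_fetchmd_command(
    url: str,
    min_length: Optional[int] = None,
    max_chars: Optional[int] = None,
    no_spa: bool = False,
    stage: bool = False,
) -> List[str]:
    """Build an argument list using only the documented flags."""
    # Assumption: the URL is a positional argument.
    cmd = ["fetchmd", url]
    if min_length is not None:
        cmd += ["--min-length", str(min_length)]
    if max_chars is not None:
        cmd += ["--max-chars", str(max_chars)]
    if no_spa:
        cmd.append("--no-spa")
    if stage:
        cmd.append("--stage")
    return cmd


def run_fetchmd(url: str, **options) -> subprocess.CompletedProcess:
    """Run the command; Markdown is expected on stdout, errors on stderr."""
    # The 60-second limit is an arbitrary choice that leaves headroom over the
    # documented 15-second HTTP and 20-second rendering timeouts.
    return subprocess.run(
        build_fetchmd_command(url, **options),
        capture_output=True,
        text=True,
        timeout=60,
    )
```

Passing the arguments as a list instead of a shell string avoids quoting problems when a URL contains characters such as `&` or `?`.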
Installation
npm install -g @psarno/fetchmd
# For JavaScript-rendered pages (optional):
npm install -g playwright
npx playwright install chromium