Releases: psarno/fetchmd

v0.2.0 - Initial Release

06 Apr 11:52

Release Notes - v0.2.0

Overview

fetchmd is a command-line utility that fetches web content and returns clean Markdown for developers and LLM agents. It requires no API keys, and most pages need no browser automation setup.

Features

  • Two-stage extraction pipeline - Defuddle for static content (blogs, docs, reference pages), optional Playwright rendering for JavaScript-heavy SPAs
  • Configurable thresholds - Set minimum content length and maximum output size to suit your needs
  • Context-aware output - Truncates at paragraph boundaries and appends markers so downstream consumers (LLM agents) know when content was cut
  • Stage visibility - Optional --stage flag outputs which extraction method succeeded, useful for debugging
  • LLM-friendly - No server, no protocol, no setup. Any shell-based agent can call it directly after install
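The paragraph-aligned truncation described above can be illustrated with a minimal sketch. This is not fetchmd's actual code: the function name `truncateAtParagraph`, the marker text, and the choice of a blank line as the paragraph boundary are all assumptions made for illustration.

```typescript
// Hypothetical marker text; the marker fetchmd actually appends may differ.
const TRUNCATION_MARKER = "\n\n[content truncated]";

// Hypothetical helper, not part of fetchmd's API.
// Cuts `markdown` at the last paragraph break that fits within `maxChars`
// and appends a marker so a downstream consumer knows content was cut.
function truncateAtParagraph(markdown: string, maxChars: number): string {
  // A limit of 0 disables truncation, mirroring the documented --max-chars behavior.
  if (maxChars === 0 || markdown.length <= maxChars) {
    return markdown;
  }

  // Assumption: paragraphs are separated by a blank line ("\n\n").
  // lastIndexOf only considers matches starting at or before maxChars.
  const cut = markdown.lastIndexOf("\n\n", maxChars);

  // If no paragraph break fits within the limit, fall back to a hard cut.
  const body = cut > 0 ? markdown.slice(0, cut) : markdown.slice(0, maxChars);
  return body + TRUNCATION_MARKER;
}
```

With the input `"aaaa\n\nbbbb\n\ncccc"` and a limit of 11, this sketch keeps the first two paragraphs and appends the marker, rather than cutting partway through the third.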

Technical Details

  • Core extraction: Defuddle 0.15.0 (standardizes Markdown, handles code blocks, tables, footnotes)
  • Optional JavaScript rendering: Playwright (headless Chromium, network-idle wait strategy, 20s timeout per page)
  • HTTP handling: 15-second timeout, Mozilla user agent, errors reported to stderr
  • Node.js requirement: 18.0.0 or higher
  • Zero external credentials: Works on any network, no API keys or tokens needed

Command-Line Options

--min-length N    Minimum characters to accept (default: 200)
--max-chars N     Truncate output, paragraph-aligned (default: 50000, 0 to disable)
--no-spa          Skip Playwright even if installed
--stage           Prefix output with extraction stage (defuddle or playwright)
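One way to picture how these options could interact with the two-stage pipeline is the sketch below. It assumes, without confirmation from these notes, that the static stage's output is compared against the minimum length and that the rendered stage runs only when that check fails. The names `extractWithFallback`, `Extractor`, and `PipelineOptions` are hypothetical, and the two stages are passed in as plain functions rather than calling Defuddle or Playwright directly.

```typescript
// Hypothetical types for illustration; none of these names come from fetchmd.
type Extractor = (url: string) => Promise<string>;
type Stage = "defuddle" | "playwright";

interface PipelineOptions {
  minLength: number; // corresponds to --min-length (documented default: 200)
  noSpa: boolean;    // corresponds to --no-spa
}

interface StageResult {
  stage: Stage;      // what --stage would report
  markdown: string;
}

// Assumed control flow: try the static stage first, and fall back to the
// rendered stage only if the static output is shorter than minLength.
async function extractWithFallback(
  url: string,
  staticStage: Extractor,
  renderedStage: Extractor,
  opts: PipelineOptions = { minLength: 200, noSpa: false },
): Promise<StageResult> {
  const staticMarkdown = await staticStage(url);
  if (staticMarkdown.length >= opts.minLength) {
    return { stage: "defuddle", markdown: staticMarkdown };
  }

  // With noSpa set, the rendered stage is skipped entirely.
  if (opts.noSpa) {
    throw new Error(`static extraction returned fewer than ${opts.minLength} characters`);
  }

  const renderedMarkdown = await renderedStage(url);
  if (renderedMarkdown.length >= opts.minLength) {
    return { stage: "playwright", markdown: renderedMarkdown };
  }
  throw new Error(`both stages returned fewer than ${opts.minLength} characters`);
}
```

Passing the stages in as functions keeps the sketch independent of any particular library and makes the fallback decision easy to exercise with stubs.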

Installation

npm install -g @psarno/fetchmd

# For JavaScript-rendered pages (optional):
npm install -g playwright
npx playwright install chromium