Add 'percollate md' command (#152)
* Add 'percollate md' command, fixes #93 
* Add markdown preferences with '--md.<option>=<value>' options
danburzo authored Feb 19, 2023
1 parent 630a540 commit b56c712
Showing 10 changed files with 2,403 additions and 118 deletions.
25 changes: 21 additions & 4 deletions README.md
@@ -2,7 +2,7 @@

<a href="https://www.npmjs.org/package/percollate"><img src="https://img.shields.io/npm/v/percollate.svg?style=flat-square&labelColor=324A97&color=black" alt="npm version"></a>

Percollate is a command-line tool that turns web pages into beautifully formatted PDF, EPUB, or HTML files.
Percollate is a command-line tool that turns web pages into beautifully formatted PDF, EPUB, HTML or Markdown files.

<figure style='margin: 1rem 0'>
<img alt="Sample Output" src="./.github/dimensions-of-colour.png">
@@ -60,6 +60,7 @@ The following commands are available:
- `percollate pdf` produces a PDF file;
- `percollate epub` produces an EPUB file;
- `percollate html` produces an HTML file;
- `percollate md` produces a Markdown file.

The operands can be URLs, paths to local files, or the `-` character, which stands for `stdin` (standard input).

@@ -103,7 +104,7 @@ percollate pdf --individual http://example.com/page1 http://example.com/page2

#### `--template`

Path to a custom HTML template. Applies to `pdf` and `html`.
Path to a custom HTML template. Applies to `pdf`, `html`, and `md`.

#### `--style`

@@ -145,11 +146,11 @@ Generate a cover. The option is implicitly enabled when the `--title` option is

Generate a hyperlinked table of contents. The option is implicitly enabled when bundling more than one web page to a single file. Disable this implicit behavior by passing the `--no-toc` flag.

Applies to `pdf` and `html`.
Applies to `pdf`, `html`, and `md`.

#### `--hyphenate`

Hyphenation is enabled by default for `pdf`, and disabled for `epub` and `html`. You can opt into hyphenation with the `--hyphenate` flag, or disable it with the `--no-hyphenate` flag.
Hyphenation is enabled by default for `pdf`, and disabled for `epub`, `html`, and `md`. You can opt into hyphenation with the `--hyphenate` flag, or disable it with the `--no-hyphenate` flag.

See also the [Hyphenation and justification](#hyphenation-and-justification) recipe.

@@ -159,6 +160,20 @@ Embed images inline with the document. Images are fetched and converted to Base6

This option is particularly useful for `html` to produce self-contained HTML files.

#### `--md.<option>=<value>`

Pass options to the underlying Markdown stringifier, [`mdast-util-to-markdown`](https://github.com/syntax-tree/mdast-util-to-markdown#options). These are the default Markdown options:

```js
const DEFAULT_MARKDOWN_OPTIONS = {
fences: true,
emphasis: '_',
strong: '_',
resourceLink: true,
rule: '-'
};
```
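
As a rough illustration, and assuming that values passed via `--md.<option>=<value>` are merged over these defaults (the merge itself is not shown in this README), the effective options could be computed as below. The helper name `resolveMarkdownOptions` is hypothetical; `emphasis` and `bullet` are options documented by `mdast-util-to-markdown`.

```js
// Defaults copied from the block above.
const DEFAULT_MARKDOWN_OPTIONS = {
	fences: true,
	emphasis: '_',
	strong: '_',
	resourceLink: true,
	rule: '-'
};

// Hypothetical helper (an assumption, not percollate's own code):
// user-supplied values take precedence over the defaults.
function resolveMarkdownOptions(userOptions = {}) {
	return { ...DEFAULT_MARKDOWN_OPTIONS, ...userOptions };
}

// Roughly what passing `--md.emphasis=* --md.bullet=+` might amount to.
const resolved = resolveMarkdownOptions({ emphasis: '*', bullet: '+' });
console.log(resolved);
```

Options that are not overridden, such as `fences` and `rule`, keep their default values in this sketch.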

## Recipes

### Basic bundling
@@ -352,6 +367,8 @@ EPUBs have external images fetched and bundled together with the HTML of each ar

HTMLs are saved without any further changes. When the `--inline` option is used, images are converted to `data` URLs and embedded into the HTML. External images are not otherwise fetched.

Markdown files are produced the same way as HTMLs, then processed with a series of utilities from the [unified.js](https://unifiedjs.com/) umbrella.

## Limitations

Percollate inherits the limitations of two of its main components, Readability and Puppeteer (headless Chrome).
162 changes: 93 additions & 69 deletions cli.js
@@ -1,8 +1,8 @@
#!/usr/bin/env node

import { readFileSync } from 'node:fs';
import cliopts from './src/cli-opts.js';
import { pdf, epub, html } from './index.js';
import { cliopts, namespacedOptions } from './src/cli-opts.js';
import { pdf, epub, html, md } from './index.js';

const { command, opts, operands } = cliopts(process.argv.slice(2));

@@ -31,24 +31,41 @@ if (!operands.length) {
operands.push('-');
}

if (opts.markdownOptions) {
console.error(
`Unsupported option 'markdownOptions'. Did you mean '--md.<option>=<value>'?`
);
process.exit(1);
}

switch (command) {
case 'pdf':
if (opts.output === '-') {
console.error(`Output to <stdout> is only supported for HTML.`);
console.error(
`Output to <stdout> is only supported for commands: 'html', 'md'.`
);
process.exit(1);
}
pdf(operands, opts);
break;
case 'epub':
if (opts.output === '-') {
console.error(`Output to <stdout> is only supported for HTML.`);
console.error(
`Output to <stdout> is only supported for commands: 'html', 'md'.`
);
process.exit(1);
}
epub(operands, opts);
break;
case 'html':
html(operands, opts);
break;
case 'md':
md(operands, {
...opts,
markdownOptions: namespacedOptions(opts, 'md')
});
break;
default:
outputHelp(true);
}
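
The `namespacedOptions` helper used in the `md` case is imported from `./src/cli-opts.js`, whose diff is not shown here. The following is only a minimal sketch of what such a helper might do, under the assumption that the parsed options object keeps flat keys such as `md.emphasis`. The function name `namespacedOptionsSketch` and the key layout are hypothetical and may differ from the actual implementation.

```js
// Hypothetical sketch, NOT the code from src/cli-opts.js.
// Assumes `opts` holds flat keys of the form '<namespace>.<option>'.
function namespacedOptionsSketch(opts, namespace) {
	const prefix = `${namespace}.`;
	const result = {};
	for (const [key, value] of Object.entries(opts)) {
		// Keep only keys in the namespace, with the prefix stripped.
		if (key.startsWith(prefix)) {
			result[key.slice(prefix.length)] = value;
		}
	}
	return result;
}

// Illustrative input: one regular option and two namespaced ones.
const sampleOpts = { output: 'out.md', 'md.emphasis': '*', 'md.rule': '*' };
const markdownOptions = namespacedOptionsSketch(sampleOpts, 'md');
console.log(markdownOptions);
```

Under this assumption, only the `md.`-prefixed keys would reach the Markdown stringifier, while options such as `output` stay with the command itself.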
@@ -61,94 +78,101 @@ switch (command) {
function outputHelp(error) {
const helpText = `percollate v${pkg.version}
Usage: percollate <command> [options] url [url]...
Usage:
percollate <command> [options] url [url]...
Commands:
pdf Bundle web pages as a PDF file.
epub Bundle web pages as an EPUB file.
html Bundle web pages as a HTML file.
pdf Bundle web pages as a PDF file.
epub Bundle web pages as an EPUB file.
html Bundle web pages as a HTML file.
md Bundle web pages as a Markdown file.
Common options:
-h, --help Output usage information.
-V, --version Output program version.
--debug Print more detailed information.
-o <output>, Path for the generated bundle.
--output=<path> Use '-' to output to standard output ('stdout').
--template=<path> Path to a custom HTML template.
--style=<path> Path to a custom CSS file.
--css=<style> Additional inline CSS style.
-u, --url=<url> Sets the base URL when HTML is provided on stdin.
Multiple URL options can be specified.
-w, --wait=<sec> Process the provided URLs sequentially,
pausing a number of seconds between items.
-t <title>, The bundle title.
--title=<title>
-a <author>, The bundle author.
--author=<author>
--individual Export each web page as an individual file.
--toc Generate a Table of Contents.
Implicitly enabled when bundling more than one item.
--cover Generate a cover for the PDF / EPUB.
Implicitly enabled when bundling more than one item
or the --title option is provided.
--browser=<browser> One of 'chrome' (default), 'firefox'.
Used for producing PDF and the cover image for EPUB.
--hyphenate Enable hyphenation. Enabled by default for PDF.
--inline Embed images inline with the content.
Fetches and converts images to Base64 'data:' URLs.
Options to disable features:
--no-amp Don't prefer the AMP version of the web page.
--no-toc Don't generate a table of contents.
--no-cover Don't generate a cover.
--no-hyphenate Disable hyphenation.
PDF options:
--no-sandbox Passed to Puppeteer.
--no-sandbox Passed to Puppeteer.
Markdown options:
--md.<option>=<value> Options to pass to the Markdown stringifier,
the 'mdast-util-to-markdown' library.
Operands:
percollate accepts one or more URLs.
Use the hyphen character ('-') to specify
that the HTML should be read from stdin.
Examples:
Single web page to PDF:
percollate pdf --output my.pdf https://example.com
Single web page read from stdin to PDF:
curl https://example.com | percollate pdf -o my.pdf -u https://example.com -
Several web pages to a single PDF:
percollate pdf --output my.pdf https://example.com/1 https://example.com/2
Custom page size and font size:
percollate pdf --output my.pdf --css "@page { size: A3 landscape } html { font-size: 18pt }" https://example.com
`;
if (error) {
console.error(helpText);