Commit b88a02a

feat: support of localized images
settable log levels, more defaults that "just work"
1 parent b4a0a39 commit b88a02a

14 files changed, with 7760 additions and 7513 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -1,5 +1,6 @@
11
dist/
22
sample/
33
sample_img/
4+
i18n/
45
node_modules/
56
version.json

README.md

Lines changed: 23 additions & 5 deletions
@@ -38,21 +38,31 @@ means that the id is "0456aa5842946PRETEND4f37c97a0e5".
3838
Determine where you want the markdown files and images to land. The following works well for Docusaurus instances:
3939

4040
```
41-
npx notion-pull-mdx -n secret_PRETEND123456789PRETEND123456789PRETEND6789 -r 0456aa5842946PRETEND4f37c97a0e5 -m "./docs" -i "./images"
41+
npx notion-pull-mdx -n secret_PRETEND123456789PRETEND123456789PRETEND6789 -r 0456aa5842946PRETEND4f37c97a0e5
4242
```
4343

4444
Likely, you will want to store these codes in your environment variables and then use them like this:
4545

4646
```
4747
(windows)
48-
npx notion-pull-mdx -n %MY_NOTION_TOKEN% -r %MY_NOTION_DOCS_ROOT_PAGE_ID% -m "./docs" -i "./static/notion_images" -p "/notion_images/"
48+
npx notion-pull-mdx -n %MY_NOTION_TOKEN% -r %MY_NOTION_DOCS_ROOT_PAGE_ID%
4949
```
5050

5151
```
5252
(linux / mac)
53-
npx notion-pull-mdx -n $MY_NOTION_TOKEN -r $MY_NOTION_DOCS_ROOT_PAGE_ID -m "./docs" -i "./static/notion_images" -p "/notion_images/"
53+
npx notion-pull-mdx -n $MY_NOTION_TOKEN -r $MY_NOTION_DOCS_ROOT_PAGE_ID
5454
```
5555

56+
NOTE: In the above, we are using `npx` to use the latest `notion-pull-mdx`. A more conservative approach would be to `npm i cross-var notion-pull-mdx` and then create a script in your package.json like this:
57+
58+
```
59+
"scripts": {
60+
"pull": "cross-var notion-pull-mdx -n %NOTION_PULL_INTEGRATION_TOKEN% -r %NOTION_PULL_ROOT_PAGE%"
61+
}
62+
```
63+
64+
and then run that with `npm run pull`.
65+
5666
## 7. Commit
5767

5868
Most projects should probably commit the current markdown and image files each time you run notion-pull-mdx.
@@ -77,11 +87,19 @@ Links from one document to another in Notion are not yet converted to local link
7787

7888
notion-pull-mdx makes some attempt to keep the right order of things, but there are definitely cases where it isn't smart enough yet.
7989

80-
# Localization
90+
# Text Localization
8191

8292
Localize your files in Crowdin (or whatever) based on the markdown files, not in Notion. For how to do this with Docusaurus, see [Docusaurus i18n](https://docusaurus.io/docs/i18n/crowdin).
8393

84-
You may also need to localize screenshots. Crowdin can also handle localizing assets, but this library currently supports a different approach. If you place for example `fr https:\\imgur.com\1234.png` in the caption of a screenshot in Notion, `notion-pull-mdx` will fetch that image and save it locally with the same name as the primary screenshot, but with "-fr" appended. So you'd get for example `static\img\9876.png` and `static\img\9876-fr.png`. To get the French version to show, you'd need to add that "-fr" to the markdown link when you localize the page's text in crowdin. If there is a way, maybe this modification of the markdown can be made automatic in the future so that you automatically get the right image version.
94+
# Screenshot Localization
95+
96+
The only way we know of to localize images in the current Docusaurus (2.0) is to place the images in the same directory as the markdown and use relative paths for them. Most projects probably won't localize _every_ image, so we also need a way to "fall back" to the original screenshot when the localized one is missing. `notion-pull-mdx` facilitates this: if no localized version of an image is available, it places a copy of the original image into the correct location.
97+
98+
So how do you provide these localized screenshot files? Crowdin can handle localizing assets, and in the future we may support that. For now, we support a different approach. If you place, for example, `fr https://imgur.com/1234.png` in the caption of a screenshot in Notion, `notion-pull-mdx` will fetch that image and save it where Docusaurus will find it when in French mode. Getting URLs to screenshots is easy with screenshot utilities such as [Greenshot](https://getgreenshot.org/) that support uploading to imgur. Note that `notion-pull-mdx` stores a copy of all images in your source tree, so you wouldn't lose the images if imgur were to go away.
99+
100+
NOTE: As far as I can tell, when you run `docusaurus start`, Docusaurus 2.0 offers the language picker but it doesn't actually work. So to test the localized version, run `docusaurus build` followed by `docusaurus serve`.
101+
102+
NOTE: if you just localize an image, it will not get picked up. You also must localize the page that uses the image. Otherwise, Docusaurus will use the English document and when that asks for `./the-image-path`, it will find the image there in the English section, not your other language section.
85103

86104
# Automated builds with Github Actions
87105
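
The README text above describes putting an entry such as `fr https://imgur.com/1234.png` in a screenshot's caption. The sketch below shows one way such a caption could be split into the human-readable caption and a list of localized URLs. The real parsing lives in `parseImageBlock()`, whose body is not part of this diff, so `splitCaption` is a hypothetical helper, and the "one entry per line" rule is an assumption.

```typescript
// Shape of one localized entry, matching the ImageSet type introduced in this commit.
type LocalizedUrl = { iso632Code: string; url: string };

// Hypothetical helper: not the project's actual parser.
// Assumes each localized entry sits on its own line as "<language code> <url>".
function splitCaption(caption: string): {
  caption: string;
  localizedUrls: LocalizedUrl[];
} {
  const localizedUrls: LocalizedUrl[] = [];
  const keptLines: string[] = [];
  for (const line of caption.split("\n")) {
    // Two or three lowercase letters, whitespace, then an http(s) URL.
    const m = /^\s*([a-z]{2,3})\s+(https?:\/\/\S+)\s*$/.exec(line);
    if (m) {
      localizedUrls.push({ iso632Code: m[1], url: m[2] });
    } else {
      keptLines.push(line);
    }
  }
  // Whatever is left is the caption that goes back into the markdown.
  return { caption: keptLines.join("\n").trim(), localizedUrls };
}

const parsed = splitCaption("The main window\nfr https://imgur.com/1234.png");
console.log(parsed.caption); // "The main window"
console.log(parsed.localizedUrls.length); // 1
```

Keeping the parsing free of IO like this matches the commit's stated goal of breaking image handling into steps that can be unit tested.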

package.json

Lines changed: 4 additions & 2 deletions
@@ -13,9 +13,10 @@
1313
"semantic-release": "semantic-release",
1414
"typecheck": "tsc --noEmit",
1515
"notion-download": "node dist/index.js",
16-
"test": "ts-node --compiler-options \"{\\\"module\\\": \\\"commonjs\\\"}\" src/index.ts",
16+
"cmdhelp": "ts-node --compiler-options \"{\\\"module\\\": \\\"commonjs\\\"}\" src/index.ts",
1717
"// test out with my sample notion db": "",
18-
"sample": "cross-var ts-node --compiler-options \"{\\\"module\\\": \\\"commonjs\\\"}\" src/index.ts -n %NOTION_PULL_INTEGRATION_TOKEN% -r %NOTION_PULL_ROOT_PAGE% -m ./sample -i ./sample_img/inner -p /inner/"
18+
"sample": "cross-var ts-node --compiler-options \"{\\\"module\\\": \\\"commonjs\\\"}\" src/index.ts -n %NOTION_PULL_INTEGRATION_TOKEN% -r %NOTION_PULL_ROOT_PAGE% -m ./sample --log-level verbose",
19+
"sample-with-paths": "cross-var ts-node --compiler-options \"{\\\"module\\\": \\\"commonjs\\\"}\" src/index.ts -n %NOTION_PULL_INTEGRATION_TOKEN% -r %NOTION_PULL_ROOT_PAGE% -m ./sample --img-output-path ./sample_img"
1920
},
2021
"repository": {
2122
"type": "git",
@@ -51,6 +52,7 @@
5152
"limiter": "^2.1.0",
5253
"node-fetch": "2.6.6",
5354
"notion-to-md": "^2.5.2",
55+
"path": "^0.12.7",
5456
"postinstall-postinstall": "^2.1.0",
5557
"sanitize-filename": "^1.6.3"
5658
},

src/DocusaurusTweaks.ts

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
import chalk from "chalk";
1+
import { logDebug } from "./log";
22

33
export function tweakForDocusaurus(input: string): {
44
body: string;
@@ -90,7 +90,10 @@ function notionEmbedsToMDX(input: string): {
9090
while ((match = v.regex.exec(input)) !== null) {
9191
const string = match[0];
9292
const url = match[1];
93-
console.log(chalk.green(`${string} --> ${v.output.replace("$1", url)}`));
93+
logDebug(
94+
"DocusaurusTweaks",
95+
`${string} --> ${v.output.replace("$1", url)}`
96+
);
9497
body = body.replace(string, v.output.replace("$1", url));
9598
imports.add(v.import);
9699
}

src/LayoutStrategy.ts

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
11
import * as fs from "fs-extra";
2+
import { verbose } from "./log";
23
import { NotionPage } from "./NotionPage";
34

45
// Here a fuller name would be File Tree Layout Strategy. That is,
@@ -16,7 +17,7 @@ export abstract class LayoutStrategy {
1617
public async cleanupOldFiles(): Promise<void> {
1718
// Remove any pre-existing files that aren't around anymore; this indicates that they were removed or renamed in Notion.
1819
for (const p of this.existingPagesNotSeenYetInPull) {
19-
console.log(`Removing old doc: ${p}`);
20+
verbose(`Removing old doc: ${p}`);
2021
await fs.rm(p);
2122
}
2223
}
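
The two hunks above replace `console.log` calls with `logDebug` and `verbose` from `./log`, a module this page does not show. As a rough illustration of how level-gated logging of that kind can work, here is a minimal sketch. Only the function names and the `verbose` level (seen in `--log-level verbose`) come from the commit; the `info` and `debug` level names, their ordering, `setLogLevel`, and the `emitted` array are assumptions made for the example.

```typescript
// Hypothetical log module: an illustration, not the project's ./log file.
type LogLevel = "info" | "verbose" | "debug";

// Assumed ordering: each level also shows everything from the levels before it.
const rank: Record<LogLevel, number> = { info: 0, verbose: 1, debug: 2 };

let currentLevel: LogLevel = "info";
const emitted: string[] = []; // kept only so the behaviour can be inspected

function setLogLevel(level: LogLevel): void {
  currentLevel = level;
}

// Print the message only if the current level is at least `minimum`.
function emit(minimum: LogLevel, message: string): void {
  if (rank[currentLevel] >= rank[minimum]) {
    emitted.push(message);
    console.log(message);
  }
}

function info(message: string): void {
  emit("info", message);
}
function verbose(message: string): void {
  emit("verbose", message);
}
function logDebug(label: string, message: string): void {
  emit("debug", `[${label}] ${message}`);
}

verbose("Removing old doc: a.md"); // suppressed at the default "info" level
setLogLevel("verbose");
verbose("Removing old doc: b.md"); // printed
logDebug("DocusaurusTweaks", "x --> y"); // still suppressed at "verbose"
```

With a gate like this, routine messages such as "Removing old doc" stay out of normal output but remain available when a user asks for more detail.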

src/MakeImagePersistencePlan.ts

Lines changed: 56 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,56 @@
1+
import { ImageSet } from "./NotionImage";
2+
import * as Path from "path";
3+
import { error } from "./log";
4+
5+
export function makeImagePersistencePlan(
6+
imageSet: ImageSet,
7+
imageOutputRootPath: string,
8+
imagePrefix: string
9+
): void {
10+
if (imageSet.fileType?.ext) {
11+
// Since most images come from pasting screenshots, there isn't normally a filename. That's fine, we just make a hash of the url
12+
// Images that are stored by notion come to us with a complex url that changes over time, so we pick out the UUID that doesn't change. Example:
13+
// https://s3.us-west-2.amazonaws.com/secure.notion-static.com/d1058f46-4d2f-4292-8388-4ad393383439/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIAT73L2G45EIPT3X45%2F20220516%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20220516T233630Z&X-Amz-Expires=3600&X-Amz-Signature=f215704094fcc884d37073b0b108cf6d1c9da9b7d57a898da38bc30c30b4c4b5&X-Amz-SignedHeaders=host&x-id=GetObject
14+
15+
let thingToHash = imageSet.primaryUrl;
16+
const m = /.*secure\.notion-static\.com\/(.*)\//gm.exec(
17+
imageSet.primaryUrl
18+
);
19+
if (m && m.length > 1) {
20+
thingToHash = m[1];
21+
}
22+
23+
const hash = hashOfString(thingToHash);
24+
imageSet.outputFileName = `${hash}.${imageSet.fileType.ext}`;
25+
26+
imageSet.primaryFileOutputPath = Path.posix.join(
27+
imageOutputRootPath?.length > 0
28+
? imageOutputRootPath
29+
: imageSet.pathToParentDocument!,
30+
imageSet.outputFileName
31+
);
32+
33+
if (imageOutputRootPath && imageSet.localizedUrls.length) {
34+
error(
35+
"imageOutputPath was declared, but one or more localizedUrls were found too. If you are going to localize screenshots, then you can't declare an imageOutputPath."
36+
);
37+
}
38+
39+
imageSet.filePathToUseInMarkdown =
40+
(imagePrefix?.length > 0 ? imagePrefix : ".") +
41+
"/" +
42+
imageSet.outputFileName;
43+
} else {
44+
error(
45+
`Something wrong with the filetype extension on the blob we got from ${imageSet.primaryUrl}`
46+
);
47+
}
48+
}
49+
50+
function hashOfString(s: string) {
51+
let hash = 0;
52+
for (let i = 0; i < s.length; ++i)
53+
hash = Math.imul(31, hash) + s.charCodeAt(i);
54+
55+
return Math.abs(hash);
56+
}
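
The comments in `makeImagePersistencePlan` explain that Notion-hosted images arrive with signed URLs that change over time, so the stable UUID segment is hashed instead of the whole URL. The sketch below reuses the regex and hash from the file above to show that two differently signed URLs for the same image yield the same file name. `stableKeyForUrl` is a hypothetical wrapper added for the example, and the shortened query strings are made up.

```typescript
// Same string hash as hashOfString in the file above.
function hashOfString(s: string): number {
  let hash = 0;
  for (let i = 0; i < s.length; ++i) {
    hash = Math.imul(31, hash) + s.charCodeAt(i);
  }
  return Math.abs(hash);
}

// Hypothetical wrapper around the regex used in makeImagePersistencePlan:
// returns the UUID path segment for Notion-hosted images, else the whole URL.
function stableKeyForUrl(url: string): string {
  const m = /.*secure\.notion-static\.com\/(.*)\//.exec(url);
  return m && m.length > 1 ? m[1] : url;
}

const base =
  "https://s3.us-west-2.amazonaws.com/secure.notion-static.com/d1058f46-4d2f-4292-8388-4ad393383439/Untitled.png";
// Two fetches of the same image, signed at different times (illustrative query strings).
const firstFetch = base + "?X-Amz-Date=20220516T233630Z&X-Amz-Expires=3600";
const secondFetch = base + "?X-Amz-Date=20220517T010000Z&X-Amz-Expires=3600";

const firstName = `${hashOfString(stableKeyForUrl(firstFetch))}.png`;
const secondName = `${hashOfString(stableKeyForUrl(secondFetch))}.png`;

console.log(stableKeyForUrl(firstFetch)); // d1058f46-4d2f-4292-8388-4ad393383439
console.log(firstName === secondName); // true
```

Because the name depends only on the UUID, a later pull can recognise that the file already exists and skip writing it again, which is what `writeImageIfNew` relies on.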
File renamed without changes.

src/NotionImage.ts

Lines changed: 77 additions & 75 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,34 @@
11
import * as fs from "fs-extra";
2-
import FileType from "file-type";
2+
import FileType, { FileTypeResult } from "file-type";
33
import fetch from "node-fetch";
4+
import * as Path from "path";
5+
import { makeImagePersistencePlan } from "./MakeImagePersistencePlan";
6+
import { logDebug, verbose, info } from "./log";
47

58
let existingImagesNotSeenYetInPull: string[] = [];
6-
let imageOutputPath = "not set yet";
7-
let imagePrefix = "not set yet";
9+
let imageOutputPath = ""; // default to putting in the same directory as the document referring to it.
10+
let imagePrefix = ""; // default to "./"
11+
12+
// we parse a notion image and its caption into what we need, which includes any urls to localized versions of the image that may be embedded in the caption
13+
export type ImageSet = {
14+
// We get these from parseImageBlock():
15+
primaryUrl: string;
16+
caption?: string;
17+
localizedUrls: Array<{ iso632Code: string; url: string }>;
18+
19+
// then we fill this in from processImageBlock():
20+
pathToParentDocument?: string;
21+
relativePathToParentDocument?: string;
22+
23+
// then we fill these in readPrimaryImage():
24+
primaryBuffer?: Buffer;
25+
fileType?: FileTypeResult;
26+
27+
// then we fill these in from makeImagePersistencePlan():
28+
primaryFileOutputPath?: string;
29+
outputFileName?: string;
30+
filePathToUseInMarkdown?: string;
31+
};
832

933
export async function initImageHandling(
1034
prefix: string,
@@ -19,85 +43,54 @@ export async function initImageHandling(
1943
// changes, it gets a new id. This way we can then prevent downloading
2044
// an image after the 1st time. The downside is that currently we don't
2145
// have the smarts to remove unused images.
22-
await fs.mkdir(imageOutputPath, { recursive: true });
46+
if (imageOutputPath) {
47+
await fs.mkdir(imageOutputPath, { recursive: true });
48+
}
2349
}
2450

25-
async function saveImage(
26-
imageSet: ImageSet,
27-
imageFolderPath: string
28-
): Promise<string> {
51+
async function readPrimaryImage(imageSet: ImageSet) {
2952
const response = await fetch(imageSet.primaryUrl);
3053
const arrayBuffer = await response.arrayBuffer();
31-
const buffer = Buffer.from(arrayBuffer);
32-
const fileType = await FileType.fromBuffer(buffer);
33-
if (fileType?.ext) {
34-
// Since most images come from pasting screenshots, there isn't normally a filename. That's fine, we just make a hash of the url
35-
// Images that are stored by notion come to us with a complex url that changes over time, so we pick out the UUID that doesn't change. Example:
36-
// https://s3.us-west-2.amazonaws.com/secure.notion-static.com/d1058f46-4d2f-4292-8388-4ad393383439/Untitled.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=AKIAT73L2G45EIPT3X45%2F20220516%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20220516T233630Z&X-Amz-Expires=3600&X-Amz-Signature=f215704094fcc884d37073b0b108cf6d1c9da9b7d57a898da38bc30c30b4c4b5&X-Amz-SignedHeaders=host&x-id=GetObject
37-
38-
let thingToHash = imageSet.primaryUrl;
39-
const m = /.*secure\.notion-static\.com\/(.*)\//gm.exec(
40-
imageSet.primaryUrl
41-
);
42-
if (m && m.length > 1) {
43-
thingToHash = m[1];
44-
}
54+
imageSet.primaryBuffer = Buffer.from(arrayBuffer);
55+
imageSet.fileType = await FileType.fromBuffer(imageSet.primaryBuffer);
56+
}
4557

46-
const hash = hashOfString(thingToHash);
47-
const outputFileName = `${hash}.${fileType.ext}`;
48-
const primaryFilePath = writeImageIfNew(
49-
imageFolderPath,
50-
outputFileName,
51-
buffer
52-
);
53-
54-
// if there are localized images, save them too, using the same
55-
// name as the primary but with their language code attached
56-
for (const localizedImage of imageSet.localizedUrls) {
57-
const outputFileName = `${hash}-${localizedImage.iso632Code}.${fileType.ext}`;
58-
console.log("Saving localized image to " + outputFileName);
59-
const response = await fetch(localizedImage.url);
60-
const arrayBuffer = await response.arrayBuffer();
61-
const buffer = Buffer.from(arrayBuffer);
62-
writeImageIfNew(imageFolderPath, outputFileName, buffer);
58+
async function saveImage(imageSet: ImageSet): Promise<void> {
59+
writeImageIfNew(imageSet.primaryFileOutputPath!, imageSet.primaryBuffer!);
60+
61+
let foundLocalizedImage = false;
62+
63+
// if there are localized images, save them too, using the same
64+
// name as the primary but with their language code attached
65+
for (const localizedImage of imageSet.localizedUrls) {
66+
verbose(`Retrieving ${localizedImage.iso632Code} version...`);
67+
const response = await fetch(localizedImage.url);
68+
const arrayBuffer = await response.arrayBuffer();
69+
const buffer = Buffer.from(arrayBuffer);
70+
const directory = `./i18n/${
71+
localizedImage.iso632Code
72+
}/docusaurus-plugin-content-docs/current/${imageSet.relativePathToParentDocument!}`;
73+
if (!foundLocalizedImage) {
74+
foundLocalizedImage = true;
75+
info(
76+
"*** found at least one localized image, so /i18n directory will be created and filled with localized image files."
77+
);
6378
}
64-
65-
return primaryFilePath;
66-
} else {
67-
console.error(
68-
`Something wrong with the filetype extension on the blob we got from ${imageSet.primaryUrl}`
69-
);
70-
return "error";
79+
writeImageIfNew(directory + "/" + imageSet.outputFileName!, buffer);
7180
}
7281
}
73-
function writeImageIfNew(
74-
imageFolderPath: string,
75-
outputFileName: string,
76-
buffer: Buffer
77-
) {
78-
const path = imageFolderPath + "/" + outputFileName;
82+
83+
function writeImageIfNew(path: string, buffer: Buffer) {
7984
imageWasSeen(path);
8085
if (!fs.pathExistsSync(path)) {
81-
console.log("Adding image " + path);
86+
verbose("Adding image " + path);
87+
fs.mkdirsSync(Path.dirname(path));
8288
fs.createWriteStream(path).write(buffer); // async but we're not waiting
89+
} else {
90+
verbose(`image already filled: ${path}`);
8391
}
84-
return outputFileName;
8592
}
8693

87-
function hashOfString(s: string) {
88-
let hash = 0;
89-
for (let i = 0; i < s.length; ++i)
90-
hash = Math.imul(31, hash) + s.charCodeAt(i);
91-
92-
return Math.abs(hash);
93-
}
94-
95-
// we parse a notion image and its caption into what we need, which includes any urls to localized versions of the image that may be embedded in the caption
96-
type ImageSet = {
97-
primaryUrl: string;
98-
caption?: string;
99-
localizedUrls: Array<{ iso632Code: string; url: string }>;
100-
};
10194
export function parseImageBlock(b: any): ImageSet {
10295
const imageSet: ImageSet = {
10396
primaryUrl: "",
@@ -142,18 +135,27 @@ export function parseImageBlock(b: any): ImageSet {
142135

143136
// Download the image if we don't have it, give it a good name, and
144137
// change the src to point to our copy of the image.
145-
export async function processImageBlock(b: any): Promise<void> {
146-
//console.log(JSON.stringify(b));
138+
export async function processImageBlock(
139+
b: any,
140+
pathToParentDocument: string,
141+
relativePathToThisPage: string
142+
): Promise<void> {
143+
logDebug("processImageBlock", JSON.stringify(b));
144+
145+
// this is broken into all these steps to facilitate unit testing without IO
147146
const imageSet = parseImageBlock(b);
147+
imageSet.pathToParentDocument = pathToParentDocument;
148+
imageSet.relativePathToParentDocument = relativePathToThisPage;
148149

149-
const newPath =
150-
imagePrefix + "/" + (await saveImage(imageSet, imageOutputPath));
150+
await readPrimaryImage(imageSet);
151+
makeImagePersistencePlan(imageSet, imageOutputPath, imagePrefix);
152+
await saveImage(imageSet);
151153

152154
// change the src to point to our copy of the image
153155
if ("file" in b.image) {
154-
b.image.file.url = newPath;
156+
b.image.file.url = imageSet.filePathToUseInMarkdown;
155157
} else {
156-
b.image.external.url = newPath;
158+
b.image.external.url = imageSet.filePathToUseInMarkdown;
157159
}
158160
// put back the simplified caption, stripped of the meta information
159161
if (imageSet.caption) {
@@ -177,7 +179,7 @@ function imageWasSeen(path: string) {
177179

178180
export async function cleanupOldImages(): Promise<void> {
179181
for (const p of existingImagesNotSeenYetInPull) {
180-
console.log(`Removing old image: ${p}`);
182+
verbose(`Removing old image: ${p}`);
181183
await fs.rm(p);
182184
}
183185
}
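
In the new `saveImage`, each localized image is written under Docusaurus's i18n tree using the same file name as the primary image. The sketch below isolates that path construction. `localizedImagePath` is a hypothetical helper name, and the sample values (`"tutorials"`, `"123456.png"`) are assumptions; only the directory template is taken from the diff.

```typescript
// Hypothetical helper mirroring the template string used in saveImage above.
function localizedImagePath(
  iso632Code: string,
  relativePathToParentDocument: string,
  outputFileName: string
): string {
  const directory = `./i18n/${iso632Code}/docusaurus-plugin-content-docs/current/${relativePathToParentDocument}`;
  return directory + "/" + outputFileName;
}

// Assumed inputs: a page in a "tutorials" folder whose primary image hashed to 123456.png.
console.log(localizedImagePath("fr", "tutorials", "123456.png"));
// → ./i18n/fr/docusaurus-plugin-content-docs/current/tutorials/123456.png
```

Since the localized file keeps the primary image's name and sits next to the localized page, the same relative `./123456.png` reference in the markdown resolves to the French image when the French page is served.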
