Migrating LoopBack Docs to Markdown for use with Jekyll
GOAL: Create an open-source documentation site similar to the Express docs. High-level tasks:
- Export content of APIC space to HTML. (This space now contains the source documentation for LoopBack, which is duplicated in pages with the same title in the LB space.)
- Convert/strip HTML to markdown using script.
- Add pages to this repo, edit as necessary
When site goes live, replace Confluence pages with redirects to here.
NOTE: Although the long-term plan is to have both LoopBack 2.x and 3.0 docs, initially we should focus on 2.x, since 3.0 is not released yet. As the 3.0 release approaches, we can "clone" the 2.0 docs into `/docs/lb3`, and then add/modify the content as needed.
Conversion will basically be a series of "search and replace" functions to be applied to every Confluence HTML file. The output will be a markdown file with the same base name, but with an `.md` extension, e.g. `Foo.html` is converted to `Foo.md`.
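The naming rule can be sketched as follows. This assumes the conversion script is written in Python (this page does not name a language), and `markdown_name` is a hypothetical helper name used only for illustration.

```python
from pathlib import Path


def markdown_name(html_path: str) -> str:
    """Return the output file name: same base name, with an .md extension."""
    return Path(html_path).with_suffix(".md").name


print(markdown_name("export/Foo.html"))  # Foo.md
```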
- Title
- Front matter
- Article content
- HTML to discard
Article title is in this block:
<h1 id="title-heading" class="pagetitle">
<span id="title-text">... Article title ... </span>
</h1>
Use the contents of the `<span id="title-text">` tag as the value for the `title` property in the article front-matter.
NOTE: If the title includes a colon character (:), Jekyll requires the `title` property to be quoted. In the Confluence export, these articles will have file names that are numbers instead of text.
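A minimal sketch of the title step, assuming a Python script and a regular expression rather than a full HTML parser. The helper names `extract_title` and `yaml_title` are hypothetical, and the whitespace and entity clean-up is an assumption about what the export needs.

```python
import html
import re

# Matches the title span shown above; DOTALL lets the title span several lines.
TITLE_RE = re.compile(r'<span id="title-text">(.*?)</span>', re.DOTALL)


def extract_title(page_html: str) -> str:
    """Return the article title with entities decoded and whitespace collapsed."""
    match = TITLE_RE.search(page_html)
    if match is None:
        raise ValueError("no title-text span found")
    return " ".join(html.unescape(match.group(1)).split())


def yaml_title(title: str) -> str:
    """Quote the title for front-matter when it contains a colon."""
    if ":" in title:
        escaped = title.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return title
```

Titles without a colon pass through `yaml_title` unchanged, so the same call can be used for every page.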
Every markdown file must start with some Jekyll front-matter that looks like this:
---
title: The article title goes here
layout: page
keywords: LoopBack
tags:
sidebar: lb2_sidebar
permalink: /doc/lb2/The-file-name-goes-here.html
summary:
---
NOTE: The three dashes before and after front-matter are required.
In general, we don't have a consistent summary for every article, so we'll leave the `summary` property blank.
Confluence export apparently does not include "labels" data, so we'll also leave the `tags` property blank. This seems pretty lame on the part of Confluence (Atlassian).
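The front-matter block above could be generated along these lines. This is a sketch assuming Python; `front_matter` is a hypothetical helper, and it expects a title that has already been quoted if it contains a colon.

```python
def front_matter(title: str, base_name: str) -> str:
    """Build the Jekyll front-matter for one page.

    `title` must already be quoted if it contains a colon.
    `base_name` is the file name without its extension.
    """
    lines = [
        "---",
        f"title: {title}",
        "layout: page",
        "keywords: LoopBack",
        "tags:",  # left blank: the export has no labels data
        "sidebar: lb2_sidebar",
        f"permalink: /doc/lb2/{base_name}.html",
        "summary:",  # left blank: no consistent summaries
        "---",
    ]
    return "\n".join(lines) + "\n"


print(front_matter("Creating model relations", "Creating-model-relations"))
```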
The actual article content is in:
... Content here ...
</div>
Everything above and below this, i.e. outside of this tag, can be discarded.
### HTML to discard
Some pages may have these, which should just be discarded.
#### Injected CSS
Discard injected CSS: `<style type='text/css'>/*<![CDATA[*/ .... /*]]>*/</style>`
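
One way to drop these blocks, sketched in Python with a regular expression. `strip_injected_css` is a hypothetical helper name, and the sketch assumes that no `<style>` block in the export needs to be kept.

```python
import re

# Matches any <style ...> ... </style> block, including multi-line ones.
STYLE_RE = re.compile(r"<style\b[^>]*>.*?</style>", re.DOTALL | re.IGNORECASE)


def strip_injected_css(page_html: str) -> str:
    """Remove every <style> element from the page HTML."""
    return STYLE_RE.sub("", page_html)
```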
#### Confluence-generated TOC
Since our Jekyll theme has its own [automatically generated TOCs](http://idratherbewriting.com/documentation-theme-jekyll/mydoc_pages.html#automatic-mini-tocs), we should discard this HTML (which occurs only in some pages):
The class selector `rbtoc1470354523244` varies by page.
We need to process links whose `href` destination URL begins with `https://docs.strongloop.com/display/APIC/` so they link to the new page here instead of the old page. All other links should be left "as is".
Convert
<a href="https://docs.strongloop.com/display/APIC/Creating+model+relations" rel="nofollow">
Creating model relations
</a>
To
[Creating model relations](/doc/lb2/Creating-model-relations.html)
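A sketch of this link rewrite in Python. `convert_apic_links` is a hypothetical helper name, and mapping `+` in the old URL to `-` in the new file name is inferred from the single example above. Links that carry a `#fragment` or a query string do not match the pattern and are left untouched, so they would need separate handling.

```python
import re

OLD_PREFIX = "https://docs.strongloop.com/display/APIC/"

# Only anchors whose href starts with the old APIC prefix are matched.
LINK_RE = re.compile(
    r'<a\s+[^>]*?href="' + re.escape(OLD_PREFIX) + r'([^"#?]+)"[^>]*>(.*?)</a>',
    re.DOTALL,
)


def convert_apic_links(page_html: str) -> str:
    """Rewrite links to old APIC pages as markdown links to the new pages."""

    def to_markdown(match: re.Match) -> str:
        page = match.group(1).replace("+", "-")  # assumed naming rule
        text = " ".join(match.group(2).split())  # collapse line breaks
        return f"[{text}](/doc/lb2/{page}.html)"

    return LINK_RE.sub(to_markdown, page_html)
```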
Convert headings as follows:
Confluence HTML | Markdown |
---|---|
`<h2> .. </h2>` | `##` |
`<h3> .. </h3>` | `###` |
`<h4> .. </h4>` | `###` |
`<h5> .. </h5>` | `###` |
`<h6> .. </h6>` | `###` |
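
The heading table can be applied with a single substitution. This is a sketch assuming Python, with the mapping copied from the table as written. `convert_headings` is a hypothetical helper name, and any tags nested inside a heading are not stripped here.

```python
import re

# Heading level -> markdown prefix, copied from the table above.
HEADING_MARKS = {"2": "##", "3": "###", "4": "###", "5": "###", "6": "###"}

HEADING_RE = re.compile(r"<h([2-6])\b[^>]*>(.*?)</h\1>", re.DOTALL | re.IGNORECASE)


def convert_headings(page_html: str) -> str:
    """Replace <h2>..<h6> elements with markdown headings on their own lines."""

    def to_markdown(match: re.Match) -> str:
        text = " ".join(match.group(2).split())
        return f"\n{HEADING_MARKS[match.group(1)]} {text}\n"

    return HEADING_RE.sub(to_markdown, page_html)
```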
We'll copy all the image files (`.png` files, etc.) into the `/images` folder. It's not clear that it would be helpful to separate the LB2 images from LB3, etc. It might be easier just to keep all the image files in the same place.
Convert image tags to the Jekyll template image include.
So, for example, this HTML:
<img class="confluence-embedded-image" height="388" width="700" src="attachments/9634213/9830499.png" data-image-src="attachments/9634213/9830499.png">
Converts to:
{% include image.html file="9830499.png" alt="" %}
NOTE: Most of the image content won't have an `alt` attribute, but let's add the attribute to the Jekyll include, to make it easier to add it later.
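The image rule could look like the following sketch in Python. `convert_images` is a hypothetical helper name. The sketch keeps only the file name from `src`, matching the example above, and always emits an empty `alt`.

```python
import posixpath
import re

# \s before src avoids matching the data-image-src attribute.
IMG_RE = re.compile(r'<img\b[^>]*?\ssrc="([^"]+)"[^>]*>', re.IGNORECASE)


def convert_images(page_html: str) -> str:
    """Replace <img> tags with the Jekyll image include."""

    def to_include(match: re.Match) -> str:
        file_name = posixpath.basename(match.group(1))
        return '{% include image.html file="' + file_name + '" alt="" %}'

    return IMG_RE.sub(to_include, page_html)
```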
TBD
These Confluence macros convert to Jekyll alerts. Unfortunately, the names are confusingly different.
We use the Confluence macro names here, but what really matters is the div `class` attribute.
Information
<div class="aui-message hint shadowed information-macro">
<span class="aui-icon icon-hint">Icon</span>
<div class="message-content"> ... Text content ... </div>
</div>
Converts to:
{% include note.html content="... Text content ..." %}
Tip
<div class="aui-message success shadowed information-macro">
<span class="aui-icon icon-success">Icon</span>
<div class="message-content"> ... Text content ... </div>
</div>
Converts to:
{% include tip.html content="... Text content ..." %}
Warning
<div class="aui-message problem shadowed information-macro">
<span class="aui-icon icon-problem">Icon</span>
<div class="message-content"> ... Text content ... </div>
</div>
Converts to:
{% include warning.html content="... Text content ..." %}
Note
<div class="aui-message warning shadowed information-macro">
<span class="aui-icon icon-warning">Icon</span>
<div class="message-content"> ... Text content ... </div>
</div>
Converts to:
{% include important.html content="... Text content ..." %}
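The four macro conversions above share one shape, so they can be driven from a lookup on the div class. This is a sketch assuming Python. `convert_macros` is a hypothetical helper name, and replacing double quotes in the content with `&quot;` is an assumption about how to keep the include parameter valid. The pattern does not handle a `<div>` nested inside the message content.

```python
import re

# Confluence div class keyword -> Jekyll alert include, per the list above.
ALERT_INCLUDES = {
    "hint": "note.html",          # Information
    "success": "tip.html",        # Tip
    "problem": "warning.html",    # Warning
    "warning": "important.html",  # Note
}

MACRO_RE = re.compile(
    r'<div class="aui-message (hint|success|problem|warning) shadowed information-macro">\s*'
    r'<span class="aui-icon icon-\1">.*?</span>\s*'
    r'<div class="message-content">(.*?)</div>\s*'
    r"</div>",
    re.DOTALL,
)


def convert_macros(page_html: str) -> str:
    """Replace Confluence information macros with Jekyll alert includes."""

    def to_include(match: re.Match) -> str:
        include = ALERT_INCLUDES[match.group(1)]
        content = " ".join(match.group(2).split()).replace('"', "&quot;")
        return "{% include " + include + ' content="' + content + '" %}'

    return MACRO_RE.sub(to_include, page_html)
```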
TBD
I'm assuming we can convert the HTML to markdown without too much trouble, but I'm keeping this here for reference in case we need it.
It may turn out to be easier to export to Word and then convert the Word files to markdown. See "How can doc/docx files be converted to markdown or structured text?".
Other references:
- https://domchristie.github.io/to-markdown/ Online HTML to MD converter.
- http://pandoc.org/
References: