
Migrating LoopBack Docs to Markdown for use with Jekyll

Rand McKinney edited this page Aug 6, 2016 · 16 revisions

GOAL: Create an open-source documentation site similar to the Express docs. High-level tasks:

  1. Export the content of the APIC space to HTML. (This space now contains the source documentation for LoopBack, which is duplicated in pages with the same title in the LB space.)
  2. Convert/strip the HTML to markdown using a script.
  3. Add the pages to this repo and edit them as necessary.

When the site goes live, replace the Confluence pages with redirects to the new pages here.

NOTE: Although the long-term plan is to have both LoopBack 2.x and 3.0 docs, initially we should focus on 2.x, since 3.0 has not been released yet. As the 3.0 release approaches, we can "clone" the 2.x docs into /docs/lb3 and then add or modify content as needed.

Conversion from HTML to markdown

Conversion will basically be a series of "search and replace" functions applied to every Confluence HTML file. The output is a markdown file with the same base name but a .md extension; for example, `Foo.html` is converted to `Foo.md`.

  • Title
  • Front matter
  • Article content
  • HTML to discard
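The overall "series of search and replace functions" can be sketched as a tiny Python driver. This is a minimal sketch, assuming each rule is a plain function from one page's text to its transformed text, and that the exported HTML files sit at the top level of the export folder; the names `Step`, `convert_file`, and `convert_export` are hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable, List

# Hypothetical step type: takes one page's text, returns the transformed text.
Step = Callable[[str], str]

def convert_file(src: Path, out_dir: Path, steps: Iterable[Step]) -> Path:
    """Apply each step in order to one exported page and write <base name>.md."""
    text = src.read_text(encoding="utf-8")
    for step in steps:
        text = step(text)
    out_dir.mkdir(parents=True, exist_ok=True)
    dest = out_dir / (src.stem + ".md")  # Foo.html -> Foo.md
    dest.write_text(text, encoding="utf-8")
    return dest

def convert_export(export_dir: Path, out_dir: Path, steps: List[Step]) -> List[Path]:
    """Convert every top-level .html file in the export directory."""
    # steps is a list (not a one-shot iterator) because it is reused for every file.
    return [convert_file(p, out_dir, steps) for p in sorted(export_dir.glob("*.html"))]
```

Each conversion rule described on this page would become one step in the list, applied in order.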

Title

The article title is in this block:

<h1 id="title-heading" class="pagetitle">
  <span id="title-text">... Article title ... </span>
</h1>

Use the contents of the <span id="title-text"> tag as the value for the title property in the article front-matter.

NOTE: If the title includes a colon character (:), Jekyll requires the title value to be quoted. In the Confluence export, these articles have file names that are numbers instead of text.
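A possible way to pull out the title and apply the quoting rule, sketched in Python with a regex. It assumes the span appears exactly as shown above (`<span id="title-text">`); `extract_title` and `yaml_title` are hypothetical helper names.

```python
import html
import re

# Matches the Confluence title span inside <h1 id="title-heading" class="pagetitle">.
TITLE_RE = re.compile(r'<span id="title-text">(.*?)</span>', re.S)

def extract_title(page_html: str) -> str:
    """Return the article title, with entities decoded and whitespace collapsed."""
    m = TITLE_RE.search(page_html)
    if m is None:
        raise ValueError('no <span id="title-text"> found')
    return " ".join(html.unescape(m.group(1)).split())

def yaml_title(title: str) -> str:
    """Quote the title when it contains a colon, escaping backslashes and double quotes."""
    if ":" in title:
        escaped = title.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return title
```

Quoting only on a colon follows the note above; other YAML-special characters (for example a leading `[` or an embedded ` #`) would also need quoting, so always quoting the title is a simpler alternative.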

Front matter

Every markdown file must start with some Jekyll front-matter that looks like this:

---
title: The article title goes here
layout: page
keywords: LoopBack
tags:
sidebar: lb2_sidebar
permalink: /doc/lb2/The-file-name-goes-here.html
summary:
---

NOTE: The three dashes before and after front-matter are required.

In general, we don't have a consistent summary for every article, so we'll leave the summary property blank. The Confluence export apparently does not include "labels" data, so we'll also leave the tags property blank. This seems pretty lame on the part of Confluence (Atlassian).
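Generating the front matter is then just string assembly. A minimal sketch, assuming the title has already been made YAML-safe and that the permalink uses the output file's base name, as in the template above; `make_front_matter` is a hypothetical name.

```python
def make_front_matter(title: str, base_name: str, sidebar: str = "lb2_sidebar") -> str:
    """Build the Jekyll front matter for one converted page.

    title     -- already YAML-safe (quoted if it contains a colon)
    base_name -- output file name without extension, e.g. "Creating-model-relations"
    """
    lines = [
        "---",
        f"title: {title}",
        "layout: page",
        "keywords: LoopBack",
        "tags:",     # left blank: the Confluence export has no labels
        f"sidebar: {sidebar}",
        f"permalink: /doc/lb2/{base_name}.html",
        "summary:",  # left blank: no consistent per-article summary
        "---",
        "",          # so the block ends with a newline after the closing dashes
    ]
    return "\n".join(lines)
```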

Article content

The actual article content is in:

... Content here ...
</div>

Everything outside this element, above or below it, can be discarded.
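Because the content can itself contain nested `<div>` elements, a plain lazy regex would stop at the first `</div>`. One way around that is to count opening and closing div tags. This is a sketch under the assumption that the HTML is well formed; the opening tag of the wrapper is not shown above, so the function takes the wrapper's id as a parameter, and `extract_div_inner` is a hypothetical name.

```python
import re

DIV_TAG_RE = re.compile(r"<div\b[^>]*>|</div\s*>", re.I)

def extract_div_inner(page_html: str, div_id: str) -> str:
    """Return the inner HTML of the first <div> whose id attribute is div_id.

    Nested divs are handled by tracking depth, so the matching </div> is
    found even when the article content contains its own divs.
    """
    start = re.search(r'<div\b[^>]*\sid=["\']%s["\'][^>]*>' % re.escape(div_id), page_html, re.I)
    if start is None:
        raise ValueError(f"no <div id={div_id!r}> found")
    depth = 1
    for m in DIV_TAG_RE.finditer(page_html, start.end()):
        depth += -1 if m.group().startswith("</") else 1
        if depth == 0:
            return page_html[start.end():m.start()]
    raise ValueError("unbalanced <div> tags")
```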

HTML to discard

Some pages contain the following HTML, which should simply be discarded.

Injected CSS

Discard injected CSS: `<style type='text/css'>/*<![CDATA[*/ .... /*]]>*/</style>`
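The injected CSS has a fixed wrapper, so a single non-greedy regex across lines should be enough. A sketch, assuming the `<style>` tag is written exactly as shown above (single-quoted `type`, CDATA comment markers); `strip_injected_css` is a hypothetical name.

```python
import re

# The CSS body varies, so match it lazily (and across newlines) between the fixed markers.
INJECTED_CSS_RE = re.compile(
    r"<style type='text/css'>/\*<!\[CDATA\[\*/.*?/\*\]\]>\*/</style>", re.S)

def strip_injected_css(page_html: str) -> str:
    """Remove every injected <style> block of the form shown above."""
    return INJECTED_CSS_RE.sub("", page_html)
```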

Confluence-generated TOC

Since our Jekyll theme has its own [automatically generated TOCs](http://idratherbewriting.com/documentation-theme-jekyll/mydoc_pages.html#automatic-mini-tocs), we should discard this HTML (which appears only on some pages):
...

The class selector rbtoc1470354523244 varies by page.
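Since the suffix varies, the TOC can be matched on the pattern `rbtoc` followed by digits. The TOC markup itself is not reproduced above, so this sketch assumes it is a `<div>` whose `class` attribute includes that selector, and removes the whole element (including nested divs) by counting div tags; `strip_confluence_toc` is a hypothetical name.

```python
import re

# Opening tag of a div whose class list includes rbtoc<digits> (the digits vary per page).
TOC_OPEN_RE = re.compile(r"""<div\b[^>]*\sclass=["'][^"']*\brbtoc\d+\b[^"']*["'][^>]*>""", re.I)
DIV_TAG_RE = re.compile(r"<div\b[^>]*>|</div\s*>", re.I)

def strip_confluence_toc(page_html: str) -> str:
    """Remove each rbtoc<N> div together with everything nested inside it."""
    while (start := TOC_OPEN_RE.search(page_html)) is not None:
        depth = 1
        for m in DIV_TAG_RE.finditer(page_html, start.end()):
            depth += -1 if m.group().startswith("</") else 1
            if depth == 0:
                page_html = page_html[:start.start()] + page_html[m.end():]
                break
        else:
            raise ValueError("unbalanced <div> tags after TOC")
    return page_html
```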

Links

We need to process links whose href destination URL begins with https://docs.strongloop.com/display/APIC/ so they link to the new page here instead of the old page. All other links should be left "as is".

Convert

<a href="https://docs.strongloop.com/display/APIC/Creating+model+relations" rel="nofollow">
 Creating model relations
</a>

To

[Creating model relations](/doc/lb2/Creating-model-relations.html)
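A sketch of this link rewrite in Python, assuming the `href` uses double quotes as in the example and that the page name only needs `+` (and any percent-escapes) decoded and spaces turned into hyphens. URL fragments (`#...`) are dropped here on the assumption that old Confluence anchor ids will not match the new Jekyll headings; `convert_apic_links` is a hypothetical name.

```python
import re
from urllib.parse import unquote_plus

APIC_PREFIX = "https://docs.strongloop.com/display/APIC/"
# <a href="https://docs.strongloop.com/display/APIC/Page+Title" ...>link text</a>
APIC_LINK_RE = re.compile(
    r'<a\s[^>]*href="' + re.escape(APIC_PREFIX) + r'([^"#]+)[^"]*"[^>]*>(.*?)</a>', re.S)

def convert_apic_links(page_html: str) -> str:
    """Rewrite links into the APIC space as markdown links to /doc/lb2/ pages.

    Links to any other destination are left as is.
    """
    def repl(m):
        # "Creating+model+relations" -> "Creating model relations" -> "Creating-model-relations"
        page = unquote_plus(m.group(1)).strip().replace(" ", "-")
        # Collapse the newlines and indentation that surround the link text in the export.
        text = " ".join(m.group(2).split())
        return f"[{text}](/doc/lb2/{page}.html)"
    return APIC_LINK_RE.sub(repl, page_html)
```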

Headings

Convert headings as follows:

Confluence HTML    Markdown
<h2> .. </h2>      ##
<h3> .. </h3>      ###
<h4> .. </h4>      ###
<h5> .. </h5>      ###
<h6> .. </h6>      ###
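The heading mapping above (h2 becomes `##`, h3 through h6 all become `###`) can be sketched as one regex substitution. This assumes headings are not nested inside each other and that any inline tags inside a heading (for example Confluence-generated anchors) can simply be dropped; `convert_headings` is a hypothetical name.

```python
import re

HEADING_RE = re.compile(r"<h([2-6])\b[^>]*>(.*?)</h\1\s*>", re.S | re.I)
TAG_RE = re.compile(r"<[^>]+>")

def convert_headings(page_html: str) -> str:
    """Convert <h2>..<h6> to markdown: h2 -> ##, h3 through h6 -> ###."""
    def repl(m):
        level = min(int(m.group(1)), 3)
        text = " ".join(TAG_RE.sub("", m.group(2)).split())  # drop inline tags, collapse whitespace
        # Surround with newlines: a markdown heading must start at the beginning of a line.
        return "\n" + "#" * level + " " + text + "\n"
    return HEADING_RE.sub(repl, page_html)
```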

Images

We'll copy all the image files (.png files, etc.) into the /images folder. It's not clear that it would be helpful to separate the LB2 images from LB3, etc. It might be easier just to keep all the image files in the same place.
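Copying everything into one flat folder could look like the sketch below. It assumes the export stores images under `attachments/<page id>/` (as in the example `src` further down) and that the listed file extensions cover the images we care about; because flattening could silently overwrite files, it raises an error if two different files share a name. `copy_images` and `IMAGE_SUFFIXES` are hypothetical names.

```python
import shutil
from pathlib import Path
from typing import Dict, List

# Assumed set of image extensions; extend as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".svg"}

def copy_images(attachments_dir: Path, images_dir: Path) -> List[str]:
    """Copy every image under attachments/<page id>/ into one flat images folder.

    Raises ValueError if two files with different contents have the same name.
    """
    images_dir.mkdir(parents=True, exist_ok=True)
    copied: Dict[str, Path] = {}
    for src in sorted(attachments_dir.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        if src.name in copied and copied[src.name].read_bytes() != src.read_bytes():
            raise ValueError(f"name collision: {copied[src.name]} vs {src}")
        copied[src.name] = src
        shutil.copy2(src, images_dir / src.name)
    return sorted(copied)
```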

Convert image tags to the Jekyll template image include.

So, for example, this HTML:

<img class="confluence-embedded-image" height="388" width="700" src="attachments/9634213/9830499.png" data-image-src="attachments/9634213/9830499.png">

Converts to:

{% include image.html file="9830499.png" alt="" %}

NOTE: Most of the image content won't have an alt attribute, but let's add the attribute to the Jekyll include, to make it easier to add it later.
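A sketch of the image conversion, assuming attribute values are double-quoted as in the example and that only the file name is needed because all images end up in one folder. Any existing `alt` text is carried over; otherwise an empty `alt` is emitted. `convert_images` is a hypothetical name.

```python
import re
from pathlib import PurePosixPath

IMG_RE = re.compile(r"<img\b[^>]*>", re.I)
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')

def convert_images(page_html: str) -> str:
    """Replace each <img> tag with the theme's image include."""
    def repl(m):
        attrs = dict(ATTR_RE.findall(m.group()))
        # attachments/9634213/9830499.png -> 9830499.png
        file_name = PurePosixPath(attrs.get("src", "")).name
        alt = attrs.get("alt", "")  # usually empty in the export; kept so it is easy to fill in later
        return f'{{% include image.html file="{file_name}" alt="{alt}" %}}'
    return IMG_RE.sub(repl, page_html)
```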

Other conversions

Code blocks

TBD

Notes, Warnings, Tips, etc.

These Confluence macros convert to Jekyll alerts. Unfortunately, the names are confusingly different.

We use the Confluence macro names here, but what really matters is the div class attribute.

Information

    <div class="aui-message hint shadowed information-macro">
      <span class="aui-icon icon-hint">Icon</span>
        <div class="message-content"> ... Text content ... </div>
    </div>

Converts to:

{% include note.html content="... Text content ..." %}

Tip

    <div class="aui-message success shadowed information-macro">
      <span class="aui-icon icon-success">Icon</span>
        <div class="message-content"> ... Text content ... </div>
    </div>

Converts to:

{% include tip.html content="... Text content ..." %}

Warning

    <div class="aui-message problem shadowed information-macro">
      <span class="aui-icon icon-problem">Icon</span>
        <div class="message-content"> ... Text content ... </div>
    </div>

Converts to:

{% include warning.html content="... Text content ..." %}

Note

    <div class="aui-message warning shadowed information-macro">
      <span class="aui-icon icon-warning">Icon</span>
        <div class="message-content"> ... Text content ... </div>
    </div>

Converts to:

{% include important.html content="... Text content ..." %}
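The four mappings above can be driven by the div class keyword alone. This sketch assumes the macro HTML follows the exact shape shown above and that the message content contains no nested `<div>` (the lazy match stops at the first `</div>`). Double quotes in the content would end the include parameter early, so they are swapped for single quotes, which keeps HTML attributes valid but changes quoted prose; pages affected by this are worth a manual check. Whitespace is also collapsed, which would mangle any preformatted text inside a note. `convert_alerts` and `ALERT_INCLUDES` are hypothetical names.

```python
import re

# Confluence div class keyword -> Jekyll include, per the mapping above.
ALERT_INCLUDES = {
    "hint": "note.html",         # Information macro
    "success": "tip.html",       # Tip macro
    "problem": "warning.html",   # Warning macro
    "warning": "important.html", # Note macro
}

ALERT_RE = re.compile(
    r'<div class="aui-message (hint|success|problem|warning) shadowed information-macro">\s*'
    r'<span class="aui-icon icon-\w+">Icon</span>\s*'
    r'<div class="message-content">(.*?)</div>\s*</div>',
    re.S)

def convert_alerts(page_html: str) -> str:
    """Replace Confluence information macros with the matching Jekyll alert include."""
    def repl(m):
        include = ALERT_INCLUDES[m.group(1)]
        # Collapse whitespace; swap " for ' so the content stays inside content="...".
        content = " ".join(m.group(2).split()).replace('"', "'")
        return f'{{% include {include} content="{content}" %}}'
    return ALERT_RE.sub(repl, page_html)
```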

Macros

TBD


Note

I'm assuming we can convert the HTML to markdown without too much trouble, but I'm keeping this here for reference in case we need it.

If it turns out to be easier, we could instead export to Word and then convert the Word files to markdown. See the question "How can doc/docx files be converted to markdown or structured text?"

Other references:

References: