Migrating LoopBack Docs to Markdown for use with Jekyll

Rand McKinney edited this page Aug 6, 2016 · 16 revisions

GOAL: To create an open-source site similar to Express docs. High-level tasks:

  1. Export the content of the APIC space to HTML. (This space now contains the source documentation for LoopBack, which is duplicated in pages with the same title in the LB space.)
  2. Convert/strip the HTML to markdown using a script.
  3. Add the pages to this repo and edit as necessary.

When the site goes live, replace the Confluence pages with redirects to the new pages here.

NOTE: Although the long-term plan is to have both LoopBack 2.x and 3.0 docs, initially we should focus on 2.x, since 3.0 is not released yet. As the 3.0 release approaches, we can "clone" the 2.x docs into /docs/lb3, and then add/modify the content as needed.

Conversion from HTML to markdown

Title

Article title is in this block:

<h1 id="title-heading" class="pagetitle">
  <span id="title-text">... Article title ... </span>
</h1>

Use the contents of the <span id="title-text"> tag as the value for the title property in the article front-matter.

NOTE: If the title includes a colon character (:), Jekyll requires the title property to be quoted. In the Confluence export, articles with such titles will have file names that are numbers instead of text.
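As a rough illustration of the title step, here is a minimal Python sketch. The function names `extract_title` and `yaml_title` are hypothetical, and the regular expression assumes the span appears exactly as shown above, with `id="title-text"` in double quotes. It is not the actual conversion script.

```python
import html
import re

# Matches the <span id="title-text"> element; DOTALL lets the title
# contents run over more than one line.
TITLE_RE = re.compile(r'<span\s+id="title-text"[^>]*>(.*?)</span>', re.DOTALL)


def extract_title(page_html):
    """Return the article title text, or None if the span is missing."""
    match = TITLE_RE.search(page_html)
    if match is None:
        return None
    # Drop any nested tags, decode entities such as &amp;, and collapse
    # runs of whitespace into single spaces.
    text = re.sub(r"<[^>]+>", "", match.group(1))
    return " ".join(html.unescape(text).split())


def yaml_title(title):
    """Quote the title for front matter when it contains a colon."""
    if ":" in title:
        escaped = title.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return title
```

Titles without a colon pass through unchanged, so only the articles that need quoting get it.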

Front matter

Every markdown file must start with some Jekyll front-matter that looks like this:

---
title: The article title goes here
layout: page
keywords: LoopBack
tags:
sidebar: lb2_sidebar
permalink: /doc/lb2/The-file-name-goes-here.html
summary:
---

NOTE: The three dashes before and after front-matter are required.

In general, we don't have a consistent summary for every article, so we'll leave the summary property blank. Confluence export apparently does not include "labels" data, so we'll also leave the tags property blank. This seems pretty lame on the part of Confluence (Atlassian).
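The front matter could be generated along these lines. This is only a sketch: `front_matter` and its parameters are hypothetical names, and `file_name` is assumed to be the page name without an extension. The property values are copied from the template above, with summary and tags left blank as described.

```python
def front_matter(title, file_name):
    """Build the Jekyll front-matter block for one article.

    `file_name` is assumed to be the page name without an extension,
    for example "Creating-model-relations".
    """
    # Jekyll needs the title quoted when it contains a colon.
    if ":" in title:
        title = '"' + title.replace("\\", "\\\\").replace('"', '\\"') + '"'
    lines = [
        "---",
        f"title: {title}",
        "layout: page",
        "keywords: LoopBack",
        "tags:",
        "sidebar: lb2_sidebar",
        f"permalink: /doc/lb2/{file_name}.html",
        "summary:",
        "---",
    ]
    return "\n".join(lines) + "\n"
```

The returned string would be written at the very top of each markdown file, before the converted article content.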

Article content

The actual article content is in:

<div ...>
... Content here ...
</div>

Everything above and below this, i.e. outside of this tag, can be discarded.

Other stuff that should be discarded

Some pages may have these, which should just be discarded.

Injected CSS

Discard injected CSS: <style type='text/css'>/*<![CDATA[*/ .... /*]]>*/</style>

Confluence-generated TOC

Since our Jekyll theme has its own [automatically generated TOCs](http://idratherbewriting.com/documentation-theme-jekyll/mydoc_pages.html#automatic-mini-tocs), we should discard this HTML (which occurs only in some pages):

...

The class selector rbtoc1470354523244 varies by page.
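A possible way to handle the discardable pieces is sketched below. The function names are hypothetical. The sketch assumes that injected CSS always has the CDATA-wrapped shape shown above, and that TOC class names always follow the pattern "rbtoc" plus digits. It only reports the TOC class names and does not remove the TOC markup, since that depends on the exact HTML.

```python
import re

# Injected CSS of the form <style ...>/*<![CDATA[*/ ... /*]]>*/</style>.
STYLE_RE = re.compile(
    r"<style\b[^>]*>\s*/\*<!\[CDATA\[\*/.*?/\*\]\]>\*/\s*</style>",
    re.DOTALL | re.IGNORECASE,
)

# TOC class names: "rbtoc" followed by digits; the digits vary by page.
TOC_CLASS_RE = re.compile(r"\brbtoc\d+\b")


def strip_injected_css(page_html):
    """Remove every CDATA-wrapped <style> block from the page."""
    return STYLE_RE.sub("", page_html)


def toc_class_names(page_html):
    """Return the distinct rbtoc class names found in the page, sorted."""
    return sorted(set(TOC_CLASS_RE.findall(page_html)))
```

Matching on the `rbtoc` prefix rather than the full class name keeps the check independent of the per-page number.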

Links

We need to process links whose href destination URL begins with https://docs.strongloop.com/display/APIC/ so they link to the new page here instead of the old page. All other links should be left "as is".

Convert

<a href="https://docs.strongloop.com/display/APIC/Creating+model+relations" rel="nofollow">
 Creating model relations
</a>

To

[Creating model relations](/doc/lb2/Creating-model-relations.html)
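The link rewrite could look roughly like this in Python. `convert_apic_links` is a hypothetical name. The sketch assumes the page name in the URL maps to the new file name by replacing each `+` with `-`, as in the example above. Links that carry a query string or fragment are left untouched by this version.

```python
import re

OLD_PREFIX = "https://docs.strongloop.com/display/APIC/"

# Captures the page name after the APIC prefix and the link text.
LINK_RE = re.compile(
    r'<a\s[^>]*?href="' + re.escape(OLD_PREFIX) + r'([^"#?]+)"[^>]*>(.*?)</a>',
    re.DOTALL,
)


def convert_apic_links(page_html):
    """Rewrite APIC links as markdown links to the new /doc/lb2/ pages."""

    def to_markdown(match):
        page_name = match.group(1).replace("+", "-")
        # Collapse the whitespace and newlines around the link text.
        text = " ".join(match.group(2).split())
        return f"[{text}](/doc/lb2/{page_name}.html)"

    return LINK_RE.sub(to_markdown, page_html)
```

Any link whose href does not start with the APIC prefix does not match the pattern and passes through "as is".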

Headings

Convert headings as follows:

| Confluence HTML | Markdown |
|-----------------|----------|
| `<h2> .. </h2>` | `##`     |
| `<h3> .. </h3>` | `###`    |
| `<h4> .. </h4>` | `###`    |
| `<h5> .. </h5>` | `###`    |
| `<h6> .. </h6>` | `###`    |
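A small Python sketch of the heading conversion follows, using the mapping listed above exactly as given. `convert_headings` and `HEADING_PREFIX` are hypothetical names. The sketch assumes headings are not nested inside one another and that any inline tags within a heading can simply be dropped.

```python
import re

# Mapping copied from the list above.
HEADING_PREFIX = {"h2": "##", "h3": "###", "h4": "###", "h5": "###", "h6": "###"}

# Matches <h2>..</h2> through <h6>..</h6>, with optional attributes.
HEADING_RE = re.compile(r"<(h[2-6])\b[^>]*>(.*?)</\1>", re.DOTALL | re.IGNORECASE)


def convert_headings(page_html):
    """Replace HTML headings with markdown headings on their own lines."""

    def to_markdown(match):
        prefix = HEADING_PREFIX[match.group(1).lower()]
        # Strip inline tags and collapse whitespace in the heading text.
        text = " ".join(re.sub(r"<[^>]+>", "", match.group(2)).split())
        return f"\n{prefix} {text}\n"

    return HEADING_RE.sub(to_markdown, page_html)
```

Keeping the mapping in one dictionary makes it easy to adjust if the heading levels are changed later.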

Images

Macros


Note

I'm assuming we can convert the HTML to markdown without too much trouble, but I'm keeping this here for reference in case we need it.

It may turn out to be easier to export to Word and then convert the Word files to markdown. See "How can doc/docx files be converted to markdown or structured text?".
