This repository was archived by the owner on Mar 14, 2024. It is now read-only.

confluence2markdown is built on a broken tokenizer #2

Open
oberlies opened this issue Mar 21, 2017 · 2 comments

Comments

@oberlies

Floby's node-tokenizer is fundamentally broken - see Floby/node-tokenizer#15

This problem is also not resolved in your fork. This is a shame because otherwise this would be a very useful tool...

@pborenstein
Owner

My fork was an ugly hack to get a project finished. (You can see how ugly from the change where I raised the max token length from 128 to 1024 characters.) I haven't really come back to it since then. If you can make it more robust, that would be great.

@oberlies
Author

Well, the issue has been present in Floby's node-tokenizer for years, and until now nobody had even filed a bug report for it. It appears to me that the project is dead, so it's not something I'd want to invest in...

Parsing the old (3.5) Confluence markup is quite complicated. (After all, this is why they abandoned the format.) So I have now switched to an approach that avoids this problem, or rather leaves it to Atlassian:

  1. Create an empty wiki page in a current version of Confluence and save it.
  2. Edit the page again.
  3. Go to Insert > Markup and insert the Confluence markup.
  4. Click on "Open in source editor" to show the content in the new XML-based storage format.
  5. Copy the content to an XML file.
  6. Add the missing header and footer (see highsource/confluence-to-markdown-converter#8, "Conversion fails for (non well-formed) XML obtained from "View Source"").
  7. Convert to markdown using the c2md.xsl XSLT transformation from https://github.com/highsource/confluence-to-markdown-converter
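
Steps 5 and 6 could be scripted roughly as follows. This is only a minimal sketch, not the header and footer from the linked issue: the function names, the `root` element name, and the namespace URIs are all hypothetical placeholders. It assumes the copied content is a fragment that uses the `ac:` and `ri:` prefixes without declaring them, which is why it isn't well-formed on its own.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: placeholder namespace URIs. Replace them with whatever the
# stylesheet you use actually matches on (see the linked issue).
ASSUMED_NAMESPACES = {
    "ac": "http://example.com/confluence/ac",
    "ri": "http://example.com/confluence/ri",
}


def wrap_storage_fragment(fragment, root="root", namespaces=ASSUMED_NAMESPACES):
    """Add a header and footer so the fragment becomes a single XML document.

    `root` is a hypothetical element name chosen for illustration.
    """
    decls = "".join(f' xmlns:{prefix}="{uri}"' for prefix, uri in namespaces.items())
    header = f'<?xml version="1.0" encoding="UTF-8"?>\n<{root}{decls}>\n'
    footer = f"\n</{root}>\n"
    return header + fragment + footer


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


# A made-up fragment in the style of the storage format: two top-level
# elements and an undeclared "ac" prefix.
fragment = (
    "<p>Hello</p>"
    '<ac:structured-macro ac:name="info">'
    "<ac:rich-text-body><p>Note</p></ac:rich-text-body>"
    "</ac:structured-macro>"
)

wrapped = wrap_storage_fragment(fragment)
print(is_well_formed(fragment))
print(is_well_formed(wrapped))
```

The wrapped file can then be handed to an XSLT processor together with c2md.xsl for step 7. Note that wrapping alone won't help if the content contains HTML entities the XML parser doesn't know (such as `&nbsp;`); those would need to be declared or replaced separately.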
