
feat: rewrite markdown fragment anchors to Confluence-native format during conversion#143

Open
geopanther wants to merge 2 commits into iamjackg:master from geopanther:feature/anchor-conversion


Problem

When md2cf uploads Markdown documents, internal fragment links such as [link](#the-concept) keep their GitHub-Flavored Markdown (GFM) anchor format. Confluence generates heading anchors differently, as PageTitle-Heading with spaces and hyphens stripped (for example, #MyGuide-TheConcept for the heading "The Concept" on the page "My Guide"), so these links break silently after upload.
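To make the mismatch concrete, here is a minimal Python sketch of the two naming schemes. The helper names gfm_slug and confluence_anchor are hypothetical. The GFM slug is a simplified approximation of GitHub's algorithm, and the Confluence side follows the rule stated above (spaces and hyphens stripped) rather than any documented Confluence API.

```python
import re


def gfm_slug(heading: str) -> str:
    """Simplified GitHub-style slug (assumption: approximates GitHub's rules)."""
    slug = heading.strip().lower()
    # Keep only word characters, spaces, and hyphens.
    slug = re.sub(r"[^\w\- ]", "", slug)
    # Every space becomes a hyphen.
    return slug.replace(" ", "-")


def confluence_anchor(page_title: str, heading: str) -> str:
    """PageTitle-Heading with spaces and hyphens stripped from each part."""
    def squash(text: str) -> str:
        return re.sub(r"[ \-]", "", text.strip())
    return f"{squash(page_title)}-{squash(heading)}"


print(gfm_slug("The Concept"))                       # the-concept
print(confluence_anchor("My Guide", "The Concept"))  # MyGuide-TheConcept
```

The same heading yields two unrelated strings, which is why a link written against the first one finds nothing once Confluence has generated the second.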

Solution

This PR rewrites Markdown-style fragment anchors to Confluence-native format during conversion, before upload. It is a purely local string transformation that requires no additional API calls.

New module: md2cf/anchor.py

A self-contained module with a single public function rewrite_page_anchors(body, page_title) that:

  1. Extracts headings from the rendered Confluence storage-format HTML
  2. Builds a mapping from GFM slugs (#the-concept) to Confluence anchors (#MyGuide-TheConcept)
  3. Rewrites href="#…" and ac:anchor="…" attributes in the body
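The three steps can be sketched in Python roughly as follows. This is an illustrative sketch, not the code in md2cf/anchor.py: only the name and signature of rewrite_page_anchors(body, page_title) come from this PR, while the regex-based heading extraction and the helpers gfm_slug, squash, extract_headings, and build_mapping are assumptions. It deliberately leaves out the duplicate-heading, URL-encoding, and id- prefix handling listed below.

```python
import html
import re
from typing import Dict, List

# Headings in Confluence storage format, e.g. <h2>The Concept</h2>.
HEADING_RE = re.compile(r"<h([1-6])[^>]*>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
# href="#fragment" or ac:anchor="fragment"; groups: prefix, fragment, closing quote.
FRAGMENT_RE = re.compile(r'(href="#|ac:anchor=")([^"]*)(")')


def gfm_slug(text: str) -> str:
    """Simplified GFM slug: lowercase, drop punctuation, spaces become hyphens."""
    return re.sub(r"[^\w\- ]", "", text.lower()).replace(" ", "-")


def squash(text: str) -> str:
    """Remove spaces and hyphens, as the PR describes for Confluence anchors."""
    return re.sub(r"[ \-]", "", text)


def extract_headings(body: str) -> List[str]:
    """Step 1: plain-text heading titles, in document order."""
    return [
        html.unescape(TAG_RE.sub("", inner)).strip()
        for _level, inner in HEADING_RE.findall(body)
    ]


def build_mapping(headings: List[str], page_title: str) -> Dict[str, str]:
    """Step 2: GFM slug -> Confluence anchor (first heading wins on a collision)."""
    mapping: Dict[str, str] = {}
    for heading in headings:
        mapping.setdefault(gfm_slug(heading), f"{squash(page_title)}-{squash(heading)}")
    return mapping


def rewrite_page_anchors(body: str, page_title: str) -> str:
    """Step 3: rewrite known fragments; unknown fragments are left untouched."""
    mapping = build_mapping(extract_headings(body), page_title)

    def replace(match: re.Match) -> str:
        prefix, fragment, quote = match.groups()
        return prefix + mapping.get(fragment, fragment) + quote

    return FRAGMENT_RE.sub(replace, body)


body = (
    "<h2>The Concept</h2>"
    '<p><a href="#the-concept">see above</a> <a href="#elsewhere">other</a></p>'
)
print(rewrite_page_anchors(body, "My Guide"))
# <h2>The Concept</h2><p><a href="#MyGuide-TheConcept">see above</a> <a href="#elsewhere">other</a></p>
```

Regular expressions are adequate for this sketch because storage-format headings and link attributes are simple and predictable. A production implementation might prefer an XML-aware parser.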

Handles edge cases:

  • Duplicate headings with GFM-style -1, -2 suffixes
  • URL-encoding of special characters (e.g. parentheses in headings)
  • id- prefix for anchors starting with non-alpha characters (Confluence convention)
  • Fragment links that don't match any heading are left untouched
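The first three edge cases can be sketched in isolation. The helpers dedup_slugs and finalize_anchor below are hypothetical and are not functions from md2cf/anchor.py. The first reproduces GitHub's -1, -2 suffixing for repeated slugs, simplified so that it does not guard against a heading whose own slug already ends in such a suffix. It covers only the GFM side; this PR does not describe how the matching Confluence anchors are disambiguated. The second applies the id- prefix and then percent-encodes the result. Both the order of those two operations and which characters Confluence expects to be encoded are assumptions: urllib.parse.quote with safe="" encodes everything except letters, digits, and _.-~.

```python
from typing import Dict, List
from urllib.parse import quote


def dedup_slugs(slugs: List[str]) -> List[str]:
    """GFM-style de-duplication: repeated slugs get -1, -2, ... suffixes."""
    seen: Dict[str, int] = {}
    result = []
    for slug in slugs:
        count = seen.get(slug, 0)
        result.append(slug if count == 0 else f"{slug}-{count}")
        seen[slug] = count + 1
    return result


def finalize_anchor(raw_anchor: str) -> str:
    """Prefix anchors that do not start with a letter, then percent-encode them."""
    if not raw_anchor[:1].isalpha():
        raw_anchor = "id-" + raw_anchor
    return quote(raw_anchor, safe="")


print(dedup_slugs(["intro", "intro", "setup", "intro"]))
# ['intro', 'intro-1', 'setup', 'intro-2']
print(finalize_anchor("MyGuide-Setup(Linux)"))  # MyGuide-Setup%28Linux%29
print(finalize_anchor("2024Roadmap-Goals"))     # id-2024Roadmap-Goals
```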

CLI integration

A new --convert-anchors / --no-convert-anchors flag (enabled by default) controls the feature. The rewriting runs as the last step of pre_process_page(), after the page title is finalized (including any --prefix), ensuring the Confluence anchor prefix matches the actual page title.
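A minimal sketch of the flag wiring and the ordering constraint, assuming an argparse-based CLI. The Page class, the pre_process_page signature, the plain concatenation used for --prefix, and the injected rewrite callable are hypothetical stand-ins, not md2cf's actual code. The point is only that the title is finalized before the anchors are rewritten. On Python 3.9+, argparse.BooleanOptionalAction could generate the --no- form automatically; the explicit pair below also works on older versions.

```python
import argparse
from dataclasses import dataclass
from typing import Callable


@dataclass
class Page:
    """Hypothetical stand-in for md2cf's page object."""
    title: str
    body: str


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="md2cf")
    parser.add_argument("--prefix", default="", help="string prepended to page titles")
    anchors = parser.add_mutually_exclusive_group()
    anchors.add_argument(
        "--convert-anchors", dest="convert_anchors", action="store_true",
        help="rewrite GFM fragment links to Confluence anchors (default)",
    )
    anchors.add_argument(
        "--no-convert-anchors", dest="convert_anchors", action="store_false",
        help="leave fragment links unchanged",
    )
    parser.set_defaults(convert_anchors=True)
    return parser


def pre_process_page(
    page: Page, args: argparse.Namespace, rewrite: Callable[[str, str], str]
) -> Page:
    # Hypothetical: finalize the title first (here, a plain concatenation).
    page.title = f"{args.prefix}{page.title}"
    # ... any other preprocessing steps would run here ...
    # Last step: rewrite anchors against the *final* title.
    if args.convert_anchors:
        page.body = rewrite(page.body, page.title)
    return page


args = build_parser().parse_args(["--prefix", "Team "])
page = pre_process_page(Page("Guide", "body"), args, lambda body, title: f"{body} [{title}]")
print(page.body)  # body [Team Guide]
```

Passing the rewrite function in keeps the sketch self-contained; in the PR, the call would go to rewrite_page_anchors() directly.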

Tests

33 unit tests covering all internal helpers and the public API, including edge cases for empty bodies, non-matching fragments, duplicate headings, URL encoding, and the id- prefix.

Changes

| File | Description |
| --- | --- |
| md2cf/anchor.py | New module: anchor extraction, mapping, and rewriting |
| md2cf/__main__.py | --convert-anchors flag and call in pre_process_page() |
| test_package/unit/test_anchor.py | 33 unit tests |

…ence format

Extracts anchor rewriting logic into md2cf/anchor.py with:
- Markdown-to-Confluence anchor mapping (GFM slug -> PageTitle-Heading)
- Duplicate heading support with GFM-style -1, -2 suffixes
- URL-encoding of special characters in Confluence anchors
- id- prefix for anchors starting with non-alpha characters
- Rewrites both href="#..." and ac:anchor="..." attributes

Includes 33 unit tests covering all helper functions and edge cases.

Integrates rewrite_page_anchors() into pre_process_page(), called after
title finalization (including --prefix) so Confluence anchor prefixes
match the final page title. Enabled by default; opt out with
--no-convert-anchors.