@heathdutton

Fixes #36106

When wiki headings contain HTML elements (like <a name="anchor"></a>), the raw HTML code was appearing verbatim in the table of contents instead of being stripped out.

This fix uses bluemonday.StrictPolicy() to sanitize the heading text before adding it to the ToC, removing all HTML tags while preserving the text content.

Before: ToC displays <a name="asdf"></a> has strange html
After: ToC displays has strange html
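
For illustration, the intended ToC behavior can be sketched without any dependencies. The snippet below is a stdlib-only stand-in, not the bluemonday-based code in this PR: `stripTags` is a hypothetical helper that drops everything between `<` and the next `>` and keeps the remaining text. It is not a real HTML parser and does not decode or escape entities.

```go
package main

import (
	"fmt"
	"strings"
)

// stripTags is a hypothetical stand-in for a strip-all sanitizer.
// It discards every span from '<' up to the next '>' and keeps all
// other characters, then trims surrounding whitespace.
// Limitations: no entity handling, no comment handling, and a '>'
// inside an attribute value ends the tag early.
func stripTags(s string) string {
	var b strings.Builder
	inTag := false
	for _, r := range s {
		switch {
		case r == '<':
			inTag = true
		case r == '>' && inTag:
			inTag = false
		case !inTag:
			b.WriteRune(r)
		}
	}
	return strings.TrimSpace(b.String())
}

func main() {
	// Heading text taken from the Before/After example above.
	fmt.Printf("%q\n", stripTags(`<a name="asdf"></a> has strange html`))
	// prints "has strange html"
}
```

The heading in the rendered document body is untouched by this kind of stripping; only the text copied into the ToC entry is cleaned.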

@GiteaBot GiteaBot added the lgtm/need 2 label Jan 2, 2026
@github-actions github-actions bot added the modifies/go label Jan 2, 2026
@lunny
Member

lunny commented Jan 2, 2026

Could you add some tests?

@heathdutton heathdutton force-pushed the fix/36106-wiki-toc-strip-html branch from f0b0b36 to d737929 Compare January 2, 2026 20:22
@lunny
Member

lunny commented Jan 2, 2026

Should every <a> tag in a heading be ignored, or could we add an option for that?

@heathdutton
Author

Good question! I went with stripping all HTML tags rather than just <a> specifically - the ToC should really just show plain text for navigation, so things like <span>, <strong>, etc. wouldn't make sense there either.

The heading itself still renders with the HTML in the document body, so anchor links like <a name="anchor"> still work fine for linking purposes.

I think adding an option would be overkill for this - can't think of a case where someone would actually want raw HTML showing up in their ToC. But happy to discuss if you see it differently!

@delvh
Member

delvh commented Jan 3, 2026

can't think of a case where someone would actually want raw HTML showing up in their ToC

What about case <a> and <b>?

@delvh
Member

delvh commented Jan 3, 2026

So, I think 'raw HTML' is useful when the text is only accidentally HTML.
And that can most certainly happen in titles.
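
The concern can be made concrete with a small sketch. It assumes a naive strip-all approach, modeled here by a hypothetical regular expression (`tagLike`) rather than by the sanitizer used in the PR; whether the actual sanitizer treats such a title identically has not been verified here. A title that mentions <a> and <b> as literal text loses both:

```go
package main

import (
	"fmt"
	"regexp"
)

// tagLike matches anything that merely looks like a tag: '<', any run
// of characters other than '>', then '>'. Hypothetical stand-in only.
var tagLike = regexp.MustCompile(`<[^>]*>`)

// tocText returns the heading text with every tag-like span removed.
func tocText(heading string) string {
	return tagLike.ReplaceAllString(heading, "")
}

func main() {
	// The title's author meant the literal words "<a>" and "<b>", not markup.
	fmt.Printf("%q\n", tocText("case <a> and <b>"))
	// prints "case  and "
}
```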

@heathdutton heathdutton force-pushed the fix/36106-wiki-toc-strip-html branch from d737929 to a1c7525 Compare January 3, 2026 01:22
@heathdutton
Author

Good edge case to think about! I tested this and the fix handles it correctly:

## <a href="link">Click</a> and <b>Bold</b>

ToC shows: Click and Bold

The HTML tags get stripped but the text content inside them is preserved - which is exactly what we want for a readable ToC.

I've added test cases covering this scenario. Also verified that code spans (text wrapped in backticks) are handled separately by the markdown parser and aren't affected.

@heathdutton heathdutton force-pushed the fix/36106-wiki-toc-strip-html branch from e211fab to 0b84de7 Compare January 3, 2026 03:16
@wxiaoguang wxiaoguang force-pushed the fix/36106-wiki-toc-strip-html branch from 0b84de7 to e211fab Compare January 3, 2026 03:17
@wxiaoguang
Contributor

wxiaoguang commented Jan 3, 2026

I fixed the tests; they need to clearly assert what we want.

And we can see that the result doesn't seem good.

By the way, there is no need to rebase or force-push; see the contribution guidelines: https://github.com/go-gitea/gitea/blob/main/CONTRIBUTING.md#maintaining-open-prs

[image]

@heathdutton
Author

Roger that, I'll step back.

Development

Successfully merging this pull request may close these issues.

html is copied verbatim to ToC in wiki

5 participants