refactor: parse markdown on the server for content pages by eatyourgreens · Pull Request #411 · OxfordRSE/gutenberg

eatyourgreens · 2025-11-05T16:26:06Z

Add a markdown2Html helper that converts markdown to HTML with remark.
Add a material2Html helper that runs markdown2Html with the appropriate Remark plugins for course material.
Add a html2React helper that converts HTML strings to a React component tree with rehype-react.
Parse markdown in getStaticProps and add a html prop to teaching material pages.
Update the event API to add a html property to events and event groups.
Update Content so that it accepts html as a prop, instead of markdown.
Update the Markdown component to use /lib/markdown.
Parse HTML strings client-side with rehype and add interactivity.

eatyourgreens · 2025-11-05T20:31:34Z

I think this will be my experiment for the hack day on Friday. It builds without errors but I’m not sure that it works.

- Add a `markdown2Html` helper that converts markdown to HTML with `remark`.
- Add a `material2Html` helper that runs `markdown2Html` with the appropriate Remark plugins for course material.
- Add a `html2React` helper that converts HTML strings to a React component tree with `rehype-react`.
- Parse markdown in `getStaticProps` and add a `html` prop to teaching material pages.
- Update the event API to add a `html` property to events and event groups.
- Update `Content` so that it accepts `html` as a prop, instead of `markdown`.
- Update the `Markdown` component to use `/lib/markdown`.
- Parse HTML strings client-side with `rehype` and add interactivity.

alasdairwilson · 2025-11-07T11:51:04Z

I had assumed the primary purpose of this was to reduce the props, but it only reduces `__NEXT_DATA__` from 290 kB to 274 kB.

This is more an indication of how disgusting the size of the props is. This is some of the very oldest code in gutenberg, alongside the NLP stuff (easily predating me), and I think the intention was that the actual markdown would be removed from any further prop drilling, but obviously it always needed to be present to rehydrate the content in the Content component.

In terms of page load, there has been no difference in hydration speed. In both cases the SSG page is fully loaded near-instantly, and in my tests the JS is done in 600 ms with the current re-parsing into React and 605 ms with this one, so functionally identical. In both cases the SSG page is also completely sufficient before the JS loads, which is the whole point of Next really.

In terms of problems: if this is a good change to make, then there needs to be some kind of effort to sanitise the HTML. The existing code sanitises with react-markdown, but this would presumably just fully include <script> tags or embedded frames? While we have some kind of protection for our own deploy via the markdown repo, it isn't nearly as strong as the protection on the gutenberg app, and other deployments may point to sources they don't control and could have arbitrary code injected.

I am also concerned about turning HTML into React components: our process becomes markdown to HTML to React as well as markdown to React, and both of our React DOMs have to be consistent. It is significantly more confusing, and I'm worried about whether functionality we add to components will just always work, which the current solution is almost guaranteed to do.
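A rough way to reproduce the props-size comparison above is to measure the byte length of the serialised props for each variant. This is a minimal sketch: `serialisedBytes` and both sample prop objects are hypothetical and not taken from the gutenberg codebase.

```typescript
// Byte length of a value once serialised to JSON and encoded as UTF-8,
// which approximates what ends up embedded in the page.
function serialisedBytes(value: unknown): number {
  return new TextEncoder().encode(JSON.stringify(value)).length;
}

// Hypothetical page props: one variant carries raw markdown,
// the other carries pre-rendered HTML.
const markdownProps = { markdown: "# Title\n\nSome *text*." };
const htmlProps = { html: "<h1>Title</h1>\n<p>Some <em>text</em>.</p>" };

console.log("markdown props bytes:", serialisedBytes(markdownProps));
console.log("html props bytes:", serialisedBytes(htmlProps));
```

Running the same function over the real page props before and after the change would show where the remaining payload comes from.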

eatyourgreens · 2025-11-07T12:26:00Z

This is the same markdown processor that react-markdown uses internally, I've just broken it up into two pieces:

Markdown to HTML with Remark.
HTML to React with Rehype.

Remark should sanitise output by default, but I'll double-check that.

It's loosely based on similar work I did for Zooniverse's Markdown parser a couple of years ago: separating the markdown parser from the custom React component processor so that they can be run separately.

alasdairwilson · 2025-11-07T12:34:29Z

Yeah that is fine if it is still safe, just obviously r-m will ignore <script> or <iframe> or whatever but if remark does not explicitly strip those then does rehype?

I mean, I know it isn't setting `dangerouslySetInnerHTML`, it's just worth checking.

eatyourgreens · 2025-11-07T14:35:09Z

> Yeah that is fine if it is still safe, just obviously r-m will ignore <script> or <iframe> or whatever but if remark does not explicitly strip those then does rehype?

Good question. I think so, but it's been a couple of years since I've worked with Rehype in detail.

React Markdown runs the Remark output through remark-rehype, then any custom Rehype plugins that you passed in:
https://github.com/remarkjs/react-markdown/blob/fda7fa560bec901a6103e195f9b1979dab543b17/lib/index.js#L269-L273

I assumed that remark-rehype sanitises output, but I'm not 100% sure about that.
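To make the sanitisation question concrete, here is a toy sketch of what stripping `<script>` and `<iframe>` from an HTML syntax tree involves. Everything in it is an assumption for illustration: the node types are a simplified stand-in for a rehype tree, and `stripBlocked`, `toHtml` and `BLOCKED_TAGS` are hypothetical names, not part of remark, rehype or this PR. In a real pipeline a maintained plugin such as `rehype-sanitize` would be the more likely choice, and this is not its API.

```typescript
// Simplified stand-in for an HTML syntax tree; real rehype nodes carry
// more fields (properties, position, ...).
type TextNode = { type: "text"; value: string };
type ElementNode = { type: "element"; tagName: string; children: TreeNode[] };
type TreeNode = TextNode | ElementNode;

// Hypothetical blocklist, for illustration only. An allowlist is the safer design.
const BLOCKED_TAGS = new Set(["script", "iframe"]);

// Return a copy of the tree with blocked elements and their subtrees removed.
function stripBlocked(nodes: TreeNode[]): TreeNode[] {
  const kept: TreeNode[] = [];
  for (const node of nodes) {
    if (node.type === "text") {
      kept.push(node);
    } else if (!BLOCKED_TAGS.has(node.tagName.toLowerCase())) {
      kept.push({ ...node, children: stripBlocked(node.children) });
    }
  }
  return kept;
}

// Escape characters that would otherwise be read as markup.
function escapeText(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Serialise the toy tree back to an HTML string (attributes are not modelled).
function toHtml(nodes: TreeNode[]): string {
  return nodes
    .map((node) =>
      node.type === "text"
        ? escapeText(node.value)
        : `<${node.tagName}>${toHtml(node.children)}</${node.tagName}>`
    )
    .join("");
}

const untrusted: TreeNode[] = [
  {
    type: "element",
    tagName: "p",
    children: [
      { type: "text", value: "Hello" },
      { type: "element", tagName: "script", children: [{ type: "text", value: "alert(1)" }] },
    ],
  },
  { type: "element", tagName: "iframe", children: [] },
];

console.log(toHtml(stripBlocked(untrusted))); // prints "<p>Hello</p>"
```

The useful check for this PR is whether some step between the markdown source and the rendered React tree does the equivalent of `stripBlocked`, since the HTML string now crosses the server/client boundary.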

eatyourgreens force-pushed the refactor-markdown-parsing branch 4 times, most recently from 36e89fc to 95516c9, on November 5, 2025 17:08

eatyourgreens force-pushed the refactor-markdown-parsing branch 14 times, most recently from 43e929a to f1764f2, on November 6, 2025 17:32

eatyourgreens force-pushed the refactor-markdown-parsing branch from f1764f2 to 552f519 on November 6, 2025 17:45

eatyourgreens added the experiment label Nov 7, 2025

eatyourgreens wants to merge 1 commit into main from refactor-markdown-parsing
