Skip to content

refactor: parse markdown on the server for content pages#411

Draft
eatyourgreens wants to merge 1 commit intomainfrom
refactor-markdown-parsing
Draft

refactor: parse markdown on the server for content pages#411
eatyourgreens wants to merge 1 commit intomainfrom
refactor-markdown-parsing

Conversation

@eatyourgreens
Copy link
Copy Markdown
Contributor

@eatyourgreens eatyourgreens commented Nov 5, 2025

  • Add a markdown2Html helper that converts markdown to HTML with remark.
  • Add a material2Html helper that runs markdown2Html with the appropriate Remark plugins for course material.
  • Add a html2React helper that converts HTML strings to a React component tree with rehype-react.
  • Parse markdown in getStaticProps and add a html prop to teaching material pages.
  • Update the event API to add a html property to events and event groups.
  • Update Content so that it accepts html as a prop, instead of markdown.
  • Update the Markdown component to use /lib/markdown.
  • Parse HTML strings client-side with rehype and add interactivity.

@eatyourgreens eatyourgreens force-pushed the refactor-markdown-parsing branch 4 times, most recently from 36e89fc to 95516c9 Compare November 5, 2025 17:08
@eatyourgreens
Copy link
Copy Markdown
Contributor Author

I think this will be my experiment for the hack day on Friday. It builds without errors but I’m not sure that it works.

@eatyourgreens eatyourgreens force-pushed the refactor-markdown-parsing branch 14 times, most recently from 43e929a to f1764f2 Compare November 6, 2025 17:32
- Add a `markdown2Html` helper that converts markdown to HTMl with `remark`.
- Add a `material2Html` helper that runs `markdown2Html` with the appropriate Remark plugins for course material.
- Add a `html2React` helper that converts HTML strings to a React component tree with `rehype-react`.
- Parse markdown in `getStaticProps` and add a `html` prop to teaching material pages.
- Update the event API to add a `html` property to events and event groups.
- Update `Content` so that it accepts `html` as a prop, instead of `markdown`.
- Update the `Markdown` component to use `/lib/markdown`.
- Parse HTML strings client-side with `rehype` and add interactivity.
@eatyourgreens eatyourgreens force-pushed the refactor-markdown-parsing branch from f1764f2 to 552f519 Compare November 6, 2025 17:45
@alasdairwilson
Copy link
Copy Markdown
Member

I had assumed the primary purpose of this is to reduce the props, it only reduces NEXT_DATA from 290kb to 274kb,

This is more a indication of how disgusting the size the props is, this is some of the very oldest code on gutenberg alongside the nlp stuff (easily predating me) and I think the intention was that the actual markdown was removed from any further drilling but obviously it was always needed to be present to rehydrate the content on the content component.

In terms of page load, there has been no difference in the hydration speed, obviously in both cases the SSG page is fully loaded near instantly and the js in my tests is done in 600ms with the re-parsing the react and 605 with this one, so functionally identical. In both cases as well the SSG page is completely sufficient prior to the JS loading in as well, the whole point of next really.

In terms of problems: I think if this is a good idea to change to then there needs to be some kind of effort to sanitise the html though, the existing code santises with react markdown but this would presumably just fully include <script> tags or embedded frames? While we have some kiknd of protection of our own deploy via the markdown repo it isnt nearly as strong as the protection on the gutenberg app and other deployments may point to sources they dont control and could have arbitrary code injected.

I also am concerned about turning html into react components so our process is becoming markdown to html to react and also markdown to react and both our react doms have to be consistent. Like it is significantly more confusing and i'm worried if we add functionality to components then will it just always work, which the current solution is almost guaranteed to do.

@eatyourgreens
Copy link
Copy Markdown
Contributor Author

This is the same markdown processor that react-markdown uses internally, I've just broken it up into two pieces:

  • Markdown to HTML with Remark.
  • HTML to React with Rehype.

Remark should sanitise output by default, but I'll double-check that.

It's loosely based on similar work I did for Zooniverse's Markdown parser a couple of years ago: seperating the markdown parser from the custom React component processor so that they can be run separately.

@alasdairwilson
Copy link
Copy Markdown
Member

Yeah that is fine if it is still safe, just obviously r-m will ignore <script> or <iframe> or whatever but if remark does not explicitly strip those then does rehype?

I mean I know it isnt setting dangerously innerhtml its just worth checking.

@eatyourgreens
Copy link
Copy Markdown
Contributor Author

Yeah that is fine if it is still safe, just obviously r-m will ignore <script> or <iframe> or whatever but if remark does not explicitly strip those then does rehype?

Good question. I think so, but it's been a couple of years since I've worked with Rehype in detail.

React Markdown runs the Remark output through remark-rehype, then any custom Rehype plugins that you passed in:
https://github.com/remarkjs/react-markdown/blob/fda7fa560bec901a6103e195f9b1979dab543b17/lib/index.js#L269-L273

I assumed that remark-rehype sanitises output, but I'm not 100% sure about that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants