Skip to content

Conversation

TartanLlama
Copy link

Adds an HTML rewriter feature with an interface mostly the same as Akamai's html-rewriter. Uses the lol-html library under the hood.

A few potentially questionable design decisions which we may want to go a different way on:

  • I copied the source of lol-html's C API into this repo. This was done for a few reasons:
    • The upstream version uses a single function call that is from a newer rustc version that we build with, so I patched that change.
    • The C API crate is in a subdirectory of the main lol-html repo and is not published on crates.io.
    • The upstream version specifies a different lib.name in its Cargo.toml, which is incompatible with StarlingMonkey's add_rust_lib function, so I patched that too.
    • I think ideally we'd change the version of rustc we compile with (which is a change made in upstream StarlingMonkey already), patch add_rust_lib to support custom lib.names, then grab the C API crate through cargo with a simple wrapper library, similar to the existing crates in StarlingMonkey. Alternatively, we could make a fork of lol-html and point at that instead.
  • I did not implement the insert_implicit_close feature of Akamai's API, because it seems less important and I don't see a simple way to do this from lol-html. I also extended their API with a escapeHTML option for all insertion functions, which allows inserting HTML content as text.
  • I added the types for the rewriter to the global object, mostly because I couldn't seem to get it working with solely having them in fastly:html-rewriter. Ideally someone who knows their way around the module system better can tell me what to do here 😄

headers: { 'Content-Type': 'text/html' },
}).text();
strictEqual(textEscape, expectedEscape);
});
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a test that makes sure Content-Type directives also work ok? As in, that they're still correctly detected as HTML?

Additionally, is there any way to force this to run for, e.g. xhtml, xml, or other things that might technically be parseable? I guess you're expected to just wrap it in new Response(incomingBody, { headers: { 'Content-Type': 'text/html' } })?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rewriter isn't affected by the Content-Type headers, so I'm not sure we need that here, but happy to add it if you still think it's worthwhile!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, if it doesn't care about the content-type header, then that's fine. Since you were using it in every test, I assumed that meant the rewriter wouldn't fire unless it detected HTML content type.

Copy link

@zkat zkat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants