Optimization Detective's XPath expressions in HTML can be erroneously interpreted as JS/CSS comments #1947

Open
@westonruter

Description

Bug Description

I just discovered that certain plugins implement HTML compression that collapses whitespace and strips JS/CSS comments using regular expressions. For example (do not do this):

$buffer = preg_replace( '@\/\*(.*?)\*\/@s', ' ', $buffer );

This is dangerous and will often result in corrupted HTML markup.

This has the effect of turning this:

<p>This is an XPath: <code>/HTML/BODY/*[1][self::DIV]</code></p>

<p>This is a Script:</p>

<pre class="wp-block-code"><code>&lt;script>
/* example script */
&lt;/script></code></pre>

Into:

<p>This is an XPath: <code>/HTML/BODY &lt;/script&gt;</code></pre>

This happens because the regular expression treats the /* at the end of /HTML/BODY/* in the first paragraph as the start of a comment, and then ends that "comment" at the first */ it finds, which is inside the code sample in the third block. Everything in between is replaced with a single space.
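The corruption described above can be reproduced with a small sketch. It uses a JavaScript port of the PHP pattern rather than the plugin's actual code, on the assumption that the two regex engines behave the same for this input (lazy match, dot matching newlines, all matches replaced). The names stripComments, input and output are made up for illustration.

```javascript
// JavaScript port of the PHP pattern '@\/\*(.*?)\*\/@s'.
// The "s" flag lets "." match newlines; "g" mirrors preg_replace() replacing every match.
const stripComments = (html) => html.replace(/\/\*(.*?)\*\//gs, ' ');

// The markup from the issue, one array entry per line.
const input = [
  '<p>This is an XPath: <code>/HTML/BODY/*[1][self::DIV]</code></p>',
  '',
  '<p>This is a Script:</p>',
  '',
  '<pre class="wp-block-code"><code>&lt;script>',
  '/* example script */',
  '&lt;/script></code></pre>',
].join('\n');

const output = stripComments(input);

// The match starts at the "/*" inside the XPath and ends at the "*/" in the code sample,
// so the closing </code></p>, the second paragraph and the opening <pre><code> are all lost.
console.log(output);
```

The result still contains the opening p and code tags from the first paragraph but closes with the code and pre tags from the last block, so the markup is no longer balanced.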

This will break pages that contain data-od-xpath attributes which Optimization Detective adds when detection is needed.

Granted, such regular expression logic will invariably cause many more problems than this one. Still, we might want to consider hardening the data-od-xpath attribute against it, for example by base64-encoding the value with base64_encode() in PHP. It could then be converted back to the non-encoded form in JavaScript via atob().
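A minimal sketch of why the encoding would help: the base64 alphabet (A-Z, a-z, 0-9, +, / and = for padding) has no asterisk, so an encoded value can never contain /* or */. In the sketch, btoa() stands in for PHP's base64_encode(), which is fine here because the XPath is plain ASCII. The markup shape, the style block and all function names are hypothetical and not taken from Optimization Detective.

```javascript
// Stand-in for the server side; the issue proposes base64_encode() in PHP.
const encodeXPath = (xpath) => btoa(xpath);
// Client side, as proposed in the issue.
const decodeXPath = (encoded) => atob(encoded);

// Same naive comment stripper as the plugin pattern quoted in the issue (JavaScript port).
const naiveStrip = (html) => html.replace(/\/\*(.*?)\*\//gs, ' ');

// Hypothetical page: an element tagged for detection, followed by a real CSS comment.
const buildPage = (attrValue) =>
  `<div data-od-xpath="${attrValue}"></div>\n<style>/* theme styles */ p { margin: 0 }</style>`;

const xpath = '/HTML/BODY/*[1][self::DIV]';
const encoded = encodeXPath(xpath);

// Raw attribute: the "/*" in the XPath opens a bogus comment that runs into the <style> block.
const rawSurvives = naiveStrip(buildPage(xpath)).includes(xpath);
// Encoded attribute: only the genuine CSS comment is removed.
const encodedSurvives = naiveStrip(buildPage(encoded)).includes(encoded);

console.log(rawSurvives, encodedSurvives, decodeXPath(encoded) === xpath);
// logs: false true true
```

The trade-off is that the attribute is no longer human-readable in the page source, and every consumer has to decode it before use.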

Just something to consider.

Metadata

Labels

Needs Discussion: Anything that needs a discussion/agreement
[Plugin] Optimization Detective: Issues for the Optimization Detective plugin
[Type] Bug: An existing feature is broken

Projects

Status

Not Started/Backlog 📆

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
