Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Markdown] [Web/API] Convert webapi to markdown (DO NOT SQUASH MERGE) #8886

Merged
merged 16 commits into from
Sep 14, 2021

Conversation

wbamberg
Copy link
Collaborator

Fixes #8741 .

This PR converts all the docs under Web/API into Markdown.

@wbamberg wbamberg requested a review from a team as a code owner September 13, 2021 17:21
@wbamberg wbamberg requested review from sideshowbarker and removed request for a team September 13, 2021 17:21
@wbamberg wbamberg changed the title Convert webapi to markdown (DO NOT SQUASH MERGE) [Markdown] [Web/API] Convert webapi to markdown (DO NOT SQUASH MERGE) Sep 13, 2021
@wbamberg
Copy link
Collaborator Author

Here's a Google sheet listing 2000 randomly selected pages under Web/API: https://docs.google.com/spreadsheets/d/1kKeUNNZlxaLUV2KsDtnflHNedm9Jm2EwDvdgb64QnPw/edit#gid=0 . I've shared with possible reviewers.

The purpose of this is just to ensure different people review different pages.

To use it:

  • put your name beside some set of pages, in the "Reviewer" column
  • check out this branch, and run yarn start
  • open the pages you claimed, and see how they look. Look especially for mangled formatting and live samples that don't work properly.
  • you could also check the Markdown source, although I think that is less likely to be a problem (since the conversion report tells us that all expected elements got converted)
  • capture any problems as comments to this PR

Thanks!

@teoli2003
Copy link
Contributor

@wbamberg : There is something strange with http://localhost:5000/en-US/docs/Web/API/Document_object_model/Using_the_W3C_DOM_Level_1_Core/Example. It gets a new flaw: unsafe_html…

@wbamberg
Copy link
Collaborator Author

@wbamberg : There is something strange with http://localhost:5000/en-US/docs/Web/API/Document_object_model/Using_the_W3C_DOM_Level_1_Core/Example. It gets a new flaw: unsafe_html…

The converter seems to be converting something like:

<pre> &lt;html&gt;
...
</pre>

...into:

<html>
...

...which is... surprising?

@wbamberg
Copy link
Collaborator Author

It seems like if there's a <pre> block containing HTML, and there are no elements before it, then it's getting converted as raw HTML.

files/en-us/web/api/document_object_model/using_the_w3c_dom_level_1_core/example/index.html is the only page in Web/API whose first element is a <pre> .

I wonder if this is connected with the converter using indentation for plain <pre> blocks rather than backticks (and then the indentation is getting destroyed somehow).

I can fix this by adding class="brush: html" to the <pre> block. I'm going to wait to see whether any other issues are found in these pages before making that fix though.

@hamishwillee
Copy link
Collaborator

hamishwillee commented Sep 14, 2021

A few notes on problems that are not to do with conversion:

@sideshowbarker
Copy link
Member

sideshowbarker commented Sep 14, 2021

http://localhost:5000/en-US/docs/web/api/offscreencanvas/converttoblob has this:

**`OffscreenCanvas.convertToBlob()`**method

…because the source has this:

The <strong><code>OffscreenCanvas.convertToBlob()</code></strong>method

That HTML renders as expected but the lack of a space in `**method causes the markdown to not render as expected.

What should I do for these kinds of cases? Ignore it?

I could do a regex replace across all files/en-us/web/api/ files in this branch to do s/\*\*(\w)/** $1/ — and also s/(\w)**/$1 \*\*/ — and that would make the markdown render as expected.

rg -P '`\*\*\w' files/en-us/web/api/ | wc -l
      14

So that’s 14 cases of the pattern like convertToBlob()**method`

$ rg -P '\w\*\*`' files/en-us/web/api/
files/en-us/web/api/bluetoothremotegattcharacteristic/writevaluewithresponse/index.md
15:The**`BluetoothRemoteGATTCharacteristic.writeValueWithResponse()`** method sets a {{domxref("BluetoothRemoteGATTCharacteristic")}} object’s `value` property to the bytes contained in a given {{JSxRef("ArrayBuffer")}}, calls [`WriteCharacteristicValue`(_this_=`this`, _value=value_, _response_=`"required"`)](https://webbluetoothcg.github.io/web-bluetooth/#writecharacteristicvalue), and returns the resulting {{JSxRef("Promise")}}.

files/en-us/web/api/bluetoothremotegattcharacteristic/writevalue/index.md
19:The**`BluetoothRemoteGATTCharacteristic.writeValue()`** method sets a {{domxref("BluetoothRemoteGATTCharacteristic")}} object’s `value` property to the bytes contained in a given {{JSxRef("ArrayBuffer")}}, calls [`WriteCharacteristicValue`(_this_=`this`, _value=value_, _response_=`"optional"`)](https://webbluetoothcg.github.io/web-bluetooth/#writecharacteristicvalue), and returns the resulting {{JSxRef("Promise")}}.

files/en-us/web/api/rtcpeerconnection/icegatheringstate/index.md
15:The read-only property**`RTCPeerConnection.iceGatheringState`** returns a string

So that’s 3 cases of the flip-side pattern like **`BluetoothRemoteGATTCharacteristic.

If we want, I can push a commit with those fixes to this branch.

Or we could also just change this in a follow-up PR post-merge.

It’s not strictly a problem with the conversion — instead it’s a problem with broken markup in the source (garbage in, garbage out…)

@wbamberg
Copy link
Collaborator Author

Great catch @sideshowbarker ! If you could push a fix for these 17 cases to this PR I think that would be ideal. (I assume you mean fix the converted Markdown, not fix the upstream HTML?)

@sideshowbarker
Copy link
Member

(I assume you mean fix the converted Markdown, not fix the upstream HTML?)

Yup, I meant fixing the Markdown output.

Will run into and write up a commit.

sideshowbarker and others added 2 commits September 14, 2021 12:40
This change fixes some cases where, because the HTML source markup
lacked a space before a <strong> start tag or after a </strong> end tag,
the resulting markdown output didn’t render as expected.
@wbamberg
Copy link
Collaborator Author

It seems like if there's a <pre> block containing HTML, and there are no elements before it, then it's getting converted as raw HTML.

files/en-us/web/api/document_object_model/using_the_w3c_dom_level_1_core/example/index.html is the only page in Web/API whose first element is a <pre> .

I wonder if this is connected with the converter using indentation for plain <pre> blocks rather than backticks (and then the indentation is getting destroyed somehow).

I can fix this by adding class="brush: html" to the <pre> block. I'm going to wait to see whether any other issues are found in these pages before making that fix though.

I just pushed a fix for this "unsafe HTML" issue.

@wbamberg
Copy link
Collaborator Author

I've found a few places where conversion of <kdb> elements inside <dd> was getting mangled, and have (I think) found and fixed all of these in wbamberg@830104e.

@wbamberg
Copy link
Collaborator Author

I found one <dd> containing just a space, which the converter did not make sense of. I fixed that in d52e248 but couldn't find any other instances of this issue.

@sideshowbarker
Copy link
Member

http://localhost:5000/en-US/docs/web/api/document/createelement#parameters has this:

- _options _{{optional_inline}}

…because the HTML it was converted from has this:

 <dt><var>options </var>{{optional_inline}}</dt>

I’ll try to see if I can put together a regex pattern to catch any other possible cases of that (without catching a bunch of false positives)

@teoli2003
Copy link
Contributor

In http://localhost:5000/en-US/docs/web/api/webglrenderbuffer, there was a <code>missing at the beginning. Do you want me to fix the output here or in a follow-up (not a conversion problem per se)?

@wbamberg
Copy link
Collaborator Author

In http://localhost:5000/en-US/docs/web/api/webglrenderbuffer, there was a <code>missing at the beginning. Do you want me to fix the output here or in a follow-up (not a conversion problem per se)?

I think this is strictly optional. On the one hand it's a small low-risk change that's a definite improvement. On the other it's out of scope for this PR, isn't a regression from the HTML and is one of probably many bits of broken markup (we have fixed many in this work, but I'm sure there are many left).

@teoli2003
Copy link
Contributor

I think this is strictly optional. On the one hand it's a small low-risk change that's a definite improvement. On the other it's out of scope for this PR, isn't a regression from the HTML and is one of probably many bits of broken markup (we have fixed many in this work, but I'm sure there are many left).

I will do follow-ups afterward then.

This change fixes a bunch of cases of misplaced-space borkage found by
markdownlint.
This change simplifies a few cases where the source was using emphasis
inside code. It simplifies them by dropping the emphasis.
This change fixes some cases where the ">" blockquote marker ended up in
the middle of link text of a hyperlink — which caused markdownlint to
complain.
sideshowbarker and others added 5 commits September 14, 2021 16:57
This change fixes a few cases with superfluous/redundant emphasis markup
— as well as one case where for some reason the converter failed to
convert some <em>...</em> marked to markdown.
This fixes a case where apparently the source has both a macro call and
an HTML hyperlink for the same thing. So this just drops the hyperlink
and keeps the macro call.
This fixes a few cases where the sources had excessive code markup that
didn’t serve a real informational purpose (and that made markdownlint
confused). It just drops or tweaks some of the markdown.
@wbamberg
Copy link
Collaborator Author

wbamberg commented Sep 14, 2021

We've reviewed the 2000 pages in the spreadsheet, about 1/3 of the total number of pages. We've found a smattering of small issues, mostly to do with badly formed inline markup. We've tried to make generalized fixes across the Web/API docs (especially @sideshowbarker !) I think there's a good chance that there will be a few more of these little issues in the docs, but I don't think it's likely that there are big problems with this conversion, so I think we should merge it.

@sideshowbarker
Copy link
Member

I think there's a good chance that there will be a few more of these little issues in the docs, but I don't think it's likely that there are big problems with this conversion, so I think we should merge it.

+1 — I think we can be confident at this point that we’ve scrutinized everything pretty thoroughly

@wbamberg wbamberg merged commit 0a16b27 into mdn:main Sep 14, 2021
@wbamberg wbamberg deleted the convert-webapi-to-markdown branch September 28, 2021 04:00
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Markdown] Convert Web/API to Markdown
4 participants