[Markdown] [Web/API] Convert webapi to markdown (DO NOT SQUASH MERGE) #8886
Here's a Google Sheet listing 2000 randomly selected pages under Web/API: https://docs.google.com/spreadsheets/d/1kKeUNNZlxaLUV2KsDtnflHNedm9Jm2EwDvdgb64QnPw/edit#gid=0. I've shared it with possible reviewers. The purpose of this is just to ensure that different people review different pages. To use it:
Thanks!
@wbamberg: There's something strange with http://localhost:5000/en-US/docs/Web/API/Document_object_model/Using_the_W3C_DOM_Level_1_Core/Example. It gets a new flaw: `unsafe_html`…
The converter seems to be converting something like:
...into:
...which is... surprising?
It seems like if there's a … `files/en-us/web/api/document_object_model/using_the_w3c_dom_level_1_core/example/index.html` is the only page in Web/API whose first element is a … I wonder if this is connected with the converter using indentation for plain … I can fix this by adding …
A few notes on problems that are not to do with conversion:
http://localhost:5000/en-US/docs/web/api/offscreencanvas/converttoblob has this: `` **`OffscreenCanvas.convertToBlob()`**method `` …because the source has this: `The <strong><code>OffscreenCanvas.convertToBlob()</code></strong>method`. That HTML renders as expected, but the lack of a space in … What should I do for these kinds of cases? Ignore them? I could do a regex replace across all …
So that’s 14 cases of the pattern like …
So that’s 3 cases of the flip-side pattern like … If we want, I can push a commit with those fixes to this branch, or we could just change this in a follow-up PR after the merge. It’s not strictly a problem with the conversion; it’s a problem with broken markup in the source (garbage in, garbage out…).
Great catch, @sideshowbarker! If you could push a fix for these 17 cases to this PR, I think that would be ideal. (I assume you mean fixing the converted Markdown, not the upstream HTML?)
Yup, I meant fixing the Markdown output. Will look into it and write up a commit.
This change fixes some cases where, because the HTML source markup lacked a space before a `<strong>` start tag or after a `</strong>` end tag, the resulting Markdown output didn’t render as expected.
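For illustration, here is a minimal Python sketch of the kind of regex fix discussed above; it is not the actual commit. Under the CommonMark flanking rules, a closing `**` that follows punctuation (here the code span's closing backtick) and is immediately followed by a letter can't close the emphasis, and the mirror case applies to an opening `**` preceded by a letter, so the bold doesn't render. The function name `fix_strong_spacing` is hypothetical, and limiting the match to bold code spans (`` **`...`** ``, the shape produced from `<strong><code>...</code></strong>`) is an assumption meant to keep false positives down.

```python
import re

# Hypothetical helper, not part of the actual converter: inserts the space the
# HTML source was missing around <strong><code>...</code></strong>. It only
# targets bold code spans (**`...`**) to limit false positives.
BOLD_CODE = r"\*\*`[^`\n]+`\*\*"

# Bold code span immediately followed by a word character, e.g. **`f()`**method
MISSING_SPACE_AFTER = re.compile(rf"({BOLD_CODE})(?=\w)")
# Word character immediately followed by a bold code span, e.g. the**`f()`**
MISSING_SPACE_BEFORE = re.compile(rf"(?<=\w)({BOLD_CODE})")


def fix_strong_spacing(markdown: str) -> str:
    """Add a space on either side of a bold code span where one is missing."""
    markdown = MISSING_SPACE_AFTER.sub(r"\1 ", markdown)
    markdown = MISSING_SPACE_BEFORE.sub(r" \1", markdown)
    return markdown


if __name__ == "__main__":
    before = "The **`OffscreenCanvas.convertToBlob()`**method creates a Blob."
    # prints: The **`OffscreenCanvas.convertToBlob()`** method creates a Blob.
    print(fix_strong_spacing(before))
```

Text that already has the spaces is left unchanged, so a pass like this could be rerun safely over the whole Web/API tree.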
I just pushed a fix for this "unsafe HTML" issue.
I've found a few places where conversion of …
I found one …
http://localhost:5000/en-US/docs/web/api/document/createelement#parameters has this: `- _options _{{optional_inline}}` …because the HTML it was converted from has this: `<dt><var>options </var>{{optional_inline}}</dt>`. I’ll try to see if I can put together a regex pattern to catch any other possible cases of that (without catching a bunch of false positives).
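One possible shape for such a regex, as a hedged Python sketch rather than the pattern actually used: in CommonMark, a closing `_` preceded by whitespace isn't right-flanking, so `_options _` never closes. The pattern and function names below are hypothetical, and anchoring on a `- ` list-item prefix (how the converted `<dt>` term appears in the example) is an assumption intended to avoid matching snake_case identifiers elsewhere in the prose. The sketch finds such terms and moves the stray space outside the emphasis.

```python
import re

# Hypothetical pattern (not from the actual converter): a "- " list item whose
# term is wrapped in _..._ but has a space or tab before the closing underscore,
# e.g. "- _options _{{optional_inline}}". Anchoring on the list-item prefix is
# an assumption intended to limit false positives.
TRAILING_SPACE_EM = re.compile(r"^([ \t]*- )_([^_\s][^_\n]*?)[ \t]+_", re.MULTILINE)


def find_trailing_space_emphasis(markdown: str) -> list[str]:
    """Return each emphasized list-item term with whitespace before its closing underscore."""
    return [match.group(2) for match in TRAILING_SPACE_EM.finditer(markdown)]


def fix_trailing_space_emphasis(markdown: str) -> str:
    """Move the stray whitespace outside the emphasis: '_options _' becomes '_options_ '."""
    return TRAILING_SPACE_EM.sub(r"\1_\2_ ", markdown)


if __name__ == "__main__":
    sample = "- _options _{{optional_inline}}\n- _tagName_\n"
    print(find_trailing_space_emphasis(sample))  # ['options']
    print(fix_trailing_space_emphasis(sample), end="")
```

Running `find_trailing_space_emphasis` first gives a list of candidates to eyeball before applying the fix, which is one way to keep false positives in check.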
In http://localhost:5000/en-US/docs/web/api/webglrenderbuffer, there was a …
I think this is strictly optional. On the one hand, it's a small, low-risk change that's a definite improvement. On the other hand, it's out of scope for this PR, isn't a regression from the HTML, and is probably one of many bits of broken markup (we've fixed many in this work, but I'm sure there are many left).
I will do follow-ups afterward then. |
This change fixes a bunch of cases of misplaced-space borkage found by markdownlint.
This change simplifies a few cases where the source used emphasis inside code, by dropping the emphasis.
This change fixes some cases where the ">" blockquote marker ended up in the middle of the link text of a hyperlink, which caused markdownlint to complain.
This change fixes a few cases of superfluous/redundant emphasis markup, as well as one case where, for some reason, the converter failed to convert some `<em>...</em>` markup to Markdown.
This fixes a case where apparently the source has both a macro call and an HTML hyperlink for the same thing. So this just drops the hyperlink and keeps the macro call.
This fixes a few cases where the sources had excessive code markup that didn’t serve a real informational purpose (and that confused markdownlint). It just drops or tweaks some of the Markdown.
We've reviewed the 2000 pages in the spreadsheet, about a third of all the pages. We've found a smattering of small issues, mostly to do with badly formed inline markup, and we've tried to make generalized fixes across the Web/API docs (especially @sideshowbarker!). I think there's a good chance a few more of these little issues remain in the docs, but I don't think there are likely to be big problems with this conversion, so I think we should merge it.
+1. I think we can be confident at this point that we’ve scrutinized everything pretty thoroughly.
Fixes #8741.
This PR converts all the docs under Web/API into Markdown.