fix: improve HTML layer detection, various MD fixes #1241
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit
Wonderful, this rule succeeded.
Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
🟢 Require two reviewer for test updates
Wonderful, this rule succeeded.
When test data is updated, we require two reviewers
24a73e3 to 333eb34
Nice!
A couple of comments:
- Following up on the discussion thread, would it be better to set the heuristic parameter to `False` by default? That way, no content is lost for users who are not aware of this feature.
- In the heuristic rule, I would switch to `ContentLayer.BODY` as soon as we see any `<h*>` tag. Currently this only happens once we see `<h1>`. The specifications recommend not skipping heading levels, but we may still see documents that start with, say, `<h2>` and skip `<h1>`.
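To make the second suggestion concrete, here is a minimal sketch of the rule. It is not the actual backend code: the `ContentLayer` enum below is a stand-in with only two values, `assign_layers` and the `infer_furniture` flag are hypothetical names, and treating documents without any heading as all-body is an assumption.

```python
import re
from enum import Enum


class ContentLayer(Enum):
    # Stand-in for the real enum; only the two layers discussed here.
    FURNITURE = "furniture"
    BODY = "body"


# Matches h1 through h6, but not tags such as "header" or "hr".
HEADING_TAG = re.compile(r"h[1-6]")


def assign_layers(tags, infer_furniture=False):
    """Pair each tag name with a content layer (hypothetical helper).

    With infer_furniture=False (the suggested default) everything is BODY,
    so no content is lost. With infer_furniture=True, content before the
    first heading of ANY level is FURNITURE, and everything from that
    heading onwards is BODY.
    """
    tags = list(tags)
    # Assumption: a document without headings is kept entirely as BODY.
    has_heading = any(HEADING_TAG.fullmatch(tag) for tag in tags)
    if infer_furniture and has_heading:
        layer = ContentLayer.FURNITURE
    else:
        layer = ContentLayer.BODY

    result = []
    for tag in tags:
        if HEADING_TAG.fullmatch(tag):
            # Switch on the first <h*> tag, not only on <h1>.
            layer = ContentLayer.BODY
        result.append((tag, layer))
    return result
```

With this rule, a document that starts at `<h2>` and never uses `<h1>` still ends up with body content: `assign_layers(["nav", "h2", "p"], infer_furniture=True)` marks only `nav` as furniture.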
Markdown fixes:
- properly propagate section header levels
- improve handling of list subroots without text

Signed-off-by: Panos Vagenas <[email protected]>
333eb34 to 40c099e
Primary bug being addressed:
During HTML parsing (which BTW is also triggered by Markdown parsing when HTML elements are present), if there were headings (`<h*>` tags) but no `<h1>` tags, all content was being marked as furniture.
Now the heuristic is being adapted so that:
Markdown fixes: