Disambiguate attribute declarations from content attributes #10756

Open: wants to merge 1 commit into base: main

@sideshowbarker (Member) commented Nov 9, 2024

This change updates the spec to use the term “attribute declaration” wherever the spec is actually referring to the syntax for declaring attributes in markup.

Without this change, the spec uses the same term, “attribute”, to refer both to attribute declarations in markup and to actual content attributes as they exist in the DOM.

We have evidence that authors are confused by this ambiguous use of one term for two different things, and evidence suggesting that the ambiguous usage promotes the wrong mental model of HTML for authors.


/forms.html ( diff )
/index.html ( diff )
/indices.html ( diff )
/introduction.html ( diff )
/parsing.html ( diff )
/syntax.html ( diff )

@domenic (Member) left a comment

Hmm. I'm +0 on this. It seems vaguely nice to have this kind of distinction, similar to the distinction between tag and element. But unlike tag/element, we don't have a long history of such a distinction to draw on, and the names "attribute" vs. "attribute declaration" are not as distinct in a helpful way. So it might also be OK to leave things ambiguous.

For example, we currently leave things ambiguous between "attribute values" as seen in markup, and "attribute values" as seen in the DOM. Similarly for "CDATA sections" and "comments".
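The markup-versus-parsed distinction for attribute values can be seen in a minimal sketch like the one below. It uses Python's standard `html.parser`, which is not the HTML Standard's parser and does not build a DOM; the class name `AttributeProbe` and the sample markup are assumptions made for illustration only.

```python
from html.parser import HTMLParser


class AttributeProbe(HTMLParser):
    """Record each start tag's raw markup next to its parsed attribute pairs."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the markup exactly as written;
        # attrs holds (name, value) pairs with character references decoded.
        self.seen.append((self.get_starttag_text(), attrs))


probe = AttributeProbe()
probe.feed('<a href="?x=1&amp;y=2" hidden>')
probe.close()

raw_markup, parsed = probe.seen[0]
print(raw_markup)  # the value as written in markup still contains "&amp;"
print(parsed)      # the parsed value contains a plain "&"
```

Note that `html.parser` reports a value-less attribute such as `hidden` with the value `None`, whereas in the DOM the corresponding content attribute has the empty string as its value. That gap is one more reminder that this is only an approximation of the two layers being discussed.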

One thing I worry about with this change is that we may not have carefully audited all usage of "attributes". For example, the following quotes don't seem updated, and maybe they should be:

  • The "get an attribute" algorithm
  • "When an end tag token is emitted with attributes" (and various other tokenizer references)
  • "For example, the parsing of certain named character references in attributes"
  • "If the attribute is present, its value must either be the empty string or" (???)

Anyway, as long as we make the suggested fixes to preserve IDs here, and other editors or community members don't have strong arguments against it, I'd be up for merging this.

data-x="syntax-attribute-value">value</span>, separated by an "<code data-x="">=</code>" character.
The attribute value can remain <a href="#unquoted">unquoted</a> if it doesn't contain <span>ASCII
whitespace</span> or any of <code data-x="">"</code> <code data-x="">'</code> <code
<p><span data-x="syntax-attribute-declarations">Attribute declarations</span> are placed inside

We should preserve the old ID by changing the <p> to <p id="syntax-attributes">
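One way to guard against losing anchors like this is a small check over the generated HTML. The sketch below is hypothetical and not part of the spec's build tooling: the helper names `IdCollector` and `missing_ids` are invented here, the sample fragments are placeholders, and it assumes the output being checked carries literal `id` attributes.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect every non-empty id attribute value found in a document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def missing_ids(source, required):
    """Return the required IDs that do not appear in the given HTML source."""
    collector = IdCollector()
    collector.feed(source)
    collector.close()
    return sorted(set(required) - collector.ids)


# Placeholder fragments: without and with the explicit id suggested above.
without_id = "<p>...</p>"
with_id = '<p id="syntax-attributes">...</p>'

print(missing_ids(without_id, ["syntax-attributes"]))
print(missing_ids(with_id, ["syntax-attributes"]))
```

Running such a check against the list of previously published IDs would flag any link target that a rename silently dropped.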

@@ -122938,10 +122939,10 @@ dictionary <dfn dictionary>StorageEventInit</dfn> : <span>EventInit</span> {
</ol>


-<h5>Attributes</h5>
+<h5>Attribute declarations</h5>

Similarly we should preserve the ID here

@@ -124064,7 +124068,7 @@ dictionary <dfn dictionary>StorageEventInit</dfn> : <span>EventInit</span> {
<tr>
<td><dfn data-x="parse-error-missing-whitespace-between-attributes">missing-whitespace-between-attributes</dfn>
<td><p>This error occurs if the parser encounters <span
-data-x="syntax-attributes">attributes</span> that are not separated by <span>ASCII
+data-x="syntax-attribute-declarations">attributes</span> that are not separated by <span>ASCII

Update the text here too, maybe?

@zcorpan (Member) commented Nov 18, 2024

If we do this I think we should change all of the syntax terms similarly:

  • DOCTYPE declaration
  • element declarations
  • maybe "text" is fine as-is, since it's usually referred to as "Text node" in the DOM
  • comment declarations
  • CDATA section declarations
  • attribute declaration name
  • attribute declaration value

Maybe this makes it easier to separate the syntax vs DOM concepts, but it also makes it more wordy to talk about the syntax.

The HTML parser's tokenizer produces data structures with similar names that are distinct from both the DOM concepts and the syntax concepts.

Start and end tag tokens have a tag name, a self-closing flag, and a list of attributes, each of which has a name and a value.

It would be incorrect for the tokenizer terms to link to the syntax terms.
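To make the tokenizer's third set of concepts concrete, here is a rough sketch of the tag token shape described above. The class and field names (`TagToken`, `Attribute`, `is_end_tag`, and so on) are assumptions chosen for illustration; the comment above only states that a tag token has a tag name, a self-closing flag, and a list of attributes with names and values.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """A tokenizer-level attribute: just a name and a value."""
    name: str
    value: str = ""


@dataclass
class TagToken:
    """A start or end tag token as emitted by a tokenizer (illustrative only)."""
    tag_name: str
    is_end_tag: bool = False
    self_closing: bool = False
    attributes: list[Attribute] = field(default_factory=list)


# A token that a tokenizer might emit for: <input type=checkbox checked/>
token = TagToken(
    "input",
    self_closing=True,
    attributes=[Attribute("type", "checkbox"), Attribute("checked")],
)
names = [attr.name for attr in token.attributes]
print(names)
```

These objects are neither markup text nor DOM nodes, which is why linking tokenizer terms to the syntax terms would conflate two different things.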

Labels: clarification (Standard could be clearer)

3 participants