clarify that contentSchema holds a subschema and when/how it applies #1564

Open
wants to merge 6 commits into
base: main

gregsdennis
Member

@gregsdennis gregsdennis commented Nov 21, 2024

What kind of change does this PR introduce?

clarification

Issue & Discussion References

Summary

Updates the text for contentSchema to indicate that its value is indeed a subschema (and therefore should be treated as such when scanning for identifiers). Also cleans up language around its dependency on contentMediaType.

I didn't include any explicit text about it containing identifiers. Instead I declare that the value is a subschema and removed the "SHOULD ignore" text discussed in the issue.

Also pertinent to the issue discussion are the final couple of sentences (already present), which I broke out into a new paragraph and rewrote to make it more apparent that they are a note of guidance rather than a requirement.

Accessing the schema through the schema location IRI included as part of the
annotation will ensure that it is correctly processed as a subschema. Using the
extracted annotation value directly is only safe if the subschema is an embedded
resource with both $schema and an absolute IRI $id.

Does this PR introduce a breaking change?

No.

@gregsdennis gregsdennis requested a review from a team November 21, 2024 21:30
@gregsdennis gregsdennis self-assigned this Nov 21, 2024
@gregsdennis gregsdennis added this to the stable-release milestone Nov 21, 2024
Comment on lines 545 to 548
Since `contentMediaType` is required to provide instruction on how to interpret
string content, the annotation schema produced by this keyword has no meaning if
`contentMediaType` is not present.
Member

I think I would prefer that no annotation is produced at all if contentMediaType is missing -- in order to discourage structuring schemas in this way.

Member Author

Does that then also mean that identifiers are not to be processed in that case?

Member

> Does that then also mean that identifiers are not to be processed in that case?

IMO, it should always be treated as a normal schema location and therefore always respect identifiers. But, I agree that it shouldn't produce an annotation if it isn't valid.

Member Author

Are we happy with it saying that an annotation SHOULD NOT be produced (etc.)?

Member

I think it should be a "MUST".
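
A minimal sketch of the behaviour proposed in this thread, assuming an implementation that withholds the `contentSchema` annotation whenever `contentMediaType` is absent. The function name `content_schema_annotation` and the sample schemas are hypothetical and only illustrate the discussion; they are not taken from the specification text.

```python
from typing import Any, Dict, Optional


def content_schema_annotation(schema: Dict[str, Any]) -> Optional[Any]:
    """Return the contentSchema annotation value, or None if none is produced.

    Hypothetical helper: it only models the rule discussed in this thread.
    """
    if "contentSchema" not in schema:
        return None
    if "contentMediaType" not in schema:
        # Proposed rule: without contentMediaType there is no instruction on
        # how to interpret the string, so no annotation is produced.
        return None
    return schema["contentSchema"]


with_media_type = {
    "type": "string",
    "contentMediaType": "application/json",
    "contentSchema": {"type": "object"},
}
without_media_type = {
    "type": "string",
    "contentSchema": {"type": "object"},
}

produced = content_schema_annotation(with_media_type)
withheld = content_schema_annotation(without_media_type)
```

The sketch only covers annotation production. Per the comments above, identifiers inside the subschema would still be processed in both cases.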


Accessing the schema through the schema location IRI included as part of the
annotation will ensure that it is correctly processed as a subschema. Using the
extracted annotation value directly is only safe if the subschema is an embedded
Member

"safe" and "correctly processed as a subschema" are vague -- can we say something else? I think this is trying to say that the evaluation behaviour won't be reproducible because the schema is evaluated in isolation, rather than in the context of the surrounding dialect and location identifier (from the containing schema's $schema and $id keywords). So how about instead saying something like:

> Because this subschema is intended to be processed in isolation, outside of the context of its containing schema, usage of both the $schema and $id keywords is recommended to ensure predictable and reproducible results.

Member Author

This text was already present. I just put it in a new paragraph. I did think it was a bit convoluted. Happy to update it.

Member Author

I do think it could be okay to use the subschema in its original context, so that should still be addressed. This is what the original discussion was about.


Accessing the schema through the schema location IRI included as part of the
annotation will ensure that it is correctly processed as a subschema. Using the
extracted annotation value directly is only safe if the subschema is an embedded
Member

Maybe we should make this a SHOULD?

Member Author

I'm not sure what you're suggesting. This is informative, not a requirement.

Member

So from what I'm reading here, there are edge cases in contentSchema if the schema doesn't have $id and $schema. If that's the case, shouldn't we highlight it in more of a "SHOULD" manner, which, from what I understand, is something you should do rather than a MUST (something you HAVE to do)? Or maybe "RECOMMENDED" is the right one, similar to this case: https://json-schema.org/draft/2020-12/json-schema-validation#section-7.2.2-5?

Member Author

It's not an edge case. It just means that if you intend to use the contentSchema subschema solely in its own context, as you would if you received it as an annotation, then any relative $refs will only be resolvable if the subschema has both $schema and $id.

This goes back to my comment here where I show a contentSchema subschema attempting to reference a definition in its parent schema. If you extract the subschema (again, because you've received it as an annotation), then that reference fails.

There's not a best practice here. Both approaches have valid use cases, and schema authors are free to do what makes sense for them. This is merely a caution to schema authors to understand the implications of their approach.
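
The failure mode described above can be sketched as follows. This is an illustration under stated assumptions, not the example from the linked comment: the schema, the `https://example.com/parent` identifier, and the `resolve_pointer` helper are all hypothetical, and the helper handles only same-document, fragment-only references.

```python
from typing import Any


def resolve_pointer(document: Any, ref: str) -> Any:
    """Resolve a fragment-only reference such as '#/$defs/name' in document.

    Hypothetical helper for this sketch; real implementations resolve
    references against base IRIs rather than a single in-memory document.
    """
    if not ref.startswith("#"):
        raise ValueError("this sketch only handles same-document references")
    node = document
    for token in ref[1:].split("/")[1:]:
        # Undo JSON Pointer escaping (~1 is '/', ~0 is '~').
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


parent = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://example.com/parent",
    "type": "string",
    "contentMediaType": "application/json",
    # The subschema points at a definition that lives in its parent.
    "contentSchema": {"$ref": "#/$defs/payload"},
    "$defs": {"payload": {"type": "object"}},
}

# Evaluated in place, the reference resolves against the parent document.
in_place = resolve_pointer(parent, parent["contentSchema"]["$ref"])

# Extracted as a bare annotation value, the subschema no longer has $defs.
extracted = parent["contentSchema"]
try:
    resolve_pointer(extracted, extracted["$ref"])
    extracted_resolves = True
except KeyError:
    extracted_resolves = False
```

Giving the subschema its own `$schema` and absolute `$id`, together with its own definitions, is the alternative the note describes for authors who intend the subschema to be used on its own.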

Member Author

It's obvious to me from @karenetheridge's and your comments that this paragraph isn't clear, so I'll reword it.

@gregsdennis
Member Author

@karenetheridge / @jviotti I've rewritten the last paragraph note and added another "editor" footnote that points back to the subject issue linked above. Let me know what you think.

@jviotti
Member

jviotti commented Nov 25, 2024

Reads much better now!

@gregsdennis
Member Author

@karenetheridge @jdesrosiers @jviotti I believe I've addressed your concerns here.

Member

@jviotti jviotti left a comment

Looks good to me

@gregsdennis
Member Author

Closing and reopening to rerun build

@gregsdennis gregsdennis reopened this Jan 17, 2025
@gregsdennis gregsdennis dismissed karenetheridge’s stale review January 17, 2025 11:21

Given the 👍 above, I'm assuming approval.

@gregsdennis gregsdennis force-pushed the gregsdennis/contentSchema branch from 015414b to f6276dc Compare January 17, 2025 20:45
Member

@jdesrosiers jdesrosiers left a comment

Sorry, it seems this one slipped through the cracks for me a while ago.

specs/jsonschema-validation.md (outdated)
Comment on lines +550 to +551
Note that evaluating the `contentSchema` subschema in-place (i.e. as part of its
parent schema) will ensure that it is correctly processed. Independent use of
Member

My initial reaction was that this doesn't make sense. The annotation is just the subschema. It's no longer in-place. It doesn't include the context of where it came from. So, how can it be evaluated in-place? Then it occurred to me that an annotation includes not just its value, but also the schema location it came from, and that location can be used to evaluate the contentSchema in-place. I don't think most readers are going to be knowledgeable enough to make that leap. This could use some clarification.

Member Author

Does the footnote ([^7]) that follows not provide that clarity?

Member

No, that's not the thing I'm saying needs to be clarified. We say that the annotation is the subschema. We say that the subschema shouldn't be evaluated out of context from where it appeared in the schema and we explain why in footnote 7. What we don't explain is given a subschema without its parent context, how is it even possible to evaluate it in context. The value of the annotation is just the subschema, not the context. We can't evaluate the subschema in context because we don't know the context in which it needs to be evaluated. I hope that makes sense this time.

Of course the solution is that the location of the annotation keyword in the schema is how you know the context, but that's not intuitive. This is the only annotation where the location of the keyword in the schema is useful or necessary to know. Usually, we only care about the value of the annotation. In this case, we need to know the value and the schema location of the annotation. Actually, when used correctly (in context), the value of the annotation is useless and it's the schema location that the user actually uses.

Member Author

> This is the only annotation where the location of the keyword in the schema is useful or necessary to know. Usually, we only care about the value of the annotation.

This is incorrect. Annotation location has always been useful, especially in cases where you receive annotations from the same keyword in different locations, e.g. from title. The location allows the consumer to decide which (or both/all) it wants to use. This is Core, where annotations are defined.

> We can't evaluate the subschema in context because we don't know the context in which it needs to be evaluated.

How would we not know the context? It's conveyed by the annotation location, which has always been defined to be a part of an annotation.


It's still not clear why you think that the existing text (including the lines following these) is insufficient. It's saying, "Don't just evaluate this annotation value as a schema because it may rely on things that exist externally to it. You probably need to evaluate it where it came from."

It's actually saying all of that, and then the footnote expands on that warning using an example.
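
As a rough illustration of how the annotation's location conveys the context, the sketch below models an annotation as a plain dictionary carrying both a value and the schema location it came from. The field names (`schemaLocation`, `instanceLocation`), the `locate` helper, and the sample schema are assumptions made for this example, not a format defined by the specification.

```python
from typing import Any


def locate(root: Any, pointer: str) -> Any:
    """Follow a JSON Pointer (e.g. '/properties/blob') from the root schema.

    Hypothetical helper for this sketch.
    """
    node = root
    for token in pointer.split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


root = {
    "$id": "https://example.com/root",
    "properties": {
        "blob": {
            "type": "string",
            "contentMediaType": "application/json",
            "contentSchema": {"$ref": "#/$defs/payload"},
        }
    },
    "$defs": {"payload": {"type": "object"}},
}

# Assumed annotation shape: the value plus where it was produced.
annotation = {
    "keyword": "contentSchema",
    "schemaLocation": "/properties/blob/contentSchema",
    "instanceLocation": "/blob",
    "value": {"$ref": "#/$defs/payload"},
}

# The location leads back to the subschema inside its parent, so its
# reference can be followed from the root it was written against.
in_context = locate(root, annotation["schemaLocation"])
target = locate(root, in_context["$ref"][1:])
```

Here the annotation value and the located subschema are equal, but only the located one sits next to the `$defs` it refers to.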

Member

> > We can't evaluate the subschema in context because we don't know the context in which it needs to be evaluated.
>
> How would we not know the context? It's conveyed by the annotation location, which has always been defined to be a part of an annotation.

Yes, you're right. I acknowledged that in the next paragraph. I was walking you through my thought process when I first read it and what I believe the vast majority of readers will be thinking when they read this section. If it took me a minute to make that connection, most readers won't make it at all. Yes, the concept is unambiguously documented elsewhere, but most readers won't have every detail of JSON Schema memorized and I think this is a pretty esoteric detail.

Member

@jdesrosiers jdesrosiers left a comment

I'm not going to fight with you about this. The text is correct even if I think readers will find it confusing. I'll approve the PR with or without the clarification I'm asking for.

Successfully merging this pull request may close these issues.

contentSchema has implementation-defined referencing behavior when contentMediaType is not present
4 participants