Non-deterministic JSON Schema validation #332
Comments
@danielpeintner I'm trying to figure out what you're getting at. Your example doesn't really demonstrate any "ambiguity" or non-determinism: all validators are going to return the same result, no matter what the input document is. JSON Schema is fundamentally a set of assertions being made about a document, and a validating document is one that conforms to those assertions. Most of those assertions serve to give the JSON document structure: they describe which properties an object can have, and so on. Some are more complex and don't translate well to structural rules, especially "oneOf".
Thank you @awwright for your feedback. Please let me explain what I expected (or hoped) to get from JSON Schema. I hoped to derive a deterministic automaton from a JSON Schema, meaning that I can validate JSON instances with an automaton (see sketch below). At any state (i.e. circle) I wanted to make a deterministic decision about where to go next. I think this works with most of the JSON Schema constructs, but it does not work for "oneOf". FYI: Having some experience in the XML world, I made some initial assumptions. In XML Schema such an ambiguity is not possible. I guess these assumptions are not true for JSON Schema. I hope this clarifies my issue. Are my expectations wrong in that regard?
I think I understand what you're getting at here. Why do you feel it matters? Why do you want to be able to do this?
(FWIW, your nondeterministic transition can be transformed into a deterministic one [I think in the same way you'd do so for a DFA vs NFA]: just put the two tests in serial rather than in parallel, and add an extra transition to failure for the case where both pass.) So I also would love to hear why this matters.
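The serial evaluation described in this comment can be sketched roughly as follows. This is a minimal illustration, not how any particular validator is implemented: the `BRANCHES` list and the `matches` helper are hypothetical, and each branch is reduced to a bare "required" check so the case where both branches pass is visible.

```python
# Hypothetical branches: each one only lists the property names it requires.
BRANCHES = [
    {"required": ["firstName"]},
    {"required": ["firstName", "lastName"]},
]


def matches(branch, instance):
    """Return True if the instance has every property the branch requires."""
    return set(branch["required"]) <= set(instance)


def one_of(branches, instance):
    """Test the branches one after another instead of in parallel.

    A second passing branch is the extra transition to failure.
    """
    passed = 0
    for branch in branches:
        if matches(branch, instance):
            passed += 1
            if passed > 1:
                return False  # more than one branch matched
    return passed == 1


print(one_of(BRANCHES, {"firstName": "Ada"}))
print(one_of(BRANCHES, {"firstName": "Ada", "lastName": "Lovelace"}))
```

Under these assumptions the first call succeeds because only the first branch matches, while the second call fails because both branches match. The result never depends on the order in which the branches are tried.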
It is not the only reason, but I am working on applying EXI to various Web formats to improve data exchange and processing. The EXI approach is very general and flexible. The algorithm uses a grammar to determine what is likely to occur at any given point. EXI for JSON can already be used with a generic schema (see playground). Having said that, more accurate schemas (or, respectively, EXI grammars) are beneficial w.r.t. processing, size, etc. Hence I was looking at JSON Schema and how it could be mapped to EXI grammars or, as an intermediary step, to XML schemas (both are regular and deterministic). I hope this clarifies what I am trying to achieve.
@danielpeintner Ah, cool! Actually, we've got Issue #13 as an open-ended issue for EXI stuff. EXI, as I understand, doesn't need to outright validate documents. Could you interpret "oneOf" as a union, instead of a set of mutually exclusive options? Or at least treat it the same as "anyOf"? Keep in mind, while JSON Schema doesn't intend to provide options for every sort of data validation anyone might ever want to perform, it does offer some validation assertions that go beyond structural validation, such as "oneOf" and "not". Also see @Julian's comment: it's possible to factor out the "firstName" in your example schema. Though I don't know if it's possible to do such a conversion in general, with the incompleteness theorem and whatnot.
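Treating "oneOf" the same as "anyOf" could be done as a schema rewrite before grammar generation. The following is only a sketch under stated assumptions: `relax_one_of` is a hypothetical helper, not part of any JSON Schema library, and it ignores two edge cases noted in the comments.

```python
def relax_one_of(schema):
    """Return a copy of the schema with every "oneOf" key renamed to "anyOf".

    Limitations of this sketch:
    - a schema object holding both "oneOf" and "anyOf" would lose one of them;
    - an instance property that happens to be named "oneOf" (under
      "properties") would be renamed as well.
    """
    if isinstance(schema, list):
        return [relax_one_of(item) for item in schema]
    if not isinstance(schema, dict):
        return schema  # strings, numbers, booleans, None
    relaxed = {}
    for key, value in schema.items():
        new_key = "anyOf" if key == "oneOf" else key
        relaxed[new_key] = relax_one_of(value)
    return relaxed


strict = {
    "type": "object",
    "oneOf": [
        {"required": ["firstName"]},
        {"required": ["firstName", "lastName"]},
    ],
}
print(relax_one_of(strict))
```

The relaxed schema accepts every document the strict one accepts, plus those matching more than one branch, which fits a use case that does not need full validation.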
Thanks for your input!
Correct. EXI does not require data to be fully valid. A simplification of oneOf and other constructs might be fine also.
I fear the complexity a bit. Anyhow, I will also discuss this within the EXI working group and keep you informed.
@danielpeintner this seems to be one of many things that would work best with a subset of the validation vocabulary. Other examples are the UI rendering and code generation vocabularies, both of which we are starting to discuss in a dedicated new vocabulary repository. The direction we are going with those is to say that they may impose restrictions on which validation keywords can work with the added features of the new vocabularies. Hyper-Schema already does this to a limited extent, e.g. links under a negated schema MUST be ignored because there is no sensible way to figure out whether they are applicable, or how they would relate to the instance. The goal is to allow validation to be as expressive as it needs to be, while also allowing other applications of JSON Schema to exclude complex validation constructs that are not helpful for those other applications. If this approach sounds like what you need, please take a look at the new vocab repository I linked above. You can also propose another vocabulary if this sounds like the right general idea but your needs don't fit any current proposals. Or just open an issue without specifying a particular vocabulary if you are not sure where the solution would ideally live. If that works out, please comment on or open a new issue there, and close this one. If not, I'd like to figure out what action is needed to resolve this. Note that I'll be submitting an
@danielpeintner given that it's been nearly two months (minus a couple days) since my last comment with no reply, I'm going to close this since I don't see an actionable concern. At least none other than "work with EXI" and we already have #13 for that. I still think that if there is anything here it is best addressed by either a restricted or extended vocabulary, and should be filed in the json-schema-vocabularies repo. However, if I am missing something here, or if you meant to comment but just forgot, please do re-open this.
I am experimenting a bit with JSON Schema rules and was surprised to find that it allows non-deterministic (i.e. ambiguous) validation rules.
Let me use an example JSON schema that describes a document that may have just "firstName" or "firstName" and "lastName".
Note: I do know that my constraints could be described in a much easier way. This is purely to give you an idea about the problem.
Each branch of "oneOf" contains essentially the same or similar content. After detecting "firstName", one still does not know whether the first or the second branch of "oneOf" is correct. This could become even more complex with nesting and such. Having said that, I wonder why JSON Schema allows such ambiguity and does not forbid it.
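The example schema itself did not survive in this copy of the thread. As a purely hypothetical reconstruction of the kind of schema described (the exact keywords are an assumption), the ambiguity after reading "firstName" can be illustrated like this, with `viable_branches` as an illustrative helper that only looks at property names:

```python
# Hypothetical reconstruction; the schema originally posted is not shown here.
SCHEMA = {
    "type": "object",
    "oneOf": [
        {   # branch 0: just "firstName"
            "properties": {"firstName": {"type": "string"}},
            "required": ["firstName"],
            "additionalProperties": False,
        },
        {   # branch 1: "firstName" and "lastName"
            "properties": {
                "firstName": {"type": "string"},
                "lastName": {"type": "string"},
            },
            "required": ["firstName", "lastName"],
            "additionalProperties": False,
        },
    ],
}


def viable_branches(seen_keys):
    """Return indexes of the oneOf branches not yet ruled out by the keys seen so far.

    A branch is ruled out as soon as a seen key is not among its declared
    properties (every branch here sets additionalProperties to False).
    """
    return [
        index
        for index, branch in enumerate(SCHEMA["oneOf"])
        if set(seen_keys) <= set(branch["properties"])
    ]


print(viable_branches(["firstName"]))              # [0, 1]
print(viable_branches(["firstName", "lastName"]))  # [1]
```

After only "firstName" has been read, both branches remain candidates. The choice is settled only once "lastName" appears or the object ends.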
FYI: XML schema is more strict in that sense and a similar schema with these constraints would not be valid.
Am I missing anything here?
Any thoughts?
Thanks for your feedback!