Do not normalize language tags in D-interpretations #96
Moreover, RDF 1.1 says: "The value space of language tags is always in lower case." So this has even less effect.
As many implementations (including my own) have always stored a normalized version of the language tag, it was always common for these syntactically distinct literals to be considered equivalent. IIRC, there are no 1.1 tests that check that they create distinct literals.
Yes, I spotted this sentence, which was a bit sloppy (language tags are not a datatype, so they don't have a value space, strictly speaking). But it is gone from the RDF 1.2 spec anyway.
Where is the requirement that language tags in RDF graphs be in lower case?
RDF 1.2: not lower case. BCP47 says that language tags must be treated case-insensitively.
So the lower-casing should not be removed in Semantics then.
I should have specified that this PR is (maybe) pending on w3c/rdf-concepts#162.
The goal is to convey the idea that, in the abstract syntax, the language tag is no longer a string, strictly speaking. It is an abstraction of that string that is effectively "case-less". (Of course, implementations will probably continue to represent them as strings, but that is now an implementation detail). Given this background, there are two reasons to remove the lower-casing in the semantics
Now, I realize that this also means that the value space of language tags has changed. It used to be a set of pairs (string, lower-case-BCP47-string), it is now a set of pairs (string, abstract language tag). Those two sets are isomorphic, and therefore, it does not change the behaviour of reasoners in any way.
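To make the isomorphism claim concrete, here is a minimal sketch. It is not taken from any spec or implementation: `AbstractLangTag` and the four helper functions are hypothetical names, and folding to lower case is just one possible way of representing a case-less tag internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractLangTag:
    """Hypothetical case-less language tag (the folded string is an implementation detail)."""
    _folded: str

    @classmethod
    def parse(cls, text: str) -> "AbstractLangTag":
        # BCP47 tags are ASCII, so str.lower() is enough to erase case here.
        return cls(text.lower())


def value_rdf11(lexical: str, tag: str):
    """RDF 1.1 style value: (string, lower-case BCP47 string)."""
    return (lexical, tag.lower())


def value_rdf12(lexical: str, tag: str):
    """RDF 1.2 style value: (string, abstract language tag)."""
    return (lexical, AbstractLangTag.parse(tag))


def to_rdf12(value):
    """Map an RDF 1.1 style value to its RDF 1.2 style counterpart."""
    lexical, tag = value
    return (lexical, AbstractLangTag.parse(tag))


def to_rdf11(value):
    """Inverse mapping, back to the RDF 1.1 style value."""
    lexical, tag = value
    return (lexical, tag._folded)


v11 = value_rdf11("chat", "FR")
assert to_rdf11(to_rdf12(v11)) == v11                          # round trip is lossless
assert value_rdf12("chat", "fr") == value_rdf12("chat", "FR")  # case is not observable
```

Since `to_rdf12` and `to_rdf11` are inverses of each other, a reasoner working on one representation cannot distinguish it from the other.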
lgtm
So this PR should be held until that PR is resolved and merged.
This needs to wait for the resolution of how to handle language tags in Concepts.
In D-interpretation, language tags are no longer converted to lowercase. This is because the RDF 1.2 abstract syntax ignores the case of [BCP47] strings (unlike RDF 1.1). In other words, "chat"@fr and "chat"@FR now have the same language tag in the abstract syntax, and therefore it is not necessary to normalize them in the interpretation.
Note also that this change does not require any other change in the semantics, IMO. In fact, it makes the job of the semantics slightly easier (as this PR shows), because the abstract syntax is now taking care of "merging" equivalent language strings rather than propagating an irrelevant difference.
I mark this PR as substantive because, technically, this is a breaking change in RDF 1.2 with implications on entailment.
In RDF 1.1, the following Turtle files produce different graphs, which are neither isomorphic nor simply-equivalent (they are RDF-equivalent, though).
In RDF 1.2, the two Turtle files now produce the exact same graph (and therefore, they are simply-equivalent).
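The difference can be sketched with a toy model, using the "chat"@fr / "chat"@FR pair from the description. This is only an illustration under stated assumptions: graphs are modelled as frozen sets of tuples, the subject and predicate IRIs are placeholders, and lower-casing stands in for the case-less abstract language tag of RDF 1.2.

```python
# Placeholder IRIs, for illustration only.
S = "http://example.org/s"
P = "http://example.org/p"


def graph_rdf11(lexical: str, tag: str) -> frozenset:
    """RDF 1.1 model as described in this PR: the tag is kept verbatim in the abstract syntax."""
    return frozenset({(S, P, (lexical, tag))})


def graph_rdf12(lexical: str, tag: str) -> frozenset:
    """RDF 1.2 model: the abstract syntax ignores the case of the tag."""
    return frozenset({(S, P, (lexical, tag.lower()))})


def denotation_rdf11(literal):
    """RDF 1.1 D-interpretation: the tag is lower-cased when the literal is interpreted."""
    lexical, tag = literal
    return (lexical, tag.lower())


# RDF 1.1: two distinct graphs whose literals nevertheless denote the same value.
assert graph_rdf11("chat", "fr") != graph_rdf11("chat", "FR")
assert denotation_rdf11(("chat", "fr")) == denotation_rdf11(("chat", "FR"))

# RDF 1.2: the very same graph, so nothing is left for the semantics to normalize.
assert graph_rdf12("chat", "fr") == graph_rdf12("chat", "FR")
```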
Note however the following.
("foo" and "foo"^^xsd:string) are now indistinguishable, and that's for the best.