Do not normalize language tags in D-interpretations #96

Open

pchampin
Contributor

@pchampin pchampin commented Feb 26, 2025

In D-interpretations, language tags are no longer converted to lowercase. This is because the RDF 1.2 abstract syntax ignores the case of [BCP47] strings (unlike RDF 1.1). In other words, "chat"@fr and "chat"@FR now have the same language tag in the abstract syntax, and therefore it is not necessary to normalize them in the interpretation.

Note also that this change does not require any other change in the semantics, IMO. In fact, it makes the job of the semantics slightly easier (as this PR shows), because the abstract syntax is now taking care of "merging" equivalent language strings rather than propagating an irrelevant difference.


I mark this PR as substantive because, technically, this is a breaking change in RDF 1.2 with implications on entailment.
In RDF 1.1, the following Turtle files produce different graphs, which are neither isomorphic nor simply-equivalent (they are RDF-equivalent, though).

#G1
:s :p "chat"@fr .
#G2
:s :p "chat"@FR.

In RDF 1.2, the two Turtle files now produce exactly the same graph (and therefore, they are simply-equivalent).
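
A minimal sketch of the difference, under stated assumptions: `LangString11` and `LangString12` are hypothetical classes written for this comment (they are not taken from any RDF library), graphs are modelled as plain Python sets of triples, and case-insensitivity is implemented with `str.lower()`, which is adequate because BCP47 tags are ASCII.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LangString11:
    """RDF 1.1-style literal (illustrative): the tag is compared as a plain string."""
    lexical: str
    lang: str


class LangString12:
    """RDF 1.2-style literal (illustrative): the tag is compared case-insensitively."""

    def __init__(self, lexical: str, lang: str):
        self.lexical = lexical
        self.lang = lang  # original spelling, kept only for display

    def _key(self):
        # Assumption: lower-casing is used purely as a comparison key.
        return (self.lexical, self.lang.lower())

    def __eq__(self, other):
        return isinstance(other, LangString12) and self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


# G1 and G2 from the example above, modelled as sets of triples.
g1_11 = {(":s", ":p", LangString11("chat", "fr"))}
g2_11 = {(":s", ":p", LangString11("chat", "FR"))}
g1_12 = {(":s", ":p", LangString12("chat", "fr"))}
g2_12 = {(":s", ":p", LangString12("chat", "FR"))}

print(g1_11 == g2_11)  # False: distinct graphs in the RDF 1.1 model
print(g1_12 == g2_12)  # True: the same graph in the RDF 1.2 model
```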

Note however the following.

  • This change is similar to the deprecation of "simple literals" in RDF 1.1. Two things that were once distinct ("foo" and "foo"^^xsd:string) are now indistinguishable, and that's for the best.
  • Most implementations won't be affected, as they already normalize literals internally, which was explicitly allowed by the spec in RDF 1.1. For that reason, this change is even less disruptive than the deprecation of simple literals.
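
To illustrate the second point, here is a rough sketch of an implementation that normalizes on ingestion. `TripleStore` and its `add` method are hypothetical names, and lower-casing is only one possible normal form; real implementations may differ.

```python
class TripleStore:
    """Illustrative store that normalizes language tags when triples are added."""

    def __init__(self):
        self._triples = set()

    def add(self, s: str, p: str, lexical: str, lang: str) -> None:
        # Assumption: the tag is stored in lower case, so spelling
        # differences in the input never reach the stored graph.
        self._triples.add((s, p, (lexical, lang.lower())))

    def __len__(self) -> int:
        return len(self._triples)


store = TripleStore()
store.add(":s", ":p", "chat", "fr")
store.add(":s", ":p", "chat", "FR")
print(len(store))  # 1: both inputs end up as the same stored triple
```

Such a store behaves identically under RDF 1.1 and RDF 1.2, which is why the change should be invisible to it.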

@pchampin pchampin added the spec:substantive Change in the spec affecting its normative content (class 3) –see also spec:bug, spec:new-feature label Feb 26, 2025
@afs
Contributor

afs commented Feb 26, 2025

Most implementations won't be affected,

Moreover (RDF 1.1): "The value space of language tags is always in lower case.", so there is even less effect.

@gkellogg
Member

I mark this PR as substantive because, technically, this is a breaking change in RDF 1.2 with implications on entailment. In RDF 1.1, the following Turtle files produce different graphs, which are neither isomorphic nor simply-equivalent (they are RDF-equivalent, though).

As many implementations (including my own) have always stored a normalized version of the language tag, it was always common for these syntactically distinct literals to be considered equivalent. There are no 1.1 tests that check to be sure they create distinct literals IIRC.

@pchampin
Contributor Author

Moreover (RDF 1.1): "The value space of language tags is always in lower case.", so there is even less effect.

Yes, I spotted this sentence, which was a bit sloppy (language tags are not a datatype, so they don't have a value space, strictly speaking). But it is gone from the RDF 1.2 spec anyway.

Contributor

@pfps pfps left a comment

Where is the requirement that language tags in RDF graphs be in lower case?

@afs
Contributor

afs commented Feb 27, 2025

RDF 1.2 : Not lower case

BCP47 says that language tags must be treated case-insensitively.
See also the section on literals in Concepts.

@pfps
Contributor

pfps commented Feb 27, 2025

RDF 1.2 : Not lower case

BCP47 says that language tags must be treated case-insensitively. See also the section on literals in Concepts.

So the lower-casing should not be removed in Semantics then.

@pchampin
Contributor Author

So the lower-casing should not be removed in Semantics then.

I should have specified that this PR is (maybe) pending on w3c/rdf-concepts#162.
At least, the text proposed in that PR should make the rationale of this one clearer:

Two [BCP47]-complying strings that differ only by case represent the same language tag.

The goal is to convey the idea that, in the abstract syntax, the language tag is no longer a string, strictly speaking. It is an abstraction of that string that is effectively "case-less". (Of course, implementations will probably continue to represent language tags as strings, but that is now an implementation detail.)

Given this background, there are two reasons to remove the lower-casing in the semantics:

  • It is not necessary to ensure that "equivalent" language strings denote the same thing (this is now handled by the abstract syntax, which abstracts away the case of the language tag in the concrete syntaxes).
  • Arguably, since those "abstract" language tags are not strings, they cannot be lowercased.

Now, I realize that this also means that the value space of language-tagged strings (rdf:langString) has changed. It used to be a set of pairs (string, lower-case BCP47 string); it is now a set of pairs (string, abstract language tag). Those two sets are isomorphic, and therefore the change does not affect the behaviour of reasoners in any way.
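
The correspondence between the two value spaces can be sketched as follows. This is only an illustration on a handful of sample literals, not a proof: `AbstractTag`, `value_11` and `value_12` are hypothetical names, and the "abstract" tag is itself modelled with a case-folded string, so the check mainly shows that both encodings identify exactly the same pairs of literals.

```python
class AbstractTag:
    """Hypothetical case-less language tag: equality ignores letter case."""

    def __init__(self, spelling: str):
        # BCP47 tags are ASCII, so lower() is enough as a comparison key.
        self._folded = spelling.lower()

    def __eq__(self, other):
        return isinstance(other, AbstractTag) and self._folded == other._folded

    def __hash__(self):
        return hash(self._folded)


def value_11(lexical: str, tag: str):
    """RDF 1.1-style value: (string, lower-cased tag string)."""
    return (lexical, tag.lower())


def value_12(lexical: str, tag: str):
    """RDF 1.2-style value in this sketch: (string, abstract tag)."""
    return (lexical, AbstractTag(tag))


samples = [
    ("chat", "fr"), ("chat", "FR"), ("chat", "en"),
    ("cat", "en-GB"), ("cat", "EN-gb"),
]

# Two literals denote the same value in one encoding
# if and only if they do in the other.
same_partition = all(
    (value_11(*a) == value_11(*b)) == (value_12(*a) == value_12(*b))
    for a in samples
    for b in samples
)

old_space = {value_11(*s) for s in samples}
new_space = {value_12(*s) for s in samples}

print(same_partition)                  # True
print(len(old_space), len(new_space))  # 3 3
```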

Member

@TallTed TallTed left a comment

lgtm

@pfps
Contributor

pfps commented Feb 28, 2025

So the lower-casing should not be removed in Semantics then.

I should have specified that this PR is (maybe) pending on w3c/rdf-concepts#162.

So this PR should be held until that PR is resolved and merged.

Contributor

@pfps pfps left a comment

This needs to wait for the resolution of how to handle language tags in Concepts.
