Do not normalize language tags in D-interpretations #96

pchampin · 2025-02-26T16:39:54Z

In D-interpretation, language tags are no longer converted to lowercase. This is because the RDF 1.2 abstract syntax ignores the case of [BCP47] strings (unlike RDF 1.1). In other words, "chat"@fr and "chat"@FR now have the same language tag in the abstract syntax, and therefore it is not necessary to normalize them in the interpretation.

Note also that this change does not require any other change in the semantics, IMO. In fact, it makes the job of the semantics slightly easier (as this PR shows), because the abstract syntax is now taking care of "merging" equivalent language strings rather than propagating an irrelevant difference.
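To make the "merging" concrete, here is a minimal sketch, not taken from the PR or from any RDF library. `LangString`, `il_rdf11` and `il_rdf12` are hypothetical names, and the lowercased `tag_key` is only one possible stand-in for the case-less abstract tag. It assumes the RDF 1.2 abstract syntax already identifies tags that differ only by case, so the interpretation needs no normalization step of its own.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LangString:
    """Hypothetical model of an RDF 1.2 language-tagged string."""
    lexical_form: str
    # The tag as written in the concrete syntax; ignored by == and hash().
    tag_as_written: str = field(compare=False)
    # Case-less key standing in for the abstract language tag (an assumption).
    tag_key: str = field(init=False, repr=False)

    def __post_init__(self):
        # Frozen dataclass: derived fields are set via object.__setattr__.
        object.__setattr__(self, "tag_key", self.tag_as_written.lower())


def il_rdf11(lexical_form: str, tag: str) -> tuple:
    # RDF 1.1 style: the tag is a plain string, so the interpretation lowercases it.
    return (lexical_form, tag.lower())


def il_rdf12(lit: LangString) -> tuple:
    # RDF 1.2 style as proposed here: the interpretation does no normalization.
    return (lit.lexical_form, lit.tag_key)


same = LangString("chat", "fr") == LangString("chat", "FR")
```

In this sketch the case handling lives entirely in the literal's constructor (the abstract syntax), and `il_rdf12` simply reads what is already there.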

I mark this PR as substantive because, technically, this is a breaking change in RDF 1.2 with implications on entailment.
In RDF 1.1, the following Turtle files produce different graphs, which are neither isomorphic nor simply-equivalent (they are RDF-equivalent, though).

#G1
:s :p "chat"@fr .

#G2
:s :p "chat"@FR.

In RDF 1.2, the two Turtle files now produce exactly the same graph (and therefore, they are simply-equivalent).
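The G1/G2 comparison can be sketched with graphs modelled as sets of triples. This is a simplified illustration: `parse_rdf11` and `parse_rdf12` are hypothetical helpers, and lowercasing in `parse_rdf12` is just one way to pick a representative for a case-less tag.

```python
def parse_rdf11(lexical_form, tag):
    # Simplified RDF 1.1 abstract syntax: the tag is kept exactly as written.
    return ("langString", lexical_form, tag)


def parse_rdf12(lexical_form, tag):
    # Simplified RDF 1.2 abstract syntax: case differences are not observable.
    # Lowercasing is only a convenient representative, not mandated here.
    return ("langString", lexical_form, tag.lower())


def graph(parse, tag):
    # A one-triple graph, as in G1 and G2 above.
    return frozenset({(":s", ":p", parse("chat", tag))})


g1_11, g2_11 = graph(parse_rdf11, "fr"), graph(parse_rdf11, "FR")
g1_12, g2_12 = graph(parse_rdf12, "fr"), graph(parse_rdf12, "FR")

print(g1_11 == g2_11)  # False: distinct graphs under the 1.1-style model
print(g1_12 == g2_12)  # True: the same graph under the 1.2-style model
```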

Note however the following.

This change is similar to the deprecation of "simple literals" in RDF 1.1. Two things that were once distinct ("foo" and "foo"^^xsd:string) are now indistinguishable, and that's for the best.
Most implementations won't be affected, as they already normalize literals internally, which was explicitly allowed by the RDF 1.1 spec. For that reason, this change is even less disruptive than the deprecation of simple literals.
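For comparison, the simple-literal case can be sketched the same way. `normalize_literal` is a hypothetical helper, shown only to illustrate the kind of internal normalization many implementations are assumed to perform already.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def normalize_literal(lexical_form, datatype=None):
    """Treat a literal written without a datatype as an xsd:string literal."""
    return (lexical_form, datatype or XSD_STRING)


# "foo" and "foo"^^xsd:string end up as the same internal value.
plain = normalize_literal("foo")
typed = normalize_literal("foo", XSD_STRING)
```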

afs · 2025-02-26T16:47:23Z

Most implementations won't be affected,

Moreover (RDF 1.1): "The value space of language tags is always in lower case.", so there is even less effect.

gkellogg · 2025-02-26T18:14:18Z

I mark this PR as substantive because, technically, this is a breaking change in RDF 1.2 with implications on entailment. In RDF 1.1, the following turtle files produce different graphs, which are neither isomorphic nor simply-equivalent (they are RDF-equivalent, though).

As many implementations (including my own) have always stored a normalized version of the language tag, it was always common for these syntactically distinct literals to be considered equivalent. There are no 1.1 tests that check to be sure they create distinct literals IIRC.

pchampin · 2025-02-27T10:32:43Z

Moreover (RDF 1.1): "The value space of language tags is always in lower case.", so there is even less effect.

Yes, I spotted this sentence, which was a bit sloppy (language tags are not a datatype, so strictly speaking they don't have a value space). But it is gone from the RDF 1.2 spec anyway.

pfps

Where is the requirement that language tags in RDF graphs be in lower case?

afs · 2025-02-27T16:00:41Z

RDF 1.2 : Not lower case

BCP47 says that language tags must be treated case-insensitively.
See also the section on literals in Concepts.

pfps · 2025-02-27T16:05:32Z

RDF 1.2 : Not lower case

BCP47 says that language tags must be treated case-insensitively. See also in the section in Concepts on literals.

So the lower-casing should not be removed in Semantics then.

pchampin · 2025-02-28T00:49:42Z

So the lower-casing should not be removed in Semantics then.

I should have specified that this PR is (maybe) pending on w3c/rdf-concepts#162.
At least, the text proposed in that PR should make the rationale of this one clearer:

Two [BCP47]-complying strings that differ only by case represent the same language tag.

The goal is to convey the idea that, in the abstract syntax, the language tag is no longer a string, strictly speaking. It is an abstraction of that string that is effectively "case-less". (Of course, implementations will probably continue to represent them as strings, but that is now an implementation detail).

Given this background, there are two reasons to remove the lower-casing in the semantics:

1. It is not necessary to ensure that "equivalent" language strings denote the same thing (this is now handled by the abstract syntax abstracting away the case of the language tag in the concrete syntaxes).
2. Arguably, those "abstract" language tags not being strings, they cannot be lowercased...

Now, I realize that this also means that the value space of language tags has changed. It used to be a set of pairs (string, lower-case-BCP47-string), it is now a set of pairs (string, abstract language tag). Those two sets are isomorphic, and therefore, it does not change the behaviour of reasoners in any way.
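The isomorphism claim can be illustrated with a small sketch. Here the abstract tag is modelled, purely as an assumption for illustration, as the set of all case variants of a tag; `abstract_tag` and `lowercase_representative` are hypothetical names. Each such class contains exactly one all-lowercase member (for ASCII tags), which gives the one-to-one correspondence with lower-case BCP47 strings.

```python
from itertools import product


def abstract_tag(tag: str) -> frozenset:
    """Equivalence class of all case variants of `tag` (stand-in for the abstract tag)."""
    # Each character contributes its lower- and upper-case form; digits and
    # hyphens contribute a single choice because both forms coincide.
    choices = [{ch.lower(), ch.upper()} for ch in tag]
    return frozenset("".join(variant) for variant in product(*choices))


def lowercase_representative(cls: frozenset) -> str:
    """Map an abstract tag back to its unique all-lowercase member."""
    return next(t for t in cls if t == t.lower())


# Old value space: (string, lower-case tag); new one: (string, abstract tag).
old_value = ("chat", "fr")
new_value = ("chat", abstract_tag("FR"))
round_trip = (new_value[0], lowercase_representative(new_value[1]))
```

Since the mapping is a bijection on the tag component, a reasoner working on either representation draws the same conclusions, which is the point made above.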

TallTed

lgtm

pfps · 2025-02-28T17:32:54Z

So the lower-casing should not be removed in Semantics then.

I should have specified that this PR is (maybe) pending on w3c/rdf-concepts#162.

So this PR should be held until that PR is resolved and merged.

pfps

This needs to wait for the resolution of how to handle language tags in Concepts.

pfps · 2025-03-20T16:32:07Z

With the current wording in Concepts, lowercasing is needed as an implementation of case-independent comparison.

So this PR MUST NOT be merged.

do not normalize language tags in D-interpretations

fa012d8

pchampin added the spec:substantive label Feb 26, 2025

pchampin requested review from TallTed, doerthe, pfps and franconi February 26, 2025 16:39

pfps reviewed Feb 27, 2025

TallTed reviewed Feb 28, 2025

doerthe approved these changes Mar 6, 2025

pfps requested changes Mar 6, 2025

pchampin closed this Mar 20, 2025
