Improve the ja character set per ARIB feedback #614
|
Looking at liaison text and ARIB B-62 2.2E1 F1, I think line 3642 should target character codes listed in this list of Japanese Supplementary character set, e.g. |
Co-authored-by: himorin / Atsushi Shimono <[email protected]>
|
The Timed Text Working Group just discussed
The full IRC log of that discussion
<nigel> Topic: Improve the ja character set per ARIB feedback #614
<nigel> github: https://github.com//pull/614
<cpn> Pierre: Atsushi, if I accept both your comments, are you happy with the PR?
<cpn> Atsushi: The comment about ARIB was just a suggestion
<nigel> -> Atsushi's comment https://github.com//pull/614#issuecomment-3332729041
<cpn> Atsushi: The comment relates to the suggested change above.
<cpn> Pierre: I accepted the suggestion. Is the PR ok now?
<cpn> Atsushi: Yes
<nigel> SUMMARY: Atsushi's comments accepted |
|
I've read through the whole updated text again. |
Ideographic Variation Selector is not a defined term in Unicode. 23.4 Variation Selectors, CJK Compatibility Ideographs states:
I assumed ARIB meant standardized variation sequences for CJK compatibility ideographs. |
|
In UTS #37, an IVS (Ideographic Variation Sequence) is defined as a sequence of two coded characters: the first an ideograph, the second a variation selector. An IVS is a "sequence" of two Unicode code points but uses a single variation selector, so it is sometimes written as "Ideographic Variation Selector" (as in About IVD/IVS at CITPC); that term was also used in the early phase of development, e.g. in some proposals. CJK Compatibility Ideographs are compatibility ideographs, most of which normalize to CJK Unified Ideographs, but which are required for backward compatibility with local character encodings. Some CJK Compatibility Ideographs are also included in the collections listed here; for example, the IBM 32 compatibility ideographs, U+FA0E to U+FA2D, are listed as part of collection 287 Common Japanese. From Unicode 6.3 onward, these ranges have an additional table of SVS (Standardized Variation Sequences, using the standardized variation selectors U+FE00 to U+FE02), as described in that section, which allows code points in CJK Compatibility Ideographs to be written as a CJK Unified Ideograph followed by one of those selectors. So I believe the note in the last line of list 2 of the liaison text should be read as: use the ideograph-specific variation selectors defined in Unicode, or the Ideographic Variation Sequences (IVS) defined in Unicode. |
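To make the distinction above concrete, here is a minimal Python sketch using only the standard library. It labels variation selectors by code point range (VS1 to VS16 at U+FE00..U+FE0F, VS17 to VS256 at U+E0100..U+E01EF, the latter being the range UTS #37 uses for IVS) and shows that a CJK Compatibility Ideograph is replaced by its unified ideograph under normalization. The function names and sample strings are illustrative assumptions, not taken from the ARIB liaison or from the PR; the sketch does not check whether a sequence is actually listed in StandardizedVariants.txt or registered in the IVD.

```python
import unicodedata

# Variation selector ranges from the Unicode code charts.
VS1_VS16 = range(0xFE00, 0xFE10)       # U+FE00..U+FE0F
VS17_VS256 = range(0xE0100, 0xE01F0)   # U+E0100..U+E01EF (range used by IVS in UTS #37)


def selector_kind(ch):
    """Return a range label for a variation selector, or None for any other character."""
    cp = ord(ch)
    if cp in VS1_VS16:
        return "VS1-VS16"
    if cp in VS17_VS256:
        return "VS17-VS256"
    return None


def variation_sequences(text):
    """List (base, selector, kind) for every character that is followed by a selector."""
    found = []
    for base, sel in zip(text, text[1:]):
        kind = selector_kind(sel)
        if kind is not None:
            found.append((base, sel, kind))
    return found


# A CJK Compatibility Ideograph is replaced by its unified ideograph under
# normalization, so the distinction is lost unless a variation sequence is used.
compat = "\uFA10"
unified = unicodedata.normalize("NFC", compat)   # U+585A

# Illustrative inputs only (assumptions, not examples supplied by ARIB).
svs_style = "\u585A\uFE00"        # unified ideograph + VS1
ivs_style = "\u845B\U000E0100"    # ideograph + VS17

for sample in (svs_style, ivs_style):
    for base, sel, kind in variation_sequences(sample):
        print(f"U+{ord(base):04X} U+{ord(sel):04X} {kind}")
```

The range check only says which family a selector belongs to. Note that VS1 to VS16 also cover non-ideographic uses such as the emoji presentation selectors, which is why the exact wording of the requirement matters.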
Ideographic variation sequences are not part of Unicode and are instead specified in UTS 37, whereas standardized variation sequences are specified in Unicode itself. Does ARIB STD-B62 reference UTS 37? Are you sure that ARIB does not mean standardized variation sequences? Can ARIB provide examples of what they mean by "variation of Kanji character"? |
|
The basic question is: which clauses of the Unicode standard, and what expected conformance, does the following requirement from the ARIB liaison refer to? "For variation of Kanji characters, the Ideographic Variation Selector defined in [ISO10646] shall be used." In particular, are specifications beyond ISO 10646 required to specify and/or conform to the requirement? In addition, a few examples would be appreciated. |
|
The Timed Text Working Group just discussed
The full IRC log of that discussion
<nigel> Subtopic: Improve the ja character set per ARIB feedback #614
<nigel> github: https://github.com//pull/614
<nigel> Pierre: [shares screen]
<nigel> .. Liaison from ARIB raises the question at hand.
<nigel> .. ARIB kindly suggested character set changes for ja, which is great.
<nigel> .. There's a note about Ideographic Variation Selector.
<nigel> .. However that is not a defined term.
<nigel> .. Atsushi and I have been discussing how to interpret it.
<nigel> .. We need to figure out what that means, so we don't write something different from
<nigel> .. what they intend.
<nigel> .. From Atsushi's last comment I think "ideographic variation sequence"?
<nigel> Atsushi: CJK compatibility ideographs are there for compatibility.
<nigel> .. There can be mismapping between character set and what Unicode says.
<nigel> .. For backward compatibility between local character set and unicode some characters
<nigel> .. have both mappings within [scribe missed].
<nigel> .. I believe that is not related to variation sequence or anything else.
<nigel> .. If someone wants to say about the variation selector usually we say
<nigel> .. "ideographic variation selector" or "ideographic variation sequence"
<nigel> .. so they should mean the same as each other. They are terms used interchangeably.
<nigel> .. I believe what the point means is that the ideographic variation sequences shall be used.
<nigel> Pierre: That's not part of main Unicode, it's part of UCS-37. Does ARIB reference UCS-37?
<nigel> Atsushi: Variation selector itself is in ISO10646
<nigel> Pierre: That's a much broader thing though, includes emoji selectors which I think we don't want.
<nigel> Atsushi: shows [Ideographic variation sequence] in Unicode 17.0.0
<nigel> Pierre: You have to know how to represent it.
<nigel> Atsushi: Representation is described in a separate database, not in ISO10646.
<nigel> Pierre: Before saying you must or should support this I want to know absolutely certainly that
<nigel> .. is what ARIB has in mind. Can we get a sample?
<nigel> .. I don't want to suggest a mandatory thing that's wrong or won't be used.
<nigel> Atsushi: I wonder if I can ask a "side" way from colleagues in NHK.
<nigel> Pierre: Please ask informally! I'm interested as an Editor in knowing which part of Unicode
<nigel> .. this "SHALL" exactly means.
<nigel> .. Just to clarify the terminology that doesn't exactly match the spec.
<nigel> Atsushi: Is it okay to reply to the liaison email by myself?
<nigel> Nigel: Yes I think that would be good. I'd suggest if you can write informally in response
<nigel> .. that we noticed this small difference in language and want to make sure that we understand
<nigel> .. correctly and ask for guidance or even sample data then that would help clear this up for us.
<nigel> .. I don't want to go around a whole formal liaison/response loop which will take a long time.
<nigel> Pierre: [drafts the essential request in the GitHub issue]
<nigel> SUMMARY: @himorin to ask informally for clarification as per the above discussion. |
|
(Still waiting for a reply from ARIB colleagues.) |
|
The Timed Text Working Group just discussed
The full IRC log of that discussion
<nigel> Subtopic: Improve the ja character set per ARIB feedback #614
<nigel> github: https://github.com//pull/614
<wschildbach> pierre: we added a recommended charset based on ARIB input.
<wschildbach> .. unfortunately, there is in the liaison some vagueness. We should make sure we get it right.
<wschildbach> .. we asked for more details but got no clarification.
<wschildbach> .. don't want to remove the text but we need clarification. This is informative (a should, not a shall), it is useful but not necessary.
<wschildbach> nigel: I think that the ideographic selector is not defined where it says it is.
<wschildbach> .. translation issue?
<wschildbach> atsushi: this is not a stopping issue
<wschildbach> nigel: if your colleague comes later, let's ask them
<wschildbach> .. is there a choice of terminology and we need to use the correct one?
<wschildbach> pierre: this is a complex part with many things falling underneath it.
<wschildbach> .. what would be most useful would be an example of what is meant.
<wschildbach> .. I find it a complex part of the unicode spec.
<wschildbach> .. as atsushi pointed out, terms may have changed. Ideally have an example.
<wschildbach> .. here is sample tet that uses IVS, and here is what we expect the rendering to be.
<wschildbach> s/tet/text/
<wschildbach> .. and we could include a spec action.
<nigel> s/a spec action/in the spec actually
<nigel> s/in/it in
<wschildbach> nigel: this is unresolved right now. so we are saying we can proceed to CRS without resolving?
<wschildbach> atsushi: agrees.
<wschildbach> nigel: we merge later.
<wschildbach> atsushi: this is not normative, so don't need another crs
<wschildbach> nigel: we can put change in and request transition to rec
<wschildbach> .. implementation report will be empty. It is a formality.
<wschildbach> s/crs/CRS/
<nigel> SUMMARY: Hold this PR open pending feedback and hopefully an example, and do not hold up CRS publication
<nigel> forcedDisplay and visibility="hidden" #484
<nigel> s/forced/Subtopic: forced |
Closes #613