Propertiness #1064
I'd suggest that the top be the properties on https://www.unicode.org/reports/tr18/#RL2.7, perhaps with those groupings.
Put all Contributory and Provisional into a separate bucket.
Not sure what the parens are for, as in (kEH_Core).
Some values don't have links, e.g. "Obsolete", Identifier_Status Restricted.
The
Will look it over more tomorrow.
Provisional, see the heading
Finer property status (splitting out Contributory etc.) and groupings would be nice, but we do not have a maintainable way of keeping track of it so far (there was an attempt with PropertyStatus.java, but as noted in the PR description, that did not work). Here I am instead doing what I can based on what we are forced to maintain, namely *PropertyAliases.txt.
Yes, that is because it is multivalued, see #1018 item 2.
Confusable is there, it goes into Non-UCD non-properties (Other information). The Identifier_* stuff is what UTS39 actually describes as a property. RGI_Emoji (but not RGI_Emoji_*_Sequence) should be there because it is described as a property in UTS51, but isn’t because it is hacked directly into the JSPs instead of being in IndexUnicodeProperties; I will add it later, see the TODOs in ExtraPropertyAliases.
Note that beyond the cosmetics of grouping character.jsp, we actually want to keep track of the « is this a UCD property » information, see #1049.
Note: I tried splitting out Provisional from Normative+Informative, and that seemed counterproductive for Unihan and Unikemet (which are the only places where we have Informative properties) to have them in two blocks; hence the parentheses approach.
Ah nevermind, I see UTS51 also describes the RGI_Emoji_*_Sequence zoo as properties. I’ll fix that.
As noted in the TODOs, I’d like to move RGI_Emoji and IDNA2008_Category into IndexUnicodeProperties (rather than being patched into the JSPs), and to add RGI_Emoji_Qualification, all of these being NonUcdProperty. But I will do that in a subsequent PR.
@markusicu Friendly ping, since I think some of @jowilco’s work is blocked on this.
@@ -1440,6 +1442,42 @@ public static void showProperties(

String kRSUnicode = getFactory().getProperty("kRSUnicode").getValue(cp);
boolean isUnihan = kRSUnicode != null;
List<UcdProperty> indexedProperties =
optional -- might simplify something:
How about, rather than just building separate lists of properties, you add an enum PropCategory { UCD, NON_UCD, ... CJK, ... }, and create a Map<PropCategory, List<UcdProperty>>?
You could then also use maps from PropCategory to table headings and such.
Does it matter if these lists are List's? Or do you just need Collection's?
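For illustration, the enum-keyed map suggested above could look roughly like the sketch below. This is not the code in the PR: the class name PropertyGrouping, the category constants, and most of the heading strings are hypothetical, and plain String names stand in for UcdProperty so that the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PropertyGrouping {
    /** Hypothetical categories; constant names and headings are illustrative only. */
    enum PropCategory {
        UCD_PROPERTY("UCD properties"),
        NON_UCD_PROPERTY("Non-UCD properties"),
        UCD_NON_PROPERTY("UCD non-properties"),
        NON_UCD_NON_PROPERTY("Non-UCD non-properties"),
        UNIHAN("Unihan properties");

        /** Table heading for this category, kept next to the constant. */
        final String heading;

        PropCategory(String heading) {
            this.heading = heading;
        }
    }

    /** Groups property names by category, preserving input order within each group. */
    static Map<PropCategory, List<String>> group(Map<String, PropCategory> categoryOf) {
        Map<PropCategory, List<String>> grouped = new EnumMap<>(PropCategory.class);
        for (Map.Entry<String, PropCategory> e : categoryOf.entrySet()) {
            grouped.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Assumed sample assignment, for demonstration only.
        Map<String, PropCategory> categoryOf = new LinkedHashMap<>();
        categoryOf.put("Script", PropCategory.UCD_PROPERTY);
        categoryOf.put("Identifier_Status", PropCategory.NON_UCD_PROPERTY);
        categoryOf.put("kRSUnicode", PropCategory.UNIHAN);
        categoryOf.put("General_Category", PropCategory.UCD_PROPERTY);

        // EnumMap iterates in enum declaration order, which fixes the block order.
        group(categoryOf).forEach((category, names) ->
                System.out.println(category.heading + ": " + names));
    }
}
```

Since an EnumMap iterates in declaration order, the order of the blocks on the page would follow the order of the enum constants, and categories with no properties simply do not appear.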
Does it matter if these lists are List's?
Not really, I convert them to lists of String below anyway (because one of them is not a list of UcdProperty, namely the list of stuff that gets added in the tools).
I will merge this now to unblock John and see if I can come up with something cleaner in a subsequent PR.
UnicodeJsps/src/main/java/org/unicode/jsp/UnicodeUtilities.java
Co-authored-by: Markus Scherer <[email protected]>
A classification of properties derived from their presence in PropertyAliases, or from a field that we are forced to fill in within ExtraPropertyAliases (contrast PropertyStatus.java, which is out of date).
In character.jsp, split the information into (UCD properties, non-UCD properties, UCD non-properties, non-UCD non-properties), with a further split for Unihan (out of UCD properties and after UCD non-properties). See it in staging: