-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HHH-14032 Parse Locales with scripts properly in LocaleTypeDescriptor.fromString #3410
base: 5.6
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -51,43 +51,33 @@ public Locale fromString(String string) { | |
return Locale.ROOT; | ||
} | ||
|
||
boolean separatorFound = false; | ||
int position = 0; | ||
char[] chars = string.toCharArray(); | ||
|
||
for ( int i = 0; i < chars.length; i++ ) { | ||
// We just look for separators | ||
if ( chars[i] == '_' ) { | ||
if ( !separatorFound ) { | ||
// On the first separator we know that we have at least a language | ||
string = new String( chars, position, i - position ); | ||
position = i + 1; | ||
} | ||
else { | ||
// On the second separator we have to check whether there are more chars available for variant | ||
if ( chars.length > i + 1 ) { | ||
// There is a variant so add it to the constructor | ||
return new Locale( string, new String( chars, position, i - position ), new String( chars, | ||
i + 1, chars.length - i - 1 ) ); | ||
Locale.Builder builder = new Locale.Builder(); | ||
String[] parts = string.split("_"); | ||
|
||
for (int i = 0; i < parts.length; i++) { | ||
String s = parts[i]; | ||
switch (i) { | ||
case 0: | ||
builder.setLanguage(s); | ||
break; | ||
case 1: | ||
builder.setRegion(s); | ||
break; | ||
case 2: | ||
if (i < parts.length - 1 || !s.startsWith("#")) { | ||
builder.setVariant(s); | ||
break; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It would be great to adhere to hibernate code format (see https://github.com/hibernate/hibernate-ide-codestyles). Also, existing code convention uses |
||
} | ||
else { | ||
// No variant given, we just have language and country | ||
return new Locale( string, new String( chars, position, i - position ), "" ); | ||
case 3: | ||
if (s.startsWith("#")) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When needing to compare with characters, I'd prefer to use the char API:
|
||
s = s.substring(1); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It would be ideal to add There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The gist of this PR seems to lie in the above There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. TBH, there must be still room for code readability improvement by clearer logic. The break case falling through is to difficult to understand. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree that the above code is less than ideal. Adding a break to case 2 would mean duplicating the code from case 3, I don't know which one is the lesser of evils. We could perhaps just avoid the switch altogether and unroll it into 4 if statements. It could perhaps be rewritten to mimic this pattern as seen in LanguageTag, which would require using something like StringTokenizer. It is unfortunate that this code cannot simply use toLanguageTag / forLanguageTag... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IMHO, using StringTokenizer is perfectly fine in Hibernate core codebase (it has been used more than 50 after a quick search). Code readability is a much bigger concern. The current switch case falling through is a clever approach but might make future code maintaining much more difficult. |
||
} | ||
} | ||
|
||
separatorFound = true; | ||
builder.setScript(s); | ||
break; | ||
} | ||
} | ||
|
||
if ( !separatorFound ) { | ||
// No separator found, there is only a language | ||
return new Locale( string ); | ||
} | ||
else { | ||
// Only one separator found, there is a language and a country | ||
return new Locale( string, new String( chars, position, chars.length - position ) ); | ||
} | ||
return builder.build(); | ||
} | ||
|
||
@SuppressWarnings({ "unchecked" }) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,18 +25,19 @@ public class LocaleTypeDescriptorTest extends BaseUnitTestCase { | |
|
||
@Test | ||
public void testConversionFromString() { | ||
assertEquals( toLocale( "de", null, null ), LocaleTypeDescriptor.INSTANCE.fromString( "de" ) ); | ||
assertEquals( toLocale( "de", "DE", null ), LocaleTypeDescriptor.INSTANCE.fromString( "de_DE" ) ); | ||
assertEquals( toLocale( null, "DE", null ), LocaleTypeDescriptor.INSTANCE.fromString( "_DE" ) ); | ||
assertEquals( toLocale( null, null, "ch123" ), LocaleTypeDescriptor.INSTANCE.fromString( "__ch123" ) ); | ||
assertEquals( toLocale( null, "DE", "ch123" ), LocaleTypeDescriptor.INSTANCE.fromString( "_DE_ch123" ) ); | ||
assertEquals( toLocale( "de", null, "ch123" ), LocaleTypeDescriptor.INSTANCE.fromString( "de__ch123" ) ); | ||
assertEquals( toLocale( "de", "DE", "ch123" ), LocaleTypeDescriptor.INSTANCE.fromString( "de_DE_ch123" ) ); | ||
assertEquals( toLocale( "", "", "" ), LocaleTypeDescriptor.INSTANCE.fromString( "" ) ); | ||
assertEquals( toLocale( "de", null, null, null ), LocaleTypeDescriptor.INSTANCE.fromString( "de" ) ); | ||
assertEquals( toLocale( "de", "DE", null, null ), LocaleTypeDescriptor.INSTANCE.fromString( "de_DE" ) ); | ||
assertEquals( toLocale( null, "DE", null, null ), LocaleTypeDescriptor.INSTANCE.fromString( "_DE" ) ); | ||
assertEquals( toLocale( null, null, "ch123", null ), LocaleTypeDescriptor.INSTANCE.fromString( "__ch123" ) ); | ||
assertEquals( toLocale( null, "DE", "ch123", null ), LocaleTypeDescriptor.INSTANCE.fromString( "_DE_ch123" ) ); | ||
assertEquals( toLocale( "de", null, "ch123", null ), LocaleTypeDescriptor.INSTANCE.fromString( "de__ch123" ) ); | ||
assertEquals( toLocale( "de", "DE", "ch123", null ), LocaleTypeDescriptor.INSTANCE.fromString( "de_DE_ch123" ) ); | ||
assertEquals( toLocale( "zh", "HK", null, "Hant"), LocaleTypeDescriptor.INSTANCE.fromString( "zh_HK_#Hant" ) ); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It seems the above testing case is the sole motivation of this PR. Could we avoid touching existing code logic by a simple patch as following:
Seems a better alternative than the current tricky switch implementation. Also, it is easy to understand and verify, so is more ideal from code maintaining perspective. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The existing code needs to be rewritten regardless, as the only way to set script or extensions is by switching to LocaleBuilder. Also, looking at the Javadoc / implementation for Locale.toString(), we can see the pattern is different based on whether the Locale has a variant or not (e.g., with a variant, script and extension are separated from the rest with "_#" vs just the "#" when no variant is present). Therefore I think the approach of testing if the string ends with "#" would need a special case for "_" anyway. I'll explore the more generic approach with just using a StringTokenizer or the like, if the main concern is the switch statement here? |
||
assertEquals( toLocale( "", "", "", null ), LocaleTypeDescriptor.INSTANCE.fromString( "" ) ); | ||
assertEquals( Locale.ROOT, LocaleTypeDescriptor.INSTANCE.fromString( "" ) ); | ||
} | ||
|
||
public Locale toLocale(String lang, String region, String variant) { | ||
public Locale toLocale(String lang, String region, String variant, String script) { | ||
final Locale.Builder builder = new Locale.Builder(); | ||
if ( StringHelper.isNotEmpty( lang ) ) { | ||
builder.setLanguage( lang ); | ||
|
@@ -47,6 +48,9 @@ public Locale toLocale(String lang, String region, String variant) { | |
if ( StringHelper.isNotEmpty( variant ) ) { | ||
builder.setVariant( variant ); | ||
} | ||
if ( StringHelper.isNotEmpty( script ) ) { | ||
builder.setScript( script ); | ||
} | ||
return builder.build(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From performance perspective, we should avoid using
String.split()
for it will create RE and might incur unnecessary performance cost. There should be better way to replace the abovesplit()
usage.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you can use the
StringHelper#split
for better performance.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had originally used StringTokenizer, which should have better performance, but its Javadoc says the following:
That said, even in the latest JDK the class is technically not deprecated, so not sure if the above advice should be strictly heeded.
Looking at the suggested
StringHelper#split
method, it actually does use StringTokenizer internally. Depending on how careful we want to be about avoiding extra allocations, we can use StringTokenizer directly and avoid the array allocation. I don't have a strong opinion.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for knowledge sharing. I have no idea that

StringTokenizer
is not recommended. I took a look of the internal implementation ofString#split()
and found it has a fastpath for simple delimiter, so I think your current approach is perfectly fine.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
split
method accepts a regex which gets compiled at runtime, it would be best to declare the regex as a static final constant in the class, and use that to split strings.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Sanne as mentioned in the above comment, the split method has a fast path when splitting on a single character that avoids compiling a regex.