
@michaelfarrell76
Member

Related Issues

Summary

This PR rewrites the browser-language → Transcend locale resolution so it no longer “knows” anything about AWS translation codes. All decisions are now made using our own locale constants and browser-tag map, with a simple and predictable prefix rule.

Why

The previous implementation leaked AWS_SUPPORTED_TRANSLATIONS / LOCALE_TRANSLATION_MAP into selection logic, making UX decisions depend on a vendor map.

Customers can configure which locales appear (and their order) independently for Consent UI and Privacy Center. We need deterministic behavior based only on our browser tag map and the customer’s supportedLocales list and its order.

Rule

  1. If the mapped locale is in supportedLocales, use it.
  2. Else, if the short base (e.g., ar) is in supportedLocales, use it.
  3. Else, use the first base-* variant present in supportedLocales (respects customer order).

Examples:

  • ar → ar (if supported) else first ar-*.
  • ar-EG when only ar-AE is allowed → ar-AE.
  • zh-HK with only zh allowed → zh (never zh-CN unless zh-CN itself is in the allowed list).
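The rule and examples above can be sketched as follows. This is a minimal sketch, not the PR's code: `BROWSER_TAG_MAP` is a hypothetical four-entry stand-in for the real browser-tag map, locale keys are assumed to be plain BCP 47-style strings, and `baseOf` reuses the helper name from the PR with an assumed body.

```typescript
// Hypothetical subset of a browser-tag map: lowercased BCP 47 tag -> locale key.
// The real LOCALE_BROWSER_MAP is much larger; these entries are illustrative only.
const BROWSER_TAG_MAP: Record<string, string> = {
  ar: 'ar',
  'ar-ae': 'ar-AE',
  zh: 'zh',
  'zh-hk': 'zh-HK',
};

/** Lowercased primary language subtag, e.g. "ar-EG" -> "ar" (assumed behavior). */
function baseOf(tag: string): string {
  return tag.toLowerCase().split('-')[0];
}

/**
 * Resolve one browser tag against the customer's ordered supportedLocales.
 * Returns undefined when no rule matches so the caller can apply its default.
 */
function resolveForBrowserTag(
  browserTag: string,
  supportedLocales: readonly string[],
): string | undefined {
  const lc = browserTag.toLowerCase();
  const mapped = BROWSER_TAG_MAP[lc];

  // Rule 1: the mapped locale is supported as-is.
  if (mapped !== undefined && supportedLocales.includes(mapped)) {
    return mapped;
  }

  // Rule 2: the short base (e.g. "ar") is supported.
  const base = baseOf(lc);
  const shortMatch = supportedLocales.find((l) => l.toLowerCase() === base);
  if (shortMatch !== undefined) {
    return shortMatch;
  }

  // Rule 3: first "<base>-*" variant, respecting the customer's order.
  return supportedLocales.find((l) => l.toLowerCase().startsWith(`${base}-`));
}
```

For instance, `resolveForBrowserTag('ar-EG', ['en', 'ar-AE'])` returns `'ar-AE'` through rule 3. Matching on `${base}-` rather than a bare prefix keeps a base like `ar` from accidentally matching an unrelated code that merely starts with the same letters.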

@linear

linear bot commented Oct 10, 2025

Member

@csmccarthy left a comment

i have some reservations about the matching logic, comments below!

Comment on lines 109 to 113
const prefixLc = baseOf(normalizeBrowserTag(browserTag));
const shortMatch = supportedLocales.find((l) => l.toLowerCase() === prefixLc);
if (shortMatch) {
  return shortMatch;
}
Member

im not sure that this makes sense to implement - if we assume that we add a base lang and all its sublocale BCP 47 codes to our LOCALE_BROWSER_MAP at a time, then the only time this would trigger is e.g.

  1. my browser locale is it-CH which maps to LOCALE_KEY.ItCh
  2. the customer has explicitly marked LOCALE_KEY.ItCh as unsupported (since thats the only reason why it wouldnt be in the supportedLocales list)
  3. we then look up the base "it", which has to be in the supportedLocales list for this logic to trigger (otherwise we would move to the next fallback), meaning LOCALE_BROWSER_MAP has "it" mapped to LOCALE_KEY.It
  4. and we then show them the base italian language json.

i think this feels a little murky to me for 3 reasons:

  1. the customer has to explicitly mark that locale as not supported, so im not sure we want to fall back to a potentially very different base lang. one example i could think of: if a customer has fr translated and it's full of France-French words, references, or jokes, then them saying "we dont support fr-CA" is for a decent reason
  2. base langs and sublocales can be really different! chinese probably being the biggest example here
  3. the order of operations feels off. if the user has 2 languages they speak marked in their browser, we would be favoring a fuzzy match on their first language over a solid match on their second
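To make point 3 concrete, here is a hedged sketch of a two-pass ordering: collect exact map hits across all browser languages first, then append fuzzy (base, or first base-*) matches. `orderUserLocales`, `EXACT_MAP`, and the locale strings are hypothetical and not part of the PR.

```typescript
// Hypothetical exact map: lowercased browser tag -> locale key.
const EXACT_MAP: Record<string, string> = {
  it: 'it',
  'it-ch': 'it-CH',
  de: 'de',
  'de-de': 'de-DE',
};

function baseOf(tag: string): string {
  return tag.toLowerCase().split('-')[0];
}

/**
 * Order supported locales by the user's browser languages, ranking every exact
 * hit ahead of any fuzzy (base or base-*) match, regardless of browser order.
 */
function orderUserLocales(
  browserLanguages: readonly string[],
  supportedLocales: readonly string[],
): string[] {
  const result: string[] = [];
  const push = (locale: string | undefined): void => {
    if (locale !== undefined && !result.includes(locale)) {
      result.push(locale);
    }
  };

  // Pass 1: exact hits only, in browser preference order.
  for (const tag of browserLanguages) {
    const mapped = EXACT_MAP[tag.toLowerCase()];
    if (mapped !== undefined && supportedLocales.includes(mapped)) {
      push(mapped);
    }
  }

  // Pass 2: fuzzy fallbacks (short base first, then first base-* variant).
  for (const tag of browserLanguages) {
    const base = baseOf(tag);
    push(
      supportedLocales.find((l) => l.toLowerCase() === base) ??
        supportedLocales.find((l) => l.toLowerCase().startsWith(`${base}-`)),
    );
  }

  return result;
}
```

With browser languages `['it-CH', 'de']` and supported `['it', 'de', 'en']`, the solid `de` match on the second language now ranks ahead of the fuzzy `it` match on the first.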

Member Author

i definitely think that if a customer has "fr" enabled but not "fr-CA", and the browser locale is "fr-CA", we should default to "fr" instead of "en"....

Member Author

i can definitely adjust the logic to favor an exact match on second over a fuzzy match on first!

but i def think that in the above example fr-CA should map to fr rather than en if those are the two best options available

Member

im just calling out that i dont think it would necessarily be universally desired. we can settle on that, but fuzzy locale matching has historically been an extremely sticky topic for us when it comes up on tickets and such. ease of understanding the matching logic may be a point in favor of just going straight through LOCALE_BROWSER_MAP and failing over otherwise, esp with all the new locale keys we're adding making that exact matching more robust
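For comparison, the stricter alternative (go straight through LOCALE_BROWSER_MAP and fail over otherwise) might look like the sketch below. `resolveStrict` and the three map entries are hypothetical; the point is that removing a key from supportedLocales genuinely disables it.

```typescript
// Hypothetical browser-tag map (lowercased tag -> locale key).
const LOCALE_BROWSER_MAP_SKETCH: Record<string, string> = {
  fr: 'fr',
  'fr-fr': 'fr-FR',
  'fr-ca': 'fr-CA',
};

/**
 * Exact-only resolution: walk the browser languages in order and take the first
 * tag whose mapped locale is supported; otherwise use the configured default.
 * There is deliberately no base/prefix fallback.
 */
function resolveStrict(
  browserLanguages: readonly string[],
  supportedLocales: readonly string[],
  defaultLocale: string,
): string {
  for (const tag of browserLanguages) {
    const mapped = LOCALE_BROWSER_MAP_SKETCH[tag.toLowerCase()];
    if (mapped !== undefined && supportedLocales.includes(mapped)) {
      return mapped;
    }
  }
  return defaultLocale;
}
```

This makes the trade-off in the thread visible: a browser that only sends `fr-CA` lands on `en` when only `fr` and `en` are supported, and only reaches `fr` if the browser also lists `fr`.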

Member Author

@csmccarthy but thats not even how it works today. the way it works today actually seems to be more accepting of fallbacks like this. the logic seemed to be saying "fr-CA, fr, and fr-FR all map to the fr locale in AWS translations, so they are all interchangeable". this at least makes it more obvious how we fall back...

Member Author

i thought about something like "just use LOCALE_BROWSER_MAP" but we'd have to change it to something like { "ar": ["ar", "ar-AR", ...] }

and the complexity of maintaining that and making sure folks edit it correctly is a bit high. i think the rules here are simpler:

  1. take the exact hit
  2. fall back to the 2 character locale
  3. fall back to the best locale matching the 2 character prefix
  4. default
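The four steps above can be sketched as one function that also reports which step fired, which makes the intended behavior easy to pin down in unit tests. `resolveWithReason`, `MatchStep`, and `TAG_MAP` are hypothetical names; this is an illustration, not the PR's implementation.

```typescript
type MatchStep = 'exact' | 'base' | 'prefix' | 'default';

// Hypothetical browser-tag map (lowercased tag -> locale key).
const TAG_MAP: Record<string, string> = {
  ar: 'ar',
  'ar-ae': 'ar-AE',
  'fr-ca': 'fr-CA',
};

function resolveWithReason(
  browserTag: string,
  supportedLocales: readonly string[],
  defaultLocale: string,
): { locale: string; step: MatchStep } {
  const lc = browserTag.toLowerCase();
  const base = lc.split('-')[0];

  // 1. Exact hit through the browser-tag map.
  const mapped = TAG_MAP[lc];
  if (mapped !== undefined && supportedLocales.includes(mapped)) {
    return { locale: mapped, step: 'exact' };
  }
  // 2. Short base locale (e.g. "fr" for "fr-CA").
  const shortMatch = supportedLocales.find((l) => l.toLowerCase() === base);
  if (shortMatch !== undefined) {
    return { locale: shortMatch, step: 'base' };
  }
  // 3. First supported locale sharing the base prefix, in customer order.
  const prefixed = supportedLocales.find((l) => l.toLowerCase().startsWith(`${base}-`));
  if (prefixed !== undefined) {
    return { locale: prefixed, step: 'prefix' };
  }
  // 4. Configured default.
  return { locale: defaultLocale, step: 'default' };
}
```

Asserting on `step` as well as `locale` would let a test distinguish "fr-CA resolved to fr by design" from "fr-CA resolved to fr by accident".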

Member

i dont really agree, but i dont feel strongly enough to argue it out in comments on the tests. i think once we have the exact > fuzzy match change in then we can merge

Member Author

@michaelfarrell76 Oct 10, 2025

ok i can do that. i’m definitely open to hearing you out, but id like to get specific test examples for the counterargument. this logic was very hard to parse without any tests. it's quite tricky to know what was intentional vs overlooked, and how a change to one of these functions changes the behavior of other corner cases.

i feel like what you’re arguing would result in browser language “ar” mapping to “en” instead of ar-AE, which definitely feels off to me?

Member

agreed it was hard to grok without tests lol, the logic is pretty rough

the tldr is what i was advocating for would result in a situation where, if i go into the admin dash and remove the Ar locale key from my org's supported locales and then go onto my site with the Ar locale, i'd see the fallback language, which makes intuitive sense to me. doing the fuzzy matching means you cant really "disable" certain locales if you want to

i need to head out the door to get some labwork done before my fmla leave! i can respond again when im back home (or on slack)

Member Author

@michaelfarrell76 Oct 10, 2025

kk no problem, i may get this merged in so that i can start updating the functions in monorepo and consent uis, but before getting the final pr merged in main, im definitely happy to make sure we flesh this out. at the very least, it should now be easier to make changes to the logic in one place with some quick unit tests.

what do you think about the situation where the browser locale is "ar-AE", the list of supported locales is "ar", "fr", "en", and "ar-AE" is not supported? do you think it should fall back to ar?

if i go into the admin dash and remove the Ar locale key from my orgs supported locales, and then went onto my site with the Ar locale id see the fallback language, which makes intuitive sense to me. doing the fuzzy matching means you cant really "disable" certain locales if you want to

the situation you describe makes total sense to me as a valid concern... i think, at least for now, it makes sense to start with what we have here as a first step for the following reasons:

  1. for this first pass, consent manager ui still will not support all LOCALE_KEYS, so the customer is not given the ability to choose "ar" vs "ar-AE". when the consent ui constant was created, we basically chose a single locale key for each language, but it's not consistently the 5+ character key or the 2 character key. the logic changes here basically use the same idea: we essentially treat each locale key as its 2 character key and just find the closest language to that 2 character key.
  • note: of all the languages in the consent ui, the full list of locales that overlap on the first two characters is: es-419, es-ES, he-IL, he, zh-CN, zh-HK. note that "zh" and "es" are not configurable options for consent ui at this time
  2. historically when i've seen customers configure the set of languages they use on the privacy center, it seems like they are normally trying to pick the same locale keys that they use on their website, while also choosing the minimum number of options so that they can keep the set of translations they have to maintain to a minimum. companies seem to fall into a few camps: most just use 2 character keys with a few specific 4+ character keys, some only use 4 character keys, and for others it can be a complete mixture (not always well thought out). i bring this up because the concept of "disabling" or "enabling" a browser language is completely coupled to the need to provide translations, which folks want to keep to a minimum.

  3. I do like the idea of continuing to encourage folks to treat localization separately from regionalization. this would keep switching languages mostly independent from the meaning behind the content. i think adding in the 2 character locales will help with this, as we can encourage folks to use the 2 character locales in most places and only use the 4 character locales when they really need to get nuanced. if this is the ultimate way we push customers, and we agree that "ar-AE" falls back to "ar", i could see a world where we change it so that "ar" does not fall back to "ar-AE". but we should keep in mind that getting to this state involves both a) making the consent ui language keys selectable and inclusive of all LOCALE_KEY values and b) reviewing all customer privacy centers that are using 4 character locales when they probably should instead be using 2 character locales.
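The overlap check in the note above (which consent UI keys collide once each key is reduced to its base) is easy to reproduce. A small sketch; `overlappingBases` is a hypothetical helper, and the test input is illustrative: the six overlapping keys named above plus two assumed non-overlapping keys (`en`, `fr-FR`), not the real consent UI list.

```typescript
/**
 * Group locale keys by their primary language subtag and keep only the groups
 * with more than one member, i.e. keys that collide once treated as their base.
 */
function overlappingBases(localeKeys: readonly string[]): Record<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const key of localeKeys) {
    const base = key.toLowerCase().split('-')[0];
    const group = groups.get(base) ?? [];
    group.push(key);
    groups.set(base, group);
  }

  // Keep only bases shared by two or more keys, preserving input order.
  const overlaps: Record<string, string[]> = {};
  groups.forEach((group, base) => {
    if (group.length > 1) {
      overlaps[base] = group;
    }
  });
  return overlaps;
}
```

Run against the full list of consent UI keys, this would flag every place where treating keys as their 2 character base could make the choice between two configured locales ambiguous.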

Comment on lines 75 to 78
const baseLc = baseOf(lc);
if (baseLc in LOCALE_BROWSER_MAP_LOWERCASE) {
  return LOCALE_BROWSER_MAP_LOWERCASE[baseLc];
}
Member

does it make sense to have both this and the base lang fallback in resolveSupportedLocaleForBrowserTag?


/**
* Sort a provided list of locales by the user’s preferences.
* Exact matches rank before base-only matches; otherwise original order is preserved.
Member

afaict this comment as written isnt true, since we're adding the base-only fuzzy matches to the user's preferred locale array
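If the doc comment's contract is the one we want, one way to make it literally true is a stable sort on a per-locale match rank. A hedged sketch: `sortByUserPreference` and its three-level rank (exact, base-only, other) are assumptions, not the PR's function.

```typescript
/**
 * Sort locales so exact matches for any browser language come first, then
 * base-only matches, then everything else. Array.prototype.sort is stable
 * (ES2019+), so original order is preserved within each rank.
 */
function sortByUserPreference(
  locales: readonly string[],
  browserLanguages: readonly string[],
): string[] {
  const exact = new Set(browserLanguages.map((t) => t.toLowerCase()));
  const bases = new Set(browserLanguages.map((t) => t.toLowerCase().split('-')[0]));

  const rank = (locale: string): number => {
    const lc = locale.toLowerCase();
    if (exact.has(lc)) return 0; // exact match
    if (bases.has(lc.split('-')[0])) return 1; // base-only match
    return 2; // no match
  };

  // Copy first so the caller's array is not mutated.
  return locales.slice().sort((a, b) => rank(a) - rank(b));
}
```

Note that this follows the comment literally: within a rank it keeps the provided order rather than the browser's preference order, which may or may not be the intended behavior.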

Member Author

@michaelfarrell76 left a comment

@csmccarthy i think the one big takeaway i have right now is to make sure that exact matches get prioritized over fuzzy match fallback.

for the other comments, i dont think i completely agree with your suggestion, or im not fully understanding it, as it's a bit hard to reason through the logic.

if there are things that you still dont agree with, could you take a look at the tests and comment on any tests whose expected values you disagree with? or call out additional examples that i can add to the tests to enforce the desired logic?

i feel like it will be easier to reach an agreement by aligning on the test cases

csmccarthy previously approved these changes Oct 10, 2025
@michaelfarrell76
Member Author

@csmccarthy actually turning out to be a bit more complicated to prioritize exact match over a fuzzy match.

take the following test

  it('prioritizes a later exact LOCALE_BROWSER_MAP hit over an earlier fuzzy/base match', () => {
    // Supported has both a base (Ar) and a specific variant (ArAe)
    const supported: LocaleValue[] = [
      LOCALE_KEY.Ar,
      LOCALE_KEY.ArAe,
      LOCALE_KEY.EnUs,
    ];

    // Browser says a fuzzy base-match first (ar-OM -> Ar), then an exact map later (ar-AE -> ArAe)
    const browser = ['ar-OM', 'ar-AE'];

    const res = getUserLocalesFromBrowserLanguages(
      browser,
      supported,
      LOCALE_KEY.EnUs,
    );

    // Exact (ArAe) should be before fuzzy/base (Ar)
    expect(res).to.deep.equal([LOCALE_KEY.ArAe, LOCALE_KEY.Ar]);
  });

currently this returns [LOCALE_KEY.Ar, LOCALE_KEY.ArAe] because both ar-OM and ar-AE are considered exact matches, since we have

const LOCALE_BROWSER_MAP = {
  ar: LOCALE_KEY.Ar, // Arabic العربية
  'ar-001': LOCALE_KEY.Ar,
  'ar-AE': LOCALE_KEY.ArAe, 
  'ar-OM': LOCALE_KEY.Ar,
}

i think i am actually OK with this outcome... as it still goes back to trusting whatever is in LOCALE_BROWSER_MAP. but it does make me question what we put in this LOCALE_BROWSER_MAP. right now we sometimes map things like af -> afzz, and other places map fr -> fr, frfr -> frfr... so for this to work, i think ill need to clean up LOCALE_BROWSER_MAP to be consistent.
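A small simulation of why the test currently yields [Ar, ArAe]: with `ar-OM` in the map, both browser tags count as exact hits, so browser order wins. This is a hypothetical `collectExactHits` helper with string stand-ins for the LOCALE_KEY values (actual values assumed); it is not the real getUserLocalesFromBrowserLanguages.

```typescript
// String stand-ins for the LOCALE_KEY members in the test (actual values assumed).
const LOCALE_KEY = { Ar: 'ar', ArAe: 'ar-AE', EnUs: 'en-US' } as const;

// The four entries quoted above, keyed by lowercased browser tag.
const CURRENT_MAP: Record<string, string> = {
  ar: LOCALE_KEY.Ar,
  'ar-001': LOCALE_KEY.Ar,
  'ar-ae': LOCALE_KEY.ArAe,
  'ar-om': LOCALE_KEY.Ar,
};

/**
 * Collect map hits in browser order. Any tag present in the map counts as an
 * "exact" hit, so ar-OM (-> Ar) is ranked ahead of ar-AE (-> ArAe).
 */
function collectExactHits(
  browserLanguages: readonly string[],
  supportedLocales: readonly string[],
  browserMap: Record<string, string>,
): string[] {
  const hits: string[] = [];
  for (const tag of browserLanguages) {
    const mapped = browserMap[tag.toLowerCase()];
    if (mapped !== undefined && supportedLocales.includes(mapped) && !hits.includes(mapped)) {
      hits.push(mapped);
    }
  }
  return hits;
}
```

Passing a trimmed map without the `ar-om` entry leaves only `ar-AE` as an exact hit, which is what would let a separate fuzzy pass rank `Ar` after it.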

the alternative could be to simplify LOCALE_BROWSER_MAP to only provide exact matches for ar-AE and ar, but not ar-OM, knowing it will fall back to ar automatically...

honestly im kinda leaning towards

  1. trimming down LOCALE_BROWSER_MAP to only contain exceptions to the rule, e.g.
  yue: LOCALE_KEY.ZhHk, // Cantonese 粵語
  'yue-Hans': LOCALE_KEY.ZhCn, // Cantonese (Simplified) 粤语 (简体)
  'yue-Hans-CN': LOCALE_KEY.ZhCn, // Cantonese (Simplified, China) 粤语 (简体,中华人民共和国)
  'yue-Hant': LOCALE_KEY.ZhHk, // Cantonese (Traditional) 粵語 (繁體)
  'yue-Hant-HK': LOCALE_KEY.ZhHk, // Cantonese (Traditional, Hong Kong SAR China) 粵語 (繁體,中華人民共和國香港特別行政區)
  // 'zgh', // Standard Moroccan Tamazight ⵜⴰⵎⴰⵣⵉⵖⵜ
  // 'zgh-MA', // Standard Moroccan Tamazight (Morocco) ⵜⴰⵎⴰⵣⵉⵖⵜ (ⵍⵎⵖⵔⵉⴱ)
  zh: LOCALE_KEY.Zh, // Chinese 中文
  'zh-Hans': LOCALE_KEY.ZhCn, // Chinese (Simplified) 中文(简体) Simplified Chinese
  'zh-Hans-CN': LOCALE_KEY.ZhCn, // Chinese (Simplified, China) 中文(简体,中国) Simplified Chinese (China)
  'zh-Hans-HK': LOCALE_KEY.ZhCn, // 中文(简体,中国香港特别行政区) Simplified Chinese (Hong Kong SAR China)
  'zh-Hans-MO': LOCALE_KEY.ZhCn, // 中文(简体,中国澳门特别行政区) Simplified Chinese (Macau SAR China)
  'zh-Hans-SG': LOCALE_KEY.ZhCn, // Chinese (Simplified, Singapore) 中文(简体,新加坡) Simplified Chinese (Singapore)
  'zh-Hant': LOCALE_KEY.ZhHk, // Chinese (Traditional) 中文(繁體) Traditional Chinese
  'zh-Hant-HK': LOCALE_KEY.ZhHk, // 中文(繁體字,中國香港特別行政區) Traditional Chinese (Hong Kong SAR China)
  'zh-Hant-MO': LOCALE_KEY.ZhHk, // 中文(繁體字,中國澳門特別行政區) Traditional Chinese (Macau SAR China)
  'zh-Hant-TW': LOCALE_KEY.ZhHk, // Chinese (Traditional, Taiwan) 中文(繁體,台灣) Traditional Chinese (Taiwan)
  2. treating it as an exact match if LOCALE_BROWSER_MAP[browserLanguage] exists OR browserLanguage is exactly equal to a value in LOCALE_KEY
  3. having <2 char>-* fall back to <2 char>

if we made this change, it would reduce the overhead of maintaining LOCALE_BROWSER_MAP (which already has some issues), it would still account for the inconsistencies in how supported locales are used today, and it would make the test at the top of this file resolve to [LOCALE_KEY.ArAe, LOCALE_KEY.Ar]
....
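A hedged sketch of the proposal in the list above: a trimmed exceptions map, an exact hit when the tag is either in that map or equal (case-insensitively) to a LOCALE_KEY value, and `<base>-*` falling back to `<base>`, with exact hits across all browser languages ranked before fallbacks. The LOCALE_KEY stand-ins, the `EXCEPTIONS` entries, `getUserLocalesSketch`, and returning the default only when nothing matched are all assumptions. With these assumptions, the `ar-OM`/`ar-AE` test case resolves to [ArAe, Ar].

```typescript
// String stand-ins for LOCALE_KEY (values assumed to be BCP 47-style strings).
const LOCALE_KEY = {
  Ar: 'ar',
  ArAe: 'ar-AE',
  EnUs: 'en-US',
  ZhCn: 'zh-CN',
  ZhHk: 'zh-HK',
} as const;
type LocaleValue = (typeof LOCALE_KEY)[keyof typeof LOCALE_KEY];

const LOCALE_VALUES: readonly LocaleValue[] = Object.values(LOCALE_KEY);

// Only genuine exceptions to the prefix rule (lowercased browser tag -> key).
const EXCEPTIONS: Record<string, LocaleValue> = {
  yue: LOCALE_KEY.ZhHk,
  'zh-hans': LOCALE_KEY.ZhCn,
  'zh-hant-tw': LOCALE_KEY.ZhHk,
};

/** Exact hit: listed exception, or the tag is literally a LOCALE_KEY value. */
function exactMatch(tag: string): LocaleValue | undefined {
  const lc = tag.toLowerCase();
  return EXCEPTIONS[lc] ?? LOCALE_VALUES.find((v) => v.toLowerCase() === lc);
}

function getUserLocalesSketch(
  browserLanguages: readonly string[],
  supportedLocales: readonly LocaleValue[],
  defaultLocale: LocaleValue,
): LocaleValue[] {
  const result: LocaleValue[] = [];
  const push = (l: LocaleValue | undefined): void => {
    if (l !== undefined && supportedLocales.includes(l) && !result.includes(l)) {
      result.push(l);
    }
  };

  // Pass 1: exact hits across all browser languages.
  browserLanguages.forEach((tag) => push(exactMatch(tag)));

  // Pass 2: "<base>-*" falls back to "<base>" when that base is supported.
  browserLanguages.forEach((tag) => {
    const base = tag.toLowerCase().split('-')[0];
    push(supportedLocales.find((l) => l.toLowerCase() === base));
  });

  // Assumed: the default is used only when nothing else matched.
  return result.length > 0 ? result : [defaultLocale];
}
```

The exceptions map only needs entries where the prefix rule would pick the wrong script or variant (for example `zh-Hant-TW`, which the prefix rule alone would send to a bare `zh`).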

too much for a friday, gonna think on this some more and come back later
