fix(lesson): handle uppercase letters in word filtering#575

Open
Dronakurl wants to merge 4 commits into aradzie:master from Dronakurl:fix-uppercase-words

@Dronakurl

PR Message Template: Uppercase Words Fix

Summary

Fix case-sensitivity bug in dictionary filtering that prevented uppercase words from being used in guided lessons.

Related Issue

Fixes #555

Current State

When "Prefer natural words" is enabled, words with an uppercase first letter are filtered out because the dictionary uses case-sensitive character matching. The codePoints set built from the German alphabet contains only lowercase letters, so words like "Hexe", "Mexiko", and "Loyalität" are excluded from lessons.

This results in:

  • Only 7 out of 23 words with 'x' being available (30%)
  • Only 9 out of 19 words with 'y' being available (47%)
  • Pseudo-words like "loyalig" and "mexist" being generated for rare letters
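The effect can be reproduced on a small scale. The following is a minimal sketch, not code from this PR: the word list `sampleWords`, the set `sampleAlphabet`, and the function `countAvailable` are hypothetical names made up for illustration, and the sketch lowercases the whole word rather than each code point.

```typescript
// Hypothetical sample data; this is NOT the real keybr German word list.
const sampleAlphabet = new Set(
  Array.from("abcdefghijklmnopqrstuvwxyzäöüß", (ch) => ch.codePointAt(0)!),
);
const sampleWords = ["Hexe", "Mexiko", "extra", "boxen", "Taxi"];

// Counts the words that contain `letter` (in either case) and whose
// characters all pass the alphabet check.
function countAvailable(
  words: readonly string[],
  letter: string,
  caseSensitive: boolean,
): number {
  return words.filter((word) => {
    if (!word.toLowerCase().includes(letter)) {
      return false;
    }
    // Case-sensitive mode checks the word as written; the other mode
    // lowercases it first.
    const candidate = caseSensitive ? word : word.toLowerCase();
    return Array.from(candidate).every((ch) =>
      sampleAlphabet.has(ch.codePointAt(0)!),
    );
  }).length;
}

// Case-sensitive matching rejects every capitalized word in the sample.
const availableBefore = countAvailable(sampleWords, "x", true);
// Case-insensitive matching keeps all of them.
const availableAfter = countAvailable(sampleWords, "x", false);
```

With this sample, only "extra" and "boxen" survive the case-sensitive check, while all five words survive once case is ignored.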

Proposed State

Convert characters to lowercase before checking against the codePoints set, allowing all words in the dictionary to be used regardless of capitalization.

Changes

Modified Files

  • packages/keybr-lesson/lib/dictionary.ts
    • Modified filterWordList() to convert characters to lowercase before filtering
    • Modified Word.matches() to convert characters to lowercase before matching

Code Changes

// filterWordList - before:
words.filter((word) =>
  [...toCodePoints(word)].every((codePoint) => codePoints.has(codePoint)),
)

// filterWordList - after:
words.filter((word) =>
  [...toCodePoints(word)].every((codePoint) => {
    const lower = String.fromCodePoint(codePoint).toLowerCase().codePointAt(0)!;
    return codePoints.has(lower);
  }),
)

Same pattern applied to Word.matches().
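Since the same lowercase conversion is needed in both places, it could be factored into one helper. The sketch below is an illustration under assumptions, not the code in this PR: `toLowerCodePoint`, `wordMatchesSketch`, `filterWordsSketch`, and `germanLowercase` are hypothetical names, and the real signature of Word.matches() is not shown in this description.

```typescript
// Lowercases a single code point. Only the first code point of the
// lowercased string is kept, mirroring the codePointAt(0) call in the PR.
function toLowerCodePoint(codePoint: number): number {
  return String.fromCodePoint(codePoint).toLowerCase().codePointAt(0)!;
}

// Hypothetical stand-in for a Word.matches()-style check.
function wordMatchesSketch(
  word: string,
  codePoints: ReadonlySet<number>,
): boolean {
  // Array.from splits the string by code point, not by UTF-16 unit.
  return Array.from(word).every((ch) =>
    codePoints.has(toLowerCodePoint(ch.codePointAt(0)!)),
  );
}

// Hypothetical stand-in for filterWordList(), reusing the same check.
function filterWordsSketch(
  words: readonly string[],
  codePoints: ReadonlySet<number>,
): string[] {
  return words.filter((word) => wordMatchesSketch(word, codePoints));
}

// Assumed lowercase-only alphabet set, similar in spirit to codePoints.
const germanLowercase = new Set(
  Array.from("abcdefghijklmnopqrstuvwxyzäöüß", (ch) => ch.codePointAt(0)!),
);
```

A few characters lowercase to more than one code point, and this sketch (like the snippet above) keeps only the first, which is a simplification worth keeping in mind for alphabets beyond German.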

Impact

Letter  Before          After            Change
x       7 words (30%)   23 words (100%)  +229%
y       9 words (47%)   19 words (100%)  +111%

Testing

  • Existing tests should pass (no regressions expected)
  • Manual testing with German language focusing on 'x' and 'y' letters
  • Verified that real words like "Hexe", "Mexiko", "Hobby" now appear in lessons

Related

Limitations

  • Short words (< 3 characters) are still filtered out by design (see guided.ts:31)
  • ALL CAPS words (acronyms) would pass, but none exist in the German word list
  • No changes to the phonetic model - it continues to generate lowercase pseudo-words as fallback

Dronakurl and others added 2 commits February 26, 2026 22:14

The filterWordList and Word.matches functions used case-sensitive
character matching, which filtered out words with uppercase first
letters (German nouns, proper nouns, sentence starters).

Fixed by converting characters to lowercase before checking against
the codePoints set, which contains only lowercase letter code points
from the language alphabet.

Fixes aradzie#555

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The Dictionary constructor was building the word index using original
case codePoints, but focusedCodePoint lookups always use lowercase
letters from the language alphabet.

This caused words with uppercase first letters (German nouns) to be
missed when filtering by focusedCodePoint, even though the general
filter (Word.matches) was already fixed.

Fixed by converting codePoints to lowercase when building the index.

Fixes aradzie#555

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
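The index fix described in the second commit can be pictured with a minimal sketch. This is an assumption-laden illustration, not the actual Dictionary constructor: `buildIndexSketch` and `lowerCodePoint` are hypothetical names, and the real index structure may differ.

```typescript
// Lowercases a single code point, keeping the first code point of the result.
function lowerCodePoint(codePoint: number): number {
  return String.fromCodePoint(codePoint).toLowerCase().codePointAt(0)!;
}

// Builds a map from lowercase code point to the words containing that letter
// in either case, so a lookup by a lowercase focused letter finds "Hexe" too.
function buildIndexSketch(words: readonly string[]): Map<number, string[]> {
  const index = new Map<number, string[]>();
  for (const word of words) {
    // Collect each distinct lowercase code point once per word.
    const seen = new Set<number>();
    for (const ch of Array.from(word)) {
      seen.add(lowerCodePoint(ch.codePointAt(0)!));
    }
    seen.forEach((codePoint) => {
      const bucket = index.get(codePoint);
      if (bucket === undefined) {
        index.set(codePoint, [word]);
      } else {
        bucket.push(word);
      }
    });
  }
  return index;
}
```

Because keys are normalized when the index is built, lookups need no case handling of their own; indexing by the original code points would file "Haus" under "H", where a lowercase lookup never finds it.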
@semanticdiff-com

semanticdiff-com Bot commented Feb 26, 2026

Review changes with  SemanticDiff

Changed Files

File                                          Status
packages/keybr-lesson/lib/dictionary.ts       40% smaller
packages/keybr-color/lib/convert-xyz.test.ts  24% smaller
build.sh                                      Unsupported file format
packages/keybr-lesson/lib/dictionary.test.ts  0% smaller

@trapicki

Please pull this request.

I tried it, and it really is a huge improvement for German training! Not only for the rare letters but for all the other letters too, it shows so many more words that were hidden. German inherently has many words with an initial uppercase letter, so including these words and showing them capitalized does exactly what keybr offers as a unique feature: practicing words and letter combinations as they occur in daily writing.

Development

Successfully merging this pull request may close these issues.

Only limited and biased selection of words with Natural Words

2 participants