UnicodeUtil: skip Hangul decompose/compose when no Hangul syllables present by Copilot · Pull Request #390 · elharo/xom

Copilot · 2026-04-01T11:09:13Z

normalize() unconditionally called decomposeHangul() and composeHangul(), each an O(n) pass, even for strings containing no Hangul syllables (0xAC00–0xD7A3). This penalizes every string that requires normalization (Greek, Arabic, CJK, etc.), not just Korean text.

Changes

Single-pass Hangul detection: the existing scan loop now also checks for Hangul syllables. It sets hasHangul = true and breaks on the first match; otherwise it keeps scanning so that a Hangul character appearing later in the string isn't missed.
Conditional Hangul processing: decomposeHangul() and composeHangul() are now gated on hasHangul.

// Before
if (needsNormalizing) {
    s = decomposeHangul(s);           // always called
    UnicodeString ustring = new UnicodeString(s);
    String result = ustring.decompose().compose().toString();
    result = composeHangul(result);   // always called
    return result;
}

// After
if (needsNormalizing) {
    if (hasHangul) s = decomposeHangul(s);
    UnicodeString ustring = new UnicodeString(s);
    String result = ustring.decompose().compose().toString();
    if (hasHangul) result = composeHangul(result);
    return result;
}
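
The single-pass detection described under Changes can be sketched as below. This is only an illustration, not the code in the PR: the class name SinglePassScanSketch and the scan() helper are hypothetical, and the constant values are assumed from the 0xAC00–0xD7A3 range quoted in the description.

```java
class SinglePassScanSketch {
    // Assumed values, taken from the range quoted in the PR description.
    static final int FIRST_HANGUL_SYLLABLE = 0xAC00;
    static final int LAST_HANGUL_SYLLABLE = 0xD7A3;

    /** Hypothetical helper: returns {needsNormalizing, hasHangul} for s. */
    static boolean[] scan(String s) {
        boolean needsNormalizing = false;
        boolean hasHangul = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 255) {
                needsNormalizing = true;
                if (c >= FIRST_HANGUL_SYLLABLE && c <= LAST_HANGUL_SYLLABLE) {
                    hasHangul = true;
                    break; // both flags are set, nothing more to learn
                }
                // No break here: a Hangul syllable may still appear later.
            }
        }
        return new boolean[] { needsNormalizing, hasHangul };
    }
}
```

For a Greek-only string this returns needsNormalizing = true and hasHangul = false, so both Hangul passes would be skipped.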


elharo

@copilot

elharo · 2026-04-01T11:21:03Z

            if (c > 255) {
                needsNormalizing = true;
-                break;
+                if (c >= FIRST_HANGUL_SYLLABLE && c <= LAST_HANGUL_SYLLABLE) {


Does it make sense to instead scan forward from i at this point, looking specifically for Hangul characters? We already know normalizing is needed, so this would avoid the c > 255 and needsNormalizing checks.

Done in 5e9fe27. The first loop now breaks immediately on finding any c > 255 (setting needsNormalizing). If that character is already Hangul we're done; otherwise a second, focused loop scans from position i+1 checking only the Hangul range, avoiding the redundant c > 255 test.
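
A minimal sketch of the two-loop scan described in this reply, assuming the same 0xAC00–0xD7A3 bounds. The class TwoLoopScanSketch and the helpers isHangulSyllable() and scan() are hypothetical names for illustration; the actual code in 5e9fe27 may be structured differently.

```java
class TwoLoopScanSketch {
    // Assumed values, taken from the range quoted in the PR description.
    static final int FIRST_HANGUL_SYLLABLE = 0xAC00;
    static final int LAST_HANGUL_SYLLABLE = 0xD7A3;

    static boolean isHangulSyllable(char c) {
        return c >= FIRST_HANGUL_SYLLABLE && c <= LAST_HANGUL_SYLLABLE;
    }

    /** Hypothetical helper: returns {needsNormalizing, hasHangul} for s. */
    static boolean[] scan(String s) {
        boolean needsNormalizing = false;
        boolean hasHangul = false;
        int length = s.length();
        int i = 0;
        // Loop 1: stop at the first character above 255.
        for (; i < length; i++) {
            char c = s.charAt(i);
            if (c > 255) {
                needsNormalizing = true;
                hasHangul = isHangulSyllable(c);
                break;
            }
        }
        // Loop 2: runs only when normalization is needed and the character
        // that triggered it was not Hangul. It tests the Hangul range alone.
        if (needsNormalizing && !hasHangul) {
            for (int j = i + 1; j < length; j++) {
                if (isHangulSyllable(s.charAt(j))) {
                    hasHangul = true;
                    break;
                }
            }
        }
        return new boolean[] { needsNormalizing, hasHangul };
    }
}
```

Compared with a single loop, the second loop does one range test per character instead of a c > 255 test followed by a range test, which is the saving the review comment asks about.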


Initial plan

1a85286

Copilot AI assigned Copilot and elharo Apr 1, 2026

Copilot started work on behalf of elharo April 1, 2026 11:09

Copilot AI linked an issue Apr 1, 2026 that may be closed by this pull request

UnicodeUtil: optimize normalize() to avoid unnecessary Hangul decompose/compose calls #375

Open

UnicodeUtil: optimize normalize() to avoid unnecessary Hangul decompo…

c01f44d

…se/compose calls Agent-Logs-Url: https://github.com/elharo/xom/sessions/0936fd40-179a-4ab0-a19a-ff83ce25893a Co-authored-by: elharo <1005544+elharo@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Optimize normalize() to conditionally call Hangul methods~~ UnicodeUtil: skip Hangul decompose/compose when no Hangul syllables present Apr 1, 2026

Copilot finished work on behalf of elharo April 1, 2026 11:17

Copilot AI requested a review from elharo April 1, 2026 11:17

elharo marked this pull request as ready for review April 1, 2026 11:18

elharo reviewed Apr 1, 2026


Copilot started work on behalf of elharo April 1, 2026 11:21

UnicodeUtil: split normalize() scan into two focused loops per review…

5e9fe27

…er suggestion Agent-Logs-Url: https://github.com/elharo/xom/sessions/da3d8e96-a25c-42c6-9863-6a4fadd4ec66 Co-authored-by: elharo <1005544+elharo@users.noreply.github.com>

Copilot finished work on behalf of elharo April 1, 2026 11:24

Copilot AI requested a review from elharo April 1, 2026 11:24

Copilot wants to merge 3 commits into master from copilot/optimize-normalize-hangul-processing
