HangulBreak.py & ENYG.py


Hangul, the Korean alphabet, can be decomposed into its component letters. For example, 컴퓨터, meaning "computer", decomposes into ㅋㅓㅁㅍㅠㅌㅓ. Being able to decompose Hangul easily has structural benefits, but few resources provide this functionality, so I wrote my own module and open-sourced it here.
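
As a rough illustration of the decomposition described above, the sketch below relies only on the arithmetic layout of precomposed Hangul syllables in Unicode (19 initials, 21 medials, 28 finals including "none", starting at U+AC00). The function name `decompose` and the table names are hypothetical and are not taken from HangulBreak.py, whose actual interface may differ.

```python
HANGUL_BASE = 0xAC00  # code point of 가, the first precomposed syllable
HANGUL_LAST = 0xD7A3  # code point of 힣, the last precomposed syllable

# Compatibility jamo, listed in the order used by the Unicode syllable block.
INITIALS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
MEDIALS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
# Index 0 means "no final consonant".
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def decompose(text: str) -> str:
    """Hypothetical helper: flatten Hangul syllables into their jamo."""
    out = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            offset = code - HANGUL_BASE
            # Each initial spans 21 * 28 code points, each medial spans 28.
            initial, rest = divmod(offset, 21 * 28)
            medial, final = divmod(rest, 28)
            out.append(INITIALS[initial] + MEDIALS[medial] + FINALS[final])
        else:
            # Leave anything that is not a precomposed syllable untouched.
            out.append(ch)
    return "".join(out)


print(decompose("컴퓨터"))  # ㅋㅓㅁㅍㅠㅌㅓ
```

Characters outside the precomposed syllable range (Latin letters, digits, standalone jamo) pass through unchanged in this sketch.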

In Korean, a suffix attached to a word marks its grammatical relationship to the other words in the sentence. The form of this suffix depends on the composition of the preceding character, specifically on whether it ends in a final consonant (을 or 를, 이 or 가, 은 or 는, ...). To reduce the workload, most code simply writes both forms at once.

English has a comparable rule: "a" and "an" are switched according to the following word, so we can analyze that word to decide which one fits. Current Korean Hangul libraries, however, just write both forms, like the following.

I like a(an) apple.

Yes, it does the job, but it is not aesthetically pleasing and is sometimes even confusing. The same thing was happening in Korean Hangul libraries.

ENYG.py is designed to fix this issue. It analyzes the sentence to decide the correct suffix for each word.
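
To make the idea concrete, here is a minimal sketch of how such a choice could be made. It assumes the input marks ambiguous suffixes in the doubled form `을(를)`, `이(가)` or `은(는)` directly after a Hangul syllable. That input notation and the names `has_final_consonant` and `resolve_particles` are assumptions for illustration only, not the actual ENYG.py interface.

```python
import re

HANGUL_BASE = 0xAC00  # code point of 가, the first precomposed syllable

# A Hangul syllable followed by a doubled suffix such as 을(를).
# In each pair, the first form follows a final consonant, the second a vowel.
_DOUBLED = re.compile(r"([가-힣])(을\(를\)|이\(가\)|은\(는\))")


def has_final_consonant(syllable: str) -> bool:
    """True if a precomposed Hangul syllable ends in a final consonant."""
    # Every block of 28 code points starts with the "no final" variant.
    return (ord(syllable) - HANGUL_BASE) % 28 != 0


def resolve_particles(text: str) -> str:
    """Hypothetical helper: replace doubled suffixes with the fitting form."""
    def pick(match: re.Match) -> str:
        prev, pair = match.group(1), match.group(2)
        with_final, without_final = pair[0], pair[2]  # e.g. "을" and "를"
        chosen = with_final if has_final_consonant(prev) else without_final
        return prev + chosen

    return _DOUBLED.sub(pick, text)


print(resolve_particles("나는 사과을(를) 좋아한다"))  # 나는 사과를 좋아한다
print(resolve_particles("책이(가) 있다"))  # 책이 있다
```

This sketch covers only the three pairs named above and only words ending in a Hangul syllable. Suffixes with extra rules, or words ending in digits or Latin letters, would need additional handling.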

demo3

HangulBreak.py is a Hangul decomposition library. Unlike other Hangul decomposition libraries, it returns an individual character set for each letter, for greater compatibility. ENYG.py is built on this library.
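
One possible reading of "an individual character set for each letter" is a separate group of components per syllable rather than one flat string. The sketch below shows that output shape using only the standard library's Unicode normalization. The name `decompose_grouped` is hypothetical, and the real HangulBreak.py return format may differ.

```python
import unicodedata


def decompose_grouped(text: str) -> list[list[str]]:
    """Hypothetical helper: one list of components per input character."""
    # NFD splits a precomposed Hangul syllable into conjoining jamo
    # (U+1100 block); characters without a decomposition stay as they are.
    return [list(unicodedata.normalize("NFD", ch)) for ch in text]


groups = decompose_grouped("컴퓨터")
print([len(g) for g in groups])  # [3, 2, 2]

# The grouping is lossless: recomposing with NFC restores the original text.
flat = "".join("".join(g) for g in groups)
print(unicodedata.normalize("NFC", flat) == "컴퓨터")  # True
```

Note that NFD yields conjoining jamo, which are different code points from the standalone compatibility jamo such as ㅋ (U+314B), so a mapping step would be needed if the standalone forms are wanted.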

demo1.png demo2.png