Text Files
If we are going to display Korean characters,
the important thing to understand is how they
are represented in a text file. I am used to
every character in a text file being 1 byte.
Since this is a Windows project, I consider a
text file to be one which looks like text in
WordPad (or Notepad).
Looking at a Korean text file, I found that
it contains both 1-byte and 2-byte characters.
One-byte characters are (almost) the same
as ASCII characters and have values from
0x00 to 0x7f. Reading from the start of the
text, any byte in the range 0x81-0xfe is the
first (lead) byte of a pair of bytes
representing one character. The second byte
may also fall in that range, so the rule only
works when scanning forward. These 2-byte
characters provide enough combinations for
the rest of the character set.
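To make the lead-byte rule concrete, here is a minimal Python sketch (my own illustration, not a Windows API) that walks a byte string from the start and splits it into 1-byte and 2-byte characters. The function name `split_mbcs` is hypothetical; the test string is produced with Python's built-in `cp949` codec.

```python
def split_mbcs(data: bytes) -> list[bytes]:
    """Split Korean text bytes into 1- or 2-byte characters.

    Assumes the rule described above: scanning from the start, a byte
    in 0x81-0xFE begins a two-byte character; any other byte stands alone.
    """
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if 0x81 <= b <= 0xFE and i + 1 < len(data):
            chars.append(data[i:i + 2])  # lead byte + trail byte
            i += 2
        else:
            # single-byte (ASCII-like) character, or a lead byte
            # truncated at the end of the data
            chars.append(data[i:i + 1])
            i += 1
    return chars


text = "Hi 한국어".encode("cp949")
parts = split_mbcs(text)
print(len(text), len(parts))  # 9 6
```

Three ASCII bytes plus three Hangul syllables at 2 bytes each give 9 bytes but only 6 characters, which is why byte counts and character counts differ in these files.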
There seems to be a standard called Wansung
encoding, in which the 2-byte characters range
from 0xa1a1 to 0xfefe (both bytes in the range
0xa1-0xfe). This allows room for all the
characters in KS C 5601-1992.
Microsoft seem to use a superset of this
called Extended Wansung, or code page 949.
Its first byte is 0x81-0xfe and its second
byte is 0x41-0x5a, 0x61-0x7a or 0x81-0xfe.
Because of this, two bytes can represent many
more characters, such as the rest of the
Johab characters: all 11,172 modern Hangul
syllables, rather than the 2,350 in Wansung.
Extended Wansung is backward compatible with
Wansung: any Wansung text is also valid
Extended Wansung text.
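A rough way to tell the two encodings apart is to check which byte ranges a pair falls into. The sketch below is a hypothetical helper, `classify`, that applies only the ranges quoted above; it does not check whether a particular code is actually assigned to a character.

```python
def classify(lead: int, trail: int) -> str:
    """Classify a 2-byte code using only the byte ranges quoted above."""
    # Wansung: both bytes in 0xA1-0xFE
    if 0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE:
        return "wansung"
    # Extended Wansung (code page 949) adds these extra trail-byte ranges
    if 0x81 <= lead <= 0xFE and (
        0x41 <= trail <= 0x5A or 0x61 <= trail <= 0x7A or 0x81 <= trail <= 0xFE
    ):
        return "extended"
    return "invalid"


print(classify(*"가".encode("cp949")))  # wansung
print(classify(*"갂".encode("cp949")))  # extended
```

The syllable 가 is one of the 2,350 Wansung syllables, while 갂 is only reachable through the extended ranges.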
Keyboards
I expect Western readers are wondering
how all these characters can be entered
on a PC keyboard. The answer, apparently,
is the Korean IME (Input Method Editor).
As you type the individual Hangul letters
(see keyboard picture below), the IME
combines them into complete syllables.
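How a particular IME works internally isn't covered here, but the arithmetic of combining letters into a syllable is easy to show using Unicode. Unicode lists all 11,172 modern syllables in order from U+AC00 as 19 initial consonants × 21 vowels × 28 finals (including "no final"). This Python sketch uses hypothetical names and applies the Unicode composition formula. It is not necessarily what any IME does.

```python
# Letter orders follow the Unicode Hangul syllable block (U+AC00 onward).
S_BASE = 0xAC00
V_COUNT = 21  # number of vowels
T_COUNT = 28  # number of finals, including "no final"

CHOSEONG = tuple("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")  # 19 initials
JUNGSEONG = tuple("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")  # 21 vowels
JONGSEONG = ("",) + tuple("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals


def compose(initial: str, vowel: str, final: str = "") -> str:
    """Combine an initial consonant, a vowel and an optional final into one syllable."""
    l = CHOSEONG.index(initial)
    v = JUNGSEONG.index(vowel)
    t = JONGSEONG.index(final)
    return chr(S_BASE + (l * V_COUNT + v) * T_COUNT + t)


print(compose("ㅎ", "ㅏ", "ㄴ"))  # 한
print(compose("ㄱ", "ㅏ"))  # 가
```

The product 19 × 21 × 28 = 11,172 is the same count of modern syllables that Extended Wansung can encode.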
To type Hanja, you enable a particular
function and then type the Hangul that
make the same sounds as the Hanja (a word
can usually be written in either script).
The IME then gives you a list of Hanja to
choose from, since many Hanja share the same
sound. The list is organised in order of
frequency of use, and this order can
apparently adapt to how often the user
picks each one.
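The adaptive ordering can be pictured with a small data structure. This is a hypothetical Python sketch, not how MS KOIME or NJIME actually store their lists: the initial order of the candidates stands in for a built-in frequency ranking (no real frequency data is assumed), and each choice the user makes moves that Hanja up.

```python
from collections import Counter


class HanjaCandidates:
    """Hypothetical adaptive candidate list for one Hangul reading."""

    def __init__(self, reading: str, candidates: list[str]) -> None:
        self.reading = reading
        # Initial order stands in for a built-in frequency ranking.
        self.candidates = list(candidates)
        self.uses = Counter()  # how often the user has picked each Hanja

    def choose(self, hanja: str) -> None:
        if hanja not in self.candidates:
            raise ValueError(f"{hanja} is not a candidate for {self.reading}")
        self.uses[hanja] += 1

    def ranked(self) -> list[str]:
        # Most-used first; sorted() is stable, so ties keep the initial order.
        return sorted(self.candidates, key=lambda h: -self.uses[h])


han = HanjaCandidates("한", ["韓", "漢", "寒"])
han.choose("寒")
han.choose("寒")
han.choose("漢")
print(han.ranked())  # ['寒', '漢', '韓']
```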
Possible IMEs are MS KOIME (from Microsoft's
web site) or NJIME (from NJStar).
[Thanks to Mark Williamson for the above
IME information. 10 December 2001]