Languages
 
 
 
 

[Last Modified: 10 December 2001-changes in blue, newsflash in red]

Back To GameBoy Project Main Page


Game Boy Book Reader
(adding Chinese Support)


Importing a 16 pixel high Chinese font into the Unicode codespace. The glyphs appear as they should and are stored against the correct codes.

I've added a function (View Page) to show a page of characters at a time (see below). The blue-green codepoints are ones not defined in the Microsoft definition of this codepage.

 

The Project

I have often received requests for a Chinese version of GameBoy Book Reader, but without some help from someone in this part of the world it would not have been possible for me to start. Several people have now offered help, and sent me information.

This page is an attempt to keep them, and anyone else who is interested, informed about progress on this project.

Thanks to JACK for the Game Boy Book Reader button at the top of the page.

Any comments to this e-mail address.

I will also be looking for some short stories in Chinese in text file form, suitable for viewing in Notepad...


Chinese Characters and Scrolling

Chinese text display presents new problems for the Gameboy Book Reader which are going to involve a complete change of approach.

The first problem encountered was that the complexity of Chinese characters makes it virtually impossible to represent them in tiles 8 pixels high, especially as we expect to leave the bottom row relatively clear to allow a gap between rows.

The current approach to text display in the reader engine uses the tile mapping as a means of fast scrolling of a line. Instead of moving the character data we just re-number the tile map so that the displayed position changes. This involves sixteen times fewer bytes to write, as a tile map entry is one byte against sixteen bytes of pixel data per tile.
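The factor of sixteen can be checked with a little arithmetic. This is only a sketch in Python, assuming the standard Game Boy tile format (8 x 8 pixels at 2 bits per pixel) and a visible row of 20 tiles; the function names are made up for illustration and are not taken from the reader engine.

```python
# Cost, in bytes written, of moving one row of tiles on screen.
TILE_BYTES = 8 * 8 * 2 // 8   # 8x8 pixels at 2 bits per pixel = 16 bytes
TILES_PER_ROW = 160 // 8      # 160 pixel wide screen = 20 visible tiles

def bytes_to_renumber_row():
    # Tile map approach: one map entry (1 byte) per visible tile.
    return TILES_PER_ROW

def bytes_to_copy_row():
    # Copy approach: the full pixel data of every tile in the row.
    return TILES_PER_ROW * TILE_BYTES

ratio = bytes_to_copy_row() // bytes_to_renumber_row()
print(bytes_to_renumber_row(), bytes_to_copy_row(), ratio)  # 20 320 16
```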

However this more or less forces the use of characters which are 8 or 16 pixels high (including any space between rows). And as soon as we go to 16 pixel high characters we halve the number of rows displayable (to 6), and probably halve the number of characters on a line. So the number of characters on a page would be reduced quite severely.

A good compromise might be characters 11 or 12 pixels high, as it looks possible to make convincing Chinese characters this high.

The other factor is that a couple of files have been made available to me, containing Chinese fonts in 12x12 and 16x16 characters respectively. This promises to save a lot of work for somebody. The point I am currently uncertain about is whether I am going to need to allow a further row of pixels to separate these characters vertically.
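To get a feel for what each character height costs in rows per page, here is a rough Python sketch. The 96 pixel text area is an assumption, inferred from the statement that 16 pixel characters leave 6 rows; `rows_on_page` and its `gap` parameter are illustrative names only.

```python
# ASSUMPTION: the text area is 96 pixels high (12 rows of 8 pixel characters).
TEXT_AREA_HEIGHT = 96

def rows_on_page(char_height, gap=0):
    """Whole text rows that fit, with `gap` extra blank pixel rows per text row."""
    return TEXT_AREA_HEIGHT // (char_height + gap)

for height in (8, 11, 12, 16):
    print(height, rows_on_page(height), rows_on_page(height, gap=1))
# 8 12 10
# 11 8 8
# 12 8 7
# 16 6 5
```

Under these assumptions an extra separating row costs nothing for 11 pixel characters, but costs one text row per page for 12 pixel characters.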

I have also seen a bitmap of the Chinese version of Notepad. This seems to use a system font with 11 x 11 pixel Chinese characters. If I could get hold of a file with this font, I would be really pleased!

All this points to the need for the reader engine to be flexible in the character height, and not require an integral number of tiles vertically per character row. This then means that vertical scrolling will need to be performed by a memory copy process of the actual character data.

Initial mental calculations suggest that this can be fast enough not to be too obtrusive, so I have made the commitment to re-write the engine along these lines.
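The kind of estimate involved can be written down as follows. This is a sketch under stated assumptions, not the engine's actual routine: it assumes the text area spans the full 160 pixel width, that every pixel row therefore occupies 40 bytes of tile data (20 tiles at 2 bytes per tile row), and that the text area is 96 pixels high. It says nothing about how many machine cycles each byte costs.

```python
# ASSUMPTIONS: full-width text area, standard 2 bits per pixel tile data.
BYTES_PER_PIXEL_ROW = (160 // 8) * 2   # 20 tiles across, 2 bytes each = 40

def scroll_copy_bytes(text_area_height, char_height):
    """Bytes to move when the page scrolls up by one row of text.

    Everything except the departing text row has to be copied; the
    newly exposed row is drawn separately and is not counted here.
    """
    return (text_area_height - char_height) * BYTES_PER_PIXEL_ROW

print(scroll_copy_bytes(96, 12))  # 3360
print(scroll_copy_bytes(96, 16))  # 3200
```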

Chinese Characters and Storage.

The next problem with Chinese characters is that, as well as being larger (and so requiring up to four times as much storage per character), there are many more of them. Whereas it is very feasible to store a font for all the West European languages in a few thousand bytes, Chinese characters number in the tens of thousands. If we always stored all of these in the book cartridge, even a very short book would require a large cartridge. English books, meanwhile, simply do not need this overhead and should not have to carry it.

As we now have the potential requirement for a range of different languages, it seems logical to try to put in place support for any language which may be required. We do this by making a superset of character fonts available to the Makebook utility, which selects only the character glyphs required in the book being processed, and transfers these to the book cartridge.

Storage of Text and Font in Book Cartridge

When the text is stored in the book cartridge we have two main requirements:

  • the text should not take up more room than necessary, but
  • about 30,000 different characters may need to be represented within the same book

The chosen solution here is to represent the ANSI codes from 20h to 7Fh as single byte characters, and all other characters as 2 byte characters.

Furthermore, characters other than the basic set 20h-7Fh will be allocated codes dynamically as they are found in the book text by Makebook. The first character found will be allocated the code 80h, 00h, the next character 80h, 01h, and so on, all the way up to FFh, FFh if necessary. That gives 128 x 256 = 32,768 codes.

As the characters are allocated codes they are also assigned sequential space in the font table. Each character glyph will comprise up to 33 bytes. The first byte is the character width (including the blank space following the character on the right). The next two bytes are the (up to) 16 pixels of the top row of the character. The other rows follow in the next bytes. A glyph for an 8 pixel high font will thus have 17 bytes. A 12 pixel high font character will have 25 bytes, and so on.
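A glyph record with this layout could be packed as below. This is a sketch: the byte order within each row and the choice of bit 15 as the leftmost pixel are assumptions, since neither is stated above, and `pack_glyph` is a hypothetical helper.

```python
def pack_glyph(width, rows):
    """Pack a glyph: 1 width byte, then 2 bytes per pixel row, top row first.

    `rows` holds one 16 bit integer per pixel row.
    ASSUMPTION: high byte first, bit 15 = leftmost pixel.
    """
    if not 8 <= len(rows) <= 16:
        raise ValueError("glyph height must be 8 to 16 pixels")
    out = bytearray([width])
    for row in rows:
        out += row.to_bytes(2, "big")
    return bytes(out)

for height in (8, 12, 16):
    print(height, len(pack_glyph(height, [0] * height)))
# 8 17
# 12 25
# 16 33
```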

The lead characters from 00h to 1Fh are reserved for special meanings.

  • 01h introduces a column specifier
  • 02h introduces a custom character

A column specifier acts as a kind of tabulation command. A custom character allows, amongst other things, the insertion of monochrome bitmaps into the page.

 

 

Project Status

FLASH: OK It's here now! Just go to the Download Page to get it. And let me know what you think. This page will be updated soon.

Please note that on some versions of Windows in China the text file will look like rubbish. But don't worry. If it looks ok in Notepad it will still make a good book ROM. I would be grateful if any internationalisation expert can help me explain this!

A partially working Gameboy Display (using 16 pixel high characters)

The scroll bar now works (at least in Chinese - thanks to JACK). This display uses 12 pixel high characters. There must be something wrong with the row filling logic, as there is always room for 1 more character.

Reader Engine

Done so far...
  • This has been partly revised so that the tile mapping for the text area now has a fixed numbering, and a software scroll technique is used. This has not yet been optimised to allow the scrolling to be fast enough not to be noticed.

  • The basic (built-in for testing) English font (8 pixel high) has been redone to have the potential for 16 pixel wide glyphs.

  • The character generating routine has been adjusted to use this new font.

  • Internal font has been ripped out completely. Character display routines have been re-done to display any height of character from 8 to 16 pixels high. Tested at 8 and 16 high so far.

  • Started to put scroll bar info back in. Works in English and Chinese so far.
Still to do...
  • The header bar is still completely ruined, and needs work.

  • Plenty of other stuff...

Makebook Utility

Done so far...
  • Started to modify font editing dialog to allow characters up to 16 x 16

  • Modified Font description files to add Unicode codepoint information.

  • Discovered to my surprise that (at least in Visual Studio) a dialog cannot have more than about 256 controls. Probably a good thing, as I am now drawing the glyph being edited directly with GDI instead. It should look nicer and be more maintainable.

  • Managed to add an import function for Chinese font files (see picture above).

  • Import function now imports previous version font files plus various other formats.

  • Imported font files do not overwrite existing characters - allows merging of font files from different sources.

  • Arranged that characters are stored against correct codes (including transformations from local code pages to Unicode).

  • Added a function to display a complete page of 256 codes at once.

  • Changed font edit function so that left mouse held down draws all pixels passed through, and right button erases pixels in the same way.

  • Modified Export to interpret characters in text file according to specified code page.

  • Modified Export to generate table of glyphs actually used in text file.

Still to do...

  • Export title bar correctly
  • etc...

Unicode

A Unicode text file is assumed here to be a file containing characters, each represented by a pair of bytes. There may additionally be a marker pair of bytes at the start of the file: either FF, FE, which indicates that the other byte pairs are arranged in little endian form, or FE, FF, which indicates big endian form.
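Reading such a file could be sketched like this in Python. The fallback to little endian when no marker is present is an assumption (it is what Windows normally writes), and `decode_unicode_file` is an illustrative name.

```python
def decode_unicode_file(data):
    """Decode 2-bytes-per-character text, honouring a leading marker pair."""
    if data[:2] == b"\xff\xfe":
        return data[2:].decode("utf-16-le")   # FF, FE: low byte first
    if data[:2] == b"\xfe\xff":
        return data[2:].decode("utf-16-be")   # FE, FF: high byte first
    # ASSUMPTION: no marker means little endian.
    return data.decode("utf-16-le")

# The character U+4E2D stored both ways round.
print(decode_unicode_file(b"\xff\xfe\x2d\x4e"))  # 中
print(decode_unicode_file(b"\xfe\xff\x4e\x2d"))  # 中
```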

The basic multilingual plane (BMP) of Unicode, with its 16 bit codes, allows code points for up to 64K characters, enough for all the world's symbols in common use. Nearly 50 thousand such characters have been assigned positions in this code table.

It should be noted that local representations of a given one of these characters may differ from country to country, so a single world font file is still not really practical, but may be close enough for the Gameboy Book reader.


Multibyte Encoding

The programs which we are concerned with here (Notepad, Wordpad, and Makebook) all appear to work with multibyte encoding, at least on Windows 9x systems.

In a multibyte encoding system a character may either be one byte or two bytes long. They can be distinguished because one byte characters are confined to a range of values (such as 00-7F), and 2 byte characters must start with a value in a different range. Then the second byte in a 2 byte character can in theory have any value from 00-FF.
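Splitting a byte stream into characters under this rule can be sketched as follows. The lead byte test (80h and above) simply mirrors the example range given above; real code pages use narrower lead byte ranges, so the `is_lead` parameter is left replaceable. All names here are illustrative.

```python
def split_multibyte(data, is_lead=lambda b: b >= 0x80):
    """Split bytes into 1 byte and 2 byte characters.

    ASSUMPTION: any byte of 80h or above starts a 2 byte character.
    """
    chars = []
    i = 0
    while i < len(data):
        if is_lead(data[i]) and i + 1 < len(data):
            chars.append(data[i:i + 2])   # lead byte plus any second byte
            i += 2
        else:
            chars.append(data[i:i + 1])   # single byte character
            i += 1
    return chars

print(split_multibyte(b"A\xd6\xd0B\xce\xc4"))
# [b'A', b'\xd6\xd0', b'B', b'\xce\xc4']
```

Note that the second byte is never tested, so a value such as 40h is accepted there even though on its own it would be a single byte character.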


The Makebook Font Editor

Microsoft's web site provides text files listing the character values used in different parts of the world. These tables include corresponding Unicode values, and descriptions of the characters represented.

So using these tables it is possible to create a font editor which presents locally used characters (in the country running the editor), and then to store the edited character glyph against the Unicode code point in the font file. Users in different countries would then be able to edit a common file, but accessing only the code points which

  • they were interested in, and
  • their computers were set up to display
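Loading one of these tables into a lookup could be sketched as below. The line format is an assumption (a local code in hex, a Unicode value in hex, then a `#` comment, separated by whitespace, with lead-byte-only lines carrying no Unicode value); check it against the actual files before relying on it. `parse_mapping` is a hypothetical helper.

```python
def parse_mapping(lines):
    """Build {local code: Unicode code point} from mapping table lines."""
    mapping = {}
    for line in lines:
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue   # blank, comment-only, or lead-byte-only line
        mapping[int(fields[0], 16)] = int(fields[1], 16)
    return mapping

# Sample lines in the assumed format.
sample = [
    "# comment line",
    "0x41\t0x0041\t#LATIN CAPITAL LETTER A",
    "0xD6\t\t#DBCS LEAD BYTE",
    "0xD6D0\t0x4E2D\t#CJK UNIFIED IDEOGRAPH",
]
table = parse_mapping(sample)
print(hex(table[0xD6D0]), chr(table[0xD6D0]))  # 0x4e2d 中
```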

 


Notepad displaying a mixed Chinese / English text file


Everything I know about Chinese

Chinese can be written left to right starting at the top of the page, just like in English. This is the direction of writing adopted by the project at present, although (as Mark Williamson points out) Chinese can also be written top to bottom.

 

 

 

Chinese PCs

They use Notepad too. It uses multibyte encoding to represent Chinese.

The RichEdit control I used in the Makebook editor, when used on a Chinese computer, displays correct Chinese in Windows 2000, but gobbledygook in Windows 98.

By this I mean that each Chinese 2 byte character is displayed as 2 individual European letters. This is weird to me because the editor seems to work ok in Korea, where they also have multibyte characters.
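The symptom itself is easy to reproduce, whatever its cause on Windows 98 turns out to be. The Python sketch below assumes the text is in the simplified Chinese code page (GBK, code page 936) and that the "European letters" come from reading the same bytes as Windows-1252; it does not explain why RichEdit would do that.

```python
# The same four bytes, read under two different code pages.
raw = "中文".encode("gbk")

as_chinese = raw.decode("gbk")      # each byte pair is one character
as_western = raw.decode("cp1252")   # each byte is its own character

print(raw.hex())    # d6d0cec4
print(as_chinese)   # 中文
print(as_western)   # ÖÐÎÄ
```

Two Chinese characters become four Western ones: exactly two letters per character.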


Useful Links

Ken Lunde links page (see the CJK.INF document)
