You can represent extended characters in either of two ways:
As multibyte characters, which can be embedded in an ordinary string, that is, an array of char objects. Their advantage is that many programs and operating systems can handle occasional multibyte characters scattered among ordinary ASCII characters without any change.
As wide characters, which are like ordinary characters except that they occupy more bits. The wide character data type, wchar_t, has a range large enough to hold extended character codes as well as old-fashioned ASCII codes.
An advantage of wide characters is that each character is a single data object, just like ordinary ASCII characters. There are a few disadvantages:
Each existing program must be modified and recompiled to make it use wide characters.
Files of wide characters cannot be read by programs that expect ordinary characters.
Typically, you use the multibyte character representation as part of the external program interface, such as reading or writing text to files. However, it's usually easier to perform internal manipulations on strings containing extended characters using arrays of wchar_t objects, since the uniform representation makes most editing operations easier. If you do use multibyte characters for files and wide characters for internal operations, you need to convert between them when you read and write data.
If your system supports extended characters, then it supports them both as multibyte characters and as wide characters. The library includes functions you can use to convert between the two representations. These functions are described in this chapter.