C was originally designed in an English-language environment, where the main character set is 7-bit ASCII and the 8-bit byte is the most common unit of character encoding. Internationalized software, however, must represent many different characters, far more than a single byte can encode.
C95 standardizes two methods for representing large character sets: wide characters, in which every character in the set uses the same bit width, and multibyte characters, in which a character may occupy one or more bytes and the character value of a byte sequence is determined by the context of the string or stream in which it appears.
Since the 1994 amendment, C has provided not only the char type but also the wide character type wchar_t, which is defined in the header stddef.h. The wchar_t type is wide enough to represent any element of the implementation's extended character sets.
In a multibyte character set, characters are not all encoded with the same width: a character may occupy one byte or several. Both the source character set and the execution character set may contain multibyte characters. Multibyte characters can be used in character constants, string literals, identifiers, comments, and header file names.
The C language itself does not define or prescribe any encoding or character set, apart from the basic source character set and the basic execution character set. It is up to each implementation to specify how wide characters are encoded and which multibyte character encoding schemes it supports.
Although the C standard does not require support for Unicode character sets, many implementations use the Unicode transformation formats UTF-16 and UTF-32 for wide characters. In an implementation that follows the Unicode standard, the wchar_t type is at least 16 or 32 bits wide, and a value of type wchar_t represents one Unicode character.
UTF-8 is an encoding defined by the Unicode Consortium that can represent every character in the Unicode character set. A character encoded in UTF-8 occupies from one to four bytes.
The main difference between multibyte characters and wide characters (that is, wchar_t) is that wide characters all occupy the same number of bytes, while the number of bytes in a multibyte character varies. This makes multibyte strings harder to process than wide strings. For example, even though the character 'a' can be represented in a single byte, you cannot find it in a multibyte string with a simple byte-by-byte comparison: in some multibyte encodings, a matching byte is not necessarily a character in its own right and may instead be part of a different character. Multibyte characters are, however, well suited to storing text in files.
C provides standard functions for converting multibyte characters to wchar_t and for converting wide characters to multibyte characters. For example, if the C compiler uses the Unicode encodings UTF-16 and UTF-8, the following call to the wctomb() function obtains the multibyte representation of a character (wctomb stands for "wide character to multibyte").