UTF-8:
UTF-8 is a variable-length, multi-byte encoding designed for international text. It uses 8 bits (one byte) for each English (ASCII) character and 24 bits (three bytes) for most Chinese characters; rarer characters can take four bytes.
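As a quick check of these byte counts, the following minimal sketch measures the encoded length of a few sample characters using Python's built-in `utf-8` codec. The sample characters and the name `utf8_lengths` are chosen here for illustration only.

```python
# Measure how many bytes single characters occupy when encoded as UTF-8.
samples = ["A", "中", "\U0001F600"]  # ASCII letter, common Chinese character, emoji

utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, size in utf8_lengths.items():
    print(f"U+{ord(ch):04X}: {size} byte(s)")
```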
GBK:
GBK is an extension of the national standard GB2312 and remains compatible with it. GBK encodes each Chinese character in two bytes, while ASCII characters such as English letters keep their single-byte form. To distinguish a Chinese character from ASCII, the highest bit of its first byte is set to 1. GBK covers the Chinese characters of GB2312 plus many more, and it is a national standard rather than an international one. It is less universal than UTF-8, but a database that stores Chinese text in UTF-8 takes more space than the same data stored in GBK.
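The byte layout can be inspected with Python's built-in `gbk` codec. In this small sketch an ASCII letter encodes to one byte and a Chinese character to two, and the first byte of the Chinese character has its highest bit set. The helper name `high_bit_set` is made up for this example.

```python
def high_bit_set(byte: int) -> bool:
    """Return True when the most significant bit of a byte is 1."""
    return (byte & 0x80) != 0

ascii_encoded = "A".encode("gbk")   # ASCII letter
hanzi_encoded = "中".encode("gbk")  # Chinese character

print(len(ascii_encoded), high_bit_set(ascii_encoded[0]))
print(len(hanzi_encoded), high_bit_set(hanzi_encoded[0]))
```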
Web pages generally use UTF-8, because the large amount of HTML markup in a page is ASCII and takes only one byte per character in UTF-8, so the markup costs no extra space compared with GBK.
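To make the size argument concrete, the sketch below encodes a small, made-up HTML fragment in both encodings. The markup is ASCII and costs the same in either encoding, so the only difference comes from the Chinese characters (one extra byte each in UTF-8).

```python
# A hypothetical HTML fragment: ASCII markup around four Chinese characters.
html = '<div class="greeting"><p>你好世界</p></div>'

ascii_chars = sum(1 for ch in html if ord(ch) < 128)
hanzi_chars = len(html) - ascii_chars

utf8_size = len(html.encode("utf-8"))
gbk_size = len(html.encode("gbk"))

print(f"ASCII characters: {ascii_chars}, Chinese characters: {hanzi_chars}")
print(f"UTF-8: {utf8_size} bytes, GBK: {gbk_size} bytes")
```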
In a UTF-8 database where column lengths are counted in bytes, varchar(30) can store at most 10 Chinese characters, because each Chinese character occupies three bytes.
In MySQL 4.0 and earlier, varchar(20) means 20 bytes, so only 6 UTF-8 Chinese characters (3 bytes each) fit. In MySQL 5.0 and later, varchar(20) means 20 characters: it can hold 20 digits, letters, or UTF-8 Chinese characters, regardless of how many bytes each one takes. (The switch to character-based lengths came with MySQL 4.1.) The maximum size of a varchar column in MySQL 5 is 65532 bytes, whereas in MySQL 4 a varchar(20) never occupies more than 20 bytes. In MySQL 5 the storage size therefore depends on the encoding. The specific rules are as follows:
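The difference between byte-based and character-based lengths can be sketched as follows. The function `max_chars_that_fit` is a hypothetical helper that only models the counting rule; it does not talk to MySQL.

```python
def max_chars_that_fit(text: str, limit: int, unit: str, encoding: str = "utf-8") -> int:
    """Count how many leading characters of text fit into `limit` units.

    unit="chars" models character-based lengths,
    unit="bytes" models byte-based lengths.
    """
    if unit == "chars":
        return min(len(text), limit)
    used = 0
    count = 0
    for ch in text:
        size = len(ch.encode(encoding))
        if used + size > limit:
            break
        used += size
        count += 1
    return count

hanzi = "中" * 25
print(max_chars_that_fit(hanzi, 20, "bytes"))  # byte-based varchar(20): 6
print(max_chars_that_fit(hanzi, 20, "chars"))  # character-based varchar(20): 20
```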
A) Storage restrictions
The varchar field stores its actual content separately in the clustered index, prefixed with 1 to 2 bytes that record the actual length (2 bytes if the length exceeds 255). Because a 2-byte prefix can count no higher than 65535, the maximum length cannot exceed 65535.
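The idea of a length prefix can be illustrated with a short sketch. This is not MySQL's actual on-disk format: the function `length_prefixed`, the little-endian byte order, and the rule of picking the prefix width from the column's maximum byte length are all assumptions made for the example.

```python
import struct

def length_prefixed(value: bytes, column_max_bytes: int) -> bytes:
    """Prepend a 1- or 2-byte length to value (illustrative layout only)."""
    if column_max_bytes > 65535:
        raise ValueError("a 2-byte prefix cannot describe more than 65535 bytes")
    if len(value) > column_max_bytes:
        raise ValueError("value is longer than the column allows")
    # Assumption: 1 prefix byte if the column never needs more than 255 bytes, else 2.
    fmt = "<B" if column_max_bytes <= 255 else "<H"
    return struct.pack(fmt, len(value)) + value

data = "中文".encode("utf-8")           # 6 bytes of content
short = length_prefixed(data, 60)      # 1-byte prefix
longer = length_prefixed(data, 3000)   # 2-byte prefix
print(len(short), len(longer))
```

The 65535 ceiling follows directly from the prefix: two bytes can represent lengths from 0 to 65535 and nothing larger.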
B) Encoding length limit
If the character set is GBK, each character occupies at most 2 bytes, and the maximum length cannot exceed 32766.
If the character set is utf8, each character occupies up to 3 bytes, and the maximum length cannot exceed 21845.
If a column definition exceeds these limits, the varchar field is forcibly converted to the text type and a warning is generated.
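These limits are plain integer division of a byte budget by the maximum bytes per character. The sketch below reproduces them; `max_varchar_chars` is a name invented for this example. Note that the two figures quoted above correspond to slightly different budgets: 32766 is 65532 / 2, while 21845 is 65535 / 3. With the 65532-byte budget, the utf8 figure comes out one lower.

```python
def max_varchar_chars(byte_budget: int, max_bytes_per_char: int) -> int:
    """Largest character count whose worst-case size fits in byte_budget."""
    return byte_budget // max_bytes_per_char

print(max_varchar_chars(65532, 2))  # GBK, 65532-byte budget: 32766
print(max_varchar_chars(65535, 3))  # utf8, 65535-byte budget: 21845
print(max_varchar_chars(65532, 3))  # utf8, 65532-byte budget: 21844
```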
For the C language
The size of int depends on the compiler, and different compilers follow different rules. The ANSI standard does not fix int at 2 bytes; it only requires int to be at least 16 bits and leaves the exact size to the implementation. TC (Turbo C) uses a 2-byte int, while VC (Visual C++) uses a 4-byte int.
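One way to see the size that the local platform's C compiler uses is to ask Python's standard `ctypes` module, which mirrors the native C types. This is only a sketch: the result reflects the platform Python was built for, not Turbo C or any other specific compiler.

```python
import ctypes

# Size of the platform's native C int, in bytes.
int_size = ctypes.sizeof(ctypes.c_int)

print(f"int is {int_size} bytes ({int_size * 8} bits) on this platform")
```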
UTF-8 and GBK encoding differences