In MySQL 5.1, a Chinese character in utf8 encoding counts as one character
I recently noticed that Oracle and MySQL calculate field lengths differently, even when both use UTF-8 encoding. For example:
Defined in Oracle: name varchar2(10). The name field can hold 10 letters or 3 Chinese characters.
Defined in MySQL: name varchar(10). The name field can hold 10 letters or 10 Chinese characters.
From the above, in Oracle 1 Chinese character = 3 bytes.
So why does 1 Chinese character seem to equal 1 byte in MySQL?
It turns out that in MySQL 5 the length of a varchar is counted in characters, while the length of an Oracle varchar2 is counted in bytes by default.
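The difference between the two length rules can be sketched in Python. This is only an illustration and does not talk to either database: the function names and sample strings are made up for the example, and the byte-counting check assumes UTF-8 storage.

```python
def fits_byte_semantics(value: str, limit: int, encoding: str = "utf-8") -> bool:
    """Oracle-style check (assumed): the limit counts encoded bytes."""
    return len(value.encode(encoding)) <= limit


def fits_char_semantics(value: str, limit: int) -> bool:
    """MySQL 5-style check (assumed): the limit counts characters."""
    return len(value) <= limit


ten_letters = "abcdefghij"          # 10 characters, 10 bytes in UTF-8
three_hanzi = "中文字"               # 3 characters, 9 bytes in UTF-8
four_hanzi = "中文字符"              # 4 characters, 12 bytes in UTF-8
ten_hanzi = "中文字符长度计算方式"    # 10 characters, 30 bytes in UTF-8

for sample in (ten_letters, three_hanzi, four_hanzi, ten_hanzi):
    print(
        sample,
        "byte semantics:", fits_byte_semantics(sample, 10),
        "char semantics:", fits_char_semantics(sample, 10),
    )
```

With a limit of 10, three Chinese characters pass the byte check and four do not, while ten Chinese characters still pass the character check.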
The number of bytes a Chinese character occupies depends on the encoding:
1 Chinese character = 3 bytes under UTF-8
1 Chinese character = 2 bytes under GBK
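These byte counts are easy to check with Python's built-in codecs. The sample character is an arbitrary choice for this sketch.

```python
ch = "中"  # an arbitrary common Chinese character

utf8_bytes = len(ch.encode("utf-8"))  # bytes used by the UTF-8 encoding
gbk_bytes = len(ch.encode("gbk"))     # bytes used by the GBK encoding
letter_bytes = len("a".encode("utf-8"))  # an ASCII letter for comparison

print("UTF-8:", utf8_bytes, "GBK:", gbk_bytes, "ASCII letter:", letter_bytes)
```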
A MySQL varchar(50) holds up to 50 characters, whether they are Chinese or English.
In MySQL 5, the varchar type is described as follows: VARCHAR(M) is a variable-length string. M is the maximum column length and ranges from 0 to 65,535. (The actual maximum length of a VARCHAR is determined by the maximum row size and the character set used. The maximum effective length is 65,532 bytes.)
Why the change? I find the MySQL manual rather unfriendly, because you have to read it carefully before you find this description: MySQL 5.1 follows the standard SQL specification and does not strip trailing spaces from VARCHAR values. A VARCHAR value is stored as a one-byte or two-byte length prefix plus the data. If the declared length of the VARCHAR column is greater than 255, the length prefix is two bytes.
Well, I think I understand a little. But if a two-byte prefix is used when the length exceeds 255, then elementary-school subtraction gives 65535 - 2 = 65533. I do not know how the experts arrived at 65,532. Do you have the same question?
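The subtraction can be sketched in Python. The helper names are hypothetical, and the example assumes a table with a single VARCHAR column. The extra byte charged to a nullable column is my reading of MySQL's row-size rules rather than something stated in the description quoted above, so treat it as an assumption; it would account for the one-byte gap between 65,533 and 65,532.

```python
MAX_ROW_BYTES = 65535  # maximum row size quoted in the manual


def length_prefix_bytes(max_data_bytes: int) -> int:
    """One prefix byte up to 255, otherwise two (threshold taken in bytes here)."""
    return 1 if max_data_bytes <= 255 else 2


def max_varchar_bytes(nullable: bool) -> int:
    """Largest data size for a single long VARCHAR column in a row.

    Assumption: a nullable column costs one extra byte for its NULL flag.
    """
    null_flag_bytes = 1 if nullable else 0
    return MAX_ROW_BYTES - 2 - null_flag_bytes  # 2 = prefix for long columns


print("prefix for 100 bytes:", length_prefix_bytes(100))
print("prefix for 300 bytes:", length_prefix_bytes(300))
print("NOT NULL column:", max_varchar_bytes(nullable=False))
print("nullable column:", max_varchar_bytes(nullable=True))
```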
Note: in my test with UTF8 encoding, the maximum length of a varchar was 21854 bytes.
Tested on MySQL 5.0.45 with the database encoding set to utf8: a varchar can be defined with a length of at most 21785. That is, it can hold at most 21785 characters, whether letters, digits, or Chinese characters.
My guess: a varchar holds at most 65535 bytes, and utf8 encodes one character in up to three bytes, so 65535 / 3 = 21845, which is close to, though not exactly, the measured 21785. However, the length function shows that a Chinese character occupies three bytes, while a letter or other character occupies one byte. So is the actual length of char(10) variable?
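The estimate and the byte-versus-character distinction can both be checked in Python. This is a sketch under stated assumptions: the two helpers are hypothetical stand-ins for MySQL's LENGTH() (which returns bytes) and CHAR_LENGTH() (which returns characters), and three bytes per character is taken as the utf8 worst case. It suggests that a char(10) value always holds at most 10 characters, while the number of bytes behind it varies with the content.

```python
MAX_ROW_BYTES = 65535
BYTES_PER_UTF8_CHAR = 3  # worst case assumed for MySQL's utf8

# Upper bound on characters if every character needs three bytes.
upper_bound_chars = MAX_ROW_BYTES // BYTES_PER_UTF8_CHAR


def byte_length(value: str) -> int:
    """Stand-in for LENGTH(): size of the UTF-8 encoded value in bytes."""
    return len(value.encode("utf-8"))


def char_length(value: str) -> int:
    """Stand-in for CHAR_LENGTH(): number of characters."""
    return len(value)


print("upper bound:", upper_bound_chars)
for sample in ("abc", "中文字", "中a"):
    print(sample, "bytes:", byte_length(sample), "chars:", char_length(sample))
```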