In MySQL 5.1, a Chinese character in UTF-8 encoding counts as one character

Source: Internet
Author: User

Recently I found that Oracle and MySQL calculate field lengths differently (both with UTF-8 encoding). For example:
Defined in Oracle as name varchar2(10): the name field can hold 10 letters or 3 Chinese characters.
Defined in MySQL as name varchar(10): the name field can hold 10 letters or 10 Chinese characters.
From this we can see that in Oracle, 1 Chinese character = 3 bytes.
So why does one Chinese character count as only one unit in MySQL?

It turns out that since MySQL 5, the length of a VARCHAR is measured in characters, while the length of an Oracle VARCHAR2 is measured in bytes.
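The two length rules can be modelled in a few lines of Python. This is only a sketch, assuming UTF-8 on both sides and Oracle's default byte length semantics; the helper names fits_mysql_varchar and fits_oracle_varchar2_bytes are hypothetical.

```python
def fits_mysql_varchar(value: str, m: int) -> bool:
    """MySQL 5.x VARCHAR(M): M is counted in characters."""
    return len(value) <= m


def fits_oracle_varchar2_bytes(value: str, n: int, encoding: str = "utf-8") -> bool:
    """Oracle VARCHAR2(n) with byte semantics (assumed): n is counted in bytes."""
    return len(value.encode(encoding)) <= n


if __name__ == "__main__":
    # 10 letters, 3 Chinese characters, 4 Chinese characters, 10 Chinese characters
    for sample in ["abcdefghij", "中文字", "中文汉字", "中文汉字中文汉字中文"]:
        print(sample,
              fits_mysql_varchar(sample, 10),
              fits_oracle_varchar2_bytes(sample, 10))
```

Three Chinese characters take 9 bytes in UTF-8 and fit both columns; a fourth one pushes the byte count to 12, which no longer fits the byte-counted Oracle column but still fits MySQL's character-counted one.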
The number of bytes a Chinese character occupies depends on the encoding:
1 Chinese character = 3 bytes under UTF-8
1 Chinese character = 2 bytes under GBK
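Python's built-in codecs can confirm these byte counts. This illustrates the encodings themselves rather than MySQL's storage format, and bytes_per_char is a hypothetical helper.

```python
def bytes_per_char(ch: str, encoding: str) -> int:
    """Number of bytes a single character occupies in the given encoding."""
    return len(ch.encode(encoding))


for enc in ("utf-8", "gbk"):
    # a Chinese character, then an ASCII letter for comparison
    print(enc, bytes_per_char("中", enc), bytes_per_char("a", enc))
# utf-8 3 1
# gbk 2 1
```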

A MySQL VARCHAR(50) column stores up to 50 characters, whether they are Chinese or English.
In MySQL 5, the VARCHAR type is described as follows: VARCHAR(M) is a variable-length string, where M is the maximum column length. M ranges from 0 to 65,535. (The effective maximum length of a VARCHAR is determined by the maximum row size and the character set used. The maximum effective length is 65,532 bytes.)
Why is that? I find the MySQL manual quite unfriendly, because you have to read it carefully before you find this description: "In MySQL 5.1, in compliance with standard SQL, trailing spaces are not removed when VARCHAR values are stored. A VARCHAR value is stored as a one-byte or two-byte length prefix plus the data. If the declared length of a VARCHAR column is greater than 255, the length prefix is two bytes."
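As a rough illustration of that storage rule, the Python sketch below adds the length prefix to the encoded size of a value. It follows the manual's more precise wording (one prefix byte if values need at most 255 bytes, two if they may need more), which for a single-byte character set is the same as "declared length greater than 255". The function names and the default of 3 bytes per character for MySQL's utf8 are assumptions for the example.

```python
def varchar_length_prefix(declared_chars: int, max_bytes_per_char: int) -> int:
    """Length-prefix size: 1 byte if a value can never exceed 255 bytes, else 2."""
    return 1 if declared_chars * max_bytes_per_char <= 255 else 2


def varchar_storage_bytes(value: str, declared_chars: int,
                          encoding: str = "utf-8",
                          max_bytes_per_char: int = 3) -> int:
    """Bytes needed for one VARCHAR value: length prefix plus encoded data.

    Trailing spaces are kept and counted, matching the standard-SQL
    behaviour described for MySQL 5.1 VARCHAR.
    """
    if len(value) > declared_chars:
        raise ValueError("value is longer than the declared VARCHAR length")
    data_bytes = len(value.encode(encoding))
    return varchar_length_prefix(declared_chars, max_bytes_per_char) + data_bytes


print(varchar_storage_bytes("abc", 300, "latin-1", 1))  # 5: 2-byte prefix + 3 bytes
print(varchar_storage_bytes("中文", 50))                # 7: max 150 bytes -> 1-byte prefix + 6
print(varchar_storage_bytes("中文", 100))               # 8: max 300 bytes -> 2-byte prefix + 6
```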
Well, that helps a little. But if a length above 255 uses a two-byte prefix, primary-school subtraction gives 65535 - 2 = 65533, not 65,532. I don't know how the experts arrive at their figure. Does anyone have an explanation?
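One commonly given explanation, offered here as an assumption rather than something the quoted manual text says: 65,535 bytes is the maximum row size shared by all columns, and a single nullable VARCHAR column needs one NULL-flag byte on top of its 2-byte length prefix, so 65,535 - 2 - 1 = 65,532. The Python sketch below only does that arithmetic for a hypothetical one-column table; the names ROW_SIZE_LIMIT and max_varchar_bytes are illustrative.

```python
# Maximum row size in bytes; in MySQL this limit is shared by all columns of a row.
ROW_SIZE_LIMIT = 65535


def max_varchar_bytes(length_prefix: int = 2, nullable: bool = True) -> int:
    """Largest data payload, in bytes, for the only VARCHAR column of a row.

    Assumption: a nullable column needs one NULL-flag bit, which for a
    single column rounds up to one whole byte.
    """
    null_flag_bytes = 1 if nullable else 0
    return ROW_SIZE_LIMIT - length_prefix - null_flag_bytes


print(max_varchar_bytes())                # 65532
print(max_varchar_bytes(nullable=False))  # 65533
```

If the assumption holds, declaring the column NOT NULL frees that byte, which is where the 65,533 from the subtraction above would apply in a single-byte character set.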

Note: I tested with UTF-8 encoding; the maximum length of a VARCHAR is 21854 bytes.
In MySQL 5.0.45, with the database encoding set to utf8, the longest VARCHAR I could define was 21785. In other words, the column allows only 21785 characters, whether letters, digits, or Chinese characters.

Suppose a VARCHAR holds at most 65,535 bytes and utf8 encodes each character in three bytes: 65535 / 3 = 21845. However, when I use the LENGTH function, I find that a Chinese character occupies three bytes, while a letter or other character occupies one byte. So is the actual length of a CHAR(10) variable?

Reference links:
http://www.oschina.net/question/59889_12699
http://zhidao.baidu.com/question/132054814
