MySQL 5.1 with UTF8 encoding: does a Chinese character take up only one char?

Source: Internet
Author: User

I recently found that Oracle and MySQL calculate field lengths differently (both using UTF8 encoding). For example:
Defined under Oracle: Name VARCHAR2(10); the Name field can hold 10 letters or 3 Chinese characters
Defined under MySQL: Name varchar(10); the Name field can hold 10 letters or 10 Chinese characters
From the above you can tell that under Oracle, 1 Chinese character = 3 bytes.
So why does it look as if 1 Chinese character = 1 byte under MySQL?

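After some investigation, the answer is that from MySQL 5 onward the unit of a varchar length is the character, while the length of Oracle's VARCHAR2 is counted in bytes (by default).
The number of bytes a character occupies depends on the encoding:
UTF-8: 1 Chinese character = 3 bytes
GBK: 1 Chinese character = 2 bytes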
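
The difference between the two ways of counting can be sketched in a few lines of Python. This is only an illustration, not database code: the helper names fits_byte_limit and fits_char_limit are made up for this example, and Python's own codecs stand in for the database character sets.

```python
# Byte cost of one Chinese character in the two encodings mentioned above.
han = "中"
utf8_bytes = len(han.encode("utf-8"))
gbk_bytes = len(han.encode("gbk"))


def fits_byte_limit(text, limit, encoding="utf-8"):
    """Byte semantics: the limit counts encoded bytes (the Oracle case above)."""
    return len(text.encode(encoding)) <= limit


def fits_char_limit(text, limit):
    """Character semantics: the limit counts characters (the MySQL 5 case above)."""
    return len(text) <= limit


print(utf8_bytes, gbk_bytes)                # 3 2
print(fits_byte_limit("中" * 3, 10))         # True: 9 bytes fit in 10
print(fits_byte_limit("中" * 4, 10))         # False: 12 bytes do not fit
print(fits_char_limit("中" * 10, 10))        # True: 10 characters fit in 10
```

With a limit of 10, byte counting leaves room for only 3 Chinese characters in UTF-8, while character counting leaves room for 10, which matches the behaviour described for the two databases.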

In MySQL, varchar(50) holds 50 characters, whether they are Chinese or English.
The MySQL 5 documentation describes the varchar field type as follows: varchar(M) is a variable-length string. M represents the maximum column length. The range of M is 0 to 65,535. (The actual maximum length of a varchar is determined by the maximum row size and the character set used; the maximum effective length is 65,532 bytes.)
Why 65,532? I really feel the MySQL manual is rather unfriendly here, because you have to read it carefully to find this description: MySQL 5.1 complies with the standard SQL specification and does not remove trailing spaces from varchar values. A varchar value is stored as a one-byte or two-byte length prefix plus the data. If a varchar column may need more than 255 bytes, the length prefix is two bytes.
Well, that makes things a little clearer. But it only says that a 2-byte length prefix is used when the length is greater than 255, and simple subtraction gives 65535 - 2 = 65533. I do not know how the experts arrived at 65,532, so I will leave this doubt open for now.
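
One commonly given explanation for the missing byte, which is not part of the passage quoted above and should be treated as an assumption here, is that a column which allows NULL needs one extra byte in the row for its NULL flag. Under that assumption the arithmetic works out as in this sketch (the constant and function names are illustrative only):

```python
MAX_ROW_BYTES = 65535        # row size limit quoted above
LENGTH_PREFIX_BYTES = 2      # prefix used once a value may exceed 255 bytes
NULL_FLAG_BYTES = 1          # assumption: one byte to record NULL for a nullable column


def max_varchar_data_bytes(nullable=True):
    """Bytes left for the varchar data after subtracting the per-column overhead."""
    overhead = LENGTH_PREFIX_BYTES + (NULL_FLAG_BYTES if nullable else 0)
    return MAX_ROW_BYTES - overhead


print(max_varchar_data_bytes(nullable=True))   # 65532
print(max_varchar_data_bytes(nullable=False))  # 65533
```

If this reading is right, 65,533 is the limit for a NOT NULL column and 65,532 is the limit for a column that allows NULL, in a table that contains only that one column.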

Note: in my own test with UTF8 encoding, the maximum declared length of a varchar came out as 21854.
In MySQL 5.0.45 with the database encoding set to UTF8, the test gave a maximum varchar definition of 21785. That is, whether the content is letters, digits or Chinese characters, the column can hold only 21,785 of them.

Presumption: a varchar holds at most 65,535 bytes, and a character in utf8 takes up to 3 bytes, so 65535 / 3 = 21845, which is close to the limits seen in the tests. Note, however, that the LENGTH function reports bytes: a Chinese character counts as 3 bytes, while a letter or other single-byte character counts as 1. So for char(10), is the actual storage length variable?

Reference Links:
http://www.oschina.net/question/59889_12699
http://zhidao.baidu.com/question/132054814
