a preface
What does a varchar (n) n mean, and how many Chinese characters can be stored? is a commonplace problem, and today was a development colleague asked me about this issue, simply write an article to specifically introduce.
Two theoretical knowledge
Let's start by explaining the definition of varchar in MySQL's historical version:
4.0 version, varchar (50), refers to 50 bytes, if stored UTF8 kanji, can only save 16 (each Chinese 3 bytes)
More than 5.0 versions, varchar (50), refers to 50 characters, whether it is stored in numbers, letters or UTF8 Chinese (3 bytes per Chinese), can be stored 50
Storage limits
The length of the extra bytes required to hold characters: less than 255 is 1 bytes, greater than 255 is 2 bytes
Encoding restrictions
GBK: A maximum of 2 bytes per character
UTF8: A maximum of 3 bytes per character
Utf8mb4 up to 4 bytes per character, Chinese 3 bytes, emoji emoji 4 bytes
Length limit
The length of the MySQL definition row cannot exceed 65535, which limits the number of columns, such as the char (x) UTF8 character set, up to 65535/(128*3) = 170 characters.
three-Test
Environment Server Version:5.6.26-74.0-log Percona Server
mysql> CREATE TABLE T1
(id int not NULL auto_increment primary key,
-Name varchar (10)
) Engine=innodb default charset=utf8mb4;
Query OK, 0 rows affected (0.01 sec)
Mysql> CREATE TABLE T2
(id int not NULL auto_increment primary key,
-Name varchar (10)
) Engine=innodb default Charset=utf8;
Query OK, 0 rows affected (0.01 sec)
mysql> CREATE TABLE T3
(id int not NULL auto_increment primary key,
-Name varchar (10)
) Engine=innodb default CHARSET=GBK;
Query OK, 0 rows affected (0.01 sec)
UTF8MB4 Character Set
mysql> INSERT into T1 (name) VALUES (' Abcdfeghi ');
Query OK, 1 row Affected (0.00 sec)
mysql> INSERT into T1 (name) VALUES (' Abcdfeghij ');
Query OK, 1 row Affected (0.00 sec)
mysql> INSERT into T1 (name) VALUES (' Abcdfeghijk ');
Query OK, 1 row affected, 1 Warning (0.00 sec)
mysql> INSERT into T1 (name) VALUES (' 1,234,567,890 ');
Query OK, 1 row Affected (0.00 sec)
mysql> INSERT into T1 (name) VALUES (' 1,234,567,891 ');
Query OK, 1 row affected, 1 Warning (0.00 sec)
Mysql> Show warnings;
+---------+------+-------------------------------------------+
| Level | Code | Message |
+---------+------+-------------------------------------------+
| Warning | 1265 | Data truncated for column ' name ' at row 1 |
+---------+------+-------------------------------------------+
1 row in Set (0.00 sec)
mysql> INSERT into T1 (name) VALUES (' 0123456789 ');
Query OK, 1 row Affected (0.00 sec)
mysql> INSERT into T1 (name) VALUES (' 01234567890 ');
Query OK, 1 row affected, 1 Warning (0.00 sec)
Mysql> Select Id,name,length (name), Char_length (name) from T1;
+----+--------------------------------+--------------+-------------------+
| ID | name | Length (name) | Char_length (name) |
+----+--------------------------------+--------------+-------------------+
| 1 | Abcdfeghi | 9 | 9 |
| 2 | Abcdfeghij | 10 | 10 |
| 3 | Abcdfeghij | 10 | 10 |
| 4 | 1,234,567,890 | 30 | 10 |
| 5 | 1,234,567,890 | 30 | 10 |
| 6 | 0123456789 | 10 | 10 |
| 7 | 0123456789 | 10 | 10 |
+----+--------------------------------+--------------+-------------------+
7 Rows in Set (0.00 sec)
UTF8 Character Set
mysql> INSERT into t2 (name) VALUES (' Abcdfeghi ');
Query OK, 1 row Affected (0.00 sec)
mysql> INSERT into t2 (name) VALUES (' Abcdfeghij ');
Query OK, 1 row Affected (0.00 sec)
mysql> INSERT into t2 (name) VALUES (' Abcdfeghijk ');
Query OK, 1 row affected, 1 Warning (0.00 sec)
mysql> INSERT into t2 (name) VALUES (' 1,234,567,890 ');
Query OK, 1 row Affected (0.00 sec)
mysql> INSERT into t2 (name) VALUES (' 1,234,567,891 ');
Query OK, 1 row affected, 1 Warning (0.00 sec)
mysql> INSERT into t2 (name) VALUES (' 0123456789 ');
Query OK, 1 row Affected (0.00 sec)
mysql> INSERT into t2 (name) VALUES (' 01234567890 ');
Query OK, 1 row affected, 1 Warning (0.00 sec)
Mysql> Select Id,name,length (name), Char_length (name) from T2;
+----+--------------------------------+--------------+-------------------+
| ID | name | Length (name) | Char_length (name) |
+----+--------------------------------+--------------+-------------------+
| 1 | Abcdfeghi | 9 | 9 |
| 2 | Abcdfeghij | 10 | 10 |
| 3 | Abcdfeghij | 10 | 10 |
| 4 | 1,234,567,890 | 30 | 10 |
| 5 | 1,234,567,890 | 30 | 10 |
| 6 | 0123456789 | 10 | 10 |
| 7 | 0123456789 | 10 | 10 |
+----+--------------------------------+--------------+-------------------+
7 Rows in Set (0.00 sec)
GBK Character Set
mysql> insert into t3 (name) VALUES (' Abcdfeghi ');
Query OK, 1 row Affected (0.00 sec)
mysql> insert into t3 (name) VALUES (' Abcdfeghij ');
Query OK, 1 row Affected (0.00 sec)
mysql> insert into t3 (name) VALUES (' Abcdfeghijk ');
Query OK, 1 row affected, 1 Warning (0.00 sec)
mysql> insert into t3 (name) VALUES (' 1,234,567,890 ');
Query OK, 1 row affected (0.01 sec)
mysql> insert into t3 (name) VALUES (' 1,234,567,891 ');
Query OK, 1 row affected, 1 Warning (0.00 sec)
mysql> insert into t3 (name) VALUES (' 0123456789 ');
Query OK, 1 row Affected (0.00 sec)
mysql> insert into t3 (name) VALUES (' 01234567890 ');
Query OK, 1 row affected, 1 Warning (0.00 sec)
Mysql> Select Id,name,length (name), Char_length (name) from T3;
+----+--------------------------------+--------------+-------------------+
| ID | name | Length (name) | Char_length (name) |
+----+--------------------------------+--------------+-------------------+
| 1 | Abcdfeghi | 9 | 9 |
| 2 | Abcdfeghij | 10 | 10 |
| 3 | Abcdfeghij | 10 | 10 |
| 4 | 1,234,567,890 | 20 | 10 |
| 5 | 1,234,567,890 | 20 | 10 |
| 6 | 1,234,567,890 | 20 | 10 |
| 7 | 0123456789 | 10 | 10 |
| 8 | 0123456789 | 10 | 10 |
+----+--------------------------------+--------------+-------------------+
8 rows in Set (0.00 sec)
As can be seen from the above test, the current version of varchar (N) defines the length of the unit is the character, length (str) indicates the number of bytes Str occupies, char_length (str) indicates the number of characters Str occupies.
Regardless of the character set, both the number and the English alphabet are 1 characters and take up one byte. Chinese characters are different because of the character set.
Four Summary
Back to answer the question at the beginning of the article varchar (N) How many Chinese characters can be stored. The answer is that a varchar (n) can hold n characters in a version after 5.0.
Throws the character set, if the row of data is all varchar type, its maximum length is 65,535 bytes.
The line length calculation formula is as follows:
Row length = 1
+ (sum of column lengths)
+ (number of NULL columns + Delete_flag + 7)/8
+ (number of variable-length columns)
For MyISAM, an additional 1 bits are required to record whether the value is null; for InnoDB, there is no difference
For Row_format for Fixed,delete_flag 1; for Row_format=dynamic,delete_flag to 0
Based on this formula, we will be able to solve the maximum value of the beginning N: (65535-1-2)/3
Minus 1 because the actual storage starts with the 2nd byte
Minus 2 because you want to store the actual character length in the list length
Except for 3 because UTF8 encoding is limited
To come together again:
CREATE table t4 (c int, C2 char (+), C3 varchar (N)) Charset=utf8;
Maximum value of N: (65535-1-2-4-30*3)/3
The maximum value of n here is (65535-1-2-4-30*3)/3=21812
Minus 1 and minus 2 are the same as in the previous example;
The reason for minus 4 is that the int type C accounts for 4 bytes;
The reason for reducing 30*3 is that char (30) occupies 90 bytes and the encoding is UTF8.
If the varchar exceeds the B rule above and is strongly turned to the text type, then each field occupies a defined length of 11 bytes, which is not "varchar", of course.
Five references
http://blog.csdn.net/u012048106/article/details/23173911
Characters, bytes, and encodings
MySQL garbled problem and utf8mb4 character set
Original blog:
http://blog.itpub.net/22664653/viewspace-1979335/
"MySQL" talking about varchar (N)