MySQL database varchar How many Chinese characters can be saved, how many English? Let's find out.

Source: Internet
Author: User
Tags mysql query mysql version

First, about UTF-8

UTF-8Unicode transformation format-8bit. is a multi-byte encoding used to solve international characters.

It uses 8 bits (that is, one byte) for English, and Chinese uses 24 bits (three bytes) to encode.

UTF-8Contains the characters that all countries in the world need to use, is the international code, the universality is strong.

UTF-8Encoded text can be displayed on browsers that support the UTF8 character set amount in each country.
In the case UTF8 of encoding, the English IE can also be displayed in Chinese, they do not need to download IE's Chinese language support package.

Second, about GBK

GBKis the national standard GB2312 based on the expansion of compatible GB2312 standards.

GBKThe text encoding is expressed in double-byte, that is, both the Chinese and English characters are represented by double-byte, in order to distinguish the language, the highest bit is set to 1.

GBKContains all Chinese characters, is the country code, the generality is worse than UTF8, but UTF8 occupies the database GBK larger.

Iii. about UTF8MB4

Before MySQL 5.5, the UTF8 encoding only supported 1-3 bytes, only support the BMP part of the Unicode coding area, BMP from where to?
Poke here basically is 0000 ~ FFFF this area.

Starting with MySQL 5.5, 4 byte UTF encoded UTF8MB4 can be supported, and a character can have a maximum of 4 bytes, so more character sets can be supported.


utf8mb4 is a superset of utf8

TF8MB4 is compatible with UTF8 and can represent more characters than UTF8.

As for when to use, to see what you do the project ...
When you do a mobile app, you will encounter the iOS user input expression in the text area emoji , and if you do not do some processing, you will cause the database exception to be inserted.

Iv. the length of Chinese characters is related to coding

MYSQL version above 5.0:

1. How much length of a Chinese character is related to coding:

    • UTF-8: A kanji = 3 bytes, English is a byte
    • GBK: A kanji = 2 bytes, English is a byte

2, varchar (n) represents n characters, regardless of Chinese characters and English, MySql can be stored n in characters, only the actual byte length of the difference.

3. mysql check length, available in SQL language


SELECT LENGTH(fieldname) FROM tablename 

V. Practical testing

1. Use the utf8 create str_test table first.

    CREATE TABLE `str_test` (        `name_chn` varchar(20) NOT NULL,        `name_en`  varchar(20) NOT NULL    ) ENGINE=InnoDB AUTO_INCREMENT=62974 DEFAULT CHARSET=utf8

Then insert the value

    mysql> insert into  str_test values (‘我爱Ruby‘, ‘I Love Ruby!‘);    Query OK, 1 row affected (0.02 sec)

Open IRB

    >> "我爱Ruby".size    => 6    >> "I Love Ruby!".size    => 12    >>

The results from the MySQL query, compare

    mysql> select * from str_test;    +------------+--------------+    | name_chn   | name_en      |    +------------+--------------+    | 我爱Ruby   | I Love Ruby! |    +------------+--------------+    1 row in set (0.02 sec)    mysql> select length(name_chn) from str_test;    +------------------+    | length(name_chn) |    +------------------+    |               10 |    +------------------+    1 row in set (0.01 sec)

3[a kanji three bytes] * 2 + 1[One English one byte] * 4 = 10

    mysql> select length(name_en) from str_test;    +-----------------+    | length(name_en) |    +-----------------+    |              12 |    +-----------------+    1 row in set (0.00 sec)

10[one English One byte] * 1 + 2[space One byte] * whitespace = 12

2, use to GBK do the test

Create a table

        CREATE TABLE `str_test` (        `name_chn` varchar(20) NOT NULL,        `name_en`  varchar(20) NOT NULL    ) ENGINE=InnoDB AUTO_INCREMENT=62974 DEFAULT CHARSET=gbk

Insert data, and test

    mysql> insert into  str_test values (‘我爱Ruby‘, ‘I Love Ruby!‘);    Query OK, 1 row affected (0.00 sec)    mysql> select * from str_test;    +------------+--------------+    | name_chn   | name_en      |    +------------+--------------+    | 我爱Ruby   | I Love Ruby! |    +------------+--------------+    1 row in set (0.01 sec)

GBKChinese is two bytes, English is a byte.

    mysql> select length(name_chn) from str_test;    +------------------+    | length(name_chn) |    +------------------+    |                8 |    +------------------+    1 row in set (0.00 sec)

2[Chinese Two bytes] * 2 + 4[English One byte] * 1 = 8

    mysql> select length(name_en) from str_test;    +-----------------+    | length(name_en) |    +-----------------+    |              12 |    +-----------------+    1 row in set (0.00 sec)

10[English One byte] * 1 + 2[space One byte] * whitespace = 12

Vi. about how many values varchar can store
    • MySQL record line length is limited, not infinitely long, this length is, that is, 64K 65535 a byte, for all tables are the same.

    • MySQL will have 1-2 bytes for the variable-length field to hold the character length.

    • When the number of characters is less than or equal to 255, MySQL only uses 1 bytes to record, because 2 of the 8 square minus 1 can only be saved to 255.

    • When the number of characters is more than 255, you have to use 2 bytes to save the length.

    • In the utf-8 state of varchar, the maximum can only be to (65535-2)/3 = 21,844 + 1.

    • In the gbk state of varchar, the maximum can only be to (65535-2)/2 = 32,766 + 1

Use to utf-8 Create

        mysql>     CREATE TABLE `str_test` (        ->         `id`  tinyint(1)  NOT NULL,        ->         `name_chn` varchar(21845) NOT NULL        ->     ) ENGINE=InnoDB AUTO_INCREMENT=62974 DEFAULT CHARSET=utf8        -> ;        ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs        mysql>     CREATE TABLE `str_test` (        ->         `id`  tinyint(1)  NOT NULL,        ->         `name_chn` varchar(21844) NOT NULL        ->     ) ENGINE=InnoDB AUTO_INCREMENT=62974 DEFAULT CHARSET=utf8        ->        ->        -> ;        Query OK, 0 rows affected (0.06 sec)

Use to gbk Create

When the storage length is 32768 failed ~

        mysql>     CREATE TABLE `str_test` (        ->         `id`  tinyint(1)  NOT NULL,        ->         `name_chn` varchar(32768) NOT NULL        ->     ) ENGINE=InnoDB AUTO_INCREMENT=62974 DEFAULT CHARSET=gbk        -> ;    ERROR 1074 (42000): Column length too big for column ‘name_chn‘ (max = 32767); use BLOB or TEXT instead

When the storage length is 32767 failed ~

    mysql>     CREATE TABLE `str_test` (                                                                                                 ->         `id`  tinyint(1)  NOT NULL,        ->         `name_chn` varchar(32767) NOT NULL        ->     ) ENGINE=InnoDB AUTO_INCREMENT=62974 DEFAULT CHARSET=gbk        -> ;    ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs

When the storage length is 32766 successful ~

    mysql>     CREATE TABLE `str_test` (        ->         `id`  tinyint(1)  NOT NULL,        ->         `name_chn` varchar(32766) NOT NULL        ->     ) ENGINE=InnoDB AUTO_INCREMENT=62974 DEFAULT CHARSET=gbk        -> ;    Query OK, 0 rows affected (0.03 sec)

The smallint is stored in two bytes, so

2[smallint] + 32766 * 2[varchar storage length] + 2[2 bytes to save length] > 65535

So failure ~

     mysql>     CREATE TABLE `str_test` (         ->         `id`  smallint(1)  NOT NULL,         ->         `name_chn` varchar(32766) NOT NULL         ->     ) ENGINE=InnoDB AUTO_INCREMENT=62974 DEFAULT CHARSET=gbk         -> ;     ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs

# # # #七, byte of numeric type

type Occupied bytes
Int 4 bytes
smallint 2 bytes
tinyint 1 bytes
Decimal Variable length

The official decimal description is as follows

Values for DECIMAL (and NUMERIC) columns is represented using a binary format that packs nine DECIMAL (base) digits in to four bytes.

Storage for the integer and fractional parts of each value is determined separately.

Each multiple of nine digits requires four bytes, and the ' leftover ' digits require some fraction of four bytes.

The storage required for excess digits are given by the following table.

Translate to Chinese

Uses a binary format to compress 9 decimal (10) numbers to 4 bytes to represent the decimal column value.

The integer of each value and the storage of fractional parts are determined separately.

A multiple of each 9-digit number requires 4 bytes, and the "remaining" bits require a portion of 4 bytes.

The following table shows the storage requirements for exceeding the number of digits:

Number of
leftover DigitsBytes
0 0
1 1
2 1
3 2
4 2
5 3
6 3
7 4
8 4

That: Decimal (10,2) takes a few bytes?

1, the first 10 refers to the total length of the integer and fractional parts, 2 refers to the length of the fractional part. Then the whole number of parts is only 10-2 = 8 bits

2, because the whole number and the fractional storage city each independent determination, so their respective occupied space of the synthesis is the total space occupied.

3, to the table, the whole part of the 8 bits accounted for 4 bytes, the decimal part 2 bits accounted for 1 bytes, so decimal (10,2) accounted for a total of 4 + 1 = 5 bytes.

4, Decimal (6,2) integer part (6-2 = 4) bits accounted for 2 bytes, fractional Part 2 bits accounted for 1 bytes, a total of 3 bytes.

Viii. Summary

The varchar field stores the actual content separately from the clustered index, and the content begins with 1 to 2 bytes representing the actual length (2 bytes in length over 255), so the maximum length cannot exceed 65535.

    • UTF-8: A kanji = 3 bytes, English is a byte
    • GBK: A kanji = 2 bytes, English is a byte

In the utf-8 state, Chinese characters can save up to 21,844 strings, and English is 21,844 strings.

In the gbk state, Chinese characters can save up to 32,766 strings, and English is 32,766 strings.

MySQL database varchar How many Chinese characters can be saved, how many English? Let's find out.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.