First, about UTF-8
UTF-8
Unicode transformation format-8bit. is a multi-byte encoding used to solve international characters.
It uses 8 bits (that is, one byte) for English, and Chinese uses 24 bits (three bytes) to encode.
UTF-8
Contains the characters that all countries in the world need to use, is the international code, the universality is strong.
UTF-8
Encoded text can be displayed on browsers that support the UTF8 character set amount in each country.
In the case UTF8
of encoding, the English IE can also be displayed in Chinese, they do not need to download IE's Chinese language support package.
Second, about GBK
GBK
is the national standard GB2312 based on the expansion of compatible GB2312 standards.
GBK
The text encoding is expressed in double-byte, that is, both the Chinese and English characters are represented by double-byte, in order to distinguish the language, the highest bit is set to 1.
GBK
Contains all Chinese characters, is the country code, the generality is worse than UTF8, but UTF8 occupies the database GBK
larger.
Iii. about UTF8MB4
Before MySQL 5.5, the UTF8 encoding only supported 1-3 bytes, only support the BMP part of the Unicode coding area, BMP from where to?
Poke here basically is 0000 ~ FFFF this area.
Starting with MySQL 5.5, 4 byte UTF encoded UTF8MB4 can be supported, and a character can have a maximum of 4 bytes, so more character sets can be supported.
utf8mb4 is a superset of utf8
TF8MB4 is compatible with UTF8 and can represent more characters than UTF8.
As for when to use, to see what you do the project ...
When you do a mobile app, you will encounter the iOS
user input expression in the text area emoji
, and if you do not do some processing, you will cause the database exception to be inserted.
Iv. the length of Chinese characters is related to coding
MYSQL version above 5.0:
1. How much length of a Chinese character is related to coding:
- UTF-8: A kanji = 3 bytes, English is a byte
- GBK: A kanji = 2 bytes, English is a byte
2, varchar (n) represents n characters, regardless of Chinese characters and English, MySql
can be stored n
in characters, only the actual byte length of the difference.
3. mysql check length, available in SQL language
SELECT LENGTH(fieldname) FROM tablename
V. Practical testing
1. Use the utf8
create str_test
table first.
CREATE TABLE `str_test` ( `name_chn` varchar(20) NOT NULL, `name_en` varchar(20) NOT NULL ) ENGINE=InnoDB AUTO_INCREMENT=62974 DEFAULT CHARSET=utf8
Then insert the value
mysql> insert into str_test values (‘我爱Ruby‘, ‘I Love Ruby!‘); Query OK, 1 row affected (0.02 sec)
Open IRB
>> "我爱Ruby".size => 6 >> "I Love Ruby!".size => 12 >>
The results from the MySQL query, compare
mysql> select * from str_test; +------------+--------------+ | name_chn | name_en | +------------+--------------+ | 我爱Ruby | I Love Ruby! | +------------+--------------+ 1 row in set (0.02 sec) mysql> select length(name_chn) from str_test; +------------------+ | length(name_chn) | +------------------+ | 10 | +------------------+ 1 row in set (0.01 sec)
3[a kanji three bytes] * 2 + 1[One English one byte] * 4 = 10
mysql> select length(name_en) from str_test; +-----------------+ | length(name_en) | +-----------------+ | 12 | +-----------------+ 1 row in set (0.00 sec)
10[one English One byte] * 1 + 2[space One byte] * whitespace = 12
2, use to GBK
do the test
Create a table
CREATE TABLE `str_test` ( `name_chn` varchar(20) NOT NULL, `name_en` varchar(20) NOT NULL ) ENGINE=InnoDB AUTO_INCREMENT=62974 DEFAULT CHARSET=gbk
Insert data, and test
mysql> insert into str_test values (‘我爱Ruby‘, ‘I Love Ruby!‘); Query OK, 1 row affected (0.00 sec) mysql> select * from str_test; +------------+--------------+ | name_chn | name_en | +------------+--------------+ | 我爱Ruby | I Love Ruby! | +------------+--------------+ 1 row in set (0.01 sec)
GBK
Chinese is two bytes, English is a byte.
mysql> select length(name_chn) from str_test; +------------------+ | length(name_chn) | +------------------+ | 8 | +------------------+ 1 row in set (0.00 sec)
2[Chinese Two bytes] * 2 + 4[English One byte] * 1 = 8
mysql> select length(name_en) from str_test; +-----------------+ | length(name_en) | +-----------------+ | 12 | +-----------------+ 1 row in set (0.00 sec)
10[English One byte] * 1 + 2[space One byte] * whitespace = 12
Vi. about how many values varchar can store
MySQL record line length is limited, not infinitely long, this length is, that is, 64K
65535
a byte, for all tables are the same.
MySQL will have 1-2 bytes for the variable-length field to hold the character length.
When the number of characters is less than or equal to 255, MySQL only uses 1 bytes to record, because 2 of the 8 square minus 1 can only be saved to 255.
When the number of characters is more than 255, you have to use 2 bytes to save the length.
In the utf-8
state of varchar, the maximum can only be to (65535-2)/3 = 21,844 + 1.
In the gbk
state of varchar, the maximum can only be to (65535-2)/2 = 32,766 + 1
Use to utf-8
Create
mysql> CREATE TABLE `str_test` ( -> `id` tinyint(1) NOT NULL, -> `name_chn` varchar(21845) NOT NULL -> ) ENGINE=InnoDB AUTO_INCREMENT=62974 DEFAULT CHARSET=utf8 -> ; ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs mysql> CREATE TABLE `str_test` ( -> `id` tinyint(1) NOT NULL, -> `name_chn` varchar(21844) NOT NULL -> ) ENGINE=InnoDB AUTO_INCREMENT=62974 DEFAULT CHARSET=utf8 -> -> -> ; Query OK, 0 rows affected (0.06 sec)
Use to gbk
Create
When the storage length is 32768 failed ~
mysql> CREATE TABLE `str_test` ( -> `id` tinyint(1) NOT NULL, -> `name_chn` varchar(32768) NOT NULL -> ) ENGINE=InnoDB AUTO_INCREMENT=62974 DEFAULT CHARSET=gbk -> ; ERROR 1074 (42000): Column length too big for column ‘name_chn‘ (max = 32767); use BLOB or TEXT instead
When the storage length is 32767 failed ~
mysql> CREATE TABLE `str_test` ( -> `id` tinyint(1) NOT NULL, -> `name_chn` varchar(32767) NOT NULL -> ) ENGINE=InnoDB AUTO_INCREMENT=62974 DEFAULT CHARSET=gbk -> ; ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
When the storage length is 32766 successful ~
mysql> CREATE TABLE `str_test` ( -> `id` tinyint(1) NOT NULL, -> `name_chn` varchar(32766) NOT NULL -> ) ENGINE=InnoDB AUTO_INCREMENT=62974 DEFAULT CHARSET=gbk -> ; Query OK, 0 rows affected (0.03 sec)
The smallint is stored in two bytes, so
2[smallint] + 32766 * 2[varchar storage length] + 2[2 bytes to save length] > 65535
So failure ~
mysql> CREATE TABLE `str_test` ( -> `id` smallint(1) NOT NULL, -> `name_chn` varchar(32766) NOT NULL -> ) ENGINE=InnoDB AUTO_INCREMENT=62974 DEFAULT CHARSET=gbk -> ; ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
# # # #七, byte of numeric type
type |
Occupied bytes |
Int |
4 bytes |
smallint |
2 bytes |
tinyint |
1 bytes |
Decimal |
Variable length |
The official decimal
description is as follows
Values for DECIMAL (and NUMERIC) columns is represented using a binary format that packs nine DECIMAL (base) digits in to four bytes.
Storage for the integer and fractional parts of each value is determined separately.
Each multiple of nine digits requires four bytes, and the ' leftover ' digits require some fraction of four bytes.
The storage required for excess digits are given by the following table.
Translate to Chinese
Uses a binary format to compress 9 decimal (10) numbers to 4 bytes to represent the decimal column value.
The integer of each value and the storage of fractional parts are determined separately.
A multiple of each 9-digit number requires 4 bytes, and the "remaining" bits require a portion of 4 bytes.
The following table shows the storage requirements for exceeding the number of digits:
leftover Digits | Number of
Bytes |
0 |
0 |
1 |
1 |
2 |
1 |
3 |
2 |
4 |
2 |
5 |
3 |
6 |
3 |
7 |
4 |
8 |
4 |
That: Decimal (10,2) takes a few bytes?
1, the first 10 refers to the total length of the integer and fractional parts, 2 refers to the length of the fractional part. Then the whole number of parts is only 10-2 = 8 bits
2, because the whole number and the fractional storage city each independent determination, so their respective occupied space of the synthesis is the total space occupied.
3, to the table, the whole part of the 8 bits accounted for 4 bytes, the decimal part 2 bits accounted for 1 bytes, so decimal (10,2) accounted for a total of 4 + 1 = 5 bytes.
4, Decimal (6,2) integer part (6-2 = 4) bits accounted for 2 bytes, fractional Part 2 bits accounted for 1 bytes, a total of 3 bytes.
Viii. Summary
The varchar field stores the actual content separately from the clustered index, and the content begins with 1 to 2 bytes representing the actual length (2 bytes in length over 255), so the maximum length cannot exceed 65535.
- UTF-8: A kanji = 3 bytes, English is a byte
- GBK: A kanji = 2 bytes, English is a byte
In the utf-8
state, Chinese characters can save up to 21,844 strings, and English is 21,844 strings.
In the gbk
state, Chinese characters can save up to 32,766 strings, and English is 32,766 strings.
MySQL database varchar How many Chinese characters can be saved, how many English? Let's find out.