Take you 5 minutes to understand MySQL Character Set settings, mysql Character Set
I. Content Overview
When using MySQL, it is important to understand the concepts of character sets and collation, and the impact of different settings on data storage and comparison. Many people encounter garbled characters in their daily work, probably because they do not have a correct understanding of character sets and collation or are incorrectly set.
This article introduces the following content:
- Basic concepts and relationships of character sets and collation
- Character Set and collation settings supported by MySQL, and connections between different settings
- Server, database, table, column-level character set, character order view and settings
- When should I set the character set and collation?
Ii. Concepts and links of character sets and collation
MySQL provides different character sets for data storage. In terms of data comparison, different Collation is supported.
MySQL provides different levels of settings, including server level, database level, table level, and column level, which can provide very precise settings.
What are character sets and collation? To put it simply:
- Character set: defines the character encoding and character encoding.
- Collation: defines character comparison rules.
For example:
There are four characters: A, B, a, and B. The four characters are encoded as A = 0, B = 1, a = 2, B = 3. The character + encoding here constitutes the character set (character set ).
What if we want to compare the two characters? For example, A, B, or a, B, the most intuitive comparison method is to use their encoding, for example, A <B Because 0 <1.
In addition, for A and a, although they are encoded differently, we think the uppercase and lowercase characters should be equal, that is, A =.
The preceding two comparison rules are defined. The set of these comparison rules is collation.
- Compare the encoding size of uppercase and lowercase characters;
- If the two characters are case-sensitive, they are equal.
Iii. Character Set and collation supported by MySQL
MySQL supports multiple character sets and collation.
- A character set corresponds to at least one character sequence (generally one-to-many ).
- Two different character sets cannot have the same collation.
- Each character set has a default collation.
The above is more abstract. Let's take a look at the following sections and we will know what is going on.
1. view supported character sets
You can view the character sets supported by MYSQL in the following ways.
Method 1:
mysql> SHOW CHARACTER SET;
+ ---------- + ------------------------- + -------- ------------- + -------- +
Charset | Description | Default collation | Maxlen |
+ ---------- + ------------------------- + -------- ------------- + -------- +
big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |
| dec8 | DEC West European | dec8_swedish_ci | 1 |
... omitted
Method 2:
mysql> use information_schema;
mysql> select * from CHARACTER_SETS;
+--------------------+----------------------+-----------------------------+--------+
| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION | MAXLEN |
+--------------------+----------------------+-----------------------------+--------+
| big5 | big5_chinese_ci | Big5 Traditional Chinese | 2 |
| dec8 | dec8_swedish_ci | DEC West European | 1 |
...ommited
When you use show character set for viewing, you can also add the WHERE or LIKE restriction.
Example 1: use WHERE conditions.
mysql> SHOW CHARACTER SET WHERE Charset="utf8";
+---------+---------------+-------------------+--------+
| Charset | Description | Default collation | Maxlen |
+---------+---------------+-------------------+--------+
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
+---------+---------------+-------------------+--------+
1 row in set (0.00 sec)
Example 2: Use the LIKE condition.
mysql> SHOW CHARACTER SET LIKE "utf8%";
+---------+---------------+--------------------+--------+
| Charset | Description | Default collation | Maxlen |
+---------+---------------+--------------------+--------+
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
| utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 |
+---------+---------------+--------------------+--------+
2 rows in set (0.00 sec)
2. view supported collation
Similarly, you can view the collation supported by MYSQL in the following way.
Method 1: View data by using show collation.
We can see that the utf8 character set contains more than 10 collation characters. The Default value is Yes to determine whether it is the Default collation.
mysql> SHOW COLLATION WHERE Charset = 'utf8';
+--------------------------+---------+-----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+--------------------------+---------+-----+---------+----------+---------+
| utf8_general_ci | utf8 | 33 | Yes | Yes | 1 |
| utf8_bin | utf8 | 83 | | Yes | 1 |
...omitted
Method 2: Query information_schema.COLLATIONS.
mysql> USE information_schema;
mysql> SELECT * FROM COLLATIONS WHERE CHARACTER_SET_NAME="utf8";
+--------------------------+--------------------+-----+------------+-------------+---------+
| COLLATION_NAME | CHARACTER_SET_NAME | ID | IS_DEFAULT | IS_COMPILED | SORTLEN |
+--------------------------+--------------------+-----+------------+-------------+---------+
| utf8_general_ci | utf8 | 33 | Yes | Yes | 1 |
| utf8_bin | utf8 | 83 | | Yes | 1 |
| utf8_unicode_ci | utf8 | 192 | | Yes | 8 |
3. Naming rules for collation
The name of the collation, with the corresponding character set as the prefix, as shown below. For example, utf8_general_ci indicates that it is the character sequence of the character set utf8.
For more rules, see the official documentation.
MariaDB [information_schema]> SELECT CHARACTER_SET_NAME, COLLATION_NAME FROM COLLATIONS WHERE CHARACTER_SET_NAME="utf8" limit 2;
+--------------------+-----------------+
| CHARACTER_SET_NAME | COLLATION_NAME |
+--------------------+-----------------+
| utf8 | utf8_general_ci |
| utf8 | utf8_bin |
+--------------------+-----------------+
2 rows in set (0.00 sec)
Iv. server Character Set and collation
Purpose: When you create a database without specifying a character set or collation, the server Character Set and server character sequence are used as the default Character Set and collation of the database.
How to specify: When the MySQL service is started, it can be specified through the command line parameter. You can also specify the parameters in the configuration file.
Server default Character Set and collation: this parameter is specified during MySQL compilation.
Character_set_server and collation_server correspond to the server Character Set and server character order respectively.
1. view the server Character Set and collation
It corresponds to two system variables, character_set_server and collation_server.
mysql> SHOW VARIABLES LIKE "character_set_server";
mysql> SHOW VARIABLES LIKE "collation_server";
2. Specify
You can specify the server Character Set and collation when starting the MySQL service. If this parameter is not specified, the default Collation is latin1 and latin1_swedish_ci.
mysqld --character-set-server=latin1 \
--collation-server=latin1_swedish_ci
Specify the server character set separately. At this time, the server character order is latin1_swedish_ci by default.
mysqld --character-set-server=latin1
3. Specify the configuration file
In addition to the command line parameter, you can also specify it in the configuration file, as shown below.
[client]
default-character-set=utf8
[mysql]
default-character-set=utf8
[mysqld]
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8
4. Modify at runtime
Example: modification during running (it will expire after restart. If you want to keep it unchanged after restart, you need to write it into the configuration file)
mysql> SET character_set_server = utf8 ;
5. Specify the default Character Set and collation during compilation
The default values of character_set_server and collation_server can be specified through the compilation option during MySQL Compilation:
cmake . -DDEFAULT_CHARSET=latin1 \
-DDEFAULT_COLLATION=latin1_german1_ci
V. database character set and collation
Purpose: Specify the character set and collation at the database level. Different character sets/collation can be specified for databases in the same MySQL service.
1. Set the character set/collation of data
You can use character set and COLLATE to specify CHARACTER sets and sorting rules when creating and modifying databases.
Create a database:
CREATE DATABASE db_name
[[DEFAULT] CHARACTER SET charset_name]
[[DEFAULT] COLLATE collation_name]
Modify database:
ALTER DATABASE db_name
[[DEFAULT] CHARACTER SET charset_name]
[[DEFAULT] COLLATE collation_name]
Example: Create the database test_schema and set the character set to utf8. The default sorting rule is utf8_general_ci.
CREATE DATABASE `test_schema` DEFAULT CHARACTER SET utf8;
2. view the character set/collation of the database
There are three ways to view the character set/collation of the database.
Example 1: view the character set and sorting rules of test_schema. (You need to switch the default database)
mysql> use test_schema;
Database changed
mysql> SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8 | utf8_general_ci |
+--------------------------+----------------------+
1 row in set (0.00 sec)
Example 2: You can also use the following command to view the character set and database of test_schema (you do not need to switch the default database)
mysql> SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM information_schema.SCHEMATA WHERE schema_name="test_schema";
+-------------+----------------------------+------------------------+
| SCHEMA_NAME | DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
+-------------+----------------------------+------------------------+
| test_schema | utf8 | utf8_general_ci |
+-------------+----------------------------+------------------------+
1 row in set (0.00 sec)
Example 3: You can also view the character set by viewing the statements used to create a database.
mysql> SHOW CREATE DATABASE test_schema;
+-------------+----------------------------------------------------------------------+
| Database | Create Database |
+-------------+----------------------------------------------------------------------+
| test_schema | CREATE DATABASE `test_schema` /*!40100 DEFAULT CHARACTER SET utf8 */ |
+-------------+----------------------------------------------------------------------+
1 row in set (0.00 sec)
3. How is the database character set and collation determined?
If character set or COLLATE is specified when you create a database, the corresponding character set and sorting rule are used.
If no character set or sorting rule is specified during database creation, character_set_server and collation_server will prevail.
6. table Character Set and collation
The syntax for creating and modifying tables is as follows. You can use character set and COLLATE to set character sets and collation.
CREATE TABLE tbl_name (column_list)
[[DEFAULT] CHARACTER SET charset_name]
[COLLATE collation_name]]
ALTER TABLE tbl_name
[[DEFAULT] CHARACTER SET charset_name]
[COLLATE collation_name]
1. Create a table and specify the character set/Collation
For example, if the character set is utf8, the default Collation is used.
CREATE TABLE `test_schema`.`test_table` (
`id` INT NOT NULL COMMENT '',
PRIMARY KEY (`id`) COMMENT '')
DEFAULT CHARACTER SET = utf8;
2. view the character set/collation of the table
Similarly, there are three ways to view the character set/Collation of a table.
Method 1: view the table status through SHOW table STATUS. Note that Collation is utf8_general_ci and the corresponding character set is utf8.
MariaDB [blog]> SHOW TABLE STATUS FROM test_schema \G;
*************************** 1. row ***************************
Name: test_table
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 0
Avg_row_length: 0
Data_length: 16384
Max_data_length: 0
Index_length: 0
Data_free: 11534336
Auto_increment: NULL
Create_time: 2018-01-09 16:10:42
Update_time: NULL
Check_time: NULL
Collation: utf8_general_ci
Checksum: NULL
Create_options:
Comment:
1 row in set (0.00 sec)
Method 2: View information about information_schema.TABLES.
mysql> USE test_schema;
mysql> SELECT TABLE_COLLATION FROM information_schema.TABLES WHERE TABLE_SCHEMA = "test_schema" AND TABLE_NAME = "test_table";
+-----------------+
| TABLE_COLLATION |
+-----------------+
| utf8_general_ci |
+-----------------+
Method 3: confirm with show create table.
mysql> SHOW CREATE TABLE test_table;
+------------+----------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+------------+----------------------------------------------------------------------------------------------------------------+
| test_table | CREATE TABLE `test_table` (
`id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+------------+----------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
3. How to Determine the table Character Set and collation
Assume that the character set and COLLATE values are charset_name and collation_name respectively. If:
If charset_name and collation_name are specified, charset_name and collation_name are used.
Only charset_name is specified, but collation_name is not clear, charset_name is used, and charset_name is used as the default character order corresponding to charset_name.
Only collation_name is specified, but charset_name is not clear. collation_name is used in the collation, and collation_name is used in the character set.
If charset_name and collation_name are not clear, the character set and collation settings of the database are used.
VII. column Character Set and sorting
For CHAR, VARCHAR, and TEXT columns, you can specify the character set/collation. The syntax is as follows:
col_name {CHAR | VARCHAR | TEXT} (col_length)
[CHARACTER SET charset_name]
[COLLATE collation_name]
1. Add column and specify Character Set/sorting rule
Example: (table creation is similar)
mysql> ALTER TABLE test_table ADD COLUMN char_column VARCHAR(25) CHARACTER SET utf8;
2. view the character set/Collation of column
Example:
mysql> SELECT CHARACTER_SET_NAME, COLLATION_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA="test_schema" AND TABLE_NAME="test_table" AND COLUMN_NAME="char_column";
+--------------------+-----------------+
| CHARACTER_SET_NAME | COLLATION_NAME |
+--------------------+-----------------+
| utf8 | utf8_general_ci |
+--------------------+-----------------+
1 row in set (0.00 sec)
3. column Character Set/sorting Rule Confirmation
Assume that the character set and COLLATE values are charset_name and collation_name:
If both charset_name and collation_name are specified, charset_name and collation_name are used as the character set and collation parameters.
If only charset_name and collation_name are specified, the character set is charset_name, and the character order is the default character order of charset_name.
If only collation_name and charset_name are specified, the collation is collation_name and the character set is the character set associated with collation_name.
If charset_name and collation_name are not specified, the character set and collation of the table shall prevail.
8. Selection: when to set the character set and collation
Generally, configuration can be performed in three places:
Configure when creating a database.
Configure mysql server when it is started.
When compiling mysql from the source code, configure it by compiling Parameters
1. Method 1: Configure when creating a database
This method is flexible and safe. It does not depend on the default Character Set/collation. When you create a database, specify the character set/Character Sequence. When you create tables and columns in the future, if you do not specify the character set/character sequence, it inherits the character set/Character Sequence of the database.
CREATE DATABASE mydb
DEFAULT CHARACTER SET utf8
DEFAULT COLLATE utf8_general_ci;
2. Method 2: Configure when mysql server is started
You can add the following configuration to configure character-set-server and collation-server when mysql server is started.
When you create a database, table, or column using mysql client without displaying the declared character set or collation, character-set-server/collation-server will be used as the default character set/collation.
In addition, the character SET/collation during client/server connection still needs to be SET through set names.
[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci
3. Method 3: Set the compilation parameters when compiling mysql from the source code.
If-DDEFAULT_CHARSET and-DDEFAULT_COLLATION are specified during compilation, then:
When you create a database or table, it is used as the default Character Set/collation.
When the client connects to the server, it uses it as the default Character Set/collation. (Do not set names separately)
shell> cmake . -DDEFAULT_CHARSET=utf8 \
-DDEFAULT_COLLATION=utf8_general_ci
IX. Post
This article describes in detail the character set and collation related content in MySQL. This part focuses on data storage and comparison. In fact, a very important part of the content has not been involved: For the character set of the connection, character order settings.
Due to incorrect Character Set and collation settings, many garbled characters are involved. This part of content is also described in the next article.
10. Related Links
10.1 Character Set Support
Https://dev.mysql.com/doc/refman/5.7/en/charset.html
Summary
The above is a small series that will show you how to read MySQL Character Set settings in five minutes. I hope it will help you. If you have any questions, please leave a message and I will reply to you in time. Thank you very much for your support for the help House website!