One, character set and validation rules
A character set is a set of conformance and encoding, a set of rules that are used to compare characters within a character set, that is, the collation of a character set. MySQL can use the pair of character sets and inspection rules to organize characters.
MySQL server can support a variety of character sets, in the same server, the same database, even different fields of the same table can be specified using different character sets, compared to other database management systems such as Oracle, in the same database can only use the same character set, MySQL clearly has greater flexibility.
Each character set can have multiple proofing rules, and there is a default proofing rule, and each proofing rule is only for a character set and is related to other character sets.
In MySQL, the concept and coding scheme of a character set is considered synonymous, and a character set is a combination of a conversion table and an encoding scheme.
Unicode (Universal code) is a character encoding that is used on a computer. Unicode is created to address the limitations of traditional character encoding schemes, which set a uniform and unique binary encoding for each character in each language to meet the requirements of cross-language, cross-platform text conversion and processing. There are different encoding schemes for Unicode, including utf-8,utf-16 and Utf-32. UTF represents the Unicode transformation Format. Second, view the MySQL character Set Method 1, view the MySQL server supported character set
Mysql> show Character set;
Mysql> select * from Information_schema.character_sets;
Mysql> Select Character_set_name, Default_collate_name, description, MaxLen from
Information_schema.character_sets; 2. Review the collation rules of the character set
Mysql> Show collation;
Mysql> show collation like ' UTF8 ';
Mysql> SELECT * from information_schema.collations where collation_name like ' utf8% '; 3. View the character set of the current database
Mysql> Show variables like ' character% ';
+--------------------------+----------------------------------+
| variable_name | Value |
+--------------------------+----------------------------------+
| character_set_client | UTF8 |
| character_set_connection | UTF8 |
| Character_set_database | Latin1 |
| Character_set_filesystem | UTF8 |
| Character_set_results | UTF8 |
| Character_set_server | UTF8 |
| Character_set_system | UTF8 |
| Character_sets_dir | /usr/local/mysql/share/charsets/|
+--------------------------+----------------------------------+
8 rows in Set (0.00 sec)
Noun Explanation:
Character_set_client: Character set for client request data
Character_set_connection: The character set of the client/server connection
Character_set_database: The character set of the default database, regardless of the default database, is the character set, and if there is no default database, use the character set specified by Character_set_server, which is recommended to be managed by the system itself. Do not define it artificially.
Character_set_filesystem: Convert the file name of the OS to this character set, that is, the character_set_client conversion character_set_filesystem, the default binary is not to do any conversion
Character_set_results: Result set, character set returned to the client
Character_set_server: Default character set for the database server
Character_set_system: The system character set, this value is always utf8 and does not need to be set. This character set is used for names of database objects (such as tables and columns), and for functions stored in the table of contents. 4, review the current database proofing rules
Mysql> Show variables like ' collation% ';
+----------------------+-------------------+
| variable_name | Value |
+----------------------+-------------------+
| collation_connection | Utf8_general_ci |
| Collation_database | Latin1_swedish_ci |
| Collation_server | Utf8_general_ci |
+----------------------+-------------------+
3 rows in Set (0.01 sec)
Noun Explanation:
Collation_connection the character set of the current connection.
Collation_database The default proofing for the current date. Each time a use statement is used to "jump" to another database, the value of the variable changes. If there is no current database, the value of this variable is the value of the Collation_server variable.
Collation_server The default proofing for the server.
The naming convention for sorting is: The character Set name _ language _ suffix, where each typical suffix has the following meanings:
1) _ci: case-insensitive sorting method
2) _cs: Case-sensitive sorting method
3) _bin: Binary sorting method, size comparison will be based on character encoding, does not involve human language, so _bin sorting method does not contain human language three, MySQL character set 1, overview
The MySQL character set settings fall into two categories:
1) Create the default value for the object.
2) control the configuration of interactive communication between server and client side. 1. Create default values for objects
The character set collation rule has 4 levels of default settings:
1) server level;
2) database level;
3) Table level, column level;
4) connection level.
Lower-level settings integrate high levels of settings.
Here's a general rule: Select a reasonable character set for the server or database, and then let a column choose its own character set depending on the actual situation. 2, control the server and client side interactive communication configuration
Most MySQL clients do not have the ability to support multiple character sets at the same time, using only one character set at a time.
The character set conversion between the client and the server is controlled by several MySQL system variables.
1) character_set_server:mysql server default character set.
2) Character_set_database: Database default character Set.
3) Character_set_client:mysql Server assumes that the client sends a query using the character set.
4) After the CHARACTER_SET_CONNECTION:MYSQL server receives a query request published by the client, it converts it to the character set specified by the character_set_connection variable.
5) Character_set_results:mysql Server converts the result set and error information to the Character_set_results specified character set and sends it to the client.
6)Character_set_system: System metadata (field name, etc.) character set
There are also variables that begin with Collation_ to describe the character Fu She.
Precautions:
? The Default_character_set setting in MY.CNF only affects the connection character set when the MySQL command connects to the server, and does not have any effect on applications that use the Libmysqlclient library!
? SQL function operations on fields are usually performed in the internal operation character set and are not affected by the connection character set settings.
? Bare strings in SQL statements are affected by the connection character set or introducer settings, which can produce completely different results for operations such as comparisons, and you need to be careful! 3. Character set selection rules by default
(1) When compiling MySQL, a default character set is specified, and this character set is latin1;
(2) When installing MySQL, you can specify a default character set in the configuration file (MY.CNF), and if not specified, this value is inherited from the compile-time specified;
(3) When starting mysqld, you can specify a default character set in the command line arguments, if not specified, this value inherits from the configuration in the configuration file, at this time Character_set_server is set to this default character set;
(4) When a new database is created, the character set of the database is set to Character_set_server by default unless explicitly specified;
(5) When a database is selected, Character_set_database is set to the default character set of this database;
(6) When creating a table in this database, the default character set of the table is set to Character_set_database, which is the default character set of the database;
(7) When a column is set in the table, the default character set of the column is the set of defaults for the table unless explicitly specified; 2.2.1. Assigning character sets to columns
Different columns that belong to the same table can have different character sets, and the default character set is used if no character set is displayed for a column. When you create a table, if you specify a character set for the column, the character set is included as a data type option, and is placed behind the data type and before the empty designation and primary key.
For example:
CREATE TABLE Column_charset (
C1 Char (Ten) character set UTF8 not null,
C2 char (Ten) char set UTF8,
C3 varchar (CharSet UTF8),
C4 varchar (ten)) Engine=innodb;
Note: Character Set can be shortened to Char Set and the CharSet
Use the show CREATE table table_name; command to view the Column_charset build statement:
Mysql> Show CREATE TABLE column_charset\g;
1. Row ***************************
Table:column_charset
Create table:create Table ' Column_charset ' (
' C1 ' char (TEN) CHARACTER SET UTF8 not NULL,
' C2 ' char (Ten) CHARACTER SET UTF8 DEFAULT NULL,
' C3 ' varchar (ten) CHARACTER SET UTF8 DEFAULT NULL,
' C4 ' varchar (DEFAULT NULL)
) Engine=innodb DEFAULT charset=latin1
1 row in Set (0.01 sec)
ERROR:
No query specified
Insert the data and feel the effect:
mysql> INSERT INTO Column_charset (C1,C2,C3,C4) value ("Turing", "Turing", "Turing", "Chavin");
Query OK, 1 row affected (0.01 sec)
Mysql> select * from Column_charset;
+--------+--------+--------+--------+
| C1 | C2 | C3 | C4 |
+--------+--------+--------+--------+
| Turing | Turing | Turing | Chavin |
+--------+--------+--------+--------+
1 row in Set (0.00 sec) 2.2, assigning a character set to a table
CREATE TABLE Table_charset (
C1 varchar (10),
C2 varchar (ENGINE=INNODB) default Charset=utf8;
Note: Specifying the character set for a table can be used in several ways:
default Charset=utf8;
Charset=utf8;
default character Set=utf8;
character Set=utf8;
default char Set=utf8;
Char Set=utf8;
To check the Build Table statement:
Mysql> Show CREATE TABLE table_charset\g;
1. Row ***************************
Table:table_charset
Create table:create Table ' Table_charset ' (
' C1 ' varchar DEFAULT NULL,
' C2 ' varchar (DEFAULT NULL)
) Engine=innodb DEFAULT Charset=utf8
1 row in Set (0.00 sec)
Test:
mysql> INSERT INTO Table_charset (C1,C2) VALUES (' Turing ', ' Turing ');
Query OK, 1 row affected (0.01 sec)
Mysql> select * from Table_charset;
+--------+--------+
| C1 | C2 |
+--------+--------+
| Turing | Turing |
+--------+--------+
1 row in Set (0.00 sec) 2.3, specifying a character set for the database
Each database you create has a default character set, and if not specified, use Latin1.
Create Database dbking Charset=utf8;
Note: Creating a database allocation character set can take the following seed sentences:
Charset=utf8;
default Charset=utf8;
charset UTF8;
default charset UTF8;
Char Set=utf8;
default char Set=utf8;
char set UTF8;
default char set UTF8;
character Set=utf8;
default character Set=utf8;
character set UTF8;
default character set UTF8;
Use the show create database db_name; command to view the DB creation statement:
Mysql> Show CREATE Database dbking;
+----------+-----------------------------------------------------------------+
| Database | Create Database |
+----------+-----------------------------------------------------------------+
| dbking | CREATE DATABASE ' dbking '/*!40100 DEFAULT CHARACTER SET UTF8 */| |
+----------+-----------------------------------------------------------------+
1 row in Set (0.00 sec) 2.4, assigning collation rules to columns
Each column should have a proofing, and if no display is specified, MySQL uses the default proofing that belongs to that character set. If you specify a character set and a proofreading, the character set should be preceded.
CREATE TABLE Column_collate (
C1 varchar (TEN) CharSet UTF8 collate utf8_romanian_ci not NULL,
C2 varchar (TEN) CharSet UTF8 collate utf8_spanish_ci) Engine=innodb;
To view the validation rule information for a table:
Mysql> Select Table_name,column_name,collation_name
From Information_schema.columns
where table_name= ' column_collate ';
+----------------+-------------+------------------+
| table_name | column_name | Collation_name |
+----------------+-------------+------------------+
| Column_collate | C1 | Utf8_romanian_ci |
| Column_collate | C2 | Utf8_spanish_ci |
+----------------+-------------+------------------+
2 rows in Set (0.04 sec)
Note: Character sets and proofreading play an important role in the process of handling character expressions. We cannot compare two different character values that belong to different proofreading. For example:
mysql> INSERT INTO column_collate (C1,C2) VALUES (' A ', ' a ');
Query OK, 1 row affected (0.22 sec)
Mysql> select * from Column_collate;
+----+------+
| C1 | C2 |
+----+------+
| A | A |
+----+------+
1 row in Set (0.00 sec)
Mysql> SELECT * from Column_collate where c1=c2;
ERROR 1267 (HY000): illegal mix of collations (utf8_romanian_ci,implicit) and (utf8_spanish_ci,implicit) for operation ' = ' 2.5. Specify the proofing rules for the table
CREATE TABLE Table_collate (
C1 varchar (10),
C2 varchar (ENGINE=INNODB) default charset UTF8 collate utf8_romanian_ci;
Check the proofing rules for a table:
Mysql> Select Table_name,column_name,collation_name from information_schema.columns where Table_name= ' Table_ Collate ';
+---------------+-------------+------------------+
| table_name | column_name | Collation_name |
+---------------+-------------+------------------+
| Table_collate | C1 | Utf8_romanian_ci |
| Table_collate | C2 | Utf8_romanian_ci |
+---------------+-------------+------------------+
2 rows in Set (0.00 sec) 2.6, specifying proofing rules for databases
Create DATABASE dbking102 default charset UTF8 collate utf8_romanian_ci;
To view a database definition statement:
Mysql> Show CREATE Database dbking102\g;
1. Row ***************************
database:dbking102
Create database:create Database ' dbking102 '/*!40100 DEFAULT CHARACTER SET UTF8 COLLATE utf8_romanian_ci */
1 row in Set (0.00 sec) 2.7, character direct volume character set
If no display is specified, the character set of the direct amount of characters is the default character set for the database. If you want to display the assignment of another character set, you need to put the name of the character set in front of the direct amount and underline it before the character set.
Mysql> Select _utf8 ' language Language words язык ';
+---------------------------------+
| Language Language Words язык|
+---------------------------------+
| Language Language Words язык|
2.8. Modify and set the MySQL server-level character set
The MySQL server supports a number of different character sets, which can be specified at compile time and at run time. 1) specified at compile time
You can specify the default character set and default proofing rules at compile time, to change both the default character set and proofing rules at the same time, using the--with-charset and--with-collation options. The proofing rules must be a valid proofing rule for the character set.
./configure--With-charset=charset--with-collation=collation
With the Configure option--with-extra-charsets=list, you can define the additional character set that is defined in the server. LIST refers to any of the following:
A. A series of character set names for space intervals
B.complex-to include all character sets that cannot be loaded dynamically
c.all– to include all character sets in the binary
./configure--With-charset=charset--with-collation=collation--with-extra-charsets=all 2) specified in the parameter file my.cnf
[Mysqld]
Character_set_server=utf8
--Influence parameters: Character_set_server and Character_set_database
--Note: You will have to restart the database after you modify it to take effect.
[Client]
Default-character-set=utf8
--Influence parameters: Character_set_client,character_set_connection and Character_set_results.
--Note: There is no need to restart the database after modification.
3) Specify before starting the parameter
./mysqld--character-set-server=utf8 &
--Influence parameters: Character_set_server and Character_set_database
4) specified via--default-character-set when MySQL client logs in
Mysql-uroot-pmysql--default-character-set=utf8
--Influence parameters: Set Character_set_client,set character_set_connection,set character_set_results. 5) Temporary designation
A) specify separately
mysql> SET character_set_client = UTF8;
mysql> SET character_set_connection = UTF8;
mysql> SET character_set_database = UTF8;
mysql> SET character_set_results = UTF8;
mysql> SET character_set_server = UTF8;
b) MySQL client use: set names UTF8;
Equivalent to
Set Character_set_client=utf8;
Set Character_set_connection=utf8;
Set Character_set_results=utf8;
c) Set character set UTF8;
Equivalent to
Set Character_set_client=utf8;
Set Character_set_results=utf8;
Set [email protected] @collation_database; 3, summary
Here are the next few MySQL commands:
1) show character set; or show char set;
View all character sets supported by the database
2) status; or \s;
View the current status including the character set settings of course
3) Show variables like ' char% ';
View system character set settings, including all character set settings
4) Show table status from Sqlstudy like '%countries% ';
To view the character set settings for a table in a Sqlstudy database
5) Show full columns from countries;
To view the character set settings for a table column, the key is that each column can be set to a different set of characters in the same table
Know how to view the character set, below me how to set these character sets
1. Modify server-level
A. Temporary changes:
Mysql>set GLOBAL Character_set_server=utf8;
B. Permanent changes:
Modify the My.cnf file
[Mysqld]
Character-set-server=utf8
2. Modify the database level
A. Temporary changes:
Mysql>set GLOBAL Character_set_database=utf8;
B. Permanent changes:
You can change the server level.
3. Modify the table level
Mysql>alter TABLE table_name DEFAULT CHARSET UTF8;
Changed after permanent effect
4. Modifying column Levels
To modify an example:
Mysql>alter table ' products ' change ' products_model ' products_model ' varchar (20)
Character Set UTF8 collate utf8_general_ci null default null;
Changed after permanent effect
5. Change the connection character set
A. Temporary changes:
mysql> set names UTF8;
B. Permanent changes:
Modify the My.cnf file
Add in [client]:
Default-character-set=utf8
The path to the information when executing the SQL statement is this
Information input path: client→connection→server;
Information output path: Server→connection→results. Four, the character set conversion process in MySQL database
1. When the MySQL server receives the request, it converts the request data from character_set_client to character_set_connection;
2, before the internal operation of the request data from character_set_connection to the internal operation character set, the method is determined as follows:
Use the character set set value for each data field;
If the above value does not exist, the default CHARACTER set value of the corresponding data table is used (MySQL extension, non-SQL standard);
If the above value does not exist, the default CHARACTER set value of the corresponding database is used;
If the above value does not exist, use Character_set_server to set the value.
3. Convert the operation result from the internal operation character set to Character_set_results.
Derived from the plot of character set conversions in high-performance MySQL:
Five, MySQL database garbled reason analysis and case
1, the root cause of garbled characters
1) The client does not correctly set the client character set, which causes the original SQL statement to be converted to the connection character set, and this conversion is lost information, if the client is UTF8 format, then if the conversion to gb2312 format, which will inevitably lose information, Conversely, it is not lost. Make sure that the connection character set is larger than the client character set to guarantee that the conversion does not lose information.
2) The database font is not set correctly, if the database font is set incorrectly, then the connection character set conversion to the database character set is lost encoding, the same reason as above. 2, garbled or data loss
Character_set_client: What encoding do we want to tell the server that I sent you the data?
Character_set_connection: Tell the character set converter what encoding to convert to?
Character_set_results: What code is used for the results of the query?
If the above three are character set N, can be abbreviated as set names ' n '; 2.1 garbled problem
Simulation Scenario 1:
When you insert UTF8 encoded data into a data table that has the default character set of UTF8, the connection character set is Latin1, and the connection character set is UTF8 when you query.
The Character_set_client, character_set_connection, and Character_set_results are all latin1, depending on the default settings of the MySQL server when inserting;
The data for the insert operation passes through the LATIN1=>LATIN1=>UTF8 character set conversion process, in which each inserted character is saved from the original 3 bytes into 6 bytes;
The results of the query will pass the UTF8=>UTF8 character set conversion process, the saved 6 bytes are returned intact, resulting in garbled ...
For example:
mysql> set names latin1;
Mysql> CREATE TABLE Temp (name varchar) charset UTF8;
mysql> INSERT into temp values (' China ');
Mysql> SELECT * from temp;
+--------+
| name |
+--------+
| China |
+--------+
mysql> set names UTF8;
Mysql> SELECT * from temp;
+---------------+
| name |
+---------------+
| ?? ??? |
+---------------+
Note: The storage character set encoding is larger than the character set when inserting, if the returned data is garbled, but can be modified by modifying the query character set to avoid garbled, that is, no data loss. 2.2 Data loss issues
Simulation Scenario 1:
The connection character set is UTF8 before inserting UTF8 encoded data into a data table with the default character set of Latin1
The Character_set_client, character_set_connection, and character_set_results are all UTF8 when inserted according to the connection character set;
The inserted data will be converted to a utf8=>utf8=>latin1 character set, and if the original data contains Unicode characters outside the \U0000~\U00FF range, it will be converted to "?" because it cannot be represented in the latin1 character set. (0x3f) symbol, which cannot be restored at a later time, regardless of the connection character set settings.
For example:
mysql> set names UTF8;
Mysql> CREATE TABLE Temp (name varchar) charset latin1;
mysql> INSERT into temp values (' China ');
Mysql> SELECT * from temp;
+------+
| name |
+------+
| ?? |
+------+
mysql> set names latin1;
Mysql> SELECT * from temp;
+------+
| name |
+------+
| ?? |
+------+
The data is incomplete and cannot be recovered. 3, garbled the ultimate solution
1) First of all to clarify what encoding format your client, this is the most important (IE6 General with UTF8, command line is generally GBK, general program is gb2312)
2) Make sure your database uses the UTF8 format, very simple, all coding-all.
3) Make sure that the connection character set is greater than or equal to the client character set, otherwise the information will be lost, such as: Latin1 < gb2312 < GBK < UTF8, if set character_set_client = GB2312, then at least the connection character set is greater than or equal to gb2312, otherwise the information will be lost
4) The above three steps do the correct words, then all Chinese is correctly converted into UTF8 format stored in the database, in order to adapt to different browsers, different clients, you can modify the Character_set_results to display the Chinese font in different encodings, because UTF8 is the general direction, So the web app is I still prefer to use the UTF8 format to display Chinese.
MySQL Character set