MySQL Character Set Settings in Depth

Source: Internet
Author: User

Basic Concepts
• A character is the smallest unit of meaning in written human language, such as 'A' or 'B';
• Given a series of characters, assigning a numeric value to each one and using that value to represent the character gives the character's encoding. For example, if we assign 0 to the character 'A' and 1 to the character 'B', then 0 is the encoding of 'A';
• Given a series of characters and their encodings, the set of all these character-encoding pairs is a character set. For example, for the characters {'A', 'B'}, {'A' => 0, 'B' => 1} is a character set;
• A collation is the set of rules for comparing characters within the same character set;
• Only once a collation is fixed can we define which characters of a character set are equivalent and how characters are ordered;
• Each collation corresponds to exactly one character set, but a character set can have several collations, one of which is its default collation;
• Collation names in MySQL follow a naming convention: they start with the name of the character set the collation belongs to and end with _ci (case insensitive), _cs (case sensitive), or _bin (comparison by encoded value). For example, under the collation utf8_general_ci, the characters 'a' and 'A' are equivalent;
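The concepts above can be illustrated with a small Python sketch. It is only a toy model: `TOY_CHARSET`, `encode`, `equal_ci` and `equal_bin` are hypothetical names introduced here, and real MySQL collations apply more rules than simple case folding or byte comparison.

```python
# Toy character set from the text: each character is paired with a numeric code.
TOY_CHARSET = {"A": 0, "B": 1}


def encode(text, charset=TOY_CHARSET):
    """Map each character to its code; raises KeyError for characters outside the set."""
    return [charset[ch] for ch in text]


def equal_ci(a, b):
    """Hypothetical case-insensitive comparison, loosely modelled on a *_ci collation."""
    return a.casefold() == b.casefold()


def equal_bin(a, b, encoding="utf-8"):
    """Hypothetical comparison by encoded value, loosely modelled on a *_bin collation."""
    return a.encode(encoding) == b.encode(encoding)


print(encode("ABBA"))       # [0, 1, 1, 0]
print(equal_ci("a", "A"))   # True
print(equal_bin("a", "A"))  # False
```

The same pair of strings is equal under one comparison rule and different under the other, which is why a collation has to be chosen before equality or ordering means anything.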
MySQL Character Set Settings
• System variables:
– character_set_server: default character set for internal operations
– character_set_client: character set of the data sent by the client
– character_set_connection: character set of the connection layer
– character_set_results: character set of query results
– character_set_database: default character set of the currently selected database
– character_set_system: character set of system metadata (column names and so on)
– There are also corresponding variables prefixed with collation_ that describe the collations.
• Use an introducer to specify the character set of a string literal:
– Format: [_charset] 'string' [COLLATE collation]
– For example:
SELECT _latin1 'string';
SELECT _utf8 'Hello' COLLATE utf8_general_ci;
– A string literal qualified by an introducer skips the extra transcoding of the request path and is converted directly to the internal character set for processing.
Character Set Conversion in MySQL
1. When the MySQL server receives a request, it converts the request data from character_set_client to character_set_connection;
2. Before the internal operation, the request data is converted from character_set_connection to the internal operation character set, which is determined as follows:
• Use the CHARACTER SET value of each column;
• If that is not set, use the DEFAULT CHARACTER SET value of the table (a MySQL extension, not standard SQL);
• If that is not set, use the DEFAULT CHARACTER SET value of the database;
• If that is not set, use the value of character_set_server.
3. The result of the operation is converted from the internal operation character set to character_set_results.
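The three steps above can be sketched in Python. This is a rough model under stated assumptions, not MySQL's implementation: `resolve_internal_charset`, `convert` and `handle_request` are hypothetical helpers, Python codecs stand in for MySQL's converters, and the `"latin1"` server default is only an example value.

```python
def resolve_internal_charset(column=None, table=None, database=None, server="latin1"):
    """Return the first character set that is defined: column, table, database, server."""
    for candidate in (column, table, database):
        if candidate is not None:
            return candidate
    return server


def convert(data, source, target):
    """Re-encode bytes from one character set to another (Python codec names)."""
    return data.decode(source).encode(target)


def handle_request(raw, client, connection, internal, results):
    """Follow steps 1-3: client -> connection -> internal, then internal -> results."""
    in_connection = convert(raw, client, connection)       # step 1
    stored = convert(in_connection, connection, internal)  # step 2
    returned = convert(stored, internal, results)          # step 3
    return stored, returned


# The table default wins over the database default when the column sets nothing.
internal = resolve_internal_charset(column=None, table="utf8", database="latin1")
stored, returned = handle_request("中".encode("gbk"), "gbk", "gbk", internal, "gbk")
print(internal)          # utf8
print(stored.hex())      # e4b8ad
print(returned.hex())    # d6d0
```

When every declared character set matches the real data, the text survives the round trip even though its byte representation changes along the way.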


Analysis of Common Problems
• UTF-8 encoded data is inserted into a table whose default character set is utf8 without setting the connection character set first, and the connection character set is set to utf8 when querying

– At insert time, with the MySQL server's default settings, character_set_client, character_set_connection and character_set_results are all latin1;

– The inserted data goes through the latin1 => latin1 => utf8 conversion, during which each inserted Chinese character grows from its original 3 bytes to 6 bytes as stored;

– The query result goes through the utf8 => utf8 conversion, so the stored 6 bytes are returned unchanged and the output is garbled.
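The growth from 3 to 6 bytes can be reproduced with Python's codecs. This is a sketch under an assumption: Python's `latin-1` codec is used as a stand-in for MySQL's latin1, which is close enough for this character but not identical in general.

```python
original = "中"                      # one Chinese character
sent = original.encode("utf-8")      # what the client really sends: 3 bytes

# Insert path latin1 => latin1 => utf8: the bytes are read as latin1 text,
# and that text is then stored as utf8.
as_latin1_text = sent.decode("latin-1")
stored = as_latin1_text.encode("utf-8")

# Query path utf8 => utf8: the stored bytes come back unchanged.
shown = stored.decode("utf-8")

print(len(sent), len(stored))        # 3 6
print(shown == original)             # False

# In this sketch the damage is reversible by undoing the mistaken step.
repaired = shown.encode("latin-1").decode("utf-8")
print(repaired == original)          # True
```

Each of the three original bytes is above 0x7F, so each becomes a two-byte utf8 sequence, which is where the 6 bytes come from.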

• The connection character set is set to utf8 before UTF-8 encoded data is inserted into a table whose default character set is latin1

– At insert time, following the connection character set setting, character_set_client, character_set_connection and character_set_results are all utf8;

– The inserted data goes through the utf8 => utf8 => latin1 conversion. Any Unicode character in the original data outside the range U+0000 to U+00FF cannot be represented in latin1 and is converted to "?" (0x3F); it cannot be recovered by later queries, whatever the connection character set.
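The irreversible replacement with "?" can be sketched the same way. The assumption here is that Python's `errors="replace"` behaviour approximates what the server does when a character has no latin1 representation.

```python
text = "café 中文"

# utf8 => latin1: 'é' (U+00E9) fits into latin1, the two Chinese characters do not.
stored = text.encode("latin-1", errors="replace")
print(stored)                    # b'caf\xe9 ??'

# Reading the stored bytes back cannot restore what was replaced.
restored = stored.decode("latin-1")
print(restored)                  # café ??
print(restored == text)          # False
```

Unlike the double-encoding case, the original bytes are gone once they have been replaced by 0x3F, so no later conversion can bring them back.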

Ways to Diagnose Character Set Problems
• SHOW CHARACTER SET;
• SHOW COLLATION;
• SHOW VARIABLES LIKE 'character%';
• SHOW VARIABLES LIKE 'collation%';
• SQL functions HEX, LENGTH, CHAR_LENGTH
• SQL functions CHARSET, COLLATION
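To show what HEX, LENGTH and CHAR_LENGTH reveal, here is a rough Python analogue. `describe` is a hypothetical helper, not a MySQL function: it reports the hex dump, the byte length and the character length of a string in a given encoding.

```python
def describe(text, encoding):
    """Rough analogue of HEX(), LENGTH() and CHAR_LENGTH() for one value."""
    data = text.encode(encoding)
    return {"hex": data.hex().upper(), "length": len(data), "char_length": len(text)}


print(describe("中", "utf-8"))   # {'hex': 'E4B8AD', 'length': 3, 'char_length': 1}
print(describe("中", "gbk"))     # {'hex': 'D6D0', 'length': 2, 'char_length': 1}

# A value that was double-encoded on insert shows up as 3 characters and 6 bytes.
double_encoded = "中".encode("utf-8").decode("latin-1")
print(describe(double_encoded, "utf-8"))
# {'hex': 'C3A4C2B8C2AD', 'length': 6, 'char_length': 3}
```

Comparing the byte length with the character length, and checking the hex dump against the expected encoding, is usually enough to tell a correctly stored value from a mangled one.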

Recommendations for Using MySQL Character Sets
• When creating databases and tables and when performing database operations, specify the character set explicitly wherever possible instead of relying on MySQL's defaults; otherwise a MySQL upgrade can cause serious trouble;

• Using latin1 for both the database and the connection avoids garbled text in most cases, but the drawback is that SQL operations cannot work in units of characters. In general, setting both the database and the connection character set to utf8 is the better choice;

• With the MySQL C API, call mysql_options right after initializing the database handle to set the MYSQL_SET_CHARSET_NAME option to utf8. You then do not need an explicit SET NAMES statement to specify the connection character set, and when mysql_ping reconnects a dropped long-lived connection, the connection character set is reset to utf8 as well;

• With the MySQL PHP API, a page-level PHP program usually runs only briefly, so setting the connection character set once with an explicit SET NAMES statement after connecting is enough. With long-lived connections, however, take care to keep the connection alive and reissue SET NAMES explicitly after any reconnect.

Other Considerations
• The default_character_set setting in my.cnf only affects the connection character set used when the mysql command connects to the server; it has no effect on applications that use the libmysqlclient library!

• SQL functions applied to columns usually operate in the internal operation character set and are not affected by the connection character set setting.

• Bare string literals in SQL statements are affected by the connection character set or by an introducer, so operations such as comparisons can produce completely different results. Take care!

There is a character set converter between the MySQL client and the MySQL server.
character_set_client => GBK: tells the converter that the client sends data in GBK format.
character_set_connection => GBK: data received from the client is converted to GBK format.
character_set_results => GBK: results are returned to the client in GBK format.
Note: all three variables can be set at once with SET NAMES gbk.
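As a toy illustration of the note above, the effect of SET NAMES on the session can be modelled with a dictionary. `set_names` and the `session` dictionary are hypothetical; the sketch only shows which variables change, and it ignores the collation variables.

```python
def set_names(session, charset):
    """Toy model of SET NAMES: assign one charset to the three connection variables."""
    updated = dict(session)
    for name in ("character_set_client",
                 "character_set_connection",
                 "character_set_results"):
        updated[name] = charset
    return updated


session = {
    "character_set_client": "latin1",
    "character_set_connection": "latin1",
    "character_set_results": "latin1",
    "character_set_server": "utf8",
}
after = set_names(session, "gbk")
print(after["character_set_client"], after["character_set_server"])  # gbk utf8
```

The server-side setting is untouched, which matches the description of the three variables as properties of the client connection.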
Example:
CREATE TABLE test (
    name varchar(32) NOT NULL    # the length is missing in the source; 32 is an assumed value
) CHARSET utf8;                  # utf8 here is the character encoding used on the server side
First, insert a row into the table test:
INSERT INTO test VALUES ('test');
The data 'test' is then saved in the database in utf8 format.
Process:

First, the MySQL client sends the data to the MySQL server. The data passes through the character set converter; because character_set_connection is GBK, the data from the client is converted to GBK format. Next, when the converter hands the data to the server, it finds that the server stores the data as utf8, so the data is automatically converted from GBK to utf8 internally.
When does garbled text occur?
1. The client's data format does not match the declared character_set_client

With header('Content-type: text/html; charset=utf8'), the client's data is in UTF-8 format. When the data passes through the character set converter, character_set_client is GBK and character_set_connection is also GBK, so the data sent from the client (which is actually UTF-8) is not converted at all.
However, when the converter passes the data to the server, it finds that the server expects utf8, so it treats the current data as GBK and converts it to utf8 (but this step is actually wrong ...).

2. The result format does not match the client page

The returned result is set to utf8 but the client expects GBK, so the text is garbled.
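The mismatch described above can be sketched in Python. `store_as_utf8` is a hypothetical helper that interprets the incoming bytes according to the declared client character set and then stores them as utf8; Python's codecs approximate the server's behaviour here.

```python
def store_as_utf8(raw, declared_client):
    """Interpret raw bytes as the declared client charset, then store them as utf8."""
    return raw.decode(declared_client, errors="replace").encode("utf-8")


text = "测试"
actual_bytes = text.encode("utf-8")      # the page really sends UTF-8

# Correct declaration: the stored bytes still represent the original text.
stored_ok = store_as_utf8(actual_bytes, "utf-8")
print(stored_ok.decode("utf-8") == text)     # True

# Wrong declaration (GBK): the bytes are misread before being stored.
stored_bad = store_as_utf8(actual_bytes, "gbk")
print(stored_bad.decode("utf-8") == text)    # False
```

The conversion itself works as designed; the stored value is wrong only because the declared character set did not describe the bytes that were actually sent.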
The SHOW CHARACTER SET statement lists all available character sets.
[Figure: Latin character set]
Note: The Maxlen column shows the maximum number of bytes used to store one character.
[Figure: UTF8 character set]
[Figure: GBK character set]


When is data lost?
Comparing the three figures above shows that each character set has its own maximum number of bytes for storing a single character: utf8 has the largest and latin1 the smallest. Therefore, if the character set converter is configured improperly, data can be lost and cannot be recovered.
For example:
When you change the value of character_set_connection to latin1:

GBK data sent from the client is converted to latin1 format. Because GBK data takes up more bytes per character than latin1 can hold, data is lost.

Summary:
    character_set_client and character_set_results are generally kept the same, because one describes the format of the data sent by the client and the other the format the client accepts. To avoid data loss, the character set used for character_set_connection should be no smaller than the one used for character_set_client, that is, it should be able to represent every character the client sends.
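A short Python sketch makes the byte widths and the resulting loss concrete. `byte_width` is a hypothetical helper, and Python's codecs are used as stand-ins for the MySQL character sets of the same name.

```python
def byte_width(ch, encoding):
    """Bytes needed for one character, or None if the encoding cannot represent it."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


for encoding in ("latin-1", "gbk", "utf-8"):
    print(encoding, byte_width("A", encoding), byte_width("中", encoding))
# latin-1 1 None
# gbk 1 2
# utf-8 1 3

# GBK client data passing through a latin1 connection: the character is lost.
from_client = "中".encode("gbk")
in_connection = from_client.decode("gbk").encode("latin-1", errors="replace")
print(in_connection)   # b'?'
```

A single-byte character set simply has no slot for a character that needs two or three bytes elsewhere, so the converter has nothing to map it to.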
