In-depth MySQL Character Set Analysis

Basic Concepts
• A character is the smallest meaning-bearing symbol in human language, such as 'A' or 'B';
• Given a series of characters, assigning a numeric value to each one and using that value to represent the character gives the character's encoding. For example, if we assign the character 'A' the value 0 and the character 'B' the value 1, then 0 is the encoding of 'A';
• Given a series of characters and their encodings, the set of all these character-encoding pairs is a character set. For example, when the characters are {'A', 'B'}, {'A' => 0, 'B' => 1} is a character set;
• A collation is the set of rules for comparing characters within the same character set;
• Once a collation is fixed, it defines which characters in the character set are equivalent and how characters are ordered relative to each other;
• Each collation corresponds to exactly one character set, but a character set can have several collations, one of which is the default collation;
• Collation names in MySQL follow a naming convention: they start with the name of the corresponding character set and end with _ci (case insensitive), _cs (case sensitive), or _bin (comparison by encoded value). For example, under the collation 'utf8_general_ci', the characters 'a' and 'A' are equivalent;
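As a rough illustration of the definitions above, the following Python sketch models a toy character set as a mapping from characters to numeric codes, and models two collations as key functions. The names TOY_CHARSET, encode, key_bin, key_ci and equal_under are hypothetical and exist only in this example; the sketch does not reproduce how MySQL implements collations.

```python
# Toy character set: each character is paired with a numeric encoding.
# The four-character repertoire is an assumption made for illustration.
TOY_CHARSET = {"A": 0, "B": 1, "a": 2, "b": 3}


def encode(text):
    """Return the list of numeric codes for text under TOY_CHARSET."""
    return [TOY_CHARSET[ch] for ch in text]


def key_bin(text):
    """Binary-style collation key: compare by encoded value."""
    return encode(text)


def key_ci(text):
    """Case-insensitive-style collation key: fold case, then encode."""
    return encode(text.upper())


def equal_under(key, left, right):
    """Two strings are equivalent under a collation if their keys match."""
    return key(left) == key(right)


if __name__ == "__main__":
    print(encode("AB"))                     # [0, 1]
    print(equal_under(key_ci, "a", "A"))    # True
    print(equal_under(key_bin, "a", "A"))   # False
    print(sorted(["b", "A", "a", "B"], key=key_bin))
```

The same character set supports both key functions, which mirrors the point that one character set can have several collations.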
MySQL Character Set Settings
• System variables:
– character_set_server: default character set for internal operations
– character_set_client: character set of the data sent by the client
– character_set_connection: character set of the connection layer
– character_set_results: character set of query results
– character_set_database: default character set of the currently selected database
– character_set_system: character set of system metadata (field names, etc.)
– There are also corresponding variables beginning with collation_ that describe the collations.
• Use an introducer to specify the character set of a string literal:
– Format: [_charset] 'string' [COLLATE collation]
– Examples:
SELECT _latin1 'string';
SELECT _utf8 'Hello' COLLATE utf8_general_ci;
– A string literal marked with an introducer skips the extra transcoding in the request process and is converted directly to the internal character set for processing.
The Character Set Conversion Process in MySQL
1. When the MySQL server receives a request, it converts the request data from character_set_client to character_set_connection;
2. Before performing the internal operation, it converts the request data from character_set_connection to the internal operation character set, which is determined as follows:
• Use the CHARACTER SET value of each data field;
• If that value does not exist, use the DEFAULT CHARACTER SET value of the corresponding table (a MySQL extension, not part of the SQL standard);
• If that value does not exist, use the DEFAULT CHARACTER SET value of the corresponding database;
• If that value does not exist, use the character_set_server value.
3. It converts the result of the operation from the internal operation character set to character_set_results.
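The three steps above can be sketched in Python as a small model. This is only an approximation under stated assumptions: the function names resolve_internal_charset, transcode and handle_request are hypothetical, Python's "latin1" and "utf8" codecs stand in for the MySQL character sets of the same name, and unrepresentable characters are replaced with "?".

```python
def resolve_internal_charset(column=None, table=None, database=None,
                             server="latin1"):
    """Pick the internal operation character set using the fallback order
    column -> table default -> database default -> character_set_server."""
    for candidate in (column, table, database):
        if candidate is not None:
            return candidate
    return server


def transcode(data, source, target):
    """Interpret bytes as `source` and re-encode them as `target`.
    Characters that `target` cannot represent become '?'."""
    return data.decode(source).encode(target, errors="replace")


def handle_request(data, client, connection, internal, results):
    """Apply the three conversion steps and return (stored, returned)."""
    step1 = transcode(data, client, connection)       # client -> connection
    stored = transcode(step1, connection, internal)   # connection -> internal
    returned = transcode(stored, internal, results)   # internal -> results
    return stored, returned


if __name__ == "__main__":
    # No column-level setting, so the table default wins over the database.
    internal = resolve_internal_charset(table="utf8", database="latin1")
    print(internal)  # utf8
    stored, returned = handle_request("é".encode("utf8"),
                                      "utf8", "utf8", "latin1", "utf8")
    print(stored, returned)
```

Passing the character sets in explicitly keeps the model independent of any server state, which makes it easy to replay the scenarios discussed in this article.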


Analysis of Common Problems
• UTF-8 encoded data is inserted into a table whose default character set is utf8 without first setting the connection character set, and the connection character set is set to utf8 when querying

– During the insert, following the MySQL server's default settings, character_set_client, character_set_connection, and character_set_results are all latin1;

– The inserted data goes through the latin1 => latin1 => utf8 character set conversion, during which each inserted Chinese character grows from its original 3 bytes to 6 bytes when saved;

– The query result goes through the utf8 => utf8 character set conversion, so the saved 6 bytes are returned unchanged and show up as garbled text.

• UTF-8 encoded data is inserted into a table whose default character set is latin1 after the connection character set has been set to utf8

– During the insert, following the connection character set setting, character_set_client, character_set_connection, and character_set_results are all utf8;

– The inserted data goes through the utf8 => utf8 => latin1 character set conversion. If the original data contains Unicode characters outside the U+0000 to U+00FF range, they are converted to "?" (0x3F) because they cannot be represented in the latin1 character set, and they cannot be recovered by later queries regardless of the connection character set setting.
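Both problems above can be reproduced at the byte level without a database. The Python sketch below is only a model: it assumes Python's "latin1" and "utf8" codecs behave like the MySQL character sets of the same name for this one character, and the variable names are made up for the example.

```python
original = "中"                        # one Chinese character
utf8_bytes = original.encode("utf8")   # 3 bytes: e4 b8 ad

# Problem 1: the connection is latin1 but the table is utf8.
# Each of the 3 bytes is taken to be one latin1 character and is
# re-encoded as 2 bytes of utf8 when stored.
stored_1 = utf8_bytes.decode("latin1").encode("utf8")

# A utf8 connection gets the 6 stored bytes back unchanged, which the
# client displays as three unrelated characters.
displayed_1 = stored_1.decode("utf8")

# Problem 2: the connection is utf8 but the table is latin1.
# The character has no latin1 representation and becomes "?".
stored_2 = original.encode("latin1", errors="replace")

print(len(utf8_bytes), len(stored_1))   # 3 6
print(displayed_1 == original)          # False
print(stored_2)                         # b'?'
```

In this sketch the first case still holds all the original information (reversing the two steps yields the original bytes), while the second case has replaced the character with 0x3F and nothing remains to recover.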

Some Means of Detecting Character Set Problems
• SHOW CHARACTER SET;

• SHOW COLLATION;

• SHOW VARIABLES LIKE 'character%';

• SHOW VARIABLES LIKE 'collation%';

• SQL functions HEX, LENGTH, CHAR_LENGTH

• SQL functions CHARSET, COLLATION
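To get a feel for what HEX, LENGTH and CHAR_LENGTH reveal, the following Python sketch provides rough analogues that work on a string and an assumed character set. The helper names hex_of, byte_length, char_length and looks_double_encoded are hypothetical, and the last one is only a heuristic for the "3 bytes saved as 6 bytes" symptom, not a MySQL feature.

```python
def hex_of(text, charset):
    """Uppercase hex dump of text encoded in charset (similar to HEX)."""
    return text.encode(charset).hex().upper()


def byte_length(text, charset):
    """Length in bytes (similar to LENGTH)."""
    return len(text.encode(charset))


def char_length(text):
    """Length in characters (similar to CHAR_LENGTH)."""
    return len(text)


def looks_double_encoded(stored):
    """Heuristic: True if the bytes are valid utf8, fit into latin1, and
    the resulting latin1 bytes are again valid utf8 with non-ASCII text."""
    try:
        inner = stored.decode("utf8").encode("latin1")
        recovered = inner.decode("utf8")
    except UnicodeError:
        return False
    return inner != stored and any(ord(ch) > 0x7F for ch in recovered)


if __name__ == "__main__":
    print(hex_of("中", "utf8"))        # E4B8AD
    print(byte_length("中", "utf8"))   # 3
    print(char_length("中"))           # 1
    print(looks_double_encoded(b"\xc3\xa4\xc2\xb8\xc2\xad"))  # True
```

A byte length that is about twice what the character count suggests, together with a hex dump full of C3 and C2 lead bytes, is a typical sign that data was inserted over a latin1 connection.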

Recommendations for Using MySQL Character Sets
• When creating databases and tables and when performing database operations, explicitly specify the character set wherever possible instead of relying on MySQL's default settings; otherwise a MySQL upgrade may cause considerable trouble;

• Using latin1 for both the database and the connection character set avoids garbled text in most cases, but the drawback is that SQL operations can no longer work in units of characters. In general, using utf8 for both the database and the connection character set is the better choice;

• When using the MySQL C API, immediately after initializing the database handle, use mysql_options to set the MYSQL_SET_CHARSET_NAME option to utf8. This removes the need to specify the connection character set explicitly with a SET NAMES statement, and the connection character set is also reset to utf8 when mysql_ping reconnects a dropped long-lived connection;

• With the MySQL PHP API, page-level PHP programs usually have a short total run time, so explicitly setting the connection character set with a SET NAMES statement after connecting to the database is enough. When using long-lived connections, however, take care to keep the connection alive, and after reconnecting a dropped connection, explicitly reset the connection character set with a SET NAMES statement.
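The drawback of the all-latin1 setup mentioned in the second recommendation can be shown with a short Python sketch. It assumes that a latin1 column holding UTF-8 bytes treats every byte as one character, which Python's "latin1" codec models; the helper is_valid_utf8 and the variable names are made up for this example.

```python
def is_valid_utf8(data):
    """Return True if the bytes form a complete, valid utf8 sequence."""
    try:
        data.decode("utf8")
        return True
    except UnicodeDecodeError:
        return False


text = "中文字符"                   # 4 characters
raw = text.encode("utf8")           # 12 bytes

as_latin1 = raw.decode("latin1")    # latin1 view: one character per byte
as_utf8 = raw.decode("utf8")        # utf8 view: real characters

# Taking the "first 2 characters" gives very different results.
prefix_latin1 = as_latin1[:2].encode("latin1")   # 2 bytes, cut mid-character
prefix_utf8 = as_utf8[:2]                         # 2 whole characters

print(len(as_latin1), len(as_utf8))    # 12 4
print(is_valid_utf8(prefix_latin1))    # False
print(prefix_utf8)                     # 中文
```

Length checks, substring operations and column width limits all behave like the latin1 view when the data is stored that way, which is why utf8 on both sides is the safer default.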

Other Considerations
• The default_character_set setting in my.cnf only affects the connection character set used when the mysql command-line client connects to the server; it has no effect on applications that use the libmysqlclient library!

• SQL function operations on fields are usually performed in the internal operation character set and are not affected by the connection character set setting;

• Bare string literals in SQL statements are affected by the connection character set or by an introducer, which can produce completely different results for operations such as comparisons, so be careful!
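The last point can be made concrete with a small Python model: the same two byte strings compare as equal or unequal depending on which character set they are interpreted in. The function equal_ci is a hypothetical stand-in for a case-insensitive (_ci) comparison and is not MySQL's actual collation logic.

```python
def equal_ci(left, right, charset):
    """Compare two byte strings case-insensitively after interpreting
    them in the given character set."""
    return left.decode(charset).casefold() == right.decode(charset).casefold()


upper = "É".encode("utf8")   # bytes c3 89
lower = "é".encode("utf8")   # bytes c3 a9

# Interpreted as utf8, the two literals are one letter in two cases.
print(equal_ci(upper, lower, "utf8"))     # True

# Interpreted as latin1, each literal is two unrelated characters.
print(equal_ci(upper, lower, "latin1"))   # False
```

The bytes sent by the client are identical in both calls; only the declared character set changes, which is the effect the connection character set or an introducer has on a bare string literal.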