[MySQL] A Deep Dive into MySQL Character Set Settings [reprint]

Source: Internet
Author: User

http://www.laruence.com/2008/01/05/12.html

Basic Concepts

• A character is the smallest meaningful symbol in a human language, for example 'A' or 'B';

• Given a series of characters, each one is assigned a numeric value that represents it; this value is the character's encoding. For example, if we assign the character 'A' the value 0 and the character 'B' the value 1, then 0 is the encoding of 'A';

• Given a series of characters and their assigned encodings, the set of all these character-encoding pairs is a character set. For example, given the character list {'A', 'B'}, {'A' => 0, 'B' => 1} is a character set;

• A collation is the set of comparison rules between characters within the same character set;

• Only once a collation has been chosen can equality and ordering between the characters of a character set be defined;

• Each collation corresponds to exactly one character set, but a character set can have several collations, one of which is its default collation;

• Collation names in MySQL follow a naming convention: they start with the name of the corresponding character set and end with _ci (case insensitive), _cs (case sensitive), or _bin (comparison by encoded values). For example, under the collation "utf8_general_ci", the characters "a" and "A" are equivalent;
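The concepts above can be illustrated with a small Python sketch. It is not MySQL code: `TOY_CHARSET`, `encode`, `collate_ci`, `collate_bin`, and `equal_under` are hypothetical names, and `str.casefold` is only a rough stand-in for what a real _ci collation does.

```python
# Toy character set from the text: each character paired with its encoding.
TOY_CHARSET = {"A": 0, "B": 1}


def encode(text, charset=TOY_CHARSET):
    """Map each character to its numeric encoding in the given character set."""
    return [charset[ch] for ch in text]


# A collation is modelled here as a key function that decides
# equality and ordering between strings.
def collate_bin(s):
    """Compare by encoded values (the idea behind a _bin collation)."""
    return s.encode("utf8")


def collate_ci(s):
    """Compare while ignoring case (the idea behind a _ci collation)."""
    return s.casefold()


def equal_under(collation, left, right):
    """Return True if the two strings are equivalent under the collation."""
    return collation(left) == collation(right)


print(encode("AB"))                         # [0, 1]
print(equal_under(collate_ci, "a", "A"))    # True
print(equal_under(collate_bin, "a", "A"))   # False
print(sorted(["b", "A", "a", "B"], key=collate_bin))  # ['A', 'B', 'a', 'b']
```

The same strings compare equal or unequal depending only on the chosen collation, which is why one character set can carry several collations.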

MySQL Character Set Settings

• System variables:

– character_set_server: the default character set for internal operations

– character_set_client: the character set of the data sent by the client

– character_set_connection: the connection-layer character set

– character_set_results: the character set of query results

– character_set_database: the default character set of the currently selected database

– character_set_system: the character set of system metadata (column names, etc.)

– There are also corresponding variables beginning with collation_ that describe collations.

• Specifying the character set of a string literal with an introducer:

– Format: [_charset] 'string' [COLLATE collation]

– For example:

SELECT _latin1 'string';

SELECT _utf8 'Hello' COLLATE utf8_general_ci;

– A string literal qualified by an introducer is converted directly to the internal character set when the request is processed, without the additional transcoding steps.
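An introducer declares which character set the bytes of a literal are in. The Python sketch below is only an analogy, not MySQL behaviour: it shows the same two bytes read under two declared character sets. The helper name `interpret` is hypothetical, and Python's `latin1` codec (ISO-8859-1) is assumed as a stand-in for MySQL's latin1.

```python
def interpret(raw: bytes, declared_charset: str) -> str:
    """Decode the raw bytes of a literal using the declared character set."""
    return raw.decode(declared_charset)


raw = b"\xc3\xa9"  # the UTF-8 encoding of U+00E9 (e with acute accent)

as_utf8 = interpret(raw, "utf8")      # one character: U+00E9
as_latin1 = interpret(raw, "latin1")  # two characters: U+00C3, U+00A9

print(len(as_utf8), len(as_latin1))   # 1 2
```

The bytes never change; only the declared character set decides how many characters they represent.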

The Character Set Conversion Process in MySQL

1. When the MySQL server receives a request, it converts the request data from character_set_client to character_set_connection;

2. Before performing the internal operation, it converts the request data from character_set_connection to the internal operation character set, which is determined as follows:

• Use the CHARACTER SET value defined for each column;

• If that value does not exist, use the DEFAULT CHARACTER SET value of the corresponding table (a MySQL extension, not part of the SQL standard);

• If that value does not exist, use the DEFAULT CHARACTER SET value of the corresponding database;

• If that value does not exist, use the value of character_set_server.

3. It converts the operation result from the internal operation character set to character_set_results.
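The three steps and the fallback order can be sketched in Python. This is a simulation under stated assumptions, not server code: `resolve_internal_charset`, `transcode`, and `run_request` are hypothetical names, Python codecs stand in for MySQL character sets, and unmappable characters are replaced with "?".

```python
def resolve_internal_charset(column=None, table=None, database=None,
                             server="latin1"):
    """Return the first defined charset: column, table, database, then server."""
    for candidate in (column, table, database):
        if candidate is not None:
            return candidate
    return server


def transcode(data: bytes, source: str, target: str) -> bytes:
    """Decode as `source`, re-encode as `target`; unmappable chars become '?'."""
    return data.decode(source).encode(target, errors="replace")


def run_request(data, client, connection, internal, results):
    """Follow the three conversion steps and return (stored, returned) bytes."""
    at_connection = transcode(data, client, connection)       # step 1
    stored = transcode(at_connection, connection, internal)   # step 2
    returned = transcode(stored, internal, results)           # step 3
    return stored, returned


internal = resolve_internal_charset(table="latin1", database="utf8")
stored, returned = run_request("\u00e9".encode("utf8"),
                               "utf8", "utf8", internal, "utf8")
print(internal, stored.hex(), returned.hex())  # latin1 e9 c3a9
```

Here the table-level setting wins over the database-level one, the value is stored as a single latin1 byte, and it comes back to the client as two UTF-8 bytes.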

Analysis of Common Problems

• The connection character set was not set before inserting utf8-encoded data into a table whose default character set is utf8, but it is set to utf8 when querying

– At insert time, following the MySQL server's default settings (latin1 in versions before MySQL 8.0), character_set_client, character_set_connection, and character_set_results are all latin1;

– The inserted data goes through the latin1=>latin1=>utf8 conversion, during which each inserted Chinese character grows from its original 3 bytes to 6 bytes when stored;

– At query time, the result goes through the utf8=>utf8 conversion, which returns the stored 6 bytes unchanged and produces garbled text.

• The connection character set was set to utf8 before inserting utf8-encoded data into a table whose default character set is latin1

– At insert time, following the connection character set setting, character_set_client, character_set_connection, and character_set_results are all utf8;

– The inserted data goes through the utf8=>utf8=>latin1 conversion. If the original data contains Unicode characters outside the \u0000~\u00FF range, they are converted to the "?" (0x3F) symbol because they cannot be represented in latin1. The original content cannot be recovered by later queries, whatever the connection character set is set to.
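Both cases can be reproduced at the byte level with a short Python simulation. The helper `convert` is a hypothetical name, and Python's `latin1` codec (ISO-8859-1) is assumed as a stand-in for MySQL's latin1, which is really a cp1252 variant, so byte counts for other characters may differ on a real server.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Decode as `source`, re-encode as `target`; unmappable chars become '?'."""
    return data.decode(source).encode(target, errors="replace")


original = "\u4e2d".encode("utf8")  # one Chinese character, 3 bytes: e4 b8 ad

# Case 1: connection left at latin1, table is utf8 (latin1 => latin1 => utf8).
stored_1 = convert(convert(original, "latin1", "latin1"), "latin1", "utf8")

# Case 2: connection set to utf8, table is latin1 (utf8 => utf8 => latin1).
stored_2 = convert(convert(original, "utf8", "utf8"), "utf8", "latin1")

print(len(original), len(stored_1), stored_1.hex())  # 3 6 c3a4c2b8c2ad
print(stored_2)                                       # b'?'

# In this simulation, case 1 is reversible, while case 2 has lost the data.
print(convert(stored_1, "utf8", "latin1") == original)  # True
```

In case 1 each of the three source bytes is treated as a separate latin1 character and re-encoded as two UTF-8 bytes, which is where the 6 bytes come from. In case 2 only the single byte 0x3F is stored.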

Some Ways to Detect Character Set Problems

SHOW CHARACTER SET;

SHOW COLLATION;

SHOW VARIABLES LIKE 'character%';

SHOW VARIABLES LIKE 'collation%';

SQL functions HEX, LENGTH, CHAR_LENGTH

SQL functions CHARSET, COLLATION
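To show what HEX, LENGTH, and CHAR_LENGTH reveal, here is a rough Python analogue. It is a sketch and not the SQL functions themselves: `describe` is a hypothetical helper, and the "doubled" value is built by hand to mimic data that was written over a latin1 connection into a utf8 column.

```python
def describe(stored: bytes, charset: str) -> dict:
    """Report hex dump, byte length, and character length of stored bytes."""
    return {
        "hex": stored.hex().upper(),                  # like HEX(col)
        "length": len(stored),                        # like LENGTH(col)
        "char_length": len(stored.decode(charset)),   # like CHAR_LENGTH(col)
    }


clean = "\u4e2d".encode("utf8")                   # stored correctly
doubled = clean.decode("latin1").encode("utf8")   # encoded twice by mistake

print(describe(clean, "utf8"))
# {'hex': 'E4B8AD', 'length': 3, 'char_length': 1}
print(describe(doubled, "utf8"))
# {'hex': 'C3A4C2B8C2AD', 'length': 6, 'char_length': 3}
```

A value that should be one character but reports a character length of 3 and a byte length of 6 is a typical sign of double encoding.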

Recommendations for Using MySQL Character Sets

• When creating databases and tables and when performing database operations, specify the character set explicitly wherever possible instead of relying on MySQL's default settings; otherwise a MySQL upgrade may cause a lot of trouble;

• Using latin1 for both the database and the connection character set avoids garbled text in most cases, but the drawback is that SQL operations cannot work in units of characters. Setting both the database and the connection character set to utf8 is generally the better choice;

• When using the MySQL C API, call mysql_options right after initializing the database handle to set the MYSQL_SET_CHARSET_NAME option to utf8. This removes the need to specify the connection character set explicitly with a SET NAMES statement, and the connection character set is also reset to utf8 when mysql_ping reconnects a dropped persistent connection;

• With the MySQL PHP API, ordinary page-level PHP scripts run only briefly, so explicitly setting the connection character set once with a SET NAMES statement after connecting is enough. When using persistent connections, however, take care to keep the connection alive, and after any reconnect explicitly reset the connection character set with a SET NAMES statement.

Other Notes

• The default_character_set setting in my.cnf only affects the connection character set used when the mysql command-line client connects to the server; it has no effect on applications that use the libmysqlclient library!

• SQL function operations on columns are usually performed in the internal operation character set and are not affected by the connection character set settings.

• Bare string literals in SQL statements are affected by the connection character set or by introducers. For operations such as comparisons this can produce completely different results, so be careful!
