Solving the MySQL Chinese Garbled-Text Problem (page 1 of 2)

Reposted from: http://www.phpchina.cn/viewarticle.php?id=1584


This is a rather dull topic, full of encodings, conversions, clients, servers, connections and so on. I would rather not look at it myself, but on reflection it is worth writing down, for four reasons:

MySQL 4.1 changed its multi-language support considerably (which is what causes the problems);
Although MySQL 3 is still dominant in most places, including personal installations and hosting providers, MySQL 4.1 is the version officially recommended by MySQL, some hosting providers already offer it, and more will follow;
Many PHP programs use MySQL as their default database, but they generally do not distinguish between MySQL 4.1 and earlier versions, and simply state that "MySQL 3.xx.xx or later" meets the installation requirements;
Because latin1 is the default character set in many places (described in detail below), it has successfully blinded many PHP developers and users by hiding the problems that appear in a Chinese-language environment;
Simply put, MySQL itself changed and the PHP programs that use MySQL ignored the change, which leads to problems and complications. Because most users work in English, the problem has not been taken seriously. The PHP program discussed here is mainly WordPress.

The principle of MySQL 4.1 character set support

MySQL 4.1 lets you specify the character set at a fine granularity: for the MySQL server installed on a machine, for each database, for each table, and for each column. Traditional web programs, however, do not use such detailed configuration when creating databases and tables; they rely on the defaults. So where do the defaults come from?

When MySQL is compiled, a default character set is specified, which is latin1;
When MySQL is installed, a default character set can be specified in the configuration file (my.ini); if none is specified, the value is inherited from the compile-time setting;
When mysqld is started, a default character set can be specified in the command-line arguments; if none is specified, the value is inherited from the configuration file;
At this point character_set_server is set to this default character set;
When a new database is created, its character set defaults to character_set_server unless explicitly specified;
When a database is selected, character_set_database is set to that database's default character set;
When a table is created in that database, the table's default character set is character_set_database, that is, the database's default character set;
When a column is defined in a table, its default character set is the table's default character set unless explicitly specified;
This character set is the one in which the data is actually stored in the database, and it is also the character set of mysqldump output;

To sum up, if nothing is changed anywhere, every column of every table in every database is stored in latin1. However, when installing MySQL we generally choose multi-language support, in which case the installer automatically sets default_character_set in the configuration file to UTF-8, which guarantees that every column of every table in every database is stored in UTF-8 by default.
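The inheritance chain described above can be sketched as a small lookup. This is only an illustration of the fallback order given in the text, not MySQL's actual implementation; the function and parameter names are made up for the example.

```python
from typing import Optional

# Compile-time default named in the text.
COMPILED_DEFAULT = "latin1"


def first_set(*candidates: Optional[str]) -> str:
    """Return the first value that was explicitly set (is not None)."""
    for value in candidates:
        if value is not None:
            return value
    raise ValueError("no character set available")


def effective_column_charset(
    config_file: Optional[str] = None,   # hypothetical: value from my.ini
    command_line: Optional[str] = None,  # hypothetical: value passed to mysqld
    database: Optional[str] = None,
    table: Optional[str] = None,
    column: Optional[str] = None,
) -> str:
    # character_set_server: command line, else config file, else compiled default.
    server = first_set(command_line, config_file, COMPILED_DEFAULT)
    # Each lower level inherits from the level above unless set explicitly.
    return first_set(column, table, database, server)


print(effective_column_charset())                                  # latin1
print(effective_column_charset(config_file="utf8"))                # utf8
print(effective_column_charset(config_file="utf8", table="gbk"))   # gbk
```

With nothing set, the column ends up as latin1; setting the configuration file value alone is enough to change every database, table and column created afterwards.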

When a PHP program connects to MySQL, which character set does the data it sends use? MySQL cannot know (at best it can guess), so MySQL 4.1 requires the client to specify it, in character_set_client. The odd thing about MySQL is that data received in this character set is not converted straight to the character set stored in the database; it is first converted to the character set given by the character_set_connection variable. What is this connection layer for? I do not quite understand. After the conversion to character_set_connection, the data is converted again to the database's default character set, so two conversions are needed. When data is output, it is converted from the database's default character set to the character set given by character_set_results.
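The two conversions on the way in and the one on the way out can be imitated with Python's codecs. This is a rough sketch: Python codecs stand in for MySQL's conversion routines (an assumption), `errors="replace"` imitates the substitution of `?` for characters that cannot be represented, and the function names are hypothetical.

```python
def to_storage(raw: bytes, client: str, connection: str, column: str) -> bytes:
    """Follow bytes from the client into a column, one conversion at a time."""
    # The server interprets the incoming bytes as character_set_client.
    text = raw.decode(client, errors="replace")
    # First conversion: into character_set_connection.
    on_connection = text.encode(connection, errors="replace")
    # Second conversion: from the connection character set into the column's.
    return on_connection.decode(connection, errors="replace").encode(column, errors="replace")


def to_client(stored: bytes, column: str, results: str) -> bytes:
    """On output, convert from the column character set to character_set_results."""
    return stored.decode(column, errors="replace").encode(results, errors="replace")


sent = "中文".encode("gb2312")

# Correctly declared client, utf8 connection and utf8 column: nothing is lost.
stored = to_storage(sent, client="gb2312", connection="utf8", column="utf8")
print(stored == "中文".encode("utf8"))                               # True
print(to_client(stored, column="utf8", results="gb2312") == sent)   # True

# A latin1 connection layer cannot represent Chinese characters at all.
print(to_storage(sent, client="gb2312", connection="latin1", column="utf8"))  # b'??'
```

The last line shows why the connection character set matters even when the client character set is declared correctly: any character it cannot hold is gone before the data reaches the table.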

A typical environment

Take the MySQL 4.1 installed on my own computer as a typical environment: the machine runs Apache 2, PHP 5 and WordPress 1.5.1.3, and the MySQL configuration file sets default_character_set to utf8. This is where the problems arise:

WordPress is installed with the defaults, so all tables store data in UTF-8;
WordPress's default page character set is UTF-8 (set under Options -> Reading), so the meta tag of every WP page declares the charset as utf-8;
The browser therefore displays every WP page as utf-8, so every post and comment is sent from the browser to Apache in UTF-8, and Apache passes it on to PHP;
So all form data that WP receives is utf-8 encoded, and WP sends it to MySQL directly, without conversion;
By default MySQL sets character_set_client and character_set_connection to latin1, and this is where strange things happen: data that is actually utf-8 is treated "as latin1" and converted to ... well, still latin1, and then converted from that latin1 to utf-8. After these two conversions some of the utf-8 characters are lost and become ??. Finally, character_set_results is also latin1 by default, so the output is strange as well;
The most magical part is not even this. If WordPress is set to use GB2312, the GB2312-encoded data that WP sends to MySQL is converted "as latin1" and stored in the database in a strange format (truly strange: a mysqldump shows that it is garbled whether read as utf-8 or as gb2312), yet if this data is output as latin1, it incredibly turns back into gb2312!

What does this lead to? If WP uses a MySQL 4.1 database, switching the encoding to GB2312 makes everything look normal. Unfortunately, it only looks normal.
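The "looks normal" effect can be reproduced with a few lines of Python. As a simplifying assumption, Python's latin-1 codec stands in for MySQL's latin1 here, and the helper names are made up for the illustration.

```python
def store_declared_as_latin1(raw: bytes) -> bytes:
    """Bytes arrive with character_set_client = latin1 and land in a utf8 column."""
    misread = raw.decode("latin1")   # every byte is taken as one latin1 character
    return misread.encode("utf8")    # and stored as that character's utf8 form


def read_with_latin1_results(stored: bytes) -> bytes:
    """The utf8 column is sent back with character_set_results = latin1."""
    return stored.decode("utf8").encode("latin1")


gb_bytes = "中文".encode("gb2312")            # what WP sends when set to GB2312
stored = store_declared_as_latin1(gb_bytes)

print(gb_bytes.hex())                          # d6d0cec4
print(stored.hex())                            # c396c390c38ec384

# What a dump of the table contains is neither the utf8 nor the gb2312 form.
print(stored == "中文".encode("utf8"))         # False
print(stored == gb_bytes)                      # False

# Reading it back through latin1 undoes the damage exactly.
print(read_with_latin1_results(stored) == gb_bytes)   # True
```

The page displays correctly because the same wrong assumption is applied in both directions, while the table itself holds text that no other client can read as Chinese.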

How to solve the problem

If you are impatient (almost certainly) and search Google, you will find that the vast majority of answers say: run SET NAMES 'utf8' before your queries. Yes, that is the solution, but the purpose of this article is to explain why it is the solution.

To get correct results, the tables must first use a suitable character set, that is, one that can store at least all Chinese characters. That leaves only two choices, GBK or UTF-8. The discussion below covers the UTF-8 case.

Because default_character_set in the configuration file is utf8, tables are created in UTF-8 by default. This is also the configuration that every hosting provider using MySQL 4.1 should adopt. What remains is to make sure that the encoding specified for the interaction between the client and MySQL is correct.

There are only two possibilities: the client sends data in GB2312, or it sends data in UTF-8.

If the data is sent in gb2312:

SET character_set_client = 'gb2312';
SET character_set_connection = 'utf8';
or
SET character_set_connection = 'gb2312';

Either setting works: no data is lost during the conversions, so the content stored in the database is correct.

How do we make sure the content comes back out correctly? For the vast majority of clients (including WP), the encoding in which data is sent is also the encoding in which the client expects to receive data, so:

SET character_set_results = 'gb2312';

This guarantees that the data delivered to the browser is in gb2312.

In the second case, the client sends data in utf-8 (the default for WP), and the following configuration can be used:

SET character_set_client = 'utf8';
SET character_set_connection = 'utf8';
SET character_set_results = 'utf8';

This configuration is equivalent to SET NAMES 'utf8'.
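As a small illustration of that equivalence, the helper below spells out the three statements that a single SET NAMES stands for. The function name is hypothetical and the name check is an added precaution, not something from the original article; as far as I know, SET NAMES also sets the connection collation to the character set's default, which this sketch ignores.

```python
import re
from typing import List


def expand_set_names(charset: str) -> List[str]:
    """Return the three SET statements that SET NAMES '<charset>' stands for."""
    # Accept only plain identifiers so the name can be embedded in SQL safely.
    if not re.fullmatch(r"[A-Za-z0-9_]+", charset):
        raise ValueError(f"unexpected character set name: {charset!r}")
    variables = (
        "character_set_client",
        "character_set_connection",
        "character_set_results",
    )
    return [f"SET {name} = '{charset}'" for name in variables]


for statement in expand_set_names("utf8"):
    print(statement)
```

Calling `expand_set_names("gb2312")` gives the settings for the first case, with character_set_connection set to gb2312 rather than utf8.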

What WP should change

It comes back to the same point: the database cannot know what encoding the client is sending, so the client has to say so itself. WP must therefore send the correct SET ... statement to MySQL. What is the best way to send it? Colleagues from Taiwan's Plog project have made some suggestions:

First, test whether the server is >= 4.1 and whether UTF-8 support was compiled in;
Then detect the format in which the database stores data ($dbEncoding);
SET NAMES $dbEncoding

On the second point, WP's situation is different. With the typical configuration described above, any database used by WP is certain to be stored in UTF-8, so the decision should be based on whether the user has chosen GB2312 or UTF-8 for browsing (get_bloginfo('charset')). That value, however, can only be read after connecting to the database, so the most efficient approach is to run SET NAMES once, right after connecting, based on this setting, rather than before every query.
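The two decisions involved, whether the server is new enough and which name to pass to SET NAMES, can be sketched on their own. The function names and the sample version strings are assumptions made for the example.

```python
import re


def supports_set_names(server_version: str) -> bool:
    """True if a version string such as '4.1.12-nt' denotes MySQL 4.1 or later."""
    match = re.match(r"(\d+)\.(\d+)", server_version)
    if match is None:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (4, 1)


def mysql_charset_name(blog_charset: str) -> str:
    """Map a page charset such as 'UTF-8' to the name MySQL uses for it."""
    name = blog_charset.strip().lower()
    return "utf8" if name in ("utf-8", "utf8") else name


print(supports_set_names("3.23.58"))     # False
print(supports_set_names("4.0.24"))      # False
print(supports_set_names("4.1.12-nt"))   # True
print(mysql_charset_name("UTF-8"))       # utf8
print(mysql_charset_name("GB2312"))      # gb2312
```

Comparing major and minor together matters here: a check on the major version alone would let 4.0 servers through, even though the character set support discussed in this article arrived in 4.1.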

My modification is to add the following to wp-includes/wp-db.php:

function set_charset($charset)
{
    // Check the MySQL version first: SET NAMES needs 4.1 or later.
    $serverVersion = mysql_get_server_info($this->dbh);
    $version = explode('.', $serverVersion);
    if ($version[0] < 4 || ($version[0] == 4 && $version[1] < 1)) return;

    // Check whether utf8 support is compiled in.
    $result = mysql_query("SHOW CHARACTER SET LIKE 'utf8'", $this->dbh);
    if (!$result || mysql_num_rows($result) <= 0) return;

    // Map the page charset name to the name MySQL uses.
    if ($charset == 'utf-8' || $charset == 'UTF-8')
        $charset = 'utf8';
    @mysql_query("SET NAMES '$charset'", $this->dbh);
}

Then, in wp-settings.php, after the line require (ABSPATH . WPINC . '/vars.php'); add:

$wpdb->set_charset(get_bloginfo('charset'));
