MySQL character set ultimate solution [posting]
MySQL, an open-source database, has always been the first choice for small and medium-sized enterprises to build Web applications. In particular, It is a golden partner to cooperate with PHP and is favored by web developers. However, MySQL has added support for multiple character sets since MySQL 4.1. Many mysql users have found that Chinese characters cannot be used and the display turns into a bunch of garbled characters! As a result, many people are still using the old version 3.24.58. recently on the MySQL website, we found that version 3.24 is not provided for download, and MySQL has completely abandoned version 3.24. Fortunately, I still have a copy for Windows, as a souvenir.
How can we solve the problem of garbled characters? As long as you go through the solution on the Internet, you can immediately get the answer: "After obtaining the connection, execute a set names 'gb2312'." But what is the reason for this? Summarize my experience.
When MySQL processes connections, SQL requests sent from external connections are converted in the following order:
Character_set_client// Character set used for customer connection
|
Character_set_connection// MySQL connection Character Set
|
Character_set_database// Character set used by the database (Table, column)
|
Character_set_results// The client displays the character set used
I. The root cause of garbled characters is:
1. the client does not correctly set the client Character Set, causing the original SQL statement to be converted to the character set referred to by connection. This conversion will lead to loss of information. If the client is in utf8 format, if it is converted to the gb2312 format, the information will be lost. Otherwise, the information will not be lost. Make sure that the character set of connection is greater than the client character set to ensure that no information is lost during conversion.
2. the font of the database is not correctly set. If the font of the database is incorrectly set, the connection character set is converted to the database character set and the encoding is still lost because it is the same as the above.
2. Why set names 'gb2312 '?
Set names 'gb2312' is equivalent to these three statements:
Set character_set_client = gb2312;
Set character_set_connection = gb2312;
Set character_set_results = gb2312;
In this case, the above cause of garbled characters 1 does not exist, because the encoding formats are unified, but this is not a panacea. Reasons:
1. Your client does not necessarily use gb2312 encoding to send SQL statements. If the encoding is not gb2312, conversion to gb2312 will cause problems.
2. the table in your database is not necessarily in gb2312 format. If it is not in gb2312 format but another format, such as Latin1, information will be lost when storing character sets.
To sum up, the ultimate solution is as follows:
1. First, determine the encoding format of your client. This is the most important (IE6 generally uses utf8, and the command line is generally GBKProgramIs gb2312)
2. Make sure that your database uses the utf8 format, which is very simple. All codes can be used.
3. Ensure that the connection character set is greater than or equal to the client Character Set, Otherwise information will be lost, such as Latin1 <gb2312 <GBK <utf8
If set character_set_client = gb2312 is set, at least the character set of connection must be greater than or equal to gb2312. Otherwise, information will be lost.
4. if the preceding three steps are correct, all Chinese files are correctly converted to the utf8 format and stored in the database. To adapt to different browsers and clients, you can modify character_set_results to display Chinese fonts in different codes. Because utf8 is a large direction, I prefer to use utf8 to display Chinese Characters in Web applications.
These are my experiences. Attach the Connection source code and the current settings. The character set issue can be ignored in the program.
Include "CONF/system. php ";
Class connection {
Private $ conn;
Function _ construct (){
Global $ mysql_ipaddr, $ mysql_port, $ mysql_db, $ mysql_user, $ mysql_pass;
Try {
$ This-> conn = new PDO ("MYSQL: host = $ mysql_ipaddr; Port = $ mysql_port; dbname = $ mysql_db", $ mysql_user, $ mysql_pass );
} Catch (pdoexception $ e ){
Print "MySQL server connection failed:". $ e-> getmessage (). "<br> ";
Die ();
}
}
Public Function getconnection (){
If ($ this-> Conn! = NULL ){
$ This-> Conn-> query ("set character_set_client = gb2312 ");// The client uses the gb2312 format
$ This-> Conn-> query ("set character_set_connection = utf8 ");// The Connection Character Set uses the utf8 format
$ This-> Conn-> query ("set character_set_results = utf8 ");// Display character sets in utf8 format
Return $ this-> conn;
}
}
Public Function closeconnection (){
If ($ this-> Conn! = NULL ){
$ This-> conn = NULL;
}
}
}
I have encountered a problem in MySQL. Our character set is gb2312. there will be irrelevant result sets during fuzzy search in Chinese.
From the root cause analysis, there are also the following problems.
Example:
The 1st and 2-byte ASCII values of the Chinese "no" are 178 and 187, respectively.
The 1st and 2-byte ASCII values of the Chinese "an" are 176 and 178, respectively.
The 1st and 2-byte ASCII values of the Chinese "Flowers" are 187 and 168, respectively.
The wise man has seen that, when the character "no" is fuzzy searched in the string "Anhua", the MySQL system will also think that the two match!
The cause of this problem is: MySQL is case insensitive when querying strings, in programming MySQL, The ISO-8859 character set is generally used as the default character set, therefore, the case-sensitivity conversion of Chinese encoding Characters During the comparison process causes this phenomenon.
Method 1:
The solution is to add the "binary" attribute to a field that contains Chinese characters for Binary comparison. For example, change "name char (10)" to "name char (10) binary ".
Method 2:
If you use the source code to compile MySQL, you can use the -- with -- charset = GBK parameter when compiling MySQL, so that MySQL can directly support Chinese searching and sorting.
Method 3:
You can use the locate function of MySQL to determine. The preceding problem is used as an example:
Select * from table where locate (field, 'lil')> 0;
This method is used on this site, and it feels good. : P
Method 4:
Change your select statement to this way. Select * from table where fields like binary '% find %!
If you want to use the correct character set, use mysqldump to export the file and import it.
Reprinted statement:This article from http://hi.baidu.com/xhero2008/blog/item/8d5b2f3fe2c617e955e7234c.html/cmtid/bd79d354f5c1da54d1090692
========================================================== ========================================================
How to solve MySQL query Chinese problems [posting]
Original article:
Http://blog.sina.com.cn/u/4909c13c010003va
Q:
When I write a query condition, the problem is as follows:
For example, if I want to write all records whose fields contain the word "Li ",
$ STR = "Li ";
Select * from table where field like '% $ STR % ';
In addition to records containing the word "Li", the displayed records do not contain the word "Li. Why?
A:
In MySQL, Chinese Character sorting and search results are incorrect. This situation exists in many MySQL versions. If this problem is not solved, MySQL cannot actually process Chinese characters.
The cause of this problem is: MySQL is case insensitive when querying strings, in programming MySQL, The ISO-8859 character set is generally used as the default character set, therefore, the case-sensitivity conversion of Chinese encoding Characters During the comparison process causes this phenomenon.
Method 1:
The solution is to add the "binary" attribute to a field that contains Chinese characters for Binary comparison. For example, change "name char (10)" to "name char (10) binary ".
Method 2:
If you use the source code to compile MySQL, you can use the -- with -- charset = GBK parameter when compiling MySQL, so that MySQL can directly support Chinese searching and sorting.
Method 3:
You can use the locate function of MySQL to determine. The preceding problem is used as an example:
Select * from table where locate (field, 'lil')> 0;
This method is used on this site, and it feels good. : P
Method 4:
Change your select statement to this way. Select * from table where fields like binary '% find %!
Http://blog.sina.com.cn/u/4909c13c010003va
Q:
When I write a query condition, the problem is as follows:
For example, if I want to write all records whose fields contain the word "Li ",
$ STR = "Li ";
Select * from table where field like '% $ STR % ';
In addition to records containing the word "Li", the displayed records do not contain the word "Li. Why?
A:
In MySQL, Chinese Character sorting and search results are incorrect. This situation exists in many MySQL versions. If this problem is not solved, MySQL cannot actually process Chinese characters.
The cause of this problem is: MySQL is case insensitive when querying strings, in programming MySQL, The ISO-8859 character set is generally used as the default character set, therefore, the case-sensitivity conversion of Chinese encoding Characters During the comparison process causes this phenomenon.
Method 1:
The solution is to add the "binary" attribute to a field that contains Chinese characters for Binary comparison. For example, change "name char (10)" to "name char (10) binary ".
Method 2:
If you use the source code to compile MySQL, you can use the -- with -- charset = GBK parameter when compiling MySQL, so that MySQL can directly support Chinese searching and sorting.
Method 3:
You can use the locate function of MySQL to determine. The preceding problem is used as an example:
Select * from table where locate (field, 'lil')> 0;
This method is used on this site, and it feels good. : P
Method 4:
Change your select statement to this way. Select * from table where fields like binary '% find %!
Reprinted statement:This article from http://www.playhosts.com/bbs/read.php? Tid = 12357
========================================================== ========================================================
Detailed tutorial on MySQL Character Set conversion and Version Upgrade/downgrade [posting]
Original post addressHttp://club.muzone.cn/viewthread.php? Tid = 28605
Since MySQL 4.1, the multi-language support has changed a lot (which leads to problems ). Although MySQL 3 and 4.0 are dominant in most places (including personal use and host providers), MySQL 4.1 and even 5.0 are officially recommended databases by MySQL, there are already host providers starting to provide and will become more and more; Because Latin1 is used as the default character set in many places (which is detailed below, the success blinded the developers and users of many PHP programs and concealed problems that may occur in Chinese and other language environments.
MySQL 4.1 began to classify the multi-language character sets in more detail, which led to database migration. Or, after the DZ Forum was upgraded to 4.0 (dz4.0 began to use GBK or UTF-8 encoding), there was a garbled problem.
MySQL 4.1 character set support has two aspects: Character Set and collation ). The support for character sets is refined to four levels: Server, database, table, and connection ).
You can run the following two commands to view the character set and sorting method settings of the system:
Reference: Mysql> show variables like 'character _ SET _ % ';
+ -------------------------- + ---------------------------- +
| Variable_name | value |
+ -------------------------- + ---------------------------- +
| Character_set_client | Latin1 |
| Character_set_connection | Latin1 |
| Character_set_database | Latin1 |
| Character_set_results | Latin1 |
| Character_set_server | Latin1 |
| Character_set_system | utf8 |
| Character_sets_dir |/usr/share/MySQL/charsets/|
+ -------------------------- + ---------------------------- +
7 rows in SET (0.00 Sec)
Mysql> show variables like 'collation _ % ';
+ ---------------------- + ------------------- +
| Variable_name | value |
+ ---------------------- + ------------------- +
| Collation_connection | latin1_swedish_ci |
| Collation_database | latin1_swedish_ci |
| Collation_server | latin1_swedish_ci |
+ ---------------------- + ------------------- +
3 rows in SET (0.00 Sec)
MySQL 4.1 specifies the character set to be refined to the MySQL installed on a machine. One database, one table, and one column of the table, which character set should be used. However, traditional web programs do not use such complex configurations when creating databases and data tables. They use default configurations. So where does the default configuration come from?
A default character set is specified during MySQL compilation. the character set is Latin1;
When installing MySQL, you can specify a default character set in the configuration file (My. INI). If it is not specified, this value is inherited from the value specified during compilation;
When you start mysqld, you can specify a default character set in the command line parameters. If not, this value is inherited from the configuration file;
Character_set_server is set to the default character set;
When creating a new database, unless explicitly specified, the character set of this database is set to character_set_server by default;
When a database is selected, character_set_database is set to the default Character Set of the database;
When a table is created in this database, the default Character Set of the table is set to character_set_database, which is the default Character Set of this database;
When a column is set in the table, unless explicitly specified, the default character set in this column is the default Character Set of the table;
This character set is the character set used for actual data storage in the database. The content of mysqldump is under this character set;
When we access the MySQL database through PHP in the original way, even if the default Character Set of the table is set to utf8 and the query is sent through the UTF-8 encoding, you will find that the database is still garbled. The problem lies in the connection layer.
To store "correct" information and obtain "correct" results, it is most convenient to execute the following before all queries start:
Set names 'gbk ';
GBK is the database character set.
It is equivalent to the following three commands:
Set character_set_client = GBK;
Set character_set_results = GBK;
Set character_set_connection = GBK;
The Latin1 character set is used by default in 4.1 and 5.0 (WOOD: Mom, foreigners are really domineering. Do you want to make the world use the Swedish character set)
If we only want to use the GBK character set to store and retrieve data,
When compiling MySQL 4.1 and 5.0, you must add two parameters in my. ini or my. CNF.
Code: [Mysqld]
Default-character-set = utf8 {
Copycode (document. getelementbyid ('code1 '));
} "Href =" javascript: "> [Copy to clipboard]
Code: # Settings for clients (connection, results, clients)
[MySQL]
Default-character-set = utf8 {
Copycode (document. getelementbyid ('code2 '));
} "Href =" javascript: "> [Copy to clipboard]
Next let's talk about how to convert the database Character Set
Two methods,
Reference: First ---- change the storage Character Set
The main idea is to change the character set Latin1 of the database to GBK, big5, or utf8. The following operations must have host permissions. Assume that the database name for the current operation is: Database
Export
First, export the data to the format of mysql4.0. The specific command is as follows:
Mysqldump-uroot-p -- default-character-set = Latin1 -- Set-charset = GBK -- skip-opt Databse> d4. SQL
-- Default-characte-set: the character set of the database before. Generally, it is Latin1,
-- Set-charset: the character set of the data exported. This can be set to GBK, utf8, or big5.
Import
First, use the following statement to create a database with the GBK character set (test)
Create Database 'd4 'default Character Set GBK collate gbk_chinese_ci;
Then, import the exported data to the current database.
Mysql-uroot-p -- default-character-set = GBK-F D4 <d4. SQL
After the preceding export and import operations, you can change the database character set to the correct storage method.
D4 is the name of the new database, and d4. SQL is the name of the exported file.
However, this method is depressing to find that the database data storage volume increases by 30% without reason.
Reference: The other method works the same but requires manual operations. It is generally used for selection after the first method fails.
However, this method is difficult if the database is large, because opening the file will cause you to crash.
First, use phpMyAdmin or dump of MySQL to export the. SQL file.
Then use ultraedit to open all the XXXX. SQL files you have backed up and findCode:Default charset = Latin1{
Copycode (document. getelementbyid ('code3 '));
} "Href =" javascript: ">[Copy to clipboard]
Latin1 may be something else. It's something you don't want. It should be converted into a GBK or big5 character set.
Replace this with "null"
Find Code: Create Table cdb_sessions (
Sid char (6) Character Set Latin1 collate latin1_bin not null default '',
IP1 tinyint (3) unsigned not null default '0 ',
Ip2 tinyint (3) unsigned not null default '0 ',
IP3 tinyint (3) unsigned not null default '0 ',
Ip4 tinyint (3) unsigned not null default '0 ',
UID mediumint (8) unsigned not null default '0 ',
Username char (15) not null default '',
Groupid smallint (6) unsigned not null default '0 ',
Styleid smallint (6) unsigned not null default '0 ',
Invisible tinyint (1) not null default '0 ',
'Action' tinyint (1) unsigned not null default '0 ',
Lastactivity int (10) unsigned not null default '0 ',
FID smallint (6) unsigned not null default '0 ',
TID mediumint (8) unsigned not null default '0 ',
Nickname char (15) not null default '',
Unique key Sid (SID)
) Engine = heap max_rows = 1000; {
Copycode (document. getelementbyid ('code4 '));
} "Href =" javascript: "> [Copy to clipboard]
Replace Code: Create Table 'cdb _ session '(
'Sid 'Char (6) binary not null default '',
'Ip1' tinyint (3) unsigned not null default '0 ',
'Ip2' tinyint (3) unsigned not null default '0 ',
'Ip3' tinyint (3) unsigned not null default '0 ',
'Ip4' tinyint (3) unsigned not null default '0 ',
'Uid' mediumint (8) unsigned not null default '0 ',
'Username' char (15) not null default '',
'Groupid' smallint (6) unsigned not null default '0 ',
'Styleid' smallint (6) unsigned not null default '0 ',
'Invisable' tinyint (1) not null default '0 ',
'Action' tinyint (1) unsigned not null default '0 ',
'Lastactivity' int (10) unsigned not null default '0 ',
'Fid' smallint (6) unsigned not null default '0 ',
'Tid' mediumint (8) unsigned not null default '0 ',
'Nickname' char (15) not null default '',
Unique key 'sid '('sid ')
) Type = heap max_rows = 2000; {
Copycode (document. getelementbyid ('code5 '));
} "Href =" javascript: "> [Copy to clipboard]
This step is simpler to delete the section about the cdb_sessions table. In the future, we will add a new D4 and export the table.
Copy and paste the content to the end of the SQL file [Code] [/Code]
Save the SQL file and import it to your database.
OK.
Using these two methods, you can easily downgrade MySQL Databases of 4.1 and 5.0 to 4.0.
A simple process is
A. Export the 4.1/5.0 Database
B. convert it to the GBK character set.
C. Thoroughly uninstall 4.1 or 5.0
D install 4.0.26
E. Import the processed database.
This method can be used to export data when downgrading
Mysqldump-uroot-p -- default-character-set = Latin1 -- Set-charset = GBK -- skip-opt Databse -- compatible = mysql40> d4. SQL
In this way, the exported result is Kohler 4.0.
For MySQL version upgrades,
If the data file contains Chinese characters, it is not possible to copy the MySQL 4.0 data file directly to MySQL 4.1, even in my. default-character-set is set to the correct character set in ini. Although it seems that there is no problem, the character set of MySQL 4.1 is very annoying. Taking GBK as an example, the original length of MySQL 4.0 data, such as varchar and char, will be half of the original length, in this way, the Chinese storage capacity remains unchanged, and the English storage capacity is half. This is the biggest problem caused by direct copying of data files.
Therefore, if you want to use the correct character set, use mysqldump to export the file and then import it.
Transferred from:Http://www.sd9981.com/mysql? P = 22
Reprinted statement:This article from http://www.playhosts.com/bbs/read.php? Tid = 12357
========================================================== ========================================================