MySQL is prone to garbled text when migrating data that contains Chinese, and many such cases appear when moving from MySQL 4.x to MySQL 5.x. The default character set for MySQL is latin1, and many people kept latin1 when using MySQL 4.x. With MySQL 5, they usually want UTF8. So is our task simply to convert the characters in the data from latin1 to UTF8?
No.
In the old system we were, to put it loosely but vividly, using latin1 to store Chinese characters that were actually encoded in a GB-series character set (GBK, GB2312, and so on). What does that mean?
mysql> SHOW CREATE TABLE test\G
*************************** 1. row ***************************
       Table: test
Create Table: CREATE TABLE `test` (
  `a` varchar(...) default NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
mysql> SHOW CREATE TABLE testlatin1\G
*************************** 1. row ***************************
       Table: testlatin1
Create Table: CREATE TABLE `testlatin1` (
  `a` varchar(...) default NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.01 sec)
We can see that the two tables have different default character sets. The default means that any character column whose character set is not explicitly specified uses the table's default character set.
A column's character set tells MySQL which character set the stored characters are supposed to be in. Which encoding the stored bytes are actually in, however, is not decided by MySQL, and MySQL does not check it.
Before UTF8 was widely used, Chinese text was encoded in one of the GB-series character sets, such as GB2312, GBK, or GB18030.
In a MySQL installation where the default character set is latin1, we usually saved Chinese characters to the database as GB-encoded bytes while telling MySQL that they were latin1. A Chinese character occupies two bytes in GB2312 and GBK, whereas a latin1 character occupies one byte, so each GB character is stored as two latin1 characters. This reminds me of the old iso8859_1 workaround, which is a similar situation. As long as we write and read the data as latin1 without any conversion, and then display it as a GB character set, it works correctly.
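The byte-level effect can be illustrated outside MySQL. The following Python sketch is only an illustration and not part of the original setup: it assumes a client that sends GBK bytes, and it uses Python's built-in gbk and latin-1 codecs as stand-ins for what the server sees. The sample string is an arbitrary choice.

```python
# Illustration only: mimic how GBK bytes look when they are labelled latin1.
text = "中文"                             # two Chinese characters (arbitrary sample)
gbk_bytes = text.encode("gbk")            # what a GBK client actually sends
as_latin1 = gbk_bytes.decode("latin-1")   # how a latin1 column interprets those bytes

print(len(text), len(gbk_bytes), len(as_latin1))  # 2 4 4

# latin-1 maps every byte to exactly one character, so nothing is lost:
# reading the value back as latin1 and displaying it as GBK restores the text.
restored = as_latin1.encode("latin-1").decode("gbk")
print(restored == text)  # True
```

Two Chinese characters become four latin1 characters, and the round trip is lossless only because no conversion is applied in either direction.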
So how do we correctly move Chinese text stored as latin1 into a database that uses the UTF8 character set?
First, the columns in the new database must use the UTF8 character set. One option is to specify the default character set when the database is created, so that the database default is used whenever a table is created without an explicit character set.
Export the data using the latin1 character set, which in effect tells MySQL not to convert anything on export (because the original tables are latin1).
After exporting with mysqldump, import with mysql, but this time tell MySQL that the incoming data is in a GB-series character set such as GBK. MySQL is then responsible for converting the data from GBK to UTF8 and saving it to the database.
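The difference between this procedure and a naive latin1-to-UTF8 conversion can be sketched in Python. This is a hedged illustration that uses Python codecs in place of the server's conversion; the function name reinterpret_as_gbk and the sample text are hypothetical.

```python
def reinterpret_as_gbk(stored: str) -> str:
    """Take a value as read from a latin1 column and recover the GBK text."""
    raw = stored.encode("latin-1")  # export as latin1: bytes pass through unchanged
    return raw.decode("gbk")        # import as GBK: the same bytes are read as Chinese

# What the old database holds: GBK bytes labelled as latin1 characters.
stored_value = "中文".encode("gbk").decode("latin-1")

# Correct path: export as latin1, import as GBK, store as UTF8.
text = reinterpret_as_gbk(stored_value)
utf8_bytes = text.encode("utf-8")

# Naive path: convert the latin1 characters to UTF8 directly.
# This encodes the accented Latin letters, not the Chinese text.
wrong_bytes = stored_value.encode("utf-8")

print(text)                       # 中文
print(len(utf8_bytes))            # 6
print(wrong_bytes == utf8_bytes)  # False
```

The naive path is why the answer to the opening question is "No": converting latin1 to UTF8 faithfully preserves the wrong characters.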
How do we tell MySQL which character set the imported SQL uses? One way is --default-character-set, but sometimes it has no practical effect, because the mysqldump file itself contains a SET NAMES statement. For example:
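$ head ea192.060913.sql
-- MySQL dump 10.10
--
-- Host: localhost    Database: ea192
-- ------------------------------------------------------
-- Server version	5.0.16-standard-log

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES latin1 */;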
The /*! ... */ form is MySQL-specific syntax that other databases ignore as a comment. The 40101 after /*! is a version number: the enclosed statement is executed only by MySQL 4.1.1 and above.
Here we see SET NAMES latin1. One of its effects is to tell MySQL that the data sent by the client is in the latin1 character set. Because this SET NAMES is present, --default-character-set has no effect. If your SQL file unfortunately contains it, you need to remove it or change it to SET NAMES gbk. When the file is large, one way to remove the line is to combine head and tail. For the file above, for example:
First find the line number of the SET NAMES statement; as seen above, it is line 10.
$ wc -l ea192.060913.sql
1987 ea192.060913.sql
The total number of lines is 1987.
$ head -9 ea192.060913.sql > final.sql
$ tail -1977 ea192.060913.sql >> final.sql
head -9 takes the first 9 lines and tail -1977 takes the last 1977 lines (1987 - 10), so line 10 is skipped.
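The same line can also be dropped by pattern rather than by line number. The Python sketch below is one possible way to do it, not the method used above. The names drop_set_names and filter_dump are hypothetical, and it assumes the statement sits on a line of its own in the form shown in the dump header.

```python
import re

# Matches a dump header line such as: /*!40101 SET NAMES latin1 */;
SET_NAMES = re.compile(r"^/\*!\d+\s+SET\s+NAMES\s+\w+\s*\*/;\s*$", re.IGNORECASE)

def drop_set_names(lines):
    """Yield every line except version-commented SET NAMES statements."""
    for line in lines:
        if not SET_NAMES.match(line):
            yield line

def filter_dump(src_path, dst_path):
    """Stream src_path to dst_path without loading the whole dump into memory."""
    # latin-1 round-trips every byte, so the data itself is left untouched.
    with open(src_path, encoding="latin-1", newline="") as src, \
         open(dst_path, "w", encoding="latin-1", newline="") as dst:
        dst.writelines(drop_set_names(src))

# Small in-memory demonstration with a shortened, made-up dump.
sample = [
    "-- MySQL dump 10.10\n",
    "/*!40101 SET NAMES latin1 */;\n",
    "INSERT INTO test VALUES ('x');\n",
]
print(list(drop_set_names(sample)))
```

A call such as filter_dump("ea192.060913.sql", "final.sql") would then play the role of the head and tail pair, without needing the line count.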
Once you have final.sql, you can run it with mysql using --default-character-set=gbk.
Another option is to pass --set-charset=false (equivalently, --skip-set-charset) to mysqldump, so that no SET NAMES statement is written at all.
Even so, the CREATE TABLE statements in the SQL may still cause a problem, for example:
DROP TABLE IF EXISTS `test`;
CREATE TABLE `test` (
  `a` varchar(...) default NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The CHARSET=latin1 clause is still there, and it makes the default character set of the newly created table latin1, which is not what we want. If the file is small, you can use an editor to remove the clause or change it to utf8; if the file is large, consider using sed, although that may still take a long time.
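As an alternative to an editor or sed, the rewrite can be sketched in Python. This is a minimal sketch under stated assumptions: the table options sit on the closing line of each CREATE TABLE, which starts with ")" as in the example above, and column-level character set clauses are not handled. The name retarget_charset is hypothetical.

```python
import re

# Matches the closing line of a CREATE TABLE, for example:
# ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
TABLE_OPTIONS = re.compile(r"^\)\s.*\bCHARSET=latin1\b", re.IGNORECASE)
CHARSET_CLAUSE = re.compile(r"\bCHARSET=latin1\b", re.IGNORECASE)

def retarget_charset(line, new_charset="utf8"):
    """Rewrite CHARSET=latin1 on table-option lines; leave other lines alone."""
    if TABLE_OPTIONS.match(line):
        return CHARSET_CLAUSE.sub("CHARSET=" + new_charset, line)
    return line

print(retarget_charset(") ENGINE=InnoDB DEFAULT CHARSET=latin1;\n"), end="")
# An INSERT line is not touched, even if its data happens to contain the text.
print(retarget_charset("INSERT INTO t VALUES ('CHARSET=latin1');\n"), end="")
```

Restricting the match to lines that begin with ")" is a design choice meant to keep row data out of reach of the substitution. It relies on the dump writing each INSERT on a single line.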
Another way is to run mysqldump with --create-options=false, which leaves the table creation options out of the export. However, this is a problem if the exported tables use different storage engines, because the engine type (InnoDB, MyISAM, and so on) is omitted as well.
In addition, do not use -B when exporting with mysqldump; specify a single database name directly so that no CREATE DATABASE statement appears. That statement may also carry a default character set clause, which would affect every table whose CREATE TABLE does not specify a character set. If your exported SQL does contain a CREATE DATABASE statement, check it for a character set clause and modify the clause if there is one.
Files exported or processed with the methods above can then be imported using mysql --default-character-set=gbk.
In summary, the commands to run are basically the following:
1. Back up the database
mysqldump --default-character-set=latin1 --create-options=false --set-charset=false -u root -p database_name > e:\back.sql
2. Create the new database
CREATE DATABASE database_name CHARACTER SET utf8 COLLATE utf8_general_ci;
3. Import the data
mysql -u root -p --default-character-set=gbk database_name < e:\back.sql
》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》》
Starting with MySQL 4.1, character sets are handled in much finer detail. The two problems most often encountered with character sets are garbled text and inaccurate query results, and inaccurate results in particular are a serious hidden danger.
The MySQL default character set is latin1, also called ISO-8859-1. See
http://faqs.cs.uu.nl/na-dir/internationalization/iso-8859-1-charset.html
In this character set, the bytes 0xF9, 0xFA, and 0xFB are accented variants of the Latin letter u, similar to the four tones in Chinese.
| F9 | 249 | ù | LATIN SMALL LETTER U WITH GRAVE ACCENT
| FA | 250 | ú | LATIN SMALL LETTER U WITH ACUTE ACCENT
| FB | 251 | û | LATIN SMALL LETTER U WITH CIRCUMFLEX ACCENT
As a result, the Chinese character 生 (0xC9FA), as in 女生 ("girl"), and the characters 声 (0xC9F9) and 甥 (0xC9FB) all end in a byte that latin1 reads as a form of the letter u, so the following statement returns the wrong result:
SELECT * FROM article WHERE author = '女生';
It returns three results: 女生, 女声, and 女甥. As an example:
DROP TABLE IF EXISTS article;
CREATE TABLE article (
  message varchar(255) COLLATE latin1_swedish_ci NOT NULL,
  message2 varchar(255) COLLATE latin1_bin NOT NULL,
  KEY message (message)
) ENGINE=MyISAM CHARSET=latin1;
INSERT INTO article (message, message2) VALUES ('女生', '女生');
INSERT INTO article (message, message2) VALUES ('女声', '女声');
INSERT INTO article (message, message2) VALUES ('女牲', '女牲');
INSERT INTO article (message, message2) VALUES ('女甥', '女甥');
INSERT INTO article (message, message2) VALUES ('女升', '女升');
INSERT INTO article (message, message2) VALUES ('女渗', '女渗');
We create two columns with different collations to store the same content. Using:
SELECT * FROM article WHERE message = '女生';
returns three records:
女生 女生
女声 女声
女甥 女甥
Using:
SELECT * FROM article WHERE message2 = '女生';
returns one record:
女生 女生
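The collision can be reproduced outside MySQL with the original Chinese words behind the six sample rows. The Python sketch below is a simplified stand-in, not the real latin1_swedish_ci rules: the FOLD table only folds the three accented forms of u listed in the table above and compares everything else exactly. The names FOLD and latin1_view are hypothetical.

```python
# Simplified stand-in for the collation behaviour described above: only the
# three accented forms of u are folded together; all else is compared as-is.
FOLD = str.maketrans({"ù": "u", "ú": "u", "û": "u"})

def latin1_view(word):
    """Return the characters a latin1 column sees for a GBK-encoded word."""
    return word.encode("gbk").decode("latin-1")

words = ["女生", "女声", "女牲", "女甥", "女升", "女渗"]
target = latin1_view("女生").translate(FOLD)

# Accent-insensitive comparison (like the message column).
matches = [w for w in words if latin1_view(w).translate(FOLD) == target]
print(matches)  # ['女生', '女声', '女甥']

# Byte-exact comparison (like the latin1_bin message2 column).
exact = [w for w in words if latin1_view(w) == latin1_view("女生")]
print(exact)    # ['女生']
```

The three matching words differ only in their last byte (0xF9, 0xFA, 0xFB), which is exactly the range the accent-insensitive comparison treats as the same letter.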
There are two ways to solve this. One is to filter the rows in PHP:
$sql = "SELECT * FROM article WHERE author='女生'";
$res = mysql_query($sql);
$article = array();
while ($row = mysql_fetch_assoc($res))
{
    // Keep only the row whose value matches byte for byte.
    if ($row['author'] === '女生')
    {
        $article = $row;
        break;
    }
}
The way to solve it within MySQL itself is to use the latin1_bin collation, as for message2. If your data already uses another latin1 collation, you can remedy it as follows:
SELECT * FROM article WHERE message = '女生' COLLATE latin1_bin;
SELECT * FROM article WHERE BINARY message = '女生';
According to the MySQL manual the two forms give the same result, but they differ in practice: the first is reported to use the index, while the second does not use any index.
You can use EXPLAIN to check the differences between the following three statements on your own server:
EXPLAIN SELECT * FROM article WHERE message = '女生' COLLATE latin1_bin;
EXPLAIN SELECT * FROM article WHERE BINARY message = '女生';
EXPLAIN SELECT * FROM article WHERE message = '女生';
If a single system uses a unique index or key on such a column, the risk is relatively small: at worst, once 女生 is registered, someone who tries to register 女甥 is told that "the user name already exists", which is false.
For multiple systems that share one SSO passport, the security risk is very large: you can register 女甥 in a system that does not use MySQL and then log in to the system that does. If 女生 is an administrator, you obtain the highest privileges.