The garbled problem in database porting in MySQL database

Source: Internet
Author: User
Tags character set final modify mysql create database mysql database

When MySQL transplants data containing Chinese, it is easy to get garbled problems. Many are present when transplanted from mysql4.x to mysql5.x. MySQL's default character set is Latin1, and when using mysql4.x, many people are using the latin1 character set. And when the use of MySQL5 is often willing to use UTF-8. So is it our task to convert the characters from Latin1 to UTF-8 from the data? No.

In the previous system, we used the latin1 to save the Chinese characters using the GB series character set (GBK, GB2312, etc.) with a less accurate sentence. Why do you say that?

The following are the referenced contents:

Mysql> Show CREATE TABLE Test\g
1. Row
Table:test
Create table:create Table ' test ' (
' A ' varchar (MB) Default NULL
) Engine=innodb DEFAULT Charset=utf8
1 row in Set (0.00 sec)

Mysql> Show CREATE TABLE Testlatin1\g
1. Row *
Table:testlatin1
Create table:create Table ' testlatin1 ' (
' A ' varchar (MB) Default NULL
) Engine=innodb DEFAULT charset=latin1
1 row in Set (0.01 sec)


The character set is to tell us that if the character set of the column is not specifically specified, the character type column has the same character set as the table's default character set.

The character set of the column is to tell MySQL what character set is used for the characters in this store. But in the end is what character set characters, not by the MySQL decision, MySQL does not check.

Before UTF-8 widely used, we used Chinese characters are GB series of character sets, such as GB2312, GBK, GB18030 and so on.

In MySQL, where the default character set is Latin1, we usually save the characters in the GB character set to the database, but tell MySQL that it is the latin1 character set. The GB character set is a character that occupies two bytes, and Latin1 is one byte. That is to say, a GB character is saved as two latin1 characters. This reminds me of the original iso8859_1, but also a similar situation. As long as we save and read as Latin1, do not convert, and then when displayed as the GB character set, can be used correctly.

So how to latin1 save the Chinese characters correctly guide the UTF-8 character set of the database?

First, the columns in the new database are to use the UTF-8 character set. One approach is to specify the default character set when creating the database, so that the default character set of the database is used when the character set is not specified when the table is being built.

The exported data is exported in the Latin1 character set, in effect telling MySQL to export without conversion (because the original table is the latin1 character set).

Mysqldump out later, and then use MySQL to import, but also tell MySQL, the current data is GB series of character sets, such as GBK. In this way, MySQL is responsible for converting the data from GBK to UTF-8 and saving it to the database.

How to tell MySQL what is the character set of SQL imported, one way is to use--default-character-set, but sometimes it does not work. This is because the mysqldump file has a set names statement. Like what:

The following are the referenced contents:

Head Ea192.060913.sql

--MySQL Dump 10.10
--
--Host:localhost database:ea192
-- ----------------------------------
--Server version 5.0.16-standard-log

/*!40101 SET @OLD_CHARACTER_SET_CLIENT
=@ @CHARACTER_SET_CLIENT * *;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS
=@ @CHARACTER_SET_RESULTS * *;
/*!40101 SET @OLD_COLLATION_CONNECTION
=@ @COLLATION_CONNECTION * *;
/*!40101 SET NAMES latin1 * *;


/*! * * is a mysql-specific syntax, in other databases will be ignored as annotations. The 40101 after/*! is the version that executes the statement in 4.1.1 and above.

Here we see a set NAMES latin1. One of its roles is to tell MySQL that the client passes over the data is the latin1 character set. Because there is such a set Names,--default-character-set can not play a role. If unfortunately there is such a SQL, then you need to remove it or change to set NAMES GBK. Modify or delete the method, when the data quantity is bigger, can use head and tail to cooperate. For example, or the file above:

First look at the set names in line (count), see the 10th line above.

The following are the referenced contents:

Wc-l Ea192.060913.sql
1987 Ea192.060913.sql
The total number of rows is 1987

Head-9 ea192.060913.sql > Final.sql
brum@brum-laptop:~$ tail-1977 Ea192.060913.sql
>> Final.sql
brum@brum-laptop:~$


Head-9 is to take the first 9 lines, tail-1977 is taken after 1977 lines, so that the 10th line of the past.

When you get final.sql and run with MySQL, you can use--DEFAULT-CHARACTER-SET=GBK.

Another option is to use--set-charset=false when mysqldump, so that no set names is present.

So far, there may be a problem with the SQL in CREATE table, such as:

The following are the referenced contents:
DROP TABLE IF EXISTS ' test ';
CREATE TABLE ' Test ' (
' A ' varchar (MB) Default NULL
) Engine=innodb DEFAULT charset=latin1;

There is still a charset=latin1, which will cause the default character set of the newly created table to be latin1 rather than the UTF8 we want.

How to do, if the amount of data is not large, you can consider using the editor to remove it or change to UTF8, if the amount of data can be considered with SED, but may still be a long time.

Another option is mysqldump, which uses--create-options=false to not export the creation properties of the table. However, there is a problem if the storage engine of the exported table is different, because the engine type (InnoDB, MyISAM, etc.) is ignored.

In addition, when mysqldump exports, do not use-B, but directly specify a database name, in order not to appear the CREATE DATABASE statement, because there may also be a default character set of clauses, affect those who do not specify the character set in the Create table. If you have a create DATABASE in your exported SQL, you need to be aware that there are no character set clauses, and if so, you need to modify them.

OK, export files exported or processed through the above methods can be imported using MySQL--DEFAULT-CHARACTER-SET=GBK.



Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.