Garbled Characters During MySQL Data Migration

Source: Internet
Author: User


When MySQL data is migrated from an older MySQL version to a newer one, garbled characters are likely to appear if the data contains Chinese text. Many such migrations go from MySQL 4.x to MySQL 5.x. For example, here is a table I found that was created with MySQL 4.x. Its structure is as follows:

CREATE TABLE `tb_bookcase` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(30) default NULL,
  `column_3` char(10) default NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=4 DEFAULT CHARSET=latin1;

As the DEFAULT CHARSET clause shows, the default character set here is latin1. Under MySQL 4.x many people used the latin1 character set, whereas UTF-8 is common with MySQL 5 and later. So how do we fix the garbled text?

Searching online for a solution turned out to be a lot of trouble, so I decided to solve it this way: first export the entire database with an explicitly specified character set (the Chinese text in the exported file is correctly encoded; I have tried this myself), then change the encoding declarations in the file and import it.

Export the database from the DOS command prompt:

mysqldump -hlocalhost -uroot -proot --default-character-set=latin1 dataBaseName > d:\test.sql

Next, edit the encoding declarations in test.sql, changing latin1 to utf8 (or another encoding). Then run the test.sql script with the source command, or import it as follows:

mysql -uroot -p123456 --default-character-set=gb2312 test < d:/test.sql

The --default-character-set option in the export command specifies the character set used for the export, namely latin1 as discussed above. With this setting, any Chinese characters in the script in test.sql come out correctly instead of garbled.

The following explanation comes from another user online.

In our earlier system, we stored Chinese text encoded in GB-series character sets (such as GBK and GB2312) in latin1 columns.

mysql> show create table test\G
*************************** 1. row ***************************
       Table: test
Create Table: CREATE TABLE `test` (
  `a` varchar(100) default NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

mysql> show create table testlatin1\G
*************************** 1. row ***************************
       Table: testlatin1
Create Table: CREATE TABLE `testlatin1` (
  `a` varchar(100) default NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.01 sec)

The DEFAULT CHARSET clause means that a character column whose own character set is not specified uses the table's default character set.

A column's character set tells MySQL which character set the stored data is in. MySQL, however, does not check what is actually stored there.

Before UTF-8 became widespread, we used the GB series of Chinese character sets, such as GB2312, GBK, and GB18030.

With MySQL's default character set set to latin1, we usually stored GB-encoded Chinese text in the database while telling MySQL that it was latin1. GB2312 and GBK use two bytes per Chinese character, whereas latin1 uses one byte per character, so each GB-encoded Chinese character is saved as two latin1 characters. This is similar to the familiar ISO-8859-1 trick. As long as we save and read the data as latin1 without any conversion, and then treat it as GB-encoded text when displaying it, everything works.
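The byte-level behaviour described above can be sketched as a small Python simulation. It is not MySQL code: it assumes GBK as the GB encoding and uses Python's `latin-1` codec as a stand-in for a MySQL latin1 column (an approximation, but one that maps every byte to exactly one character). The helper names `store_as_latin1` and `read_back` are hypothetical.

```python
def store_as_latin1(text: str, gb_codec: str = "gbk") -> str:
    """Simulate saving GB-encoded text into a latin1 column.

    The GB bytes are kept unchanged; MySQL simply regards each byte
    as one latin1 character.
    """
    return text.encode(gb_codec).decode("latin-1")


def read_back(stored: str, gb_codec: str = "gbk") -> str:
    """Simulate reading the column as latin1 (no conversion) and then
    interpreting the raw bytes as GB text for display."""
    return stored.encode("latin-1").decode(gb_codec)


original = "中文"  # two Chinese characters
stored = store_as_latin1(original)
print(len(original), len(stored))      # 2 4: each character became two latin1 characters
print(read_back(stored) == original)   # True: nothing is lost as long as no conversion happens
```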

So how can Chinese text stored as latin1 be exported correctly into a UTF-8 database?

First, the columns in the new database should use the UTF-8 character set. One way is to specify a default character set when creating the database: tables created without an explicit character set then use the database's default character set.
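As a rough model of this inheritance (together with the column and table rule described earlier), the Python sketch below resolves the character set a character column ends up with: an explicit column character set wins, then the table's DEFAULT CHARSET, then the database default. The function and its parameters are hypothetical; MySQL applies this rule itself when it runs the CREATE statements.

```python
from typing import Optional


def effective_charset(column_charset: Optional[str],
                      table_default: Optional[str],
                      database_default: str) -> str:
    """Return the character set a character column ends up with.

    Models the rule from the text: an explicit column character set is
    used if given; otherwise the table's DEFAULT CHARSET; otherwise the
    database's default character set.
    """
    if column_charset is not None:
        return column_charset
    if table_default is not None:
        return table_default
    return database_default


# Database created with a utf8 default, table created without a CHARSET clause:
print(effective_charset(None, None, "utf8"))      # utf8
# Same database, but the dumped CREATE TABLE still says DEFAULT CHARSET=latin1:
print(effective_charset(None, "latin1", "utf8"))  # latin1
```

The second call is the situation the CHARSET=latin1 clause in a dump produces: the database default never gets a chance to apply.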

The data must be exported using the latin1 character set. In effect, this means MySQL does not convert the exported data, because the original tables all use latin1.
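A Python simulation of why the export must use latin1: exporting a latin1 column with a latin1 result character set writes the stored bytes out untouched, so the dump still holds valid GBK. Exporting with utf8 instead would make MySQL convert every "latin1 character" to UTF-8, double-encoding the GBK bytes. GBK and Python's `latin-1` codec are again assumptions standing in for the real data and MySQL's latin1; `export_column` is a hypothetical name, and its charset argument is a Python codec name, not a MySQL one.

```python
def export_column(stored_text: str, export_codec: str) -> bytes:
    """Simulate mysqldump writing a latin1 column using the given
    result character set (expressed as a Python codec name)."""
    # stored_text is how MySQL sees the latin1 column: one character per byte.
    return stored_text.encode(export_codec)


# GBK bytes that were stored in a latin1 column.
stored = "中文".encode("gbk").decode("latin-1")

as_latin1 = export_column(stored, "latin-1")
as_utf8 = export_column(stored, "utf-8")

print(as_latin1 == "中文".encode("gbk"))  # True: the dump still holds the original GBK bytes
print(len(as_latin1), len(as_utf8))       # 4 8: a utf8 export double-encodes every byte
```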

When importing the mysqldump output with mysql, tell MySQL that the data is in a GB-series character set, such as gbk. MySQL then converts the data from gbk to UTF-8 and saves it in the database.
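Continuing the same simulation, the sketch below mimics the import step: the bytes from the dump are decoded with the declared character set and stored as UTF-8. It also shows what happens if the data is declared as latin1 instead of gbk. `import_as` is a hypothetical helper and the codec names are Python's.

```python
def import_as(dump_bytes: bytes, declared_codec: str) -> bytes:
    """Simulate importing dump bytes declared as `declared_codec` into a
    utf8 column: decode with the declared charset, then store as UTF-8."""
    return dump_bytes.decode(declared_codec).encode("utf-8")


dump_bytes = "中文".encode("gbk")  # what the latin1 export produced

stored_utf8 = import_as(dump_bytes, "gbk")
print(stored_utf8 == "中文".encode("utf-8"))  # True: correct UTF-8 in the new table

# Declaring the data as latin1 instead turns each GBK byte into its own character:
wrong = import_as(dump_bytes, "latin-1")
print(wrong.decode("utf-8") == "中文")  # False: garbled text
```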

How do you tell MySQL which character set the imported SQL uses? One way is --default-character-set, but sometimes it has no effect. This is because the file produced by mysqldump contains a SET NAMES statement. For example:

head EA192.060913.sql

-- MySQL dump 10.10
--
-- Host: localhost    Database: EA192
-- ------------------------------------------------------
-- Server version	5.0.16-standard-log

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES latin1 */;

The /*! ... */ syntax is a MySQL-specific extension: other databases ignore it as an ordinary comment, while MySQL executes its contents. The number after /*!, here 40101, is a version number: the statement is executed only by MySQL 4.1.1 and later.
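To make the version rule concrete, here is a small Python sketch that reads the five-digit number after /*! as major * 10000 + minor * 100 + patch, so 40101 means 4.1.1. The function names and the regular expression are illustrative only, written against header lines like the ones above.

```python
import re
from typing import Optional, Tuple

# Five digits immediately after "/*!", e.g. "/*!40101 SET NAMES latin1 */;"
VERSION_COMMENT = re.compile(r"/\*!(\d{5})")


def comment_version(line: str) -> Optional[Tuple[int, int, int]]:
    """Extract the minimum server version from a /*!NNNNN ... */ comment."""
    match = VERSION_COMMENT.search(line)
    if match is None:
        return None
    number = int(match.group(1))
    return (number // 10000, (number // 100) % 100, number % 100)


def executed_by(line: str, server: Tuple[int, int, int]) -> bool:
    """True if a MySQL server of this version would run the comment's contents."""
    required = comment_version(line)
    return required is not None and server >= required


line = "/*!40101 SET NAMES latin1 */;"
print(comment_version(line))          # (4, 1, 1)
print(executed_by(line, (5, 0, 16)))  # True
print(executed_by(line, (4, 0, 27)))  # False
```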

Here we see SET NAMES latin1. One of its effects is to tell MySQL that the data sent by the client is in the latin1 character set. Because of this SET NAMES statement, --default-character-set has no effect. If your SQL file contains such a statement, you need to remove it or change it to SET NAMES gbk. When the file is large, you can do this with head and tail. For example, for the file above:
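If you would rather change the statement than delete it, a streaming rewrite avoids loading a large dump into memory. This Python sketch assumes the statement looks exactly like the header line above (/*!40101 SET NAMES latin1 */;) and rewrites only lines of that form; `rewrite_set_names`, `rewrite_file`, and any file names you pass to it are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Matches the mysqldump header line and keeps its original line ending.
SET_NAMES = re.compile(r"^(/\*!40101 SET NAMES )latin1( \*/;)(\s*)$")


def rewrite_set_names(lines: Iterable[str], new_charset: str = "gbk") -> Iterator[str]:
    """Yield dump lines, replacing the header's SET NAMES latin1 with new_charset."""
    for line in lines:
        match = SET_NAMES.match(line)
        if match:
            yield f"{match.group(1)}{new_charset}{match.group(2)}{match.group(3)}"
        else:
            yield line


def rewrite_file(src_path: str, dst_path: str, new_charset: str = "gbk") -> None:
    # latin-1 maps every byte to one character, so the GBK data passes
    # through byte-for-byte; newline="" keeps the original line endings.
    with open(src_path, encoding="latin-1", newline="") as src, \
         open(dst_path, "w", encoding="latin-1", newline="") as dst:
        dst.writelines(rewrite_set_names(src, new_charset))
```

Reading the file as latin-1 here is deliberate: decoding it as GBK or UTF-8 could fail or alter bytes, while latin-1 round-trips any byte sequence unchanged.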

First, use head to find the line that contains SET NAMES. In the output above it is line 10.

wc -l EA192.060913.sql
1987 EA192.060913.sql

This gives a total of 1987 lines.

head -9 EA192.060913.sql > final.sql
tail -1977 EA192.060913.sql >> final.sql

head -9 takes the first 9 lines and tail -1977 appends the last 1977 lines, so line 10 is left out.
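The head/tail arithmetic generalizes: to drop line n from a file of `total` lines, keep the first n - 1 lines and the last total - n lines. A Python sketch of the same idea follows; `split_counts` and `drop_line` are hypothetical helpers, and a real dump should again be opened as latin-1 so the GBK bytes pass through unchanged.

```python
from typing import Iterable, Iterator, Tuple


def split_counts(total: int, n: int) -> Tuple[int, int]:
    """Return the (head, tail) line counts that skip line n (1-based)."""
    if not 1 <= n <= total:
        raise ValueError("line number out of range")
    return n - 1, total - n


def drop_line(lines: Iterable[str], n: int) -> Iterator[str]:
    """Yield every line except line n (1-based), streaming like head + tail."""
    for number, line in enumerate(lines, start=1):
        if number != n:
            yield line


print(split_counts(1987, 10))  # (9, 1977), matching head -9 and tail -1977
```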

 

When you run final.sql with mysql, you can now use --default-character-set=gbk.

Another way is to pass --set-charset=false to mysqldump (equivalently, --skip-set-charset), so that no SET NAMES statement is written at all.

Even then, the CREATE TABLE statements may still cause problems, for example:

DROP TABLE IF EXISTS `test`;
CREATE TABLE `test` (
  `a` varchar(100) default NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

This still contains CHARSET=latin1, which makes the newly created table's default character set latin1 rather than UTF-8.

What can you do about it? If the file is small, you can delete the clause or change it to utf8 in an editor. If the file is large, you can use sed, although this may still take a long time.
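As an alternative to sed, the Python sketch below changes the table-level DEFAULT CHARSET=latin1 to utf8, but only on lines that close a CREATE TABLE statement (lines starting with ") ENGINE="), so INSERT data that happens to contain the same text is left alone. It assumes mysqldump's layout shown above, with all table options on the closing line, and does not touch column-level CHARACTER SET clauses; `fix_table_charset` is a hypothetical name.

```python
import re
from typing import Iterable, Iterator

# The closing line of a CREATE TABLE, e.g. ") ENGINE=InnoDB DEFAULT CHARSET=latin1;"
TABLE_OPTIONS = re.compile(r"^\)\s*ENGINE=.*\bDEFAULT CHARSET=latin1\b")


def fix_table_charset(lines: Iterable[str], new_charset: str = "utf8") -> Iterator[str]:
    """Yield dump lines with the table default charset switched to new_charset."""
    for line in lines:
        if TABLE_OPTIONS.match(line):
            line = line.replace("DEFAULT CHARSET=latin1",
                                f"DEFAULT CHARSET={new_charset}", 1)
        yield line


dump = [
    "CREATE TABLE `test` (\n",
    "  `a` varchar(100) default NULL\n",
    ") ENGINE=InnoDB DEFAULT CHARSET=latin1;\n",
    "INSERT INTO `test` VALUES ('DEFAULT CHARSET=latin1');\n",
]
print("".join(fix_table_charset(dump)), end="")
```

Only the third line changes; the INSERT row keeps its text as-is.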

Another option is to run mysqldump with --create-options=false so that table creation options are not exported. However, the storage engine type (InnoDB, MyISAM, etc.) is then omitted as well, which is a problem if the exported tables use different engines.

Also, do not use -B when exporting with mysqldump. Instead, name the database directly, which avoids the CREATE DATABASE statement: that statement may carry a default character set clause, which affects every table whose CREATE TABLE does not specify a character set. If the exported SQL does contain CREATE DATABASE, check whether it has a character set clause and, if so, modify it.
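If the dump was made with -B, the Python sketch below lists CREATE DATABASE lines that carry a default character set clause so you can check them. It assumes a form like the illustrative sample line below (mysqldump typically wraps the clause in a /*!40100 ... */ comment) and also accepts the DEFAULT CHARSET= spelling; `database_charset_clauses` is a hypothetical name.

```python
import re
from typing import Iterable, List, Tuple

CREATE_DB = re.compile(
    r"^CREATE DATABASE\b.*\bDEFAULT (?:CHARACTER SET|CHARSET)\s*=?\s*(\w+)",
    re.IGNORECASE,
)


def database_charset_clauses(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, charset) for each CREATE DATABASE with a charset clause."""
    found = []
    for number, line in enumerate(lines, start=1):
        match = CREATE_DB.match(line)
        if match:
            found.append((number, match.group(1)))
    return found


sample = [
    "-- MySQL dump 10.10\n",
    "CREATE DATABASE /*!32312 IF NOT EXISTS*/ `EA192` /*!40100 DEFAULT CHARACTER SET latin1 */;\n",
    "USE `EA192`;\n",
]
print(database_charset_clauses(sample))  # [(2, 'latin1')]
```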

That's it: you can now import the exported (or edited) file with mysql --default-character-set=gbk.

 

 

 

 
