Scenario Description:
1, now on hundreds of MySQL database, character encoding latin1, now need to do an activity, will be on the MySQL database of some of the activity data synchronized to a MySQL summary database (latin1), and then the data synchronization Oracle, the last official website display.
2, Oracle is the active library, character set is ZHS16GBK, because a large part of the data are in the Oracle library, so you need to synchronize the data in MySQL to Oracle.
3, MySQL has a field name, the content is Chinese, all kinds of Martian text.
4, the official website is developed with Java, all projects are encoded in UTF8.
First you need to understand a few encodings:
1, Latin1 is iso-8859-1 alias, iso-8859-1 encoding is a single-byte encoding, so in support of the ISO-8859-1 system transmission and storage of any other encoded byte stream will not be discarded.
In other words, it's no problem to treat any other coded byte stream as a iso-8859-1 code. This is a very important feature, the MySQL database default code is Latin1 is the use of this feature.
The ASCII encoding is a 7-bit container, and the ISO-8859-1 encoding is a 8-bit container.
2, GBK, this is needless to say, the man's GB code, dedicated to the expression of Chinese characters, is a double-byte code, GBK is a subset of gb2312, GB2312 is a subset of GB18030.
3, UTF8, this is a variable-length encoding, it can use 1~4 byte to represent a symbol, according to different symbols and change the length of bytes.
Solve the idea through Java program:
1, will be from the mysql,iso-8859-1 query, and then converted into GBK encoding stored in Oracle, and then GBK read out of the official website display. Part of the show is fine, but GB18030 can't show the Martian text, many of the Martian texts are shown as.
2, will be from MySQL in UTF8 encoding read out, to UTF8 encoding stored in Oracle. This part is not decisive, why? Because MySQL is latin1 code, to UTF8 encoding is not normal read out, all is garbled.
3, the MySQL will be read iso-8859-1 encoding, and then turned into UTF8, and then UTF8 encoded into Oracle. ISO does not normally turn into UTF8 and is incompatible.
The above methods do not normally encode conversion, only in the summary database this way. If the database that summarizes MySQL is turned into UTF8, then the Java program will display correctly. Let's go!!!
The summary database is able to view the database of the Martian text, Linux support is much better than Java, may be due to open source and the problem of not open source.
1, logical Backup of the database:
mysqldump--default-character-set=latin1-q--single-transaction-t db_collect table1 table2
2. Recreate the UT8 Library and table structure
3, through the Linux below the Iconv command for transcoding
Lang=en_us
Crt=default
Sed-i ' s/latin1/utf8/g ' Db_collect.sql
Iconv-f gb18030-c-T UTF-8 Db_collect.sql-o Db_collect_result.sql
Mysql-f Db_collect2 < Db_collect_result.sql
4. Adjust system coding and CRT coding
Lang=en_us. UTF-8
Crt=utf-8
5, the normal display of data, through the Java program to UTF8 encoding way to view, display normal.
There is a problem, why Java turn the latin1 into GB18030 Martian text can not be shown, in Linux under the Iconv command to turn on it? Latin1 can not be directly converted into GB18030, can only be based on GB18030 coding, and then transferred to the UTF8 to support the Martian text.
The display is fine, but the new problem appears, and every time this transcoding occurs, the database will be unusable. Of course, you can increase the transcoding, and then import, but this is too much trouble.
Finally through a PHP script to solve the problem: directly from hundreds of databases to the default code query data, and then through the iconv into UTF8 encoding, insert directly into the UTF8 table.
However, it should be noted that the set names UTF8 is required before the insert, and the system encoding needs to be changed to UTF8.
PHP Script:
$total _conn = Open_mysql ($total _mysql[1], $total _mysql[2], $total _mysql[3], $total _mysql[4]);
mysql_query ("Set names UTF8;", $total _conn);
... Omitted
$total _sql = "INSERT into Db_collect2.table1 (name) VALUES ('". Iconv (' gb18030 ', ' UTF-8 ', $list [' name ']). "
mysql_query ($total _sql, $total _conn);
... Omitted
Execute script:
Export Lang=en_us. UTF-8
php/tmp/collect.php