Problems related to Chinese garbled characters exported from Hive to Mysql

Source: Internet
Author: User
Tags sqoop
In the previous article export data from Hive to Mysql, although hbase data is successfully exported to mysql through hive, however, we encountered Chinese garbled characters. 1. mysql encoding mysqlshowvariableslikecollation _ %; + -------------------- +

In the previous article export data from Hive to Mysql, although hbase data is successfully exported to mysql through hive, however, we encountered Chinese garbled characters. 1. mysql encoding mysql show variables like 'collation _ % '; + -------------------- +

Chinese garbled characters exported from Hive to Mysql

In the previous article export data from hive to mysql, although hbase data is successfully exported to mysql through hive, we encountered Chinese garbled characters.

I. mysql Encoding
mysql> show variables like 'collation_%';+----------------------+-------------------+| Variable_name | Value |+----------------------+-------------------+| collation_connection | latin1_swedish_ci || collation_database | latin1_swedish_ci || collation_server | latin1_swedish_ci |+----------------------+-------------------+3 rows in set (0.00 sec)mysql> show variables like 'character_set_%';+--------------------------+----------------------------+| Variable_name | Value |+--------------------------+----------------------------+| character_set_client | latin1 || character_set_connection | latin1 || character_set_database | latin1 || character_set_filesystem | binary || character_set_results | latin1 || character_set_server | latin1 || character_set_system | utf8 || character_sets_dir | /usr/share/mysql/charsets/ |+--------------------------+----------------------------+8 rows in set (0.00 sec)

? It can be seen that the default value is latin1 encoding, which causes Chinese garbled characters.

You can set the encoding in mysql.
Mysql> alter database name character set utf8;
Mysql> set character_set_connection = utf8;

Query OK, 0 rows affected (0.00 sec)
Mysql> set character_set_connection = utf8;
Query OK, 0 rows affected (0.00 sec)
Mysql> set character_set_results = utf8;
Query OK, 0 rows affected (0.00 sec)
Mysql> set character_set_server = utf8;
Query OK, 0 rows affected (0.00 sec)
However, it will become invalid after restart.

?

You can modify the configuration file:

[root@Hadoop48 ~]# vi /etc/my.cnf[mysql]default-character-set=utf8[client]default-character-set=utf8[mysqld]default-character-set=utf8character_set_server=utf8init_connect='SET NAMES utf8'

? Restart mysql to ensure that the default encoding is utf8.

[root@Hadoop48 ~]# service mysqld restart

? Check whether it is changed to utf8:

mysql> \s--------------mysql Ver 14.12 Distrib 5.0.95, for redhat-linux-gnu (x86_64) using readline 5.1Connection id: 2Current database: toplistsCurrent user: root@localhostSSL: Not in useCurrent pager: stdoutUsing outfile: ''Using delimiter: ;Server version: 5.0.95 Source distributionProtocol version: 10Connection: Localhost via UNIX socketServer characterset: utf8Db characterset: utf8Client characterset: utf8Conn. characterset: utf8UNIX socket: /var/lib/mysql/mysql.sockUptime: 39 secThreads: 1 Questions: 12 Slow queries: 0 Opens: 15 Flush tables: 1 Open tables: 9 Queries per second avg: 0.308--------------mysql> show variables like "char%";+--------------------------+----------------------------+| Variable_name | Value |+--------------------------+----------------------------+| character_set_client | utf8 || character_set_connection | utf8 || character_set_database | utf8 || character_set_filesystem | binary || character_set_results | utf8 || character_set_server | utf8 || character_set_system | utf8 || character_sets_dir | /usr/share/mysql/charsets/ |+--------------------------+----------------------------+8 rows in set (0.00 sec)mysql> show variables like "colla%";+----------------------+-----------------+| Variable_name | Value |+----------------------+-----------------+| collation_connection | utf8_general_ci || collation_database | utf8_general_ci || collation_server | utf8_general_ci |+----------------------+-----------------+3 rows in set (0.00 sec)

? Change mysql encoding to utf8, specify the Encoding As utf8 in the command line, and run the Export command to report an error:

[zhouhh@Hadoop46 ~]$ sqoop export --connect "jdbc:mysql://Hadoop48/toplists?useUnicode=true&characterEncoding=utf-8" -m 1 --table award --export-dir /user/hive/warehouse/hive_myaward/000000_0 --input-null-string "\\\\N" --input-null-non-string "\\\\N" --input-fields-terminated-by "\\01" --input-lines-terminated-by "\\n"12/07/20 13:17:22 INFO mapred.JobClient: Task Id : attempt_201207191159_0233_m_000000_0, Status : FAILEDjava.io.IOException: java.sql.SQLException: Incorrect string value: '\xE6\x9D\x80\xE7\xA0\xB4...' for column 'nick' at row 1

? In the past, the default encoding of the table I created was not utf8. You also need to change it.

mysql> ALTER TABLE award CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;Query OK, 0 rows affected (0.00 sec)Records: 0 Duplicates: 0 Warnings: 0

? Export again

[zhouhh@Hadoop46 ~]$ sqoop export --connect "jdbc:mysql://Hadoop48/toplists?useUnicode=true&characterEncoding=utf-8" -m 1 --table award --export-dir /user/hive/warehouse/hive_myaward/000000_0 --input-null-string "\\\\N" --input-null-non-string "\\\\N" --input-fields-terminated-by "\\01" --input-lines-terminated-by "\\n"...12/07/20 14:03:10 INFO mapred.JobClient: map 0% reduce 0%12/07/20 14:03:24 INFO mapred.JobClient: map 100% reduce 0%...12/07/20 14:03:29 INFO mapreduce.ExportJobBase: Exported 2 record

?

Checklist mysql> select * from award; + region + ----------- + ------ + ----------- + -------- + ------------- + region + --------- + | rowkey | productid | matchid | rank | tourneyid | userid | gameid gold | loginid | nick | plat | + response + ----------- + ------ + ----------- + -------- + ------------- + ----------------- + ----------- + | 2012-04-27 06: 55: 00: 402713629 | 5947 | 433203828 | 2 | 4027102 | 402713629 | 1001 | NULL | 715878221 | day A | ios | 2012-04-27 06: 55: 00: 406788559 | 778 | 433203930 | 19 | 4017780 | 406788559 | 1 | 1001 | android | + --------------------------------------- + ----------- + --------- + ----------- + -------- + ------ + ------------- + ----------------- + --------- + 2 rows in set (0.00 sec)

? Solve the problem of Chinese encoding.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.