First, the code in MySQL
mysql> show variables like ' collation_% '; +----------------------+-------------------+| variable_name | value |+----------------------+-------------------+| collation_ connection | latin1_swedish_ci | | collation_database | latin1_swedish_ci | | collation_server | latin1_swedish_ci |+----------------------+-------------------+3 rows in set (0.00 sec) mysql> show variables like ' character_set_% '; +--------------------------+----------------------------+| variable_name | value |+------- -------------------+----------------------------+| character_set_client | latin1 | | character_set_connection | latin1 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | latin1 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ |+--------------------------+------------ ----------------+8 rows in set (0.00 SEC)
The original default is Latin1 encoding, will result in Chinese garbled.
You can set the encoding in MySQL, a single setting
mysql> ALTER DATABASE name character set UTF8;
Mysql> set Character_set_connection=utf8;
Query OK, 0 rows Affected (0.00 sec)
Mysql> set Character_set_connection=utf8;
Query OK, 0 rows Affected (0.00 sec)
Mysql> set Character_set_results=utf8;
Query OK, 0 rows Affected (0.00 sec)
Mysql> set Character_set_server=utf8;
Query OK, 0 rows Affected (0.00 sec)
However, it will fail after a reboot.
You can modify the configuration file:
[[email protected] ~]# Vi/etc/my.cnf[mysql]default-character-set=utf8[client]default-character-set=utf8[mysqld] default-character-set=utf8character_set_server=utf8init_connect= ' Set NAMES UTF8 '
Restart MySQL so that the default encoding is UTF8
[Email protected] ~]# service mysqld restart
See if it becomes UTF8:
Mysql> \s--------------mysql ver 14.12 distrib 5.0.95, for redhat-linux-gnu (x86_64) using readline 5.1Connection id: 2Current Database: toplistscurrent user: [email protected]ssl: not in usecurrent pager: stdoutUsing outfile: ' using delimiter: ; server version: 5.0.95 source distributionprotocol version: 10connection: localhost via unix socketserver characterset: utf8db characterset: Utf8client characterset: utf8conn. characterset: utf8unix socket: /var/lib/mysql /mysql.sockuptime: 39 secthreads: 1 questions: 12 slow queries: 0 Opens: 15 Flush tables: 1 Open tables: 9 Queries per second avg: 0.308--------------mysql> show variables like "char%", +--------------------------+----------------------------+| variable_name | value |+--------------------------+----------------------------+| character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | utf8 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ |+--------------------------+------------ ----------------+8 rows in set (0.00 sec) mysql> show variables like "colla%", +----------------------+-----------------+| variable_name | value |+------- ---------------+-----------------+| collation_connection | utf8_general_ci | | collation_database | utf8_general_ci | | collation_server | utf8_general_ci |+----------------------+-----------------+3 rows in set (0.00 SEC)
change the MySQL encoding to UTF8 and specify the encoding as UTF8 on the command line, and perform an export error:
[[email protected] ~]$ sqoop export --connect "jdbc:mysql://Hadoop48/ Toplists?useunicode=true&characterencoding=utf-8 " -m 1 --table award --export-dir /user/hive/warehouse/hive_myaward/000000_0 --input-null-string "\\\\n" -- input-null-non-string "\\\\n" --input-fields-terminated-by "\\01" -- input-lines-terminated-by "\\n" 12/07/20 13:17:22 info mapred. jobclient: task id : attempt_201207191159_0233_m_000000_0, status : failedjava.io.ioexception: java.sql.sqlexception: incorrect string value: ' \xE6\x9D\ X80\xe7\xa0\xb4 ' for column ' Nick ' at row 1
Originally I created the table default encoding is not UTF8, also need to change a bit.
Mysql> ALTER TABLE Award CONVERT to CHARACTER SET UTF8 COLLATE utf8_general_ci; Query OK, 0 rows Affected (0.00 sec) records:0 duplicates:0 warnings:0
re-export
[[email protected] ~]$ sqoop export --connect "jdbc:mysql://Hadoop48/ Toplists?useunicode=true&characterencoding=utf-8 " -m 1 --table award --export-dir /user/hive/warehouse/hive_myaward/000000_0 --input-null-string "\\\\n" -- input-null-non-string "\\\\n" --input-fields-terminated-by "\\01" -- input-lines-terminated-by "\\n" ... 12/07/20 14:03:10 info mapred. Jobclient: map 0% reduce 0%12/07/20 14:03:24 info mapred. jobclient: map 100% reduce 0% ... 12/07/20 14:03:29 info mapreduce. Exportjobbase: exported 2 record
Checklist mysql> select * from award;+-------------------------------+-----------+-----------+ ------+-----------+-----------+--------+------+-------------+-----------------+---------+| rowkey | productid | matchid | rank | tourneyid | userid | gameid | gold | loginid | nick | plat |+------------------------- ------+-----------+-----------+------+-----------+-----------+--------+------+-------------+-----------------+- --------+| 2012-04-27 06:55:00:402713629 | 5947 | 433203828 | 2 | 4027102 | 402713629 | 1001 | NULL | 715878221 | Kill Day A | ios | | 2012-04-27 06:55:00:406788559 | 778 | 433203930 | 19 | 4017780 | 406788559 | 1001 | 1 | 13835155880 | Pro 牛牛旦旦 | android |+-------------------------------+-----------+-----------+------+-----------+----------- +--------+------+-------------+-----------------+---------+2 rows in set (0.00 sec)
Sqoop Import MySQL Database garbled