I. mysql Encoding
Mysql> show variables like 'collation _ % ';
+ ---------------------- + ------------------- +
| Variable_name | Value |
+ ---------------------- + ------------------- +
| Collation_connection | latin1_swedish_ci |
| Collation_database | latin1_swedish_ci |
| Collation_server | latin1_swedish_ci |
+ ---------------------- + ------------------- +
3 rows in set (0.00 sec)
Mysql> show variables like 'character _ set _ % ';
+ -------------------------- + ---------------------------- +
| Variable_name | Value |
+ -------------------------- + ---------------------------- +
| Character_set_client | latin1 |
| Character_set_connection | latin1 |
| Character_set_database | latin1 |
| Character_set_filesystem | binary |
| Character_set_results | latin1 |
| Character_set_server | latin1 |
| Character_set_system | utf8 |
| Character_sets_dir |/usr/share/mysql/charsets/|
+ -------------------------- + ---------------------------- +
8 rows in set (0.00 sec)
It can be seen that the default value is latin1 encoding, which causes Chinese garbled characters.
You can set the encoding in mysql.
Mysql> alter database name character set utf8;
Mysql> set character_set_connection = utf8;
Query OK, 0 rows affected (0.00 sec)
Mysql> set character_set_connection = utf8;
Query OK, 0 rows affected (0.00 sec)
Mysql> set character_set_results = utf8;
Query OK, 0 rows affected (0.00 sec)
Mysql> set character_set_server = utf8;
Query OK, 0 rows affected (0.00 sec)
However, it will become invalid after restart.
You can modify the configuration file:
[Root @ Hadoop48 ~] # Vi/etc/my. cnf
[Mysql]
Default-character-set = utf8
[Client]
Default-character-set = utf8
[Mysqld]
Default-character-set = utf8
Character_set_server = utf8
Init_connect = 'set NAMES utf8'
Restart mysql to ensure that the default encoding is utf8.
[Root @ Hadoop48 ~] # Service mysqld restart
Check whether it is changed to utf8:
Mysql> \ s
--------------
Mysql Ver 14.12 Distrib 5.0.95, for RedHat-linux-gnu (x86_64) using readline 5.1
Connection id: 2
Current database: toplists
Current user: root @ localhost
SSL: Not in use
Current pager: stdout
Using outfile :''
Using delimiter :;
Server version: 5.0.95 Source distribution
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: utf8
Db characterset: utf8
Client characterset: utf8
Conn. characterset: utf8
UNIX socket:/var/lib/mysql. sock
Uptime: 39 sec
Threads: 1 Questions: 12 Slow queries: 0 Opens: 15 Flush tables: 1 Open tables: 9 Queries per second avg: 0.308
--------------
Mysql> show variables like "char % ";
+ -------------------------- + ---------------------------- +
| Variable_name | Value |
+ -------------------------- + ---------------------------- +
| Character_set_client | utf8 |
| Character_set_connection | utf8 |
| Character_set_database | utf8 |
| Character_set_filesystem | binary |
| Character_set_results | utf8 |
| Character_set_server | utf8 |
| Character_set_system | utf8 |
| Character_sets_dir |/usr/share/mysql/charsets/|
+ -------------------------- + ---------------------------- +
8 rows in set (0.00 sec)
Mysql> show variables like "colla % ";
+ ---------------------- + ----------------- +
| Variable_name | Value |
+ ---------------------- + ----------------- +
| Collation_connection | utf8_general_ci |
| Collation_database | utf8_general_ci |
| Collation_server | utf8_general_ci |
+ ---------------------- + ----------------- +
3 rows in set (0.00 sec)
Change mysql encoding to utf8, specify the Encoding As utf8 in the command line, and run the Export command to report an error:
[Zhouhh @ Hadoop46 ~] $ Sqoop export -- connect "jdbc: mysql: // Hadoop48/toplists? UseUnicode = true & characterEncoding = UTF-8 "-m 1 -- table award -- export-dir/user/hive/warehouse/hive_myaward/000000_0 -- input-null-string" \\\\ N "-- input-null-non-string" \\\\ N "-- input-fields-terminated-by" \ 01 "-- input-lines-terminated-"\\ n"
12/07/20 13:17:22 INFO mapred. JobClient: Task Id: attempt_201207191159_0233_m_000000_0, Status: FAILED
Java. io. IOException: java. SQL. SQLException: Incorrect string value: '\ xE6 \ x9D \ x80 \ xE7 \ xA0 \ xb4...' for column 'Nick 'at row 1
In the past, the default encoding of the table I created was not utf8. You also need to change it.
Mysql> alter table award convert to character set utf8 COLLATE utf8_general_ci;
Query OK, 0 rows affected (0.00 sec)
Records: 0 Duplicates: 0 Warnings: 0
Export again
[Zhouhh @ Hadoop46 ~] $ Sqoop export -- connect "jdbc: mysql: // Hadoop48/toplists? UseUnicode = true & characterEncoding = UTF-8 "-m 1 -- table award -- export-dir/user/hive/warehouse/hive_myaward/000000_0 -- input-null-string" \\\\ N "-- input-null-non-string" \\\\ N "-- input-fields-terminated-by" \ 01 "-- input-lines-terminated-"\\ n"
...
12/07/20 14:03:10 INFO mapred. JobClient: map 0% reduce 0%
12/07/20 14:03:24 INFO mapred. JobClient: map 100% reduce 0%
...
12/07/20 14:03:29 INFO mapreduce. ExportJobBase: Exported 2 record Checklist
Mysql> select * from award;
+ ------------------------------- + ----------- + ------ + ----------- + -------- + ------------- + ----------------- + ----------- +
| Rowkey | productid | matchid | rank | tourneyid | userid | gameid | gold | loginid | nick | plat |
+ ------------------------------- + ----------- + ------ + ----------- + -------- + ------------- + ----------------- + ----------- +
| 06: 55: 00: 402713629 | 5947 | 433203828 | 2 | 4027102 | 402713629 | 1001 | NULL | 715878221 | A | ios |
| 06: 55: 00: 406788559 | 778 | 433203930 | 19 | 4017780 | 406788559 | 1001 | 1 | 13835155880 | Android |
+ ------------------------------- + ----------- + ------ + ----------- + -------- + ------------- + ----------------- + ----------- +
2 rows in set (0.00 sec)
Solve the problem of Chinese encoding.