Chinese garbled characters exported from Hive to MySQL

Source: Internet
Author: User

I. mysql Encoding
 
Mysql> show variables like 'collation _ % ';
+ ---------------------- + ------------------- +
| Variable_name | Value |
+ ---------------------- + ------------------- +
| Collation_connection | latin1_swedish_ci |
| Collation_database | latin1_swedish_ci |
| Collation_server | latin1_swedish_ci |
+ ---------------------- + ------------------- +
3 rows in set (0.00 sec)
Mysql> show variables like 'character _ set _ % ';
+ -------------------------- + ---------------------------- +

| Variable_name | Value |
+ -------------------------- + ---------------------------- +
| Character_set_client | latin1 |
| Character_set_connection | latin1 |
| Character_set_database | latin1 |
| Character_set_filesystem | binary |
| Character_set_results | latin1 |
| Character_set_server | latin1 |
| Character_set_system | utf8 |
| Character_sets_dir |/usr/share/mysql/charsets/|
+ -------------------------- + ---------------------------- +
8 rows in set (0.00 sec)
It can be seen that the default value is latin1 encoding, which causes Chinese garbled characters.
 
You can set the encoding in mysql.
Mysql> alter database name character set utf8;
Mysql> set character_set_connection = utf8;
 
Query OK, 0 rows affected (0.00 sec)
Mysql> set character_set_connection = utf8;
Query OK, 0 rows affected (0.00 sec)
Mysql> set character_set_results = utf8;
Query OK, 0 rows affected (0.00 sec)
Mysql> set character_set_server = utf8;
Query OK, 0 rows affected (0.00 sec)
However, it will become invalid after restart.
 
 
 
You can modify the configuration file:
[Root @ Hadoop48 ~] # Vi/etc/my. cnf
[Mysql]
Default-character-set = utf8
[Client]
Default-character-set = utf8
[Mysqld]
Default-character-set = utf8
Character_set_server = utf8
Init_connect = 'set NAMES utf8'
Restart mysql to ensure that the default encoding is utf8.
[Root @ Hadoop48 ~] # Service mysqld restart
Check whether it is changed to utf8:
Mysql> \ s
--------------
Mysql Ver 14.12 Distrib 5.0.95, for RedHat-linux-gnu (x86_64) using readline 5.1

Connection id: 2
Current database: toplists
Current user: root @ localhost
SSL: Not in use
Current pager: stdout
Using outfile :''
Using delimiter :;
Server version: 5.0.95 Source distribution
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: utf8
Db characterset: utf8
Client characterset: utf8
Conn. characterset: utf8
UNIX socket:/var/lib/mysql. sock
Uptime: 39 sec

Threads: 1 Questions: 12 Slow queries: 0 Opens: 15 Flush tables: 1 Open tables: 9 Queries per second avg: 0.308
--------------

Mysql> show variables like "char % ";
+ -------------------------- + ---------------------------- +
| Variable_name | Value |
+ -------------------------- + ---------------------------- +
| Character_set_client | utf8 |
| Character_set_connection | utf8 |
| Character_set_database | utf8 |
| Character_set_filesystem | binary |
| Character_set_results | utf8 |
| Character_set_server | utf8 |
| Character_set_system | utf8 |
| Character_sets_dir |/usr/share/mysql/charsets/|
+ -------------------------- + ---------------------------- +
8 rows in set (0.00 sec)

Mysql> show variables like "colla % ";
+ ---------------------- + ----------------- +
| Variable_name | Value |
+ ---------------------- + ----------------- +
| Collation_connection | utf8_general_ci |
| Collation_database | utf8_general_ci |
| Collation_server | utf8_general_ci |
+ ---------------------- + ----------------- +
3 rows in set (0.00 sec)
Change mysql encoding to utf8, specify the Encoding As utf8 in the command line, and run the Export command to report an error:
[Zhouhh @ Hadoop46 ~] $ Sqoop export -- connect "jdbc: mysql: // Hadoop48/toplists? UseUnicode = true & characterEncoding = UTF-8 "-m 1 -- table award -- export-dir/user/hive/warehouse/hive_myaward/000000_0 -- input-null-string" \\\\ N "-- input-null-non-string" \\\\ N "-- input-fields-terminated-by" \ 01 "-- input-lines-terminated-"\\ n"

12/07/20 13:17:22 INFO mapred. JobClient: Task Id: attempt_201207191159_0233_m_000000_0, Status: FAILED
Java. io. IOException: java. SQL. SQLException: Incorrect string value: '\ xE6 \ x9D \ x80 \ xE7 \ xA0 \ xb4...' for column 'Nick 'at row 1
 
 
In the past, the default encoding of the table I created was not utf8. You also need to change it.
Mysql> alter table award convert to character set utf8 COLLATE utf8_general_ci;
Query OK, 0 rows affected (0.00 sec)
Records: 0 Duplicates: 0 Warnings: 0
Export again
[Zhouhh @ Hadoop46 ~] $ Sqoop export -- connect "jdbc: mysql: // Hadoop48/toplists? UseUnicode = true & characterEncoding = UTF-8 "-m 1 -- table award -- export-dir/user/hive/warehouse/hive_myaward/000000_0 -- input-null-string" \\\\ N "-- input-null-non-string" \\\\ N "-- input-fields-terminated-by" \ 01 "-- input-lines-terminated-"\\ n"
...
12/07/20 14:03:10 INFO mapred. JobClient: map 0% reduce 0%
12/07/20 14:03:24 INFO mapred. JobClient: map 100% reduce 0%
...
12/07/20 14:03:29 INFO mapreduce. ExportJobBase: Exported 2 record Checklist
Mysql> select * from award;
+ ------------------------------- + ----------- + ------ + ----------- + -------- + ------------- + ----------------- + ----------- +
| Rowkey | productid | matchid | rank | tourneyid | userid | gameid | gold | loginid | nick | plat |
+ ------------------------------- + ----------- + ------ + ----------- + -------- + ------------- + ----------------- + ----------- +
| 06: 55: 00: 402713629 | 5947 | 433203828 | 2 | 4027102 | 402713629 | 1001 | NULL | 715878221 | A | ios |
| 06: 55: 00: 406788559 | 778 | 433203930 | 19 | 4017780 | 406788559 | 1001 | 1 | 13835155880 | Android |
+ ------------------------------- + ----------- + ------ + ----------- + -------- + ------------- + ----------------- + ----------- +
2 rows in set (0.00 sec)
 
 
Solve the problem of Chinese encoding.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.