Python 2 crawler: workaround for the MySQL error "Incorrect string value: '\xE6\x96\xB0\xE9\x97\xBB' for column 'new' at row 1" when storing data

Source: Internet
Author: User

I was used to how Python 3 handles encoding. When a Python 2 crawler connected to the database to store its data, the error above occurred. It is a database encoding problem.

Reposted from http://www.cnblogs.com/liuzhixin/p/6274821.html; thanks to the blogger for generously sharing.

Cause of the error: the error message can contain bytes such as 0xF0 0x9F 0x98 0x84, which form a 4-byte sequence in UTF-8 (see the UTF-8 encoding specification). Ordinary Chinese characters generally need no more than 3 bytes, so where does a 4-byte character come from? It is an emoji from a smartphone input method. Why does it cause an error? Because utf8 in MySQL is not true UTF-8: it can store characters of at most 3 bytes, and storing 4-byte characters requires the utf8mb4 character set. Before switching to utf8mb4, first make sure the MySQL version is not lower than 5.5.3.
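The byte counts mentioned above can be checked directly. The following is a minimal sketch, assuming Python 3 (or a wide Python 2 build); the helper names `utf8_length` and `needs_utf8mb4` are hypothetical and do not come from the original post.

```python
# -*- coding: utf-8 -*-

def utf8_length(ch):
    """Return how many bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def needs_utf8mb4(text):
    """Return True if any character in text needs four bytes in UTF-8."""
    return any(utf8_length(ch) == 4 for ch in text)


# An ASCII letter, a Chinese character (U+65B0) and an emoji (U+1F604).
for ch in (u"A", u"\u65b0", u"\U0001F604"):
    print(repr(ch), utf8_length(ch))
```

A check like `needs_utf8mb4` could be run on scraped text before an insert, to see whether a 3-byte utf8 column would reject it.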

Common character sets
      • ASCII: American Standard Code for Information Interchange; covers English letters, digits and punctuation; single-byte encoding in which 7 bits represent a character, 128 characters in total.
      • GBK: the Chinese Internal Code Extension Specification, an extension of GB2312; double-byte encoding covering Chinese, Japanese and Korean characters, English letters and digits; 21,003 Chinese characters in total.
      • UTF-8: a variable-length character encoding for Unicode. Unicode is the industry-wide standard that covers dozens of the world's writing systems.
      • utf8 (MySQL): encodes each character using one to three bytes.
      • utf8mb4: stores up to four bytes per character; used to store emoji, which need four bytes.
      • utf8mb4 requires MySQL 5.5.3 or later.
      • Other common character sets: UTF-32, UTF-16, Big5, latin1.
      • In a database, "character set" has two levels of meaning:
        • A collection of characters and symbols, including the scripts of various countries, punctuation, graphic symbols, digits, etc.
        • The character encoding, that is, the mapping rules between binary data and characters.

Solution:

1) Use the utf8mb4 character set. In the MySQL configuration file (my.cnf or my.ini), set:

[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci

Change the corresponding columns in the database to a utf8mb4 collation such as utf8mb4_general_ci (the statements below use utf8mb4_unicode_ci):

# For each database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
# For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# For each column (CHANGE takes the old column name followed by the new one):
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# Or use MODIFY instead of CHANGE (length stands for the column's length):
ALTER TABLE table_name MODIFY column_name VARCHAR(length) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT '';

utf8mb4 is fully backward compatible with utf8, so the conversion should not produce garbled text or any other form of data loss. In theory it can be changed safely ... Modifying the database this way is convenient.


Modify the database connection URL in the project and remove characterEncoding=utf-8; this step is required.
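For a Python crawler there is no connection URL; the analogous setting is usually the driver's charset argument. The sketch below assumes the PyMySQL driver, which the original post does not name. The helper `utf8mb4_connect_kwargs` and the host, user, password and database values are hypothetical placeholders.

```python
def utf8mb4_connect_kwargs(host, user, password, database):
    """Build keyword arguments for a MySQL driver's connect() call.

    The charset is set to utf8mb4 so that four-byte characters
    survive the trip from the crawler to the server.
    """
    return {
        "host": host,
        "user": user,
        "password": password,
        "database": database,
        "charset": "utf8mb4",
        "use_unicode": True,
    }


# Placeholder values, for illustration only.
kwargs = utf8mb4_connect_kwargs("localhost", "crawler", "secret", "news_db")

# With PyMySQL installed (an assumption), the connection would be opened as:
#   import pymysql
#   conn = pymysql.connect(**kwargs)
```

Keeping the settings in one helper makes it harder for a second connection elsewhere in the crawler to fall back to a different charset.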

2) Use custom filter rules that filter out the four-byte UTF-8 characters in the text or convert them to a custom replacement.

The following test example replaces each 4-byte character with 0000.

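// b_text is the byte[] holding the UTF-8 encoded text.
for (int i = 0; i < b_text.length; i++) {
    // A lead byte of the form 11110xxx starts a four-byte UTF-8 sequence.
    if ((b_text[i] & 0xF8) == 0xF0 && i + 3 < b_text.length) {
        for (int j = 0; j < 4; j++) {
            b_text[i + j] = 0x30; // overwrite with ASCII '0'
        }
        i += 3; // skip the rest of the replaced sequence
    }
}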
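Since the crawler itself is written in Python, the same filter as the Java loop above can be sketched in Python. This is an illustrative port, not code from the original post; the function name `mask_four_byte_sequences` and the `filler` parameter are hypothetical. It works on UTF-8 encoded bytes through a bytearray.

```python
# -*- coding: utf-8 -*-

def mask_four_byte_sequences(data, filler=b"0"):
    """Replace every four-byte UTF-8 sequence in data with four filler bytes.

    data is UTF-8 encoded bytes; the result has the same length.
    """
    buf = bytearray(data)
    i = 0
    while i < len(buf):
        # A lead byte of the form 11110xxx starts a four-byte sequence.
        if (buf[i] & 0xF8) == 0xF0 and i + 3 < len(buf):
            buf[i:i + 4] = filler * 4
            i += 4
        else:
            i += 1
    return bytes(buf)


raw = u"a\U0001F604b".encode("utf-8")
masked = mask_four_byte_sequences(raw)
```

Three-byte characters such as Chinese text pass through unchanged, so only emoji and other supplementary-plane characters are replaced.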

3) Switching to GBK encoding is also said to work, but I have not tried it.

Three ways to view MySQL character sets

First, view the character sets of the MySQL server and database.

    mysql> SHOW VARIABLES LIKE '%char%';

Second, view the character set of a MySQL table.

    mysql> SHOW TABLE STATUS FROM sqlstudy_db LIKE '%countries%';

Third, view the character sets of the columns in a MySQL table.

    mysql> SHOW FULL COLUMNS FROM countries;
