Python, MySQL, UTF-8 and Latin

Source: Internet
Author: User
Tags: python, mysql

Recently I have been doing text analysis on news collected by a crawler. The crawler code I downloaded is based on this source:

https://jooop.github.io/2017/01/29/python3%E7%BD%91%E6%98%93%E7%88%AC%E8%99%AB/#1-%e6%a8%a1%e5%9d%97%e7%9a%84%e9%80%89%e6%8b%a9%e5%92%8c%e5%88%97%e8%a1%a8%e9%a1%b5%e9%9d%a2%e7%9a%84%e7%88%ac%e5%8f%96%ef%bc%9a

Environment: Python 2.7 + MySQL 5.6 + Windows 7 + PyCharm (IDE). The code can be used directly.

Because the crawler stores Chinese text in a MySQL database, I ran into several problems along the way: Chinese text showing up garbled, Chinese stored in the database not displaying correctly, and characters printed from Python not appearing as Chinese.

In the end, all of these turned out to be encoding problems, so I am writing this down as a note. For people who work with data, there is no getting around encodings.

First, encoding problems on the Python side:

Python 2 (Python 2.6, Python 2.7, etc.) has two string types, str and unicode. The encoding of a str literal is determined by the encoding of the source file. UTF-8 is the usual choice today, so declare it at the top of the .py file: # -*- coding: utf-8 -*-

Inside a Python program, text is best kept as unicode strings, which are an in-memory representation. Before it is written to a file or a log, a unicode string must be encoded into the storage encoding of a specific character set, such as UTF-8; when it is read back, the bytes are decoded into unicode again.
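The decode-on-input, encode-on-output rule can be sketched in Python 3, where str plays the role of Python 2's unicode and bytes plays the role of Python 2's str. This is an illustration under that assumption, not code from the original crawler; to_storage and from_storage are hypothetical helper names.

```python
# Python 3 sketch: str ~ Python 2 unicode (in-memory text), bytes ~ Python 2 str (encoded data)

def to_storage(text: str, encoding: str = "utf-8") -> bytes:
    """Encode in-memory text into bytes for a file, log or socket (hypothetical helper)."""
    return text.encode(encoding)


def from_storage(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode stored bytes back into in-memory text (hypothetical helper)."""
    return raw.decode(encoding)


text = "中文"
raw = to_storage(text)
print(type(text).__name__, type(raw).__name__)  # str bytes
print(from_storage(raw) == text)  # True

# The storage encoding must be able to represent every character in the text
try:
    to_storage(text, "ascii")
except UnicodeEncodeError as exc:
    print("ascii cannot store it:", exc.reason)
```

The round trip only works when the same encoding is used on both sides; picking an encoding that cannot represent the characters fails at write time instead of silently corrupting data.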

What are Unicode and UTF-8, and how are they related?

Unicode (also called the universal code or single code) is an industry standard in computer science that covers a character set, encoding schemes, and more. It was created to overcome the limitations of traditional character encodings: it assigns a single, unique number (a code point) to every character of every language, which makes cross-language, cross-platform text conversion and processing possible. As long as a computer supports Unicode, text in any language saved in a Unicode encoding can be read correctly on other computers.

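UTF-8 (8-bit Unicode Transformation Format) is a variable-length character encoding for Unicode that uses 1 to 4 bytes per code point. It is also a prefix code: the first byte of each sequence tells the decoder how many bytes belong to the character. It is sometimes also called the universal code.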
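A short Python 3 sketch makes the variable length visible; the sample characters are arbitrary choices from different Unicode ranges, not taken from the original article.

```python
# Python 3: how many bytes UTF-8 needs for characters from different ranges
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]  # A, e-acute, 中, an emoji

for ch in samples:
    encoded = ch.encode("utf-8")
    # The lead byte announces the sequence length; every following byte is a
    # continuation byte of the form 10xxxxxx (0x80-0xBF)
    assert all(0x80 <= b <= 0xBF for b in encoded[1:])
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s), {encoded.hex()}")
```

The four samples take 1, 2, 3 and 4 bytes respectively; most Chinese characters fall in the 3-byte range.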

In short, Unicode is the concept (a character set with a code point for every character), and UTF-8 is one concrete implementation of it; UTF-16 and UTF-32 are others. As an analogy: the boss says "we are going to build a big data architecture" (the concept, like Unicode, without knowing what the implementation will be), and the programmers build it on Hadoop (the concrete implementation, like UTF-8).
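The concept-versus-implementation distinction can be sketched in Python 3: the same code points produce different bytes under different Unicode encodings, yet each decodes back to the same text. The big-endian codec names are used here only to avoid a byte-order mark in the output.

```python
# Python 3: one abstract string (Unicode code points), several concrete encodings
text = "中文"

print([f"U+{ord(ch):04X}" for ch in text])  # ['U+4E2D', 'U+6587']

for codec in ("utf-8", "utf-16-be", "utf-32-be"):
    raw = text.encode(codec)
    # Different bytes for each encoding, but decoding always gives back the same text
    print(codec, raw.hex(), raw.decode(codec) == text)
```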

Here are the experiments in the Python 2.7 interactive shell:

EX1:

In [10]: "中文"
Out[10]: '\xe4\xb8\xad\xe6\x96\x87'

Here the shell echoes the UTF-8 encoded bytes of the literal rather than the Chinese characters 中文 ("Chinese") themselves. In that output:

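    • \x: a hexadecimal escape; it is followed by exactly two hex digits, which together give the value of a single byte (\xe4 is byte 228);
      • for comparison, an octal escape is a backslash followed by up to three octal digits (\101 is "A"); Python string literals have no decimal escape.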
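In Python 3 a string literal is already text, so reproducing the bytes from EX1 requires an explicit encode. The sketch below shows those bytes and checks the escape rules described above; it is an illustration, not part of the original experiment.

```python
# Python 3: reproduce the bytes shown in EX1
raw = "中文".encode("utf-8")
print(raw)  # b'\xe4\xb8\xad\xe6\x96\x87'
print(len(raw))  # 6 bytes for 2 characters

# \x is followed by exactly two hex digits, i.e. one byte value
print(raw[0], int("e4", 16))  # 228 228

# An octal escape is a backslash plus up to three octal digits
print("\x41" == "\101" == "A")  # True
```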

EX2: To see the Chinese characters you have to use print. Why? Without print, the interactive shell only echoes repr() of the value, which for a Python 2 str shows every non-ASCII byte as a \x escape. With print, the bytes themselves are written to the terminal, which renders them as characters. This assumes the terminal uses UTF-8; if it uses a different encoding, the output is garbled. To change the encoding, see

In [11]: print "中文"
中文

In [12]: import sys; sys.getdefaultencoding()
Out[12]: 'utf-8'
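Python 3 keeps the same distinction between what the shell echoes (repr()) and what print() shows (str()). Because Python 3 bytes never print as characters, this sketch decodes them first; the variable names are illustrative.

```python
# Python 3 sketch: the interactive shell echoes repr(); print() uses str()
raw = "中文".encode("utf-8")  # plays the role of a Python 2 str (encoded bytes)
text = raw.decode("utf-8")     # plays the role of a Python 2 unicode (text)

print(repr(raw))   # escaped bytes: b'\xe4\xb8\xad\xe6\x96\x87'
print(repr(text))  # quoted characters: '中文'
print(text)        # the characters themselves: 中文
```

The last line still depends on the terminal being able to display the characters, exactly as in the Python 2 case.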

Answer: it comes down to how print displays a value.

Figure 1. How print displays a value

When you call print on a variable var in Python 2.7, it is handled as follows. If var is of type str, its bytes are passed to the terminal unchanged. If var is of type unicode, Python first encodes it into a str, using the encoding of stdout (sys.stdout.encoding), and then passes that to the terminal. If the encoding of those bytes differs from the encoding the terminal is set to, the output is likely to be garbled.
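The garbling described above is simply bytes being decoded with the wrong encoding. The Python 3 sketch below assumes a Latin-1 reader, the kind of mismatch that can occur when UTF-8 bytes pass through a latin1 MySQL connection; it is an illustration, not the original author's setup.

```python
import sys

# print() encodes text with the stream's encoding
print("stdout encoding:", sys.stdout.encoding)

original = "中文"
utf8_bytes = original.encode("utf-8")

# Mismatch: bytes written as UTF-8 but interpreted as Latin-1
garbled = utf8_bytes.decode("latin-1")
print(ascii(garbled))  # '\xe4\xb8\xad\xe6\x96\x87': six Latin-1 characters, not two Chinese ones

# Latin-1 maps every byte to exactly one character, so this particular mix-up is reversible
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```

The repair only works because Latin-1 is lossless byte for byte; if the wrongly decoded text has been re-saved through an encoding that drops or replaces bytes, the original cannot be recovered this way.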

EX3: Chinese in lists and dictionaries

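data = {"A": "Hello", "B": "中国"}  # "中国" means "China"; both values are byte strings (str)

If we print data directly, or convert it to a string with str(), the Chinese value is not displayed as characters. Printing a dict or a list formats each element with repr(), so the value appears as escaped bytes. The bytes shown below, \xd6\xd0\xb9\xfa, are 中国 in GBK, the default console encoding on Chinese Windows; from a UTF-8 source file they would be \xe4\xb8\xad\xe5\x9b\xbd.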

>>> data = {"A": "Hello", "B": "中国"}
>>> print data
{'A': 'Hello', 'B': '\xd6\xd0\xb9\xfa'}

Printing the Chinese field on its own works fine:

>>> print data['B']
中国
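The container-versus-element behaviour can be reproduced in Python 3, assuming the value is held as GBK bytes to match the escapes shown above; this is a sketch of the mechanism, not the original session.

```python
# Python 3 sketch: containers show their elements with repr(), so byte values appear escaped
data = {"A": "Hello", "B": "中国".encode("gbk")}  # bytes, like a Python 2 str on a GBK console
print(data)  # {'A': 'Hello', 'B': b'\xd6\xd0\xb9\xfa'}

# Printing the single value after decoding shows the characters
print(data["B"].decode("gbk"))  # 中国
```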

To print the whole dictionary readably, serialize it with json.dumps and ensure_ascii=False; the result is a single str:

>>> import json
>>> data = {"A": "Hello", "B": "中国"}
>>> s = json.dumps(data, ensure_ascii=False)
>>> print s
{"A": "Hello", "B": "中国"}
>>> print isinstance(s, str)
True
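For comparison, a Python 3 sketch of what ensure_ascii changes: with the default (True), json.dumps writes non-ASCII characters as \uXXXX escape sequences; with False, it keeps the characters. Both forms parse back to the same data.

```python
import json

data = {"A": "Hello", "B": "中国"}

escaped = json.dumps(data)                       # default ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)  # keep non-ASCII characters as-is

print(escaped)   # {"A": "Hello", "B": "\u4e2d\u56fd"}
print(readable)  # {"A": "Hello", "B": "中国"}
print(json.loads(escaped) == json.loads(readable) == data)  # True
```

Escaped output is safe for any ASCII-only channel, while the readable form is what you want to see in logs or store as text.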

Next, how to store this data correctly in MySQL.

First, MySQL must be configured to store UTF-8. Edit my.ini in the MySQL installation directory:

[client]
#password = your_password
port = 3306
socket = /tmp/mysql.sock
default-character-set = utf8

[mysqld]
port = 3306
character-set-server = utf8
collation-server = utf8_general_ci
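One caveat worth checking: MySQL's utf8 character set (an alias of utf8mb3) stores at most three bytes per character, so characters that need four bytes in UTF-8, such as most emoji, require utf8mb4 (available since MySQL 5.5.3). The hypothetical helper below is a minimal Python 3 sketch that finds such characters before an insert.

```python
from typing import List


def non_bmp_chars(text: str) -> List[str]:
    """Return the characters that need 4 bytes in UTF-8 (hypothetical helper).

    MySQL's 3-byte utf8 charset cannot store these; utf8mb4 can.
    """
    return [ch for ch in text if len(ch.encode("utf-8")) > 3]


print(non_bmp_chars("中文 news"))                             # []
print([hex(ord(c)) for c in non_bmp_chars("中文 \U0001F600")])  # ['0x1f600']
```

Depending on the goal, such characters can be stripped before insertion, or the server, tables and connection can be switched to utf8mb4.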

Second, make sure the default character set of the tables and columns you create is also UTF-8. For how to check and change it, see this page:

https://www.cnblogs.com/wcwen1990/p/6917109.html
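The checks and changes involved are ordinary MySQL statements. The Python 3 sketch below only builds them as strings so they can be reviewed or passed to whatever MySQL client you use; the helper name and the database and table names (news, articles) are hypothetical.

```python
from typing import List


def charset_statements(db: str, table: str, charset: str = "utf8",
                       collation: str = "utf8_general_ci") -> List[str]:
    """Build MySQL statements to inspect and convert a table's character set.

    Hypothetical helper: identifiers are interpolated as-is, so only pass trusted names.
    """
    return [
        "SHOW VARIABLES LIKE 'character_set_%';",      # server, client and connection charsets
        f"SHOW CREATE TABLE `{db}`.`{table}`;",         # table default charset
        f"SHOW FULL COLUMNS FROM `{db}`.`{table}`;",    # per-column collation
        f"ALTER DATABASE `{db}` CHARACTER SET {charset} COLLATE {collation};",
        f"ALTER TABLE `{db}`.`{table}` CONVERT TO CHARACTER SET {charset} COLLATE {collation};",
    ]


for stmt in charset_statements("news", "articles"):
    print(stmt)
```

ALTER DATABASE only changes the default for tables created later; CONVERT TO rewrites the existing table and its text columns.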

Finally, the text must be stored as actual characters rather than as escape codes for them. If the crawled strings contain Unicode escape sequences (of the form \uXXXX), convert them first with

str.decode("unicode_escape")

Without this decode step, what ended up in the database was the Unicode escape codes themselves rather than readable text.

After adding decode("unicode_escape"), the stored data is normal readable text.
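In Python 3, str has no decode method, so the equivalent step goes through bytes. This sketch assumes the escaped text is pure ASCII: unicode_escape reads any non-ASCII byte as Latin-1, so applying it to text that already contains real Chinese characters would garble it. When the escapes come from JSON, json.loads is the safer choice.

```python
import json

# Text containing literal escape sequences (backslash, 'u', four hex digits),
# e.g. what json.dumps produces with the default ensure_ascii=True
escaped = "\\u4e2d\\u6587"
print(len(escaped))  # 12 characters, not 2

# Python 3 equivalent of Python 2's str.decode("unicode_escape")
decoded = escaped.encode("ascii").decode("unicode_escape")
print(decoded == "中文")  # True

# If the escapes come from JSON, let the JSON parser decode them
print(json.loads('"' + escaped + '"') == decoded)  # True
```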
