Python/Django MySQLdb Connection to MySQL 5.5: Solving Garbled Chinese Characters


I wanted to scrape some web pages, parse out part of the text, and store it in MySQL. Crawling and parsing pages with Python was solved quickly after reading a few tutorials. However, one problem took me nearly two days: Chinese character encoding.

The first problem was this: the parsed data had already been stored in the database, but when I dumped it with mysqldump -uroot -p --skip-opt database table > temp.sql, the content of the SQL file was all garbled. So I assumed it was a MySQL encoding problem. Yet the Chinese characters had displayed normally when the data was inserted, so perhaps the encoding also went wrong when the data was read back out of the database. In any case, the Chinese text was clearly being encoded inconsistently somewhere in the pipeline; otherwise this kind of problem would not occur.

MySQL encoding

You can view MySQL's encoding settings as follows:

mysql> show variables like 'character%';
+--------------------------+--------------------------+
| Variable_name            | Value                    |
+--------------------------+--------------------------+
| character_set_client     | latin1                   |
| character_set_connection | latin1                   |
| character_set_database   | utf8                     |
| character_set_filesystem | binary                   |
| character_set_results    | latin1                   |
| character_set_server     | utf8                     |
| character_set_system     | utf8                     |
| character_sets_dir       | D:\mysql\share\charsets\ |
+--------------------------+--------------------------+
8 rows in set (0.00 sec)

mysql> show variables like 'collation_%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database   | utf8_general_ci   |
| collation_server     | utf8_general_ci   |
+----------------------+-------------------+
3 rows in set (0.00 sec)

As you can see, the encoding is not consistently UTF-8, and this causes problems. For example, the scraped Chinese text is UTF-8 encoded when it is sent to MySQL, but the client, connection and results character sets are latin1 while the database uses utf8. The submitted UTF-8 bytes are therefore interpreted as latin1 characters and then converted to utf8 for storage, so every Chinese character ends up double-encoded. Reading the data back through the same latin1 connection reverses the mistake, which is why it can look correct there; but when the database is accessed with other software, such as phpMyAdmin, problems appear: what is stored is nominally utf8, but it is really latin1-interpreted bytes re-encoded as utf8.
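To make the double encoding concrete, here is a minimal Python 3 sketch that simulates it without a database. It assumes MySQL's latin1 can be approximated by Python's ISO-8859-1 (`latin-1`) codec; MySQL's latin1 is actually closer to cp1252, so the exact stored bytes may differ for some characters. The function names are hypothetical.

```python
def simulate_latin1_connection(text: str) -> bytes:
    """Bytes a utf8 column would hold if UTF-8 text arrived over a latin1 connection."""
    sent = text.encode("utf-8")        # what the crawler sends
    misread = sent.decode("latin-1")   # server reads each byte as one latin1 character
    return misread.encode("utf-8")     # server converts those characters to utf8 for storage


def read_back_over_latin1(stored: bytes) -> str:
    """Reverse the steps, as a client on the same latin1 connection effectively does."""
    return stored.decode("utf-8").encode("latin-1").decode("utf-8")


original = "中文"
stored = simulate_latin1_connection(original)
print(len(original.encode("utf-8")), len(stored))  # 6 12: every byte became two bytes
print(read_back_over_latin1(stored) == original)   # True: same latin1 path looks fine
print(stored.decode("utf-8") == original)          # False: a utf8 client sees mojibake
```

The doubled byte count is the usual fingerprint of this problem: each Chinese character takes 3 bytes in UTF-8, but 6 bytes once it has been double-encoded.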


Solution: unify the encoding so that everything is UTF-8. How do you change it? Here I only describe the method for MySQL 5.5 and later; my MySQL version is 5.5.29. The steps are as follows:

Open the MySQL configuration file: sudo vim /etc/mysql/my.cnf

[mysqld]
character-set-server = utf8
collation-server = utf8_general_ci

[client]
default-character-set = utf8

Add the settings above to the corresponding sections of the configuration file.
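Before restarting, you may want to confirm that the settings landed in the right sections. The following is a hypothetical Python 3 helper (not part of MySQL) that parses option-file text with the standard-library `configparser`. It assumes `!include`/`!includedir` lines can simply be skipped, and it treats `_` and `-` in option names as equivalent, as MySQL itself does.

```python
import configparser

# The settings from the snippet above; option names are normalized to dashes.
REQUIRED = {
    "mysqld": {"character-set-server": "utf8", "collation-server": "utf8_general_ci"},
    "client": {"default-character-set": "utf8"},
}


def missing_charset_settings(cnf_text: str) -> list:
    """Return the required settings that are absent or have a different value."""
    # configparser cannot parse !include / !includedir directives, so drop them.
    lines = [ln for ln in cnf_text.splitlines() if not ln.lstrip().startswith("!")]
    parser = configparser.ConfigParser(allow_no_value=True, strict=False, interpolation=None)
    # Lowercase names and map "_" to "-" so character_set_server matches character-set-server.
    parser.optionxform = lambda name: name.strip().lower().replace("_", "-")
    parser.read_string("\n".join(lines))
    missing = []
    for section, options in REQUIRED.items():
        for key, expected in options.items():
            actual = parser.get(section, key, fallback=None)  # None if section or key is absent
            if actual is None or actual.strip().lower() != expected:
                missing.append(f"[{section}] {key} = {expected}")
    return missing
```

For example, `missing_charset_settings(open("/etc/mysql/my.cnf").read())` should return an empty list once all three lines are in place; anything it returns is a line still to be added. Section names are matched case-sensitively, so they must be written in lowercase as shown.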

Then restart the MySQL service: sudo /etc/init.d/mysql restart

If the restart succeeds, the configuration file is correct; if it fails, there is an error in the configuration file. If the method above does not apply to the MySQL version you are using, search online for how to configure UTF-8 for that version.

After this, running show variables like '%char%'; as before should show UTF-8 everywhere. (One of them, character_set_filesystem, is binary; you don't need to worry about it.)
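The check can also be scripted. This hypothetical Python helper takes the (Variable_name, Value) rows that `show variables like 'character%'` returns and lists the character_set_* variables that are still not utf8, skipping the binary filesystem setting and the charsets directory. One caveat: client, connection and results are session values that follow whatever character set the client requested, so a connection opened with `charset="utf8"` will report utf8 for them regardless of my.cnf. Running the query from the mysql command-line client reflects the `[client]` setting.

```python
def non_utf8_charsets(rows):
    """Return names of character_set_* variables whose value is not utf8.

    rows: (Variable_name, Value) pairs, e.g. cursor.fetchall() after
    cursor.execute("SHOW VARIABLES LIKE 'character%'").
    """
    # character_set_filesystem is normally binary; character_sets_dir is a path.
    ignored = {"character_set_filesystem", "character_sets_dir"}
    return [name for name, value in rows
            if name.lower() not in ignored and value.lower() != "utf8"]


# The rows from the table shown earlier, before the configuration change.
before = [
    ("character_set_client", "latin1"),
    ("character_set_connection", "latin1"),
    ("character_set_database", "utf8"),
    ("character_set_filesystem", "binary"),
    ("character_set_results", "latin1"),
    ("character_set_server", "utf8"),
    ("character_set_system", "utf8"),
    ("character_sets_dir", "D:\\mysql\\share\\charsets\\"),
]
print(non_utf8_charsets(before))
# ['character_set_client', 'character_set_connection', 'character_set_results']
```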


Specifying the encoding when reading

For example, when reading database content with MySQLdb, you can specify the character set used by the connection:

MySQLdb.connect(server, user, passwd, dbname, charset="utf8")
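A slightly fuller version of that call, as a minimal sketch: the wrapper name `open_utf8_connection` and its injectable `connect` parameter are hypothetical, added only so the function can be exercised without a running server. It assumes the MySQLdb package (MySQL-python, or its fork mysqlclient) is installed when used for real.

```python
def open_utf8_connection(host, user, passwd, db, connect=None):
    """Open a MySQLdb connection whose session character set is utf8.

    connect: optional connection factory; defaults to MySQLdb.connect.
    """
    if connect is None:
        import MySQLdb  # imported lazily so this module loads without the driver
        connect = MySQLdb.connect
    # charset="utf8" sets the session character set (like SET NAMES utf8),
    # so client, connection and results all use utf8 for this connection;
    # use_unicode=True returns text columns as unicode strings.
    return connect(host=host, user=user, passwd=passwd, db=db,
                   charset="utf8", use_unicode=True)
```

Against a real server you would call it with your own credentials, e.g. `open_utf8_connection("localhost", "root", "secret", "mydb")` (placeholder values), and then use `conn.cursor()` as usual.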


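In addition, you can declare the source-file encoding in the header of the Python script and, in Python 2, change the interpreter's default string encoding, for example:

# -*- coding: utf-8 -*-

import sys

reload(sys)  # Python 2 only: site.py removes sys.setdefaultencoding at startup; reload brings it back
sys.setdefaultencoding('utf8')  # Python 2 only; this function does not exist in Python 3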

In short, the encoding must be consistent throughout the whole process. Convert the data scraped from the web to UTF-8 first; if everything is UTF-8, no garbled characters will appear no matter how the data is read.
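In Python 3, one way to apply this is to decode the scraped bytes into `str` as early as possible and let the driver encode them as utf8 on the way into MySQL. The sketch below is a hypothetical, simplified way to pick the page's charset: it uses a charset from the HTTP Content-Type header if the caller passes one, otherwise looks for a `<meta ... charset=...>` declaration in the first 2 KB, otherwise assumes UTF-8. It is not a full HTML encoding sniffer.

```python
import re

# Matches both <meta charset="gbk"> and <meta ... content="text/html; charset=gb2312">.
_META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_-]+)', re.IGNORECASE)


def to_text(raw: bytes, header_charset=None, fallback="utf-8") -> str:
    """Decode scraped page bytes into str using the page's declared charset."""
    charset = header_charset
    if not charset:
        match = _META_CHARSET.search(raw[:2048])
        if match:
            charset = match.group(1).decode("ascii")
    charset = charset or fallback
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or wrong declaration: fall back, replacing bad bytes.
        return raw.decode(fallback, errors="replace")
```

For example, a GBK page that declares `<meta charset="gbk">` decodes to the correct Chinese text; the resulting `str` can then be passed as a query parameter on a connection opened with `charset="utf8"`.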

Reference: http://blog.sina.com.cn/s/blog_4aa65a3f01018xgk.html
