Python character set conversion (MySQL data garbled processing)


    • This article is based on: http://blog.csdn.net/crazyhacking/article/details/39375535. Thanks to its author for putting it together!
    • chardet module: http://blog.csdn.net/tianzhu123/article/details/8187470
    • Character set conversion section: http://blog.chinaunix.net/uid-26249349-id-2846894.html

1. MySQL garbled-text problem:

  • Background: There are two MySQL databases, A and B, both using the GBK character set. Data has to be fetched from database A and inserted into database B, and some of the field values are Chinese.
  • Code:
        #!/usr/bin/env python
        # _*_ encoding:utf-8 _*_
        """ Author: tiantiandas """
        import sys
        reload(sys)
        sys.setdefaultencoding('GBK')

        import MySQLdb


        def connect_mysql(sql, host):
            db_info = {'host': host,
                       'user': 'test',
                       'db': 'testdb',
                       'passwd': 'dnstest',
                       'charset': 'gbk'}  # this is critical
            try:
                connect = MySQLdb.connect(**db_info)
                cursor = connect.cursor()
                cursor.execute(sql)
                connect.commit()
                result = cursor.fetchone()
                return result
            except Exception as e:
                print e
                sys.exit(10)


        def main():
            domain = sys.argv[1]
            query = 'select name, admindesc from emailbox where domain="{0}"'.format(domain)
            try:
                name, admindesc = connect_mysql(sql=query, host="host1")
                # The values are pasted into the SQL text, so a quote inside
                # name or admindesc would break this statement.
                update = "update emailbox set name='{0}', admindesc='{1}' where domain='{2}'".format(name, admindesc, domain)
                try:
                    print update
                    connect_mysql(sql=update, host='host2')
                except Exception as e:
                    print e
            except Exception as e:
                print e


        if __name__ == '__main__':
            main()
  • Several key points:
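    • sys.setdefaultencoding('GBK'): in Python 2 this changes the default encoding used for implicit conversions between unicode and byte strings from ASCII to GBK, so the Chinese values read from database A can be formatted into the UPDATE statement without a UnicodeEncodeError or UnicodeDecodeError.
    • The 'charset': 'gbk' connection option: makes the MySQL connection itself use GBK, so data is read from and written to the databases as GBK.
  • So if you only pull the data out to look at it yourself, the sys.setdefaultencoding('GBK') line is not needed.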
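For comparison, here is a minimal Python 3 sketch of the same copy step. Python 3 has no sys.setdefaultencoding: str values are already Unicode, and a driver such as mysqlclient (MySQLdb) encodes them with the connection's character set, i.e. the 'charset': 'gbk' option above. The sketch also passes the values as query parameters instead of formatting them into the SQL text, so a quote in a field value cannot break the statement. The names copy_row and demo are hypothetical, the cursors are assumed to follow DB-API 2.0, and the runnable demo uses two in-memory sqlite3 databases with "?" placeholders as stand-ins for the two MySQL servers (with MySQLdb you would keep the default "%s").

```python
import sqlite3  # used only by demo(); a real copy would use MySQLdb connections


def copy_row(src_cur, dst_cur, domain, placeholder="%s"):
    """Copy name/admindesc for one domain from the source to the target table.

    src_cur and dst_cur are DB-API 2.0 cursors. MySQLdb/mysqlclient uses the
    "%s" placeholder style; sqlite3 uses "?". Returns False if no row matched.
    """
    select_sql = "SELECT name, admindesc FROM emailbox WHERE domain = {0}".format(placeholder)
    src_cur.execute(select_sql, (domain,))
    row = src_cur.fetchone()
    if row is None:
        return False
    name, admindesc = row  # already str (Unicode) in Python 3
    # The values are passed as parameters, so the driver quotes them and
    # encodes them for the connection; nothing is pasted into the SQL text.
    update_sql = ("UPDATE emailbox SET name = {0}, admindesc = {0} "
                  "WHERE domain = {0}").format(placeholder)
    dst_cur.execute(update_sql, (name, admindesc, domain))
    return True


def demo():
    """Run copy_row against two in-memory sqlite3 databases standing in for A and B."""
    src = sqlite3.connect(":memory:")
    dst = sqlite3.connect(":memory:")
    for conn in (src, dst):
        conn.execute("CREATE TABLE emailbox (domain TEXT, name TEXT, admindesc TEXT)")
    src.execute("INSERT INTO emailbox VALUES (?, ?, ?)", ("example.com", "天天", "admin's box"))
    dst.execute("INSERT INTO emailbox VALUES (?, ?, ?)", ("example.com", "", ""))
    copy_row(src.cursor(), dst.cursor(), "example.com", placeholder="?")
    dst.commit()
    return dst.execute("SELECT name, admindesc FROM emailbox").fetchone()


if __name__ == "__main__":
    print(demo())  # ('天天', "admin's box")
```

The apostrophe in "admin's box" is there on purpose: it is exactly the kind of value that would break the string-formatted UPDATE in the Python 2 script.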

2. About encoding and decoding

  • Chardet Module
    • chardet is a character-encoding detection module. Usage:
        #!/usr/bin/env python
        # _*_ encoding:utf-8 _*_
        import chardet

        a = "天天"  # "tiantian" ("every day"); a UTF-8 byte string because the file is saved as UTF-8
        print chardet.detect(a)

      Result: {'confidence': ..., 'encoding': 'utf-8'}
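    • To detect the encoding of a large file or stream, feed the data to a UniversalDetector piece by piece, as below. This is noticeably faster than the first method, because the loop stops as soon as the detector is confident enough instead of examining all of the data: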
        import urllib
        from chardet.universaldetector import UniversalDetector

        usock = urllib.urlopen('http://www.baidu.com/')
        # Create a detection object
        detector = UniversalDetector()
        for line in usock.readlines():
            # Feed the data block by block until the confidence threshold is reached
            detector.feed(line)
            if detector.done:
                break
        # Close the detection object
        detector.close()
        usock.close()
        # Output the detection result
        print detector.result

      Run result: {'confidence': 0.99, 'encoding': 'GB2312'}
    • With chardet you can identify the character set of the data you have obtained, and then convert the data into the character set you need.
  • Two methods:
    • decode: converts byte data in a given character set into Unicode
    • encode: converts Unicode into byte data in the desired character set
    • Python handles text internally as Unicode, so to convert between character sets you first decode the data to Unicode using its source character set, then encode it into the target character set.
  • Test code:
        >>> name = "天天"
        >>> name                    # UTF-8 bytes of 天天 (the terminal is UTF-8), not GBK
        '\xe5\xa4\xa9\xe5\xa4\xa9'
        >>> b = name.decode('GBK')  # wrong character set for these bytes
        >>> b                       # garbled text: 澶╁ぉ
        u'\u6fb6\u2541\u3049'
        >>> c = b.encode('UTF8')
        >>> c
        '\xe6\xbe\xb6\xe2\x95\x81\xe3\x81\x89'

        >>> '\xcc\xec\xcc\xec'.decode('GBK')    # the real GBK bytes of 天天
        u'\u5929\u5929'
        >>> '\xcc\xec\xcc\xec'.decode('GBK').encode('UTF8')
        '\xe5\xa4\xa9\xe5\xa4\xa9'
        >>> '天天'
        '\xe5\xa4\xa9\xe5\xa4\xa9'
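The same decode-then-encode rule, written as a small Python 3 sketch (the session above is Python 2, where byte strings and text share the str type). convert re-encodes bytes from one character set to another, and repair_mojibake undoes the mistake shown in the first half of the session, UTF-8 bytes decoded as GBK. Both function names are hypothetical helpers, not part of any library.

```python
def convert(data: bytes, from_enc: str, to_enc: str) -> bytes:
    """Decode bytes with the source charset into str (Unicode), then
    encode that str with the target charset."""
    return data.decode(from_enc).encode(to_enc)


def repair_mojibake(garbled: str, wrong_enc: str = "gbk", right_enc: str = "utf-8") -> str:
    """Undo a wrong decode: the text was really right_enc bytes that were
    decoded as wrong_enc. Encoding with wrong_enc gives the original bytes
    back, which are then decoded with the correct charset."""
    return garbled.encode(wrong_enc).decode(right_enc)


if __name__ == "__main__":
    gbk_bytes = "天天".encode("gbk")                  # b'\xcc\xec\xcc\xec'
    utf8_bytes = convert(gbk_bytes, "gbk", "utf-8")   # b'\xe5\xa4\xa9\xe5\xa4\xa9'
    garbled = utf8_bytes.decode("gbk")                # '澶╁ぉ' (wrong charset)
    print(repair_mojibake(garbled))                   # prints 天天
```

The repair only works when the wrong decode lost nothing. If the garbled text was produced with errors="replace", or already contains "?" in place of characters, the original bytes are gone and cannot be recovered this way.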
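The "detect first, then convert" workflow can also be sketched with only the standard library. This is a substitute for chardet, not chardet itself: the hypothetical detect_and_convert tries a short list of candidate encodings with a strict decode and uses the first one that succeeds. UTF-8 is tried first because it is strict; as the session above shows, UTF-8 bytes often decode "successfully" (as garbage) under GBK, while GBK bytes usually fail as UTF-8. With chardet installed, chardet.detect(data)["encoding"] could replace the loop, keeping in mind that it may be None when detection fails.

```python
def detect_and_convert(data: bytes, target: str = "utf-8",
                       candidates=("utf-8", "gbk")):
    """Return (source_encoding, data re-encoded as target).

    Stand-in for chardet: the first candidate that decodes the data
    without errors is taken as its source encoding.
    """
    for enc in candidates:
        try:
            text = data.decode(enc)  # strict decode: raises on invalid bytes
        except UnicodeDecodeError:
            continue
        return enc, text.encode(target)
    raise ValueError("none of the candidate encodings %r fit the data" % (candidates,))


if __name__ == "__main__":
    print(detect_and_convert("天天".encode("gbk")))
```

Candidate order matters: putting "gbk" first would make UTF-8 input look like GBK in many cases.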
