<title>An analysis of Python coding problems</title> Http://www.th7.cn/Program/Python/201303/128631.shtml
2013-03-11 07:49:40
First of all, these problems only occur in Python 2.x. Since Python 3.x, the Python environment has only one text string type, Unicode, so all text a program handles is a Unicode string. So how do you avoid and resolve encoding problems when developing Python 2.x programs? First, keep to one consistent rule, otherwise everything else is wasted effort. Standardizing on UTF-8 is best.
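As a minimal sketch of the Python 3 behaviour described above (the variable names are only illustrative, and the snippet is meant to be run under Python 3), text and bytes are separate types and have to be converted explicitly:

```python
# Python 3: str is always Unicode text, bytes is raw encoded data.
text = "\u4e2d\u6587"             # the two characters 中文
data = text.encode("utf-8")       # str -> bytes, with an explicit encoding

print(type(text).__name__)        # str
print(type(data).__name__)        # bytes
print(len(text), len(data))       # 2 characters, 6 bytes in UTF-8

# Mixing the two types raises an error instead of converting silently.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)                   # False
```

Python 2, by contrast, tries to convert between its str and unicode types implicitly with the default encoding, which is where the errors discussed in this article come from.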
1. Handling Non-ASCII encoding
The default encoding in Python 2 is ASCII, so the following error often occurs when a program processes non-ASCII data:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x?? in position 1: ordinal not in range(128)
0x?? stands for a byte value of 128 (0x80) or greater, which is outside the ASCII range.
We often add an encoding declaration at the beginning of the file: # -*- coding: utf-8 -*-
However, this declaration only covers the source file itself and does not handle other encodings. Those can be handled by setting Python's default encoding to the required encoding, mainly with the following 2 methods:
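The error is easy to reproduce on purpose. The sketch below, which assumes nothing beyond the standard library, decodes UTF-8 bytes with the ASCII codec explicitly, which is what Python 2 does implicitly when it falls back to its default encoding:

```python
# UTF-8 bytes of one Chinese character; every byte is >= 0x80.
raw = b"\xe4\xb8\xad"

try:
    raw.decode("ascii")           # ASCII only covers byte values 0..127
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

The printed message names the offending byte (0xe4 here) and its position, which helps locate the non-ASCII data.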
01. Preferred Method
import sys
reload(sys)  # reload the sys module
sys.setdefaultencoding('utf-8')  # use the encoding you need, e.g. utf-8 or gb2312
Why must the sys module be reloaded before calling setdefaultencoding? Because the import statement here is not the first import of sys. It may be the second or third time sys is imported, so the statement only returns a reference to the module that is already loaded. Only reload() loads it again. And why reload instead of calling the function directly? Because setdefaultencoding is deleted from the sys module at interpreter startup (site.py removes it once it has run), so it is no longer reachable through the imported reference. Reloading the sys module once makes setdefaultencoding available again, so the code can change the interpreter's current default encoding.
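A hedged sketch of the same idea that also runs unchanged under Python 3 is shown here. The helper name ensure_utf8_default is hypothetical, not part of any library. It applies the reload trick only on Python 2, because Python 3 has no setdefaultencoding and already defaults to UTF-8:

```python
import sys

def ensure_utf8_default():
    """Return the default encoding, switching Python 2 to UTF-8 first."""
    if sys.version_info[0] == 2:
        # setdefaultencoding was removed at startup; reload() brings it back.
        reload(sys)  # builtin in Python 2 only, never reached on Python 3
        sys.setdefaultencoding("utf-8")
    return sys.getdefaultencoding()

print(ensure_utf8_default())
```

Keeping the version check in one place makes it easier to delete the workaround once the code no longer needs to run on Python 2.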
02. Global setting method
Create a new sitecustomize.py file in Python's Lib/site-packages folder with the contents below. sitecustomize.py is a special file that Python tries to import at startup, so its code runs automatically for every program. No reload is needed there, because setdefaultencoding has not yet been removed at that point.
import sys
sys.setdefaultencoding('gb2312')
03. Check the current encoding
import sys
sys.getdefaultencoding()
2. Detecting the character encoding
The encoding of a string or file can be detected with chardet.
Installing chardet.
chardet can be installed quickly with the easy_install tool. The command is: easy_install.exe chardet
Using chardet.
chardet's detect function detects the encoding of the given byte string directly. It returns a dictionary with 2 entries: one is the confidence of the detection, the other is the detected encoding.
import urllib
import chardet
rawdata = urllib.urlopen('http://www.sina.com.cn/').read()
print chardet.detect(rawdata)
# result: {'confidence': 0.99, 'encoding': 'GB2312'}
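chardet's statistical detection is not reproduced here. As a rough, standard-library-only illustration of what detecting an encoding means, the sketch below simply tries a short list of candidate encodings in order. The function name guess_encoding and the candidate list are assumptions made for this example, and the approach is far cruder than chardet:

```python
def guess_encoding(raw, candidates=("ascii", "utf-8", "gb2312")):
    """Return the first candidate that decodes raw without error, else None."""
    for name in candidates:
        try:
            raw.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None

utf8_bytes = u"\u4e2d\u6587".encode("utf-8")    # 中文 encoded as UTF-8
gb_bytes = u"\u4e2d\u6587".encode("gb2312")     # 中文 encoded as GB2312

print(guess_encoding(b"hello"))     # ascii
print(guess_encoding(utf8_bytes))   # utf-8
print(guess_encoding(gb_bytes))     # gb2312
```

The order of the candidates matters: strict encodings such as ASCII and UTF-8 go first because they reject most foreign byte sequences, while a permissive encoding placed first would accept almost anything.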
3. Decoding when processing files
response = urllib.urlopen(url)
text = response.read().decode("utf-8")  # add by Insun
After setting the UTF-8 default encoding as in section 1, I wrote a program that crawls MP3 files. The stored MP3 names came out garbled, even though printing them showed the correct Chinese:
Han 锛 Xuan mp3.
In this case the name clearly needs to be decoded:
decode("utf-8")
The BOM header problem is a separate topic that we will not go into here.
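The decoding step above can be sketched without a network connection. The file name below is hypothetical and stands in for the bytes a crawler would receive. Decoding them with the wrong codec produces garbled text, while decoding with the codec that produced them recovers the name:

```python
name = u"\u6c49\u5b57.mp3"         # hypothetical file name: 汉字.mp3
raw = name.encode("utf-8")         # the bytes as they would arrive

# Wrong codec: the bytes are reinterpreted, so the result is garbled.
wrong = raw.decode("gbk", errors="replace")
# Right codec: the original text comes back unchanged.
right = raw.decode("utf-8")

print(wrong == name)               # False
print(right == name)               # True
```

Decoding once, as soon as the bytes enter the program, and working with Unicode text from then on avoids most of these mismatches.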
4. Garbled Chinese when Python works with MySQL
Working with MySQL from Python requires installing the MySQL-python package
It can be found online and is installed like any ordinary Python package
Once installed, the module name is MySQLdb, and it can be used in both Windows and Linux environments
Take the following measures to ensure that MySQL output is not garbled:
1 Set the Python file encoding to utf-8 (add #encoding=utf-8 at the top of the file)
2 Use UTF-8 as the MySQL database character set (charset utf8)
3 Add the parameter charset=utf8 when Python connects to MySQL
4 Set Python's default encoding to utf-8 (sys.setdefaultencoding('utf-8'))
#encoding=utf-8
import sys
import MySQLdb
reload(sys)  # restore setdefaultencoding, as explained in section 1
sys.setdefaultencoding('utf-8')
db = MySQLdb.connect(user='root', charset='utf8')  # measure 3: connection charset
cur = db.cursor()
cur.execute('use MyDB')
cur.execute('select * from MYTB limit 100')
f = file("/home/user/work/tem.txt", 'w')
for i in cur.fetchall():
    f.write(str(i))
    f.write("\n")  # assumed row separator; the original string was lost in extraction
f.close()
cur.close()
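The file-writing part of the script can also be done without relying on the default encoding. The sketch below is an alternative under stated assumptions, not the author's method: write_rows is a hypothetical helper, the rows are made-up stand-ins for cur.fetchall(), and a temporary file replaces the path used above. It opens the output with io.open and an explicit encoding, which works on both Python 2 and Python 3:

```python
import io
import os
import tempfile

def write_rows(path, rows, encoding="utf-8"):
    """Write each row as one tab-separated line, encoding the text explicitly."""
    with io.open(path, "w", encoding=encoding) as f:
        for row in rows:
            f.write(u"\t".join(u"%s" % (col,) for col in row))
            f.write(u"\n")

# Hypothetical rows standing in for the result of cur.fetchall().
rows = [(1, u"\u4e2d\u6587"), (2, u"abc")]
path = os.path.join(tempfile.mkdtemp(), "tem.txt")
write_rows(path, rows)

with io.open(path, encoding="utf-8") as f:
    content = f.read()
print(content == u"1\t\u4e2d\u6587\n2\tabc\n")   # True
```

Because the encoding is stated where the file is opened, the output does not change if the interpreter's default encoding does.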