Python encoding type conversion methods
This article describes how to convert between encoding types in Python and is shared here for your reference. The interactive examples use Python 2 syntax. The details are as follows:
1: Python and Unicode
To process multilingual text correctly, Python introduced Unicode strings starting with version 2.0.
2: print in Python
Although Python processes text internally as Unicode, what a terminal actually displays is a sequence of bytes, so Unicode text must be encoded into a byte string before it can be shown.
When output goes to the console, Python's print automatically encodes unicode strings using the console's encoding; byte strings (already in some encoding) are output as is. The write method of a file object does not perform this automatic conversion. Therefore, a string that prints correctly with print will not necessarily be written correctly to a file with write.
On Linux, the console encoding is determined by the locale environment variables (such as LANG and LC_ALL), which you can inspect with the locale command. print passes the resulting bytes to the operating system, and the terminal decodes and displays that byte stream according to the system encoding.
>>> Str = 'learn python' >>> str '\ xe5 \ xad \ xa6 \ xe4 \ xb9 \ xa0python' # asII encoding >>> print learn python >>> str = u'learn python'> str #### unicode encoding '\ xe5u \ xad \ xa6 \ xe4 \ xb9 \ xa0python'
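3: decode in Python
decode converts a byte string in another character set (here UTF-8) into a unicode string. In practice this matters for non-ASCII text such as Chinese characters; ASCII characters come out the same under most encodings.
>>> s = '学习'
>>> us = s.decode('utf-8')
>>> us
u'\u5b66\u4e60'
Once the Chinese characters have been decoded to unicode, Python can process them correctly in subsequent steps: len(us) is 2, one per character, whereas len(s) is 6, one per UTF-8 byte. If you skip this conversion, Python falls back to an implicit conversion with a default encoding (ASCII in a standard Python 2 installation) that may not match the data, which can raise a UnicodeDecodeError or produce garbled characters.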
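The decode step, and what goes wrong with a mismatched codec, can be sketched in Python 3, where Python 2's byte `str` corresponds to `bytes` and `unicode` corresponds to `str`. The helper `decode_bytes` is a hypothetical name, and Latin-1 is used only as an example of a wrong guess about the encoding.

```python
def decode_bytes(data: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode bytes to text; raises UnicodeDecodeError on invalid input."""
    return data.decode(encoding)

if __name__ == "__main__":
    raw = "学习".encode("utf-8")  # b'\xe5\xad\xa6\xe4\xb9\xa0'
    text = decode_bytes(raw)      # correct codec
    print(len(raw), len(text))    # 6 2
    print(ascii(text))            # '\u5b66\u4e60'

    # Wrong codec: Latin-1 maps every byte to a character, so it "succeeds"
    # but yields six garbled characters instead of the original two.
    garbled = decode_bytes(raw, "latin-1")
    print(len(garbled), garbled == text)  # 6 False

    # ASCII rejects any byte >= 0x80 outright.
    try:
        decode_bytes(raw, "ascii")
    except UnicodeDecodeError as exc:
        print("ascii failed:", exc.reason)
```

The Latin-1 case shows why garbled output is often silent: some codecs accept any byte sequence, so only the wrong-looking text reveals the mistake.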
4: encode in Python
encode converts a unicode string into a byte string in another character set:
>>> s = '学习'
>>> us = s.decode('utf-8')
>>> us
u'\u5b66\u4e60'
>>> us.encode('utf-8')
'\xe5\xad\xa6\xe4\xb9\xa0'
>>> print us.encode('utf-8')
学习
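A Python 3 sketch of the encode step, assuming GBK as a second example character set alongside UTF-8 (the article itself only uses UTF-8). It shows that the same text yields different bytes under different encodings, and that a round trip only works when the same codec is used on both sides. The helper `encode_text` is a hypothetical name.

```python
def encode_text(text: str, encoding: str) -> bytes:
    """Hypothetical helper: encode text to bytes; raises UnicodeEncodeError if unrepresentable."""
    return text.encode(encoding)

if __name__ == "__main__":
    text = "\u5b66\u4e60"  # the same two characters as u'\u5b66\u4e60' above
    utf8 = encode_text(text, "utf-8")
    gbk = encode_text(text, "gbk")
    print(utf8)                  # b'\xe5\xad\xa6\xe4\xb9\xa0'
    print(len(utf8), len(gbk))   # 6 4

    # Round trip: decoding with the same codec restores the original text.
    print(utf8.decode("utf-8") == text, gbk.decode("gbk") == text)  # True True

    # Decoding GBK bytes as UTF-8 (a mismatched codec) cannot restore the text.
    print(gbk.decode("utf-8", errors="replace") == text)  # False
```

UTF-8 uses three bytes per character here while GBK uses two, which is why bytes produced by one encoding must never be decoded with the other.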