The role of sys.setdefaultencoding('utf-8') in Python

Source: Internet
Author: User

In Python 2, encoding and decoding are conversions between encoding schemes, with Unicode as the intermediate form: encoding is unicode → str, and decoding is str → unicode, where str is a byte string. str.decode decodes the byte string with the given codec and returns a unicode object; u.encode converts a unicode object into a byte string (str) with the given codec. In other words, encode is meant to be called on a unicode object and yields a byte string, while decode is meant to be called on a str object (a byte string) and yields a unicode object. If encode is called on a str object, Python first decodes it to unicode using the system default encoding and then encodes the result. Overlooking this implicit decode step is a common source of errors.
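The two directions described above can be sketched as follows. This is a minimal illustration, not code from the original article: the variable names are made up for the example, and the text is written with Unicode escapes so the snippet behaves the same on Python 2 and Python 3.

```python
# Unicode text (four Chinese characters), written with escapes so the
# source file itself stays pure ASCII.
text = u'\u4e2d\u6587\u5b57\u7b26'

# Encoding: unicode -> byte string, once per target codec.
utf8_bytes = text.encode('utf-8')
gb_bytes = text.encode('gb2312')

# Decoding: byte string -> unicode, using the codec the bytes were written in.
round_trip = utf8_bytes.decode('utf-8')

print(round_trip == text)
print(len(utf8_bytes))
print(len(gb_bytes))
```

The same text occupies a different number of bytes under each codec, which is why the codec has to be known before a byte string can be decoded correctly.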

For example, there is the following code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
s = '中文字符'  # s is a str (byte string) here, not unicode
s.encode('gb2312')

This code re-encodes s as gb2312, which is a unicode → str conversion. Because s is itself a str, Python automatically decodes s to unicode first and then encodes it as gb2312. Since the decoding is implicit and we did not specify a codec, Python uses the one reported by sys.getdefaultencoding(). In many cases that default is ASCII, so if s contains non-ASCII bytes the decode fails:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
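The failing step can be reproduced by performing the ASCII decode explicitly. The sketch below assumes the source string was UTF-8 text beginning with a character whose first byte is 0xe4; the variable names are illustrative, and the snippet runs on Python 2 or Python 3.

```python
# UTF-8 bytes for four Chinese characters; the first byte is 0xe4.
text = u'\u4e2d\u6587\u5b57\u7b26'
raw = text.encode('utf-8')

message = None
failed_at = None
try:
    # This is the decode Python 2 performs implicitly for str.encode().
    raw.decode('ascii')
except UnicodeDecodeError as err:
    message = str(err)
    failed_at = err.start  # index of the first undecodable byte

print(message)
print(failed_at)
```

The exception is a decode error even though the original call was encode, which is the clue that an implicit decode happened first.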

In this case, there are two ways to fix the error:
1. Explicitly specify the encoding of s

#!/usr/bin/env python
# -*- coding: utf-8 -*-
s = '中文字符'
s.decode('utf-8').encode('gb2312')  # decode explicitly as UTF-8, then encode

2. Change the default encoding to match the encoding of the file

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
reload(sys)  # sys.setdefaultencoding is removed after initialization (Python 2.5), so reload sys to restore it
sys.setdefaultencoding('utf-8')

s = '中文字符'
s.encode('gb2312')  # the implicit decode now uses UTF-8 instead of ASCII
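As a rough sketch of how the two methods relate, the snippet below wraps the explicit decode-then-encode of the first method in a helper and inspects the state that the second method changes. The helper name reencode and the variable names are hypothetical, not from the original article, and the code is written to run on Python 2 or Python 3.

```python
import sys


def reencode(data, source_encoding, target_encoding):
    """Decode a byte string explicitly, then encode it with the target codec."""
    return data.decode(source_encoding).encode(target_encoding)


# Method 1: name the source codec instead of relying on the default.
utf8_bytes = u'\u4e2d\u6587\u5b57\u7b26'.encode('utf-8')
gb_bytes = reencode(utf8_bytes, 'utf-8', 'gb2312')

# Method 2 changes this value; on a stock Python 2 it is usually 'ascii'.
default_codec = sys.getdefaultencoding()
# The setter is normally missing after startup, hence the reload(sys) trick.
setter_available = hasattr(sys, 'setdefaultencoding')

print(default_codec)
print(setter_available)
```

The first method keeps the choice of codec local to the call, whereas the second changes behavior for every implicit conversion in the process.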
