Working with Chinese characters in a Python string

Source: Internet
Author: User

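# -*- coding: utf-8 -*-
# Python 2: read A.txt (UTF-8) and print its characters one per line.
txta = open('A.txt', 'r')
text = u''
for line in txta:
    # Decode each line's bytes to Unicode before joining.
    text += line.strip().decode('utf-8')
txta.close()
for word in text:
    # Encode each character back to UTF-8 for output.
    print word.encode('utf-8')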

Printing the raw bytes character by character produces garbled output, so decode the text first and then encode it for output.
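The same read-then-iterate idea can be sketched so that it runs on both Python 2 and Python 3. This is only an illustration under a few assumptions: it swaps the manual decode() call for io.open with an encoding argument, which decodes while reading; read_characters is a hypothetical helper name; and a temporary file stands in for A.txt.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def read_characters(path, encoding='utf-8'):
    """Return the file's characters, one per list item, with line ends stripped."""
    characters = []
    # io.open decodes while reading, so each line is already Unicode text.
    with io.open(path, 'r', encoding=encoding) as handle:
        for line in handle:
            characters.extend(line.strip())
    return characters


# Hypothetical sample file standing in for A.txt.
fd, sample_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with io.open(sample_path, 'w', encoding='utf-8') as handle:
    handle.write(u'中文\n字符\n')

chars = read_characters(sample_path)
os.remove(sample_path)
```

Because the text is decoded before it is iterated, each list item is a whole character rather than a single byte of a multi-byte sequence.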

Reference URL: http://blog.csdn.net/devil_2009/article/details/39526713

The first thing to understand is that a plain string in Python 2 is a byte string of type str, and Python 2 assumes ASCII whenever it has to convert one implicitly, which fails on non-ASCII characters. Python's internal text representation is Unicode, and a 'u' prefix declares a Unicode string directly: u'Hello' is of type unicode. If the string being processed contains non-ASCII characters, it must be converted to Unicode first; otherwise errors can occur.

The specific methods are:

decode() converts a string in some other encoding to Unicode. For example, str1.decode('gb2312') converts the GB2312-encoded string str1 to Unicode.

encode() converts a Unicode string to a string in another encoding. For example, str2.encode('gb2312') converts the Unicode string str2 to GB2312.

unicode() does the same job as decode(). For example, unicode(str3, 'gb2312') converts the GB2312-encoded string str3 to Unicode.

When transcoding, first establish which encoding the string is in, then decode it to Unicode, and only then encode it to the target encoding.

In addition, calling decode() on a string that is already Unicode can raise an error. When the encoding is unknown, check whether the value is already Unicode with isinstance(s, unicode), where s is the string in question.

This applies not only to Chinese. Whenever you handle strings that contain non-ASCII characters, you can follow these steps:

1. Determine the encoding of the source string; assume here that it is UTF-8.
2. Convert it to Unicode with unicode() or decode(), for example str1.decode('utf8') or unicode(str1, 'utf8').
3. Encode the processed string into the required format with encode().
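The decode, check, and encode steps described above can be sketched as follows. This is a minimal illustration written to run on both Python 2 and Python 3, not a definitive implementation. The helper names to_unicode and transcode are hypothetical, and text_type is an assumed stand-in for the unicode type (it resolves to str on Python 3).

```python
# -*- coding: utf-8 -*-
# Sketch of the decode -> process -> encode steps.
# to_unicode and transcode are hypothetical helper names.

text_type = type(u'')  # `unicode` on Python 2, `str` on Python 3


def to_unicode(value, source_encoding='utf-8'):
    """Return value as Unicode text, decoding only if it is a byte string."""
    if isinstance(value, text_type):
        return value  # already Unicode: decoding again could raise an error
    return value.decode(source_encoding)


def transcode(value, source_encoding, target_encoding):
    """Decode from source_encoding (if needed), then encode to target_encoding."""
    return to_unicode(value, source_encoding).encode(target_encoding)


utf8_bytes = u'中文'.encode('utf-8')   # stand-in for bytes read from a file
gb_bytes = transcode(utf8_bytes, 'utf-8', 'gb2312')
round_trip = gb_bytes.decode('gb2312')
```

The isinstance check in to_unicode is what makes it safe to pass in a value whose type is not known in advance: byte strings are decoded, and Unicode strings pass through unchanged.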
