How to solve Python encoding problems (1)

Source: Internet
Author: User
Tags python web crawler

Beginners in Python often run into encoding problems, and many of them are confusing. Even if you write down the fix, the details are complicated enough that you may forget them. The following describes how to avoid these situations.

ASCII:

The standard ASCII encoding uses only seven bits per character, so it can represent at most 128 characters. Extended ASCII uses eight bits per character and can represent at most 256 characters.
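A minimal Python 3 sketch of the 7-bit and 8-bit limits. "Extended ASCII" is not a single standard, so Latin-1 (ISO-8859-1) is used here only as one assumed example of an 8-bit extension.

```python
# ASCII is a 7-bit code: byte values 0-127, i.e. 2**7 = 128 characters.
ascii_text = bytes(range(128)).decode("ascii")
print(len(ascii_text))  # 128

# A byte value of 128 or above is outside ASCII, so the ascii codec rejects it.
try:
    bytes([0xE9]).decode("ascii")
except UnicodeDecodeError as exc:
    print("not ASCII:", exc.reason)

# Latin-1, one 8-bit "extended ASCII" (an assumed example), maps all 256 byte values.
latin1_text = bytes(range(256)).decode("latin-1")
print(len(latin1_text))  # 256
print(bytes([0xE9]).decode("latin-1") == "\u00e9")  # True: byte 0xE9 is 'é' in Latin-1
```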

UNICODE:

It uses two or even four bytes to encode a single character (as in the UCS-2 and UCS-4 forms). Therefore, all the characters in the world can be encoded in a unified way.
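To make the "two or even four bytes" concrete, here is a small Python 3 sketch (not the Python 2 the article targets). Every character has one Unicode code point; the sketch prints that code point and how many bytes the UTF-16 and UTF-32 forms need for it. The sample characters are arbitrary choices for illustration.

```python
# Each character has exactly one Unicode code point, whatever script it comes from.
for ch in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
    cp = ord(ch)
    utf16 = ch.encode("utf-16-be")  # 2 bytes inside the BMP, 4 (a surrogate pair) outside it
    utf32 = ch.encode("utf-32-be")  # always 4 bytes per character
    print(f"U+{cp:04X}: UTF-16 {len(utf16)} bytes, UTF-32 {len(utf32)} bytes")
```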

UTF:

UTF (Unicode Transformation Format) specifies how to encode Unicode into a byte sequence suitable for file storage and network transmission (unicode -> str). Other encodings, such as gb2312, gb18030 and big5, serve the same purpose but encode characters differently.
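A short Python 3 sketch of that idea: the same two characters ("中国", an arbitrary sample string) turn into different byte sequences under UTF-8 and gb18030, and decoding with the matching codec gives the original text back.

```python
text = "中国"  # a Unicode string (a Python 3 str, the Python 2 unicode type)

# The same characters become different byte sequences under different encodings.
utf8_bytes = text.encode("utf-8")    # 3 bytes per character for these two
gb_bytes = text.encode("gb18030")    # 2 bytes per character for these two
print(utf8_bytes)  # b'\xe4\xb8\xad\xe5\x9b\xbd'
print(gb_bytes)    # b'\xd6\xd0\xb9\xfa'

# Decoding with the matching codec recovers the original text.
assert utf8_bytes.decode("utf-8") == text
assert gb_bytes.decode("gb18030") == text
```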

Here are a few sentences from the Python 2 documentation:

 
 
  1. "The items of a string are characters"
  2. "The items of a Unicode object are Unicode code units"
  3. "The string data type is also used to represent arrays of bytes, e.g., to hold data read from a file."

The first two sentences explain what the elements of str and unicode are (both are sequences). The default __len__ of a sequence returns the number of units it contains, so len('abc') is 3, and the length of a unicode string made of four Chinese characters is 4.
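The same rule, sketched in Python 3, where str plays the role of the Python 2 unicode type and bytes plays the role of the Python 2 str. The four-character sample string is an arbitrary choice.

```python
# Python 3 str is a sequence of code points; len() counts characters.
print(len("abc"))   # 3
text = "中文字符"     # four Chinese characters (sample string)
print(len(text))    # 4

# Python 3 bytes is a sequence of bytes; len() counts bytes.
data = text.encode("utf-8")
print(len(data))    # 12: each of these characters takes 3 bytes in UTF-8
```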

The third sentence tells us that str is also used to represent arrays of bytes, such as data read from a file. This applies not only to file operations but also to network transmission, which is why a unicode string must be encoded before Python writes it to a file or sends it over the network.
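A Python 3 sketch of that rule. An in-memory io.BytesIO stands in for a binary file or a socket (an assumption made so the example needs no real file or network).

```python
import io

text = "中国"
buffer = io.BytesIO()  # stands in for a binary file or a network connection

# Writing text directly to a byte stream fails: it has to be encoded first.
try:
    buffer.write(text)
except TypeError as exc:
    print("cannot write str:", exc)

buffer.write(text.encode("utf-8"))  # encode unicode -> bytes, then write
print(buffer.getvalue())  # b'\xe4\xb8\xad\xe5\x9b\xbd'
```

In Python 3, opening a file in text mode with an explicit encoding, such as open(path, "w", encoding="utf-8"), performs this encoding step for you.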

Encoding and decoding in Python are conversions between unicode and str: encoding goes from unicode to str, and decoding goes from str to unicode. The remaining question is when encoding or decoding is required. For example, some libraries return unicode values, so before transmitting those return values or writing them to a file, we must encode them into the appropriate type.
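One common way to keep these conversions in one place is a pair of small helpers, sketched here in Python 3. The names to_bytes, to_text and fetch_title are hypothetical, not part of any library; fetch_title only pretends to be a library call that returns unicode.

```python
def to_bytes(value, encoding="utf-8"):
    """Encode text (unicode -> bytes); pass bytes through unchanged."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


def to_text(value, encoding="utf-8"):
    """Decode bytes (bytes -> unicode); pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def fetch_title():
    # Hypothetical library function that returns unicode text.
    return "中国"


payload = to_bytes(fetch_title(), "gb18030")  # encode before writing or sending
print(payload)                                # b'\xd6\xd0\xb9\xfa'
print(to_text(payload, "gb18030") == "中国")   # True: decoding reverses the encode
```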

Next is the encoding declaration at the beginning of a file, that is, the "# -*- coding: ... -*-" line. By default, Python 2 treats script files as ASCII. When a file contains characters outside the ASCII range, an encoding declaration tells the interpreter which encoding the file uses. The default encoding (sys.getdefaultencoding()) is a different setting: it is used whenever a decode is needed but no codec is specified explicitly. For example, consider the following code:

 
 
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    s = '中国'  # note that s is of type str, not unicode
    s.encode('gb18030')

s.encode('gb18030') re-encodes s in the gb18030 format, that is, a unicode -> str conversion. Because s is a str, Python 2 first decodes s to unicode automatically and then encodes the result as gb18030.
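Python 3 no longer performs that hidden decode (its bytes type has no encode method), so both steps have to be written out. A minimal sketch, assuming the original bytes are UTF-8 as in the example above:

```python
# A UTF-8 byte string: the Python 3 counterpart of the Python 2 str in the example.
s = "中国".encode("utf-8")

# Step 1: decode str -> unicode using the codec the bytes were written in.
text = s.decode("utf-8")
# Step 2: encode unicode -> str in the target codec.
gb = text.encode("gb18030")
print(gb)  # b'\xd6\xd0\xb9\xfa'

print(hasattr(s, "encode"))  # False: Python 3 bytes cannot be encoded again
```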

Because Python performs this decoding automatically, we never say which codec to use, so Python falls back to the default encoding returned by sys.getdefaultencoding(). In Python 2 this default is often ASCII, and if s contains bytes outside the ASCII range, the decode fails. In this example the default encoding is ASCII, while s is encoded in UTF-8 (the same encoding as the source file), so Python 2 raises a UnicodeDecodeError.
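The failure can be reproduced in Python 3 by performing Python 2's implicit step by hand with the "ascii" codec, followed by the usual fix of decoding with the encoding the bytes really use. This is a sketch under that assumption, not Python 2's exact code path.

```python
import sys

s = "中国".encode("utf-8")  # UTF-8 bytes, like the str in the example

# Python 2 decoded implicitly with its default encoding (often ASCII there);
# doing that step by hand with "ascii" reproduces the failure.
failed = False
try:
    s.decode("ascii")
except UnicodeDecodeError as exc:
    failed = True
    print(exc)  # 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

# The fix: decode with the encoding the bytes actually use, then re-encode.
print(s.decode("utf-8").encode("gb18030"))  # b'\xd6\xd0\xb9\xfa'

# In Python 3 the default encoding is always UTF-8.
print(sys.getdefaultencoding())  # utf-8
```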

