Suggestions on how to avoid some common problems in Python programming

Source: Internet
Author: User

This article collects the unidiomatic but occasionally subtle problems I have seen in code written by new Python developers. It aims to help novice developers get past the stage of writing ugly Python code. To accommodate the target audience, it makes some simplifications (for example, generators and the powerful itertools iteration tools are ignored when discussing iterators).

New developers always have some reason for using these anti-patterns, and I have tried to give those reasons where possible. However, these anti-patterns make code unreadable, bug-prone, and out of line with Python code style. For more introductory material, I strongly recommend The Python Tutorial or Dive into Python.

Iteration

Use of range

New Python programmers are fond of using range for simple iteration, fetching each element of the sequence by index over the range of its length:

for i in range(len(alist)):
    print alist[i]

Note, however, that range is not meant for simple iteration over a sequence. A range-based loop feels natural to people used to numeric for loops from other languages, but it is bug-prone when iterating over a sequence, and iterating over the sequence directly is clearer:
 

for item in alist:
    print item

Misusing range easily leads to off-by-one errors. This usually happens because a new programmer forgets that the object produced by range includes its first argument but excludes its second, just like substring in Java and many other functions of this kind. Programmers who think they must avoid running past the end of the sequence then create bugs:
 

# An incorrect way to iterate over a whole sequence
alist = ['her', 'name', 'is', 'rio']
for i in range(0, len(alist) - 1):  # Off by one!
    print i, alist[i]
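To see why the loop above misses an element, it helps to look at exactly which indexes range produces. The following is a small sketch in Python 3 syntax (the article's own snippets are Python 2); the names wrong_indexes and right_indexes are introduced here only for illustration.

```python
alist = ['her', 'name', 'is', 'rio']

# range(start, stop) includes start but excludes stop.
wrong_indexes = list(range(0, len(alist) - 1))  # stops one short
right_indexes = list(range(0, len(alist)))      # covers every index

print(wrong_indexes)  # [0, 1, 2]
print(right_indexes)  # [0, 1, 2, 3]
```

The last index, 3, never appears in the first list, so 'rio' is silently skipped.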

Common reasons for using range inappropriately:
1. You need the index inside the loop. This is not a valid reason; use enumerate instead:
 

for index, value in enumerate(alist):
    print index, value

2. You need to iterate over two sequences at the same time, using the same index to get a value from each. In this case, use zip:
 

for word, number in zip(words, numbers):
    print word, number

3. You need to iterate over only part of a sequence. In this case, just iterate over a slice of the sequence. Note the comment stating the purpose:
 

for word in words[1:]:  # Exclude the first element
    print word

One exception is iterating over a large sequence, where the overhead of the slice operation becomes significant because a slice copies the elements. With only 10 elements in the sequence there is no problem, but with 10 million elements, or when the slice is taken in a performance-sensitive inner loop, the overhead matters a great deal. In that case, consider using xrange instead of range [1].
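As a rough sketch of that trade-off, the loop below skips the first element by index instead of copying the list with a slice. It is written in Python 3 syntax, where range is already lazy; in Python 2, xrange would play that role. The list words and the counter total_length are made up for this example.

```python
words = ['her', 'name', 'is', 'rio']

# words[1:] would build a second list holding all but the first element.
# Looping over a lazy range of indexes avoids that copy.
total_length = 0
for i in range(1, len(words)):  # starts at 1, so the first word is skipped
    total_length += len(words[i])

print(total_length)  # 9
```

For a four-element list the difference is irrelevant; the index loop only earns its reduced readability when the sequence is large or the loop is hot.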

Apart from iterating over sequences, an important use of range is when you really do want to generate a sequence of numbers rather than indexes:
 

# Print foo(x) for 0 <= x < 5
for x in range(5):
    print foo(x)

Use list comprehensions properly

If you have a loop like this:
 

# An ugly, slow way to build a list
words = ['her', 'name', 'is', 'rio']
alist = []
for word in words:
    alist.append(foo(word))

You can rewrite it with a list comprehension:
 

words = ['her', 'name', 'is', 'rio']
alist = [foo(word) for word in words]

Why? For one thing, you avoid any errors related to initializing the list correctly. For another, the code looks clean and tidy. Those with a functional programming background may find the map function more familiar, but in my opinion that approach is less Pythonic.
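The three styles mentioned above can be compared side by side. This is only a sketch in Python 3 syntax: foo is not defined in the article, so the version here, which upper-cases a word, is a hypothetical stand-in.

```python
def foo(word):
    # Hypothetical stand-in for the article's foo: upper-case the word.
    return word.upper()

words = ['her', 'name', 'is', 'rio']

# Explicit loop: the list must be initialized before the loop.
loop_result = []
for word in words:
    loop_result.append(foo(word))

# List comprehension: initialization and filling in one expression.
comp_result = [foo(word) for word in words]

# map: wrapped in list() because map returns an iterator in Python 3.
map_result = list(map(foo, words))

print(comp_result)  # ['HER', 'NAME', 'IS', 'RIO']
```

All three produce the same list; the comprehension simply leaves no room to forget the initialization.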

Other common reasons for not using a list comprehension:

1. You need nested loops. In that case, you can nest whole list comprehensions, or use several for clauses, spread over multiple lines, inside one list comprehension. For example, this loop:
 

words = ['her', 'name', 'is', 'rio']
letters = []
for word in words:
    for letter in word:
        letters.append(letter)

With a list comprehension:
 

words = ['her', 'name', 'is', 'rio']
letters = [letter for word in words
                  for letter in word]

Note: in a list comprehension with multiple loops, the loops appear in the same order as they would if you were not using a list comprehension.

2. You need a condition inside the loop. Just add the condition to the list comprehension:
 

words = ['her', 'name', 'is', 'rio', '1', '2', '3']
alpha_words = [word for word in words if word.isalpha()]

A valid reason not to use a list comprehension is that you cannot handle exceptions inside one. If some elements in the iteration may raise exceptions, you need to move the exception handling into a function called from the list comprehension, or simply not use a list comprehension.
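One way to move the exception handling into a function might look like the sketch below (Python 3 syntax). The helper to_int_or_none and the sample data are assumptions made for this example, not part of the original article.

```python
def to_int_or_none(text):
    """Hypothetical helper: return int(text), or None if text is not a number."""
    try:
        return int(text)
    except ValueError:
        return None

raw_values = ['1', '2', 'rio', '3']

# The try/except lives in the helper, so the comprehension stays simple.
parsed = [to_int_or_none(value) for value in raw_values]
print(parsed)  # [1, 2, None, 3]

# Optionally drop the failures afterwards.
numbers = [value for value in parsed if value is not None]
print(numbers)  # [1, 2, 3]
```

If the handling needs to do more than substitute a default, a plain loop is probably the clearer choice.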

Performance Defects

Checking for contents in linear time

Syntactically, checking whether a list contains an element looks just like checking whether a set or dict does, but under the surface the two are completely different. If you need to check repeatedly whether a data structure contains an element, use a set rather than a list. (If you want to associate a value with the element you are checking for, use a dict; this also gives constant-time checks.)

# Assume we start with a list
lyrics_list = ['her', 'name', 'is', 'rio']

# Avoid this
words = make_wordlist()  # Pretend this returns many words to test
for word in words:
    if word in lyrics_list:  # Linear-time check
        print word, "is in the lyrics"

# Prefer this
lyrics_set = set(lyrics_list)  # Linear-time set construction
words = make_wordlist()  # Pretend this returns many words to test
for word in words:
    if word in lyrics_set:  # Constant-time check
        print word, "is in the lyrics"

[Note: set elements and dict keys in Python must be hashable, and lookups go through the hash, so the lookup time is O(1) on average.]
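The dict variant mentioned in the parenthesis above could be sketched as follows (Python 3 syntax). The counts in lyric_counts and the short words list are invented for illustration; words stands in for the result of make_wordlist().

```python
# Hypothetical data: how many times each word occurs in the lyrics.
lyric_counts = {'her': 2, 'name': 2, 'is': 3, 'rio': 4}

words = ['rio', 'sand', 'her']  # stand-in for make_wordlist()

# "word in lyric_counts" checks the dict's keys in constant time,
# and the associated value is one more constant-time lookup away.
found = [(word, lyric_counts[word]) for word in words if word in lyric_counts]

print(found)  # [('rio', 4), ('her', 2)]
```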

Remember: creating a set incurs a one-time cost. Construction takes linear time even though each membership check takes constant time. So if you check membership in a loop, it is worth taking the time to build the set, since you only need to build it once.

Variable Leakage

Loop

In general, the scope of a variable in Python is wider than you would expect from other languages. For example, the following Java code does not compile:
 

// Get the index of the lowest-indexed item in the array
// that is > maxValue
for (int i = 0; i < y.length; i++) {
    if (y[i] > maxValue) {
        break;
    }
}
// Compile error here: i does not exist in this scope
processArray(y, i);

In Python, however, the equivalent code runs without complaint and normally gives the expected result:
 

for idx, value in enumerate(y):
    if value > max_value:
        break

processList(y, idx)

This code works unless y is empty. In that case the loop body never runs, and the call to processList raises a NameError because idx is not defined. If you use Pylint, it will warn you about using the possibly undefined variable idx.
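The failure is easy to reproduce. The sketch below (Python 3 syntax) wraps the same loop in a function whose name, index_of_first_above, is made up for this example. Inside a function the error surfaces as UnboundLocalError, which is a subclass of NameError.

```python
def index_of_first_above(y, max_value):
    """Mirror the loop above, without any guard for an empty y."""
    for idx, value in enumerate(y):
        if value > max_value:
            break
    return idx  # idx was never assigned if y is empty

print(index_of_first_above([1, 5, 9], 4))  # 1

caught = None
try:
    index_of_first_above([], 4)
except NameError as error:
    # Inside a function this is UnboundLocalError, a subclass of NameError.
    caught = type(error).__name__

print(caught)  # UnboundLocalError
```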

The solution is to always be explicit: set idx to some special value before the loop, so you know what to look for if the loop never runs. This is called the sentinel pattern. So what value should serve as the sentinel? In the days of C and earlier, when int ruled the programming world, the usual convention for a function that needed to signal an expected failure was to return -1. For example, to return the index of an element in a list:
 

def find_item(item, alist):
    # None is more Pythonic than -1
    result = -1
    for idx, other_item in enumerate(alist):
        if other_item == item:
            result = idx
            break
    return result

In general, None is a good sentinel value in Python, even though the Python standard types do not use it consistently (for example, str.find [2]).
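A version of find_item that uses None as the sentinel might look like this. It is a sketch in Python 3 syntax; only the sentinel value differs from the function above, and the variable names position and missing are introduced for the example.

```python
def find_item(item, alist):
    """Return the index of item in alist, or None if it is absent."""
    result = None  # sentinel: cannot be confused with a real index
    for idx, other_item in enumerate(alist):
        if other_item == item:
            result = idx
            break
    return result

position = find_item('rio', ['her', 'name', 'is', 'rio'])
missing = find_item('sand', ['her', 'name', 'is', 'rio'])

print(position)  # 3
print(missing)   # None
```

One practical difference: with -1 as the sentinel, alist[-1] quietly returns the last element, whereas indexing a list with None raises a TypeError, so a forgotten check fails loudly.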

The outer scope

New Python programmers often like to put everything in the so-called outer scope: the part of a Python file that is not contained in any code block (such as a function or class). The outer scope corresponds to the global namespace. For this discussion, assume that anything in the global scope is accessible anywhere within a single Python file.

The outer scope is well suited to constants that are declared at the top of the file and needed throughout the module. It is wise to give any name in the outer scope a distinctive form, for example the IN_ALL_CAPS style used for constants. That makes bugs like the following less likely:
 

import sys

# See the bug in the function declaration?
def print_file(filenam):
    """Print every line of a file."""
    with open(filename) as input_file:
        for line in input_file:
            print line.strip()

if __name__ == "__main__":
    filename = sys.argv[1]
    print_file(filename)

If you look closely, you will see that the parameter in the definition of print_file is named filenam, but the function body refers to filename. Yet the program still works. Why? When print_file does not find a local variable named filename, it looks in the global scope next. Because the code that calls print_file is in the outer scope (even though it is indented), the filename assigned there is visible to print_file.

So how can we avoid such errors? First, do not assign any names in the outer scope other than IN_ALL_CAPS constants [3]. Work such as argument parsing is best handed to a main function, so that none of the function's internal variables live in the outer scope.
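Applied to the example above, that advice might look like the following sketch (Python 3 syntax). The usage message, the script name print_file.py, and the return codes are assumptions added for illustration.

```python
import sys

def print_file(filename):
    """Print every line of a file."""
    with open(filename) as input_file:
        for line in input_file:
            print(line.strip())

def main(argv):
    """Parse arguments here so filename never becomes a global."""
    if len(argv) != 2:
        print("usage: print_file.py FILENAME")  # hypothetical script name
        return 1
    filename = argv[1]  # local to main, invisible to print_file
    print_file(filename)
    return 0

if __name__ == "__main__":
    main(sys.argv)
```

With this layout, misspelling the parameter as filenam would make the call fail with a NameError instead of silently working.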

This is also a good moment to mention the global keyword. If you only read the value of a global variable, you do not need global. You need it only when you want to rebind the global variable name to a different object. More information is available in the discussion of the global keyword on Stack Overflow.
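The difference between reading and rebinding can be seen in a short sketch (Python 3 syntax). The names GREETING, call_count, greet and count_call are invented for this example, and call_count deliberately goes against the advice above in order to show the keyword at work.

```python
GREETING = "hello"  # module-level constant that is only read
call_count = 0      # module-level name that a function rebinds

def greet(name):
    # Reading a global needs no declaration.
    return GREETING + ", " + name

def count_call():
    # Rebinding a global name requires the global keyword; without it,
    # "call_count += 1" would raise UnboundLocalError.
    global call_count
    call_count += 1

count_call()
count_call()
print(greet("rio"), call_count)  # hello, rio 2
```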

Code style

Salute to PEP8

PEP 8 is the general style guide for Python code. You should keep it in mind and follow it as much as possible, although some people have good reasons to disagree with a few of its minor points, such as the number of spaces used for indentation or the use of blank lines. If you do not follow PEP 8, you should have a better reason than "I just don't like that style." The guidelines below are taken from PEP 8 and seem to be the ones programmers most often need reminding of.

Test whether it is empty

If you want to check whether a container type (such as a list, dictionary, or set) is empty, simply test it directly instead of checking something like len(x) > 0:
 

numbers = [-1, -2, -3]
# This will be empty
positive_numbers = [num for num in numbers if num > 0]
if positive_numbers:
    # Do something awesome

If you want to store whether positive_numbers is non-empty for use elsewhere, use bool(positive_numbers). bool applies the same truth test that an if statement uses to evaluate its condition.
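A brief sketch of storing that result (Python 3 syntax); the variable name has_positives is an assumption made for this example.

```python
numbers = [-1, -2, -3]
positive_numbers = [num for num in numbers if num > 0]

# bool() applies the same truth test that an if statement uses.
has_positives = bool(positive_numbers)
print(has_positives)  # False

# Empty containers are false; non-empty ones are true,
# even when the only element is itself false.
print(bool([]), bool({}), bool(set()), bool([0]))  # False False False True
```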

Test whether it is None

As mentioned above, None can be used as a good sentinel value. So how to check it?

If you explicitly want to test for None, rather than for any value that is false (such as an empty container or 0), use:
 

if x is not None:
    # Do something with x

If you use None as a sentinel, this is also the pattern Python style expects, for example when you need to distinguish None from 0.

If you are only testing whether the variable holds something useful, a plain if is usually enough:
 

if x:
    # Do something with x

For example, if x is expected to be a container but may be None because it is the return value of another function, this test handles both cases at once. You do need to consider which values may be passed as x, though: if a value such as 0.0 is a useful value, a plain if x treats it as false and the program will not behave as you want.
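The difference between the two tests can be made concrete with a small sketch (Python 3 syntax). The function describe is a hypothetical helper written only to contrast them.

```python
def describe(x):
    """Hypothetical helper contrasting 'is None' with a plain truth test."""
    if x is None:
        return "missing"
    if not x:
        return "present but false"
    return "present and true"

print(describe(None))  # missing
print(describe(0.0))   # present but false
print(describe([]))    # present but false
print(describe([1]))   # present and true
```

A plain if x would lump the first three cases together, which is fine only when none of them is a meaningful value.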

Note:

[1] In Python 2.x, range returns a list, while xrange returns an xrange object that produces its values lazily. In Python 3.x, xrange is gone and range returns a range object; use list(range(...)) to build a list explicitly.
[2] string.find(str) returns the index at which str starts within string, or -1 if str is not found.
[3] Do not assign, in the outer scope, names that functions use for their local variables. Otherwise a function that mistakenly refers to an undefined local variable will silently pick up the same-named variable from the outer scope.
