Anti-patterns in Python programming

Source: Internet
Author: User
This article collects the nonstandard, and occasionally subtle, problems I see in code written by novice Python developers. Its purpose is to help novice developers get past the stage of writing ugly Python code. For the sake of the target audience, the article makes some simplifications (for example, it ignores generators and the powerful itertools module when discussing iteration).

Novice developers always have their reasons for using anti-patterns, and I have tried to give those reasons wherever possible. But these anti-patterns often produce code that is less readable, more prone to bugs, and out of step with Python's code style. If you are looking for more introductory material, I highly recommend the Python Tutorial or Dive Into Python.

Iteration

Use of range

Novice Python programmers like to use range for simple iteration, stepping through the indices up to the length of a sequence to fetch each element:

for i in range(len(alist)):
    print alist[i]


Keep in mind that range is not meant for simple iteration over a sequence. A for loop built on range may feel natural to people used to numeric for loops, but it makes bugs easy to introduce when iterating over a sequence, and it is less clear than iterating over the sequence directly:

for item in alist:
    print item


Misusing range can cause off-by-one errors. This usually happens because the novice programmer forgets that the object range produces includes its first argument but not its second, like substring in Java and many other functions of this kind. Novice programmers who worry about running past the end of the sequence create bugs like this:

# the wrong way to iterate over the whole sequence
alist = ['her', 'name', 'is', 'rio']
for i in range(0, len(alist) - 1):  # off by one!
    print i, alist[i]
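A quick check of range's half-open behaviour makes the off-by-one visible. This is a small sketch in Python 3 syntax (the article's own snippets use Python 2 print statements); the variable names are chosen only for illustration.

```python
alist = ['her', 'name', 'is', 'rio']

# range(start, stop) includes start but excludes stop
full_indices = list(range(0, len(alist)))       # every valid index
short_indices = list(range(0, len(alist) - 1))  # silently drops the last index

visited = [alist[i] for i in short_indices]

print(full_indices)   # [0, 1, 2, 3]
print(short_indices)  # [0, 1, 2]
print(visited)        # ['her', 'name', 'is']
```

No error is raised in the buggy version; the last element is simply never visited, which is what makes this mistake hard to spot.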

Common reasons to use range inappropriately:


1. You need the index inside the loop. This is not a valid reason; use enumerate to get the index instead:

for index, value in enumerate(alist):
    print index, value

2. You need to iterate over two sequences at the same time, using the same index to fetch a value from each. In this case, use zip:

for word, number in zip(words, numbers):
    print word, number

3. You need to iterate over only part of a sequence. In this case, just iterate over a slice of the sequence, and add a comment to make the intent clear:

for word in words[1:]:  # exclude the first element
    print word

There is one exception: when you iterate over a large sequence, slicing becomes costly because it copies the elements. With only 10 elements this is not a problem, but with 10 million elements, or when slicing in a performance-sensitive inner loop, the overhead matters. In this case, you might consider using xrange instead of range [1].
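The three alternatives above, plus the large-sequence exception, can be put side by side. This is a minimal sketch in Python 3 syntax, where range is already lazy in the way Python 2's xrange is; the sample lists are made up for illustration.

```python
words = ['her', 'name', 'is', 'rio']
numbers = [1, 2, 3, 4]

# 1. index and value together
indexed = list(enumerate(words))

# 2. two sequences in step
paired = list(zip(words, numbers))

# 3. part of a sequence via a slice (this copies the tail of the list)
tail = words[1:]

# Exception case: walk the indices lazily instead of copying the list.
tail_no_copy = []
for i in range(1, len(words)):  # start at 1 to skip the first element
    tail_no_copy.append(words[i])

print(indexed)  # [(0, 'her'), (1, 'name'), (2, 'is'), (3, 'rio')]
print(paired)   # [('her', 1), ('name', 2), ('is', 3), ('rio', 4)]
print(tail)     # ['name', 'is', 'rio']
```

The index-based loop gives the same elements as the slice without building a second list, at the price of being a little less readable.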


Apart from iterating over a sequence, an important use of range is when you genuinely want to generate a sequence of numbers rather than build indices:

# print foo(x) for 0 <= x < 5
for x in range(5):
    print foo(x)

Correct use of list comprehensions


If you have a loop like this:

# an ugly, slow way to build a list
words = ['her', 'name', 'is', 'rio']
alist = []
for word in words:
    alist.append(foo(word))


You can rewrite it with a list comprehension:

words = ['her', 'name', 'is', 'rio']
alist = [foo(word) for word in words]


Why do this? On the one hand, you avoid any mistakes in initializing the list correctly; on the other, the code looks clean and tidy. Those with a functional programming background may feel more at home with the map function, but in my opinion that approach is less Pythonic.
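To make the comparison concrete, here are the loop, the list comprehension, and map producing the same result. This is a sketch in Python 3 syntax; foo is a hypothetical stand-in, since the article never defines it.

```python
def foo(word):
    # hypothetical stand-in for the article's undefined foo
    return word.upper()

words = ['her', 'name', 'is', 'rio']

# explicit loop: the list must be initialized before appending
looped = []
for word in words:
    looped.append(foo(word))

# list comprehension: one expression, no separate initialization
comprehended = [foo(word) for word in words]

# map: in Python 3 it returns an iterator, so wrap it in list()
mapped = list(map(foo, words))

print(comprehended)  # ['HER', 'NAME', 'IS', 'RIO']
print(looped == comprehended == mapped)  # True
```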


Some other common reasons for not using list comprehensions:


1. You need nested loops. You can nest entire list comprehensions, or use multiple loops in one list comprehension, spread over several lines:

words = ['her', 'name', 'is', 'rio']
letters = []
for word in words:
    for letter in word:
        letters.append(letter)


Using a list comprehension:

words = ['her', 'name', 'is', 'rio']
letters = [letter for word in words
                  for letter in word]


Note: in a list comprehension with multiple loops, the loops appear in the same order as they would if you wrote them as ordinary nested loops.
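The two options mentioned above (several loops in one comprehension versus a comprehension nested inside another) give differently shaped results, which the following sketch in Python 3 syntax illustrates. The shortened word list is only for readability.

```python
words = ['her', 'name']

# Multiple loops in one comprehension: the outer loop is written first,
# exactly as in the nested for statements, and the result is flat.
flat = [letter for word in words
               for letter in word]

# A comprehension nested inside another: one inner list per word.
nested = [[letter for letter in word] for word in words]

print(flat)    # ['h', 'e', 'r', 'n', 'a', 'm', 'e']
print(nested)  # [['h', 'e', 'r'], ['n', 'a', 'm', 'e']]
```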


2. You need a condition inside the loop. Just add the condition to the list comprehension:

words = ['her', 'name', 'is', 'rio', '1', '2', '3']
alpha_words = [word for word in words if word.isalpha()]


One valid reason not to use a list comprehension is that you cannot use exception handling inside one. If some elements in the iteration can raise an exception, you need to either move the exception handling into a function that the list comprehension calls, or not use a list comprehension at all.
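As a sketch of the first option, the exception handling can live in a small helper that the comprehension calls. The helper name to_int_or_none and the sample data are hypothetical, chosen only for this illustration (Python 3 syntax).

```python
def to_int_or_none(text):
    """Hypothetical helper: return int(text), or None if text is not an integer."""
    try:
        return int(text)
    except ValueError:
        return None

raw = ['1', '2', 'rio', '3']

# The comprehension stays a single expression; the try/except is in the helper.
parsed = [to_int_or_none(item) for item in raw]

# Drop the failures, testing explicitly for None so that 0 would be kept.
valid = [value for value in parsed if value is not None]

print(parsed)  # [1, 2, None, 3]
print(valid)   # [1, 2, 3]
```

If the handling needs more than a line or two, an ordinary loop is usually the clearer choice.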

Performance pitfalls

Checking for contents in linear time

Syntactically, checking whether a list or a set/dict contains an element looks the same, but under the hood the two are completely different. If you need to check repeatedly whether a data structure contains an element, it is better to use a set than a list. (You can use a dict if you want to associate a value with the element being checked; that also gives constant-time checks.)

# suppose we start with a list
lyrics_list = ['her', 'name', 'is', 'rio']

# avoid this
words = make_wordlist()  # assume this returns many words to test
for word in words:
    if word in lyrics_list:  # linear-time check
        print word, "is in the lyrics"

# better
lyrics_set = set(lyrics_list)  # linear time to create the set
words = make_wordlist()  # assume this returns many words to test
for word in words:
    if word in lyrics_set:  # constant-time check
        print word, "is in the lyrics"


[Translator's note: set elements and dict keys in Python are hashed, so a lookup takes O(1) time on average.]


Remember that creating the set incurs a one-time cost: building it takes linear time even though each membership check afterwards takes constant time. So if you need to check membership inside a loop, it is usually worth taking the time to create the set, because you only need to create it once.
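One way to see the difference is to time both containers on the same membership checks. This is a rough sketch in Python 3 syntax using the standard timeit module; the data is synthetic, and the absolute timings will vary from machine to machine.

```python
import timeit

# Synthetic data, for illustration only.
lyrics_list = [str(n) for n in range(10000)]
lyrics_set = set(lyrics_list)        # one-time, linear-time cost
words = ['9999', 'missing'] * 50     # 100 words to test

def count_hits(container):
    """Count how many of the test words are found in the container."""
    return sum(1 for word in words if word in container)

# Both containers give the same answer...
list_hits = count_hits(lyrics_list)
set_hits = count_hits(lyrics_set)

# ...but the list scans its elements on every check.
list_seconds = timeit.timeit(lambda: count_hits(lyrics_list), number=5)
set_seconds = timeit.timeit(lambda: count_hits(lyrics_set), number=5)

print("hits:", list_hits, set_hits)
print("list: %.4fs  set: %.4fs" % (list_seconds, set_seconds))
```

The test words are picked to sit at the end of the list or be absent from it, which is the worst case for the linear scan.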

Leaking variables

Loops


Generally speaking, a variable in Python has a wider scope than you would expect from other languages. For example, the following Java code will not compile:

// Get the index of the lowest-indexed item in the array
// that is > maxValue
for (int i = 0; i < y.length; i++) {
    if (y[i] > maxValue) {
        break;
    }
}
// i is not valid here: it does not exist
processArray(y, i);


In Python, however, the equivalent code runs without complaint and, in most cases, gives the expected result:

for idx, value in enumerate(y):
    if value > max_value:
        break

processlist(y, idx)


This code works in every case except when y is empty: then the loop body never executes, and the call to processlist raises a NameError because idx is not defined. If you use the Pylint code checker, it will warn you about using the possibly undefined variable idx.
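The failure on an empty sequence is easy to reproduce. The sketch below (Python 3 syntax) wraps the loop in a hypothetical function, first_index_above, so it can be called twice; inside a function the error is an UnboundLocalError, which is a subclass of NameError, whereas at module level it would be a plain NameError.

```python
def first_index_above(y, max_value):
    """Hypothetical wrapper around the article's loop, for demonstration."""
    for idx, value in enumerate(y):
        if value > max_value:
            break
    # idx leaks out of the loop, but it only exists if the loop ran at least once
    return idx

print(first_index_above([1, 5, 9], 4))  # 1

try:
    first_index_above([], 4)
except NameError as error:  # also catches UnboundLocalError
    print("empty list:", error)
```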


The solution is to always be explicit: set idx to some special value before the loop, so you know what to look for if the loop never executes. This is called the sentinel pattern. So what value should be used as the sentinel? In the days of C and earlier, when int ruled the programming world, the common pattern for a function that needed to report an error was to return -1. For example, when you want to return the index of an element in a list:

def find_item(item, alist):
    # None would be more Pythonic than -1
    result = -1
    for idx, other_item in enumerate(alist):
        if other_item == item:
            result = idx
            break

    return result

In general, None is a better sentinel value in Python, even though Python's standard types do not use it consistently (for example, str.find [2]).
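Here is the same function with None as the sentinel, as a sketch in Python 3 syntax. It also shows why -1 is a risky sentinel in Python specifically: -1 is a valid index, so a forgotten check silently picks the last element.

```python
def find_item(item, alist):
    """Return the index of the first occurrence of item, or None if absent."""
    result = None
    for idx, other_item in enumerate(alist):
        if other_item == item:
            result = idx
            break
    return result

words = ['her', 'name', 'is', 'rio']

print(find_item('is', words))     # 2
print(find_item('duran', words))  # None

# str.find uses -1 instead, and -1 happens to be a valid index:
position = 'her name'.find('z')
print(position)         # -1
print(words[position])  # 'rio', the last element, not an error
```

With None, the forgotten check fails loudly instead: words[None] raises a TypeError.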


Outer scope


Novice Python programmers often like to put everything in the so-called outer scope: the parts of a Python file that are not inside a block such as a function or class. The outer scope corresponds to the global namespace; for this discussion, you should assume that the contents of the global scope are accessible anywhere in a single Python file.


The outer scope is great for defining constants, declared at the top of the file, that the whole module needs to access. It is wise to give any variable in the outer scope a distinctive name, for example by naming constants IN_ALL_CAPS. That makes bugs like the following less likely:

import sys

# see the bug in the function declaration?
def print_file(filenam):
    """Print every line of a file."""
    with open(filename) as input_file:
        for line in input_file:
            print line.strip()

if __name__ == "__main__":
    filename = sys.argv[1]
    print_file(filename)


If you look closely, you will see that print_file is defined with a parameter named filenam, but the function body refers to filename. Yet the program still works perfectly. Why? In print_file, when the local variable filename is not found, the next step is to look for it in the global scope. Because the call to print_file happens in the outer scope (even though it is indented), the filename declared there is visible to print_file.


So how do you avoid mistakes like this? First, do not assign to any global variable in the outer scope that is not IN_ALL_CAPS [3]. Argument parsing is best delegated to a main function, so that the function's internal variables do not live in the outer scope.
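A sketch of that advice, in Python 3 syntax: once the argument handling moves into a main function, filename is local to main, and the typo in print_file surfaces as a NameError instead of silently working. The argument list passed to main here is hypothetical; a real script would pass sys.argv.

```python
def print_file(filenam):  # the typo is kept on purpose
    """Print every line of a file."""
    with open(filename) as input_file:  # no global 'filename' exists any more
        for line in input_file:
            print(line.strip())

def main(argv):
    filename = argv[1]  # local to main, invisible to print_file
    print_file(filename)

if __name__ == "__main__":
    try:
        main(["prog", "example.txt"])  # hypothetical arguments
    except NameError as error:
        print("bug exposed:", error)
```

The name lookup fails before open is ever called, so the example does not need the file to exist.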


This is also a good moment to mention the global keyword. If you only read the value of a global variable, you do not need the global keyword. You need it only when you want to change which object a global variable name refers to. You can find more on the global keyword in this discussion on Stack Overflow (http://stackoverflow.com/questions/4693120/use-of-global-keyword-in-python/4693170#4693170).
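The distinction between reading, mutating, and rebinding a global can be sketched as follows (Python 3 syntax). All names are made up for the illustration, and the lowercase globals are there only to demonstrate the keyword, against the naming advice above.

```python
MAX_RETRIES = 3  # constant in the outer scope: functions only read it
attempts = 0     # rebinding this name from a function needs 'global'
history = []     # mutated in place, so no 'global' is needed

def retries_left():
    # reading globals requires no declaration
    return MAX_RETRIES - attempts

def record_attempt():
    global attempts            # we rebind the name 'attempts' below
    attempts += 1
    history.append(attempts)   # the name 'history' is not rebound

record_attempt()
record_attempt()

print(retries_left())  # 1
print(history)         # [1, 2]
```

Without the global declaration, the line attempts += 1 would raise an UnboundLocalError, because the assignment makes attempts a local name inside the function.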

Code style

Tribute to PEP8

PEP 8 is the common style guide for Python code. You should keep it in mind and follow it as much as possible, even though some people have good reasons to disagree with some of its minor points, such as the number of spaces for indentation or the use of blank lines. If you do not follow PEP 8, you should have a better reason than "I just don't like that style". The guidelines below are all taken from PEP 8, and they seem to be the ones programmers most often need to be reminded of.

Testing for empty

If you want to check whether a container (for example a list, dictionary, or set) is empty, simply test it directly instead of checking something like len(x) > 0:

numbers = [-1, -2, -3]
# this will be empty
positive_numbers = [num for num in numbers if num > 0]
if positive_numbers:
    pass  # do something awesome


If you want to store whether positive_numbers is non-empty somewhere else, you can use bool(positive_numbers) as the result; bool is what the if statement uses to determine the truth value of its condition.
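A short sketch of how bool treats containers (Python 3 syntax; has_positives is an illustrative name):

```python
numbers = [-1, -2, -3]
positive_numbers = [num for num in numbers if num > 0]

# Store the emptiness test as a plain boolean.
has_positives = bool(positive_numbers)
print(has_positives)  # False

# Empty containers are false, non-empty ones are true,
# regardless of what the elements themselves are.
print(bool([]), bool({}), bool(set()), bool(''))  # False False False False
print(bool([0]), bool({'a': None}))               # True True
```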

Testing for None

As mentioned earlier, None makes a good sentinel value. So how do you check for it?

If you explicitly want to test for None, and not just for anything that evaluates to false (such as an empty container or 0), use:

if x is not None:
    pass  # do something with x

If you use None as a sentinel, this is the pattern Python style expects, for example when you want to distinguish between None and 0.

If you are just testing whether a variable holds a useful value, a plain if is usually sufficient:

if x:
    pass  # do something with x

For example, if you expect x to be a container, but x could be None as the result of another function, this handles both cases at once. Just be careful: if you later change what is passed as x so that a value such as 0 or 0.0 counts as useful, this test will no longer behave the way you want.
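The difference between the two tests shows up as soon as a falsy but meaningful value is passed in. This is a sketch in Python 3 syntax; both function names are hypothetical.

```python
def describe_explicit(x):
    """Explicit test: only None counts as 'missing'."""
    if x is not None:
        return "has a value"
    return "missing"

def describe_truthy(x):
    """Plain truth test: None, 0, 0.0 and empty containers all fail it."""
    if x:
        return "useful"
    return "not useful"

for candidate in (None, 0, 0.0, [], [0]):
    print(repr(candidate), describe_explicit(candidate), describe_truthy(candidate))
```

Only None is reported as missing by the explicit test, while the plain test also rejects 0, 0.0, and the empty list.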

Translator's notes:


[1] In Python 2.x, range returns a list and xrange returns an xrange object. Python 3.x removed xrange, and range always returns a range object, which can be turned into a list explicitly with the list factory function.

[2] string.find(str) returns the index at which str starts in string, or -1 if it is not found.

[3] Do not assign, in the outer scope, to names that are also used as local variable names inside functions. Otherwise a function that refers to such a name by mistake will silently pick up the variable of the same name from the outer scope instead of raising an error.
