The summary of the best practices of python encoding helps you sort out the knowledge points related to the best practices of python encoding, and makes a brief summary of some usage of python from the perspective of performance, if you are interested, you can refer to the suggestions that I believe many of you are using python. I have always been fond of python. there is no doubt that python is not efficient as an interpreted dynamic language, however, python is favored by its simplicity, ease of reading, scalability, and other features.
Many colleagues are using python at work, but few people often pay attention to its performance and usage. it is generally learned and used now. after all, python is not our main language, we generally only use it for system management. But why are we not doing better? Python zen has the following sentence: There shocould be one -- and preferably only one -- obvious way to do it. although that way may not be obvious at first unless you're Dutch. the general idea is that python encourages the use of an optimal method to accomplish one thing, which is also a difference with ruby and so on. Therefore, a good habit of writing python is very important. This article will give a brief summary of some usage of python from the perspective of performance, hoping to be useful to everyone ~
When it comes to performance, the easiest thing to think of is to reduce complexity. generally, we can analyze it by measuring the complexity of the code loop (cyclomatic complexitly) and the Landau symbol (large O, for example, dict search is O (1), while list search is O (n). Obviously, the data storage method directly affects the complexity of the algorithm.
I. data structure selection
1. search in the list:
For sorted lists, use the bisect module to find elements. this module uses binary search.
def find(seq, el) : pos = bisect(seq, el) if pos == 0 or ( pos == len(seq) and seq[-1] != el ) : return -1 return pos - 1
You can use the following to insert an element quickly:
bisect.insort(list, element)
In this way, the element is inserted and you do not need to call sort () to save the order again. you need to know that the cost of long list is very high.
2. set instead of list:
For example, to deduplicate a list, the most likely implementation is:
seq = ['a', 'a', 'b']res = []for i in seq: if i not in res: res.append(i)
Obviously, the complexity of the above implementation is O (n2). if it is changed:
seq = ['a', 'a', 'b']res = set(seq)
The complexity is immediately reduced to O (n). of course, here we assume that set can be used in the future.
In addition, operations such as union, intersection, and difference of the set are much faster than those of the list iteration. Therefore, if the list intersection, union or difference set are involved, you can convert them to set, pay more attention when using it, especially when the list is large, the performance will be more affected.
3. use the python collections module to replace the built-in container type:
Collections has three types:
Deque: similar list type of enhancement function
Defaultdict: Similar to dict type
Namedtuple: similar to the tuple type
The list is implemented based on arrays, while the deque is based on double-stranded tables. Therefore, the latter inserts an element in the middle or before, or deletes an element much faster.
Defaultdict adds a default factory for the new key value to avoid writing an additional test to initialize the ing entry, which is more efficient than dict. setdefault. An example of referencing the python document is as follows:
# Use profile stats for performance analysis> from pbp. scripts. profiler import profile, stats >>> s = [('yellow', 1), ('Blue ', 2), ('yellow', 3 ),... ('blue', 4), ('red', 1)] >>> @ profile ('defaultict ')... def faster ():... d = defaultdict (list )... for k, v in s :... d [k]. append (v)... >>> @ profile ('dict ')... def slower ():... d = {}... for k, v in s :... d. setdefault (k, []). append (v)... >>> slower (); faster () Optimization: Solutions [306] >>> stats ['dict '] {'stones': 16.587882671716077, 'Memory': 396, 'Time ': 0.35166311264038086 >>>> stats ['defaultid'] {'stones': 6.5733464259021686, 'Memory ': 552, 'Time': 0.13935494422912598}
It can be seen that the performance is improved by three times faster. Defaultdict uses a list factory as the parameter and can also be used for built-in types, such as long.
In addition to the algorithm and architecture, python advocates simplicity and elegance. Therefore, correct syntax practices are necessary to write elegant and easy-to-read code.
II. best syntax practices
String operations: operations over python string objects cannot be changed. Therefore, operations on any string, such as concatenation or modification, will generate a new string object instead of based on the original string, therefore, this continuous copy will affect Python performance to a certain extent:
(1) replace the '+' operator with join, and the latter has the copy overhead;
(2) when you can use regular expressions or built-in functions to process strings, select built-in functions. Such as str. isalpha (), str. isdigit (), str. startswith ('X', 'yz'), str. endswith ('X', 'yz '))
(3) character formatting is better than direct serial reading:
Str = "% s" % (a, B, c, d) # efficient
Str = "" + a + B + c + d + "" # slow
2. use list comprehension (list parsing) & generator (generator) & decorators (decorators) to familiarize yourself with itertools and other modules:
(1) list parsing: I think it is the most impressive feature in python2. Example 1:
>>> # the following is not so Pythonic >>> numbers = range(10) >>> i = 0 >>> evens = [] >>> while i < len(numbers): >>> if i %2 == 0: evens.append(i) >>> i += 1 >>> [0, 2, 4, 6, 8] >>> # the good way to iterate a range, elegant and efficient >>> evens = [ i for i in range(10) if i%2 == 0] >>> [0, 2, 4, 6, 8]
Example 2:
def _treament(pos, element): return '%d: %s' % (pos, element)f = open('test.txt', 'r')if __name__ == '__main__': #list comps 1 print sum(len(word) for line in f for word in line.split()) #list comps 2 print [(x + 1, y + 1) for x in range(3) for y in range(4)] #func print filter(lambda x: x % 2 == 0, range(10)) #list comps3 print [i for i in range(10) if i % 2 == 0] #list comps4 pythonic print [_treament(i, el) for i, el in enumerate(range(10))]output:24[(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (2, 3), (2, 4), (3, 1), (3, 2), (3, 3), (3, 4)][0, 2, 4, 6, 8][0, 2, 4, 6, 8]['0: 0', '1: 1', '2: 2', '3: 3', '4: 4', '5: 5', '6: 6', '7: 7', '8: 8', '9: 9']
Yes, it's so elegant and simple.
(2) the generator expression is introduced in python2.2. it uses the 'lazy evaluation' idea and is more effective in memory. Example of referencing the longest line in the computing file in python core programming:
f = open('/etc/motd, 'r')longest = max(len(x.strip()) for x in f)f.close()return longest
This implementation is concise and does not need to read all the lines of the file into the memory.
(3) python introduced the decorator in 2.4, which is an exciting feature. In short, it encapsulates functions and methods (Receives a function and returns an enhanced function) it is easier to read and understand. The '@' symbol is the modifier syntax. you can describe a function and remember the call results for future use. this technology is called memoization. The following describes how to use the decorator to complete a cache function:
Import timeimport hashlibimport picklefrom itertools import chaincache ={} def is_obsolete (entry, duration): return time. time ()-entry ['Time']> durationdef compute_key (function, args, kw): # serialize/deserialize an object, the pickle module is used to serialize functions and parameter objects into a hash value key = pickle. dumps (function. func_name, args, kw) # hashlib is a library that provides MD5 and sh1. The result is saved in a global dictionary and return hashlib. sha1 (key ). hexdigest () def memoize (duration = 10): def _ memoize (function): def _ memoize (* args, ** kw): key = compute_key (function, args, kw) # do we have it already if (key in cache and not is_obsolete (cache [key], duration )): print 'We got a winner' return cache [key] ['value'] # computing result = function (* args, ** kw) # storing the result cache [key] = {'value': result,-'Time': time. time ()} return result return _ memoize @ memoize () def very_very_complex_stuff (a, B, c): return a + B + cprint very_very_complex_stuff (2, 2, 2, 2) print very_very_complex_stuff (2, 2, 2) @ memoize (1) def very_very_complex_stuff (a, B): return a + bprint very_very_complex_stuff (2, 2) time. sleep (2) print very_very_complex_stuff (2, 2)
Running result:
6we got a winner644
Decorator is used in many scenarios, such as parameter check, lock synchronization, and unit test framework. if you are interested, you can learn it on your own.
3. make good use of python's powerful introspection capabilities (attributes and descriptors): Since python is used, it is really surprising that introspection can do so easily. This topic is limited to a lot of content, I will not go into details here. I will make a separate summary later. to learn python, you must understand it in self-reflection.
III. coding tips
1. in versions earlier than python3, xrange is used to replace range, because range () directly returns the complete list of elements, and xrange () generates only one integer element each call in the sequence, with a low overhead. (In python3, xrange no longer exists. in it, range provides an iterator that can traverse the range of any length)
2. if done is not None than the statement if done! = None is faster;
3. try to use the "in" operator, which is concise and fast: for I in seq: print I
4. 'X <y <Z' replaces 'X <y and y <Z ';
5. while 1 is faster than while True, because the former is a single-step operation, and the latter still needs to be calculated;
6. Use build-in functions whenever possible, because these functions are often very efficient. for example, add (a, B) is better than a + B;
7. in a time-consuming cycle, you can change the function call method to inline. the internal loop should be concise.
8. use multiple values for swap elements:
X, y = y, x # elegant and efficient
Instead:
Temp = x
X = y
Y = temp
9. ternary operator (after python2.5): V1 if X else V2, avoid using (X and V1) or V2, because the latter has problems when V1 =.
10. switch case implementation in python: because the switch case syntax can be fully replaced by if else, there is no switch case syntax in python, but we can implement it using dictionary or lamda:
Switch case structure:
Switch (var) {case v1: func1 (); case v2: func2 ();... case vN: funcN (); default: default_func ();} dictionary implementation: values = {v1: func1, v2: func2 ,... vN: funcN,} values. get (var, default_func) () lambda implementation: {'1': lambda: func1, '2': lambda: func2, '3': lambda: func3} [value] ()
Try... If catch is used to implement a condition with Default, we recommend that you use the dict implementation method.
I only summarized some of the python practices here. I hope these suggestions can help everyone who uses python. optimizing performance is not a key point and solving problems efficiently, make your own code easier to maintain!