[Reprinted] Several tips on improving Python efficiency

Source: Internet
Author: User

1.1. Most common
One of the most common speed traps (I fell into it several times before I read about it online): when building a long string by appending many short strings one at a time, people usually write:

strs = [str0, str1, ..., strN]
# a list of N + 1 strings
longStr = ''
for s in strs: longStr += s
Because strings in Python are immutable, every longStr += s copies the existing longStr and s into a new string and then assigns it to longStr. As longStr grows, more and more content has to be copied each time. In the end, str0 is copied N + 1 times, str1 is copied N times, and so on.

What should we do? See Skip Montanaro's explanation at http://musi-cal.mojam.com/~skip/python/fastpython.html, and also the essay by Guido van Rossum himself: http://www.python.org/doc/essays/list2str.html

1.1.1. Identify the speed bottleneck
1) First, learn how to identify the speed bottleneck. Python has a profile module:

import profile
profile.run('name_of_function_to_check()')
It prints how many times each function is called from within that function, how much time each call takes, and how much time is used in total. Nice? For more information, see the description of the profile module in the Library Reference.
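
As a rough illustration of the profiling step, here is a minimal sketch in Python 3 using cProfile, the standard-library module that offers the same interface as profile with lower overhead. The functions build_string and main are hypothetical stand-ins, not code from the original article.

```python
import cProfile
import io
import pstats


def build_string(parts):
    # Hypothetical workload: repeated concatenation, the pattern discussed earlier.
    result = ''
    for p in parts:
        result += p
    return result


def main():
    return build_string([str(i) for i in range(10000)])


# Collect statistics while main() runs.
profiler = cProfile.Profile()
profiler.enable()
output = main()
profiler.disable()

# Format the report into a string, sorted by cumulative time, top 5 entries.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists call counts and per-call and cumulative times for each function, which is usually enough to see where the time goes.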

Of course, there is also the crude but workable way: use time() from the time module to read the system time, and subtract the previous time() value to get the number of seconds elapsed in between.
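
The time() approach can be sketched as follows. The function work is a hypothetical placeholder for whatever code you want to time.

```python
import time


def work():
    # Hypothetical workload to be timed.
    return sum(i * i for i in range(100000))


start = time.time()             # wall-clock time before the call
result = work()
elapsed = time.time() - start   # seconds spent in work()
print('work() took %.4f seconds' % elapsed)
```

In Python 3, time.perf_counter() is a higher-resolution clock intended for this kind of measurement and can be substituted for time.time() here.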

1.1.2. String concatenation
For the example above, use:

longStr = ''.join(strs)
That does it in one step. But what if the items in strs are not all strings and include some numbers? Calling join directly raises an error. Don't worry, handle it as follows:

strs = [str(s) for s in strs]
longStr = ''.join(strs)
That is, first convert every item in the list to a string, and then join.
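
A small self-contained sketch of the two approaches, written for Python 3. The names concat_plus, concat_join, and mixed are illustrative only.

```python
def concat_plus(items):
    # Repeated +=: each step may copy the whole string built so far.
    result = ''
    for item in items:
        result += str(item)
    return result


def concat_join(items):
    # Convert everything to str first, then join once.
    return ''.join(str(item) for item in items)


mixed = ['a', 1, 'b', 2.5, None]
print(concat_join(mixed))
```

Both functions produce the same string; the join version builds it in a single pass.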

For a small number of strings, avoid all = str0 + str1 + str2 + str3 and use all = '%s%s%s%s' % (str0, str1, str2, str3) instead.

1.1.3. Sequence sorting
list.sort()
You can pass a comparison function, list.sort(function), as long as the function accepts two arguments and returns 1, 0, or -1 according to your rule. Convenient? Yes, but it slows sorting down considerably. The following method is easier to understand through an example.

For example, suppose your list is l = ['az', 'by'] and you want to sort it by the second letter. Extract the sort key and pair it with each string in a tuple: new = map(lambda s: (s[1], s), l)

Now new is [('z', 'az'), ('y', 'by')]. Sort it: new.sort()

Now new is [('y', 'by'), ('z', 'az')]. Take the second element of each tuple: sorted = map(lambda t: t[1], new)

So sorted is ['by', 'az']. lambda and map work very well here.

For more about sort and sorted in Python 2.4 and later, see the Python wiki page HowToSort.
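
For comparison, here is a sketch of the decorate-sort-undecorate steps next to the key argument that sort and sorted accept since Python 2.4. The list words is made-up sample data.

```python
words = ['az', 'by', 'cx']

# Decorate-sort-undecorate, as described in the text.
decorated = [(s[1], s) for s in words]
decorated.sort()
by_second_dsu = [s for _, s in decorated]

# The same ordering using the key argument of sorted().
by_second_key = sorted(words, key=lambda s: s[1])

print(by_second_dsu, by_second_key)
```

With key, the key function is called once per element, so it gives much the same benefit as the manual tuple trick without the extra lists.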

1.1.4. Loop
Take the for loop. When the loop body is very simple, the overhead of the loop itself becomes significant. Here map can help again. For example, if you want to convert every string in a long list l = ['a', 'b', ...] to upper case, you might write:

import string
newL = []
for s in l: newL.append(string.upper(s))
With map, you can do away with the explicit for loop:

import string
newL = map(string.upper, l)
Guido's article covers this in detail.
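
The snippets in this section are Python 2, where string.upper exists and map returns a list. A minimal Python 3 sketch of the same idea, assuming str.upper as the replacement for string.upper:

```python
words = ['a', 'b', 'c']

# Explicit loop.
upper_loop = []
for s in words:
    upper_loop.append(s.upper())

# map with the unbound method; list() is needed because map is lazy in Python 3.
upper_map = list(map(str.upper, words))

# List comprehension, a common alternative to map.
upper_comp = [s.upper() for s in words]

print(upper_loop, upper_map, upper_comp)
```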

1.1.5. Local variables and '.'
Continuing the example above, bind append = newL.append to a local name (and change the import in the same spirit):

import string
append = newL.append
for s in l: append(string.upper(s))
This is faster than calling newL.append inside the for loop. Why? Local variables are quicker to look up.

Skip Montanaro's results:

Basic loop: 3.47 seconds
Local variables to eliminate dots: 1.79 seconds
Using map: 0.54 seconds
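
The timings quoted above come from an old interpreter, so it is worth measuring on your own. A minimal sketch with timeit in Python 3 follows; the function names and the size of data are arbitrary choices for illustration.

```python
import timeit

data = ['a', 'b', 'c'] * 1000


def with_dots():
    result = []
    for s in data:
        result.append(s.upper())   # attribute lookups on every iteration
    return result


def with_locals():
    result = []
    append = result.append         # look up the bound method once
    upper = str.upper
    for s in data:
        append(upper(s))
    return result


t_dots = timeit.timeit(with_dots, number=50)
t_locals = timeit.timeit(with_locals, number=50)
print('dots: %.4fs  locals: %.4fs' % (t_dots, t_locals))
```

The size of the gap depends on the interpreter version, so treat the printed numbers as a local measurement rather than a general rule.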

1.1.6. try usage
For example, suppose you want to count how many times each word appears in a list of strings, l = ['I', 'you', 'python', 'perl', ...]. You might write:

count = {}
for s in l:
    if not count.has_key(s): count[s] = 1
    else: count[s] += 1
Checking whether the key is already in count on every iteration takes a lot of time. Use try instead:

count = {}
for s in l:
    try: count[s] += 1
    except KeyError: count[s] = 1
That is much better. Of course, if the exception is raised often, do not use try.
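
A runnable Python 3 sketch of the counting idiom, with two alternatives shown for comparison. collections.Counter is a standard-library class that the original text does not mention; the word list is sample data.

```python
from collections import Counter

words = ['I', 'you', 'python', 'I', 'perl', 'python', 'I']

# try/except version: cheap when most words have been seen already.
count_try = {}
for w in words:
    try:
        count_try[w] += 1
    except KeyError:
        count_try[w] = 1

# dict.get version: no exception handling needed.
count_get = {}
for w in words:
    count_get[w] = count_get.get(w, 0) + 1

# Counter does the same bookkeeping internally.
count_counter = Counter(words)

print(count_try)
```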

1.1.7. import Statement
This one is easy to understand: avoid importing a module inside a function definition, and import all modules at module (global) level instead.

1.1.8. Massive Data Processing
Because function call overhead in Python is relatively high, when processing a large amount of data you should write:

def f():
    for d in hugeData: ...
f()
rather than:

def f(d): ...
for d in hugeData: f(d)
This applies to other languages as well and is almost universal, but it matters more for interpreted languages.
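
To make the contrast concrete, here is a small sketch with a made-up workload (summing squares). All function names are hypothetical.

```python
def square(x):
    # Called once per element: one function call per item.
    return x * x


def sum_squares_many_calls(data):
    total = 0
    for d in data:
        total += square(d)
    return total


def sum_squares_one_call(data):
    # The loop lives inside the function: one call for the whole data set.
    total = 0
    for d in data:
        total += d * d
    return total


huge_data = range(100000)
print(sum_squares_one_call(huge_data))
```

Both versions compute the same result; the second avoids one function call per element.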

1.1.9. Reduce periodic checks
This is a feature of Python: the interpreter periodically checks whether other threads or system signals need to be handled.

You can use setcheckinterval in the sys module to set the interval between checks.

The default value is 10, that is, a check is made every 10 virtual instructions.

When you do not need to respond to system signals promptly, lengthening the check interval increases speed, sometimes significantly.
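
Note that sys.setcheckinterval belongs to older interpreters: it was deprecated in Python 3.2 and removed in Python 3.9. The related setting in Python 3 is sys.setswitchinterval, which takes a duration in seconds rather than an instruction count and governs thread switching. A minimal sketch, assuming Python 3; the value 0.05 is an arbitrary example, not a recommendation.

```python
import sys

old_interval = sys.getswitchinterval()   # 0.005 seconds unless changed
sys.setswitchinterval(0.05)              # ask for less frequent thread switches
new_interval = sys.getswitchinterval()
print(old_interval, new_interval)

sys.setswitchinterval(old_interval)      # restore the previous setting
```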

- Having compiled and translated this, it seems Python is easy to learn but hard to master, like the game of Go?

2. Our own experiences
Please share yours with others!

In the "massive data processing" section, do you mean moving the loop into the function instead of calling the function inside the loop body? For "identify the speed bottleneck", starting with Python 2.2 you can use the hotshot module. Its impact on the running program is said to be smaller than that of profile. -jacbfan
In the sentence that opens the "massive data processing" section, overhead was translated as "front". Translating it as "because the function call overhead in Python is relatively large, ..." would be better. -jacbfan
Will the method described in the sorting section really be faster? Is it really worth giving up readability instead of using sort directly? I have my doubts. -hoxide
Since Python 2.4, sort and sorted are more flexible. A link has been added to the text; I have never compared their efficiency. -Yichun
For "try usage":
In fact, the setdefault method exists for this purpose:

count = {}
for s in l:
    count[s] = count.setdefault(s, 0) + 1
It can actually do more. A common problem is grouping similar things, so you may want to use:

count = {}
for s in l:
    count.setdefault(s, []).append(s)
But this way you can only group identical things, not things of one kind. For example, if you have a list of dicts called sequence and need to classify them by the value of one key, and then operate on the dicts in each category, you need the groupby implemented by Raymond, and you can write:

totals = dict((key, group)
    for key, group in groupby(sequence, lambda x: x.get('age')))
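
The comment does not say which groupby it means. As an assumption, the sketch below uses itertools.groupby from the standard library, which only merges adjacent items and therefore needs sorted input. The records are invented sample data.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records; 'age' is the grouping key as in the text.
sequence = [
    {'name': 'Ann', 'age': 30},
    {'name': 'Bob', 'age': 25},
    {'name': 'Cid', 'age': 30},
]

# itertools.groupby only merges adjacent items, so sort by the key first.
by_age = sorted(sequence, key=itemgetter('age'))

# Materialize each group with list(); the group iterators are consumed lazily.
totals = {age: list(group) for age, group in groupby(by_age, key=itemgetter('age'))}

names_by_age = {age: [d['name'] for d in group] for age, group in totals.items()}
print(names_by_age)
```

Each value in totals is a list of the original dicts, so further per-category processing can work on them directly.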

 

This article is from the CSDN blog; please cite the source when reproducing it: http://blog.csdn.net/guzicheng/archive/2010/10/13/5939222.aspx

Note: increasing the expressive power of your code can cut your LOC several times over, making it simple and truly powerful.
Point of view: less typing = more thinking + fewer errors; 10 lines of code are easier to understand than 50. The following tips can help raise productivity fivefold.

1. Avoid temporary variables when swapping variable values (Cookbook 1.1)
Old code: we are all very familiar with the following code.
temp = x
x = y
y = temp
Code 1:
u, v, w = w, v, u
Someone suggested that assignment order can be used to reduce the three lines above to one.
Code 2:
u, v = v, u
In fact, Python's tuple assignment makes this even more concise: tuple packing plus tuple unpacking.
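
A short runnable sketch of the swap; the initial values are arbitrary.

```python
x, y = 1, 2
x, y = y, x            # right side is packed into a tuple, then unpacked

u, v, w = 'a', 'b', 'c'
u, v, w = w, v, u      # three-way reassignment in one statement

print(x, y, u, v, w)
```

The right-hand side is evaluated in full before any name on the left is rebound, which is why no temporary variable is needed.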

2. Avoid checking whether a key exists when reading a dictionary (Cookbook 1.2)
d = {'key': 'value'}
Old code:
if 'key' in d: print d['key']
else: print 'not found'
New code:
print d.get('key', 'not found')
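
The behavior of get in the three possible cases, as a Python 3 sketch; the key 'other' is an invented example of a missing key.

```python
d = {'key': 'value'}

found = d.get('key', 'not found')        # key present: stored value
missing = d.get('other', 'not found')    # key absent: the default
no_default = d.get('other')              # default omitted: None

print(found, missing, no_default)
```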

3. Code optimization for finding the minimum value and its position:
s = [4, 1, 8, 3]
Old code:
mval, mpos = MAX, 0
for i in xrange(len(s)):
    if s[i]
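
The listing above breaks off in this copy, and the compact version that presumably followed is missing. As a hedged sketch only, not the author's lost code, finding the minimum and its position in Python 3 might look like this:

```python
s = [4, 1, 8, 3]

# Explicit loop that tracks the smallest value and its index.
mval, mpos = s[0], 0
for i in range(1, len(s)):
    if s[i] < mval:
        mval, mpos = s[i], i

# Compact alternative: min over (value, index) pairs.
mval2, mpos2 = min((v, i) for i, v in enumerate(s))

print(mval, mpos)
```

Because tuples compare element by element, min over (value, index) pairs returns the smallest value together with the first index at which it occurs.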
Viewpoint 1: when programming in Python, treat every word as if it were worth a thousand pieces of gold. Once you have chosen Python, do not worry about the efficiency of a single statement.
The examples above are very basic. Compressing the original code to 1/5 of its size is not impossible in practice. In one of our earlier sub-projects the C++ code was 270 K; after the rewrite, the Python code was only 67 K. Of course, Python's logging module (logging), CSV reading and writing (csv), and so on helped as well. The final code is 1/4 of the original, and I feel my life has been extended threefold... Below are several more common code optimizations:

4. The simplest way to read a file:
Old code: we need to read a text file into memory
text = ''
fp = open('text.txt', 'r')
for line in fp: text += line
Code 1:
text = string.join([line for line in open('text.txt')], '')
Code 2:
text = ''.join([line for line in open('text.txt')])
Code 3:
text = file('text.txt').read()
Newer versions of Python let you write code that is more elegant than Code 1 or Code 2 (open is an alias of file, and file reads more intuitively here)
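
The file built-in used in Code 3 exists only in Python 2. A Python 3 sketch of the same idea uses open inside a with block, which also closes the file. The temporary file is created here only so the example is self-contained.

```python
import os
import tempfile

# Create a small throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first line\nsecond line\n')
    path = tmp.name

# Read the whole file at once; the with block closes it afterwards.
with open(path) as fp:
    text = fp.read()

# Equivalent result by joining the lines the file iterator yields.
with open(path) as fp:
    joined = ''.join(fp)

os.remove(path)
print(text)
```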

5. How to write a ternary expression in Python:
Old code: people coming from C++, Java, and C# do not like writing the following
if n >= 0: print 'positive'
else: print 'negative'
Code 1: this technique is also common in Lua.
print (n >= 0) and 'positive' or 'negative'
Note: 'and' and 'or' here play the roles of '?' and ':' in C. The principle is simple: if the condition is true, and yields 'positive' and or short-circuits on it; otherwise and short-circuits and or yields 'negative'.
Code 2:
print (n >= 0 and ['positive'] or ['negative'])[0]
Note: the two values are wrapped in lists, so the expression is safe even if the 'positive' value is None, '', or 0.
Code 3:
print ('negative', 'positive')[n >= 0]
Note: (FalseValue, TrueValue)[Condition] relies on two facts: tuple indexing and True == 1.
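
Since Python 2.5 the language also has a conditional expression, which avoids the pitfall that Code 2 works around. The sketch below contrasts the two; the helper names are made up for illustration.

```python
def sign_and_or(n):
    # and/or idiom from the text; works because 'positive' is truthy.
    return (n >= 0) and 'positive' or 'negative'


def sign_conditional(n):
    # Conditional expression, available since Python 2.5.
    return 'positive' if n >= 0 else 'negative'


def pick_and_or(cond, a, b):
    # Pitfall: if a is falsy, the and/or idiom returns b even when cond is true.
    return cond and a or b


def pick_conditional(cond, a, b):
    return a if cond else b


print(sign_and_or(3), sign_conditional(-1))
```

Calling pick_and_or(True, '', 'fallback') returns 'fallback', while pick_conditional(True, '', 'fallback') returns the empty string as intended.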

6. Avoid explicitly initializing complex objects as dictionary members (Cookbook 1.5)
Old code:
if not y in d: d[y] = {}
d[y][x] = 3
New code:
d.setdefault(y, {})[x] = 3
The same works if the member is a list: d.setdefault(key, []).append(val)
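
A runnable sketch of both setdefault forms, plus collections.defaultdict, a standard-library alternative that the original text does not mention. The tuples in pairs are invented sample data.

```python
from collections import defaultdict

pairs = [('a', 'x', 1), ('a', 'y', 2), ('b', 'x', 3)]

# setdefault: create the inner dict on first use, then assign into it.
nested = {}
for outer, inner, val in pairs:
    nested.setdefault(outer, {})[inner] = val

# defaultdict: the factory builds the missing value automatically.
grouped = defaultdict(list)
for outer, inner, val in pairs:
    grouped[outer].append(val)

print(nested, dict(grouped))
```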
With the six tips above, the code is already very compact, but it is not yet "free of nonsense". Some may doubt that code can really be reduced to 1/5 of its size. What I want to say is that 1/5 is actually quite conservative. The author of Thinking in C++ later used Python and considers that it improved his productivity as much as tenfold. The following examples illustrate further:
Example 1: convert a textual IP address to an integer
Note: you need to convert an IP address such as '192.168.10.214' to 0xC0A80AD6 without inet_aton. While C++/Java programmers are still worrying about how to parse the text and handle the various input errors, the Python programmer has already gone home:
f = lambda ip: sum([int(k) * v for k, v in zip(ip.split('.'), [1
First, ip.split('.') returns the list ['192', '168', '10', '214']. After zip, it becomes
[('192', 0x1000000), ('168', 0x10000), ('10', 0x100), ('214', 1)]
Next, the for loop multiplies the two items of each tuple as integers, and sum adds up the values of the new list to give the result.
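
The one-liner above is cut off in this copy. The following is a sketch of the described steps, not the author's exact line; the weights list is an assumption derived from the expected result 0xC0A80AD6, and ip_to_int is a made-up name.

```python
def ip_to_int(ip):
    # Weights for the four octets: 256**3, 256**2, 256**1, 256**0.
    weights = [1 << 24, 1 << 16, 1 << 8, 1]
    return sum(int(k) * v for k, v in zip(ip.split('.'), weights))


value = ip_to_int('192.168.10.214')
print(hex(value))
```

Like the original, this sketch does no input validation; a malformed octet makes int raise ValueError.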
The C++ programmer said: "You seem to trust the data too much and have not considered bad input."
The Python programmer replied: "try/except handles all the exceptions for me, so there is no need to worry about out-of-bounds crashes that cannot be caught."
The Java programmer looked proudly at his own hundred lines of code: "I would like to know how you get your colleagues to understand your masterpiece. Have you considered factoring functions such as gettoken out so they can be reused for similar problems? My code shows how to make full use of excellent features such as reflection and interfaces, providing clear, readable code while increasing reusability."
The Python programmer said helplessly: "This is 'pure code', meaning it is not meant to be modified, much like a regular expression. As long as you understand what it does, you can rewrite it when you need a change. If I can finish something in three lines of code, I will never think about encapsulation. Besides, I do not find Python hard to read."
The C++ programmer played his trump card: "What if you had to handle 10 million IP address conversions per second?"
The Python programmer, ready to go to bed: "Do you think I would be foolish enough to do that in Python?"
At this point the C++ programmer seemed not to have heard; he began to think seriously about his own question. After a while he glanced at the other two, turned confidently to his computer, and started typing "template" on the screen...
Joke: the trap of encapsulation is that people shout "encapsulation" and "reuse" while throwing everything out and rewriting it in each new project, and call that refactoring.
Viewpoint 2: simplicity is beauty. Making something complicated is a design problem.
Question: if the program above had to do the reverse and convert the integer form of an IP address to a string, how would you design it?
Example 2: output the name and value of each member of an object.
g = lambda m: '\n'.join(['%s=%s' % (k, repr(v)) for k, v in m.__dict__.iteritems()])
Usage: print g(x)
Further: once the two examples above have made you comfortable with lambda, try yield next.
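
Example 2 relies on iteritems, which exists only in Python 2. A Python 3 sketch of the same idea follows, together with a generator variant as one possible way to try yield. The Point class and both function names are hypothetical.

```python
class Point(object):
    # Hypothetical class used only to have something to inspect.
    def __init__(self, x, y):
        self.x = x
        self.y = y


def describe(obj):
    # vars(obj) returns the instance __dict__; sort for a stable order.
    return '\n'.join('%s=%r' % (k, v) for k, v in sorted(vars(obj).items()))


def iter_members(obj):
    # Generator variant: yield one formatted line at a time.
    for k, v in sorted(vars(obj).items()):
        yield '%s=%r' % (k, v)


print(describe(Point(1, 'a')))
```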

Summary
Q: "How do you pay more attention to what you are thinking than to what you are writing?"
A: "It means that while you are coding the requirements on page 1, you are reading the requirements on page 2 and thinking about how to handle pages 5 to 10."
It makes sense that Python replaced Pascal for research and teaching many years ago. There are countless examples of code being simplified. When coding in it you should feel that every word is worth a thousand pieces of gold; otherwise what you end up writing is still a C++ program disguised as Python.
Programming is fun: avoid excessive manual labor and win more time for thinking.
Question: encapsulate, or abandon encapsulation?
Question: is "more than one way to do it" a good thing? What is its opposite?
