1. Tips for speeding up Python programs
1.1. Most common
* The most common speed trap (I fell into it a few times myself before I came across this introduction online): when many short strings are concatenated into one long string, people usually write:
    strs = [str0, str1, ..., strN]   # a list of N + 1 strings
    longStr = ''
    for s in strs: longStr += s
Because Python strings are immutable, every longStr += s copies the current longStr and s into a new string and then assigns it to longStr. As longStr grows, there is more and more to copy: in the end str0 has been copied N + 1 times, str1 N times, and so on, so the total work grows roughly with the square of the number of pieces.
What can we do? See Skip Montanaro's explanation at http://musi-cal.mojam.com/~skip/python/fastpython.html and Guido van Rossum's own essay at http://www.python.org/doc/essays/list2str.html
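One rough way to see why repeated += hurts is to count the characters copied. The sketch below is a simplified model, not a measurement: the helper names are hypothetical, and it ignores the fact that CPython can sometimes extend a string in place for s += t, so real timings may look better than the model suggests.

```python
def chars_copied_by_plus_equals(pieces):
    """Model of repeated longStr += s: every step writes out the whole new string."""
    total = 0
    length = 0
    for s in pieces:
        length += len(s)   # length of longStr after this step
        total += length    # the entire new string is copied
    return total


def chars_copied_by_join(pieces):
    """Model of ''.join(pieces): the result is sized once, each piece copied once."""
    return sum(len(s) for s in pieces)


pieces = ['ab'] * 1000  # 1000 short strings, 2000 characters in total
print(chars_copied_by_plus_equals(pieces))  # 1001000
print(chars_copied_by_join(pieces))         # 2000
```

Doubling the number of pieces roughly quadruples the first number but only doubles the second.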
1.1.1. Identify the speed bottleneck
* 1) First, learn how to identify the speed bottleneck. Python has a profile module:
    import profile
    profile.run('function_to_check()')   # placeholder: the call you want to examine
It prints how many times each function called from that function was invoked, the time per call and the total time. Nice, isn't it? For details, see the description of the profile module in the Library Reference.
Of course, there is also the low-tech way: call time() from the time module before and after the code, and the difference between the two values is the number of seconds elapsed.
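Both approaches can be sketched as follows. slow_concat is a hypothetical workload, and the report is captured through pstats so it can be sorted; this is one way to drive the profile module, not the only one.

```python
import io
import profile
import pstats
import time


def slow_concat(n):
    """Hypothetical workload: build a string with repeated +=."""
    s = ''
    for i in range(n):
        s += str(i)
    return s


def profile_call(func, *args):
    """Run func(*args) under the profile module; return (result, report text)."""
    prof = profile.Profile()
    result = prof.runcall(func, *args)
    buf = io.StringIO()
    pstats.Stats(prof, stream=buf).sort_stats('cumulative').print_stats(10)
    return result, buf.getvalue()


def time_call(func, *args):
    """The simple way: read the clock before and after the call."""
    start = time.time()
    result = func(*args)
    return result, time.time() - start


result, report = profile_call(slow_concat, 2000)
print(report)
result, seconds = time_call(slow_concat, 2000)
print('slow_concat(2000) took %.4f seconds' % seconds)
```

cProfile offers the same Profile/runcall interface with much lower overhead, and on Python 3.3+ time.perf_counter() is a higher-resolution clock than time.time().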
1.1.2. String concatenation
* For the example above, use:
    longStr = ''.join(strs)
and the problem is solved at once. But what if strs does not contain only strings, but also some numbers? Calling join directly then raises an error (TypeError). Don't worry, do this:
    strs = [str(s) for s in strs]   # convert every item to a string first
    longStr = ''.join(strs)
That is, first convert every item in the list to a string, then join.
For just a few strings, avoid all = str0 + str1 + str2 + str3 and use all = '%s%s%s%s' % (str0, str1, str2, str3) instead.
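A small sketch of both points: join rejects non-strings, converting first fixes that, and a single % formatting handles a handful of pieces. The function names are hypothetical helpers for illustration.

```python
def join_mixed(items):
    """Convert every item to a string first, then join once."""
    return ''.join([str(s) for s in items])


def format_few(str0, str1, str2, str3):
    """For a handful of pieces: one % formatting instead of three + operations."""
    return '%s%s%s%s' % (str0, str1, str2, str3)


try:
    ''.join(['abc', 1])            # join accepts only strings
except TypeError as exc:
    print('join failed:', exc)

print(join_mixed(['abc', 1, 2.5, None]))  # abc12.5None
print(format_few('a', 'b', 'c', 'd'))     # abcd
```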
1.1.3. Sequence sorting
* list.sort()
You can pass your own comparison function, list.sort(function), as long as it takes two arguments and returns 1, 0 or -1 according to your ordering rule (this cmp-style argument exists only in Python 2; Python 3 removed it). Convenient, isn't it? But it slows sorting down considerably, because a Python function is called for every comparison. The following decorate-sort-undecorate method avoids that, since the comparisons are then done by the built-in tuple comparison; an example makes it easier to understand.
For example, if your list is l = ['az', 'by'] and you want to sort it by the second letter, extract the key and pair it with each string in a tuple: new = list(map(lambda s: (s[1], s), l))
new is now [('z', 'az'), ('y', 'by')]. Sort it: new.sort()
new becomes [('y', 'by'), ('z', 'az')]; then take back the second element of each tuple: sortedL = list(map(lambda t: t[1], new))
sortedL is now ['by', 'az']. lambda and map are put to good use here.
* For how to use sort and sorted since Python 2.4 (including the key argument), see the wiki page HowToSort.
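The decorate-sort-undecorate steps above, next to the key= form available since Python 2.4, as a minimal sketch (function names are hypothetical):

```python
def sort_by_second_letter_dsu(words):
    """Decorate-sort-undecorate, as described above."""
    decorated = [(s[1], s) for s in words]   # decorate: (key, original)
    decorated.sort()                          # built-in tuple comparison, no Python cmp function
    return [t[1] for t in decorated]          # undecorate


def sort_by_second_letter_key(words):
    """Python 2.4+: sorted() does the decoration internally via key=."""
    return sorted(words, key=lambda s: s[1])


l = ['az', 'by']
print(sort_by_second_letter_dsu(l))   # ['by', 'az']
print(sort_by_second_letter_key(l))   # ['by', 'az']
```

One difference to keep in mind: with equal keys, the tuple version falls back to comparing the original strings, while key= is stable and keeps the input order. For ['bz', 'az'] the first returns ['az', 'bz'] and the second ['bz', 'az'].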
1.1.4. Loop
Take the for loop: when the loop body is very simple, the overhead of the loop itself becomes relatively large. Here map can help again. For example, to convert every string in a long list l = ['a', 'b', ...] to upper case, you might write:
    newL = []
    for s in l: newL.append(str.upper(s))
With map, the explicit for loop disappears entirely:
    newL = list(map(str.upper, l))   # list() is needed on Python 3, where map is lazy
Guido's article is very detailed.
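A minimal sketch of the two forms, which also shows that on Python 3 map returns a lazy iterator (hypothetical function names):

```python
def upper_loop(l):
    """Explicit loop: the interpreter executes the loop body once per item."""
    newL = []
    for s in l:
        newL.append(str.upper(s))
    return newL


def upper_map(l):
    """map applies str.upper to each item; in CPython the iteration runs in C."""
    return list(map(str.upper, l))   # list() because map is lazy on Python 3


l = ['a', 'b', 'c']
lazy = map(str.upper, l)   # on Python 3 nothing is computed yet
print(list(lazy))          # ['A', 'B', 'C']
print(upper_loop(l) == upper_map(l))  # True
```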
1.1.5. Local variables and '.'
Continuing the example above, bind newL.append to a local name append (and, in the same spirit, bind the upper-casing function to a local name as well):
    upper = str.upper          # looked up once
    newL = []
    append = newL.append       # bound method looked up once
    for s in l: append(upper(s))
This is faster than calling newL.append inside the for loop. Why? Local variables are faster to look up, and the attribute lookup (the '.') is done once instead of on every iteration.
Skip Montanaro's results:
Basic loop: 3.47 seconds
Local variables, dot lookups removed: 1.79 seconds
Using map: 0.54 seconds
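To reproduce this kind of comparison on your own machine, the three variants can be timed with timeit. This is a sketch with hypothetical test data; absolute numbers depend on the Python version and hardware and will not match the figures above.

```python
import timeit

words = ['word%d' % i for i in range(10000)]  # hypothetical test data


def basic_loop(l):
    newL = []
    for s in l:
        newL.append(s.upper())   # attribute lookups on every pass
    return newL


def local_names(l):
    upper = str.upper            # looked up once
    newL = []
    append = newL.append         # bound method looked up once
    for s in l:
        append(upper(s))
    return newL


def with_map(l):
    return list(map(str.upper, l))


for fn in (basic_loop, local_names, with_map):
    seconds = timeit.timeit(lambda: fn(words), number=20)
    print('%-12s %.3f s' % (fn.__name__, seconds))
```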
1.1.6. try usage
For example, if you want to count how many times each word appears in a list of strings l = ['I', 'you', 'python', 'perl', ...], you might write:
    count = {}
    for s in l:
        if s not in count: count[s] = 1
        else: count[s] += 1
Searching count for the key on every iteration takes a lot of time. Use try instead:
    count = {}
    for s in l:
        try: count[s] += 1
        except KeyError: count[s] = 1
That's much better. Of course, if the exception is raised often (many distinct words), don't use try.
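Both counting styles as runnable functions, checked against each other. For comparison the sketch also uses collections.Counter, a standard-library counter (Python 2.7+) that is not part of the original text.

```python
from collections import Counter


def count_with_in(l):
    """Check membership first, then store or increment."""
    count = {}
    for s in l:
        if s not in count:
            count[s] = 1
        else:
            count[s] += 1
    return count


def count_with_try(l):
    """Assume the key exists; handle the first occurrence in the except branch."""
    count = {}
    for s in l:
        try:
            count[s] += 1
        except KeyError:  # first time this word is seen
            count[s] = 1
    return count


words = ['I', 'you', 'python', 'perl', 'I', 'python', 'I']
print(count_with_try(words))   # {'I': 3, 'you': 1, 'python': 2, 'perl': 1}
print(count_with_in(words) == count_with_try(words) == dict(Counter(words)))  # True
```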
1.1.7. import Statement
This one is easy to understand: avoid importing modules inside function definitions, and do all imports at the global (module) level.
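A sketch of the difference, using math as an arbitrary example module. An import statement inside a function runs on every call; once the module is loaded it is only a cached lookup, but it is still extra work per call.

```python
import math
import timeit


def hypot_global(a, b):
    """Uses the module imported once at the top of the file."""
    return math.sqrt(a * a + b * b)


def hypot_local_import(a, b):
    """Executes the import statement on every call."""
    import math as m
    return m.sqrt(a * a + b * b)


print(timeit.timeit(lambda: hypot_global(3, 4), number=100000))
print(timeit.timeit(lambda: hypot_local_import(3, 4), number=100000))
```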
1.1.8. Massive Data Processing
Because function calls in Python are expensive (high overhead), when processing a large amount of data you should write:
    def f():
        for d in hugeData: ...
    f()
rather than:
    def f(d): ...
    for d in hugeData: f(d)
This probably applies to other languages too and is almost universal, but it matters more for interpreted languages.
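The two shapes above, filled in with a hypothetical per-item computation (squaring) so they can be timed side by side:

```python
import timeit

hugeData = list(range(100000))  # hypothetical stand-in for a large data set


def square(d):
    return d * d


def call_per_item(data):
    """Loop outside the work: one Python function call per element."""
    total = 0
    for d in data:
        total += square(d)
    return total


def loop_inside(data):
    """Loop inside the function: a single call processes everything."""
    total = 0
    for d in data:
        total += d * d
    return total


print(timeit.timeit(lambda: call_per_item(hugeData), number=10))
print(timeit.timeit(lambda: loop_inside(hugeData), number=10))
```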
1.1.9. Reduce periodic checks
This is a Python feature: the interpreter periodically checks whether other threads or system signals need to be handled.
You can set the interval between checks with setcheckinterval in the sys module.
The default value was 10, i.e. one check every 10 virtual (bytecode) instructions; since Python 2.3 it is 100.
When you don't need to deal with threads or system signals, setting a longer check interval increases speed, sometimes significantly.
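On Python 3, sys.setcheckinterval was deprecated in 3.2 and removed in 3.9; the closest knob is sys.setswitchinterval, which sets in seconds (default 0.005) how often the interpreter may switch between threads. A minimal sketch, assuming you want the change only around one block of work: the context manager name is hypothetical, and whether a longer interval helps measurably depends on the workload, so benchmark it.

```python
import sys
from contextlib import contextmanager


@contextmanager
def longer_switch_interval(seconds):
    """Temporarily set the thread switch interval (Python 3.2+), then restore it."""
    old = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        yield old
    finally:
        sys.setswitchinterval(old)


print('current switch interval:', sys.getswitchinterval())
with longer_switch_interval(0.05):
    # hypothetical CPU-bound work that does not depend on quick thread switching
    total = sum(i * i for i in range(100000))
print('restored:', sys.getswitchinterval())
```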
--- End of the compiled translation. Python seems easy to learn but hard to master, like the game of Go?
2. Our own experiences
Everyone is welcome to share!
2.1. Story
* Python Performance Tuning ~ shared by Flydudu
2.2. Thinking
* In the "massive data processing" section, isn't the point to avoid calling a function inside the loop body and to move the loop into the function instead? Also, for "identify the speed bottleneck": since Python 2.2 you can use the hotshot module, which reportedly affects the program's running speed less than profile does. -- jacbfan
* In the sentence "function calling in Python is too heavy (overhead), so when processing a large amount of data...", rendering "overhead" as "heavy" seems inappropriate; "because the overhead of function calls in Python is relatively large, ..." would be better. -- jacbfan
* Is the method described under sequence sorting really faster? And is it worth giving up the readability of using sort directly? Skeptical. -hoxide
* Since Python 2.4, sort and sorted can be used more flexibly; I have added a link in the text. I have never compared their efficiency. -Yichun
* "Try usage ":
In fact, setdefault exists for exactly this purpose:
    count = {}
    for s in l:
        count[s] = count.setdefault(s, 0) + 1
It can actually do more. A common task is grouping similar things, for which you might write:
    groups = {}
    for s in l:
        groups.setdefault(s, []).append(s)
But this only groups identical items, not items of the same kind. Suppose you have a list of dicts called sequence, want to classify them by the value of one key, and then operate on the dicts in each group. For that you need the groupby that Raymond implemented (itertools.groupby). It only merges adjacent items with equal keys, so sort by the same key first; then you can write:
    from itertools import groupby
    getAge = lambda x: x.get('age')
    totals = dict((key, list(group))
                  for key, group in groupby(sorted(sequence, key=getAge), getAge))
-Yichun
* strs = [str(s) for s in strs[I] raised an error for me in Python 2.5 (name 'I' is not defined), so I changed it to strs = [str(s) for s in strs].
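The dict and groupby techniques from the discussion above, put together as one runnable sketch. The people list is hypothetical sample data, and the key function assumes every dict has a comparable 'age' value.

```python
from itertools import groupby


def count_words(l):
    """Counting with setdefault: fetch (or create) the current count, then store count + 1."""
    count = {}
    for s in l:
        count[s] = count.setdefault(s, 0) + 1
    return count


def group_identical(l):
    """Grouping identical items: setdefault returns the list stored for this key."""
    groups = {}
    for s in l:
        groups.setdefault(s, []).append(s)
    return groups


def group_by_age(sequence):
    """Grouping items of the same kind: sort by the key, then groupby adjacent runs."""
    get_age = lambda x: x.get('age')
    ordered = sorted(sequence, key=get_age)   # groupby only merges adjacent items
    return dict((key, list(group)) for key, group in groupby(ordered, get_age))


people = [  # hypothetical sample data
    {'name': 'Ann', 'age': 30},
    {'name': 'Bob', 'age': 25},
    {'name': 'Cid', 'age': 30},
]
print(count_words(['I', 'you', 'I']))   # {'I': 2, 'you': 1}
by_age = group_by_age(people)
print(sorted(by_age))                   # [25, 30]
print([p['name'] for p in by_age[30]])  # ['Ann', 'Cid']
```

Without the sorted() call, the two age-30 entries would land in separate runs and the dict would keep only the last one.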