Common Python code optimization skills


Code optimization makes a program run more efficiently without changing its results. According to the 80/20 principle, refactoring, optimization, extension, and documentation usually account for about 80% of the total work on a program. Optimization usually has two goals: reducing the size of the code and improving its running efficiency.

Improve algorithms and select appropriate data structures

A good algorithm plays a key role in performance, so the first step in improving performance is to improve the algorithm. Common time complexities, ordered from slowest to fastest growth, are:

O(1) -> O(lg n) -> O(n lg n) -> O(n^2) -> O(n^3) -> O(n^k) -> O(k^n) -> O(n!) 

An improvement in an algorithm's time complexity therefore yields an obvious performance gain. Improving specific algorithms is outside the scope of this article, and readers can consult other references on that topic. The following content focuses on the selection of data structures.

  • Dictionary and list

A Python dictionary is implemented as a hash table, so a lookup has an average complexity of O(1). A list is actually an array, so a lookup has to traverse the list and has a complexity of O(n). Operations such as searching for and accessing members are therefore faster on a dictionary than on a list.

Listing 1. dict.py

from time import time
t = time()
list = ['a','b','is','python','jason','hello','hill','with','phone','test',
'dfdf','apple','pddf','ind','basic','none','baecr','var','bana','dd','wrd']
#list = dict.fromkeys(list,True)
print list
filter = []
for i in range (1000000):
    for find in ['is','hat','new','list','old','.']:
        if find not in list:
            filter.append(find)
print "total run time:"
print time()-t

The above code takes about 16.09 seconds to run. If you uncomment the line #list = dict.fromkeys(list,True), which converts the list into a dictionary, and run it again, the time drops to about 8.375 seconds, roughly half. Therefore, when many data members need to be searched or accessed frequently, using a dict instead of a list is a good choice.
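The comparison above can also be sketched with the standard timeit module. The block below is a minimal illustration written for Python 3 (the listings in this article use Python 2), and the names word_list, word_dict, and missing_from are made up for this example. Absolute timings depend on the machine and interpreter, so none are shown.

```python
from timeit import timeit

words = ['a', 'b', 'is', 'python', 'jason', 'hello', 'hill', 'with', 'phone',
         'test', 'dfdf', 'apple', 'pddf', 'ind', 'basic', 'none', 'baecr',
         'var', 'bana', 'dd', 'wrd']
probes = ['is', 'hat', 'new', 'list', 'old', '.']

word_list = list(words)                  # "in" scans the array: O(n)
word_dict = dict.fromkeys(words, True)   # "in" hashes the key: O(1) on average


def missing_from(container):
    """Return the probe words that are not present in the container."""
    return [p for p in probes if p not in container]


# Both containers give the same answer; only the lookup cost differs.
list_seconds = timeit(lambda: missing_from(word_list), number=20000)
dict_seconds = timeit(lambda: missing_from(word_dict), number=20000)

print("list lookups:", list_seconds)
print("dict lookups:", dict_seconds)
```

If only membership matters and no values are attached to the keys, a set built with set(words) would serve the same purpose as the dictionary here.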

  • Set and list

Set operations such as union, intersection, and difference are faster than iterating over lists. Therefore, problems that involve the intersection, union, or difference of lists can be converted into set operations.

Listing 2. Intersection of list:

from time import time
t = time()
lista=[1,2,3,4,5,6,7,8,9,13,34,53,42,44]
listb=[2,4,6,9,23]
intersection=[]
for i in range (1000000):
    for a in lista:
        for b in listb:
            if a == b:
                intersection.append(a)
print "total run time:"
print time()-t

The running time of the above program is:

total run time: 38.4070000648 

Listing 3. Using set to calculate the intersection

from time import time
t = time()
lista=[1,2,3,4,5,6,7,8,9,13,34,53,42,44]
listb=[2,4,6,9,23]
intersection=[]
for i in range (1000000):
    list(set(lista)&set(listb))
print "total run time:"
print time()-t

After switching to set, the running time drops to 8.75 seconds, which is more than four times faster. Table 1 lists common set operations that you can test yourself.

Table 1. Common set usage

Syntax                     Operation      Description
set(list1) | set(list2)    Union          New set containing all elements of list1 and list2
set(list1) & set(list2)    Intersection   New set containing the elements common to list1 and list2
set(list1) - set(list2)    Difference     New set containing the elements that appear in list1 but not in list2
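As a small illustration of the three operations in Table 1, the following sketch applies them to the two lists from Listings 2 and 3. It is written for Python 3, and the results are sorted only to make the output easy to read, because sets are unordered.

```python
lista = [1, 2, 3, 4, 5, 6, 7, 8, 9, 13, 34, 53, 42, 44]
listb = [2, 4, 6, 9, 23]

set_a, set_b = set(lista), set(listb)

union = sorted(set_a | set_b)          # every element of either list
intersection = sorted(set_a & set_b)   # elements common to both lists
difference = sorted(set_a - set_b)     # elements of lista not in listb

print("union:       ", union)
print("intersection:", intersection)   # [2, 4, 6, 9]
print("difference:  ", difference)
```

Note that converting to a set discards duplicates and the original order, so this substitution fits only when neither matters.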

Loop Optimization

The principle of loop optimization is to minimize the amount of computation performed inside the loop. With nested loops, move computations from the inner loop to an outer loop wherever possible. The following examples compare the performance before and after loop optimization. Listing 4, without loop optimization, takes about 132.375 seconds to run.

Listing 4. Before loop optimization

from time import time
t = time()
lista = [1,2,3,4,5,6,7,8,9,10]
listb =[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,0.01]
for i in range (1000000):
    for a in range(len(lista)):
        for b in range(len(listb)):
            x=lista[a]+listb[b]
print "total run time:"
print time()-t

Now apply the following optimizations: move the length calculations out of the loop, replace range with xrange, and move the lookup of lista[a] from the third loop level up to the second.

Listing 5. After loop Optimization

from time import time
t = time()
lista = [1,2,3,4,5,6,7,8,9,10]
listb =[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,0.01]
len1=len(lista)
len2=len(listb)
for i in xrange (1000000):
    for a in xrange(len1):
        temp=lista[a]
        for b in xrange(len2):
            x=temp+listb[b]
print "total run time:"
print time()-t

The running time of the optimized program drops to 102.171999931 seconds. In Listing 4, lista[a] is evaluated 1000000*10*10 times, whereas in the optimized code it is evaluated 1000000*10 times. The number of evaluations is greatly reduced, so performance improves.
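The same hoisting idea can be sketched in Python 3, where range is already lazy and xrange is not needed. The function names below are invented for this illustration, and the functions collect the sums in a list only so that the two versions can be checked against each other.

```python
from timeit import timeit

lista = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
listb = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.01]


def sums_unoptimized():
    out = []
    for a in range(len(lista)):
        for b in range(len(listb)):           # len(listb) is recomputed per outer pass
            out.append(lista[a] + listb[b])   # lista[a] is looked up on every inner pass
    return out


def sums_hoisted():
    out = []
    len1 = len(lista)                         # lengths computed once, outside the loops
    len2 = len(listb)
    for a in range(len1):
        temp = lista[a]                       # looked up once per outer pass
        for b in range(len2):
            out.append(temp + listb[b])
    return out


unoptimized_seconds = timeit(sums_unoptimized, number=2000)
hoisted_seconds = timeit(sums_hoisted, number=2000)
print("unoptimized:", unoptimized_seconds)
print("hoisted:    ", hoisted_seconds)
```

The gain from hoisting is usually modest, as the timings in Listings 4 and 5 suggest, so it is worth measuring before restructuring readable code.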

Make full use of lazy evaluation in if conditions

In Python, conditional expressions are evaluated lazily (short-circuit evaluation). That is, in a condition such as if x and y, the expression y is not evaluated when x is false. This feature can be used to improve program efficiency to some extent.

Listing 6. Using lazy evaluation in if conditions

from time import time
t = time()
abbreviations = ['cf.', 'e.g.', 'ex.', 'etc.', 'fig.', 'i.e.', 'Mr.', 'vs.']
for i in range (1000000):
    for w in ('Mr.', 'Hat', 'is', 'chasing', 'the', 'black', 'cat', '.'):
        if w in abbreviations:
        #if w[-1] == '.' and w in abbreviations:
            pass
print "total run time:"
print time()-t

Before optimization, the program takes about 8.84 seconds to run. If you replace the first if with the commented-out line, the running time drops to about 6.17 seconds.
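To make the effect of short-circuit evaluation visible, the sketch below counts how often the list search actually runs with and without the cheap w[-1] == '.' guard. It is written for Python 3, and count_lookups and in_abbreviations are helper names made up for this example.

```python
abbreviations = ['cf.', 'e.g.', 'ex.', 'etc.', 'fig.', 'i.e.', 'Mr.', 'vs.']
words = ('Mr.', 'Hat', 'is', 'chasing', 'the', 'black', 'cat', '.')


def count_lookups(use_cheap_guard):
    """Return (matches, lookups) for one pass over the words."""
    lookups = 0
    matches = 0

    def in_abbreviations(word):
        nonlocal lookups
        lookups += 1                  # count every real search of the list
        return word in abbreviations

    for w in words:
        if use_cheap_guard:
            # The list search is skipped whenever the cheap test is false.
            hit = w[-1] == '.' and in_abbreviations(w)
        else:
            hit = in_abbreviations(w)
        if hit:
            matches += 1
    return matches, lookups


print(count_lookups(False))   # (1, 8): every word is searched for
print(count_lookups(True))    # (1, 2): only 'Mr.' and '.' reach the search
```

The guard pays off because it is cheap and usually false; putting the expensive test first would remove the benefit.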

String Optimization

String objects in Python are immutable. Any string operation, such as concatenation or modification, therefore creates a new string object instead of changing the original one, and this repeated copying affects Python performance to some extent. Optimizing string handling is an important part of improving performance, especially when a program processes a lot of text. String optimization mainly involves the following points:

  1. Prefer join() to + for string concatenation. In Listing 7, concatenating with + takes about 0.125 s, whereas join takes about 0.016 s, so join is considerably faster than + for building strings.

Listing 7. Use join instead of + to concatenate strings

from time import time
t = time()
s = ""
list = ['a','b','b','d','e','f','g','h','i','j','k','l','m','n']
for i in range (10000):
    for substr in list:
        s+= substr
print "total run time:"
print time()-t

Avoid:

s = ""
for x in somelist:
    s += func(x)

Instead, use:

slist = [func(elt) for elt in somelist]
s = "".join(slist)

  2. When a string can be processed with either regular expressions or built-in string methods, choose the built-in methods, such as str.isalpha(), str.isdigit(), str.startswith(('x', 'yz')), and str.endswith(('x', 'yz')).
  3. Formatting a string is faster than concatenating its pieces directly with +. Therefore, prefer:
out = "

Avoid

out = "
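The three string tips above can be sketched together as follows. This is a minimal Python 3 illustration: the variables head, body, and tail and both function names are hypothetical, chosen only for this example. How much faster join or formatting is depends on the interpreter version, so the block checks that the alternatives are equivalent and leaves the timing to be measured locally.

```python
from timeit import timeit

pieces = ['a', 'b', 'b', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n']


def concat_with_plus(repeat):
    s = ""
    for _ in range(repeat):
        for substr in pieces:
            s += substr               # may copy the growing string each time
    return s


def concat_with_join(repeat):
    # Collect the parts first, then build the result in a single join call.
    parts = [substr for _ in range(repeat) for substr in pieces]
    return "".join(parts)


plus_seconds = timeit(lambda: concat_with_plus(1000), number=20)
join_seconds = timeit(lambda: concat_with_join(1000), number=20)
print("+= :", plus_seconds)
print("join:", join_seconds)

# Formatting versus direct concatenation (hypothetical values).
head, body, tail = "Title", "Body text", "Footer"
formatted = "%s: %s (%s)" % (head, body, tail)
concatenated = head + ": " + body + " (" + tail + ")"

# startswith and endswith accept a tuple of alternatives.
has_prefix = "yzabc".startswith(('x', 'yz'))
```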

Use list comprehensions and generator expressions

A list comprehension is more efficient than building a new list with append inside a loop, so this feature can be used to improve running efficiency.

from time import time
t = time()
list = ['a','b','is','python','jason','hello','hill','with','phone','test',
'dfdf','apple','pddf','ind','basic','none','baecr','var','bana','dd','wrd']
total=[]
for i in range (1000000):
    for w in list:
        total.append(w)
print "total run time:"
print time()-t

Using a list comprehension:

for i in range (1000000):
    a = [w for w in list]

The loop version above takes about 17 s to run, while the list comprehension version takes 9.29 s, almost half. Generator expressions were introduced in Python 2.4, and their syntax is similar to that of list comprehensions. When processing large amounts of data, a generator expression has a clear advantage: it does not build a list but only returns a generator, which produces items on demand. In the preceding example, changing a = [w for w in list] to a = (w for w in list) reduces the running time to about 2.98 s. Note that this test never iterates over the generator, so the measured time covers only creating the generator object, not producing its items.
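The difference between the three forms can be seen in a short Python 3 sketch. The variable names are chosen for this example only, and the sizes reported by sys.getsizeof vary between interpreter versions, so no expected values are shown.

```python
import sys

words = ['a', 'b', 'is', 'python', 'jason', 'hello', 'hill', 'with', 'phone',
         'test', 'dfdf', 'apple', 'pddf', 'ind', 'basic', 'none', 'baecr',
         'var', 'bana', 'dd', 'wrd']

# Explicit loop: one append call per element.
copied_with_loop = []
for w in words:
    copied_with_loop.append(w)

# List comprehension: builds the whole list in one expression.
copied_with_comprehension = [w for w in words]

# Generator expression: creates only a generator object; no element
# has been produced yet at this point.
lazy = (w for w in words)

print("list size in bytes:     ", sys.getsizeof(copied_with_comprehension))
print("generator size in bytes:", sys.getsizeof(lazy))

# The work happens only when the generator is consumed.
consumed = list(lazy)
```

A generator can be consumed only once, so it suits single-pass processing such as sum() or a for loop rather than repeated indexing.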

Other optimization skills

  1. If you need to exchange the values of two variables, use a, b = b, a instead of an intermediate variable (t = a; a = b; b = t):
>>> from timeit import Timer
>>> Timer("t=a;a=b;b=t","a=1;b=2").timeit()
0.25154118749729365
>>> Timer("a,b=b,a","a=1;b=2").timeit()
0.17156677734181258
>>>
  2. Use xrange rather than range in loops. xrange saves a lot of memory because it generates only one integer per call, whereas range() returns the complete list of elements, which is unnecessary overhead in a loop. In Python 3, xrange no longer exists; range itself returns a lazy sequence object that can cover a range of any length.
  3. Use local variables and avoid the "global" keyword. Python accesses local variables much faster than global variables, so this can be used to improve performance.
  4. if done is not None is faster than if done != None; readers can verify this themselves.
  5. In a time-consuming loop, consider replacing function calls with inlined code.
  6. Use the chained comparison "x < y < z" instead of "x < y and y < z".
  7. while 1 is faster than while True (although the latter is more readable).
  8. Built-in functions are usually faster than equivalent code written in Python. For example, passing operator.add to map() or reduce() is usually faster than passing lambda a, b: a + b.
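Several of these tips can be checked with timeit in the same way as the swap example. The sketch below is written for Python 3, and the function names are invented for this illustration. It verifies that each pair of alternatives gives the same result and prints timings that will differ from machine to machine.

```python
from timeit import timeit


def swap_with_temp(a, b):
    t = a
    a = b
    b = t
    return a, b


def swap_with_tuple(a, b):
    a, b = b, a
    return a, b


def between_chained(x, y, z):
    return x < y < z            # y is evaluated only once


def between_and(x, y, z):
    return x < y and y < z


# Timing statements as strings, as in the Timer example above.
temp_seconds = timeit("t=a;a=b;b=t", "a=1;b=2", number=200000)
tuple_seconds = timeit("a,b=b,a", "a=1;b=2", number=200000)
identity_seconds = timeit("done is not None", "done=0", number=200000)
equality_seconds = timeit("done != None", "done=0", number=200000)

print("swap with temp: ", temp_seconds)
print("swap with tuple:", tuple_seconds)
print("is not None:    ", identity_seconds)
print("!= None:        ", equality_seconds)
```

Differences of this size matter only inside hot loops, so readability is usually the better guide elsewhere.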

Locate program performance bottlenecks

The prerequisite for optimizing code is knowing where the performance bottleneck is, that is, where the program spends most of its running time. For complicated code, tools can help locate it. Python ships with several performance analysis tools, such as profile, cProfile, and hotshot. A profiler is a set of Python programs that describes a program's run-time performance and provides various statistics to help you locate its bottlenecks.

Using profile is very simple: import it and call it. For example:

Listing 8. Using profile for Performance Analysis

import profile
def profileTest():
    Total =1;
    for i in range(10):
        Total=Total*(i+1)
        print Total
    return Total
if __name__ == "__main__":
    profile.run("profileTest()")

The program running result is as follows:

Figure 1. Performance analysis results

The specific explanations of each output column are as follows:

  • ncalls: the number of times the function was called;
  • tottime: the total time spent in the function itself, excluding the time spent in the sub-functions it calls;
  • percall: (the first percall) equals tottime/ncalls;
  • cumtime: the cumulative time spent in the function and all the sub-functions it calls, that is, from the moment the function is called until it returns;
  • percall: (the second percall) the average time per call of the function, equal to cumtime/ncalls;
  • filename:lineno(function): the location and name of each function called.

To save the output to a log file, add one more parameter to the call, for example profile.run("profileTest()", "testprof").

If the profiling data from profile is saved to a binary file, you can use the pstats module to produce text reports from it. pstats supports several report formats and is a practical tool on a text interface. It is easy to use:

import pstats
p = pstats.Stats('testprof')
p.sort_stats("name").print_stats()

The sort_stats() method sorts the profiling data and accepts multiple sort keys. For example, sort_stats('name', 'file') sorts first by function name and then by file name. Common sort keys include calls, time, and cumulative. pstats also provides an interactive command-line tool: run python -m pstats and type help to learn more about its usage.
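The same workflow can be sketched in Python 3 with cProfile, the C implementation of the profile interface, writing the pstats report into a string instead of a file. The function name factorial_product is made up for this example. Unlike Listing 8, it does not print inside the loop, so the report is not dominated by output calls.

```python
import cProfile
import io
import pstats


def factorial_product(n):
    """Multiply 1 * 2 * ... * n, like profileTest in Listing 8."""
    total = 1
    for i in range(n):
        total = total * (i + 1)
    return total


# Profile a single call and keep its return value.
profiler = cProfile.Profile()
result = profiler.runcall(factorial_product, 10)

# Render the statistics into a string, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)   # show at most 5 entries
report = buffer.getvalue()

print(result)
print(report)
```

The report uses the same ncalls, tottime, percall, and cumtime columns described above.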

For large applications, presenting the performance analysis results graphically is practical and intuitive. Common visualization tools include Gprof2Dot, visualpytune, and KCacheGrind. You can consult their official websites; this article does not discuss them in detail.

Python performance optimization tool

In addition to improving algorithms and selecting appropriate data structures, Python performance optimization involves several other key techniques, such as rewriting critical Python code as a C extension module or choosing an interpreter with better performance. This article calls these optimization tools. Many such tools are available for Python, including Psyco, PyPy, Cython, and Pyrex, each with its own strengths. This section introduces several of them.

Psyco

Psyco is a just-in-time compiler that can improve performance without any change to the source code. Psyco compiles operations into lightly optimized machine code and treats variables at three different levels: "run-time", "compile-time", and "virtual-time", promoting or demoting a variable between levels as needed. Run-time variables are just the raw bytecode and object structures that the regular Python interpreter handles. Once Psyco compiles an operation into machine code, compile-time variables are represented in machine registers and in directly accessible memory locations. Psyco also caches the compiled machine code for later reuse, which saves some more time. Psyco has its drawbacks, however: it uses a large amount of memory while running. Psyco does not support Python 2.7 and is no longer maintained or updated. Interested readers can refer to http://psyco.sourceforge.net/

Pypy

PyPy stands for "Python implemented in Python", but it is actually implemented in a subset of Python called RPython, which can be translated into code for C, .NET, Java, and other languages and platforms. PyPy integrates a just-in-time (JIT) compiler. Unlike many compilers and interpreters, it does not concern itself with the lexical analysis and syntax tree of Python code. Because it is written in Python, it works directly with Python code objects. A code object is the representation of Python bytecode, so PyPy directly analyzes the bytecode that corresponds to the Python code; this bytecode is stored neither in the Python runtime environment nor in a binary format. At the time of writing, the current version is 1.8, and it can be installed on several platforms. To install PyPy on Windows, first download the installation package, then run pypy on the command line. If the error "MSVCR100.dll is not found. Therefore, this application cannot be started. Re-installing the application may fix this problem" appears, you also need to download the VS 2010 runtime libraries from Microsoft's official website. The address is http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=5555

After the installation is successful, run pypy in the command line. The output is as follows:

C:\Documents and Settings\Administrator>pypy
Python 2.7.2 (0e28b379d8b3, Feb 09 2012, 18:31:47)
[PyPy 1.8.0 with MSC v.1500 32 bit] on win32
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``PyPy is vast, and contains
multitudes''
>>>>

Taking the loop in Listing 5 as an example, run it with pypy and with python. The results are as follows:

C:\Documents and Settings\Administrator\Desktop\doc\python>pypy loop.py
total run time: 8.42199993134

C:\Documents and Settings\Administrator\Desktop\doc\python>python loop.py
total run time: 106.391000032

As you can see, running the program with PyPy greatly improves its efficiency.

Cython

Cython is a language, implemented in Python, for writing Python extension modules. Libraries written in it can be loaded with import and run faster than pure Python. Cython code can load Python modules (for example, import math) as well as C header files (for example, cdef extern from "math.h"), and it can also contain ordinary Python code. The typical use is to rewrite the performance-critical parts of a program as a C extension module.

Installing Cython on Linux:

Step 1: Download

[root@v5254085f259 cpython]# wget -N http://cython.org/release/Cython-0.15.1.zip
--2012-04-16 22:08:35--  http://cython.org/release/Cython-0.15.1.zip
Resolving cython.org... 128.208.160.197
Connecting to cython.org|128.208.160.197|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2200299 (2.1M) [application/zip]
Saving to: `Cython-0.15.1.zip'

100%[======================================>] 2,200,299   1.96M/s   in 1.1s

2012-04-16 22:08:37 (1.96 MB/s) - `Cython-0.15.1.zip' saved [2200299/2200299]

Step 2: Extract

[root@v5254085f259 cpython]# unzip -o Cython-0.15.1.zip 

Step 3: Install

python setup.py install 

Enter cython directly after installation. If the following content appears, the installation is successful.

[root@v5254085f259 Cython-0.15.1]# cython
Cython (http://cython.org) is a compiler for code written in the
Cython language.  Cython is based on Pyrex by Greg Ewing.

Usage: cython [options] sourcefile.{pyx,py} ...

Options:
  -V, --version                  Display version number of cython compiler
  -l, --create-listing           Write error messages to a listing file
  -I, --include-dir <directory>  Search for include files in named directory
                                 (multiple include directories are allowed).
  -o, --output-file <filename>   Specify name of generated C file
  -t, --timestamps               Only compile newer source files
  -f, --force                    Compile all source files (overrides implied -t)
  -q, --quiet                    Don't print module names in recursive mode
  -v, --verbose                  Be verbose, print file names on multiple compilation
  -p, --embed-positions          If specified, the positions in Cython files of each
                                 function definition is embedded in its docstring.
  --cleanup <level>              Release interned objects on python exit, for memory debugging.
                                 Level indicates aggressiveness, default 0 releases nothing.
  -w, --working <directory>      Sets the working directory for Cython (the directory
                                 modules are searched from)
  --gdb                          Output debug information for cygdb
  -D, --no-docstrings            Strip docstrings from the compiled module.
  -a, --annotate                 Produce a colorized HTML version of the source.
  --line-directives              Produce #line directives pointing to the .pyx source
  --cplus                        Output a C++ rather than C file.
  --embed[=<method_name>]        Generate a main() function that embeds the Python interpreter.
  -2                             Compile based on Python-2 syntax and code semantics.
  -3                             Compile based on Python-3 syntax and code semantics.
  --fast-fail                    Abort the compilation on the first error
  --warning-error, -Werror       Make all warnings into errors
  --warning-extra, -Wextra       Enable extra warnings
  -X, --directive <name>=<value>[,<name=value,...] Overrides a compiler directive

For installation on other platforms, refer to the documentation: http://docs.cython.org/src/quickstart/install.html

Unlike Python, Cython code must be compiled before it can run. Compilation generally takes two stages: the .pyx file is compiled into a .c file, and the .c file is then compiled into a .so file. There are several ways to compile:

  • Compile using the command line:

Given the following test code, use the command line to compile it into a .c file.

def sum(int a,int b):
    print a+b

[root@v5254085f259 test]# cython sum.pyx
[root@v5254085f259 test]# ls
total 76
4 drwxr-xr-x 2 root root  4096 Apr 17 02:45 .
4 drwxr-xr-x 4 root root  4096 Apr 16 22:20 ..
4 -rw-r--r-- 1 root root    35 Apr 17 02:45 1
60 -rw-r--r-- 1 root root 55169 Apr 17 02:45 sum.c
4 -rw-r--r-- 1 root root    35 Apr 17 02:45 sum.pyx

Compile the .c file into a .so file using gcc on Linux:

[root@v5254085f259 test]# gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing -I/usr/include/python2.4 -o sum.so sum.c
[root@v5254085f259 test]# ls
total 96
4 drwxr-xr-x 2 root root  4096 Apr 17 02:47 .
4 drwxr-xr-x 4 root root  4096 Apr 16 22:20 ..
4 -rw-r--r-- 1 root root    35 Apr 17 02:45 1
60 -rw-r--r-- 1 root root 55169 Apr 17 02:45 sum.c
4 -rw-r--r-- 1 root root    35 Apr 17 02:45 sum.pyx
20 -rwxr-xr-x 1 root root 20307 Apr 17 02:47 sum.so

  • Compile using distutils:

Create a setup.py script:

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext_modules = [Extension("sum", ["sum.pyx"])]

setup(
    name = 'sum app',
    cmdclass = {'build_ext': build_ext},
    ext_modules = ext_modules
)

[root@v5254085f259 test]#  python setup.py build_ext --inplace
running build_ext
cythoning sum.pyx to sum.c
building 'sum' extension
gcc -pthread -fno-strict-aliasing -fPIC -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/ActivePython-2.7/include/python2.7  -c sum.c -o build/temp.linux-x86_64-2.7/sum.o
gcc -pthread -shared build/temp.linux-x86_64-2.7/sum.o -o /root/cpython/test/sum.so

After compilation, you can import the module into Python and use it:

[root@v5254085f259 test]# python
ActivePython 2.7.2.5 (ActiveState Software Inc.) based on
Python 2.7.2 (default, Jun 24 2011, 11:24:26)
[GCC 4.0.2 20051125 (Red Hat 4.0.2-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyximport; pyximport.install()
>>> import sum
>>> sum.sum(1,3)

Below is a simple performance comparison:

Listing 9. Cython test code

from time import time
def test(int n):
    cdef int a =0
    cdef int i
    for i in xrange(n):
        a+= i
    return a
t = time()
test(10000000)
print "total run time:"
print time()-t

Test results:

[GCC 4.0.2 20051125 (Red Hat 4.0.2-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyximport; pyximport.install()
>>> import ctest
total run time: 0.00714015960693

Listing 10. Python test code

from time import time
def test(n):
    a =0;
    for i in xrange(n):
        a+= i
    return a
t = time()
test(10000000)
print "total run time:"
print time()-t

[root@v5254085f259 test]# python test.py
total run time: 0.971596002579

The comparison shows that the Cython version is roughly 136 times faster (about 0.0071 s versus 0.9716 s).
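For reference, the speedups implied by the timings quoted in this article can be worked out with a few lines of Python 3. The numbers are copied from the listings and console output above; the dictionary keys are labels chosen for this sketch.

```python
# (seconds before, seconds after), as reported in this article
timings = {
    "dict instead of list (Listing 1)": (16.09, 8.375),
    "set intersection (Listings 2 and 3)": (38.4070000648, 8.75),
    "loop optimization (Listings 4 and 5)": (132.375, 102.171999931),
    "PyPy instead of CPython (Listing 5)": (106.391000032, 8.42199993134),
    "Cython instead of Python (Listings 9 and 10)": (0.971596002579, 0.00714015960693),
}

# Speedup factor = time before / time after
speedups = {name: before / after for name, (before, after) in timings.items()}

for name, factor in speedups.items():
    print("%-46s %6.1fx" % (name, factor))
```

These ratios come from single runs on the original author's machines, so they indicate orders of magnitude rather than precise benchmarks.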

Summary

This article discussed common Python performance optimization techniques, showed how to use tools to locate and analyze performance bottlenecks, and introduced tools and languages that can be used for performance optimization. We hope it serves as a useful reference.
