Optimizing Python code: faster lookups through the local scope

Source: Internet
Author: User
I'll demonstrate how a micro-optimization can improve the execution speed of Python code by 5%. 5%! It can also anger anyone who has to maintain your code.

In reality, though, this article simply explains a piece of code that you occasionally run into in the standard library or in other people's code. Let's start with an example from the standard library, the collections.OrderedDict class:

    def __setitem__(self, key, value, dict_setitem=dict.__setitem__):
        if key not in self:
            root = self.__root
            last = root[0]
            last[1] = root[0] = self.__map[key] = [last, root, key]
        return dict_setitem(self, key, value)

Note the last parameter: dict_setitem=dict.__setitem__. It makes sense if you think about it. To associate a value with a key, you only need to pass three arguments to __setitem__: the key to set, the value associated with the key, and the __setitem__ method of the built-in dict class. Wait, OK, maybe that last argument doesn't make any sense after all.
Scope Lookups

To understand what is going on, we need to look at scopes. Start with a simple question: in a Python function, if you encounter a name such as open, how does Python find the value of open?

    def myfunc():
        with open('foo.txt', 'w') as f:
            pass

Short answer: without knowing the contents of the global and local namespaces, you cannot determine the value of open. Conceptually, when Python looks up a name it checks three namespaces (ignoring nested scopes for simplicity):

Local namespace
Global namespace
Built-in namespace

So in the myfunc function, when Python tries to find the value of open, it first checks the local namespace, then the global namespace, and then the built-in namespace. If open is not defined in any of these three namespaces, a NameError exception is raised.
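
The conceptual order can be observed with a minimal sketch. It assumes that eval() is an acceptable stand-in: a bare name evaluated by eval() is resolved against the local mapping, then the global mapping, then the built-ins, which mirrors the list above. The helper name lookup is hypothetical and does not come from the article.

```python
import builtins


def lookup(name, local_ns, global_ns):
    """Resolve a bare name: local mapping, then global mapping, then built-ins.

    Hypothetical helper for illustration. eval() adds the built-ins to the
    globals dict it receives, so copies are passed to leave the inputs alone.
    """
    return eval(name, dict(global_ns), dict(local_ns))


# Nothing shadows the name, so the built-in open is found.
from_builtins = lookup("open", {}, {})

# A global binding hides the built-in.
from_globals = lookup("open", {}, {"open": "global open"})

# A local binding hides both the global and the built-in.
from_locals = lookup("open", {"open": "local open"}, {"open": "global open"})

print(from_builtins is builtins.open, from_globals, from_locals)
```

A name that is missing from all three mappings makes eval() raise NameError, as described above. Inside a real function the compiler picks the lookup strategy ahead of time, so this only illustrates the order, not the mechanism.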
Implementation of Scope Lookup

The lookup process above is only conceptual. The way this lookup is actually implemented leaves us room to exploit the implementation.

    def foo():
        a = 1
        return a

    def bar():
        return a

    def baz(a=1):
        return a

Let's look at the bytecode for each function:

    >>> import dis
    >>> dis.dis(foo)
      2           0 LOAD_CONST               1 (1)
                  3 STORE_FAST               0 (a)

      3           6 LOAD_FAST                0 (a)
                  9 RETURN_VALUE
    >>> dis.dis(bar)
      2           0 LOAD_GLOBAL              0 (a)
                  3 RETURN_VALUE
    >>> dis.dis(baz)
      2           0 LOAD_FAST                0 (a)
                  3 RETURN_VALUE

Notice the difference between foo and bar. We can see right away that, at the bytecode level, Python has already decided which names are local variables and which are not: foo uses LOAD_FAST, while bar uses LOAD_GLOBAL.

We will not go into how the Python compiler knows which bytecode to generate (that is probably a topic for another article). It is enough to understand that Python already knows which type of lookup to perform by the time the function executes.
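
The same check can be sketched programmatically with dis.get_instructions. The opcode names and offsets printed by dis differ between Python versions (newer releases add variants such as combined or specialized FAST instructions), so this sketch only tests whether an opcode name contains FAST or starts with LOAD_GLOBAL. The helper names are hypothetical.

```python
import dis


def foo():
    a = 1
    return a


def bar():
    return a  # 'a' is never assigned in bar, so it is compiled as a global lookup


def baz(a=1):
    return a


def opnames(func):
    """Return the opcode names of a function's bytecode."""
    return [ins.opname for ins in dis.get_instructions(func)]


def uses_global_lookup(func):
    return any(name.startswith("LOAD_GLOBAL") for name in opnames(func))


def uses_fast_local(func):
    return any("FAST" in name for name in opnames(func))


for func in (foo, bar, baz):
    print(func.__name__, uses_fast_local(func), uses_global_lookup(func))
```

Nothing is executed here: the decision is already visible in the compiled code objects, which is the point of the paragraph above.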

Another potentially confusing point is that LOAD_GLOBAL is used for lookups in both the global and the built-in namespaces. Ignoring nested scopes, you can think of it as "not local". The corresponding C code is roughly [1]:

    case LOAD_GLOBAL:
        v = PyObject_GetItem(f->f_globals, name);
        if (v == NULL) {
            v = PyObject_GetItem(f->f_builtins, name);
            if (v == NULL) {
                if (PyErr_ExceptionMatches(PyExc_KeyError))
                    format_exc_check_arg(
                        PyExc_NameError,
                        NAME_ERROR_MSG, name);
                goto error;
            }
        }
        PUSH(v);

Even if you've never seen CPython's C code, the code above is pretty straightforward. First, it checks whether the name is a key in f->f_globals (the globals dictionary). If not, it checks whether the name is in f->f_builtins (the built-ins dictionary). Finally, if the name is found in neither place, a NameError exception is raised.
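
The same two-step fallback can be written as a rough Python sketch. This is an illustration of the logic described above, not CPython's actual implementation; the function name load_global and its parameters are hypothetical.

```python
import builtins


def load_global(name, globals_dict, builtins_dict=None):
    """Mimic the lookup order described above: globals first, then built-ins."""
    if builtins_dict is None:
        builtins_dict = vars(builtins)
    try:
        return globals_dict[name]  # first dictionary lookup
    except KeyError:
        pass
    try:
        return builtins_dict[name]  # second dictionary lookup
    except KeyError:
        raise NameError("name %r is not defined" % name) from None


print(load_global("len", {}) is len)   # falls through to the built-ins
print(load_global("len", {"len": 5}))  # the global binding wins
```

Note that a built-in name always costs two dictionary lookups in this model, because the globals dictionary has to miss first.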
Binding a Constant to the Local Scope

Now let's go back to the first code example. The last parameter is really a way of binding a function to a name in the function's local scope, and it does so by making dict.__setitem__ the default value of a parameter. Here's another example:

    def not_list_or_dict(value):
        return not (isinstance(value, dict) or isinstance(value, list))

    def not_list_or_dict(value, _isinstance=isinstance, _dict=dict, _list=list):
        return not (_isinstance(value, _dict) or _isinstance(value, _list))

Here we do the same thing: we bind objects that would otherwise be found in the built-in namespace to names in the local scope. As a result, Python uses LOAD_FAST instead of LOAD_GLOBAL (a global lookup). So how much faster is it? Let's run a simple test:

    $ python -m timeit -s 'def not_list_or_dict(value): return not (isinstance(value, dict) or isinstance(value, list))' 'not_list_or_dict(50)'
    1000000 loops, best of 3: 0.48 usec per loop
    $ python -m timeit -s 'def not_list_or_dict(value, _isinstance=isinstance, _dict=dict, _list=list): return not (_isinstance(value, _dict) or _isinstance(value, _list))' 'not_list_or_dict(50)'
    1000000 loops, best of 3: 0.423 usec per loop

In other words, that is roughly an 11.9% improvement [2]. More than the 5% I promised at the beginning of the article!
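
The measurement can also be reproduced from a script with the timeit module. This is a minimal sketch: the second function is renamed not_list_or_dict_fast so both versions can coexist, and the helper best_time is hypothetical. The lambda wrapper adds the same call overhead to both measurements, and absolute numbers depend heavily on the Python version and machine, so no expected output is given.

```python
import timeit


def not_list_or_dict(value):
    return not (isinstance(value, dict) or isinstance(value, list))


def not_list_or_dict_fast(value, _isinstance=isinstance, _dict=dict, _list=list):
    # The built-ins are bound to local names once, when the function is defined.
    return not (_isinstance(value, _dict) or _isinstance(value, _list))


def best_time(func, arg, number=200000, repeat=5):
    """Return the best observed time per call, in microseconds."""
    timer = timeit.Timer(lambda: func(arg))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6


if __name__ == "__main__":
    slow = best_time(not_list_or_dict, 50)
    fast = best_time(not_list_or_dict_fast, 50)
    print("global/built-in lookups: %.3f usec per call" % slow)
    print("local lookups:           %.3f usec per call" % fast)
```

Both functions return the same result for every input; only the way the names isinstance, dict, and list are resolved differs.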
There's more to it.

It is reasonable to assume that the speedup comes from LOAD_FAST reading only the local scope, whereas LOAD_GLOBAL checks the global scope first and only then the built-in scope. In the example function above, isinstance, dict, and list all live in the built-in namespace.

However, there is more. LOAD_FAST not only lets us skip the redundant lookups, it is also a different kind of lookup altogether.

The C code fragment above showed the code for LOAD_GLOBAL. Here is the code for LOAD_FAST:

    case LOAD_FAST:
        PyObject *value = fastlocal[oparg];
        if (value == NULL) {
            format_exc_check_arg(PyExc_UnboundLocalError,
                                 UNBOUNDLOCAL_ERROR_MSG,
                                 PyTuple_GetItem(co->co_varnames, oparg));
            goto error;
        }
        Py_INCREF(value);
        PUSH(value);
        FAST_DISPATCH();

We get the local value by indexing into an array. Although it isn't shown directly, oparg is simply an index into that array.
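
The index is visible from Python as the instruction's argument. In this minimal sketch, assuming a function with a single local so that no combined instructions are involved, the argument of the FAST load is a position in co_varnames, the same tuple the C code consults for the error message. The exact opcode name can vary between Python versions, so the sketch matches any opcode whose name contains FAST.

```python
import dis


def baz(a=1):
    return a


code = baz.__code__

# Pick the instruction that loads the local; its name contains "FAST".
fast_ops = [ins for ins in dis.get_instructions(baz) if "FAST" in ins.opname]
first = fast_ops[0]

# The raw argument is an integer slot number, not a name.
slot_name = code.co_varnames[first.arg]
print(first.opname, first.arg, slot_name)
```

The name a only appears when the slot number is translated back through co_varnames; at run time the interpreter works with the integer.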

Now it makes sense. The first version of not_list_or_dict performs four lookups, and each name lives in the built-in namespace, which is consulted only after the lookup in the global namespace fails. That adds up to eight dictionary key lookups. In contrast, the second version of not_list_or_dict indexes directly into a C array four times, using LOAD_FAST under the hood. That is why local lookups are faster.
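
The count of four global lookups can be checked with a short sketch that tallies LOAD_GLOBAL instructions in each version. The name not_list_or_dict_fast and the helper count_global_loads are hypothetical, introduced only so both versions can be compared side by side.

```python
import dis


def not_list_or_dict(value):
    return not (isinstance(value, dict) or isinstance(value, list))


def not_list_or_dict_fast(value, _isinstance=isinstance, _dict=dict, _list=list):
    return not (_isinstance(value, _dict) or _isinstance(value, _list))


def count_global_loads(func):
    """Count the LOAD_GLOBAL instructions in a function's bytecode."""
    return sum(1 for ins in dis.get_instructions(func)
               if ins.opname == "LOAD_GLOBAL")


print(count_global_loads(not_list_or_dict))
print(count_global_loads(not_list_or_dict_fast))
```

The first function should report one LOAD_GLOBAL per use of isinstance, dict, and list, and the second should report none, since every name it touches is a parameter.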
Summary

The next time you see this pattern in other people's code, you'll understand what it is doing.

Finally, do not use this kind of optimization in real applications unless you really need to, and most of the time you don't. But if the time comes when you have to squeeze out the last bit of performance, this is something you need to understand.
Footnotes

[1] Note that, to make it easier to read, I removed some performance optimizations from the code above. The real code is a little more complicated.

[2] The example function does nothing of value and performs no I/O, so its running time is dominated by the Python VM loop.
