Optimizing Python code by speeding up scope lookups

Source: Internet
Author: User

I'll demonstrate how a micro-optimization can improve the execution speed of Python code by 5%. 5%! It will also offend anyone who maintains your code.

But really, this article just explains code that you occasionally run into in the standard library or in other people's code. Let's look at an example from the standard library, the collections.OrderedDict class:

def __setitem__(self, key, value, dict_setitem=dict.__setitem__):
    if key not in self:
        root = self.__root
        last = root[0]
        last[1] = root[0] = self.__map[key] = [last, root, key]
    return dict_setitem(self, key, value)

Note the last parameter: dict_setitem=dict.__setitem__. It makes sense if you think about it. To associate a value with a key, you only need to pass three arguments to __setitem__: the key to set, the value associated with that key, and the __setitem__ method of the built-in dict class. Wait a minute, okay, maybe the last parameter doesn't make any sense.

Scope Lookups

To understand what's going on, let's look at scopes. Start with a simple question: in a Python function, if you encounter a name such as open, how does Python find the value of open?

# <global: bunch of code here>

def myfunc():
    # <local: bunch of code here>
    with open('foo.txt', 'w') as f:
        pass

Simple answer: without knowing the contents of the global and local code, you cannot determine the value of open. Conceptually, Python looks up a name by checking three namespaces (ignoring nested scopes for simplicity):

Local namespace
Global namespace
Built-in namespace

So in the myfunc function, when Python tries to find the value of open, it first checks the local namespace, then the global namespace, and then the built-in namespace. If open is not defined in any of these three namespaces, a NameError exception is raised.
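
The lookup order can be observed directly. The following is a minimal sketch rather than anything from the original article: the helper names lookup, local_wins and missing, and the name undefined_name_for_demo, are hypothetical and chosen only for illustration.

```python
import builtins

def lookup():
    # `open` is never assigned in this function, so it is not a local name;
    # Python checks the global namespace, then the built-in namespace.
    return open

def local_wins():
    open = "local open"  # assignment makes `open` local to this function
    return open

def missing():
    # Hypothetical name that is defined in none of the three namespaces.
    return undefined_name_for_demo

before = lookup() is builtins.open    # only the built-in exists so far
globals()["open"] = "global open"     # like writing `open = ...` at module level
during = lookup()                     # the global now shadows the built-in
local = local_wins()                  # the local shadows both
del globals()["open"]                 # remove the shadowing global again
after = lookup() is builtins.open

try:
    missing()
    raised = False
except NameError:
    raised = True

print(before, during, local, after, raised)
```

The same function, lookup, returns a different object depending on what the global namespace contains when it is called, which is why the value of open cannot be determined from the function body alone.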

Implementation of Scope Lookups

The lookup process above is only conceptual. The way this lookup is actually implemented gives us room to exploit the implementation.

def foo():
    a = 1
    return a

def bar():
    return a

def baz(a=1):
    return a

Let's look at the bytecode for each function:

>>> import dis
>>> dis.dis(foo)
  2           0 LOAD_CONST               1 (1)
              3 STORE_FAST               0 (a)

  3           6 LOAD_FAST                0 (a)
              9 RETURN_VALUE

>>> dis.dis(bar)
  2           0 LOAD_GLOBAL              0 (a)
              3 RETURN_VALUE

>>> dis.dis(baz)
  2           0 LOAD_FAST                0 (a)
              3 RETURN_VALUE

Notice the difference between foo and bar. We can immediately see that, at the bytecode level, Python has already determined what is a local variable and what is not, because foo uses LOAD_FAST and bar uses LOAD_GLOBAL.

We won't explain exactly how the Python compiler knows which bytecode to generate (perhaps that's a topic for another article), but it is enough to understand that Python already knows what type of lookup to perform when it executes a function.

Another potentially confusing point is that LOAD_GLOBAL is used for lookups in both the global and the built-in namespace. Ignoring nested scopes, you can think of it as "not local". The corresponding C code is roughly [1]:
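
The same comparison can be made programmatically instead of reading dis output by eye. This is a small sketch using dis.get_instructions from the standard library; the helper load_opnames is a hypothetical name introduced here. Exact opcode names and offsets differ between CPython versions, so the sketch only collects the names that start with LOAD_.

```python
import dis

def foo():
    a = 1
    return a

def bar():
    return a  # `a` is not assigned here, so it is compiled as a global lookup

def baz(a=1):
    return a

def load_opnames(func):
    """Return the names of the LOAD_* instructions compiled for func."""
    return [ins.opname for ins in dis.get_instructions(func)
            if ins.opname.startswith("LOAD_")]

for func in (foo, bar, baz):
    print(func.__name__, load_opnames(func))
```

Only bar should list LOAD_GLOBAL. The decision is made when the function is compiled, not when it runs, which is why bar compiles without complaint even though no global a exists.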

case LOAD_GLOBAL:
    v = PyObject_GetItem(f->f_globals, name);
    if (v == NULL) {
        v = PyObject_GetItem(f->f_builtins, name);
        if (v == NULL) {
            if (PyErr_ExceptionMatches(PyExc_KeyError))
                format_exc_check_arg(
                    PyExc_NameError,
                    NAME_ERROR_MSG, name);
            goto error;
        }
    }
    PUSH(v);

Even if you've never seen CPython's C code, the code above is pretty straightforward. First, check whether the key name we are looking for is in f->f_globals (the globals dictionary), then check whether the name is in f->f_builtins (the builtins dictionary), and finally raise a NameError exception if it is found in neither.
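
For readers more comfortable with Python than C, the same two-step logic can be sketched with ordinary dictionaries. This is only an illustration of the control flow, not how the interpreter is implemented; load_global and its parameters are hypothetical names.

```python
import builtins

def load_global(name, globals_dict, builtins_dict):
    """Mimic the two-step lookup: globals first, then builtins."""
    try:
        return globals_dict[name]
    except KeyError:
        pass
    try:
        return builtins_dict[name]
    except KeyError:
        # Neither dictionary has the key, so report it as a NameError.
        raise NameError(f"name {name!r} is not defined") from None

builtins_dict = vars(builtins)
print(load_global("len", {}, builtins_dict) is len)
print(load_global("len", {"len": "shadowed"}, builtins_dict))
```

A name that lives in the built-in namespace always costs a failed dictionary lookup in the globals first, and that is the cost the trick in the next section avoids.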

Binding Constants to the Local Scope

Now let's go back to the first code example and see that the last parameter is really binding a function to a name in the function's local scope. This is done by making dict.__setitem__ the default value of the parameter. Here's another example:

def not_list_or_dict(value):
    return not (isinstance(value, dict) or isinstance(value, list))

def not_list_or_dict(value, _isinstance=isinstance, _dict=dict, _list=list):
    return not (_isinstance(value, _dict) or _isinstance(value, _list))

Here we do the same thing and bind objects that would otherwise be found in the built-in namespace to the local scope. As a result, Python will use LOAD_FAST instead of LOAD_GLOBAL (global lookup). So how much faster is this? Let's do a simple test:

$ python -m timeit -s 'def not_list_or_dict(value): return not (isinstance(value, dict) or isinstance(value, list))' 'not_list_or_dict(50)'
1000000 loops, best of 3: 0.48 usec per loop
$ python -m timeit -s 'def not_list_or_dict(value, _isinstance=isinstance, _dict=dict, _list=list): return not (_isinstance(value, _dict) or _isinstance(value, _list))' 'not_list_or_dict(50)'
1000000 loops, best of 3: 0.423 usec per loop

In other words, about an 11.9% improvement [2]. That's more than the 5% I promised at the beginning of the article!
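
The measurement can also be repeated from inside Python with the timeit module. This is a rough sketch, not the article's benchmark: the function names, the helper best_time, the argument 50 and the loop counts are all assumptions made for illustration, and the lambda wrapper adds the same call overhead to both versions.

```python
import timeit

def not_list_or_dict_global(value):
    return not (isinstance(value, dict) or isinstance(value, list))

def not_list_or_dict_local(value, _isinstance=isinstance, _dict=dict, _list=list):
    return not (_isinstance(value, _dict) or _isinstance(value, _list))

def best_time(func, number=200_000, repeat=5):
    """Best total time in seconds for `number` calls of func(50)."""
    return min(timeit.repeat(lambda: func(50), number=number, repeat=repeat))

t_global = best_time(not_list_or_dict_global)
t_local = best_time(not_list_or_dict_local)
print(f"global lookups: {t_global:.4f}s, local bindings: {t_local:.4f}s")
```

The numbers depend heavily on the interpreter version and the machine. Newer CPython releases cache global lookups, so the gap may be smaller than the one shown above, or absent.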

There's More to It

It is reasonable to assume that the speedup comes from LOAD_FAST reading from the local scope, while LOAD_GLOBAL checks the global scope first before falling through to the built-in scope. In the example function above, isinstance, dict, and list all live in the built-in namespace.

But there's more. Not only does LOAD_FAST let us skip redundant lookups, it is also a different type of lookup.

The C snippet above showed the code for LOAD_GLOBAL; here is the code for LOAD_FAST:

case LOAD_FAST:
    PyObject *value = fastlocal[oparg];
    if (value == NULL) {
        format_exc_check_arg(PyExc_UnboundLocalError,
                             UNBOUNDLOCAL_ERROR_MSG,
                             PyTuple_GetItem(co->co_varnames, oparg));
        goto error;
    }
    Py_INCREF(value);
    PUSH(value);
    FAST_DISPATCH();

We retrieve the local value by indexing into an array. It isn't shown here directly, but oparg is just an index into that array.
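
The index is visible from Python as the argument of the instruction. In the sketch below, pick_second is a hypothetical function; the instruction's arg attribute plays the role of oparg, and co_varnames is the tuple of local names that the C code uses for its error message.

```python
import dis

def pick_second(x, y):
    return y

code = pick_second.__code__
# Keep only the fast-local loads; the exact opcode name varies by version.
fast_loads = [ins for ins in dis.get_instructions(pick_second)
              if ins.opname.startswith("LOAD_FAST")]
ins = fast_loads[0]
print(ins.opname, ins.arg, code.co_varnames[ins.arg])
```

The instruction carries only a small integer. The variable name is needed for display and for error messages, not for the lookup itself.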

Now it makes sense. Our first version of not_list_or_dict performs 4 lookups, and each name lives in the built-in namespace, which is only consulted after the global namespace has been checked. That's 8 dictionary key lookups. By contrast, the second version of not_list_or_dict indexes directly into a C array 4 times, using LOAD_FAST under the hood. That's why local lookups are faster.
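
The count of 4 can be checked by counting LOAD_GLOBAL instructions in each version. This is a sketch under the assumption that the unspecialized opcode is still called LOAD_GLOBAL on the interpreter in use; count_global_loads and the two function names are hypothetical.

```python
import dis

def not_list_or_dict_global(value):
    return not (isinstance(value, dict) or isinstance(value, list))

def not_list_or_dict_local(value, _isinstance=isinstance, _dict=dict, _list=list):
    return not (_isinstance(value, _dict) or _isinstance(value, _list))

def count_global_loads(func):
    """Count the LOAD_GLOBAL instructions compiled for func."""
    return sum(1 for ins in dis.get_instructions(func)
               if ins.opname == "LOAD_GLOBAL")

print(count_global_loads(not_list_or_dict_global))
print(count_global_loads(not_list_or_dict_local))
```

The first version should report 4 (isinstance twice, dict, and list) and the second 0, since every name it uses is a parameter.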

Summary

Now, the next time you see this example in someone else's code, you'll understand.

Finally, don't perform this kind of optimization in real applications unless you really need to. Most of the time you don't. But if the time comes when you need to squeeze out the last bit of performance, you'll need to understand this.

Footnotes

[1] Note that, to make it easier to read, I removed some performance optimizations from the code above. The real code is a bit more complicated.

[2] The example function doesn't actually do any real work or I/O, so it is mostly bound by the Python VM loop.
