Optimizing Python code by speeding up name lookups within a scope

Source: Internet
Author: User
This article explains how to optimize Python code by speeding up name lookups within a scope, with reference to the relevant CPython C code. I will demonstrate how a micro-optimization can improve the execution speed of Python code by 5%. 5%! At the same time, it will also anger anyone who maintains your code.

In reality, though, this article just explains a piece of code that you will occasionally come across in the standard library or in other people's code. Let's first look at an example from the standard library, the collections.OrderedDict class:

    def __setitem__(self, key, value, dict_setitem=dict.__setitem__):
        if key not in self:
            root = self.__root
            last = root[0]
            last[1] = root[0] = self.__map[key] = [last, root, key]
        return dict_setitem(self, key, value)

Note the last parameter: dict_setitem=dict.__setitem__. It makes sense if you think about it. To associate a value with a key, you need to pass three arguments to __setitem__: the key to set, the value associated with the key, and the __setitem__ method of the built-in dict class. Wait. OK, maybe the last parameter makes no sense.
Scope query

To understand what is going on, let's look at scopes. Start with a simple question: in a Python function, when a name such as open is encountered, how does Python find the value of open?

    # <GLOBAL: arbitrary code here>

    def myfunc():
        # <LOCAL: arbitrary code here>
        with open('foo.txt', 'w') as f:
            pass

The simple answer: without knowing the contents of GLOBAL and LOCAL, you cannot determine the value of open. Conceptually, Python checks three namespaces when looking up a name (ignoring nested scopes for simplicity):

Local namespace
Global namespace
Built-in namespace

Therefore, when Python looks up the value of open inside myfunc, it first checks the local namespace, then the global namespace, and then the built-in namespace. If open is not defined in any of the three namespaces, a NameError exception is raised.
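
As a rough illustration of the lookup order described above, the following sketch defines one function per case. The names GREETING, from_local, from_global, from_builtin, from_nowhere, and undefined_name_for_demo are made up for this example and do not come from the original code.

```python
import builtins

GREETING = "hello"  # hypothetical name living in the module's global namespace


def from_local():
    greeting = "hi"  # bound inside the function: local namespace
    return greeting


def from_global():
    return GREETING  # not local, so it is found in the global namespace


def from_builtin():
    return len  # neither local nor global: found in the built-in namespace


def from_nowhere():
    # Hypothetical name that exists in none of the three namespaces
    return undefined_name_for_demo


if __name__ == "__main__":
    print(from_local())
    print(from_global())
    print(from_builtin() is builtins.len)
    try:
        from_nowhere()
    except NameError as exc:
        print("NameError:", exc)
```

Calling from_nowhere() should raise NameError, because the name is bound in none of the three namespaces.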
Implementation of scope search

The lookup process above is only conceptual. The way this lookup is actually implemented gives us room to exploit the implementation.

    def foo():
        a = 1
        return a

    def bar():
        return a

    def baz(a=1):
        return a

Let's take a look at the bytecode of each function:

    >>> import dis
    >>> dis.dis(foo)
      2           0 LOAD_CONST               1 (1)
                  3 STORE_FAST               0 (a)

      3           6 LOAD_FAST                0 (a)
                  9 RETURN_VALUE
    >>> dis.dis(bar)
      2           0 LOAD_GLOBAL              0 (a)
                  3 RETURN_VALUE
    >>> dis.dis(baz)
      2           0 LOAD_FAST                0 (a)
                  3 RETURN_VALUE

Note the difference between foo and bar. We can see right away that, at the bytecode level, Python has already determined what is a local variable and what is not: foo uses LOAD_FAST, while bar uses LOAD_GLOBAL.

We won't go into how the Python compiler knows which bytecode to generate (that may be a topic for another article), but it is enough to understand that Python already knows which kind of lookup to perform when it executes a function.
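
The same check can be done programmatically instead of reading the dis output by eye. This is only a sketch: opnames and uses_global_lookup are helper names invented here, and the exact opcode names for local loads vary between CPython versions (newer releases may show variants of LOAD_FAST), so the helper only tests for LOAD_GLOBAL.

```python
import dis


def foo():
    a = 1
    return a


def bar():
    return a  # 'a' is never assigned in bar, so it is compiled as a non-local lookup


def baz(a=1):
    return a


def opnames(func):
    """List the opcode names the compiler generated for func."""
    return [ins.opname for ins in dis.get_instructions(func)]


def uses_global_lookup(func):
    """True if func contains at least one LOAD_GLOBAL instruction."""
    return "LOAD_GLOBAL" in opnames(func)


if __name__ == "__main__":
    for func in (foo, bar, baz):
        print(func.__name__, uses_global_lookup(func), opnames(func))
```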

Another potentially confusing point is that LOAD_GLOBAL is used for lookups in both the global and the built-in namespaces. Ignoring nested scopes, you can think of it as "not local". The corresponding C code is roughly [1]:

    case LOAD_GLOBAL:
        v = PyObject_GetItem(f->f_globals, name);
        if (v == NULL) {
            v = PyObject_GetItem(f->f_builtins, name);
            if (v == NULL) {
                if (PyErr_ExceptionMatches(PyExc_KeyError))
                    format_exc_check_arg(
                                PyExc_NameError,
                                NAME_ERROR_MSG, name);
                goto error;
            }
        }
        PUSH(v);

Even if you have never read CPython's C code, the code above is quite straightforward. First, it checks whether the name being looked up is in f->f_globals (the globals dictionary), then whether it is in f->f_builtins (the builtins dictionary). Finally, if the name is found in neither, a NameError exception is raised.
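
For readers more comfortable with Python than C, the simplified logic can be sketched roughly as follows. This is an approximation, not CPython's implementation: load_global and sample are hypothetical names, func.__globals__ stands in for f->f_globals, and vars(builtins) is assumed to stand in for f->f_builtins.

```python
import builtins


def load_global(func, name):
    """Rough Python rendering of the simplified LOAD_GLOBAL logic."""
    try:
        return func.__globals__[name]  # first dictionary lookup: globals
    except KeyError:
        pass
    try:
        return vars(builtins)[name]  # second dictionary lookup: builtins
    except KeyError:
        # Found in neither dictionary, mirroring the NameError branch
        raise NameError(f"name {name!r} is not defined") from None


def sample():
    """Hypothetical function whose globals are searched."""


if __name__ == "__main__":
    print(load_global(sample, "sample") is sample)  # found in globals
    print(load_global(sample, "len") is len)        # falls through to builtins
```

Note that a built-in name such as len costs two dictionary lookups in this sketch, because the globals dictionary is always tried first.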
Bind a constant to a local scope

Now, looking back at the initial code example, we can see that the last parameter actually binds a function to a name in the function's local scope. Specifically, dict.__setitem__ is assigned as the parameter's default value. Here is another example:

    def not_list_or_dict(value):
        return not (isinstance(value, dict) or isinstance(value, list))

    def not_list_or_dict(value, _isinstance=isinstance, _dict=dict, _list=list):
        return not (_isinstance(value, _dict) or _isinstance(value, _list))

Here we do the same thing: we bind objects from the built-in namespace to names in the local scope. As a result, Python uses LOAD_FAST instead of LOAD_GLOBAL (a global lookup). How much faster is this? Let's run a simple test:

    $ python -m timeit -s 'def not_list_or_dict(value): return not (isinstance(value, dict) or isinstance(value, list))' 'not_list_or_dict(50)'
    1000000 loops, best of 3: 0.48 usec per loop
    $ python -m timeit -s 'def not_list_or_dict(value, _isinstance=isinstance, _dict=dict, _list=list): return not (_isinstance(value, _dict) or _isinstance(value, _list))' 'not_list_or_dict(50)'
    1000000 loops, best of 3: 0.423 usec per loop

In other words, that is an improvement of about 11.9% [2]. That is even more than the 5% I promised at the beginning of the article!
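
The measurement can also be reproduced from a script. The sketch below is one possible way to do it, not the author's setup: the function names with the _global and _local suffixes, best_time, and relative_improvement are invented here, and the lambda wrapper adds call overhead of its own, so absolute numbers will differ from the command-line results and from machine to machine.

```python
import timeit


def not_list_or_dict_global(value):
    return not (isinstance(value, dict) or isinstance(value, list))


def not_list_or_dict_local(value, _isinstance=isinstance, _dict=dict, _list=list):
    return not (_isinstance(value, _dict) or _isinstance(value, _list))


def best_time(func, repeat=5, number=200_000):
    """Best total time in seconds for `number` calls of func(50)."""
    return min(timeit.repeat(lambda: func(50), repeat=repeat, number=number))


def relative_improvement(before, after):
    """Fraction of the original time saved, e.g. 0.10 for a 10% improvement."""
    return (before - after) / before


if __name__ == "__main__":
    slow = best_time(not_list_or_dict_global)
    fast = best_time(not_list_or_dict_local)
    print(f"global lookups: {slow:.4f}s, local lookups: {fast:.4f}s")
    print(f"measured improvement: {relative_improvement(slow, fast):.1%}")
    # Figures quoted in the article: 0.48 usec versus 0.423 usec per loop
    print(f"article figures: {relative_improvement(0.48, 0.423):.1%}")
```

With the article's figures, the calculation is (0.48 - 0.423) / 0.48 = 0.11875, which rounds to the 11.9% quoted above.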
More content

It is reasonable to conclude that the speedup comes from LOAD_FAST reading from the local scope, whereas LOAD_GLOBAL first checks the global scope and only then falls through to the built-in scope. In the example function above, isinstance, dict, and list all live in the built-in namespace.

However, there is more to it. With LOAD_FAST we not only skip the extra lookups, we also perform a different kind of lookup.

The C snippet above shows the code for LOAD_GLOBAL. Here is the code for LOAD_FAST:

    case LOAD_FAST:
        PyObject *value = fastlocal[oparg];
        if (value == NULL) {
            format_exc_check_arg(PyExc_UnboundLocalError,
                                 UNBOUNDLOCAL_ERROR_MSG,
                                 PyTuple_GetItem(co->co_varnames, oparg));
            goto error;
        }
        Py_INCREF(value);
        PUSH(value);
        FAST_DISPATCH();

The local value is retrieved by indexing into an array. Although it is not shown explicitly here, oparg is simply an index into that array.
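
The relationship between oparg and the local variable array can be inspected from Python. In this sketch, local_slots and fast_loads are hypothetical helpers; it assumes that, for plain local loads, the instruction argument reported by dis is the index of the variable in co_varnames, and it skips combined instructions that newer CPython versions may emit.

```python
import dis


def baz(a=1):
    return a


def local_slots(func):
    """Return (index, name) pairs for the function's local variable slots."""
    return list(enumerate(func.__code__.co_varnames))


def fast_loads(func):
    """Return (opname, index, name) for each plain local-variable load."""
    names = func.__code__.co_varnames
    return [
        (ins.opname, ins.arg, names[ins.arg])
        for ins in dis.get_instructions(func)
        # Keep single-variable loads only; their argval is the variable name
        if ins.opname.startswith("LOAD_FAST") and isinstance(ins.argval, str)
    ]


if __name__ == "__main__":
    print(local_slots(baz))
    print(fast_loads(baz))
```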

Now it all makes sense. The first version of not_list_or_dict requires four name lookups. Each name lives in the built-in namespace, which is consulted only after the global namespace has been searched, so that adds up to 8 dictionary key lookups. In contrast, the second version of not_list_or_dict indexes directly into a C array four times, using LOAD_FAST under the hood. This is why local lookups are faster.
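
The count of four lookups can be checked by counting instructions. This is a sketch with invented names (the _global and _local suffixes and count_global_loads); the figure of two dictionary lookups per LOAD_GLOBAL follows the simplified C code shown earlier and ignores any caching the real interpreter performs.

```python
import dis


def not_list_or_dict_global(value):
    return not (isinstance(value, dict) or isinstance(value, list))


def not_list_or_dict_local(value, _isinstance=isinstance, _dict=dict, _list=list):
    return not (_isinstance(value, _dict) or _isinstance(value, _list))


def count_global_loads(func):
    """Count the LOAD_GLOBAL instructions compiled into func."""
    return sum(ins.opname == "LOAD_GLOBAL" for ins in dis.get_instructions(func))


if __name__ == "__main__":
    n_global = count_global_loads(not_list_or_dict_global)
    n_local = count_global_loads(not_list_or_dict_local)
    # Each built-in name costs a failed globals lookup plus a builtins lookup
    print("LOAD_GLOBAL instructions:", n_global, "vs", n_local)
    print("dictionary lookups in the simplified model:", 2 * n_global)
```

The first version should report four LOAD_GLOBAL instructions (isinstance twice, dict, and list) and the second none.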
Summary

Now, the next time you see this example in other people's code, you will understand it.

Finally, do not apply this kind of optimization in a real application unless it is truly necessary, and most of the time it is not. But if the time comes when you need to squeeze out the last bit of performance, you will know how this works.
Footnotes

[1] Note: to make it easier to read, I removed some performance optimizations from the code above. The real code is a little more complicated.

[2] This is a toy function that does nothing of value and performs no I/O, so its running time is dominated by the Python VM loop.
