Exploring why sorting an array makes a Python loop faster

This morning I happened to see a blog post about two Python scripts, one of which was more efficient than the other. The post has since been deleted, so I can't link to it, but the scripts basically boil down to the following:
fast.py

import time

a = [i for i in range(1000000)]

sum = 0
t1 = time.time()
for i in a:
    sum = sum + i
t2 = time.time()
print t2 - t1

slow.py

import time
from random import shuffle

a = [i for i in range(1000000)]
shuffle(a)

sum = 0
t1 = time.time()
for i in a:
    sum = sum + i
t2 = time.time()
print t2 - t1

As you can see, the two scripts do almost exactly the same thing. Both produce a list containing the first million integers and print the time it takes to sum them. The only difference is that slow.py first shuffles the integers into a random order. Strange as it may seem, this randomization alone is enough to make the program noticeably slower. On my machine, running Python 2.7.3, fast.py is consistently about a tenth of a second faster than slow.py (fast.py takes roughly three-quarters of a second, so this is a non-trivial slowdown). Try it yourself. (I didn't test on Python 3, but the result shouldn't be very different.)

So why does randomizing the list's elements cause such a noticeable slowdown? The original author of the post attributed it to "branch prediction". If you are unfamiliar with the term, have a look at the StackOverflow question on it, which explains the concept well. (My guess is that the original author had run into this or a similar problem and applied the idea to a Python snippet where it doesn't really apply.)

I doubt, however, that branch prediction is the real cause of the problem. There is no top-level conditional branch in this Python code, and it is reasonable to expect both scripts to take exactly the same branches inside the loop body. No part of the program conditions on the integers themselves, and the elements of each list do not depend on the data in any way. I'm also not sure Python is "low-level" enough for CPU-level branch prediction to be a factor in the performance profile of a Python script. Python is, after all, a high-level language.

So if branch prediction isn't the cause, why is slow.py so slow? After a bit of research, and a couple of false starts, I think I've found the answer. The answer requires a little familiarity with the internals of the Python virtual machine.

False start: lists vs. generators

My first thought was that Python would handle the sorted list [i for i in range(1000000)] more efficiently than the shuffled one. In other words, the list could be replaced with the following generator:

def numbers():
    i = 0
    while i < 1000000:
        yield i
        i += 1

I thought this might be more efficient in terms of time. After all, if Python internally used a generator instead of a real list, it could avoid the trouble of keeping all the integers in memory at once, which would save a lot of overhead. The shuffled list in slow.py cannot be captured by a simple generator, so the VM would not be able to make such an optimization.
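One way to check this hypothesis yourself (my own sketch, not from the original post, assuming the same Python 2 setup as above) is to time the identical summation loop over the generator instead of the list:

import time

def numbers():
    i = 0
    while i < 1000000:
        yield i
        i += 1

t1 = time.time()
sum = 0
for i in numbers():
    sum = sum + i
t2 = time.time()
print t2 - t1

Whatever this prints, it can't explain the gap between fast.py and slow.py, as the next paragraph shows.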

However, this turned out to be a dead end. If you insert a.sort() between shuffle() and the loop in slow.py, the program becomes as fast as fast.py. Clearly, something about having the numbers sorted makes the program faster.
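For concreteness, here is the modification being described, a sketch of slow.py with a.sort() inserted between shuffle() and the timed loop:

import time
from random import shuffle

a = [i for i in range(1000000)]
shuffle(a)
a.sort()    # re-sorting before the loop restores fast.py's speed

sum = 0
t1 = time.time()
for i in a:
    sum = sum + i
t2 = time.time()
print t2 - t1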

False start: lists vs. arrays

My second idea was that the data structure might be causing caching problems. a is a list, which naturally led me to believe it was implemented as a linked list. If the shuffle operation effectively randomized the nodes of that linked list, then fast.py might be able to allocate all of its linked-list nodes at adjacent addresses and enjoy good cache locality, while slow.py would suffer many cache misses, because each node would point to another node that is not on the same cache line.

Unfortunately, this isn't true either. Python's list objects are not linked lists but genuine arrays. In particular, a Python list object is defined by this C struct:

typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item;
    Py_ssize_t allocated;
} PyListObject;

In other words, ob_item is a pointer to an array of PyObject pointers, and allocated is the number of slots we have allocated for that array. So this doesn't help solve the problem either (although it was somewhat reassuring, since I had been uneasy about the algorithmic complexity of Python's list operations: appending to a list is amortized O(1), accessing an arbitrary element is O(1), and so on). It mostly just made me wonder why Guido chose to call them "lists" rather than "arrays", when they are actually arrays.
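You can watch the allocated field at work from Python itself. The sketch below is mine, not the original author's, and the exact numbers vary by build; it relies on the fact that sys.getsizeof for a list counts the PyListObject header plus the ob_item pointer array, so it only jumps when the list over-allocates a larger buffer:

import sys

a = []
last = sys.getsizeof(a)
print 0, last
for i in range(20):
    a.append(i)
    size = sys.getsizeof(a)
    if size != last:    # the ob_item buffer was reallocated
        print len(a), size
        last = size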

The answer: integer objects

Array elements are contiguous in memory, so that data structure would not introduce caching problems. It turns out that cache locality is indeed the cause of slow.py's slowdown, but it comes from an unexpected place. In Python, integers are objects allocated on the heap, not simple values. In particular, inside the virtual machine an integer object looks like this:

typedef struct {
    PyObject_HEAD
    long ob_ival;
} PyIntObject;

The only "interesting" element in the structure above is ob_ival (similar to an integer in C). If you think that using a complete heap object to implement integers is wasteful, you might be right. Many languages are optimized to avoid this. For example, the Ruby interpreter for Matz typically stores objects as pointers, but exceptions are made to frequently used pointers. Simply put, the Ruby interpreter plugs the fixed-length number into the same space as the object and marks it with the least significant bit as an integer instead of a pointer (in all modern systems, malloc always returns a memory address aligned in multiples of 2). At that point, you just need to get the value of the integer by the appropriate displacement-no heap location or redirection is required. If CPython do similar optimizations, slow.py and fast.py will have the same speed (and they may all be faster).

So how does CPython handle integers? What does the interpreter do that produces this puzzling behavior? The Python interpreter allocates integers in blocks, roughly 40 at a time. When Python needs a new integer object, it takes the next free slot in the current integer "block" and stores the integer there. Our code allocates a million integers into the array, and most adjacent integers end up in adjacent memory. As a result, traversing the million numbers in sorted order exhibits good cache locality, while traversing the first million numbers in random order incurs frequent cache misses.
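A rough way to observe the block allocator (my own sketch; CPython-specific, since id() returning an object's memory address is an implementation detail) is to look at the addresses of freshly created integers outside the small-int cache. Consecutive allocations usually land a constant stride apart, except at block boundaries or when the free list has recycled older slots:

a = [i for i in range(1000, 1010)]    # values above the small-int cache
print [id(a[i + 1]) - id(a[i]) for i in range(len(a) - 1)]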

Therefore, the answer to "why does sorting the array make the code faster" is that it doesn't, not by itself. Traversing the unshuffled array is faster because we access the integer objects in the same order in which they were allocated (and they do have to be allocated in some order).
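If you want to convince yourself that allocation order, not value order, is what matters, here is one experiment (my own sketch, assuming Python 2; the i + 0 forces a fresh integer object for values outside the small-int cache). Allocate the objects in a random order, then compare iterating them in allocation order against iterating them in sorted order:

import time
from random import shuffle

order = [i for i in range(1000000)]
shuffle(order)
a = [i + 0 for i in order]    # fresh int objects, allocated in shuffled order

def time_sum(seq):
    t1 = time.time()
    total = 0
    for x in seq:
        total = total + x
    t2 = time.time()
    return t2 - t1

print "allocation order:", time_sum(a)
print "value order:     ", time_sum(sorted(a))

If the explanation above is right, the first loop should be the fast one even though its values are shuffled, and the second should be the slow one even though its values are sorted.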
