Exploring why sorting an array improves the loop efficiency of a Python program

Source: Internet
Author: User
Tags: generator, shuffle, python, list

This morning I stumbled across a blog post comparing two Python scripts, one of which was more efficient than the other. The post has since been deleted, so I cannot link to it, but the scripts basically boil down to the following:
fast.py

import time
a = [i for i in range(1000000)]
sum = 0
t1 = time.time()
for i in a:
    sum = sum + i
t2 = time.time()
print t2 - t1

slow.py

import time
from random import shuffle
a = [i for i in range(1000000)]
shuffle(a)
sum = 0
t1 = time.time()
for i in a:
    sum = sum + i
t2 = time.time()
print t2 - t1

As you can see, the two scripts do exactly the same thing: each builds a list containing the first 1 million integers and prints how long it takes to sum them. The only difference is that slow.py first shuffles the integers into a random order. Strange as it may seem, that randomization is enough to slow the program down noticeably. On my machine, running Python 2.7.3, fast.py is consistently about a tenth of a second faster than slow.py (fast.py takes roughly three quarters of a second, so the difference is significant in relative terms). Try it yourself. (I didn't test on Python 3, but the results should not differ much.)
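If you want to reproduce the comparison in one script, the timeit module works just as well as the manual time.time() calls above. This is only a sketch of one way to measure it (my own code, not the original author's):

import timeit

loop = "sum = 0\nfor i in a:\n    sum = sum + i"
sorted_setup = "a = [i for i in range(1000000)]"
shuffled_setup = "from random import shuffle; a = [i for i in range(1000000)]; shuffle(a)"

# number=1 matches the single timed pass in fast.py and slow.py
print timeit.timeit(loop, setup=sorted_setup, number=1)
print timeit.timeit(loop, setup=shuffled_setup, number=1)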

So why does randomizing the list elements cause such a noticeable slowdown? The author of the original post attributed it to "branch prediction". If you are unfamiliar with the term, take a look at the StackOverflow question on the subject, which explains the concept well. (My suspicion is that the original author ran into this problem, or a similar one, elsewhere and applied the idea to a Python snippet it does not really fit.)

Of course, I doubted that branch prediction was really the culprit. There is no top-level conditional branch in this Python code, and both scripts have exactly the same branches inside the loop body. No part of the program is conditional on the integers themselves, and the behavior of the loop does not depend on the data. Besides, I am not sure Python is "low-level" enough for CPU-level branch prediction to be a factor in the performance of a Python script. Python is, after all, a high-level language.
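For contrast, here is a sketch of my own (not from the original post) of the kind of code the branch-prediction explanation does apply to: a loop whose branch is taken or not depending on the data itself. Neither fast.py nor slow.py contains anything like this.

from random import shuffle

a = [i for i in range(1000000)]
shuffle(a)

total = 0
for i in a:
    # This branch depends on the value of i, so shuffling changes how
    # predictable it is; the loops in fast.py and slow.py have no such branch.
    if i < 500000:
        total = total + i
print total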

So if branch prediction is not the reason, why is slow.py so slow? After a bit of research, and a couple of false starts, I think I found the answer. It requires a little familiarity with the internals of the Python virtual machine.

False start: lists vs. generators

My first thought was that Python handles the sorted list [i for i in range(1000000)] more efficiently than a shuffled one. In other words, the sorted list could be replaced by the following generator:

def numbers():
    i = 0
    while i < 1000000:
        yield i
        i += 1

I thought this might be more efficient in terms of time. After all, if Python internally used a generator instead of a real list, it would avoid the trouble of keeping all the integers in memory at once, which could save a lot of overhead. The shuffled list in slow.py cannot be captured by a simple generator like this, so the VM would have no way to perform such an optimization.

However, this turned out not to be a useful discovery. If you insert a.sort() between shuffle() and the loop in slow.py, the program becomes as fast as fast.py. Clearly, something about having the numbers sorted makes the program faster.
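For reference, this is what the modified slow.py looks like; the only change from slow.py above is the added a.sort() line:

import time
from random import shuffle
a = [i for i in range(1000000)]
shuffle(a)
a.sort()   # re-sorting the shuffled list makes the loop as fast as fast.py
sum = 0
t1 = time.time()
for i in a:
    sum = sum + i
t2 = time.time()
print t2 - t1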

False start: lists vs. arrays

My second thought was that the data structure might be causing cache problems. a is a list, which naturally led me to believe it was implemented as a linked list. If the shuffle operation effectively randomized the order of those linked-list nodes, then fast.py might get all of its nodes allocated at adjacent addresses and benefit from good cache locality, while slow.py would suffer many cache misses because each node would point to another node that is not on the same cache line.

Unfortunately, that is not true either. A Python list object is not a linked list but a genuine array. Specifically, the Python list object is defined by this C struct:

typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item;
    Py_ssize_t allocated;
} PyListObject;

In other words, ob_item is a pointer to an array of PyObject pointers, and allocated is the allocated size of that array. So this does not help solve the problem either (though it did settle some algorithmic complexities of Python list operations I was not completely sure about: appending to a list is O(1), accessing an arbitrary list element is O(1), and so on). It only leaves me wondering why Guido chose to call them "lists" rather than "arrays", since they really are arrays.
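One rough way to see from Python itself that a list is just an array of pointers is to check its size with sys.getsizeof, which reports the pointer array but not the objects it points to. A small sketch of my own (CPython-specific):

import sys

small = [1, 2, 3]
large = [10 ** 100, 10 ** 200, 10 ** 300]

# Both lists hold three pointers, so CPython reports the same size for
# each, no matter how big the pointed-to objects are.
print sys.getsizeof(small)
print sys.getsizeof(large)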

The answer: whole objects

Array elements are contiguous in memory, so a data structure like this should not cause cache problems. It turns out cache locality really is the reason slow.py is slow, but it comes from an unexpected place. In Python, integers are objects allocated on the heap rather than simple values. In particular, inside the virtual machine an integer object looks like this:

typedef struct {
    PyObject_HEAD
    long ob_ival;
} PyIntObject;

The only "interesting" element in the structure above is ob_ival (similar to an integer in C language). If you think it's wasteful to use a complete heap object to implement integers, you might be right. Many languages are optimized to avoid this. For example, the Ruby interpreter for Matz usually stores objects as pointers, but handles the frequently used pointers as an exception. In simple terms, the Ruby interpreter plugs the fixed-length number into the same space as an object application, and uses the least significant bit to mark it as an integer instead of a pointer (in all modern systems, malloc always returns a memory address that is aligned in multiples of 2). At that point, you only need to get the value of the integer by the appropriate displacement-no heap location or redirection. If CPython do similar optimizations, slow.py and fast.py will have the same speed (and they may all be quicker).

So what does CPython actually do with integers? What interpreter behavior is causing all this confusion? The Python interpreter allocates integers in blocks of about 40 at a time. When Python needs a new integer object, it carves out the next available slot in the current integer "block" and stores the integer there. Our code creates 1 million integers for the list, and most consecutive integers end up in adjacent memory. As a result, traversing the 1 million numbers in sorted order shows good cache locality, while traversing the same 1 million numbers in a random order causes frequent cache misses.
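A rough way to observe this allocation pattern is to look at object addresses: in CPython, id() returns the object's address, so integers created one after another usually land at nearby addresses, often a constant stride apart. This is only a sketch that relies on CPython implementation details (it stays above the small integers up to 256, which are cached singletons):

# CPython-specific: consecutive allocations come out of the same
# preallocated block, so the addresses are typically adjacent
# (or a fixed stride apart).
a = [i for i in range(1000, 1020)]
print [id(x) for x in a]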

So the answer to "why does sorting the array make the code faster" is that it doesn't, not really. Traversing the array without shuffling it is faster because we access the integer objects in the same order they were allocated in (and they do have to be allocated somewhere).
