This article explores why sorting an array appears to improve the running efficiency of a Python program.

This morning I stumbled across a blog post comparing two Python scripts, one of which is far more efficient than the other. The post has since been deleted, so I can't link to it, but the scripts can essentially be summarized as follows:
fast.py

import time

a = [i for i in range(1000000)]
sum = 0
t1 = time.time()
for i in a:
    sum = sum + i
t2 = time.time()
print t2 - t1

slow.py

import time
from random import shuffle

a = [i for i in range(1000000)]
shuffle(a)
sum = 0
t1 = time.time()
for i in a:
    sum = sum + i
t2 = time.time()
print t2 - t1

As you can see, the two scripts do the same thing: each generates a list of the first million integers, sums them, and prints how long the summation took. The only difference is that slow.py randomly shuffles the integers first. Although it seems odd, this randomization alone is enough to slow the program down noticeably. On my machine, running Python 2.7.3, fast.py is consistently about 10% faster than slow.py (fast.py takes roughly three-quarters of a second), and the difference is reproducible across repeated runs. (I didn't test on Python 3, but the results shouldn't be very different.)
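For reference, here is a Python 3 sketch of the same experiment using the timeit module (the snippets above are Python 2, and `total` avoids shadowing the built-in `sum`); the absolute timings are machine-dependent, but on CPython the shuffled traversal is typically measurably slower:

```python
import timeit
from random import shuffle

def traverse(a):
    """Sum the list element by element, as in the original loop."""
    total = 0
    for i in a:
        total = total + i
    return total

ordered = list(range(1_000_000))
shuffled = list(range(1_000_000))
shuffle(shuffled)

# min() over several repeats is the conventional way to reduce timing noise.
t_fast = min(timeit.repeat(lambda: traverse(ordered), number=1, repeat=3))
t_slow = min(timeit.repeat(lambda: traverse(shuffled), number=1, repeat=3))
print(f"ordered:  {t_fast:.4f}s")
print(f"shuffled: {t_slow:.4f}s")
```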

So why does randomizing the list elements cause such an obvious slowdown? The original author of the post attributed it to "branch prediction". If you aren't familiar with the term, take a look at the well-known StackOverflow question about it, which explains the concept very well. (My suspicion is that the original author had run into this problem, or something similar, before, and applied the idea to a Python snippet where it doesn't apply.)

Of course, I doubted that branch prediction was really the cause of the problem. There is no top-level conditional branch in this Python code, and it's reasonable to expect both scripts to take exactly the same branches inside the loop body: no part of the program is conditional on the integers themselves, and each list element is independent of the data around it. I'm also not sure Python is "low-level" enough for CPU-level branch prediction to be a factor in the performance of a Python script; Python is a high-level language, after all.

So if branch prediction isn't the reason, why is slow.py so slow? After some research, and a couple of false starts, I believe I found the answer. It requires some familiarity with the internals of the Python virtual machine.

False start: lists vs. generators

My first thought was that Python handles the sorted list [i for i in range(1000000)] more efficiently than the shuffled one. In other words, the list could be replaced with the following generator:

def numbers():
    i = 0
    while i < 1000000:
        yield i
        i += 1

I thought this might be more time-efficient. After all, if Python internally used a generator in place of the real list, it could avoid the trouble of keeping all the integers in memory at once, which would save a lot of overhead. The shuffled list in slow.py can't easily be captured by a simple generator, so the VM couldn't perform any such optimization.
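The memory part of this reasoning is easy to check. A small Python 3 sketch (the variable names are mine) comparing the footprint of the list with that of an equivalent generator:

```python
import sys

nums_list = [i for i in range(1_000_000)]   # materializes a million ints
nums_gen = (i for i in range(1_000_000))    # produces them on demand

# The list holds a million pointers (several MB); the generator object
# is a fixed-size frame, regardless of how many values it will yield.
print(sys.getsizeof(nums_list))
print(sys.getsizeof(nums_gen))
```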

However, this turned out to be a dead end. If a.sort() is inserted between the shuffle() and the loop in slow.py, the program becomes as fast as fast.py. Clearly, something about having the numbers in sorted order makes the program faster.
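The experiment described above, rewritten as a Python 3 sketch (again using `total` instead of shadowing the built-in `sum`):

```python
import time
from random import shuffle

a = list(range(1_000_000))
shuffle(a)
a.sort()          # undoing the shuffle before the timed loop restores the speed

total = 0
t1 = time.time()
for i in a:
    total = total + i
t2 = time.time()
print(t2 - t1)    # comparable to fast.py, despite the shuffle
```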

False start: lists vs. arrays

My second thought was that the data structure might be causing cache problems. a is a list, which naturally led me to suspect that it was actually implemented as a linked list. If the shuffle operation effectively randomizes the nodes of that linked list, then fast.py might allocate all of its list elements at adjacent addresses and enjoy good cache locality, while slow.py would suffer many cache misses, because each node would reference another node that isn't on the same cache line.

Unfortunately, this isn't true either. Python's list object is not a linked list but a true array. Specifically, this is the C struct that defines a Python list object:
 

typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item;
    Py_ssize_t allocated;
} PyListObject;

... In other words, ob_item is a pointer to an array of PyObject pointers, and allocated is the size we have allocated for that array. So this doesn't help solve the problem either (though it is reassuring about the algorithmic complexity of Python's list operations: appending is amortized O(1), and accessing an arbitrary list element is O(1)). I can only wonder why Guido chose to call them "lists" rather than "arrays", given that they really are arrays.
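Because the list is a contiguous C array of pointers, indexing cost doesn't depend on position, which is easy to sanity-check from Python (a sketch; with a genuine linked list, reading the last element would require walking a million nodes):

```python
import timeit

a = list(range(1_000_000))

# Index near the front and near the back of the list; for an array
# both are a single pointer lookup, so the timings should be comparable.
t_front = timeit.timeit(lambda: a[0], number=100_000)
t_back = timeit.timeit(lambda: a[999_999], number=100_000)
print(t_front, t_back)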

The answer: integer objects

Array elements are adjacent in memory, so a data structure like this shouldn't cause cache problems. It turns out that cache locality is indeed why slow.py is slower, but it comes from an unexpected place. In Python, integers are objects allocated on the heap rather than simple values. In particular, inside the virtual machine an integer object looks like this:
 

typedef struct {
    PyObject_HEAD
    long ob_ival;
} PyIntObject;

The only "interesting" field in the struct above is ob_ival (the integer's value, as a C long). If it strikes you as wasteful to use a full heap object to represent an integer, you're probably right, and many languages optimize to avoid this. For example, Matz's Ruby interpreter generally stores objects as pointers, but makes an exception for frequently used small integers. Briefly, the Ruby interpreter packs a fixed-size number directly into the same space as an object reference, using the lowest bit to mark it as an integer rather than a pointer (on all modern systems, malloc returns memory addresses aligned to multiples of at least 2, so a real pointer's lowest bit is always 0). Then the integer's value can be recovered with a simple bit shift, with no heap allocation or pointer dereference. If CPython performed a similar optimization, slow.py and fast.py would run at the same speed (and both would probably be faster).
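The Ruby-style tagging trick can be illustrated in a few lines of Python (purely a sketch of the idea; this is not how CPython represents integers). A machine word whose low bit is 1 encodes an integer directly; a word whose low bit is 0 is an aligned heap pointer:

```python
def tag_int(n):
    """Pack an integer into a tagged word: shift left, set the low bit."""
    return (n << 1) | 1

def is_tagged_int(word):
    """A word with its low bit set is a tagged integer, not a pointer."""
    return word & 1 == 1

def untag_int(word):
    """Recover the value with an arithmetic right shift."""
    return word >> 1

word = tag_int(21)
print(word, is_tagged_int(word), untag_int(word))
```

The design point is that decoding is just a shift, so "small" integers never touch the heap at all; only values too large to fit in a tagged word need a real object.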

So how does CPython handle integers, and what does the interpreter do that produces this behavior? The Python interpreter allocates integer objects in "blocks", roughly 40 objects at a time. When Python needs a new integer object, it carves out the next available slot in the current block and stores the integer there. Our code allocates a million integers in a row, so most adjacent integers end up in adjacent memory. Traversing the million integers in order therefore shows good cache locality, while traversing the same million integers in shuffled order produces frequent cache misses.
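The allocation-order effect can be observed from Python itself, relying on the CPython implementation detail that id() returns an object's heap address (a sketch; the exact spacing depends on the allocator and Python version):

```python
from random import shuffle

# Fresh integer objects, created back-to-back (values above the
# small-int cache, so none of them are preallocated singletons).
a = list(range(1000, 11_000))

addrs = [id(x) for x in a]          # traversal order == allocation order
deltas = [abs(q - p) for p, q in zip(addrs, addrs[1:])]

shuffle(a)
shuffled_deltas = [abs(id(q) - id(p)) for p, q in zip(a, a[1:])]

# In allocation order, consecutive objects tend to sit a few dozen bytes
# apart; after shuffling, the average jump between consecutive accesses
# is far larger, which is exactly what defeats the CPU cache.
print(sum(deltas) / len(deltas))
print(sum(shuffled_deltas) / len(shuffled_deltas))
```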

So the answer to "why does sorting the array make the code faster" is that it doesn't, really. The unshuffled traversal is faster because we access the integer objects in the same order they were allocated in (and they have to be allocated in some order).
