A tutorial on optimizing the performance of NumPy code in Python


NumPy is the foundation of numerous scientific software packages in Python. It provides a special data type, the ndarray, which is optimized for vectorized computation. This object is at the core of most algorithms in scientific numerical computing.

Compared to native Python, NumPy arrays can deliver significant performance gains, especially when your computations follow the single instruction, multiple data (SIMD) paradigm. However, it is also possible to unintentionally write NumPy code that is far from optimized.

In this article, we'll look at a few tricks that can help you write efficient NumPy code, starting with how to avoid unnecessary copies of arrays in order to save time and memory. To do so, we will need to dig into the internals of NumPy.
Learn to avoid unnecessary copies of data

NumPy array computations may involve internal copies between blocks of memory. These copies are sometimes unnecessary, in which case they should be avoided. Accordingly, here are a few tips to help you optimize your code.

import numpy as np

To view the memory address of an array

1. The first step in spotting silent array copies is to find the address of an array's internal data buffer in memory. The following function does exactly that:

def aid(x):
    # This function returns the memory block address of an array.
    return x.__array_interface__['data'][0]

2. Sometimes you may need to make a copy of an array, for example so that you can manipulate it while keeping the original data intact.

a = np.zeros(10); a_id = aid(a); a_id
71211328
b = a.copy(); aid(b) == a_id
False

Two arrays with the same data address (as returned by the aid() function) share the same underlying data buffer. However, arrays sharing the same data buffer have equal data addresses only if they have the same offset (meaning that their first elements coincide). Two arrays that share the same data buffer but have different offsets will have slightly different memory addresses, as the following example shows:

aid(a), aid(a[1:])
(71211328, 71211336)

In this article, we will make sure that the arrays we pass to this function have the same offset. Below is a more reliable way of determining whether two arrays share the same data:


def get_data_base(arr):
    """For a given NumPy array, find the base array
    that "owns" the actual data."""
    base = arr
    while isinstance(base.base, np.ndarray):
        base = base.base
    return base

def arrays_share_data(x, y):
    return get_data_base(x) is get_data_base(y)

print(arrays_share_data(a, a.copy()), arrays_share_data(a, a[1:]))
False True

Thanks to Michael Droettboom for pointing out this subtlety and proposing this alternative.
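As a side note, if you are on a recent NumPy version (1.11 or later), the built-in np.shares_memory() function performs an equivalent check; here is a minimal sketch, assuming that function is available in your environment:

import numpy as np

a = np.zeros(10)
print(np.shares_memory(a, a.copy()))  # False: the copy owns its own buffer
print(np.shares_memory(a, a[1:]))     # True: the slice is a view on a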
In-place operations and implicit copy operations

3. Array computations can involve in-place operations (first example below: the array is modified) or implicit-copy operations (second example: a new array is created).

a *= 2; aid(a) == a_id
True

c = a * 2; aid(c) == a_id
False

Be sure to choose the type of operation you actually need. Implicit-copy operations are significantly slower, as shown here:

%%timeit a = np.zeros(10000000)
a *= 2
loops, best of 3: 19.2 ms per loop

%%timeit a = np.zeros(10000000)
b = a * 2
loops, best of 3: 42.6 ms per loop

4. Reshaping an array may or may not involve a copy. The reasons are explained below. For example, reshaping a two-dimensional matrix does not involve a copy unless the matrix is transposed (or, more generally, non-contiguous):

a = np.zeros((10, 10)); a_id = aid(a); a_id
53423728

Reshaping an array while preserving its order does not trigger a copy:

b = a.reshape((1, -1)); aid(b) == a_id
True

Transposing an array changes its order, so reshaping the transpose triggers a copy:

c = a.T.reshape((1, -1)); aid(c) == a_id
False

Consequently, the latter instruction is significantly slower than the former.

5. The flatten() and ravel() methods of an array collapse it into a one-dimensional vector (a flattened array). The flatten() method always returns a copy, whereas the ravel() method returns a copy only when necessary (which makes it much faster, especially on large arrays).

d = a.flatten(); aid(d) == a_id
False

e = a.ravel(); aid(e) == a_id
True

%timeit a.flatten()
1000000 loops, best of 3: 881 ns per loop

%timeit a.ravel()
1000000 loops, best of 3: 294 ns per loop

Broadcast rules

6. Broadcasting rules allow you to perform computations on arrays with different but compatible shapes. In other words, you don't always need to reshape or tile arrays to make their shapes match. The following example illustrates two ways of computing the outer product of two vectors: the first method involves array tiling, the second uses broadcasting. The second method is significantly faster.

n = 1000

a = np.arange(n)
ac = a[:, np.newaxis]
ar = a[np.newaxis, :]

%timeit np.tile(ac, (1, n)) * np.tile(ar, (n, 1))
loops, best of 3: 10 ms per loop

%timeit ar * ac
loops, best of 3: 2.36 ms per loop

Efficient selection on NumPy arrays

NumPy offers several ways of selecting slices of arrays. An array view references the original data buffer of an array, but with a different offset, shape, and strides. Views only permit strided selections (i.e., with linearly spaced indices). NumPy also provides specific functions for arbitrary selections along one axis. Finally, fancy indexing is the most general selection method, but, as we will see, it is also the slowest. Faster alternatives should be chosen when possible.

1. Create an array with a large number of rows. We will select slices of this array along the first dimension.

n, d = 100000, 100  # d: assumed number of columns (its value is missing above)
a = np.random.random_sample((n, d)); a_id = aid(a)

Array views and fancy indexing

2. Select every tenth row, using two different methods (an array view and fancy indexing).

b1 = a[::10]
b2 = a[np.arange(0, n, 10)]
np.array_equal(b1, b2)
True

3. The array view refers to the original data buffer, whereas fancy indexing yields a copy.

aid(b1) == a_id, aid(b2) == a_id
(True, False)

4. Let's compare the performance of the two methods.

%timeit a[::10]
1000000 loops, best of 3: 804 ns per loop

%timeit a[np.arange(0, n, 10)]
loops, best of 3: 14.1 ms per loop

Fancy indexing is several orders of magnitude slower because it copies a large array.
Alternatives to fancy indexing: list of indices

5. Array views are of no help when a non-strided selection has to be made along one dimension. However, an alternative to fancy indexing still exists in this case: given a list of indices, NumPy's take() function can perform the selection along one axis.

i = np.arange(0, n, 10)

b1 = a[i]
b2 = np.take(a, i, axis=0)

np.array_equal(b1, b2)
True

The second method is noticeably faster:

%timeit a[i]
loops, best of 3: 13 ms per loop

%timeit np.take(a, i, axis=0)
loops, best of 3: 4.87 ms per loop

Alternatives to fancy indexing: Boolean masks

6. When the selection along an axis is specified by a Boolean mask vector, the compress() function is an alternative to fancy indexing.

i = np.random.random_sample(n) < 0.5  # Boolean mask selecting about half the rows

The selection can be made either with fancy indexing or with the np.compress() function.

b1 = a[i]
b2 = np.compress(i, a, axis=0)

np.array_equal(b1, b2)
True

%timeit a[i]
loops, best of 3: 59.8 ms per loop

%timeit np.compress(i, a, axis=0)
loops, best of 3: 24.1 ms per loop

The second method is again significantly faster than fancy indexing.

Fancy indexing is the most general way of making completely arbitrary selections from an array. However, more specific and faster methods often exist and should be preferred when possible.

Array views should be used whenever a strided selection suffices, but keep in mind that a view references the original data buffer.
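To make that caveat concrete, here is a minimal sketch (with illustrative variable names) showing that writing through a view modifies the original array:

import numpy as np

a = np.zeros((4, 4))
v = a[::2]         # strided view: no copy is made
v[0, 0] = 1.0      # writing through the view...
print(a[0, 0])     # ...modifies the original array: prints 1.0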
How does it work?

In this section, we will see what happens under the hood when we use NumPy, which will help us understand the optimization tricks in this article.
Why are NumPy arrays so efficient?

A NumPy array is essentially made up of metadata (number of dimensions, shape, data type, and so on) and the actual data. The data is stored in a homogeneous, contiguous block of memory at a particular address in system memory (random access memory, or RAM), called the data buffer. This is the main difference from a pure Python structure such as a list, whose elements are scattered across system memory. It is also the decisive factor that makes NumPy arrays so efficient.
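As a small illustration, the metadata and the address of the data buffer can be inspected directly on any array (the exact address will differ on your machine):

import numpy as np

a = np.arange(12, dtype=np.float64).reshape(3, 4)
print(a.shape, a.dtype, a.strides)       # metadata: (3, 4) float64 (32, 8)
print(a.__array_interface__['data'][0])  # address of the data buffer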

Why is this so important? The main reasons are:

1. Array computations can be written very efficiently in a low-level language such as C (and a large part of NumPy is in fact written in C). Knowing the address of the memory block and the data type, computing on the array is simply a matter of looping over all of its elements. Doing the same with a Python list, by contrast, incurs substantial overhead (a quick timing sketch follows this list).

2. Spatial locality in memory-access patterns produces significant performance gains, notably thanks to the CPU cache. The cache loads bytes from RAM in chunks, so adjacent elements can then be loaded very efficiently (sequential locality, or locality of reference).

3. Because the data elements are stored contiguously in memory, NumPy can take advantage of the vectorized instructions of modern CPUs, such as Intel's SSE and AVX or AMD's XOP. For example, multiple consecutive floating-point numbers can be loaded into 128-, 256-, or 512-bit registers for vectorized arithmetic implemented as CPU instructions.
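As promised above, here is a rough, minimal sketch contrasting a pure Python sum with a NumPy sum; the exact numbers depend on your machine, but the array version is typically much faster:

import timeit

setup = "import numpy as np; lst = list(range(1000000)); arr = np.arange(1000000)"
print(timeit.timeit("sum(lst)", setup=setup, number=100))   # pure Python list
print(timeit.timeit("arr.sum()", setup=setup, number=100))  # NumPy array, much faster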

Additionally, NumPy can be linked to highly optimized linear algebra libraries such as BLAS and LAPACK, for example through the Intel Math Kernel Library (MKL). A few specific matrix computations in NumPy may also be multithreaded, taking advantage of modern multicore processors.

In summary, storing the data in a contiguous block of memory ensures that the modern CPU architecture is used optimally, in terms of memory access patterns, CPU caching, and vectorized instructions.
What is the difference between an in-place operation and an implicit copy operation?

Let's explain trick 3. An expression such as a *= 2 corresponds to an in-place operation, in which every element of the array is multiplied by 2. By contrast, a = a * 2 means that a new array containing the values of a * 2 is created, and the variable a then points to this new array. The old array becomes unreferenced and will be deleted by the garbage collector. No memory allocation happens in the first case, whereas it does in the second.

More generally, expressions such as a[i:j] are views into parts of the array: they point to the memory buffer containing the data. Modifying them with in-place operations changes the original data. Hence, a[:] = a * 2 performs an in-place operation, unlike a = a * 2.
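The difference can be verified with the aid() helper defined at the beginning of this article; a minimal sketch:

import numpy as np
def aid(x): return x.__array_interface__['data'][0]  # helper from earlier

a = np.zeros(10); a_id = aid(a)
a[:] = a * 2            # in-place: the data buffer address is unchanged
print(aid(a) == a_id)   # True
a = a * 2               # rebinding: a now points to a newly allocated array
print(aid(a) == a_id)   # False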

Knowing this subtlety of NumPy can help you fix certain bugs (where an array is inadvertently modified through an operation on a view), and can help you optimize the speed and memory consumption of your code by reducing the number of unnecessary copies.
Why can't some arrays be reshaped without a copy?

Here we explain trick 4: a transposed two-dimensional matrix cannot be flattened without a copy. A two-dimensional matrix contains elements indexed by two numbers (row and column), but internally it is stored as a one-dimensional contiguous block of memory, addressable with a single number. There is more than one way of storing matrix elements in a one-dimensional block of memory: we can put the elements of the first row first, then the second row, and so on; or we can store the first column first, then the second column, and so on. The first method is called row-major order, the latter column-major order. Choosing between the two is merely a matter of internal convention: NumPy uses row-major order, like C, but unlike FORTRAN.
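A minimal sketch of the two conventions, using the order parameter of ravel():

import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])
print(m.ravel(order='C'))  # row-major order (NumPy/C default): [1 2 3 4 5 6]
print(m.ravel(order='F'))  # column-major order (FORTRAN):      [1 4 2 5 3 6]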

More generally, NumPy uses the notion of strides to convert between a multidimensional index and the memory location of the underlying (one-dimensional) sequence of elements. The specific mapping between array[i1, i2] and the corresponding byte address in the internal data buffer is:

offset = array.strides[0] * i1 + array.strides[1] * i2

When reshaping an array, NumPy avoids copies whenever possible by modifying the strides attribute. For example, when a matrix is transposed, the order of its strides is reversed, but the underlying data remains identical. However, flattening a transposed array cannot be accomplished simply by modifying the strides (try it!), so a copy is needed.
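The following sketch shows the strides being reversed by a transpose while the data buffer stays put, consistent with the offset formula above:

import numpy as np

a = np.zeros((10, 10))            # float64: each item is 8 bytes
print(a.strides)                  # (80, 8): row-major layout
print(a.T.strides)                # (8, 80): the transpose just swaps strides
print(a.T.flags['C_CONTIGUOUS'])  # False: this is why flattening must copy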

Recipe 4.6 (Using stride tricks with NumPy) contains a more extensive discussion of strides, and Recipe 4.7 (Implementing an efficient rolling average algorithm with stride tricks) shows how strides can be used to accelerate particular array computations.

The internal memory layout of an array can also explain unexpected performance differences between ostensibly similar NumPy operations. As a small exercise, can you explain the example below?

a = np.random.rand(5000, 5000)
%timeit a[0, :].sum()
%timeit a[:, 0].sum()

100000 loops, best of 3: 9.57 µs per loop
10000 loops, best of 3: 68.3 µs per loop

What are NumPy's broadcasting rules?

Broadcasting rules describe how arrays with different numbers of dimensions and/or different shapes can still be used in computations. The general rule is that two dimensions are compatible when they are equal, or when one of them is 1. NumPy compares the shapes of the two arrays element-wise, starting with the trailing dimensions and working its way forward. The smaller dimension is internally stretched to match the other dimension, but this operation does not involve any memory copy.
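For example, a (5, 1) array and a (1, 3) array broadcast to shape (5, 3); a minimal sketch:

import numpy as np

col = np.arange(5).reshape(5, 1)  # shape (5, 1)
row = np.arange(3).reshape(1, 3)  # shape (1, 3)
print((col + row).shape)          # (5, 3): the size-1 dimensions are stretched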
