What is the performance of Numba?


This article introduces Numba, a Python library that offers a user-friendly route to better computational performance.

1. What is Numba?

Numba is a library that compiles Python code into native machine instructions at run time, without forcing drastic changes to normal Python code (explained later). The translation, or magic, is done using the LLVM compiler, which is developed by a fairly active open-source community.

Numba was originally developed by Continuum Analytics, the company behind the well-known Anaconda distribution, but it is now open source. Its core application area is math-heavy and array-oriented code, which is fairly slow in native Python. Imagine writing a Python module that has to iterate element by element over a very large array to perform some calculation, instead of using vectorized operations. That's a bad idea, isn't it? This is why such library functions are "usually" written in C/C++ or Fortran, compiled, and then used as external libraries from Python. With Numba, functions like these can also be written in an ordinary Python module, and the gap in execution speed is steadily narrowing.
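To make the contrast concrete, here is a minimal sketch of the same calculation written once as an element-by-element Python loop and once as a vectorized NumPy expression. The function names and the test data are invented for this illustration.

```python
import numpy as np

def sum_of_squares_loop(values):
    """Visit every element in a Python-level loop (slow for large arrays)."""
    total = 0.0
    for v in values:
        total += v * v
    return total

def sum_of_squares_vectorized(values):
    """Same calculation expressed as whole-array NumPy operations."""
    return float(np.sum(values * values))

# Hypothetical test data: the numbers 0..999 as floats.
data = np.arange(1000, dtype=np.float64)
loop_result = sum_of_squares_loop(data)
vector_result = sum_of_squares_vectorized(data)
print(np.isclose(loop_result, vector_result))
```

Both versions give the same answer; the difference is where the loop runs. In the first it runs in the Python interpreter, in the second inside NumPy's compiled code.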

2. How do I get Numba?

The recommended way to install Numba is with the conda package manager:

    conda install numba

You can also install Numba with pip, although the newest release may become available there slightly later. However, if you are able to use conda, I recommend it, because it can also install things such as the CUDA Toolkit for you, in case you want to make your Python code GPU-ready (yes, this is possible too!).
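A quick way to confirm that the installation worked is to ask the interpreter whether it can find the packages. This is only a sketch: the helper name `is_installed` is made up here, while `importlib.util.find_spec` is part of the Python standard library.

```python
import importlib.util

def is_installed(package_name):
    """Return True if the named top-level package can be found for import."""
    return importlib.util.find_spec(package_name) is not None

# Report the status of the two packages this article relies on.
for name in ("numba", "numpy"):
    print(name, "is installed" if is_installed(name) else "is missing")
```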

3. How do I use Numba?

There are not many requirements. Basically, you write your own "normal" Python function and then add a decorator to the function definition (if you are not familiar with decorators, it is worth reading an introduction to them first). There are different decorators to choose from, but @jit is probably the one to start with. Other decorators can be used, for example, to create NumPy universal functions (@vectorize) or to write code that will be executed on a CUDA GPU (@cuda). Those decorators are not covered in this article. For now, let's look at the basic steps. The example they provide is a sum function for a 2D array; here is the code:

    from numba import jit
    from numpy import arange

    # The jit decorator tells Numba to compile this function.
    # The argument types will be inferred by Numba when the function is called.
    @jit
    def sum2d(arr):
        M, N = arr.shape
        result = 0.0
        for i in range(M):
            for j in range(N):
                result += arr[i, j]
        return result

    a = arange(9).reshape(3, 3)
    print(sum2d(a))

As you can see, a Numba decorator is added to the function definition and, voilà, the function runs fast. However, there are a few things to note: Numba can only speed up code that uses NumPy and standard-library functions, and not even all of their features are supported. The documentation is fairly good and lists everything that is supported, both the supported Python features and the supported NumPy features. That may not seem like much right now, but believe me, it is enough! Keep in mind that Numba is not about speeding up your database queries or your image-processing pipeline. Its goal is to speed up array-oriented computations, which we can express using the functions it supports.
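For readers who have not met decorators before, the sketch below shows in plain Python what the `@` syntax does. The decorator `count_calls` is invented for this illustration and has nothing to do with Numba's internals. It only demonstrates the mechanism: a decorator receives a function and returns a replacement for it.

```python
import functools

def count_calls(func):
    """Hypothetical decorator: count how often the wrapped function is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls
def add(x, y):
    return x + y

# The line "@count_calls" above is shorthand for: add = count_calls(add)
total = add(2, 3)
print(total, add.calls)
```

In the same way, `@jit` hands your function to Numba, which returns a compiled replacement that is called exactly like the original.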

4. Example and speed comparison

Experienced Python users would never implement a sum function with the code above; they would call numpy.sum instead. So let me introduce another example. To understand it better, a little background may help.

By training I would call myself a hydrologist, and much of what I do is simulating rainfall-runoff processes. Simply put: we take time-series data, such as rainfall and air temperature, and try to build a model that determines how much water flows in a river. To a layperson this may sound very complicated, but for our purposes it is simple. We typically use a model that iterates over the input arrays, and at each time step we update some internal model states (for example, storages that simulate soil moisture, the snowpack, or water intercepted by trees). At the end of each time step the streamflow is calculated, which depends not only on the rain in that time step but also on the internal model states (or storages). So we have to take into account the states and outputs of the previous time steps. You can probably see the problem: we have to step through the whole period one time step at a time, and Python is slow at this kind of loop! This is why most models are implemented in Fortran or C/C++. As mentioned earlier, Python is slow for this kind of array-oriented calculation. But Numba allows us to do the same thing in Python without much of a performance penalty, which, I think, at least makes models easier to understand and develop.

Okay, now let's look at what we get. We will use one of the simplest models, the ABC model, developed by M. B. Fiering in 1967 for educational purposes, and compare the speed of pure Python code, Numba-optimized Python code, and a Fortran implementation. Note that this model is not something we use in the real world (as the name suggests), but I think it makes a good example.

The ABC model is a three-parameter model (a, b, c, hence the name) that takes only rainfall as input and has a single storage. A fraction of the rain is lost to evapotranspiration (parameter b), another fraction percolates through the soil into the groundwater storage (parameter a), and the last parameter, c, is the fraction of the groundwater storage that leaves it and becomes streamflow in the river. In Python, using NumPy arrays, the code may look like this:

    import numpy as np

    def abc_model_py(a, b, c, rain):
        # Initialize the array for the stream discharge of each time step
        outflow = np.zeros(rain.size, dtype=np.float64)
        # Placeholders in which we save the storage content of the previous
        # and the current time step
        state_in = 0
        state_out = 0
        for i in range(rain.size):
            # Update the storage
            state_out = (1 - c) * state_in + a * rain[i]
            # Calculate the stream discharge
            outflow[i] = (1 - a - b) * rain[i] + c * state_out
            state_in = state_out
        return outflow
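Written as equations, the loop above computes the following for each time step $t$. The symbols $P_t$ (rainfall), $S_t$ (storage) and $Q_t$ (streamflow) are chosen here for illustration; the code calls them `rain`, `state_out` and `outflow`.

```latex
\begin{aligned}
S_t &= (1 - c)\,S_{t-1} + a\,P_t, \qquad S_0 = 0,\\
Q_t &= (1 - a - b)\,P_t + c\,S_t .
\end{aligned}
```

As a worked example, take $a = 0.2$, $b = 0.6$, $c = 0.1$ and a first time step with $P_1 = 10$. Then $S_1 = 0.9 \cdot 0 + 0.2 \cdot 10 = 2$ and $Q_1 = 0.2 \cdot 10 + 0.1 \cdot 2 = 2.2$. Because $S_t$ depends on $S_{t-1}$, the time steps cannot be computed independently of each other, which is why the model is written as an explicit loop.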

Next we use Numba to achieve the same functionality.

    from numba import jit

    @jit
    def abc_model_numba(a, b, c, rain):
        outflow = np.zeros(rain.size, dtype=np.float64)
        state_in = 0
        state_out = 0
        for i in range(rain.size):
            state_out = (1 - c) * state_in + a * rain[i]
            outflow[i] = (1 - a - b) * rain[i] + c * state_out
            state_in = state_out
        return outflow
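Before comparing speeds, it is worth checking that the compiled function returns the same numbers as the plain Python one. The sketch below is one way to do that, under a few assumptions: the random rainfall series is synthetic, the compiled version is created by calling `jit` on the Python function directly rather than duplicating the source, and the `try`/`except` fallback (which simply runs the function uncompiled if Numba is not installed) exists only to keep the sketch runnable anywhere.

```python
import numpy as np

try:
    from numba import jit
except ImportError:
    # Assumption for this sketch only: without Numba, run the function uncompiled.
    def jit(func):
        return func

def abc_model_py(a, b, c, rain):
    """Plain Python version of the ABC model loop."""
    outflow = np.zeros(rain.size, dtype=np.float64)
    state_in = 0.0
    for i in range(rain.size):
        state_out = (1 - c) * state_in + a * rain[i]
        outflow[i] = (1 - a - b) * rain[i] + c * state_out
        state_in = state_out
    return outflow

# Hand the same Python source to the JIT compiler.
abc_model_numba = jit(abc_model_py)

# Hypothetical synthetic rainfall series.
rng = np.random.default_rng(seed=0)
rain = rng.random(10_000)

reference = abc_model_py(0.2, 0.6, 0.1, rain)
compiled = abc_model_numba(0.2, 0.6, 0.1, rain)
print(np.allclose(reference, compiled))

# Hand-checkable case: 10 units of rain in step one, none in step two.
small = abc_model_py(0.2, 0.6, 0.1, np.array([10.0, 0.0]))
print(small)  # approximately [2.2, 0.18]
```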

I ran these models with random numbers as input, just to compare computation times, and also timed a Fortran implementation. Let's look at the numbers:

    # Measure the execution time of the pure Python implementation
    py_time = %timeit -r 5 -n N -o abc_model_py(0.2, 0.6, 0.1, rain)
    >> 6.75 s ± 11.6 ms per loop (mean ± std. dev. of 5 runs, N loops each)

    # Measure the execution time of the Numba implementation
    numba_time = %timeit -r 5 -n N -o abc_model_numba(0.2, 0.6, 0.1, rain)
    >> 30.6 ms ± 498 μs per loop (mean ± std. dev. of 5 runs, N loops each)

    # Measure the execution time of the Fortran implementation
    fortran_time = %timeit -r 5 -n N -o abc_model_fortran(0.2, 0.6, 0.1, rain)
    >> 31.9 ms ± 757 μs per loop (mean ± std. dev. of 5 runs, N loops each)

    # Compare the pure Python vs. Numba-optimized time
    py_time.best / numba_time.best
    >> 222.1521754580626

    # Compare the time of the fastest Numba and Fortran runs
    numba_time.best / fortran_time.best
    >> 0.9627960721576471

Just by adding a decorator, our calculation runs about 222 times faster than the pure Python code, and even slightly faster than the Fortran implementation. As computing power increasingly shapes the future, Numba is bound to be adopted by more people.
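The `%timeit` command used above is an IPython magic and only works in IPython or Jupyter. Outside of them, a similar comparison can be sketched with the standard-library `timeit` module. The assumptions here are a synthetic rainfall series, an arbitrary array size and repeat count, and a fallback that runs the function uncompiled if Numba is missing. The numbers you get will differ from those quoted in this article.

```python
import timeit
import numpy as np

try:
    from numba import jit
except ImportError:
    # Assumption for this sketch only: without Numba, run the function uncompiled.
    def jit(func):
        return func

def abc_model_py(a, b, c, rain):
    """Plain Python version of the ABC model loop."""
    outflow = np.zeros(rain.size, dtype=np.float64)
    state_in = 0.0
    for i in range(rain.size):
        state_out = (1 - c) * state_in + a * rain[i]
        outflow[i] = (1 - a - b) * rain[i] + c * state_out
        state_in = state_out
    return outflow

abc_model_numba = jit(abc_model_py)

# Hypothetical synthetic rainfall series.
rain = np.random.default_rng(seed=0).random(50_000)
args = (0.2, 0.6, 0.1, rain)

# The first call triggers compilation, so run it once before timing.
abc_model_numba(*args)

# Take the best of three single runs for each implementation.
py_best = min(timeit.repeat(lambda: abc_model_py(*args), repeat=3, number=1))
numba_best = min(timeit.repeat(lambda: abc_model_numba(*args), repeat=3, number=1))
print(f"pure Python: {py_best:.4f} s, Numba: {numba_best:.4f} s")
print(f"speed-up: {py_best / numba_best:.1f}x")
```

The warm-up call matters: Numba compiles a function the first time it is called with a given set of argument types, so timing that first call would measure compilation as well as execution.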

That concludes my introduction. I hope some of you are now motivated to take a look at the Numba library.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
