Linux Lakes 17: What are the characteristics of a language suitable for numerical computing


In January 2015 I kept wandering in the ocean of numerical computation. During this time I read a book on scientific computing and numerical analysis with Python, studied the Octave user manual, and even looked into the old Fortran and the newer R language. As for numerical libraries, I learned about Boost uBLAS and got used to OpenCV; the ones I know best, of course, are Python's NumPy, SciPy and pandas.

Several of my earlier essays covered tools, so today I will devote myself to programming languages. My Linux Lakes series switches between methodology and tools, and comes back to programming languages every so often. Today's topic is my view of what makes a programming language suitable for numerical computation. It is mostly a collection of thoughts on a few aspects, not a verdict on the merits of specific languages. I am also writing things down as they occur to me, so corrections are welcome if I get something wrong.

One, tuples and arrays

If numerical computation were just arithmetic on two scalars, I would not need to waste my breath here. Vectors, matrices, and multidimensional arrays are the real protagonists of numerical computation. A programming language suitable for numerical computation must therefore have a good way of representing arrays, especially multidimensional arrays. Which way is good? Is it this:

int a[m][n][k];

Or this:

int a[m,n,k];

There seems to be little difference, but what if you want to get the shape of array a? For example:

? = a.shape();

Or what if you then want to change the shape of array a? For example:

a.reshape(?);

In the code above, what should be used in place of "?"?

If I were to give an answer, I would say: use tuples. Many programming languages have the concept of a tuple, Python among them. A tuple is several values separated by commas, with or without enclosing parentheses; I find it more readable with the parentheses. For example, (a, b) is a tuple, and (3,4,5) is also a tuple. If you write [3,4,5], that is an array, which Python calls a list. Python's list is more powerful than an array, though: an array can hold only values of a single data type, while a list can hold any object, and an array generally cannot change its length dynamically, while a list can. Octave uses the term cell array for a container that can hold objects of different types, and Octave's arrays and matrices can change length dynamically. C arrays cannot change length dynamically; in C++ you have to use the vector<> class template for that.
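The distinction between tuples, lists, and typed arrays can be tried out in Python. This is a minimal sketch that assumes the standard-library array module as a stand-in for a single-type array; the variable names are illustrative only.

```python
from array import array

shape = (3, 4, 5)               # a tuple: fixed length, cannot be modified
values = [3, 4, 5]              # a list: can grow and can mix types
values.append("six")            # accepted by a list

typed = array("i", [3, 4, 5])   # typed array: integers only
try:
    typed.append("six")         # rejected: every element must be an int
    mixed_ok = True
except TypeError:
    mixed_ok = False

try:
    shape[0] = 10               # rejected: tuples are immutable
    tuple_mutable = True
except TypeError:
    tuple_mutable = False
```

Both flags end up False: the typed array refuses the string, and the tuple refuses the assignment, while the list accepts anything.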

I think a good programming language must have the concept of a tuple, and must put curly braces, square brackets, and parentheses all to good use. Many languages do poorly on tuples: C has none, C++ has none, Java has none, and even C#, a language with plenty of new features, has none as of this writing. Don't tell me there is a tuple<> template class to use; that is really not as good as tuples built into the language. C's other failing is that it only knows how to use curly braces: whether you initialize an array or a struct, it is curly braces either way. Python and JSON do this well, initializing arrays with square brackets and objects or dictionaries with curly braces. Add parentheses for tuples and the set is complete.

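Numerical calculations can be performed on scalars, one-dimensional arrays, two-dimensional arrays, and n-dimensional arrays. The arrays are organized as follows:

[Figure: organization of one-dimensional, two-dimensional, and n-dimensional arrays along their axes]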

The most important use of tuples is to represent the shape of an array. For example, the shape of a one-dimensional array is (n,); note that the comma cannot be omitted. The shape of a two-dimensional array is (m,n), the shape of a three-dimensional array is (m,n,k), and so on. Tuples can also be used to index the elements of an array. For example:

a = [[1,2,3,4], [5,6,7,8], [9,10,11,12], [13,14,15,16]];
b = a[2,3];
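For comparison, here is how NumPy treats shapes and indices as tuples. This is a small sketch, not the only way to write it; NumPy counts indices from 0, and the names flat and cube are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12],
              [13, 14, 15, 16]])

shape = a.shape              # the shape is a plain tuple: (4, 4)
element = a[2, 3]            # row 2, column 3, counting from 0
index = (2, 3)
same_element = a[index]      # a tuple held in a variable works as an index too

flat = a.reshape((16,))      # a one-dimensional shape needs the trailing comma
cube = a.reshape((2, 2, 4))  # same 16 values, three dimensions
```

Writing a[2, 3] and a[(2, 3)] is the same thing in NumPy, which is exactly the "index with a tuple" idea.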

Tuples have another big use, which is to let a function return multiple values. C handles this rather clumsily: if a function needs to return multiple values, you can only pass it one or more pointers as parameters. C++ can use references. C# is even more long-winded, with an out keyword to decorate function parameters. Really, Microsoft, if you could think of out, could you not think of tuples? As common examples, the meshgrid() function initializes two arrays at once, and the peak() function initializes three arrays at once. See how convenient they are with tuples:

(xx, yy) = meshgrid(x, y);
(xx, yy, zz) = peak();

Tuples can also be used like this, for example to swap the values of two variables:

(a, b) = (b, a);
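In Python these tuple idioms look as follows. The helper min_max is a hypothetical function written only for this illustration; np.meshgrid is the NumPy counterpart of the meshgrid() mentioned above.

```python
import numpy as np

def min_max(values):
    """Hypothetical helper: return two results at once as a tuple."""
    return (min(values), max(values))

(lo, hi) = min_max([4, 1, 7, 3])   # unpack both return values

x = np.array([0.0, 1.0, 2.0])
y = np.array([10.0, 20.0])
(xx, yy) = np.meshgrid(x, y)       # two coordinate grids from one call

(p, q) = (1, 2)
(p, q) = (q, p)                    # swap without a temporary variable
```

Here xx and yy both have shape (2, 3): one row per value of y and one column per value of x.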

Two, array initialization

In numerical calculations, array initialization is also very important. If you write it like this in C:

int a[] = {1, 2, 3, 4};

I suspect many people would find it maddening for anything larger. Writing it like this:

for (int i = 0; i < 4; i++) { a[i] = i + 1; }

is not elegant either. I just want to initialize an array; why do I have to write a loop? A two-dimensional array takes two nested loops, and a three-dimensional array takes three. It is really too annoying.

Also, as mentioned earlier, I don't like using curly braces to initialize arrays. I think square brackets belong to arrays. For example:

a = [1, 2, 3, 4];

This is a one-dimensional array, but if you write:

a = [[1, 2, 3, 4]];

it is a row vector. If it is written like this:

a = [[1], [2], [3], [4]];

then it is a column vector:

[Figure: a row vector and a column vector]
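The three bracket forms can be checked in NumPy, where the nesting of the brackets determines the shape. A minimal sketch with illustrative names:

```python
import numpy as np

flat = np.array([1, 2, 3, 4])          # one-dimensional array
row = np.array([[1, 2, 3, 4]])         # row vector: 1 row, 4 columns
col = np.array([[1], [2], [3], [4]])   # column vector: 4 rows, 1 column

shapes = (flat.shape, row.shape, col.shape)
```

The shapes come out as (4,), (1, 4), and (4, 1), and transposing the row vector gives the column vector.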

Of course, the examples above contain only four numbers, so writing them out by hand is acceptable. What if there are many numbers, or an array with many dimensions? Then initialization functions are needed, and ideally they accept a tuple as a parameter to determine the shape of the array. For example:

a = range(1, 61).reshape((3,4,5));  // initializes a 3*4*5 array with the numbers 1 to 60
b = randn((3,4,5));  // initializes a 3*4*5 array with random numbers

Other initialization functions include linspace(), logspace(), ones(), zeros(), eye(), and so on. These functions can also be combined with reshape(), for example:

c = linspace(0, 2, 60).reshape((3,4,5));

In all of these initializations, tuples are an important component.
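As a concrete reference, the NumPy versions of these initializers all take the shape as a tuple. This sketch assumes a seeded random generator (the seed 0 is arbitrary) so that the run is repeatable.

```python
import numpy as np

shape = (3, 4, 5)
a = np.arange(1, 61).reshape(shape)        # the numbers 1 to 60
rng = np.random.default_rng(0)             # seeded generator, seed chosen arbitrarily
b = rng.standard_normal(shape)             # normally distributed random values
ones = np.ones(shape)
zeros = np.zeros(shape)
identity = np.eye(4)                       # 4 x 4 identity matrix
c = np.linspace(0, 2, 60).reshape(shape)   # 60 evenly spaced values from 0 to 2
```

Because the same shape tuple is passed everywhere, changing the dimensions of every array means editing one line.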

Three, ranges and slices

A range can be written as a function call, but it can also be written more concisely, like this:

0:10:2;  // 0, 2, 4, 6, 8, 10
11:0:-3;  // 11, 8, 5, 2

In some languages this feature is also called slicing. The ":" punctuation mark is so flexible that it would be a waste not to use it. With slices, you get a sequence of numbers by simply specifying the start value, the stop value, and the step.
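The same two sequences can be produced in NumPy with np.arange. One difference to keep in mind in this sketch: np.arange and Python slices exclude the stop value, so 11 is passed as the stop in order to include 10.

```python
import numpy as np

up = np.arange(0, 11, 2)       # 0, 2, 4, 6, 8, 10 (the stop value 11 is excluded)
down = np.arange(11, 0, -3)    # 11, 8, 5, 2

letters = ["a", "b", "c", "d", "e", "f"]
every_other = letters[0:6:2]   # start:stop:step used as a slice on a list
```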

However, the biggest use of ":" is not initializing arrays but indexing them. For example, if a is a three-dimensional array, it can be sliced to get a subset of its data. See the following code (indices start at 0 and the stop value is included):

a = range(1, 61).reshape((3,4,5));  // a is a three-dimensional array
b = a[0, 2:3, 1:4];  // b is a two-dimensional array with values [[12, 13, 14, 15], [17, 18, 19, 20]]

Besides the start and stop values, you can also specify a step. And of course a single ":" can stand for an entire axis. For the concept of axes, see the figure earlier in this article. See the following code:

a = range(1, 61).reshape((3,4,5));  // a is a three-dimensional array
b = a[0, :, :];  // b is a two-dimensional array with values [[1,2,3,4,5], [6,7,8,9,10], [11,12,13,14,15], [16,17,18,19,20]]
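A runnable NumPy sketch of the same slicing follows. NumPy counts from 0 and excludes the stop value of a slice, so the bounds are written 2:4 and 1:5 here; the variable names are illustrative.

```python
import numpy as np

a = np.arange(1, 61).reshape((3, 4, 5))

# Rows 2 to 3 and columns 1 to 4 of the first block (stops are exclusive).
b = a[0, 2:4, 1:5]

# A bare ":" selects a whole axis; this is the entire first block.
first_block = a[0, :, :]

# A step can be given as well: every second column of the first block.
strided = a[0, :, ::2]
```

Fixing one axis with a plain integer removes that axis from the result, which is why b and first_block are two-dimensional.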

Four, do not write loops

When you do arithmetic on multidimensional arrays in a traditional language like C, you cannot avoid writing loops. For example, to add two multidimensional arrays you have to write code like this:

m = 10; n = ...; k = ...;
a = randn(m, n, k);  // a three-dimensional array of shape (m, n, k) initialized to random values
b = randn(m, n, k);  // a three-dimensional array of shape (m, n, k) initialized to random values
for (int i = 0; i < m; i++) {
    for (int j = 0; j < n; j++) {
        for (int p = 0; p < k; p++) {
            c[i, j, p] = a[i, j, p] + b[i, j, p];
        }
    }
}

The code above is of course far less concise than the following:

c = a + b;
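Both versions can be written in NumPy to confirm that they compute the same thing. This is a sketch with small, arbitrarily chosen sizes and a seeded random generator; neither is taken from the pseudocode above.

```python
import numpy as np

m, n, k = 2, 3, 4                     # small illustrative sizes (assumed)
rng = np.random.default_rng(0)        # arbitrary seed for repeatability
a = rng.standard_normal((m, n, k))
b = rng.standard_normal((m, n, k))

# Explicit triple loop, mirroring the C-style version.
c_loop = np.empty((m, n, k))
for i in range(m):
    for j in range(n):
        for p in range(k):
            c_loop[i, j, p] = a[i, j, p] + b[i, j, p]

# Whole-array expression: same result, no loop in user code.
c = a + b
```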

So not writing loops is basically standard equipment for every numerical computing language. Matlab and Octave work this way, NumPy does, and so does R. C++ pursues the same goal: because it has operator overloading, a matrix class can overload the arithmetic operators. But the set of operators C++ provides is deficient. For example, it has no exponentiation (power) operator, whereas in Octave and NumPy x**y computes $x^y$. In C++ you can only call a function such as pow(x, y). Don't even think about the ^ operator: in C++ it is the bitwise XOR operator, so exponentiation has to go through a function. In addition, arithmetic on multidimensional arrays has special cases. Multiplication between two-dimensional arrays can be either element-wise multiplication or matrix multiplication. Vectors have special cases too: multiplication can be element-wise or a dot product, and if the vectors happen to have length 3 you can also compute a cross product. All of these need their own operators, so although C++ has an operator overloading mechanism, these operators go beyond what the language provides, and C++ has no way to write them gracefully.
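For reference, this is how NumPy spells the different kinds of multiplication and the power operation. It is a short sketch with made-up values; the dot and cross products are functions in NumPy, not operators.

```python
import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[5.0, 6.0], [7.0, 8.0]])

squared = x ** 2           # element-wise power
elementwise = x * y        # element-wise product
matrix_product = x @ y     # matrix product

u = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])
dot = np.dot(u, v)         # dot product of two vectors
cross = np.cross(u, v)     # cross product of two length-3 vectors
```

Python gave matrix multiplication its own @ operator, so * can stay element-wise without ambiguity.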

One advantage of not writing loops is that the operations can be optimized for speed. Optimization is the responsibility of the compiler or interpreter, and the person writing the numerical program does not have to worry about it at all. The compiler or interpreter may take advantage of SIMD instruction sets such as SSE, use multithreading on multi-core CPUs, or even compute on the GPU (GPGPU). If the user insists on writing C-style loops himself and does not know inline assembly or OpenMP, he gets none of these speed optimizations.

Five, broadcasting

Without loops, arithmetic directly on two multidimensional arrays is of course convenient. But what if the two arrays have different shapes? For example, a two-dimensional array plus a row vector, a two-dimensional array plus a column vector, or even an array combined with a scalar: what happens then?

Don't worry: languages designed for numerical computation usually have a "broadcasting" feature. When two arrays have different shapes, the array with the smaller shape can usually be broadcast along any dimension whose length is 1. For example:

[Figure: broadcasting between arrays of different shapes]
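The three cases named above (array plus row vector, array plus column vector, array plus scalar) can be sketched in NumPy as follows; the values are arbitrary.

```python
import numpy as np

a = np.arange(12).reshape((3, 4))        # shape (3, 4)
row = np.array([[10, 20, 30, 40]])       # shape (1, 4)
col = np.array([[100], [200], [300]])    # shape (3, 1)

plus_row = a + row       # the row is repeated for each of the 3 rows
plus_col = a + col       # the column is repeated for each of the 4 columns
plus_scalar = a + 1      # the scalar is applied to every element
outer = row + col        # (1, 4) and (3, 1) broadcast to (3, 4)
```

In each case the dimension of length 1 is stretched to match the other operand, without the data actually being copied by the user.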

Six, fancy indexing

Fancy indexing: some books translate the term literally, but I think "singular index" is a better name. It means that a low-dimensional array can be indexed with a high-dimensional array, and the result is a high-dimensional array. If the index also contains slices, the result may have even more dimensions.
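A small NumPy sketch may make this more concrete. The arrays and index values are made up for illustration.

```python
import numpy as np

values = np.array([10, 20, 30, 40])       # one-dimensional array
idx = np.array([[0, 1], [3, 3]])          # two-dimensional index array
picked = values[idx]                      # result has the shape of idx: (2, 2)

table = np.arange(12).reshape((3, 4))     # two-dimensional array
rows = np.array([[0, 1], [2, 0]])         # two-dimensional array of row numbers
mixed = table[rows, 1:3]                  # index array plus a slice: shape (2, 2, 2)
```

Here picked is two-dimensional although values has one dimension, and mixed is three-dimensional although table has only two, because the slice contributes an axis of its own.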

This concept is harder to understand. I will write more about it tomorrow.

There are other features as well; I will update this article whenever I think of them.

All the pictures in this article were drawn using the Inkscape vector graphics software in Ubuntu.

(Published by Jingshan Ranger on Blog Park on 2015-01-19. Please credit the source when reposting.)
