Why do arrays in some programming languages start counting from zero?

Source: Internet
Author: User
Tags: python list
Like arrays in C, Python lists are counted from zero. To tell the truth, in all my time programming I have never thought about why they were designed this way. MATLAB does not start from zero.
Is there any engineering advantage?

Reply content:

On this question, Dijkstra wrote a short note in 1982 titled "Why numbering should start at zero".
It runs to three handwritten pages in total. Here is a rough translation of the key points:


To denote the sequence of natural numbers 2, 3, ..., 12 without the three dots (...), there are four conventions to choose from:

a) 2 <= i < 13

b) 1 < i <= 12

c) 2 <= i <= 12

d) 1 < i < 13

Is there any reason to prefer one of these over the others? Yes, there is. Conventions a) and b) have the advantage that the difference between the upper and lower bounds is exactly the length of the sequence. As a consequence, in both a) and b), when two sequences are adjacent, the upper bound of one equals the lower bound of the other. But these observations do not let us choose between a) and b), so let us start afresh.

There is a smallest natural number. Excluding the lower bound, as in b) and d), forces a sequence that starts at the smallest natural number to have a lower bound in the realm of the unnatural numbers. That is ugly, so for the lower bound we prefer <=, as in a) and c). Now consider sequences that start at the smallest natural number: if the upper bound is included, it is forced to be unnatural by the time the sequence has shrunk to the empty one. That is ugly too, so for the upper bound we prefer <, as in a) and d). To sum up, convention a) is to be preferred.
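Python's built-in range happens to follow convention a), so the properties above can be checked directly. This is only a minimal sketch; the variable names are chosen for illustration.

```python
# Convention a): 2 <= i < 13, written as a half-open range.
seq = range(2, 13)

# The difference of the bounds is the length of the sequence.
length = 13 - 2
assert len(seq) == length == 11

# Adjacent sequences share a bound: the upper bound of the first
# is the lower bound of the second, with no gap and no overlap.
first, second = range(2, 7), range(7, 13)
assert list(first) + list(second) == list(seq)

# An empty sequence starting at the smallest natural number needs
# no "unnatural" bound: 0 <= i < 0.
empty = range(0, 0)
assert len(empty) == 0
```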


When dealing with a sequence of length N, we want to distinguish its elements by subscript. The next vexing question is what subscript value to give to its first element. Under convention a), starting from 1 gives the subscript range 1 <= i < N+1, while starting from 0 gives the nicer range 0 <= i < N. So let our ordinals start at zero: the ordinal (subscript) of an element equals the number of elements that precede it in the sequence. The moral of the story is that we had better, after all those centuries, regard zero as a most natural number.

It is the handiwork of the author of BCPL, who wanted to save one subtraction instruction in the compiled code.

@haha Wang mentioned Dijkstra's article "Why numbering should start at zero" but did not translate the full text. A full translation is supplied here. It may contain some mistakes.

-----------------------------------

Why numbering should start at zero

To denote the subsequence of natural numbers 2, 3, ..., 12 without the three dots of the ellipsis, we can choose among four conventions:

    • a) 2 ≤ i < 13
    • b) 1 < i ≤ 12
    • c) 2 ≤ i ≤ 12
    • d) 1 < i < 13

Are there reasons to prefer one convention to the others? Yes, there are. Observe that a) and b) have the advantage that the difference between the bounds is exactly the length of the subsequence. As a corollary, in either convention, when two subsequences are adjacent, the upper bound of one equals the lower bound of the other. But these observations do not let us choose between a) and b), so let us start afresh.


There is a smallest natural number. If the lower bound is excluded, as in b) and d), then a subsequence that starts at the smallest natural number forces the lower bound into the realm of the unnatural numbers. That is ugly, so for the lower bound we prefer ≤, as in a) and c). Now consider subsequences that start at the smallest natural number: if the upper bound is included, it too is forced into the unnatural numbers by the time the sequence has shrunk to the empty one. That is ugly as well, so for the upper bound we prefer <, as in a) and d). We conclude that convention a) is the better choice.


Remark: Mesa is a programming language developed at Xerox PARC (the Xerox Palo Alto Research Center). It has special notations for intervals of integers in all four conventions. Extensive experience with Mesa has shown that the other three conventions are a constant source of clumsy and incorrect code. For that reason, experienced Mesa programmers are now strongly advised not to use the latter three features, even though they are available. I mention this experimental evidence, for what it is worth, because some people feel uncomfortable with conclusions that have not been confirmed in practice. (End of remark)


When dealing with a sequence of length n, we want to distinguish its elements by subscript, and the next question is what subscript value to give to the first element. Keeping to convention a), starting from subscript 1 gives the range 1 ≤ i < n + 1, while starting from 0 gives the nicer range 0 ≤ i < n. So let our ordinals start at 0: the ordinal (subscript) of an element equals the number of elements that precede it in the sequence. The moral of the story is that, after all those centuries, we had better regard 0 as a most natural number.
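As a small illustration in Python, whose lists are 0-based, the subscript of an element is the number of elements before it, and the valid subscripts are exactly 0 ≤ i < n. The list contents below are made up for the example.

```python
items = ["a", "b", "c", "d"]
n = len(items)

# Valid subscripts under convention a) starting from 0: 0 <= i < n.
zero_based = list(range(0, n))

# The same sequence numbered from 1 needs the bound n + 1: 1 <= i < n + 1.
one_based = list(range(1, n + 1))

# With 0-based numbering, an element's subscript equals the number
# of elements that precede it.
preceding_counts = [len(items[:i]) for i in zero_based]
assert preceding_counts == zero_based

print(zero_based)  # [0, 1, 2, 3]
print(one_based)   # [1, 2, 3, 4]
```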


Remark: Many programming languages have been designed without due attention to this detail. In Fortran, subscripts always start at 1, whereas Pascal adopts convention c). The more recent language SASL has fallen back on the Fortran convention: in SASL, a sequence is at the same time a function on the positive integers. (End of remark)


The above was prompted by a recent incident. In an emotional outburst, one of my mathematical colleagues at the university (not a computing scientist) accused a number of younger computing scientists of "pedantry" because they habitually started numbering at zero. He took the conscious adoption of the most sensible convention as a provocation. (The "End of ..." convention is also viewed as provocative, but it is useful: I know of a student who almost failed an examination by tacitly assuming that the questions ended at the bottom of the first page.) I think Antony Jay is right when he states: "In corporate religions as in others, the heretic must be cast out not because of the probability that he is wrong but because of the possibility that he is right."

This has to do with how arrays are accessed. An array lives in memory, and to access a piece of memory you only need its address.
So how do you get the address of element i? Take the first address of the array and add a relative offset. The offset is easy to compute: because every element has the same size, it is the element size multiplied by the number of elements that come before it.
In memory, an array is a contiguous sequence of blocks, as follows:
Address of the 1st element = First address A
Address of the 2nd element = First address A + 1 * element size
Address of the 3rd element = First address A + 2 * element size
Address of the 4th element = First address A + 3 * element size
Address of the 5th element = First address A + 4 * element size
...............
Address of element i = First address A + (i-1) * element size

So if elements are numbered from 1, computing a memory address always costs one subtraction, while numbering from 0 does not. The conventional way of counting was abandoned for the sake of calculation speed.
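The address arithmetic above can be made visible from Python with the standard ctypes module. This is only a sketch: the two reader functions are hypothetical names, and a real compiler emits this arithmetic directly rather than calling functions.

```python
import ctypes

# A contiguous array of five 32-bit integers (example values).
values = (ctypes.c_int32 * 5)(10, 20, 30, 40, 50)
base = ctypes.addressof(values)          # first address A
size = ctypes.sizeof(ctypes.c_int32)     # element size in bytes


def read_zero_based(i):
    # Address = A + i * size: the subscript is the element offset.
    return ctypes.c_int32.from_address(base + i * size).value


def read_one_based(k):
    # Address = A + (k - 1) * size: one extra subtraction per access.
    return ctypes.c_int32.from_address(base + (k - 1) * size).value


assert read_zero_based(0) == read_one_based(1) == 10
assert read_zero_based(4) == read_one_based(5) == 50
```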

Disassembling the compiled code confirms this as well. One more point to add,
the modulo operation:
With subscripts starting at 0:
A[i mod N]
With subscripts starting at 1:
A[i mod N + 1]

Anything that does not start from zero is just being silly.
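The modulo point can be sketched in Python as follows. The two wrap functions are hypothetical helpers, and the day names are example data.

```python
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
n = len(days)


def wrap_zero_based(i):
    # 0-based: the remainder is already a valid subscript in 0..n-1.
    return i % n


def wrap_one_based(i):
    # 1-based: the remainder must be shifted back into 1..n.
    return i % n + 1


# Step through the week twice; the subscript wraps after the last day.
zero_based_walk = [days[wrap_zero_based(i)] for i in range(2 * n)]
assert zero_based_walk == days + days

# The 1-based subscripts cover 1..n. Subtracting 1 is needed here only
# because Python lists themselves are 0-based.
one_based_walk = [days[wrap_one_based(i) - 1] for i in range(2 * n)]
assert one_based_walk == days + days
```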

Not starting from 0 was probably meant to feel natural and match everyday human habits. But now that most languages start at 0, it is not starting at 0 that feels unnatural.

In terms of efficiency, starting from 0 is the most efficient: the subscript is directly the offset of the storage location. Starting from 1, you have to subtract 1 from the subscript to get the offset. The performance difference is very small, but it is still not the most direct way.

I disagree with the answers here that say it is for the sake of efficiency.

C Language:

It is generally believed that C was designed this way to avoid the overhead of subtracting the lower bound from the subscript at run time. In fact, the compiler can avoid all run-time overhead by translating to a virtual starting address.
So a more plausible explanation is that arrays in C interoperate with pointers.

The above is from "Programming Language Pragmatics".
(Original: In C, the lower bound of every array dimension is always zero. It is often assumed that the language designers adopted this convention in order to avoid subtracting lower bounds from indices at run time, thereby avoiding a potential source of inefficiency. As our discussion has shown, however, the compiler can avoid any run-time cost by translating to a virtual starting location. (The one exception to this statement occurs when the lower bound has a very large absolute value: if any index (scaled by element size) exceeds the maximum offset available with displacement mode addressing [typically 2^15 bytes on RISC machines], then subtraction may still be required at run time.)
A more likely explanation lies in the interoperability of arrays and pointers in C (Section 7.7.1): C's conventions allow the compiler to generate code for an index operation on a pointer without worrying about the lower bound of the array into which the pointer points. Interestingly, Fortran array dimensions have a default lower bound of 1; unless the programmer explicitly specifies a lower bound of 0, the compiler must always translate to a virtual starting location.)
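The "virtual starting location" idea can be sketched with plain integer arithmetic. The base address, element size and lower bound below are made-up values, and the function names are hypothetical. A real compiler folds the constant at compile time rather than in a function.

```python
BASE = 1000        # address of the first stored element (assumed)
SIZE = 4           # element size in bytes (assumed)
LOWER = 1          # lower bound of the array, as in Fortran's default


def address_naive(i):
    # Subtract the lower bound on every access.
    return BASE + (i - LOWER) * SIZE


# Computed once, ahead of time: where element 0 would live if it existed.
VIRTUAL_ORIGIN = BASE - LOWER * SIZE


def address_virtual(i):
    # No per-access subtraction: the lower bound is folded into the origin.
    return VIRTUAL_ORIGIN + i * SIZE


for i in range(LOWER, LOWER + 10):
    assert address_naive(i) == address_virtual(i)

print(address_virtual(1))  # 1000
```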

Python:

Someone asked me on Twitter why Python uses 0-based indexing, where the first element has index 0, and gave me a link to a good article on the subject. It reminded me of a lot of things: ABC, one of Python's predecessors, used 1-based indexing, where the first element has index 1, while C, the other big influence on Python, used 0-based indexing. My first few programming languages (Algol, Fortran, Pascal) were 1-based or had a flexible base. I think slice notation was one of the things that helped me decide.

Let's first look at the use cases for slicing. The most common ones are probably "get the first n items" and "get the next n items starting at i" (the former is a special case of the latter with i == the first index). It would be nice if both could be written without awkward +1 or -1 compensations.

With 0-based indexing, half-open intervals and suitable defaults (which is what Python ended up with), both are beautiful: a[:n] and a[i:i+n], where the former is shorthand for a[0:n].
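Both use cases can be tried directly in Python. The list contents and the values of n and i are example data.

```python
a = list("abcdefgh")
n, i = 3, 2

# "The first n items": no +1 or -1 needed.
assert a[:n] == a[0:n] == ["a", "b", "c"]

# "The next n items starting at i".
assert a[i:i + n] == ["c", "d", "e"]

# The length of a slice is the difference of its bounds.
assert len(a[i:i + n]) == (i + n) - i
```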

With 1-based indexing, if you want a[:n] to mean the first n elements, you must either use closed intervals or use a slice notation that takes a start and a length as its two parameters. Half-open intervals are just not very elegant when combined with 1-based indexing. With closed intervals, you would have to write a[i:i+n-1] for the n items starting at i. So with 1-based indexing it is perhaps more elegant to use the slice length: then you could write a[i:n]. This is in fact what ABC did. It used a different notation, written as a@i|n. (See the ABC Quick Reference.)
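To see where the compensation ends up, the two 1-based alternatives can be emulated on top of Python lists. Both helpers are hypothetical and exist only for this illustration. They are not how ABC implements its notation.

```python
def slice_closed_1based(a, first, last):
    # Hypothetical helper: items first..last inclusive, counting from 1.
    return a[first - 1:last]


def slice_start_length_1based(a, start, length):
    # Hypothetical helper in the spirit of a start/length notation.
    return a[start - 1:start - 1 + length]


a = list("abcdefgh")
i, n = 3, 4

# Closed interval: the n items starting at position i need i + n - 1.
assert slice_closed_1based(a, i, i + n - 1) == ["c", "d", "e", "f"]

# Start/length: no compensation at the call site.
assert slice_start_length_1based(a, i, n) == ["c", "d", "e", "f"]
```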

But how does the index:length convention work out for other use cases? To be honest, this is where my memory gets fuzzy, but I think I was swayed by the elegance of half-open intervals. In particular, the invariant that when two slices are adjacent, the end index of the first slice is the start index of the second is just too beautiful to ignore. For example, if you split an array into three parts at indices i and j, the parts are a[:i], a[i:j], and a[j:].
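The adjacency invariant is easy to check in Python. The string and the split points are example values.

```python
a = "zero-based"
i, j = 4, 5

head, middle, tail = a[:i], a[i:j], a[j:]
print(head, middle, tail)  # zero - based

# The three parts rebuild the original exactly.
assert head + middle + tail == a

# The lengths add up with no +1/-1 corrections.
assert len(head) == i and len(middle) == j - i and len(tail) == len(a) - j
```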

This is why Python uses the 0-based index method.


The above is from the creator of Python: why Python array subscripts start at 0.

This problem can be understood intuitively through substring functions. Take the first character, the second character, the third character: a subscript should be read as numbering the gaps between characters (the separator positions), not the characters themselves. Otherwise, does 1 mean the left side or the right side of the first character?
I don't like the explanations based on low-level memory layout plus history, because they don't explain why it was never changed later. I'm just here to joke around.
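Python slices follow this "cut position" reading, which a short sketch can show. The string is example data.

```python
s = "abc"

# A string of length 3 has 4 cut positions: 0 |a| 1 |b| 2 |c| 3.
cut_positions = list(range(len(s) + 1))
assert cut_positions == [0, 1, 2, 3]

# Cutting at position k leaves k characters on the left.
for k in cut_positions:
    left, right = s[:k], s[k:]
    assert len(left) == k and left + right == s

# Character i sits between cut positions i and i + 1.
assert s[1:2] == s[1] == "b"

# Inserting at cut position 1 is unambiguous: after "a", before "b".
assert s[:1] + "X" + s[1:] == "aXbc"
```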

If a hack like me were writing a C compiler, I would definitely want arrays to start at 0: the address calculation is much more convenient and the compiler is much easier to write. After all, C is so close to assembly ... So the rule was made for convenience, hmph, not out of laziness!

Let me mention a few things that have not been mentioned here yet.
    1. Subscripts are usually represented by unsigned integers. An n-bit integer can represent the range {0, 1, ..., 2^n-1}. If subscripts start at 1, only 2^n-1 of those values can be used. This may not matter much with 32-bit/64-bit subscripts, but it is a bigger problem with 8-bit/16-bit subscripts.
    2. Modular arithmetic (congruence calculation) can be applied to subscripts directly.
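Both points can be illustrated with an 8-bit subscript. This is a sketch using Python integers with a mask to simulate the 8-bit width; BITS, MASK and the table are names made up for the example.

```python
BITS = 8
MASK = (1 << BITS) - 1      # 0xFF, the largest 8-bit unsigned value

# 0-based: every bit pattern 0..255 is a usable subscript.
zero_based_slots = len(range(0, MASK + 1))
# 1-based: pattern 0 is wasted, leaving 1..255.
one_based_slots = len(range(1, MASK + 1))
assert (zero_based_slots, one_based_slots) == (256, 255)

# Modular arithmetic: an 8-bit counter that overflows wraps to 0,
# which is still a valid 0-based subscript of a 256-entry table.
table = [0] * (MASK + 1)
index = MASK
index = (index + 1) & MASK   # simulate 8-bit overflow
assert index == 0
table[index] += 1
assert table[0] == 1
```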