In the previous post we explained how procedure (function) calls are implemented in assembly language. The key points were how data is passed between the caller and the callee, and how memory for local variables is allocated and released inside the callee. In this post we explain how arrays are allocated and accessed.
1. Basic principles of arrays
An array is a collection of data items of one basic data type. For a data type T and an integer constant N, an array is declared as follows:
    T A[N];
Here A is the array name. The declaration has two effects:
① It allocates a contiguous region of L*N bytes in memory, where L is the size of data type T in bytes.
② It introduces the identifier A, which can be used as a pointer to the beginning of the array. If the starting address of the allocated region is xA, then the value of this pointer is xA.
That is, when we read an array element with A[i], we are actually accessing address xA + i*sizeof(T). sizeof(T) gives the amount of memory occupied by data type T, in bytes; for example, if T is int, then sizeof(int) is 4 on IA32. Since array subscripts start at 0, when i equals 0 the address we access is xA.
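The address rule can be checked directly in C. The following is a minimal sketch added for illustration; the helper names element_address and check_element_addresses are hypothetical, and the code assumes a flat address space in which converting a pointer to uintptr_t yields its byte address. It computes the start address plus i times the element size by hand and compares the result with the address the compiler produces for &a[i].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compute (start address) + i * L by hand, where L is
 * the element size in bytes. Assumes a flat address space. */
uintptr_t element_address(const void *start, size_t elem_size, size_t i)
{
    return (uintptr_t)start + i * elem_size;
}

/* Compare the hand-computed address with the compiler's &a[i]. */
void check_element_addresses(void)
{
    int a[10];
    for (size_t i = 0; i < 10; i++) {
        assert(element_address(a, sizeof(int), i) == (uintptr_t)&a[i]);
    }
    /* Index 0 is the start of the array itself. */
    assert(element_address(a, sizeof(int), 0) == (uintptr_t)a);
}
```

Calling check_element_addresses() from main runs the comparison for every index; the same helper works for any element type if sizeof of that type is passed in.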
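For example, consider the following array declarations:
    char a[12];
    char *b[8];
    double c[6];
    double *d[5];
From these declarations we can work out each array's element size and total size: a has 1-byte elements and occupies 12 bytes, b has 4-byte elements and occupies 32 bytes, c has 8-byte elements and occupies 48 bytes, and d has 4-byte elements and occupies 20 bytes. Note that b and d are arrays of pointers, and in IA32 a pointer variable occupies 4 bytes of memory.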
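The sizes of these four arrays can also be measured with sizeof. The sketch below is an illustration only; struct array_sizes and the two function names are hypothetical. The pointer-array sizes are written in terms of sizeof(char *) and sizeof(double *) because they depend on the platform: they come to 32 and 20 bytes with 4-byte IA32 pointers, but 64 and 40 bytes on a machine with 8-byte pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the total size of each array, in bytes. */
struct array_sizes {
    size_t a, b, c, d;
};

/* Declare the four arrays from the text and measure them with sizeof. */
struct array_sizes measure_arrays(void)
{
    char    a[12];
    char   *b[8];
    double  c[6];
    double *d[5];
    struct array_sizes s = { sizeof a, sizeof b, sizeof c, sizeof d };
    return s;
}

/* Total size is always (element count) * (element size). */
void check_array_sizes(void)
{
    struct array_sizes s = measure_arrays();
    assert(s.a == 12 * sizeof(char));
    assert(s.b == 8 * sizeof(char *));
    assert(s.c == 6 * sizeof(double));
    assert(s.d == 5 * sizeof(double *));
}
```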
Consider the following code:
    #include <stdio.h>

    int main()
    {
        int a[10];
        int i;
        for (i = 0; i < 10; i++) {
            /* print the address of each element as a decimal number */
            printf("%lu\n", (unsigned long)&a[i]);
        }
        printf("Array size: %lu\n", (unsigned long)sizeof(a));
        return 0;
    }
In the printed results, the starting address, that is, the address of a[0], is 6356736, and the address of each following element is 4 bytes higher than the one before it.
In IA32, memory-referencing instructions can be used to simplify array access. For example, for the int a[10] above, suppose we want to access a[i], the address of a is stored in register %edx, and i is stored in register %ecx. The access is then performed by the following instruction:
    movl (%edx,%ecx,4),%eax
This instruction performs the address calculation xa + 4i, reads the value at that memory location, and stores the result in register %eax.
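To make the scaled-index address calculation concrete, here is a small C model of it. This is a sketch for illustration, not how a compiler is implemented: load_scaled is a hypothetical name, int32_t stands in for the 4-byte IA32 int, and memcpy plays the role of the memory read. On real IA32 hardware the scale factor is limited to 1, 2, 4, or 8.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the operand (base, index, scale): read a 32-bit
 * value from the byte address base + index * scale. */
int32_t load_scaled(const void *base, size_t index, size_t scale)
{
    const unsigned char *addr = (const unsigned char *)base + index * scale;
    int32_t value;
    memcpy(&value, addr, sizeof value); /* stands in for the movl load */
    return value;
}

/* With scale = 4, load_scaled(a, i, 4) reads the same value as a[i]. */
void check_load_scaled(void)
{
    int32_t a[10];
    for (int i = 0; i < 10; i++) {
        a[i] = 100 + i;
    }
    for (size_t i = 0; i < 10; i++) {
        assert(load_scaled(a, i, 4) == a[i]);
    }
}
```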
2. Pointer arithmetic
C allows arithmetic on pointers, and the computed value is scaled according to the size of the data type the pointer references.
That is, if p is a pointer to data of type T and the value of p is xp, then the value of the expression p+i is xp + L*i, where L is the size of data type T.
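Assume that the starting address of the integer array E and the integer index i are stored in registers %edx and %ecx respectively. The following table gives the assembly code for each expression, with the result stored in %eax (M[x] denotes the contents of memory at address x):
    Expression   Type    Value             Assembly code
    E            int *   xE                movl %edx,%eax
    E[0]         int     M[xE]             movl (%edx),%eax
    E[i]         int     M[xE + 4i]        movl (%edx,%ecx,4),%eax
    &E[2]        int *   xE + 8            leal 8(%edx),%eax
    E+i-1        int *   xE + 4i - 4       leal -4(%edx,%ecx,4),%eax
    *(E+i-3)     int     M[xE + 4i - 12]   movl -12(%edx,%ecx,4),%eax
    &E[i]-E      int     i                 movl %ecx,%eax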
In the examples above, the leal instruction is used to generate an address, while movl is used to reference memory (except in the first and last cases, where the former copies an address and the latter copies the index). The last example shows that we can compute the difference between two pointers into the same data structure; the result is the difference of the two addresses divided by the size of the data type.
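The scaling rules can be observed from C itself. The sketch below is illustrative; the function names and the sample array contents are assumptions chosen for the demonstration. It checks that adding to a pointer moves it by whole elements, and that subtracting two pointers into the same array yields an element count rather than a byte count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of elements between two pointers into the
 * same int array. The byte distance is divided by sizeof(int) for us. */
ptrdiff_t element_distance(const int *p, const int *q)
{
    return p - q;
}

void check_pointer_arithmetic(void)
{
    int E[8] = {0, 10, 20, 30, 40, 50, 60, 70};
    int i = 5;

    assert(E + i - 1 == &E[4]);       /* address arithmetic, scaled by sizeof(int) */
    assert(*(E + i - 3) == 20);       /* memory reference, same as E[2] */
    assert(&E[2] == E + 2);           /* &E[2] is the address xE + 2*sizeof(int) */
    assert(element_distance(&E[i], E) == i);

    /* Measured in bytes, the same distance is i * sizeof(int). */
    assert((const char *)&E[i] - (const char *)E == (ptrdiff_t)(i * sizeof(int)));
}
```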
3. Nested arrays
An array of arrays, such as the two-dimensional array int a[5][3], is a nested array. The allocation and referencing rules described above still hold.
We can view the array int a[5][3] as an array of 5 elements, each of which is an array of 3 ints. The elements are stored in row-major order, so a[i][j] is located at address xa + 4(3i + j), where xa is the starting address of a.
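The layout of int a[5][3] can be confirmed with a short check. This is a sketch under the usual row-major layout that C guarantees for nested arrays; the names ROWS, COLS, nested_offset and check_nested_layout are hypothetical. The offset is written with sizeof(int) rather than a literal 4 so that it holds on any platform.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 5, COLS = 3 };

/* Hypothetical helper: byte offset of a[i][j] from the start of
 * int a[ROWS][COLS], i.e. sizeof(int) * (COLS*i + j). */
size_t nested_offset(size_t i, size_t j)
{
    return sizeof(int) * (COLS * i + j);
}

void check_nested_layout(void)
{
    int a[ROWS][COLS];

    /* a is 5 elements, and each element is an array of 3 ints. */
    assert(sizeof(a) == ROWS * COLS * sizeof(int));
    assert(sizeof(a[0]) == COLS * sizeof(int));

    for (size_t i = 0; i < ROWS; i++) {
        for (size_t j = 0; j < COLS; j++) {
            ptrdiff_t bytes = (const char *)&a[i][j] - (const char *)a;
            assert(bytes == (ptrdiff_t)nested_offset(i, j));
        }
    }
}
```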
4. Fixed-length arrays and variable-length arrays
To understand fixed-length and variable-length arrays, we first have to be clear about what "fixed" and "variable" refer to. Both words are meant from the compiler's point of view: if the length of the array is known at compile time, we call it a fixed-length array; otherwise we call it a variable-length array.
For example, int a[10] is a fixed-length array of length 10. Its length is known at compile time because the length is a constant. Earlier C compilers did not allow a variable to be used as the length in an array declaration, only a constant. Current C/C++ compilers have started to support variable-length arrays, although C++ compilers still do not support them as function parameters. In addition, C provides functions such as malloc and calloc for allocating memory dynamically, and we can convert the returned pointer to the desired array type.
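As an illustration of the malloc and calloc route, the sketch below allocates arrays whose length is only known at run time. The function names and the row3 typedef are hypothetical and were chosen for this example. The calloc result is converted to a pointer to rows of 3 ints, so ordinary m[i][j] indexing works on it.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate n ints at run time and fill them with 0, 3, 6, ...
 * Returns NULL on allocation failure; the caller must free the result. */
int *make_multiples_of_three(size_t n)
{
    int *a = malloc(n * sizeof *a);
    if (a == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        a[i] = (int)(i * 3);
    }
    return a;
}

/* A row of 3 ints; a pointer to row3 can be indexed as m[i][j]. */
typedef int row3[3];

/* Allocate `rows` zero-initialized rows; the caller must free the result. */
row3 *make_zero_rows(size_t rows)
{
    return calloc(rows, sizeof(row3));
}

void check_dynamic_arrays(void)
{
    int *a = make_multiples_of_three(5);
    row3 *m = make_zero_rows(5);
    assert(a != NULL && m != NULL);
    assert(a[4] == 12);
    m[4][2] = 7;
    assert(m[4][2] == 7 && m[0][0] == 0);
    free(a);
    free(m);
}
```

Unlike a fixed-length array, memory obtained this way lives on the heap and stays allocated until free is called.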
Consider the following program:
    int main()
    {
        int a[5];
        int i, sum = 0;   /* sum must start at 0 before accumulating */
        for (i = 0; i < 5; i++) {
            a[i] = i * 3;
        }
        for (i = 0; i < 5; i++) {
            sum += a[i];
        }
        return sum;
    }
We compile it with the -O0 -S options to get the assembly code:
    main:
        pushl   %ebp
        movl    %esp, %ebp         # the stack frame is now set up
        subl    $32, %esp          # allocate 32 bytes of stack space
        leal    -20(%ebp), %edx    # %edx = frame pointer minus 20
        movl    $0, %eax           # set %eax to 0; %eax is the register to watch
    .L2:
        movl    %eax, (%edx)       # store %eax at (%edx); first pass: 0 goes to %ebp-20
        addl    $3, %eax           # first pass: %eax becomes 3; for index i, %eax = 3*(i+1)
        addl    $4, %edx           # add 4 to %edx; after the first pass it points to %ebp-16
        cmpl    $15, %eax          # compare %eax with 15
        jne     .L2                # if not equal, jump back to .L2
        movl    -20(%ebp), %eax    # the next five instructions show what the leal was for:
        addl    -16(%ebp), %eax    # offsets -20 through -4 hold the five array elements,
        addl    -12(%ebp), %eax    # which are added up directly
        addl    -8(%ebp), %eax
        addl    -4(%ebp), %eax
        leave
        ret                        # return the sum in %eax
The comments on the instructions are fairly clear, so let's look at how the loop proceeds:
Looking at this diagram, the intent of the program becomes clearer: starting at %ebp minus 20, it assigns values to the array elements one after another. The compiler uses an aggressive optimization here. It noticed the pattern a[i+1] = a[i] + 3, so it replaces the multiplication i*3 with an addition (adding 3 to %eax). It likewise replaces the multiplication in the element address calculation (start address plus index times 4) with an addition (adding 4 to the address). The loop condition i < 5 becomes 3*i < 15, and 3*i is equal to a[i]. Note that when the comparison is made, %eax already holds the value for a[i+1] (the first value stored is 0), so the loop ends when that value reaches 15. This is exactly what the cmpl and jne instructions check.
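The transformation is easier to follow when written back in C. The code below is a hand-written rendering for illustration, not compiler output; the function names are hypothetical, and the comments map each statement to the instruction it mirrors, using the constants 0, 3, 4, and 15 given in the annotations above.

```c
#include <assert.h>

/* The loops as written in the source program. */
int sum_original(void)
{
    int a[5];
    int i, sum = 0;
    for (i = 0; i < 5; i++) {
        a[i] = i * 3;
    }
    for (i = 0; i < 5; i++) {
        sum += a[i];
    }
    return sum;
}

/* A C rendering of what the optimized assembly does. */
int sum_transformed(void)
{
    int a[5];
    int *p = a;            /* plays the role of %edx */
    int val = 0;           /* plays the role of %eax */
    do {
        *p = val;          /* movl %eax, (%edx) */
        val += 3;          /* addl $3, %eax replaces the multiply i*3 */
        p += 1;            /* addl $4, %edx: advance by one 4-byte int */
    } while (val != 15);   /* cmpl $15, %eax ; jne .L2 */
    return a[0] + a[1] + a[2] + a[3] + a[4];
}

/* Both versions must compute the same sum. */
void check_sums(void)
{
    assert(sum_original() == sum_transformed());
}
```

Both versions store 0, 3, 6, 9, 12 and return 30; the transformed version needs no multiplication and no separate loop counter.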
Having covered fixed-length arrays, let's look at variable-length arrays. ISO C99, which GCC supports, allows the dimensions of an array to be expressions that are computed when the array is allocated. For example, consider the following function:
    int var_ele(int n, int a[n][n], int i, int j)
    {
        return a[i][j];
    }
The resulting assembly code is as follows:
As shown, the address of element (i, j) is computed as xa + 4(n*i + j). This calculation is similar to the address calculation for a fixed-length array, with the following differences:
① Because of the added parameter n, the positions of the arguments on the stack are shifted.
② A multiplication instruction is used to compute n*i (line 4), instead of a leal instruction to compute 3i.
So referencing a variable-length array requires only a small change from the fixed-length case: the dynamic version must use a multiplication instruction to scale i by n, rather than a series of shifts and additions. On some processors a multiplication instruction takes many clock cycles, but in this case it is unavoidable.
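The address formula for the variable-length case can be checked against the compiler's own indexing. This is a sketch for illustration; var_ele_manual and check_var_ele are hypothetical names. It assumes a compiler with C99 variable-length array support (such as GCC), since VLAs are optional from C11 onward. The manual version computes the byte offset sizeof(int) * (n*i + j) itself, with the multiplication by n that the text says cannot be avoided.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The function from the text: a has a run-time dimension n. */
int var_ele(int n, int a[n][n], int i, int j)
{
    return a[i][j];
}

/* Hypothetical hand-written equivalent: read the int stored at byte
 * address xa + sizeof(int) * (n*i + j). */
int var_ele_manual(int n, const void *xa, int i, int j)
{
    size_t offset = sizeof(int) * ((size_t)n * (size_t)i + (size_t)j);
    int value;
    memcpy(&value, (const unsigned char *)xa + offset, sizeof value);
    return value;
}

void check_var_ele(void)
{
    enum { N = 4 };
    int a[N][N];
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            a[i][j] = 10 * i + j;
        }
    }
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            assert(var_ele(N, a, i, j) == var_ele_manual(N, a, i, j));
        }
    }
}
```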
Computer Systems: A Programmer's Perspective (3.8): Array Allocation and Access