C Compiler Anatomy: Variadic Functions in the C Language

Source: Internet
Author: User

Variadic functions in the C language

Many places in the UCC compiler use C variadic functions, so this section analyzes how variadic functions are implemented. The printf function in the C standard library is a typical example. Its interface is shown below; the ellipsis ... in the declaration indicates that the function takes a variable number of arguments.

int printf(const char *format, ...);

Let's use a simple example to illustrate how printf is called, as shown in Figure 4.2.12. Lines 1 to 11 of the figure are hello.c, lines 12 to 25 are the abstract syntax tree hello.ast generated by the UCC compiler, lines 26 to 33 are UCC's intermediate code hello.uil, and lines 34 to 52 are part of the assembly code hello.s generated by UCC.


Figure 4.2.12 The printf() function

Notice that the argument a on line 9 has type float and d has type char. The abstract syntax tree node on line 17 converts the argument a to double, and the node on line 21 converts the argument d to int. As introduced in Section 2.4 on the C type system, for an old-style function declaration of the form int f(), the C compiler performs argument promotion as required by the C standard; in the UCC compiler this is done by the function PromoteArgument(). For the unnamed arguments of a variadic function, such as the arguments from a onward that follow the format string in the printf call on line 9, the C compiler performs the same promotion: char and short, which are narrower than int, are promoted to int, and float is promoted to double. The intermediate code on lines 29 to 31 of Figure 4.2.12 reflects this promotion directly.

The corresponding assembly code is on lines 39 to 52. Lines 39 to 40 convert the float a to double and save the result in the temporary variable at -12(%ebp); the assembly instructions for floating-point arithmetic are described in Section 1.5. According to the C calling convention, arguments are pushed onto the stack from right to left. Line 41 converts the argument d from char to int, line 42 pushes the converted result, and line 43 pushes the argument c. Lines 44 to 46 load the double-precision number b and push it, and lines 47 to 49 load the double-precision number from the temporary variable at -12(%ebp) and push it.

Line 50 pushes the address of the format string, which was stored in register EAX on line 38, and line 51 performs the actual function call. All the arguments are on the stack and occupy 4+8+8+4+4 (that is, 28) bytes in total, so when printf returns, line 52 adds 28 to ESP.
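
The promotion rules described above can be observed from the callee's side with the standard <stdarg.h> macros. The following is a minimal sketch, not code from UCC; the function name sum_float_and_char and its count parameter are made up for illustration. It reads a float argument as double and a char argument as int, which are the types those arguments have after promotion.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper, not part of UCC: the caller passes (float, char),
 * but after the default argument promotions the callee must read them
 * as (double, int). */
double sum_float_and_char(int count, ...)
{
    va_list ap;
    double as_double;
    int as_int;

    assert(count == 2);               /* this sketch handles exactly two arguments */
    va_start(ap, count);
    as_double = va_arg(ap, double);   /* a float argument arrives as double */
    as_int = va_arg(ap, int);         /* a char argument arrives as int */
    va_end(ap);
    return as_double + as_int;
}
```

Calling sum_float_and_char(2, 1.5f, (char)2) should yield 3.5, since the callee sees 1.5 as a double and 2 as an int.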

The code of the library function printf() already existed when we wrote hello.c, so printf cannot really know how many arguments we pass when we call it. It simply fetches arguments from the stack as directed by the format string.

// Only one argument, 10, is actually passed, but printf sees two %d
// and still tries to take two arguments from the stack, printing a garbage value such as 10,1074172310.
printf("%d,%d", 10);

// Three arguments, 10, 20 and 30, are actually passed, but printf sees only one %d,
// so it prints only the argument 10.
printf("%d", 10, 20, 30);
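
The two calls above suggest that the format string is printf's only source of information about its arguments. As a rough illustration, the hypothetical helper below counts how many arguments a simple format string asks for. The name count_conversions is invented here, and real printf parsing also handles flags, widths and '*', which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the conversion specifications in a
 * printf-style format string, treating "%%" as a literal percent sign.
 * Deliberately simplified: '*' widths, which consume extra arguments,
 * are not considered. */
int count_conversions(const char *format)
{
    int count = 0;
    const char *p;

    assert(format != NULL);
    for (p = format; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        if (p[1] == '%') {      /* "%%" prints '%' and takes no argument */
            p++;
            continue;
        }
        if (p[1] != '\0')       /* a lone trailing '%' is not counted */
            count++;
    }
    return count;
}
```

For "%d,%d" the helper returns 2 even if the caller passed only one argument, and for "%d" it returns 1 no matter how many were passed.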

Figure 4.2.13 shows this process more clearly. To the left of the dashed line is the stack, and to the right is the global static data area. In the stack, we have marked the types of the arguments actually pushed, in push order: int, int, double, double, and char *, which occupy 28 bytes of stack space in total. The format string itself is stored in the global static data area; only the address of its first character is on the stack.


Figure 4.2.13 Stack

Now let's look at the problem from the other side and suppose that we are the implementers of the printf library function. Inside the body of printf, we can reach the format string in the global static data area through the formal parameter format, and the expression &format gives us the address of format on the stack. From the memory layout in Figure 4.2.13, we can compute the addresses of the other arguments from &format, and with those addresses we can access them. For ease of calculation, assume that &format is decimal 10000. The addresses of the four arguments above are then 10004, 10012, 10020 and 10024 in decimal. The calculation is as follows:

int printf(const char *format, ...) {
    unsigned int addr = (unsigned int)&format;
    // The argument for %f in "a = %f" has type double; its address is
    //     addr + sizeof(char *), i.e. 10004
    // The argument for %f in "b = %f" has type double; its address is
    //     addr + sizeof(char *) + sizeof(double), i.e. 10012
    // The argument for %d in "c = %d" has type int; its address is
    //     addr + sizeof(char *) + sizeof(double) + sizeof(double), i.e. 10020
    // The argument for %c in "d = %c" has type char (promoted to int on the stack); its address is
    //     addr + sizeof(char *) + sizeof(double) + sizeof(double) + sizeof(int), i.e. 10024
}
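
The address arithmetic above can be sketched as ordinary C. The sizes are hard-coded to the 32-bit values the text assumes (4-byte pointers and ints, 8-byte doubles), and unnamed_arg_address is a hypothetical helper introduced only for this illustration.

```c
#include <assert.h>

/* Sizes assumed for the 32-bit x86 target discussed in the text; they are
 * hard-coded so the result does not depend on the machine running this. */
enum { SIZE_CHAR_PTR = 4, SIZE_DOUBLE = 8, SIZE_INT = 4 };

/* Hypothetical helper: given the address of format, returns the address of
 * unnamed argument number index (0 = a, 1 = b, 2 = c, 3 = d). */
unsigned int unnamed_arg_address(unsigned int format_addr, int index)
{
    /* Stack sizes of a, b, c, d after promotion: double, double, int, int. */
    static const unsigned int sizes[4] = {
        SIZE_DOUBLE, SIZE_DOUBLE, SIZE_INT, SIZE_INT
    };
    unsigned int addr = format_addr + SIZE_CHAR_PTR;  /* skip format itself */
    int i;

    assert(index >= 0 && index < 4);
    for (i = 0; i < index; i++)
        addr += sizes[i];                             /* skip earlier arguments */
    return addr;
}
```

With format_addr set to 10000, indexes 0 to 3 give 10004, 10012, 10020 and 10024, matching the values in the text.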

Once we know the address of a memory cell and its type, accessing its contents is easy. For example, to access the double at address 10004 above, we only need the following C code. Through the expression *dptr we can then read or write the value as we please.

double *dptr = (double *)(addr + sizeof(char *));
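
On many current platforms arguments are not all passed on the stack, so walking memory from &format is not something to rely on in real code. The idea that an address plus a type is enough to access a value can still be tried safely on a byte buffer that stands in for the stack. This is a sketch under that assumption; read_double_at and demo_read_after_format are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies a double out of a byte buffer that stands in
 * for the stack. memcpy is used instead of *(double *)p so that the sketch
 * does not depend on how the buffer is aligned. */
double read_double_at(const unsigned char *area, size_t offset)
{
    double value;

    assert(area != NULL);
    memcpy(&value, area + offset, sizeof value);
    return value;
}

/* Builds a miniature argument area { pointer slot, double slot } and reads
 * the double back from just past the pointer slot. */
double demo_read_after_format(double a)
{
    unsigned char area[sizeof(char *) + sizeof(double)];
    const char *format = "a = %f";

    memcpy(area, &format, sizeof format);          /* slot holding format */
    memcpy(area + sizeof format, &a, sizeof a);    /* slot holding a */
    return read_double_at(area, sizeof(char *));   /* addr + sizeof(char *) */
}
```

The value returned by demo_read_after_format is the same double that was stored, which mirrors what *dptr yields in the text.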

In this way we can write the variadic function OurPrintfV1 shown in Figure 4.2.14, which takes the other "unnamed" arguments located above format off the stack. Its only purpose is to demonstrate how to reach the unnamed arguments from the address of the formal parameter format, following the layout in Figure 4.2.13. Note that we deliberately skip the parsing of the format string, although that is only simple string matching and not complicated. The printf call added to the function body is there purely to verify that we really have fetched the arguments from the stack correctly.


Figure 4.2.14 OurPrintfV1()

To make the code in Figure 4.2.14 more elegant, we introduce some macros that handle "locating the other arguments from the address of format". OurPrintfV2() in Figure 4.2.15 does the same job as OurPrintfV1() but looks much simpler.


Figure 4.2.15 OurPrintfV2()

Comparing Figure 4.2.14 with Figure 4.2.15, we can see that the macro va_start() takes the address of the formal parameter format and computes &format + sizeof(format). As described earlier, the C compiler promotes the unnamed arguments of a variadic function, so the space each argument occupies on the stack is an integer multiple of sizeof(int). The macro ALIGN_INT(n) on line 9 of Figure 4.2.15 performs this alignment. Assuming that sizeof(char) is 1, sizeof(short) is 2 and sizeof(int) is 4, then:

ALIGN_INT(char) expands to (1+3) & (~3), i.e. 4

ALIGN_INT(short) expands to (2+3) & (~3), i.e. 4
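
The rounding can be tried out directly. The sketch below assumes, as the text does, that the slot size is a power of two; ROUND_UP_TO_INT and round_up are names chosen for this example and are not claimed to match any particular header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro: rounds sizeof(t) up to a multiple of sizeof(int).
 * The mask trick relies on sizeof(int) being a power of two. */
#define ROUND_UP_TO_INT(t) \
    ((sizeof(t) + sizeof(int) - 1) & ~(sizeof(int) - 1))

/* The same rounding for a plain byte count, with the slot size passed in
 * so that the 4-byte case from the text can be checked on any machine. */
size_t round_up(size_t size, size_t slot)
{
    assert(slot != 0 && (slot & (slot - 1)) == 0);  /* power of two only */
    return (size + slot - 1) & ~(slot - 1);
}
```

With a 4-byte slot, sizes 1 and 2 both round up to 4, while a size of 8 is left unchanged.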

We know that multiplying an integer by 4 is equivalent to shifting it left by 2 bits, so the lowest 2 bits of any multiple of 4 are 0. The additions (1+3) and (2+3) above produce a number that is not less than 4, and the & operation then clears the low 2 bits of that number, which yields the aligned result we need. If va_arg did no alignment, it would be defined as:

#define va_arg(list, t) (*(t *)((list += sizeof(t)) - sizeof(t)))

Assume that the variable list holds decimal 20000. The macro va_arg(list, char) then expands to the expression below, which is equivalent to *((char *)20000), that is, it reads the contents of memory cell 20000. The expression has a side effect, however: after it is evaluated, list becomes 20001. Arguments are aligned to sizeof(int) when they are pushed, so if we go on to fetch the next argument from address 20001, which is not a multiple of 4, we will not read the argument we want.

(*(char *)((list += 1) - 1))
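
The effect on list can be traced with plain integers. In this sketch, next_unaligned and next_aligned are hypothetical helpers that return the value list would have after one fetch; the aligned version assumes sizeof(int) is 4, as in the text.

```c
#include <assert.h>

/* Hypothetical helper: value of list after fetching one argument of the
 * given size with the unaligned step list += size. */
unsigned int next_unaligned(unsigned int list, unsigned int size)
{
    assert(size > 0);
    return list + size;
}

/* Same step, but with size first rounded up to a multiple of 4, the
 * sizeof(int) assumed for the 32-bit target in the text. */
unsigned int next_aligned(unsigned int list, unsigned int size)
{
    assert(size > 0);
    return list + ((size + 3u) & ~3u);
}
```

Starting from 20000 and fetching a 1-byte char, the unaligned step lands on 20001, whereas the aligned step lands on 20004, where the next pushed argument actually begins.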

Whether va_arg performs the same alignment as line 9 of Figure 4.2.15 depends on the header file of the compiler actually in use. The safest and most portable approach is never to write va_arg(list, char), va_arg(list, short) or va_arg(list, float). Because the C compiler has already promoted the unnamed arguments of a variadic function, an unnamed argument that really exists on the stack is never a char, a short or a float, so line 20 of Figure 4.2.15 uses va_arg(ap, int) rather than va_arg(ap, char). Of course, with the ALIGN_INT alignment on line 9 in place, va_arg(ap, char) would also happen to work. va_arg(ap, float), however, would still be wrong, because sizeof(float) is 4 whereas the double that actually exists on the stack occupies 8 bytes.

C is concise and powerful, but to master it you need a clear understanding of memory layouts like the one in Figure 4.2.12. In many cases, C programmers who struggle with pointers do so because they lack a clear picture of the memory layout involved. C++ makes a similar demand on programmers. Even in Java, which deliberately plays down the concept of pointers, a programmer with no notion of memory layout may write code like that shown below. The programmer expects each of the two calls to bg.f() to print Hello 5 times, 10 times in total, but is surprised to find that Hello is printed only 5 times. The reason is that i is a field stored in the object rather than a local variable, so it is still 5 when f() is called the second time.

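class Bug {
    int i = 0;    // instance field: keeps its value between calls to f()

    public void f() {
        for (; i < 5; i++) {
            System.out.println("Hello");
        }
    }

    public static void main(String args[]) {
        Bug bg = new Bug();
        bg.f();
        bg.f();    // i is already 5 here, so the loop body never runs
    }
}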

Of course, when actually using macros such as va_arg, we only need to include the standard header file stdarg.h; there is no need to define the macros on lines 8 to 12 of Figure 4.2.15 ourselves. Those macros come from the UCC compiler's header file ucl\linux\include\stdarg.h. In some cases, a variadic function like OurPrintfV2 in Figure 4.2.15 only does some preparatory work and then hands the real processing over to another function. The Do_error function in ucl\error.c, shown in Figure 4.2.16, is an example.


Figure 4.2.16 Do_error()

On line 7 of Figure 4.2.16, we add 1 to the global variable ErrorCount, which records the number of errors. Line 10 prints the name of the source file containing the error and the line number of the error. The actual error message, however, is still left to the library function vfprintf(). With Figure 4.2.13 in mind, it is easy to see that vfprintf() can fetch the unnamed arguments passed to Do_error from Do_error's stack frame, as long as it knows the address of the format string and the address of Do_error's formal parameter format. The access method is the same as in Figure 4.2.14 and Figure 4.2.15. The only difference is that vfprintf does not need to obtain the address of format through va_start() itself: the call to vfprintf() on line 14 of Figure 4.2.16 has already passed in the position derived from that address through the parameter ap2 shown below.

int vfprintf(FILE *stream, const char *format2, va_list ap2);

Figure 4.2.17 shows the memory layout when we make the function call below. We can see that vfprintf's formal parameter ap2 already points to the start of the unnamed arguments in Do_error's stack frame, and that Do_error's format and vfprintf's format2 point to the same format string. Once vfprintf has the address of the format string and the address of the first unnamed argument, it has everything it needs: the seven Dragon Balls are gathered and the dragon can be summoned.

Do_error(coord, "struct member %s doesn't exist", "abc");


Figure 4.2.17 The stack of Do_error
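
The same hand-off pattern can be sketched with standard library calls only. The version below is an illustration under stated assumptions rather than UCC's code: report_error, error_count and last_message are invented names, and it formats into a buffer with vsnprintf (the buffer-writing relative of vfprintf) so that the result is easy to inspect.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

int error_count;          /* hypothetical counter, standing in for the global in the text */
char last_message[256];   /* hypothetical buffer, used instead of writing to stderr */

/* Hypothetical wrapper: does the preparatory work itself (counting, file and
 * line prefix), then hands format and the unnamed arguments to vsnprintf.
 * Returns the number of characters stored in last_message. */
int report_error(const char *file, int line, const char *format, ...)
{
    va_list ap;
    int used;

    assert(file != NULL && format != NULL);
    error_count++;
    used = snprintf(last_message, sizeof last_message, "%s:%d: error: ", file, line);
    if (used < 0 || (size_t)used >= sizeof last_message) {
        last_message[0] = '\0';   /* simplified: give up if the prefix does not fit */
        return 0;
    }
    va_start(ap, format);         /* ap now designates the first unnamed argument */
    if (vsnprintf(last_message + used, sizeof last_message - (size_t)used, format, ap) < 0)
        last_message[used] = '\0';
    va_end(ap);
    return (int)strlen(last_message);
}
```

A call such as report_error("demo.c", 3, "struct member %s doesn't exist", "abc") leaves the text demo.c:3: error: struct member abc doesn't exist in last_message. The wrapper never inspects the unnamed arguments itself; it only passes ap along, as Do_error does with vfprintf.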

