Variable-length parameter lists are rarely used when designing your own interfaces in C. Yet printf, one of the most frequently used functions, is exactly such an interface. Having enjoyed the power of printf, have you ever wanted to find out how it is implemented? Here we explore how variable-length parameters work in C.
Consider the following question: if we do not use the facilities provided by the C standard library (libc), can we still implement a function that takes a variable number of parameters? Let's try.
Step by step: first, look at a function with a fixed parameter list:
#include <stdio.h>

void fixed_args_func(int a, double b, char *c)
{
    printf("a = 0x%p\n", (void *)&a);
    printf("b = 0x%p\n", (void *)&b);
    printf("c = 0x%p\n", (void *)&c);
}
For a function with a fixed parameter list, the name and type of each parameter are directly visible, and their addresses can be obtained directly. For example, &a gives the address of a, and the function prototype tells us that a is an int; &b gives the address of b, and the prototype tells us that b is a double; &c gives the address of c, and the prototype tells us that c is a char *.
For a function with variable-length parameters, things are not so smooth. Fortunately, according to the C standard, a function that supports variable-length parameters must have at least one fixed parameter at the leftmost position of its prototype (this differs from traditional C, which allowed functions with only variable-length parameters and no fixed parameter). We can therefore get the address of the fixed parameter, but we still cannot get the addresses of the variable-length parameters from the declaration, for example:
void var_args_func(const char *fmt, ...)
{
    ......
}
Here we can only get the address of the fixed parameter fmt. From the function prototype alone we cannot tell how many parameters "..." stands for or what their types are, so naturally we cannot determine their locations either. What can we do? Recall how parameters are passed to a function: no matter how many parameters "..." contains and what type each one is, they are passed in the same way as fixed parameters. Simply put, they are all pushed onto the stack, and the stack is open to us. So once we know the location of a fixed parameter in the function's stack frame, it is entirely possible to derive the locations of the variable-length parameters. Let's follow this idea with an example. (Note that how parameters are pushed and how addresses are allocated to them are implementation-defined and may differ across platforms and compilers, so the example below applies only to IA-32, Windows XP, MinGW GCC v3.4.2.)
Let's use fixed_args_func above to determine the order in which parameters are pushed onto the stack.
int main()
{
    fixed_args_func(17, 5.40, "Hello World");
    return 0;
}
a = 0x0022ff50
b = 0x0022ff54
c = 0x0022ff5c
This result makes it clear that the parameters are pushed onto the stack one by one from right to left (the stack grows from high addresses to low addresses and the bottom of the stack is at the highest address, so the parameter pushed first has the highest address).
We can draw the following conclusion:
c.addr = b.addr + x_sizeof(b); /* Note: x_sizeof != sizeof; more on this later */
b.addr = a.addr + x_sizeof(a);
With the "equations" above, it seems we can derive the positions of the variable parameters in void var_args_func(const char *fmt, ...). At least the position of the first variable parameter should be first_vararg.addr = fmt.addr + x_sizeof(fmt). Based on this conclusion, let's try to implement a function that supports variable parameters:
#include <stdio.h>

void var_args_func(const char *fmt, ...)
{
    char *ap;

    /* address just past the fixed parameter: the first variable parameter */
    ap = ((char *)&fmt) + sizeof(fmt);
    printf("%d\n", *(int *)ap);

    ap = ap + sizeof(int);
    printf("%d\n", *(int *)ap);

    ap = ap + sizeof(int);
    printf("%s\n", *((char **)ap));
}

int main()
{
    var_args_func("%d %d %s\n", 4, 5, "Hello World");
    return 0;
}
Output result:
4
5
Hello World
var_args_func is for demonstration only: it does not determine the number and types of the variable parameters from the format string in fmt, but hard-codes them in the implementation. If you run this program under Solaris 9, it will not produce the correct result. Why? Let's explain the program first. We use ap to hold the address of the first variable parameter. We know that the first variable parameter is 4, an int, so we write (int *)ap to tell the compiler that the memory starting at ap should be treated as an integer; *(int *)ap yields the parameter's value. The next variable parameter is 5, another int, whose address is ap + sizeof(first variable parameter), that is, ap + sizeof(int). Likewise, *(int *)ap yields its value. The last parameter is a string, that is, a char *. Unlike with the two int parameters, after ap + sizeof(int), ap points to the first address of a char * memory block on the stack (call it tmp_ptr, declared as char *tmp_ptr), that is, ap -> &tmp_ptr. What we want to output is printf("%s\n", tmp_ptr), not printf("%s\n", ap): the latter would print the memory that ap points to as if it were a string, but ap -> &tmp_ptr, and the four bytes occupied by tmp_ptr hold an address, not a string. How do we get at tmp_ptr? &tmp_ptr has type char **, so we cast ap to that type: (char **)ap <=> &tmp_ptr. Then we only need to put a * in front of (char **)ap to access tmp_ptr, which gives printf("%s\n", *(char **)ap);
As mentioned above, var_args_func will not produce the correct result on Solaris. Why? Because of memory alignment. When the compiler pushes a parameter onto the stack, it does not necessarily place it immediately next to the previous one. The compiler places each parameter at an address that satisfies the alignment of its type, so there may be gaps between parameters on the stack. In the example above, I obtained the spacing between parameters from the disassembled code; it happened to be 4, and the code was hard-coded accordingly.
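The x_sizeof used in the equations earlier can be pictured as sizeof rounded up to the size of one stack slot. The sketch below is only an illustration under an assumption: the slot size is fixed at 4 bytes, matching the IA-32 results shown above. X_SIZEOF, SLOT_SIZE and next_param_offset are hypothetical names, not something a compiler or libc provides.

```c
#include <stddef.h>

/* Assumption: every parameter occupies a whole number of 4-byte stack
   slots, as observed on the IA-32 platform discussed in this article. */
#define SLOT_SIZE ((size_t)4)

/* Hypothetical helper: round a size up to the next multiple of SLOT_SIZE.
   This works because SLOT_SIZE is a power of two. */
#define X_SIZEOF(n) (((n) + SLOT_SIZE - 1) & ~(SLOT_SIZE - 1))

/* Offset of the next parameter, given the offset and size of the current one. */
static size_t next_param_offset(size_t offset, size_t size)
{
    return offset + X_SIZEOF(size);
}
```

With an int of size 4 at offset 0 followed by a double of size 8, next_param_offset yields offsets 4 and 12, which matches the spacing 0x50, 0x54, 0x5c observed for a, b and c. On a platform with different slot sizes or alignment rules the numbers would differ.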
To ensure portability, the C standard library provides facilities for variable-length parameters in stdarg.h. Here is a simple example of how the standard library supports variable-length parameters:
#include <stdarg.h>
#include <stdio.h>

void std_vararg_func(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);

    printf("%d\n", va_arg(ap, int));
    printf("%f\n", va_arg(ap, double));
    printf("%s\n", va_arg(ap, char *));

    va_end(ap);
}

int main() {
    std_vararg_func("%d %f %s\n", 4, 5.4, "Hello World");
    return 0;
}
Output:
4
5.400000
Hello World
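The example above hard-codes the number and types of its variable parameters. A common way to let the caller decide the number is to pass a count as the fixed parameter. The function below is a minimal sketch of that convention; sum_ints is a hypothetical name, and it assumes every variable parameter is an int.

```c
#include <stdarg.h>

/* Hypothetical example: add up `count` int parameters.
   Assumes the caller really passes `count` values of type int. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    va_start(ap, count);          /* count is the last fixed parameter */
    for (i = 0; i < count; i++) {
        total += va_arg(ap, int); /* fetch the next int */
    }
    va_end(ap);

    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) adds the three values after the count. If the count does not match the number of values actually passed, the behavior is undefined, because the macros have no way of detecting the mismatch.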
Compare the implementations of std_vararg_func and var_args_func: va_list looks like char *, va_start looks like ((char *)&fmt) + sizeof(fmt), and va_arg looks like it obtains the first address of the next parameter. That's right. On most platforms, the implementations of va_list, va_start, and va_arg in stdarg.h are similar to this. In general, stdarg.h contains many macros and looks complicated. On some systems, the implementation of stdarg.h relies on special functions built into the compiler to handle variable argument lists and stack allocation; most other implementations are very similar to the following (the Visual C++ 6.0 implementation is the clearest, because Windows applications only need to be ported between Windows platforms, so it does not have to account for many platforms).
Using va_list and _vsnprintf in C
Here is an example:
#include <stdarg.h>
#include <stdio.h>

#define bufsize 80
char buffer[bufsize];

/* This function formats a string with parameters */
int vspf(char *fmt, ...)
{
    va_list argptr;  // declare the argument-list variable
    int cnt;

    va_start(argptr, fmt);  // initialize the variable
    cnt = vsnprintf(buffer, bufsize, fmt, argptr);  // format the string with its parameters into buffer according to the parameter list
    va_end(argptr);  // end the variable list; always used in a pair with va_start
    return (cnt);
}

int main(int argc, char *argv[])
{
    int inumber = 30;
    float fnumber = 90.0;
    char string[4] = "ABC";

    vspf("%d %f %s", inumber, fnumber, string);
    printf("%s\n", buffer);
    return 0;
}

Next we will discuss how to write a simple C function with variable parameters. Such a function uses the va_start, va_arg, and va_end macros.
To use variable parameters, follow these steps:
1) First define a variable of type va_list in the function, here arg_ptr; it is a pointer to the parameters.
2) Then initialize arg_ptr with the va_start macro. The second argument of this macro is the parameter immediately before the first variable parameter, which is a fixed parameter.
3) Then fetch a variable parameter with va_arg and assign it to the integer j. The second argument of va_arg is the type of the parameter to fetch, here int.
4) Finally, end the retrieval of variable parameters with the va_end macro. After that, you can use the second parameter in the function. If the function has several variable parameters, call va_arg once for each to fetch them in turn.
If we call the function in the following three ways, all are legal, but the results differ:

Processing of variable parameters in the compiler
We know that va_start, va_arg, and va_end are defined as macros in stdarg.h. Because 1) hardware platforms differ and 2) compilers differ, the definitions vary. In Microsoft Visual Studio\VC98\Include\stdarg.h:

typedef char *va_list;
/* va_list is defined as char * because on our current PCs a character pointer can hold the address of a memory unit. On some machines, va_list is defined as void * */

#define _INTSIZEOF(n) ((sizeof(n) + sizeof(int) - 1) & ~(sizeof(int) - 1))
/* The _INTSIZEOF(n) macro accounts for systems whose memory addresses must be aligned. As the macro name suggests, it aligns to sizeof(int). Typically sizeof(int) = 4, so the address of each parameter in memory is a multiple of 4. For example, if sizeof(n) is between 1 and 4, _INTSIZEOF(n) = 4; if sizeof(n) is between 5 and 8, _INTSIZEOF(n) = 8. */

#define va_start(ap, v) (ap = (va_list)&v + _INTSIZEOF(v))
/* va_start is defined as &v + _INTSIZEOF(v), where &v is the starting address of the last fixed parameter; adding the size it actually occupies gives the starting address of the first variable parameter. So after va_start(ap, v) runs, ap points to the memory address of the first variable parameter */

#define va_arg(ap, t) (*(t *)((ap += _INTSIZEOF(t)) - _INTSIZEOF(t)))
/* This macro does two things:
   (1) it casts the parameter address using the type name supplied by the user, to obtain the value the user wants;
   (2) it computes the actual size of this parameter and advances the pointer to the end of it, that is, to the first address of the next parameter, for later processing. */

#define va_end(ap) (ap = (va_list)0)
/* On the x86 platform this is defined as ap = (char *)0, so that ap no longer points to the stack but equals NULL. Some implementations define it directly as ((void *)0), so that the compiler generates no code for va_end; for example, GCC on Linux x86 defines it this way. One thing to note: because va_start takes the address of the parameter, that parameter cannot be declared as a register variable, nor as a function or array type. */

There are two areas to explore:
1. #define _INTSIZEOF(n) ((sizeof(n) + sizeof(int) - 1) & ~(sizeof(int) - 1))
Let's simplify this macro:
#define _INTSIZEOF(n) ((sizeof(n) + x) & ~(x))
x = sizeof(int) - 1 = 3 = 0000 0000 0000 0011 (b)
~x = 1111 1111 1111 1100 (b)
ANDing a number with ~x always yields a multiple of sizeof(int); in other words, _INTSIZEOF(n) rounds sizeof(n) up to a multiple of sizeof(int). Since sizeof(n) >= 1, sizeof(n) + sizeof(int) - 1 is rounded to an integer >= 4. On some platforms the rounding unit is 4 and on others it is 8, depending on the specific system.
2. #define va_arg(ap, t) (*(t *)((ap += _INTSIZEOF(t)) - _INTSIZEOF(t)))
With the implementation of var_args_func in mind, this is not hard to understand. There is one trick, though: many people are puzzled by first adding _INTSIZEOF(t) and then subtracting _INTSIZEOF(t). It is actually straightforward: the value of the whole expression ((ap += _INTSIZEOF(t)) - _INTSIZEOF(t)) is the address that ap originally pointed to; the key is that after the expression is evaluated, ap points to the address of the next parameter. It is as simple as that.
C function parameters are pushed onto the stack from right to left. Figure (1) shows how function parameters are laid out on the stack. We have seen that va_list is defined as char * (some platforms or operating systems define it as void *). Now look at va_start, which is defined as &v + _INTSIZEOF(v), where &v is the address of the fixed parameter on the stack. So after va_start(ap, v) runs, ap points to the address of the first variable parameter on the stack:

High address |-------------------------------------------|
             | Function return address                   |
             |-------------------------------------------|
             | .......                                   |
             |-------------------------------------------|
             | Nth parameter (first variable parameter)  |
             |-------------------------------------------| <-- ap after va_start
             | (N-1)th parameter (last fixed parameter)  |
Low address  |-------------------------------------------| <-- &v
Figure (1)
Then we use va_arg() to fetch a variable parameter of type t; in the example above it is an int. Let's look at how va_arg returns an int value:
j = (*(int *)((ap += _INTSIZEOF(int)) - _INTSIZEOF(int)));
First, ap += sizeof(int) makes ap point to the address of the next parameter. Then the expression yields ap - sizeof(int) as an int * pointer, which is exactly the address of the first variable parameter on the stack (Figure 2). Finally, * fetches the content at that address (the parameter value) and assigns it to j.

High address |-------------------------------------------|
             | Function return address                   |
             |-------------------------------------------|
             | .......                                   |
             |-------------------------------------------| <-- ap after va_arg
             | Nth parameter (first variable parameter)  |
             |-------------------------------------------| <-- ap after va_start
             | (N-1)th parameter (last fixed parameter)  |
Low address  |-------------------------------------------| <-- &v
Figure (2)
The last thing to discuss is the va_end macro. On the x86 platform it is defined as ap = (char *)0, so that ap no longer points to the stack but equals NULL. Some implementations define it directly as ((void *)0), so that the compiler generates no code for va_end; for example, GCC on Linux x86 defines it this way. One thing to note: because va_start takes the address of the parameter, that parameter cannot be declared as a register variable, nor as a function or array type. That concludes the description of va_start, va_arg, and va_end. Keep in mind that different operating systems and hardware platforms define them differently, but the principles are similar.
Notes on using variable parameters in programming
Because va_start, va_arg, and va_end are defined as macros, they are rather dumb: the types and number of the variable parameters are entirely controlled by the program code in the function, and the macros cannot intelligently identify the number and types of the parameters.
Someone may ask: doesn't printf recognize its parameters intelligently? That is because printf works out the parameter types from its fixed format string and then calls va_arg to fetch the variable parameters. In other words, if you want intelligent identification of variable parameters, you must make the decisions in your own function.
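The idea that printf reads the types from its format string can be sketched with a much smaller function. In this illustration, sum_by_types and its one-letter type codes ('i' for int, 'd' for double) are hypothetical; they are not part of any library.

```c
#include <stdarg.h>

/* Hypothetical example: `types` holds one letter per variable parameter,
   'i' for int and 'd' for double. Returns the sum as a double. */
double sum_by_types(const char *types, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, types);
    for (; *types != '\0'; types++) {
        switch (*types) {
        case 'i':
            total += va_arg(ap, int);     /* caller passed an int */
            break;
        case 'd':
            total += va_arg(ap, double);  /* caller passed a double */
            break;
        default:
            /* Unknown code: the parameter's type is unknown,
               so stop instead of guessing. */
            va_end(ap);
            return total;
        }
    }
    va_end(ap);

    return total;
}
```

For example, sum_by_types("id", 2, 1.5) reads one int and one double. The function trusts the type string completely: if it says 'i' where a double was passed, the behavior is undefined.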
Another problem is that the compiler does not check the prototype of a variable-parameter function strictly, which makes programming errors harder to catch. Suppose simple_va_fun() is changed to:
void simple_va_fun(int i, ...)
{
    va_list arg_ptr;
    char *s = NULL;

    va_start(arg_ptr, i);
    s = va_arg(arg_ptr, char *);
    va_end(arg_ptr);
    printf("%d %s\n", i, s);
    return;
}
The variable parameter is of type char *. If we forget to pass two parameters when calling this function, a core dump (Unix) or an invalid page fault (Windows) occurs. It may also happen that no error appears at all, yet the bug is hard to find, which does not help in writing high-quality programs.
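One defensive convention, sketched here under stated assumptions, is to end the parameter list with a sentinel so the function knows where to stop. total_length is a hypothetical name; it assumes every variable parameter is a const char * and that the caller ends the list with a null pointer. The compiler still cannot check that the caller follows the convention.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example: sum the lengths of a list of strings.
   The list must be terminated by a null char pointer. */
size_t total_length(const char *first, ...)
{
    va_list ap;
    size_t total = 0;
    const char *s;

    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, const char *)) {
        total += strlen(s);
    }
    va_end(ap);

    return total;
}
```

When calling it, pass the terminator as (const char *)NULL rather than a bare NULL or 0, because on some platforms NULL is an integer constant whose size differs from that of a pointer.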
The following describes the compatibility of the va macros. System V Unix defines va_start as a macro with only one parameter:
va_start(va_list arg_ptr);
while ANSI C defines it as:
va_start(va_list arg_ptr, prev_param);
If we want to use the System V definition, we should use the macros defined in the varargs.h header file. The ANSI C macros are not compatible with the System V macros. We generally use ANSI C, so the ANSI C definition is enough, and it also makes programs easier to port.
Summary:
The principle behind variable-parameter functions is actually very simple. The va family is defined as macros, and the implementation is tied to the stack. Writing a C function with variable parameters has both advantages and disadvantages, so avoid variable parameters unless they are necessary. In C++, we should use C++ polymorphism to implement the functionality of variable parameters and avoid the C-style approach as much as possible.
Printf Research
The following is a simple implementation of the printf function:
#include <stdio.h>
#include <stdlib.h>

void myprintf(char *fmt, ...) // a simple implementation similar to printf; all parameters must be of type int
{
    char *parg = NULL; // plays the role of va_list
    char c;

    parg = (char *)&fmt; // do not write parg = fmt!! We need the address of the parameter, not its value
    parg += sizeof(fmt); // plays the role of va_start

    do
    {
        c = *fmt;
        if (c != '%')
        {
            putchar(c); // output the character as is
        }
        else
        {
            // output data according to the format character
            switch (*++fmt)
            {
            case 'd':
                printf("%d", *((int *)parg));
                break;
            case 'x':
                printf("%#x", *((int *)parg));
                break;
            default:
                break;
            }
            parg += sizeof(int); // plays the role of va_arg
        }
        ++fmt;
    } while (*fmt != '\0');
    parg = NULL; // plays the role of va_end
    return;
}

int main(int argc, char *argv[])
{
    int i = 1234;
    int j = 5678;

    myprintf("the first test: i = %d", i, j);
    myprintf("the secend test: i = %d; %x; j = %d;", i, 0xabcd, j);
    system("pause");
    return 0;
}
The execution results on Intel + Win2k + VC6 are as follows:
the first test: i = 1234
the secend test: i = 1234; 0xabcd; j = 5678;
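For comparison, the same idea can be written with stdarg.h so that it does not depend on the stack layout. This is a sketch, not the implementation shown above: my_format is a hypothetical name, it writes into a caller-supplied buffer instead of the console, and like myprintf it assumes that every variable parameter is an int and only understands %d and %x.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical portable counterpart of myprintf.
   Writes at most cap - 1 characters plus a terminating '\0' into out.
   Returns the number of characters stored (excluding the '\0'). */
size_t my_format(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t len = 0;

    if (cap == 0) {
        return 0;
    }

    va_start(ap, fmt);
    while (*fmt != '\0' && len + 1 < cap) {
        if (*fmt != '%') {
            out[len++] = *fmt++;          /* copy the character as is */
            continue;
        }
        fmt++;                            /* skip '%' */
        if (*fmt == 'd' || *fmt == 'x') {
            int value = va_arg(ap, int);  /* every parameter is an int */
            int n;

            if (*fmt == 'd') {
                n = snprintf(out + len, cap - len, "%d", value);
            } else {
                n = snprintf(out + len, cap - len, "%#x", (unsigned int)value);
            }
            if (n < 0) {
                break;
            }
            /* snprintf reports the untruncated length; clamp it */
            len = ((size_t)n < cap - len) ? len + (size_t)n : cap - 1;
            fmt++;
        } else if (*fmt != '\0') {
            fmt++;                        /* unknown conversion: skip it */
        }
    }
    va_end(ap);

    out[len] = '\0';
    return len;
}
```

Unlike myprintf, this version consumes a parameter only for the conversions it recognizes, and it lets va_arg work out where the next parameter lives, so it does not rely on a 4-byte spacing between parameters.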