In-depth analysis of printf function implementation

Source: Internet
Author: User




Let's study how printf is implemented. First, here is the body of the printf function:

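int printf(const char *fmt, ...)
{
    int i;
    char buf[256];

    /* 32-bit x86, arguments on the stack: the first variadic
       argument sits 4 bytes above fmt */
    va_list arg = (va_list)((char *)(&fmt) + 4);

    i = vsprintf(buf, fmt, arg);   /* format into buf, get its length */
    write(buf, i);                 /* output the i formatted characters */

    return i;
}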

Code location: D:/~/Funny/kernel/printf.c


Notice the "..." token in the parameter list.

This is C's syntax for a variadic function, one that takes a variable number of arguments.

It is used when the number of arguments passed is not known in advance.

Obviously, the function body then needs some way to find out how many arguments were passed in a particular call.
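printf gets this information from its format string. For comparison, here is a minimal sketch of the simplest alternative, where the caller passes the count explicitly. It uses the portable <stdarg.h> macros rather than raw pointer arithmetic, and sum_ints is a hypothetical name used only for illustration:

```c
#include <stdarg.h>

/* Hypothetical example: sum `count` int arguments. The caller must supply
 * the count, because nothing else tells the callee how many follow. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);            /* ap now refers to the first variadic argument */
    for (int k = 0; k < count; k++)
        total += va_arg(ap, int);   /* fetch the next argument as an int */
    va_end(ap);
    return total;
}
```

For example, sum_ints(3, 10, 20, 30) returns 60. If the count is wrong, the function reads garbage, which is the same weakness printf has when the format string and the arguments disagree.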


Let's go through the body of printf, starting with this line:

va_list arg = (va_list)((char *)(&fmt) + 4);


va_list is defined as:

typedef char *va_list;

In other words, it is just a character pointer.

((char *)(&fmt) + 4) is the address of the first argument in the "..." part, that is, the first argument after fmt.

If that isn't clear, let me explain step by step:

In C (with the cdecl calling convention used here on 32-bit x86), arguments are pushed onto the stack from right to left.

That is, when printf is called, its rightmost argument is pushed first.

fmt is a pointer to the first character of the format string passed as the first parameter (const char *fmt).

fmt is itself a variable too: it is allocated on the stack, so it has an address of its own.

When a char * argument is passed, what gets pushed onto the stack is the pointer value, not the characters it points to.

In other words, if you take sizeof(p) for a pointer p (say p = &i, where i can be a variable of any type), the result is always the same fixed value: 4 on my machine, a 32-bit system.

I should also add that on x86 the stack grows from high addresses toward low addresses.

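OK!

Now it should be clear why ((char *)(&fmt) + 4) is the address of the first argument in the "..." part. The arguments are pushed right to left onto a stack that grows downward, so fmt, pushed last, ends up at the lowest address, and the argument pushed just before it sits directly above it. Since fmt is a pointer occupying 4 bytes, that next argument starts at &fmt + 4.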
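This layout can be modeled portably. The sketch below assumes a 32-bit cdecl frame with 4-byte argument slots and simulates it in a byte array, because a modern 64-bit compiler passes arguments differently and taking &fmt + 4 there would not work. Frame32, make_frame, and nth_variadic_arg are hypothetical names:

```c
#include <stdint.h>
#include <string.h>

/* Simulated 32-bit x86 stack frame for a call like printf(fmt, a, b).
 * Arguments are pushed right to left and the stack grows downward, so
 * fmt ends up at the LOWEST address and each later argument sits 4 bytes
 * above the previous one. */
typedef struct {
    unsigned char bytes[12];    /* [0..3] fmt, [4..7] a, [8..11] b */
} Frame32;

/* Lay out the slots the way a cdecl caller would. fmt is stood in for by
 * an arbitrary 32-bit value, since real pointers may be 64-bit here. */
Frame32 make_frame(uint32_t fmt_value, int32_t a, int32_t b)
{
    Frame32 f;
    memcpy(f.bytes + 8, &b, 4);          /* pushed first: highest address */
    memcpy(f.bytes + 4, &a, 4);
    memcpy(f.bytes + 0, &fmt_value, 4);  /* pushed last: lowest address */
    return f;
}

/* Model of (char *)(&fmt) + 4, then further 4-byte steps per argument. */
int32_t nth_variadic_arg(const Frame32 *f, int n)
{
    const char *p_fmt = (const char *)f->bytes;  /* plays the role of &fmt */
    const char *arg = p_fmt + 4 + 4 * n;         /* skip fmt, then n slots */
    int32_t value;
    memcpy(&value, arg, 4);
    return value;
}
```

With make_frame(0xDEADBEEF, 11, 22), nth_variadic_arg returns 11 for n = 0 and 22 for n = 1: the same +4 step printf uses to reach its first variadic argument.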


Now the next line:

i = vsprintf(buf, fmt, arg);

Let's see what vsprintf(buf, fmt, arg) does.

Let's not go into its details yet.

Think about what printf is supposed to do.

It takes a format string and formats the arguments that match its conversion specifiers.


Back to the line i = vsprintf(buf, fmt, arg). vsprintf returns a length and, as you may have guessed, it is the length of the string to be printed.

The next line of printf, write(buf, i), confirms this. As the name implies, write performs the output: it writes the first i characters of buf to the terminal.


So vsprintf is what does the formatting: it takes the format string fmt, which determines the output format, and uses it to format the variable number of arguments, producing the formatted output.

In my code, vsprintf only supports hexadecimal formatting.
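To make this concrete, here is a minimal sketch of a vsprintf that, like the one described, understands only %x. It is not the author's code, which isn't shown here. mini_vsprintf, mini_sprintf, and utoa_hex are hypothetical names, and the sketch uses the standard <stdarg.h> va_list rather than a raw char pointer:

```c
#include <stdarg.h>

/* Write the lowercase hex digits of n into out; return how many were written. */
static int utoa_hex(char *out, unsigned int n)
{
    char tmp[2 * sizeof n];     /* two hex digits per byte */
    int len = 0, i;
    do {
        tmp[len++] = "0123456789abcdef"[n % 16];
        n /= 16;
    } while (n != 0);
    for (i = 0; i < len; i++)   /* digits were produced in reverse */
        out[i] = tmp[len - 1 - i];
    return len;
}

/* Hypothetical minimal vsprintf: copies ordinary characters, expands %x,
 * and returns the length of the result. Like the original, it does no
 * bounds checking on buf. */
int mini_vsprintf(char *buf, const char *fmt, va_list args)
{
    char *p = buf;
    for (; *fmt; fmt++) {
        if (*fmt != '%') {
            *p++ = *fmt;
            continue;
        }
        fmt++;
        if (*fmt == 'x')
            p += utoa_hex(p, va_arg(args, unsigned int));
        else if (*fmt == '\0')
            break;              /* lone '%' at the end of fmt */
        else
            *p++ = *fmt;        /* "%%" and unsupported specifiers: copy the char */
    }
    *p = '\0';
    return (int)(p - buf);
}

int mini_sprintf(char *buf, const char *fmt, ...)
{
    va_list args;
    int n;
    va_start(args, fmt);
    n = mini_vsprintf(buf, fmt, args);
    va_end(args);
    return n;
}
```

mini_sprintf(buf, "val=%x!", 255) stores "val=ff!" and returns 7, the length that printf would then hand to write.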


Once you understand what vsprintf does, the code above is easy to follow.


The implementation of write(buf, i), which we look at next, is a bit more complicated.


Suppose you are the OS and a user program asks you to print some data. The low-level printing operations obviously involve the hardware.

So you must restrict what the program itself is allowed to do:


Let's imagine a scenario:

An application says to you: "Mr. OS, I need to print the i characters in buf. Can you help me?"

OS: "Sure, no problem! Give me buf."

The OS then takes buf and hands it over to its little helpers (the functions that operate the hardware).

All it has to do is tell the application: "Bro, your job is done!" (What a generous OS. ^_^)

This way the application never obtains privileged access, which prevents it from doing anything illegal. (Safe and secure!)


Let's track down write:

write:
    mov     eax, _NR_write          ; system-call number
    mov     ebx, [esp + 4]          ; first argument: buf
    mov     ecx, [esp + 8]          ; second argument: i
    int     INT_VECTOR_SYS_CALL     ; trap into the kernel
    ret


Location: d:~/kernel/syscall.asm


Here the arguments are simply loaded into registers (the system-call number into eax, buf into ebx, i into ecx), and then an int instruction is executed.


Think back to what we learned in assembly language, for example how a program returns to DOS. We use this:

mov ax, 4c00h
int 21h

Why the int 21h at the end?

It tells DOS: "Carry out the service selected by the values I've put in the registers."

DOS looks it up in its table: "Oh, that's what you want. No problem!"


This description is not very rigorous. If you have read a book on protected-mode programming, you know that such an int instruction goes through an interrupt gate, and the specific system services are implemented behind that gate.


We can find where INT_VECTOR_SYS_CALL is set up:

init_idt_desc(INT_VECTOR_SYS_CALL, DA_386IGate, sys_call, PRIVILEGE_USER);

Location: d:~/kernel/protect.c


If you don't follow this, it doesn't matter. All you need to know is that executing int INT_VECTOR_SYS_CALL makes the CPU call the sys_call function through this gate. (You can roughly infer this from the parameter list above: DA_386IGate makes it a 386 interrupt gate, sys_call is its handler, and PRIVILEGE_USER lets user-level code invoke it.)


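Now let's look at the implementation of sys_call:

sys_call:
    call    save                        ; save the interrupted process's state

    push    dword [p_proc_ready]        ; pointer to the current process

    sti                                 ; re-enable interrupts while the service runs

    push    ecx                         ; second argument (from write: i)
    push    ebx                         ; first argument (from write: buf)
    call    [sys_call_table + eax * 4]  ; eax selects the handler
    add     esp, 4 * 3                  ; drop the three values pushed above

    mov     [esi + EAXREG - P_STACKBASE], eax  ; return value into the saved eax

    cli

    ret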



Location: ~/kernel.asm
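The key line is call [sys_call_table + eax * 4]: eax selects an entry in a table of function pointers, 4 bytes each on 32-bit x86. A rough C model of that dispatch follows. It is a sketch, not Funny OS code: the handler names and numbering are made up, and the real handlers also receive the p_proc_ready pointer pushed earlier, which is omitted here.

```c
/* Hypothetical C model of "call [sys_call_table + eax * 4]". */
typedef int (*syscall_handler)(int arg1, int arg2);

/* Two made-up system calls, for illustration only. */
static int sys_demo_add(int a, int b) { return a + b; }
static int sys_demo_neg(int a, int b) { (void)b; return -a; }

/* Index = system-call number, value = handler. */
static syscall_handler sys_call_table[] = { sys_demo_add, sys_demo_neg };

/* nr plays the role of eax, ebx/ecx the two register arguments. The
 * result is what sys_call stores back into the saved eax, so the caller
 * sees it as the return value after the int instruction. */
int dispatch_syscall(unsigned int nr, int ebx, int ecx)
{
    if (nr >= sizeof sys_call_table / sizeof sys_call_table[0])
        return -1;                        /* unknown system call */
    return sys_call_table[nr](ebx, ecx);
}
```

Adding a new system call then means adding a handler and a table entry; the trap path itself stays the same. The bounds check is an addition of this sketch.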


call save saves the state of the process that was interrupted.

Ugh!

This gets complicated: explaining it in detail would drag in far too many other topics.

I'll focus only on what matters here. The real sys_call is quite involved, so let's not analyze Funny OS's version.

Instead, let's pretend sys_call is a simple little girl who can do just one thing: display a formatted string.


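If all we want is to understand printf, we can write sys_call like this:

sys_call:
    ; ecx = number of characters to print
    ; ebx = address of the first character of buf
    ; Prints characters until '\0' is reached or ecx characters have been shown.
    ; [gs:edi] points into text-mode video memory (physical 0xB8000); the string
    ; is displayed by writing character/attribute pairs there directly.
    jecxz   .end                ; nothing to print
    xor     esi, esi            ; esi = index into buf
    mov     ah, 0Fh             ; attribute: white on black
.next:
    mov     al, [ebx + esi]     ; next character
    cmp     al, 0               ; stop at the terminating '\0'
    je      .end
    mov     [gs:edi], ax        ; write character + attribute to the screen
    add     edi, 2              ; advance to the next screen cell
    inc     esi
    loop    .next               ; ecx = ecx - 1, repeat while ecx != 0
.end:
    iretd                       ; return from the interrupt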
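The same loop in C, for readers who find that easier to follow. This is a hypothetical model: video_mem is an ordinary array standing in for the memory that [gs:edi] addresses, and toy_sys_call is an illustrative name.

```c
#include <stdint.h>

/* Stand-in for 80x25 text-mode video memory. Each cell is 16 bits: the
 * character in the low byte and the attribute (0x0F = white on black) in
 * the high byte, i.e. what "mov [gs:edi], ax" stores with al = character
 * and ah = 0Fh. */
#define COLS 80
#define ROWS 25
uint16_t video_mem[COLS * ROWS];

/* Print at most count characters of buf starting at cell pos, stopping
 * early at '\0'. Returns the next free cell (the final "edi / 2"). */
int toy_sys_call(const char *buf, int count, int pos)
{
    for (int si = 0; si < count && buf[si] != '\0'; si++) {
        if (pos >= COLS * ROWS)
            break;                                 /* stay inside the screen */
        video_mem[pos++] = (uint16_t)(0x0F << 8 | (unsigned char)buf[si]);
    }
    return pos;
}
```

toy_sys_call("Hi", 2, 0) fills cells 0 and 1 with 0x0F48 and 0x0F69 ('H' and 'i' in white on black) and returns 2.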



OK! That's easy!

Congratulations! What matters is that you now understand how printf works at the lowest level!



If you get the chance to read the Linux source code, you will find that it is implemented the same way.

The same is true of FreeDOS.

For example, in the early Linux kernel source, printf looks like this:


static int printf(const char *fmt, ...)
{
    va_list args;
    int i;

    va_start(args, fmt);
    write(1, printbuf, i = vsprintf(printbuf, fmt, args));
    va_end(args);
    return i;
}


The va_start and va_end macros are explained in my blog, so I won't cover them here.


Its vsprintf does the same job as ours.

Its write, however, differs from ours: it takes an extra argument, 1.

That 1 is a file descriptor, and it corresponds to the tty.

In Linux, devices are treated as files. All you need to know here is that 1 means writing the data to the current display.
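In a POSIX user-space program, the same two steps (format into a buffer, then write the bytes to file descriptor 1) can be sketched with standard calls. my_printf is a hypothetical name; it uses vsnprintf so that, unlike the fixed printbuf above, overly long output is truncated rather than overflowing the buffer:

```c
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* Format into a local buffer, then hand the bytes to write() on
 * STDOUT_FILENO (which is 1). */
int my_printf(const char *fmt, ...)
{
    char printbuf[1024];
    va_list args;
    int i;

    va_start(args, fmt);
    i = vsnprintf(printbuf, sizeof printbuf, fmt, args);
    va_end(args);
    if (i < 0)
        return i;                          /* formatting error */
    if (i >= (int)sizeof printbuf)
        i = (int)sizeof printbuf - 1;      /* output was truncated */
    if (write(STDOUT_FILENO, printbuf, (size_t)i) < 0)
        return -1;
    return i;
}
```

my_printf("%d-%s\n", 42, "ok") writes "42-ok\n" to standard output and returns 6.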


In FreeDOS, printf looks like this:


int VA_CDECL printf(const char *fmt, ...)
{
    va_list arg;

    va_start(arg, fmt);
    charp = 0;
    do_printf(fmt, arg);
    return 0;
}


So do_printf is what does both the formatting and the output.

Here is the implementation of do_printf:

STATIC void do_printf(const BYTE *fmt, va_list arg)
{
    int base;
    BYTE s[11], FAR *p;
    int size;
    unsigned char flags;

    for (; *fmt != '\0'; fmt++)
    {
        if (*fmt != '%')
        {
            handle_char(*fmt);          /* ordinary character: output as-is */
            continue;
        }

        fmt++;
        flags = RIGHT;

        if (*fmt == '-')                /* '-' flag: left-justify */
        {
            flags = LEFT;
            fmt++;
        }

        if (*fmt == '0')                /* '0' flag: pad with zeros */
        {
            flags |= ZEROSFILL;
            fmt++;
        }

        size = 0;                       /* field width */
        while (1)
        {
            unsigned c = (unsigned char)(*fmt - '0');
            if (c > 9)
                break;
            fmt++;
            size = size * 10 + c;
        }

        if (*fmt == 'l')                /* 'l' modifier: long argument */
        {
            flags |= LONGARG;
            fmt++;
        }

        switch (*fmt)
        {
        case '\0':
            va_end(arg);
            return;

        case 'c':
            handle_char(va_arg(arg, int));
            continue;

        case 'p':
        {
            UWORD w0 = va_arg(arg, unsigned);
            char *tmp = charp;
            sprintf(s, "%04x:%04x", va_arg(arg, unsigned), w0);
            p = s;
            charp = tmp;
            break;
        }

        case 's':
            p = va_arg(arg, char *);
            break;

        case 'F':
            fmt++;
            /* we assume %Fs here; fall through */
        case 'S':
            p = va_arg(arg, char FAR *);
            break;

        case 'i':
        case 'd':
            base = -10;                 /* negative base: signed conversion */
            goto lprt;

        case 'o':
            base = 8;
            goto lprt;

        case 'u':
            base = 10;
            goto lprt;

        case 'X':
        case 'x':
            base = 16;
            /* fall through */

        lprt:
        {
            long currentArg;
            if (flags & LONGARG)
                currentArg = va_arg(arg, long);
            else
            {
                currentArg = va_arg(arg, int);
                if (base >= 0)
                    currentArg = (long)(unsigned)currentArg;
            }
            ltob(currentArg, s, base);
            p = s;
        }
            break;

        default:
            handle_char('?');
            handle_char(*fmt);
            continue;
        }

        {
            size_t i = 0;
            while (p[i]) i++;
            size -= i;                  /* padding still needed */
        }

        if (flags & RIGHT)              /* pad on the left */
        {
            int ch = ' ';
            if (flags & ZEROSFILL) ch = '0';
            for (; size > 0; size--)
                handle_char(ch);
        }
        for (; *p != '\0'; p++)
            handle_char(*p);

        for (; size > 0; size--)        /* pad on the right (LEFT case) */
            handle_char(' ');
    }
    va_end(arg);
}



This is a complete formatting function.
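do_printf hands numeric conversion to ltob(currentArg, s, base), whose source isn't shown above. Below is a hypothetical sketch that is merely consistent with how it is called: a negative base (-10 for %d and %i) means "print as signed", a positive base prints the value as unsigned. ltob_sketch is an illustrative name, not FreeDOS's code.

```c
/* Write n in the given base into s. |base| must be 2..16, and s must have
 * room for the digits, an optional '-', and the terminating '\0'. */
void ltob_sketch(long n, char *s, int base)
{
    char tmp[sizeof(long) * 8];        /* enough digits even for base 2 */
    unsigned long u = (unsigned long)n;
    int len = 0, i = 0, neg = 0;

    if (base < 0) {                    /* signed conversion requested */
        base = -base;
        if (n < 0) {
            neg = 1;
            u = 0UL - u;               /* magnitude; also correct for LONG_MIN */
        }
    }
    do {
        tmp[len++] = "0123456789ABCDEF"[u % (unsigned long)base];
        u /= (unsigned long)base;
    } while (u != 0);

    if (neg)
        s[i++] = '-';
    while (len > 0)                    /* digits were produced in reverse */
        s[i++] = tmp[--len];
    s[i] = '\0';
}
```

ltob_sketch(-42, s, -10) produces "-42", while ltob_sketch(255, s, 16) produces "FF". Encoding signedness in the sign of base lets one conversion routine serve %d, %u, %o and %x.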

do_printf calls one function over and over: handle_char.

Here is its definition:

static void handle_char(COUNT c)
{
    if (charp == 0)
        put_console(c);     /* no target buffer: send to the console */
    else
        *charp++ = c;       /* otherwise store at charp and advance */
}
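handle_char thus has two modes. When charp is null, characters go to the console; otherwise they are stored at charp. That is presumably how a sprintf-style caller reuses do_printf, which fits the way case 'p' saves and restores charp around its own sprintf call. A minimal model of the two modes, where putchar stands in for put_console and handle_char_sketch and emit_to_buffer are hypothetical names:

```c
#include <stdio.h>

static char *charp = NULL;   /* NULL: console mode; otherwise: buffer mode */

static void handle_char_sketch(int c)
{
    if (charp == NULL)
        putchar(c);          /* stand-in for put_console */
    else
        *charp++ = (char)c;
}

/* Route a string through handle_char_sketch into dst (buffer mode). */
void emit_to_buffer(char *dst, const char *src)
{
    charp = dst;
    while (*src)
        handle_char_sketch(*src++);
    *charp = '\0';
    charp = NULL;            /* back to console mode */
}
```

After emit_to_buffer(b, "abc"), b holds "abc" and nothing has been printed.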


handle_char in turn calls put_console.

As the name suggests, this is the function that displays the characters:

void put_console(int c)
{
    if (buff_offset >= MAX_BUFSIZE)
    {
        buff_offset = 0;
        printf("Printf buffer overflow!\n");
    }
    if (c == '\n')
    {
        buff[buff_offset] = 0;      /* terminate the buffered line */
        buff_offset = 0;
#ifdef __TURBOC__
        _ES = FP_SEG(buff);         /* ES:DX = address of the line */
        _DX = FP_OFF(buff);
        _AX = 0x13;
        __int__(0xe6);
#elif defined(I86)
        asm
        {
            push ds;
            pop es;
            mov dx, offset buff;
            mov ax, 0x13;
            int 0xe6;
        }
#endif
    }
    else
    {
        buff[buff_offset] = c;      /* keep collecting characters */
        buff_offset++;
    }
}
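In other words, put_console buffers output line by line: characters accumulate in buff, and only a '\n' sends the whole line out through int 0xe6. A portable sketch of that idea follows. flush_line, last_line, and lines_flushed are hypothetical stand-ins for the interrupt call, and the buffer size is arbitrary.

```c
#include <stddef.h>

#define MAX_BUFSIZE 80
static char buff[MAX_BUFSIZE];
static size_t buff_offset = 0;

/* Stand-in for the int 0xe6 call: remember the flushed line. */
static char last_line[MAX_BUFSIZE];
static int lines_flushed = 0;

static void flush_line(const char *line)
{
    size_t i = 0;
    for (; line[i]; i++)
        last_line[i] = line[i];
    last_line[i] = '\0';
    lines_flushed++;
}

void put_console_sketch(int c)
{
    if (buff_offset >= MAX_BUFSIZE)
        buff_offset = 0;            /* overflow: discard what was buffered */
    if (c == '\n') {
        buff[buff_offset] = '\0';   /* offset < MAX_BUFSIZE here, so this fits */
        buff_offset = 0;
        flush_line(buff);
    } else {
        buff[buff_offset++] = (char)c;
    }
}
```

Feeding 'o', 'k', '\n' produces no output until the newline, at which point the single line "ok" is flushed. Batching output this way means one interrupt per line instead of one per character.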


Note: put_console calls printf again here, but this time the message contains no format specifiers, and buff_offset has already been reset to 0, so this does not cause an endless loop.


By now you should have a much clearer picture of how printf is implemented.


Now let's discuss another problem.

printf() has no way of determining where its arguments end; in other words, it does not know how many arguments were passed. It simply reads values from the stack one after another, starting just after the format argument, taking one for each conversion specifier in the format string.


This is where a possible buffer overflow problem comes from...
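A hypothetical helper makes the problem visible: it performs the same walk over the format string that printf relies on to decide how many arguments to read. count_conversions is an illustrative name, and it only handles simple specifiers (no * widths).

```c
/* Count how many arguments a simple format string will consume: one per
 * conversion specifier, none for "%%". printf trusts this count, because
 * nothing on the stack says how many arguments the caller really pushed. */
int count_conversions(const char *fmt)
{
    int n = 0;
    for (; *fmt; fmt++) {
        if (*fmt != '%')
            continue;
        fmt++;
        if (*fmt == '\0')
            break;          /* stray '%' at the end */
        if (*fmt != '%')
            n++;            /* "%%" consumes no argument */
    }
    return n;
}
```

count_conversions("%d %s") returns 2 no matter how many arguments the caller actually passed; if only one was passed, printf still reads a second value from the stack. And since the printf shown at the start formats into a fixed 256-byte buf with no length check, a sufficiently long result simply runs past the end of buf.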



