I have used scanf ever since I first started learning C, and I know how to call it, but I never looked closely at the problems that come up when using it. So I decided to explore how scanf is actually implemented.
How to find the scanf source code
The source code of the VC CRT ships in the \VC\CRT\SRC folder under the VS installation directory. We start by pulling out SCANF.C.
int __cdecl scanf (
        const char *format,
        ...
        )
{
    va_list arglist;
    va_start(arglist, format);
    return vscanf_fn(_input_l, format, NULL, arglist);
}
As you can see, the scanf function actually calls vscanf_fn:
int __cdecl vscanf_fn (
        INPUTFN inputfn,
        const char *format,
        _locale_t plocinfo,
        va_list arglist
        )
/*
 * stdin 'SCAN', 'F'ormatted
 */
{
    int retval = 0;

    _VALIDATE_RETURN( (format != NULL), EINVAL, EOF);

    _lock_str2(0, stdin);
    __try {
        retval = (inputfn(stdin, format, plocinfo, arglist));
    }
    __finally {
        _unlock_str2(0, stdin);
    }

    return(retval);
}
So scanf does its real work through the inputfn function pointer. Let's keep going and find the source of the function it points to.
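The forwarding pattern above (a variadic front end that packs its arguments into a va_list and hands them to a worker function) can be tried in portable C. The following is only a sketch under assumptions, not the CRT code: my_sscanf is a hypothetical name, and it forwards to the standard vsscanf instead of the internal worker.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper with the same shape as the scanf -> vscanf_fn chain.
   It forwards to the standard vsscanf, not to an internal CRT worker. */
int my_sscanf(const char *input, const char *format, ...)
{
    va_list arglist;
    int retval;

    assert(input != NULL && format != NULL);

    va_start(arglist, format);                 /* collect the variadic arguments */
    retval = vsscanf(input, format, arglist);  /* the worker does the parsing */
    va_end(arglist);

    return retval;                             /* number of items assigned */
}
```

Calling my_sscanf("12 34", "%d %d", &a, &b) behaves like sscanf with the same arguments; the wrapper adds nothing except the va_list hand-off.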
The real processing function is in input.c, and parsing that file is our focus. First, let's look at the function's header comment.
/***
*int _input(stream, format, arglist), static int input(format, arglist)
*
*Purpose:
*   get input items (data items or literal matches) from the input stream
*   and assign them if appropriate to the items thru the arglist. this
*   function is intended for internal library use only, not for the user
*
*   The _input entry point is for the normal scanf() functions
*   The input entry point is used when compiling for _cscanf() [CPRFLAF
*   defined] and is a static function called by _cscanf() -- reads from
*   console.
*
*   This code also defines _input_s, which works differently for %c, %s & %[.
*   For these, _input_s first picks up the next argument from the variable
*   argument list & uses it as the maximum size of the character array pointed
*   to by the next argument in the list.
*
*Entry:
*   FILE *stream - file to read from
*   char *format - format string to determine the data to read
*   arglist - list of pointer to data items
*
*Exit:
*   returns number of items assigned and fills in data items
*   returns EOF if error or EOF found on stream before 1st data item matched
*
*Exceptions:
*
*******************************************************************************/
Preparation before analyzing the source
First, let's get to know a few key functions and macros.
(Note: the _nolock suffix is about threads.) The MSDN description:
Functions with the _nolock suffix do not lock the calling thread. They might be faster because they do not incur the overhead of locking out other threads. Use these functions only in thread-safe contexts such as single-threaded applications or where the calling scope already handles thread isolation.
static _TINT __cdecl _inc(FILE* fileptr)
{
    return (_gettc_nolock(fileptr));
}
_inc calls _gettc_nolock to read one character from the stream's buffer.
static void __cdecl _un_inc(_TINT chr, FILE* fileptr)
{
    if (_TEOF != chr) {
        _ungettc_nolock(chr, fileptr);
    }
}
_un_inc calls _ungettc_nolock to push the character chr back into fileptr, unless chr is _TEOF. (The literal meaning of this "un" prefix puzzled me for a while; I only understood it after reading the code carefully.)
static _TINT __cdecl _whiteout(int* counter, FILE* fileptr)
{
    _TINT ch;

    do {
        ++*counter;
        ch = _inc(fileptr);
        if (ch == _TEOF) {
            break;
        }
    } while (_istspace((_TUCHAR)ch));
    return ch;
}
The meaning of _whiteout is straightforward: it consumes whitespace characters (' ', '\n', '\t', and so on) until it meets the first non-whitespace character, which it returns. (At that point the non-whitespace character has already been taken out of the stream, so the caller has to put it back before doing anything else.)
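The three helpers described so far can be approximated with the standard getc and ungetc. This is a rough sketch rather than the CRT code: the my_ prefixed names are made up for illustration, and the locking and wide-character details are ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Read one character from the stream (portable stand-in for _inc). */
int my_inc(FILE *fp)
{
    return getc(fp);
}

/* Push a character back unless it is EOF (stand-in for _un_inc). */
void my_un_inc(int chr, FILE *fp)
{
    if (chr != EOF) {
        ungetc(chr, fp);
    }
}

/* Skip whitespace; return the first non-whitespace character or EOF.
   *counter is incremented once per read attempt. */
int my_whiteout(int *counter, FILE *fp)
{
    int ch;

    assert(counter != NULL && fp != NULL);
    do {
        ++*counter;
        ch = my_inc(fp);
        if (ch == EOF) {
            break;
        }
    } while (isspace((unsigned char)ch));
    return ch;
}
```

Note that the character returned by my_whiteout is no longer in the stream, so a caller that only wanted to skip whitespace has to hand it to my_un_inc.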
#define _gettc_nolock _getc_nolock
#define _getc_nolock(_stream) _fgetc_nolock(_stream)
#define _fgetc_nolock(_stream) (--(_stream)->_cnt >= 0 ? 0xff & *(_stream)->_ptr++ : _filbuf(_stream))
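The last macro decrements the count of characters left in the buffer. If a character is still available, it returns that character and advances the pointer; if the buffer is exhausted, it calls _filbuf to refill the buffer and get the next character.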
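To make the count-and-pointer trick concrete, here is a toy model of it. Everything in it is an assumption made for illustration: ToyStream, toy_open, toy_filbuf, and toy_getc are invented names, the "file" is just a string, and the layout has nothing to do with the real FILE structure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_EOF (-1)

/* Hypothetical miniature of a buffered stream; not the CRT's FILE layout. */
typedef struct {
    const char *src;      /* unread source data */
    size_t      src_len;  /* bytes left in src */
    char        buf[4];   /* tiny buffer so refills are easy to observe */
    char       *ptr;      /* next character to hand out */
    int         cnt;      /* characters left in buf */
    int         refills;  /* number of times the buffer was refilled */
} ToyStream;

void toy_open(ToyStream *s, const char *text)
{
    assert(s != NULL && text != NULL);
    s->src = text;
    s->src_len = strlen(text);
    s->ptr = s->buf;
    s->cnt = 0;
    s->refills = 0;
}

/* Refill the buffer and return its first character (the role of _filbuf). */
int toy_filbuf(ToyStream *s)
{
    size_t n = s->src_len < sizeof s->buf ? s->src_len : sizeof s->buf;

    if (n == 0) {
        s->cnt = 0;
        return TOY_EOF;
    }
    memcpy(s->buf, s->src, n);
    s->src += n;
    s->src_len -= n;
    s->ptr = s->buf + 1;   /* the first character is returned right now */
    s->cnt = (int)n - 1;
    s->refills++;
    return 0xff & s->buf[0];
}

/* Same shape as the macro: fast path from the buffer, slow path refills. */
int toy_getc(ToyStream *s)
{
    return --s->cnt >= 0 ? 0xff & *s->ptr++ : toy_filbuf(s);
}
```

With a 4-byte buffer, reading the six characters of "abcdef" triggers two refills, and every other read is just a decrement and a pointer bump.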
#define INC() (++charcount, _inc(stream))
#define UN_INC(chr) (--charcount, _un_inc(chr, stream))
#define EAT_WHITE() _whiteout(&charcount, stream)
With that background, these three macros are simple:
① INC() reads a character
② UN_INC(chr) puts a character back
③ EAT_WHITE() eats up the whitespace characters
With this foundation, handling the buffered stream (FILE) is no longer difficult. (Windows likes to boil all sorts of operations down to files, as with the CreateFile API.)
Analysis of code
The switch-case in the code contains a lot of goto statements (whoever wrote it was clearly a seasoned veteran).
Without further ado, let's start with the format parsing:
while (*format) {

    if (_istspace((_TUCHAR)*format)) {

        UN_INC(EAT_WHITE()); /* put first non-space char back */

        do {
            tch = *++format;
        } while (_istspace((_TUCHAR)tch));

        continue;
        ..................
Here UN_INC(EAT_WHITE()) takes the first non-whitespace character that EAT_WHITE read and puts it back into the buffer.
In other words, when the format string contains whitespace, the code above discards the whitespace characters waiting in the input (keyboard) buffer, stopping at the first non-whitespace character.
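The visible effect of this branch is that any whitespace in the format string matches any amount of whitespace in the input, including none. A small sketch using the standard sscanf on a string instead of the keyboard (parse_pair is a hypothetical helper name):

```c
#include <assert.h>
#include <stdio.h>

/* Parse two integers separated by a comma. Each space in the format
   matches any run of whitespace (possibly empty) in the input.
   Returns the number of items assigned. */
int parse_pair(const char *input, int *x, int *y)
{
    assert(input != NULL && x != NULL && y != NULL);
    return sscanf(input, " %d , %d", x, y);
}
```

Both "5,6" and an input with newlines and tabs scattered around the comma are accepted by the same format.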
When a % sign is read, processing continues with:
if (_T('%') == *format && _T('%') != *(format + 1))
Here we find the various conversion specifiers, such as:
Format specifier    Description
%a                  reads a floating-point value (C99 only)
%A                  same as above
%c                  reads one character
%d                  reads a decimal integer
%i                  reads a decimal, octal, or hexadecimal integer
%o                  reads an octal integer
%x                  reads a hexadecimal integer
%X                  same as above
%s                  reads a string
%f                  reads a floating-point number
%F                  same as above
%e                  same as above
%E                  same as above
%g                  same as above
%G                  same as above
%p                  reads a pointer
%u                  reads an unsigned decimal integer
%n                  stores the number of characters read so far
%[]                 scans a character set
%%                  reads a % sign
%*                  reads data of the specified type without saving it
Here we mainly analyze %[] and %*.
① A custom scan set %[] makes input more flexible, for example:
scanf("%[a-zA-Z]", str); accepts only the characters a-z and A-Z
scanf("%[^a-z]", str); accepts only characters that are not a-z
scanf("%[^\n]", str); reads everything up to the newline, spaces included
② %* reads data of the specified type without saving it:
scanf("%*d%c", &i); reads a %d without saving it, then saves the %c that follows into i
③ ^ means negation: the set is inverted
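These forms can be tried safely on strings with sscanf. The sketch below uses hypothetical helper names and adds a field width to each scan set, since %[ writes a terminated string and needs a buffer large enough to hold it. It also assumes the common behavior that a-z inside a scan set denotes a range, which the C standard leaves implementation-defined.

```c
#include <assert.h>
#include <stdio.h>

/* Read a leading run of letters (at most 31 characters plus terminator). */
int read_letters(const char *input, char word[32])
{
    assert(input != NULL && word != NULL);
    return sscanf(input, "%31[a-zA-Z]", word);
}

/* Read everything up to, but not including, the first newline. */
int read_line(const char *input, char line[64])
{
    assert(input != NULL && line != NULL);
    return sscanf(input, "%63[^\n]", line);
}

/* Skip an integer without storing it, then store the next character. */
int skip_int_read_char(const char *input, char *c)
{
    assert(input != NULL && c != NULL);
    return sscanf(input, "%*d%c", c);
}
```

The suppressed %*d does not count toward the return value, so skip_int_read_char returns 1 on success, not 2.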
Let's look at the implementation code:
if (_T('^') == *scanptr) {
    ++scanptr;
    --reject; /* set reject to 255 */
}

/* Allocate "table" on first %[] spec */
#if ALLOC_TABLE
if (table == NULL) {
    table = (char*)_malloc_crt(TABLESIZE);
    if (table == NULL)
        goto error_return;
    malloc_flag = 1;
}
#endif  /* ALLOC_TABLE */
memset(table, 0, TABLESIZE);

if (LEFT_BRACKET == comchr)
    if (_T(']') == *scanptr) {
        prevchar = _T(']');
        ++scanptr;
        table[_T(']') >> 3] = 1 << (_T(']') & 7);
    }

while (_T(']') != *scanptr) {

    rngch = *scanptr++;

    if (_T('-') != rngch ||
         !prevchar ||           /* first char */
         _T(']') == *scanptr)   /* last char */

        table[(prevchar = rngch) >> 3] |= 1 << (rngch & 7);

    else {  /* handle a-z type set */

        rngch = *scanptr++; /* get end of range */

        if (prevchar < rngch)  /* %[a-z] */
            last = rngch;
        else {                 /* %[z-a] */
            last = prevchar;
            prevchar = rngch;
        }
        /* last could be 0xFF, so we handle it at the end of the for loop */
        for (rngch = prevchar; rngch < last; ++rngch)
        {
            table[rngch >> 3] |= 1 << (rngch & 7);
        }
        table[last >> 3] |= 1 << (last & 7);

        prevchar = 0;
    }
}
reject is the negation flag: if ^ is present, reject is decremented from 0 to 0xFF, which makes the later inversion easy.
For the [] character set there is a char table[32] that covers all 256 single-byte character values. (Each char here is 8 bits, so 32 of them give exactly 256 bits, one per character value.)
Microsoft records the characters in table like this:
table[rngch >> 3] |= 1 << (rngch & 7);
That is, the 256 possible character values are divided into 32 groups (rngch >> 3 is equivalent to dividing by 8, and rngch & 7 is the remainder). Each table[n] has 8 bits, one bit per character: the bit is set to 1 if the character appears in the set and stays 0 if it does not, so the table covers all 256 values exactly.
Testing whether a character is accepted is then a single expression:
(table[ch >> 3] ^ reject) & (1 << (ch & 7))
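The bitmap and the XOR trick are easy to reproduce in isolation. The following is a minimal sketch, assuming 8-bit chars; the set_ function names are invented here, and only the two bit expressions are taken from the source above.

```c
#include <assert.h>
#include <string.h>

#define TABLESIZE 32  /* 32 bytes * 8 bits = 256 membership bits */

/* Empty the set. */
void set_clear(unsigned char table[TABLESIZE])
{
    memset(table, 0, TABLESIZE);
}

/* Mark one character value as a member of the set. */
void set_add(unsigned char table[TABLESIZE], unsigned char ch)
{
    table[ch >> 3] |= (unsigned char)(1 << (ch & 7));
}

/* Mark every character value in [first, last] as a member. */
void set_add_range(unsigned char table[TABLESIZE],
                   unsigned char first, unsigned char last)
{
    unsigned int ch;  /* wider than char so the loop ends even if last is 0xFF */

    assert(first <= last);
    for (ch = first; ch <= last; ++ch) {
        set_add(table, (unsigned char)ch);
    }
}

/* Nonzero if ch is accepted. reject is 0 for %[...] and 0xFF for %[^...]:
   XOR with 0xFF flips every bit, which inverts the membership test. */
int set_accepts(const unsigned char table[TABLESIZE],
                unsigned char reject, unsigned char ch)
{
    return (table[ch >> 3] ^ reject) & (1 << (ch & 7));
}
```

The sketch uses a wider loop counter to survive a range ending at 0xFF, whereas the CRT code stops the loop one short and sets the last bit separately; both avoid the wrap-around that an 8-bit counter would hit.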
The usages above were ones I did not know at first. Now let's explore %d and friends.
The following situation comes up often when writing code:
char a;
char b;
scanf("%c", &a);
printf("%c", a);
scanf("%c", &b);
printf("%c", b);
After typing one character and pressing Enter, you find that you never get to type the second character. Debugging with a breakpoint shows that '\n' was stored in b: in the %c case, the '\n' left over from pressing Enter is read and saved to b. The most common ways to solve this are discarding the leftover input (flushing the buffer) and the like.
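The behavior is easy to reproduce without a keyboard by feeding the same characters to sscanf. The sketch below uses hypothetical helper names and also shows one widely used alternative to flushing, which is to put a space before %c so that scanf discards whitespace first.

```c
#include <assert.h>
#include <stdio.h>

/* Plain "%c%c": whitespace is not skipped, so a newline between the
   two characters ends up in *b. */
int read_two_raw(const char *input, char *a, char *b)
{
    assert(input != NULL && a != NULL && b != NULL);
    return sscanf(input, "%c%c", a, b);
}

/* " %c %c": the space before each %c discards any whitespace first. */
int read_two_skipping(const char *input, char *a, char *b)
{
    assert(input != NULL && a != NULL && b != NULL);
    return sscanf(input, " %c %c", a, b);
}
```

Given the input "x\ny", the first helper stores '\n' in b, while the second stores 'y'.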
int a;
int b;
scanf("%d", &a);
printf("%d", a);
scanf("%d", &b);
printf("%d", b);
When typing 1, Enter, 2, no such assignment error occurs. This comes down to how %d is processed, so let's look at case 'd':
It contains many checks such as _ISXDIGIT(ch). If the character is not a digit, the code stops reading characters for the current %d and goes on to execute ++format; /* skip to next char */ at line 1313.
That is, %d skips the '\n' and continues with the next character. (More generally, the C standard requires numeric conversions such as %d to discard leading whitespace before converting, which is why the leftover '\n' does no harm here.)
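A stripped-down reader shows why the leftover newline is harmless for numbers. This is a hypothetical sketch, not the CRT's case 'd': read_decimal is an invented name, and it ignores signs, field widths, and overflow.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical mini "%d" reader (no sign, width, or overflow handling).
   Returns 1 and stores the value on success, 0 if the next
   non-whitespace character is not a digit, EOF if the stream ended first. */
int read_decimal(FILE *fp, int *out)
{
    int ch;
    int value = 0;
    int ndigits = 0;

    assert(fp != NULL && out != NULL);

    do {                         /* discard leading whitespace, '\n' included */
        ch = getc(fp);
    } while (ch != EOF && isspace((unsigned char)ch));

    while (ch != EOF && isdigit((unsigned char)ch)) {
        value = value * 10 + (ch - '0');
        ++ndigits;
        ch = getc(fp);
    }

    if (ch != EOF) {
        ungetc(ch, fp);          /* the first non-digit stays in the stream */
    }
    if (ndigits == 0) {
        return ch == EOF ? EOF : 0;
    }
    *out = value;
    return 1;
}
```

On the input "1\n2\n", the first call stops at '\n' and pushes it back; the second call throws that '\n' away in its whitespace loop before it sees the 2.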
The code structure is as follows:
if (_T('%') == *format && _T('%') != *(format + 1)) {
    ... ... .....
    ++format; /* skip to next char */
} else /* ('%' != *format) */ {
    .......... ...
}
One function I came across while reading the code, _hextodec, is nicely written:
static _TINT __cdecl _hextodec(_TCHAR chr)
{
    return _ISDIGIT(chr) ? chr : (chr & ~(_T('a') - _T('A'))) - _T('A') + 10 + _T('0');
}
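It maps a hexadecimal digit character (0-9, a-f, or A-F) to a character whose offset from '0' is the digit's decimal value (0 to 15). Digits are returned unchanged; letters are first folded to upper case by clearing the bit that separates 'a' from 'A'.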
That concludes my analysis of the scanf code.
Analysis of scanf source code