The following comment is taken from lib/string.c in Linux kernel 2.6.29. It shows that the kernel no longer uses strtok() and has replaced it with strsep(), which is reentrant and expected to be faster.
/*
 *  linux/lib/string.c
 *
 *  Copyright (C) 1991, 1992  Linus Torvalds
 */
/*
 * stupid library routines.. The optimized versions should generally be found
 * as inline code in <asm-xx/string.h>
 *
 * These are buggy as well..
 *
 * * Fri Jun 25 1999, Ingo Oeser <ioe@informatik.tu-chemnitz.de>
 * -  Added strsep() which will replace strtok() soon (because strsep() is
 *    reentrant and should be faster). Use only strsep() in new code, please.
 *
 * * Sat Feb 09 2002, Jason Thomas <jason@topic.com.au>,
 *                    Matthew Hawkins <matt@mh.dropbear.id.au>
 * -  Kissed strtok() goodbye
 */
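To make the kernel comment concrete, here is a minimal user-space sketch of the strsep() style it recommends. It assumes the BSD/glibc strsep() declared in <string.h> (it is not part of ISO C), and split_fields is a hypothetical helper name used only for this illustration, not a kernel function.

```c
#define _DEFAULT_SOURCE /* assumption: asks glibc to declare strsep() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split buf in place on any character in delims and store at most max
 * token pointers in out. Returns the number of tokens stored.
 * split_fields is a hypothetical helper name. */
size_t split_fields(char *buf, const char *delims, char **out, size_t max)
{
    char *cursor = buf; /* all tokenizer state lives in this local variable */
    char *tok;
    size_t n = 0;

    assert(buf != NULL && delims != NULL && out != NULL);
    while (n < max && (tok = strsep(&cursor, delims)) != NULL)
        out[n++] = tok;
    return n;
}
```

Because the position is kept in the caller's own variable, two such loops can run at the same time without disturbing each other. Note one behavioral difference: strsep() returns an empty token for adjacent delimiters, so "a,,b" yields "a", "" and "b", whereas strtok() would skip the empty field.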
Almost everyone has used strtok() at some point, yet it keeps causing problems, so let's take a closer look at it.
Here is an example:
#include <stdio.h>
#include <string.h>
int main(void)
{
    char test1[] = "Feng,Ke,Wei";   /* writable array */
    char *test2 = "Feng,Ke,Wei";    /* pointer to a string literal */
    char *p;
    p = strtok(test1, ",");
    while (p)
    {
        printf("%s\n", p);
        p = strtok(NULL, ",");
    }
    return 0;
}
Running result:
Feng
Ke
Wei
However, if p = strtok(test2, ",") is used instead, a memory error occurs. Why? Is it related to the static variable inside strtok? Let's look at its source code:
/***
*strtok.c - tokenize a string with given delimiters
*
* Copyright (c) Microsoft Corporation. All rights reserved.
*
*Purpose:
* defines strtok() - breaks string into series of token
* via repeated calls.
*
*******************************************************************************/
#include <cruntime.h>
#include <string.h>
#ifdef _MT
#include <mtdll.h>
#endif /* _MT */
/***
*char *strtok(string, control) - tokenize string with delimiter in control
*
*Purpose:
*       strtok considers the string to consist of a sequence of zero or more
*       text tokens separated by spans of one or more control chars. The first
*       call, with string specified, returns a pointer to the first char of the
*       first token, and will write a null char into string immediately
*       following the returned token. Subsequent calls with zero for the first
*       argument (string) will work thru the string until no tokens remain. The
*       control string may be different from call to call. When no tokens remain
*       in string a NULL pointer is returned. Remember the control chars with a
*       bit map, one bit per ASCII char. The null char is always a control char.
*       // The details are all spelled out here, better than MSDN!
*Entry:
*       char *string - string to tokenize, or NULL to get next token
*       char *control - string of characters to use as delimiters
*
*Exit:
*       returns pointer to first token in string, or if string
*       was NULL, to next token
*       returns NULL when no more tokens remain.
*
*Uses:
*
*Exceptions:
*
*******************************************************************************/
char * __cdecl strtok (
        char * string,
        const char * control
        )
{
        unsigned char *str;
        const unsigned char *ctrl = control;
        unsigned char map[32];
        int count;
#ifdef _MT
        _ptiddata ptd = _getptd();
#else  /* _MT */
        static char *nextoken;  // static pointer to the part of the string not yet tokenized
#endif  /* _MT */
        /* Clear control map */
        for (count = 0; count < 32; count++)
                map[count] = 0;
        /* Set bits in delimiter table */
        do {
                map[*ctrl >> 3] |= (1 << (*ctrl & 7));
        } while (*ctrl++);
        /* Initialize str. If string is NULL, set str to the saved
         * pointer (i.e., continue breaking tokens out of the string
         * from the last strtok call) */
        if (string)
                str = string;           // first call: start from the caller's string
        else
#ifdef _MT
                str = ptd->_token;
#else  /* _MT */
                str = nextoken;         // later calls (first argument NULL): resume from the saved remainder
#endif  /* _MT */
        /* Find beginning of token (skip over leading delimiters). Note that
         * there is no token iff this loop sets str to point to the terminal
         * null (*str == '\0') */
        while ( (map[*str >> 3] & (1 << (*str & 7))) && *str )
                str++;
        string = str;   // string now points at the start of the token to be returned
        /* Find the end of the token. If it is not the end of the string,
         * put a null there. */
        // This is the core step: find the next delimiter and overwrite it with '\0',
        // so the returned token is terminated inside the original string.
        for ( ; *str ; str++ )
                if ( map[*str >> 3] & (1 << (*str & 7)) ) {
                        *str++ = '\0';  // this modifies the contents of the caller's string ①
                        break;
                }
        /* Update nextoken (or the corresponding field in the per-thread data
         * structure) */
#ifdef _MT
        ptd->_token = str;
#else  /* _MT */
        nextoken = str;         // save the remainder in the static variable for the next call
#endif  /* _MT */
        /* Determine if a token has been found. */
        if ( string == str )
                return NULL;
        else
                return string;
}
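The map[32] array in the source above is a 256-bit set with one bit per possible character value: c >> 3 selects the byte and c & 7 selects the bit inside it. The following is a minimal standalone sketch of that technique. The names delim_map_build and delim_map_contains are hypothetical and do not appear in the CRT source.

```c
#include <assert.h>
#include <string.h>

/* Build a 256-bit set (32 bytes), one bit per possible char value.
 * delim_map_build is a hypothetical helper name. */
void delim_map_build(unsigned char map[32], const char *delims)
{
    const unsigned char *d = (const unsigned char *)delims;

    assert(map != NULL && delims != NULL);
    memset(map, 0, 32);
    do {
        /* byte index = c / 8, bit index = c % 8 */
        map[*d >> 3] |= (unsigned char)(1u << (*d & 7));
    } while (*d++); /* the terminating '\0' is recorded as a delimiter too */
}

/* Return 1 if c is in the set, 0 otherwise. */
int delim_map_contains(const unsigned char map[32], unsigned char c)
{
    return (map[c >> 3] >> (c & 7)) & 1;
}
```

Building the set costs one pass over the delimiter string, after which each character of the input is classified with a single lookup instead of a scan over all delimiters.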
1. strtok Introduction
As we all know, strtok splits a string into tokens according to user-supplied delimiters (several delimiter characters can be given at once, for example ", "), and keeps going until it reaches the terminating "\0".
For example, with the delimiter "," and the string "Fred,John,Ann", strtok lets us extract the three strings "Fred", "John" and "Ann".
The corresponding C code is:
int in = 0;
char buffer[] = "Fred,John,Ann";
char *p[4];             /* three tokens plus the final NULL stored by the loop */
char *buf = buffer;
while ((p[in] = strtok(buf, ",")) != NULL) {
    in++;
    buf = NULL;
}
As the code shows, the first call to strtok takes the address of the target string as its first argument (buf = buffer), and every later call takes NULL (buf = NULL). The pointer array p[] stores the results: p[0] = "Fred", p[1] = "John", p[2] = "Ann", and buffer becomes "Fred\0John\0Ann\0".
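The in-place modification described above can be made visible with a short sketch. The helper names split_in_place and render_bytes are hypothetical, and render_bytes writes every NUL byte as the two characters \0 so the terminators inserted by strtok show up in the result.

```c
#include <assert.h>
#include <string.h>

/* Split buf with strtok and store at most max token pointers in out.
 * Returns the number of tokens stored. Hypothetical helper name. */
size_t split_in_place(char *buf, const char *delims, char **out, size_t max)
{
    size_t n = 0;
    char *tok;

    assert(buf != NULL && delims != NULL && out != NULL);
    for (tok = strtok(buf, delims); tok != NULL && n < max;
         tok = strtok(NULL, delims))
        out[n++] = tok;
    return n;
}

/* Render n raw bytes of buf into out, writing each NUL byte as the two
 * characters '\' and '0'. Stops early if out would overflow. */
void render_bytes(const char *buf, size_t n, char *out, size_t outsz)
{
    size_t used = 0;
    size_t i;

    assert(buf != NULL && out != NULL && outsz > 0);
    out[0] = '\0';
    for (i = 0; i < n && used + 3 <= outsz; i++) {
        if (buf[i] == '\0') {
            out[used++] = '\\';
            out[used++] = '0';
        } else {
            out[used++] = buf[i];
        }
        out[used] = '\0';
    }
}
```

Calling split_in_place on a buffer holding "Fred,John,Ann" and then render_bytes on all sizeof(buffer) bytes shows that both commas have been replaced by terminators, which is why the tokens in p[] are simply pointers into the original buffer.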
2. Pitfalls of strtok
Let's change the task. We have the string "Fred male 25,John male 62,Anna female 16" and want to split it and store the fields in a struct:
struct person {
    char name[25];
    char sex[7];    /* "female" needs 6 characters plus the terminating '\0' */
    char age[4];
};
One way to do this is to first extract the substrings separated by commas and then split each of them on spaces.
For example, extract "Fred male 25" and split it into "Fred", "male" and "25".
The small program below demonstrates this process:
#include <stdio.h>
#include <string.h>
#define INFO_MAX_SZ 255
int main(void)
{
    int in = 0;
    char buffer[INFO_MAX_SZ] = "Fred male 25,John male 62,Anna female 16";
    char *p[20];
    char *buf = buffer;
    while ((p[in] = strtok(buf, ",")) != NULL)
    {
        buf = p[in];
        while ((p[in] = strtok(buf, " ")) != NULL)
        {
            in++;
            buf = NULL;
        }
        p[in++] = "***"; // marks the end of one record
        buf = NULL;
    }
    printf("Here we have %d strings\n", in);
    for (int j = 0; j < in; j++)
        printf(">%s<\n", p[j]);
    return 0;
}
The output of this program is:
Here we have 4 strings
>Fred<
>male<
>25<
>***<
This is only a small part of the data, not what we wanted. Why?
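This is because strtok keeps its position in a single static pointer, which the outer and the inner loop both overwrite. Let me walk through the program above.
In each step, \0 marks a byte that strtok has overwritten, and the comment says where strtok's internal pointer ends up.
1. "Fred male 25,John male 62,Anna female 16" // the outer loop starts
2. "Fred male 25\0John male 62,Anna female 16" // the outer call returns "Fred male 25", the internal pointer is at "John"; enter the inner loop
3. "Fred\0male 25\0John male 62,Anna female 16" // the inner call returns "Fred", the internal pointer is at "male"
4. "Fred\0male\025\0John male 62,Anna female 16" // the inner call returns "male", the internal pointer is at "25"
5. "Fred\0male\025\0John male 62,Anna female 16" // the inner call returns "25", the internal pointer is at the "\0" after "25"; the next inner call returns NULL and control goes back to the outer loop
6. "Fred\0male\025\0John male 62,Anna female 16" // the outer call resumes at that "\0", finds no token and returns NULL, so "John male 62,Anna female 16" is never processed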
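The effect of the shared pointer can be reproduced with a toy tokenizer that keeps its position in one file-scope variable. This is only an illustrative sketch written with strspn and strcspn, not the CRT implementation shown earlier, and toy_strtok and toy_saved are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* The single shared cursor, playing the role of strtok's static pointer. */
static char *toy_saved;

/* toy_strtok is a hypothetical stand-in for strtok, for illustration only. */
char *toy_strtok(char *s, const char *delims)
{
    char *end;

    assert(delims != NULL);
    if (s == NULL)
        s = toy_saved;              /* resume wherever the LAST call stopped */
    if (s == NULL)
        return NULL;
    s += strspn(s, delims);         /* skip leading delimiters */
    if (*s == '\0') {
        toy_saved = s;
        return NULL;                /* nothing left to tokenize */
    }
    end = s + strcspn(s, delims);   /* find the end of the token */
    if (*end != '\0')
        *end++ = '\0';              /* terminate it inside the caller's string */
    toy_saved = end;
    return s;
}
```

With the input "a b,c d", an outer call with "," returns "a b". Tokenizing that record with " " then moves toy_saved to the "\0" that replaced the comma, so the next outer call with a NULL first argument returns NULL even though "c d" is still sitting untouched in the buffer.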
3. Using strtok_r
In this case we should use strtok_r, the reentrant version of strtok.
char *strtok_r(char *s, const char *delim, char **ptrptr);
Unlike strtok, strtok_r requires the caller to supply a pointer in which it keeps its position, instead of relying on a built-in static pointer.
Code:
#include <stdio.h>
#include <string.h>
#define INFO_MAX_SZ 255
int main(void)
{
    int in = 0;
    char buffer[INFO_MAX_SZ] = "Fred male 25,John male 62,Anna female 16";
    char *p[20];
    char *buf = buffer;
    char *outer_ptr = NULL;
    char *inner_ptr = NULL;
    while ((p[in] = strtok_r(buf, ",", &outer_ptr)) != NULL)
    {
        buf = p[in];
        while ((p[in] = strtok_r(buf, " ", &inner_ptr)) != NULL)
        {
            in++;
            buf = NULL;
        }
        p[in++] = "***"; // marks the end of one record
        buf = NULL;
    }
    printf("Here we have %d strings\n", in);
    for (int j = 0; j < in; j++)
        printf(">%s<\n", p[j]);
    return 0;
}
This time the output is:
Here we have 12 strings
>Fred<
>male<
>25<
>***<
>John<
>male<
>62<
>***<
>Anna<
>female<
>16<
>***<
Let me walk through this program as well.
In each step, \0 marks a byte that strtok_r has overwritten, and the comment says where outer_ptr and inner_ptr point.
1. "Fred male 25,John male 62,Anna female 16" // the outer loop starts
2. "Fred male 25\0John male 62,Anna female 16" // outer_ptr is at "John"; enter the inner loop
3. "Fred\0male 25\0John male 62,Anna female 16" // inner_ptr is at "male"
4. "Fred\0male\025\0John male 62,Anna female 16" // inner_ptr is at "25"
5. "Fred\0male\025\0John male 62,Anna female 16" // inner_ptr reaches the "\0" after "25", so the inner loop ends and control returns to the outer loop; outer_ptr is still at "John"
6. "Fred\0male\025\0John male 62\0Anna female 16" // outer_ptr is at "Anna"; enter the inner loop again
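Returning to the goal stated in section 2, the two-level split can fill the struct directly. The sketch below assumes a POSIX system that provides strtok_r, and the struct layout and the name parse_people are hypothetical choices made for this illustration. The sex field is given 7 bytes so that "female" fits together with its terminator.

```c
#define _POSIX_C_SOURCE 200809L /* assumption: asks libc to declare strtok_r() */
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct person {
    char name[25];
    char sex[7];
    char age[4];
};

/* Parse "name sex age,name sex age,..." in place and fill at most max
 * entries of out. Returns the number of records stored.
 * parse_people is a hypothetical helper name. */
size_t parse_people(char *text, struct person *out, size_t max)
{
    char *outer = NULL; /* position of the comma-level tokenizer */
    char *rec;
    size_t n = 0;

    assert(text != NULL && out != NULL);
    for (rec = strtok_r(text, ",", &outer); rec != NULL && n < max;
         rec = strtok_r(NULL, ",", &outer)) {
        char *inner = NULL; /* independent position for the space-level split */
        char *name = strtok_r(rec, " ", &inner);
        char *sex = strtok_r(NULL, " ", &inner);
        char *age = strtok_r(NULL, " ", &inner);

        if (name == NULL || sex == NULL || age == NULL)
            continue;   /* skip records with fewer than three fields */
        /* snprintf truncates instead of overflowing the fixed-size fields */
        snprintf(out[n].name, sizeof out[n].name, "%s", name);
        snprintf(out[n].sex, sizeof out[n].sex, "%s", sex);
        snprintf(out[n].age, sizeof out[n].age, "%s", age);
        n++;
    }
    return n;
}
```

Since each level has its own save pointer, the inner split of one record cannot disturb the outer walk over the records, which is exactly what failed with plain strtok.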
So the root cause is that strtok modifies the string it is given.
When char *test2 = "Feng,Ke,Wei" is passed as the first argument, test2 points to a string literal. String literals are typically stored in a read-only constant area, and modifying one is undefined behavior in C, so strtok's write of '\0' triggers the memory error.
With char test1[] = "Feng,Ke,Wei", on the other hand, test1 is an array on the stack that is initialized with a copy of the literal, so its contents can be modified.
This is a good reminder of what it really means for a string to live in the constant area.
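When the text to tokenize is only available as a string literal or a const char *, one common workaround is to tokenize a writable copy instead. The following is a minimal sketch of that idea, and writable_copy is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated, writable copy of s, or NULL if allocation fails.
 * The caller must free() the result. writable_copy is a hypothetical name. */
char *writable_copy(const char *s)
{
    size_t len;
    char *copy;

    assert(s != NULL);
    len = strlen(s) + 1;    /* include the terminating '\0' */
    copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}
```

Passing the result of writable_copy(test2) to strtok is safe because the '\0' bytes are written into the heap copy, while the literal that test2 points to stays untouched. Declaring such pointers as const char * also lets the compiler warn when they are handed to a function that expects a modifiable char *.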