strtok() and strtok_r()

Source: Internet
Author: User
Tags: strtok

The following comment is taken from lib/string.c in the Linux kernel 2.6.29 source. It shows that the kernel no longer uses strtok(); it has been replaced by strsep(), which is reentrant and expected to be faster.

/*
 *  linux/lib/string.c
 *
 *  Copyright (C) 1991, 1992  Linus Torvalds
 */
/*
 * stupid library routines.. The optimized versions should generally be found
 * as inline code in <asm-xx/string.h>
 *
 * These are buggy as well..
 *
 * * Fri Jun 25 1999, Ingo Oeser <ioe@informatik.tu-chemnitz.de>
 * -  Added strsep() which will replace strtok() soon (because strsep() is
 *    reentrant and should be faster). Use only strsep() in new code, please.
 *
 * * Sat Feb 09 2002, Jason Thomas <jason@topic.com.au>,
 *                    Matthew Hawkins <matt@mh.dropbear.id.au>
 * -  Kissed strtok() goodbye
 */

Almost everyone has used the strtok() function at some point, yet it keeps causing trouble. This article takes a closer look at it.

Here is an example:

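#include <stdio.h>
#include <string.h>

int main(void)
{
    char test1[] = "Feng,Ke,Wei";   /* writable array on the stack */
    char *test2 = "Feng,Ke,Wei";    /* pointer to a string literal; used in the variation discussed below */
    char *p;

    p = strtok(test1, ",");
    while (p)
    {
        printf("%s\n", p);
        p = strtok(NULL, ",");
    }
    return 0;
}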

Running result:

Feng

Ke

Wei

However, if p = strtok(test2, ",") is used instead, a memory error occurs (typically a crash). Why? Is it related to the static variable inside strtok()? Let's take a look at its source code:

/***
*strtok.c - tokenize a string with given delimiters
*
*       Copyright (c) Microsoft Corporation. All rights reserved.
*
*Purpose:
*       defines strtok() - breaks string into series of token
*       via repeated calls.
*
*******************************************************************************/
#include <cruntime.h>
#include <string.h>
#ifdef _MT
#include <mtdll.h>
#endif /* _MT */
/***
*char *strtok(string, control) - tokenize string with delimiter in control
*
*Purpose:
*       strtok considers the string to consist of a sequence of zero or more
*       text tokens separated by spans of one or more control chars. The first
*       call, with string specified, returns a pointer to the first char of the
*       first token, and will write a null char into string immediately
*       following the returned token. Subsequent calls with zero for the first
*       argument (string) will work thru the string until no tokens remain. The
*       control string may be different from call to call. When no tokens remain
*       in string a NULL pointer is returned. Remember the control chars with a
*       bit map, one bit per ASCII char. The null char is always a control char.
*       // (Blog author's note: all the details are right here. Better than MSDN!)
*Entry:
*       char *string - string to tokenize, or NULL to get next token
*       char *control - string of characters to use as delimiters
*
*Exit:
*       returns pointer to first token in string, or if string
*       was NULL, to next token
*       returns NULL when no more tokens remain.
*
*Uses:
*
*Exceptions:
*
*******************************************************************************/
char * __cdecl strtok (
        char * string,
        const char * control
        )
{
        unsigned char *str;
        const unsigned char *ctrl = control;
          unsigned char map[32];
        int count;
#ifdef _MT
        _ptiddata ptd = _getptd();
#else  /* _MT */
        static char *nextoken;  // static variable that saves the rest of the string
#endif  /* _MT */
          /* Clear control map */
        for (count = 0; count < 32; count++)
                map[count] = 0;
          /* Set bits in delimiter table */
        do {
                map[*ctrl >> 3] |= (1 << (*ctrl & 7));
        } while (*ctrl++);
        /* Initialize str. If string is NULL, set str to the saved
         * pointer (i.e., continue breaking tokens out of the string
         * from the last strtok call) */
        if (string)
                str = string;           // first call: start from the original string
        else
#ifdef _MT
                str = ptd->_token;
#else  /* _MT */
                str = nextoken;         // later calls (first argument NULL): resume from the saved remainder
#endif  /* _MT */
          /* Find beginning of token (skip over leading delimiters). Note that
         * there is no token iff this loop sets str to point to the terminal
         * null (*str == '\0') */
        while ( (map[*str >> 3] & (1 << (*str & 7))) && *str )
                str++;
        string = str;   // string now points to the start of the token that will be returned
          /* Find the end of the token. If it is not the end of the string,
         * put a null there. */
        // This is the core of the processing: find the next delimiter and overwrite it with '\0', which terminates the returned token.
        for ( ; *str ; str++ )
                if ( map[*str >> 3] & (1 << (*str & 7)) ) {
                        *str++ = '\0';  // this modifies the contents of the original string ①
                        break;
                }
        /* Update nextoken (or the corresponding field in the per-thread data
         * structure */
#ifdef _MT
        ptd->_token = str;
#else  /* _MT */
        nextoken = str;         // save the remainder in the static variable for the next call
#endif  /* _MT */
        /* Determine if a token has been found. */
        if ( string == str )
                return NULL;
        else
                return string;
}


1. strtok Introduction

As we all know, strtok splits a string into tokens according to user-supplied delimiters (the delimiter string may contain more than one delimiter character). It keeps splitting until it reaches the terminating "\0".

For example, with the delimiter "," and the string "Fred,John,Ann", strtok lets us extract the three strings "Fred", "John", and "Ann".

The C code for this is:

int in = 0;
char buffer[] = "Fred,John,Ann";
char *p[4];             /* three tokens plus the final NULL stored by the loop */
char *buf = buffer;
while ((p[in] = strtok(buf, ",")) != NULL) {
    in++;
    buf = NULL;
}

As the code above shows, the first call to strtok must pass the address of the target string as the first argument (buf = buffer), and every later call must pass NULL as the first argument (buf = NULL). The pointer array p[] stores the split result: p[0] = "Fred", p[1] = "John", p[2] = "Ann", and buffer becomes Fred\0John\0Ann\0.

2. The pitfall of strtok
Let's change the scenario. We have the string
"Fred male 25,John male 62,Anna female 16" and we want to parse it and store the fields in a struct:

struct person {
    char name[25];
    char sex[7];    /* "female" needs 6 characters plus the terminating '\0' */
    char age[4];
};

One way to do this is to extract each comma-separated record first and then split that record on spaces.
For example, extract "Fred male 25" and split it into "Fred", "male", and "25".
I wrote a small program to demonstrate this process:

#include <stdio.h>
#include <string.h>
#define INFO_MAX_SZ 255
int main(void)
{
    int in = 0;
    char buffer[INFO_MAX_SZ] = "Fred male 25,John male 62,Anna female 16";
    char *p[20];
    char *buf = buffer;

    while ((p[in] = strtok(buf, ",")) != NULL)
    {
        buf = p[in];
        while ((p[in] = strtok(buf, " ")) != NULL)
        {
            in++;
            buf = NULL;
        }
        p[in++] = "***"; // marks the end of one record
        buf = NULL;
    }

    printf("Here we have %d strings\n", in);
    for (int j = 0; j < in; j++)
        printf(">%s<\n", p[j]);
    return 0;
}

The output of this program is:
Here we have 4 strings
>Fred<
>male<
>25<
>***<
This is only a small part of the data, not what we wanted. But why?
This is because strtok keeps its position in a single static pointer, and the inner loop overwrites the position that the outer loop still needs. Let me walk through how the code above runs:

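In the trace below, "\0" marks a byte that strtok has overwritten, and the comment on each step says where strtok's internal pointer is left.

1. "Fred male 25,John male 62,Anna female 16"
// before the first call of the outer loop; nothing has been modified yet

2. "Fred male 25\0John male 62,Anna female 16"
// the outer call returns "Fred male 25"; the internal pointer is at "John"; enter the inner loop

3. "Fred\0male 25\0John male 62,Anna female 16"
// the inner call returns "Fred"; the internal pointer is at "male"

4. "Fred\0male\025\0John male 62,Anna female 16"
// the inner call returns "male"; the internal pointer is at "25"

5. "Fred\0male\025\0John male 62,Anna female 16"
// the inner call returns "25"; the internal pointer now rests on the "\0" after "25". The next inner call hits that "\0" and returns NULL, so control goes back to the outer loop

6. "Fred\0male\025\0John male 62,Anna female 16"
// the outer call resumes from that same "\0", finds no token, and returns NULL. The loop ends and "John male 62" is never reached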

3. Use strtok_r

In this case we should use strtok_r, the reentrant version of strtok:
char *strtok_r(char *s, const char *delim, char **ptrptr);

Compared with strtok, we must supply a pointer (ptrptr) in which strtok_r keeps its position, instead of relying on a built-in static pointer as strtok does.
Code:

#include <stdio.h>
#include <string.h>
#define INFO_MAX_SZ 255
int main(void)
{
    int in = 0;
    char buffer[INFO_MAX_SZ] = "Fred male 25,John male 62,Anna female 16";
    char *p[20];
    char *buf = buffer;

    char *outer_ptr = NULL;
    char *inner_ptr = NULL;

    while ((p[in] = strtok_r(buf, ",", &outer_ptr)) != NULL)
    {
        buf = p[in];
        while ((p[in] = strtok_r(buf, " ", &inner_ptr)) != NULL)
        {
            in++;
            buf = NULL;
        }
        p[in++] = "***"; // marks the end of one record
        buf = NULL;
    }

    printf("Here we have %d strings\n", in);
    for (int j = 0; j < in; j++)
        printf(">%s<\n", p[j]);
    return 0;
}

This time the output is:
Here we have 12 strings
>Fred<
>male<
>25<
>***<
>John<
>male<
>62<
>***<
>Anna<
>female<
>16<
>***<

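Let me walk through how the program that printed these 12 strings runs:

In the trace below, "\0" marks a byte that strtok_r has overwritten, and the comment on each step says where outer_ptr and inner_ptr are left.

1. "Fred male 25,John male 62,Anna female 16"
// before the first call of the outer loop; nothing has been modified yet

2. "Fred male 25\0John male 62,Anna female 16"
// the outer call returns "Fred male 25"; outer_ptr is at "John"; enter the inner loop

3. "Fred\0male 25\0John male 62,Anna female 16"
// the inner call returns "Fred"; inner_ptr is at "male"

4. "Fred\0male\025\0John male 62,Anna female 16"
// the inner call returns "male"; inner_ptr is at "25"

5. "Fred\0male\025\0John male 62,Anna female 16"
// the inner call returns "25"; inner_ptr rests on the "\0" after "25". The next inner call returns NULL and control goes back to the outer loop

6. "Fred\0male\025\0John male 62\0Anna female 16"
// the outer call resumes from outer_ptr, which was never touched by the inner loop, and returns "John male 62"; outer_ptr moves to "Anna"; the inner loop is entered again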


So it turns out that strtok modifies the original string: each delimiter that ends a token is overwritten with '\0' (see ① in the strtok source listing).

Therefore, when char *test2 = "Feng,Ke,Wei" is passed as the first argument, a memory error occurs: test2 points to a string literal stored in the text constant area, and the contents of that area must not be modified (in C, writing to a string literal is undefined behavior). With char test1[] = "Feng,Ke,Wei", on the other hand, test1 is an array on the stack that holds its own copy of the characters, so it can be modified.

This example should give us a clearer understanding of the text constant area.
