In-depth analysis of the strtok() function


Everyone has probably run into the strtok() function, yet it always seems to cause some trouble. Here we take a close look at it.

 

First, let's look at the explanation on MSDN:

char *strtok(char *strToken, const char *strDelimit);

Parameters
strToken
String containing token or tokens.
strDelimit
Set of delimiter characters.
Return Value

Returns a pointer to the next token found in strToken. Returns NULL when no more tokens are found. Each call modifies strToken by substituting a null character for each delimiter that is encountered.

Remarks

The strtok function finds the next token in strToken. The set of characters in strDelimit specifies possible delimiters of the token to be found in strToken on the current call.

Security note: These functions incur a potential threat brought about by a buffer overrun problem. Buffer overrun problems are a frequent method of system attack, resulting in an unwarranted elevation of privilege. For more information, see Avoiding Buffer Overruns.

On the first call to strtok, the function skips leading delimiters and returns a pointer to the first token in strToken, terminating the token with a null character. More tokens can be broken out of the remainder of strToken by a series of calls to strtok. Each call to strtok modifies strToken by inserting a null character after the token returned by that call. To read the next token from strToken, call strtok with a NULL value for the strToken argument. The NULL strToken argument causes strtok to search for the next token in the modified strToken. The strDelimit argument can take any value from one call to the next so that the set of delimiters may vary.

Note: Each function uses a static variable for parsing the string into tokens. If multiple or simultaneous calls are made to the same function, a high potential for data corruption and inaccurate results exists. Therefore, do not attempt to call the same function simultaneously for different strings and be aware of calling one of these functions from within a loop where another routine may be called that uses the same function. However, calling this function simultaneously from multiple threads does not have undesirable effects.

Dizzy, right? Haha...

Simply put, the first call returns the first delimiter-separated substring; after that, passing NULL as the first argument makes each call return the next remaining substring.

Here is an example:

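#include <stdio.h>
#include <string.h>

int main(void)
{
    char test1[] = "Feng,Ke,Wei";   /* array: a writable copy of the text */
    char *test2 = "Feng,Ke,Wei";    /* pointer to a string literal, discussed after the output */
    char *p;

    p = strtok(test1, ",");
    while (p)
    {
        printf("%s\n", p);
        p = strtok(NULL, ",");
    }
    return 0;
}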

Running result:

Feng

Ke

Wei

However, if p = strtok(test2, ","); is used instead, a memory error occurs. Why? Is it related to the static variable inside the function? Let's take a look at its source code:

/***
*strtok.c - tokenize a string with given delimiters
*
*       Copyright (c) Microsoft Corporation. All rights reserved.
*
*Purpose:
*       defines strtok() - breaks string into series of token
*       via repeated calls.
*
*******************************************************************************/

#include <cruntime.h>
#include <string.h>
#ifdef _MT
#include <mtdll.h>
#endif  /* _MT */

/***
*char *strtok(string, control) - tokenize string with delimiter in control
*
*Purpose:
*       strtok considers the string to consist of a sequence of zero or more
*       text tokens separated by spans of one or more control chars. the first
*       call, with string specified, returns a pointer to the first char of the
*       first token, and will write a null char into string immediately
*       following the returned token. subsequent calls with zero for the first
*       argument (string) will work thru the string until no tokens remain. the
*       control string may be different from call to call. when no tokens remain
*       in string a NULL pointer is returned. remember the control chars with a
*       bit map, one bit per ascii char. the null char is always a control char.
*       // the details are all given here!! Better than MSDN!
*
*Entry:
*       char *string - string to tokenize, or NULL to get next token
*       char *control - string of characters to use as delimiters
*
*Exit:
*       returns pointer to first token in string, or if string
*       was NULL, to next token
*       returns NULL when no more tokens remain.
*
*Uses:
*
*Exceptions:
*
*******************************************************************************/

char * __cdecl strtok (
        char * string,
        const char * control
        )
{
        unsigned char *str;
        const unsigned char *ctrl = control;

        unsigned char map[32];
        int count;

#ifdef _MT
        _ptiddata ptd = _getptd();
#else  /* _MT */
        static char *nextoken;          // static variable that saves the remainder of the string
#endif  /* _MT */

        /* Clear control map */
        for (count = 0; count < 32; count++)
                map[count] = 0;

        /* Set bits in delimiter table */
        do {
                map[*ctrl >> 3] |= (1 << (*ctrl & 7));
        } while (*ctrl++);

        /* Initialize str. If string is NULL, set str to the saved
         * pointer (i.e., continue breaking tokens out of the string
         * from the last strtok call) */
        if (string)
                str = string;           // first call: the original string passed in
        else
#ifdef _MT
                str = ptd->_token;
#else  /* _MT */
                str = nextoken;         // first argument is NULL: continue from the saved remainder
#endif  /* _MT */

        /* Find beginning of token (skip over leading delimiters). Note that
         * there is no token iff this loop sets str to point to the terminal
         * null (*str == '\0') */
        while ( (map[*str >> 3] & (1 << (*str & 7))) && *str )
                str++;

        string = str;                   // string now marks the start of the token to be returned

        /* Find the end of the token. If it is not the end of the string,
         * put a null there. */
        // This is the core of the processing: find the delimiter and set it to '\0'.
        // That '\0' also terminates the token that is returned.
        for ( ; *str ; str++ )
                if ( map[*str >> 3] & (1 << (*str & 7)) ) {
                        *str++ = '\0';  // this modifies the content of the string  ①
                        break;
                }

        /* Update nextoken (or the corresponding field in the per-thread data
         * structure */
#ifdef _MT
        ptd->_token = str;
#else  /* _MT */
        nextoken = str;                 // save the remainder in the static variable for the next call
#endif  /* _MT */

        /* Determine if a token has been found. */
        if ( string == str )
                return NULL;
        else
                return string;
}

So it turns out that strtok modifies the original string.

Therefore, when char *test2 = "Feng,Ke,Wei" is passed as the first argument, test2 points to a string literal stored in the text constant area. The content of this area must not be modified, so the write at location ① causes a memory error. With char test1[] = "Feng,Ke,Wei", the characters are copied into an array on the stack, so they can be modified.

This should give us a more concrete understanding of the text constant area.

Article Source: http://hi.baidu.com/summy00/blog/item/d1be73a8766226b4ca130cc5.html

 

I saw the usage of the strtok function on hy0kl's blog, "PHP hut"; see the post at http://hi.baidu.com/hy0kl/blog/item/2e7a3224a0303228d40742fc.html

The function looks great at first, but it also raises questions and has some uncomfortable points. It is not obvious why the first argument must be NULL from the second call onward. If str is passed again on the second call, only the first substring is returned again. The reason is that after the first call, the string in str is no longer the original one.

Testing showed that after calling result = strtok(str, delims);, the string in str differs from the original and is now the same as result: it contains only the first substring, "now ".

The original long string in str is effectively cut down to the first substring, so the original string is destroyed, which is very bad.

Let's look at the example.

---------------------

#include <stdio.h>
#include <string.h>

int main(void)
{
    char str[] = "now # is the time for all # good men to come to the # aid of their country";
    char delims[] = "#";
    char *result = NULL;

    printf("before delim str is \"%s\"\n", str);
    result = strtok(str, delims);
    while (result != NULL) {
        printf("result is \"%s\"\n", result);
        result = strtok(NULL, delims);
    }

    printf("after delim str is \"%s\"\n", str);
    return 0;
}

---------------------

With printf("after delim str is \"%s\"\n", str); added after the loop, we can check what str looks like afterwards.

When str is printed, only "now " is output. It is not restored to the original long string.

This feels wrong. Why didn't Microsoft restore str to its previous state when implementing this function? Puzzling...

Appendix: output result

------------------

before delim str is "now # is the time for all # good men to come to the # aid of their country"
result is "now "
result is " is the time for all "
result is " good men to come to the "
result is " aid of their country"
after delim str is "now "

---------------

Article Source: http://hi.baidu.com/flyskymlf/blog/item/1ca249cba1d0571cbf09e667.html
