Contents
- 1. An application example
- 2. strtok_r and Its Usage
- 3. Source Code of strtok and strtok_r
In part (1), I covered some precautions for using the strtok function. This article walks through an application of strtok and introduces the strtok_r function.
1. An application example
A classic example found online is splitting a string and storing the pieces in a struct. For example, given the struct
typedef struct person {
    char name[25];
    char sex[10];
    char age[4];
} Person;
you need to extract each person's name, sex, and age from the string char buffer[info_max_sz] = "Fred male 25,John male 62,Anna female 16";
A feasible approach is a two-level loop. The outer loop uses the comma (,) as the delimiter to separate the records of the three people. Then, for each substring, the inner loop uses the space (' ') as the delimiter to extract that person's name, sex, and age.
Following this idea, we should be able to implement the required function. To keep the program simple, we call strtok and save the substrings one by one in an array of string pointers, then print every saved substring at the end of the program to check the result. The program would look like this:
/* info_max_sz is assumed to be a constant defined elsewhere */
int in = 0;
char buffer[info_max_sz] = "Fred male 25,John male 62,Anna female 16";
char *p[20];
char *buf = buffer;
/* outer loop: one record per comma-separated substring */
while ((p[in] = strtok(buf, ",")) != NULL)
{
    buf = p[in];
    /* inner loop: one field per space-separated substring */
    while ((p[in] = strtok(buf, " ")) != NULL)
    {
        in++;
        buf = NULL;
    }
    buf = NULL;
}
printf("Here we have %d strings\n", in);
for (int j = 0; j < in; j++)
{
    printf(">%s<\n", p[j]);
}
The result is that only the first person's information is extracted. The program does not behave as expected. Why?
The reason is as follows. In the first pass of the outer loop, strtok changes the comma after "Fred male 25" to '\0', and at that point strtok's internal this pointer (the position it saves between calls) points to 'J', the character following the comma. The first run of the inner loop then extracts "Fred", "male", and "25" in turn. Because the inner loop passes a non-NULL first argument, it overwrites that saved position; after "25" is extracted, the this pointer points to the '\0' that follows "25". When the inner loop ends (it actually executes four calls, the last of which returns NULL), the second pass of the outer loop begins. Since the first argument is now NULL, strtok starts splitting from the position held by the this pointer. Unfortunately, that position is '\0'; strtok cannot split an empty string and returns NULL, so the outer loop ends. That is why we get only the first person's information.
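To see the failure in isolation, here is a minimal sketch that wraps the nested strtok loops in a function and returns the number of tokens stored. The helper name nested_strtok_split and its parameters are made up for illustration; only strtok itself comes from the C library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: runs the nested strtok loops on `text` and stores
   token pointers in `out` (at most `max`). Returns how many were stored. */
static int nested_strtok_split(char *text, char *out[], int max)
{
    assert(text != NULL && out != NULL && max > 0);

    int n = 0;
    char *buf = text;

    while (n < max && (out[n] = strtok(buf, ",")) != NULL) {
        buf = out[n];
        /* A non-NULL first argument replaces strtok's single saved position. */
        while (n < max && (out[n] = strtok(buf, " ")) != NULL) {
            n++;
            buf = NULL;
        }
        /* The outer call now resumes where the inner scan stopped,
           not after the comma it found earlier. */
        buf = NULL;
    }
    return n;
}
```

Called on a writable copy of "Fred male 25,John male 62,Anna female 16", this returns 3 and stores only "Fred", "male", and "25", which matches the behavior described above.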
It seems that strtok cannot extract the information of several people with a two-level loop. Is there another way? Certainly.
Here is one solution: treat both ',' (comma) and ' ' (space) as delimiters and split the string in a single loop.
/* buffer, p and buf are declared and initialized as in the previous listing */
in = 0;
while ((p[in] = strtok(buf, ", ")) != NULL)
{
    switch (in % 3)
    {
    case 0:
        printf("Person %d: Name!\n", in / 3 + 1);
        break;
    case 1:
        printf("Person %d: Sex!\n", in / 3 + 1);
        break;
    case 2:
        printf("Person %d: Age!\n", in / 3 + 1);
        break;
    }
    in++;
    buf = NULL;
}
printf("Here we have %d strings\n", in);
for (int j = 0; j < in; j++)
{
    printf(">%s<\n", p[j]);
}
Although this program produces the desired result, it is not a good solution. It requires you to know exactly how many data members the struct contains, and it is clearly less intuitive than a double loop.
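For comparison, the single-loop idea can also fill the Person struct directly. The following is only a sketch: parse_people_flat and its parameters are hypothetical names, and the hard-coded 3 is exactly the knowledge about the struct layout that makes this approach fragile.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct person {
    char name[25];
    char sex[10];
    char age[4];
} Person;

/* Hypothetical helper: splits on ',' and ' ' in one loop and uses the
   running token index to decide which field each token belongs to.
   Returns the number of complete records written to `people`. */
static int parse_people_flat(char *text, Person *people, int max_people)
{
    assert(text != NULL && people != NULL && max_people > 0);

    int in = 0;
    char *tok = strtok(text, ", ");

    while (tok != NULL && in / 3 < max_people) {
        Person *who = &people[in / 3];
        switch (in % 3) {   /* 3 = number of fields in Person */
        case 0: snprintf(who->name, sizeof who->name, "%s", tok); break;
        case 1: snprintf(who->sex,  sizeof who->sex,  "%s", tok); break;
        case 2: snprintf(who->age,  sizeof who->age,  "%s", tok); break;
        }
        in++;
        tok = strtok(NULL, ", ");
    }
    return in / 3;
}
```

If a field were added to Person, both the modulus and the switch would have to change, and a record with a missing field would silently shift every later token into the wrong slot.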
If we want to keep the double-loop structure, is there a suitable function that can replace strtok? Yes: strtok_r.
2. strtok_r and Its Usage
strtok_r is the thread-safe version of the strtok function on Linux. The Windows string.h does not contain it. To use it there, one option is to find the implementation source code from Linux and copy it into your program. Other methods should also work, such as using the GNU C Library. I downloaded the GNU C Library, found the strtok_r implementation in its source code, and copied it, which can be seen as a combination of the first and second methods.
The function prototype of strtok_r is: char *strtok_r(char *str, const char *delim, char **saveptr);
The following is the English description of strtok_r from http://www.linuxhowtos.org/manpages/3/strtok_r.htm; my own rendering follows each passage.
The strtok_r() function is a reentrant version of strtok(). The saveptr argument is a pointer to a char * variable that is used internally by strtok_r() in order to maintain context between successive calls that parse the same string.
The strtok_r function is a reentrant version of the strtok function. The char **saveptr parameter is a pointer to a char * variable; strtok_r uses it internally to preserve context across successive calls that split the same source string.
On the first call to strtok_r(), str should point to the string to be parsed, and the value of saveptr is ignored. In subsequent calls, str should be NULL, and saveptr should be unchanged since the previous call.
When strtok_r is called for the first time, the str parameter must point to the string to be split, and the value of saveptr is ignored. In subsequent calls, str must be NULL, and saveptr must hold the value left by the previous call; do not modify it.
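The calling convention just described can be sketched as a small helper. The name split_tokens and its parameters are assumptions for illustration; the _POSIX_C_SOURCE definition is there because strtok_r is a POSIX function and some compilers hide it in strict ISO C mode.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `text` on the characters in `delim` and stores
   up to `max` token pointers in `out`. Returns the number stored. */
static int split_tokens(char *text, const char *delim, char *out[], int max)
{
    assert(text != NULL && delim != NULL && out != NULL && max > 0);

    char *save = NULL;  /* context owned by the caller, not by the library */
    int n = 0;

    /* First call: pass the string; the initial value of `save` is ignored. */
    char *tok = strtok_r(text, delim, &save);
    while (tok != NULL && n < max) {
        out[n++] = tok;
        /* Later calls: pass NULL and the same, untouched `save`. */
        tok = strtok_r(NULL, delim, &save);
    }
    return n;
}
```

As with strtok, the input must be writable, because each delimiter that ends a token is overwritten with '\0'.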
Different strings may be parsed concurrently using sequences of calls to strtok_r() that specify different saveptr arguments.
Several different strings can be split concurrently by separate sequences of strtok_r calls, as long as each sequence passes its own saveptr argument.
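As a rough illustration of parsing two strings at once, the sketch below walks two strings in lockstep, each with its own save pointer. The function count_matching_tokens is a hypothetical example, not something taken from the man page.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: tokenizes `a` and `b` side by side and counts the
   positions at which both strings have the same token. */
static int count_matching_tokens(char *a, char *b, const char *delim)
{
    assert(a != NULL && b != NULL && delim != NULL);

    char *save_a = NULL;    /* context for string a */
    char *save_b = NULL;    /* context for string b */
    char *ta = strtok_r(a, delim, &save_a);
    char *tb = strtok_r(b, delim, &save_b);
    int same = 0;

    while (ta != NULL && tb != NULL) {
        if (strcmp(ta, tb) == 0)
            same++;
        /* The calls are interleaved, yet neither disturbs the other. */
        ta = strtok_r(NULL, delim, &save_a);
        tb = strtok_r(NULL, delim, &save_b);
    }
    return same;
}
```

The same interleaving written with strtok would not work, because the second string's first call would replace the position saved for the first string.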
The strtok() function uses a static buffer while parsing, so it's not thread safe. Use strtok_r() if this matters to you.
The strtok function uses a static buffer when splitting strings, so it is not thread-safe. If thread safety matters, use strtok_r.
In effect, strtok_r takes the this pointer that strtok keeps implicitly and exposes it to the caller as a parameter. The caller passes it in, stores it, and can even modify it. When the caller wants to continue splitting the same source string, it must, in addition to passing NULL as str, pass the saveptr left by the previous call.
For example, remember the struct extraction example above? With strtok_r we can extract everyone's information in a double loop.
/* info_max_sz is assumed to be a constant defined elsewhere */
int in = 0;
char buffer[info_max_sz] = "Fred male 25,John male 62,Anna female 16";
char *p[20];
char *buf = buffer;
char *outer_ptr = NULL;
char *inner_ptr = NULL;
/* outer loop: one record per comma-separated substring */
while ((p[in] = strtok_r(buf, ",", &outer_ptr)) != NULL)
{
    buf = p[in];
    /* inner loop: one field per space-separated substring */
    while ((p[in] = strtok_r(buf, " ", &inner_ptr)) != NULL)
    {
        in++;
        buf = NULL;
    }
    buf = NULL;
}
printf("Here we have %d strings\n", in);
for (int j = 0; j < in; j++)
{
    printf(">%s<\n", p[j]);
}
Compared with the strtok version, the code that calls strtok_r has two extra pointers, outer_ptr and inner_ptr. outer_ptr marks the extraction position for each person, that is, for the outer loop; inner_ptr marks the extraction position for each item within a person, that is, for the inner loop. The process is as follows:
(1) In the first pass of the outer loop, the value of outer_ptr is ignored and the whole source string is scanned. "Fred male 25" is extracted, the delimiter ',' is changed to '\0', and outer_ptr is left pointing to 'J'.
(2) In the first pass of the inner loop, the value of inner_ptr is ignored and the result of the first outer pass, "Fred male 25", is scanned. "Fred" is extracted, the delimiter ' ' is changed to '\0', and inner_ptr is left pointing to 'm'.
(3) The second pass of the inner loop receives the inner_ptr left by the first pass. The first argument is NULL, so scanning starts at the 'm' that inner_ptr points to. "male" is extracted, the delimiter ' ' is changed to '\0', and inner_ptr is left pointing to '2'.
(4) The third pass of the inner loop receives the inner_ptr left by the second pass. The first argument is NULL, so scanning starts at the '2' that inner_ptr points to. "25" is extracted; because no ' ' is found, inner_ptr is left pointing to the '\0' after "25".
(5) The fourth pass of the inner loop receives the inner_ptr left by the third pass. The first argument is NULL; because inner_ptr points to '\0', nothing can be extracted and NULL is returned. The inner loop ends.
(6) The second pass of the outer loop receives the outer_ptr left by the first outer pass. The first argument is NULL, so scanning starts at the 'J' that outer_ptr points to. "John male 62" is extracted, the delimiter ',' is changed to '\0', and outer_ptr is left pointing to 'A'. (The strtok version fails at exactly this step.)
...... The rest proceeds in the same way: each pass of the outer loop extracts all the information of one person, and the inner loop extracts the individual items from the result of the outer loop.
As we can see, strtok_r makes the formerly internal pointer explicit by exposing it as the saveptr parameter, which makes the function more flexible and safer.
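Putting the pieces together, the double loop can fill the Person struct from the beginning of the article. This is a sketch under assumptions: parse_people, its parameters, and the decision to skip records with fewer than three fields are my own choices, not part of the original example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct person {
    char name[25];
    char sex[10];
    char age[4];
} Person;

/* Hypothetical helper: the outer loop splits records on ',', the inner loop
   splits fields on ' '. Returns the number of complete records stored. */
static int parse_people(char *text, Person *people, int max_people)
{
    assert(text != NULL && people != NULL && max_people > 0);

    char *outer_ptr = NULL;
    int count = 0;

    for (char *rec = strtok_r(text, ",", &outer_ptr);
         rec != NULL && count < max_people;
         rec = strtok_r(NULL, ",", &outer_ptr)) {
        char *inner_ptr = NULL;
        char *fields[3] = { NULL, NULL, NULL };
        int nf = 0;

        for (char *tok = strtok_r(rec, " ", &inner_ptr);
             tok != NULL && nf < 3;
             tok = strtok_r(NULL, " ", &inner_ptr))
            fields[nf++] = tok;

        if (nf < 3)
            continue;   /* assumption: incomplete records are skipped */

        Person *who = &people[count++];
        snprintf(who->name, sizeof who->name, "%s", fields[0]);
        snprintf(who->sex,  sizeof who->sex,  "%s", fields[1]);
        snprintf(who->age,  sizeof who->age,  "%s", fields[2]);
    }
    return count;
}
```

Because the outer position lives in outer_ptr, a malformed record in the middle of the string does not affect the records that follow it.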
3. Source Code of strtok and strtok_r
There are many implementations of these two functions. My strtok_r comes from the GNU C Library, in which strtok calls strtok_r, so the source code of strtok_r is given first.
/* Parse S into tokens separated by characters in DELIM.
   If S is NULL, the saved pointer in SAVE_PTR is used as
   the next starting point.  For example:
        char s[] = "-abc-=-def";
        char *sp;
        x = strtok_r(s, "-", &sp);      // x = "abc", sp = "=-def"
        x = strtok_r(NULL, "-=", &sp);  // x = "def", sp = NULL
        x = strtok_r(NULL, "=", &sp);   // x = NULL
                // s = "abc\0-def\0"
*/
char *strtok_r(char *s, const char *delim, char **save_ptr)
{
    char *token;

    if (s == NULL)
        s = *save_ptr;

    /* Scan leading delimiters. */
    s += strspn(s, delim);
    if (*s == '\0')
        return NULL;

    /* Find the end of the token. */
    token = s;
    s = strpbrk(token, delim);
    if (s == NULL)
        /* This token finishes the string. */
        *save_ptr = strchr(token, '\0');
    else {
        /* Terminate the token and make *SAVE_PTR point past it. */
        *s = '\0';
        *save_ptr = s + 1;
    }
    return token;
}
The overall flow of the code is as follows:
(1) Check whether the parameter s is NULL. If it is, use the position stored in save_ptr as the starting point; otherwise start splitting at s.
(2) Skip any delimiters at the beginning of the string to be split.
(3) Check whether the current position is '\0'. If it is, return NULL (compare the explanation of the NULL return value in part (1)); otherwise continue.
(4) Save the current position in token and call strpbrk to look for a delimiter in token. If none is found, set save_ptr to the position of the '\0' at the end of the string and leave token unchanged. If one is found, overwrite the delimiter with '\0' so that token is cut off (extracted), and make save_ptr point to the character after the delimiter.
(5) Finally, whether or not a delimiter was found, return token.
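The five steps can be mapped onto code one by one. The sketch below is a variation written for illustration, not the glibc source: it uses strcspn where the listing uses strpbrk, it also stores the end position in step (3), and the name tok_r_sketch is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical re-implementation, annotated with the step numbers above. */
static char *tok_r_sketch(char *s, const char *delim, char **save_ptr)
{
    assert(delim != NULL && save_ptr != NULL);

    if (s == NULL)              /* (1) continue from the saved position */
        s = *save_ptr;

    s += strspn(s, delim);      /* (2) skip leading delimiters */

    if (*s == '\0') {           /* (3) nothing left to split */
        *save_ptr = s;          /*     later calls will also return NULL */
        return NULL;
    }

    char *token = s;            /* (4) the token starts here ... */
    s += strcspn(s, delim);     /*     ... and ends at the next delimiter */
    if (*s != '\0')
        *s++ = '\0';            /*     cut the token, step past the delimiter */
    *save_ptr = s;

    return token;               /* (5) return the token either way */
}
```

Running the example from the source comment through this sketch gives "abc" with the saved pointer at "=-def", then "def", then NULL. After "def" the saved pointer refers to the terminating '\0' rather than being NULL, which is also what the listed code does in its strchr branch.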
strtok can be understood as strtok_r with save_ptr kept in an internal static variable that is invisible to the caller. The code is as follows:
char *strtok(char *s, const char *delim)
{
    static char *last;

    return strtok_r(s, delim, &last);
}
With the implementation code of these two functions in hand, some of the points made in parts (1) and (2) are no longer hard to understand.
I have spent this much space on the two functions for two reasons. First, many people seriously misunderstand strtok, and there is very little discussion of it online, so I felt a more comprehensive summary was needed. Second, writing it was itself a learning process; pulling the material together taught me far more than just these two functions.