String parsing functions: strtok and strtok_r

Source: Internet
Author: User
Tags strtok

Prototype: char *strtok(char *s, const char *delim);

Usage: #include <string.h>

Function: Splits a string into a sequence of tokens. s is the string to be split, and delim is a string containing the delimiter characters.

NOTE: On the first call, s must point to the string to be split. On every subsequent call for the same string, s should be NULL.
strtok searches s for characters contained in delim and overwrites the delimiter that ends each token with a null character ('\0'), continuing call by call until the whole string has been scanned.
It returns a pointer to the next token. If no token remains, NULL is returned.

Disadvantage: strtok modifies the string it parses. It keeps a static pointer inside the function. At the start, this pointer refers to the first character of the string. When a delimiter character is found, strtok changes that character to '\0' so that the text before it forms a string, and the static pointer moves to the position just after the delimiter. Here is a specific example: declare a string with char *buffer = "ABCD,E"; and then call p = strtok(buffer, ",");
No error is reported during compilation, but when the program runs it typically fails with a "segmentation fault" at the strtok call. strtok modifies the string content, but here the string is a literal, which cannot be written. The difference between char *s = "ABCD"; and char s[] = "ABCD"; is that the first s is just a pointer to a constant string, while the second s is an array with its own memory whose content is initialized to "ABCD". The solution is to change the declaration to char buffer[] = "ABCD,E"; and, if a pointer is needed, to declare another variable that points to the array: char *buf; buf = buffer;

The notes above covered some precautions for using strtok. The rest of this article walks through an application of strtok and introduces the strtok_r function.
 
 
 
Application example:
A classic example found online is splitting a string and storing the pieces in a struct. For example, given the struct
typedef struct person
{
    char name[25];
    char sex[10];
    char age[4];
} Person;
you need to extract the name, sex, and age of each person from the string char buffer[INFO_MAX_SZ] = "Fred male 25,John male 62,Anna female 16";
A feasible approach is a two-level loop. The outer loop uses the comma (,) as the delimiter to separate the substrings for the three people, and the inner loop uses the space as the delimiter to obtain the name, sex, and age. Following this idea, we should be able to implement the desired function. To keep the program simple, we call strtok and first save the substrings one by one in an array of string pointers. At the end of the program, all substrings saved in the pointer array are printed to verify correctness. The program would be as follows:
int in = 0;
char buffer[INFO_MAX_SZ] = "Fred male 25,John male 62,Anna female 16";
char *p[20];
char *buf = buffer;
while ((p[in] = strtok(buf, ",")) != NULL)   /* outer loop: one person per token */
{
    buf = p[in];
    while ((p[in] = strtok(buf, " ")) != NULL)   /* inner loop: one field per token */
    {
        in++;
        buf = NULL;
    }
    buf = NULL;
}
printf("here we have %d strings\n", in);
for (int j = 0; j < in; j++)
{
    printf(">%s<\n", p[j]);
}
 
The result is that only the first person's information is extracted. The program does not run as expected.
The reason is this: in the first pass of the outer loop, strtok changes the comma after "Fred male 25" to '\0', and strtok's internal saved pointer then points to 'J', the character after the comma.
In the first run of the inner loop, "Fred", "male", and "25" are extracted. After "25" is extracted, the internal saved pointer is moved to the '\0' that follows "25".
After the inner loop ends (its strtok call is actually executed four times, the last one returning NULL), the second pass of the outer loop starts. Because the first argument is now NULL, strtok uses the position held in its saved pointer as the starting point.
Unfortunately, that pointer points to '\0'. strtok cannot split an empty string and returns NULL, so the outer loop ends. That is why we only get the first person's information.
It seems that strtok cannot extract the information of several people with a two-level loop. Is there another way? There is.
One workaround is to use the comma and the space as delimiters at the same time and solve the problem in a single loop.
in = 0;
buf = buffer;   /* buffer must hold a fresh, unmodified copy of the string */
while ((p[in] = strtok(buf, ", ")) != NULL)
{
    switch (in % 3)
    {
    case 0:
        printf("Person %d: name\n", in / 3 + 1);
        break;
    case 1:
        printf("Person %d: sex\n", in / 3 + 1);
        break;
    case 2:
        printf("Person %d: age\n", in / 3 + 1);
        break;
    }
    in++;
    buf = NULL;
}
printf("here we have %d strings\n", in);
for (int j = 0; j < in; j++)
{
    printf(">%s<\n", p[j]);
}
Although this program achieves the desired result, it is not a good solution. It requires you to know exactly how many data members the struct contains before extraction.
It is also clearly less intuitive than a two-level loop. If we want to keep the two-level loop structure, is there a suitable function that can replace strtok? Yes: strtok_r.
 

2. strtok_r and Its Usage
strtok_r is the thread-safe version of strtok on Linux. The Windows string.h does not contain it. To use it there, one option is to find the implementation source code from Linux and copy it into your program.
Other methods are also available, such as using the GNU C Library. I downloaded the GNU C Library, found the strtok_r implementation in its source code, and copied it, which can be seen as a combination of the two methods.
The strtok_r prototype is char *strtok_r(char *str, const char *delim, char **saveptr);
The English description of strtok_r is available at http://www.linuxhowtos.org/manpages/3/strtok_r.htm
The strtok_r function is a reentrant version of strtok. The char **saveptr parameter is a pointer to a char * variable. strtok_r uses that variable to save its context between successive calls that parse the same string.
When strtok_r is called for the first time, str must point to the string to be parsed, and the value of *saveptr is ignored. In subsequent calls, str should be NULL, and saveptr should be unchanged since the previous call.
Several different strings may be parsed concurrently by sequences of strtok_r calls, as long as each sequence passes a different saveptr.
The strtok function uses a static buffer while parsing, so it is not thread-safe. Use strtok_r if this matters.
In effect, strtok_r takes the pointer that strtok stores implicitly and exposes it as a parameter, so the caller passes it in, keeps it, and could even modify it. When the caller continues to split the same string, in addition to passing NULL as str, it must also pass the saveptr saved by the previous call.
Recall the struct extraction example above. With strtok_r we can extract everyone's information in a two-level loop:
int in = 0;
char buffer[INFO_MAX_SZ] = "Fred male 25,John male 62,Anna female 16";
char *p[20];
char *buf = buffer;
char *outer_ptr = NULL;
char *inner_ptr = NULL;
while ((p[in] = strtok_r(buf, ",", &outer_ptr)) != NULL)   /* one person per token */
{
    buf = p[in];
    while ((p[in] = strtok_r(buf, " ", &inner_ptr)) != NULL)   /* one field per token */
    {
        in++;
        buf = NULL;
    }
    buf = NULL;
}
printf("here we have %d strings\n", in);
for (int j = 0; j < in; j++)
{
    printf(">%s<\n", p[j]);
}
Compared with the strtok version, the code that calls strtok_r has two more pointers, outer_ptr and inner_ptr. outer_ptr marks the extraction position for each person, that is, for the outer loop. inner_ptr marks the extraction position for each field within a person, that is, for the inner loop. The specific process is as follows:
(1) In the first pass of the outer loop, the initial value of outer_ptr is ignored and the whole source string is scanned. "Fred male 25" is extracted, the delimiter ',' is changed to '\0', and outer_ptr is left pointing to 'J'.
(2) In the first pass of the inner loop, the initial value of inner_ptr is ignored and the result of the outer loop, "Fred male 25", is scanned. "Fred" is extracted, the delimiter ' ' is changed to '\0', and inner_ptr is left pointing to 'm'.
(3) In the second pass of the inner loop, the inner_ptr set by the first pass is passed in and the first argument is NULL. Scanning starts at the 'm' that inner_ptr points to, "male" is extracted, the delimiter ' ' is changed to '\0', and inner_ptr is left pointing to '2'.
(4) In the third pass of the inner loop, the inner_ptr set by the second pass is passed in and the first argument is NULL. Scanning starts at '2' and "25" is extracted. Because no ' ' is found, inner_ptr is left pointing to the '\0' after "25".
(5) In the fourth pass of the inner loop, the inner_ptr set by the third pass is passed in and the first argument is NULL. Because inner_ptr points to '\0', nothing can be extracted and NULL is returned. The inner loop ends.
(6) In the second pass of the outer loop, the outer_ptr set by the first pass is passed in and the first argument is NULL. Scanning starts at the 'J' that outer_ptr points to, "John male 62" is extracted, the delimiter ',' is changed to '\0', and outer_ptr is left pointing to 'A'. (The strtok version stopped at this step.)
...... In the same way, the outer loop extracts all the information of one person at a time, and the inner loop extracts the individual fields from the result of the outer loop.
We can see that strtok_r exposes the formerly internal pointer as the saveptr parameter, which adds flexibility and safety to the function.
 
3. Source Code of strtok and strtok_r
There are many implementations of these two functions. The strtok_r shown here comes from the GNU C Library, and strtok calls strtok_r, so the source code of strtok_r is given first.
/*
 * strtok_r.c:
 * Implementation of strtok_r for systems which don't have it.
 *
 * This is taken from the GNU C Library and is distributed under the terms of
 * the LGPL. See copyright notice below.
 *
 */
#ifdef HAVE_CONFIG_H
#include "configuration.h"
#endif /* HAVE_CONFIG_H */
#ifndef HAVE_STRTOK_R
static const char rcsid[] = "$Id: strtok_r.c,v 1.1 2001/04/24 14:25:34 chris Exp $";

#include <string.h>
#undef strtok_r
/* Parse S into tokens separated by characters in DELIM.
   If S is NULL, the saved pointer in SAVE_PTR is used as
   the next starting point. For example:
   char s[] = "-abc-=-def";
   char *sp;
   x = strtok_r(s, "-", &sp); // x = "abc", sp = "=-def"
   x = strtok_r(NULL, "-=", &sp); // x = "def", sp = NULL
   x = strtok_r(NULL, "=", &sp); // x = NULL
   // s = "abc\0-def\0"
*/
char *strtok_r(char *s, const char *delim, char **save_ptr) {
    char *token;
    if (s == NULL) s = *save_ptr;
    /* Scan leading delimiters. */
    s += strspn(s, delim);
    if (*s == '\0')
        return NULL;
    /* Find the end of the token. */
    token = s;
    s = strpbrk(token, delim);
    if (s == NULL)
        /* This token finishes the string. */
        *save_ptr = strchr(token, '\0');
    else {
        /* Terminate the token and make *save_ptr point past it. */
        *s = '\0';
        *save_ptr = s + 1;
    }
    return token;
}
#endif /* HAVE_STRTOK_R */
 
The overall flow of the code is as follows:
(1) Check whether the parameter s is NULL. If it is NULL, the position stored in save_ptr is used as the starting point. If it is not NULL, splitting starts at s.
(2) Skip all delimiters at the start of the string to be split.
(3) Check whether the current position is '\0'. If it is, return NULL (together with step (1), this explains why NULL is returned once the saved pointer has reached the end of the string). If not, continue.
(4) Save the current position in token and call strpbrk to find a delimiter in token. If none is found, save_ptr is set to the position of the '\0' at the end of the string, and token is left unchanged. If one is found, the delimiter is overwritten with '\0', which truncates (extracts) the token, and save_ptr is set to the position after the delimiter.
(5) Finally, the function returns token, whether or not a delimiter was found.
The strtok function can be understood as keeping the save_ptr of strtok_r in an internal static variable that is invisible to the caller. The code is as follows:
char *strtok(char *s, const char *delim)
{
    static char *last;
    return strtok_r(s, delim, &last);
}
With the implementation of these two functions in hand, the points made in the earlier sections are no longer difficult to understand.
