C language method of processing CSV file (ii)

Source: Internet
Author: User
Tags: strtok

The use of the strtok function is a well-worn topic. The function is important, and it is also a source of much controversy. Some of the statements below may differ from other material or from your own understanding, so I have tried to back each of them with an experiment. The experimental environment is Win7 + VC6.0, a very ordinary setup. Most of the source code in this article comes from the Internet and has been slightly modified for illustration. My knowledge is limited and mistakes are unavoidable; please bear with me, and do run more experiments of your own to verify the claims.

The function prototype of strtok is:

    char *strtok(char *s, const char *delim);

Its behavior is described as:

"Parse S into tokens separated by characters in DELIM. If S is NULL, the saved pointer in SAVE_PTR is used as the next starting point."

In other words: the function operates on the string s and uses the characters contained in delim as delimiters to cut s into substrings (tokens); if s is NULL, the function uses the pointer save_ptr, saved by the previous call, as the starting position.

The return value is a pointer to the substring that was split off.

This definition differs from what some Chinese websites say, and it is precisely these differences that leave many people without a correct understanding of strtok. Readers are encouraged to read the official documentation (mostly in English) before calling a function, rather than relying on unfounded assertions.

The following points need to be noted for using strtok:

1. The function decomposes a string. "Decomposition" means that no new string is created; the function only tampers with the contents that s points to. Therefore, the source string s is changed!

Let the source string s be char buffer[info_max_sz] = ",Fred male 25,john male 62,anna female 16"; and let the delimiter string be char *delim = " "; that is, a space is the delimiter.

    #include <stdio.h>
    #include <string.h>

    int main()
    {
        char buffer[] = ",Fred male 25,john male 62,anna female 16";
        char *buf;
        char *delima = " ";             /* the delimiter is a single space */

        buf = strtok(buffer, delima);   /* first call: pass the source string */

        return 0;
    }

The above code produces this result:

First, the buffer has changed. If you print buffer at this point, only ",Fred" is displayed, while the rest, "male 25 ... 16", has disappeared. What actually happens is that strtok looks for the first occurrence of a character from delim, which here is the space after Fred (buffer[5]), and overwrites it with '\0'. The other positions are unchanged. This explains why printing buffer shows only ",Fred" rather than the entire contents. Therefore, be careful when using strtok, and take care that the source string is not modified unintentionally.

Once the change in buffer is understood, the return value is easy to explain. The return value buf points to the substring before the delimiter (this is not entirely accurate; see "3" for a detailed description of the return value). Note that the address held in the variable shows that buf still points into the source string.

The delimiter string delim is not changed, so it needs no further discussion.

2. To continue extracting from the source string s after the first substring has been extracted, pass NULL as the first argument of strtok in every subsequent call (the second, the third, ... the nth).

    #include <stdio.h>
    #include <string.h>

    int main()
    {
        char buffer[] = ",Fred male 25,john male 62,anna female 16";
        char *buf;
        char *delima = " ";

        buf = strtok(buffer, delima);   /* first call: pass the source string */
        buf = strtok(NULL, delima);     /* later calls: pass NULL */
        buf = strtok(NULL, delima);
        buf = strtok(NULL, delima);

        return 0;
    }

As described earlier, the first call extracts ",Fred". We want to keep using the space as the delimiter and extract the following "male" and so on. As the code shows, every call after the first passes NULL as the first argument of strtok, which tells the function to continue decomposing from the position implicitly saved by the previous call. For the second call above, the first call ended with the saved pointer (called the this pointer in this article) pointing to the character right after the delimiter, that is, the 'm', so the substrings can be extracted one after another.

  

And so on for the remaining calls.

As for why NULL must be passed, you can either memorize the conclusion or check the source code of strtok. A brief introduction is given at the end of this article.

Of course, some people like to go against the grain and want to see what happens if buffer, rather than NULL, is passed again. The answer can be worked out by reasoning. Passing buffer again makes the function search for the delimiter from the beginning of the string. By this time buffer has already been modified (its visible part is only ",Fred"), so no delimiter is found and the call simply returns ",Fred" again.

3. The return value of the function

As described in "1", when a substring is extracted, the return value of strtok (assume it is assigned to the pointer buf) is a pointer to the extracted substring. It points to the starting position of the substring inside the source string. The character right after the end of the substring was a delimiter before the extraction and is changed to '\0' by it. Therefore, printing buf outputs the contents of the substring.

What does the function return if no delimiter is found in the string?

    #include <stdio.h>
    #include <string.h>

    int main()
    {
        char buffer[] = ",Fred male 25,john male 62,anna female 16";
        char *buf;
        char *delima = "+";             /* '+' does not occur in buffer */

        buf = strtok(buffer, delima);

        return 0;
    }

You can see that buffer does not contain the delimiter given in delim. After calling strtok, buf holds the whole string: because no delimiter was found, the source string buffer is unchanged, buf points to the first address of the source string, and printing it outputs the entire string.

When does the function return NULL?

Baidu Encyclopedia says: "Returns NULL when there is no string left to split." This statement is quite vague. To understand exactly what it means, it helps to look at how strtok is implemented. Let us first explain it with an experiment.

Take the source string "Fred male 25" with a space as the delimiter, and call strtok four times.

On the first call, buf clearly points to "Fred". The this pointer points to the character 'm' in buffer.

On the second call, the first argument is NULL, so the function continues decomposing from the position of the this pointer saved by the previous call, that is, it decomposes "male 25". When the call finishes, buf points to "male".

On the third call the argument is again NULL, so decomposition continues from the position saved by the second call, that is, it decomposes "25". No further delimiter can be found, so buf points to "25", and the this pointer now points to the '\0' at the end of "25".

On the fourth call the argument is still NULL. The this pointer saved by the third call already points to the '\0' at the end of the string, so nothing more can be decomposed and the function returns NULL. This is what Baidu Encyclopedia means by "returns NULL when there is no string left to split".

4. The delimiter parameter delim (delim is a set of delimiters)

Many people take it for granted that, when splitting a string, strtok matches the delimiter string delim as a whole. For example, with delim = "ab" and the string "acdab", they expect the function to extract "acd". At least that is what I thought when I first used it. In fact this is wrong; I only discovered the problem when reading the function's source code. Consider the following example.

The source string is buffer and delim consists of a comma and a space. Following the usual assumption, we would expect buf to be "Fred,male,25" after the call. Is that really the result?

After the first call the result is "Fred", not what we expected. Why is that?

Go back to the GNU C Library description of strtok: "Parse S into tokens separated by characters in DELIM". This means that every character contained in delim can act as a delimiter; delim is not matched as a whole string. delim can be understood as a set of delimiter characters. This is very important.

Of course, when splitting strings we seldom use more than one delimiter. As a result, many people write examples with only a single delimiter, and even more people misunderstand the role of delim after reading such examples.

5. The first character of the string to be decomposed is a delimiter

A string whose first character is a delimiter is not a particularly special case. It can be decomposed correctly by following the usual approach.

What I want to point out is that strtok handles this situation in a quicker way than ordinary processing would.

As the example shows, with a comma as the delimiter the string "Fred male 25" is obtained with just one call, and the ',' in front of the F is ignored. This shows that strtok skips delimiters at the start position when it is called. This can be verified from the source code of strtok.

6. Do not pass a string constant as the first argument!

The examples in this article store the source string in a character array variable. If the source string is defined as a string constant instead, the program can be expected to fail (typically with an access violation), because strtok attempts to modify the source string and modifying a string literal is undefined behavior.

This article has described in detail the points to note when using strtok. In (ii) I will describe some things that strtok cannot do, which leads to the strtok_r function, and finally introduce the implementation of the two functions.
