A detailed explanation of regular-expression-style matching in C (scanf scansets)
%[] usage: %[] reads a string made up of characters from a character set (a scanset). If the first character after [ is ^, the set is inverted.
The set inside [] must contain one or more characters. An empty set (%[]) is invalid and can
lead to unpredictable results. %[^] is invalid as well.
%[a-z] reads a run of characters in the range a-z and stops at the first character outside that range, for example:
char string[32];
char s[] = "hello, my friend"; // note: the comma is not in a-z
sscanf(s, "%[a-z]", string); // string = "hello"
/* sscanf() reads formatted data from the buffer s into string; scanf reads from
standard input (stdin) and fscanf reads from a user-specified file */
%[^a-z] reads a run of characters that are not in a-z and stops at the first character in a-z, for example:
char s[] = "HELLOkitty"; // note: 'k' is the first character in a-z
sscanf(s, "%[^a-z]", string); // string = "HELLO"
%*[^=] A leading * suppresses assignment: the matching characters are skipped and not stored in any variable.
char s[] = "notepad=1.0.0.1001";
char szfilename[32] = "";
int i = sscanf(s, "%*[^=]", szfilename); // szfilename stays empty because nothing is stored; i = 0
i = sscanf(s, "%*[^=]=%s", szfilename); // szfilename = "1.0.0.1001"; i = 1
%40c reads 40 characters.
The run-time library does not automatically append a null terminator to the string, nor does reading 40 characters automatically terminate the scanf() function. Because the library uses buffered input, you must press the ENTER key to terminate the string scan. If you press ENTER before scanf() reads 40 characters, it is displayed normally, and the library continues to prompt for additional input until it reads 40 characters.
%[^=] reads the string up to, but not including, the first '='. More characters can follow the '^', for example:
char s[] = "notepad=1.0.0.1001";
char szfilename[32] = "";
int i = sscanf(s, "%[^=]", szfilename); // szfilename = "notepad"
If the format is %[^=:], notepad can also be read from "notepad:1.0.0.1001".
Example:
char s[] = "notepad=1.0.0.1001";
char szname[32] = "";
char szver[32] = "";
sscanf(s, "%[^=]=%s", szname, szver); // szname = "notepad", szver = "1.0.0.1001"
Summary: %[] is powerful, but it is not widely used, mainly because:
1. The scanf implementations on many systems have bugs. (A typical case: TC sometimes fails when reading floating-point numbers.)
2. Its usage is complicated and error-prone.
3. It is difficult for the compiler to analyze the format, which affects the quality and execution efficiency of the generated code.
I personally think the third point is the most critical: the more complicated the feature, the lower the execution efficiency. Simple string parsing can be done by hand instead.
Supplement:
Regular expressions in C/C++: sscanf, scanf, and fscanf
Languages differ in how much support they offer for regular expressions. In C, these three input functions offer only limited, regex-like support, but it is still worth knowing.
First, let's look at their prototypes:
#include <stdio.h>
int scanf(const char *format, ...);
int fscanf(FILE *stream, const char *format, ...);
int sscanf(const char *str, const char *format, ...);
All three take a variable number of arguments. sscanf is similar to scanf, except that scanf uses standard input (stdin) as its input source while sscanf reads from the string str. The most important part is the format parameter. It consists of one or more of {%[*][width][{h | l | I64 | L}]type | ' ' | non-% characters}.
Parameter description:
1. * can be used in a conversion (for example %*d and %*s). The asterisk means the matched data is skipped, that is, it is not stored in any argument.
2. {a | b | c} means one of a, b, or c; [d] means d is optional.
3. width is the maximum number of characters to read for the field.
4. {h | l | I64 | L} is the size of the argument. Generally, h indicates short (typically 2 bytes), l indicates long (with floating-point types, l indicates double), L indicates long double, and I64 (a Microsoft extension) indicates an 8-byte integer.
5. type is the conversion itself: %s, %d, and so on.
6. Special case: %*[width][{h | l | I64 | L}]type means that input matching this conversion is filtered out and no value is written to a target argument.
Supported set operations:
%[a-z] matches any character from a to z, greedy (as many as possible)
%[ab'] matches a, b, and ', greedy
%[^a] matches any character other than a, greedy
Return Value
All three functions return the number of input items that were successfully matched and assigned, which depends on the conversions in the format string. The returned value can be smaller than the number of arguments you provide (some conversions may fail to match). If matching fails before the first assignment, 0 is returned. If the end of the input is reached before any conversion, EOF is returned, and EOF is also returned when a read error occurs. In that case you can inspect errno to find the cause.
Using fscanf to test for the end of a file is risky. If a read fails because the input does not match the format, the return value is 0 rather than EOF, the offending characters stay in the stream, and a loop that only waits for EOF never ends. All functions of the scanf family work on buffered input: data is first read into a buffer and then parsed from that buffer.
Note: Functions of the scanf family skip leading whitespace, including blank lines, before most conversions. The exceptions are %c, %[ and %n.