If you have used sed, awk, grep, or vi on Linux, regular expressions will already be familiar. Because they greatly simplify string processing, they are used in many Linux utilities. Regular expressions are not exclusive to scripting languages such as Perl, Python, and Bash: C programmers can use them in their own programs as well.
Standard C has no regular expression support, and C++ had none before C++11 introduced <regex>, but several libraries fill the gap for C/C++ programmers. The best known is Philip Hazel's Perl-Compatible Regular Expressions (PCRE) library, which is included in many Linux distributions.
The following is a brief introduction to the regular expression interface that is built into the C library on Linux and declared in <regex.h>.
1. Compile a regular expression
Before a string can be matched against a regular expression, the expression must be compiled with the regcomp() function into a regex_t structure. Compiling once makes repeated matching efficient:
int regcomp(regex_t *preg, const char *regex, int cflags);
The regex parameter is a string containing the regular expression to be compiled. The preg parameter points to a regex_t structure that receives the compilation result. The cflags parameter determines how the regular expression is processed.
If regcomp() succeeds, it stores the compilation result in preg and returns 0. Any other return value indicates an error.
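As an illustration of the cflags parameter, here is a minimal sketch that compiles a pattern with two of the flags POSIX defines for regcomp(): REG_EXTENDED (extended syntax) and REG_ICASE (ignore case). The other two are REG_NOSUB and REG_NEWLINE. The helper name compile_pattern is hypothetical and not part of <regex.h>.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper (not part of <regex.h>): compile `pattern` as a
 * case-insensitive extended regular expression.
 * Returns 0 on success, or the nonzero regcomp() error code. */
int compile_pattern(regex_t *preg, const char *pattern)
{
    /* REG_EXTENDED selects POSIX extended syntax; REG_ICASE ignores case. */
    int rc = regcomp(preg, pattern, REG_EXTENDED | REG_ICASE);

    if (rc != 0)
        fprintf(stderr, "regcomp failed for '%s' (code %d)\n", pattern, rc);
    return rc;
}
```

A caller would declare a regex_t, pass its address to compile_pattern(), and release it with regfree() once the compiled expression is no longer needed.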
2. Match a regular expression
Once the regular expression has been compiled successfully with regcomp(), call the regexec() function to perform the pattern matching:
int regexec(const regex_t *preg, const char *string, size_t nmatch, regmatch_t pmatch[], int eflags);
typedef struct {
    regoff_t rm_so;
    regoff_t rm_eo;
} regmatch_t;
The preg parameter points to the compiled regular expression, and string is the string to be matched. The nmatch and pmatch parameters return the match positions to the caller, and the last parameter, eflags, controls details of the matching. regexec() returns 0 if a match is found and REG_NOMATCH if not.
regexec() reports only the first match in the string, even if the pattern matches in several places. The pmatch array receives the match positions, and nmatch is the maximum number of entries that regexec() may fill in. On success, pmatch[0] describes the entire match. If the regular expression passed to regcomp() contains parenthesized subexpressions, the text matched by each one is recorded in pmatch starting at index 1. In each regmatch_t entry, rm_so is the offset of the first matched character and rm_eo is the offset just past the last one. Entries that correspond to no match have rm_so and rm_eo set to -1.
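The way pmatch records parenthesized subexpressions can be sketched as follows. This is only an illustration: the helper names copy_span and split_key_value and the "key=value" pattern are assumptions made for the example, not part of the interface.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copy the span described by `m` out of `text` into
 * `dst`, truncating to fit `size` bytes (size must be at least 1). */
static void copy_span(char *dst, size_t size, const char *text, regmatch_t m)
{
    size_t n = (size_t)(m.rm_eo - m.rm_so);

    if (n >= size)
        n = size - 1;
    memcpy(dst, text + m.rm_so, n);
    dst[n] = '\0';
}

/* Hypothetical helper: find the first "name=digits" pair in `text` and copy
 * the two parenthesized submatches into `key` and `value`.
 * Returns 0 on a match, REG_NOMATCH if none, or another regex error code. */
int split_key_value(const char *text, char *key, char *value, size_t size)
{
    regex_t re;
    regmatch_t pm[3];   /* pm[0]: whole match; pm[1], pm[2]: the two groups */
    int rc;

    rc = regcomp(&re, "([A-Za-z_]+)=([0-9]+)", REG_EXTENDED);
    if (rc != 0)
        return rc;

    rc = regexec(&re, text, 3, pm, 0);
    if (rc == 0) {
        copy_span(key, size, text, pm[1]);
        copy_span(value, size, text, pm[2]);
    }
    regfree(&re);
    return rc;
}
```

Calling split_key_value("set width=640 now", key, value, sizeof key) is expected to leave "width" in key and "640" in value, since regexec() stops at the first match.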
3. Release a regular expression
When a compiled regular expression is no longer needed, call the regfree() function to release it and avoid a memory leak.
void regfree(regex_t *preg);
regfree() returns nothing. Its only argument is a pointer to the regex_t structure that was filled in by an earlier call to regcomp().
If a program calls regcomp() several times on the same regex_t structure, the POSIX standard does not say whether regfree() must be called before each recompilation. Even so, it is advisable to call regfree() once for every successful compilation so that the storage is released as early as possible.
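A minimal sketch of this recommendation, reusing one regex_t for several patterns and freeing it after each successful compilation. The helper name count_matching_patterns is hypothetical, and REG_NOSUB is used only because this example does not need match positions.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: count how many of the `n` patterns match `text`,
 * reusing a single regex_t and releasing it after every successful compile. */
int count_matching_patterns(const char *const *patterns, size_t n,
                            const char *text)
{
    regex_t re;
    int hits = 0;

    for (size_t i = 0; i < n; ++i) {
        /* Skip patterns that fail to compile; the contents of `re` are
         * undefined after a failure, so it is not passed to regfree(). */
        if (regcomp(&re, patterns[i], REG_EXTENDED | REG_NOSUB) != 0)
            continue;
        if (regexec(&re, text, 0, NULL, 0) == 0)
            ++hits;
        regfree(&re);   /* release before the next regcomp() reuses `re` */
    }
    return hits;
}
```

Pairing every successful regcomp() with exactly one regfree() keeps the loop leak-free regardless of how a particular C library treats recompilation into a structure that is still in use.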
4. Report error messages
A non-zero return value from regcomp() or regexec() means that something went wrong while processing the regular expression (for regexec(), REG_NOMATCH simply means that no match was found). Call the regerror() function to obtain a detailed message.
size_t regerror(int errcode, const regex_t *preg, char *errbuf, size_t errbuf_size);
The errcode parameter is the error code returned by regcomp() or regexec(). The preg parameter is the compilation result from regcomp(), which gives regerror() the context it needs to format the message. regerror() writes at most errbuf_size bytes of the formatted message into errbuf, truncating it if necessary, and returns the buffer size needed to hold the complete message, including the terminating null byte.
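Because regerror() returns the required buffer size, it can be called twice to avoid truncation: once with a zero-sized buffer to learn the size, and once to fetch the message. The sketch below assumes a hypothetical helper named regex_error_message whose result is owned by the caller.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: return a malloc()ed copy of the message for
 * `errcode`, or NULL if allocation fails. The caller must free() it. */
char *regex_error_message(int errcode, const regex_t *preg)
{
    /* With errbuf_size == 0, regerror() ignores errbuf and only reports the
     * size needed, including the terminating null byte. */
    size_t needed = regerror(errcode, preg, NULL, 0);
    char *msg = malloc(needed);

    if (msg != NULL)
        regerror(errcode, preg, msg, needed);
    return msg;
}
```

A fixed-size buffer such as char ebuf[128] is simpler and usually sufficient. The two-call form is only worthwhile when the full message must be preserved.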
5. Apply a regular expression
Finally, here is a complete example showing how to work with regular expressions in a C program.
#include <stdio.h>
#include <string.h>
#include <regex.h>
/* Return the substring str[start..end) in a static buffer */
static char *substra(const char *str, unsigned start, unsigned end)
{
    static char stbuf[256];
    unsigned n = end - start;
    if (n >= sizeof(stbuf))   /* never write past the end of stbuf */
        n = sizeof(stbuf) - 1;
    strncpy(stbuf, str + start, n);
    stbuf[n] = 0;
    return stbuf;
}
/* Main program */
int main(int argc, char **argv)
{
    char *pattern;
    int z, lno = 0, cflags = 0;
    size_t x, len;
    char ebuf[128], lbuf[256];
    regex_t reg;
    regmatch_t pm[10];
    const size_t nmatch = 10;
    if (argc < 2) {
        fprintf(stderr, "Usage: %s pattern\n", argv[0]);
        return 1;
    }
    /* Compile the regular expression */
    pattern = argv[1];
    z = regcomp(&reg, pattern, cflags);
    if (z != 0) {
        regerror(z, &reg, ebuf, sizeof(ebuf));
        fprintf(stderr, "%s: pattern '%s'\n", ebuf, pattern);
        return 1;
    }
    /* Process the input line by line */
    while (fgets(lbuf, sizeof(lbuf), stdin)) {
        ++lno;
        /* Strip the trailing newline, if any */
        if ((len = strlen(lbuf)) > 0 && lbuf[len - 1] == '\n')
            lbuf[len - 1] = 0;
        /* Apply the regular expression to the line */
        z = regexec(&reg, lbuf, nmatch, pm, 0);
        if (z == REG_NOMATCH)
            continue;
        else if (z != 0) {
            regerror(z, &reg, ebuf, sizeof(ebuf));
            fprintf(stderr, "%s: regexec('%s')\n", ebuf, lbuf);
            regfree(&reg);
            return 2;
        }
        /* Print the whole match ($0) and any subexpression matches */
        for (x = 0; x < nmatch && pm[x].rm_so != -1; ++x) {
            if (!x) printf("%04d: %s\n", lno, lbuf);
            printf("$%zu='%s'\n", x, substra(lbuf, pm[x].rm_so, pm[x].rm_eo));
        }
    }
    /* Release the regular expression */
    regfree(&reg);
    return 0;
}
The program above takes a regular expression from the command line, applies it to each line read from standard input, and prints the matches. Compile and run it as follows. The output shown is illustrative, since the line numbers and the lines that match depend on the exact contents of the source file:
# gcc regexp.c -o regexp
# ./regexp 'regex[a-z]*' < regexp.c
0003: #include <regex.h>
$0='regex'
0027: regex_t reg;
$0='regex'
0054: z = regexec(&reg, lbuf, nmatch, pm, 0);
$0='regexec'
6. Summary
Regular expressions are a valuable tool for programs that perform complex text processing. This article has shown how to use them in C to simplify string handling and gain flexibility similar to what Perl offers for data processing.