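Standard C has no built-in support for regular expressions (C++ only gained the <regex> header in C++11), but several libraries fill the gap for C/C++ programmers. The best known is Philip Hazel's Perl-Compatible Regular Expressions (PCRE) library, which is included in many Linux distributions. On POSIX systems the C library also provides its own regular expression functions, declared in regex.h, and those are the ones described here.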
The functions commonly used to process regular expressions in C are regcomp(), regexec(), regfree(), and regerror(). Using them generally takes three steps:
1. Compile the regular expression: regcomp()
2. Match the regular expression: regexec()
3. Release the regular expression: regfree()
Below is a detailed explanation of these functions, together with regerror().
1. int regcomp(regex_t *compiled, const char *pattern, int cflags)
This function compiles the regular expression pattern into an internal data format stored in compiled, which makes matching more efficient. The regexec function uses this data to match against the target text string. regcomp returns 0 on success and a nonzero error code on failure.
Parameter description:
① regex_t is a struct data type used to store a compiled regular expression. Its member re_nsub stores the number of subexpressions in the pattern. A subexpression is a part of the expression wrapped in parentheses.
② pattern is a pointer to the regular expression string we have written.
③ cflags takes any of the following four values, or several of them combined with the bitwise OR operator (|):
REG_EXTENDED: use the more powerful extended regular expression syntax instead of the basic syntax.
REG_ICASE: ignore case when matching letters.
REG_NOSUB: do not store match positions; regexec only reports whether there was a match.
REG_NEWLINE: treat line breaks specially, so that '$' can match at the end of each line and '^' can match at the beginning of each line, and so that "any character" operators do not match a line break.
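To see how the cflags values change the result, here is a minimal sketch. The helper name match_with_flags is made up for this illustration and is not part of regex.h; it compiles a pattern with the given flags and reports whether the text matches.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of regex.h): compile `pattern` with the
 * given cflags and test it against `text`.
 * Returns 1 on a match, 0 otherwise, -1 if the pattern fails to compile. */
int match_with_flags(const char *pattern, const char *text, int cflags)
{
    regex_t re;

    /* REG_NOSUB is added because only a yes/no answer is needed here. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);

    return rc == 0;
}
```

With this helper, match_with_flags("^hello$", "HELLO", REG_EXTENDED) returns 0, while adding REG_ICASE makes it return 1. Likewise "^world$" only matches the second line of "hello\nworld" when REG_NEWLINE is set.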
2. int regexec(const regex_t *compiled, const char *string, size_t nmatch, regmatch_t matchptr[], int eflags)
After compiling the regular expression, we can use regexec to match it against the target text string. If REG_NEWLINE was not specified in cflags when compiling, line breaks get no special treatment, that is, the entire text is treated as a single string. regexec returns 0 when a match is found and REG_NOMATCH when there is none.
A single call to regexec() finds one match, but that match can carry several locations: the whole match and each parenthesized subexpression. The matchptr array is used to save these locations, and the nmatch parameter is the maximum number of entries that regexec() may fill in. When regexec() returns successfully, the text from string + matchptr[0].rm_so up to (but not including) string + matchptr[0].rm_eo is the whole match, the text from string + matchptr[1].rm_so up to string + matchptr[1].rm_eo is the match of the first subexpression, and so on.
regmatch_t is a struct data type, which is defined in regex.h:
typedef struct
{
    regoff_t rm_so;
    regoff_t rm_eo;
} regmatch_t;
The rm_so member stores the offset at which the match starts in the target string, and the rm_eo member stores the offset of the first character after the end of the match. We usually define an array of these structures, because our regular expressions usually contain subexpressions. Element 0 of the array holds the location of the match for the whole regular expression, and the following elements hold the locations of the subexpressions in order.
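As an illustration of the rm_so and rm_eo offsets, the following sketch copies one entry of the match array into a buffer. The helper name extract_group and the limit MAX_GROUPS are assumptions made for this example only.

```c
#include <regex.h>
#include <string.h>

enum { MAX_GROUPS = 10 };   /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: copy entry `group` of the first match of `pattern`
 * in `text` into `out` (group 0 is the whole match, 1 the first
 * parenthesized subexpression, and so on).
 * Returns the copied length, or -1 on any failure. */
int extract_group(const char *pattern, const char *text, size_t group,
                  char *out, size_t outsz)
{
    regex_t re;
    regmatch_t pm[MAX_GROUPS];
    int len = -1;

    if (group >= (size_t)MAX_GROUPS || outsz == 0)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* re_nsub tells us how many subexpressions the pattern contains. */
    if (group <= re.re_nsub
        && regexec(&re, text, MAX_GROUPS, pm, 0) == 0
        && pm[group].rm_so != -1) {          /* -1 means "did not participate" */
        size_t n = (size_t)(pm[group].rm_eo - pm[group].rm_so);
        if (n < outsz) {
            memcpy(out, text + pm[group].rm_so, n);
            out[n] = '\0';
            len = (int)n;
        }
    }

    regfree(&re);
    return len;
}
```

For the pattern "([a-z]+)@([a-z.]+)" and the text "mail bob@example.com now", group 0 is "bob@example.com", group 1 is "bob", and group 2 is "example.com".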
Parameter description:
① compiled is a regular expression compiled with the regcomp function.
② string is the target text string.
③ nmatch is the maximum number of entries that regexec() may fill into the matchptr array.
④ matchptr is an array of regmatch_t structs that receives the locations of the matched text.
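⑤ eflags takes two values, which can be combined with |:
REG_NOTBOL: the start of the target string is not treated as the beginning of a line, so '^' does not match there. This is useful when the string passed in is a fragment taken from the middle of a longer line.
REG_NOTEOL: similar to the above, but for the end of the line: the end of the target string is not treated as the end of a line, so '$' does not match there.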
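One situation where REG_NOTBOL matters is searching a string repeatedly, restarting after each match. The sketch below counts non-overlapping matches this way. The helper name count_matches is hypothetical, and the loop assumes the pattern was compiled without REG_NEWLINE.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: count non-overlapping matches of `pattern` in `text`.
 * Returns the count, or -1 if the pattern fails to compile. */
int count_matches(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t pm[1];
    const char *p = text;
    int eflags = 0;
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&re, p, 1, pm, eflags) == 0) {
        count++;
        if (pm[0].rm_eo == 0) {
            /* Empty match at the current position: step forward one
             * character so the loop cannot spin forever. */
            if (*p == '\0')
                break;
            p++;
        } else {
            p += pm[0].rm_eo;   /* resume right after this match */
        }
        /* From now on p points into the middle of the original string,
         * so '^' must no longer match at p. */
        eflags = REG_NOTBOL;
    }

    regfree(&re);
    return count;
}
```

With this sketch, count_matches("^a", "aaa") returns 1. If the later calls did not pass REG_NOTBOL, each remaining "a" would also look like the start of a line and would be counted.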
3. void regfree(regex_t *compiled)
Call this function when we have finished using a compiled regular expression, or before compiling another regular expression into the same regex_t. It releases the memory held by the regex_t struct pointed to by compiled. Remember to free the regex_t struct first if it is going to be compiled again.
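The recompile case can be sketched as follows: one regex_t is reused for several patterns, with regfree called before each new regcomp. The helper name count_matching_patterns is made up for this example. Skipping regfree after a failed regcomp reflects my reading of POSIX, which leaves the struct contents undefined in that case.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return how many of the `n` patterns match `text`,
 * reusing a single regex_t for every pattern. */
int count_matching_patterns(const char *const patterns[], size_t n,
                            const char *text)
{
    regex_t re;
    int hits = 0;

    for (size_t i = 0; i < n; i++) {
        /* Assumption: after a failed regcomp there is nothing to free,
         * so the pattern is simply skipped. */
        if (regcomp(&re, patterns[i], REG_EXTENDED | REG_NOSUB) != 0)
            continue;

        if (regexec(&re, text, 0, NULL, 0) == 0)
            hits++;

        /* Release the compiled data before the next regcomp reuses `re`. */
        regfree(&re);
    }
    return hits;
}
```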
4. size_t regerror(int errcode, const regex_t *compiled, char *buffer, size_t length)
When regcomp or regexec fails, you can call this function to obtain a string containing the error message.
Parameter description:
① errcode is the error code returned by the regcomp or regexec function.
② compiled is the regular expression that was passed to regcomp. The value can be NULL.
③ buffer points to the memory used to store the error message string.
④ length is the size of buffer. If the error message is longer than this, regerror truncates it to fit, but it still returns the size needed to hold the complete message, including the terminating null character. So we can use the following call to get the required size first:
size_t length = regerror(errcode, compiled, NULL, 0);
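The two-call approach can be sketched like this: the first regerror call asks for the required size, the second fills a buffer of exactly that size. The helper name compile_error_message is hypothetical and exists only for this illustration.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: try to compile `pattern`. On failure, return a
 * malloc'd copy of the error message (the caller must free it).
 * Returns NULL if the pattern compiles, or if malloc fails. */
char *compile_error_message(const char *pattern, int cflags)
{
    regex_t re;
    int rc = regcomp(&re, pattern, cflags);

    if (rc == 0) {
        regfree(&re);
        return NULL;
    }

    /* First call: buffer size 0, only the required size is returned
     * (it already includes the terminating null character). */
    size_t need = regerror(rc, &re, NULL, 0);

    char *msg = malloc(need);
    if (msg != NULL)
        regerror(rc, &re, msg, need);   /* second call fills the buffer */
    return msg;
}
```

A caller might print the returned string with fprintf(stderr, ...) and then free it. The exact wording of the message depends on the C library.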
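Example (environment: CentOS):
/* Written by gx
 * time: 2014/3/21
 * for: test regex
 */
#include <stdio.h>
#include <sys/types.h>
#include <regex.h>

int main(int argc, char **argv)
{
    int status;
    int i;
    int cflags = REG_EXTENDED;
    regmatch_t pmatch[1];
    const size_t nmatch = 1;
    regex_t reg;
    /* Note: \w is a GNU extension, not part of POSIX extended syntax. */
    const char *pattern = "^\\w+([-+.]\\w+)*@\\w+([-.]\\w+)*.\\w+([-.]\\w+)*$";
    const char *buf = "helloworld12345@qq.com";

    /* Step 1: compile; stop if the pattern is invalid. */
    if (regcomp(&reg, pattern, cflags) != 0) {
        printf("regcomp failed\n");
        return 1;
    }

    /* Step 2: match. */
    status = regexec(&reg, buf, nmatch, pmatch, 0);
    if (status == REG_NOMATCH) {
        printf("no match\n");
    } else if (status == 0) {
        printf("match:\n");
        /* Print the matched text one character at a time. */
        for (i = pmatch[0].rm_so; i < pmatch[0].rm_eo; ++i)
            putchar(buf[i]);
        printf("\n");
    }

    /* Step 3: release the compiled expression. */
    regfree(&reg);
    return 0;
}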
This article is based on http://see.xidian.edu.cn/cpp/html/1428.html