Linux C Regular Expressions

Source: Internet
Author: User
Tags: preg

The following describes how to use the POSIX regular expression functions declared in <regex.h>:
First, compile the regular expression.
To improve efficiency, a regular expression must be compiled with the regcomp() function into a regex_t structure before any string can be compared against it:

int regcomp (regex_t *preg, const char *regex, int cflags);

The regex parameter is a string containing the regular expression to be compiled. The preg parameter points to a regex_t structure that receives the compilation result. The cflags parameter determines how the regular expression is processed (run man regcomp for a detailed explanation).
If regcomp() succeeds, the compilation result is stored in preg and the function returns 0; any other return value indicates an error.

Then match the regular expression.
Once the regular expression has been compiled successfully with regcomp(), you can call regexec() to perform the pattern matching:

int regexec (const regex_t *preg, const char *string, size_t nmatch, regmatch_t pmatch[], int eflags);
typedef struct
{
    regoff_t rm_so;   /* offset of the first character of the match */
    regoff_t rm_eo;   /* offset of the first character after the match */
} regmatch_t;

The preg parameter points to the compiled regular expression, and the string parameter is the string to be matched. The nmatch and pmatch parameters are used to return the match positions to the caller, and the last parameter, eflags, controls details of the matching.

When regexec() finds a match, it can report not only where the whole pattern matched but also where each parenthesized subexpression matched. The pmatch parameter is an array that receives these positions, and the nmatch parameter tells regexec() how many entries of the pmatch array it may fill at most. When regexec() returns successfully (return value 0), the text from string + pmatch[0].rm_so up to, but not including, string + pmatch[0].rm_eo is the substring matched by the entire expression; the text from string + pmatch[1].rm_so to string + pmatch[1].rm_eo is the substring matched by the first parenthesized subexpression, and so on.

Finally, release the regular expression.
Whenever you no longer need a compiled regular expression, you should call regfree() to release it and avoid a memory leak.
void regfree (regex_t *preg);

Report error information
If regcomp() or regexec() returns a non-zero value, an error occurred while processing the regular expression (for regexec(), the value REG_NOMATCH simply means that no match was found). In this case, you can call regerror() to obtain a detailed error message.

size_t regerror (int errcode, const regex_t *preg, char *errbuf, size_t errbuf_size);

The errcode parameter is an error code returned by regcomp() or regexec(), and the preg parameter is the compilation result produced by regcomp(); it gives regerror() the context required to format the message. regerror() writes the formatted error message into the errbuf buffer, filling in at most errbuf_size bytes including the terminating null byte, and returns the size of the buffer needed to hold the complete message.

Appendix:

1. int regcomp (regex_t * compiled, const char * pattern, int cflags)

This function compiles the regular expression pattern into the internal data format stored in compiled, which makes matching more efficient. The regexec function uses this data to perform pattern matching against the target text string. regcomp returns 0 on success.

regex_t is a struct data type used to store a compiled regular expression. Its member re_nsub holds the number of subexpressions in the regular expression; a subexpression is a part of the expression enclosed in parentheses.

pattern is a pointer to the regular expression we have written.
cflags is zero, or one or more of the following four values combined with the bitwise OR (|) operator:
REG_EXTENDED uses the more powerful extended regular expression syntax.
REG_ICASE ignores case when matching letters.
REG_NOSUB does not store the positions of matches; regexec only reports whether there was a match.
REG_NEWLINE treats newline characters specially: '$' can match at the end of a line and '^' can match at the beginning of a line, and match-any-character operators do not match a newline.

2. int regexec (regex_t * compiled, char * string, size_t nmatch, regmatch_t matchptr [], int eflags)

After compiling the regular expression, you can use regexec to match it against the target text string. If REG_NEWLINE was not specified in cflags when the regular expression was compiled, newline characters get no special treatment by default; that is, the entire text is treated as a single line. regexec returns 0 on a successful match and REG_NOMATCH when there is no match.

regmatch_t is a struct data type. The rm_so member stores the start offset of the matched text in the target string, and rm_eo stores the offset just past its end. We usually define an array of these structures, because our regular expressions usually contain subexpressions. Element 0 of the array holds the position of the whole regular expression, and the following elements hold the positions of the subexpressions in order.

compiled is a regular expression that has been compiled with the regcomp function.
string is the target text string.
nmatch is the length of the regmatch_t struct array.
matchptr is the regmatch_t struct array that receives the positions of the matched text.
eflags has two possible values, which can be combined with |:
REG_NOTBOL: if this value is specified, '^' does not match at the beginning of the target string. The original documentation reads:
If this bit is set, then the beginning-of-line operator doesn't match the beginning of the string (presumably because it's not the beginning of a line). If not set, then the beginning-of-line operator does match the beginning of the string.
REG_NOTEOL is similar to the one above, but applies to the end-of-line operator '$' and the end of the string.

3. void regfree (regex_t * compiled)

When we have finished using a compiled regular expression, or want to compile another regular expression into the same structure, we can use this function to release the contents of the regex_t struct pointed to by compiled. Remember: before recompiling, you must first free the regex_t struct.

4. size_t regerror (int errcode, regex_t * compiled, char * buffer, size_t length)

When regcomp or regexec fails with an error, you can call this function to obtain a string containing the error message.

errcode is the error code returned by the regcomp or regexec function.
compiled is the regular expression that was compiled with the regcomp function. The value can be NULL.
buffer points to the memory in which the error message string is stored.
length is the size of buffer. If the error message is longer than this, regerror truncates it to fit, but still returns the size needed for the complete message. So we can use the following call to get the required size first:
size_t length = regerror (errcode, compiled, NULL, 0);
