C Regular Expressions: regcomp(), regexec(), and regfree() Explained

Source: Internet
Author: User

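Standard C has no built-in support for regular expressions (and C++ only added std::regex in C++11), but there are libraries that fill the gap. The best known is Philip Hazel's Perl Compatible Regular Expressions (PCRE) library, which many Linux distributions ship. The functions described in this article, however, are the POSIX regular expression API declared in <regex.h>, which is provided by the C library on POSIX systems such as Linux.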

C programs usually handle regular expressions with four POSIX functions: regcomp(), regexec(), regfree(), and regerror(). Using a regular expression in C generally takes three steps:
  1. Compile the regular expression with regcomp()
  2. Match the regular expression with regexec()
  3. Release the compiled regular expression with regfree()
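The three steps fit naturally into one small function. This is a minimal sketch, not code from the original article: the helper name looks_like_email and its pattern are hypothetical, and the pattern is only a rough approximation of real email address syntax. The REG_EXTENDED and REG_NOSUB flags it uses are described below.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: true if text contains something shaped like an email
 * address. The pattern is a simplified illustration, not full email syntax.
 */
static bool looks_like_email(const char *text)
{
    const char *pattern = "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}";
    regex_t re;

    /* Step 1: compile (extended syntax; REG_NOSUB since no positions needed) */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    /* Step 2: match; with REG_NOSUB the match array can be 0 / NULL */
    int rc = regexec(&re, text, 0, NULL, 0);

    /* Step 3: release the compiled expression */
    regfree(&re);
    return rc == 0;
}
```

For example, looks_like_email("contact: alice@example.com") returns true, while looks_like_email("user@localhost") returns false because the pattern requires a dot followed by at least two letters after the @ part.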


The three functions are explained in detail below, followed by regerror(), which turns error codes into readable messages.

1. int regcomp(regex_t *compiled, const char *pattern, int cflags)
This function compiles the regular expression pattern into an internal form stored in the regex_t that compiled points to, which makes later matching efficient; regexec() then uses this compiled form to search target strings. It returns 0 on success and a nonzero error code (which can be passed to regerror()) on failure.

Parameter description:
①regex_t is a structure type that holds a compiled regular expression. Its member re_nsub stores the number of parenthesized subexpressions in the pattern; a subexpression is a part of the expression wrapped in parentheses.
②pattern points to the regular expression string to compile.
③cflags is 0 or a bitwise OR (|) of the following flags:
REG_EXTENDED uses POSIX extended regular expression syntax (ERE), which is more powerful; without it, basic regular expressions (BRE) are used.
REG_ICASE ignores case when matching letters.
REG_NOSUB does not report the positions of matches: regexec() only reports success or failure, and its nmatch and matchptr arguments are ignored.
REG_NEWLINE treats newline as a line separator: '^' can also match right after a newline, '$' can also match right before a newline, and '.' and non-matching bracket expressions such as [^a] do not match a newline.
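To see what the cflags values change, here is a small sketch with two hypothetical helpers: matches() compiles a pattern with the given flags and tests a single string, and count_subexpressions() reports re_nsub after compiling with REG_EXTENDED. The helper names are illustrative, not part of <regex.h>.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if pattern (compiled with cflags) matches text, 0 if it does
 * not, and -1 if the pattern fails to compile. */
static int matches(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    /* REG_NOSUB: we only need a yes/no answer, not match positions */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Returns re_nsub (number of parenthesized subexpressions), or -1 */
static long count_subexpressions(const char *pattern)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    long n = (long)re.re_nsub;
    regfree(&re);
    return n;
}
```

For example, matches("^world", 0, "hello\nworld") returns 0 but matches("^world", REG_NEWLINE, "hello\nworld") returns 1, because without REG_NEWLINE '^' only matches at the start of the whole string. count_subexpressions("(a)(b(c))") returns 3, since nested groups are counted too.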

2. int regexec(const regex_t *compiled, const char *string, size_t nmatch, regmatch_t matchptr[], int eflags)
Once the regular expression is compiled, regexec() matches it against the target string. If REG_NEWLINE was not specified in cflags when compiling, newline is treated as an ordinary character, so the entire text is treated as a single line. regexec() returns 0 on a successful match and REG_NOMATCH if there is no match; other nonzero values indicate errors.
regmatch_t is a structure type defined in <regex.h>; it contains at least the following members:
typedef struct
{
    regoff_t rm_so;
    regoff_t rm_eo;
} regmatch_t;
The member rm_so holds the offset of the start of the matched text in the target string, and rm_eo holds the offset of the first character after the match, so the match length is rm_eo - rm_so. Because a regular expression often contains subexpressions, we usually declare an array of these structures: element 0 holds the position of the whole match, and elements 1, 2, and so on hold the positions of the subexpressions in order. Elements for subexpressions that did not take part in the match, and any elements beyond re_nsub, have rm_so and rm_eo set to -1.
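Here is a sketch of reading subexpression positions from the match array. The helpers copy_group() and parse_pair() and the key=value input format are hypothetical examples; the pattern has two subexpressions, so an array of three regmatch_t elements is enough.

```c
#include <regex.h>
#include <string.h>

/*
 * Copy the text matched by element i of the regmatch_t array m (taken from
 * src) into out. Returns 0 on success, or -1 if that subexpression did not
 * take part in the match or the text does not fit in outsz bytes.
 */
static int copy_group(const char *src, const regmatch_t *m, size_t i,
                      char *out, size_t outsz)
{
    if (m[i].rm_so == -1)
        return -1;
    /* rm_eo is the offset one past the last matched character */
    size_t len = (size_t)(m[i].rm_eo - m[i].rm_so);
    if (len + 1 > outsz)
        return -1;
    memcpy(out, src + m[i].rm_so, len);
    out[len] = '\0';
    return 0;
}

/* Hypothetical example: find "key=value" in line, e.g. "width=640" */
static int parse_pair(const char *line, char *key, char *val, size_t sz)
{
    regex_t re;
    regmatch_t m[3];   /* [0] whole match, [1] key, [2] value */

    if (regcomp(&re, "([a-z]+)=([0-9]+)", REG_EXTENDED) != 0)
        return -1;
    int rc = regexec(&re, line, 3, m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;
    if (copy_group(line, m, 1, key, sz) != 0 ||
        copy_group(line, m, 2, val, sz) != 0)
        return -1;
    return 0;
}
```

For example, parse_pair("set width=640 now", k, v, sizeof k) stores "width" in k and "640" in v. Note that the offsets are only used after regfree(), which is safe because they index the original line, not the regex_t.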

Parameter description:
①compiled is a regular expression that has been compiled with regcomp().
②string is the target text string.
③nmatch is the number of elements in the matchptr array.
④matchptr is an array of regmatch_t structures that receives the positions of the matched text.
⑤eflags is 0 or a bitwise OR of the following two flags:
REG_NOTBOL: the first character of string is not the beginning of a line, so '^' does not match at the start of string (with REG_NEWLINE it can still match after an embedded newline). This is useful when the string starts in the middle of a larger line, for example when resuming a search after a previous match.
REG_NOTEOL: the same idea for the end of a line: '$' does not match at the end of string (with REG_NEWLINE it can still match before an embedded newline).
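A common reason to use REG_NOTBOL is finding every match in a string: after the first match, the search restarts from the middle of the string, which must not be treated as a line start. A minimal sketch, with hypothetical helper names count_matches() and count_pattern():

```c
#include <regex.h>
#include <stddef.h>

/*
 * Count non-overlapping matches of the compiled pattern re in text.
 * After the first match the search resumes mid-string, so REG_NOTBOL tells
 * regexec() that the new starting point is not the beginning of a line.
 */
static int count_matches(const regex_t *re, const char *text)
{
    regmatch_t m;
    const char *p = text;
    int count = 0, eflags = 0;

    while (regexec(re, p, 1, &m, eflags) == 0) {
        if (m.rm_eo == m.rm_so)      /* empty match: stop instead of looping forever */
            break;
        count++;
        p += m.rm_eo;                /* resume right after this match */
        eflags = REG_NOTBOL;         /* p is no longer at the start of a line */
    }
    return count;
}

/* Hypothetical convenience wrapper: compile an ERE, count matches, free it */
static int count_pattern(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    int n = count_matches(&re, text);
    regfree(&re);
    return n;
}
```

For example, count_pattern("^ab", "ababab") returns 1. Without REG_NOTBOL on the later calls, '^ab' would match again at each restart and the count would be 3.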

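3. void regfree(regex_t *compiled)
When we are finished with a compiled regular expression, or before reusing the same regex_t to compile a different pattern, call this function to release the memory that regcomp() allocated for the regex_t that compiled points to. Remember that if you compile a new pattern into a regex_t that still holds a compiled expression, you must free it first; otherwise that memory leaks. Only call regfree() on a regex_t that was compiled successfully: if regcomp() fails, the contents of the regex_t are undefined.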
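Reusing one regex_t for several patterns means freeing it after each successful compile. This sketch uses a hypothetical helper, count_matching_patterns(), and assumes, as POSIX specifies, that a failed regcomp() leaves the regex_t undefined, so nothing is freed in that case.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Count how many of the n patterns (EREs) match text, reusing one regex_t.
 * Every successful regcomp() is paired with regfree() before the next
 * regcomp(); a failed compile leaves re undefined, so it is not freed.
 */
static int count_matching_patterns(const char *const *patterns, int n,
                                   const char *text)
{
    regex_t re;
    int hits = 0;

    for (int i = 0; i < n; i++) {
        if (regcomp(&re, patterns[i], REG_EXTENDED | REG_NOSUB) != 0)
            continue;                      /* skip invalid patterns */
        if (regexec(&re, text, 0, NULL, 0) == 0)
            hits++;
        regfree(&re);                      /* release before re is reused */
    }
    return hits;
}
```

With the patterns "^GET ", "HTTP/1\\.[01]$", "POST", and the invalid "(", the request line "GET /index.html HTTP/1.1" is matched by two of them; the invalid pattern is skipped without calling regfree().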

4. size_t regerror(int errcode, const regex_t *compiled, char *buffer, size_t length)
When regcomp() or regexec() fails, call this function to obtain a string that describes the error.

Parameter description:
①errcode is the error code returned by regcomp() or regexec().
②compiled is the regex_t that was passed to the failing regcomp() or regexec() call. Some implementations also accept NULL here, but passing the same regex_t is the portable choice.
③buffer points to the memory that receives the error message.
④length is the size of buffer in bytes. If the message (including its terminating null character) does not fit, regerror() truncates it and null-terminates the result, but it still returns the size needed for the full message, including the terminating null. When length is 0, buffer is ignored, so the required size can be obtained like this:

size_t length = regerror(errcode, compiled, NULL, 0);
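This two-call pattern can be wrapped in a small helper that allocates a buffer of exactly the right size. The name regex_error_message() is hypothetical; it is a sketch, not a standard function.

```c
#include <regex.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return a malloc'd message for errcode (caller frees),
 * or NULL if allocation fails. The first regerror() call, with a NULL buffer
 * and length 0, only reports the size needed including the final '\0'.
 */
static char *regex_error_message(int errcode, const regex_t *re)
{
    size_t len = regerror(errcode, re, NULL, 0);
    char *buf = malloc(len);

    if (buf != NULL)
        regerror(errcode, re, buf, len);   /* now the message fits exactly */
    return buf;
}
```

After a failed regcomp(&re, "a[b", REG_EXTENDED), for example, regex_error_message(rc, &re) returns a description of the unmatched bracket; the caller releases it with free().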

The following program puts the three steps together. It takes a regular expression from the command line, applies it to each line read from standard input, and prints the matching results.
#include <stdio.h>
#include <sys/types.h>
#include <regex.h>
#include <string.h>

/* Return a copy of str[start, end) held in a static buffer */
static char *substr(const char *str, unsigned start, unsigned end)
{
    unsigned n = end - start;
    static char stbuf[256];
    strncpy(stbuf, str + start, n);
    stbuf[n] = 0;
    return stbuf;
}

/* Main program */
int main(int argc, char **argv)
{
    char *pattern;
    int z, lno = 0, cflags = 0;
    size_t x, len;
    char ebuf[128], lbuf[256];
    regex_t reg;
    regmatch_t pm[10];
    const size_t nmatch = 10;

    if (argc < 2) {
        fprintf(stderr, "usage: %s PATTERN < input\n", argv[0]);
        return 1;
    }

    /* Compile the regular expression */
    pattern = argv[1];
    z = regcomp(&reg, pattern, cflags);
    if (z != 0) {
        regerror(z, &reg, ebuf, sizeof(ebuf));
        fprintf(stderr, "%s: pattern '%s'\n", ebuf, pattern);
        return 1;
    }

    /* Process the input line by line */
    while (fgets(lbuf, sizeof(lbuf), stdin)) {
        ++lno;
        if ((len = strlen(lbuf)) > 0 && lbuf[len - 1] == '\n')
            lbuf[len - 1] = 0;

        /* Apply the regular expression to each line */
        z = regexec(&reg, lbuf, nmatch, pm, 0);
        if (z == REG_NOMATCH)
            continue;
        else if (z != 0) {
            regerror(z, &reg, ebuf, sizeof(ebuf));
            fprintf(stderr, "%s: regexec('%s')\n", ebuf, lbuf);
            regfree(&reg);
            return 2;
        }

        /* Print the whole match ($0) and any subexpression matches */
        for (x = 0; x < nmatch && pm[x].rm_so != -1; ++x) {
            if (!x)
                printf("%04d: %s\n", lno, lbuf);
            printf("$%d='%s'\n", (int)x, substr(lbuf, pm[x].rm_so, pm[x].rm_eo));
        }
    }

    /* Release the compiled regular expression */
    regfree(&reg);
    return 0;
}

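Execute the following commands to compile the program and run it on its own source file. For each matching line, it prints the line number and the line, then the matched text as $0. The line numbers below assume regexp.c contains exactly the program above, indented with spaces:
# gcc regexp.c -o regexp
# ./regexp 'regex[a-z]*' < regexp.c
0003: #include <regex.h>
$0='regex'
0023:     regex_t reg;
$0='regex'
0048:         z = regexec(&reg, lbuf, nmatch, pm, 0);
$0='regexec'
0053:             fprintf(stderr, "%s: regexec('%s')\n", ebuf, lbuf);
$0='regexec'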



Summary: Regular expressions are a useful tool for programs that need complex text processing. This article showed how to use the POSIX regular expression functions in C to simplify string handling, giving you some of the flexibility in text processing that Perl offers.
