Using Regular Expressions in the C Language
If you are familiar with sed, awk, grep, or vi under Linux, the concept of regular expressions will not be new to you. Because they greatly simplify string processing, regular expressions are used in many Linux utilities. Do not assume that they belong only to scripting languages such as Perl, Python, and Bash; C programmers can use regular expressions in their own programs as well.
Neither standard C nor C++ (before C++11) includes regular expression support, but libraries are available to help C and C++ programmers fill the gap. The best known is Philip Hazel's Perl-Compatible Regular Expressions (PCRE) library, which ships with many Linux distributions. The functions described below belong to the POSIX regular expression interface declared in <regex.h>.
Compiling Regular Expressions
To improve efficiency, a regular expression is first compiled into a regex_t structure with the regcomp() function before any string is compared against it:
int regcomp(regex_t *preg, const char *regex, int cflags);
The parameter regex is a string containing the regular expression to be compiled; the parameter preg points to a structure of type regex_t that receives the compilation result; and the parameter cflags determines the details of how the regular expression is processed.
If regcomp() succeeds, the compilation result is stored in preg and the function returns 0. Any other return value indicates an error.
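As a minimal sketch of the compile step, the helper below wraps regcomp() and passes its return value back to the caller. The name compile_pattern and its ignore_case parameter are hypothetical, introduced only for illustration. REG_EXTENDED and REG_ICASE are two of the standard cflags values defined in <regex.h>.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of the regex library): compile `pattern`
 * using POSIX extended syntax.
 * Returns 0 on success or the non-zero regcomp() error code on failure.
 * On success the caller owns *preg and should release it with regfree(). */
static int compile_pattern(regex_t *preg, const char *pattern, int ignore_case)
{
    int cflags = REG_EXTENDED;      /* extended regular expression syntax */

    if (ignore_case)
        cflags |= REG_ICASE;        /* match without regard to case */

    return regcomp(preg, pattern, cflags);
}
```

A caller would test the result against 0 before using the regex_t, for example `if (compile_pattern(&re, "^[a-z]+$", 0) != 0) { /* report the error */ }`.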
Matching Regular Expressions
Once the regular expression has been compiled successfully with regcomp(), call regexec() to perform the pattern match:
int regexec(const regex_t *preg, const char *string, size_t nmatch, regmatch_t pmatch[], int eflags);
typedef struct {
    regoff_t rm_so;
    regoff_t rm_eo;
} regmatch_t;
The parameter preg points to the compiled regular expression, and the parameter string is the string to be matched. The parameters nmatch and pmatch are used to return match positions to the caller, and the last parameter, eflags, determines the details of the match.
A regular expression may contain parenthesized sub-expressions, and the pmatch array holds the position of the overall match together with the positions of the sub-expression matches. The parameter nmatch tells regexec() the maximum number of entries it may fill in the pmatch array. When regexec() returns successfully, the text from string + pmatch[0].rm_so up to, but not including, string + pmatch[0].rm_eo is the part of the string matched by the whole regular expression; pmatch[1] describes the text matched by the first parenthesized sub-expression, pmatch[2] the second, and so on. Entries that correspond to no sub-expression match have rm_so set to -1.
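To illustrate how the pmatch entries are read back, here is a small sketch that splits a "key=value" string with two parenthesized sub-expressions. The function split_pair, its buffer parameters, and the pattern itself are assumptions made for this example. With the POSIX interface, pm[0] describes the whole match and pm[1], pm[2] describe the sub-expressions.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split "key=value" into its two parts.
 * Returns 0 on a match, REG_NOMATCH if the line does not fit the pattern,
 * or -1 if the pattern could not be compiled. */
static int split_pair(const char *line, char *key, size_t keysz,
                      char *val, size_t valsz)
{
    regex_t re;
    regmatch_t pm[3];   /* pm[0]: whole match, pm[1]: key, pm[2]: value */
    int rc;

    if (regcomp(&re, "^([A-Za-z_]+)=(.*)$", REG_EXTENDED) != 0)
        return -1;

    rc = regexec(&re, line, 3, pm, 0);
    if (rc == 0) {
        /* rm_so is the start offset; rm_eo is the offset just past the end */
        snprintf(key, keysz, "%.*s",
                 (int)(pm[1].rm_eo - pm[1].rm_so), line + pm[1].rm_so);
        snprintf(val, valsz, "%.*s",
                 (int)(pm[2].rm_eo - pm[2].rm_so), line + pm[2].rm_so);
    }

    regfree(&re);
    return rc;
}
```

Compiling the pattern on every call keeps the sketch short; a program that matches many lines would normally compile once and reuse the regex_t.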
Releasing Regular Expressions
Whenever a compiled regular expression is no longer needed, call regfree() to release it and avoid a memory leak.
void regfree(regex_t *preg);
regfree() returns nothing. Its only argument is a pointer to a regex_t structure that was filled in by an earlier call to regcomp().
If a program calls regcomp() several times on the same regex_t structure, the POSIX standard does not specify whether regfree() must be called each time. It is nevertheless recommended to call regfree() once for each regular expression compiled with regcomp(), as soon as the compiled expression is no longer needed, so that storage is released as early as possible.
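The recommendation above can be sketched as a loop that reuses one regex_t for several patterns and releases it before the next compilation. The helper count_matching is a hypothetical name used only here. REG_NOSUB is a standard cflags value that tells regcomp() that match positions are not needed, so regexec() can be called with nmatch set to 0.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: count how many of the `n` patterns match `text`,
 * reusing a single regex_t and releasing it after each use. */
static int count_matching(const char *const *patterns, size_t n,
                          const char *text)
{
    regex_t re;
    int hits = 0;

    for (size_t i = 0; i < n; ++i) {
        if (regcomp(&re, patterns[i], REG_EXTENDED | REG_NOSUB) != 0)
            continue;           /* compilation failed; skip this pattern */

        if (regexec(&re, text, 0, NULL, 0) == 0)
            ++hits;

        regfree(&re);           /* release before the next regcomp() */
    }
    return hits;
}
```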
Reporting Error Messages
If regcomp() or regexec() returns a non-zero value, an error occurred while processing the regular expression. A detailed error message can then be obtained by calling regerror():
size_t regerror(int errcode, const regex_t *preg, char *errbuf, size_t errbuf_size);
The parameter errcode is the error code returned by regcomp() or regexec(), and the parameter preg is the compilation result produced by regcomp(), which gives regerror() the context it needs to format the message. regerror() writes the formatted message into the buffer errbuf, storing at most errbuf_size bytes, and returns the buffer size needed to hold the complete message, including the terminating null byte.
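When the message length is not known in advance, one possible approach is to call regerror() twice: first with a zero-sized buffer to learn how much space the message needs, then again with a buffer of that size. The helper regex_strerror below is a hypothetical name for this sketch. It relies on the POSIX rule that regerror() ignores errbuf when errbuf_size is 0 and returns the required size.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: return a malloc()ed copy of the message for
 * `errcode`. The caller frees the result; NULL means allocation failed. */
static char *regex_strerror(int errcode, const regex_t *preg)
{
    /* With errbuf_size == 0, regerror() only reports the size needed,
     * which includes the terminating null byte. */
    size_t need = regerror(errcode, preg, NULL, 0);
    char *msg = malloc(need);

    if (msg != NULL)
        regerror(errcode, preg, msg, need);
    return msg;
}
```

A fixed-size buffer, as in `char ebuf[128]`, is simpler and usually sufficient; the two-call form only avoids truncating long messages.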
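Applying Regular Expressions
Finally, a complete example shows how to work with regular expressions in a C program. It reads lines from standard input and prints every line that matches the pattern given as the first command-line argument, together with the matched text.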
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <regex.h>

/* Copy the substring str[start..end) into a static buffer */
static char *substr(const char *str, unsigned start, unsigned end)
{
    unsigned n = end - start;
    static char stbuf[256];

    if (n >= sizeof(stbuf))         /* never write past the buffer */
        n = sizeof(stbuf) - 1;
    strncpy(stbuf, str + start, n);
    stbuf[n] = 0;
    return stbuf;
}

/* Main program */
int main(int argc, char **argv)
{
    char *pattern;
    int x, z, lno = 0, cflags = 0;
    char ebuf[128], lbuf[256];
    regex_t reg;
    regmatch_t pm[10];
    const size_t nmatch = 10;

    if (argc < 2) {
        fprintf(stderr, "usage: %s pattern\n", argv[0]);
        return 1;
    }

    /* Compile the regular expression */
    pattern = argv[1];
    z = regcomp(&reg, pattern, cflags);
    if (z != 0) {
        regerror(z, &reg, ebuf, sizeof(ebuf));
        fprintf(stderr, "%s: pattern '%s'\n", ebuf, pattern);
        return 1;
    }

    /* Process the input line by line */
    while (fgets(lbuf, sizeof(lbuf), stdin)) {
        ++lno;
        /* Strip the trailing newline, if any */
        if ((z = (int)strlen(lbuf)) > 0 && lbuf[z - 1] == '\n')
            lbuf[z - 1] = 0;

        /* Match the regular expression against the line */
        z = regexec(&reg, lbuf, nmatch, pm, 0);
        if (z == REG_NOMATCH)
            continue;
        else if (z != 0) {
            regerror(z, &reg, ebuf, sizeof(ebuf));
            fprintf(stderr, "%s: regexec('%s')\n", ebuf, lbuf);
            regfree(&reg);
            return 2;
        }

        /* Print the results: entry 0 is the whole match */
        for (x = 0; (size_t)x < nmatch && pm[x].rm_so != -1; ++x) {
            if (!x)
                printf("%04d: %s\n", lno, lbuf);
            printf("%d='%s'\n", x, substr(lbuf, pm[x].rm_so, pm[x].rm_eo));
        }
    }

    /* Release the regular expression */
    regfree(&reg);
    return 0;
}
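A single regexec() call reports one match. To find further occurrences in the same string, a common approach is to call regexec() again on the text that follows the previous match. The sketch below counts non-overlapping matches this way; count_occurrences is a hypothetical helper name. The standard eflags value REG_NOTBOL is passed on later calls so that ^ does not match at the point where the search resumes.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: count non-overlapping matches of the compiled
 * expression `preg` in `text`. */
static size_t count_occurrences(const regex_t *preg, const char *text)
{
    size_t count = 0;
    const char *p = text;
    regmatch_t m;
    int eflags = 0;

    while (regexec(preg, p, 1, &m, eflags) == 0) {
        ++count;
        if (m.rm_eo == m.rm_so) {
            /* Empty match: step one character forward to guarantee
             * progress, or stop at the end of the string. */
            if (p[m.rm_eo] == '\0')
                break;
            p += m.rm_eo + 1;
        } else {
            p += m.rm_eo;       /* resume just past this match */
        }
        eflags = REG_NOTBOL;    /* p is no longer the start of the line */
    }
    return count;
}
```

The regex_t must be compiled without REG_NOSUB for the offsets in m to be filled in.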