Use regular expressions in C
If you are familiar with sed, awk, grep, or vi on Linux, regular expressions will not be new to you. Because they greatly simplify string processing, they are used in many Linux utilities. Regular expressions are not exclusive to scripting languages such as Perl, Python, and Bash: C programmers can use them in their own programs as well.
Standard C has no built-in support for regular expressions (and C++ gained <regex> only with C++11), but several libraries fill the gap for C/C++ programmers. The best known is Philip Hazel's Perl-Compatible Regular Expressions (PCRE) library, which ships with many Linux distributions. The functions described below are the POSIX ones declared in <regex.h>.
Compile regular expressions
To improve efficiency, a regular expression must be compiled with the regcomp() function into a regex_t structure before any string is matched against it:
int regcomp(regex_t *preg, const char *regex, int cflags);
The regex parameter is a string holding the regular expression to compile. The preg parameter points to a regex_t structure that receives the compiled result. The cflags parameter controls how the expression is interpreted, for example REG_EXTENDED for extended syntax or REG_ICASE for case-insensitive matching.
If regcomp() succeeds, the compiled result is stored in preg and the function returns 0; any other return value indicates an error.
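The compile step can be sketched as follows, assuming a POSIX system that provides <regex.h>. The helper name compile_pattern is hypothetical, and REG_EXTENDED | REG_NOSUB is only one possible choice of cflags.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile `pattern` as a POSIX extended regular
 * expression into *preg. Returns 0 on success, or the non-zero error
 * code from regcomp() on failure. */
int compile_pattern(regex_t *preg, const char *pattern)
{
    assert(preg != NULL && pattern != NULL);
    /* REG_EXTENDED selects the extended syntax; REG_NOSUB says that only
     * a yes/no answer is wanted, not the match positions. */
    return regcomp(preg, pattern, REG_EXTENDED | REG_NOSUB);
}
```

A caller would normally follow a successful call with regexec() and, once the expression is no longer needed, regfree().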
Match a regular expression
Once the regular expression has been compiled with regcomp(), call regexec() to perform the matching:
int regexec(const regex_t *preg, const char *string, size_t nmatch, regmatch_t pmatch[], int eflags);
typedef struct {
    regoff_t rm_so;
    regoff_t rm_eo;
} regmatch_t;
The preg parameter points to the compiled regular expression and string is the text to be matched. The nmatch and pmatch parameters return the match positions to the caller, and the last parameter, eflags, controls details of the matching.
When regexec() finds a match, it records where the match and each of its parenthesized subexpressions are located. The pmatch array holds these locations, and nmatch gives the maximum number of entries that regexec() may fill in. On a successful return, the text from string + pmatch[0].rm_so up to (but not including) string + pmatch[0].rm_eo is the whole match, the text from string + pmatch[1].rm_so to string + pmatch[1].rm_eo is what the first parenthesized subexpression matched, and so on. Entries for subexpressions that took no part in the match have rm_so set to -1. regexec() returns 0 on a match and REG_NOMATCH when nothing matches.
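To make the role of pmatch concrete, here is a minimal sketch that copies one entry of the array out of the matched text. The helper name extract_group and the limit of 10 entries are assumptions made for illustration only.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: match `text` against the extended regular
 * expression `pattern` and copy entry `group` of pmatch (0 = whole
 * match, 1 = first parenthesized subexpression, ...) into `out`.
 * Returns 0 on success and -1 on a compile error, no match, an unused
 * entry, or an output buffer that is too small. */
int extract_group(const char *pattern, const char *text, size_t group,
                  char *out, size_t out_size)
{
    enum { MAX_GROUPS = 10 };   /* arbitrary limit for this sketch */
    regex_t re;
    regmatch_t pmatch[MAX_GROUPS];
    int rc = -1;

    assert(pattern != NULL && text != NULL && out != NULL);
    if (group >= MAX_GROUPS || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, text, MAX_GROUPS, pmatch, 0) == 0 &&
        pmatch[group].rm_so != -1) {
        size_t len = (size_t)(pmatch[group].rm_eo - pmatch[group].rm_so);
        if (len < out_size) {
            memcpy(out, text + pmatch[group].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }
    regfree(&re);
    return rc;
}
```

With the pattern ([a-z]+)@([a-z.]+), entry 0 would hold a whole address, entry 1 the part before the @ sign, and entry 2 the part after it.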
Release a regular expression
When a compiled regular expression is no longer needed, call regfree() to release it and avoid a memory leak.
void regfree(regex_t *preg);
regfree() returns nothing. Its only argument is a pointer to a regex_t structure, namely the compiled result produced by regcomp().
If a program calls regcomp() several times on the same regex_t structure, the POSIX standard does not say whether regfree() must be called before each recompilation. Even so, it is advisable to call regfree() once each time a regular expression has been compiled, so that the storage it occupies is released as early as possible.
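The recommendation can be illustrated with a small sketch that reuses a single regex_t for several patterns and releases it after each use. The helper name count_matching_patterns is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: count how many of the `n` extended regular
 * expressions in `patterns` match `text`, reusing a single regex_t. */
int count_matching_patterns(const char *const patterns[], size_t n,
                            const char *text)
{
    regex_t re;     /* one structure, recompiled for each pattern */
    int count = 0;

    assert(patterns != NULL && text != NULL);
    for (size_t i = 0; i < n; i++) {
        if (regcomp(&re, patterns[i], REG_EXTENDED | REG_NOSUB) != 0)
            continue;       /* skip patterns that fail to compile */
        if (regexec(&re, text, 0, NULL, 0) == 0)
            count++;
        regfree(&re);       /* release before the next regcomp() reuses re */
    }
    return count;
}
```

Pairing each successful regcomp() with exactly one regfree() keeps the storage bounded no matter how many patterns are tried.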
Report error information
A non-zero return value from regcomp() or regexec() means that something went wrong while processing the regular expression (for regexec(), REG_NOMATCH simply means that no match was found). Call regerror() to obtain a descriptive message.
size_t regerror(int errcode, const regex_t *preg, char *errbuf, size_t errbuf_size);
The errcode parameter is the error code returned by regcomp() or regexec(), and preg is the compiled result from regcomp(), which gives regerror() the context it needs to format the message. regerror() writes at most errbuf_size bytes of the formatted message into errbuf and returns the buffer size needed to hold the complete message, including the terminating null byte.
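Because regerror() reports the size it needs, a caller can ask for that size first and then allocate a buffer that fits. The following is a minimal sketch of that pattern; the helper name regex_error_message is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: return a malloc()ed string describing `errcode`,
 * or NULL if memory is exhausted. The caller must free() the result. */
char *regex_error_message(int errcode, const regex_t *preg)
{
    /* With a zero-sized buffer regerror() writes nothing and returns the
     * buffer size needed for the whole message, including the
     * terminating null byte. */
    size_t needed = regerror(errcode, preg, NULL, 0);
    char *msg;

    assert(needed > 0);
    msg = malloc(needed);
    if (msg != NULL)
        regerror(errcode, preg, msg, needed);
    return msg;
}
```

The exact wording of the message depends on the C library, so it is best shown to the user rather than parsed.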
Note: the material above is taken from http://www.chinaunix.net/jh/23/303346.html
Encapsulate regular expressions in C++
Regexp.hpp
#ifndef __HPP_REGEXP
#define __HPP_REGEXP
#include "autoconf/platform.h"
#include <sys/types.h> // needed for size_t used in regex.h
#include <regex.h>
#include <string>
#include <deque>
#ifdef __GCCVER3
using namespace std;
#endif
class Regexp {
public:
    Regexp();
    ~Regexp();
    Regexp(const Regexp &r);
    bool comp(const char *exp);
    bool match(const char *text);
    int numberofmatches();
    bool matched();
    std::string result(int i);
    unsigned int offset(int i);
    unsigned int length(int i);
    char *search(char *file, char *fileend, char *phrase, char *phraseend);
private:
    std::deque<std::string> results;
    std::deque<unsigned int> offsets;
    std::deque<unsigned int> lengths;
    bool imatched;
    regex_t reg;
    bool wascompiled;
    std::string searchstring;
};
#endif
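The match() method declared in this header collects every match found in a piece of text, not just the first one. Before reading its implementation, it may help to see the underlying idea in plain C: after each successful regexec(), matching resumes just past the previous match with REG_NOTBOL set, because the new starting point is no longer the beginning of the line. This is a simplified sketch that only counts matches; the helper name count_matches is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: count the non-overlapping matches of the extended
 * regular expression `pattern` in `text`. Returns -1 if the pattern
 * does not compile. */
int count_matches(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    const char *pos = text;
    int count = 0;
    int eflags = 0;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&re, pos, 1, &m, eflags) == 0) {
        count++;
        if (m.rm_eo == 0)
            break;              /* empty match at pos: stop to avoid looping */
        pos += m.rm_eo;         /* resume right after this match */
        eflags = REG_NOTBOL;    /* pos is no longer the start of the line */
    }
    regfree(&re);
    return count;
}
```

Without REG_NOTBOL, a pattern anchored with ^ would match again at every restart position, which is why the flag is set for every call after the first.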
Regexp.cpp
#include "Regexp.hpp"
#include <cstring> // strncpy
#include <iostream>
Regexp::Regexp()
    : imatched(false), wascompiled(false) {}
Regexp::~Regexp() {
    if (wascompiled) {
        regfree(&reg);
    }
}
Regexp::Regexp(const Regexp &r) {
    // Copy the stored match results
    results = r.results;
    offsets = r.offsets;
    lengths = r.lengths;
    imatched = r.imatched;
    wascompiled = r.wascompiled;
    searchstring = r.searchstring;
    // A compiled regex_t cannot simply be copied, so compile the
    // expression again for this object
    if (wascompiled == true) {
        if (regcomp(&reg, searchstring.c_str(), REG_ICASE | REG_EXTENDED)) {
            // POSIX leaves reg undefined after a failed regcomp(),
            // so it is not passed to regfree()
            imatched = false;
            wascompiled = false;
        }
    }
}
bool Regexp::comp(const char *exp) {
    if (wascompiled) { // release the memory
        regfree(&reg);
        wascompiled = false;
    }
    results.clear();
    offsets.clear();
    lengths.clear();
    imatched = false;
    if (regcomp(&reg, exp, REG_ICASE | REG_EXTENDED)) { // compile regex
        // An error occurred while compiling the regular expression.
        // POSIX leaves reg undefined in that case, so regfree() is not called.
        return false; // need exception?
    }
    wascompiled = true;
    searchstring = exp;
    return true;
}
bool Regexp::match(const char *text) {
    if (!wascompiled) {
        return false; // need exception?
    }
    const char *pos = text;
    int i;
    results.clear();
    offsets.clear();
    lengths.clear();
    imatched = false;
    regmatch_t *pmatch;
    pmatch = new regmatch_t[reg.re_nsub + 1]; // to hold result
    if (!pmatch) { // defensive only: plain new throws rather than returning null
        delete[] pmatch;
        imatched = false;
        return false;
        // exception?
    }
    if (regexec(&reg, pos, reg.re_nsub + 1, pmatch, 0)) { // run regex
        delete[] pmatch;
        imatched = false;
        // #ifdef DGDEBUG
        // std::cout << "no match for:" << searchstring << std::endl;
        // #endif
        return false; // if no match
    }
    size_t matchlen;
    char *submatch;
    unsigned int largestoffset;
    int error = 0;
    while (error == 0) { // append each match found in text to results, offsets, and lengths, in order, for later use
        largestoffset = 0;
        for (i = 0; i <= (signed)reg.re_nsub; i++) {
            if (pmatch[i].rm_so != -1) {
                matchlen = pmatch[i].rm_eo - pmatch[i].rm_so;
                submatch = new char[matchlen + 1];
                strncpy(submatch, pos + pmatch[i].rm_so, matchlen);
                submatch[matchlen] = '\0';
                results.push_back(std::string(submatch));
                offsets.push_back(pmatch[i].rm_so + (pos - text));
                lengths.push_back(matchlen);
                delete[] submatch;
                if ((pmatch[i].rm_so + matchlen) > largestoffset) {
                    largestoffset = pmatch[i].rm_so + matchlen;
                }
            }
        }
        if (largestoffset > 0) { // text may hold further matches: resume matching at pos + largestoffset
            pos += largestoffset;
            error = regexec(&reg, pos, reg.re_nsub + 1, pmatch, REG_NOTBOL);
        }
        else {
            error = -1;
        }
    }
    imatched = true;
    delete[] pmatch;
#ifdef DGDEBUG
    std::cout << "match(s) for:" << searchstring << std::endl;
#endif
    return true; // match(s) found
}
std::string Regexp::result(int i) {
    if (i >= (signed)results.size() || i < 0) { // reality check
        return ""; // maybe exception?
    }
    return results[i];
}
unsigned int Regexp::offset(int i) {
    if (i >= (signed)offsets.size() || i < 0) { // reality check
        return 0; // maybe exception?
    }
    return offsets[i];
}
unsigned int Regexp::length(int i) {
    if (i >= (signed)lengths.size() || i < 0) { // reality check
        return 0; // maybe exception?
    }
    return lengths[i];
}
int Regexp::numberofmatches() {
    int i = (signed)results.size();
    return i;
}
bool Regexp::matched() {
    return imatched; // regexp matches only
}
// My own version of STL::search() which seems to be 5-6 times faster
char *Regexp::search(char *file, char *fileend, char *phrase, char *phraseend) {
    int j, l; // counters
    int p; // to hold precalculated value for speed
    bool match; // flag
    int qsbc[256]; // quick search Boyer-Moore shift table (256 alphabet)
    char *k; // pointer used in matching
    int pl = phraseend - phrase; // phrase length
    int fl = (int)(fileend - file) - pl; // file length that could match
    if (fl < pl) return fileend; // reality checking
    if (pl > 126) return fileend; // reality checking
    // For speed we append the phrase to the end of the memory block so it
    // is always found, thus eliminating some checking. This is possible as
    // we know an extra 127 bytes have been provided by naughtyfilter.cpp
    // and also the optioncontainer does not allow phrase lengths greater
    // than 126 chars
    for (j = 0; j < pl; j++) {
        fileend[j] = phrase[j];
    }
    // Next we need to make the quick search Boyer-Moore shift table
    p = pl + 1;
    for (j = 0; j < 256; j++) { // preprocessing
        qsbc[j] = p;
    }
    for (j = 0; j < pl; j++) { // preprocessing
        qsbc[(unsigned char)phrase[j]] = pl - j;
    }
    // Now do the searching!
    for (j = 0;;) {
        k = file + j;
        match = true;
        for (l = 0; l < pl; l++) { // equivalent to, but faster than, memcmp()
            if (k[l] != phrase[l]) {
                match = false;
                break;
            }
        }
        if (match) {
            return (j + file); // match found at offset j (but could be
                               // the copy put at fileend)
        }
        j += qsbc[(unsigned char)file[j + pl]]; // shift
    }
    return fileend; // should never get here as it should always match
}
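The search() method above is a variant of the Quick Search algorithm: when a comparison fails, the byte just past the current window decides how far the window can jump. The following plain C sketch shows the same shift table without the trick of appending the phrase to the end of the buffer, so it needs no spare bytes after the text and checks bounds explicitly instead. It is an illustrative rewrite under those assumptions, not the author's code, and the name quick_search is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch of Quick Search. Returns a pointer to the first occurrence of
 * `pat` (length m) in `text` (length n), or NULL if there is none. */
const char *quick_search(const char *text, size_t n,
                         const char *pat, size_t m)
{
    size_t shift[256];
    size_t i, j;

    assert(text != NULL && pat != NULL);
    if (m == 0 || m > n)
        return NULL;

    /* A byte that does not occur in the pattern lets the window jump
     * completely past it: m + 1 positions. */
    for (i = 0; i < 256; i++)
        shift[i] = m + 1;
    /* Otherwise align the rightmost occurrence of that byte in the
     * pattern with the byte just past the current window. */
    for (i = 0; i < m; i++)
        shift[(unsigned char)pat[i]] = m - i;

    for (j = 0; j + m <= n; ) {
        if (memcmp(text + j, pat, m) == 0)
            return text + j;
        if (j + m == n)
            break;              /* no byte past the window to inspect */
        j += shift[(unsigned char)text[j + m]];
    }
    return NULL;
}
```

The explicit bounds checks cost a little speed compared with the sentinel approach, but the function can then be called on any buffer, including read-only ones.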