[Technical learning] Introduction to three regular expression libraries: ATL CAtlRegExp, GRETA, and Boost.Regex

Source: Internet
Author: User

This article, compiled from several translated articles, briefly introduces three regular expression libraries: ATL CAtlRegExp, GRETA, and Boost.Regex. These libraries make it easy to put the power of regular expressions to work in everyday programming.

Regular expression syntax

Element     Meaning
.           Matches any single character.
[ ]         Specifies a character class: matches any one of the characters inside the brackets. For example, [abc] matches "a", "b", or "c".
^           At the start of a character class, ^ negates the class: a negated class matches any character except those inside the brackets. For example, [^abc] matches any character other than "a", "b", and "c". At the start of a regular expression, ^ matches the beginning of the input. For example, ^[abc] matches only input that begins with "a", "b", or "c".
-           Specifies a range of characters inside a character class. For example, [0-9] matches any digit from "0" to "9".
?           The preceding expression is optional: it matches once or not at all. For example, [0-9][0-9]? matches "2" or "12".
+           The preceding expression matches one or more times. For example, [0-9]+ matches "1", "13", "666", and so on.
*           The preceding expression matches zero or more times.
??, +?, *?  Non-greedy versions of ?, +, and *: they match as few characters as possible, whereas the greedy versions match as many as possible. For example, given the input "<abc><def>", <.*?> matches "<abc>" while <.*> matches "<abc><def>".
( )         Grouping operator. For example, (\d+,)*\d+ matches a list of comma-separated numbers, such as "1" or "1,23,456".
{ }         Marks a match group (CAtlRegExp syntax): the text matched by the expression inside the braces can be retrieved after the match through a CAtlREMatchContext object.
\           Escape character: the character that follows is taken literally. For example, [0-9]+ matches one or more digits, while [0-9]\+ matches a digit followed by a plus sign. The backslash also introduces abbreviations; for example, \a matches any alphanumeric character (see the table below). If \ is followed by a number n, it matches the text captured by the nth match group (numbered from 0). For example, <{.*?}>.*?</\0> matches "<head>Contents</head>". In C++ string literals the backslash must be doubled: "\\+", "\\a", "<{.*?}>.*?</\\0>".
$           At the end of a regular expression, matches the end of the input. For example, [0-9]$ matches a digit at the end of the input.
|           Alternation: separates two expressions, exactly one of which matches. For example, (T|t)he matches "The" or "the".

 

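Abbreviations

Abbreviation  Matches
\a            Any alphanumeric character: ([a-zA-Z0-9])
\b            White space (blank): ([ \t])
\c            Any letter: ([a-zA-Z])
\d            Any decimal digit: ([0-9])
\h            Any hexadecimal digit: ([0-9a-fA-F])
\n            Newline: (\r|(\r?\n))
\q            A quoted string: (\"[^\"]*\")|(\'[^\']*\')
\w            A simple word: ([a-zA-Z]+)
\z            An integer: ([0-9]+)

These abbreviations belong to the CAtlRegExp syntax. Perl-style engines such as GRETA and Boost.Regex also treat \d as a digit, but give several of the others different meanings; for example, \b is a word boundary and \w is a single word character.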

ATL CAtlRegExp

ATL Server often has to decode complex text fields such as addresses and commands, and regular expressions are a powerful text-parsing tool, so ATL provides a regular expression parser: the CAtlRegExp class, declared in atlrx.h.
Example:

    #include "stdafx.h"
    #include <stdio.h>
    #include <atlrx.h>

    int main(int argc, char* argv[])
    {
        // CAtlRegExp<> uses TCHAR-based traits; the narrow string literals
        // below assume a non-Unicode (MBCS) build.
        CAtlRegExp<> reUrl;
        // Five match groups: scheme, authority, path, query, fragment
        REParseError status = reUrl.Parse(
            "({[^:/?#]+}:)?(//{[^/?#]*})?{[^?#]*}(?{[^#]*})?(#{.*})?");
        if (REPARSE_ERROR_OK != status)
        {
            // Unexpected error: the pattern could not be parsed.
            return 1;
        }

        CAtlREMatchContext<> mcUrl;
        if (!reUrl.Match(
            "http://search.microsoft.com/us/Search.asp?qu=atl&boolean=ALL#results",
            &mcUrl))
        {
            // Unexpected error: the input did not match.
            return 1;
        }

        for (UINT nGroupIndex = 0; nGroupIndex < mcUrl.m_uNumGroups; ++nGroupIndex)
        {
            const CAtlREMatchContext<>::RECHAR* szStart = 0;
            const CAtlREMatchContext<>::RECHAR* szEnd = 0;
            mcUrl.GetMatch(nGroupIndex, &szStart, &szEnd);
            ptrdiff_t nLength = szEnd - szStart;
            // %.*s expects an int precision, so cast the ptrdiff_t length.
            printf("%u: \"%.*s\"\n", nGroupIndex, static_cast<int>(nLength), szStart);
        }
        return 0;
    }

Output:

    0: "http"
    1: "search.microsoft.com"
    2: "/us/Search.asp"
    3: "qu=atl&boolean=ALL"
    4: "results"

Match() returns its result through the CAtlREMatchContext object pointed to by its second parameter, pContext. That object stores the match results and related information, so you only need to use its members and methods to read them: m_uNumGroups holds the number of match groups, and GetMatch() takes a group index and returns pointers to the start and end of the text that group matched. With these two pointers, the caller can easily extract each result.

For more information, see the documentation for the CAtlRegExp class.

GRETA

GRETA is a regular expression template library from Microsoft Research. It provides C++ objects and functions that make pattern matching and substitution on strings easy. The main ones are:

  • rpattern: the pattern to search for.
  • match_results/subst_results: containers that hold the results of a match or a substitution.

To search or substitute, first initialize an rpattern object explicitly with a string that describes the matching rules, then call one of its methods, such as match() or substitute(), passing the string to be processed. The call reports whether it succeeded; for match(), the matched member of the returned backreference is true on success, and the match_results object then holds the matching results. See the sample code:

    #include <iostream>
    #include <string>
    #include "regexpr2.h"

    using namespace std;
    using namespace regex;

    int main() {
        match_results results;
        string str( "The book cost $12.34" );
        rpattern pat( "\\$(\\d+)(\\.(\\d\\d))?" );
        // Match a dollar sign followed by one or more digits,
        // optionally followed by a period and two more digits.
        // The double-escapes are necessary to satisfy the compiler.
        match_results::backref_type br = pat.match( str, results );
        if( br.matched ) {
            cout << "match success!" << endl;
            cout << "price: " << br << endl;
        } else {
            cout << "match failed!" << endl;
        }
        return 0;
    }

The program prints:

    match success!
    price: $12.34

See the GRETA documentation for details about the rpattern object and for how to customize search policies to improve efficiency.

Note: all declarations in the header regexpr2.h are in the namespace regex. When you use the objects and functions it declares, either add the prefix "regex::" or write "using namespace regex;". For simplicity, the sample code omits the "regex::" prefix.
The author built the greta.lib and regexpr2.h files; these two files are all you need to use GRETA to parse regular expressions.

Matching speed comparison

Different regular expression engines are good at different kinds of patterns. As a benchmark, matching the pattern "^([0-9]+)(\-| |$)(.*)$" against the string "100-this is a line of FTP response which contains a message string", GRETA was about 7 times faster than the Boost regex library (http://www.boost.org) and about 10 times faster than ATL7's CAtlRegExp.
The Boost.Regex documentation includes performance results for matching many patterns. Comparing those results, I found that GRETA and Boost.Regex perform similarly, with GRETA slightly ahead when compiled with Visual Studio .NET 2003.


Boost.Regex

Boost provides boost::basic_regex to support regular expressions. Its design is very similar to that of std::basic_string:

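    namespace boost {
    // Declaration as given in older Boost releases; newer releases drop the
    // Allocator parameter and declare basic_regex<charT, traits>.
    template <class charT,
              class traits = regex_traits<charT>,
              class Allocator = std::allocator<charT> >
    class basic_regex;

    typedef basic_regex<char> regex;
    typedef basic_regex<wchar_t> wregex;
    }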

The documentation that comes with the Boost regex library is very rich, and its examples are even more impressive. For example, two short example programs convert a C++ source file directly into syntax-highlighted HTML. The following example splits a string into a series of tokens.

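    #include <iostream>
    #include <iterator>
    #include <list>
    #include <string>
    #include <boost/regex.hpp>

    // Splits s at whitespace, appending the tokens to l; returns the number of tokens.
    // regex_split consumes the text it processes, so s is modified.
    unsigned tokenise(std::list<std::string>& l, std::string& s)
    {
        return boost::regex_split(std::back_inserter(l), s);
    }

    using namespace std;

    // Workaround for a problem with std::getline under MSVC6 SP3 (and Borland 5.5).
    // Limited to MSVC6 so that newer compilers use std::getline without ambiguity.
    #if (defined(BOOST_MSVC) && (BOOST_MSVC < 1300)) || (defined(__BORLANDC__) && (__BORLANDC__ == 0x550))
    istream& getline(istream& is, std::string& s)
    {
        s.erase();
        istream::int_type c = is.get();   // int_type, so end of input can be detected
        while (c != istream::traits_type::eof() && c != '\n')
        {
            s.append(1, static_cast<char>(c));
            c = is.get();
        }
        return is;
    }
    #endif

    int main(int argc, char* argv[])
    {
        string s;
        list<string> l;
        do
        {
            if (argc == 1)
            {
                cout << "Enter text to split (or \"quit\" to exit): ";
                // Stop at end of input as well as on "quit".
                if (!getline(cin, s) || s == "quit") break;
            }
            else
                s = "This is a string of tokens";

            unsigned result = tokenise(l, s);
            cout << result << " tokens found" << endl;
            cout << "The remaining text is: \"" << s << "\"" << endl;
            while (l.size())
            {
                s = *(l.begin());
                l.pop_front();
                cout << s << endl;
            }
        } while (argc == 1);
        return 0;
    }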
