Simple application of C++11 regular expressions

Source: Internet
Author: User

A regular expression is a concept in computer science, commonly abbreviated as regex, regexp, or RE (plural: regexps, regexes, or regexen).

A regular expression is a text pattern. Regular expressions are powerful, convenient, and efficient text-processing tools. Combined with a general pattern notation, a regular expression works like a pocket programming language that lets the user describe and analyze text. With the additional support provided by specific tools, regular expressions can add, delete, split, overlay, insert, and reshape many kinds of text and data.

A complete regular expression consists of two kinds of characters: special characters, called "metacharacters", and "literals", that is, normal text characters (such as letters, digits, Chinese characters, and underscores). Metacharacters give regular expressions their descriptive power.

Like text editors, most high-level programming languages support regular expressions; Perl, Java, Python, and C++, for example, each have their own regular expression library.

A regular expression is just a string, and it has no length limit. A "subexpression" is a part of the whole regular expression, usually an expression in parentheses or one branch of an alternation separated by "|".

By default, the letters in an expression are case-sensitive.
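As a minimal sketch of how case sensitivity can be switched off in C++11, the helper below passes the std::regex::icase flag when the regex is constructed. The function name matches_whole and its parameters are hypothetical and are not part of the original post.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): returns true when
// `text` as a whole matches `pattern`, optionally ignoring letter case.
bool matches_whole(const std::string& text, const std::string& pattern,
                   bool ignore_case)
{
    // ECMAScript is the default grammar of std::regex; icase is OR-ed in
    // only when a case-insensitive comparison is requested.
    std::regex::flag_type flags = std::regex::ECMAScript;
    if (ignore_case)
        flags |= std::regex::icase;
    const std::regex re(pattern, flags);
    return std::regex_match(text, re);
}
```

With this sketch, matches_whole("Hello", "hello", false) is false, while matches_whole("Hello", "hello", true) is true.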

Common metacharacters:

1. ".": matches any single character except "\n"; to match any character, including "\n", use a pattern such as "[\s\S]";

2. "^": matches the starting position of the input string without consuming any characters; to match the "^" character itself, use "\^";

3. "$": matches the end position of the input string without consuming any characters; to match the "$" character itself, use "\$";

4. "*": matches the preceding character or subexpression zero or more times; "*" is equivalent to "{0,}". For example, "\^*b" can match "b", "^b", "^^b", ...;

5. "+": matches the preceding character or subexpression one or more times, equivalent to "{1,}". For example, "a+b" can match "ab", "aab", "aaab", ...;

6. "?": matches the preceding character or subexpression zero or one time, equivalent to "{0,1}". For example, "a[cd]?" can match "a", "ac", "ad". When "?" immediately follows any other quantifier ("*", "+", "?", "{n}", "{n,}", "{n,m}"), matching becomes "non-greedy". A non-greedy pattern matches the shortest possible string, while the default greedy pattern matches the longest possible string. For example, in the string "oooo", "o+?" matches only a single "o", while "o+" matches all of the "o" characters;

7. "|": a logical OR between two alternatives. For example, the regular expression "(him|her)" matches "it belongs to him" and "it belongs to her", but does not match "it belongs to them.";

8. "\": marks the next character as a special character, a literal, a backreference, or an octal escape. For example, "n" matches the character "n", "\n" matches a line break, the sequence "\\" matches "\", and "\(" matches "(";

9. "\w": matches any single letter, digit, or underscore, that is, any of A~Z, a~z, 0~9, _;

10. "\W": matches any character that is not a letter, digit, or underscore;

11. "\s": matches any single whitespace character, including spaces, tabs, and form feeds, equivalent to "[ \f\n\r\t\v]";

12. "\S": matches any character that is not whitespace, equivalent to "[^ \f\n\r\t\v]";

13. "\d": matches any single digit from 0 to 9, equivalent to "[0-9]";

14. "\D": matches any non-digit character, equivalent to "[^0-9]";

15. "\b": matches a word boundary, that is, the position between a word and a space, without consuming any characters. For example, "er\b" matches "er" in "never", but does not match "er" in "verb";

16. "\B": matches a non-word boundary. "er\B" matches "er" in "verb", but does not match "er" in "never";

17. "\f": matches a form feed, equivalent to "\x0c" and "\cL";

18. "\n": matches a line feed, equivalent to "\x0a" and "\cJ";

19. "\r": matches a carriage return, equivalent to "\x0d" and "\cM";

20. "\t": matches a tab, equivalent to "\x09" and "\cI";

21. "\v": matches a vertical tab, equivalent to "\x0b" and "\cK";

22. "\cx": matches the control character indicated by "x". For example, "\cM" matches Control-M, the carriage return. The value of "x" must be in "A-Z" or "a-z"; if it is not, "c" is treated as the literal character "c";

23. "{n}": "n" is a non-negative integer; matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two "o" characters in "food";

24. "{n,}": "n" is a non-negative integer; matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all of the "o" characters in "foooood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*";

25. "{n,m}": "n" and "m" are non-negative integers with n<=m; matches at least n times and at most m times. For example, "o{1,3}" matches the first three "o" characters in "fooooood", and "o{0,1}" is equivalent to "o?". Note that spaces cannot be inserted between the comma and the numbers. "ba{1,3}" can match "ba", "baa", or "baaa";

26. "x|y": matches "x" or "y". For example, "z|food" matches "z" or "food"; "(z|f)ood" matches "zood" or "food";

27. "[xyz]": a character set that matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain";

28. "[^xyz]": a negated character set that matches any character not enclosed. For example, "[^abc]" matches the "p" in "plain";

29. "[a-z]": a character range that matches any character in the specified range. For example, "[a-z]" matches any lowercase letter from "a" to "z";

30. "[^a-z]": a negated character range that matches any character not in the specified range. For example, "[^a-z]" matches any character that is not in the range "a" to "z";

31. "()": the expression between "(" and ")" is defined as a "group", and the text matched by that expression is saved to a temporary area. A regular expression can save up to 9 groups, which can be referenced with "\1" through "\9";

32. "(pattern)": matches pattern and captures the matched subexpression; the captured match can be retrieved from the resulting match collection using $0...$9;

33. "(?:pattern)": matches pattern but does not capture the match, that is, it is a non-capturing group whose match is not stored for later use. It is useful for combining parts of a pattern with the "or" character "(|)". For example, "industr(?:y|ies)" is a more concise expression than "industry|industries";

34. "(?=pattern)": a non-capturing positive lookahead. It matches at any position where a string matching pattern begins, and the match is not stored for later use. For example, "Windows(?=95|98|NT|2000)" matches "Windows" in "Windows2000", but does not match "Windows" in "Windows3.1". A lookahead does not consume characters: after a match, the search for the next match starts immediately after the last match, not after the characters that made up the lookahead;

35. "(?!pattern)": a non-capturing negative lookahead. It matches at any position where a string that does not match pattern begins, and the match is not stored for later use. For example, "Windows(?!95|98|NT|2000)" matches "Windows" in "Windows3.1", but does not match "Windows" in "Windows2000";
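Several of the metacharacters listed above can be tried out directly with std::regex, which uses an ECMAScript-style grammar by default. The two helpers below, first_match and first_match_pos, are hypothetical names introduced only for this sketch; they are not part of the original post.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text of the first match of `pattern`
// in `text`, or an empty string when nothing matches.
std::string first_match(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}

// Hypothetical helper: returns the offset of the first match of `pattern`
// in `text`, or -1 when nothing matches.
long first_match_pos(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(text, m, re))
        return -1;
    return static_cast<long>(m.position(0));
}
```

Under these assumptions, first_match("oooo", "o+") returns "oooo" (greedy) and first_match("oooo", "o+?") returns "o" (non-greedy). first_match_pos("never verb", "er\\B") returns 7, the "er" inside "verb". first_match("Windows3.1", "Windows(?=95|98|NT|2000)") returns an empty string because the lookahead fails. first_match("say hello hello world", "(\\w+) \\1") returns "hello hello" because "\1" refers back to the first group.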

To match a special character literally, precede it with "\". For example, to match the characters "^", "$", "()", "[]", "{}", ".", "?", "+", "*", "|", use "\^", "\$", "\(", "\)", "\[", "\]", "\{", "\}", "\.", "\?", "\+", "\*", "\|".
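In C++ source code the backslash is also the escape character of ordinary string literals, so every regex backslash has to be doubled unless a raw string literal is used. The sketch below shows both spellings, plus a small helper that escapes the special characters listed above so that arbitrary text can be searched for literally. The names kPhoneEscaped, kPhoneRaw, escape_for_regex, and contains_literal are hypothetical and are not part of the original post.

```cpp
#include <regex>
#include <string>

// The same pattern written twice: with doubled backslashes in an ordinary
// string literal, and unchanged inside a C++11 raw string literal.
const std::string kPhoneEscaped = "\\d{3}-\\d{8}";
const std::string kPhoneRaw = R"(\d{3}-\d{8})";

// Hypothetical helper: prefixes each regex special character in `literal`
// with a backslash so the result matches the text verbatim.
std::string escape_for_regex(const std::string& literal)
{
    static const std::string special = "^$()[]{}.?+*|\\";
    std::string out;
    for (char c : literal) {
        if (special.find(c) != std::string::npos)
            out += '\\';
        out += c;
    }
    return out;
}

// Hypothetical helper: returns true if `text` contains `literal` verbatim.
bool contains_literal(const std::string& text, const std::string& literal)
{
    const std::regex re(escape_for_regex(literal));
    return std::regex_search(text, re);
}
```

For instance, the unescaped pattern "a.b" finds a match in "axb" because "." matches any character, whereas contains_literal("axb", "a.b") is false because the dot is escaped first.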

C++11 adds the <regex> header to the standard library. It is available in GCC 4.9.0 and later and in Visual Studio 2013 and later, and it provides functions such as std::regex_match, std::regex_search, and std::regex_replace. Here is the test code:

    // #include "regex.hpp"  // project-local header from the original post; not needed to compile this listing
    #include <cstdio>
    #include <iostream>
    #include <iterator>
    #include <regex>
    #include <string>
    #include <vector>

    int test_regex_match()
    {
        std::string pattern{ "\\d{3}-\\d{8}|\\d{4}-\\d{7}" }; // fixed-line telephone number
        std::regex re(pattern);

        std::vector<std::string> str{ "010-12345678", "0319-9876543", "021-123456789" };

        /* std::regex_match: determines whether the regular expression re matches
           the entire character sequence; it is mainly used to validate text.
           The expression must match the whole string: the call returns true only
           if the entire sequence matches, and false otherwise. */
        for (const auto& tmp : str) {
            bool ret = std::regex_match(tmp, re);
            if (ret) fprintf(stderr, "%s, can match\n", tmp.c_str());
            else fprintf(stderr, "%s, can not match\n", tmp.c_str());
        }

        return 0;
    }

    int test_regex_search()
    {
        std::string pattern{ "https?://\\S+" }; // URL with an http or https scheme
        std::regex re(pattern);

        std::vector<std::string> str{ "http://blog.csdn.net/fengbingchun", "https://github.com/fengbingchun",
            "abcd://124.456", "abcd https://github.com/fengbingchun 123" };

        /* std::regex_search: similar to regex_match, but it does not require the
           whole character sequence to match; use it to find a subsequence of the
           input that matches the regular expression re. */
        for (const auto& tmp : str) {
            bool ret = std::regex_search(tmp, re);
            if (ret) fprintf(stderr, "%s, can search\n", tmp.c_str());
            else fprintf(stderr, "%s, can not search\n", tmp.c_str());
        }

        return 0;
    }

    int test_regex_search2()
    {
        std::string pattern{ "[a-zA-Z]+://[^\\s]*" }; // URL with any alphabetic scheme
        std::regex re(pattern);

        std::string str{ "my csdn blog addr is:http://blog.csdn.net/fengbingchun, my github addr is:https://github.com/fengbingchun" };
        std::smatch results;
        // Print each match, then continue searching in the text after it.
        while (std::regex_search(str, results, re)) {
            for (auto x : results)
                std::cout << x << " ";
            std::cout << std::endl;
            str = results.suffix().str();
        }

        return 0;
    }

    int test_regex_replace()
    {
        std::string pattern{ "\\d{18}|\\d{17}X" }; // ID card number
        std::regex re(pattern);

        std::vector<std::string> str{ "123456789012345678", "abcd123456789012345678efgh",
            "abcdefbg", "12345678901234567X" };
        std::string fmt{ "********" };

        /* std::regex_replace: finds every occurrence of the regular expression re
           in the character sequence; each matched substring is replaced according
           to the format string fmt. */
        for (const auto& tmp : str) {
            std::string ret = std::regex_replace(tmp, re, fmt);
            fprintf(stderr, "src: %s, dst: %s\n", tmp.c_str(), ret.c_str());
        }

        return 0;
    }

    int test_regex_replace2()
    {
        // reference: http://www.cplusplus.com/reference/regex/regex_replace/
        std::string s("there is a subsequence in the string\n");
        std::regex e("\\b(sub)([^ ]*)"); // matches words beginning with "sub"

        // using string/c-string (3) version:
        std::cout << std::regex_replace(s, e, "sub-$2");

        // using range/c-string (6) version:
        std::string result;
        std::regex_replace(std::back_inserter(result), s.begin(), s.end(), e, "$2");
        std::cout << result;

        // with flags:
        std::cout << std::regex_replace(s, e, "$1 and $2", std::regex_constants::format_no_copy);
        std::cout << std::endl;

        return 0;
    }

    int main()
    {
        test_regex_match();
        test_regex_search();
        test_regex_search2();
        test_regex_replace();
        test_regex_replace2();
        return 0;
    }
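As a possible alternative to repeatedly searching the suffix of the previous match, the standard library also offers std::sregex_iterator, and the sub-matches of a std::smatch give access to the captured groups. The sketch below illustrates both; the helpers find_all and split_phone and the phone-number pattern inside split_phone are assumptions made for this example and do not come from the original post.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every non-overlapping match of `re` in `text`.
std::vector<std::string> find_all(const std::string& text, const std::regex& re)
{
    std::vector<std::string> matches;
    // A default-constructed sregex_iterator acts as the end-of-sequence marker.
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        matches.push_back(it->str());
    return matches;
}

// Hypothetical helper: splits a number such as "010-12345678" into its area
// code (capture group 1) and subscriber number (capture group 2).
bool split_phone(const std::string& text, std::string& area, std::string& number)
{
    static const std::regex re(R"((\d{3,4})-(\d{7,8}))");
    std::smatch m;
    if (!std::regex_match(text, m, re))
        return false;
    area = m[1].str();   // m[0] would be the whole match
    number = m[2].str();
    return true;
}
```

For example, calling find_all on "see http://a.example/x and https://b.example/y now" with the regex "[a-zA-Z]+://\\S+" yields the two URLs in order, and split_phone("010-12345678", area, number) sets area to "010" and number to "12345678".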

Note: This post is reproduced from http://blog.csdn.net/fengbingchun/article/details/54835571
