Introduction to regular expressions and a simple tutorial in C++11

A regular expression (regex) is a powerful tool for describing character sequences. Regular expressions are available in many languages, and C++11 incorporates them into the standard library. The library supports six regular expression grammars: ECMAScript, basic, extended, awk, grep, and egrep. ECMAScript is the default, and a different grammar can be specified when constructing a regular expression.
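As a minimal sketch of choosing a grammar at construction time, the hypothetical helper matches_with below (not part of the original test code) passes one of the std::regex_constants grammar flags to the std::regex constructor:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `text` as a whole matches `pattern`
// when the pattern is interpreted under the given grammar flag.
bool matches_with(const std::string& text, const std::string& pattern,
                  std::regex_constants::syntax_option_type grammar)
{
    std::regex re(pattern, grammar);  // the grammar is fixed at construction
    return std::regex_match(text, re);
}
```

With std::regex_constants::ECMAScript, the pattern "a+" matches "aaa". With std::regex_constants::basic (POSIX basic regular expressions), "+" is an ordinary character, so the same pattern matches only the literal text "a+".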

A regular expression is a text pattern. Regular expressions are a powerful, convenient, and efficient text processing tool. The pattern notation itself, which works much like a small programming language, gives users the ability to describe and analyze text. With the additional support provided by specific tools, regular expressions can be used to add, delete, separate, overlay, insert, and trim various types of text and data.

A complete regular expression consists of two kinds of characters: special characters, called "metacharacters", and ordinary text characters, called "literals" (such as letters, digits, Chinese characters, and underscores). Metacharacters are what give regular expressions their descriptive power.

Like text editors, most high-level programming languages support regular expressions, for example Perl, Java, Python, and C/C++. Each of these languages has its own regular expression library.

A regular expression is simply a string, with no limit on its length. A subexpression is a part of the whole regular expression, usually an expression enclosed in parentheses or one of several alternatives separated by "|".
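To illustrate subexpressions, here is a small sketch that returns the text captured by each parenthesized subexpression. The helper name capture_groups and the sample pattern are assumptions made for this example, not part of the original test code:

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the text captured by each parenthesized
// subexpression of `pattern`, or an empty vector if `text` does not match.
std::vector<std::string> capture_groups(const std::string& text,
                                        const std::string& pattern)
{
    std::regex re(pattern);
    std::smatch m;
    std::vector<std::string> groups;
    if (std::regex_match(text, m, re)) {
        // m[0] is the whole match; m[1], m[2], ... are the subexpressions.
        for (std::size_t i = 1; i < m.size(); ++i)
            groups.push_back(m[i].str());
    }
    return groups;
}
```

For example, capture_groups("010-12345678", R"((\d{3,4})-(\d{7,8}))") yields the two subexpression matches "010" and "12345678".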

By default, the letters in the expression are case-sensitive.
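Case sensitivity can be switched off with the std::regex::icase flag. The sketch below compares the two behaviors; the function names are hypothetical and chosen only for this illustration:

```cpp
#include <regex>
#include <string>

// Default behavior: letters in the pattern are matched case-sensitively.
bool contains_case_sensitive(const std::string& text, const std::string& pattern)
{
    std::regex re(pattern);
    return std::regex_search(text, re);
}

// With std::regex::icase, letters match regardless of case.
bool contains_ignore_case(const std::string& text, const std::string& pattern)
{
    std::regex re(pattern, std::regex::ECMAScript | std::regex::icase);
    return std::regex_search(text, re);
}
```

Searching "Hello World" for "hello" fails with the first function and succeeds with the second.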

Common metacharacters:

1. ".": matches any single character except "\n". To match any character including "\n", use a pattern such as "[\s\S]";

2. "^": matches the position at the start of the input string. It does not consume any character. To match the "^" character itself, use "\^";

3. "$": matches the position at the end of the input string. It does not consume any character. To match the "$" character itself, use "\$";

4. "*": matches the preceding character or subexpression zero or more times. "*" is equivalent to "{0,}". For example, "\^*b" matches "b", "^b", "^^b", ...;

5. "+": matches the preceding character or subexpression one or more times, equivalent to "{1,}". For example, "a+b" matches "ab", "aab", "aaab", ...;

6. "?": matches the preceding character or subexpression zero or one time, equivalent to "{0,1}". For example, "a[cd]?" matches "a", "ac", and "ad". When "?" immediately follows any other quantifier ("*", "+", "?", "{n}", "{n,}", "{n,m}"), the matching mode becomes "non-greedy". A non-greedy pattern matches as little of the searched string as possible, while the default greedy pattern matches as much as possible. For example, in the string "oooo", "o+?" matches only a single "o", while "o+" matches all of the "o" characters;

7. "|": performs a logical OR between two alternatives. For example, the regular expression "(him|her)" matches "it belongs to him" and "it belongs to her", but does not match "it belongs to them.";

8. "\": marks the next character as a special character, a literal, a backreference, or an octal escape. For example, "n" matches the character "n", "\n" matches a line break, the sequence "\\" matches "\", and "\(" matches "(";

9. "\w": matches any letter, digit, or underscore, that is, any one of A-Z, a-z, 0-9, and _. Equivalent to "[A-Za-z0-9_]";

10. "\W": matches any character that is not a letter, digit, or underscore. Equivalent to "[^A-Za-z0-9_]";

11. "\s": matches any whitespace character, including spaces, tabs, and form feeds. Equivalent to "[ \f\n\r\t\v]";

12. "\S": matches any character that is not whitespace. Equivalent to "[^ \f\n\r\t\v]";

13. "\d": matches a digit, that is, any one of 0-9. Equivalent to "[0-9]";

14. "\D": matches any non-digit character. Equivalent to "[^0-9]";

15. "\b": matches a word boundary, that is, the position between a word and a space. It does not consume any character. For example, "er\b" matches the "er" in "never", but does not match the "er" in "verb";

16. "\B": matches a non-word boundary. "er\B" matches the "er" in "verb", but does not match the "er" in "never";

17. "\f": matches a form feed, equivalent to "\x0c" and "\cL";

18. "\n": matches a line break, equivalent to "\x0a" and "\cJ";

19. "\r": matches a carriage return, equivalent to "\x0d" and "\cM";

20. "\t": matches a tab, equivalent to "\x09" and "\cI";

21. "\v": matches a vertical tab, equivalent to "\x0b" and "\cK";

22. "\cx": matches the control character indicated by "x". For example, "\cM" matches Control-M, which is a carriage return. The value of "x" must be in the range "A-Z" or "a-z". If it is not, "c" is assumed to be the literal "c" character;

23. "{n}": "n" is a non-negative integer; matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but it matches the two "o" characters in "food";

24. "{n,}": "n" is a non-negative integer; matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all of the "o" characters in "foooood". "o{1,}" is equivalent to "o+" and "o{0,}" is equivalent to "o*";

25. "{n,m}": "n" and "m" are non-negative integers with n <= m; matches at least n times and at most m times. For example, "o{1,3}" matches the first three "o" characters in "fooooood", and "o{0,1}" is equivalent to "o?". Note: do not insert spaces between the comma and the numbers. As another example, "ba{1,3}" matches "ba", "baa", or "baaa";

26. "x|y": matches "x" or "y". For example, "z|food" matches "z" or "food", and "(z|f)ood" matches "zood" or "food";

27. "[xyz]": character set; matches any one of the characters it contains. For example, "[abc]" matches the "a" in "plain";

28. "[^xyz]": negated character set; matches any character that is not in the set. For example, "[^abc]" matches the "p" in "plain";

29. "[a-z]": character range; matches any character in the specified range. For example, "[a-z]" matches any lower-case letter from "a" to "z";

30. "[^a-z]": negated character range; matches any character that is not within the specified range. For example, "[^a-z]" matches any character that is not in the range "a" to "z";

31. "()": defines the expression between "(" and ")" as a group and saves the characters matched by this expression to a temporary area. The first nine saved values can be referenced with "\1" through "\9";

32. "(pattern)": matches pattern and captures the matched subexpression. The captured matches can be retrieved from the resulting match set as $0 ... $9;

33. "(?:pattern)": matches pattern but does not capture the match, that is, it is a non-capturing group whose result is not stored for later use. This is useful for combining parts of a pattern with the "or" character "|". For example, "industr(?:y|ies)" is a more economical expression than "industry|industries";

34. "(?=pattern)": a non-capturing match that performs a positive lookahead. It matches the search string at any point where a string matching pattern begins. The match is not captured for later use. For example, "Windows(?=95|98|NT|2000)" matches the "Windows" in "Windows2000", but does not match the "Windows" in "Windows3.1". A lookahead does not consume characters: after a match occurs, the search for the next match starts immediately after the last match, not after the characters that made up the lookahead;

35. "(?!pattern)": a non-capturing match that performs a negative lookahead. It matches the search string at any point where a string not matching pattern begins. The match is not captured for later use. For example, "Windows(?!95|98|NT|2000)" matches the "Windows" in "Windows3.1", but does not match the "Windows" in "Windows2000";
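Several of the examples in the list above can be checked with a short sketch. The helper first_match is hypothetical (it is not part of the original test code) and assumes the default ECMAScript grammar; it returns the first substring that std::regex_search finds, or an empty string if there is none:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first substring of `text` matched by
// `pattern` (default ECMAScript grammar), or "" if nothing matches.
std::string first_match(const std::string& text, const std::string& pattern)
{
    std::regex re(pattern);
    std::smatch m;
    if (std::regex_search(text, m, re))
        return m.str(0);  // m.str(0) is the text of the whole match
    return "";
}
```

For instance, first_match("oooo", "o+") returns "oooo" while first_match("oooo", "o+?") returns "o", and first_match("Windows2000", R"(Windows(?=95|98|NT|2000))") returns "Windows" without consuming "2000".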

To match one of the special characters literally, add "\" before it. For "^", "$", "(", ")", "[", "]", "{", "}", ".", "?", "+", "*", and "|", use "\^", "\$", "\(", "\)", "\[", "\]", "\{", "\}", "\.", "\?", "\+", "\*", and "\|".
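In C++ source code the backslash is also the escape character of ordinary string literals, so a regex escape such as "\." has to be written either as "\\." or inside a raw string literal such as R"(\.)". The following sketch adds the backslashes automatically; escape_regex is a hypothetical helper written for this illustration, not a standard library function:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns `literal` with a backslash inserted before
// every metacharacter, so the result matches the text literally.
std::string escape_regex(const std::string& literal)
{
    static const std::string metachars = R"(\^$.|?*+()[]{})";
    std::string out;
    for (char c : literal) {
        if (metachars.find(c) != std::string::npos)
            out += '\\';  // prefix the special character with a backslash
        out += c;
    }
    return out;
}
```

For example, the unescaped pattern "1+1=2" would match the text "11=2", whereas std::regex(escape_regex("1+1=2")) matches only the literal text "1+1=2".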

In C++11, the <regex> header is available with GCC 4.9.0 or later and with Visual Studio 2013 or later. This header provides std::regex_match, std::regex_search, std::regex_replace, and related functions. The following is the test code:

    // #include "regex.hpp"  // project header that declares the test functions
    #include <cstdio>
    #include <iostream>
    #include <iterator>
    #include <regex>
    #include <string>
    #include <vector>

    int test_regex_match()
    {
        // Fixed-line telephone number: 3 digits + 8 digits, or 4 digits + 7 digits.
        std::string pattern{ "\\d{3}-\\d{8}|\\d{4}-\\d{7}" };
        std::regex re(pattern);

        std::vector<std::string> str{ "010-12345678", "0319-9876543", "021-123456789" };

        /* std::regex_match: determines whether the regular expression re matches
           the entire character sequence. It is mainly used to validate text.
           The expression must match the whole string being analyzed: the function
           returns true if the entire sequence matches and false otherwise. */
        for (const auto& tmp : str) {
            bool ret = std::regex_match(tmp, re);
            if (ret) fprintf(stderr, "%s, can match\n", tmp.c_str());
            else fprintf(stderr, "%s, can not match\n", tmp.c_str());
        }

        return 0;
    }

    int test_regex_search()
    {
        // URL that starts with http:// or https://, followed by non-whitespace characters.
        std::string pattern{ "(http|https)://\\S*" };
        std::regex re(pattern);

        std::vector<std::string> str{ "http://blog.csdn.net/fengbingchun", "https://github.com/fengbingchun",
            "Abcd://124.456", "abcd https://github.com/fengbingchun 123" };

        /* std::regex_search: similar to regex_match, but it does not require the
           entire character sequence to match. It looks for a subsequence of the
           input that matches the regular expression re. */
        for (const auto& tmp : str) {
            bool ret = std::regex_search(tmp, re);
            if (ret) fprintf(stderr, "%s, can search\n", tmp.c_str());
            else fprintf(stderr, "%s, can not search\n", tmp.c_str());
        }

        return 0;
    }

    int test_regex_search2()
    {
        // URL: a scheme made of letters, then "://", then non-whitespace characters.
        std::string pattern{ "[a-zA-Z]+://[^\\s]*" };
        std::regex re(pattern);

        std::string str{ "my csdn blog addr is: http://blog.csdn.net/fengbingchun , my github addr is: https://github.com/fengbingchun " };
        std::smatch results;

        // Find each match in turn, then continue searching in the text after it.
        while (std::regex_search(str, results, re)) {
            for (const auto& x : results)
                std::cout << x << " ";
            std::cout << std::endl;
            str = results.suffix().str();
        }

        return 0;
    }

    int test_regex_replace()
    {
        // ID card number: 18 digits, or 17 digits followed by X.
        std::string pattern{ "\\d{18}|\\d{17}X" };
        std::regex re(pattern);

        std::vector<std::string> str{ "123456789012345678", "abcd123456789012345678efgh",
            "abcdefbg", "12345678901234567X" };
        std::string fmt{ "*********" };

        /* std::regex_replace: finds every match of the regular expression re in
           the character sequence and replaces each matched substring with fmt. */
        for (const auto& tmp : str) {
            std::string ret = std::regex_replace(tmp, re, fmt);
            fprintf(stderr, "src: %s, dst: %s\n", tmp.c_str(), ret.c_str());
        }

        return 0;
    }

    int test_regex_replace2()
    {
        // reference: http://www.cplusplus.com/reference/regex/regex_replace/
        std::string s("there is a subsequence in the string\n");
        std::regex e("\\b(sub)([^ ]*)"); // matches words beginning with "sub"

        // using string/c-string (3) version:
        std::cout << std::regex_replace(s, e, "sub-$2");

        // using range/c-string (6) version:
        std::string result;
        std::regex_replace(std::back_inserter(result), s.begin(), s.end(), e, "$2");
        std::cout << result;

        // with flags:
        std::cout << std::regex_replace(s, e, "$1 and $2", std::regex_constants::format_no_copy);
        std::cout << std::endl;

        return 0;
    }

    int main()
    {
        // Run every test in order.
        test_regex_match();
        test_regex_search();
        test_regex_search2();
        test_regex_replace();
        test_regex_replace2();
        return 0;
    }
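As an alternative to the search-then-suffix loop in test_regex_search2, all matches can also be collected with std::sregex_iterator. This is a minimal sketch; the helper name find_all is an assumption made for this example and is not part of the original test code:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every non-overlapping match of `pattern`
// in `text`, in order of appearance.
std::vector<std::string> find_all(const std::string& text, const std::string& pattern)
{
    std::regex re(pattern);
    std::vector<std::string> found;
    // A default-constructed std::sregex_iterator acts as the end marker.
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        found.push_back(it->str());
    return found;
}
```

Unlike the loop in the test code, this version leaves the input string unmodified, because the iterator keeps track of the search position itself.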

GitHub: https://github.com/fengbingchun/Messy_Test
