Regular Expressions (Repost)

Last Update:2015-09-27 Source: Internet

Author: User


Use of regular expressions

https://msdn.microsoft.com/zh-cn/library/101eysae(v=vs.90).aspx

By using regular expressions, you can:

1. Test for a pattern within a string.

For example, you can test an input string to see whether a telephone number pattern or a credit card number pattern occurs in it. This is called data validation.

2. Replace text.

You can use a regular expression to identify specific text in a document and either remove it completely or replace it with other text.

3. Extract a substring from a string based on a pattern match.

You can find specific text within a document or an input field.
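The three uses above can be sketched in a few lines. This is only an illustration in modern JavaScript (the source describes JScript); the phone number format and the sample strings are assumptions made up for the example.

```javascript
// 1. Test for a pattern (data validation).
//    The NNN-NNNN phone format is an assumption for this sketch.
const phonePattern = /^\d{3}-\d{4}$/;
const isPhone = phonePattern.test("555-0199");
const isNotPhone = phonePattern.test("call me");

// 2. Replace text: collapse every run of white space into one space.
const tidied = "too    many   spaces".replace(/\s+/g, " ");

// 3. Extract substrings: collect every run of digits in the input.
const numbers = "order 66, aisle 7".match(/\d+/g);

console.log(isPhone, isNotPhone, tidied, numbers);
```

The `g` flag in steps 2 and 3 asks for every occurrence rather than only the first one.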

Regular expression syntax

https://msdn.microsoft.com/zh-cn/library/ae5bf541(v=vs.90).aspx

A regular expression is a text pattern that includes ordinary characters (for example, letters A through Z) and special characters (called metacharacters). The pattern describes one or more strings to match when searching for text.

The following table contains a complete list of metacharacters and their behavior in the context of a regular expression:

Character	Description
\	Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, "n" matches the character "n". "\n" matches a newline character. The sequence "\\" matches "\" and "\(" matches "(".
^	Matches the position at the start of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after "\n" or "\r".
$	Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before "\n" or "\r".
*	Matches the preceding character or subexpression zero or more times. For example, "zo*" matches "z" and "zoo". * is equivalent to {0,}.
+	Matches the preceding character or subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?	Matches the preceding character or subexpression zero or one time. For example, "do(es)?" matches the "do" in "do" or "does". ? is equivalent to {0,1}.
{n}	n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two "o" characters in "food".
{n,}	n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all the "o" characters in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*".
{n,m}	m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three "o" characters in "fooooood". "o{0,1}" is equivalent to "o?". Note: you cannot put a space between the comma and the numbers.
?	When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much of it as possible. For example, in the string "oooo", "o+?" matches a single "o", while "o+" matches all the "o" characters.
.	Matches any single character except "\n". To match any character including "\n", use a pattern such as "[\s\S]".
(pattern)	Matches pattern and captures the matched subexpression. The captured match can be retrieved from the resulting Matches collection using the $0...$9 properties. To match the parenthesis characters ( ), use "\(" or "\)".
(?:pattern)	A subexpression that matches pattern but does not capture the match; that is, it is a non-capturing match that is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, "industr(?:y|ies)" is a more economical expression than "industry|industries".
(?=pattern)	A subexpression that performs a positive lookahead search, which matches the string at any point where a string matching pattern begins. It is a non-capturing match; that is, the match is not captured for later use. For example, "Windows (?=95|98|NT|2000)" matches "Windows" in "Windows 2000" but not "Windows" in "Windows 3.1". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern)	A subexpression that performs a negative lookahead search, which matches the search string at any point where a string not matching pattern begins. It is a non-capturing match; that is, the match is not captured for later use. For example, "Windows (?!95|98|NT|2000)" matches "Windows" in "Windows 3.1" but not "Windows" in "Windows 2000". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
x|y	Matches either x or y. For example, "z|food" matches "z" or "food". "(z|f)ood" matches "zood" or "food".
[xyz]	A character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".
[^xyz]	A negated character set. Matches any character that is not enclosed. For example, "[^abc]" matches the "p" in "plain".
[a-z]	A character range. Matches any character in the specified range. For example, "[a-z]" matches any lowercase letter in the range "a" through "z".
[^a-z]	A negated character range. Matches any character that is not in the specified range. For example, "[^a-z]" matches any character that is not in the range "a" through "z".
\b	Matches a word boundary, that is, the position between a word and a space. For example, "er\b" matches the "er" in "never" but not the "er" in "verb".
\B	Matches a non-word boundary. "er\B" matches the "er" in "verb" but not the "er" in "never".
\cx	Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return character. The value of x must be in the range A-Z or a-z. If it is not, c is assumed to be a literal "c" character.
\d	Matches a digit character. Equivalent to [0-9].
\D	Matches a non-digit character. Equivalent to [^0-9].
\f	Matches a form-feed character. Equivalent to \x0c and \cL.
\n	Matches a newline character. Equivalent to \x0a and \cJ.
\r	Matches a carriage return character. Equivalent to \x0d and \cM.
\s	Matches any white-space character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].
\S	Matches any non-white-space character. Equivalent to [^ \f\n\r\t\v].
\t	Matches a tab character. Equivalent to \x09 and \cI.
\v	Matches a vertical tab character. Equivalent to \x0b and \cK.
\w	Matches any word character, including underscore. Equivalent to "[A-Za-z0-9_]".
\W	Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".
\xn	Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" followed by "1". This allows ASCII codes to be used in regular expressions.
\num	Matches num, where num is a positive integer. A backreference to a captured match. For example, "(.)\1" matches two consecutive identical characters.
\n	Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, n is an octal escape value if n is an octal digit (0-7).
\nm	Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither of the preceding conditions holds, \nm matches the octal escape value nm, where n and m are octal digits (0-7).
\nml	Matches the octal escape value nml when n is an octal digit (0-3) and m and l are octal digits (0-7).
\un	Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
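A few rows of the table are easier to follow when run. The sketch below assumes a modern JavaScript engine rather than the JScript runtime the table was written for; the sample strings come from the table, except "letter", which is made up for the backreference case.

```javascript
// Greedy vs. non-greedy: "o+" takes as much as it can, "o+?" as little.
const greedy = "oooo".match(/o+/)[0];
const lazy = "oooo".match(/o+?/)[0];

// Lookahead checks what follows without consuming it, so the matched
// text is only "Windows " and never includes the version.
const positive = /Windows (?=95|98|NT|2000)/;
const negative = /Windows (?!95|98|NT|2000)/;
const consumed = "Windows 2000".match(positive)[0];

// Backreference: \1 must repeat whatever group 1 captured.
const doubled = "letter".match(/(.)\1/)[0];

// Word boundary (\b) and non-word boundary (\B).
const boundary = [/er\b/.test("never"), /er\b/.test("verb")];
const nonBoundary = [/er\B/.test("verb"), /er\B/.test("never")];

// "." stops at a newline; [\s\S] does not.
const dotCrossesNewline = /a.c/.test("a\nc");
const classCrossesNewline = /a[\s\S]c/.test("a\nc");

console.log(greedy, lazy, JSON.stringify(consumed), doubled);
console.log(boundary, nonBoundary, dotCrossesNewline, classCrossesNewline);
```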

Building regular expressions

https://msdn.microsoft.com/zh-cn/library/6h0s3kc9(v=vs.90).aspx

The structure of a regular expression is similar to that of an arithmetic expression. That is, various metacharacters and operators can combine small expressions to create larger ones.

Delimiters

You can build regular expressions by placing various components of the expression pattern between a pair of delimiters. For JScript, the delimiter is a forward slash (/) character. For example:

/expression/

In the example above, the regular expression pattern (expression) is stored in the pattern property of the RegExp object.

A component of a regular expression can be a single character, a character set, a range of characters, a selection between several characters, or any combination of all of these components.
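As a small illustration of the delimiter form, here is a sketch in modern JavaScript, which is an assumption on my part since the text describes JScript. In standard JavaScript the pattern text is exposed through the `source` property; the pattern `chapter \d` is made up for the example.

```javascript
// Literal form: the pattern sits between the two forward slashes.
const literal = /chapter \d/;

// Constructor form: the pattern is an ordinary string, so the
// backslash has to be doubled to survive string escaping.
const built = new RegExp("chapter \\d");

// Both objects hold the same pattern text.
console.log(literal.source);
console.log(literal.source === built.source);
```

The literal form is usually easier to read; the constructor form is useful when the pattern is assembled at run time.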

Order of precedence

The following table lists the regular expression operators in order of precedence, from highest to lowest:

Operator	Description
\	Escape
(), (?:), (?=), []	Parentheses and brackets
*, +, ?, {n}, {n,}, {n,m}	Quantifiers
^, $, \anymetacharacter, anycharacter	Anchors and sequences
|	Alternation

Characters have higher precedence than the alternation operator, so "m|food" matches "m" or "food". To match "mood" or "food", use parentheses to create a subexpression, which gives "(m|f)ood".
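The effect of precedence can be checked directly. The sketch below is in modern JavaScript (assumed, not the JScript of the source); the `ab+` strings are extra examples added to show that quantifiers bind more tightly than sequences.

```javascript
// Alternation has the lowest precedence: "m|food" means "m" OR "food".
const withoutGroup = "mood".match(/m|food/)[0];

// Parentheses limit the alternation to the first letter only.
const withGroup = "mood".match(/(m|f)ood/)[0];

// A quantifier applies to the single item before it, not the sequence.
const lastCharOnly = "abbb".match(/ab+/)[0];
const stopsEarly = "ababab".match(/ab+/)[0];

// Grouping makes the quantifier apply to the whole sequence.
const wholeGroup = "ababab".match(/(ab)+/)[0];

console.log(withoutGroup, withGroup, lastCharOnly, stopsEarly, wholeGroup);
```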

Character matching

https://msdn.microsoft.com/zh-cn/library/901zys3s(v=vs.90).aspx

The period (.) matches any single printing or nonprinting character in a string, with one exception: the newline character (\n). The following regular expression matches aac, abc, acc, adc, and so on, as well as a1c, a2c, a-c, and a#c:

/a.c/

To match a string containing a file name in which a period (.) is part of the input string, precede the period in the regular expression with a backslash (\) character. For example, the following regular expression matches filename.ext:

/filename\.ext/
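The difference between the bare period and the escaped period can be seen in a short sketch (modern JavaScript assumed; "filenameXext" is a made-up counterexample):

```javascript
// An unescaped period matches any single character except a newline.
const anyChar = /a.c/;
const samples = ["aac", "abc", "a1c", "a-c", "a#c"];
const allMatch = samples.every((s) => anyChar.test(s));
const newlineMatch = anyChar.test("a\nc"); // newline is the exception
const missingChar = anyChar.test("ac");    // exactly one character is required

// An escaped period matches only a literal period.
const escapedOk = /filename\.ext/.test("filename.ext");
const escapedBad = /filename\.ext/.test("filenameXext");
const unescapedBad = /filename.ext/.test("filenameXext");

console.log(allMatch, newlineMatch, missingChar);
console.log(escapedOk, escapedBad, unescapedBad);
```

Forgetting the backslash is a common source of overly permissive patterns, as the last line shows.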

These expressions only let you match "any" single character. You may instead need to match specific characters from a list. For example, you might want to find chapter headings that are expressed numerically (Chapter 1, Chapter 2, and so on).

Bracket expressions

To create a list of matching characters, place one or more individual characters within square brackets ([ and ]). When characters are enclosed in brackets, the list is called a bracket expression. Within brackets, as anywhere else, ordinary characters represent themselves; that is, they match one occurrence of themselves in the input text. Most special characters lose their meaning when they occur inside a bracket expression. There are some exceptions:

1. The ] character ends the list if it is not the first item. To match the ] character in a list, place it first, immediately after the opening [.

2. The \ character continues to be the escape character. To match the \ character, use \\.

Characters enclosed in a bracket expression match only a single character at the position in the regular expression where the bracket expression appears. The following regular expression matches Chapter 1, Chapter 2, Chapter 3, Chapter 4, and Chapter 5:

/Chapter [12345]/

Note that the word Chapter and the space that follows it are fixed in position relative to the characters within the brackets. The bracket expression specifies only the set of characters that can match the single character position immediately following the word Chapter and the space. This is the ninth character position.

To express the matching characters as a range instead of listing the characters themselves, use a hyphen (-) to separate the starting and ending characters of the range. The character value of each character determines its relative order within the range. The following regular expression contains a range expression that is equivalent to the bracketed list shown above.

/Chapter [1-5]/

When a range is specified in this manner, both the starting and ending values are included in the range. It is also important to note that the starting value must precede the ending value in Unicode sort order.
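To check that the list form and the range form behave the same, here is a rough sketch in modern JavaScript (assumed); the sample headings are made up.

```javascript
const byList = /Chapter [12345]/;
const byRange = /Chapter [1-5]/;

const headings = ["Chapter 1", "Chapter 5", "Chapter 6"];
const listResults = headings.map((h) => byList.test(h));
const rangeResults = headings.map((h) => byRange.test(h));

// The bracket expression covers one position only: the digit sits at
// zero-based index 8, which is the ninth character of the heading.
const digitIndex = "Chapter 3".search(/[1-5]/);

console.log(listResults, rangeResults, digitIndex);
```

Both ends of the range are inclusive, which is why "Chapter 1" and "Chapter 5" match while "Chapter 6" does not.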

To include hyphens in bracket expressions, use one of the following methods:

1. Escape it with a backslash:

```
[\-]
```

2. Put the hyphen at the beginning or the end of the bracketed list. The following expressions match all lowercase letters and the hyphen:

```
[-a-z]
```

```
[a-z-]
```

3. Create a range in which the starting character value is lower than the hyphen character and the ending character value is equal to or greater than the hyphen. Both of the following regular expressions satisfy this requirement:

[!--]

[!-~]
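The ways of including a hyphen can be compared side by side. This is a sketch in modern JavaScript (assumed); the word "well-known" and the added `^`, `+`, and `$` are only there to test a whole string.

```javascript
const word = "well-known";

// Without a hyphen in the list, the word is rejected.
const lettersOnly = /^[a-z]+$/.test(word);

// Hyphen first, hyphen last, and hyphen escaped all accept it.
const leading = /^[-a-z]+$/.test(word);
const trailing = /^[a-z-]+$/.test(word);
const escaped = /^[a-z\-]+$/.test(word);

// The range from "!" to "~" spans the printable ASCII characters,
// so the hyphen (and the lowercase letters) fall inside it.
const viaRange = /^[!-~]+$/.test(word);

console.log(lettersOnly, leading, trailing, escaped, viaRange);
```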

To find all characters that are not in the list or range, place the caret (^) character at the beginning of the list. If the caret appears anywhere else in the list, it matches itself. The following regular expression matches chapter headings with a number greater than 5:

/Chapter [^12345]/

In the example above, the expression matches any character other than 1, 2, 3, 4, or 5 in the ninth position. Thus, for example, Chapter 7 is a match, and so is Chapter 9. Note that a non-digit character in that position would match as well.

The above expression can be written using a hyphen (-):

/Chapter [^1-5]/
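A negated bracket expression excludes characters rather than selecting digits, which is worth verifying. The sketch below uses modern JavaScript (assumed); "Chapter x" and the stricter `[06-9]` pattern are illustrations added here, not part of the original text.

```javascript
const notOneToFive = /Chapter [^1-5]/;

const results = ["Chapter 7", "Chapter 9", "Chapter 3", "Chapter x"].map(
  (h) => notOneToFive.test(h)
);

// A caret that is not first in the list is an ordinary character.
const caretLiteral = /[1^]/.test("^");

// To accept only the remaining digits, list them explicitly instead.
const otherDigits = /Chapter [06-9]/;
const strictResults = ["Chapter 7", "Chapter x"].map((h) => otherDigits.test(h));

console.log(results, caretLiteral, strictResults);
```

"Chapter x" passes the negated pattern because "x" is simply a character outside 1 through 5.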

A typical use of a bracket expression is to specify a match of any uppercase or lowercase letter or any digit. The following expression specifies such a match:

/[A-Za-z0-9]/

Regular expression libraries in C++ (1): the GNU Regex Library

Original: http://www.wuzesheng.com/?p=929

Foreword: this article is written for readers who already have a basic grasp of regular expressions. If you do not know what a regular expression is, please learn the basics first.

Regular expressions (also called regex or regexp) are a simple yet flexible text-processing tool that can pinpoint the parts of a text that match a given rule. On Linux, tools such as grep, sed, and awk all support regular expressions, which makes everyday text processing much more convenient. Sometimes, however, we need to process text with regular expressions inside our own programs, and for that we need a regular expression library. Since C/C++ is my main development language, in this article and the next few I will introduce several regular expression libraries commonly used from C/C++. I hope that the introductions and the concrete usage examples will be of some help to readers who want to use regular expressions in their C/C++ programs.

As far as I know, the regular expression libraries commonly used from C/C++ are the GNU Regex Library, Boost.Regex, PCRE, and PCRE++. The last two are related; the others are independent implementations. I will therefore cover the four libraries in three articles, starting today with the GNU Regex Library.

1. What is the GNU Regex Library?
The GNU regular expression library is part of glibc (the GNU C Library). It provides an interface for matching regular expressions that conforms to the POSIX standard.
Its home page is: http://www.gnu.org/s/libc/manual/html_node/Regular-Expressions.html
The library can be downloaded here: gnuregex0_13

2. Interfaces provided by the GNU Regex Library
(1) regcomp:

int regcomp(regex_t *preg, const char *pattern, int cflags)
Purpose: compiles the regular expression pattern in preparation for matching.
Parameters:
    preg: output parameter; holds the compiled regular expression.
    pattern: input parameter; the regular expression string to compile.
    cflags: input parameter; options that control the regular expression matching process.
Return value: 0 if compilation succeeds; a nonzero error code if it fails.

(2) regexec:

int regexec(const regex_t *preg, const char *string, size_t nmatch, regmatch_t pmatch[], int eflags)
Purpose: checks whether the string string matches the compiled regular expression preg.
Parameters:
    preg: input parameter; the regular expression compiled by regcomp in (1).
    string: input parameter; the string to match against.
    nmatch: input parameter; the length of the array passed as pmatch.
    pmatch: output parameter; receives the positions in string where preg matched.
    eflags: input parameter; options that control the regular expression matching process.
Return value: 0 if string matches the rule specified by preg; a nonzero value otherwise.

(3) regerror:

size_t regerror(int errcode, const regex_t *preg, char *errbuf, size_t errbuf_size)
Purpose: converts an error code produced by regcomp or regexec into an error message string.
Parameters:
    errcode: input parameter; the error code returned by the regcomp or regexec call.
    preg: input parameter; the compiled regular expression structure that the error code relates to.
    errbuf: output parameter; the buffer that receives the error message. If the buffer is smaller than required, the message is truncated.
    errbuf_size: input parameter; the size of the buffer that receives the error message.
Return value: the size of the buffer needed to hold the complete error message. Passing an errbuf_size of 0 is a way to query that size.

(4) regfree:

void regfree(regex_t *preg)
Purpose: releases the memory held by the preg structure that regcomp filled in at compile time.
Parameters:
    preg: input parameter; pointer to the regular expression structure produced by regcomp.
Return value: none.

3. Notes on using the GNU Regex Library
(1) regcomp and regfree must be used in pairs; otherwise memory leaks (compare malloc/free and new/delete).
(2) The regex_t structure: regcomp compiles the string form of a regular expression into a regex_t structure, which makes the subsequent matching work easier.
(3) The regmatch_t structure: records where a match is located in the string, expressed as offsets from the start of the string.
(4) Flags: options that configure the matching process and specify how matching is performed. See: http://www.opengroup.org/onlinepubs/007908799/xsh/regcomp.html
(5) To use the library, include the header files sys/types.h and regex.h.

4. Example of using the GNU Regex Library

#include <sys/types.h>
#include <regex.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc != 3)
    {
        printf("Usage: %s regexstring text\n", argv[0]);
        return 1;
    }

    const char *pRegexStr = argv[1];
    const char *pText = argv[2];

    regex_t oRegex;
    int nErrCode = 0;
    char szErrMsg[1024] = {0};
    size_t unErrMsgLen = 0;

    /* Compile the pattern; cflags of 0 selects POSIX basic syntax. */
    if ((nErrCode = regcomp(&oRegex, pRegexStr, 0)) == 0)
    {
        /* nmatch is 0, so only a yes/no answer is requested. */
        if ((nErrCode = regexec(&oRegex, pText, 0, NULL, 0)) == 0)
        {
            printf("%s matches %s\n", pText, pRegexStr);
            regfree(&oRegex);
            return 0;
        }
    }

    /* Reached on a compile error or when the text does not match. */
    unErrMsgLen = regerror(nErrCode, &oRegex, szErrMsg, sizeof(szErrMsg));
    unErrMsgLen = unErrMsgLen < sizeof(szErrMsg) ? unErrMsgLen : sizeof(szErrMsg) - 1;
    szErrMsg[unErrMsgLen] = '\0';
    printf("ErrMsg: %s\n", szErrMsg);

    regfree(&oRegex);
    return 1;
}

Program test:

~/program$ gcc testregex.c -o regex
~/program$ ./regex "http:\/\/www\..*\.com" "https://www.taobao.com"
ErrMsg: No match
~/program$ ./regex "http:\/\/www\..*\.com" "http://www.taobao.com"
http://www.taobao.com matches http:\/\/www\..*\.com

That is all for the GNU Regex Library. If you have any comments, please leave me a message below. Over the next few days I will introduce the other libraries mentioned earlier. That is all for today.

The same author's articles on the other regular expression libraries:

Regular expression libraries in C++ (2): Boost.Regex http://www.wuzesheng.com/?p=965

Regular expression libraries in C++ (3): PCRE, PCRE++ http://www.wuzesheng.com/?p=994


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
