Regular Expressions and Test Tools

Source: Internet
Author: User

1. Regular Expression

Regular expression: a character pattern that matches character sequences in text. Many text editors and other tools use regular expressions to search for or replace text that fits a given pattern, and many programming languages support string operations with regular expressions.

A regular expression is a text pattern composed of ordinary characters (for example, the letters 'a' to 'z') and special characters (metacharacters). The pattern describes one or more strings to match when searching a body of text, and it serves as a template that is compared against the searched string.
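As a first illustration of the "retrieve or replace" use described above, here is a minimal sketch with Python's standard re module. Python is our choice here (the article names no language), and the order text is invented for the example.

```python
import re

# Invented sample text for illustration.
text = "Order 66 shipped on 2024-05-01; order 67 is pending."

# Retrieve: collect every run of one or more digits.
numbers = re.findall(r"\d+", text)

# Replace: substitute each run of digits with "#".
masked = re.sub(r"\d+", "#", text)

print(numbers)  # ['66', '2024', '05', '01', '67']
print(masked)   # Order # shipped on #-#-#; order # is pending.
```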

1.1 Characters

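Character | Description | Example
\d | Matches any digit. | \d\d matches "12", but not "A1" or "1A".
\D | Matches any non-digit character. | \D matches "a" and "@", but not "1" or "2".
\w | Matches a word character: a letter, digit, or underscore (in Unicode-aware engines, Chinese characters as well). | \w matches "a", "5", and "_", but not "@".
\W | Matches any non-word character, that is, anything other than letters, digits, underscores, and Chinese characters. | \W matches "@", "#", and so on, but not letters, digits, or "_".
\s | Matches any whitespace character, including spaces, tabs, and line breaks. | \s matches the space in "a b".
\S | Matches any non-whitespace character. | \S matches "a" and "b" in "a b", but not the space.
. | Matches any character except a line break. | "a.c" matches "abc" and "a@c".
^ | Matches the start position of the string (or of each line in multiline mode). | "^The" matches strings that begin with "The", such as "There" and "The cat".
$ | Matches the end position of the string (or of each line in multiline mode). | "of despair$" matches strings that end with "of despair".
\b | Matches a word boundary: the start or end position of a word. | "\bcat" matches the "cat" at the start of "cat" and "catalog", but not the "cat" in the middle of "concatenate".
\B | Matches any position that is not a word boundary (the complement of \b). | "\Bcat" matches the "cat" in the middle of "concatenate".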

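The following sketch exercises each row of the table with Python's re module. Python is an assumption on our part, and the sample strings are made up. Engines differ in one detail worth knowing: Python 3 str patterns are Unicode-aware, so \w matches Chinese characters, while some other engines treat \w as ASCII-only.

```python
import re

# \d matches one digit, so \d\d needs two digits in a row.
two_digits = re.findall(r"\d\d", "12 A1 1A")
# \D matches any single character that is not a digit.
non_digits = re.findall(r"\D", "a@1")
# Python 3 str patterns are Unicode-aware, so \w also covers Chinese characters.
words = re.findall(r"\w+", "foo_1 中文!")
# \W matches anything \w does not.
non_words = re.findall(r"\W", "a@b#c")
# \s matches spaces, tabs, and line breaks; \S matches everything else.
spaces = re.findall(r"\s", "a b\tc\nd")
non_spaces = re.findall(r"\S+", "a b\tc")
# "." skips the newline unless the re.DOTALL flag is given.
any_char = re.findall(r".", "a\nb")

# Anchors match positions, not characters.
starts_with_the = [bool(re.search(r"^The", s)) for s in ("There", "The cat", "Bathe")]
ends_with_despair = bool(re.search(r"of despair$", "the winter of despair"))
sample = "cat catalog concatenate"
at_word_start = [m.start() for m in re.finditer(r"\bcat", sample)]
inside_word = [m.start() for m in re.finditer(r"\Bcat", sample)]

print(two_digits, non_digits, words, non_words)
print(spaces, non_spaces, any_char)
print(starts_with_the, ends_with_despair, at_word_start, inside_word)
```

findall() returns the matched text, while finditer() exposes match positions, which is what makes the zero-width anchors \b and \B visible: "\bcat" hits offsets 0 and 4, and "\Bcat" hits offset 15 inside "concatenate".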
1.2 Character classes

A metacharacter matches only one position or one predefined set of characters at a time. If the characters you want to match are not covered by any metacharacter, you need to define your own character set. Character classes solve this problem: a character class is a set of characters, and it produces a match when any one character in the set matches.

A character class is a "mini" language within regular expressions, written inside square brackets "[]". For example, [012345] matches any one of the digits 0, 1, 2, 3, 4, or 5, and [Jj]ack matches the string "Jack" or "jack".

However, a class such as [0123456789] is inconvenient to write, so regular expressions use the hyphen "-" to define a range of characters. For example, [0-9] is equivalent to [0123456789].
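A short Python sketch (again assuming Python's re module, with made-up strings) confirms the three bracket examples: a class matches exactly one character, and a hyphen range is shorthand for listing every character in it.

```python
import re

# [012345] matches exactly one character from the listed set.
low_digits = re.findall(r"[012345]", "0123456789")
# [Jj]ack accepts either an upper- or lowercase first letter, nothing else.
jacks = re.findall(r"[Jj]ack", "Jack jack JACK")
# The range [0-9] accepts exactly the same characters as [0123456789].
same_set = all(
    bool(re.fullmatch(r"[0-9]", ch)) == bool(re.fullmatch(r"[0123456789]", ch))
    for ch in "0123456789abc-"
)

print(low_digits)  # ['0', '1', '2', '3', '4', '5']
print(jacks)       # ['Jack', 'jack']
print(same_set)    # True
```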

1.3 Qualifiers

A metacharacter matches only one position or one character at a time. To match zero, one, or more characters, use a qualifier (also called a quantifier), which specifies how many times the preceding character or character set may repeat. Table 2.11 describes the common qualifiers.

Character | Description | Example
{n} | Repeats exactly n times. | 'o{2}' cannot match the "o" in "Bob", but matches the two o's in "food".
{n,} | Repeats at least n times. | 'o{2,}' cannot match the "o" in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'.
{n,m} | Repeats at least n and at most m times. | 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'.
+ | Repeats at least once; equivalent to {1,}. | 'zo+' matches "zo" and "zoo", but not "z".
* | Repeats zero or more times; equivalent to {0,}. | 'zo*' matches "z" and "zoo".
? | Repeats zero or one time; equivalent to {0,1}. When it follows another qualifier (*, +, ?, {n}, {n,}, {n,m}), it makes that qualifier non-greedy: non-greedy mode matches as little of the searched string as possible, while the default greedy mode matches as much as possible. | For the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
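The rows of the table can be checked with a few lines of Python. This is a sketch using the standard re module; first() is a hypothetical helper introduced only for the example, and it reports the first match that re.search() finds.

```python
import re

def first(pattern, text):
    """Hypothetical helper: the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None

print(first(r"o{2}", "Bob"), first(r"o{2}", "food"))  # exactly two o's
print(first(r"o{2,}", "foooood"))                     # two or more o's
print(first(r"o{1,3}", "fooooood"))                   # one to three o's
print([bool(re.fullmatch(r"zo+", s)) for s in ("zo", "zoo", "z")])  # +: one or more
print([bool(re.fullmatch(r"zo*", s)) for s in ("z", "zoo")])        # *: zero or more
print([bool(re.fullmatch(r"zo?", s)) for s in ("z", "zo", "zoo")])  # ?: zero or one
# Greedy o+ takes every o; non-greedy o+? stops after one.
print(first(r"o+", "oooo"), first(r"o+?", "oooo"))
```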


1.4 Escape characters

Regular expressions define special metacharacters such as ^, $, and . (period). Because these characters have special meanings in a regular expression, you need an escape character to match them literally. The escape character is "\" (backslash): it cancels the special meaning of characters such as ^, $, and . in an expression. Table 2.12 describes the escape character and gives examples.

Character | Description | Example
\ | Some characters have special meanings; to match them literally, precede them with the escape character. | "\\" matches "\"; "\." matches "."; "\*" matches "*"; "\+" matches "+"; "\unnnn" matches the Unicode character given by the four-digit hexadecimal number nnnn.

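In Python, escape rules apply twice: once in the string literal and once in the regular expression. Writing patterns as raw strings (r"...") keeps the backslashes intact for the regex engine. The sketch below assumes Python's re module; re.escape() is a standard-library function that inserts the necessary backslashes for you. The sample strings are invented.

```python
import re

# Unescaped, "." matches any character; escaped as "\.", it matches only a period.
unescaped = re.findall(r"1.5", "1.5 and 125")
escaped = re.findall(r"1\.5", "1.5 and 125")
# "\*" and "\+" match a literal asterisk and plus sign.
operators = re.findall(r"\*|\+", "2*3+4")
# In the pattern, "\\" matches one literal backslash (the string holds a single "\").
backslashes = re.findall(r"\\", "C:\\temp")
# "\u4e2d" matches the Unicode character U+4E2D.
han = re.findall(r"\u4e2d", "中文")
# Without escaping, "+" is a qualifier, so "1+1=2" also matches "11=2".
loose = re.fullmatch("1+1=2", "11=2")
# re.escape() produces a pattern that matches the text literally.
literal = re.escape("1+1=2")

print(unescaped, escaped, operators, backslashes, han)
print(bool(loose), bool(re.fullmatch(literal, "11=2")))
```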

1.5 Groups, "or", and backreferences

The previous sections describe how to repeat a single character: add a qualifier after it. To repeat a whole string, use grouping.

Grouping, also known as a subexpression, divides all or part of a regular expression into one or more groups. A group is written with parentheses (). After grouping, the expression inside the parentheses is treated as a whole, so you can specify how many times the subexpression repeats. For example, (\d{1,3}\.){3}\d{1,3} is a simple IP address matching expression: (\d{1,3}\.){3} matches one to three digits followed by a period, repeated three times, and \d{1,3} then matches one to three digits. Because it only checks the shape, it also accepts values above 255, such as 999.999.999.999.

The "or" operator is the character "|". A string matches the regular expression if it matches the pattern on either the left or the right of "|". For example, 0\d{2}-\d{8}|0\d{3}-\d{7}|0\d{3}-\d{8} matches three formats of landline numbers used in some regions of China: in the first, the area code has 3 digits and the local number has 8 digits; in the second, the area code has 4 digits and the local number has 7 digits; in the third, the area code has 4 digits and the local number has 8 digits. In every format, a hyphen "-" joins the area code and the local number.

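The IP and phone-number patterns above can be tried directly in Python. This is a sketch assuming the standard re module; whole_match() is a hypothetical helper, and the sample addresses and numbers are made up. It uses fullmatch() so that the entire string must fit the pattern.

```python
import re

# The IP pattern: "1-3 digits + period" three times, then 1-3 digits.
SIMPLE_IP = re.compile(r"(\d{1,3}\.){3}\d{1,3}")
# The landline pattern: three alternatives joined by "|".
PHONE = re.compile(r"0\d{2}-\d{8}|0\d{3}-\d{7}|0\d{3}-\d{8}")

def whole_match(pattern, text):
    """Hypothetical helper: True only if the entire text matches the pattern."""
    return pattern.fullmatch(text) is not None

ip = SIMPLE_IP.fullmatch("192.168.1.10")
print(ip.group(0), ip.group(1))  # a repeated group keeps only its last repetition
print(whole_match(SIMPLE_IP, "192.168.1"))        # only three numbers
print(whole_match(SIMPLE_IP, "999.999.999.999"))  # right shape, range not checked

for sample in ("010-12345678", "0755-1234567", "0755-12345678", "010-1234567"):
    print(sample, whole_match(PHONE, sample))

# search() stops at the first alternative that succeeds, which may be a prefix.
print(PHONE.search("0755-12345678").group())
```

The last line shows a design consequence of alternation order: with search() instead of fullmatch(), the second alternative already succeeds on "0755-12345678" and returns only "0755-1234567". Anchoring the pattern, or listing 0\d{3}-\d{8} before 0\d{3}-\d{7}, avoids the truncated match.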
Backreferences: after a regular expression is grouped, each group is automatically assigned a group number that stands for that group's subexpression. Group numbers are assigned from left to right by the position of each group's opening parenthesis "(": the first group is number 1, the second is number 2, and so on. Backreferences make it easy to find repeated character groups; you can think of one as a shortcut that matches the same string again. To refer back to a numbered group, use the syntax \number. To refer back to a named group, use the syntax \k<Name>. For example, when the expression ('|")(.*?)(\1) is applied to 'Hello', "World", the first match is 'Hello', and the next match is "World".
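Here is the quote example as a Python sketch (assuming the standard re module). Python does not use the \k<Name> spelling: it writes a named group as (?P<name>...) and refers back to it with (?P=name), which the last pattern shows. The group names quote and body are our own.

```python
import re

# Group 1 captures the opening quote; \1 requires the same quote to close it.
QUOTED = re.compile(r"""('|")(.*?)(\1)""")
text = "'Hello', \"World\""

whole = [m.group(0) for m in QUOTED.finditer(text)]
bodies = [m.group(2) for m in QUOTED.finditer(text)]

# A quote opened with ' cannot be closed by ", so this finds nothing.
mismatched = QUOTED.search("'mixed\"")

# Python's spelling of a named group and a backreference to it.
NAMED = re.compile(r"""(?P<quote>['"])(?P<body>.*?)(?P=quote)""")
named_bodies = [m.group("body") for m in NAMED.finditer(text)]

print(whole)         # ["'Hello'", '"World"']
print(bodies)        # ['Hello', 'World']
print(mismatched)    # None
print(named_bodies)  # ['Hello', 'World']
```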

2. RegEx Tester: a tool for testing regular expressions

RegEx Tester is a simple, open-source tool for testing regular expressions.
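The article does not describe RegEx Tester's interface, so the following is not that tool. It is a hypothetical Python helper, try_pattern(), that does the same basic job from code: it lists where a pattern matches and what each group captured. It assumes the standard re module, and the sample text is invented.

```python
import re

def try_pattern(pattern, text):
    """Hypothetical helper: return (start, end, matched text, groups) per match."""
    compiled = re.compile(pattern)  # raises re.error if the pattern is invalid
    return [(m.start(), m.end(), m.group(0), m.groups())
            for m in compiled.finditer(text)]

for row in try_pattern(r"(\d+)-(\d+)", "010-12345678, 0755-1234567"):
    print(row)
```

For the sample call, it reports the span (0, 12) for "010-12345678" with groups ("010", "12345678"), and the span (14, 26) for "0755-1234567" with groups ("0755", "1234567").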
