Regular Expression Notes

Source: Internet
Author: User

About Regular Expressions

A regular expression uses metacharacters (rules) to describe, and then find, a set of matching characters. Regular expressions are not the same as shell wildcards (globbing patterns), and the set of metacharacters supported also differs from tool to tool.

First, let's look at the common metacharacters and their meanings (this is not a complete list), mainly using grep in bash.


Character matching

. (dot) Matches any single character except a newline.

[] Matches any one of the characters enclosed in the brackets.

[^XYZ] Negated character set. Matches any character not listed in the brackets.

[a-z] Character range. Matches any character within the specified range.

Note: a hyphen denotes a range only when it is inside a bracket expression and appears between two characters;

if it appears at the beginning of the bracket expression, it represents only a literal hyphen.

[^a-z] The ^ negates the range; matches any character that is not in the specified range.

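[:space:] A whitespace character

[:punct:] A punctuation character

[:lower:] A lowercase letter, [a-z] (a range cannot be written backwards as [z-a])

[:upper:] An uppercase letter, [A-Z]

[:digit:] A digit, [0-9]

[:alnum:] A letter or digit, [A-Za-z0-9]

[:alpha:] An uppercase or lowercase letter, [A-Za-z]

These class names are only valid inside a bracket expression, so in a pattern they are written with two pairs of brackets, e.g. [[:digit:]].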
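The character-matching rules above are easiest to absorb by running them. The sketch below feeds a few made-up lines into grep -E; `sample` is a hypothetical helper name, and the expected matches are noted in the comments.

```shell
#!/bin/sh
# 'sample' is a hypothetical helper that prints made-up test lines.
sample() { printf '%s\n' 'cat' 'cot' 'cut' 'c-t' 'c9t' 'ABC' 'a b' 'x,y'; }

# [] matches exactly one of the listed characters.
sample | grep -E '^c[ao]t$'         # cat, cot
# [^...] matches any single character except the listed ones.
sample | grep -E '^c[^ao]t$'        # cut, c-t, c9t
# A hyphen between two characters forms a range.
sample | grep -E '^c[a-z]t$'        # cat, cot, cut
# A hyphen at the start of the set is just a literal hyphen.
sample | grep -E '^c[-9]t$'         # c-t, c9t

# POSIX classes need an outer pair of brackets.
sample | grep -E '^[[:upper:]]+$'   # ABC
sample | grep -E '[[:space:]]'      # a b
sample | grep -E '[[:punct:]]'      # c-t, x,y
sample | grep -E '[[:digit:]]'      # c9t
```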

Position anchoring

^ Matches the start of the line; written at the left end of the pattern.

$ Matches the end of the line; written at the right end of the pattern.

\b Matches a word boundary.

\B Matches a non-word boundary.

\< \> Match the start (\<) and end (\>) of a word.
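A small sketch of the anchors, assuming GNU grep (\b, \B, \< and \> are GNU extensions rather than POSIX). The input lines and the helper name `sample` are made up for illustration.

```shell
#!/bin/sh
# Assumes GNU grep. 'sample' is a hypothetical helper printing test lines.
sample() { printf '%s\n' '30 70' '130 70' '30 700' 'x 30'; }

sample | grep -E '^30'       # 30 70, 30 700        (line starts with 30)
sample | grep -E '70$'       # 30 70, 130 70        (line ends with 70)
sample | grep -E '\<30\>'    # 30 70, 30 700, x 30  (30 as a whole word)
sample | grep -E '\b70\b'    # 30 70, 130 70        (70 between word boundaries)
sample | grep -E '\B30'      # 130 70               (30 inside a longer word)
```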

Repetition counts

* Matches the preceding character or subexpression 0 or more times.

+ Matches the preceding subexpression 1 or more times. (In GNU basic expressions it is written \+.)

? Matches the preceding subexpression 0 or 1 time. (In GNU basic expressions it is written \?.)

{n} Matches the preceding character exactly n times. (In basic expressions it must be escaped as \{n\}.)

{n,} Matches at least n times, with no upper limit. (In basic expressions: \{n,\}.)

{n,m} where n <= m. Matches at least n times and at most m times. (In basic expressions: \{n,m\}.)
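The quantifiers can be compared side by side on a made-up set of strings. This sketch uses grep -E, plus one plain grep line showing the escaped braces of a basic expression; the helper name `sample` is hypothetical.

```shell
#!/bin/sh
# 'sample' is a hypothetical helper printing a, b-count variants.
sample() { printf '%s\n' 'ac' 'abc' 'abbc' 'abbbc'; }

sample | grep -E '^ab*c$'        # ac abc abbc abbbc  (0 or more b)
sample | grep -E '^ab+c$'        # abc abbc abbbc     (1 or more)
sample | grep -E '^ab?c$'        # ac abc             (0 or 1)
sample | grep -E '^ab{2}c$'      # abbc               (exactly 2)
sample | grep -E '^ab{2,}c$'     # abbc abbbc         (2 or more)
sample | grep -E '^ab{1,2}c$'    # abc abbc           (1 to 2)

# Basic expression (plain grep): the braces must be escaped.
sample | grep '^ab\{1,2\}c$'     # abc abbc
```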

Special Features

() Defines the expression between ( and ) as a group and saves the text matched by that group in a temporary storage area. Up to 9 groups can be saved in one regular expression, and they are referenced with \1 through \9. The text a group matches is remembered by the regex engine and can be referenced by number, counting opening parentheses from the left. (In basic expressions the parentheses must be escaped as \( and \).)

\n Refers to the text matched by the nth group, not to the pattern itself.

| Logical OR between two alternatives. (Extended regular expressions.)

\ Escape character: turns a following special character into an ordinary one, or an ordinary character into a special one (for example \b).
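A sketch of groups, back-references, alternation and escaping, assuming GNU grep (POSIX defines back-references only for basic expressions, but GNU grep -E accepts them too). The input lines and the helper name `sample` are made up.

```shell
#!/bin/sh
# Assumes GNU grep. 'sample' is a hypothetical helper printing test lines.
sample() { printf '%s\n' '40 70 70' '40 70 80' '126.com' '126xcom' '163.net'; }

# Group 1 captures a number; \1 must repeat the exact text it captured.
sample | grep -E '\<([0-9]+) \1\>'      # 40 70 70
# The same search as a basic expression: parentheses escaped, + spelled out.
sample | grep '\<\([0-9][0-9]*\) \1\>'  # 40 70 70

# | separates alternatives.
sample | grep -E 'com$|net$'            # 126.com, 126xcom, 163.net
# An unescaped dot matches any character; \. matches only a literal dot.
sample | grep -E '^126.com$'            # 126.com, 126xcom
sample | grep -E '^126\.com$'           # 126.com
```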

There is no trick for memorizing these metacharacters; like the rules of a word game, they only seem difficult when you first meet them. The important thing is to experiment with each metacharacter on its own before combining them. Regular expressions are a language that a computer can interpret and a human can write, and there are usually many ways to combine them to get the result you want.

Before writing a regular expression, you need a concrete requirement or goal; then analyze which matching rules to combine. I find extended regular expressions (grep -E) convenient here, because they spare you from escaping every special character.

First prepare some test strings; I put them in /tmp/ceshi:

130 120 200 450 12 24 70 140 8000 30

30 120 200 450 12 24 170 140 80

78 30 1800 200 450 12 24 170 40 80

30 1800 200 450 120 24 170 40 70 70 70

389 30 1800 200 450 120 24 1000 40 70

30 30 1800 200 450 120 24 1000 40 70

130120 200 450122470140800030

30120200450122417014080

7830180020045012241704080

3018002004501202417040707070

3893018002004501202410004070

303018002004501202410004070
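One way to create the test file is a here-document. This is a sketch that assumes the blank lines in the listing above are only page spacing, so the file holds twelve lines; /tmp/ceshi is the path the author used.

```shell
#!/bin/sh
# Write the twelve sample strings to /tmp/ceshi, one per line.
cat > /tmp/ceshi <<'EOF'
130 120 200 450 12 24 70 140 8000 30
30 120 200 450 12 24 170 140 80
78 30 1800 200 450 12 24 170 40 80
30 1800 200 450 120 24 170 40 70 70 70
389 30 1800 200 450 120 24 1000 40 70
30 30 1800 200 450 120 24 1000 40 70
130120 200 450122470140800030
30120200450122417014080
7830180020045012241704080
3018002004501202417040707070
3893018002004501202410004070
303018002004501202410004070
EOF

wc -l < /tmp/ceshi    # 12
```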

Match lines that start with 30 and contain 70 somewhere after it:

1. Match lines that start with 30: anchor the beginning with ^30

2. Contain 70: 70

3. When combining, note that any characters may appear between 30 and 70: .*

^30.*70

# cat ceshi | grep -E "^30.*70"

30 120 200 450 12 24 170 140 80

30 1800 200 450 120 24 170 40 70 70 70

30 30 1800 200 450 120 24 1000 40 70

30120200450122417014080

3018002004501202417040707070

303018002004501202410004070

Next, match lines that start with 30, contain 1800, and end with 70.

Again, analyze the key points step by step:

1. Starts with 30: ^30

2. Contains 1800, possibly with other characters on either side: .*1800.*

3. Ends with 70: 70$

^30.*1800.*70$

# cat ceshi | grep -E "^30.*1800.*70$"

30 1800 200 450 120 24 170 40 70 70 70

30 30 1800 200 450 120 24 1000 40 70

3018002004501202417040707070

303018002004501202410004070
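Following the advice to test each piece before combining, the sketch below counts how many of the twelve sample strings each partial pattern matches. The data is inlined so the block runs on its own; `data` is a hypothetical helper name.

```shell
#!/bin/sh
# 'data' is a hypothetical helper that prints the twelve sample strings.
data() {
cat <<'EOF'
130 120 200 450 12 24 70 140 8000 30
30 120 200 450 12 24 170 140 80
78 30 1800 200 450 12 24 170 40 80
30 1800 200 450 120 24 170 40 70 70 70
389 30 1800 200 450 120 24 1000 40 70
30 30 1800 200 450 120 24 1000 40 70
130120 200 450122470140800030
30120200450122417014080
7830180020045012241704080
3018002004501202417040707070
3893018002004501202410004070
303018002004501202410004070
EOF
}

data | grep -Ec '^30'                # 6  (starts with 30)
data | grep -Ec '70'                 # 12 (contains 70 anywhere)
data | grep -Ec '^30.*70'            # 6  (both, 70 after the 30)
data | grep -Ec '1800'               # 8  (contains 1800)
data | grep -Ec '^30.*1800.*70$'     # 4  (starts 30, has 1800, ends 70)
```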

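What if 30 must be followed by a whitespace character?

What if 30 must be followed by 1 or 3?

^30[[:space:]].*1800.*70$

^30[13].*1800.*70$ (inside brackets | is an ordinary character, so [1|3] would also accept a literal |)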
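The two variants can be checked on a few made-up strings. The last line shows why [1|3] is not the same as [13]: the bar inside the brackets is matched literally. The helper name `sample` is hypothetical.

```shell
#!/bin/sh
# 'sample' is a hypothetical helper printing made-up test lines.
sample() { printf '%s\n' '30 1800 70' '3011800 70' '3031800 70' '30|1800 70'; }

sample | grep -E '^30[[:space:]].*1800.*70$'   # 30 1800 70
sample | grep -E '^30[13].*1800.*70$'          # 3011800 70, 3031800 70
# [1|3] also accepts the literal bar in '30|1800 70'.
sample | grep -E '^30[1|3].*1800.*70$'         # 3011800 70, 3031800 70, 30|1800 70
```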

Let's try some simple examples.


1. Require 70 to appear at least 2 and at most 3 times

We need {n,m} (n <= m), which matches at least n times and at most m times.

Is it 70{2,3}? Let's try.

# cat ceshi | grep -E "70{2,3}"

# echo $?

1

Nothing matched (exit status 1 means no lines were selected). Why? Look closely at the metacharacters and you will find that a quantifier applies only to the single character immediately before it.

The expression 70{2,3} actually means 700|7000.

So how do we repeat several characters as a unit? That is what () is for.

# cat ceshi | grep -E "(70){2,3}"

3018002004501202417040707070
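The difference in scope is easy to see on made-up strings: without parentheses the quantifier repeats only the 0, with them it repeats the whole 70. The helper name `sample` is hypothetical.

```shell
#!/bin/sh
# 'sample' is a hypothetical helper printing made-up test lines.
sample() { printf '%s\n' '7000' '7070' '707070' '70 70 70'; }

sample | grep -E '70{2,3}'     # 7000          (quantifier applies to 0 only)
sample | grep -E '(70){2,3}'   # 7070, 707070  (the whole group repeats)
```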

Note that a back-reference \n refers to the text matched inside the parentheses, not to the subexpression itself.

Groups are numbered by their opening parentheses from the left, so with nested groups the outermost one is \1.
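Both points can be shown with sed -E, assuming GNU sed (its -E mode accepts back-references). The first line prints each group's text to show the numbering; the second shows that \1 must repeat the captured text, not just any digits.

```shell
#!/bin/sh
# Assumes GNU sed. In ((a)(b)) the outermost group is \1.
echo 'ab' | sed -E 's/((a)(b))/1=\1 2=\2 3=\3/'    # 1=ab 2=a 3=b

# \1 must equal the captured text: "12 13" does not match, "12 12" does.
echo '12 13 12 12' | sed -E 's/([0-9]+) \1/[&]/'   # 12 13 [12 12]
```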

# cat ceshi | grep -E "(6(7(\<200\>)))*.*\3?.*[ ]70$"

30 1800 200 450 120 24 170 40 70 70 70

389 30 1800 200 450 120 24 1000 40 70

30 30 1800 200 450 120 24 1000 40 70

# a400=200

# cat ceshi | grep -E "(6(7(\<`echo $a400`\>)))*.*\3?.*[ ]70$"

30 1800 200 450 120 24 170 40 70 70 70

389 30 1800 200 450 120 24 1000 40 70

30 30 1800 200 450 120 24 1000 40 70
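The point of the second command is that the shell expands the variable inside double quotes before grep sees the pattern. A smaller sketch, assuming GNU grep for \< and \>; `sample` and its lines are made up, and $(...) is the modern spelling of the backticks.

```shell
#!/bin/sh
# Assumes GNU grep. 'sample' is a hypothetical helper printing test lines.
sample() { printf '%s\n' '30 1800 200 450' '3018002004501' '30 120 20 45'; }

a400=200
# Inside double quotes the shell substitutes 200, so grep receives \<200\>.
sample | grep -E "\<${a400}\>"            # 30 1800 200 450
# Command substitution works the same way.
sample | grep -E "\<$(echo "$a400")\>"    # 30 1800 200 450
```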


2. + and ? are really shorthands for {n,m}.

+ can be written as {1,}: it matches at least once, and leaving m empty means there is no upper limit.

? can be written as {0,1}.
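The equivalences can be confirmed by running each pair on the same made-up input; each pair should select the same lines. The helper name `sample` is hypothetical.

```shell
#!/bin/sh
# 'sample' is a hypothetical helper printing made-up test lines.
sample() { printf '%s\n' '8' '80' '800' '8000'; }

sample | grep -E '^80+$'      # 80, 800, 8000
sample | grep -E '^80{1,}$'   # 80, 800, 8000
sample | grep -E '^80?$'      # 8, 80
sample | grep -E '^80{0,1}$'  # 8, 80
```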

For \b, \B, \< and \>, you can treat each group of digits separated by whitespace in the sample data as a single word.

Compare the matching output below to see the difference between \B200\B, \b200\b and \<200\>.

# cat ceshi | grep -E "\B200\B"

30120200450122417014080

7830180020045012241704080

3018002004501202417040707070

3893018002004501202410004070

303018002004501202410004070

# cat ceshi | grep -E "\b200\b"

130 120 200 450 12 24 70 140 8000 30

30 120 200 450 12 24 170 140 80

78 30 1800 200 450 12 24 170 40 80

30 1800 200 450 120 24 170 40 70 70 70

389 30 1800 200 450 120 24 1000 40 70

30 30 1800 200 450 120 24 1000 40 70

130120 200 450122470140800030

# cat ceshi | grep -E "\<200\>"

130 120 200 450 12 24 70 140 8000 30

30 120 200 450 12 24 170 140 80

78 30 1800 200 450 12 24 170 40 80

30 1800 200 450 120 24 170 40 70 70 70

389 30 1800 200 450 120 24 1000 40 70

30 30 1800 200 450 120 24 1000 40 70

130120 200 450122470140800030
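The same comparison on a smaller made-up input, assuming GNU grep. The extra \<200 line anchors only the start of the word, so it also accepts 2000. The helper name `sample` is hypothetical.

```shell
#!/bin/sh
# Assumes GNU grep. 'sample' is a hypothetical helper printing test lines.
sample() { printf '%s\n' '30 1800 200 450' '3018002004501' '2000 12'; }

sample | grep -E '\b200\b'   # 30 1800 200 450           (whole word)
sample | grep -E '\<200\>'   # 30 1800 200 450           (whole word)
sample | grep -E '\<200'     # 30 1800 200 450, 2000 12  (start anchored only)
sample | grep -E '\B200\B'   # 3018002004501             (inside a longer run)
```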


Match an email address

Email address format: username@domain

The mailbox naming rule of 126.com: 6 to 18 characters, using letters, digits and underscores, and starting with a letter.

grep -E "^[[:alpha:]]([a-z]|[A-Z]|[0-9]|[_]){5,17}@([[:alnum:]]+[\.])+[[:alnum:]]+$"

[[:alpha:]]([a-z]|[A-Z]|[0-9]|[_]){5,17}@([[:alnum:]]+[\.])+[[:alnum:]]+\b

Starts with a letter: [[:alpha:]]

Letters, digits and underscores are allowed: ([a-z]|[A-Z]|[0-9]|[_])

6 to 18 characters: {5,17}, because the leading letter already takes up one character.

@: @

Domain suffix such as 126.com: ([[:alnum:]]+[\.])+[[:alnum:]]+

That is: one or more letters or digits followed by a dot, with this group appearing at least once, and finally one or more letters or digits. (Inside a POSIX bracket expression the dot is already literal, so [.] is enough; [\.] also accepts a backslash.)

The group is needed because the domain may contain several parts, as in 126.com.cn or 126.163.com.cn.
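The anchored pattern can be exercised against a few hypothetical addresses (none of them come from the original). The only change from the author's pattern is writing the literal dot as \. instead of [\.]; `sample` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical addresses, used only to exercise the pattern.
sample() {
  printf '%s\n' \
    'abc_12@126.com' \
    'a1234@126.com.cn' \
    '1abcde@126.com' \
    'abcdef@126' \
    'abcdef@126.163.com.cn'
}

# Letter, then 5-17 letters/digits/underscores, @, then dotted labels.
re='^[[:alpha:]]([a-z]|[A-Z]|[0-9]|[_]){5,17}@([[:alnum:]]+\.)+[[:alnum:]]+$'
sample | grep -E "$re"
```

Only abc_12@126.com and abcdef@126.163.com.cn should pass: a1234 is too short, 1abcde starts with a digit, and @126 has no dot in the domain.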

That's all for my notes on regular expressions. In the end, analyze your goal first and then combine the syntax to write the expression; once the goal is clear, the expression follows. There are many ways to write an expression that does the same job, so don't agonize over simple versus complex; with practice your expressions will become more skillful and concise.

