Basic Regular Expression Parsing


Since publishing the article "confusedProgramEmployee", I have once again gone a long time without writing anything. Why do I say "again"? Ah, sorry!

Okay, enough chatter (my feelings about the blog garden really are "two silent tears"). On to the subject.
Regular expressions come up all the time in input validation. Below I will share how to understand them better.
First, the name. Everyone knows the term "Regular Expression". It has been translated into Chinese in several slightly different ways, but they all mean the same thing. Here "regular" means "rule", so a "regular expression" is "an expression that describes a certain rule".
In other words, a regular expression constrains what a piece of text may look like, just as traffic rules constrain how we drive.
Personally, I don't think regular expressions are hard to understand. You just have to memorize a handful of symbols and then use them flexibly.
Okay, now for the body of the regular expression itself.
Let's look at the key symbols first. Once you understand these, you will generally have no trouble in a project.

First "\"
This is commonly known as the escape character. It marks the next character as either a special character or a literal one. For example, "n" matches the character "n", while "\n" matches a line break.
Someone is bound to ask: what if I want to match the backslash itself? That is simple too: just write "\\". Why two? To distinguish the literal backslash from the escape character.
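
A quick way to check the escape rules above is to run them. This is only a minimal sketch, assuming Python's built-in re module as the engine (the article does not name one); text and path are made-up sample values.

```python
import re

text = "n\n"          # the letter n followed by a line break

# "n" matches the letter itself; "\n" matches the line break.
letters = re.findall(r"n", text)
line_breaks = re.findall(r"\n", text)

# To match one literal backslash, the pattern needs two.
path = "C:\\temp"     # the string C:\temp (one backslash)
backslashes = re.findall(r"\\", path)

print(letters, line_breaks, backslashes)
```

The r"..." prefix is a Python detail: it stops Python itself from interpreting the backslashes before the regex engine sees them.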

Second "^"
This is commonly known as the start anchor: it matches the position at the beginning of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after "\n" or "\r".

Third "$"
This is commonly known as the end anchor (my own unprofessional name for it): it matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before "\n" or "\r".
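
Here is a small sketch of how the multiline setting changes "^" and "$". It assumes Python's re module, where the setting is the re.MULTILINE flag and only "\n" counts as a line boundary (other engines may also honor "\r"); the sample text is made up.

```python
import re

text = "first line\nsecond line"

# Without MULTILINE, ^ and $ only anchor at the ends of the whole string.
starts_default = re.findall(r"^\w+", text)
ends_default = re.findall(r"\w+$", text)

# With MULTILINE, they also anchor right after and right before each "\n".
starts_multi = re.findall(r"^\w+", text, re.MULTILINE)
ends_multi = re.findall(r"\w+$", text, re.MULTILINE)

print(starts_default, ends_default)
print(starts_multi, ends_multi)
```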

Fourth "*"
This matches the preceding subexpression zero or more times. For example, "zo*" matches "z", "zo", and "zoo". "*" is equivalent to {0,}.

Fifth "+"
This matches the preceding subexpression one or more times. For example, "zo+" matches "zo", "zoo", and "zooo", but not "z". "+" works like "*" except that it requires at least one occurrence. "+" is equivalent to {1,}.

Sixth "?"
This matches the preceding subexpression zero times or once. For example, "do(es)?" matches "do" or "does". In other words, the question mark makes whatever precedes it optional.
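
The three quantifiers can be compared side by side. A minimal sketch, assuming Python's re module; re.fullmatch is used so that the whole word has to fit the pattern, and "doess" is an extra made-up word added to show a rejection.

```python
import re

words = ["z", "zo", "zoo"]

# "*": zero or more "o"; "+": at least one "o".
star = [w for w in words if re.fullmatch(r"zo*", w)]
plus = [w for w in words if re.fullmatch(r"zo+", w)]

# "?": the group "(es)" may appear zero times or once.
optional = [w for w in ["do", "does", "doess"] if re.fullmatch(r"do(es)?", w)]

print(star, plus, optional)
```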

Seventh "{}"
This symbol says how many times to match.
1. {n} matches exactly n times, where n is a non-negative integer. For example, "o{2}" matches the "oo" in "good" or "food", but it cannot match the "o" in "body", because there is only one.
2. {n,} matches at least n times, where n is a non-negative integer. For example, "o{2,}" matches two or more "o" in a row, as in "good", "goood", and "gooood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*".
3. {n,m} matches at least n and at most m times, where n and m are non-negative integers and n <= m. For example, "o{1,3}" matches the run of "o" in "body", "food", and "foood". In "fooood" it can take at most three "o" per match, so it rejects that word only if the pattern is anchored, as in "^fo{1,3}d$". "o{0,1}" is equivalent to "o?". Note that there must be no space between the comma and the two numbers.
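
The counted forms above can be tried out as follows. This is a sketch assuming Python's re module; o_runs is a helper name invented for this example. The last line relies on Python treating a brace expression that contains a space as literal text, which other engines may handle differently.

```python
import re

def o_runs(pattern, word):
    """Return every non-overlapping substring of word that pattern matches."""
    return re.findall(pattern, word)

exactly_two = o_runs(r"o{2}", "food")
too_few = o_runs(r"o{2}", "body")
two_or_more = o_runs(r"o{2,}", "gooood")
one_to_three = o_runs(r"o{1,3}", "fooood")   # four o's split into 3 + 1

# Anchoring the whole word is what actually rejects "fooood".
anchored_ok = re.fullmatch(r"fo{1,3}d", "foood") is not None
anchored_bad = re.fullmatch(r"fo{1,3}d", "fooood") is not None

# A space after the comma breaks the quantifier (Python then reads it literally).
with_space = o_runs(r"o{1, 3}", "food")

print(exactly_two, too_few, two_or_more, one_to_three)
print(anchored_ok, anchored_bad, with_space)
```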

Eighth "?" (special usage)
When "?" immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), matching becomes non-greedy. Non-greedy mode matches as little of the searched string as possible, while the default greedy mode matches as much as possible. For example, in the string "oooo", "o+?" matches a single "o", while "o+" matches all four.
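
Greedy and non-greedy matching are easiest to tell apart by running both. A minimal sketch, assuming Python's re module; the markup string is a made-up sample that shows why the difference matters in practice.

```python
import re

text = "oooo"
greedy = re.match(r"o+", text).group()     # takes as many "o" as possible
lazy = re.match(r"o+?", text).group()      # takes as few "o" as possible

markup = "<b>a</b><b>c</b>"
greedy_tags = re.findall(r"<b>(.*)</b>", markup)    # runs to the last </b>
lazy_tags = re.findall(r"<b>(.*?)</b>", markup)     # stops at the first </b>

print(greedy, lazy)
print(greedy_tags, lazy_tags)
```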

Ninth "."
This matches any single character except the line break "\n". To match any character including "\n", use a pattern such as "(.|\n)".
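
A short sketch of the dot and the "(.|\n)" workaround, assuming Python's re module. The re.DOTALL flag shown last is Python's own switch for the same effect and is not something the article mentions.

```python
import re

text = "a\nb"

dot_only = re.findall(r".", text)             # the line break is skipped
dot_or_newline = re.findall(r"(.|\n)", text)  # the pattern from the article
dot_all = re.findall(r".", text, re.DOTALL)   # Python flag with the same effect

print(dot_only, dot_or_newline, dot_all)
```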

Tenth "pattern"
This "pattern" family is not so easy to understand; it made me a bit dizzy at first. Here is how I understand it:
1. (?:pattern) matches pattern but does not capture the result. Example: K(?:1|2|3) matches K followed by any one of 1, 2, or 3, that is, K1, K2, or K3.
2. (?=pattern) is a positive lookahead. Example: K(?=1|2|3) matches K only when it is followed by 1, 2, or 3, and the match contains only K, such as the K in K1 or the K in K2.
3. (?!pattern) is a negative lookahead. Example: K(?!1|2|3) matches K only when it is not followed by 1, 2, or 3. It does not match the K in K1, but it does match the K in K4 or K5.
4. (?<=pattern) is a positive lookbehind. Example: (?<=1|2|3)K matches K only when it is preceded by 1, 2, or 3, such as the K in 1K or the K in 2K.
5. (?<!pattern) is a negative lookbehind. Example: (?<!1|2|3)K matches K only when it is not preceded by 1, 2, or 3. It does not match the K in 1K, but it does match the K in 4K or 5K.
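
The five forms above are easier to digest with concrete positions. Below is a rough sketch assuming Python's re module (some engines, such as older JavaScript versions, lack lookbehind); the sample string is made up, and the lists hold the index at which each K was matched.

```python
import re

text = "K1 K4 1K 4K"   # K appears at indexes 0, 3, 7, and 10

def k_positions(pattern):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

non_capturing = re.findall(r"K(?:1|2|3)", text)   # the digit is part of the match
lookahead = k_positions(r"K(?=1|2|3)")            # K followed by 1, 2, or 3
neg_lookahead = k_positions(r"K(?!1|2|3)")        # K not followed by 1, 2, or 3
lookbehind = k_positions(r"(?<=1|2|3)K")          # K preceded by 1, 2, or 3
neg_lookbehind = k_positions(r"(?<!1|2|3)K")      # K not preceded by 1, 2, or 3

# A lookahead checks the digit but leaves it out of the match itself.
lookahead_text = re.search(r"K(?=1|2|3)", text).group()

print(non_capturing, lookahead, neg_lookahead, lookbehind, neg_lookbehind)
```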

Eleventh "|"
This symbol means "or". For example, "f|good" matches "f" or "good", while "(f|g)ood" matches "food" or "good".
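
The difference between the two alternation examples shows up clearly when both are run against the same words. A minimal sketch, assuming Python's re module and whole-word matching with re.fullmatch:

```python
import re

words = ["f", "good", "food"]

# Without a group, "|" splits the entire pattern: "f" or "good".
loose = [w for w in words if re.fullmatch(r"f|good", w)]

# With a group, only the first letter varies: "food" or "good".
grouped = [w for w in words if re.fullmatch(r"(f|g)ood", w)]

print(loose, grouped)
```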

Twelfth "[]"
This symbol denotes a character set. It looks similar to "{}", but the meaning is completely different.
1. [xyz] matches any one of the characters it contains, that is, pick one of the three. For example, "[abc]" matches the "a" in "company" (and the "c"), one character per match. It never matches two letters as a unit.
2. [^xyz] is a negated character set, the "not" form: it matches any character that is not listed. For example, "[^abc]" matches every letter of "drop", because the word contains none of "a", "b", or "c".
3. [a-z] is a character range and matches any character in the given range. For example, "[a-z]" matches any lowercase letter from "a" to "z". You can also write "[0-9]", which matches any digit from 0 to 9.
4. [^a-z] should be easy to guess by now: it matches any character that is not in the range "a" to "z". When I first saw this I read it as "any letter not between a and z", and joked that the only such letter would be the "ü" of Chinese pinyin (read "yu"). Haha! Look closely: it says character, not letter.

Thirteenth "()"
This symbol marks a group (my wording may be inaccurate): it wraps a subexpression so that it can be treated as a unit.
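
To see the character sets above in action, here is a small sketch assuming Python's re module. "Regex 101" and "10-20" are made-up samples, and the last line adds one illustration of "()" as a capturing group.

```python
import re

in_set = re.findall(r"[abc]", "company")        # one character per match
not_in_set = re.findall(r"[^abc]", "drop")      # nothing in "drop" is excluded

sample = "Regex 101"
lowercase = re.findall(r"[a-z]", sample)
digits = re.findall(r"[0-9]", sample)
not_lowercase = re.findall(r"[^a-z]", sample)   # characters, not just letters

# "()" groups a subexpression; here each group captures one number.
pair = re.search(r"(\d+)-(\d+)", "10-20").groups()

print(in_set, not_in_set, lowercase, digits, not_lowercase, pair)
```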

Next, let's look at what "\" means in combination with letters.
"\b" matches a word boundary, that is, the position between a word and a space. For example, "er\b" matches the "er" in "never" but not the "er" in "verb". A handy way to remember it: "boundary" starts with b!
"\B" is the opposite of "\b" and matches a non-word boundary. "er\B" matches the "er" in "verb" but not the "er" in "never".
"\d" is used a lot, so I suggest memorizing it. It matches a digit and is equivalent to [0-9].
"\D" is just as easy: it matches a non-digit and is equivalent to [^0-9].
"\f" matches a form feed (page break). This needs no further explanation, and neither do the four below. Just remember them; they will come up in projects.
"\n" matches a line break.
"\r" matches a carriage return.
"\t" matches a tab.
"\v" matches a vertical tab.
"\s" matches any whitespace character, including spaces, tabs, and form feeds. It is equivalent to [ \f\n\r\t\v]; in other words, it covers all five of the above plus the space.
"\S" matches any non-whitespace character and is equivalent to [^ \f\n\r\t\v].
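
The escapes covered so far can be checked with a few lines. This is a sketch assuming Python's re module, where \s and \d on ordinary strings also accept some non-ASCII whitespace and digits, so the equivalences above hold only for ASCII text; the sample text is made up.

```python
import re

# Use raw strings here: "\b" in a normal Python string is a backspace character.
boundary = [w for w in ["never", "verb"] if re.search(r"er\b", w)]
no_boundary = [w for w in ["never", "verb"] if re.search(r"er\B", w)]

text = "Room 42,\tfloor 3\n"
numbers = re.findall(r"\d+", text)       # runs of digits
digits_only = re.sub(r"\D", "", text)    # drop every non-digit
blanks = re.findall(r"\s", text)         # space, tab, space, line break
chunks = re.findall(r"\S+", text)        # runs of non-whitespace

print(boundary, no_boundary)
print(numbers, digits_only, blanks, chunks)
```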
By now you may feel that regular expressions really are just these characters. Some of them can be inferred logically from others, and some repeat the same idea, so all you need is to use them flexibly.
Okay, let's continue.
"\w" matches any word character, including the underscore. It is equivalent to "[A-Za-z0-9_]". It is very handy in practice, so I recommend memorizing it.
"\W" matches any non-word character. It is equivalent to "[^A-Za-z0-9_]".

Okay! That is basically all you have to remember. Some regular expression experts may say, "That's not everything, is it?" Haha! Let me explain in advance: what I have written covers only the basics, the parts that are common and practical in projects. With these you can already work quite freely.
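
Before moving on, the word-character escapes from above can be tried out too. A minimal sketch assuming Python's re module, with made-up sample text. One caveat that is specific to Python 3: on ordinary strings \w also accepts non-ASCII letters, so it equals [A-Za-z0-9_] only when the re.ASCII flag is given.

```python
import re

text = "user_name-01!"
word_runs = re.findall(r"\w+", text)            # the underscore counts as a word character
non_word = re.findall(r"\W", text)
explicit = re.findall(r"[A-Za-z0-9_]+", text)   # the spelled-out equivalent

accented = "\u00e9"                             # the letter e with an acute accent
unicode_default = re.findall(r"\w", accented)
ascii_only = re.findall(r"\w", accented, re.ASCII)

print(word_runs, non_word, explicit)
print(unicode_default, ascii_only)
```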
Next, let's do something more substantial and parse a few regular expressions together.
For example, take this regular expression: ^([0-1]?[0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])$
What does it match? Readers with strong logical thinking will already know. That's right, it matches a time.

OK, let's parse it. It starts with "^", the start anchor. "([0-1]?[0-9]|2[0-3])" is a group. In "[0-1]?", the question mark means zero or one occurrence of a 0 or a 1, and "[0-9]" is any digit from 0 to 9. "|" means "or": the hour is either "[0-1]?[0-9]" or "2[0-3]". In "2[0-3]", the 2 stands for a literal 2, and "[0-3]" is any digit from 0 to 3. ":" is a literal ":". "([0-5][0-9])" is also a group: "[0-5]" is any digit from 0 to 5 and "[0-9]" is any digit from 0 to 9. The next ":" is again literal, and the second "([0-5][0-9])" group works the same way. Finally, "$" is the end anchor.
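
The walkthrough above can be checked against a few inputs. This is a minimal sketch assuming Python's re module; parse_time is a helper name invented for the example, and the three groups come back as hour, minute, and second.

```python
import re

TIME = re.compile(r"^([0-1]?[0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])$")

def parse_time(value):
    """Return (hour, minute, second) as integers, or None if value does not match."""
    match = TIME.match(value)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())

print(parse_time("23:59:59"))
print(parse_time("7:05:00"))
print(parse_time("24:00:00"))   # hour out of range
print(parse_time("12:60:00"))   # minute out of range
```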
Now let's parse a decimal number together.
For example: ^[1-9]+\d*(\.[0-9]{1,2})?|0(\.[0-9]{1,2})?$
"^" is the start anchor. In "[1-9]+", the "+" means one or more digits from 1 to 9. In "\d*", "\d" is a digit and "*" means zero or more of them. "(\.[0-9]{1,2})?" is a group: "\." is a literal dot, "[0-9]{1,2}" is one or two digits from 0 to 9, and the trailing "?" means zero or one occurrence of "(\.[0-9]{1,2})". "|" means the input is either "[1-9]+\d*(\.[0-9]{1,2})?" or "0(\.[0-9]{1,2})?". In "0(\.[0-9]{1,2})?", the 0 is a literal 0 and the group "(\.[0-9]{1,2})?" works exactly as before. "$" is the end anchor. One caution: "|" has the lowest precedence, so as written "^" applies only to the first alternative and "$" only to the second. Wrap both alternatives in one group, as in ^(...|...)$, if the whole string must be a decimal number.

Well, I won't parse any more of them one by one. If I did, you would all think I was as long-winded as "Tang Seng". That is all I will share today. If anything is unclear or wrong, criticism and advice are welcome, and if you see things differently, please leave a comment so we can discuss.
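
One last look at the decimal pattern parsed above, this time run against a few made-up inputs. This is a sketch assuming Python's re module. The strict variant is not from the original article: it is a hypothetical rewrite that wraps both alternatives in one group so that "^" and "$" apply to both.

```python
import re

# The pattern as given: "^" binds to the first branch, "$" to the second.
loose = re.compile(r"^[1-9]+\d*(\.[0-9]{1,2})?|0(\.[0-9]{1,2})?$")

# Hypothetical stricter variant with one outer group around both branches.
strict = re.compile(r"^([1-9]+\d*(\.[0-9]{1,2})?|0(\.[0-9]{1,2})?)$")

samples = ["12.34", "0.5", "0", "12.345", "abc0"]
loose_ok = [s for s in samples if loose.search(s)]
strict_ok = [s for s in samples if strict.search(s)]

print(loose_ok)    # also accepts "12.345" and "abc0"
print(strict_ok)
```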
To say goodbye, here are some common regular expressions:

^[1-9]\d*$ // matches a positive integer
^-[1-9]\d*$ // matches a negative integer
^-?[1-9]\d*$ // matches an integer (zero excluded)
^[1-9]\d*|0$ // matches a non-negative integer (positive integer or 0)
^-[1-9]\d*|0$ // matches a non-positive integer (negative integer or 0)
^[1-9]\d*\.\d*|0\.\d*[1-9]\d*$ // matches a positive floating point number
^-([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$ // matches a negative floating point number
^-?([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$ // matches a floating point number
^[1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0$ // matches a non-negative floating point number (positive floating point number or 0)
^(-([1-9]\d*\.\d*|0\.\d*[1-9]\d*))|0?\.0+|0$ // matches a non-positive floating point number (negative floating point number or 0)
^[a-zA-Z][a-zA-Z0-9_]{4,15}$ // checks whether an account name is valid (starts with a letter, 5-16 characters, letters, digits, and underscores allowed)
^\s*|\s*$ // matches leading and trailing whitespace
\n\s*\r // matches blank lines
[^\x00-\xff] // matches double-byte characters (including Chinese characters)
[\u4e00-\u9fa5] // matches Chinese characters
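
A few of the patterns in this list can be exercised directly. The sketch below assumes Python's re module; the constant names and sample inputs are invented for the example, and only five of the patterns are shown.

```python
import re

POSITIVE_INT = re.compile(r"^[1-9]\d*$")
FLOAT = re.compile(r"^-?([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$")
ACCOUNT = re.compile(r"^[a-zA-Z][a-zA-Z0-9_]{4,15}$")
CHINESE = re.compile(r"[\u4e00-\u9fa5]")
EDGE_BLANKS = re.compile(r"^\s*|\s*$")

ints = [s for s in ["42", "042", "0"] if POSITIVE_INT.match(s)]
floats = [s for s in ["-3.14", "0.05", "0", "3"] if FLOAT.match(s)]
accounts = [s for s in ["green_apple", "1abc5", "abc"] if ACCOUNT.match(s)]
hanzi = CHINESE.findall("\u6b63\u5219 regex")    # two Chinese characters, then Latin text
trimmed = EDGE_BLANKS.sub("", "  hello world \n")

print(ints, floats, accounts, hanzi, repr(trimmed))
```

Note that the floating point pattern requires a decimal point, so a bare "3" is rejected while "0" is accepted.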

User Name
^[a-z0-9_-]{}$

Password
^[a-z0-9_-]{6,18}$

Hexadecimal value
^#?([a-f0-9]{6}|[a-f0-9]{3})$

Email
^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$
^[a-z\d]+(\.[a-z\d]+)*@([\da-z](-[\da-z])?)+(\.{1,2}[a-z]+)+$

URL
^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$

IP address
((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)

Or
^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
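
The second IP address pattern is long, so it helps to build it from one octet piece and test it. A minimal sketch assuming Python's re module; OCTET and IPV4 are names made up for the example, and the assembled pattern is the same as the anchored one above.

```python
import re

# One octet: 250-255, 200-249, or 0-199 (with an optional leading 0 or 1).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Three "octet plus dot" pieces, then a final octet, anchored at both ends.
IPV4 = re.compile(rf"^(?:{OCTET}\.){{3}}{OCTET}$")

samples = ["192.168.0.1", "255.255.255.255", "256.1.1.1", "1.2.3"]
valid = [s for s in samples if IPV4.match(s)]

print(valid)
```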

HTML Tag
^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$
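
As a final check, here is a sketch that runs the hexadecimal value and HTML tag patterns, assuming Python's re module; the sample strings are made up. In the tag pattern, group 1 is the tag name, "\1" requires the closing tag to repeat it, and group 3 is the content between the tags.

```python
import re

HEX_VALUE = re.compile(r"^#?([a-f0-9]{6}|[a-f0-9]{3})$")
HTML_TAG = re.compile(r"^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$")

hex_ok = [s for s in ["#a3c113", "#fff", "#4d82h4"] if HEX_VALUE.match(s)]

paired = HTML_TAG.match('<a href="x">hi</a>')
self_closing = HTML_TAG.match("<br />")
mismatched = HTML_TAG.match("<b>bold</i>")

print(hex_ok)
print(paired.group(1), paired.group(3))
print(self_closing.group(1), mismatched)
```

The nested quantifier in "([^<]+)*" can backtrack heavily on long inputs that do not match, so short strings are the safer way to experiment with it.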

Author: green apple
Motto: constantly reflect on yourself! Then change it!
Technologies of interest: .NET, database, JavaScript, C#, Ajax, WinForm, jQuery, ExtJS
Source: http://www.cnblogs.com/xinchun/
The copyright of this article is shared by the author and cnblogs. You are welcome to repost it, but unless the author agrees otherwise you must keep this statement and show a clear link to the original article on the page. Otherwise, the author reserves the right to pursue legal liability.
