Regular Expression Syntax and Common Regular Expressions

Source: Internet
Author: User
Regular expression syntax

In typical search-and-replace operations, you must provide the exact text you are looking for. That technique may be sufficient for simple search-and-replace tasks in static text, but its lack of flexibility makes searching dynamic text difficult or even impossible.

With a regular expression, you can:

    • Test for a pattern within a string. For example, you can test an input string to see whether it contains a phone number or a credit card number. This is called data validation.
    • Replace text. You can use a regular expression to identify specific text in a document and then either delete it entirely or replace it with other text.
    • Extract a substring based on a pattern match. You can use this to find specific text within a document or an input field.
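The three operations above can be sketched in JavaScript, the standardized language that JScript implements. This is a minimal illustration only: the input string and the simple xxx-xxxx phone pattern are made up for the example and are not a real phone-number validator.

```javascript
// Illustrative input and pattern; both are made up for this sketch.
const text = "Call 555-0134 or 555-0199 for details.";
const phonePattern = /\d{3}-\d{4}/;

// 1. Test: does the string contain the pattern anywhere?
const hasPhone = phonePattern.test(text);

// 2. Replace: the g flag makes replace() act on every match, not just the first.
const masked = text.replace(/\d{3}-\d{4}/g, "XXX-XXXX");

// 3. Extract: match() with the g flag returns all matching substrings (or null).
const numbers = text.match(/\d{3}-\d{4}/g) || [];

console.log(hasPhone); // true
console.log(masked);   // Call XXX-XXXX or XXX-XXXX for details.
console.log(numbers);  // the two numbers 555-0134 and 555-0199
```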

For example, suppose you need to search an entire web site, remove outdated material, and replace some HTML formatting tags. You can use a regular expression to test each file for the outdated material or the HTML formatting tags, which narrows the affected files down to those that contain material to be removed or changed. You can then use a regular expression to delete the outdated material, and finally use regular expressions to find and replace the tags.
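The workflow above (test each file, delete outdated material, replace tags) might look like the following minimal sketch. It works on an in-memory object instead of real files, and everything specific is a hypothetical assumption: the file names, the "[OUTDATED: ...]" marker for outdated material, and the choice to replace <font> tags with <span> tags.

```javascript
// Hypothetical site contents (file name -> HTML). A real script would read
// the files from disk; only the regular-expression steps are shown here.
const site = {
  "index.html": "<p>Welcome!</p>",
  "news.html": "<p>[OUTDATED: 1999 sale]</p><font>News</font>",
  "about.html": "<p>About <font>us</font></p>",
};

// Assumed convention: outdated material is wrapped in "[OUTDATED: ...]".
const outdatedRe = /\[OUTDATED:[^\]]*\]/g;
// <font> tags (with any attributes) are to be replaced by <span> tags.
const fontOpenRe = /<font\b[^>]*>/gi;
const fontCloseRe = /<\/font>/gi;

// Step 1: test each file and keep only those that need changes.
// A non-global pattern is used with test(), so no lastIndex state carries over.
const needsWork = Object.keys(site).filter((name) =>
  /\[OUTDATED:|<font\b/i.test(site[name])
);

// Steps 2 and 3: delete the outdated material, then replace the tags.
const cleaned = {};
for (const name of needsWork) {
  cleaned[name] = site[name]
    .replace(outdatedRe, "")
    .replace(fontOpenRe, "<span>")
    .replace(fontCloseRe, "</span>");
}

console.log(needsWork); // news.html and about.html; index.html is skipped
console.log(cleaned);
```

Testing first keeps the more expensive rewrite limited to the files that actually contain something to change.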

Regular expressions are also useful in languages that are not known for their string-handling abilities. VBScript, a subset of Visual Basic, has a rich set of string-handling functions; JScript, like C, does not. Regular expressions therefore significantly improve string handling in JScript. Even in VBScript, however, regular expressions can be more efficient, because a single expression can perform multiple string operations.

A regular expression is a pattern composed of ordinary characters (for example, the letters a through z) and special characters known as metacharacters. The pattern describes one or more strings to match when searching a body of text, and it serves as a template for matching a character pattern against the string being searched.
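The difference between ordinary characters and metacharacters can be seen in a short JavaScript sketch. The `escapeRegExp` function is a hypothetical helper (a widely used idiom, not part of the original text) that backslash-escapes every metacharacter so that arbitrary text is matched literally.

```javascript
// "." is a metacharacter (any single character); "\." is a literal dot.
const anyChar = /a.c/;
const literalDot = /a\.c/;

// Hypothetical helper: escape every metacharacter so text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const unescaped = new RegExp("1+1=2?");             // "+" and "?" act as quantifiers
const literal = new RegExp(escapeRegExp("1+1=2?")); // matches the text as written

console.log(anyChar.test("abc"));          // true
console.log(literalDot.test("abc"));       // false
console.log(literalDot.test("a.c"));       // true
console.log(unescaped.test("Is 1+1=2?"));  // false: the pattern requires "11="
console.log(literal.source);               // 1\+1=2\?
console.log(literal.test("Is 1+1=2?"));    // true
```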

Here are some examples of regular expressions you may encounter:

JScript             VBScript            Match
/^[ \t]*$/          "^[ \t]*$"          Matches a blank line.
/\d{2}-\d{5}/       "\d{2}-\d{5}"       Validates an ID number consisting of 2 digits, a hyphen, and another 5 digits.
/<(.*)>.*<\/\1>/    "<(.*)>.*<\/\1>"    Matches an HTML tag pair such as <b>bold</b>.
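The three example patterns can be tried directly as JavaScript regex literals. The sample strings are made up for illustration. The last check shows a limitation of the tag pattern: the backreference \1 must repeat the entire captured text, so a tag with attributes does not pair with its closing tag.

```javascript
// The three example patterns, written as JavaScript regex literals.
const blankLine = /^[ \t]*$/;        // only spaces and tabs, or nothing
const idNumber = /\d{2}-\d{5}/;      // 2 digits, hyphen, 5 digits
const tagPair = /<(.*)>.*<\/\1>/;    // \1 repeats whatever group 1 captured

console.log(blankLine.test(" \t "));              // true
console.log(blankLine.test("  x "));              // false
console.log(idNumber.test("ID: 12-34567"));       // true
console.log(tagPair.exec("<b>bold</b>")[1]);      // b (the captured tag name)
console.log(tagPair.test('<p class="x">hi</p>')); // false: \1 would be 'p class="x"'
```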

The following table shows a complete list of metacharacters and their behaviors in the context of a regular expression:

Character     Description
\             Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character "n", while '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".
^             Matches the position at the start of the input string. If the RegExp object's Multiline property is set, ^ also matches the position following '\n' or '\r'.
$             Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position preceding '\n' or '\r'.
*             Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+             Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?             Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" and "does". ? is equivalent to {0,1}.
{n}           n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}          n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'.
{n,m}         m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the numbers.
?             When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
.             Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]'.
(pattern)     Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $1 through $9 properties in JScript. To match literal parentheses, use '\(' or '\)'.
(?:pattern)   Matches pattern but does not capture the match; it is a non-capturing match that is not stored for later use. This is useful when combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.
(?=pattern)   Positive lookahead: matches the search string at any point where a string matching pattern begins. This is a non-capturing match; the match is not stored for later use. For example, 'Windows(?=95|98|NT|2000)' matches "Windows" in "Windows2000" but not "Windows" in "Windows3.1". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters examined by the lookahead.
(?!pattern)   Negative lookahead: matches the search string at any point where a string not matching pattern begins. This is a non-capturing match; the match is not stored for later use. For example, 'Windows(?!95|98|NT|2000)' matches "Windows" in "Windows3.1" but not "Windows" in "Windows2000". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters examined by the lookahead.
x|y           Matches either x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz]         A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".
[^xyz]        A negated character set. Matches any character that is not enclosed. For example, '[^abc]' matches the 'p' in "plain".
[a-z]         A character range. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter from 'a' through 'z'.
[^a-z]        A negated character range. Matches any character not in the specified range. For example, '[^a-z]' matches any character that is not in the range 'a' through 'z'.
\b            Matches a word boundary, that is, the position between a word character and a non-word character. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".
\B            Matches a non-word boundary. For example, 'er\B' matches the 'er' in "verb" but not the 'er' in "never".
\cx           Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range A-Z or a-z; otherwise, c is treated as a literal 'c' character.
\d            Matches a digit character. Equivalent to [0-9].
\D            Matches a non-digit character. Equivalent to [^0-9].
\f            Matches a form-feed character. Equivalent to \x0c and \cL.
\n            Matches a newline (line feed) character. Equivalent to \x0a and \cJ.
\r            Matches a carriage return character. Equivalent to \x0d and \cM.
\s            Matches any whitespace character, including space, tab, and form feed. Equivalent to [ \f\n\r\t\v].
\S            Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t            Matches a tab character. Equivalent to \x09 and \cI.
\v            Matches a vertical tab character. Equivalent to \x0b and \cK.
\w            Matches any word character, including underscore. Equivalent to '[A-Za-z0-9_]'.
\W            Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn           Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, '\x41' matches "A", and '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
\num          Matches num, where num is a positive integer: a backreference to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n            (n is a digit) Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm           (n and m are digits) Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither condition holds and n and m are both octal digits (0-7), \nm matches the octal escape value nm.
\nml          Matches the octal escape value nml when n is an octal digit (0-3) and m and l are octal digits (0-7).
\un           Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
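Several rows of the metacharacter table can be checked in JavaScript, whose RegExp syntax closely follows JScript's. This sketch exercises only the common rows (quantifiers, lazy matching, groups, lookahead, boundaries, backreferences, and escapes); the sample strings are taken from or modeled on the table's own examples.

```javascript
// Each entry exercises one row of the metacharacter table.
const results = {
  greedy: "oooo".match(/o+/)[0],                                 // "oooo"
  lazy: "oooo".match(/o+?/)[0],                                  // "o"
  range: "fooooood".match(/o{1,3}/)[0],                          // "ooo"
  nonCapturing: /^industr(?:y|ies)$/.test("industries"),         // true
  lookahead: "Windows2000".match(/Windows(?=95|98|NT|2000)/)[0], // "Windows"
  lookaheadMiss: /Windows(?=95|98|NT|2000)/.test("Windows3.1"),  // false
  negLookahead: /Windows(?!95|98|NT|2000)/.test("Windows3.1"),   // true
  wordBoundary: [/er\b/.test("never"), /er\b/.test("verb")],     // [true, false]
  nonWordBoundary: [/er\B/.test("verb"), /er\B/.test("never")],  // [true, false]
  backreference: "bookkeeper".match(/(.)\1/g),                   // ["oo", "kk", "ee"]
  dotVsAny: [/a.b/.test("a\nb"), /a[\s\S]b/.test("a\nb")],       // [false, true]
  escapes: [/\x41/.test("A"), /\u00A9/.test("\u00A9")],          // [true, true]
};

console.log(results);
```

Note that the lookahead result is only "Windows": the characters checked by (?=...) are not part of the match.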


Common Regular Expressions:

1. Verify a user name or password: "^[a-zA-Z]\w{5,15}$". Valid input starts with a letter, contains only letters, digits, and underscores, and is 6 to 16 characters long.
2. Verify a phone number: "^(\d{3,4}-)\d{7,8}$". Valid input is a 3- or 4-digit area code, a hyphen, and a 7- or 8-digit number (xxx/xxxx-xxxxxxx/xxxxxxxx).
3. Verify an ID card number (15 or 18 digits): "^(\d{15}|\d{18})$". The parentheses are required; "^\d{15}|\d{18}$" would accept any string that merely starts with 15 digits or ends with 18 digits.
4. Verify an email address: "^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$".
5. Only a string of digits and the 26 English letters: "^[A-Za-z0-9]+$".
6. An integer, or a decimal with at most two decimal places: "^[0-9]+\.{0,1}[0-9]{0,2}$".
7. Only digits: "^[0-9]*$".
8. Exactly n digits: "^\d{n}$".
9. At least n digits: "^\d{n,}$".
10. Between m and n digits: "^\d{m,n}$".
11. Only zero, or a number that does not start with zero: "^(0|[1-9][0-9]*)$".
12. Only a non-negative number with an optional two-digit decimal part: "^[0-9]+(\.[0-9]{2})?$".
13. Only a non-negative number with an optional decimal part of one to three digits: "^[0-9]+(\.[0-9]{1,3})?$".
14. Only a non-zero positive integer: "^\+?[1-9][0-9]*$".
15. Only a non-zero negative integer: "^\-[1-9][0-9]*$".
16. Only a string of exactly three characters: "^.{3}$".
17. Only a string of the 26 English letters: "^[A-Za-z]+$".
18. Only a string of the 26 uppercase English letters: "^[A-Z]+$".
19. Only a string of the 26 lowercase English letters: "^[a-z]+$".
20. Check for characters such as % & ' , ; = ? $ ": "[^%&',;=?$\x22]+" matches a run of characters that contains none of them (\x22 is the double quote).
21. Only Chinese characters: "^[\u4e00-\u9fa5]{0,}$".
22. Verify a URL: "^http://([\w-]+\.)+[\w-]+(/[\w-./?%&=]*)?$".
23. Verify the 12 months of the year: "^(0?[1-9]|1[0-2])$". Valid formats are "01" to "09" and "1" to "12".
24. Verify the 31 days of a month: "^((0?[1-9])|((1|2)[0-9])|30|31)$". Valid formats are "01" to "09" and "1" to "31".
25. "^\d+$" // non-negative integer (positive integer + 0)
26. "^[0-9]*[1-9][0-9]*$" // positive integer
27. "^((-\d+)|(0+))$" // non-positive integer (negative integer + 0)
28. "^-[0-9]*[1-9][0-9]*$" // negative integer
29. "^-?\d+$" // integer
30. "^\d+(\.\d+)?$" // non-negative floating-point number (positive floating-point number + 0)
31. "^(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*))$" // positive floating-point number
32. "^((-\d+(\.\d+)?)|(0+(\.0+)?))$" // non-positive floating-point number (negative floating-point number + 0)
33. "^(-(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*)))$" // negative floating-point number
34. "^(-?\d+)(\.\d+)?$" // floating-point number
35. "^[A-Za-z]+$" // string of the 26 English letters
36. "^[A-Z]+$" // string of the 26 uppercase English letters
37. "^[a-z]+$" // string of the 26 lowercase English letters
38. "^[A-Za-z0-9]+$" // string of digits and the 26 English letters
39. "^\w+$" // string of digits, the 26 English letters, or underscores
40. "^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$" // email address
41. "^[a-zA-Z]+://(\w+(-\w+)*)(\.(\w+(-\w+)*))*(\?\S*)?$" // URL
42. Extract hyperlinks (href attributes) from text: (h|H)(r|R)(e|E)(f|F) *= *('|")?(\w|\\|\/|\.)+('|"| *|>)?
43. Extract email addresses from text: \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
44. Extract image links (src attributes) from text: (s|S)(r|R)(c|C) *= *('|")?(\w|\\|\/|\.)+('|"| *|>)?
45. Extract IP addresses from text: (\d+)\.(\d+)\.(\d+)\.(\d+)
46. Extract Chinese mobile phone numbers from text: (86)*0*13\d{9}
47. Extract Chinese landline numbers from text: (\(\d{3,4}\)|\d{3,4}-|\s)?\d{8}
48. Extract Chinese phone numbers (mobile and landline) from text: (\(\d{3,4}\)|\d{3,4}-|\s)?\d{7,14}
49. Extract Chinese postal codes from text: [1-9]\d{5}
50. Extract Chinese ID card numbers from text: \d{18}|\d{15}
51. Extract integers from text: \d+
52. Extract floating-point numbers (decimals) from text: (-?\d*)\.?\d+
53. Extract any number from text: (-?\d+)(\.\d+)?
54. Extract Chinese strings from text: [\u4e00-\u9fa5]+
55. Extract double-byte strings (Chinese characters) from text: [^\x00-\xff]+
56. Extract English strings from text: \w+
57. Extract a <script> element and its content from text: <script[\s\S]+?</script *>
58. Strict date validation (yyyy-mm-dd or yyyy-m-d, years 1600 to 9999, with leap years checked):
^((((1[6-9]|[2-9]\d)\d{2})-(0?[13578]|1[02])-(0?[1-9]|[12]\d|3[01]))|(((1[6-9]|[2-9]\d)\d{2})-(0?[13456789]|1[012])-(0?[1-9]|[12]\d|30))|(((1[6-9]|[2-9]\d)\d{2})-0?2-(0?[1-9]|1\d|2[0-8]))|(((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))-0?2-29))$

59. Strict date and time validation (the date above, a space, and hh:mm:ss):
^((((1[6-9]|[2-9]\d)\d{2})-(0?[13578]|1[02])-(0?[1-9]|[12]\d|3[01]))|(((1[6-9]|[2-9]\d)\d{2})-(0?[13456789]|1[012])-(0?[1-9]|[12]\d|30))|(((1[6-9]|[2-9]\d)\d{2})-0?2-(0?[1-9]|1\d|2[0-8]))|(((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))-0?2-29)) (20|21|22|23|[0-1]?\d):[0-5]?\d:[0-5]?\d$
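A few of the patterns in the list above can be wrapped in a small JavaScript validator. This is a sketch under stated assumptions: `isValid` is a hypothetical helper, the sample inputs are invented, the ID pattern uses the grouped form ^(\d{15}|\d{18})$, and the leap-day branch of the date pattern ends in "-29" (some widely copied versions end it in "-29-", which rejects a valid date such as "2000-02-29"). Anchored patterns validate a whole string; unanchored extraction patterns need the g flag to return every match.

```javascript
// A few patterns from the list above, as JavaScript regex literals.
const patterns = {
  username: /^[a-zA-Z]\w{5,15}$/,                          // item 1
  phone: /^(\d{3,4}-)\d{7,8}$/,                            // item 2
  // item 3, grouped: "^\d{15}|\d{18}$" would parse as "(^\d{15})|(\d{18}$)"
  idCard: /^(\d{15}|\d{18})$/,
  email: /^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$/,  // item 4
  month: /^(0?[1-9]|1[0-2])$/,                             // item 23
  // item 58: strict yyyy-mm-dd date, years 1600-9999, leap years checked
  date: /^((((1[6-9]|[2-9]\d)\d{2})-(0?[13578]|1[02])-(0?[1-9]|[12]\d|3[01]))|(((1[6-9]|[2-9]\d)\d{2})-(0?[13456789]|1[012])-(0?[1-9]|[12]\d|30))|(((1[6-9]|[2-9]\d)\d{2})-0?2-(0?[1-9]|1\d|2[0-8]))|(((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))-0?2-29))$/,
};

// Item 59: the date pattern without its final "$", then a space and hh:mm:ss.
patterns.dateTime = new RegExp(
  patterns.date.source.slice(0, -1) + " (20|21|22|23|[0-1]?\\d):[0-5]?\\d:[0-5]?\\d$"
);

// Hypothetical helper: test an input against one of the named patterns.
function isValid(kind, input) {
  return patterns[kind].test(input);
}

// Item 45 as an extraction pattern: the g flag makes match() return every hit.
const log = "Server 192.168.0.1 replied; backup at 10.0.0.254.";
const ips = log.match(/(\d+)\.(\d+)\.(\d+)\.(\d+)/g) || [];

console.log(isValid("idCard", "123456789012345abc")); // false
console.log(isValid("date", "2000-02-29"));           // true (leap year)
console.log(isValid("date", "1900-02-29"));           // false (not a leap year)
console.log(ips);                                     // 192.168.0.1 and 10.0.0.254
```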

From the examples above: "^" marks the start of the string and "$" marks the end. Note that when "^" appears inside "[]", it means "not"; for example, [^a-z] matches any character that is not a lowercase letter from a to z. A "[]" expression matches a single character. "{}" gives a repetition count or range for the preceding item; for example, "{9}" means exactly 9 repetitions, while "{1,9}" means 1 to 9 repetitions.
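The summary above can be checked with a few one-line tests. The sample strings are invented for illustration; the point is the contrast between "^" as an anchor and "^" as negation inside "[]", and between an exact count and a range in "{}".

```javascript
const demo = {
  startAnchor: /^a/.test("apple"),          // true: "^" outside [] anchors at the start
  notAtStart: /^a/.test("banana"),          // false: "a" occurs, but not at the start
  negatedSet: "Hello".match(/[^a-z]/)[0],   // "H": first character outside a-z
  exactlyNine: /^\d{9}$/.test("123456789"), // true: exactly 9 digits
  oneToNine: /^\d{1,9}$/.test("1234"),      // true: between 1 and 9 digits
  tooMany: /^\d{1,9}$/.test("1234567890"),  // false: 10 digits
};

console.log(demo);
```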

Regular Expression debugging tool:
/Files/Yasin/regextester.zip
