Regular expression syntax

Source: Internet
Author: User


In a typical search and replace operation, you must provide the exact text you want to find. This technique may suffice for simple search and replace tasks in static text, but because of its lack of flexibility, it can be difficult or even impossible to search for dynamic text.

Using regular expressions, you can:

    • Test a string for a pattern. For example, you can test an input string to see whether it contains a phone number pattern or a credit card number pattern. This is called data validation.
    • Replace text. You can use a regular expression to identify specific text in a document and then either delete it all or replace it with other text.
    • Extract a substring based on a pattern match. You can use this to find specific text within a document or in input fields.
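The three operations above can be sketched in a few lines. This example rests on two assumptions: it uses Python's `re` module rather than the JScript or VBScript engines the article discusses, and the phone-number format and sample text are hypothetical.

```python
import re

# Hypothetical phone-number pattern: three digits, hyphen, three digits, hyphen, four digits.
PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")

text = "Call 555-123-4567 or 555-987-6543 for details."

# 1. Test (data validation): does the string contain the pattern anywhere?
has_phone = PHONE.search(text) is not None

# 2. Replace: substitute every match with other text.
redacted = PHONE.sub("[phone]", text)

# 3. Extract: collect every substring that matches.
numbers = PHONE.findall(text)

print(has_phone)  # True
print(redacted)   # Call [phone] or [phone] for details.
print(numbers)    # ['555-123-4567', '555-987-6543']
```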

For example, suppose you need to search an entire website, remove some outdated material, and replace some HTML formatting tags. You can use a regular expression to test each file for the material or HTML tags you are looking for, which narrows the set of affected files to those that contain material you want to delete or change. You can then use regular expressions to delete the obsolete material, and finally use them again to find and replace the tags that need replacing.
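The website clean-up described above can be sketched as a single pass over the files. This is a minimal illustration in Python, not a prescribed tool: the `clean_site` function, the `<p class="obsolete">` marker for outdated material, and the `<b>` to `<strong>` tag replacement are all hypothetical stand-ins for whatever material and tags you actually need to change.

```python
import re
from pathlib import Path

# Hypothetical markers: an obsolete paragraph to delete and an old tag to replace.
OBSOLETE = re.compile(r'<p class="obsolete">.*?</p>\s*', re.DOTALL)
OLD_TAG = re.compile(r"<(/?)b>")  # matches <b> or </b>; group 1 holds the optional slash


def clean_site(root: Path) -> list[Path]:
    """Rewrite every .html file under root that needs changes; return those files."""
    changed = []
    for path in root.rglob("*.html"):
        text = path.read_text(encoding="utf-8")
        # Step 1: test each file and skip the ones with nothing to change.
        if not (OBSOLETE.search(text) or OLD_TAG.search(text)):
            continue
        # Step 2: delete the obsolete material.
        text = OBSOLETE.sub("", text)
        # Step 3: replace the old tag, keeping the slash of closing tags.
        text = OLD_TAG.sub(r"<\1strong>", text)
        path.write_text(text, encoding="utf-8")
        changed.append(path)
    return changed
```

The non-greedy `.*?` keeps one deletion from running past the first closing `</p>`, and `re.DOTALL` lets an obsolete paragraph span several lines.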

Regular expressions are also useful in a language that is not known for its string-handling capabilities. VBScript, a subset of Visual Basic, has a rich set of string-handling functions; JScript, like C, does not. Regular expressions therefore provide a significant improvement in JScript's string handling. They can also be more efficient in VBScript, because they let you perform several string operations in a single expression.

A regular expression is a text pattern made up of ordinary characters (for example, the letters a through z) and special characters known as metacharacters. The pattern describes one or more strings to match when searching a body of text. The regular expression serves as a template for matching a character pattern against the string being searched.
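To see the difference between ordinary characters and metacharacters, compare a pattern in which `.` acts as a metacharacter with one in which it is escaped to a literal period. This sketch assumes Python's `re` module; `re.escape` is Python's helper for escaping every metacharacter in a piece of text, and the sample strings are made up.

```python
import re

samples = ["abc", "a.c", "a-c"]

# "." is a metacharacter that matches any single character except a newline.
any_char = [s for s in samples if re.fullmatch(r"a.c", s)]
# "\." escapes it, so only a literal period matches.
literal_dot = [s for s in samples if re.fullmatch(r"a\.c", s)]
# re.escape() escapes every metacharacter ($ . ( ) here) in arbitrary text.
price = re.escape("$5.00 (approx.)")
found = re.search(price, "Cost: $5.00 (approx.) each") is not None

print(any_char)     # ['abc', 'a.c', 'a-c']
print(literal_dot)  # ['a.c']
print(found)        # True
```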

Here are some examples of regular expressions that you might encounter:

JScript | VBScript | Matches
/^[ \t]*$/ | "^[ \t]*$" | A blank line (empty, or containing only spaces and tabs).
/\d{2}-\d{5}/ | "\d{2}-\d{5}" | An ID number consisting of two digits, a hyphen, and five digits.
/<(.*)>.*<\/\1>/ | "<(.*)>.*<\/\1>" | An HTML tag: an opening tag, its content, and the matching closing tag.
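The three table entries can be tried directly. This sketch assumes Python's `re` module, where a raw string such as `r"..."` plays the role of the JScript `/.../` literal and the slash needs no `\/` escape; the sample inputs are illustrative.

```python
import re

# Blank line: only spaces or tabs between the start and the end.
BLANK = re.compile(r"^[ \t]*$")
# ID number: two digits, a hyphen, five digits (unanchored, as in the table).
ID = re.compile(r"\d{2}-\d{5}")
# Opening tag, any content, then the closing tag named by backreference \1.
TAG = re.compile(r"<(.*)>.*</\1>")

blank_ok = BLANK.match(" \t ") is not None    # True
blank_bad = BLANK.match("  x") is not None    # False
id_found = ID.search("ID: 12-34567").group()  # '12-34567'
tag_name = TAG.search("<b>bold</b>").group(1) # 'b'
```

The tag pattern is only a rough illustration: a tag with attributes, such as `<a href="x">link</a>`, does not match, because the backreference `\1` would require the same attributes in the closing tag.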

The following table is a complete list of metacharacters and their behavior in the context of regular expressions:

Character Description
\ Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character "n", while '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".
^ Matches the position at the beginning of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position following '\n' or '\r'.
$ Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position preceding '\n' or '\r'.
* Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+ Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
? Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" and "does". ? is equivalent to {0,1}.
{n} n is a nonnegative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,} n is a nonnegative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'.
{n,m} m and n are nonnegative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there can be no spaces between the comma and the numbers.
? When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
. Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]'.
(pattern) Matches pattern and captures the match. The captured matches can be retrieved from the resulting Matches collection, through the SubMatches collection in VBScript or the $1…$9 properties in JScript. To match parenthesis characters, use '\(' or '\)'.
(?:pattern) Matches pattern but does not capture the match; that is, it is a non-capturing match that is not stored for later use. This is useful when combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more compact expression than 'industry|industries'.
(?=pattern) Positive lookahead: matches the search string at any point where a string matching pattern begins. This is a non-capturing match; that is, the match is not stored for later use. For example, 'Windows (?=95|98|NT|2000)' matches in "Windows 2000" but not in "Windows 3.1". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern) Negative lookahead: matches the search string at any point where no string matching pattern begins. This is a non-capturing match; that is, the match is not stored for later use. For example, 'Windows (?!95|98|NT|2000)' matches in "Windows 3.1" but not in "Windows 2000". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
x|y Matches either x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz] A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".
[^xyz] A negated character set. Matches any character that is not enclosed. For example, '[^abc]' matches the 'p' in "plain".
[a-z] A range of characters. Matches any character in the specified range. For example, '[a-z]' matches any lowercase alphabetic character in the range 'a' through 'z'.
[^a-z] A negated range of characters. Matches any character not in the specified range. For example, '[^a-z]' matches any character not in the range 'a' through 'z'.
\b Matches a word boundary, that is, the position between a word character and a non-word character (or the start or end of the string). For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".
\B Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".
\cx Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range A-Z or a-z; otherwise, c is treated as a literal 'c' character.
\d Matches a digit character. Equivalent to [0-9].
\D Matches a non-digit character. Equivalent to [^0-9].
\f Matches a form feed character. Equivalent to \x0c and \cL.
\n Matches a newline character. Equivalent to \x0a and \cJ.
\r Matches a carriage return character. Equivalent to \x0d and \cM.
\s Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v].
\S Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t Matches a tab character. Equivalent to \x09 and \cI.
\v Matches a vertical tab character. Equivalent to \x0b and \cK.
\w Matches any word character, including underscore. Equivalent to '[A-Za-z0-9_]'.
\W Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
\num Matches num, where num is a positive integer: a backreference to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n Identifies either an octal escape value or a backreference; here n is a digit, not the letter n. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither condition holds and n and m are both octal digits (0-7), \nm matches the octal escape value nm.
\nml Matches the octal escape value nml when n is an octal digit (0-3) and m and l are octal digits (0-7).
\un Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
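Several rows of the table can be checked in a short script. This is a sketch assuming Python 3's `re` module; the quantifiers, lookaheads, boundaries, and backreferences shown behave the same way in JScript, but other details differ between dialects (for example, Python uses the `re.MULTILINE` flag where JScript uses the RegExp object's Multiline property). The sample strings come from the table where possible.

```python
import re

# Greedy versus non-greedy: 'o+' takes every o, 'o+?' takes as few as possible.
greedy = re.match(r"o+", "oooo").group()               # 'oooo'
lazy = re.match(r"o+?", "oooo").group()                # 'o'

# {n,m}: 'o{1,3}' matches the first three o's of "fooooood".
bounded = re.search(r"o{1,3}", "fooooood").group()     # 'ooo'

# "." skips a newline; the class [\s\S] matches any character at all.
dot_skips_newline = re.search(r"a.b", "a\nb") is None             # True
class_spans_newline = re.search(r"a[\s\S]b", "a\nb") is not None  # True

# re.MULTILINE plays the role of the Multiline property for ^ and $.
line_starts = re.findall(r"^\w+", "one\ntwo", re.MULTILINE)      # ['one', 'two']

# Lookaheads test what follows without consuming it.
versions = ["Windows 2000", "Windows 3.1"]
positive = [v for v in versions if re.search(r"Windows (?=95|98|NT|2000)", v)]
negative = [v for v in versions if re.search(r"Windows (?!95|98|NT|2000)", v)]

# \b is a word boundary; \B is a position that is not a word boundary.
words = ["never", "verb"]
at_boundary = [w for w in words if re.search(r"er\b", w)]        # ['never']
not_boundary = [w for w in words if re.search(r"er\B", w)]       # ['verb']

# Backreference \1 repeats whatever group 1 captured.
doubled = re.search(r"(.)\1", "bookkeeper").group()              # 'oo'

# (?:...) groups without capturing, so only the outer group is reported.
groups = re.fullmatch(r"(industr(?:y|ies))", "industries").groups()  # ('industries',)

# Hexadecimal and Unicode escapes.
hex_ok = re.fullmatch(r"\x41", "A") is not None                  # True
unicode_ok = re.fullmatch(r"\u00A9", "\u00a9") is not None       # True
```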



Common Regular Expressions

1. Validate a user name or password: "^[a-zA-Z]\w{5,15}$". Correct format: 6 to 16 characters consisting of letters, digits, and underscores, beginning with a letter.
2. Validate a phone number: "^(\d{3,4}-)\d{7,8}$". Correct format: xxx-xxxxxxx, xxxx-xxxxxxx, xxx-xxxxxxxx, or xxxx-xxxxxxxx.
3. Validate an ID number (15 or 18 digits): "^(\d{15}|\d{18})$".
4. Validate an email address: "^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$".
5. Only digits and the 26 English letters: "^[A-Za-z0-9]+$".
6. Integer or decimal with up to two decimal places: "^[0-9]+\.{0,1}[0-9]{0,2}$".
7. Digits only: "^[0-9]*$".
8. Exactly n digits: "^\d{n}$".
9. At least n digits: "^\d{n,}$".
10. Between m and n digits: "^\d{m,n}$".
11. Zero, or a number that does not begin with zero: "^(0|[1-9][0-9]*)$".
12. Positive real number with two decimal places: "^[0-9]+(\.[0-9]{2})?$".
13. Positive real number with one to three decimal places: "^[0-9]+(\.[0-9]{1,3})?$".
14. Non-zero positive integer: "^\+?[1-9][0-9]*$".
15. Non-zero negative integer: "^\-[1-9][0-9]*$".
16. Exactly three characters: "^.{3}$".
17. Only the 26 English letters: "^[A-Za-z]+$".
18. Only the 26 uppercase English letters: "^[A-Z]+$".
19. Only the 26 lowercase English letters: "^[a-z]+$".
20. Match runs of characters other than % & ' , ; = ? $ and the double quote (\x22): "[^%&',;=?$\x22]+". Anchor it as "^[^%&',;=?$\x22]+$" to verify that a string contains none of them.
21. Only Chinese characters: "^[\u4e00-\u9fa5]{0,}$".
22. Validate a URL: "^http://([\w-]+\.)+[\w-]+(/[\w\-./?%&=]*)?$".
23. Validate a month (1 to 12): "^(0?[1-9]|1[0-2])$". Accepted forms: "01" to "09" and "1" to "12".
24. Validate a day of a 31-day month: "^((0?[1-9])|((1|2)[0-9])|30|31)$". Accepted forms: "01" to "09" and "1" to "31".
25. "^\d+$" // Nonnegative integer (positive integer + 0)
26. "^[0-9]*[1-9][0-9]*$" // Positive integer
27. "^((-\d+)|(0+))$" // Nonpositive integer (negative integer + 0)
28. "^-[0-9]*[1-9][0-9]*$" // Negative integer
29. "^-?\d+$" // Integer
30. "^\d+(\.\d+)?$" // Nonnegative floating-point number (positive floating-point number + 0)
31. "^(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*))$" // Positive floating-point number
32. "^((-\d+(\.\d+)?)|(0+(\.0+)?))$" // Nonpositive floating-point number (negative floating-point number + 0)
33. "^(-(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*)))$" // Negative floating-point number
34. "^(-?\d+)(\.\d+)?$" // Floating-point number
35. "^[A-Za-z]+$" // String of the 26 English letters
36. "^[A-Z]+$" // String of the 26 uppercase English letters
37. "^[a-z]+$" // String of the 26 lowercase English letters
38. "^[A-Za-z0-9]+$" // String of digits and the 26 English letters
39. "^\w+$" // String of digits, the 26 English letters, or underscores
40. "^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$" // Email address
41. "^[a-zA-Z]+://(\w+(-\w+)*)(\.(\w+(-\w+)*))*(\?\S*)?$" // URL
42. Extract hyperlinks (href attributes) from text: (h|H)(r|R)(e|E)(f|F) *= *('|")?(\w|\\|\/|\.)+('|"| *|>)?
43. Extract email addresses from text: \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
44. Extract image links (src attributes) from text: (s|S)(r|R)(c|C) *= *('|")?(\w|\\|\/|\.)+('|"| *|>)?
45. Extract IP addresses from text: (\d+)\.(\d+)\.(\d+)\.(\d+)
46. Extract Chinese mobile phone numbers from text: (86)*0*13\d{9}
47. Extract Chinese landline numbers from text: (\(\d{3,4}\)|\d{3,4}-|\s)?\d{8}
48. Extract Chinese phone numbers (mobile and landline) from text: (\(\d{3,4}\)|\d{3,4}-|\s)?\d{7,14}
49. Extract Chinese postal codes from text: [1-9]\d{5}
50. Extract Chinese resident identity card numbers from text: \d{18}|\d{15}
51. Extract integers from text: \d+
52. Extract floating-point numbers (decimals) from text: (-?\d*)\.?\d+
53. Extract any number from text: (-?\d*)(\.\d+)?
54. Extract Chinese strings from text: [\u4e00-\u9fa5]*
55. Extract double-byte strings (such as Chinese characters) from text: [^\x00-\xff]*
56. Extract English strings from text: \w*
57. Extract <script> blocks from text (greedy, so a match runs from the first <script to the last </script>): <script[\s\S]+</script *>
58. Strict date validation (yyyy-mm-dd with optional leading zeros on month and day, years 1600 to 9999, leap years handled):
^((((1[6-9]|[2-9]\d)\d{2})-(0?[13578]|1[02])-(0?[1-9]|[12]\d|3[01]))|(((1[6-9]|[2-9]\d)\d{2})-(0?[13456789]|1[012])-(0?[1-9]|[12]\d|30))|(((1[6-9]|[2-9]\d)\d{2})-0?2-(0?[1-9]|1\d|2[0-8]))|(((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))-0?2-29))$

59. Strict date and time validation (the date format of item 58, a space, then hh:mm:ss with hours 0 to 23):
^((((1[6-9]|[2-9]\d)\d{2})-(0?[13578]|1[02])-(0?[1-9]|[12]\d|3[01]))|(((1[6-9]|[2-9]\d)\d{2})-(0?[13456789]|1[012])-(0?[1-9]|[12]\d|30))|(((1[6-9]|[2-9]\d)\d{2})-0?2-(0?[1-9]|1\d|2[0-8]))|(((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))-0?2-29)) (20|21|22|23|[0-1]?\d):[0-5]?\d:[0-5]?\d$
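Several of the validation patterns above can be exercised as follows. This is a sketch assuming Python 3's `re` module; the `PATTERNS` keys and the `is_valid` helper are hypothetical names. It uses `re.fullmatch`, because Python's `$` also matches just before a trailing newline, and `re.ASCII`, because Python's `\d` and `\w` otherwise match non-ASCII digits and letters as well.

```python
import re

# Selected patterns from the list above, as Python raw strings.
PATTERNS = {
    "username": r"^[a-zA-Z]\w{5,15}$",   # item 1
    "id_number": r"^(\d{15}|\d{18})$",   # item 3
    "month": r"^(0?[1-9]|1[0-2])$",      # item 23
    "date": (                            # item 58, split across lines for readability
        r"^((((1[6-9]|[2-9]\d)\d{2})-(0?[13578]|1[02])-(0?[1-9]|[12]\d|3[01]))"
        r"|(((1[6-9]|[2-9]\d)\d{2})-(0?[13456789]|1[012])-(0?[1-9]|[12]\d|30))"
        r"|(((1[6-9]|[2-9]\d)\d{2})-0?2-(0?[1-9]|1\d|2[0-8]))"
        r"|(((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])"
        r"|((16|[2468][048]|[3579][26])00))-0?2-29))$"
    ),
}


def is_valid(kind: str, value: str) -> bool:
    """Return True if value matches the named pattern in full."""
    # re.ASCII limits \d and \w to ASCII; fullmatch rejects a trailing newline.
    return re.fullmatch(PATTERNS[kind], value, re.ASCII) is not None


# Grouping matters: without it, "^" binds only to the first alternative,
# so re.search accepts a 16-digit string by matching its first 15 digits.
loose = re.search(r"^\d{15}|\d{18}$", "1" * 16) is not None    # True
strict = re.search(r"^(\d{15}|\d{18})$", "1" * 16) is not None  # False

print(is_valid("date", "2024-02-29"), is_valid("date", "1900-02-29"))  # True False
```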

From the above we can see that "^" marks the start of the string and "$" marks the end. Note, however, that when "^" appears at the start of "[]", it means "not": for example, [^az] matches any character other than "a" and "z". "[]" matches exactly one of the characters it encloses. "{}" specifies a count or a range of counts: "{9}" means exactly 9 occurrences, while "{1,9}" means 1 to 9 occurrences.
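The points in the paragraph above can be confirmed quickly. This sketch assumes Python's `re` module and uses made-up sample strings.

```python
import re

# "^" and "$" anchor the match to the start and end of the string.
anchored = re.search(r"^\d{3}$", "123") is not None      # True
unanchored = re.search(r"\d{3}", "ab1234") is not None   # True: found inside the text
too_long = re.search(r"^\d{3}$", "1234") is not None     # False

# Inside brackets, a leading "^" negates the set: any character except a or z.
not_az = re.findall(r"[^az]", "amaze")                   # ['m', 'e']

# Braces give an exact count or a range of counts.
exact = re.fullmatch(r"\d{9}", "123456789") is not None  # True
ranged = [s for s in ["", "7", "123456789", "1234567890"]
          if re.fullmatch(r"\d{1,9}", s)]                # ['7', '123456789']
```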

Regular Expression Debugging Tool
/files/yasin/regextester.zip
