The application of regular expressions


The early origins of regular expressions

The "ancestors" of regular expressions can be traced back to early studies of how the human nervous system works. Warren McCulloch and Walter Pitts developed a mathematical way of describing these neural networks.

In 1956, the American mathematician Stephen Kleene, building on the earlier work of McCulloch and Pitts, published a paper entitled "Representation of Events in Nerve Nets and Finite Automata", which introduced the concept of regular expressions. Regular expressions describe what he called "the algebra of regular sets", hence the term "regular expression".

This work was later applied by Ken Thompson, one of the principal creators of Unix, in his early research on computational search algorithms. The first practical application of regular expressions was Thompson's QED text editor, a forerunner of the Unix editor ed.

As they say, the rest is history. Since then, regular expressions have been an important part of text-based editors and search tools.

Using regular expressions

In a typical search-and-replace operation, you must supply the exact text you want to find. That may be sufficient for simple search-and-replace tasks in static text, but it lacks flexibility, which makes searching dynamic text difficult or even impossible.

Using regular expressions, you can:

    • Test for a pattern within a string. For example, you can test an input string to see whether it contains a phone number pattern or a credit card number pattern. This is known as data validation.
    • Replace text. You can use a regular expression to identify specific text in a document and then either delete it entirely or replace it with other text.
    • Extract a substring from a string based on a pattern match. This can be used to find specific text within a document or in input fields.
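The three uses above can be sketched in JavaScript, the modern descendant of JScript, using its built-in RegExp. The phone-number format (three digits, three digits, four digits, separated by hyphens) and the function names are assumptions made for illustration, not a real validation rule.

```javascript
// Assumed phone format for illustration only: NNN-NNN-NNNN.
// The non-global pattern is used with test(), which would otherwise keep
// state in lastIndex between calls; the global one is used for replace/match.
const phonePattern = /\b\d{3}-\d{3}-\d{4}\b/;
const phonePatternAll = /\b\d{3}-\d{3}-\d{4}\b/g;

// 1. Test (data validation): does the text contain a phone number?
function containsPhoneNumber(text) {
  return phonePattern.test(text);
}

// 2. Replace: mask every phone number in the text.
function maskPhoneNumbers(text) {
  return text.replace(phonePatternAll, "XXX-XXX-XXXX");
}

// 3. Extract: return every phone number found, or an empty array.
function extractPhoneNumbers(text) {
  return text.match(phonePatternAll) || [];
}

const sample = "Call 555-123-4567 or 555-987-6543 before Friday.";
console.log(containsPhoneNumber(sample)); // true
console.log(maskPhoneNumbers(sample));    // Call XXX-XXX-XXXX or XXX-XXX-XXXX before Friday.
console.log(extractPhoneNumbers(sample)); // [ '555-123-4567', '555-987-6543' ]
```

A real validator would need a pattern matched to the formats it actually accepts; this one deliberately ignores country codes and parentheses.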

For example, suppose you need to search an entire Web site, remove some outdated material, and replace some HTML formatting tags. You can use a regular expression to test each file for the material or tags you are looking for. This narrows the set of affected files to those that contain the material you want to remove or change. You can then use a regular expression to remove the outdated material and, finally, use regular expressions again to find and replace the tags that need replacing.
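The workflow just described (test each file, narrow the set, then replace) might look like this in JavaScript. The in-memory `site` object is a hypothetical stand-in for files on disk, and rewriting `<font>` tags as `<span>` tags is an assumed example of "HTML formatting tags to replace".

```javascript
// Hypothetical site contents: file name -> HTML text. A real tool would read
// these from disk; an in-memory object keeps the sketch self-contained.
const site = {
  "index.html": '<p><font color="red">Sale!</font> Welcome.</p>',
  "about.html": "<p>About us.</p>",
  "news.html": '<font size="2">Old news</font>',
};

// Step 1: a test pattern that flags files containing an opening <font> tag.
const fontTag = /<font\b[^>]*>/i;

// Step 2: narrow the work to the files that actually contain the tag.
const affected = Object.keys(site).filter((name) => fontTag.test(site[name]));

// Step 3: in the affected files only, replace the <font ...> and </font>
// tags with <span> and </span> (an assumed replacement target).
for (const name of affected) {
  site[name] = site[name]
    .replace(/<font\b[^>]*>/gi, "<span>")
    .replace(/<\/font>/gi, "</span>");
}

console.log(affected);          // [ 'index.html', 'news.html' ]
console.log(site["news.html"]); // <span>Old news</span>
```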

Regular expressions are also useful in languages whose built-in string handling is limited. VBScript, a subset of Visual Basic, has a rich set of string-handling functions; JScript, like C, does not. Regular expressions therefore noticeably improve JScript's string handling. They can also be more efficient in VBScript, because a single expression can perform several string operations at once.
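Early JScript had no built-in trim method (String.prototype.trim arrived with ECMAScript 5), so trimming meant index arithmetic. The following JavaScript sketch contrasts a manual trim with a single regular expression that removes leading and trailing whitespace in one call. The function names are hypothetical, and `\s` covers more whitespace characters than the four checked by hand.

```javascript
// Manual approach: scan inward from both ends, one character at a time.
function trimManually(s) {
  const isSpace = (ch) => ch === " " || ch === "\t" || ch === "\n" || ch === "\r";
  let start = 0;
  let end = s.length;
  while (start < end && isSpace(s[start])) start++;
  while (end > start && isSpace(s[end - 1])) end--;
  return s.slice(start, end);
}

// Regular-expression approach: the alternation removes leading (^\s+) or
// trailing (\s+$) whitespace, and the g flag lets both alternatives apply
// in a single replace call.
function trimWithRegex(s) {
  return s.replace(/^\s+|\s+$/g, "");
}

// One more expression collapses each inner run of whitespace to one space.
function normalizeSpaces(s) {
  return trimWithRegex(s).replace(/\s+/g, " ");
}

const input = "  hello \t world \n";
console.log(trimManually(input) === trimWithRegex(input)); // true
console.log(normalizeSpaces(input));                       // hello world
```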

Regular expression syntax

A regular expression is a text pattern made up of ordinary characters (such as the letters a through z) and special characters called metacharacters. The pattern describes one or more strings to match when searching a body of text. The regular expression serves as a template that is matched against the string being searched.

Here are some examples of regular expressions that you might encounter:

JScript              VBScript             Matches
/^[ \t]*$/           "^[ \t]*$"           A blank line (empty, or containing only spaces and tabs).
/\d{2}-\d{5}/        "\d{2}-\d{5}"        An ID number consisting of 2 digits, a hyphen, and 5 digits.
/<(.*)>.*<\/\1>/     "<(.*)>.*<\/\1>"     An HTML element with matching opening and closing tags.
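The three table patterns can be tried as JavaScript regular-expression literals (the JScript column). The sample inputs are made up for illustration.

```javascript
// The three example patterns from the table.
const blankLine = /^[ \t]*$/;     // only spaces or tabs, or nothing at all
const idNumber = /\d{2}-\d{5}/;   // 2 digits, a hyphen, 5 digits
const tagPair = /<(.*)>.*<\/\1>/; // opening tag, content, matching closing tag

console.log(blankLine.test(" \t "));       // true
console.log(blankLine.test("  text "));    // false
console.log(idNumber.test("ID 12-34567")); // true
console.log(idNumber.test("ID 123-4567")); // false (only 4 digits after the hyphen)

// Group 1 captures the tag name, which \1 then requires in the closing tag.
console.log("<b>bold</b>".match(tagPair)[1]); // b
```

Note the limitation of the tag pattern: because `\1` must repeat the entire text of the opening tag, an element with attributes, such as `<p class=x>hi</p>`, does not match.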

The following table lists the metacharacters and their behavior in regular expressions:

Character  Description
\  Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character "n", '\n' matches a newline character, the sequence '\\' matches "\", and '\(' matches "(".
^  Matches the position at the beginning of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after '\n' or '\r'.
$  Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'.
*  Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+  Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo" but not "z". + is equivalent to {1,}.
?  Matches the preceding subexpression zero times or once. For example, 'do(es)?' matches "do" or "does". ? is equivalent to {0,1}.
{n}  n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the "o" in "Bob" but matches the two o's in "food".
{n,}  n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the "o" in "Bob" but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'.
{n,m}  m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the numbers.
?  When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
.  Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]'.
(pattern)  Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the RegExp.$1 through RegExp.$9 properties in JScript. To match the parenthesis characters themselves, use '\(' or '\)'.
(?:pattern)  Matches pattern but does not capture the match; that is, it is a non-capturing match that is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.
(?=pattern)  Positive lookahead: matches the search string at any point where a string matching pattern begins. This is a non-capturing match; it is not stored for later use. For example, 'Windows(?=95|98|NT|2000)' matches "Windows" in "Windows2000" but not "Windows" in "Windows3.1". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters examined by the lookahead.
(?!pattern)  Negative lookahead: matches the search string at any point where a string that does not match pattern begins. This is a non-capturing match; it is not stored for later use. For example, 'Windows(?!95|98|NT|2000)' matches "Windows" in "Windows3.1" but not "Windows" in "Windows2000". Like a positive lookahead, it does not consume characters.
x|y  Matches either x or y. For example, 'z|food' matches "z" or "food", and '(z|f)ood' matches "zood" or "food".
[xyz]  Character set. Matches any one of the enclosed characters. For example, '[abc]' matches the "a" in "plain".
[^xyz]  Negated character set. Matches any character that is not enclosed. For example, '[^abc]' matches the "p" in "plain".
[a-z]  Character range. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter from "a" through "z".
[^a-z]  Negated character range. Matches any character that is not in the specified range. For example, '[^a-z]' matches any character outside the range "a" through "z".
\b  Matches a word boundary, that is, the position between a word character and a non-word character such as a space. For example, 'er\b' matches the "er" in "never" but not the "er" in "verb".
\B  Matches a non-word boundary. For example, 'er\B' matches the "er" in "verb" but not the "er" in "never".
\cx  Matches the control character indicated by x. For example, '\cM' matches a Control-M, which is a carriage return. The value of x must be in the range A-Z or a-z; otherwise, c is treated as a literal "c" character.
\d  Matches a digit character. Equivalent to [0-9].
\D  Matches a non-digit character. Equivalent to [^0-9].
\f  Matches a form feed character. Equivalent to \x0c and \cL.
\n  Matches a line feed (newline) character. Equivalent to \x0a and \cJ.
\r  Matches a carriage return character. Equivalent to \x0d and \cM.
\s  Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].
\S  Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t  Matches a tab character. Equivalent to \x09 and \cI.
\v  Matches a vertical tab character. Equivalent to \x0b and \cK.
\w  Matches any word character, including underscore. Equivalent to '[A-Za-z0-9_]'.
\W  Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn  Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, '\x41' matches "A", and '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
\num  Matches num, where num is a positive integer: a backreference to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n  (Here n is a digit.) Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm  Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captured subexpressions, n is a backreference followed by the literal m. If neither condition holds and n and m are octal digits (0-7), \nm matches the octal escape value nm.
\nml  Matches the octal escape value nml when n is an octal digit (0-3) and m and l are both octal digits (0-7).
\un  Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, '\u00A9' matches the copyright symbol (©).
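A selection of rows from the table can be checked in JavaScript, whose regular-expression syntax closely follows JScript's. Each entry pairs an actual result with the value the table predicts, and the flag `m` stands in for the Multiline property. This is a sketch for experimentation; older JScript or VBScript engines may differ in edge cases.

```javascript
// Each entry: [description, actual result, result predicted by the table].
const checks = [
  // Quantifiers
  ['zo* matches "z"', /^zo*$/.test("z"), true],
  ['zo+ rejects "z"', /^zo+$/.test("z"), false],
  ['do(es)? matches "does"', /^do(es)?$/.test("does"), true],
  ['o{1,3} in "fooooood"', "fooooood".match(/o{1,3}/)[0], "ooo"],
  // Greedy versus non-greedy
  ['greedy o+ in "oooo"', "oooo".match(/o+/)[0], "oooo"],
  ['non-greedy o+? in "oooo"', "oooo".match(/o+?/)[0], "o"],
  // Capturing and non-capturing groups
  ["$2, $1 swaps captured words", "John Smith".replace(/(\w+)\s(\w+)/, "$2, $1"), "Smith, John"],
  ["industr(?:y|ies)", /^industr(?:y|ies)$/.test("industries"), true],
  // Lookahead: the version number is checked but not consumed
  ["(?=...) match text", "Windows2000".match(/Windows(?=95|98|NT|2000)/)[0], "Windows"],
  ["(?=...) rejects Windows3.1", /Windows(?=95|98|NT|2000)/.test("Windows3.1"), false],
  ["(?!...) accepts Windows3.1", /Windows(?!95|98|NT|2000)/.test("Windows3.1"), true],
  // Word boundaries
  ['er\\b in "never"', /er\b/.test("never"), true],
  ['er\\b in "verb"', /er\b/.test("verb"), false],
  ['er\\B in "verb"', /er\B/.test("verb"), true],
  // Backreference, escapes, and the dot
  ['(.)\\1 in "hello"', "hello".match(/(.)\1/)[0], "ll"],
  ['\\x41 matches "A"', /\x41/.test("A"), true],
  ["\\cM matches carriage return", /\cM/.test("\r"), true],
  ["\\u00A9 matches the copyright sign", /\u00A9/.test("\u00A9"), true],
  [". rejects newline", /^.$/.test("\n"), false],
  ["[\\s\\S] accepts newline", /^[\s\S]$/.test("\n"), true],
  // ^ and the m flag (JavaScript's counterpart of Multiline)
  ["^b without m", /^b/.test("a\nb"), false],
  ["^b with m", /^b/m.test("a\nb"), true],
];

for (const [label, actual, expected] of checks) {
  console.log(actual === expected ? "ok      " : "MISMATCH", label, "->", actual);
}
```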

An example of the application of regular expressions

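  • Password validation
    Function TestPassword(strPassword)
        Dim re
        Set re = New RegExp

        ' Letters are case-sensitive; only one match is needed.
        re.IgnoreCase = False
        re.Global = False
        ' A letter, then 3 to 14 word characters, anchored at both ends.
        re.Pattern = "^[a-zA-Z]\w{3,14}$"

        TestPassword = re.Test(strPassword)
    End Function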
    Compare the regular-expression and plain-language descriptions of this password format:

    The first character of the password must be a letter: the regular expression is "^[a-zA-Z]", where "^" denotes the beginning of the string and the hyphen tells RegExp to match every character in the specified range.

    The password must be at least 4 and at most 15 characters long: the regular expression is "{3,14}".

    The password cannot contain characters other than letters, digits, and underscores: the regular expression is "\w".

    A few notes: {3,14} means the preceding pattern matches at least 3 and at most 14 characters (which, together with the first character, gives 4 to 15 characters). The syntax inside the curly braces is very strict and does not allow spaces on either side of the comma. Adding a space changes the meaning of the regular expression and breaks the password validation. Also note the "$" character at the end of the regular expression: it anchors the match to the end of the string, ensuring that no extra characters follow a valid password.
    Practical application: Response.Write TestPassword("123") outputs False, while Response.Write TestPassword("abc123a") outputs True.
  • Other examples
    <%
    strObj = "<a href=""http://www.csdn.net"" target=_blank>csdn</a>"
    Set regEx = New RegExp ' Create a regular expression.
    regEx.Pattern = ".+href=""([^""]+?)"".+" ' Set the pattern; group 1 captures the href value.
    regEx.IgnoreCase = True
    regEx.Global = True
    Response.Write regEx.Replace(strObj, "$1") ' Replace the whole string with the captured URL.
    %>
    <%
    strObj = "<font color=""#003399"">Western Restaurant</font>"
    Set regEx = New RegExp ' Create a regular expression.
    regEx.Pattern = "<font[^<>]*>([^<>]*)</font>" ' Set the pattern; group 1 captures the text inside the tag.
    regEx.IgnoreCase = True
    regEx.Global = True
    Response.Write regEx.Replace(strObj, "$1") ' Keep only the captured text.
    %>

    The first example extracts the URL (http://www.csdn.net); the second extracts the text "Western Restaurant".
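For readers without an ASP server, the password check and the two extraction examples can be ported to JavaScript roughly as follows. The function names testPassword, extractHref, and extractFontText are hypothetical; the patterns are the ones used in the VBScript code, with VBScript's doubled quotes ("") written as single quote characters.

```javascript
// Port of TestPassword: a letter followed by 3 to 14 word characters,
// anchored at both ends (4 to 15 characters in total).
function testPassword(password) {
  return /^[a-zA-Z]\w{3,14}$/.test(password);
}

// Port of the first ASP example: capture the href value and replace the
// whole string with it ($1). Returns the input unchanged if nothing matches.
function extractHref(html) {
  return html.replace(/.+href="([^"]+?)".+/i, "$1");
}

// Port of the second example: keep only the text inside a <font> element.
function extractFontText(html) {
  return html.replace(/<font[^<>]*>([^<>]*)<\/font>/gi, "$1");
}

console.log(testPassword("123"));     // false
console.log(testPassword("abc123a")); // true
console.log(extractHref('<a href="http://www.csdn.net" target=_blank>csdn</a>'));
// http://www.csdn.net
console.log(extractFontText('<font color="#003399">Western Restaurant</font>'));
// Western Restaurant
```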

