Four applications of regular expressions

Source: Internet
Author: User

I have not had time to write up my own summary, but I came across a good article today and would like to share it so that everyone can enjoy the power of regular expressions!
The text is as follows:
------------------------------------------------------------

Regular expressions provide an efficient and convenient way to match string patterns. Almost all high-level languages support regular expressions or provide ready-made libraries for them. This article describes regular expression techniques for common processing tasks in the ASP environment.

I. Verify the password and email address format

Our first example demonstrates a basic function of a regular expression: to describe any complex string in an abstract way. It means that regular expressions give programmers a formal string description method, which can describe any string pattern encountered by applications with little code. For example, for a person not engaged in technical work, the password format requirements can be described as follows: the first character of the password must be a letter, the password must be at least 4 characters and no more than 15 characters, the password cannot contain characters other than letters, numbers, and underscores.

As programmers, we must convert the natural-language description of the password format above into a form that the ASP page can understand and apply, so that invalid passwords are rejected. The regular expression describing the password format is ^[a-zA-Z]\w{3,14}$.

In ASP applications, we can write the password verification process as reusable functions, as shown below:

Function TestPassword(strPassword)
    Dim re
    Set re = New RegExp

    re.IgnoreCase = False
    re.Global = False
    re.Pattern = "^[a-zA-Z]\w{3,14}$"

    TestPassword = re.Test(strPassword)
End Function

Next we will compare the regular expression used to verify the password format with the natural language description:

The first character of the password must be a letter: the regular expression description is "^[a-zA-Z]", where "^" indicates the start of the string and the hyphen tells RegExp to match every character in the specified range.

The password must contain at least 4 characters and cannot exceed 15 characters: the regular expression description is "{3,14}".

The password cannot contain characters other than letters, numbers, and underscores: the regular expression description is "\w".

Note: {3,14} means that the preceding pattern must match at least 3 but no more than 14 characters (together with the first character, 4 to 15 characters in total). The syntax inside the curly braces is extremely strict: spaces are not allowed on either side of the comma. Adding a space changes the meaning of the regular expression and causes the password format check to fail. In addition, note that the regular expression above ends with the "$" character. "$" anchors the match to the end of the string, which ensures that a valid password is not followed by any other characters.
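The same check can be sketched outside ASP. The following is a minimal Python version, not the article's VBScript; the function name is_valid_password is an assumption made for illustration. Python's fullmatch plays the role of "^" and "$", and the re.ASCII flag restricts \w to letters, digits, and the underscore, as the description above requires.

```python
import re

# Same rules as the article's pattern ^[a-zA-Z]\w{3,14}$.
# re.ASCII limits \w to [a-zA-Z0-9_]; without it Python's \w also
# accepts non-ASCII letters and digits.
PASSWORD_RE = re.compile(r"[a-zA-Z]\w{3,14}", re.ASCII)


def is_valid_password(password: str) -> bool:
    """Return True if the password satisfies the three format rules."""
    # fullmatch requires the whole string to match, like ^...$ above.
    return PASSWORD_RE.fullmatch(password) is not None


if __name__ == "__main__":
    for candidate in ["abcd", "abc", "1abcd", "a_b_c_1", "a" * 16, "ab cd"]:
        print(repr(candidate), is_valid_password(candidate))
```

In Python, a space inside the braces does not raise an error either: "{3, 14}" is simply treated as literal text, so the pattern quietly stops matching valid passwords.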

Similar to the password format check, checking the validity of the email address is also a common problem. Using a regular expression for a simple email address check can be implemented as follows:

<%
Dim re
Set re = New RegExp

re.Pattern = "^\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,3}$"
Response.Write re.Test("aabb@yahoo.com")
%>
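To make the limits of this simple check visible, here is a small Python sketch using the same pattern (the helper name looks_like_email is an assumption, not part of the article). It accepts the sample address but rejects addresses with dots in the local part, subdomains, or top-level domains longer than three letters, so it should be read as a basic format check rather than a complete address validator.

```python
import re

# Transcription of the article's simple e-mail pattern.
# fullmatch stands in for the ^ and $ anchors.
EMAIL_RE = re.compile(r"\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,3}", re.ASCII)


def looks_like_email(address: str) -> bool:
    """Return True if the address fits the simple name@domain.tld shape."""
    return EMAIL_RE.fullmatch(address) is not None


if __name__ == "__main__":
    samples = [
        "aabb@yahoo.com",        # the article's sample address
        "aabb@yahoo",            # no top-level domain
        "first.last@yahoo.com",  # dot in the local part is not covered by \w
        "aabb@mail.yahoo.com",   # subdomains are not covered
        "aabb@yahoo.info",       # four-letter top-level domain
    ]
    for sample in samples:
        print(sample, looks_like_email(sample))
```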
------------------------------------------------------
II. Extract specific parts of an HTML page

The main difficulty in extracting content from an HTML page is finding a way to identify precisely the part we want. For example, the following HTML snippet displays a news headline:

<table border="0" width="11%" class="Somestory">
<tr>
<td width="100%">
<p align="center">other content...</td>
</tr>
</table>
<table border="0" width="11%" class="Headline">
<tr>
<td width="100%">
<p align="center">Iraq war!</td>
</tr>
</table>
<table border="0" width="11%" class="Someotherstory">
<tr>
<td width="100%">
<p align="center">other content...</td>
</tr>
</table>

By observing the code above, it is easy to see that the news headline is displayed in the middle table, whose class attribute is set to Headline. If the HTML page is very complex, you can use an add-on feature available for Microsoft IE 5.0 and later to view only the HTML code of the selected part of the page. Visit http://www.microsoft.com/windows/ie/webaccess/default.asp for details. In this example, we assume that this is the only table whose class attribute is set to Headline. Now we want to create a regular expression that finds the Headline table so that we can include it in our own page. First, write the code that sets up regular expression support:

<%
Dim re, strHTML
Set re = New RegExp    ' create a regular expression object

re.IgnoreCase = True
re.Global = False      ' end the search after the first match
%>

Next, consider the region we want to extract. Here we want the entire <table> structure, including its end tag and the text of the news headline. The search should therefore begin at the <table> start tag: re.Pattern = "<table.*(?=Headline)". This regular expression matches the start of the table tag and returns everything between it and "Headline" (line breaks excluded, because "." does not match them). The following code shows how to return the matching HTML:

' Put all matching HTML code into the Matches collection
Set Matches = re.Execute(strHTML)

' Display all matching HTML code
For Each Item In Matches
    Response.Write Item.Value
Next

' Display a single item
Response.Write Matches.Item(0).Value

Run this code against the HTML snippet shown above. The regular expression returns the following match: <table border="0" width="11%" class=". In the regular expression, "(?=Headline)" is a lookahead that does not consume characters, so the value of the table's class attribute is not part of the match. The pattern for getting the rest of the table is also quite simple: re.Pattern = "<table.*(?=Headline)(.|\n)*?</table>". The "*" after "(.|\n)" matches zero or more arbitrary characters, while the "?" makes the "*" match as few characters as possible before the next part of the expression is found. "</table>" is the end tag of the table.

The "?" prevents the expression from returning code from other tables. For example, if the "?" is removed from the pattern, the content returned for the HTML snippet above is:

<table border="0" width="11%" class="Headline">
<tr>
<td width="100%">
<p align="center">Iraq war!</td>
</tr>
</table>
<table border="0" width="11%" class="Someotherstory">
<tr>
<td width="100%">
<p align="center">other content...</td>
</tr>
</table>

The returned content contains not only the Headline table but also the Someotherstory table. As we can see, the "?" is indispensable.
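The difference between the lazy and greedy forms can be reproduced with a short Python sketch (Python rather than VBScript, so it runs outside an ASP host; the variable names are assumptions). It applies the lookahead-only pattern, the lazy pattern, and the greedy pattern to the three-table snippet.

```python
import re

HTML = """<table border="0" width="11%" class="Somestory">
<tr>
<td width="100%">
<p align="center">other content...</td>
</tr>
</table>
<table border="0" width="11%" class="Headline">
<tr>
<td width="100%">
<p align="center">Iraq war!</td>
</tr>
</table>
<table border="0" width="11%" class="Someotherstory">
<tr>
<td width="100%">
<p align="center">other content...</td>
</tr>
</table>
"""

# The lookahead (?=Headline) is checked but not consumed.
prefix_match = re.search(r"<table.*(?=Headline)", HTML, re.IGNORECASE).group(0)

# Lazy quantifier *? stops at the first </table> after the headline.
lazy_match = re.search(
    r"<table.*(?=Headline)(.|\n)*?</table>", HTML, re.IGNORECASE
).group(0)

# Greedy quantifier * runs on to the last </table> in the input.
greedy_match = re.search(
    r"<table.*(?=Headline)(.|\n)*</table>", HTML, re.IGNORECASE
).group(0)

if __name__ == "__main__":
    print(prefix_match)
    print("tables in lazy match:  ", lazy_match.count("<table"))
    print("tables in greedy match:", greedy_match.count("<table"))
```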

The assumptions made in this example are rather idealized. In practice, the situation is often much more complicated. Writing the ASP code is especially difficult when you have no control over how the source HTML is written. The most effective approach is to spend more time analyzing the HTML around the content to be extracted, and to test often to make sure that the extracted content is exactly what you need. In addition, you should detect and handle the case where the regular expression matches nothing on the source HTML page. Content can change very quickly. Do not let your pages fail with trivial, embarrassing errors just because someone else changed the format of their content.
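One way to handle the no-match case is sketched below in Python. The function names extract_headline and render_headline and the fallback markup are hypothetical, chosen only for illustration; the point is that the caller receives a defined value when the source page no longer contains a Headline table.

```python
import re
from typing import Optional

HEADLINE_RE = re.compile(r"<table.*(?=Headline)(.|\n)*?</table>", re.IGNORECASE)


def extract_headline(html: str) -> Optional[str]:
    """Return the Headline table, or None if the page has no such table."""
    match = HEADLINE_RE.search(html)
    return match.group(0) if match else None


def render_headline(html: str, fallback: str = "<p>Headline unavailable.</p>") -> str:
    """Return the extracted table, or fallback markup when nothing matches."""
    table = extract_headline(html)
    return table if table is not None else fallback


if __name__ == "__main__":
    page = '<table class="Headline"><tr><td>Iraq war!</td></tr></table>'
    print(render_headline(page))
    print(render_headline("<html><body>layout changed</body></html>"))
```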
----------------------------------------------------
III. Parsing text data files

There are many formats and types of data files. XML documents, structured text, and even unstructured text are often the data sources of ASP applications. The following example looks at a structured text file that uses a qualifier. A qualifier (such as quotation marks) indicates that the text it encloses must be treated as a single unit, even if that text contains the separator that divides records into fields.

The following is a simple structured text file:

Surname, name, phone number, description
Sun, Wukong, 312 555 5656, ASP is good
Pig, Bajie, 847 555 5656, I am a movie producer

This file is very simple. Its first line is the title, and the following two lines are records separated by commas. It is easy to parse this file. You only need to split the file into lines (based on the line feed symbol), and then split each record by field. However, if we add a comma in the content of a field:

Surname, name, phone number, description
Sun, Wukong, 312 555 5656, I like ASP, and VB and SQL
Pig, Bajie, 847 555 5656, I am a movie producer

A problem occurs when parsing the first record: to a parser that recognizes only the comma separator, its last field appears to contain two fields. To avoid such problems, fields that contain the separator must be enclosed in a qualifier. Single quotes are a common qualifier. After adding single-quote qualifiers to the text file above, its content is as follows:

Surname, name, phone number, description
Sun, Wukong, 312 555 5656, 'I like ASP, and VB and SQL'
Pig, Bajie, 847 555 5656, 'I am a movie producer'

Now we can determine which commas are separators and which are field content: a comma inside quotation marks is field content. The next step is to implement a regular expression parser that decides when a comma separates fields and when it is part of a field.

The problem here is slightly different from that of most regular expressions. We usually look at a small part of the text to see if it can match the regular expression. But here, we can reliably determine which content is enclosed in quotation marks only after the entire line of text is taken into account.

The following example illustrates the problem. Take an arbitrary fragment of a line from a text file: 1, beach, black, 21, ', dog, cat, duck,',. Because there is other data to the left of "1", it is extremely difficult to parse this fragment. We do not know how many single quotes precede it, so we cannot tell which characters lie inside quotation marks (text inside quotation marks must not be split during parsing). If the data before the fragment contains an even number of single quotes (or none), then "', dog, cat, duck,'" is a quoted string and cannot be split. If the number of preceding quotation marks is odd, then "1, beach, black, 21, '" is the tail of a quoted string and cannot be split.
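The parity argument can be checked with a few lines of Python. This is only an illustrative sketch: the helper comma_is_separator is hypothetical, and it counts the quotes before a comma rather than after it, which gives the same answer whenever the quotes on the line are balanced.

```python
def comma_is_separator(line: str, index: int) -> bool:
    """Return True if the comma at `index` lies outside single-quoted text."""
    if line[index] != ",":
        raise ValueError("index must point at a comma")
    # An even number of quotes before the comma means every quoted string
    # opened earlier on the line has already been closed.
    return line[:index].count("'") % 2 == 0


FRAGMENT = "1, beach, black, 21, ', dog, cat, duck,',"

if __name__ == "__main__":
    for prefix in ["0, ", "'0, "]:  # even vs. odd number of preceding quotes
        line = prefix + FRAGMENT
        first_comma = len(prefix) + 1  # the comma right after "1"
        print(repr(prefix), comma_is_separator(line, first_comma))
```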

Therefore, the regular expression must analyze the entire line of text and take the number of quotation marks into account to determine whether a character is inside or outside a pair of quotation marks. The expression is: ,(?=([^']*'[^']*')*(?![^']*')). It first finds a comma, then looks ahead to make sure that the number of single quotation marks after the comma is either even or zero. It relies on the following observation: if the number of single quotes following the comma is even, the comma lies outside any quoted string. The following table describes each part in more detail:

,          Find a comma
(?=        Look ahead and match the following pattern:
(          Start a new group
[^']*'     Zero or more non-quote characters, followed by a quotation mark
[^']*'     Zero or more non-quote characters, followed by a quotation mark; together with the preceding part, this matches a pair of quotation marks
)*         End the group and match the whole group (a pair of quotation marks) zero or more times
(?!        Negative lookahead: exclude the following pattern
[^']*'     Zero or more non-quote characters, followed by a quotation mark
)          End the negative lookahead
)          End the lookahead

The following is a VBScript function that accepts a string parameter. It splits the string based on the comma separator and single quotation mark qualifier in the string and returns an array of results:

Function SplitAdv(strInput)
    Dim objRE
    Set objRE = New RegExp

    ' Set up the RegExp object
    objRE.IgnoreCase = True
    objRE.Global = True
    objRE.Pattern = ",(?=([^']*'[^']*')*(?![^']*'))"

    ' The Replace method substitutes Chr(8) for each separator comma.
    ' Chr(8) is the backspace character (\b), which is very unlikely
    ' to appear in the input string.
    ' We then split the string on Chr(8) and return the resulting array.
    SplitAdv = Split(objRE.Replace(strInput, Chr(8)), Chr(8))
End Function
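For comparison, the same idea can be sketched in Python, where re.split can split on the separator commas directly, so no placeholder character is needed. The name split_adv mirrors the VBScript function but is otherwise an assumption. The inner group is written as non-capturing, (?:...), because Python's re.split would otherwise insert the captured text into the result list.

```python
import re
from typing import List

# A comma followed by an even number (or zero) of single quotes.
SEPARATOR_RE = re.compile(r",(?=(?:[^']*'[^']*')*(?![^']*'))")


def split_adv(line: str) -> List[str]:
    """Split on commas that lie outside single-quoted text."""
    return SEPARATOR_RE.split(line)


if __name__ == "__main__":
    record = "Sun, Wukong, 312 555 5656, 'I like ASP, and VB and SQL'"
    for field in split_adv(record):
        print(repr(field))
```

Fields keep their leading spaces and quotes; stripping them is left to the caller. Note also that an apostrophe inside a field (as in I'm) counts as a quotation mark and upsets the even/odd count, so such data needs escaping or a different qualifier.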

To sum up, parsing text data files with regular expressions is efficient and saves time: it spares you much of the work of analyzing files and extracting useful data under complex conditions. Even in a rapidly changing environment, a great deal of legacy data remains in use, and knowing how to build efficient data-parsing routines is a valuable skill.
-------------------------------------------------------------------
IV. String replacement

In the last example, we look at the replacement feature of VBScript regular expressions. ASP is often used to dynamically format text obtained from various data sources. With VBScript regular expressions, ASP can dynamically modify text that matches complex patterns. Highlighting certain words by wrapping them in HTML tags is a common application, for example highlighting search keywords in search results.

To illustrate the approach, let's look at an example that highlights every ".NET" name in a string. The string could come from anywhere, such as a database or another Web site.

<%
Set regEx = New RegExp
regEx.Global = True
regEx.IgnoreCase = True

' Regular expression pattern:
' look for any word or URL ending with ".NET"
regEx.Pattern = "(\b[a-zA-Z\._]+?\.NET\b)"

' String used to test the replacement
strText = "Microsoft has established a new website www.ASP.NET."

' Call the Replace method of the regular expression object.
' $1 inserts the matched text at that position.
Response.Write regEx.Replace(strText, _
    "<b style='color: #000099; font-size: 18pt'>$1</b>")
%>

Note a few important points in this example. The entire regular expression is placed in a pair of parentheses, which capture the matched content for later use. The captured content is referenced by $1 in the replacement text. Up to nine such captures can be used, referenced by $1 through $9. The Replace method of the regular expression object differs from VBScript's Replace function: it takes only two parameters, the text to search and the replacement text.

In this example, to highlight the ".NET" strings that were found, we wrap them in bold tags with additional style attributes. Using this search-and-replace technique, we can easily add keyword highlighting to a site's search feature, or automatically turn keywords on a page into links to other pages.
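The same replacement can be sketched in Python, assuming a hypothetical helper named highlight. The pattern is unchanged; the main difference is that Python's re.sub refers to the first captured group as \1 instead of $1.

```python
import re

# Any word or URL ending with ".NET", captured as group 1.
NET_RE = re.compile(r"(\b[a-zA-Z._]+?\.NET\b)", re.IGNORECASE)


def highlight(text: str) -> str:
    """Wrap every match in a styled <b> tag; \\1 is the captured text."""
    return NET_RE.sub(r"<b style='color: #000099; font-size: 18pt'>\1</b>", text)


if __name__ == "__main__":
    print(highlight("Microsoft has established a new website www.ASP.NET."))
```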

Conclusion

I hope the regular expression techniques described in this article give you ideas about when and how to apply regular expressions. Although the examples here are written in VBScript, regular expressions are also useful in ASP.NET. They are one of the main mechanisms behind the server-side form validation controls, and they are exposed to the entire .NET Framework through the System.Text.RegularExpressions namespace.
