.NET introduces Windows developers to the world of regular expressions

Source: Internet
Author: User
A regular expression (regex) is a pattern used to describe pieces of text. The regular expression engine applies this pattern to the source text. Where the source text comes from is irrelevant: it can be a text file, the HTML source of a web page, or even a column in a database table.

You need only a few tokens to describe complex patterns, and you can also do useful things with the matches, such as simple arithmetic (for example, counting the number of times a pattern occurs).
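
As a minimal sketch of the counting idea, the following VB.NET console module counts how often a pattern occurs in a string. The module name CountDemo, the helper CountMatches, and the sample sentence are assumptions made for illustration; only Regex.Matches and its Count property come from the .NET library.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CountDemo
    ' Returns how many times the pattern occurs in the source text.
    Function CountMatches(source As String, pattern As String) As Integer
        Return Regex.Matches(source, pattern).Count
    End Function

    Sub Main()
        Dim sample As String = "the cat sat on the mat near the door"
        Console.WriteLine(CountMatches(sample, "the"))  ' prints 3
        Console.WriteLine(CountMatches(sample, "\w+"))  ' prints 9 (one match per word)
    End Sub
End Module
```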

Regular Expression Application

Regular expressions have so many applications that they cannot all be listed in this article. Here are a few that show what regular expressions can do:

  • Verify the validity of user data. User data (such as phone numbers, credit card numbers, and email addresses) can come from websites or local applications.
  • In a text file such as the complete works of Shakespeare, search for words (of at least 10 characters, or any length you specify). You can also search for "love" and "horse" (case-insensitively) and report where and how often they occur.
  • Extract important data, such as football schedules published by various websites, and store it in your own database.
  • For a website search engine, maintain an intelligent front end that writes the regular expression patterns, and hand all the remaining work to .NET for processing.

Pattern Recognition

A pattern is an algebraic expression of a character sequence. Patterns are composable: a pattern can contain a series of other patterns, and it can contain nested subpatterns. Table A lists the tokens commonly used in regular expressions.

 
Table A: Common regular expression tokens

Token  Description
.      Matches any single character (for example, "w.e" matches in "Brawley", where "." matches the single character "l").
\b     Matches at a word boundary, whatever the neighboring characters are (it does not matter whether the delimiter is a space, a tab, or a comma).
\w     Matches any word character (a-z, A-Z, 0-9, and the underscore).
\W     Matches any non-word character (anything other than a-z, A-Z, 0-9, and the underscore).
\d     Matches any digit.
\D     Matches any non-digit.
\.     Matches a literal period (because the period has a special meaning, it must be escaped to be matched literally).
\s     Matches any white-space character (a tab or a space; we don't care which).
\S     Matches any non-white-space character.
^      Marks the start of a string or line.
$      Marks the end of a string or line.
*      Matches zero or more occurrences of the preceding token.
?      Matches zero or one occurrence of the preceding token.
+      Matches one or more occurrences of the preceding token (for example, "\w+" matches any word).
[]     Defines a set or range, such as [A-Z], which matches any uppercase letter.
|      OR operator; separates alternative values of interest (for example, "ABC|BCD|DEF" matches any one of the three sequences).
()     Groups part of a pattern; commonly used together with "|" to delimit the alternatives.
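
To make a few of these tokens concrete, here is a small sketch that combines the anchors, the digit class, the escaped period, and the word boundary. The module name TokenDemo, both helper functions, and the sample strings are hypothetical; Regex.IsMatch and Regex.Match are the standard .NET calls.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TokenDemo
    ' True when the text looks like a price such as "12.50":
    ' ^ and $ anchor the match, \d+ is one or more digits, \. is a literal period.
    Function IsPrice(text As String) As Boolean
        Return Regex.IsMatch(text, "^\d+\.\d\d$")
    End Function

    ' Returns the first word that starts with an uppercase letter, or "" if there is none.
    Function FirstCapitalizedWord(text As String) As String
        Return Regex.Match(text, "\b[A-Z]\w*\b").Value
    End Function

    Sub Main()
        Console.WriteLine(IsPrice("12.50"))   ' True
        Console.WriteLine(IsPrice("12x50"))   ' False: "x" is not a literal period
        Console.WriteLine(FirstCapitalizedWord("visit Brawley today"))  ' Brawley
    End Sub
End Module
```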

Regular Expression options

You can use various options to change the behavior of a regular expression. One of them controls how the expression treats line breaks in the input. By default, ^ and $ match only at the beginning and end of the entire input string. In multiline mode they match at the beginning and end of every line, which is usually what you want when you process text files, for example files exported from a database in which each line holds one record and the fields are delimited by commas, tabs, or quotation marks. Table B lists several options that are commonly used in .NET.

 
Table B: Common regular expression options

Option                   Description
None                     Specifies that no options are set.
IgnoreCase               Specifies case-insensitive matching.
Multiline                Specifies multiline mode. Changes the meaning of ^ and $ so that they match at the beginning and end of any line, rather than only at the beginning and end of the entire string.
ExplicitCapture          Specifies that the only valid captures are explicitly named or numbered groups of the form (?<name>...). This allows plain parentheses to act as noncapturing groups without the clumsy (?:...) syntax.
Compiled                 Specifies that the regular expression is compiled to an assembly. Generates Microsoft intermediate language (MSIL) code for the regular expression, which makes execution faster at the cost of startup time.
Singleline               Specifies single-line mode. Changes the meaning of the period character (.) so that it matches every character (instead of every character except \n).
IgnorePatternWhitespace  Removes unescaped white space from the pattern and enables comments that begin with #. (See the white-space characters listed under character escapes.) Note that white space is never removed from within a character class.
RightToLeft              Specifies that the search runs from right to left instead of from left to right. A regular expression with this option moves to the left of the starting position rather than to the right (so the starting position should be the end of the string, not the beginning). To prevent regular expressions from falling into infinite loops, this option cannot be specified midstream. However, the lookbehind constructs (?<...) provide something similar that can be used in a subpattern.
ECMAScript               Enables ECMAScript-compatible behavior for the regular expression. This option can be combined only with a small set of other flags, including IgnoreCase and Multiline; using it with an unsupported flag causes an exception.
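
The effect of the Multiline and IgnoreCase options can be sketched as follows. The module name OptionsDemo, the helper CountLineStarts, and the three-line sample text are assumptions for illustration; the RegexOptions values and the static Regex methods are part of .NET.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module OptionsDemo
    ' Counts matches of ^\w+ : a word at the start of the string,
    ' or at the start of every line when Multiline is set.
    Function CountLineStarts(text As String, options As RegexOptions) As Integer
        Return Regex.Matches(text, "^\w+", options).Count
    End Function

    Sub Main()
        Dim sample As String = "alpha one" & vbLf & "beta two" & vbLf & "gamma three"
        Console.WriteLine(CountLineStarts(sample, RegexOptions.None))       ' prints 1
        Console.WriteLine(CountLineStarts(sample, RegexOptions.Multiline))  ' prints 3
        ' IgnoreCase makes "love" match "LOVE".
        Console.WriteLine(Regex.IsMatch("LOVE", "love", RegexOptions.IgnoreCase))  ' True
    End Sub
End Module
```

Options are bit flags, so several of them can be combined with the Or operator, for example RegexOptions.Multiline Or RegexOptions.IgnoreCase.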

For example, the pattern "m[aeiouy]" matches the character "m" followed by any vowel (a, e, i, o, u, or y), while the pattern "m[^aeiouy]" matches "m" followed by any character that is not a vowel.

Sets and character sets

Sets and character sets are written with syntax such as [A-Za-z], a pattern that covers all English letters. You can think of a set as a collection of similar characters. For example, you can write:
 
[A-Za-z]
 
This pattern matches any uppercase or lowercase letter and rejects digits and non-printable characters.
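
A short sketch of these sets in use, assuming a hypothetical module CharClassDemo with made-up helper names and sample words:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CharClassDemo
    ' True when the word contains "m" followed by a vowel (a, e, i, o, u, or y).
    Function HasMThenVowel(word As String) As Boolean
        Return Regex.IsMatch(word, "m[aeiouy]")
    End Function

    ' True when the word contains "m" followed by a character that is not a vowel.
    Function HasMThenNonVowel(word As String) As Boolean
        Return Regex.IsMatch(word, "m[^aeiouy]")
    End Function

    ' Keeps only the characters in [A-Za-z] by deleting everything outside the set.
    Function LettersOnly(text As String) As String
        Return Regex.Replace(text, "[^A-Za-z]", "")
    End Function

    Sub Main()
        Console.WriteLine(HasMThenVowel("map"))      ' True
        Console.WriteLine(HasMThenNonVowel("sms"))   ' True
        Console.WriteLine(HasMThenNonVowel("sum"))   ' False: nothing follows the "m"
        Console.WriteLine(LettersOnly("Room 101, Level-B"))  ' RoomLevelB
    End Sub
End Module
```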

Or

"|" and "()" let you construct powerful yet concise patterns. They serve the same purpose as if/elseif/else, but with fewer characters.
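
As an illustration, the sketch below uses "|" on its own and together with "()". The module name AlternationDemo, the helper functions, and the test words are assumptions, not part of the original article.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AlternationDemo
    ' The parentheses limit the alternation to a single letter: "gray" or "grey".
    Function IsGrayOrGrey(word As String) As Boolean
        Return Regex.IsMatch(word, "^gr(a|e)y$")
    End Function

    ' Returns whichever of the three sequences occurs first in the text, or "" if none does.
    Function FirstSequence(text As String) As String
        Return Regex.Match(text, "ABC|BCD|DEF").Value
    End Function

    Sub Main()
        Console.WriteLine(IsGrayOrGrey("grey"))        ' True
        Console.WriteLine(IsGrayOrGrey("groy"))        ' False
        Console.WriteLine(FirstSequence("xxBCDEFxx"))  ' BCD
    End Sub
End Module
```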

Group

You can define groups and name or number them. This is very useful when you process text files produced by PostgreSQL, Access, or Excel. Assume that your source file contains the title, givenname, surname, and emailaddress data columns (separated by commas). The givenname and surname columns can contain multiple words, such as "Don Diego" and "De La Vega". With one named group per column, each field can be retrieved by name even when it contains spaces.
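
One possible way to express this with named groups is sketched below. The pattern, the module name GroupDemo, the helper Field, and the sample row (including the email address) are all assumptions; the sketch also assumes that no field contains a comma.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GroupDemo
    ' One named group per column; [^,]* accepts anything except the comma delimiter.
    Private ReadOnly RowPattern As New Regex(
        "^(?<title>[^,]*),(?<givenname>[^,]*),(?<surname>[^,]*),(?<emailaddress>[^,]*)$")

    ' Returns the named column of a row, or Nothing when the row does not have four fields.
    Function Field(row As String, name As String) As String
        Dim m As Match = RowPattern.Match(row)
        If Not m.Success Then Return Nothing
        Return m.Groups(name).Value
    End Function

    Sub Main()
        Dim row As String = "Mr,Don Diego,De La Vega,zorro@example.com"
        Console.WriteLine(Field(row, "givenname"))     ' Don Diego
        Console.WriteLine(Field(row, "surname"))       ' De La Vega
        Console.WriteLine(Field(row, "emailaddress"))  ' zorro@example.com
    End Sub
End Module
```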

Regular expressions in .NET

To use regular expressions in .NET, add the following line to your source code:
 
Imports System.Text.RegularExpressions
 
After that, regular expressions are available throughout the file. Considering their power, the amount of work involved is almost zero: a typical code snippet is at most a few lines long.

Create a test tool

Next, I will create a simple web page that lets you test any regular expression against a large block of text. I use VB.NET, but with a couple of changes the code also works in C#.

Create a new web application named WebRegex. Place three text boxes on the page. Name the first text box txtPattern, the second txtSource, and the third txtResults. Add a label for each text box. Add a button control to the page, name it btnDoIt, and change its text to "Do It". Resize the last two text boxes appropriately and change their TextMode property to MultiLine. Rename the label of the txtResults text box to lblMatchCount. Finally, add a check box control, set its text to "Multiline", and name it chkMultiline. Your page should now look like Figure A.

Figure A

It takes only a few minutes to build a regular expression testing tool. Double-click the button control to open the code window and add the following code:
 
Dim rx As Regex
Try
    ' Build the expression, in multiline mode if the check box is ticked.
    If chkMultiline.Checked Then
        rx = New Regex(txtPattern.Text, RegexOptions.Multiline)
    Else
        rx = New Regex(txtPattern.Text)
    End If
Catch ex As Exception
    ' An invalid pattern makes the Regex constructor throw; report it and stop.
    Response.Write(ex.Message)
    Exit Sub
End Try
' Run the pattern against the source text and report every match.
Dim mc As MatchCollection = rx.Matches(txtSource.Text)
lblMatchCount.Text = "Found " & mc.Count.ToString() & " matches."
Dim m As Match
For Each m In mc
    txtResults.Text &= m.Value & " found at " & m.Index & vbCrLf
Next
 
To save trouble, I will not write code to read a text file (all you need to do is open the file in any editor and paste its content into the txtSource text box). You may want to use an HTML file for this test. Suppose you want to find all the HTML tags in that file. The following pattern does the job:
 
<[^>]*>
 
As another example, here is a practical pattern that matches a valid Visa card number:
 
^(?:(?:[4])(?:\d{12}|\d{15}))$
 
The following pattern matches a date written as month/day/year (the month and day have one or two digits, and the year starts with 19 or 20):
 
^((0?[1-9])|(1[0-2]))/((0?[1-9])|([12][0-9])|(3[01]))/((19|20)\d\d)$
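
As a rough sketch of how such a validation pattern might be used, the module below checks card numbers against the Visa format, read here as ^(?:(?:[4])(?:\d{12}|\d{15}))$ (a 4 followed by 12 or 15 more digits). That reading of the pattern, the module name VisaDemo, and the sample numbers are assumptions. The check covers the format only; it does not verify the card's checksum.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module VisaDemo
    ' A leading 4, then either 12 or 15 further digits (13 or 16 digits in total).
    Private Const VisaPattern As String = "^(?:(?:[4])(?:\d{12}|\d{15}))$"

    Function LooksLikeVisa(cardNumber As String) As Boolean
        Return Regex.IsMatch(cardNumber, VisaPattern)
    End Function

    Sub Main()
        Dim sixteen As String = "4" & New String("1"c, 15)
        Dim thirteen As String = "4" & New String("2"c, 12)
        Console.WriteLine(LooksLikeVisa(sixteen))                       ' True
        Console.WriteLine(LooksLikeVisa(thirteen))                      ' True
        Console.WriteLine(LooksLikeVisa("5" & New String("1"c, 15)))    ' False: does not start with 4
        Console.WriteLine(LooksLikeVisa("41111"))                       ' False: too short
    End Sub
End Module
```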

Replace text

Text search is only part of what regular expressions can do; they are also very powerful for intelligent replacement. Essentially, you capture the text of interest with one pattern and then call the Replace method with a replacement pattern. For example, to delete all HTML tags from an HTML file, your code would look like this:
 
Dim rx As Regex
Dim strPattern As String = "<[^>]*>"
Dim strIn As String = "" ' Loading the text is not shown
Dim strOut As String
rx = New Regex(strPattern)
' Replace every tag with the empty string, which deletes it.
strOut = rx.Replace(strIn, "")
 

The above code finds all HTML tags and replaces them with an empty string, which deletes them. Note that the file may contain source code, so the pattern does not match a stray < or > on its own.
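
Replacement becomes more interesting when the replacement string refers back to captured groups. The sketch below, with a hypothetical module ReplaceDemo and made-up sample strings, reorders the parts of a date using the .NET substitution syntax ${name}, and wraps the tag-stripping pattern in a function.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ReplaceDemo
    ' Rewrites dates of the form yyyy-mm-dd as mm/dd/yyyy.
    ' ${m}, ${d} and ${y} insert the text captured by the named groups.
    Function ToMonthDayYear(text As String) As String
        Return Regex.Replace(text, "(?<y>\d{4})-(?<m>\d\d)-(?<d>\d\d)", "${m}/${d}/${y}")
    End Function

    ' Deletes every HTML tag by replacing it with the empty string.
    Function StripTags(html As String) As String
        Return Regex.Replace(html, "<[^>]*>", "")
    End Function

    Sub Main()
        Console.WriteLine(ToMonthDayYear("Due 2003-07-15"))        ' Due 07/15/2003
        Console.WriteLine(StripTags("<p>Hello <b>world</b></p>"))  ' Hello world
    End Sub
End Module
```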

This search-and-replace approach may take some time to grasp. Fortunately, Visual Studio .NET provides a pattern wizard that includes some common patterns, so part of your work is reduced to a few clicks and a few choices from drop-down lists.

If you are new to regular expressions, we recommend that you get comfortable with search patterns before moving on to replacement patterns.

A good tool

In this article, I have only tried to arouse your interest in building regular expressions and to remove some of their mystery. I have seen the power of regular expressions, and you may as well try them yourself. If you are working hard to meet a customer's requirements, your chance may come soon: with regular expressions, it takes less time to build a robust solution.
