Regular Expression Tutorial: Position Matching

Source: Internet
Author: User
This article describes position matching in regular expressions and works through several examples for your reference.

Note: In all examples, the matched text is shown in the results enclosed in double quotation marks. Some examples are implemented in Java; where a usage is specific to Java's regular expressions, this is noted at the appropriate place. All Java examples were tested under JDK 1.6.0_13.

I. Introducing the Problem

Suppose we want to match a word in a piece of text (ignoring multi-line mode for now; it is described later). A first attempt might look like this:

Text: Yesterday is history, tomorrow is a mystery, but today is a gift.

Regular expression: is

Results: Yesterday "is" h"is"tory, tomorrow "is" a mystery, but today "is" a gift.

Analysis: The intent was to match only the word is, but the is contained in another word (history) was matched as well. To solve this problem, use boundary qualifiers: metacharacters that indicate where (at which boundary) the match should take place.

II. Word Boundaries

A commonly used boundary is the word boundary, specified by the qualifier \b, which matches the beginning or end of a word. More precisely, it matches a position between a character that can be part of a word (a letter, digit, or underscore, that is, a character matched by \w) and a character that cannot (a character matched by \W). Returning to the previous example:

Text: Yesterday is history, tomorrow is a mystery, but today is a gift.

Regular expression: \bis\b

Results: Yesterday "is" history, tomorrow "is" a mystery, but today "is" a gift.

Analysis: In the original text, the word is has a space before and after it, so it matches the pattern \bis\b (a space is one of the characters that separate words). The word history also contains is, but there it is preceded by h and followed by t, both word characters, so there is no word boundary on either side and \b cannot match.
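
The difference between the two patterns can be checked with a short Java sketch. The class name WordBoundaryDemo and the helper matchStarts are made up for this illustration; only java.util.regex is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class WordBoundaryDemo {

    // Returns the start offset of every match of regex in text.
    static List<Integer> matchStarts(String regex, String text) {
        List<Integer> starts = new ArrayList<Integer>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String text = "Yesterday is history, tomorrow is a mystery, but today is a gift.";
        // Plain "is" also matches inside "history".
        System.out.println("is      -> " + matchStarts("is", text));
        // With \b on both sides, only the standalone word matches.
        System.out.println("\\bis\\b -> " + matchStarts("\\bis\\b", text));
    }
}
```

The first list contains one more offset than the second: the extra entry is the is inside history.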

To match a position that is not a word boundary, use \B. For example:

Text: Please enter the nine-digit id as it appears on your color - coded pass-key.

Regular expression: \B-\B

Results: Please enter the nine-digit id as it appears on your color "-" coded pass-key.

Analysis: \B-\B matches a hyphen that has no word boundary on either side. In nine-digit and pass-key the hyphen sits directly between two word characters, so there is a word boundary on each side of it and \B cannot match. In color - coded the hyphen has a space before and after it, so neither side is a word boundary and the hyphen is matched.
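
As a rough check of how \B and \b behave around hyphens, the following Java sketch marks every match with square brackets (brackets are used instead of quotation marks only to keep the output readable). The class NonBoundaryDemo, the helper mark, and the sample sentence with one spaced hyphen are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class NonBoundaryDemo {

    // Returns text with every match of regex wrapped in square brackets.
    static String mark(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps any $ or \ in the match literal.
            m.appendReplacement(sb, Matcher.quoteReplacement("[" + m.group() + "]"));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Please enter the nine-digit id as it appears on your color - coded pass-key.";
        // \B-\B: hyphen with no word boundary on either side (the spaced one).
        System.out.println(mark("\\B-\\B", text));
        // \b-\b: hyphen directly between two word characters.
        System.out.println(mark("\\b-\\b", text));
    }
}
```

The two patterns are complementary here: \B-\B marks only the hyphen surrounded by spaces, while \b-\b marks only the hyphens inside nine-digit and pass-key.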

III. String Boundaries

Word boundaries are used to match positions relative to a word (its beginning, its end, the whole word, and so on). String boundaries serve a similar purpose, but for positions relative to the whole string (its beginning, its end, the entire string, and so on). Two metacharacters define the bounds of a string: ^ marks the beginning of the string, and $ marks the end of the string.

For example, a simple validity check for an XML document is that it begins with a declaration of the form <?xml ... ?>:

Text:

<?xml version="1.0" encoding="UTF-8"?>
<project basedir="." default="ear">
</project>

Regular expression: ^\s*<\?xml.*?\?>

Results:

"<?xml version="1.0" encoding="UTF-8"?>"
<project basedir="." default="ear">
</project>

Analysis: ^ matches the beginning of a string, so ^\s* matches the beginning of the string followed by zero or more whitespace characters. This lets the pattern tolerate spaces, tabs, or line breaks in front of the <?xml ... ?> declaration.
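
A minimal Java sketch of this check follows. The class XmlDeclarationCheck and the method startsWithXmlDeclaration are hypothetical names; the pattern is the one from the example above, and find() is used because the pattern only describes the start of the document.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class XmlDeclarationCheck {

    // ^ anchors at the start of the input; \s* tolerates leading whitespace.
    private static final Pattern DECLARATION = Pattern.compile("^\\s*<\\?xml.*?\\?>");

    // True if the text begins (after optional whitespace) with <?xml ... ?>.
    static boolean startsWithXmlDeclaration(String text) {
        return DECLARATION.matcher(text).find();
    }

    public static void main(String[] args) {
        String good = "  \n<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<project></project>";
        String bad = "<project></project><?xml version=\"1.0\"?>";
        System.out.println(startsWithXmlDeclaration(good));
        System.out.println(startsWithXmlDeclaration(bad));
    }
}
```

Because multi-line mode is not enabled, ^ can only match at the very start of the input, so a declaration that appears later in the text is not accepted.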

The metacharacter $ is used in exactly the same way as ^, except that it matches at the other end of the string. For example, it can be used to check how an HTML page ends.
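
The following Java sketch illustrates the $ anchor. It assumes that the check intended above is for a closing </html> tag at the end of the page; that tag, the class HtmlEndCheck, and the method endsWithHtmlTag are assumptions made for this illustration and are not taken from the original text.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names and the </html> tag are assumptions.
class HtmlEndCheck {

    // \s* tolerates trailing whitespace; $ anchors at the end of the input.
    private static final Pattern CLOSING_TAG =
            Pattern.compile("</html>\\s*$", Pattern.CASE_INSENSITIVE);

    // True if the page ends with </html>, ignoring case and trailing whitespace.
    static boolean endsWithHtmlTag(String page) {
        return CLOSING_TAG.matcher(page).find();
    }

    public static void main(String[] args) {
        System.out.println(endsWithHtmlTag("<html><body></body></html>\n"));
        System.out.println(endsWithHtmlTag("<html><body></body></html><!-- extra -->"));
    }
}
```

The \s* in front of $ mirrors the ^\s* used for the XML check: it allows whitespace between the tag and the end of the string.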

IV. Multi-line Matching Mode

Some special metacharacters change the behavior of other metacharacters. Multi-line matching mode is enabled with (?m). It causes the regular expression engine to treat line separators as string separators. In multi-line mode, ^ matches not only the normal beginning of the string but also the position just after a line separator (newline character), and $ matches not only the normal end of the string but also the position just before a line separator (newline character).

In this article, (?m) is placed at the very front of the pattern so that it applies to the whole expression. For example, the following regular expression finds all single-line comments (those starting with //) in a piece of Java code.
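
The effect of multi-line mode on ^ and $ can be seen in a small Java sketch. The class MultilineAnchorDemo, the helper findAll, and the three-line sample text are made up for this illustration; Pattern.MULTILINE is the flag form of the embedded (?m).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class MultilineAnchorDemo {

    // Collects the text of every match of pattern in text.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> matches = new ArrayList<String>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line\nthird line";
        // Default mode: ^ matches only at the start of the whole input.
        System.out.println(findAll(Pattern.compile("^\\w+"), text));
        // Multi-line mode: ^ also matches right after each line break.
        System.out.println(findAll(Pattern.compile("(?m)^\\w+"), text));
        // The MULTILINE flag is equivalent to the embedded (?m).
        System.out.println(findAll(Pattern.compile("^\\w+", Pattern.MULTILINE), text));
        // Multi-line mode: $ also matches right before each line break.
        System.out.println(findAll(Pattern.compile("(?m)\\w+$"), text));
    }
}
```

In default mode the first pattern finds only the first word of the text; with multi-line mode enabled it finds the first word of every line.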

Text:

public DownloadingDialog(Frame parent) {
    // Call super constructor, specifying that dialog box is modal.
    super(parent, true);
    // Set dialog box title.
    setTitle("E-mail Client");
    // Instruct window not to close when the "X" is clicked.
    setDefaultCloseOperation(DO_NOTHING_ON_CLOSE);
    // Put a message with a nice border on this dialog box.
    JPanel contentPanel = new JPanel();
    contentPanel.setBorder(BorderFactory.createEmptyBorder(5, 5, 5, 5));
    contentPanel.add(new JLabel("Downloading messages..."));
    setContentPane(contentPanel);
    // Size dialog box to components.
    pack();
    // Center dialog box over application.
    setLocationRelativeTo(parent);
}

Regular expression: (?m)^\s*//.*$

Results:

public DownloadingDialog(Frame parent) {
"    // Call super constructor, specifying that dialog box is modal."
    super(parent, true);
"    // Set dialog box title."
    setTitle("E-mail Client");
"    // Instruct window not to close when the "X" is clicked."
    setDefaultCloseOperation(DO_NOTHING_ON_CLOSE);
"    // Put a message with a nice border on this dialog box."
    JPanel contentPanel = new JPanel();
    contentPanel.setBorder(BorderFactory.createEmptyBorder(5, 5, 5, 5));
    contentPanel.add(new JLabel("Downloading messages..."));
    setContentPane(contentPanel);
"    // Size dialog box to components."
    pack();
"    // Center dialog box over application."
    setLocationRelativeTo(parent);
}

Analysis: ^\s*//.*$ matches the beginning of a string, then any amount of whitespace, then //, then any text, and finally the end of the string. On its own, this pattern could match at most one comment, and only when that comment makes up the entire string. With the (?m) prefix, each line break is treated as a string boundary, so the comment on every line is matched.

The Java implementation is as follows (the text is saved in the file text.txt):

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Reads the whole file at the given path into a string.
public static String getTextFromFile(String path) throws Exception {
    BufferedReader br = new BufferedReader(new FileReader(new File(path)));
    StringBuilder sb = new StringBuilder();
    char[] cbuf = new char[1024];
    int len = 0;
    // Read one chunk per iteration until the end of the stream.
    while ((len = br.read(cbuf)) > 0) {
        sb.append(cbuf, 0, len);
    }
    br.close();
    return sb.toString();
}

// Prints every single-line comment found in the file.
public static void multilineMatch() throws Exception {
    String text = getTextFromFile("E:/text.txt");
    String regex = "(?m)^\\s*//.*$";
    Matcher m = Pattern.compile(regex).matcher(text);
    while (m.find()) {
        System.out.println(m.group());
    }
}

The output is as follows (each match includes the leading whitespace and the // marker):

    // Call super constructor, specifying that dialog box is modal.
    // Set dialog box title.
    // Instruct window not to close when the "X" is clicked.
    // Put a message with a nice border on this dialog box.
    // Size dialog box to components.
    // Center dialog box over application.

V. Summary

Regular expressions can match not only blocks of text of any length, but also text that appears at a particular position in a string. \b specifies a word boundary (\B does the opposite). ^ and $ specify string boundaries. When used together with (?m), ^ and $ also match at the beginning and end of each line. The use of sub-expressions will be described in the next article.

I hope this article is helpful to everyone learning regular expressions.
