Detailed explanation of position matching in regular expressions

Source: Internet
Author: User
This article introduces position matching in regular expressions and uses examples to walk through the related techniques: word boundaries, string boundaries, and multiline matching mode.

Note: In all examples, the text matched by the regular expression is enclosed in [ and ] within the source text. Some examples are implemented in Java; where a regular expression needs Java-specific handling, this is noted in the relevant place. All Java examples were tested under JDK 1.6.0_13.

1. Problem introduction

Suppose you want to match a particular word in a piece of text (ignoring multiline mode for now, which is introduced later). A first attempt might look like this:

Text: Yesterday is history, tomorrow is a mystery, but today is a gift.

Regular expression: is

Result: Yesterday [is] h[is]tory, tomorrow [is] a mystery, but today [is] a gift.

Analysis: The intent was to match only the word "is", but the pattern also matches the "is" inside another word (here, "history"). To solve this, use boundary delimiters: metacharacters in the regular expression that specify the position (or boundary) at which a match may occur.

2. Word boundary

The most common boundary is the word boundary, specified by the metacharacter \b, which matches the start or end of a word. More precisely, it matches a position that lies between a character that can form part of a word (a letter, digit, or underscore, that is, a character matched by \w) and a character that cannot (a character matched by \W). Returning to the previous example:

Text: Yesterday is history, tomorrow is a mystery, but today is a gift.

Regular expression: \bis\b

Result: Yesterday [is] history, tomorrow [is] a mystery, but today [is] a gift.

Analysis: In the source text, the word "is" has a space before and after it, so it matches the pattern \bis\b (a space is one of the characters that separate words). The word "history" also contains "is", but there it is preceded by "h" and followed by "t", both word characters, so neither \b can match.
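The difference between the two patterns can be checked with a short Java sketch. The class name WordBoundaryDemo and the helper mark are illustrative names, not part of the original article; mark wraps each match in [ and ] to mirror the notation used in the examples.

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Wraps every match of the regex in [ and ]; $0 refers to the whole match.
    static String mark(String regex, String text) {
        return Pattern.compile(regex).matcher(text).replaceAll("[$0]");
    }

    public static void main(String[] args) {
        String text = "Yesterday is history, tomorrow is a mystery, but today is a gift.";

        // Without boundaries, the "is" inside "history" is matched too.
        System.out.println(mark("is", text));
        // prints: Yesterday [is] h[is]tory, tomorrow [is] a mystery, but today [is] a gift.

        // In a Java string literal the backslash must be doubled: \b is written "\\b".
        System.out.println(mark("\\bis\\b", text));
        // prints: Yesterday [is] history, tomorrow [is] a mystery, but today [is] a gift.
    }
}
```

Note that \b written with a single backslash inside a Java string literal would mean a backspace character, which is why the pattern is written as "\\bis\\b".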

To match a position that is not a word boundary, use \B. For example:

Text: Please enter the nine-digit id as it appears on your color - coded pass-key.

Regular expression: \B-\B

Result: Please enter the nine-digit id as it appears on your color [-] coded pass-key.

Analysis: \B-\B matches a hyphen that has no word boundary on either side. In "nine-digit" and "pass-key" the hyphen sits directly between two word characters, so there is a word boundary on each side of it and \B cannot match. In "color - coded" the hyphen has a space before and after it; a space and a hyphen are both non-word characters, so neither side is a word boundary and the hyphen matches.
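A minimal Java sketch of \B follows, assuming the sample sentence is written with spaces around the hyphen in "color - coded". The class name NonBoundaryDemo and the helper count are hypothetical names introduced for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonBoundaryDemo {
    // Counts how many times the regex matches in the text.
    static int count(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Assumed sample: only the hyphen in "color - coded" has spaces around it.
        String text = "Please enter the nine-digit id as it appears on your color - coded pass-key.";

        System.out.println(count("-", text));        // prints 3: every hyphen
        System.out.println(count("\\B-\\B", text));  // prints 1: only the hyphen between spaces
    }
}
```

A hyphen directly between two letters always has a word boundary on each side, so \B-\B skips it; only a hyphen flanked by non-word characters such as spaces qualifies.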

3. String boundary

Word boundaries match positions relative to a word (its start, its end, or the whole word). String boundaries serve a similar purpose, but match positions relative to the string (its start, its end, or the entire string). Two metacharacters define string boundaries: ^ marks the start of the string and $ marks the end of the string.

For example, consider a basic check on an XML document, which normally starts with a declaration of the form <?xml ... ?>:

Text:

 
 

Regular expression: ^\s*<\?xml.*?\?>

Result:



Analysis: ^ matches the start of the string, so ^\s* matches the start of the string followed by zero or more whitespace characters. The \s* is there because the <?xml tag may be preceded by spaces, tabs, line breaks, and other whitespace.
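As a hedged illustration, the check could be written in Java as below. The class name XmlDeclarationCheck, the method startsWithXmlDeclaration, and the sample documents are assumptions made for this sketch; they are not taken from the original article.

```java
import java.util.regex.Pattern;

class XmlDeclarationCheck {
    // ^ anchors the match to the start of the input; \s* allows leading whitespace.
    private static final Pattern XML_START = Pattern.compile("^\\s*<\\?xml.*?\\?>");

    // Returns true if the text begins (after optional whitespace) with <?xml ... ?>.
    static boolean startsWithXmlDeclaration(String document) {
        return XML_START.matcher(document).find();
    }

    public static void main(String[] args) {
        // Hypothetical sample documents.
        String atStart = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<root/>";
        String afterBlank = "  \n<?xml version=\"1.0\"?>\n<root/>";
        String tooLate = "<root/>\n<?xml version=\"1.0\"?>";

        System.out.println(startsWithXmlDeclaration(atStart));    // prints true
        System.out.println(startsWithXmlDeclaration(afterBlank)); // prints true
        System.out.println(startsWithXmlDeclaration(tooLate));    // prints false
    }
}
```

Without the ^ anchor, the third document would also be accepted, because find() would locate the declaration on the second line.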

The $ metacharacter is used in exactly the same way as ^, except that it anchors the end of the string rather than the start. For example, to check that an HTML page ends with a closing </html> tag followed by nothing but whitespace, you can use the following pattern: </html>\s*$
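The following Java sketch shows an end-of-string check, assuming the goal is to confirm that a page ends with a closing </html> tag followed only by optional whitespace. The class name HtmlEndCheck, the method endsWithHtmlTag, and the sample pages are hypothetical.

```java
import java.util.regex.Pattern;

class HtmlEndCheck {
    // $ anchors the match to the end of the input; \s* allows trailing whitespace.
    private static final Pattern HTML_END = Pattern.compile("</html>\\s*$");

    // Returns true if the text ends with </html>, ignoring trailing whitespace.
    static boolean endsWithHtmlTag(String page) {
        return HTML_END.matcher(page).find();
    }

    public static void main(String[] args) {
        // Hypothetical sample pages.
        String clean = "<html><body>Hi</body></html>\n\n";
        String trailing = "<html><body>Hi</body></html><!-- extra -->";

        System.out.println(endsWithHtmlTag(clean));    // prints true
        System.out.println(endsWithHtmlTag(trailing)); // prints false
    }
}
```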

4. Multiline matching mode

Regular expressions can use special metacharacter sequences to change the behavior of other metacharacters. (?m) enables multiline matching mode, which makes the regular expression engine treat line separators as string separators. In multiline mode, ^ matches not only at the start of the whole string but also at the position just after each line separator (line break), and $ matches not only at the end of the whole string but also at the position just before each line separator.
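The effect of multiline mode on ^ and $ can be seen in a small Java sketch. The class name MultilineDemo, the helper findAll, and the three-line sample string are illustrative assumptions. In Java, the Pattern.MULTILINE flag has the same effect as the embedded (?m).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // Collects the text of every match of the pattern in the input.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> matches = new ArrayList<String>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line\nthird line";

        // Default mode: ^ matches only at the start of the whole input.
        System.out.println(findAll(Pattern.compile("^\\w+"), text));
        // prints [first]

        // Multiline mode: ^ also matches just after each line break.
        System.out.println(findAll(Pattern.compile("(?m)^\\w+"), text));
        // prints [first, second, third]

        // The MULTILINE flag is equivalent to the embedded (?m).
        System.out.println(findAll(Pattern.compile("\\w+$", Pattern.MULTILINE), text));
        // prints [line, line, line]
    }
}
```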

When used, (?m) should appear at the very beginning of the pattern. For example, you can use a regular expression to find all the single-line comments (those beginning with //) in a piece of Java code.

Text:

public DownloadingDialog(Frame parent) {
    // Call super constructor, specifying that dialog box is modal.
    super(parent, true);
    // Set dialog box title.
    setTitle("E-mail Client");
    // Instruct window not to close when the "X" is clicked.
    setDefaultCloseOperation(DO_NOTHING_ON_CLOSE);
    // Put a message with a nice border in this dialog box.
    JPanel contentPanel = new JPanel();
    contentPanel.setBorder(BorderFactory.createEmptyBorder(5, 5, 5, 5));
    contentPanel.add(new JLabel("Downloading messages..."));
    setContentPane(contentPanel);
    // Size dialog box to components.
    pack();
    // Center dialog box over application.
    setLocationRelativeTo(parent);
}

Regular expression: (?m)^\s*//.*$

Result:

public DownloadingDialog(Frame parent) {
    [// Call super constructor, specifying that dialog box is modal.]
    super(parent, true);
    [// Set dialog box title.]
    setTitle("E-mail Client");
    [// Instruct window not to close when the "X" is clicked.]
    setDefaultCloseOperation(DO_NOTHING_ON_CLOSE);
    [// Put a message with a nice border in this dialog box.]
    JPanel contentPanel = new JPanel();
    contentPanel.setBorder(BorderFactory.createEmptyBorder(5, 5, 5, 5));
    contentPanel.add(new JLabel("Downloading messages..."));
    setContentPane(contentPanel);
    [// Size dialog box to components.]
    pack();
    [// Center dialog box over application.]
    setLocationRelativeTo(parent);
}

Analysis: ^\s*//.*$ matches the start of a string, followed by any amount of whitespace, followed by //, followed by any text, followed by the end of a string. On its own, this pattern could match only if the entire text were a single comment line, so it finds nothing in the sample above. With the (?m) prefix, each line break is treated as a string separator, so every comment line can be matched.

The Java code is as follows (the sample text is saved in the file text.txt):

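import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Reads the whole file at the given path into a string.
public static String getTextFromFile(String path) throws IOException {
    BufferedReader br = new BufferedReader(new FileReader(path));
    try {
        StringBuilder sb = new StringBuilder();
        char[] cbuf = new char[1024];
        int len;
        // read() returns -1 at end of stream; append only the characters actually read.
        while ((len = br.read(cbuf)) != -1) {
            sb.append(cbuf, 0, len);
        }
        return sb.toString();
    } finally {
        br.close();
    }
}

// Prints every single-line comment found in the sample file.
public static void multilineMatch() throws IOException {
    String text = getTextFromFile("E:/text.txt");
    String regex = "(?m)^\\s*//.*$";
    Matcher m = Pattern.compile(regex).matcher(text);
    while (m.find()) {
        System.out.println(m.group());
    }
}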

The output is as follows (shown without the leading whitespace that \s* also matches on each line):

// Call super constructor, specifying that dialog box is modal.
// Set dialog box title.
// Instruct window not to close when the "X" is clicked.
// Put a message with a nice border in this dialog box.
// Size dialog box to components.
// Center dialog box over application.

5. Summary

Regular expressions can match not only blocks of text of any length but also text that appears at a specific position in a string. \b specifies a word boundary (\B is its opposite). ^ and $ specify string boundaries. When combined with (?m), ^ and $ also match at the start and end of each line, that is, just after and just before a line separator. The next article covers subexpressions.

I hope this article will help you learn regular expressions.
