How to use the regular expression package that comes with Java

Sun's JDK 1.4 ships with a package that supports regular expressions. This article introduces how to use the java.util.regex package.

It is fair to say that anyone who has used Linux or Unix more than occasionally has run into regular expressions. Regular expressions are extremely powerful and flexible tools for matching and replacing patterns in strings. In the Unix world there are almost no restrictions on where they can be used, and they are used very widely.

Many common Unix tools implement a regular expression engine, including grep, awk, vi, and Emacs. In addition, many widely used scripting languages support regular expressions, such as Python, Tcl, JavaScript, and, most famously, Perl.

I was a Perl hacker long ago. If you are like me, you have come to depend on having these powerful text-munging tools at hand. In recent years, like many other developers, I have been paying more and more attention to Java development.

Java has much to recommend it as a development language, but it never provided its own support for regular expressions. Until recently, regular expressions were available in Java only through third-party class libraries, and those libraries were inconsistent with one another, poorly compatible, and poorly maintained. This shortcoming was always a major concern for me when considering Java as my primary development tool.

You can imagine how happy I was to learn that Sun's JDK 1.4 includes java.util.regex, a fully open, built-in regular expression package! Funnily enough, it took me some time to dig up this hidden gem. I am surprised that such a major improvement to Java was not publicized more.

Java has jumped into the world of regular expressions only recently, but the java.util.regex package has its advantages, and Java provides detailed documentation for it. As a result, the mystery surrounding regular expressions is gradually being lifted. Some of its regular expression constructs (perhaps most notably the ability to combine character classes) cannot be found in Perl.
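The text does not name the construct it has in mind. Assuming it refers to the union and intersection syntax for character classes that java.util.regex documents, a minimal sketch might look like this (the class and method names are made up for illustration):

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // [a-z&&[^aeiou]] is the intersection of "lowercase ASCII letter"
    // and "anything that is not a vowel".
    static final Pattern CONSONANT = Pattern.compile("[a-z&&[^aeiou]]");

    // True only if the whole input is a single lowercase consonant.
    static boolean isLowercaseConsonant(String s) {
        return CONSONANT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLowercaseConsonant("b")); // true
        System.out.println(isLowercaseConsonant("a")); // false
    }
}
```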

The regex package contains two classes: Pattern and Matcher. A Pattern object represents a compiled search pattern. A Matcher object does the actual searching against a given input. The package also adds a new exception class, PatternSyntaxException, which is thrown when an illegal search pattern is encountered.
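As a rough sketch of how these three classes fit together (the helper names firstNumber and isValidPattern are invented for this example, not part of the package):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexBasics {
    // Returns the first run of digits in the input, or null if there is none.
    static String firstNumber(String input) {
        Pattern pattern = Pattern.compile("\\d+"); // the compiled search pattern
        Matcher matcher = pattern.matcher(input);  // performs the search on this input
        return matcher.find() ? matcher.group() : null;
    }

    // Returns true if the pattern string compiles, false if it is malformed.
    static boolean isValidPattern(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("call 555 now"));  // 555
        System.out.println(isValidPattern("(\\d{3}"));    // false: the group is never closed
    }
}
```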

If you are already familiar with regular expressions, you will find using them in Java quite simple. One thing to note: Perl enthusiasts spoiled by Perl's one-line matching will find that operations such as replacement take more code with the Java regex package than they are used to.
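To give a feel for the difference, here is a small sketch of a replacement that Perl would write as a single substitution such as s/\s+/ /g. The class and method names are hypothetical; Matcher.replaceAll is the package's own method.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceDemo {
    // Collapses every run of whitespace into a single space.
    static String collapseSpaces(String text) {
        Pattern pattern = Pattern.compile("\\s+");
        Matcher matcher = pattern.matcher(text);
        return matcher.replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(collapseSpaces("a   b\t\tc")); // a b c
    }
}
```

For one-off replacements, String.replaceAll takes the same regular expression syntax and saves the explicit Pattern and Matcher objects.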

This article is not a complete tutorial on regular expressions. If you want to learn more, read Mastering Regular Expressions by Jeffrey Friedl, published by O'Reilly. What follows are a few examples that show how to use regular expressions in Java, and how to use them simply.

Designing a simple expression that matches any phone number can be complicated, because phone numbers are written in many formats, and the pattern has to be valid for all of them. For example, most people would consider (212) 555-1212, 212-555-1212, and 212 555 1212 equivalent.

First, let's construct a regular expression. For simplicity, we start with one that recognizes telephone numbers in the following format: (nnn) nnn-nnnn.

Step 1: create a Pattern object to match the substring above. Once the program works, you can generalize the pattern as needed. A regular expression for the format above can be written as (\d{3})\s\d{3}-\d{4}. Here \d is a single-character class that matches any digit from 0 to 9, and the {3} quantifier is shorthand for three consecutive digits, so \d{3} is equivalent to \d\d\d. \s is another useful single-character class; it matches whitespace such as spaces, tabs, and line breaks.

Easy, isn't it? However, to use this pattern in a Java program you have to do two things. First, the Java compiler gives a backslash inside a string literal a special meaning: it begins an escape sequence, and a sequence such as \d is not one the compiler recognizes. To avoid this, write a double backslash (\\) so that a single backslash reaches the Pattern object. Second, parentheses have a special meaning in regular expressions. If you want them to be interpreted literally (that is, as parentheses), each must be preceded by a backslash, which is again doubled in Java source code. The result looks like this:

\\(\\d{3}\\)\\s\\d{3}-\\d{4}
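To see what the doubled backslashes actually produce, the following sketch prints the string the regex engine receives and checks it against two sample inputs. The class name and the isPhone helper are made up for illustration.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Java source needs \\ to put a single \ into the string.
    static final String PHONE_REGEX = "\\(\\d{3}\\)\\s\\d{3}-\\d{4}";

    // True if the whole input is a number in the form (nnn) nnn-nnnn.
    static boolean isPhone(String s) {
        return Pattern.matches(PHONE_REGEX, s);
    }

    public static void main(String[] args) {
        // Prints the pattern with single backslashes, as the regex engine sees it.
        System.out.println(PHONE_REGEX);
        System.out.println(isPhone("(212) 555-1212")); // true
        System.out.println(isPhone("212-555-1212"));   // false: no parentheses
    }
}
```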

Now let's see how to use this regular expression in Java code. Remember that to use the regular expression package you need to import it before the class you define, with a line like this:

import java.util.regex.*;

The following code reads a text file line by line and searches each line for a phone number. When a matching number is found, it is printed to the console.

// Also requires: import java.io.*;
BufferedReader in;
Pattern pattern = Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");
in = new BufferedReader(new FileReader("phone"));
String s;
while ((s = in.readLine()) != null)
{
    // Search the current line for the first phone number.
    Matcher matcher = pattern.matcher(s);
    if (matcher.find())
    {
        System.out.println(matcher.group());
    }
}
in.close();
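The fragment above leaves out the enclosing class and the exception handling. One possible way to wrap it into a complete program is sketched below. The class name PhoneScanner and the scan method are invented here, and try-with-resources needs a newer Java release than the JDK 1.4 discussed in this article.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhoneScanner {
    private static final Pattern PHONE = Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");

    // Reads the source line by line and collects the first phone number on each line.
    static List<String> scan(Reader source) {
        List<String> found = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String s;
            while ((s = in.readLine()) != null) {
                Matcher matcher = PHONE.matcher(s);
                if (matcher.find()) {
                    found.add(matcher.group());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        try {
            // "phone" is the input file name used in the article's fragment.
            for (String number : scan(new FileReader("phone"))) {
                System.out.println(number);
            }
        } catch (FileNotFoundException e) {
            System.err.println("Input file not found: phone");
        }
    }
}
```

Taking a Reader rather than a file name keeps the matching logic easy to try out on an in-memory string.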

This code will look familiar to anyone who has used regular expressions in Python or JavaScript. In those languages, as here, a regular expression is compiled explicitly once and can then be used wherever you want. Compared with Perl's single-step matching this seems like a lot of work, but it is not much trouble.

The find() method, as you might expect, searches the target string for text that matches the regular expression. The group() method returns a string containing the matched text. Note that the code above only works when each line contains at most one matching telephone number. Java's regular expression package can certainly handle lines that contain multiple matches, but the intent of this article is to give a few simple examples that encourage readers to explore the package further, so that is not discussed in depth here.
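For readers who want a starting point for the multiple-match case, here is a brief sketch. It relies on the fact that each call to find() resumes after the previous match, so the call can be repeated in a loop. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllMatches {
    private static final Pattern PHONE = Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");

    // Collects every phone number on the line, not just the first one.
    static List<String> findAll(String line) {
        List<String> found = new ArrayList<>();
        Matcher matcher = PHONE.matcher(line);
        while (matcher.find()) {       // each call continues after the previous match
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("Home (212) 555-1212, work (212) 555-3434"));
    }
}
```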

This is pretty neat! Unfortunately, it only matches phone numbers in one format, and there are clearly two points to improve. First, there may or may not be a space between the area code and the local number. We can match both cases by writing \s? in the regular expression, where the ? metacharacter indicates that the pattern may contain zero or one whitespace character at that point.

Second, the first three and last four digits of the local number may be separated by a space rather than a hyphen, or by nothing at all, so that the seven digits run together. For these cases we can use (-| )?. This construct is an alternation that covers all of the cases just mentioned: inside the parentheses, the pipe character | matches either a hyphen or a space, and the trailing ? metacharacter allows for no separator at all.

Finally, the area code may not be enclosed in parentheses. We could simply attach a ? metacharacter to each parenthesis, but that is not a good solution, because it would also accept unpaired parentheses such as "(555" or "555)". Instead, we can use another alternation that forces the area code to have either both parentheses or neither: (\(\d{3}\)|\d{3}). If we replace the regular expression in the code above with this improved one, the code becomes a very useful phone number matcher:

Pattern pattern =
    Pattern.compile("(\\(\\d{3}\\)|\\d{3})\\s?\\d{3}(-| )?\\d{4}");

You can try to improve the code above further.
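A quick way to see what the improved pattern accepts is to run it against a handful of sample strings. In this sketch the class name and the isPhone helper are invented, and matches() is used instead of find() so that the whole input has to be a phone number.

```java
import java.util.regex.Pattern;

class PhoneMatcher {
    static final Pattern PHONE =
        Pattern.compile("(\\(\\d{3}\\)|\\d{3})\\s?\\d{3}(-| )?\\d{4}");

    // True if the entire input is a phone number in one of the accepted formats.
    static boolean isPhone(String s) {
        return PHONE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "(212) 555-1212", "(212)555-1212", "212 555 1212",
            "2125551212", "(212 555-1212", "212-555-1212"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isPhone(sample));
        }
    }
}
```

The first four samples are accepted and the unpaired parenthesis is rejected. The last sample, 212-555-1212, is also rejected, because the pattern allows only an optional whitespace character after the area code; handling a hyphen there is one of the improvements left to try.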

Now let's look at a second example, adapted from Friedl. It checks whether a text file contains doubled words. Doubled words turn up often in typesetting, and detecting them is also a job for grammar checkers.

First, matching words. As usual, several regular expressions could be used to match a word. Probably the most direct is \b\w+\b, whose advantage is that it needs only a few regex metacharacters. The \w metacharacter matches any word character: a letter, a digit, or an underscore. The + metacharacter means one or more occurrences. The \b metacharacter matches a word boundary, that is, the position between a word character and a non-word character such as a space or a punctuation mark (comma, period, and so on), or the start or end of the text.
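As a small illustration of the word pattern, the following sketch lists the words in a sentence. The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordFinder {
    private static final Pattern WORD = Pattern.compile("\\b\\w+\\b");

    // Returns every run of word characters in the text, in order.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, regex world!")); // [Hello, regex, world]
    }
}
```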

How can we check whether a given word is immediately repeated? To do this, we use the well-known regular expression feature called backreferences. As mentioned above, parentheses have several different uses in regular expressions. One is to form a group, which saves the text matched by that part of the pattern so that it can be used later, even within the same pattern. A single regular expression can contain (and usually does contain) more than one group, and the text matched by the nth group is available through the backreference \n. Backreferences make searching for doubled words very simple: \b(\w+)\s+\1\b.

The parentheses form a group; in this regular expression it is the first (and only) group. The backreference \1 refers to whatever text was matched by \w+. Our regular expression therefore matches a word followed by one or more whitespace characters and then the same word again. Note that the trailing word boundary (\b) is essential to prevent false matches: we want to match "Paris in the the spring", but not "Java's regex package is the theme of this article". Written as a Java string, the regular expression above becomes: Pattern pattern = Pattern.compile("\\b(\\w+)\\s+\\1\\b");
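The effect of the trailing word boundary is easiest to see by comparing the pattern with and without it. This is only a sketch; the class name, the field names, and the firstDuplicate helper are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DuplicateWords {
    static final Pattern STRICT = Pattern.compile("\\b(\\w+)\\s+\\1\\b");
    // Same pattern without the trailing \b, kept only for comparison.
    static final Pattern LOOSE = Pattern.compile("\\b(\\w+)\\s+\\1");

    // Returns the first doubled word found by the given pattern, or null if none.
    static String firstDuplicate(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstDuplicate(STRICT, "Paris in the the spring"));   // the
        System.out.println(firstDuplicate(STRICT, "the theme of this article")); // null
        System.out.println(firstDuplicate(LOOSE, "the theme of this article"));  // the
    }
}
```

Without the trailing \b, the backreference is satisfied by the first three letters of "theme", which is exactly the false match the boundary prevents.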

The last step is to make the matching case-insensitive, so that it also catches cases such as "The the theme of this article is the Java's regex package." This is easy to do: pass the predefined static flag CASE_INSENSITIVE of the Pattern class:

Pattern pattern = Pattern.compile("\\b(\\w+)\\s+\\1\\b",
    Pattern.CASE_INSENSITIVE);
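The difference the flag makes can be checked with a short sketch like the one below. The class name and the findDuplicate helper are hypothetical; Pattern.CASE_INSENSITIVE is the flag named above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseInsensitiveDemo {
    private static final String REGEX = "\\b(\\w+)\\s+\\1\\b";

    // Returns the matched doubled-word text, or null if nothing matches.
    static String findDuplicate(String text, boolean ignoreCase) {
        Pattern pattern = ignoreCase
            ? Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE)
            : Pattern.compile(REGEX);
        Matcher matcher = pattern.matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String text = "The the theme of this article";
        System.out.println(findDuplicate(text, false)); // null
        System.out.println(findDuplicate(text, true));  // The the
    }
}
```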

Regular expressions are a rich and complex topic, and Java's implementation of them is extensive, so the regex package deserves thorough study; what we have covered here is only the tip of the iceberg. Even if you are unfamiliar with regular expressions, you will soon discover the power and flexibility of the regex package once you start using it. And if you are a seasoned regular expression hacker from Perl or another language, then after using the regex package you will be able to move into the Java world with peace of mind, set aside your other tools, and treat Java's regex package as an essential tool to keep at hand.
