Take a look at the Great White & programming Studio Effect Difference


This article briefly introduces regular expressions in Java and their implementation, and explains their usage through examples.

1. Regular expression

1.1. Brief introduction

A regular expression, often shortened to regex, describes a search pattern for text.

A search pattern can take many forms: a single character, a fixed string, or a complex expression containing characters with special meaning. For a given string, a regular expression may match once, several times, or not at all.

Regular expressions are generally used to search, edit, and replace text. In this context, "text" and "string" mean the same thing.

Analyzing or modifying text with a regular expression is described as applying the regular expression to the text or string. The regular expression scans the string from left to right. Each character can be part of only one successful match, and the scan for the next match starts right after the previous one. For example, applying the regular expression aba to the string ababababa finds only two matches (aba_aba__).

1.2. Example
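As a first concrete example, the left-to-right, non-overlapping scan described in section 1.1 can be reproduced with a few lines of Java. This is a minimal sketch; the class name NonOverlappingScan and the helper matchStarts are made up for illustration and are not part of the original tutorial.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonOverlappingScan {

    // Collects the start index of every match, scanning from left to right.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // "aba" occurs at offsets 0, 2, 4 and 6, but successive matches may not overlap.
        System.out.println(matchStarts("aba", "ababababa")); // prints [0, 4]
    }
}
```

The occurrences at offsets 2 and 6 are skipped because their first character was already consumed by the preceding match.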

The simplest example is a literal string. For example, the regular expression Hello World matches the string "Hello World". In a regular expression, the dot (., period) is a wildcard that matches any single character, for example "a" or "1". By default the dot does not match line terminators; a special flag is required for that.

The following table lists some simple regular expressions and the patterns they match.

Regular expression    Matches
this is text          Exactly the string "this is text".
this\s+is\s+text      The word "this", followed by one or more whitespace characters (such as space, tab, or newline), followed by "is", one or more whitespace characters, and "text".
^\d+(\.\d+)?          The caret (^) anchors the pattern at the start of the line, so the line must begin with the pattern that follows it. \d+ matches one or more digits. The question mark (?) means the preceding group may occur zero or one time. \. matches the literal character ".", and the parentheses form a group. This regular expression therefore matches a positive integer or decimal number such as "5", "66.6", or "5.21".
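The three patterns from the table can be tried out directly. The following is a small sketch, assuming a hypothetical class SimplePatterns; it relies on String.matches(), which succeeds only when the whole string matches. Note that every regex backslash is doubled inside a Java string literal.

```java
class SimplePatterns {
    // The patterns from the table, written as Java string literals.
    static final String EXACT = "this is text";
    static final String FLEXIBLE_SPACES = "this\\s+is\\s+text";
    static final String NUMBER = "^\\d+(\\.\\d+)?";

    // True only if the regex matches the entire input.
    static boolean matchesWhole(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole(EXACT, "this is text"));              // true
        System.out.println(matchesWhole(EXACT, "this  is text"));             // false, two spaces
        System.out.println(matchesWhole(FLEXIBLE_SPACES, "this  is\ttext"));  // true
        System.out.println(matchesWhole(NUMBER, "66.6"));                     // true
        System.out.println(matchesWhole(NUMBER, "5."));                       // false, digits must follow the dot
    }
}
```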

Note that the Chinese full-width space (U+3000) is not matched as a whitespace character by default; it can be regarded as a special Chinese character.

1.3. Programming language support for regular expressions

Most programming languages support regular expressions, for example Java, Perl, and Groovy. However, the regular expression dialects of the various languages differ slightly.

2. Preliminary knowledge

This tutorial requires the reader to have a basic knowledge of the Java language.

Some of the following examples use JUnit to validate the results. If you do not want to use JUnit, you can rewrite the relevant code. For an introduction to JUnit, see the JUnit Tutorial: http://www.vogella.com/tutorials/JUnit/article.html.

3. Grammar rules

This chapter describes the elements from which patterns are built, beginning with the common matching symbols and the metacharacters.

3.1. Common matching symbols

Regular expression    Description
.                     Matches any single character.
^regex                The caret (^) marks the start of the line: regex must match at the beginning, with no other characters before it.
regex$                The dollar sign ($) marks the end of the line: regex must match at the end, with no other characters after it.
[abc]                 Character set: matches a or b or c.
[abc][vz]             Character sets: matches a or b or c, followed by v or z.
[^abc]                When the caret (^) appears first inside the brackets, it negates the set: matches any character except a, b, and c.
[a-d1-7]              Ranges: matches a single character between a and d, or a single character between 1 and 7. The whole set matches only one character, not the combination d1.
X|Z                   Matches X or Z.
XZ                    Matches X directly followed by Z.
$                     Checks whether a line end follows.

3.2. Meta characters

The following metacharacters are predefined shorthands for common patterns; for example, \d can replace [0-9] or [0123456789].

Regular expression    Description
\d                    A single digit, short for [0-9].
\D                    A non-digit, short for [^0-9].
\s                    A whitespace character, short for [ \t\n\x0B\f\r].
\S                    A non-whitespace character, short for [^\s].
\w                    A word character (letter, digit, or underscore), short for [a-zA-Z_0-9].
\W                    A non-word character, short for [^\w].
\S+                   One or more non-whitespace characters.
\b                    A word boundary, where word characters are [a-zA-Z0-9_].
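To see the predefined metacharacters at work, here is a short sketch. The class PredefinedClasses and its helper methods are hypothetical names chosen for this illustration; only the regex shorthands themselves come from the table above.

```java
import java.util.regex.Pattern;

class PredefinedClasses {

    // \D matches every non-digit, so removing those leaves only the digits.
    static String keepDigits(String s) {
        return s.replaceAll("\\D", "");
    }

    // \s+ treats any run of spaces, tabs or newlines as one separator.
    static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    // \b ensures that word is matched as a whole word, not as part of a longer one.
    static boolean containsWord(String text, String word) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(keepDigits("Tel: 555-1234"));           // 5551234
        System.out.println(words("  a  b\tc ").length);            // 3
        System.out.println(containsWord("the cat sat", "cat"));    // true
        System.out.println(containsWord("concatenate", "cat"));    // false
        System.out.println("a_1".matches("\\w+"));                 // true
        System.out.println("a-1".matches("\\w+"));                 // false, '-' is not a word character
    }
}
```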

The names of these metacharacters are mostly taken from the initial letter of the corresponding English word: digit, space, word, and boundary. The corresponding uppercase letter denotes the negation.

3.3. Quantifier

Quantifiers specify how many times an element may occur. The symbols ?, *, + and {} define quantifiers.

Regular expression    Description                                               Example
*                     Zero or more times, short for {0,}.                       X* matches zero or more consecutive X; .* matches any character sequence.
+                     One or more times, short for {1,}.                        X+ matches one or more consecutive X.
?                     Zero or one time, short for {0,1}.                        X? matches zero or one X.
{n}                   Exactly n occurrences of the preceding element.           \d{3} matches three digits; .{10} matches any ten characters.
{m,n}                 Between m and n occurrences.                              \d{1,4} matches at least one and at most four digits.
*?                    A ? after a quantifier makes it lazy (reluctant): the scan proceeds from left to right and stops at the first position that satisfies the regular expression, so that as few characters as possible are matched.

3.4. Grouping and referencing

You can group parts of a regular expression by enclosing them in parentheses (). This allows a quantifier to apply to the whole group.
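The effect of applying a quantifier to a group, and the difference between greedy and lazy quantifiers from section 3.3, can be sketched as follows. The class QuantifierSketch and the helper firstMatch are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierSketch {

    // Returns the first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Without a group, + applies only to the preceding character b.
        System.out.println("abbb".matches("ab+"));        // true
        System.out.println("ababab".matches("ab+"));      // false
        // With a group, + repeats the whole sequence ab.
        System.out.println("ababab".matches("(ab)+"));    // true

        // Greedy .+ runs to the last '>', lazy .+? stops at the first one.
        System.out.println(firstMatch("<.+>", "<a><b>"));   // <a><b>
        System.out.println(firstMatch("<.+?>", "<a><b>"));  // <a>
    }
}
```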

Groups can also be referenced when replacing text. Such a group is called a capturing group. A back reference refers to the text that the group matched. In the replacement string, a group is referenced with $.

Use $ followed by the group number to refer to a capturing group: $1 is the first group, $2 the second, and so on, while $0 stands for the entire match.

For example, suppose you want to remove the whitespace between a word character and a following period or comma. Capture the period or comma in a group so that it can be written back into the result.

// Remove the whitespace between a word character and . or ,
String pattern = "(\\w)(\\s+)([\\.,])";
System.out.println(EXAMPLE_TEST.replaceAll(pattern, "$1$3"));

To extract the content of a tag:

// Extract the text between the two title tags
pattern = "(?i)(<title.*?>)(.+?)(</title>)";
String updated = EXAMPLE_TEST.replaceAll(pattern, "$2");
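The fragments above assume an existing EXAMPLE_TEST string. For a self-contained variant, the following sketch shows group references in replacement strings, including $0 for the entire match. The class GroupReferenceSketch, its method names, and the sample inputs are invented for this illustration.

```java
class GroupReferenceSketch {

    // $1 is the word character and $3 the punctuation mark; group 2 (the whitespace) is dropped.
    static String removeSpaceBeforePunctuation(String text) {
        return text.replaceAll("(\\w)(\\s+)([.,])", "$1$3");
    }

    // Swaps the two captured words: "Doe, John" becomes "John Doe".
    static String swapNames(String name) {
        return name.replaceAll("(\\w+),\\s*(\\w+)", "$2 $1");
    }

    // $0 refers to the entire match, here each run of digits.
    static String bracketNumbers(String text) {
        return text.replaceAll("\\d+", "[$0]");
    }

    public static void main(String[] args) {
        System.out.println(removeSpaceBeforePunctuation("Hello , world .")); // Hello, world.
        System.out.println(swapNames("Doe, John"));                          // John Doe
        System.out.println(bracketNumbers("a1b22"));                         // a[1]b[22]
    }
}
```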

3.5. Look Around

Lookaround is divided into lookahead and lookbehind, and belongs to the zero-width assertions. Like the line start (^), the line end ($), and the word boundary (\b), a lookaround identifies a position rather than matching characters.

Negative lookahead is used to exclude certain cases from a match: the pattern must not be followed by text that matches a given expression.

A negative lookahead is written as (?!pattern). For example, the following regular expression matches only an "a" that is not followed by a "b".

a(?!b)

Similarly, there is the positive lookahead. For example, the following matches an "a" only if it is followed by a "b"; an "a" followed by anything else does not qualify:

a(?=b)

The lookahead syntax (?=exp) checks whether exp matches immediately to the right of the current position. The text examined by the lookahead is not included in the match.

Lookaround is an advanced technique: the examined part does not become part of the match result, but the matched text must be preceded or followed by text with the stated property.

Replacing the equals sign with an exclamation mark, (?!exp), gives the negative form: the position must not be followed by exp.

Positive lookbehind, (?<=exp), requires that the text to the left of the position matches exp.

Negative lookbehind, (?<!exp), requires that the text to the left of the position does not match exp.
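The four lookaround forms can be compared side by side. The sketch below uses hypothetical helper names (LookaroundSketch, count, firstMatch) and made-up sample text; only the lookaround syntax is taken from this section.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundSketch {

    // Counts how many times regex matches in input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Returns the first matched substring, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "ab ac a";
        System.out.println(count("a(?!b)", text));      // 2: the a in "ac" and the final a
        System.out.println(count("a(?=b)", text));      // 1: only the a in "ab"
        System.out.println(firstMatch("a(?=b)", text)); // a (the b is checked but not consumed)

        String order = "cost $25, qty 3";
        System.out.println(firstMatch("(?<=\\$)\\d+", order));     // 25: digits preceded by $
        System.out.println(firstMatch("(?<!\\$)\\b\\d+", order));  // 3: a number not preceded by $
    }
}
```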

For more information, see "Regular expression applications: lookbehind": http://blog.csdn.net/lxcnn/article/details/4954134

Reference: Using regular expressions to exclude a specific string: http://www.cnblogs.com/wangqiguo/archive/2012/05/08/2486548.html

3.6. Regular expression modes

You can specify a mode modifier at the beginning of a regular expression. Multiple modes can be combined, as in (?is)pattern.

(?i) makes the regular expression case-insensitive.

(?s) enables single-line mode, in which the dot (.) matches all characters, including the newline (\n).

(?m) enables multi-line mode, in which the caret (^) and the dollar sign ($) match at the start and end of each line in the target string.

3.7. Backslash in Java
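For the backslash rules of section 3.7, a short sketch may help: in Java source code every backslash that belongs to the regular expression is written twice, and matching one literal backslash takes four. The class BackslashSketch and its method are hypothetical names used only for this illustration.

```java
class BackslashSketch {

    // The regex \\ (one escaped backslash) is written "\\\\" in Java source.
    static String toForwardSlashes(String path) {
        return path.replaceAll("\\\\", "/");
    }

    public static void main(String[] args) {
        // The regex \w+ is written "\\w+" in Java source.
        System.out.println("abc_123".matches("\\w+"));   // true

        // The Java literal "C:\\temp" is the 7-character text C:\temp.
        String path = "C:\\temp";
        System.out.println(path.length());               // 7
        System.out.println(toForwardSlashes(path));      // C:/temp
    }
}
```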

In a Java string literal, the backslash (\) is an escape character with a built-in meaning. In source code, two backslashes (\\) are needed to represent a single backslash character. To define the regular expression \w, write \\w in the .java source file. To match one literal backslash in the text, write four backslashes (\\\\) in the source code.

4. Using regular expressions with String methods

4.1. String methods that support regular expressions

The String class in Java has four built-in methods that support regular expressions: matches(), split(), replaceFirst(), and replaceAll(). Note that replace() performs plain string substitution and does not support regular expressions.
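The difference between replace() and the regex-based methods is easy to overlook. The following sketch, with a hypothetical class and method names, replaces dots in a string three ways to show when the argument is treated literally and when it is treated as a pattern.

```java
class ReplaceSketch {

    // replace() treats "." as a literal dot.
    static String literalDots(String s) {
        return s.replace(".", "-");
    }

    // replaceAll() treats "." as a regex wildcard, so every character is replaced.
    static String unescapedRegexDots(String s) {
        return s.replaceAll(".", "-");
    }

    // Escaping the dot makes replaceAll() behave like the literal version.
    static String escapedRegexDots(String s) {
        return s.replaceAll("\\.", "-");
    }

    public static void main(String[] args) {
        System.out.println(literalDots("a.b.c"));          // a-b-c
        System.out.println(unescapedRegexDots("a.b.c"));   // -----
        System.out.println(escapedRegexDots("a.b.c"));     // a-b-c
        System.out.println("a.b.c".replaceFirst("\\.", "-")); // a-b.c
    }
}
```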

These methods are not optimized for performance. The optimized classes are discussed later.

Method                                    Description
s.matches("regex")                        Evaluates whether "regex" matches s. Returns true only if the entire string matches.
s.split("regex")                          Splits s using "regex" as the delimiter and returns a String[] array. The delimiter text matched by the regex is not included in the result.
s.replaceFirst("regex", "replacement")    Replaces the first match of "regex" with "replacement".
s.replaceAll("regex", "replacement")      Replaces all matches of "regex" with "replacement".

The following is a corresponding example.

package de.vogella.regex.test;

public class RegexTestStrings {
    public static final String EXAMPLE_TEST = "This is my small example "
            + "string which I'm going to " + "use for pattern matching.";

    public static void main(String[] args) {
        System.out.println(EXAMPLE_TEST.matches("\\w.*"));
        String[] splitString = (EXAMPLE_TEST.split("\\s+"));
        System.out.println(splitString.length); // should be 14
        for (String string : splitString) {
            System.out.println(string);
        }
        // Replace all whitespace characters with tabs
        System.out.println(EXAMPLE_TEST.replaceAll("\\s+", "\t"));
    }
}

4.2. Example

Some examples of the use of regular expressions are given below. Please refer to the comments for details.

If you want to test these examples, place the Java files in a Java package such as de.vogella.regex.string.

package de.vogella.regex.string;

public class StringMatcher {
    // Returns true if the string matches exactly "true"
    public boolean isTrue(String s) {
        return s.matches("true");
    }

    // Returns true if the string matches exactly "true" or "True"
    public boolean isTrueVersion2(String s) {
        return s.matches("[tT]rue");
    }

    // Returns true if the string matches exactly "true", "True", "yes" or "Yes"
    public boolean isTrueOrYes(String s) {
        return s.matches("[tT]rue|[yY]es");
    }

    // Returns true if the string contains "true"
    public boolean containsTrue(String s) {
        return s.matches(".*true.*");
    }

    // Returns true if the string consists of exactly three letters
    public boolean isThreeLetters(String s) {
        return s.matches("[a-zA-Z]{3}");
        // Equivalent, but more verbose:
        // return s.matches("[a-zA-Z][a-zA-Z][a-zA-Z]");
    }

    // Returns true if the string does not start with a digit
    public boolean isNoNumberAtBeginning(String s) {
        // "^\\D.*" may be slightly clearer
        return s.matches("^[^\\d].*");
    }

    // Returns true if the string consists only of word characters other than 'b'
    public boolean isIntersection(String s) {
        return s.matches("([\\w&&[^b]])*");
    }

    // Returns true if the string contains a number less than 300
    public boolean isLessThenThreeHundred(String s) {
        return s.matches("[^0-9]*[12]?[0-9]{1,2}[^0-9]*");
    }
}

The following small JUnit test validates the examples.

package de.vogella.regex.string;

import org.junit.Before;
import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class StringMatcherTest {
    private StringMatcher m;

    @Before
    public void setup() {
        m = new StringMatcher();
    }

    @Test
    public void testIsTrue() {
        assertTrue(m.isTrue("true"));
        assertFalse(m.isTrue("true2"));
        assertFalse(m.isTrue("True"));
    }

    @Test
    public void testIsTrueVersion2() {
        assertTrue(m.isTrueVersion2("true"));
        assertFalse(m.isTrueVersion2("true2"));
        assertTrue(m.isTrueVersion2("True"));
    }

    @Test
    public void testIsTrueOrYes() {
        assertTrue(m.isTrueOrYes("true"));
        assertTrue(m.isTrueOrYes("yes"));
        assertTrue(m.isTrueOrYes("Yes"));
        assertFalse(m.isTrueOrYes("no"));
    }

    @Test
    public void testContainsTrue() {
        assertTrue(m.containsTrue("thetruewithin"));
    }

    @Test
    public void testIsThreeLetters() {
        assertTrue(m.isThreeLetters("abc"));
        assertFalse(m.isThreeLetters("abcd"));
    }

    @Test
    public void testIsNoNumberAtBeginning() {
        assertTrue(m.isNoNumberAtBeginning("abc"));
        assertFalse(m.isNoNumberAtBeginning("1abcd"));
        assertTrue(m.isNoNumberAtBeginning("a1bcd"));
        assertTrue(m.isNoNumberAtBeginning("asdfdsf"));
    }

    @Test
    public void testIsIntersection() {
        assertTrue(m.isIntersection("1"));
        assertFalse(m.isIntersection("abcksdfkdskfsdfdsf"));
        assertTrue(m.isIntersection("skdskfjsmcnxmvjwque484242"));
    }

    @Test
    public void testLessThenThreeHundred() {
        assertTrue(m.isLessThenThreeHundred("288"));
        assertFalse(m.isLessThenThreeHundred("3288"));
        assertFalse(m.isLessThenThreeHundred("328 8"));
        assertTrue(m.isLessThenThreeHundred("1"));
        assertTrue(m.isLessThenThreeHundred("99"));
        assertFalse(m.isLessThenThreeHundred("300"));
    }
}

5. Pattern and Matcher Introduction

To use the advanced features of regular expressions, you need the java.util.regex.Pattern and java.util.regex.Matcher classes.

First, a Pattern object is created (compiled) to define the regular expression. Given a string, the Pattern object produces a corresponding Matcher object. Through the Matcher object you can perform various regex operations on the string.

package de.vogella.regex.test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTestPatternMatcher {
    public static final String EXAMPLE_TEST = "This is my small example string which I'm going to use for pattern matching.";

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("\\w+");
        // To ignore case, you can use:
        // Pattern pattern = Pattern.compile("\\w+", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(EXAMPLE_TEST);
        // Find all matches
        while (matcher.find()) {
            System.out.print("Start index: " + matcher.start());
            System.out.print(" End index: " + matcher.end() + " ");
            System.out.println(matcher.group());
        }
        // Replace whitespace with tabs
        Pattern replace = Pattern.compile("\\s+");
        Matcher matcher2 = replace.matcher(EXAMPLE_TEST);
        System.out.println(matcher2.replaceAll("\t"));
    }
}
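Since the String convenience methods compile their regex on every call, one common approach when the same expression is applied to many inputs is to compile the Pattern once and reuse it. The following is a minimal sketch of that idea; the class PrecompiledPattern, the constant WORD, and the sample lines are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrecompiledPattern {

    // Compiled once; Pattern instances are immutable and can be shared.
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Each call creates a new, cheap Matcher from the shared Pattern.
    static int countWords(String text) {
        Matcher matcher = WORD.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String[] lines = { "This is my small example", "I'm going", "" };
        for (String line : lines) {
            // Prints 5, 3 and 0; the apostrophe splits "I'm" into two matches.
            System.out.println(countWords(line));
        }
    }
}
```

Matcher objects, in contrast, hold the state of one scan and should not be shared between threads.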

6. Examples of regular expressions

The following is a list of common regular expression usage scenarios. Adjust them to your actual needs.

6.1. Or

Task: write a regular expression that matches lines containing the word "Joe" or "Jim", or both.

Create the de.vogella.regex.eitheror package and the following class.

package de.vogella.regex.eitheror;

import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class EitherOrCheck {
    @Test
    public void testSimpleTrue() {
        String s = "humbapumpa jim";
        assertTrue(s.matches(".*(jim|joe).*"));
        s = "humbapumpa jom";
        assertFalse(s.matches(".*(jim|joe).*"));
        s = "humbapumpa joe";
        assertTrue(s.matches(".*(jim|joe).*"));
        s = "humbapumpa joe jim";
        assertTrue(s.matches(".*(jim|joe).*"));
    }
}
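The pattern .*(jim|joe).* also accepts strings such as "jimmy", because it only checks for the letter sequence. If whole words are required, one possible variation, which is not part of the original example, uses find() with word boundaries and the case-insensitive flag. The class and method names below are hypothetical.

```java
import java.util.regex.Pattern;

class EitherOrFind {

    // (?i) ignores case; \b on both sides requires a whole word.
    private static final Pattern JOE_OR_JIM = Pattern.compile("(?i)\\b(joe|jim)\\b");

    // find() looks for the pattern anywhere in the line, so no leading or trailing .* is needed.
    static boolean mentionsJoeOrJim(String line) {
        return JOE_OR_JIM.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(mentionsJoeOrJim("humbapumpa jim"));      // true
        System.out.println(mentionsJoeOrJim("Humbapumpa Joe Jim"));  // true
        System.out.println(mentionsJoeOrJim("humbapumpa jom"));      // false
        System.out.println(mentionsJoeOrJim("jimmy went home"));     // false
    }
}
```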

6.2. Matching phone number

Task: write a regular expression that matches various phone numbers.

Assume a phone number is either seven consecutive digits, or three digits followed by a whitespace character or a hyphen, followed by four digits.

package de.vogella.regex.phonenumber;

import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class CheckPhone {

    @Test
    public void testSimpleTrue() {
        // Three digits, an optional hyphen or whitespace character, then four digits
        String pattern = "\\d\\d\\d([-\\s])?\\d\\d\\d\\d";
        String s = "1233323322";
        assertFalse(s.matches(pattern));
        s = "1233323";
        assertTrue(s.matches(pattern));
        s = "123 3323";
        assertTrue(s.matches(pattern));
    }
}

6.3. Check for a certain number range

The following example checks whether a text contains a number with at least three consecutive digits.

Create the de.vogella.regex.numbermatch package and the following class.

package de.vogella.regex.numbermatch;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class CheckNumber {

    @Test
    public void testSimpleTrue() {
        String s = "1233";
        assertTrue(test(s));
        s = "0";
        assertFalse(test(s));
        s = "KASDKF 2300 kdsdf";
        assertTrue(test(s));
        s = "99900234";
        assertTrue(test(s));
    }

    // Returns true if s contains three consecutive digits anywhere
    public static boolean test(String s) {
        Pattern pattern = Pattern.compile("\\d{3}");
        Matcher matcher = pattern.matcher(s);
        if (matcher.find()) {
            return true;
        }
        return false;
    }
}

6.4. Verify hyperlinks

Suppose you need to find all valid links on a web page. Links that start with "javascript:" or "mailto:" should be excluded.

Create the de.vogella.regex.weblinks package and the following class:

package de.vogella.regex.weblinks;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkGetter {
    private Pattern htmltag;
    private Pattern link;

    public LinkGetter() {
        htmltag = Pattern.compile("<a\\b[^>]*href=\"[^>]*>(.*?)</a>");
        link = Pattern.compile("href=\"[^>]*\">");
    }

    public List<String> getLinks(String url) {
        List<String> links = new ArrayList<String>();
        // try-with-resources closes the reader when done
        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(new URL(url).openStream()))) {
            String s;
            StringBuilder builder = new StringBuilder();
            while ((s = bufferedReader.readLine()) != null) {
                builder.append(s);
            }
            Matcher tagmatch = htmltag.matcher(builder.toString());
            while (tagmatch.find()) {
                Matcher matcher = link.matcher(tagmatch.group());
                matcher.find();
                String link = matcher.group().replaceFirst("href=\"", "")
                        .replaceFirst("\">", "")
                        .replaceFirst("\"[\\s]?target=\"[a-zA-Z_0-9]*", "");
                if (valid(link)) {
                    links.add(makeAbsolute(url, link));
                }
            }
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return links;
    }

    // Rejects javascript: and mailto: links
    private boolean valid(String s) {
        if (s.matches("javascript:.*|mailto:.*")) {
            return false;
        }
        return true;
    }

    private String makeAbsolute(String url, String link) {
        if (link.matches("http://.*")) {
            return link;
        }
        if (link.matches("/.*") && url.matches(".*$[^/]")) {
            return url + "/" + link;
        }
        if (link.matches("[^/].*") && url.matches(".*[^/]")) {
            return url + "/" + link;
        }
        if (link.matches("/.*") && url.matches(".*[/]")) {
            return url + link;
        }
        if (link.matches("/.*") && url.matches(".*[^/]")) {
            return url + link;
        }
        throw new RuntimeException("Cannot make the link absolute. Url: " + url
                + " Link " + link);
    }
}
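The class above needs network access to run. To experiment with the link extraction itself, a simplified variant can work on an HTML string held in memory. The following sketch is not the tutorial's implementation: the class HrefExtractor, its pattern, and the sample markup are assumptions, and it only handles double-quoted href values. Regular expressions are generally too weak for arbitrary HTML, so this is suited to quick checks rather than robust parsing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefExtractor {

    // Group 1 captures the value of a double-quoted href attribute inside an <a ...> tag.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?href=\"([^\"]*)\"[^>]*>", Pattern.CASE_INSENSITIVE);

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            String link = matcher.group(1);
            // Skip javascript: and mailto: links, as in the task description.
            if (!link.matches("(?i)(javascript|mailto):.*")) {
                links.add(link);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.com/a\">A</a>"
                + "<a class=\"x\" href=\"mailto:me@example.com\">M</a>"
                + "<a href=\"javascript:void(0)\">J</a>"
                + "<a href=\"/local\" target=\"_blank\">L</a>";
        System.out.println(extractLinks(html)); // [http://example.com/a, /local]
    }
}
```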

6.5. Finding duplicated words

The following regular expression matches duplicated words.

\b(\w+)\s+\1\b

\b is a word boundary and \1 refers to the first group, which here is the preceding word (\w+).

The variant (?!-in)\b(\w+) \1\b uses a negative lookahead so that duplicated words are matched only if they do not start with "-in".
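In Java, the duplicated-word expression can be applied with a Matcher, where group(1) returns the repeated word. This is a small sketch; the class DuplicateWords and the sample sentence are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DuplicateWords {

    // \1 must repeat exactly what group 1 captured; \b prevents partial-word matches.
    private static final Pattern REPEATED = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // Returns each word that is immediately followed by a copy of itself.
    static List<String> repeatedWords(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = REPEATED.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(repeatedWords("This is is a test test of the the regex")); // [is, test, the]
        System.out.println(repeatedWords("the there"));                               // []
    }
}
```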

Tip: Add the (?s) flag at the front to search across multiple lines.

6.6. Finding elements that start on a new line

The following regular expression finds the word "title" at the start of a line, possibly preceded by whitespace.

(\n\s*)title

6.7. Finding (non-Javadoc) statements

Sometimes (non-Javadoc) comments appear in Java source code to indicate that a method overrides a superclass method. As of Java 1.6 the @Override annotation serves this purpose, so these comments can be removed from the source code. The following regular expression finds such comments.

(?s)/\* \(non-Javadoc\).*?\*/
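Applied in Java, the expression can strip such comments from a source string. The sketch below assumes a hypothetical class NonJavadocRemover and a made-up source snippet; the (?s) flag lets the lazy .*? span several lines up to the first closing */.

```java
import java.util.regex.Pattern;

class NonJavadocRemover {

    // (?s): the dot also matches line breaks; .*? stops at the first "*/".
    private static final Pattern NON_JAVADOC =
            Pattern.compile("(?s)/\\* \\(non-Javadoc\\).*?\\*/");

    // Removes every (non-Javadoc) block comment and leaves other comments untouched.
    static String strip(String source) {
        return NON_JAVADOC.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        String source = "/* (non-Javadoc)\n"
                + " * @see java.lang.Object#toString()\n"
                + " */\n"
                + "@Override\n"
                + "public String toString() { return \"x\"; }";
        System.out.println(strip(source).trim());
    }
}
```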

6.7.1. Replace DocBook statements with AsciiDoc

For example, suppose you have the following XML:

<programlisting language="java">
        <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" parse="text" href="./examples/statements/myclass.java"/>
</programlisting>

You can match it with the following regular expression:

\s+<programlisting language="java">\R.\s+<xi:include xmlns:xi="http://www\.w3\.org/2001/XInclude" parse="text" href="\./examples/(.*).\s+/>\R.\s+</programlisting>

The replacement can be an expression such as the following:

\R[source,java]\R----\R include::res/$1[]\R----

7. Using regular expressions in Eclipse

In Eclipse and other editors you can use regular expressions for search and replace. The shortcut Ctrl+H typically opens the Search dialog.

Select the File Search tab and check the Regular expression flag to search and replace with a regular expression. You can also specify the file types and the range of directories to search.

The following figure shows how to find the XML tag <![CDATA[]]]> together with its leading whitespace, and how to remove that whitespace.

In the results dialog you can review where the replacements will be made and remove elements that you do not want to replace. If everything looks right, click the OK button to perform the replacement.

8. Related links

Sample code download: http://www.vogella.com/code/index.html
Regular-Expressions.info on Using Regular Expressions in Java
Regular Expressions Examples
The Java Tutorials: Lesson: Regular Expressions

Original link: http://www.vogella.com/tutorials/JavaRegularExpressions/article.html

Original Date: 2016.06.24

Translation Date: 2017-12-28

Translator: Anchor http://blog.csdn.net/renfufei/

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
