Java Regular Expression Series: (1) Getting Started Tutorial


This article briefly introduces regular expressions and their implementation in Java, and explains their usage through examples.

1. Regular expressions

1.1. Introduction

A regular expression, often shortened to regex, is a pattern that describes a set of strings and is used to search text.

A search pattern can be a single character, a fixed string, or a complex expression containing characters with special meaning. For a given string, a regular expression may match once, several times, or not at all.

Regular expressions are typically used to find, edit, and replace text. For this purpose, "text" and "string" mean the same thing.

Parsing or modifying text with a regular expression is described as applying the regular expression to the text or string. The regular expression scans the string from left to right. Each character can be part of at most one match, and the search for the next match resumes right after the end of the previous one. For example, the regular expression aba finds only two matches in the string ababababa (aba_aba__).

1.2. Example
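The left-to-right, non-overlapping scan of aba over ababababa can be checked with a few lines of Java. This is only a minimal sketch: the class name ScanOrderDemo and the helper matchStarts are made up for illustration, and the Pattern and Matcher classes it relies on are introduced in section 5.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScanOrderDemo {

    // Collects the start index of every match found by repeated find() calls.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // "aba" would also fit at indexes 2 and 6 if overlaps were allowed,
        // but each character is consumed by at most one match.
        System.out.println(matchStarts("aba", "ababababa")); // prints [0, 4]
    }
}
```

The second match starts at index 4 because the search resumes at index 3, right after the first match.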

The simplest regular expression is a literal string. For example, the regular expression Hello World matches the string "Hello World". In a regular expression, the dot (.) is a wildcard that matches any single character, for example "a" or "1". By default the dot does not match a line break; that requires a special flag.

The following table lists some simple regular expressions and corresponding matching patterns.

Regular expression    Matches
This is text          Exactly "This is text".
This\s+is\s+text      "This", then one or more whitespace characters (spaces, tabs, line breaks, and so on), then "is", then one or more whitespace characters, then "text".
^\d+(\.\d+)?          The expression begins with the caret (^), which means the line must start with the pattern that follows it. \d+ matches one or more digits. The question mark (?) means zero or one occurrence. \. matches the literal character ".", and the parentheses form a group. The expression therefore matches a positive integer or a decimal number such as "5", "66.6", or "5.21".
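The rows of this table can be tried directly with String.matches(), which returns true only if the whole string matches. The sketch below is illustrative; the class and method names are assumptions, not part of the original tutorial. Note that every backslash is doubled in Java source (see section 3.7).

```java
public class SimplePatternsDemo {

    // True if the whole input is "This is text" with flexible whitespace.
    static boolean isFlexibleSentence(String input) {
        return input.matches("This\\s+is\\s+text");
    }

    // True if the whole input is a non-negative integer or decimal number.
    static boolean isNumber(String input) {
        return input.matches("^\\d+(\\.\\d+)?");
    }

    public static void main(String[] args) {
        System.out.println(isFlexibleSentence("This is text"));     // true
        System.out.println(isFlexibleSentence("This \t is\ntext")); // true
        System.out.println(isFlexibleSentence("Thisistext"));       // false
        System.out.println(isNumber("66.6"));                       // true
        System.out.println(isNumber("5."));                         // false
    }
}
```

The last call fails because the optional group requires at least one digit after the dot.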

Note that the Chinese full-width space character (U+3000) is not treated as a whitespace character by \s; it can be regarded as a special Chinese character.

1.3. Programming language support for regular expressions

Most programming languages support regular expressions, for example Java, Perl, and Groovy. However, the regular expression dialects of these languages differ slightly.

2. Prerequisites

This tutorial requires the reader to have a basic knowledge of the Java language.

Some of the following examples use JUnit to validate results. If you do not want to use JUnit, you can rewrite the relevant code. For an introduction to JUnit, see the JUnit Tutorial: http://www.vogella.com/tutorials/JUnit/article.html.

3. Grammar rules

This chapter describes the elements from which patterns are built, starting with an overview of common expressions and then the metacharacters.

3.1. Overview of common expressions

Regular expression    Description
.            The dot matches any single character.
^regex       The caret (^) anchors the start: regex must match at the beginning of the line, with no characters before it.
regex$       The dollar sign ($) anchors the end: regex must match at the end of the line, with no characters after it.
[abc]        Character set: matches a, b, or c.
[abc][vz]    Character sets: matches a, b, or c, followed by v or z.
[^abc]       A caret as the first character inside the brackets negates the set: matches any character other than a, b, or c.
[a-d1-7]     Ranges: matches a single character from a to d or from 1 to 7. The set matches only one character, not a combination such as d1.
X|Z          Matches X or Z.
XZ           Matches X directly followed by Z.
$            Checks whether a line end follows.
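A short sketch can confirm how anchors, character sets, and alternation behave. It uses Matcher.find(), which looks for the pattern anywhere in the input, so that the effect of ^ and $ is visible; the class name and the helper found are illustrative assumptions.

```java
import java.util.regex.Pattern;

public class CharacterSetDemo {

    // find() searches anywhere in the input, so only the anchors
    // ^ and $ pin the pattern to the start or the end.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^ab", "abc"));       // true
        System.out.println(found("^ab", "cab"));       // false
        System.out.println(found("ab$", "cab"));       // true
        System.out.println(found("[abc][vz]", "bz"));  // true
        System.out.println(found("[^abc]", "abc"));    // false
        System.out.println(found("[a-d1-7]", "8e"));   // false
        System.out.println(found("X|Z", "xyZ"));       // true
    }
}
```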
3.2. Metacharacters

The following predefined metacharacters abbreviate common patterns. For example, \d can replace [0-9] or [0123456789].

Regular expression    Description
\d     A single digit; equivalent to [0-9] but more concise.
\D     A non-digit; equivalent to [^0-9] but more concise.
\s     A whitespace character; equivalent to [ \t\n\x0b\r\f].
\S     A non-whitespace character; equivalent to [^\s].
\w     A word character, that is a letter, digit, or underscore; equivalent to [a-zA-Z_0-9].
\W     A non-word character; equivalent to [^\w].
\S+    One or more non-whitespace characters.
\b     A word boundary, where a word character is one of [a-zA-Z0-9_].
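One way to see what each metacharacter matches is to replace every match with a visible marker. The following sketch does that with replaceAll(); the class name, the helper mask, and the sample input are assumptions chosen for illustration.

```java
public class MetaCharacterDemo {

    // Replaces every match of the regex with '#' so matched positions are visible.
    static String mask(String regex, String input) {
        return input.replaceAll(regex, "#");
    }

    public static void main(String[] args) {
        String input = "id_7 = 42;";
        System.out.println(mask("\\d", input));  // id_# = ##;
        System.out.println(mask("\\D", input));  // ###7###42#
        System.out.println(mask("\\s", input));  // id_7#=#42;
        System.out.println(mask("\\w", input));  // #### = ##;
        System.out.println(mask("\\W", input));  // id_7#=#42#
        System.out.println(mask("\\S+", input)); // # # #
        System.out.println(mask("\\b", input));  // #id_7# = #42#;
    }
}
```

The last line shows that \b matches a position rather than a character: the markers are inserted at the word boundaries and nothing is removed.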

The letters of these metacharacters come from the initials of the corresponding English words: digit, space, word, and boundary. The uppercase variant denotes the negation.

3.3. Quantifiers

A quantifier specifies how many times an element may occur. The symbols ?, *, + and {} define quantifiers.

Regular expression    Description    Example
*        Zero or more times; equivalent to {0,}.    x* matches zero or more consecutive x characters; .* matches any string (without line breaks, by default).
+        One or more times; equivalent to {1,}.    x+ matches one or more consecutive x characters.
?        Zero or one time; equivalent to {0,1}.    x? matches zero or one x.
{n}      Exactly n occurrences of the preceding element.    \d{3} matches exactly three digits; .{10} matches any ten characters.
{m,n}    Between m and n occurrences.    \d{1,4} matches at least one and at most four digits.
*?       A ? after a quantifier makes it lazy (reluctant). The engine scans from left to right and stops at the first position that satisfies the expression, matching as few characters as possible.
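The difference between a greedy and a lazy quantifier is easiest to see side by side. The sketch below returns the first match for several quantified patterns; the class name, the helper firstMatch, and the sample strings are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierDemo {

    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b> and <i>italic</i>";
        // Greedy: .+ grabs as much as possible, up to the last '>'.
        System.out.println(firstMatch("<.+>", html));  // <b>bold</b> and <i>italic</i>
        // Lazy: .+? stops at the first '>' that completes a match.
        System.out.println(firstMatch("<.+?>", html)); // <b>
        System.out.println(firstMatch("\\d{3}", "ab12345"));   // 123
        System.out.println(firstMatch("\\d{1,4}", "ab12345")); // 1234
        // x? may match zero characters, so it succeeds with an empty match.
        System.out.println(firstMatch("x?", "yyy").isEmpty()); // true
    }
}
```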
3.4. Grouping and back references

You can group parts of a regular expression by enclosing them in parentheses (). A quantifier can then be applied to the group as a whole.

Groups can also be referenced when replacing text. Parentheses create a capturing group, and a back reference points to the text that the group matched.

In a replacement string, use $ to refer to a capturing group: $1 is the first group, $2 the second, and so on, while $0 stands for the whole match.

For example, to remove the whitespace between a word character and a following period or comma, capture the word character and the punctuation in groups and write only those two groups to the result.

// Remove the whitespace between a word character and '.' or ','
String pattern = "(\\w)(\\s+)([\\.,])";
System.out.println(EXAMPLE_TEST.replaceAll(pattern, "$1$3"));

To extract the content of a tag:

// Extract the content of the <title> tag
pattern = "(?i)(<title.*?>)(.+?)(</title>)";
String updated = EXAMPLE_TEST.replaceAll(pattern, "$2");
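To make group references in a replacement string concrete, here is a small self-contained sketch. The class and method names and the sample inputs are assumptions made up for this illustration; only replaceAll() and the $n syntax come from the text above.

```java
public class GroupReferenceDemo {

    // Swaps "first last" into "last, first" using groups 1 and 2.
    static String swapNames(String input) {
        return input.replaceAll("(\\w+)\\s+(\\w+)", "$2, $1");
    }

    // Wraps every run of digits in brackets; $0 stands for the whole match.
    static String bracketNumbers(String input) {
        return input.replaceAll("\\d+", "[$0]");
    }

    public static void main(String[] args) {
        System.out.println(swapNames("Ada Lovelace"));          // Lovelace, Ada
        System.out.println(bracketNumbers("room 12, floor 3")); // room [12], floor [3]
    }
}
```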
3.5. Lookaround

Lookaround comes in two forms, lookahead and lookbehind, and both are zero-width assertions. Like the line start (^), the line end ($), and the word boundary (\b), they match a position rather than characters.

A negative lookahead excludes certain cases from a match: it states that the current position must not be followed by text matching a given pattern.

A negative lookahead is written (?!pattern). For example, the following expression matches only an "a" that is not followed by a "b".

a(?!b)

Similarly, a positive lookahead requires that the position is followed by the given pattern. The following expression matches an "a" only if a "b" follows it; the "b" itself is not part of the match:

a(?=b)

The syntax (?=exp) checks for exp after the current position; the text that is looked at is not included in the match.

Lookaround is an advanced technique: the inspected part does not become part of the match result, but the match is only valid if the text before or after it satisfies the assertion.

Replacing the equals sign with an exclamation mark gives the negative form, (?!exp), which means the position must not be followed by exp.

Positive lookbehind, (?<=exp), requires that the text to the left of the position matches exp.

Negative lookbehind, (?<!exp), requires that the text to the left of the position does not match exp.
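The four lookaround forms can be compared by masking whatever each one matches. This is a minimal sketch under assumed names (LookaroundDemo, mask) and made-up sample strings; note that the dollar sign is escaped in the patterns because it is otherwise the end-of-line anchor.

```java
public class LookaroundDemo {

    // Replaces every match of the regex with '#' so matched text is visible.
    static String mask(String regex, String input) {
        return input.replaceAll(regex, "#");
    }

    public static void main(String[] args) {
        // Positive lookahead: an 'a' that is followed by 'b' (the 'b' stays).
        System.out.println(mask("a(?=b)", "ab ac a"));  // #b ac a
        // Negative lookahead: an 'a' that is not followed by 'b'.
        System.out.println(mask("a(?!b)", "ab ac a"));  // ab #c #
        // Positive lookbehind: digits directly preceded by a dollar sign.
        System.out.println(mask("(?<=\\$)\\d+", "$15 and 20"));     // $# and 20
        // Negative lookbehind: a whole number not preceded by a dollar sign.
        System.out.println(mask("(?<!\\$)\\b\\d+", "$15 and 20"));  // $15 and #
    }
}
```

In every case the asserted text ('b' or '$') survives the replacement, which shows that it is inspected but not consumed.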

For more information, see "Regular expression applications: lookbehind": http://blog.csdn.net/lxcnn/article/details/4954134

Reference: "Using regular expressions to exclude specific strings": http://www.cnblogs.com/wangqiguo/archive/2012/05/08/2486548.html

3.6. Mode modifiers

You can specify mode modifiers at the beginning of a regular expression. Modifiers can be combined, for example (?is)pattern.

(?i) makes the regular expression case-insensitive.

(?s) enables single-line mode, in which the dot (.) matches every character, including the line break (\n).

(?m) enables multi-line mode, in which the caret (^) and the dollar sign ($) match at the start and end of each line of the target string.

3.7. Backslashes in Java
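Before turning to backslashes, the three mode modifiers (?i), (?s), and (?m) can be tried out with a small sketch. The class name, the helper names, and the sample text are assumptions for illustration only.

```java
public class ModeModifierDemo {

    // True if the whole input matches the regex.
    static boolean wholeMatch(String regex, String input) {
        return input.matches(regex);
    }

    // Prefixes every line with "> " by matching the start of each line.
    static String quoteLines(String text) {
        return text.replaceAll("(?m)^", "> ");
    }

    public static void main(String[] args) {
        String text = "first line\nSecond line";

        // (?i): case-insensitive matching.
        System.out.println(wholeMatch("(?i)hello", "HELLO"));         // true

        // Without (?s) the dot stops at the line break; with it the dot crosses it.
        System.out.println(wholeMatch("first.*line", text));          // false
        System.out.println(wholeMatch("(?s)first.*line", text));      // true

        // (?m): ^ matches at the start of every line, not only the first.
        System.out.println(quoteLines(text));
    }
}
```

The last call prints both lines with a leading "> ", because in multi-line mode ^ also matches right after the line break.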

In a Java string literal, the backslash (\) is an escape character with its own built-in meaning. In source code you therefore write two backslashes (\\) to represent a single backslash character. To define the regular expression \w, write \\w in the .java source file. To match one backslash in the text, write four backslashes (\\\\) in the source code.

4. Regular expressions and the String class

4.1. Regex-related methods of the String class

The String class in Java has four built-in methods that support regular expressions: matches(), split(), replaceFirst(), and replaceAll(). Note that replace() performs plain string substitution and does not support regular expressions.

These methods are not optimized for performance. The classes designed for that are discussed later.

Method    Description
s.matches("regex")    Determines whether the string s matches the regular expression "regex". Returns true only if the entire string matches.
s.split("regex")    Splits the string around matches of "regex" and returns a String[] array. The delimiters matched by the regex are not included in the result.
s.replaceFirst("regex", "replacement")    Replaces the first match of "regex" with "replacement".
s.replaceAll("regex", "replacement")    Replaces every match of "regex" with "replacement".
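The table can be tied to the backslash rule from section 3.7 with a short sketch. It splits a path on a literal backslash (four backslashes in source) and contrasts replace() with replaceAll(). The class name, the helper splitOnBackslash, and the sample path are assumptions made up for this example.

```java
import java.util.Arrays;

public class StringRegexMethodsDemo {

    // Four backslashes in source form the regex \\ which matches one literal backslash.
    static String[] splitOnBackslash(String path) {
        return path.split("\\\\");
    }

    public static void main(String[] args) {
        String path = "C:\\temp\\log 2024.txt"; // the text C:\temp\log 2024.txt

        System.out.println(Arrays.toString(splitOnBackslash(path))); // [C:, temp, log 2024.txt]

        System.out.println(path.matches(".*\\.txt")); // true: the whole string matches
        System.out.println(path.matches("\\.txt"));   // false: only a substring would match

        System.out.println("a1b22c".replaceFirst("\\d+", "-")); // a-b22c
        System.out.println("a1b22c".replaceAll("\\d+", "-"));   // a-b-c

        System.out.println("a.b.c".replace(".", "-"));    // a-b-c (literal dot)
        System.out.println("a.b.c".replaceAll(".", "-")); // ----- (dot is a wildcard)
    }
}
```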

The following class shows these methods in use.

package de.vogella.regex.test;

public class RegexTestStrings {
    public static final String EXAMPLE_TEST = "This is my small example "
            + "string which I'm going to " + "use for pattern matching.";

    public static void main(String[] args) {
        System.out.println(EXAMPLE_TEST.matches("\\w.*"));
        String[] splitString = (EXAMPLE_TEST.split("\\s+"));
        System.out.println(splitString.length); // number of words
        for (String string : splitString) {
            System.out.println(string);
        }
        // Replace all whitespace with tabs
        System.out.println(EXAMPLE_TEST.replaceAll("\\s+", "\t"));
    }
}
4.2. Example

Some examples of regular expression usage are given below; see the comments for details.

If you want to test these examples, place the Java files in a Java package such as de.vogella.regex.string.

package de.vogella.regex.string;

public class StringMatcher {
    // Returns true if the string is exactly "true"
    public boolean isTrue(String s) {
        return s.matches("true");
    }

    // Returns true if the string is exactly "true" or "True"
    public boolean isTrueVersion2(String s) {
        return s.matches("[tT]rue");
    }

    // Returns true if the string is exactly "true", "True", "yes", or "Yes"
    public boolean isTrueOrYes(String s) {
        return s.matches("[tT]rue|[yY]es");
    }

    // Returns true if the string contains "true"
    public boolean containsTrue(String s) {
        return s.matches(".*true.*");
    }

    // Returns true if the string consists of exactly three letters
    public boolean isThreeLetters(String s) {
        return s.matches("[a-zA-Z]{3}");
        // Equivalent, but clumsier:
        // return s.matches("[a-zA-Z][a-zA-Z][a-zA-Z]");
    }

    // Returns true if the string does not start with a digit
    public boolean isNoNumberAtBeginning(String s) {
        // "^\\D.*" may be a little better
        return s.matches("^[^\\d].*");
    }

    // Returns true if the string consists only of word characters other than 'b'
    public boolean isIntersection(String s) {
        return s.matches("([\\w&&[^b]])*");
    }

    // Returns true if the string contains a number less than 300
    public boolean isLessThenThreeHundred(String s) {
        return s.matches("[^0-9]*[12]?[0-9]{1,2}[^0-9]*");
    }

}

A small JUnit test validates the examples.

package de.vogella.regex.string;

import org.junit.Before;
import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class StringMatcherTest {
    private StringMatcher m;

    @Before
    public void setup() {
        m = new StringMatcher();
    }

    @Test
    public void testIsTrue() {
        assertTrue(m.isTrue("true"));
        assertFalse(m.isTrue("true2"));
        assertFalse(m.isTrue("True"));
    }

    @Test
    public void testIsTrueVersion2() {
        assertTrue(m.isTrueVersion2("true"));
        assertFalse(m.isTrueVersion2("true2"));
        assertTrue(m.isTrueVersion2("True"));
    }

    @Test
    public void testIsTrueOrYes() {
        assertTrue(m.isTrueOrYes("true"));
        assertTrue(m.isTrueOrYes("yes"));
        assertTrue(m.isTrueOrYes("Yes"));
        assertFalse(m.isTrueOrYes("no"));
    }

    @Test
    public void testContainsTrue() {
        assertTrue(m.containsTrue("thetruewithin"));
    }

    @Test
    public void testIsThreeLetters() {
        assertTrue(m.isThreeLetters("abc"));
        assertFalse(m.isThreeLetters("abcd"));
    }

    @Test
    public void testIsNoNumberAtBeginning() {
        assertTrue(m.isNoNumberAtBeginning("abc"));
        assertFalse(m.isNoNumberAtBeginning("1abcd"));
        assertTrue(m.isNoNumberAtBeginning("a1bcd"));
        assertTrue(m.isNoNumberAtBeginning("asdfdsf"));
    }

    @Test
    public void testIsIntersection() {
        assertTrue(m.isIntersection("1"));
        assertFalse(m.isIntersection("abcksdfkdskfsdfdsf"));
        assertTrue(m.isIntersection("skdskfjsmcnxmvjwque484242"));
    }

    @Test
    public void testLessThenThreeHundred() {
        assertTrue(m.isLessThenThreeHundred("288"));
        assertFalse(m.isLessThenThreeHundred("3288"));
        assertFalse(m.isLessThenThreeHundred("328 8"));
        assertTrue(m.isLessThenThreeHundred("1"));
        assertTrue(m.isLessThenThreeHundred("99"));
        assertFalse(m.isLessThenThreeHundred("300"));
    }

}
5. Introduction to Pattern and Matcher

The advanced features of regular expressions require the java.util.regex.Pattern and java.util.regex.Matcher classes.

First create (compile) a Pattern object that defines the regular expression. For a given string, the Pattern object then produces a Matcher object, through which you perform the regex operations on the string.

package de.vogella.regex.test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTestPatternMatcher {
    public static final String EXAMPLE_TEST = "This is my small example string which I'm going to use for pattern matching.";

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("\\w+");
        // To ignore case, you can use:
        // Pattern pattern = Pattern.compile("\\w+", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(EXAMPLE_TEST);
        // Find all matches
        while (matcher.find()) {
            System.out.print("Start index: " + matcher.start());
            System.out.print(" End index: " + matcher.end() + " ");
            System.out.println(matcher.group());
        }
        // Replace whitespace with tabs
        Pattern replace = Pattern.compile("\\s+");
        Matcher matcher2 = replace.matcher(EXAMPLE_TEST);
        System.out.println(matcher2.replaceAll("\t"));
    }
}
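Beyond start(), end(), and group(), a Matcher also exposes the individual capturing groups by index, and a compiled Pattern can be stored and reused instead of being recompiled on every call. The following sketch illustrates both points; the class name KeyValueExtractor, the key=number input format, and the method extract are hypothetical and not part of the original tutorial.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueExtractor {

    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=(\\d+)");

    // Returns every "key=number" pair found in the input as "key -> number".
    static List<String> extract(String input) {
        List<String> pairs = new ArrayList<>();
        Matcher matcher = PAIR.matcher(input);
        while (matcher.find()) {
            // group(0) is the whole match; group(1) and group(2) are the capturing groups.
            pairs.add(matcher.group(1) + " -> " + matcher.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(extract("width=80, height=24; depth=n/a"));
        // prints [width -> 80, height -> 24]
    }
}
```

The pair depth=n/a is skipped because the second group requires at least one digit.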
6. Regular expression examples

The following are common usage scenarios for regular expressions. Adjust them to your own situation as needed.

6.1. Or

Task: write a regular expression that matches lines containing the word "Joe" or "Jim", or both.

Create the de.vogella.regex.eitheror package and the following class.

package de.vogella.regex.eitheror;

import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class EitherOrCheck {
    @Test
    public void testSimpleTrue() {
        String s = "Humbapumpa Jim";
        assertTrue(s.matches(".*(Jim|Joe).*"));
        s = "Humbapumpa jom";
        assertFalse(s.matches(".*(Jim|Joe).*"));
        s = "Humbapumpa Joe";
        assertTrue(s.matches(".*(Jim|Joe).*"));
        s = "Humbapumpa Joe Jim";
        assertTrue(s.matches(".*(Jim|Joe).*"));
    }
}
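The pattern in this test also accepts names embedded in longer words. If the task is read strictly as matching the word "Joe" or "Jim", one possible variant adds word boundaries and filters a multi-line text line by line. This is a sketch under that assumption; the class name NameLineFilter and its method are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class NameLineFilter {

    // \b keeps "Joe" from matching inside a longer word such as "Joey".
    private static final Pattern NAME = Pattern.compile("\\b(Joe|Jim)\\b");

    // Returns the lines of the text that contain "Joe" or "Jim" as a whole word.
    static List<String> linesWithName(String text) {
        List<String> hits = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            // find() succeeds if the pattern occurs anywhere in the line.
            if (NAME.matcher(line).find()) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Joe was here\nJoey was not\nask Jim or Joe";
        System.out.println(linesWithName(text)); // [Joe was here, ask Jim or Joe]
    }
}
```

Using find() avoids the leading and trailing .* that matches() needs to cover the whole line.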
6.2. Matching phone numbers

Task: write a regular expression that matches phone numbers in several formats.

Suppose a phone number consists either of seven consecutive digits, or of three digits, a space or hyphen, and four more digits.

package de.vogella.regex.phonenumber;

import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class CheckPhone {

    @Test
    public void testSimpleTrue() {
        String pattern = "\\d\\d\\d([,\\s])?\\d\\d\\d\\d";
        String s = "1233323322";
        assertFalse(s.matches(pattern));
        s = "1233323";
        assertTrue(s.matches(pattern));
        s = "123 3323";
        assertTrue(s.matches(pattern));
    }
}
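Note that the character class in the test pattern lists a comma and whitespace, so a hyphen-separated number would not match it. A variant that follows the stated format literally (a space or hyphen as the separator) might look like the sketch below; the class and method names are assumptions, and the {n} quantifier replaces the repeated \d.

```java
public class PhoneNumberSketch {

    // Seven digits in a row, or three digits, one space or hyphen, then four digits.
    static boolean isPhoneNumber(String input) {
        return input.matches("\\d{3}[ -]?\\d{4}");
    }

    public static void main(String[] args) {
        System.out.println(isPhoneNumber("1233323"));    // true
        System.out.println(isPhoneNumber("123 3323"));   // true
        System.out.println(isPhoneNumber("123-3323"));   // true
        System.out.println(isPhoneNumber("123,3323"));   // false
        System.out.println(isPhoneNumber("1233323322")); // false
    }
}
```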
6.3. Checking for a certain number range

The following example checks whether the text contains at least three consecutive digits.

Create the de.vogella.regex.numbermatch package and the following class.

package de.vogella.regex.numbermatch;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.junit.Test;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class CheckNumber {

    @Test
    public void testSimpleTrue() {
        String s = "1233";
        assertTrue(test(s));
        s = "0";
        assertFalse(test(s));
        s = "KASDKF 2300 kdsdf";
        assertTrue(test(s));
        s = "99900234";
        assertTrue(test(s));
    }

    public static boolean test(String s) {
        Pattern pattern = Pattern.compile("\\d{3}");
        Matcher matcher = pattern.matcher(s);
        if (matcher.find()) {
            return true;
        }
        return false;
    }

}
6.4. Verify Hyperlink

Suppose you need to find all the valid links in a web page, excluding "javascript:" and "mailto:" links.
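One possible way to approach this task, independent of the class that follows, is a negative lookahead (section 3.5) that rejects the unwanted schemes. The sketch below is an assumption-laden illustration, not the tutorial's own implementation: the class name LinkSketch is made up, it only handles double-quoted href attributes, and a regular expression is no substitute for a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkSketch {

    // href="..." whose value does not start with javascript: or mailto:
    // (?i) makes the attribute name and the scheme names case-insensitive.
    private static final Pattern HREF = Pattern.compile(
            "(?i)href\\s*=\\s*\"(?!javascript:|mailto:)([^\"]+)\"");

    // Returns the values of all accepted href attributes, in document order.
    static List<String> links(String html) {
        List<String> result = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.com/a\">A</a> "
                + "<a href=\"mailto:x@example.com\">M</a> "
                + "<a HREF=\"javascript:void(0)\">J</a> "
                + "<a href=\"/b\">B</a>";
        System.out.println(links(html)); // [http://example.com/a, /b]
    }
}
```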

Create the de.vogella.regex.weblinks package and the following class:

package de.vogella.regex.weblinks;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkGetter {
    private Pattern htmltag;
    private Pattern link;

    public LinkGetter() {