Android learning notes ---- Regular Expressions

Source: Internet
Author: User

Regular expressions are a powerful tool for string processing. This article briefly introduces regular expression syntax and how to apply it in Java.

So what is a regular expression? Baidu Encyclopedia describes it as follows: in computer science, a regular expression is a single string used to describe or match a series of strings that conform to a certain syntax rule. In many text editors and other tools, regular expressions are used to search for and/or replace text that fits a certain pattern. Many programming languages support string operations using regular expressions; Perl, for example, has a powerful regular expression engine built in. The concept of regular expressions was initially popularized by Unix tools such as sed and grep. "Regular expression" is commonly abbreviated as "regex" or "regexp"; the plural forms include regexps, regexes, and regexen.

To understand what a regular expression is, let's take a simple example:

"aa".matches(".*");

Reading the API documentation, we find the method's signature: public boolean matches(String regex). The parameter is a regular expression. In the example above the call returns true, since ".*" matches "aa" in full.
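One detail worth illustrating is that String.matches tests the entire string against the pattern, while Matcher.find looks for a match anywhere inside it. The following is a minimal sketch; the class and method names (MatchesDemo, wholeMatch, partialMatch) are made up for illustration.

```java
import java.util.regex.Pattern;

class MatchesDemo {
    // String.matches succeeds only if the whole string fits the pattern.
    static boolean wholeMatch(String input, String regex) {
        return input.matches(regex);
    }

    // Matcher.find succeeds if any substring fits the pattern.
    static boolean partialMatch(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("aa", ".*"));   // true
        System.out.println(wholeMatch("aa", "a"));    // false: "a" covers only one character
        System.out.println(partialMatch("aa", "a"));  // true: a match exists inside the string
    }
}
```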

1. A quick look at ".", "\d", "\D", "\s", "\S", "\w", "\W"

These are called predefined character classes in regular expressions.

"." Any character (may or may not match line terminators)

Used to match any character. By default it does not match line terminators; it does when the DOTALL flag is set.

"\d" A digit: [0-9]

Used to match digits. It is equivalent to [0-9].

"\D" A non-digit: [^0-9]

Used to match non-digits. It is equivalent to [^0-9].

"\s" A whitespace character: [ \t\n\x0B\f\r]

Used to match whitespace characters: space, \t, \n, \x0B, \f, and \r. It is equivalent to [ \t\n\x0B\f\r].

"\S" A non-whitespace character: [^\s]

Used to match non-whitespace characters.

"\w" A word character: [a-zA-Z_0-9]

Used to match word characters: lowercase letters a-z, uppercase letters A-Z, the underscore, and digits.

"\W" A non-word character: [^\w]

Used to match non-word characters.
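The predefined classes above can be checked one character at a time. This is a small sketch under the assumption that default flags are used except where DOTALL is named; the class and helper names (PredefinedClassDemo, is, isDotAll) are illustrative only.

```java
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // True when the whole input matches the regex (default flags).
    static boolean is(String input, String regex) {
        return input.matches(regex);
    }

    // Same check, but with DOTALL so "." also matches line terminators.
    static boolean isDotAll(String input, String regex) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(is("7", "\\d"));       // true: a digit
        System.out.println(is("a", "\\D"));       // true: not a digit
        System.out.println(is("\t", "\\s"));      // true: tab is whitespace
        System.out.println(is("x", "\\S"));       // true: not whitespace
        System.out.println(is("_", "\\w"));       // true: underscore is a word character
        System.out.println(is("-", "\\W"));       // true: hyphen is not a word character
        System.out.println(is("\n", "."));        // false: "." skips line terminators by default
        System.out.println(isDotAll("\n", "."));  // true: DOTALL lifts that restriction
    }
}
```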


2. A brief look at sets: using "[" and "]" in regular expressions

[abc] a, b, or c

This can be described with the mathematical concept of a set: the set containing a, b, and c. It matches a, b, or c.

[^abc] Any character except a, b, or c

This is similar to taking the complement of a set, and can be read as "not": any character except a, b, and c.

[a-zA-Z] and [a-z[A-Z]]

Both mean "a through z or A through Z". The former lists the two ranges side by side; the latter writes the same thing as a union.

[a-z&&[def]] Intersection

This is an intersection: it matches d, e, or f.
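The set operations can be tried directly with String.matches. A minimal sketch follows, with an illustrative class name (SetDemo) and helper (is) that are not part of any library.

```java
class SetDemo {
    // True when the whole input matches the regex.
    static boolean is(String input, String regex) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(is("b", "[abc]"));         // true: b is in the set
        System.out.println(is("d", "[^abc]"));        // true: d is outside the set
        System.out.println(is("Q", "[a-zA-Z]"));      // true: two ranges side by side
        System.out.println(is("Q", "[a-z[A-Z]]"));    // true: the same thing written as a union
        System.out.println(is("e", "[a-z&&[def]]"));  // true: e is in both sets
        System.out.println(is("g", "[a-z&&[def]]"));  // false: g is only in a-z
    }
}
```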

3. Boundary matchers: "^", "$", "\b", "\B", "\A", "\G", "\Z", "\z"

"^" The beginning of a line

Matches the start of a string. For example, "ABC".matches("^A.*") returns true.

"$" The end of a line

Matches the end of a string. For example, "ABC".matches("ABC$") returns true.

"\b" A word boundary

Matches a word boundary. For example, "Hello world".matches("Hello\\b\\sworld") returns true.

"\B" A non-word boundary

Matches a position that is not a word boundary.

"\A" The beginning of the input

Matches the start of the input.

"\G" The end of the previous match

"\Z" The end of the input but for the final terminator, if any

"\z" The end of the input
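The less familiar boundary matchers are easier to tell apart with concrete inputs. The sketch below uses Matcher.find so that the pattern may match anywhere in the input; the class and helper names (BoundaryDemo, found, countMatches) are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // True when the regex matches somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Counts successive matches of the regex in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(found("\\bcat\\b", "a cat sat"));    // true: "cat" stands alone
        System.out.println(found("\\bcat\\b", "concatenate"));  // false: "cat" is inside a word
        System.out.println(found("\\Bcat\\B", "concatenate"));  // true: no boundary on either side
        System.out.println(found("end\\Z", "the end\n"));       // true: \Z ignores the final terminator
        System.out.println(found("end\\z", "the end\n"));       // false: \z requires the very end
        System.out.println(countMatches("\\G\\d", "12a3"));     // 2: the chain of matches breaks at "a"
    }
}
```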


3. Quantifiers: limiting the number of repetitions. There are three types: greedy quantifiers, reluctant quantifiers, and possessive quantifiers.

1. Greedy quantifiers

The three types are written in much the same way, but their matching strategies differ, so the results sometimes differ.

X?  X, once or not at all

X appears once or not at all.

X*  X, zero or more times

X appears zero or more times.

X+  X, one or more times

X appears one or more times.

X{n}  X, exactly n times

X appears exactly n times.

X{n,}  X, at least n times

X appears at least n times.

X{n,m}  X, at least n but not more than m times

X appears between n and m times, inclusive.
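The repetition counts listed above can be verified with whole-string matches on runs of the letter "a". This is only a sketch; QuantifierDemo and is are illustrative names.

```java
class QuantifierDemo {
    // True when the whole input matches the regex.
    static boolean is(String input, String regex) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(is("", "a?"));          // true: zero occurrences allowed
        System.out.println(is("aa", "a?"));        // false: at most one occurrence
        System.out.println(is("", "a*"));          // true: zero or more
        System.out.println(is("", "a+"));          // false: at least one required
        System.out.println(is("aaa", "a{3}"));     // true: exactly three
        System.out.println(is("aa", "a{3,}"));     // false: fewer than three
        System.out.println(is("aaa", "a{2,3}"));   // true: within the range
        System.out.println(is("aaaa", "a{2,3}"));  // false: more than three
    }
}
```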

2. Reluctant quantifiers

X??, X*?, X+?, X{n}?, X{n,}?, X{n,m}? Their meanings correspond to those of the greedy quantifiers.

3. Possessive quantifiers

X?+, X*+, X++, X{n}+, X{n,}+, X{n,m}+ Their meanings correspond to those of the greedy quantifiers.

So what are the differences between the three? Let's look at the following examples:

1. Greedy quantifiers:

Pattern p = Pattern.compile("\\w+\\d{2}");
Matcher m = p.matcher("aa22bb22");
if (m.find()) {
    System.out.println(m.group());
}

The output is aa22bb22.

2. Reluctant quantifiers:

Pattern p = Pattern.compile("\\w+?\\d{2}");
Matcher m = p.matcher("aa22bb22");
if (m.find()) {
    System.out.println(m.group());
}

The output is aa22.

3. Possessive quantifiers:

Pattern p = Pattern.compile("\\w++\\d{2}");
Matcher m = p.matcher("aa22bb22");
if (m.find()) {
    System.out.println(m.group());
}

Nothing is printed: find() returns false because no match is found.

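From the examples above, the differences between the three are clear, and the names describe them well. Greedy: the quantifier first takes as many characters as possible, then gives characters back one at a time (backtracks) if the rest of the pattern cannot match, so the match is as long as possible. Reluctant: the quantifier first takes as few characters as possible and takes more only when the rest of the pattern cannot match, so the match is as short as possible. Possessive: the quantifier takes as many characters as possible and never gives any back; if the rest of the pattern then cannot match, the match fails. In the third example, \w++ consumes all of "aa22bb22", leaving nothing for \d{2}.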
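The same contrast can be seen on a different input by putting the three strategies side by side. The sketch below assumes a hypothetical helper, firstMatch, that returns the first match or null when there is none; it is not a library method.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierModesDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: ".*" takes everything, then backs off until "foo" fits.
        System.out.println(firstMatch(".*foo", input));   // xfooxxxxxxfoo
        // Reluctant: ".*?" takes as little as possible before "foo".
        System.out.println(firstMatch(".*?foo", input));  // xfoo
        // Possessive: ".*+" takes everything and never backs off.
        System.out.println(firstMatch(".*+foo", input));  // null
    }
}
```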


4. Pattern and Matcher

These are the two most basic classes for working with regular expressions in Java. Both are used in the examples above, and they provide many practical methods; consult the API documentation when you need them in practice.
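A typical pattern of use is to compile a Pattern once, obtain a Matcher for each input, and loop over find() while reading capturing groups. The following is a minimal sketch; the key=number input format and the names PatternMatcherDemo, PAIR, and pairs are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternMatcherDemo {
    // Compiled once and reused; group 1 is the key, group 2 the number.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=(\\d+)");

    // Collects every key=number pair in the input as "key:number".
    static List<String> pairs(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            out.add(m.group(1) + ":" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // "c=x" is skipped because "x" is not a digit.
        System.out.println(pairs("a=1, bb=22, c=x")); // [a:1, bb:22]
    }
}
```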

Of course, given space limitations and the purpose of this article, I have covered only the most basic usage. The documentation for Pattern lists other common constructs, so refer to it when you need more.

Finally, a reminder: in a Java string literal, every backslash "\" in a regular expression must be written as a double backslash "\\", because a single backslash combines with the character after it to form an escape sequence.
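The escaping rule can be seen in a few cases, including matching a literal backslash and using Pattern.quote to treat a whole string as literal text. This is a sketch with illustrative names (EscapeDemo, is).

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // True when the whole input matches the regex.
    static boolean is(String input, String regex) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        // The literal "a\\.b" is the regex a\.b: a literal dot between a and b.
        System.out.println(is("a.b", "a\\.b"));    // true
        System.out.println(is("axb", "a\\.b"));    // false
        // Without the escape, "." matches any character.
        System.out.println(is("axb", "a.b"));      // true
        // Four backslashes in source give the regex \\, one literal backslash.
        System.out.println(is("a\\b", "a\\\\b"));  // true
        // Pattern.quote makes every character in the string literal.
        System.out.println(is("1+1", Pattern.quote("1+1"))); // true
        System.out.println(is("1+1", "1+1"));                // false: "+" is a quantifier here
    }
}
```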

