Using Pattern and Matcher to extract specific text from a large string in Java

Source: Internet
Author: User
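import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Matches xx:xx:xx, for example a time such as 12:34:56
String regex = "[0-9]{2}:[0-9]{2}:[0-9]{2}";
Pattern pattern = Pattern.compile(regex);
String input = result.trim(); // result holds the large source string
Matcher matcher = pattern.matcher(input);
while (matcher.find()) { // find() advances to each successive match
    // Log.d is assumed to be Android's android.util.Log; group(0) is the whole match
    Log.d(TAG, ">>>>>" + matcher.group(0));
}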
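The fragment above depends on its surrounding context: `result`, `TAG`, and what appears to be Android's logging call. As a minimal self-contained sketch that uses only `java.util.regex`, the same extraction might look like the following. The class name `TimeExtractor`, the method `extractTimes`, and the sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimeExtractor {
    // Two digits, a colon, two digits, a colon, two digits (xx:xx:xx).
    private static final Pattern TIME = Pattern.compile("[0-9]{2}:[0-9]{2}:[0-9]{2}");

    // Returns every xx:xx:xx substring of the input, in order of appearance.
    static List<String> extractTimes(String input) {
        List<String> times = new ArrayList<>();
        Matcher matcher = TIME.matcher(input.trim());
        while (matcher.find()) {
            times.add(matcher.group()); // group() is equivalent to group(0)
        }
        return times;
    }

    public static void main(String[] args) {
        // Hypothetical sample input.
        System.out.println(extractTimes("started 08:15:30, finished 23:59:01"));
    }
}
```

Note that the pattern checks only the shape of the text, so a string such as 99:99:99 is also returned.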


The following is a supplement on regular expressions:

First, basic knowledge of regular expressions

Let's start with a simple example. Suppose you are searching for a string that contains the characters "cat"; the regular expression for the search is "cat". If the search is case-insensitive, the words "catalog", "Catherine", and "sophisticated" all match.
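A minimal sketch of this search with `java.util.regex`, assuming the `Pattern.CASE_INSENSITIVE` flag is used for case-insensitive matching. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class CatSearch {
    // The literal text "cat", matched without regard to case.
    private static final Pattern CAT = Pattern.compile("cat", Pattern.CASE_INSENSITIVE);

    // True if "cat" occurs anywhere inside the word.
    static boolean containsCat(String word) {
        return CAT.matcher(word).find();
    }

    public static void main(String[] args) {
        for (String word : new String[] {"catalog", "Catherine", "sophisticated", "dog"}) {
            System.out.println(word + " -> " + containsCat(word));
        }
    }
}
```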
1.1 Period Symbol
Suppose you are playing English Scrabble and want to find three-letter words that begin with the letter "t" and end with the letter "n". Suppose also that you have an English dictionary whose entire contents you can search with regular expressions. To construct this regular expression, you can use a wildcard character, the period symbol ".". The complete expression is "t.n", which matches "tan", "ten", "tin", and "ton", but also "t#n", "tpn", and even "t n", along with many other meaningless combinations. This is because the period matches any single character, including spaces and tab characters. Whether it also matches line breaks depends on the engine and its flags; in java.util.regex it does so only when Pattern.DOTALL is set.
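The behavior of the period can be checked with a short sketch. It assumes `java.util.regex`, where the period skips line terminators unless `Pattern.DOTALL` is set; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class DotWildcard {
    private static final Pattern DEFAULT_DOT = Pattern.compile("t.n");
    private static final Pattern DOTALL_DOT = Pattern.compile("t.n", Pattern.DOTALL);

    // Whole-string match with the default meaning of ".".
    static boolean matchesDefault(String s) {
        return DEFAULT_DOT.matcher(s).matches();
    }

    // Whole-string match where "." may also match a line terminator.
    static boolean matchesDotAll(String s) {
        return DOTALL_DOT.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"tan", "t#n", "t n", "toon"}) {
            System.out.println("\"" + s + "\" -> " + matchesDefault(s));
        }
        System.out.println("t<newline>n, default -> " + matchesDefault("t\nn"));
        System.out.println("t<newline>n, DOTALL  -> " + matchesDotAll("t\nn"));
    }
}
```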
1.2 Square brackets Symbol
To solve the problem that the period symbol matches too broadly, you can list the characters that are meaningful inside square brackets ("[]"). Only the characters listed in the square brackets then take part in the match. That is, the regular expression "t[aeio]n" matches only "tan", "ten", "tin", and "ton". "toon" does not match, because a square-bracket expression matches exactly one character.
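A small sketch of the square-bracket expression using `Pattern.matches`, which requires the whole string to match. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class BracketClass {
    // "t", then exactly one of a, e, i, o, then "n".
    static boolean isTVowelN(String word) {
        return Pattern.matches("t[aeio]n", word);
    }

    public static void main(String[] args) {
        for (String word : new String[] {"tan", "ten", "tin", "ton", "tun", "t#n", "toon"}) {
            System.out.println(word + " -> " + isTVowelN(word));
        }
    }
}
```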
1.3 "or" symbol
If you want to match "toon" in addition to all the words above, you can use the "|" operator, whose basic meaning is "or". To match "toon" as well, use the regular expression "t(a|e|i|o|oo)n". You cannot use square brackets here, because they match only a single character; you must use parentheses "()". Parentheses can also be used for grouping, as described later in this article.
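The alternation can be tried out the same way. This is a minimal sketch with hypothetical class and method names.

```java
import java.util.regex.Pattern;

class Alternation {
    // "t", then one of the alternatives a, e, i, o, or oo, then "n".
    private static final Pattern WORD = Pattern.compile("t(a|e|i|o|oo)n");

    static boolean matches(String word) {
        return WORD.matcher(word).matches();
    }

    public static void main(String[] args) {
        for (String word : new String[] {"tan", "ton", "toon", "tooon", "teen"}) {
            System.out.println(word + " -> " + matches(word));
        }
    }
}
```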
1.4 Symbols that indicate the number of matches
Table One shows the symbols that indicate the number of matches. Each one specifies how many times the symbol immediately to its left may occur:
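The table itself is not reproduced in this text. As a hedged illustration, the sketch below exercises the standard quantifiers available in `java.util.regex`: `*` (zero or more), `+` (one or more), `?` (zero or one), `{n}` (exactly n), and `{n,m}` (from n to m). The sample patterns and the class name are hypothetical and are not taken from the original table.

```java
import java.util.regex.Pattern;

class Quantifiers {
    // Whole-string match of input against regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String[] regexes = {"ab*c", "ab+c", "ab?c", "ab{2}c", "ab{2,3}c"};
        String[] inputs = {"ac", "abc", "abbc", "abbbc", "abbbbc"};
        // Each quantifier applies to the "b" immediately to its left.
        for (String regex : regexes) {
            StringBuilder line = new StringBuilder(regex).append(" matches:");
            for (String input : inputs) {
                if (matches(regex, input)) {
                    line.append(' ').append(input);
                }
            }
            System.out.println(line);
        }
    }
}
```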

Suppose we want to search a text file for U.S. Social Security numbers. The format of this number is 999-99-9999. The regular expression used to match it is shown in Figure one. Inside square brackets, a hyphen ("-") has a special meaning: it denotes a range, for example from 0 to 9. The hyphens that separate the parts of the Social Security number are therefore preceded by the escape character "\". Outside square brackets this escape is optional but harmless.

Figure one: Matching all Social Security numbers of the form 123-12-1234
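The figure is not reproduced in this text. A plausible reconstruction, offered only as an assumption, is `[0-9]{3}\-[0-9]{2}\-[0-9]{4}`. The sketch below uses it to collect every such number from a piece of text; the class name, method name, and sample text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SsnFinder {
    // Assumed expression: three digits, hyphen, two digits, hyphen, four digits.
    // The backslashes are doubled because this is a Java string literal.
    private static final Pattern SSN = Pattern.compile("[0-9]{3}\\-[0-9]{2}\\-[0-9]{4}");

    // Returns every substring of text that has the 999-99-9999 shape.
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = SSN.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("a 123-12-1234 b 123121234 c 987-65-4321"));
    }
}
```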

Suppose you want the hyphens to be optional when searching, so that both 999-99-9999 and 999999999 are in the correct format. In that case, you can add the "?" quantifier after each hyphen, as shown in Figure two:

Figure two: Matching all Social Security numbers of the forms 123-12-1234 and 123121234
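Again the figure is missing here, so the expression below, `[0-9]{3}\-?[0-9]{2}\-?[0-9]{4}`, is an assumed reconstruction. The sketch checks whole strings against it; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class OptionalHyphenSsn {
    // Assumed expression: each hyphen is followed by "?", so it may appear zero or one time.
    private static final Pattern SSN = Pattern.compile("[0-9]{3}\\-?[0-9]{2}\\-?[0-9]{4}");

    static boolean isSsn(String s) {
        return SSN.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123-12-1234", "123121234", "123-121234", "12-312-1234"}) {
            System.out.println(s + " -> " + isSsn(s));
        }
    }
}
```

Because each hyphen is optional on its own, a mixed form such as 123-121234 is accepted as well.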

Let's look at another example. One format for U.S. car license plates is four digits followed by two letters. Its regular expression consists of the number part "[0-9]{4}" followed by the letter part "[A-Z]{2}". Figure three shows the complete regular expression.

Figure three: Match a typical U.S. car license number, such as 8836KV
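A minimal sketch of the license plate check, assuming the complete expression is the two parts joined as `[0-9]{4}[A-Z]{2}`. The range `[A-Z]` is used for the letters because `[A-z]` would also span the punctuation characters that sit between "Z" and "a" in ASCII, such as "_". The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class LicensePlate {
    // Four digits followed by two uppercase letters.
    private static final Pattern PLATE = Pattern.compile("[0-9]{4}[A-Z]{2}");

    // For comparison only: the wider range A-z also covers [ \ ] ^ _ and `.
    private static final Pattern TOO_WIDE = Pattern.compile("[0-9]{4}[A-z]{2}");

    static boolean isPlate(String s) {
        return PLATE.matcher(s).matches();
    }

    static boolean isPlateTooWide(String s) {
        return TOO_WIDE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println("8836KV -> " + isPlate("8836KV"));
        System.out.println("8836K_ -> " + isPlate("8836K_"));
        System.out.println("8836K_ with [A-z] -> " + isPlateTooWide("8836K_"));
    }
}
```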

1.5 "not" symbol
The "^" symbol is called the "not" symbol. When it is the first character inside square brackets, "^" means that the listed characters must not match. For example, the regular expression in Figure four matches all words except those that begin with the letter "X".

Figure four: Matching all words except those beginning with "X"
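The figure is not available in this text. One expression with the described effect, given here purely as an assumption, is `[^X][a-z]+`: one character that is not "X", followed by one or more lowercase letters. The class and method names in the sketch are hypothetical.

```java
import java.util.regex.Pattern;

class NotStartingWithX {
    // Assumed expression: any single character except "X", then lowercase letters.
    private static final Pattern WORD = Pattern.compile("[^X][a-z]+");

    static boolean matches(String word) {
        return WORD.matcher(word).matches();
    }

    public static void main(String[] args) {
        for (String word : new String[] {"Apple", "Xerox", "xenon"}) {
            System.out.println(word + " -> " + matches(word));
        }
    }
}
```

The negation is case-sensitive, so a lowercase "x" at the start is still accepted; `[^Xx]` would exclude both.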

1.6 Parentheses and whitespace symbols
Suppose you want to extract the month portion from a birthday formatted like "June 26, 1951". A regular expression that matches such a date is shown in Figure five:

Figure five: Matching all dates in Month DD, YYYY format

The newly introduced "\s" symbol is the whitespace symbol; it matches any whitespace character, including the tab character. If the string matches, how do you extract the month portion? Simply create a group by putting parentheses around the month, and then extract its value with the ORO API (discussed in detail later in this article). The modified regular expression is shown in Figure six:

Figure six: Matching all dates in Month DD, YYYY format, with the month defined as the first group
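The article extracts the group with the ORO API. As an alternative sketch, the same idea can be shown with the standard `java.util.regex` package, where `Matcher.group(1)` returns the text captured by the first pair of parentheses. The expression `([A-Za-z]+)\s+[0-9]{1,2},\s*[0-9]{4}` is an assumed reconstruction of the missing figure, and the class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MonthExtractor {
    // Group 1 is the month name; \s matches whitespace such as spaces and tabs.
    private static final Pattern DATE =
            Pattern.compile("([A-Za-z]+)\\s+[0-9]{1,2},\\s*[0-9]{4}");

    // Returns the month if the whole string has the Month DD, YYYY shape.
    static Optional<String> extractMonth(String date) {
        Matcher matcher = DATE.matcher(date);
        if (matcher.matches()) {
            return Optional.of(matcher.group(1));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractMonth("June 26, 1951"));
        System.out.println(extractMonth("26/06/1951"));
    }
}
```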

1.7 Other symbols
For simplicity, you can use shortcut symbols that are defined for common regular expressions, as shown in Table two:
Table two: Common symbols

For example, in the earlier Social Security number example, "\d" can be used everywhere "[0-9]" appears. The modified regular expression is shown in Figure seven:

Figure seven: Matching all Social Security numbers of the form 123-12-1234
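A short sketch comparing the two spellings, assuming the expressions `[0-9]{3}\-[0-9]{2}\-[0-9]{4}` and `\d{3}\-\d{2}\-\d{4}` (the figures themselves are not reproduced here). In `java.util.regex`, `\d` matches only the ASCII digits 0 to 9 by default, so the two forms agree. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class DigitShorthand {
    // In a Java string literal each regex backslash is written twice.
    private static final Pattern LONG_FORM = Pattern.compile("[0-9]{3}\\-[0-9]{2}\\-[0-9]{4}");
    private static final Pattern SHORT_FORM = Pattern.compile("\\d{3}\\-\\d{2}\\-\\d{4}");

    static boolean matchesLong(String s) {
        return LONG_FORM.matcher(s).matches();
    }

    static boolean matchesShort(String s) {
        return SHORT_FORM.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123-12-1234", "123-12-123", "abc-12-1234"}) {
            System.out.println(s + " -> long: " + matchesLong(s) + ", short: " + matchesShort(s));
        }
    }
}
```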

Second, the Jakarta-ORO library
There are a number of open source regular expression libraries available to Java programmers, and many of them support Perl 5-compatible regular expression syntax. The one used here is the Jakarta-ORO regular expression library, one of the most comprehensive regular expression APIs; it is fully compatible with Perl 5 regular expressions and is also one of the best-optimized APIs.
The Jakarta-ORO library was formerly called OROMatcher; Daniel Savarese generously donated it to the Jakarta Project. You can download it by following the instructions in the final reference listed in this article.
