If you have ever used Perl or any other language with built-in regular expression support, you know how easy regular expressions make text processing and pattern matching. If the term is unfamiliar, a "regular expression" is a string of characters that defines a pattern used to search for matching strings.
Many languages, including Perl, PHP, Python, JavaScript, and JScript, support regular expressions for text processing, and some text editors use them to implement advanced search-and-replace functionality. So what about Java? At the time of writing, a Java Specification Request (JSR) covering regular expressions for text processing has been approved, and you can expect to see it in the next version of the JDK.
However, what if you need regular expressions now? You can download the open source Jakarta-ORO library from apache.org. The rest of this article first gives a brief introduction to regular expressions and then uses the Jakarta-ORO API as an example to show how to use them.
Basic knowledge of regular expressions
Let's start with a simple example. Suppose you want to search for strings that contain the characters "cat"; the regular expression for the search is simply "cat". If the search is case-insensitive, the words "catalog", "Catherine", and "sophisticated" all match. In other words, the pattern matches any string that contains the letters c, a, and t directly one after another.
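The case-insensitive "cat" search can be sketched in a few lines. This is a minimal illustration that assumes the java.util.regex package found in later JDKs rather than the Jakarta-ORO API; the class name CatSearchDemo and the method containsCat are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; uses java.util.regex as a stand-in for Jakarta-ORO.
final class CatSearchDemo {
    // The literal pattern "cat", compiled so that case is ignored.
    static final Pattern CAT = Pattern.compile("cat", Pattern.CASE_INSENSITIVE);

    // Returns true if "cat" occurs anywhere in the text, in any mix of case.
    static boolean containsCat(String text) {
        return CAT.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] words = {"catalog", "Catherine", "sophisticated", "dog"};
        for (String word : words) {
            System.out.println(word + " -> " + containsCat(word));
        }
    }
}
```

Note that find() looks for the pattern anywhere inside the input, which is why "sophisticated" counts as a match even though "cat" appears in the middle of the word.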
1.1 Period Symbol
Suppose you are playing Scrabble and want to find three-letter words that begin with the letter "t" and end with the letter "n". Suppose also that you have an English dictionary whose entire contents you can search with regular expressions. To construct this regular expression, you can use a wildcard character, the period symbol ".". The complete expression is then "t.n", which matches "tan", "ten", "tin", and "ton", but also matches "t#n", "tpn", and even "t n", along with many other meaningless combinations. This is because the period symbol matches any single character, including spaces and tab characters. (In Perl 5 style engines it matches a line break only when single-line mode is enabled.)
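To see how broad the period wildcard is, here is a small sketch, again assuming java.util.regex instead of Jakarta-ORO; PeriodDemo and isMatch are hypothetical names. One caveat about this particular engine: in java.util.regex the period does not match a line terminator unless the Pattern.DOTALL flag is set, so the last input below is rejected by the default pattern.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; uses java.util.regex as a stand-in for Jakarta-ORO.
final class PeriodDemo {
    // "t", then any single character, then "n".
    static final Pattern T_ANY_N = Pattern.compile("t.n");

    // matches() requires the whole input to fit the pattern, not just a part of it.
    static boolean isMatch(String candidate) {
        return T_ANY_N.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] candidates = {"tan", "ten", "tin", "ton", "t#n", "tpn", "t n", "toon", "t\nn"};
        for (String candidate : candidates) {
            // Show line breaks as a visible escape so the output stays on one line.
            String shown = candidate.replace("\n", "\\n");
            System.out.println("\"" + shown + "\" -> " + isMatch(candidate));
        }
    }
}
```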
1.2 Square Bracket Symbol
To solve the problem of the period symbol matching too broadly, you can list the characters that make sense inside square brackets ("[]"). Only the characters listed in the brackets then take part in the match. That is, the regular expression "t[aeio]n" matches only "tan", "ten", "tin", and "ton". "toon" does not match, because a bracket expression matches exactly one character.
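The effect of the bracket expression can be checked with a short sketch, which assumes java.util.regex in place of Jakarta-ORO; BracketDemo and isMatch are hypothetical names used only here.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; uses java.util.regex as a stand-in for Jakarta-ORO.
final class BracketDemo {
    // "t", then exactly one of a, e, i, o, then "n".
    static final Pattern T_VOWEL_N = Pattern.compile("t[aeio]n");

    // matches() requires the whole input to fit the pattern.
    static boolean isMatch(String candidate) {
        return T_VOWEL_N.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] candidates = {"tan", "ten", "tin", "ton", "toon", "t#n", "tpn"};
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + isMatch(candidate));
        }
    }
}
```

Compared with the period wildcard, the meaningless combinations such as "t#n" and "tpn" are now rejected, and so is "toon", since the brackets stand for a single character.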
1.3 "Or" Symbol
If you want to match "toon" in addition to all the words above, you can use the "|" operator, whose basic meaning is "or". To match "toon" as well, use the regular expression "t(a|e|i|o|oo)n". You cannot use square brackets here, because square brackets match only a single character; you must use parentheses "()" instead. Parentheses can also be used for grouping, as described later in this article.
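A brief sketch of the alternation pattern follows, assuming java.util.regex rather than Jakarta-ORO; AlternationDemo and isMatch are hypothetical names for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; uses java.util.regex as a stand-in for Jakarta-ORO.
final class AlternationDemo {
    // "t", then one of the alternatives a, e, i, o, or oo, then "n".
    static final Pattern T_ALT_N = Pattern.compile("t(a|e|i|o|oo)n");

    // matches() requires the whole input to fit the pattern.
    static boolean isMatch(String candidate) {
        return T_ALT_N.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] candidates = {"tan", "ten", "tin", "ton", "toon", "tun", "tooon"};
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + isMatch(candidate));
        }
    }
}
```

For "toon", the engine first tries the alternative "o", fails to find "n" after it, and then falls back to "oo", so the order of the alternatives does not prevent the match.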