Scanner class and regular expression

Last Update:2015-01-25 Source: Internet

Author: User


    import java.util.Scanner;

    public class ScannerToString {
        public static void main(String[] args) {
            Scanner scanner = new Scanner("inputstring");
            // Printing a Scanner calls its toString(), which dumps its internal state.
            System.out.println(scanner);
        }
    }

Output:

    java.util.Scanner[delimiters=\p{javaWhitespace}+][position=0][match valid=false][need input=false][source closed=false][skipped=false][group separator=\,][decimal separator=\.][positive prefix=][negative prefix=\Q-\E][positive suffix=][negative suffix=][NaN string=\Q?\E][infinity string=\Q∞\E]

The delimiters value comes from this constant in the java.util.Scanner source:

    // A pattern for Java whitespace
    private static Pattern WHITESPACE_PATTERN = Pattern.compile(
                                                "\\p{javaWhitespace}+");
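The effect of that default delimiter, and of replacing it with another regular expression, can be sketched as follows. This is a minimal illustration rather than anything from the original article: the class name ScannerDelimiterDemo, the helper tokens(), and the sample inputs are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerDelimiterDemo {
    // Collects every token the scanner produces from the text.
    // A null delimiterRegex keeps the default \p{javaWhitespace}+ delimiter.
    static List<String> tokens(String text, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            if (delimiterRegex != null) {
                scanner.useDelimiter(delimiterRegex);
            }
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Default delimiter: any run of whitespace separates tokens.
        System.out.println(tokens("alpha  beta\tgamma\ndelta", null)); // [alpha, beta, gamma, delta]
        // Custom delimiter: a comma with optional whitespace around it.
        System.out.println(tokens("1, 2 ,3,4", "\\s*,\\s*"));          // [1, 2, 3, 4]
    }
}
```

useDelimiter() accepts any regular expression, so everything in the rest of this article applies to Scanner tokenizing as well.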

About regular expressions:

Part 1: What is a regular expression?
Before looking at regular expressions in Java, let's see what a regular expression is:
1. A regular expression is a powerful and flexible text-processing tool.
2. Technically, regular expressions operate on strings; previously these tasks generally fell to Java's String, StringBuffer, and StringTokenizer classes.
3. Regular expressions are commonly used together with I/O.
4. Regular expressions let us specify, programmatically, complex patterns of text to look for in an input string; once those patterns are found, we can process them however we like.
5. Regular expressions provide a compact, dynamic language that solves all sorts of string-processing problems (matching, selecting, editing, and validating) in a completely general way.
Part 2: How do I create a regular expression?
Now that you have a rough idea of what a regular expression is, here is how to create one.
To write regular expressions you must know the set of constructs they are built from. Only once you understand these constructs and the principles of pattern matching can you write an expression that suits your needs. It is like building a model house out of toy blocks: the blocks have to be put together in the right way, and the constructs are the blocks. The complete list of constructs is in the Javadoc for the Pattern class; it is reproduced here for easy reference (note: excerpted from the JDK 5.0 documentation).
————————————————————————————————————————————————
Reference: summary of regular-expression constructs

1) Characters
x        The character x
\\       The backslash character
\0n      The character with octal value 0n (0 <= n <= 7)
\0nn     The character with octal value 0nn (0 <= n <= 7)
\0mnn    The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh     The character with hexadecimal value 0xhh
\uhhhh   The character with hexadecimal value 0xhhhh
\t       The tab character ('\u0009')
\n       The newline (line feed) character ('\u000A')
\r       The carriage-return character ('\u000D')
\f       The form-feed character ('\u000C')
\a       The alert (bell) character ('\u0007')
\e       The escape character ('\u001B')
\cx      The control character corresponding to x

2) Character classes
[abc]           a, b, or c (simple class)
[^abc]          Any character except a, b, or c (negation)
[a-zA-Z]        a through z or A through Z, inclusive (range)
[a-d[m-p]]      a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]    d, e, or f (intersection)
[a-z&&[^bc]]    a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]   a through z, and not m through p: [a-lq-z] (subtraction)

3) Predefined character classes
.     Any character (may or may not match line terminators)
\d    A digit: [0-9]
\D    A non-digit: [^0-9]
\s    A whitespace character: [ \t\n\x0B\f\r]
\S    A non-whitespace character: [^\s]
\w    A word character: [a-zA-Z_0-9]
\W    A non-word character: [^\w]

4) POSIX character classes (US-ASCII only)
\p{Lower}    A lowercase alphabetic character: [a-z]
\p{Upper}    An uppercase alphabetic character: [A-Z]
\p{ASCII}    All ASCII: [\x00-\x7F]
\p{Alpha}    An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit}    A decimal digit: [0-9]
\p{Alnum}    An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct}    Punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph}    A visible character: [\p{Alnum}\p{Punct}]
\p{Print}    A printable character: [\p{Graph}\x20]
\p{Blank}    A space or a tab: [ \t]
\p{Cntrl}    A control character: [\x00-\x1F\x7F]
\p{XDigit}   A hexadecimal digit: [0-9a-fA-F]
\p{Space}    A whitespace character: [ \t\n\x0B\f\r]

5) java.lang.Character classes (simple Java character type)
\p{javaLowerCase}    Equivalent to java.lang.Character.isLowerCase()
\p{javaUpperCase}    Equivalent to java.lang.Character.isUpperCase()
\p{javaWhitespace}   Equivalent to java.lang.Character.isWhitespace()
\p{javaMirrored}     Equivalent to java.lang.Character.isMirrored()

6) Classes for Unicode blocks and categories
\p{InGreek}            A character in the Greek block (simple block)
\p{Lu}                 An uppercase letter (simple category)
\p{Sc}                 A currency symbol
\P{InGreek}            Any character except one in the Greek block (negation)
[\p{L}&&[^\p{Lu}]]     Any letter except an uppercase letter (subtraction)

7) Boundary matchers
^     The beginning of a line
$     The end of a line
\b    A word boundary
\B    A non-word boundary
\A    The beginning of the input
\G    The end of the previous match
\Z    The end of the input but for the final terminator, if any
\z    The end of the input

8) Greedy quantifiers
X?        X, once or not at all
X*        X, zero or more times
X+        X, one or more times
X{n}      X, exactly n times
X{n,}     X, at least n times
X{n,m}    X, at least n but not more than m times

9) Reluctant quantifiers
X??       X, once or not at all
X*?       X, zero or more times
X+?       X, one or more times
X{n}?     X, exactly n times
X{n,}?    X, at least n times
X{n,m}?   X, at least n but not more than m times

10) Possessive quantifiers
X?+       X, once or not at all
X*+       X, zero or more times
X++       X, one or more times
X{n}+     X, exactly n times
X{n,}+    X, at least n times
X{n,m}+   X, at least n but not more than m times

11) Logical operators
XY     X followed by Y
X|Y    Either X or Y
(X)    X, as a capturing group

12) Back references
\n     Whatever the nth capturing group matched

13) Quotation
\      Nothing, but quotes the following character
\Q     Nothing, but quotes all characters until \E
\E     Nothing, but ends quoting started by \Q

14) Special constructs (non-capturing)
(?:X)                 X, as a non-capturing group
(?idmsux-idmsux)      Nothing, but turns match flags on - off
(?idmsux-idmsux:X)    X, as a non-capturing group with the given flags on - off
(?=X)                 X, via zero-width positive lookahead
(?!X)                 X, via zero-width negative lookahead
(?<=X)                X, via zero-width positive lookbehind
(?<!X)                X, via zero-width negative lookbehind
(?>X)                 X, as an independent, non-capturing group
————————————————————————————————————————————————
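A few of the constructs in the summary are easier to grasp when seen in action. The following sketch tries out class intersection, a negated predefined class, quotation, a word boundary, and a lookahead. The class name ConstructDemo and all sample strings are invented for illustration.

```java
import java.util.regex.Pattern;

class ConstructDemo {
    public static void main(String[] args) {
        // Intersection with a negated class: a through z, except b and c.
        System.out.println(Pattern.matches("[a-z&&[^bc]]+", "adz"));          // true
        System.out.println(Pattern.matches("[a-z&&[^bc]]+", "abc"));          // false

        // \D is the complement of \d, so this strips everything but digits.
        System.out.println("a1b22c".replaceAll("\\D", ""));                   // 122

        // \Q...\E quotes metacharacters so that + is matched literally.
        System.out.println(Pattern.matches("\\Q1+1\\E=2", "1+1=2"));          // true

        // \b restricts the match to whole words.
        System.out.println("cat concat cat".replaceAll("\\bcat\\b", "dog"));  // dog concat dog

        // Zero-width lookahead: digits only when followed by "px", which is not consumed.
        System.out.println("10px 20em".replaceAll("\\d+(?=px)", "N"));        // Npx 20em
    }
}
```

Note that every backslash in a construct must be doubled inside a Java string literal, which is why \D appears as "\\D" in the code.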
As an example, each of the following is a valid regular expression, and all of them successfully match the character sequence "Rudolph":
First: Rudolph. Note: this is the word itself, a literal match.
Second: [rR]udolph. Note: the first letter may be either r or R, so the match also succeeds.
Third: [rR][aeiou][a-z]ol.* Note: the first letter is r or R, the second is one of the vowels a, e, i, o, u, the third is any lowercase letter from a to z, the fourth and fifth must be exactly o and l, and anything may follow, so this match succeeds as well.
Fourth: R.* Note: the first letter must be exactly R and anything may follow, so this also succeeds.
These examples use the construct .*, which deserves a brief explanation: . stands for any character, and * means that the preceding element (here, any character) may occur zero or more times. It is an extremely common construct and worth remembering.
In the same way, to write a regular expression that fits your needs, consult the construct summary above for the available elements and their matching rules.
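The four expressions can be checked against "Rudolph" with String.matches(), which succeeds only when the pattern matches the entire string. This is a small sketch; the class name RudolphDemo and the PATTERNS array are hypothetical.

```java
class RudolphDemo {
    // The four expressions discussed above.
    static final String[] PATTERNS = {
        "Rudolph", "[rR]udolph", "[rR][aeiou][a-z]ol.*", "R.*"
    };

    public static void main(String[] args) {
        for (String pattern : PATTERNS) {
            // Each line prints the pattern followed by "true".
            System.out.println(pattern + " -> " + "Rudolph".matches(pattern));
        }
    }
}
```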
Part 3: An important concept in regular expressions: quantifiers
Why are quantifiers an important concept? Readers who have the book "Thinking in Java" will see that quantifiers get a section of their own there, which shows how common they are. Quantifiers are among the most frequently used constructs when writing regular expressions, and they can be regarded as the soul of a regular expression, so they deserve attention.
Literally, a quantifier is a word that expresses quantity. What does it mean in a regular expression?
"Thinking in Java" defines it this way: "A quantifier describes the way that a pattern absorbs input text." There are three such ways: greedy, reluctant, and possessive.
Put simply, a quantifier constrains how many times a character or group may occur in a match.
The construct summary above already lists the three kinds of quantifier, so they are not repeated here. As for how they differ, here is the explanation from "Thinking in Java"; choose among them according to your needs.
1) Greedy. Quantifiers are greedy unless otherwise altered. A greedy expression finds as many possible matches for the pattern as possible. A typical cause of problems is assuming that the pattern will match only the first possible group of characters, when it is actually greedy and will keep going.
2) Reluctant. Specified with a question mark, this quantifier matches the minimum number of characters necessary to satisfy the pattern. It is also called lazy, minimal matching, non-greedy, or ungreedy.
3) Possessive. The book describes these as available only in Java and as more advanced, so you will probably not use them right away; some other engines, such as PCRE, now support them as well. As a regular expression is applied to a string, it generates many states so that it can backtrack if the match fails. Possessive quantifiers do not keep those intermediate states and thus prevent backtracking. They can be used to keep a regular expression from running away and to make it execute more efficiently.
Note: a quantifier applies only to the element immediately before it, so the expression being quantified often needs to be enclosed in parentheses. For example, abc+ matches ab followed by one or more c, whereas (abc)+ matches one or more repetitions of abc.
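The difference between the three kinds of quantifier, and the effect of parentheses, shows up clearly when the same input is matched with each variant. The sketch below is illustrative only: QuantifierDemo, firstMatch(), and the sample input are made up for this purpose.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first match of regex in input, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<b>one</b><b>two</b>";
        // Greedy: .+ grabs as much as it can, then backs off to the last '>'.
        System.out.println(firstMatch("<.+>", input));   // <b>one</b><b>two</b>
        // Reluctant: .+? stops at the first '>' that completes the pattern.
        System.out.println(firstMatch("<.+?>", input));  // <b>
        // Possessive: .++ swallows the rest of the input and never gives any back,
        // so the final '>' can never match.
        System.out.println(firstMatch("<.++>", input));  // null

        // Parentheses decide what the quantifier applies to.
        System.out.println(firstMatch("(abc)+", "abcabcc")); // abcabc
        System.out.println(firstMatch("abc+", "abcabcc"));   // abc
    }
}
```

The possessive case fails here by design; possessive quantifiers pay off when you know that backtracking could never produce a match and only costs time.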
Part 4: Pattern and Matcher
With the basics covered, let's look at how regular expressions are realized in Java, that is, how to use them in code.
In Java, regular expressions are implemented by two classes in the java.util.regex package: Pattern and Matcher. A Pattern object represents the compiled version of a regular expression. The relationship resembles that between a .java file and a .class file: one is for the programmer to read, the other for the virtual machine to execute. A regular expression must be converted into a Pattern object before it can be used, which is what the static Pattern.compile() method does. Calling matcher() on the compiled Pattern with an input string produces a Matcher object, and the Matcher gives access to the results of the matching operation.
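The compile-then-match sequence might look like this in practice. It is a minimal sketch: the class PatternMatcherDemo, the DATE pattern, and the yearOf() helper are hypothetical names chosen for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternMatcherDemo {
    // Compile once and reuse: a Pattern is immutable and can be shared freely.
    static final Pattern DATE = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Returns the year of a yyyy-mm-dd date, or -1 if the whole text is not such a date.
    static int yearOf(String text) {
        Matcher m = DATE.matcher(text);   // a Matcher binds the pattern to one input
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        System.out.println(yearOf("2015-01-25"));      // 2015
        System.out.println(yearOf("25 January 2015")); // -1
    }
}
```

Keeping the compiled Pattern in a constant avoids recompiling the expression on every call, which is the main practical reason to use Pattern and Matcher directly instead of String.matches().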
Several important methods of the Matcher class are:
1) find(): Finds multiple matches of the pattern in the CharSequence (the input string). find() works like an iterator, moving forward through the input string. A second version, find(int start), takes an integer argument giving the character position at which to start the search.
2) groupCount(): Before introducing this method, we need the concept of a group.
Groups are parts of a regular expression set off by parentheses, which can be referred to later by group number. Group 0 is the entire expression, group 1 is the first parenthesized group, and so on. Thus the expression A(B(C))D contains three groups: group 0 is ABCD, group 1 is BC, and group 2 is C.
groupCount() returns the number of groups in the pattern. Note that group 0 is not included in this count.
3) group(): Returns the input subsequence matched by the previous match operation (group 0).
4) group(int i): Returns the input subsequence captured by the given group during the previous match operation. If the match succeeded but the specified group did not match any part of the input, null is returned.
5) start(int group): Returns the start index of the subsequence captured by the given group during the previous match operation.
6) end(int group): Returns the index of the last character of the subsequence captured by the given group, plus one.
7) start(): Returns the start index of the previous match.
8) end(): Returns the index of the last character matched, plus one.
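The methods above can be exercised on the A(B(C))D expression from the text. The following sketch prints each group with its start and end offsets; GroupDemo, describe(), and the input "xxABCDxx" are assumptions made for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Lists groupCount() and, for every match, each group with its [start,end) range.
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        out.append("groupCount=").append(m.groupCount()).append('\n');
        while (m.find()) {
            // Group 0 is the whole match, so the loop runs from 0 to groupCount() inclusive.
            for (int g = 0; g <= m.groupCount(); g++) {
                out.append("group ").append(g).append(": ").append(m.group(g))
                   .append(" [").append(m.start(g)).append(',').append(m.end(g)).append(")\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe("A(B(C))D", "xxABCDxx"));
        // groupCount=2
        // group 0: ABCD [2,6)
        // group 1: BC [3,5)
        // group 2: C [4,5)
    }
}
```

Because end() is one past the last matched character, input.substring(m.start(), m.end()) always equals m.group().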
Pattern flags:
Pattern flags are parameters that affect how a regular expression matches. Their effects are listed here for reference:
Pattern.CANON_EQ: Enables canonical equivalence.
Pattern.CASE_INSENSITIVE: Enables case-insensitive matching.
Pattern.COMMENTS: Permits whitespace and comments in the pattern.
Pattern.DOTALL: Enables dotall mode, in which the expression . matches any character, including a line terminator. By default, . does not match line terminators.
Pattern.LITERAL: Enables literal parsing of the pattern.
Pattern.MULTILINE: Enables multiline mode, in which ^ and $ match at the beginning and end of each line rather than only of the whole input.
Pattern.UNICODE_CASE: Enables Unicode-aware case folding.
Pattern.UNIX_LINES: Enables Unix lines mode, in which only \n is recognized as a line terminator.
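Flags are passed as the second argument of Pattern.compile() and can be combined with the bitwise OR operator. The sketch below compares a few patterns with and without a flag; FlagDemo, found(), and the sample strings are hypothetical.

```java
import java.util.regex.Pattern;

class FlagDemo {
    // True if regex, compiled with the given flags, matches somewhere in input.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("java", 0, "I like JAVA"));                        // false
        System.out.println(found("java", Pattern.CASE_INSENSITIVE, "I like JAVA")); // true

        // By default . does not match a line terminator; DOTALL changes that.
        System.out.println(found("a.b", 0, "a\nb"));                                // false
        System.out.println(found("a.b", Pattern.DOTALL, "a\nb"));                   // true

        // MULTILINE lets ^ and $ match at the start and end of every line.
        String lines = "one\ntwo\nthree";
        System.out.println(found("^two$", 0, lines));                               // false
        System.out.println(found("^two$", Pattern.MULTILINE, lines));               // true

        // COMMENTS ignores whitespace and treats # to end of line as a comment.
        System.out.println(found("\\d+  # one or more digits", Pattern.COMMENTS, "42")); // true

        // Flags are bit masks, so several can be combined with |.
        System.out.println(found("^TWO$", Pattern.MULTILINE | Pattern.CASE_INSENSITIVE, lines)); // true
    }
}
```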
Replace operations:
Replacement is one of the most useful applications of regular expressions. Matcher offers several methods for it:
1) replaceFirst(String replacement): Replaces the first matching part of the input string with replacement.
2) replaceAll(String replacement): Replaces every matching part of the input string with replacement.
3) appendReplacement(StringBuffer sb, String replacement): Performs the replacement step by step into sb, rather than replacing only the first match as replaceFirst() does or all matches as replaceAll() does. This is a very important method because it lets you call methods and do other processing to produce replacement (replaceFirst() and replaceAll() can only put in a fixed string). With it you can programmatically pick apart the groups of each match and build powerful replacements.
4) appendTail(StringBuffer sb): Called after one or more calls to appendReplacement() to copy the remainder of the input string.
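To show why appendReplacement() matters, here is a sketch in which the replacement text is computed from each match, something a fixed replacement string cannot do. The class AppendReplacementDemo, the doubleNumbers() helper, and the sample sentence are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendReplacementDemo {
    // Doubles every integer found in the text; the replacement is computed per match.
    static String doubleNumbers(String text) {
        Matcher m = Pattern.compile("\\d+").matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int value = Integer.parseInt(m.group());
            // quoteReplacement keeps '$' and '\' in the replacement from being
            // interpreted as group references or escapes.
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(value * 2)));
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("3 apples and 12 pears")); // 6 apples and 24 pears
        // For comparison, the fixed-string variants:
        System.out.println("a1b2".replaceFirst("\\d", "#"));        // a#b2
        System.out.println("a1b2".replaceAll("\\d", "#"));          // a#b#
    }
}
```

Forgetting the final appendTail() call is a common slip: the result would then end at the last match and drop " pears".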
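The reset(CharSequence input) method:
Applies an existing Matcher object to a new character sequence. Calling reset() without arguments returns the Matcher to the beginning of its current sequence.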
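Reusing one Matcher for several inputs could be sketched like this. ResetDemo and countMatches() are hypothetical names, and the inputs are arbitrary.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetDemo {
    // Counts the matches remaining from the matcher's current position.
    static int countMatches(Matcher m) {
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("\\d+").matcher("1 22 333");
        System.out.println(countMatches(m)); // 3

        m.reset("4 5");                      // same pattern, new input
        System.out.println(countMatches(m)); // 2

        m.reset();                           // rewind to the start of the current input
        System.out.println(countMatches(m)); // 2
    }
}
```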
That covers the common uses of regular expressions in Java. To learn more, see "Thinking in Java" and "Mastering Regular Expressions, Second Edition" by Jeffrey E. F. Friedl (O'Reilly, 2002).
An example application of regular expressions:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RegexTest {
        public RegexTest() {}

        public static void main(String[] arg) {
            RegexTest main = new RegexTest();
            // The address is redacted in the source; substitute one of the form
            // name@host.tld to reproduce the output below.
            main.emailMre("[email protected]");
            main.replaceMre("3.23+4.34433-34433.3434", "f");
        }

        // Checks whether the text contains something shaped like an e-mail address.
        public void emailMre(String textStr) {
            // Assumed pattern (garbled in the source): word characters, '@',
            // word characters, a literal dot, word characters.
            String mre = "\\w+@\\w+[.]\\w+";
            Pattern p = Pattern.compile(mre);
            Matcher m = p.matcher(textStr);
            if (m.find()) {
                System.out.println("Validation succeeded!");
            } else {
                System.out.println("Validation failed!");
            }
        }

        // Reports every decimal number in the text and replaces it with requestStr.
        public void replaceMre(String textStr, String requestStr) {
            String mre = "\\d{1,8}[.]\\d{1,8}";
            StringBuffer sb = new StringBuffer();
            Pattern p = Pattern.compile(mre);
            Matcher m = p.matcher(textStr);
            int i = 1;
            while (m.find()) {
                System.out.println("Found match " + i + ": " + m.group()
                        + " position: " + m.start() + "-" + (m.end() - 1));
                m.appendReplacement(sb, requestStr);
                i++;
            }
            m.appendTail(sb);
            System.out.println("The result string after replacement is: " + sb);
        }
    }

Output:

    Validation succeeded!
    Found match 1: 3.23 position: 0-3
    Found match 2: 4.34433 position: 5-11
    Found match 3: 34433.3434 position: 13-22
    The result string after replacement is: f+f-f

http://blog.sina.com.cn/s/blog_701780040100m3z6.html
