Dark Horse Programmer - Java Basics - Regular Expressions

Source: Internet
Author: User

Regular expression: an expression that follows a defined set of rules.
Purpose: operating on strings.
Feature: specific symbols stand for code operations, which simplifies writing.

Benefit: complex string operations become simpler to write.
Drawback: the more symbols a pattern defines, the longer it gets and the harder it is to read.

Specific operations:

1. Match: the String matches() method. It matches the entire string against the rule. As soon as any position fails to match, matching stops and false is returned.

/*
 * Mobile number prefixes are limited to 13xxx, 15xxx and 18xxx.
 */
public static void checkTel() {
    String tel = "16900001111";
    String telReg = "1[358]\\d{9}";
    System.out.println(tel.matches(telReg));
}

public static void demo() {
    String str = "b23a23456789";
    String reg = "[a-zA-Z]\\d*";
    boolean b = str.matches(reg);
    System.out.println(b);
}

public static void checkQQ() {
    String qq = "123a454";
    String regex = "[1-9]\\d{4,14}";
    boolean flag = qq.matches(regex);
    if (flag)
        System.out.println(qq + "... is OK");
    else
        System.out.println(qq + "... invalid");
}

/*
 * Validate a QQ number.
 * Requirements: 5 to 15 characters, must not start with 0, digits only.
 *
 * This version meets the requirements by combining methods of the String class,
 * but the code is too complex.
 */
public static void checkQQ_1() {
    String qq = "1882345a0";
    int len = qq.length();
    if (len >= 5 && len <= 15) {
        if (!qq.startsWith("0")) { // Integer.parseInt("12a") throws NumberFormatException
            try {
                long l = Long.parseLong(qq);
                System.out.println("QQ:" + l);
            } catch (NumberFormatException e) {
                System.out.println("illegal character .......");
            }
            /*
            char[] arr = qq.toCharArray(); // 123a4
            boolean flag = true;
            for (int x = 0; x < arr.length; x++) {
                if (!(arr[x] >= '0' && arr[x] <= '9')) {
                    flag = false;
                    break;
                }
            }
            if (flag) {
                System.out.println("QQ:" + qq);
            } else {
                System.out.println("invalid character");
            }
            */
        } else {
            System.out.println("cannot start with 0");
        }
    } else {
        System.out.println("Length error");
    }
}
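The whole-string behaviour of matches() is easy to check in isolation. The following is a minimal, self-contained sketch that reuses the two rules from the methods above; the class name MatchesSketch, the helper names, and the extra sample inputs are illustrative assumptions, not part of the original code.

```java
import java.util.regex.Pattern;

class MatchesSketch {
    // Mobile numbers: 1, then 3, 5 or 8, then exactly nine more digits.
    static boolean isTel(String tel) {
        return tel.matches("1[358]\\d{9}");
    }

    // QQ numbers: 5 to 15 digits, and the first digit is not 0.
    static boolean isQQ(String qq) {
        return qq.matches("[1-9]\\d{4,14}");
    }

    public static void main(String[] args) {
        System.out.println(isTel("13900001111")); // true
        System.out.println(isTel("16900001111")); // false: the second digit is 6
        System.out.println(isQQ("123a454"));      // false: contains a letter
        System.out.println(isQQ("012345"));       // false: starts with 0

        // matches() tests the whole string, so a matching prefix is not enough.
        System.out.println("b23a23456789".matches("[a-zA-Z]\\d*")); // false
        System.out.println("b23".matches("[a-zA-Z]\\d*"));          // true

        // The static Pattern.matches call gives the same whole-string answer.
        System.out.println(Pattern.matches("[a-zA-Z]\\d*", "b23")); // true
    }
}
```

Compared with checkQQ_1, the rule "[1-9]\\d{4,14}" expresses the length check, the leading-zero check, and the digits-only check in one expression.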

 
2. Split: the String split() method.

 
public static void splitDemo(String str, String reg) {
    // String reg = " +"; // split on one or more spaces
    String[] arr = str.split(reg);
    System.out.println(arr.length);
    for (String s : arr) {
        System.out.println(s);
    }
}
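The splitDemo method above needs a caller and some input to show anything. Here is a small runnable sketch with three delimiter rules; the class name SplitSketch and the sample strings other than "erkktyqqquizzzzzo" are made up for illustration.

```java
import java.util.Arrays;

class SplitSketch {
    // Splits str around every match of the regex reg.
    static String[] split(String str, String reg) {
        return str.split(reg);
    }

    public static void main(String[] args) {
        // One or more spaces as the delimiter.
        System.out.println(Arrays.toString(split("zhangsan  lisi   wangwu", " +")));
        // A literal dot must be escaped, because an unescaped dot matches any character.
        System.out.println(Arrays.toString(split("zhangsan.lisi.wangwu", "\\.")));
        // A run of one repeated character as the delimiter: group 1 followed by itself.
        System.out.println(Arrays.toString(split("erkktyqqquizzzzzo", "(.)\\1+")));
    }
}
```

The first two calls print [zhangsan, lisi, wangwu]; the third prints [er, ty, ui, o], because the runs kk, qqq and zzzzz act as delimiters.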

3. Replace: String replaceAll(regex, str). If the regex defines a group, the second argument can use the $ symbol to refer to what that group matched, for example $1 for the first group.

 
public static void main(String[] args) {
    String str = "wer13899820.ty1234564uiod234345675f";
    // Replace each run of five or more digits in the string with #.
    // replaceAllDemo(str, "\\d{5,}", "#");

    String str1 = "erkktyqqquizzzzzo";
    // Replace each run of a repeated character with a single character: zzzz -> z.
    replaceAllDemo(str1, "(.)\\1+", "$1");
}

public static void replaceAllDemo(String str, String reg, String newStr) {
    str = str.replaceAll(reg, newStr);
    System.out.println(str);
}
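Because $ has a special meaning in the replacement string, a literal $ has to be escaped there. The sketch below shows the group reference next to the escaped form; the class and method names are illustrative assumptions, and Matcher.quoteReplacement is used here as one way to produce the escaped replacement.

```java
import java.util.regex.Matcher;

class ReplaceSketch {
    // Keeps one character from each run of a repeated character.
    static String collapseRepeats(String s) {
        return s.replaceAll("(.)\\1+", "$1");
    }

    // Replaces each run of a repeated character with a literal dollar sign.
    static String maskRepeats(String s) {
        return s.replaceAll("(.)\\1+", Matcher.quoteReplacement("$"));
    }

    // Replaces each run of five or more digits with #.
    static String maskLongNumbers(String s) {
        return s.replaceAll("\\d{5,}", "#");
    }

    public static void main(String[] args) {
        System.out.println(collapseRepeats("erkktyqqquizzzzzo")); // erktyquizo
        System.out.println(maskRepeats("erkktyqqquizzzzzo"));     // er$ty$ui$o
        System.out.println(maskLongNumbers("wer13899820.ty1234564uiod234345675f")); // wer#.ty#uiod#f
    }
}
```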

4. Get: extract the substrings of a string that match the rule.

Procedure:
1. Encapsulate the regular expression in a Pattern object.
2. Associate the Pattern object with the string to operate on.
3. The association yields a Matcher, the regex matching engine.
4. Use the matcher to operate on the substrings that match the rule, for example to extract them.

 
import java.util.regex.*;

class RegexDemo2 {
    public static void main(String[] args) {
        getDemo();
    }

    public static void getDemo() {
        String str = "ming tian jiu yao fang jia le, da jia.";
        System.out.println(str);
        String reg = "\\b[a-z]{4}\\b";

        // Encapsulate the rule in an object.
        Pattern p = Pattern.compile(reg);

        // Associate the Pattern with the string to operate on and obtain the matcher.
        Matcher m = p.matcher(str);

        // System.out.println(m.matches());
        // The matches method of the String class is implemented with Pattern and
        // Matcher objects. Wrapped in the String method it is simpler to use,
        // but it offers only that single function.

        // boolean b = m.find(); // apply the rule to the string and search for a matching substring
        // System.out.println(b);
        // System.out.println(m.group()); // get the result of the match

        // System.out.println("matches:" + m.matches());
        while (m.find()) {
            System.out.println(m.group());
            System.out.println(m.start() + "...." + m.end());
        }
    }
}

 

/*
 * Requirement: validate an email address.
 */
public static void checkMail() {
    String mail = "abc12@sina.com";
    mail = "1@1.1";
    String reg = "[a-zA-Z0-9_]+@[a-zA-Z0-9]+(\\.[a-zA-Z]+)+"; // fairly precise match
    reg = "\\w+@\\w+(\\.\\w+)+"; // less precise match
    // mail.indexOf("@") != -1
    System.out.println(mail.matches(reg));
}

/*
 * Requirement: convert the stuttered string below into 我要学编程
 * ("I want to learn programming").
 *
 * Which of the four functions should be used, or which combination?
 * How to decide:
 * 1. To find out only whether a string is valid or not, use match.
 * 2. To turn an existing string into another string, use replace.
 * 3. To turn a string into several strings in a custom way, use split.
 *    Split returns the substrings outside the rule.
 * 4. To take the substrings you need out of a string, use get.
 *    Get returns the substrings that match the rule.
 */
public static void test_1() {
    String str = "我我...我我...我要..要要...要要...学学学....学学...编编编...编程..程.程程...程...程";
    /*
     * An existing string becomes another string, so use replace.
     * 1. Remove the dots first.
     * 2. Then turn each run of repeated content into a single occurrence.
     */
    str = str.replaceAll("\\.+", "");
    System.out.println(str);

    str = str.replaceAll("(.)\\1+", "$1");
    System.out.println(str);
}

/*
 * 192.68.1.254 102.49.23.013 10.10.10.10 2.2.2.2 8.109.90.30
 * Sort the IP addresses segment by segment.
 *
 * The natural ordering of strings still works, as long as every segment has 3 digits.
 * 1. Pad each segment with the largest number of zeros any segment could need,
 *    so that every segment has at least 3 digits.
 * 2. Keep only the last 3 digits of each segment.
 *    Every IP address then has 3 digits per segment.
 */
public static void ipSort() { // requires import java.util.TreeSet
    String ip = "192.68.1.254 102.49.23.013 10.10.10.10 2.2.2.2 8.109.90.30";
    ip = ip.replaceAll("(\\d+)", "00$1");
    System.out.println(ip);
    ip = ip.replaceAll("0*(\\d{3})", "$1");
    System.out.println(ip);
    String[] arr = ip.split(" ");
    TreeSet<String> ts = new TreeSet<String>();
    for (String s : arr) {
        ts.add(s);
    }
    // Strip the padding zeros again when printing.
    for (String s : ts) {
        System.out.println(s.replaceAll("0*(\\d+)", "$1"));
    }
}

/*
 * Web crawler (spider)
 */
import java.io.*;
import java.net.*;
import java.util.*;
import java.util.regex.*;

class RegexTest2 {
    public static void main(String[] args) throws Exception {
        getMails_1();
    }

    public static void getMails_1() throws Exception {
        URL url = new URL("http://192.168.1.254:8080/myweb/mail.html");
        URLConnection conn = url.openConnection();
        String mailReg = "\\w+@\\w+(\\.\\w+)+";
        Pattern p = Pattern.compile(mailReg);
        try (BufferedReader bufIn = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line = null;
            while ((line = bufIn.readLine()) != null) {
                Matcher m = p.matcher(line);
                while (m.find()) {
                    System.out.println(m.group());
                }
            }
        }
    }

    /*
     * Get the email addresses in the specified document.
     * Uses the get function: Pattern and Matcher.
     */
    public static void getMails() throws Exception {
        String mailReg = "\\w+@\\w+(\\.\\w+)+";
        Pattern p = Pattern.compile(mailReg);
        try (BufferedReader bufr = new BufferedReader(new FileReader("mail.txt"))) {
            String line = null;
            while ((line = bufr.readLine()) != null) {
                Matcher m = p.matcher(line);
                while (m.find()) {
                    System.out.println(m.group());
                }
            }
        }
    }
}
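Both methods above depend on an external resource (a web server at 192.168.1.254 or a local mail.txt). To try the extraction step on its own, the same rule can be applied to an in-memory string. This is only a sketch: the class name MailExtractSketch, the findMails helper, and the sample text are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MailExtractSketch {
    // Same loose rule as in getMails: word characters, @, then dotted word parts.
    private static final Pattern MAIL = Pattern.compile("\\w+@\\w+(\\.\\w+)+");

    // Collects every substring of text that matches the mail rule.
    static List<String> findMails(String text) {
        List<String> mails = new ArrayList<>();
        Matcher m = MAIL.matcher(text);
        while (m.find()) {
            mails.add(m.group());
        }
        return mails;
    }

    public static void main(String[] args) {
        // Hypothetical page content; the second address is invented for the example.
        String page = "<p>Contact abc12@sina.com or zhangsan@example.com.cn for details.</p>";
        System.out.println(findMails(page)); // [abc12@sina.com, zhangsan@example.com.cn]
    }
}
```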

 

java.util.regex
Class Pattern

java.lang.Object
    java.util.regex.Pattern


Construct    Matches

Characters
x            The character x
\\           The backslash character
\0n          The character with octal value 0n (0 <= n <= 7)
\0nn         The character with octal value 0nn (0 <= n <= 7)
\0mnn        The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh         The character with hexadecimal value 0xhh
\uhhhh       The character with hexadecimal value 0xhhhh
\t           The tab character ('\u0009')
\n           The newline (line feed) character ('\u000A')
\r           The carriage-return character ('\u000D')
\f           The form-feed character ('\u000C')
\a           The alert (bell) character ('\u0007')
\e           The escape character ('\u001B')
\cx          The control character corresponding to x
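A few of the character escapes listed above can be tried directly. In Java source each regex backslash is written twice inside a string literal. The class name EscapeSketch and its helper are illustrative assumptions.

```java
class EscapeSketch {
    // True when the whole input matches the regex.
    static boolean matches(String input, String regex) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matches("A", "\\x41"));   // hexadecimal value 41 is A
        System.out.println(matches("A", "\\u0041")); // four hex digits, same character
        System.out.println(matches("A", "\\0101"));  // octal value 101 is A
        System.out.println(matches("\t", "\\t"));    // the tab character
        System.out.println(matches("\\", "\\\\"));   // one literal backslash
    }
}
```

Every line prints true.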
 
Character classes
[abc]           a, b, or c (simple class)
[^abc]          Any character except a, b, or c (negation)
[a-zA-Z]        a through z or A through Z, inclusive (range)
[a-d[m-p]]      a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]    d, e, or f (intersection)
[a-z&&[^bc]]    a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]   a through z, and not m through p: [a-lq-z] (subtraction)
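Union, intersection, and subtraction inside a character class are easiest to understand by testing single characters against them. A minimal sketch follows; the class name CharClassSketch and the chosen test characters are assumptions.

```java
class CharClassSketch {
    // True when the whole input matches the regex.
    static boolean matches(String input, String regex) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matches("d", "[^abc]"));        // true: d is not a, b, or c
        System.out.println(matches("n", "[a-d[m-p]]"));    // true: n lies in m through p
        System.out.println(matches("e", "[a-d[m-p]]"));    // false: e is in neither range
        System.out.println(matches("e", "[a-z&&[def]]"));  // true: e is in both sets
        System.out.println(matches("a", "[a-z&&[def]]"));  // false: a is not d, e, or f
        System.out.println(matches("b", "[a-z&&[^bc]]"));  // false: b is subtracted
        System.out.println(matches("q", "[a-z&&[^m-p]]")); // true: q lies outside m through p
    }
}
```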
 
Predefined character classes
.     Any character (may or may not match line terminators)
\d    A digit: [0-9]
\D    A non-digit: [^0-9]
\s    A whitespace character: [ \t\n\x0B\f\r]
\S    A non-whitespace character: [^\s]
\w    A word character: [a-zA-Z_0-9]
\W    A non-word character: [^\w]
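Whether the dot matches a line terminator depends on a flag: by default it does not, and with Pattern.DOTALL it does. The sketch below shows this together with a few of the other predefined classes; the class name PredefinedSketch and the sample inputs are assumptions.

```java
import java.util.regex.Pattern;

class PredefinedSketch {
    // True when the whole input matches the regex compiled with the given flags.
    static boolean matches(String input, String regex, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("2024", "\\d+", 0));          // true: digits only
        System.out.println(matches("a_1", "\\w+", 0));           // true: letter, underscore, digit
        System.out.println(matches("a-1", "\\w+", 0));           // false: the hyphen is not a word character
        System.out.println(matches(" \t\n", "\\s+", 0));         // true: space, tab, newline
        System.out.println(matches("\n", ".", 0));               // false: no line terminators by default
        System.out.println(matches("\n", ".", Pattern.DOTALL));  // true: DOTALL lets the dot match them
    }
}
```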
 
POSIX character classes (US-ASCII only)
\p{Lower}    A lower-case alphabetic character: [a-z]
\p{Upper}    An upper-case alphabetic character: [A-Z]
\p{ASCII}    All ASCII: [\x00-\x7F]
\p{Alpha}    An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit}    A decimal digit: [0-9]
\p{Alnum}    An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct}    Punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph}    A visible character: [\p{Alnum}\p{Punct}]
\p{Print}    A printable character: [\p{Graph}\x20]
\p{Blank}    A space or a tab: [ \t]
\p{Cntrl}    A control character: [\x00-\x1F\x7F]
\p{XDigit}   A hexadecimal digit: [0-9a-fA-F]
\p{Space}    A whitespace character: [ \t\n\x0B\f\r]
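The "US-ASCII only" restriction means that, with default flags, these classes ignore letters outside ASCII. A short sketch, assuming default flags throughout; the class name PosixSketch and the sample inputs are made up for illustration.

```java
class PosixSketch {
    // True when the whole input matches the regex (default flags).
    static boolean matches(String input, String regex) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matches("cafe", "\\p{Lower}+"));  // true
        System.out.println(matches("Cafe", "\\p{Lower}+"));  // false: upper-case C
        // An accented lower-case e is outside US-ASCII, so it is not matched.
        System.out.println(matches("\u00e9", "\\p{Lower}")); // false
        System.out.println(matches("FF0a", "\\p{XDigit}+")); // true
        System.out.println(matches("!?", "\\p{Punct}+"));    // true
        System.out.println(matches("a1", "\\p{Alnum}+"));    // true
        System.out.println(matches(" \t", "\\p{Blank}+"));   // true
    }
}
```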

 

Boundary matchers
^     The beginning of a line
$     The end of a line
\b    A word boundary
\B    A non-word boundary
\A    The beginning of the input
\G    The end of the previous match
\Z    The end of the input but for the final terminator, if any
\z    The end of the input
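Boundary matchers consume no characters; they only constrain where a match may start or end. The sketch below collects all matches for a few of them. The class name BoundarySketch, the findAll helper, and the inputs other than the pinyin sentence are assumptions. Note that ^ and $ match at line starts and ends inside the input only when Pattern.MULTILINE is set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundarySketch {
    // Returns every substring of input matched by regex under the given flags.
    static List<String> findAll(String regex, int flags, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Word boundaries: only whole four-letter words are found.
        System.out.println(findAll("\\b[a-z]{4}\\b", 0, "ming tian jiu yao fang jia le")); // [ming, tian, fang]
        // Without MULTILINE, ^ matches only at the start of the input.
        System.out.println(findAll("^\\w+", 0, "one\ntwo\nthree"));                 // [one]
        System.out.println(findAll("^\\w+", Pattern.MULTILINE, "one\ntwo\nthree")); // [one, two, three]
        // Upper-case Z tolerates one final line terminator, lower-case z does not.
        System.out.println(findAll("abc\\Z", 0, "abc\n")); // [abc]
        System.out.println(findAll("abc\\z", 0, "abc\n")); // []
        // G anchors each match to the end of the previous one.
        System.out.println(findAll("\\G\\d", 0, "123a45")); // [1, 2, 3]
    }
}
```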
 
Greedy quantifiers
X?        X, once or not at all
X*        X, zero or more times
X+        X, one or more times
X{n}      X, exactly n times
X{n,}     X, at least n times
X{n,m}    X, at least n but not more than m times

Reluctant quantifiers
X??       X, once or not at all
X*?       X, zero or more times
X+?       X, one or more times
X{n}?     X, exactly n times
X{n,}?    X, at least n times
X{n,m}?   X, at least n but not more than m times

Possessive quantifiers
X?+       X, once or not at all
X*+       X, zero or more times
X++       X, one or more times
X{n}+     X, exactly n times
X{n,}+    X, at least n times
X{n,m}+   X, at least n but not more than m times
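The three quantifier families accept the same repetition counts and differ only in how much they consume and whether they give characters back. A small sketch on one input makes the difference visible; the class name QuantifierSketch, the findAll helper, and the input string are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierSketch {
    // Returns every substring of input matched by regex.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: takes everything, then backs off just enough for "foo" to match.
        System.out.println(findAll(".*foo", input));  // [xfooxxxxxxfoo]
        // Reluctant: takes as little as possible, so two shorter matches are found.
        System.out.println(findAll(".*?foo", input)); // [xfoo, xxxxxxfoo]
        // Possessive: takes everything and never backs off, so "foo" cannot match.
        System.out.println(findAll(".*+foo", input)); // []
    }
}
```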
