[\u4e00-\u9fa5] matches Chinese characters
[\ufe30-\uffa0] matches full-width characters
Regular expression matching Chinese characters: [\u4e00-\u9fa5]
Regular expression matching double-byte characters (including Chinese characters): [^\x00-\xff]
Application: computing the length of a string (a double-byte character counts as 2, an ASCII character as 1)
The code is as follows:

    String.prototype.len = function () { return this.replace(/[^\x00-\xff]/g, "aa").length; };
Regular expression matching a blank line: \n[\s| ]*\r
Regular expression matching HTML tags: /<(.*)>.*<\/\1>|<(.*) \/>/
Regular expression matching leading and trailing whitespace: (^\s*)|(\s*$)
With these basics in hand, the rest is straightforward.
The code is as follows:

    // Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
    public static void regxChinese() {
        // String to match
        String source = "<span title='5星级酒店' class='dx dx5'>";
        // Convert the string above to lowercase
        // source = source.toLowerCase();
        // Regular expression for the string; group 1 captures the title value
        String reg_charset = "<span[^>]*?title='([0-9]*[\\s|\\S]*[\\u4e00-\\u9fa5]*)'[\\s|\\S]*class='[a-z]*[\\s|\\S]*[a-z]*[0-9]*'";
        Pattern p = Pattern.compile(reg_charset);
        Matcher m = p.matcher(source);
        while (m.find()) {
            System.out.println(m.group(1));
        }
    }
Java regular expressions can match Chinese characters by Unicode range, and it is also possible to write the Chinese characters directly into the expression.
The code is as follows:

    String reg_charset = "<span[^>]*?title='([0-9]*[\\s|\\S]*星级酒店)'[\\s|\\S]*class='[a-z]*[\\s|\\S]*[a-z]*[0-9]*'";
Some common regular expression patterns
Regular expression matching Chinese characters: [\u4e00-\u9fa5]
Commentary: Matching Chinese text can be a real headache; this expression makes it easy.
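
As a rough sketch of how this range might be used in Java, the following pulls runs of Chinese characters out of a string. The class name ChineseRuns and the method extract are hypothetical. The range U+4E00 to U+9FA5 lies inside the CJK Unified Ideographs block and does not cover the extension blocks or CJK punctuation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChineseRuns {
    // One or more consecutive characters in the range U+4E00..U+9FA5.
    private static final Pattern CHINESE = Pattern.compile("[\\u4e00-\\u9fa5]+");

    // Returns every run of Chinese characters found in the text, in order.
    static List<String> extract(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = CHINESE.matcher(text);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(extract("title='5星级酒店' class='dx5'")); // [星级酒店]
    }
}
```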
Regular expression matching double-byte characters (including Chinese characters): [^\x00-\xff]
Commentary: Can be used to compute the length of a string (a double-byte character counts as 2, an ASCII character as 1).
Regular expression matching a blank line: \n\s*\r
Commentary: Can be used to delete blank lines.
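
A minimal sketch of deleting blank lines with this pattern in Java follows. It assumes the text uses CRLF (\r\n) line endings, since the pattern needs a \r after the newline; the class name BlankLines and the method removeBlankLines are hypothetical.

```java
class BlankLines {
    // Removes each newline that is followed only by whitespace up to a
    // carriage return, which drops empty and whitespace-only lines.
    // Assumes CRLF line endings.
    static String removeBlankLines(String crlfText) {
        return crlfText.replaceAll("\\n\\s*\\r", "");
    }

    public static void main(String[] args) {
        String text = "first\r\n\r\n   \r\nsecond\r\n";
        // Prints "first" and "second" on consecutive lines.
        System.out.print(removeBlankLines(text));
    }
}
```

Text that uses bare \n line endings would not match at all, so it would need a different pattern.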
Regular expression matching HTML tags: <(\S*?)[^>]*>.*?</\1>|<.*? />
Commentary: The versions circulating online are poor. This one still matches only some cases and remains powerless against complex nested tags.
Regular expression matching leading and trailing whitespace: ^\s*|\s*$
Commentary: A very useful expression that can be used to delete whitespace (spaces, tabs, form feeds, and so on) at the beginning and end of a line.
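
For illustration, the pattern can be applied in Java roughly as below. The class name Whitespace and the method trimEnds are hypothetical. For plain spaces and control characters, the built-in String.trim() gives a similar result without a regular expression.

```java
class Whitespace {
    // Deletes whitespace at the start (^\s*) and at the end (\s*$) of the input.
    static String trimEnds(String s) {
        return s.replaceAll("^\\s*|\\s*$", "");
    }

    public static void main(String[] args) {
        System.out.println("[" + trimEnds("  \thello world \n") + "]"); // [hello world]
    }
}
```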
Regular expression matching an email address: \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
Commentary: Useful for form validation.
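
A form check along these lines might look like the sketch below. The class name EmailCheck and the method looksLikeEmail are hypothetical, and the pattern is a loose plausibility check rather than a full address validator.

```java
import java.util.regex.Pattern;

class EmailCheck {
    private static final Pattern EMAIL =
        Pattern.compile("\\w+([-+.]\\w+)*@\\w+([-.]\\w+)*\\.\\w+([-.]\\w+)*");

    // matches() requires the whole input to fit the pattern,
    // which is what a form field check usually wants.
    static boolean looksLikeEmail(String input) {
        return EMAIL.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeEmail("user.name+tag@example.com")); // true
        System.out.println(looksLikeEmail("user@localhost"));            // false
    }
}
```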
Regular expression matching a URL: [a-zA-Z]+://[^\s]*
Commentary: The versions circulating online have very limited functionality; this one meets basic needs.
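
As a small sketch, the pattern can be used to pull URLs out of free text. The class name UrlFinder and the method find are hypothetical, and the example.com and example.org addresses are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlFinder {
    // A scheme made of letters, then "://", then everything up to the next whitespace.
    private static final Pattern URL = Pattern.compile("[a-zA-Z]+://[^\\s]*");

    static List<String> find(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        // Prints [http://example.com/a?b=1, ftp://files.example.org]
        System.out.println(find("See http://example.com/a?b=1 and ftp://files.example.org now"));
    }
}
```

Because the match runs to the next whitespace, trailing punctuation such as a comma or closing parenthesis would be included in the result.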
Check whether an account name is valid (starts with a letter, 5-16 bytes long, letters, digits, and underscores allowed): ^[a-zA-Z][a-zA-Z0-9_]{4,15}$
Commentary: Useful for form validation.
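
The account rule can be sketched in Java as follows; the class name AccountName and the method isValid are hypothetical.

```java
import java.util.regex.Pattern;

class AccountName {
    // One leading letter plus 4 to 15 letters, digits, or underscores: 5 to 16 characters in total.
    private static final Pattern ACCOUNT = Pattern.compile("^[a-zA-Z][a-zA-Z0-9_]{4,15}$");

    static boolean isValid(String name) {
        return ACCOUNT.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("alice_01")); // true
        System.out.println(isValid("1alice"));   // false: starts with a digit
        System.out.println(isValid("abcd"));     // false: only 4 characters
    }
}
```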
Match domestic (Chinese) phone numbers: \d{3}-\d{8}|\d{4}-\d{7}
Commentary: Matches forms such as 0511-4405222 or 021-87888822.
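
A short sketch of checking the two formats in Java, using the sample numbers from the commentary; the class name PhoneNumber and the method isDomestic are hypothetical.

```java
import java.util.regex.Pattern;

class PhoneNumber {
    // Either a 3-digit area code with an 8-digit number,
    // or a 4-digit area code with a 7-digit number.
    private static final Pattern PHONE = Pattern.compile("\\d{3}-\\d{8}|\\d{4}-\\d{7}");

    static boolean isDomestic(String input) {
        return PHONE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDomestic("0511-4405222")); // true
        System.out.println(isDomestic("021-87888822")); // true
        System.out.println(isDomestic("021-8788882"));  // false
    }
}
```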
Match a Tencent QQ number: [1-9][0-9]{4,}
Commentary: Tencent QQ numbers start from 10000.
Match a Chinese postal code: [1-9]\d{5}(?!\d)
Commentary: Chinese postal codes have 6 digits.
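
The negative lookahead (?!\d) requires that the six digits are not followed by another digit. A sketch of extracting postal codes from text follows; the class name ZipCodes and the method find are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZipCodes {
    // Six digits, the first one non-zero, not followed by a further digit.
    private static final Pattern ZIP = Pattern.compile("[1-9]\\d{5}(?!\\d)");

    static List<String> find(String text) {
        List<String> codes = new ArrayList<>();
        Matcher m = ZIP.matcher(text);
        while (m.find()) {
            codes.add(m.group());
        }
        return codes;
    }

    public static void main(String[] args) {
        System.out.println(find("Beijing 100080, Shanghai 200000")); // [100080, 200000]
    }
}
```

The lookahead guards only the trailing side, so the last six digits of a longer number can still match. For whole-field validation, calling matches() on the input avoids that.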
Match an ID card number: \d{15}|\d{18}
Commentary: Chinese ID card numbers have 15 or 18 digits.
Match an IP address: \d+\.\d+\.\d+\.\d+
Commentary: Useful when extracting IP addresses.
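
Extraction with this pattern might be sketched as below. The class name IpFinder and the method find are hypothetical. The pattern checks only the dotted shape, so it also accepts out-of-range values such as 999.1.1.1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IpFinder {
    // Four groups of digits separated by literal dots.
    private static final Pattern IP = Pattern.compile("\\d+\\.\\d+\\.\\d+\\.\\d+");

    static List<String> find(String text) {
        List<String> addresses = new ArrayList<>();
        Matcher m = IP.matcher(text);
        while (m.find()) {
            addresses.add(m.group());
        }
        return addresses;
    }

    public static void main(String[] args) {
        // Prints [192.168.1.10, 10.0.0.1]
        System.out.println(find("Connected from 192.168.1.10 to 10.0.0.1"));
    }
}
```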