Common regular expressions, and ignoring letter case with (?i)

Source: Internet
Author: User
Tags: assert, lowercase, uppercase, letter

1. ^\d+$ // matches a non-negative integer (positive integer or 0)
2. ^[0-9]*[1-9][0-9]*$ // matches a positive integer
3. ^((-\d+)|(0+))$ // matches a non-positive integer (negative integer or 0)
4. ^-[0-9]*[1-9][0-9]*$ // matches a negative integer
5. ^-?\d+$ // matches an integer
6. ^\d+(\.\d+)?$ // matches a non-negative floating-point number (positive floating-point number or 0)
7. ^(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*))$ // matches a positive floating-point number
8. ^((-\d+(\.\d+)?)|(0+(\.0+)?))$ // matches a non-positive floating-point number (negative floating-point number or 0)
9. ^(-(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*)))$ // matches a negative floating-point number
10. ^(-?\d+)(\.\d+)?$ // matches a floating-point number
11. ^[A-Za-z]+$ // matches a string made up of the 26 English letters
12. ^[A-Z]+$ // matches a string made up of the 26 uppercase letters
13. ^[a-z]+$ // matches a string made up of the 26 lowercase letters
14. ^[A-Za-z0-9]+$ // matches a string made up of digits and the 26 English letters
15. ^\w+$ // matches a string made up of digits, the 26 English letters, or underscores
16. ^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$ // matches an e-mail address
17. ^[a-zA-Z]+://(\w+(-\w+)*)(\.(\w+(-\w+)*))*(\?\S*)?$ // matches a URL
18. Regular expression matching Chinese characters: [\u4e00-\u9fa5]
19. Matching double-byte characters (including Chinese characters): [^\x00-\xff]
20. Application: computing the length of a string (a double-byte character counts as 2, an ASCII character as 1)
String.prototype.len = function () { return this.replace(/[^\x00-\xff]/g, "aa").length; }
21. Regular expression matching a blank line: \n[\s| ]*\r
22. Regular expression matching HTML tags: /<(.*)>.*<\/\1>|<(.*) \/>/
23. Regular expression matching leading and trailing whitespace: (^\s*)|(\s*$)
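
As a rough illustration, a few of the patterns in this list could be checked in Java as follows. The class and method names (CommonPatterns, isPositiveInteger, isFloat, displayLength) are invented for this sketch. Each backslash is doubled inside a Java string literal, and displayLength is an assumed Java counterpart of the JavaScript snippet in item 20, not code from the original post.

```java
import java.util.regex.Pattern;

final class CommonPatterns {
    // Item 2: positive integer (leading zeros allowed, value must be non-zero).
    static final Pattern POSITIVE_INT = Pattern.compile("^[0-9]*[1-9][0-9]*$");
    // Item 10: optionally signed floating-point number.
    static final Pattern FLOAT = Pattern.compile("^(-?\\d+)(\\.\\d+)?$");
    // Item 19: any character outside the range \x00-\xff.
    static final Pattern DOUBLE_BYTE = Pattern.compile("[^\\x00-\\xff]");

    static boolean isPositiveInteger(String s) {
        return POSITIVE_INT.matcher(s).matches();
    }

    static boolean isFloat(String s) {
        return FLOAT.matcher(s).matches();
    }

    // Item 20: count each double-byte character as 2 and every other character as 1.
    static int displayLength(String s) {
        return DOUBLE_BYTE.matcher(s).replaceAll("aa").length();
    }

    public static void main(String[] args) {
        System.out.println(isPositiveInteger("007"));            // true
        System.out.println(isPositiveInteger("0"));              // false
        System.out.println(isFloat("-3.14"));                    // true
        System.out.println(isFloat("3."));                       // false
        System.out.println(displayLength("ab\u4e2d\u6587"));     // 6
    }
}
```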

* Regular expression use cases
* 1, ^\S+[a-z A-Z]$ cannot be empty, cannot contain spaces, English letters only
* 2, \S{6,} cannot be empty, at least six characters
* 3, ^\d+$ cannot contain spaces, digits only
* 4, (.*)(\.jpg|\.bmp)$ only JPG and BMP formats
* 5, ^\d{4}\-\d{1,2}-\d{1,2}$ only the 2004-10-22 format
* 6, ^0$ at least one item selected
* 7, ^0{2,}$ at least two items selected
* 8, ^[\s|\S]{20,}$ cannot be empty, at least 20 characters
* 9, ^\+?[a-z0-9](([-+.]|[_]+)?[a-z0-9]+)*@([a-z0-9]+(\.|\-))+[a-z]{2,6}$ e-mail
* 10, \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*([,;]\s*\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*)* multiple e-mail addresses, separated by commas or semicolons
* 11, ^(\([0-9]+\))?[0-9]{7,8}$ phone number of 7 or 8 digits, optionally preceded by an area code, for example (022)87341628
* 12, ^[a-z A-Z 0-9 _]+@[a-z A-Z 0-9 _]+(\.[a-z A-Z 0-9 _]+)+(\,[a-z A-Z 0-9 _]+@[a-z A-Z 0-9 _]+(\.[a-z A-Z 0-9 _]+)+)*$
* Letters, digits, and underscores only; must contain both @ and . ; e-mail addresses in a standard format
* ^\w+@\w+(\.\w+)+(\,\w+@\w+(\.\w+)+)*$ The expression above can also be written this way, which is more concise.
^\w+((-\w+)|(\.\w+))*\@\w+((\.|-)\w+)*\.\w+$
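
The form-validation patterns in this list can be tried out with a small harness like the one below. This is only a sketch: FormChecks and its method names are hypothetical, and the patterns are use cases 4, 5, and 11 transcribed as Java string literals.

```java
import java.util.regex.Pattern;

final class FormChecks {
    // Use case 4: file name ending in .jpg or .bmp.
    static final Pattern IMAGE = Pattern.compile("(.*)(\\.jpg|\\.bmp)$");
    // Use case 5: date written like 2004-10-22 (checks the format only, not the calendar).
    static final Pattern DATE = Pattern.compile("^\\d{4}\\-\\d{1,2}-\\d{1,2}$");
    // Use case 11: 7 or 8 digits, optionally preceded by an area code in parentheses.
    static final Pattern PHONE = Pattern.compile("^(\\([0-9]+\\))?[0-9]{7,8}$");

    static boolean isImageName(String s) {
        return IMAGE.matcher(s).matches();
    }

    static boolean isDate(String s) {
        return DATE.matcher(s).matches();
    }

    static boolean isPhone(String s) {
        return PHONE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isImageName("photo.jpg"));     // true
        System.out.println(isImageName("photo.png"));     // false
        System.out.println(isDate("2004-10-22"));         // true
        System.out.println(isDate("2004/10/22"));         // false
        System.out.println(isPhone("(022)87341628"));     // true
        System.out.println(isPhone("123456"));            // false
    }
}
```

Note that the date pattern accepts any digits in the right layout, so a value such as 2004-13-99 also passes; a real form would need a separate calendar check.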







Required conditions
final String CONDITION = "(?=.*[a-z])(?=.*[A-Z])(?=.*\\d)";

Characters that are allowed to appear
final String SPECIAL_CHAR = "[-A-Za-z0-9!$%&()/;<?{}\\[\\]^\\\\]";

Length
final String QUANTITY = "{8,16}";
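
The three fragments are presumably meant to be concatenated into a single expression. A minimal sketch under that assumption follows; the class name PasswordCheck, the field REGEX, and the method isValid are hypothetical and do not come from the original thread.

```java
final class PasswordCheck {
    // At least one lowercase letter, one uppercase letter, and one digit somewhere ahead.
    static final String CONDITION = "(?=.*[a-z])(?=.*[A-Z])(?=.*\\d)";
    // The characters that may appear at all.
    static final String SPECIAL_CHAR = "[-A-Za-z0-9!$%&()/;<?{}\\[\\]^\\\\]";
    // Between 8 and 16 of them.
    static final String QUANTITY = "{8,16}";
    // Assumed composition: the lookahead conditions first, then the allowed
    // character class repeated 8 to 16 times.
    static final String REGEX = CONDITION + SPECIAL_CHAR + QUANTITY;

    static boolean isValid(String password) {
        // String.matches requires the whole input to match.
        return password.matches(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(isValid("Abcdef12"));    // true
        System.out.println(isValid("abcdef12"));    // false: no uppercase letter
        System.out.println(isValid("Abc12"));       // false: too short
        System.out.println(isValid("Abcdef 12"));   // false: space is not an allowed character
    }
}
```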



In reply to post #1:

(?=.*[a-z]) constrains the characters that must appear after the current position. Read it as: somewhere after this point
there must be a lowercase letter (.*[a-z]). Put another way, it is a condition that the gap between two characters must satisfy.
It is only a test and does not consume any characters, because it is a zero-width lookaround, one of the non-capturing constructs.

Let's take a common example:

Expression: Win(?=XP)
Given the strings WinXP and WinNT, only the former matches when this expression is applied. Why?

When the matcher reaches (?=XP), it is at the gap after the letter n. The condition this gap must satisfy
is that the characters following it are XP: if they are, the match succeeds; otherwise it fails. Because
(?=XP) matches only the gap, XP is not included in the matched text and only Win is output. The
expression can therefore be read as: find every Win that is followed by "XP".

If we write the expression as Win(?=XP)(?=NT), it means: find every Win that is followed by "XP"
and, at the same position, by "NT". Clearly this can never be satisfied, because
(?=XP)(?=NT) states two conditions that the same gap must meet simultaneously.

Change the expression to Win(?=.*XP)(?=.*NT). Now it means that both XP and NT must appear somewhere
after Win, in any position and in any order (this is the effect of .*). This expression is of course
less efficient, because it has to perform two lookahead scans.

Take the string WincbaXPabcNT. When the matcher reaches the gap after n, it starts the lookaheads.
The first asserts .*XP, and cbaXP clearly satisfies it. Once the first assertion is done,
the second asserts .*NT, and cbaXPabcNT satisfies it. With both assertions satisfied,
the expression Win(?=.*XP)(?=.*NT) matches the Win in WincbaXPabcNT.

The same holds for WincbaNTabcXP.
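
The Win examples can be reproduced with Matcher.find, which reports exactly what was matched. This is a sketch; the class LookaheadDemo and the helper firstMatch are names made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaheadDemo {
    // Returns the text of the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The lookahead checks for XP but does not consume it, so only "Win" is matched.
        System.out.println(firstMatch("Win(?=XP)", "WinXP"));                    // Win
        System.out.println(firstMatch("Win(?=XP)", "WinNT"));                    // null
        // Two lookaheads at the same gap: XP and NT cannot both start there.
        System.out.println(firstMatch("Win(?=XP)(?=NT)", "WinXPNT"));            // null
        // With .* each condition may be satisfied anywhere after Win, in any order.
        System.out.println(firstMatch("Win(?=.*XP)(?=.*NT)", "WincbaXPabcNT"));  // Win
        System.out.println(firstMatch("Win(?=.*XP)(?=.*NT)", "WincbaNTabcXP"));  // Win
    }
}
```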

If you understand the above, (?=.*[a-z])(?=.*[A-Z])(?=.*\\d) should not be hard:
it is simply three conditions that must all hold at the same time.

The assertions are made at the very start of the expression, at index 0, that is, at the gap before the
first character. The characters after this gap must satisfy .*[a-z], .*[A-Z], and .*\\d, which means
the string must contain at least one lowercase letter, at least one uppercase letter, and at least one digit.


As for the second expression (SPECIAL_CHAR), pay attention to how characters inside [] are escaped.

^ and - have special meanings inside a [] construct.

[^abc] means any character except a, b, and c. Note that ^ has this meaning only when it comes first.
Written as [a^bc], the class simply contains the four characters a, ^, b, and c. If you need to match the ^
character itself, do not put it first; if you must put it first, escape it.

Inside [], - denotes a range of characters: [a-z] denotes the 26 lowercase letters from a to z, and
[a-zA-Z] denotes the 52 letters a-z and A-Z. Take care with the bounds of a range: if you write
[z-a], the range is checked when the expression is compiled with Pattern.compile, and an exception
is thrown. In a range, the Unicode value of the character on the right must be greater than or equal to
that of the character on the left.

To match "-" itself, avoid placing it between characters; put it at either end of the [].
For example, [-a-z] matches the 26 lowercase letters and "-". We could also write
[a-z-A-Z] to match the 52 letters and "-", but that is not intuitive; it is better to write
[a-zA-Z-] or [-a-zA-Z].
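
A short sketch of these character-class rules in Java. CharClassDemo and compiles are hypothetical names; PatternSyntaxException is the exception that Pattern.compile throws for an invalid expression.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class CharClassDemo {
    // Reports whether the expression can be compiled at all.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("d".matches("[^abc]"));     // true: d is none of a, b, c
        System.out.println("a".matches("[^abc]"));     // false
        System.out.println("^".matches("[a^bc]"));     // true: ^ is literal when not first
        System.out.println("^".matches("[\\^abc]"));   // true: escaped ^ in first position
        System.out.println("-".matches("[-a-z]"));     // true: leading - is literal
        System.out.println("-".matches("[a-zA-Z-]"));  // true: trailing - is literal
        System.out.println(compiles("[z-a]"));         // false: reversed range is rejected
    }
}
```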





2: Not starting with a given prefix, for example not starting with www


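Java Code
public class Test {
    public static void main(String[] args) {
        String[] strs = {"abc1232", "wwwadsf", "awwwfas", "wwadfsf", "", "ww", "", "www"};
        // Consume characters one at a time, refusing any position where ^www would match,
        // so only strings that start with www fail to match as a whole.
        String regex = "(?:(?!^www).)*";
        for (String str : strs) {
            System.out.printf("%-7s %s%n", str, str.matches(regex));
        }
    }
}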


(?!X) is formally called negative lookahead. It states what must not appear after the gap between two characters.
In other words, it matches a gap, and the gap matches only if the characters after it are not X.

For example, given the strings aab and aac and the expression aa(?!b), only aac yields a match:
the gap after aa must not be followed by the character b.
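
A minimal sketch of the aa(?!b) example, using Matcher.find so that the lookahead is tested on its own. The class NegativeLookaheadDemo and the helper found are invented for this illustration.

```java
import java.util.regex.Pattern;

final class NegativeLookaheadDemo {
    // True if regex matches anywhere inside input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("aa(?!b)", "aac"));  // true: aa is followed by c
        System.out.println(found("aa(?!b)", "aab"));  // false: aa is followed by b
        System.out.println(found("aa(?!b)", "aa"));   // true: nothing follows, so b does not follow
    }
}
```

The last case shows that a negative lookahead also succeeds at the end of the input, since there is no character there that could be b.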

Let's look at an example:

Java Code

public class Test {
    public static void main(String[] args) {
        String str = "AQuickBrownFoxJumpsOverTheLazyDog";
        // Split at every gap that is followed by an uppercase letter, except at the start of the string.
        String[] strs = str.split("(?<!^)(?=[A-Z])");
        for (String s : strs) {
            System.out.println(s);
        }
    }
}


This splits the string at its uppercase letters. The same parsing could be done with plain string operations,
but taking the words apart with a regular expression is more convenient and intuitive.

Because the split must not consume any characters, only zero-width
lookaround can be used for the match. There are four kinds of lookaround:

Java Code
(?=X) (?!X) (?<=X) (?<!X)


Now look at this expression: (?<!^)(?=[A-Z])

As mentioned earlier, (?!) states what must not appear after a gap, and (?<!) states what must not appear before a gap.
(?=) states what must appear after a gap, and (?<=) states what must appear before it.

The expression splits at zero-width gaps that satisfy both of the following conditions:

(?<!^) the gap must not be preceded by the start of the line, that is, it cannot be the gap in front of the first letter.
(?=[A-Z]) the gap must be followed by an uppercase letter A-Z.

The expression then matches the gaps marked | below:

Java Code
A|Quick|Brown|Fox|Jumps|Over|The|Lazy|Dog

PS: Without (?<!^), it would become: |A|Quick|Brown|Fox|Jumps|Over|The|Lazy|Dog


String.split then cuts the string into separate parts at the | positions shown above.
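
The four lookaround forms can be compared side by side on one input. The sketch below is illustrative only: LookaroundDemo, firstStart, splitCamel, and the sample string are assumptions, not part of the original post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaroundDemo {
    // Index where the first match of regex starts in input, or -1 if there is no match.
    static int firstStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    // The split from the text: cut before every uppercase letter except the first character.
    static String[] splitCamel(String s) {
        return s.split("(?<!^)(?=[A-Z])");
    }

    public static void main(String[] args) {
        String input = "foobar foobaz xbaz";
        System.out.println(firstStart("foo(?=bar)", input));   // 0: foo followed by bar
        System.out.println(firstStart("foo(?!bar)", input));   // 7: foo not followed by bar
        System.out.println(firstStart("(?<=foo)ba.", input));  // 3: ba. preceded by foo
        System.out.println(firstStart("(?<!foo)ba.", input));  // 15: ba. not preceded by foo
        System.out.println(String.join("|", splitCamel("AQuickBrownFox"))); // A|Quick|Brown|Fox
    }
}
```

On Java 8 and later, String.split is documented never to produce an empty leading substring for a zero-width match at the start of the input, so the (?<!^) guard mainly matters on older runtimes; keeping it makes the intent explicit either way.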


3, Case-insensitive matching
Without any options, matching is case-sensitive, but this can be changed in regular expressions.
There are two ways: a compile flag and an inline flag.

Let's look at an example:

Java Code
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
    public static void main(String[] args) {
        String str = "Book";
        Pattern pattern = Pattern.compile("book");
        Matcher matcher = pattern.matcher(str);
        // Prints false: matching is case-sensitive by default.
        System.out.println(matcher.matches());
    }
}


The expression book above cannot match the string Book. We only need to pass a flag at compile time:

Pattern pattern = Pattern.compile("book", Pattern.CASE_INSENSITIVE);

Pattern.CASE_INSENSITIVE is an int constant with the value 2. It makes the expression ignore case when matching.

If we do not use the Pattern and Matcher classes and only call the matches method of String,
we cannot pass compile flags, so we need an inline flag expression. The inline flag
that corresponds to Pattern.CASE_INSENSITIVE is (?i), which comes in four forms:
1, (?i)
2, (?-i)
3, (?i:X)
4, (?-i:X)
The forms without - turn the flag on; the forms with - turn it off.

Change the above code to this:

Java Code
public class Test {
    public static void main(String[] args) {
        String str = "Book";
        // (?i) switches on case-insensitive matching for the rest of the expression.
        String regex = "(?i)book";
        System.out.println(str.matches(regex));
    }
}


This gives the same result, but it is not ideal: only the B in the string is uppercase,
so there is no need to match every character case-insensitively. After turning the flag on, we can
turn it off again right away with the second form:
String regex = "(?i)b(?-i)ook";

Now only b is case-insensitive; everything after (?-i) is case-sensitive again. If writing it this way
looks awkward, the third form can mark just some characters as case-insensitive:
String regex = "(?i:b)ook";

This expression is semantically identical to the previous one, and it is arguably more efficient than switching the flag on and off.

As you can see, inline flag expressions are considerably more flexible than compile flags.

Recommendation: where the case of a character is known, write that character literally, and use
(?i:X) only for the characters whose case is uncertain, because turning on case-insensitive matching has some cost in matching performance.

Exercise: what does String regex = "(?i)b(?-i:oo)k"; mean?
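
One way to explore that question is simply to try the expressions on a few inputs. The sketch below does that; InlineFlagDemo and its constant names are hypothetical.

```java
final class InlineFlagDemo {
    // Flag on for b, then off for the rest of the expression.
    static final String ON_THEN_OFF = "(?i)b(?-i)ook";
    // Flag on for the group containing b only.
    static final String GROUP_ONLY = "(?i:b)ook";
    // Flag on everywhere except inside the (?-i:oo) group.
    static final String OFF_FOR_OO = "(?i)b(?-i:oo)k";

    public static void main(String[] args) {
        String[] inputs = {"book", "Book", "booK", "bOok", "BOOK"};
        for (String s : inputs) {
            System.out.printf("%-5s %-5b %-5b %b%n", s,
                    s.matches(ON_THEN_OFF), s.matches(GROUP_ONLY), s.matches(OFF_FOR_OO));
        }
    }
}
```

The first two expressions accept only book and Book. The third also accepts booK, because the flag is switched off for oo alone and is in effect again for k.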


Also: for items 1 and 4, I did not understand what is being asked. Please explain in more detail in a reply below.



1: Matching across lines

By default, . does not match line terminators (there are 6 line terminators; see the API documentation of Pattern for details).
As with case-insensitive matching, a compile flag changes this: Pattern.DOTALL

If matching must also be case-insensitive, add the Pattern.CASE_INSENSITIVE flag described above. For example:

Java Code
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
    public static void main(String[] args) {
        String str = "<table>\n" + "<tr>\n" + "<td>\n" + "Hello World!\n" + "</td>\n" + "</tr>\n" + "</table>";
        String regex = "<td>(.+?)</td>";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(str);
        // Print the trimmed contents of every <td> element that is found.
        while (matcher.find()) {
            System.out.println(matcher.group(1).trim());
        }
    }
}


Nothing can be extracted from str this way, because <td> is followed by a line break. We only need to change one line:

Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);

That is enough. If td must also be matched case-insensitively, change it to:

Java Code
Pattern pattern = Pattern.compile(regex, Pattern.DOTALL | Pattern.CASE_INSENSITIVE);


Now the expression extracts the text between the td tags even when TD is written in uppercase.

Like Pattern.CASE_INSENSITIVE, Pattern.DOTALL also has an inline flag expression: (?s).
The s stands for single-line: line breaks are ignored and the input is treated as a single line.

With the inline (?s) flag, the expression becomes:

Java Code
String regex = "(?s)<td>(.+?)</td>";

If matching must also be case-insensitive, add the i flag:
String regex = "(?s)(?i)<td>(.+?)</td>";

That is cumbersome, though. The flags can be combined, and their order does not matter:
String regex = "(?is)<td>(.+?)</td>";
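
As a quick check that the compile flags and the inline flags behave the same way, the variants can be run against one input. This is a sketch: DotAllDemo, HTML, and extract are names introduced here, and the sample markup uses an uppercase TD on purpose.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DotAllDemo {
    static final String HTML = "<table>\n<tr>\n<TD>\nHello World!\n</TD>\n</tr>\n</table>";

    // Returns the trimmed text of group 1 for the first match, or null when nothing matches.
    static String extract(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        Pattern plain = Pattern.compile("<td>(.+?)</td>");
        Pattern flags = Pattern.compile("<td>(.+?)</td>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
        Pattern inline = Pattern.compile("(?is)<td>(.+?)</td>");

        System.out.println(extract(plain, HTML));   // null: . stops at the line break, and TD is uppercase
        System.out.println(extract(flags, HTML));   // Hello World!
        System.out.println(extract(inline, HTML));  // Hello World!
    }
}
```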


One last point: I have seen people who do not know about DOTALL write the expression like this to make . match line terminators:

Java Code
String regex = "<td>((.|\\s)+?)</td>";


This is extremely dangerous. Because of the way alternation is matched, it can cause a stack overflow on longer strings
and crash the program. With DOTALL or (?s), this does not happen.

