Technical Summary of Regular Expressions for Data Verification

Last Update: 2018-12-07 Source: Internet

Author: User

Data verification is very common in both client/server (C/S) and browser/server (B/S) applications. In the past, we tended to use a pile of if...else statements to determine whether the input met the requirements.

Many languages now support regular expressions, which define their own set of syntax rules (common ones include character matching, repetition, position anchors, escapes, and other advanced syntax) for verifying all kinds of data. To me, their power is almost unbeatable.
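To make the contrast concrete, here is a minimal sketch that checks a six-digit postal code twice: once with hand-written conditions and once with a single pattern. The function names `IsPostalCodeManual` and `IsPostalCodeRegex` are illustrative assumptions, not part of the original article, and the code is written as C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Hand-written check: a length test plus a character-by-character loop.
static bool IsPostalCodeManual(string input)
{
    if (input == null || input.Length != 6)
    {
        return false;
    }
    foreach (char c in input)
    {
        if (c < '0' || c > '9')
        {
            return false;
        }
    }
    return true;
}

// The same rule as one pattern: exactly six ASCII digits.
// \z anchors at the very end of the input, so a trailing newline is rejected too.
static bool IsPostalCodeRegex(string input)
{
    return input != null && Regex.IsMatch(input, @"^[0-9]{6}\z");
}

Console.WriteLine(IsPostalCodeManual("210000")); // True
Console.WriteLine(IsPostalCodeRegex("210000"));  // True
Console.WriteLine(IsPostalCodeRegex("21000a"));  // False
```

Both functions accept and reject the same inputs; the regular expression simply states the rule in one line.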

But as far as I know (I may well be the frog at the bottom of the well here, so please forgive my narrow view if this offends anyone), many people who call themselves programmers, or are called programmers by others, rarely use regular expressions in their everyday work.

I do not know why. Perhaps their familiar environment is stable and they are too lazy to learn something new. Perhaps the sight of a long string such as:

^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$

is enough to discourage them. Or perhaps they have already memorized a few common regular expressions for data verification and see no need to understand the principles.

Whether this is good or bad is a matter of opinion, but I still strongly recommend that you learn them. Mastering regular expressions will make your work much easier.

Besides data verification, regular expressions can find and replace data and test for specific conditions in text and data streams, for example to remove spam from a large volume of email. In a spam-filtering application, the program uses regular expressions to determine whether a known spam address appears in the address field of a message. Mail filter programs usually use regular expressions for this operation.
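The spam check described above can be sketched as follows. The blocklist entries, the `.example` addresses, and the `IsSpamSender` name are hypothetical placeholders chosen for illustration; a real mail filter would load its patterns from its own configuration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical blocklist: each entry is a pattern describing a known spam sender.
string[] spamPatterns =
{
    @"^promo@spam\.example$",   // one exact address
    @"@bulk-mailer\.example$",  // any sender at this domain
};

// Returns true if the sender address matches any pattern in the blocklist.
bool IsSpamSender(string sender)
{
    foreach (string pattern in spamPatterns)
    {
        if (Regex.IsMatch(sender, pattern, RegexOptions.IgnoreCase))
        {
            return true;
        }
    }
    return false;
}

Console.WriteLine(IsSpamSender("PROMO@spam.example"));       // True
Console.WriteLine(IsSpamSender("news@bulk-mailer.example")); // True
Console.WriteLine(IsSpamSender("alice@example.org"));        // False
```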

Every technology's benefits come with drawbacks, and the power of regular expressions rests on the complexity of their syntax: readability is very poor! So if you want to use regular expressions well, and do not want to spend a whole day staring at a long string of alien characters written by someone else, the only way is to learn them and write them yourself.

Appendix 1:

Regular Expression Syntax:

Character     Description
\             Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, "n" matches the character "n". "\n" matches a line break. The sequence "\\" matches "\", and "\(" matches "(".
^             Matches the position at the start of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after "\n" or "\r".
$             Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before "\n" or "\r".
*             Matches the preceding character or subexpression zero or more times. For example, "zo*" matches "z" and "zoo". * is equivalent to {0,}.
+             Matches the preceding character or subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but does not match "z". + is equivalent to {1,}.
?             Matches the preceding character or subexpression zero or one time. For example, "do(es)?" matches the "do" in "do" or "does". ? is equivalent to {0,1}.
{n}           n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two "o" characters in "food".
{n,}          n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all the "o" characters in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*".
{n,m}         m and n are non-negative integers, where n <= m. Matches at least n times and at most m times. For example, "o{1,3}" matches the first three "o" characters in "fooooood". "o{0,1}" is equivalent to "o?". Note: you cannot insert spaces between the comma and the numbers.
?             When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is "non-greedy". A non-greedy pattern matches as little of the searched string as possible, while the default greedy pattern matches as much as possible. For example, in the string "oooo", "o+?" matches a single "o", while "o+" matches all the "o" characters.
.             Matches any single character except "\n". To match any character including "\n", use a pattern such as "[\s\S]".
(pattern)     Matches pattern and captures the matched subexpression. The captured match can be retrieved from the resulting Matches collection using the $0...$9 properties. To match the parenthesis characters ( ), use "\(" or "\)".
(?:pattern)   Matches pattern but does not capture the match; that is, it is a non-capturing match that is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, "industr(?:y|ies)" is a more economical expression than "industry|industries".
(?=pattern)   Positive lookahead. Matches the search string at any point where a string matching pattern begins. It is a non-capturing match; that is, the match is not captured for later use. For example, "Windows (?=95|98|NT|2000)" matches "Windows" in "Windows 2000", but does not match "Windows" in "Windows 3.1". Lookaheads do not consume characters; that is, after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern)   Negative lookahead. Matches the search string at any point where a string not matching pattern begins. It is a non-capturing match; that is, the match is not captured for later use. For example, "Windows (?!95|98|NT|2000)" matches "Windows" in "Windows 3.1", but does not match "Windows" in "Windows 2000". Lookaheads do not consume characters; that is, after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
x|y           Matches x or y. For example, "z|food" matches "z" or "food". "(z|f)ood" matches "zood" or "food".
[xyz]         Character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".
[^xyz]        Negated character set. Matches any character that is not enclosed. For example, "[^abc]" matches the "p" in "plain".
[a-z]         Character range. Matches any character in the specified range. For example, "[a-z]" matches any lowercase letter in the range "a" to "z".
[^a-z]        Negated character range. Matches any character that is not in the specified range. For example, "[^a-z]" matches any character that is not in the range "a" to "z".
\b            Matches a word boundary, that is, the position between a word and a space. For example, "er\b" matches the "er" in "never", but does not match the "er" in "verb".
\B            Matches a non-word boundary. "er\B" matches the "er" in "verb", but does not match the "er" in "never".
\cx           Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range A-Z or a-z. If it is not, c is assumed to be the literal "c" character.
\d            Matches a digit character. Equivalent to [0-9].
\D            Matches a non-digit character. Equivalent to [^0-9].
\f            Matches a form feed. Equivalent to \x0c and \cL.
\n            Matches a line break. Equivalent to \x0a and \cJ.
\r            Matches a carriage return. Equivalent to \x0d and \cM.
\s            Matches any whitespace character, including spaces, tabs, and form feeds. Equivalent to [ \f\n\r\t\v].
\S            Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t            Matches a tab. Equivalent to \x09 and \cI.
\v            Matches a vertical tab. Equivalent to \x0b and \cK.
\w            Matches any word character, including the underscore. Equivalent to [A-Za-z0-9_].
\W            Matches any non-word character. Equivalent to [^A-Za-z0-9_].
\xn           Matches n, where n is a hexadecimal escape code. The hexadecimal escape code must be exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" & "1". This allows ASCII codes to be used in regular expressions.
\num          Matches num, where num is a positive integer. A backreference to a captured match. For example, "(.)\1" matches two consecutive identical characters.
\n            Identifies an octal escape code or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape code.
\nm           Identifies an octal escape code or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither of the preceding conditions holds, \nm matches the octal escape code nm when n and m are octal digits (0-7).
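A few of the constructs in the table are easier to see in action. The sketch below assumes the .NET `Regex` class (the table itself describes a script RegExp object, but the constructs used here behave the same way in .NET). It reuses the table's own example strings where possible; "hello" is an illustrative input of my own, and the variable names are illustrative as well.

```csharp
using System;
using System.Text.RegularExpressions;

// Greedy vs. non-greedy: "o+" takes every o, "o+?" stops after the first.
string greedy = Regex.Match("oooo", "o+").Value;
string lazy = Regex.Match("oooo", "o+?").Value;
Console.WriteLine(greedy); // oooo
Console.WriteLine(lazy);   // o

// Bounded repetition: at most three o's are consumed.
string bounded = Regex.Match("fooooood", "o{1,3}").Value;
Console.WriteLine(bounded); // ooo

// Positive lookahead: the version must follow but is not part of the match.
Match ahead = Regex.Match("Windows 2000", @"Windows (?=95|98|NT|2000)");
Console.WriteLine("[" + ahead.Value + "]"); // [Windows ]

// Negative lookahead: succeeds only when none of the listed versions follows.
bool notListed = Regex.IsMatch("Windows 3.1", @"Windows (?!95|98|NT|2000)");
bool listed = Regex.IsMatch("Windows 2000", @"Windows (?!95|98|NT|2000)");
Console.WriteLine(notListed); // True
Console.WriteLine(listed);    // False

// Backreference: \1 repeats whatever group 1 captured.
string doubled = Regex.Match("hello", @"(.)\1").Value;
Console.WriteLine(doubled); // ll

// Word boundary: "er" only at the end of a word.
bool endOfWord = Regex.IsMatch("never", @"er\b");
bool midWord = Regex.IsMatch("verb", @"er\b");
Console.WriteLine(endOfWord); // True
Console.WriteLine(midWord);   // False
```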

Appendix 2: Common data verification techniques (implemented with regular expressions)

I. Network verification techniques

1. Verify the e-mail format

public bool IsEmail(string str_email)
{
    // Local part, "@", then either a bracketed IP-style host or a dotted domain name.
    return System.Text.RegularExpressions.Regex.IsMatch(str_email,
        @"^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$");
}

2. Verify the IP address

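public bool IPCheck(string IP)
{
    // One octet in the range 0-255; backslashes are doubled because this is a regular string literal.
    string num = "(25[0-5]|2[0-4]\\d|[0-1]\\d{2}|[1-9]?\\d)";
    // Four octets separated by literal dots.
    return System.Text.RegularExpressions.Regex.IsMatch(IP, ("^" + num + "\\." + num + "\\." + num + "\\." + num + "$"));
}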
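As a quick illustration of how the per-octet pattern composes, the sketch below builds the same expression from a named piece and tries it on a few inputs. The `Octet` constant, the `IsIPv4` helper, and the sample addresses are illustrative assumptions, and `\z` is used instead of `$` so that a trailing newline is rejected.

```csharp
using System;
using System.Text.RegularExpressions;

// One octet: 250-255, 200-249, three digits starting with 0 or 1, or one to two digits.
const string Octet = @"(25[0-5]|2[0-4]\d|[0-1]\d{2}|[1-9]?\d)";

// Four octets separated by literal dots, anchored at both ends.
string pattern = "^" + Octet + @"\." + Octet + @"\." + Octet + @"\." + Octet + @"\z";

bool IsIPv4(string candidate)
{
    return Regex.IsMatch(candidate, pattern);
}

Console.WriteLine(IsIPv4("192.168.0.1"));     // True
Console.WriteLine(IsIPv4("255.255.255.255")); // True
Console.WriteLine(IsIPv4("256.1.1.1"));       // False
Console.WriteLine(IsIPv4("1.2.3"));           // False
```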

3. Verify the URL

public bool IsUrl(string str_url)
{
    // "http" or "https", a dotted host name, and an optional path or query string.
    return System.Text.RegularExpressions.Regex.IsMatch(str_url, @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?");
}

II. Common number verification techniques

1. Verify the phone number

public bool IsTelephone(string str_telephone)
{
    // Optional 3-4 digit area code with a hyphen, then 6-8 digits.
    return System.Text.RegularExpressions.Regex.IsMatch(str_telephone, @"^(\d{3,4}-)?\d{6,8}$");
}

2. Password rule (both letters and digits must appear)

public bool IsPassword(string str_password)
{
    // Matches when one or more letters are immediately followed by a digit.
    return System.Text.RegularExpressions.Regex.IsMatch(str_password, @"[A-Za-z]+[0-9]");
}
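Note that the pattern above only matches when a letter is directly followed by a digit, so an input such as "123abc" fails even though it contains both. If the intent is "at least one letter and at least one digit, in any order", one possible sketch uses two lookaheads. The `HasLetterAndDigit` name is an illustrative assumption, not the article's method.

```csharp
using System;
using System.Text.RegularExpressions;

// Each lookahead scans from the start without consuming input:
// (?=.*[A-Za-z]) requires a letter somewhere, (?=.*[0-9]) requires a digit somewhere.
static bool HasLetterAndDigit(string password)
{
    return Regex.IsMatch(password, @"^(?=.*[A-Za-z])(?=.*[0-9])");
}

Console.WriteLine(HasLetterAndDigit("abc123")); // True
Console.WriteLine(HasLetterAndDigit("123abc")); // True
Console.WriteLine(HasLetterAndDigit("abcdef")); // False
Console.WriteLine(HasLetterAndDigit("123456")); // False
```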

3. Postal code

public bool IsPostalCode(string str_postalcode)
{
    // Exactly six digits.
    return System.Text.RegularExpressions.Regex.IsMatch(str_postalcode, @"^\d{6}$");
}

4. Mobile phone number

public bool IsHandset(string str_handset)
{
    // Eleven digits: 1, then 3 or 5, then nine more digits.
    return System.Text.RegularExpressions.Regex.IsMatch(str_handset, @"^1[35]\d{9}$");
}

5. ID card number

public bool IsIDCard(string str_idcard)
{
    // Either 18 digits or 15 digits.
    return System.Text.RegularExpressions.Regex.IsMatch(str_idcard, @"(^\d{18}$)|(^\d{15}$)");
}

6. Two decimal places

public bool IsDecimal(string str_decimal)
{
    // An integer part, optionally followed by a literal dot and exactly two digits.
    return System.Text.RegularExpressions.Regex.IsMatch(str_decimal, @"^[0-9]+(\.[0-9]{2})?$");
}

7. Month of the year (1-12)

public bool IsMonth(string str_month)
{
    // 1-9 with an optional leading zero, or 10-12.
    return System.Text.RegularExpressions.Regex.IsMatch(str_month, @"^(0?[1-9]|1[0-2])$");
}

8. Day of the month (1-31)

public bool IsDay(string str_day)
{
    // 1-9 with an optional leading zero, 10-29, 30, or 31.
    return System.Text.RegularExpressions.Regex.IsMatch(str_day, @"^((0?[1-9])|((1|2)[0-9])|30|31)$");
}

9. Numeric input

public bool IsNumber(string str_number)
{
    // Digits only; note that * also accepts an empty string.
    return System.Text.RegularExpressions.Regex.IsMatch(str_number, @"^[0-9]*$");
}

10. Password Length (6-18 characters)

public bool IsPasswLength(string str_length)
{
    // Note: \d accepts digits only, so this checks for 6 to 18 digits.
    return System.Text.RegularExpressions.Regex.IsMatch(str_length, @"^\d{6,18}$");
}

11. Non-zero positive integer

public bool IsIntNumber(string str_intnumber)
{
    // Optional plus sign, a first digit 1-9, then any further digits.
    return System.Text.RegularExpressions.Regex.IsMatch(str_intnumber, @"^\+?[1-9][0-9]*$");
}

III. Common character verification techniques

1. Uppercase letters

public bool IsUpChar(string str_upchar)
{
    // One or more uppercase ASCII letters.
    return System.Text.RegularExpressions.Regex.IsMatch(str_upchar, @"^[A-Z]+$");
}

2. Lowercase letters

public bool IsLowChar(string str_lowchar)
{
    // One or more lowercase ASCII letters.
    return System.Text.RegularExpressions.Regex.IsMatch(str_lowchar, @"^[a-z]+$");
}

3. Check the repeated words in the string.

private void btnWord_Click(object sender, EventArgs e)
{
    // Named group "word" captures a word; \k<word> requires the same word again after whitespace.
    System.Text.RegularExpressions.MatchCollection matches = System.Text.RegularExpressions.Regex.Matches(label1.Text,
        @"\b(?<word>\w+)\s+(\k<word>)\b",
        System.Text.RegularExpressions.RegexOptions.Compiled | System.Text.RegularExpressions.RegexOptions.IgnoreCase);
    if (matches.Count != 0)
    {
        foreach (System.Text.RegularExpressions.Match match in matches)
        {
            string word = match.Groups["word"].Value;
            MessageBox.Show(word.ToString(), "English word");
        }
    }
    else { MessageBox.Show("No duplicate words"); }
}
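The handler above depends on Windows Forms controls. To try the same pattern in a console program, a minimal sketch might look like the following; the `FindRepeatedWords` name and the sample sentence are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Collects every word that is immediately repeated (ignoring case).
static List<string> FindRepeatedWords(string text)
{
    var repeated = new List<string>();
    foreach (Match match in Regex.Matches(text, @"\b(?<word>\w+)\s+(\k<word>)\b", RegexOptions.IgnoreCase))
    {
        repeated.Add(match.Groups["word"].Value);
    }
    return repeated;
}

List<string> found = FindRepeatedWords("This is is a test of the the pattern");
Console.WriteLine(string.Join(", ", found)); // is, the
```

The leading `\b` keeps the "is" inside "This" from being treated as the start of a repeated word.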

4. Replace the string

// The handler name is garbled in the source; button1_Click is assumed here.
private void button1_Click(object sender, EventArgs e)
{
    // Replaces each letter (and an optional "*" after it) with the text from textBox2.
    string strResult = System.Text.RegularExpressions.Regex.Replace(textBox1.Text, @"[A-Za-z]\*?", textBox2.Text);
    MessageBox.Show("Text before replacement:" + "\n" + textBox1.Text + "\n" + "Replacement text:" + "\n" + textBox2.Text + "\n" +
        "Text after replacement:" + "\n" + strResult, "Replace");
}

5. Split the string

// The handler name is garbled in the source; button2_Click is assumed here.
private void button2_Click(object sender, EventArgs e)
{
    // Example input: A025-8343243B0755-2228382C029-32983298389289328932893289D
    foreach (string s in System.Text.RegularExpressions.Regex.Split(textBox1.Text, @"\d{3,4}-\d*"))
    {
        textBox2.Text += s; // appends "A", "B", "C", and "D" in turn
    }
}
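The same split can be tried without a form. The sketch below assumes the example input with the labels written as A, B, C, and D and no spaces around the numbers; the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "A025-8343243B0755-2228382C029-32983298389289328932893289D";

// Every run of 3-4 digits, a hyphen, and any further digits acts as a separator.
string[] parts = Regex.Split(input, @"\d{3,4}-\d*");

Console.WriteLine(string.Join(",", parts)); // A,B,C,D
```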

6. Verify that the input contains only letters

public bool IsLetter(string str_letter)
{
    // One or more ASCII letters.
    return System.Text.RegularExpressions.Regex.IsMatch(str_letter, @"^[a-zA-Z]+$");
}

7. Verify the input Chinese Characters

public bool IsChinese(string str_chinese)
{
    // Zero or more characters in the CJK range U+4E00 to U+9FA5.
    return System.Text.RegularExpressions.Regex.IsMatch(str_chinese, @"^[\u4e00-\u9fa5]{0,}$");
}

8. Verify the input string (at least 8 characters)

public bool IsLength(string str_length)
{
    // At least eight characters of any kind (except line breaks).
    return System.Text.RegularExpressions.Regex.IsMatch(str_length, @"^.{8,}$");
}

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
