Author: nuysoft / JS siege engineer / Gao Yun, QQ: 47214707, email: nuysoft@gmail.com
Statement: This is an original article. If you reprint it, please credit the source and keep the link to the original.
Coming next: an analysis of the regular expressions in jQuery
2.4 Common Regular Expressions
I found a widely circulated article titled "Common Regular Expressions" online. Below I analyze its expressions one by one, supplementing and correcting them where they fall short.
Common number regexes (strict match)
Regex    Meaning
^[1-9]\d*$    matches a positive integer
^-[1-9]\d*$    matches a negative integer
^(?:-?[1-9]\d*|0)$    matches an integer
^(?:[1-9]\d*|0)$    matches a non-negative integer (positive integer + 0)
^(?:-[1-9]\d*|0)$    matches a non-positive integer (negative integer + 0)
^(?:[1-9]\d*\.\d*|0\.\d*[1-9]\d*)$    matches a positive floating-point number
^-(?:[1-9]\d*\.\d*|0\.\d*[1-9]\d*)$    matches a negative floating-point number
^-?(?:[1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$    matches a floating-point number
^(?:[1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$    matches a non-negative floating-point number (positive float + 0)
^(?:-(?:[1-9]\d*\.\d*|0\.\d*[1-9]\d*)|0?\.0+|0)$    matches a non-positive floating-point number (negative float + 0)
Note: the circulating version writes several of these without the (?:...) group, e.g. ^[1-9]\d*|0$. Alternation has the lowest precedence, so ^ then applies only to the first branch and $ only to the last, and ^[1-9]\d*|0$ also accepts a string such as "12abc". Grouping the alternatives makes both anchors apply to every branch.
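The anchor-precedence problem is easy to check in practice. The sketch below, assuming any modern JavaScript engine, compares the ungrouped form ^[1-9]\d*|0$ with a grouped form; the variable names are illustrative only.

```javascript
// Ungrouped: ^ applies only to [1-9]\d* and $ applies only to 0
const looseNonNeg = /^[1-9]\d*|0$/;
// Grouped: both anchors apply to every alternative
const strictNonNeg = /^(?:[1-9]\d*|0)$/;
// Non-positive float, grouped the same way
const strictNonPosFloat = /^(?:-(?:[1-9]\d*\.\d*|0\.\d*[1-9]\d*)|0?\.0+|0)$/;

console.log(looseNonNeg.test("12abc"));      // true: "12" at the start is enough
console.log(looseNonNeg.test("abc0"));       // true: "0" at the end is enough
console.log(strictNonNeg.test("12abc"));     // false
console.log(strictNonNeg.test("120"));       // true
console.log(strictNonPosFloat.test("-1.5")); // true
console.log(strictNonPosFloat.test("1.5"));  // false
```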
Common string regexes
Regex    Meaning    Note
^[A-Za-z]+$    matches a string of the 26 English letters (either case)    or /^[a-z]+$/i
^[A-Z]+$    matches a string of uppercase English letters
^[a-z]+$    matches a string of lowercase English letters
^[A-Za-z0-9]+$    matches a string of digits and English letters    note that \w also includes the underscore _
^\w+$    matches a string of digits, English letters, or underscores
The common number regexes and common string regexes are the most basic applications of regular expressions. Treat them as introductory exercises and try to read off the meaning of each one quickly.
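As a quick self-check for the string regexes, the following sketch (any modern JavaScript engine assumed; variable names are illustrative) exercises each one, including the underscore that \w admits and [A-Za-z0-9] does not.

```javascript
const rLetters = /^[A-Za-z]+$/;   // equivalent to /^[a-z]+$/i
const rUpper = /^[A-Z]+$/;
const rLower = /^[a-z]+$/;
const rAlnum = /^[A-Za-z0-9]+$/;
const rWord = /^\w+$/;            // \w is [A-Za-z0-9_], so "_" is allowed too

console.log(rLetters.test("Hello"), /^[a-z]+$/i.test("Hello")); // true true
console.log(rUpper.test("ABC"), rLower.test("ABC"));            // true false
console.log(rAlnum.test("abc_123"));                            // false
console.log(rWord.test("abc_123"));                             // true
```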
Match Chinese characters
The commonly used regex is [\u4e00-\u9fa5], but this range is incomplete. For example:
/[\u4e00-\u9fa5]/.test('⻏') // tests the radical ⻏; returns false
According to the Unicode 5.0 encoding, an accurate test for Chinese characters should cover the following ranges:
Range    Meaning
2E80-2EFF    CJK Radicals Supplement
2F00-2FDF    Kangxi Radicals
3000-303F    CJK Symbols and Punctuation
31C0-31EF    CJK Strokes
3200-32FF    Enclosed CJK Letters and Months
3300-33FF    CJK Compatibility
3400-4DBF    CJK Unified Ideographs Extension A
4DC0-4DFF    Yijing Hexagram Symbols (the 64 hexagrams of the I Ching)
4E00-9FBF    CJK Unified Ideographs
F900-FAFF    CJK Compatibility Ideographs
FE30-FE4F    CJK Compatibility Forms
FF00-FFEF    Full-width ASCII and full-width punctuation
Therefore, the regex for matching Chinese characters (together with CJK symbols) should be:
var rCJK = /[\u2e80-\u2eff\u2f00-\u2fdf\u3000-\u303f\u31c0-\u31ef\u3200-\u32ff\u3300-\u33ff\u3400-\u4dbf\u4dc0-\u4dff\u4e00-\u9fbf\uf900-\ufaff\ufe30-\ufe4f\uff00-\uffef]+/g;
If you do not want to match punctuation or symbols, remove the corresponding ranges from the regex:
3000-303F CJK Symbols and Punctuation; FF00-FFEF full-width ASCII and full-width punctuation
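A small sketch of the difference, assuming a JavaScript engine: it compares the common range, the wider Unicode 5.0 ranges, and the same ranges with the two punctuation blocks removed. The g flag is dropped on purpose, because a g-flagged regex carries lastIndex from one test() call to the next. Characters are written as \u escapes so the file encoding does not matter; note that supplementary-plane ideographs (for example Extension B, starting at U+20000) are outside all of these ranges.

```javascript
// Common but incomplete range
const rBasic = /[\u4e00-\u9fa5]/;
// Unicode 5.0 ranges from the table; no g flag, so test() keeps no lastIndex state
const rCJK = /[\u2e80-\u2eff\u2f00-\u2fdf\u3000-\u303f\u31c0-\u31ef\u3200-\u32ff\u3300-\u33ff\u3400-\u4dbf\u4dc0-\u4dff\u4e00-\u9fbf\uf900-\ufaff\ufe30-\ufe4f\uff00-\uffef]/;
// Same ranges without 3000-303F and FF00-FFEF (symbols and punctuation)
const rCJKNoPunct = /[\u2e80-\u2eff\u2f00-\u2fdf\u31c0-\u31ef\u3200-\u32ff\u3300-\u33ff\u3400-\u4dbf\u4dc0-\u4dff\u4e00-\u9fbf\uf900-\ufaff\ufe30-\ufe4f]/;

const radical = "\u2ecf";   // the radical ⻏
const fullStop = "\u3002";  // the ideographic full stop 。

console.log(rBasic.test(radical));        // false
console.log(rCJK.test(radical));          // true
console.log(rCJK.test(fullStop));         // true
console.log(rCJKNoPunct.test(fullStop));  // false
console.log(rCJKNoPunct.test("\u6c49"));  // true: 汉
```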
Match Double-byte characters (including Chinese characters)
[^\x00-\xff], which can be used to compute the length of a string (a double-byte character length meter 2,ascii 1 characters), as shown in the code example:
Console.info ("ABC". Replace (/[^\x00-\xff]/g, "AA"). Length)//3
Console.info ("kanji". Replace (/[^\x00-\xff]/g, "AA"). Length)//4
Console.info ("ABC kanji". Replace (/[^\x00-\xff]/g, "AA"). Length)//7
Regular expressions that match HTML tags
First, the version circulating online:
<(\S*?)[^>]*>.*?</\1>|<.*? />
Problems:
(\S*?): the ? after * does not mean "0 or 1" here; it turns * into its lazy (non-greedy) form, which still matches zero or more characters. A tag name must be at least one character long, so a quantifier that allows zero characters is wrong.
|<.*? />: this second alternative has no capture group, so it cannot extract the tag name from a self-closing tag written as <div/>.
</\1> and <.*? />: some tags, such as <br> and <hr>, are never closed, so a closing tag or a self-closing slash cannot be required.
The revised version is as follows:
var rtag = /^<([a-z]+)\s*\/?>.*(?:<\/\1>)?$/i
rtag.exec('<-div></-div>') // null
rtag.exec('<div>abc') // ["<div>abc", "div"]
This expression is not perfect either, as the second test shows: it is written this way so that it can also extract tags that contain text. For a strict match, modify it again:
var rtag = /^<([a-z]+)\s*\/?>(?:<\/\1>)?$/i // the .* in the middle is removed
This regex is only suitable for simple tag matching and extraction; it cannot match nested tags.
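To see how the two rtag versions differ, the sketch below runs both over a few inputs; it assumes any modern JavaScript engine and renames them rtag and rtagStrict only so they can sit side by side. The last case shows a further limit of both: attributes are not allowed between the tag name and >.

```javascript
const rtag = /^<([a-z]+)\s*\/?>.*(?:<\/\1>)?$/i;      // loose: allows trailing text
const rtagStrict = /^<([a-z]+)\s*\/?>(?:<\/\1>)?$/i;  // strict: nothing between the tags

const cases = [
  "<div></div>",     // loose: div,  strict: div
  "<br/>",           // loose: br,   strict: br
  "<div>abc",        // loose: div,  strict: null
  "<div>abc</div>",  // loose: div,  strict: null
  "<-div></-div>",   // loose: null, strict: null
  '<div class="a">'  // loose: null, strict: null (attributes not allowed)
];
for (const s of cases) {
  const loose = rtag.exec(s);
  const strict = rtagStrict.exec(s);
  console.log(s, loose && loose[1], strict && strict[1]);
}
```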
Regular expressions that match leading and trailing whitespace
First, the version circulating online:
^\s*|\s*$
It removes whitespace at the beginning and end of a string, for example:
'\t\n\r abc \t\n\r'.replace(/^\s*|\s*$/g, '') // "abc"
However, because \s* can also match the empty string, it cannot be used to test whether a string begins or ends with whitespace, for example:
/^\s*|\s*$/.test('abc') // true
The revised version is as follows:
^\s+|\s+$
'\t\n\r abc \t\n\r'.replace(/^\s+|\s+$/g, '') // "abc"
/^\s+|\s+$/.test('abc') // false
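A minimal trim helper built on the revised regex might look like this (the names are illustrative). It keeps two regex objects: the g-flagged one for replace(), and one without g for test(), since a g-flagged regex remembers lastIndex between test() calls and can give surprising results on the next string. In ES5 and later engines, the built-in String.prototype.trim() does the same job.

```javascript
const rTrim = /^\s+|\s+$/g;      // g: replace both the leading and the trailing run
const rEdgeSpace = /^\s+|\s+$/;  // no g: test() keeps no lastIndex state

function trim(str) {
  return str.replace(rTrim, "");
}

console.log(JSON.stringify(trim("\t\n\r abc \t\n\r"))); // "abc"
console.log(rEdgeSpace.test("abc"));                     // false
console.log(rEdgeSpace.test(" abc"));                    // true
console.log(trim("  a b  ") === "  a b  ".trim());       // true: inner space is kept
```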
Regular expression that matches an email address
First, the rules for an email address: local-part@domain
The local-part is at most 64 characters long, the domain at most 253, and the whole address at most 256.
The local-part may use these ASCII characters:
Uppercase and lowercase English letters A-Z, a-z
Digits 0-9
The characters !#$%&'*+-/=?^_`{|}~
The character . as long as it is not the first or last character and does not appear twice in a row
However, some mail servers reject addresses that contain special characters.
The domain is limited to the 26 English letters, the 10 digits, and the hyphen -
The hyphen - cannot be the first character
The top-level domain (com, cn, etc.) is 2 to 6 characters long
First, the version circulating online:
\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
(): odd groupings; if you only need grouping without capturing, use (?:)
@\w: the domain cannot contain the underscore _, but \w allows it
\.\w+([-.]\w+)*: the top-level domain does not follow the rules above
The revised version is as follows:
var remail = /^([\w-]+(?:\.[\w-]+)*)@((?:[a-z0-9]+(?:-[a-z0-9]+)*\.)+[a-z]{2,6})$/i // each domain label carries its own dot, so multi-level domains also match
remail.exec('nuysoft@gmail.com') // ["nuysoft@gmail.com", "nuysoft", "gmail.com"]
remail.exec('nuysoft@gmail.comcomcom') // null
remail.exec('nuysoft@_gmail.com') // null
The revised regex has the following limitations:
It does not support Chinese email addresses or Chinese domain names. I left them out purely as a personal preference: I dislike this kind of flashy feature.
It does not support the special characters, to avoid rejection by mail servers; add them if you need them.
References:
http://en.wikipedia.org/wiki/Email_address
http://baike.baidu.com/view/119298.htm
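The sketch below exercises an email regex of this shape against a few valid and invalid addresses, assuming a JavaScript engine. In this version each domain label is followed by its dot inside the repeated group, ((?:label\.)+), which is what lets a multi-level domain such as mail.example.cn through; a form with the dot outside the repeated group only accepts a single label before the top-level domain. The sample addresses are made up for illustration.

```javascript
const remail = /^([\w-]+(?:\.[\w-]+)*)@((?:[a-z0-9]+(?:-[a-z0-9]+)*\.)+[a-z]{2,6})$/i;

const samples = [
  "nuysoft@gmail.com",          // valid
  "first.last@mail.example.cn", // valid, multi-level domain
  "a..b@gmail.com",             // invalid: consecutive dots in the local-part
  "nuysoft@_gmail.com",         // invalid: underscore in the domain
  "nuysoft@gmail.comcomcom"     // invalid: top-level domain longer than 6 letters
];
for (const s of samples) {
  const m = remail.exec(s);
  console.log(s, m ? [m[1], m[2]] : null);  // [local-part, domain] or null
}
```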
Regular expressions that match URLs
First, the version circulating online:
[a-zA-Z]+://[^\s]*
It is too rough: the individual parts of the URL are not grouped.
The revised version is as follows (another version circulating online):
var _url = "^((https|http|ftp|rtsp|mms)?://)?"                  // protocol
    + "(([0-9a-z_!~*'().&=+$%-]+:)?[0-9a-z_!~*'().&=+$%-]+@)?"   // user@ for FTP
    + "(([0-9]{1,3}.){3}[0-9]{1,3}"                             // URL in IP form - 199.194.52.184
    + "|"                                                       // allow either IP or domain
    + "([0-9a-z_!~*'()-]+.)*"                                   // subdomain - www.
    + "([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]."                     // second-level domain
    + "[a-z]{2,6})"                                             // first-level domain - .com or .museum
    + "(:[0-9]{1,4})?"                                          // port - :80
    + "((/?)|"                                                  // a slash isn't required if there is no file name
    + "(/[0-9a-z_!~*'().;?:@&=+$,%#-]+)+/?)$";
var rurl = new RegExp(_url, 'i');
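Test:
rurl.exec('baidu.com') // ["baidu.com", undefined, undefined, undefined, undefined, "baidu.com", undefined, "baid", undefined, undefined, "", "", undefined]
rurl.exec('http://baidu.com') // ["http://baidu.com", "http://", "http", undefined, undefined, "baidu.com", undefined, "baid", undefined, undefined, "", "", undefined]
rurl.exec('http://www.baidu.com') // ["http://www.baidu.com", "http://", "http", undefined, undefined, "www.baidu.com", undefined, "baid", undefined, undefined, "", "", undefined]
rurl.test('baidu') // true
It does not seem to work very well. One reason: the dots in the pattern string are not escaped, so each one matches any character, which is why 'baidu' passes and why group 7 captures "baid" rather than a subdomain. Writing "\." would not help, because in a JavaScript string literal "\." is just "."; the pattern needs "\\.". I still need to study this further (TODO).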
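One possible repair, offered as a sketch rather than a tested library pattern: keep the same twelve groups but assemble the pattern with String.raw, so that each \. reaches the RegExp constructor as an escaped (literal) dot. The name rurlFixed is hypothetical. With that change 'baidu' is rejected and group 5 holds the whole host.

```javascript
// Same structure as the circulated pattern; String.raw keeps every backslash
const urlPattern =
  String.raw`^((https|http|ftp|rtsp|mms)?://)?` +                      // protocol
  String.raw`(([0-9a-z_!~*'().&=+$%-]+:)?[0-9a-z_!~*'().&=+$%-]+@)?` + // user@ for FTP
  String.raw`(([0-9]{1,3}\.){3}[0-9]{1,3}` +                           // IP form
  "|" +                                                                // IP or domain
  String.raw`([0-9a-z_!~*'()-]+\.)*` +                                 // subdomain, e.g. www.
  String.raw`([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]\.` +                   // second-level domain
  String.raw`[a-z]{2,6})` +                                            // top-level domain
  String.raw`(:[0-9]{1,4})?` +                                         // port
  String.raw`((/?)|(/[0-9a-z_!~*'().;?:@&=+$,%#-]+)+/?)$`;             // path
const rurlFixed = new RegExp(urlPattern, "i");

console.log(rurlFixed.test("baidu"));                              // false
console.log(rurlFixed.test("baidu.com"));                          // true
console.log(rurlFixed.exec("http://www.baidu.com")[5]);            // "www.baidu.com"
console.log(rurlFixed.test("http://baidu.com:8080/index.html"));   // true
```

It still accepts an address such as 999.999.999.999 in IP form, because [0-9]{1,3} does not limit the range; an octet pattern like the one in the IP-address section could be substituted there.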
Check whether an account name is valid
First, the version circulating online:
^[a-zA-Z][a-zA-Z0-9_]{4,15}$
(starts with a letter, 5-16 characters long, letters, digits and underscores allowed)
Requiring a leading letter now seems inappropriate, for example on the QQ login platform, where accounts are numbers.
Forbidding a leading underscore is also unnecessary (Baidu, for example, allows it), so keep it simple.
The revised version is as follows:
var ruser = /^\w{4,16}$/
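Why the ^ and $ anchors matter for a validation regex: without them, test() succeeds as soon as any 4-16 word characters appear anywhere in the input. A short sketch, assuming a JavaScript engine; ruserUnanchored is an illustrative name.

```javascript
const ruserUnanchored = /\w{4,16}/;  // finds 4-16 word characters anywhere
const ruser = /^\w{4,16}$/;          // the whole string must be 4-16 word characters

console.log(ruserUnanchored.test("ab cd efgh!")); // true: "efgh" is enough
console.log(ruser.test("ab cd efgh!"));           // false
console.log(ruser.test("nuysoft"));               // true
console.log(ruser.test("_nuysoft"));              // true: leading underscore allowed
console.log(ruser.test("abc"));                   // false: too short
console.log(ruser.test("a".repeat(17)));          // false: too long
```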
Match a domestic (Chinese) landline phone number
The version circulating online works well:
\d{3}-\d{8}|\d{4}-\d{7}
Comment: matches forms such as 0511-4405222 or 021-87888822
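Used as-is, the circulating pattern finds a match inside a longer string. For validating a whole input field, one option (an assumption about the intended use, not part of the circulating version) is to group the alternation and anchor it, as in the sketch below.

```javascript
const rphoneUnanchored = /\d{3}-\d{8}|\d{4}-\d{7}/;
const rphone = /^(?:\d{3}-\d{8}|\d{4}-\d{7})$/;

console.log(rphone.test("0511-4405222"));             // true
console.log(rphone.test("021-87888822"));             // true
console.log(rphoneUnanchored.test("021-878888221"));  // true: one digit too many, yet a substring matches
console.log(rphone.test("021-878888221"));            // false
```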
Match a Tencent QQ number
The version circulating online works well:
[1-9][0-9]{4,}
Comment: Tencent QQ numbers start from 10000
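An anchored form of the QQ regex for checking a whole input, sketched in JavaScript; the last line shows that the unanchored form accepts "12345" found inside a longer string.

```javascript
const rqq = /^[1-9][0-9]{4,}$/;

console.log(rqq.test("10000"));     // true: the smallest QQ number
console.log(rqq.test("47214707"));  // true
console.log(rqq.test("9999"));      // false: only 4 digits
console.log(rqq.test("012345"));    // false: leading zero
console.log(/[1-9][0-9]{4,}/.test("QQ: 012345")); // true: unanchored, "12345" matches
```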
Match a Chinese postal code
The version circulating online works well:
[1-9]\d{5}(?!\d)
Comment: Chinese postal codes are 6 digits
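The lookahead (?!\d) only stops a match from being followed by another digit; nothing stops a digit in front of it. That makes the circulating pattern suitable for finding codes in text, while validating a whole field is safer with an anchored form. A sketch, assuming a JavaScript engine; the sample text is made up.

```javascript
const rzipSearch = /[1-9]\d{5}(?!\d)/g;  // circulating pattern, used to find codes in text
const rzip = /^[1-9]\d{5}$/;             // anchored variant for validating a whole field

console.log("ZIP 100080, room 12".match(rzipSearch)); // ["100080"]
console.log(rzip.test("100080"));                     // true
console.log(rzip.test("012345"));                     // false: cannot start with 0
console.log(/[1-9]\d{5}(?!\d)/.test("2100080"));      // true: "100080" at the end still matches
console.log(rzip.test("2100080"));                    // false
```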
Match an ID card number
First, the version circulating online:
\d{15}|\d{18}
\d{15}|\d{18} can tell the two lengths apart, but it is somewhat rough: for instance, the last character of an 18-digit number can be the check character X, which \d{18} rejects.
An ID number encodes the holder's address, birthday, gender and more, so let us look at it in detail.
ID card rules
A Chinese ID number has 15 digits (first generation) or 18 digits (second generation). The second generation differs only in that 19 is inserted before the seventh digit of the first-generation number and a check character is appended at the end.
The code below upgrades a 15-digit number to 18 digits and parses the components of an 18-digit number (address, birthday, gender).
The code is as follows:
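// Assumed definition: the original article uses rid without showing it.
// Groups: 1 area code, 2 year, 3 month, 4 day, 5 gender digit (the 17th digit).
var rid = /^(\d{6})(\d{4})(\d{2})(\d{2})\d{2}(\d)[\dX]$/i;

function parseId(id) {
    if (id.length == 15) {
        // Upgrade to 18 digits: insert the century "19" after the 6-digit area code
        id = id.substr(0, 6) + "19" + id.substr(6);
        // Weights for the first 17 digits
        var rank = [
            "7", "9", "10", "5", "8", "4", "2", "1", "6", "3", "7", "9", "10", "5", "8", "4", "2"
        ];
        // Check character, indexed by (weighted sum of the first 17 digits) % 11
        var last = [
            "1", "0", "X", "9", "8", "7", "6", "5", "4", "3", "2"
        ];
        // Weighted sum (string * string coerces both operands to numbers)
        for (var i = 0, sum = 0, len = id.length; i < len; i++)
            sum += id[i] * rank[i];
        // Append the check character
        id += last[sum % 11];
    }
    if (id.length != 18) return null;
    var match = rid.exec(id);
    return match ? {
        id: id,
        area: match[1],
        y: match[2],
        m: match[3],
        d: match[4],
        sex: match[5] % 2    // 1 = male (odd digit), 0 = female (even digit)
    } : null;
}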
Limitations:
Only the area code is extracted; to convert the code into an actual address, please look it up on Baidu.
The sex field of the returned object is 1 (male) or 0 (female) and is not converted; if a page needs to display it, convert it with: sex ? "Male" : "Female"
Test:
console.info(parseId("142327840821047")); // id "142327198408210470", area "142327", y "1984", m "08", d "21", sex 1
console.info(parseId("142327198408210470")); // same result
Resources:
http://baike.baidu.com/view/118340.htm#1
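parseId computes the check character when upgrading a 15-digit number, but it does not verify the 18th character of an 18-digit input. A sketch of such a check, using the same weights and check-character table as the rank and last arrays in parseId; the function name isValidIdChecksum is hypothetical.

```javascript
// Hypothetical helper: verify the 18th character of a second-generation ID number
function isValidIdChecksum(id) {
  if (!/^\d{17}[\dX]$/i.test(id)) return false;
  const weights = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2];
  const checkChars = "10X98765432";  // indexed by (weighted sum) % 11
  let sum = 0;
  for (let i = 0; i < 17; i++) {
    sum += Number(id[i]) * weights[i];
  }
  return checkChars[sum % 11] === id[17].toUpperCase();
}

console.log(isValidIdChecksum("142327198408210470")); // true
console.log(isValidIdChecksum("142327198408210471")); // false: wrong check character
```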
Match an IP address
First, the version circulating online:
\d+\.\d+\.\d+\.\d+
\d+ does not restrict the range of each number, so 999.999.999.999 also matches
The fixed version is as follows:
var rip = /^(?:(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\.){3}(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])$/;
rip.test("192.168.1.1") // true
rip.test("0.0.0.0") // true
rip.test("255.255.255.255") // true
rip.test("256.255.255.255") // false
To capture each number, add groups:
var rip2 = /^([01]?\d{1,2}|2[0-4]\d|25[0-5])\.([01]?\d{1,2}|2[0-4]\d|25[0-5])\.([01]?\d{1,2}|2[0-4]\d|25[0-5])\.([01]?\d{1,2}|2[0-4]\d|25[0-5])$/;
rip2.exec("192.168.1.1") // ["192.168.1.1", "192", "168", "1", "1"]
rip2.exec("0.0.0.0") // ["0.0.0.0", "0", "0", "0", "0"]
rip2.exec("255.255.255.255") // ["255.255.255.255", "255", "255", "255", "255"]
rip2.exec("256.255.255.255") // null
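Since the grouped IP regex repeats the same octet pattern four times, it can also be built from one sub-pattern string. The sketch below does that and converts the captured groups to numbers; octet, rip3 and parseIPv4 are hypothetical names. Like rip2, it accepts leading zeros, so "010" is captured and converted to 10.

```javascript
// One octet: 0-199 (with optional leading 0 or 1), 200-249, or 250-255
const octet = "([01]?\\d{1,2}|2[0-4]\\d|25[0-5])";
const rip3 = new RegExp("^" + [octet, octet, octet, octet].join("\\.") + "$");

function parseIPv4(str) {
  const m = rip3.exec(str);
  return m ? m.slice(1).map(Number) : null;  // the four octets as numbers
}

console.log(parseIPv4("192.168.1.1"));      // [192, 168, 1, 1]
console.log(parseIPv4("256.255.255.255"));  // null
console.log(parseIPv4("010.0.0.1"));        // [10, 0, 0, 1]
```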