Transferred from: http://www.liaoxuefeng.com/wiki/001434446689867b27157e896e74d51a89c25cc8b43bdb3000/ 001434499503920bb7b42ff6627420da2ceae4babf6c4f2000
Strings are the most common data structure in programming, and the need to manipulate them is almost ubiquitous. For example, to determine whether a string is a valid email address, you could extract the substrings before and after the @ and check that they are a word and a domain name, but this is cumbersome and the code is hard to reuse.
A regular expression is a powerful tool for matching strings. The idea is to use a descriptive language to define a rule for strings: any string that conforms to the rule "matches", and any other string is considered invalid.
So the way we check whether a string is a valid email address is:
Create a regular expression that matches email addresses;
Use the regular expression to match the user's input to determine whether it is valid.
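The two steps above can be sketched as follows. The pattern is a deliberately simplified assumption rather than a complete email validator, and the names emailPattern and isValidEmail are hypothetical; the syntax it uses is explained in the rest of this section.

```javascript
// Step 1: create a regular expression that matches an email address.
// Simplified assumption: word characters and dots, an @, then a dotted domain.
const emailPattern = /^[\w.]+@[\w-]+(\.[\w-]+)+$/;

// Step 2: match the user's input against the pattern.
function isValidEmail(input) {
    return emailPattern.test(input);
}

console.log(isValidEmail('someone@example.com')); // true
console.log(isValidEmail('not-an-email'));        // false
```

Keeping the pattern in one place means the same check can be reused wherever an address has to be validated.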
Because a regular expression is itself written as a string, we first need to know how to describe characters with characters.
In a regular expression, a character given literally matches exactly that character. \d matches a digit, and \w matches a letter, a digit, or an underscore.
. matches any single character (except a line break), so 'js.' matches 'jsp', 'jss', 'js!', and so on.
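A few quick checks of these basic building blocks, as a minimal sketch. The sample strings are illustrative assumptions, and test() is the RegExp method introduced later in this section.

```javascript
// A literal character matches itself; \d matches one digit.
console.log(/00\d/.test('007')); // true
console.log(/00\d/.test('00A')); // false

// \w matches a letter, a digit, or an underscore.
console.log(/\w\w\d/.test('py3')); // true

// . matches any single character except a line break.
console.log(/js./.test('jsp')); // true
console.log(/js./.test('js!')); // true
console.log(/js./.test('js'));  // false: . needs one more character
```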
To match a variable number of characters, a regular expression uses * for any number of characters (including 0), + for at least one character, ? for 0 or 1 characters, {n} for exactly n characters, and {n,m} for n to m characters.
Take a look at a more complex example: \d{3}\s+\d{3,8}.
Let's read from left to right:
\d{3} matches 3 digits, for example '010';
\s matches one whitespace character (a space, a tab, or other whitespace), so \s+ means at least one whitespace character, matching ' ', '\t\t', etc.;
\d{3,8} matches 3 to 8 digits, for example '1234567'.
Together, the regular expression above matches a telephone number whose area code and local number are separated by any amount of whitespace.
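The quantifiers and the telephone pattern can be tried out as below. This is a minimal sketch: the ^ and $ anchors (covered later in this section) are an addition so that the whole string has to match, and the sample inputs are assumptions.

```javascript
// *, + and ? applied to the single character b.
console.log(/^ab*$/.test('a'));    // true:  * allows zero b's
console.log(/^ab+$/.test('a'));    // false: + needs at least one b
console.log(/^ab?$/.test('abb'));  // false: ? allows at most one b

// The telephone pattern from the text, anchored to the whole string.
const phonePattern = /^\d{3}\s+\d{3,8}$/;
console.log(phonePattern.test('010 12345'));      // true
console.log(phonePattern.test('010\t\t1234567')); // true
console.log(phonePattern.test('01012345'));       // false: \s+ needs whitespace
console.log(phonePattern.test('010 12'));         // false: only 2 digits follow
```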
What if you want to match a number like '010-12345'? Because '-' can be a special character (inside [] it denotes a range), it is escaped with '\' in the regular expression, so the pattern becomes \d{3}\-\d{3,8}.
However, this still does not match '010 - 12345' because of the spaces. So we need more flexible ways of matching.
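One possible way to accept the spaced form, using only the constructs introduced so far, is to allow optional whitespace on both sides of the hyphen. This is a sketch and not necessarily the approach the original author had in mind; the name spacedPhone is hypothetical.

```javascript
// \s* allows zero or more whitespace characters around the hyphen.
const spacedPhone = /^\d{3}\s*-\s*\d{3,8}$/;

console.log(spacedPhone.test('010-12345'));   // true
console.log(spacedPhone.test('010 - 12345')); // true
console.log(spacedPhone.test('010 12345'));   // false: the hyphen is required
```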
Advanced
To match more precisely, you can use [] to denote a range, for example:
[0-9a-zA-Z\_]
Matches one digit, letter, or underscore;
[0-9a-zA-Z\_]+
Matches a string of at least one digit, letter, or underscore, for example 'a100', '0_Z', 'js2015', and so on;
[a-zA-Z\_\$][0-9a-zA-Z\_\$]*
Matches a string that starts with a letter, underscore, or $, followed by any number of digits, letters, underscores, or $; that is, a variable name allowed by JavaScript;
[a-zA-Z\_\$][0-9a-zA-Z\_\$]{0,19}
Limits the length of the variable name more precisely to 1-20 characters (1 leading character plus up to 19 following characters). Note that there must be no space inside {0,19}; otherwise it is not treated as a quantifier.
A|B
Matches A or B, so (J|j)ava(S|s)cript matches 'JavaScript', 'Javascript', 'javaScript', or 'javascript'.
^
Represents the beginning of a line; ^\d means the string must start with a digit.
$
Represents the end of a line; \d$ means the string must end with a digit.
You may have noticed that js also matches 'jsp', but with ^js$ it becomes a whole-line match and only matches 'js'.
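The constructs in this list can be checked with a short sketch. The names identifierPattern and jsName and the sample inputs are assumptions; inside [] the underscore and $ do not need a backslash, so the escapes are omitted here.

```javascript
// A variable name of 1 to 20 characters, anchored to the whole string.
const identifierPattern = /^[a-zA-Z_$][0-9a-zA-Z_$]{0,19}$/;
console.log(identifierPattern.test('$scope'));       // true
console.log(identifierPattern.test('_tmp1'));        // true
console.log(identifierPattern.test('1abc'));         // false: starts with a digit
console.log(identifierPattern.test('a'.repeat(21))); // false: 21 characters

// Alternation with | inside groups.
const jsName = /^(J|j)ava(S|s)cript$/;
console.log(jsName.test('javaScript')); // true
console.log(jsName.test('JAVASCRIPT')); // false

// Without anchors a pattern matches anywhere in the string.
console.log(/js/.test('jsp'));   // true
console.log(/^js$/.test('jsp')); // false
console.log(/^js$/.test('js'));  // true
```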
RegExp
With this background knowledge, we can use regular expressions in JavaScript.
JavaScript has two ways of creating a regular expression:
The first is to write it directly as a literal, /regular expression/, and the second is to create a RegExp object with new RegExp('regular expression').
The two forms are equivalent:
var re1 = /ABC\-001/;
var re2 = new RegExp('ABC\\-001');
re1; // /ABC\-001/
re2; // /ABC\-001/
Note that with the second form, because of string escaping, the two characters \\ in the string literal actually stand for a single \.
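The escaping rule is easy to get wrong, so here is a small sketch of what happens when the backslash is not doubled. The variable names are hypothetical.

```javascript
// Both forms produce the same pattern source.
const fromLiteral = /ABC\-001/;
const fromString = new RegExp('ABC\\-001');
console.log(fromString.source);                        // ABC\-001
console.log(fromLiteral.source === fromString.source); // true

// In a string literal, '\d' is just the letter d, so the pattern is ^d{3}$.
const wrong = new RegExp('^\d{3}$');
const right = new RegExp('^\\d{3}$');
console.log(wrong.test('123')); // false
console.log(wrong.test('ddd')); // true
console.log(right.test('123')); // true
```

When the pattern is fixed and known in advance, the literal form avoids this problem entirely.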
Let's look at how to test whether a regular expression matches:
var re = /^\d{3}\-\d{3,8}$/;
re.test('010-12345'); // true
re.test('010-1234x'); // false
re.test('010 12345'); // false
The test() method of the RegExp object tests whether a given string matches the pattern.
Slicing a string
Splitting a string with a regular expression is more flexible than splitting on a fixed character. Consider the usual splitting code:
'a b   c'.split(' '); // ['a', 'b', '', '', 'c']
Well, this cannot handle consecutive spaces. Try a regular expression instead:
'a b   c'.split(/\s+/); // ['a', 'b', 'c']
No matter how many spaces there are, the string is split correctly. Now add , to the delimiters:
'a,b, c  d'.split(/[\s\,]+/); // ['a', 'b', 'c', 'd']
Then add ; as well:
'a,b;; c  d'.split(/[\s\,\;]+/); // ['a', 'b', 'c', 'd']
If users enter a set of tags, remember to use a regular expression to convert the irregular input into a clean array.
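A tag parser along these lines might look like the sketch below. The function name parseTags is hypothetical, and the extra filter step is an assumption added to drop the empty strings that split() produces when the input starts or ends with a delimiter.

```javascript
// Split on any run of whitespace, commas, or semicolons.
function parseTags(input) {
    return input
        .split(/[\s,;]+/)
        .filter(function (tag) {
            // Leading or trailing delimiters leave empty strings behind.
            return tag.length > 0;
        });
}

console.log(parseTags('js, regex;; node  web')); // [ 'js', 'regex', 'node', 'web' ]
console.log(parseTags('  js,regex  '));          // [ 'js', 'regex' ]
console.log(parseTags(''));                      // []
```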
Group
In addition to simply testing for a match, regular expressions can also extract substrings. A group to be extracted is marked with (). For example:
^(\d{3})-(\d{3,8})$
This defines two groups, so the area code and the local number can be extracted directly from the matched string:
var re = /^(\d{3})-(\d{3,8})$/;
re.exec('010-12345'); // ['010-12345', '010', '12345']
re.exec('010 12345'); // null
If groups are defined in a regular expression, you can extract the substrings with the exec() method of the RegExp object.
When a match succeeds, exec() returns an Array: the first element is the entire string matched by the regular expression, and the following elements are the substrings matched by each group.
When a match fails, exec() returns null.
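Because exec() can return null, code that reads the groups should check the result first. A minimal sketch, with the hypothetical function name parsePhone and hypothetical property names:

```javascript
// Extract the area code and local number, or return null when there is no match.
function parsePhone(text) {
    const match = /^(\d{3})-(\d{3,8})$/.exec(text);
    if (match === null) {
        return null;
    }
    // match[0] is the whole match; match[1] and match[2] are the groups.
    return { areaCode: match[1], localNumber: match[2] };
}

console.log(parsePhone('010-12345')); // { areaCode: '010', localNumber: '12345' }
console.log(parsePhone('010 12345')); // null
```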
Extracting substrings is very useful. Here is a more brutal example:
var re = /^(0[0-9]|1[0-9]|2[0-3]|[0-9])\:(0[0-9]|1[0-9]|2[0-9]|3[0-9]|4[0-9]|5[0-9]|[0-9])\:(0[0-9]|1[0-9]|2[0-9]|3[0-9]|4[0-9]|5[0-9]|[0-9])$/;
re.exec('19:05:30'); // ['19:05:30', '19', '05', '30']
This regular expression directly recognizes a valid time. However, sometimes a regular expression alone cannot fully validate the input, for example when recognizing dates:
var re = /^(0[1-9]|1[0-2]|[0-9])-(0[1-9]|1[0-9]|2[0-9]|3[0-1]|[0-9])$/;
Invalid dates such as '2-30' and '4-31' still cannot be rejected by the regular expression, or a pattern that rejects them would be very hard to write; in such cases the program has to help with the validation.
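One way to combine the two is to let the regular expression check the shape and extract the numbers, then let ordinary code check the day against the month. This is a sketch under stated assumptions: the function name isValidMonthDay is hypothetical, and February is allowed 29 days because the input carries no year.

```javascript
function isValidMonthDay(text) {
    // The date pattern from the text: month-day, with or without a leading zero.
    const match = /^(0[1-9]|1[0-2]|[0-9])-(0[1-9]|1[0-9]|2[0-9]|3[0-1]|[0-9])$/.exec(text);
    if (match === null) {
        return false;
    }
    const month = parseInt(match[1], 10);
    const day = parseInt(match[2], 10);
    // Assumption: no year is given, so February is allowed 29 days.
    const daysInMonth = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];
    // The pattern also lets a single 0 through, so reject month 0 and day 0 here.
    return month >= 1 && day >= 1 && day <= daysInMonth[month - 1];
}

console.log(isValidMonthDay('2-29'));  // true
console.log(isValidMonthDay('2-30'));  // false
console.log(isValidMonthDay('4-31'));  // false
console.log(isValidMonthDay('12-31')); // true
```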
Greedy match
Note in particular that regular expression matching is greedy by default, meaning it matches as many characters as possible. For example, to match the trailing 0s of a number:
var re = /^(\d+)(0*)$/;
re.exec('102300'); // ['102300', '102300', '']
Because \d+ is greedy, it matches all of the trailing 0s as well, so 0* can only match the empty string.
For the trailing 0s to be left to 0*, \d+ must use a non-greedy match (that is, match as few characters as possible). Adding a ? after \d+ makes it non-greedy:
var re = /^(\d+?)(0*)$/;
re.exec('102300'); // ['102300', '1023', '00']
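The difference between the two modes can be wrapped in a small helper for experimenting. This is a sketch; the function name splitTrailingZeros and the property names head and zeros are hypothetical.

```javascript
// Split a digit string into its leading part and its trailing zeros.
function splitTrailingZeros(digits) {
    // \d+? takes as few digits as possible, leaving the zeros to 0*.
    const match = /^(\d+?)(0*)$/.exec(digits);
    if (match === null) {
        return null;
    }
    return { head: match[1], zeros: match[2] };
}

console.log(splitTrailingZeros('102300')); // { head: '1023', zeros: '00' }
console.log(splitTrailingZeros('1000'));   // { head: '1', zeros: '000' }
console.log(splitTrailingZeros('0'));      // { head: '0', zeros: '' }

// For comparison, the greedy version leaves nothing for the second group.
console.log(/^(\d+)(0*)$/.exec('102300')[2] === ''); // true
```

Even in non-greedy mode \d+? must match at least one digit, which is why the input '0' ends up entirely in head.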
Global Search
JavaScript regular expressions also have several flags; the most commonly used is g, which indicates global matching:
var r1 = /test/g;
// Equivalent to:
var r2 = new RegExp('test', 'g');
With global matching, exec() can be called multiple times to search for successive matches. When the g flag is specified, each call to exec() updates the regular expression's lastIndex property, which is the index just past the end of the previous match:
var s = 'JavaScript, VBScript, JScript and ECMAScript';
var re = /[a-zA-Z]+Script/g;
// Use global matching:
re.exec(s); // ['JavaScript']
re.lastIndex; // 10
re.exec(s); // ['VBScript']
re.lastIndex; // 20
re.exec(s); // ['JScript']
re.lastIndex; // 29
re.exec(s); // ['ECMAScript']
re.lastIndex; // 44
re.exec(s); // null, no further match before the end of the string
Global matching behaves like a search, so it should not be combined with /^...$/, which would match at most once.
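In practice the repeated exec() calls are usually written as a loop that stops at null. The sketch below collects every match with its position; the variable names are hypothetical.

```javascript
const text = 'JavaScript, VBScript, JScript and ECMAScript';
const scriptPattern = /[a-zA-Z]+Script/g;

// Keep calling exec() until it returns null.
const found = [];
let match;
while ((match = scriptPattern.exec(text)) !== null) {
    found.push(match[0] + ' at ' + match.index);
}
console.log(found);
// [ 'JavaScript at 0', 'VBScript at 12', 'JScript at 22', 'ECMAScript at 34' ]

// After the failed call, lastIndex is reset to 0.
console.log(scriptPattern.lastIndex); // 0

// An anchored pattern matches at most once, even with the g flag.
const anchored = /^\d+$/g;
console.log(anchored.exec('12345') !== null); // true
console.log(anchored.exec('12345'));          // null
```

Because lastIndex lives on the RegExp object, a global pattern that is shared between calls carries its position along with it.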
A regular expression can also specify the i flag, which ignores case, and the m flag, which performs multiline matching.
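A short sketch of what these two flags change; the sample strings are assumptions.

```javascript
// i: ignore case.
console.log(/javascript/.test('JavaScript'));  // false
console.log(/javascript/i.test('JavaScript')); // true

// m: ^ and $ match at the start and end of every line, not only of the whole string.
const lines = 'first\nsecond\nthird';
console.log(/^second$/.test(lines));  // false
console.log(/^second$/m.test(lines)); // true

// Flags can be combined, for example g and m to collect every line.
console.log(lines.match(/^\w+$/gm)); // [ 'first', 'second', 'third' ]
```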
Summary
Regular expressions are very powerful, and it is impossible to cover them fully in a short section. Everything there is to know about them could fill a thick book. If you frequently run into problems with regular expressions, a dedicated reference book may be worth having.