Definition
In JavaScript, you can define a regular expression with the built-in RegExp class:
var reName = new RegExp("bkjia");
The RegExp constructor accepts two arguments: the pattern string to match, and an optional second argument of flags that change how matching is done.
var reName = new RegExp("bkjia", "i"); // case-insensitive
What does reName look like when it is printed? Let's check:
document.write(reName);
The result is /bkjia/i. This shows the second way to define a regular expression in JavaScript, the Perl-style literal:
var reName = /bkjia/;
And the second argument? Flags are simply appended after the closing slash:
var reName = /bkjia/i;
Both forms work, so choose whichever you prefer. This mirrors strings, which can be created either as var s = new String("for a simple life"); or as var s = "for a simple life"; (strictly speaking, the first creates a String wrapper object rather than a primitive). We recommend the Perl-style literal: besides being more concise, it avoids the extra escaping of "\" that the RegExp constructor requires.
To match the character "\", the Perl-style literal is written as:
var res = /\\/;
With the constructor, the backslash must be escaped twice, once for the string literal and once for the regular expression:
var res = new RegExp("\\\\");
That is considerably more cumbersome.
Remember that in a regular expression, "\" always combines with the character that follows it to form an escape.
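Here is a minimal sketch comparing the two forms, runnable in Node.js. It uses console.log in place of document.write, which only exists in browsers. The variable names and the sample path string are illustrative.

```javascript
// Two ways to build the same case-insensitive pattern.
const fromConstructor = new RegExp("bkjia", "i");
const fromLiteral = /bkjia/i;

// Both print as /bkjia/i and expose the same source and flags.
console.log(String(fromConstructor));                     // /bkjia/i
console.log(fromConstructor.source === fromLiteral.source); // true
console.log(fromConstructor.flags === fromLiteral.flags);   // true

// Matching one literal backslash.
const text = "C:\\temp";                  // the string C:\temp
const backslashLiteral = /\\/;            // regex \\ : escaped once
const backslashCtor = new RegExp("\\\\"); // string "\\\\" becomes regex \\
console.log(backslashLiteral.test(text)); // true
console.log(backslashCtor.test(text));    // true
console.log(backslashCtor.source);        // \\ (two characters)
```

In the constructor form, the string literal consumes one level of escaping before RegExp ever sees the pattern. That is why four backslashes are needed there and only two in the literal.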
Regular Expressions in JavaScript
The previous section only showed how to define a regular expression in JavaScript. To actually use one, call the regex-aware methods of RegExp and String objects:
- test -- RegExp method; tests whether a string matches the pattern and returns a Boolean value;
- exec -- RegExp method; returns an array describing the first match (including any captured groups), or null if there is no match;
- match -- String method; with the g flag it returns an array of all matching substrings (or null if there are none); without g it returns the same result as exec;
- replace -- String method; replaces matched text, and accepts a regular expression as well as a plain string;
- search -- String method; like indexOf, but accepts a regular expression and returns the index of the first match, or -1;
- split -- String method; splits the string wherever the separator (a string or a regular expression) matches and returns the substrings in an array.
For full details on each method, consult a JavaScript reference manual.
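Each of the six methods above can be shown in one call on a made-up sample string. This is a sketch, and console.log stands in for document.write.

```javascript
const str = "Tom is 23, Ann is 31";
const digits = /\d+/g; // global: used by match and replace below

const isMatch = /\d/.test(str);                  // test: Boolean
const first = /(\w+) is (\d+)/.exec(str);        // exec: first match + groups
const all = str.match(digits);                   // match with g: every match
const replaced = str.replace(digits, "?");       // replace every match
const annAt = str.search(/Ann/);                 // search: index of first match
const bobAt = str.search(/Bob/);                 // search: -1 when absent
const parts = "a, b;c".split(/[,;]\s*/);         // split on a pattern

console.log(isMatch);                                     // true
console.log(first[0], first[1], first[2], first.index);  // Tom is 23 Tom 23 0
console.log(all);                                         // [ '23', '31' ]
console.log(replaced);                                    // Tom is ?, Ann is ?
console.log(annAt, bobAt);                                // 11 -1
console.log(parts);                                       // [ 'a', 'b', 'c' ]
```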
Besides methods, a RegExp instance also has properties:
- global -- Boolean; true if the global flag g is set, otherwise false;
- ignoreCase -- Boolean; true if the case-insensitive flag i is set, otherwise false;
- lastIndex -- integer; the position at which the next match starts. exec and test read and update it only when the g (or y) flag is set;
- multiline -- Boolean; true if the multiline flag m is set, otherwise false;
- source -- the text of the pattern, without the slashes or flags. For /\\/, source is the two-character string \\.
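The following sketch reads the flag properties and then shows lastIndex changing across repeated test calls on a global regex. The pattern and input are made up for illustration.

```javascript
const flagsDemo = /o/gim;
console.log(flagsDemo.global, flagsDemo.ignoreCase, flagsDemo.multiline); // true true true
console.log(flagsDemo.source); // o

// A global regex remembers where it stopped, so test() is stateful.
const g = /o/g;
const calls = [];
for (let k = 0; k < 3; k++) {
  const matched = g.test("foo");
  calls.push([matched, g.lastIndex]);
}
console.log(calls); // [ [ true, 2 ], [ true, 3 ], [ false, 0 ] ]
```

The third call fails because the search starts at index 3, past the last "o". After a failed match lastIndex resets to 0. This is why the g flag is best left off a regex that is only used with test.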
Metacharacters
Some characters have a special syntactic meaning in regular expressions and must be escaped with "\" before they can be matched literally; "\" itself is one of them. Such characters are called metacharacters. The metacharacters in regular expressions are:
. \ / * ? + [ ( ) ] { } ^ $ |
This list can be hard to remember. When you are unsure whether a punctuation character is a metacharacter, escaping it is the safe choice. In ordinary patterns (without the u flag), escaping punctuation that is not a metacharacter is harmless, whereas leaving a metacharacter unescaped can cause unexpected results. Do not escape letters or digits, though, because sequences such as \d, \w, and \b have special meanings.
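When a pattern is built from arbitrary text with the RegExp constructor, escaping can be automated. The helper below is hypothetical, not a built-in. It prefixes every metacharacter from the list above with "\".

```javascript
// Hypothetical helper: put "\" in front of every regex metacharacter.
function escapeRegExp(text) {
  // "$&" in the replacement string inserts the matched character itself.
  return text.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
}

const raw = "1+1=2?";
const escaped = escapeRegExp(raw);
console.log(escaped); // 1\+1=2\?

const naive = new RegExp(raw);    // "+" and "?" act as quantifiers here
const safe = new RegExp(escaped); // every character is matched literally
console.log(naive.test(raw)); // false
console.log(safe.test(raw));  // true
```

Without escaping, "1+" means "one or more 1s". The naive pattern therefore looks for text like "11=" and never finds the literal "1+1=2?".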
Character sets
A single literal character can serve as a pattern, but real needs are rarely that simple. For example, suppose we want to match any digit from 0 to 9:
var i = 5;
var j = 6;
How do we write one regular expression that matches both numbers? A single literal character cannot do it. Instead, we can define a character set (character class) that matches any of the digits 0-9:
var reNum = /[0123456789]/;
document.write(reNum.test(i)); // true
document.write(reNum.test(j)); // true
Both calls to test return true. (test converts the numbers to the strings "5" and "6" before matching.)
Range matching
The previous example uses a character set. To match all 26 English letters in both upper and lower case, you could likewise list every letter in a set:
var reLetter = /[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ]/;
This regular expression is correct, but it is long. Is there a more concise way? Yes: you can specify a range of letters or digits.
var reNum = /[0-9]/;
var reLetter = /[a-zA-Z]/;
Here "-" defines a range. The range follows character-code (ASCII) order, so you cannot write /[A-z]/ to mean "all letters": the codes between "Z" and "a" also include [, \, ], ^, _, and `.
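The sketch below checks the ranges above and also demonstrates the /[A-z]/ pitfall. The sample characters are arbitrary.

```javascript
const reNum = /[0-9]/;
const reLetter = /[a-zA-Z]/;
const reWrong = /[A-z]/; // also covers [ \ ] ^ _ ` (character codes 91-96)

console.log(reNum.test("7"), reLetter.test("Q"));   // true true
console.log(reLetter.test("_"), reWrong.test("_")); // false true
```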
Non-matching
Many programming languages, JavaScript included, use "!" for logical negation. Regular expressions also support negation: /[^0-9]/ matches any character that is not a digit.
var i = 5;
var s = "o";
var rec = /[^0-9]/;
document.write(rec.test(i)); // false
document.write(rec.test(s)); // true
The ^ symbol performs the negation, and it must be the first character inside [] (as in [^0-9]), because outside [] ^ has a different special purpose.
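This sketch contrasts the two roles of ^ using made-up inputs. Inside [] it negates the set, and outside [] it anchors the match to the start of the string.

```javascript
const notDigit = /[^0-9]/;        // any single character that is not a digit
const startsWithDigit = /^[0-9]/; // ^ outside []: start of the string

console.log(notDigit.test("12345"), notDigit.test("123a5"));        // false true
console.log(startsWithDigit.test("5o"), startsWithDigit.test("o5")); // true false
```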
Special characters
You may find /[a-zA-Z]/ and /[0-9]/ not concise enough. Indeed, some common character sets can be written with special escape sequences. They are not essential, but they are very convenient. For example, /[0-9]/ can be written as:
var reNum = /\d/;
What about upper- and lowercase letters? Unfortunately, apart from POSIX character classes (which JavaScript does not support), there is no shorthand that matches exactly the letters a-z and A-Z.
Common special characters include:
- \d any digit, equivalent to [0-9]
- \D any non-digit character, equivalent to [^0-9]
- \w any letter, digit, or underscore, equivalent to [a-zA-Z0-9_]
- \W any character other than a letter, digit, or underscore, equivalent to [^a-zA-Z0-9_]
- \s any whitespace character, including space, form feed, line feed, carriage return, tab, vertical tab, and other Unicode spaces; roughly [ \f\n\r\t\v]
- \S any non-whitespace character; roughly [^ \f\n\r\t\v]
- . any character except a line terminator (line feed, carriage return, and the Unicode line and paragraph separators); roughly [^\n\r]
Each uppercase escape matches exactly the characters that its lowercase counterpart does not.
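Here are a few of these escapes applied to one made-up string that contains a tab and a space. This is a sketch, and console.log stands in for document.write.

```javascript
const sample = "Item_42\tcosts $7";

const digitsFound = sample.match(/\d/g);   // \d: digits
const words = sample.match(/\w+/g);        // \w: letters, digits, underscore
const spaces = sample.match(/\s/g);        // \s: the tab and the space
const onlyDigits = sample.replace(/\D/g, ""); // \D: remove every non-digit

console.log(digitsFound);   // [ '4', '2', '7' ]
console.log(words);         // [ 'Item_42', 'costs', '7' ]
console.log(spaces.length); // 2
console.log(onlyDigits);    // 427
console.log(/a.b/.test("a-b"), /a.b/.test("a\nb")); // true false: "." skips line feeds
```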
Hexadecimal and octal characters
You can also write characters in a regular expression as hexadecimal or octal escapes. Each escape matches the character whose code equals its value (for these values, the corresponding ASCII character). Octal escapes are a legacy feature and are rejected when the u flag is set, so hexadecimal escapes are usually the safer choice.
var reAt = /\x40/; // hexadecimal \x40 (64) is the character "@"
var reA = /\101/; // octal \101 (65) is the character "A"
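The sketch below sticks to hexadecimal and Unicode escapes, which also work with the u flag. It checks the character codes with String.fromCharCode, and the sample inputs are made up.

```javascript
const reAt = /\x40/;         // hex 40 = decimal 64 = "@"
const reA = /\x41/;          // hex 41 = decimal 65 = "A"
const reAUnicode = /\u0041/; // the same character written as a Unicode escape

console.log(reAt.test("me@example.com"));         // true
console.log(reA.test("A"), reAUnicode.test("A")); // true true
console.log(String.fromCharCode(0x40, 0o101));    // @A (0o101 is octal 101 = 65)
```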
Repetition matching
Take matching an email address as an example. An address such as mymail@mail.com must contain a valid user name (mymail), the @ symbol, and a valid domain. The lengths of the user name and the domain cannot be known in advance, but two things are certain: the user name has at least one character, and the domain contains a dot with at least one character on each side. So we can write:
var reMail = /\w+@\w+\.\w+/i;
var email = "mymail@mail.com";
document.write(reMail.test(email)); // true
"+" means the preceding character appears one or more times, that is, at least once. This regular expression does not actually match every valid email address; we will refine it later.
Besides "+", there are several other ways to specify how many times something may be matched:
- ? zero or one occurrence (at most once)
- * any number of occurrences (zero, one, or more)
- + one or more occurrences (at least once)
- {n} exactly n occurrences
- {n,m} at least n and at most m occurrences
- {n,} at least n occurrences
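The sketch below first tries the email pattern on a few made-up addresses. It then exercises each quantifier, including {n,} (at least n). The count patterns are anchored with ^ and $ so the count applies to the whole string, and console.log stands in for document.write.

```javascript
// The email pattern from the text: unanchored, so any matching substring counts.
const reMail = /\w+@\w+\.\w+/i;
// Anchored variant: the whole string must fit the pattern.
const reMailExact = /^\w+@\w+\.\w+$/i;

console.log(reMail.test("mymail@mail.com"));         // true
console.log(reMail.test("mymail@mail"));             // false: no dot in the domain
console.log(reMail.test("my.mail@mail.co.uk"));      // true: "mail@mail.co" matches
console.log(reMailExact.test("my.mail@mail.co.uk")); // false: "." is not in \w

// Quantifiers, anchored so the count applies to the whole string.
const optionalU = /^colou?r$/;  // ?     zero or one
const anyB = /^ab*c$/;          // *     zero or more
const someB = /^ab+c$/;         // +     one or more
const fourDigits = /^\d{4}$/;   // {n}   exactly n
const twoOrThree = /^\d{2,3}$/; // {n,m} between n and m
const atLeastTwo = /^\d{2,}$/;  // {n,}  n or more

console.log(optionalU.test("color"), optionalU.test("colour")); // true true
console.log(anyB.test("ac"), someB.test("ac"));                 // true false
console.log(fourDigits.test("2024"), fourDigits.test("202"));   // true false
console.log(twoOrThree.test("7"), twoOrThree.test("123"));      // false true
console.log(atLeastTwo.test("12345"));                          // true
```

The "my.mail@mail.co.uk" case shows why the unanchored pattern is only a rough filter. It accepts the string by matching a piece of it, while the anchored form rejects the whole string.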
The three addresses www.gogle.com, www.google.com, and www.gooogle.com can all be used to open the Google home page, so we can use {n,m} to match one, two, or three "o" letters:
var gogle = "www.gogle.com";
var google = "www.google.com";
var gooogle = "www.gooogle.com";
var reGoogle = /w{3}\.go{1,3}gle\.com/i;
document.write(reGoogle.test(gogle)); // true
document.write(reGoogle.test(google)); // true
document.write(reGoogle.test(gooogle)); // true
In this regular expression, {3} requires the character "w" to appear exactly three times, and {1,3} allows the letter "o" to appear one to three times.
Prevent over-matching
Consider the following HTML text:
var html = "<em>bkjia</em>for a simple life<em>http://bkjia.com/</em>";
To match each <em> </em> pair together with the text between the tags, you might write the regular expression in either of these ways:
var reEm1 = /<em>.*<\/em>/gi;
document.write(html.match(reEm1)); // <em>bkjia</em>for a simple life<em>http://bkjia.com/</em>
var reEm2 = /<em>.*?<\/em>/gi;
document.write(html.match(reEm2)); // <em>bkjia</em>,<em>http://bkjia.com/</em>
In greedy mode, ".*" matches as much as possible, so the match runs from the first <em> to the last </em> and the entire string is output. In lazy mode, ".*?" matches as little as possible, so each <em>...</em> pair is matched separately, which is exactly what we want.
The lazy syntax is simple: append "?" to a greedy quantifier.
- * -> *?
- + -> +?
- {n,} -> {n,}?
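This sketch compares each greedy quantifier with its lazy counterpart on made-up inputs. It uses <b> tags instead of <em> only to keep the strings short.

```javascript
const html = "<b>one</b><b>two</b>";

const greedyTags = html.match(/<b>.+<\/b>/g); // +  : runs to the last </b>
const lazyTags = html.match(/<b>.+?<\/b>/g);  // +? : stops at the first </b>
const greedyRun = "aaaa".match(/a{2,}/)[0];   // {2,}  : as many as possible
const lazyRun = "aaaa".match(/a{2,}?/)[0];    // {2,}? : as few as allowed
const lazyStar = "aaaa".match(/a*?/)[0];      // *?    : zero is allowed

console.log(greedyTags); // [ '<b>one</b><b>two</b>' ]
console.log(lazyTags);   // [ '<b>one</b>', '<b>two</b>' ]
console.log(greedyRun, lazyRun, lazyStar.length); // aaaa aa 0
```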
Position matching
var s = "_Don't do it!";
How do we match the word "do"? Easy:
var reDo = /do/gi;
document.write(s.match(reDo)); // Do,do
However, this simple regular expression /do/gi also matches the "Do" in "Don't", which is not what we want. In regular expressions, "\b" matches a word boundary:
var reDo = /\bdo\b/gi;
document.write(s.match(reDo)); // do
What exactly does "\b" match? It matches a position rather than a character: a position between a "\w" character (letter, digit, or underscore) and a "\W" character, or between a "\w" character and the start or end of the string.
If there is "\b", is there a "\B"? Yes. It is the opposite of "\b" and matches a position that is not a word boundary. For example, to match the "Do" in "_Don't" from the example above, use "\B":
var reDo = /\Bdo\B/gi;
document.write(s.match(reDo)); // Do
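The sketch below reproduces the three matches and adds one made-up contrast case. The leading underscore in "_Don't" is what lets \B succeed: "_" is a \w character, so there is no boundary between it and "D".

```javascript
const s = "_Don't do it!";

const plain = s.match(/do/gi);          // both occurrences
const wholeWord = s.match(/\bdo\b/gi);  // only the standalone word
const inside = s.match(/\Bdo\B/gi);     // only "Do" inside "_Don't"
// Without the underscore, "Do" starts at a word boundary, so \B fails:
const noUnderscore = "Don't do it!".match(/\Bdo\B/gi);

console.log(plain);        // [ 'Do', 'do' ]
console.log(wholeWord);    // [ 'do' ]
console.log(inside);       // [ 'Do' ]
console.log(noUnderscore); // null
```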
In the section on negation, ^ produced a negated set only when it appeared immediately after "[" inside []. ^ also has another purpose: matching a string boundary.
- ^ matches the start of a string
- $ matches the end of a string
For example, suppose we want to match a .com domain in the form http://bkjia.com:
var url = "http://bkjia.com";
var reUrl = /^(http):\/\/bkjia\.(com)$/i;
document.write(reUrl.test(url)); // true
The regular expression reUrl requires the string to start with "http" and end with "com". The g flag is left off because a global regex stores its position in lastIndex, so repeated test calls on it can return different results.
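To see what the anchors add, the sketch below uses a .com pattern with the same shape as reUrl and compares it with an unanchored search. The test URLs are made up.

```javascript
const reUrl = /^(http):\/\/bkjia\.(com)$/i;

console.log(reUrl.test("http://bkjia.com"));      // true
console.log(reUrl.test("http://bkjia.com/path")); // false: $ requires the end
console.log(reUrl.test("see http://bkjia.com"));  // false: ^ requires the start
console.log(/bkjia\.com/.test("see http://bkjia.com/path")); // true: no anchors
```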
Another example is a trim function that strips leading and trailing whitespace from a string:
function trim(s) {
  return s.replace(/(^\s*)|(\s*$)/g, "");
}
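Here is a usage sketch for this trim function, with a made-up input string. Modern engines also provide the built-in String.prototype.trim (standard since ES5), which gives the same result for ordinary whitespace.

```javascript
function trim(s) {
  // ^\s* removes leading whitespace; \s*$ removes trailing whitespace.
  return s.replace(/(^\s*)|(\s*$)/g, "");
}

const padded = "  for a simple life \n";
console.log("[" + trim(padded) + "]");     // [for a simple life]
console.log(trim(padded) === padded.trim()); // true
```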
We can also enable multiline mode. In JavaScript this is done with the m flag (for example /^do/m); the standalone (?m) prefix used by some other regex engines is not supported. In multiline mode, ^ matches not only the start of the string but also the position just after each line terminator (such as a line break), and $ matches not only the end of the string but also the position just before each line terminator.
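The sketch below runs the same anchored patterns with and without the m flag on a made-up two-line string. It shows how ^ and $ change their meaning.

```javascript
const text = "first line\nsecond line";

const startsNoM = text.match(/^\w+/g);   // ^ only at the start of the string
const startsM = text.match(/^\w+/gm);    // ^ also after each line break
const endsM = text.match(/\w+$/gm);      // $ also before each line break

console.log(startsNoM);              // [ 'first' ]
console.log(startsM);                // [ 'first', 'second' ]
console.log(endsM);                  // [ 'line', 'line' ]
console.log(/^\w+/m.multiline);      // true
```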