This article introduces the concepts of character groups, quantifiers, string start/end positions, grouping, alternation (selection structure), backreferences, and named groups in regular expressions by progressively refining a regular expression that validates a mobile phone number.
1 Basic verification
That is, verify whether the string is an 11-digit number.
An expression
[0123456789]{11}
or [0-9]{11}
or \d{11}
Knowledge points
Character groups: in a regular expression, square brackets [...] denote a character group (also called a character class). A character group lists the characters that may appear at a single position.
For example, [0123456789] matches any one of the digits 0 through 9; [0123ABC] matches any one of the digits 0, 1, 2, 3 or the letters A, B, C.
Range notation for character groups: a hyphen inside a character group ([...-...]) denotes a range of characters.
For example, [a-z] matches any one lowercase English letter; [a-zA-Z] matches any one lowercase or uppercase English letter; [0-9] matches any one of the digits 0123456789.
Note that a range covers every character whose ASCII code lies between the code of the starting character and the code of the ending character, inclusive.
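The effect of code-order ranges can be checked with a small sketch. It uses Python's re module rather than the article's PHP, and the probe string and variable names are illustrative assumptions. It also shows why a range such as [A-z] is wider than it looks: it spans the punctuation that sits between Z and a in ASCII.

```python
import re

# A range covers every character code between its two endpoints.
lower = re.compile(r"[a-z]")        # codes 97..122
letters = re.compile(r"[a-zA-Z]")   # two ranges in one character group
wide = re.compile(r"[A-z]")         # codes 65..122, so it includes [ \ ] ^ _ `

probe = "aZ5_"  # hypothetical test characters
lower_hits = [ch for ch in probe if lower.fullmatch(ch)]
letter_hits = [ch for ch in probe if letters.fullmatch(ch)]
wide_hits = [ch for ch in probe if wide.fullmatch(ch)]

print(lower_hits)   # ['a']
print(letter_hits)  # ['a', 'Z']
print(wide_hits)    # ['a', 'Z', '_']
```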
Shorthand notation for character groups: regular expressions define shorthand symbols for some commonly used character groups.
\d all digits, i.e. [0-9]
\D all non-digits, the complement of \d
\w all word characters (letters, digits, underscore), i.e. [0-9a-zA-Z_]
\W all non-word characters, the complement of \w
\s all whitespace characters, including spaces, tabs, carriage returns, line feeds, and so on
\S all non-whitespace characters, the complement of \s
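A minimal sketch of the shorthand symbols, using Python's re module rather than the article's PHP. The sample string is made up for illustration. The re.ASCII flag is passed because Python's \d and \w otherwise also accept non-ASCII digits and letters, which would differ from the [0-9] and [0-9a-zA-Z_] definitions given above.

```python
import re

sample = "tel: 180_1234\tok"  # hypothetical input

digits = re.findall(r"\d", sample, flags=re.ASCII)
non_word = re.findall(r"\W", sample, flags=re.ASCII)
whitespace = re.findall(r"\s", sample)

# Each upper-case shorthand is the complement of the lower-case one:
# every character matches exactly one of \d and \D.
complementary = all(
    (re.fullmatch(r"\d", ch, flags=re.ASCII) is None)
    != (re.fullmatch(r"\D", ch, flags=re.ASCII) is None)
    for ch in sample
)

print(digits)         # ['1', '8', '0', '1', '2', '3', '4']
print(non_word)       # [':', ' ', '\t']
print(whitespace)     # [' ', '\t']
print(complementary)  # True
```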
Quantifier: a quantifier specifies how many times the element it modifies (such as a character or a character group) may occur.
The general form of a quantifier is {m,n} (the comma must not be followed by a space), meaning that the element it modifies occurs at least m times and at most n times. In particular:
{m} means the modified element occurs exactly m times;
{0,n} means the modified element occurs at most n times and at least 0 times;
{m,} means the modified element occurs at least m times.
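The quantifier forms above can be probed by testing strings of increasing length. This is a sketch using Python's re module rather than the article's PHP; the helper name occurs is an assumption made for the example. The last line illustrates the rule about the space after the comma as Python treats it: the malformed quantifier is read as literal text, so it no longer acts as a repetition count.

```python
import re

def occurs(pattern: str, text: str) -> bool:
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# For each pattern, list the lengths (0..6) of digit strings it accepts.
between = [n for n in range(7) if occurs(r"\d{2,4}", "7" * n)]
exactly = [n for n in range(7) if occurs(r"\d{3}", "7" * n)]
at_most = [n for n in range(7) if occurs(r"\d{0,2}", "7" * n)]
at_least = [n for n in range(7) if occurs(r"\d{4,}", "7" * n)]

# With a space after the comma, Python reads "{2, 4}" as literal characters.
spaced = occurs(r"\d{2, 4}", "777")

print(between)   # [2, 3, 4]
print(exactly)   # [3]
print(at_most)   # [0, 1, 2]
print(at_least)  # [4, 5, 6]
print(spaced)    # False
```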
2 Is the length really only 11?
Observe the code in the following GIF: when the input string is a 15-digit number, the first 11 digits still match. Even when the input is abcd180123412341234, 11 digits are matched.
This is because the regular expression above means "match 11 digits", so the match succeeds as long as the input string contains 11 consecutive digits. To verify that the input string is exactly a phone number, use the string start position ^ and the string end position $ in the regular expression.
An expression
^\d{11}$
Knowledge points
Some symbols in a regular expression match a position rather than text; these are called anchors. ^ and $ are two of them.
^ matches the position at the start of the string
$ matches the position at the end of the string
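The difference the anchors make can be seen by running both forms against the inputs mentioned in this section. This is a sketch in Python's re module rather than the article's PHP; the helper first_hit is a hypothetical name. One Python-specific detail is an addition not covered by the article: $ also matches just before a trailing newline, so re.fullmatch is the stricter check.

```python
import re

unanchored = re.compile(r"\d{11}")
anchored = re.compile(r"^\d{11}$")

def first_hit(pattern, text):
    """Return the matched text, or None when nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None

inputs = ["18012341234", "180123412341234", "abcd180123412341234"]
loose = [first_hit(unanchored, s) for s in inputs]
strict = [first_hit(anchored, s) for s in inputs]

print(loose)   # ['18012341234', '18012341234', '18012341234']
print(strict)  # ['18012341234', None, None]

# In Python, $ tolerates one trailing newline; fullmatch does not.
newline_ok = anchored.search("18012341234\n") is not None
full_ok = re.fullmatch(r"\d{11}", "18012341234\n") is not None
print(newline_ok, full_ok)  # True False
```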
3 More rigorous validation
Domestic mobile phone numbers begin with 130-139, 150-153, 155-159, 180, 182, or 185-189; in addition, there are 170, 176-178, and so on. The expression from the previous section does not validate the beginning of the phone number.
An expression
^1(3[0-9]|5[012356789]|8[0256789]|7[0678])\d{8}$
Knowledge points
Grouping: parentheses (...) in a regular expression denote a group (subexpression). Besides the text matched by the whole expression, the match result then also contains the text matched by each subexpression. In the result array $arr of the call below, element 0 is the text matched by the entire regular expression, and element 1 is the text matched by the expression inside the parentheses.
preg_match('/^1(3[0-9]|5[012356789]|8[0256789]|7[0678])\d{8}$/', '18012341234', $arr);
print_r($arr);
/* Array ( [0] => 18012341234 [1] => 80 ) */
Alternation (selection structure): subexpressions inside a pair of parentheses (...) can be separated by a vertical bar | to denote alternatives; the parenthesized expression matches if any one alternative matches.
For example, (3[0-9]|5[012356789]|8[0256789]|7[0678]) means that the text matched here can be any of 3[0-9], 5[012356789], 8[0256789], or 7[0678].
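For comparison, grouping and alternation can be sketched in Python's re module as well (an assumption; the article's own snippet is PHP). Here group(0) plays the role of element 0 of the result array and group(1) the role of element 1. The rejected sample numbers are made up to exercise prefixes that are absent from the alternatives.

```python
import re

pattern = re.compile(r"^1(3[0-9]|5[012356789]|8[0256789]|7[0678])\d{8}$")

m = pattern.search("18012341234")
whole = m.group(0)   # text matched by the entire expression
prefix = m.group(1)  # text matched by the parenthesized alternation

# The alternative 7[0678] also accepts the 176 prefix.
prefix_176 = pattern.search("17612341234").group(1)

# 154 and 181 are not among the alternatives, so these do not match.
rejected = [s for s in ["15412341234", "18112341234"] if pattern.search(s) is None]

print(whole, prefix)  # 18012341234 80
print(prefix_176)     # 76
print(rejected)       # ['15412341234', '18112341234']
```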
4 Icing on the cake
In some cases a separator appears inside the phone number, giving the form 180-1234-1234; for example, the iPhone automatically converts phone numbers to this format.
Using what has been covered so far, the following regular expression accepts the 180-1234-1234 form as well:
^1(3[0-9]|5[012356789]|8[0256789]|7[0678])-{0,1}\d{4}-{0,1}\d{4}$
Here -{0,1} means that the character - may occur once or not at all. This is the quantifier introduced earlier. Regular expressions also define special notation for the most commonly used quantifiers:
? is equivalent to {0,1}: 0 or 1 occurrences
+ is equivalent to {1,}: 1 or more occurrences
* is equivalent to {0,}: 0 or more occurrences
Therefore, the regular expression above is equivalent to:
^1(3[0-9]|5[012356789]|8[0256789]|7[0678])-?\d{4}-?\d{4}$
However, besides 18012341234 and 180-1234-1234, this expression also matches the two mixed forms 180-12341234 and 1801234-1234.
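A short sketch, in Python's re module rather than the article's PHP, that checks both points: the -{0,1} and -? spellings behave the same on a handful of samples, and the two optional dashes are independent, so the mixed forms slip through. The names PREFIX, long_form and short_form, and the last sample with a doubled dash, are assumptions made for the example.

```python
import re

PREFIX = r"1(3[0-9]|5[012356789]|8[0256789]|7[0678])"
long_form = re.compile(r"^" + PREFIX + r"-{0,1}\d{4}-{0,1}\d{4}$")
short_form = re.compile(r"^" + PREFIX + r"-?\d{4}-?\d{4}$")

samples = [
    "18012341234",
    "180-1234-1234",
    "180-12341234",   # mixed form
    "1801234-1234",   # mixed form
    "180--12341234",  # two dashes in a row
]

accepted = [s for s in samples if short_form.search(s)]
same = all(
    (long_form.search(s) is None) == (short_form.search(s) is None)
    for s in samples
)

print(accepted)  # the first four samples
print(same)      # True
```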
If we want to match only the two forms 18012341234 and 180-1234-1234, we can use a backreference:
^1(3[0-9]|5[012356789]|8[0256789]|7[0678])(-?)\d{4}\2\d{4}$
The \2 above is a backreference: it matches the same text that the second pair of parentheses (...) matched. A backreference has the form \num, where num is the number of an earlier group in the regular expression.
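The behaviour of \2 can be sketched in Python's re module (an assumption; the article's own snippet is PHP). Whatever the second group captured, either a dash or the empty string, must appear again at the position of \2, which is what rules out the mixed forms.

```python
import re

pattern = re.compile(
    r"^1(3[0-9]|5[012356789]|8[0256789]|7[0678])(-?)\d{4}\2\d{4}$"
)

plain = pattern.search("18012341234")
dashed = pattern.search("180-1234-1234")

# Group 2 holds the separator that \2 must repeat.
print(plain.groups())   # ('80', '')
print(dashed.groups())  # ('80', '-')

mixed = [s for s in ["180-12341234", "1801234-1234"] if pattern.search(s)]
print(mixed)  # []
```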
The regular expression above uses \2 as a backreference, but group 1 is never used. Can we skip groups we do not need? Non-capturing groups meet this requirement:
^1(?:3[0-9]|5[012356789]|8[0256789]|7[0678])(-?)\d{4}\1\d{4}$
Here (?:3[0-9]|5[012356789]|8[0256789]|7[0678]) is a non-capturing group. A non-capturing group has the form (?:...); the text it matches is no longer included in the match result, and it is not counted when groups are numbered.
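To see the renumbering, the capturing and non-capturing variants can be compared side by side. This is a sketch in Python's re module rather than the article's PHP; the variable names are illustrative. The compiled pattern's groups attribute reports how many capturing groups it contains.

```python
import re

capturing = re.compile(
    r"^1(3[0-9]|5[012356789]|8[0256789]|7[0678])(-?)\d{4}\2\d{4}$"
)
non_capturing = re.compile(
    r"^1(?:3[0-9]|5[012356789]|8[0256789]|7[0678])(-?)\d{4}\1\d{4}$"
)

number = "180-1234-1234"
with_prefix = capturing.search(number).groups()
without_prefix = non_capturing.search(number).groups()

print(capturing.groups, with_prefix)         # 2 ('80', '-')
print(non_capturing.groups, without_prefix)  # 1 ('-',)
```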
The group references above rely on the number of each subexpression. When a regular expression is complex or has many groups, working out each group's number is painful. Regular expressions therefore provide named groups:
^1(?:3[0-9]|5[012356789]|8[0256789]|7[0678])(?P<separato>-?)\d{4}(?P=separato)\d{4}$
In the regular expression above, (?P<separato>-?) is a named group. A named group has the form (?P<name>...), and a reference to a named group has the form (?P=name).
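A minimal sketch of the named-group form in Python's re module, which accepts the same (?P<name>...) and (?P=name) syntax (the article's own snippet is PHP). The group name separato is taken from the reference in the expression above; the remaining names are illustrative.

```python
import re

pattern = re.compile(
    r"^1(?:3[0-9]|5[012356789]|8[0256789]|7[0678])"
    r"(?P<separato>-?)\d{4}(?P=separato)\d{4}$"
)

dashed = pattern.search("180-1234-1234")
plain = pattern.search("18012341234")

# The captured separator is read back by name instead of by number.
print(dashed.group("separato"))  # -
print(plain.groupdict())         # {'separato': ''}

mixed_rejected = pattern.search("180-12341234") is None
print(mixed_rejected)  # True
```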
5 Summary
At this point we have a robust regular expression for validating phone numbers. Although its function is simple, it draws on many regular expression concepts.