Regular Expressions
When writing programs that process strings or web pages, you often need to find strings that follow certain complex rules. A regular expression is a tool for describing those rules. In other words, a regular expression is code that records rules about text.
You have very likely used the wildcards * and ? that Windows/DOS uses for finding files. To find all Word documents in a directory, you would search for *.doc, where * is interpreted as any string. Like wildcards, regular expressions are a tool for matching text, but they can describe what you need far more precisely. The price, of course, is greater complexity. For example, you can write a regular expression that finds every string starting with 0, followed by 2 to 3 digits, then a hyphen "-", and finally 7 or 8 digits (such as 010-12345678 or 0376-7654321).
Many words contain the two consecutive characters hi, such as him, history, and high. If you search with hi, the hi inside these words will also be found. To find exactly the word hi, use \bhi\b.
\b is a special code defined by regular expressions (some people call it a metacharacter) that represents the start or end of a word, that is, a word boundary. Although English words are usually separated by spaces, punctuation marks, or line breaks, \b does not match any of these separator characters; it matches only a position.
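The difference between searching for hi and for \bhi\b can be sketched with Python's standard re module. The sample sentence is made up for illustration.

```python
import re

# Hypothetical sample text containing "hi" both as a word and inside words.
text = "hi, this is his history; say hi to him"

# Plain "hi" also matches inside "this", "his", "history", and "him".
loose = re.findall(r"hi", text)

# \bhi\b requires a word boundary on both sides, so only the
# standalone word "hi" is found.
strict = re.findall(r"\bhi\b", text)

print(len(loose), len(strict))
```

The raw-string prefix r"..." matters here: without it, Python itself would read \b as a backspace character before the pattern ever reaches the regex engine.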
. is another metacharacter; it matches any character except a newline. * is also a metacharacter, but it represents neither a character nor a position. It represents a quantity: it specifies that whatever comes immediately before the * may be repeated any number of times for the whole expression to match. Put together, .* therefore means any number of characters that do not include a newline.
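A minimal Python sketch of how .* behaves, using an invented two-line string: the match runs up to the first newline and stops, and it also succeeds on an empty string because "any number of times" includes zero.

```python
import re

# Hypothetical two-line input.
text = "hi there\nsecond line"

# .* consumes characters until it reaches the newline, then stops.
first_line = re.match(r".*", text).group()

# Zero repetitions are allowed, so .* also matches an empty string.
empty = re.match(r".*", "").group()

print(repr(first_line), repr(empty))
```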
0\d\d-\d\d\d\d\d\d\d\d matches a string that starts with 0, followed by two digits, then a hyphen "-", and finally eight digits (that is, a Chinese phone number; of course, this example only matches numbers with a three-digit area code).
Here \d is a new metacharacter that matches one digit (0, or 1, or 2, and so on). - is not a metacharacter; it matches only itself, the hyphen.
To avoid so much tiresome repetition, we can also write the expression as 0\d{2}-\d{8}. Here the {2} ({8}) after \d means that the preceding \d must match exactly twice (eight times) in a row.
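As a quick check that the spelled-out form and the {n} form accept the same strings, here is a small Python sketch. The sample list is an assumption chosen to include one valid number and two that should be rejected.

```python
import re

# The same rule written two ways: repeated \d versus a {n} repeat count.
long_form = re.compile(r"0\d\d-\d\d\d\d\d\d\d\d")
short_form = re.compile(r"0\d{2}-\d{8}")

# Hypothetical samples: three-digit area code, four-digit area code,
# and a local number that is one digit short.
samples = ["010-12345678", "0376-7654321", "010-1234567"]

# fullmatch requires the whole string to fit the pattern.
results = [
    (s, bool(long_form.fullmatch(s)), bool(short_form.fullmatch(s)))
    for s in samples
]

for row in results:
    print(row)
```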
Regular expressions have more metacharacters. For example, \s matches any whitespace character, including spaces, tabs, and line breaks. \w matches letters, digits, underscores, and Chinese characters.
\ba\w*\b matches a word that starts with the letter a: first the start of a word (\b), then the letter a, then any number of letters or digits (\w*), and finally the end of the word (\b). \d+ matches one or more consecutive digits. Here + is a metacharacter similar to *; the difference is that * matches any number of repetitions (possibly zero), while + matches one or more. \b\w{6}\b matches a word of exactly 6 characters.
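The three patterns above can be tried side by side in Python. The sentence below is invented for the purpose; only the patterns come from the text.

```python
import re

# Hypothetical sample sentence.
text = "an apple a day keeps 1 doctor away, 365 days always"

# Words that start with the letter a.
a_words = re.findall(r"\ba\w*\b", text)

# Runs of one or more consecutive digits.
numbers = re.findall(r"\d+", text)

# Words that are exactly six characters long.
six_chars = re.findall(r"\b\w{6}\b", text)

print(a_words)
print(numbers)
print(six_chars)
```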
The metacharacters ^ (the symbol on the same key as the number 6) and $ both match a position, somewhat like \b. ^ matches the start of the string being searched, and $ matches the end. These two codes are very useful for validating input. For example, if a website requires the QQ number you enter to be 5 to 12 digits, you can use: ^\d{5,12}$.
The {5,12} here is similar to the {2} introduced earlier, except that {2} means exactly two repetitions, while {5,12} means no fewer than 5 and no more than 12 repetitions; anything else does not match. Because ^ and $ are used, the entire input string must match \d{5,12}, that is, the whole input must consist of 5 to 12 digits. So if an entered QQ number matches this regular expression, it meets the requirement.
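A minimal validation sketch in Python, assuming a helper named is_valid_qq (a name made up for this example). It uses the pattern exactly as given in the text.

```python
import re

# The pattern from the text: the whole input must be 5 to 12 digits.
QQ_PATTERN = re.compile(r"^\d{5,12}$")


def is_valid_qq(value: str) -> bool:
    """Hypothetical helper: True if value fits ^\\d{5,12}$."""
    # Note: in Python, $ also matches just before a single trailing
    # newline; QQ_PATTERN.fullmatch(value) would be stricter.
    return QQ_PATTERN.match(value) is not None


# Without the anchors, a digit run inside a longer string still matches,
# which is why ^ and $ matter for validation.
unanchored_hit = re.search(r"\d{5,12}", "abc1234567xyz") is not None

for candidate in ["12345", "1234", "1234567890123", "12345abc"]:
    print(candidate, is_valid_qq(candidate))
```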
Like the ignore-case option, some regular expression tools also offer an option for handling multiple lines. If it is selected, ^ and $ change meaning to match the start and end of each line.
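In Python's re module this option is the re.MULTILINE flag. The sketch below, on an invented three-line string, shows how the same patterns behave with and without it.

```python
import re

# Hypothetical three-line input.
text = "first line\nsecond line\nthird"

# Default: ^ matches only at the start of the whole string,
# and $ only at the end.
starts_default = re.findall(r"^\w+", text)
ends_default = re.findall(r"\w+$", text)

# With re.MULTILINE: ^ and $ match at the start and end of every line.
starts_multi = re.findall(r"^\w+", text, re.MULTILINE)
ends_multi = re.findall(r"\w+$", text, re.MULTILINE)

print(starts_default, starts_multi)
print(ends_default, ends_multi)
```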
If you want to search for a metacharacter itself, such as . or *, a problem arises: you cannot write it directly, because it would be interpreted as something else. In this case, use \ to cancel the character's special meaning, so you should write \. and \*. Of course, to search for \ itself, you also need to write \\.
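The effect of escaping can be sketched in Python as follows. The sample string is invented; re.escape is the standard-library helper that adds the backslashes automatically.

```python
import re

# Hypothetical sample; the raw string keeps the backslash literal.
text = r"C:\Windows a*b a.b axb"

# Unescaped . matches any character, so all three a?b forms are found.
any_char = re.findall(r"a.b", text)

# Escaped metacharacters match only themselves.
literal_dot = re.findall(r"a\.b", text)
literal_star = re.findall(r"a\*b", text)

# A literal backslash is written \\ in the pattern.
backslashes = re.findall(r"\\", text)

# re.escape builds the escaped pattern for an arbitrary literal string.
escaped = re.escape("a*b")

print(any_char, literal_dot, literal_star, len(backslashes), escaped)
```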
To match a set of characters that has no predefined metacharacter, list the characters inside square brackets: [aeiou] matches any single English vowel, and [.?!] matches a punctuation mark (. or ? or !).
We can also easily specify a character range. For example, [0-9] means exactly the same as \d: one digit. Similarly, [a-z0-9A-Z_] is equivalent to \w (if only English is taken into account).
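A short Python sketch of these character classes follows. The sample strings are assumptions, and the comparison with \w holds here only because the text is plain ASCII.

```python
import re

# Hypothetical ASCII-only sample text.
text = "Really? Yes! Item_42 costs 7."

# A bracketed set matches any one of the listed characters.
vowels = re.findall(r"[aeiou]", "education")
punctuation = re.findall(r"[.?!]", text)

# A range inside brackets: [0-9] behaves like \d on this input.
digits_range = re.findall(r"[0-9]", text)
digits_meta = re.findall(r"\d", text)

# [a-z0-9A-Z_] behaves like \w on ASCII-only input.
words_range = re.findall(r"[a-z0-9A-Z_]+", text)
words_meta = re.findall(r"\w+", text)

print(vowels, punctuation, digits_range, words_range)
```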
Here is a more complex expression: \(?0\d{2}[) -]?\d{8}. It can match phone numbers in several formats, such as (010)88886666, 022-22334455, or 02912345678. Let's analyze it: first comes the escaped character \(, which may appear 0 or 1 times (?); then a 0; then two digits (\d{2}); then one of ), -, or a space, which appears once or not at all (?); and finally eight digits (\d{8}).
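The expression can be exercised in Python against the three formats listed. The last two samples are added here as an observation: since the opening parenthesis and the separator are optional independently of each other, the pattern also accepts unbalanced forms.

```python
import re

# The phone pattern from the text.
PHONE = re.compile(r"\(?0\d{2}[) -]?\d{8}")

# The three formats named in the text.
expected = ["(010)88886666", "022-22334455", "02912345678"]
expected_ok = [bool(PHONE.fullmatch(s)) for s in expected]

# Unbalanced parentheses (samples invented for illustration)
# are accepted as well.
unbalanced = ["010)12345678", "(022-87654321"]
unbalanced_ok = [bool(PHONE.fullmatch(s)) for s in unbalanced]

print(expected_ok, unbalanced_ok)
```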
Branch conditions in regular expressions mean that there are several rules, and satisfying any one of them counts as a match. The way to do this is to separate the different rules with |.
Example:
0\d{2}-\d{8}|0\d{3}-\d{7} matches two kinds of hyphen-separated phone numbers: one with a three-digit area code and an eight-digit local number (such as 010-12345678), the other with a four-digit area code and a seven-digit local number (such as 0376-2233445).
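A small Python sketch of this two-branch pattern: the first two samples come from the text, and the last two are invented to show inputs that fit neither branch.

```python
import re

# Two alternatives: 3-digit area code + 8 digits, or 4-digit + 7 digits.
PHONE = re.compile(r"0\d{2}-\d{8}|0\d{3}-\d{7}")

samples = [
    "010-12345678",   # fits the first branch
    "0376-2233445",   # fits the second branch
    "0376-22334455",  # 4-digit area code with 8 digits: neither branch
    "010-1234567",    # 3-digit area code with 7 digits: neither branch
]

# fullmatch requires the whole string to fit one of the branches.
matches = [bool(PHONE.fullmatch(s)) for s in samples]
print(matches)
```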
\(?0\d{2}\)?[- ]?\d{8}|0\d{2}[- ]?\d{8}: this expression matches phone numbers with a three-digit area code. The area code may be enclosed in parentheses or not, and the area code and local number may be separated by a hyphen or a space, or not separated at all. You can try using branch conditions to extend this expression to support four-digit area codes as well.
\d{5}-\d{4}|\d{5} matches United States ZIP codes. The US ZIP code rule is five digits, or nine digits separated by a hyphen. This example is given because it illustrates a point: when using branch conditions, pay attention to the order of the conditions. If you change it to \d{5}|\d{5}-\d{4}, it will match only 5-digit ZIP codes (and the first 5 digits of 9-digit ZIP codes). The reason is that when matching branch conditions, each condition is tested from left to right, and once one is satisfied, the remaining conditions are not tried.
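The effect of branch order can be seen directly in Python. The ZIP code below is a made-up nine-digit value.

```python
import re

# Hypothetical nine-digit ZIP code.
zip_text = "12345-6789"

# Longer alternative first: the full nine-digit form is matched.
long_first = re.search(r"\d{5}-\d{4}|\d{5}", zip_text).group()

# Shorter alternative first: it succeeds immediately on the first five
# digits, so the second branch is never tried.
short_first = re.search(r"\d{5}|\d{5}-\d{4}", zip_text).group()

print(long_first, short_first)
```

A common way to follow this advice is to list the more specific (usually longer) alternative first.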