Regular Expressions (4) -- Position Matching (Lookaround, Word Boundaries, and Anchors)
This article covers position matching in regular expressions. It has two parts: the simple anchors and word boundaries, and the more complicated zero-width assertions. The complexity pays off: zero-width assertions are harder to write, but the positions they can match are far more sophisticated. A zero-width assertion is also called a lookaround, and this part is important.

The standard anchors are ^ and $. In general, when no match mode is set, the metacharacter ^ matches the starting position of the text, and $ matches the end of the text or the position just before a line break \n at the end of the text. For example, the regular expression s$ matches a line that ends with the character "s", that is, a string such as [...s\n].

Support for word boundaries differs between tools. \b is the most widely supported, and some tools also support \< and \>. \b matches a word boundary, that is, either the start or the end of a word, while \< and \> match the start position and the end position of a word respectively. These metacharacters are very simple. However, each tool has its own definition of a "word character", so the same word boundary may match different content in different tools. In general, any ordinary word such as [happy] or [new] will certainly match. Word boundaries do not analyze the meaning of words; the rule is usually nothing more than a run of connected word characters. Text such as [a1d2c3d4] therefore also matches as one word, whereas text such as [I.T] does not match as a single word.

Lookaround is also called zero-width assertion, and the name can be read literally. Zero width means that the width of the match is 0: matching one character gives a width of 1, while matching a position consumes no character, so the width is 0. Assertion means that certain conditions must be satisfied.
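Anchors and word boundaries are themselves zero-width, which a short sketch can make concrete. The example below assumes Python's re module (the article does not name a tool, and other tools may define word characters differently); the sample strings and variable names are illustrative only.

```python
import re

text = "cats\n"

# ^ and $ match positions, so their spans are empty (start == end).
caret_span = re.search(r"^", text).span()
dollar_span = re.search(r"$", text).span()  # first hit: just before the final \n

# s$ matches the "s" that sits right before the trailing line break.
s_span = re.search(r"s$", text).span()

# \b matches at every word start and word end.
boundaries = [m.start() for m in re.finditer(r"\b", "happy new")]

# \w+ between two \b picks out whole runs of word characters.
words = re.findall(r"\b\w+\b", "happy new a1d2c3d4 I.T")

print(caret_span, dollar_span, s_span)
print(boundaries)
print(words)
```

In this sketch every span found for ^ and $ has equal start and end offsets, because a position was matched rather than a character. The last list shows [a1d2c3d4] kept whole while [I.T] is split at the dot.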
So a zero-width assertion matches a position that satisfies certain conditions. Lookaround comes in two directions, lookahead and lookbehind, and each has a positive and a negative form: [positive lookahead (?=Expression)], [negative lookahead (?!Expression)], [positive lookbehind (?<=Expression)], and [negative lookbehind (?<!Expression)]. Lookahead tests the text to the right of the current position, and lookbehind tests the text to the left. The use of lookbehind is restricted in current regular expression tools; that restriction is covered at the end of this article. First, the matching rules.

For positive lookahead (?=Expression): when the subexpression Expression matches the text to the right of the current position, (?=Expression) succeeds and the engine reports a successful match at the current position. When Expression fails to match the text on the right, (?=Expression) fails.

For negative lookahead (?!Expression): when Expression fails to match the text to the right of the current position, (?!Expression) succeeds. When Expression matches the text on the right, (?!Expression) fails.

For positive lookbehind (?<=Expression): when Expression matches the text to the left of the current position, (?<=Expression) succeeds and the engine reports a successful match at the current position. When Expression fails to match the text on the left, (?<=Expression) fails.

For negative lookbehind (?<!Expression): when Expression fails to match the text to the left of the current position, (?<!Expression) succeeds. When Expression matches the text on the left, (?<!Expression) fails.

The examples that follow are taken from the web; the first uses a negative lookahead.
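The four rules above can be tried side by side. This is a minimal sketch assuming Python's re module; the helper name spans and the sample strings are hypothetical and chosen only so that each form matches exactly once.

```python
import re

def spans(pattern, text):
    """Return the (start, end) span of every match of pattern in text."""
    return [m.span() for m in re.finditer(pattern, text)]

ahead_text = "foobar foobaz"
behind_text = "foobar bazbar"

# Positive lookahead: "foo" only where "bar" follows on the right.
pos_ahead = spans(r"foo(?=bar)", ahead_text)

# Negative lookahead: "foo" only where "bar" does NOT follow.
neg_ahead = spans(r"foo(?!bar)", ahead_text)

# Positive lookbehind: "bar" only where "foo" sits on the left.
pos_behind = spans(r"(?<=foo)bar", behind_text)

# Negative lookbehind: "bar" only where "foo" does NOT sit on the left.
neg_behind = spans(r"(?<!foo)bar", behind_text)

print(pos_ahead, neg_ahead, pos_behind, neg_behind)
```

Each reported span is three characters wide: the lookaround decides whether the position qualifies but contributes no characters to the match.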
Source string for the negative lookahead example: aa<p>one</p>bb<div>two</div>cc
Regular expression: <(?!/?p\b)[^>]+>

This regular expression matches every tag except <p...> and </p>, as shown in the following figure. In this expression, < matches itself. (?!/?p\b) is a negative lookahead whose subexpression is /?p\b, meaning that the text to the right of < cannot be [/p] or [p] as a whole word. The question mark makes the preceding / optional: it may appear once or not at all. This is worth remembering. Next comes [^>]+. Here [^...] is a negated character class, so [^>] matches any character except [>], and + requires at least one such character with no upper limit. The final > matches itself. The whole expression therefore reads: match text of the form <(the right side cannot be p or /p)(one or more characters other than >)>. This example also shows that an expression can usually be split into subexpressions, which are then joined to form the complete expression.

Now an example with lookbehind.

Source string: <div>a test</div>
Regular expression: (?<=<div>)[^<]+(?=</div>)

This regular expression matches the content between the <div> and </div> tags, excluding the tags themselves. (?<=<div>) requires that the text on the left be <div>, and (?=</div>) requires that the text on the right be </div>. For more information, see the second reference.

Finally, back to the restriction on lookbehind. A necessary condition is that lookbehind can only look back over text of a determinable length, which means the length of the text that the subexpression can match must be fixed. Expressions such as (?<!books?) or (?<!\w+) cannot be used in such tools because the length of the text they match cannot be determined. For this reason, some tools do not support lookbehind.
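Both examples and the length restriction can be checked with a short sketch. It assumes Python's re module, which is one of the tools that requires a fixed-width lookbehind; other tools may accept more or may not support lookbehind at all. The helper compiles and the variable names are hypothetical.

```python
import re

# Negative lookahead: every tag except <p ...> and </p>.
html = "aa<p>one</p>bb<div>two</div>cc"
tags = re.findall(r"<(?!/?p\b)[^>]+>", html)

# The \b matters: without a word boundary after p, <pre> would be excluded too.
pre_tags = re.findall(r"<(?!/?p\b)[^>]+>", "<pre>x</pre>")

# Lookbehind plus lookahead: the text between <div> and </div>, tags excluded.
inner = re.findall(r"(?<=<div>)[^<]+(?=</div>)", "<div>a test</div>")

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

fixed_ok = compiles(r"(?<!book)")       # fixed length: accepted
optional_ok = compiles(r"(?<!books?)")  # 4 or 5 characters: rejected here
unbounded_ok = compiles(r"(?<!\w+)")    # unbounded length: rejected here

print(tags, pre_tags, inner)
print(fixed_ok, optional_ok, unbounded_ok)
```

Under these assumptions, the first pattern keeps only the div tags, the second call shows that <pre> still matches because of \b, and only the fixed-length lookbehind compiles.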