Regular expressions
Simple patterns: to match a pattern against the contents of $_, just write the pattern between a pair of slashes (/).
For example:
#!/usr/bin/env perl
use 5.010;
$_ = "yabba dabba doo";
if (/abba/) {
    say "It matched!";
}
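About metacharacters
Some characters have a special meaning in a pattern, much as wildcards do in the shell (though the meanings differ):
. ==> any single character except a newline;
* ==> the preceding item repeated zero or more times;
+ ==> the preceding item repeated one or more times;
? ==> the preceding item repeated zero times or once;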
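A minimal sketch of how these four metacharacters behave. The sample strings and patterns are made up for illustration, and the ^ and $ anchors (not covered in these notes) are an added assumption so that the whole string has to fit the pattern.

```perl
use strict;
use warnings;

# Hypothetical patterns and sample strings, chosen only for this sketch.
my @patterns = ('ab*c', 'ab+c', 'ab?c', 'a.c');
my @samples  = ('ac', 'abc', 'abbc', 'axc');

my %result;
for my $pattern (@patterns) {
    for my $text (@samples) {
        # ^ and $ force the pattern to cover the whole string
        $result{$pattern}{$text} = $text =~ /^$pattern$/ ? 1 : 0;
        printf "%-5s vs %-5s : %s\n", $pattern, $text,
            $result{$pattern}{$text} ? 'match' : 'no match';
    }
}
```

Here ab*c accepts ac, abc and abbc, ab+c rejects ac, ab?c rejects abbc, and a.c needs exactly one character between a and c.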
Pattern grouping
In a regular expression, parentheses () group parts of the pattern.
A back reference is written as a backslash followed by a number, such as \1 or \2. The number identifies the capture group, counted in order.
The following examples illustrate this.
A back reference does not have to come immediately after its capture group. The following pattern matches y followed by four consecutive non-newline characters, then uses \1 after the d to require the same four characters again.
$_ = "yabba dabba doo";
if (/y(....) d\1/) {
    print "It matched the same after y and d!\n";
}
You can also use several pairs of parentheses to form separate groups, each with its own back reference.
$_ = "yabba dabba doo";
if (/y(.)(.)\2\1/) { # matches 'abba'
    print "It matched the same after y and d!\n";
}
So how do you tell which group number belongs to which parentheses? Larry's rule: just count the opening (left) parentheses in order, including nested ones. For example:
$_ = "yabba dabba doo";
if (/y((.)(.)\3\2) d\1/) {
    print "It matched the same after y and d!\n";
}
Taken apart:
(       # first opening parenthesis
  (.)   # second opening parenthesis
  (.)   # third opening parenthesis
  \3
  \2
)
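To see the numbering rule at work, the sketch below prints what each group captured. It relies on the =~ binding operator and the capture variables $1, $2 and $3, which these notes have not introduced, so treat it as an illustration under that assumption.

```perl
use strict;
use warnings;

my $text = "yabba dabba doo";
my @groups;

if ($text =~ /y((.)(.)\3\2) d\1/) {
    # $1, $2 and $3 hold the text captured by groups 1, 2 and 3
    @groups = ($1, $2, $3);
    print "group 1: $1\n";   # the outer parentheses
    print "group 2: $2\n";   # the first nested pair
    print "group 3: $3\n";   # the second nested pair
}
```

The outer pair opens first, so it is group 1 even though it closes last.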
A remaining problem:
$_ = "aa11bb";
if (/(.)\111/) {
    print "It matched!\n";
}
The intent was to match aa11, that is, \1 followed by the literal digits 11. Perl, however, reads \111 as a single escape. There is no 111th group (with so few groups Perl treats \111 as an octal character escape), so the pattern does not do what was intended.
Workaround: writing the back reference as \g{1} removes the ambiguity between the back reference and the literal part of the pattern.
use 5.010;
$_ = "aa11bb";
if (/(.)\g{1}11/) {
    print "It matched.\n";
}
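The two spellings can be compared side by side. This is a small sketch on the same sample string; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "aa11bb";

# \g{1} is unambiguously group 1; the 11 after it is literal text
my $braced = $text =~ /(.)\g{1}11/ ? 1 : 0;

# \111 is read as one escape, not as \1 followed by 11
my $bare = $text =~ /(.)\111/ ? 1 : 0;

print "with \\g{1}11: ", ($braced ? "matched" : "no match"), "\n";
print "with \\111:    ", ($bare   ? "matched" : "no match"), "\n";
```

Only the braced form finds aa11 in the string.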
Another benefit of \g{N} is that N can be negative, such as -1 or -2. A negative number is a relative position, counted backwards from the back reference:
-1 refers to the capture group nearest to the left of \g{-1};
-2 refers to the second nearest capture group to the left of \g{-2}.
use 5.010;
$_ = "xaa11bb"; # x, then a, then a again (group 2 repeated), then 11
if (/(.)(.)\g{-1}11/) {
    print "It matched.\n";
}
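A short sketch of relative numbering. The sample string is chosen so that a pattern with two groups can match, and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $text = "xaa11bb";

# \g{-1} is the group immediately to the left, which is group 2 here
my $relative = $text =~ /(.)(.)\g{-1}11/ ? 1 : 0;

# \g{2} names the same group by its absolute number
my $absolute = $text =~ /(.)(.)\g{2}11/ ? 1 : 0;

# \g{-2} reaches one group further back, which is group 1 here
my $further = $text =~ /(.)(.)\g{-2}11/ ? 1 : 0;

print "g{-1}: $relative, g{2}: $absolute, g{-2}: $further\n";
```

Relative references stay correct when more groups are later added at the front of a pattern, which is the main reason to prefer them in pattern fragments that get reused.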
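Alternation
The vertical bar (|) means that either the left side or the right side may match; parentheses limit how far the alternatives extend.
For example: /fred (and|or) barney/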
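As a quick check of the alternation pattern, the sketch below filters a few made-up phrases with it.

```perl
use strict;
use warnings;

# Hypothetical test phrases
my @phrases = ("fred and barney", "fred or barney", "fred nor barney");

# Keep only the phrases where "and" or "or" sits between the two names
my @hits = grep { /fred (and|or) barney/ } @phrases;

print "$_\n" for @hits;
```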
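Character classes
[a-zA-Z] matches any single letter from a to z in either case.
[^a-zA-Z] matches any single character that is not one of those letters.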
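A minimal sketch of a class and its negation, using made-up strings.

```perl
use strict;
use warnings;

my $mixed   = "Perl 5";
my $letters = "Perl";

# At least one letter somewhere in the string
my $has_letter = $mixed =~ /[a-zA-Z]/ ? 1 : 0;

# At least one character that is not a letter (here the space and the 5)
my $has_other = $mixed =~ /[^a-zA-Z]/ ? 1 : 0;

# A string made only of letters has nothing for the negated class to match
my $letters_has_other = $letters =~ /[^a-zA-Z]/ ? 1 : 0;

print "$has_letter $has_other $letters_has_other\n";
```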
Shorthand for character classes
\d is a shorthand for the character class that matches any digit:
use 5.010;
$_ = 'The HAL-9000 requires authorization to continue.';
if (/HAL-[\d]+/) {
    say "It matched.";
}
The /a modifier, written after the closing delimiter of the pattern, makes these shorthands follow ASCII semantics (the modifier was introduced in Perl 5.14):
use 5.014;
$_ = 'The HAL-9000 requires authorization to continue.';
if (/HAL-[\d]+/a) { # interpreted with the old ASCII semantics
    say "It matched.";
}
Note: the main reason for introducing /a is that \d no longer means just [0-9]; under Unicode semantics it also matches digit characters from many other scripts.
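The difference can be seen with a single non-ASCII digit. The sketch below uses U+0663 (ARABIC-INDIC DIGIT THREE) as an example character, written as an escape so the source file stays plain ASCII.

```perl
use 5.014;
use warnings;

# U+0663 ARABIC-INDIC DIGIT THREE, picked as one example of a non-ASCII digit
my $arabic_three = "\x{0663}";

my $unicode_match = $arabic_three =~ /\d/  ? 1 : 0;   # Unicode semantics
my $ascii_match   = $arabic_three =~ /\d/a ? 1 : 0;   # ASCII semantics
my $plain_match   = "3" =~ /\d/a ? 1 : 0;             # an ordinary digit

say "\\d  matches U+0663: ", $unicode_match ? "yes" : "no";
say "\\d with /a matches U+0663: ", $ascii_match ? "yes" : "no";
say "\\d with /a matches 3: ", $plain_match ? "yes" : "no";
```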
\s matches whitespace, including these five characters: form feed (\f), horizontal tab (\t), newline (\n), carriage return (\r), and the space character itself. Perl 5.10 also added \h for horizontal whitespace and \v for vertical whitespace.
use 5.014;
if (/\s/a) { # interpreted with the old ASCII semantics
    say "The string matched ASCII whitespace.";
}
\R matches a line break: either \r\n or \n will match.
\w matches "word" characters; under ASCII semantics these are [a-zA-Z0-9_].
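A small sketch of \R and \w on made-up strings: the same pattern accepts both Unix and DOS line endings, and the underscore counts as a word character while the hyphen does not.

```perl
use 5.010;
use strict;
use warnings;

my $unix = "line one\nline two";
my $dos  = "line one\r\nline two";

# \R stands for one line break, whichever convention the text uses
my $unix_ok = $unix =~ /one\Rline/ ? 1 : 0;
my $dos_ok  = $dos  =~ /one\Rline/ ? 1 : 0;

# \w covers letters, digits and the underscore
my $underscore = "_" =~ /\w/ ? 1 : 0;
my $hyphen     = "-" =~ /\w/ ? 1 : 0;

say "unix: $unix_ok, dos: $dos_ok, underscore: $underscore, hyphen: $hyphen";
```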
Negated shorthands
\D means [^\d]
\W means [^\w]
\S means [^\s]
These shorthands can be used on their own in a pattern or as part of a character class in square brackets. For example: /[\dA-Fa-f]/
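As an illustration, the sketch below sorts a few made-up one-character strings with the hexadecimal-digit class and with two of the negated shorthands.

```perl
use strict;
use warnings;

# Hypothetical one-character candidates
my @candidates = ("7", "c", "F", "g", " ");

# \d inside square brackets, combined with the ranges A-F and a-f
my @hex = grep { /[\dA-Fa-f]/ } @candidates;

# \S: anything that is not whitespace
my @not_space = grep { /\S/ } @candidates;

# \W: anything that is not a word character
my @not_word = grep { /\W/ } @candidates;

print "hex digits: @hex\n";
print "non-space:  @not_space\n";
print "non-word count: ", scalar(@not_word), "\n";
```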
Matching with regular expressions
Matching with m//
The /fred/ notation used so far is actually a shorthand for m/fred/.
As with qw//, you can choose a different pair of delimiters, such as m{fred} or m<fred>.
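Choosing another delimiter pays off when the pattern itself contains slashes. A minimal sketch, using a placeholder URL on the reserved example.com domain:

```perl
use strict;
use warnings;

# Placeholder URL, used only for illustration
my $url = "http://www.example.com/index.html";

# With slashes as delimiters, every slash in the pattern needs a backslash
my $slashes = $url =~ /http:\/\// ? 1 : 0;

# With braces or percent signs as delimiters, the slashes stay readable
my $braces  = $url =~ m{http://} ? 1 : 0;
my $percent = $url =~ m%http://% ? 1 : 0;

print "$slashes $braces $percent\n";
```

All three patterns are equivalent; only the amount of escaping differs.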
Pattern modifiers
/a means that the shorthands are interpreted with ASCII semantics
/i means case-insensitive matching
use 5.010;
print "Would you like to play a game? ";
chomp($_ = <STDIN>);
if (/yes/i) { # case-insensitive matching
    say "I would like that too.";
}
/s means that the dot matches any character, including a newline
By default the dot (.) does not match a newline. If the string contains newlines and you want the dot to match them as well, add the /s modifier. (In effect, Perl treats every dot in the pattern as the character class [\d\D], which matches any character.)
$_ = "I saw Barney\ndown at the bowling alley\nwith Fred\nlast night.\n";
if (/Barney.*Fred/s) {
    print "That string mentions Fred after Barney!\n";
}
This raises a problem: /s changes every dot in the pattern to match any character. What if only some of the dots should match any character? For the places that must still stop at a newline, use the character class [^\n], or its shorthand \N (available from Perl 5.12).
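The sketch below tries the same multi-line string with and without /s, and then mixes a dot with \N, the shorthand for [^\n] available from Perl 5.12. The variable names are hypothetical.

```perl
use 5.012;
use warnings;

my $text = "I saw Barney\ndown at the bowling alley\nwith Fred\nlast night.\n";

# Without /s the dot stops at the first newline
my $without_s = $text =~ /Barney.*Fred/ ? 1 : 0;

# With /s the dot crosses newlines
my $with_s = $text =~ /Barney.*Fred/s ? 1 : 0;

# \N never matches a newline, even under /s
my $only_N = $text =~ /Barney\N*Fred/s ? 1 : 0;

# Mixed: .* may cross lines, but "with" and "Fred" must share a line
my $mixed = $text =~ /Barney.*with\N*Fred/s ? 1 : 0;

say "without /s: $without_s, with /s: $with_s, only \\N: $only_N, mixed: $mixed";
```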
/x means that whitespace may be added to the pattern; it is ignored when matching
For example:
#!/usr/bin/env perl
use 5.010;
$_ = "fred";
if (/fre d/x) {
    say "It matched.";
}
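Because /x ignores whitespace, a pattern can be spread over several lines with a comment on each part. This sketch rewrites the earlier back-reference pattern that way; the literal space is placed in square brackets so that it is not ignored.

```perl
use strict;
use warnings;

my $text = "yabba dabba doo";

my $matched = $text =~ /
    y        # a literal y
    (....)   # group 1: any four non-newline characters
    [ ]d     # a literal space, kept inside a class, then d
    \1       # the same four characters again
/x ? 1 : 0;

# The short example from the notes: the space in the pattern is ignored
my $spaced = "fred" =~ /fre d/x ? 1 : 0;

print "$matched $spaced\n";
```

Comments inside such a pattern must not contain the delimiter character, since that would end the pattern early.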
Of course, the above modifiers can be combined:
if (/barney.*fred/is) { # use both /i and /s
    print "That string mentions Fred after Barney!\n";
}
Perl Learning notes-Regular expressions