Perl learning notes: pattern matching

Source: Internet
Author: User

I. Introduction
A pattern is a specific sequence of characters to search for in a string. It is enclosed in forward slashes: /def/ is the pattern def. Patterns are often combined with functions such as split, which breaks a string into words wherever the pattern matches: @array = split(/ /, $line);

II. Matching operators =~ and !~
=~ tests whether a match succeeds: $result = $var =~ /abc/; If the pattern is found in the string, the expression returns a nonzero value (true); otherwise it returns a false value. !~ is the opposite. Both operators are well suited to conditionals, for example:

if ($question =~ /please/) {
    print("Thank you for being polite!\n");
}
else {
    print("That was not very polite!\n");
}

III. Special characters in patterns
Perl supports special characters in patterns, each of which plays a particular role.
1. The + character
+ means one or more of the preceding character. For example, /de+f/ matches def, deef, deeeeef, and so on. It matches as many characters as possible: in the string abbc, /ab+/ matches abb, not ab. When words on a line are separated by more than one space, the line can be split as follows: @array = split(/ +/, $line);
Note: split starts a new word every time it encounters the pattern, so if $line begins with a space, the first element of @array is an empty string. It can, however, tell whether there are really any words: if $line contains only spaces, @array is an empty array. Also, in the example above a tab character is treated as part of a word rather than as a separator, so adjust the pattern if tabs can occur.
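The split behavior described in the note can be checked with a short script. This is a minimal sketch; the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Leading spaces produce an empty first field.
my @lead = split(/ +/, "  alpha beta");

# A line of only spaces yields an empty list (trailing empty fields are dropped).
my @blank = split(/ +/, "   ");

# A tab is not matched by / +/, so it stays inside a field.
my @tab = split(/ +/, "one\ttwo three");

# Trimming leading whitespace and splitting on \s+ avoids both surprises.
(my $trimmed = " \tone\ttwo  three") =~ s/^\s+//;
my @clean = split(/\s+/, $trimmed);

print scalar(@lead), " fields, the first one is empty\n";
print join("|", @clean), "\n";
```

Splitting on \s+ instead of a literal space is one possible correction; whether tabs should count as separators depends on the input.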
2. The [] and [^] characters
[] matches any one of a set of characters. For example, /a[0123456789]c/ matches a, followed by one digit, followed by c. Combined with +: /d[eE]+f/ matches def, dEf, deef, dEef, dEEEeeeEef, and so on. ^ at the start of the set means every character except those listed: /d[^deE]f/ matches d, followed by one character that is not d, e, or E, followed by f.
3. The * and ? characters
These are similar to +, except that * matches zero or more of the preceding character and ? matches zero or one. For example, /de*f/ matches df, def, and deeeef; /de?f/ matches df or def.
4. Escape characters
To include a character that is normally special in a pattern, put a backslash before it. For example, in /\*+/, \* denotes the literal character *, not the zero-or-more quantifier described above. A literal backslash is written as /\\/. In Perl 5, \Q and \E can be used to escape a whole run of characters.
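A small sketch of the escaping rules above, using \Q and \E around an interpolated variable. The variable names and sample text are hypothetical.

```perl
use strict;
use warnings;

my $literal = "a*b+c";          # text that happens to contain metacharacters
my $text    = "x = a*b+c;";

# \Q...\E quotes the metacharacters in the interpolated variable.
my $found_quoted = ($text =~ /\Q$literal\E/) ? 1 : 0;

# Without quoting, * and + act as quantifiers, so the literal text is not found.
my $found_raw = ($text =~ /$literal/) ? 1 : 0;

# A literal backslash is written as \\ inside the pattern.
my $has_backslash = ('C:\temp' =~ /\\/) ? 1 : 0;

print "quoted: $found_quoted, raw: $found_raw, backslash: $has_backslash\n";
```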
5. Matching any letter or digit
The pattern /a[0123456789]c/ mentioned above, which matches the letter a followed by any digit followed by c, can also be written as /a[0-9]c/. Similarly, [a-z] denotes any lowercase letter and [A-Z] denotes any uppercase letter. Any letter or digit is written as /[0-9a-zA-Z]/.
6. Anchors
Anchor      Description
^ or \A     Match only at the start of the string
$ or \Z     Match only at the end of the string
\b          Match at a word boundary
\B          Match inside a word (not at a word boundary)
Example 1: /^def/ matches only strings that begin with def, /def$/ matches only strings that end with def, and the combination /^def$/ matches only the string def. \A and \Z differ from ^ and $ when matching multiple lines.
Example 2: Checking the type of a variable name:
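No listing accompanies Example 2 in these notes. The following is only a sketch of what such a check could look like, assuming that names start with a letter or underscore after the sigil. The helper name name_type and its return labels are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a name by its leading sigil.
sub name_type {
    my ($name) = @_;
    return "scalar" if $name =~ /^\$[A-Za-z_]\w*$/;
    return "array"  if $name =~ /^\@[A-Za-z_]\w*$/;
    return "hash"   if $name =~ /^%[A-Za-z_]\w*$/;
    return "unknown";
}

print name_type('$count'), "\n";
print name_type('@list'),  "\n";
```

The ^ and $ anchors matter here: without them, a string that merely contains a valid name somewhere would also be accepted.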

Example 3: \b matches at a word boundary. /\bdef/ matches def and defghi (words that begin with def) but not abcdef. /def\b/ matches def and abcdef (words that end with def) but not defghi. /\bdef\b/ matches only the word def. Note: /\bdef/ can match $defghi, because $ is not considered part of a word.
Example 4: \B matches inside a word. /\Bdef/ matches abcdef but not def; /def\B/ matches defghi; /\Bdef\B/ matches cdefg and abcdefghi, but not def, defghi, or abcdef.
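Examples 3 and 4 can be verified by filtering a word list with each anchor. A minimal sketch; the word list is chosen to mirror the examples.

```perl
use strict;
use warnings;

my @words = ("def", "defghi", "abcdef", "abcdefghi");

my @starts = grep { /\bdef/ }   @words;   # def at the start of a word
my @ends   = grep { /def\b/ }   @words;   # def at the end of a word
my @inside = grep { /\Bdef\B/ } @words;   # def strictly inside a word

print "starts: @starts\n";
print "ends:   @ends\n";
print "inside: @inside\n";
```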
7. Variable substitution in patterns
Splitting a sentence into words:
$pattern = "[\\t ]+";
@words = split(/$pattern/, $line);
8. Character-range escapes
Escape   Description                     Range
\d       Any digit                       [0-9]
\D       Any character except a digit    [^0-9]
\w       Any word character              [_0-9a-zA-Z]
\W       Any non-word character          [^_0-9a-zA-Z]
\s       Whitespace                      [ \r\t\n\f]
\S       Non-whitespace                  [^ \r\t\n\f]
Example: /[\da-z]/ matches any digit or lowercase letter.
9. Matching any character
The character . matches any character except newline and is usually used together with *.
10. Matching a specified number of characters
The {} pair specifies how many times the preceding character must occur. For example, /de{1,3}f/ matches def, deef, and deeef; /de{3}f/ matches deeef; /de{3,}f/ matches at least three e's between d and f; /de{0,3}f/ matches at most three e's between d and f.
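The four {} forms can be compared by running them against strings with zero to four e's. A minimal sketch; the ^ and $ anchors are added so that each test string must match as a whole.

```perl
use strict;
use warnings;

# Candidates: df, def, deef, deeef, deeeef
my @candidates = map { my $e = "e" x $_; "d${e}f" } 0 .. 4;

my @one_to_three  = grep { /^de{1,3}f$/ } @candidates;
my @exactly_three = grep { /^de{3}f$/ }   @candidates;
my @three_or_more = grep { /^de{3,}f$/ }  @candidates;
my @up_to_three   = grep { /^de{0,3}f$/ } @candidates;

print "{1,3}: @one_to_three\n";
print "{3}:   @exactly_three\n";
print "{3,}:  @three_or_more\n";
print "{0,3}: @up_to_three\n";
```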
11. Specifying alternatives
The character | separates two or more alternatives. For example, /def|ghi/ matches def or ghi.
Example: checking that a number is valid
if ($number =~ /^-?\d+$|^-?0[xX][\da-fA-F]+$/) {
    print("$number is a legal integer.\n");
} else {
    print("$number is not a legal integer.\n");
}
Here ^-?\d+$ matches decimal integers and ^-?0[xX][\da-fA-F]+$ matches hexadecimal integers.
12. Reusing parts of a pattern
When the same part of a pattern appears several times, it can be enclosed in parentheses and referred to later with \n (where n is the number of the parenthesized group) to simplify the expression. /\d{2}([\W])\d{2}\1\d{2}/ matches:
12-05-92
26.11.87
07 04 92
and so on.
Note: /\d{2}([\W])\d{2}\1\d{2}/ is different from /(\d{2})([\W])\1\2\1/. The latter matches only strings of the form 17-17-17 and does not match 17-05-91.
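The difference between the two patterns in the note can be seen by running both. A minimal sketch; the patterns are stored with qr// and anchored with ^ and $ for the test, and the fourth date (mixed separators) is an added counterexample.

```perl
use strict;
use warnings;

# \1 must repeat whatever separator the first group matched.
my $same_sep = qr/^\d{2}(\W)\d{2}\1\d{2}$/;
my @dates    = ("12-05-92", "26.11.87", "07 04 92", "12-05.92");
my @accepted = grep { $_ =~ $same_sep } @dates;

# Here \1 repeats the two digits and \2 repeats the separator.
my $repeat      = qr/^(\d{2})(\W)\1\2\1$/;
my $all_same    = ("17-17-17" =~ $repeat) ? 1 : 0;
my $normal_date = ("17-05-91" =~ $repeat) ? 1 : 0;

print "accepted: ", join(", ", @accepted), "\n";
print "17-17-17: $all_same, 17-05-91: $normal_date\n";
```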
13. Precedence of escapes and special characters
Like operators, escapes and special characters have an order of precedence, listed here from highest to lowest:
Special character    Description
()                   Pattern memory
+ * ? {}             Number of occurrences
^ $ \b \B            Anchors
|                    Alternatives
14. Specifying the pattern delimiter
By default the pattern delimiter is the forward slash /, but a different one can be chosen by prefixing the pattern with the letter m, for example:
m!/u/jqpublic/perl/prog1! is equivalent to /\/u\/jqpublic\/perl\/prog1/
Note: when the single quote ' is used as the delimiter, no variable substitution is performed. When a special character is used as the delimiter, its escape or special function cannot be used inside the pattern.
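A short sketch of alternative delimiters, including the single-quote case where variables are not substituted. The variable $dir is a hypothetical name used only for this illustration.

```perl
use strict;
use warnings;

my $path = "/u/jqpublic/perl/prog1";

# Both forms describe the same pattern; m!...! avoids escaping every slash.
my $with_slashes = ($path =~ /\/u\/jqpublic\/perl\/prog1/) ? 1 : 0;
my $with_bang    = ($path =~ m!/u/jqpublic/perl/prog1!)    ? 1 : 0;

my $dir = "perl";
# With ! as the delimiter, $dir is substituted and /perl/ is found.
my $interpolated = ($path =~ m!/$dir/!) ? 1 : 0;
# With ' as the delimiter, $dir is NOT substituted, so nothing matches here.
my $not_interpolated = ($path =~ m'/$dir/') ? 1 : 0;

print "$with_slashes $with_bang $interpolated $not_interpolated\n";
```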
15. Pattern sequence variables
After a match, the text matched by the nth parenthesized part of the pattern is available in the variable $n, and the entire match is available in the variable $&.

$string = "This string contains the number 25.11.";
$string =~ /-?(\d+)\.?(\d+)/;   # the match is 25.11
$integerpart = $1;              # now $integerpart = 25
$decimalpart = $2;              # now $decimalpart = 11
$totalpart = $&;                # now $totalpart = 25.11

IV. Pattern-matching options
Option   Description
g        Match all possible occurrences
i        Ignore case
m        Treat the string as multiple lines
o        Substitute variables only once
s        Treat the string as a single line
x        Ignore whitespace in the pattern

1. Matching all possible occurrences (g option)

@matches = "balata" =~ /.a/g;   # now @matches = ("ba", "la", "ta")

Matching in a loop:

while ("balata" =~ /.a/g) {
    $match = $&;
    print("$match\n");
}

The output is:

ba
la
ta

When the g option is used, the pos function can read or set the offset at which the next match starts:

$offset = pos($string);
pos($string) = $newoffset;
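A minimal sketch of reading and setting pos while matching with the g option, reusing the balata string from the earlier example. The offset 3 is an arbitrary value chosen for illustration.

```perl
use strict;
use warnings;

my $string = "balata";
my @offsets;
while ($string =~ /.a/g) {
    push @offsets, pos($string);   # offset just past the current match
}

# Setting pos() moves the starting point of the next /g match.
pos($string) = 3;
my $after_reset = ($string =~ /(.a)/g) ? $1 : undef;

print "offsets: @offsets\n";
print "match after pos = 3: $after_reset\n";
```

Starting at offset 3, the first possible match is ta, because the a at offset 3 is followed by t.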

2. Ignoring case (i option)
/de/i matches de, dE, De, and DE.
3. Treating the string as multiple lines (m option)
In this case, ^ matches at the start of the string or at the beginning of any line, and $ matches at the end of any line.
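The effect of the m option on ^ and $ can be shown on a three-line string. A minimal sketch with made-up sample text; the \A case illustrates that \A still matches only at the very start of the string.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\nthird line";

my @plain    = ($text =~ /^(\w+)/g);     # ^ matches only at the start of the string
my @multi    = ($text =~ /^(\w+)/mg);    # ^ matches at the start of every line
my @anchored = ($text =~ /\A(\w+)/mg);   # \A ignores the m option
my @ends     = ($text =~ /(\w+)$/mg);    # $ matches at the end of every line

print "plain: @plain\n";
print "multi: @multi\n";
print "\\A:    @anchored\n";
print "ends:  @ends\n";
```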
4. Performing variable substitution only once (o option)

$var = 1;
$line = <STDIN>;
while ($var < 10) {
    $result = $line =~ /$var/o;
    $line = <STDIN>;
    $var++;
}

The pattern /1/ is matched every time.
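The same effect can be observed without reading input. This is a sketch that compares the o option with the default behavior on a fixed test string; the string "123" and the array names are chosen for illustration.

```perl
use strict;
use warnings;

my (@with_o, @without_o);
for my $var (1 .. 3) {
    # With o, $var is substituted only the first time, so the pattern stays /^1/.
    push @with_o, ("123" =~ /^$var/o) ? 1 : 0;
    # Without o, the pattern follows $var: /^1/, then /^2/, then /^3/.
    push @without_o, ("123" =~ /^$var/) ? 1 : 0;
}

print "with o:    @with_o\n";
print "without o: @without_o\n";
```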
5. Treating the string as a single line (s option)
/a.*bc/s matches the string axxxxx\nxxxxbc, but /a.*bc/ does not.
6. Ignoring whitespace in the pattern (x option)
/\d{2} ([\W]) \d{2} \1 \d{2}/x is equivalent to /\d{2}([\W])\d{2}\1\d{2}/.
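A sketch of the s and x options together. With x, a pattern can also be spread over several lines and annotated with # comments; the layout below is one possible way to write the date pattern, not the only one.

```perl
use strict;
use warnings;

# s option: . also matches the newline, so the match can span two lines.
my $text      = "axxxxx\nxxxxbc";
my $without_s = ($text =~ /a.*bc/)  ? 1 : 0;
my $with_s    = ($text =~ /a.*bc/s) ? 1 : 0;

# x option: whitespace and comments inside the pattern are ignored.
my $date = qr/
    \d{2}     # two digits
    (\W)      # one separator, remembered as \1
    \d{2}     # two digits
    \1        # the same separator again
    \d{2}     # two digits
/x;
my $date_ok  = ("12-05-92" =~ $date) ? 1 : 0;
my $date_bad = ("12-05.92" =~ $date) ? 1 : 0;

print "without s: $without_s, with s: $with_s\n";
print "12-05-92: $date_ok, 12-05.92: $date_bad\n";
```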

V. Substitution operator
The syntax is s/pattern/replacement/. It replaces the part of the string that matches pattern with replacement. For example:

$string = "abc123def";
$string =~ s/123/456/;   # now $string = "abc456def"

The pattern sequence variables $n can be used in the replacement part, as in s/(\d+)/[$1]/. The replacement part does not support pattern special characters such as {}, *, and +; for example, s/abc/[def]/ replaces abc with the literal text [def].
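A minimal sketch of $1 in the replacement part, with and without the g option, plus the literal [def] case. The sample strings are made up; each result is written to a copy so that the original string stays unchanged.

```perl
use strict;
use warnings;

my $string = "abc123def456";

# Only the first run of digits is bracketed.
(my $first = $string) =~ s/(\d+)/[$1]/;

# With g, every run of digits is bracketed.
(my $every = $string) =~ s/(\d+)/[$1]/g;

# In the replacement part, [ and ] are ordinary characters.
(my $literal = "abc") =~ s/abc/[def]/;

print "$first\n$every\n$literal\n";
```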
The options for the substitution operator are:
Option   Description
g        Replace all matches of the pattern
i        Ignore case in the pattern
e        Evaluate the replacement string as an expression
m        Treat the string to be matched as multiple lines
o        Substitute variables only once
s        Treat the string to be matched as a single line
x        Ignore whitespace in the pattern
Note: with the e option, the replacement part is treated as an expression and evaluated before the replacement is made. For example:

$string = "0abc1";
$string =~ s/[a-zA-Z]+/$& x 2/e;   # now $string = "0abcabc1"

VI. Translation operator
This is another form of replacement, with the syntax tr/string1/string2/. string2 is again the replacement part, but here the first character of string1 is replaced with the first character of string2, the second character of string1 with the second character of string2, and so on. For example:
$string = "abcdefghicba";
$string =~ tr/abc/def/;   # now $string = "defdefghifed"
When string1 is longer than string2, the extra characters are replaced with the last character of string2. When the same character occurs more than once in string1, the first replacement is used.
The options for the translation operator are:
Option   Description
c        Translate all characters that are not specified
d        Delete all specified characters
s        Squeeze runs of identical output characters into one
For example, $string =~ tr/0-9/ /c; replaces every non-digit character with a space; $string =~ tr/\t //d; deletes tabs and spaces; $string =~ tr/0-9/ /cs; replaces each run of other characters between digits with a single space. Note that tr works on character lists and ranges, not patterns, so escapes such as \d are not available.
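The tr options can be tried on small strings. A minimal sketch with made-up inputs; the last line relies on tr returning the number of characters it matched, which is standard behavior but is not covered in the notes above.

```perl
use strict;
use warnings;

# c: every character that is NOT a digit becomes a space.
(my $spaced = "a1b22c") =~ tr/0-9/ /c;

# cs: runs of non-digits collapse into a single space.
(my $squeezed = "12ab,,34") =~ tr/0-9/ /cs;

# d: tabs and spaces are deleted.
(my $stripped = "a \tb c") =~ tr/\t //d;

# string1 longer than string2: the last character of string2 is reused.
(my $short = "abcd") =~ tr/a-d/xy/;

# With an empty string2 and no options, tr only counts matching characters.
my $digit_count = ("a1b22c" =~ tr/0-9//);

print "[$spaced] [$squeezed] [$stripped] [$short] $digit_count\n";
```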

VII. Extended pattern matching
Perl supports some pattern-matching features that Perl 4 and standard UNIX pattern matching do not have. The syntax is (?cpattern), where c is a character and pattern is the pattern or subpattern it applies to.
1. Not storing the contents matched in parentheses
In a Perl pattern, a subpattern in parentheses is normally stored in memory. (?:...) cancels storage of the match inside the parentheses. For example, in /(?:a|b|c)(d|e)f\1/, \1 refers to the matched d or e, not to a, b, or c.
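A short sketch showing that (?:...) groups without storing, so the first stored group is the second pair of parentheses. The test strings are made up for illustration.

```perl
use strict;
use warnings;

my $pattern = qr/(?:a|b|c)(d|e)f\1/;

# In list context a match returns the stored groups; only (d|e) is stored.
my ($group)  = "adfd" =~ $pattern;

# \1 must repeat the d, so a trailing a is rejected.
my $rejected = ("adfa" =~ $pattern) ? 1 : 0;

print "group 1: $group, adfa matches: $rejected\n";
```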
2. Inline pattern options
Options are usually placed after the pattern. Four of them, i, m, s, and x, can also be used inline. The syntax is /(?option)pattern/, which is equivalent to /pattern/option.
3. Positive and negative lookahead
The positive lookahead syntax is /pattern(?=string)/, which matches pattern only when it is followed by string. Conversely, (?!string) matches pattern only when it is not followed by string. For example:

$string = "25abc8";
$string =~ /abc(?=[0-9])/;
$matched = $&;   # $& is the matched text, here abc, not abc8
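A sketch that contrasts positive and negative lookahead on the same input. The price string and variable names are hypothetical.

```perl
use strict;
use warnings;

# Positive lookahead: the digit must follow but is not part of the match.
my ($before_digit) = "25abc8" =~ /(abc)(?=[0-9])/;

# Negative lookahead: abc must NOT be followed by a digit.
my $neg_on_digit  = ("25abc8" =~ /abc(?![0-9])/) ? 1 : 0;
my $neg_on_letter = ("25abcx" =~ /abc(?![0-9])/) ? 1 : 0;

# Only the amounts that are followed by USD are collected.
my @usd = "10USD 20EUR 30USD" =~ /(\d+)(?=USD)/g;

print "matched: $before_digit\n";
print "negative: $neg_on_digit $neg_on_letter\n";
print "USD amounts: @usd\n";
```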

4. Pattern comments
In Perl 5, comments can be added inside a pattern with (?#...), for example:

if ($string =~ /(?i)[a-z]{2,3}(?# match two or three alphabetic characters)/) {
    ...
}

Summary
1. Syntax used in regular expressions (/pattern/)
/pattern/    Result
.            Matches any single character except the newline character \n
x?           Matches zero or one x character
x*           Matches zero or more x characters
.*           Matches zero or more of any character
x+           Matches one or more x characters
.+           Matches one or more of any character
{m}          Matches exactly m of the specified character
{m,n}        Matches at least m and at most n of the specified character
{m,}         Matches at least m of the specified character
[]           Matches any one of the characters inside []
[^]          Matches any one character that is not inside []
[0-9]        Matches any one character from 0 to 9
[a-z]        Matches any one character from a to z
[^0-9]       Matches any one character that is not from 0 to 9
[^a-z]       Matches any one character that is not from a to z
^            Matches at the beginning of the string
$            Matches at the end of the string
\d           Matches one digit; same as [0-9]
\d+          Matches one or more digits; same as [0-9]+
\D           Matches one non-digit character; same as [^0-9]
\D+          Matches one or more non-digit characters; same as [^0-9]+
\w           Matches one letter, digit, or underscore; same as [a-zA-Z0-9_]
\w+          Matches one or more letters, digits, or underscores; same as [a-zA-Z0-9_]+
\W           Matches one character that is not a letter, digit, or underscore; same as [^a-zA-Z0-9_]
\W+          Matches one or more such characters; same as [^a-zA-Z0-9_]+
\s           Matches one whitespace character; same as [ \n\t\r\f]
\s+          Matches one or more whitespace characters; same as [ \n\t\r\f]+
\S           Matches one non-whitespace character; same as [^ \n\t\r\f]
\S+          Matches one or more non-whitespace characters; same as [^ \n\t\r\f]+
\b           Matches at a word boundary, that is, at the edge of a run of letters and digits
\B           Matches where there is no word boundary
a|b|c        Matches a, b, or c
abc          Matches a string that contains abc
(pattern)    Remembers the matched text, a very useful feature. The text matched by the first () is available as $1 or \1, the text matched by the second () as $2 or \2, and so on. The author will explain its usage in detail in the next section.
/pattern/i   The i option ignores letter case during the search
\            Placed before a character that has a special meaning in a pattern, it removes that special meaning

2. Simple regular expression examples
After reading the previous section on regular expressions, beginners may still be unsure how to apply the syntax, so this section lists some examples:
Example        Description
/perl/         Matches a string that contains perl
/^perl/        Matches a string that starts with perl
/perl$/        Matches a string that ends with perl
/c|g|i/        Matches a string that contains c, g, or i
/cg{2,4}i/     Matches c followed by 2 to 4 g's, followed by i
/cg{2,}i/      Matches c followed by 2 or more g's, followed by i
/cg{2}i/       Matches c followed by exactly 2 g's, followed by i
/cg*i/         Matches c followed by zero or more g's, followed by i; same as /cg{0,}i/
/cg+i/         Matches c followed by one or more g's, followed by i; same as /cg{1,}i/
/cg?i/         Matches c followed by zero or one g, followed by i; same as /cg{0,1}i/
/c.i/          Matches c followed by any one character, followed by i
/c..i/         Matches c followed by any two characters, followed by i
/[cgi]/        Matches a string that contains any one of these three characters
/[^cgi]/       Matches a string that contains a character other than these three
/\d/           Matches a string that contains a digit; /\d+/ matches a run of one or more digits
/\D/           Matches a string that contains a non-digit; /\D+/ matches a run of one or more non-digits
/\w/           Matches a string that contains a letter or digit; /\w+/ matches a run of one or more letters or digits
/\W/           Matches a string that contains a character other than a letter or digit; /\W+/ matches a run of one or more such characters
/\s/           Matches a string that contains whitespace; /\s+/ matches a run of one or more whitespace characters
/\S/           Matches a string that contains a non-whitespace character; /\S+/ matches a run of one or more non-whitespace characters
/\*/           Matches a string that contains the character *. Because * has a special meaning in a regular expression, a \ is placed before it to remove that meaning
/abc/i         Matches a string that contains abc, regardless of letter case
3. Operators and functions related to regular expressions
In Perl programs, the =~ and !~ operators and the s and tr functions are commonly used together with a regular expression /pattern/. Used well, they make string handling very easy, which is especially handy in CGI programming. The following introduces how to use these operators and functions:
