Perl Pattern Matching Explained in Detail: A Basic Tutorial

Source: Internet
Author: User
I. Introduction
A pattern is a specific sequence of characters to search for in a string. It is enclosed in forward slashes: /def/ is the pattern def. Patterns are often combined with functions such as split, which breaks a string into words wherever the pattern matches: @array = split(/ /, $line);
II. Matching operators =~ and !~
=~ tests whether a match succeeds: $result = $var =~ /abc/; If the pattern is found in the string, it returns a true (nonzero) value; if not, it returns a false value. !~ is the opposite.
These two operators are well suited to conditional control, such as:
if ($question =~ /please/) {
    print("Thank you for being polite!\n");
}
else {
    print("That was not very polite!\n");
}
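As a minimal sketch of the return values described above, the following stores the result of each operator in a scalar. The variable names and the sample sentence are illustrative assumptions, not part of the original example.

```perl
use strict;
use warnings;

# Sample input; the sentence is an illustrative assumption.
my $question = "Could you please help me?";

# =~ yields a true value when the pattern is found.
my $polite = ($question =~ /please/) ? 1 : 0;

# !~ is the negation: true only when the pattern is NOT found.
my $rude = ($question !~ /please/) ? 1 : 0;

print "polite=$polite rude=$rude\n";
```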
III. Special characters in patterns
Perl supports special characters in patterns that play a special role.
1. The + character
+ means one or more of the preceding character, e.g. /de+f/ matches def, deef, deeeeef, etc. It matches as many of the same character as possible: /ab+/ against the string abbc matches abb, not ab.
When the words in a line are separated by more than one space, you can split it as follows:
@array = split(/ +/, $line);
Note: split starts a new word every time it encounters the split pattern, so if $line begins with a space, the first element of @array is an empty string. It does recognize when there are no real words, though: if $line contains only spaces, @array is an empty array. Also note that in this example a tab character is treated as part of a word, not as a separator; adjust the pattern if tabs should separate words as well.
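The three cases in the note can be checked with a short sketch. The sample strings and the [ \t]+ correction are assumptions chosen for illustration; other corrections are possible.

```perl
use strict;
use warnings;

# Leading space: split produces an empty first element.
my @leading = split(/ +/, " alpha  beta");

# Only spaces: every field is empty, so the result is an empty list.
my @blank = split(/ +/, "   ");

# A tab is not matched by / +/, so "beta\tgamma" stays one word.
my @tabbed = split(/ +/, "alpha beta\tgamma");

# One possible correction: split on runs of spaces or tabs.
my @fixed = split(/[ \t]+/, "alpha beta\tgamma");

print scalar(@leading), " ", scalar(@blank), " ",
      scalar(@tabbed), " ", scalar(@fixed), "\n";
```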
2. The [] and [^] characters
[] matches any one of a set of characters, e.g. /a[0123456789]c/ matches a, followed by one digit, followed by c. Combined with +: /d[eE]+f/ matches def, dEf, deef, dEef, dEEEeeeEef, etc. ^ at the start of the set means any character except those listed, e.g. /d[^deE]f/ matches d, followed by one character that is not d, e, or E, followed by f.
3. The * and ? characters
These are similar to +, except that * matches zero or more of the preceding character and ? matches zero or one. For example, /de*f/ matches df, def, and deeeef; /de?f/ matches df or def.
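A small sketch of how the three quantifiers behave, assuming the sample strings below (they are illustrative, not from the original):

```perl
use strict;
use warnings;

# + is greedy: it takes as many b's as it can.
my ($plus) = "abbc" =~ /(ab+)/;              # "abb"

# * accepts zero occurrences of e.
my $star_df = ("df" =~ /de*f/) ? 1 : 0;      # 1

# ? accepts zero or one e, so a double e is rejected.
my $quest_def  = ("def"  =~ /de?f/) ? 1 : 0; # 1
my $quest_deef = ("deef" =~ /de?f/) ? 1 : 0; # 0

print "$plus $star_df $quest_def $quest_deef\n";
```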
4. Escape character
To include a character that is normally special in a pattern, put a backslash before it. For example, in /\*+/, \* denotes the literal character *, not the "zero or more" quantifier described above. A literal backslash is written /\\/. In Perl 5, \Q and \E can be used to escape every special character between them.
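The effect of \Q and \E is easiest to see when a variable holding special characters is placed in a pattern. The following is a sketch under the assumption that $literal holds user-supplied text; the variable name and value are hypothetical.

```perl
use strict;
use warnings;

# A literal string that contains pattern special characters.
my $literal = "a+b*c";

# Without quoting, + and * act as quantifiers, so the text
# does not match itself.
my $plain = ("a+b*c" =~ /$literal/) ? 1 : 0;        # 0

# \Q...\E escapes every special character in between.
my $quoted = ("a+b*c" =~ /\Q$literal\E/) ? 1 : 0;   # 1

print "plain=$plain quoted=$quoted\n";
```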
5. Matching any letter or digit
The pattern /a[0123456789]c/ mentioned above matches a, followed by any digit, followed by c. It can also be written /a[0-9]c/. Similarly, [a-z] denotes any lowercase letter and [A-Z] any uppercase letter. Any letter or digit is written /[0-9a-zA-Z]/.
6. Anchors

Anchor      Description
^ or \A     Match at the start of the string only
$ or \Z     Match at the end of the string only
\b          Match at a word boundary
\B          Match inside a word (not at a word boundary)

Example 1: /^def/ matches only strings that begin with def, /def$/ matches only strings that end with def, and the combination /^def$/ matches only the string def. \A and \Z differ from ^ and $ when matching multiple lines.
Example 2: Checking the type of a variable name:
if ($varname =~ /^\$[a-zA-Z][_0-9a-zA-Z]*$/) {
    print("$varname is a legal scalar variable\n");
} elsif ($varname =~ /^@[a-zA-Z][_0-9a-zA-Z]*$/) {
    print("$varname is a legal array variable\n");
} elsif ($varname =~ /^[a-zA-Z][_0-9a-zA-Z]*$/) {
    print("$varname is a legal file variable\n");
} else {
    print("I don't understand what $varname is.\n");
}
Example 3: \b matches at a word boundary: /\bdef/ matches words that begin with def, such as def and defghi, but does not match abcdef. /def\b/ matches words that end with def, such as def and abcdef, but does not match defghi. /\bdef\b/ matches only the word def. Note: /\bdef/ matches $defghi, because $ is not considered part of the word.
Example 4: \B matches inside a word: /\Bdef/ matches abcdef but not def; /def\B/ matches defghi; /\Bdef\B/ matches cdefg and abcdefghi, but does not match def, defghi, or abcdef.
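The difference between the two anchors can be sketched by filtering a word list. The list and the variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my @words = ("def", "defghi", "abcdef", "abcdefghi");

# Words in which def starts at a word boundary.
my @starts = grep { /\bdef/ } @words;      # def, defghi

# Words in which def ends at a word boundary.
my @ends = grep { /def\b/ } @words;        # def, abcdef

# Words in which def lies strictly inside the word.
my @inside = grep { /\Bdef\B/ } @words;    # abcdefghi

print "@starts | @ends | @inside\n";
```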
7. Variable interpolation in patterns
To split a sentence into words:
$pattern = "[\\t ]+";
@words = split(/$pattern/, $line);
8. Character range escapes

Escape   Description                     Range
\d       Any digit                       [0-9]
\D       Any character except a digit    [^0-9]
\w       Any word character              [_0-9a-zA-Z]
\W       Any non-word character          [^_0-9a-zA-Z]
\s       Whitespace                      [ \r\t\n\f]
\S       Non-whitespace                  [^ \r\t\n\f]

Example: /[\da-z]/ matches any digit or lowercase letter.
9. Matching any character
The character "." matches any character except newline, and is usually used together with *.
10. Matching a specified number of characters
The character pair {} specifies how many times the preceding character must occur. For example: /de{1,3}f/ matches def, deef, and deeef; /de{3}f/ matches deeef; /de{3,}f/ matches three or more e's between d and f; /de{0,3}f/ matches at most three e's between d and f.
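The four forms can be compared side by side with a sketch such as the following. The candidate strings are assumptions, and the ^ and $ anchors are added so that each pattern must cover the whole string.

```perl
use strict;
use warnings;

my @candidates = ("df", "def", "deeef", "deeeeef");

my @one_to_three  = grep { /^de{1,3}f$/ } @candidates;  # def, deeef
my @exactly_three = grep { /^de{3}f$/ }   @candidates;  # deeef
my @three_or_more = grep { /^de{3,}f$/ }  @candidates;  # deeef, deeeeef
my @up_to_three   = grep { /^de{0,3}f$/ } @candidates;  # df, def, deeef

print "@one_to_three | @exactly_three | @three_or_more | @up_to_three\n";
```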
11. Specifying alternatives
The character "|" separates two or more alternatives for the pattern to match, e.g. /def|ghi/ matches def or ghi.
Example: Checking that a number is legal
if ($number =~ /^-?\d+$|^-?0[xX][\da-fA-F]+$/) {
    print("$number is a legal integer.\n");
} else {
    print("$number is not a legal integer.\n");
}
Here ^-?\d+$ matches decimal integers and ^-?0[xX][\da-fA-F]+$ matches hexadecimal integers.
12. Reusing part of a pattern
When the same part of a pattern appears several times, enclose it in parentheses and refer back to it to simplify the expression:
/\d{2}([\W])\d{2}\1\d{2}/ matches:
12-05-92
26.11.87
07 04 92, etc.
Note: /\d{2}([\W])\d{2}\1\d{2}/ differs from /(\d{2})([\W])\1\2\1/; the latter matches only strings of the form 17-17-17, and does not match 17-05-91.
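A short sketch of the point made in the note: \1 repeats the text that was matched, not the subpattern. The date strings are illustrative assumptions, including one with mixed separators that should be rejected.

```perl
use strict;
use warnings;

my @dates = ("12-05-92", "26.11.87", "07 04 92", "12-05.92");

# \1 must repeat the same separator that ([\W]) matched.
my @consistent = grep { /^\d{2}([\W])\d{2}\1\d{2}$/ } @dates;

# Here \1 repeats the matched digits themselves.
my $same  = ("17-17-17" =~ /^(\d{2})([\W])\1\2\1$/) ? 1 : 0;  # 1
my $mixed = ("17-05-91" =~ /^(\d{2})([\W])\1\2\1$/) ? 1 : 0;  # 0

print scalar(@consistent), " $same $mixed\n";
```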
13. Precedence of escapes and special characters
As with operators, escapes and special characters have an order of precedence (highest first):

Special characters   Description
()                   Pattern memory
+ * ? {}             Number of occurrences
^ $ \b \B            Anchors
|                    Alternatives
14. Specifying the pattern delimiter
By default, the pattern delimiter is the forward slash /, but a different delimiter can be chosen by prefixing the pattern with the letter m, such as:
m!/u/jqpublic/perl/prog1! is equivalent to /\/u\/jqpublic\/perl\/prog1/
Note: When the single quote ' is used as the delimiter, no variable interpolation is performed. When a special character is used as the delimiter, its escape or special function is not available inside the pattern.
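A sketch of both points, assuming a hypothetical variable $name: a custom delimiter avoids escaping slashes, and the single-quote delimiter leaves $name uninterpolated, so the pattern no longer matches the variable's value.

```perl
use strict;
use warnings;

my $path = "/u/jqpublic/perl/prog1";

# The same pattern written with escaped slashes and with ! delimiters.
my $slashes = ($path =~ /\/u\/jqpublic\/perl\/prog1/) ? 1 : 0;  # 1
my $bangs   = ($path =~ m!/u/jqpublic/perl/prog1!)    ? 1 : 0;  # 1

# $name is a hypothetical variable used to show interpolation.
my $name = "perl";

# With / as delimiter, $name is interpolated and the pattern is /perl/.
my $interpolated = ("perl" =~ m/$name/) ? 1 : 0;   # 1

# With ' as delimiter, the pattern is the literal text $name, in which
# $ acts as an end-of-string anchor, so it cannot match "perl".
my $uninterpolated = ("perl" =~ m'$name') ? 1 : 0; # 0

print "$slashes $bangs $interpolated $uninterpolated\n";
```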
15. Pattern sequence variables
After a match, the text matched by each parenthesized part of the pattern is available in the variables $1, $2, and so on ($n), and the entire match is available in $&.
$string = "This string contains the number 25.11.";
$string =~ /-?(\d+)\.?(\d+)/; # the match is 25.11
$integerpart = $1; # now $integerpart = 25
$decimalpart = $2; # now $decimalpart = 11
$totalpart = $&; # now $totalpart = 25.11
IV. Pattern-matching options

Option   Description
g        Match all possible patterns
i        Ignore case
m        Treat the string as multiple lines
o        Interpolate variables only once
s        Treat the string as a single line
x        Ignore whitespace in the pattern

1. Matching all possible patterns (g option)
@matches = "balata" =~ /.a/g; # now @matches = ("ba", "la", "ta")
Matching in a loop:
while ("balata" =~ /.a/g) {
    $match = $&;
    print("$match\n");
}
The results are:
ba
la
ta
When the g option is used, the pos function can read or set the offset at which the next match starts:
$offset = pos($string);
pos($string) = $newoffset;
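The following sketch records the offset reported by pos after each match and then rewinds it by assignment. The variable names are assumptions; the string is the one used in the examples above, in lowercase.

```perl
use strict;
use warnings;

my $string = "balata";
my @offsets;

# Each successful /g match in scalar context resumes where the
# previous one ended; pos() reports the offset just past the match.
while ($string =~ /.a/g) {
    push @offsets, pos($string);
}
# @offsets is now (2, 4, 6)

# Assigning to pos() sets where the next /g match starts.
pos($string) = 2;
my $found = $string =~ /.a/g;
my $after_reset = $&;    # "la"

print "@offsets $after_reset\n";
```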
2. Ignoring case (i option)
/de/i matches de, dE, De, and DE.
3. Treating the string as multiple lines (m option)
With this option, ^ matches at the start of the string or at the start of any line, and $ matches at the end of any line.
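A sketch of the m option on a two-line string (the text is an illustrative assumption). It also shows that \A keeps matching only at the very start of the string, which is the difference from ^ mentioned in the section on anchors.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line";

# Without m, ^ matches only at the start of the whole string.
my $without_m = ($text =~ /^second/)  ? 1 : 0;    # 0

# With m, ^ also matches right after each newline.
my $with_m = ($text =~ /^second/m) ? 1 : 0;       # 1

# \A is unaffected by m.
my $with_A = ($text =~ /\Asecond/m) ? 1 : 0;      # 0

# Collect the first word of every line.
my @line_starts = $text =~ /^(\w+)/mg;            # first, second

print "$without_m $with_m $with_A @line_starts\n";
```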
4. Performing variable interpolation only once (o option)
$var = 1;
$line = <STDIN>;
while ($var < 10) {
    $result = $line =~ /$var/o;
    $line = <STDIN>;
    $var++;
}
The pattern is /1/ on every iteration.
5. Treating the string as a single line (s option)
/a.*bc/s matches the string axxxxx\nxxxxbc, but /a.*bc/ does not.
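This can be checked directly with a sketch like the one below; the variable names are assumptions.

```perl
use strict;
use warnings;

my $text = "axxxxx\nxxxxbc";

# By default "." stops at the newline, so .* cannot reach "bc".
my $default = ($text =~ /a.*bc/) ? 1 : 0;       # 0

# With s, "." also matches the newline.
my $single_line = ($text =~ /a.*bc/s) ? 1 : 0;  # 1

print "$default $single_line\n";
```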
6. Ignoring whitespace in the pattern (x option)
/\d{2} ([\W]) \d{2} \1 \d{2}/x is equivalent to /\d{2}([\W])\d{2}\1\d{2}/.
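Because whitespace is ignored, the x option also allows a pattern to be spread over several lines. With x, Perl additionally treats # as the start of a comment inside the pattern. The layout below is one possible sketch, not the only way to write it.

```perl
use strict;
use warnings;

my $matched = ("12-05-92" =~ /
    \d{2}      # two digits
    ([\W])     # one non-word separator, remembered as group 1
    \d{2}      # two digits
    \1         # the same separator again
    \d{2}      # two digits
/x) ? 1 : 0;

# The compact form gives the same result.
my $compact = ("12-05-92" =~ /\d{2}([\W])\d{2}\1\d{2}/) ? 1 : 0;

print "$matched $compact\n";
```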
V. Substitution operator
The syntax is s/pattern/replacement/; it replaces the part of the string matched by pattern with replacement. For example:
$string = "abc123def";
$string =~ s/123/456/; # now $string = "abc456def"
You can use the pattern sequence variables $n in the replacement, as in s/(\d+)/[$1]/, but pattern special characters such as {}, *, and + have no special meaning there: s/abc/[def]/ replaces abc with [def].
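A sketch of $1 in the replacement, with and without the g option. The input string is an assumption, and each result is built on a copy so that the original stays unchanged.

```perl
use strict;
use warnings;

my $string = "abc123def456";

# Without g, only the first match is replaced.
(my $first = $string) =~ s/(\d+)/[$1]/;

# With g, every match is replaced.
(my $all = $string) =~ s/(\d+)/[$1]/g;

print "$first\n$all\n";
```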
The options for the substitution operator are as follows:

Option   Description
g        Replace all matches of the pattern
i        Ignore case in the pattern
e        Evaluate the replacement as an expression
m        Treat the string being matched as multiple lines
o        Interpolate variables only once
s        Treat the string being matched as a single line
x        Ignore whitespace in the pattern

   Note: The e option treats the replacement as an expression and evaluates it before substituting, such as:
     $string = "0abc1";
     $string =~ s/[a-zA-Z]+/$& x 2/e; # now $string = "0abcabc1"
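The e option is often combined with g so that each match is computed separately. A minimal sketch, assuming a hypothetical price list:

```perl
use strict;
use warnings;

# Hypothetical input used only for illustration.
my $prices = "apple 3, pear 10";

# For every number, evaluate the expression $1 * 2 and insert the result.
(my $doubled = $prices) =~ s/(\d+)/$1 * 2/ge;

print "$doubled\n";
```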
VI. Translation operator
   This is another form of substitution, with the syntax tr/string1/string2/. As before, string2 is the replacement part, but the effect is to replace the first character of string1 with the first character of string2, the second character of string1 with the second character of string2, and so on. For example:
     $string = "abcdefghicba";
     $string =~ tr/abc/def/; # now $string = "defdefghifed"
   When string1 is longer than string2, its extra characters are replaced with the last character of string2; when the same character occurs more than once in string1, the first replacement is used.
   The translation operator options are as follows:

Option   Description
c        Translate all characters not specified in string1
d        Delete all specified characters
s        Squeeze runs of identical output characters into one

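For example, $string =~ tr/0-9/ /c; replaces every non-digit character with a space (tr does not understand pattern escapes such as \d, so the range 0-9 is written out); $string =~ tr/\t //d; deletes tabs and spaces; and $string =~ tr/0-9/ /cs; replaces each run of non-digit characters between numbers with a single space.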
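The rules and options above can be tried out with a sketch like the following. All input strings are illustrative assumptions, and the digit range is written as 0-9 because tr works on character lists, not patterns.

```perl
use strict;
use warnings;

# Character-by-character mapping.
(my $mapped = "abcdefghicba") =~ tr/abc/def/;    # defdefghifed

# string1 longer than string2: the last character of string2 is reused.
(my $padded = "aabbcc") =~ tr/abc/xy/;           # xxyyyy

# c: translate every character that is NOT a digit.
(my $spaced = "a1b22c") =~ tr/0-9/ /c;           # " 1 22 "

# d: delete tabs and spaces.
(my $deleted = "a \tb c") =~ tr/\t //d;          # abc

# cs: turn each run of non-digits into a single space.
(my $squeezed = "12abc34--56") =~ tr/0-9/ /cs;   # "12 34 56"

print "$mapped $padded [$spaced] $deleted [$squeezed]\n";
```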

VII. Extended pattern matching
Perl 5 supports some pattern-matching features that Perl 4 and standard UNIX pattern matching do not have. The syntax is (?<c>pattern), where <c> is a character and pattern is the pattern or subpattern it applies to.
1. Not storing the contents matched in parentheses
In Perl patterns, a subpattern in parentheses is stored in memory. (?:...) cancels the storage of the match within the parentheses, e.g. in /(?:a|b|c)(d|e)f\1/, \1 refers to the matched d or e, not to a, b, or c.
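A sketch of the numbering effect, using two assumed test strings: because the first group is non-capturing, group 1 is the (d|e) part.

```perl
use strict;
use warnings;

# \1 must repeat what (d|e) matched, because (?:a|b|c) is not stored.
my $captured = "";
if ("adfd" =~ /(?:a|b|c)(d|e)f\1/) {
    $captured = $1;    # "d"
}

# "adfa" ends with the letter matched by the non-capturing group,
# which \1 does not refer to, so there is no match.
my $wrong_group = ("adfa" =~ /(?:a|b|c)(d|e)f\1/) ? 1 : 0;   # 0

print "$captured $wrong_group\n";
```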
2. Inline pattern options
Options are normally placed after the pattern. Four of them, i, m, s, and x, can also be used inline, with the syntax /(?option)pattern/, which is equivalent to /pattern/option.
3. Positive and negative lookahead
The positive lookahead syntax is /pattern(?=string)/, which matches pattern only when it is followed by string; conversely, (?!string) matches pattern only when it is not followed by string. For example:
$string = "25abc8";
$string =~ /abc(?=[0-9])/;
$matched = $&; # $& is the matched text, here abc, not abc8
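The negative form can be sketched next to the positive one. The second test string, abcx, is an assumption added for contrast.

```perl
use strict;
use warnings;

my $with_digit = "25abc8";
my $with_letter = "abcx";

# Positive lookahead: abc must be followed by a digit,
# but the digit is not part of the match.
my $positive = ($with_digit =~ /abc(?=[0-9])/) ? 1 : 0;     # 1
my $matched = $&;                                           # "abc"

# Negative lookahead: abc must NOT be followed by a digit.
my $neg_digit  = ($with_digit  =~ /abc(?![0-9])/) ? 1 : 0;  # 0
my $neg_letter = ($with_letter =~ /abc(?![0-9])/) ? 1 : 0;  # 1

print "$positive $matched $neg_digit $neg_letter\n";
```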
4. Pattern comments
In Perl 5, you can add a comment inside a pattern with (?#...), such as:
if ($string =~ /(?i)[a-z]{2,3}(?# match two or three alphabetic characters)/) {
    ...
}
