Perl 5, Chapter 6: Pattern Matching


by Flamephoenix

I. Introduction
II. The match operators
III. Special characters in patterns
1. The + character
2. The [] and [^] characters
3. The * and ? characters
4. Escape characters
5. Matching any letter or digit
6. Anchors
7. Variable substitution in patterns
8. Character range escapes
9. Matching any character
10. Matching a specified number of characters
11. Specifying alternatives
12. Reusing parts of a pattern
13. Precedence of escapes and special characters
14. Specifying the pattern delimiter
15. Pattern sequence variables
IV. Pattern matching options
1. Match all possible patterns (g option)
2. Ignore case (i option)
3. Treat the string as multiple lines (m option)
4. Perform variable substitution only once (o option)
5. Treat the string as a single line (s option)
6. Ignore whitespace in the pattern (x option)
V. The substitution operator
VI. The translation operator
VII. Extended pattern matching
1. Do not store the content matched in parentheses
2. Inline pattern options
3. Positive and negative lookahead
4. Pattern comments


I. Introduction
A pattern is a particular sequence of characters to search for in a string. It is enclosed in forward slashes: /def/ is the pattern def. Patterns are used, for example, with the split function, which divides a string into words according to a pattern: @array = split(/ /, $line);
II. The match operators =~ and !~
=~ tests whether a match succeeds: $result = $var =~ /abc/; If the pattern is found in the string, a true (non-zero) value is returned; if it is not found, a false value is returned (the empty string, which is 0 in numeric context). !~ is the opposite.
These two operators are well suited to conditional control, for example:
if ($question =~ /please/) {
print ("Thank you for being polite!\n");
}
else {
print ("That was not very polite!\n");
}
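As a minimal sketch of the two operators side by side (the sample string and the variable names $polite and $rude are illustrative and not part of the original text):

```perl
use strict;
use warnings;

my $question = "Could you please pass the salt?";

# =~ is true when the pattern is found somewhere in the string.
my $polite = ($question =~ /please/) ? 1 : 0;

# !~ is true when the pattern is NOT found.
my $rude = ($question !~ /please/) ? 1 : 0;

print "polite=$polite rude=$rude\n";   # prints "polite=1 rude=0"
```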
III. Special characters in patterns
Perl supports a number of special characters in patterns, each with a special meaning.
1. The + character
+ means one or more of the preceding character. For example, /de+f/ matches def, deef, deeeeef, and so on. It matches as many of those characters as possible: /ab+/ applied to the string abbc matches abb, not ab.
When words in a line are separated by more than one space, the line can be split as follows:
@array = split(/ +/, $line);
Note: split starts a new word every time it encounters the split pattern, so if $line begins with spaces, the first element of @array is an empty element. It does, however, recognize when there are no words at all: if $line contains only spaces, @array is an empty array. Also note that in the example above a tab character is not a separator and is treated as part of a word; adjust the pattern if tabs should separate words too.
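The behaviour described in the note can be checked with a short sketch. The sample line and variable names are illustrative; the \s+ pattern and the special ' ' argument to split are standard Perl alternatives that the original text does not mention:

```perl
use strict;
use warnings;

my $line = "  alpha   beta\tgamma ";

# Splitting on runs of spaces keeps a leading empty element,
# and the tab stays inside the element "beta\tgamma".
my @by_spaces = split(/ +/, $line);

# Splitting on \s+ also treats the tab as a separator,
# but the leading empty element is still produced.
my @by_whitespace = split(/\s+/, $line);

# The special pattern ' ' (a one-space string) splits on
# whitespace and skips leading whitespace.
my @words = split(' ', $line);

# A string holding only spaces yields an empty list.
my @none = split(/ +/, "    ");

print scalar(@by_spaces), " ", scalar(@by_whitespace), " ",
      scalar(@words), " ", scalar(@none), "\n";   # prints "3 4 3 0"
```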
2. The [] and [^] characters
[] matches any one of a set of characters. For example, /a[0123456789]c/ matches a, followed by a digit, followed by c. It can be combined with +: /d[eE]+f/ matches def, dEf, deef, dEef, dEEEeeeEef, and so on. ^ inside the brackets means any character except those listed: /d[^deE]f/ matches d, followed by one character that is not d, e or E, followed by f.
3. The * and ? characters
These are similar to +. The difference is that * matches zero or more of the preceding character, and ? matches zero or one of it. For example, /de*f/ matches df, def, deeeef, and so on; /de?f/ matches df or def.
4. Escape characters
To include a character that is normally special in a pattern, precede it with a backslash "\". For example, in /\*+/ the \* is the literal character *, not the "zero or more" meaning described above. A literal backslash is written /\\/. In Perl 5, the pair \Q and \E can also be used to escape the characters between them.
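A small sketch of \Q and \E follows. The strings and variable names are made up for illustration; the point is only that the same variable behaves differently with and without quoting:

```perl
use strict;
use warnings;

my $needle = "a*b+c";            # contains pattern special characters
my $text   = "sum: a*b+c = 7";

# Between \Q and \E every non-word character is escaped,
# so * and + are matched literally.
my $literal_hit = ($text =~ /\Q$needle\E/) ? 1 : 0;

# Without \Q the variable is read as a pattern:
# zero or more a, one or more b, then c.
my $unquoted_hit = ($text =~ /$needle/) ? 1 : 0;

print "literal=$literal_hit unquoted=$unquoted_hit\n";   # prints "literal=1 unquoted=0"
```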
5. Matching any letter or digit
The pattern /a[0123456789]c/ mentioned above matches a, followed by any digit, followed by c. Another way to write it is /a[0-9]c/. Similarly, [a-z] means any lowercase letter and [A-Z] means any uppercase letter. Any uppercase or lowercase letter or digit is written /[0-9a-zA-Z]/.
6. Anchors

Anchor     Description
^ or \A    Match only at the beginning of the string
$ or \Z    Match only at the end of the string
\b         Match at a word boundary
\B         Match inside a word (not at a word boundary)

Example 1: /^def/ matches only strings that begin with def, /def$/ matches only strings that end with def, and the combined /^def$/ matches only the string def (or def followed by a single newline). \A and \Z differ from ^ and $ when matching multiple lines.
Example 2: Checking the type of a variable name:
if ($varname =~ /^\$[a-zA-Z][_0-9a-zA-Z]*$/) {
print ("$varname is a legal scalar variable\n");
} elsif ($varname =~ /^@[a-zA-Z][_0-9a-zA-Z]*$/) {
print ("$varname is a legal array variable\n");
} elsif ($varname =~ /^[a-zA-Z][_0-9a-zA-Z]*$/) {
print ("$varname is a legal file variable\n");
} else {
print ("I don't understand what $varname is.\n");
}
Example 3: \b matches at a word boundary. /\bdef/ matches def, defghi and other words that begin with def, but does not match abcdef. /def\b/ matches def, abcdef and other words that end with def, but does not match defghi. /\bdef\b/ matches only the word def. Note: /\bdef/ can match $defghi, because $ is not considered part of a word.
Example 4: \B matches inside a word. /\Bdef/ matches abcdef and the like, but does not match def. /def\B/ matches defghi and the like. /\Bdef\B/ matches cdefg, abcdefghi, and so on, but does not match def, defghi or abcdef.
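The difference between \b and \B can be tried out with a short sketch; the candidate list and array names below are illustrative only:

```perl
use strict;
use warnings;

my @candidates = ("def", "defghi", "abcdef", "abcdefghi");

# \b before def: def must start a word.
my @starts_word = grep { /\bdef/ } @candidates;     # def, defghi

# \b after def: def must end a word.
my @ends_word = grep { /def\b/ } @candidates;       # def, abcdef

# \B on both sides: def must be strictly inside a word.
my @inside_word = grep { /\Bdef\B/ } @candidates;   # abcdefghi

print "starts: @starts_word\n";
print "ends:   @ends_word\n";
print "inside: @inside_word\n";
```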
7. Variable substitution in patterns
A variable can be used inside a pattern. For example, to divide a sentence into words:
$pattern = "[\\t ]+";
@words = split(/$pattern/, $line);
8. Character range escapes

Escape   Description                    Range
\d       Any digit                      [0-9]
\D       Any character except a digit   [^0-9]
\w       Any word character             [_0-9a-zA-Z]
\W       Any non-word character         [^_0-9a-zA-Z]
\s       Whitespace                     [ \r\t\n\f]
\S       Non-whitespace                 [^ \r\t\n\f]

Example: /[\da-z]/ matches any digit or lowercase letter.
9. Matching any character
The character "." matches any character except newline. It is usually used together with *.
10. Matching a specified number of characters
The character pair {} specifies how many times the preceding character must occur. For example, /de{1,3}f/ matches def, deef and deeef; /de{3}f/ matches deeef; /de{3,}f/ matches d and f with at least three e characters between them; /de{0,3}f/ matches d and f with at most three e characters between them.
11. Specifying alternatives
The character "|" separates two or more alternatives, any one of which may match. For example, /def|ghi/ matches def or ghi.
Example: Checking that a number is written legally
if ($number =~ /^-?\d+$|^-?0[xX][\da-fA-F]+$/) {
print ("$number is a legal integer.\n");
} else {
print ("$number is not a legal integer.\n");
}
Here ^-?\d+$ matches a decimal number and ^-?0[xX][\da-fA-F]+$ matches a hexadecimal number.
12. Reusing parts of a pattern
When the same part must be matched more than once, enclose it in parentheses and refer to it later as \n, where n is the number of the parenthesized group. This simplifies the expression:
/\d{2}([\W])\d{2}\1\d{2}/ matches:
12-05-92
26.11.87
07 04 92 and so on
Note: /\d{2}([\W])\d{2}\1\d{2}/ is different from /(\d{2})([\W])\1\2\1/. The latter matches only strings of the form 17-17-17, and does not match 17-05-91 and the like.
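A short sketch of the date pattern above, with ^ and $ anchors added for the test and one made-up counterexample (12-05.92) whose two separators differ:

```perl
use strict;
use warnings;

# \1 must repeat exactly the separator captured by ([\W]).
my $date_re = qr/^\d{2}([\W])\d{2}\1\d{2}$/;

my @inputs   = ("12-05-92", "26.11.87", "07 04 92", "12-05.92");
my @accepted = grep { $_ =~ $date_re } @inputs;

print join(", ", @accepted), "\n";   # prints "12-05-92, 26.11.87, 07 04 92"
```

The last input is rejected because the first separator is - and the second is a dot, so \1 cannot match.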
13. Precedence of escapes and special characters
Like operators, escapes and special characters have an order of precedence, listed here from highest to lowest:

Special characters   Description
()                   Pattern memory
+ * ? {}             Number of occurrences
^ $ \b \B            Anchors
|                    Alternatives


14. Specifying the pattern delimiter
By default the pattern delimiter is the forward slash /, but a different delimiter can be chosen by writing the letter m in front of the pattern, for example:
m!/u/jqpublic/perl/prog1! is equivalent to /\/u\/jqpublic\/perl\/prog1/
Note: When the single quote ' is used as the delimiter, no variable substitution is performed. When a special character is used as the delimiter, its escape or special meaning cannot be used in the pattern.
15. Pattern sequence variables
After a pattern match, the parts remembered by parentheses are available in the variables $1, $2, and so on ($n), and the whole match is available in the variable $&.
$string = "This string contains the number 25.11.";
$string =~ /-?(\d+)\.?(\d+)/; # the match is 25.11
$integerpart = $1; # now $integerpart = 25
$decimalpart = $2; # now $decimalpart = 11
$totalpart = $&; # now $totalpart = 25.11
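Because $1 and $2 keep their values from the last successful match, it is usually safer to read them only after checking that the match succeeded. A minimal sketch, reusing the string above; the names $int_list and $dec_list are illustrative:

```perl
use strict;
use warnings;

my $string = "This string contains the number 25.11.";

# Read $1 and $2 only inside the branch where the match succeeded.
my ($integerpart, $decimalpart);
if ($string =~ /-?(\d+)\.?(\d+)/) {
    ($integerpart, $decimalpart) = ($1, $2);
}

# In list context the match returns the captured groups directly.
my ($int_list, $dec_list) = $string =~ /-?(\d+)\.?(\d+)/;

print "$integerpart $decimalpart $int_list $dec_list\n";   # prints "25 11 25 11"
```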
IV. Pattern matching options

Option   Description
g        Match all possible patterns
i        Ignore case
m        Treat the string as multiple lines
o        Perform variable substitution only once
s        Treat the string as a single line
x        Ignore whitespace in the pattern

1. Match all possible patterns (g option)
@matches = "balata" =~ /.a/g; # now @matches = ("ba", "la", "ta")
Matching in a loop:
while ("balata" =~ /.a/g) {
$match = $&;
print ("$match\n");
}
The result is:
ba
la
ta
When the g option is used, the pos function can read or set the offset at which the next match will start:
$offset = pos($string);
pos($string) = $newoffset;
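A minimal sketch of pos in action, using the same string as above; the names @offsets and $next are illustrative:

```perl
use strict;
use warnings;

my $string = "balata";

# After each successful /g match in scalar context,
# pos() holds the offset just past that match.
my @offsets;
while ($string =~ /.a/g) {
    push @offsets, pos($string);
}

# Setting pos() makes the next /g match start at that offset.
pos($string) = 2;
my $next;
if ($string =~ /(.a)/g) {
    $next = $1;
}

print "@offsets $next\n";   # prints "2 4 6 la"
```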
2. Ignore case (i option)
/de/i matches de, dE, De and DE.
3. Treat the string as multiple lines (m option)
With this option, the ^ symbol matches at the beginning of the string or at the start of any new line, and the $ symbol matches at the end of any line.
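The effect of the m option can be seen in a short sketch; the three-line sample text is illustrative:

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\nthird line";

# With m, ^ matches at the start of every line.
my @with_m = $text =~ /^(\w+)/mg;

# Without m, ^ matches only at the start of the string.
my @without_m = $text =~ /^(\w+)/g;

print "@with_m | @without_m\n";   # prints "first second third | first"
```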
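4. Perform variable substitution only once (o option)
$var = 1;
$line = <STDIN>;
while ($var < 10) {
$result = $line =~ /$var/o;
$line = <STDIN>;
$var++;
}
/1/ is matched every time, because with the o option the value of $var is substituted into the pattern only once.
5. Treat the string as a single line (s option)
/a.*bc/s matches the string axxxxx\nxxxxbc, but /a.*bc/ does not match it, because without the s option "." does not match a newline.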
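The same example as a runnable sketch; the variable names are illustrative:

```perl
use strict;
use warnings;

my $text = "axxxxx\nxxxxbc";

# With s, "." also matches the newline, so .* can span both lines.
my $with_s = ($text =~ /a.*bc/s) ? 1 : 0;

# Without s, "." stops at the newline and the match fails.
my $without_s = ($text =~ /a.*bc/) ? 1 : 0;

print "with_s=$with_s without_s=$without_s\n";   # prints "with_s=1 without_s=0"
```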
6. Ignore whitespace in the pattern (x option)
/\d{2} ([\W]) \d{2} \1 \d{2}/x is equivalent to /\d{2}([\W])\d{2}\1\d{2}/.
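With the x option a pattern can also be spread over several lines, with a # comment on each line. The sketch below lays out the same date pattern this way; the anchors and the test strings are additions for illustration:

```perl
use strict;
use warnings;

# Whitespace and # comments inside the pattern are ignored under x.
my $date_re = qr/
    ^
    \d{2}        # two digits
    ( [\W] )     # one non-word separator, remembered as group 1
    \d{2}        # two digits
    \1           # the same separator again
    \d{2}        # two digits
    $
/x;

my $same_separator  = ("12-05-92" =~ $date_re) ? 1 : 0;
my $mixed_separator = ("12-05.92" =~ $date_re) ? 1 : 0;

print "$same_separator $mixed_separator\n";   # prints "1 0"
```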
V. The substitution operator
The syntax is s/pattern/replacement/. Its effect is to change the part of the string that matches pattern into replacement. For example:
$string = "abc123def";
$string =~ s/123/456/; # now $string = "abc456def"
The replacement part can use the pattern sequence variables $n, as in s/(\d+)/[$1]/. It does not, however, support the special characters of patterns such as {}, * and +. For example, s/abc/[def]/ replaces abc with the literal text [def].
The options of the substitution operator are listed in the following table:

Option   Description
g        Replace all matches of the pattern
i        Ignore case in the pattern
e        Evaluate the replacement string as an expression
m        Treat the string to be matched as multiple lines
o        Perform variable substitution only once
s        Treat the string to be matched as a single line
x        Ignore whitespace in the pattern

Note: The e option treats the replacement part as an expression and evaluates it before substituting, for example:
$string = "0abc1";
$string =~ s/[a-zA-Z]+/$& x 2/e; # now $string = "0abcabc1"
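The e and g options are often combined. A minimal sketch with a made-up string, doubling every number it contains:

```perl
use strict;
use warnings;

my $string = "price: 5 apples, 12 pears";

# Copy first, then substitute on the copy.
# With e the replacement is evaluated as Perl code; with g it runs for every match.
(my $doubled = $string) =~ s/(\d+)/$1 * 2/ge;

print "$doubled\n";   # prints "price: 10 apples, 24 pears"
```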
VI. The translation operator
This is another form of substitution, with the syntax tr/string1/string2/. Again string2 is the replacement part, but here the effect is to replace the first character of string1 with the first character of string2, the second character of string1 with the second character of string2, and so on. For example:
$string = "abcdefghicba";
$string =~ tr/abc/def/; # now $string = "defdefghifed"
When string1 is longer than string2, its extra characters are replaced with the last character of string2. When the same character appears more than once in string1, the first replacement character is used.
The options of the translation operator are as follows:

Option   Description
c        Translate all characters that are not specified
d        Delete all specified characters
s        Squeeze runs of identical output characters into one

For example, $string =~ tr/0-9/ /c; replaces every non-digit character with a space (tr does not understand pattern escapes such as \d, so the range 0-9 is written out). $string =~ tr/\t //d; removes tabs and spaces. $string =~ tr/0-9/ /cs; replaces each run of other characters between digits with a single space.
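A short sketch of a few common tr uses. The sample strings and variable names are illustrative; that tr returns the number of characters it matched is standard Perl behaviour not stated in the text above:

```perl
use strict;
use warnings;

my $string = "Hello, World 2024";

# Character ranges: translate lowercase letters to uppercase on a copy.
(my $upper = $string) =~ tr/a-z/A-Z/;

# With an empty replacement and no d option the string is left
# unchanged, and tr returns how many characters matched.
my $digit_count = ($string =~ tr/0-9//);

# c and d together: delete every character that is NOT a digit.
(my $only_digits = $string) =~ tr/0-9//cd;

# s: squeeze runs of spaces into a single space.
(my $squeezed = "a  b    c") =~ tr/ //s;

print "$upper|$digit_count|$only_digits|$squeezed\n";
# prints "HELLO, WORLD 2024|4|2024|a b c"
```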

VII. Extended pattern matching
Perl 5 supports some pattern matching features that are not available in Perl 4 or in standard UNIX pattern matching. The syntax is (?<c>pattern), where <c> stands for a single character and pattern is the pattern or subpattern it applies to.
1. Do not store the content matched in parentheses
In Perl patterns, the subpatterns in parentheses are normally stored in memory. The (?:...) form cancels the storage of what matches inside the parentheses. For example, in /(?:a|b|c)(d|e)f\1/ the \1 refers to the matched d or e, not to a, b or c.
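A minimal sketch of the example pattern; the test strings adfd and adfa are made up for illustration:

```perl
use strict;
use warnings;

my $re = qr/(?:a|b|c)(d|e)f\1/;

# (?:a|b|c) groups without capturing, so group 1 is (d|e).
my $captured;
if ("adfd" =~ $re) {
    $captured = $1;
}

# Here the character after f would have to be d again, so this fails.
my $fails = ("adfa" =~ $re) ? 0 : 1;

print "captured=$captured fails=$fails\n";   # prints "captured=d fails=1"
```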
2. Inline pattern options
Pattern options are normally placed after the pattern. Four of them, i, m, s and x, can also be used inline. The syntax is /(?option)pattern/, which is equivalent to /pattern/option.
3. Positive and negative lookahead
The lookahead syntax is /pattern(?=string)/, which matches pattern only when it is followed by string. Conversely, (?!string) matches pattern only when it is not followed by string. For example:
$string = "25abc8";
$string =~ /abc(?=[0-9])/;
$matched = $&; # $& is the matched text, here abc, not abc8
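A short sketch contrasting the two forms on one made-up string. It uses the built-in @- array, whose first element holds the start offset of the last successful match:

```perl
use strict;
use warnings;

my $string = "abc8 abcx";

# Positive lookahead: abc followed by a digit (the first abc, at offset 0).
my ($pos_start, $pos_text);
if ($string =~ /abc(?=[0-9])/) {
    ($pos_start, $pos_text) = ($-[0], $&);
}

# Negative lookahead: abc NOT followed by a digit (the second abc, at offset 5).
my $neg_start;
if ($string =~ /abc(?![0-9])/) {
    $neg_start = $-[0];
}

print "$pos_start $pos_text $neg_start\n";   # prints "0 abc 5"
```

In both cases the text examined by the lookahead is not part of the match itself.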
4. Pattern comments
In Perl 5, comments can be added inside a pattern with (?#...), for example:
if ($string =~ /(?i)[a-z]{2,3}(?# match two or three alphabetic characters)/) {
...
}
