Perl pattern matching

Source: Internet
Author: User

 

Chapter 6 pattern matching

By flamephoenix

I. Introduction
II. Matching Operators
III. Special Characters in Patterns
1. The + character
2. The [] and [^] characters
3. The * and ? characters
4. Escape sequences
5. Matching any letter or digit
6. Anchors
7. Variable interpolation in patterns
8. Character-range escape sequences
9. Matching any character
10. Matching a specified number of characters
11. Alternatives
12. Reusing parts of a pattern
13. Precedence of special characters
14. Specifying the pattern delimiter
15. Pattern sequence variables
IV. Pattern-Matching Options
1. Match all occurrences (g option)
2. Ignore case (i option)
3. Treat the string as multiple lines (m option)
4. Interpolate variables only once (o option)
5. Treat the string as a single line (s option)
6. Ignore whitespace in the pattern (x option)
V. The Substitution Operator
VI. The Translation Operator
VII. Extended Pattern Matching
1. Non-capturing parentheses
2. Embedded pattern options
3. Positive and negative lookahead
4. Pattern comments

I. Introduction
A pattern is a specific sequence of characters to search for in a string. A pattern is enclosed in slashes: /def/ is the pattern def. One use of patterns is with the split function, which divides a string into words wherever the pattern matches: @array = split(/ /, $line);
II. Matching Operators =~ and !~
=~ tests whether a match succeeds. In $result = $var =~ /abc/; the result is true (1) if the pattern is found in the string and false (the empty string) if it is not. !~ does the opposite.
These two operators are well suited to conditional statements, for example:
if ($question =~ /please/) {
    print ("Thank you for being polite!\n");
}
else {
    print ("That was not very polite!\n");
}
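To see what the two operators actually return, here is a small sketch; the sample sentence and variable names are only illustrative. A successful =~ yields 1, a failed one yields the empty string, and !~ gives the opposite result.

```perl
use strict;
use warnings;

my $question = "Could you please pass the salt?";

my $found   = ($question =~ /please/);   # 1: the pattern occurs in the string
my $missing = ($question =~ /thanks/);   # "": no match, which counts as false
my $absent  = ($question !~ /thanks/);   # !~ inverts the test, so this is 1

print "found=$found missing='$missing' absent=$absent\n";
# prints: found=1 missing='' absent=1
```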
III. Special Characters in Patterns
Perl supports a number of characters that have special meanings inside a pattern.
1. The + character
+ means one or more occurrences of the preceding character. For example, /de+f/ matches def, deef, deeef, deeeeef, and so on. It matches as many of the same character as it can: /ab+/ matches abb in the string abbc, not just ab.
When the words on a line are separated by more than one space, the line can be split like this:
@array = split(/ +/, $line);
Note: split starts a new word every time it encounters the separator pattern, so if $line begins with a space, the first element of @array is empty. It does, however, tell whether there are any words at all: if $line contains only spaces, @array is an empty array. Also note that in this example a tab character is treated as part of a word rather than as a separator; adjust the pattern if tabs can occur.
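A short sketch of the leading-space behaviour described in the note; the sample line is made up for illustration.

```perl
use strict;
use warnings;

my $line  = "  the   quick brown";
my @words = split(/ +/, $line);
# The leading spaces produce an empty first field: ("", "the", "quick", "brown")
print join("|", @words), "\n";   # prints "|the|quick|brown"

# A line of only spaces: every field is empty, so all are dropped.
my @none = split(/ +/, "    ");
print scalar(@none), "\n";       # prints 0
```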

2. The [] and [^] characters
[] matches any one character from a set. For example, /a[0123456789]c/ matches a, followed by a digit, followed by c. Combined with +: /d[eE]+f/ matches def, dEf, deef, dEEf, dEeEeEf, and so on. A ^ right after the opening bracket means any character except those listed: /d[^eE]f/ matches d, followed by any character other than e or E, followed by f.
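A minimal sketch that filters a few made-up strings through the two character classes above.

```perl
use strict;
use warnings;

my @candidates = qw(def dEf deEef dxf d5f df);

my @e_run = grep { /d[eE]+f/ } @candidates;   # one or more of e or E between d and f
my @not_e = grep { /d[^eE]f/ } @candidates;   # exactly one character that is not e or E

print "@e_run\n";   # prints "def dEf deEef"
print "@not_e\n";   # prints "dxf d5f"
```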
3. The * and ? characters
These are similar to +. The difference is that * matches zero or more occurrences of the preceding character, and ? matches zero or one. For example, /de*f/ matches df, def, deeeef, and so on; /de?f/ matches only df or def.
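The same kind of sketch for * and ?, again with illustrative input strings.

```perl
use strict;
use warnings;

my @candidates = qw(df def deef deeeef);

my @zero_or_more = grep { /de*f/ } @candidates;   # any number of e's, including none
my @zero_or_one  = grep { /de?f/ } @candidates;   # at most one e

print "@zero_or_more\n";   # prints "df def deef deeeef"
print "@zero_or_one\n";    # prints "df def"
```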
4. Escape sequences
To use a character that normally has a special meaning as an ordinary character in a pattern, put a backslash \ in front of it. For example, in /\*+/, \* stands for the literal character * rather than for "zero or more" as described above, so the pattern matches one or more asterisks. A literal backslash is written /\\/. In Perl 5, the pair \Q and \E can also be used to escape every special character between them.
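A small sketch of both escaping styles; the sample text and the $price value are assumptions chosen so that the unescaped characters would otherwise be special (* as a quantifier, . as "any character", $ as an anchor).

```perl
use strict;
use warnings;

my $text = "rating: ***** (5 stars) costs \$3.50";

# \* is a literal asterisk, so /\*+/ matches a run of one or more asterisks.
my ($stars) = $text =~ /(\*+)/;

# \Q...\E escapes every special character in the interpolated string,
# so the "$" and "." in $price are matched literally.
my $price     = '$3.50';
my $has_price = $text =~ /\Q$price\E/ ? 1 : 0;

print "$stars\n";       # prints "*****"
print "$has_price\n";   # prints 1
```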
5. Matching any letter or digit
The pattern /a[0123456789]c/ matches the letter a, followed by any digit, followed by c. It can also be written /a[0-9]c/. Similarly, [a-z] stands for any lowercase letter and [A-Z] for any uppercase letter, so any uppercase letter, lowercase letter, or digit is matched by /[0-9a-zA-Z]/.
6. Anchors

Anchor     Description
^ or \A    Match only at the beginning of the string
$ or \Z    Match only at the end of the string
\b         Match at a word boundary
\B         Match inside a word (anywhere except a word boundary)

Example 1: /^def/ matches only strings that begin with def, /def$/ matches only strings that end with def, and the combined /^def$/ matches only the string def (or def followed by a single newline, since $ can also match just before a trailing newline). \A and \Z differ from ^ and $ only when a string is matched as multiple lines.
Example 2: Determine what kind of variable name a string contains:
chomp($varname = <STDIN>);   # read the name to classify
if ($varname =~ /^\$[A-Za-z][_0-9a-zA-Z]*$/) {
    print ("$varname is a legal scalar variable\n");
} elsif ($varname =~ /^\@[A-Za-z][_0-9a-zA-Z]*$/) {
    print ("$varname is a legal array variable\n");
} elsif ($varname =~ /^[A-Za-z][_0-9a-zA-Z]*$/) {
    print ("$varname is a legal file variable\n");
} else {
    print ("I don't understand what $varname is.\n");
}
Example 3: \b matches at a word boundary: /\bdef/ matches def and defghi but not abcdef; /def\b/ matches words ending in def, such as def and abcdef, but not defghi; and /\bdef\b/ matches only the word def. Note that /\bdef/ can match $defghi, because $ is not considered part of a word.
Example 4: \B matches inside a word: /\Bdef/ matches abcdef but not def; /def\B/ matches defghi; /\Bdef\B/ matches cdefg and abcdefghi, but not def, defghi, or abcdef.
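A sketch that runs the four boundary patterns from Examples 3 and 4 against a few sample words.

```perl
use strict;
use warnings;

my @words = qw(def defghi abcdef cdefg);

my @starts_word = grep { /\bdef/ }   @words;   # def at the start of a word
my @ends_word   = grep { /def\b/ }   @words;   # def at the end of a word
my @whole_word  = grep { /\bdef\b/ } @words;   # def as a complete word
my @inside_word = grep { /\Bdef\B/ } @words;   # word characters on both sides of def

print "@starts_word\n";   # prints "def defghi"
print "@ends_word\n";     # prints "def abcdef"
print "@whole_word\n";    # prints "def"
print "@inside_word\n";   # prints "cdefg"
```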
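7. Variable interpolation in patterns
Variables are interpolated into patterns, so a pattern can be kept in a variable. For example, to split a sentence into words on runs of tabs and spaces:
$pattern = "[\t ]+";
@words = split(/$pattern/, $line);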
8. Character-range escape sequences

Escape   Description                     Equivalent range
\d       Any digit                       [0-9]
\D       Any character except a digit    [^0-9]
\w       Any word character              [_0-9a-zA-Z]
\W       Any non-word character          [^_0-9a-zA-Z]
\s       Whitespace                      [ \r\t\n\f]
\S       Non-whitespace                  [^ \r\t\n\f]

For example, /[\da-z]/ matches any digit or lowercase letter.
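A sketch of \d, \w, and \s on a made-up record. It borrows the g option (described in Part IV) to collect every match in list context.

```perl
use strict;
use warnings;

my $record = "id=42\tname=alice_01  ok";

my @digit_runs = $record =~ /(\d+)/g;    # runs of digits
my @words      = $record =~ /(\w+)/g;    # runs of word characters [_0-9a-zA-Z]
my @fields     = split /\s+/, $record;   # split on runs of whitespace

print "@digit_runs\n";            # prints "42 01"
print join("|", @words), "\n";    # prints "id|42|name|alice_01|ok"
print join("|", @fields), "\n";   # prints "id=42|name=alice_01|ok"
```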
9. Matching any character
The character . matches any character except a newline. It is commonly combined with *, as in .*, to match any run of characters on a line.
10. Matching a specified number of characters
A pair of braces {} specifies how many times the preceding character may occur. For example, /de{1,3}f/ matches def, deef, and deeef; /de{3}f/ matches only deeef; /de{3,}f/ matches three or more e's between d and f; and /de{0,3}f/ matches at most three e's between d and f.
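The four brace forms applied to the same illustrative list of strings:

```perl
use strict;
use warnings;

my @candidates = qw(df def deef deeef deeeef);

my @one_to_three  = grep { /de{1,3}f/ } @candidates;   # between one and three e's
my @exactly_three = grep { /de{3}f/ }   @candidates;   # exactly three e's
my @three_or_more = grep { /de{3,}f/ }  @candidates;   # three or more e's
my @at_most_three = grep { /de{0,3}f/ } @candidates;   # zero to three e's

print "@one_to_three\n";    # prints "def deef deeef"
print "@exactly_three\n";   # prints "deeef"
print "@three_or_more\n";   # prints "deeef deeeef"
print "@at_most_three\n";   # prints "df def deef deeef"
```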
11. Alternatives
The character | separates two or more alternatives, any one of which may match. For example, /def|ghi/ matches def or ghi.
For example, to check whether a string is a valid integer:
if ($number =~ /^-?\d+$|^-?0[xX][\da-fA-F]+$/) {
    print ("$number is a legal integer.\n");
} else {
    print ("$number is not a legal integer.\n");
}
^-?\d+$ matches a decimal number, and ^-?0[xX][\da-fA-F]+$ matches a hexadecimal number.
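The same test wrapped in a function so it can be tried on several inputs; is_legal_integer is a hypothetical helper name, and the inputs are illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the pattern from the example above.
sub is_legal_integer {
    my ($number) = @_;
    return ($number =~ /^-?\d+$|^-?0[xX][\da-fA-F]+$/) ? 1 : 0;
}

my @inputs  = ("42", "-17", "0x1F", "-0XaB", "12a", "0x");
my @results = map { is_legal_integer($_) } @inputs;
print "@results\n";   # prints "1 1 1 1 0 0"
```

"0x" is rejected because the hexadecimal alternative requires at least one digit after the prefix.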
12. Reusing parts of a pattern
When the same text must appear more than once in a match, enclose its first occurrence in parentheses and refer back to it with \n, where n is the number of the parenthesized group (\1 for the first, \2 for the second, and so on). This keeps the pattern short:
/\d{2}([\W])\d{2}\1\d{2}/ matches:
12-05-92
26.11.87
07 04 92, and so on.
Note: /\d{2}([\W])\d{2}\1\d{2}/ differs from /(\d{2})([\W])\1\2\1/. The latter matches only strings of the form 17-17-17, not 17-05-91.
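A sketch comparing the two back-reference patterns on sample dates; "12-05.92" is an added mixed-separator case that the first pattern rejects.

```perl
use strict;
use warnings;

my @dates = ("12-05-92", "26.11.87", "07 04 92", "12-05.92", "17-17-17");

# The separator captured by ([\W]) must appear again at \1.
my @same_separator = grep { /\d{2}([\W])\d{2}\1\d{2}/ } @dates;

# Here \1 repeats the first two digits and \2 repeats the separator.
my @repeated = grep { /(\d{2})([\W])\1\2\1/ } @dates;

print join(", ", @same_separator), "\n";   # prints "12-05-92, 26.11.87, 07 04 92, 17-17-17"
print join(", ", @repeated), "\n";         # prints "17-17-17"
```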
13. Precedence of special characters
Like operators, the special characters in a pattern have an order of precedence, listed here from highest to lowest:

Special characters   Description
()                   Pattern memory (grouping)
+ * ? {}             Number of occurrences
^ $ \b \B            Anchors
|                    Alternatives

14. Specifying the pattern delimiter
By default the pattern delimiter is the slash /, but a different delimiter can be chosen by putting the letter m in front of the pattern, for example:
m!/u/jqpublic/perl/prog1! is equivalent to /\/u\/jqpublic\/perl\/prog1/
Note: when the single quote ' is used as the delimiter, variables are not interpolated. When a special character is used as the delimiter, that character can no longer be used inside the pattern with its escape or special meaning.
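A sketch matching the path from the example with different delimiters. The use of paired braces m{...} as a delimiter is an addition not mentioned above; Perl accepts bracketing pairs such as {} as delimiters.

```perl
use strict;
use warnings;

my $path = "/u/jqpublic/perl/prog1";

# With m and ! as the delimiter, the slashes in the path need no escaping.
my $bang  = ($path =~ m!^/u/jqpublic/perl/prog1$!) ? 1 : 0;

# The same pattern with the default / delimiter must escape every slash.
my $slash = ($path =~ /^\/u\/jqpublic\/perl\/prog1$/) ? 1 : 0;

# Paired brackets such as {} also work as delimiters.
my $brace = ($path =~ m{^/u/(\w+)/}) ? $1 : "";

print "$bang $slash $brace\n";   # prints "1 1 jqpublic"
```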
15. Pattern sequence variables
After a successful match, the text captured by each pair of parentheses is available in the variables $1, $2, and so on ($n in general), and the whole matched text is available in the variable $&.
$string = "This string contains the number 25.11.";
$string =~ /-?(\d+)\.?(\d+)/;   # the match is 25.11
$integerpart = $1;              # now $integerpart = 25
$decimalpart = $2;              # now $decimalpart = 11
$totalpart = $&;                # now $totalpart = 25.11
IV. Pattern-Matching Options

Option   Description
g        Match all possible occurrences
i        Ignore case
m        Treat the string as multiple lines
o        Interpolate variables only once
s        Treat the string as a single line
x        Ignore whitespace in the pattern

1. Match all occurrences (g option)
@matches = "balata" =~ /.a/g;   # now @matches = ("ba", "la", "ta")
Matching in a loop:
$string = "balata";
while ($string =~ /.a/g) {
    $match = $&;
    print ("$match\n");
}
Result:
ba
la
ta
When the g option is used, the pos function reads or sets the offset at which the next match begins:
$offset = pos($string);
pos($string) = $newoffset;
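A minimal sketch of pos() at work with the g option; the offset 3 is an arbitrary choice for illustration.

```perl
use strict;
use warnings;

my $string = "balata";
my @seen;

# In scalar context each //g match resumes where the previous one ended;
# pos() reports the offset just past the current match.
while ($string =~ /.a/g) {
    push @seen, $& . '@' . pos($string);
}
print "@seen\n";   # prints "ba@2 la@4 ta@6"

# The failed match that ended the loop reset pos(). Setting it by hand
# makes the next //g match start scanning at offset 3.
pos($string) = 3;
$string =~ /.a/g;
my $next = $&;
print "$next\n";   # prints "ta"
```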
2. Ignore case (i option)
/de/i matches de, dE, De, and DE.
3. Treat the string as multiple lines (m option)
With this option, ^ matches at the start of the string or at the start of any line, and $ matches at the end of any line.
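A sketch contrasting the same anchored pattern with and without m on a made-up three-line string:

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\ngamma";

my @without_m = $text =~ /^(\w+)$/g;    # ^ and $ anchor only to the whole string
my @with_m    = $text =~ /^(\w+)$/mg;   # ^ and $ also anchor at each line

print scalar(@without_m), "\n";   # prints 0
print "@with_m\n";                # prints "alpha beta gamma"
```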
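4. Interpolate variables only once (o option)
$var = 1;
$line = <STDIN>;
while ($var < 10) {
    $result = $line =~ /$var/o;
    $line = <STDIN>;
    $var++;
}
Because of the o option, $var is interpolated only the first time the pattern is used, so the pattern is /1/ on every pass through the loop even though $var keeps changing.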
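A self-contained version of the same effect, without reading from STDIN; the target string "2" is an illustrative choice.

```perl
use strict;
use warnings;

my (@with_o, @without_o);
for my $var (1 .. 3) {
    # With o, $var is interpolated only the first time, so the pattern stays /1/.
    push @with_o,    ("2" =~ /$var/o) ? 1 : 0;
    # Without o, the pattern is rebuilt from the current $var each time.
    push @without_o, ("2" =~ /$var/)  ? 1 : 0;
}
print "@with_o\n";      # prints "0 0 0"
print "@without_o\n";   # prints "0 1 0"
```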
5. Treat the string as a single line (s option)
/a.*bc/s matches the string "axxxxx\nxxxxbc", but /a.*bc/ does not, because without the s option . does not match the newline.
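The same comparison as runnable code:

```perl
use strict;
use warnings;

my $text = "axxxxx\nxxxxbc";

my $plain  = ($text =~ /a.*bc/)  ? 1 : 0;   # . cannot cross the newline: no match
my $single = ($text =~ /a.*bc/s) ? 1 : 0;   # with s, . matches "\n" too

print "$plain $single\n";   # prints "0 1"
```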
6. Ignore whitespace in the pattern (x option)
/\d{2} ([\W]) \d{2} \1 \d{2}/x is equivalent to /\d{2}([\W])\d{2}\1\d{2}/.
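The x option also allows # comments, so the date pattern can be laid out one piece per line. This sketch stores it with qr// (an addition not covered above) and tests sample dates; note that the space in "07 04 92" is in the string, not the pattern, so x does not affect it.

```perl
use strict;
use warnings;

my $date_re = qr/
    \d{2}     # first two-digit field
    ([\W])    # separator, remembered as \1
    \d{2}     # second two-digit field
    \1        # the same separator again
    \d{2}     # third two-digit field
/x;

my @matched = grep { $_ =~ $date_re } ("12-05-92", "26.11.87", "07 04 92", "12-05.92");
print join(", ", @matched), "\n";   # prints "12-05-92, 26.11.87, 07 04 92"
```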
V. The Substitution Operator
The syntax is s/pattern/replacement/. It replaces the part of the string that matches pattern with replacement. For example:
$string = "abc123def";
$string =~ s/123/456/;   # now $string = "abc456def"
In the replacement part you can use the pattern sequence variables $n, as in s/(\d+)/[$1]/, but the special pattern characters such as {}, *, and + have no special meaning there. For example, s/abc/[def]/ replaces abc with [def].
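A sketch of the three cases above; the "rooms" string is made up, and g (listed in the options table for this operator) is used so that every number is wrapped.

```perl
use strict;
use warnings;

my $string = "abc123def";
$string =~ s/123/456/;        # replaces the first match only
print "$string\n";            # prints "abc456def"

my $rooms = "rooms 12 and 305";
$rooms =~ s/(\d+)/[$1]/g;     # $1 in the replacement; g replaces every match
print "$rooms\n";             # prints "rooms [12] and [305]"

my $plain = "abc";
$plain =~ s/abc/[def]/;       # [ and ] are ordinary text in the replacement
print "$plain\n";             # prints "[def]"
```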
The substitution operator accepts the following options:

Option   Description
g        Replace every match, not just the first
i        Ignore case in the pattern
e        Evaluate the replacement as an expression
m        Treat the string being matched as multiple lines
o        Interpolate variables only once
s        Treat the string being matched as a single line
x        Ignore whitespace in the pattern

Note: the e option treats the replacement as an expression and evaluates it before substituting, for example:
$string = "0abc1";
$string =~ s/[a-zA-Z]+/$& x 2/e;   # now $string = "0abcabc1"
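Another sketch of e, combined with g, showing how the same replacement text behaves with and without evaluation; the sample string is illustrative.

```perl
use strict;
use warnings;

my $prices = "3 apples, 10 pears";

# With e, the replacement "$1 * 2" is evaluated as Perl code.
(my $doubled = $prices) =~ s/(\d+)/$1 * 2/ge;
# Without e, it is an ordinary interpolated string.
(my $as_text = $prices) =~ s/(\d+)/$1 * 2/g;

print "$doubled\n";   # prints "6 apples, 20 pears"
print "$as_text\n";   # prints "3 * 2 apples, 10 * 2 pears"
```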
VI. The Translation Operator
This is another kind of replacement. The syntax is tr/string1/string2/. Again string2 supplies the replacement, but the effect is to replace the first character of string1 with the first character of string2, the second character of string1 with the second character of string2, and so on. For example:
$string = "abcdefghicba";
$string =~ tr/abc/def/;   # now $string = "defdefghifed"
When string1 is longer than string2, its extra characters are replaced with the last character of string2. When a character appears more than once in string1, only its first replacement is used.
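A sketch of the example above plus the "string1 longer than string2" rule. It also uses the fact that tr returns the number of characters it replaced, which the text above does not mention.

```perl
use strict;
use warnings;

my $string = "abcdefghicba";
my $count  = ($string =~ tr/abc/def/);   # tr returns how many characters it replaced
print "$string $count\n";                # prints "defdefghifed 6"

# string1 longer than string2: b and c both take the last character, x.
(my $short = "aabbcc") =~ tr/abc/wx/;
print "$short\n";                        # prints "wwxxxx"
```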
The translation operator accepts the following options:

Option   Description
c        Translate every character that is not in string1 (complement)
d        Delete characters in string1 that have no counterpart in string2
s        Squeeze runs of the same translated output character into one

For example, $string =~ tr/0-9/ /c; replaces every non-digit character with a space; $string =~ tr/\t //d; deletes tabs and spaces; and $string =~ tr/0-9/ /cs; replaces each run of non-digit characters between the numbers with a single space.
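The three option examples applied to a made-up phone string. The output is wrapped in brackets so the spaces are visible.

```perl
use strict;
use warnings;

my $phone = "tel: 555-0199 ext 12";

(my $spaced   = $phone) =~ tr/0-9/ /c;    # every non-digit becomes a space
(my $squeezed = $phone) =~ tr/0-9/ /cs;   # each run of non-digits becomes one space
(my $compact  = "a b\tc") =~ tr/\t //d;   # tabs and spaces are deleted

print "[$spaced]\n";     # five spaces, 555, one space, 0199, five spaces, 12
print "[$squeezed]\n";   # prints "[ 555 0199 12]"
print "[$compact]\n";    # prints "[abc]"
```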

VII. Extended Pattern Matching
Perl 5 supports some pattern-matching features that are not available in Perl 4 or in standard UNIX pattern matching. Their general form is (?c...), where c is a single character that selects the feature (such as :, =, !, or #) and ... is the pattern or subpattern it applies to.
1. Non-capturing parentheses
Normally Perl stores the text matched by each parenthesized subpattern. The (?:...) form groups a subpattern without storing what it matched. For example, in /(?:a|b|c)(d|e)f\1/, \1 refers to the matched d or e, not to a, b, or c.
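A sketch of the example pattern on two made-up strings:

```perl
use strict;
use warnings;

# (?:a|b|c) groups without capturing, so (d|e) is group 1 and \1 refers to it.
my $ok  = ("bdfd" =~ /(?:a|b|c)(d|e)f\1/) ? 1 : 0;   # \1 repeats the d: match
my $bad = ("bdfb" =~ /(?:a|b|c)(d|e)f\1/) ? 1 : 0;   # \1 cannot be the b: no match

print "$ok $bad\n";   # prints "1 0"
```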
2. Embedded pattern options
Pattern options normally follow the pattern. Four of them, i, m, s, and x, can instead be embedded in the pattern with the syntax /(?option)pattern/, which is equivalent to /pattern/option.
3. Positive and negative lookahead
Positive lookahead is written /pattern(?=string)/ and matches pattern only if it is immediately followed by string. Its opposite, /pattern(?!string)/, matches pattern only if it is not followed by string. For example:
$string = "25abc8";
$string =~ /abc(?=[0-9])/;
$matched = $&;   # $& holds the matched text: here it is abc, not abc8
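A sketch of both directions; the foo/bar words for the negative case are illustrative.

```perl
use strict;
use warnings;

my $string = "25abc8";
$string =~ /abc(?=[0-9])/;
my $matched = $&;          # the digit is required but not consumed
print "$matched\n";        # prints "abc"

# Negative lookahead: foo only when it is NOT followed by bar.
my @hits = grep { /foo(?!bar)/ } qw(foobar foobaz food);
print "@hits\n";           # prints "foobaz food"
```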
4. Pattern comments
In Perl 5, a comment can be placed inside a pattern with (?#...), for example:
if ($string =~ /(?i)[a-z]{2,3}(?# match two or three alphabetic characters)/) {
    ...
}
