Introduction to regular expressions in Perl
I. Introduction
II. Matching operators
III. Special characters in patterns
1. The + character
2. The [] and [^] characters
3. The * and ? characters
4. Escape characters
5. Matching any letter or digit
6. Anchors
7. Variable substitution in patterns
8. Character-range escapes
9. Matching any character
10. Matching a specified number of characters
11. Specifying alternatives
12. Reusing part of a pattern
13. Precedence of escapes and special characters
14. Specifying the pattern delimiter
15. Pattern-sequence variables
IV. Pattern-matching options
1. Match all possible patterns (g option)
2. Ignore case (i option)
3. Treat the string as multiple lines (m option)
4. Perform variable substitution only once (o option)
5. Treat the string as a single line (s option)
6. Ignore whitespace in the pattern (x option)
V. Substitution operator
VI. Translation operator
VII. Extended pattern matching
1. Do not store the content matched in parentheses
2. Embedded pattern options
3. Positive and negative lookahead
4. Pattern comments
I. Introduction
A pattern is a specific sequence of characters to look for in a string. It is enclosed in forward slashes: /def/ is the pattern def. A typical use is the split function, which divides a string into words wherever a pattern occurs: @array = split(/ /, $line);
II. Matching operators =~ and !~
=~ tests whether a pattern matches: $result = $var =~ /abc/; If the pattern is found in the string, a non-zero value (true) is returned. If it is not found, a false value is returned (the empty string, which is 0 in numeric context). !~ does the opposite.
These two operators are well suited to conditions, for example:
if ($question =~ /please/) {
    print("Thank you for being polite!\n");
}
else {
    print("That was not very polite!\n");
}
III. Special characters in patterns
Perl gives some characters a special meaning inside a pattern.
1. The + character
+ means one or more of the preceding character; for example, /de+f/ matches def, deef, and deeeeef. It matches as many of those characters as it can: /ab+/ matches abb in the string abbc, not ab.
When the words in a line are separated by more than one space, the line can be split as follows:
@array = split(/ +/, $line);
Note: split starts a new word every time it meets the split pattern, so if $line begins with a space, the first element of @array is an empty element. It can still tell whether there are any words at all: if $line contains only spaces, @array is an empty array. In the example above a tab character is treated as part of a word, so the pattern needs correcting if the input can contain tabs.
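The behaviour described in the note can be checked with a short script. This is a minimal sketch: the sample strings and variable names are made up for illustration, and /[ \t]+/ is only one possible correction for tabs, not necessarily the one the author had in mind.

```perl
use strict;
use warnings;

# A leading space produces an empty first element.
my @leading = split(/ +/, " one  two");            # ("", "one", "two")

# A line that contains only spaces produces an empty array.
my @blank = split(/ +/, "    ");                   # ()

# With / +/ a tab stays inside a word; adding \t to the class splits on it too.
my @with_tab = split(/ +/, "one\ttwo three");      # ("one\ttwo", "three")
my @fixed    = split(/[ \t]+/, "one\ttwo three");  # ("one", "two", "three")

print scalar(@leading), " ", scalar(@blank), " ",
      scalar(@with_tab), " ", scalar(@fixed), "\n";  # prints "3 0 2 3"
```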
2. The [] and [^] characters
[] matches any one of a group of characters. For example, /a[0123456789]c/ matches a, followed by one digit, followed by c. Combined with +: /d[eE]+f/ matches def, dEf, deef, dEef, and deeeeeeeef. A ^ at the start of the group means any character except those listed; for example, /d[^deE]f/ matches d, followed by one character that is not d, e, or E, followed by f.
3. The * and ? characters
These are similar to +. The difference is that * matches zero or more of the preceding character, and ? matches zero or one. For example, /de*f/ matches df, def, and deeeef; /de?f/ matches df or def.
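The three quantifiers can be compared side by side. The sketch below is illustrative only: the word list is made up, and the ^ and $ anchors (covered in the Anchors section) are added so that each pattern has to account for the whole word.

```perl
use strict;
use warnings;

my @words = ("df", "def", "deeeef");

my @plus     = grep { /^de+f$/ } @words;  # one or more e
my @star     = grep { /^de*f$/ } @words;  # zero or more e
my @question = grep { /^de?f$/ } @words;  # zero or one e

# Quantifiers take as much as they can: b+ consumes both b's in "abbc".
"abbc" =~ /ab+/;
my $greedy = $&;

print "@plus | @star | @question | $greedy\n";
# prints "def deeeef | df def deeeef | df def | abb"
```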
4. Escape characters
To include a character in a pattern that is normally treated as special, put a backslash "\" in front of it. For example, in /\*+/, \* means the literal character *, not the zero-or-more meaning described above. A literal backslash is written /\\/. In Perl 5, \Q and \E escape every special character that appears between them.
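A small sketch of both ways of escaping, by hand and with \Q...\E. The sample text and the variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

my $text = "2*3+1";

# Escaping each special character by hand.
my $by_hand = ($text =~ /2\*3\+1/) ? 1 : 0;          # 1

# \Q...\E escapes everything between them, including interpolated text.
my $literal = "*3+";
my $quoted  = ($text =~ /2\Q$literal\E1/) ? 1 : 0;   # 1

# Unescaped, * and + act as quantifiers, so the literal text is not found.
my $unescaped = ($text =~ /2*3+1/) ? 1 : 0;          # 0

print "$by_hand $quoted $unescaped\n";               # prints "1 1 0"
```

\Q...\E is most useful when the text to be matched comes from a variable whose contents are not known in advance.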
5. Matching any letter or digit
The pattern /a[0123456789]c/ matches a, followed by any digit, followed by c. It can also be written /a[0-9]c/. Similarly, [a-z] stands for any lowercase letter and [A-Z] for any uppercase letter. Any letter or digit is written /[0-9a-zA-Z]/.
6. Anchors
Anchor     Description
^ or \A    Matches only at the start of the string
$ or \Z    Matches only at the end of the string
\b         Matches at a word boundary
\B         Matches inside a word (not at a word boundary)
Example 1: /^def/ matches only strings that start with def, /def$/ matches only strings that end with def, and the combination /^def$/ matches only the string def (optionally followed by a single trailing newline, which $ allows). \A and \Z differ from ^ and $ in multi-line matching: they still match only at the start and end of the whole string.
Example 2: Test what kind of variable name a string is:
if ($varname =~ /^\$[A-Za-z][_0-9a-zA-Z]*$/) {
    print("$varname is a legal scalar variable\n");
} elsif ($varname =~ /^\@[A-Za-z][_0-9a-zA-Z]*$/) {
    print("$varname is a legal array variable\n");
} elsif ($varname =~ /^[A-Za-z][_0-9a-zA-Z]*$/) {
    print("$varname is a legal file variable\n");
} else {
    print("I don't understand what $varname is.\n");
}
Example 3: \b matches at a word boundary. /\bdef/ matches words that start with def, such as def and defghi, but does not match abcdef. /def\b/ matches words that end with def, such as def and abcdef, but does not match defghi. /\bdef\b/ matches only the word def. Note: /\bdef/ can match $defghi, because $ is not considered part of a word.
Example 4: \B matches inside a word. /\Bdef/ matches abcdef but does not match def; /def\B/ matches defghi; /\Bdef\B/ matches cdefg and abcdefghi, but does not match def, defghi, or abcdef.
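Examples 3 and 4 can be verified by filtering a list of sample words with each pattern. This is a sketch; the word list and the array names are chosen for illustration.

```perl
use strict;
use warnings;

my @samples = ("def", "defghi", "abcdef", "abcdefghi");

my @starts = grep { /\bdef/ }   @samples;  # def begins a word
my @ends   = grep { /def\b/ }   @samples;  # def ends a word
my @whole  = grep { /\bdef\b/ } @samples;  # def is the whole word
my @inside = grep { /\Bdef\B/ } @samples;  # def is strictly inside a word

print "@starts | @ends | @whole | @inside\n";
# prints "def defghi | def abcdef | def | abcdefghi"
```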
7. Variable substitution in patterns
To divide a sentence into words:
$pattern = "[\t ]+";
@words = split(/$pattern/, $line);
8. Character-range escapes
Escape   Description                Range
\d       Any digit                  [0-9]
\D       Any character but a digit  [^0-9]
\w       Any word character         [_0-9a-zA-Z]
\W       Any non-word character     [^_0-9a-zA-Z]
\s       Whitespace                 [ \r\t\n\f]
\S       Non-whitespace             [^ \r\t\n\f]
For example, /[\da-z]/ matches any digit or lowercase letter.
9. Matching any character
The character "." matches any character except a newline. It is usually combined with *, as in /a.*bc/.
10. Matching a specified number of characters
The character pair {} specifies how many times the preceding character must occur. For example, /de{1,3}f/ matches def, deef, and deeef; /de{3}f/ matches deeef; /de{3,}f/ matches at least three e's between d and f; /de{0,3}f/ matches at most three e's between d and f.
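The four forms of {} can be compared on one list of words. A minimal sketch, with made-up sample words and with ^ and $ added so that the count applies to the whole word:

```perl
use strict;
use warnings;

my @samples = ("df", "def", "deef", "deeef", "deeeef");

my @one_to_three   = grep { /^de{1,3}f$/ } @samples;  # 1 to 3 e's
my @exactly_three  = grep { /^de{3}f$/ }   @samples;  # exactly 3 e's
my @at_least_three = grep { /^de{3,}f$/ }  @samples;  # 3 or more e's
my @at_most_three  = grep { /^de{0,3}f$/ } @samples;  # 0 to 3 e's

print "@one_to_three | @exactly_three | @at_least_three | @at_most_three\n";
# prints "def deef deeef | deeef | deeef deeeef | df def deef deeef"
```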
11. Specifying alternatives
The character "|" separates two or more alternatives, any one of which may match. For example, /def|ghi/ matches def or ghi.
For example, to check that a number is a valid integer:
if ($number =~ /^-?\d+$|^-?0[xX][\da-fA-F]+$/) {
    print("$number is a legal integer.\n");
} else {
    print("$number is not a legal integer.\n");
}
Here ^-?\d+$ matches a decimal integer, and ^-?0[xX][\da-fA-F]+$ matches a hexadecimal integer.
12. Reusing part of a pattern
When the same matched text has to appear more than once, enclose that part of the pattern in parentheses and refer to what it matched with \n, where n is the number of the parenthesized group:
/\d{2}([\W])\d{2}\1\d{2}/ matches:
12-05-92
26.11.87
07 04 92, and so on.
Note: /\d{2}([\W])\d{2}\1\d{2}/ is different from /(\d{2})([\W])\1\2\1/. The latter matches only strings of the form 17-17-17, not 17-05-91.
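The difference between the two patterns shows up when they are run against a few dates. This sketch uses made-up sample dates and adds ^ and $ so that each pattern must cover the whole string.

```perl
use strict;
use warnings;

my @dates = ("12-05-92", "26.11.87", "07 04 92", "12-05.92", "17-17-17");

# \1 must repeat the separator that group 1 matched.
my @same_separator = grep { /^\d{2}([\W])\d{2}\1\d{2}$/ } @dates;

# Here \1 repeats the two digits and \2 repeats the separator.
my @all_repeated = grep { /^(\d{2})([\W])\1\2\1$/ } @dates;

print join(",", @same_separator), " | ", join(",", @all_repeated), "\n";
# prints "12-05-92,26.11.87,07 04 92,17-17-17 | 17-17-17"
```

The date 12-05.92 is rejected by the first pattern because its two separators differ.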
13. Precedence of escapes and special characters
Like operators, escapes and special characters have an order of precedence, listed here from highest to lowest:
Special characters   Description
()                   Pattern memory
+ * ? {}             Number of occurrences
^ $ \b \B            Anchors
|                    Alternatives
14. Specifying the pattern delimiter
By default the pattern delimiter is the forward slash /, but a different delimiter can be chosen by prefixing the pattern with the letter m, for example:
m!/u/jqpublic/perl/prog1! is equivalent to /\/u\/jqpublic\/perl\/prog1/
Note: when the single quote ' is used as the delimiter, variables are not substituted. When a special character is used as the delimiter, its escape or special function cannot be used inside the pattern.
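A short sketch of alternative delimiters, including the single-quote case from the note. The variable names are illustrative; the path is the one used in the example above.

```perl
use strict;
use warnings;

my $path = "/u/jqpublic/perl/prog1";

# The same pattern with the default delimiter and with ! as delimiter.
my $slashes = ($path =~ /\/u\/jqpublic\/perl\/prog1/) ? 1 : 0;  # 1
my $bang    = ($path =~ m!/u/jqpublic/perl/prog1!)    ? 1 : 0;  # 1

# With ordinary delimiters $name is replaced by its value ...
my $name   = "perl";
my $interp = ("perl" =~ m/$name/) ? 1 : 0;                      # 1

# ... but with single quotes it is not: the pattern stays as the
# characters $name, where $ is the end-of-string anchor.
my $no_interp = ("perl" =~ m'$name') ? 1 : 0;                   # 0

print "$slashes $bang $interp $no_interp\n";                    # prints "1 1 1 0"
```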
15. Pattern-sequence variables
After a pattern match, the text matched by each parenthesized part is available in the variables $n ($1, $2, and so on), and the whole matched text is available in the variable $&.
$string = "This string contains the number 25.11.";
$string =~ /-?(\d+)\.?(\d+)/;   # the matched text is 25.11
$integerpart = $1;              # now $integerpart = 25
$decimalpart = $2;              # now $decimalpart = 11
$totalpart = $&;                # now $totalpart = 25.11
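$1 and $2 only hold fresh values when the match succeeded, so it is safer to test the match before reading them. The sketch below shows that check, plus the list-context form of a match, which returns the captured parts directly. The variable names $int_from_list and $dec_from_list are made up for this example.

```perl
use strict;
use warnings;

my $string = "This string contains the number 25.11.";

# Read $1 and $2 only after a successful match.
my ($integerpart, $decimalpart);
if ($string =~ /-?(\d+)\.?(\d+)/) {
    ($integerpart, $decimalpart) = ($1, $2);
}

# In list context the match returns the captured parts themselves.
my ($int_from_list, $dec_from_list) = $string =~ /-?(\d+)\.?(\d+)/;

print "$integerpart $decimalpart $int_from_list $dec_from_list\n";
# prints "25 11 25 11"
```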
IV. Pattern-matching options
Option   Description
g        Match all possible patterns
i        Ignore case
m        Treat the string as multiple lines
o        Substitute variables only once
s        Treat the string as a single line
x        Ignore whitespace in the pattern
1. Match all possible patterns (g option)
@matches = "balata" =~ /.a/g;   # now @matches = ("ba", "la", "ta")
A matching loop:
while ("balata" =~ /.a/g) {
    $match = $&;
    print("$match\n");
}
The result is:
ba
la
ta
When the g option is used, the pos function can read or set the offset at which the next match starts:
$offset = pos($string);
pos($string) = $newoffset;
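A minimal sketch of pos in use, recording the offset after each match and then moving the starting point by hand. The variable names @offsets and $after_reset are assumptions for this example.

```perl
use strict;
use warnings;

my $string = "balata";

# After each successful /g match, pos gives the offset just past it.
my @offsets;
while ($string =~ /.a/g) {
    push @offsets, pos($string);
}

# Assigning to pos makes the next /g match start at that offset.
pos($string) = 3;
$string =~ /.a/g;
my $after_reset = $&;

print "@offsets $after_reset\n";   # prints "2 4 6 ta"
```

When a /g match fails, as it does at the end of the loop, the offset is reset, so the next /g match on that string starts again from the beginning unless pos is assigned first.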
2. Ignore case (i option)
For example, /de/i matches de, dE, De, and DE.
3. Treat the string as multiple lines (m option)
In this case, ^ matches at the start of the string or at the start of any line, and $ matches at the end of any line.
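The effect of the m option, and the difference between ^ and \A mentioned in the Anchors section, can be seen on a two-line string. This is a sketch; the sample text and variable names are made up.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Without m, ^ matches only at the start of the whole string.
my @without_m = $text =~ /^(\w+)/g;

# With m, ^ also matches at the start of every line.
my @with_m = $text =~ /^(\w+)/mg;

# \A ignores the m option and still matches only at the very start.
my $anchored_count = () = $text =~ /\A\w+/mg;

print "@without_m | @with_m | $anchored_count\n";
# prints "first | first second | 1"
```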
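4. Perform variable substitution only once (o option)
$var = 1;
$line = <STDIN>;
while ($var < 10) {
    $result = $line =~ /$var/o;
    $line = <STDIN>;
    $var++;
}
Because $var is substituted into the pattern only the first time, every pass through the loop matches /1/.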
5. Treat the string as a single line (s option)
For example, /a.*bc/s matches the string axxxxx\nxxxxbc, but /a.*bc/ does not match it.
6. Ignore whitespace in the pattern (x option)
/\d{2} ([\W]) \d{2} \1 \d{2}/x is equivalent to /\d{2}([\W])\d{2}\1\d{2}/.
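Because whitespace is ignored under x, a pattern can be spread over several lines, and each line can carry a # comment. The sketch below rewrites the date pattern that way. It stores the pattern with qr//, which is a convenience chosen for this example and is not used elsewhere in this article.

```perl
use strict;
use warnings;

my $date_re = qr/
    \d{2}      # two digits
    ([\W])     # a separator, stored as group 1
    \d{2}      # two digits
    \1         # the same separator again
    \d{2}      # two digits
/x;

my $same_sep  = ("12-05-92" =~ $date_re) ? 1 : 0;
my $mixed_sep = ("12-05.92" =~ $date_re) ? 1 : 0;

print "$same_sep $mixed_sep\n";   # prints "1 0"
```

To match a literal space under x, escape it or put it inside a character class.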
V. Substitution operator
The syntax is s/pattern/replacement/. It replaces the part of the string that matches pattern with replacement. For example:
$string = "abc123def";
$string =~ s/123/456/;   # now $string = "abc456def"
The replacement part can use the pattern-sequence variables $n, for example s/(\d+)/[$1]/, but it does not support the special characters of patterns such as {}, *, and +. For example, s/abc/[def]/ replaces abc with the literal text [def].
The options of the substitution operator are as follows:
Option   Description
g        Change all matches of the pattern
i        Ignore case in the pattern
e        Treat the replacement string as an expression
m        Treat the string to be matched as multiple lines
o        Substitute variables only once
s        Treat the string to be matched as a single line
x        Ignore whitespace in the pattern
Note: the e option treats the replacement string as an expression and evaluates it before the replacement is made, for example:
$string = "0abc1";
$string =~ s/[a-zA-Z]+/$& x 2/e;   # now $string = "0abcabc1"
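The g and e options are often combined. The following sketch, with a made-up sample string, applies both and also shows that the substitution operator returns the number of replacements it made.

```perl
use strict;
use warnings;

my $prices = "apple 3, pear 10";

# g: wrap every number in brackets, not just the first one.
(my $bracketed = $prices) =~ s/(\d+)/[$1]/g;

# g and e: evaluate the replacement as an expression for every match.
(my $doubled = $prices) =~ s/(\d+)/$1 * 2/ge;

# The operator returns how many substitutions were made.
my $count = ($prices =~ s/\d+/N/g);

print "$bracketed | $doubled | $count | $prices\n";
# prints "apple [3], pear [10] | apple 6, pear 20 | 2 | apple N, pear N"
```

The form (my $copy = $original) =~ s/.../.../ copies the string first, so the original is left unchanged.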
VI. Translation operator
This is another way to replace text. The syntax is tr/string1/string2/. Again string2 is the replacement part, but here the first character of string1 is replaced with the first character of string2, the second character of string1 with the second character of string2, and so on. For example:
$string = "abcdefghicba";
$string =~ tr/abc/def/;   # now $string = "defdefghifed"
When string1 is longer than string2, its extra characters are replaced with the last character of string2. When the same character appears in string1 more than once, the first replacement given for it is used.
The options of the translation operator are as follows:
Option   Description
c        Translate all characters that are not specified
d        Delete all specified characters
s        Reduce runs of the same output character to one
For example, $string =~ tr/0-9/ /c; replaces every non-digit character with a space (tr does not understand pattern escapes such as \d, so the range 0-9 is written out). $string =~ tr/\t //d; deletes tabs and spaces. $string =~ tr/0-9/ /cs; replaces each run of non-digit characters with a single space.
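The three options can be tried on one sample string. This is a sketch with a made-up input; it also uses tr with an empty string2 and no options, which leaves the string unchanged and returns the number of matching characters.

```perl
use strict;
use warnings;

my $string = "a1  b22\tc333";

# Empty string2, no options: nothing changes, the count is returned.
my $digit_count = ($string =~ tr/0-9//);

# c: every character that is not a digit becomes a space.
(my $spaced = $string) =~ tr/0-9/ /c;

# c and s: each run of non-digits becomes a single space.
(my $squeezed = $string) =~ tr/0-9/ /cs;

# d: tabs and spaces are deleted.
(my $stripped = $string) =~ tr/\t //d;

print "$digit_count [$spaced] [$squeezed] [$stripped]\n";
# prints "6 [ 1   22  333] [ 1 22 333] [a1b22c333]"
```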
VII. Extended pattern matching
Perl 5 supports some pattern-matching features that are not available in Perl 4 or in standard UNIX pattern matching. The syntax is (?cpattern), where c is a single character and pattern is the pattern or subpattern it applies to.
1. Do not store the content matched in parentheses
In a Perl pattern, what a subpattern in parentheses matches is stored in memory. The (?:pattern) form groups without storing; for example, in /(?:a|b|c)(d|e)f\1/, \1 refers to the matched d or e, not to a, b, or c.
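The renumbering caused by (?:...) can be checked by running the same input through both forms of the pattern. A minimal sketch with made-up test strings:

```perl
use strict;
use warnings;

# Non-storing group: \1 is the d or e, so the string must end in d here.
my $nc_adfd = ("adfd" =~ /(?:a|b|c)(d|e)f\1/) ? 1 : 0;   # 1
my $nc_adfa = ("adfa" =~ /(?:a|b|c)(d|e)f\1/) ? 1 : 0;   # 0

# Storing group: \1 is now the a, b, or c, so "adfa" matches instead.
my $cap_adfa = ("adfa" =~ /(a|b|c)(d|e)f\1/) ? 1 : 0;    # 1

# After the non-storing version, $1 holds the d or e.
"adfd" =~ /(?:a|b|c)(d|e)f\1/;
my $first_group = $1;                                    # "d"

print "$nc_adfd $nc_adfa $cap_adfa $first_group\n";      # prints "1 0 1 d"
```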
2. Embedded pattern options
Pattern options are normally placed after the pattern. Four of them, i, m, s, and x, can also be embedded in the pattern itself. The syntax is /(?option)pattern/, which is equivalent to /pattern/option.
3. Positive and negative lookahead
The syntax of positive lookahead is /pattern(?=string)/, which matches pattern only when it is followed by string; string itself is not part of the match. Conversely, /pattern(?!string)/ matches pattern only when it is not followed by string. For example:
$string = "25abc8";
$string =~ /abc(?=[0-9])/;
$matched = $&;   # $& is the matched text, which is abc, not abc8
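Positive and negative lookahead select complementary sets of strings, as this sketch with made-up sample strings shows.

```perl
use strict;
use warnings;

my @items = ("abc8", "abcx", "abc");

# Positive lookahead: abc must be followed by a digit.
my @followed = grep { /abc(?=[0-9])/ } @items;

# Negative lookahead: abc must not be followed by a digit.
my @not_followed = grep { /abc(?![0-9])/ } @items;

# The lookahead text is checked but not included in the match.
"25abc8" =~ /abc(?=[0-9])/;
my $matched = $&;

print "@followed | @not_followed | $matched\n";
# prints "abc8 | abcx abc | abc"
```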
4. Pattern comments
In Perl 5, comments can be added to a pattern with (?#comment), for example:
if ($string =~ /(?i)[a-z]{2,3}(?# match two or three alphabetic characters)/) {
...
}