Pattern Matching in Perl: Learning Notes

Last Update: 2017-01-18   Source: Internet

Author: User

I. Introduction
A pattern is a particular sequence of characters to be searched for in a string, enclosed in slashes: /def/ is the pattern def. One use is with the split function, which splits a string into words wherever the pattern matches: @array = split(/ /, $line);
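A minimal sketch of the split call above, assuming a line whose words are separated by single spaces; the sample text is made up for illustration.

```perl
use strict;
use warnings;

# Split a line into words wherever a single space occurs.
my $line  = "pattern matching in Perl";
my @array = split(/ /, $line);

print join("|", @array), "\n";   # prints: pattern|matching|in|Perl
```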

II. Matching operators =~ and !~
=~ tests whether a match succeeds: $result = $var =~ /abc/; If the pattern is found in the string, the match returns a true value (1); if not, it returns a false value (the empty string, which counts as 0 in numeric context). !~ does the opposite. Both operators are well suited to conditional control, for example:

if ($question =~ /please/) {
    print("Thank you for being polite!\n");
}
else {
    print("That is not very polite!\n");
}
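The sketch below shows the actual values the two operators return, which is what makes them usable in conditions; `$var` and the sample string are illustrative.

```perl
use strict;
use warnings;

my $var = "xxabcxx";

# =~ returns true (1) when the pattern is found, false ("") otherwise.
my $found   = $var =~ /abc/;    # 1
my $missing = $var =~ /xyz/;    # ""

# !~ inverts the result of the match.
my $not_found = $var !~ /xyz/;  # 1

print "found=$found missing='$missing' not_found=$not_found\n";
```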

III. Special characters in patterns
Perl supports special characters in patterns, each of which plays a special role.
1. The + character
+ means one or more of the preceding character; for example, /de+f/ matches def, deef, deeeeef, and so on. It matches as many of the repeated character as possible: /ab+/ in the string abbc matches abb, not ab. When words in a line are separated by more than one space, you can split the line like this: @array = split(/ +/, $line);
Note: split starts a new field every time it encounters the split pattern, so if $line begins with a space, the first element of @array is an empty string. It can, however, tell whether there really are any words: if $line contains only spaces, @array is an empty array. Also note that the pattern above splits only on spaces, so a tab character is treated as part of a word; adjust the pattern if your data contains tabs.
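A short sketch, with made-up input lines, of the two split behaviours described in the note: an empty first field after a leading space, and an empty list for a line of nothing but spaces.

```perl
use strict;
use warnings;

# Split on runs of one or more spaces.
my @words   = split(/ +/, "  two   words");
my @nothing = split(/ +/, "     ");

# A leading separator produces an empty first field.
print scalar(@words), " fields, first is '", $words[0], "'\n";   # 3 fields, first is ''

# A line of only spaces yields an empty list (empty trailing fields are dropped).
print scalar(@nothing), " fields\n";                               # 0 fields
```

If leading whitespace and tabs should simply be ignored, Perl's special form split(' ', $line) splits on any run of whitespace and drops the leading empty field.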
2. The [] and [^] characters
[] matches any one of a set of characters; for example, /a[0123456789]c/ matches a, followed by a digit, followed by c. Combined with +, /d[eE]+f/ matches def, dEf, deef, dEef, dEEEeeeEef, and so on. A ^ right after the opening bracket means any character except those listed: /d[^deE]f/ matches d, followed by one character that is not d, e, or E, followed by f.
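A sketch that filters a few hypothetical strings through the two bracket patterns; the ^ and $ anchors are added here so that each whole string must fit the pattern.

```perl
use strict;
use warnings;

my @candidates = qw(def dEf dEEef dxf d1f);

# [eE]+ : one or more characters, each either 'e' or 'E'.
my @with_e = grep { /^d[eE]+f$/ } @candidates;    # def dEf dEEef

# [^deE] : exactly one character that is not d, e or E.
my @other  = grep { /^d[^deE]f$/ } @candidates;   # dxf d1f

print "@with_e\n@other\n";
```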
3. The * and ? characters
These are similar to +, except that * matches zero or more of the preceding character and ? matches zero or one. For example, /de*f/ matches df, def, deeeef, and so on, while /de?f/ matches only df or def.
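The sketch below compares +, * and ? on the same anchored inputs and also checks the greedy abbc case from the + subsection; the test strings are illustrative.

```perl
use strict;
use warnings;

my @strings = qw(df def deef deeeef);

# Anchored so that the whole string must match.
my @plus  = grep { /^de+f$/ } @strings;   # one or more e:  def deef deeeef
my @star  = grep { /^de*f$/ } @strings;   # zero or more e: all four
my @quest = grep { /^de?f$/ } @strings;   # zero or one e:  df def

# + is greedy: in "abbc", /ab+/ takes as many b's as it can.
my ($greedy) = "abbc" =~ /(ab+)/;         # "abb"

print "@plus | @star | @quest | $greedy\n";
```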
4. Escape characters
To include a character that normally has a special meaning in a pattern, put a backslash in front of it. For example, in /\*+/, \* denotes the literal character *, not the "one or more" meaning described above. A backslash itself is written /\\/. Perl 5 also provides \Q and \E: every special character between them is escaped.
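A sketch of literal matching with \Q...\E and the related built-in quotemeta, assuming a search string that happens to contain the metacharacters * and +.

```perl
use strict;
use warnings;

my $text    = "price: 3*4+1 dollars";
my $literal = "3*4+1";          # contains the special characters * and +

# Without escaping, * and + would act as quantifiers; \Q...\E escapes them.
my $found_quoted = $text =~ /\Q$literal\E/ ? 1 : 0;   # 1

# quotemeta() is the function behind \Q; it backslashes the metacharacters.
my $escaped = quotemeta($literal);                    # 3\*4\+1

# A single literal * written by hand:
my $has_star = "a*b" =~ /a\*b/ ? 1 : 0;               # 1

print "$found_quoted $escaped $has_star\n";
```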
5. Matching any letter or digit
The pattern /a[0123456789]c/ mentioned above, which matches a followed by any digit followed by c, can also be written /a[0-9]c/. Similarly, [a-z] matches any lowercase letter and [A-Z] any uppercase letter. Any uppercase or lowercase letter or digit is written /[0-9a-zA-Z]/.
6. Anchors
Anchor      Description
^ or \A     matches only at the beginning of the string
$ or \Z     matches only at the end of the string
\b          matches at a word boundary
\B          matches inside a word (anywhere that is not a word boundary)
Example 1: /^def/ matches only strings that begin with def, /def$/ matches only strings that end with def, and the combination /^def$/ matches only the string def (and also "def\n", because $ matches just before a final newline). \A and \Z differ from ^ and $ only when matching multiple lines (the m option).
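A sketch of the anchors on a few sample strings, including one with a trailing newline. It uses \z (lowercase), which is Perl's strict end-of-string anchor; \Z, like $, also matches just before a final newline.

```perl
use strict;
use warnings;

my @strings = ("def", "defghi", "abcdef", "def\n");

my @starts = grep { /^def/    } @strings;   # def, defghi, "def\n"
my @ends   = grep { /def$/    } @strings;   # def, abcdef, "def\n"
my @exact  = grep { /^def$/   } @strings;   # def and "def\n" ($ allows a final newline)
my @strict = grep { /\Adef\z/ } @strings;   # only "def"

print scalar(@starts), scalar(@ends), scalar(@exact), scalar(@strict), "\n";   # 3321
```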
Example 2: determining what kind of variable name a string holds:

if ($varname =~ /^\$[a-zA-Z][_0-9a-zA-Z]*$/) {
    print("$varname is a legal scalar variable\n");
} elsif ($varname =~ /^\@[a-zA-Z][_0-9a-zA-Z]*$/) {
    print("$varname is a legal array variable\n");
} elsif ($varname =~ /^[a-zA-Z][_0-9a-zA-Z]*$/) {
    print("$varname is a legal filehandle name\n");
} else {
    print("I don't understand what $varname is.\n");
}

Example 3: \b matches at a word boundary. /\bdef/ matches words that begin with def, such as def and defghi, but not abcdef. /def\b/ matches words that end with def, such as def and abcdef, but not defghi. /\bdef\b/ matches only the word def. Note: /\bdef/ can match $defghi, because $ is not considered part of a word.
Example 4: \B matches inside a word. /\Bdef/ matches abcdef but not def; /def\B/ matches defghi; /\Bdef\B/ matches cdefg and abcdefghi, but not def, defghi, or abcdef.
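The word lists from Examples 3 and 4 are run through each boundary pattern in this sketch, together with the $defghi case.

```perl
use strict;
use warnings;

my @words = qw(def defghi abcdef cdefg);

my @begins = grep { /\bdef/   } @words;   # def defghi
my @ends   = grep { /def\b/   } @words;   # def abcdef
my @whole  = grep { /\bdef\b/ } @words;   # def
my @inside = grep { /\Bdef\B/ } @words;   # cdefg

# $ is not a word character, so there is a boundary between $ and d.
my $dollar = '$defghi' =~ /\bdef/ ? 1 : 0;   # 1

print "@begins | @ends | @whole | @inside | $dollar\n";
```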
7. Variable interpolation in patterns
To split a sentence into words:
$pattern = "[\\t ]+";
@words = split(/$pattern/, $line);
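A minimal sketch of the interpolation above on an illustrative line containing both a tab and spaces. The qr// operator, shown last, is Perl's standard way to precompile such a pattern into a variable.

```perl
use strict;
use warnings;

# "\\t" in a double-quoted string yields the two characters \ and t,
# which the regex engine then reads as a tab.
my $pattern = "[\\t ]+";
my $line    = "one\ttwo  three";

my @words = split(/$pattern/, $line);   # one two three

# The same pattern stored as a compiled regex object.
my $re    = qr/[\t ]+/;
my @again = split($re, $line);          # one two three

print join(",", @words), "\n";
```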
8. Character class escapes
Escape   Description               Range
\d       any digit                 [0-9]
\D       any non-digit             [^0-9]
\w       any word character        [_0-9a-zA-Z]
\W       any non-word character    [^_0-9a-zA-Z]
\s       whitespace                [ \r\t\n\f]
\S       non-whitespace            [^ \r\t\n\f]
Example: /[\da-z]/ matches any digit or lowercase letter.
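A sketch applying the class escapes with //g in list context, which returns every match at once, to one made-up string.

```perl
use strict;
use warnings;

my $text = "id_42: x7 \tend";

my @digits = $text =~ /\d/g;        # 4 2 7
my @words  = $text =~ /\w+/g;       # id_42 x7 end (underscore counts as a word character)
my @spaces = $text =~ /\s/g;        # two spaces and a tab
my @mixed  = $text =~ /[\da-z]/g;   # digits and lowercase letters only

print "@digits | @words | ", scalar(@spaces), " | ", join("", @mixed), "\n";
```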
9. Matching any character
The character . matches any character except a newline; it is often combined with *.
10. Matching a specified number of occurrences
A pair of braces {} specifies how many times the preceding character must occur. For example, /de{1,3}f/ matches def, deef, and deeef; /de{3}f/ matches deeef; /de{3,}f/ matches at least 3 e's between d and f; /de{0,3}f/ matches no more than 3 e's between d and f.
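A sketch that checks each brace form against strings with zero to five e's; anchors are added so the counts apply to whole strings.

```perl
use strict;
use warnings;

my @strings = qw(df def deef deeef deeeef deeeeef);

my @one_to_three = grep { /^de{1,3}f$/ } @strings;   # def deef deeef
my @exactly_3    = grep { /^de{3}f$/   } @strings;   # deeef
my @at_least_3   = grep { /^de{3,}f$/  } @strings;   # deeef deeeef deeeeef
my @at_most_3    = grep { /^de{0,3}f$/ } @strings;   # df def deef deeef

print scalar(@one_to_three), scalar(@exactly_3), scalar(@at_least_3), scalar(@at_most_3), "\n";   # 3134
```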
11. Alternatives
The | character separates two or more alternatives, any one of which may match. For example, /def|ghi/ matches def or ghi.
Example: checking that a string is a legal integer
if ($number =~ /^-?\d+$|^-?0[xX][\da-fA-F]+$/) {
    print("$number is a legal integer.\n");
} else {
    print("$number is not a legal integer.\n");
}
Here ^-?\d+$ matches decimal integers and ^-?0[xX][\da-fA-F]+$ matches hexadecimal integers.
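The same check wrapped in a small helper and run on a few sample inputs. The name is_legal_integer is hypothetical, not part of Perl.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the pattern from the example above.
sub is_legal_integer {
    my ($number) = @_;
    return $number =~ /^-?\d+$|^-?0[xX][\da-fA-F]+$/ ? 1 : 0;
}

for my $n ("42", "-17", "0x1F", "-0Xff", "4.2", "0xG1") {
    printf "%-6s %s\n", $n, is_legal_integer($n) ? "legal" : "not legal";
}
```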
12. Reusing part of a pattern
When the same part of a pattern occurs more than once, enclose its first occurrence in parentheses and refer back to it with \1, \2, ... (\n in general), which matches the same text that the nth group matched and simplifies the expression. /\d{2}([\W])\d{2}\1\d{2}/ matches:
12-05-92
26.11.87
07 04 92 and so on
Note: /\d{2}([\W])\d{2}\1\d{2}/ differs from /(\d{2})([\W])\1\2\1/; the latter matches only strings of the form 17-17-17, not 17-05-91.
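A sketch comparing the two back-reference patterns on the sample dates plus two extra inputs; anchors are added and the patterns are stored with qr//.

```perl
use strict;
use warnings;

# \1 must repeat whatever separator ([\W]) the group captured.
my $date_re = qr/^\d{2}([\W])\d{2}\1\d{2}$/;
# Here \1 repeats the two digits and \2 the separator.
my $same_re = qr/^(\d{2})([\W])\1\2\1$/;

my @inputs = ("12-05-92", "26.11.87", "07 04 92", "12-05.92", "17-17-17");

my @dates = grep { $_ =~ $date_re } @inputs;   # all but "12-05.92" (mixed separators)
my @same  = grep { $_ =~ $same_re } @inputs;   # only "17-17-17"

print join(" / ", @dates), "\n@same\n";
```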
13. Precedence of special characters
Like operators, the special characters in a pattern have an order of precedence (highest first):
Special character   Description
()                  pattern memory (grouping)
+ * ? {}            number of occurrences
^ $ \b \B           anchors
|                   alternation
14. Specifying the pattern delimiter
By default the pattern delimiter is the slash /, but you can choose your own by writing the letter m in front of the pattern, for example:
m!/u/jqpublic/perl/prog1! is equivalent to /\/u\/jqpublic\/perl\/prog1/
Note: when the single quote ' is used as the delimiter, no variable interpolation is done. When another special character is used as the delimiter, that character can no longer be used with its escape or special meaning inside the pattern.
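A sketch of alternative delimiters, assuming an illustrative $user variable, showing that braces still interpolate while single quotes do not.

```perl
use strict;
use warnings;

my $path = "/u/jqpublic/perl/prog1";

# The same test with two delimiters; inside m!...! the slashes need no backslashes.
my $bang  = $path =~ m!/u/jqpublic/perl/prog1!    ? 1 : 0;
my $slash = $path =~ /\/u\/jqpublic\/perl\/prog1/ ? 1 : 0;

my $user = "jqpublic";
# Braces as delimiters: $user is interpolated as usual.
my $interpolated = $path =~ m{^/u/$user/} ? 1 : 0;

# Single quotes as delimiters: nothing is interpolated,
# so \$user matches the literal text $user.
my $literal   = '/u/$user/x' =~ m'^/u/\$user/' ? 1 : 0;
my $no_interp = $path        =~ m'^/u/\$user/' ? 1 : 0;

print "$bang $slash $interpolated $literal $no_interp\n";   # 1 1 1 1 0
```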
15. Pattern match variables
After a successful match, the text matched by each parenthesized part of the pattern is available in the variable $n ($1, $2, ...), and the whole matched text is in $&.

$string = "This string contains the number 25.11.";
$string =~ /-?(\d+)\.?(\d+)/;   # the match is 25.11
$integerpart = $1;               # now $integerpart = 25
$decimalpart = $2;               # now $decimalpart = 11
$totalpart = $&;                 # now $totalpart = 25.11

IV. Pattern-matching options
Option   Description
g        match all possible occurrences
i        ignore case
m        treat the string as multiple lines
o        interpolate variables into the pattern only once
s        treat the string as a single line
x        ignore whitespace in the pattern

1. Matching all occurrences (g option)

@matches = "balata" =~ /.a/g;   # now @matches = ("ba", "la", "ta")
# Matching in a loop:
$word = "balata";
while ($word =~ /.a/g) {
    $match = $&;
    print("$match\n");
}

The results are:

ba
la
ta

When the g option is used, the pos function can read or set the offset at which the next match begins:

$offset = pos($string);
pos($string) = $newoffset;
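A sketch of pos() with the balata example: it records the offset after each //g match, then sets pos() by hand so the next search starts at offset 3.

```perl
use strict;
use warnings;

my $string = "balata";
my @found;

# In scalar context, each //g match resumes where the previous one ended;
# pos() reports the offset just after the last match.
while ($string =~ /.a/g) {
    push @found, $& . ":" . pos($string);
}
print "@found\n";   # ba:2 la:4 ta:6

# Assigning to pos() moves the starting point of the next //g match.
pos($string) = 3;
my $next = $string =~ /.a/g ? $& : "";
print "$next\n";    # ta
```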

2. Ignoring case (i option)
/de/i matches de, dE, De, and DE.
3. Treating the string as multiple lines (m option)
In this mode, ^ matches at the start of the string or at the start of any line, and $ matches at the end of any line.
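A sketch, on a made-up three-line string, of how /m changes ^ and $ when they are used with //g to pull out the first and last word of each line.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\nthird";

# Without /m, ^ matches only at the very start of the string.
my @plain = $text =~ /^(\w+)/g;    # first
# With /m, ^ also matches right after every newline.
my @multi = $text =~ /^(\w+)/mg;   # first second third
# Likewise, with /m, $ matches before each newline.
my @ends  = $text =~ /(\w+)$/mg;   # line line third

print "@plain | @multi | @ends\n";
```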
4. Interpolating a variable only once (o option), for example:

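$var = 1;
$line = <STDIN>;
while ($var < 10) {
    $result = $line =~ /$var/o;
    $line = <STDIN>;
    $var++;
}

Because of the o option, $var is interpolated only the first time, so the pattern is /1/ on every pass through the loop.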
5. Treating the string as a single line (s option)
/a.*bc/s matches the string "axxxxx\nxxxxbc", but /a.*bc/ does not, because without s the . does not match the newline.
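A minimal check of the s option on the string from the example above.

```perl
use strict;
use warnings;

my $string = "axxxxx\nxxxxbc";

# Without /s, . does not match the newline, so .* cannot cross it.
my $without_s = $string =~ /a.*bc/  ? 1 : 0;   # 0
# With /s, . matches any character, including \n.
my $with_s    = $string =~ /a.*bc/s ? 1 : 0;   # 1

print "$without_s $with_s\n";
```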
6. Ignoring whitespace in the pattern (x option)
/\d{2} ([\W]) \d{2} \1 \d{2}/x is equivalent to /\d{2}([\W])\d{2}\1\d{2}/.
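With /x the same date pattern can be spread over several lines with # comments; the sketch confirms it matches the same sample inputs as the compact form. Comments inside a /x pattern must not contain the closing delimiter.

```perl
use strict;
use warnings;

# Whitespace and # comments inside the pattern are ignored under /x.
my $spaced_re = qr/
    \d{2}     # two digits
    ([\W])    # a separator, remembered as group 1
    \d{2}     # two more digits
    \1        # the same separator again
    \d{2}     # two final digits
/x;
my $compact_re = qr/\d{2}([\W])\d{2}\1\d{2}/;

my @inputs  = ("12-05-92", "26.11.87", "07 04 92", "12-05.92");
my @spaced  = grep { $_ =~ $spaced_re  } @inputs;
my @compact = grep { $_ =~ $compact_re } @inputs;

print join(" / ", @spaced), "\n";
```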

V. The substitution operator
The syntax is s/pattern/replacement/; it replaces the part of the string that matches pattern with replacement. For example:

$string = "abc123def";
$string =~ s/123/456/;   # now $string = "abc456def"

You can use the pattern match variables $n in the replacement part, for example s/(\d+)/[$1]/. However, pattern special characters such as {}, * and + have no special meaning in the replacement: s/abc/[def]/ replaces abc with [def].
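A sketch of $1 in the replacement, with and without the g option, plus the literal [def] case; the input strings are illustrative. Each result is built on a copy using the (my $copy = $original) =~ s/// idiom.

```perl
use strict;
use warnings;

my $string = "abc123def45";

# $1 in the replacement refers to the first parenthesized group.
(my $bracketed = $string) =~ s/(\d+)/[$1]/;    # abc[123]def45

# With /g every match is replaced.
(my $all = $string) =~ s/(\d+)/[$1]/g;         # abc[123]def[45]

# Characters such as [ ] are literal in the replacement part.
(my $literal = "xabcx") =~ s/abc/[def]/;       # x[def]x

print "$bracketed $all $literal\n";
```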
The options for the substitution operator are:
Option   Description
g        replace all matches of the pattern
i        ignore case in the pattern
e        treat the replacement string as an expression
m        treat the string being matched as multiple lines
o        interpolate variables only once
s        treat the string being matched as a single line
x        ignore whitespace in the pattern
Note: the e option treats the replacement part as an expression, which is evaluated before the replacement is made, for example:

$string = "0abc1";
$string =~ s/[a-zA-Z]+/$& x 2/e;   # now $string = "0abcabc1"
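A sketch repeating the /e example and adding a numeric one that doubles every number; the second input string is made up.

```perl
use strict;
use warnings;

my $string = "0abc1";
# /e: the replacement "$& x 2" runs as Perl code; x repeats the matched text.
$string =~ s/[a-zA-Z]+/$& x 2/e;     # 0abcabc1

# /e combined with /g: double every number in the string.
my $prices = "3 apples, 12 pears";
$prices =~ s/(\d+)/$1 * 2/eg;        # 6 apples, 24 pears

print "$string\n$prices\n";
```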

VI. The translation operator
This is another way of replacing text, with the syntax tr/string1/string2/. As before, string2 is the replacement, but the replacement is done character by character: the first character of string1 is replaced by the first character of string2, the second by the second, and so on. For example:
$string = "abcdefghicba";
$string =~ tr/abc/def/;   # now $string = "defdefghifed"
When string1 is longer than string2, its extra characters are replaced by the last character of string2. When the same character appears more than once in string1, the first replacement character given for it is used.
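A sketch of the three tr rules described above; it also uses tr's return value, the number of characters matched, to count characters without changing the string.

```perl
use strict;
use warnings;

my $string = "abcdefghicba";
$string =~ tr/abc/def/;                  # defdefghifed

# string1 longer than string2: the last character of string2 is reused.
(my $short = "abcde") =~ tr/abcde/xy/;   # xyyyy

# A character repeated in string1: only its first mapping counts.
(my $dup = "aaa") =~ tr/aaa/xyz/;        # xxx

# With an empty replacement list and no options, tr only counts.
my $count = ($string =~ tr/d//);         # 3

print "$string $short $dup $count\n";
```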
The options for the translation operator are:
Option   Description
c        translate all characters that are NOT in string1
d        delete all characters in string1 that have no replacement
s        squeeze runs of the same output character into one
For example, $string =~ tr/0-9/ /c; replaces every non-digit character with a space; $string =~ tr/\t //d; deletes tabs and spaces; and $string =~ tr/0-9/ /cs; replaces each run of non-digit characters between digits with a single space.
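The three option examples above, run on illustrative inputs. tr/// does not understand regex classes such as \d, which is why the digit range is written 0-9.

```perl
use strict;
use warnings;

# c: complement the search list (every character NOT listed becomes a space).
(my $digits_only = "ab12c3") =~ tr/0-9/ /c;       # "  12 3"

# d: delete listed characters that have no replacement (tab and space).
(my $no_space = "a b\tc") =~ tr/\t //d;           # "abc"

# c + s: turn each run of non-digits into a single space.
(my $squeezed = "12abc34--5") =~ tr/0-9/ /cs;     # "12 34 5"

print "[$digits_only] [$no_space] [$squeezed]\n";
```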

VII. Extended pattern matching
Perl 5 supports some pattern-matching features that Perl 4 and standard UNIX pattern matching lack. Their syntax is (?c pattern), where c is a single character that selects the feature and pattern is the pattern or subpattern it acts on.
1. Grouping without remembering the match
Normally a subpattern in parentheses is stored in memory; (?:pattern) turns this storage off. For example, in /(?:a|b|c)(d|e)f\1/, \1 refers to the matched d or e, not to a, b, or c.
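A sketch comparing the non-capturing group with a capturing one on three hypothetical strings, to show which text \1 refers to.

```perl
use strict;
use warnings;

my @tests = ("adfd", "befe", "adfa");

# (?:a|b|c) does not capture, so (d|e) is group 1 and \1 repeats it.
my @matched   = grep { /(?:a|b|c)(d|e)f\1/ } @tests;   # adfd befe

# With a capturing first group, \1 repeats a, b or c instead.
my @capturing = grep { /(a|b|c)(d|e)f\1/ } @tests;     # adfa

print "@matched | @capturing\n";
```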
2. Inline pattern options
Options are normally placed after the pattern, but four of them, i, m, s, and x, can also be written inline with the syntax /(?option)pattern/, which is equivalent to /pattern/option.
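A sketch of the inline (?i) form next to the usual trailing /i. The (?i:...) form, which limits the option to one group, is a standard Perl 5 extension not described above.

```perl
use strict;
use warnings;

my $string = "Hello PERL";

my $inline = $string =~ /(?i)perl/ ? 1 : 0;   # 1
my $suffix = $string =~ /perl/i    ? 1 : 0;   # 1
my $plain  = $string =~ /perl/     ? 1 : 0;   # 0

# (?i:...) applies the option only inside the group.
my $scoped      = $string =~ /(?i:hello) PERL/ ? 1 : 0;   # 1
my $scoped_miss = $string =~ /(?i:hello) perl/ ? 1 : 0;   # 0: case matters outside the group

print "$inline $suffix $plain $scoped $scoped_miss\n";
```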
3. Positive and negative lookahead
The positive lookahead syntax is /pattern(?=string)/, which matches pattern only when it is followed by string. Conversely, (?!string) matches pattern only when it is not followed by string. For example:

$string = "25abc8";
$string =~ /abc(?=[0-9])/;
$matched = $&;   # $& is the matched text, here abc, not abc8
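A sketch of both lookahead forms on the string from the example above and one extra string.

```perl
use strict;
use warnings;

my $string = "25abc8";

# Positive lookahead: abc only if a digit follows; the digit is not consumed.
my $ahead = $string =~ /abc(?=[0-9])/ ? $& : "";   # "abc"

# Negative lookahead: abc only if it is NOT followed by a digit.
my $neg  = $string =~ /abc(?![0-9])/ ? 1 : 0;      # 0: here abc is followed by 8
my $neg2 = "abcx"  =~ /abc(?![0-9])/ ? 1 : 0;      # 1

print "$ahead $neg $neg2\n";
```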

4. Pattern comments
In Perl 5 you can add a comment inside a pattern with (?#...), for example:

if ($string =~ /(?i)[a-z]{2,3}(?# match two or three alphabetic characters)/) {
    ...
}

To sum up:
1. Syntax used in text-processing patterns (/pattern/)
/pattern/     Result
.             matches any single character except the newline \n
x?            matches 0 or 1 x character
x*            matches 0 or more x characters
.*            matches 0 or more characters
x+            matches 1 or more x characters
.+            matches 1 or more characters
{m}           matches exactly m of the preceding character
{m,n}         matches at least m and at most n of the preceding character
{m,}          matches m or more of the preceding character
[]            matches any one of the characters inside []
[^]           matches any one character not inside []
[0-9]         matches any one character from 0 to 9
[a-z]         matches any one character from a to z
[^0-9]        matches any one character that is not 0 to 9
[^a-z]        matches any one character that is not a to z
^             matches at the beginning of the string
$             matches at the end of the string
\d            matches one digit character, same as [0-9]
\d+           matches one or more digit characters, same as [0-9]+
\D            matches one non-digit character, same as [^0-9]
\D+           matches one or more non-digit characters, same as [^0-9]+
\w            matches one letter, digit, or underscore, same as [a-zA-Z0-9_]
\w+           matches one or more letters, digits, or underscores, same as [a-zA-Z0-9_]+
\W            matches one character that is not a letter, digit, or underscore, same as [^a-zA-Z0-9_]
\W+           matches one or more such characters, same as [^a-zA-Z0-9_]+
\s            matches one whitespace character, same as [ \n\t\r\f]
\s+           matches one or more whitespace characters, same as [ \n\t\r\f]+
\S            matches one non-whitespace character, same as [^ \n\t\r\f]
\S+           matches one or more non-whitespace characters, same as [^ \n\t\r\f]+
\b            matches at a word boundary, where a word character meets a non-word character
\B            matches where there is no word boundary, for example inside a word
a|b|c         matches a, b, or c
abc           matches the string abc
(pattern)     remembers the text matched inside the parentheses; this is very useful syntax.
              The text matched by the first () goes into $1 (or \1 inside the pattern),
              the text matched by the second () into $2 (or \2), and so on;
              its use is explained in detail in the next section.
/pattern/i    the i option ignores case, so the search does not distinguish upper and lower case
\             placed before a character that has a special meaning in a pattern, removes that special meaning

2. Simple examples of text-processing patterns (regular expressions)
After reading the previous section on text-processing patterns, beginners may still be unsure how to apply this syntax, so this section gives some examples:
Example       Description
/perl/        matches a string containing perl
/^perl/       matches a string that starts with perl
/perl$/       matches a string that ends with perl
/c|g|i/       matches a string containing c, g, or i
/cg{2,4}i/    matches c followed by 2 to 4 g's, followed by i
/cg{2,}i/     matches c followed by 2 or more g's, followed by i
/cg{2}i/      matches c followed by exactly 2 g's, followed by i
/cg*i/        matches c followed by 0 or more g's, followed by i, same as /cg{0,}i/
/cg+i/        matches c followed by 1 or more g's, followed by i, same as /cg{1,}i/
/cg?i/        matches c followed by 0 or 1 g, followed by i, same as /cg{0,1}i/
/c.i/         matches c followed by any one character, followed by i
/c..i/        matches c followed by any two characters, followed by i
/[cgi]/       matches a string containing any one of these three characters
/[^cgi]/      matches a string containing at least one character other than these three
/\d/          matches a string containing a digit;
              /\d+/ matches a run of one or more digits
/\D/          matches a string containing a non-digit character;
              /\D+/ matches a run of one or more non-digit characters
/\w/          matches a string containing a letter, digit, or underscore;
              /\w+/ matches a run of one or more of them
/\W/          matches a string containing a character that is not a letter, digit, or underscore;
              /\W+/ matches a run of one or more such characters
/\s/          matches a string containing a whitespace character;
              /\s+/ matches a run of one or more whitespace characters
/\S/          matches a string containing a non-whitespace character;
              /\S+/ matches a run of one or more non-whitespace characters
/\*/          matches a string containing the * symbol; because * has a special meaning in patterns, \ is placed before it to remove that meaning
/abc/i        matches abc regardless of case
3. Operators and functions related to text-processing patterns (regular expressions)
In Perl programs, the =~ and !~ operators and the s and tr functions are used together with the /pattern/ text-processing pattern. Once you can use them, handling strings becomes very easy, and they are especially handy in CGI programming. The author now introduces how to use these operators and functions:

This article is an English version of an article originally written in Chinese on aliyun.com.
