9 Regular Expressions in Perl
Three forms of regular expressions
Common patterns in regular expressions
The eight principles of regular expressions
Regular expressions are a major feature of Perl and also one of its harder parts. Once you master them, string processing becomes easy, which is especially handy in CGI programming. The basic syntax rules for writing regular expressions are listed below.
--------------------------------------------------------------------------------
9.1 Three forms of regular expressions
First, Perl programs use regular expressions in three forms:
Match: m/<regexp>/ (can be shortened to /<regexp>/, omitting the m)
Substitute: s/<pattern>/<replacement>/
Transliterate: tr/<searchlist>/<replacementlist>/ (this form works on lists of characters rather than on a pattern)
These three forms are normally used with the =~ or !~ operator, with the scalar variable to be processed on the left. "=~" means "matches" and reads as "does" in the statement; "!~" means "does not match" and reads as "doesn't". If the variable and the =~ or !~ operator are omitted, the operation applies to the contents of the $_ variable. For example:
$str = "I love Perl";
$str =~ m/Perl/; # true (1) if the string "Perl" is found in $str, otherwise false (an empty string)
$str =~ s/Perl/BASH/; # replaces "Perl" in $str with "BASH"; returns 1 if the substitution happened, otherwise false (an empty string)
$str =~ tr/A-Z/a-z/; # converts all uppercase letters in $str to lowercase; returns the number of characters converted (0 if none)
In addition:
foreach (@array) { s/a/b/; } # each iteration aliases one element of @array to $_, and the substitution is applied to $_
while (<FILE>) { print if (m/error/); } # slightly more involved: prints every line of the file that contains the string "error"
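The return values described above can be checked with a short script. This is only a sketch: the variable names and sample strings are made up for illustration, and it uses "use strict" and "my", which the snippets in this article omit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = "I love Perl";

# A successful match returns true; !~ negates the result.
my $found  = ($str =~ m/Perl/)   ? "yes" : "no";   # "Perl" is present
my $absent = ($str !~ m/Python/) ? "yes" : "no";   # "Python" is absent

# s/// returns the number of substitutions it made.
my $replaced = ($str =~ s/Perl/BASH/);    # $str becomes "I love BASH"

# tr/// returns the number of characters it transliterated.
my $converted = ($str =~ tr/A-Z/a-z/);    # $str becomes "i love bash"

# Without =~, the operators work on $_; "for" aliases $_ to each element.
my @words = ("banana", "cabana");
s/a/b/ for @words;                        # first "a" of each element only

print "found=$found absent=$absent\n";
print "replaced=$replaced converted=$converted str=$str\n";
print "words=@words\n";
```

Note that tr/// reports how many characters it touched, while a substitution without the g modifier can report at most 1.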
If parentheses () appear in a regular expression, Perl automatically stores the text matched by each parenthesized group in the special variables $1, $2, and so on. Take a look at the following examples:
$string = "I Love Perl";
$string =~ s/(Love)/<$1>/; # here $1 = "Love", and the substitution changes $string to "I <Love> Perl"
$string = "I love Perl";
$string =~ s/(I) (.*) (Perl)/<$3> $2 <$1>/; # here $1 = "I", $2 = "love", $3 = "Perl", and after the substitution $string becomes "<Perl> love <I>"
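As a rough illustration of how the captured groups can be read back, the sketch below checks that the match succeeded before using $1, $2, and $3, and also shows that a match in list context returns the captures directly. The variable names are invented for this example.

```perl
use strict;
use warnings;

my $string = "I love Perl";

# $1, $2, $3 are only meaningful after a successful match, so test first.
my $report = "no match";
if ($string =~ /(\w+) (\w+) (\w+)/) {
    $report = "1=$1 2=$2 3=$3";
}

# In list context a match returns the captured groups directly.
my ($subject, $verb, $object) = $string =~ /(\w+) (\w+) (\w+)/;

# Copy first, then substitute, so the original string is left untouched.
(my $swapped = $string) =~ s/(I) (.*) (Perl)/<$3> $2 <$1>/;

print "$report\n";
print "$subject | $verb | $object\n";
print "$swapped\n";
```

Copying into a new variable before substituting is a common idiom when the original value is still needed afterwards.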
The substitution s/<pattern>/<replacement>/ can also take the e or g modifier at the end. Their meanings are:
s/<pattern>/<replacement>/g replaces every occurrence of <pattern> in the string with <replacement>, instead of only the first occurrence.
s/<pattern>/<replacement>/e evaluates the <replacement> part as a Perl expression and substitutes its result; this modifier is used less often.
For example:
$string = "I:love:perl";
$string =~ s/:/*/; # now $string = "I*love:perl"
$string = "I:love:perl";
$string =~ s/:/*/g; # now $string = "I*love*perl"
$string =~ tr/*/ /; # now $string = "I love perl"
$string = "www22cgi44";
$string =~ s/(\d+)/$1*2/ge; # (\d+) matches one or more digits in $string; each number found is multiplied by 2, so $string becomes "www44cgi88". Without the g modifier only the first number would be doubled.
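To see the effect of each modifier side by side, the following sketch applies the same substitution with and without g and e to copies of a string. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $path = "I:love:perl";

(my $first_only = $path) =~ s/:/*/;     # only the first ":" is replaced
(my $every      = $path) =~ s/:/*/g;    # /g replaces every ":"

# tr/// needs a replacement list to change characters: "*" becomes " ".
(my $spaced = $every) =~ tr/*/ /;

my $numbers = "www22cgi44";
(my $literal = $numbers) =~ s/(\d+)/$1*2/g;    # no /e: "$1*2" is plain text
(my $doubled = $numbers) =~ s/(\d+)/$1*2/ge;   # /e evaluates $1*2 as code

print "$first_only\n$every\n$spaced\n$literal\n$doubled\n";
```

The difference between the last two lines is the point of the e modifier: without it the replacement is treated as an interpolated string, so the "*2" is inserted literally.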
A complete example is given below:
#!/usr/bin/perl
print "Please enter a string!\n";
$string = <STDIN>; # <STDIN> is standard input; this reads one line typed by the user
chomp($string); # remove the trailing newline from $string
if ($string =~ /perl/) {
print("The input string contains perl!\n");
}
If the input contains the string "perl", the message above is displayed.
9.2 Common patterns in regular expressions
The following are some common patterns in regular expressions.
/pattern/     Result
.             Matches any single character except a newline
x?            Matches zero or one x
x*            Matches zero or more x, as many as possible
x+            Matches one or more x, as many as possible
.*            Matches any character zero or more times
.+            Matches any character one or more times
{m}           Matches exactly m occurrences of the preceding item
{m,n}         Matches at least m and at most n occurrences of the preceding item
{m,}          Matches m or more occurrences of the preceding item
[]            Matches any one of the characters inside []
[^]           Matches any one character not inside []
[0-9]         Matches any digit
[a-z]         Matches any lowercase letter
[^0-9]        Matches any non-digit character
[^a-z]        Matches any character that is not a lowercase letter
^             Matches at the beginning of the string
$             Matches at the end of the string
\d            Matches one digit, same as [0-9]
\d+           Matches one or more digits, same as [0-9]+
\D            Matches one non-digit, same as [^0-9]
\D+           Matches one or more non-digits, same as [^0-9]+
\w            Matches one word character (letter, digit, or underscore), same as [a-zA-Z0-9_]
\w+           Same as [a-zA-Z0-9_]+
\W            Matches one non-word character, same as [^a-zA-Z0-9_]
\W+           Same as [^a-zA-Z0-9_]+
\s            Matches one whitespace character, same as [ \n\t\r\f]
\s+           Same as [ \n\t\r\f]+
\S            Matches one non-whitespace character, same as [^ \n\t\r\f]
\S+           Same as [^ \n\t\r\f]+
\b            Matches at a word boundary, that is, between a word character and a non-word character
\B            Matches at any position that is not a word boundary
a|b|c         Matches a or b or c
abc           Matches the literal string abc
(pattern)     Parentheses remember the text they match, which is very useful. The text matched by the first () is available as $1 or \1, the text matched by the second () as $2 or \2, and so on.
/pattern/i    The i modifier makes the match case-insensitive.
\             To match a special character such as "*" literally, put a \ in front of it to cancel its special meaning.
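A few of the entries above can be tried out with a small script such as the one below. It is only a sketch on a made-up sample string, and it shows in particular that x+ takes as many characters as it can and that \w includes the underscore.

```perl
use strict;
use warnings;

my $text = "aaa 123 foo_bar";

my ($greedy) = $text =~ /(a+)/;         # quantifiers take as much as they can
my ($exact)  = $text =~ /(a{2})/;       # exactly two "a" characters
my ($digits) = $text =~ /(\d+)/;        # first run of digits
my ($last)   = $text =~ /(\w+)$/;       # \w includes the underscore
my @words    = $text =~ /\b(\w+)\b/g;   # every whole word
my @pieces   = split /\s+/, $text;      # split on runs of whitespace

# A backslash makes "*" an ordinary character.
my $star = ("2*3" =~ /2\*3/) ? 1 : 0;

print "greedy=$greedy exact=$exact digits=$digits last=$last\n";
print "words=@words\n";
print "pieces=@pieces\n";
print "star=$star\n";
```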
Here are a few examples:
Example       Description
/perl/        Finds a string containing perl
/^perl/       Finds a string that begins with perl
/perl$/       Finds a string that ends with perl
/c|g|i/       Finds a string containing c or g or i
/cg{2,4}i/    Finds c followed by 2 to 4 g, followed by i
/cg{2,}i/     Finds c followed by 2 or more g, followed by i
/cg{2}i/      Finds c followed by exactly 2 g, followed by i
/cg*i/        Finds c followed by 0 or more g, followed by i; same as /cg{0,}i/
/cg+i/        Finds c followed by 1 or more g, followed by i; same as /cg{1,}i/
/cg?i/        Finds c followed by 0 or 1 g, followed by i; same as /cg{0,1}i/
/c.i/         Finds c followed by any one character, followed by i
/c..i/        Finds c followed by any two characters, followed by i
/[cgi]/       Finds a string containing any one of these three characters
/[^cgi]/      Finds a string containing at least one character other than these three
/\d/          Finds a digit; use /\d+/ for a run of one or more digits
/\D/          Finds a non-digit character; use /\D+/ for a run of one or more non-digits
/\*/          Finds a literal * character; because * has a special meaning in regular expressions, it must be preceded by \ to cancel that meaning
/abc/i        Finds the string abc regardless of case
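One way to get a feel for the quantifier examples is to run each pattern over a handful of strings and list which ones match. The sample strings below (c followed by 0, 1, 2, and 5 g's, then i) are assumptions chosen for this sketch.

```perl
use strict;
use warnings;

my @samples = qw(ci cgi cggi cgggggi);

# Each entry pairs a label with a precompiled pattern.
my @patterns = (
    [ 'cg*i'     => qr/cg*i/     ],
    [ 'cg+i'     => qr/cg+i/     ],
    [ 'cg?i'     => qr/cg?i/     ],
    [ 'cg{2}i'   => qr/cg{2}i/   ],
    [ 'cg{2,}i'  => qr/cg{2,}i/  ],
    [ 'cg{2,4}i' => qr/cg{2,4}i/ ],
);

my %result;
for my $entry (@patterns) {
    my ($label, $re) = @$entry;
    # Keep only the sample strings that the pattern matches.
    $result{$label} = join ' ', grep { /$re/ } @samples;
    printf "%-12s matches: %s\n", "/$label/", $result{$label};
}

# The i modifier ignores case.
my $ignore_case = ("I love PERL" =~ /perl/i) ? 1 : 0;
print "case-insensitive match: $ignore_case\n";
```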
9.3 The eight principles of regular expressions
If you have used sed, awk, or grep on Unix, regular expressions in Perl will not be unfamiliar. They are what give Perl its strong string-handling ability, and they appear constantly in Perl programs, CGI programs included.
Regular expressions are one of the trickier parts of learning Perl, but once you have mastered their syntax you gain very powerful pattern-matching capabilities; a large part of Perl programming work is mastering regular expressions. The eight principles below apply whenever you use them.
Regular expressions are a powerful ally in the battle against data, and it often is a war. Keep the following eight principles in mind:
· Principle 1: Regular expressions come in three different forms: match (m//), substitute (s///, optionally with the e and g modifiers), and transliterate (tr///).
· Principle 2: Regular expressions match only scalars ($scalar =~ m/a/; works, but @array =~ m/a/ treats @array as a scalar and therefore probably does not do what you want).
· Principle 3: A regular expression finds the earliest possible match for the given pattern. By default it matches or substitutes only once ($a = 'string string2'; $a =~ s/string//; leaves $a = ' string2').
· Principle 4: A regular expression can handle any character that a double-quoted string can handle, and variables are interpolated in the same way ($a =~ m/$varb/ expands $varb before matching; if $varb = 'a' and $a = 'as', then $a =~ s/$varb//; is equivalent to $a =~ s/a//; and the result is $a = "s").
· Principle 5: Evaluating a regular expression produces two things: a result status and backreferences. $a =~ m/pattern/ tells you whether the substring pattern appears in $a, and $a =~ s/(word1)(word2)/$2$1/ swaps the two words.
· Principle 6: The core strength of regular expressions lies in wildcards and multi-match operators (quantifiers) and how they behave. $a =~ m/\w+/ matches one or more word characters; $a =~ m/\d*/ matches zero or more digits.
· Principle 7: To match more than one alternative, Perl uses "|" for added flexibility. For example, m/(cat|dog)/ means "match the string cat or the string dog".
· Principle 8: Perl provides extended regular expression features through the (?...) syntax. (Readers are encouraged to look up the relevant documentation on their own.)
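Several of the principles above can be observed directly. The script below is a minimal sketch using invented sample data; it uses grep to apply a pattern to each element of an array, which is one common way to respect Principle 2.

```perl
use strict;
use warnings;

# Principle 2: apply the pattern to each scalar element, not to the array.
my @animals = qw(cat dog bird catalog);
my @with_a  = grep { /a/ } @animals;

# Principle 3: the leftmost match wins, and only one replacement is made.
my $text = 'string string2';
$text =~ s/string//;

# Principle 4: variables are interpolated into the pattern before matching.
my $varb = 'a';
my $word = 'as';
$word =~ s/$varb//;

# Principle 5: a result status plus backreferences.
my $pair    = 'hello world';
my $swapped = ($pair =~ s/(\w+) (\w+)/$2 $1/) ? 1 : 0;

# Principle 6: \d* can match zero digits, so it succeeds even here.
my $zero_digits_ok = ('abc' =~ /\d*/) ? 1 : 0;

# Principle 7: alternation matches either word.
my @pets = grep { /(cat|dog)/ } @animals;

print "with_a=@with_a\n";
print "text='$text' word='$word'\n";
print "swapped=$swapped pair='$pair'\n";
print "zero_digits_ok=$zero_digits_ok\n";
print "pets=@pets\n";
```

Note that "catalog" passes the alternation test because the pattern is not anchored; /^(cat|dog)$/ would accept only the two whole words.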
Want to learn all of these principles? Start with the simple cases, and keep trying and experimenting. In fact, once you know that $a =~ m/ERROR/ looks for the substring ERROR in $a, you already have more processing power than lower-level languages such as C give you out of the box.
Note: This article is reproduced from another source. To be precise, \w matches a single word character, that is, a letter, a digit, or an underscore.