Thanks to aka and the author.
Regular Expressions in Perl
Three forms of regular expressions
Common patterns in regular expressions
The 8 major principles of regular expressions
Regular expressions are a major feature of Perl and one of the harder parts of the language to learn. Once you have a good grasp of them, though, you can easily use them for string processing, and they are especially handy in CGI programming. Below are some of the basic syntax rules for writing regular expressions.
9.1 The three forms of regular expressions
First, note that in Perl programs a regular expression can take three forms:
Match: m/<regexp>/ (can also be abbreviated as /<regexp>/, omitting the m)
Replace: s/<pattern>/<replacement>/
Transliteration: tr/<pattern>/<replacement>/
These three forms are generally used together with =~ or !~, with the scalar variable to be processed on the left. =~ means "matches" (read it as "does match") and !~ means "does not match" (read it as "doesn't match"). If you leave out both the variable and the =~ or !~ operator, the contents of the $_ variable are processed by default. Examples:
$str = "I love Perl";
$str =~ m/Perl/; # returns 1 (true) if the string "Perl" is found in $str, otherwise the empty string (false)
$str =~ s/Perl/bash/; # replaces the first "Perl" in $str with "bash"; returns the number of substitutions made (1 here), or the empty string if none was made
$str !~ tr/A-Z/a-z/; # converts all uppercase letters in $str to lowercase; because of !~, returns false if any letter was converted and true if none was
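A minimal runnable sketch of the statements above, assuming the same starting string; it records what each operator actually returns, which is easier to see than to describe. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $str = "I love Perl";

# m// in scalar context: 1 on success, "" (empty string) on failure
my $found     = ($str =~ m/Perl/) ? 1 : 0;
my $missing   = ($str =~ m/perl/) ? 1 : 0;   # case-sensitive, so no match
my $no_digits = ($str !~ m/\d/)   ? 1 : 0;   # !~ negates the match result

# s/// returns the number of substitutions made (here 1)
my $subs = ($str =~ s/Perl/bash/);

# tr/// returns how many characters it transliterated ("I" is the only capital)
my $upper = ($str =~ tr/A-Z/a-z/);

print "found=$found missing=$missing no_digits=$no_digits subs=$subs upper=$upper str=$str\n";
```

Note that $str is modified in place by both s/// and tr///, so it ends up as "i love bash".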
Some further examples:
foreach (@array) {s/a/b/} # each iteration aliases $_ to the next element of @array, so s/a/b/ replaces the first "a" with "b" in that element itself
while (<FILE>) {print if (m/error/);} # a little more involved: reads FILE line by line into $_ and prints every line that contains the string error
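A small self-contained version of the two $_ examples above. Since there is no real FILE here, the sketch assumes an in-memory filehandle (opened on a reference to a string) as a stand-in, and it collects the matching lines instead of only printing them so the result can be checked.

```perl
use strict;
use warnings;

# foreach aliases $_ to each element, so s/// edits the array in place
my @array = ("cat", "banana", "apple");
foreach (@array) { s/a/b/ }          # first "a" in each element becomes "b"

# An in-memory filehandle stands in for FILE (assumption for this demo)
my $text = "ok line\nan error here\nfine\nerror again\n";
open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
my @errors;
while (<$fh>) {                      # each line is read into $_
    push @errors, $_ if m/error/;    # m// with no =~ tests $_
}
close($fh);

print "array: @array\n";
print "error lines: ", scalar(@errors), "\n";
print @errors;
```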
If a Perl regular expression contains parentheses (), the Perl interpreter automatically stores the text matched by each parenthesized part of the match or substitution pattern in the special variables $1, $2, $3 and so on. Take a look at the following examples:
$string = "I Love Perl";
$string =~ s/(Love)/<$1>/; # now $1 = "Love", and the substitution changes $string to "I <Love> Perl"
$string = "I love Perl";
$string =~ s/(I) (.*) (Perl)/<$3> $2 <$1>/; # here $1 = "I", $2 = "love", $3 = "Perl", and $string becomes "<Perl> love <I>"
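A minimal sketch of capture groups, assuming the same sample string. It shows three things: a match in list context returns the captured groups ($1, $2, $3), those variables can be used in the replacement part of s///, and \1 refers back to the first group inside the pattern itself.

```perl
use strict;
use warnings;

my $string = "I love Perl";

# In list context a match returns the captured groups ($1, $2, $3)
my @groups = ($string =~ m/(I) (.*) (Perl)/);
print "\$1=$groups[0] \$2=$groups[1] \$3=$groups[2]\n";

# Copy first, then substitute on the copy; $1..$3 refer to this match
(my $swapped = $string) =~ s/(I) (.*) (Perl)/<$3> $2 <$1>/;
print "$swapped\n";

# Inside the pattern, \1 means "the same text the first group matched"
my $has_double = ("hello book" =~ m/(\w)\1/) ? 1 : 0;   # "ll" matches
print "double letter: $has_double\n";
```

The `(my $copy = $original) =~ s///` idiom leaves the original string untouched.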
The substitution operator s/<pattern>/<replacement>/ can also take the e or g modifier at the end. Their meanings are:
s/<pattern>/<replacement>/g replaces every occurrence of <pattern> in the string with <replacement>, instead of only the first occurrence.
s/<pattern>/<replacement>/e evaluates the <replacement> part as a Perl expression; this modifier is used less often.
For example:
$string = "I:love:perl";
$string =~ s/:/*/; # now $string = "I*love:perl"
$string = "I:love:perl";
$string =~ s/:/*/g; # now $string = "I*love*perl"
$string =~ tr/*/ /; # now $string = "I love perl"
$string = "www22cgi44";
$string =~ s/(\d+)/$1*2/ge; # (\d+) matches a run of one or more digits in $string; /e evaluates $1*2 for it and /g repeats this for every run, so $string becomes "www44cgi88"
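A runnable sketch contrasting the modifiers on the same inputs as above, so the effect of leaving out /g is visible. Variable names are illustrative.

```perl
use strict;
use warnings;

my $once = "I:love:perl";
$once =~ s/:/*/;                 # without /g only the first ":" changes

my $all = "I:love:perl";
my $count = ($all =~ s/:/*/g);   # /g replaces every ":" and returns the count
$all =~ tr/*/ /;                 # tr maps each "*" to a space

my $first = "www22cgi44";
$first =~ s/(\d+)/$1*2/e;        # /e evaluates $1*2, but only for the first run

my $every = "www22cgi44";
$every =~ s/(\d+)/$1*2/ge;       # /g plus /e doubles every run of digits

print "$once | $all ($count) | $first | $every\n";
```

With /e alone only the first number is doubled ("www44cgi44"); it takes /ge to reach "www44cgi88".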
A complete example is given below:
#!/usr/bin/perl
use strict;
use warnings;
print "Please enter a string!\n";
my $string = <STDIN>; # <STDIN> reads one line from standard input, so the user can type a string
chomp($string); # remove the trailing newline \n from $string
if ($string =~ /perl/) {
    print "The input string contains the string perl!\n";
}
If the string you enter contains perl, the program prints the message above.
9.2 Common patterns in regular expressions
The following are some of the common patterns in regular expressions.
Pattern       Meaning
.             matches any single character except a newline
x?            matches x 0 or 1 times
x*            matches x 0 or more times (greedy: as many times as possible)
x+            matches x 1 or more times (greedy: as many times as possible)
.*            matches any character 0 or more times
.+            matches any character 1 or more times
{m}           matches exactly m occurrences of the preceding item
{m,n}         matches at least m and at most n occurrences of the preceding item
{m,}          matches m or more occurrences of the preceding item
[]            matches any one character listed inside []
[^]           matches any one character not listed inside []
[0-9]         matches any digit
[a-z]         matches any lowercase letter
[^0-9]        matches any non-digit character
[^a-z]        matches any character other than a lowercase letter
^             matches at the beginning of the string
$             matches at the end of the string (or just before a final newline)
\d            matches a digit, same as [0-9]
\d+           matches one or more digits, same as [0-9]+
\D            matches a non-digit, same as [^0-9]
\D+           matches one or more non-digits, same as [^0-9]+
\w            matches a word character (letter, digit or underscore), same as [a-zA-Z0-9_]
\w+           matches one or more word characters, same as [a-zA-Z0-9_]+
\W            matches a non-word character, same as [^a-zA-Z0-9_]
\W+           matches one or more non-word characters, same as [^a-zA-Z0-9_]+
\s            matches a whitespace character, as with [ \t\n\r\f]
\s+           matches one or more whitespace characters, as with [ \t\n\r\f]+
\S            matches a non-whitespace character, as with [^ \t\n\r\f]
\S+           matches one or more non-whitespace characters, as with [^ \t\n\r\f]+
\b            matches a word boundary (the position between a word character and a non-word character)
\B            matches a position that is not a word boundary
a|b|c         matches a string containing a, b or c
abc           matches a string containing abc
(pattern)     remembers the text it matches, a very useful feature: the text matched by the first () is stored in $1 (or \1 inside the pattern), the second in $2 (or \2), and so on
/pattern/i    the i modifier ignores case, so upper- and lowercase letters are treated as equal when matching
\             to match a special character such as "*" literally, put \ in front of it; this removes its special meaning
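A short sketch exercising a few rows of the table on made-up sample strings. It assumes nothing beyond core Perl. The lazy form .*? is not in the table; it appears here only to show that * on its own is greedy and that asking for the fewest repetitions takes an extra ?.

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";

# * is greedy: .* runs on to the last ">" it can reach
my ($greedy) = $html =~ m/(<.*>)/;
# A trailing ? (as in .*?) asks for as few characters as possible instead
my ($lazy) = $html =~ m/(<.*?>)/;

# \b anchors at word boundaries, so only the standalone word "cat" counts
my $whole = () = "cat concat cats" =~ m/\bcat\b/g;
my $any   = () = "cat concat cats" =~ m/cat/g;

# \s+ splits on any run of whitespace (spaces, tabs, newlines)
my @fields = split /\s+/, "a\tb  c\nd";

# ^, $ and {m} together check a whole-string format
my $is_date = ("2024-01-05" =~ m/^\d{4}-\d{2}-\d{2}$/) ? 1 : 0;

# \D matches any non-digit, so s/\D//g keeps only the digits
(my $digits = "tel: 555-0100") =~ s/\D//g;

print "$greedy | $lazy | $whole of $any | @fields | $is_date | $digits\n";
```

The `my $n = () = ... =~ m/.../g;` idiom counts how many times a pattern matches.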
Here are a few examples:
Example       Description
/perl/        finds a string containing perl
/^perl/       finds a string that begins with perl
/perl$/       finds a string that ends with perl
/c|g|i/       finds a string containing c, g or i
/cg{2,4}i/    finds c followed by 2 to 4 g's, followed by i
/cg{2,}i/     finds c followed by 2 or more g's, followed by i
/cg{2}i/      finds c followed by exactly 2 g's, followed by i
/cg*i/        finds c followed by 0 or more g's, followed by i; same as /cg{0,}i/
/cg+i/        finds c followed by 1 or more g's, followed by i; same as /cg{1,}i/
/cg?i/        finds c followed by 0 or 1 g, followed by i; same as /cg{0,1}i/
/c.i/         finds c followed by any single character, followed by i
/c..i/        finds c followed by any two characters, followed by i
/[cgi]/       finds a string containing at least one of these three characters
/[^cgi]/      finds a string containing at least one character other than c, g and i
/\d/          finds a digit; use /\d+/ for a run of one or more digits
/\D/          finds a non-digit character; use /\D+/ for a run of one or more non-digits
/\*/          finds the character *; because * has a special meaning in regular expressions, a \ must be put in front of it to remove that meaning
/abc/i        finds abc regardless of case (abc, ABC, aBc, ...)
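A sketch that checks several of the example patterns against sample strings, both ones that should match and ones that should not. The sample strings are made up for illustration; qr// is used only to keep each pattern next to its sample.

```perl
use strict;
use warnings;

# Each pair is [pattern, sample]; every sample here should match
my @cases = (
    [ qr/^perl/,    "perl rocks"  ],
    [ qr/perl$/,    "I like perl" ],
    [ qr/cg{2,4}i/, "cgggi"       ],
    [ qr/cg{2}i/,   "cggi"        ],
    [ qr/cg*i/,     "ci"          ],   # zero g's is allowed
    [ qr/cg+i/,     "cgi"         ],
    [ qr/c..i/,     "cabi"        ],
    [ qr/[^cgi]/,   "cgix"        ],   # "x" is a character other than c, g, i
    [ qr/\*/,       "2*3"         ],
    [ qr/abc/i,     "xABCx"       ],
);

# Pairs that should NOT match
my @misses = (
    [ qr/cg{2}i/, "cgggi"       ],   # three g's, not exactly two
    [ qr/[^cgi]/, "cgi"         ],   # every character is c, g or i
    [ qr/^perl/,  "I like perl" ],   # perl is not at the start
    [ qr/cg+i/,   "ci"          ],   # + needs at least one g
    [ qr/c.i/,    "ci"          ],   # . needs exactly one character
);

# grep in scalar context returns how many pairs matched
my $passed     = grep { $_->[1] =~ $_->[0] } @cases;
my $false_hits = grep { $_->[1] =~ $_->[0] } @misses;

print "$passed of ", scalar(@cases), " expected matches; $false_hits unexpected matches\n";
```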
9.3 Eight principles of regular expressions
If you have used sed, awk or grep on Unix, regular expressions in Perl will not be unfamiliar to you. Thanks to this feature, Perl is very strong at handling strings. Regular expressions appear frequently in Perl programs, and CGI programming is no exception.
Regular expressions are a difficult part of learning Perl, but once you have mastered the syntax you gain very powerful pattern-matching capabilities, and much of the work of Perl programming comes down to mastering regular expressions. Below are 8 principles to keep in mind when working with them.
Regular expressions can be a powerful ally in the battle against data, which is often a war. Keep the following eight principles in mind:
· Principle 1: Regular expressions come in three forms: matching (m//), substitution (s///, optionally with modifiers such as /e and /g) and transliteration (tr///).
· Principle 2: Regular expressions match only scalars ($scalar =~ m/a/ works; @array =~ m/a/ treats @array as a scalar, that is, as its number of elements, and so will usually not do what you expect).
· Principle 3: A regular expression finds the earliest (leftmost) possible match of the given pattern. By default a match or substitution is performed only once ($a = 'string string2'; $a =~ s/string//; leaves $a = ' string2').
· Principle 4: A regular expression can handle anything a double-quoted string can, including variable interpolation ($a =~ m/$varb/ expands $varb before matching; if $varb = 'a' and $a = 'as', then $a =~ s/$varb//; is equivalent to $a =~ s/a//; and leaves $a = 's').
· Principle 5: Evaluating a regular expression produces two kinds of results: a result status and backreferences. $a =~ m/pattern/ tells you whether the substring pattern occurs in $a; $a =~ s/(word1)(word2)/$2$1/ "swaps" the two words.
· Principle 6: The core power of regular expressions lies in the wildcard and repetition operators and how they operate: $a =~ m/\w+/ matches one or more word characters, and $a =~ m/\d*/ matches zero or more digits.
· Principle 7: To match any one of several alternatives, Perl uses | for flexibility: m/(cat|dog)/ means "match the string cat or dog".
· Principle 8: Perl provides extended regular-expression features through the (?...) syntax. (Students are invited to look up the details after class.)
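A sketch that turns several of the principles into checkable statements, using hypothetical variable names ($s3, $s4, ...) in place of $a, which Perl reserves for sort. For Principle 8 it shows just two standard (?...) forms, inline case-insensitivity (?i) and the non-capturing group (?:...); these are examples, not the full list.

```perl
use strict;
use warnings;

# Principle 2: =~ evaluates its left side as a scalar, so an array
# becomes its element count (here the string "2"), not its contents
my @array = ("apple", "banana");
my $array_hit;
{
    no warnings;    # silence the "will act on scalar(@array)" warning
    $array_hit = (@array =~ m/a/) ? 1 : 0;
}
my $element_hits = grep { m/a/ } @array;   # test each element instead

# Principle 3: without /g only the leftmost match is replaced
my $s3 = 'string string2';
$s3 =~ s/string//;

# Principle 4: variables interpolate into the pattern like a "..." string
my $varb = 'a';
my $s4 = 'as';
$s4 =~ s/$varb//;    # behaves like s/a//

# Principle 5: a match yields a status, and groups yield backreferences
my $s5 = 'hello world';
my $status = ($s5 =~ m/world/) ? 1 : 0;
(my $swapped = $s5) =~ s/(\w+) (\w+)/$2 $1/;

# Principle 7: | offers alternatives
my @pets = grep { m/^(cat|dog)$/ } qw(cat cow dog);

# Principle 8: two (?...) forms, inline case-folding and a non-capturing group
my $ci = ("PERL" =~ m/(?i)perl/) ? 1 : 0;
my @caps = ("2024-05" =~ m/(?:\d+)-(\d+)/);   # only the second group is captured

print "$array_hit $element_hits '$s3' $s4 $status $swapped @pets $ci @caps\n";
```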
Want to learn all these principles? I suggest starting simple and experimenting constantly. In fact, once you know that $a =~ m/error/ looks for the substring error in $a, you already have more string-processing power than a lower-level language such as C gives you directly.
Addendum:
Good
Concise and clear
But in the original text,
\w A string of English letters or numbers, as with [a-zA-Z0-9] syntax
That seems wrong.
As I remember, it should also include the underscore, i.e. [a-zA-Z_0-9]
/cg*i/find C followed by 0 or more g, followed by the string of I, as/cg{0,1}i/
That's a typo.
It should be:
/cg*i/ finds c followed by 0 or more g's, followed by i, same as /cg{0,}i/
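A quick check of both reader corrections, on hypothetical sample strings: \w does match the underscore, and /cg*i/ accepts exactly the same samples as /cg{0,}i/, while /cg{0,1}i/ rejects two or more g's.

```perl
use strict;
use warnings;

# \w also matches the underscore
my $underscore = ("_" =~ m/^\w$/) ? 1 : 0;

# Compare the three quantifier spellings on the same samples ($_ in map)
my @samples  = ("ci", "cgi", "cggi", "cgggi");
my @star     = map { /cg*i/     ? 1 : 0 } @samples;
my @zero_up  = map { /cg{0,}i/  ? 1 : 0 } @samples;
my @zero_one = map { /cg{0,1}i/ ? 1 : 0 } @samples;

print "\\w matches _: $underscore\n";
print "cg*i: @star | cg{0,}i: @zero_up | cg{0,1}i: @zero_one\n";
```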