I. Three Forms of Regular Expressions
First, note that Perl has three forms of regular-expression operators:
Match: m/<regexp>/ (can also be abbreviated as /<regexp>/, omitting the m)
Substitute: s/<pattern>/<replacement>/
Transliterate: tr/<pattern>/<replacement>/ (tr works on lists of characters rather than on a true regular expression)
These three forms are generally used together with =~ or !~ (where "=~" means "matches" and is read as "does" in the whole statement, and "!~" means "does not match" and is read as "doesn't"), with the scalar variable to be processed on the left. If there is no such variable and no =~ or !~ operator, the contents of the $_ variable are processed by default. Example:
$str = "I love Perl";
$str =~ m/Perl/;        # returns true (1) if the string "Perl" is found in $str; otherwise returns false (the empty string)
$str =~ s/Perl/Bash/;   # replaces "Perl" in $str with "Bash"; returns the number of replacements made (1 here), or false (the empty string) if nothing was replaced
$str !~ tr/A-Z/a-z/;    # converts all uppercase letters in $str to lowercase; because of !~, returns false if any character was converted and true (1) otherwise
Other examples:
foreach (@array) { s/a/b/; }   # each iteration places one element of @array in $_ and replaces the first "a" in $_ with "b"
while (<FILE>) { print if (m/error/); }   # slightly more complex: prints every line read from the file handle FILE that contains the string "error"
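The return values described above can be checked with a small script. This is only a sketch: the variable names ($found, $copy, $lower, @words) and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

my $str = "I love Perl";

# m// in scalar context: true on success, false on failure.
my $found   = ($str =~ m/Perl/)   ? 1 : 0;
my $missing = ($str =~ m/Python/) ? 1 : 0;

# s/// returns the number of substitutions it made.
my $copy     = $str;
my $replaced = ($copy =~ s/Perl/Bash/);

# tr/// returns the number of characters it translated.
my $lower = $str;
my $count = ($lower =~ tr/A-Z/a-z/);

# With no =~ operator, s/// acts on $_, which foreach aliases to each element.
my @words = ("banana", "cabana");
foreach (@words) { s/a/b/; }    # only the first "a" in each element changes

print "found=$found missing=$missing replaced=$replaced copy=$copy\n";
print "count=$count lower=$lower words=@words\n";
```

Copying $str before s/// and tr/// keeps the original string intact, since both operators modify the variable on their left.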
If () appears in a Perl regular expression, then after a match or substitution the Perl interpreter automatically assigns the text matched by each () to the built-in variables $1, $2, and so on. See the following example:
$string = "I love Perl";
$string =~ s/(love)/<$1>/;   # $1 = "love" at this point, and the substitution changes $string to "I <love> Perl"
$string = "I love Perl";
$string =~ s/(I)(.*)(Perl)/<$3>$2<$1>/;   # here $1 = "I", $2 = " love ", $3 = "Perl", and $string becomes "<Perl> love <I>"
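As a hedged sketch of working with captures in practice: it is usually safest to read $1, $2, ... only after checking that the match succeeded, because a failed match leaves the values from an earlier successful match in place. The variable names below ($subject, $verb, $object, $first, $last, $swapped) are illustrative only.

```perl
use strict;
use warnings;

my $string = "I love Perl";

# Read $1, $2, $3 only inside the branch where the match succeeded.
my ($subject, $verb, $object);
if ($string =~ m/(\w+) (\w+) (\w+)/) {
    ($subject, $verb, $object) = ($1, $2, $3);
}

# In list context a match returns the captured groups directly.
my ($first, $last) = $string =~ m/^(\w+).*?(\w+)$/;

# Captures can be reused in the replacement part of s///.
(my $swapped = $string) =~ s/(I)(.*)(Perl)/<$3>$2<$1>/;

print "$subject|$verb|$object\n";
print "$first|$last\n";
print "$swapped\n";
```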
The substitution operator s/<pattern>/<replacement>/ can take the e or g modifier at the end. Their meanings are as follows:
s/<pattern>/<replacement>/g replaces every match of <pattern> in the string being processed with <replacement>, instead of only the first occurrence.
s/<pattern>/<replacement>/e treats the <replacement> part as a Perl expression to evaluate rather than as a plain string. This modifier is used less often.
For example:
$string = "I:love:Perl";
$string =~ s/:/*/;    # $string = "I*love:Perl"
$string = "I:love:Perl";
$string =~ s/:/*/g;   # $string = "I*love*Perl"
$string =~ tr/*/ /;   # $string = "I love Perl"
$string = "www22cgi44";
$string =~ s/(\d+)/$1*2/ge;   # (\d+) matches one or more digits in $string; each such number is multiplied by 2, so $string ends up as "www44cgi88" (the g is needed here: with e alone only the first number would be doubled, giving "www44cgi44")
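The interaction between the two modifiers can be seen side by side in this minimal sketch. The variable names ($once, $all, $line, $n) are hypothetical; the input string is the one used above.

```perl
use strict;
use warnings;

my $string = "www22cgi44";

# e alone: only the first run of digits is evaluated and replaced.
(my $once = $string) =~ s/(\d+)/$1*2/e;

# g and e together: every run of digits is evaluated and replaced.
(my $all = $string) =~ s/(\d+)/$1*2/ge;

# s///g returns how many replacements were made.
my $line = "I:love:Perl";
my $n = ($line =~ s/:/*/g);

print "once=$once all=$all n=$n line=$line\n";
```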
The following is a complete example:
#!/usr/bin/perl
print "Please enter a string!\n";
$string = <STDIN>;   # <STDIN> is standard input; it lets the user type in a string
chomp($string);      # remove the trailing newline \n from $string (chomp removes only a newline, whereas chop removes the last character whatever it is)
if ($string =~ /Perl/) {
    print("The input string contains the string Perl!\n");
}
If the input string contains the string "Perl", the message above is displayed.
II. Common Patterns in Regular Expressions
Below are some common patterns in regular expressions.
Pattern       Meaning
.             matches any single character except a newline
x?            matches x zero times or once
x*            matches x zero or more times, as many times as possible (write x*? to match as few times as possible)
x+            matches x one or more times, as many times as possible (write x+? to match as few times as possible)
.*            matches any character zero or more times
.+            matches any character one or more times
x{m}          matches x exactly m times
x{m,n}        matches x at least m and at most n times
x{m,}         matches x m or more times
[]            matches any one of the characters listed in []
[^]           matches any one character not listed in []
[0-9]         matches any digit
[a-z]         matches any lowercase letter
[^0-9]        matches any non-digit character
[^a-z]        matches any character that is not a lowercase letter
^             matches at the beginning of the string
$             matches at the end of the string
\d            matches a digit, the same as [0-9]
\d+           matches one or more digits, the same as [0-9]+
\D            matches a non-digit, the same as [^0-9]
\D+           matches one or more non-digits, the same as [^0-9]+
\w            matches a letter, digit, or underscore, the same as [a-zA-Z0-9_]
\w+           the same as [a-zA-Z0-9_]+
\W            matches a character that is not a letter, digit, or underscore, the same as [^a-zA-Z0-9_]
\W+           the same as [^a-zA-Z0-9_]+
\s            matches a whitespace character, the same as [ \n\t\r\f]
\s+           the same as [ \n\t\r\f]+
\S            matches a non-whitespace character, the same as [^ \n\t\r\f]
\S+           the same as [^ \n\t\r\f]+
\b            matches at a word boundary (between a \w character and a non-\w character)
\B            matches anywhere that is not a word boundary
a|b|c         matches a, or b, or c
abc           matches the string "abc"
(pattern)     () remembers the text that was matched, which is very useful. The text matched by the first () becomes $1 (or \1 inside the pattern), the text matched by the second () becomes $2 or \2, and so on.
/pattern/i    the i modifier ignores case, that is, upper and lower case are treated as the same when matching
\             to find a special character such as "*" in a pattern, put \ in front of it to cancel its special meaning
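A few of the entries in the table are easier to see in action. The following is a small sketch with made-up sample strings; it compares greedy and non-greedy matching and exercises \b, \1, the i modifier, and a backslash escape.

```perl
use strict;
use warnings;

my $text = "<b>bold</b> and <i>italic</i>";

# Greedy: .* takes as much as it can, so the match runs to the last ">".
my ($greedy) = $text =~ m/(<.*>)/;
# Non-greedy: .*? stops at the first ">" that lets the match succeed.
my ($lazy) = $text =~ m/(<.*?>)/;

# \b is a zero-width word boundary: "cat" must stand alone as a word.
my $as_word = ("concat a cat" =~ m/\bcat\b/) ? 1 : 0;
my $inside  = ("concatenate"  =~ m/\bcat\b/) ? 1 : 0;

# \1 refers back to the first group inside the same pattern.
my ($doubled) = "this is is a test" =~ m/\b(\w+) \1\b/;

# The i modifier ignores case; a backslash makes * a literal character.
my $ci   = ("PERL" =~ m/perl/i) ? 1 : 0;
my $star = ("2*3"  =~ m/2\*3/)  ? 1 : 0;

print "greedy=$greedy\nlazy=$lazy\n";
print "as_word=$as_word inside=$inside doubled=$doubled ci=$ci star=$star\n";
```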
The following are some examples:
Example       Description
/Perl/        finds a string containing "Perl"
/^Perl/       finds a string starting with "Perl"
/Perl$/       finds a string ending with "Perl"
/c|g|i/       finds a string containing c, g, or i
/cg{2,4}i/    finds c followed by 2 to 4 g's, followed by i
/cg{2,}i/     finds c followed by 2 or more g's, followed by i
/cg{2}i/      finds c followed by exactly 2 g's, followed by i
/cg*i/        finds c followed by 0 or more g's, followed by i; the same as /cg{0,}i/
/cg+i/        finds c followed by 1 or more g's, followed by i; the same as /cg{1,}i/
/cg?i/        finds c followed by 0 or 1 g, followed by i; the same as /cg{0,1}i/
/c.i/         finds c followed by any one character, followed by i
/c..i/        finds c followed by any two characters, followed by i
/[cgi]/       finds a string containing any one of these three characters
/[^cgi]/      finds a string containing a character that is none of these three
/\d/          finds a digit; use /\d+/ for a string of one or more digits
/\D/          finds a non-digit character; use /\D+/ for a string of one or more non-digits
/\*/          finds the character *; because * has a special meaning in a regular expression, put \ in front of it to cancel that meaning
/abc/i        finds a string matching "abc" regardless of upper or lower case
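The quantifier examples can be tried against a few test strings, as in the sketch below. The sample strings are hypothetical, and ^ and $ anchors are added here (they are not part of the examples above) so that each pattern must match the whole string.

```perl
use strict;
use warnings;

# Hypothetical samples: c followed by 0, 1, 2, and 5 g's, then i.
my @samples = qw(ci cgi cggi cgggggi);

# Anchored versions of the patterns, keyed by the unanchored form.
my %patterns = (
    'cg*i'     => qr/^cg*i$/,
    'cg+i'     => qr/^cg+i$/,
    'cg?i'     => qr/^cg?i$/,
    'cg{2,4}i' => qr/^cg{2,4}i$/,
);

# For each pattern, collect the samples it accepts.
my %matched;
for my $name (sort keys %patterns) {
    $matched{$name} = [ grep { $_ =~ $patterns{$name} } @samples ];
    print "$name: @{ $matched{$name} }\n";
}
```

Without the anchors, every pattern would also match any longer string that merely contains a matching substring.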
III. Eight Principles of Regular Expressions
If you have used the sed, awk, and grep commands on UNIX, regular expressions in Perl will not be unfamiliar. Because the Perl language has this feature built in, it is very capable at processing strings. Regular expressions appear frequently in Perl programs, and CGI programming is no exception.
Regular expressions are a stumbling block for Perl beginners, but once you have mastered the syntax you have almost unlimited pattern-matching power, and much of the work of Perl programming lies in mastering regular expressions. The following introduces eight principles for using them.
Regular expressions can be a powerful ally in the battle against data, and it often is a battle. Keep the following eight principles in mind:
· Principle 1: Regular expressions come in three different forms: matching (m//), substitution (s///), and transliteration (tr///).
· Principle 2: Regular expressions match only against scalars ($scalar =~ m/a/; works; @array =~ m/a/ treats @array as a scalar, so it will probably not do what you intend).
· Principle 3: A regular expression matches at the earliest possible position for the given pattern. By default, it matches or substitutes only once ($a = 'string string2'; $a =~ s/string//; leaves $a = ' string2').
· Principle 4: Regular expressions can handle any and all characters that double-quoted strings can handle ($a =~ m/$varb/ interpolates the variable $varb before matching. If $varb = 'a' and $a = 'as', then $a =~ s/$varb//; is equivalent to $a =~ s/a//; and the result is $a = "s").
· Principle 5: Evaluating a regular expression produces two things, a result status and backreferences: $a =~ m/pattern/ tells you whether the substring pattern occurs in $a, and $a =~ s/(word1)(word2)/$2$1/ swaps the two words.
· Principle 6: The core power of regular expressions lies in wildcards and multiple-match operators (quantifiers) and in how they operate. $a =~ m/\w+/ matches one or more word characters; $a =~ m/\d*/ matches zero or more digits.
· Principle 7: To match more than one alternative, Perl uses "|" for added flexibility. m/(cat|dog)/ means "match the string cat or dog".
· Principle 8: Perl's (?...) syntax provides extended features for regular expressions. (Please read the related material separately.)
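Several of these principles can be checked with a short script. This is a sketch only: the data and variable names are made up, and the use of \Q...\E and of (?:...) goes slightly beyond what the list states; they are shown as one way to handle metacharacters in an interpolated variable and as one example of the (?...) extensions.

```perl
use strict;
use warnings;

# Principle 2: match each element of an array, not the array itself.
my @lines  = ("ok", "error: disk full", "ok", "error: timeout");
my @errors = grep { m/error/ } @lines;

# Principle 3: leftmost match, and only one substitution without g.
my $s = 'string string2';
$s =~ s/string//;

# Principle 4: variables are interpolated into the pattern before matching.
my $varb = 'a';
my $t    = 'as';
$t =~ s/$varb//;
# \Q...\E quotes metacharacters in the interpolated value ($ and . here).
my $price   = 'cost: $5.00';
my $needle  = '$5.00';
my $literal = ($price =~ m/\Q$needle\E/) ? 1 : 0;

# Principle 5: a result status plus backreferences.
my $u  = 'hello world';
my $ok = ($u =~ s/(\w+) (\w+)/$2 $1/) ? 1 : 0;

# Principle 7: alternation.
my @pets = grep { m/^(cat|dog)$/ } qw(cat cow dog);

# Principle 8: (?:...) groups without capturing, so (cat|dog) is still $1.
my ($animal) = 'hotdog' =~ m/(?:hot)(cat|dog)/;

print scalar(@errors), " errors; s='$s' t='$t' literal=$literal\n";
print "ok=$ok u='$u' pets=@pets animal=$animal\n";
```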
Want to master all of these principles? I suggest starting with simple cases and continuing to try and experiment. In fact, once you know that $a =~ m/error/ looks for the substring "error" in $a, you have already gained more text-processing power than you would have in a lower-level language such as C.