An introduction to regular expressions in Perl

Source: Internet
Author: User

Thanks to aka and the author.

Regular Expressions in Perl
Three forms of regular expressions

Common patterns in regular expressions

The 8 major principles of regular expressions

Regular expressions are a major feature of the Perl language and one of its trickier parts, but once you have a good grasp of them you can handle string-processing tasks with ease, which is especially handy in CGI programming. Below are some of the basic syntax rules for writing regular expressions.

9.1 The three forms of regular expressions
First, you should know that in Perl programs regular expressions come in three forms:

Match: m/<regexp>/ (can also be abbreviated as /<regexp>/, omitting the m)

Substitution: s/<pattern>/<replacement>/

Transliteration: tr/<pattern>/<replacement>/

These three forms are generally used together with the =~ or !~ operator (=~ means "matches", read as "does"; !~ means "does not match", read as "doesn't"), with the scalar variable to be processed on the left. If there is no variable and no =~ or !~ operator, the contents of the $_ variable are processed by default. Examples:

$str = "I love Perl";
$str =~ m/Perl/;      # true (1) if the string "Perl" is found in $str, otherwise false (the empty string "")
$str =~ s/Perl/bash/; # replaces the first "Perl" in $str with "bash"; returns 1 if a substitution happened, otherwise false ("")
$str !~ tr/A-Z/a-z/;  # converts every uppercase letter in $str to lowercase; because of !~ the result is negated: false ("") if anything was converted, true (1) if nothing was
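As a quick check of the return values described in these comments, here is a minimal sketch. It assumes a reasonably modern Perl 5 with strict and warnings, and the variable names ($found, $subs, $count and so on) are illustrative only. In scalar context a match gives 1 or the empty string, s/// gives the number of substitutions, tr/// gives the number of characters converted, and !~ negates whichever result it is applied to.

```perl
use strict;
use warnings;

# Illustrative variable names; not from the original article.
my $str = "I love Perl";

# m// returns 1 on success and "" (false) on failure.
my $found   = ($str =~ m/Perl/)   ? 1 : 0;
my $missing = ($str =~ m/Python/) ? 1 : 0;

# s/// returns the number of substitutions made (at most 1 without /g).
my $subs = ($str =~ s/Perl/bash/);

# tr/// returns how many characters it converted; only "I" is uppercase here.
my $upper = "I love bash";
my $count = ($upper =~ tr/A-Z/a-z/);

# !~ negates the result: true only when tr/// converted nothing.
my $already_lower = "no capitals here";
my $unchanged = ($already_lower !~ tr/A-Z/a-z/) ? 1 : 0;

print "found=$found missing=$missing subs=$subs\n";
print "str=$str upper=$upper count=$count unchanged=$unchanged\n";
```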

There are also forms that rely on $_:

foreach (@array) { s/a/b/ }              # each iteration aliases $_ to one element of @array, and s/a/b/ replaces the first "a" in that element with "b"
while (<FILE>) { print if (m/error/); }  # a little more involved: prints every line of FILE that contains the string "error"
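The $_ defaults can be tried without a real file. This sketch assumes an in-memory filehandle (open on a scalar reference, available since Perl 5.8) in place of the FILE handle; the sample text and the @error_lines array are hypothetical, and the matching lines are collected rather than printed immediately only so the result is easy to inspect.

```perl
use strict;
use warnings;

# foreach aliases $_ to each element, so s/// edits the array in place.
my @array = ("banana", "cat", "apple");
foreach (@array) { s/a/b/ }            # only the first "a" in each element

# Hypothetical sample text; an in-memory filehandle stands in for FILE.
my $text = "ok line\nan error here\nall fine\nanother error\n";
open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";

my @error_lines;
while (<$fh>) {                        # each line is read into $_
    push @error_lines, $_ if m/error/; # m// without =~ tests $_
}
close($fh);

print "@array\n";
print @error_lines;
```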

If you use () in a Perl regular expression, the text matched by each pair of parentheses in a match or substitution is automatically assigned by the Perl interpreter to the special variables $1, $2, $3, and so on. Take a look at the following example:

$string = "I love Perl";
$string =~ s/(love)/<$1>/;                   # here $1 = "love", and the substitution changes $string to "I <love> Perl"
$string = "I love Perl";
$string =~ s/(I) (.*) (Perl)/<$3> $2 <$1>/;  # here $1 = "I", $2 = "love", $3 = "Perl", and $string becomes "<Perl> love <I>"
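A self-contained version of these two substitutions, assuming strict and warnings. It also shows that a match in list context returns the captured groups directly, an alternative to reading $1, $2 and $3 one by one. The names $bracketed and @parts are illustrative.

```perl
use strict;
use warnings;

my $string = "I love Perl";
$string =~ s/(love)/<$1>/;                    # $1 holds "love"
my $bracketed = $string;                      # "I <love> Perl"

$string = "I love Perl";
# In list context a match returns the captured groups ($1, $2, $3).
my @parts = ($string =~ m/(I) (.*) (Perl)/);  # ("I", "love", "Perl")
$string =~ s/(I) (.*) (Perl)/<$3> $2 <$1>/;   # "<Perl> love <I>"

print "$bracketed\n@parts\n$string\n";
```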

The substitution operator s/<pattern>/<replacement>/ can also take the g or e modifier at the end. Their meanings are:

s/<pattern>/<replacement>/g replaces every part of the string that matches <pattern> with <replacement>, instead of replacing only the first occurrence.
s/<pattern>/<replacement>/e evaluates the <replacement> part as a Perl expression and substitutes its result; this modifier is used less often.

For example:

$string = "I:love:perl";
$string =~ s/:/*/;     # now $string = "I*love:perl"
$string = "I:love:perl";
$string =~ s/:/*/g;    # now $string = "I*love*perl"
$string =~ tr/*/ /;    # now $string = "I love perl"
$string = "www22cgi44";
$string =~ s/(\d+)/$1*2/eg;  # (\d+) matches a run of one or more digits; /e evaluates $1*2 for each run and /g applies it to every run, so $string becomes "www44cgi88"
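To compare the modifiers side by side, this sketch copies the input before each operation with the (my $copy = $original) =~ s/// idiom so the original string stays unchanged; the variable names are illustrative. It also shows that /e alone changes only the first run of digits, which is why getting "www44cgi88" requires /eg rather than /e alone.

```perl
use strict;
use warnings;

# Illustrative variable names; each copy is modified, the source is not.
my $string = "I:love:perl";
(my $once   = $string) =~ s/:/*/;    # only the first ":"  -> "I*love:perl"
(my $all    = $string) =~ s/:/*/g;   # /g: every ":"      -> "I*love*perl"
(my $spaced = $all)    =~ tr/*/ /;   # each "*" -> " "    -> "I love perl"

my $nums = "www22cgi44";
(my $first_only = $nums) =~ s/(\d+)/$1*2/e;   # /e alone: first run only
(my $doubled    = $nums) =~ s/(\d+)/$1*2/eg;  # /e with /g: every run

print "$once\n$all\n$spaced\n$first_only\n$doubled\n";
```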

A complete example is given below:

#!/usr/bin/perl
use strict;
use warnings;

print "Please enter a string!\n";
my $string = <STDIN>;   # <STDIN> reads one line of standard input typed by the user
chomp($string);         # remove the trailing newline; chop would remove the last character whatever it is
if ($string =~ /perl/) {
    print "The input string contains the string perl!\n";
}

If the string you enter contains "perl", the program prints the message above.

9.2 Common patterns in regular expressions
The following are some of the common patterns in regular expressions.

Pattern        Meaning
.              matches any single character except a newline
x?             matches 0 or 1 occurrence of x
x*             matches 0 or more occurrences of x, as many as possible
x+             matches 1 or more occurrences of x, as many as possible
.*             matches any character 0 or more times
.+             matches any character 1 or more times
{m}            matches exactly m occurrences of the preceding item
{m,n}          matches at least m and at most n occurrences of the preceding item
{m,}           matches m or more occurrences of the preceding item
[]             matches any one of the characters inside []
[^]            matches any one character that is not inside []
[0-9]          matches any digit
[a-z]          matches any lowercase letter
[^0-9]         matches any non-digit character
[^a-z]         matches any character that is not a lowercase letter
^              matches at the beginning of the string
$              matches at the end of the string (or just before a final newline)
\d             matches a digit, same as [0-9]
\d+            matches a string of one or more digits, same as [0-9]+
\D             matches a non-digit, same as [^0-9]
\D+            matches a string of one or more non-digits, same as [^0-9]+
\w             matches a word character (letter, digit or underscore), same as [a-zA-Z0-9_] for ASCII text
\w+            same as [a-zA-Z0-9_]+
\W             matches a character that is not a word character, same as [^a-zA-Z0-9_]
\W+            same as [^a-zA-Z0-9_]+
\s             matches a whitespace character, roughly the same as [ \t\n\r\f]
\s+            one or more whitespace characters, roughly [ \t\n\r\f]+
\S             matches a non-whitespace character, roughly [^ \t\n\r\f]
\S+            one or more non-whitespace characters, roughly [^ \t\n\r\f]+
\b             matches a word boundary (between a word character and a non-word character)
\B             matches a position that is not a word boundary
a|b|c          matches a string containing the character a, b or c
abc            matches a string containing abc
(pattern)      () remembers the text it matches, a very useful feature. The text matched by the first () is stored in $1 (written \1 inside the pattern itself), the text matched by the second () in $2 (or \2), and so on.
/pattern/i     the i modifier makes the match ignore case, so upper- and lowercase letters are treated alike.
If you want to match a special character such as "*" in a pattern, precede it with a backslash (\) to remove its special meaning.
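Several rows of the table can be exercised on one sample line. This is a minimal sketch: the sample string and variable names are made up for illustration, and the \w and \s comparisons assume ASCII text (under Unicode rules both classes match more characters).

```perl
use strict;
use warnings;

# Hypothetical sample line used to exercise several table rows.
my $line = "Order_42 shipped on 2024-05-01";

my @digit_runs = ($line =~ m/(\d+)/g);     # \d+ : runs of digits
my @words      = ($line =~ m/(\w+)/g);     # \w+ : "_" counts, so "Order_42" stays whole
my @fields     = split /\s+/, $line;       # \s+ : split on runs of whitespace
my ($year, $month) = ($line =~ m/(\d{4})-(\d{2})/);   # {m}: exact repeat counts

my %checks = (
    starts_with_order => ($line =~ m/^Order/)   ? 1 : 0,  # ^ anchors at the start
    ends_with_01      => ($line =~ m/01$/)      ? 1 : 0,  # $ anchors at the end
    whole_word_on     => ($line =~ m/\bon\b/)   ? 1 : 0,  # \b word boundaries
    literal_star      => ("2*3" =~ m/\*/)       ? 1 : 0,  # \* is a literal *
    ignore_case       => ($line =~ m/SHIPPED/i) ? 1 : 0,  # /i ignores case
);

print "digits: @digit_runs\nwords: @words\nfields: @fields\n";
print "year=$year month=$month\n";
print "$_=$checks{$_}\n" for sort keys %checks;
```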
 

Here are a few examples:

Example        Description
/perl/         finds a string containing perl
/^perl/        finds a string that begins with perl
/perl$/        finds a string that ends with perl
/c|g|i/        finds a string containing c, g or i
/cg{2,4}i/     finds c followed by 2 to 4 g's, followed by i
/cg{2,}i/      finds c followed by 2 or more g's, followed by i
/cg{2}i/       finds c followed by exactly 2 g's, followed by i
/cg*i/         finds c followed by 0 or more g's, followed by i; same as /cg{0,}i/
/cg+i/         finds c followed by 1 or more g's, followed by i; same as /cg{1,}i/
/cg?i/         finds c followed by 0 or 1 g, followed by i; same as /cg{0,1}i/
/c.i/          finds c followed by any one character, followed by i
/c..i/         finds c followed by any two characters, followed by i
/[cgi]/        finds a string containing any one of these three characters
/[^cgi]/       finds a string containing at least one character other than these three
/\d/           finds a digit; use /\d+/ for a string of one or more digits
/\D/           finds a non-digit; use /\D+/ for a string of one or more non-digits
/\*/           finds the character *; because * has a special meaning in regular expressions, it must be preceded by a backslash to be matched literally
/abc/i         finds abc regardless of case
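The example table can also be run as a small test loop. The sample strings below are hypothetical, chosen so that some rows match and some do not; qr// compiles each pattern once so it can be stored in a list alongside a printable label.

```perl
use strict;
use warnings;

# Each row: [label from the table, compiled pattern, hypothetical sample string].
my @cases = (
    [ '/^perl/',    qr/^perl/,    'perl rules'  ],
    [ '/perl$/',    qr/perl$/,    'I love perl' ],
    [ '/cg{2,4}i/', qr/cg{2,4}i/, 'cgggi'       ],
    [ '/cg{2}i/',   qr/cg{2}i/,   'cgggi'       ],  # needs exactly two g's
    [ '/cg*i/',     qr/cg*i/,     'ci'          ],  # zero g's is allowed
    [ '/cg+i/',     qr/cg+i/,     'ci'          ],  # needs at least one g
    [ '/c..i/',     qr/c..i/,     'cxyi'        ],
    [ '/[^cgi]/',   qr/[^cgi]/,   'cgi'         ],  # every char is c, g or i
    [ '/\*/',       qr/\*/,       'a*b'         ],
    [ '/abc/i',     qr/abc/i,     'xABCx'       ],
);

my @results;
for my $case (@cases) {
    my ($label, $re, $text) = @$case;
    my $hit = ($text =~ $re) ? 1 : 0;
    push @results, $hit;
    printf "%-12s %-14s %s\n", $label, "'$text'", $hit ? 'match' : 'no match';
}
```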

9.3 Eight principles of regular expressions
If you have used sed, awk or grep on Unix, regular expressions (Regular Expression) in Perl will not be unfamiliar to you. Thanks to this feature, Perl is very strong at string processing. Regular expressions are used frequently in Perl programs, and CGI programming is no exception.

Regular expressions are one of the harder parts of learning Perl, but once you have mastered the syntax you have almost unlimited pattern-matching power, and much of Perl programming comes down to mastering regular expressions. Below are eight principles for working with regular expressions.

Regular expressions are a powerful ally in the battle against data, and it often is a battle. Keep the following eight principles in mind:

· Principle 1: Regular expressions come in three forms: matching (m//), substitution (s///, optionally with the e and g modifiers) and transliteration (tr///).

· Principle 2: Regular expressions match only against scalars ($scalar =~ m/a/ works; @array =~ m/a/ evaluates @array in scalar context, which gives its number of elements, so it will probably not do what you want).

· Principle 3: A regular expression takes the earliest (leftmost) possible match of the given pattern. By default a match or substitution happens only once ($a = 'string string2'; $a =~ s/string//; leaves $a = ' string2').

· Principle 4: A regular expression can handle anything a double-quoted string can ($a =~ m/$varb/ interpolates the variable $varb before matching; if $varb = 'a' and $a = 'as', then $a =~ s/$varb//; is equivalent to $a =~ s/a//; and leaves $a = "s").

· Principle 5: Evaluating a regular expression produces two kinds of results: a result status and backreferences. $a =~ m/pattern/ tells you whether the substring pattern occurs in $a, and $a =~ s/(word1)(word2)/$2$1/ "swaps" the two words.

· Principle 6: The core power of regular expressions lies in the wildcard and repetition operators and how they behave. $a =~ m/\w+/ matches one or more word characters; $a =~ m/\d*/ matches zero or more digits.

· Principle 7: To match one of several alternatives, Perl uses "|" for flexibility. m/(cat|dog)/ means "match the string cat or dog".

· Principle 8: Perl provides extended regular-expression features through the (?...) syntax. (Students are invited to look up the details after class.)
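The principles above can be sketched in a few lines each. The article's examples use $a, but $a (like $b) is a special variable used by sort in Perl, so this sketch renames it to hypothetical names such as $s3 and $s4; the sample strings for principles 6 to 8 are also made up, and the two (?...) constructs shown are only a small sample of the available extensions.

```perl
use strict;
use warnings;

# Hypothetical names: $s3, $s4, $s5 stand in for the article's $a.

# Principle 2: =~ works on one scalar; grep applies the match to each element.
my @array  = ("cat", "dog", "bird");
my @with_a = grep { m/a/ } @array;                 # ("cat")

# Principle 3: the leftmost match is replaced, and only once without /g.
my $s3 = 'string string2';
$s3 =~ s/string//;                                 # ' string2'

# Principle 4: the pattern is interpolated like a double-quoted string.
my $varb = 'a';
my $s4   = 'as';
$s4 =~ s/$varb//;                                  # same as s/a//, gives 's'

# Principle 5: a result status plus backreferences.
my $s5    = 'word1word2';
my $found = ($s5 =~ m/word2/) ? 1 : 0;             # status: 1
$s5 =~ s/(word1)(word2)/$2$1/;                     # swapped: 'word2word1'

# Principle 6: \d* succeeds even with no digits, since zero digits is a match.
my $zero_ok = ('abc' =~ m/\d*/) ? 1 : 0;           # 1

# Principle 7: alternation with |.
my @pets = grep { m/(cat|dog)/ } ('a cat', 'a dog', 'a fish');

# Principle 8: (?...) extensions, e.g. inline (?i) and a (?=...) lookahead
# that checks what follows without consuming it.
my $shout  = ('HELLO' =~ m/(?i)hello/) ? 1 : 0;    # 1
my ($stem) = ('quickly' =~ m/(\w+)(?=ly)/);        # 'quick'

print "@with_a|$s3|$s4|$found $s5|$zero_ok|@pets|$shout $stem\n";
```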

Want to master all these principles? I suggest starting simple and experimenting constantly. In fact, once you learn that $a =~ m/error/ looks for the substring error in $a, you already have more processing power than lower-level languages such as C give you.

Addendum (reader comments):

Good,
concise and clear.
But in the original,
\w  matches an English letter or digit, same as [a-zA-Z0-9]
seems wrong.
As I remember, it should also include the underscore, namely [a-zA-Z_0-9].

/cg*i/  finds c followed by 0 or more g's, followed by i, same as /cg{0,1}i/
is a typo.
It should be
/cg*i/  finds c followed by 0 or more g's, followed by i, same as /cg{0,}i/
