Perl Learning (3) Regular Expressions

Source: Internet
Author: User

Regular expressions. A regular expression is a pattern, usually written between a pair of slashes, that describes a set of strings. It can be used to find text that matches the specified pattern and to perform substitutions on it. Perl has long been famous for its excellent pattern-matching facilities.

Table 2-2 Some regular expression metacharacters

Metacharacter   Meaning
^               Matches the beginning of a line
$               Matches the end of a line
a.c             Matches an a, followed by any single character, followed by a c
[abc]           Matches a, b, or c
[^abc]          Matches any character that is neither a, b, nor c
[0-9]           Matches a single digit from 0 to 9
ab*c            Matches an a, followed by zero or more b's, followed by a c
ab+c            Matches an a, followed by one or more b's, followed by a c
ab?c            Matches an a, followed by zero or one b, followed by a c
(ab)+c          Matches one or more repetitions of ab, followed by a c
(ab)(c)         Captures ab into the variable $1 and c into $2
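To see the metacharacters in Table 2-2 in action, here is a minimal sketch that runs each pattern from the table against a sample string. The sample strings and variable names are our own choices for illustration, not part of the original table. The script prints whether each pattern matches and then shows the two captures produced by (ab)(c).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each entry pairs a pattern from Table 2-2 with a sample string (chosen for illustration).
my @cases = (
    [ '^ab',      'abc'   ],  # ^ anchors at the beginning
    [ 'bc$',      'abc'   ],  # $ anchors at the end
    [ 'a.c',      'axc'   ],  # . matches any single character
    [ '[abc]',    'xyzc'  ],  # one of a, b, or c (here the c)
    [ '^[^abc]$', 'd'     ],  # exactly one character that is not a, b, or c
    [ 'ab*c',     'ac'    ],  # zero b's are allowed
    [ 'ab+c',     'ac'    ],  # at least one b is required, so no match
    [ 'ab?c',     'abbc'  ],  # at most one b is allowed, so no match
    [ '(ab)+c',   'ababc' ],  # ab repeated, then c
);

my @matched;
for my $case (@cases) {
    my ($pattern, $string) = @$case;
    my $ok = $string =~ /$pattern/ ? 1 : 0;
    push @matched, $ok;
    print "/$pattern/ on \"$string\": ", ($ok ? 'match' : 'no match'), "\n";
}

# (ab)(c): in list context a successful match returns the captures ($1, $2)
my ($first, $second) = 'xabcx' =~ /(ab)(c)/;
print "\$1 = $first, \$2 = $second\n";
```

Storing the patterns as plain strings and interpolating them with /$pattern/ keeps the table-driven loop short; each string is compiled as a regular expression at match time.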

 

Three Forms of Regular Expressions:

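Perl programs use regular expressions in three forms:
1. Match: m/<regexp>/ (can be abbreviated as /<regexp>/, omitting the m)
2. Substitution: s/<pattern>/<replacement>/
3. Transliteration: tr/<searchlist>/<replacementlist>/ (strictly speaking, tr does not take a regular expression; it maps each character in the first list to the corresponding character in the second)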

 

Examples of the three forms:

#!/usr/bin/perl
use strict;
use warnings;

my $mystring = "I love perl!";
print("The given string is: $mystring\n");

# Match: does the string contain "perl"?
if ($mystring =~ m/perl/) {
    print("The given string contains the string perl!\n");
}

# Substitute: replace the first "perl" with "best"
$mystring =~ s/perl/best/;
print("Replaced string: $mystring\n");

# Transliterate: convert every lowercase letter to uppercase
$mystring =~ tr/a-z/A-Z/;
print("Converted string: $mystring\n");
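The example above ignores the value each operator returns. The following sketch (the sample string and variable names are our own, for illustration) shows those return values: m// returns true or false, s/// returns the number of substitutions made (only the first occurrence is replaced unless the g modifier is used), and tr/// returns the number of characters it translated. With an empty replacement list, tr/// only counts and leaves the string unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "perl is fun, perl is fast";

# m// returns true or false
my $found = ($text =~ m/perl/) ? 1 : 0;

# s/// returns the number of substitutions; without /g only the first match is replaced
my $once = $text;
my $n_once = ($once =~ s/perl/Perl/);
my $all = $text;
my $n_all = ($all =~ s/perl/Perl/g);

# tr/// returns the number of characters translated
my $upper = $text;
my $n_upper = ($upper =~ tr/a-z/A-Z/);

# With an empty replacement list, tr/// only counts; $text is left unchanged
my $n_letters = ($text =~ tr/a-z//);

print "found: $found\n";
print "s/// once ($n_once): $once\n";
print "s///g ($n_all): $all\n";
print "tr/// ($n_upper): $upper\n";
print "lowercase letters counted: $n_letters\n";
```

Copying the string before each substitution keeps the original $text intact, so the three operators can be compared on the same input.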

Common Patterns in Regular Expressions

Below are some common patterns in regular expressions.

Pattern       Meaning
.             Matches any single character except a newline
x?            Matches x zero times or once
x*            Matches x zero or more times (greedy: as many as possible; x*? matches as few as possible)
x+            Matches x one or more times (greedy: as many as possible; x+? matches as few as possible)
.*            Matches any character zero or more times
.+            Matches any character one or more times
{m}           Matches the preceding item exactly m times
{m,n}         Matches the preceding item at least m and at most n times
{m,}          Matches the preceding item at least m times
[]            Matches any one of the characters listed inside []
[^]           Matches any one character not listed inside []
[0-9]         Matches any digit
[a-z]         Matches any lowercase letter
[^0-9]        Matches any non-digit character
[^a-z]        Matches any character that is not a lowercase letter
^             Matches at the beginning of the string
$             Matches at the end of the string
\d            Matches a digit; same as [0-9]
\d+           Matches a string of one or more digits; same as [0-9]+
\D            Matches a non-digit character; same as [^0-9]
\D+           Matches a string of one or more non-digits; same as [^0-9]+
\w            Matches a word character (letter, digit, or underscore); same as [a-zA-Z0-9_]
\w+           Same as [a-zA-Z0-9_]+
\W            Matches a non-word character; same as [^a-zA-Z0-9_]
\W+           Same as [^a-zA-Z0-9_]+
\s            Matches a whitespace character; same as [ \n\t\r\f]
\s+           Same as [ \n\t\r\f]+
\S            Matches a non-whitespace character; same as [^ \n\t\r\f]
\S+           Same as [^ \n\t\r\f]+
\b            Matches a word boundary (a position between a word character and a non-word character)
\B            Matches a position that is not a word boundary
a|b|c         Matches a, b, or c
abc           Matches the string abc
(pattern)     Parentheses remember the matched substring, a very useful feature. The text matched by the first () is stored in $1 (written \1 inside the pattern itself), the text matched by the second () in $2 (or \2), and so on.
/pattern/i    The i modifier ignores case, so letters match regardless of upper or lower case.
\             To match a character that has a special meaning in a pattern, such as *, put \ before it to remove that special meaning.
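A few of the entries above are easy to misread: greedy versus minimal repetition, \w including the underscore, \b being a position rather than a character, and backreferences such as \1. The sketch below exercises them on sample strings we made up for illustration; the variable names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $html = '<b>bold</b>';
# Greedy: .* runs as far as possible, up to the last '>'
my ($greedy)  = $html =~ /<(.*)>/;     # b>bold</b
# Minimal: .*? stops at the first '>'
my ($minimal) = $html =~ /<(.*?)>/;    # b

my $line = 'order 42 shipped_on 2024-05-01';
# With /g in list context, every match of the capture is returned
my @numbers = $line =~ /(\d+)/g;       # 42 2024 05 01
my @words   = $line =~ /(\w+)/g;       # \w includes digits and _, so shipped_on stays whole
my @fields  = split /\s+/, $line;      # split on runs of whitespace

# \b is a position, not a character: "cat" inside "concatenate" is not a whole word
my $text  = 'cat concatenate cat';
my $whole = () = $text =~ /\bcat\b/g;  # 2
my $any   = () = $text =~ /cat/g;      # 3

# \1 refers back to the first capture inside the same pattern: find a doubled word
my ($doubled) = 'this is is a test' =~ /\b(\w+)\s+\1\b/;   # is

# /i ignores case; \* matches a literal asterisk
my $nocase  = ('I love PERL' =~ /perl/i) ? 1 : 0;   # 1
my $literal = ('2*3' =~ /2\*3/) ? 1 : 0;            # 1

print "greedy: $greedy\nminimal: $minimal\n";
print "numbers: @numbers\nwords: @words\nfields: @fields\n";
print "whole-word cat: $whole, any cat: $any, doubled: $doubled\n";
print "case-insensitive: $nocase, literal *: $literal\n";
```

The `() =` idiom forces list context so the match returns every hit, and assigning that list to a scalar yields the number of matches.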

The following are some examples:

Example       Meaning
/perl/        Finds a string containing perl
/^perl/       Finds a string starting with perl
/perl$/       Finds a string ending with perl
/c|g|i/       Finds a string containing c, g, or i
/cg{2,4}i/    Finds c followed by 2 to 4 g's, followed by i
/cg{2,}i/     Finds c followed by at least 2 g's, followed by i
/cg{2}i/      Finds c followed by exactly 2 g's, followed by i
/cg*i/        Finds c followed by zero or more g's, followed by i; same as /cg{0,}i/
/cg+i/        Finds c followed by one or more g's, followed by i; same as /cg{1,}i/
/cg?i/        Finds c followed by zero or one g, followed by i; same as /cg{0,1}i/
/c.i/         Finds c followed by any single character, followed by i
/c..i/        Finds c followed by any two characters, followed by i
/[cgi]/       Finds a string containing any one of the three characters c, g, i
/[^cgi]/      Finds a character that is none of the three characters c, g, i
/\d/          Finds a digit. /\d+/ matches a string of one or more digits.
/\D/          Finds a non-digit character. /\D+/ matches a string of one or more non-digit characters.
/\*/          Finds the * character. Because * has a special meaning in regular expressions, it must be preceded by \ to remove that meaning.
/abc/i        Finds abc regardless of upper or lower case.
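To check the repetition examples above, this sketch runs the /cg...i/ patterns against strings with zero to five g's (the test strings are chosen for illustration) and prints which strings each pattern accepts. It also checks /[^cgi]/, which succeeds as soon as the string contains one character outside c, g, and i.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Strings with 0 to 5 g's between c and i
my @strings  = qw(ci cgi cggi cgggi cggggi cgggggi);
my @patterns = ('cg{2,4}i', 'cg{2,}i', 'cg{2}i', 'cg*i', 'cg+i', 'cg?i');

my %accepted;
for my $pattern (@patterns) {
    # grep keeps the strings that contain a match for this pattern
    $accepted{$pattern} = [ grep { /$pattern/ } @strings ];
    printf "%-12s %s\n", "/$pattern/", join(' ', @{ $accepted{$pattern} });
}

# [^cgi] matches one character that is not c, g, or i
my $has_other = ('cgix' =~ /[^cgi]/) ? 1 : 0;   # 1: x is outside the class
my $only_cgi  = ('gic'  =~ /[^cgi]/) ? 1 : 0;   # 0: every character is c, g, or i
print "cgix: $has_other, gic: $only_cgi\n";
```

Because every test string has a single c, the unanchored patterns behave here as if they were anchored, which keeps the output easy to compare with the descriptions.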

 

 

Eight Principles of Regular Expressions

If you have used the sed, awk, and grep commands on UNIX, regular expressions in Perl will feel familiar.

Regular expressions are built into the Perl language, which is a large part of why Perl is so capable at processing strings. Perl programs use regular expressions frequently, and CGI programs are no exception.

Regular expressions are difficult for Perl beginners, but once you have mastered the syntax, you gain almost unlimited pattern-matching power, and a large part of Perl programming comes down to mastering regular expressions.

The following are eight principles for working with regular expressions.

Regular expressions are a powerful ally in the battle with data, and it often is a battle. Keep the following eight principles in mind:

· Principle 1: Regular expressions come in three forms: matching (m//), substitution (s///, optionally with modifiers such as e and g), and transliteration (tr///).

· Principle 2: Regular expressions match only scalar values ($scalar =~ m/a/; works, but @array =~ m/a/ treats @array as a scalar, namely its element count, so it will probably not do what you want).

· Principle 3: A regular expression finds the earliest (leftmost) possible match of the given pattern. By default, a match or substitution happens only once ($a = 'string string2'; $a =~ s/string//; leaves $a = ' string2'); use the g modifier to replace every occurrence.

· Principle 4: A pattern can contain anything a double-quoted string can, including variables ($a =~ m/$varb/ expands $varb before matching. If $varb = 'a' and $a = 'as', then $a =~ s/$varb//; is equivalent to $a =~ s/a//;, and the result is $a = 's').

· Principle 5: Evaluating a regular expression produces two kinds of results: a status (whether the match succeeded) and backreferences. $a =~ m/pattern/ reports whether the substring pattern occurs in $a; $a =~ s/(word1)(word2)/$2$1/ swaps the two words.

· Principle 6: The core power of regular expressions lies in wildcards and repetition operators, and in how they operate. $a =~ m/\w+/ matches one or more word characters; $a =~ m/\d*/ matches zero or more digits.

· Principle 7: To match one of several alternatives, Perl uses | for flexibility. m/(cat|dog)/ means "match the string cat or dog".

· Principle 8: Perl's (?...) syntax provides extensions to regular expressions, such as (?:...) for grouping without capturing and (?i) for case-insensitive matching. (See related material for details.)
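The principles above can be tried out in a few lines each. This is a minimal sketch using the sample values from the principles where they were given ('string string2', $varb = 'a', cat|dog); the other strings and all variable names are our own, chosen for illustration. We avoid the name $a from the text because Perl reserves $a and $b for sort.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Principle 2: =~ works on a scalar, so test array elements one at a time (here with grep)
my @array  = ('apple', 'banana', 'cherry');
my @with_a = grep { /a/ } @array;                 # apple banana

# Principle 3: without /g, s/// replaces only the leftmost match
my $first_only = 'string string2';
$first_only =~ s/string//;                        # ' string2'
my $every = 'string string2';
$every =~ s/string//g;                            # ' 2'

# Principle 4: variables are interpolated into the pattern before matching
my $varb   = 'a';
my $interp = 'as';
$interp =~ s/$varb//;                             # 's'

# Principle 5: the match status, plus backreferences reused in the replacement
my $swap   = 'word1 word2';
my $status = ($swap =~ m/word/) ? 1 : 0;          # 1
$swap =~ s/(\w+) (\w+)/$2 $1/;                    # 'word2 word1'

# Principle 7: alternation with |
my ($animal) = 'hotdog' =~ /(cat|dog)/;           # dog

# Principle 8: (?i) turns on case-insensitivity;
# (?:...) groups without capturing, so (c) becomes $1
my ($lang)   = 'I love PERL' =~ /(?i)(perl)/;     # PERL
my @captures = 'ababc' =~ /(?:ab)+(c)/;           # c

print "with a: @with_a\n";
print "first only: [$first_only], every: [$every]\n";
print "interpolated: $interp, status: $status, swapped: $swap\n";
print "alternation: $animal, (?i): $lang, (?:...) captures: @captures\n";
```

The square brackets in the output make the leading space left by s/string// visible.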

Want to learn all these principles? I suggest starting with simple cases and continuing to try and experiment. In fact, once you know that $a =~ m/error/ finds the substring error in $a, you already have more string-processing power than lower-level languages such as C offer.
