Perl Regular Expression Reference

Last Update: 2018-10-04 Source: Internet

Author: User

Two earlier articles on regular expressions:

Basic regular expressions
Perl regular expressions

This article extends the Perl regular expressions article. Its main topic is creating regex objects with qr//, along with a few other tricks.

Creating a regex object with qr//

Because variables are interpolated inside a regex pattern, we can store part of a pattern in a variable ahead of time. For example:

$str="hello worlds gaoxiaofang";
$pattern="w.*d";
$str =~ /$pattern/;
print "$&\n";

However, this approach has a serious drawback: the pattern is an ordinary string first, so you have to keep its special characters from being processed by the string quoting rules before the regex engine sees them. For example, in a double-quoted string the sequence \d must be written as \\d, otherwise the backslash is lost.
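A small sketch of how a pattern kept in an ordinary string behaves, assuming it is written once as a single-quoted and once as a double-quoted literal. The variable names and sample text are made up for illustration. The string quoting rules run before the regex engine sees the pattern, so a backslash has to survive them first:

```perl
use strict;
use warnings;

my $str = "order 42 shipped";

# Single-quoted literal: the backslash is kept, so the regex sees \d+
my $single = '\d+';
# Double-quoted literal: the backslash has to be doubled to reach the regex
my $double = "\\d+";

my ($n1) = $str =~ /($single)/;
my ($n2) = $str =~ /($double)/;

print "single-quoted pattern matched: $n1\n";
print "double-quoted pattern matched: $n2\n";
```

Both variables end up holding the same three characters, which is why both matches succeed.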

Perl provides qr/pattern/, which builds the pattern into a regex object. You can then:

Use this object directly as a regular expression
Save this object in a variable and use the variable wherever a regex is expected
Interpolate the variable into other patterns to build more complex regular expressions

Notes:

The slash delimiters of qr// can be replaced with other symbols: paired brackets such as qr() qr{} qr<> qr[], or the same character on both sides, such as qr%% qr## qr!! qr$$ qr"" qr'' and so on.
Single quotes as delimiters are special (that is, qr'pattern'): the pattern is parsed like a single-quoted string, so a variable such as $var is not interpolated and stays in the pattern as the four characters written. Regex metacharacters still keep their meaning, however; for example, $ still means end of line.
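The delimiter rules can be tried out with a short sketch. The sample strings and variable names below are illustrative assumptions, not part of the original article:

```perl
use strict;
use warnings;

my $var = "world";

# Different delimiters build the same kind of object
my $slash  = qr/w.*d/;
my $braces = qr{w.*d};
my $hashes = qr#w.*d#;

# With ordinary delimiters, $var is interpolated into the pattern
my $interpolated = qr/hello $var/;
# With single quotes, $var is NOT interpolated; the text stays as written
my $literal      = qr'hello $var';

print "$slash $braces $hashes\n";
print "$interpolated\n";
print "$literal\n";

my $hit  = "hello world" =~ $interpolated ? 1 : 0;
my $miss = "hello world" =~ $literal      ? 1 : 0;
print "interpolated matches: $hit, literal matches: $miss\n";
```

The single-quoted pattern does not match here because its $ is still an end-of-line anchor, and nothing can follow it on the same line.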

$str="hello worlds gaoxiaofang";

# Use directly as a regular expression
$str =~ qr/w.*d/;
print "$&\n";

# Save in a variable, then use the variable as a regular expression
$pattern=qr/w.*d/;
$str =~ $pattern; # (1)
$str =~ /$pattern/; # (2)
print "$&\n";

# Save in a variable, then use it as part of a regular expression
$pattern=qr/w.*d/;
$str =~ /hel.* $pattern/;
print "$&\n";

A regex object can also carry its own modifiers. For example, the modifier for ignoring case is i. When the object is interpolated into a larger pattern, only that part ignores case during the actual match, and the rest of the pattern stays case-sensitive.

$str="HELLO wORLDs gaoxiaofang";

$pattern=qr/w.*d/i; # ignore case

$str =~ /HEL.* $pattern/; # Matches: only the $pattern part ignores case
$str =~ /hel.* $pattern/; # Fails: "hel" is still case-sensitive
$str =~ /hel.* $pattern/i; # Matches: the whole pattern ignores case

How qr builds a regex object

Print the regex objects built by qr to see what their structure looks like:

$patt1=qr/w.*d/;
print "$patt1\n";

$patt2=qr/w.*d/i; # plus modifier i
print "$patt2\n";

$patt3=qr/w.*d/img; # plus modifiers i, m and g
print "$patt3\n";

These print statements produce the following output:

(?^:w.*d)
(?^i:w.*d)
(?^mi:w.*d)

What qr actually does is wrap the pattern we give it in (?^:) and carry the modifiers along, so the result always has the form (?^FLAGS:pattern).

Notice that the g modifier given for $patt3 is missing from the output. To see why, first look at what (?^:) does: it is a non-capturing group that also resets the modifiers. Reset to what? For the most part, only the modifiers "alupimsx" can appear in (?^FLAGS:), that is, (?^alupimsx:):

If a given modifier is not one of these, it is not recognized, and sometimes an error occurs.
If some of these modifiers are not given, they take their default values (the defaults may differ between Perl versions).

So the g above is not kept. It controls how a match operation is repeated rather than the pattern itself, so it cannot be stored in a regex object, and Perl may reject qr/.../g outright with an "Unknown regexp modifier" error.

Because qr wraps the pattern in (?^:), the fragment stays self-contained when it is interpolated into another pattern and is not affected by the outer pattern's modifiers.

$patt1=qr/w.*d/im;
$patt2=qr/hel.*d $patt1/i;
print "$patt2\n"; # Output: (?^i:hel.*d (?^mi:w.*d))
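The independence of an embedded fragment can be checked with a minimal sketch. The pattern and test strings are assumptions chosen for illustration:

```perl
use strict;
use warnings;

my $word = qr/world/;   # no modifiers: this fragment stays case-sensitive

# The outer /i applies to "hello", but not to the embedded fragment
my $outer = qr/hello $word/i;
print "$outer\n";

my $r1 = "HELLO world" =~ $outer ? "match" : "no match";
my $r2 = "HELLO WORLD" =~ $outer ? "match" : "no match";
print "HELLO world: $r1\n";
print "HELLO WORLD: $r2\n";
```

The second string fails because the embedded fragment resets the modifiers for its own group, so WORLD is compared case-sensitively against world.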

Using regex references as scalars

Because the regex object created by qr// is a scalar (a reference), it can appear anywhere a scalar can, for example as a value in a hash or an element of an array.

For example, put several of them in an array to form a list of regular expressions, then match a given target against each pattern in the list in turn.

use v5.10.1;
my @patterns = (
     qr/(?:Willie)?Gilligan/,
     qr/Mary Ann/,
     qr/Ginger/,
     qr/(?:The )?Professor/,
     qr/Skipper/,
     qr/Mrs?\. Howell/,
);

my $name = 'Ginger';
foreach my $pattern ( @patterns ) {
     if( $name =~ /$pattern/ ) {
         say "Match!";
         print "$pattern\n";
         last;   # stop at the first pattern that matches
     }
}

You can also put these regex references in a hash and use a key to identify each pattern, so that every named pattern stands for the thing it matches:

use v5.10.1;
my %patterns = (
     Gilligan => qr/(?:Willie)?Gilligan/,
     'Mary Ann' => qr/Mary Ann/,
     Ginger => qr/Ginger/,
     Professor => qr/(?:The )?Professor/,
     Skipper => qr/Skipper/,
     'A Howell' => qr/Mrs?\. Howell/,
);
my $name = 'Ginger';
my( $match ) = grep { $name =~ $patterns{$_} } keys %patterns;
say "Matched $match" if $match;

The result of grep is assigned to a single scalar, so only the first matching key is kept. Because keys returns the hash keys in no fixed order, if more than one pattern matches $name, the value of $match may differ from one run to the next.
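One way to make the result repeatable is to sort the keys before filtering. This is a sketch with a made-up hash in which two patterns match the same name; the keys and patterns are illustrative only:

```perl
use strict;
use warnings;

my %patterns = (
    Howell  => qr/Howell/,
    'Mr. H' => qr/Thurston/,
    Skipper => qr/Skipper/,
);
my $name = 'Thurston Howell';

# Sorting the keys fixes the order, and therefore the first hit
my @matches = grep { $name =~ $patterns{$_} } sort keys %patterns;
my ($first) = @matches;

print "All matching keys: @matches\n";
print "First match: $first\n";
```

Keeping the full list in @matches also makes it visible when more than one pattern applies.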

Building Complex Regular Expressions

With qr, you can break a regular expression into small pieces and then combine them. For example:

my $howells = qr/Thurston|Mrs/;
my $tagalongs = qr/Ginger|Mary Ann/;
my $passengers = qr/$howells|$tagalongs/;
my $crew = qr/Gilligan|Skipper/;
my $everyone = qr/$crew|$passengers/;
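The combined pattern can be exercised with a short sketch. The list of candidate names is an assumption added for illustration:

```perl
use strict;
use warnings;

my $howells    = qr/Thurston|Mrs/;
my $tagalongs  = qr/Ginger|Mary Ann/;
my $passengers = qr/$howells|$tagalongs/;
my $crew       = qr/Gilligan|Skipper/;
my $everyone   = qr/$crew|$passengers/;

# Each qr// fragment is wrapped in its own group, so the alternations nest safely
my @candidates = ('Skipper', 'Mary Ann', 'Mr. Howell', 'Ginger');
my @found      = grep { /^(?:$everyone)$/ } @candidates;

print "Recognized: ", join(', ', @found), "\n";
```

The extra (?:...) around $everyone is only needed because of the anchors; it keeps ^ and $ applying to every alternative rather than just the first and last.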

As a larger example, RFC 1738 dissects every part of a URL. Converted to Perl regular expressions, it looks roughly like this (it is enough to get the general idea):

# Reusable basic character classes
my $alpha = qr/[a-z]/;
my $digit = qr/\d/;
my $alphadigit = qr/(?i:$alpha|$digit)/;
my $safe = qr/[\$_.+-]/;
my $extra = qr/[!*'\(\),]/;
my $national = qr/[{}|\\^~\[\]`]/;
my $reserved = qr|[;/?:@&=]|;
my $hex = qr/(?i:$digit|[A-F])/;
my $escape = qr/%$hex$hex/;
my $unreserved = qr/$alpha|$digit|$safe|$extra/;
my $uchar = qr/$unreserved|$escape/;
my $xchar = qr/$unreserved|$reserved|$escape/;
my $ucharplus = qr/(?:$uchar|[;?&=])*/;
my $digits = qr/(?:$digit){1,}/;

# Reusable URL components
my $hsegment = $ucharplus;
my $hpath = qr|$hsegment(?:/$hsegment)*|;
my $search = $ucharplus;
my $scheme = qr|(?i:https?://)|;
my $port = qr/$digits/;
my $password = $ucharplus;
my $user = $ucharplus;

my $toplevel = qr/$alpha|$alpha(?:$alphadigit|-)*$alphadigit/;
my $domainlabel = qr/$alphadigit|$alphadigit(?:$alphadigit|-)*$alphadigit/x;
my $hostname = qr/(?:$domainlabel\.)*$toplevel/;
my $hostnumber = qr/$digits\.$digits\.$digits\.$digits/;
my $host = qr/$hostname|$hostnumber/;
my $hostport = qr/$host(?::$port)?/;
my $login = qr/(?:$user(?::$password)\@)?/;

my $urlpath = qr/(?:(?:$xchar)*)/;

With these pieces we can assemble a regular expression that looks incredibly complex when expanded, and use it to check whether an input line contains a valid HTTP URL:

use v5.10.1;
my $httpurl = qr|$scheme$hostport(?:/$hpath(?:\?$search)?)?|;
while( <> ) {
    say if /$httpurl/;
}
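To try the composition idea without the full set of definitions, here is a reduced sketch that only covers a numeric host with an optional port. It reuses the names $digit, $digits, $port, $hostnumber and $hostport from above, but $hostport is simplified to numeric hosts only, and the candidate strings are assumptions:

```perl
use strict;
use warnings;

my $digit      = qr/\d/;
my $digits     = qr/(?:$digit){1,}/;
my $port       = qr/$digits/;
my $hostnumber = qr/$digits\.$digits\.$digits\.$digits/;
my $hostport   = qr/$hostnumber(?::$port)?/;   # simplified: numeric hosts only

# \A and \z anchor the whole string, so partial matches are rejected
my %result;
for my $candidate ('192.168.0.1', '192.168.0.1:8080', '192.168.0', 'example:80') {
    $result{$candidate} = $candidate =~ /\A$hostport\z/ ? 'ok' : 'rejected';
    print "$candidate => $result{$candidate}\n";
}
```

Like the original definitions, this checks only the shape of the address; it does not verify that each number is at most 255.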

Regular Expression Module

The regular expression built above is very complex, and many commonly needed regular expressions have already been built by others as ready-made wheels that we can simply use. For example, the Regexp::Common module provides a large number of prebuilt regular expressions.

Install this module first:

sudo cpan -i Regexp::Common

The following ready-made modules are provided by Regexp::Common on CPAN, for reference: https://metacpan.org/release/Regexp-Common

Regexp::Common - Provide commonly requested regular expressions
Regexp::Common::CC - provide patterns for credit card numbers.
Regexp::Common::SEN - provide regexes for Social-Economical Numbers.
Regexp::Common::URI - provide patterns for URIs.
Regexp::Common::URI::RFC1035 - Definitions from RFC1035;
Regexp::Common::URI::RFC1738 - Definitions from RFC1738;
Regexp::Common::URI::RFC1808 - Definitions from RFC1808;
Regexp::Common::URI::RFC2384 - Definitions from RFC2384;
Regexp::Common::URI::RFC2396 - Definitions from RFC2396;
Regexp::Common::URI::RFC2806 - Definitions from RFC2806;
Regexp::Common::URI::fax - Returns a pattern for fax URIs.
Regexp::Common::URI::file - Returns a pattern for file URIs.
Regexp::Common::URI::ftp - Returns a pattern for FTP URIs.
Regexp::Common::URI::gopher - Returns a pattern for gopher URIs.
Regexp::Common::URI::http - Returns a pattern for HTTP URIs.
Regexp::Common::URI::news - Returns a pattern for file URIs.
Regexp::Common::URI::pop - Returns a pattern for POP URIs.
Regexp::Common::URI::prospero - Returns a pattern for prospero URIs.
Regexp::Common::URI::tel - Returns a pattern for telephone URIs.
Regexp::Common::URI::telnet - Returns a pattern for telnet URIs.
Regexp::Common::URI::tv - Returns a pattern for tv URIs.
Regexp::Common::URI::wais - Returns a pattern for WAIS URIs.
Regexp::Common::_support - Support functions for Regexp::Common.
Regexp::Common::balanced - provide regexes for strings with balanced parenthesized delimiters or arbitrary delimiters.
Regexp::Common::comment - provide regexes for comments.
Regexp::Common::delimited - provides a regex for delimited strings
Regexp::Common::lingua - provide regexes for language related stuff.
Regexp::Common::list - provide regexes for lists
Regexp::Common::net - provide regexes for IPv4, IPv6, and MAC addresses.
Regexp::Common::number - provide regexes for numbers
Regexp::Common::profanity - provide regexes for profanity
Regexp::Common::whitespace - provides a regex for leading or trailing whitespace
Regexp::Common::zip - provide regexes for postal codes.

These regular expressions are stored in a nested hash named %RE. For example, the module Regexp::Common::URI::http provides a regular expression for HTTP URIs. It is nested two levels deep: the first-level key is URI, its value is the second-level hash, and the key in that hash is HTTP, so the regex is available as $RE{URI}{HTTP}.

For example, to check whether an input line contains a valid HTTP URL:

use Regexp::Common qw(URI);
while( <> ) {
    print if /$RE{URI}{HTTP}/;
}

When learning shell scripting, people often write a regular expression that matches an IPv4 address. Now we can get one directly from Regexp::Common::net:

use Regexp::Common qw(net);
$ipv4=$RE{net}{IPv4};
print $ipv4;

Here are the results:

(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))

Note that this regex should be anchored when it is used for matching. Otherwise a string such as 318.99.183.11 also matches, because its substring 18.99.183.11 is a valid match. So add anchors at both ends, for example:

$ipv4 =~ /^$RE{net}{IPv4}$/;
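The effect of anchoring can be reproduced without installing the module. In this sketch the octet pattern is copied from the output printed above, and the names $octet and $input are illustrative assumptions. It uses \A and \z, a stricter variant of ^ and $:

```perl
use strict;
use warnings;

# One octet, written the same way as in the pattern printed above
my $octet = qr/(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})/;
my $ipv4  = qr/$octet\.$octet\.$octet\.$octet/;

my $input = '318.99.183.11';

# Unanchored: the substring 18.99.183.11 is enough for a match
my $unanchored = $input =~ /$ipv4/     ? 1 : 0;
# Anchored: the whole string has to be a valid address
my $anchored   = $input =~ /\A$ipv4\z/ ? 1 : 0;

print "unanchored: $unanchored, anchored: $anchored\n";
```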

The IPv4 pattern above can be adapted (by dropping the non-capturing groups) to the extended regular expressions that shell tools generally support:

(25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})(\.(25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})){3}

By default, the patterns in Regexp::Common do not capture anything. If you want to use references such as $1 ... $N, you need to add the {-keep} option. As for what each group captures, refer to the module documentation.

For example:

use v5.10.1;   # needed for say
use Regexp::Common qw(number);
while( <> ) {
    say $1 if /$RE{num}{int}{-base => 16}{-keep}/;
}
