Perl Regular Expression Reference


This article follows two earlier articles on regular expressions:

    • Basic regular expressions
    • Perl regular expressions

It extends the Perl regular expressions article a little. Its main topics are creating regex objects with qr// and a few other techniques.

Creating a regex object with qr//

Because variables are interpolated inside a regex pattern, part of a pattern can be stored in a variable ahead of time. For example:

    $str = "hello worlds gaoxiaofang";
    $pattern = "w.*d";
    $str =~ /$pattern/;
    print "$&\n";

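However, this approach has a significant drawback: the pattern starts life as an ordinary string, so you have to make sure string quoting does not alter the special characters stored in it. In a double-quoted string, for example, backslash escapes such as \b are processed before the regex engine ever sees the pattern, and $ or @ may trigger interpolation. (A / stored in the variable is not a problem even with m// delimiters, because Perl locates the closing delimiter before it interpolates.)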
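As a minimal sketch of how a pattern kept in a plain string can go wrong, the example below stores the same text once in a double-quoted string and once in a qr// object. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $text = "a word here";

# In a double-quoted string, "\b" is the backspace character,
# not a word boundary.
my $as_string = "\bword\b";

# qr// parses the same text as a regex, so \b keeps its regex meaning.
my $as_regex = qr/\bword\b/;

my $string_hit = ($text =~ /$as_string/) ? "match" : "no match";
my $regex_hit  = ($text =~ /$as_regex/)  ? "match" : "no match";

print "string pattern: $string_hit\n";   # no match
print "qr pattern:     $regex_hit\n";    # match
```

Writing the string with single quotes would avoid this particular problem, but the pattern would still not be compiled or checked until it is used.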

Perl provides qr/pattern/, which compiles the pattern into a regex object. You can then:

    • Use the object directly as a regular expression
    • Save the object in a variable and use the stored regex object by referring to that variable
    • Interpolate the variable into other patterns to build more complex regular expressions

Note that:

    • The slash delimiters of qr// can be replaced with other symbols: paired brackets such as qr() qr{} qr<> qr[], or a repeated single character such as qr%% qr## qr!! qr$$ qr"" qr'' and so on.
    • Single quotes as delimiters are special (that is, qr'pattern'): the pattern is parsed the way a single-quoted string is, so a variable such as $var is not interpolated and is passed to the regex engine as the four characters $, v, a, r. Regex metacharacters still keep their meaning, so $ still means end of line.

    $str = "hello worlds gaoxiaofang";

    # Use directly as the regular expression
    $str =~ qr/w.*d/;
    print "$&\n";

    # Save in a variable, then use it as the regular expression
    $pattern = qr/w.*d/;
    $str =~ $pattern;      # (1)
    $str =~ /$pattern/;    # (2)
    print "$&\n";

    # Save in a variable, then use it as part of a regular expression
    $pattern = qr/w.*d/;
    $str =~ /hel.* $pattern/;
    print "$&\n";
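The delimiter rules above can be sketched as follows. This is an illustration under the assumption that the target string is "abc"; all variable names are made up for the example.

```perl
use strict;
use warnings;

my $var    = "abc";
my $target = "abc";

# Ordinary delimiters interpolate $var, so each of these is the pattern ^abc$.
my @interpolating = ( qr/^$var$/, qr{^$var$}, qr!^$var$! );

# Single-quote delimiters do not interpolate: the regex engine receives
# the text ^$var$ as written, where each $ is an end-of-line anchor.
my $literal = qr'^$var$';

# grep in scalar context counts the patterns that match.
my $interp_hits = grep { $target =~ $_ } @interpolating;
my $literal_hit = ($target =~ $literal) ? 1 : 0;

print "interpolating patterns matching '$target': $interp_hits\n";   # 3
print "single-quoted pattern matches '$target': $literal_hit\n";      # 0
```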

qr// also lets you set modifiers on the regex object. For example, the modifier for ignoring case is i; when the match actually runs, only the part of the pattern that comes from this regex object ignores case, and the rest stays case sensitive.

    $str = "HELLO wORLDs gaoxiaofang";
    $pattern = qr/w.*d/i;         # ignore case
    $str =~ /HEL.* $pattern/;     # matches; the $pattern part ignores case
    $str =~ /hel.* $pattern/;     # fails
    $str =~ /hel.* $pattern/i;    # matches; everything ignores case

How qr builds a regex object

Print the regex objects that qr builds to see their structure:

    $patt1 = qr/w.*d/;
    print "$patt1\n";
    $patt2 = qr/w.*d/i;      # with modifier i
    print "$patt2\n";
    $patt3 = qr/w.*d/img;    # with modifiers i, m and g
    print "$patt3\n";

The prints above output the following:

    (?^:w.*d)
    (?^i:w.*d)
    (?^mi:w.*d)

What qr actually does is wrap the pattern we give it in (?^:), carrying along any modifiers, so the result always has the form (?^FLAGS:pattern).

But the g modifier given for $patt3 above is missing. First look at what (?^:) does: it is a non-capturing group that also resets the modifiers. Which modifiers does it reset to? For (?^FLAGS:), only the modifiers "alupimsx" are available, that is, (?^alupimsx:):

    • If a given modifier is not in this set, it is not recognized, and in some cases an error is raised.
    • For modifiers in this set, any that are not given take their default values (which defaults are on may differ between Perl versions).

So the g above is not kept in the regex object, and recent Perl versions reject it outright at compile time with an "Unknown regexp modifier" error.
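One way to check which modifiers a regex object actually carries, without relying on how it prints, is the regexp_pattern function from the core re module. The sketch below assumes Perl 5.10 or later; the variable names are illustrative.

```perl
use strict;
use warnings;
use re qw(regexp_pattern);

my $patt = qr/w.*d/im;

# In list context regexp_pattern returns the pattern text and
# a string holding the modifiers stored in the regex object.
my ($source, $mods) = regexp_pattern($patt);

# Sort the modifier letters so the printout does not depend on their order.
my $sorted_mods = join '', sort split //, $mods;

print "pattern:   $source\n";
print "modifiers: $sorted_mods\n";
```

Only pattern-level modifiers such as i and m show up here, which is consistent with g not being part of the regex object.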

Because qr wraps the pattern in (?^:), a regex object inserted into another pattern is guaranteed to stay independent and is not affected by the enclosing pattern's modifiers.

    $patt1 = qr/w.*d/im;
    $patt2 = qr/hel.*d $patt1/i;
    print "$patt2\n";     # Output: (?^i:hel.*d (?^mi:w.*d))

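A small sketch of this isolation in both directions: an outer /i does not leak into an embedded case-sensitive object, and an embedded /i does not spread to the outer pattern. The strings and variable names are made up for the example.

```perl
use strict;
use warnings;

my $inner_cs = qr/world/;               # case-sensitive fragment
my $outer_ci = qr/hello $inner_cs/i;    # /i covers "hello " only

my $inner_ci = qr/world/i;              # case-insensitive fragment
my $outer_cs = qr/hello $inner_ci/;     # "hello " stays case-sensitive

my @cases = (
    [ 'HELLO world', $outer_ci ],   # match
    [ 'hello WORLD', $outer_ci ],   # no match: inner part is case-sensitive
    [ 'hello WORLD', $outer_cs ],   # match
    [ 'HELLO world', $outer_cs ],   # no match: outer part is case-sensitive
);

my @outcome;
for my $case (@cases) {
    my ($text, $re) = @$case;
    push @outcome, ($text =~ $re) ? 'match' : 'no match';
    printf "%-12s %s\n", $text, $outcome[-1];
}
```
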
Using regex references as scalars

Since the regex object reference created by qr// is a scalar, it can appear anywhere a scalar can, for example in a hash or in an array.

For example, put several of them in an array to form a list of regular expressions, then take a target string and try each pattern in the list against it.

    use v5.10.1;

    my @patterns = (
        qr/(?:Willie )?Gilligan/,
        qr/Mary Ann/,
        qr/Ginger/,
        qr/(?:The )?Professor/,
        qr/Skipper/,
        qr/Mrs?. Howell/,
    );

    my $name = 'Ginger';
    foreach my $pattern ( @patterns ) {
        if( $name =~ /$pattern/ ) {
            say "Match!";
            print "$pattern";
            last;
        }
    }
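The same "stop at the first matching pattern" loop can also be sketched with first from the core List::Util module. This is an alternative to the loop in the article, not part of it, and it uses a shortened pattern list.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @patterns = (
    qr/(?:Willie )?Gilligan/,
    qr/Mary Ann/,
    qr/Ginger/,
);

my $name = 'Ginger';

# first returns the first element for which the block is true, or undef.
my $hit = first { $name =~ $_ } @patterns;

print defined $hit ? "Matched by $hit\n" : "No match\n";
```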

You can also put these regex references into a hash and give each pattern a key that identifies it, for example a key that names what the pattern is meant to match:

    use v5.10.1;

    my %patterns = (
        Gilligan   => qr/(?:Willie )?Gilligan/,
        'Mary Ann' => qr/Mary Ann/,
        Ginger     => qr/Ginger/,
        Professor  => qr/(?:The )?Professor/,
        Skipper    => qr/Skipper/,
        'A Howell' => qr/Mrs?. Howell/,
    );

    my $name = 'Ginger';
    my( $match ) = grep { $name =~ $patterns{$_} } keys %patterns;
    say "Matched $match" if $match;

The result of grep is assigned to a one-element list, so only the first matching key is kept. Because the order of hash keys is not fixed, if more than one pattern can match $name, the value of $match may differ from one run to the next.
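If a stable result is needed, one option is to collect every matching key and sort them. The sketch below uses a made-up hash in which two patterns match the same name; the key Redhead and its pattern are hypothetical.

```perl
use strict;
use warnings;

my %patterns = (
    Ginger  => qr/Ginger/,
    Redhead => qr/G\w+r/,     # hypothetical second pattern that also matches
    Skipper => qr/Skipper/,
);

my $name = 'Ginger';

# Keep all matching keys and sort them so the order is deterministic.
my @matches = sort { $a cmp $b } grep { $name =~ $patterns{$_} } keys %patterns;

print "Matched: @matches\n";   # Matched: Ginger Redhead
```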

Building Complex Regular Expressions

With qr, you can break a regular expression into small pieces and then combine them. For example:

    my $howells    = qr/Thurston|Mrs/;
    my $tagalongs  = qr/Ginger|Mary Ann/;
    my $passengers = qr/$howells|$tagalongs/;
    my $crew       = qr/Gilligan|Skipper/;
    my $everyone   = qr/$crew|$passengers/;
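A short sketch of using the combined pattern. Because each qr object is wrapped in its own group, anchors written around $everyone apply to the whole alternation rather than only to its first or last branch. The test names are made up.

```perl
use strict;
use warnings;

my $howells    = qr/Thurston|Mrs/;
my $tagalongs  = qr/Ginger|Mary Ann/;
my $passengers = qr/$howells|$tagalongs/;
my $crew       = qr/Gilligan|Skipper/;
my $everyone   = qr/$crew|$passengers/;

my %verdict;
for my $name ('Ginger', 'Skipper', 'Gingerbread', 'Lovey') {
    # \A and \z anchor the whole grouped alternation.
    $verdict{$name} = ($name =~ /\A$everyone\z/) ? 'aboard' : 'unknown';
    printf "%-12s %s\n", $name, $verdict{$name};
}
```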

As a larger example, the grammar for each part of a URL in RFC 1738, translated into Perl regular expressions, looks roughly like this (it is enough to get the general idea):

    # Reusable basic character classes
    my $alpha      = qr/[a-z]/;
    my $digit      = qr/\d/;
    my $alphadigit = qr/(?i:$alpha|$digit)/;
    my $safe       = qr/[\$_.+-]/;
    my $extra      = qr/[!*'\(\),]/;
    my $national   = qr/[{}|\\^~\[\]`]/;
    my $reserved   = qr|[;/?:@&=]|;
    my $hex        = qr/(?i:$digit|[A-F])/;
    my $escape     = qr/%$hex$hex/;
    my $unreserved = qr/$alpha|$digit|$safe|$extra/;
    my $uchar      = qr/$unreserved|$escape/;
    my $xchar      = qr/$unreserved|$reserved|$escape/;
    my $ucharplus  = qr/(?:$uchar|[;?&=])*/;
    my $digits     = qr/(?:$digit){1,}/;

    # Reusable URL components
    my $hsegment    = $ucharplus;
    my $hpath       = qr|$hsegment(?:/$hsegment)*|;
    my $search      = $ucharplus;
    my $scheme      = qr|(?i:https?://)|;
    my $port        = qr/$digits/;
    my $password    = $ucharplus;
    my $user        = $ucharplus;
    my $toplevel    = qr/$alpha|$alpha(?:$alphadigit|-)*$alphadigit/;
    my $domainlabel = qr/$alphadigit|$alphadigit(?:$alphadigit|-)*$alphadigit/x;
    my $hostname    = qr/(?:$domainlabel\.)*$toplevel/;
    my $hostnumber  = qr/$digits\.$digits\.$digits\.$digits/;
    my $host        = qr/$hostname|$hostnumber/;
    my $hostport    = qr/$host(?::$port)?/;
    my $login       = qr/(?:$user(?::$password)\@)?/;
    my $urlpath     = qr/(?:(?:$xchar)*)/;

We can then assemble these pieces into a regular expression that would look incredibly complex if written out in full, and use it to test whether a string is a valid HTTP URL:

    use v5.10.1;

    my $httpurl = qr|$scheme$hostport(?:/$hpath(?:\?$search)?)?|;

    while( <> ) {
        say if /$httpurl/;
    }

Regular expression modules

Building a regular expression like the one above by hand is complicated. Many commonly used regular expressions have already been written by others, so we can reuse those wheels instead of reinventing them. For example, the Regexp::Common module provides a large number of ready-made regular expressions.

Install this module first:

sudo cpan -i Regexp::Common

The following are the ready-made modules in the Regexp::Common distribution on CPAN, for reference: https://metacpan.org/release/Regexp-Common

    Regexp::Common - Provide commonly requested regular expressions
    Regexp::Common::CC - provide patterns for credit card numbers.
    Regexp::Common::SEN - provide regexes for Social-Economical Numbers.
    Regexp::Common::URI - provide patterns for URIs.
    Regexp::Common::URI::RFC1035 - Definitions from RFC1035;
    Regexp::Common::URI::RFC1738 - Definitions from RFC1738;
    Regexp::Common::URI::RFC1808 - Definitions from RFC1808;
    Regexp::Common::URI::RFC2384 - Definitions from RFC2384;
    Regexp::Common::URI::RFC2396 - Definitions from RFC2396;
    Regexp::Common::URI::RFC2806 - Definitions from RFC2806;
    Regexp::Common::URI::fax - Returns a pattern for fax URIs.
    Regexp::Common::URI::file - Returns a pattern for file URIs.
    Regexp::Common::URI::ftp - Returns a pattern for FTP URIs.
    Regexp::Common::URI::gopher - Returns a pattern for gopher URIs.
    Regexp::Common::URI::http - Returns a pattern for HTTP URIs.
    Regexp::Common::URI::news - Returns a pattern for file URIs.
    Regexp::Common::URI::pop - Returns a pattern for POP URIs.
    Regexp::Common::URI::prospero - Returns a pattern for prospero URIs.
    Regexp::Common::URI::tel - Returns a pattern for telephone URIs.
    Regexp::Common::URI::telnet - Returns a pattern for telnet URIs.
    Regexp::Common::URI::tv - Returns a pattern for tv URIs.
    Regexp::Common::URI::wais - Returns a pattern for WAIS URIs.
    Regexp::Common::_support - Support functions for Regexp::Common.
    Regexp::Common::balanced - provide regexes for strings with balanced parenthesized delimiters or arbitrary delimiters.
    Regexp::Common::comment - provide regexes for comments.
    Regexp::Common::delimited - provides a regex for delimited strings
    Regexp::Common::lingua - provide regexes for language related stuff.
    Regexp::Common::list - provide regexes for lists
    Regexp::Common::net - provide regexes for IPv4, IPv6, and MAC addresses.
    Regexp::Common::number - provide regexes for numbers
    Regexp::Common::profanity - provide regexes for profanity
    Regexp::Common::whitespace - provides a regex for leading or trailing whitespace
    Regexp::Common::zip - provide regexes for postal codes.

These regular expressions are stored in a nested hash named %RE. For example, the module Regexp::Common::URI::http provides a regular expression for HTTP URIs. It is nested two levels deep: the first-level key is URI, whose value is a second-level hash, and the key in that second-level hash is HTTP. The pattern is therefore available as $RE{URI}{HTTP}.

For example, to match a valid HTTP URL:

    use Regexp::Common qw(URI);

    while( <> ) {
        print if /$RE{URI}{HTTP}/;
    }

People learning shell scripting often write a regular expression that matches an IPv4 address. Now we can get one directly from Regexp::Common::net:

    use Regexp::Common qw(net);

    $ipv4 = $RE{net}{IPv4};
    print $ipv4;

Here are the results:

(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))

Note that the resulting pattern should be anchored when you actually match with it. Otherwise 318.99.183.11 is reported as a match, because its substring 18.99.183.11 satisfies the pattern. So add anchors at both ends, for example:

$ipv4 =~ /^$RE{net}{IPv4}$/;
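The effect of anchoring can be sketched without installing Regexp::Common by rebuilding the printed pattern by hand. The names $octet and $ipv4 below are local to this example, and the octet expression is copied from the output shown earlier.

```perl
use strict;
use warnings;

# One octet, copied from the pattern printed above.
my $octet = qr/(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})/;
my $ipv4  = qr/$octet(?:[.]$octet){3}/;

my %result;
for my $addr ('18.99.183.11', '318.99.183.11') {
    $result{$addr} = {
        unanchored => ($addr =~ /$ipv4/     ? 'match' : 'no match'),
        anchored   => ($addr =~ /\A$ipv4\z/ ? 'match' : 'no match'),
    };
    print "$addr: unanchored=$result{$addr}{unanchored}, ",
          "anchored=$result{$addr}{anchored}\n";
}
```

The sketch uses \A and \z rather than ^ and $ because $ also matches just before a trailing newline, which matters when the input comes from unchomped lines.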

Rewriting the IPv4 pattern above without non-capturing groups makes it usable with the extended regular expressions that shell tools generally support:

(25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})(\.(25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})){3}

By default, the patterns in Regexp::Common do not capture anything. To use references such as $1 ... $N, add the {-keep} option. What each group captures is described in the documentation of the individual module.

For example:

    use v5.10.1;
    use Regexp::Common qw(number);

    while( <> ) {
        say $1 if /$RE{num}{int}{ -base => 16 }{-keep}/;
    }
