Perl Regular Expressions second-week notes

Last Update:2015-10-12 Source: Internet

Author: User

Tags posix expression engine


1. Using regular expressions to modify text

Regular expressions can do more than search: they can also modify text, for example by substitution.

$var =~ m/regex/i;
$var =~ s/regex/replacement/i;

The replacement part between the slashes is treated like a double-quoted string, so it can contain variables such as $1 and $2, which refer to text captured by the match.
$var =~ s/regex/replacement/ changes the text in $var; if the regex does not match, no replacement takes place.

$var = "Jeff Friedl"; $var =~ s/Jeff/Jeffery/;   # $var is now "Jeffery Friedl"

But if you run $var =~ s/Jeff/Jeffery/ once more, you get:

# $var is now "Jefferyery Friedl"

Because Jeff still matches (inside Jeffery), the substitution happens again.
To avoid this, we have to make the match condition more specific.
What we want to match is the word Jeff, not just the four letters, so we use:

$var=~s/\bJeff\b/Jeffery/

This way, running $var =~ s/\bJeff\b/Jeffery/ a second time does not change the text again.
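A minimal sketch of the difference between the two substitutions, run twice each. The variable names $plain and $bounded are only illustrative and do not come from the original notes.

```perl
use strict;
use warnings;

# Without word boundaries, the pattern matches again inside "Jeffery".
my $plain = "Jeff Friedl";
$plain =~ s/Jeff/Jeffery/ for 1 .. 2;        # run the same substitution twice
print "$plain\n";                             # Jefferyery Friedl

# With \b on both sides, Jeff must be a whole word, so the second run
# finds nothing to replace.
my $bounded = "Jeff Friedl";
$bounded =~ s/\bJeff\b/Jeffery/ for 1 .. 2;
print "$bounded\n";                           # Jeffery Friedl
```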

1.1 Example one: a form letter generator

Suppose there is a form letter system that holds many letter templates, in which certain markers are replaced with different values for each recipient.
For example:
Dear =FIRST=,
Congratulations! You have won a =TRINKET=! Absolutely FREE! Would you like another =TRINKET= for the =FAMILY= family? Yes, =FULL=, you can!

Then set three variables:

$given = "Xiaoming"; $family = "Wang"; $prize = "10000-carat diamond";

Then we can fill in the template with these statements:

$letter =~ s/=FIRST=/$given/g;
$letter =~ s/=FAMILY=/$family/g;
$letter =~ s/=FULL=/$family $given/g;
$letter =~ s/=TRINKET=/priceless $prize/g;

/g is the global substitution modifier: it tells s/// to keep looking for further matches after each successful substitution, until no match is left, so that every occurrence in the text is replaced.
Result:
Dear Xiaoming,
Congratulations! You have won a priceless 10000-carat diamond! Absolutely FREE! Would you like another priceless 10000-carat diamond for the Wang family? Yes, Wang Xiaoming, you can!
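To see what the global modifier changes, here is a small sketch that applies the same substitution with and without /g. The template string and the variable names $once and $all are made up for illustration.

```perl
use strict;
use warnings;

my $template = "a =TRINKET= and another =TRINKET=";

# Without /g only the first (leftmost) marker is replaced.
my $once = $template;
$once =~ s/=TRINKET=/diamond/;
print "$once\n";    # a diamond and another =TRINKET=

# With /g every marker is replaced.
my $all = $template;
$all =~ s/=TRINKET=/diamond/g;
print "$all\n";     # a diamond and another diamond
```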

1.2 Example two: trimming number formats

Because of how computers represent floating-point numbers internally, a number may be printed as 9.05000000372272 when we only need two or three digits after the decimal point.
The requirement is: keep two digits after the decimal point, and keep the third digit as well if it is not 0. For example, 12.3750000000392 becomes 12.375, and 37.500 becomes 37.50.

$num=~s/(\.\d\d[1-9]?)\d*/$1/;
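As a quick check of this substitution, the sketch below wraps it in a helper and runs it on the numbers mentioned above. The helper name trim_decimals is an assumption for illustration; the inputs are passed as strings so that Perl's own number formatting does not interfere.

```perl
use strict;
use warnings;

# Keep two decimals, plus a third one only when it is non-zero.
sub trim_decimals {
    my ($num) = @_;
    $num =~ s/(\.\d\d[1-9]?)\d*/$1/;
    return $num;
}

print trim_decimals("12.3750000000392"), "\n";   # 12.375
print trim_decimals("37.500"), "\n";             # 37.50
print trim_decimals("9.05000000372272"), "\n";   # 9.05
```

The optional [1-9]? is what keeps a non-zero third digit; when the third digit is 0, it matches nothing and \d* swallows the rest.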

Adding commas to a number using lookaround
Large numbers are usually written with commas to make them easier to read.

print "The US population is $pop\n";

This prints The US population is 298444215, but 298,444,215 would be easier on the eye.

Intuitively, we should start from the right side of the number and, taking three digits at a time, add a comma if there are still digits to the left.
But a regular expression processes text from left to right.
Put another way, a comma should be added at every position that has a digit on its left and a number of digits on its right that is a multiple of 3.
For this kind of task we use lookaround.
A lookaround construct does not match any characters; it only matches a position in the text. We have seen this kind of thing before: \b, ^ and $ all match positions. Lookaround is more general than these, because you define the position it matches.

Lookahead: examines the text to the right of the current position, trying to match its subexpression. Positive lookahead is written (?=...); for example, (?=\d) succeeds if the character to the right of the current position is a digit.
Lookbehind: examines the text to the left of the current position, trying to match its subexpression. Positive lookbehind is written (?<=...); for example, (?<=\d) succeeds if the character to the left of the current position is a digit (that is, at a position immediately after a digit).

Lookaround does not consume ("occupy") any characters when it matches; it only matches a position.
Applying (?=Jeffery) to by Jeffery Friedl matches the position just before Jeffery.

By combining lookaround with constructs that really match characters, we can match more precisely, for example:
(?=Jeffery)Jeff
Matches the Jeff in by Jeffery Friedl
Does not match in by Jefferson

(?=Jeffery)Jeff and Jeff(?=ery) are equivalent.
Jeff(?=Jeffery) matches neither of the examples above; it matches only a Jeff that is immediately followed by Jeffery, as in JeffJeffery.
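The behavior described above can be checked with a short sketch. The variable names are illustrative only; $-[0] and $+[0] are Perl's built-in match start and end offsets.

```perl
use strict;
use warnings;

my $text = "by Jeffery Friedl";

# A bare lookahead matches a position, so the match has zero width.
my ($start, $width) = (-1, -1);
if ($text =~ /(?=Jeffery)/) {
    ($start, $width) = ($-[0], $+[0] - $-[0]);
}
print "matched at offset $start with width $width\n";   # offset 3, width 0

# Lookahead combined with real characters: match Jeff only as part of Jeffery.
my $in_jeffery   = ("by Jeffery Friedl" =~ /(?=Jeffery)Jeff/) ? 1 : 0;
my $in_jefferson = ("by Jefferson"      =~ /(?=Jeffery)Jeff/) ? 1 : 0;
my $suffix_form  = ("by Jeffery Friedl" =~ /Jeff(?=ery)/)     ? 1 : 0;
my $doubled      = ("JeffJeffery"       =~ /Jeff(?=Jeffery)/) ? 1 : 0;
print "$in_jeffery $in_jefferson $suffix_form $doubled\n";   # 1 0 1 1
```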

1.3 Example three: Jeffs => Jeff's

Change Jeffs to Jeff's.

s/Jeffs/Jeff's/
s/\bJeffs\b/Jeff's/
s/\b(Jeff)(s)\b/$1'$2/

s/\bJeff(?=s\b)/Jeff'/

The benefit of lookahead here is that it lets us check for the whole word Jeffs while actually matching only Jeff.

s/(?<=\bJeff)(?=s\b)/'/
s/(?=s\b)(?<=\bJeff)/'/

Lookbehind and lookahead together pin down one exact position. Because both conditions only test a position, the order in which they are written does not affect the result.
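The following sketch applies each word-bounded variant to the same input to confirm that they all produce the same text. The input string and the @variants list are assumptions for illustration.

```perl
use strict;
use warnings;

my $input = "Jeffs book";

# Each entry applies one of the substitutions to a copy of the input.
my @variants = (
    sub { my $s = shift; $s =~ s/\bJeffs\b/Jeff's/;      return $s },
    sub { my $s = shift; $s =~ s/\b(Jeff)(s)\b/$1'$2/;   return $s },
    sub { my $s = shift; $s =~ s/\bJeff(?=s\b)/Jeff'/;   return $s },
    sub { my $s = shift; $s =~ s/(?<=\bJeff)(?=s\b)/'/;  return $s },
    sub { my $s = shift; $s =~ s/(?=s\b)(?<=\bJeff)/'/;  return $s },
);

my @results = map { $_->($input) } @variants;
print "$_\n" for @results;   # every line reads: Jeff's book
```

The last two variants match no characters at all: they only locate the position between Jeff and s and insert the apostrophe there.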

1.4 Example four: back to commas

The rule was: "there is a digit on the left, and the number of digits on the right is a multiple of 3".
The first requirement, a digit on the left, can be met with lookbehind: (?<=\d)
The second requirement: a group of 3 digits is written \d\d\d, so (\d\d\d)+ represents a run of digits whose length is a multiple of 3.
Finally, add $ to make sure that nothing follows these digits, so that the groups of 3 line up exactly with the end of the number.

$pop =~ s/(?<=\d)(?=(\d\d\d)+$)/,/g;
print "The US population is $pop\n";

298,444,215

What happens if we leave out the $?

$pop=~s/(?<=\d)(?=(\d\d\d)+)/,/g;

2,9,8,4,4,4,215
Also, the parentheses around \d\d\d are there only so that + applies to the whole group. We do not use their capturing function, so they can be written as non-capturing parentheses, (?:...):

$pop=~s/(?<=\d)(?=(?:\d\d\d)+$)/,/g;
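A small sketch comparing the anchored and unanchored versions. The helper names commify and commify_unanchored are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# Insert a comma wherever a digit is on the left and a multiple of
# three digits runs from here to the end of the string.
sub commify {
    my ($n) = @_;
    $n =~ s/(?<=\d)(?=(?:\d\d\d)+$)/,/g;
    return $n;
}

# Same pattern without the $ anchor: any position followed by at
# least three digits now qualifies.
sub commify_unanchored {
    my ($n) = @_;
    $n =~ s/(?<=\d)(?=(?:\d\d\d)+)/,/g;
    return $n;
}

print commify("298444215"), "\n";              # 298,444,215
print commify("1234"), "\n";                   # 1,234
print commify("123"), "\n";                    # 123
print commify_unanchored("298444215"), "\n";   # 2,9,8,4,4,4,215
```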

Negative lookaround
Now suppose we want to apply this comma-inserting regular expression to a longer string, for example

$text = "The population of 299444215 is growing";

Here s/(?<=\d)(?=(\d\d\d)+$)/,/g; does not work, because the number is not at the end of the string, so the match fails.
Workaround: change $ to \b. Although \b is called a word boundary, in Perl the \w that defines a word character is [a-zA-Z0-9_], which includes digits, so "word" here is meant in a broad sense.
Note that \b matches a position where one side is a word character and the other side is not.

Lookaround has a related concept. The constructs used so far, (?=...) and (?<=...), are called positive lookahead and positive lookbehind, because they succeed when the subexpression can match at that position.
There are also negative lookahead (?!...) and negative lookbehind (?<!...), which succeed when the subexpression cannot match.

Type	Regular expression	Succeeds when
Positive lookahead	(?=...)	the subexpression can match the text to the right
Positive lookbehind	(?<=...)	the subexpression can match the text to the left
Negative lookahead	(?!...)	the subexpression cannot match the text to the right
Negative lookbehind	(?<!...)	the subexpression cannot match the text to the left

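So \b is in fact equivalent to (?<!\w)(?=\w)|(?<=\w)(?!\w). For the comma problem it is enough to require that no digit follows, so (?!\d) can take the place of \b:

s/(?<=\d)(?=(\d\d\d)+(?!\d))/,/g;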
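To compare the three ways of ending the lookahead on a number embedded in text, here is a hedged sketch. The second input, with km directly after the digits, is an invented case added to show where \b and (?!\d) differ.

```perl
use strict;
use warnings;

my $text = "The population of 299444215 is growing";

# $ requires the digits to end the string, so nothing changes here.
my $with_dollar = $text;
$with_dollar =~ s/(?<=\d)(?=(\d\d\d)+$)/,/g;

# \b and (?!\d) both accept the space after the number.
my $with_b = $text;
$with_b =~ s/(?<=\d)(?=(\d\d\d)+\b)/,/g;
my $with_neg = $text;
$with_neg =~ s/(?<=\d)(?=(\d\d\d)+(?!\d))/,/g;

print "$with_dollar\n";   # unchanged
print "$with_b\n";        # The population of 299,444,215 is growing
print "$with_neg\n";      # The population of 299,444,215 is growing

# Invented case: a letter directly after the digits is also a word
# character, so \b fails there while (?!\d) still succeeds.
my $units_b = "299444215km";
$units_b =~ s/(?<=\d)(?=(\d\d\d)+\b)/,/g;
my $units_neg = "299444215km";
$units_neg =~ s/(?<=\d)(?=(\d\d\d)+(?!\d))/,/g;
print "$units_b $units_neg\n";   # 299444215km 299,444,215km
```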

Not all host languages support lookbehind.
So we can write it like this instead, which avoids lookbehind:

s/(\d)(?=(\d\d\d)+$)/$1,/g;

What if we do not use lookaround at all?

s/(\d)((\d\d\d)+\b)/$1,$2/g;

Does this work?

The answer is no: the result is 298,444215.
With the /g modifier, the next match attempt begins where the previous match ended. In the first match, (\d\d\d)+\b already consumed 444215, so the next attempt starts after the final 5, where nothing is left to match.
The workaround is a while loop in Perl that reapplies the substitution from the start of the string until it no longer matches, instead of relying on /g to iterate.
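A minimal sketch of the lookaround-free approaches discussed above, including one possible form of the while loop. The variable names are illustrative, and the loop shown is one way to write it, not necessarily the author's exact code.

```perl
use strict;
use warnings;

my $pop = "298444215";

# Lookahead only: capture the digit on the left instead of using lookbehind.
my $no_lookbehind = $pop;
$no_lookbehind =~ s/(\d)(?=(\d\d\d)+$)/$1,/g;
print "$no_lookbehind\n";   # 298,444,215

# No lookaround, single /g pass: the first match consumes the trailing
# digits, so only one comma is inserted.
my $single_pass = $pop;
$single_pass =~ s/(\d)((\d\d\d)+\b)/$1,$2/g;
print "$single_pass\n";     # 298,444215

# No lookaround, repeated: each pass restarts from the beginning of the
# string and inserts one more comma until the pattern stops matching.
my $looped = $pop;
1 while $looped =~ s/(\d)((\d\d\d)+\b)/$1,$2/;
print "$looped\n";          # 298,444,215
```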

2. Usage Considerations for regular expressions

There are 3 main issues to consider when using regular expressions in a particular host language or tool:
1. Which metacharacters are supported and what they mean. This is usually called the "flavor" of the regular expression.
2. How regular expressions interact with the language or tool: how regex operations are invoked, which operations are allowed, and what kind of target text they operate on.
3. How the regular expression engine applies an expression to text.

Over the long history of regular expressions, many programmers and new programs developed flavors of their own, and the result was a confusing mess.
POSIX (a family of standards), which appeared in 1986, was an attempt to standardize these tangled flavors so that regular expressions would follow one set of rules. It divides the common flavors into two broad categories:
Basic Regular Expressions (BREs) and Extended Regular Expressions (EREs)
A POSIX-compliant program must support one of the two.
