1. Using regular expressions to modify text
Regular expressions are not only for searching; they can also modify text, for example by substitution. Compare the match and substitution operators:
$var =~ m/regex/i
$var =~ s/regex/replacement/i
The replacement between the slashes is treated like a double-quoted string, so it can contain variables, including $1 and $2, which stand for text captured earlier in the match.
$var =~ s/regex/replacement/ changes the text in $var. If the regex does not match, nothing is replaced and the text is left unchanged.
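As a small illustration of these two points, the sketch below reuses captured text through $1 and $2 and shows that a failed match leaves the variable untouched. The sample strings and variable names are made up for this sketch and are not from the original notes.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my $name = "Jeffery Friedl";

# $1 and $2 hold the two captured words; the replacement swaps them.
$name =~ s/(\w+) (\w+)/$2, $1/;
print "$name\n";    # Friedl, Jeffery

# No match: s/// returns a false value and the text is left as it was.
my $other   = "no names here";
my $changed = ($other =~ s/Jeff/Jeffery/);
print "unchanged: $other\n" unless $changed;
```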
$var = "Jeff Friedl"; $var =~ s/Jeff/Jeffery/; # $var is now "Jeffery Friedl"
But if you run $var =~ s/Jeff/Jeffery/ once more, you'll get:
$var = "Jefferyery Friedl";
Because the letters Jeff still match at the start of Jeffery, the substitution happens again on every run.
To avoid this, we have to make the match more specific.
We want to match the word Jeff, not just the four letters J-e-f-f, so we use:
$var=~s/\bJeff\b/Jeffery/
This way, running $var =~ s/\bJeff\b/Jeffery/ a second time does not change the text again.
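The difference can be checked by applying each version twice. This is only a sketch; the variable names $plain and $bounded are chosen here for illustration.

```perl
use strict;
use warnings;

my $plain   = "Jeff Friedl";
my $bounded = "Jeff Friedl";

# Apply each substitution twice.
for my $pass (1 .. 2) {
    $plain   =~ s/Jeff/Jeffery/;        # matches the letters J-e-f-f anywhere
    $bounded =~ s/\bJeff\b/Jeffery/;    # matches only the whole word Jeff
}

print "$plain\n";      # Jefferyery Friedl
print "$bounded\n";    # Jeffery Friedl
```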
1.1 Example one: a form letter generator
Suppose there is a form letter system that holds many letter templates, in which certain marked placeholders are filled in with different values for each particular letter.
For example:
Dear =FIRST=,
Congratulations! You have won a =TRINKET=! Absolutely FREE! Would you like another =TRINKET= for the =FAMILY= family? Trust us, =FULL=, you can!
Then set three variables:
$family="Wang"; $given="Xiaoming"; $prize="10000 carat diamond";
Then we can fill in the template:
$letter=~s/=FIRST=/$given/g; $letter=~s/=FAMILY=/$family/g; $letter=~s/=FULL=/$family $given/g; $letter=~s/=TRINKET=/priceless $prize/g;
/g is the modifier for global substitution. It tells s/// to keep searching after a successful substitution and to continue until no further match is found, so that every occurrence in the text is replaced.
Result:
Dear Xiaoming,
Congratulations! You have won a priceless 10000 carat diamond! Absolutely FREE! Would you like another priceless 10000 carat diamond for the Wang family? Trust us, Wang Xiaoming, you can!
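The whole fill-in step can be put together as a short script. This is a minimal sketch: the template text is a shortened stand-in with upper-case placeholders, and in this sketch $family holds the family name and $given the given name.

```perl
use strict;
use warnings;

my $family = "Wang";
my $given  = "Xiaoming";
my $prize  = "10000 carat diamond";

# A shortened stand-in template; the single-quoted heredoc prevents interpolation.
my $letter = <<'END';
Dear =FIRST=,
You have won a =TRINKET=! Would you like another =TRINKET= for the =FAMILY= family?
Trust us, =FULL=, you can!
END

# /g replaces every occurrence of a placeholder, not just the first one.
$letter =~ s/=FIRST=/$given/g;
$letter =~ s/=FAMILY=/$family/g;
$letter =~ s/=FULL=/$family $given/g;
$letter =~ s/=TRINKET=/priceless $prize/g;

print $letter;
```

Without /g, only the first =TRINKET= in the template would be replaced and the second would be left in the output.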
1.2 Example two: trimming number formats
Because of the way computers represent floating-point numbers internally, a value may be printed as 9.05000000372272 when we really only need the first two or three digits after the decimal point.
The requirement: always keep the first two digits after the decimal point, and keep the third digit as well if it is not 0. For example, 12.3750000000392 becomes 12.375, and 37.500 becomes 37.50.
$num=~s/(\.\d\d[1-9]?)\d*/$1/;
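Applied to the values mentioned above, the substitution behaves as follows. In this sketch the numbers are kept as strings (an assumption made here so that Perl does not reformat them on output).

```perl
use strict;
use warnings;

# Strings, not numeric literals, so the digits are preserved exactly.
my @raw = ("9.05000000372272", "12.3750000000392", "37.500");
my @trimmed;

for my $num (@raw) {
    # Keep ".dd", plus a third digit if it is 1-9, and drop the remaining digits.
    (my $copy = $num) =~ s/(\.\d\d[1-9]?)\d*/$1/;
    push @trimmed, $copy;
}

print "$raw[$_] -> $trimmed[$_]\n" for 0 .. $#raw;
# 9.05000000372272 -> 9.05
# 12.3750000000392 -> 12.375
# 37.500 -> 37.50
```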
Adding commas to a number using lookaround
Large numbers are usually written with commas between groups of digits to make them easier to read.
print "The US population is $pop\n";
This prints The US population is 298444215, but 298,444,215 would be easier on the eye.
Intuitively, we would start from the right side of the number and move left three digits at a time, adding a comma wherever there are still digits to the left.
That is the natural idea, but a regular expression processes text from left to right.
Put another way, a comma belongs at every position that has a digit on its left and, on its right, a number of digits that is a multiple of three.
For this kind of task we use lookaround.
A lookaround construct does not match any characters; it matches only a position in the text. We have seen this kind of thing before: \b, ^, and $ all match positions. Lookaround is more general than these, because it matches a position that you define.
Lookahead: looks at the text to the right of the current position and tries to match a subexpression there. Positive lookahead is written (?=...). For example, (?=\d) succeeds if the character to the right of the current position is a digit.
Lookbehind: looks at the text to the left of the current position and tries to match a subexpression there. Positive lookbehind is written (?<=...). For example, (?<=\d) succeeds if the character to the left of the current position is a digit (that is, at a position immediately after a digit).
Lookaround does not "consume" characters when it matches; it only matches a position.
For example, applying (?=Jeffery) to "by Jeffery Friedl" matches the position just before the characters Jeffery.
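One way to see that only a position is matched is to look at the match offsets. The sketch below uses Perl's @- and @+ arrays, which hold the start and end offsets of the last successful match; the variable names are chosen here for illustration.

```perl
use strict;
use warnings;

my $text = "by Jeffery Friedl";
my ($start, $end);

if ($text =~ /(?=Jeffery)/) {
    # $-[0] is where the match starts, $+[0] is where it ends.
    ($start, $end) = ($-[0], $+[0]);
    print "matched at offset $start, length ", $end - $start, "\n";
}
# matched at offset 3, length 0
```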
By combining lookaround with parts of the regex that actually match characters, we can match more precisely, for example:
(?=Jeffery)Jeff
This matches the Jeff in "by Jeffery Friedl".
It does not match anything in "by Jefferson".
(?=Jeffery)Jeff and Jeff(?=ery) are effectively equivalent.
Jeff(?=Jeffery), on the other hand, matches in neither of the examples above; it matches only a Jeff that is immediately followed by Jeffery, as in JeffJeffery.
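These cases can be checked side by side. The sketch below records, for each sample string, whether each of the three patterns matches (1) or not (0); the %hits hash is a name chosen for this sketch.

```perl
use strict;
use warnings;

my @samples = ("by Jeffery Friedl", "by Jefferson", "JeffJeffery");
my %hits;

for my $text (@samples) {
    $hits{$text} = [
        ($text =~ /(?=Jeffery)Jeff/ ? 1 : 0),    # Jeff, only where Jeffery starts
        ($text =~ /Jeff(?=ery)/     ? 1 : 0),    # Jeff, only if followed by ery
        ($text =~ /Jeff(?=Jeffery)/ ? 1 : 0),    # Jeff, only if followed by Jeffery
    ];
    print "$text: @{ $hits{$text} }\n";
}
# by Jeffery Friedl: 1 1 0
# by Jefferson: 0 0 0
# JeffJeffery: 1 1 1
```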
1.3 Example three: Jeffs => Jeff's
Change Jeffs to Jeff's.
s/Jeffs/Jeff's/
s/\bJeffs\b/Jeff's/
s/\b(Jeff)(s)\b/$1'$2/
s/\bJeff(?=s\b)/Jeff'/
The benefit of lookahead is that it only checks a position: it lets us verify the whole of Jeffs while matching, and replacing, only Jeff.
s/(?<=\bJeff)(?=s\b)/'/
s/(?=s\b)(?<=\bJeff)/'/
Combining lookbehind and lookahead pins down one exact position. Because both conditions only test a position and consume nothing, the order in which they are written does not affect the result.
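A quick check that the word-bounded variants agree, as a sketch. The sample sentence is made up, and ${1} is written with braces in the replacement only to keep the apostrophe visibly separate from the variable name.

```perl
use strict;
use warnings;

my $source = "See Jeffs book";    # hypothetical sample sentence

# Each sub applies one variant to a copy of its argument and returns the result.
my @variants = (
    sub { my ($s) = @_; $s =~ s/\bJeffs\b/Jeff's/;        return $s },
    sub { my ($s) = @_; $s =~ s/\b(Jeff)(s)\b/${1}'$2/;   return $s },
    sub { my ($s) = @_; $s =~ s/\bJeff(?=s\b)/Jeff'/;     return $s },
    sub { my ($s) = @_; $s =~ s/(?<=\bJeff)(?=s\b)/'/;    return $s },
    sub { my ($s) = @_; $s =~ s/(?=s\b)(?<=\bJeff)/'/;    return $s },
);

my @results = map { $_->($source) } @variants;
print "$_\n" for @results;    # each line: See Jeff's book
```

The last two variants replace nothing at all: they match an empty position and insert the apostrophe there.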
1.4 Example four: back to the commas
Recall the condition: there is a digit on the left, and the number of digits on the right is a multiple of three.
The first requirement, a digit on the left, is met with lookbehind: (?<=\d)
The second requirement: three digits can be written \d\d\d, so (\d\d\d)+ represents a number of digits that is a multiple of three. Wrapped in lookahead, this becomes (?=(\d\d\d)+).
Finally, add $ so that nothing follows these digits, which ensures that the groups of three run exactly to the end of the number.
$pop=~s/(?<=\d)(?=(\d\d\d)+$)/,/g;
print "The US population is $pop\n";
This prints: The US population is 298,444,215
What happens if we leave out the $?
$pop=~s/(?<=\d)(?=(\d\d\d)+)/,/g;
2,9,8,4,4,4,215
The parentheses around \d\d\d are there only so that + applies to the whole group. We do not use what they capture, so they can be written as non-capturing parentheses: (?:...)
$pop=~s/(?<=\d)(?=(?:\d\d\d)+$)/,/g;
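Both behaviours can be reproduced on copies of the same value. In this sketch the names $anchored and $unanchored are chosen for illustration.

```perl
use strict;
use warnings;

my $pop = "298444215";

# With $: the groups of three must run exactly to the end of the string.
(my $anchored = $pop) =~ s/(?<=\d)(?=(?:\d\d\d)+$)/,/g;

# Without $: any position followed by at least three digits qualifies.
(my $unanchored = $pop) =~ s/(?<=\d)(?=(?:\d\d\d)+)/,/g;

print "The US population is $anchored\n";    # 298,444,215
print "Unanchored version: $unanchored\n";   # 2,9,8,4,4,4,215
```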
Negative lookaround
Now suppose we want to apply this comma-inserting regex to a longer string, for example:
$text="The population of 299444215 is growing";
Here s/(?<=\d)(?=(\d\d\d)+$)/,/g; does not work: the number is not at the end of the string, so the match fails.
Workaround: change $ to \b. Although \b is called a word boundary, in Perl the \w that defines a word character is [a-zA-Z0-9_], which includes digits, so a run of digits counts as a word in this broad sense.
Note that \b means that there is a word character on one side of the position and not on the other.
Lookaround has a related notion. The (?=...) and (?<=...) we have used so far are called positive lookahead and positive lookbehind, because they succeed when the subexpression can match at that position.
There are also negative lookahead (?!...) and negative lookbehind (?<!...), which succeed when the subexpression cannot match.
Type | Regular expression | Succeeds if
Positive lookahead | (?=...) | the subexpression can match the text to the right
Positive lookbehind | (?<=...) | the subexpression can match the text to the left
Negative lookahead | (?!...) | the subexpression cannot match the text to the right
Negative lookbehind | (?<!...) | the subexpression cannot match the text to the left
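So \b is in fact equivalent to (?<!\w)(?=\w)|(?<=\w)(?!\w)
For the comma problem it is enough to require that no digit follows the groups of three, which negative lookahead expresses directly:
s/(?<=\d)(?=(\d\d\d)+(?!\d))/,/g;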
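The three ways of ending the lookahead can be compared on the sample sentence. This is a sketch with illustrative variable names.

```perl
use strict;
use warnings;

my $text = "The population of 299444215 is growing";

# Ending with $ fails here, because the number is not at the end of the string.
(my $with_end = $text) =~ s/(?<=\d)(?=(?:\d\d\d)+$)/,/g;

# Ending with \b: the groups of three must stop at a word boundary.
(my $with_b = $text) =~ s/(?<=\d)(?=(?:\d\d\d)+\b)/,/g;

# Ending with (?!\d): the groups of three must not be followed by another digit.
(my $with_neg = $text) =~ s/(?<=\d)(?=(?:\d\d\d)+(?!\d))/,/g;

print "$with_end\n";    # The population of 299444215 is growing
print "$with_b\n";      # The population of 299,444,215 is growing
print "$with_neg\n";    # The population of 299,444,215 is growing
```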
Not all host languages support lookbehind.
In that case we can match the digit on the left as ordinary text and put it back in the replacement, so no lookbehind is needed:
s/(\d)(?=(\d\d\d)+$)/$1,/g;
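A short check of the version without lookbehind, as a sketch: the digit to the left of each comma is captured in $1 and written back in the replacement.

```perl
use strict;
use warnings;

my $pop = "298444215";

# (\d) consumes the digit on the left; the lookahead still consumes nothing,
# so the next search can start right after that digit.
$pop =~ s/(\d)(?=(\d\d\d)+$)/$1,/g;

print "$pop\n";    # 298,444,215
```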
What if we do not use lookaround at all?
s/(\d)((\d\d\d)+\b)/$1,$2/g;
Does this work?
The answer is no; the result is 298,444215.
This is because with the /g modifier each new match attempt begins where the previous match ended. In the first match, (\d\d\d)+\b already consumed 444215, so the next attempt starts after the final 5, where nothing is left to match.
The workaround in Perl is a while loop that reapplies the substitution from the start of the string each time, instead of relying on /g to continue from the end of the previous match.
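The loop can be sketched as follows, using the common `1 while ...` idiom: the substitution is repeated until it no longer matches. The variable names are illustrative.

```perl
use strict;
use warnings;

# One pass with /g: the first match consumes all remaining digits.
my $single = "298444215";
$single =~ s/(\d)((\d\d\d)+\b)/$1,$2/g;

# Repeated passes without /g: each pass restarts at the beginning of the string
# and inserts one more comma, until no match is left.
my $looped = "298444215";
1 while $looped =~ s/(\d)((\d\d\d)+\b)/$1,$2/;

print "$single\n";    # 298,444215
print "$looped\n";    # 298,444,215
```

Each pass inserts the leftmost missing comma, so a number needs one pass per comma plus a final pass that fails and ends the loop.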
2. Considerations when using regular expressions
There are three main issues to keep in mind when using regular expressions in a particular host language or tool:
1. Which metacharacters are supported, and what they mean. This is often called the "flavor" of the regular expression.
2. How regular expressions interface with the language or tool: for example, how regex operations are specified, which operations are allowed, and what kind of target text they operate on.
3. How the regular expression engine actually applies an expression to text.
Over the long history of regular expressions, many programmers and new programs developed their own flavors, and the result was a muddle.
In 1986, POSIX (a family of standards) appeared. Among other things, it attempted to standardize the tangle of regex flavors so that the same set of rules would be used to implement regular expressions. It divides the common flavors into two broad classes:
Basic Regular Expressions (BREs) and Extended Regular Expressions (EREs).
A POSIX program must support one or the other.
Perl Regular Expressions second-week notes