Perl Learning 8: Processing Text with Regular Expressions

Last Update:2015-08-19 Source: Internet

Author: User

Tags processing text expression engine

Substitution with s///: If you think of the m// pattern match as a word processor's "find" function, then the s/// substitution operator is its "find and replace" function. The operator replaces the part of the variable that matches the pattern with another string:

$_ = "He's out bowling with Barney tonight.";
s/Barney/Fred/;  # replace Barney with Fred
print "$_\n";

If the match fails, nothing happens and the variable is left unchanged. Pattern and replacement strings can of course be more complicated. The following replacement string uses the first capture variable, $1, which is set when the pattern matches:

s/with (\w+)/against $1's team/;
print "$_\n";  # prints "He's out bowling against Fred's team tonight."

s/// returns a Boolean value: true if a substitution was made, false otherwise.
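The return value can be checked directly. The following is a minimal sketch, assuming the same sample sentence as above; the names $changed and $missed are illustrative only.

```perl
use strict;
use warnings;

$_ = "He's out bowling with Barney tonight.";

# s/// is true only when a substitution was actually made
my $changed = s/Barney/Fred/ ? 1 : 0;
my $missed  = s/Wilma/Betty/ ? 1 : 0;   # no "Wilma" here, so $_ is untouched

print "changed=$changed missed=$missed\n";
print "$_\n";
```

This makes it easy to branch on whether a line was actually modified.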

Global replacement with /g: By default, s/// makes only one replacement, at the first match. The /g modifier tells s/// to make every possible non-overlapping replacement:

$_ = "home, sweet home!";
s/home/cave/g;
print "$_\n";  # prints "cave, sweet cave!"
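To see the difference the modifier makes, here is a small sketch that runs the same substitution with and without /g. The variable names are illustrative; the count relies on s/// returning the number of substitutions made.

```perl
use strict;
use warnings;

my $once = "home, sweet home!";
my $all  = "home, sweet home!";

$once =~ s/home/cave/;                 # first match only
my $count = ($all =~ s/home/cave/g);   # every match; returns how many were replaced

print "$once\n";
print "$all ($count replacements)\n";
```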

A fairly common global replacement collapses whitespace, turning any run of consecutive whitespace characters into a single space:

$_ = "Input data\t may have extra whitespace.";
s/\s+/ /g;  # now "Input data may have extra whitespace."

Case conversion: In the replacement string, the \U escape converts everything after it to uppercase:

$_ = "I saw Barney with Fred.";
s/(fred|barney)/\U$1/gi;  # now "I saw BARNEY with FRED."

Similarly, the \L escape converts everything after it to lowercase. By default, both escapes affect the rest of the replacement string. Use \E to turn case conversion off:

$_ = "I saw Barney with Fred.";
s/(\w+) with (\w+)/\U$2\E with $1/i;  # now "I saw FRED with Barney."

The split operator. Usage: my @fields = split /separator/, $string;

The split operator scans the string for the separator pattern and returns a list of fields (substrings). Wherever the pattern matches, the current field ends and the next one begins. split keeps empty fields at the beginning of the list but discards empty fields at the end.
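The handling of empty fields is easiest to see with a small example. This is a sketch with a made-up colon-separated string; the brackets are only there to make empty fields visible.

```perl
use strict;
use warnings;

# Leading and inner empty fields survive; trailing ones are dropped
my @fields = split /:/, ":a:b::c:::";

# Wrap each field in brackets so the empty ones can be seen
my $shown = join ",", map { "[$_]" } @fields;
print scalar(@fields), " fields: $shown\n";
```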

By default, split breaks $_ apart on whitespace: my @fields = split; # like split /\s+/, $_; (except that leading whitespace does not produce an empty first field)

The join function does not use patterns; it does exactly the opposite of split. Usage: my $result = join $glue, @pieces; Think of the first argument to join as the glue, which can be any string. The remaining arguments are the pieces. join puts the glue between each pair of pieces and returns the resulting string:

my $x = join ":", 4, 6, 8, 10, 12;  # $x is "4:6:8:10:12"

The glue appears only between pieces, never before or after them, so it shows up only when the list has at least two elements.
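A short sketch of how join behaves with many, one, and zero pieces, and how it undoes a matching split. The variable names and the "-" glue are illustrative.

```perl
use strict;
use warnings;

my $many = join "-", 4, 6, 8;     # glue between each pair of pieces
my $one  = join "-", "alone";     # a single piece: no glue at all
my $none = join "-", ();          # empty list: empty string

# split and join undo each other when the separator and the glue agree
my $round = join ":", split /:/, "4:6:8:10:12";

print "many='$many' one='$one' none='$none' round='$round'\n";
```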

m// in list context: With split, the pattern specifies only the separator, and the resulting fields are not necessarily the data we want. Sometimes it is easier to specify the part you want to keep. The /g modifier seen earlier with s/// also works with m//, where it lets the pattern match at multiple places in the string. In list context, a pattern with one pair of parentheses then returns the captured text from every match:

my $text = "Fred dropped a 5 ton granite block on Mr. Slate";
my @words = ($text =~ /([a-z]+)/ig);
print "Result: @words\n";
# prints: Result: Fred dropped a ton granite block on Mr Slate

This is like using split in reverse: the regular pattern specifies not the part you want to remove, but the part you want to leave.

In fact, if there are multiple sets of parentheses in the pattern, then each match can capture multiple strings. Suppose we want to turn a string into a hash, we can do this:

my $data = "Barney Rubble Fred Flintstone Wilma Flintstone";
my %last_name = ($data =~ /(\w+)\s+(\w+)/g);

Each time the pattern matches, it returns a pair of captured values, and each pair becomes a key-value pair in the new hash.
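To check what ends up in the hash, the sketch below rebuilds it from the same data and lists the pairs in sorted key order. The @report array is only an illustration device.

```perl
use strict;
use warnings;

my $data = "Barney Rubble Fred Flintstone Wilma Flintstone";
my %last_name = ($data =~ /(\w+)\s+(\w+)/g);

# First names are the keys, last names are the values
my @report;
foreach my $first (sort keys %last_name) {
    push @report, "$first => $last_name{$first}";
}
print "$_\n" for @report;
```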

Non-greedy quantifiers: The regular expression engine backtracks constantly, adjusting what each part of the pattern matches until the pattern as a whole succeeds; if no arrangement works, the match fails. A greedy quantifier such as + or * first takes as much as it can and then gives characters back. Its non-greedy form (+? or *?) takes as little as it can and extends only when needed.
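The difference matters most when a delimiter occurs more than once. The following sketch uses a made-up string with two <BOLD> pairs and compares the greedy .* with the non-greedy .*? in the same substitution.

```perl
use strict;
use warnings;

my $text = "I thought you said Fred and <BOLD>Velma</BOLD>, not <BOLD>Wilma</BOLD>";

# Greedy: .* runs to the LAST </BOLD>, so only the outermost tags are removed
(my $greedy = $text) =~ s#<BOLD>(.*)</BOLD>#$1#g;

# Non-greedy: .*? stops at the FIRST </BOLD>, so each pair is handled separately
(my $lazy = $text) =~ s#<BOLD>(.*?)</BOLD>#$1#g;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```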

Matching across lines: Normally ^ and $ are anchors for the beginning and end of the entire string. With the /m modifier, they also match at the beginning and end of each line within the string. The following program reads an entire file into a variable and then adds the file name as a prefix to every line:

my $filename = "hsl.txt";
open FILE, $filename
    or die "Can't open '$filename': $!";
my $lines = join '', <FILE>;      # read the whole file into one string
$lines =~ s/^/$filename: /gm;     # /m lets ^ match at the start of every line
print $lines;
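The effect of /m can be tried without a file. This sketch applies the same substitution to an in-memory string, once without and once with /m; the three-line string is made up, and the file name is used only as a label.

```perl
use strict;
use warnings;

my $filename = "hsl.txt";   # used only as a prefix; no file is opened
my $lines    = "first line\nsecond line\nthird line\n";

# Without /m, ^ matches only at the very start of the string
(my $without_m = $lines) =~ s/^/$filename: /g;

# With /m, ^ matches at the start of every line
(my $with_m = $lines) =~ s/^/$filename: /gm;

print "--- without /m ---\n$without_m";
print "--- with /m ---\n$with_m";
```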

Updating multiple files at once: When a program updates file contents automatically, the most common approach is to write a new file that is identical to the original except for the changes made where they are needed.

Now there is a file called fred03.dat with the following content:

Needs to be changed to:

Simply put, we need to make three changes: the name in the Author field should be changed, the Date should be set to today's date, and the Phone line should be deleted.

#!/usr/bin/perl -w
use strict;
chomp(my $date = localtime);
$^I = ".bak";    # edit in place, keeping a backup with this extension
while (<>) {
    s/^Author:.*/Author: Randal L. Schwartz/;
    s/^Phone:.*\n//;
    s/^Date:.*/Date: $date/;
    print;
}

Enter in the command line: perl change.pl fred03.dat

Suppose the diamond operator has just opened fred03.dat. Because $^I is set, it does more than open the file as usual: it also renames the file to fred03.dat.bak. The same file is still open, but its name on disk has changed. Next, the diamond operator creates a new file named fred03.dat. This causes no conflict, because no file with that name exists any more. The diamond operator then makes this new file the default output, so everything that is printed goes into it.
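The renaming can be observed in a throwaway directory. The sketch below is an assumption-laden demo, not the program above: the file name sample.dat, its contents, and the slurp helper are all hypothetical, and only two of the three substitutions are used.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical sample file in a temporary directory
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/sample.dat";
open my $fh, '>', $file or die "Can't create '$file': $!";
print {$fh} "Author: Somebody\nPhone: 555-0100\nTitle: Notes\n";
close $fh;

{
    # local keeps the in-place settings confined to this block
    local $^I   = ".bak";
    local @ARGV = ($file);
    while (<>) {
        s/^Author:.*/Author: Randal L. Schwartz/;
        s/^Phone:.*\n//;
        print;    # goes into the rewritten file, not to the screen
    }
}

# Helper (hypothetical) that reads a whole file into one string
sub slurp {
    my ($path) = @_;
    open my $in, '<', $path or die "Can't open '$path': $!";
    local $/;
    return scalar <$in>;
}

my $new = slurp($file);
my $old = slurp("$file.bak");
print "new file:\n$new", "backup:\n$old";
```

After the loop, the original contents are in the .bak file and the edited contents are under the original name.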

Exercises:

1. My answer:

use strict;
chomp(my $what = <STDIN>);
print "$what" x 3;

Answer: /($what){3}/ (I do not understand this yet.)
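One way to read the answer pattern: the parentheses group the interpolated text of $what, and {3} requires three consecutive copies of that group. The sketch below tests this reading with an assumed value of "fred" and a few made-up candidate strings.

```perl
use strict;
use warnings;

my $what = "fred";   # assumed value; the exercise reads it from input

# The parentheses group the interpolated text, so {3} repeats the whole group
my @results;
foreach my $candidate ("fredfredfred", "fredfred", "fred fred fred") {
    push @results, ($candidate =~ /($what){3}/ ? "match" : "no match");
    print "$candidate: $results[-1]\n";
}
```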

2

my $in = $ARGV[0];
if (!defined $in) {
    die "Usage: $0 filename";
}
my $out = $in;
$out =~ s/(\.\w+)?$/.out/;    # replace the extension, if any, with .out
my ($in_fh, $out_fh);
if (!open $in_fh, '<', $in) {
    die "Can't open '$in': $!";
}
if (!open $out_fh, '>', $out) {    # write to $out, not $in
    die "Can't open '$out': $!";
}
while (<$in_fh>) {
    s/Fred/Larry/gi;
    print $out_fh $_;
}

The program begins by checking its command-line arguments; it expects one, the name of the input file.

3

my $in = $ARGV[0];
if (!defined $in) {
    die "Usage: $0 filename";
}
my $out = $in;
$out =~ s/(\.\w+)?$/.out/;    # replace the extension, if any, with .out
my ($in_fh, $out_fh);
if (!open $in_fh, '<', $in) {
    die "Can't open '$in': $!";
}
if (!open $out_fh, '>', $out) {    # write to $out, not $in
    die "Can't open '$out': $!";
}
while (<$in_fh>) {
    chomp;
    s/Fred/\n/gi;        # replace every Fred with the placeholder
    s/Wilma/Fred/gi;     # replace every Wilma with Fred
    s/\n/Wilma/g;        # replace the placeholder with Wilma
    print $out_fh "$_\n";
}

We must first find a "placeholder" that cannot appear in the data. Because chomp is used, we know that the newline character (\n) never appears in the string, so the newline can serve as the placeholder.
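The need for a placeholder shows up when the swap is tried without one. The sketch below uses a made-up sentence and compares a naive two-step swap with the three-step version.

```perl
use strict;
use warnings;

my $line = "Fred and Wilma visited fred's friend WILMA.\n";
chomp(my $text = $line);

# Naive attempt: the second substitution undoes the first
my $naive = $text;
$naive =~ s/Fred/Wilma/gi;
$naive =~ s/Wilma/Fred/gi;

# With a placeholder: \n cannot occur in the chomped line
my $swapped = $text;
$swapped =~ s/Fred/\n/gi;       # park every Fred as the placeholder
$swapped =~ s/Wilma/Fred/gi;    # every Wilma becomes Fred
$swapped =~ s/\n/Wilma/g;       # every placeholder becomes Wilma

print "naive:   $naive\n";
print "swapped: $swapped\n";
```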

4

$^I = ".bak";
while (<>) {
    if (/\A#!/) {
        $_ .= "## Copyright (C) 2015 by HSL\n";
    }
    print;
}

5. To avoid adding the copyright statement twice, we process all the files in two passes. Before the first pass, we build a hash whose keys are the file names; the values do not matter, so for simplicity each is set to 1. This hash serves as a to-do list. In the first pass we read every file and remove from the list any file that already contains the copyright line. The name of the file currently being read is available in $ARGV, so it can be used directly as the hash key. In the second pass we edit only the files that remain on the list.

my %do_these;
foreach (@ARGV) {
    $do_these{$_} = 1;
}
while (<>) {
    if (/\A## Copyright/) {
        delete $do_these{$ARGV};    # already has the statement
    }
}
@ARGV = sort keys %do_these;        # only the files still on the list
$^I = ".bak";
while (<>) {
    if (/\A#!/) {
        $_ .= "## Copyright (c) 2015 by HSL\n";
    }
    print;
}

This chapter gave me a deeper understanding of how to use perl to manipulate files. At the same time, I am continuing to watch my VTR-to-Bitstream project, which encountered a lot of problems in changing Virtex-6 to ZYNQ. But I am discussing with my teacher how to overcome it. I also started to read some books about reconfigurable computing.

Copyright statement: This article is an original article by bloggers and may not be reproduced without the permission of the bloggers.
