Detailed usage of grep in Perl

Last Update:2014-05-25 Source: Internet

Author: User

I have been learning Perl recently. The following describes how to use the powerful grep function in Perl programming.

1. Grep Function
grep has two forms:

    grep BLOCK LIST
    grep EXPR, LIST

BLOCK is a code block, enclosed in {}; EXPR is an expression, often a regular expression. According to the original article, EXPR can be anything, including one or more variables, operators, literals, functions, or subroutine calls.
LIST is the list to be filtered.
grep evaluates BLOCK or EXPR for each element of the list, temporarily setting $_ to that element as it traverses the list. In list context, grep returns all the elements for which the result is true, so the result is also a list. In scalar context, grep returns the number of such elements.
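The two forms and the two contexts can be seen side by side in the short sketch below. The array @words and the other variable names are made up for illustration; they do not come from the original article.

```perl
use strict;
use warnings;

# Hypothetical sample data, not from the original article.
my @words = qw(apple banana cherry avocado blueberry);

# EXPR form: a regex, then a comma, then the list.
my @a_words = grep /^a/, @words;

# BLOCK form: a code block with no comma before the list.
my @long_words = grep { length($_) > 6 } @words;

# Scalar context: grep returns how many elements matched.
my $b_count = grep { /^b/ } @words;

print "@a_words\n";     # apple avocado
print "@long_words\n";  # avocado blueberry
print "$b_count\n";     # 2
```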
2. Grep and loops

    open(FILE, '<', 'myfile') or die "Can't open myfile: $!";
    print grep /terrorism|nuclear/i, <FILE>;

This opens a file named myfile and prints the lines that contain terrorism or nuclear. In list context, <FILE> returns a list containing every line of the file. As you may have noticed, this approach uses a lot of memory when the file is large, because the whole file is read into memory at once.
Of course, you can also use a loop to do the same thing:

    while ($line = <FILE>) {
        if ($line =~ /terrorism|nuclear/i) { print $line }
    }

The code above shows that a loop can do anything grep can do. So why use grep? The short answer is that grep is more Perl-style, while the loop is C-style.
A better explanation is: (1) grep makes it clearer to the reader that you are selecting elements from a list; (2) grep is more concise than a loop.
One suggestion: if you are new to Perl, it is fine to start with loops and move to grep once you are comfortable with the language.
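As a rough illustration of the comparison, the sketch below runs both styles over the same in-memory list and checks that they select the same lines. The sample lines are invented stand-ins for the contents of a file.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the contents of a file.
my @lines = (
    "Nuclear talks resume\n",
    "Weather stays mild\n",
    "Report on terrorism released\n",
);

# grep version: one expression that selects the matching lines.
my @with_grep = grep { /terrorism|nuclear/i } @lines;

# Loop version: the same selection written out step by step.
my @with_loop;
foreach my $line (@lines) {
    push @with_loop, $line if $line =~ /terrorism|nuclear/i;
}

print @with_grep;
print "same result\n" if "@with_grep" eq "@with_loop";
```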
3. Several grep examples

1. Count the list elements that match an expression

    $num_apple = grep /^apple$/i, @fruits;

In scalar context, grep returns the number of matching elements; in list context, it returns a list of the matching elements.
Therefore, the code above returns the number of elements of @fruits that are the word apple (in any letter case). Because $num_apple is a scalar, the assignment puts grep in scalar context.
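The effect of context can be checked with a small sketch. The contents of @fruits are assumed here, since the article does not define the array.

```perl
use strict;
use warnings;

# Hypothetical contents for @fruits; the article does not define the array.
my @fruits = qw(apple Pear APPLE pineapple Apple plum);

# Assigning to a scalar puts grep in scalar context: the result is a count.
my $num_apple = grep /^apple$/i, @fruits;

# Assigning to an array puts grep in list context: the result is the matches.
my @apples = grep /^apple$/i, @fruits;

# scalar() forces scalar context where a list would otherwise be expected,
# for example inside the argument list of print.
print "count: $num_apple\n";                                  # count: 3
print "matches: @apples\n";                                   # matches: apple APPLE Apple
print "inline: ", scalar(grep { /^apple$/i } @fruits), "\n";  # inline: 3
```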

2. Extract unique elements from the list

    @unique = grep { ++$count{$_} < 2 }
              qw(a b a c d d e f g f h h);
    print "@unique\n";

Running this code prints: a b c d e f g h
That is, each distinct element of the list qw(a b a c d d e f g f h h) is returned once. Why? Let's take a look:
%count is a hash. Its keys are the list elements, taken one at a time as grep traverses the qw() list. ++$count{$_} increments the hash value stored under the key $_. In this comparison, ++$count{$_} and $count{$_}++ mean different things: the former increments the value before the comparison, while the latter increments it only after the comparison. So ++$count{$_} < 2 adds 1 to $count{$_} and then compares the result with 2. The initial value of $count{$_} is undef, which counts as 0. When the element a is used as a hash key for the first time, its incremented value is 1, which is less than 2, so a is kept. When a is used as a key for the second time, the value becomes 2, the condition fails, and a does not appear a second time.
Therefore, this code keeps only the first occurrence of each element in the list.
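The difference between pre-increment and post-increment is easiest to see by running both on the same list. The sketch below also shows the widely used !$seen{$_}++ idiom as an equivalent way to keep first occurrences; the hash names %pre, %post and %seen are chosen here for illustration.

```perl
use strict;
use warnings;

my @letters = qw(a b a c d d e f g f h h);

# Pre-increment: the count is raised first, then compared with 2,
# so only the first occurrence (count 1) passes.
my %pre;
my @first_only = grep { ++$pre{$_} < 2 } @letters;

# Post-increment: the old count is compared, then raised,
# so the first two occurrences (old counts 0 and 1) pass.
my %post;
my @first_two = grep { $post{$_}++ < 2 } @letters;

# Equivalent to the pre-increment version: keep an element
# only if it has not been seen before.
my %seen;
my @unique = grep { !$seen{$_}++ } @letters;

print "@first_only\n";  # a b c d e f g h
print "@first_two\n";   # a b a c d d e f g f h h
print "@unique\n";      # a b c d e f g h
```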

2. Extract elements that appear exactly twice in the list

    @crops = qw(wheat corn barley rice corn soybean hay
                alfalfa rice hay beets corn hay);
    @duplicates = grep { $count{$_} == 2 }
                  grep { ++$count{$_} > 1 } @crops;
    print "@duplicates\n";

Running result: rice
Here grep is used twice, and the calls are evaluated from right to left. First, grep { ++$count{$_} > 1 } @crops returns a temporary list containing every second or later occurrence of an element in @crops (here: corn rice hay corn hay). By the time this grep finishes, %count holds the total number of occurrences of each element in @crops.
Then grep { $count{$_} == 2 } is applied to that temporary list: an element is returned only if its total count in @crops is exactly 2.
So the code above returns rice: rice appears more than once, and exactly twice. corn and hay each appear three times, so they are dropped.
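The same selection can be written as two explicit passes, which some readers find easier to follow than nested grep calls. This is an alternative sketch, not the article's method; %reported is a helper name introduced here.

```perl
use strict;
use warnings;

my @crops = qw(wheat corn barley rice corn soybean hay
               alfalfa rice hay beets corn hay);

# Pass 1: count every crop in the full list.
my %count;
$count{$_}++ for @crops;

# Pass 2: keep crops whose total is exactly 2, reporting each one once.
my %reported;
my @exactly_twice = grep { $count{$_} == 2 && !$reported{$_}++ } @crops;

print "corn=$count{corn} rice=$count{rice} hay=$count{hay}\n";  # corn=3 rice=2 hay=3
print "@exactly_twice\n";                                       # rice
```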

3. List text files in the current directory

    @files = grep { -f and -T } glob '* .*';
    print "@files\n";

glob '* .*' returns a list of every entry in the current directory: the pattern * matches names that do not start with a dot, and .* adds the names that do. {} is a code block containing the condition applied to the list that follows it. This is simply the other form of grep, equivalent in effect to grep EXPR, LIST. -f and -T test each element of the list: it must first be a plain file, and then a text file. Writing the tests in this order is said to be more efficient: -T has a higher overhead because it has to read part of the file, so -f should be checked before -T.
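To try the file tests without depending on whatever happens to be in the current directory, the following sketch builds a temporary directory first. The file names are invented for illustration, and the zero-byte file relies on the documented rule that -T treats a file containing zero bytes as binary.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a throwaway directory so the sketch does not depend on real files.
my $dir = tempdir(CLEANUP => 1);

open(my $txt, '>', "$dir/notes.txt") or die "Can't write notes.txt: $!";
print {$txt} "plain text line\n" x 10;
close $txt;

open(my $bin, '>:raw', "$dir/data.bin") or die "Can't write data.bin: $!";
print {$bin} "\0" x 512;    # zero bytes make -T treat the file as binary
close $bin;

mkdir "$dir/subdir" or die "Can't create subdir: $!";

# -f is cheap and runs first; -T (which reads the file) runs only
# for entries that passed -f, because "and" short-circuits.
my @text_files = grep { -f and -T } glob("$dir/* $dir/.*");

print "$_\n" for @text_files;
```

Only notes.txt should be listed: subdir and the dot entries fail -f, and data.bin fails -T.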

4. Select array elements and eliminate duplicates

    @array = qw(To be or not to be that is the question);
    @found_words =
        grep { $_ =~ /b|o/i and ++$counts{$_} < 2; } @array;
    print "@found_words\n";

The running result is: To be or not to question
The condition in {} says that each element of @array must first contain the character b or o (case-insensitive), and then must have been seen fewer than 2 times so far (that is, once). Note that To and to are different hash keys, so both are kept.
grep returns a list containing the elements of @array that meet both conditions.
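Because the hash keys are case-sensitive, the duplicate check treats To and to as different words. The variation below, which is not part of the original article, lowercases the key so that they count as one; %counts_lc is a name introduced for this sketch.

```perl
use strict;
use warnings;

my @array = qw(To be or not to be that is the question);

# Case-sensitive keys, as in the article's code: "To" and "to" differ.
my %counts;
my @found_words = grep { /b|o/i and ++$counts{$_} < 2 } @array;

# Variation: use the lowercased word as the hash key so that
# "To" and "to" are counted as the same word.
my %counts_lc;
my @found_ci = grep { /b|o/i and ++$counts_lc{lc($_)} < 2 } @array;

print "@found_words\n";  # To be or not to question
print "@found_ci\n";     # To be or not question
```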

5. Select the elements of a two-dimensional array where x < y

    # An array of references to anonymous arrays
    @data_points = ([5, 12], [20, -3],
                    [2, 2], [13, 20]);
    @y_gt_x = grep { $_->[0] < $_->[1] } @data_points;
    foreach $xy (@y_gt_x) { print "$xy->[0], $xy->[1]\n" }

The running result is:
5, 12
13, 20
To follow this, you need to understand anonymous arrays. [] creates an anonymous array and yields a reference to it (similar to a pointer in C).
Each element of @data_points is a reference to an anonymous array. For example:

    foreach (@data_points) {
        print $_->[0];
    }

This accesses the first element of each anonymous array; replacing 0 with 1 accesses the second element.
So { $_->[0] < $_->[1] } is now clear: it is true when the first element of the anonymous array is smaller than the second.
grep { $_->[0] < $_->[1] } @data_points returns the list of array references that satisfy this condition.
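One detail worth checking is that grep hands back the same references stored in @data_points rather than copies of the inner arrays. The sketch below shows this and unpacks each selected pair into named variables; $x, $y and $point are names chosen here.

```perl
use strict;
use warnings;

my @data_points = ([5, 12], [20, -3], [2, 2], [13, 20]);

# grep returns the same references that are stored in @data_points,
# not copies of the inner arrays.
my @y_gt_x = grep { $_->[0] < $_->[1] } @data_points;

# Dereference each selected pair into named variables for readability.
for my $point (@y_gt_x) {
    my ($x, $y) = @$point;
    print "x=$x y=$y\n";
}

# Comparing references with == compares their addresses.
print "same reference\n" if $y_gt_x[0] == $data_points[0];
```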

Reference: perl Language Learning

This article is an English translation of an article originally published in Chinese on aliyun.com.
