Detailed usage of grep in Perl

Source: Internet
Author: User
Tags: glob

I have been learning Perl recently. The following describes how to use the powerful grep function in Perl programs.

1. Grep Function
grep has two forms:

    grep BLOCK LIST
    grep EXPR, LIST

BLOCK is a code block enclosed in {}; EXPR is an expression, usually a regular expression. According to the original article, EXPR can be anything, including one or more variables, operators, literals, functions, or subroutine calls.
LIST is the list to be filtered.
grep evaluates BLOCK or EXPR for each element of the list. It traverses the list and temporarily sets $_ to the current element. In list context, grep returns all matching elements, so the result is also a list. In scalar context, grep returns the number of matching elements.
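
As a minimal sketch of the two forms and the two contexts described above (the word list is hypothetical sample data, not from the original article):

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my @words = qw(apple Banana cherry avocado blueberry);

# EXPR form: the regular expression is matched against $_ for each element.
my @a_words = grep /^a/i, @words;

# BLOCK form: any code; an element is kept when the block returns true.
my @long_words = grep { length($_) > 6 } @words;

# Scalar context: grep returns the number of matching elements.
my $a_count = grep /^a/i, @words;

print "@a_words\n";      # apple avocado
print "@long_words\n";   # avocado blueberry
print "$a_count\n";      # 2
```

The only difference between the first and third calls is the left-hand side of the assignment: an array gives list context, a scalar gives scalar context.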
2. Grep and loops

    open(FILE, '<', 'myfile') or die "Can't open myfile: $!";
    print grep /terrorism|nuclear/i, <FILE>;

This opens a file named myfile and prints the lines containing terrorism or nuclear (case-insensitive). In list context, <FILE> returns a list containing every line of the file. As you may have noticed, this approach consumes a lot of memory when the file is large, because the entire file is read into memory.
Of course, you can also do this with a loop:

    while ($line = <FILE>) {
        if ($line =~ /terrorism|nuclear/i) { print $line }
    }

The code above shows that a loop can do anything grep can do. So why use grep? The short answer is that grep is more Perl-style, while the loop is C-style.
A better explanation is: (1) grep makes it clearer to the reader that you are selecting elements from a list; (2) grep is more concise than a loop.
One suggestion: if you are new to Perl, feel free to use loops; once you are familiar with Perl, you can take advantage of grep.
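
To compare the two approaches side by side, here is a small sketch that runs both on the same input. It assumes an in-memory filehandle (a string opened for reading) in place of myfile so that it is self-contained; the text and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical file content, held in a string so the sketch needs no real file.
my $text = "Nuclear talks resume\nWeather is mild\nReport on terrorism\n";

# grep version: <$fh> in list context reads every line into memory first.
open my $fh, '<', \$text or die "Can't open in-memory file: $!";
my @from_grep = grep /terrorism|nuclear/i, <$fh>;
close $fh;

# Loop version: only one line is held in memory at a time.
open $fh, '<', \$text or die "Can't open in-memory file: $!";
my @from_loop;
while (my $line = <$fh>) {
    push @from_loop, $line if $line =~ /terrorism|nuclear/i;
}
close $fh;

print @from_grep;
print "same result\n" if "@from_grep" eq "@from_loop";
```

Both versions select the same lines; the loop is the safer choice for very large files, while grep states the intent in a single expression.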
3. Several grep examples

1. Count the list elements that match an expression

    $num_apple = grep /^apple$/i, @fruits;

In scalar context, grep returns the number of matching elements; in list context, it returns a list of the matching elements.
Therefore, the code above returns the number of elements in the @fruits array that are the word apple (case-insensitive). Because $num_apple is a scalar, the assignment puts grep in scalar context.
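
The effect of context can be seen in a short sketch. The contents of @fruits are an assumption (the article does not show them), and $first_apple is a hypothetical name added for contrast:

```perl
use strict;
use warnings;

# Hypothetical contents for @fruits; the article does not list them.
my @fruits = qw(apple Banana APPLE pineapple cherry Apple);

# Scalar assignment: grep runs in scalar context and returns a count.
my $num_apple = grep /^apple$/i, @fruits;

# Parentheses make this a list assignment: grep runs in list context
# and the first matching element is stored instead of the count.
my ($first_apple) = grep /^apple$/i, @fruits;

print "$num_apple\n";    # 3
print "$first_apple\n";  # apple
```

Note that pineapple is not counted, because the pattern is anchored with ^ and $.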

2. Extract unique elements from the list

    @unique = grep { ++$count{$_} < 2 }
              qw(a b a c d d e f g f h h);
    print "@unique\n";

Running the code prints: a b c d e f g h
That is, each distinct element of the list qw(a b a c d d e f g f h h) is returned once. Why? Let's take a look:
%count is a hash. Its keys are the list elements, added one by one as grep traverses the qw() list. ++$count{$_} increments the hash value stored under the key $_. In this comparison, ++$count{$_} and $count{$_}++ behave differently: the former increments the value before the comparison, while the latter increments it only after the comparison. Therefore, ++$count{$_} < 2 means: add 1 to $count{$_}, then compare the result with 2. The initial value of $count{$_} is undef, which counts as 0 in numeric context. So the first time an element such as a is used as a hash key, its value becomes 1. The second time, the value becomes 2, the comparison fails, and a is not returned a second time.
Therefore, the code above keeps each element of the list only once.
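
The difference between pre-increment and post-increment can be checked with a small sketch. The hash names %seen and %n and the !$seen{$_}++ variant are not from the article; they are shown here only for comparison.

```perl
use strict;
use warnings;

my @letters = qw(a b a c d d e f g f h h);

# Pre-increment: the value is incremented first, then compared with 2.
my %count;
my @pre = grep { ++$count{$_} < 2 } @letters;

# Post-increment with the same comparison: the old value is compared,
# so the first TWO occurrences of each element pass the test.
my %n;
my @first_two = grep { $n{$_}++ < 2 } @letters;

# A common post-increment variant: keep an element while its old count is 0.
my %seen;
my @post = grep { !$seen{$_}++ } @letters;

print "@pre\n";                 # a b c d e f g h
print "@post\n";                # a b c d e f g h
print scalar(@first_two), "\n"; # 12 (no element occurs more than twice)
```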

2. Extract elements that appear exactly twice in the list

    @crops = qw(wheat corn barley rice corn soybean hay
                alfalfa rice hay beets corn hay);
    @duplicates = grep { $count{$_} == 2 }
                  grep { ++$count{$_} > 1 } @crops;
    print "@duplicates\n";

Result: rice
Here grep is used twice, and the calls are evaluated from right to left. First, grep { ++$count{$_} > 1 } @crops returns a list of the elements that occur more than once in @crops (each second or later occurrence passes the test).
Then grep { $count{$_} == 2 } is applied to that temporary list: an element is returned only if its total number of occurrences equals 2.
So the code above returns rice: rice occurs more than once, and exactly twice.
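
To make the temporary list visible, the two grep calls can be split into separate steps. This is a sketch of the same logic; the variable @repeats is a hypothetical name for the intermediate result.

```perl
use strict;
use warnings;

my @crops = qw(wheat corn barley rice corn soybean hay
               alfalfa rice hay beets corn hay);
my %count;

# Step 1 (the inner grep): every second or later occurrence passes.
# As a side effect, %count ends up holding the total for each crop.
my @repeats = grep { ++$count{$_} > 1 } @crops;

# Step 2 (the outer grep): by now %count is complete, so this keeps
# only the crops whose final total is exactly 2.
my @duplicates = grep { $count{$_} == 2 } @repeats;

print "@repeats\n";     # corn rice hay corn hay
print "@duplicates\n";  # rice
```

The second step works only because the first grep has finished before the second one starts; corn and hay both end with a total of 3 and are filtered out.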

3. List text files in the current directory

    @files = grep { -f and -T } glob '* .*';
    print "@files\n";

glob returns a list of the entries in the current directory: the pattern * matches everything except names starting with a dot, and .* matches those as well. {} is a code block containing the conditions applied to the list that follows it. This is simply the other form of grep, similar to grep EXPR, LIST. -f and -T test each element of the list: it must first be a plain file, and then it must also be a text file. Writing the tests in this order is said to be more efficient: because -T has a higher overhead, -f should be tested before -T.
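
The following sketch tries the same filter in a throwaway directory so the result is predictable. Everything it creates (the temporary directory, the file names, the write_file helper) is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Work inside a temporary directory that is removed at exit.
my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);
chdir $dir or die "Can't chdir to $dir: $!";

# Hypothetical helper: create a file with the given raw content.
sub write_file {
    my ($name, $content) = @_;
    open my $out, '>:raw', $name or die "Can't write $name: $!";
    print {$out} $content;
    close $out or die "Can't close $name: $!";
}

write_file('notes.txt', "plain text\n");         # text file
write_file('.hidden',   "dot file\n");           # text file, name starts with a dot
write_file('data.bin',  "\0\1\2\3" x 64);        # binary: contains null bytes
mkdir 'subdir' or die "Can't mkdir subdir: $!";  # a directory, so -f fails

# '*' skips dot files and '.*' adds them; -f runs before the costlier -T.
my @files = sort grep { -f and -T } glob '* .*';
print "@files\n";

chdir $start or die "Can't chdir back: $!";      # leave so cleanup can succeed
```

The directory, the binary file, and the . and .. entries matched by .* are all filtered out, leaving only the two text files.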

4. Select array elements and eliminate duplicates

    @array = qw(To be or not to be that is the question);
    @found_words =
        grep { $_ =~ /b|o/i and ++$counts{$_} < 2; } @array;
    print "@found_words\n";

The result is: To be or not to question
The block in {} means: for each element of @array, first check whether it contains the letter b or o (case-insensitive), and then check that the element has been seen fewer than 2 times (that is, this is its first occurrence).
grep returns a list of the elements of @array that satisfy both conditions.
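
Both To and to appear in the result because hash keys are case-sensitive, so the two spellings are counted separately. The sketch below repeats the article's filter and adds a variant, not in the original, that counts on lc($_) instead; %counts_ci and @found_ci are hypothetical names.

```perl
use strict;
use warnings;

my @array = qw(To be or not to be that is the question);

# The article's filter: "To" and "to" are different hash keys.
my %counts;
my @found_words = grep { /b|o/i and ++$counts{$_} < 2 } @array;

# Variant (an assumption, not from the article): count on the
# lowercased word so that "To" and "to" are treated as duplicates.
my %counts_ci;
my @found_ci = grep { /b|o/i and ++$counts_ci{ lc($_) } < 2 } @array;

print "@found_words\n";  # To be or not to question
print "@found_ci\n";     # To be or not question
```

Because and short-circuits, the counter is incremented only for words that match the pattern.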

5. Select elements of a two-dimensional array where x < y

    # An array of references to anonymous arrays
    @data_points = ( [ 5, 12 ], [ 20, -3 ],
                     [ 2, 2 ], [ 13, 20 ] );
    @y_gt_x = grep { $_->[0] < $_->[1] } @data_points;
    foreach $xy (@y_gt_x) { print "$xy->[0], $xy->[1]\n" }

The running result is:
5, 12
13, 20
To follow this, you need to understand anonymous arrays. [] creates an anonymous array and returns a reference to it (similar to a pointer in C).
Each element of @data_points is a reference to an anonymous array. For example:

    foreach (@data_points) {
        print $_->[0];
    }

This accesses the first element of each anonymous array; replacing 0 with 1 accesses the second element.
So { $_->[0] < $_->[1] } is now clear: it tests whether the first element of each anonymous array is smaller than the second.
grep { $_->[0] < $_->[1] } @data_points; returns the list of anonymous arrays that satisfy this condition.
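
A self-contained version of this example, written under use strict, is sketched below. It also checks one detail the text implies: grep returns the same array references that are stored in @data_points, not copies of the inner arrays.

```perl
use strict;
use warnings;

# An array of references to anonymous arrays, as in the article.
my @data_points = ( [ 5, 12 ], [ 20, -3 ], [ 2, 2 ], [ 13, 20 ] );

# Keep the points whose first element (x) is smaller than the second (y).
my @y_gt_x = grep { $_->[0] < $_->[1] } @data_points;

foreach my $xy (@y_gt_x) {
    print "$xy->[0], $xy->[1]\n";
}

# References compare equal numerically when they point to the same array.
print "same reference\n" if $y_gt_x[0] == $data_points[0];
```

Because the references are shared, modifying $y_gt_x[0][0] would also change the first point in @data_points.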

Reference: perl Language Learning
