Summary of grep usage in Perl

(1) The grep function
grep has two forms:
grep BLOCK LIST
grep EXPR, LIST
BLOCK is a code block enclosed in {}; EXPR is an expression, most often a regular expression. The original text says EXPR can be anything, including one or more variables, operators, literals, functions, or subroutine calls.
LIST is the list to be filtered.
grep evaluates BLOCK or EXPR for each element of LIST, temporarily setting $_ to that element. In list context, grep returns all elements for which the test is true, and the result is itself a list. In scalar context, grep returns the number of elements that matched.
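As a minimal sketch of the two forms and the two contexts described above, the following can be run as-is. The array @words and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data for illustration.
my @words = qw(apple Banana cherry avocado);

# BLOCK form: keep the elements for which the block returns true.
my @short = grep { length($_) <= 5 } @words;

# EXPR form: note the comma between the expression and the list.
my @a_words = grep /^a/i, @words;

# Scalar context: the same test yields a count instead of a list.
my $count = grep /^a/i, @words;

print "@short\n";      # apple
print "@a_words\n";    # apple avocado
print "$count\n";      # 2
```

The only syntactic difference between the forms is the comma: the BLOCK form takes none, the EXPR form requires one.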
(2) Grep vs. loops
open FILE, "<myfile" or die "Can't open myfile: $!";
print grep /terrorism|nuclear/i, <FILE>;
This opens the file myfile and prints every line that contains terrorism or nuclear. In list context, <FILE> returns a list holding the complete contents of the file, one element per line.
As you may have noticed, this approach is very memory-intensive when the file is large, because the entire contents of the file are read into memory at once.
The alternative is to use a loop:
while ($line = <FILE>) {
    if ($line =~ /terrorism|nuclear/i) { print $line }
}
The code above shows that a loop can do anything grep can do. So why use grep? The short answer is that grep is more Perl style, while the loop is C style.
A better explanation is: (1) grep makes it obvious to the reader that you are selecting elements from a list; (2) grep is more concise than a loop.
One suggestion: if you are new to Perl, stick with ordinary loops at first; once you are familiar with Perl, you can make use of grep, a powerful tool.
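To compare the two approaches side by side without needing a real file, here is a small sketch that reads from an in-memory filehandle. The sample text and the variable names are assumptions made for illustration; the pattern is the one used above.

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for the contents of myfile.
my $text = "Budget talks resume\nNuclear plant reopens\nLocal team wins\n";

# grep version: <$fh> in list context reads every line before filtering.
open my $fh, '<', \$text or die "Can't open in-memory file: $!";
my @with_grep = grep /terrorism|nuclear/i, <$fh>;
close $fh;

# Loop version: only one line is held in memory at a time.
open $fh, '<', \$text or die "Can't open in-memory file: $!";
my @with_loop;
while (my $line = <$fh>) {
    push @with_loop, $line if $line =~ /terrorism|nuclear/i;
}
close $fh;

print @with_grep;    # Nuclear plant reopens
print @with_loop;    # Nuclear plant reopens
```

Both versions select the same lines; they differ only in how much of the input is in memory at once.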
(3) Several examples of grep
1. Count the list elements that match an expression
$num_apple = grep /^apple$/i, @fruits;
The code above returns the number of elements in @fruits that are the word apple (case-insensitive). Because the result is assigned to the scalar $num_apple, grep is evaluated in scalar context and returns the number of matching elements rather than a list of them.
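The effect of context can be seen by assigning the same grep to a scalar and to a parenthesized list. This is a small sketch; the contents of @fruits are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @fruits = qw(apple Apple pineapple APPLE pear);

# Scalar assignment: grep returns the number of matches.
my $num_apple = grep /^apple$/i, @fruits;

# Parentheses on the left give list context: $first receives the
# first matching element, not the count.
my ($first) = grep /^apple$/i, @fruits;

print "$num_apple $first\n";    # 3 apple
```

Adding parentheses around the variable on the left-hand side is an easy mistake to make when a count is intended.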
2. Extract the unique elements of a list
@unique = grep { ++$count{$_} < 2 }
         qw(a b a c d d e f g f h h);
print "@unique\n";
Running the code above prints: a b c d e f g h
That is, each distinct element of the list qw(a b a c d d e f g f h h) is returned exactly once. Why is this so? Let us see:
%count is a hash whose keys are the list elements visited one by one while traversing the qw() list. ++$count{$_} increments the hash value stored under the key $_. In this comparison, ++$count{$_} and $count{$_}++ mean different things: the former increments the value before the comparison, while the latter compares first and increments afterwards. So ++$count{$_} < 2 means: increase $count{$_} by 1, then compare the result with 2. The value of $count{$_} starts out as undef, which is treated as 0. The first time an element such as a is used as a hash key, its value after the increment is 1; the second time, it becomes 2. Once the value reaches 2 the condition is no longer satisfied, so a is not returned a second time.
So the code above keeps only the first occurrence of each element.
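The post-increment form mentioned above leads to an equivalent, commonly seen way of writing the same filter. This is a sketch rather than the original author's code, and the hash name %seen is an arbitrary choice.

```perl
use strict;
use warnings;

my %seen;

# $seen{$_}++ returns the old value (undef, i.e. false, on first sight)
# and then increments it, so the negation is true only the first time.
my @unique = grep { !$seen{$_}++ } qw(a b a c d d e f g f h h);

print "@unique\n";    # a b c d e f g h
```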
3. Extract the elements that appear exactly twice in a list
@crops = qw(wheat corn barley rice corn soybean hay
       alfalfa rice hay beets corn hay);
@duplicates = grep { $count{$_} == 2 }
       grep { ++$count{$_} > 1 } @crops;
print "@duplicates\n";
The result is: rice
Here grep is applied twice, from right to left. First, grep { ++$count{$_} > 1 } @crops returns a temporary list containing every second and later occurrence of an element in @crops (corn rice hay corn hay). By the time it finishes, %count holds the total number of occurrences of each element.
Then grep { $count{$_} == 2 } is applied to that temporary list and keeps the elements whose total count is exactly 2.
So the code above returns rice: it occurs more than once, and it occurs exactly twice. Understand? :-)
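The same result can be reached in two explicit steps, which some readers may find easier to follow: count everything first, then filter the distinct keys. This is an alternative sketch, not the method used above; sorting the keys is an added choice to make the output order predictable.

```perl
use strict;
use warnings;

my @crops = qw(wheat corn barley rice corn soybean hay
               alfalfa rice hay beets corn hay);

# Step 1: count the occurrences of every element.
my %count;
$count{$_}++ for @crops;

# Step 2: keep the distinct elements whose count is exactly 2.
my @twice = grep { $count{$_} == 2 } sort keys %count;

print "@twice\n";    # rice
```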
4. List the text files in the current directory
@files = grep { -f and -T } glob '*.*';
print "@files\n";
This is easy to understand. glob returns a list of the names in the current directory that match the pattern; names beginning with '.' are not matched. The {} is a code block holding the condition that is tested against each element of the list that follows it. This is just the other form of grep, and it works much like grep EXPR, LIST. -f and -T are applied to each element of the list ($_ by default): it must first be a plain file, and then it must be a text file. It is said that writing it this way is more efficient: -T is the more expensive test, so -f is checked first and -T is skipped for anything that is not a plain file.
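To try the file tests without depending on whatever happens to be in the current directory, the sketch below builds a scratch directory first. The file and directory names are invented, and it assumes the temporary path contains no whitespace, because glob splits its pattern on spaces.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory, removed automatically when the program exits.
# Assumption: the path contains no whitespace (glob splits on spaces).
my $dir = tempdir(CLEANUP => 1);

# One plain text file and one subdirectory, both with a dot in the name.
open my $out, '>', "$dir/notes.txt" or die "Can't write notes.txt: $!";
print $out "plain text\n";
close $out;
mkdir "$dir/data.d" or die "Can't create data.d: $!";

my @entries = glob "$dir/*.*";
my @files   = grep { -f and -T } @entries;

print scalar(@entries), " entries, ", scalar(@files), " text file(s)\n";
```

The subdirectory matches the glob pattern but fails -f, so -T is never run on it.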
5. Select array elements and eliminate duplicates
@array = qw(To be or not to be that is the question);
@found_words =
grep { $_ =~ /b|o/i and ++$counts{$_} < 2; } @array;
print "@found_words\n";
The running result is: To be or not to question
The block means: for each element of @array, first check whether it contains the letter b or o (case-insensitive), and then require that the element has been seen fewer than 2 times so far (that is, this is its first occurrence). Because hash keys are case-sensitive, To and to are counted as different words, so both appear in the result.
grep returns a list containing the elements of @array that satisfy both conditions.
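If To and to should be treated as the same word, one possible variation is to fold the case of the hash key with lc. This is a sketch of that variation, not part of the original example.

```perl
use strict;
use warnings;

my @array = qw(To be or not to be that is the question);

my %counts;
# lc folds the key to lower case, so "To" and "to" share one counter.
my @found_words = grep { /b|o/i and ++$counts{lc $_} < 2 } @array;

print "@found_words\n";    # To be or not question
```

The element itself is returned unchanged; only the key used for counting is lower-cased.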
6. Select the elements of a two-dimensional array where x < y
# An array of references to anonymous arrays
@data_points = ([5, 12], [20, -3],
         [2, 2], [13, 20]);
@y_gt_x = grep { $_->[0] < $_->[1] } @data_points;
foreach $xy (@y_gt_x) { print "$xy->[0], $xy->[1]\n" }
The running result is:
5, 12
13, 20
To follow this, you need to understand anonymous arrays: [] creates an anonymous array and yields a reference to it (similar to a pointer in C).
The elements of @data_points are references to anonymous arrays. For example:
foreach (@data_points) {
print $_->[0]; }
This accesses the first element of each anonymous array; replacing 0 with 1 gives the second element.
So the meaning of { $_->[0] < $_->[1] } is clear: the first element of the anonymous array must be less than its second element.
And grep { $_->[0] < $_->[1] } @data_points returns the list of anonymous arrays that satisfy this condition.
So you get the result you want!
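The same filter can be written with named variables by unpacking each pair inside the block. The sketch below also shows that grep returns the original references rather than copies; the names $x and $y are chosen for illustration.

```perl
use strict;
use warnings;

my @data_points = ([5, 12], [20, -3], [2, 2], [13, 20]);

# Dereference the current pair into $x and $y, then compare them.
my @y_gt_x = grep { my ($x, $y) = @$_; $x < $y } @data_points;

print join("; ", map { "$_->[0], $_->[1]" } @y_gt_x), "\n";    # 5, 12; 13, 20

# The first result is the very same array reference as the first input.
print "same reference\n" if $y_gt_x[0] == $data_points[0];
```

Because the references are shared, modifying an array through @y_gt_x also changes what @data_points sees.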
7. Simple database query
The complexity of the code inside grep's {} is limited only by the amount of virtual memory available to the program. The following is a complex {} example that simulates a database query:
# @database is an array of references to anonymous hashes
@database = (
{ name => "Wild Ginger",
   city => "Seattle",
   cuisine => "Asian Thai Chinese Korean Japanese",
   expense => 4,
   music => "\0",
   meals => "lunchdinner",
   view => "\0",
   smoking => "\0",
   parking => "validated",
   rating => 4,
   payment => "MCVISA AMEX",
},
# {...}, etc.
);
sub findRestaurants {
my ($database, $query) = @_;
return grep {
   $query->{city} ?
       lc($query->{city}) eq lc($_->{city}) : 1
    and $query->{cuisine} ?
       $_->{cuisine} =~ /$query->{cuisine}/i : 1
    and $query->{min_expense} ?
     $_->{expense} >= $query->{min_expense} : 1
    and $query->{max_expense} ?
     $_->{expense} <= $query->{max_expense} : 1
    and $query->{music} ? $_->{music} : 1
    and $query->{music_type} ?
     $_->{music} =~ /$query->{music_type}/i : 1
    and $query->{meals} ?
     $_->{meals} =~ /$query->{meals}/i : 1
    and $query->{view} ? $_->{view} : 1
    and $query->{smoking} ? $_->{smoking} : 1
    and $query->{parking} ? $_->{parking} : 1
    and $query->{min_rating} ?
     $_->{rating} >= $query->{min_rating} : 1
    and $query->{max_rating} ?
     $_->{rating} <= $query->{max_rating} : 1
    and $query->{payment} ?
     $_->{payment} =~ /$query->{payment}/i : 1
} @$database;
}
%query = (city => 'Seattle', cuisine => 'Asian|Thai');
@restaurants = findRestaurants(\@database, \%query);
print "$restaurants[0]->{name}\n";
The running result is: Wild Ginger
The code above is not difficult to understand, but Fairy does not recommend writing code like this: first, it consumes memory, and second, it is difficult to maintain.
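One possible way to ease the maintenance concern is to keep each condition in its own small subroutine and let grep apply only the ones the query asks for. The following is a rough sketch under that assumption, not the original author's approach. The names %test_for and find_restaurants are hypothetical, and the database is cut down to one record with only the fields the sketch uses.

```perl
use strict;
use warnings;

# Hypothetical one-record database, reduced to the fields used below.
my @database = (
    { name    => "Wild Ginger",
      city    => "Seattle",
      cuisine => "Asian Thai Chinese Korean Japanese",
      rating  => 4 },
);

# One small test per query key; each receives (record, wanted value).
my %test_for = (
    city       => sub { my ($r, $want) = @_; lc($r->{city}) eq lc($want) },
    cuisine    => sub { my ($r, $want) = @_; $r->{cuisine} =~ /$want/i },
    min_rating => sub { my ($r, $want) = @_; $r->{rating} >= $want },
    max_rating => sub { my ($r, $want) = @_; $r->{rating} <= $want },
);

sub find_restaurants {
    my ($database, $query) = @_;
    # Only keys that are set in the query and have a known test take part.
    my @keys = grep { $query->{$_} && $test_for{$_} } keys %$query;
    return grep {
        my $record = $_;
        # A record matches when no active test fails for it.
        !grep { !$test_for{$_}->($record, $query->{$_}) } @keys;
    } @$database;
}

my %query = (city => 'Seattle', cuisine => 'Asian|Thai');
my @restaurants = find_restaurants(\@database, \%query);
print "$restaurants[0]{name}\n";    # Wild Ginger
```

Adding a new search field then means adding one entry to the table instead of extending a long chain of and conditions.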
(Note, the above content is from bbs.chinaunix.net)