Perl sort Functions

Source: Internet
Author: User
(1) sort function

sort LIST
sort BLOCK LIST
sort SUBNAME LIST

sort is used as shown above. It sorts LIST and returns the sorted list. If SUBNAME or BLOCK is omitted, sort uses standard string comparison order (for example, ASCII order). If SUBNAME is given, it is the name of a subroutine that compares two list elements and returns an integer less than, equal to, or greater than 0, depending on whether the first element should sort before, the same as, or after the second. You can also supply a BLOCK as an anonymous subroutine in place of SUBNAME; the effect is the same.

The two elements being compared are temporarily assigned to the variables $a and $b. They are aliases for the original list elements, so do not modify $a or $b. If a subroutine is used, it must not be recursive (this restriction applies to Perl versions before 5.10).
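
As a small sketch of the comparison contract described above (the variable names are illustrative and not from the original article), the two built-in comparison operators can be called directly to see the kind of value a sort subroutine is expected to return:

```perl
use strict;
use warnings;

# <=> compares numerically, cmp compares as strings.
# Each returns -1, 0, or 1.
my @numeric = (2 <=> 10, 5 <=> 5, 10 <=> 2);
my @string  = ('2' cmp '10', 'a' cmp 'a', 'b' cmp 'a');

print "numeric: @numeric\n";   # numeric: -1 0 1
print "string:  @string\n";    # string:  1 0 1
```

Note that '2' cmp '10' is 1: as strings, "2" sorts after "10" because the comparison goes character by character.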

(2) usage example

1. Sort in numerical order

@array = (8, 2, 32, 1, 4, 16);
print join(' ', sort { $a <=> $b } @array), "\n";

The printed result is:
1 2 4 8 16 32

Equivalently:

sub numerically { $a <=> $b }
print join(' ', sort numerically @array), "\n";

This is easy to understand: it simply sorts in ordinary numeric order, so I will not elaborate.

2.1 Sort in ASCII order (not dictionary order)

@languages = qw(fortran lisp c++ Perl python java);
print join(' ', sort @languages), "\n";

The printed result is:
Perl c++ fortran java lisp python

This is equivalent to:
print join(' ', sort { $a cmp $b } @languages), "\n";

There is not much to say about sorting in ASCII order; note only that uppercase letters sort before lowercase ones.

Note: if you sort numbers in ASCII order, the result may not be what you want:

print join(' ', sort 1 .. 11), "\n";
1 10 11 2 3 4 5 6 7 8 9
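
To make the difference concrete, here is a minimal side-by-side sketch (the variable names are my own) that sorts the same range with the default string comparison and with an explicit numeric comparison:

```perl
use strict;
use warnings;

my @as_strings = sort(1 .. 11);                # default: string comparison
my @as_numbers = sort { $a <=> $b } 1 .. 11;   # explicit numeric comparison

print "@as_strings\n";   # 1 10 11 2 3 4 5 6 7 8 9
print "@as_numbers\n";   # 1 2 3 4 5 6 7 8 9 10 11
```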

2.2 Sort in dictionary order

use locale;
@array = qw(ascii ascap at_large atLarge a arp);
@sorted = sort { ($da = lc $a) =~ s/[\W_]+//g;
                 ($db = lc $b) =~ s/[\W_]+//g;
                 $da cmp $db;
               } @array;
print "@sorted\n";

The printed result is:
a arp ascap ascii atLarge at_large

use locale is optional; it makes the code more portable when the raw data contains international characters. use locale affects the behavior of cmp, lt, le, ge, gt, and some other functions. For more information, see the perllocale man page.

Note that atLarge and at_large come out in the opposite order from the input, even though they have the same sort key (the sort block removes the underscore in the middle of at_large). This happened because the sample was run on Perl 5.005_02. Perl 5.6 and earlier implement sort with a quicksort, which does not preserve the order of elements that compare equal; starting with Perl 5.8, sort uses a mergesort, which does preserve it.

Note that map, grep, and sort all alias their temporary variable ($_ for map and grep, $a and $b for sort) to the original elements, so do not modify it. In this code, the values are first copied into $da and $db, and the substitution s/[\W_]+//g is applied to those copies rather than to $a or $b, so the original elements are not modified.
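
The copy-before-modify idea can also be packaged in a helper. The following is only a sketch: dict_key is a hypothetical name of my own, and the word list is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build a lowercase key with all non-word
# characters and underscores removed.
sub dict_key {
    my ($word) = @_;                       # copy, so the caller's value is untouched
    (my $key = lc $word) =~ s/[\W_]+//g;   # modify the copy only
    return $key;
}

my @words  = qw(at_large Zebra apple At-Home);
my @sorted = sort { dict_key($a) cmp dict_key($b) } @words;

print "@sorted\n";   # apple At-Home at_large Zebra
print "@words\n";    # at_large Zebra apple At-Home (original list unchanged)
```

Because the helper works on its own copy, $a and $b are never modified inside the sort block.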

3. Sort in descending order

Sorting in descending order is simple: swap the operands of cmp or <=>.

sort { $b <=> $a } @array;

Or negate the return value of the block or subroutine:

sort { -($a <=> $b) } @array;

Or use the reverse function (slightly less efficient, but perhaps easier to read):

reverse sort { $a <=> $b } @array;
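
As a quick check, this sketch runs the three variants on the array from the numeric example and prints each result (the result variable names are my own):

```perl
use strict;
use warnings;

my @array = (8, 2, 32, 1, 4, 16);

my @swapped  = sort { $b <=> $a } @array;           # swap the operands
my @negated  = sort { -($a <=> $b) } @array;        # negate the result
my @reversed = reverse sort { $a <=> $b } @array;   # sort ascending, then reverse

print "@swapped\n";    # 32 16 8 4 2 1
print "@negated\n";    # 32 16 8 4 2 1
print "@reversed\n";   # 32 16 8 4 2 1
```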

4. Use multiple keys for sort

To sort on multiple keys, put all the comparisons in one subroutine and connect them with or. Put the primary comparison first and the secondary ones after it.

# An array of references to anonymous hashes
@employees = (
    { first => 'Bill',   last => 'Gates',
      salary => 600000,  age  => 45 },
    { first => 'George', last => 'Tester',
      salary => 55000,   age  => 29 },
    { first => 'Steve',  last => 'Ballmer',
      salary => 600000,  age  => 41 },
    { first => 'Sally',  last => 'Developer',
      salary => 55000,   age  => 29 },
    { first => 'Joe',    last => 'Tester',
      salary => 55000,   age  => 29 },
);
sub seniority {
    $b->{salary} <=> $a->{salary}
        or $b->{age} <=> $a->{age}
        or $a->{last} cmp $b->{last}
        or $a->{first} cmp $b->{first}
}
@ranked = sort seniority @employees;
foreach $emp (@ranked) {
    print "$emp->{salary}\t$emp->{age}\t$emp->{first} $emp->{last}\n";
}

The printed result is:

600000 45 Bill Gates
600000 41 Steve Ballmer
55000 29 Sally Developer
55000 29 George Tester
55000 29 Joe Tester

The code above looks complicated but is actually easy to understand. Each element of the @employees array is an anonymous hash, which is really a reference. Use the -> operator to access its values; for example, $employees[0]->{salary} is the salary value in the first anonymous hash. The comparison is then straightforward: first compare salary, then age, then last, and finally first. Note that the first two keys are sorted in descending order and the last two in ascending order; do not confuse them.
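
The reason the or chain works is that <=> and cmp return 0 (false) for equal keys, so evaluation falls through to the next comparison. A minimal sketch with made-up [name, score] records:

```perl
use strict;
use warnings;

# Illustrative records: [name, score]
my @players = (['carol', 70], ['alice', 90], ['bob', 70], ['dave', 90]);

# Score descending, then name ascending. When the scores are equal,
# <=> returns 0 (false), so "or" falls through to the name comparison.
my @ranked = sort {
    $b->[1] <=> $a->[1]
        or
    $a->[0] cmp $b->[0]
} @players;

print join(', ', map { "$_->[0]:$_->[1]" } @ranked), "\n";
# alice:90, dave:90, bob:70, carol:70
```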

5. Sort into a new array of ranks

@x = qw(Matt Elroy Jane Sally);
@rank[sort { $x[$a] cmp $x[$b] } 0 .. $#x] = 0 .. $#x;
print "@rank\n";

The printed result is:

2 0 1 3

Confused? Read it carefully. 0 .. $#x is a list of the subscripts of the @x array, here 0 1 2 3. $x[$a] cmp $x[$b] compares the elements of @x in ASCII order. So sort returns a list of the subscripts of @x, ordered by the ASCII order of the @x elements they refer to.

What exactly does sort return? First print the elements of @x in ASCII order:

@x = qw(Matt Elroy Jane Sally);
print join ' ', sort { $a cmp $b } @x;

The printed result is: Elroy Jane Matt Sally.

Their subscripts in @x are 1 2 0 3, so the sort above returns the list (1, 2, 0, 3). @rank[1, 2, 0, 3] = 0 .. $#x is then just a simple array slice assignment, so @rank ends up as (2, 0, 1, 3).
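
The same computation can be split into two explicit steps, which may be easier to follow. In this sketch the intermediate array @order is a name of my own:

```perl
use strict;
use warnings;

my @x = qw(Matt Elroy Jane Sally);

# Step 1: subscripts of @x, ordered by the element each one refers to.
my @order = sort { $x[$a] cmp $x[$b] } 0 .. $#x;

# Step 2: slice assignment, i.e. $rank[ $order[$i] ] = $i for every $i.
my @rank;
@rank[@order] = 0 .. $#x;

print "order: @order\n";   # order: 1 2 0 3
print "rank:  @rank\n";    # rank:  2 0 1 3
```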

6. Sort a hash by keys

%hash = (Donald => 'Knuth', Alan => 'Turing', John => 'Neumann');
# The leading + makes Perl treat the inner braces as an anonymous hash
@sorted = map { +{ $_ => $hash{$_} } } sort keys %hash;
foreach $hashref (@sorted) {
    ($key, $value) = each %$hashref;
    print "$key => $value\n";
}

The printed result is:

Alan => Turing
Donald => Knuth
John => Neumann

The code above is not hard to understand. sort keys %hash returns the keys of %hash in ASCII order, and map then processes that list. Note the nested braces in the map: the outer pair is map's block and the inner pair builds an anonymous hash (writing the inner pair as +{ ... } makes sure Perl does not parse it as a block). In other words, map returns a list of anonymous hashes.

Therefore, the elements of the @sorted array are references to anonymous hashes. Dereferencing them with %$hashref gives access to their key/value pairs.
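
Here is a self-contained sketch of the same pattern under use strict, with a made-up %stock hash. Since each anonymous hash holds exactly one pair, it can be unpacked with a plain list assignment:

```perl
use strict;
use warnings;

my %stock = (pear => 3, apple => 7, fig => 5);   # illustrative data

# One-element anonymous hashes, ordered by key. The unary + tells
# Perl that the inner braces are a hash constructor, not a block.
my @sorted = map { +{ $_ => $stock{$_} } } sort keys %stock;

my @lines;
for my $hashref (@sorted) {
    my ($key, $value) = %$hashref;   # each hash holds exactly one pair
    push @lines, "$key => $value";
}
print "$_\n" for @lines;
# apple => 7
# fig => 5
# pear => 3
```

If the anonymous hashes are not needed elsewhere, looping directly over sort keys %stock gives the same output with less machinery.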

7. Sort a hash by values

%hash = ( Elliot  => 'Babbage',
          Charles => 'Babbage',
          Grace   => 'Hopper',
          Herman  => 'Hollerith'
        );
@sorted = map { +{ $_ => $hash{$_} } }
          sort { $hash{$a} cmp $hash{$b}
                 or $a cmp $b
               } keys %hash;
foreach $hashref (@sorted) {
    ($key, $value) = each %$hashref;
    print "$key => $value\n";
}

The printed result is:

Charles => Babbage
Elliot => Babbage
Herman => Hollerith
Grace => Hopper

The original author makes a point here that I think is very important:

Unlike hash keys, hash values are not guaranteed to be unique. If you sort a hash only by its values, the relative order of two elements with the same value may change when you add or delete other values. To get a stable result, do the primary sort on the value and a secondary sort on the key.

Here { $hash{$a} cmp $hash{$b} or $a cmp $b } compares first by value and then by key. sort returns the sorted list of keys, which is then handed to map, which returns a list of anonymous hashes. They are accessed in the same way as before, so I will not go into details.
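
A minimal sketch of the value-then-key ordering on its own, without the map step (the %color_of hash is made-up data for illustration):

```perl
use strict;
use warnings;

my %color_of = (plum => 'purple', lime => 'green', kiwi => 'green', beet => 'purple');

# Primary sort on the value, secondary sort on the key to break ties.
my @keys = sort { $color_of{$a} cmp $color_of{$b} or $a cmp $b } keys %color_of;

print "$_ => $color_of{$_}\n" for @keys;
# kiwi => green
# lime => green
# beet => purple
# plum => purple
```

Without the or $a cmp $b part, the relative order of kiwi and lime (and of beet and plum) would depend on the hash's internal key order.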

8. Sort the words in a file and remove duplicates

perl -0777ane '$, = "\n"; @uniq{@F} = (); print sort keys %uniq' file

I am not entirely clear on this trick myself :(

@uniq{@F} = () uses a hash slice to create a hash whose keys are the unique words in the file; semantically it is equivalent to ($uniq{$F[0]}, $uniq{$F[1]}, ..., $uniq{$F[$#F]}) = ().

The options are as follows:

-0777   read the entire file at once instead of line by line
-a      autosplit mode: split the input into the @F array
-e      read and run the script from the command line
-n      loop over the input: while (<>) { ... }
$,      output field separator for the print function
file    file name
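
Written out as an ordinary script, the one-liner does roughly the following. This is a sketch only: the sample text stands in for the file contents, and @F is declared by hand here because there is no -a switch to create it.

```perl
use strict;
use warnings;

# Stand-in for the file contents (assumed sample text).
my $text = "the quick fox\njumps over the lazy fox\n";

my @F = split ' ', $text;   # -0777 with -a: split the whole input on whitespace
my %uniq;
@uniq{@F} = ();             # hash slice: one key per distinct word

my @words = sort keys %uniq;
{
    local $, = "\n";        # output field separator, as in the one-liner
    print @words;
}
print "\n";
```

The values stored in %uniq are all undef; only the keys matter, because a hash cannot hold the same key twice.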

9. Efficient sorting: the Orcish Maneuver and the Schwartzian Transform

sort usually calls the comparison subroutine many times for each key. If the running time of the sort matters to you, use the Orcish Maneuver or the Schwartzian Transform so that each key is computed only once.

Consider the following example, which sorts a list of files by modification date.

# Brute force: hits the disk several times for each file
@sorted = sort { -M $a <=> -M $b } @filenames;

# Orcish Maneuver: cache the keys in a hash
@sorted = sort { ($modtimes{$a} ||= -M $a) <=>
                 ($modtimes{$b} ||= -M $b)
               } @filenames;

A clever trick, isn't it? Because a file's modification date does not change while the script runs, the result of -M can be saved the first time it is computed and reused afterwards. This trick is used a lot :P
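
To see the caching at work without touching the disk, here is a sketch that swaps -M for a hypothetical slow_key function (string length, purely for illustration) and counts how often it is called:

```perl
use strict;
use warnings;

my $calls = 0;
# Hypothetical expensive key function standing in for -M.
sub slow_key { my ($s) = @_; $calls++; return length $s }

my @names = qw(pear fig banana kiwi apple);
my %cache;
my @sorted = sort {
    ($cache{$a} ||= slow_key($a)) <=> ($cache{$b} ||= slow_key($b))
        or $a cmp $b    # tie-break on the name itself
} @names;

print "@sorted\n";         # fig kiwi pear apple banana
print "calls: $calls\n";   # calls: 5
```

One caveat of the ||= form: a key that is false (0 or the empty string) is recomputed every time, because ||= cannot tell a cached false value from a missing one.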

The Schwartzian Transform is used like this:

@sorted = map( { $_->[0] }
               sort( { $a->[1] <=> $b->[1] }
                     map( { [$_, -M] } @filenames )
               )
          );

This code nests map and sort several layers deep. Remember the trick: read it from back to front. map({ [$_, -M] } @filenames) returns a list whose elements are anonymous arrays. The first value of each anonymous array is the file name and the second is the file's modification age (-M gives the number of days since the file was last modified).

sort({ $a->[1] <=> $b->[1] } ...) then sorts the list of anonymous arrays generated above by the modification age. The result of the sort is the sorted list of anonymous arrays.

The outer map({ $_->[0] } ...) is simple: it extracts the file name from each anonymous array returned by sort. The file names come out sorted by modification date, and -M is run only once per file.

This is the famous Schwartzian Transform, which is popular among Perl users abroad. Remember the Schwartzian Transform that Fairy has told you about, and you will not be laughed at by foreigners next time :P
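
The three stages can be tried on in-memory data. In this sketch the hypothetical slow_key function (string length) stands in for -M, and a counter confirms that each key is computed exactly once:

```perl
use strict;
use warnings;

my $calls = 0;
# Hypothetical expensive key function standing in for -M.
sub slow_key { my ($s) = @_; $calls++; return length $s }

my @names = qw(pear fig banana kiwi apple);

my @sorted =
    map  { $_->[0] }                  # 3. unwrap the original value
    sort { $a->[1] <=> $b->[1]        # 2. compare the precomputed keys
           or $a->[0] cmp $b->[0] }   #    (tie-break on the value itself)
    map  { [ $_, slow_key($_) ] }     # 1. pair each value with its key
    @names;

print "@sorted\n";         # fig kiwi pear apple banana
print "calls: $calls\n";   # calls: 5
```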

The original author says:

The Orcish Maneuver is generally harder to code and less elegant than the Schwartzian Transform. I recommend the Schwartzian Transform as the method of choice.

Remember the basic rules of code optimization: (1) do not optimize, or at least not yet; (2) make the code correct before making it fast; (3) make the code clear before making it faster.

10. Sort lines by the last column (Schwartzian Transform)

Assume that $str has the following value (each line ends with \n):

EIR 11 9 2 6 3 1 81% 63% 13
Oos 10 6 4 3 3 0 4 60% 70% 25
HRH 10 6 4 5 1 2 2 60% 70% 15
SPP 10 6 4 3 3 1 3 60% 60% 14

Sort by the value of the last field:

$str = join "\n",
       map { $_->[0] }
       sort { $a->[1] <=> $b->[1] }
       map { [$_, (split)[-1]] }
       split /\n/, $str;

The printed result is:

EIR 11 9 2 6 3 1 81% 63% 13
SPP 10 6 4 3 3 1 3 60% 60% 14
HRH 10 6 4 5 1 2 2 60% 70% 15
Oos 10 6 4 3 3 0 4 60% 70% 25

Let's go through the code above step by step, from the back:

split /\n/, $str; returns a list whose elements are the individual lines.

map { [$_, (split)[-1]] } returns a list of anonymous arrays. Each anonymous array holds the whole line and the last column of that line. This step is the key to the Schwartzian Transform: use map to build your own list of anonymous arrays, in which the 1st element is the value you finally want and the 2nd element is the one used for comparison.

sort { $a->[1] <=> $b->[1] } sorts the anonymous arrays produced by the previous step on their 2nd element and returns the sorted list of anonymous arrays.

map { $_->[0] } extracts the 1st element, that is, the whole line, from each anonymous array sorted in the previous step.

$str = join "\n", joins the lines from the previous step with "\n" and assigns the result to $str.
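
The same pipeline generalizes to any column. The following sketch wraps it in a hypothetical helper, sort_by_field (my own name, not from the original article), and runs it on a small made-up table:

```perl
use strict;
use warnings;

# Hypothetical helper: sort the lines of $text numerically on the
# whitespace-separated field at position $index (-1 means the last field).
sub sort_by_field {
    my ($text, $index) = @_;
    return join "\n",
        map  { $_->[0] }                   # keep only the original line
        sort { $a->[1] <=> $b->[1] }       # compare the extracted fields
        map  { [ $_, (split)[$index] ] }   # [line, field]
        split /\n/, $text;
}

my $table  = "c 10 30\na 99 10\nb 50 20";   # illustrative data
my $sorted = sort_by_field($table, -1);
print "$sorted\n";
# a 99 10
# b 50 20
# c 10 30
```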

Maybe you will say, "Why so much trouble? I do not want to use this method." In that case, a ready-made module from CPAN can do the job:

use Sort::Fields;
@sorted = fieldsort [6, '2n', '-3n'], @lines;

The CPAN module documentation is very detailed; take a look.

11. Efficient sorting revisited: the Guttman-Rosler Transform

Consider the following example:

@dates = qw(2001/1/1 2001/07/04 1999/12/25);

Which method is the most efficient if you want to sort them by date in ascending order?

The most intuitive approach, the Schwartzian Transform, can be written as follows:

@sorted = map { $_->[0] }
          sort { $a->[1] <=> $b->[1]
                 or $a->[2] <=> $b->[2]
                 or $a->[3] <=> $b->[3]
               }
          map { [$_, split m</>, $_, 3] } @dates;

However, the more efficient Guttman-Rosler Transform (GRT) is written like this:

@sorted = map { substr $_, 10 }
          sort
          map { m|(\d+)/(\d+)/(\d+)|;
                sprintf "%d-%02d-%02d%s", $1, $2, $3, $_
              } @dates;
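
The idea is easier to see with the intermediate list printed. In this sketch the sample dates and the @packed array are my own; each date gets a fixed-width 10-character key in front, the list is sorted with the default string comparison, and the key is stripped again:

```perl
use strict;
use warnings;

my @dates = qw(2001/1/1 2001/07/04 2000/12/25);   # illustrative dates

# 1. Prefix each date with a fixed-width, zero-padded key (10 characters).
my @packed = map {
    my ($y, $m, $d) = m|(\d+)/(\d+)/(\d+)|;
    sprintf "%04d-%02d-%02d%s", $y, $m, $d, $_;
} @dates;

# 2. Sort with the default string comparison (no comparison block).
# 3. Strip the 10-character key to recover the original strings.
my @sorted = map { substr $_, 10 } sort @packed;

print "$_\n" for @packed;
print "@sorted\n";   # 2000/12/25 2001/1/1 2001/07/04
```

The speed gain comes from step 2: with no comparison block, sort does not have to run any Perl code per comparison.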

The original author says:

The GRT is harder to code and harder to read than the Schwartzian Transform, so I recommend using it only in extreme cases. Tested on a large data set with Perl 5.005_03 on Linux 2.2.14, the GRT was 1.7 times faster than the Schwartzian Transform. With Perl 5.005_02 on Windows NT 4.0 SP6, the GRT was 2.5 times faster.

In addition, sort in Perl 5.8 and later uses a mergesort algorithm, while sort in Perl 5.6 and earlier uses quicksort. Mergesort has better worst-case behavior and is often faster, so consider upgrading your Perl version.
