Introduction to Sort Functions

Source: Internet
Author: User



  • Definition and syntax
  • Sort by numerical order
  • Sort by ASCII order
  • Sort by dictionary order
  • Sort by reverse order
  • Sort using multiple keys
  • Generate an array of sort ranks
  • Sort a hash by its keys (sort + map)
  • Sort a hash by its values (sort + map)
  • Sort words in a file and eliminate duplicates
  • Efficient sorting (sort + map)
  • Sorting lines on the last field (sort + map)
  • More efficient sorting (sort + map)
  • CPAN modules for sorting
  • Sort words with 4 or more consecutive vowels (sort map grep)
  • Print the canonical sort order (sort map grep)
The sort function
sort LIST
sort BLOCK LIST
sort SUBNAME LIST
The sort function sorts the LIST and returns the sorted list value. If SUBNAME or BLOCK is omitted, the sort is in standard string comparison order (i.e., an ASCII sort). If SUBNAME is specified, it gives the name of a subroutine that compares two list elements and returns an integer less than, equal to, or greater than 0, depending on whether the elements are in ascending sort order, equal, or descending sort order, respectively. In place of a SUBNAME, you can provide a BLOCK as an anonymous, in-line sort subroutine.
The two elements to be compared are passed as the variables $a and $b. They are aliased to the original list elements (in effect, passed by reference), so don't modify $a or $b. If you use a subroutine, it cannot be recursive.
Perl versions prior to 5.8 use the Quicksort algorithm for sorting; version 5.8 and higher use a Mergesort algorithm, which is stable. Both take N log N time on typical input, but Mergesort also guarantees N log N in the worst case, where Quicksort can degrade to quadratic time.
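The following is a minimal sketch of the three forms. The word list and the comparator name by_length are hypothetical and serve only as illustration. Each comparator returns a negative number, zero, or a positive number, as described above.

```perl
use strict;
use warnings;

my @words = qw(pear fig banana kiwi);

# sort LIST: default string (ASCII) comparison
my @default = sort @words;

# sort BLOCK LIST: an in-line comparator; the two elements arrive as $a and $b
my @by_len_block = sort { length($a) <=> length($b) or $a cmp $b } @words;

# sort SUBNAME LIST: a named comparator (hypothetical name) with the same logic;
# it returns <0, 0 or >0 and must not modify $a or $b
sub by_length { length($a) <=> length($b) or $a cmp $b }
my @by_len_sub = sort by_length @words;

print "@default\n";      # banana fig kiwi pear
print "@by_len_sub\n";   # fig kiwi pear banana
```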
Sort by numerical order
@array = (8, 2, 32, 1, 4, 16);
print join(' ', sort { $a <=> $b } @array), "\n";

Output result:
1 2 4 8 16 32
Equivalently:
sub numerically { $a <=> $b }
print join(' ', sort numerically @array), "\n";

Output result:
1 2 4 8 16 32
Sort by ASCII order (not dictionary order)
@languages = qw(fortran lisp c++ Perl python java);
print join(' ', sort @languages), "\n";

Output result:
Perl c++ fortran java lisp python
Equivalently:
print join(' ', sort { $a cmp $b } @languages), "\n";
Watch out if you have some numbers in the data:
print join(' ', sort 1 .. 11), "\n";

Output result:
1 10 11 2 3 4 5 6 7 8 9
Sort by dictionary order
use locale;
@array = qw(ASCII ascap at_large atlarge a arp arp);
@sorted = sort { ($da = lc $a) =~ s/[\W_]+//g;
                 ($db = lc $b) =~ s/[\W_]+//g;
                 $da cmp $db;
               } @array;
print "@sorted\n";

Output result:
a arp arp ascap ASCII atlarge at_large
The use locale pragma is optional. It makes the code more portable if the data contains international characters. This pragma affects the operators cmp, lt, le, ge, gt and some functions; see the perllocale man page for details.
Notice that the order of atlarge and at_large was reversed on output, even though their sort order was identical. (The substitution removed the underscore before the sort.) This happened because this example was run using Perl version 5.005_02. Before Perl version 5.8, the sort function would not preserve the order of keys with identical values; the mergesort used by Perl 5.8 and higher preserves this order. (A sorting algorithm that preserves the order of identical keys is called "stable".)
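On Perl 5.8 or later, the two equal keys keep their input order. The sketch below assumes such a perl. The use sort 'stable' pragma makes that requirement explicit, and the input is a shortened, illustrative list.

```perl
use strict;
use warnings;
use sort 'stable';   # request a stable sort (pragma available since Perl 5.8)

# at_large and atlarge both normalize to "atlarge", so they compare equal.
my @array = qw(at_large atlarge ascap);

my @sorted = sort {
    (my $da = lc $a) =~ s/[\W_]+//g;
    (my $db = lc $b) =~ s/[\W_]+//g;
    $da cmp $db;
} @array;

print "@sorted\n";   # ascap at_large atlarge (equal keys stay in input order)
```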
Sort by reverse order
To reverse the sort order, reverse the position of the operands of the cmp or <=> operators:
sort { $b <=> $a } @array;
or change the sign of the block's or subroutine's return value:
sort { -($a <=> $b) } @array;
or add the reverse function (slightly less efficient, but perhaps more readable):
reverse sort { $a <=> $b } @array;
Sort using multiple keys
To sort by multiple keys, use a sort subroutine with multiple tests connected by Perl's or operator. Put the primary sort test first, the secondary sort test second, and so on.
# An array of references to anonymous hashes

@employees = (
    { FIRST => 'Bill',   LAST => 'Gates',
      SALARY => 600000, AGE => 45 },
    { FIRST => 'George', LAST => 'Tester',
      SALARY => 55000,  AGE => 29 },
    { FIRST => 'Steve',  LAST => 'Ballmer',
      SALARY => 600000, AGE => 41 },
    { FIRST => 'Sally',  LAST => 'Developer',
      SALARY => 55000,  AGE => 29 },
    { FIRST => 'Joe',    LAST => 'Tester',
      SALARY => 55000,  AGE => 29 },
);
sub seniority {
    $b->{SALARY} <=> $a->{SALARY}
    or $b->{AGE} <=> $a->{AGE}
    or $a->{LAST} cmp $b->{LAST}
    or $a->{FIRST} cmp $b->{FIRST}
}
@ranked = sort seniority @employees;
foreach $emp (@ranked) {
    print "$emp->{SALARY}\t$emp->{AGE}\t$emp->{FIRST} $emp->{LAST}\n";
}

Output result:
600000 45 Bill Gates
600000 41 Steve Ballmer
55000 29 Sally Developer
55000 29 George Tester
55000 29 Joe Tester
Generate an array of sort ranks for a list
@x = qw(matt elroy jane sally);
@rank[sort { $x[$a] cmp $x[$b] } 0 .. $#x] = 0 .. $#x;
print "@rank\n";

Output result:
2 0 1 3
Sort a hash by its keys
Hashes are stored in an order determined by Perl's internal hashing algorithm, and the order changes when you add or delete a hash key. If you need to sort a hash, consider storing the data in an array instead and using the preceding recipe (generate an array of sort ranks). Alternatively, you can sort a hash by key and store the results in an array in which each element is a reference to a hash containing only one key/value pair:
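If you only need to visit the pairs in key order, a lighter alternative is to sort the keys and look up each value as you go. This avoids building an intermediate array of hash references. Here is a minimal sketch with illustrative data:

```perl
use strict;
use warnings;

# Illustrative data only.
my %hash = (Donald => 'Knuth', Alan => 'Turing', Grace => 'Hopper');

# Sort just the keys; the hash itself stays in Perl's internal order.
my @lines;
for my $key (sort keys %hash) {
    push @lines, "$key => $hash{$key}";
}
print "$_\n" for @lines;   # Alan, Donald, Grace in that order
```

The trade-off is that the ordering exists only during the loop. The array-of-hashrefs approach keeps it for later use.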
%hash = (Donald => 'Knuth', Alan => 'Turing', John => 'Neumann');
@sorted = map { +{ $_ => $hash{$_} } } sort keys %hash;
foreach $hashref (@sorted) {
    ($key, $value) = each %$hashref;
    print "$key => $value\n";
}

Output result:
Alan => Turing
Donald => Knuth
John => Neumann
Sort a hash by its values
Unlike hash keys, hash values are not guaranteed to be unique. If you sort a hash by only its values, the sort order of two elements with the same value may change when you add or delete other values. To do a stable sort by hash values, do a primary sort by value and a secondary sort by key:
%hash = (Elliot  => 'Babbage',
         Charles => 'Babbage',
         Grace   => 'Hopper',
         Herman  => 'Hollerith'
        );
@sorted = map { +{ $_ => $hash{$_} } }
          sort { $hash{$a} cmp $hash{$b}
                 or $a cmp $b
               } keys %hash;
foreach $hashref (@sorted) {
    ($key, $value) = each %$hashref;
    print "$key => $value\n";
}

Output result:
Charles => Babbage
Elliot => Babbage
Herman => Hollerith
Grace => Hopper
(Elliot Babbage was Charles's younger brother. He died in abject poverty after numerous attempts to use Charles's Analytical Engine to predict horse races. And did I tell you about Stein's secret life as a circus performer?)
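The same primary/secondary pattern works for numeric values. The sketch below assumes a hypothetical %count hash of word frequencies. It lists the most frequent words first and breaks ties alphabetically, so the result does not depend on the hash's internal order.

```perl
use strict;
use warnings;

# Hypothetical word counts, for illustration only.
my %count = (apple => 3, pear => 5, fig => 3, kiwi => 1);

# Primary key: count, descending (note $b before $a).
# Secondary key: the word itself, ascending, so ties get a fixed order.
my @ranked = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;

print "$_ => $count{$_}\n" for @ranked;   # pear, apple, fig, kiwi
```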
Sort words in a file and eliminate duplicates
This Unix "one-liner" displays a sorted list of the unique words in a file. (The \ escapes the following newline so that Unix sees the command as a single line. This relies on csh-style quoting. In Bourne-style shells such as sh or bash, a backslash inside single quotes is passed to Perl literally, so drop it and let the quoted string continue onto the next line.) The statement @uniq{@F} = () uses a hash slice to create a hash whose keys are the unique words in the file. It is semantically equivalent to @uniq{$F[0], $F[1], ... $F[$#F]} = (); Hash slices have a confusing syntax that combines the array prefix symbol @ with the hash key symbols {}, but they solve this problem succinctly.
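Outside a one-liner, the hash-slice idiom looks like the sketch below. Here a hypothetical in-memory word list stands in for the file contents that the -a switch would split into @F.

```perl
use strict;
use warnings;

# Hypothetical word list standing in for @F.
my @F = qw(the cat saw the dog and the cat ran);

# Hash slice: one assignment creates a key for every word. Duplicate words
# just hit the same key again, and every value is undef.
my %uniq;
@uniq{@F} = ();

my @unique_sorted = sort keys %uniq;
print join("\n", @unique_sorted), "\n";   # and cat dog ran saw the (one per line)
```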
perl -0777ane '$, = "\n"; \
   @uniq{@F} = (); print sort keys %uniq' file

Where:
-0777 - Slurp entire file instead of single line
-a - Autosplit mode, split lines into @F array
-e - Read script from command line
-n - Loop over script specified on command line: while (<>) {...}
$, - Output field separator used by print function
file - File name
Efficient sorting: the Orcish maneuver and the Schwartzian transform
The sort comparison is usually called many times for each key (roughly 2 log N times on average for a list of N elements). If the run time of the sort is critical, use the Orcish maneuver or Schwartzian transform so that each key is evaluated only once.
Consider an example where evaluating the sort key involves a disk access: sorting a list of file names by file modification date.

# Brute force: multiple disk accesses for each file
@sorted = sort { -M $a <=> -M $b } @filenames;

# Cache keys in a hash (the Orcish maneuver)
@sorted = sort { ($modtimes{$a} ||= -M $a) <=>
                 ($modtimes{$b} ||= -M $b)
               } @filenames;

The Orcish maneuver was popularized by Joseph Hall. The name is derived from the phrase "or cache", which is the term for the "||=" operator. (Orcish may also be a reference to the Orcs, a warrior race from the Lord of the Rings trilogy.)
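To see the saving without touching the disk, the sketch below replaces -M with a hypothetical CPU-only key function. The function, expensive_key, just returns the string length and counts how often it runs.

```perl
use strict;
use warnings;

# Hypothetical "expensive" key function; it counts how often it runs.
my $calls = 0;
sub expensive_key {
    my ($s) = @_;
    $calls++;
    return length $s;
}

my @words = qw(elephant ox cat horse giraffe bee);

# Brute force: the key is recomputed on every comparison.
my @brute = sort { expensive_key($a) <=> expensive_key($b) or $a cmp $b } @words;
my $brute_calls = $calls;

# Orcish maneuver: compute each key once and "or cache" it in %key.
$calls = 0;
my %key;
my @orcish = sort {
    ($key{$a} ||= expensive_key($a)) <=> ($key{$b} ||= expensive_key($b))
        or $a cmp $b
} @words;
my $orcish_calls = $calls;

print "@orcish\n";   # ox bee cat horse giraffe elephant
print "brute force: $brute_calls key calls, Orcish: $orcish_calls\n";
```

One caveat about ||= is that it skips the computation only when the cached value is true. A key that can legitimately be 0 or the empty string would therefore be recomputed on every comparison. On Perl 5.10 and later, //= (defined-or) avoids that.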
# Precompute keys, sort, extract results
# (The famous Schwartzian transform)
@sorted = map  { $_->[0] }
          sort { $a->[1] <=> $b->[1] }
          map  { [$_, -M] } @filenames;

The Schwartzian transform is named for its originator, Randal Schwartz, who is the co-author of the book Learning Perl.
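The counting sketch can be rewritten as a Schwartzian transform. Here expensive_key is again a hypothetical stand-in for -M. The output shows that each key is computed exactly once, in the first map.

```perl
use strict;
use warnings;

my $calls = 0;
sub expensive_key { $calls++; return length $_[0] }   # hypothetical key function

my @words = qw(elephant ox cat horse giraffe bee);

my @sorted =
    map  { $_->[0] }                                      # 3. keep the original values
    sort { $a->[1] <=> $b->[1] or $a->[0] cmp $b->[0] }   # 2. compare cached keys
    map  { [$_, expensive_key($_)] }                      # 1. pair each value with its key
    @words;

print "@sorted\n";              # ox bee cat horse giraffe elephant
print "key calls: $calls\n";    # 6: one per element
```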
Benchmark results (usr + sys time, as reported by the Perl Benchmark module):
Linux 2.2.14, 333 MHz CPU, 1649 files:
  Brute force: 0.14 s   Orcish: 0.07 s   Schwartzian: 0.09 s
WinNT 4.0 SP6, 133 MHz CPU, 1130 files:
  Brute force: 7.38 s   Orcish: 1.43 s   Schwartzian: 1.38 s

The Orcish maneuver is usually more difficult to code and less elegant than the Schwartzian transform. I recommend that you use the Schwartzian transform as your method of choice.
Also, remember these basic rules of optimization: (1) Don't do it (or don't do it yet), (2) Make it right before you make it faster, and (3) Make it clear before you make it faster.
Sorting lines on the last field (Schwartzian transform)
Given $str with the contents below (each line is terminated by the newline character, \n):

Eir 11 9 2 6 3 1 81% 63% 13
Oos 10 6 4 3 3 0 4 60% 70% 25
Hrh 10 6 4 5 1 2 2 60% 70% 15
Spp 10 6 4 3 3 1 3 60% 60% 14
Sort the contents using the last field as the sort key.

$str = join "\n",
       map  { $_->[0] }
       sort { $a->[1] <=> $b->[1] }
       map  { [$_, (split)[-1]] }
       split /\n/, $str;
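Below is the same transform as a complete script, with the four sample lines embedded so the result can be checked. The lines should come back ordered by their final number (13, 14, 15, 25). Building $str from literal strings is an illustration added for this sketch.

```perl
use strict;
use warnings;

# The four sample lines from the text, each terminated by "\n".
my $str = "Eir 11 9 2 6 3 1 81% 63% 13\n"
        . "Oos 10 6 4 3 3 0 4 60% 70% 25\n"
        . "Hrh 10 6 4 5 1 2 2 60% 70% 15\n"
        . "Spp 10 6 4 3 3 1 3 60% 60% 14\n";

# Schwartzian transform keyed on the last whitespace-separated field.
$str = join "\n",
       map  { $_->[0] }                 # recover the original line
       sort { $a->[1] <=> $b->[1] }     # numeric sort on the cached key
       map  { [$_, (split)[-1]] }       # pair each line with its last field
       split /\n/, $str;

print "$str\n";
```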
Another approach is to install and use the CPAN module Sort::Fields. For example:
use Sort::Fields;
@sorted = fieldsort [6, '2n', '-3n'], @lines;
This statement uses the default column definition, which is a split on whitespace.
Primary sort is an alphabetic (ASCII) sort on column 6.
Secondary sort is a numeric sort on column 2.
Tertiary sort is a reverse numeric sort on column 3.
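For comparison, the same three-level ordering can be written without the module as a Schwartzian transform. This sketch rests on two assumptions. First, the sample lines are hypothetical. Second, columns are counted from 1, which is assumed to match Sort::Fields' convention.

```perl
use strict;
use warnings;

# Hypothetical input lines with six whitespace-separated columns.
my @lines = (
    'a 2 7 x y beta',
    'b 1 5 x y alpha',
    'c 1 9 x y alpha',
    'd 3 1 x y alpha',
);

# [$_, split] puts the whole line at index 0, so column N (counted from 1)
# lands at index N of each record.
my @sorted =
    map  { $_->[0] }
    sort {
           $a->[6] cmp $b->[6]    # primary: ASCII sort on column 6
        or $a->[2] <=> $b->[2]    # secondary: numeric sort on column 2
        or $b->[3] <=> $a->[3]    # tertiary: reverse numeric sort on column 3
    }
    map  { [$_, split] } @lines;

print "$_\n" for @sorted;   # lines c, b, d, a
```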
Efficient sorting revisited: the Guttman-Rosler transform
Given that the Schwartzian transform is usually the best option for efficient sorting in Perl, is there any way to improve on it? Yes, there is! The Perl sort function is optimized for the default sort, which is in ASCII order. The Guttman-Rosler transform (GRT) does some additional work in the enclosing map functions so that the sort is done in the default order. The Guttman-Rosler transform was first published by Michal Rutka and popularized by Uri Guttman and Larry Rosler.
Consider an example where you want to sort an array of dates. Given data in the format YYYY/MM/DD:
@dates = qw(2001/1/1 2001/07/04 %/12/25);
What is the most efficient way to sort it in order of increasing date?
The straightforward Schwartzian transform would be:
@sorted = map  { $_->[0] }
          sort { $a->[1] <=> $b->[1]
              or $a->[2] <=> $b->[2]
              or $a->[3] <=> $b->[3]
               }
          map  { [$_, split m{/}, $_, 3] } @dates;

The more efficient Guttman-Rosler transform is:
@sorted = map  { substr $_, 10 }
          sort
          map  { m|(\d+)/(\d+)/(\d+)|;
                 sprintf "%d-%02d-%02d%s", $1, $2, $3, $_
               } @dates;

The GRT solution is harder to code and less readable than the Schwartzian transform solution, so I recommend you use the GRT only in extreme circumstances. In tests on large datasets with Perl 5.005_03 and Linux 2.2.14, the GRT was 1.7 times faster than the Schwartzian transform. In tests with Perl 5.005_02 and Windows NT 4.0 SP6, the GRT was 2.5 times faster. Using the timing data from the first efficiency example, the Guttman-Rosler transform was faster than a brute force sort by a factor of 2.7 and 13 on Linux and Windows, respectively.
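The sketch below runs both date sorts side by side on illustrative dates (not the ones from the text) and checks that they agree. It uses %04d for the year, a small deviation from the text's %d, so the prefix is exactly 10 characters even for years below 1000.

```perl
use strict;
use warnings;

# Illustrative dates in YYYY/MM/DD form; month and day may lack zero padding.
my @dates = qw(2001/1/1 2001/07/04 1999/12/25 2000/2/29);

# Schwartzian transform: numeric comparison of year, then month, then day.
my @st = map  { $_->[0] }
         sort { $a->[1] <=> $b->[1]
             or $a->[2] <=> $b->[2]
             or $a->[3] <=> $b->[3] }
         map  { [$_, split m{/}, $_, 3] } @dates;

# Guttman-Rosler transform: prepend a fixed-width "YYYY-MM-DD" key so the
# default string sort gives date order, then strip the 10-character key.
my @grt = map  { substr $_, 10 }
          sort
          map  { my @p = m{(\d+)/(\d+)/(\d+)};
                 sprintf "%04d-%02d-%02d%s", @p, $_ } @dates;

print "@grt\n";   # 1999/12/25 2000/2/29 2001/1/1 2001/07/04
```

The GRT key works because zero-padded, fixed-width numeric fields compare the same way as strings and as numbers. The plain sort therefore needs no comparison block at all.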
If you are still not satisfied, you may be able to speed up your sorts by upgrading to a more recent version of Perl. The sort function in Perl versions 5.8 and higher uses the Mergesort algorithm, which (depending on the data) is slightly faster than the Quicksort algorithm used by the sort function in previous versions.
Again, remember these basic rules of optimization: (1) Don't do it (or don't do it yet), (2) Make it right before you make it faster, and (3) Make it clear before you make it faster.
Some CPAN modules for sorting
These modules can be downloaded from http://search.cpan.org/.
File::Sort - Sort one or more text files by lines
Sort::Fields - Sort lines using one or more columns as the sort key(s)
Sort::ArbBiLex - Construct sort functions for arbitrary sort orders
Text::BibTeX::BibSort - Generate sort keys for bibliographic entries
Sort::Maker - Automatic generation of ST, GRT, etc. sorters
Sort::Key - The fastest way to sort in Perl
The open source community adds to CPAN regularly, so use the search engine to check for new modules. If one of the CPAN sorting modules can be modified to suit your needs, contact the author and/or post your idea to the Usenet group comp.lang.perl.modules. If you write a useful, generalized sorting module, please contribute it to CPAN!
The holy grail of Perl data transforms

Challenge: Find a practical problem that can be effectively solved by a statement using map, sort and grep. The code should run faster and be more compact than alternative solutions.
Print sorted list of words with 4 or more consecutive vowels
perl -e 'print sort map { uc } grep /[aeiou]{4}/, <>' \
   /usr/dict/words

Output result:
AQUEOUS
DEQUEUE
DEQUEUED
DEQUEUES
DEQUEUING
ENQUEUE
ENQUEUED
ENQUEUES
HAWAIIAN
OBSEQUIOUS
PHARMACOPOEIA
QUEUE
QUEUED
QUEUEING
QUEUER
QUEUERS
QUEUES
QUEUING
SEQUOIA
(Pharmacopoeia is an official book containing a list of drugs with articles on their preparation and use.)
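The same pipeline reads naturally as a script, with data flowing right to left through grep, map and sort. In this sketch, a short hypothetical word list replaces /usr/dict/words, whose location varies between systems.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the lines of /usr/dict/words.
my @lines = map { $_ . "\n" } qw(aqueous banana queue sequoia idea onomatopoeia);

# grep keeps words with a run of 4 vowels, map upper-cases them, sort orders them.
my @hits = sort map { uc } grep { /[aeiou]{4}/ } @lines;

print @hits;   # AQUEOUS, ONOMATOPOEIA, QUEUE, SEQUOIA (one per line)
```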
Print the canonical sort order
This prints the canonical order of characters used by the cmp, lt, gt, le and ge operators:
print +(sort grep /\w/, map { chr } 0 .. 255), "\n";

Output result:
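0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
The map converts the numbers to the corresponding characters; the grep gets rid of special characters and funky control characters that mess up your screen. The plus sign stops Perl from reading print (...) as a function call whose argument list ends at the closing parenthesis, which would leave the "\n" out of the printed list.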
This example shows that, in this case, the expression '_' lt 'a' is TRUE. If your program has the "use locale" pragma, the canonical order will depend on the program's current locale.
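Here is a small sketch that checks the underscore's position directly. It does not use locale, so plain character-code order applies.

```perl
use strict;
use warnings;

# Characters 0..255 that match \w, in default (string) sort order.
my $order = join '', sort grep { /\w/ } map { chr } 0 .. 255;

# '_' is character 95: after 'Z' (90), before 'a' (97).
my $under_lt_lower = ('_' lt 'a') ? 1 : 0;   # true
my $under_lt_upper = ('_' lt 'A') ? 1 : 0;   # false: 'A' is 65

print "$order\n";
print "'_' lt 'a': $under_lt_lower, '_' lt 'A': $under_lt_upper\n";
```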