The shuffle problem: what is a good way to shuffle a deck of playing cards so that the result is both uniform and fast? And how efficiently can we randomize the order of the lines of a file? A good shell solution to this problem has already been posted; here are three more methods based on awk. If there is a mistake, please point it out.

Method One (exhaustive): similar to the exhaustive approach, it builds a hash that records how many times each line has been printed; a line that has already been printed once is skipped. This prevents duplicates, but the drawback is the large number of wasted random draws:

    awk -v N=$(sed -n '$=' data) '
    BEGIN { FS = "\n"; RS = "" }
    {
        srand();
        while (t != N) {
            x = int(N * rand() + 1);
            a[x]++;
            if (a[x] == 1) { print $x; t++ }
        }
    }' data

(`sed -n '$='` prints the number of lines in the file; with RS="" and FS="\n" the whole file is read as a single record whose fields are the individual lines.)

Method Two (swapping): based on array-subscript exchange. Each line is stored in an array, and the array contents are exchanged through random subscripts. Its efficiency is better than Method One:

    #!/usr/bin/awk -f
    BEGIN { srand() }
    { b[NR] = $0 }
    END {
        C(b, NR)
        for (x in b) print b[x]
    }
    function C(arr, len,   i, j, t, x) {
        for (x in arr) {
            i = int(len * rand()) + 1
            j = int(len * rand()) + 1
            t = arr[i]; arr[i] = arr[j]; arr[j] = t
        }
    }

Method Three (hashing): the best of the three. It takes advantage of awk's hashed arrays (see section 7.x of the gawk info manual). All we need is to construct random, non-repeating keys; since the line number of each line of a file is unique, the mapping

    random number + line number of each line ----corresponds to----> the content of that line

is exactly such a construction. Hence:

    awk 'BEGIN{srand()}{b[rand() NR]=$0}END{for(x in b)print b[x]}' data

In fact, there is no need to worry too much about excessive memory use; a simple test shows this. Test environment: PM 1.4GHz CPU, 40G HDD, 256M RAM laptop; SUSE 9.3; GNU bash 3.00;
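A side note on Method Two: swapping randomly chosen pairs does not in general yield a perfectly uniform permutation, while the classical Fisher-Yates shuffle does, at the same O(n) cost. Below is a minimal sketch of Fisher-Yates in awk; the sample file and its contents are made up for illustration.

```shell
# Create a small sample file (any text file works the same way).
printf 'ace\nking\nqueen\njack\nten\n' > data

# Classical Fisher-Yates: walk i from the last line down to 2 and
# swap line i with a randomly chosen line in 1..i.  Every one of the
# n! permutations is then produced with equal probability.
awk 'BEGIN { srand() }
     { line[NR] = $0 }
     END {
         for (i = NR; i > 1; i--) {
             j = int(i * rand()) + 1      # uniform index in 1..i
             t = line[i]; line[i] = line[j]; line[j] = t
         }
         for (i = 1; i <= NR; i++) print line[i]
     }' data
```

Because the loop touches each array element exactly once, this keeps the linear running time of Method Two while fixing its bias.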
GNU awk 3.1.4.

First produce a random file of more than 500,000 lines, about 38M:

    od /dev/urandom | dd count=75000 > data

Timing the least efficient way, Method One:

    time awk -v N=$(sed -n '$=' data) '
    BEGIN { FS = "\n"; RS = "" }
    {
        srand();
        while (t != N) {
            x = int(N * rand() + 1);
            a[x]++;
            if (a[x] == 1) { print $x; t++ }
        }
    }' data

Result (file contents omitted):

    real 3m41.864s
    user 0m34.224s
    sys  0m2.102s

So the efficiency is barely acceptable.

Test of Method Two:

    time awk -f awkfile datafile

Result (file contents omitted):

    real 2m26.487s
    user 0m7.044s
    sys  0m1.371s

Clearly better than Method One.

Then look at the efficiency of Method Three:

    time awk 'BEGIN{srand()}{b[rand() NR]=$0}END{for(x in b)print b[x]}' data

Result (file contents omitted):

    real 0m49.195s
    user 0m5.318s
    sys  0m1.301s

That is pretty good for a 38M file. Have fun!
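Besides timing, it is worth checking that a shuffle is lossless: no line dropped, none duplicated. Since a correct shuffle prints exactly the same multiset of lines, sorted copies of input and output must be byte-for-byte identical. A sketch using Method Three, with a small made-up sample file standing in for the 38M test file:

```shell
# Create a small sample file (stand-in for the large test file).
printf 'one\ntwo\nthree\nfour\nfive\n' > data

# Shuffle it with Method Three.
awk 'BEGIN{srand()}{b[rand() NR]=$0}END{for(x in b)print b[x]}' data > shuffled

# Sorted input and sorted output must be identical if no line was
# lost or duplicated by the shuffle.
sort data > data.sorted
sort shuffled > shuffled.sorted
if cmp -s data.sorted shuffled.sorted; then
    echo "OK: shuffle preserved every line"
else
    echo "FAIL: lines lost or duplicated"
fi
```

The same check applies unchanged to Methods One and Two; only the shuffling command differs.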