The shuffle problem: what is a good way to shuffle a deck of cards? Can it be shuffled both evenly and quickly? And how efficiently can the lines of a file be put into random order?
ChinaUnix really is a place where shell masters gather: whatever problem you have, you can almost always find an answer there. R2007 gives a clever trick that uses the shell's $RANDOM variable to append a random number to each line of the original file, sorts the lines by that number, and then strips the temporary number off again, so the resulting file has effectively been randomly "shuffled" once:
while read i; do echo "$i $RANDOM"; done < file | sort -k2n | cut -d " " -f1
Of course, if the lines of your source file are more complex (for example, if they contain spaces, since cut keeps only the first field here), you must adapt the code. But once you know the key trick, the remaining problems are not hard to solve.
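To make the trick concrete, here is a minimal Python sketch of the same decorate, sort, undecorate idea. It is not R2007's code; the function name shuffle_lines and the seed parameter are made up for illustration. Because the random key is kept apart from the line instead of being appended to it, lines that contain spaces come through whole:

```python
import random

def shuffle_lines(lines, seed=None):
    """Shuffle by pairing each line with a random key, sorting, then dropping the key."""
    rng = random.Random(seed)  # seed is a hypothetical parameter, only for repeatable runs
    # Decorate: pair every line with a random number (the role $RANDOM plays in the shell).
    decorated = [(rng.random(), line) for line in lines]
    # Sort on the random key only, so the line content never influences the order.
    decorated.sort(key=lambda pair: pair[0])
    # Undecorate: drop the key and keep the whole line, spaces included.
    return [line for _, line in decorated]

sample = ["alice 555 0100", "bob 555 0101", "carol 555 0102"]
for line in shuffle_lines(sample):
    print(line)
```

The order printed changes from run to run unless a seed is passed.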
Another analysis of random file ordering comes from Su Rong Rong, who uses awk to achieve the shuffle and explains it in more detail (see the original post and its follow-up discussion; if you are not logged in, you can read it in the highlights section):
--------------------------------------------------------------------
There is already a good shell solution to the shuffle problem. Here are three more methods based on awk. If there are any mistakes, please point them out.
Method One: Exhaustive search
This works like a brute-force search: a hash records how many times each line has been drawn, and a line is printed only the first time it is drawn, which prevents duplicates. The drawback is the extra overhead, because lines that have already been printed keep being drawn again.
awk -v N=$(sed -n '$=' data) 'BEGIN{FS="\n"; RS=""} {srand(); while (t!=N) {x=int(N*rand()+1); a[x]++; if (a[x]==1) {print $x; t++}}}' data
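The same draw-and-skip idea can be sketched in Python to show where the overhead comes from. This is only an illustration, not the awk code itself; shuffle_by_redrawing and its seed parameter are hypothetical names, and the function also returns the total number of draws so the wasted ones can be counted:

```python
import random

def shuffle_by_redrawing(lines, seed=None):
    """Draw random indexes until every line has been emitted exactly once."""
    rng = random.Random(seed)  # seed is a hypothetical parameter, only for repeatable runs
    n = len(lines)
    seen = {}      # how many times each index was drawn, like a[x] in the awk version
    result = []
    while len(result) != n:
        x = rng.randrange(n)            # 0-based index here; the awk version is 1-based
        seen[x] = seen.get(x, 0) + 1
        if seen[x] == 1:                # first time drawn: emit the line
            result.append(lines[x])
    total_draws = sum(seen.values())    # includes the repeated, wasted draws
    return result, total_draws

shuffled, draws = shuffle_by_redrawing([str(k) for k in range(1000)])
print(len(shuffled), "lines emitted after", draws, "draws")
```

For n lines the expected number of draws is roughly n times ln n, so most of the work near the end goes into drawing lines that were already printed.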
Method Two: Transformation
This method is based on transforming array subscripts: the content of each line is stored in an array, and the elements are exchanged by swapping randomly chosen subscripts. It is more efficient than Method One.
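#!/usr/bin/awk -f
# Run as: awk -f awkfile datafile
BEGIN { srand() }
# Store every line under its line number.
{ b[NR] = $0 }
END {
    C(b, NR)
    for (x in b) { print b[x] }
}
# Swap two randomly chosen elements, once per element of the array.
function C(arr, len, i, j, t, x) {
    for (x in arr) {
        i = int(len * rand()) + 1
        j = int(len * rand()) + 1
        t = arr[i]
        arr[i] = arr[j]
        arr[j] = t
    }
}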
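For comparison, here is a Python sketch of the swapping idea in two forms. random_swaps mirrors the approach above (one swap of two random positions per element). fisher_yates is a different, well-known technique that I am adding for contrast, not something from the original post: it makes every ordering equally likely as long as the random source is uniform, which a fixed number of arbitrary swaps does not guarantee. Both function names and the seed parameter are made up for this example:

```python
import random

def random_swaps(lines, seed=None):
    """One swap of two random positions per element, as in the awk script."""
    rng = random.Random(seed)  # seed is a hypothetical parameter, only for repeatable runs
    arr = list(lines)          # work on a copy; the input is left untouched
    n = len(arr)
    for _ in range(n):
        i = rng.randrange(n)
        j = rng.randrange(n)
        arr[i], arr[j] = arr[j], arr[i]
    return arr

def fisher_yates(lines, seed=None):
    """Fisher-Yates shuffle: an alternative technique, not the author's method."""
    rng = random.Random(seed)
    arr = list(lines)
    # Walk from the last position down, swapping each slot with a random
    # slot at or before it (randint includes both end points).
    for i in range(len(arr) - 1, 0, -1):
        j = rng.randint(0, i)
        arr[i], arr[j] = arr[j], arr[i]
    return arr

data = ["line %d" % k for k in range(1, 6)]
print(random_swaps(data))
print(fisher_yates(data))
```

Both run in time proportional to the number of lines, which fits the observation that this method beats Method One.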
Method Three: Hash
The best of the three methods.
This method uses the properties of awk's hashes (associative arrays; for details see section 7.x of info gawk). All that is needed is a random, non-repeating hash key. Because the line number of each line in a file is unique, we can use:
random number + line number of each line ------corresponds to------> the content of that line
as the random mapping.
This gives:
awk 'BEGIN{srand()} {b[rand() NR]=$0} END{for (x in b) print b[x]}' data
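A rough Python sketch of the key construction follows, under two stated differences from the awk one-liner. First, the awk version relies on the order in which for (x in b) walks the hash, which awk leaves unspecified; a Python dict iterates in insertion order, so the sketch sorts the keys instead. Second, the key is a (random number, line number) tuple rather than a concatenated string. The name shuffle_by_hash_key and the rand parameter are hypothetical:

```python
import random

def shuffle_by_hash_key(lines, rand=random.random):
    """Store each line under a (random number, line number) key, then read back by key."""
    table = {}
    for lineno, line in enumerate(lines, start=1):
        # The line number keeps the key unique even if two random numbers collide,
        # so no line can overwrite another.
        table[(rand(), lineno)] = line
    # Python dicts keep insertion order, so sort the keys to get the random order.
    return [table[key] for key in sorted(table)]

print(shuffle_by_hash_key(["one", "two", "three", "four"]))

# Deliberately bad random source: every key still differs in its line number.
print(shuffle_by_hash_key(["one", "two", "three", "four"], rand=lambda: 0.5))
```

The second call shows why the line number matters: with a constant "random" number nothing is shuffled, but nothing is lost either.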
In fact, there is no need to worry much about memory use being too large. You can run a test:
Test environment:
Laptop with a PM 1.4 GHz CPU, 40 GB HDD, 256 MB memory
SUSE 9.3, GNU bash version 3.00.16, GNU Awk 3.1.4
Generate a random file of more than 500,000 lines, approximately 38 MB:
od /dev/urandom | dd count=75000 > data
First, the less efficient Method One.
Time taken by the shuffle:
time awk -v N=$(sed -n '$=' data) 'BEGIN{FS="\n"; RS=""} {srand(); while (t!=N) {x=int(N*rand()+1); a[x]++; if (a[x]==1) {print $x; t++}}}' data
Result (file contents omitted):
real 3m41.864s
user 0m34.224s
sys 0m2.102s
So the efficiency is barely acceptable.
Test of Method Two:
time awk -f awkfile datafile
Result (file contents omitted):
real 2m26.487s
user 0m7.044s
sys 0m1.371s
The efficiency is significantly better than that of Method One.
Then examine the efficiency of Method Three:
time awk 'BEGIN{srand()} {b[rand() NR]=$0} END{for (x in b) print b[x]}' data
Result (file contents omitted):
real 0m49.195s
user 0m5.318s
sys 0m1.301s
That is pretty good for a 38 MB file.
--------------------------------------------------------------------
There is also a Python version from flyfly that writes the lines of a file out in random order:
# coding: gb2312
import random
import sys

def usage():
    print("usage: program srcfilename dstfilename")

# both file names are required
if len(sys.argv) != 3:
    usage()
    sys.exit(1)

# open the phonebook file
with open(sys.argv[1], 'r') as f:
    phonebook = f.readlines()
print(phonebook)

# write to file randomly
random.shuffle(phonebook)
with open(sys.argv[2], 'w') as f:
    f.writelines(phonebook)
Shell scripting: multiple ways to rearrange the contents of a file (the shuffle problem)