Shell scripting: multiple ways to randomly rearrange the lines of a file (the shuffle problem)

Source: Internet
Author: User
Tags: rand, shuffle

The shuffle problem: what is a good way to shuffle a deck of cards so that the result is well mixed and the shuffle is fast? Applied to a file, how can its lines be put into random order efficiently?

ChinaUnix really is a place where shell experts gather; whatever problem you run into, you can almost always find an answer there. R2007 offers a neat trick that uses the shell's $RANDOM variable to append a random number to each line of the original file, sorts the lines on that random number, and then strips the temporary number off again, so the resulting file has effectively been shuffled once:

while read i; do echo "$i $RANDOM"; done < file | sort -k2n | cut -d" " -f1

Of course, if the lines of your source file are more complex (for example, if they contain spaces, which would confuse both the sort key and the cut), the code has to be adapted. Once you know the key technique, though, the remaining problems are not hard to solve.
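One possible adaptation for lines that contain spaces is sketched below. It is not R2007's original code: the random key comes from awk's rand() instead of $RANDOM, and it is placed in front of the line, separated by a tab, so that sort and cut never have to look inside the line itself. The function name shuffle_lines is hypothetical.

```shell
# Hypothetical helper: print the lines of the file named by $1 in random order.
shuffle_lines() {
    # Decorate: prefix each line with a random key and a tab.
    awk 'BEGIN { srand() } { printf "%.17f\t%s\n", rand(), $0 }' "$1" |
        # Sort numerically on the leading key (C locale so "." is the decimal point).
        LC_ALL=C sort -n |
        # Undecorate: drop the key, keep everything after the first tab.
        cut -f2-
}
```

Called as shuffle_lines file > shuffled, it leaves the original file untouched. Note that srand() without an argument seeds from the time of day, so two runs within the same second may produce the same order.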

Another analysis of random file ordering comes from Su Rong Rong, who uses awk to achieve the shuffle (originally posted here, together with a follow-up discussion of the post; if you are not logged in, you can read the highlights section here) and explains it in more detail:
--------------------------------------------------------------------
A good shell solution to the shuffle problem already exists; here are three more methods based on awk. If there are mistakes, please point them out.

Method One: Exhaustive search

Similar to an exhaustive search: a hash records how many times each line number has been drawn, and a line is printed only the first time its number comes up. This prevents duplicates, but the drawback is extra overhead, because many random draws land on lines that have already been printed.

awk -v N="$(sed -n '$=' data)" 'BEGIN{FS="\n"; RS=""} {srand(); while (t != N) {x = int(N*rand()+1); a[x]++; if (a[x] == 1) {print $x; t++}}}' data
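The overhead of Method One comes from draws that land on a line number that has already been printed. The sketch below uses a hypothetical helper, count_draws, to count how many calls to rand() are needed before every number from 1 to N has come up at least once. On average this grows roughly like N ln N (the coupon collector problem), not like N.

```shell
# Hypothetical helper: count the random draws needed until every
# line number in 1..N (N is passed as $1) has been drawn at least once.
count_draws() {
    awk -v N="$1" 'BEGIN {
        srand()
        while (seen_count < N) {
            x = int(N * rand()) + 1      # random line number in 1..N
            draws++
            if (!(x in seen)) {          # first time this number comes up
                seen[x] = 1
                seen_count++
            }
        }
        print draws + 0
    }'
}
```

For example, count_draws 1000 typically prints a value several times larger than 1000; the exact number varies from run to run.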

Method Two: Transform

This method is based on transforming array subscripts: each line is stored in an array, and the contents of the array are then swapped around by picking pairs of random subscripts. It is more efficient than Method One.

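    #!/usr/bin/awk -f
    BEGIN { srand(); }
    # Store every line under its line number.
    { b[NR] = $0; }
    END {
        C(b, NR);
        for (x in b) { print b[x]; }
    }
    # Swap two randomly chosen elements once per array entry.
    # i, j, t and x are local variables.
    function C(arr, len, i, j, t, x) {
        for (x in arr) {
            i = int(len * rand()) + 1;
            j = int(len * rand()) + 1;
            t = arr[i];
            arr[i] = arr[j];
            arr[j] = t;
        }
    }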
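Swapping randomly chosen pairs, as Method Two does, mixes the lines but does not make every ordering equally likely. A commonly used alternative is the Fisher-Yates shuffle, sketched below with a hypothetical function name, fisher_yates_shuffle. This is a different technique from the one in the original post: each position is swapped exactly once with a randomly chosen position at or before it.

```shell
# Hypothetical helper: Fisher-Yates shuffle of the lines of the file named by $1.
fisher_yates_shuffle() {
    awk '
        BEGIN { srand() }
        { line[NR] = $0 }                    # store every line under its line number
        END {
            # Walk from the last line down to the second, swapping each
            # line with a randomly chosen line at the same or an earlier index.
            for (i = NR; i > 1; i--) {
                j = int(i * rand()) + 1      # random index in 1..i
                tmp = line[i]; line[i] = line[j]; line[j] = tmp
            }
            # Print by ascending index, so the result does not depend on
            # the unspecified traversal order of "for (x in array)".
            for (i = 1; i <= NR; i++) print line[i]
        }
    ' "$1"
}
```

Like Method Two, this keeps the whole file in memory, so the memory cost is comparable; the quality of the result is still limited by awk's rand() generator.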

Method Three: Hash

This is the best of the three methods.
It relies on the fact that awk arrays are hashes (for details, see section 7.x of "info gawk"): all that is needed is a random, non-repeating key. Because the line number of each line in a file is unique, the mapping

random number + line number  ------ corresponds to ------>  the content of that line

serves as the random function. When the array is later traversed with for (x in b), awk visits the keys in its internal hash order, which is unrelated to the order in which the lines were read.
This gives:

awk 'BEGIN{srand()} {b[rand() NR]=$0} END{for (x in b) print b[x]}' data
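One detail of the key deserves care. awk converts rand() to a string before concatenating it with NR, so two different pairs can in principle produce the same key: 0.5 followed by 12 and 0.51 followed by 2 both give 0.512, and the later line would silently overwrite the earlier one. The sketch below is a cautious variant, not the original author's code. It separates the two parts with awk's built-in SUBSEP by using a two-part subscript. The function name hash_shuffle is hypothetical.

```shell
# Hypothetical helper: hash-order shuffle of the file named by $1,
# with a separator between the random part and the line number.
hash_shuffle() {
    awk '
        BEGIN { srand() }
        # b[rand(), NR] is stored under the key rand() SUBSEP NR,
        # so the random part and the line number can never run together.
        { b[rand(), NR] = $0 }
        # Traversal order is whatever the awk implementation uses internally.
        END { for (k in b) print b[k] }
    ' "$1"
}
```

The output order here depends on how the particular awk implementation walks its hash table, which POSIX leaves unspecified, so how well mixed the result is may differ between gawk, mawk and other implementations.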

  

In practice, there is no need to worry too much about excessive memory use. You can run a test:

Test environment:

Laptop: PM 1.4 GHz CPU, 40 GB HDD, 256 MB memory
SUSE 9.3, GNU Bash version 3.00.16, GNU Awk 3.1.4

Generate a random file of more than 500,000 lines, approximately 38 MB:

od /dev/urandom | dd count=75000 > data

Start with the less efficient approach, Method One.

Time taken for the shuffle:

time awk -v N="$(sed -n '$=' data)" 'BEGIN{FS="\n"; RS=""} {srand(); while (t != N) {x = int(N*rand()+1); a[x]++; if (a[x] == 1) {print $x; t++}}}' data

Result (file contents omitted):

real    3m41.864s
user    0m34.224s
sys     0m2.102s

So the efficiency is barely acceptable.

Test of Method Two:

time awk -f awkfile datafile

Result (file contents omitted):

real    2m26.487s
user    0m7.044s
sys     0m1.371s

The efficiency is significantly better than that of Method One.

Then examine the efficiency of Method Three:

time awk 'BEGIN{srand()} {b[rand() NR]=$0} END{for (x in b) print b[x]}' data

Result (file contents omitted):

real    0m49.195s
user    0m5.318s
sys     0m1.301s

That is quite good for a 38 MB file.
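The timings above say nothing about whether the output is still complete. A small check like the following sketch can confirm that a shuffled file contains exactly the same lines as the original, just in a different order. The function name same_lines is hypothetical, and mktemp is assumed to be available.

```shell
# Hypothetical helper: succeed only if the files named by $1 and $2 contain
# the same lines, each the same number of times, regardless of order.
same_lines() {
    _a=$(mktemp) || return 2
    _b=$(mktemp) || { rm -f "$_a"; return 2; }
    # Sorting both files makes the comparison independent of line order.
    LC_ALL=C sort "$1" > "$_a"
    LC_ALL=C sort "$2" > "$_b"
    if cmp -s "$_a" "$_b"; then _status=0; else _status=1; fi
    rm -f "$_a" "$_b"
    return "$_status"
}
```

For example, after redirecting the output of any of the three methods to a file named shuffled, running same_lines data shuffled && echo OK prints OK only if no line was lost or duplicated.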
--------------------------------------------------------------------

There is also a Python version, from flyfly, that writes the lines of a file out in random order:

    #!/usr/bin/env python3
    import random
    import sys

    def usage():
        print("usage: program srcfilename dstfilename")

    # Both the source and the destination file name are required.
    if len(sys.argv) != 3:
        usage()
        sys.exit(1)
    src_name, dst_name = sys.argv[1], sys.argv[2]

    # Open the phonebook file and read all of its lines into a list.
    with open(src_name, 'r') as f:
        phonebook = f.readlines()

    # Shuffle the list in place, then write the lines to the destination file.
    random.shuffle(phonebook)
    with open(dst_name, 'w') as f:
        f.writelines(phonebook)
