Random shuffle implemented in the shell

Source: Internet
Author: User
Tags: rand, shuffle

The shuffle problem: what is a good way to shuffle a deck of cards so that the result is both even and fast? Put in terms of files: how efficiently can the lines of a file be put into random order?

A good shell solution to the shuffle problem already exists. Here are three more, all based on awk. If you spot a mistake, please point it out.

Method one, exhaustive: this works much like brute force. Build a hash that records how many times each line has been drawn, and print a line only the first time it comes up. This prevents duplicates, but the drawback is extra system overhead: the fewer lines remain, the more random draws are wasted on lines that were already printed.

    awk -v N=`sed -n '$=' data` '
    # With RS="" and FS="\n" the whole file is one record and each line is
    # a field (this assumes the file contains no blank lines).
    BEGIN { FS = "\n"; RS = "" }
    {
        srand()
        while (t != N) {                 # until all N lines are printed
            x = int(N * rand() + 1)      # random line number in 1..N
            a[x]++
            if (a[x] == 1) { print $x; t++ }   # print on first draw only
        }
    }' data

Method two, swapping: this is based on exchanging array subscripts. Store each line in an array, then exchange the contents of randomly chosen array elements. It is more efficient than method one.

    #!/usr/bin/awk -f
    BEGIN { srand() }
    { b[NR] = $0 }                       # store every line
    END {
        C(b, NR)
        for (x in b) print b[x]
    }
    # Swap two randomly chosen elements, once per element of arr.
    function C(arr, len,    i, j, t, x) {
        for (x in arr) {
            i = int(len * rand()) + 1
            j = int(len * rand()) + 1
            t = arr[i]; arr[i] = arr[j]; arr[j] = t
        }
    }

Method three, hashing: the best of the three. It takes advantage of how hashes (associative arrays) behave in awk; for details see section 7.x of "info gawk". All that is needed is a random, non-repeating hash key. Because the line number of every line in a file is unique, the mapping

    random number + line number  ------>  content of that line

serves as the key function. The END loop then walks the array in hash order rather than input order. This gives:

    awk 'BEGIN{srand()}{b[rand()NR]=$0}END{for(x in b)print b[x]}' data

Strictly speaking, plain concatenation can still collide: rand() printing as 0.5 on line 12 and as 0.51 on line 2 both give the key 0.512. Writing the key as rand() "," NR rules that out.

In practice there is no need to worry much about memory use. A test shows this.

Test environment: PM 1.4GHz CPU, 40G HDD, 256M memory laptop; SUSE 9.3; GNU Bash version 3.00; GNU Awk 3.1.4.

Generate a random file of more than 500,000 lines, about 38M:

    od /dev/urandom | dd count=75000 > data

Start with the least efficient one, method one. Time for a single shuffle:

    time awk -v N=`sed -n '$=' data` '
    BEGIN { FS = "\n"; RS = "" }
    {
        srand()
        while (t != N) {
            x = int(N * rand() + 1)
            a[x]++
            if (a[x] == 1) { print $x; t++ }
        }
    }' data

Result (file contents omitted):

    real    3m41.864s
    user    0m34.224s
    sys     0m2.102s

So the efficiency is barely acceptable.

Test of method two:

    time awk -f awkfile datafile

Result (file contents omitted):

    real    2m26.487s
    user    0m7.044s
    sys     0m1.371s

This is significantly better than method one.

Now the efficiency of method three:

    time awk 'BEGIN{srand()}{b[rand()NR]=$0}END{for(x in b)print b[x]}' data

Result (file contents omitted):

    real    0m49.195s
    user    0m5.318s
    sys     0m1.301s

That is pretty good for a 38M file. Have fun!
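A footnote on method two: swapping two random positions once per line scrambles the array, but it does not guarantee that every ordering is equally likely. For comparison, here is a minimal sketch of the Fisher-Yates shuffle in awk. It is a different technique from the three methods above and was not part of the timing test. The function name shuffle_fy is made up for this illustration, and like methods two and three it holds the whole file in memory.

```shell
# shuffle_fy FILE...: print the input lines in random order.
# Hypothetical helper name, not from the original article.
shuffle_fy() {
    awk '
    BEGIN { srand() }                  # seed from the current time
    { line[NR] = $0 }                  # keep every line in memory
    END {
        # Fisher-Yates: walk down from the last slot, swapping slot i
        # with a randomly chosen slot j in 1..i.
        for (i = NR; i > 1; i--) {
            j = int(i * rand()) + 1
            t = line[i]; line[i] = line[j]; line[j] = t
        }
        # Print by numeric index; "for (x in line)" order is unspecified.
        for (i = 1; i <= NR; i++) print line[i]
    }' "$@"
}
```

Usage would look like `shuffle_fy data > shuffled`. Since srand() seeds from the clock, two runs within the same second produce the same order.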
