Copyright notice: This is an original article by Hu Hengwei; please credit the source when reprinting.
Original article link: https://www.qcloud.com/community/article/86
Source: Tengyun https://www.qcloud.com/community
Have you ever needed test data but not known how to generate files already filled with it, or temporarily needed a small tool to produce files of different sizes (for example, larger than 1 MB but smaller than 100 MB)? Instead of searching the web for how to do it each time, here are a few simple methods to save you the effort.
1. When you don't care about the file's contents and only need a file of a fixed size
Unix systems such as Solaris and Mac OS X provide the mkfile command, which can produce a file of a specified size; Linux does not have it.
Example: mkfile -n 160g test1
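Per the mkfile man page, the size suffix can be b, k, m, or g, and -n records the size without allocating disk blocks until data is actually written. A minimal sketch (the file names are illustrative, not from the original):

mkfile 1m test2       # 1 MB file, blocks actually allocated
mkfile -n 160g test1  # 160 GB file created almost instantly, blocks allocated lazily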
On Linux you can use the dd command instead; /dev/zero is a special file that returns null bytes whenever it is read.
Example: dd if=/dev/zero of=test.file count=1024 bs=1024
This generates a file of count * bs bytes, here 1 MB.
The advantage of this method is that it is fast (roughly 1 second to produce a 1 GB file) and the file size is accurate to the byte.
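For larger files it is usually better to raise the block size rather than the count, since dd then makes fewer read/write calls; a sketch under that assumption (the file name and sizes are my own illustration):

dd if=/dev/zero of=test.file bs=1M count=1024  # 1024 blocks of 1 MB = 1 GB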
It also has a drawback: the file is filled with null characters, so line-count statistics see no lines at all (wc -l test.file reports 0).
2. When you don't care about the file's contents but want the test file to have countable lines
Simply replace /dev/zero with /dev/urandom. /dev/urandom is a random number generator under Linux. The difference between /dev/urandom and /dev/random is not discussed in detail here; roughly, the former does not block waiting for the system to gather entropy (from sources such as interrupts): even when entropy is scarce, it keeps producing output from its pseudo-random generator. The latter, when driven by dd, cannot be interrupted with ctrl+c or kill -9, and if bs is set large, the shortage of random data makes it consume CPU for a long time. Although the numbers from /dev/random are more random, when pairing with dd the recommendation is /dev/urandom.
The disadvantage compared with /dev/zero is, of course, lower efficiency: generating a 100 MB file takes about 10 seconds, and the file's contents are not human-readable, which is acceptable in most situations.
dd is a command supported by both Linux and Unix.
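A minimal example of the substitution (the file name and sizes are my own illustration):

dd if=/dev/urandom of=test.file bs=1M count=100  # ~100 MB of random bytes
wc -l test.file                                  # now reports a nonzero count

Newline bytes simply occur by chance in the random stream, which is what gives the file countable lines.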
3. When you care about the number of lines in the file but not whether content repeats
The idea here is to start from a seed file (say, 2 lines), concatenate the file with itself into a new file, overwrite the original with mv, and repeat with a for loop. (With n iterations, the resulting file has 2^(n+1) lines.)
Example: suppose you first create a file.txt that contains the two lines hello and world
for i in {1..n}; do cat file.txt file.txt > file2.txt && mv file2.txt file.txt; done
(Replace n with a literal number; bash brace expansion does not expand a variable there.)
Because the growth is exponential, n=20 already yields about 2 million lines (2^21), and efficiency falls off sharply beyond that.
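A concrete run of the loop above with n fixed at 10 (the seed lines and count are illustrative):

printf 'hello\nworld\n' > file.txt
for i in {1..10}; do cat file.txt file.txt > file2.txt && mv file2.txt file.txt; done
wc -l file.txt  # 2048 lines, i.e. 2^(10+1)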
4. When you care about the file's contents and don't want any duplicate lines
At this point the system's built-in commands alone won't satisfy the requirement. You could still piece it together with a long string of shell script, but that is not recommended: for readability and maintainability, you should bring in a scripting language such as Ruby or Python.
We will still borrow one thing from the system, though.
Idea: /usr/share/dict/words holds a list of words, 235,886 lines in total, one word per line.
You can pick random entries from it as the content of the file, then add a loop to meet the random-file requirement we want.
Example: ruby -e 'a=STDIN.readlines; X.times do; b=[]; Y.times do; b << a[rand(a.size)].chomp; end; puts b.join(" "); end' < /usr/share/dict/words > file.txt
X is the number of lines the random file should have, and Y is the number of words to read from words for each line. Although everything is squeezed into a single command, it reads simply: repeatedly pick Y random words from the lines read on standard input, append them to the list b, join them with spaces, and print each resulting line, with the output redirected into file.txt.
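A concrete invocation with the placeholders filled in (the values 100000 and 20 are my own illustration, not from the original):

ruby -e 'a=STDIN.readlines; 100000.times do; b=[]; 20.times do; b << a[rand(a.size)].chomp; end; puts b.join(" "); end' < /usr/share/dict/words > file.txt
wc -l file.txt  # 100000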
Files produced this way rarely contain duplicate lines, and the speed is acceptable compared with the other methods: about 10 seconds to generate a 100 MB file. You are welcome to discuss.
References:
Apple's official documentation for mkfile: https://developer.apple.com/library/mac/documentation/Darwin/Reference/Manpages/man8/mkfile.8.html
dd (Wikipedia): http://en.wikipedia.org/wiki/Dd_(Unix)
How to generate random files from the command line on a Linux system