How to generate random files from the command line on a Linux system

Copyright notice: This is an original article by Hu Hengwei. Please credit the source when reprinting:
Article original link: https://www.qcloud.com/community/article/86

Source: Tencent Cloud community, https://www.qcloud.com/community

Have you ever needed test files already filled with data, or a quick way to generate files of a given size range (for example, larger than 1 MB but smaller than 100 MB), without searching the web for a tool? Here are a few simple methods to save yourself the effort.

1. When you don't need to care about the file's contents, just need a fixed-size file
    • The mkfile command on Unix systems such as Solaris and Mac OS X can produce a file of a specified size; Linux does not have it.
      Example: mkfile -n 160g test1

    • On Linux you can use the dd command; /dev/zero is a special file that returns null bytes when read.
      Example: dd if=/dev/zero of=test.file count=1024 bs=1024
      This generates a file of count × bs bytes, here 1 MB.
      The benefit of this method is that it is efficient (roughly 1 second to produce a 1 GB file) and that the file size is accurate to the byte.
      There are also disadvantages:
      the file is filled with null characters, so there are no lines to count (wc -l test.file reports 0).
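Putting the points above together, a short sketch (the file name is just an example):

```shell
# Create a 1 MB file of null bytes: count * bs = 1024 * 1024 bytes.
dd if=/dev/zero of=test.file count=1024 bs=1024

# The size is accurate to the byte...
stat -c %s test.file    # 1048576 (GNU stat; use stat -f %z on macOS)

# ...but the file contains no newline bytes, so line-based tools see 0 lines.
wc -l < test.file       # 0
```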

2. When you don't need to care about the file's contents, but expect the test file to have countable lines

Simply replace /dev/zero with /dev/urandom; /dev/urandom is a random number generator under Linux.

The difference between /dev/urandom and /dev/random is not discussed in detail here. Roughly, the former is not limited by the system's entropy pool: even when entropy is low it can still generate enough output from its pseudo-random generator. The latter blocks when entropy runs out; if used with dd, the command may not respond to Ctrl+C or kill -9, and with a large bs value it can consume the CPU for a long time while failing to produce enough random data. Although /dev/random generates higher-quality randomness, /dev/urandom is the recommended partner for dd because it is more efficient.

The disadvantage compared with /dev/zero is of course lower efficiency: generating a 100 MB file takes about 10 seconds, and the file has no human-readable content, but this generally satisfies most needs.

dd is a command supported by both Linux and UNIX.
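A minimal sketch of the same dd invocation with /dev/urandom (sizes are just examples):

```shell
# Create a 1 MB file of random bytes; slower than /dev/zero, but the
# content now contains newline bytes, so wc -l reports a non-zero count.
dd if=/dev/urandom of=test.file count=1024 bs=1024

stat -c %s test.file    # 1048576: still accurate to the byte
wc -l < test.file       # non-zero (on average about one newline per 256 bytes)
```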

3. When you care about the number of lines in the file, but not whether content is duplicated

The idea here is to start from a seed file (say, 2 lines), concatenate it with itself into a new file, overwrite the original with mv, and wrap this in a for loop. (With n iterations, the resulting file has 2^(n+1) lines.)

Example: suppose you first create a file.txt containing the two lines Hello and world, then run (n must be written as a literal number, since bash brace expansion does not expand variables):
for i in {1..n}; do cat file.txt file.txt > file2.txt && mv file2.txt file.txt; done

Because the growth is exponential, n=20 already yields about 2 million lines, and efficiency degrades sharply from there.
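A concrete, runnable version of the loop using seq, with n=5 (so the result has 2 × 2^5 = 64 lines):

```shell
# Seed file with two lines.
printf 'Hello\nworld\n' > file.txt

# Each pass concatenates the file with itself, doubling the line count.
for i in $(seq 1 5); do
    cat file.txt file.txt > file2.txt && mv file2.txt file.txt
done

wc -l < file.txt    # 64
```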

4. When you care about the contents of the file and don't want duplicate lines to appear

In this case the system's built-in commands alone are not enough. You could write a long shell script, but that is not recommended; for readability and maintainability you should bring in a scripting language such as Ruby or Python.
Even so, we still need some help from the system.

Idea: /usr/share/dict/words records a list of words (235,886 lines, one word per line).
You can pick random words from it to form the content of each line,
then add a loop to produce the random file we want.

Example: ruby -e 'a=STDIN.readlines; X.times do; b=[]; Y.times do; b << a[rand(a.size)].chomp; end; puts b.join(" "); end' < /usr/share/dict/words > file.txt

X is the number of lines required in the random file and Y is the number of words per line. The one-liner reads all words from standard input into a, then X times picks Y random words into the list b, joins them with spaces, and prints the line; the output is redirected to file.txt.

Duplicate lines are rare this way, and performance is comparable with the other methods: about 10 seconds to generate a 100 MB file. You are welcome to discuss.
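If Ruby is not available, GNU shuf can play the same role; this is a sketch, not the article's original method. The word list, X, and Y below are arbitrary assumptions, and shuf -r samples with replacement, so individual words (though rarely whole lines) may repeat:

```shell
# Hypothetical small word list standing in for /usr/share/dict/words.
printf '%s\n' alpha bravo charlie delta echo foxtrot golf hotel > words.txt

X=10   # number of lines in the output file
Y=5    # words per line

# Each pass draws Y random words (with replacement) and joins them
# into one space-separated line of the output file.
for i in $(seq 1 "$X"); do
    shuf -rn "$Y" words.txt | tr '\n' ' '
    echo
done > file.txt

wc -l < file.txt    # 10
```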

Reference:

Apple's official documentation for mkfile:
https://developer.apple.com/library/mac/documentation/Darwin/Reference/Manpages/man8/mkfile.8.html
dd Wiki:
https://en.wikipedia.org/wiki/Dd_(Unix)
