dd, split, and csplit commands
Contents:
1.1 dd command
1.2 split command
1.3 csplit command
The most common Linux tool for generating and slicing files is dd. It is powerful, but it cannot extract data line by line, and it cannot split a file evenly by size or by line count unless it is run repeatedly in a loop. The other two splitting tools, split and csplit, handle these requirements easily. csplit can be regarded as an enhanced version of split.
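The remark above about running dd in a loop can be made concrete. The following is a minimal sketch that assumes GNU coreutils. The 3.5 MiB sample file, the 1 MiB piece size, and the part_N names are hypothetical choices for illustration. Each iteration uses skip (explained in section 1.1) to start reading at the next 1 MiB block.

```shell
#!/bin/sh
# Sketch: split a file into equal-size pieces by calling dd in a loop.
# Assumptions (illustration only): GNU coreutils, a 3.5 MiB sample file,
# 1 MiB pieces, and hypothetical output names part_0, part_1, ...
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Create a 3.5 MiB sample file of random bytes.
head -c 3670016 /dev/urandom > sample.bin

piece=1048576                      # piece size in bytes (matches bs=1M below)
size=$(wc -c < sample.bin)
# Round up so that a final, shorter piece is also written.
pieces=$(( (size + piece - 1) / piece ))

i=0
while [ "$i" -lt "$pieces" ]; do
    # skip=$i starts reading i blocks (of bs bytes) into the input;
    # count=1 copies one block, or less at the end of the file.
    dd if=sample.bin of="part_$i" bs=1M skip="$i" count=1 2>/dev/null
    i=$((i + 1))
done

# Reassemble the pieces in numeric order and compare with the original.
: > joined.bin
i=0
while [ "$i" -lt "$pieces" ]; do
    cat "part_$i" >> joined.bin
    i=$((i + 1))
done
cmp sample.bin joined.bin && echo "joined.bin is identical to sample.bin"
```

The loop can only cut at byte offsets. That is why dd is a poor fit for splitting text by line count, which split handles directly.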
1.1 dd command
dd reads data from the object specified by if and writes it to the object specified by of. bs sets the block size for reading and writing, and count sets how many blocks to copy. bs multiplied by count gives the total amount of data copied. skip=N skips the first N blocks of the input (if) before reading, and seek=N skips the first N blocks of the output (of) before writing.
dd if=/dev/zero of=/tmp/abc.1 bs=1M count=20
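The option if specifies the input file and of specifies the output file. bs accepts unit suffixes such as c (1 byte), w (2 bytes), b (512 bytes), kB (1000 bytes), K (1024 bytes), MB (1000*1000 bytes), M (1024*1024 bytes), GB, G, and so on. A trailing B selects the decimal (1000-based) unit, so do not append B when you want the binary (1024-based) unit. For example, bs=1M is 1048576 bytes, while bs=1MB is 1000000 bytes.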
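A quick way to check the bs * count arithmetic and the difference between binary and decimal suffixes is to create a few files from /dev/zero and measure them. This sketch assumes GNU dd, and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: output size = bs * count, and B-suffixed units are decimal.
# Assumes GNU dd; file names are hypothetical.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

dd if=/dev/zero of=m.img  bs=1M  count=2 2>/dev/null   # 2 * 1024*1024 bytes
dd if=/dev/zero of=mb.img bs=1MB count=2 2>/dev/null   # 2 * 1000*1000 bytes
dd if=/dev/zero of=k.img  bs=1K  count=3 2>/dev/null   # 3 * 1024 bytes
dd if=/dev/zero of=kb.img bs=1kB count=3 2>/dev/null   # 3 * 1000 bytes

# Print each file name with its size in bytes.
for f in m.img mb.img k.img kb.img; do
    printf '%-7s %s bytes\n' "$f" "$(wc -c < "$f")"
done
```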
Assume the existing file CentOS.iso is 1.3 GB. You want to split it into pieces and later restore it. First, create a 500 MB piece from the beginning of the file:
dd if=/tmp/CentOS.iso of=/tmp/CentOS1.iso bs=2M count=250
Next, create the second piece. Its exact size is not known in advance, so count is omitted and dd copies everything up to the end of the input. The second piece must start at the 500 MB mark, so the first 500 MB of CentOS.iso must be skipped. With bs=2M, that is skip=250 blocks.
dd if=/tmp/CentOS.iso of=/tmp/CentOS2.iso bs=2M skip=250
Now CentOS.iso = CentOS1.iso + CentOS2.iso. Concatenating CentOS1.iso and CentOS2.iso in that order restores the original file:
cat CentOS1.iso CentOS2.iso >CentOS_m.iso
Compare the md5 checksums of CentOS_m.iso and CentOS.iso. They are identical:
shell> md5sum CentOS_m.iso CentOS.iso
504dbef14aed9b5990461f85d9fdc667  CentOS_m.iso
504dbef14aed9b5990461f85d9fdc667  CentOS.iso
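You can try the same split-and-restore procedure on a small scale without an ISO image. In this sketch, a hypothetical 7 MiB random file named sample.iso stands in for CentOS.iso. The first piece is 4 MiB (bs=2M count=2) instead of 500 MB. GNU coreutils are assumed.

```shell
#!/bin/sh
# Sketch: split with count/skip, then restore with cat.
# Assumptions: GNU coreutils; sample.iso is a hypothetical stand-in for CentOS.iso.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

head -c 7340032 /dev/urandom > sample.iso                 # 7 MiB of random data

dd if=sample.iso of=part1.iso bs=2M count=2 2>/dev/null   # first 2 blocks = 4 MiB
dd if=sample.iso of=part2.iso bs=2M skip=2 2>/dev/null    # everything after block 2

cat part1.iso part2.iso > merged.iso
md5sum merged.iso sample.iso
cmp merged.iso sample.iso && echo "merged.iso is identical to sample.iso"
```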
What does the seek option do, and how does it differ from skip? skip=N skips the first N blocks of the input when reading, whereas seek=N skips the first N blocks of the output file before writing. For example, with seek=2, dd starts writing at the third block of the output file. If the output file is shorter than two blocks, the gap is filled with zero bytes, as if read from /dev/zero.
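The difference between skip and seek is easiest to see on a few bytes. This sketch assumes GNU dd and uses hypothetical file names. With bs=4, skip=1 drops the first 4-byte block of the input. seek=2 leaves the first two 4-byte blocks of a new output file unwritten, so they read back as zero bytes.

```shell
#!/bin/sh
# Sketch: skip acts on the input, seek acts on the output.
# Assumes GNU dd and od; file names are hypothetical.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'AAAABBBBCCCC' > in.txt            # three 4-byte blocks

# skip=1: start reading at the second input block.
dd if=in.txt of=skipped.txt bs=4 skip=1 2>/dev/null

# seek=2: start writing at the third output block; the first
# 8 bytes of the new file are never written and read back as zeros.
dd if=in.txt of=seeked.bin bs=4 seek=2 2>/dev/null

cat skipped.txt; echo
od -An -c seeked.bin                      # eight \0 bytes, then AAAABBBBCCCC
```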
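With seek, you can turn CentOS1.iso back into a file identical to CentOS.iso. To do so, write everything after the first 500 MB of CentOS.iso into CentOS1.iso, starting right after its existing 500 MB: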
dd if=/tmp/CentOS.iso of=/tmp/CentOS1.iso bs=2M skip=250 seek=250
After restoration, their md5 values are the same.
shell> md5sum CentOS1.iso CentOS.iso
504dbef14aed9b5990461f85d9fdc667  CentOS1.iso
504dbef14aed9b5990461f85d9fdc667  CentOS.iso
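You can also reproduce the seek-based restore on a small scale. This sketch assumes GNU coreutils and uses a hypothetical 7 MiB random file in place of CentOS.iso, with a 4 MiB first piece (bs=2M count=2). The second dd call appends the rest of the original to that piece in place.

```shell
#!/bin/sh
# Sketch: restore a split piece in place using skip + seek.
# Assumptions: GNU dd; sample.iso is a hypothetical stand-in for CentOS.iso.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

head -c 7340032 /dev/urandom > sample.iso                 # 7 MiB of random data

# First piece: the first two 2 MiB blocks (4 MiB).
dd if=sample.iso of=part1.iso bs=2M count=2 2>/dev/null

# Read from input block 2 onward and write from output block 2 onward,
# so the existing 4 MiB of part1.iso are kept and the rest follows them.
dd if=sample.iso of=part1.iso bs=2M skip=2 seek=2 2>/dev/null

md5sum part1.iso sample.iso
cmp part1.iso sample.iso && echo "part1.iso now matches sample.iso"
```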
1.2 split command
The split tool splits a file into multiple smaller files. You must tell it the splitting unit, which is either a number of lines or a size. You also have to decide how the pieces are named, that is, their prefix and suffix. If no prefix is given, the default prefix is "x".
The command syntax is described as follows:
split [OPTION]... [INPUT [PREFIX]]
  -a N: use suffixes of length N (default N=2).
  -b SIZE: put SIZE bytes in each small file, that is, split by size. Units K, M, G, T (powers of 1024) and KB, MB, GB (powers of 1000) are supported; the default unit is bytes.
  -l N: put N lines in each small file, that is, split by line.
  -d: use numeric suffixes (00, 01, 02, ...) instead of the default alphabetic suffixes. The long form --numeric-suffixes=FROM starts numbering at FROM (default 0).
  --additional-suffix=SUFFIX: append an additional suffix, such as ".log", to every small file name. Some older versions do not support this option; the version shipped with CentOS 7.2 does.
  INPUT: the file to split. Use "-" to split standard input.
  PREFIX: the prefix of the small files. The default is "x".
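To see how -b, -d, and -a combine, this sketch splits a hypothetical 2500-byte file into 1 KiB pieces with three-digit numeric suffixes and then joins them again. It assumes GNU split, and the file names are illustrative.

```shell
#!/bin/sh
# Sketch: split by size with numeric suffixes, then reassemble.
# Assumes GNU split; data.bin and chunk_ are hypothetical names.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

head -c 2500 /dev/urandom > data.bin

# -b 1K: 1024 bytes per piece; -d: numeric suffixes; -a 3: suffix length 3.
split -b 1K -d -a 3 data.bin chunk_
wc -c chunk_*

# The shell sorts chunk_* lexicographically, which matches the suffix order.
cat chunk_* > joined.bin
cmp data.bin joined.bin && echo "joined.bin is identical to data.bin"
```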
For example, split /etc/fstab into pieces of five lines each, using the prefix "fs_", numeric suffixes, and a suffix length of 2:
[root@xuexi ~]# split -l 5 -d -a 2 /etc/fstab fs_
[root@xuexi ~]# ls
fs_00  fs_01  fs_02
View one of the small files:
[root@xuexi ~]# cat fs_01
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=b2a70faf-aea4-4d8e-8be8-c7109ac9c8b8 / xfs defaults 0 0
UUID=367d6a77-033b-4037-bbcb-416705ead095 /boot xfs defaults 0 0
You can reassemble the pieces to restore the original. For example, concatenate the three small files above into ~/fstab.bak:
[root@xuexi ~]# cat fs_0[0-2] >~/fstab.bak
The restored file is identical to the original, which md5sum confirms:
[root@xuexi ~]# md5sum /etc/fstab ~/fstab.bak
29b94c500f484040a675cb4ef81c87bf  /etc/fstab
29b94c500f484040a675cb4ef81c87bf  /root/fstab.bak
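You can check the line-based split without touching /etc/fstab. This sketch assumes GNU split and a hypothetical 12-line file. It shows that -l 5 yields pieces of 5, 5, and 2 lines and that cat restores the original byte for byte.

```shell
#!/bin/sh
# Sketch: split by line count, then verify the reassembled file.
# Assumes GNU coreutils; lines.txt and part_ are hypothetical names.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

seq 1 12 > lines.txt
split -l 5 -d -a 2 lines.txt part_
wc -l part_*

cat part_0[0-2] > restored.txt
md5sum lines.txt restored.txt
cmp lines.txt restored.txt && echo "restored.txt is identical to lines.txt"
```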
split can also read data from standard input and write it into small files. For example:
[root@xuexi ~]# seq 1 2 15 | split -l 3 -d - new_
[root@xuexi ~]# ls new*
new_00  new_01  new_02
You can also append an additional suffix, such as ".log", to every small file. Some older versions of split lack this option (csplit provides the same effect through its suffix format), but newer versions support it. For example:
[root@xuexi ~]# seq 1 2 20 | split -l 3 -d -a 3 --additional-suffix=".log" - new1_
[root@xuexi ~]# ls new1*
new1_000.log  new1_001.log  new1_002.log  new1_003.log
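You can check the standard-input and --additional-suffix examples together in a throwaway directory. This sketch assumes a split that supports --additional-suffix (as GNU split on CentOS 7.2 does, per the text above). It prints each piece on one line so the distribution of the 10 input numbers is visible.

```shell
#!/bin/sh
# Sketch: split standard input ("-") into 3-line pieces named new1_NNN.log.
# Assumes a split that supports --additional-suffix.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# seq 1 2 20 produces the 10 odd numbers 1, 3, ..., 19.
seq 1 2 20 | split -l 3 -d -a 3 --additional-suffix=".log" - new1_

# Show each piece's contents on a single line.
for f in new1_*.log; do
    printf '%s: %s\n' "$f" "$(tr '\n' ' ' < "$f")"
done
```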
1.3 csplit command
split can only split by line or by size. It cannot split by paragraph. csplit is a variant of split with more features: it splits a file into sections based on context, that is, on patterns that mark where each section begins.
csplit [OPTION]... FILE PATTERN...
Splits FILE into pieces named "xx00", "xx01", ... according to PATTERN, and prints the size in bytes of each piece on standard output.
Options:
  -b FORMAT: use the printf-style FORMAT for the suffix instead of the default %02d (a two-digit number padded with leading zeros).
  -f PREFIX: use PREFIX instead of the default "xx".
  -k: keep the small files already created even if an error occurs (by default they are removed).
  --suppress-matched: do not output the lines that match PATTERN.
  -s, -q: silent mode; do not print the sizes of the small files.
  -z: remove any small files that would be empty.
  FILE: the file to split. Use "-" to split standard input.
PATTERNs:
  INTEGER: with N, copy lines 1 through N-1 into one small file and the rest into the next.
  /REGEXP/[OFFSET]: copy up to, but not including, the line that matches REGEXP; the matching line starts the next small file. OFFSET has the form +N or -N and moves the split point N lines after or before the matching line.
  %REGEXP%[OFFSET]: like /REGEXP/, but the section before the split point is skipped instead of being written to a small file.
  {INTEGER}: repeat the previous pattern N more times.
  {*}: repeat the previous pattern as many times as possible, until no further match is found before the end of the file.
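The INTEGER and %REGEXP% patterns are not used in the examples that follow, so here is a small sketch of both. It assumes GNU csplit and a hypothetical file nums.txt holding the numbers 1 to 10. -s suppresses the size output.

```shell
#!/bin/sh
# Sketch: the INTEGER and %REGEXP% patterns of csplit.
# Assumes GNU csplit; nums.txt, num_ and skip_ are hypothetical names.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

seq 1 10 > nums.txt

# INTEGER pattern: lines 1-3 go to num_00, lines 4-10 to num_01.
csplit -s -f num_ nums.txt 4

# %REGEXP%: lines before the first line matching ^7$ are skipped
# (not written anywhere); lines 7-10 go to skip_00.
csplit -s -f skip_ nums.txt '%^7$%'

head num_* skip_*
```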
Assume that the file content is as follows:
[root@xuexi ~]# cat test.txt
SERVER-1
[connection] 192.168.0.1 success
[connection] 192.168.0.2 failed
[disconnect] 192.168.0.3 pending
[connection] 192.168.0.4 success
SERVER-2
[connection] 192.168.0.1 failed
[connection] 192.168.0.2 failed
[disconnect] 192.168.0.3 success
[CONNECTION] 192.168.0.4 pending
SERVER-3
[connection] 192.168.0.1 pending
[connection] 192.168.0.2 pending
[disconnect] 192.168.0.3 pending
[connection] 192.168.0.4 failed
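To reproduce the csplit examples, you can recreate test.txt with a here-document. This is a convenience sketch: it writes test.txt into the current directory and overwrites any existing file of that name.

```shell
#!/bin/sh
# Recreate the sample file test.txt used in the csplit examples.
# Note: this overwrites ./test.txt if it already exists.
set -eu

cat > test.txt <<'EOF'
SERVER-1
[connection] 192.168.0.1 success
[connection] 192.168.0.2 failed
[disconnect] 192.168.0.3 pending
[connection] 192.168.0.4 success
SERVER-2
[connection] 192.168.0.1 failed
[connection] 192.168.0.2 failed
[disconnect] 192.168.0.3 success
[CONNECTION] 192.168.0.4 pending
SERVER-3
[connection] 192.168.0.1 pending
[connection] 192.168.0.2 pending
[disconnect] 192.168.0.3 pending
[connection] 192.168.0.4 failed
EOF

wc -l -c test.txt      # 15 lines, 419 bytes (140 + 139 + 140)
```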
Each SERVER-n line starts a paragraph. To split the file by paragraph, run:
[root@xuexi ~]# csplit -f test_ -b %04d.log test.txt /SERVER/ {*}
0
140
139
140
"-f test_" sets the prefix of the small files to "test_". "-b %04d.log" sets the suffix format, producing names such as test_0000.log and test_0001.log; this is how an additional ".log" suffix is appended to every small file. "/SERVER/" is the matching pattern: each match starts a new small file, and the matching line becomes the first line of that file. "{*}" repeats the previous pattern (/SERVER/) until the end of the file. Without {*}, /SERVER/ is applied only once; {N} would repeat it only N more times.
[root@xuexi ~]# ls test_*
test_0000.log  test_0001.log  test_0002.log  test_0003.log
The file contains only three paragraphs (SERVER-1, SERVER-2, and SERVER-3), yet the split produces four small files, and the first one is 0 bytes. Why? Each line that matches the pattern becomes the first line of the next small file. The very first line of the file, "SERVER-1", matches /SERVER/, so it starts the second small file. The section before it, which is empty, is still written out as the first small file.
You can use the "-z" option to delete the generated empty file.
[root@xuexi ~]# csplit -f test1_ -z -b %04d.log test.txt /SERVER/ {*}
140
139
140
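The effect of the repetition count and of -z can be compared side by side. This sketch assumes GNU csplit. To stay short, it generates a simplified stand-in for test.txt with three SERVER-n paragraphs of four lines each, so its content differs from the file above. The patterns are quoted so that the shell passes {*} and {1} through unchanged.

```shell
#!/bin/sh
# Sketch: compare /SERVER/ alone, with {1}, with {*}, and with {*} plus -z.
# Assumes GNU csplit; the generated test.txt is a simplified stand-in.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Stand-in input: SERVER-1..3, each followed by four connection lines.
for n in 1 2 3; do
    echo "SERVER-$n"
    for ip in 1 2 3 4; do
        echo "[connection] 192.168.0.$ip success"
    done
done > test.txt

csplit -s -f once_  test.txt '/SERVER/'          # pattern applied once
csplit -s -f twice_ test.txt '/SERVER/' '{1}'    # applied, then repeated once more
csplit -s -f all_   test.txt '/SERVER/' '{*}'    # repeated until no more matches
csplit -s -z -f nz_ test.txt '/SERVER/' '{*}'    # same, but empty pieces removed

for p in once_ twice_ all_ nz_; do
    echo "$p: $(ls "$p"* | wc -l) files"
done
```

The first piece of once_, twice_, and all_ is empty for the reason explained above. Only -z removes it.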
You can also add a line offset to the pattern. With /SERVER/+2, each split point is moved two lines below the matching line. The first small file therefore contains only the matching line and the one line after it (two lines in total), and the remaining lines are placed in the next small file:
[root@xuexi ~]# csplit -f test2_ -z -b %04d.log test.txt /SERVER/+2 {*}
42
139
140
98
The first small file contains only two lines:
[root@xuexi ~]# cat test2_0000.log
SERVER-1
[connection] 192.168.0.1 success
The rest of the SERVER-1 paragraph, followed by the first two lines of the SERVER-2 paragraph, goes into the second small file:
[root@xuexi ~]# cat test2_0001.log
[connection] 192.168.0.2 failed
[disconnect] 192.168.0.3 pending
[connection] 192.168.0.4 success
SERVER-2
[connection] 192.168.0.1 failed
The third small file is produced the same way. Whatever follows the last split point goes into the last small file:
[root@xuexi ~]# cat test2_0003.log
[connection] 192.168.0.2 pending
[disconnect] 192.168.0.3 pending
[connection] 192.168.0.4 failed
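You can check the offset behavior the same way. This sketch assumes GNU csplit and generates a simplified stand-in for test.txt: three SERVER-n paragraphs of four lines each, so the SERVER lines are lines 1, 6, and 11. With /SERVER/+2, the split points should fall at lines 3, 8, and 13.

```shell
#!/bin/sh
# Sketch: /REGEXP/+N moves each split point N lines below the match.
# Assumes GNU csplit; the generated test.txt is a simplified stand-in.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Stand-in input: SERVER-1..3, each followed by four connection lines.
for n in 1 2 3; do
    echo "SERVER-$n"
    for ip in 1 2 3 4; do
        echo "[connection] 192.168.0.$ip success"
    done
done > test.txt

csplit -s -z -f off_ test.txt '/SERVER/+2' '{*}'

wc -l off_*
head -n 1 off_*
```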
If you specify the "-s" or "-q" option to run in silent mode, no small file size information is output.
[root@xuexi ~]# csplit -q -f test3_ -z -b %04d.log test.txt /SERVER/+2 {*}