Shell script example: batch-compare whether multiple files have the same content
To check whether two files have exactly the same content, you can simply use the diff command. For example:
diff file1 file2 &>/dev/null;echo $?
However, diff accepts only two file arguments, so it cannot compare many files (directories are also treated as files) in one run, and it is inefficient on non-text files or large files.
In that case, md5sum can be used instead. Compared with the line-by-line comparison done by diff, md5sum is much faster.
For how to use md5sum, see file MD5 verification in Linux.
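As a quick illustration of comparing by checksum, the following sketch compares only the MD5 field of two files. The function name same_md5 and the temporary files are made up for this example and are not part of the original script; md5sum from GNU coreutils is assumed.

```shell
#!/bin/bash
# same_md5 is a hypothetical helper: it succeeds when both files
# have the same MD5 checksum.
same_md5() {
    local sum1 sum2
    # Reading from stdin makes md5sum print "<checksum>  -",
    # so only the checksum differs between files.
    sum1=$(md5sum < "$1") || return 2
    sum2=$(md5sum < "$2") || return 2
    # Keep only the checksum field (everything before the first space).
    [ "${sum1%% *}" = "${sum2%% *}" ]
}

# Demo on throwaway files (the paths are illustrative only).
tmpdir=$(mktemp -d)
echo "hello"  > "$tmpdir/a"
echo "hello"  > "$tmpdir/b"
echo "hello!" > "$tmpdir/c"

same_md5 "$tmpdir/a" "$tmpdir/b" && ab=same || ab=different
same_md5 "$tmpdir/a" "$tmpdir/c" && ac=same || ac=different
echo "a vs b: $ab"
echo "a vs c: $ac"
rm -rf "$tmpdir"
```

This handles only a pair of files, which is why a loop over all md5sum results is still needed for batch comparison.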
However, md5sum only tells you whether files are the same by way of their MD5 values. To compare files automatically in batch, you need to write a loop. The script is as follows:
#!/bin/bash
###########################################################
#  description: compare multiple files at one time        #
#  author     : Jun Ma Jinlong                            #
#  blog       : http://www.cnblogs.com/f-ck-need-u        #
###########################################################

# filename: md5.sh
# Usage: $0 file1 file2 file3 ...

IFS=$'\n'
declare -A md5_array

# If use while read loop, the array in while statement will
# auto set to null after the loop, so I use for statement
# instead of the while, and so, I modify the variable IFS to
# $'\n'.

# md5sum format: MD5  /path/to/file
# such: 80748c3a55b424226ad51a4bafa1c4aa  /etc/fstab
for line in `md5sum "$@"`
do
    index=${line%% *}
    file=${line##* }
    md5_array[$index]="$file ${md5_array[$index]}"
done

# Traverse the md5_array
for i in ${!md5_array[@]}
do
    echo -e "the same file with md5: $i\n--------------\n`echo ${md5_array[$i]}|tr ' ' '\n'`\n"
done
To test the script, make several copies of a file and modify some of the copies, for example:
[root@xuexi ~]# for i in `seq -s' ' 6`;do cp -a /etc/fstab /tmp/fs$i;done
[root@xuexi ~]# echo ha >>/tmp/fs4
[root@xuexi ~]# echo haha >>/tmp/fs5
The /tmp directory now contains six files: fs1, fs2, fs3, fs4, fs5, and fs6. fs4 and fs5 have been modified, and the remaining four files have identical content.
[root@xuexi tmp]# ./md5.sh /tmp/fs[1-6]
the same file with md5: a612cd5d162e4620b442b0ff3474bf98
--------------------------
/tmp/fs6
/tmp/fs3
/tmp/fs2
/tmp/fs1

the same file with md5: 80748c3a55b726226ad51a4bafa1c4aa
--------------------------
/tmp/fs4

the same file with md5: 30dd43dba10521c1e94267bbd117877b
--------------------------
/tmp/fs5
A more general use: compare files with matching names under multiple directories.
[root@xuexi tmp]# find /tmp -type f -name "fs[0-9]" -print0 | xargs -0 ./md5.sh
the same file with md5: a612cd5d162e4620b442b0ff3474bf98
--------------------------
/tmp/fs6
/tmp/fs3
/tmp/fs2
/tmp/fs1

the same file with md5: 80748c3a55b726226ad51a4bafa1c4aa
--------------------------
/tmp/fs4

the same file with md5: 30dd43dba10521c1e94267bbd117877b
--------------------------
/tmp/fs5
Script description:
(1) Each line of md5sum output has the format "MD5  /path/to/file". Because the result must show both each MD5 value and all the files that share it, an associative array is used.
(2) At first, I used a while loop to read the md5sum result of each file from standard input. The statement is as follows:
md5sum "$@" | while read index file;do
    md5_array[$index]="$file ${md5_array[$index]}"
done
However, because of the pipe, the while statement runs in a subshell, so the md5_array values assigned inside it are lost when the loop ends. It can be rewritten so that the loop runs in the current shell:
while read index file;do
    md5_array[$index]="$file ${md5_array[$index]}"
done <<<"$(md5sum "$@")"
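The difference between the two forms can be seen with a small stand-alone sketch. The counter variables count_pipe and count_here are made up for this demonstration, and bash with its default settings (no lastpipe option) is assumed.

```shell
#!/bin/bash
# Piped loop: the while loop runs in a subshell, so the
# assignment is lost once the pipeline finishes.
count_pipe=0
printf 'a\nb\nc\n' | while read -r line; do
    count_pipe=$((count_pipe + 1))
done

# Here-string loop: the while loop runs in the current shell,
# so the assignment is still visible afterwards.
count_here=0
while read -r line; do
    count_here=$((count_here + 1))
done <<< "$(printf 'a\nb\nc\n')"

echo "pipe: $count_pipe, here-string: $count_here"
```

Under those assumptions this prints "pipe: 0, here-string: 3", which is the same effect that empties md5_array after the piped loop.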
In the end, though, I used a slightly more complex for loop:
IFS=$'\n'
for line in `md5sum "$@"`
do
    index=${line%% *}
    file=${line##* }
    md5_array[$index]="$file ${md5_array[$index]}"
done
However, each line of md5sum output has two columns, and with the default IFS the for loop would split them into two separate values. Therefore, IFS is changed to $'\n' so that each whole line is assigned to the loop variable.
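The effect of IFS on the for loop can be checked in isolation. In this sketch the sample text only imitates md5sum output; the checksums, paths, and counter names are made up.

```shell
#!/bin/bash
# Two lines imitating md5sum output (fake checksums and paths).
sample=$'aaa111  /tmp/one\nbbb222  /tmp/two'

# Default IFS (space, tab, newline): every column becomes its own word.
default_words=0
for word in $sample; do
    default_words=$((default_words + 1))
done

# IFS set to newline only: each whole line becomes one word.
old_ifs=$IFS
IFS=$'\n'
line_words=0
for word in $sample; do
    line_words=$((line_words + 1))
done
IFS=$old_ifs

echo "default IFS: $default_words words, newline IFS: $line_words words"
```

This prints "default IFS: 4 words, newline IFS: 2 words". Restoring IFS afterwards, as done here, is optional in a short script but avoids surprises in longer ones.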
(3) The index and file variables hold the two parts of each md5sum line: the MD5 part is used as the array index, and the file part is added to the value stored under that index. Therefore, the array assignment statement is:
md5_array[$index]="$file ${md5_array[$index]}"
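To see how the two parameter expansions and the assignment work together, here is a small sketch run on two hand-written lines. The checksum 0123abcd and the paths are invented for the example; bash 4 or later is assumed for associative arrays.

```shell
#!/bin/bash
declare -A md5_array

# Two made-up md5sum-style lines that share one (fake) checksum.
for line in "0123abcd  /tmp/x" "0123abcd  /tmp/y"; do
    index=${line%% *}   # drop everything from the first space on: the checksum
    file=${line##* }    # drop everything up to the last space: the path
    # Prepend the new file to whatever is already stored under this checksum.
    md5_array[$index]="$file ${md5_array[$index]}"
done

echo "index: $index"
echo "files: ${md5_array[0123abcd]}"
```

Because each new file is prepended, the stored value lists the files in reverse order of processing, with a trailing space after the last one. Note that ${line##* } keeps only the text after the last space, so a path that itself contains spaces would be cut short.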
(4) After the array has been filled, traverse it. There are several ways to do this; I chose to iterate over the array's index list, that is, over the MD5 values.
# Traverse the md5_array
for i in ${!md5_array[@]}
do
    echo -e "the same file with md5: $i\n--------------\n`echo ${md5_array[$i]}|tr ' ' '\n'`\n"
done
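One possible variation, shown only as a sketch and not as the author's method, keeps the same grouping idea but stores the files separated by newlines and quotes the expansions, so that spaces inside file names are tolerated. The function name group_by_md5 and the demo files are hypothetical; GNU md5sum in its default text mode and bash 4 or later are assumed.

```shell
#!/bin/bash
# group_by_md5 is a hypothetical function: it prints each checksum
# followed by the files that share it.
group_by_md5() {
    local -A groups
    local sum file key
    # Process substitution keeps the loop in the current shell,
    # so the "groups" array is still filled after the loop.
    while read -r sum file; do
        groups[$sum]+="$file"$'\n'
    done < <(md5sum -- "$@")
    for key in "${!groups[@]}"; do
        printf 'the same file with md5: %s\n--------------\n%s\n' \
            "$key" "${groups[$key]}"
    done
}

# Demo on throwaway files; one name contains a space.
tmpdir=$(mktemp -d)
echo same  > "$tmpdir/a b"
echo same  > "$tmpdir/c"
echo other > "$tmpdir/d"
out=$(group_by_md5 "$tmpdir"/*)
echo "$out"
rm -rf "$tmpdir"
```

File names with leading spaces, backslashes, or embedded newlines are still not handled correctly by this sketch.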