Shell script example: Batch compare whether the contents of multiple files are the same

Source: Internet
Author: User

To check whether the contents of two files are exactly the same, you can simply use the diff command. For example:

diff file1 file2 &>/dev/null;echo $?
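The exit status is what makes this command usable in scripts: diff returns 0 when the files are identical, 1 when they differ, and 2 when something went wrong (for example, a missing file). The following is a minimal sketch of wrapping that check; same_content is a hypothetical helper name and the temporary files are made up for illustration.

```shell
#!/bin/bash
# same_content (hypothetical helper): succeed only if both files are identical.
# diff exits with 0 (identical), 1 (different) or 2 (trouble).
same_content() {
    diff -q -- "$1" "$2" &>/dev/null
}

# Made-up sample files for the demonstration.
tmpdir=$(mktemp -d)
printf 'alpha\n' > "$tmpdir/a"
printf 'alpha\n' > "$tmpdir/b"
printf 'beta\n'  > "$tmpdir/c"

if same_content "$tmpdir/a" "$tmpdir/b"; then ab=same; else ab=different; fi
if same_content "$tmpdir/a" "$tmpdir/c"; then ac=same; else ac=different; fi

echo "a vs b: $ab"
echo "a vs c: $ac"
rm -rf "$tmpdir"
```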

However, diff accepts only two file arguments (a directory also counts as a file argument), so it cannot compare more than two files at a time. In addition, diff is inefficient for non-text files and large files.

In this case, md5sum can be used instead. Compared with the line-by-line comparison performed by diff, md5sum is much faster.
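As a rough illustration of the idea, two files can be compared by comparing their MD5 values. This is only a sketch: md5_of is a hypothetical helper name, and the sample files are created just for the demonstration.

```shell
#!/bin/bash
# md5_of (hypothetical helper): print only the hash column of md5sum.
# Reading the file from stdin keeps the file name out of the output.
md5_of() {
    md5sum < "$1" | cut -d' ' -f1
}

# Made-up sample files for the demonstration.
tmpdir=$(mktemp -d)
printf 'same content\n'  > "$tmpdir/a"
printf 'same content\n'  > "$tmpdir/b"
printf 'other content\n' > "$tmpdir/c"

hash_a=$(md5_of "$tmpdir/a")
hash_b=$(md5_of "$tmpdir/b")
hash_c=$(md5_of "$tmpdir/c")

[ "$hash_a" = "$hash_b" ] && echo "a and b have the same content"
[ "$hash_a" = "$hash_c" ] || echo "a and c differ"
rm -rf "$tmpdir"
```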

For how to use md5sum, see file MD5 verification in Linux.

However, md5sum only produces MD5 values; you still have to compare those values to tell whether files are the same. To automate batch comparison, you need to write a loop. The script is as follows:

#!/bin/bash
###########################################################
#  description: compare multiple files at one time        #
#  author     : Jun Ma Jinlong                            #
#  blog       : http://www.cnblogs.com/f-ck-need-u        #
###########################################################

# filename: md5.sh
# Usage: $0 file1 file2 file3 ...

IFS=$'\n'
declare -A md5_array

# If a "while read" loop fed by a pipe is used, the array assigned
# inside the loop is empty after the loop ends, so a for statement
# is used instead of while, and IFS is therefore changed to $'\n'.

# md5sum output format: MD5  /path/to/file
# for example: 80748c3a55b424226ad51a4bafa1c4aa  /etc/fstab
for line in `md5sum "$@"`
do
    index=${line%% *}
    file=${line##* }
    md5_array[$index]="$file ${md5_array[$index]}"
done

# Traverse the md5_array
for i in ${!md5_array[@]}
do
    echo -e "the same file with md5: $i\n--------------\n`echo ${md5_array[$i]}|tr ' ' '\n'`\n"
done

To test the script, make several copies of a file and modify some of the copies, for example:

[root@xuexi ~]# for i in `seq -s' ' 6`;do cp -a /etc/fstab /tmp/fs$i;done
[root@xuexi ~]# echo ha >>/tmp/fs4
[root@xuexi ~]# echo haha >>/tmp/fs5

The /tmp directory now contains six files: fs1, fs2, fs3, fs4, fs5, and fs6. fs4 and fs5 have been modified, and the remaining four files have identical content.

[root@xuexi tmp]# ./md5.sh /tmp/fs[1-6]
the same file with md5: a612cd5d162e4620b442b0ff3474bf98
--------------------------
/tmp/fs6
/tmp/fs3
/tmp/fs2
/tmp/fs1
the same file with md5: 80748c3a55b726226ad51a4bafa1c4aa
--------------------------
/tmp/fs4
the same file with md5: 30dd43dba10521c1e94267bbd117877b
--------------------------
/tmp/fs5

A more general approach: compare files with the same name across multiple directories.

[root@xuexi tmp]# find /tmp -type f -name "fs[0-9]" -print0 | xargs -0 ./md5.sh
the same file with md5: a612cd5d162e4620b442b0ff3474bf98
--------------------------
/tmp/fs6
/tmp/fs3
/tmp/fs2
/tmp/fs1
the same file with md5: 80748c3a55b726226ad51a4bafa1c4aa
--------------------------
/tmp/fs4
the same file with md5: 30dd43dba10521c1e94267bbd117877b
--------------------------
/tmp/fs5
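For a quick check without md5.sh, the same find results can be hashed and grouped with standard tools. The sketch below is an alternative technique, not the author's method. It assumes GNU coreutils, since the -w and --all-repeated options of uniq are GNU extensions, and the directory tree is made up for the demonstration.

```shell
#!/bin/bash
# Made-up demo tree: three directories, each holding a file named fs1.
root=$(mktemp -d)
mkdir "$root/d1" "$root/d2" "$root/d3"
printf 'original\n' > "$root/d1/fs1"
printf 'original\n' > "$root/d2/fs1"
printf 'modified\n' > "$root/d3/fs1"

# Hash every match, sort so that equal hashes become adjacent, then keep only
# lines whose first 32 characters (the MD5 value) occur more than once.
dups=$(find "$root" -type f -name 'fs[0-9]' -exec md5sum {} + |
       sort | uniq -w32 --all-repeated=separate)

echo "$dups"
rm -rf "$root"
```

Only files that share their content with at least one other file are listed, so the modified copy under d3 does not appear.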

Script description:

(1) Each line of md5sum output has the format "MD5  /path/to/file". Because the result must show both each MD5 value and the files that share that value, an associative array is used.

(2) At first, I used a while loop to read the md5sum result of each file from standard input. The statement was as follows:

md5sum "$@" | while read index file;do
    md5_array[$index]="$file ${md5_array[$index]}"
done

However, because of the pipe, the while statement runs in a subshell, so the values assigned to md5_array inside the loop are lost when the loop ends. The loop can therefore be rewritten as:

while read index file;do
    md5_array[$index]="$file ${md5_array[$index]}"
done <<<"$(md5sum "$@")"
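The subshell effect is easy to reproduce with a plain counter. The sketch below uses made-up input and variable names, and it uses process substitution (a bash feature not shown in the original script) as one more way to keep the loop in the current shell. It assumes the default bash behavior, that is, the lastpipe option is not enabled.

```shell
#!/bin/bash
# 1) Pipeline: the while loop runs in a subshell, so the parent never sees
#    the increments.
count_pipe=0
printf 'a\nb\nc\n' | while read -r line; do
    count_pipe=$((count_pipe + 1))
done

# 2) Process substitution: the loop runs in the current shell, so the
#    variable keeps its value after the loop.
count_procsub=0
while read -r line; do
    count_procsub=$((count_procsub + 1))
done < <(printf 'a\nb\nc\n')

echo "after pipeline:             $count_pipe"
echo "after process substitution: $count_procsub"
```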

However, I finally used a more complex for loop:

IFS=$'\n'
for line in `md5sum "$@"`
do
    index=${line%% *}
    file=${line##* }
    md5_array[$index]="$file ${md5_array[$index]}"
done

However, each line of md5sum output has two columns, and a for loop with the default IFS would split them into two separate values. Therefore, IFS is changed to $'\n' so that each whole line is assigned to the loop variable.
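The effect of IFS on word splitting can be seen by counting loop iterations. In this sketch the two sample lines only imitate md5sum output (the hashes and paths are taken from the examples in this article), and the variable names are chosen for illustration.

```shell
#!/bin/bash
# Two lines in md5sum format, used purely as sample data.
sample='80748c3a55b424226ad51a4bafa1c4aa  /etc/fstab
a612cd5d162e4620b442b0ff3474bf98  /tmp/fs1'

# Default IFS (space, tab, newline): every column becomes its own word.
n_default=0
for word in $sample; do
    n_default=$((n_default + 1))
done

# IFS set to newline only: each whole line becomes one word.
old_ifs=$IFS
IFS=$'\n'
n_lines=0
for line in $sample; do
    n_lines=$((n_lines + 1))
done
IFS=$old_ifs

echo "default IFS: $n_default words"
echo "newline IFS: $n_lines words"
```

Saving and restoring IFS, as done here, limits the change to the one loop that needs it.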

(3) The index and file variables hold the two parts of each md5sum line: the MD5 part is used as the array index, and the file part is used as the array value. The array assignment statement is therefore:

md5_array[$index]="$file ${md5_array[$index]}"
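The two parameter expansions can be tried on a single line of md5sum output. The sketch below also shows their limit: ${line##* } keeps only the text after the last space, so a path containing spaces is cut short. The ${spaced:34} alternative is an assumption on my part, not part of the original script; it relies on the default md5sum layout of a 32-character hash followed by two separator characters.

```shell
#!/bin/bash
# Sample line in md5sum format (hash and path taken from the script comment).
line='80748c3a55b424226ad51a4bafa1c4aa  /etc/fstab'

index=${line%% *}   # strip the longest suffix starting at a space: the hash
file=${line##* }    # strip the longest prefix ending at a space: the path

# A made-up path containing a space shows where this approach stops working.
spaced='80748c3a55b424226ad51a4bafa1c4aa  /tmp/my file'
short=${spaced##* } # only the text after the LAST space survives
full=${spaced:34}   # skip 32 hash characters plus 2 separator characters

echo "index=$index"
echo "file=$file"
echo "short=$short"
echo "full=$full"
```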

(4) After the array is populated, traverse it. There are several ways to do this; I iterate over the index list of the array, that is, the MD5 values.

# Traverse the md5_array
for i in ${!md5_array[@]}
do
    echo -e "the same file with md5: $i\n--------------\n`echo ${md5_array[$i]}|tr ' ' '\n'`\n"
done
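As one possible variation on this traversal, the sketch below stores the paths one per line, so names containing spaces stay intact, and prints only the MD5 values shared by two or more files. It is not the original md5.sh: the names groups, group_by_md5 and report_duplicates are hypothetical, the sample files are made up, and paths containing newlines or backslashes are not handled.

```shell
#!/bin/bash
declare -A groups

# group_by_md5 (hypothetical): fill "groups" with hash -> newline-separated paths.
group_by_md5() {
    local hash path
    groups=()
    while read -r hash path; do
        groups[$hash]+="$path"$'\n'
    done < <(md5sum -- "$@")
}

# report_duplicates (hypothetical): print only hashes shared by 2 or more files.
report_duplicates() {
    local hash count
    for hash in "${!groups[@]}"; do
        # Each stored path ends with a newline, so wc -l counts the paths.
        count=$(printf '%s' "${groups[$hash]}" | wc -l)
        if [ "$count" -gt 1 ]; then
            printf 'the same file with md5: %s\n--------------\n%s\n' \
                "$hash" "${groups[$hash]}"
        fi
    done
}

# Made-up sample files: a, b and "c d" are identical, e differs.
tmpdir=$(mktemp -d)
printf 'one\n' > "$tmpdir/a"
printf 'one\n' > "$tmpdir/b"
printf 'one\n' > "$tmpdir/c d"
printf 'two\n' > "$tmpdir/e"

group_by_md5 "$tmpdir"/*
report=$(report_duplicates)
echo "$report"
rm -rf "$tmpdir"
```

Quoting "${!groups[@]}" and "${groups[$hash]}" means the traversal does not depend on the current value of IFS.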

 

