Computing the MD5 value of a file in Python

Source: Internet
Author: User
Tags: md5, encryption

Preface

I have recently been developing a Python program that merges folders (directories). My original idea was to compare modification times: to determine whether a file has changed, compare the modification times of the two copies. This approach tested without problems on a Windows PC.
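The time-based idea can be sketched roughly as follows. This is a minimal illustration, not the author's actual program; the helper name is_modified is hypothetical.

```python
import os


def is_modified(src_path, dst_path):
    """Hypothetical helper: treat the files as different if their mtimes differ."""
    # os.path.getmtime returns the modification time as seconds since the epoch
    return os.path.getmtime(src_path) != os.path.getmtime(dst_path)
```

A strict equality test like this only works when both copies report exactly the same timestamp.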

However, copying a file from the PC to a USB flash drive exposed a problem: the modification time of the copy on the USB drive is two seconds off from that of the file on the PC!

The copy function I am using is shutil.copy2(), which in theory copies the modification time and last access time along with the file, but in practice the times did not match exactly.

I described the details in a question on SegmentFault: Why is it 2 seconds slower to copy a file from a PC to a USB stick?

After reading the shutil.copy2 documentation and solutions online, I found that the cause may be a difference between file systems: the Windows PC uses NTFS, while the USB flash drive uses FAT32. I would still like to understand the deeper reason and hope an expert can answer.
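A small sketch for measuring the gap and for comparing times with a tolerance. FAT file systems store modification times with 2-second granularity, which is a likely explanation for the difference, though the source does not confirm it. The helper names mtime_gap and same_mtime and the 2-second default are assumptions made for illustration.

```python
import os
import shutil


def mtime_gap(src_path, dst_path):
    """Copy src to dst with metadata, then return dst mtime minus src mtime in seconds."""
    shutil.copy2(src_path, dst_path)  # copies file data plus timestamps where possible
    return os.path.getmtime(dst_path) - os.path.getmtime(src_path)


def same_mtime(path_a, path_b, tolerance=2.0):
    """Treat two files as unchanged if their mtimes differ by at most `tolerance` seconds."""
    # The 2-second default is an assumption based on FAT timestamp granularity
    return abs(os.path.getmtime(path_a) - os.path.getmtime(path_b)) <= tolerance
```

On a single file system the gap is normally close to zero; a tolerance only hides the rounding and can still miss edits made within that window.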

But I digress. The problem above forced me to change the comparison strategy, and I found a method that compares files by their MD5 values, yeah! ~

Specifically, every file has an MD5 hash value that in practice identifies the file's contents (the instant-upload feature of Baidu Cloud works on this principle). It is commonly used to check the integrity of files: for example, the long, seemingly random string shown next to a downloadable system installation image is the image's MD5 value.

The MD5 value changes after the file has been modified, so it can be used to determine whether a file has been modified.
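As a quick illustration of this property, the sketch below hashes two byte strings that differ by a single appended byte. The sample data is made up for the example.

```python
import hashlib

original = b"hello"
modified = b"hello!"  # one byte appended

digest_before = hashlib.md5(original).hexdigest()
digest_after = hashlib.md5(modified).hexdigest()

print(digest_before)  # 5d41402abc4b2a76b9719d911017c592
print(digest_after)
print(digest_before != digest_after)  # True
```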

This method is clearly more reliable than time-based comparison; the only remaining concern is computation time. How long does it take to calculate the MD5 value of a large file?

There is plenty of code for this online, mostly using one of two methods. For small files, read the whole file and call the hash function on it directly. For large files, read the file in blocks, update the MD5 object with each block, and take the final value at the end.
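The two methods can be sketched side by side. The function names md5_whole and md5_chunked and the 8192-byte default block size are illustrative choices, not taken from the source; both functions should return the same digest for the same file.

```python
import hashlib


def md5_whole(path):
    """Small-file method: read everything into memory and hash it in one call."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def md5_chunked(path, chunk_size=8192):
    """Large-file method: feed the hash object one block at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The chunked version keeps memory use bounded by the block size, which is why it suits files of several gigabytes.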

Here I use the second method to hash a large file as a test.

Test 1

The test object is a compressed file of nearly 2 GB.

Hashing it took about 20 s, which is not slow relative to the file size. I can accept this speed ...

The calculated MD5 value is "8ee04176f69c10ca56f2358d51d792ed". Is this value right? I verified it with an online tool, and the two values are equal, so the calculation is correct. See below for the test code.

Verification tool: http://www.atool.org/file_hash.php

Interestingly, when I hashed the file a second, third, and fourth time, each run took less than 5 seconds. Like magic; I do not know why.
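One plausible explanation, not confirmed by the author, is that the operating system keeps recently read file data in its cache, so later runs read from memory rather than from disk. A rough way to observe the effect is to time several runs in a row; the helper timed_md5 and its 1 MiB block size are assumptions for this sketch.

```python
import hashlib
import time


def timed_md5(path, chunk_size=1024 * 1024):
    """Return (hex digest, elapsed seconds) for one pass over the file."""
    start = time.perf_counter()
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest(), time.perf_counter() - start
```

Calling timed_md5 on the same large file three or four times and printing the elapsed values shows whether the first run is the slow one on a given machine.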

That's it. With MD5 hashing working, I can proceed to the next step of development ~:)

MD5 code for large files

```python
import datetime
import hashlib
import os


def GetFileMd5(filename):
    # Return None if the path is not a regular file
    if not os.path.isfile(filename):
        return None
    myhash = hashlib.md5()
    # Open in binary mode and read in blocks so the whole file never sits in memory
    with open(filename, 'rb') as f:
        while True:
            b = f.read(8096)
            if not b:
                break
            myhash.update(b)
    return myhash.hexdigest()


filepath = input('Please enter the file path: ')

# Output the file's MD5 value and record the run time
starttime = datetime.datetime.now()
print(GetFileMd5(filepath))
endtime = datetime.datetime.now()
print('Running time: %ds' % (endtime - starttime).seconds)
```

