From: http://www.cnblogs.com/PandaBamboo/archive/2013/05/10/3071233.html
I haven't written a post for a long time. With a couple of days off, I'd like to take the opportunity to summarize some of my recent experience using Python.
In my experience, checksum verification is often needed when downloading files. On Linux, the simplest way to do it is:
    $ md5sum filename
    21c7ee192e64569ce43cfb869bdb2755  filename
Of course, Python has modules that do the same thing. Before Python 2.5 the md5 module was used; from Python 2.5 onward, hashlib is recommended in its place. The simplest implementation is as follows:
    #!/usr/bin/env python
    # coding: utf-8

    import sys
    import hashlib

    def md5sum(filename):
        # Open in binary mode so the bytes are hashed exactly as stored
        file_object = open(filename, 'rb')
        file_content = file_object.read()
        file_object.close()
        # Hash the file content, not the file name
        file_md5 = hashlib.md5(file_content)
        return file_md5

    if __name__ == "__main__":
        file_md5 = md5sum(sys.argv[1])
        # hexdigest() gives the same hex string that md5sum prints
        print(file_md5.hexdigest())
Zhu Feng thinks there are two points worth noting:
First, the argument passed to hashlib.md5() should be file_object.read(), so that the MD5 checksum is computed over the file's content. At first, Zhu Feng did not call read() and passed filename instead, which computes the MD5 of the file name and therefore gives the wrong checksum.
Second, hashlib.md5() returns a hash object. To get the same output as md5sum on Linux, you must call its hexdigest() method.
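The two points above can be seen side by side in a short sketch. It is written for Python 3 (where a string must be encoded to bytes before hashing), and the function names md5_of_name and md5_of_content are made up for illustration; they are not from the original code.

```python
import hashlib
import os
import tempfile


def md5_of_name(filename):
    # The mistake described above: this hashes the characters of the
    # file name, not the bytes stored in the file.
    return hashlib.md5(filename.encode("utf-8")).hexdigest()


def md5_of_content(filename):
    # Correct: read the file in binary mode and hash its content.
    with open(filename, "rb") as file_object:
        return hashlib.md5(file_object.read()).hexdigest()


if __name__ == "__main__":
    # Write a small throwaway file so the example is self-contained.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"hello\n")
    try:
        print(md5_of_name(path))        # hash of the path string
        print(md5_of_content(path))     # hash of the file's bytes
        print(hashlib.md5(b"hello\n"))  # a hash object, not a hex string
    finally:
        os.remove(path)
```

Only the second value should agree with what md5sum reports for the same file; the third line prints the object's repr, which is why hexdigest() is needed.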
Of course, the code above doesn't cover every case. When verifying a large file, it reads the entire file into memory at once, which hurts performance. After comparing a few options myself, I recommend the following code from http://ryan-liu.iteye.com/blog/1530029.pdf:
    #!/usr/bin/env python
    # coding: utf-8
    # Python 2 code: unicode, basestring and file do not exist in Python 3
    import hashlib
    import os

    def md5hex(word):
        """MD5 hash of a string; returns the 32-character lowercase
        hexadecimal digest
        """
        if isinstance(word, unicode):
            word = word.encode("utf-8")
        elif not isinstance(word, str):
            word = str(word)
        m = hashlib.md5()
        m.update(word)
        return m.hexdigest()

    def md5sum(fname):
        """Calculate the MD5 value of the file
        """
        def read_chunks(fh):
            fh.seek(0)
            chunk = fh.read(8096)
            while chunk:
                yield chunk
                chunk = fh.read(8096)
            else:  # Put the cursor back at the beginning of the file
                fh.seek(0)
        m = hashlib.md5()
        if isinstance(fname, basestring) \
                and os.path.exists(fname):
            with open(fname, "rb") as fh:
                for chunk in read_chunks(fh):
                    m.update(chunk)
        # uploaded file cache or opened file stream
        elif fname.__class__.__name__ in ["StringIO", "StringO"] \
                or isinstance(fname, file):
            for chunk in read_chunks(fname):
                m.update(chunk)
        else:
            return ""
        return m.hexdigest()
The strength of this code is that it reads about 8 KB (8096 bytes) at a time and calls update() on each chunk, so the MD5 is built up incrementally and the whole file never has to sit in memory.
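For readers on Python 3, the same chunked idea can be sketched as below. This is an adaptation, not the code from the linked post: the name md5sum_chunked and the chunk_size parameter are assumptions for illustration, and unlike the original it does not seek back to the start of the stream.

```python
import hashlib
import io


def md5sum_chunked(source, chunk_size=8096):
    """Return the hex MD5 of a file path (str) or a binary file-like object."""
    m = hashlib.md5()

    def feed(fh):
        # iter() with a sentinel keeps calling fh.read() until it
        # returns b"", which signals end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            m.update(chunk)

    if isinstance(source, str):
        with open(source, "rb") as fh:
            feed(fh)
    else:
        feed(source)
    return m.hexdigest()


if __name__ == "__main__":
    data = b"x" * 100000
    # Feeding the data chunk by chunk gives the same digest as hashing
    # it in one call.
    print(md5sum_chunked(io.BytesIO(data)) == hashlib.md5(data).hexdigest())
```

Because update() simply appends to the data already hashed, the chunk size affects only memory use and speed, never the resulting digest.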
PS: Why 8 KB? It has to do with the I/O block size. For those interested, here is an article that explains it: http://blog.sina.com.cn/s/blog_6200c1440100vt4z.html
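The linked article is not reproduced here. As a rough illustration of what "I/O size" can mean from inside Python 3, the sketch below looks up two values a program might use as a chunk size: the block size the operating system reports for a path (st_blksize, which is not available on every platform) and Python's own default buffer size. The helper name preferred_chunk_size is hypothetical.

```python
import io
import os


def preferred_chunk_size(path):
    # st_blksize is the filesystem's preferred I/O block size; it is
    # missing on some platforms (for example Windows), so fall back to
    # Python's default buffer size when it is absent or zero.
    block_size = getattr(os.stat(path), "st_blksize", 0)
    return block_size if block_size > 0 else io.DEFAULT_BUFFER_SIZE


if __name__ == "__main__":
    print(io.DEFAULT_BUFFER_SIZE)     # Python's default buffer size
    print(preferred_chunk_size("."))  # block size reported for this directory
```

The numbers vary by system and Python version, so treat them as a starting point for experiments rather than a fixed rule.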