I haven't written an article in a long time. I have had a couple of days off, so I would like to take this opportunity to summarize some of my recent experience with Python.
In my experience, file verification comes up most often when downloading files. The simplest way to do it on Linux is:
    $ md5sum filename
    21c7ee192e64569ce43cfb869bdb2755  filename
Of course, Python has modules that do the same thing. Before Python 2.5 you could use the md5 module; from Python 2.5 onward, hashlib is the recommended replacement. The simplest implementation is as follows:
    #!/usr/bin/env python
    # coding: utf-8

    import sys
    import hashlib

    def md5sum(filename):
        # Open in binary mode so the bytes are hashed exactly as stored.
        file_object = open(filename, 'rb')
        file_content = file_object.read()
        file_object.close()
        # Hash the file content, not the file name.
        file_md5 = hashlib.md5(file_content)
        return file_md5

    if __name__ == "__main__":
        file_md5 = md5sum(sys.argv[1])
        # hexdigest() gives the same hex string that md5sum prints.
        print(file_md5.hexdigest())
Zhu Feng thinks two points are worth noting:
First, the argument passed to hashlib.md5() should be file_object.read(), so that the checksum is computed over the file content. At first, Zhu Feng skipped the read() call and passed filename instead, which hashes the file name itself and gives the wrong checksum.
Second, hashlib.md5() returns a hash object. To get the same output as md5sum on Linux, you must call its hexdigest() method.
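Here is a minimal sketch of the two points above. It is written for Python 3, which is an assumption (the script above targets Python 2). The temporary file and its content are hypothetical, created only so the example runs on its own.

```python
import hashlib
import os
import tempfile

# Hypothetical sample file, created only for this illustration.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world\n")
    filename = tmp.name

try:
    # Pitfall 1: this hashes the characters of the path, not the file's bytes.
    name_digest = hashlib.md5(filename.encode("utf-8")).hexdigest()

    # Correct: hash what read() returns, i.e. the file content.
    with open(filename, "rb") as file_object:
        file_md5 = hashlib.md5(file_object.read())
finally:
    os.remove(filename)

# Pitfall 2: file_md5 is a hash object; hexdigest() gives the printable string.
content_digest = file_md5.hexdigest()

print(name_digest == content_digest)  # the two digests differ
print(content_digest)                 # 32 lowercase hex characters
```

Only content_digest matches what md5sum would print for the same file; name_digest changes whenever the file is renamed or moved.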
Of course, the code above is not fully thought through. When you verify a large file, the whole file is read into memory at once, which hurts performance. Personally, I recommend the following code from http://ryan-liu.iteye.com/blog/1530029.pdf:
    #!/usr/bin/env python
    # coding: utf-8
    # Python 2 code: it relies on unicode, basestring and file.
    import hashlib
    import os

    def md5hex(word):
        """Return the MD5 hash of word as a 32-character lowercase
        hexadecimal string.
        """
        if isinstance(word, unicode):
            word = word.encode("utf-8")
        elif not isinstance(word, str):
            word = str(word)
        m = hashlib.md5()
        m.update(word)
        return m.hexdigest()

    def md5sum(fname):
        """Calculate the MD5 value of a file.
        """
        def read_chunks(fh):
            fh.seek(0)
            chunk = fh.read(8096)
            while chunk:
                yield chunk
                chunk = fh.read(8096)
            else:  # Put the cursor back at the beginning of the file.
                fh.seek(0)
        m = hashlib.md5()
        if isinstance(fname, basestring) \
                and os.path.exists(fname):
            with open(fname, "rb") as fh:
                for chunk in read_chunks(fh):
                    m.update(chunk)
        # Uploaded file cache or opened file stream
        elif fname.__class__.__name__ in ["StringIO", "StringO"] \
                or isinstance(fname, file):
            for chunk in read_chunks(fname):
                m.update(chunk)
        else:
            return ""
        return m.hexdigest()
This code is more robust: it reads about 8 KB (8096 bytes) of content at a time and calls update() on each chunk to update the MD5, so the whole file never has to sit in memory.
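For readers on Python 3, the same chunked idea can be sketched as below. This is not the code from the linked post. The name md5sum_stream and the chunk_size parameter are hypothetical, and the default of 8192 is my own choice (the code above uses 8096). An in-memory stream stands in for a large file so the example runs on its own.

```python
import hashlib
import io


def md5sum_stream(stream, chunk_size=8192):
    """Return the MD5 hex digest of a binary stream, read chunk by chunk."""
    m = hashlib.md5()
    # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        m.update(chunk)
    return m.hexdigest()


# In-memory stand-in for a large file: 1 MiB of repeated bytes.
data = b"0123456789abcdef" * 65536
chunked = md5sum_stream(io.BytesIO(data))
one_shot = hashlib.md5(data).hexdigest()

# Repeated update() calls are equivalent to one call on the concatenated data.
print(chunked == one_shot)  # True
```

For a real file, the call would look like `with open(path, "rb") as fh: md5sum_stream(fh)`. Only one chunk is held in memory at a time.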
PS: Why 8 KB? It has to do with the I/O block size. Here is an article on the subject for anyone interested: http://blog.sina.com.cn/s/blog_6200c1440100vt4z.html