Digest Algorithms (hashlib)
Python's hashlib module provides common digest algorithms such as MD5 and SHA1.
What is a digest algorithm? A digest algorithm, also called a hash algorithm or hashing algorithm, uses a function to convert data of any length into a fixed-length string (usually written as a hexadecimal string).
Suppose you have written an article containing the text 'how to use python hashlib - by Michael', and you attach the article's digest, '2d73d4f15c0db7f5ecb321b6a65e5d6d'.
If someone tampers with your article and publishes it as 'how to use python hashlib - by Bob', you can show that Bob has altered it, because the digest calculated from 'how to use python hashlib - by Bob' differs from the digest of the original article.
In other words, a digest algorithm applies a digest function f() to data of any length and produces a fixed-length digest, which is then used to detect whether the original data has been tampered with.
A digest algorithm can reveal tampering because the digest function is a one-way function: computing f(data) is easy, but working back from digest to data is very difficult. In addition, changing a single bit of the original data produces a completely different digest.
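The tamper check described above can be sketched in a few lines. This is only an illustration: the helper name digest() is made up for the example, and MD5 is assumed as the digest function because it is the first algorithm covered in this article.

```python
import hashlib

def digest(text):
    """Return the MD5 hex digest of a string (encoded as UTF-8)."""
    return hashlib.md5(text.encode('utf-8')).hexdigest()

original = digest('how to use python hashlib - by Michael')
tampered = digest('how to use python hashlib - by Bob')

# Both digests have the same fixed length, but their values differ,
# which is how the change of author is detected.
print(len(original), len(tampered))
print(original == tampered)
```

The comparison prints False: the two texts differ, so their digests differ as well.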
MD5
We take the common digest algorithm MD5 as an example and calculate the MD5 value of a string:
import hashlib
s = 'tz_spider'
m = hashlib.md5()
# update() only accepts bytes, so encode the string first
m.update(s.encode('utf-8'))
print('md5 hash %s' % m.hexdigest())
"""
md5 hash a4499790ea68682695a0a168a8ec1ecc
"""
If the amount of data is large, you can call update() several times, one chunk at a time, and the final result is the same:
import hashlib
md5 = hashlib.md5()
# The chunks are concatenated as-is, so the space after the comma must be kept
md5.update('Life is short, '.encode('utf-8'))
md5.update('I learn Python'.encode('utf-8'))
print(md5.hexdigest())
import hashlib
md5 = hashlib.md5()
md5.update('Life is short, I learn Python'.encode('utf-8'))
print(md5.hexdigest())
"""
d51a987403720a379fa5d20ab8b7741c
d51a987403720a379fa5d20ab8b7741c
"""
SHA1
MD5 is the most common digest algorithm. It is fast and produces a fixed 128-bit (16-byte) digest, usually written as a 32-character hexadecimal string.
Another common digest algorithm is SHA1. Calling SHA1 works exactly like calling MD5:
import hashlib
sha1 = hashlib.sha1()
# The chunks are concatenated as-is, so the space after the comma must be kept
sha1.update('Life is short, '.encode('utf-8'))
sha1.update('I learn Python'.encode('utf-8'))
print(sha1.hexdigest())
import hashlib
sha1 = hashlib.sha1()
sha1.update('Life is short, I learn Python'.encode('utf-8'))
print(sha1.hexdigest())
"""
5723b4cd6bc67f1f3682cab2a382e333518ed23a
5723b4cd6bc67f1f3682cab2a382e333518ed23a
"""
The result of SHA1 is 160 bits (20 bytes), usually written as a 40-character hexadecimal string.
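The digest lengths can be checked directly. A small sketch, assuming only the standard hashlib API (hashlib.new, digest_size, hexdigest); the variable names are illustrative:

```python
import hashlib

data = b'Life is short, I learn Python'
sizes = {}
for name in ('md5', 'sha1'):
    h = hashlib.new(name, data)
    # digest_size is in bytes; each byte becomes two hexadecimal characters
    sizes[name] = (h.digest_size, len(h.hexdigest()))
    print(name, h.digest_size, len(h.hexdigest()))
```

This prints 16 and 32 for MD5, and 20 and 40 for SHA1.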
Applications of Digest Algorithms
Digest algorithms are mainly used for user login: instead of the plaintext password, its MD5 digest is stored in the database. The advantage of storing the MD5 is that even an operator who can access the database cannot learn a user's plaintext password.
import hashlib
def get_md5(s):
    md5 = hashlib.md5()
    md5.update(s.encode('utf-8'))
    return md5.hexdigest()
user_md5_dict = {}
user_dict = {'michael': '123456', 'bob': 'abc'}
for item in user_dict:
    # Store the digest of each password instead of the password itself
    user_md5_dict[item] = get_md5(user_dict.get(item))
print(user_md5_dict)
"""
{'michael': 'e10adc3949ba59abbe56e057f20f883e', 'bob': '900150983cd24fb0d6963f7d28e17f72'}
"""
Is it safe to store passwords as MD5 digests? Not necessarily. Many users like simple passwords such as 123456, 888888, and password, so attackers can calculate the MD5 values of these common passwords in advance and build a reverse lookup table:
'e10adc3949ba59abbe56e057f20f883e': '123456'
'21218cca77804d2ba1922c33e0151105': '888888'
This way nothing needs to be cracked: by comparing the MD5 values in the database against the table, an attacker obtains the accounts of every user with a common password (a lookup-table attack).
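The reverse lookup can be sketched as follows. The list of common passwords and the "leaked" database contents are assumptions made up for the illustration; only the digests of 123456 and abc are taken from the example above.

```python
import hashlib

# Precompute a digest -> password table for a few common passwords
common_passwords = ['123456', '888888', 'password']
lookup = {hashlib.md5(p.encode('utf-8')).hexdigest(): p
          for p in common_passwords}

# Hypothetical leaked table of user -> stored MD5
leaked = {
    'michael': 'e10adc3949ba59abbe56e057f20f883e',
    'bob': '900150983cd24fb0d6963f7d28e17f72',
}

# Any stored digest found in the table reveals the plaintext password
cracked = {user: lookup[d] for user, d in leaked.items() if d in lookup}
print(cracked)  # {'michael': '123456'}
```

Bob's password abc is not in the table, so only michael's account is exposed here.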
Because the MD5 values of common passwords are easy to compute, we must make sure that the stored value is not the MD5 of a common password. This is done by appending a complex string to the original password before hashing, commonly known as "adding salt":
import hashlib
def calc_md5(password, salt='add salt'):
    md5 = hashlib.md5()
    # Hash the password together with the salt
    md5.update((password + salt).encode('utf-8'))
    return md5.hexdigest()
user_md5_dict = {}
user_dict = {'michael': '123456', 'bob': 'abc'}
for item in user_dict:
    user_md5_dict[item] = calc_md5(user_dict.get(item))
print(user_md5_dict)
"""
{'michael': '121e5a2806adb57b7f5ddfb49c58cb38', 'bob': '655cb18b65100d50c375826e4a7138d9'}
"""
With a salted MD5, as long as the salt is not known to attackers, it is difficult to recover the plaintext password from the MD5 even if the user chose a simple password.
But if two users share the same simple password, for example 123456, two identical MD5 values are stored in the database, which reveals that the two users have the same password. Is there a way for users with the same password to store different MD5 values?
If we assume that users cannot change their login name, the login name can be used as part of the salt when calculating the MD5, so that users with the same password store different MD5 values.
import hashlib
def calc_md5(user, password, salt='add salt'):
    md5 = hashlib.md5()
    # The login name becomes part of the salted input
    md5.update((user + password + salt).encode('utf-8'))
    return md5.hexdigest()
user_md5_dict = {}
user_dict = {'michael': '123456', 'bob': '123456'}
for item in user_dict:
    user_md5_dict[item] = calc_md5(item, user_dict.get(item))
print(user_md5_dict)
"""
{'michael': '18833a2efa41021c1659af9eb7ffc0e5', 'bob': 'b2c512421985a2622a64fa84b486dc0b'}
"""
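To show how the stored values would be used at login time, here is a minimal sketch. The names db and login() are hypothetical, and the use of hmac.compare_digest for the comparison is an addition that the original scheme does not mention; a plain == comparison would also illustrate the idea.

```python
import hashlib
import hmac

SALT = 'add salt'

def calc_md5(user, password, salt=SALT):
    """MD5 of login name + password + salt, as in the example above."""
    return hashlib.md5((user + password + salt).encode('utf-8')).hexdigest()

# Hypothetical user table: only salted digests are stored, never plaintext
db = {
    'michael': calc_md5('michael', '123456'),
    'bob': calc_md5('bob', '123456'),
}

def login(user, password):
    """Recompute the digest for the submitted password and compare."""
    stored = db.get(user)
    if stored is None:
        return False
    return hmac.compare_digest(stored, calc_md5(user, password))

print(login('michael', '123456'))  # True
print(login('michael', 'abc'))     # False
print(db['michael'] == db['bob'])  # False: same password, different digests
```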
Get the MD5 of a file
import hashlib
import os
def calc_md5(filename):
    """
    Get the MD5 value of a file.
    :param filename: path of the file
    :return: MD5 hex digest, or None if filename is not a file
    """
    if not os.path.isfile(filename):  # Not a file: return None
        return None
    myhash = hashlib.md5()
    with open(filename, 'rb') as f:
        while True:
            b = f.read(2048)  # Read the file in 2048-byte chunks
            if not b:
                break
            myhash.update(b)
    return myhash.hexdigest()
print(calc_md5('BaiduStockInfo.txt'))
"""
94da595be98b4c65fc1ccf697a435322
"""
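Reading the file in chunks gives the same digest as hashing the whole content at once, which is why this approach is suitable for large files. A small self-check, assuming a temporary file with made-up content; md5_of_file() is an illustrative name, not part of hashlib:

```python
import hashlib
import os
import tempfile

def md5_of_file(path, chunk_size=2048):
    """Hash a file chunk by chunk so it never has to fit in memory."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

data = b'hashlib' * 3000  # 21000 bytes, spans several chunks

# Write the sample data to a temporary file, hash it, then clean up
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name
try:
    chunked = md5_of_file(path)
finally:
    os.remove(path)

one_shot = hashlib.md5(data).hexdigest()
print(chunked == one_shot)  # True
```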
Base64
The base64 module is used for Base64 encoding and decoding. This encoding is very common in e-mail.
It encodes binary data that cannot be displayed as text into text that can be displayed. The encoded text is about 1/3 larger than the original data.
The principle of Base64 is simple. First, prepare an array of 64 characters:
['A', 'B', 'C', ... 'a', 'b', 'c', ... '0', '1', ... '+', '/']
Then process the binary data in groups of 3 bytes, that is 3 x 8 = 24 bits, and split each group into 4 parts of exactly 6 bits each.
This gives 4 numbers to use as indexes. Looking them up in the table yields the 4 corresponding characters, which form the encoded string.
Base64 therefore encodes 3 bytes of binary data into 4 bytes of text, increasing the length by 33%. The advantage is that the encoded text can be displayed directly in an e-mail body, a web page, and so on.
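The table lookup for one 3-byte group can be sketched by hand and compared with the base64 module. ALPHABET and encode_group() are names introduced only for this illustration, and padding is deliberately not handled:

```python
import base64
import string

# The 64-character table: A-Z, a-z, 0-9, '+', '/'
ALPHABET = (string.ascii_uppercase + string.ascii_lowercase
            + string.digits + '+/')

def encode_group(three_bytes):
    """Encode exactly 3 bytes into 4 Base64 characters."""
    assert len(three_bytes) == 3
    n = int.from_bytes(three_bytes, 'big')  # 24 bits as one integer
    # Cut the 24 bits into four 6-bit indexes, most significant first
    indexes = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return ''.join(ALPHABET[i] for i in indexes)

print(encode_group(b'Man'))                # TWFu
print(base64.b64encode(b'Man').decode())   # TWFu
```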
What if the length of the binary data is not a multiple of 3, so that 1 or 2 bytes are left over at the end? Base64 pads the end with \x00 bytes and then appends 1 or 2 = characters to the encoded output to indicate how many bytes were padded. They are removed automatically when decoding.
import base64
s = b'1234567'
s1 = base64.b64encode(s)  # encode
print(s1)
print(base64.b64decode(s1))  # decode
Output
b'MTIzNDU2Nw=='
b'1234567'
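The padding rule can be observed by encoding inputs of 1, 2, and 3 bytes. A short sketch; the sample bytes are arbitrary:

```python
import base64

# 1 leftover byte -> '==', 2 leftover bytes -> '=', 3 bytes -> no padding
results = [base64.b64encode(b'a' * n) for n in (1, 2, 3)]
for r in results:
    print(r)
# b'YQ=='
# b'YWE='
# b'YWFh'

# Decoding strips the padding and restores the original bytes
print(base64.b64decode(b'YQ=='))  # b'a'
```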
Because standard Base64 output may contain the characters + and /, which cannot be used directly as URL parameters, there is also a "URL-safe" Base64 encoding, which simply replaces + and / with - and _ respectively:
import base64
s = b'i\xb7\x1d\xfb\xef\xff'
s1 = base64.b64encode(s)  # standard encoding
print(s1)
s2 = base64.urlsafe_b64encode(s)  # URL-safe encoding
print(s2)
print(base64.b64decode(s1))  # decode
print(base64.urlsafe_b64decode(s2))  # decode
Output
b'abcd++//'
b'abcd--__'
b'i\xb7\x1d\xfb\xef\xff'
b'i\xb7\x1d\xfb\xef\xff'
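Since the URL-safe variant differs from the standard one only in two characters, it can be reproduced with a simple character substitution. This sketch illustrates the relationship and is not meant as a replacement for base64.urlsafe_b64encode:

```python
import base64

s = b'i\xb7\x1d\xfb\xef\xff'

# Map '+' to '-' and '/' to '_' in the standard encoding
table = bytes.maketrans(b'+/', b'-_')
manual = base64.b64encode(s).translate(table)

print(manual)                                  # b'abcd--__'
print(manual == base64.urlsafe_b64encode(s))   # True
```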
Reference https://www.liaoxuefeng.com/
Python module--hashlib and base64