Tutorial on digest algorithms using the hashlib module in Python

Source: Internet
Author: User
This article describes how to use the hashlib module to work with digest algorithms in Python. The code is based on Python 2.x. Python's hashlib provides common digest algorithms such as MD5 and SHA1.

What is a digest algorithm? Digest algorithms are also called hash algorithms. A digest algorithm uses a function to convert data of any length into a fixed-length string (usually written as a hexadecimal string).

For example, suppose you wrote an article called 'how to use python hashlib - by Michael', and the digest of this article is '2d73d4f15c0db7f5ecb321b6a65e5d6d'. If someone tampers with your article and publishes it as 'how to use python hashlib - by Bob', you can immediately show that Bob has altered your article, because the digest calculated from 'how to use python hashlib - by Bob' is different from the digest of the original article.

In other words, a digest algorithm uses a digest function f() to calculate a fixed-length digest for data of any length, in order to detect whether the original data has been tampered with.

Digest algorithms can reveal whether data has been tampered with because the digest function is a one-way function: computing f(data) is easy, but working backwards from a digest to the data is very difficult. Moreover, changing even a single bit of the original data produces a completely different digest.
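
The two properties described above (fixed-length output, and a different digest after any change) can be checked with a small sketch. The helper name md5_hex is an assumption made for illustration, and the two titles stand in for the full article text. The code is written so that it also runs on Python 3, where hashlib accepts only bytes, hence the encode() call.

```python
import hashlib

def md5_hex(text):
    # hashlib works on bytes, so encode the text first
    return hashlib.md5(text.encode('utf-8')).hexdigest()

original = md5_hex('how to use python hashlib - by Michael')
tampered = md5_hex('how to use python hashlib - by Bob')
short = md5_hex('a')

print(original)
print(tampered)
# Each digest is 32 hex characters, however long the input was
print(len(original))
print(len(short))
```

The two printed digests differ even though only the author name changed, and both lengths are 32.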

Take the common digest algorithm MD5 as an example to calculate the MD5 value of a string:

    import hashlib

    md5 = hashlib.md5()
    md5.update('how to use md5 in python hashlib?')
    print md5.hexdigest()

The calculation result is as follows:

d26a53750bc40b38b65a520292f69306

If the amount of data is large, you can feed it in pieces by calling update() several times. The result is the same:

    md5 = hashlib.md5()
    md5.update('how to use md5 in ')
    md5.update('python hashlib?')
    print md5.hexdigest()

Try changing a single letter and check whether the result is completely different.
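
Repeated update() calls are typically used for data that is read in pieces, such as a large file. The following is a minimal sketch under that assumption: the function name md5_of_stream and the chunk sizes are illustrative choices, and io.BytesIO stands in for a file opened in binary mode.

```python
import hashlib
import io

def md5_of_stream(stream, chunk_size=8192):
    # Feed the stream to MD5 one chunk at a time to keep memory use small
    md5 = hashlib.md5()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        md5.update(chunk)
    return md5.hexdigest()

data = b'how to use md5 in python hashlib?' * 1000
# io.BytesIO stands in for a file opened with open(path, 'rb')
chunked = md5_of_stream(io.BytesIO(data), chunk_size=64)
one_shot = hashlib.md5(data).hexdigest()
print(chunked == one_shot)
```

The comparison prints True: the digest depends only on the bytes fed in, not on how they were split across update() calls.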

MD5 is the most common digest algorithm and is very fast. It produces a fixed 128-bit (16-byte) result, usually written as a 32-character hexadecimal string.

Another common digest algorithm is SHA1. Calling SHA1 is similar to calling MD5:

    import hashlib

    sha1 = hashlib.sha1()
    sha1.update('how to use sha1 in ')
    sha1.update('python hashlib?')
    print sha1.hexdigest()

The result of SHA1 is 160 bits, usually written as a 40-character hexadecimal string.

SHA256 and SHA512 are more secure than SHA1, but the more secure algorithms also produce longer digests.
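
The digest lengths mentioned above can be read from hashlib directly. This sketch uses hashlib.new() to select each algorithm by name and the digest_size attribute (in bytes); the sample input and the sizes dictionary are arbitrary illustrative choices.

```python
import hashlib

data = b'how to use python hashlib?'
sizes = {}
for name in ('md5', 'sha1', 'sha256', 'sha512'):
    h = hashlib.new(name, data)
    # digest_size is in bytes; hexdigest() uses two characters per byte
    sizes[name] = (h.digest_size * 8, len(h.hexdigest()))
    print('%-6s %3d bits, %3d hex characters' % ((name,) + sizes[name]))
```

This reports 128, 160, 256 and 512 bits respectively, that is, 32, 40, 64 and 128 hexadecimal characters.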

Is it possible for two different pieces of data to produce the same digest? Yes, because every digest algorithm maps an infinite set of possible inputs onto a finite set of digests. This is called a collision. For example, Bob could try to craft an article 'how to learn hashlib in python - by Bob' whose digest is exactly the same as the digest of your article. This is not impossible, but it is very difficult.
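
Finding a collision for a full MD5 digest is out of reach for a short demo, but the "infinite inputs, finite outputs" argument can be illustrated by deliberately shrinking the digest. The sketch below keeps only the first 8 bits of each MD5 digest, so only 256 values are possible and a collision must appear within the first 257 messages. The truncation, the function names and the 'message-N' inputs are assumptions made purely for illustration.

```python
import hashlib

def tiny_digest(text):
    # Keep only the first 2 hex characters (8 bits) of the MD5 digest
    return hashlib.md5(text.encode('utf-8')).hexdigest()[:2]

def find_collision():
    seen = {}  # tiny digest -> first message that produced it
    i = 0
    while True:
        text = 'message-%d' % i
        d = tiny_digest(text)
        if d in seen:
            # Two different messages share the same tiny digest
            return seen[d], text, d
        seen[d] = text
        i += 1

first, second, shared = find_collision()
print(first)
print(second)
print(shared)
```

With the full 128-bit digest the same search would need an impractically large number of attempts, which is why the text calls a collision possible but very difficult.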

Applications of digest algorithms

Where can a digest algorithm be applied? Here is a common example:

Any website that lets users log in must store the user name and password used for login. How should they be stored? The straightforward method is to save them in a database table:

    name    | password
    --------+----------
    michael | 123456
    bob     | abc999
    alice   | alice2008

If passwords are stored in plain text and the database leaks, the passwords of all users fall into the hands of hackers. In addition, the website's operations staff can access the database, which means they can read every user's password.

The correct approach is not to store the user's plaintext password, but to store a digest of the password, such as its MD5 value:

    username | password
    ---------+---------------------------------
    michael  | e10adc3949ba59abbe56e057f20f883e
    bob      | 878ef96e86145580c38c87f0410ad153
    alice    | 99b1c2188db85afee403b1536010c2c9

When a user logs in, the MD5 value of the plaintext password they entered is calculated first and then compared with the MD5 value stored in the database. If the two values are the same, the password was entered correctly. If they differ, the password is definitely incorrect.

Exercise: compute the MD5 value that is stored in the database from the password entered by the user:

    def calc_md5(password):
        pass
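
One possible solution to this exercise is sketched below. It assumes the password arrives as text and encodes it as UTF-8 before hashing, which also lets the code run on Python 3.

```python
import hashlib

def calc_md5(password):
    # Encode to bytes, then return the 32-character hex digest
    return hashlib.md5(password.encode('utf-8')).hexdigest()

print(calc_md5('123456'))
```

For '123456' this prints e10adc3949ba59abbe56e057f20f883e, the value stored for michael in the table above.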

The advantage of storing MD5 values is that even operations staff who can access the database cannot obtain users' plaintext passwords.

Exercise: design a function that verifies a user login, returning True or False depending on the password the user entered:

    db = {
        'michael': 'e10adc3949ba59abbe56e057f20f883e',
        'bob': '878ef96e86145580c38c87f0410ad153',
        'alice': '99b1c2188db85afee403b1536010c2c9'
    }

    def login(user, password):
        pass
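
A minimal sketch of one way to complete this exercise follows. It reuses the db contents from the text and a calc_md5 helper as in the previous exercise. Treating an unknown user name as a failed login is an assumption, since the text does not say how that case should be handled.

```python
import hashlib

db = {
    'michael': 'e10adc3949ba59abbe56e057f20f883e',
    'bob': '878ef96e86145580c38c87f0410ad153',
    'alice': '99b1c2188db85afee403b1536010c2c9'
}

def calc_md5(password):
    return hashlib.md5(password.encode('utf-8')).hexdigest()

def login(user, password):
    # Unknown users fail; known users must match the stored digest
    stored = db.get(user)
    return stored is not None and stored == calc_md5(password)

print(login('michael', '123456'))
print(login('michael', 'wrong-password'))
```

The first call prints True and the second prints False.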

Is it safe to store passwords as MD5 values? Not necessarily. Suppose you are a hacker who has obtained a database of MD5 password values. How would you recover the users' plaintext passwords from them? Brute-force cracking is laborious, and a real hacker would not bother with it.

Consider that many users prefer simple passwords such as 123456, 888888 and password. A hacker can therefore compute the MD5 values of these common passwords in advance to build a reverse lookup table:

    'e10adc3949ba59abbe56e057f20f883e': '123456'
    '21218cca77804d2ba1922c33e0151105': '888888'
    '5f4dcc3b5aa765d61d8327deb882cf99': 'password'

With this table there is no need to crack anything: the hacker only has to compare it against the MD5 values in the database to find the accounts that use common passwords.
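
The reverse-table lookup described above can be sketched in a few lines. The names md5_hex, common_passwords, reverse_table, leaked and cracked are illustrative, and the leaked table reuses the values from the example database.

```python
import hashlib

def md5_hex(text):
    return hashlib.md5(text.encode('utf-8')).hexdigest()

common_passwords = ['123456', '888888', 'password']
# Precompute digest -> password once; the table works against any leak
reverse_table = dict((md5_hex(p), p) for p in common_passwords)

leaked = {
    'michael': 'e10adc3949ba59abbe56e057f20f883e',
    'bob': '878ef96e86145580c38c87f0410ad153',
    'alice': '99b1c2188db85afee403b1536010c2c9'
}

# A plain dictionary lookup recovers every account with a common password
cracked = dict((user, reverse_table[digest])
               for user, digest in leaked.items()
               if digest in reverse_table)
print(cracked)
```

Here michael's password 123456 is recovered by a single lookup, with no cracking effort at all.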

Users should not choose simple passwords. But can we also strengthen the protection of simple passwords in the program itself?

Because the MD5 values of common passwords are easy to precompute, we must make sure that the stored values are not the MD5 values of common passwords. This is done by appending a complex string to the original password before hashing, commonly known as "adding salt":

    def calc_md5(password):
        return get_md5(password + 'the-Salt')
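
The snippet above calls get_md5 without defining it. A plausible definition, assumed here to be a thin wrapper around hashlib.md5 that returns the hex digest, makes the salted version runnable. The SALT constant simply names the 'the-Salt' string from the text.

```python
import hashlib

SALT = 'the-Salt'  # the fixed salt string used in the text

def get_md5(text):
    # Assumed helper: hex MD5 digest of a text string
    return hashlib.md5(text.encode('utf-8')).hexdigest()

def calc_md5(password):
    # Hash the password with the salt appended
    return get_md5(password + SALT)

print(calc_md5('123456'))
print(calc_md5('123456') == get_md5('123456'))
```

The second line prints False: the salted value no longer matches the precomputed MD5 of 123456, so a reverse table built from unsalted common passwords does not find it.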

As long as the salt is not known to hackers, it is hard to recover the plaintext password from a salted MD5 value, even if the user chose a simple password.

However, if two users choose the same simple password, such as 123456, two identical MD5 values are stored in the database, which reveals that the two users have the same password. Is there a way for users with the same password to store different MD5 values?

Assuming that the login name cannot be changed, we can use the login name as part of the salt when computing the MD5 value, so that users with the same password still store different MD5 values.

Exercise: simulate user registration, computing a safer MD5 value from the user's login name and password:

    db = {}

    def register(username, password):
        db[username] = get_md5(password + username + 'the-Salt')

Then verify the user login using the same salted MD5 calculation:

    def login(username, password):
        pass
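
One way to complete the registration and login pair is sketched below. The get_md5 helper is an assumed definition (the text uses it without showing one), and rejecting unknown user names is likewise an assumption. The key point is that login must repeat exactly the same password + username + salt calculation that register used.

```python
import hashlib

db = {}

def get_md5(text):
    # Assumed helper: hex MD5 digest of a text string
    return hashlib.md5(text.encode('utf-8')).hexdigest()

def register(username, password):
    db[username] = get_md5(password + username + 'the-Salt')

def login(username, password):
    # Recompute the salted digest and compare it with the stored one
    stored = db.get(username)
    return stored is not None and stored == get_md5(password + username + 'the-Salt')

register('michael', '123456')
register('bob', '123456')
print(login('michael', '123456'))
print(login('michael', 'abc999'))
print(db['michael'] != db['bob'])
```

This prints True, False and True: the correct password is accepted, a wrong one is rejected, and the two users store different values even though they chose the same password.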

Summary

Digest algorithms are used in many places. Note that a digest algorithm is not an encryption algorithm and cannot be used for encryption, because the plaintext cannot be recovered from the digest. It can only be used to detect tampering. However, its one-way nature means that a user's password can be verified without storing the plaintext password.
