Tutorial: using digest algorithms in Python with the hashlib module

Python's hashlib module provides common digest algorithms such as MD5 and SHA1.

What is a digest algorithm? A digest algorithm is also called a hash algorithm or hashing algorithm. It uses a function to convert data of any length into a fixed-length data string (usually represented as a hexadecimal string).

For example, suppose you wrote an article, the string 'how-to-use Python hashlib-by Michael', and published its digest, '2d73d4f15c0db7f5ecb321b6a65e5d6d', along with it. If someone tampers with your article and publishes it as 'how-to-use Python hashlib-by bob', you can immediately point out that Bob tampered with it, because the digest computed from 'how-to-use Python hashlib-by bob' differs from the digest of the original article.

In other words, a digest algorithm applies a digest function f() to data of arbitrary length to compute a fixed-length digest, so that you can find out whether the original data has been tampered with.
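
The tamper check described above can be sketched as follows. This is a minimal illustration, assuming MD5 as the digest function and UTF-8 as the text encoding; the helper names digest_of and is_untampered are hypothetical and not part of hashlib.

```python
import hashlib


def digest_of(text):
    """Return the MD5 hex digest of a text string (UTF-8 encoded)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def is_untampered(text, published_digest):
    """Recompute the digest and compare it with the published one."""
    return digest_of(text) == published_digest


original = "how-to-use Python hashlib-by Michael"
tampered = "how-to-use Python hashlib-by bob"

# The author publishes this digest together with the article.
published = digest_of(original)

print(is_untampered(original, published))   # True
print(is_untampered(tampered, published))   # False
```

Anyone who holds the published digest can run the same check without needing a second copy of the original article.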

A digest algorithm can reveal whether data has been tampered with because the digest function is a one-way function: computing f(data) is easy, but recovering data from its digest is very difficult. In addition, changing even a single bit of the original data produces a completely different digest.
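
To see the effect of a one-bit change, the sketch below flips a single bit of the input and counts how many bits of the MD5 digest differ. The helper bit_difference and the sample text are assumptions made for illustration only.

```python
import hashlib


def bit_difference(a, b):
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


data = b"How to use MD5 in Python hashlib?"

# Copy the data and flip the lowest bit of its first byte.
flipped = bytearray(data)
flipped[0] ^= 0x01
flipped = bytes(flipped)

d1 = hashlib.md5(data).digest()
d2 = hashlib.md5(flipped).digest()

print("input bits changed: ", bit_difference(data, flipped))
print("digest bits changed:", bit_difference(d1, d2), "of", len(d1) * 8)
```

The input differs in exactly one bit, while the two digests typically differ in a large share of their 128 bits.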

Taking the common digest algorithm MD5 as an example, we compute the MD5 value of a string:

    import hashlib

    md5 = hashlib.md5()
    # update() expects bytes, so encode the string first
    md5.update('How to use MD5 in Python hashlib?'.encode('utf-8'))
    print(md5.hexdigest())

Running this prints the digest as a 32-character hexadecimal string.

If you have a large amount of data, you can call update() multiple times, one chunk at a time, and the result is the same:

    md5 = hashlib.md5()
    md5.update('How to use MD5 in '.encode('utf-8'))
    md5.update('Python hashlib?'.encode('utf-8'))
    print(md5.hexdigest())
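
The chunked approach can be generalized to any sequence of byte chunks, for example blocks read from a large file. The following is a small sketch; the function name md5_in_chunks and the chunk size of 4096 bytes are arbitrary choices for illustration.

```python
import hashlib


def md5_in_chunks(chunks):
    """Feed an iterable of bytes objects to MD5 one chunk at a time."""
    md5 = hashlib.md5()
    for chunk in chunks:
        md5.update(chunk)
    return md5.hexdigest()


data = b"How to use MD5 in Python hashlib?" * 1000
chunk_size = 4096  # arbitrary size chosen for this example
chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

# Hashing chunk by chunk gives the same digest as hashing all data at once.
print(md5_in_chunks(chunks) == hashlib.md5(data).hexdigest())  # True
```

Because only one chunk has to be in memory at a time, the same loop works for data that is too large to load in one piece.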

Try changing one letter and check whether the result is completely different.

MD5 is the most common digest algorithm. It is fast and produces a fixed 128-bit (16-byte) digest, usually represented as a 32-character hexadecimal string.

Another common digest algorithm is SHA1. Calling SHA1 works exactly like calling MD5:

    import hashlib

    sha1 = hashlib.sha1()
    # update() expects bytes, so encode each string first
    sha1.update('How to use SHA1 in '.encode('utf-8'))
    sha1.update('Python hashlib?'.encode('utf-8'))
    print(sha1.hexdigest())

SHA1 produces a 160-bit (20-byte) digest, usually represented as a 40-character hexadecimal string.

SHA256 and SHA512 are more secure than SHA1, but in general the more secure the algorithm, the slower it is and the longer its digest.
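
The digest lengths of the algorithms mentioned so far can be inspected directly. This short sketch uses hashlib.new() and the digest_size attribute; the sample input is arbitrary.

```python
import hashlib

names = ("md5", "sha1", "sha256", "sha512")

# digest_size is the digest length in bytes for each algorithm.
sizes = {name: hashlib.new(name).digest_size for name in names}

for name in names:
    hex_length = len(hashlib.new(name, b"How to use hashlib?").hexdigest())
    print(name, sizes[name] * 8, "bits,", hex_length, "hex characters")
```

Each byte of the digest takes two hexadecimal characters, which is why MD5 prints as 32 characters and SHA1 as 40.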

Is it possible for two different pieces of data to produce the same digest under one digest algorithm? It is entirely possible, because any digest algorithm maps an infinite set of inputs onto a finite set of digests. This is known as a collision. For example, Bob could try to work backward from your digest to produce an article 'How to learn hashlib in python-by Bob' whose digest is exactly the same as that of your article. This is not impossible, but it is very difficult.
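
Finding a collision on a full MD5 digest is far beyond a short example, but the underlying counting argument can be shown on a deliberately shortened digest. The sketch below keeps only the first two hexadecimal characters (256 possible values), so two different inputs must collide within at most 257 attempts. The function name and the generated input strings are assumptions for illustration.

```python
import hashlib


def find_truncated_collision(prefix_len=2):
    """Find two different inputs whose MD5 hex digests share a prefix."""
    seen = {}  # maps digest prefix -> first input that produced it
    i = 0
    while True:
        text = "article-%d" % i
        prefix = hashlib.md5(text.encode("utf-8")).hexdigest()[:prefix_len]
        if prefix in seen:
            return seen[prefix], text
        seen[prefix] = text
        i += 1


a, b = find_truncated_collision()
print(a, b)
print(hashlib.md5(a.encode("utf-8")).hexdigest())
print(hashlib.md5(b.encode("utf-8")).hexdigest())
```

The two printed digests start with the same two characters but differ afterwards; matching all 32 characters is what makes a real collision so hard to find by this kind of search.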

Applications of digest algorithms

Where can digest algorithms be applied? Here is a common example:

Any site that allows users to log in stores each user's user name and password. How should the user name and password be stored? One way is to keep them in a database table:

    name    | password
    --------+----------
    michael | 123456
    bob     | abc999
    alice   | alice2008

If passwords are saved in clear text and the database is compromised, all users' passwords fall into the hands of the attacker. In addition, the site's operations staff can access the database and therefore see every user's password.

The correct way to save a password is not to store the user's plaintext password but to store a digest of it, such as its MD5:

    username | password
    ---------+---------------------------------
    michael  | e10adc3949ba59abbe56e057f20f883e
    bob      | 878ef96e86145580c38c87f0410ad153
    alice    | 99b1c2188db85afee403b1536010c2c9

When the user logs in, first compute the MD5 of the plaintext password the user entered, then compare it with the MD5 stored in the database. If they match, the password is correct; if they do not, the password is definitely wrong.

Exercise: compute the MD5 digest to store in the database from the password entered by the user:

    def calc_md5(password):
        pass
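
One possible way to fill in this exercise is sketched below. It assumes the password is a text string that is encoded as UTF-8 before hashing; the exercise itself does not specify an encoding.

```python
import hashlib


def calc_md5(password):
    """Return the MD5 hex digest of a password string (UTF-8 encoded)."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


print(calc_md5("123456"))  # e10adc3949ba59abbe56e057f20f883e
```

The printed value matches the digest stored for michael in the table above.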

The benefit of storing the MD5 is that even operations staff who can access the database cannot learn the users' plaintext passwords.

Exercise: design a function that verifies a user's login, returning True or False depending on whether the password entered by the user is correct:

    db = {
        'michael': 'e10adc3949ba59abbe56e057f20f883e',
        'bob': '878ef96e86145580c38c87f0410ad153',
        'alice': '99b1c2188db85afee403b1536010c2c9'
    }

    def login(user, password):
        pass
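
A possible solution to this exercise is sketched below, under the assumption that passwords are UTF-8 encoded before hashing. Using hmac.compare_digest for the comparison is a design choice of this sketch rather than something the exercise requires; a plain == comparison returns the same result.

```python
import hashlib
import hmac

db = {
    "michael": "e10adc3949ba59abbe56e057f20f883e",
    "bob": "878ef96e86145580c38c87f0410ad153",
    "alice": "99b1c2188db85afee403b1536010c2c9",
}


def calc_md5(password):
    """Return the MD5 hex digest of a password string (UTF-8 encoded)."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def login(user, password):
    """Return True if the password's MD5 matches the stored digest."""
    stored = db.get(user)
    if stored is None:
        return False  # unknown user
    # compare_digest compares the two hex strings in constant time
    return hmac.compare_digest(stored, calc_md5(password))


print(login("michael", "123456"))   # True
print(login("michael", "654321"))   # False
print(login("nobody", "123456"))    # False
```

Note that the function never needs the plaintext password from the database; it only compares digests.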

Is it safe to store passwords as MD5 digests? Not necessarily. Suppose you are an attacker who has already obtained the database of MD5 digests. How do you work back from an MD5 to the user's plaintext password? Brute force? Real attackers do not do that.

Consider that many users like simple passwords such as 123456, 888888, and password. An attacker can compute the MD5 values of these common passwords in advance and build a reverse lookup table:

    'e10adc3949ba59abbe56e057f20f883e': '123456'
    '21218cca77804d2ba1922c33e0151105': '888888'
    '5f4dcc3b5aa765d61d8327deb882cf99': 'password'

With this table there is nothing to crack: the attacker only needs to compare it against the MD5 values in the database to obtain the accounts of every user with a common password.
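
The lookup described above can be sketched in a few lines. The list of common passwords and the leaked dictionary are illustrative assumptions; the two digests in leaked are taken from the user table earlier in this article.

```python
import hashlib

common_passwords = ["123456", "888888", "password"]

# Precompute digest -> password for every common password.
reverse_table = {
    hashlib.md5(p.encode("utf-8")).hexdigest(): p for p in common_passwords
}

# Hypothetical leaked rows: user name -> stored MD5.
leaked = {
    "michael": "e10adc3949ba59abbe56e057f20f883e",
    "bob": "878ef96e86145580c38c87f0410ad153",
}

for user, digest in leaked.items():
    # A dictionary lookup replaces any cracking effort.
    print(user, reverse_table.get(digest, "<not in table>"))
```

Only michael's password is recovered here, because bob's digest does not correspond to any password in the precomputed list.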

Users, of course, should not choose overly simple passwords. But can the program itself strengthen the protection of simple passwords?

Because the MD5 of a common password is easy to precompute, we must ensure that the stored digest is not simply the MD5 of a common password. This is done by appending a complex string to the original password before hashing, commonly known as "salting":

    def calc_md5(password):
        # get_md5 is assumed to return the MD5 hex digest of its argument
        return get_md5(password + 'The-salt')

With a salted MD5, as long as the salt is not known to the attacker, it is difficult to work back from the MD5 to the plaintext password, even if the user chose a simple password.
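
The effect of the salt can be checked with a short sketch. The article does not define get_md5, so the version below is an assumed implementation (MD5 hex digest of the UTF-8 encoded string), and the SALT constant simply reuses the example value from the snippet above.

```python
import hashlib

SALT = "The-salt"  # example salt value; a real salt must be kept secret


def get_md5(text):
    """Assumed helper: MD5 hex digest of a UTF-8 encoded string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def calc_md5(password):
    """Salted digest: hash the password with the salt appended."""
    return get_md5(password + SALT)


plain = get_md5("123456")
salted = calc_md5("123456")

print(plain)
print(salted)
print(plain == salted)  # False
```

Since the salted digest differs from the plain MD5 of 123456, a table precomputed from unsalted common passwords no longer matches it.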

But if two users both use the same simple password, such as 123456, the database stores two identical MD5 values, which reveals that the two users share a password. Is there a way for users with the same password to have different stored MD5 values?

If we assume users cannot change their login name, the login name can be used as part of the salt when computing the MD5, so that users with the same password still store different MD5 values.

Exercise: simulate user registration, computing a more secure MD5 from the login name and password the user enters:

    db = {}

    def register(username, password):
        db[username] = get_md5(password + username + 'The-salt')

Then implement login verification using the modified MD5 algorithm:

    def login(username, password):
        pass
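
One way to complete both parts of this exercise is sketched below. As before, get_md5 is an assumed helper (MD5 hex digest of the UTF-8 encoded string), SALT reuses the example value from the article, and the use of hmac.compare_digest is a choice made for this sketch.

```python
import hashlib
import hmac

SALT = "The-salt"  # example salt value from the article
db = {}


def get_md5(text):
    """Assumed helper: MD5 hex digest of a UTF-8 encoded string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def register(username, password):
    """Store a digest salted with both the user name and the fixed salt."""
    db[username] = get_md5(password + username + SALT)


def login(username, password):
    """Recompute the salted digest and compare it with the stored one."""
    stored = db.get(username)
    if stored is None:
        return False  # user was never registered
    return hmac.compare_digest(stored, get_md5(password + username + SALT))


register("michael", "123456")
register("bob", "123456")

print(db["michael"] != db["bob"])   # True: same password, different digests
print(login("michael", "123456"))   # True
print(login("michael", "654321"))   # False
```

The login function must build the salted string in exactly the same order as register, otherwise a correct password would be rejected.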


Digest algorithms are used in many places. Note that a digest algorithm is not an encryption algorithm and cannot be used for encryption, because the plaintext cannot be recovered from the digest. It can only be used to detect tampering, but its one-way property means that a user's password can be verified without storing the plaintext password.
