Implementing short URL (shorturl) hashing in Python: an explained example

Source: Internet
Author: User
This article describes a Python method for generating short URLs (shorturl) by hashing, shared here for reference. The details are as follows:

The usual way to build short URLs is to store the original URL in a database and return the ID the database assigns to it.
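For contrast with the hashing approach, the database approach can be sketched in a few lines. This is a minimal sketch, assuming an in-memory dict stands in for the database table and an incrementing counter plays the role of the auto-increment ID; the class name `IdShortener` is hypothetical.

```python
import itertools


class IdShortener:
    """Hypothetical ID-based shortener: plain dicts stand in for the database."""

    def __init__(self):
        self._next_id = itertools.count(1)  # simulated auto-increment primary key
        self._by_id = {}                    # id -> original URL
        self._by_url = {}                   # original URL -> id (avoids duplicate rows)

    def shorten(self, url):
        # Reuse the existing ID if this URL was stored before.
        if url not in self._by_url:
            new_id = next(self._next_id)
            self._by_url[url] = new_id
            self._by_id[new_id] = url
        return self._by_url[url]

    def expand(self, record_id):
        # Look the original URL back up by its ID; None if the ID is unknown.
        return self._by_id.get(record_id)


s = IdShortener()
print(s.shorten('http://www.example.com/a'))  # -> 1
print(s.shorten('http://www.example.com/a'))  # -> 1 (same URL, same ID)
print(s.shorten('http://www.example.com/b'))  # -> 2
```

The lookup table is what makes this approach collision-free; the price is that every shorten and expand call needs the stored data.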

The approach below instead derives the short URL by hashing the original URL, with no database involved. MD5 comes to mind immediately: its output has a fixed length and a low collision probability, but at 32 characters it is too long. We therefore use MD5 as the basis, shorten its output, and keep the chance of a collision low for a given number of URLs.
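The length problem is easy to see with Python's standard `hashlib` module: an MD5 hex digest is always 32 hex characters (128 bits), however short or long the input is. The helper name `md5_hex` and the sample URLs are illustrative only.

```python
import hashlib


def md5_hex(url):
    """Return the MD5 digest of a URL as a hex string (always 32 characters)."""
    return hashlib.md5(url.encode('utf-8')).hexdigest()


for url in ['http://a.cn', 'http://www.example.com/some/very/long/path?id=123456']:
    print(len(md5_hex(url)), md5_hex(url))  # the length is 32 in both cases
```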

The method has two steps.

Step 1 algorithm:

① Hash the long URL with MD5 to get a 32-character signature string (the hex digest) and split it into 4 segments of 8 characters each;
② Process the 4 segments in a loop: treat each 8-character segment as a hexadecimal number and AND it with 0x3FFFFFFF (thirty 1-bits), discarding everything above 30 bits;
③ Split the 30 bits of each segment into 6 groups of 5 bits, and use each group as an index into a 32-character alphabet to pick one character, giving a 6-character string;
④ One MD5 string therefore yields 4 six-character strings, any of which can serve as the short URL for this long URL.
(The chance of a collision is approximately n/(32^6), i.e. n/1,073,741,824, where n is the number of records already stored.)
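Step 1 can be sketched compactly with the standard library. This is a sketch assuming a 32-symbol alphabet made of the lowercase letters a to z followed by the digits 0 to 5; the function names `segment_to_code` and `md5_candidates` are hypothetical. Six characters of 5 bits each carry exactly the 30 bits kept by the mask, which is why the collision estimate uses 32^6 = 2^30 = 1,073,741,824.

```python
import hashlib

# Assumed 32-symbol alphabet: 26 lowercase letters followed by the digits 0-5.
ALPHABET = 'abcdefghijklmnopqrstuvwxyz012345'


def segment_to_code(hex8):
    """Turn one 8-hex-digit segment into a 6-character code (steps 2 and 3)."""
    value = int(hex8, 16) & 0x3FFFFFFF  # keep the low 30 bits, drop the top 2
    chars = []
    for _ in range(6):
        chars.append(ALPHABET[value & 0x1F])  # low 5 bits index the alphabet
        value >>= 5                           # move on to the next 5 bits
    return ''.join(chars)


def md5_candidates(url):
    """Return the 4 candidate codes for a URL (steps 1 to 4)."""
    digest = hashlib.md5(url.encode('utf-8')).hexdigest()  # 32 hex characters
    return [segment_to_code(digest[i:i + 8]) for i in range(0, 32, 8)]


print(segment_to_code('00000021'))  # 0x21 = 0b100001 -> bbaaaa
print(segment_to_code('c0000000'))  # only the top two bits are set, masked off -> aaaaaa
print(md5_candidates('http://www.example.com/'))
```

The characters come out least-significant group first, matching the `res >> 5` loop in shorturl.py. For scale, with n = 1,000,000 stored URLs the estimate n/2^30 works out to roughly 0.09% for the next URL.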

This gives 4 six-character strings, but which one should be the final hash? Choosing at random will not work, because hashing the same URL twice would then give different results. Instead, the choice is made from features of the original URL, which also keeps the likelihood of collisions within the same domain under control:

Step 2 algorithm:

① Extract the domain name from the original URL, and extract the digits in the URL (keeping at most the last 6);
② Take the resulting number modulo 4; the remainder decides which of the 4 short URLs from Step 1 is chosen;
③ Extract a feature string from the domain name: its first character followed by the next two consonants (if there are fewer than 2 consonants, take the first 3 characters of the domain instead);
④ Concatenate the domain feature string and the selected short URL into a string of at most 9 characters, which is the final short URL.
(The last two steps are meant to limit collisions to URLs within the same domain.)
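Step 2 can be sketched the same way. This is a minimal sketch that follows the behaviour of shorturl.py as far as it can be read: digits are taken from the whole URL (only the last six are kept), the host name is split on dots, and both vowels and digits are dropped when looking for consonants. The helper names `choose_index` and `domain_prefix` are hypothetical, and the host name is passed in directly instead of being parsed out with a regular expression.

```python
import re


def choose_index(url):
    """Pick one of the 4 candidates: the last 6 digits of the URL, modulo 4."""
    digits = re.sub(r'\D', '', url)[-6:] or '0'  # no digits at all -> use 0
    return int(digits) % 4


def domain_prefix(host):
    """Build the (at most) 3-character feature string from a host name."""
    parts = host.split('.')
    # 'www.example.com' -> 'example'; otherwise use the first label.
    domain = parts[1] if len(parts) == 3 else parts[0]
    if len(domain) <= 3:
        return domain
    consonants = re.sub(r'[aeiou0-9]+', '', domain[1:])  # drop vowels and digits
    if len(consonants) >= 2:
        return domain[0] + consonants[:2]
    return domain[:3]  # fewer than 2 consonants: fall back to the first 3 characters


print(domain_prefix('www.example.com'))                      # -> exm
print(choose_index('http://www.example.com/item/12345678'))  # -> 2 (345678 % 4)
```

Prepending this prefix to the candidate chosen by the remainder gives a code of at most 9 characters. Two URLs can then only collide if they also produce the same prefix, which in practice mostly means the same domain.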

shorturl.py

```python
# encoding: utf-8
__author__ = 'James Lau'

import hashlib
import re


def __original_shorturl(url):
    '''
    Algorithm:
    ① Generate the 32-character MD5 signature of the long URL and split it
      into 4 segments of 8 characters each;
    ② Loop over the 4 segments: treat each 8-character segment as a
      hexadecimal number and AND it with 0x3FFFFFFF (thirty 1-bits),
      ignoring everything above 30 bits;
    ③ Split the 30 bits of each segment into 6 groups of 5 bits and use each
      group as an index into the alphabet, yielding a 6-character string;
    ④ One MD5 string thus yields 4 six-character strings, any of which can
      serve as the short URL for this long URL.
    (The chance of a collision is roughly n/(32^6) = n/1,073,741,824,
    where n is the number of stored records.)
    '''
    # 32 symbols: the 26 lowercase letters followed by the digits 0-5
    base32 = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
              'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
              '0', '1', '2', '3', '4', '5']
    m = hashlib.md5()
    m.update(url.encode('utf-8'))  # hashlib needs bytes in Python 3
    hexstr = m.hexdigest()
    hexstrlen = len(hexstr)
    subhexlen = hexstrlen // 8     # 32 // 8 = 4 segments (integer division)
    output = []
    for i in range(0, subhexlen):
        subhex = '0x' + hexstr[i * 8:(i + 1) * 8]
        res = 0x3fffffff & int(subhex, 16)  # keep only the low 30 bits
        out = ''
        for _ in range(6):
            val = 0x0000001f & res          # low 5 bits -> alphabet index
            out += base32[val]
            res = res >> 5
        output.append(out)
    return output


def shorturl(url):
    '''
    Algorithm:
    ① Extract the host name and the digits (at most the last 6) from the
      original URL;
    ② Take the number modulo 4; the remainder selects one of the 4
      candidates produced by the first algorithm;
    ③ Build a feature string from the domain: its first character plus the
      next two consonants (if there are fewer than 2, take the first 3
      characters);
    ④ Concatenate the feature string and the selected candidate into a code
      of at most 9 characters, which is the final short URL.
    (The last two steps limit collisions to URLs within the same domain.)
    '''
    # group(1) captures the full host name, e.g. 'www.example.com'
    match_full_domain_regex = re.compile(
        r'^https?://(([a-zA-Z0-9_\-.]+[a-zA-Z0-9_\-]+\.[a-zA-Z]+)'
        r'|([a-zA-Z0-9_\-]+\.[a-zA-Z]+)).*$')
    match_full_domain = match_full_domain_regex.match(url)
    if match_full_domain is not None:
        full_domain = match_full_domain.group(1)
    else:
        return None
    # Keep only the digits of the URL: the last 6, or '0' if there are none
    not_numeric_regex = re.compile(r'[^\d]+')
    numeric_string = not_numeric_regex.sub('', url)
    if numeric_string == '':
        numeric_string = '0'
    else:
        numeric_string = numeric_string[-6:]
    domainarr = full_domain.split('.')
    domain = domainarr[1] if len(domainarr) == 3 else domainarr[0]
    vowels = 'aeiou0-9'  # used inside a character class: vowels and digits
    if len(domain) <= 3:
        prefix = domain
    else:
        prefix = re.compile(r'[%s]+' % vowels).sub('', domain[1:])
        prefix = '%s%s' % (domain[0], prefix[:2]) if len(prefix) >= 2 else domain[0:3]
    t_shorturl = __original_shorturl(url)
    t_choose = int(numeric_string) % 4
    result = '%s%s' % (prefix, t_shorturl[t_choose])
    return result
```

Hopefully this article will help you with Python programming.
