How short URL generation algorithms for microblog links work (with Java and PHP implementation examples)

Source: Internet
Author: User
Tags: php, language, key, string, md5, encryption, repetition

Short URLs, as the name implies, are URLs in a relatively short form. They are usually implemented as a redirect served by ASP or PHP, and in today's Web 2.0 they have clearly become a trend. Many such services already exist: a short URL stands in for the original lengthy URL so that users can share links easily.
Example: http://t.cn/SzjPjA

Short URL services are probably familiar to many readers by now. They are widely used in microblogging, mobile and email notifications, and similar places, and they have taken a certain share of the market. Many readers are likely using them already.
Looking at Sina's short link service, I found that the part after the domain is a 6-character string. My first thought was the rule a former employer used for game activation codes, which is Algorithm Two below:
take the 26 uppercase letters, 26 lowercase letters, and 10 digits, randomly generate 6 characters, and insert them into the database alongside a corresponding ID. When a short link is visited, look up the string to find the corresponding ID and perform the redirect. With 62 characters in 6 positions there are 62^6 (about 56.8 billion) possible strings. I do not know whether repeats can be ruled out, but the probability is small, and for a site that is not very large this should be enough.
After Twitter launched short URLs (shorturl), the domestic microblogs followed suit and Google opened an API for goo.gl, so the short URL trend has only intensified. This is a new and popular Web 2.0 service. This article collects what I have gathered: short URL services, the principle of short URL generation, algorithm examples, a comparison of pros and cons, and several implementations written by individual PHP developers.

Benefits of Short Links:

1. Content needs; 2. user-friendliness; 3. ease of management.

Why these points?
Weibo limits a post to 140 characters. If we need to post a link and the link is very long, it takes up nearly half of the content, which is unacceptable, so short URLs came into being.
Short URLs make it easy to manage the external URLs that appear in our projects. Some websites carry violence, advertising, and similar content. Based on user reports we can block such a link entirely so that it no longer appears in our application, because the same URL passed through the encryption algorithm always yields the same short address.
We can also collect traffic, click, and other statistics for a series of websites and mine them for what most users focus on, which helps us make better decisions in follow-up work on the project.

Algorithm principle
Algorithm One
1) Compute the MD5 of the long URL to get a 32-character hexadecimal signature string, and divide it into 4 segments of 8 characters each;
2) Loop over the four segments: take the 8 characters, treat them as a hexadecimal number, and AND it with 0x3FFFFFFF (30 one-bits), that is, ignore everything above the low 30 bits;
3) Divide the 30 bits into 6 groups of 5 bits, use each 5-bit number as an index into the alphabet to get one character, and concatenate the results into a 6-character string;
4) The whole MD5 string therefore yields four 6-character strings; take any one of them as the short URL for this long URL.
Although this algorithm generates 4 candidates, there is still a probability of repetition. The Java code and the first PHP function below are implementations of it.
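Because repeats are still possible, one way to use the four candidates is to try them in order and keep the first one that is free. The following is only a minimal sketch of that idea: the class name `CandidatePicker`, the in-memory map standing in for the database table, and the made-up candidate strings are all assumptions for illustration, not part of the original implementations.

```java
import java.util.HashMap;
import java.util.Map;

final class CandidatePicker {
    // Hypothetical in-memory stand-in for the short-code table: code -> long URL.
    private final Map<String, String> codeToUrl = new HashMap<>();

    /**
     * Returns the first candidate that is unused or already mapped to this URL,
     * or null when every candidate is taken by a different URL.
     */
    String pick(String longUrl, String[] candidates) {
        for (String code : candidates) {
            String existing = codeToUrl.putIfAbsent(code, longUrl);
            if (existing == null || existing.equals(longUrl)) {
                return code;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        CandidatePicker picker = new CandidatePicker();
        // Made-up candidates standing in for the four MD5-derived strings.
        String[] candidates = {"aaaaaa", "bbbbbb"};
        System.out.println(picker.pick("http://example.com/a", candidates)); // aaaaaa
        System.out.println(picker.pick("http://example.com/b", candidates)); // bbbbbb
        System.out.println(picker.pick("http://example.com/a", candidates)); // aaaaaa
    }
}
```

If all candidates collide, the caller would need some fallback, for example hashing again with a different key; that choice is left open here.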
Algorithm Two
The 62 characters a-z, A-Z, 0-9 taken 6 at a time can produce more than 50 billion combinations (62^6 = 56,800,235,584). Map numbers to character combinations so that each number yields a unique string; for example, the 62nd combination is aaaaa9 and the 63rd combination is aaaaba. Then use a shuffle algorithm to scramble the original character string before saving it, so that the string at each position becomes an unordered-looking combination.
Put the long URL into the database, take the returned ID, and find the corresponding string. For example, if the returned ID is 1, the corresponding string is bbb; if the ID is 2, the string is bba; and so on. A repeat becomes possible only after all the combinations have been used, so with the 62 characters above combined into 6-character strings, you would need more than 50 billion records before a repeat can occur.
For details, see "Thoroughly improving the Sina Weibo interface and ultra-short URL algorithm"; algorithm four can be counted as an implementation of this approach. This algorithm generally does not produce repeats, but statistics become a big problem, especially statistics by domain name, where you are left at a loss.
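The ID-to-string mapping of Algorithm Two can be sketched as a fixed-width base-62 conversion. This is only an illustration under stated assumptions: the class name `Base62Sketch` is made up, the alphabet order a-z, A-Z, 0-9 is chosen so that the output matches the aaaaa9 / aaaaba example above (counting from zero), and the alphabet is left unshuffled. A shuffled alphabet would work the same way as long as the same shuffled string is used for encoding and decoding.

```java
final class Base62Sketch {
    // Assumed alphabet order: a-z, A-Z, 0-9, so index 61 is '9'.
    private static final String ALPHABET =
            "abcdefghijklmnopqrstuvwxyz" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "0123456789";
    private static final int LENGTH = 6;
    private static final long LIMIT = 56_800_235_584L; // 62^6

    /** Encodes an ID in [0, 62^6) as a fixed-width 6-character string. */
    static String encode(long id) {
        if (id < 0 || id >= LIMIT) {
            throw new IllegalArgumentException("id out of range: " + id);
        }
        char[] out = new char[LENGTH];
        // Fill from the right: the last character is the least significant digit.
        for (int i = LENGTH - 1; i >= 0; i--) {
            out[i] = ALPHABET.charAt((int) (id % 62));
            id /= 62;
        }
        return new String(out);
    }

    /** Decodes a string produced by encode back into its ID. */
    static long decode(String code) {
        long id = 0;
        for (int i = 0; i < code.length(); i++) {
            int digit = ALPHABET.indexOf(code.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("invalid character: " + code.charAt(i));
            }
            id = id * 62 + digit;
        }
        return id;
    }

    public static void main(String[] args) {
        System.out.println(encode(61));        // aaaaa9
        System.out.println(encode(62));        // aaaaba
        System.out.println(decode("aaaaba"));  // 62
    }
}
```

Because the string is derived from the row ID rather than from the URL, two rows never share a short string until the ID range is exhausted.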

Java language implementations:
Java code
package com.test;

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ShortUrl {

    private static final String[] HEX_DIGITS = {
        "0", "1", "2", "3", "4", "5", "6", "7",
        "8", "9", "a", "b", "c", "d", "e", "f"
    };

    // Converts a byte array to a lowercase hexadecimal string.
    public static String byteArrayToHexString(byte[] b) {
        StringBuilder resultSb = new StringBuilder();
        for (int i = 0; i < b.length; i++) {
            resultSb.append(byteToHexString(b[i]));
        }
        return resultSb.toString();
    }

    private static String byteToHexString(byte b) {
        int n = b;
        if (n < 0) {
            n = 256 + n; // map the signed byte to 0..255
        }
        int d1 = n / 16;
        int d2 = n % 16;
        return HEX_DIGITS[d1] + HEX_DIGITS[d2];
    }

    // Returns the MD5 digest of the input as a 32-character hex string.
    public static String md5Encode(String origin) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return byteArrayToHexString(md.digest(origin.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("MD5 is not available", ex);
        }
    }

    public static void main(String[] args) {
        String url = "http://www.bai.com";
        for (String string : shortText(url)) {
            print(string);
        }
    }

    // Returns the four 6-character candidates derived from the MD5 of the URL.
    public static String[] shortText(String string) {
        String key = "Xuliang"; // custom key mixed in before MD5 hashing
        String[] chars = new String[] { // characters used to build the short URL
            "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
            "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
            "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
            "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
            "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"
        };
        String hex = md5Encode(key + string);
        int hexLen = hex.length();
        int subHexLen = hexLen / 8;
        String[] shortStr = new String[4];
        for (int i = 0; i < subHexLen; i++) {
            String outChars = "";
            int j = i + 1;
            String subHex = hex.substring(i * 8, j * 8);
            // Keep only the low 30 bits of the 8-hex-digit segment.
            long idx = 0x3FFFFFFFL & Long.parseLong(subHex, 16);
            for (int k = 0; k < 6; k++) {
                // The mask 0x3D keeps the index at or below 61, inside the
                // 62-entry table. The steps described above use a 5-bit mask
                // (0x1F) instead, as the PHP version does.
                int index = (int) (0x0000003DL & idx);
                outChars += chars[index];
                idx = idx >> 5;
            }
            shortStr[i] = outChars;
        }
        return shortStr;
    }

    public static void print(Object message) {
        System.out.println(message);
    }

}

The following are the PHP language implementations:
PHP code
<?php
function shorturl($input) {
    // 32-character table: each 5-bit value (0..31) selects one character.
    $base32 = array(
        'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h',
        'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p',
        'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
        'y', 'z', '0', '1', '2', '3', '4', '5'
    );

    $hex = md5($input);
    $hexLen = strlen($hex);
    $subHexLen = $hexLen / 8;
    $output = array();

    for ($i = 0; $i < $subHexLen; $i++) {
        $subHex = substr($hex, $i * 8, 8);
        // Parse the 8 hex digits as an integer (assumes 64-bit PHP, so the
        // value fits in an int) and keep only the low 30 bits.
        $int = 0x3FFFFFFF & hexdec($subHex);
        $out = '';
        for ($j = 0; $j < 6; $j++) {
            $val = 0x0000001F & $int;   // low 5 bits as table index
            $out .= $base32[$val];
            $int = $int >> 5;
        }
        $output[] = $out;
    }
    return $output;
}
?>

PHP code (random string generation)
<?php
function random($length, $pool = '')
{
    $random = '';
    if (empty($pool)) {
        // Default pool omits look-alike characters such as i, l, o, 0 and 1.
        $pool  = 'abcdefghkmnpqrstuvwxyz';
        $pool .= '23456789';
    }
    for ($i = 0; $i < $length; $i++) {
        // random_int() requires PHP 7 or later and needs no manual seeding.
        $random .= substr($pool, random_int(0, strlen($pool) - 1), 1);
    }
    return $random;
}
?>

Redirect principle
When generating a short link, we only need to store the mapping between the original link and the short link in a table (in a database or a NoSQL store). When a short link is visited, we look up the original link in that mapping and redirect to it.
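The lookup-and-redirect step can be sketched as follows. This is a minimal illustration, not a real server: the class name `RedirectTable`, the in-memory map standing in for the database or NoSQL table, and the example long URL are all assumptions. The sketch only builds the response text that a server might send for a short-link request.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class RedirectTable {
    // Hypothetical stand-in for the mapping table: short code -> original URL.
    private final Map<String, String> shortToLong = new HashMap<>();

    void save(String shortCode, String longUrl) {
        shortToLong.put(shortCode, longUrl);
    }

    Optional<String> resolve(String shortCode) {
        return Optional.ofNullable(shortToLong.get(shortCode));
    }

    /** Builds the response head for a request to the given short code. */
    String respond(String shortCode) {
        return resolve(shortCode)
                .map(url -> "HTTP/1.1 302 Found\r\nLocation: " + url + "\r\n\r\n")
                .orElse("HTTP/1.1 404 Not Found\r\n\r\n");
    }

    public static void main(String[] args) {
        RedirectTable table = new RedirectTable();
        // The long URL here is made up for the example.
        table.save("SzjPjA", "http://example.com/some/very/long/path");
        System.out.print(table.respond("SzjPjA"));
        System.out.print(table.respond("unknown"));
    }
}
```

A 302 response is used here so that every visit still reaches the short URL service, which keeps click statistics possible; a permanent redirect would be another option with different caching behavior.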
