Short URL Compression Algorithm

Source: Internet
Author: User
Tags: key, string

Short URLs have become popular on microblogging services, for example Tencent Weibo's url.cn and Sina Weibo's t.cn. When we post a URL on Sina Weibo, the service automatically recognizes it and converts it into a short form such as http://t.cn/hrynr0. The reasons for doing so are as follows:

1. Weibo limits a post to 140 characters. If we want to include a link that is long enough to take up nearly half of that limit, there is little room left for the actual content, which is why short URLs came into being.

2. Short URLs let our project manage external links effectively. Some websites carry objectionable content, violence, advertisements, and similar material. With the help of user reports we can block such a link completely so that it is no longer available in our application. This works because the same long URL always produces the same short URL.

3. We can collect statistics such as traffic and click counts for a set of links to find out what most users care about, which helps us make better decisions about the project's future work.

The three points above are purely my personal opinion. Because I will use short URLs in some of my upcoming projects, I decided to study them. Next, let's look at the theory behind the short URL algorithm (based on information found on the Internet):

① Use the MD5 algorithm to generate a 32-character hexadecimal signature for the long URL, and split it into 4 segments of 8 characters each;

② Process the four segments in a loop: treat the 8 characters of each segment as a hexadecimal number and bitwise-AND it with 0x3fffffff (30 one-bits), so that everything above 30 bits is discarded;

③ Divide the 30 bits of each segment into six groups of 5 bits, use each group as an index into the alphabet to obtain one character, and so obtain a 6-character string;

④ The full MD5 string therefore yields four 6-character strings, any one of which can be used as the short URL for this long URL.
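Steps ② and ③ can be sketched for a single 8-character segment as follows. This is only an illustration under stated assumptions: the class name `SegmentSketch`, the method name `encodeSegment`, and the order of the 62-character alphabet (a-z, 0-9, A-Z) are hypothetical. The index mask 0x3D is the constant that appears in this article's ShortUrl listing; it is not the plain 5-bit mask that step ③ describes.

```java
public class SegmentSketch {

    // Assumed alphabet order: a-z, 0-9, A-Z (62 characters).
    private static final String ALPHABET =
            "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    /** Maps one 8-character hexadecimal segment to a 6-character code. */
    public static String encodeSegment(String hexSegment) {
        // Step 2: parse the segment as hexadecimal and keep only the low 30 bits.
        long idx = Long.parseLong(hexSegment, 16) & 0x3FFFFFFFL;

        StringBuilder out = new StringBuilder(6);
        for (int k = 0; k < 6; k++) {
            // Step 3: derive an alphabet index from the low bits, then move on
            // to the next group of 5 bits. The mask 0x3D keeps the index in 0..61.
            int index = (int) (idx & 0x3D);
            out.append(ALPHABET.charAt(index));
            idx >>>= 5;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeSegment("00000000")); // prints "aaaaaa"
        System.out.println(encodeSegment("ffffffff")); // prints "ZZZZZ3"
    }
}
```

Because the mask 0x3D (binary 111101) always clears one bit, only 32 of the 62 alphabet characters can appear in a given position, which matches the 5 bits of information consumed per character.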

The resulting short URL is not guaranteed to be unique, but because every long URL yields four candidates, a collision on all four of them should be rare.
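One possible way to use the four candidates is to take the first one that is still free. The sketch below is not from the original article: the class `CandidatePicker`, its in-memory map, and the `register` method are hypothetical, and a real service would presumably keep this mapping in a database.

```java
import java.util.HashMap;
import java.util.Map;

public class CandidatePicker {

    // Hypothetical in-memory store: short code -> long URL.
    private final Map<String, String> codeToUrl = new HashMap<>();

    /**
     * Returns the first candidate that is unused or already bound to the
     * same long URL, or null if every candidate belongs to another URL.
     */
    public String register(String longUrl, String[] candidates) {
        for (String code : candidates) {
            String existing = codeToUrl.putIfAbsent(code, longUrl);
            if (existing == null || existing.equals(longUrl)) {
                return code;
            }
        }
        return null; // all candidates collide; the caller must handle this case
    }
}
```

Registering the same long URL twice returns the same code, which preserves the property that one long URL always maps to one short URL.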

First, you need to know how to use MD5 in Java to turn a string into a 32-character hexadecimal digest (strictly speaking, MD5 is a hash function rather than encryption). Below is the Java MD5 helper class I have written:

[Md5Encry.class]

package com.example.demo_shorturl;

import java.security.MessageDigest;

/**
 * ClassName: Md5Encry
 * Function: computes the MD5 digest of a string as 32 hexadecimal characters.
 * Date: 9:51:15
 *
 * @author geek_anjon
 * @since JDK 1.6
 */
public class Md5Encry {

    private static final String[] HEX_DIGITS = {
        "0", "1", "2", "3", "4", "5", "6", "7",
        "8", "9", "a", "b", "c", "d", "e", "f"};

    /** Converts a byte array to its hexadecimal representation. */
    public static String byteArrayToHexString(byte[] b) {
        StringBuilder resultSb = new StringBuilder();
        for (int i = 0; i < b.length; i++) {
            resultSb.append(byteToHexString(b[i]));
        }
        return resultSb.toString();
    }

    /** Converts one byte to two hexadecimal digits. */
    private static String byteToHexString(byte b) {
        int n = b;
        if (n < 0) {
            n = 256 + n; // map the signed byte to the range 0..255
        }
        int d1 = n / 16;
        int d2 = n % 16;
        return HEX_DIGITS[d1] + HEX_DIGITS[d2];
    }

    /** Returns the MD5 digest of the UTF-8 bytes of origin as 32 hex characters. */
    public static String md5Encode(String origin) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return byteArrayToHexString(md.digest(origin.getBytes("UTF-8")));
        } catch (Exception ex) {
            // Report the failure instead of silently returning the input unchanged.
            throw new IllegalStateException("MD5 digest failed", ex);
        }
    }

    public static void main(String[] args) {
        String data = "189022881112011111118: 09sz109123456789987654321 ";
        System.out.println(md5Encode(data));
    }
}

[ShortUrl.class]

package com.example.demo_shorturl;

/**
 * ClassName: ShortUrl
 * Function: derives four 6-character short codes from a long URL.
 * Date: 9:48:34
 *
 * @author geek_anjon
 * @since JDK 1.6
 */
public class ShortUrl {

    public static void main(String[] args) {
        String url = "http://www.baidu.com";
        for (String s : shortText(url)) {
            print(s);
        }
    }

    public static String[] shortText(String string) {
        String key = "geek"; // custom key mixed into the input before hashing
        // Characters used to build the short code
        String[] chars = new String[] {
            "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
            "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
            "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
            "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
            "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"};

        String hex = Md5Encry.md5Encode(key + string);
        int hexLen = hex.length();
        int subHexLen = hexLen / 8; // 32 hex characters give 4 segments
        String[] shortStr = new String[4];

        for (int i = 0; i < subHexLen; i++) {
            String outChars = "";
            int j = i + 1;
            String subHex = hex.substring(i * 8, j * 8);
            // Keep only the low 30 bits of the segment
            long idx = Long.valueOf("3fffffff", 16) & Long.valueOf(subHex, 16);
            for (int k = 0; k < 6; k++) {
                // The mask 0x3d keeps the index within the 62-entry alphabet
                int index = (int) (Long.valueOf("0000003d", 16) & idx);
                outChars += chars[index];
                idx = idx >>> 5; // move on to the next 5 bits
            }
            shortStr[i] = outChars;
        }
        return shortStr;
    }

    private static void print(Object message) {
        System.out.println(message);
    }
}

Now you can call the method above with a long URL to obtain its short link addresses.

