Short URLs, as the name suggests, are URLs that look very short. Since Twitter launched its URL shortening service, major Internet companies have launched their own. The biggest advantage of short URLs is that they are short: with fewer characters, they are easy to publish, share, copy and store.
A search on the internet turns up two widely circulated short URL algorithms: one based on the MD5 hash, the other based on an auto-incrementing sequence.
1. Based on the MD5 hash: the short URLs produced by this algorithm are generally 5 or 6 characters long. Collisions may occur during the calculation (the probability is very small), and the number of URLs that can be expressed is at most 62 to the 5th or 6th power. It seems that Google (http://goo.gl) uses a similar algorithm (a guess), perhaps because the result looks nicer.
2. Based on an auto-incrementing sequence: this algorithm is relatively simple to implement, the probability of collision is 0, the number of URLs that can be expressed is effectively unlimited, and the length starts from 1. Baidu's short URL service (http://dwz.cn/) appears to use this algorithm.
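To make the "62 to the 5th or 6th power" figure concrete, here is a minimal sketch that computes the upper bound on distinct codes of a given length over a 62-symbol alphabet. The class and method names are made up for this illustration.

```java
// Sketch: upper bound on distinct short codes of a given length (62 symbols).
final class ShortUrlCapacity {
    /** Returns 62^length, the number of distinct codes of exactly that length. */
    static long capacity(int length) {
        long total = 1;
        for (int i = 0; i < length; i++) {
            // multiplyExact throws ArithmeticException on overflow instead of wrapping
            total = Math.multiplyExact(total, 62L);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(capacity(5)); // 916132832
        System.out.println(capacity(6)); // 56800235584
    }
}
```

So 5 characters allow roughly 0.9 billion codes and 6 characters roughly 56.8 billion, assuming every symbol can appear in every position.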
Specific algorithms
1. MD5 hash: assume the short URL has length n.
A. Compute the MD5 hash of the long URL and split the 32-character hexadecimal MD5 string into 4 segments of 8 characters each.
B. Treat each 8-character string from step A as a hexadecimal number and AND (&) it with the binary number made of n * 6 ones, which yields an n * 6-bit binary number.
C. Split the number from step B into n groups of 6 bits each, AND (&) each 6-bit number with 61, and use the result as an index into the alphabet to pick the corresponding letter or digit. Concatenating the n characters gives a short URL of length n.
Each of the 4 segments yields one candidate; the first candidate that is not already in use is returned.
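Steps A to C can also be sketched with the JDK alone, using bit shifts instead of binary strings. This is only an illustration under stated assumptions: the class name, the method names and the ALPHABET constant are invented here, and the alphabet order (0-9, a-z, A-Z) is assumed from the comment in the article's code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5ShortCodeSketch {
    // Assumed alphabet order: digits, lowercase letters, uppercase letters.
    static final String ALPHABET =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    /** Step A (first half): 32-character lowercase hex MD5 of the input. */
    static String md5Hex(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(32);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Returns the 4 candidate codes of length n, one per 8-character segment. */
    static String[] candidates(String longUrl, int n) {
        if (n < 1 || n > 6) {
            throw new IllegalArgumentException("n must be between 1 and 6");
        }
        String hex = md5Hex(longUrl);
        long mask = (1L << (n * 6)) - 1; // step B: n * 6 one-bits
        String[] result = new String[4];
        for (int i = 0; i < 4; i++) {
            // Step A (second half): take the i-th 8-character segment as a hex number.
            long value = Long.parseLong(hex.substring(i * 8, (i + 1) * 8), 16) & mask;
            StringBuilder code = new StringBuilder(n);
            for (int j = 0; j < n; j++) {
                int shift = (n - 1 - j) * 6; // step C: most significant 6-bit group first
                int index = (int) ((value >>> shift) & 0x3F) & 61;
                code.append(ALPHABET.charAt(index));
            }
            result[i] = code.toString();
        }
        return result;
    }
}
```

Calling Md5ShortCodeSketch.candidates(longUrl, 6) returns four 6-character strings. A caller would then pick the first one that is not yet taken.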
// Requires commons-codec and commons-lang3 (the same StringUtils methods exist in commons-lang 2.x).
import org.apache.commons.codec.digest.DigestUtils;
import org.apache.commons.lang3.StringUtils;

static final char[] DIGITS = {
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
    'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
    'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
    'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'};

// Constants used below; their declarations are not shown in the original, so these values are assumed.
static final int BINARY = 2;      // radix for parsing binary strings
static final int NUMBER_61 = 61;  // binary 111101

public String shorten(String longUrl, int urlLength) {
    // A length of 0 would produce an empty mask string, so the lower bound is 1.
    if (urlLength < 1 || urlLength > 6) {
        throw new IllegalArgumentException("the length of url must be between 1 and 6");
    }
    String md5Hex = DigestUtils.md5Hex(longUrl);
    // A 6-bit binary number is used to select a letter or digit from 0-9a-zA-Z
    int binaryLength = urlLength * 6;
    long binaryLengthFixer = Long.valueOf(StringUtils.repeat("1", binaryLength), BINARY);
    for (int i = 0; i < 4; i++) {
        String subString = StringUtils.substring(md5Hex, i * 8, (i + 1) * 8);
        // Parse the 8 hex characters (radix 16), then keep the lowest binaryLength bits
        subString = Long.toBinaryString(Long.valueOf(subString, 16) & binaryLengthFixer);
        subString = StringUtils.leftPad(subString, binaryLength, "0");
        StringBuilder sbBuilder = new StringBuilder();
        for (int j = 0; j < urlLength; j++) {
            String subString2 = StringUtils.substring(subString, j * 6, (j + 1) * 6);
            // & 61 keeps the index within 0..61; it also clears bit 1,
            // so only 32 of the 62 symbols are actually reachable
            int charIndex = Integer.valueOf(subString2, BINARY) & NUMBER_61;
            sbBuilder.append(DIGITS[charIndex]);
        }
        String shortUrl = sbBuilder.toString();
        if (lookupLong(shortUrl) != null) {
            continue; // this candidate is already taken, try the next segment
        } else {
            return shortUrl;
        }
    }
    // If all 4 candidates already exist
    return null;
}
2. Auto-incrementing sequence:
A. Take the next value of the incrementing sequence and express it in base 62.
import java.util.concurrent.atomic.AtomicLong;

private AtomicLong sequence = new AtomicLong(0);

@Override
protected String shorten(String longUrl) {
    long myseq = sequence.incrementAndGet();
    String shortUrl = to62RadixString(myseq);
    return shortUrl;
}

private String to62RadixString(long seq) {
    StringBuilder sBuilder = new StringBuilder();
    // Digits are appended least significant first; the result is still unique per value
    while (true) {
        int remainder = (int) (seq % 62);
        sBuilder.append(DIGITS[remainder]);
        seq = seq / 62;
        if (seq == 0) {
            break;
        }
    }
    return sBuilder.toString();
}
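Because each sequence value maps to exactly one base-62 string, the conversion can be reversed. The following sketch pairs an encoder with the same digit order as to62RadixString (least significant digit first) with a hypothetical decode method. The class name, decode, and the ALPHABET constant are assumptions made for illustration and are not part of the original code.

```java
final class Base62RoundTrip {
    // Assumed alphabet order: digits, lowercase letters, uppercase letters.
    static final String ALPHABET =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    /** Encodes a non-negative value, least significant base-62 digit first. */
    static String encode(long seq) {
        if (seq < 0) {
            throw new IllegalArgumentException("seq must be non-negative");
        }
        StringBuilder sb = new StringBuilder();
        do {
            sb.append(ALPHABET.charAt((int) (seq % 62)));
            seq /= 62;
        } while (seq != 0);
        return sb.toString();
    }

    /** Hypothetical inverse of encode: rebuilds the sequence value from a code. */
    static long decode(String code) {
        long value = 0;
        // The last character is the most significant digit, so walk backwards.
        for (int i = code.length() - 1; i >= 0; i--) {
            int digit = ALPHABET.indexOf(code.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("invalid character: " + code.charAt(i));
            }
            value = value * 62 + digit;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(encode(61)); // Z
        System.out.println(encode(62)); // 01
        System.out.println(decode(encode(123456789L))); // 123456789
    }
}
```

Note that the codes grow in length only when the sequence passes a power of 62, which is why the shortest URLs start at a single character.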
The code in the Maven project uses 2 maps to simulate the mapping between long and short URLs. In a real service this mapping could instead be backed by an indexed database table or a distributed KV system.
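The Maven project itself is not shown here, so the following is only a guess at what a two-map simulation could look like. lookupLong matches the name used in the MD5 code above; the class name, save and lookupShort are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: in-memory stand-in for the long/short URL mapping (assumed design).
final class InMemoryUrlStore {
    private final Map<String, String> longToShort = new ConcurrentHashMap<>();
    private final Map<String, String> shortToLong = new ConcurrentHashMap<>();

    /**
     * Records the pair in both maps. Returns false, and stores nothing,
     * if the short code already belongs to a different long URL.
     */
    synchronized boolean save(String longUrl, String shortUrl) {
        String existing = shortToLong.get(shortUrl);
        if (existing != null && !existing.equals(longUrl)) {
            return false;
        }
        shortToLong.put(shortUrl, longUrl);
        longToShort.put(longUrl, shortUrl);
        return true;
    }

    /** Returns the long URL for a short code, or null if the code is unused. */
    String lookupLong(String shortUrl) {
        return shortToLong.get(shortUrl);
    }

    /** Returns the short code already assigned to a long URL, or null. */
    String lookupShort(String longUrl) {
        return longToShort.get(longUrl);
    }
}
```

Keeping both directions makes it cheap to check whether a long URL was shortened before and whether a candidate code is free. A database version would typically need an index on each column for the same reason.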
I hope this article helps you learn about short URL services.