A PHP implementation of the Weibo short-link algorithm. The idea:
1) compute the MD5 of the long URL, which gives a 32-character hexadecimal signature string, and split it into four segments of 8 characters each;
2) loop over the four segments: treat each 8-character segment as a hexadecimal number and AND it with 0x3fffffff (30 one-bits), that is, discard everything above the low 30 bits;
3) split the 30 bits into six groups of 5 bits; each 5-bit value is used as an index into the alphabet to obtain one character, giving a 6-character string;
4) the whole MD5 string therefore yields four 6-character strings; any one of them can be used as the short URL for this long URL.
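Steps 2 and 3 are easier to follow on a single segment. The sketch below is only an illustration: `encodeSegment` is a hypothetical helper name that does not appear in the original code, and it assumes a 64-bit PHP build so that `hexdec()` returns an integer for 8 hex digits.

```php
<?php
// Hypothetical helper (not from the original article): encode one
// 8-character hex segment into a 6-character code, following steps 2 and 3.
function encodeSegment(string $subHex): string
{
    // 32-character alphabet: a-z followed by 0-5, one character per 5-bit value.
    $base = 'abcdefghijklmnopqrstuvwxyz012345';

    // Step 2: read the segment as a hex number and keep only the low 30 bits.
    $int = hexdec($subHex) & 0x3FFFFFFF;

    $out = '';
    for ($j = 0; $j < 6; $j++) {
        // Step 3: the low 5 bits select one character, then are shifted away.
        $out .= $base[$int & 0x1F];
        $int >>= 5;
    }
    return $out;
}

echo encodeSegment('00000001'), "\n"; // prints "baaaaa"
echo encodeSegment('ffffffff'), "\n"; // prints "555555"
echo encodeSegment('c0000000'), "\n"; // prints "aaaaaa" (only the top two bits are set)
```

The last call shows the effect of the 0x3fffffff mask: the two highest bits of the segment never influence the result.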
The following is the PHP code:
function shorturl($url = '', $prefix = '', $suffix = '') {
    // 32-character alphabet: one character for each possible 5-bit value.
    $base = array(
        'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h',
        'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p',
        'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
        'y', 'z', '0', '1', '2', '3', '4', '5');
    // 32-character hexadecimal MD5 digest of the (optionally salted) URL.
    $hex = md5($prefix . $url . $suffix);
    $hexLen = strlen($hex);
    $subHexLen = $hexLen / 8;
    $output = array();
    for ($i = 0; $i < $subHexLen; $i++) {
        $subHex = substr($hex, $i * 8, 8);
        // hexdec() parses the hex digits; multiplying the string '0x...' by 1
        // no longer yields the hex value in PHP 7 and later.
        $int = 0x3FFFFFFF & hexdec($subHex);
        $out = '';
        for ($j = 0; $j < 6; $j++) {
            // The low 5 bits pick one character, then are shifted away.
            $val = 0x0000001F & $int;
            $out .= $base[$val];
            $int = $int >> 5;
        }
        $output[] = $out;
    }
    return $output;
}
$urls = shorturl('http://www.jb51.net/');
var_dump($urls);
Result
array(4) {
  [0] =>
  string(6) "alms1l"
  [1] =>
  string(6) "2ipmby"
  [2] =>
  string(6) "avo1hu"
  [3] =>
  string(6) "fdlban"
}
Another version:
function shorturl($url = '', $prefix = '', $suffix = '') {
    // 62-character alphabet: a-z, 0-9, A-Z.
    $base = array(
        "a", "b", "c", "d", "e", "f", "g", "h",
        "i", "j", "k", "l", "m", "n", "o", "p",
        "q", "r", "s", "t", "u", "v", "w", "x",
        "y", "z", "0", "1", "2", "3", "4", "5",
        "6", "7", "8", "9", "A", "B", "C", "D",
        "E", "F", "G", "H", "I", "J", "K", "L",
        "M", "N", "O", "P", "Q", "R", "S", "T",
        "U", "V", "W", "X", "Y", "Z");
    $hex = md5($prefix . $url . $suffix);
    $hexLen = strlen($hex);
    $subHexLen = $hexLen / 8;
    $output = array();
    for ($i = 0; $i < $subHexLen; $i++) {
        $subHex = substr($hex, $i * 8, 8);
        // hexdec() parses the hex digits; multiplying the string '0x...' by 1
        // no longer yields the hex value in PHP 7 and later.
        $int = 0x3FFFFFFF & hexdec($subHex);
        $out = '';
        for ($j = 0; $j < 6; $j++) {
            // Mask 0x3D (maximum value 61) keeps the index inside the 62-character alphabet.
            $val = 0x0000003D & $int;
            $out .= $base[$val];
            $int = $int >> 5;
        }
        $output[] = $out;
    }
    return $output;
}
Result:
array(4) {
  [0] =>
  string(6) "6jmMVj"
  [1] =>
  string(6) "2EnIby"
  [2] =>
  string(6) "6vIVfu"
  [3] =>
  string(6) "B7Fb6n"
}
However, this upgraded version has a higher collision rate, and I don't know why.
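One possible explanation, offered as a reading of the code rather than something the original test proves: the mask 0x3D is 111101 in binary, so bit 1 of every group is ignored, while the shift is still 5 bits. The short sketch below (all variable names are illustrative) counts which of the 30 bits can actually change the output.

```php
<?php
// Sketch: find which of the low 30 bits can change the output when each
// character is chosen with ($int & 0x3D) and $int is then shifted right by 5.
$usedBits = 0;
for ($j = 0; $j < 6; $j++) {
    // Bit positions covered by the mask when it is applied to the j-th character.
    $usedBits |= (0x3D << (5 * $j));
}
$usedBits &= 0x3FFFFFFF; // bits 30 and above were already cleared earlier

$ignored = [];
for ($bit = 0; $bit < 30; $bit++) {
    if ((($usedBits >> $bit) & 1) === 0) {
        $ignored[] = $bit;
    }
}
echo 'ignored bits: ', implode(', ', $ignored), "\n"; // 1, 6, 11, 16, 21, 26
echo 'effective bits: ', 30 - count($ignored), "\n";  // 24

// Indexes reachable through the 0x3D mask: only 32 of the 62 characters.
$reachable = [];
for ($v = 0; $v < 64; $v++) {
    $reachable[$v & 0x3D] = true;
}
echo 'reachable indexes: ', count($reachable), "\n";  // 32
```

If this reading is right, the second version distinguishes only 2^24 codes instead of 2^30, and it never uses 30 of the 62 characters, which would explain the extra collisions.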
Code for testing collisions:
$result = array();
$repeats = array();
$loop = 20000;
// First pass: record every short code that appears more than once.
for ($i = 0; $i < $loop; $i++) {
    $url = 'http://www.jb51.net/?id=' . $i;
    $shorta = shorturl($url);
    $short = $shorta[0];
    if (in_array($short, $result)) {
        $repeats[] = $short;
    }
    $result[] = $short;
}
// Second pass: group the URLs that map to each repeated short code.
$result = array();
for ($i = 0; $i < $loop; $i++) {
    $url = 'http://www.jb51.net/?id=' . $i;
    $shorta = shorturl($url);
    $short = $shorta[0];
    if (in_array($short, $repeats)) {
        $result[$short][] = $url;
    }
}
var_dump($repeats);
var_dump($result);
Result:
array(8) {
  [0] =>
  string(6) "3eQBzq"
  [1] =>
  string(6) "uw.nay"
  [2] =>
  string(6) "qEZbIv"
  [3] =>
  string(6) "fMneYf"
  [4] =>
  string(6) "FJj6Fr"
  [5] =>
  string(6) "3Eviym"
  [6] =>
  string(6) "j2mmuy"
  [7] =>
  string(6) "jyQfIv"
}
array(8) {
  'jyQfIv' =>
  array(2) {
    [0] =>
    string(26) "http://www.jb51.net/?id=1640"
    [1] =>
    string(27) "http://www.jb51.net/?id=18661"
  }
  'fMneYf' =>
  array(2) {
    [0] =>
    string(26) "http://www.jb51.net/?id=2072"
    [1] =>
    string(26) "http://www.jb51.net/?id=8480"
  }
  '3eQBzq' =>
  array(2) {
    [0] =>
    string(26) "http://www.jb51.net/?id=4145"
    [1] =>
    string(26) "http://www.jb51.net/?id=4273"
  }
  'j2mmuy' =>
  array(2) {
    [0] =>
    string(26) "http://www.jb51.net/?id=7131"
    [1] =>
    string(27) "http://www.jb51.net/?id=17898"
  }
  'qEZbIv' =>
  array(2) {
    [0] =>
    string(26) "http://www.jb51.net/?id=7320"
    [1] =>
    string(26) "http://www.jb51.net/?id=8134"
  }
  'uw.nay' =>
  array(2) {
    [0] =>
    string(26) "http://www.jb51.net/?id=7347"
    [1] =>
    string(26) "http://www.jb51.net/?id=7962"
  }
  'FJj6Fr' =>
  array(2) {
    [0] =>
    string(26) "http://www.jb51.net/?id=8628"
    [1] =>
    string(26) "http://www.jb51.net/?id=9031"
  }
  '3Eviym' =>
  array(2) {
    [0] =>
    string(27) "http://www.jb51.net/?id=11175"
    [1] =>
    string(27) "http://www.jb51.net/?id=14437"
  }
}
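To put the 8 repeated codes in perspective, a rough birthday-problem estimate gives the number of colliding pairs to expect when 20000 URLs are spread uniformly over a given number of codes. This is only a sketch: `expectedCollidingPairs` is a hypothetical helper, uniformity of the MD5 output is assumed, and the 24-bit figure assumes that the second version's mask 0x3D (binary 111101) combined with a 5-bit shift leaves only 24 of the 30 bits in play.

```php
<?php
// Rough birthday-problem estimate: expected number of colliding pairs when
// $n items are spread uniformly over $space possible codes.
function expectedCollidingPairs(int $n, int $space): float
{
    return $n * ($n - 1) / (2 * $space);
}

$n = 20000; // same loop count as the collision test

// First version: all 30 bits matter, so 2^30 possible codes.
printf("30 bits: %.2f\n", expectedCollidingPairs($n, 1 << 30)); // prints "30 bits: 0.19"
// Assumed for the second version: only 24 bits matter, so 2^24 possible codes.
printf("24 bits: %.2f\n", expectedCollidingPairs($n, 1 << 24)); // prints "24 bits: 11.92"
```

Under these assumptions, about 12 colliding pairs would be expected for the second version, the same order of magnitude as the 8 found above, while the first version would usually show none at this sample size.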