However, the status does contain a mid field, and the URL can be computed from mid.
Before starting the calculation, it is worth explaining what base62 encoding is. It is simply a conversion between decimal and base-62 numbers. In base 62, the digits 0 to 9 keep their values, lowercase a stands for 10 and the lowercase letters continue up to z for 35, then uppercase A stands for 36 and the uppercase letters continue up to Z for 61. With this alphabet we can implement base62 encode and decode functions for decimal numbers. The following code is actually from StackOverflow:
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(num, alphabet=ALPHABET):
    """Encode a number in Base X

    `num`: the number to encode
    `alphabet`: the alphabet to use for encoding
    """
    if num == 0:
        return alphabet[0]
    arr = []
    base = len(alphabet)
    while num:
        rem = num % base
        num = num // base
        arr.append(alphabet[rem])
    arr.reverse()
    return ''.join(arr)

def base62_decode(string, alphabet=ALPHABET):
    """Decode a Base X encoded string into the number

    Arguments:
    - `string`: the encoded string
    - `alphabet`: the alphabet to use for encoding
    """
    base = len(alphabet)
    strlen = len(string)
    num = 0
    idx = 0
    for char in string:
        power = (strlen - (idx + 1))
        num += alphabet.index(char) * (base ** power)
        idx += 1
    return num
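To make the digit mapping concrete, here is a minimal sketch that expands a decimal number into its base-62 digit values before mapping them to characters. The helper name `base62_digits` is an assumption made for illustration and is not part of the StackOverflow code.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def base62_digits(num):
    """Return the base-62 digit values of num, most significant first.

    Hypothetical helper, for illustration only.
    """
    if num == 0:
        return [0]
    digits = []
    while num:
        # divmod yields the remaining quotient and the next (lowest) digit.
        num, rem = divmod(num, 62)
        digits.append(rem)
    return digits[::-1]


digits = base62_digits(1219149)
print(digits)                                # [5, 7, 9, 43]
print("".join(ALPHABET[d] for d in digits))  # 579H
```

The digit value 43 falls in the uppercase range (36 to 61), which is why it maps to H.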
Let's start with converting a URL to a mid. A Sina Weibo URL looks like this: http://weibo.com/2991905905/z579Hz9Wr. The number in the middle is the user's UID; the part that matters here is the trailing string "z579Hz9Wr". The calculation is actually very simple: working from the end of the string toward the front, split it into groups of four characters, which gives:
z
579H
z9Wr
Decode each group with base62, and the resulting decimal numbers are:
35
1219149
8379699
Concatenate them to get the mid: "3512191498379699". One point to emphasize: for every group except the first, if the decoded decimal number has fewer than 7 digits, it must be left-padded with zeros. For example, if the decoded numbers are 35, 33040, and 8906190, two zeros must be added in front of 33040, giving 3500330408906190.
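The grouping and padding steps can be traced with a short sketch. The helper names `decode` and `trace_url_to_mid` are assumptions made for illustration; the sketch returns each group together with its decoded, zero-padded value.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def decode(group):
    """Decode one base-62 group into an integer (illustrative helper)."""
    num = 0
    for char in group:
        num = num * 62 + ALPHABET.index(char)
    return num


def trace_url_to_mid(url):
    """Return (group, padded decimal) pairs, first group first."""
    # Cut groups of 4 from the back; the first group may be shorter.
    head = len(url) % 4 or 4
    groups = [url[:head]] + [url[i:i + 4] for i in range(head, len(url), 4)]
    steps = []
    for index, group in enumerate(groups):
        digits = str(decode(group))
        if index > 0:
            digits = digits.zfill(7)  # every group but the first is 7 digits wide
        steps.append((group, digits))
    return steps


for group, digits in trace_url_to_mid("z08AUBmUe"):
    print(group, "->", digits)
# z -> 35
# 08AU -> 0033040
# BmUe -> 8906190
```

Joining the right-hand column gives 3500330408906190, which matches the padding example with 33040.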
The code is as follows:
def url_to_mid(url):
    '''
    >>> url_to_mid('z0JH2lOMb')
    3501756485200075
    >>> url_to_mid('z0Ijpwgk7')
    3501703397689247
    >>> url_to_mid('z0IgABdSn')
    3501701648871479
    >>> url_to_mid('z08AUBmUe')
    3500330408906190
    >>> url_to_mid('z06qL6b28')
    3500247231472384
    >>> url_to_mid('yCtxn8IXR')
    3491700092079471
    >>> url_to_mid('yAt1n2xRa')
    3486913690606804
    '''
    # Reverse the string so groups of 4 can be sliced from the back.
    url = str(url)[::-1]
    size = len(url) // 4 if len(url) % 4 == 0 else len(url) // 4 + 1
    result = []
    for i in range(size):
        s = url[i * 4: (i + 1) * 4][::-1]
        s = str(base62_decode(s))
        s_len = len(s)
        # Every group except the leading one is padded to 7 digits.
        if i < size - 1 and s_len < 7:
            s = (7 - s_len) * '0' + s
        result.append(s)
    result.reverse()
    return int(''.join(result))
Converting a mid to a URL is just as simple: working from the end of the mid toward the front, split it into groups of 7 digits, encode each group with base62, and concatenate the results. Note again that for every 7-digit group except the first, if the base-62 result has fewer than 4 characters, it must be left-padded with 0.
def mid_to_url(midint):
    '''
    >>> mid_to_url(3501756485200075)
    'z0JH2lOMb'
    >>> mid_to_url(3501703397689247)
    'z0Ijpwgk7'
    >>> mid_to_url(3501701648871479)
    'z0IgABdSn'
    >>> mid_to_url(3500330408906190)
    'z08AUBmUe'
    >>> mid_to_url(3500247231472384)
    'z06qL6b28'
    >>> mid_to_url(3491700092079471)
    'yCtxn8IXR'
    >>> mid_to_url(3486913690606804)
    'yAt1n2xRa'
    '''
    # Reverse the digits so groups of 7 can be sliced from the back.
    midint = str(midint)[::-1]
    size = len(midint) // 7 if len(midint) % 7 == 0 else len(midint) // 7 + 1
    result = []
    for i in range(size):
        s = midint[i * 7: (i + 1) * 7][::-1]
        s = base62_encode(int(s))
        s_len = len(s)
        # Every group except the leading one is padded to 4 characters.
        if i < size - 1 and s_len < 4:
            s = '0' * (4 - s_len) + s
        result.append(s)
    result.reverse()
    return ''.join(result)
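As a hedged illustration of the padding rule, the sketch below splits a mid into groups of 7 digits and returns each group with its base-62 encoding. `encode` and `trace_mid_to_url` are hypothetical helper names, kept separate from the functions above.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def encode(num):
    """Encode a non-negative integer in base 62 (illustrative helper)."""
    if num == 0:
        return ALPHABET[0]
    chars = []
    while num:
        num, rem = divmod(num, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def trace_mid_to_url(mid):
    """Return (decimal group, padded base-62 group) pairs, first group first."""
    text = str(mid)
    # Cut groups of 7 digits from the back; the first group may be shorter.
    head = len(text) % 7 or 7
    groups = [text[:head]] + [text[i:i + 7] for i in range(head, len(text), 7)]
    steps = []
    for index, group in enumerate(groups):
        encoded = encode(int(group))
        if index > 0:
            encoded = encoded.rjust(4, "0")  # pad later groups to 4 characters
        steps.append((group, encoded))
    return steps


for group, encoded in trace_mid_to_url(3500330408906190):
    print(group, "->", encoded)
# 35 -> z
# 0033040 -> 08AU
# 8906190 -> BmUe
```

The middle group encodes to the three characters 8AU, so it picks up one leading 0 to reach four characters.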
Run doctest, and all of the test cases pass.
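One common way to run the tests is to call `doctest.testmod()` when the file is executed directly. The sketch below assumes the functions live in a single module; the one-function module shown here is only a stand-in for it.

```python
import doctest

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def base62_decode(string, alphabet=ALPHABET):
    """Decode a base-62 string into an integer.

    >>> base62_decode('z')
    35
    >>> base62_decode('579H')
    1219149
    >>> base62_decode('z9Wr')
    8379699
    """
    num = 0
    for char in string:
        num = num * len(alphabet) + alphabet.index(char)
    return num


if __name__ == "__main__":
    # Runs every docstring example in this module; prints nothing on success.
    doctest.testmod()
```

Running `python -m doctest -v yourfile.py` from the command line performs the same check with a verbose report; `yourfile.py` is a placeholder name.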
Finally, I don't quite understand why Sina Weibo does not simply include the URL in the returned fields. Sina Weibo's open platform has many other non-standard quirks as well. The content of this article is not technically deep; it just shows one of the hoops developers are made to jump through. There are other examples, such as the refresh token issue, which I won't list one by one here.