Converting between Sina Weibo mid and URL in Python (base 10 and base 62)

Source: Internet
Author: User
However, each status contains a mid field, and the URL can actually be computed from the mid.

Before starting the calculation, it helps to explain what base62 encoding is. It is simply conversion between decimal (base 10) and base 62. In base 62, the values 0 to 9 are written as the digits 0 to 9, 10 is the lowercase letter a and so on through the lowercase alphabet up to z for 35, then 36 is the uppercase letter A, up to 61 for the uppercase letter Z. With that, we can implement base62 encode and decode functions for decimal numbers. The following code actually comes from StackOverflow:

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(num, alphabet=ALPHABET):
    """Encode a number in Base X

    `num`: the number to encode
    `alphabet`: the alphabet to use for encoding
    """
    if num == 0:
        return alphabet[0]
    arr = []
    base = len(alphabet)
    # Collect digits from least significant to most significant.
    while num:
        rem = num % base
        num = num // base
        arr.append(alphabet[rem])
    arr.reverse()
    return ''.join(arr)

def base62_decode(string, alphabet=ALPHABET):
    """Decode a Base X encoded string into the number

    Arguments:
    - `string`: the encoded string
    - `alphabet`: the alphabet to use for encoding
    """
    base = len(alphabet)
    strlen = len(string)
    num = 0

    idx = 0
    for char in string:
        # The leftmost character carries the highest power of the base.
        power = (strlen - (idx + 1))
        num += alphabet.index(char) * (base ** power)
        idx += 1

    return num
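
As a quick sanity check on the positional arithmetic, the sketch below decodes one group by hand and compares it with the expanded sum. It is self-contained, so the alphabet is repeated. `decode_by_hand` is a hypothetical name, not part of the original code, and it uses Horner's rule rather than the explicit powers used above.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def decode_by_hand(text):
    """Decode a base-62 string with Horner's rule (hypothetical helper)."""
    value = 0
    for char in text:
        # Shift the running value one base-62 place, then add the new digit.
        value = value * 62 + ALPHABET.index(char)
    return value

# 'H' sits at index 43 in the alphabet, so
# "579H" = 5*62**3 + 7*62**2 + 9*62 + 43
expanded = 5 * 62**3 + 7 * 62**2 + 9 * 62 + 43
print(decode_by_hand("579H"), expanded)  # 1219149 1219149
```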

Let's start with converting a URL to a mid. A Sina Weibo URL looks like http://weibo.com/2991905905/z579Hz9Wr. The number in the middle is the user's UID; the important part is the string that follows it, "z579Hz9Wr". The calculation is quite simple: split the string into groups of four characters, working from the end toward the front, which gives:

z
579H
z9Wr

Decoding each group from base 62 gives the following decimal numbers:

35
1219149
8379699

Concatenate them to get the mid: "3512191498379699". One point to emphasize: for every group except the first, if the resulting decimal number has fewer than 7 digits, it must be padded with leading zeros. For example, if the decoded numbers are 35, 33040, and 8906190, two zeros must be added in front of 33040.
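
The grouping and padding rules can be sketched in isolation as follows. `split_from_end` is a hypothetical helper introduced only for illustration, and `str.zfill` stands in for the manual zero padding.

```python
def split_from_end(text, width):
    """Split text into chunks of `width` characters, counting from the end."""
    chunks = []
    while text:
        chunks.append(text[-width:])
        text = text[:-width]
    chunks.reverse()
    return chunks

print(split_from_end("z579Hz9Wr", 4))  # ['z', '579H', 'z9Wr']

# Every group except the first is zero-padded to 7 digits before joining.
decoded = [35, 33040, 8906190]
parts = [str(decoded[0])] + [str(n).zfill(7) for n in decoded[1:]]
print("".join(parts))  # 3500330408906190
```
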
The code is as follows:

def url_to_mid(url):
    '''
    >>> url_to_mid('z0JH2lOMb')
    3501756485200075
    >>> url_to_mid('z0Ijpwgk7')
    3501703397689247
    >>> url_to_mid('z0IgABdSn')
    3501701648871479
    >>> url_to_mid('z08AUBmUe')
    3500330408906190
    >>> url_to_mid('z06qL6b28')
    3500247231472384
    >>> url_to_mid('yCtxn8IXR')
    3491700092079471
    >>> url_to_mid('yAt1n2xRa')
    3486913690606804
    '''
    # Reverse the string so that groups of 4 can be taken from the end.
    url = str(url)[::-1]
    size = len(url) // 4 if len(url) % 4 == 0 else len(url) // 4 + 1
    result = []
    for i in range(size):
        s = url[i * 4: (i + 1) * 4][::-1]
        s = str(base62_decode(s))
        s_len = len(s)
        # Pad every group except the leading one to 7 digits.
        if i < size - 1 and s_len < 7:
            s = (7 - s_len) * '0' + s
        result.append(s)
    result.reverse()
    return int(''.join(result))

Converting a mid to a URL is just as simple: split the mid into groups of 7 digits, working from the end toward the front, encode each group with base62, and concatenate the results. Note again that for every group except the first, if the resulting base-62 string has fewer than 4 characters, it must be padded with leading 0 characters.

def mid_to_url(midint):
    '''
    >>> mid_to_url(3501756485200075)
    'z0JH2lOMb'
    >>> mid_to_url(3501703397689247)
    'z0Ijpwgk7'
    >>> mid_to_url(3501701648871479)
    'z0IgABdSn'
    >>> mid_to_url(3500330408906190)
    'z08AUBmUe'
    >>> mid_to_url(3500247231472384)
    'z06qL6b28'
    >>> mid_to_url(3491700092079471)
    'yCtxn8IXR'
    >>> mid_to_url(3486913690606804)
    'yAt1n2xRa'
    '''
    # Reverse the digits so that groups of 7 can be taken from the end.
    midint = str(midint)[::-1]
    size = len(midint) // 7 if len(midint) % 7 == 0 else len(midint) // 7 + 1
    result = []
    for i in range(size):
        s = midint[i * 7: (i + 1) * 7][::-1]
        s = base62_encode(int(s))
        s_len = len(s)
        # Pad every group except the leading one to 4 characters.
        if i < size - 1 and s_len < 4:
            s = '0' * (4 - s_len) + s
        result.append(s)
    result.reverse()
    return ''.join(result)
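
To see the reverse direction on the example from earlier, the sketch below walks the mid 3512191498379699 back to its URL string step by step. It is self-contained; `encode62` is a hypothetical helper written for this illustration, and the slicing is an alternative to the reverse-and-slice approach used in the function above.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode62(num):
    """Base-62 encode a non-negative integer (hypothetical helper)."""
    if num == 0:
        return ALPHABET[0]
    chars = []
    while num:
        num, rem = divmod(num, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

mid = "3512191498379699"
# Split into 7-digit groups from the end; only the first group may be shorter.
head_len = len(mid) % 7 or 7
groups = [mid[:head_len]] + [mid[i:i + 7] for i in range(head_len, len(mid), 7)]
# Every group after the first is left-padded with '0' to 4 characters.
pieces = [encode62(int(groups[0]))]
pieces += [encode62(int(g)).rjust(4, "0") for g in groups[1:]]
print(groups)            # ['35', '1219149', '8379699']
print("".join(pieces))   # z579Hz9Wr
```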

Running doctest shows that all of the test cases pass.
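
One common way to run the doctests is to call `doctest.testmod()` at the bottom of the file. The sketch below is a minimal, self-contained illustration using a shortened decode function with its own two examples; it is an assumption about how the tests might be wired up, not the original author's setup.

```python
import doctest

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_decode(string, alphabet=ALPHABET):
    """Decode a base-62 string (shortened variant for illustration).

    >>> base62_decode('z')
    35
    >>> base62_decode('579H')
    1219149
    """
    num = 0
    for char in string:
        num = num * len(alphabet) + alphabet.index(char)
    return num

if __name__ == "__main__":
    # Runs every docstring example in this module and reports failures.
    results = doctest.testmod()
    print(results)
```

Alternatively, `python -m doctest -v` followed by the file name runs the same examples from the command line without the `__main__` block.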

Finally, I don't quite understand why Sina Weibo does not simply include the URL as a field. Sina Weibo's open platform has many other non-standard quirks as well. There is nothing technically deep in this article; the conversion just makes developers jump through an extra hoop. Other examples include the refresh token issue, which I won't enumerate one by one here.
