Today I noticed that Baidu and Google encode URLs differently.
For example, search for the Chinese word 技术 ("technology") and then look at the IE address bar.
The result from Baidu is:
http://www.baidu.com/s?wd=%BC%BC%CA%F5&cl=3
And the result from Google?
http://www.google.com/search?hl=zh-CN&q=%E6%8A%80%E6%9C%AF&lr=
That is, baidu_urlencode("技术") = %BC%BC%CA%F5, while google_urlencode("技术") = %E6%8A%80%E6%9C%AF
So which character encodings do the two actually use?
Let's bring in our lovely Python to help solve the problem.
>>> import urllib
>>> url = urllib.unquote('http://www.baidu.com/s?wd=%BC%BC%CA%F5&cl=3')
>>> url
'http://www.baidu.com/s?wd=\xbc\xbc\xca\xf5&cl=3'
>>> print url.decode('gb2312')
http://www.baidu.com/s?wd=技术&cl=3
>>>
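The session above is Python 2, where `urllib.unquote` returns a byte string. As a rough sketch of the same check in Python 3 (assuming the standard `urllib.parse` module; the variable names are only illustrative), percent-decoding and character decoding can be kept as two separate steps:

```python
from urllib.parse import unquote, unquote_to_bytes

baidu_url = "http://www.baidu.com/s?wd=%BC%BC%CA%F5&cl=3"

# Step 1: percent-decoding alone yields raw bytes; no character
# encoding has been assumed yet.
raw = unquote_to_bytes(baidu_url)
print(raw)  # b'http://www.baidu.com/s?wd=\xbc\xbc\xca\xf5&cl=3'

# Step 2: interpret those bytes as GB2312 to recover the search term.
print(raw.decode("gb2312"))  # http://www.baidu.com/s?wd=技术&cl=3

# unquote() performs both steps at once when told which encoding to use.
decoded = unquote(baidu_url, encoding="gb2312")
print(decoded)
```

Note that `unquote` defaults to UTF-8 with `errors='replace'`, so calling it on this URL without the `encoding` argument would silently produce replacement characters instead of raising an error.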
Clearly, Baidu encodes its URLs as gb2312. What about Google? Let's follow the same recipe.
>>> url2 = urllib.unquote('http://www.google.com/search?hl=zh-CN&q=%E6%8A%80%E6%9C%AF&lr=')
>>> url2
'http://www.google.com/search?hl=zh-CN&q=\xe6\x8a\x80\xe6\x9c\xaf&lr='
>>> print url2.decode('gb2312')
Traceback (most recent call last):
  File "<input>", line 1, in ?
UnicodeDecodeError: 'gb2312' codec can't decode bytes in position 40-41: illegal multibyte sequence
Unfortunately, that raises an error: gb2312 is the wrong encoding for these bytes. Let's try something else, maybe 'utf-8':
>>> print url2.decode('utf-8')
http://www.google.com/search?hl=zh-CN&q=技术&lr=
Yes! This shows that Google encodes its URLs as UTF-8.
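The trial-and-error above can be automated. The following is a minimal Python 3 sketch, not a robust detector: `guess_query_encoding` is a hypothetical helper, and the candidate list is an assumption limited to the two encodings seen in this post. It percent-decodes the URL to bytes and returns the first candidate that decodes without error, then checks the result in the other direction with `urllib.parse.quote`.

```python
from urllib.parse import quote, unquote_to_bytes


def guess_query_encoding(url, candidates=("utf-8", "gb2312")):
    """Return the first candidate encoding that decodes the URL's
    percent-decoded bytes without error, or None if none does."""
    raw = unquote_to_bytes(url)
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
        return name
    return None


baidu_url = "http://www.baidu.com/s?wd=%BC%BC%CA%F5&cl=3"
google_url = "http://www.google.com/search?hl=zh-CN&q=%E6%8A%80%E6%9C%AF&lr="

print(guess_query_encoding(baidu_url))   # gb2312
print(guess_query_encoding(google_url))  # utf-8

# Cross-check: encoding the search term reproduces both query values.
print(quote("技术", encoding="gb2312"))  # %BC%BC%CA%F5
print(quote("技术", encoding="utf-8"))   # %E6%8A%80%E6%9C%AF
```

A successful decode is only evidence, not proof: some byte sequences are valid in more than one encoding, which is why the stricter UTF-8 is tried first here.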
Python is so cool!