My Brush with Python 3 (III): Encoding Problems Again, and urllib.parse.unquote

Source: Internet
Author: User

When I first learned Python by writing crawlers, I often ran into encoding problems (in fact, Python 3 has far fewer of them...), and the requests library made them easy to handle. Recently, some fellow programmers who are learning Python with me built an e-book website and wanted a crawler for it, so I gave it a try... Naturally, the first thing I hit was an encoding problem, and this time requests was no help.

I noticed that after a search on the novel website, the browser jumps to a URL of this form: http://so.biquge.la/cse/search?s=7138806708853866527&q=%CD%EA%C3%C0%CA%C0%BD%E7

Searching for different content changes only the part after &q=. At first I thought it was encrypted (well, I really am a beginner...). An expert told me it is just an encoding, namely percent-encoding, and that I should use urllib.parse.unquote (in Python 2 it is urllib.unquote).

In Python 3 it looks like this:

    from urllib import parse

    city = parse.unquote('%E5%B1%B1%E8%A5%BF')  # encoding='utf-8' is the default
    print(city)  # 山西 (Shanxi)
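To see what unquote does here, it can help to split the work into two steps. The sketch below uses only the standard library: unquote_to_bytes turns the percent escapes into raw bytes, and bytes.decode turns those bytes into text. The variable names are only for illustration.

```python
from urllib import parse

escaped = '%E5%B1%B1%E8%A5%BF'

# Step 1: each %XX escape becomes one raw byte.
raw = parse.unquote_to_bytes(escaped)
print(raw)  # b'\xe5\xb1\xb1\xe8\xa5\xbf'

# Step 2: the bytes are decoded to text with the chosen encoding.
city = raw.decode('utf-8')
print(city)  # 山西

# unquote performs both steps in one call, decoding as UTF-8 by default.
assert city == parse.unquote(escaped)
```

The encoding only matters in the second step, which is why the same escapes can produce readable text or garbage depending on which encoding is passed.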

The Shanxi example came from someone I consulted, and it runs perfectly. But when I applied the same pattern to my own case, the output was garbled. After checking, I found that the result depends on the encoding of the web page (the sample string above was also taken from a web page). The page in the example is encoded in UTF-8, while the novel website I wanted to parse uses GBK. So I changed the code as follows:

    name = parse.unquote('%ce%e4%b6%af%c7%ac%c0%a4', encoding='gb18030')  # 'gbk' also works
    print(name)  # 武动乾坤 (Wu Dong Qian Kun)

In other words, the first example relies on the default, encoding='utf-8'. (PS: For GBK and GB18030, refer to this article.)
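The garbled output, rather than an exception, comes from the other default: unquote decodes with errors='replace'. The sketch below shows that behaviour and then a hypothetical helper, unquote_guess (not part of urllib), that tries candidate encodings with errors='strict' and keeps the first one that fits. The candidate list and its order are assumptions.

```python
from urllib import parse

escaped = '%ce%e4%b6%af%c7%ac%c0%a4'  # GBK bytes from the example above

# Default call: UTF-8 with errors='replace', so invalid bytes turn into
# U+FFFD replacement characters instead of raising an exception.
garbled = parse.unquote(escaped)
print('\ufffd' in garbled)  # True


def unquote_guess(text, encodings=('utf-8', 'gb18030')):
    """Hypothetical helper: return (decoded, encoding) for the first
    candidate encoding that decodes without errors."""
    for enc in encodings:
        try:
            return parse.unquote(text, encoding=enc, errors='strict'), enc
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the candidate encodings fit: %r' % (encodings,))


name, used = unquote_guess(escaped)
print(name, used)  # 武动乾坤 gb18030
```

UTF-8 is tried first because it rejects most byte sequences that were produced by another encoding, whereas GB18030 accepts many of them. This is a heuristic, not a reliable detector; when the page declares its encoding, that declaration is the better source.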

Now that decoding works, the natural next question is how to encode the text back. Here is the reverse operation:

    x = parse.quote('武动乾坤', encoding='gb18030')
    print(x)

Output Result:

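    %CE%E4%B6%AF%C7%AC%C0%A4

It is as simple as I imagined: just change unquote to quote. Note that quote emits uppercase hex digits, while unquote accepts either case, so this is the same string that was decoded earlier.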
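Putting both directions together for the search URL observed earlier, here is a minimal sketch that reads the keyword out of the URL and builds the same URL back, without sending any request. The helper names build_search_url and read_keyword are hypothetical, and the sketch assumes the site keeps using GBK and the same s value.

```python
from urllib import parse

# Base URL and the "s" value are copied from the search URL observed above.
SEARCH_BASE = 'http://so.biquge.la/cse/search'
SITE_ID = '7138806708853866527'


def build_search_url(keyword, encoding='gbk'):
    """Hypothetical helper: build the search URL for a keyword."""
    query = parse.urlencode({'s': SITE_ID, 'q': keyword}, encoding=encoding)
    return SEARCH_BASE + '?' + query


def read_keyword(url, encoding='gbk'):
    """Hypothetical helper: recover the keyword from a search URL."""
    query = parse.urlsplit(url).query
    return parse.parse_qs(query, encoding=encoding)['q'][0]


observed = ('http://so.biquge.la/cse/search'
            '?s=7138806708853866527&q=%CD%EA%C3%C0%CA%C0%BD%E7')
keyword = read_keyword(observed)
print(keyword)  # 完美世界
print(build_search_url(keyword) == observed)  # True
```

urlencode and parse_qs both accept an encoding argument, so there is no need to call quote and unquote by hand for each parameter. One difference to keep in mind: urlencode uses quote_plus by default, so a space in the keyword becomes + rather than %20.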

With this, I understand encoding problems a little better. Of course, there is still a long way to go!

Finally, thanks to two experts in the group for their help: @irvine-song before the waste Emperor and @Fujian-Tianya.
