This article shows how to URL-encode Chinese characters in Python and uses examples to explain how the GBK and UTF-8 encodings of the same text lead to different URL-encoded results. The details are as follows:
Today I was working with Baidu Tieba posts. I wanted to keep a keyword list so that new keywords could simply be appended whenever they were needed. However, when a Chinese keyword such as '丽江' (Lijiang) is put into a URL, it has to appear in percent-encoded form, for example '%E4%B8%BD%E6%B1%9F', so a conversion is required. Here we use the urllib module (Python 2).
>>> import urllib
>>> data = '丽江'
>>> print data
丽江
>>> data
'\xe4\xb8\xbd\xe6\xb1\x9f'
>>> urllib.quote(data)
'%E4%B8%BD%E6%B1%9F'
What if we want to convert it back?
>>> urllib.unquote('%E4%B8%BD%E6%B1%9F')
'\xe4\xb8\xbd\xe6\xb1\x9f'
>>> print urllib.unquote('%E4%B8%BD%E6%B1%9F')
丽江
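The sessions above are Python 2. As a minimal sketch, assuming the reader is on Python 3 instead, the same round trip might look like the following. In Python 3 the two functions live in urllib.parse and work on str, using UTF-8 by default; the variable names are only illustrative.

```python
from urllib.parse import quote, unquote

text = "丽江"

# quote() encodes a str as UTF-8 by default, then percent-escapes each byte.
encoded = quote(text)
print(encoded)

# unquote() reverses the escaping and decodes the bytes as UTF-8 by default.
decoded = unquote(encoded)
print(decoded)
```

Because Python 3 strings are already Unicode, no explicit decode step is needed before quoting.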
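You may notice that the Tieba URL actually contains %C0%F6%BD%AD rather than '%E4%B8%BD%E6%B1%9F'. This is an encoding difference: Baidu uses GBK, while most other websites, such as Google, use UTF-8. The same characters map to different bytes in each encoding, so the percent-encoded result differs as well. You can therefore use the following statements, which first decode the input from the terminal's encoding to Unicode and then encode it to the target encoding before quoting.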
>>> import sys, urllib
>>> s = '丽江'
>>> urllib.quote(s.decode(sys.stdin.encoding).encode('gbk'))
'%C0%F6%BD%AD'
>>> urllib.quote(s.decode(sys.stdin.encoding).encode('utf8'))
'%E4%B8%BD%E6%B1%9F'
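For comparison, here is a rough Python 3 sketch of the same GBK versus UTF-8 conversion. It assumes Python 3, where urllib.parse.quote and urllib.parse.unquote accept an encoding argument, so the manual decode and encode steps are not needed. The helper name quote_with_encoding is hypothetical and not part of the original article.

```python
from urllib.parse import quote, unquote


def quote_with_encoding(text: str, encoding: str) -> str:
    """Percent-encode text using the bytes of the given codec (hypothetical helper)."""
    return quote(text, encoding=encoding)


keyword = "丽江"

# Same characters, different bytes, therefore different percent-escapes.
gbk_form = quote_with_encoding(keyword, "gbk")
utf8_form = quote_with_encoding(keyword, "utf-8")
print(gbk_form)
print(utf8_form)

# Decoding has to use the codec that produced the escapes.
print(unquote(gbk_form, encoding="gbk"))
```

Note that decoding the GBK form with the default UTF-8 codec would not recover the original text, which is why the encoding is passed explicitly in both directions.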