Two Solutions for Garbled Chinese Output with Python BeautifulSoup

Source: Internet
Author: User

Workaround One:

I used Python's BeautifulSoup to crawl a page and print its title, but the output was always garbled. It took a long time to find a fix, so I am sharing it here.
First, the code:


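# Python 2 (urllib2 and the print statement do not exist in Python 3)
from bs4 import BeautifulSoup
import urllib2

url = 'http://www.jb51.net/'
page = urllib2.urlopen(url)

# Tell BeautifulSoup which encoding to assume for the raw bytes
soup = BeautifulSoup(page, from_encoding="utf8")
print soup.original_encoding
# Encode the title for a GBK console before printing
print soup.title.encode('gb18030')

# Write the title to a file; str() yields an encoded byte string
f = open("title.txt", "w")
f.write(str(soup.title))
f.close()

# Print every link target; href=True skips <a> tags without an href
for link in soup.find_all('a', href=True):
    print link['href']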

During testing I found that although the console output was garbled, the text written to the file was fine. I then found the explanation online.
How print handles an object: internally it calls the object's __str__ to get the corresponding string, here soup's __str__. The soup itself already holds its text as Unicode, and in Python 2 bs4's __str__ encodes it as UTF-8 by default, which is why the file looks right while a GBK console shows garbage. You can instead choose the output encoding yourself, for example GBK, and the Chinese then displays correctly.
As for cmd: on a Chinese-language Windows system the console is encoded as GBK, so encoding the output as GB18030 (a superset of GBK) makes it display normally.
This is the line that does it:

print soup.title.encode('gb18030')
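
The mismatch described above can be reproduced without fetching a page. What follows is a minimal Python 3 sketch (the article's own code is Python 2), not the author's code: the sample string and the helper name show_as_console are hypothetical, introduced only for illustration. It decodes bytes the way a GBK console would, to show that UTF-8 bytes come out garbled while GB18030 bytes come out intact.

```python
# Hypothetical sample text standing in for the page title.
title = "中文标题"


def show_as_console(data, console_encoding="gbk"):
    """Decode bytes the way a console using console_encoding would render them."""
    return data.decode(console_encoding, errors="replace")


utf8_bytes = title.encode("utf-8")    # what a UTF-8 default output produces
gb_bytes = title.encode("gb18030")    # what the workaround produces

garbled = show_as_console(utf8_bytes)
readable = show_as_console(gb_bytes)

print(garbled == title)   # UTF-8 bytes misread as GBK do not match the title
print(readable == title)  # GB18030 bytes read as GBK do match
```

The same reasoning explains why the file looked fine: an editor that reads the file as UTF-8 decodes the bytes with the encoding they were written in.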

Workaround Two:

When BeautifulSoup parses a UTF-8-encoded web page, the Chinese text comes out garbled if you do not specify fromEncoding, or if you specify fromEncoding as UTF-8.

The workaround is to set the fromEncoding parameter of the BeautifulSoup constructor to GB18030. Note that fromEncoding and originalEncoding are the BeautifulSoup 3 spellings; bs4 uses from_encoding and original_encoding, as in Workaround One.


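# Python 2 with BeautifulSoup 3 (the "BeautifulSoup" package, not bs4)
import urllib2
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen('http://www.jb51.net/')
# Force the parser to treat the raw bytes as GB18030
soup = BeautifulSoup(page, fromEncoding="GB18030")
print soup.originalEncoding
print soup.prettify()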
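
One case where this workaround helps is when the page bytes are really in a GB-family encoding. GB18030 is a superset of GB2312 and GBK, so bytes in either older encoding decode correctly under it. The following is a minimal Python 3 sketch using only the standard library; the sample text is a made-up stand-in for page content, and the assumption that the bytes are GB-encoded rather than UTF-8 is mine, not something the original states.

```python
# Hypothetical sample standing in for the text of a Chinese web page.
sample = "中文网页"

# Simulate raw page bytes in the two older GB-family encodings.
raw_gb2312 = sample.encode("gb2312")
raw_gbk = sample.encode("gbk")

# Decoding either one as GB18030 recovers the original text.
decoded_from_gb2312 = raw_gb2312.decode("gb18030")
decoded_from_gbk = raw_gbk.decode("gb18030")

# Decoding the same bytes as UTF-8 fails, which is the kind of mismatch
# that surfaces as garbled output when the wrong encoding is assumed.
try:
    raw_gb2312.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(decoded_from_gb2312 == sample, decoded_from_gbk == sample, utf8_ok)
```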
