Python BeautifulSoup: two ways to fix garbled Chinese output

Source: Internet
Author: User
Workaround One:

I used Python's BeautifulSoup to fetch a page and print its title, but the output was always garbled. It took a long time to find a solution, so I am sharing it here.
First, the code:


# Python 2 (uses urllib2 and the print statement)
from bs4 import BeautifulSoup
import urllib2

url = 'http://www.jb51.net/'
page = urllib2.urlopen(url)

soup = BeautifulSoup(page, from_encoding="utf8")
print soup.original_encoding
# Re-encode the title for a GBK console
print (soup.title).encode('gb18030')

# Write the title to a file
file = open("title.txt", "w")
file.write(str(soup.title))
file.close()

# Print the target of every link on the page
for link in soup.find_all('a'):
    print link['href']


At first, testing showed that although the console output was garbled, the text written to the file was fine. A search online turned up the explanation.
How print handles an object: it calls the object's __str__ to get the corresponding string, which here means soup's __str__. Internally, soup holds Unicode, so you can specify the encoding to use at output time (GBK, for example), and the Chinese text is then displayed correctly instead of garbled.
As for cmd: on a Chinese system it uses GBK, so re-encoding the output as GB18030 (a superset of GBK) lets it display normally.
This is the line of code that does it:


print (soup.title).encode('gb18030')
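
The effect described above can be reproduced without a network connection or a real console. The following is a minimal sketch, written for Python 3 and using only the standard library. The sample title string is an assumption made for illustration, and decoding the bytes as GBK only imitates what a GBK console would do with them.

```python
# -*- coding: utf-8 -*-
# Sketch for Python 3. The sample title is an assumption, not taken from a real page.
title = u"脚本之家"

utf8_bytes = title.encode("utf-8")    # bytes a UTF-8 string would send to the console
gb_bytes = title.encode("gb18030")    # bytes sent after re-encoding as in Workaround One

# Imitate a GBK console: it interprets whatever bytes it receives as GBK.
shown_from_utf8 = utf8_bytes.decode("gbk", errors="replace")
shown_from_gb = gb_bytes.decode("gbk")

print(shown_from_utf8 == title)   # UTF-8 bytes read as GBK do not match the title
print(shown_from_gb == title)     # GB18030 bytes read as GBK match the title
```

The first comparison is False and the second is True, which is why re-encoding to GB18030 before printing removes the garbling on a GBK console.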

Workaround Two:

When BeautifulSoup parses a UTF-8-encoded web page, Chinese text comes out garbled if you do not specify fromEncoding, or if you specify fromEncoding as UTF-8.

The workaround is to set the fromEncoding parameter of the BeautifulSoup constructor to GB18030.
The code is as follows:


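# Python 2, using the older BeautifulSoup 3 package
# (its parameter is spelled fromEncoding, not from_encoding as in bs4)
import urllib2
from BeautifulSoup import BeautifulSoup

page = urllib2.urlopen('http://www.jb51.net/')
soup = BeautifulSoup(page, fromEncoding="GB18030")
print soup.originalEncoding
print soup.prettify()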
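
For readers on the newer bs4 package, the same idea can be tried on an in-memory document instead of a live page. This is only a sketch under stated assumptions: it targets Python 3 with bs4 installed, the sample markup is made up for illustration, and it uses bs4's from_encoding parameter and the built-in "html.parser" rather than the BeautifulSoup 3 call shown above.

```python
# -*- coding: utf-8 -*-
# Sketch for Python 3 with bs4 installed. The markup below is a made-up sample.
from bs4 import BeautifulSoup

# Bytes as a GB-encoded server might send them.
raw = u"<html><head><title>脚本之家</title></head><body></body></html>".encode("gb18030")

# Tell BeautifulSoup which encoding to try first when decoding the bytes.
soup = BeautifulSoup(raw, "html.parser", from_encoding="gb18030")

title_text = soup.title.string
print(soup.original_encoding)   # the encoding BeautifulSoup ended up using
print(title_text == u"脚本之家")
```

Once the bytes have been decoded with the right encoding, soup.title.string is an ordinary Unicode string, and the remaining question is only how to encode it for the console, as in Workaround One.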
