Reading the BeautifulSoup official documentation: printing the HTML tree

Source: Internet
Author: User

prettify() returns a nicely formatted Unicode string of the HTML, with each tag and each string on its own line:

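    from bs4 import BeautifulSoup

    markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
    soup = BeautifulSoup(markup)
    soup.prettify()
    # '<html>\n <head>\n </head>\n <body>\n  <a href="http://example.com/">\n...'

    print(soup.prettify())
    # <html>
    #  <head>
    #  </head>
    #  <body>
    #   <a href="http://example.com/">
    #    I linked to
    #    <i>
    #     example.com
    #    </i>
    #   </a>
    #  </body>
    # </html>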
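For anyone on Python 3, here is a minimal sketch of the same call. It assumes the bs4 package is installed, and it passes "html.parser" explicitly, which is my own choice and not something the snippet above specifies. With that parser no <html> or <body> wrapper is added, so only the <a> tag and its contents are printed.

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'

# "html.parser" is the parser bundled with Python (an assumption of this sketch).
soup = BeautifulSoup(markup, "html.parser")

# prettify() returns a str in which every tag and every string
# sits on its own line, indented by nesting depth.
pretty = soup.prettify()
print(pretty)
```

prettify() is meant for reading the tree by eye. It adds whitespace to the document, so it is not a good way to produce HTML that will be parsed again.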

If you just want a string representing the HTML and don't care about its formatting, you can call str() or unicode() on a BeautifulSoup object or on a Tag inside it. Here (under Python 2) str() returns a string encoded as UTF-8; you can also call encode() to get a bytestring or decode() to get Unicode.

    str(soup)
    # '<html><head></head><body><a href="http://example.com/">I linked to <i>example.com</i></a></body></html>'

    unicode(soup.a)
    # u'<a href="http://example.com/">I linked to <i>example.com</i></a>'
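Under Python 3 there is no unicode() built-in, and str() already gives text rather than bytes. The sketch below shows how I would get each form there; it assumes bs4 is installed and uses "html.parser", and the variable names are only for illustration.

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")

# In Python 3, str() on a tag returns text (a str), with no pretty-printing.
as_text = str(soup.a)

# encode() on a tag returns a bytestring in the requested encoding.
as_bytes = soup.a.encode("utf-8")

# Decoding the bytes gets back the same text.
round_trip = as_bytes.decode("utf-8")

print(as_text)
print(type(as_bytes).__name__)
```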

I only skimmed some of the other details. The last thing worth mentioning is get_text(), which returns all the text inside the tag it is called on, including the text of its descendants, as a single Unicode string:

    markup = '<a href="http://example.com/">\nI linked to <i>example.com</i>\n</a>'
    soup = BeautifulSoup(markup)

    soup.get_text()
    # u'\nI linked to example.com\n'
    soup.i.get_text()
    # u'example.com'

You can also pass it a string argument, which is used as the separator between the pieces of text:

    soup.get_text("|")
    # u'\nI linked to |example.com|\n'

You can also set strip=True to remove the whitespace at the beginning and end of each piece of text (note that each piece is stripped individually, not the result as a whole):

    soup.get_text("|", strip=True)
    # u'I linked to|example.com'
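To make the "each piece, not the whole" point concrete, here is a small Python 3 sketch that compares stripping the final string with passing strip=True. The list markup is made up for the illustration, and "html.parser" is again my assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each list item has padding around its text.
markup = "<ul><li> apples </li><li> pears </li></ul>"
soup = BeautifulSoup(markup, "html.parser")

# Stripping the joined result only trims the two outer ends;
# the spaces next to the separator stay.
whole_stripped = soup.get_text("|").strip()

# strip=True trims every piece before the pieces are joined.
pieces_stripped = soup.get_text("|", strip=True)

print(repr(whole_stripped))   # 'apples | pears'
print(repr(pieces_stripped))  # 'apples|pears'
```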

Of course, in this situation you can also use the .stripped_strings generator mentioned earlier and process the text yourself (if you don't remember it, see the previous article):

    [text for text in soup.stripped_strings]
    # [u'I linked to', u'example.com']
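As a quick check of how the two approaches relate, this Python 3 sketch collects the stripped strings and joins them by hand, then compares the result with get_text() using the same separator. It assumes bs4 is installed and uses "html.parser"; the variable names are mine.

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">\nI linked to <i>example.com</i>\n</a>'
soup = BeautifulSoup(markup, "html.parser")

# stripped_strings yields each piece of text with surrounding whitespace
# removed, skipping pieces that are whitespace only.
pieces = list(soup.stripped_strings)

# Joining the pieces by hand gives the same text as get_text with strip=True.
joined = "|".join(pieces)
same_as_get_text = joined == soup.get_text("|", strip=True)

print(pieces)
print(same_as_get_text)
```

Working with the generator is handy when you want to filter or transform the pieces before joining them, which get_text() alone does not let you do.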

At this point I have read about 70% of the documentation. I think this is enough for my current needs, so I will stop here for now.
