prettify() returns a Unicode string with the HTML nicely formatted, one tag or string per line:
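from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup)
soup.prettify()
# '<html>\n <head>\n </head>\n <body>\n  <a href="http://example.com/">\n...'

print(soup.prettify())
# <html>
#  <head>
#  </head>
#  <body>
#   <a href="http://example.com/">
#    I linked to
#    <i>
#     example.com
#    </i>
#   </a>
#  </body>
# </html>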
But if you just want a string representing the HTML and don't care about its formatting, you can use str() or unicode() ... Here (in Python 2) str() returns a string encoded as UTF-8; you can call decode() on it to get Unicode, or call encode() on a Unicode string to get a bytestring.
str(soup)
# '<html><head></head><body><a href="http://example.com/">I linked to <i>example.com</i></a></body></html>'

unicode(soup.a)
# u'<a href="http://example.com/">I linked to <i>example.com</i></a>'
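The examples in this post are Python 2, where str() gives a UTF-8 bytestring and unicode() gives text. In Python 3 the names are different: str is the text type and bytes is the bytestring type. Here is a minimal standard-library sketch of the encode/decode round trip described above. The markup string is a hypothetical one I made up with a single non-ASCII character so the difference shows; it is not from the official document.

```python
# Hypothetical snippet with one non-ASCII character (é).
text = "<p>café</p>"

# encode() turns text (str) into a UTF-8 bytestring (bytes).
data = text.encode("utf-8")

# decode() turns the bytestring back into text.
round_trip = data.decode("utf-8")

print(type(text).__name__, len(text))  # str 11
print(type(data).__name__, len(data))  # bytes 12 (é takes two bytes in UTF-8)
print(round_trip == text)              # True
```

The same two calls apply to whatever string you get back from the soup, as far as I can tell.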
There are some other details I won't go into here. Finally, I'll mention get_text(), which returns all the text inside the tag it is called on (including its children) as a single Unicode string ...
markup = '<a href="http://example.com/">\nI linked to <i>example.com</i>\n</a>'
soup = BeautifulSoup(markup)

soup.get_text()
# u'\nI linked to example.com\n'
soup.i.get_text()
# u'example.com'
You can also pass it a string argument, which is used as a separator between the pieces of text.
soup.get_text("|")
# u'\nI linked to |example.com|\n'
You can also set strip=True to remove the whitespace at the beginning and end of each piece of text (note: each piece separately, not the text as a whole).
soup.get_text("|", strip=True)
# u'I linked to|example.com'
Of course, in this case you can also use the stripped_strings generator mentioned earlier; if you don't remember it, go back and read the previous article ...
[text for text in soup.stripped_strings]
# [u'I linked to', u'example.com']
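For anyone on Python 3, here is a small sketch that puts the get_text() variants and stripped_strings side by side. It assumes the bs4 package is installed, and I pass "html.parser" explicitly (my choice, not something the examples above do) so that no extra parser library is needed. The outputs no longer carry the u prefix.

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">\nI linked to <i>example.com</i>\n</a>'
# Assumption: use the parser that ships with Python.
soup = BeautifulSoup(markup, "html.parser")

plain = soup.get_text()                    # all text joined with no separator
piped = soup.get_text("|")                 # "|" placed between the pieces
stripped = soup.get_text("|", strip=True)  # each piece stripped, empty ones dropped
pieces = list(soup.stripped_strings)       # the stripped pieces as a list

print(repr(plain))     # '\nI linked to example.com\n'
print(repr(piped))     # '\nI linked to |example.com|\n'
print(repr(stripped))  # 'I linked to|example.com'
print(pieces)          # ['I linked to', 'example.com']
```

If you want to process the pieces yourself, stripped_strings seems the handier of the two, because you get a list-like sequence instead of one joined string.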
At this point I have read about 70% of the document. I feel that this is enough for my current needs, so I won't keep reading further ...