In [1]: from BS4 import BeautifulSoup
In [2]: s = "' <div class=" Markdown_views ">
...: <p>beautifulsoup is a library of Python, the main function is from the Web page
...: Crawl the data we need. BeautifulSoup parsing HTML into objects for processing, all page transitions
...: For a dictionary or an array, the relative hermetical of the expression means that the process can be greatly simplified. </p>
...:
...:
...:
...: <p> recommended install BeautifulSoup 4 version using PIP to install:</p> "
In [4]: bs = BeautifulSoup (S, "Html.parser")
In [5]: print (Bs.text)
---------------------------------------------------------------------------
Unicodeencodeerror Traceback (most recent)
<ipython-input-5-0ea5f8e54d3a> in <module> ()
----> 1 print (bs.text)
Unicodeencodeerror: ' ASCII ' codec can ' t encode character U ' \u662f ' in position 14:ordinal not in range (128)
in [+]: Import sys
in [+]: Reload (sys
...: )
<module ' sys ' (built-in) >
in [+]: sys.setdefaultencoding (' Utf-8 ')
in [+]: bs = BeautifulSoup (S, "Html.parser")
in [+]: print (Bs.text)
BeautifulSoup is a library of Python, and the main function is to crawl the data we need from the Web. BeautifulSoup parsing HTML into an object for processing, all pages into a dictionary or array, relative to the hermetical expression, can greatly simplify the process.
0X01 Installation
It is recommended to install the BeautifulSoup 4 version using PIP for installation:
In [20]:
BeautifulSoup remove tags from html to get text