What are the commonly used Python libraries?
Python's common libraries are hard for a programmer to put down. They are that addictive, so today we are going to sort them out.
If you have a library to add or a correction to suggest, you are welcome to leave a comment or send me a private message, and I will add or modify the relevant content.
First up is BeautifulSoup. When I first started working with crawlers, this was the library everyone strongly recommended, and it did turn out to be really good. Still, a programmer can hardly stop at a single library, haha.
Installing these libraries is not covered here; refer to Anaconda. -_-.
The following sample is used for testing in the examples.
html = "" "
BeautifulSoup
Beautiful Soup supports the HTML parser in the Python standard library and also supports some third-party parsers, for example lxml (HTML), lxml (XML), and html5lib. If none of these third-party libraries is installed, it uses the parser built into the Python standard library.
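As a minimal sketch of choosing a parser, the snippet below tries a third-party parser and falls back to the standard-library one. The helper name make_soup and the sample markup are hypothetical, introduced only for this illustration.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample markup, used only for this illustration.
sample = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"


def make_soup(markup, preferred="lxml"):
    """Parse with the preferred parser, else fall back to html.parser."""
    try:
        return BeautifulSoup(markup, preferred)
    except FeatureNotFound:
        # Raised when the requested parser library is not installed.
        return BeautifulSoup(markup, "html.parser")


soup = make_soup(sample)
print(soup.title.string)
```

Naming the parser explicitly keeps results consistent across machines, since different parsers can build slightly different trees from the same malformed markup.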
There are two ways to create a BeautifulSoup object.
1. soup = BeautifulSoup(html, 'html.parser')  # create from the contents of a variable
2. soup = BeautifulSoup(open('mysite.html'), 'html.parser')  # create from a local file
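The two ways of creating the object can be sketched as follows. The markup and the temporary mysite.html file are assumptions made so that the example runs on its own. Note that the file is passed as an open file handle; passing only the file name as a string would parse the name itself as markup.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical markup, used only for this illustration.
html = "<html><body><p class='intro'>Hello</p></body></html>"

# 1. Create the object from a string held in a variable.
soup_from_string = BeautifulSoup(html, "html.parser")

# 2. Create the object from a local file by passing an open file handle.
path = os.path.join(tempfile.mkdtemp(), "mysite.html")
with open(path, "w", encoding="utf-8") as f:
    f.write(html)
with open(path, encoding="utf-8") as f:
    soup_from_file = BeautifulSoup(f, "html.parser")

print(soup_from_string.p.get_text())
print(soup_from_file.p.get_text())
```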
BeautifulSoup converts HTML into a complex tree structure in which each node is a Python object. Readers with a front-end background will recognize this as similar to DOM objects. There are roughly four kinds of objects in BeautifulSoup: Tag, NavigableString, BeautifulSoup, and Comment. Since most of our operations extract information from individual tags, I will only briefly describe the commonly used Tag object.
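To make the four object kinds concrete, here is a small sketch that parses a one-line document and inspects the type of each node. The markup is hypothetical and chosen so that every kind appears once.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Hypothetical markup containing a tag, a text node and a comment.
sample = "<p>Hello<!-- a note --></p>"
soup = BeautifulSoup(sample, "html.parser")

p = soup.p                # Tag: the <p> element
text, note = p.contents   # the two children of <p>, in document order

print(type(soup).__name__)  # the BeautifulSoup object represents the whole document
print(type(p).__name__)     # Tag
print(type(text).__name__)  # NavigableString: the text inside the tag
print(type(note).__name__)  # Comment: a special kind of NavigableString
```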
Tag
A Tag object corresponds to a tag in the HTML,
for example head, title, a, p, and so on.
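A short sketch of what can be read from a Tag object once you have one: its name, its attributes, and its text. The link markup here is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for this illustration.
sample = '<a id="home" class="nav link" href="/index.html">Home</a>'
soup = BeautifulSoup(sample, "html.parser")

tag = soup.a                 # first <a> tag in the document

name = tag.name              # the tag name, here "a"
href = tag["href"]           # attribute access works like a dictionary
classes = tag.get("class")   # class is multi-valued, so this is a list
title = tag.get("title")     # .get() returns None for a missing attribute
label = tag.string           # the text inside the tag

print(name, href, classes, title, label)
```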
In practice, we find the desired tag through a selector and then work with the Tag object to get the required information. In BeautifulSoup, find_all() and find() are commonly used to search the document tree for the tag you want. BeautifulSoup also supports searching with CSS syntax through the select() method, which returns a list.
PS:
1. findAll() is equivalent to find_all().
2. If you are familiar with front-end development, the select() method will feel more natural.
find()
find() is equivalent to find_all() with limit=1, except that find() returns the matching element directly, whereas find_all() returns a list.
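The difference between the two methods can be sketched like this, using a hypothetical list as input. The example also shows that find() returns None when nothing matches, which is worth checking before using the result.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for this illustration.
sample = (
    "<ul>"
    "<li class='item'>one</li>"
    "<li class='item'>two</li>"
    "<li>three</li>"
    "</ul>"
)
soup = BeautifulSoup(sample, "html.parser")

all_items = soup.find_all("li")                # list of every <li>
limited = soup.find_all("li", limit=1)         # list holding only the first match
first = soup.find("li")                        # the first match itself, not a list
by_class = soup.find_all("li", class_="item")  # filter on the class attribute
missing = soup.find("table")                   # no match, so None

print([li.get_text() for li in all_items])
print(first.get_text())
print(missing)
```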
CSS Selector
BeautifulSoup supports CSS selector syntax for finding the desired tags.
select(CSS selector) examples:
soup.select('.MyClass #box')  # descendant selector
soup.select('head > title')  # child selector
soup.select('div + p')  # adjacent sibling selector
soup.select('div ~ p')  # subsequent sibling selector
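Here is a runnable sketch of the four combinators, applied to a small hypothetical document. The class name myclass and the id box are made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for this illustration.
sample = """
<html><head><title>Demo</title></head>
<body>
<div class="myclass"><span id="box">inside</span></div>
<p>first</p>
<p>second</p>
</body></html>
"""
soup = BeautifulSoup(sample, "html.parser")

descendant = soup.select(".myclass #box")  # #box anywhere inside .myclass
child = soup.select("head > title")        # <title> directly under <head>
adjacent = soup.select("div + p")          # the <p> immediately after a <div>
siblings = soup.select("div ~ p")          # every <p> sibling following a <div>

for result in (descendant, child, adjacent, siblings):
    print([el.get_text() for el in result])
```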
You can also match on attributes.
soup.select('.MyClass a[id="box"]')
The select() method returns a list.
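A small sketch of attribute matching with select(), on hypothetical markup. It also shows select_one(), which returns the first match instead of a list, and the empty list that select() gives when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for this illustration.
sample = (
    '<div class="myclass">'
    '<a id="box" href="/a">A</a>'
    '<a id="other" href="/b">B</a>'
    "</div>"
)
soup = BeautifulSoup(sample, "html.parser")

matches = soup.select('.myclass a[id="box"]')     # list of matching tags
with_href = soup.select("a[href]")                # any <a> that has an href
first_link = soup.select_one(".myclass a")        # first match, or None
no_match = soup.select('a[id="missing"]')         # empty list when nothing matches

print([a["href"] for a in matches])
print(first_link.get_text())
print(no_match)
```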
That covers most of the commonly used features of BeautifulSoup.