Making Python easier: an introduction to commonly used Python libraries

Source: Internet
Author: User
Which common Python libraries are so addictive that programmers cannot put them down? Today we are going to sort them out. You are welcome to suggest additions or corrections to the library list in the comment area or by private message.

Let's start with BeautifulSoup. When I first came into contact with web crawlers, this library was strongly recommended everywhere, and it turned out to be really good. But a programmer should not get stuck on a single library, haha.

The installation process for these libraries is not described here; refer to Anaconda.

The following HTML string is used for testing the examples.

html = """

BeautifulSoup

Beautiful Soup supports the HTML parser in the Python standard library and also supports some third-party parsers, for example lxml (HTML), lxml (XML), and html5lib. These third-party parsers have to be installed separately; if none of them is installed, Beautiful Soup uses the parser from the Python standard library.
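As a minimal sketch of choosing a parser, the snippet below passes the parser name as the second argument. The sample markup and variable names are assumptions made for illustration, and only the built-in "html.parser" is used so that nothing beyond Beautiful Soup itself has to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for this illustration.
sample = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"

# "html.parser" is the parser bundled with the Python standard library.
soup = BeautifulSoup(sample, "html.parser")
title_text = soup.title.string
print(title_text)

# Third-party parsers are selected the same way once they are installed,
# e.g. BeautifulSoup(sample, "lxml") or BeautifulSoup(sample, "html5lib").
```

Naming the parser explicitly keeps the result the same from one machine to another, regardless of which optional parsers happen to be installed.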

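There are two ways to create a BeautifulSoup object.

1. soup = BeautifulSoup(html)  # create from the contents of a variable

2. soup = BeautifulSoup(open('mysite.html'))  # create from a local file (pass an open file object; a bare file name would be parsed as markup)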
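The two creation styles can be sketched as follows. The markup, the temporary directory, and the variable names are assumptions made for illustration; the file is written first so that the example runs on its own.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# 1. Create the object from the contents of a variable.
html_text = "<p>from a string</p>"
soup_from_string = BeautifulSoup(html_text, "html.parser")

# 2. Create the object from a local file by passing an open file object.
#    A temporary mysite.html is written here so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "mysite.html")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("<p>from a file</p>")
    with open(path, encoding="utf-8") as handle:
        soup_from_file = BeautifulSoup(handle, "html.parser")

string_text = soup_from_string.p.get_text()
file_text = soup_from_file.p.get_text()
print(string_text, "|", file_text)
```

The file is read completely while the object is being constructed, so it is safe to close it right afterwards.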

BeautifulSoup converts an HTML document into a complex tree structure in which every node is a Python object. Readers with a front-end background will recognize this as similar to DOM objects. There are roughly four kinds of objects in BeautifulSoup: Tag, NavigableString, BeautifulSoup, and Comment. Since most of our operations extract information from individual tags, I will briefly describe the commonly used Tag object.
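To make the four object kinds concrete, here is a small sketch that parses one paragraph containing text and a comment and then inspects the type of each node. The markup is a made-up example, not the test string used elsewhere in this article.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Hypothetical markup: one tag holding a text node and a comment node.
sample = "<p>Hello<!-- a note --></p>"
soup = BeautifulSoup(sample, "html.parser")

p = soup.p                            # Tag
text_node, comment_node = p.contents  # NavigableString, Comment

kinds = [
    type(soup).__name__,
    type(p).__name__,
    type(text_node).__name__,
    type(comment_node).__name__,
]
print(kinds)

# Comment is a special kind of NavigableString.
comment_is_string = isinstance(comment_node, NavigableString)
```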

Tag

A Tag object corresponds to a tag in the HTML document.

For example, head, title, a, p in HTML, and so on.
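A short sketch of reading information from a Tag object follows. The link markup and the variable names are assumptions chosen only for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup containing a single <a> tag.
sample = '<a id="box" href="https://example.com" class="link external">Example</a>'
soup = BeautifulSoup(sample, "html.parser")

tag = soup.a
tag_name = tag.name         # name of the tag
href = tag["href"]          # attribute access works like a dictionary
classes = tag["class"]      # class can hold several values, so it is a list
text = tag.get_text()       # text inside the tag
missing = tag.get("title")  # .get() returns None for an absent attribute

print(tag_name, href, classes, text, missing)
```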

In practice, we find the desired tag through a selector and then work with the Tag object to get the required information. In BeautifulSoup, find_all() and find() are commonly used to search the document tree for the tags you want. BeautifulSoup also supports searching with CSS syntax through the select() method, which returns a list.

PS:

1. findAll() is equivalent to find_all().

2. Readers who are more familiar with front-end development will find the select() method handier.
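The two search styles can be compared in a small sketch. The menu markup and variable names are assumptions for illustration; both calls return the same two links here.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two links inside a div.
sample = """
<div class="menu">
  <a href="/home">Home</a>
  <a href="/about">About</a>
</div>
<p>No links here</p>
"""
soup = BeautifulSoup(sample, "html.parser")

# Search the document tree by tag name.
links = soup.find_all("a")
hrefs = [a["href"] for a in links]

# The same search written as a CSS selector.
selected = soup.select("div.menu a")
selected_hrefs = [a["href"] for a in selected]

print(hrefs, selected_hrefs)
```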

find()

find() is equivalent to find_all() with limit=1, except that find() returns the matching element itself (or None when nothing matches), while find_all() returns a list.
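The difference in return values can be sketched like this. The list markup is a made-up example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with three list items.
sample = "<ul><li>one</li><li>two</li><li>three</li></ul>"
soup = BeautifulSoup(sample, "html.parser")

first = soup.find("li")                 # a single Tag
limited = soup.find_all("li", limit=1)  # a list holding one Tag

no_match = soup.find("table")           # None when nothing matches
no_matches = soup.find_all("table")     # empty list when nothing matches

print(first.get_text(), len(limited), no_match, len(no_matches))
```

Because find() can return None, it is worth checking the result before reading attributes from it.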

CSS Selector

BeautifulSoup supports CSS selector syntax for finding the desired tags.

select(CSS selector)

Examples:

soup.select('.myclass #box')  # descendant selector
soup.select('head > title')  # child selector
soup.select('div + p')  # adjacent sibling selector
soup.select('div ~ p')  # general sibling selector
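The four selector forms can be tried against a small document. The markup below is an assumption built to give each selector something to match; it is not the article's own test string.

```python
from bs4 import BeautifulSoup

# Hypothetical markup designed to exercise each combinator.
sample = """
<html><head><title>Demo</title></head>
<body>
<div class="myclass"><span id="box">inside</span></div>
<div>plain</div>
<p>first</p>
<p>second</p>
</body></html>
"""
soup = BeautifulSoup(sample, "html.parser")

def texts(selector):
    """Return the text of every element matched by the CSS selector."""
    return [element.get_text() for element in soup.select(selector)]

descendant = texts(".myclass #box")  # #box anywhere inside .myclass
child = texts("head > title")        # title directly under head
adjacent = texts("div + p")          # p immediately following a div
siblings = texts("div ~ p")          # every later p sibling of a div

print(descendant, child, adjacent, siblings)
```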

You can also filter by attribute.

soup.select('.myclass a[id="box"]')

The select() method returns a list.
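A brief sketch of the attribute lookup and of the list that comes back. The markup and names are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two links inside .myclass, only one with id="box".
sample = """
<div class="myclass">
  <a id="box" href="/a">A</a>
  <a id="other" href="/b">B</a>
</div>
"""
soup = BeautifulSoup(sample, "html.parser")

matches = soup.select('.myclass a[id="box"]')
is_list = isinstance(matches, list)  # the result behaves as a list
match_hrefs = [a["href"] for a in matches]

# A selector that matches nothing still gives back an (empty) list.
empty = soup.select('.myclass a[id="nothing"]')

print(is_list, match_hrefs, len(empty))
```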

That covers most of the commonly used BeautifulSoup functionality.
