Python Crawler Tutorial - 24 - Data Extraction - BeautifulSoup4 (ii)
This article describes how to traverse (and search) a document object with BeautifulSoup (BS).
Traversing Document objects
- .contents: a tag's child nodes, returned as a list
- .children: child nodes, returned as an iterator
- .descendants: all descendant nodes
- .string: prints a tag's content as a string, without the tag itself; only the content
- Case code, file py27bs3.py: https://xpwi.github.io/py/py%E7%88%AC%E8%99%AB/py27bs3.py
# BeautifulSoup usage example
# Traversing the document object
from urllib import request
from bs4 import BeautifulSoup

url = 'http://www.baidu.com/'
rsp = request.urlopen(url)
content = rsp.read()

soup = BeautifulSoup(content, 'lxml')
# bs decodes automatically
content = soup.prettify()

print("==" * 12)
# Use .contents
for node in soup.head.contents:
    if node.name == "meta":
        print(node)
    if node.name == "title":
        print(node.string)
print("==" * 12)
Run results
.string is commonly used to print a tag's content as plain text, without the tag itself; only the content is shown.
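The case code above only exercises .contents. Below is a minimal sketch of .children, .descendants, and .string against the same page; which tags are inspected is just an assumption for illustration.

# Minimal sketch (assumption: same baidu.com page as in the case code)
from urllib import request
from bs4 import BeautifulSoup

url = 'http://www.baidu.com/'
soup = BeautifulSoup(request.urlopen(url).read(), 'lxml')

# .children: an iterator over the direct child nodes
for node in soup.head.children:
    print(node.name)          # text nodes print None

# .descendants: recursively yields every descendant node
print(len(list(soup.head.descendants)))

# .string: only the text inside the tag, no tag markup
print(soup.title.string)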
Of course, if you find traversal too resource-intensive, or simply don't need to traverse, you can search instead.
Searching for Document objects
- find_all(name, attrs, recursive, text, **kwargs)
- find_all() returns its results as a list, e.g. find_all(name='meta'); if more than one meta tag matches, all of them are returned in the list
- name parameter: what to search for; the value passed in can be:
- 1. A string
- 2. A regular expression, compiled with re.compile:
For example, to print all tags whose names start with "me":
tags = soup.find_all(re.compile('^me'))
- 3. A list of tag names
- keyword arguments: can be used to filter by a tag attribute
- text: matches the text content of a tag (a short sketch covering the list, keyword, and text forms follows the run results below)
- Case code, file py27bs4.py: https://xpwi.github.io/py/py%E7%88%AC%E8%99%AB/py27bs4.py
# BeautifulSoup usage example
# Searching the document object
from urllib import request
from bs4 import BeautifulSoup
import re

url = 'http://www.baidu.com/'
rsp = request.urlopen(url)
content = rsp.read()

soup = BeautifulSoup(content, 'lxml')
# bs decodes automatically
content = soup.prettify()

# Use find_all
# Use the name parameter
print("==" * 12)
tags = soup.find_all(name='link')
for i in tags:
    print(i)

# Use a regular expression
print("==" * 12)
# Use two conditions at the same time
tags = soup.find_all(re.compile('^me'), content='always')
# Printing tags directly here would print a list
for i in tags:
    print(i)
Run results
Because two conditions are combined, only one meta tag is matched.
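The case code combines a regular expression with a keyword filter. As a complement, here is a minimal sketch of the other find_all forms mentioned above (a list of tag names, a plain keyword filter, and the text parameter); the specific tag names, attribute values, and the 'baidu' text pattern are assumptions chosen for this page, not part of the original case code.

# Minimal sketch of the remaining find_all forms (assumed examples)
from urllib import request
from bs4 import BeautifulSoup
import re

url = 'http://www.baidu.com/'
soup = BeautifulSoup(request.urlopen(url).read(), 'lxml')

# name as a list: match either <meta> or <link> tags
tags = soup.find_all(['meta', 'link'])
print(len(tags))

# keyword argument alone: filter by an attribute value
for tag in soup.find_all(content='always'):
    print(tag)

# text: match text content instead of tags (here, strings containing "baidu")
print(soup.find_all(text=re.compile('baidu'))[:3])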
Next up: BeautifulSoup CSS selectors.
Bye
- This note may not be reprinted by any person or organization.