Python Web Crawler and Information Extraction: Introduction to the Beautiful Soup Library

Source: Internet
Author: User
Tags: python web crawler

I. Installation of the Beautiful Soup Library

On Windows: open cmd with "Run as Administrator".

Run: pip install beautifulsoup4

Installation test:

from bs4 import BeautifulSoup

soup = BeautifulSoup('<p>data</p>', 'html.parser')

print(soup.prettify())

II. Basic Elements of the Beautiful Soup Library

1. The BeautifulSoup class

from bs4 import BeautifulSoup

soup2 = BeautifulSoup(open("D://demo.html"), "html.parser")

soup = BeautifulSoup("
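The two ways of building a BeautifulSoup object shown above (from an open file and from a string) can be sketched as follows. This is a minimal illustration, not the original example: the markup in `html` and the file name `sample.html` are assumptions made up for this sketch, and a temporary file stands in for a local file such as demo.html.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for this sketch.
html = "<html><body><p>data</p></body></html>"

# 1) Parse directly from a string.
soup_from_string = BeautifulSoup(html, "html.parser")

# 2) Parse from an open file handle. The with-statement closes the
#    file once parsing is done.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.html")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    with open(path, encoding="utf-8") as f:
        soup_from_file = BeautifulSoup(f, "html.parser")

# Both objects describe the same document tree.
print(soup_from_string.p.string)
print(soup_from_file.p.string)
```

Opening the file inside a with-statement is a design choice of this sketch; passing a bare open(...) call also works, but leaves closing the file to the interpreter.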

2. Basic elements of the BeautifulSoup class, using <p class="title">...</p> as the example tag

Basic element     Description
Tag               A tag, the most basic unit for organizing information; <> and </> mark its beginning and end
Name              The name of the tag; for <p>...</p> the name is 'p'. Format: <tag>.name
Attributes        The attributes of the tag, organized as a dictionary. Format: <tag>.attrs
NavigableString   The non-attribute string inside a tag, i.e. the string between <> and </>. Format: <tag>.string
Comment           The comment part of a string inside a tag, represented by the special Comment type
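The elements in the table above can be inspected on a small document. The sketch below is only an illustration: the markup in `html` is an assumption invented for this example, not the content of demo.html.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Hypothetical markup for this sketch: one tag with an attribute and
# text, and one tag whose only content is an HTML comment.
html = '<p class="title"><b>bold text</b></p><p><!--a comment--></p>'
soup = BeautifulSoup(html, "html.parser")

tag = soup.p                  # Tag: the first <p> in the document
tag_name = tag.name           # Name: 'p'
tag_attrs = tag.attrs         # Attributes: {'class': ['title']}
tag_string = tag.string       # NavigableString: text inside the tag

second_p = soup.find_all("p")[1]
comment = second_p.string     # Comment: content of <!-- ... -->

print(tag_name, tag_attrs)
print(type(tag_string).__name__, repr(str(tag_string)))
print(type(comment).__name__, repr(str(comment)))

# .string alone does not tell comments and ordinary text apart,
# so check the type when the difference matters.
print(isinstance(comment, Comment))
```

Note that `class` is a multi-valued attribute in HTML, so Beautiful Soup stores its value as a list.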

Review demo.html

>>> import requests
>>> r = requests.get("http://python123.io/ws/demo.html")
>>> demo = r.text
>>> demo
'

Tag: any tag that exists in HTML syntax can be accessed with soup.<tag>.

When an HTML document contains several tags with the same <tag> name, soup.<tag> returns only the first one.

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(demo, "html.parser")
>>> soup.title
<title>This is a python demo page</title>
>>> soup.p
<p class="title"><b>The demo python introduces several python courses.</b></p>
>>> soup.a
<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>
