Python web crawler: scraping data from the web

Source: Internet
Author: User
Tags: sublime text, python, web crawler

Python is very convenient for writing web crawlers. Here is a piece of code first; once the URL and a few settings are filled in, it can fetch data directly:

Programming Environment: Sublime Text

# Import the required packages; remember to install BeautifulSoup first
from bs4 import BeautifulSoup
import urllib2

# URL of the site to scrape. timeout is the time limit in seconds: if the data
# cannot be fetched within that time, the request is abandoned (a protective measure)
pagesource = urllib2.urlopen("http://www.ly.com/scenery/", timeout=8)
# Read the page's data
sourcedata = pagesource.read()
sitesoup = BeautifulSoup(sourcedata, "html.parser")
# Select the elements by the tag and class the data belongs to
selectkeys = sitesoup.find_all("div", attrs={"class": "s_com_detail"})
selectkeyz = sitesoup.find_all("span", attrs={"class": "s_dis"})

# Output format: the first loop prints several fields per item at once,
# the second loop prints only a single field per item
for plink in selectkeys:
    print "%s,%s,%s" % (plink.find_all("p")[0].find(text=True), plink.find_all("i")[0].find(text=True), plink.find_all("b")[0].find(text=True))
for blink in selectkeyz:
    print blink.find(text=True)


If you want to scrape data from a different website, the parts of the program that need to be modified are as follows:



The steps are as follows:

Step 1: Get the target URL. Open the site you want to scrape, copy its URL, and place it in box 1 of the program.
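As a rough sketch of this step in Python 3 (where urllib2's urlopen lives in urllib.request), the fetch can be wrapped so that a timeout or connection failure does not crash the script. The helper name fetch_page is hypothetical and not part of the original program; the timeout of 8 seconds is the value used there.

```python
import socket
import urllib.error
import urllib.request


def fetch_page(url, timeout=8):
    """Return the raw bytes served at url, or None if the request fails.

    fetch_page is a hypothetical helper name used only for this sketch.
    """
    try:
        # urllib.request.urlopen is the Python 3 counterpart of urllib2.urlopen
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (urllib.error.URLError, socket.timeout) as err:
        # Covers refused connections, DNS failures, HTTP errors and timeouts
        print("could not fetch %s: %s" % (url, err))
        return None
```

Calling fetch_page("http://www.ly.com/scenery/") would then take the place of the urlopen and read lines, with a None result signalling that the page could not be retrieved.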

Step 2: Right-click on the page and choose Inspect to open the following interface:

On the right is the page's source code; the data we want has to be found in this source.

Step 3: Locate the data we want to download.

Click this button (the one for selecting an element on the page):

Then click the data you want to download on the page:

The page source now jumps to the place where this data sits:


Step 4: Find the class that the data belongs to.

Starting from the located position, look upward through the source; the first class you meet is the class this data belongs to.

The tag name in front of the class (div) and the class value after it (s_com_detail) correspond to the tag name and the class value that the program passes to find_all.
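The mapping in this step can be tried out on a small inline fragment before pointing the program at a live page. The markup below is invented for illustration; only the tag name div and the class name s_com_detail are taken from the program, and the real page's structure may differ. The sketch is written for Python 3.

```python
from bs4 import BeautifulSoup

# Invented sample markup; only the tag and class names mirror the program.
SAMPLE_HTML = """
<div class="s_com_detail"><p>Scenic spot A</p><b>109</b></div>
<div class="other"><b>1</b></div>
<div class="s_com_detail"><p>Scenic spot B</p><b>80</b></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Tag name in front of the class -> first argument of find_all;
# class value -> the attrs filter.
blocks = soup.find_all("div", attrs={"class": "s_com_detail"})

# Only the two div elements carrying the matching class are returned.
print(len(blocks))
```

If the count is zero on a real page, the class name copied from the page source is the first thing to check, since the match is case-sensitive.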


Step 5: Find the unique identifier for this data.

The data we located in the figure (109) sits between <b> and </b>, so b is the unique identifier for this data. Every value of this kind (the price) in the page source is marked with b, which corresponds to the contents of the last red box in the program.
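Indexing with [0], as the program does, raises an IndexError when a block happens to lack the tag. A minimal sketch of a more forgiving lookup follows, assuming Python 3 and a recent BeautifulSoup 4; the helper name first_text and the sample markup are invented for illustration.

```python
from bs4 import BeautifulSoup

# Invented sample markup; the second block deliberately has no <b> price.
SAMPLE_HTML = """
<div class="s_com_detail"><p>Scenic spot A</p><b>109</b></div>
<div class="s_com_detail"><p>Scenic spot B</p></div>
"""


def first_text(block, tag_name):
    """Return the first text inside the first <tag_name> in block, or None.

    first_text is a hypothetical helper name used only for this sketch.
    """
    tag = block.find(tag_name)
    if tag is None:
        return None
    text = tag.find(string=True)
    return None if text is None else str(text)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# One (name, price) pair per block; a missing price becomes None.
rows = [
    (first_text(block, "p"), first_text(block, "b"))
    for block in soup.find_all("div", attrs={"class": "s_com_detail"})
]
print(rows)
```

Swapping "b" for another tag name is all that is needed when the value on a different site is wrapped in a different element.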


That completes the procedure. If anything is unclear or incorrect, please point it out.
