They say Python can't get you a job? Then what are all these listings? A zero-experience crawl of job postings!

Source: Internet
Author: User


As a complete beginner preparing to move into data analysis, the first thing I came into contact with was web crawling. Every crawler run turned up a new bug, but through constant debugging I can finally crawl some data, and I'd like to share the process with you here ~


First, take a look at the last page of the search results.

PS: a tip. In the jump-to-page box at the bottom of the page, enter a large number such as 10000 to jump straight to the last page.

Right-click the page and view its source, then press Ctrl+F to search for the key information to crawl, such as the content in the red box.

But the "big data analyst" text from the red box can't be found!!!

It might be hidden in a JSON file.

So I tried again, this time searching for just "data analyst".

It's finally there.

Why is this? On inspection I found:

Between "big" and the "data analyst" that follows it there is a <b> tag. What does that mean? Startled, I hurriedly looked it up on Baidu.

It sets text to bold? Excuse me? Well, the text does show in bold on the page.

Looking further through the source, I found that all the information I want is right here (red box), so it seems there is no need for packet-capture analysis after all ~

No time to explain, get in the car!

Well, I'm no driver at all, so let's just start writing code ...

The code above sets the working path and lays the groundwork for writing the final data to an Excel file.
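The original code is not reproduced here, so the following is only a minimal sketch of that kind of setup. It uses the standard-library csv module rather than an Excel library (a CSV file opens directly in Excel), and the folder, file name and column names are all assumptions made for illustration.

```python
import csv
import os
import tempfile

# Hypothetical column names; the original post does not list its five fields.
HEADER = ["job_title", "company", "salary", "location", "posted"]


def prepare_output(folder, filename="jobs.csv"):
    """Create the output folder if needed and write a header row."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    # utf-8-sig adds a BOM so Excel detects the encoding of non-ASCII text.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        csv.writer(f).writerow(HEADER)
    return path


# A temporary folder stands in for the real working directory.
out_path = prepare_output(tempfile.mkdtemp())
print(out_path)
```

Later rows can be appended to the same file by reopening it in "a" mode with the same encoding and newline settings.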

Set up five empty lists to hold the final information I want to capture.

There are no Chinese characters in the URL. Copy it out and visit it to check.

Sure enough, it works!!!
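The reason no Chinese shows up is that non-ASCII characters in a URL are percent-encoded. A small sketch with the standard library shows the round trip; the keyword used here is an assumption, since the post does not show the original search URL.

```python
from urllib.parse import quote, unquote

keyword = "数据分析师"  # assumed search keyword ("data analyst")

encoded = quote(keyword)    # UTF-8 bytes written as %XX escapes
decoded = unquote(encoded)  # back to the readable text

print(encoded)
print(decoded)
```

Pasting either form into a browser should lead to the same page, because the browser performs the same encoding before sending the request.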

Notice the p=1 at the end of this URL. It is probably the page number, so I'll try changing it to 5.

Sure enough, it is. Next I'll try the last page, page 90.

range(1, 91) loops over pages 1 to 90, and "p=" + str(k) builds the URL for each page (I'm going to crawl all 90 pages).
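As a rough sketch of that loop, the page URLs can be built up front. The base URL below is a placeholder, not the site's real search address; only the trailing p= parameter comes from the post.

```python
# Hypothetical base URL; the real search URL is not shown in the post.
BASE_URL = "https://example.com/jobs/search?kw=data-analyst&p="

# range(1, 91) yields 1..90, so every page number is covered exactly once.
page_urls = [BASE_URL + str(k) for k in range(1, 91)]

print(len(page_urls))
print(page_urls[0])
print(page_urls[-1])
```

The upper bound is 91 rather than 90 because range excludes its end value.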

After studying how the page is built, I chose regular expressions for the extraction.

Each time a page is extracted, all of its information is appended to the lists result11~51.
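To illustrate the idea, here is a small sketch that pulls five fields out of each row with one regular expression and keeps appending them to five lists. The HTML, the class names and the five field names are all made up for the example (the lists are named descriptively here instead of result11~51); the real page markup will need its own pattern.

```python
import re

# Made-up HTML standing in for one results page; real markup will differ.
SAMPLE_PAGE = """
<tr><td class="job"><a href="/j/1"><b>Data Analyst</b></a></td>
<td class="company">Acme</td><td class="salary">8000-10000</td>
<td class="city">Beijing</td><td class="date">05-01</td></tr>
<tr><td class="job"><a href="/j/2">Big <b>Data Analyst</b></a></td>
<td class="company">Globex</td><td class="salary">10000-15000</td>
<td class="city">Shanghai</td><td class="date">05-02</td></tr>
"""

# One capture group per field; re.S lets ".*?" span line breaks.
ROW = re.compile(
    r'<td class="job"><a[^>]*>(.*?)</a></td>.*?'
    r'<td class="company">(.*?)</td>.*?'
    r'<td class="salary">(.*?)</td>.*?'
    r'<td class="city">(.*?)</td>.*?'
    r'<td class="date">(.*?)</td>',
    re.S,
)

# Five lists that keep growing as each page is processed.
titles, companies, salaries, cities, dates = [], [], [], [], []

for page in [SAMPLE_PAGE]:  # in a real run: the HTML of each fetched page
    for title, company, salary, city, date in ROW.findall(page):
        titles.append(title)
        companies.append(company)
        salaries.append(salary)
        cities.append(city)
        dates.append(date)

print(titles)
print(len(titles), len(companies), len(salaries), len(cities), len(dates))
```

Matching a whole row at once keeps the five lists the same length, which matters when they are later written out as columns.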

The results look like this:

A total of 5,221 records, not the 12,354 shown by the web search. More than half of them got swallowed!

I ran it again, and sure enough the count was different. OK ... this problem still needs to be solved. If any experts out there understand what is going on, please leave a comment and point this beginner in the right direction.
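One way to narrow this down, offered only as a debugging sketch and not as a known cause, is to record how many rows each page yields so that pages that fail or come back short stand out between runs. The fetch and parse arguments are hypothetical stand-ins for the download and regex steps, and the fake site below exists only to make the example runnable.

```python
def count_rows_per_page(pages, fetch, parse):
    """Return {page_number: row_count}; a failed fetch is recorded as -1."""
    counts = {}
    for k in pages:
        try:
            html = fetch(k)
        except Exception:  # e.g. a timeout or a blocked request
            counts[k] = -1
            continue
        counts[k] = len(parse(html))
    return counts


def suspicious_pages(counts, expected):
    """Pages that failed or returned fewer rows than expected."""
    return sorted(k for k, n in counts.items() if n < expected)


# Tiny fake site: page 2 fails, page 3 comes back short.
def fake_fetch(k):
    if k == 2:
        raise IOError("simulated timeout")
    return "row\n" * (1 if k == 3 else 3)


def fake_parse(html):
    return html.split()


counts = count_rows_per_page(range(1, 5), fake_fetch, fake_parse)
print(counts)
print(suspicious_pages(counts, expected=3))
```

Comparing the suspicious pages from two runs would show whether the missing records come from the same pages each time or from different ones.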

The <b></b> tags look ugly, so I used Excel to do some post-processing.
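The same clean-up can also be done in Python before the data is written out. A minimal sketch, assuming the only unwanted markup is the literal <b> and </b> pair; the sample titles are made up.

```python
import re


def strip_bold(text):
    """Remove <b> and </b> while keeping the text between them."""
    return re.sub(r"</?b>", "", text)


titles = ["Big <b>Data Analyst</b>", "<b>Data Analyst</b> Intern"]
clean = [strip_bold(t) for t in titles]
print(clean)
```

The pattern `</?b>` matches both the opening and the closing tag, so a single substitution handles the pair.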

Find and replace

Uh, an error.

My default was to open the file in WPS. After switching to Office Excel, the result of the operation was as follows:

Isn't that much better? When I get the chance, I'll continue with some follow-up analysis of this data ~

The complete code is as follows:

The code takes about 15 to 20 seconds to run.
