Crawl web data in 10 minutes with the Chrome plugin Web Scraper

Source: Internet
Author: User

This article's tags: webscraper, Chrome plugin, web page data crawling

With the Chrome plugin Web Scraper you can easily crawl web page data without writing any code. You simply point and click at what you want to crawl, and you do not have to deal with complex crawler problems such as logging in, CAPTCHAs, or asynchronous loading.

Web Scraper Plugin

Introduction from the Web Scraper official website:

Web Scraper Extension (free!)
Using our extension you can create a plan (sitemap) of how a website should be traversed and what should be extracted. Using these sitemaps, Web Scraper will navigate the site accordingly and extract all data. Scraped data can later be exported as CSV.

Let's take a look at the data I crawled with Web Scraper:

1. Wheel Brother's (vczh's) Zhihu followers

Wheel Brother has more than 540,000 followers; I only grabbed the first 20 pages, 400 records in total.

Setting data fields

2. Pinterest 7-day trending data

Run crawlers to get data

Exporting data
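The exported CSV can also be post-processed with a few lines of Python instead of, or before, Excel. The sketch below is a minimal example under stated assumptions: the export has a header row, the bookkeeping column names `web-scraper-order` and `web-scraper-start-url` are my assumption about the export format, and `name`/`fans` are hypothetical field ids standing in for whatever you defined in your own sitemap.

```python
import csv
import io

# Bookkeeping columns assumed to be added by the exporter (an assumption;
# check the header row of your own export).
BOOKKEEPING = {"web-scraper-order", "web-scraper-start-url"}


def load_rows(csv_text: str) -> list[dict[str, str]]:
    """Parse exported CSV text into dicts, dropping the bookkeeping columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {key: value for key, value in row.items() if key not in BOOKKEEPING}
        for row in reader
    ]


# Hypothetical sample export with two data fields, "name" and "fans".
SAMPLE = (
    "web-scraper-order,web-scraper-start-url,name,fans\n"
    "1-1,https://example.com/?page=1,Alice,120\n"
    "1-2,https://example.com/?page=1,Bob,45\n"
)

rows = load_rows(SAMPLE)
print(len(rows), rows[0])  # 2 {'name': 'Alice', 'fans': '120'}
```

For a real file, read it with `open(path, newline="", encoding="utf-8")` and pass `f.read()` to `load_rows`.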

Web Scraper crawl process and key points:

After installing the Web Scraper plugin, a crawl takes three steps:
1. Create a new sitemap (i.e. create a crawl project).
2. Select the web page content to crawl by pointing and clicking.
3. Start the crawl and download the data as CSV.

The second step is the most critical, and it involves two points:

    1. First select the block element. Each record we take from the page is a repeated block, so check "Multiple".
    2. Then select the required data fields inside the block (these become the columns in Excel).
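In sitemap terms, the block is an Element selector with "Multiple" checked, and each field is a Text selector whose parent is that block. The Python sketch below assembles such a sitemap as JSON for import. `build_sitemap` and the example selectors are hypothetical, and the key names (`startUrl`, `parentSelectors`, `SelectorElement`, `SelectorText`) are my assumption about Web Scraper's sitemap format, so compare them with a sitemap exported from your own copy of the plugin before importing.

```python
import json


def build_sitemap(sitemap_id, start_url, block_selector, fields):
    """Build a Web Scraper-style sitemap dict (hypothetical helper).

    block_selector: CSS selector of one repeated data block (one row).
    fields: mapping of field id -> CSS selector inside the block (one column each).
    """
    selectors = [{
        "id": "items",
        "type": "SelectorElement",
        "parentSelectors": ["_root"],
        "selector": block_selector,
        "multiple": True,  # the block repeats on the page, so "Multiple" is checked
        "delay": "",
    }]
    for field_id, css in fields.items():
        selectors.append({
            "id": field_id,
            "type": "SelectorText",
            "parentSelectors": ["items"],  # each field is read inside one block
            "selector": css,
            "multiple": False,  # one value per block
            "regex": "",
            "delay": "",
        })
    return {"_id": sitemap_id, "startUrl": start_url, "selectors": selectors}


sitemap = build_sitemap(
    "demo",
    "https://example.com/list?page=[1-3]",
    "div.item",
    {"title": "h2 a", "date": "span.time"},
)
print(json.dumps(sitemap, indent=2))
```

Keeping every field as a child of one block is what keeps the values of a record on the same CSV row.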

The key to crawling large amounts of data is mastering pagination control.
Pagination falls into 3 situations:

    1. URL parameter paging (the most regular case)
      The URL carries a page parameter, for example:

      https://www.zhihu.com/people/excited-vczh/followers?page=2

      When you create the sitemap, you can put the paging parameter directly into the Start URL as a range, like this:

      https://www.zhihu.com/people/excited-vczh/followers?page=[1-27388]
    2. Scroll loading, or clicking "load more" to load more page data

    3. Clicking page-number tabs (including a "next" tab)
      Note that cases 2 and 3 are both forms of asynchronous loading, and most of them can be converted into case 1.
      Paging of these kinds is harder to control; it is generally handled with the Link or Element Click selector.
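For case 1, the `[1-27388]` range in the Start URL stands for one URL per page. A minimal sketch of that expansion follows; `expand_range_url` is a hypothetical name, and it only handles a single `[start-end]` range, which is a simplification of what the plugin itself accepts.

```python
import re

# Matches one "[start-end]" range, e.g. "[1-20]".
RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_range_url(url: str) -> list[str]:
    """Expand one '[start-end]' range in a URL into a list of URLs (inclusive)."""
    match = RANGE.search(url)
    if match is None:
        return [url]  # no range: the URL is used as-is
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = url[:match.start()], url[match.end():]
    return [f"{prefix}{page}{suffix}" for page in range(start, end + 1)]


urls = expand_range_url("https://www.zhihu.com/people/excited-vczh/followers?page=[1-20]")
print(len(urls), urls[0])
```

For cases 2 and 3 the same idea applies once you have found the underlying request URL that carries the page or offset parameter, for example in the browser's developer tools.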

Web Scraper steps, illustrated:

Step 1: Create a sitemap

Step 2: Select the block data element

Step 3: Select the text fields to capture

Step 4: Crawl

Web Scraper usage notes:

1) Apart from regular URL-parameter paging, the other paging methods are hard to control: different sites use different pagination tabs, so the operation differs from site to site.

2) Because the crawler captures the values exactly as the page displays them, the data is not well structured and needs to be cleaned up with Excel functions.
For example, in the Pinterest 7-day trending data, the article publication time comes in several different formats.
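The same cleanup can be scripted. The sketch below normalizes mixed time strings into one `datetime`. The input formats it handles (full dates, dates without a year, and "N hours ago" style relative times) are hypothetical examples, not the formats actually seen on the site, and `normalize_time` is an illustrative name.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical input formats; check which formats your own export contains.
FULL_FORMATS = ("%Y-%m-%d %H:%M", "%Y.%m.%d %H:%M")
YEARLESS_FORMAT = "%Y-%m-%d %H:%M"  # applied after prefixing the current year
RELATIVE = re.compile(r"(\d+)\s*(minutes?|hours?|days?)\s+ago")
UNITS = {"minute": "minutes", "hour": "hours", "day": "days"}


def normalize_time(raw: str, now: datetime) -> Optional[datetime]:
    """Convert one scraped time string to a datetime, or None if unrecognized."""
    text = raw.strip()
    match = RELATIVE.fullmatch(text)
    if match:
        # e.g. "3 hours ago": subtract the amount from the reference time
        amount, unit = int(match.group(1)), match.group(2).rstrip("s")
        return now - timedelta(**{UNITS[unit]: amount})
    for fmt in FULL_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            pass
    try:
        # e.g. "05-03 08:15": no year shown, so assume the reference year
        return datetime.strptime(f"{now.year}-{text}", YEARLESS_FORMAT)
    except ValueError:
        return None


now = datetime(2017, 6, 1, 12, 0)
for s in ["2017-05-01 10:20", "05-03 08:15", "3 hours ago"]:
    print(s, "->", normalize_time(s, now))
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside, keeps relative times tied to when the data was actually crawled.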

3) If you know a little web page code, you will get up to speed very quickly. Code is king!
In particular, with a bit of Python background it is easy to select page data, to understand what is going on, and to spot problems during the operation.
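As an example of why a little web code helps: a block selector such as `div.List-item` simply matches every `div` whose class list contains `List-item`. The sketch below counts such matches with Python's standard `html.parser`. It only understands the simple `tag.class` form (not full CSS), and the sample HTML and the `count_matches` helper are hypothetical.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags matching a simple 'tag.class' selector."""

    def __init__(self, tag: str, cls: str):
        super().__init__()
        self.tag, self.cls, self.count = tag, cls, 0

    def handle_starttag(self, tag, attrs):
        # "class" may be missing or valueless, so fall back to an empty string
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.cls in classes:
            self.count += 1


def count_matches(html: str, selector: str) -> int:
    """Count elements matching 'tag.class' (hypothetical helper)."""
    tag, cls = selector.split(".", 1)
    parser = ClassCounter(tag, cls)
    parser.feed(html)
    parser.close()
    return parser.count


SAMPLE = """
<div class="List">
  <div class="List-item"><a class="UserLink-link">Alice</a></div>
  <div class="List-item"><a class="UserLink-link">Bob</a></div>
  <div class="List-item-extra">not a data block</div>
</div>
"""

print(count_matches(SAMPLE, "div.List-item"))
```

Seeing that `List-item-extra` is not matched is exactly the kind of detail that makes selector problems easy to diagnose in the plugin.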

4) Compared with data collectors such as Octoparse and LocoySpider, Web Scraper needs no software download, is free, requires no registration, and still lets you work with a little code. Of course, Web Scraper also offers a paid cloud crawler.

Web Scraper can also import sitemaps. Import the following sitemap to crawl the first 20 pages of Wheel Brother's followers:

    {"startUrl": "https://www.zhihu.com/people/excited-vczh/followers?page=[1-20]",
     "selectors": [
      {"parentSelectors": ["_root"], "type": "SelectorElement", "multiple": true, "id": "items", "selector": "div.List-item", "delay": ""},
      {"parentSelectors": ["items"], "type": "SelectorText", "multiple": false, "id": "name", "selector": "div.UserItem-title a.UserLink-link", "regex": "", "delay": ""},
      {"parentSelectors": ["items"], "type": "SelectorText", "multiple": false, "id": "desc", "selector": "div.RichText", "regex": "", "delay": ""},
      {"parentSelectors": ["items"], "type": "SelectorText", "multiple": false, "id": "answers", "selector": "span.ContentItem-statusItem:nth-of-type(1)", "regex": "", "delay": ""},
      {"parentSelectors": ["items"], "type": "SelectorText", "multiple": false, "id": "articles", "selector": "span.ContentItem-statusItem:nth-of-type(2)", "regex": "", "delay": ""},
      {"parentSelectors": ["items"], "type": "SelectorText", "multiple": false, "id": "fans", "selector": "span.ContentItem-statusItem:nth-of-type(3)", "regex": "", "delay": ""}
     ],
     "_id": "zh_vczh"}
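A sitemap whose quotes or ids were mangled (for example by copy-paste or translation) will fail to import or will silently extract nothing, so it can help to sanity-check the JSON before pasting it in. The sketch below does this. `check_sitemap` is a hypothetical helper, and it only verifies that the text parses, that selector ids are unique, and that every `parentSelectors` entry refers to `_root` or an existing selector id.

```python
import json


def check_sitemap(sitemap_json: str) -> list[str]:
    """Return a list of problems found in a sitemap (a hypothetical sanity check)."""
    sitemap = json.loads(sitemap_json)  # raises ValueError if the JSON is malformed
    selectors = sitemap.get("selectors", [])
    ids = [sel["id"] for sel in selectors]
    known = set(ids) | {"_root"}
    problems = []
    if len(ids) != len(set(ids)):
        problems.append("duplicate selector ids")
    for sel in selectors:
        for parent in sel.get("parentSelectors", []):
            if parent not in known:
                problems.append(f"{sel['id']}: unknown parent {parent!r}")
    return problems


# Hypothetical minimal sitemaps: one consistent, one with a typo in a parent id.
GOOD = json.dumps({"_id": "demo", "startUrl": "https://example.com/?page=[1-2]", "selectors": [
    {"id": "items", "type": "SelectorElement", "parentSelectors": ["_root"], "selector": "div.item", "multiple": True},
    {"id": "name", "type": "SelectorText", "parentSelectors": ["items"], "selector": "a", "multiple": False},
]})
BAD = GOOD.replace('"parentSelectors": ["items"]', '"parentSelectors": ["item"]')

print(check_sitemap(GOOD), check_sitemap(BAD))
```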

P.S. Web Scraper tutorials

    1. Video tutorials on the official website
      http://webscraper.io/tutorials

    2. @Chen Dahin wrote up detailed steps in an answer and also recorded video tutorials.

      • Video tutorial (1): http://www.bilibili.com/video/av9664397/

      • Video tutorial (2): http://www.bilibili.com/video/av9708200/

      Source question: "How do you learn crawler technology from scratch?" In that answer, @Chen Dahin compares crawling with Excel, with Web Scraper, and with code.

Closing note: to look freely at the outside world, and in this line of work, you cannot avoid searching Google for information. So finally, here are a few accelerators I recommend.

Accelerator recommendations (free plan, paid plan, official website):
  • Red Apricot Accelerator: no free plan; stable and fast. Paid plan: enter coupon code wh80 for 20% off, only 80 yuan/year on an annual plan. Official website: http://whosmall.com/go/yzhx
  • Azumino Accelerator: best suited as a VPN for foreign trade. Paid plan: from ¥30/month. Official website: http://whosmall.com/go/ay
  • Loco Accelerator: free for 2 hours per day. Paid plan: from ¥15/month. Official website: http://whosmall.com/go/loco


Reposted from SUN's BLOG: focusing on Internet knowledge and sharing the Internet spirit!

Original article: "Crawl web data in 10 minutes with the Chrome plugin Web Scraper"


Original Address: http://whosmall.com/?post=473
