An easy-to-use crawler framework

Source: Internet
Author: User

Reprint: http://www.tuicool.com/articles/VZBj2e

Original: http://itindex.net/detail/52388-Frame

WebMagic is a crawler framework that requires no configuration and is easy to extend and customize. It provides a simple, flexible API, so a crawler can be implemented with only a small amount of code.

Official website: http://webmagic.io/

WebMagic is an open-source Java framework for vertical crawlers. Its goal is to simplify crawler development so that developers can focus on their own business logic. The core of WebMagic is very simple, yet it covers the whole crawling process, which also makes it good learning material for crawler development. The author spent a year developing vertical crawlers at a previous company, and WebMagic was written to eliminate the repetitive work that kept coming up in that development.

Web crawling is a technique, and WebMagic aims to lower the cost of implementing it. Out of respect for resource providers, however, WebMagic does not provide anti-blocking features such as CAPTCHA cracking, proxy switching, or automatic login.

Key Features of WebMagic:

  • Fully modular design with strong extensibility.
  • A simple core that still covers the whole crawling process; flexible and powerful, and also good material for learning how crawlers work.
  • Provides a rich API for extracting content from pages.
  • No configuration required; a crawler can also be implemented with POJOs and annotations.
  • Supports multithreading.
  • Supports distributed crawling.
  • Supports crawling pages rendered dynamically with JavaScript.
  • No dependency on any framework, so it can be embedded flexibly in a project.
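
To make the "modular design" and "whole crawling process" points concrete, here is a minimal, self-contained sketch of a crawl loop split into replaceable stages (download, process, schedule, save). It uses only the Java standard library and is not WebMagic's API: the class MiniCrawler, its nested interfaces, the runDemo helper, and the in-memory site under http://example.test/ are all hypothetical names chosen for illustration. A real crawler would replace the fake downloader with an HTTP client and the regular expressions with a proper HTML parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: these interfaces are hypothetical, not WebMagic classes.
public class MiniCrawler {

    /** Fetches the raw content of a URL; returns null if the page is unavailable. */
    public interface Downloader {
        String download(String url);
    }

    /** Extracts result fields and follow-up links from one downloaded page. */
    public interface PageProcessor {
        void process(String url, String html, Map<String, String> fields, List<String> links);
    }

    /** Receives the fields extracted from each page (print, store, ...). */
    public interface Pipeline {
        void save(String url, Map<String, String> fields);
    }

    /** Runs the crawl loop from a seed URL and returns the number of pages processed. */
    public static int crawl(String seed, Downloader downloader,
                            PageProcessor processor, Pipeline pipeline) {
        // The queue plus the "seen" set play the scheduler role: they decide
        // what to fetch next and make sure each URL is fetched at most once.
        Queue<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(seed);
        seen.add(seed);
        int pages = 0;
        while (!queue.isEmpty()) {
            String url = queue.poll();
            String html = downloader.download(url);
            if (html == null) {
                continue; // skip pages that could not be downloaded
            }
            Map<String, String> fields = new LinkedHashMap<>();
            List<String> links = new ArrayList<>();
            processor.process(url, html, fields, links);
            pipeline.save(url, fields);
            for (String link : links) {
                if (seen.add(link)) { // add() returns false for known URLs
                    queue.add(link);
                }
            }
            pages++;
        }
        return pages;
    }

    private static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>");
    private static final Pattern LINK = Pattern.compile("href=\"(.*?)\"");

    /** Crawls a tiny in-memory site and returns a map of URL to page title. */
    public static Map<String, String> runDemo() {
        Map<String, String> site = new HashMap<>();
        site.put("http://example.test/", "<title>Home</title>"
                + "<a href=\"http://example.test/a\">A</a>"
                + "<a href=\"http://example.test/b\">B</a>");
        site.put("http://example.test/a", "<title>Page A</title>"
                + "<a href=\"http://example.test/\">home</a>");
        site.put("http://example.test/b", "<title>Page B</title>"
                + "<a href=\"http://example.test/missing\">broken</a>");

        Map<String, String> titles = new LinkedHashMap<>();
        crawl("http://example.test/",
                site::get, // fake downloader: looks the page up in the map
                (url, html, fields, links) -> {
                    Matcher title = TITLE.matcher(html);
                    if (title.find()) {
                        fields.put("title", title.group(1));
                    }
                    Matcher link = LINK.matcher(html);
                    while (link.find()) {
                        links.add(link.group(1));
                    }
                },
                (url, fields) -> titles.put(url, fields.get("title")));
        return titles;
    }

    public static void main(String[] args) {
        runDemo().forEach((url, title) -> System.out.println(url + " -> " + title));
    }
}
```

Because each stage sits behind its own interface, one stage can be replaced without touching the others, for example swapping the printing pipeline for one that writes to a database. That is the kind of extensibility the feature list describes.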

http://git.oschina.net/flashsword20/webmagic#readme

