Knight Station Group in practice: crawling content from a target website

Source: Internet
Author: User


As mentioned previously, Knight Station Group ships with a rich set of built-in crawl modules that can crawl related content from mainstream search engines, portal sites, blogs, and other sources. If you want higher-quality content, however, a good approach is to use the software's advanced features to build your own crawl module and crawl a specific target website. Below, working through a concrete case, I share my experience of building a crawl module with Knight Station Group to crawl the content of a specific target site.

The crawl module in Knight Station Group supports four modes: keyword intelligent crawl mode, custom crawl mode, spider crawl mode, and synchronous tracking mode. To crawl the content of a specific target site, we usually use the custom crawl mode or the spider crawl mode. The steps below use the custom crawl mode as the example.

  

1. In the menu for creating new modules, create a new crawl module.

2. Check the custom crawl mode.

3. Click to enter "Process 1: Get list link" and configure it. The main purpose of this step is to get the address of every list page in the article section.

a) For the link source, check the option to obtain links from the target site's source code.

b) Enter the address of the website section to be collected.

c) Set the pagination extraction rule so that every page under the section's list page is collected.

The official Knight Station Group video tutorials explain pagination extraction rules in detail; refer to them for the specific operations.

d) Test the result and save.

4. Enter "Process 2: Get Content Link" and configure it. The main purpose of this step is to get, from the section's list pages, the URLs of the articles that need to be crawled.

a) Click to create a new result extraction rule and fill in the relevant parameters.

b) Fill in the address of a list page to test against.

c) Test the rule and save it.

5. Enter "Process 3: Content to obtain parameters" and configure it. This step extracts the article content from each article page.

This step is relatively simple. In most cases, choosing "intelligent extraction Body, title mode" is enough, because the software's intelligent extraction usually captures the title and body of an article quite well. Enter the address of a target page, run the test, preview the result, and click Save. The accompanying figure shows the settings and the result: the content of the target page is captured accurately.

6. Save the rule locally, in case it is lost.

7. Submit the rule to the Knight Station Group server. Then open the module backend in the software, where the newly created crawl module now appears.
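The three processes configured above correspond roughly to the list-page, link, and content stages of a simple crawler. Purely as an illustration, here is a minimal Python sketch of those stages using only the standard library. It is not how Knight Station Group is implemented: the function names, the `{page}` URL template, and the link pattern are all hypothetical, and the paragraph-collecting extractor is a crude stand-in for the software's intelligent extraction mode. The functions work on HTML strings, so downloading the pages is left out.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


def expand_pagination(template, first, last):
    """Process 1: build list-page URLs from a hypothetical '{page}' template."""
    return [template.format(page=n) for n in range(first, last + 1)]


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_article_links(list_html, list_url, pattern):
    """Process 2: keep links whose absolute URL matches `pattern`, without duplicates."""
    collector = _LinkCollector()
    collector.feed(list_html)
    seen, links = set(), []
    for href in collector.hrefs:
        url = urljoin(list_url, href)  # resolve relative links against the list page
        if re.search(pattern, url) and url not in seen:
            seen.add(url)
            links.append(url)
    return links


class _ArticleParser(HTMLParser):
    """Captures the <title> text and the text of each <p> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._current = None  # "title" or "p" while inside that element

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._current = tag
            if tag == "p":
                self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "p":
            self.paragraphs[-1] += data


def extract_title_and_body(article_html):
    """Process 3: return (title, body); body is the non-empty paragraphs joined by newlines."""
    parser = _ArticleParser()
    parser.feed(article_html)
    body = "\n".join(p.strip() for p in parser.paragraphs if p.strip())
    return parser.title.strip(), body
```

In this sketch, testing a rule, as the steps above recommend, amounts to calling one function on a single sample page and inspecting the result before running it over the whole section.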

At this point, the new Knight Station Group module for crawling the target site is complete, and we can use our own crawl rules in our tasks. While it is in use, we can modify the crawl module at any time to suit our needs. This walkthrough follows the official video tutorial, available at http://www.xiake5.com/demo. The official tutorials are thorough, and newcomers will find them easy to follow. My impression: I always thought building a collection module would be difficult, but in practice it is quite simple when taken step by step. Difficult things become easy once you actually start doing them. Execution really matters!
