A Rookie's Quick Guide to the Knight Crawl Module

Source: Internet
Author: User


These days I have been building sites, and updating each site and posting articles by hand is exhausting, so I felt I should buy some station-group software. Searching on Baidu, I found the Knight (Chivalrous) station group system. Looking into it, I discovered it has been around for years, and now I really regret not starting last year. I contacted www.xiake5.com over Enterprise QQ, but it was probably after working hours, so no one answered me. So I asked in a group whether anyone had used the Knight station group. A friend said he used it, that it was very good, and that there is now a free version. So I downloaded the free version from www.xiake5.com/v3.rar and started using Knight for the first time.

First of all, thanks to the A5 editor for giving me the chance to try it, hehe! After getting the license, I opened the software; it updated for a while, and when it finally opened I found I understood nothing. So I clicked Help and learned as I went. It is actually quite simple, and there are official tutorials. But when I got to choosing the publish module, I really did not understand the first two options, namely this UTF-8 thing. The tutorial I was following chose UTF-8 because the author's site program is UTF-8, but mine was GBK, and I did not know which to pick. When I clicked Connect, it reported an error. Later I saw GBK among the modules, tried selecting it, and it succeeded. Then I inadvertently chose UTF-8 again, and that also succeeded, so I do not know whether my earlier operation was wrong or something else happened. Anyway, I have now published about 100 articles, and next I started trying to build a crawl module.
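Why the GBK/UTF-8 choice matters at all: the two encodings store the same Chinese text as different bytes, so text written in one and read back in the other comes out garbled. A minimal Python sketch of that effect; the helper names `encode_for_site` and `read_as` and the sample title are hypothetical and have nothing to do with Knight's own code.

```python
def encode_for_site(text: str, site_charset: str) -> bytes:
    """Encode article text in the charset the target site program expects."""
    return text.encode(site_charset)


def read_as(data: bytes, charset: str) -> str:
    """Decode bytes with a given charset; undecodable bytes become U+FFFD."""
    return data.decode(charset, errors="replace")


if __name__ == "__main__":
    title = "侠客站群"
    gbk = encode_for_site(title, "gbk")
    utf8 = encode_for_site(title, "utf-8")
    print(len(gbk), len(utf8))             # 8 12: 2 bytes vs 3 bytes per character
    print(read_as(gbk, "gbk") == title)    # True: charsets match
    print(read_as(gbk, "utf-8") == title)  # False: mismatched charset garbles the text
```

If the publish module's encoding does not match the site program, this is the kind of mismatch you would expect to see in the published articles.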

Step one: Set the crawl mode

1. Set the crawl mode to "spider crawling".

2. Click "Process 1" to start the setup.

Note: Spider crawling has two stages: Process 1 (crawling the URLs of the article pages) and Process 2 (crawling the article content).
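The two-stage split can be sketched roughly as follows. This is not Knight's implementation: `process_1`, `process_2` and the injected `fetch` callable are hypothetical names, and `fetch` is left abstract so the sketch runs without network access (in practice it might wrap `urllib.request.urlopen`).

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href="..."> on a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the list page URL.
                self.links.append(urljoin(self.base_url, href))


def process_1(list_url: str, fetch: Callable[[str], str]) -> List[str]:
    """Stage 1: crawl a list page and return candidate article URLs."""
    collector = LinkCollector(list_url)
    collector.feed(fetch(list_url))
    collector.close()
    return collector.links


def process_2(urls: List[str], fetch: Callable[[str], str]) -> Dict[str, str]:
    """Stage 2: fetch the raw HTML of each article page."""
    return {url: fetch(url) for url in urls}


if __name__ == "__main__":
    # Fake in-memory "site" standing in for real HTTP requests.
    pages = {
        "http://example.com/list.html": '<a href="/a/1.html">One</a> <a href="a/2.html">Two</a>',
        "http://example.com/a/1.html": "<h1>One</h1>",
        "http://example.com/a/2.html": "<h1>Two</h1>",
    }
    urls = process_1("http://example.com/list.html", pages.__getitem__)
    print(urls)
    print(process_2(urls, pages.__getitem__))
```

Keeping the two stages separate mirrors the tool's layout: stage 1 only decides which URLs are articles, and stage 2 never has to care how those URLs were found.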

The figure below shows the window that opens after clicking the "Process 1" button.

Step two: Configure the URL crawl parameters

1. Open the page we want to collect in the browser.

2. Check how the page is encoded. Specifically: on the open web page, right-click --> View Source --> find "charset", as shown in the figure below. The value after the "=" sign is the encoding. (Steps 1 and 2 are optional; if you skip them, just choose automatic recognition when selecting the encoding.)

3. Now for the actual configuration (the two steps above are optional).
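Items 1 and 2 above amount to reading the `charset` declaration from the page source. A minimal Python sketch of that lookup, with UTF-8 as an assumed fallback; it is not necessarily how Knight's automatic recognition works, and `sniff_charset`/`decode_page` are hypothetical names.

```python
import codecs
import re

# Matches both <meta charset="gbk"> and
# <meta http-equiv="Content-Type" content="text/html; charset=gb2312">.
_CHARSET_RE = re.compile(rb'charset\s*=\s*["\']?\s*([A-Za-z0-9_\-]+)', re.IGNORECASE)


def sniff_charset(raw: bytes, default: str = "utf-8") -> str:
    """Return the charset declared in the page source, or `default`."""
    match = _CHARSET_RE.search(raw[:4096])  # declarations sit near the top
    if match:
        name = match.group(1).decode("ascii").lower()
        try:
            codecs.lookup(name)  # make sure Python knows this encoding
            return name
        except LookupError:
            pass  # unknown name: fall back to the default
    return default


def decode_page(raw: bytes, default: str = "utf-8") -> str:
    """Decode raw page bytes using the sniffed charset."""
    return raw.decode(sniff_charset(raw, default), errors="replace")
```

Only the first 4 KB are scanned, on the assumption that the declaration sits in the `<head>`; a page that declares nothing simply falls back to UTF-8.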

Click the "Content page address extraction" button to open its window, then carry out the operations in the order of the numbers in the figure below.

Note: Steps 5, 6 and 7 below are the three important ones. They are needed because some of the extracted links do not point to article pages, so we filter those out. In "Result must contain", enter the URL feature of the article pages; in "Result must not contain", enter the features of non-article pages.
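The "must contain" / "must not contain" filtering in steps 5 to 7 can be sketched in Python like this. It assumes every "must contain" entry has to match (the tool may combine several entries differently), and the URL features `/news/` and `list_` in the usage below are hypothetical examples, not values from the post.

```python
from typing import Iterable, List


def filter_article_urls(
    urls: Iterable[str],
    must_contain: Iterable[str],
    must_not_contain: Iterable[str],
) -> List[str]:
    """Keep URLs that contain every 'must contain' feature and none of the
    'must not contain' features; drop duplicates but preserve order."""
    required = list(must_contain)
    forbidden = list(must_not_contain)
    seen = set()
    kept = []
    for url in urls:
        if url in seen:
            continue  # already considered this URL
        seen.add(url)
        if all(f in url for f in required) and not any(f in url for f in forbidden):
            kept.append(url)
    return kept


if __name__ == "__main__":
    links = [
        "http://example.com/news/2012/101.html",
        "http://example.com/news/list_2.html",  # list page, not an article
        "http://example.com/about.html",        # unrelated page
        "http://example.com/news/2012/101.html",
    ]
    print(filter_article_urls(links, ["/news/"], ["list_"]))
```

Plain substring tests keep the rules as easy to read as the tool's own text boxes; wildcard or regex rules would need a different matcher.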

At this point the URL extraction rules for the article pages are configured; save each window in turn, then move on to the next step, configuring the "Content extraction parameters".

Step three: Configure the content extraction parameters

Click the button to open the window, then fill in the settings in the order shown in the window.

Click "Test" to bring up the window showing the extraction result.
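The post does not list the fields of the content extraction window. Many collection tools let you mark each field by the text immediately before and after it; assuming that approach (an assumption, not a confirmed description of Knight), testing the rules amounts to something like this Python sketch, where the markers and field names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def extract_between(html: str, start: str, end: str) -> Optional[str]:
    """Return the text between the first `start` marker and the next `end`
    marker, or None if either marker is missing."""
    i = html.find(start)
    if i == -1:
        return None
    i += len(start)
    j = html.find(end, i)
    if j == -1:
        return None
    return html[i:j].strip()


def extract_fields(
    html: str, rules: Dict[str, Tuple[str, str]]
) -> Dict[str, Optional[str]]:
    """Apply one (start, end) marker pair per field, e.g. title and body."""
    return {field: extract_between(html, s, e) for field, (s, e) in rules.items()}


if __name__ == "__main__":
    page = ('<h1 class="title">Hello</h1>'
            '<div id="content"><p>Body text</p></div><div id="footer">x</div>')
    rules = {
        "title": ('<h1 class="title">', "</h1>"),
        "body": ('<div id="content">', "</div>"),
    }
    print(extract_fields(page, rules))
```

A `None` for a field is the sketch's equivalent of an empty test result: one of the markers no longer matches the page.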

At this point the whole process is complete; just save each window in turn.

Below are the Knight station group log window output and the articles collected into the article library with the module built above; the results are good. ^_^ The screenshots speak for themselves...
