LocoySpider (Locomotive Collector) Tutorial in Practice: Writing CMS Collection Rules

Source: Internet
Author: User

**************************************** ****************************
Compiled by Victor (QQ: 99767290)

**************************************** ****************************

First, let's look at the basic functions of LocoySpider V3.
The basic LocoySpider functions we use today are as follows:

1. Create a site

2. Create a task

3. The "Save to software database" mode under data publishing

Of course, this tutorial centers on writing CMS collection rules, so it cannot cover every function of LocoySpider. Please bear with me!

Now let's walk through it with a practical example.
**************************************** *****************

1. Create a site

1. Function: groups the collection tasks that share the same content rules for the same site.

2. Benefits:

A. Clear classification, which makes tasks easy to find and call;

B. Collection tasks created under a site inherit the site's content rules by default, which avoids the trouble of writing the same collection rules repeatedly;

3. Practice:

We use Daily Economic News as the example. First we open the site http://www.nbd.com.cn and browse articles in different columns. We find that the site's article pages (templates) are almost identical.

(Of course, there are small differences: some articles divide paragraphs with <P></P>, and others with <DIV></DIV>. If your website layout is built with <table></table>, this is no big deal, but if your website uses a <DIV></DIV> layout, leftover DIV tags may break your layout. We will discuss the solution to this later, so I will not go into detail here.)

Now we have reason to believe that one set of site-level "content rules" can cover every column of this website.

Click "Create" and name the site "Daily Economic News".

We will write the "title" rule first.


Writing the title tag rule

Note: when choosing the start string of a tag, pay attention to two points: 1. uniqueness; 2. closeness, that is, the string should sit as close as possible to the target collection area.

Start string: <span class="txt181">

End string: </span> <span class="hui">
Note: to confirm that a string is unique, copy it, press Ctrl + F in the page source, and search for it. If the string is unique, searching for the next occurrence shows a "Cannot find XXX" prompt.

To make sure the rule is general, test it on several different articles. We will not demonstrate that here.
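
The start/end string idea and the uniqueness check can be sketched in a few lines of Python. This is only an illustration of the principle, not how LocoySpider works internally: the function names and the sample page fragment are made up, and the two marker strings are modelled on the title rule above.

```python
def count_occurrences(page: str, marker: str) -> int:
    """Return how many times marker appears in the page source."""
    return page.count(marker)


def extract_between(page: str, start: str, end: str) -> str:
    """Return the text between the first start marker and the next end marker."""
    begin = page.find(start)
    if begin == -1:
        raise ValueError(f"start string not found: {start!r}")
    begin += len(start)
    stop = page.find(end, begin)
    if stop == -1:
        raise ValueError(f"end string not found: {end!r}")
    return page[begin:stop]


# Hypothetical page fragment, invented for this example only.
sample_page = (
    '<div><span class="txt181">Housing prices steady</span> '
    '<span class="hui">2007-05-01</span></div>'
)
start_string = '<span class="txt181">'
end_string = '</span> <span class="hui">'

# A start string that occurs exactly once is safe to use as a marker.
is_unique = count_occurrences(sample_page, start_string) == 1
title = extract_between(sample_page, start_string, end_string)
print(is_unique, title)
```

Running the same two functions against several saved article pages is a quick way to check that a rule is general before relying on it.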

HTML tag exclusion: choose "Select all".

Note: you may want to keep "spaces (placeholders)", because some sites separate the parts of a long title not with punctuation or plain white space but with placeholders. In that case, we need to retain the "space (placeholder)" option. (Try it yourself after class.)

At this point we can run a collection test directly on a "typical page" to check the result. If we are satisfied, we go on to write the rule for the article content.

Writing the content tag rule

Start string: <span id="zoom" class="content">

End string: <br> <iframe

HTML tag exclusion: here we need to keep "<br/>", "P", and "<DIV", which are commonly used to divide paragraphs, and also keep the tag commonly used for images.

Note: we have chosen to exclude "<table", but some articles do contain data tables. Here we can only look after the overall result and check for missing data afterwards. Unless you can confirm that the target collection area contains no extra layout tables, excluding tables is still the better choice.
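
The effect of tag exclusion can be approximated as follows. This is a rough sketch under stated assumptions: the function name `exclude_tags` is made up, the choice of `img` for the image tag is an assumption of this example, and a simple regular expression stands in for whatever LocoySpider really does (it would mishandle attribute values that contain ">").

```python
import re

# Matches an opening or closing tag and captures the tag name.
TAG_PATTERN = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")


def exclude_tags(html: str, keep: frozenset = frozenset()) -> str:
    """Remove every HTML tag whose name is not in keep; text content stays."""
    def replace(match: re.Match) -> str:
        name = match.group(1).lower()
        return match.group(0) if name in keep else ""
    return TAG_PATTERN.sub(replace, html)


# Hypothetical article fragment wrapped in a layout table.
sample = (
    '<table><tr><td><p>First<br/>line</p><img src="a.gif">'
    '<font color="red">x</font></td></tr></table>'
)

# Content rule: keep paragraph and image tags, drop everything else.
content = exclude_tags(sample, keep=frozenset({"br", "p", "div", "img"}))

# Title rule: "Select all" means every tag is removed.
title = exclude_tags("<b>Head</b>line")
print(content)
print(title)
```

Note that only the tags themselves are removed. The text inside an excluded table survives, which is why a data table collected this way loses its layout but not its numbers.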

Writing the author tag rule

The key points are the same as for the title tag rule.

Start string: <div align="center" style=font-size: 9pt>

End string: [200

HTML tag exclusion: choose "Select all". (Test)


Writing the time tag rule

The key points are the same as above.

Start string: <span id="zoom" class="content">

End string: <br> <iframe

HTML tag exclusion: choose "Select all". (Test)

Writing the source tag rule

By default, this value is generally set to the target website we collect from, as "fixed format data". If you want your website to show better copyright awareness, you can adjust the rule so that it also captures the original source of articles that the target website itself reprinted. We will not go into that here.

Now we have set the "content rules" for the entire site. Next, we set up the collection task.
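
Because tasks created under a site inherit the site's content rules by default, the relationship between a site and its tasks can be pictured with a small model. This is purely a conceptual sketch: the `Site` and `Task` classes, their fields, and the "Special" task are invented for illustration and say nothing about how LocoySpider stores its rules.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Rule = Tuple[str, str]  # (start string, end string)


@dataclass
class Site:
    name: str
    content_rules: Dict[str, Rule] = field(default_factory=dict)


@dataclass
class Task:
    name: str
    site: Site
    own_rules: Optional[Dict[str, Rule]] = None  # None means "inherit from the site"

    def effective_rules(self) -> Dict[str, Rule]:
        """Return the task's own rules if set, otherwise the site's rules."""
        if self.own_rules is not None:
            return self.own_rules
        return self.site.content_rules


site = Site(
    "Daily Economic News",
    {"title": ('<span class="txt181">', '</span> <span class="hui">')},
)
inherited = Task("Real Estate", site)                        # uses the site rules
custom = Task("Special", site, {"title": ("<h1>", "</h1>")})  # hypothetical override
print(inherited.effective_rules()["title"])
print(custom.effective_rules()["title"])
```

The point of the design is that a rule written once at site level serves every column, and only a column with a different template needs rules of its own.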

**************************************** **********************************

2. Create a collection task

Right-click the collection site you just created and select "Create task from this site". In the dialog box that appears, look at "content rules": as mentioned above, collection tasks created under a site inherit the site's content rules by default. So we can go straight to writing the "collection URL" rules.

Setting the "collection URL depth"

For flexibility and convenience, we usually start from an article list page, so we can use the default value "1". Deeper collection will be covered in later tutorials, so we will not go into detail here.

Writing the start URL rules

Click "Add wizard". The dialog box that appears offers three options: "Single-page URL", "Batch/multi-page", and "Text import". We generally do not use "Text import", so here we only cover the first two methods.

We start with the "Single-page URL" setting, using the "Real Estate" column as the example.

The list page URL is

http://www.nbd.com.cn/ClassNews.asp?D_SClassID=74

Copy it into the text field, click "Add", and then click "Finish adding".

Go back to "New task" - "Collection URL" and set "Select a region to collect URLs from":

From: align='left'>homepage
To: class=right_font>total

Test result: 40 article page URLs are found, and all collection tests pass. Satisfied, we move on. (We will not run the actual collection here.)
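
Restricting URL collection to a region of the list page can be sketched like this. It is a minimal illustration, assuming the region is simply the text between the "From" and "To" strings; the function name, the sample markup, and the `News.asp?id=...` links are hypothetical and not taken from the real site.

```python
import re
from typing import List
from urllib.parse import urljoin

# Captures the value of an href attribute, quoted or unquoted.
HREF_PATTERN = re.compile(r'href\s*=\s*["\']?([^"\'\s>]+)', re.IGNORECASE)


def links_in_region(page: str, start: str, end: str, base_url: str) -> List[str]:
    """Collect absolute, de-duplicated links found between start and end."""
    begin = page.find(start)
    if begin == -1:
        return []
    begin += len(start)
    stop = page.find(end, begin)
    region = page[begin:] if stop == -1 else page[begin:stop]
    found: List[str] = []
    for href in HREF_PATTERN.findall(region):
        url = urljoin(base_url, href)
        if url not in found:
            found.append(url)
    return found


# Hypothetical list page: navigation links sit outside the selected region.
sample = (
    '<a href="/about.asp">About</a>'
    "<td align='left'>homepage"
    '<a href="News.asp?id=1">One</a><a href="News.asp?id=2">Two</a>'
    "<td class=right_font>total"
    '<a href="/contact.asp">Contact</a>'
)
urls = links_in_region(
    sample,
    "align='left'>homepage",
    "class=right_font>total",
    "http://www.nbd.com.cn/ClassNews.asp?D_SClassID=74",
)
print(urls)
```

Links before the "From" string and after the "To" string are ignored, which is what keeps navigation and footer links out of the article list.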

Next, let's learn "Batch/multi-page".

Click "Add wizard" and, in the dialog box that appears, select "Batch/multi-page".

To determine the variable, do the following:

1. On the web page, click "Next page". The address bar shows http://www.nbd.com.cn/ClassNews.asp?D_SClassID=74&page=2

2. Hover the mouse over "Next page". The status bar at the bottom left of the browser shows http://www.nbd.com.cn/ClassNews.asp?D_SClassID=74&page=3

3. Hover the mouse over "Last page". The status bar at the bottom left of the browser shows http://www.nbd.com.cn/ClassNews.asp?D_SClassID=74&page=58

4. Hover the mouse over "Homepage". The status bar at the bottom left of the browser shows http://www.nbd.com.cn/ClassNews.asp?D_SClassID=74&page=1

http://www.nbd.com.cn/ClassNews.asp?D_SClassID=74&page=2
http://www.nbd.com.cn/ClassNews.asp?D_SClassID=74&page=3
http://www.nbd.com.cn/ClassNews.asp?D_SClassID=74&page=58
http://www.nbd.com.cn/ClassNews.asp?D_SClassID=74&page=1

From this we can tell that "&page=(*)" is the variable part of the list URL, so we can set it as follows:

Multi-page similar URL form: http://www.nbd.com.cn/ClassNews.asp?D_SClassID=74&page=(*)

The number ranges from 1 to 58, with a step of 1.

Click "Add" and then "Finish adding".
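
The two steps above, spotting the variable and then expanding the (*) placeholder, can be sketched as follows. The function names are made up for this example, and the code only mirrors the idea of the batch wizard, not its implementation.

```python
from typing import Dict, List, Set
from urllib.parse import parse_qsl, urlsplit


def varying_parameters(urls: List[str]) -> Set[str]:
    """Return the query parameter names whose values differ between the URLs."""
    values: Dict[str, Set[str]] = {}
    for url in urls:
        for key, value in parse_qsl(urlsplit(url).query):
            values.setdefault(key, set()).add(value)
    return {key for key, seen in values.items() if len(seen) > 1}


def expand_pages(template: str, first: int, last: int, step: int = 1) -> List[str]:
    """Replace the (*) placeholder with each number from first to last inclusive."""
    return [template.replace("(*)", str(n)) for n in range(first, last + 1, step)]


# The four addresses observed on the list page.
observed = [
    "http://www.nbd.com.cn/ClassNews.asp?D_SClassID=74&page=2",
    "http://www.nbd.com.cn/ClassNews.asp?D_SClassID=74&page=3",
    "http://www.nbd.com.cn/ClassNews.asp?D_SClassID=74&page=58",
    "http://www.nbd.com.cn/ClassNews.asp?D_SClassID=74&page=1",
]
variable = varying_parameters(observed)

list_urls = expand_pages(
    "http://www.nbd.com.cn/ClassNews.asp?D_SClassID=74&page=(*)", 1, 58
)
print(variable, len(list_urls), list_urls[0], list_urls[-1])
```

Only `page` changes across the observed addresses while `D_SClassID` stays at 74, which is why the placeholder goes on the `page` value.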

Here the "Select a region to collect URLs from" setting is the same as for "Single-page URL", so it is not repeated.

Click "Start test URL". (This process takes a long time, so I paused the video recording.)

Of course, in actual operation, if the data volume is large, you can skip the test and collect directly. Even if some data is lost because the rules do not apply to every page, I think that loss can be ignored.

Here, I select only 2 pages for collection.

The test finds 80 article pages in total.

Next step: Set "data publishing method"

**************************************** **********************************

Method 1: "Save to software database".

At the same time, select method 3, "Publish to the website online via Web": choose the custom publishing method, set "Custom category ID" to 3, name the task "Real Estate", and click "Save/Update" to save the collection task. Since we have only just started this tutorial series, we will not go deeper here.

Return to the LocoySpider main interface, right-click the "Real Estate" task, and select "Start" to run the collection.

The collected data is automatically published through method 3 to the specified category (ID = 3) and, at the same time, saved to:

LocoySpider installation directory/DATA/No.-Task Name/SpiderResult.mdb
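
The save location follows the pattern installation directory / DATA / number-task name / SpiderResult.mdb. A small helper can build that path for a given task. The function name is made up, and the installation directory `C:\LocoySpider` and task number 1 are assumptions used only for the example.

```python
from pathlib import PureWindowsPath


def result_database_path(install_dir: str, task_number: int, task_name: str) -> PureWindowsPath:
    """Build <install_dir>/DATA/<number>-<task name>/SpiderResult.mdb."""
    folder = f"{task_number}-{task_name}"
    return PureWindowsPath(install_dir) / "DATA" / folder / "SpiderResult.mdb"


# Hypothetical installation directory and task number.
db_path = result_database_path(r"C:\LocoySpider", 1, "Real Estate")
print(db_path)
```

`PureWindowsPath` is used so that the path is formatted the Windows way even when the snippet is run on another operating system.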

Oh, yesterday a netizen pointed out a mistake of mine.

I had to write the text, record the video, and collect the information onto my website all within three hours. I nearly fainted N times, and what I wrote was rough and done purely by feel. Sorry, please forgive me! The correction is as follows:

Method 1 and method 3 are parallel: you can select both at the same time, or choose either one. If you have no publishing module, you can collect directly into the local software database. The "local software database" is a Microsoft Access database, which you can open to view and check the data.

Method 3, "Publish to the website online via Web", will be explained in a later tutorial. Please be patient.

All right, this tutorial ends here! See you next lesson!
