Today's sample site is the 163 Entertainment channel. The rule built here should be fairly general and practical, so let's get started.
If you are a Train Collector veteran, you can treat this as a reference, because I want to explain an approach that runs contrary to the traditional thinking. If you are a novice, read closely, because it will speed up your start and save you a lot of time later. Here are the basic steps, which you can adapt flexibly:
1, Open Train Collector and create a new site, as in the following figure:
For ease of administration you can give the site any name you find easy to remember, but I recommend using the name of the target source as the site name, which makes future management easier, as shown in the figure.
Most sites use a single template for the whole site, or several similar templates. "Similar" here means that the template tags in those templates are very close to one another. What is a template tag? A template tag marks the start and end of a section of content. For example, many formal sites (usually larger sites with a lot of content, such as Sina and 163) place a comment-style marker (ending in "-->") or something similar at the beginning of the content section to indicate where the content starts. They do this for two reasons. One is the volume of content: the markers help the various departments coordinate and make project handover easier. The other is the need for content control: with the popularity of XHTML, layer-based layout is used more and more, which makes collection marks easier and easier to find (you will understand this later). Next, we cover the whole-site content rule.
2, The title tag. The sample page is: http://ent.163.com/06/1029/11/2UJNHOS3000322EL.html
First switch from "Site basic information" to "Whole-site content rules", copy the URL of the content page you want to collect into "Typical page", and click "Test" to read the source code. Start with the title tag. With the default settings the collected title carries an extra "_ NetEase Entertainment", so double-click the title tag (or select it and click Modify) and add "_ NetEase Entertainment" to the exclusion content box. The title tag is then complete, as shown in the figure:
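The title exclusion above can be pictured with a minimal Python sketch. This is not Train Collector's own code; the function name `extract_title` and the sample HTML are assumptions made for illustration only.

```python
import re

def extract_title(html, exclusions):
    """Return the <title> text with every excluded fragment removed."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    if match is None:
        return ""
    title = match.group(1)
    # Each entry plays the role of one line in the exclusion content box.
    for fragment in exclusions:
        title = title.replace(fragment, "")
    return title.strip()

# Hypothetical page source; only the "_ NetEase Entertainment" suffix comes from the tutorial.
sample = "<html><head><title>Sample headline_ NetEase Entertainment</title></head></html>"
print(extract_title(sample, ["_ NetEase Entertainment"]))
```

The same exclusion list would apply to every page collected under the site, which is why the suffix only needs to be entered once.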
3, The content tag. The most important part of making a collection rule (task) is finding the start and end marks. Most collectors currently require each mark to be unique in the entire source code, that is, the start mark or end mark may appear only once in the whole HTML source. Train Collector does not require this; the mark only needs to be the first match from top to bottom. In other words, the HTML may contain any number of identical start marks (the same applies to end marks), as long as the one located at the content we want to collect is the first one in the HTML from top to bottom. Open any content page, for example http://ent.163.com/06/1029/11/2UJNHOS3000322EL.html. Its content starts at "Enter the Forum", so double-click the code test box and find the corresponding code, as shown in the figure:
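The first-match behavior described here can be illustrated with a small Python sketch. It is only an assumption about how such matching might work, not Train Collector's actual implementation; `extract_between` and the sample markup are hypothetical.

```python
def extract_between(html, start_mark, end_mark):
    """Return the text between the FIRST start mark and the first end mark after it."""
    start = html.find(start_mark)          # first occurrence, scanning top to bottom
    if start == -1:
        return None
    start += len(start_mark)
    end = html.find(end_mark, start)       # first end mark that follows the start mark
    if end == -1:
        return None
    return html[start:end]

# The start mark appears twice; only the first occurrence needs to sit at the content.
page = (
    '<h1>Headline</h1>'
    '<div class="box">Article body</div>'
    '<div class="box">Related links</div>'
)
print(extract_between(page, '<div class="box">', '</div>'))
```

Because only the first occurrence matters, the mark does not have to be unique in the page, which is the point the paragraph above makes.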
We could use this as the start mark of the content, but it is not ideal. Open several content pages, right-click in each page and choose "View Source", compare the code, and extract the part they have in common. That shared part is what I use as the start mark of the content.
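Comparing the source of several pages by hand is the method the tutorial uses. As a rough aid, the shared fragment can also be searched for with Python's standard `difflib` module. This is a sketch under stated assumptions: `shared_fragment` is a hypothetical helper, and both sample strings are invented rather than taken from 163.

```python
from difflib import SequenceMatcher

def shared_fragment(source_a, source_b):
    """Return the longest run of characters that both page sources contain."""
    matcher = SequenceMatcher(None, source_a, source_b, autojunk=False)
    m = matcher.find_longest_match(0, len(source_a), 0, len(source_b))
    return source_a[m.a:m.a + m.size]

# Hypothetical snippets taken just before the article body on two different pages.
page_a = '<p>ad 17</p><div class="content">First story'
page_b = '<p>ad 42</p><div class="content">Second story'
print(shared_fragment(page_a, page_b))
```

The result is only a candidate; it still has to be checked against more pages, and it must be the first occurrence from top to bottom on each of them.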
Next, look at the end mark of the content, as shown in figure two:
Here's how I set up the rules to collect the content.
In general, what we collect between the start mark and the end mark contains content, ads, or links that must be excluded. What we need to exclude here is "Related Topics >>> The sixth annual Golden Eagle TV Festival". To exclude it, find the corresponding code, copy it in full into the Content Exclusion window, and replace the part that varies with "(*)". Since this is a whole-site rule, we have to check a few more categories. 163 Entertainment currently includes "Stars | Pictures | Movies | TV | Music | Forum | Special Topics | Celebrity Interviews" and so on; here I take only "Stars, Pictures, Movies" as samples. Checking other categories only serves to make the rule universal. If you want just one of these categories, such as "Pictures", you can make the rule for that category directly.
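To make the "(*)" placeholder concrete, here is a minimal Python sketch that treats "(*)" as "any run of characters" and removes every match from the collected content. That reading of "(*)" is an assumption, the helper names are hypothetical, and the sample HTML is invented.

```python
import re

def exclusion_to_regex(pattern):
    """Turn an exclusion entry containing "(*)" wildcards into a compiled regex."""
    literal_parts = pattern.split("(*)")
    # Escape the literal code and let each "(*)" match as little as possible.
    return re.compile(".*?".join(re.escape(p) for p in literal_parts), re.DOTALL)

def apply_exclusions(content, patterns):
    """Remove every fragment that matches one of the exclusion entries."""
    for pattern in patterns:
        content = exclusion_to_regex(pattern).sub("", content)
    return content

# Hypothetical collected content with a related-topics link that should be dropped.
body = '<p>Story text.</p><a href="/special/0001.html">Related Topics</a><p>More text.</p>'
cleaned = apply_exclusions(body, ['<a href="(*)">Related Topics</a>'])
print(cleaned)
```

Only the part that changes from page to page (here the link target) is replaced by "(*)"; the rest of the entry stays a literal copy of the code.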
http://ent.163.com/06/1018/15/2TNNT7EU00031H2L.html This page happens to have pagination, so let's set up the next-page rule along the way. Its "prev" and "Next" links are images, so just copy the image name (right-click the corresponding image, view its properties, and copy the image name) into the corresponding code box. See the figure for details:
As a reminder, to exclude any content, find the corresponding code, copy it in full into the exclusion window, and replace the part that varies with "(*)". Since this page has no ads, the whole-site rule is now finished; click Save and move on to making a single task. The whole-site rule here covers just these two tags. Add others as needed by following the same steps; the method is the same. For other questions, please go to the Train Collector forum for discussion: http://bbs.locoy.com
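The idea of identifying the next-page link by its image name can be sketched in Python as follows. This is an illustration only: `find_next_page`, the image name `next.gif`, and the example.com markup are all assumptions, not values from the 163 page.

```python
import re
from urllib.parse import urljoin

def find_next_page(html, base_url, image_name):
    """Return the absolute URL of the link whose image file is image_name, or None."""
    pattern = re.compile(
        r'<a\s[^>]*href="([^"]+)"[^>]*>\s*<img\s[^>]*src="[^"]*'
        + re.escape(image_name) + r'"',
        re.IGNORECASE,
    )
    match = pattern.search(html)
    return urljoin(base_url, match.group(1)) if match else None

# Hypothetical pager where "prev" and "next" are image links.
pager = (
    '<a href="page_1.html"><img src="/img/prev.gif"></a>'
    '<a href="page_3.html"><img src="/img/next.gif"></a>'
)
next_url = find_next_page(pager, "http://example.com/06/page_2.html", "next.gif")
print(next_url)
```

Matching on the image name rather than on link text works here because the link has no text of its own.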
Two, making a single-task rule:
1, Content rules. Many people may not yet realize how convenient Train Collector is here. This is a feature unique to Train Collector (at least so far; whether someone will later offer the same function is unknown).
Train Collector can test content collection directly, without first going through the URL rules. You can therefore decide from the difficulty of the site whether the target source is worth collecting, instead of finding out only after collecting the URLs that the site cannot be collected or is not worth your time (and that the earlier effort was wasted).
One of the biggest features of Train Collector v3.0 is that a task can inherit the rules of its site. As long as the site rule you made earlier is general, none of the later tasks need their own content collection rules. Because we have already made the content collection rule general, there is nothing more to explain here; just inherit from the site, as in the figure:
2, Making the URL collection rule
Steps: "New" -- "New task"; the remaining actions are as follows:
Making rules takes a knack for spotting regularities; with that, this kind of collection is no problem. We take this address as the example: http://ent.163.com/special/00031HI0/entnews.html
For this board we sample only pages 1-3. On each page, the list of URLs begins at "past entertainment hotspots" and ends at the page selector ("1 2 ... page"), so copy the corresponding code from the HTML source to set the collection range for that area. In addition, require that the URL contain "/06/". That completes the URL collection rule (it is simple, try it yourself), as in the following figure:
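The two conditions above (a collection range between two marks, plus a required "/06/" in the URL) can be sketched in Python. This is a rough model rather than the tool's real logic; `collect_urls`, the pager markup, and the article paths in the sample listing are hypothetical.

```python
import re
from urllib.parse import urljoin

def collect_urls(html, base_url, start_mark, end_mark, must_contain):
    """Collect links inside the marked range whose URL contains must_contain."""
    start = html.find(start_mark)
    if start == -1:
        return []
    start += len(start_mark)
    end = html.find(end_mark, start)
    if end == -1:
        return []
    region = html[start:end]               # only this area is scanned for links
    urls = []
    for href in re.findall(r'href="([^"]+)"', region):
        url = urljoin(base_url, href)
        if must_contain in url and url not in urls:
            urls.append(url)
    return urls

# Hypothetical list page; only the range marks and "/06/" follow the tutorial.
listing = (
    '<a href="/index.html">Home</a>'
    '<h2>past entertainment hotspots</h2>'
    '<a href="/06/1029/11/AAA.html">Story A</a>'
    '<a href="/special/topic.html">Topic</a>'
    '<a href="/06/1018/15/BBB.html">Story B</a>'
    '<div class="pager">1 2 3</div>'
    '<a href="/06/0101/00/CCC.html">Outside the range</a>'
)
urls = collect_urls(
    listing,
    "http://ent.163.com/special/00031HI0/entnews.html",
    "past entertainment hotspots",
    '<div class="pager">',
    "/06/",
)
print(urls)
```

The range filter drops navigation links outside the list, and the "/06/" filter drops non-article links that happen to sit inside it.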
3, Publishing. There are 5 ways to publish; the example here uses the most common one, "online release".
Select online Web publishing to a website, click "Define Global Publishing", and follow the program's steps: select the release module -> fill in the site/CMS root address -> log in with Train Collector's built-in browser -> close the built-in browser after logging in -> refresh the list -> test the module until the test succeeds -> save the configuration -> save the task -> publish. The highlighted areas in the following figure are the steps to take, from left to right and top to bottom:
Below are two screenshots from my test of collecting into a local forum: