Locomotive Collector 3.0 Illustrated Collection Tutorial

Source: Internet
Author: User
A sample walkthrough of some of the collection features
The site we will use as today's sample is the 163 entertainment channel. The rules built here should be fairly general and practical. Let's begin.
If you are a Locomotive Collector veteran, you may still find this useful as a reference, because my explanation runs contrary to the traditional way of thinking. If you are a novice, read closely: it will speed up your start and save you a lot of time later. Here are the basic collection steps, which you can apply flexibly:
First, create a site
1, Open Locomotive Collector and create a new site, see:

For ease of administration you can give the site any name you find easy to remember, but I recommend using the name of the target source as the site name, which makes later management easier, for example:
Most sites use only one set of templates, or a few similar sets, across the whole site. "Similar" here means that the tags in the templates are very close to each other. So what is a template tag? A template tag is the mark at the start and the end of a piece of content. For example, many established sites (usually the larger, content-heavy ones such as Sina and 163) place a marker of some kind at the beginning of the content to show where it starts. They do this for two reasons. One is internal coordination: the departments that produce the content leave marks that make handing work over easier. The other is content control: with the spread of XHTML, more of the page is controlled by layers, which makes collection markers easier and easier for us to find (you will see why later). I mention all this because the next thing to explain is the whole-site content rules.
2, The title tag. The sample page is: http://ent.163.com/06/1029/11/2UJNHOS3000322EL.html
First, switch from "Site basic information" to "Full site content rules", copy the URL of the content page to be collected into "Typical page", and click "Test" to read the source code. Start with the title tag. The default tag brings back a title with an extra "_ NetEase Entertainment" at the end. Double-click the title tag, or select it and click Modify, and add "_ NetEase Entertainment" to the exclusion box. The title tag is done.
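The effect of the exclusion box on the title can be pictured with a minimal Python sketch. This is only an illustration of the idea, not how Locomotive Collector is implemented; the function name clean_title and the sample headline are hypothetical.

```python
def clean_title(raw_title, exclusions):
    """Remove every excluded fragment from a collected title."""
    for fragment in exclusions:
        raw_title = raw_title.replace(fragment, "")
    return raw_title.strip()


# Hypothetical headline; only the suffix comes from the tutorial.
collected = "Sample headline_ NetEase Entertainment"
title = clean_title(collected, ["_ NetEase Entertainment"])
print(title)
```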

3, The content tag. The most important part of making any tag in a collection rule (task) is finding the start and end flags. Most current collectors require the start and end flags to be unique in the entire source code, that is, each flag may occur only once in the whole HTML source. Locomotive Collector does not require this. It only needs the first occurrence from top to bottom: the HTML may contain n identical start flags (the same goes for end flags), as long as the one that belongs to the content we want to capture is the first one from the top. Open any content page, for example http://ent.163.com/06/1029/11/2UJNHOS3000322EL.html. We find that its content starts from "Enter the Forum", so double-click the code test box and find the corresponding code.
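The "first occurrence from top to bottom" behavior described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's actual code: the function name extract_between and the markers "<!--begin-->" and "<!--end-->" are hypothetical.

```python
def extract_between(html, start_flag, end_flag):
    """Return the text between the first start_flag and the first
    end_flag that follows it, or None if either flag is missing."""
    start = html.find(start_flag)          # first occurrence from the top
    if start == -1:
        return None
    start += len(start_flag)
    end = html.find(end_flag, start)       # first end flag after the start
    if end == -1:
        return None
    return html[start:end]


# The flags repeat further down the page, but only the first pair is used.
page = (
    "<div>navigation</div>"
    "<!--begin-->first body<!--end-->"
    "<!--begin-->second body<!--end-->"
)
body = extract_between(page, "<!--begin-->", "<!--end-->")
print(body)
```

Because the search stops at the first match, a flag does not have to be unique in the page; it only has to appear first at the position we care about.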

We could use this as the start flag of the content, but it is not perfect yet. Open several content pages, choose "right click" - "View Source" in the browser, compare the code, and extract the part that is the same on every page. That common part is what I use as the content start flag.
Next, look at the end flag of the content, as shown in the two figures:
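Comparing several pages by hand amounts to checking which candidate flags appear in all of them. A minimal sketch of that check follows; the function name shared_flags, the sample pages, and the candidate strings are all hypothetical.

```python
def shared_flags(candidates, pages):
    """Keep only the candidate flags that occur in every page source."""
    return [flag for flag in candidates if all(flag in page for page in pages)]


# Hypothetical page sources and candidate flags for illustration.
pages = [
    '<div id="text">story A</div><span>views: 10</span>',
    '<div id="text">story B</div><span>views: 25</span>',
]
candidates = ['<div id="text">', "<span>views: 10</span>"]
usable = shared_flags(candidates, pages)
print(usable)
```

A flag that survives this check on a handful of pages from different categories is a reasonable choice for a whole-site rule.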

Here is what the rules I set bring back:

In general, the content collected between the start flag and the end flag will include content, advertisements, or links that must be excluded. What we need to exclude here is "Related Topics >>> Sixth annual Golden Eagle TV Festival". To exclude it, find the corresponding code, copy it in full into the content exclusion window, and replace the part that varies with "(*)". Since this is a whole-site rule, you have to check a few more categories. The 163 entertainment channel currently includes "Stars | Pictures | Movies | TV | Music | Forum | Special Topics | Celebrity Interviews" and so on; here I only use "Stars, Pictures, Movies" as samples. Checking other categories only serves to make the rule as general as possible. If you only want one category, such as "Pictures", build the rule directly on that category.
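One way to picture an exclusion rule with "(*)" is as a pattern in which "(*)" stands for any run of characters. The Python sketch below works under that assumption; it is not the tool's implementation, and the function names and the sample HTML are hypothetical.

```python
import re


def exclusion_to_regex(rule):
    """Turn an exclusion rule into a regex, treating "(*)" as
    'any characters' (an assumption about the rule syntax)."""
    literal_parts = rule.split("(*)")
    return re.compile(".*?".join(re.escape(p) for p in literal_parts), re.DOTALL)


def apply_exclusions(content, rules):
    """Remove every match of every exclusion rule from the content."""
    for rule in rules:
        content = exclusion_to_regex(rule).sub("", content)
    return content


# Hypothetical collected content with a related-topics link to drop.
content = (
    "<p>Story text.</p>"
    '<a href="/special/123.html">Related Topics</a>'
    "<p>More text.</p>"
)
cleaned = apply_exclusions(content, ['<a href="(*)">Related Topics</a>'])
print(cleaned)
```

The fixed parts of the rule are escaped so that they match literally; only the "(*)" placeholder is allowed to vary from page to page.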
The page http://ent.163.com/06/1018/15/2TNNT7EU00031H2L.html happens to have pagination, so let's set up the next-page rule as well. Its "Previous" and "Next" links are images, so you only need to copy the image name (right-click the image, view its properties, and copy the image name) into the corresponding code box. See the picture for details:
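Locating a next-page link by the name of its image can be sketched as follows. This is a rough illustration only: the function name find_next_page, the image names prev.gif and next.gif, and the sample links are hypothetical, and real pages may need a more tolerant pattern.

```python
import re


def find_next_page(html, image_name):
    """Return the href of the first link whose image src ends with
    image_name, or None if no such link exists."""
    pattern = re.compile(
        r'<a\b[^>]*\bhref="([^"]+)"[^>]*>\s*<img\b[^>]*\bsrc="[^"]*'
        + re.escape(image_name)
        + r'"',
        re.IGNORECASE,
    )
    match = pattern.search(html)
    return match.group(1) if match else None


# Hypothetical pagination markup with image links.
pager = (
    '<a href="page_1.html"><img src="/img/prev.gif"></a>'
    '<a href="page_3.html"><img src="/img/next.gif"></a>'
)
next_url = find_next_page(pager, "next.gif")
print(next_url)
```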

In the same way, to exclude any content you only need to find the corresponding code, copy it in full into the exclusion window, and replace the variable part with "(*)". As this site has no advertisements in the content, the whole-site rules are finished. Click Save and move on to making a single task. That covers the two tags of the whole-site rules; add others as needed by following the steps above, since the principle stays the same. For other questions, please visit the Locomotive Collector forum to discuss them: http://bbs.locoy.com

Second, making single-task rules:
1, Content rules. Many people may still not know this about Locomotive Collector, but what follows is a feature unique to it (at least so far, as far as I know, no other tool offers the same function).
Locomotive Collector can go straight to content collection without first going through the URL rules. This lets you decide early whether the selected target source is worth collecting, rather than finding out only after the URLs have been collected that the site cannot be collected or is not worth your time (with all the earlier effort wasted).
One of the biggest features of Locomotive Collector v3.0 is that tasks can inherit the rules of the site. As long as the rules you made earlier are general, none of the following tasks needs its own content collection rules. Because the content collection rules we made above are general, there is no need to explain them again here; just inherit the site rules.
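Rule inheritance can be pictured as a task starting from a copy of the site rules. The sketch below is only a mental model: the rule keys, the markers, and the idea that a task may override individual rules are assumptions, not documented behavior of Locomotive Collector.

```python
def effective_rules(site_rules, task_overrides=None):
    """Start from the site rules and apply any task-level overrides
    (overrides are an assumption made for this sketch)."""
    merged = dict(site_rules)
    merged.update(task_overrides or {})
    return merged


# Hypothetical site rules; only the title suffix comes from the tutorial.
site_rules = {
    "title_exclude": ["_ NetEase Entertainment"],
    "content_start": "<!--begin-->",
    "content_end": "<!--end-->",
}

inherited = effective_rules(site_rules)
customized = effective_rules(site_rules, {"content_end": "<!--stop-->"})
print(inherited == site_rules, customized["content_end"])
```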

2, Making the URL collection rules
Steps: "New" - "New task"; for the other actions, see:

Making rules requires a knack for spotting regularity; once you have that, this kind of collection is no problem. The sample address here is http://ent.163.com/special/00031HI0/entnews.html
As an example, we collect only pages 1-3 of this section. We find that the URL list on each page begins after "past entertainment hotspots" and ends before "Page 1 2 ...", so copy the corresponding code from the HTML source to limit the collection range to that area. In addition, the URL must contain "/06/". With that, the URL collection rule is done (it is simple, try it yourself), as shown:
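The two conditions above (collect only inside a marked region, and keep only URLs that contain "/06/") can be sketched in Python. This is a simplified illustration: the function name collect_urls, the region markers "LIST START" and "LIST END", and the sample links are hypothetical stand-ins for the real page code.

```python
import re
from urllib.parse import urljoin


def collect_urls(html, base_url, region_start, region_end, must_contain):
    """Collect links between region_start and region_end whose
    absolute URL contains must_contain, in page order, no duplicates."""
    start = html.find(region_start)
    if start == -1:
        return []
    end = html.find(region_end, start + len(region_start))
    region = html[start:end] if end != -1 else html[start:]

    urls = []
    for href in re.findall(r'href="([^"]+)"', region):
        url = urljoin(base_url, href)      # resolve relative links
        if must_contain in url and url not in urls:
            urls.append(url)
    return urls


# Hypothetical list page: links outside the region are ignored.
listing = (
    '<a href="/06/1029/11/banner.html">banner</a>'
    "<h2>LIST START</h2>"
    '<a href="/06/1029/11/a.html">A</a>'
    '<a href="/special/topic.html">T</a>'
    '<a href="/06/1018/15/b.html">B</a>'
    "<div>LIST END</div>"
    '<a href="/06/1001/09/footer.html">F</a>'
)
urls = collect_urls(
    listing,
    "http://ent.163.com/special/00031HI0/entnews.html",
    "LIST START",
    "LIST END",
    "/06/",
)
print(urls)
```

Restricting the region first keeps navigation and footer links out, and the "/06/" filter then drops links inside the region that are not article pages.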

3, Publishing. There are 5 ways to publish; the most commonly used one, "online publishing", serves as the example here.
Select publishing to a website, click "Define Global Publishing Method", and follow the system prompts: select the publishing module -> fill in the site/CMS root address -> log in with the built-in browser -> close the browser after logging in -> refresh the list -> test the module -> on success, save the configuration -> save the task. The highlighted parts are the steps to perform, from left to right and top to bottom:

Here are two screenshots from my collection test against a local forum:
