Locomotive Collector 3.0 Illustrated Collection Tutorial

Source: Internet
Author: User
Today we will walk through an example using the entertainment channel of 163.com (NetEase). The rules here should be fairly general and practical. Let's start.

A detailed look at some features through a collection example
If you are already experienced with the Locomotive collector, this is still worth a look, because my approach departs from the conventional way of thinking. If you are a newcomer, read it carefully: it will speed up your learning and save you a lot of time later. The following are the basic steps of data collection; apply them flexibly:
1. Create a site
Open the Locomotive collector and create a new site.


For easier management, you can give the site any name you find easy to remember. However, I recommend using the name of the source website as the site name, which makes later management easier, as shown in the screenshot.


Most websites have only one set of templates, or a few similar ones, so the so-called "template marks" are very similar from page to page. What is a "template mark"? It is the start or end marker of a block of content. For example, many well-established websites (usually those with a large amount of content, such as Sina and 163) use

to indicate the beginning of the content. There are two reasons for this. First, because there is so much content, such markers help different departments cooperate and make project handover easier. Second, they serve content control: with the spread of XHTML, more and more pages are laid out with layers (div elements), which makes collection markers easier to find (you will see why later). I explain this because we are about to build the content rules for the entire site.
2. The title tag. The corresponding page is here:
First, switch from "Basic site information" to "Whole-site content rules", paste the URL of a content page into "Typical page", and click "Test" to load the page source. Starting with the title tag, we find that the title collected by the default tag carries an extra "_ Netease entertainment" suffix. Double-click the title tag (or select it and click "Modify"), add "_ Netease entertainment" to the excluded content box, and the title tag is done.
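To make the title step concrete, here is a minimal Python sketch of what the title tag plus the "excluded content" box amount to: take the text of the page's title element and delete the excluded suffix. It is an illustration under my own assumptions, not the Locomotive collector's internal code; the function name collect_title and the sample page are hypothetical.

```python
import re


def collect_title(html: str, excluded: list[str]) -> str:
    """Return the first <title> text with every excluded string removed."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    if match is None:
        return ""
    title = match.group(1)
    for text in excluded:
        # Same idea as the "excluded content" box: drop the unwanted suffix.
        title = title.replace(text, "")
    return title.strip()


# Hypothetical page source, not real 163.com markup.
page = "<html><head><title>Sample headline_ Netease entertainment</title></head></html>"
print(collect_title(page, ["_ Netease entertainment"]))  # Sample headline
```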


3. The content tag. The most important part of building tags for a collection rule (task) is finding the markers where the content starts and ends. Most collectors currently require the start and end markers to be unique in the entire source code, that is, each may occur only once in all of the HTML. The Locomotive collector does not need this: it only looks for the first match, searching from top to bottom. In other words, the HTML may contain any number of identical start markers (the same goes for end markers), as long as the marker for the content we want is the first one found from the top of the HTML. Open any content page; here we take this page as an example. We find that the content begins at "go to Forum", so double-click in the code test box to find the relevant code:


We could use this as the start marker, but it is not ideal. Open several content pages, right-click each one and choose "View source", compare the code, and take the part they have in common

as the start marker for the content.
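The first-match behaviour described above, and the advice to keep only the code that several pages share, can be sketched in Python as follows. The helper names extract_between and common_candidates and the sample markup are hypothetical; this only illustrates the idea and is not how the Locomotive collector is implemented.

```python
from typing import List, Optional


def extract_between(html: str, start: str, end: str) -> Optional[str]:
    """Return the text between the first `start` marker and the first `end` after it.

    Only the first match from top to bottom is used, so the markers may
    appear more than once in the page.
    """
    i = html.find(start)
    if i == -1:
        return None
    i += len(start)
    j = html.find(end, i)
    if j == -1:
        return None
    return html[i:j]


def common_candidates(pages: List[str], candidates: List[str]) -> List[str]:
    """Keep only the candidate markers that occur in every sample page."""
    return [c for c in candidates if all(c in page for page in pages)]


# Hypothetical markup: the start marker appears twice in page_a.
page_a = '<p>nav</p><div class="body">Article A</div><div class="body">Sidebar</div>'
page_b = '<p>menu</p><div class="body">Article B</div>'

markers = common_candidates([page_a, page_b], ['<div class="body">', "<p>nav</p>"])
print(markers)  # ['<div class="body">']
print(extract_between(page_a, markers[0], "</div>"))  # Article A
```

Checking candidates against several sample pages is what makes a whole-site rule hold up: a marker copied from a single page may not exist on the others.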
In general, what we collect between the start marker and the end marker contains, besides the article itself, advertisements or links that must be excluded. Here we need to exclude "Related topics > The 6th Golden Eagle TV Art Festival". To exclude it, find the corresponding code and copy the complete snippet into the content exclusion window, with the parts that vary from page to page replaced. Because this is a whole-site rule, you should check several more categories. For example, 163 Entertainment currently has "Stars | Images | Movies | TV | Music | Forums | Topics | Celebrity Visits"; here I only take "Stars", "Images", and "Movies" as examples. Checking the other categories simply makes the rule more general and complete. If you only need one category, such as "Images", you can build the rule from that category directly.
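Excluding unwanted snippets works roughly like the sketch below, assuming each exclusion is a piece of copied HTML in which the part that changes from page to page has been replaced by a placeholder. The "*" placeholder is this sketch's own convention and may not match the collector's actual wildcard syntax; the function names are hypothetical.

```python
import re
from typing import List


def compile_exclusion(snippet: str) -> re.Pattern:
    """Turn a copied HTML snippet into a regex.

    "*" marks the part that changes from page to page (a convention of this
    sketch only); everything else is matched literally.
    """
    parts = [re.escape(part) for part in snippet.split("*")]
    # Non-greedy ".*?" so a placeholder does not swallow unrelated content.
    return re.compile(".*?".join(parts), re.DOTALL)


def apply_exclusions(content: str, snippets: List[str]) -> str:
    """Delete every match of every exclusion snippet from the collected content."""
    for snippet in snippets:
        content = compile_exclusion(snippet).sub("", content)
    return content


# Hypothetical collected content with a "related topics" link in it.
content = 'Story text.<a href="/special/a1.html">Related topics</a> More text.'
print(apply_exclusions(content, ['<a href="*">Related topics</a>']))  # Story text. More text.
```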
This article is split across several pages, so let's look at the settings for the previous and next pages. On this site, "Previous page" and "Next page" are image links, so you only need to copy the image file name (right-click the image, view its properties, and copy the file name) into the corresponding code box. For details, see the screenshot.
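Because the previous/next links are images, the rule only needs the image file name to locate the link. Here is a minimal sketch with Python's standard html.parser module, assuming hypothetical file names such as next.gif and prev.gif (the real names used on 163.com are not given here). The class and function names are likewise illustrative.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class ImageLinkFinder(HTMLParser):
    """Record the href of the first <a> that wraps an <img> with the given file name."""

    def __init__(self, image_name: str) -> None:
        super().__init__()
        self.image_name = image_name
        self._current_href: Optional[str] = None
        self.found: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._current_href = attrs.get("href")
        elif tag == "img" and self.found is None and self._current_href:
            src = attrs.get("src") or ""
            # Compare only the file name, e.g. "/img/next.gif" -> "next.gif".
            if src.rsplit("/", 1)[-1] == self.image_name:
                self.found = self._current_href

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None


def page_link(html: str, page_url: str, image_name: str) -> Optional[str]:
    """Return the absolute URL behind the image link, or None if it is absent."""
    finder = ImageLinkFinder(image_name)
    finder.feed(html)
    finder.close()
    return urljoin(page_url, finder.found) if finder.found else None


html = '<a href="p0.html"><img src="/img/prev.gif"></a><a href="p2.html"><img src="/img/next.gif"></a>'
print(page_link(html, "http://example.com/news/p1.html", "next.gif"))
```

Following the next-page link repeatedly until page_link returns None would walk through every page of the article.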

When building rules, you must be good at spotting patterns; once you do, this collection is no problem. The address of the sample to be collected is
For this section, only pages 1-3 are collected as an example. On each list page, the area of links starts at "Past entertainment hotspots" and ends at "Page 1 2 ......", so copy the corresponding code from the HTML source into the settings that restrict collection to a specific area. In addition, a URL must contain "/06/" to be collected (this is simple; try it yourself), as shown in the screenshot.
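The list-page settings (scan only the area between two markers, keep only URLs containing "/06/") can be sketched as follows. The markers in the usage example are English stand-ins for the page text quoted above; the domain, paths, and the collect_links name are hypothetical, and the regex-based href extraction is a simplification rather than the collector's own method.

```python
import re
from typing import List
from urllib.parse import urljoin

# Simple href extractor; good enough for a sketch, not a full HTML parser.
HREF = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def collect_links(html: str, page_url: str, start: str, end: str, must_contain: str) -> List[str]:
    """Collect article URLs from the area between `start` and `end` on one list page."""
    i = html.find(start)
    if i == -1:
        return []
    i += len(start)
    j = html.find(end, i)
    if j == -1:
        return []
    links: List[str] = []
    for href in HREF.findall(html[i:j]):
        url = urljoin(page_url, href)  # resolve relative links against the list page
        if must_contain in url and url not in links:
            links.append(url)
    return links


list_page = (
    '<a href="/ent/about.html">About</a>'
    "Past entertainment hotspots"
    '<a href="/06/1020/a.html">A</a>'
    '<a href="/05/0101/old.html">Old</a>'
    '<a href="/06/1021/b.html">B</a>'
    "Page 1 2"
)
for url in collect_links(list_page, "http://ent.example.com/list_1.html",
                         "Past entertainment hotspots", "Page 1 2", "/06/"):
    print(url)
```

To cover pages 1-3, you would run collect_links on each of the three list pages and merge the results.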

Below are two screenshots of results I just collected into a local forum for testing:
