TSYS okhtm.com Modified Version: How to Use Data Acquisition

Source: Internet
Author: User
Acquisition function

[1] Classification management

A Add a channel



B Channels (click a channel to enter column management)



C Add a new column




[2] Project management

A Add a new project



Project name: A name for the project, so that you can manage it easily later.
Channel: Select the channel that the collected content goes into.
Column: Select the column that the collected content belongs to.
Topic: Select a special topic if needed; the default is "Not part of any special topic".
Site name: The name of the site you are collecting from.
Site URL: The address of the site you are collecting from.
Site login: Sites that do not require a login need no login parameters.
Login parameters: If the site's content can only be accessed after logging in, set the login parameters (logins that require a verification code are not supported).
Submit address: The URL of the file that checks the user name and password when the login button is clicked. For Power 3.62, for example, it is http://www.****.com/admin_chklogin.asp.
User (password) parameter: View the source code of the login form and find fields like the following:
User name: <input type="text" name="UserName" value="">
Password: <input type="password" name="password" value="">
User parameter: the value after name= in the first line, i.e. UserName.
Password parameter: the value after name= in the second line, i.e. password.
Failure message: The message the site shows when a login fails because the user name or password is wrong. It is used to determine whether the login succeeded, so be sure to fill it in; otherwise collection will not work. For example: "The user name or password you entered is incorrect, please re-enter!"
Project notes: Any other information you want to record about the project, for example "collect daily".
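
The login parameters above amount to an ordinary HTML form POST. The following Python sketch illustrates the idea; it is not TSYS's own code. It assumes the site accepts a plain form-encoded POST to the submit address and keeps the session in a cookie, and the function names are hypothetical. The success check simply looks for the configured failure message in the response, as the Failure message setting describes.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, OpenerDirector, Request, build_opener


def build_login_request(submit_url: str, user_param: str, password_param: str,
                        user: str, password: str, encoding: str = "utf-8") -> Request:
    """Build the POST request a browser would send when the login button is clicked."""
    # Field names come from the name= attributes of the login form (e.g. UserName, password).
    # Assumption: older sites may expect encoding="gb2312" instead of UTF-8.
    body = urlencode({user_param: user, password_param: password}, encoding=encoding)
    return Request(submit_url, data=body.encode("ascii"), method="POST")


def login_succeeded(response_html: str, failure_message: str) -> bool:
    """Treat the login as failed if the page contains the configured failure message."""
    return failure_message not in response_html


def login(submit_url: str, user_param: str, password_param: str, user: str,
          password: str, failure_message: str, encoding: str = "utf-8") -> OpenerDirector:
    """Log in and return an opener that keeps the session cookie for later page fetches."""
    opener = build_opener(HTTPCookieProcessor(CookieJar()))
    request = build_login_request(submit_url, user_param, password_param,
                                  user, password, encoding)
    with opener.open(request, timeout=30) as response:
        html = response.read().decode(encoding, errors="replace")
    if not login_succeeded(html, failure_message):
        raise RuntimeError("login failed: the failure message was found in the response")
    return opener
```

List and article pages would then be fetched with `opener.open(url)` so that the session cookie is sent along.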

B List Settings



A list is like a book's table of contents: a table of contents may fit on a single page or run to many pages, and a list is the same.

List index page:

The list page where you want collection to start.

List start/end tag:

Remember from geometry that two points determine a straight line? In the same way, the start and end tags pin down the news you want to collect; if they are set badly, other news on the page will be captured as well.
For example, here is the main part of a list page's code:
<table width="98%" border="0" cellspacing="0" cellpadding="3">
<tr>
<td align="left" valign="top"><br>
<a href="news.asp?id=1" target=_blank>News headline</a><br>
<a href="news.asp?id=2" target=_blank>News headline</a><br>
.... Omitted
<a href="news.asp?id=50" target=_blank>News headline</a>
</td>
</tr>
</table>
The parts marked in red are the start and end tags of the list we want; is the news block you want not exactly what lies between them? You could pick many different start and end tags, so in that sense they are not unique. But they must be relatively unique: the start tag must occur only once in the code before the first news item, and the end tag must not occur anywhere between the start tag and the intended end of the list.
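
The start/end-tag idea can be sketched in a few lines of Python. This is a hypothetical helper, not TSYS's implementation, and the tag values used below (`<td align="left" valign="top"><br>` and `</td>`) are assumptions chosen to fit the sample code. It returns the text after the first occurrence of the start tag up to the next occurrence of the end tag, which is why the start tag must be unique before the first news item and the end tag must not appear earlier inside the list.

```python
from typing import Optional


def cut_between(html: str, start_tag: str, end_tag: str) -> Optional[str]:
    """Return the text between the first start_tag and the next end_tag after it."""
    start = html.find(start_tag)
    if start == -1:
        return None  # start tag not found: the tag is wrong or the page has changed
    start += len(start_tag)
    end = html.find(end_tag, start)  # search for the end tag only after the start tag
    if end == -1:
        return None
    return html[start:end]


page = """<table width="98%" border="0" cellspacing="0" cellpadding="3">
<tr>
<td align="left" valign="top"><br>
<a href="news.asp?id=1" target=_blank>News headline</a><br>
<a href="news.asp?id=2" target=_blank>News headline</a>
</td>
</tr>
</table>"""

block = cut_between(page, '<td align="left" valign="top"><br>', "</td>")
print(block)
```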

List Index Paging:

(1) Set the tags
Part of the list index page's code looks like this:
&LT;TD height= align= "center" bgcolor= "#F6f7f8" > 1 <a href= "index_2.html" >2</A> <a href= "Index_" 3.html ">3</A> <a href=" index_4.html ">3</a><a href=" index_2.html "> Next </a>
<a href= "index_4.html" > last page </a> </td>
The red parts are the paging start/end tags. As long as these two pieces of code are right, the next page can be determined and the program takes care of the rest. Some people fill in <a href=" and ">2</a> as the tags; this is wrong, and it is worth working out for yourself why.
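
The following Python sketch illustrates how a paging start/end pair can locate the next page. It is a hypothetical helper, not TSYS's code, and it makes one assumption of its own: it finds the end tag (here `">Next</a>`) first and then takes the nearest start tag (`<a href="`) before it, because `<a href="` by itself occurs many times in the paging bar. The relative link is resolved against the current page URL with `urljoin`; `www.example.com` is a placeholder.

```python
from typing import Optional
from urllib.parse import urljoin


def next_page_url(html: str, current_url: str,
                  start_tag: str = '<a href="', end_tag: str = '">Next</a>') -> Optional[str]:
    """Return the absolute URL of the next list page, or None on the last page."""
    end = html.find(end_tag)
    if end == -1:
        return None  # no "Next" link, so this is the last page
    start = html.rfind(start_tag, 0, end)  # nearest start tag before the end tag
    if start == -1:
        return None
    relative = html[start + len(start_tag):end].strip()
    return urljoin(current_url, relative)


paging = ('1 <a href="index_2.html">2</a> <a href="index_3.html">3</a> '
          '<a href="index_4.html">4</a> <a href="index_2.html">Next</a> '
          '<a href="index_4.html">Last page</a>')
print(next_page_url(paging, "http://www.example.com/news/index.html"))
# prints http://www.example.com/news/index_2.html
```

A collector would call this on each list page in turn and stop when it returns None.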

Index paging relocation: works the same way as link relocation; see the link settings.

(2) Batch generation
If the list pages follow a pattern like this:
First page http://www.it.com.cn/news/cyxw/yejie/index_1.html
Second page http://www.it.com.cn/news/cyxw/yejie/index_2.html
Third page http://www.it.com.cn/news/cyxw/yejie/index_3.html

Then you can set it up like this ({$ID} is required):

Original string: http://www.it.com.cn/news/cyxw/yejie/index_{$ID}.html

Generation range: 1-3

The program will then generate the following list pages:

http://www.it.com.cn/news/cyxw/yejie/index_1.html

http://www.it.com.cn/news/cyxw/yejie/index_2.html

http://www.it.com.cn/news/cyxw/yejie/index_3.html
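
Batch generation is a simple template substitution. This minimal Python sketch (the function name is hypothetical) replaces {$ID} with every number in the generation range, inclusive at both ends, which reproduces the three URLs above.

```python
from typing import List


def generate_list_urls(template: str, first: int, last: int) -> List[str]:
    """Expand {$ID} in the template for every number in first..last (inclusive)."""
    if "{$ID}" not in template:
        raise ValueError("the original string must contain {$ID}")
    return [template.replace("{$ID}", str(n)) for n in range(first, last + 1)]


for url in generate_list_urls("http://www.it.com.cn/news/cyxw/yejie/index_{$ID}.html", 1, 3):
    print(url)
```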

(3) Manually add

Enter a page URL, press Enter, and type the next one; repeat this to enter as many URLs as you need.
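
A one-URL-per-line box like this is easy to turn into a list of pages. The Python sketch below is hypothetical: trimming whitespace, skipping blank lines and duplicates, and rejecting anything that is not an http(s) URL are assumptions of this sketch, not documented TSYS behavior.

```python
from typing import List, Set
from urllib.parse import urlparse


def parse_manual_urls(text: str) -> List[str]:
    """One URL per line: trim whitespace, skip blank lines and duplicates, keep order."""
    urls: List[str] = []
    seen: Set[str] = set()
    for line in text.splitlines():
        url = line.strip()
        if not url or url in seen:
            continue
        if urlparse(url).scheme not in ("http", "https"):
            raise ValueError(f"not an http(s) URL: {url!r}")
        seen.add(url)
        urls.append(url)
    return urls
```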

Link Settings

Link start/end tag:

If this is not set correctly, collection may stop partway through.

Part of the code:

<table width="98%" border="0" cellspacing="0" cellpadding="3">
<tr>
<td align="left" valign="top"><br>
<a href="List.asp?type=IT News">[IT News]</a><a href="new.asp?id=1" target=_blank>News headline</a>
<a href="List.asp?type=PC News">[PC News]</a><a href="new.asp?id=2" target=_blank>News headline</a>
.... Omitted
<a href="List.asp?type=IT News">[IT News]</a><a href="new.asp?id=50" target=_blank>News headline</a>
</td>
</tr>
</table>
The red parts are the link start/end tags. Note: if a headline is preceded by a column link (or any other link, such as [IT News] and [PC News] above), the start tag must be extended further forward. In my earlier video for version 3.62 the start tag began with href=, which only works when no column link precedes the news headlines.
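
To see why the start tag has to reach back past the column link, here is a Python sketch (hypothetical, not TSYS's code) that collects every string between a link start tag and end tag and resolves it against the list page URL. The tag values `</a><a href="` and `" target=_blank>` are assumptions chosen to fit the sample code, and `www.example.com` is a placeholder.

```python
from typing import List
from urllib.parse import urljoin


def extract_links(block: str, start_tag: str, end_tag: str, base_url: str) -> List[str]:
    """Collect every string between start_tag and end_tag, resolved against base_url."""
    links: List[str] = []
    pos = 0
    while True:
        start = block.find(start_tag, pos)
        if start == -1:
            break
        start += len(start_tag)
        end = block.find(end_tag, start)
        if end == -1:
            break
        links.append(urljoin(base_url, block[start:end]))
        pos = end + len(end_tag)  # continue scanning after this link
    return links


block = ('<a href="List.asp?type=IT News">[IT News]</a>'
         '<a href="new.asp?id=1" target=_blank>News headline</a>\n'
         '<a href="List.asp?type=PC News">[PC News]</a>'
         '<a href="new.asp?id=2" target=_blank>News headline</a>\n')
base = "http://www.example.com/list.asp"

# Start tag extended back to the end of the column link: only news links match.
print(extract_links(block, '</a><a href="', '" target=_blank>', base))
# Start tag that is just href=": the column links are picked up as well.
print(extract_links(block, 'href="', '"', base))
```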

Relocation of Links:

If the news links are unusual, you can use this feature to rebuild the news URLs. For example, the code might look like this:

<a href="javascript:window.open('1')" target=_blank>News headline</a><br>
<a href="javascript:window.open('5')" target=_blank>News headline</a><br>
.... Omitted
<a href="javascript:window.open('...')" target=_blank>News headline</a>

Set the start/end tags to the red parts, then click a news item to see its real page address. If, for example, the first item's address is http://www.scuta.net/news.asp?id=1, set the absolute link to http://www.scuta.net/news.asp?id={$ID}.
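
Link relocation combines the two previous ideas: cut a raw value out with the start/end tags, then substitute it for {$ID} in the absolute link. In this Python sketch the helper name is hypothetical and the tags `window.open('` and `')` are assumptions standing in for the red parts.

```python
from typing import List


def relocate_links(block: str, start_tag: str, end_tag: str, template: str) -> List[str]:
    """Cut each raw value between the tags and substitute it for {$ID} in the template."""
    urls: List[str] = []
    pos = 0
    while True:
        start = block.find(start_tag, pos)
        if start == -1:
            break
        start += len(start_tag)
        end = block.find(end_tag, start)
        if end == -1:
            break
        raw_id = block[start:end].strip()
        urls.append(template.replace("{$ID}", raw_id))
        pos = end + len(end_tag)
    return urls


block = ("<a href=\"javascript:window.open('1')\" target=_blank>News headline</a><br>\n"
         "<a href=\"javascript:window.open('5')\" target=_blank>News headline</a><br>\n")
print(relocate_links(block, "window.open('", "')", "http://www.scuta.net/news.asp?id={$ID}"))
# prints ['http://www.scuta.net/news.asp?id=1', 'http://www.scuta.net/news.asp?id=5']
```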

C List extraction test





D List News link test



E Body Settings



F Sampling Test



G Property settings



Set a few collection options, noting the following:

Collection options: Publish immediately, Save images, Collect in reverse order. For external links, do not tick Save images.

H Click "Complete". The collection setup is finished.

[3] Data acquisition

Here you can see the project you just set up.
Collection modes: Fast mode, Stable mode, Filter mode; Collection test and Body preview are also available.
These modes need little explanation, and the end result is the same whichever one you choose.

Then the collection process starts, which can take a long time; its speed depends on the server and the network connection.

[4] Data review

Data review offers several view modes, such as "All"; click a title to view the collected article (with its images). You can also delete data here.

[5] Data export

This step imports data from the collection database into the CMS data table. By default only approved articles can be exported; articles that have been exported are marked "exported", and the others are marked as not exported.

There are several options to note when exporting data:



There are three export modes: export selected items, export all, and export an entire column. Whichever mode you use, select the [resource category] or [resource feature] you have set up in the system; the categories you select determine where the data is exported to.

Export complete.

Under Resource Management --> Regular resources you can see the articles you just collected; by default they are already approved.

You can then generate or edit them.
