Data collection in Tsys (OkHtm.com modified edition)

Source: Internet
Author: User

Collection Function

[1] Classification Management

A. Add a channel.

B. Channel management (click to enter column management)

C. Add a new topic

[2] Project Management

A. Add a new project

Project name: enter a name for the collection project, for later management.
Channel: select the channel for the collected content.
Topic: select the topic, or choose "does not belong to any topic".
Website name: the name of the website you are collecting from.
Website URL: the address of the website you are collecting from.
Website login: select "no login required" if the pages can be collected without logging in.
Login parameters: set these when the pages can be accessed only after login (verification-code login is not supported).
Submit address: the URL that the login form submits the user name and password to when the login button is clicked. For example, in the dynamic 3.62 system it is http://www.*****.com/Admin_ChkLogin.asp.
User (password) parameters: view the source code of the login form and find code like the following:
UserName: <input type="text" name="UserName" value="">
Password: <input type="password" name="password" value="">
The user parameter is the value after name= on the first line: UserName.
The password parameter is the value after name= on the second line: password.
Failure information: the error message shown after a login attempt with an incorrect user name or password. It is the marker used to decide whether the login succeeded, so be sure to fill it in; otherwise nothing can be collected. For example: "The user name or password you entered is incorrect. Please enter it again!"
Project remarks: any other information about the project worth recording, for example that it is collected every day.
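
The login fields above can be pictured with a small Python sketch. This is only an illustration of the idea, not the program's actual code: the helper names `build_login_body` and `login_succeeded` and the sample credentials are assumptions, and no request is actually sent.

```python
from urllib.parse import urlencode


def build_login_body(user_param, password_param, username, password):
    """Encode the two form fields the way a browser would post them."""
    return urlencode({user_param: username, password_param: password})


def login_succeeded(response_text, failure_info):
    """Treat the login as successful when the failure message is absent."""
    return failure_info not in response_text


# Parameter names taken from the sample login form; credentials are made up.
body = build_login_body("UserName", "password", "admin", "secret")

failure_info = "The user name or password you entered is incorrect."
ok_page = "<html>Welcome back</html>"
bad_page = "<html>The user name or password you entered is incorrect. Please enter it again!</html>"

print(body)
print(login_succeeded(ok_page, failure_info))
print(login_succeeded(bad_page, failure_info))
```

The body would be posted to the submit address. The check in `login_succeeded` shows why the failure information has to be filled in: without it there is nothing to tell a failed login from a successful one.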

B. List settings

A list is like the table of contents of a book. A table of contents can run to one or more pages, and so can a list.

List index page:

The list page to be collected.

List start/end mark:

Two points on a plane determine a straight line, as you may remember from geometry. The same principle applies here: the start mark and the end mark together determine the news you want to collect. If they are set incorrectly, the result is that other news is collected.
For example, this is the main part of the code on a list page:
<table width="98%" border="0" cellspacing="0" cellpadding="3">
<tr>
<td align="left" valign="top"><br>
<a href="News.asp?Id=1" target=_blank>News Title</a><br>
<a href="News.asp?Id=2" target=_blank>News Title</a><br>
... Omitted
<a href="News.asp?Id=50" target=_blank>News Title</a>
</td>
</tr>
</table>
The parts shown in red in the original are the start mark and end mark of the list we want; the news you want lies between them. Following this method, many different start and end marks can be chosen, so the choice is not unique. Each mark must, however, be relatively unique: the start mark must be unique in the code above the first news item, and the end mark must not appear anywhere between the start mark and the point where the list should end.
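
A minimal Python sketch of cutting a list out of a page with a start mark and an end mark follows. The function name `cut_between` and the particular marks are assumptions chosen for illustration (the original highlighting is lost), not the program's own code.

```python
def cut_between(html, start_mark, end_mark):
    """Return the text between the first start_mark and the next end_mark.

    Returns None when either mark cannot be found.
    """
    start = html.find(start_mark)
    if start == -1:
        return None
    start += len(start_mark)
    end = html.find(end_mark, start)
    if end == -1:
        return None
    return html[start:end]


# Shortened version of the sample list page.
page = (
    '<table width="98%" border="0" cellspacing="0" cellpadding="3">'
    '<tr><td align="left" valign="top"><br>'
    '<a href="News.asp?Id=1" target=_blank>News Title</a><br>'
    '<a href="News.asp?Id=2" target=_blank>News Title</a>'
    '</td></tr></table>'
)

# Hypothetical marks: unique before the first news item / after the last one.
list_block = cut_between(page, '<td align="left" valign="top">', '</td>')
print(list_block)
```

If the start mark also occurred earlier in the page, `find` would stop at that earlier position and the wrong region would be returned, which is why the marks need to be relatively unique.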

List index paging:

(1) Set tags
The code of the paging area on the index page is as follows:
<td height="24" align="center" bgcolor="#F6f7f8">1 <a href="index_2.html">2</a> <a href="index_3.html">3</a> <a href="index_4.html">4</a> <a href="index_2.html">next page</a>
<a href="index_4.html">last page</a></td>
The parts shown in red in the original are the start and end marks of the paging link. Set correctly, the two marks pin down the link to the next page, and the rest is left to the program. Some people fill in <a href=" and ">2</a>. This is wrong, because ">2</a> matches only the link to page 2, not the next-page link.
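
The difference between a good and a bad end mark can be sketched in Python. This is an illustration under assumptions, not the program's matching logic: `next_page_url` is a hypothetical helper that searches backwards from the end mark, and `page2` is an invented second page.

```python
def next_page_url(html, start_mark, end_mark):
    """Find end_mark, then take the text after the nearest start_mark before it."""
    end = html.find(end_mark)
    if end == -1:
        return None
    start = html.rfind(start_mark, 0, end)
    if start == -1:
        return None
    return html[start + len(start_mark):end]


page1 = (
    '<td height="24" align="center" bgcolor="#F6f7f8">1 '
    '<a href="index_2.html">2</a> <a href="index_3.html">3</a> '
    '<a href="index_4.html">4</a> <a href="index_2.html">next page</a> '
    '<a href="index_4.html">last page</a></td>'
)
# Hypothetical paging area of page 2, where "2" is no longer a link.
page2 = (
    '<td><a href="index_1.html">1</a> 2 <a href="index_3.html">3</a> '
    '<a href="index_3.html">next page</a></td>'
)

good_end = '">next page</a>'
bad_end = '">2</a>'

print(next_page_url(page1, '<a href="', good_end))
print(next_page_url(page2, '<a href="', good_end))
print(next_page_url(page2, '<a href="', bad_end))
```

With the end mark tied to the "next page" text, the same pair of marks works on every page. With ">2</a> it works only while page 2 is still a link.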

Index paging redirection: see link settings

(2) Batch generation
For example, some lists have URLs of this form:
Page 1: http://www.it.com.cn/news/cyxw/yejie/index_1.html
Page 2: http://www.it.com.cn/news/cyxw/yejie/index_2.html
Page 3: http://www.it.com.cn/news/cyxw/yejie/index_3.html

You can set it up like this (the {$ID} placeholder is required):

Original string: http://www.it.com.cn/news/cyxw/yejie/index_{$ID}.html

Generation range: 1 to 3

The program then generates:

http://www.it.com.cn/news/cyxw/yejie/index_1.html

http://www.it.com.cn/news/cyxw/yejie/index_2.html

http://www.it.com.cn/news/cyxw/yejie/index_3.html

as the list pages.
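
Batch generation amounts to substituting each number in the range for the {$ID} placeholder. A minimal Python sketch, assuming an inclusive range; the function name `generate_pages` is hypothetical:

```python
def generate_pages(template, first, last, placeholder="{$ID}"):
    """Replace the placeholder with every number from first to last, inclusive."""
    if placeholder not in template:
        raise ValueError("the original string must contain " + placeholder)
    return [template.replace(placeholder, str(n)) for n in range(first, last + 1)]


urls = generate_pages("http://www.it.com.cn/news/cyxw/yejie/index_{$ID}.html", 1, 3)
for url in urls:
    print(url)
```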

(3) Manual addition

Enter the URL of one page, press Enter, then enter the URL of the next page. You can enter as many URLs as you need this way.

Link settings:

Link start/end mark:

If these are not set, the collection process may stop.

Sample code:

<table width="98%" border="0" cellspacing="0" cellpadding="3">
<tr>
<td align="left" valign="top"><br>
<a href="List.asp?Type=IT news">[IT news]</a> <a href="New.asp?Id=1" target=_blank>News Title</a>
<a href="List.asp?Type=Pc News">[Pc news]</a> <a href="New.asp?Id=2" target=_blank>News Title</a>
... Omitted
<a href="List.asp?Type=IT news">[IT news]</a> <a href="New.asp?Id=50" target=_blank>News Title</a>
</td>
</tr>
</table>
The parts shown in red in the original are the start and end marks of the link. Note: if a topic link (or any other link, such as IT news and Pc News here) comes before the news title, the start mark must be extended to exclude it. In my earlier 3.62 video the start mark was href=, which works only when no column link precedes the news title.
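
Why the start mark has to be extended can be seen in a short Python sketch. The helper `extract_all` and both pairs of marks are assumptions chosen for illustration, not the marks from the original screenshots.

```python
def extract_all(html, start_mark, end_mark):
    """Collect every piece of text found between start_mark and the next end_mark."""
    results = []
    pos = 0
    while True:
        start = html.find(start_mark, pos)
        if start == -1:
            break
        start += len(start_mark)
        end = html.find(end_mark, start)
        if end == -1:
            break
        results.append(html[start:end])
        pos = end + len(end_mark)
    return results


block = (
    '<a href="List.asp?Type=IT news">[IT news]</a> '
    '<a href="New.asp?Id=1" target=_blank>News Title</a>\n'
    '<a href="List.asp?Type=Pc News">[Pc news]</a> '
    '<a href="New.asp?Id=2" target=_blank>News Title</a>\n'
)

# A short start mark also catches the topic links.
too_short = extract_all(block, 'href="', '"')
# An extended start mark (hypothetical choice) skips them.
extended = extract_all(block, ']</a> <a href="', '" target')

print(too_short)
print(extended)
```

The short mark returns the List.asp topic links mixed in with the news links, while the extended mark returns only the news links.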

Link relocation:

If the news links are unusual, you can use this function to relocate the news URL. For example, some code may look like this:

<a href="Javascript:window.open('1')" target=_blank>News Title</a><br>
<a href="Javascript:window.open('5')" target=_blank>News Title</a><br>
... Omitted
<a href="Javascript:window.open('50')" target=_blank>News Title</a>

Set the start and end marks to the parts shown in red in the original, then click a news item to see its real URL. For example, if the first news item is at http://www.scuta.net/news.asp?Id=1, set the absolute link to http://www.scuta.net/news.asp?Id={$ID}.
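
Link relocation can be sketched in Python as two steps: pull the ID out of each link with the start and end marks, then substitute it into the absolute link. The function name `relocate` and the marks `window.open('` and `')` are assumptions for illustration.

```python
def relocate(html, start_mark, end_mark, absolute_link, placeholder="{$ID}"):
    """Build a real URL for every ID found between start_mark and end_mark."""
    urls = []
    pos = 0
    while True:
        start = html.find(start_mark, pos)
        if start == -1:
            break
        start += len(start_mark)
        end = html.find(end_mark, start)
        if end == -1:
            break
        urls.append(absolute_link.replace(placeholder, html[start:end]))
        pos = end + len(end_mark)
    return urls


links = (
    "<a href=\"Javascript:window.open('1')\" target=_blank>News Title</a><br>\n"
    "<a href=\"Javascript:window.open('5')\" target=_blank>News Title</a><br>\n"
    "<a href=\"Javascript:window.open('50')\" target=_blank>News Title</a>\n"
)

real_urls = relocate(
    links,
    "window.open('",  # hypothetical start mark
    "')",             # hypothetical end mark
    "http://www.scuta.net/news.asp?Id={$ID}",
)
for url in real_urls:
    print(url)
```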

C. List truncation test

D. List link test

E. Body text settings

F. Sample test

G. Property settings

Set the collection options here. Note:

Collection options: publish immediately, save images. Do not select "save images" when collecting from a site that blocks external linking of its images.

H. Click "Complete". The collection settings are finished.

[3] Data Collection

Here you can see the project you just set up.

Collection modes: fast mode, stable mode, filtering mode, collection, and body text test preview.

The results are similar.

A long collection process then starts. How long it takes depends on the server speed and the network speed.

[4] Data Review

In data review, use "select all" to select every item. Click a title to view the collected article (with images), or delete data.

[5] Data Export

This imports data from the collection database into the CMS data tables. By default, only approved articles can be exported. An article that has been exported is shown as "exported"; otherwise it is shown as not exported.

Note the following when exporting data:

There are three export modes: export the selected items, export all, and export an entire topic. Whichever mode you choose, set the [resource category] direction or [resource features], which correspond to the resource categories you have set up in the system, and select the category to export to.

Export completed.

Resource Management --> Regular resources now shows the articles you just collected. By default, they are already reviewed.

You can then choose to generate or edit them.
