An Introductory Tutorial on ASP "Thief" Programs (Remote Data Acquisition)

Source: Internet
Author: User
The "thief" here refers to a class of programs that use the XMLHTTP component (part of the XML support available to ASP) to fetch data (images, web pages, and other files) from a remote website to the local server, process it in various ways, and then display it on a page or store it in a database. With this kind of program you can accomplish tasks that once seemed impossible, such as quietly repackaging a page from another site as your own page, or saving a site's data (articles, pictures) to a local database.

The advantages of a "thief" are: no site maintenance, because its data comes from another site and is updated whenever that site is updated; and low server resource usage, because a typical thief program consists of only a few files and all of its page content comes from the other site.

The disadvantages are: instability, because if the target site fails the program fails too, and if the target site is upgraded or redesigned the thief program must be changed to match; and speed, because a remote call is certainly slower than reading data on the local server.

Sounds amazing, doesn't it? Let's start by learning some of the basics of "thief" programs.

Let's start with a simple example: a weather forecast program that takes its data from the QQ website.

The code is as follows:

<%
On Error Resume Next
Server.ScriptTimeout = 9999999

Function GetHttpPage(path)
    Dim t
    t = GetBody(path)
    GetHttpPage = BytesToBstr(t, "GB2312")
End Function

' First, some initialization for the thief program. The code above ignores all non-fatal errors, sets a very long script timeout (so the script does not stop with a timeout error), and decodes the fetched data as GB2312 instead of the default UTF-8. Otherwise, a page containing Chinese characters fetched directly through the XMLHTTP component would come out garbled.

Function GetBody(url)
    On Error Resume Next
    Dim retrieval
    Set retrieval = CreateObject("Microsoft.XMLHTTP")
    With retrieval
        .Open "GET", url, False, "", ""
        .Send
        GetBody = .ResponseBody
    End With
    Set retrieval = Nothing
End Function

' Next, create an XMLHTTP object, set up a synchronous GET request, send it, and return the raw response bytes.

Function BytesToBstr(body, cset)
    Dim objStream
    Set objStream = Server.CreateObject("ADODB.Stream")
    objStream.Type = 1          ' binary
    objStream.Mode = 3          ' read/write
    objStream.Open
    objStream.Write body
    objStream.Position = 0
    objStream.Type = 2          ' text
    objStream.Charset = cset
    BytesToBstr = objStream.ReadText
    objStream.Close
    Set objStream = Nothing
End Function

Function NewString(wstr, strng)
    ' Case-insensitive position of strng in wstr; falls back to Len(wstr) when strng is not found
    NewString = InStr(LCase(wstr), LCase(strng))
    If NewString <= 0 Then NewString = Len(wstr)
End Function

' Processing the fetched data requires the ADODB.Stream component: the bytes are written to a binary stream, which is then read back as text in the given character set.
%>

' Below is the page display section

<%
Dim wstr, str, url, start, over, city, body
' Declare the variables that will be used

city = Request.QueryString("id")
' Assign the id value passed to the program (that is, the city the user chose) to city

url = "http://appnews.qq.com/cgi-bin/news_qq_search?city=" & city
' Set the address of the page to fetch; you can also specify an address directly without using a variable

wstr = GetHttpPage(url)
' Get all the data on the specified page

start = NewString(wstr, "<html>")
' Set the head of the data to be processed. This marker depends on the situation and can be chosen by looking at the source code of the page being fetched. Because this program needs the entire page, the marker is set so that the whole page is taken. Note that the marker must be unique within the page and must not appear more than once.

over = NewString(wstr, "</HTML>")
' The counterpart of start: the tail of the data to be processed. This marker must also be unique within the page.

body = Mid(wstr, start, over - start)
' Set the range of the page to display

' This is where the sleight of hand happens: specified strings in the data are replaced with strings of our own.

body = Replace(body, "skin1", "Weather forecast-gram Network")
body = Replace(body, "http://appnews.qq.com/cgi-bin/news_qq_search?city", "Tianqi.asp?id")

' The replacements for this program are now complete; similar replacements can be added if there are other needs.

Response.Write Body
%>

Once the content you want to change has been replaced, the modified content is written to the page. This is the end of the program.
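The marker-based extraction used in the listing can be tried in isolation on a fixed string, without any remote call. The following is a minimal sketch under stated assumptions: the sample HTML, the marker strings, and the helper name FindMarker are all hypothetical and exist only for illustration; FindMarker simply repeats the lookup rule of NewString.

```vbscript
<%
' FindMarker is a hypothetical helper that mirrors NewString from the listing:
' a case-insensitive search that falls back to the string length
' when the marker is not found.
Function FindMarker(text, marker)
    FindMarker = InStr(LCase(text), LCase(marker))
    If FindMarker <= 0 Then FindMarker = Len(text)
End Function

Dim sample, startPos, endPos, fragment

' Made-up sample input standing in for a fetched page.
sample = "<html><body><div id=""w"">Sunny, 25C</div></body></html>"

' Both markers must occur only once in the page.
startPos = FindMarker(sample, "<div id=""w"">")
endPos = FindMarker(sample, "</div>")

' Mid takes everything from the head marker up to, but not including, the tail marker.
fragment = Mid(sample, startPos, endPos - startPos)

' HTML-encode the fragment so the extracted markup is shown as text.
Response.Write Server.HTMLEncode(fragment)
%>
```

Because the tail marker itself is excluded from the result, a closing tag used as the tail marker has to be appended again by hand if it is needed in the output.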

Usage and result: remove the explanatory text from the code above, save it as tianqi.asp, upload it to a host that supports ASP and XML, and open it in a browser. You can polish the appearance or optimize the program further on this basis.

This is only a basic application of the XMLHTTP component. It can do much more, such as saving remote images to the local server, and together with the ADODB.Stream component the fetched data can be saved into a database. Thief programs have a wide range of uses. But don't use them for anything illegal!
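As an illustration of saving a remote image to the local server, here is a minimal sketch that combines the same two components used in the weather example. The procedure name SaveRemoteFile, the example URL, and the target file name are assumptions chosen for illustration, and the account running the script is assumed to have write permission on the target folder.

```vbscript
<%
' SaveRemoteFile is a hypothetical helper, not part of the original program.
' It downloads remoteUrl and writes the raw bytes to localPath.
Sub SaveRemoteFile(remoteUrl, localPath)
    Dim http, stream
    Set http = Server.CreateObject("Microsoft.XMLHTTP")
    http.Open "GET", remoteUrl, False       ' synchronous request
    http.Send
    If http.Status = 200 Then
        Set stream = Server.CreateObject("ADODB.Stream")
        stream.Type = 1                     ' binary
        stream.Open
        stream.Write http.responseBody
        stream.SaveToFile localPath, 2      ' 2 = overwrite an existing file
        stream.Close
        Set stream = Nothing
    End If
    Set http = Nothing
End Sub

' Placeholder address and file name; replace them with real values.
SaveRemoteFile "http://example.com/logo.gif", Server.MapPath("logo.gif")
%>
```

Unlike the weather example, no character set conversion is needed here, because the bytes are written to disk unchanged.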

Some readers may ask: is this kind of "thief" program exclusive to ASP? Not at all. PHP can achieve the same effect through the fopen function, and because of PHP's own characteristics, a thief program written in PHP has a clear advantage over the ASP version in both code size and execution efficiency. Space is limited, however, so the details are not covered here.
