Principle
The collection program uses the XMLHTTP component (part of Microsoft's MSXML library) to request webpages from other websites. For example, many news collection programs fetch sina's news pages, rewrite parts of the HTML, and filter out the advertisements.
Advantages: there is no content to maintain, because the data comes from other websites and is updated whenever they update; and server resources are saved, since a collection program generally consists of only a few files and all the page content comes from elsewhere.
Disadvantages: it is unstable, because if the target website has an error the program fails too, and if the target website is redesigned the collection program must be modified accordingly; and it is slower, because a remote call generally takes longer than reading data from the local server.
I. Examples
The following is a brief example of using XMLHTTP in ASP:
<%
'Common functions
'1. Pass in the URL of the target webpage. The return value of getHTTPPage is the HTML code of that page.
Function getHTTPPage(url)
    Dim Http
    Set Http = Server.CreateObject("MSXML2.XMLHTTP")
    Http.Open "GET", url, False
    Http.Send
    If Http.readyState <> 4 Then
        Set Http = Nothing
        Exit Function
    End If
    getHTTPPage = BytesToBstr(Http.responseBody, "GB2312")
    Set Http = Nothing
    If Err.Number <> 0 Then Err.Clear
End Function
'2. When XMLHTTP fetches a webpage that contains Chinese characters, use the ADODB.Stream component to decode it.
Function BytesToBstr(body, Cset)
    Dim objstream
    Set objstream = Server.CreateObject("ADODB.Stream")
    objstream.Type = 1 'binary
    objstream.Mode = 3 'read/write
    objstream.Open
    objstream.Write body
    objstream.Position = 0
    objstream.Type = 2 'text
    objstream.Charset = Cset 'decode the bytes as this charset (GB2312 here); otherwise a page with Chinese characters comes back garbled
    BytesToBstr = objstream.ReadText
    objstream.Close
    Set objstream = Nothing
End Function
'Next, try fetching the HTML content of http://www.jb51.net
Dim Url, Html
Url = "http://www.jb51.net"
Html = getHTTPPage(Url)
Response.Write Html
%>
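For readers who want to try the same fetch-and-decode idea outside ASP, here is a minimal Python sketch. It is not the ASP code above: the names decode_body and get_http_page, the timeout, and the choice to return an empty string on failure are all assumptions made for illustration.

```python
import urllib.request


def decode_body(body: bytes, charset: str = "gb2312") -> str:
    """Decode raw response bytes with an explicit charset (the role BytesToBstr plays)."""
    # errors="replace" keeps going if a few bytes are invalid for the charset
    return body.decode(charset, errors="replace")


def get_http_page(url: str, charset: str = "gb2312", timeout: float = 10.0) -> str:
    """Fetch url and return its HTML as text, or "" if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return decode_body(response.read(), charset)
    except OSError:
        # URLError and socket timeouts are subclasses of OSError
        return ""


# Decoding can be checked without any network access
sample = "中文网页".encode("gb2312")
decoded = decode_body(sample, "gb2312")
```

Keeping the decoding step separate from the request mirrors the split between getHTTPPage and BytesToBstr, and makes it easy to pass a different charset for sites that do not use GB2312.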
2. Several common functions
(1) InStr function
Description
Returns the position of the first occurrence of one string (string2) within another string (string1).
Syntax
InStr(string1, string2)
For example:
Dim SearchString, SearchChar, MyBK
SearchString = "http://www.jb51.net" 'the string to search in
SearchChar = "jb51" 'search for "jb51"
MyBK = InStr(SearchString, SearchChar) 'returns 12
'If the string is not found, 0 is returned. For example:
SearchChar = "BK"
MyBK = InStr(SearchString, SearchChar) 'returns 0
(2) Mid function
Description
Returns a specified number of characters from a string.
Syntax
Mid(string, start, length)
For example:
Dim MyBK
MyBK = Mid("Our BK (www.google) Design", 9, 10) 'take 10 characters of "Our BK (www.google) Design", starting at the 9th character
'Now the value of MyBK is "www.google"
(3) Replace function
Dim SearchString, SearchChar
SearchString = "Our BK design is a website construction resource website."
SearchString = Replace(SearchString, "BK design", "www.google")
'At this point the value of SearchString is "Our www.google is a website construction resource website."
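The 1-based counting used by InStr and Mid is easy to get wrong, so the following Python sketch imitates the three functions. The helper names instr and mid are hypothetical, and only the basic two-argument and three-argument forms shown above are covered.

```python
def instr(string1: str, string2: str) -> int:
    """1-based position of string2 in string1, or 0 if absent (like InStr)."""
    # str.find is 0-based and returns -1 when nothing is found
    return string1.find(string2) + 1


def mid(string: str, start: int, length: int) -> str:
    """length characters of string beginning at 1-based position start (like Mid)."""
    return string[start - 1:start - 1 + length]


position = instr("http://www.jb51.net", "jb51")    # found
missing = instr("http://www.jb51.net", "BK")       # not found
piece = mid("Our BK (www.google) Design", 9, 10)
replaced = "Our BK design".replace("BK design", "www.google")
```

Here position is 12, missing is 0, piece is "www.google", and replaced is "Our www.google".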
3. Extract the HTML code of a specified area
For example, suppose I only want the text between "<td id="Content">" and "</td>" in the following HTML code:
<html>
<title>(www.google) google search engine</title>
<body>
<table>
<tr><td></td></tr>
<tr><td id="Content">BK (www.google) google search engine is a site with many resources......</td></tr>
</table>
</body>
</html>
<%
......
Dim StrBK, Start, Over, RsBK
StrBK = getHTTPPage("webpage address") 'replace with the URL of the target page
Start = InStr(StrBK, "<td id=""Content"">") 'locates the start of the string.
'Why is <td id="Content"> written here as <td id=""Content"">? Because in ASP (VBScript), a double quotation mark inside a string literal is written as two double quotation marks, since the double quotation mark itself delimits the string.
Over = InStr(StrBK, "...</td></tr>") 'locates the end of the string.
'Why are the three dots "..." included before </td></tr>? Because the previous table row also ends with </td></tr>. If only </td></tr> were used, the program would mistake the </td></tr> of the previous row for the end of the string to be extracted.
RsBK = Mid(StrBK, Start, Over - Start) 'takes the text in StrBK from position Start up to (but not including) position Over. As described for the Mid function, Over - Start is the distance between the two positions, that is, the number of characters.
Response.Write(RsBK) 'output the extracted content
%>
Don't celebrate too soon. When you run it, you will find that the HTML on the page is broken. Why? Because the HTML code you obtained is:
<td id="Content">BK (www.google) google search engine is a site with many resources...
See? Incomplete HTML code! What should we do? The statement Start = InStr(StrBK, "<td id=""Content"">") returns the position in StrBK where <td id="Content"> begins. Since <td id="Content"> is 17 characters long, adding 17 to that result makes Start point to the first character after <td id="Content">.
So the program is changed to this:
<%
......
Dim StrBK, Start, Over, RsBK
StrBK = getHTTPPage("webpage address") 'replace with the URL of the target page
Start = InStr(StrBK, "<td id=""Content"">") + 17
Over = InStr(StrBK, "...</td></tr>") 'here you can also subtract 7 (- 7) to remove the three dots
RsBK = Mid(StrBK, Start, Over - Start)
Response.Write(RsBK)
%>
In this way, we can take exactly the content we want and display it on our own page.
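The start-marker and end-marker technique can be sketched in Python as follows. The function name extract_between and the sample page are assumptions for illustration. Unlike the ASP code, this sketch avoids the "three dots" trick by searching for the end marker only after the start position.

```python
def extract_between(html: str, start_marker: str, end_marker: str) -> str:
    """Return the text between start_marker and the next end_marker, or "" if either is missing."""
    begin = html.find(start_marker)
    if begin == -1:
        return ""
    # Skip past the marker itself (the "+ 17" step in the ASP version)
    begin += len(start_marker)
    # Search only from begin onward, so an earlier </td> cannot match
    end = html.find(end_marker, begin)
    if end == -1:
        return ""
    return html[begin:end]


page = (
    "<html><body><table>"
    "<tr><td></td></tr>"
    '<tr><td id="Content">BK (www.google) google search engine '
    "is a site with many resources</td></tr>"
    "</table></body></html>"
)
content = extract_between(page, '<td id="Content">', "</td>")
```

In VBScript the same effect should be available through the optional first argument of InStr, as in InStr(Start, StrBK, "</td>"), which begins the search at position Start.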
4. Delete or modify the obtained characters
Replace "BK (www.google)" in RsBK with "BK":
RsBK = Replace(RsBK, "BK (www.google)", "BK")
You can also simply delete "(www.google)":
RsBK = Replace(RsBK, "(www.google)", "")
Now RsBK becomes: "BK google search engine is a site with many resources......"
In practice, however, the Replace function is not suitable in every case. For example, suppose we want to remove all the links in a string. There may be many different links, and Replace can only replace one specific string at a time, so we cannot reasonably write a separate Replace call for every one of them.
In that case you can use a regular expression instead. I will not go into detail here.
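As a rough illustration of the regular-expression approach, the Python sketch below removes every link tag while keeping the link text. The pattern and the name strip_links are assumptions, and a pattern like this is only adequate for simple, well-formed markup. In classic ASP the comparable tool would be the VBScript RegExp object.

```python
import re

# Matches an opening <a ...> tag, captures the link text, then the closing </a>
LINK_PATTERN = re.compile(r"<a\b[^>]*>(.*?)</a\s*>", re.IGNORECASE | re.DOTALL)


def strip_links(html: str) -> str:
    """Remove all <a> tags but keep the text they wrap."""
    return LINK_PATTERN.sub(r"\1", html)


cleaned = strip_links(
    'See <a href="a.htm">this</a> and <A HREF=b.htm target=_blank>that</A>.'
)
```

With this input, cleaned is "See this and that."; a single pattern handles both links even though their attributes and letter case differ.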
(1) How do we turn the paging links of the other website into our own?
The answer is: use the Replace function together with a page parameter.
For example, the target page contains the following code: "<a href=00002.htm>next page</a>". We can obtain this string using the approach described above and then apply the Replace function: RsBK = Replace(RsBK, "<a href=", "<a href=page.asp?Url=")
Then, read the value of the Url parameter in the page.asp program and use the same collection technique to fetch the next page.
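The two halves of this paging trick can be sketched in Python as follows. The names rewrite_links and resolve_target and the www.example.com address are hypothetical. The sketch also assumes the Url parameter holds a relative link that must be resolved against the page it came from.

```python
from urllib.parse import urljoin


def rewrite_links(html: str, proxy: str = "page.asp?Url=") -> str:
    """Point every unquoted <a href=...> at our own page, passing the original target along."""
    return html.replace("<a href=", "<a href=" + proxy)


def resolve_target(current_page_url: str, url_param: str) -> str:
    """Turn the (possibly relative) Url parameter into an absolute address to fetch next."""
    return urljoin(current_page_url, url_param)


rewritten = rewrite_links("<a href=00002.htm>next page</a>")
target = resolve_target("http://www.example.com/news/00001.htm", "00002.htm")
```

Here rewritten is "<a href=page.asp?Url=00002.htm>next page</a>" and target is "http://www.example.com/news/00002.htm". Like the Replace call in the text, the rewrite only catches links written exactly as <a href=, so quoted or differently spaced attributes would need extra handling.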
(2) How to store the obtained content in the database
Space is limited, so only a brief outline is given here.
It is actually very simple:
First process the collected content so that it does not cause SQL errors or SQL injection when written to the database, for example: Replace(String, "'", "''")
Then execute an SQL INSERT command to write it to the database.
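The escape-then-insert step might look like this in Python, using an in-memory SQLite database so the sketch is self-contained. The table name articles, its columns, and the helper names are assumptions. The article does not say which database it targets.

```python
import sqlite3


def escape_sql(text: str) -> str:
    """Double every single quote, as Replace(String, "'", "''") does."""
    return text.replace("'", "''")


def save_article(conn: sqlite3.Connection, content: str) -> None:
    """Build the INSERT statement by hand from escaped content and run it."""
    sql = "INSERT INTO articles (content) VALUES ('" + escape_sql(content) + "')"
    conn.execute(sql)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, content TEXT)")
save_article(conn, "BK's search engine")
stored = conn.execute("SELECT content FROM articles").fetchone()[0]
```

Doubling quotes follows the article's approach. Where the database driver supports it, a parameterized statement such as conn.execute("INSERT INTO articles (content) VALUES (?)", (content,)) is generally the safer choice, because it needs no manual escaping at all.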
The above covers only some basic applications of the XMLHTTP component. It can do much more, such as saving remote images to the local server, or, together with the ADODB.Stream component, saving the obtained data to the database. Collection has a wide range of functions and uses.