ASP Collection Programs: Analysis of Common Functions (Thief/Scraper Programs)

Source: Internet
Author: User
Principle
A collection (scraping) program uses the XMLHTTP component of MSXML to request pages from other websites. Many news collection programs, for example, fetch Sina's news pages, replace parts of the HTML, and filter out the ads. The advantages of a collection program are:
- There is no site content to maintain. The data comes from other sites, so it updates whenever they do.
- It saves server resources.
- A typical collection program consists of only a few files, because all of the web content comes from other sites.

The disadvantages are:
- Instability. If the target site has an error, the program fails too, and if the target site is upgraded or restructured, the collection program must be changed to match.
- Speed. Every page is a remote request, so it is certainly slower than reading data from the local server.

I. Example
The following example briefly shows how XMLHTTP is used in ASP:

<%
' Common functions
' 1. GetHttpPage: takes the address of the target page (url) and returns that page's HTML
Function GetHttpPage(url)
    On Error Resume Next           ' so the Err check below can clear network errors
    Dim http
    Set http = Server.CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", url, False    ' False = synchronous request
    http.Send
    If http.ReadyState <> 4 Then   ' 4 = request completed
        Set http = Nothing
        Exit Function
    End If
    GetHttpPage = BytesToBstr(http.ResponseBody, "GB2312")
    Set http = Nothing
    If Err.Number <> 0 Then Err.Clear
End Function

' 2. Charset conversion: a page containing Chinese characters comes out garbled when read
'    directly through XMLHTTP, so the raw bytes are decoded with the ADODB.Stream component
Function BytesToBstr(body, cset)
    Dim objStream
    Set objStream = Server.CreateObject("ADODB.Stream")
    objStream.Type = 1          ' adTypeBinary: write the raw response bytes first
    objStream.Mode = 3          ' adModeReadWrite
    objStream.Open
    objStream.Write body
    objStream.Position = 0
    objStream.Type = 2          ' adTypeText: read the bytes back as characters
    objStream.Charset = cset    ' decode as GB2312 instead of assuming UTF-8; otherwise Chinese text is garbled
    BytesToBstr = objStream.ReadText
    objStream.Close
    Set objStream = Nothing
End Function

' Example: fetch the HTML of http://www.jb51.net and write it out
Dim url, html
url = "http://www.jb51.net"
html = GetHttpPage(url)
Response.Write html
%>
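For experimenting outside IIS, the same two steps can be sketched in Python: fetch the raw bytes, then decode them with an explicit charset. This is an illustrative translation, not the article's code. `urllib.request.urlopen` stands in for MSXML2.XMLHTTP and `bytes.decode` plays the role of ADODB.Stream. The function names, the GB2312 default and the 10-second timeout are assumptions.

```python
from urllib.request import urlopen


def bytes_to_bstr(body: bytes, charset: str = "gb2312") -> str:
    """Decode raw response bytes with an explicit charset (the job ADODB.Stream does above)."""
    # errors="replace" keeps going if the page contains bytes that are invalid in the charset
    return body.decode(charset, errors="replace")


def get_http_page(url: str, charset: str = "gb2312", timeout: float = 10.0) -> str:
    """Fetch url and return its HTML as text, or "" if the request fails."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            body = resp.read()  # raw bytes, like Http.ResponseBody
    except (OSError, ValueError):  # network/HTTP errors, or a malformed URL
        return ""
    return bytes_to_bstr(body, charset)
```

Calling `get_http_page("http://www.jb51.net")` returns the page text. If the request fails, it returns an empty string, which mirrors how GetHttpPage exits early when ReadyState is not 4.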

II. Some commonly used functions
(i) InStr function
Description
Returns the position of the first occurrence of one string (string2) within another string (string1). Positions start at 1, and the comparison is case-sensitive by default.
Syntax
InStr([start, ]string1, string2[, compare])
Example:
Dim SearchString, SearchChar, MYBK
SearchString = "http://www.jb51.net"   ' String to search in.
SearchChar = "jb51"                     ' Search for "jb51".
MYBK = InStr(SearchString, SearchChar)  ' Returns 12
' Returns 0 if the string is not found, for example:
SearchChar = "BK"
MYBK = InStr(SearchString, SearchChar)  ' Returns 0
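VBScript string positions start at 1, and InStr returns 0 for "not found". Python's `str.find` is 0-based and returns -1 instead. The following sketch uses a hypothetical `instr` helper to bridge the two. Like InStr's default binary comparison, it is case-sensitive.

```python
def instr(string1: str, string2: str) -> int:
    """1-based position of the first string2 in string1, or 0 if not found (like InStr)."""
    # str.find is 0-based and returns -1 when not found, so +1 maps -1 to 0
    return string1.find(string2) + 1


search_string = "http://www.jb51.net"
print(instr(search_string, "jb51"))  # 12
print(instr(search_string, "BK"))    # 0
```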
(ii) Mid function
Description
Returns a specified number of characters from a string.
Syntax
Mid(string, start[, length])
Example:
Dim MYBK
MYBK = Mid("Our BK (www.google) Design", 9, 10) ' Takes 10 characters of "Our BK (www.google) Design", starting at the 9th character
' MYBK now holds "www.google"
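Counting from 1, "www.google" starts at character 9 of "Our BK (www.google) Design" and is 10 characters long. The following Python sketch of Mid makes that 1-based arithmetic explicit. The helper name is hypothetical. Raising an error for a start below 1 or a negative length follows VBScript, which also rejects those arguments.

```python
from typing import Optional


def mid(string: str, start: int, length: Optional[int] = None) -> str:
    """Return `length` characters of `string` from 1-based `start` (like VBScript Mid).

    If length is omitted, everything from start to the end is returned.
    """
    if start < 1:
        raise ValueError("start must be >= 1")
    begin = start - 1  # convert the 1-based position to a 0-based index
    if length is None:
        return string[begin:]
    if length < 0:
        raise ValueError("length must be >= 0")
    # Slicing past the end simply returns fewer characters, as Mid does
    return string[begin:begin + length]


print(mid("Our BK (www.google) Design", 9, 10))  # www.google
```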
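(iii) Replace function
Description
Replaces every occurrence of one substring with another string. The comparison is case-sensitive by default, so "BK Design" would not match "BK design".
Syntax
Replace(expression, find, replacewith[, start[, count[, compare]]])
Example:
Dim SearchString
SearchString = "Our BK design is a website construction resource site" ' The string to search in.
SearchString = Replace(SearchString, "BK design", "www.google")
' SearchString now holds "Our www.google is a website construction resource site"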
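Replace's case sensitivity is an easy trap: "BK Design" does not match "BK design". Python's `str.replace` behaves the same way. The sketch below shows the pitfall and a hypothetical case-insensitive helper, similar in spirit to passing vbTextCompare to Replace.

```python
import re

text = "Our BK design is a website construction resource site"

# str.replace, like VBScript Replace with its default binary comparison, is case-sensitive
print(text.replace("BK design", "www.google"))  # Our www.google is a website construction resource site
print(text.replace("BK Design", "www.google") == text)  # True: nothing was replaced


def replace_ignore_case(expression: str, find: str, replace_with: str) -> str:
    """Case-insensitive replace, in the spirit of Replace(..., vbTextCompare)."""
    # re.escape makes `find` a literal pattern; the lambda stops re from
    # interpreting backslashes in replace_with
    return re.sub(re.escape(find), lambda _m: replace_with, expression, flags=re.IGNORECASE)


print(replace_ignore_case(text, "BK Design", "www.google"))
```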

III. Extracting the HTML of a specific region
For example, suppose we only want the text between <td id="Content"> and </td> in the following HTML:
<title> (www.google) Google search engine </title>
<body>
<table>
<tr><td></td></tr>
<tr><td id="Content">BK (www.google) Google search engine is a lot of resources site ......</td></tr>
</table>
</body>
<%
' ......
Dim strBK, Start, Over, rsBK
strBK = GetHttpPage(pageUrl) ' pageUrl: the address of the web page
' Get the position where the string to extract begins.
' Q: The HTML says <td id="Content">, so why is it written <td id=""Content""> here?
' A: In ASP (VBScript), a double quote inside a string literal is written as two
'    double quotes, because a single double quote would end the string.
Start = InStr(strBK, "<td id=""Content"">")
' Get the position where the string to extract ends.
' Q: Why are there three dots in front of </td></tr>?
' A: The row above (<tr><td></td></tr>) also contains </td></tr>. Searching for
'    "</td></tr>" alone would find that earlier occurrence first and treat it as the end.
Over = InStr(strBK, "...</td></tr>")
' Take the characters of strBK between Start and Over (Mid was covered in the previous
' section). Over - Start is the distance between the two positions, i.e. the number of characters.
rsBK = Mid(strBK, Start, Over - Start)
Response.Write(rsBK) ' Output the extracted content
%>
Don't celebrate too early. When you run it, you will find that the page's HTML is broken. Why? Because the HTML you get is:
<td id="Content">BK (www.google) Google search engine is a lot of resources site ...
See the problem? It contains an incomplete tag. What do we do? The statement Start = InStr(strBK, "<td id=""Content"">") returns the position where <td id="Content"> begins in strBK. Add 17 (the length of <td id="Content">) to the end of that statement, and Start then points to the character right after the tag.
The corrected program is:
<%
' ......
Dim strBK, Start, Over, rsBK
strBK = GetHttpPage(pageUrl) ' pageUrl: the address of the web page
Start = InStr(strBK, "<td id=""Content"">") + 17 ' 17 = Len("<td id=""Content"">"), so Start skips past the tag
Over = InStr(strBK, "...</td></tr>")             ' You can also subtract 3 here (Over - 3) to drop the three remaining dots
rsBK = Mid(strBK, Start, Over - Start)
Response.Write(rsBK)
%>
That's it: we can now pull exactly the content we want and show it on our own page.
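The start/end-marker technique can also be sketched in Python. `extract_between` is a hypothetical helper, not the article's code, and it makes two changes:
- It adds `len(start_marker)` instead of hard-coding 17.
- It searches for `</td></tr>` only after the start position, rather than prefixing the end marker with dots.

VBScript's InStr accepts the same optional start argument, for example InStr(Start, strBK, "</td></tr>"). The sample HTML is the fragment shown earlier in this section.

```python
def extract_between(html: str, start_marker: str, end_marker: str) -> str:
    """Return the text between start_marker and the next end_marker ("" if either is missing)."""
    start = html.find(start_marker)
    if start == -1:
        return ""
    # Skip over the marker itself; this replaces the hard-coded "+ 17"
    start += len(start_marker)
    # Look for the end marker only from start onward, so the earlier
    # <tr><td></td></tr> row cannot be mistaken for the end
    end = html.find(end_marker, start)
    if end == -1:
        return ""
    return html[start:end]


page = """<title> (www.google) Google search engine </title>
<body>
<table>
<tr><td></td></tr>
<tr><td id="Content">BK (www.google) Google search engine is a lot of resources site ......</td></tr>
</table>
</body>"""

print(extract_between(page, '<td id="Content">', "</td></tr>"))
# BK (www.google) Google search engine is a lot of resources site ......
```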

IV. Deleting or modifying the collected text
To replace "BK (www.google)" in rsBK with "BK":
rsBK = Replace(rsBK, "BK (www.google)", "BK")
Or simply delete " (www.google)":
rsBK = Replace(rsBK, " (www.google)", "")
Now rsBK has become: "BK Google search engine is a lot of resources site ...".
In practice, though, Replace is not always suitable. Suppose we want to remove every link from a string. Links come in many forms, and Replace can only substitute one specific string at a time, so we cannot write a matching Replace call for every possible link.
Regular expressions can do this job instead (in VBScript, through the RegExp object). They are not covered in detail here.
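The following is a rough Python illustration of the regular-expression approach, using two hypothetical helpers:
- `unlink` strips the `<a>` tags but keeps the link text.
- `remove_links` removes each link together with its text.

Regexes are a pragmatic shortcut here and can miss unusual markup, such as a `>` inside an attribute value. An HTML parser is more robust.

```python
import re

# An opening <a ...> or closing </a> tag; \b stops it matching <abbr>, <area>, etc.
LINK_TAG = re.compile(r"</?a\b[^>]*>", re.IGNORECASE)
# A whole link: opening tag, the shortest run of text, closing tag (DOTALL lets it span lines)
LINK_ELEMENT = re.compile(r"<a\b[^>]*>.*?</a>", re.IGNORECASE | re.DOTALL)


def unlink(html: str) -> str:
    """Remove <a> tags but keep the text they wrapped."""
    return LINK_TAG.sub("", html)


def remove_links(html: str) -> str:
    """Remove each link together with its text."""
    return LINK_ELEMENT.sub("", html)


sample = 'See <a href="http://x.example/">X</a> and <A HREF=2.htm>next Page</A>!'
print(unlink(sample))        # See X and next Page!
print(remove_links(sample))  # See  and !
```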
(i) How can the target site's pagination be handled by our own page as well?
Answer: use the Replace function and pass the page address as a URL parameter.
For example, suppose the target page contains paging code such as <a href=2.htm>next Page</a>. First extract that string with the technique described above, then apply Replace: rsBK = Replace(rsBK, "<a href=", "<a href=page.asp?url=")
page.asp then reads the url parameter, prefixes the target site's address if the link is relative, and uses the same collection technique to fetch the content of the next page.
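The link-rewriting step can be sketched in Python, as a variation on the plain Replace call. It resolves relative links such as 2.htm against the page they came from, and it URL-encodes the whole address so that its own `?`, `&` and `/` survive as a single parameter value. `page.asp` comes from the article; `rewrite_links`, the href regex and the www.example.com address are hypothetical. In page.asp, the value would then be read with Request.QueryString("url") and passed to GetHttpPage.

```python
import re
from urllib.parse import quote, urljoin

# href=value, href="value" or href='value'; group 1 keeps the attribute name and any quote
HREF = re.compile(r"""(href\s*=\s*["']?)([^"'\s>]+)""", re.IGNORECASE)


def rewrite_links(html: str, base_url: str, proxy: str = "page.asp") -> str:
    """Point every href at proxy?url=<absolute, URL-encoded target address>."""
    def to_proxy(m):
        # Resolve relative links such as 2.htm against the page they came from
        absolute = urljoin(base_url, m.group(2))
        # Encode the whole address so it travels as one query-string value
        return m.group(1) + proxy + "?url=" + quote(absolute, safe="")
    return HREF.sub(to_proxy, html)


print(rewrite_links("<a href=2.htm>next Page</a>", "http://www.example.com/news/1.htm"))
# <a href=page.asp?url=http%3A%2F%2Fwww.example.com%2Fnews%2F2.htm>next Page</a>
```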
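(ii) How to store the collected content in a database
Space is limited, so this is only a brief outline. It is actually very simple:
First, process the collected content so that it cannot cause SQL injection or SQL syntax errors when written to the database. For example, remove single quotes with Replace(str, "'", "") or escape them with Replace(str, "'", "''"). A parameterized query through ADODB.Command avoids the problem altogether.
Then execute an SQL INSERT statement to write it to the database.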
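The following Python sketch shows the parameterized-query approach. Python's built-in `sqlite3` stands in for the database an ASP site would reach through ADO. The `articles` table and the `save_article` helper are hypothetical. The `?` placeholders pass values separately from the SQL text, so a quote in the collected content is stored as data and never parsed as SQL.

```python
import sqlite3


def save_article(conn: sqlite3.Connection, title: str, body: str) -> None:
    """Insert collected content with a parameterized query (no manual quote escaping)."""
    # Values are bound to the ? placeholders, not spliced into the SQL string
    conn.execute("INSERT INTO articles (title, body) VALUES (?, ?)", (title, body))
    conn.commit()


conn = sqlite3.connect(":memory:")  # throwaway in-memory database for the demo
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
save_article(conn, "BK", "It's a site with many resources'); DROP TABLE articles; --")
print(conn.execute("SELECT body FROM articles").fetchone()[0])
```

Unlike Replace(str, "'", ""), this keeps the apostrophe in "It's" intact, and the injected DROP TABLE text is simply stored as part of the body.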
These are just a few basic applications of the XMLHTTP component, which can do much more. For example, it can save remote images to the local server, or, combined with the ADODB.Stream component, save the fetched data into a database. Collection programs have a very wide range of uses.