Program | Thief program
The thief program is really a lazy man's magic. It can grab articles, live news, and songs, and can even search song data and store it in a database. It is powerful, yet people everywhere keep asking for thief programs. Why? Because only a handful of detailed write-ups on thief programs can be found on the Internet. In fact, writing a thief program is not hard. Here I am posting what I have learned from writing them for everyone to read. There may be mistakes, so please point them out.
First, basic principles and a simple example
The principle has been explained all over the Internet, so blue1000 will not ramble on here and will simply quote it, hey! Note: the following content is quoted (partially modified). Original author: 572019. Source: Easy network
(i) Principle
The thief program actually uses the XMLHTTP component, which is part of Microsoft's MSXML, to fetch pages from other Web sites. A news thief program, for example, often fetches Sina's news pages, replaces some of the HTML, and filters out the ads. The advantages of a thief program are: there is no content to maintain, because the data comes from other sites and is updated whenever those sites update; and it saves server resources, because a thief program usually consists of only a few files while all the content comes from other sites. The disadvantages are: instability, because if the target site has an error the program fails too, and if the target site is upgraded or redesigned the thief program must be changed to match; and speed, because a remote call is bound to be slower than reading data on the local server.
(ii) Example
The following briefly shows how XMLHTTP is used in ASP:
<%
' Common functions
' 1. Pass in Url, the address of the target page; the return value GetHttpPage is the HTML code of that page
Function GetHttpPage(Url)
    Dim Http
    Set Http = Server.CreateObject("MSXML2.XMLHTTP")
    Http.Open "GET", Url, False
    Http.Send
    If Http.readyState <> 4 Then
        Set Http = Nothing
        Exit Function
    End If
    GetHttpPage = BytesToBstr(Http.responseBody, "GB2312")
    Set Http = Nothing
    If Err.Number <> 0 Then Err.Clear
End Function
' 2. Encoding conversion: a page with Chinese characters fetched directly through XMLHTTP comes out garbled, so convert it with the ADODB.Stream component
Function BytesToBstr(Body, Cset)
    Dim objStream
    Set objStream = Server.CreateObject("ADODB.Stream")
    objStream.Type = 1 ' Binary mode, to accept the raw response bytes
    objStream.Mode = 3 ' Read/write
    objStream.Open
    objStream.Write Body
    objStream.Position = 0
    objStream.Type = 2 ' Switch to text mode
    objStream.Charset = Cset ' Decode the bytes with the given charset (GB2312 here); without this, a page with Chinese characters fetched through XMLHTTP is garbled
    BytesToBstr = objStream.ReadText
    objStream.Close
    Set objStream = Nothing
End Function
' Below, try fetching the HTML content of http://www.3doing.com/earticle/
Dim Url, Html
Url = "http://www.3doing.com/earticle/"
Html = GetHttpPage(Url)
Response.Write Html
%>
Second, a few commonly used functions
(i) InStr function
Description: Returns the position at which one string (string2) first appears in another string (string1).
Syntax:
InStr(string1, string2)
For example:
Dim SearchString, SearchChar
SearchString = "http://blue1000.com" ' The string to search in.
SearchChar = "blue1000" ' Search for "blue1000".
MyBK = InStr(SearchString, SearchChar) ' Returns 8
' Returns 0 if it is not found, for example:
SearchChar = "BK"
MyBK = InStr(SearchString, SearchChar) ' Returns 0
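One thing worth knowing when searching HTML: by default InStr compares case-sensitively, so "<TD>" and "<td>" count as different strings. Here is a minimal sketch of a case-insensitive search using InStr's optional start and compare arguments. The sample string and the variable names are made up for illustration.

```vbscript
Dim sampleHtml, posExact, posAnyCase
sampleHtml = "<TABLE><TR><TD>blue1000</TD></TR></TABLE>"

' The default (binary) comparison is case-sensitive, so lowercase "<td>" is not found
posExact = InStr(sampleHtml, "<td>")                      ' 0

' With a start position of 1 and vbTextCompare, the search ignores letter case
posAnyCase = InStr(1, sampleHtml, "<td>", vbTextCompare)  ' 12
```

When a compare argument is given, the start position must be given as well, which is why the second call begins with 1.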
(ii) Mid function
Description:
Returns a specified number of characters from a string.
Syntax:
Mid(string, start, length)
For example:
Dim MyBK
MyBK = Mid("Our BK (blue1000.com) Design", 9, 12) ' Takes 12 characters from the string "Our BK (blue1000.com) Design", starting at the 9th character
' At this point the value of MyBK is "blue1000.com"
(iii) Replace function (I will not go into detail; here is an example)
Dim SearchString, SearchChar
SearchString = "Our BK Design is a website construction resource site" ' The string to search in.
SearchString = Replace(SearchString, "BK Design", "blue1000.com")
' At this point the value of SearchString is "Our blue1000.com is a website construction resource site"
The previous part only covered how to get the HTML code of a whole page, plus a few common functions. How do you get just a specified part of the code? How do you remove content you do not need? How do you change the other site's links into your own? And how do you turn the other site's paging into your own?
In the previous part blue1000 covered some principles and a few commonly used functions. This part covers some techniques ~ (to an expert this is all obvious, so experts can skip it, but please do not scold me)
(a) How do you extract the HTML code of a specified region?
(Where do you look at the other site's HTML code? Sigh: IE browser >> View >> Source. Don't tell me you don't know what IE is!) For example, I only want the text between "<td id="Content">" and "</td>" in the following HTML code:
<title>BK (blue1000.com) Design--Web page production resource site</title>
<body>
<table>
<tr><td></td></tr>
<tr><td id="Content">BK (blue1000.com) Design--Web page production resource site is a site with lots of resources......</td></tr>
</table>
</body>
<%
' ......
Dim strBK, start, over, rsBK
strBK = GetHttpPage("address of the Web page")
start = InStr(strBK, "<td id=""Content"">") ' Gets the position where the string starts. InStr was covered in the previous part ~
' Someone will ask: the original code is <td id="Content">, so why is it written <td id=""Content""> here? Answer: in ASP (strictly speaking, in VBScript) a double quotation mark inside a string is written as two double quotation marks, because the double quotation mark is a special character for the program.
over = InStr(strBK, "...</td></tr>") ' Gets the position where the string ends.
' Someone will ask again: why are there 3 extra dots "..." in front of the HTML code the program searches for? Hint: the row above also contains </td></tr>. If plain </td></tr> were used here, the program would mistake the </td></tr> of the row above for the end of the string you want.
rsBK = Mid(strBK, start, over - start) ' Takes the part of strBK between start and over. Mid was also covered in the previous part; over - start is the distance between the start and end positions, that is, the number of characters.
Response.Write rsBK ' Finally, output the content the program retrieved
%>
Do not celebrate too early. When you run it, you will find that the page's HTML is broken. Why? Because the HTML code you get is:
<td id="Content">BK (blue1000.com) Design--Web page production resource site is a site with lots of resources...
Did you see that? The HTML code is incomplete! What do we do? The statement start = InStr(strBK, "<td id=""Content"">") returns the position where <td id="Content"> begins in strBK. Since <td id="Content"> is 17 characters long, we can add 17 at the end of that statement, and start then points to the character right after <td id="Content">.
OK, the program becomes:
<%
' ......
Dim strBK, start, over, rsBK
strBK = GetHttpPage("address of the Web page")
start = InStr(strBK, "<td id=""Content"">") + 17
over = InStr(strBK, "...</td></tr>") ' The sample text ends in six dots, so three remain in the result; you can also subtract 3 here (-3) to drop them
rsBK = Mid(strBK, start, over - start)
Response.Write rsBK
%>
That's it. Now we can steal the content we want and display it on our own page, hehe ~
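The start/over steps above can be wrapped in one reusable function. This is only a sketch: ExtractBetween is a hypothetical helper name, not part of the original program. It uses Len on the start marker instead of a hard-coded 17, and it searches for the end marker only after the start marker, so the "..." trick is not needed in this sample.

```vbscript
' Returns the text between startMark and endMark, or "" if either marker is missing.
' ExtractBetween is a hypothetical helper name used only for this sketch.
Function ExtractBetween(html, startMark, endMark)
    Dim startPos, endPos
    ExtractBetween = ""
    startPos = InStr(html, startMark)
    If startPos = 0 Then Exit Function
    ' Skip past the start marker itself instead of hard-coding its length
    startPos = startPos + Len(startMark)
    ' Look for the end marker only after the start marker
    endPos = InStr(startPos, html, endMark)
    If endPos = 0 Then Exit Function
    ExtractBetween = Mid(html, startPos, endPos - startPos)
End Function

Dim sampleRow, cellText
sampleRow = "<tr><td></td></tr><tr><td id=""Content"">BK Design</td></tr>"
cellText = ExtractBetween(sampleRow, "<td id=""Content"">", "</td>")
' cellText is now "BK Design"
```

Returning an empty string when a marker is missing avoids the runtime error that Mid raises when it is given a start position of 0 or a negative length, which is what happens after the target site changes its layout.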
(b) How do you delete part of the retrieved text or modify it?
Building on the above, we can replace "BK (blue1000.com)" in rsBK with "BK":
rsBK = Replace(rsBK, "BK (blue1000.com)", "BK")
Or simply delete the text " (blue1000.com)":
rsBK = Replace(rsBK, " (blue1000.com)", "")
Well, now rsBK has become: "BK Design--Web page production resource site is a site with lots of resources...".
In practice, though, there are situations where the Replace function is not suitable. For example, suppose we want to remove all the links in a string. Links come in many forms, and Replace can only replace one specific string at a time. We can hardly write a separate Replace call for every one of them, can we?
So now it is time to use regular expressions in the program.
If you just want to get rid of all the links on a page, watch for Blue1000.com's next tutorial: << How to clear all links in a Web page [regular expression] >>
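As a rough idea of what the regular-expression approach can look like, here is a minimal sketch using VBScript's built-in RegExp object. It is not taken from the tutorial mentioned above: StripLinks is a hypothetical helper name, and the pattern is just one simple way to match opening and closing <a> tags.

```vbscript
' StripLinks is a hypothetical helper: it removes <a ...> and </a> tags
' but keeps the text that was inside the link.
Function StripLinks(html)
    Dim re
    Set re = New RegExp
    re.Pattern = "</?a(\s[^>]*)?>"
    re.IgnoreCase = True   ' Match <A HREF=...> as well as <a href=...>
    re.Global = True       ' Replace every match, not just the first one
    StripLinks = re.Replace(html, "")
    Set re = Nothing
End Function

Dim linked, plain
linked = "Visit <a href=""http://blue1000.com"">blue1000</a> or <A HREF=2.htm>next Page</A>"
plain = StripLinks(linked)
' plain is now "Visit blue1000 or next Page"
```

The pattern requires whitespace or ">" right after the "a", so other tags that merely begin with the letter a, such as <abbr>, are left alone.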
(c) How do you turn the other site's paging into your own?
The answer is: use the Replace function and pass a page parameter.
For example, the other site's page contains paging code like this: "<a href=2.htm> next Page </a>". We can first use the technique described above to get this string, and then apply the Replace function: rsBK = Replace(rsBK, "<a href=", "<a href=page.asp?Url=")
Then the page.asp program reads the value of the Url parameter and uses the thief technique to fetch the content of the next page you want.
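The link rewriting step can be sketched as a small function. RewritePagingLinks is a hypothetical helper name; page.asp and the Url parameter come from the text above. The sketch assumes the href value is unquoted, as in the sample paging code.

```vbscript
' RewritePagingLinks is a hypothetical helper that points every link at page.asp,
' passing the original address in the Url parameter.
Function RewritePagingLinks(html)
    ' Start at character 1, replace all occurrences (-1), and ignore letter case
    ' so that <A HREF= is rewritten as well
    RewritePagingLinks = Replace(html, "<a href=", "<a href=page.asp?Url=", 1, -1, vbTextCompare)
End Function

Dim paging
paging = RewritePagingLinks("<a href=2.htm> next Page </a>")
' paging is now "<a href=page.asp?Url=2.htm> next Page </a>"
```

In page.asp, the value would presumably be read with Request.QueryString("Url"). Because an address such as 2.htm is relative, it has to be combined with the other site's base address before it is passed to GetHttpPage.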
(d) How do you store the retrieved content in a database?
Space is limited, so I will keep this brief.
It is actually very simple:
First process the stolen content to prevent SQL injection errors when it is written to the database, for example: Replace(String, "'", "")
Then execute a SQL command that inserts it into the database, and that's it ~
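A variation on the Replace call above is to double each single quote instead of deleting it, which keeps the original text intact in databases such as Access or SQL Server. The following is only a sketch under that assumption: SqlQuote is a hypothetical helper, and the table name Article and column name Title are made up for illustration.

```vbscript
' SqlQuote is a hypothetical helper: it wraps a value in single quotes and doubles
' any single quote inside it, so the text cannot end the SQL string early.
Function SqlQuote(value)
    SqlQuote = "'" & Replace(value, "'", "''") & "'"
End Function

Dim title, sql
title = "BK's design"
' The table name Article and the column name Title are made up for this sketch
sql = "INSERT INTO Article (Title) VALUES (" & SqlQuote(title) & ")"
' sql is now: INSERT INTO Article (Title) VALUES ('BK''s design')
```

Where the database driver supports it, passing the value as a parameter through an ADODB.Command object avoids building the SQL string by hand at all.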
These are only some basic uses of the XMLHTTP component. In fact it can do much more, such as saving remote images to the local server, or, together with the ADODB.Stream component, saving the retrieved data into a database. Thief programs have a wide range of uses.