Collection Principles: Collection Technology with XMLHTTP ("Thief" Programs and Scrapers)

Source: Internet
Author: User
Web collection (scraping) has been very popular lately. It turns up everywhere: news "thieves", music "thieves", news collection and Flash collection. Many people are interested in it. To help everyone, I am writing a collection program of my own, called the Mind acquisition program. Here I will explain the technology behind collection.


What follows is not very advanced XMLHTTP technology. I will cover only the few pieces a collection program needs.
If you want to learn more, search Google (www.google.com) for XMLHTTP and you will find plenty of help. If you have questions, post them on the forum.

This article covers how to fetch data from the Web, not how to process it.

1. XMLHTTP Technology

Http://www.0579.info/study/exploitation/net/58685.htm

The article at the address above explains the basic principles in detail. For ordinary collection work, however, you don't need to know that much at the start. The practical parts are enough. Later, when you actually need something more, you can look it up in the relevant documentation.

First, we need to create an XMLHTTP object.
Microsoft has released many versions of the XMLHTTP component. The ones I know of are:

"MSXML2.ServerXMLHTTP.4.0"
"MSXML2.ServerXMLHTTP.3.0"
"MSXML2.ServerXMLHTTP"
"MSXML2.XMLHTTP.5.0"
"MSXML2.XMLHTTP.4.0"
"MSXML2.XMLHTTP.3.0"
"MSXML2.XMLHTTP"
"Microsoft.XMLHTTP"


With so many components available, we naturally want to create the highest version the server supports. How do we do that?
The code below does it. It tries the ProgIDs from the highest version to the lowest and keeps the first one that can be created:

Dim arrProgId, prog, xmlhttpcom

' ProgIDs ordered from the highest version to the lowest
arrProgId = Array("MSXML2.ServerXMLHTTP.4.0", "MSXML2.ServerXMLHTTP.3.0", "MSXML2.ServerXMLHTTP", "MSXML2.XMLHTTP.5.0", "MSXML2.XMLHTTP.4.0", "MSXML2.XMLHTTP.3.0", "MSXML2.XMLHTTP", "Microsoft.XMLHTTP")

' Keep the first ProgID that can actually be created on this server
For Each prog In arrProgId
    If IsObjInstalled(prog) Then
        xmlhttpcom = prog
        Exit For
    End If
Next


' Checks whether a component is supported: returns True if it can be created, otherwise False
Public Function IsObjInstalled(strClassString)
    On Error Resume Next

    ' Set the initial value
    IsObjInstalled = False
    Err.Clear

    ' Try to create the object
    Dim xTestObj
    Set xTestObj = Server.CreateObject(strClassString)
    If Err.Number = 0 Then IsObjInstalled = True

    ' Release the test object
    Set xTestObj = Nothing
    Err.Clear
End Function


The code above selects the highest-version XMLHTTP object that the current server supports.
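To see which of the listed components a particular machine actually has, a small diagnostic sketch can loop over the same ProgIDs and report each one that can be created. It is written for Windows Script Host (run it with cscript), so it uses the built-in CreateObject and WScript.Echo. In an ASP page you would use Server.CreateObject and Response.Write instead. The helper ListInstalledProgIds is a hypothetical name, not part of the original program.

```vbscript
Option Explicit

' Returns True if a COM object with the given ProgID can be created
Function IsObjInstalled(strClassString)
    Dim objTest
    On Error Resume Next
    Set objTest = CreateObject(strClassString)
    IsObjInstalled = (Err.Number = 0)
    Set objTest = Nothing
    Err.Clear
End Function

' Hypothetical helper: returns a comma-separated list of the ProgIDs
' in arrProgIds that can be created on this machine
Function ListInstalledProgIds(arrProgIds)
    Dim strProg, strResult
    strResult = ""
    For Each strProg In arrProgIds
        If IsObjInstalled(strProg) Then
            If strResult <> "" Then strResult = strResult & ", "
            strResult = strResult & strProg
        End If
    Next
    ListInstalledProgIds = strResult
End Function

WScript.Echo ListInstalledProgIds(Array( _
    "MSXML2.ServerXMLHTTP.4.0", "MSXML2.ServerXMLHTTP.3.0", _
    "MSXML2.ServerXMLHTTP", "MSXML2.XMLHTTP.5.0", _
    "MSXML2.XMLHTTP.4.0", "MSXML2.XMLHTTP.3.0", _
    "MSXML2.XMLHTTP", "Microsoft.XMLHTTP"))
```

The output depends on which MSXML versions are installed, so it is a quick way to confirm what the version-selection loop will pick on a given server.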

Now let's look at the collection function itself.


' GetFileText: the collection function. Fetches url and returns its text ("" on failure)
Public Function GetFileText(url)
    On Error Resume Next              ' keep running if an error occurs
    Dim http                          ' declare the variable
    ' Set http = Server.CreateObject(xmlhttpcom)   ' use the best version found above
    Set http = Server.CreateObject("Microsoft.XMLHTTP")  ' to be safe, use a version almost every server supports
    http.Open "GET", url, False       ' synchronous GET: wait for the server's response
    http.Send                         ' send the request
    If http.readyState <> 4 Then      ' the response did not complete, so give up
        Set http = Nothing
        Exit Function
    End If

    GetFileText = Bytes2bstr(http.responseBody, "GB2312")  ' convert the binary response into GB2312 text

    Set http = Nothing                ' release the object
    If Err.Number <> 0 Then Err.Clear ' clear any error that occurred
End Function
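A collection function on a server also has to cope with slow sites and error pages. The sketch below is one hedged way to do that, assuming MSXML2.ServerXMLHTTP is installed. It sets timeouts with setTimeouts and keeps the response only when the status is 200. It returns the raw bytes, which you would then decode with Bytes2bstr. The function name FetchBytes and the timeout values are assumptions for illustration. The sketch is written for Windows Script Host, so it uses CreateObject (in ASP, Server.CreateObject).

```vbscript
Option Explicit

' Hypothetical variant of GetFileText: returns the raw response bytes only
' when the server answers 200 OK, and Empty on any failure, so that
' "no data" can be told apart from an empty page
Function FetchBytes(url)
    Dim http
    FetchBytes = Empty
    On Error Resume Next
    Set http = CreateObject("MSXML2.ServerXMLHTTP")
    If Err.Number = 0 Then
        ' resolve, connect, send and receive timeouts in milliseconds (assumed values)
        http.setTimeouts 5000, 5000, 10000, 10000
        http.open "GET", url, False
        http.send
        ' Check Err before testing status: under On Error Resume Next a
        ' failing If condition would fall through into the Then branch
        If Err.Number = 0 Then
            If http.status = 200 Then FetchBytes = http.responseBody
        End If
    End If
    Set http = Nothing
    Err.Clear
End Function
```

A caller would store the result in a variable, test it with IsEmpty, and pass it to Bytes2bstr only when data actually came back.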


' Uses ADODB.Stream to turn the collected binary data into text in the given character set
Function Bytes2bstr(vin, cset)
    Dim bytesStream, stringReturn
    Set bytesStream = Server.CreateObject("ADODB.Stream")
    bytesStream.Type = 1          ' adTypeBinary: load the raw bytes first
    bytesStream.Open
    bytesStream.Write vin
    bytesStream.Position = 0      ' Type can only be changed at position 0
    bytesStream.Type = 2          ' adTypeText: now read the bytes back as text
    bytesStream.Charset = cset    ' for example "GB2312"
    stringReturn = bytesStream.ReadText
    bytesStream.Close
    Set bytesStream = Nothing
    Bytes2bstr = stringReturn
End Function
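The charset argument decides how the bytes are interpreted. A quick way to check a conversion without touching the network is a round trip: encode a string into bytes with one charset, then decode it again. The sketch below does that with ADODB.Stream. Str2Bytes is a hypothetical inverse of Bytes2bstr, and Bytes2Str repeats the decoding steps so that the script runs on its own under Windows Script Host.

```vbscript
Option Explicit

Const adTypeBinary = 1
Const adTypeText = 2

' Hypothetical helper: encodes strText into a byte array using charset cset
Function Str2Bytes(strText, cset)
    Dim stm
    Set stm = CreateObject("ADODB.Stream")
    stm.Type = adTypeText
    stm.Charset = cset
    stm.Open
    stm.WriteText strText
    stm.Position = 0              ' rewind before switching to binary
    stm.Type = adTypeBinary
    Str2Bytes = stm.Read          ' read all bytes as a byte array
    stm.Close
    Set stm = Nothing
End Function

' Decodes a byte array into a string using charset cset
Function Bytes2Str(vin, cset)
    Dim stm
    Set stm = CreateObject("ADODB.Stream")
    stm.Type = adTypeBinary
    stm.Open
    stm.Write vin
    stm.Position = 0              ' rewind before switching to text
    stm.Type = adTypeText
    stm.Charset = cset
    Bytes2Str = stm.ReadText
    stm.Close
    Set stm = Nothing
End Function

WScript.Echo Bytes2Str(Str2Bytes("Hello", "GB2312"), "GB2312")
```

Decoding with the same charset used for encoding gives back the original text. Decoding GB2312 bytes as another multibyte charset generally does not, which is why garbled pages usually point to the wrong charset argument.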


Below, a variable url holds the address of the page to collect. To collect that page and display it, we can do this:

url = "http://ent.sina.com.cn/star/mainland/more.html"

Response.Write GetFileText(url)


That is all it takes to collect the contents of the page above.
Simple, isn't it?
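GetFileText always decodes as GB2312, but not every site uses that encoding. Servers often name the charset in the Content-Type response header, for example "text/html; charset=UTF-8". The sketch below pulls that value out of such a header and falls back to a default when none is given. CharsetFromContentType is a hypothetical helper, and it handles only the simple "charset=value" form.

```vbscript
Option Explicit

' Hypothetical helper: extracts the charset parameter from a Content-Type
' header value; returns defaultCset when no charset is present
Function CharsetFromContentType(strContentType, defaultCset)
    Dim pos, strCset
    pos = InStr(1, strContentType, "charset=", vbTextCompare)
    If pos = 0 Then
        CharsetFromContentType = defaultCset
        Exit Function
    End If
    strCset = Mid(strContentType, pos + Len("charset="))
    ' Drop any parameters that follow, then strip quotes and spaces
    If InStr(strCset, ";") > 0 Then strCset = Left(strCset, InStr(strCset, ";") - 1)
    strCset = Trim(Replace(strCset, """", ""))
    If strCset = "" Then strCset = defaultCset
    CharsetFromContentType = strCset
End Function

WScript.Echo CharsetFromContentType("text/html; charset=UTF-8", "GB2312")
```

Inside the collection function, after Send, one could pass http.getResponseHeader("Content-Type") to this helper and use the result in place of the hard-coded "GB2312". Pages that declare their charset only in a meta tag will still fall back to the default.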

What should we do with the data once we have collected it?
How do we pick out the parts we want, and how do we store them in a database?
These questions will be analyzed in a later article. It will cover what to watch out for when storing the data and how to process it with regular expressions.


The source file for the code above is attached. Download it and run it to see whether it really collects the data.
