Content collection (scraping) has become very popular lately. From news "thief" scripts and music "thief" scripts to news collectors and Flash collectors, it shows up everywhere, and many people are interested in it. To help out, I am writing a collection program of my own, named the Mind acquisition program, and here I will talk about the techniques involved.
What follows uses XMLHTTP, which is not a particularly advanced technology. I will cover only a small part of it: the few pieces that a collector actually needs.
If you want to know more, search www.google.com for XMLHTTP and you will find plenty of help. If you have any questions, you can post them on the forum.
This article covers how to fetch data from the Web, not how to process it.
The material at the address above explains the basic principles in great detail, but for ordinary collecting we do not need to know that much at the start. Knowing enough to get practical results is fine; later, when that is no longer enough, there is still time to look up the relevant documentation.
First, we need to create an XMLHTTP object.
There are many versions of the XMLHTTP components that Microsoft has released, and I know the following:
With so many components available, we naturally want to request the highest version. How can that be done?
The following piece of code requests the XMLHTTP object with the highest version that is available:
' Walk the candidate ProgIDs in order and keep the first one the server supports
For Each Prog In ArrProgID
    If IsObjInstalled(Prog) = True Then
        xmlHttpCom = Prog
        Exit For
    End If
Next
'// <summary>
'// Checks whether a component is supported: returns True if it is, False if not
'// </summary>
Public Function IsObjInstalled(strClassString)
    On Error Resume Next
    '// Set the initial value
    IsObjInstalled = False
    Err.Clear
    '// Test code: try to create the object
    Dim xTestObj
    Set xTestObj = Server.CreateObject(strClassString)
    If Err.Number = 0 Then IsObjInstalled = True
    '// Release the requested object
    Set xTestObj = Nothing
    Err.Clear
End Function
The code above requests the highest version of the XMLHTTP object that the current server supports.
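The loop above iterates over ArrProgID, which is not shown in the text. A minimal sketch of how it might be declared follows. The ProgIDs listed are an assumption (they are real MSXML ProgIDs, but not necessarily the author's list), ordered from highest version to lowest so that the first supported entry wins. This declaration would have to run before the loop.

```vbscript
' Assumed candidate ProgIDs, ordered from highest version to lowest.
' The list is illustrative; adjust it to what is installed on your server.
Dim ArrProgID, xmlHttpCom, Prog
ArrProgID = Array("Msxml2.XMLHTTP.6.0", "Msxml2.XMLHTTP.3.0", _
                  "Msxml2.XMLHTTP", "Microsoft.XMLHTTP")
xmlHttpCom = "" ' Stays empty if the loop finds no supported component
```

If xmlHttpCom is still empty after the loop, none of the candidates could be created on the server.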
Now let's talk about the collection function.
' GetFileText: the collection function
Public Function GetFileText(URL)
    On Error Resume Next ' Keep executing if an error occurs
    Dim http ' Holds the XMLHTTP object
    ' Set http = Server.CreateObject(xmlHttpCom) ' Request the object chosen above
    Set http = Server.CreateObject("Microsoft.XMLHTTP") ' To be safe, use a version that servers generally support
    http.Open "GET", URL, False ' Open a synchronous GET request, which waits for the server's response
    http.Send ' Send the request
    If http.readyState <> 4 Then ' If the server did not respond, exit the function
        Exit Function
    End If
    GetFileText = Bytes2BStr(http.responseBody, "GB2312") ' Convert the binary response body into text (GB2312)
    Set http = Nothing ' Release the object
    If Err.Number <> 0 Then Err.Clear ' If there was an error, clear it
End Function
'// <summary>
'// Uses ADODB.Stream to convert the collected binary data into text
'// </summary>
Function Bytes2BStr(vIn, cSet)
    Dim bytesStream, stringReturn
    Set bytesStream = Server.CreateObject("ADODB.Stream")
    bytesStream.Type = 2 ' 2 = adTypeText
    bytesStream.Open
    bytesStream.WriteText vIn
    bytesStream.Position = 0
    bytesStream.Charset = cSet ' Reinterpret the stored bytes using the requested character set
    bytesStream.Position = 2 ' Skip the first two bytes (the byte-order mark written by WriteText)
    stringReturn = bytesStream.ReadText
    bytesStream.Close
    Set bytesStream = Nothing
    Bytes2BStr = stringReturn
End Function
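The collection function hard-codes GB2312 and checks only readyState. As a hedged variant, the sketch below also rejects responses whose HTTP status is not 200 and lets the caller choose the character set. The name GetFileTextEx and the cSet parameter are hypothetical and not part of the original program. It reuses the Bytes2BStr function defined above.

```vbscript
' Hypothetical variant of GetFileText; the name and the cSet parameter are assumptions
Public Function GetFileTextEx(URL, cSet)
    Dim http
    GetFileTextEx = ""
    On Error Resume Next
    Set http = Server.CreateObject("Microsoft.XMLHTTP")
    http.Open "GET", URL, False ' Synchronous GET request
    http.Send
    If Err.Number = 0 Then
        ' Accept the body only when the request completed with HTTP 200
        If http.readyState = 4 And http.Status = 200 Then
            GetFileTextEx = Bytes2BStr(http.responseBody, cSet)
        End If
    End If
    Set http = Nothing
    Err.Clear
End Function
```

Returning an empty string on every failure path lets the caller test the result with Len before using it.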
With this, we can collect the content of a web page.
Simple, isn't it?
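As a usage sketch, calling the collection function from an ASP page might look like the following. The URL is a placeholder, and the 500-character preview length is an arbitrary choice for illustration.

```vbscript
' Hypothetical usage; the URL is a placeholder
Dim html
html = GetFileText("http://www.example.com/")
If Len(html) = 0 Then
    ' GetFileText returns nothing when the request fails
    Response.Write "Nothing was collected."
Else
    ' HTML-encode the output so the browser shows the source instead of rendering it
    Response.Write Server.HTMLEncode(Left(html, 500))
End If
```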
What should we do with the data after we collect it?
How do we pick out the data we want, and how do we store it in the database?
These questions will be analyzed and explained in a future article, along with the points to watch when storing data and how to process data with regular expressions.
The source file for the code above is attached. You can download it and run it to see whether it really can collect data.