Collection (pulling content from other websites) has been a hot topic recently, from news "thieves" and music "thieves" to news collection and Flash collection. Since many people are interested, I am also writing a collection program, which I call the intention collection program. Here I will talk about the technologies that collection relies on.
What follows is XMLHTTP, which is not a very advanced technology. I will also briefly cover the other pieces that a collection program needs.
If you want to learn more, search for XMLHTTP on www.google.com and you will find plenty of help. If you have any questions, post them on the forum.
This article only describes how to fetch data from the Internet. It does not cover how to process that data.
First: XMLHTTP technology
http://www.0579.info/study/exploitation/net/58685.htm
The article at the address above describes the basic principles in detail, but we generally do not need to know that much at the beginning. It is enough to know what is practical; when that is no longer enough, there is still time to look up the relevant documents.
First, we need to create an XMLHTTP object.
Microsoft has released many versions of the XMLHTTP component. I know of the following:
"MSXML2.ServerXMLHTTP.4.0"
"MSXML2.ServerXMLHTTP.3.0"
"MSXML2.ServerXMLHTTP"
"MSXML2.XMLHTTP.5.0"
"MSXML2.XMLHTTP.4.0"
"MSXML2.XMLHTTP.3.0"
"MSXML2.XMLHTTP"
"Microsoft.XMLHTTP"
With so many versions, we naturally want to use the highest one that is available. How can that be done?
The code below walks through the list and picks the first version that the server supports.
Dim arrProgId, prog, xmlHttpCom
'// Candidate ProgIDs, from most preferred to least preferred
arrProgId = Array("MSXML2.ServerXMLHTTP.4.0", "MSXML2.ServerXMLHTTP.3.0", "MSXML2.ServerXMLHTTP", "MSXML2.XMLHTTP.5.0", "MSXML2.XMLHTTP.4.0", "MSXML2.XMLHTTP.3.0", "MSXML2.XMLHTTP", "Microsoft.XMLHTTP")
xmlHttpCom = ""
For Each prog In arrProgId
    If IsObjInstalled(prog) Then
        xmlHttpCom = prog
        Exit For '// stop at the first installed version
    End If
Next
'// <summary>
'// Check whether a component is installed; returns True or False
'// </summary>
Public Function IsObjInstalled(strClassString)
    On Error Resume Next
    '// Initial return value
    IsObjInstalled = False
    Err.Clear
    '// Try to create the object
    Dim xTestObj
    Set xTestObj = Server.CreateObject(strClassString)
    If Err.Number = 0 Then IsObjInstalled = True
    '// Release the test object
    Set xTestObj = Nothing
    Err.Clear
End Function
The code above finds the highest version of the XMLHTTP object that the current server supports and stores its name in xmlHttpCom.
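As a minimal sketch of how the detected name might be used, the helper below creates the object from xmlHttpCom and returns Nothing when no version was found. The name CreateXmlHttp is hypothetical and is not part of the original program; it assumes the detection code above has already run on the same page.

```vbscript
'// Hypothetical helper (not in the original program):
'// create an XMLHTTP object from the ProgID detected above.
'// Returns Nothing when no XMLHTTP component was found.
Function CreateXmlHttp()
    Set CreateXmlHttp = Nothing
    If xmlHttpCom <> "" Then
        Set CreateXmlHttp = Server.CreateObject(xmlHttpCom)
    End If
End Function

Dim testHttp
Set testHttp = CreateXmlHttp()
If testHttp Is Nothing Then
    Response.Write "No XMLHTTP component is installed on this server."
Else
    Response.Write "Using " & xmlHttpCom
End If
Set testHttp = Nothing
```

Checking for Nothing before use gives a clear message on servers that have none of the listed components, instead of a failure later in the request.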
Next comes the collection function.
'GetFileText is the collection function
Public Function GetFileText(url)
    On Error Resume Next 'on error, continue with the next statement
    Dim http 'declare the variable
    'Set http = Server.CreateObject(xmlHttpCom) 'create the object with the detected version
    Set http = Server.CreateObject("Microsoft.XMLHTTP") 'a version that servers generally support
    http.Open "GET", url, False 'open the request in GET mode; False makes Send wait for the server response
    http.Send 'send the request
    If http.readyState <> 4 Then 'exit the function if the response is not complete
        Set http = Nothing
        Exit Function
    End If
    GetFileText = Bytes2Bstr(http.responseBody, "gb2312") 'convert the binary response to text (gb2312)
    Set http = Nothing 'release the object
    If Err.Number <> 0 Then Err.Clear 'if an error occurred, clear it
End Function
'// <summary>
'// Use ADODB.Stream to convert the collected binary data into text
'// </summary>
Function Bytes2Bstr(vIn, cSet)
    Dim bytesStream, stringReturn
    Set bytesStream = Server.CreateObject("ADODB.Stream")
    bytesStream.Type = 2 '// 2 = text stream
    bytesStream.Open
    bytesStream.WriteText vIn
    bytesStream.Position = 0 '// Charset can only be changed at position 0
    bytesStream.Charset = cSet
    bytesStream.Position = 2 '// skip the 2-byte mark that WriteText put at the start of the stream
    stringReturn = bytesStream.ReadText
    bytesStream.Close
    Set bytesStream = Nothing
    Bytes2Bstr = stringReturn
End Function
Next, I define a variable, url, that holds the address to collect:
url = "http://ent.sina.com.cn/star/mainland/more.html"
That is a web address. To collect the page at that address and display it, we can do this:
url = "http://ent.sina.com.cn/star/mainland/more.html"
Response.Write GetFileText(url)
In this way, the content at the URL above is collected.
Easy, isn't it?
What should we do with the data once it has been collected?
How do we separate out the data we want, and how do we store it in the database?
These questions will be analyzed and explained in a later article. Take care when writing to the database, and use regular expressions to process the data.
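As a rough preview of the regular expression step, the sketch below lists the links found in a collected page using the VBScript RegExp object. The pattern is an assumption for illustration only: it handles simple double-quoted <a href="..."> tags on a single line, and a real page will usually need a pattern written for its own markup.

```vbscript
'// Sketch only: list link addresses and link texts from a collected page.
'// The pattern below is an illustrative assumption, not a general HTML parser.
Dim html, re, matches, m
html = GetFileText("http://ent.sina.com.cn/star/mainland/more.html")

Set re = New RegExp
re.Pattern = "<a\s[^>]*href=""([^""]*)""[^>]*>(.*?)</a>"
re.IgnoreCase = True '// match <A HREF=...> as well
re.Global = True     '// find every match, not only the first

Set matches = re.Execute(html)
For Each m In matches
    '// SubMatches(0) is the address, SubMatches(1) is the link text
    Response.Write Server.HTMLEncode(m.SubMatches(0) & " | " & m.SubMatches(1)) & "<br>"
Next
Set re = Nothing
```

Encoding the output with Server.HTMLEncode keeps the collected markup from being rendered while you inspect what the pattern actually captured.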
The source file for the code above is attached. You can download it and run it to see whether it collects the data.