Batch-capturing remote data with XMLHTTP

Source: Internet
Author: User

Combining this script with regular expressions gives better results. I would also welcome any tips on sharing sessions with XMLHTTP.

<HTML>
<Head>
<Title>autoget</Title>
<Meta http-equiv="Content-Type" content="text/html; charset=gb2312">
</Head>
<Body bgcolor="#ffffff" style="font-family: Arial; font-size: 12px">
<%
'============================================================
'Filename: getit.asp
'Intro: auto get data from remote website
'Author: babyt (atai)
'Url: http://blog.csdn.net/babyt
'Createat: 2002-02 lastupdate: 2004-09
'Db table: Data
'Table fields:
'  uid -> long -> ID of the page
'  ucontent -> text -> content of the page (HTML)
'============================================================

Server.ScriptTimeout = 5000

'On Error Resume Next
Set conn = Server.CreateObject("ADODB.Connection")
conn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & Server.MapPath("getit.mdb")
Set rs = Server.CreateObject("ADODB.Recordset")
sql = "select * from data"
'Cursor type 1 (adOpenKeyset), lock type 3 (adLockOptimistic), so rows can be added
rs.Open sql, conn, 1, 3

Dim comefrom, myerr1, myerr2, mycount

'========================================================================
comefrom = "http://www.xxx.com/U.asp?Id="
myerr1 = "this document does not exist"
myerr2 = "this document is hidden"
'========================================================================

'****************************************************************
'Only the start point intmin, the end point intmax, and the step size intstep need to be changed here.
'Keep each run to a range of about 50,000 IDs. A run of that size may take more than two hours and needs no manual intervention.
'****************************************************************
intmin = 0
intmax = 10000
'Set the step size (number of IDs per batch)
intstep = 100

'==============================================================================
'Do not change the code below
'==============================================================================
Call getpart(intmin)
Response.Write "Processed IDs " & intmin & " to " & intmax
rs.Close
Set rs = Nothing
conn.Close
Set conn = Nothing
%>
</Body>
</Html>
<%
'Use XMLHTTP to fetch the URL and return the page content as a string
Function getbody(url)
    Dim objxml
    On Error Resume Next
    Set objxml = CreateObject("Microsoft.XMLHTTP")
    With objxml
        .Open "GET", url, False, "", ""
        .Send
        getbody = .responseBody
    End With
    'Decode the raw response bytes as gb2312 text
    getbody = bytestobstr(getbody, "gb2312")
    Set objxml = Nothing
End Function
'Use ADODB.Stream to convert binary data to a string in the given charset
Function bytestobstr(strbody, codebase)
    Dim objstream
    Set objstream = Server.CreateObject("ADODB.Stream")
    objstream.Type = 1 'adTypeBinary
    objstream.Mode = 3 'adModeReadWrite
    objstream.Open
    objstream.Write strbody
    objstream.Position = 0
    objstream.Type = 2 'adTypeText
    objstream.Charset = codebase
    bytestobstr = objstream.ReadText
    objstream.Close
    Set objstream = Nothing
End Function
'Main function: capture one batch of IDs starting at istart, then recurse into the next batch
Function getpart(istart)
    Dim igo
    time1 = Timer()
    mycount = 0
    For igo = istart To istart + intstep
        If igo <= intmax Then
            'Fetch the page and escape double quotes for simple processing
            content = getbody(comefrom & igo)
            content = Replace(content, Chr(34), "&quot;")
            If InStr(content, myerr1) > 0 Or InStr(content, myerr2) > 0 Then
                'Skip error pages
            Else
                'Write to the database, restoring the double quotes
                rs.AddNew
                rs("uid") = igo
                rs("ucontent") = Replace(content, "&quot;", Chr(34))
                rs.Update
                mycount = mycount + 1
                Response.Write igo & "<br>"
                Response.Flush
            End If
        Else
            'Past the end point: report the last batch and stop
            Response.Write "<font color=red>Successfully captured " & mycount & " records, "
            time2 = Timer()
            Response.Write "time consumed: " & FormatNumber(time2 - time1, 3) & " seconds</font><br>"
            Response.Flush
            Exit Function
        End If
    Next
    Response.Write "<font color=red>Successfully captured " & mycount & " records, "
    time2 = Timer()
    Response.Write "time consumed: " & FormatNumber(time2 - time1, 3) & " seconds</font><br>"
    Response.Flush
    'Recursion: after the loop, igo is already the first ID of the next batch
    getpart igo
End Function
%>
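
The introduction mentions combining the capture with regular expressions. As a rough illustration only, the sketch below uses the VBScript RegExp object to pull one fragment out of a captured page instead of storing the whole HTML. The helper name extractfirst, the sample HTML, and the title pattern are all assumptions made for this example and are not part of the original script. The code is meant to sit inside an ASP <% ... %> block.

```vbscript
'Hypothetical helper: return the first capture group of pattern found in html,
'or an empty string when nothing matches. The pattern must contain at least
'one capture group, because SubMatches(0) is read.
Function extractfirst(html, pattern)
    Dim re, matches
    Set re = New RegExp
    re.Pattern = pattern
    re.IgnoreCase = True
    re.Global = False
    Set matches = re.Execute(html)
    If matches.Count > 0 Then
        extractfirst = matches(0).SubMatches(0)
    Else
        extractfirst = ""
    End If
    Set matches = Nothing
    Set re = Nothing
End Function

'Example: keep only the page title of a captured page
Dim sample
sample = "<html><head><title>Document 42</title></head><body>...</body></html>"
Response.Write extractfirst(sample, "<title>([\s\S]*?)</title>")
```

In the capture loop, one possible use would be to pass the string returned by getbody through such a helper before assigning it to the ucontent field, with a pattern chosen for the markup of the target site.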
