Using Microsoft.XMLHTTP in ASP to fetch web content and filter out the required parts
This ASP page uses Microsoft.XMLHTTP to fetch a web page, decodes the response so that the text is not garbled, and then filters out the required content.
Sample source code:
<%
Dim xmlUrl, http, strHTML, strBody
xmlUrl = Request.QueryString("u")

REM Fetch the page source synchronously (the third argument to Open is False)
Set http = Server.CreateObject("Microsoft.XMLHTTP")
http.Open "POST", xmlUrl, False
http.setRequestHeader "User-Agent", "Mozilla/4.0"
http.setRequestHeader "Connection", "Keep-Alive"
http.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
http.Send()
strHTML = BytesToBstr(http.responseBody)
Set http = Nothing

REM Extract the main content
strBody = GetBody(strHTML, "<div id=""Div_newscontentc"" class=""cnt"">", "</div>", 0, 0)
' Remove site-specific boilerplate; adjust these strings to match the page being fetched
strBody = Replace(strBody, "(This article starts with", "")
strBody = Replace(strBody, "Wealth Power network </a>. Reprint please indicate the source.", "")
strBody = Replace(strBody, "This article starts in, reprint please indicate the source.)", "")
strBody = Replace(strBody, "Rich power network </a>:http://www.927953.com", "")
strBody = Replace(strBody, "This article starts with", "")
Response.Write RegRemoveHref(strBody)

REM Convert the raw response bytes into a string
Function BytesToBstr(body)
    Dim objStream
    Set objStream = Server.CreateObject("ADODB.Stream")
    objStream.Type = 1          ' 1 = binary
    objStream.Mode = 3          ' 3 = read/write
    objStream.Open
    objStream.Write body
    objStream.Position = 0
    objStream.Type = 2          ' 2 = text
    ' Decode the bytes as UTF-8. Set this to the page's actual encoding
    ' (for example "GB2312"); otherwise a page with Chinese characters
    ' fetched through XMLHTTP comes out garbled.
    objStream.Charset = "UTF-8"
    BytesToBstr = objStream.ReadText
    objStream.Close
    Set objStream = Nothing
End Function

REM Extract the content between a start marker and an end marker (plain string search).
REM IncluL / IncluR control whether the start / end marker is included in the result.
REM Returns "$False$" if the input is empty or a marker is not found.
Function GetBody(ConStr, StartStr, OverStr, IncluL, IncluR)
    If ConStr = "$False$" Or ConStr = "" Or IsNull(ConStr) = True Or StartStr = "" Or IsNull(StartStr) = True Or OverStr = "" Or IsNull(OverStr) = True Then
        GetBody = "$False$"
        Exit Function
    End If
    Dim ConStrTemp
    Dim Start, Over
    ' Lower-case everything so the marker search is case-insensitive
    ConStrTemp = LCase(ConStr)
    StartStr = LCase(StartStr)
    OverStr = LCase(OverStr)
    Start = InStrB(1, ConStrTemp, StartStr, vbBinaryCompare)
    If Start <= 0 Then
        GetBody = "$False$"
        Exit Function
    Else
        If IncluL = False Then
            Start = Start + LenB(StartStr)
        End If
    End If
    Over = InStrB(Start, ConStrTemp, OverStr, vbBinaryCompare)
    If Over <= 0 Or Over <= Start Then
        GetBody = "$False$"
        Exit Function
    Else
        If IncluR = True Then
            Over = Over + LenB(OverStr)
        End If
    End If
    GetBody = MidB(ConStr, Start, Over - Start)
End Function

REM Strip hyperlink tags (<a ...> and </a>), keeping the link text
Function RegRemoveHref(HTMLstr)
    Dim clsTempLoseStr, regEx
    clsTempLoseStr = CStr(HTMLstr)
    Set regEx = New RegExp
    regEx.Pattern = "<(\/){0,1}a[^<>]*>"
    regEx.IgnoreCase = True
    regEx.Global = True
    clsTempLoseStr = regEx.Replace(clsTempLoseStr, "")
    RegRemoveHref = clsTempLoseStr
    Set regEx = Nothing
End Function
%>
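The two helper functions can also be tried without making any HTTP request. The snippet below is a minimal sketch, assuming it is placed in the same .asp page as GetBody and RegRemoveHref above; the markup in sampleHtml and the variable names sampleHtml and fragment are made up for illustration and do not come from any real page.

```vbscript
<%
' Hypothetical sample markup, used only to exercise the helpers.
Dim sampleHtml, fragment
sampleHtml = "<html><body><div id=""news"" class=""cnt"">Read <a href=""/more"">more</a> here</div></body></html>"

' Take the text between the opening <div ...> and the first </div> that follows,
' excluding both markers (IncluL = False, IncluR = False).
fragment = GetBody(sampleHtml, "<div id=""news"" class=""cnt"">", "</div>", False, False)

If fragment = "$False$" Then
    ' GetBody signals "marker not found" with the sentinel string "$False$".
    Response.Write "Markers not found"
Else
    ' Strip the <a> tags but keep the link text, then HTML-encode for display.
    Response.Write Server.HTMLEncode(RegRemoveHref(fragment))
End If
%>
```

Checking for the "$False$" sentinel before writing the result avoids printing the sentinel itself when the page layout changes and the markers no longer match.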
For example, the following:
[Image]