Webpage Encoding Problems When collecting articles in vbs or ASP

Source: Internet
Author: User

'/* =================================================== ==========================================
'* Intro has been studying web page encoding for a long time, because recently we have to design a vbs script for Link detection, and the pages of people who are connected to you may be various encodings, the method used in the past is: if the use of gb2312 can not find the use of UTF-8 to query, can not find proof that the other side did not give you a link although not 100% correct, but it is almost the same, there are many people using these two types of codes. The idea of seeing an address in the favorites folder by accident can finally be acquired Article The webpage code is automatically determined. This problem has been plagued for a long time during the research process. Although it seems simple now, many people are looking for it, so I will post these three functions.
'* Filename getwebcodepage. vbs
'* Author yongfa365
'* Version V2.0
'* Web http://www.yongfa365.com
'* Email yongfa365 [at] QQ.com
'* Firstwrite http://www.yongfa365.com/Item/GetWebCodePage.vbs.html
'* Madetime 2008-01-29 20:55:46
'* Lastmodify 2008-01-30 20:55:46
'* ===================================================== ============================================= */

Call gethttppage ("http://www.baidu.com /")
Call gethttppage ("http://www.google.com /")
Call gethttppage ("http://www.yongfa365.com /")
Call gethttppage ("http://www.cbdcn.com /")
Call gethttppage ("http://www.csdn.net /")

'Get the matched content and return an array
'Getcontents (expression, String, whether to return the reference value)
'Msgbox getcontents ("A (. + ?) B "," a23234b AB a67896896b sadfasdfb ", true) (0)

function getcontents (patrn, strng, yinyong)
'By www.yongfa365.com reprinted please keep the link, so that the end user can get the latest update information in time
on error resume next
set Re = new Regexp
re. pattern = patrn
re. ignorecase = true
re. global = true
set matches = Re. execute (strng)
If yinyong then
for I = 0 to matches. count-1
If matches (I ). value <> "" Then retstr = retstr & matches (I ). submatches (0) & "Liu yongfa"
next
else
for each omatch in matches
If omatch. value <> "" Then retstr = retstr & omatch. value & "Liu Yong Fa"
next
end if
getcontents = Split (retstr, "Liu Yong Fa")
end function

function gethttppage (URL)
on error resume next
set XMLHTTP = Createobject ("msxml2.xmlhttp")
XMLHTTP. open "get", URL, false
XMLHTTP. send
If XMLHTTP. status <> 200 then exit function
getbody = XMLHTTP. responsebody
the idea here is to find the file header Based on the returned string. If not, use gb2312. Generally, the encoding can be directly matched.
'in the returned string, although Chinese characters are garbled, the encoding is not affected,
getcodepage = getcontents ("charset = [" "'] * ([^" ",'] +)", XMLHTTP. responsetext, true) (0)
'Check the encoding in the header file
If Len (getcodepage) <3 then getcodepage = getcontents ("charset = [" '] * ([^ "",'] +) ", XMLHTTP. getResponseHeader ("Content-Type"), true) (0)
If Len (getcodepage) <3 then getcodepage = "gb2312"
set XMLHTTP = nothing
'The following sentence should be blocked during official use
wscript. echo URL & "-->" & getcodepage
gethttppage = bytestobstr (getbody, getcodepage)
end function

function bytestobstr (body, cset)
on error resume next
dim objstream
set objstream = Createobject ("ADODB. stream ")
objstream. type = 1
objstream. mode = 3
objstream. open
objstream. write body
objstream. position = 0
objstream. type = 2
objstream. charset = cset
bytestobstr = objstream. readtext
objstream. close
set objstream = nothing
end function

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.