'/* =================================================== ==========================================
'* Intro has been studying web page encoding for a long time, because recently we have to design a vbs script for Link detection, and the pages of people who are connected to you may be various encodings, the method used in the past is: if the use of gb2312 can not find the use of UTF-8 to query, can not find proof that the other side did not give you a link although not 100% correct, but it is almost the same, there are many people using these two types of codes. The idea of seeing an address in the favorites folder by accident can finally be acquired Article The webpage code is automatically determined. This problem has been plagued for a long time during the research process. Although it seems simple now, many people are looking for it, so I will post these three functions.
'* Filename getwebcodepage. vbs
'* Author yongfa365
'* Version V2.0
'* Web http://www.yongfa365.com
'* Email yongfa365 [at] QQ.com
'* Firstwrite http://www.yongfa365.com/Item/GetWebCodePage.vbs.html
'* Madetime 2008-01-29 20:55:46
'* Lastmodify 2008-01-30 20:55:46
'* ===================================================== ============================================= */
Call gethttppage ("http://www.baidu.com /")
Call gethttppage ("http://www.google.com /")
Call gethttppage ("http://www.yongfa365.com /")
Call gethttppage ("http://www.cbdcn.com /")
Call gethttppage ("http://www.csdn.net /")
'Get the matched content and return an array
'Getcontents (expression, String, whether to return the reference value)
'Msgbox getcontents ("A (. + ?) B "," a23234b AB a67896896b sadfasdfb ", true) (0)
function getcontents (patrn, strng, yinyong)
'By www.yongfa365.com reprinted please keep the link, so that the end user can get the latest update information in time
on error resume next
set Re = new Regexp
re. pattern = patrn
re. ignorecase = true
re. global = true
set matches = Re. execute (strng)
If yinyong then
for I = 0 to matches. count-1
If matches (I ). value <> "" Then retstr = retstr & matches (I ). submatches (0) & "Liu yongfa"
next
else
for each omatch in matches
If omatch. value <> "" Then retstr = retstr & omatch. value & "Liu Yong Fa"
next
end if
getcontents = Split (retstr, "Liu Yong Fa")
end function
function gethttppage (URL)
on error resume next
set XMLHTTP = Createobject ("msxml2.xmlhttp")
XMLHTTP. open "get", URL, false
XMLHTTP. send
If XMLHTTP. status <> 200 then exit function
getbody = XMLHTTP. responsebody
the idea here is to find the file header Based on the returned string. If not, use gb2312. Generally, the encoding can be directly matched.
'in the returned string, although Chinese characters are garbled, the encoding is not affected,
getcodepage = getcontents ("charset = [" "'] * ([^" ",'] +)", XMLHTTP. responsetext, true) (0)
'Check the encoding in the header file
If Len (getcodepage) <3 then getcodepage = getcontents ("charset = [" '] * ([^ "",'] +) ", XMLHTTP. getResponseHeader ("Content-Type"), true) (0)
If Len (getcodepage) <3 then getcodepage = "gb2312"
set XMLHTTP = nothing
'The following sentence should be blocked during official use
wscript. echo URL & "-->" & getcodepage
gethttppage = bytestobstr (getbody, getcodepage)
end function
function bytestobstr (body, cset)
on error resume next
dim objstream
set objstream = Createobject ("ADODB. stream ")
objstream. type = 1
objstream. mode = 3
objstream. open
objstream. write body
objstream. position = 0
objstream. type = 2
objstream. charset = cset
bytestobstr = objstream. readtext
objstream. close
set objstream = nothing
end function